Master of Science in Computer Science

Faculty / School

Faculty of Computer Sciences (FCS)


Department of Computer Science

Date of Submission



Dr. Quratulain Nizamuddin Rajput, Assistant Professor, Institute of Business Administration (IBA), Karachi

Project Type

MSCS Survey Report


Author attribution has been a challenging area of research for many years. It is very helpful in identifying the author of an unknown document. Features extracted from documents with known authors are matched against features extracted from the document whose author is unknown. Over the last few years, technology has advanced substantially: much new software has been developed and many new algorithms have been created.

Research advances in information retrieval, natural language processing, and machine learning have made author attribution easier. As a result, author attribution has become comparatively more accurate than in the past and is now used in many different areas, such as plagiarism detection.

This paper focuses mainly on the approaches used for author attribution. We discuss various author attribution approaches that have been proposed in recent years, along with their merits and demerits.


This paper has presented different approaches to author attribution. Author attribution is a well-known problem, and many researchers have worked towards solving it. We have discussed the four basic steps of author attribution. The first step is obtaining the dataset. The second step is generating the feature dataset, which includes lexical features (word frequencies, word length, and sentence length), character features (character types and character n-grams), syntactic features (PoS, sentence structure, and rewrite rule frequencies), semantic features (synonyms), and application-specific features (content-specific and language-specific). The third step is using a classifier to generate a model, and the last step is testing the model.
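The four steps summarized above can be illustrated with a minimal sketch. It is not the method of any particular paper covered by the survey: the function names (`char_ngrams`, `lexical_features`, `train`, `predict`), the toy corpus, and the choice of a nearest-profile classifier with cosine similarity are all assumptions made for illustration. Only two of the listed feature families (lexical and character n-gram features) are shown.

```python
import math
import re
from collections import Counter


def char_ngrams(text, n=3):
    """Character feature: counts of overlapping character n-grams."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def lexical_features(text):
    """Lexical features: average word length and average sentence length (in words)."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    avg_word_len = sum(len(w) for w in words) / len(words) if words else 0.0
    avg_sent_len = len(words) / len(sentences) if sentences else 0.0
    return {"avg_word_len": avg_word_len, "avg_sent_len": avg_sent_len}


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(corpus, n=3):
    """Steps 2 and 3: build one n-gram profile per known author."""
    return {author: char_ngrams(" ".join(texts), n)
            for author, texts in corpus.items()}


def predict(model, text, n=3):
    """Step 4: attribute a text to the author with the most similar profile."""
    query = char_ngrams(text, n)
    return max(model, key=lambda author: cosine(model[author], query))


# Step 1: a toy dataset (invented purely for illustration).
corpus = {
    "author_a": ["The sea was calm. The sea was grey and calm."],
    "author_b": ["Compile the kernel, then reboot the machine quickly."],
}
model = train(corpus)
```

With this toy data, `predict(model, "The sea was calm.")` selects `author_a`, because every trigram of the query also occurs in that author's profile. A realistic study would use a much larger corpus, a held-out test set, and typically a combination of several feature families.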
