Degree

Master of Science in Computer Science

Faculty / School

Faculty of Computer Sciences (FCS)

Department

Department of Computer Science

Date of Submission

2020-06-30

Advisor

Dr. Sajjad Haider, Professor and Chairperson, Department of Computer Science, Institute of Business Administration (IBA), Karachi

Project Type

MSCS Survey Report

Abstract

Named Entity Recognition (NER) aims to find mentions in text that belong to predefined semantic types such as person, location, organization, and others. NER not only acts as a standalone tool for information extraction but also plays a vital role in natural language processing applications including, but not limited to, text understanding, information retrieval, automatic text summarization, question answering, computational linguistics, and personally identifiable information (PII) discovery.

This survey aims to summarize some of the popular techniques developed for Named Entity Recognition. Chapter 1 gives a brief introduction to NER, Chapter 2 explores Maximum Entropy (ME) models and their use in NER, and Chapter 3 focuses on Hidden Markov Models (HMM). Chapters 4 and 5 describe neural model-based techniques, which are the most recent advancement in this field. The work in Chapters 2 to 4 focuses on the English language, while in Chapter 5 we discuss some of the research done on Urdu NER.

Notes

Named Entity Recognition (NER) is used to find names belonging to predefined semantic types such as person, location, and organization, or to custom, domain-specific entity types.

In this survey, we reviewed multiple approaches to NER for the English and Urdu languages. We started with the ME-based approach, explored the HMM-based approach, and finally examined the use of deep learning for this problem. For English, prior work is abundant, but for Urdu, both prior work and training data are limited. The gradual shift from rule-based NER approaches towards machine-learned approaches has improved both the efficacy and efficiency of NER, but some challenges remain for languages like Urdu.

The full text of this document is only accessible to authorized users.
