Degree
Master of Science in Computer Science
Department
Department of Computer Science
School
School of Mathematics and Computer Science (SMCS)
Date of Submission
Spring 2022
Supervisor
Dr. Sajjad Haider, Professor, Department of Computer Science, School of Mathematics and Computer Science (SMCS)
Keywords
Natural Language Processing, Text Summarization, Urdu Language Processing, Sentence Weight Algorithm, Weighted Term Frequency, Term Frequency-Inverse Document Frequency (TF-IDF), mT5-multilingual-XLSum
Abstract
Text summarization is a formidable challenge in Natural Language Processing (NLP) because it requires precise text analysis, such as semantic and lexical analysis, to produce a good summary. A good summary must contain valuable information and must be concise while considering aspects such as non-redundancy, relevance, coverage, coherence, and readability.
Considerable research, time, effort, and funding have been invested in English, as it is the global language of communication, but far less in low-resource languages such as Urdu. This project intends to develop an application that addresses this gap for Urdu text summarization. The application also provides Part-of-Speech (POS) tagging, which would help users understand the language better. Additionally, it has applications in several industries, for example, newspaper summarization, microblog/tweet summarization, book summarization, and biomedical, legal, and research document summarization.
Document Type
Restricted Access
Submission Type
Research Project
Recommended Citation
Hassan, Nabeel. "Urdu Text Summarization using Machine Learning." Unpublished graduate research project. Institute of Business Administration. 2022. https://ir.iba.edu.pk/research-projects-mscs/10
The full text of this document is only accessible to authorized users.