Date of Submission
Spring 2025
Supervisor
Dr. Xiaorui Jiang, Lecturer, Information School, University of Sheffield
Co-Supervisor
Dr. Sajjad Haider, Co-Supervisor, Institute of Business Administration (IBA), Karachi
Committee Member 1
Dr. Sajjad Haider, Co-Supervisor, Department of Computer Science, Institute of Business Administration, Karachi
Committee Member 2
Dr. Atif Tahir, Examiner – I, Department of Computer Science, Institute of Business Administration (IBA), Karachi
Committee Member 3
Dr. Tahir Syed, Examiner – II, Department of Computer Science, Institute of Business Administration (IBA), Karachi
Degree
Master of Science in Data Science
Department
Department of Computer Science
Faculty/ School
School of Mathematics and Computer Science (SMCS)
Keywords
Sequential Sentence Classification, Medical Evidence Extraction, Hierarchical Models, Rhetorical Role Classification, Software Requirement Specifications, Automated Medical Coding
Abstract
This thesis presents a novel hierarchical sequential sentence classification model designed to address the challenges of processing long, complex documents across multiple domains. The model is applied to two distinct tasks: medical evidence extraction and software requirement classification. Both are binary classification tasks, and the software requirements task adds the complexity of paired sentence classification. In the medical domain, the study uses a subset of the MIMIC-III dataset and domain-specific BERT variants to extract textual evidence associated with International Classification of Diseases (ICD) codes. In the software engineering domain, the DRIP dataset is used to classify and analyze software requirements, with paired sentence classification identifying semantic relationships between requirement pairs. Domain-specific BERT models, such as BioBERT and ClinicalBERT, were fine-tuned for each dataset to improve task-specific performance. The hierarchical architecture integrates sentence-level and document-level representations, capturing both local and global context for sequential sentence classification. Results show that the model achieves reasonable accuracy in both domains, underscoring the potential of combining hierarchical approaches with domain-specific pretrained language models. This work contributes to advancing the state of the art in document-level sequence classification by bridging gaps between medical and software engineering applications.
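The abstract does not specify the internal layers of the hierarchical model. The following is a minimal sketch of the general idea, under stated assumptions, and is not the thesis's implementation. Each sentence vector is assumed to be precomputed, for example by a domain-specific BERT encoder. Simple mean pooling over a neighbouring window (local context) and over the whole document (global context) stands in for a learned document-level encoder. A logistic layer with given weights produces the binary per-sentence score. The function names `contextualize` and `classify` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def contextualize(sentences: List[Vector], window: int = 1) -> List[Vector]:
    """Concatenate each sentence vector with a local and a global context vector.

    Assumption: mean pooling replaces the learned document-level encoder.
    """
    doc_vec = mean(sentences)  # global context: mean over the whole document
    features = []
    for i, sent in enumerate(sentences):
        lo, hi = max(0, i - window), min(len(sentences), i + window + 1)
        local_vec = mean(sentences[lo:hi])  # local context: window around sentence i
        features.append(sent + local_vec + doc_vec)  # list concatenation
    return features


def classify(features: List[Vector], weights: Vector, bias: float) -> List[float]:
    """Binary logistic score per sentence (e.g. probability it is evidence)."""
    probs = []
    for f in features:
        z = sum(w * x for w, x in zip(weights, f)) + bias
        probs.append(1.0 / (1.0 + math.exp(-z)))
    return probs


# Usage with toy 2-dimensional "sentence embeddings" (illustrative values only).
sentences = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
feats = contextualize(sentences)
probs = classify(feats, weights=[0.0] * 6, bias=0.0)
```

In a trained system, the pooling steps and the classifier weights would be learned jointly. The sketch only shows how sentence-level and document-level information can be combined into one feature vector per sentence before sequential binary classification.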
Document Type
Restricted Access
Submission Type
Thesis
Recommended Citation
Khan, K. (2025). Hierarchical Model for Context-Aware Sentence Classification Across Domains (Unpublished graduate thesis). Retrieved from https://ir.iba.edu.pk/etd-ms-ds/11
