Date of Submission
Spring 2025
Supervisor
Dr. Sajjad Haider, Professor, Department of Computer Science, Institute of Business Administration, Karachi
Committee Member 1
Dr. Tariq Mahmood, Examiner – I, Institute of Business Administration (IBA), Karachi
Committee Member 2
Dr. Tahir Syed, Examiner – II, Institute of Business Administration (IBA), Karachi
Degree
Master of Science in Data Science
Department
Department of Computer Science
Faculty/School
School of Mathematics and Computer Science (SMCS)
Keywords
Automated ICD Coding, Healthcare, LLMs, MedCAT, Model Performance
Abstract
Automated ICD-10 coding in healthcare struggles to classify electronic health records (EHRs) efficiently and accurately. Traditional methods have limitations, and the potential of LLMs is still underexplored. This research explored the efficiency of quantized LLMs in automated ICD-10 coding, emphasizing the impact of fine-tuning techniques such as QLoRA and LoRA. Our study used the benchmark MIMIC dataset for a comprehensive evaluation, comparing fine-tuned LLMs, vanilla LLMs in zero-shot and few-shot settings, and a baseline NER method, MedCAT. Experiments revealed that unquantized LLMs, particularly those with larger parameter counts such as LLaMA-3-70B, exhibit high accuracy in zero-shot and few-shot scenarios, showcasing their inherent understanding of medical language and code structure. However, fine-tuned quantized LLMs, despite their potential for task-specific learning, did not surpass the performance of their unquantized counterparts. This suggests that optimizing hyperparameters and fine-tuning strategies for large, quantized models is crucial for achieving optimal results. The research identified key areas for future investigation, including exploring further hyperparameter combinations, using larger and more diverse training datasets, and extending fine-tuning to more epochs. Additionally, incorporating explainability techniques would enhance transparency and build trust in the model's decision-making. Advancing these areas can make automated ICD-10 coding more accurate, efficient, and reliable, improving patient care, billing, and healthcare workflows.
Document Type
Restricted Access
Submission Type
Thesis
Recommended Citation
Khan, F. H. (2025). Performance Comparison of Quantized Large Language Models in ICD-10 Medical Coding (Unpublished graduate thesis). Retrieved from https://ir.iba.edu.pk/etd-ms-ds/8