Date of Submission

Spring 2025

Supervisor

Dr. Sajjad Haider, Professor, Department of Computer Science, Institute of Business Administration, Karachi

Committee Member 1

Dr. Tariq Mahmood, Examiner – I, Institute of Business Administration (IBA), Karachi

Committee Member 2

Dr. Tahir Syed, Examiner – II, Institute of Business Administration (IBA), Karachi

Degree

Master of Science in Data Science

Department

Department of Computer Science

Faculty/School

School of Mathematics and Computer Science (SMCS)

Keywords

Automated ICD Coding, Healthcare, LLMs, MedCAT, Model Performance

Abstract

Automated ICD-10 coding in healthcare struggles to classify electronic health records (EHRs) efficiently and accurately. Traditional methods have limitations, and the potential of large language models (LLMs) is still underexplored. This research explored the efficiency of quantized LLMs in automated ICD-10 coding, emphasizing the impact of fine-tuning techniques such as LoRA and QLoRA. Our study used the benchmark MIMIC dataset for a comprehensive evaluation, comparing fine-tuned LLMs, vanilla LLMs in zero-shot and few-shot settings, and a baseline NER method, MedCAT. Experiments revealed that unquantized LLMs, particularly those with larger parameter counts such as LLaMA-3-70B, achieve high accuracy in zero-shot and few-shot scenarios, showcasing their inherent understanding of medical language and code structure. However, fine-tuned quantized LLMs, despite their potential for task-specific learning, did not surpass the performance of their unquantized counterparts. This suggested that optimizing hyperparameters and fine-tuning strategies for large, quantized models is crucial for achieving optimal results. The research identified key areas for future investigation, including exploring further hyperparameter combinations, using larger and more diverse training datasets, and extending fine-tuning to more epochs. Additionally, incorporating explainability techniques would enhance transparency and build trust in the model's decision-making. Advancing these areas can make automated ICD-10 coding more accurate, efficient, and reliable, improving patient care, billing, and healthcare workflows.

Document Type

Restricted Access

Submission Type

Thesis

Included in

Data Science Commons
