Degree
Master of Science in Data Science
Department
Department of Computer Science
Faculty/School
School of Mathematics and Computer Science (SMCS)
Date of Submission
Fall 2023
Supervisor
Dr. Sajjad Haider, Professor, Department of Computer Science, Institute of Business Administration, Karachi
Keywords
Transformers, BERT, Deep Learning, Medical Coding, ICD-10, MEDCAT, SNOMED-CT, Natural Language Processing, Text Cleaning, NER
Abstract
This project examines medical coding in the healthcare sector, focusing on the International Classification of Diseases, Tenth Revision (ICD-10) coding system. The main objective is a comparative analysis of several pre-trained language models and MEDCAT's SNOMED-CT model for multi-label classification of clinical notes. Using the MIMIC IV dataset, the project evaluates each method primarily on the shortest 1,000 clinical notes, predicting the first three characters of the ICD-10 code (the category). Every method is benchmarked against MEDCAT's SNOMED-CT. MEDCAT's SNOMED-CT model performs consistently and achieves a Macro-F1 score of 0.218. BERT, the most successful transformer-based approach, attains a Macro-F1 score of 0.123. Although modest, this result is significant given the large number of predicted classes: around 24,000 for the overall MIMIC IV dataset and 700 for the testing dataset. Both of these approaches also report better metrics than related work. The comparison yields insights into how well various pre-trained language models and MEDCAT's SNOMED-CT map clinical notes to ICD-10 codes, and into how they handle clinical text of diverse sizes and styles.
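The evaluation described in the abstract can be illustrated with a minimal sketch: reducing full ICD-10 codes to their three-character category and computing Macro-F1 over multi-label predictions. The helper names (`to_category`, `macro_f1`) and the toy codes are hypothetical and chosen only for illustration; the project's own preprocessing and metric implementation are not described in this record and may differ.

```python
from collections import defaultdict


def to_category(code: str) -> str:
    """Return the three-character ICD-10 category, e.g. 'E11' for 'E11.9'."""
    return code.replace(".", "").strip().upper()[:3]


def macro_f1(true_sets, pred_sets):
    """Unweighted mean of per-label F1 over every label seen in gold or predictions."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for gold, pred in zip(true_sets, pred_sets):
        for label in pred & gold:
            tp[label] += 1
        for label in pred - gold:
            fp[label] += 1
        for label in gold - pred:
            fn[label] += 1
    labels = set(tp) | set(fp) | set(fn)
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: one list of ICD-10 codes per clinical note.
gold_codes = [["E11.9", "I10"], ["J45.909"], ["I10", "N18.3"]]
pred_codes = [["E11.65", "I10"], ["J44.9"], ["I10"]]

gold = [{to_category(c) for c in note} for note in gold_codes]
pred = [{to_category(c) for c in note} for note in pred_codes]

score = macro_f1(gold, pred)
print(f"Macro-F1: {score:.3f}")
```

Because Macro-F1 weights every category equally, each rare category that is never predicted correctly contributes a zero to the average. This is one reason scores tend to be low when there are hundreds of classes.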
Document Type
Restricted Access
Submission Type
Research Project
Recommended Citation
Rizwan, S. (2023). Mapping ICD-10 Codes to Clinical Notes: A comparative analysis of Pre-Trained Language Models and SNOMED-CT MEDCAT (Unpublished graduate research project). Institute of Business Administration, Pakistan. Retrieved from https://ir.iba.edu.pk/research-projects-msds/17
The full text of this document is only accessible to authorized users.