MSDS Research Projects

Mapping ICD-10 Codes to Clinical Notes: A comparative analysis of Pre-Trained Language Models and SNOMED-CT MEDCAT

Student Name

Syed Bilal Rizwan

Degree

Master of Science in Data Science

Department

Department of Computer Science

Faculty/ School

School of Mathematics and Computer Science (SMCS)

Date of Submission

Fall 2023

Supervisor

Dr. Sajjad Haider, Professor, Department of Computer Science, Institute of Business Administration, Karachi

Keywords

Transformers, BERT, Deep Learning, Medical Coding, ICD-10, MEDCAT, SNOMED-CT, Natural Language Processing, Text Cleaning, NER

Abstract

This project examines medical coding in healthcare, focusing on the International Classification of Diseases, Tenth Revision (ICD-10) coding system. Its main objective is a comparative analysis of several pre-trained language models and MEDCAT's SNOMED-CT model for multi-label classification of clinical notes. Using the MIMIC-IV dataset, the project evaluates each method primarily on the 1,000 shortest clinical notes, predicting the first three characters of each ICD-10 code (the category). Every method is benchmarked against MEDCAT's SNOMED-CT model. The SNOMED-CT model performs consistently and achieves a Macro-F1 score of 0.218, while BERT, the most successful transformer-based approach, attains a Macro-F1 score of 0.123. Although modest in absolute terms, BERT's result is meaningful given the large number of classes to predict: around 24,000 for the overall MIMIC-IV dataset and 700 for the testing dataset. Both of these approaches also report better metrics than related work. The comparison offers insight into how effectively pre-trained language models and MEDCAT's SNOMED-CT model map clinical notes to ICD-10 codes across texts of varying length and style.
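To make the evaluation setup in the abstract concrete, the following is a minimal sketch of reducing full ICD-10 codes to their three-character category and scoring multi-label predictions with Macro-F1. It is not the project's code: the function names `to_category` and `macro_f1`, the example codes, and the choice to average over every label seen in either the gold or predicted sets are all assumptions made for illustration.

```python
from typing import List, Set


def to_category(code: str) -> str:
    """Reduce a full ICD-10 code to its three-character category (hypothetical helper)."""
    return code.replace(".", "").strip().upper()[:3]


def macro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Unweighted mean of per-label F1 over every label seen in gold or pred.

    Assumption: labels with no true positives contribute an F1 of 0.
    """
    labels = set().union(*gold, *pred)
    if not labels:
        return 0.0
    scores = []
    for label in sorted(labels):
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Illustrative codes for two notes (made up for this sketch, not taken from MIMIC-IV).
gold_codes = [["I50.9", "E11.9"], ["J18.9"]]
pred_codes = [["I50.1"], ["J18.0", "E11.65"]]

# Each note becomes a set of three-character categories.
gold = [{to_category(c) for c in note} for note in gold_codes]
pred = [{to_category(c) for c in note} for note in pred_codes]

score = macro_f1(gold, pred)
print(round(score, 3))
```

Because Macro-F1 weights every category equally, rare categories that are never predicted correctly pull the average down as much as common ones, which is one reason scores tend to be low when the label space contains hundreds of classes.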

Document Type

Restricted Access

Submission Type

Research Project

Recommended Citation

Rizwan, S. (2023). Mapping ICD-10 Codes to Clinical Notes: A comparative analysis of Pre-Trained Language Models and SNOMED-CT MEDCAT (Unpublished graduate research project). Institute of Business Administration, Pakistan. Retrieved from https://ir.iba.edu.pk/research-projects-msds/17

The full text of this document is only accessible to authorized users.