Degree
Master of Science in Data Science
Department
Department of Computer Science
Faculty/School
School of Mathematics and Computer Science (SMCS)
Date of Submission
Fall 2022
Supervisor
Dr. Tariq Mehmood, Professor, Department of Computer Science, School of Mathematics and Computer Science (SMCS)
Keywords
Mahalanobis Distance, Gaussian Mixture Model, Streamlit, Clustering, Gaussian Of Interest, Probability Density Function, Likelihood
Abstract
Anomaly detection is an important task in fields such as cybersecurity, financial fraud detection, and manufacturing quality control. The Gaussian mixture model (GMM) is an effective technique for anomaly detection in multivariate data because it can estimate complex distributions. In this work, we propose a GMM-based framework for detecting abnormal data points in highly imbalanced data. Our method trains a GMM on the inlier data and then uses the trained model to find the Gaussian distribution under which the outliers have the maximum likelihood. We then validate the selected Gaussian by measuring the average statistical distance (Mahalanobis distance) between the outliers and each of the trained Gaussians. We validated the proposed methodology on several datasets from the UCI Machine Learning Repository. Finally, we built a user-friendly, interactive web app using Streamlit that integrates five solved anomaly detection problems from various domains, so that users can try the framework in real time.
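The procedure outlined in the abstract can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the synthetic data, the number of components, and the helper name `per_component_stats` are assumptions made for the example, and the abstract does not state exactly how the per-Gaussian likelihood is computed. Here each fitted component is scored by the summed log-density of the outliers, and the choice is cross-checked against the average Mahalanobis distance.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): two inlier clusters, plus a few
# outliers that sit closer to the second cluster than to the first.
inliers = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[6.0, 6.0], scale=0.5, size=(200, 2)),
])
outliers = rng.normal(loc=[8.0, 8.0], scale=0.3, size=(10, 2))

# Step 1: train the GMM on inlier data only.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(inliers)


def per_component_stats(model, X):
    """Hypothetical helper: for each fitted Gaussian, return the summed
    log-density of X and the average Mahalanobis distance from X to it."""
    n_features = X.shape[1]
    log_liks, mean_dists = [], []
    for mean, cov in zip(model.means_, model.covariances_):
        diff = X - mean
        # Squared Mahalanobis distance of every row of X to this component.
        d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        _, logdet = np.linalg.slogdet(cov)
        # Multivariate normal log-density (mixture weights are ignored).
        log_pdf = -0.5 * (d2 + n_features * np.log(2 * np.pi) + logdet)
        log_liks.append(log_pdf.sum())
        mean_dists.append(np.sqrt(d2).mean())
    return np.array(log_liks), np.array(mean_dists)


# Step 2: pick the Gaussian under which the outliers are most likely.
log_liks, mean_dists = per_component_stats(gmm, outliers)
selected = int(np.argmax(log_liks))

# Step 3: cross-check with the average Mahalanobis distance.
closest = int(np.argmin(mean_dists))

print("max-likelihood component:", selected)
print("smallest average Mahalanobis distance:", closest)
```

The two criteria need not agree in general, because the log-density also depends on the determinant of each covariance matrix; agreement between them is what serves as the validation signal in this sketch.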
Document Type
Restricted Access
Submission Type
Research Project
Recommended Citation
Bari, A. (2022). Anomaly detection using GMM (Unpublished graduate research project). Institute of Business Administration, Pakistan. Retrieved from https://ir.iba.edu.pk/research-projects-msds/13
The full text of this document is only accessible to authorized users.