All Theses and Dissertations
Degree
Doctor of Philosophy in Computer Science
Faculty / School
School of Mathematics and Computer Science (SMCS)
Department
Department of Computer Science
Date of Award
Spring 2026
Advisor
Dr. Tariq Mahmood, Professor, School of Mathematics and Computer Science (SMCS)
Committee Member 1
Dr. Sohail Asghar, Examiner – I, COMSATS Islamabad
Committee Member 2
Dr. Ahmer Rashid, Examiner – II, GIKI
Committee Member 3
Dr. Muhammad Atif Tahir, Program Coordinator, Graduate & Postgraduate Programs (CS), IBA Karachi
Project Type
Dissertation
Access Type
Restricted Access
Document Version
Final
Pages
xiv, 188
Keywords
Concept Drift, Machine Learning, Autoencoder, Data Streams, Deep Learning, Drift Detection, Drift Adaptation
Subjects
Artificial Intelligence, Computer Science, Data Science
Abstract
This research addresses unsupervised concept drift detection, that is, detecting drift without ground-truth labels while keeping false alarms low. To this end, we establish an autoencoder-based drift detection framework (which can be followed by any standard drift adaptation mechanism) for machine learning classification problems in data streams. In streaming data environments, data characteristics and probability distributions are likely to change over time. This phenomenon, called concept drift, makes it difficult for machine learning models to predict accurately. In such non-stationary environments, concept drift must be detected and the model updated to maintain acceptable predictive performance. Existing approaches to drift detection have inherent problems: supervised detection methods require ground-truth labels, and unsupervised methods suffer from a high false positive rate. This research presents a novel semi-supervised Autoencoder-based Drift Detection Method (AEDDM) that detects drift without ground-truth labels and with fewer false alarms.
The AEDDM method works in batch mode and has three architectural components: an offline component (training phase), in which two autoencoders are trained on labelled data to learn the data distribution of each class, and two thresholds, the batch threshold and the count threshold, are computed from the reconstruction errors on the validation data; an ensemble component, which defines the sequential order of the autoencoders; and an online component, in which data arrives in batches and drift detection is performed across the batched data stream by comparing changes in reconstruction loss against the thresholds learned in the offline training phase.
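The offline and online phases described above can be illustrated with a minimal sketch. It is not the dissertation's implementation, and everything specific in it is an assumption made for illustration: a linear PCA-style reconstructor stands in for each autoencoder, the ensemble is simplified to the lowest error across the class models (the sequential ordering is not modelled), and the rules for the instance, batch, and count thresholds, as well as the names `fit_offline`, `detect_drift`, and `make_stream`, are hypothetical.

```python
import numpy as np


class LinearReconstructor:
    """Linear stand-in for an autoencoder (assumption): project onto top-k principal axes."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        # Rows of vt are the principal directions of the centred data.
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def error(self, X):
        """Per-instance mean squared reconstruction error."""
        centred = X - self.mean_
        recon = centred @ self.components_.T @ self.components_
        return ((centred - recon) ** 2).mean(axis=1)


def ensemble_error(models, X):
    """Assumed ensemble rule: each instance keeps its lowest error across class models."""
    return np.min([m.error(X) for m in models], axis=0)


def fit_offline(X_train, y_train, X_val, batch_size, n_components=1, k=3.0, margin=1.5):
    """Offline phase: one model per class, then thresholds from validation errors."""
    models = [LinearReconstructor(n_components).fit(X_train[y_train == c])
              for c in np.unique(y_train)]
    val_err = ensemble_error(models, X_val)
    # Assumed rule: an instance is poorly reconstructed above mean + k * std.
    instance_thr = val_err.mean() + k * val_err.std()
    n_batches = len(val_err) // batch_size
    batches = val_err[: n_batches * batch_size].reshape(n_batches, batch_size)
    # Assumed rules: worst validation batch mean (with a safety margin) and
    # the largest number of poorly reconstructed instances in a validation batch.
    batch_thr = margin * batches.mean(axis=1).max()
    count_thr = (batches > instance_thr).sum(axis=1).max()
    return models, (instance_thr, batch_thr, count_thr)


def detect_drift(models, thresholds, X_batch):
    """Online phase: no labels needed; flag drift only if both thresholds are exceeded."""
    instance_thr, batch_thr, count_thr = thresholds
    err = ensemble_error(models, X_batch)
    return bool(err.mean() > batch_thr and (err > instance_thr).sum() > count_thr)


def make_stream(rng, n, shift=0.0):
    """Toy two-class data in 4-D; `shift` moves feature 2 to simulate drift."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(scale=0.1, size=(n, 4))
    z = rng.normal(scale=2.0, size=n)
    X[:, 0] += np.where(y == 0, z, 0.0)   # class 0 varies along feature 0
    X[:, 1] += np.where(y == 1, z, 0.0)   # class 1 varies along feature 1
    X[:, 3] += 5.0 * y                    # class 1 is offset on feature 3
    X[:, 2] += shift
    return X, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train, y_train = make_stream(rng, 1000)
    X_val, _ = make_stream(rng, 500)
    models, thresholds = fit_offline(X_train, y_train, X_val, batch_size=50)
    print(detect_drift(models, thresholds, make_stream(rng, 50)[0]))
    print(detect_drift(models, thresholds, make_stream(rng, 50, shift=2.0)[0]))
```

Requiring both thresholds to be exceeded is one plausible way to suppress false alarms from a few outlying instances; whether AEDDM combines the batch and count thresholds this way is not stated in the abstract.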
AEDDM is considered a semi-supervised drift detection method because its complete framework uses both labelled and unlabelled data. Labelled training data is required for the initial training of the autoencoders in the offline phase, but the online detection phase needs no class labels. Drift is therefore detected in a completely unsupervised way online, while the framework as a whole is semi-supervised.
The AEDDM method is assessed on four synthetic and four real-world datasets that exhibit both sudden and gradual changes in the data distribution. To evaluate its effectiveness, it was tested with seven popular batch classifiers and with a Hoeffding Tree classifier in an online learning setting. The results indicate that AEDDM accurately identifies distributional changes that are likely to degrade classifier performance (real drift) while disregarding irrelevant changes (virtual drift). AEDDM detected drift with zero delay in 6 of 8 datasets and with a delay of up to 4 batches in the remaining 2. On the real-world datasets with real drift (both sudden and gradual), AEDDM detected drift in all four datasets; with virtual drift (both sudden and gradual), it ignored the drift in 6 of 8 cases. In the operational scenario, AEDDM outperformed four methods (ADD, DD-SAPH, Prequential HT, and No-Update) on all four real-world datasets; it outperformed the KS-Test on three of the four and shared the best rank on the fourth. This ability to detect both sudden and gradual drift and to distinguish real from virtual drift, coupled with its adaptability to changing data distributions (through drift adaptation), makes AEDDM a valuable tool for maintaining classifier performance in dynamic environments.
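The distinction between real and virtual drift used above is commonly formalised in the concept drift literature as follows. The notation is an assumption for illustration and may differ from the dissertation's: $X$ denotes the features, $y$ the class label, and $p_t$ the distribution at time $t$.

```latex
% Joint distribution at time t, factored into input and concept parts
p_t(X, y) = p_t(X)\, p_t(y \mid X)

% Real drift: the concept (decision boundary) changes
\exists X :\; p_t(y \mid X) \neq p_{t+1}(y \mid X)

% Virtual drift: only the input distribution changes
p_t(X) \neq p_{t+1}(X), \qquad p_t(y \mid X) = p_{t+1}(y \mid X)
```

Under this formulation, virtual drift leaves the ideal decision boundary unchanged, which is why a detector that ignores it raises fewer unnecessary alarms.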
Within the field of drift detection, AEDDM is a novel and comprehensive work that leverages the power of deep learning, specifically autoencoders. It was designed around the characteristics of an ideal drift detector after a careful review of supervised, semi-supervised, unsupervised, and deep learning-based techniques. It is probably the first method to integrate the best part of each approach: the drift it detects is real, as in supervised drift detection methods; available labelled data is fully leveraged for autoencoder training and threshold computation, as in semi-supervised methods; drift detection is completely unsupervised, as in unsupervised methods; and the power of deep learning is harnessed to process multidimensional data, eliminating the need for feature selection or dimensionality reduction.
Recommended Citation
Ali, U. (2026). A Framework for Concept Drift Detection and Adaptation for Classification Problems in Data Streams (Unpublished doctoral dissertation). Institute of Business Administration, Pakistan. Retrieved from https://ir.iba.edu.pk/etd/98
The full text of this document is only accessible to authorized users.
