Student Name

Wajeeha Parker

Date of Submission

Fall 2025

Supervisor

Dr. Syed Ali Raza, Assistant Professor, Department of Computer Science

Committee Member 1

Dr. Tariq Mahmood, Examiner – I, Institute of Business Administration (IBA), Karachi

Committee Member 2

Dr. Syed Farrukh Hasan, Examiner – II, FAST National University

Degree

Master of Science in Data Science

Department

Department of Computer Science

Faculty/School

School of Mathematics and Computer Science (SMCS)

Keywords

Ensemble Classifiers, Clustering, Multiple Classifier Systems, Cluster Selection, Classification, Evolutionary Algorithm

Abstract

In ensemble learning, a promising strategy for improving base learners is to train them on different subspaces (subsets) of the dataset. However, generating meaningful and diverse subspaces that lead to strong individual classifiers remains a significant challenge, particularly in the presence of class imbalance within subspaces. Poorly constructed subspaces can produce weak learners that ultimately degrade ensemble performance. This thesis proposes a modular approach to address this challenge. It begins by employing a clustering technique to generate candidate subspaces that capture the intrinsic structure of the data. Next, recognizing that not all clusters are equally informative, it incorporates an evolutionary optimization process to filter out low-quality subspaces and retain only the most promising ones. Furthermore, a second optimization step is applied to explore whether further improvements in ensemble performance can be achieved by selecting an optimal subset of base classifiers from a diverse pool and aggregating complementary models more effectively.
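To make the pipeline concrete, the sketch below illustrates its first two stages under stated assumptions: k-means generates candidate subspaces, and a fixed size/imbalance heuristic stands in for the evolutionary filter described above. All function names, thresholds, and the choice of base learner are illustrative assumptions, not the thesis's actual configuration.

```python
# Minimal sketch of clustering-based subspace generation plus filtering.
# The fixed min_size / max_majority rule below is a stand-in for the
# evolutionary cluster-selection step described in the abstract.
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def build_cluster_ensemble(X, y, n_clusters=8, min_size=30, max_majority=0.9):
    """Cluster the data into candidate subspaces, drop clusters that are
    too small or too class-imbalanced, and fit one base learner per
    surviving subspace."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    learners = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        if len(idx) < min_size:
            continue  # subspace too small to yield a useful learner
        counts = Counter(y[idx])
        if max(counts.values()) / len(idx) > max_majority:
            continue  # nearly homogeneous subspace; exclude it
        learners.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
    return learners

def predict_majority(learners, X):
    """Aggregate base-learner predictions by simple majority vote."""
    votes = np.array([clf.predict(X) for clf in learners])
    return np.apply_along_axis(
        lambda col: Counter(col).most_common(1)[0][0], 0, votes)
```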

Experiments were conducted on a variety of small and large benchmark datasets. The results show that excluding highly imbalanced or homogeneous subspaces from the set of candidate subspaces improves ensemble performance across most datasets. Furthermore, removing Support Vector Machine (SVM)-based weak learners from the classifier pool enhances computational efficiency without compromising accuracy.

For smaller datasets, Particle Swarm Optimization (PSO) boosted performance when applied to both cluster filtering and classifier selection. In contrast, for larger datasets, Binary PSO for subspace optimization and SHAP-based methods for classifier selection yielded superior results. Notably, for large datasets, the second optimization step (base-classifier selection) did not offer further performance gains; optimal subspace selection alone was sufficient to match or surpass state-of-the-art ensemble methods such as Random Forest and XGBoost.
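As a hedged illustration of the Binary PSO step for subspace selection, each particle can be encoded as a bit mask over the candidate clusters, with fitness given by a caller-supplied validation score for the ensemble built from the selected clusters. The swarm hyperparameters and the `fitness` interface below are assumed values for the sketch, not those tuned in the thesis.

```python
# Illustrative Binary PSO over bit masks: fitness(mask) should return the
# validation accuracy of the ensemble built from clusters where mask == 1.
import numpy as np

def binary_pso(n_bits, fitness, n_particles=20, n_iter=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.integers(0, 2, size=(n_particles, n_bits))   # bit masks
    vel = rng.normal(0.0, 1.0, size=(n_particles, n_bits))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmax(pbest_fit)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        # Sigmoid transfer function maps velocities to bit probabilities.
        pos = (rng.random(pos.shape) < 1.0 / (1.0 + np.exp(-vel))).astype(int)
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[np.argmax(pbest_fit)].copy()
    return gbest  # best subspace mask found
```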

These findings highlight that while optimization techniques for subspace generation and classifier aggregation are effective, their efficacy varies with dataset size and complexity, and strategies that work well on smaller datasets may not generalize to larger ones.

Document Type

Restricted Access

Submission Type

Thesis
