Student Name

Ayman Rehan

Degree

Master of Science in Data Science

Department

Department of Computer Science

Faculty/School

School of Mathematics and Computer Science (SMCS)

Date of Submission

Fall 2022

Supervisor

Dr. Tariq Mahmood, Professor and Program Coordinator MS (CS) and MS (DS) Programs, Department of Computer Science

Abstract

This application aims to identify the attributes of customer segments that tend to behave in a particular way: for example, which attributes define churners for a business, which define potential buyers of a newly launched product, and which define customers likely to be influenced by a given marketing campaign. Every business needs to answer these questions over its product life cycle so that it can prevent customer loss, stay competitive by targeting the right customer market, and successfully penetrate new markets. The tool enriches a business's analysis and helps it understand its customer base in depth, which supports setting the right strategies to attract new customers, retain existing ones, offer new and personalized products, and target the right channels for content delivery. It is a machine learning based segmentation solution, developed using Python and Streamlit, that performs automated clustering and cluster analysis on a provided dataset. The user selects a target variable, and the application is calibrated so that the target variable must be discrete. Keeping the target variable discrete makes it possible to understand, through segmentation, the attributes that define a particular class label in detail. An elbow plot is drawn from the features the user selects and serves as a guide to the ideal number of clusters. The silhouette score and the Calinski-Harabasz score are then computed for six clustering methods, including K-Means, BIRCH, Gaussian Mixture, Agglomerative, and Hierarchical. The best method can then be selected to carry the clustering results forward for analysis. A complete set of visuals is provided to compare the clusters with each other and understand the composition of attributes that define them.
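The elbow and score-comparison steps described above can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: it assumes scikit-learn (the abstract names only Python and Streamlit), uses synthetic data in place of the user's selected features, and covers four of the listed methods. The variable names and the Birch `threshold` value are illustrative choices.

```python
from sklearn.cluster import AgglomerativeClustering, Birch, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the user's selected features (assumption).
X, _ = make_blobs(
    n_samples=300,
    centers=[(-5, -5), (0, 5), (5, -5)],
    cluster_std=1.0,
    random_state=0,
)
X = StandardScaler().fit_transform(X)

# Elbow data: K-Means inertia (within-cluster sum of squares) per candidate k.
# Plotting inertia against k and looking for the bend suggests a cluster count.
inertias = {}
for k in range(1, 9):
    inertias[k] = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_

n_clusters = 3  # in practice, chosen by reading the elbow plot

# Candidate methods (a subset of those named in the abstract).
# Agglomerative clustering is scikit-learn's hierarchical method.
models = {
    "K-Means": KMeans(n_clusters=n_clusters, n_init=10, random_state=0),
    "BIRCH": Birch(threshold=0.2, n_clusters=n_clusters),
    "Gaussian Mixture": GaussianMixture(n_components=n_clusters, random_state=0),
    "Agglomerative": AgglomerativeClustering(n_clusters=n_clusters),
}

# Score every method with both metrics; higher is better for each.
scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    scores[name] = {
        "silhouette": silhouette_score(X, labels),
        "calinski_harabasz": calinski_harabasz_score(X, labels),
    }

for name, result in scores.items():
    print(f"{name}: silhouette={result['silhouette']:.3f}, "
          f"calinski_harabasz={result['calinski_harabasz']:.1f}")
```

Both metrics reward compact, well-separated clusters, so a method that ranks highest on both is a reasonable candidate to carry forward. When the two metrics disagree, the choice is a judgment call.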
Another aspect of clustering, applied to customer transactional data, is covered in the RFM dashboard using the recency, frequency, monetary (RFM) technique. Customers are segmented into three broad categories based on their purchase history. The RFM features are created from the datetime data, transaction amount, and customer ID, and clustering is then performed on them using the same clustering methods applied to the demographic data, as discussed earlier.
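As a rough illustration of how RFM features can be derived from the three inputs mentioned (datetime, transaction amount, customer ID), the sketch below uses only the Python standard library. The sample transactions, the `rfm_features` helper, and the snapshot date are hypothetical. Recency is assumed here to mean days since the customer's most recent purchase, which is the common convention but is not confirmed by the abstract.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount).
transactions = [
    ("C1", date(2022, 1, 5), 120.0),
    ("C1", date(2022, 3, 20), 80.0),
    ("C2", date(2021, 11, 2), 40.0),
    ("C3", date(2022, 3, 28), 300.0),
    ("C3", date(2022, 2, 14), 150.0),
    ("C3", date(2022, 1, 9), 50.0),
]


def rfm_features(rows, snapshot):
    """Aggregate transactions into per-customer recency, frequency, monetary."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchased_on, amount in rows:
        # Track the most recent purchase date for each customer.
        if customer_id not in last_purchase or purchased_on > last_purchase[customer_id]:
            last_purchase[customer_id] = purchased_on
        frequency[customer_id] += 1      # number of purchases
        monetary[customer_id] += amount  # total amount spent
    return {
        cid: {
            "recency_days": (snapshot - last_purchase[cid]).days,
            "frequency": frequency[cid],
            "monetary": monetary[cid],
        }
        for cid in last_purchase
    }


rfm = rfm_features(transactions, snapshot=date(2022, 4, 1))
for customer_id, features in sorted(rfm.items()):
    print(customer_id, features)
```

The resulting three-column table (one row per customer) is the kind of input on which the clustering methods could then be run, typically after scaling, since monetary values and day counts sit on very different ranges.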

Document Type

Restricted Access

Submission Type

Research Project

Available for download on Saturday, October 31, 2026

The full text of this document is only accessible to authorized users.