Master of Science in Data Science
Department of Computer Science
School of Mathematics and Computer Science (SMCS)
Date of Submission
Dr. Tariq Mahmood, Professor and Program Coordinator MS (CS) and MS (DS) Programs, Department of Computer Science
This application aims to identify the attributes of customer segments that tend to behave in a particular way: for example, which attributes define churners for a business, which define potential buyers of a newly launched product, and which define customers likely to be influenced by a given marketing campaign. Every business needs to answer these questions over its product life cycle so that it can avoid losing customers, stay competitive by targeting the right customer market, and successfully penetrate new markets. The tool enriches a business's analysis and helps it understand its customer base in depth, which supports setting strategies to attract new customers, retain existing customers, offer new and personalized products, and target the right channels for content delivery. It is a machine learning based segmentation solution, developed using Python and Streamlit, that performs automated clustering and cluster analysis on a provided dataset. The user selects a target variable, and the application is calibrated so that this target variable must be discrete; keeping it discrete makes it possible to examine in detail, through segmentation, the attributes that define a particular class label. An elbow plot is drawn from the features the user selects and serves as a guide to the ideal number of clusters. The silhouette score and the Calinski-Harabasz score are then computed for six clustering methods, including K-Means, Birch, Gaussian Mixture, Agglomerative, and Hierarchical. The best method can then be selected to carry the clustering results forward for analysis. A complete set of visuals is provided to compare the clusters with each other and to understand the composition of attributes that defines them.
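The elbow and scoring steps described above can be illustrated with a minimal sketch. It is not the project's code: the use of scikit-learn, the synthetic `make_blobs` data, the choice of three clusters, and all variable names are assumptions made for illustration, and only four of the named methods are shown.

```python
from sklearn.cluster import AgglomerativeClustering, Birch, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the features a user would select (assumption).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)
X = StandardScaler().fit_transform(X)

# Elbow data: K-Means inertia for each candidate number of clusters.
# Plotting inertia against k gives the elbow plot.
inertias = {
    k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    for k in range(1, 8)
}

# Number of clusters a user might read off the elbow plot (assumption).
n_clusters = 3

models = {
    "K-Means": KMeans(n_clusters=n_clusters, n_init=10, random_state=0),
    "Birch": Birch(n_clusters=n_clusters),
    "Gaussian Mixture": GaussianMixture(n_components=n_clusters, random_state=0),
    "Agglomerative": AgglomerativeClustering(n_clusters=n_clusters),
}

# Score every method with both metrics; higher is better for each.
scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    scores[name] = {
        "silhouette": silhouette_score(X, labels),
        "calinski_harabasz": calinski_harabasz_score(X, labels),
    }

for name, result in scores.items():
    print(f"{name}: silhouette={result['silhouette']:.3f}, "
          f"calinski_harabasz={result['calinski_harabasz']:.1f}")
```

Both metrics reward compact, well-separated clusters, so a method that ranks highest on both is a reasonable candidate to carry forward; when the two disagree, the choice is a judgment call.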
Another aspect of clustering, applicable to customer transactional data, is covered by the recency, frequency, monetary (RFM) technique in the RFM dashboard. Customers are segmented into three broad categories based on their purchase history. The RFM features are created from the datetime data, transaction amount, and customer ID, and clustering is then performed on these features using the same clustering methods applied to the demographic data discussed earlier.
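As a rough illustration of how RFM features can be derived from the three inputs named above, the following sketch uses only the Python standard library. The sample transactions, the snapshot date, and the function name `rfm_features` are hypothetical and do not come from the project.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount).
transactions = [
    ("C1", date(2022, 1, 5), 120.0),
    ("C1", date(2022, 3, 1), 80.0),
    ("C2", date(2021, 11, 20), 40.0),
    ("C3", date(2022, 2, 14), 300.0),
    ("C3", date(2022, 2, 28), 150.0),
    ("C3", date(2022, 3, 10), 50.0),
]


def rfm_features(rows, snapshot):
    """Return recency (days), frequency (count), and monetary (sum) per customer."""
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchased_on, amount in rows:
        previous = last_purchase.get(customer_id)
        if previous is None or purchased_on > previous:
            last_purchase[customer_id] = purchased_on
        frequency[customer_id] += 1
        monetary[customer_id] += amount
    return {
        customer_id: {
            "recency_days": (snapshot - last_purchase[customer_id]).days,
            "frequency": frequency[customer_id],
            "monetary": monetary[customer_id],
        }
        for customer_id in last_purchase
    }


# The snapshot date is the reference point for recency (assumption).
rfm = rfm_features(transactions, snapshot=date(2022, 3, 31))
for customer_id, features in sorted(rfm.items()):
    print(customer_id, features)
```

The resulting three-column table, one row per customer, is the kind of input on which a clustering method could then be run, typically after scaling so that the monetary column does not dominate the distance computation.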
Rehan, A. (2022). Persona development using Cluster Analysis in Python (Unpublished graduate research project). Institute of Business Administration, Pakistan. Retrieved from https://ir.iba.edu.pk/research-projects-msds/5
Available for download on Saturday, October 31, 2026
The full text of this document is only accessible to authorized users.