Master of Science in Data Science
Department of Computer Science
School of Mathematics and Computer Science (SMCS)
Dr. Tariq Mahmood, Professor and Program Coordinator, MS (CS) and MS (DS) Programs, Department of Computer Science
In the modern era, the speed at which data is generated is increasing exponentially. Industries are looking for ways not only to sustain and store Big Data but also to process it and extract meaningful results that can aid business decision-making. E-commerce is one such industry. The solution provided in this project ingests a raw data dump into an HDFS/Hive data lake, which can store large amounts of data in its raw form; this data can then be used to derive insights for analytical purposes. Docker containers make it possible to run multiple Big Data components on a single machine. They are a lightweight mechanism compared with virtual machines, which require considerably more resources. The Docker containers communicate with each other through a bridge network. This lightweight design brings several advantages, including cost savings, efficiency, faster configuration, maintainability, application isolation and reduced dependencies.
Hasan, M. U. (2023). Creating a Big Data pipeline for E-commerce data (Unpublished graduate research project). Institute of Business Administration, Pakistan. Retrieved from https://ir.iba.edu.pk/research-projects-msds/2
Available for download on Saturday, October 31, 2026
The full text of this document is only accessible to authorized users.