Master of Science in Data Science


Department of Computer Science

Faculty/School

School of Mathematics and Computer Science (SMCS)

Date of Submission

Fall 2023


Dr. Tariq Mahmood, Professor and Program Coordinator MS (CS) and MS (DS) Programs, Department of Computer Science


In the modern era, the rate at which data is generated is increasing exponentially. Industries are looking for ways not only to store and maintain Big Data but also to process it and extract meaningful results that can aid business decision making. E-commerce is one such industry. The solution presented in this project ingests raw data dumps into an HDFS Hive data lake, which can hold large amounts of data in raw form that can then be used to derive insights for analytical purposes. Docker containers make it possible to run multiple Big Data components on a single machine. They are a lightweight mechanism compared with virtual machines, which require far more resources. Multiple Docker containers communicate with each other through a bridge network. This lightweight design offers advantages such as cost savings, efficiency, faster configuration, maintainability, application isolation, and reduced dependencies.

Document Type

Restricted Access

Submission Type

Research Project



Available for download on Saturday, October 31, 2026

The full text of this document is only accessible to authorized users.