Master of Science in Data Science


Department of Computer Science

Faculty/School

School of Mathematics and Computer Science (SMCS)

Date of Submission

Fall 2022


Dr. Tariq Mahmood, Professor and Program Coordinator MS (CS) and MS (DS) Programs, Department of Computer Science


In recent years, the e-commerce domain has gained considerable relevance, especially in the post-COVID world. This has led to an increased emphasis on the use of data analytics. However, due to the high volume and variety of data being generated, traditional storage and analytical methodologies fail to meet these demands. This project presents a prototype data lake for performing analytics. The data lake is developed using Apache Hive, which is built on the Hadoop Distributed File System (HDFS). Data from the lake is then extracted, transformed, and loaded (ETL) into a data warehouse. The warehouse is then connected to multiple analytical tools. This architecture is demonstrated using Docker containers connected via a bridge network.

Document Type

Restricted Access

Submission Type

Research Project



Available for download on Saturday, October 31, 2026

The full text of this document is only accessible to authorized users.