Master of Science in Data Science
Department of Computer Science
School of Mathematics and Computer Science (SMCS)
Dr. Tariq Mahmood, Professor and Program Coordinator MS (CS) and MS (DS) Programs, Department of Computer Science
In recent years, e-commerce has grown considerably in relevance, especially in the post-COVID world, and this has increased the emphasis on data analytics. However, because of the high volume and variety of the data being generated, traditional storage and analytical methodologies fail to meet the demand. This project presents a prototype data lake for performing analytics. The data lake is developed using Apache Hive, which stores its data on the Hadoop Distributed File System (HDFS). Data from the lake is then extracted, transformed, and loaded (ETL) into a data warehouse, which is in turn connected to multiple analytical tools. The architecture is demonstrated using Docker containers connected via a bridge network.
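The extract, transform, and load flow described in the abstract can be illustrated with a minimal sketch. It is not the project's implementation: an in-memory list stands in for rows read from the Hive data lake, an in-memory SQLite database stands in for the data warehouse, and every name in it (`RAW_ORDERS`, `sales_by_category`, the record fields) is hypothetical and chosen only for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw order records, standing in for rows read from the data lake.
# Values are deliberately messy strings, as raw lake data often is.
RAW_ORDERS = [
    {"order_id": "1", "category": " Books ", "amount": "12.50"},
    {"order_id": "2", "category": "books", "amount": "7.50"},
    {"order_id": "3", "category": "Toys", "amount": "not-a-number"},
    {"order_id": "4", "category": "toys", "amount": "20.00"},
]


def extract(raw):
    """Extract: yield raw records one at a time from the (stand-in) lake."""
    yield from raw


def transform(records):
    """Transform: normalise category names, skip records whose amount
    cannot be parsed, and aggregate totals per category."""
    totals = defaultdict(float)
    for rec in records:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # drop malformed rows instead of failing the whole run
        totals[rec["category"].strip().lower()] += amount
    return sorted(totals.items())


def load(rows, conn):
    """Load: write the aggregated rows into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_category "
        "(category TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO sales_by_category VALUES (?, ?)", rows
    )
    conn.commit()


def run_pipeline(raw, conn):
    """Run extract -> transform -> load, then read the table back,
    as an analytical tool connected to the warehouse would."""
    load(transform(extract(raw)), conn)
    return conn.execute(
        "SELECT category, total FROM sales_by_category ORDER BY category"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for category, total in run_pipeline(RAW_ORDERS, conn):
        print(category, total)
    conn.close()
```

In a deployment like the one the abstract describes, the extract step would presumably query Hive and the load step would write to the actual warehouse, with each service running in its own container on the shared bridge network. Those details are not given here, so the sketch keeps everything in one process.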
Haqqani, S. (2022). Development of Analytics Pipeline for E-commerce (Unpublished graduate research project). Institute of Business Administration, Pakistan. Retrieved from https://ir.iba.edu.pk/research-projects-msds/6
The full text of this document is only accessible to authorized users.