Degree
Master of Science in Data Science
Department
Department of Computer Science
Faculty/School
School of Mathematics and Computer Science (SMCS)
Date of Submission
Fall 2023
Supervisor
Dr. Tariq Mahmood, Professor and Program Coordinator MS (CS) and MS (DS) Programs, Department of Computer Science
Abstract
In the modern era, the speed at which data is generated is increasing exponentially. Industries are looking for ways not only to store and sustain Big Data but also to process it and extract meaningful results that can aid business decision making. E-commerce is one such industry. The solution provided in this project ingests a raw data dump into an HDFS Hive Data Lake, which can hold large amounts of data in raw form that can then be used to derive insights for analytical purposes. Docker containers allow multiple Big Data components to run on a single machine. They are a lightweight mechanism in contrast to virtual machines, which require far more resources. Multiple Docker containers communicate with each other through a bridge network. This lightweight design offers advantages such as cost savings, efficiency, faster configuration, maintainability, application isolation, and reduced dependencies.
Document Type
Restricted Access
Submission Type
Research Project
Recommended Citation
Hasan, M. U. (2023). Creating a Big Data pipeline for E-commerce data (Unpublished graduate research project). Institute of Business Administration, Pakistan. Retrieved from https://ir.iba.edu.pk/research-projects-msds/2