Degree

Master of Science in Data Science

Department

Department of Computer Science

Faculty/School

School of Mathematics and Computer Science (SMCS)

Date of Submission

Fall 2023

Supervisor

Dr. Tariq Mahmood, Professor and Program Coordinator MS (CS) and MS (DS) Programs, Department of Computer Science

Abstract

In the modern era, the speed at which data is generated is increasing exponentially. Industries are looking for ways not only to store and sustain Big Data but also to process it and extract meaningful results that can aid business decision making. E-commerce is one such industry. The solution provided in this project ingests raw data dumps into an HDFS Hive data lake, which can hold large amounts of data in raw form that can then be used to derive insights for analytical purposes. Docker containers allow multiple Big Data components to run on a single machine. They are a lightweight mechanism in contrast to virtual machines, which require far more resources. Multiple Docker containers communicate with each other through a bridge network. This lightweight design offers advantages such as cost savings, efficiency, faster configuration, maintainability, application isolation, and fewer dependencies.

Document Type

Restricted Access

Submission Type

Research Project

Available for download on Saturday, October 31, 2026

The full text of this document is only accessible to authorized users.
