Degree
Master of Science in Data Science
Department
Department of Computer Science
Faculty/School
School of Mathematics and Computer Science (SMCS)
Date of Submission
Winter 2022
Supervisor
Dr. Tariq Mahmood, Professor, Department of Computer Science, School of Mathematics and Computer Science (SMCS)
Keywords
Big Data, Apache, Metadata Management, Data Ingestion, Pipeline, Databases, Python, Real-Time, API
Abstract
Data analytics is growing at a rapid pace. Organizations generate vast amounts of data daily and need it analyzed to make business decisions. The spread of digitized solutions has led to a rapid increase in diverse datasets, which in turn brings challenges in data ingestion, data storage, and data structuring. The project discussed in this report is a proof of concept for real-time data ingestion with metadata monitoring using Apache Kafka and Lyft’s Amundsen. The code is written in Python and SQL. The project aims to give data scientists, data analysts, and business users information about raw data that is ingested in real time, then stored and formatted in structured form in on-premises databases.
Document Type
Restricted Access
Submission Type
Research Project
Recommended Citation
Owais, R. (2022). Real-Time Data Ingestion with Metadata Management using Apache Kafka and Amundsen by Lyft (Unpublished graduate research project). Institute of Business Administration, Pakistan. Retrieved from https://ir.iba.edu.pk/research-projects-msds/9
The full text of this document is only accessible to authorized users.