Degree
Master of Science in Data Science
Department
Department of Computer Science
Faculty/ School
School of Mathematics and Computer Science (SMCS)
Date of Submission
Fall 2022
Supervisor
Dr. Tariq Mehmood, Professor, Department of Computer Science, School of Mathematics and Computer Science (SMCS)
Abstract
The project discusses the implementation of real-time data ingestion integrated with metadata analytics. As businesses grow rapidly and demand quick access to data, this project equips data analysts and data scientists with data that is ready to query. It also provides information about the ingested data, such as data volume, definitions, field types, joins and schemas.
Data analytics is growing at a rapid pace (Spelluru 2023). Organizations generate vast amounts of data daily and need it analyzed to make business decisions. The spread of digitized solutions has produced a rapid increase in diverse datasets, which in turn creates challenges in data ingestion, data storage and data structuring (Databricks 2022). The project discussed in this report is a proof of concept for real-time data ingestion with metadata monitoring using Apache Kafka and Lyft's Amundsen. The code is written in Python and SQL. The project aims to give data scientists, data analysts and business users information about raw data that is ingested in real time and then stored in structured form in on-premises databases.
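The abstract does not say how the metadata (data volume, field types, schemas) is derived from the ingested data. What follows is a minimal sketch of one possible approach. It assumes that each message arrives as a JSON object string, for example the value read from a Kafka topic by a consumer. The function name `profile_messages` and the Python-to-SQL type mapping are hypothetical. The sketch does not call the Kafka or Amundsen APIs, and it does not show how the results would be published to Amundsen.

```python
import json
from collections import Counter, defaultdict

# Hypothetical mapping from Python types (as produced by json.loads) to SQL column types.
SQL_TYPES = {"bool": "BOOLEAN", "int": "BIGINT", "float": "DOUBLE",
             "str": "VARCHAR", "dict": "JSON", "list": "JSON"}


def profile_messages(raw_messages):
    """Collect simple metadata from a batch of JSON-encoded messages.

    Assumes each message is a JSON object. Returns the record count (data
    volume), the most frequent SQL type seen for each field (schema), and
    how many times each field was null.
    """
    record_count = 0
    type_counts = defaultdict(Counter)  # field -> Counter of observed type names
    null_counts = Counter()
    for raw in raw_messages:
        record = json.loads(raw)
        record_count += 1
        for field, value in record.items():
            observed = type_counts[field]  # registers the field even if it is only ever null
            if value is None:
                null_counts[field] += 1
            else:
                observed[type(value).__name__] += 1
    # Fields that were only ever null get the placeholder type "UNKNOWN".
    schema = {
        field: SQL_TYPES.get(counts.most_common(1)[0][0], "VARCHAR") if counts else "UNKNOWN"
        for field, counts in type_counts.items()
    }
    return {"record_count": record_count, "schema": schema,
            "null_counts": dict(null_counts)}


if __name__ == "__main__":
    messages = [
        '{"order_id": 1, "amount": 9.5, "customer": "a"}',
        '{"order_id": 2, "amount": 12.0, "customer": null}',
    ]
    print(profile_messages(messages))
```

For the two sample messages, the sketch reports a record count of 2 and infers `order_id` as `BIGINT`, `amount` as `DOUBLE` and `customer` as `VARCHAR`, with one null `customer` value. It takes the most frequent observed type for each field so that an occasional malformed value does not change the inferred schema. A production pipeline would more likely rely on a schema registry or on explicit table definitions.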
Document Type
Restricted Access
Submission Type
Research Project
Recommended Citation
Owais, R. (2022). Real-Time data ingestion with Metadata Management using Apache Kafka and Amundsen by Lyft (Unpublished graduate research project). Institute of Business Administration, Pakistan. Retrieved from https://ir.iba.edu.pk/research-projects-msds/15
The full text of this document is only accessible to authorized users.