Student Name

Rabiya Owais

Degree

Master of Science in Data Science

Department

Department of Computer Science

Faculty/ School

School of Mathematics and Computer Science (SMCS)

Date of Submission

Fall 2022

Supervisor

Dr. Tariq Mehmood, Professor, Department of Computer Science, School of Mathematics and Computer Science (SMCS)

Abstract

This project discusses the implementation of real-time data ingestion integrated with metadata analytics. With businesses growing rapidly and demanding quick data availability, the project will equip data analysts and data scientists with readily available data for querying, along with information about the ingested data such as data volume, definitions, field types, joins and schemas.

Data analytics is growing at a rapid pace (Spelluru 2023). Organizations generate vast amounts of data daily and need it analyzed to make business decisions. The increase in digitized solutions has led to a rapid increase in diverse datasets, which in turn bring challenges in data ingestion, data storage and data structuring (Databricks 2022). The project discussed in this report is a proof of concept for the implementation of real-time data ingestion with metadata monitoring using Apache Kafka and Lyft’s Amundsen. The languages used are Python and SQL. The project aims to equip data scientists, data analysts and business users with information about raw data that is ingested in real time, then stored and formatted in a structured form in on-premises databases.

Document Type

Restricted Access

Submission Type

Research Project

Available for download on Saturday, October 31, 2026

The full text of this document is only accessible to authorized users.
