Student Name

Syed Hamza Jamil

Degree

Master of Science in Computer Science

Department

Department of Computer Science

School

School of Mathematics and Computer Science (SMCS)

Date of Submission

Fall 2024

Supervisor

Dr. Tariq Mahmood, Professor and Program Coordinator MS(CS) and MS(DS) Programs, School of Mathematics and Computer Science (SMCS)

Keywords

Data Warehousing, Data Lake, Data Transformation, Data Model, Data Extraction, Redshift, Airflow, Business Intelligence, ETL, Data Automation

Abstract

This project aims to develop a scalable data warehouse solution that significantly reduces the cost of extract, transform, and load (ETL) processes by using Apache Airflow both to write complex ETL workflows and to automate them. The solution uses Amazon Redshift as the central data repository, Amazon S3 as the data lake, and Power BI for interactive dashboards and data visualizations. By replacing expensive ETL tools with the open-source Apache Airflow and its orchestration functionality, the project provides a cost-effective, fully automated solution for processing large volumes of data. It is intended for businesses in fields such as telecommunications, retail, and finance, helping them make data-driven decisions while reducing costs.

Document Type

Restricted Access

Submission Type

Research Project

The full text of this document is only accessible to authorized users.
