Degree
Master of Science in Computer Science
Department
Department of Computer Science
School
School of Mathematics and Computer Science (SMCS)
Date of Submission
Fall 2024
Supervisor
Dr. Tariq Mahmood, Professor and Program Coordinator MS(CS) and MS(DS) Programs, School of Mathematics and Computer Science (SMCS)
Keywords
Data Warehousing, Data Lake, Data Transformation, Data Model, Data Extraction, Redshift, Airflow, Business Intelligence, ETL, Data Automation
Abstract
This project aims to develop a scalable data warehouse solution that significantly reduces the cost of ETL (extract, transform, load) processes by using Apache Airflow both to write complex ETL workflows and to automate them. The solution integrates Amazon Redshift as the central data repository, Amazon S3 as a data lake, and Power BI for interactive dashboards and data visualizations. By replacing expensive ETL tools with the open-source Apache Airflow for ETL orchestration, the project provides a cost-effective, fully automated solution for processing large volumes of data. It is designed for businesses in fields such as telecommunications, retail and finance, helping them make data-driven decisions while reducing costs.
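Apache Airflow represents a pipeline as a directed acyclic graph (DAG) of tasks that run in dependency order. The following is a minimal, standard-library-only Python sketch of that extract, transform, load flow. It is an illustration under stated assumptions, not the project's code and not Airflow's API. An in-memory dict of CSV strings stands in for S3 objects, an in-memory SQLite database stands in for Redshift, and `graphlib.TopologicalSorter` plays the role of the scheduler. The names `RAW_LAKE`, `sales_by_region` and `run_pipeline` are hypothetical.

```python
import csv
import io
import sqlite3
from graphlib import TopologicalSorter

# Hypothetical raw files standing in for objects in an S3 data lake (assumption).
RAW_LAKE = {
    "sales/2024-10.csv": "region,amount\nnorth,120.50\nsouth,80.00\nnorth,30.25\n",
    "sales/2024-11.csv": "region,amount\nsouth,45.75\neast,200.00\n",
}


def extract(ctx):
    """Read every raw CSV object into a list of dict rows."""
    rows = []
    for key in sorted(RAW_LAKE):
        rows.extend(csv.DictReader(io.StringIO(RAW_LAKE[key])))
    ctx["raw_rows"] = rows


def transform(ctx):
    """Cast amounts to float and aggregate total sales per region."""
    totals = {}
    for row in ctx["raw_rows"]:
        totals[row["region"]] = totals.get(row["region"], 0.0) + float(row["amount"])
    ctx["region_totals"] = sorted(totals.items())


def load(ctx):
    """Write aggregates to a warehouse table (SQLite stands in for Redshift here)."""
    conn = ctx["warehouse"]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region (region TEXT PRIMARY KEY, total REAL)"
    )
    # INSERT OR REPLACE keeps reruns idempotent for the same regions.
    conn.executemany(
        "INSERT OR REPLACE INTO sales_by_region VALUES (?, ?)", ctx["region_totals"]
    )
    conn.commit()


TASKS = {"extract": extract, "transform": transform, "load": load}
# Each task maps to the set of upstream tasks it depends on.
DEPENDENCIES = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}


def run_pipeline(conn):
    """Run tasks in topological (dependency) order and return that order."""
    ctx = {"warehouse": conn}
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for name in order:
        TASKS[name](ctx)
    return order


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    print(run_pipeline(warehouse))
    print(warehouse.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"
    ).fetchall())
```

In an actual Airflow deployment, each function would more likely become its own task (for example, a `PythonOperator`). The load into Redshift would typically use Redshift's `COPY` command from S3 rather than row-by-row inserts.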
Document Type
Restricted Access
Submission Type
Research Project
Recommended Citation
Jamil, Syed Hamza. "Scalable Data Warehouse Development using AWS Redshift, Airflow, S3, and Power BI." Unpublished graduate research project. Institute of Business Administration. 2024. https://ir.iba.edu.pk/research-projects-mscs/60
The full text of this document is only accessible to authorized users.