Degree
Bachelor of Science (Computer Science)
Department
Department of Computer Science
School
School of Mathematics and Computer Science (SMCS)
Advisor
Ms. Abeera Tariq, Lecturer, Department of Computer Science
Keywords
Data Engineering, Data Pipelines, Containerization, AI Agents, Cloud Integration
Abstract
Organizations today process huge volumes of data every day and need efficient systems to move, clean, and organize that data for analysis. In many cases, however, the data pipelines in place are rigid and overloaded with manual coding and scripting, which makes the process slow and tedious and leaves room for inefficiencies and delays. Our project introduces a Hybrid ETL/ELT Data Processing Pipeline that combines manual control with intelligent recommendations. The system analyzes metadata such as data size, structure, and transformation complexity to recommend the better-suited strategy (ETL or ELT), while users retain full control over the final choice. The platform supports multiple relational database sources and destinations (MySQL, PostgreSQL, Microsoft SQL Server) alongside AWS S3 buckets for data lakes, integrates AI agents that suggest warehouse and data-mart schemas, and lets users build complete data pipelines through an interactive UI, with the option to fine-tune them using SQL queries. It is built with React on the frontend, .NET and Python microservices on the backend, and packaged with Docker for straightforward cloud deployment. This hybrid setup boosts performance, offers greater flexibility, and makes the entire data integration process more transparent and user-friendly.
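The ETL/ELT distinction described in the abstract is essentially a question of where the transform stage runs: before loading (ETL) or inside the destination after a raw load (ELT). A minimal, stdlib-only sketch of the two orderings, with hypothetical stage names and an in-memory stand-in for the destination warehouse (not the project's actual code):

```python
# Illustrative sketch: the same extract/transform/load stages composed
# two ways. Stage names and the in-memory "warehouse" are assumptions.

warehouse: list[dict] = []  # toy stand-in for the destination store

def extract() -> list[dict]:
    # Pretend source rows with untidy values.
    return [{"name": " Ada "}, {"name": "Grace"}]

def transform(rows: list[dict]) -> list[dict]:
    # Simple cleaning step: trim whitespace and normalize case.
    return [{"name": r["name"].strip().upper()} for r in rows]

def load(rows: list[dict]) -> list[dict]:
    # Persist rows into the destination and return them.
    warehouse.extend(rows)
    return rows

def run_etl() -> list[dict]:
    """ETL: transform in the pipeline, then load clean rows."""
    return load(transform(extract()))

def run_elt() -> list[dict]:
    """ELT: load raw rows first, then transform at the destination."""
    return transform(load(extract()))
```

Note the observable difference: after `run_etl()` the warehouse holds cleaned rows, while after `run_elt()` it holds the raw rows, with cleaning deferred to the destination side.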
Tools and Technologies Used
Frontend: React, JavaScript
Backend: .NET, C#, Python, FastAPI
AI/LLM Tools: Groq, LangChain, Llama
Data Processing & Orchestration: Apache Spark, APScheduler
Database Systems: MySQL, PostgreSQL, SQL Server
DevOps/Cloud: Docker, AWS, Azure
Methodology
We followed an Agile methodology with regular sprints to iteratively develop core components such as the data characterization module, the pipeline decision engine, and the AI schema designer. Continuous integration and testing ensured flexibility, robustness, and system adaptability throughout the development cycle.
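The pipeline decision engine mentioned above could, for example, score dataset metadata and emit a non-binding recommendation that the user may override. The sketch below is a hypothetical heuristic; the field names, thresholds, and scoring weights are illustrative assumptions, not the project's actual implementation:

```python
# Hypothetical decision-engine sketch: score metadata (size, structure,
# transformation complexity) and recommend ETL or ELT. All thresholds
# and weights are assumed for illustration only.
from dataclasses import dataclass

@dataclass
class DatasetMetadata:
    size_gb: float              # total volume of source data
    is_structured: bool         # relational/tabular vs semi-structured
    transform_complexity: int   # 1 (simple casts) .. 5 (heavy joins/UDFs)

def recommend_strategy(meta: DatasetMetadata) -> str:
    """Return 'ETL' or 'ELT' as a recommendation; the user decides."""
    score = 0
    # Large volumes favor landing data raw and transforming in the
    # destination warehouse (ELT).
    if meta.size_gb > 100:
        score += 2
    # Semi-structured data is often easier to load first, model later.
    if not meta.is_structured:
        score += 1
    # Heavy transformations favor a dedicated transform stage (ETL).
    if meta.transform_complexity >= 4:
        score -= 2
    return "ELT" if score > 0 else "ETL"
```

In the described system the recommendation is surfaced in the UI alongside the metadata that produced it, keeping the final ETL-vs-ELT choice with the user.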
Document Type
Open Access
Submission Type
BSCS Final Year Project
Recommended Citation
Tariq, H. B., Khatri, S., Ghulam Farooq, H., Lokhandwala, A., & Kumar, K. (2025). ETL/ELT Pipeline – Dataflow. Retrieved from https://ir.iba.edu.pk/fyp-bscs/21