Degree

Bachelor of Science (Computer Science)

Department

Department of Computer Science

School

School of Mathematics and Computer Science (SMCS)

Advisor

Ms. Abeera Tariq, Lecturer, Department of Computer Science

Keywords

Data Engineering, Data Pipelines, Containerization, AI Agents, Cloud Integration

Abstract

Today’s organizations deal with huge amounts of data every day and need efficient systems to move, clean, and organize that data for analysis. In many cases, however, the data pipelines in place are rigid and overloaded with manual coding and scripting, which makes the process slow and tedious and leaves room for inefficiencies and delays. Our project introduces a Hybrid ETL/ELT Data Processing Pipeline that combines manual control with intelligent recommendations. The system analyzes metadata such as data size, structure, and transformation complexity to recommend the better strategy (ETL or ELT), while users retain full control over the final choice. Our platform supports multiple relational database sources and destinations (MySQL, PostgreSQL, MS SQL Server) alongside AWS S3 buckets for data lakes, integrates AI agents to suggest warehouse/mart schemas, and enables users to build complete data pipelines through an interactive UI, with the option to fine-tune them using SQL queries. It is built with React on the frontend, .NET and Python microservices on the backend, and packaged with Docker for easy cloud deployment. This hybrid setup boosts performance, offers greater flexibility, and makes the entire data integration process more transparent and user-friendly.
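To make the recommendation step concrete, the following minimal Python sketch shows how such a rule could look. The function name, metadata fields, and thresholds are illustrative assumptions rather than the project's actual decision engine, and the user would still be free to override the suggestion.

```python
# Hypothetical sketch of the recommendation step described above: given pipeline
# metadata, suggest ETL or ELT. Fields and thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class PipelineMetadata:
    data_size_gb: float              # estimated volume to move per run
    is_structured: bool              # relational/tabular source vs. semi-structured
    transform_complexity: int        # 1 (simple casts) .. 5 (heavy joins/aggregations)
    target_supports_pushdown: bool   # can the destination run transformations itself?

def recommend_strategy(meta: PipelineMetadata) -> str:
    """Return 'ELT' or 'ETL'; the user retains the final choice."""
    # Large volumes with a capable destination favour loading first and
    # transforming inside the warehouse (ELT).
    if meta.target_supports_pushdown and meta.data_size_gb > 50:
        return "ELT"
    # Complex transformations on smaller, structured data are easier to
    # apply in-flight before loading (ETL).
    if meta.is_structured and meta.transform_complexity >= 3:
        return "ETL"
    return "ELT" if meta.target_supports_pushdown else "ETL"

print(recommend_strategy(PipelineMetadata(120.0, True, 2, True)))  # -> ELT
```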

Tools and Technologies Used

Frontend: React, JavaScript

Backend: .NET, C#, Python, FastAPI

AI/LLM Tools: Groq, LangChain, Llama (see the sketch after this list)

Data Processing & Orchestration: Apache Spark, APScheduler

Database Systems: MySQL, PostgreSQL, SQL Server

DevOps/Cloud: Docker, AWS, Azure
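As an illustration of how the AI/LLM tools listed above could be combined to suggest warehouse/mart schemas, here is a minimal, hypothetical sketch using LangChain with a Groq-hosted Llama model. The model name, prompt, and helper function are assumptions rather than the project's actual agent, and a GROQ_API_KEY environment variable is assumed to be set.

```python
# Hypothetical sketch: ask an LLM (LangChain + Groq-hosted Llama) to propose a
# star schema for a warehouse, given source-table metadata. Model name and
# prompt wording are assumptions, not the project's actual agent.
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate

llm = ChatGroq(model="llama-3.1-8b-instant", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a data-warehouse architect. Propose fact and dimension tables."),
    ("human", "Source tables and columns:\n{metadata}\nReturn a star schema as SQL DDL."),
])

chain = prompt | llm

def suggest_schema(metadata: str) -> str:
    """Return the model's proposed warehouse schema for the given source metadata."""
    return chain.invoke({"metadata": metadata}).content

# Example (placeholder metadata):
# print(suggest_schema("orders(order_id, customer_id, amount, created_at); customers(customer_id, name, city)"))
```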

Methodology

We followed an Agile methodology with regular sprints to iteratively develop core components such as the data characterization module, the pipeline decision engine, and the AI schema designer. Continuous integration and testing ensured flexibility, robustness, and adaptability throughout the development cycle.
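As a small illustration of what the data characterization module might collect, the sketch below profiles a single source table (row count, columns, and types) with SQLAlchemy. The connection string, table name, and function are placeholders, not the project's implementation.

```python
# Hypothetical sketch of a data characterization step: profile a source table's
# size and structure so the decision engine has metadata to reason about.
# Connection URL and table name are placeholders, not project configuration.
from sqlalchemy import create_engine, inspect, text

def characterize_table(conn_url: str, table: str) -> dict:
    """Collect basic metadata (row count, column names/types) for one table."""
    engine = create_engine(conn_url)
    inspector = inspect(engine)
    columns = inspector.get_columns(table)
    with engine.connect() as conn:
        row_count = conn.execute(text(f"SELECT COUNT(*) FROM {table}")).scalar()
    return {
        "table": table,
        "row_count": row_count,
        "column_count": len(columns),
        "columns": {c["name"]: str(c["type"]) for c in columns},
    }

# Example (placeholder credentials):
# characterize_table("postgresql://user:pass@localhost:5432/sales", "orders")
```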

Document Type

Open Access

Submission Type

BSCS Final Year Project
