Degree

Bachelor of Science (Computer Science)

Department

Department of Computer Science

School

School of Mathematics and Computer Science (SMCS)

Advisor

Ms. Abeera Tariq, Lecturer and Academic Director, Department of Computer Science IBA Karachi

Co-Advisor

Muhammad Naeem (Industry Advisor)

Keywords

Natural Language to SQL, Agentic AI, Conversational BI

Abstract

IntelliQuery is an AI-powered natural language to SQL agent built on a multi-node LangGraph workflow that enables non-technical business users to query a PostgreSQL (Northwind) database hosted on AWS RDS simply by typing plain English questions. The system contributes a fully automated pipeline: Schema Inspector, SQL Planner, Query Validator, SQL Executor, and Output Formatter, where each node is intelligently connected by conditional routing edges that handle retries, error recovery, and security enforcement; the Planner node leverages Google Gemini LLM combined with ChromaDB semantic vector search over 23 curated few-shot examples to generate contextually accurate SQL, while the Validator enforces security (blocking injections and destructive statements) and business-logic correctness before any query reaches the database, ultimately democratizing data access by letting users get formatted, business-ready results from complex relational data without writing a single line of SQL.

Tools and Technologies Used

Python, LangChain, LangGraph, LangChain-Google-GenAI, Google Gemini (LLM), LangChain-Groq, Groq, Ollama, FastAPI, Uvicorn, Pydantic, PostgreSQL, AWS RDS, psycopg2, ChromaDB, FastEmbed, Redis, PyJWT, MCP (Model Context Protocol), LangChain-MCP-Adapters, python-dotenv, asyncssh, DuckDuckGo Search, pypdf, docx2txt, React (frontend), Vite, Node.js

Methodology

The project follows an Agentic AI development methodology built around a phased, iterative approach. The architecture is designed as a multi-node LangGraph directed graph, where each node (Schema Inspector, Planner, Validator, Executor, Output Formatter) is developed, tested, and validated independently before being wired into the full workflow, a modular-first strategy that allows unit-level testing at the tool, node, flow, and business-logic layers progressively. The development is structured across 7 planned stages, starting with a working NLP-to-SQL baseline (Stage 1, current) and incrementally adding self-correction, hierarchical memory, multi-tool ecosystems, context optimization, and finally PowerBI visualization integration, ensuring each stage delivers a functional system before complexity is added. The SQL generation uses a Retrieval-Augmented Generation (RAG) pattern, where ChromaDB with FastEmbed performs semantic similarity search over 23 curated few-shot examples to ground the Gemini LLM's output in domain-specific query patterns. A multi-layer validation strategy (security, syntax, business logic, database EXPLAIN) acts as a quality gate, and conditional edge routing enables automatic retry loops back to the Planner on failures, making the system selfcorrecting without human intervention. The overall design philosophy is domain-aware AI, where the agent is pre-loaded with hand-crafted business context, table relationship mappings, and JOIN patterns specific to the Northwind dataset, reducing hallucination and improving SQL accuracy over a generic LLM approach.

Document Type

Restricted Access

Submission Type

BSCS Final Year Project

Creative Commons License

Creative Commons Attribution-NonCommercial 4.0 International License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License

Share

COinS