Degree

Bachelor of Science (Computer Science)

Department

Department of Computer Science

School

School of Mathematics and Computer Science (SMCS)

Advisor

Dr. Solat Jabeen Sheikh, Lecturer, Department of Computer Science

Keywords

Agentic AI, Retrieval-Augmented Generation, Legal Natural Language Processing, LangGraph, Multi-Agent Systems, Hybrid Retrieval

Abstract

Legal research and petition drafting in Pakistan remain manual processes, costing advocates three to five hours per case and locking out litigants without specialist support. Global legal AI tools such as Westlaw and LexisNexis cannot fill this gap, since none is trained on Pakistani case law, citation hierarchy, or the Pakistan Penal Code (PPC) and Code of Criminal Procedure (CrPC). Legal Sahara closes this gap with a jurisdiction-specific, three-agent system: a hybrid RAG engine for grounded case research, an 11-node pipeline that drafts court-ready petitions across five filing types, and a document summarizer that converts dense judgments into structured legal briefs. Unlike single-prompt LLM tools, each agent is built to self-correct: replanning failed retrievals, flagging high-severity offences before drafting, and validating output against quality rubrics before delivery. Tested on 10,482 Supreme Court judgments, the system retrieves relevant precedent with 0.79 Precision@3, drafts petitions scoring 8.2/10 under expert LLM review, and produces case summaries averaging 79.3/100 against human-expert benchmarks. The results show that purpose-built agentic architectures can deliver practically usable first-draft legal research and drafting support for junior advocates, pro se litigants, and rural practitioners, a population that current legal AI tools were never designed to serve.

Tools and Technologies Used

Python, FastAPI, React.js, LangGraph, LangChain, LLaMA 3.3 70B (Groq), LLaMA 3.1 8B (Groq), Gemma 3 27B (Google Gemini), ChromaDB, BAAI/bge-large-en-v1.5, rank_bm25, Reciprocal Rank Fusion (RRF), pypdf, pytesseract, pdf2image, python-docx, Pillow, Docker, Docker Compose v3.9, AWS EC2, Nginx, Git, GitHub

Methodology

Legal Sahara was developed using an agentic AI architecture orchestrated through LangGraph, with three specialized agents each built as a stateful graph with conditional edges, retry logic, and self-correction capabilities. The RAG Agent follows a plan-execute-replan loop in which Gemma 3 routes queries, LLaMA 3.3 generates answers, and Gemma 3 independently verifies outputs through a two-gate faithfulness check. Hallucinated citations are prevented by retrieving case IDs directly from ChromaDB metadata rather than generating them. The Petition Drafter operates as an 11-node LangGraph pipeline that classifies the petition type, checks for missing information, detects PPC severity red flags, retrieves supporting precedents via hybrid search, ranks citations by legal hierarchy, drafts the petition using LLaMA 3.3 70B, and then validates the output using a 5-dimensional LLM-as-judge scoring rubric, triggering an automatic revision if thresholds are not met. The Summarizer Agent processes uploaded documents through an 8-node sequential pipeline covering classification, facts, issues, holding, reasoning, ratio decidendi, synthesis, and evaluation, with OCR auto-detection for scanned PDFs at 300 DPI and exponential backoff retry logic throughout. All agents are exposed via a FastAPI REST backend, containerized with Docker Compose, and deployed on AWS EC2 with persistent ChromaDB and HuggingFace model cache volumes.
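
The hybrid search step can be illustrated with a minimal sketch of Reciprocal Rank Fusion, assuming one ranked list comes from dense embedding search and another from BM25. The function name, the case IDs, and the constant k = 60 (a commonly used default) are illustrative assumptions, not details taken from the project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    with ranks starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists: one from dense retrieval, one from BM25.
dense_hits = ["case_12", "case_07", "case_33"]
bm25_hits = ["case_07", "case_41", "case_12"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

Documents that appear near the top of both lists rise to the top of the fused ranking, which is why rank fusion is a common choice when dense and keyword scores are not directly comparable.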

Document Type

Restricted Access

Submission Type

BSCS Final Year Project
