Degree

Bachelor of Science (Computer Science)

Department

Department of Computer Science

School

School of Mathematics and Computer Science (SMCS)

Advisor

Dr. Faisal Iradat, Assistant Professor, School of Mathematics & Computer Science (SMCS)

Co-Advisor

Musab Zuberi

Keywords

Automated penetration testing, Agentic AI, Offensive AI, Multi-Agent Systems

Abstract

APEX (Adaptive Pool-based Ephemeral Execution) is an autonomous web application penetration testing platform built as a final year project at the Institute of Business Administration, Karachi, in response to the growing gap between the speed of AI-assisted software delivery and the thoroughness of conventional security auditing. The project combines a theoretical framework with a working implementation. The framework identifies two structural failure modes in LLM-based penetration testing agents: Depth-First Myopia, where agents exhaust their resources on the first discovered vector while ignoring the rest of the attack surface, and Context Pollution, where accumulated logs from unrelated services degrade reasoning quality over time. APEX addresses both through Targeted Context Pools (TCPs), an architecture that decomposes the attack surface into isolated, domain-specific pools, each handled by a short-lived specialist agent spun up on demand, with findings routed between pools to form multi-stage exploit chains. APEX is built on LangGraph and orchestrates specialist agents in parallel through a fan-out/fan-in dispatch model. It integrates over twenty-five open-source security tools, exposes a FastAPI backend with real-time SSE event streaming, and ships with a Next.js web dashboard for engagement management and live run monitoring. After an engagement, a two-pass LLM pipeline generates a structured VAPT report with CVSS scoring, evidence, and remediation guidance, exported as a PDF. On the XBOW Validation Benchmark (104 challenges), APEX achieved a pass rate of 85/104 (81.7%) using Gemini 3 Flash and 65/104 (62.5%) using Gemini 3.1 Flash Lite, an ultra-lightweight model included to test cost-performance trade-offs. It was also evaluated against the OWASP Autonomous Penetration Testing Standard, satisfying 17 of 72 Tier 1 requirements. Current limitations include partial coverage of blind injection vulnerability classes and no network-layer infrastructure testing.
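The fan-out/fan-in dispatch over isolated pools described in the abstract can be illustrated with a minimal sketch. It uses plain `asyncio` rather than LangGraph and is not the project's implementation: `ContextPool`, `run_specialist`, `dispatch`, and `run_rounds` are hypothetical names, and the specialist is a stub where a real agent would call security tools and an LLM.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ContextPool:
    """Hypothetical isolated pool: one attack-surface domain with its own log."""
    domain: str
    log: list[str] = field(default_factory=list)


async def run_specialist(pool: ContextPool, hints: list[str]) -> list[str]:
    """Stub for a short-lived specialist agent (assumption: real agents use tools and an LLM)."""
    # The agent only ever writes to its own pool's log, so unrelated
    # services cannot pollute its context.
    pool.log.append(f"received {len(hints)} hint(s)")
    await asyncio.sleep(0)  # placeholder for real I/O
    finding = f"{pool.domain}: finding"
    pool.log.append(finding)
    return [finding]


async def dispatch(domains: list[str], hints: list[str]) -> tuple[list[ContextPool], list[str]]:
    # Fresh pools per round: specialists are spun up on demand and discarded.
    pools = [ContextPool(d) for d in domains]
    # Fan-out: every pool's specialist runs concurrently.
    results = await asyncio.gather(*(run_specialist(p, hints) for p in pools))
    # Fan-in: merge per-pool findings into one list for routing.
    findings = [f for pool_findings in results for f in pool_findings]
    return pools, findings


def run_rounds(domains: list[str], rounds: int = 2) -> tuple[list[ContextPool], list[str]]:
    """Route each round's merged findings into the next round as hints."""
    pools: list[ContextPool] = []
    hints: list[str] = []
    for _ in range(rounds):
        pools, hints = asyncio.run(dispatch(domains, hints))
    return pools, hints


if __name__ == "__main__":
    final_pools, final_findings = run_rounds(["auth", "sqli", "upload"])
    print(final_findings)
```

In this sketch, giving every domain an equal slot in each round stands in for the breadth that counters Depth-First Myopia, and routing only distilled findings (not raw logs) between rounds stands in for the cross-pool chaining the abstract mentions.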

Tools and Technologies Used

Python, FastAPI, Uvicorn, LangChain, LangGraph, langchain-anthropic, langchain-google-genai, MongoDB, PyMongo, langgraph-checkpoint-mongodb, Docker, Docker Compose, Pydantic, HTTPX, BeautifulSoup4, Rich, ReportLab, Paramiko, Pillow, PyYAML, Readchar, MkDocs (mkdocs-material), Git.

Methodology

Modular agent-based design with graph-driven workflows (LangGraph), event-driven architecture (in-process event bus + SSE), background-thread runner with deferred imports, MongoDB checkpointing for resumability, iterative development with containerized integration, and LLMs (LangChain providers) for orchestration and decision-making.
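As a rough illustration of the in-process event bus feeding an SSE stream, the following standard-library sketch shows one possible shape. The class `EventBus`, the helpers `format_sse` and `stream`, and the event names `agent.started` and `run.finished` are assumptions made for this example, not names taken from the project; only the `event:`/`data:` framing with a terminating blank line follows the standard Server-Sent Events wire format.

```python
import json
import queue
from typing import Any, Iterator


class EventBus:
    """Hypothetical in-process bus: each subscriber gets its own thread-safe queue."""

    def __init__(self) -> None:
        self._subscribers: list[queue.Queue] = []

    def subscribe(self) -> queue.Queue:
        q: queue.Queue = queue.Queue()
        self._subscribers.append(q)
        return q

    def publish(self, event_type: str, data: dict[str, Any]) -> None:
        # A background runner thread could call this; queue.Queue is thread-safe.
        for q in self._subscribers:
            q.put((event_type, data))


def format_sse(event_type: str, data: dict[str, Any]) -> str:
    """Encode one event as an SSE frame: 'event:' and 'data:' fields, then a blank line."""
    return f"event: {event_type}\ndata: {json.dumps(data)}\n\n"


def stream(q: queue.Queue, stop_event: str = "run.finished") -> Iterator[str]:
    """Yield SSE frames until the (assumed) terminal event arrives."""
    while True:
        event_type, data = q.get()
        yield format_sse(event_type, data)
        if event_type == stop_event:
            break


if __name__ == "__main__":
    bus = EventBus()
    subscriber = bus.subscribe()
    bus.publish("agent.started", {"pool": "auth"})
    bus.publish("run.finished", {"findings": 1})
    for frame in stream(subscriber):
        print(frame, end="")
```

A generator like `stream` could be handed to a streaming HTTP response with the `text/event-stream` media type; how the project wires this into FastAPI is not stated in the record.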

Document Type

Restricted Access

Submission Type

BSCS Final Year Project

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.
