Degree
Bachelor of Science (Computer Science)
Department
Department of Computer Science
School
School of Mathematics and Computer Science (SMCS)
Advisor
Dr. Faisal Iradat, Assistant Professor, School of Mathematics & Computer Science (SMCS)
Co-Advisor
Musab Zuberi
Keywords
Automated penetration testing, Agentic AI, Offensive AI, Multi Agent Systems
Abstract
APEX (Adaptive Pool-based Ephemeral Execution) is an autonomous web application penetration testing platform built as a final year project at the Institute of Business Administration, Karachi, in response to the growing gap between the speed of AI-assisted software delivery and the thoroughness of conventional security auditing. The project combines a theoretical framework with a working implementation. The framework identifies two structural failure modes in LLM-based penetration testing agents: Depth-First Myopia, where agents exhaust their resources on the first discovered vector while ignoring the rest of the attack surface, and Context Pollution, where accumulated logs from unrelated services degrade reasoning quality over time. APEX addresses both through Targeted Context Pools (TCPs), an architecture that decomposes the attack surface into isolated, domain-specific pools, each handled by a short-lived specialist agent spun up on demand, with findings routed between pools to form multi-stage exploit chains. APEX is built on LangGraph and orchestrates specialist agents in parallel through a fan-out/fan-in dispatch model. It integrates over twenty-five open-source security tools, exposes a FastAPI backend with real-time SSE event streaming, and ships with a Next.js web dashboard for engagement management and live run monitoring. After an engagement, a two-pass LLM pipeline generates a structured VAPT report with CVSS scoring, evidence, and remediation guidance, exported as a PDF. On the XBOW Validation Benchmark (104 challenges), APEX achieved a pass rate of 85/104 (81.7%) using Gemini 3 Flash and 65/104 (62.5%) using Gemini 3.1 Flash Lite, an ultra-lightweight model included to test cost-performance trade-offs. It was also evaluated against the OWASP Autonomous Penetration Testing Standard, satisfying 17 of 72 Tier 1 requirements. Current limitations include partial coverage of blind injection vulnerability classes and no network-layer infrastructure testing.
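The Targeted Context Pool idea and the fan-out/fan-in dispatch described in the abstract can be illustrated with a small sketch. It is not APEX's implementation: it uses only Python's standard-library asyncio rather than LangGraph, and every name in it (ContextPool, Finding, run_specialist, dispatch, route) is hypothetical. It shows each short-lived specialist seeing only its own pool, results being merged after a parallel run, and a finding being routed to another pool so that a later stage could build on it.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ContextPool:
    """Isolated, domain-specific context (hypothetical structure)."""
    domain: str
    targets: list[str]
    log: list[str] = field(default_factory=list)  # visible only to this pool's agent


@dataclass
class Finding:
    source_domain: str
    route_to: str  # domain of the pool that should receive this finding
    detail: str


async def run_specialist(pool: ContextPool) -> list[Finding]:
    """Short-lived specialist: reads and writes only its own pool, then exits."""
    await asyncio.sleep(0)  # stand-in for tool calls and LLM reasoning
    pool.log.append(f"scanned {len(pool.targets)} target(s)")
    if pool.domain == "auth":
        # Illustrative finding that another pool could build on.
        return [Finding("auth", "api", "leaked session token")]
    return []


async def dispatch(pools: list[ContextPool]) -> list[Finding]:
    # Fan-out: one ephemeral agent per pool, run concurrently.
    results = await asyncio.gather(*(run_specialist(p) for p in pools))
    # Fan-in: merge the per-pool findings into a single list.
    return [f for batch in results for f in batch]


def route(findings: list[Finding], pools: list[ContextPool]) -> None:
    """Deliver each finding only to the pool it is addressed to."""
    by_domain = {p.domain: p for p in pools}
    for f in findings:
        target = by_domain.get(f.route_to)
        if target is not None:
            target.log.append(f"routed from {f.source_domain}: {f.detail}")


def demo() -> tuple[list[ContextPool], list[Finding]]:
    pools = [ContextPool("auth", ["/login"]), ContextPool("api", ["/api/v1/users"])]
    findings = asyncio.run(dispatch(pools))
    route(findings, pools)
    return pools, findings
```

In this sketch the "api" pool never sees the "auth" pool's scan log; it receives only the single routed finding, which is the kind of isolation the abstract proposes as a remedy for Context Pollution.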
Tools and Technologies Used
Python, FastAPI, Uvicorn, LangChain, LangGraph, langchain-anthropic, langchain-google-genai, MongoDB, PyMongo, langgraph-checkpoint-mongodb, Docker, Docker Compose, Pydantic, HTTPX, BeautifulSoup4, Rich, ReportLab, Paramiko, Pillow, PyYAML, Readchar, MkDocs (mkdocs-material), Git.
Methodology
Modular agent-based design with graph-driven workflows (LangGraph), event-driven architecture (in-process event bus + SSE), background-thread runner with deferred imports, MongoDB checkpointing for resumability, iterative development with containerized integration, and LLMs (LangChain providers) for orchestration and decision-making.
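As a rough illustration of the event-driven part of this methodology, the sketch below pairs a minimal in-process event bus with a helper that formats events as Server-Sent Events frames. It is an assumption-laden example, not the project's code: EventBus, to_sse, and drain are hypothetical names, the event type and payload are made up, and only the standard library is used. The frame layout (an "event:" line, a "data:" line, then a blank line) follows the standard SSE wire format.

```python
import json
import queue


class EventBus:
    """Minimal in-process publish/subscribe bus (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: list[queue.Queue] = []

    def subscribe(self) -> queue.Queue:
        # Each subscriber gets its own queue, so a slow consumer
        # does not block the publisher or other subscribers.
        q: queue.Queue = queue.Queue()
        self._subscribers.append(q)
        return q

    def publish(self, event_type: str, payload: dict) -> None:
        for q in self._subscribers:
            q.put((event_type, payload))


def to_sse(event_type: str, payload: dict) -> str:
    """Format one event as an SSE frame; the blank line ends the frame."""
    return f"event: {event_type}\ndata: {json.dumps(payload)}\n\n"


def drain(q: queue.Queue) -> list[str]:
    """Convert everything currently queued into SSE frames."""
    frames = []
    while not q.empty():
        frames.append(to_sse(*q.get_nowait()))
    return frames
```

A streaming HTTP endpoint could yield such frames to a dashboard as they arrive. queue.Queue is thread-safe, which suits a background-thread runner publishing to a web handler, although the subscriber list in this sketch is not itself guarded by a lock.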
Document Type
Restricted Access
Submission Type
BSCS Final Year Project
Recommended Citation
Haider Kazmi, S. D., Imran, A. W., & Khan, O. A. (2026). APEX: Agentic AI for Autonomous Pentesting. Retrieved from https://ir.iba.edu.pk/fyp-bscs/47
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.