STIX and TAXII pipeline integration in Wazuh

Degree

Bachelor of Science (Computer Science)

Department

Department of Computer Science

School

School of Mathematics and Computer Science (SMCS)

Advisor

Dr. Faisal Iradat, Assistant Professor, School of Mathematics & Computer Science (SMCS)

Co-Advisor

Faisal Iradat

Keywords

Cyber Security, Information Security, Product, Open source contribution, Deployable

Abstract

Security teams running open-source SIEM platforms are stuck in a familiar bind: too many uncontextualized alerts to triage by hand, no fast way to check indicators of compromise against external threat intelligence, and detection logic that only reacts after an attack signature is already known. Wazuh-TI addresses this by extending the open-source Wazuh 4.x SIEM with automated STIX 2.1/TAXII 2.1 threat intelligence feeds, two-way MITRE ATT&CK correlation, a Random Forest risk-scoring model, and a Gemini-powered analyst assistant for natural-language investigation, all wrapped in a modular, containerised, API-first architecture. A seven-stage pipeline (Ingest, Parse, Normalise, ML Score, Store, CDB Sync, Enrich) turns raw alerts into ready-to-act triage packages, each carrying a 0–100 risk score, a four-tier priority label, a suggested response, a predicted next attack stage, and supporting TI evidence. On 1,151 alerts from six live threat feeds, the classifier reached 91% accuracy, 0.94 AUC-ROC, and 89% precision, correctly flagging 47% as Critical. Deployment testing suggests roughly an 85% cut in mean-time-to-respond versus manual triage. To our knowledge, this is the first open-source platform to combine all five capabilities in one system.
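As a rough illustration of the scoring output described in the abstract, the sketch below scales a classifier probability to a 0–100 risk score and maps it to a four-tier priority label. Only the "Critical" tier is named in the abstract; the other tier names, the cut-off values, and the `TriagePackage` fields are assumptions made for this example, not the project's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical tier floors: the abstract names only the "Critical" tier,
# so the remaining labels and every threshold here are assumptions.
TIERS = [(80, "Critical"), (60, "High"), (30, "Medium"), (0, "Low")]


def risk_score(probability: float) -> int:
    """Scale a classifier probability in [0, 1] to an integer 0-100 score."""
    clamped = min(max(probability, 0.0), 1.0)
    return round(clamped * 100)


def priority_label(score: int) -> str:
    """Return the first tier whose floor the score meets or exceeds."""
    for floor, label in TIERS:
        if score >= floor:
            return label
    return "Low"


@dataclass
class TriagePackage:
    """Illustrative subset of the fields a triage package might carry."""
    alert_id: str
    score: int
    priority: str


def build_package(alert_id: str, probability: float) -> TriagePackage:
    """Combine the score and its priority label into one record."""
    score = risk_score(probability)
    return TriagePackage(alert_id, score, priority_label(score))
```

In this sketch, a model probability of 0.93 would yield a score of 93 and the "Critical" label; a real package would also carry the suggested response, predicted next attack stage, and TI evidence mentioned above.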

Tools and Technologies Used

Python, JavaScript, FastAPI, SQLAlchemy, React, Vite, Tailwind CSS, Recharts, Scikit-learn, Google Gemini AI, Wazuh SIEM, Docker, Docker Compose, SQLite, STIX 2.1, TAXII 2.1, AlienVault OTX, AbuseIPDB, ThreatFox

Methodology

The platform was built using an iterative Agile process paired with a modular architecture, which made it easier to integrate and test complex components as they came together. Development started with a FastAPI-based microservices foundation for fast, asynchronous data handling, with Docker used to containerize everything and keep development and deployment environments consistent. From there, the team built the intelligence ingestion engine, which queries TAXII servers and external REST APIs, then parses, standardizes, deduplicates, and stores the resulting threat data using the STIX format and SQLAlchemy for persistence. Next came the analytical layer: a Random Forest classifier was trained to generate probabilistic risk scores based on host criticality, and the Google Gemini SDK was integrated to give the platform a conversational AI analyst capable of natural-language forensic investigation. Finally, the team built the frontend using React and Tailwind CSS, with usability as a priority throughout. The complete system was then tested using simulated attack scenarios and mock agents to confirm the machine learning model's prediction accuracy and the stability of the real-time data pipeline.
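The deduplication step of the ingestion engine, and the later hand-off to Wazuh CDB lists, might look something like the following minimal sketch. It assumes STIX 2.1 indicator objects have already been fetched and parsed into dictionaries, that duplicates are identified by an identical `pattern` string, and that `modified` timestamps share one ISO 8601 format so they compare correctly as strings. The function names and the `malicious` label are hypothetical.

```python
import re

# Matches a simple STIX pattern such as [ipv4-addr:value = '203.0.113.7'].
IPV4_PATTERN = re.compile(r"\[ipv4-addr:value\s*=\s*'([^']+)'\]")


def dedupe_indicators(objects):
    """Keep one indicator per pattern, preferring the latest 'modified' value.

    Assumption: identical pattern strings mean duplicate indicators.
    """
    latest = {}
    for obj in objects:
        if obj.get("type") != "indicator":
            continue  # skip non-indicator STIX objects (malware, reports, ...)
        key = obj["pattern"]
        kept = latest.get(key)
        if kept is None or obj["modified"] > kept["modified"]:
            latest[key] = obj
    return list(latest.values())


def to_cdb_lines(indicators, label="malicious"):
    """Render IPv4 indicators as key:value lines, the CDB list entry format."""
    lines = []
    for indicator in indicators:
        match = IPV4_PATTERN.fullmatch(indicator["pattern"])
        if match:
            lines.append(f"{match.group(1)}:{label}")
    return sorted(lines)
```

Only single-comparison IPv4 patterns are handled here; compound patterns, other observable types, and normalisation of feed-specific fields would need additional handling in a real pipeline.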

Document Type

Restricted Access

Submission Type

BSCS Final Year Project

Creative Commons License

Creative Commons Attribution 4.0 International License

This document is currently not available here.
