Degree

Bachelor of Science (Computer Science)

Department

Department of Computer Science

School

School of Mathematics and Computer Science (SMCS)

Advisor

Ms. Samreen Kazi, Lecturer, Department of Computer Science

Keywords

Financial, News, Sentiment, Analyzer, Market, Fintech, AI, NLP.

Abstract

Financial markets generate thousands of news articles every day, far more than any analyst can read, and the language of finance is specialized enough that generic sentiment tools misread it. This dissertation presents the Financial News & Report Sentiment Analyzer, a complete, locally deployable system that ingests financial news from multiple sources, scores it with a finance-specific transformer (FinBERT), links each article to the companies it mentions, and exposes the resulting signal through a REST API, an interactive dashboard, and a retrieval-grounded chat assistant. The system fuses seven independent data sources (Google News, Yahoo Finance, Alpha Vantage, GDELT, a HuggingFace historical corpus, yfinance prices, and the Adanos social sentiment API) into a single PostgreSQL schema holding a corpus of over 354,000 fully scored articles. It computes statistical relationships between sentiment and price (Pearson, Spearman, lagged, and rolling correlations), supports an active model training loop, and ships as a one-command Docker Compose stack. On the Financial PhraseBank, FinBERT outperforms VADER and the Loughran and McDonald lexicon baseline by 21 percentage points of accuracy, validating the chosen classifier. Crucially, the sentiment signal is evaluated honestly through a long-horizon A/B trading simulation: six LLM-based trader personalities are each run twice, once with sentiment briefings (treatment) and once without (control), under a strict no-look-ahead protocol. Over the first three simulated months of a 1,256-day backtest, sentiment-informed traders beat their control counterparts by an average of 5.73 percentage points and won in 5 of 6 trading styles. We frame the contribution as evidence that news sentiment is a useful, measurable signal, not a market predictor, and provide a fully reproducible platform for studying it.

Tools and Technologies Used

Backend API: FastAPI, Uvicorn, Pydantic 2, SQLAlchemy 2 (async), Alembic
NLP / ML: PyTorch, Transformers, sentence-transformers, scikit-learn, accelerate
Workers: Celery 5.3 with a Redis broker, scheduled via Celery Beat
Storage: PostgreSQL 16, Redis 7, ChromaDB (vector store)
RAG / LLMs: LangChain, langchain-groq (chat), langchain-fireworks (simulation)
Data APIs: yfinance, feedparser, httpx, aiohttp, Alpha Vantage, GDELT, HuggingFace datasets
Frontend: React 18, TypeScript 5, Vite 6, Tailwind CSS, Recharts, Radix UI
Infrastructure: Docker Compose (7 services), Nginx (reverse proxy, SSE, WebSocket)
Testing: pytest, pytest-asyncio, aiosqlite

Methodology

The Financial News & Report Sentiment Analyzer is built on a multi-layered data ingestion and Natural Language Processing (NLP) pipeline designed to quantify the impact of news on the stock market. The system continuously ingests financial news and social sentiment from seven sources, including Google News, Yahoo Finance, Alpha Vantage, GDELT, and Adanos, together with daily OHLCV market data. To maintain data quality, the pipeline normalizes the incoming feeds, filters them for company relevance across a tracked universe of 20 large-cap United States equities, and deduplicates articles using stable identifiers and URL hashing. Missing data is handled conservatively: invalid text and unmatched market days are skipped explicitly rather than imputed, which preserves the integrity of the temporal analysis. Once filtered, the text is processed by the core NLP layer using FinBERT, a finance-specific transformer model chosen for its contextual understanding of specialized financial language. The model runs batched inference to classify each text as positive, negative, or neutral, and the system persists the hard label, the per-class probabilities, and the overall confidence score in a structured relational database. The end-to-end design also supports continuous fine-tuning and hot-swapping of the active model without system downtime.
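The deduplication and skip-on-missing rules described above can be illustrated with a minimal sketch. The function names (`url_fingerprint`, `deduplicate`), the article dictionary keys (`id`, `url`, `text`), and the specific URL normalization are assumptions made for illustration; they are not taken from the project's code.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def url_fingerprint(url: str) -> str:
    """Hash a normalized URL so trivially different links map to one key."""
    parts = urlsplit(url.strip())
    # Assumption: host case, query string, fragment, and a trailing slash
    # do not distinguish one article from another.
    normalized = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path.rstrip("/"), "", "")
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(articles):
    """Keep the first copy of each article; skip records with no usable text."""
    seen = set()
    kept = []
    for article in articles:
        text = (article.get("text") or "").strip()
        if not text:
            continue  # skip invalid text instead of imputing a placeholder
        # An article counts as a duplicate if either its stable identifier
        # or its URL fingerprint has been seen before.
        keys = {"url:" + url_fingerprint(article["url"])}
        if article.get("id"):
            keys.add("id:" + str(article["id"]))
        if keys & seen:
            continue
        seen |= keys
        kept.append(article)
    return kept


sample = [
    {"id": None, "url": "https://Example.com/news/a?utm=1", "text": "Apple beats estimates"},
    {"id": None, "url": "https://example.com/news/a/", "text": "Apple beats estimates"},
    {"id": "x1", "url": "https://example.com/b", "text": ""},
    {"id": "x2", "url": "https://example.com/c", "text": "Fed holds rates"},
    {"id": "x2", "url": "https://other.com/c", "text": "Fed holds rates"},
]
unique = deduplicate(sample)
```

Dropping the query string is a deliberate simplification here; a feed whose article identity lives in a query parameter would need a narrower rule, such as removing only known tracking parameters.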

Beyond classification, the extracted sentiment scores feed a statistical analytics and evaluation framework. Daily-average sentiment scores are aligned with close-to-close market returns to compute Pearson and Spearman coefficients, as well as time-lagged and 7-day rolling correlations. To make this data interactive, the system employs a Retrieval-Augmented Generation (RAG) architecture: scored article chunks are embedded into a ChromaDB vector store, so that an LLM-powered chat assistant can retrieve ranked evidence and give verifiable answers grounded in the processed financial corpus. Crucially, the practical validity of the sentiment signal is evaluated through a no-look-ahead A/B trading simulation rather than through static metrics alone. The experimental setup uses twelve LLM-based trader agents, two for each of six trading personalities such as day trader, contrarian, and value investor. The agents are tested in pairs: the treatment agent receives daily briefings containing both price data and FinBERT-derived sentiment roll-ups, whereas the control agent receives only price-derived context. Because information is restricted to the prior day's close and trades are simulated at the next open, the backtest is designed to isolate the performance lift and risk-adjusted behavior attributable to the NLP sentiment signal, measured against passive market baselines.
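As an illustration of the correlation measures named above, the following dependency-free sketch computes Pearson, Spearman, lagged, and rolling correlations over two aligned daily series. The function names and the lag convention (sentiment on day t paired with the return on day t + lag) are assumptions chosen for illustration and are not necessarily those of the system.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant series
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed series."""
    return pearson(average_ranks(x), average_ranks(y))


def lagged_pearson(sentiment, returns, lag):
    """Correlate sentiment on day t with the return on day t + lag (lag >= 0)."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(sentiment, returns)
    return pearson(sentiment[:-lag], returns[lag:])


def rolling_pearson(x, y, window=7):
    """Pearson correlation over each trailing window of `window` observations."""
    return [
        pearson(x[end - window:end], y[end - window:end])
        for end in range(window, len(x) + 1)
    ]
```

Both input series are assumed to be already aligned by trading day, with unmatched days dropped beforehand rather than filled.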

Document Type

Restricted Access

Submission Type

BSCS Final Year Project

Creative Commons License

Creative Commons Attribution-NonCommercial 4.0 International License
