Degree
Master of Science in Data Science
Department
Department of Computer Science
Faculty/School
School of Mathematics and Computer Science (SMCS)
Date of Submission
Summer 2025
Supervisor
Dr. Tariq Mahmood, Professor and Program Coordinator, MS(CS) and MS(DS) Programs, School of Mathematics and Computer Science (SMCS)
Committee Member 1
Dr. Wali-Ullah
Committee Member 2
Dr. Shaukat Wasi
Keywords
Time Series Forecasting, Irregular Data, Hybrid Kernel SVR-LSTM, Semantic Embeddings, Deep Learning
Abstract
Forecasting irregular time series data presents a persistent challenge across diverse domains such as energy, healthcare, environment, retail, and finance. Traditional models, including ARIMA, Prophet, and LSTM, typically assume regularly spaced data and often fail to capture critical characteristics of irregular sequences, such as disrupted temporal dependencies, masked seasonality, and misaligned time intervals. Moreover, these models usually cannot incorporate semantic or contextual information (e.g., holidays, events, station metadata), which limits their adaptability and real-world relevance. This thesis proposes a novel hybrid methodology that addresses these limitations by combining structural kernel-based modeling with semantic enrichment from large language models (LLMs). The framework fuses three complementary kernels: a Radial Basis Function (RBF) kernel to capture nonlinear temporal patterns, a Periodic kernel to extract seasonal cycles, and a Dynamic Time Warping (DTW) kernel to handle distorted or misaligned sequences. To incorporate domain-specific context, structured features are converted into natural-language prompts and embedded with a lightweight pretrained SentenceTransformer (MiniLM) or OpenAI text embeddings, producing rich semantic vectors. These kernel-derived and contextual features then pass through a two-stage learning pipeline: Support Vector Regression (SVR) models static nonlinear relationships, and Long Short-Term Memory (LSTM) networks then capture dynamic temporal dependencies. The proposed approach was empirically validated on nine real-world datasets comprising 31 forecasting variants, spanning applications such as electric vehicle charging load, heart rate monitoring, multi-city pollution levels, temperature forecasting, store sales, stock prices, exchange rates, and property values.
Experimental results show that the hybrid model significantly outperformed the baseline models (ARIMA, regime-switching AR, SARIMA, Random Forest, GRU, regime-fitting GRU, LSTM, and regime-fitting LSTM) in five of the nine datasets, particularly where the data exhibited irregularity and semantic complexity. To address gaps identified through a comprehensive systematic literature review, this work contributes: (i) a novel hybrid methodology combining kernel fusion and contextual embeddings for irregular time series forecasting, (ii) extensive depth-first and breadth-first validation across domains, and (iii) a highly interpretable framework that balances forecasting accuracy with contextual depth, while recognizing trade-offs in computational efficiency and scalability.
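The kernel-fusion step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. The sketch assumes a non-negative weighted sum of the three kernels and applies the RBF and DTW kernels to lag windows and the Periodic kernel to the timestamp of the value being forecast. It then passes the fused Gram matrix to scikit-learn's `SVR` with `kernel="precomputed"`. The helper names (`rbf_kernel`, `periodic_kernel`, `dtw_kernel`, `fused_kernel`), the weights, gamma values, period, window length, and the synthetic data are all hypothetical. The semantic-embedding and LSTM stages are omitted.

```python
import numpy as np
from sklearn.svm import SVR


def rbf_kernel(A, B, gamma=0.5):
    # RBF kernel on lag windows: exp(-gamma * ||a - b||^2)
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def periodic_kernel(tA, tB, period=7.0, length=1.0):
    # Exp-sine-squared kernel on timestamps: points one period apart look similar
    d = np.abs(tA[:, None] - tB[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length ** 2)


def dtw_distance(x, y):
    # Classic dynamic-programming DTW with absolute-difference cost
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]


def dtw_kernel(A, B, gamma=0.5):
    # exp(-gamma * DTW): tolerant of shifted or stretched windows,
    # but NOT guaranteed to be positive semi-definite
    dist = np.array([[dtw_distance(a, b) for b in B] for a in A])
    return np.exp(-gamma * dist)


def fused_kernel(WA, tA, WB, tB, weights=(0.4, 0.3, 0.3)):
    # Hypothetical fusion rule: non-negative weighted sum of the three kernels
    w_rbf, w_per, w_dtw = weights
    return (w_rbf * rbf_kernel(WA, WB)
            + w_per * periodic_kernel(tA, tB)
            + w_dtw * dtw_kernel(WA, WB))


# Synthetic irregularly sampled series with a weekly cycle (illustration only)
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 60.0, size=80))          # irregular timestamps in days
series = np.sin(2 * np.pi * t / 7.0) + 0.1 * rng.standard_normal(t.size)

w = 5                                                  # lag-window length
X = np.stack([series[i:i + w] for i in range(len(series) - w)])  # past w values
y = series[w:]                                         # next value to forecast
t_next = t[w:]                                         # timestamp of that value

split = 60
K_train = fused_kernel(X[:split], t_next[:split], X[:split], t_next[:split])
K_test = fused_kernel(X[split:], t_next[split:], X[:split], t_next[:split])

# SVR trained on the precomputed fused Gram matrix; predict needs (n_test x n_train)
model = SVR(kernel="precomputed", C=10.0, epsilon=0.05).fit(K_train, y[:split])
pred = model.predict(K_test)
print(pred.shape)  # → (15,)
```

A weighted sum of valid kernels with non-negative weights is itself a valid kernel, so the RBF and Periodic parts are safe. The `exp(-gamma * DTW)` term may break positive semi-definiteness. libsvm still accepts such a matrix, but its optimality guarantees weaken. This is one of the computational trade-offs of DTW-based fusion, along with the quadratic cost of each DTW comparison.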
Document Type
Restricted Access
Submission Type
Thesis
Recommended Citation
Urooj Mumtaz Joyo. 2025. A MULTI-KERNEL SVR-LSTM FRAMEWORK FOR IRREGULAR TIME SERIES FORECASTING WITH LIGHTWEIGHT SEMANTIC FEATURES. Master’s thesis, Institute of Business Administration (IBA), Karachi.
The full text of this document is only accessible to authorized users.
