Date of Submission

Spring 2025

Supervisor

Dr. Saurav Sthapit, Centre for Computational Sciences and Mathematical Modelling, Coventry University, UK

Committee Member 1

Dr. Tariq Mahmood, Examiner-I, Institute of Business Administration

Committee Member 2

Dr. Syed Ali Raza, Examiner-II, Institute of Business Administration

Degree

Master of Science in Data Science

Department

Department of Computer Science

Faculty/School

School of Mathematics and Computer Science (SMCS)

Keywords

Synthetic data, Data generation, Time series, Generative models, Diffusion models

Abstract

Synthetic data produced by generative models (popularly called generative artificial intelligence) are changing how downstream machine learning models are trained. Synthetic data have three advantages: 1) they are rapidly available on demand without human-centric data gathering and curation; 2) they preserve privacy by k-anonymizing real physical entities (people, business process touchpoints, etc.), which supports safe, secure, and responsible AI; and 3) they are relatively free of acquisition artefacts such as missing or corrupted data. In recent years, data generation has been extended to time series applications, and many powerful models have been developed. This study proposes a resilient machine learning framework designed for complex environments such as power plants and other industrial settings where installed sensors generate time series data continuously. When a sensor fails or malfunctions, the framework can produce synthetic data with properties and characteristics similar to those of the original data, making the system more resilient and robust and helping to eliminate potential hazards. The study examines several time series generation approaches, namely GANs, autoencoders, and diffusion models, for generating sensor data in complex industrial environments. By comparing the performance of these approaches, the research aims to identify their advantages and disadvantages and to generate sensor data with the most effective one, yielding reliable, accurate synthetic data that follows the same dynamics and distribution as the original data and can be used in its place.

Document Type

Restricted Access

Submission Type

Thesis
