Date of Submission

Fall 2023

Supervisor

Dr. Sajjad Haider, Professor, Department of Computer Science, Institute of Business Administration, Karachi

Co-Supervisor

Dr. Ramla Shahid, Professor, Department of Biochemistry, Kohsar University Murree

Committee Member 1

Dr. Tariq Mehmood, Examiner - I, Institute of Business Administration (IBA), Karachi

Committee Member 2

Dr. Imran Rauf, Examiner - II, Institute of Business Administration (IBA), Karachi

Degree

Master of Science in Data Science

Department

Department of Computer Science

Faculty/School

School of Mathematics and Computer Science (SMCS)

Keywords

Process Characterization, Quality by Design, Machine Learning, Design of Experiments

Abstract

This research highlights the importance of process characterization and quality by design (QbD) in bioprocess development. It evaluates how data from classical and intensified design of experiments (cDoE and iDoE) can be used with conventional machine learning models and sequential models. The objective is to improve process understanding, optimize protocols, and ensure the production of high-quality biopharmaceutical products. The experiments use two datasets: a static dataset and a dataset that includes intensified fed-batch fermentations. Data preprocessing involves standardization and the handling of missing values using various techniques. Seven machine learning models and three sequential models are trained on the preprocessed cDoE and iDoE datasets and assessed using the cDoE dataset. Conventional models, particularly the Gradient Boosting Regressor, performed best in predicting cell dry weight (CDW). For product titer, the Simple RNN was the more effective model, which underlines the importance of capturing temporal dynamics in bioprocess data. Models trained on cDoE data predicted CDW more accurately, while the difference was less marked for product titer. Although cDoE datasets require more experiments, they provide richer information for model training, particularly for CDW predictions. In contrast, iDoE datasets require fewer experiments but trade off some predictive accuracy for certain critical quality attributes (CQAs).

Document Type

Restricted Access

Submission Type

Thesis

Available for download on Friday, June 13, 2025
