Master of Science in Computer Science

Faculty / School

Faculty of Computer Sciences (FCS)


Department of Computer Science

Date of Submission



Dr. Tariq Mehmood, Associate Professor, Department of Computer Science, Institute of Business Administration (IBA), Karachi

Project Type

MSCS Survey Report


With the growing volume of data and the spread of big data technology, businesses can use this data to gain valuable insights and make profitable decisions based on them. Several businesses are also moving towards automating these decisions, which requires machine learning. Big data architectures fall into two major categories: lambda architecture and kappa architecture. Each has its own strengths, and they cater to the two different use cases of batch processing and stream processing.

Businesses require decisions in real time because, by the time a batch processing system produces a result, the business has lost its business moment or the customer's need has changed. Kappa architecture builds on the concept of stream processing; however, it still lacks the capability to perform machine learning in real time. Several approaches have been used to update machine learning models in batch mode and near-real-time mode, but there is no significant work on online machine learning model update. This paper presents a systematic literature review on the state of machine learning in big data architectures. It selects 29 papers for the final review and examines them with respect to big data architecture and the extent of machine learning they accommodate.

Most of the papers performed analytics with machine learning models in real time but updated the models offline. Only one paper discussed online machine learning, noting that it would require writing different versions of machine learning algorithms for online model update, and even that paper did not discuss it with reference to a big data architecture.
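To make the distinction between offline and online model update concrete, the following is a minimal sketch of an online update loop, assuming a simple linear model trained by stochastic gradient descent. It is not taken from any of the reviewed papers; the names `OnlineLinearModel` and `process_stream` are hypothetical, and a plain Python list stands in for a real event stream.

```python
from typing import Iterable, List, Tuple


class OnlineLinearModel:
    """Hypothetical linear regressor updated one record at a time with SGD."""

    def __init__(self, n_features: int, learning_rate: float = 0.05) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias: float = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: List[float]) -> float:
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias

    def update(self, x: List[float], y: float) -> None:
        # One gradient step on the squared error of a single record,
        # so the model changes as soon as the label is known.
        error = self.predict(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


def process_stream(
    model: OnlineLinearModel,
    stream: Iterable[Tuple[List[float], float]],
) -> List[float]:
    """Score each record as it arrives, then learn from it (test-then-train)."""
    predictions = []
    for x, y in stream:
        predictions.append(model.predict(x))  # real-time analytics step
        model.update(x, y)                    # online model update step
    return predictions


if __name__ == "__main__":
    # Assumed toy stream: records generated from y = 2x + 1.
    toy_stream = [([x], 2.0 * x + 1.0) for _ in range(2000) for x in (0.0, 0.5, 1.0)]
    model = OnlineLinearModel(n_features=1)
    process_stream(model, toy_stream)
    print(model.weights, model.bias)
```

In an offline setup, by contrast, the `update` step would be removed from the loop and replaced by periodic retraining on accumulated data, which matches the pattern most of the reviewed papers followed.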


In this paper we present a systematic literature review (SLR) to analyze the current state of machine learning in big data architectures. Our major areas of investigation were: a) the speed of machine learning in big data architectures; b) the current technology stack used in big data architectures; and c) the problems faced when performing analytics in real time. The major contribution of this SLR is a framework for classifying machine learning architectures, together with their strengths and weaknesses with respect to big data analytics.

The results suggest that work still needs to be done on online machine learning in big data architectures. Several framework proposals suggested online model update, but none supported the claim with experimentation.
