Master of Science in Computer Science


Department of Computer Science

Faculty / School

Faculty of Computer Sciences (FCS)

Date of Submission



Dr. Tariq Mahmood, Associate Professor, Faculty of Computer Science

Document type

MSCS Survey Report


With the growing volume of data and the spread of big data technology, businesses can use this data to gain valuable insights and make profitable decisions based on those insights. Several businesses are also moving towards automating these decisions, which requires machine learning. Big data architectures fall into two major categories: the lambda architecture and the kappa architecture. Both architectures have their own strengths and cater to two different use cases: batch processing and stream processing.

Businesses require decisions in real time because, by the time a batch processing system produces a result, the business has lost its business moment or the customer's need has changed. The kappa architecture builds upon the concept of stream processing; however, it still lacks the capability to perform machine learning in real time. Several approaches have been used to update machine learning models in batch mode and near-real-time mode, but there is no significant work on online machine learning model update. This paper presents a systematic literature review on the state of machine learning in big data architectures. It selects 29 papers for the final review and examines them with respect to big data architecture and the extent of machine learning accommodated in them.

Most of the papers performed analytics using machine learning models in real time, but updated the models offline. Only one paper discussed online machine learning, noting that it would require writing different versions of machine learning algorithms for online model update, and even that paper did not discuss it with reference to a big data architecture.
