Degree
Master of Science in Computer Science
Department
Department of Computer Science
Faculty / School
Faculty of Computer Sciences (FCS)
Date of Submission
2020-06-30
Supervisor
Dr. Tariq Mahmood, Associate Professor, Faculty of Computer Science
Document type
MSCS Survey Report
Keywords
Big data, Machine learning, Big data architectures, Lambda Architecture, Kappa Architecture, Real-time streaming data
Abstract
With the growing volume of data and the spread of big data technology, businesses can use this data to gain valuable insights and make profitable decisions based on them. Many businesses are also moving toward automating these decisions, which requires machine learning. Big data architectures fall into two major categories: the Lambda Architecture and the Kappa Architecture. Each has its own strengths, and they cater to two different use cases: batch processing and stream processing.
Businesses need decisions in real time because, by the time a batch processing system produces a result, the business moment may have passed or the customer's need may have changed. The Kappa Architecture builds on the concept of stream processing, but it still lacks the capability to perform machine learning in real time. Several approaches have been used to update machine learning models in batch mode and in near-real-time mode, but there is no significant work on online machine learning model updates. This paper presents a systematic literature review of the state of machine learning in big data architectures. It shortlists 29 papers for the final review, examining each with respect to its big data architecture and the extent to which machine learning is accommodated in it.
Most of the papers performed analytics using machine learning models in real time, but model updates were performed offline. Only one paper discussed online machine learning. It noted that online model updates would require writing different versions of machine learning algorithms, but it did not discuss this in the context of a big data architecture.
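The following is a minimal sketch, not taken from the surveyed papers, of why online model update needs a different version of an algorithm. It uses a one-feature linear regression. The batch version (the hypothetical `batch_fit`) recomputes ordinary least squares over the whole stored data set, as a batch layer would. The online version (the hypothetical `online_update`) applies one stochastic gradient descent step per arriving record and never revisits past data. The learning rate `lr` and all names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class LinearModel:
    """One-feature linear model y = w * x + b (illustrative only)."""
    w: float = 0.0
    b: float = 0.0

    def predict(self, x: float) -> float:
        return self.w * x + self.b


def batch_fit(xs: list[float], ys: list[float]) -> LinearModel:
    """Hypothetical batch version: closed-form least squares over all records."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    return LinearModel(w, mean_y - w * mean_x)


def online_update(model: LinearModel, x: float, y: float, lr: float = 0.01) -> LinearModel:
    """Hypothetical online version: one SGD step on squared error for a single record."""
    error = model.predict(x) - y
    model.w -= lr * error * x
    model.b -= lr * error
    return model


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2 * x + 1 for x in xs]  # noise-free data from y = 2x + 1

    print(batch_fit(xs, ys))

    # Simulate a stream: records arrive one at a time and are discarded after use.
    model = LinearModel()
    for _ in range(2000):
        for x, y in zip(xs, ys):
            online_update(model, x, y)
    print(model)
```

The batch version needs the full data set in storage and is rerun periodically. The online version keeps only the current parameters, which is what a stream processor in a Kappa-style pipeline could call for each incoming record.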
Recommended Citation
Awan, S. Z. (2020). State of real-time machine learning for big data: A systematic literature review (Unpublished MSCS survey report). Institute of Business Administration, Pakistan. Retrieved from https://ir.iba.edu.pk/survey-reports-mscs/24
The full text of this document is only accessible to authorized users.