Degree
Master of Science in Computer Science
Department
Department of Computer Science
School
School of Mathematics and Computer Science (SMCS)
Date of Submission
Fall 2023
Supervisor
Waqas Mahmood, Visiting Faculty, Department of Computer Science, School of Mathematics and Computer Science (SMCS)
Keywords
Machine Learning, Data Crowdsourcing, ML Training Data, Data Curation, DaaS
Abstract
As the AI revolution continues, the success of modern artificial intelligence (AI) and machine learning (ML) depends largely on access to accurate and varied data. However, obtaining such data is difficult because data is often inaccurate, inconsistent, or unavailable, which in turn hampers advances in AI and ML. This project aims to address this problem by creating a platform where regular internet users contribute data by categorizing or labeling it. A second group of regular users on the platform verifies these categories or labels to make sure the data is correct and genuine. If they find incorrect data, they can report it, and such reports can affect the original uploader's rating on the platform, encouraging everyone to share accurate data.
The ultimate goal is to collect data, then clean it and confirm its accuracy, making it ready for use in machine learning projects. The cleaned data will be offered for sale so that companies or individuals looking to train machine learning models can buy it. An important feature of the platform is that it rewards users with a share of the sales from the data they helped collect or check, depending on how much and how well they contributed. In this way, the platform encourages users to share and check data, creating a continuous supply of good-quality data for ML projects, which in turn supports the ongoing growth of AI and ML technologies.
Document Type
Restricted Access
Submission Type
Research Project
Recommended Citation
Anwar, Muaaz. "Crowdsourced Data Curation Platform: Harnessing Data’s Potential for Machine Learning." Unpublished graduate research project. Institute of Business Administration. 2023. https://ir.iba.edu.pk/research-projects-mscs/50
Project Code
CDCP-Demo.mp4 (119763 kB)
Project Video
CDCP-poster.pdf (572 kB)
Project Poster
The full text of this document is only accessible to authorized users.