Using predictive analytics for predicting host availability in desktop grids

Author Affiliation

Tariq Mahmood is an Assistant Professor at the Institute of Business Administration (IBA), Karachi

Faculty / School

Faculty of Computer Sciences (FCS)


Department of Computer Science



Document Type


Source Publication

International Journal of Grid and Distributed Computing




Analysis | Mathematics


Desktop grid systems are one of the largest paradigms of distributed computing in the world. The idea is to use the idle and underutilized processing cycles and memory of desktop machines to support large-scale computation. The design issues in desktop grid systems are complex because the hosts (desktop machines) participating in the computation do not work under one administrative control and can become unavailable at any point in time. The heterogeneity and volatility of computing resources, for example the diversity of memory, processors, and hardware architectures, also play a role. To get useful results from such a hostile environment, scheduling tasks to better hosts becomes one of the most important issues. In this paper, we predicted host availability in desktop grid systems using Predictive Analytics (PA), which can help in scheduling tasks to highly available hosts. We presented a comprehensive, high-level evaluation of standard PA techniques for predicting host availability in desktop grids, with the aim of determining the relatively better algorithms. We addressed both PA perspectives, i.e., classification and regression. We used the following standard classification algorithms: k-Nearest Neighbour (k-NN) for the Lazy Learning technique, Naïve Bayes for the Bayesian Learning technique, the LibSVM library for the Support Vector Modeling (SVM) technique, Random Forest for the Tree Induction technique, and Multi-Layer Perceptron (MLP) for the Neural Network technique. We found that the threshold selected for availability is critical for acquiring accurate predictions, and that k-NN gives the best accuracy across all thresholds. Precision-wise, SVM gives perfect performance (100%) across all thresholds, followed closely by Neural Networks. For regression, we used Multiple Linear Regression (MLR), Polynomial Regression (PR), and MLP, and found that MLP gives the best performance, followed by PR and MLR.
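
The classification setup described in the abstract (turn availability into a class label using a threshold, then predict that label) can be illustrated with a minimal sketch. This is not the authors' implementation or tooling. The feature choice (availability in the previous hour and previous day), the threshold values, the helper names, and all data points are hypothetical and invented purely for illustration. A small hand-written k-NN stands in for the library classifiers named in the abstract.

```python
import math
from collections import Counter

# Hypothetical data: each host is described by (availability in the previous
# hour, availability in the previous day); the target is the fraction of the
# next hour the host was available. All values are invented for illustration.
TRAIN_X = [(0.9, 0.8), (0.95, 0.9), (0.2, 0.3), (0.1, 0.2),
           (0.6, 0.5), (0.85, 0.7), (0.3, 0.4), (0.7, 0.75)]
TRAIN_NEXT = [0.9, 0.95, 0.1, 0.2, 0.55, 0.8, 0.35, 0.7]
TEST_X = [(0.88, 0.85), (0.15, 0.25), (0.65, 0.6)]
TEST_NEXT = [0.9, 0.1, 0.6]


def label_hosts(availability, threshold):
    """Map availability fractions to binary labels: 1 = 'available'."""
    return [1 if a >= threshold else 0 for a in availability]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred):
    """Fraction of hosts predicted 'available' that really were available."""
    predicted_pos = sum(p == 1 for p in y_pred)
    if predicted_pos == 0:
        return 0.0  # convention used here when nothing is predicted positive
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    return true_pos / predicted_pos


def evaluate(threshold, k=3):
    """Relabel the data at `threshold`, run k-NN, return (accuracy, precision)."""
    train_y = label_hosts(TRAIN_NEXT, threshold)
    test_y = label_hosts(TEST_NEXT, threshold)
    preds = [knn_predict(TRAIN_X, train_y, q, k) for q in TEST_X]
    return accuracy(test_y, preds), precision(test_y, preds)


if __name__ == "__main__":
    # The thresholds below are hypothetical; changing them changes the labels,
    # which is why the choice of threshold affects the measured scores.
    for threshold in (0.5, 0.7, 0.9):
        acc, prec = evaluate(threshold)
        print(f"threshold={threshold:.1f} accuracy={acc:.2f} precision={prec:.2f}")
```

Because the labels are recomputed for every threshold, the same classifier is effectively solving a different problem each time, which is one way to see why the choice of threshold matters for the reported accuracy and precision.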

Indexing Information

HJRS - X Category, Scopus, Web of Science - Emerging Sources Citation Index (ESCI)

Publication Status