Reduction of variables for predicting breast cancer survivability using principal component analysis

Was this content written or created while at IBA?

Yes

Document Type

Conference Paper

Publication Date

1-1-2015

Author Affiliation

  • Sharaf Hussain is Labs Administrator at the Institute of Business Administration, Karachi
  • Naveen Zehra Quazilbash is PhD Scholar at the Department of Computer Science, Institute of Business Administration, Karachi
  • Samita Bai is PhD Scholar at the Department of Computer Science, Institute of Business Administration, Karachi
  • Shakeel Ahmed Khoja is Professor at the Institute of Business Administration, Karachi

Conference Name

2015 IEEE 28th International Symposium on Computer-Based Medical Systems (CBMS)

Conference Location

Sao Carlos, Brazil

Conference Dates

22-25 June 2015

ISBN/ISSN

84944218114 (Scopus)

First Page

131

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Abstract / Description

This research uses breast cancer data from the Surveillance, Epidemiology, and End Results (SEER) dataset (1973-2010), which contains 684,394 records. The data are cleaned using several pre-processing techniques. Survivability is predicted using two methods. In the first method, 14 variables are used, as suggested by Delen et al. [1]; in the second method, these 14 variables are reduced to 5 principal components using Principal Component Analysis (PCA), a statistical technique, and the 5 components capture 98% of the total variance. Both methods achieve almost the same level of accuracy, so the number of variables that must be taken into account in the analysis can be reduced.
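
As an illustration of the variable-reduction step described in the abstract, the following is a minimal sketch of PCA with a cumulative-variance cutoff. It is not the authors' implementation: the data are synthetic (SEER records are not reproduced here), and the function name pca_reduce, the standardization step, and the 5-factor data generator are assumptions made for this example only.

```python
import numpy as np


def pca_reduce(X, variance_target=0.98):
    """Project X onto the fewest principal components reaching variance_target.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Standardize each column so no variable dominates through its scale.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD of the standardized data; rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained)
    # Smallest k whose cumulative explained variance meets the target.
    k = int(np.searchsorted(cumulative, variance_target)) + 1
    k = min(k, len(cumulative))
    # Component scores: the reduced representation of each record.
    scores = Xs @ Vt[:k].T
    return scores, k, cumulative


# Synthetic stand-in for a 14-variable table (assumption): 14 observed
# variables driven by 5 latent factors plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 5))
mixing = rng.normal(size=(5, 14))
X = latent @ mixing + 0.05 * rng.normal(size=(1000, 14))

scores, k, cumulative = pca_reduce(X)
print("components kept:", k)
print("reduced shape:", scores.shape)
print("variance captured:", float(cumulative[k - 1]))
```

The reduced score matrix would then replace the original 14 columns as classifier input, so that accuracy with and without the reduction can be compared.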
