Abstract/Description

This paper compares the precision of three data classification methods (logistic regression, naive Bayes, and linear regression) as measured by the area under the curve (AUC) metric. The effects of dataset size, the kind of independent attributes, the number of discrete attributes, and the number of values those attributes take are investigated. The results show that on datasets containing both discrete and continuous attributes, the three classifiers achieve the same AUC. As the number of discrete attributes increases, the AUC of logistic regression increases and its precision exceeds that of the other two classifiers. Likewise, as the number of values in the discrete attributes increases, the AUC of logistic regression increases and that of linear regression decreases, while the AUC of the naive Bayes classifier remains constant. These results can help data miners select the more efficient classifier based on the characteristics of the features in their datasets.
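
Since every comparison in the abstract is stated in terms of AUC, a small illustration of the metric may help. The following is a minimal sketch, not the authors' code: the function name `auc` and the toy labels and scores are assumptions made up for illustration. It computes AUC as the fraction of (positive, negative) example pairs in which the positive example receives the higher score, counting ties as half.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 0/1 class labels.
    scores: classifier scores, where higher means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the paper.
labels = [1, 0, 1, 1, 0, 0]
scores_informative = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1]
scores_constant = [0.6] * 6

print(auc(labels, scores_informative))
print(auc(labels, scores_constant))
```

Under this definition a classifier that assigns the same score to every example gets an AUC of 0.5, and one that ranks every positive above every negative gets 1.0, which is why the metric can compare classifiers whose raw outputs are on different scales.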

Session Theme

Data Mining

Session Type

Other

Session Chair

Dr. Sajjad Haider

Start Date

15-8-2009 6:35 PM

End Date

15-8-2009 6:55 PM

Data Mining: An experimental investigation of the effect of discrete attributes on the precision of classification methods
