Title

Data Mining: Analyzing impact of outliers' detection and removal from the test sample in Blind Source Extraction using Multivariate Calibration Techniques

Abstract/Description

Blind source extraction (BSE) is an essential but challenging task when multiple sources are convolved and/or time-delayed. In this article we discuss the performance of multivariate calibration techniques, comprising classical least squares (CLS), inverse linear regression (ILS), principal component regression (PCR) and partial least squares regression (PLS), in achieving this task in robust speech recognition systems with varying signal-to-noise ratios (SNR). We specifically analyze two methods for identifying and removing outliers from the sample, namely outlier sample removal (OSR) and descriptor selection (DS), for classical least squares and factor-based regression respectively; both result in higher correlation between the predicted and the expected results. Our experiments suggest that factor-based methods produce much more reliable results than classical least squares regression. However, classical least squares is much more immune to white noise than factor-based regression. Our results show that successful detection and removal of outliers from the sample under test (SUT) may improve prediction by at least 37% and 56% with classical least squares and principal component regression respectively.
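
To make the comparison above concrete, the following is a minimal sketch, not the authors' implementation, of CLS and PCR calibration on synthetic data, followed by a simple residual-based removal of outlying rows from the test sample. Everything in it is an assumption made for illustration: the linear mixing model `X = C @ K`, the synthetic data, the helper names (`cls_fit`, `pcr_fit`, `residual_outliers`) and the median/MAD threshold. Descriptor selection is not shown.

```python
import numpy as np

def cls_fit(C, X):
    """Estimate the response matrix K in the CLS model X = C @ K (least squares)."""
    return np.linalg.pinv(C) @ X

def cls_predict(K, X):
    """Least-squares estimate of the source levels C for mixtures X."""
    return X @ np.linalg.pinv(K)

def pcr_fit(C, X, n_components):
    """Regress centered C on the leading principal-component scores of X."""
    x_mean, c_mean = X.mean(axis=0), C.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    V = Vt[:n_components].T            # loadings, shape (channels, components)
    T = (X - x_mean) @ V               # scores of the calibration samples
    B = np.linalg.pinv(T) @ (C - c_mean)
    return x_mean, c_mean, V, B

def pcr_predict(model, X):
    x_mean, c_mean, V, B = model
    return (X - x_mean) @ V @ B + c_mean

def residual_outliers(K, X, n_mads=10.0):
    """Hypothetical rule: flag rows of X that the CLS model reconstructs poorly."""
    resid = np.linalg.norm(X - cls_predict(K, X) @ K, axis=1)
    med = np.median(resid)
    mad = np.median(np.abs(resid - med))
    return resid > med + n_mads * mad

def corr(a, b):
    """Pearson correlation between predicted and expected values."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

# Synthetic data: 2 sources observed on 20 channels (assumed, for illustration only).
rng = np.random.default_rng(0)
K_true = rng.random((2, 20))
C_train = rng.random((60, 2))
X_train = C_train @ K_true + 0.01 * rng.standard_normal((60, 20))
C_test = rng.random((40, 2))
X_test = C_test @ K_true + 0.01 * rng.standard_normal((40, 20))
X_test[:4] += 2.0 * rng.random((4, 20))   # corrupt the first four test rows

K = cls_fit(C_train, X_train)
pcr = pcr_fit(C_train, X_train, n_components=2)
mask = residual_outliers(K, X_test)       # True marks a suspected outlier row

cls_before = corr(cls_predict(K, X_test), C_test)
cls_after = corr(cls_predict(K, X_test[~mask]), C_test[~mask])
pcr_before = corr(pcr_predict(pcr, X_test), C_test)
pcr_after = corr(pcr_predict(pcr, X_test[~mask]), C_test[~mask])

print(f"CLS correlation: {cls_before:.3f} -> {cls_after:.3f}")
print(f"PCR correlation: {pcr_before:.3f} -> {pcr_after:.3f}")
```

In this toy setting, dropping the flagged rows raises the correlation between predicted and expected values for both models. The size of the improvement here depends entirely on the synthetic corruption and says nothing about the figures reported in the abstract.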

Session Theme

Data Mining

Session Type

Other

Session Chair

Dr. Sajjad Haider

Start Date

15-8-2009 5:35 PM

End Date

15-8-2009 5:55 PM
