It is often said that 80% of the analysis effort is spent on data cleaning and
preparation and only 20% is typically spent on modeling. In light of this it may
seem strange that this book has devoted more than a dozen chapters to modeling.
However, data cleansing and preparation are things that are better learned
through experience and not so much from a book. That said, it is essential to be
conversant
with the many techniques that are available for these important early process
steps. In this chapter the focus will not be on data cleaning, as it was
partially covered earlier, but on feature selection.
First a brief introduction to feature selection along with the need for this
preprocessing step is given. There are fundamentally two types of feature
selection processes: filter type and wrapper type. Filter approaches work by
selecting only those attributes that rank among the top in meeting certain
stated criteria (Blum & Langley, 1997; Yu & Liu, 2003). Wrapper approaches work
by iteratively selecting, via a feedback loop, only those attributes that
improve the performance of an algorithm (Kohavi & John, 1997). Among the
filter-type methods, one can further classify based on the data types: numeric
versus nominal. The most common wrapper-type methods are the ones
associated with multiple regression: stepwise regression, forward selection,
and backward elimination. For nominal data, a common filter-type method is
chi-square-based filtering.
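
The contrast between the two families can be made concrete with a small sketch.
Everything in it is an illustrative assumption rather than something taken from
this chapter: the filter criterion (absolute Pearson correlation with the
target), the learner wrapped by the feedback loop (a 1-nearest-neighbor
classifier scored by leave-one-out accuracy), the function names
filter_select and forward_select, and the toy data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def filter_select(columns, target, k):
    """Filter type: score every attribute once and keep the k top-ranked."""
    ranked = sorted(columns,
                    key=lambda name: abs(pearson(columns[name], target)),
                    reverse=True)
    return ranked[:k]

def loo_accuracy(columns, target, names):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier
    that only sees the attributes listed in `names`."""
    n = len(target)
    hits = 0
    for i in range(n):
        nearest = min((j for j in range(n) if j != i),
                      key=lambda j: sum((columns[f][i] - columns[f][j]) ** 2
                                        for f in names))
        hits += target[nearest] == target[i]
    return hits / n

def forward_select(columns, target, score=loo_accuracy):
    """Wrapper type: add one attribute per round, keeping it only if the
    model's score improves; stop as soon as no candidate helps."""
    chosen, best = [], 0.0  # baseline score before any attribute is chosen
    remaining = list(columns)
    while remaining:
        top_score, top_name = max(
            (score(columns, target, chosen + [name]), name)
            for name in remaining)
        if top_score <= best:
            break
        chosen.append(top_name)
        remaining.remove(top_name)
        best = top_score
    return chosen

# Hypothetical toy data: three numeric attributes and a binary class label.
columns = {
    "signal": [1.0, 1.2, 0.9, 3.0, 3.1, 2.8],
    "weak":   [1.0, 2.0, 3.0, 2.0, 3.0, 4.0],
    "noise":  [5.0, 1.0, 4.0, 4.0, 1.0, 5.0],
}
target = [0, 0, 0, 1, 1, 1]

print(filter_select(columns, target, k=2))  # ['signal', 'weak']
print(forward_select(columns, target))      # ['signal']
```

The filter keeps "weak" because it ranks second on the stated criterion, whereas
the wrapper drops it because adding it does not improve the classifier. The
filter never consults a model; the wrapper retrains and rescores the model for
every candidate attribute, which is why it is usually the more expensive of the
two.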