
S7

Extra: Feature Selection


Shawndra Hill Spring 2013 TR 1:30-3pm and 3-4:30



Feature Selection

Step 1: Use domain knowledge to guide you whenever possible.
Step 2: Visualize attributes.
- Remove attributes with no values or with too many missing values.
- Check for obvious outliers and remove them.
Step 3: Construct new attributes (if it makes sense); a preprocessing sketch follows this slide.
- Combine attributes.
- Normalize numeric attributes (for regression, Naive Bayes, NN; http://www.tufts.edu/~gdallal/regtrans.htm).
- Create binary attributes from nominal attributes.
Step 4: Select the best subset of attributes for the problem.
IF IN DOUBT, CHOOSE A METHOD THAT DOES THE FEATURE SELECTION FOR YOU (for example, decision trees).
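A minimal sketch of Steps 2-3, assuming a hypothetical data.csv and an assumed 50% missing-value threshold; pandas and scikit-learn are illustrative tool choices, not prescribed by the slides.

```python
# Sketch of Steps 2-3: drop sparse attributes, normalize numerics,
# binarize nominals. The file name and the 50% threshold are assumptions.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("data.csv")  # hypothetical input file

# Step 2: remove attributes with no values or too many missing values.
df = df.loc[:, df.isna().mean() < 0.5]

# Step 3: normalize numeric attributes (helps regression, Naive Bayes, NN).
num_cols = df.select_dtypes("number").columns
df[num_cols] = StandardScaler().fit_transform(df[num_cols])

# Step 3: create binary attributes from nominal attributes (one-hot encoding).
df = pd.get_dummies(df, columns=df.select_dtypes("object").columns.tolist())
```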

The Basics

Basic Ideas
- We are usually faced with the problem of selecting a subset of possible predictors, and have to balance conflicting objectives:
  - include all variables that have legitimate predictive skill;
  - exclude all extraneous variables that fit only sample-specific noise.
- Extraneous variables reduce predictive skill and increase the standard errors of regression coefficients (and the analogous quantities in classification, etc.).
- Ideally we would be able to determine the single best subset of predictors to include.
- But there is no single definition of "best": different algorithms will produce different "best" subsets, and the problems are magnified by correlation among predictors. A small demonstration of the noise problem follows this slide.
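A small synthetic demonstration of the point about sample-specific noise; the dataset, sizes, and seed are all invented for illustration (scikit-learn, not part of the slides).

```python
# Fitting all 50 predictors (45 of them pure noise) vs. only the 5
# informative ones: the noise variables cost out-of-sample R^2.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# shuffle=False keeps the 5 informative predictors in the first 5 columns.
X, y = make_regression(n_samples=80, n_features=50, n_informative=5,
                       noise=10.0, shuffle=False, random_state=0)

r2_all = cross_val_score(LinearRegression(), X, y, cv=5).mean()
r2_sub = cross_val_score(LinearRegression(), X[:, :5], y, cv=5).mean()
print(f"all 50 predictors: {r2_all:.2f}   informative 5 only: {r2_sub:.2f}")
```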

Feature Selection

Ranking
- Rank attributes by some objective (for example, information gain); a ranking sketch follows this slide.

Subset selection
- Algorithms (see next slide).
- Wrappers (try subsets within the context of the algorithm you know you are going to use).
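A minimal ranking sketch using mutual information, an information-gain-style criterion; the breast-cancer dataset and the scikit-learn scorer are illustrative choices, not from the slides.

```python
# Score every attribute against the class by mutual information, then rank.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

data = load_breast_cancer()
scores = mutual_info_classif(data.data, data.target, random_state=0)
ranking = sorted(zip(data.feature_names, scores),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking[:5]:  # five highest-ranked attributes
    print(f"{name}: {score:.3f}")
```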

Feature Selection Algorithms

All possible subsets
- Only feasible with a small number of potential predictors (maybe 10 or fewer).
- Can then use one or more of the possible numerical criteria to find the overall best subset.

Forward stepwise regression
- Start with no predictors.
- First include the predictor with the highest correlation with the response.
- In subsequent steps, add the predictor with the highest partial correlation with the response, controlling for the variables already in the equation.
- Stop when a numerical criterion signals a maximum (minimum).
- Sometimes eliminate variables when their t value gets too small.
- The only feasible method for very large predictor pools.

Backward elimination
- Start with all predictors in the equation.
- Remove the predictor with the smallest t value.
- Continue until a numerical criterion signals a maximum (minimum).
- Often produces a different final model than the forward stepwise method.

Both stepwise methods perform local optimization at each step, with no guarantee of finding the overall optimum. A sketch of both procedures follows this slide.
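A sketch of both greedy procedures via scikit-learn's SequentialFeatureSelector; note it adds or removes predictors by cross-validated score rather than the correlations and t values described above, and the diabetes dataset and subset size of 4 are arbitrary choices.

```python
# Greedy forward and backward selection; the two directions often
# end up with different "best" subsets, as the slide warns.
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)

for direction in ("forward", "backward"):
    sfs = SequentialFeatureSelector(LinearRegression(),
                                    n_features_to_select=4,
                                    direction=direction, cv=5)
    sfs.fit(X, y)
    print(direction, "->", sfs.get_support(indices=True))
```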

Multicollinearity (regression)
- Multicollinearity is the degree of correlation among the X variables.
- A high degree of multicollinearity produces unacceptable uncertainty (large variance) in regression coefficient estimates, i.e., large sampling variation.
- Estimates of the slopes are imprecise, and even the signs of the coefficients may be misleading.
- t-tests may fail to reveal significant factors.
- The analysis of variance for the overall model may show a highly significant fit when, paradoxically, the tests for the individual predictors are non-significant.
A variance-inflation-factor check is sketched below.
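A minimal check for the problem above using variance inflation factors from statsmodels; the synthetic near-duplicate predictor and the common VIF > 10 rule of thumb are illustrative assumptions.

```python
# x2 is nearly a copy of x1, so both get huge VIFs; x3 stays near 1.
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
df = pd.DataFrame({"x1": x1,
                   "x2": x1 + rng.normal(scale=0.05, size=200),
                   "x3": rng.normal(size=200)})

X = add_constant(df)  # intercept column so each VIF regression has one
for i, col in enumerate(df.columns, start=1):  # index 0 is the constant
    print(col, round(variance_inflation_factor(X.values, i), 1))
```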

