
S7

Extra: Feature Selection


Shawndra Hill Spring 2013 TR 1:30-3pm and 3-4:30



Feature Selection

Step 1: Use domain knowledge to guide you whenever possible.
Step 2: Visualize attributes.
- Remove attributes with no values or with too many missing values.
- Check for obvious outliers and remove them.
Step 3: Construct new attributes (if it makes sense); a preprocessing sketch follows this slide.
- Combine attributes.
- Normalize numeric attributes (for regression, Naive Bayes, NN; http://www.tufts.edu/~gdallal/regtrans.htm).
- Create binary attributes from nominal attributes.
Step 4: Select the best subset of attributes for the problem.
IF IN DOUBT, CHOOSE A METHOD THAT DOES THE FEATURE SELECTION FOR YOU (for example, decision trees).
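A minimal sketch of Steps 2-3, assuming a hypothetical data.csv and an assumed 50% missing-value threshold; pandas and scikit-learn are illustrative tool choices, not prescribed by the slides.

```python
# Sketch of Steps 2-3: drop sparse attributes, normalize numerics,
# binarize nominals. The file name and the 50% threshold are assumptions.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("data.csv")  # hypothetical input file

# Step 2: remove attributes with no values or too many missing values.
df = df.loc[:, df.isna().mean() < 0.5]

# Step 3: normalize numeric attributes (helps regression, Naive Bayes, NN).
num_cols = df.select_dtypes("number").columns
df[num_cols] = StandardScaler().fit_transform(df[num_cols])

# Step 3: create binary attributes from nominal attributes (one-hot encoding).
df = pd.get_dummies(df, columns=df.select_dtypes("object").columns.tolist())
```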

The Basics

Basic Ideas
- We are usually faced with the problem of selecting a subset of possible predictors, and have to balance conflicting objectives:
  - include all variables that have legitimate predictive skill;
  - exclude all extraneous variables that fit only sample-specific noise.
- Extraneous variables reduce predictive skill and increase the standard errors of regression coefficients (and the analogous quantities in classification, etc.).
- Ideally we would be able to determine the single best subset of predictors to include.
- But there is no single definition of "best": different algorithms will produce different "best" subsets, and the problems are magnified by correlation among predictors. A small demonstration of the noise problem follows this slide.
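A small synthetic demonstration of the point about sample-specific noise; the dataset, sizes, and seed are all invented for illustration (scikit-learn, not part of the slides).

```python
# Fitting all 50 predictors (45 of them pure noise) vs. only the 5
# informative ones: the noise variables cost out-of-sample R^2.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# shuffle=False keeps the 5 informative predictors in the first 5 columns.
X, y = make_regression(n_samples=80, n_features=50, n_informative=5,
                       noise=10.0, shuffle=False, random_state=0)

r2_all = cross_val_score(LinearRegression(), X, y, cv=5).mean()
r2_sub = cross_val_score(LinearRegression(), X[:, :5], y, cv=5).mean()
print(f"all 50 predictors: {r2_all:.2f}   informative 5 only: {r2_sub:.2f}")
```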

Feature Selection

Ranking
- Rank attributes by some objective (for example, information gain); a ranking sketch follows this slide.

Subset selection
- Algorithms (see next slide).
- Wrappers (try subsets within the context of the algorithm you know you are going to use).
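A minimal ranking sketch using mutual information, an information-gain-style criterion; the breast-cancer dataset and the scikit-learn scorer are illustrative choices, not from the slides.

```python
# Score every attribute against the class by mutual information, then rank.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

data = load_breast_cancer()
scores = mutual_info_classif(data.data, data.target, random_state=0)
ranking = sorted(zip(data.feature_names, scores),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking[:5]:  # five highest-ranked attributes
    print(f"{name}: {score:.3f}")
```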

Feature Selection Algorithms

All possible subsets
- Only feasible with a small number of potential predictors (maybe 10 or fewer).
- Can then use one or more of the possible numerical criteria to find the overall best subset.

Forward stepwise regression
- Start with no predictors.
- First include the predictor with the highest correlation with the response.
- In subsequent steps, add the predictor with the highest partial correlation with the response, controlling for the variables already in the equation.
- Stop when a numerical criterion signals a maximum (minimum).
- Sometimes eliminate variables when their t value gets too small.
- The only feasible method for very large predictor pools.

Backward elimination
- Start with all predictors in the equation.
- Remove the predictor with the smallest t value.
- Continue until a numerical criterion signals a maximum (minimum).
- Often produces a different final model than the forward stepwise method.

Both stepwise methods perform local optimization at each step, with no guarantee of finding the overall optimum. A sketch of both procedures follows this slide.
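A sketch of both greedy procedures via scikit-learn's SequentialFeatureSelector; note it adds or removes predictors by cross-validated score rather than the correlations and t values described above, and the diabetes dataset and subset size of 4 are arbitrary choices.

```python
# Greedy forward and backward selection; the two directions often
# end up with different "best" subsets, as the slide warns.
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)

for direction in ("forward", "backward"):
    sfs = SequentialFeatureSelector(LinearRegression(),
                                    n_features_to_select=4,
                                    direction=direction, cv=5)
    sfs.fit(X, y)
    print(direction, "->", sfs.get_support(indices=True))
```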

Multicollinearity (regression)
- Multicollinearity is the degree of correlation among the X variables.
- A high degree of multicollinearity produces unacceptable uncertainty (large variance) in regression coefficient estimates, i.e., large sampling variation.
- Estimates of the slopes are imprecise, and even the signs of the coefficients may be misleading.
- t-tests may fail to reveal significant factors.
- The analysis of variance for the overall model may show a highly significant fit when, paradoxically, the tests for the individual predictors are non-significant.
A variance-inflation-factor check is sketched below.
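A minimal check for the problem above using variance inflation factors from statsmodels; the synthetic near-duplicate predictor and the common VIF > 10 rule of thumb are illustrative assumptions.

```python
# x2 is nearly a copy of x1, so both get huge VIFs; x3 stays near 1.
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
df = pd.DataFrame({"x1": x1,
                   "x2": x1 + rng.normal(scale=0.05, size=200),
                   "x3": rng.normal(size=200)})

X = add_constant(df)  # intercept column so each VIF regression has one
for i, col in enumerate(df.columns, start=1):  # index 0 is the constant
    print(col, round(variance_inflation_factor(X.values, i), 1))
```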

