ADVICE ABOUT PRACTICAL
ASPECTS OF ML
Jesse Davis
Goals of this Lecture: Address Practical Aspects of Machine Learning
Massaging the data for better performance
Discussing how to set up an appropriate empirical evaluation
Identifying potential pitfalls
At a high level: a bunch of stuff I wish I had known for
Performing academic empirical evaluations
Dealing with real-world “applied” tasks
Part I: Selecting Features
Dimensionality Reduction
Represent data with fewer dimensions! ☺
Effectively: Alter the given feature space
Two broad ways
Construct new feature space
Simply drop dimensions in the given space
Why Dimensionality Reduction?
Easier learning – fewer parameters
What if |Features| ≫ |training examples|?
Better visualization
Hard to understand more than 3D or 4D
Discover “intrinsic dimensionality” of data
High dimensional data may truly be low dimensional
More interpretable models
Interested in which features are relevant for task
Improve efficiency
Fewer features = less memory / runtime
Don’t Some Algorithms Do This?
Decision trees:
Select the most promising feature at each node
Tree only contains a subset of features
Problem: Irrelevant attributes can degrade performance due to
data fragmentation
Data is split into smaller and smaller sets
Even a random attribute can look good by chance when little data remains
More data does not help
Principal Component Analysis
First principal component:
Direction of the largest variance
Each subsequent principal component:
Orthogonal to the previous ones, and
Direction of the largest variance of the residuals
[Figure: 2D data in the (x1, x2) plane with the first principal component direction u1]
Big Idea: Rotate the axes and drop irrelevant ones!
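To make this concrete, here is a minimal PCA sketch in Python (NumPy only; the data matrix X and the number of components k below are placeholder assumptions):

import numpy as np

def pca(X, k):
    # Center the data so the principal components pass through the mean.
    X_centered = X - X.mean(axis=0)
    # SVD of the centered matrix: rows of Vt are the principal directions,
    # sorted by decreasing variance (singular value).
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]            # top-k rotated axes
    Z = X_centered @ components.T  # N x k projection of the data
    return Z, components

# Hypothetical example: 100 images of 50x50 pixels reduced to 15 dimensions.
X = np.random.randn(100, 2500)
Z, components = pca(X, k=15)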
Eigenfaces [Turk, Pentland ’91]
Input images:
N images
Each 50×50 pixels = 2500 features
The figure is misleading: best to think of the data as an N × 2500 matrix, i.e., |Examples| × |Features|
Reduce Dimensionality 2500 → 15
[Figure: the average face, the first principal component, and the other eigenface components]
Problematic Data Set for PCA
PCA cannot capture NON-LINEAR structure!
PCA Conclusions
PCA
Rotate the axes and sort new dimensions in order of “importance”
Discard low significance dimensions
Uses:
Get compact description
Ignore noise
Improve classification (hopefully)
Not magic:
Doesn’t know class labels
Can only capture linear variations
One of many tricks to reduce dimensionality!
Feature Selection: Two Approaches
Filtering-based feature selection: all features → FS algorithm scores and ranks each feature and picks the top k → ML algorithm → model
Wrapper-based feature selection: all features → FS algorithm calls the ML algorithm many times and uses it to help select features → ML algorithm → model
Filter-Based Approaches
Idea: Measure each feature’s usefulness in isolation (i.e.,
independent of other features)
Pro: Very fast so scales to large feature sets or large data sets
Cons
Misses feature interactions
May select many redundant features
Approach 1: Correlation
Information gain (as used in decision trees) is one possible filter score:
Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)
Correlation between a feature f_i and the label y is another:
R(f_i, y) = cov(f_i, y) / √(var(f_i) · var(y))
Estimated from m training examples:
R(f_i, y) = Σ_{k=1}^{m} (f_{k,i} − f̄_i)(y_k − ȳ) / √( Σ_{k=1}^{m} (f_{k,i} − f̄_i)² · Σ_{k=1}^{m} (y_k − ȳ)² )
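A minimal sketch of this correlation score in NumPy (X is an m × d feature matrix and y the label vector; both are placeholders):

import numpy as np

def correlation_filter(X, y, k):
    # Pearson correlation of each feature column with the label,
    # computed exactly as in the estimate above.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = Xc.T @ yc
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = num / den
    # Rank by absolute correlation and keep the indices of the top-k features.
    return np.argsort(-np.abs(r))[:k]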
Approach 2: Single Variable Classifier
Select variables according to their individual predictive performance
Build classifier with just one variable
Discrete:
Decision stump
Continuous: Threshold the variable value
Measure performance using accuracy, balanced accuracy, AUC, etc.
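A minimal sketch of this scoring for continuous features, assuming binary labels; using the raw feature value as the classifier's score, AUC measures how well the feature alone ranks positives above negatives:

import numpy as np
from sklearn.metrics import roc_auc_score

def single_variable_auc(X, y):
    scores = []
    for j in range(X.shape[1]):
        auc = roc_auc_score(y, X[:, j])   # feature value itself as the score
        scores.append(max(auc, 1 - auc))  # handle inversely related features
    return np.array(scores)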
Wrapper-Based Feature Selection
Feature selection = search
State = set of features
Start state
Forward selection: Empty feature set
Backward elimination: Full feature set
Operators:
Forward: add a feature
Backward: subtract a feature
Scoring function: Learned model's performance (on training data, a tuning set, or CV) with the state's feature set
Forward Feature Selection
Greedy search (aka “Hill Climbing”)
{}  50%
{F1}  62%    {F2}  72%    ...    {Fd}  52%
Add the best single feature, F2, then score its extensions (e.g., add F3):
{F1,F2}  74%    {F2,F3}  73%    ...    {F2,Fd}  84%
Backward Feature Selection
Greedy search (aka “Hill Climbing”)
{F1,…,Fd}  75%
{F2,…,Fd}  72%    {F1,F3,…,Fd}  82%    ...    {F1,…,Fd-1}  78%
Subtract the least useful feature, F2, then score further removals (e.g., subtract F3):
{F3,…,Fd}  80%    {F1,F4,…,Fd}  83%    ...    {F1,F3,…,Fd-1}  81%
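A minimal sketch of this greedy search, assuming a scikit-learn-style estimator, a NumPy feature matrix, and cross-validated accuracy as the scoring function:

import numpy as np
from sklearn.model_selection import cross_val_score

def greedy_forward_selection(model, X, y, max_features):
    # Hill climbing: repeatedly add the feature that most improves CV score.
    selected, remaining, best_score = [], list(range(X.shape[1])), -np.inf
    while remaining and len(selected) < max_features:
        score, f = max((cross_val_score(model, X[:, selected + [f]], y).mean(), f)
                       for f in remaining)
        if score <= best_score:  # local optimum: no addition helps
            break
        best_score = score
        selected.append(f)
        remaining.remove(f)
    return selected, best_score

Backward elimination is the mirror image: start from the full set and repeatedly drop the feature whose removal hurts the score least. scikit-learn ships both directions as sklearn.feature_selection.SequentialFeatureSelector.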
Forward vs. Backward Selection
Forward:
Faster in early steps because fewer features to test
Fast for choosing a small subset of the features
Misses features whose usefulness requires other features (feature synergy)
Backward:
Fast for choosing all but a small subset of the features
Preserves features whose usefulness requires other features (e.g., area requires both length & width)
Impact of Feature Selection on Classification of fMRI Data [Pereira et al. ’05]
Feature Selection vs. Dimensionality Reduction
Feature selection: Project onto a lower-dimensional subspace perpendicular to the removed feature
Dimensionality reduction: Allows other kinds of projection
[Figure: left, feature selection drops x2; right, dimensionality reduction projects onto rotated axes]
Feature Selection in Practice
You cannot globally select the best features
This is cheating
Data leakage from test set to training set
Results would be overoptimistic
Feature selection must be performed separately for each fold
Implication: Each fold could have a different feature set
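With scikit-learn, the easiest way to respect this rule is to put feature selection inside a Pipeline, so it is re-fit on each fold's training data only (SelectKBest, k=20, and logistic regression are placeholder choices; X and y are your data):

from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=20)),    # fit on the training fold only
    ("clf", LogisticRegression(max_iter=1000)),
])
# cross_val_score re-fits the whole pipeline per fold, so the selected
# features can differ from fold to fold, exactly as the slide implies.
scores = cross_val_score(pipe, X, y, cv=10)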
Part II: Advice for Evaluation
Empirical Evaluation: Think about What You Want to Demonstrate
Many relevant questions:
Do we beat competitors?
Are we more data efficient than the competition?
Are we faster than the competition?
Good practices:
Pose a question / hypothesis and answer it
Also include a naive baseline such as
◼ Always predict majority class
◼ Return mean value in training data
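scikit-learn's dummy estimators implement exactly these naive baselines; a minimal sketch:

from sklearn.dummy import DummyClassifier, DummyRegressor

majority = DummyClassifier(strategy="most_frequent")  # always predict the majority class
mean_reg = DummyRegressor(strategy="mean")            # always return the training-set mean
# Fit and score these like any other model to get a floor for comparison.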
Case Study: RPE for Professional Soccer Players
Given: GPS and accelerometer data from a player's training session
Predict: Player's Rate of Perceived Exertion (RPE)
Question: Is the model valid across seasons?
[Figure: MAE (0.00–1.20) of the train-set average, a neural net, and LASSO]
Results: Is an Individual Model More Accurate Than a Team Model?
[Figure: mean absolute error (0.65–0.90) of a neural net, a boosted tree, and LASSO, trained per individual vs. for the whole team; lower is better]
How Does Amount of Data Affect Performance?
[Figure: AUCPR (0.00–0.50) of TODTLER, DTM, LSM, and Random vs. the number of training databases (1–3)]
Learning curve: Show performance as a function of the amount of training data
Case Study: Activity Recognition
Given: 3D accelerometer data from a phone
Predict: Person’s activity (walking, ascending stairs, descending
stairs, cycling, jogging)
Hypothesis: Deriving new signals will help
Setup: Simulate different attachments by
rotating axes
Approaches compared:
TSFuse + GBT
TSFresh + GBT (Time series features, but no fusion)
RNN (LSTM)
Results: Activity Recognition
[Figure: results for TSFuse, TSFresh, and the RNN on the activity recognition task]
Case Study: Energy Efficient Prediction
Motivation: Learned models often deployed on devices with
resource constraints (e.g., battery)
Question: How does feature selection strategy affect performance?
Static selection: Always consider k features
Dynamic selection: May ignore some features
Approach: Fix max feature budget
RCV: Speedup and Weighted Accuracy vs. Feature Budget
Our approach: 4X more predictions on the same resource budget
[Figure: speedup factor (0.00–6.00) and Δ weighted accuracy (−0.01 to 0.02) vs. feature budget (0–1000), comparing IG and ΔCP]
Comparing Run Times Is A Dark Art
What to measure: Wall clock or CPU time?
Be sure to run everything on identically configured machines
Should you include time to tune models?
Easy to manipulate
Also very relevant…
Differences due to
Programming languages
How optimized the code is (definitely relevant)
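In Python, the two measurements contrasted above map onto two different clocks; a minimal sketch, where train() is a hypothetical stand-in for whatever you are benchmarking:

import time

wall_start = time.perf_counter()  # wall clock: includes I/O, waiting, other processes
cpu_start = time.process_time()   # CPU time consumed by this process only

train()                           # hypothetical workload

print(f"wall: {time.perf_counter() - wall_start:.2f}s, "
      f"cpu: {time.process_time() - cpu_start:.2f}s")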
Evaluate Design Decisions: Ablation or Lesion Study
When designing your algorithm / model, you make many design choices
Which features
Which normalizations
Which functionality
Ablative analysis tries to explain the difference between some
baseline (much poorer) performance and current performance
Remove aspects of the system and measure the effect on performance
Case Study: Fatigue Protocol Data
Rating of perceived exertion (RPE): 6 – 20
[Figure: IMU sensor placements on a runner: upper arm, wrist, tibia, or both]
Given: IMU data from a runner
Predict: Current fatigue level
Pre-processing: Normalizations Based on Domain
Knowledge
RPE evolution is trial-dependent: Normalize to the first value
Normalize features based on the change from the first window
Domain insight: Change in feature values over time is key
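A minimal NumPy sketch of this normalization, assuming features is a (windows × features) array for a single trial with the first window in row 0:

import numpy as np

def normalize_to_first_window(features):
    # Express each feature as the change relative to its value in the first
    # window, so the model sees within-trial evolution rather than absolute,
    # trial-dependent magnitudes.
    return features - features[0]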
Effects of Feature Normalization for Gradient Boosted Trees
No-learning baselines (constant predictions): median RPE and personalized median
[Figure: MAE of RPE (0.00–3.50) for the median RPE baseline, the personalized median baseline, and gradient boosted trees with and without normalization]
Case Study: Resource Monitoring
Univariate measurement: Sampled every 5 minutes
[Figure: water usage time series showing maintenance periods and abnormally high usage patterns]
Given: Real water usage data from a retail store
Do: Detect periods of abnormally high usage
Approach: Semi-supervised learning
Simple statistical features, day of week, etc.
Above features plus learned shape patterns
Results: Anomaly Detection for Water Usage
[Figure: area under ROC curve over time for simple features vs. simple features + learned patterns]
Part III: Potential Problems or Pitfalls
Cross Validation Errors
Must repeat entire data processing pipeline on every fold of
cross-validation using only that fold’s TRAINING DATA
E.g., cannot do preprocessing over the entire data set (feature selection, parameter tuning, etc.)
Did I tweak my algorithm a million times until I got good results?
Solution: Use one or two datasets for development, then expand the evaluation
Temporal dependencies in the data?
Temporal Data Is Trickier!
Setting: One season of data from training sessions of a professional football team
Season start ──────────────────── Season end
Training: first 80% of data | Testing: last 20% of data
Another temporal setting: Predict adverse drug reactions
Patient's history → First prescription → Adverse reaction?
Training data | Censoring window
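For the train-on-the-past, test-on-the-future splits sketched above, scikit-learn's TimeSeriesSplit yields chronologically growing training folds; a minimal sketch (model, X, and y are placeholders, with rows assumed to be in time order):

from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Each split trains on an initial segment of the season and tests on the
# segment that follows it, so no future data leaks into training.
scores = cross_val_score(model, X, y, cv=TimeSeriesSplit(n_splits=5))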
Class Imbalance
Real-world problems: Often more examples of one class
(negatives) than the other (positives)
One class rare: Anomaly detection, cancer, goals in a soccer
match, etc.
This causes difficulties for learners: Hard to beat always
predicting the majority class!
Idea 1: Sampling
Oversample the minority class: May lead to overfitting
Undersample the majority class: Odd to throw away data
SMOTE: Generate synthetic minority examples
Find nearest neighbors
Interpolate between them
[Figure: SMOTE generates a synthetic example by interpolating between a minority example and one of its nearest minority-class neighbors]
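A minimal sketch of the interpolation step itself (the imbalanced-learn package provides a full implementation as imblearn.over_sampling.SMOTE):

import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_samples(X_minority, n_new, k=5, seed=0):
    # For each synthetic example: pick a minority point, pick one of its k
    # nearest minority neighbors, and interpolate a point between the two.
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)
    _, idx = nn.kneighbors(X_minority)       # idx[:, 0] is the point itself
    new = []
    for _ in range(n_new):
        i = rng.integers(len(X_minority))
        j = idx[i, rng.integers(1, k + 1)]   # a random true neighbor
        lam = rng.random()                   # interpolation weight in [0, 1]
        new.append(X_minority[i] + lam * (X_minority[j] - X_minority[i]))
    return np.array(new)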
Idea 2: Manipulate the Learner
Change the cost function: Penalize mistakes on minority class
more heavily
Optimize towards something that is better at capturing skew
Balanced accuracy = 0.5 × TP / (TP + FN) + 0.5 × TN / (FP + TN)
F1 = 2 × (precision × recall) / (precision + recall)
ROC
Precision / Recall
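Both ideas are one-liners in scikit-learn: class_weight re-weights the training loss toward the minority class, and the skew-aware metrics above are available directly (a sketch with placeholder train/test splits):

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, f1_score

# 'balanced' weights each class inversely to its frequency, so mistakes
# on the minority class cost more in the loss being optimized.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(balanced_accuracy_score(y_test, y_pred), f1_score(y_test, y_pred))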
My Model Is Not Accurate Enough
Suppose: Activity recognition with classes walking, running, ascending stairs, and descending stairs
Five minutes of data from ten subjects
Divide data into 5 second windows which yields 600 examples
Use five simple features from X, Y, Z acceleration
Train linear separator using log loss
Optimize using gradient descent
Leave-one-subject-out CV: 70% accuracy
Question: What do I do?
Possible fixes
More data
More / better features
Change optimizer
Change objective function
Change model class
Question: What do I do?
Option 1: Grad student descent and try everything
Option 2: Debug the learning process
Look at Learning Curve
[Figure: two learning curves plotting train and test error vs. #training examples]
Left: train and test error are both high and close → more/better features, or a more expressive model?
Right: train error is low but test error is high → more data
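scikit-learn can compute these curves directly; a minimal sketch (model, X, and y are placeholders, and the returned values are scores, so error = 1 − score):

import numpy as np
from sklearn.model_selection import learning_curve

sizes, train_scores, test_scores = learning_curve(
    model, X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5)
# Compare the mean train vs. test curves to pick a fix, as diagnosed above.
print(sizes, train_scores.mean(axis=1), test_scores.mean(axis=1))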
Conclusions
Feature selection is important in practice
Think about what you want to show in your empirical evaluation
Practical issues are hard
It is a lot of guess and check at first
Eventually you develop intuitions
Generally speaking: Features and data are more important than the model
Questions?