Applied Machine Learning: Supervised Machine Learning (Part 2)
APPLIED MACHINE
LEARNING IN PYTHON
Random Forests
Kevyn Collins-Thompson
Associate Professor of Information & Computer Science
University of Michigan
Random Forests
• An ensemble of trees, not just one tree.
• Widely used, very good results on many problems.
• sklearn.ensemble module:
– Classification: RandomForestClassifier
– Regression: RandomForestRegressor
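• One decision tree → prone to overfitting.
• Many decision trees → more stable, better generalization.
• Ensemble of trees should be diverse: introduce random
variation into tree-building (each tree is trained on a bootstrap sample of the data and considers a random subset of features at each split).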
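A minimal scikit-learn sketch of these points, assuming the bundled breast cancer dataset as stand-in data. It fits one unpruned decision tree and a random forest on the same split so their train/test gaps can be compared. `n_estimators` sets the number of trees, and `max_features='sqrt'` restricts each split to a random subset of features, one way of introducing the random variation described above. The parameter values are illustrative, not tuned.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: scikit-learn's bundled breast cancer dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One fully grown tree: tends to fit the training data perfectly
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Many trees, each built from a bootstrap sample with a random subset of
# features considered at every split; predictions are averaged across trees
forest = RandomForestClassifier(n_estimators=100, max_features='sqrt',
                                random_state=0).fit(X_train, y_train)

for name, model in [('single tree', tree), ('random forest', forest)]:
    print(f"{name}: train={model.score(X_train, y_train):.3f}, "
          f"test={model.score(X_test, y_test):.3f}")
```

For regression problems, `RandomForestRegressor` takes the same `n_estimators` and `max_features` parameters.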
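Gradient Boosted Decision Trees (GBDT)
[Figure: an ensemble built by adding trees in sequence: tree + tree + tree + …]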
• The learning rate controls how hard each new tree tries to
correct the remaining mistakes from the previous round.
• High learning rate: each tree corrects more aggressively, giving a more complex model.
• Low learning rate: gentler corrections, giving a simpler model.
[Figure: GBDT decision regions on the two-feature fruits dataset]
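To make the learning-rate idea concrete, here is a hand-written sketch of gradient boosting for regression with squared-error loss, using scikit-learn's `DecisionTreeRegressor` as the small tree. It is not how `GradientBoostingClassifier` is implemented internally; the helper `boost`, the sine-curve data, and the choice of 20 depth-2 trees are illustrative assumptions. Each round fits a new tree to the current residuals (the remaining mistakes) and adds its predictions scaled by `learning_rate`.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def boost(X, y, n_estimators, learning_rate, max_depth=2):
    """Hypothetical helper: minimal gradient boosting with squared-error loss."""
    pred = np.full(len(y), y.mean())  # round 0: a constant prediction
    trees = []
    for _ in range(n_estimators):
        # Remaining mistakes; for squared loss this is the negative gradient (up to a constant)
        residual = y - pred
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0).fit(X, residual)
        # learning_rate scales how much of this tree's correction is applied
        pred = pred + learning_rate * tree.predict(X)
        trees.append(tree)
    return trees, pred


# Illustrative data: a smooth 1-D curve
X = np.linspace(0, 6, 200).reshape(-1, 1)
y = np.sin(X).ravel()

train_mse = {}
for lr in (0.1, 1.0):
    _, pred = boost(X, y, n_estimators=20, learning_rate=lr)
    train_mse[lr] = np.mean((y - pred) ** 2)
    print(f"learning_rate={lr}: training MSE after 20 trees = {train_mse[lr]:.4f}")
```

With the same number of trees, the higher learning rate drives the training error down much faster: each tree corrects more aggressively, which produces a more complex fit and a higher risk of overfitting, while a low rate usually needs more trees.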
GBDT: GradientBoostingClassifier
Key Parameters
• n_estimators: sets the number of small decision trees
(weak learners) in the ensemble.
• learning_rate: controls the emphasis on fixing errors
from the previous iteration.
• The above two are typically tuned together.
• n_estimators is usually set first, to make the best use of the
memory and CPUs available for training; the other parameters are tuned afterwards.
• max_depth is typically set to a small value (e.g. 3-5) for
most applications.
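A short scikit-learn sketch of these parameters, again assuming the bundled breast cancer dataset as example data. It compares the library defaults (`n_estimators=100`, `learning_rate=0.1`, `max_depth=3`) with a lower learning rate and shallower trees; the second setting is an illustrative choice, not a recommendation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
# (learning_rate, max_depth): scikit-learn's defaults, then a more conservative setting
for learning_rate, max_depth in [(0.1, 3), (0.01, 2)]:
    clf = GradientBoostingClassifier(n_estimators=100,
                                     learning_rate=learning_rate,
                                     max_depth=max_depth,
                                     random_state=0)
    clf.fit(X_train, y_train)
    train_acc, test_acc = clf.score(X_train, y_train), clf.score(X_test, y_test)
    scores[(learning_rate, max_depth)] = (train_acc, test_acc)
    print(f"learning_rate={learning_rate}, max_depth={max_depth}: "
          f"train={train_acc:.3f}, test={test_acc:.3f}")
```

The lower learning rate with shallower trees typically fits the training set less closely; whether it generalizes better depends on the data, so settings like these should be compared on held-out data or with cross-validation.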
Neural networks
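Linear regression: $\hat{y} = \hat{b} + \hat{w}_1 \cdot x_1 + \cdots + \hat{w}_n \cdot x_n$
Logistic regression: $\hat{y} = \operatorname{logistic}(\hat{b} + \hat{w}_1 \cdot x_1 + \cdots + \hat{w}_n \cdot x_n)$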
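As a worked check of the two formulas, the sketch below evaluates them with NumPy for three features. The weights, intercept, and inputs are made-up numbers chosen only to keep the arithmetic easy to follow; they do not come from a trained model.

```python
import numpy as np


def logistic(z):
    # Logistic (sigmoid) function: squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))


x = np.array([1.0, 2.0, 3.0])        # features x_1, x_2, x_3 (made-up)
w_hat = np.array([0.5, -0.25, 0.1])  # weights w_1..w_3 (made-up)
b_hat = 0.2                          # intercept b (made-up)

z = b_hat + np.dot(w_hat, x)         # b + w_1*x_1 + w_2*x_2 + w_3*x_3
y_linear = z                         # linear regression output
y_logistic = logistic(z)             # logistic regression output

print(f"{y_linear:.4f}")    # → 0.5000
print(f"{y_logistic:.4f}")  # → 0.6225
```

The only difference between the two models is the final nonlinearity applied to the same weighted sum.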
Activation Functions
[Figure panels: activation = 'tanh', activation = 'relu']
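A sketch of the activation functions and of how they are selected in scikit-learn's `MLPClassifier` through its `activation` parameter (which accepts `'identity'`, `'logistic'`, `'tanh'`, and `'relu'`). The `make_moons` data, the two hidden layers of 10 units, and `max_iter=2000` are illustrative assumptions, not the settings behind the figures.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


# Three common nonlinear activation functions, applied elementwise
def relu(z):
    return np.maximum(0.0, z)         # 0 for negative inputs, identity otherwise


def tanh(z):
    return np.tanh(z)                 # output in (-1, 1)


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))   # output in (0, 1)


z = np.array([-2.0, 0.0, 2.0])
for f in (relu, tanh, logistic):
    print(f.__name__, np.round(f(z), 3))

# Same network architecture, two different activation functions
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for activation in ('tanh', 'relu'):
    mlp = MLPClassifier(hidden_layer_sizes=(10, 10), activation=activation,
                        max_iter=2000, random_state=0)
    mlp.fit(X_train, y_train)
    scores[activation] = mlp.score(X_test, y_test)
    print(f"activation={activation!r}: test accuracy={scores[activation]:.3f}")
```

'relu' is the `MLPClassifier` default; 'tanh' tends to give smoother decision boundaries, while 'relu' produces piecewise-linear ones.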
Deep Learning
(Optional overview)
Image: M. Peemen, B. Mesman, and H. Corporaal. Efficiency Optimization of Trainable Feature Extractors for a Consumer Platform.
Proceedings of the 13th International Conference on Advanced Concepts for Intelligent Vision Systems, 2011.
Deep Learning
First, second, and third feature layer bases learned for faces
Image: Honglak Lee and colleagues (2011) from “Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief
Networks”. Communications of the ACM, Vol. 54 No. 10, Pages 95-103
Deep Learning
[Figure panels: second layer, third layer]
• TensorFlow https://www.tensorflow.org/
• Theano http://deeplearning.net/software/theano/
Data Leakage
Data Leakage
• When the data you're using to train contains
information about what you're trying to predict.
• Introducing information about the target during
training that would not legitimately be available during
actual use.
• Obvious examples:
– Including the label to be predicted as a feature
– Including test data with training data
• If your model performance is too good to be true, it
probably is, most likely because of "giveaway" features.