01 ML-Overview Slides
Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching/stat479-fs2018/
About this Course
When
• Tue 8:00-9:15 am
• Thu 8:00-9:15 am
Where
• SMI 331
Office Hours
• Sebastian Raschka:
Tue 3:00-4:00 pm, Room MSC 1171
• Shan Lu (TA):
Wed 3:00-4:00 pm, Room MSC B248
It is said that the term machine learning was first coined by Arthur Lee Samuel in 1959¹.

One quote that appears in almost every introductory machine learning resource is accredited to Samuel, a pioneer of the field of AI:

"[Machine learning is the] field of study that gives computers the ability to learn without being explicitly programmed."
— Arthur L. Samuel, AI pioneer, 1959

(This is likely not an original quote but a paraphrased version of Samuel's sentence "Programming computers to learn from experience should eventually eliminate the need for much of this detailed programming effort.")

Machine learning is turning data into programs.

"The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience."
— Tom Mitchell, former chair of the Machine Learning department of Carnegie Mellon University

¹ Arthur L. Samuel. "Some studies in machine learning using the game of checkers". In: IBM Journal of Research and Development 3.3 (1959), pp. 210–229.
Sebastian Raschka, STAT 479: Machine Learning, FS 2018
“A breakthrough in machine learning would be worth ten Microsofts”
— Bill Gates, Microsoft Co-Founder
"A computer program is said to learn from experience E with respect to some class of tasks T and a performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
— Tom Mitchell, Professor at Carnegie Mellon University

Example (handwritten word recognition):
• Training experience E: a database of handwritten words with given classifications
• Performance measure P: percent of words correctly classified
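The handwriting example pins down P concretely: it is the fraction of test words the program labels correctly. A minimal sketch in Python (the function name and toy data are illustrative, not from the course):

```python
def performance_p(predicted, actual):
    """Performance measure P: percent of words correctly classified."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

# Toy stand-in for experience E: five handwritten words with given labels
predicted = ["cat", "dog", "cat", "bird", "dog"]
actual    = ["cat", "dog", "dog", "bird", "dog"]
print(performance_p(predicted, actual))  # 80.0
```

As the program processes more labeled words (experience E), this number should rise; that improvement is learning in Mitchell's sense.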
1.2 Applications of Machine Learning
The three broad categories of machine learning:

• Supervised learning: labeled data; direct feedback; predict outcome/future
• Unsupervised learning: no labels/targets; no feedback; find hidden structure in data
• Reinforcement learning: decision process; reward system; learn a series of actions
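The first two categories can be contrasted in a few lines of Python: labels are available only in the supervised case (toy data and threshold are invented for illustration; reinforcement learning is omitted since it requires an interactive environment):

```python
data = [1.0, 1.2, 3.9, 4.1]

# Supervised: labeled data and direct feedback -> predict an outcome
labels = ["small", "small", "large", "large"]
def predict(x, threshold=2.5):  # threshold would be learned from (data, labels)
    return "small" if x < threshold else "large"

# Unsupervised: no labels -> find hidden structure (two clusters here)
clusters = {
    "low":  [x for x in data if x < 2.5],
    "high": [x for x in data if x >= 2.5],
}
print(predict(3.0), clusters["low"])  # large [1.0, 1.2]
```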
(Diagram: reinforcement learning loop. The agent takes an action in the environment; the environment returns a new state and a reward to the agent.)
Classification: h : ℝ^m → ___
Regression: h : ℝ^m → ___
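The difference between the two blanks is the codomain of the hypothesis h: a finite label set for classification, the real numbers for regression. A sketch with made-up weights and m = 2 features:

```python
# Classification: h maps R^m into a finite label set, here {0, 1}
def h_classify(x):
    return 1 if sum(x) > 0 else 0

# Regression: h maps R^m into the real numbers
def h_regress(x, w=(0.5, -0.25), b=1.0):
    return b + sum(w_j * x_j for w_j, x_j in zip(w, x))

print(h_classify([2.0, -0.5]))  # 1
print(h_regress([2.0, -0.5]))   # 2.125
```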
(Diagram: supervised learning workflow. A learning algorithm is fed the training dataset together with its labels and produces a final model; the final model then predicts labels for new data. Model selection involves cross-validation, performance metrics, and hyperparameter optimization.)
Feature vector (one training example):

x = [x_1, x_2, ..., x_m]^T

Design matrix (one feature vector x^T per row):

X = [x_1^T; x_2^T; ...; x_n^T]

Target vector (one target per training example):

y = [y^[1], y^[2], ..., y^[n]]^T

m = _____
n = _____
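In code, the design matrix is just the n feature vectors stacked as rows, with one target per row. A small sketch with invented numbers (n = 3 training examples, m = 2 features):

```python
# Design matrix X: n rows (training examples) x m columns (features)
X = [[5.1, 3.5],   # x_1^T as row 1
     [4.9, 3.0],   # x_2^T
     [6.2, 3.4]]   # x_3^T
y = [0, 0, 1]      # target vector: one label y^[i] per training example

n = len(X)         # number of training examples
m = len(X[0])      # number of features
print(n, m)        # 3 2
```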
(Diagram: nested hypothesis spaces)
• Hypothesis space that a particular learning algorithm category has access to
• Hypothesis space that a particular learning algorithm can sample
• A particular hypothesis (i.e., a model/classifier)
sepal length < 5 cm | sepal width < 5 cm | petal length < 5 cm | petal width < 5 cm | Class Label
True                | True               | True                | True               | Setosa
True                | True               | True                | False              | Versicolor
True                | True               | False               | True               | Setosa
...                 | ...                | ...                 | ...                | ...
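Each completed table of this kind is one particular hypothesis from the space of all rule tables over the four binary feature tests. A sketch that hard-codes the three rows shown above (thresholds and labels follow the table; the remaining rows are omitted):

```python
# One particular hypothesis: a lookup over the four binary feature tests
rules = {
    (True, True, True, True):  "Setosa",
    (True, True, True, False): "Versicolor",
    (True, True, False, True): "Setosa",
    # ... the remaining 13 combinations would complete the table
}

def h(sepal_len, sepal_wid, petal_len, petal_wid):
    key = (sepal_len < 5, sepal_wid < 5, petal_len < 5, petal_wid < 5)
    return rules.get(key, "unknown")

print(h(4.8, 3.1, 1.6, 0.2))  # all four tests True -> Setosa
```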
• Ensembles (e.g.,
• Maximize information gain / minimize child node impurities (CART decision tree classification)
• Minimize a mean squared error cost (or loss) function (CART decision tree regression, linear regression, adaptive linear neurons, ...)
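As a concrete instance of the second bullet, here is plain gradient descent minimizing a mean-squared-error loss for a one-dimensional linear model (the toy data, learning rate, and iteration count are arbitrary choices for illustration):

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 4.0, 6.2, 7.9]      # roughly y = 2x

w, b, lr = 0.0, 0.0, 0.01
n = len(xs)
for _ in range(2000):
    # gradients of MSE = (1/n) * sum((w*x + b - y)^2) w.r.t. w and b
    grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
    grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 1))             # close to the least-squares slope (about 2)
```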
0-1 loss:

L(ŷ, y) = 0 if ŷ = y
          1 if ŷ ≠ y

Test-set error (average 0-1 loss over n test examples):

ERR_Dtest = (1/n) Σ_{i=1}^{n} L(ŷ^[i], y^[i])
• ROC AUC
• Precision
• Recall
• (Cross) Entropy
• Likelihood
• Squared Error/MSE
• L-norms
• Utility
• Fitness
• ...
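Two of the listed metrics, precision and recall, reduce to simple counts over a test set; a sketch with toy labels invented for illustration:

```python
def precision_recall(y_hats, ys, positive=1):
    tp = sum(1 for p, t in zip(y_hats, ys) if p == positive and t == positive)
    fp = sum(1 for p, t in zip(y_hats, ys) if p == positive and t != positive)
    fn = sum(1 for p, t in zip(y_hats, ys) if p != positive and t == positive)
    return tp / (tp + fp), tp / (tp + fn)  # (precision, recall)

prec, rec = precision_recall([1, 1, 0, 1], [1, 0, 0, 1])
print(rec)  # 1.0: every true positive was found, at the cost of one false alarm
```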
• eager vs lazy;
• batch vs online;
• parametric vs nonparametric;
• discriminative vs generative.
• Neuroscientists:
(Diagram: Deep Learning ⊂ Machine Learning ⊂ AI)
Programming language "popularity" (TIOBE index):
https://www.tiobe.com/tiobe-index/
https://www.tiobe.com/tiobe-index/programming-languages-definition/
http://stat.wisc.edu/~sraschka/teaching/stat479-fs2018/#schedule