
Lecture 01

What is Machine Learning?



An Overview.

STAT 479: Machine Learning, Fall 2018

Sebastian Raschka

http://stat.wisc.edu/~sraschka/teaching/stat479-fs2018/

About this Course
When
• Tue 8:00-9:15 am
• Thu 8:00-9:15 am
Where
• SMI 331

Office Hours
• Sebastian Raschka:

Tue 3:00-4:00, Room MSC 1171
• Shan Lu (TA):

Wed 3:00-4:00 pm, Room MSC B248

For details -> http://stat.wisc.edu/~sraschka/teaching/stat479-fs2018/

Sebastian Raschka STAT 479: Machine Learning FS 2018 !2


What is Machine Learning?


Sebastian Raschka STAT 479: Machine Learning FS 2018 !3



“Machine learning is the hot new thing”


— John L. Hennessy, President of Stanford (2000–2016)

“A breakthrough in machine learning would be worth ten Microsofts”


— Bill Gates, Microsoft Co-Founder

Sebastian Raschka STAT 479: Machine Learning FS 2018 !4




1.1 Machine Learning – The Big Picture

We develop (computer) programs to automate various kinds of processes. Originally developed as a subfield of Artificial Intelligence (AI), one of the goals behind machine learning was to replace the need for developing computer programs "manually." If programs are a means to automate processes, we can think of machine learning as "automating automation." In other words, machine learning lets computers "create" programs (often for making predictions) themselves. Machine learning is turning data into programs.

It is said that the term machine learning was first coined by Arthur Lee Samuel in 1959 [1]. One quote that almost every introductory machine learning resource accredits to Samuel, a pioneer of the field of AI:

"Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed"

— Arthur L. Samuel, AI pioneer, 1959

(This is likely not an original quote but a paraphrased version of Samuel's sentence "Programming computers to learn from experience should eventually eliminate the need for much of this detailed programming effort.")

"The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience"

— Tom Mitchell, former chair of the Machine Learning department of Carnegie Mellon University

[1] Arthur L Samuel. "Some studies in machine learning using the game of checkers". In: IBM Journal of Research and Development 3.3 (1959), pp. 210–229.
“A breakthrough in machine learning would be worth ten Microsofts”
— Bill Gates, Microsoft Co-Founder

“If software ate the world, models will run it”


— Steven A. Cohen and Matthew W. Granade, The Wall Street Journal, 2018

Sebastian Raschka STAT 479: Machine Learning FS 2018 !9


(Figure 1: Machine learning vs. "classic" programming.)

To make this a bit more concrete, here is Tom Mitchell's quote from his Machine Learning book [2]:

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

— Tom Mitchell, Professor at Carnegie Mellon University

As an example, consider a handwriting recognition learning problem (from Mitchell's book):

• Task T: recognizing and classifying handwritten words within images
• Performance measure P: percent of words correctly classified
• Training experience E: a database of handwritten words with given classifications

1.2 Applications of Machine Learning

Email spam detection

[2] Tom M Mitchell et al. "Machine learning. 1997". In: Burr Ridge, IL: McGraw Hill 45.37 (1997), pp. 870–877.

Sebastian Raschka STAT 479: Machine Learning FS 2018 !10


Handwriting Recognition Example:

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

— Tom Mitchell, Professor at Carnegie Mellon University

• Task T: recognizing and classifying handwritten words within images
• Performance measure P: percent of words correctly classified
• Training experience E: a database of handwritten words with given classifications

Sebastian Raschka STAT 479: Machine Learning FS 2018 !11
Some Applications of Machine Learning (1):

Sebastian Raschka STAT 479: Machine Learning FS 2018 !12


Some Applications of Machine Learning (2):

Sebastian Raschka STAT 479: Machine Learning FS 2018 !13


Categories of Machine Learning

• Supervised Learning: labeled data; direct feedback; predict outcome/future

• Unsupervised Learning: no labels/targets; no feedback; find hidden structure in data

• Reinforcement Learning: decision process; reward system; learn series of actions

Sebastian Raschka STAT 479: Machine Learning FS 2018 !14


Supervised Learning: Classification

(Figure: example data in a two-dimensional feature space with axes x1 and x2.)
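To make the classification setting concrete, below is a minimal sketch (not part of the original slides; it assumes NumPy and scikit-learn purely for illustration): a labeled two-feature dataset and a linear classifier that learns a decision boundary and predicts class labels for new points.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy labeled data: two Gaussian blobs in the (x1, x2) plane, one per class.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(loc=-1.0, size=(50, 2)),   # class 0
               rng.normal(loc=+1.0, size=(50, 2))])  # class 1
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression().fit(X, y)              # learn from labeled examples (direct feedback)
print(clf.predict([[0.8, 1.2], [-1.1, -0.5]]))    # predicted class labels for new points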

Sebastian Raschka STAT 479: Machine Learning FS 2018 !15


Supervised Learning: Regression
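For the regression case, a minimal sketch (again illustrative only, assuming NumPy): fit a line y = w*x + b to noisy data with the least-squares solution, i.e., predict a continuous target rather than a class label.

import numpy as np

rng = np.random.RandomState(1)
x = rng.uniform(0, 10, size=100)
y = 2.5 * x + 1.0 + rng.normal(scale=1.0, size=100)   # noisy targets around a true line

A = np.column_stack([x, np.ones_like(x)])             # design matrix [x, 1]
w, b = np.linalg.lstsq(A, y, rcond=None)[0]           # least-squares estimates of slope and intercept
print(round(w, 2), round(b, 2))                       # recovered values close to 2.5 and 1.0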

Sebastian Raschka STAT 479: Machine Learning FS 2018 !16


Categories of Machine Learning

• Supervised Learning: labeled data; direct feedback; predict outcome/future

• Unsupervised Learning: no labels/targets; no feedback; find hidden structure in data

• Reinforcement Learning: decision process; reward system; learn series of actions

Sebastian Raschka STAT 479: Machine Learning FS 2018 !17


Unsupervised Learning -- Clustering

(Figure: unlabeled data points in a two-dimensional feature space with axes x1 and x2.)
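A minimal clustering sketch (illustrative, assuming scikit-learn): the data carry no labels, and k-means finds group structure on its own.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(2)
X = np.vstack([rng.normal(loc=-2.0, size=(50, 2)),
               rng.normal(loc=+2.0, size=(50, 2))])   # unlabeled points in the (x1, x2) plane

km = KMeans(n_clusters=2, n_init=10, random_state=2).fit(X)
print(km.cluster_centers_)   # the two discovered cluster centers
print(km.labels_[:5])        # cluster assignments for the first five points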

Sebastian Raschka STAT 479: Machine Learning FS 2018 !18


Unsupervised Learning -- Dimensionality Reduction
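A minimal dimensionality-reduction sketch (illustrative, assuming scikit-learn): project 4-dimensional points onto their two leading principal components.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(3)
X = rng.normal(size=(150, 4))            # 150 samples with 4 features each

X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape)                        # (150, 2): same samples, fewer dimensions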

Sebastian Raschka STAT 479: Machine Learning FS 2018 !19


Categories of Machine Learning

• Supervised Learning: labeled data; direct feedback; predict outcome/future

• Unsupervised Learning: no labels/targets; no feedback; find hidden structure in data

• Reinforcement Learning: decision process; reward system; learn series of actions

Sebastian Raschka STAT 479: Machine Learning FS 2018 !20


Reinforcement Learning

(Diagram: the Agent chooses an Action; the Environment returns a new State and a Reward to the Agent.)
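As a minimal illustration of this loop (a toy sketch, not a real reinforcement learning algorithm): the agent repeatedly chooses an action, and a hypothetical environment returns the next state and a reward.

import random

def environment_step(state, action):
    # Hypothetical dynamics: moving "right" advances the state and earns a reward.
    next_state = state + (1 if action == "right" else -1)
    reward = 1.0 if action == "right" else 0.0
    return next_state, reward

state, total_reward = 0, 0.0
for _ in range(10):
    action = random.choice(["left", "right"])   # a learning agent would improve this choice over time
    state, reward = environment_step(state, action)
    total_reward += reward
print(total_reward)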

Sebastian Raschka STAT 479: Machine Learning FS 2018 !21


Semi-Supervised Learning

Sebastian Raschka STAT 479: Machine Learning FS 2018 !22


Supervised Learning (Formal Notation)

Training set: 𝒟 = {<x^[i], y^[i]>, i = 1, …, n}

Unknown function: f(x) = y

Hypothesis: h(x) = ŷ

Classification: h : ℝ^m → ___          Regression: h : ℝ^m → ___
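In code, the same notation might look as follows (a minimal sketch, assuming NumPy; the particular hypothesis h shown here is a made-up linear rule, not one learned from the data):

import numpy as np

n, m = 5, 3
X = np.arange(n * m, dtype=float).reshape(n, m)    # n training examples x^[i], each with m features
y = np.array([0, 1, 0, 1, 1])                      # targets y^[1], ..., y^[n]

def h(x, w=np.array([0.2, -0.1, 0.05]), b=0.0):
    """One hypothesis h: R^m -> {0, 1} (an arbitrary linear rule for illustration)."""
    return int(x @ w + b > 0)

y_hat = np.array([h(x_i) for x_i in X])            # predictions ŷ^[i] = h(x^[i])
print(y_hat)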

Sebastian Raschka STAT 479: Machine Learning FS 2018 !23


Supervised Learning Workflow -- Overview

(Workflow: Training Data + Labels → Machine Learning Algorithm → Predictive Model;
 New Data → Predictive Model → Prediction)

Sebastian Raschka STAT 479: Machine Learning FS 2018 !24


Supervised Learning Workflow -- More Detailed Overview

• Preprocessing: raw data (+ labels) is split into a training dataset and a test dataset; feature extraction and scaling, feature selection, dimensionality reduction, sampling
• Learning: a learning algorithm is fit to the training dataset; model selection, cross-validation, performance metrics, hyperparameter optimization
• Evaluation: the final model is evaluated on the test dataset
• Prediction: the final model is applied to new data
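An end-to-end sketch of this workflow (illustrative only, assuming scikit-learn): split the raw data, chain preprocessing and learning, evaluate on the held-out test set, then predict on new data.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(4)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)                        # synthetic labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=4)

model = make_pipeline(StandardScaler(), LogisticRegression())  # preprocessing + learning algorithm
model.fit(X_train, y_train)                                    # learning
print(model.score(X_test, y_test))                             # evaluation: accuracy on the test set
print(model.predict(X_test[:3]))                               # prediction on "new" data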

Sebastian Raschka STAT 479: Machine Learning FS 2018 !25


Data Representation

Feature vector: x = [x_1, x_2, …, x_m]^T

Sebastian Raschka STAT 479: Machine Learning FS 2018 !26


Data Representation

Feature vector: x = [x_1, x_2, …, x_m]^T

__________ ____________: X = [x_1^T; x_2^T; …; x_n^T]

Sebastian Raschka STAT 479: Machine Learning FS 2018 !27


Data Representation

Feature vector: x = [x_1, x_2, …, x_m]^T

__________ ____________: X = [x_1^T; x_2^T; …; x_n^T] =

    [ x_1^[1]  x_2^[1]  ⋯  x_m^[1] ]
    [ x_1^[2]  x_2^[2]  ⋯  x_m^[2] ]
    [   ⋮        ⋮      ⋱    ⋮     ]
    [ x_1^[n]  x_2^[n]  ⋯  x_m^[n] ]

Sebastian Raschka STAT 479: Machine Learning FS 2018 !28


Data Representation

m= _____

n= _____

Sebastian Raschka STAT 479: Machine Learning FS 2018 !29


Data Representation

x = [x_1, x_2, …, x_m]^T        y = [y^[1], y^[2], …, y^[n]]^T

Input features                  ______________
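In code, this layout corresponds to a two-dimensional array X with n rows (training examples) and m columns (features) plus a length-n target vector y (a minimal sketch with made-up values, assuming NumPy):

import numpy as np

X = np.array([[5.1, 3.5, 1.4, 0.2],    # x^[1]: first training example, m = 4 features
              [4.9, 3.0, 1.4, 0.2],    # x^[2]
              [6.2, 3.4, 5.4, 2.3]])   # x^[3]  -> n = 3 examples
y = np.array([0, 0, 2])                # one target y^[i] per training example

print(X.shape)     # (n, m) = (3, 4)
print(X[0])        # the feature vector x^[1]
print(X[:, 1])     # the second feature across all examples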

Sebastian Raschka STAT 479: Machine Learning FS 2018 !30


Hypothesis Space
• Entire hypothesis space
  • Hypothesis space a particular learning algorithm category has access to
    • Hypothesis space a particular learning algorithm can sample
      • Particular hypothesis (i.e., a model/classifier)

Sebastian Raschka STAT 479: Machine Learning FS 2018 !31


Hypothesis Space Size

sepal length < 5 cm | sepal width < 5 cm | petal length < 5 cm | petal width < 5 cm | Class Label
True                | True               | True                | True               | Setosa
True                | True               | True                | False              | Versicolor
True                | True               | False               | True               | Setosa
...                 | ...                | ...                 | ...                | ...

How many possible hypotheses?

4 binary features: __________ different feature combinations

3 classes (Setosa, Versicolor, Virginica) and ____________ rules,

that is, _______________ potential combinations


Sebastian Raschka STAT 479: Machine Learning FS 2018 !32
Classes of Machine Learning Algorithms

• Generalized linear models (e.g.,

• Support vector machines (e.g.,

• Artificial neural networks (e.g.,

• Tree- or rule-based models (e.g.,

• Graphical models (e.g.,

• Ensembles (e.g.,

• Instance-based learners (e.g.,

Sebastian Raschka STAT 479: Machine Learning FS 2018 !33


5 Steps for Approaching a Machine
Learning Application

1. Define the problem to be solved.

2. Collect (labeled) data.

3. Choose an algorithm class.

4. Choose an optimization metric for learning the model.

5. Choose a metric for evaluating the model.

Sebastian Raschka STAT 479: Machine Learning FS 2018 !34


(Workflow diagram, repeated from earlier:)
• Preprocessing: raw data (+ labels) is split into a training dataset and a test dataset; feature extraction and scaling, feature selection, dimensionality reduction, sampling
• Learning: a learning algorithm is fit to the training dataset; model selection, cross-validation, performance metrics, hyperparameter optimization
• Evaluation: the final model is evaluated on the test dataset
• Prediction: the final model is applied to new data

Sebastian Raschka STAT 479: Machine Learning FS 2018 !35


Objective Functions
• Maximize the posterior probabilities (e.g., naive Bayes)

• Maximize a fitness function (genetic programming)

• Maximize the total reward/value function (reinforcement learning)

• Maximize information gain/minimize child node impurities (CART decision tree classification)

• Minimize a mean squared error cost (or loss) function (CART, decision tree regression, linear
regression, adaptive linear neurons, ...)

• Maximize log-likelihood or minimize cross-entropy loss (or cost) function

• Minimize hinge loss (support vector machine)
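As a concrete illustration of one of the objectives above, a minimal sketch of the mean squared error cost (assuming NumPy; the data are made up):

import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of (y - ŷ)^2 over all examples."""
    return np.mean((y_true - y_pred) ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])       # targets roughly following y = 2x

print(mse(y, 2.0 * x))   # small cost: a slope of 2 fits well
print(mse(y, 0.5 * x))   # large cost: a slope of 0.5 fits poorly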

Sebastian Raschka STAT 479: Machine Learning FS 2018 !36


Optimization Methods

• Combinatorial search, greedy search (e.g.,

• Unconstrained convex optimization (e.g.,

• Constrained convex optimization (e.g.,

• Nonconvex optimization, here: using backpropagation, chain rule,


reverse autodiff. (e.g.,

• Constrained nonconvex optimization (e.g.,
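As a small illustration of unconstrained optimization, a gradient-descent sketch (illustrative only, assuming NumPy): minimize the mean squared error of y ≈ w*x with respect to the single weight w.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])         # targets roughly following y = 2x

w, lr = 0.0, 0.01                          # initial weight and learning rate
for _ in range(200):
    grad = np.mean(2 * (w * x - y) * x)    # d/dw of mean((w*x - y)^2)
    w -= lr * grad                         # step in the negative gradient direction
print(round(w, 2))                         # ends up close to 2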

Sebastian Raschka STAT 479: Machine Learning FS 2018 !37


Evaluation -- Misclassification Error

L(ŷ, y) = 0 if ŷ = y,  1 if ŷ ≠ y

ERR_𝒟test = (1/n) Σ_{i=1}^{n} L(ŷ^[i], y^[i])
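In code, the misclassification error is simply the fraction of wrongly predicted labels (a minimal sketch, assuming NumPy):

import numpy as np

y_true = np.array([0, 1, 1, 0, 1, 1])
y_pred = np.array([0, 1, 0, 0, 1, 0])    # two mistakes out of six predictions

err = np.mean(y_pred != y_true)          # (1/n) * sum of the 0-1 losses
print(err)                               # 0.333...; accuracy = 1 - err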

Sebastian Raschka STAT 479: Machine Learning FS 2018 !38


Other Metrics in Future Lectures
• Accuracy (1-Error)

• ROC AUC

• Precision

• Recall

• (Cross) Entropy

• Likelihood

• Squared Error/MSE

• L-norms

• Utility

• Fitness

• ...

But more on other metrics in future lectures.

Sebastian Raschka STAT 479: Machine Learning FS 2018 !39


Categorizing Machine Learning Algorithms

• eager vs lazy;

Sebastian Raschka STAT 479: Machine Learning FS 2018 !40


Categorizing Machine Learning Algorithms

• eager vs lazy;

• batch vs online;

Sebastian Raschka STAT 479: Machine Learning FS 2018 !41


Categorizing Machine Learning Algorithms

• eager vs lazy;

• batch vs online;

• parametric vs nonparametric;

Sebastian Raschka STAT 479: Machine Learning FS 2018 !42


Categorizing Machine Learning Algorithms

• eager vs lazy;

• batch vs online;

• parametric vs nonparametric;

• discriminative vs generative.

Sebastian Raschka STAT 479: Machine Learning FS 2018 !43


Pedro Domingos's 5 Tribes of Machine Learning

Source: Domingos, Pedro. The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World. Basic Books, 2015.

Sebastian Raschka STAT 479: Machine Learning FS 2018 !44


Breiman, Leo. "Statistical modeling: The two cultures (with comments and a rejoinder by the author)." Statistical Science 16.3 (2001): 199-231.

Abstract: "There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools."

From the introduction: Statistics starts with data. Think of the data as being generated by a black box in which a vector of input variables x (independent variables) goes in one side, and on the other side the response variables y come out. There are two goals in analyzing the data: prediction (to be able to predict what the responses are going to be to future input variables) and information (to extract some information about how nature is associating the response variables to the input variables).

The Data Modeling Culture: the analysis starts with assuming a stochastic data model for the inside of the black box, e.g., response variables = f(predictor variables, random noise, parameters), with models such as linear regression, logistic regression, or the Cox model. The values of the parameters are estimated from the data, and the model is then used for information and/or prediction. Model validation: yes-no, using goodness-of-fit tests and residual examination. Estimated culture population: 98% of all statisticians.

The Algorithmic Modeling Culture: the analysis considers the inside of the box complex and unknown. The approach is to find a function f(x), an algorithm that operates on x to predict the responses y (e.g., decision trees or neural nets). Model validation: measured by predictive accuracy. Estimated culture population: 2% of statisticians, many in other fields.

Breiman argues that the focus in the statistical community on data models has, among other things, led to irrelevant theory and questionable scientific conclusions.
Sebastian Raschka STAT 479: Machine Learning FS 2018 !48
Evolved antenna (Source: https://en.wikipedia.org/wiki/Evolved_antenna) via evolutionary algorithms; used on a 2006
NASA spacecraft.

Sebastian Raschka STAT 479: Machine Learning FS 2018 !49


Black Boxes vs Interpretability

Sebastian Raschka STAT 479: Machine Learning FS 2018 !50


Black Boxes vs Interpretability

Sebastian Raschka STAT 479: Machine Learning FS 2018 !51


Different Motivations for Studying
Machine Learning
• Engineers:

• Mathematicians, computer scientists, and statisticians:

• Neuroscientists:

Sebastian Raschka STAT 479: Machine Learning FS 2018 !52


The Relationship between Machine
Learning and Other Fields

Machine Learning and Data Mining

Sebastian Raschka STAT 479: Machine Learning FS 2018 !53


Machine Learning, AI, and Deep Learning

Machine Learning

Deep Learning
AI

Sebastian Raschka STAT 479: Machine Learning FS 2018 !54


Image by Jake VanderPlas; Source: https://speakerdeck.com/jakevdp/the-state-of-the-stack-scipy-2015-keynote?slide=8

Sebastian Raschka STAT 479: Machine Learning FS 2018 !55


TIOBE Index for September 2018

Programming language "popularity"

https://www.tiobe.com/tiobe-index/
https://www.tiobe.com/tiobe-index/programming-languages-definition/

Sebastian Raschka STAT 479: Machine Learning FS 2018 !56


Roadmap for this Course

http://stat.wisc.edu/~sraschka/teaching/stat479-fs2018/#schedule

Sebastian Raschka STAT 479: Machine Learning FS 2018 !57


Reading Assignments

• Raschka and Mirjalili: Python Machine Learning, 2nd ed., Ch 1

• Elements of Statistical Learning, Ch 01 (https://web.stanford.edu/~hastie/ElemStatLearn/)

Sebastian Raschka STAT 479: Machine Learning FS 2018 !58
