
Lecture 01

What is Machine Learning?


An Overview.

STAT 451: Intro to Machine Learning, Fall 2020


Sebastian Raschka
http://stat.wisc.edu/~sraschka/teaching/stat451-fs2020/

Sebastian Raschka STAT 451: Intro to ML Lecture 1: Introduction 1


Lecture 1 Overview

1. About this course

2. What is machine learning

3. Categories of machine learning

4. Notation

5. Approaching a machine learning application

6. Different machine learning approaches and motivations



Course Topics

Part 1: Introduction

Part 2: Computational foundations

Part 3: Tree-based methods

Part 4: Model evaluation

Part 5: Dimensionality reduction and unsupervised learning

Part 6: Bayesian learning

Part 7: Class project presentations



About this Course

For details -> http://stat.wisc.edu/~sraschka/teaching/stat451-fs2020/



Lecture 1 Overview

1. About this course

2. What is machine learning

3. Categories of machine learning

4. Notation

5. Approaching a machine learning application

6. Different machine learning approaches and motivations



What is Machine Learning?



"Machine learning is the hot new thing."
-- John L. Hennessy, President of Stanford (2000-2016)

Image Source: https://www.innovateli.com/hennessy-grad-keeps-gifting/



"A breakthrough in machine learning would be
worth ten Microsofts"
-- Bill Gates, Microsoft Co-founder

Image source: https://www.gatesnotes.com/Books



[...] machine learning is a subcategory within the field of computer
science, which allows you to implement artificial intelligence. So it’s
kind of a mechanism to get you to artificial intelligence.

-- Rana el Kaliouby, CEO at Affectiva

Image Source: https://fortune.com/2019/03/08/rana-el-kaliouby-ceo-affectiva/



(Excerpt from the accompanying lecture notes, "What is Machine Learning? An Overview.", Department of Statistics, University of Wisconsin–Madison, Fall 2018: http://stat.wisc.edu/~sraschka/teaching/stat479-fs2018/)

1.1 Machine Learning – The Big Picture

We develop (computer) programs to automate various kinds of processes. Originally developed as a subfield of Artificial Intelligence (AI), one of the goals behind machine learning was to replace the need for developing computer programs "manually." If programs are a means to automate processes, we can think of machine learning as "automating automation." In other words, machine learning lets computers "create" programs (often for making predictions) themselves. Machine learning is turning data into programs.

It is said that the term machine learning was first coined by Arthur Lee Samuel in 1959 [1]. One quote that almost every introductory machine learning resource includes is often accredited to Samuel, a pioneer of the field of AI:

"Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed"
— Arthur L. Samuel, AI pioneer, 1959

Image Source: https://history-computer.com/ModernComputer/thinkers/images/Arthur-Samuel1.jpg

(This is likely not an original quote but a paraphrased version of Samuel's sentence "Programming computers to learn from experience should eventually eliminate the need for much of this detailed programming effort.")

"The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience"
— Tom Mitchell, former chair of the Machine Learning department of Carnegie Mellon University

[1] Arthur L. Samuel. "Some studies in machine learning using the game of checkers". In: IBM Journal of Research and Development 3.3 (1959), pp. 210–229.
The Traditional Programming Paradigm

Inputs (observations) + Program (written by the Programmer) → Computer → Outputs



Inputs (observations) + Program (written by the Programmer) → Computer → Outputs

"Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed"
— Arthur Samuel (1959)

The machine learning paradigm:
Inputs + Outputs → Computer → Program
"We will not only use the machines for their intelligence, we will also collaborate with them in ways that we cannot even imagine."
-- Fei-Fei Li, Director of Stanford's artificial intelligence lab

Image Source: https://en.wikipedia.org/wiki/Fei-Fei_Li#/media/File:Fei-Fei_Li_at_AI_for_Good_2017.jpg



A bit more concrete, Tom Mitchell's quote from his Machine Learning book [2]:

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
— Tom Mitchell, Professor at Carnegie Mellon University

Handwriting Recognition Example:

As an example, consider a handwriting recognition learning problem (from Mitchell's book):

• Task T: recognizing and classifying handwritten words within images
• Performance measure P: percent of words correctly classified
• Training experience E: a database of handwritten words with given classifications

1.2 Applications of Machine Learning

Email spam detection

[2] Tom M. Mitchell et al. "Machine learning. 1997". In: Burr Ridge, IL: McGraw Hill 45.37 (1997), pp. 870–877.


Some Applications of Machine Learning:






Lecture 1 Overview

1. About this course

2. What is machine learning

3. Categories of machine learning

4. Notation

5. Approaching a machine learning application

6. Different machine learning approaches and motivations



Categories of Machine Learning

Labeled data
Supervised Learning Direct feedback
Predict outcome/future

No labels/targets
Unsupervised Learning No feedback
Find hidden structure in data



Supervised Learning: Classification

(Scatter plot over features x1 and x2, showing two classes separated by a decision boundary.)
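Sketched in code, this kind of classification task looks roughly as follows (a hedged sketch assuming scikit-learn, which this course introduces later; the synthetic two-feature data and the logistic-regression model are illustrative choices, not from the slide):

```python
# Hedged sketch: learn a decision boundary from labeled 2-feature data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# n = 100 labeled examples, m = 2 features (x1, x2), two classes
X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

clf = LogisticRegression().fit(X, y)  # supervised: uses the labels y
print(clf.predict(X[:5]))             # predicted class labels for 5 examples
print(clf.score(X, y))                # fraction of correctly classified examples
```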
Supervised Learning: Regression

(Regression fit over feature x.)
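A minimal regression sketch to go with the slide (NumPy only; the noisy synthetic data and the least-squares fit are illustrative assumptions):

```python
# Hedged sketch: fit a line y = w*x + b to noisy data by least squares.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=50)  # true line plus noise

# np.polyfit with deg=1 returns the least-squares [slope, intercept]
w, b = np.polyfit(x, y, deg=1)
print(w, b)  # estimates should land near the true values 2.0 and 1.0
```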
Categories of Machine Learning

Labeled data
Supervised Learning Direct feedback
Predict outcome/future

No labels/targets
Unsupervised Learning No feedback
Find hidden structure in data

Decision process
Reinforcement Learning Reward system
Unsupervised Learning -- Clustering

(Scatter plot over features x1 and x2, with no class labels given.)
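Clustering finds this kind of hidden structure without ever seeing labels; a sketch assuming scikit-learn, with two synthetic blobs as made-up data:

```python
# Hedged sketch: k-means groups unlabeled points into clusters.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=0.0, scale=0.3, size=(30, 2)),   # blob near (0, 0)
               rng.normal(loc=3.0, scale=0.3, size=(30, 2))])  # blob near (3, 3)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # no y anywhere
print(km.labels_[:5])        # cluster assignments found from structure alone
print(km.cluster_centers_)   # centers should sit near (0, 0) and (3, 3)
```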
Unsupervised Learning
-- Dimensionality Reduction



Categories of Machine Learning

Labeled data
Supervised Learning Direct feedback
Predict outcome/future

No labels/targets
Unsupervised Learning No feedback
Find hidden structure in data

Decision process
Reinforcement Learning Reward system
Learn series of actions



Reinforcement Learning

Agent → (Action) → Environment
Environment → (State, Reward) → Agent
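The agent/environment loop in the diagram can be sketched with a toy one-dimensional environment (everything here, including the random stand-in policy, is a made-up illustration; a real agent would learn from the rewards):

```python
# Hedged sketch of the agent/environment loop: state -> action -> reward.
import random

GOAL = 3  # the state that yields a reward

def step(state, action):
    """Toy environment dynamics: move one step left or right."""
    next_state = state + (1 if action == "right" else -1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

random.seed(0)
state, total_reward = 0, 0.0
for _ in range(100):
    action = random.choice(["left", "right"])  # placeholder random policy
    state, reward, done = step(state, action)  # environment responds
    total_reward += reward
    if done:
        break
print(state, total_reward)
```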



https://www.theverge.com/tldr/2017/7/10/15946542/deepmind-parkour-agent-reinforcement-learning



https://video.twimg.com/ext_tw_video/1111683489890332672/pu/vid/1200x674/WqUJEhUETw0M0gCl.mp4?tag=8



Lecture 1 Overview

1. About this course

2. What is machine learning

3. Categories of machine learning

4. Notation

5. Approaching a machine learning application

6. Different machine learning approaches and motivations



Supervised Learning Workflow
-- Overview
Training Data + Labels → Machine Learning Algorithm → Predictive Model
New Data → Predictive Model → Prediction



Supervised Learning Notation

Training set: 𝒟 = {⟨x[i], y[i]⟩, i = 1, …, n}

Unknown function: f(x) = y


Hypothesis: h(x) = ŷ

Classification: h : ℝᵐ → ___        Regression: h : ℝᵐ → ___



Data Representation

x = [x1, x2, ⋯, xm]ᵀ

Feature vector



Data Representation

x = [x1, x2, ⋯, xm]ᵀ        X = [x1ᵀ; x2ᵀ; ⋯; xnᵀ]

Feature vector        D___n m_________



Data Representation

x = [x1, x2, ⋯, xm]ᵀ

      x1[1]  x2[1]  ⋯  xm[1]
      x1[2]  x2[2]  ⋯  xm[2]
X =    ⋮      ⋮     ⋱   ⋮
      x1[n]  x2[n]  ⋯  xm[n]

Feature vector  _________________  ______________________  ______________________
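This notation maps directly onto array shapes; a sketch assuming NumPy (the bracketed superscript [i] indexes training examples, the subscript j indexes features):

```python
# Hedged sketch: the design matrix X as a NumPy array.
import numpy as np

n, m = 4, 3                          # n training examples, m features
X = np.arange(n * m).reshape(n, m)   # one training example per row
y = np.zeros(n)                      # one target per training example

x_2 = X[1]       # the feature vector x^[2] (row 2; Python indexes from 0)
x_1_2 = X[1, 0]  # the single feature value x_1^[2]

print(X.shape)    # (4, 3), i.e., (n, m)
print(x_2.shape)  # (3,),   i.e., (m,)
```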



Data Representation

m= _____

n= _____



Data Representation

x = [x1, x2, ⋯, xm]ᵀ        y = [y[1], y[2], ⋯, y[n]]ᵀ

Input features ______________ ______________



ML Terminology (Part 1)
▪ Training example: A row in the table representing the dataset. Synonymous with an observation, training record, training instance, training sample (in some contexts, sample refers to a collection of training examples).

▪ Feature: A column in the table representing the dataset. Synonymous with predictor, variable, input, attribute, covariate.

▪ Target: What we want to predict. Synonymous with outcome, output, ground truth, response variable, dependent variable, (class) label (in classification).

▪ Output / prediction: We use this term to distinguish it from targets; here, it means the output from the model.
Hypothesis Space
Entire hypothesis space
⊃ Hypothesis space a particular learning algorithm category has access to
⊃ Hypothesis space a particular learning algorithm can sample
⊃ Particular hypothesis (i.e., a model/classifier)
Classes of Machine Learning Algorithms

• Generalized linear models (e.g.,

• Support vector machines (e.g.,

• Artificial neural networks (e.g.,

• Tree- or rule-based models (e.g.,

• Graphical models (e.g.,

• Ensembles (e.g.,

• Instance-based learners (e.g.,



Lecture 1 Overview

1. About this course

2. What is machine learning

3. Categories of machine learning

4. Notation

5. Approaching a machine learning application

6. Different machine learning approaches and motivations



Supervised Learning Workflow
-- Overview
Training Data + Labels → Machine Learning Algorithm → Predictive Model
New Data → Predictive Model → Prediction



Raw Data → Preprocessing → Training Dataset (+ Labels) → Learning Algorithm → Final Model
                         → Test Dataset (+ Labels) → Evaluation
Final Model + New Data → Prediction

Preprocessing: Feature Extraction and Scaling, Feature Selection, Dimensionality Reduction, Sampling
Learning/Evaluation: Model Selection, Cross-Validation, Performance Metrics, Hyperparameter Optimization
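A sketch of that whole workflow with scikit-learn (the dataset, scaler, and model here are illustrative assumptions, not prescriptions from the slide):

```python
# Hedged sketch: preprocessing + learning + evaluation in one pipeline.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Sampling: hold out a test set, used only for the final evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Preprocessing (feature scaling) chained with the learning algorithm
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Model selection: cross-validation on the training data only
cv_scores = cross_val_score(pipe, X_train, y_train, cv=5)

pipe.fit(X_train, y_train)                   # final model on all training data
print(round(cv_scores.mean(), 2))            # cross-validated estimate
print(round(pipe.score(X_test, y_test), 2))  # evaluation on held-out data
```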



5 Steps for Approaching a Machine
Learning Application

1. Define the problem to be solved.

2. Collect (labeled) data.

3. Choose an algorithm class.

4. Choose an optimization metric or measure for learning the model.

5. Choose a metric or measure for evaluating the model.



Objective Functions
• Maximize the posterior probabilities (e.g., naive Bayes)

• Maximize a fitness function (genetic programming)

• Maximize the total reward/value function (reinforcement


learning)

• Maximize information gain/minimize child node impurities


(CART decision tree classification)

• Minimize a mean squared error cost (or loss) function (CART,


decision tree regression, linear regression, adaptive linear
neurons, ...)

• Maximize log-likelihood or minimize cross-entropy loss (or cost)


function

• Minimize hinge loss (support vector machine)
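Two of these objectives written out for a handful of made-up predictions (NumPy; the numbers are illustrative only):

```python
# Hedged sketch: computing an MSE cost and a cross-entropy loss by hand.
import numpy as np

# Mean squared error (regression objective)
y = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.5, 1.5, 3.0])
mse = np.mean((y - y_hat) ** 2)      # (0.25 + 0.25 + 0.0) / 3
print(mse)

# Cross-entropy loss (binary classification; minimizing it maximizes
# the log-likelihood of the labels under the predicted probabilities)
t = np.array([1, 0, 1])              # true labels
p = np.array([0.9, 0.2, 0.8])        # predicted P(class = 1)
xent = -np.mean(t * np.log(p) + (1 - t) * np.log(1 - p))
print(xent)
```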


Optimization Methods for
Different Learning Algorithms

• Combinatorial search, greedy search (e.g., decision trees)

• Unconstrained convex optimization (e.g.,

• Constrained convex optimization (e.g.,

• Nonconvex optimization, here: using backpropagation, chain rule,


reverse autodiff. (e.g.,

• Constrained nonconvex optimization (e.g.,
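Unconstrained convex optimization can be illustrated with plain gradient descent on the MSE objective of a one-parameter linear model (a toy example assumed for illustration; the learning rate would need tuning in practice):

```python
# Hedged sketch: gradient descent on J(w) = mean((w*x - y)^2).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x              # noise-free data, so the optimum is w = 2

w = 0.0                  # initial weight
lr = 0.05                # learning rate (illustrative choice)
for _ in range(200):
    grad = np.mean(2 * (w * x - y) * x)  # dJ/dw
    w -= lr * grad                       # step against the gradient
print(w)  # converges to 2.0
```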



Evaluation -- Misclassification Error

L(ŷ, y) = 1 if ŷ ≠ y, 0 if ŷ = y

ERR_𝒟test = (1/n) ∑ᵢ₌₁ⁿ L(ŷ[i], y[i])
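The 0/1 loss and the test-set error above, sketched in NumPy with made-up labels:

```python
# Hedged sketch: misclassification error = average 0/1 loss.
import numpy as np

def zero_one_loss(y_hat, y):
    """L(y_hat, y): 1 where the prediction is wrong, 0 where it is right."""
    return (y_hat != y).astype(int)

y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 0])

err = zero_one_loss(y_pred, y_true).mean()  # (1/n) * sum of per-example losses
print(err)  # 2 of 5 examples misclassified -> 0.4
```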
i=1



ML Terminology (Part 2)

▪ Loss function: Often used synonymously with cost function; sometimes also called error function. In some contexts, the loss refers to the loss for a single data point, whereas the cost function refers to the overall (average or summed) loss over the entire dataset. Sometimes also called empirical risk.



Other Metrics in Future Lectures
• Accuracy (1-Error)
• ROC AUC
• Precision
• Recall
• (Cross) Entropy
• Likelihood
• Squared Error/MSE
• L-norms
• Utility
• Fitness
• ...

But more on other metrics in future lectures.



Lecture 1 Overview

1. About this course

2. What is machine learning

3. Categories of machine learning

4. Notation

5. Approaching a machine learning application

6. Different machine learning approaches and motivations



Pedro Domingos's 5 Tribes of Machine Learning

Source: Domingos, Pedro. The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World. Basic Books, 2015.



Breiman, Leo. "Statistical modeling: The two cultures (with comments and a rejoinder by the author)." Statistical Science 16.3 (2001): 199–231.

From the paper: "The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools."

1. INTRODUCTION

Statistics starts with data. Think of the data as being generated by a black box in which a vector of input variables x (independent variables) go in one side, and on the other side the response variables y come out. Inside the black box, nature functions to associate the predictor variables with the response variables, so the picture is like this:

y ← [ nature ] ← x

There are two goals in analyzing the data:

Prediction. To be able to predict what the responses are going to be to future input variables;
Information. To extract some information about how nature is associating the response variables to the input variables.

There are two different approaches toward these goals:

The Data Modeling Culture. The analysis in this culture starts with assuming a stochastic data model for the inside of the black box. For example, a common data model is that data are generated by independent draws from: response variables = f(predictor variables, random noise, parameters). The values of the parameters are estimated from the data, and the model is then used for information and/or prediction. Thus the black box is filled in like this:

y ← [ linear regression, logistic regression, Cox model ] ← x

Model validation: yes–no using goodness-of-fit tests and residual examination. Estimated culture population: 98% of all statisticians.

The Algorithmic Modeling Culture. The analysis in this culture considers the inside of the box complex and unknown. Their approach is to find a function f(x), an algorithm that operates on x to predict the responses y. Their black box looks like this:

y ← [ decision trees, neural nets ] ← x

Model validation: measured by predictive accuracy. Estimated culture population: 2% of statisticians, many in other fields.
Evolved antenna (source: https://en.wikipedia.org/wiki/Evolved_antenna), designed via evolutionary algorithms; used on a 2006 NASA spacecraft.



Black Boxes vs Interpretability



Black Boxes vs Interpretability



Different Motivations for Studying
Machine Learning
• Engineers:

• Mathematicians, computer scientists, and statisticians:

• Neuroscientists:



Machine Learning, AI, and Deep Learning

AI: a non-biological system that is intelligent through rules.

Machine Learning (within AI): algorithms that learn models/representations/rules automatically from data/examples.

Deep Learning (within machine learning): algorithms that parameterize multilayer neural networks that then learn representations of data with multiple layers of abstraction.
Image by Jake VanderPlas; Source: https://speakerdeck.com/jakevdp/the-state-of-the-stack-scipy-2015-keynote?slide=8



Spam

https://en.wikipedia.org/wiki/Spam_(food)

"It has become the subject of a number of appearances in pop culture, notably
a Monty Python sketch which repeated the name many times, leading to its
name being borrowed for unsolicited electronic messages, especially email."



Spam

https://en.wikipedia.org/wiki/Spam_(food)

https://en.wikipedia.org/wiki/Monty_Python



Spam

https://en.wikipedia.org/wiki/Spam_(food)
https://en.wikipedia.org/wiki/Monty_Python

"Python's name is derived from the British comedy group Monty Python, whom Python creator Guido van
Rossum enjoyed while developing the language. "

https://en.wikipedia.org/wiki/Python_(programming_language)



ML Terminology (Part 3)
▪ Hypothesis: A hypothesis is a certain function that we believe (or hope) is
similar to the true function, the target function that we want to model.

▪ Model: In the machine learning field, the terms hypothesis and model are
often used interchangeably. In other sciences, they can have different
meanings.

▪ Learning algorithm: Again, our goal is to find or approximate the target


function, and the learning algorithm is a set of instructions that tries to
model the target function using our training dataset. A learning algorithm
comes with a hypothesis space, the set of possible hypotheses it
explores to model the unknown target function by formulating the final
hypothesis.

▪ Classifier: A classifier is a special case of a hypothesis (nowadays, often learned by a machine learning algorithm). A classifier is a hypothesis or discrete-valued function that is used to assign (categorical) class labels to particular data points.
Course Topics

Part 1: Introduction

Part 2: Computational foundations

Part 3: Tree-based methods

Part 4: Model evaluation

Part 5: Dimensionality reduction and unsupervised learning

Part 6: Bayesian learning

Part 7: Class project presentations



Part 1: Introduction

- Week 01: L01 - Course overview, introduction to machine learning

- Week 02: L02 - Introduction to Supervised Learning and k-Nearest Neighbors Classifiers

Part 2: Computational foundations

- Week 03: L03 - Using Python

- Week 03: L04 - Introduction to Python's scientific computing stack

- Week 04: L05 - Data preprocessing and machine learning with scikit-learn



Reading Assignments

• Raschka and Mirjalili: Python Machine Learning, 3rd ed., Ch 1

• Elements of Statistical Learning, Ch 01 (https://web.stanford.edu/~hastie/ElemStatLearn/)

• Optional: Breiman, Leo. "Statistical modeling: The two cultures (with comments and a rejoinder by the author)". Statistical Science 16.3 (2001): 199–231. https://projecteuclid.org/euclid.ss/1009213726

