AA2 Intro ML 2024
MACHINE LEARNING:
INTRODUCTION
Reading:
• Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009.
• G. James, D. Witten, T. Hastie, R. Tibshirani, and J. Taylor. An Introduction to Statistical Learning with Applications in Python. Springer, July 2023.
Email spam
• Data from 4601 email messages sent to an individual (named George, at HP Labs, before 2000). Each message is classified as genuine email or spam.
• Objective: build a customized spam filter.
• Input features: relative frequencies of 57 of the most commonly
occurring words and punctuation marks in the email messages.
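• A minimal sketch of how such a filter might be fit, assuming the 57 frequency features and the spam/email labels are already available as arrays; the random arrays below are only placeholders for the real data.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data: in practice X holds the 57 relative-frequency features
# for the 4601 messages and y the email/spam labels (1 = spam).
rng = np.random.default_rng(0)
X = rng.random((4601, 57))
y = rng.integers(0, 2, size=4601)

# Hold out a test set, fit a logistic-regression spam filter, and assess it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```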
Scatterplot of cancer data
Usage ∈ {red soil, cotton, vegetation stubble, mixture, gray soil, damp gray soil}
Objectives
On the basis of the training data we would like to:
• Accurately predict unseen test cases.
• Understand which inputs affect the outcome, and how.
• Assess the quality of our predictions and inferences.
Philosophy
• It is important to understand the ideas behind the various techniques, in order to
know how and when to use them.
• One has to understand the simpler methods first, in order to grasp the more
sophisticated ones.
• It is important to accurately assess the performance of a method, to know how
well or how badly it is working (simpler methods often perform as well as fancier
ones!)
• This is an exciting research area, with important applications in engineering, science, industry, finance, and beyond.
• Machine learning is a fundamental ingredient in the training of a modern
engineer or data scientist.
Unsupervised learning
• No outcome variable, just a set of predictors (features) measured on a
set of samples.
• The objective is fuzzier (there is no Y!), e.g.:
• find groups of samples that behave similarly,
• find features that behave similarly,
• find (non)linear combinations of features with the most variation.
• Difficult to know how well you are doing.
• Different from supervised learning, but it can be useful as a pre-processing step for supervised learning.
• It is much more difficult to collect labeled data!
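• As a concrete illustration of the first objective (finding groups of samples that behave similarly), a minimal k-means sketch on synthetic 2-D data; the data and the choice of k = 3 are assumptions for the example.
```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D data: three blobs around assumed centres (illustration only).
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.8, size=(100, 2)) for c in centers])

# Unsupervised: no Y is used, we only look for structure in the features.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(km.labels_))
```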
• Sales vs TV, radio, and newspaper budgets. Blue line: least squares fit (linear
regression) of sales to that variable.
• How can we predict Sales using all three budgets jointly?
Sales ≈ f (TV, radio, newspaper)
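• One way to estimate such an f jointly is multiple linear regression; a sketch below, where the file name Advertising.csv and its column names are assumptions about the data layout, not taken from the slides.
```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumed file/column layout for the Advertising data; adjust for your copy.
ads = pd.read_csv("Advertising.csv")
X = ads[["TV", "radio", "newspaper"]]
y = ads["sales"]

# Fit sales ≈ b0 + b1*TV + b2*radio + b3*newspaper by least squares.
model = LinearRegression().fit(X, y)
print("intercept:", model.intercept_)
print("coefficients:", dict(zip(X.columns, model.coef_)))
```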
Notation
• Sales is a response or target that we wish to predict. We generically refer to the
response as Y.
• TV is a feature, or input, or predictor; we name it X1. Likewise, Radio is X2.
• We can refer to the input vector collectively as
X = (X1, X2, X3)ᵀ
• The model is written as
Y = f(X) + ε
• where ε captures measurement errors and other discrepancies.
• What is a good value for f(X) at a selected value of X, say X = 4? There can be many Y values at X = 4. A good value is
f(4) = E(Y | X = 4)
• E(Y | X = 4) means the expected value (average) of Y given X = 4.
• This ideal f(x) = E(Y | X = x) is called the regression function.
How to estimate f
• Typically, we have few if any data points with X = 4 exactly.
• So, we cannot compute E(Y | X = x)!
• Relax the definition and let
!! " "# = A%&"# ' $ ! 𝒩
% " "##
• where 𝒩(x) is some neighborhood of x.
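• A direct translation of this relaxed definition, with the k nearest training points playing the role of 𝒩(x); the 1-D simulated data and k = 30 are assumptions for illustration.
```python
import numpy as np

# Simulated 1-D training data with (assumed) truth f(x) = sin(x).
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=500)
y = np.sin(x) + rng.normal(scale=0.3, size=500)

def knn_average(x0, x, y, k=30):
    """Average the y-values of the k training points whose x is closest to x0."""
    idx = np.argsort(np.abs(x - x0))[:k]
    return y[idx].mean()

print("estimate of f(4):", knn_average(4.0, x, y))
print("true f(4):       ", np.sin(4.0))
```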
Nearest neighbor
• Nearest neighbor averaging can be pretty good for small p, i.e. p ≤ 4, and large values of N.
• Nearest neighbor methods can be lousy when p is large. Reason: the curse
of dimensionality. Nearest neighbors tend to be far away in high
dimensions.
• We need to get a reasonable fraction of the N values of yi to average to
bring the variance down, e.g. 10%.
• A 10% neighborhood in high dimensions need no longer be local, so we
lose the spirit of estimating E(Y | X = x) by local averaging.
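• The curse of dimensionality is easy to see by simulation; the sketch below (sample size and dimensions chosen arbitrarily) shows the distance from a query point to its nearest neighbour growing with p.
```python
import numpy as np

# N points uniform in the unit cube [0, 1]^p; the distance from the cube's
# centre to its nearest neighbour grows quickly with the dimension p.
rng = np.random.default_rng(3)
N = 1000
for p in (1, 2, 5, 10, 50):
    X = rng.uniform(size=(N, p))
    query = np.full(p, 0.5)
    dists = np.linalg.norm(X - query, axis=1)
    print(f"p = {p:2d}: nearest-neighbour distance = {dists.min():.3f}")
```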
Example
A linear model f̂_L(X) = β̂0 + β̂1X gives a reasonable fit here.
A quadratic model f̂_Q(X) = β̂0 + β̂1X + β̂2X² fits slightly better.
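A sketch of fitting both models by least squares with numpy.polyfit; the mildly curved data-generating model is an assumption for illustration.
```python
import numpy as np

# Simulated data with mild curvature (assumed truth, for illustration only).
rng = np.random.default_rng(4)
x = np.linspace(0, 10, 200)
y = 1.0 + 0.5 * x + 0.05 * x**2 + rng.normal(scale=0.5, size=x.size)

# Compare linear (degree 1) and quadratic (degree 2) least-squares fits.
for degree in (1, 2):
    coefs = np.polyfit(x, y, deg=degree)
    resid = y - np.polyval(coefs, x)
    print(f"degree {degree}: training MSE = {np.mean(resid**2):.3f}")
```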
Some tradeoffs
• Prediction accuracy versus interpretability.
• Linear models are easy to interpret; thin-plate splines are not.
• Good fit versus overfit or underfit.
• How do we know when the fit is just right?
• Parsimony versus black-box.
• We often prefer a simpler model involving fewer variables over a
black-box predictor involving them all.
Flexibility vs Interpretability [figure; methods shown include fuzzy models]
Example
• Black curve is the truth. Red curve on the right is MSE_Te, grey curve is MSE_Tr. Orange, blue and green curves/squares correspond to fits of different flexibility.
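• A sketch of how such curves can be produced: fit models of increasing flexibility (here polynomial degree, an assumed stand-in) and track MSE_Tr and MSE_Te separately; the simulated truth f(x) = sin(x) is also an assumption.
```python
import numpy as np

rng = np.random.default_rng(5)

def simulate(n):
    # Assumed truth f(x) = sin(x) with Gaussian noise, for illustration.
    x = rng.uniform(0, 10, size=n)
    return x, np.sin(x) + rng.normal(scale=0.4, size=n)

x_tr, y_tr = simulate(100)     # training set Tr
x_te, y_te = simulate(1000)    # test set Te

# MSE_Tr keeps falling as flexibility grows, while MSE_Te typically
# bottoms out and rises again (overfitting).
for degree in (1, 3, 5, 8, 10):
    coefs = np.polyfit(x_tr, y_tr, deg=degree)
    mse_tr = np.mean((y_tr - np.polyval(coefs, x_tr)) ** 2)
    mse_te = np.mean((y_te - np.polyval(coefs, x_te)) ** 2)
    print(f"degree {degree:2d}: MSE_Tr = {mse_tr:.3f}, MSE_Te = {mse_te:.3f}")
```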
Another example
• Here the truth is smoother, so the smoother fit and linear model do really well.
• Here the truth is wiggly and the noise is low, so the more flexible fits do
the best.
Bias-variance tradeoff
• Suppose we have fit a model f̂(x) to some training data Tr, and let (x0, y0) be a test observation drawn from the population. If the true model is Y = f(X) + ε (with f(x) = E(Y | X = x)), then:
E[(y0 − f̂(x0))²] = Var(f̂(x0)) + [Bias(f̂(x0))]² + Var(ε)
• The expectation averages over the variability of y0 as well as the variability in Tr. Note that
Bias(f̂(x0)) = E[f̂(x0)] − f(x0)
• Typically, as the flexibility of f̂ increases, its variance increases and its bias decreases. So, choosing the flexibility based on the average test error amounts to a bias-variance tradeoff.
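• The decomposition can be checked numerically: repeatedly draw training sets Tr, refit f̂, and compare the two sides at a single x0. A sketch under assumed simulation settings (truth sin(x), noise sd 0.5, and a simple linear fit for f̂):
```python
import numpy as np

rng = np.random.default_rng(6)
f = np.sin            # assumed truth, for illustration
sigma = 0.5           # assumed noise standard deviation
x0 = 2.0              # the test point

# Refit a simple linear model on many independent training sets Tr and
# record its prediction f-hat(x0) each time.
preds = []
for _ in range(2000):
    x = rng.uniform(0, 10, size=50)
    y = f(x) + rng.normal(scale=sigma, size=50)
    slope, intercept = np.polyfit(x, y, deg=1)
    preds.append(intercept + slope * x0)
preds = np.array(preds)

# Fresh test responses y0 at x0, and the two sides of the decomposition.
y0 = f(x0) + rng.normal(scale=sigma, size=preds.size)
lhs = np.mean((y0 - preds) ** 2)
rhs = preds.var() + (preds.mean() - f(x0)) ** 2 + sigma**2
print(f"E[(y0 - fhat(x0))^2] ≈ {lhs:.3f}")
print(f"Var(fhat) + Bias^2 + Var(eps) ≈ {rhs:.3f}")
```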
Classification problems
• Here the response variable Y is qualitative.
• Examples: Email is one of C = {spam, ham} (ham = good email). Digit class is one of C = {0, 1, …, 9}.
• Our goals are to:
• Build a classifier C(X) that assigns a class label from C to a future
unlabeled observation X.
• Assess the uncertainty in each classification.
• Understand the roles of the different predictors among
X = (X1, X2, …, Xp).
Example
• K-nearest neighbors in two dimensions
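• A sketch of a K-nearest-neighbours classifier in two dimensions using scikit-learn; the two simulated Gaussian classes and K = 10 are choices made for illustration.
```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two simulated Gaussian classes in the plane (illustration only).
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(loc=0.0, size=(100, 2)),
               rng.normal(loc=2.0, size=(100, 2))])
y = np.repeat([0, 1], 100)

# Classify a new point by majority vote among its K = 10 nearest neighbours.
knn = KNeighborsClassifier(n_neighbors=10).fit(X, y)
print("estimated class probabilities at (1, 1):", knn.predict_proba([[1.0, 1.0]]))
```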