
Machine Learning

(Lecture 1)

UEM/IEM Summer 2018

What is Machine Learning?
• Machine learning is a data analytics technique that
teaches computers to do what comes naturally to
humans and animals: learn from experience.
Machine learning algorithms use computational
methods to “learn” information directly from data
without relying on a predetermined equation as a
model.
• The algorithms adaptively improve their
performance as the number of samples available
for learning increases.

Source: https://www.mathworks.com/discovery/machine-learning.html
Source: http://www.nersc.gov/users/data-analytics/data-analytics-2/deep-learning/
Why Use Machine Learning (ML)?
• To do three practical things better as a (software)
engineer:
1. Reduce time programming
2. Customize and scale products
3. Complete seemingly “unprogrammable” tasks

Why Use Machine Learning (ML)?
• Philosophical reasons:
• ML changes the way you think about problems.
• Software engineers are trained to think logically and mathematically.
• ML shifts that focus from mathematical science to natural science:
• Making observations of an uncertain world
• Running experiments
• Using statistics (not logic) to analyze the results of those experiments
• In short, thinking like a scientist
• This mindset opens up new areas to explore with ML.

What is (Supervised) ML?
• ML systems learn how to combine input to produce
useful predictions on never-before-seen data

What is (Supervised) ML?
• Terminology: Labels and Features
• Label is the true thing we’re predicting: y
• The y variable in basic linear regression
• The label could be the future price of wheat, the kind of animal
shown in a picture, the meaning of an audio clip, or just about
anything.
• Features are input variables describing our data: xi
• The {x1, x2, … xn} variables in basic linear regression
• A simple machine learning project might use a single feature, while
a more sophisticated machine learning project could use millions
of features.
• In the spam detector example, the features could include the
following:
• words in the email text
• sender's address
• time of day the email was sent
• email contains the phrase "one weird trick."
What is (Supervised) ML?
• Terminology: Examples
• Example is a particular instance of data, x
(x is a vector)
• Labeled example has {features, label}: (x, y)
• Used to train the model

• In our spam detector example, the labeled examples would be individual emails that users have explicitly marked as "spam" or "not spam."
What is (Supervised) ML?
• Terminology: Examples
• Unlabeled example has {features, ?}: (x, ?)
• Used for making predictions on new data

• Once we've trained our model with labeled examples, we use that model to predict the label on unlabeled examples. In the spam detector, unlabeled examples are new emails that humans haven't yet labeled.
What is (Supervised) ML?
• Terminology: Models
• Model maps examples to predicted labels: y'
• Defined by internal parameters, which are learned
• For example, a spam detection model might associate
certain features strongly with "spam".
• Two phases of a model’s life:
• Training means creating or learning the model. That is, you
show the model labeled examples and enable the model to
gradually learn the relationships between features and label.
• Inference means applying the trained model to unlabeled
examples. That is, you use the trained model to make useful
predictions (y'). For example, during inference, you can predict
medianHouseValue for new unlabeled examples.
What is (Supervised) ML?
• Terminology: Regression vs. classification
• A regression model predicts continuous values. For
example, regression models make predictions that
answer questions like the following:
• What is the value of a house in California?
• What is the probability that a user will click on this ad?
• A classification model predicts discrete values. For
example, classification models make predictions that
answer questions like the following:
• Is a given email message spam or not spam?
• Is this an image of a dog, a cat, or a hamster?

Source: https://searchengineland.com/experiment-trying-predict-google-rankings-253621
Quiz
• Suppose you want to develop a supervised machine
learning model to predict whether a given email is
"spam" or "not spam." Which of the following
statements are true?
1. We'll use unlabeled examples to train the model.
2. Words in the subject header will make good labels.
3. The labels applied to some examples might be
unreliable.
4. Emails not marked as "spam" or "not spam" are
unlabeled examples.

Quiz (Answers)
1. False. The model is trained on labeled examples; unlabeled examples are used only at inference time.
2. False. Words in the subject header make good features, not labels.
3. True. Labels come from people, and people sometimes mislabel their email.
4. True. Emails not marked as "spam" or "not spam" have features but no label: (x, ?).
Quiz
• Suppose an online shoe store wants to create a
supervised ML model that will provide personalized
shoe recommendations to users. That is, the model
will recommend certain pairs of shoes to Marty and
different pairs of shoes to Janet. Which of the
following statements are true?
1. "The shoes that a user adores" is a useful label.
2. Shoe size is a useful feature.
3. "User clicks on a shoe's description" is a useful label.
4. Shoe beauty is a useful feature.

Quiz (Answers)
1. False. "The shoes that a user adores" is not an observable, quantifiable signal.
2. True. Shoe size is a quantifiable signal that likely has a strong effect on whether the user will like the recommended shoes.
3. True. Clicks on a shoe's description are an observable, quantifiable behavior that makes a useful label.
4. False. Beauty is subjective, not observable or quantifiable, so it makes a poor feature.
Linear Regression

• Linear regression is a method for finding the straight line or hyperplane that best fits a set of points.
• Let’s explore linear regression intuitively before
laying the groundwork for a machine learning
approach to linear regression.

• Can you tell the temperature by
listening to the chirping of a cricket?

• Yes!
Temperature (°F) = (number of chirps in 15 seconds) + 37

Linear Regression

• It has long been known that crickets (an insect species) chirp more frequently on hotter days than on cooler days.
• For decades, professional and amateur scientists
have cataloged data on chirps-per-minute and
temperature.
• As a birthday gift, your Aunt Ruth gives you her cricket database and asks you to learn a model that predicts temperature from chirps. Using this data, you want to explore the relationship.
19
Linear Regression

Figure 1. Chirps per Minute vs. Temperature in Celsius.

Is this relationship between chirps and temperature linear?


Linear Regression

Figure 2. A linear relationship.


Linear Regression
• Using the equation for a line, you could write down
this relationship as follows:
y=mx+b
where:
• y is the temperature in Celsius—the value we're trying
to predict.
• m is the slope of the line.
• x is the number of chirps per minute—the value of our
input feature.
• b is the y-intercept.

Linear Regression
• By convention in machine learning, you'll write the
equation for a model slightly differently:
y′=b+w1x1
where:
• y′ is the predicted label (a desired output).
• b is the bias (the y-intercept), sometimes referred to
as w0.
• w1 is the weight of feature 1. Weight is the same
concept as the "slope" m in the traditional equation of a
line.
• x1 is a feature (a known input).

Linear Regression
• To infer (predict) the temperature y′ for a new
chirps-per-minute value x1, just substitute
the x1 value into this model.
• Although this model uses only one feature, a more
sophisticated model might rely on multiple
features, each having a separate weight (w1, w2,
etc.). For example, a model that relies on three
features might look as follows:
y′ = b + w1x1 + w2x2 + w3x3
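A minimal sketch of this model in Python (the weights, bias, and feature values below are made-up numbers for illustration, not parameters from the lecture):

# Sketch of a linear model: y' = b + w1*x1 + w2*x2 + ...
def predict(features, weights, bias):
    """Return the predicted label y' for one example."""
    return bias + sum(w * x for w, x in zip(weights, features))

# One feature (chirps per minute) with hypothetical parameters:
y_prime = predict([30.0], weights=[0.3], bias=2.0)  # y' = 2.0 + 0.3*30 = 11.0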
Training and Loss
• Training:
• Training a model simply means learning (determining)
good values for all the weights and the bias from labeled
examples. In supervised learning, a machine learning
algorithm builds a model by examining many examples
and attempting to find a model that minimizes loss; this
process is called empirical risk minimization.

Training and Loss
• Loss:
• Loss is the penalty for a bad prediction. That is, loss is a
number indicating how bad the model's prediction was
on a single example.
• If the model's prediction is perfect, the loss is zero;
otherwise, the loss is greater.
• The goal of training a model is to find a set of weights
and biases that have low loss, on average, across all
examples.

Training and Loss
• The red arrow represents loss.
• The blue line represents predictions.

Figure 3. High loss in the left model; low loss in the right model.

• The blue line in the right plot is a much better predictive model than the blue line in the left plot.
Training and Loss
• Can you create a mathematical function—a loss
function—that would aggregate the individual losses in
a meaningful fashion?
• Squared loss: a popular loss function
• The linear regression models we'll examine here use a loss
function called squared loss (also known as L2 loss). The
squared loss for a single example is as follows:
= the square of the difference between the label and the prediction
= (observation − prediction(x))²
= (y − y')²

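A minimal sketch of squared loss in Python:

def squared_loss(y, y_prime):
    """L2 loss for a single example: (y - y')**2."""
    return (y - y_prime) ** 2

print(squared_loss(14.0, 12.5))  # 2.25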
Training and Loss
• Mean square error (MSE) is the average squared loss per
example over the whole dataset. To calculate MSE, sum up
all the squared losses for individual examples and then
divide by the number of examples:

where:
• (x,y) is an example in which
• x is the set of features (for example, chirps/minute, age, gender) that the
model uses to make predictions.
• y is the example's label (for example, temperature).
• prediction(x) is a function of the weights and bias in combination
with the set of features x.
• D is a data set containing many labeled examples, which
are (x,y) pairs.
• N is the number of examples in D.
• Although MSE is commonly used in machine learning, it is neither the only practical loss function nor the best loss function for all circumstances.
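A minimal sketch of the MSE computation (the linear model and the tiny dataset below are illustrative assumptions, not values from the lecture):

def mse(dataset, weights, bias):
    """Mean squared error over a list of (features, label) pairs."""
    total = 0.0
    for features, y in dataset:
        prediction = bias + sum(w * x for w, x in zip(weights, features))
        total += (y - prediction) ** 2
    return total / len(dataset)

# Hypothetical ([chirps per minute], temperature) examples:
data = [([20.0], 8.5), ([31.0], 13.0), ([40.0], 16.2)]
print(mse(data, weights=[0.4], bias=0.5))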
Quiz
• Consider the following two plots:

• Which of the two data sets shown in the preceding


plots has the higher Mean Squared Error (MSE)?

Reducing Loss: An Iterative Approach
• To train a model, we need a good way to reduce the
model’s loss. An iterative approach is one widely
used method for reducing loss, and is as easy and
efficient as walking down a hill.
• The following figure suggests the iterative trial-and-
error process that machine learning algorithms use
to train a model:

Figure 4. An iterative approach to training a model.


Reducing Loss: An Iterative Approach
• Iterative strategies are prevalent in machine
learning, primarily because they scale so well to
large data sets.
• The "model" takes one or more features as input
and returns one prediction (y') as output.
• To simplify, consider a model that takes one feature
and returns one prediction:
y′=b+w1x1
• What initial values should we set for b and w1?

Reducing Loss: An Iterative Approach
y′=b+w1x1
• For linear regression problems, it turns out that the
starting values aren't important. We could pick
random values, but we'll just take the following
trivial values instead:
• b=0
• w1 = 0
• Suppose that the first feature value is 10. Plugging
that feature value into the prediction function
yields:
• y' = 0 + 0(10)
• y' = 0
Reducing Loss: An Iterative Approach
• The "Compute Loss" part of the diagram is the loss
function that the model will use. Suppose we use the
squared loss function. The loss function takes in two
input values:
• y': The model's prediction for features x
• y: The correct label corresponding to features x.
• We've now reached the "Compute parameter updates" part of the diagram.
• Here the machine learning system examines the value of the loss function and generates new values for b and w1.
Reducing Loss: An Iterative Approach
• The machine learning system devises new values and
then re-evaluates all those features against all those
labels, yielding a new value for the loss function, which
yields new parameter values.
• The learning continues iterating until the algorithm
discovers the model parameters with the lowest
possible loss.
• Usually, you iterate until overall loss stops changing or
at least changes extremely slowly. When that happens,
we say that the model has converged.

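Putting the pieces together, here is a minimal sketch of the whole loop for y′ = b + w1x1. The slides leave the "Compute parameter updates" step unspecified; plain gradient descent on MSE is used here as an assumption, and the learning rate, tolerance, and dataset are made-up values:

def train(dataset, learning_rate=0.0005, tolerance=1e-6, max_steps=100_000):
    """Iteratively fit y' = b + w1*x1 by gradient descent on MSE."""
    b, w1 = 0.0, 0.0                    # trivial starting values, as above
    prev_loss = float("inf")
    for _ in range(max_steps):
        n = len(dataset)
        # Compute Loss: MSE over all examples.
        loss = sum((y - (b + w1 * x)) ** 2 for x, y in dataset) / n
        # Converged: overall loss has (nearly) stopped changing.
        if abs(prev_loss - loss) < tolerance:
            break
        prev_loss = loss
        # Compute parameter updates: gradients of MSE w.r.t. b and w1.
        grad_b = sum(-2 * (y - (b + w1 * x)) for x, y in dataset) / n
        grad_w1 = sum(-2 * x * (y - (b + w1 * x)) for x, y in dataset) / n
        b -= learning_rate * grad_b
        w1 -= learning_rate * grad_w1
    return b, w1

# Hypothetical (chirps per minute, temperature in Celsius) examples:
data = [(20.0, 8.5), (31.0, 13.0), (40.0, 16.2), (25.0, 10.1)]
b, w1 = train(data)
print(f"model: y' = {b:.3f} + {w1:.3f} * x1")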
Reference
• This lecture note has been developed based on the
machine learning crash course at Google, which is
under Creative Commons Attribution 3.0 License.

