ML Lecture1
(Lecture 1)
What is Machine Learning?
• Machine learning is a data analytics technique that
teaches computers to do what comes naturally to
humans and animals: learn from experience.
Machine learning algorithms use computational
methods to “learn” information directly from data
without relying on a predetermined equation as a
model.
• The algorithms adaptively improve their
performance as the number of samples available
for learning increases.
Source: https://www.mathworks.com/discovery/machine-learning.html
Source: http://www.nersc.gov/users/data-analytics/data-analytics-2/deep-learning/
Why Use Machine Learning (ML)?
• To do three practical things better as a (software)
engineer:
1. Reduce time programming
2. Customize and scale products
3. Complete seemingly “unprogrammable” tasks
Why Use Machine Learning (ML)?
• Philosophical reasons:
• ML changes the way you think about problems.
• Software engineers think logically and mathematically
• Focus shift in ML:
• Mathematical science to natural science
• Observations of uncertain world
• Running experiments
• Use statistics (not logic) to analyze the experiments
• Think like scientists
• Open up new areas to explore with ML
What is (Supervised) ML?
• ML systems learn how to combine input to produce
useful predictions on never-before-seen data
What is (Supervised) ML?
• Terminology: Labels and Features
• Label is the true thing we’re predicting: y
• The y variable in basic linear regression
• The label could be the future price of wheat, the kind of animal
shown in a picture, the meaning of an audio clip, or just about
anything.
• Features are input variables describing our data: xi
• The {x1, x2, … xn} variables in basic linear regression
• A simple machine learning project might use a single feature, while
a more sophisticated machine learning project could use millions
of features.
• In the spam detector example, the features could include the
following:
• words in the email text
• sender's address
• time of day the email was sent
• whether the email contains the phrase "one weird trick."
What is (Supervised) ML?
• Terminology: Examples
• Example is a particular instance of data, x
(x is a vector)
• Labeled example has {features, label}: (x, y)
• Used to train the model
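As a rough illustration of this terminology, a labeled example can be sketched as a feature vector paired with a label. The class name `Example`, the feature values, and the label encoding below are hypothetical, chosen only to match the spam-detector scenario.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Example:
    """One instance of data: a feature vector x and, if known, a label y."""
    features: Sequence[float]      # x: the input variables
    label: Optional[int] = None    # y: None means the example is unlabeled

    def is_labeled(self) -> bool:
        return self.label is not None


# Hypothetical features: [word count, hour sent, contains "one weird trick" (1/0)]
labeled = Example(features=[120.0, 3.0, 1.0], label=1)  # 1 = spam (assumed encoding)
unlabeled = Example(features=[45.0, 14.0, 0.0])         # no label assigned

# Only labeled examples can be used for supervised training.
training_set = [ex for ex in (labeled, unlabeled) if ex.is_labeled()]
print(len(training_set))
```

Keeping the label optional makes the distinction explicit: the same structure holds both kinds of example, and only those with a label go into the training set.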
Source: https://searchengineland.com/experiment-trying-predict-google-rankings-253621
Quiz
• Suppose you want to develop a supervised machine
learning model to predict whether a given email is
"spam" or "not spam." Which of the following
statements are true?
1. We'll use unlabeled examples to train the model.
2. Words in the subject header will make good labels.
3. The labels applied to some examples might be
unreliable.
4. Emails not marked as "spam" or "not spam" are
unlabeled examples.
Quiz
• Suppose an online shoe store wants to create a
supervised ML model that will provide personalized
shoe recommendations to users. That is, the model
will recommend certain pairs of shoes to Marty and
different pairs of shoes to Janet. Which of the
following statements are true?
1. The shoes that a user adores is a useful label.
2. Shoe size is a useful feature.
3. User clicks on a shoe's description is a useful label.
4. Shoe beauty is a useful feature.
Linear Regression
• Can you tell the temperature by
listening to the chirping of a cricket?
• Yes!
Temperature (°F) = (number of chirps in 15 seconds) + 37
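The rule of thumb above can be written as a one-line function. This is a minimal sketch; the function name `temperature_f` is made up for illustration.

```python
def temperature_f(chirps_per_15s: float) -> float:
    """Rule of thumb from the slide: chirps counted in 15 seconds, plus 37."""
    return chirps_per_15s + 37.0


print(temperature_f(30))  # 67.0
```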
Linear Regression
• By convention in machine learning, you'll write the
equation for a model slightly differently:
$y' = b + w_1 x_1$
where:
• y′ is the predicted label (a desired output).
• b is the bias (the y-intercept), sometimes referred to
as w0.
• w1 is the weight of feature 1. Weight is the same
concept as the "slope" m in the traditional equation of a
line.
• x1 is a feature (a known input).
Linear Regression
• To infer (predict) the temperature y′ for a new
chirps-per-minute value x1, just substitute
the x1 value into this model.
• Although this model uses only one feature, a more
sophisticated model might rely on multiple
features, each having a separate weight (w1, w2,
etc.). For example, a model that relies on three
features might look as follows:
$y' = b + w_1 x_1 + w_2 x_2 + w_3 x_3$
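The single-feature and multi-feature forms of the model share one pattern: the bias plus a weighted sum of the features. A minimal sketch follows; the function name `predict` and all numeric values are illustrative assumptions, not values from the lecture.

```python
from typing import Sequence


def predict(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Return y' = b + w1*x1 + ... + wn*xn."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return bias + sum(w * x for w, x in zip(weights, features))


# One feature: y' = 37 + 1 * 20
y_single = predict([20.0], weights=[1.0], bias=37.0)
print(y_single)  # 57.0

# Three features, each with its own weight: y' = 4 + 0.5*1 - 1*2 + 2*3
y_three = predict([1.0, 2.0, 3.0], weights=[0.5, -1.0, 2.0], bias=4.0)
print(y_three)  # 8.5
```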
Training and Loss
• Training:
• Training a model simply means learning (determining)
good values for all the weights and the bias from labeled
examples. In supervised learning, a machine learning
algorithm builds a model by examining many examples
and attempting to find a model that minimizes loss; this
process is called empirical risk minimization.
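Written as a formula, empirical risk minimization chooses the parameters that give the lowest average loss over the training examples. The notation here is an assumption made for illustration: $L$ is any per-example loss function, $D$ is the set of labeled examples, and $N$ is the number of examples in $D$.

```latex
(\hat{w}, \hat{b}) = \arg\min_{w,\, b} \; \frac{1}{N} \sum_{(x, y) \in D} L\bigl(y,\; \mathrm{prediction}_{w, b}(x)\bigr)
```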
Training and Loss
• Loss:
• Loss is the penalty for a bad prediction. That is, loss is a
number indicating how bad the model's prediction was
on a single example.
• If the model's prediction is perfect, the loss is zero;
otherwise, the loss is greater.
• The goal of training a model is to find a set of weights
and biases that have low loss, on average, across all
examples.
Training and Loss
• The red arrow represents loss.
• The blue line represents predictions.
Figure 3. High loss in the left model; low loss in the right model.
Training and Loss
• Mean square error (MSE) is the average squared loss per
example over the whole dataset. To calculate MSE, sum up
all the squared losses for individual examples and then
divide by the number of examples:
$$MSE = \frac{1}{N} \sum_{(x,y) \in D} \bigl(y - \mathrm{prediction}(x)\bigr)^2$$
where:
• (x,y) is an example in which
• x is the set of features (for example, chirps/minute, age, gender) that the
model uses to make predictions.
• y is the example's label (for example, temperature).
• prediction(x) is a function of the weights and bias in combination
with the set of features x.
• D is a data set containing many labeled examples, which
are (x,y) pairs.
• N is the number of examples in D.
• Although MSE is commonly used in machine learning, it is
neither the only practical loss function nor the best loss
function for all circumstances.
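The MSE definition above translates directly into code: square each example's error, sum the squares, and divide by N. This is a minimal sketch; the dataset `D` and the one-feature `model` below are made-up values for illustration.

```python
from typing import Callable, List, Sequence, Tuple

LabeledExample = Tuple[Sequence[float], float]  # an (x, y) pair


def mean_squared_error(
    dataset: List[LabeledExample],
    prediction: Callable[[Sequence[float]], float],
) -> float:
    """Average of (y - prediction(x))^2 over all (x, y) pairs in the dataset."""
    if not dataset:
        raise ValueError("dataset must contain at least one example")
    total = sum((y - prediction(x)) ** 2 for x, y in dataset)
    return total / len(dataset)


# Hypothetical data: x = [chirps in 15 seconds], y = temperature (F)
D = [([20.0], 57.0), ([30.0], 68.0), ([40.0], 75.0)]


def model(x: Sequence[float]) -> float:
    """y' = b + w1*x1 with assumed values b = 37 and w1 = 1."""
    return 37.0 + 1.0 * x[0]


# Errors are 0, 1 and -2, so the MSE is (0 + 1 + 4) / 3.
mse = mean_squared_error(D, model)
print(mse)
```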
Quiz
• Consider the following two plots:
[Figure: two plots, left and right (not reproduced)]
Reducing Loss: An Iterative Approach
• To train a model, we need a good way to reduce the
model’s loss. An iterative approach is one widely
used method for reducing loss, and is as easy and
efficient as walking down a hill.
• The following figure suggests the iterative trial-and-
error process that machine learning algorithms use
to train a model:
Reducing Loss: An Iterative Approach
$y' = b + w_1 x_1$
• For linear regression problems, it turns out that the
starting values aren't important. We could pick
random values, but we'll just take the following
trivial values instead:
• b=0
• w1 = 0
• Suppose that the first feature value is 10. Plugging
that feature value into the prediction function
yields:
• y' = 0 + 0(10)
• y' = 0
Reducing Loss: An Iterative Approach
• The "Compute Loss" part of the diagram is the loss
function that the model will use. Suppose we use the
squared loss function. The loss function takes in two
input values:
• y': The model's prediction for features x
• y: The correct label corresponding to features x.
• We’ve reached the "Compute parameter updates" part
of the diagram.
• Here, the machine learning system examines the value
of the loss function and generates new values
for b and w1.
Reducing Loss: An Iterative Approach
• The machine learning system devises new values and
then re-evaluates all those features against all those
labels, yielding a new value for the loss function, which
yields new parameter values.
• The learning continues iterating until the algorithm
discovers the model parameters with the lowest
possible loss.
• Usually, you iterate until overall loss stops changing or
at least changes extremely slowly. When that happens,
we say that the model has converged.
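The iterative loop described above (predict, compute loss, update parameters, repeat until the loss stops changing) can be sketched as follows. The slides do not say how the new parameter values are generated; this sketch assumes gradient descent on MSE as the update rule. The function name `train`, the learning rate, the tolerance, and the data are all illustrative assumptions.

```python
from typing import List, Tuple


def train(
    data: List[Tuple[float, float]],
    learning_rate: float = 0.05,
    tolerance: float = 1e-9,
    max_steps: int = 100_000,
) -> Tuple[float, float, float]:
    """Fit y' = b + w1*x1 iteratively; return (b, w1, final loss)."""
    if not data:
        raise ValueError("data must contain at least one example")
    b, w1 = 0.0, 0.0                # trivial starting values, as in the slides
    n = len(data)
    prev_loss = float("inf")
    loss = prev_loss
    for _ in range(max_steps):
        # Compute predictions and the loss (MSE) for the current parameters.
        errors = [(b + w1 * x) - y for x, y in data]
        loss = sum(e * e for e in errors) / n
        # Converged: the loss has stopped changing (within the tolerance).
        if abs(prev_loss - loss) < tolerance:
            break
        prev_loss = loss
        # Compute parameter updates (assumed rule: gradient descent on MSE).
        grad_b = 2.0 * sum(errors) / n
        grad_w1 = 2.0 * sum(e * x for e, (x, _) in zip(errors, data)) / n
        b -= learning_rate * grad_b
        w1 -= learning_rate * grad_w1
    return b, w1, loss


# Made-up data generated from y = 3 + 2x, so training should approach b = 3, w1 = 2.
data = [(1.0, 5.0), (2.0, 7.0), (3.0, 9.0), (4.0, 11.0)]
b, w1, final_loss = train(data)
print(round(b, 2), round(w1, 2))
```

The stopping test mirrors the convergence criterion in the slide: iteration ends once the loss changes by less than a small tolerance between steps, with `max_steps` as a safety limit.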
Reference
• This lecture note has been developed based on the
machine learning crash course at Google, which is
under Creative Commons Attribution 3.0 License.