Data Science Boot Camp
Sibt ul Hussain
http://sites.google.com/SibtulHussain

Linear Regression

The majority of the content is borrowed from multiple online resources.
Predict Gold Prices Over the Next Day
[Figure: line chart of gold prices (roughly 35000-43000) over about 200 days]
Housing Prices
[Figure: scatter plot of house prices; x-axis: Size (feet²), y-axis: Price (in 1000s of dollars)]

Regression Problem
Predict real-valued output
Training set of housing prices

Size in feet² (x) | Price ($) in 1000's (y)
------------------+------------------------
2104              | 460
1416              | 232
1534              | 315
852               | 178
…                 | …

Notation:
m = number of training examples
x's = "input" variable / features
y's = "output" variable / "target" variable
Training Set

Size in feet² (x) | Price ($) in 1000's (y)
------------------+------------------------
2104              | 460
1416              | 232
1534              | 315
852               | 178
…                 | …
Hypothesis: h_θ(x) = θ_0 + θ_1 x
θ's: Parameters
How to choose θ's?
[Figure: three example hypotheses h_θ(x) plotted for different choices of θ_0 and θ_1]
Idea: Choose θ_0, θ_1 so that h_θ(x) is close to y for our training examples (x, y).
Score Function (Or Hypothesis)
• The score function or hypothesis is used to generate the output given an input X. In linear regression our hypothesis is h_θ(x) = θ_0 + θ_1 x.
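A minimal sketch in Python/NumPy of such a score function; the parameter values below are illustrative, not fitted:

```python
import numpy as np

def hypothesis(theta0, theta1, x):
    """Linear score function: h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# Illustrative (un-fitted) parameter values on the housing sizes:
sizes = np.array([2104, 1416, 1534, 852])     # size in feet^2
print(hypothesis(50.0, 0.2, sizes))           # predicted price in $1000's
```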
Cost Function
• The cost function is used to evaluate our hypothesis, i.e. how good our chosen hypothesis is.
▫ For instance, in the case of linear regression the cost function can be the squared error between predictions and targets (alternatives follow on the next slides).
• The goal of learning thus reduces to searching the hypothesis space for the best possible hypothesis (a hypothesis that optimizes our cost function).
• In other words, the cost function specifies the purpose of the learning algorithm.
Hypothesis: h_θ(x) = θ_0 + θ_1 x
Parameters: θ_0, θ_1
Cost Function: J(θ_0, θ_1) = (1/2m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
Goal: minimize J(θ_0, θ_1) over θ_0, θ_1
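A minimal sketch of this cost function on the training set from the earlier table (NumPy assumed; the candidate parameters are arbitrary):

```python
import numpy as np

def cost_J(theta0, theta1, x, y):
    """Squared-error cost: J = (1/2m) * sum((h_theta(x) - y)^2)."""
    m = len(x)
    residuals = theta0 + theta1 * x - y
    return np.sum(residuals ** 2) / (2 * m)

x = np.array([2104, 1416, 1534, 852], dtype=float)   # size in feet^2
y = np.array([460, 232, 315, 178], dtype=float)      # price in $1000's
print(cost_J(50.0, 0.2, x, y))   # cost of one arbitrary candidate hypothesis
```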
Possible Cost Functions for LR
• Absolute (or L1) Cost Function: J(θ) = (1/m) · Σ_{i=1}^{m} |h_θ(x^(i)) − y^(i)|
• Properties:
▫ Penalty for positive and negative deviations is the same.
▫ Penalty grows only linearly with the size of the deviation, i.e. small and large errors receive the same per-unit treatment.
▫ Difficult to differentiate (non-differentiable at zero).
▫ Convex.
Possible Cost Functions for LR
• L2 Cost Function: J(θ) = (1/2m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
• Properties:
▫ Penalty for positive and negative deviations is the same.
▫ Penalty for large deviations is large compared to small deviations.
▫ Easy to differentiate.
▫ Convex.
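A small sketch contrasting the two penalties; the toy targets and predictions are made up to show how a single large deviation affects each cost:

```python
import numpy as np

def l1_cost(pred, y):
    """Mean absolute deviation: penalty grows linearly with the error."""
    return np.mean(np.abs(pred - y))

def l2_cost(pred, y):
    """Mean squared deviation: large errors dominate the total."""
    return np.mean((pred - y) ** 2)

y    = np.array([10.0, 10.0, 10.0, 10.0])
pred = np.array([11.0, 11.0, 11.0, 30.0])   # one large deviation
print(l1_cost(pred, y))    # 5.75   -> outlier contributes proportionally
print(l2_cost(pred, y))    # 100.75 -> outlier dominates the cost
```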
L2 Cost Function
How to Optimize the Cost Function
• Random Search
▫ Define a finite interval for the values of θ.
▫ Iterate over the interval:
  - Evaluate the cost function J(θ).
  - Cache the results.
▫ Choose the values of the parameters that give the optimum value of the cost function (see the sketch below).
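A hedged sketch of this brute-force search on the housing data; the parameter intervals and grid resolution are assumptions chosen for illustration:

```python
import numpy as np

x = np.array([2104, 1416, 1534, 852], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

def cost_J(t0, t1):
    return np.sum((t0 + t1 * x - y) ** 2) / (2 * len(x))

# Finite, hand-picked intervals for theta0 and theta1 (illustrative):
candidates = ((t0, t1) for t0 in np.linspace(-100, 100, 201)
                       for t1 in np.linspace(0.0, 0.5, 101))
best = min(candidates, key=lambda p: cost_J(*p))
print("best (theta0, theta1):", best, "with cost", cost_J(*best))
```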
How to Optimize the Cost Function
• Derivation Approach:
▫ Compute the derivative of J w.r.t. each parameter.
▫ Set all the derivatives equal to zero, i.e. ∂J/∂θ_j = 0, and solve the resulting system of linear equations to obtain the optimum values of the parameters (see the sketch below).
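For the squared-error cost this procedure leads to the well-known normal equations; a minimal NumPy sketch on the slide's training set:

```python
import numpy as np

x = np.array([2104, 1416, 1534, 852], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

# Setting dJ/dtheta_j = 0 yields the linear system (X^T X) theta = X^T y:
X = np.column_stack([np.ones_like(x), x])   # prepend an intercept column
theta = np.linalg.solve(X.T @ X, X.T @ y)
print("theta0, theta1 =", theta)
```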
How to Optimize the Cost Function
• Gradient Descent
  How?
How to Optimize the Cost Function
• Gradient Descent: repeat until convergence
▫ θ_j := θ_j − α · ∂J(θ)/∂θ_j (update all parameters simultaneously; see the sketch below)
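A minimal batch gradient descent sketch for the two-parameter case; the learning rate and iteration count are illustrative assumptions (the tiny α is needed because the size feature is unscaled, as the feature-scaling slide later explains):

```python
import numpy as np

x = np.array([2104, 1416, 1534, 852], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)
m = len(x)

theta0, theta1 = 0.0, 0.0
alpha = 1e-7              # tiny because x is unscaled (illustrative choice)

for _ in range(1000):
    error = theta0 + theta1 * x - y          # h_theta(x) - y per example
    grad0 = np.sum(error) / m                # dJ/dtheta0
    grad1 = np.sum(error * x) / m            # dJ/dtheta1
    # Simultaneous update of both parameters:
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1)
```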
Using Multiple Input Features
• Hypothesis: h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2 + … + θ_n x_n = θᵀx (a sketch follows)
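A hedged sketch of the multi-feature case; the second feature (number of bedrooms) and its values are hypothetical, added only to illustrate the vectorized form θᵀx:

```python
import numpy as np

# Hypothetical design matrix: size in feet^2 plus a made-up bedrooms feature
X = np.array([[2104, 3],
              [1416, 2],
              [1534, 3],
              [ 852, 2]], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

Xb = np.column_stack([np.ones(len(X)), X])   # intercept column for theta_0
theta = np.zeros(Xb.shape[1])
alpha, m = 1e-7, len(y)

# Vectorized gradient step: theta := theta - (alpha/m) * X^T (X theta - y)
for _ in range(1000):
    theta -= (alpha / m) * Xb.T @ (Xb @ theta - y)
print(theta)
```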
Remember: Debugging Trick
• Always plot your cost function J(θ) against the number of iterations inside your gradient descent loop. If gradient descent is working, J(θ) should decrease after every iteration (see the sketch below).
• If the value of J(θ) is increasing, you probably need a smaller α: you are overshooting the minimum.
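A sketch of this debugging plot (matplotlib assumed; learning rate and iteration count are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([2104, 1416, 1534, 852], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)
m = len(x)

theta0 = theta1 = 0.0
alpha = 1e-7
history = []
for _ in range(200):
    error = theta0 + theta1 * x - y
    history.append(np.sum(error ** 2) / (2 * m))   # record J(theta)
    theta0 -= alpha * np.sum(error) / m
    theta1 -= alpha * np.sum(error * x) / m

plt.plot(history)                       # should decrease every iteration
plt.xlabel("iteration"); plt.ylabel("J(theta)")
plt.show()
```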
Feature Scaling
• Always remember: feature scaling can make your convergence faster (a sketch follows).
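A minimal sketch of one common scaling choice (standardization to zero mean and unit variance):

```python
import numpy as np

x = np.array([2104, 1416, 1534, 852], dtype=float)

x_scaled = (x - x.mean()) / x.std()   # zero mean, unit variance
print(x_scaled)   # now roughly in [-1.5, 1.5]; a much larger alpha works
```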
What about Non-Linear Cases?
Feature Mapping (A Simple Trick)
• We will map our features to higher dimensions using a simple trick.
• For example, you are given only feature X, but you can expand this feature by including higher-order polynomials of X, i.e. X, X², X³, … (see the sketch below).
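A minimal sketch of this expansion for a single feature (the degree is an illustrative choice):

```python
import numpy as np

def poly_features(x, degree):
    """Map a single feature x to the columns [x, x^2, ..., x^degree]."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

x = np.array([1.0, 2.0, 3.0])
print(poly_features(x, 3))
# [[ 1.  1.  1.]
#  [ 2.  4.  8.]
#  [ 3.  9. 27.]]
```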
Different Mappings Can Be Used
Non-Linear Case
• Algorithm:
▫ Expand each feature to include the non-linear mapping.
▫ Learn the set of parameters using gradient descent, exactly as in the linear case (a sketch follows).
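An end-to-end sketch of this algorithm on a toy nonlinear target (data, degree, learning rate, and iteration count are all illustrative assumptions):

```python
import numpy as np

# Toy nonlinear data: y = x^2, noise-free, for illustration only
x = np.linspace(-2, 2, 21)
y = x ** 2

# Step 1: expand the single feature with a polynomial mapping
X = np.column_stack([x, x ** 2, x ** 3])
X = (X - X.mean(axis=0)) / X.std(axis=0)     # scale for faster convergence
Xb = np.column_stack([np.ones(len(x)), X])   # intercept column

# Step 2: learn the parameters with batch gradient descent
theta = np.zeros(Xb.shape[1])
alpha, m = 0.1, len(x)
for _ in range(2000):
    theta -= (alpha / m) * Xb.T @ (Xb @ theta - y)

print(np.round(theta, 3))   # the weight on the x^2 column dominates
```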