Linear Regression
Yijun Zhao
Northeastern University
Fall 2016
Regression Examples
x (any attributes) =⇒ y (continuous value)
{age, major, gender, race} =⇒ GPA
{income, credit score, profession} =⇒ loan
{college, major, GPA} =⇒ future income
...
Regression Examples
Data often comes in, or can be converted into, matrix form:
Age Gender Race Major GPA
20 0 A Art 3.85
22 0 C Engineer 3.90
25 1 A Engineer 3.50
24 0 AA Art 3.60
19 1 H Art 3.70
18 1 C Engineer 3.00
30 0 AA Engineer 3.80
25 0 C Engineer 3.95
28 1 A Art 4.00
26 0 C Engineer 3.20
Formal Problem Setup
Given N observations
{(x1, y1), (x2, y2), . . . , (xN, yN)}
a regression problem tries to uncover the function
yi = f(xi), ∀ i = 1, 2, . . . , N
such that, for a new input value x∗, we can
accurately predict the corresponding value
y∗ = f(x∗).
Linear Regression
Assume the function f is a linear combination
of components in x
Formally, let x = (1, x1, x2, . . . , xd)^T; we have
y = ω0 + ω1 x1 + ω2 x2 + · · · + ωd xd = w^T x
where w = (ω0, ω1, ω2, . . . , ωd)^T
w is the parameter to estimate!
Prediction:
y∗ = w^T x∗
Visual Illustration
Figure: 1D and 2D linear regression
Error Measure
Mean Squared Error (MSE):
E(w) = (1/N) Σ_{n=1}^{N} (w^T xn − yn)^2
     = (1/N) ‖Xw − y‖^2
where X is the N × (d+1) matrix with rows x1^T, x2^T, . . . , xN^T
and y = (y1, y2, . . . , yN)^T
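As a concrete illustration, here is a minimal NumPy sketch (the toy data and variable names are illustrative, not from the slides) confirming that the per-observation sum and the matrix form compute the same quantity:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 2))])  # N x (d+1), first column is x0 = 1
    y = rng.normal(size=5)
    w = rng.normal(size=3)

    # MSE as an explicit sum over the N observations ...
    mse_sum = np.mean([(w @ X[n] - y[n]) ** 2 for n in range(len(y))])
    # ... and as the matrix form (1/N) * ||Xw - y||^2
    mse_matrix = np.sum((X @ w - y) ** 2) / len(y)
    print(mse_sum, mse_matrix)  # the two numbers agree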
Minimizing Error Measure
E(w) = (1/N) ‖Xw − y‖^2
∇E(w) = (2/N) X^T (Xw − y) = 0
=⇒ X^T X w = X^T y
=⇒ w = X† y
where X† = (X^T X)^{−1} X^T is the
'pseudo-inverse' of X
LR Algorithm Summary
Ordinary Least Squares (OLS) Algorithm
Construct the matrix X and the vector y from
the dataset {(x1, y1), (x2, y2), . . . , (xN, yN)}
(each x includes x0 = 1) as follows:
X = matrix with rows x1^T, x2^T, . . . , xN^T,  y = (y1, y2, . . . , yN)^T
Compute X† = (X^T X)^{−1} X^T
Return w = X† y
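A minimal NumPy sketch of these steps (the helper name ols_fit and the toy data are illustrative, not part of the slides):

    import numpy as np

    def ols_fit(X_raw, y):
        """Ordinary least squares: w = (X^T X)^(-1) X^T y with a prepended bias column."""
        N = X_raw.shape[0]
        X = np.hstack([np.ones((N, 1)), X_raw])   # each row starts with x0 = 1
        X_pinv = np.linalg.inv(X.T @ X) @ X.T     # 'pseudo-inverse' (X^T X)^(-1) X^T
        return X_pinv @ y                         # w = X† y

    # Usage: recover known weights from noise-free data.
    rng = np.random.default_rng(1)
    X_raw = rng.normal(size=(100, 3))
    true_w = np.array([0.5, 2.0, -1.0, 3.0])      # (ω0, ω1, ω2, ω3)
    y = np.hstack([np.ones((100, 1)), X_raw]) @ true_w
    print(ols_fit(X_raw, y))                      # ≈ [0.5, 2.0, -1.0, 3.0]

In practice, np.linalg.pinv or np.linalg.lstsq is usually preferred when X^T X is close to singular.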
Gradient Descent
Why?
Minimize our target function E(w) by
moving in the direction of steepest descent
Gradient Descent
Gradient Descent Algorithm
Initialize the weights w(0) at time t = 0
for t = 0, 1, 2, . . . do
    Compute the gradient gt = ∇E(w(t))
    Set the direction to move: vt = −gt
    Update w(t + 1) = w(t) + η vt
    Iterate until it is time to stop
Return the final weights w
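A minimal sketch of this loop for the MSE objective, assuming a fixed learning rate η and a fixed iteration budget as the stopping rule (both are illustrative choices, not prescribed by the slides):

    import numpy as np

    def gradient_descent(X, y, eta=0.1, n_iters=1000):
        """Minimize E(w) = (1/N) ||Xw - y||^2 with fixed-step gradient descent."""
        N, d = X.shape
        w = np.zeros(d)                            # initialize w(0)
        for _ in range(n_iters):                   # for t = 0, 1, 2, ...
            g = (2.0 / N) * X.T @ (X @ w - y)      # gradient g_t = ∇E(w(t))
            w = w - eta * g                        # step in the direction v_t = -g_t
        return w                                   # final weights

Applied to the X and y built in the OLS sketch, this converges to essentially the same w as the closed-form solution.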
Gradient Descent
How does η affect the algorithm?
Use η = 0.1 (practical observation)
Use a variable step size: ηt = η ‖∇E‖
OLS or Gradient Descent?
Computational Complexity
OLS vs. Gradient Descent:
OLS is expensive when the number of features d is large!
Linear Regression
What is the Probabilistic Interpretation?
Normal Distribution
Figure: right-skewed, left-skewed, random, and normal distributions
Normal Distribution
mean = median = mode
symmetry about the center
x ∼ N(µ, σ^2) =⇒ f(x) = (1 / (σ√(2π))) e^{−(x−µ)^2 / (2σ^2)}
Central Limit Theorem
All things bell shaped!
Random occurrences over a large population
tend to wash out the asymmetry and
uniformity of individual events. A more
'natural' distribution ensues. The name for it is
the Normal distribution (the bell curve).
Formal definition: if (y1, . . . , yn) are i.i.d. and
0 < σy^2 < ∞, then when n is large the
distribution of ȳ is well approximated by a
normal distribution N(µy, σy^2 / n).
Central Limit Theorem
Example:
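In place of the example figure, here is a small simulation in the same spirit (the exponential distribution and the sample sizes are illustrative choices, not from the slides): individual draws are strongly skewed, yet their sample means look normal.

    import numpy as np

    rng = np.random.default_rng(2)
    # Individual draws come from a strongly right-skewed exponential distribution ...
    draws = rng.exponential(scale=1.0, size=(10_000, 50))
    # ... yet the mean of each row of n = 50 draws is approximately N(1, 1/50).
    sample_means = draws.mean(axis=1)
    print(sample_means.mean(), sample_means.var())   # ≈ 1.0 and ≈ 0.02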
LR: Probabilistic Interpretation
prob(yi | xi) = (1 / (σ√(2π))) e^{−(w^T xi − yi)^2 / (2σ^2)}
LR: Probabilistic Interpretation
Likelihood of the entire dataset:
L ∝ ∏_i e^{−(w^T xi − yi)^2 / (2σ^2)}
  = e^{−Σ_i (w^T xi − yi)^2 / (2σ^2)}
Maximize L ⇐⇒ Minimize Σ_i (w^T xi − yi)^2
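One intermediate step makes the equivalence explicit: taking logs of the likelihood above (a short sketch in LaTeX, using the same Gaussian noise model as the slide):

    \log L = \text{const} \;-\; \frac{1}{2\sigma^2} \sum_i \big(\mathbf{w}^T \mathbf{x}_i - y_i\big)^2

Since the logarithm is monotone and σ is fixed, maximizing L (or log L) over w is the same as minimizing the sum of squared errors, i.e. ordinary least squares.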
Non-linear Transformation
Linear is limited:
Linear models become powerful when we
consider non-linear feature transformations:
Xi = (1, xi, xi^2) =⇒ yi = ω0 + ω1 xi + ω2 xi^2
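A minimal sketch of this quadratic feature transformation (the helper name quadratic_features and the toy target are illustrative, not from the slides); the model is still linear in the weights, so OLS applies unchanged:

    import numpy as np

    def quadratic_features(x):
        """Map each scalar input x_i to the transformed vector (1, x_i, x_i^2)."""
        x = np.asarray(x, dtype=float)
        return np.column_stack([np.ones_like(x), x, x ** 2])

    # A linear model in the transformed features fits a quadratic target exactly.
    x = np.linspace(-1.0, 1.0, 50)
    y = 1.0 + 2.0 * x - 3.0 * x ** 2
    Z = quadratic_features(x)
    w = np.linalg.pinv(Z) @ y
    print(w)   # ≈ [1, 2, -3]  (ω0, ω1, ω2)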
Overfitting
How do we know we have overfit?
Ein : Error from the training data
Eout : Error from the test data
Example:
Overfitting
How to avoid overfitting?
Use more data
Evaluate on a parameter tuning set
Regularization
Regularization
Attempts to impose the "Occam's razor" principle
Add a penalty term for model complexity
Most commonly used:
L2 regularization (ridge regression) minimizes:
E(w) = ‖Xw − y‖^2 + λ ‖w‖^2
where λ ≥ 0 and ‖w‖^2 = w^T w
L1 regularization (LASSO) minimizes:
E(w) = ‖Xw − y‖^2 + λ |w|_1
where λ ≥ 0 and |w|_1 = Σ_{i=1}^{D} |ωi|
Regularization
L2: closed-form solution
w = (X^T X + λI)^{−1} X^T y
L1: no closed-form solution. Use quadratic
programming:
minimize ‖Xw − y‖^2  s.t.  ‖w‖_1 ≤ s
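A minimal NumPy sketch of the L2 closed-form solution above (the helper name ridge_fit is illustrative; this version penalizes every weight, including ω0, exactly as the formula is written, although some conventions exclude the bias):

    import numpy as np

    def ridge_fit(X, y, lam):
        """L2-regularized least squares: w = (X^T X + λ I)^(-1) X^T y."""
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

λ = 0 recovers the OLS solution; larger λ shrinks the weights toward zero.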
L2 Regularization Example
Model Selection
Which model?
A central problem in supervised learning
Simple model: "underfits" the data
    Constant function
    Linear model applied to quadratic data
Complex model: "overfits" the data
    High-degree polynomials
    A model with hidden logic that fits the data completely
Bias-Variance Trade-off
Consider E[ (1/N) Σ_{n=1}^{N} (w^T xn − yn)^2 ];  let ŷn = w^T xn
E[ (ŷn − yn)^2 ] can be decomposed into (reading):
var{noise} + bias^2 + var{ŷ}
var{noise}: can't be reduced
bias^2 + var{ŷ} is what counts for prediction
High bias^2: model mismatch, often due to "underfitting"
High var{ŷ}: training set and test set mismatch, often due to "overfitting"
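For the reading, here is a compact LaTeX sketch of where the three terms come from, under the usual assumptions that y = f(x) + ε with E[ε] = 0 and var(ε) = σ², ε independent of the training set, and the expectation taken over training sets and noise:

    \mathbb{E}\big[(\hat{y} - y)^2\big]
      = \underbrace{\sigma^2}_{\mathrm{var}\{\text{noise}\}}
      + \underbrace{\big(\mathbb{E}[\hat{y}] - f(x)\big)^2}_{\mathrm{bias}^2}
      + \underbrace{\mathbb{E}\big[(\hat{y} - \mathbb{E}[\hat{y}])^2\big]}_{\mathrm{var}\{\hat{y}\}}

The cross terms vanish because ε has mean zero and is independent of ŷ, and E[ŷ − E[ŷ]] = 0.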
Bias-Variance Trade-off
Often: low bias ⇒ high variance
low variance ⇒ high bias
Trade-off:
How to choose λ?
But we still need to pick λ.
Use the test set data? NO!
Set aside another evaluation set
Small evaluation set ⇒ inaccurate error estimate
Large evaluation set ⇒ small training set
Cross-validation
Cross Validation (CV)
Divide the data into K folds
Alternately train on all folds except the k-th, and
test on the k-th fold, for k = 1, . . . , K
Cross Validation (CV)
How to choose K?
Common choices: K = 5, 10, or N (LOOCV)
Measure the average performance across the K folds
Computational cost: K folds × number of λ values
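A minimal sketch of K-fold cross-validation for choosing λ, reusing the ridge_fit sketch from the regularization slide (the fold construction and the candidate λ grid are illustrative assumptions):

    import numpy as np

    def cv_error(X, y, lam, K=5):
        """Average validation MSE of ridge regression over K folds."""
        N = X.shape[0]
        folds = np.array_split(np.random.default_rng(3).permutation(N), K)
        errors = []
        for k in range(K):
            val = folds[k]                                         # hold out the k-th fold
            train = np.hstack([folds[j] for j in range(K) if j != k])
            w = ridge_fit(X[train], y[train], lam)                 # fit on the other K-1 folds
            errors.append(np.mean((X[val] @ w - y[val]) ** 2))     # error on the held-out fold
        return np.mean(errors)

    # Pick the λ with the lowest average validation error over a candidate grid:
    # best_lam = min([0.01, 0.1, 1.0, 10.0], key=lambda lam: cv_error(X, y, lam))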
Learning Curve
A learning curve plots the performance of the
algorithm as a function of the training set size
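A minimal sketch of computing points on such a curve with OLS (the function name, the size grid, and the use of a fixed held-out test set are illustrative assumptions, not from the slides):

    import numpy as np

    def learning_curve(X_train, y_train, X_test, y_test, sizes):
        """Training and test MSE of OLS as a function of the number of training examples."""
        points = []
        for n in sizes:
            w = np.linalg.pinv(X_train[:n]) @ y_train[:n]             # fit on the first n examples
            e_in = np.mean((X_train[:n] @ w - y_train[:n]) ** 2)      # E_in: training error
            e_out = np.mean((X_test @ w - y_test) ** 2)               # E_out: test error
            points.append((n, e_in, e_out))
        return points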