
Estimation (Linear Regression) : Muhamad Fathurahman Data Mining Session 26-27 March 2020

The document discusses linear regression, a supervised machine learning algorithm for modeling the relationship between a target variable and one or more explanatory variables. It presents a housing price dataset with features like living area and bedrooms to predict price, and outlines the linear regression model, cost function, and an example implementation in Python using Scikit-learn to fit the model and evaluate predictions against test data.


Estimation (Linear Regression)
Muhamad Fathurahman
Data Mining Session 26-27 March 2020
These slides are adapted from Andrew Ng's lecture notes on linear regression.
Lecture Outline
• Prediction With Linear Regression
• Linear Regression Model
• Example
Linear Regression
• A type of supervised learning algorithm.
• An approach to modelling the relationship between a target variable and one or more explanatory variables.
• The target variable is a continuous value (e.g. housing price).
Dataset : Housing Price
Living Area (ft²)    Price ($1000s)
2104                 400
1600                 330
2400                 369
1416                 232
3000                 540
....                 ....
Source : Lecture Notes, Andrew Ng
Dataset : Housing Price (Cont.)
• $x^{(i)}$ denotes the input variables (here, the living area).
• $y^{(i)}$ denotes the output or target variable (here, the price).
• A pair $(x^{(i)}, y^{(i)})$ is a training example. For $i = 3$: $(x^{(3)}, y^{(3)})$ with $x^{(3)} = 2400$ and $y^{(3)} = 369$.
• A list of $m$ training examples $\{(x^{(i)}, y^{(i)});\ i = 1, \dots, m\}$ is called the training set (the slide shows $m = 3$ as an example).
• $X$ and $Y$ denote the space of input values and the space of output values, respectively.
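The notation can be mirrored in a few lines of Python. This is a minimal sketch using only the five rows shown in the housing table; the names `living_area`, `price`, `training_set`, and the helper `training_example` are illustrative and do not come from the slides. Note that the slides index examples from 1, while Python lists index from 0.

```python
# Five rows from the housing table (living area in ft^2, price in $1000s).
living_area = [2104, 1600, 2400, 1416, 3000]   # x^(1) ... x^(5)
price = [400, 330, 369, 232, 540]              # y^(1) ... y^(5)

# The training set is the list of (x^(i), y^(i)) pairs.
training_set = list(zip(living_area, price))
m = len(training_set)  # number of training examples in this five-row excerpt


def training_example(i):
    """Return (x^(i), y^(i)) using the slides' 1-based index i."""
    return training_set[i - 1]


x3, y3 = training_example(3)
print(m)        # 5
print(x3, y3)   # 2400 369
```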
Linear Regression Model
[Diagram: Training Set → Learning Algorithm → h]
[Diagram: x (living area of house) → h → predicted y (predicted price of house)]
Linear Regression : Dataset
Living Area (ft²)    Bedrooms    Price ($1000s)
2104                 3           400
1600                 3           330
2400                 3           369
1416                 2           232
3000                 4           540
....                 ....        ....
• The inputs $x$ are two-dimensional vectors in $\mathbb{R}^2$.
• $x_1^{(i)}$ is the living area of the $i$-th house and $x_2^{(i)}$ is its number of bedrooms.
Linear Regression: Model or Hypothesis
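• To perform supervised learning, we need to decide how to represent the function (hypothesis) $h$ in a computer.
• As an initial choice, let's say we decide to approximate $y$ as a linear function of $x$:
$$h(x) = \sum_{i=0}^{n} \theta_i x_i = \theta^T x$$
• The $\theta_i$ are called parameters or weights.
• $x_0 = 1$ is the intercept term, and $n$ is the number of input variables (not counting $x_0$).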
Linear Regression: Model or Hypothesis (Cont.)
Living Area (ft²)    Bedrooms    Price ($1000s)
2104                 3           400
1600                 3           330      ← $x^{(2)}$
2400                 3           369
• Example: the second row gives the input $x^{(2)} = (1600, 3)$. The model's prediction for an input is written $\hat{y}$.
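The linear hypothesis can be evaluated directly for the second row of the table. The sketch below is only an illustration: the function name `h` follows the slides, but the parameter values in `theta` are hypothetical numbers picked so the arithmetic is easy to follow. They are not fitted values.

```python
def h(theta, x):
    """Linear hypothesis h(x) = sum_i theta_i * x_i, with x_0 = 1 prepended."""
    features = [1.0] + list(x)          # x_0 = 1 is the intercept term
    if len(theta) != len(features):
        raise ValueError("theta needs one entry per feature plus the intercept")
    return sum(t * f for t, f in zip(theta, features))


# Hypothetical parameters, chosen only for illustration (not fitted values).
theta = [50.0, 0.125, 20.0]             # theta_0, theta_1, theta_2

x2 = (1600, 3)                          # x^(2): living area, bedrooms
y_hat = h(theta, x2)
print(y_hat)                            # 50 + 0.125*1600 + 20*3 = 310.0
```

Prepending $x_0 = 1$ inside the function keeps the intercept $\theta_0$ in the same sum as the other weights, which is what allows the compact form $\theta^T x$.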
Linear Regression : Cost Function
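• Now, given a training set, how do we pick, or learn, the parameters $\theta$?
• We need a cost function that measures how close the predictions $h_\theta(x^{(i)})$ are to the corresponding targets $y^{(i)}$. Here $h_\theta$ is the hypothesis $h$ with parameters $\theta$. We define the cost function:
$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$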
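To make the cost function concrete, the sketch below evaluates $J(\theta)$ on the five rows of the single-feature housing table for two candidate parameter vectors. The candidates `theta_a` and `theta_b` are hypothetical values chosen for illustration, and the function names are assumptions rather than anything defined in the slides.

```python
def h(theta, x):
    """Single-feature hypothesis: theta_0 + theta_1 * x."""
    return theta[0] + theta[1] * x


def cost(theta, xs, ys):
    """J(theta) = 1/2 * sum over i of (h_theta(x^(i)) - y^(i))^2."""
    return 0.5 * sum((h(theta, x) - y) ** 2 for x, y in zip(xs, ys))


living_area = [2104, 1600, 2400, 1416, 3000]   # ft^2
price = [400, 330, 369, 232, 540]              # $1000s

# Two hypothetical candidates (illustrative, not fitted).
theta_a = [0.0, 0.125]
theta_b = [0.0, 0.1875]

print(cost(theta_a, living_area, price))   # 35340.0
print(cost(theta_b, living_area, price))   # 4559.875
```

The second candidate has the lower cost, so its predictions sit closer to the observed prices. Learning the parameters means searching for the $\theta$ that makes $J(\theta)$ as small as possible.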
Linear Regression in Python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error as cal_mse
import matplotlib.pyplot as plt
# Load dataset
# Note: load_boston was removed in scikit-learn 1.2, so this call needs an older release.
data = datasets.load_boston()
Linear Regression in Python (2)
# Split features and target
X = data.data
Y = data.target
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=5
)

# Create the model, fit it on the training split, and predict on the test split
lm = LinearRegression()
lm.fit(X_train, Y_train)
Y_pred = lm.predict(X_test)
Linear Regression in Python (3)
# Print the mean squared error on the test set
mse = cal_mse(Y_test, Y_pred)
print(mse)

# Scatter plot of actual vs. predicted prices
plt.scatter(Y_test, Y_pred)
plt.xlabel("Prices: $Y_i$")
plt.ylabel(r"Predicted prices: $\hat{Y}_i$")
plt.title(r"Prices vs Predicted prices: $Y_i$ vs $\hat{Y}_i$")
plt.show()
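The mean squared error printed by this script is closely related to the cost function $J(\theta)$ from the cost-function slide: the MSE averages the squared errors over the $m$ examples, while $J$ takes half of their sum, so $J = \frac{m}{2} \cdot \text{MSE}$. The pure-Python sketch below checks that relation on three made-up predictions; the function names and the numbers are illustrative assumptions and are not output from the script.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared prediction errors."""
    m = len(y_true)
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / m


def cost_from_predictions(y_true, y_pred):
    """J = 1/2 * sum of squared prediction errors."""
    return 0.5 * sum((p - t) ** 2 for t, p in zip(y_true, y_pred))


# Illustrative targets and predictions (not results from the script).
y_true = [400, 330, 369]
y_pred = [394.5, 300.0, 450.0]
m = len(y_true)

j = cost_from_predictions(y_true, y_pred)
print(j)                                            # 3745.625
print(abs(j - (m / 2) * mse(y_true, y_pred)) < 1e-9)  # True
```

Because the two quantities differ only by the constant factor $m/2$, the parameters that minimise one also minimise the other.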
Linear Regression in Python (4)
