
Lecture 3


Machine learning
Linear regression
Dr. Darkhan Zholtayev
Assistant Professor, Department of Computational and Data Science
Topics to cover
• What is regression
• Linear regression
• Least squares error
General graph

AI map

Joseph, B. (2020, June 17). Linear Regression Made Easy: How Does It Work and How to Use It in Python. Towards Data Science.
https://towardsdatascience.com/linear-regression-made-easy-how-does-it-work-and-how-to-use-it-in-pytho
Linear Regression
• Technique used for the modeling and analysis of
numerical data
• Exploits the relationship between two or more variables
so that we can gain information about one of them
through knowing values of the other
• Regression can be used for prediction, estimation,
hypothesis testing, and modeling causal relationships
Problem
Data

Xie, Y. (2013). Lecture 11: Simple Linear Regression. H. Milton Stewart School of Industrial
and Systems Engineering, Georgia Institute of Technology. Retrieved from
Linear regression

Linear regression: different forms
Linear regression
Estimate regression parameters
Method of least squares
Least squares estimates
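As a minimal sketch of the method of least squares for one predictor, the block below assumes the standard closed-form estimates: slope = S_xy / S_xx and intercept = mean(y) - slope * mean(x). The function name `least_squares_fit`, the variable names, and the data values are illustrative and do not come from the lecture.

```python
import numpy as np

def least_squares_fit(x, y):
    """Return (intercept, slope) that minimise the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    s_xy = np.sum((x - x_bar) * (y - y_bar))  # co-variation of x and y
    s_xx = np.sum((x - x_bar) ** 2)           # variation of x
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up illustrative data (not from the lecture)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
b0, b1 = least_squares_fit(x, y)
print(b0, b1)
```

The result can be cross-checked against `np.polyfit(x, y, 1)`, which returns the slope first and the intercept second.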

Alternative notation
Example: oxygen and hydrocarbon level
Calculation 2
Calculation
Interpretation of regression model
Estimation of variance
Summary

Example
import pandas as pd  # for data manipulation
import numpy as np  # for numerical arrays
from sklearn.linear_model import LinearRegression  # for creating a model
import plotly.graph_objects as go  # for visualizations
import plotly.express as px  # for visualizations

# Read data into a Pandas DataFrame
df = pd.read_csv('Real estate.csv', encoding='utf-8')

# Print DataFrame
df
Joseph, B. (2020, June 17). Linear Regression Made Easy: How Does It Work and How to Use It in Python. Towards Data Science.
https://towardsdatascience.com/linear-regression-made-easy-how-does-it-work-and-how-to-use-it-in-python-be0799d2f159
Data

Code 1
# Create a scatter plot
fig = px.scatter(df, x=df['X3 distance to the nearest MRT station'], y=df['Y house price of unit area'],
                 opacity=0.8, color_discrete_sequence=['black'])

# Change chart background color
fig.update_layout(dict(plot_bgcolor='white'))

# Update axes lines
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey',
                 zeroline=True, zerolinewidth=1, zerolinecolor='lightgrey',
                 showline=True, linewidth=1, linecolor='black')

fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey',
                 zeroline=True, zerolinewidth=1, zerolinecolor='lightgrey',
                 showline=True, linewidth=1, linecolor='black')

# Set figure title
fig.update_layout(title_text="Scatter Plot")

# Update marker size
fig.update_traces(marker=dict(size=3))

fig.show()
Scatter plot

Training
# Select variables that we want to use in a model
# Note, we need X to be a 2D array, hence reshape
X = df['X3 distance to the nearest MRT station'].values.reshape(-1, 1)
y = df['Y house price of unit area'].values

# Fit linear regression model
model = LinearRegression()
reg = model.fit(X, y)

# Print the slope and intercept of the best-fit line
print(reg.coef_)
print(reg.intercept_)
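A small sketch of how a fitted model might be used afterwards. It runs on made-up data rather than the real estate file, so it works without the CSV; the names `X_demo`, `y_demo`, and `demo_model` are illustrative. `predict` and `score` are standard `LinearRegression` methods, and `score` returns the coefficient of determination R².

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up illustrative data (NOT the real estate dataset): y = 3 - 2x exactly
X_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0]).reshape(-1, 1)  # 2D: (n_samples, 1)
y_demo = 3.0 - 2.0 * X_demo.ravel()

demo_model = LinearRegression().fit(X_demo, y_demo)

slope = demo_model.coef_[0]        # one coefficient per feature
intercept = demo_model.intercept_  # scalar for a 1D target

# Predict y for a new x value; the input must also be 2D
y_new = demo_model.predict(np.array([[5.0]]))[0]

# R^2 on the training data (1.0 means a perfect fit)
r2 = demo_model.score(X_demo, y_demo)

print(slope, intercept, y_new, r2)
```

On real data the points will not lie exactly on a line, so R² will be below 1.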

Code 2
# We will use the points below to draw a best-fit line on a chart
# Create 20 evenly spaced points from smallest X to largest X
x_range = np.linspace(X.min(), X.max(), 20)

# Predict y values for our set of X values
y_range = model.predict(x_range.reshape(-1, 1))

# Create a scatter plot
fig = px.scatter(df, x=df['X3 distance to the nearest MRT station'], y=df['Y house price of unit area'],
                 opacity=0.8, color_discrete_sequence=['black'])

# Add a best-fit line
fig.add_traces(go.Scatter(x=x_range, y=y_range, name='Regression Fit'))

# Change chart background color
fig.update_layout(dict(plot_bgcolor='white'))

# Update axes lines
fig.update_xaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey',
                 zeroline=True, zerolinewidth=1, zerolinecolor='lightgrey',
                 showline=True, linewidth=1, linecolor='black')

fig.update_yaxes(showgrid=True, gridwidth=1, gridcolor='lightgrey',
                 zeroline=True, zerolinewidth=1, zerolinecolor='lightgrey',
                 showline=True, linewidth=1, linecolor='black')

# Set figure title
fig.update_layout(title_text="Scatter Plot with Linear Regression Line")

# Update marker size
fig.update_traces(marker=dict(size=3))

fig.show()

Prediction line
Multiple linear regression
# Select variables that we want to use in a model
# Note, X in this case is already a 2D array, hence no reshape
X = df[['X3 distance to the nearest MRT station', 'X2 house age']]
y = df['Y house price of unit area'].values

# Fit linear regression model
model = LinearRegression()
reg = model.fit(X, y)

# Print slope(s) and intercept
print(reg.coef_)
print(reg.intercept_)
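To illustrate what the fit computes with more than one predictor, here is a NumPy-only sketch of the same ordinary least squares problem. The data are made up (not the real estate dataset), and the names `X_design`, `beta`, `coefs` are illustrative. A column of ones is added so that the first coefficient plays the role of the intercept.

```python
import numpy as np

# Made-up illustrative data: two features, target y = 1 + 2*x1 - 3*x2 exactly
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Prepend a column of ones so the first coefficient acts as the intercept
X_design = np.column_stack([np.ones(len(X)), X])

# Solve the least squares problem: minimise ||X_design @ beta - y||^2
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, coefs = beta[0], beta[1:]

print(intercept)  # plays the role of reg.intercept_
print(coefs)      # one slope per feature, like reg.coef_
```

There is one slope per column of X, which is why `reg.coef_` holds two values in the two-feature model.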
Multiple linear regression: Python example

Fitted multiple linear regression

Basic statistics
• The sample mean, $\bar{X}$, is the sum of all the observations ($\sum X_i$) divided by the number of observations ($n$):
$\bar{X} = \frac{\sum X_i}{n}$, where $\sum X_i = X_1 + X_2 + X_3 + X_4 + \dots + X_n$

• Example. 1, 2, 2, 4, 5, 10. Calculate the mean. Note: $n = 6$ (six observations).
$\sum X_i = 1 + 2 + 2 + 4 + 5 + 10 = 24$
$\bar{X} = 24 / 6 = 4.0$
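The same calculation as a short Python sketch, using the six observations from the example. The variable names are illustrative; `statistics.mean` from the standard library is shown as a cross-check.

```python
from statistics import mean

observations = [1, 2, 2, 4, 5, 10]

total = sum(observations)  # sum of all observations
n = len(observations)      # number of observations
sample_mean = total / n

print(total, n, sample_mean)  # 24 6 4.0
print(mean(observations) == sample_mean)  # True
```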
The median

The median is the middle value of the ordered data.

To get the median, we must first rearrange the data into an ordered array (in ascending or descending order). Generally, we order the data from the lowest value to the highest value.
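A minimal sketch of the procedure: sort, then take the middle value. For an even number of observations it assumes the common convention of averaging the two middle values, which the slide does not spell out. The function name `middle_value` is illustrative.

```python
from statistics import median

def middle_value(data):
    """Median: sort the data, then pick (or average) the middle value(s)."""
    ordered = sorted(data)  # ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: single middle value
    # even n: average of the two middle values (assumed convention)
    return (ordered[mid - 1] + ordered[mid]) / 2

print(middle_value([5, 1, 4, 2, 3]))      # 3
print(middle_value([1, 2, 2, 4, 5, 10]))  # 3.0
```

The standard library function `statistics.median` follows the same convention.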
The mode
• The mode is the value of the data that occurs with the
greatest frequency.

Example. 1, 1, 1, 2, 3, 4, 5
Answer. The mode is 1 since it occurs three times. The other values
each appear only once in the data set.

Example. 5, 5, 5, 6, 8, 10, 10, 10.
Answer. The modes are 5 and 10. Each occurs three times, so there are two modes. This is a bimodal dataset.
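Both examples can be checked with a short Python sketch. It uses `statistics.multimode` (available from Python 3.8), which returns every value tied for the highest frequency; the variable names are illustrative.

```python
from statistics import multimode

one_mode = multimode([1, 1, 1, 2, 3, 4, 5])
two_modes = multimode([5, 5, 5, 6, 8, 10, 10, 10])

print(one_mode)   # [1]
print(two_modes)  # [5, 10]
```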
Standard deviation

• The standard deviation, s, measures a kind of “average” deviation about the mean. It is not really the “average” deviation, even though we may think of it that way.

• Why can’t we simply compute the average deviation about the mean, if that’s what we want?

• If you take a simple mean and then add up the deviations about the mean, this sum will always equal 0. Therefore, a measure of “average deviation” will not work.
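A quick numerical check of this point, reusing the six observations from the mean example (1, 2, 2, 4, 5, 10). The variable names are illustrative.

```python
observations = [1, 2, 2, 4, 5, 10]
x_bar = sum(observations) / len(observations)  # sample mean, 4.0

# Deviation of each observation from the mean
deviations = [x - x_bar for x in observations]

print(deviations)       # [-3.0, -2.0, -2.0, 0.0, 1.0, 6.0]
print(sum(deviations))  # 0.0
```

Negative and positive deviations cancel, so their plain average carries no information about spread.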
Standard Deviation
• Instead, we use:
$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$

• This is the “definitional formula” for standard deviation.

• The standard deviation has lots of nice properties, including:
• By squaring the deviations, we eliminate the problem of the deviations summing to zero.
• In addition, this sum is a minimum. No other value subtracted from each $X_i$ and squared will result in a smaller sum of squared deviations. This is called the “least squares property.”
• Note that we divide by $(n - 1)$, not $n$. This is referred to as a loss of one degree of freedom.
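A short sketch of the sample standard deviation, computed as the square root of the sum of squared deviations divided by (n - 1). It reuses the observations 1, 2, 2, 4, 5, 10 from the mean example and also spot-checks the least squares property against two other candidate centers. The helper name `sum_of_squares` is illustrative.

```python
import math
from statistics import stdev

observations = [1, 2, 2, 4, 5, 10]
n = len(observations)
x_bar = sum(observations) / n

def sum_of_squares(center):
    """Sum of squared deviations of the observations about `center`."""
    return sum((x - center) ** 2 for x in observations)

# Definitional formula: divide by (n - 1), then take the square root
s = math.sqrt(sum_of_squares(x_bar) / (n - 1))
print(s)

# Least squares property: the mean gives a smaller sum than other centers
print(sum_of_squares(x_bar) < sum_of_squares(3.5))  # True
print(sum_of_squares(x_bar) < sum_of_squares(4.5))  # True
```

`statistics.stdev` also divides by (n - 1), so it can serve as a cross-check of `s`.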
Variance
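The variance, $s^2$, is the standard deviation ($s$) squared.
Conversely, $s = \sqrt{s^2}$.

Definitional formula: $s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$
Computational formula: $s^2 = \frac{\sum X_i^2 - (\sum X_i)^2 / n}{n - 1}$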
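A sketch comparing the two routes to the sample variance on the observations 1, 2, 2, 4, 5, 10. It assumes the usual forms: the definitional version sums squared deviations from the mean, the computational version uses the sum of squares minus (sum)² / n, and both divide by (n - 1). The variable names are illustrative.

```python
import math
from statistics import variance

observations = [1, 2, 2, 4, 5, 10]
n = len(observations)
x_bar = sum(observations) / n

# Definitional: squared deviations about the mean, divided by (n - 1)
definitional = sum((x - x_bar) ** 2 for x in observations) / (n - 1)

# Computational: sum of squares minus (sum)^2 / n, divided by (n - 1)
sum_x = sum(observations)
sum_x_squared = sum(x * x for x in observations)
computational = (sum_x_squared - sum_x ** 2 / n) / (n - 1)

print(definitional, computational)
print(math.sqrt(definitional))  # the standard deviation s
```

The computational form avoids computing each deviation separately, which is convenient for hand calculation.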
Thank you for your attention
