
Regression

Linear Regression and Multiple Linear Regression
Lecture Outline
• Regression Techniques
  • Linear regression
  • Non-linear regression
  • Typical applications of regression
• Linear Regression
  • Key concepts of linear regression
  • A typical simple regression model
  • Evaluating the linear regression model
• Multiple Linear Regression
• Linear Regression in Python
Regression Techniques
• Regression is the technique most often used to create data
models in traditional statistics.
• It is a statistical modeling technique that examines the relation
between a dependent variable and one or more independent
variables.
• There are three main types:
• linear regression, for data whose distribution tends to adjust to a
straight line;
• non-linear regression, for data whose tendency adjusts to a curve; and
• logistic regression, for data models whose output is a binary type.
Linear regression
• Linear regression is one of the oldest predictive methodologies and the
simplest method for demonstrating function fitting.
• The basic idea is to come up with a function that explains and predicts
the value of the target variable when given the values of the
predictor variables.
• In linear regression, the primary objective is to predict numerical
features such as real estate or stock prices, temperature, exam marks,
and sales revenue. This method is particularly effective when dealing
with continuous predictor and target variables.
Linear Regression …..
• Both the dependent and the independent variables have to be numerical.
Categorical variables such as gender or district of residence must be
recoded as binary (two-category) or numerical variables.
• Linear regression analyzes the relation between two variables, X and Y,
and tries to find the best straight line that passes through the data, as
shown in the figure below.
Key Concepts of Linear Regression
Continuous variables:
1. Predictor variable: the input variable used for prediction.
2. Target variable: the variable we aim to predict.
Straight-line relationship:
3. In linear regression, a straight line is 'fitted' to represent the
relationship between the predictor and target variables.
Statistical technique, the least squares method:
4. The fitting process employs the statistical concept of the least squares
method.
5. Objective: minimize the sum of squared differences between
observed and predicted values.
Examples of Linear Regression
• Predicting Stock Prices : As an example, consider predicting stock prices using
linear regression:
❑ Predictor Variables: Historical stock data.
❑ Target Variable: Future stock prices.
❑ Visual representation of the fitted line illustrates the predictive relationship.
• Predict the total volume of purchases : Let's say you work for an e-commerce
company, and your task is to predict how much a client might spend in a quarter.
Here, we have two types of variables:
❑ Predictor Variable (Independent Variables): These are the factors we believe
influence the total volume of purchases. In this case:
• Age of the client
• Income of the client
• Socioeconomic indicator, like the district of residence
❑ Target Variable (Dependent Variable): This is what we want to predict – the total
volume of purchases made by a client in a given quarter.
Typical Simple Regression Model of Linear
Regression
• Definition: Linear regression uses a significant correlation between two
variables to make predictions about one variable based on knowledge of the
other.
• The straight line of the linear regression model is expressed in the form:

Y = a + bX + ɛ

where a is the intercept, b is the slope, and ɛ is the residual (error) term.
• The residual ɛ is the distance between the predicted point (on the regression
line) and the actual point.
• The objective of linear regression is to find the line that best predicts Y given X.
Linear Regression: Computing a and b for a
Set of X and Y Values ….
Steps to calculate 'a' and 'b':
1. Determine the mean of X and Y:

   X̄ = ΣX / n,   Ȳ = ΣY / n

2. Calculate 'b' (slope):

   b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)²

3. Calculate 'a' (intercept):

   a = Ȳ − b·X̄

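The three steps above can be sketched in a few lines of Python. This is a pure-Python sketch (no libraries needed); the small dataset is illustrative, not from the slides.

```python
# Least-squares fit of Y = a + bX, following the three steps above.
def fit_simple_regression(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n                     # Step 1: mean of X
    mean_y = sum(ys) / n                     # Step 1: mean of Y
    # Step 2: slope b = sum((X - mean_X)(Y - mean_Y)) / sum((X - mean_X)^2)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x                  # Step 3: intercept a = mean_Y - b * mean_X
    return a, b

a, b = fit_simple_regression([1, 2, 3], [2, 4, 6])
print(a, b)  # -> 0.0 2.0
```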
Evaluating Linear Regression Model
Once we've built a Linear Regression (LR) model, it's crucial to evaluate its performance. Let's
explore key evaluation methods to ensure our model is effective.
1. Sum of Squared Errors (SSE): SSE measures how much our predicted values differ from the
actual values, summing the squared "mistakes" our model makes in predicting the outcomes.
Formula:

SSE = Σ(Y_actual − Y_predicted)²

Example:
• Suppose our model predicts student scores (Y_predicted) as 75, 80, 85, while the actual scores
(Y_actual) are 80, 85, 90. SSE would be:

SSE = (80 − 75)² + (85 − 80)² + (90 − 85)² = 25 + 25 + 25 = 75

• The lower the SSE, the better our model is at predicting.

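The SSE formula translates directly to code. A minimal sketch, using the scores from the example above:

```python
# SSE: sum of squared differences between actual and predicted values.
def sse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

# Actual scores 80, 85, 90 vs. predicted scores 75, 80, 85.
print(sse([80, 85, 90], [75, 80, 85]))  # -> 75
```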
Evaluating Linear Regression Model
……
2. R-squared (R²) is a good measure of model fit. It is also known as the coefficient of
determination or, for multiple regression, the coefficient of multiple determination. The R-squared
value lies between 0 and 1 (0%–100%), with a larger value representing a better fit. It is calculated as:

R² = 1 − SSE / SST,   where SST = Σ(Y_actual − Ȳ)² is the total sum of squares
Example: Assume we've built a LR model with the following predicted scores:

Hours Studied (X) | Exam Scores (Y_actual) | Predicted Scores (Y_predicted)
2                 | 60                     | 65
3                 | 70                     | 75
4                 | 80                     | 85
5                 | 85                     | 95

Interpretation:
The calculated R-squared value of
approximately 0.525 means that about
52.5% of the variability in exam scores is
explained by the number of hours studied.
The remaining 47.5% is not explained by
our model.
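Using R² = 1 − SSE/SST, the value above can be checked in a few lines. A pure-Python sketch over the table's data:

```python
# R-squared = 1 - SSE/SST for the hours-studied example above.
def r_squared(actual, predicted):
    mean_y = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))   # unexplained variation
    sst = sum((a - mean_y) ** 2 for a in actual)                 # total variation
    return 1 - sse / sst

r2 = r_squared([60, 70, 80, 85], [65, 75, 85, 95])
print(round(r2, 3))  # -> 0.525
```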
Example 1: Predicting Exam Scores
Suppose we want to predict students' exam scores (dependent
variable) based on the number of hours they study (independent
variable).

Hours Studied (X) | Exam Scores (Y)
2                 | 60
3                 | 70
4                 | 80
5                 | 85
Predicting Exam Scores ………
Step 1: Calculate the Mean of X and Y

X̄ = (2 + 3 + 4 + 5) / 4 = 3.5
Ȳ = (60 + 70 + 80 + 85) / 4 = 73.75

Step 2: Calculate the Slope (b) and Intercept (a)

b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² = 42.5 / 5 = 8.5
a = Ȳ − b·X̄ = 73.75 − 8.5 × 3.5 = 44.00

So, the linear regression equation is Y = 8.5X + 44.
Predicting Exam Scores ………
Step 3: Predict Exam Score for a New Value of X
Let's say a student studies for 6 hours:

Y_predicted = 8.5 × 6 + 44 = 95

Step 4: Calculate Residuals and the Sum of Squared Errors
Residual = Y_actual − Y_predicted
Residuals = [60 − (8.5×2 + 44), 70 − (8.5×3 + 44), 80 − (8.5×4 + 44), 85 − (8.5×5 + 44)]
Residuals = [−1, 0.5, 2, −1.5]
Sum of Squared Errors (SSE) = (−1)² + 0.5² + 2² + (−1.5)²
SSE = 7.5
This represents the total error in our predictions. The smaller the SSE, the
better the model fits the data.
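The worked example can be reproduced end to end in a short Python sketch; every printed value matches the steps above.

```python
# Fit Y = a + bX on the example data, predict for X = 6, and compute the SSE.
xs, ys = [2, 3, 4, 5], [60, 70, 80, 85]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n                      # means: 3.5 and 73.75
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
    / sum((x - mx) ** 2 for x in xs)                   # slope: 8.5
a = my - b * mx                                        # intercept: 44.0
print(a + b * 6)                                       # prediction for 6 hours -> 95.0
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(residuals)                                       # -> [-1.0, 0.5, 2.0, -1.5]
print(sum(r ** 2 for r in residuals))                  # SSE -> 7.5
```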
Example 2: Investigating Academic Performance
A college professor is intrigued by the idea that there might be a correlation
between students' grades in internal examinations and their subsequent
performance in external examinations. To explore this hypothesis, the professor
selects a random sample of 15 students from the class.
Exploring the Relationship:
•The professor is interested in understanding whether a high grade in internal
exams tends to correlate with high grades in external exams.
Data Collection:
•A random sample of 15 students is chosen for the study.
•The professor gathers data on the grades of these students in both internal and
external examinations.
Investigating Academic Performance
…..
As you can observe from the graph, the line does not predict the data
exactly. Instead, it just cuts through the data. Some predictions are
lower than expected, while others are higher than expected.

The residual is the distance between the predicted point (on the
regression line) and the actual point, as depicted.
Detailed calculation of regression parameters
The LR Model of Investigating Academic
Performance

Y = 19.05 + 1.89X

The value of the intercept in the above equation is 19.05. However,
none of the internal marks is 0, so intercept = 19.05 indicates that
19.05 is the portion of the external examination marks not explained
by the internal examination marks.
Slope measures the estimated change in the average
value of Y as a result of a one-unit change in X. Here,
slope = 1.89 tells us that the average value of the
external examination marks increases by 1.89 for
each additional 1 mark in the internal examination.
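As an illustration, a model with intercept 19.05 and slope 1.89 can be used directly for prediction; the internal mark of 20 below is a hypothetical value, not from the professor's data.

```python
# Predicted external mark from an internal mark, using intercept 19.05 and slope 1.89.
def predict_external(internal_mark):
    return 19.05 + 1.89 * internal_mark

# Hypothetical student with an internal mark of 20.
print(round(predict_external(20), 2))  # -> 56.85
```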
Summary : Linear Regression
• When the outcome, or class, is numeric, and all the attributes
are numeric, linear regression is a natural technique to
consider.
• Linear regression is an excellent, simple method for numeric
prediction, and it has been widely used in statistical
applications for decades.
Multiple Linear Regression
• So far, we have seen the concept of simple linear regression
where a single predictor variable X was used to model the
response variable Y .
• In many applications, there is more than one factor that
influences the response. Multiple regression models thus
describe how a single response variable Y depends linearly on
a number of predictor variables.
• A multiple linear regression model with k predictor variables
X1, X2, ..., Xk and a response Y can be written as:

Y = b0 + b1X1 + b2X2 + ... + bkXk + ɛ

Examples
• The selling price of a house can depend on the desirability of
the location, the number of bedrooms, the number of
bathrooms, the year the house was built, the square footage
of the lot and a number of other factors.
• The height of a child can depend on the height of the mother,
the height of the father, nutrition, and environmental factors.
Example: Predicting Exam Scores with
Multiple Linear Regression
Interested in understanding what factors influence students' exam scores, a professor conducts a study at a
bustling university. The professor believes that the number of hours students dedicate to studying and the
frequency of practice tests are essential.
Setting the stage:
• Variables:
  • Y (Exam Scores)
  • X1 (Hours Studied)
  • X2 (Practice Tests Taken)
• Objective:
  • Predict Y based on X1 and X2 using multiple linear regression.

Hours Studied (X1) | Practice Tests (X2) | Exam Scores (Y)
2                  | 1                   | 60
3                  | 2                   | 70
4                  | 2                   | 80
5                  | 3                   | 85
Example: Predicting Exam Scores with Multiple Linear Regression
Step 1: Set Up the Model
Our multiple linear regression model is:

Y = b0 + b1X1 + b2X2

Step 2: Calculate Means:

X̄1 = (2 + 3 + 4 + 5) / 4 = 3.5
X̄2 = (1 + 2 + 2 + 3) / 4 = 2
Ȳ = (60 + 70 + 80 + 85) / 4 = 73.75

Step 3: Calculate Slopes (b1 and b2) by solving the least-squares normal equations:

b1 = 10.0000
b2 = −2.5000

Step 4: Calculate Intercept (b0):

b0 = Ȳ − b1·X̄1 − b2·X̄2 = 73.75 − 10 × 3.5 − (−2.5) × 2 = 43.7500

The final multiple linear regression model is:

Y = 43.75 + 10X1 − 2.5X2
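The coefficients can be verified with NumPy's least-squares solver. This is a sketch assuming NumPy is available; the design matrix carries an explicit intercept column.

```python
import numpy as np

# Design matrix: intercept column, X1 (hours studied), X2 (practice tests).
X = np.array([[1.0, 2, 1],
              [1.0, 3, 2],
              [1.0, 4, 2],
              [1.0, 5, 3]])
y = np.array([60.0, 70, 80, 85])

# Solve min ||X b - y||^2 for b = (b0, b1, b2).
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coeffs
print(round(b0, 2), round(b1, 2), round(b2, 2))  # -> 43.75 10.0 -2.5
```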
Example: Predicting Exam Scores with Multiple Linear Regression

This model allows us to predict the exam score based on the number of hours studied and the number
of practice tests taken. For example, if a student studies for 4 hours and takes 2 practice tests, the
predicted exam score would be:

Y = 43.75 + 10 × 4 − 2.5 × 2 = 78.75

So, according to our multiple linear regression model, the predicted exam score is 78.75 for a
student who studies for 4 hours and takes 2 practice tests.
Linear Regression in Python
• Importing libraries
• Loading data into a DataFrame
• Extracting features and target variable
• Building and fitting the model
• Understanding and displaying coefficients
• Making predictions on existing data
• Evaluating model performance
Linear Regression in Python
• Making predictions for new data
• Visualizing results
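A sketch of the workflow above, assuming pandas and scikit-learn are installed; the dataset and the column names 'hours' and 'score' are illustrative, not taken from the slides.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Loading data into a DataFrame (illustrative hours-studied dataset).
df = pd.DataFrame({"hours": [2, 3, 4, 5], "score": [60, 70, 80, 85]})

# Extracting features and target variable.
X = df[["hours"]]            # 2-D feature matrix
y = df["score"]

# Building and fitting the model.
model = LinearRegression()
model.fit(X, y)

# Understanding and displaying coefficients.
print(round(model.intercept_, 2), round(model.coef_[0], 2))  # -> 44.0 8.5

# Making predictions on existing data and evaluating model performance.
y_pred = model.predict(X)
print(round(r2_score(y, y_pred), 3))  # R-squared on the training data

# Making predictions for new data (a student who studies 6 hours).
new_data = pd.DataFrame({"hours": [6]})
print(round(model.predict(new_data)[0], 1))  # -> 95.0
```

A visualization step (a scatter plot of the data with the fitted line, e.g. via matplotlib) would follow the same pattern and is omitted here.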
Reference list
Saikat Dutt, Subramanian Chandramouli, Amit Kumar Das, "Machine Learning", Pearson, first
edition (October 1, 2018).