Session 18 Regression


Department of AI & DS

COURSE NAME: DATA SCIENCE & STATISTICS

COURSE CODE: 23MT2013

Topic
REGRESSION

Session - 18
AIM OF THE SESSION
To familiarize students with the concept of regression analysis

INSTRUCTIONAL OBJECTIVES

This Session is designed to:


1. Demonstrate Linear regression
2. Describe linear and nonlinear regression in real-life applications
3. List out the two lines of regression

LEARNING OUTCOMES

At the end of this session, you should be able to:


1. Define linear regression
2. Describe the method of least squares to fit a linear or nonlinear association between two variables
3. Summarize the difference between linear and nonlinear regression
SESSION INTRODUCTION
CONTENTS

Linear Regression

Nonlinear Regression
Regression analysis
A reasonable form of a relationship between the dependent variable Y and the regressor x is the linear relationship Y=α+βx

Where, α is the intercept and β is the slope.

If the relationship is exact, then it is a deterministic relationship between the two variables. However, in countless
scientific and engineering phenomena, the relationship is not deterministic and there will be a random component in it.
The concept of regression analysis deals with finding the best relationship between Y and x, and with using methods
that allow for prediction of the response values for given values of the regressor x.

In many applications there will be more than one regressor. For example, in the case where the dependent variable is the
price of a house, one would expect the age of the house, as well as its square footage, to contribute to the explanation
of the price, so in this case the multiple regression structure might be written

Y=α+β1X1+β2X2

where Y is price, X1 is square footage, and X2 is age in years. The resulting analysis is termed multiple regression, while
the analysis of the single-regressor case is called simple regression.
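
As a rough illustration of the two-regressor structure Y=α+β1X1+β2X2, the sketch below fits it by least squares in Python with NumPy. The house data are made up for illustration (generated without noise from α=50, β1=0.1, β2=−0.5), and the function name fit_multiple is hypothetical, not something taken from the session.

```python
import numpy as np

def fit_multiple(X, y):
    """Return (alpha, betas) minimizing the sum of squared residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

# Hypothetical data: columns are square footage (X1) and age in years (X2).
X = np.array([[1000, 10], [1500, 5], [2000, 20], [1200, 30], [1800, 2]])
# Prices built without noise from alpha=50, beta1=0.1, beta2=-0.5 (assumed values).
y = 50 + 0.1 * X[:, 0] - 0.5 * X[:, 1]

alpha, betas = fit_multiple(X, y)
print(round(alpha, 3), np.round(betas, 3))
```

Because the illustrative prices contain no random component, the fit should recover the assumed coefficients up to rounding; with real data the estimates would only approximate the underlying parameters.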

Simple linear regression model: The dependent variable Y is related to the independent variable x through the
equation

Y=α+βx+ε

where α and β are the unknown intercept and slope parameters, respectively, and ε is a random variable that is assumed to
have E(ε)=0 and Var(ε)=σ². Since ε is random, the quantity Y is a random variable. The value x of the
regressor variable is not random and is measured with negligible error. ε is called the random error or random
disturbance, and it has constant variance. E(ε)=0 implies that, at a specific x, the y values are distributed around the
true or population regression line Y=α+βx.
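
A small simulation can make the model Y=α+βx+ε concrete. The sketch below is only illustrative: the parameter values are assumed, and the errors are drawn from a normal distribution, which is an extra assumption (the model itself only requires E(ε)=0 and Var(ε)=σ²).

```python
import numpy as np

rng = np.random.default_rng(0)

alpha, beta, sigma = 2.0, 0.5, 1.0         # assumed illustrative parameter values
x = np.linspace(0, 10, 10_000)             # regressor values, treated as fixed
eps = rng.normal(0.0, sigma, size=x.size)  # random errors (normality is an assumption)
y = alpha + beta * x + eps                 # observed responses

# Deviations of y from the true line are the simulated errors, so they
# should average close to 0 with variance close to sigma**2.
dev = y - (alpha + beta * x)
print(dev.mean(), dev.var())
```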

The method of least squares: An important aspect of regression analysis is to estimate the parameters α and β. We denote
the estimates by a for α and b for β. Then the estimated or fitted regression line is given by

ŷ=a+bx

where ŷ is the predicted or fitted value. We expect the fitted line to be closer to the true regression line when a
large amount of data is available.
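
The effect of sample size on the fitted line can be sketched with simulated data. This is a hedged illustration, not part of the session material: the true parameters are assumed, the errors are taken to be normal, and NumPy's polyfit is used as a convenient least squares fitter.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, sigma = 2.0, 0.5, 1.0   # assumed true parameters

for n in (10, 100, 10_000):
    x = np.linspace(0, 10, n)
    y = alpha + beta * x + rng.normal(0.0, sigma, size=n)
    # polyfit with degree 1 returns the coefficients [slope, intercept].
    b, a = np.polyfit(x, y, 1)
    print(n, round(a, 3), round(b, 3))
```

With larger n the printed estimates a and b tend to settle near the assumed α and β, although any single small sample can happen to land close by chance.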

Residual: A residual is essentially an error in the fit of the model.

Given a set of regression data {(xi, yi), i=1,2,...,n} and a fitted model ŷi=a+bxi, the ith residual ei is given by
ei=yi−ŷi, i=1,2,...,n.
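
To make the definition concrete, the sketch below computes the residuals of a candidate line for a few made-up points. The data and the line ŷ=1+2x are hypothetical, and the line is not claimed to be the least squares fit.

```python
def residuals(x, y, a, b):
    """Return the residuals y_i - (a + b*x_i) for the line y_hat = a + b*x."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

x = [1, 2, 3, 4]
y = [3.1, 4.9, 7.2, 8.8]   # made-up observations
a, b = 1.0, 2.0            # assumed candidate line y_hat = 1 + 2x

res = residuals(x, y, a, b)
print([round(r, 2) for r in res])   # [0.1, -0.1, 0.2, -0.2]
```

A positive residual means the observation lies above the line, and a negative one means it lies below.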


ACTIVITIES/ CASE STUDIES/ IMPORTANT FACTS RELATED TO THE SESSION
We shall find a and b, the estimates of α and β, so that the sum of the squares of the residuals is a minimum. The residual
sum of squares is also called the sum of squares of the errors about the regression line and is denoted by SSE. This
minimization procedure for estimating the parameters is called the method of least squares. Hence, we shall find a and b so
as to minimize

SSE=Σ(yi−ŷi)²=Σ(yi−a−bxi)²

where each sum runs over i=1,2,...,n. Differentiating SSE with respect to a and b, equating the partial derivatives to
zero, and rearranging the terms, we obtain the equations (called the normal equations)

na+bΣxi=Σyi
aΣxi+bΣxi²=Σxiyi

which may be solved simultaneously to yield the computing formulas for a and b:

b=Sxy/Sxx and a=ȳ−bx̄

where Sxx=Σ(xi−x̄)² and Sxy=Σ(xi−x̄)(yi−ȳ).
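
The normal equations are a 2×2 linear system in a and b, so they can also be solved numerically. The sketch below does this with NumPy on a small made-up data set; the function name least_squares_line is hypothetical.

```python
import numpy as np

def least_squares_line(x, y):
    """Solve the two normal equations for the intercept a and slope b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Normal equations in matrix form:
    #   [ n       sum(x)   ] [a]   [ sum(y)   ]
    #   [ sum(x)  sum(x^2) ] [b] = [ sum(x*y) ]
    lhs = np.array([[n, x.sum()], [x.sum(), (x ** 2).sum()]])
    rhs = np.array([y.sum(), (x * y).sum()])
    a, b = np.linalg.solve(lhs, rhs)
    return a, b

x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]   # made-up data
a, b = least_squares_line(x, y)
print(round(a, 2), round(b, 2))  # 0.22 1.96
```

For these data the mean of x is 3, the mean of y is 6.1, Sxx=10 and Sxy=19.6, so the closed-form route gives the same slope 19.6/10=1.96 and intercept 6.1−1.96(3)=0.22.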
EXAMPLES

Example: Engineers fabricating a new transmission-type electron multiplier created an array of silicon nanopillars on a
flat silicon membrane. The precise structure can influence the electrical properties so, subsequently, the heights and widths
of 50 nanopillars were measured in nanometres (10⁻⁹ metres). The summary statistics, with x=width and y=height, are

n=50, Sxx=7239.22, Sxy=17840.1, Syy=66976.2

a) Find the least squares line for predicting height from width

b) Find the least squares line for predicting width from height.

c) Make a scatter plot and show both lines. Comment.

Solution:

a) Here y=height and the least squares estimates are

slope=b=Sxy/Sxx=17840.1/7239.22=2.464 and intercept=a=ȳ−bx̄=87.88.

The fitted line is height=87.88+2.464 width.

b) Width is now the response variable and height the predictor, so x and y must be interchanged.

slope=b=Sxy/Syy=17840.1/66976.2=0.266 and intercept=a=x̄−bȳ=6.944, where x̄ and ȳ are the mean width and mean height.

The fitted line is width=6.944+0.266 height.

c) Here we construct the scatter plot and include the two lines of regression. To draw it on the same axes, the line from
part (b) is rewritten as

height=−(6.944/0.266)+(1/0.266) width=−26.11+3.759 width

Note that both lines pass through the mean point (x̄, ȳ).

The choice of fitted line depends on which variable you wish to predict.
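
The arithmetic of this example can be checked with a few lines of Python. This is a sketch under stated assumptions: Syy is taken as 66976.2, the value used in part (b), and the intercepts are the ones reported in the solution rather than recomputed, since the sample means are not listed among the summary statistics. The mean point is recovered as the intersection of the two fitted lines.

```python
sxx, sxy = 7239.22, 17840.1
syy = 66976.2                # value used in part (b) of the solution

b_hw = sxy / sxx             # slope for predicting height from width
b_wh = sxy / syy             # slope for predicting width from height
a_hw, a_wh = 87.88, 6.944    # intercepts as reported in the solution

# Both lines pass through the mean point, so solve
#   height = a_hw + b_hw * width  and  width = a_wh + b_wh * height
# simultaneously to recover it.
width_bar = (a_wh + b_wh * a_hw) / (1 - b_wh * b_hw)
height_bar = a_hw + b_hw * width_bar

print(round(b_hw, 3), round(b_wh, 3))   # 2.464 0.266
print(round(width_bar, 2), round(height_bar, 2))
```

The two slopes are not reciprocals of each other (1/0.266 is about 3.759, not 2.464), which is why the two regression lines differ everywhere except at the mean point.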
SUMMARY

In this session, we
1. Defined regression analysis and discussed how it is related to correlation.
2. Differentiated between linear and nonlinear regression.
3. Described the method of least squares for determining the regression coefficients.
SELF-ASSESSMENT QUESTIONS

1. In regression analysis, the variable that is being predicted is the

a) response, or dependent, variable


b) independent variable …
c) intervening variable
d) is usually x

2. In regression, the equation that describes how the response variable (y) is related to
the explanatory variable (x) is:

a) the correlation model


b) the regression model
c) used to compute the correlation coefficient
d) None of these alternatives is correct.
TERMINAL QUESTIONS
1. Describe linear and nonlinear regression.

2. List the properties of regression coefficients.

3. Analyze regression analysis and its importance in practical experiments.

4. In the accompanying table, x is the tensile force applied to a steel specimen in thousands of pounds, and y is the resulting
elongation in thousandths of an inch:
X: 1 2 3 4 5 6
Y: 14 33 40 63 76 85
a) Graph the data to verify that it is reasonable to assume that the regression of Y on x is linear.
b) Find the equation of the least squares line, and use it to predict the elongation when the tensile force is 3.5 thousand pounds.

5. A professor in the school of business in a university polled a dozen colleagues about the number of professional
meetings professors attended in the past five years (x) and the number of papers submitted by those to refereed journals
(y) during the same period. The summary data are given as follows:
n=12,
Fit a straight line to the given data.
REFERENCES FOR FURTHER LEARNING OF THE SESSION
Reference Books:
1. Chapter 1 of TP1: William Feller, An Introduction to Probability Theory and Its Applications,
Volume 1, Third Edition, John Wiley & Sons, Inc., 1968.
2. Richard A. Johnson, Miller & Freund's Probability and Statistics for Engineers, 11th Edition,
PHI, New Delhi, 2011.

Sites and Web links:


1. https://www.statisticshowto.com/probability-and-statistics/correlation-coefficient-formula/
2. https://www.khanacademy.org/math/statistics-probability/describing-relationships-quantitative-data/regression-library/v/introduction-to-residuals-and-least-squares
3. https://nptel.ac.in/courses/105105150/24
THANK YOU

Team – DATA SCIENCE AND STATISTICS


2024-25
