Session 18 Regression
Topic
REGRESSION
Session - 18
AIM OF THE SESSION
To familiarize students with the concept of regression analysis
INSTRUCTIONAL OBJECTIVES
LEARNING OUTCOMES
Linear Regression
Nonlinear Regression
Regression analysis
A reasonable form of relationship between the response Y and the regressor x is the linear relationship $Y=\alpha+\beta x$.
If the relationship is exact, it is a deterministic relationship between the two variables. However, in most scientific and engineering phenomena the relationship is not deterministic and contains a random component. Regression analysis deals with finding the best relationship between Y and x and with methods that allow the response to be predicted for given values of the regressor x.
In many applications there is more than one regressor. For example, when the dependent variable is the price of a house, one would expect the age of the house, in addition to its square footage, to contribute to explaining the price, so the multiple regression structure might be written
$Y=\alpha+\beta_1X_1+\beta_2X_2$
where Y is the price, $X_1$ is the square footage and $X_2$ is the age in years. The resulting analysis is termed multiple regression, while the analysis of the single-regressor case is called simple regression.
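A minimal sketch, assuming NumPy is available, of fitting the two-regressor house-price model by least squares. The data are invented for illustration and are generated exactly from assumed coefficients with no random error, so the fit should recover those coefficients. `numpy.linalg.lstsq` is used only as a convenient solver; it is not a method prescribed in these notes.

```python
import numpy as np

# Hypothetical illustration data: square footage (X1) and age in years (X2).
x1 = np.array([1200.0, 1500.0, 1700.0, 2000.0, 2300.0, 2600.0])
x2 = np.array([30.0, 12.0, 25.0, 8.0, 15.0, 3.0])

# Assumed "true" coefficients, used only to generate noise-free prices.
alpha_true, beta1_true, beta2_true = 50_000.0, 120.0, -800.0
y = alpha_true + beta1_true * x1 + beta2_true * x2

# Design matrix: a column of ones for alpha, then one column per regressor.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares estimates of (alpha, beta1, beta2).
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha_hat, beta1_hat, beta2_hat = coef
print(f"alpha={alpha_hat:.1f}, beta1={beta1_hat:.3f}, beta2={beta2_hat:.3f}")
```

With real prices the observations would include a random component, and the estimates would only approximate the underlying coefficients.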
Simple linear regression model: The dependent variable Y is related to the independent variable x through the equation
$Y=\alpha+\beta x+\varepsilon$
where $\alpha$ and $\beta$ are the unknown intercept and slope parameters, respectively, and $\varepsilon$ is a random variable assumed to have $E(\varepsilon)=0$ and $\mathrm{Var}(\varepsilon)=\sigma^2$. Since $\varepsilon$ is random, Y is also a random variable. The value x of the regressor variable is not random and is measured with negligible error. The quantity $\varepsilon$ is called the random error or random disturbance and has constant variance $\sigma^2$. The condition $E(\varepsilon)=0$ implies that at a specific x the y values are distributed around the true, or population, regression line $E(Y\mid x)=\alpha+\beta x$.
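A quick simulation can illustrate what $E(\varepsilon)=0$ means: at one fixed x, repeated observations of Y scatter around $\alpha+\beta x$. The sketch below assumes normally distributed errors and arbitrary values $\alpha=2$, $\beta=0.5$, $\sigma=1$; the model in these notes only requires $E(\varepsilon)=0$ and $\mathrm{Var}(\varepsilon)=\sigma^2$, not normality.

```python
import numpy as np

rng = np.random.default_rng(seed=18)

# Assumed parameter values for the illustration.
alpha, beta, sigma = 2.0, 0.5, 1.0
x = 4.0  # one fixed, non-random value of the regressor

# Draw many errors with mean 0 and variance sigma^2, then form Y = alpha + beta*x + eps.
eps = rng.normal(loc=0.0, scale=sigma, size=100_000)
y = alpha + beta * x + eps

print("population line value:", alpha + beta * x)  # 4.0
print("sample mean of Y:     ", y.mean())          # close to 4.0
print("sample variance of Y: ", y.var(ddof=1))     # close to sigma^2 = 1
```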
The method of least squares: An important aspect of regression analysis is estimating the parameters $\alpha$ and $\beta$. We denote the estimate of $\alpha$ by a and the estimate of $\beta$ by b. The estimated, or fitted, regression line is then
$\hat{y}=a+bx$
where $\hat{y}$ is the predicted, or fitted, value. We expect the fitted line to be closer to the true regression line when a large amount of data is available.
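Given a set of regression data $\{(x_i, y_i),\ i=1,2,\dots,n\}$ and a fitted model $\hat{y}_i=a+bx_i$, the $i$th residual is $e_i=y_i-\hat{y}_i$. The method of least squares chooses a and b to minimize the sum of squares of the residuals (the error sum of squares)
$SSE=\sum_{i=1}^{n}e_i^2=\sum_{i=1}^{n}(y_i-a-bx_i)^2$.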
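To see what minimizing SSE means before any calculus, the sketch below evaluates SSE over a grid of candidate (a, b) pairs and keeps the smallest. The five data points are hypothetical, and a grid search is only an illustration of the criterion; the normal equations give the exact minimizer.

```python
import numpy as np

# Hypothetical data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

def sse(a, b, x, y):
    """Sum of squared residuals e_i = y_i - (a + b*x_i)."""
    residuals = y - (a + b * x)
    return float(np.sum(residuals ** 2))

# Candidate intercepts and slopes, both with step 0.01.
a_grid = np.linspace(0.0, 4.0, 401)
b_grid = np.linspace(0.0, 1.5, 151)
A, B = np.meshgrid(a_grid, b_grid, indexing="ij")

# Broadcast to shape (401, 151, 5) residuals, then sum over the data axis.
sse_grid = np.sum((y - (A[..., None] + B[..., None] * x)) ** 2, axis=-1)

i, j = np.unravel_index(np.argmin(sse_grid), sse_grid.shape)
a_best, b_best = a_grid[i], b_grid[j]
print(f"a={a_best:.2f}, b={b_best:.2f}, SSE={sse(a_best, b_best, x, y):.2f}")
# a=2.20, b=0.60, SSE=2.40
```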
Differentiating SSE with respect to a and b, setting the partial derivatives to zero and rearranging the terms gives the normal equations
$na+b\sum_{i=1}^{n}x_i=\sum_{i=1}^{n}y_i$
$a\sum_{i=1}^{n}x_i+b\sum_{i=1}^{n}x_i^2=\sum_{i=1}^{n}x_iy_i$
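The two normal equations are linear in a and b, so they can be solved as a 2×2 linear system. A minimal sketch, assuming NumPy, using the same hypothetical five-point data set as a toy case:

```python
import numpy as np

# Hypothetical data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n = len(x)

# Normal equations in matrix form:
#   n*a        + (sum x)*b   = sum y
#   (sum x)*a  + (sum x^2)*b = sum x*y
M = np.array([[n, x.sum()],
              [x.sum(), np.sum(x ** 2)]])
rhs = np.array([y.sum(), np.sum(x * y)])

a, b = np.linalg.solve(M, rhs)
print(f"a={a:.4f}, b={b:.4f}")  # a=2.2000, b=0.6000
```

Here $\sum x_i=15$, $\sum x_i^2=55$, $\sum y_i=20$ and $\sum x_iy_i=66$, so the system is $5a+15b=20$, $15a+55b=66$.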
ACTIVITIES / CASE STUDIES / IMPORTANT FACTS RELATED TO THE SESSION
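The normal equations may be solved simultaneously to yield the computing formulas for a and b:
$b=\frac{S_{xy}}{S_{xx}}, \quad a=\bar{y}-b\bar{x}$
where $S_{xx}=\sum_{i=1}^{n}(x_i-\bar{x})^2=\sum_{i=1}^{n}x_i^2-\frac{(\sum_{i=1}^{n}x_i)^2}{n}$ and $S_{xy}=\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})=\sum_{i=1}^{n}x_iy_i-\frac{(\sum_{i=1}^{n}x_i)(\sum_{i=1}^{n}y_i)}{n}$.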
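The computing formulas translate directly into a small function. This is a sketch assuming NumPy; the function name `least_squares_line` is hypothetical, and the cross-check uses `numpy.polyfit` with degree 1, which returns the slope first and the intercept second.

```python
import numpy as np

def least_squares_line(x, y):
    """Return (a, b) for the fitted line y_hat = a + b*x,
    using b = Sxy / Sxx and a = y_bar - b * x_bar."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    s_xx = np.sum((x - x_bar) ** 2)
    s_xy = np.sum((x - x_bar) * (y - y_bar))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data for illustration only.
a, b = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y_hat = {a:.2f} + {b:.2f} x")  # y_hat = 2.20 + 0.60 x

# Cross-check against NumPy's degree-1 polynomial fit.
slope, intercept = np.polyfit([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], deg=1)
```

Because $a=\bar{y}-b\bar{x}$, every least squares line passes through the point $(\bar{x},\bar{y})$.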
EXAMPLES
Example: Engineers fabricating a new transmission-type electron multiplier created an array of silicon nanopillars on a flat silicon membrane. Because the precise structure can influence the electrical properties, the heights and widths of 50 nanopillars were subsequently measured in nanometres ($10^{-9}$ meters). The summary statistics, with x = width and y = height, are
a) Find the least squares line for predicting height from width
b) Find the least squares line for predicting width from height.
c) Construct a scatter plot that includes both lines of regression.
Solution:
a) The slope is $b=S_{xy}/S_{xx}=17840.1/7239.22=2.464$ and the intercept is $a=\bar{y}-b\bar{x}$.
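b) Width is now the response variable and height the predictor, so the roles of x and y are interchanged: the slope becomes $S_{xy}/S_{yy}$, and the fitted line is $\text{width}=6.944+0.266\,\text{height}$.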
c) We construct the scatter plot and include both regression lines. To draw them on the same axes, the line from part (b) is rewritten with height as the response:
$\text{height}=-\frac{6.944}{0.266}+\frac{1}{0.266}\,\text{width}=-26.11+3.759\,\text{width}$
The two lines are different (their slopes are 2.464 and 3.759), so the choice of fitted line depends on which variable you wish to predict.
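The arithmetic in this example can be checked in a few lines. The sketch uses only the values quoted in the solution: $S_{xy}=17840.1$ and $S_{xx}=7239.22$, and the part (b) line $\text{width}=6.944+0.266\,\text{height}$, whose coefficients are read off the expression in part (c). The summary means are not reproduced here, so the part (a) intercept is not computed.

```python
# Summary values quoted in the example (x = width, y = height, in nm).
s_xy = 17840.1
s_xx = 7239.22

# (a) slope of the line predicting height from width
b_height_on_width = s_xy / s_xx
print(f"(a) slope b = {b_height_on_width:.3f}")  # 2.464

# (b) fitted line predicting width from height: width = c + d * height
c, d = 6.944, 0.266

# (c) solve the (b) line for height: height = -c/d + (1/d) * width
intercept_c = -c / d
slope_c = 1 / d
print(f"(c) height = {intercept_c:.2f} + {slope_c:.3f} width")  # -26.11 + 3.759 width
```

The two slopes, 2.464 and 3.759, differ, which is why regressing y on x and x on y give two distinct lines.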
SUMMARY
In this session:
1. Regression analysis was defined and its relation to correlation was discussed.
2. Linear and nonlinear regression were differentiated.
3. The method of least squares for determining the regression coefficients was described.
SELF-ASSESSMENT QUESTIONS
In regression, the equation that describes how the response variable (y) is related to
the explanatory variable (x) is:
4. In the accompanying table, x is the tensile force applied to a steel specimen in thousands of pounds, and y is the resulting
elongation in thousandths of an inch:
x (tensile force, thousands of lb):       1    2    3    4    5    6
y (elongation, thousandths of an inch):  14   33   40   63   76   85
a) Graph the data to verify that it is reasonable to assume that the regression of Y on x is linear.
b) Find the equation of the least squares line, and use it to predict the elongation when the tensile force is 3.5 thousand pounds.
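One way to check a hand calculation for part (b), assuming NumPy: apply the computing formulas $b=S_{xy}/S_{xx}$ and $a=\bar{y}-b\bar{x}$ to the table. For these data $\bar{x}=3.5$, so the prediction at 3.5 thousand pounds equals $\bar{y}$, a handy sanity check since the least squares line passes through $(\bar{x},\bar{y})$.

```python
import numpy as np

# Data from the table: tensile force x (thousands of lb), elongation y (thousandths of an inch).
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([14, 33, 40, 63, 76, 85], dtype=float)

x_bar, y_bar = x.mean(), y.mean()
s_xx = np.sum((x - x_bar) ** 2)
s_xy = np.sum((x - x_bar) * (y - y_bar))

b = s_xy / s_xx          # slope
a = y_bar - b * x_bar    # intercept

force = 3.5
predicted = a + b * force
print(f"y_hat = {a:.3f} + {b:.3f} x")
print(f"predicted elongation at x = {force}: {predicted:.2f}")
```

Part (a) asks for a graph; plotting the points with any plotting tool before fitting is a quick way to confirm that a straight line is reasonable.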
TERMINAL QUESTIONS
5) A professor in the school of business at a university polled a dozen colleagues about the number of professional meetings they attended in the past five years (x) and the number of papers they submitted to refereed journals (y) during the same period. The summary data are as follows:
n=12,
Fit a straight line to the given data.
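When only summary data are available, the computing formulas can be written in terms of n, $\bar{x}$, $\bar{y}$, $\sum x_i^2$ and $\sum x_iy_i$, using $S_{xx}=\sum x_i^2-n\bar{x}^2$ and $S_{xy}=\sum x_iy_i-n\bar{x}\bar{y}$. The helper below is a hypothetical sketch that assumes the summary is given in that form; it is checked on small made-up data, and the problem's own summary values would be substituted in the same way.

```python
def line_from_summary(n, x_bar, y_bar, sum_x2, sum_xy):
    """Least squares intercept a and slope b from summary statistics.

    Uses Sxx = sum(x^2) - n*x_bar^2 and Sxy = sum(x*y) - n*x_bar*y_bar.
    """
    s_xx = sum_x2 - n * x_bar ** 2
    s_xy = sum_xy - n * x_bar * y_bar
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical check with x = 1..5 and y = 2, 4, 5, 4, 5 (sums computed by hand).
a, b = line_from_summary(n=5, x_bar=3.0, y_bar=4.0, sum_x2=55.0, sum_xy=66.0)
print(f"a={a:.2f}, b={b:.2f}")  # a=2.20, b=0.60
```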
REFERENCES FOR FURTHER LEARNING OF THE SESSION
Reference Books:
1. William Feller, An Introduction to Probability Theory and Its Applications, Volume 1, Third Edition, John Wiley & Sons, Inc., 1968 (Chapter 1 of TP1).
2. Richard A. Johnson, Miller & Freund’s Probability and Statistics for Engineers, 11th Edition, PHI, New Delhi, 2011.