
03 ES Regression Correlation

This document discusses regression analysis and correlation. Regression analysis allows us to develop relationships between known variables to predict unknown variables. It examines associations between variables and obtains rules for prediction. The document specifically discusses linear regression, which uses a straight line equation (y=a+bx) to model the relationship between a dependent variable (y) and independent variable (x). It also discusses using the method of least squares to estimate the regression line parameters a and b, which minimize the sum of squared errors between observed and predicted y values.

Uploaded by

Muhammad Abdullah


Handout 03 Regression and Correlation (1)

03 Regression and Correlation

We are mainly concerned with the use of associations among variables. These associations may be
useful in many ways, and one of the most important and most common is prediction: the value of one
quantity is predicted from the values of related quantities. Such methods may also lead to ways of
controlling the value of one variable by adjusting the values of related variables.
Regression analysis offers a sound approach for examining associations among variables
and for obtaining good rules for prediction.
Regression Analysis
- We make decisions based on predictions of future events.
- We develop a relationship between what is already known and what is to be estimated.
There always exists a tendency towards the average. For example, the heights of children born to
tall parents tend to be closer to the average height than the heights of their parents. Regression is the
process of predicting one variable (the height of the children) from another (the height of the parents).
For this purpose we develop an estimating equation, which may be a straight line or a parabola.

Later on we will study correlation analysis to determine the degree to which the variables are
related. It tells us how well the estimating equation actually describes the relationship.
We often look for a CAUSAL relationship between variables, i.e. how the independent variable
causes the dependent variable to change. A fitted regression equation by itself describes association; it
does not prove causation.
Deterministic and Probabilistic Relations or Models
A formula that relates quantities in the real world is called a model. Recall that in physics we
have studied that if a body is moving with an initial velocity 'u' and uniform
acceleration 'a', the velocity after time 't' is given by:
v = u + at
This is a model for uniformly accelerated motion. This model has the property that when a value of 't' is
substituted in the above equation, the value of v is determined without any error. Such models are called
deterministic models. An important example of a deterministic model is the relationship between the
Celsius and Fahrenheit scales, F = 32 + (9/5)C. Other examples of such models are Boyle's
law, Newton's law of gravitation, Ohm's law, etc.
Consider another example: investigating the relationship between the yield of potatoes y and the
level of fertilizer application x. An investigator divides a field into eight plots of equal size with equal
fertility and applies varying amounts of fertilizer to each. The yield of potatoes (in kg) and the fertilizer
application (in kg) are recorded for each plot. The data are given below:

Fertilizer applied, x (kg):   1     1.5   2     2.5   3     3.5   4     4.5
Yield of potatoes, y (kg):    25    31    27    28    36    35    32    34

Muhammad Naeem, Assistant Professor Department of Mathematics


Suppose the investigator believes that the relation between y and x is exactly given by:
y = 22 + 2.5 x
If this is true, we must obtain the exact value of the yield y for a given value of x. Thus when x = 1,
the yield must be:
y = 22 + 2.5 (1) = 24.5
But the observed yield is 25, so there is an error of 24.5 - 25.0 = -0.5. Hence no deterministic model
can be constructed to represent this experiment. A relation that includes this type of error is known as a
probabilistic model. The deterministic relation is then modified to include both a deterministic
component and a random error component:
$$Y_i = a + bX_i + \varepsilon_i$$
where the $\varepsilon_i$ are the unknown random errors.
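The gap between the assumed deterministic relation and the observations can be tabulated for all eight plots. The following is a minimal Python sketch using the potato data above; the function name `assumed_model` is introduced here only for illustration, and the error is taken as model value minus observed value, as in the calculation for x = 1.

```python
# Potato data from the handout (fertilizer x in kg, yield y in kg).
x = [1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5]
y = [25, 31, 27, 28, 36, 35, 32, 34]


def assumed_model(xi):
    """Deterministic part the investigator assumes: y = 22 + 2.5 x."""
    return 22 + 2.5 * xi


# Error computed as model value minus observed value, one per plot.
errors = [assumed_model(xi) - yi for xi, yi in zip(x, y)]

for xi, yi, ei in zip(x, y, errors):
    print(f"x = {xi:<4} observed = {yi:<3} model = {assumed_model(xi):<6} error = {ei:+.2f}")
```

Most of the errors are not zero, which is why a random error term has to be added to the deterministic part.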
Regression Model
There are many statistical investigations in which the main objective is to determine whether a
relationship exists between two or more variables. If such a relationship can be expressed by a
mathematical formula, we will then be able to use it for the purpose of making predictions. The reliability
of any prediction will, of course, depend on the strength of the relationship between the variables included
in the formula.
A mathematical equation that allows us to predict values of one dependent variable from known
values of one or more independent variables is called a regression equation. Today the term regression is
applied to all types of prediction problems and does not necessarily imply a regression towards the
population mean.
Linear Regression
We consider here the problem of estimating or predicting the value of a dependent variable Y on
the basis of a known measurement of an independent and frequently controlled variable X. The variable
intended to be estimated or predicted is termed as dependent variable and the variable on the basis of
which the dependent variable is to be estimated is called the independent variable.
For example, if we want to estimate the heights of children on the basis of their ages, the heights
would be the dependent variable and the ages would be the independent variable. In estimating the yield
of a crop on the basis of the amount of fertilizer used, the yield is the dependent variable and the
amount of fertilizer is the independent variable.
Scatter Diagram
Let us consider the distribution of chemistry grades corresponding to intelligence test scores of
50, 55, 65 and 70. The chemistry grades for a sample of 12 freshmen having these intelligence test scores
are presented in the following table.

Test score (Xi):        65   50   55   65   55   70   65   70   55   70   50   55
Chemistry grade (Yi):   85   74   76   90   85   87   94   98   81   91   76   74

The data in the table have been plotted in the figure to give a scatter diagram. In the scatter
diagram, the points closely follow a straight line, which indicates that the two variables are to some
extent linearly related. Once a reasonable linear relationship is observed, we usually try to express it
mathematically by a straight-line equation Y = a + bX, called the linear regression line, where
the constants a and b represent the y-intercept and slope respectively. Such a regression line has been
drawn in the following figure. This linear regression line can be used to predict the value Y
corresponding to any given value X.


Using the regression line in the figure, we can predict a chemistry grade of 88 for a student whose
intelligence test score is 65. However, we would be extremely fortunate if a student with an intelligence
test score of 65 made a chemistry grade of exactly 88. In fact, the original data in the table show that
three students with this intelligence test score received grades of 85, 90 and 94. We must therefore
interpret the predicted chemistry grade of 88 as an average or expected value for all students taking the
course who have an intelligence test score of 65.
Many possible regression lines could be fitted to the sample data, but we choose that particular
line which best fits that data. The best regression line is obtained by estimating the regression parameters
by the most commonly used method of least squares.
Estimation of a Straight Line using the Method of Least Squares
The basic linear relationship between the dependent variable $Y_i$ and the value $X_i$ is
$$Y_i = a + bX_i + \varepsilon_i$$
where a and b are called the unknown population parameters (b is also called the coefficient of
regression), $Y_i$ are the observed values and $\varepsilon_i$ are the error components.
The estimated regression line is written as
$$\hat{Y}_i = a + bX_i$$
The method of least squares determines the values of the unknown parameters that minimize the
sum of squares of the errors, where the errors are defined as the differences between the observed values
and the corresponding predicted or estimated values. The sum of squares is denoted by
$$S(a,b) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - a - bX_i)^2$$
To minimize S(a,b), we set the first partial derivatives with respect to a and b equal to zero. Therefore
$$\frac{\partial S(a,b)}{\partial a} = 2\sum_{i=1}^{n} (Y_i - a - bX_i)(-1) = 0$$
$$\frac{\partial S(a,b)}{\partial b} = 2\sum_{i=1}^{n} (Y_i - a - bX_i)(-X_i) = 0$$
By simplifying, we have the normal equations
$$\sum Y_i = na + b\sum X_i$$
$$\sum X_iY_i = a\sum X_i + b\sum X_i^2$$
By solving, we have
$$a = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n\sum X^2 - (\sum X)^2}, \qquad b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}$$

If the variable X is taken as the dependent variable, then the least squares line is given by
X = c + dY
and the normal equations are
$$\sum X = nc + d\sum Y$$
$$\sum XY = c\sum Y + d\sum Y^2$$
By solving simultaneously, we have the values of c and d
$$c = \frac{(\sum X)(\sum Y^2) - (\sum Y)(\sum XY)}{n\sum Y^2 - (\sum Y)^2}, \qquad d = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum Y^2 - (\sum Y)^2}$$
Example (1)
Fit a least squares line of regression to the following data taking (i) X as the independent variable
and (ii) Y as the independent variable.
X 1 3 5 6 7 9 10 13
Y 1 2 5 5 6 7 7 8


Solution
(i) The equation of the least squares line is
Y = a + bX
and the normal equations are
$$\sum Y = na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2$$
From the given data we have n = 8

X     Y     XY    X^2   Y^2
1     1     1     1     1
3     2     6     9     4
5     5     25    25    25
6     5     30    36    25
7     6     42    49    36
9     7     63    81    49
10    7     70    100   49
13    8     104   169   64
$\sum X = 54$, $\sum Y = 41$, $\sum XY = 341$, $\sum X^2 = 470$, $\sum Y^2 = 253$

Substituting the values of n, $\sum X$, $\sum Y$, $\sum XY$ and $\sum X^2$ in the normal equations, we have
41 = 8a + 54b
341 = 54a + 470b
By solving simultaneously, we get
a = 1.01, b = 0.609
We can also find the values of a and b using the formulas
$$a = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n\sum X^2 - (\sum X)^2}, \qquad b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}$$
Thus the equation of the fitted least squares line becomes
Y = 1.01 + 0.609X
(ii) When X is taken as the dependent variable and Y as the independent variable, the equation of
the least squares line of regression is
X = c + dY
with the normal equations
$$\sum X = nc + d\sum Y$$
$$\sum XY = c\sum Y + d\sum Y^2$$
Substituting the values from the table above, we have
54 = 8c + 41d
341 = 41c + 253d
By solving simultaneously we have
c = -0.93, d = 1.499
Thus the equation of the fitted least squares line becomes
X = -0.93 + 1.499Y
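The two fits in Example (1) can be checked numerically. The following is a minimal Python sketch of the closed-form least squares formulas; the helper name `least_squares_line` is an assumption made for illustration and is not part of the handout.

```python
def least_squares_line(u, v):
    """Return (intercept, slope) of the least squares line v = intercept + slope * u."""
    n = len(u)
    su, sv = sum(u), sum(v)
    suv = sum(ui * vi for ui, vi in zip(u, v))
    suu = sum(ui * ui for ui in u)
    denom = n * suu - su ** 2          # n*sum(u^2) - (sum u)^2
    intercept = (sv * suu - su * suv) / denom
    slope = (n * suv - su * sv) / denom
    return intercept, slope


X = [1, 3, 5, 6, 7, 9, 10, 13]
Y = [1, 2, 5, 5, 6, 7, 7, 8]

a, b = least_squares_line(X, Y)    # regression of Y on X
c, d = least_squares_line(Y, X)    # regression of X on Y

print(f"Y on X: Y = {a:.3f} + {b:.3f} X")
print(f"X on Y: X = {c:.3f} + {d:.3f} Y")
```

Swapping the roles of the two lists is all that is needed to obtain the second line, which illustrates that the regression of X on Y is a separate fit and not the algebraic inverse of the regression of Y on X.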
Example (2)
The following data give, for each of 10 randomly selected areas, the number of oil stoves and
the annual consumption of oil in barrels. Fit a regression equation of annual oil consumption on
number of stoves.
No. of stoves, x:                          27    32    38    42    48    54    60    67    73    79
Annual consumption of oil, y (barrels):    142   170   200   194   224   256   261   270   304   349


Solution
Necessary calculations are given below

x       y       xy        x^2
27      142     3834      729
32      170     5440      1024
38      200     7600      1444
42      194     8148      1764
48      224     10752     2304
54      256     13824     2916
60      261     15660     3600
67      270     18090     4489
73      304     22192     5329
79      349     27571     6241
Total   520     2370      133111    29840

$$a = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n\sum X^2 - (\sum X)^2}, \qquad b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}$$
With n = 10 we have
a = (2370 x 29840 - 520 x 133111) / (10 x 29840 - 520^2) = 1503080 / 28000 = 53.6814
b = (10 x 133111 - 520 x 2370) / (10 x 29840 - 520^2) = 98710 / 28000 = 3.5254
so the linear regression equation is:
^y = 53.68 + 3.53 x
The positive value of b shows that for an increase of one stove, the oil consumption increases by
3.53 barrels on the average.
Example (3)
Consider the data on experiment on potatoes. We want to fit a line of regression to the data using
the method of least squares.
The necessary calculations are given in the table below:

x       y      xy      x^2     y^2     y^      e = y - y^
1       25     25.0    1.00    625     26.83   -1.83
1.5     31     46.5    2.25    961     28.02    2.98
2       27     54.0    4.00    729     29.21   -2.21
2.5     28     70.0    6.25    784     30.40   -2.40
3       36     108.0   9.00    1296    31.59    4.41
3.5     35     122.5   12.25   1225    32.79    2.21
4       32     128.0   16.00   1024    33.98   -1.98
4.5     34     153.0   20.25   1156    35.17   -1.17
Total   22     248     707.0   71.00   7800

$$a = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n\sum X^2 - (\sum X)^2}, \qquad b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}$$
we have a = 24.452 and b = 2.38095
The fitted line, which is the simple regression line, is:
^y = 24.452 + 2.38095 x
Conclusions
(i) The above regression line provides the estimated values (or the mean values) of y for
given values of x.
(ii) b = 2.381 indicates an increase of 2.381 kg in yield per unit increase in fertilizer. This
interpretation is valid only when x lies between 1 and 4.5.
(iii) An extension of the model beyond these values may lead to unreasonable results.


(iv) If we assume the model to extend to x = 0, a = 24.452 is the estimated expected yield
when no fertilizer is used.
Plot of Residuals
A plot of the residuals against the values of x often gives an idea of how good the fit is.
a) If the points in the plot are close to the x-axis and scattered in a random way, the model
appears to provide a good fit.
b) If the points are distributed in a systematic manner, we should try some other model.
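The residuals for the potato data of Example (3) can be recomputed before plotting them. The following is a minimal Python sketch; the helper `fit_line` is a hypothetical name, and the text-only sign column stands in for an actual plot.

```python
def fit_line(x, y):
    """Least squares intercept a and slope b for y = a + b x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n              # equivalent to a = mean(y) - b * mean(x)
    return a, b


x = [1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5]
y = [25, 31, 27, 28, 36, 35, 32, 34]

a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Crude text view of the residual plot: one row per x value, sign and size.
for xi, ei in zip(x, residuals):
    print(f"x = {xi:<4} e = {ei:+6.2f}  {'+' if ei > 0 else '-'}")

# For a least squares line with an intercept, the residuals sum to zero.
print(f"sum of residuals = {sum(residuals):.10f}")
```

Here the signs alternate without an obvious pattern, which is the "scattered in a random way" situation described in (a); with only eight points this is a weak check rather than a proof of a good fit.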

Example (4)
For the data in Example (3) relating to potato yields, remove the first pair (x = 1, y = 25) and fit
a line of regression to the remaining seven pairs. Is the line the same as the one already determined for
the eight pairs? Do the same by removing the second pair (x = 1.5, y = 31) instead of the first. Are the
three lines different?
You will observe that a change in the data leads to a different line. We say that a least squares line
has zero breakdown point: even a single sufficiently extreme observation can move the line arbitrarily
far. There are robust methods in which contamination of nearly 50% of the data points does not
cause an arbitrarily large change in the fitted line.
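The comparison asked for in Example (4) can be sketched in Python as follows. The helper `fit_line` and the value 1000 used for the corrupted observation are assumptions made purely for illustration.

```python
def fit_line(x, y):
    """Least squares intercept a and slope b for y = a + b x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b


x = [1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5]
y = [25, 31, 27, 28, 36, 35, 32, 34]

full = fit_line(x, y)
without_first = fit_line(x[1:], y[1:])                    # drop (1, 25)
without_second = fit_line(x[:1] + x[2:], y[:1] + y[2:])   # drop (1.5, 31)

for label, (a, b) in [("all eight pairs", full),
                      ("first pair removed", without_first),
                      ("second pair removed", without_second)]:
    print(f"{label:<20} y = {a:.3f} + {b:.3f} x")

# Zero breakdown point: a single wild value is enough to drag the line away.
y_bad = y[:-1] + [1000]        # hypothetical recording error in the last plot
a_bad, b_bad = fit_line(x, y_bad)
print(f"{'one corrupted yield':<20} y = {a_bad:.3f} + {b_bad:.3f} x")
```

The three fits on genuine data differ moderately, while the single corrupted value changes the slope by two orders of magnitude.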
Exercise
(1) Given below are the data relating to the thermal energy generated in Pakistan during 1981-94. The
energy generation is in billion kWh.
Year               1981   1982   1983   1984   1985   1986   1987
Energy generated   4.2    5.2    5.1    5.2    6.5    7.3    8.4
Year               1988   1989   1990   1991   1992   1993   1994
Energy generated   10.8   11.9   14.5   16.1   19.4   19.7   23.0
Fit a straight line to the data. Find the residuals. Plot the residuals and comment on your result.
(2) The following is the annual installation of computers in labs at UET. Fit a linear regression
equation of the number of computers on years and give the annual rate of installation.

Year:                         2001-2003   2003-2005   2005-2007   2007-2009   2009-2011
No. of computers installed:   139         144         150         154         158

Note
In each situation where the independent variable is a time factor, the values assigned to
2001-2003, ... may be taken as 1, 2, 3, ...


Example (5) (Curve Fitting)

Using the method of least squares
(i) Fit a second degree parabola to the following data taking X as the independent variable:
X = 0     1     2     3     4
Y = 1     1.8   1.3   2.5   6.3

(ii) Fit an equation of the form $Y = aX^2 + bX$ to the following data

X = 0     1     2     3     4     5
Y = 1     5     12    20    25    36

(iii) Fit an exponential curve $Y = a e^{bX}$ to the following data

X = 1     2     3      4      5       6
Y = 1.6   4.5   13.8   40.2   125.0   363.0

(iv) Fit an equation of the form $Y = aX^b$ to the following data

X = 1      2      3      4      5      6
Y = 2.98   4.26   5.21   6.10   6.80   7.50
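The handout does not spell out a method for the non-linear forms. One common approach, assumed here, is to linearize: taking logarithms of $Y = a e^{bX}$ gives $\ln Y = \ln a + bX$, which is a straight line in X and $\ln Y$. The following is a minimal Python sketch for part (iii); `fit_line` is a hypothetical helper name.

```python
import math


def fit_line(x, y):
    """Least squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    slope = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    intercept = (sy - slope * sx) / n
    return intercept, slope


X = [1, 2, 3, 4, 5, 6]
Y = [1.6, 4.5, 13.8, 40.2, 125.0, 363.0]

# Linearize Y = a * exp(b X) as ln Y = ln a + b X and fit a straight line.
log_Y = [math.log(yi) for yi in Y]
log_a, b = fit_line(X, log_Y)
a = math.exp(log_a)

print(f"Y = {a:.4f} * exp({b:.4f} X)")
for xi, yi in zip(X, Y):
    print(f"X = {xi}  observed = {yi:<6}  fitted = {a * math.exp(b * xi):.1f}")
```

This minimizes the squared errors in $\ln Y$, not in Y itself, so the result can differ slightly from a direct non-linear least squares fit. The same idea handles part (iv): $Y = aX^b$ becomes $\ln Y = \ln a + b \ln X$.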

Example (6)
Fit a least squares line for 20 pairs of observations having $\bar{X} = 2$, $\bar{Y} = 8$, $\sum X^2 = 180$ and $\sum XY = 404$.
Example (7)
For 5 pairs of observations, it is given that the A.M. of X is 2 and the A.M. of Y is 15. It is also known
that $\sum X^2 = 30$, $\sum X^3 = 100$, $\sum X^4 = 354$, $\sum XY = 242$, $\sum X^2Y = 850$. Fit a second degree parabola taking X as
an independent variable.


Standard Deviation of Regression or Standard Error of Estimate

Now we study how to measure the reliability of the estimating equation we have developed. In a
scatter diagram, we see that a line is a more accurate estimator when the data points lie close to
the line than when the points are farther away from the line. To measure the reliability of the estimating
equation, statisticians have developed the standard error of estimate, which is similar to the standard
deviation.
The standard error of estimate measures the variability or scatter of the observed values
around the regression line. The standard deviation of regression or the standard error of estimate of Y on
X is denoted by $s_{Y.X}$ and defined by
$$s_{Y.X} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$$
where $\hat{Y} = a + bX$ is the estimated regression line.
Alternative formula,
$$s_{Y.X} = \sqrt{\frac{\sum Y_i^2 - a\sum Y_i - b\sum X_iY_i}{n - 2}}$$
where n is the number of pairs.
Dividing by 'n - 2'
Because the values a and b were obtained from a sample of data points, we lose 2 degrees of
freedom when we use the points to estimate the regression line.
In statistics, the number of degrees of freedom is the number of values in the final calculation of a
statistic that are free to vary.
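The two formulas for the standard error of estimate can be compared numerically. The following is a minimal Python sketch using the data of Example (1); the variable names are assumptions made for illustration.

```python
import math

X = [1, 3, 5, 6, 7, 9, 10, 13]
Y = [1, 2, 5, 5, 6, 7, 7, 8]
n = len(X)

sx, sy = sum(X), sum(Y)
sxy = sum(xi * yi for xi, yi in zip(X, Y))
sxx = sum(xi * xi for xi in X)
syy = sum(yi * yi for yi in Y)

# Least squares estimates for the regression of Y on X.
b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
a = (sy - b * sx) / n

# Definition: root of the sum of squared residuals divided by n - 2.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(X, Y))
s_definition = math.sqrt(sse / (n - 2))

# Alternative formula based on the column totals only.
s_alternative = math.sqrt((syy - a * sy - b * sxy) / (n - 2))

print(f"s (definition)  = {s_definition:.4f}")
print(f"s (alternative) = {s_alternative:.4f}")
```

The alternative formula is sensitive to rounding: it should be evaluated with unrounded a and b, because it subtracts nearly equal quantities.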
Example (8)
Given the following sets of values:
X 6.5 5.3 8.6 1.2 4.2 2.9 1.1 3.9
Y 3.2 2.7 4.5 1.0 2.0 1.7 0.6 1.9
(a) Compute the least squares regression equation for Y values on X values.
(b) Compute the least squares regression equation for X values on Y values.
Example (9)
For each of the following data sets, determine the estimated regression equation $\hat{Y} = a + bX$:
(a) $\bar{Y} = 20$, $\bar{X} = 10$, $\sum XY = 1000$, $\sum X^2 = 2000$, n = 10.
(b) $\sum X = 528$, $\sum Y = 11720$, $\sum XY = 193640$, $\sum X^2 = 11440$, n = 32
Example (10)
For the following set of data: (AIOU)
(a) plot the scatter diagram.
(b) Develop the estimating equation that best describes the data
(c) Predict Y for X = 10, 15, 20
x 13 16 14 11 17 9 13 17 18 12
y 6.2 8.6 7.2 4.5 9.0 3.5 6.5 9.3 9.5 5.7
Example (11)
Cost accountants often estimate overhead based on the level of production. At the Standard Knitting Co.
they have collected information on overhead expenses and units produced at different plants, and want to estimate a
regression equation to predict future overhead. (AIOU)
Overhead 191 170 272 155 280 173 234 116 153 178
Units 40 42 53 35 56 39 48 30 37 40
(a) Develop the regression equation for the cost accountants.
(b) Predict overhead when 50 units are produced.
(c) Calculate the standard error of estimate.


Coefficient of Determination
The variability among the values of the dependent variable Y, called the total variation, is given
by $\sum (Y - \bar{Y})^2$. This is composed of two parts:
(i) one which is explained by (associated with) the regression line, i.e. $\sum (\hat{Y} - \bar{Y})^2$
(ii) another which is not explained by (not associated with) the regression line, i.e. $\sum (Y - \hat{Y})^2$.
Symbolically,
$$\sum (Y - \bar{Y})^2 = \sum (Y - \hat{Y})^2 + \sum (\hat{Y} - \bar{Y})^2$$
Total Variation = Unexplained Variation + Explained Variation
See the diagram;

The coefficient of determination, which measures the proportion of variability in the values of the
dependent variable (Y) associated with its linear relation with the independent variable (X), is defined by:
$$r^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2} = 1 - \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2}$$
Alternate formula for the Coefficient of Determination:
$$r^2 = \frac{a\sum Y + b\sum XY - n\bar{Y}^2}{\sum Y^2 - n\bar{Y}^2}$$
Example (12)
Calculate the Coefficient of Determination using both the formulas.

Year    R&D expenses (X)    Annual profit (Y)
1st     5                   31
2nd     11                  40
3rd     4                   30
4th     5                   34
5th     3                   25
6th     2                   20
$\sum X = 30$, $\sum Y = 180$
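Both formulas for the coefficient of determination can be evaluated side by side on the R&D data. The following is a minimal Python sketch; the variable names are assumptions made for illustration.

```python
X = [5, 11, 4, 5, 3, 2]          # R&D expenses
Y = [31, 40, 30, 34, 25, 20]     # annual profit
n = len(X)

sx, sy = sum(X), sum(Y)
sxy = sum(xi * yi for xi, yi in zip(X, Y))
sxx = sum(xi * xi for xi in X)
syy = sum(yi * yi for yi in Y)
y_bar = sy / n

# Least squares line of Y on X.
b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
a = y_bar - b * sx / n
y_hat = [a + b * xi for xi in X]

# Formula 1: explained variation divided by total variation.
explained = sum((yh - y_bar) ** 2 for yh in y_hat)
total = sum((yi - y_bar) ** 2 for yi in Y)
r2_definition = explained / total

# Formula 2: alternate form using column totals.
r2_alternate = (a * sy + b * sxy - n * y_bar ** 2) / (syy - n * y_bar ** 2)

print(f"a = {a:.3f}, b = {b:.3f}")
print(f"r^2 (definition) = {r2_definition:.4f}")
print(f"r^2 (alternate)  = {r2_alternate:.4f}")
```

The two values agree, since the alternate formula is an algebraic rearrangement of the definition for a least squares line.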


Correlation
Two variables are said to be correlated if they tend to vary together in some direction. If
both variables tend to increase (or decrease) together, the correlation is said to be direct or positive,
e.g. the length of an iron bar increases as the temperature increases. If one variable tends to increase as
the other variable decreases, the correlation is said to be negative or inverse, e.g. the volume of a gas
decreases as the pressure increases.
Correlation measures the strength of the relationship, that is, the interdependence between the two
variables; there is no distinction between dependent and independent variables. In regression, by
contrast, we are interested in determining the dependence of one variable upon the other variable.
The numerical measure of the strength of the linear relationship between any two variables is called
the correlation coefficient. It is usually denoted by r and is defined by
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$
called the Pearson Product Moment Correlation Coefficient.
Alternatively,
$$r = \frac{\sum XY - (\sum X)(\sum Y)/n}{\sqrt{[\sum X^2 - (\sum X)^2/n][\sum Y^2 - (\sum Y)^2/n]}}$$

It assumes values that range from +1 for perfect positive linear relationship, to – 1, for perfect
negative linear relationship and r = 0 indicates no linear relationship between X and Y.
It is important to note that r = 0 does not mean that there is no relationship at all. e.g. if all the
observed values lie exactly on a circle, there is a perfect non-linear relationship between the variables.
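The points made above can be illustrated with a short calculation. The following is a minimal Python sketch of the product moment formula; the function name `pearson_r` and the four points chosen on the unit circle are assumptions made for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation coefficient of two equal-length lists."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Data of Example (1): a strong positive linear relationship.
X = [1, 3, 5, 6, 7, 9, 10, 13]
Y = [1, 2, 5, 5, 6, 7, 7, 8]
r = pearson_r(X, Y)
print(f"r(X, Y) = {r:.4f}")
print(f"r(Y, X) = {pearson_r(Y, X):.4f}")   # same value: r is symmetric

# Four points lying exactly on the unit circle: related, but not linearly.
circle_x = [1, 0, -1, 0]
circle_y = [0, 1, 0, -1]
r_circle = pearson_r(circle_x, circle_y)
print(f"r(circle) = {r_circle:.4f}")
```

The circle example gives r = 0 even though the points satisfy x^2 + y^2 = 1 exactly, so a zero correlation coefficient rules out only a linear relationship.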
Analysis of Correlation and Regression
(1) Correlation measures the STRENGTH of the linear association between paired variables, say X and Y. On
the other hand, regression tells us the FORM of the linear association that best predicts Y from the values
of X.
(2) Correlation is calculated whenever:
o both X and Y are measured on each subject and we want to quantify how much they are linearly associated.
o in particular, Pearson's product moment correlation coefficient is used when the assumption that
both X and Y are sampled from normally distributed populations is satisfied
o or Spearman's rank-order correlation coefficient is used if the assumption of normality is
not satisfied.
o correlation is not used when the variables are manipulated, for example, in experiments.
(3) Linear regression is used whenever:
o at least one of the independent variables (Xi's) is used to predict the dependent variable Y. Note: Some
of the Xi's may be dummy variables, i.e. Xi = 0 or 1, which are used to code nominal variables.
o one manipulates the X variable, e.g. in an experiment.
(4) Linear regression is not symmetric in terms of X and Y. That is, interchanging X and Y will give a
different regression model (i.e. X in terms of Y) from the original Y in terms of X. On the other hand, if
you interchange the variables X and Y in the calculation of the correlation coefficient you will get the same
value of the correlation coefficient.
(5) The "best" linear regression model is obtained by selecting the variables (X's) with at least strong
correlation to Y, i.e. >= 0.80 or <= -0.80
(6) The same underlying distribution is assumed for all variables in linear regression. Thus, linear regression
will underestimate the correlation of the independent and dependent variables when they (X's and Y) come
from different underlying distributions.


Rank Correlation
Sometimes the actual measurements of individuals or objects are either not available or an accurate
assessment is not possible. They are then arranged in order according to some characteristic of interest.
Such an ordered arrangement is called a ranking and the order given to an individual or object is called its
rank. The correlation between two such sets of rankings is called Rank Correlation.
Suppose we have n pairs of two data sets ranked with respect to some characteristic, say $(x_1, y_1)$,
$(x_2, y_2)$, $(x_3, y_3)$, ..., $(x_n, y_n)$. Since both the $x_i$ and the $y_i$ are the first n natural numbers, we have
$$\sum x_i = \sum y_i = 1 + 2 + \dots + n = \frac{n(n+1)}{2}$$
$$\sum x_i^2 = \sum y_i^2 = 1^2 + 2^2 + \dots + n^2 = \frac{n(n+1)(2n+1)}{6}$$
$$\sum (x_i - \bar{x})^2 = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{(\sum y_i)^2}{n} = \frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)^2}{4} = \frac{n(n^2-1)}{12}$$
Let $d_i = x_i - y_i$
Then
$$\sum d_i^2 = \sum (x_i - y_i)^2 = \sum x_i^2 + \sum y_i^2 - 2\sum x_i y_i = \frac{n(n+1)(2n+1)}{6} + \frac{n(n+1)(2n+1)}{6} - 2\sum x_i y_i$$
$$\sum x_i y_i = \frac{n(n+1)(2n+1)}{6} - \frac{1}{2}\sum d_i^2$$
The product moment coefficient of correlation is:
$$r = \frac{\sum XY - (\sum X)(\sum Y)/n}{\sqrt{[\sum X^2 - (\sum X)^2/n][\sum Y^2 - (\sum Y)^2/n]}}$$
By substitution we have
$$r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$
This coefficient also ranges from -1 to +1.
Note
If two objects or observations are tied (having the same value), say for the fourth and fifth places,
then both are given the mean of ranks 4 and 5, i.e. 4.5.
This situation is illustrated in the following example.
Example (13)
The following table shows the number of hours studied (X) by a random sample of ten students
and their grades in examination (Y):
X: 8 5 11 13 10 5 18 15 2 8
Y: 56 44 79 72 70 54 94 85 33 65
Calculate Spearman’s rank correlation coefficient.
Solution
We rank the X values by giving rank 1 to the highest value 18, rank 2 to 15, rank 3 to 13, rank 4
to 11, rank 5 to 10, rank 6.5 (mean of rank 6 and 7) to both 8, rank 8.5 (mean of rank 8 and 9) to both 5
and rank 10 to 2. Similarly we rank the values of Y by giving 1 to the highest value 94, rank 2 to 85, rank
3 to 79, …, and rank 10 to 33 which is the smallest.
Table given below:


X     Y     Rank of X   Rank of Y   d_i     d_i^2
8     56    6.5         7           -0.5    0.25
5     44    8.5         9           -0.5    0.25
11    79    4           3            1.0    1
13    72    3           4           -1.0    1
10    70    5           5            0.0    0
5     54    8.5         8            0.5    0.25
18    94    1           1            0.0    0
15    85    2           2            0.0    0
2     33    10          10           0.0    0
8     65    6.5         6            0.5    0.25
$\sum d_i^2 = 3$
The value of n is 10.
Hence
$$r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(3)}{10(10^2 - 1)} = 0.98$$
Compare this value with the correlation coefficient for the original values.
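The ranking with ties and the resulting coefficient can be reproduced programmatically. The following is a minimal Python sketch; the helper name `mean_ranks` is an assumption made for illustration, and rank 1 is given to the largest value as in the solution above.

```python
def mean_ranks(values):
    """Rank 1 goes to the largest value; tied values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1      # rank of the first of the tied copies
        count = ordered.count(v)          # how many observations share this value
        ranks.append(first + (count - 1) / 2)
    return ranks


X = [8, 5, 11, 13, 10, 5, 18, 15, 2, 8]           # hours studied
Y = [56, 44, 79, 72, 70, 54, 94, 85, 33, 65]      # examination grades
n = len(X)

rank_x = mean_ranks(X)
rank_y = mean_ranks(Y)
d_squared = [(rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y)]
sum_d2 = sum(d_squared)
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print("ranks of X:", rank_x)
print("ranks of Y:", rank_y)
print(f"sum of d^2 = {sum_d2}, r_s = {r_s:.4f}")
```

The formula was derived assuming the ranks are exactly 1, 2, ..., n; when ties are present, as here, it is an approximation to the product moment correlation of the ranks.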
Example (14)
Ten competitors in a beauty contest are ranked by three judges in the following order
1st Judge 1 6 5 10 3 2 4 9 7 8
2nd Judge 3 5 8 4 7 10 2 1 6 9
3rd Judge 6 4 9 8 1 2 3 10 5 7
Use the rank correlation coefficient to discuss which pair of judges have the nearest approach to
common tastes in beauty.
Example (15)
The body mass index (BMI) for an individual is found as follows. Using your weight in pounds
and your height in inches, multiply your weight by 705, divide the result by your height, and divide again
by your height. The desirable body mass index varies between 19 and 25. The table gives the body mass
index and the age for 20 individuals. Find the coefficient of rank correlation for the data shown in the table.

BMI     AGE     BMI     AGE
22.5    27      19.0    44
24.6    32      17.5    18
28.7    45      32.5    29
30.1    49      22.4    29
18.5    19      28.8    40
20.0    22      21.3    39
24.5    31      25.0    21
25.0    27      19.0    20
27.5    25      29.7    52
30.0    44      16.7    19


Example (16)
The table gives the percent of calories from fat and the micrograms of lead per deciliter of blood for
a sample of preschoolers. Find the coefficient of rank correlation for the data shown in the table.

PERCENT FAT CALORIES    LEAD
40                      13
35                      12
33                      11
29                      8
35                      13
36                      15
30                      11
36                      14
33                      12
28                      9
39                      15
26                      7

Ans. Coefficient of Rank Correlation = 0.919


Assignment (1)
Given the following data:

X1: 41 31 26 43 21 33 41 31 46 31 36 32 38 27 35 40
X2: 62 52 50 56 51 52 63 50 47 40 56 54 60 57 57 58
X3: 41 33 33 38 35 36 43 33 37 33 33 31 30 37 35 31

(a) Calculate all the six simple regression coefficients.
(b) Calculate all the simple correlation coefficients.
(c) Calculate the correlation coefficient between X1 and the combined effect of X2 and X3.
(d) Calculate the correlation coefficient between X1 and X2 when the effect of X3 is
held constant.

X1    X2    X3    X1^2    X2^2    X3^2    X1X2    X1X3    X2X3
...   ...   ...   ...     ...     ...     ...     ...     ...
...   ...   ...   ...     ...     ...     ...     ...     ...

(a) The six simple regression coefficients:
$$b_{12} = \frac{n\sum X_1X_2 - (\sum X_1)(\sum X_2)}{n\sum X_2^2 - (\sum X_2)^2}, \quad b_{21} = \frac{n\sum X_1X_2 - (\sum X_1)(\sum X_2)}{n\sum X_1^2 - (\sum X_1)^2}, \quad b_{13} = \frac{n\sum X_1X_3 - (\sum X_1)(\sum X_3)}{n\sum X_3^2 - (\sum X_3)^2}$$
$$b_{31} = \frac{n\sum X_1X_3 - (\sum X_1)(\sum X_3)}{n\sum X_1^2 - (\sum X_1)^2}, \quad b_{23} = \frac{n\sum X_2X_3 - (\sum X_2)(\sum X_3)}{n\sum X_3^2 - (\sum X_3)^2}, \quad b_{32} = \frac{n\sum X_2X_3 - (\sum X_2)(\sum X_3)}{n\sum X_2^2 - (\sum X_2)^2}$$

(b) The simple correlation coefficients:
$$r_{12} = \frac{n\sum X_1X_2 - \sum X_1 \sum X_2}{\sqrt{[n\sum X_1^2 - (\sum X_1)^2][n\sum X_2^2 - (\sum X_2)^2]}}$$
$$r_{13} = \frac{n\sum X_1X_3 - \sum X_1 \sum X_3}{\sqrt{[n\sum X_1^2 - (\sum X_1)^2][n\sum X_3^2 - (\sum X_3)^2]}}$$
$$r_{23} = \frac{n\sum X_2X_3 - \sum X_2 \sum X_3}{\sqrt{[n\sum X_2^2 - (\sum X_2)^2][n\sum X_3^2 - (\sum X_3)^2]}}$$

(c) The correlation coefficient between X1 and the combined effect of X2 and X3:
$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}}$$

(d) The correlation coefficient between X1 and X2 when the effect of X3 is held constant:
$$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$$
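Once the three simple correlation coefficients are known, parts (c) and (d) reduce to direct substitution. The following is a minimal Python sketch; the function names and the input values 0.6, 0.5 and 0.4 are hypothetical and are not computed from the assignment data.

```python
import math


def multiple_correlation(r12, r13, r23):
    """R_1.23: correlation between X1 and the combined effect of X2 and X3."""
    return math.sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2))


def partial_correlation(r12, r13, r23):
    """r_12.3: correlation between X1 and X2 with the effect of X3 held constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# Hypothetical simple correlations, used only to exercise the formulas.
r12, r13, r23 = 0.6, 0.5, 0.4

R_1_23 = multiple_correlation(r12, r13, r23)
r_12_3 = partial_correlation(r12, r13, r23)

print(f"R_1.23 = {R_1_23:.4f}")
print(f"r_12.3 = {r_12_3:.4f}")
```

As a sanity check on the formulas, when X3 is uncorrelated with both X1 and X2 (r13 = r23 = 0) the partial correlation reduces to r12 and the multiple correlation reduces to |r12|.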

13 pages
Valve Classification According To Port Size
No ratings yet
Valve Classification According To Port Size
6 pages
Regression Analysis - SSB
No ratings yet
Regression Analysis - SSB
2 pages
Cha 6
No ratings yet
Cha 6
8 pages
Chapter 5 - 1
No ratings yet
Chapter 5 - 1
5 pages
Regression
No ratings yet
Regression
14 pages
Analytics - PrepBook 2018 PDF
No ratings yet
Analytics - PrepBook 2018 PDF
34 pages
Lecture 3.1.9 (REGRESSION)
No ratings yet
Lecture 3.1.9 (REGRESSION)
9 pages
CH 6
No ratings yet
CH 6
43 pages
The CUSUM Test: When The Regression Is Estimated Using Only The First T 1
No ratings yet
The CUSUM Test: When The Regression Is Estimated Using Only The First T 1
3 pages
INTRO
No ratings yet
INTRO
7 pages
MMW Module 10 - Correlation and Linear Regression
No ratings yet
MMW Module 10 - Correlation and Linear Regression
13 pages
Ma724 - 38
No ratings yet
Ma724 - 38
7 pages
DISCRETE MATH Chapter-8
No ratings yet
DISCRETE MATH Chapter-8
34 pages
CH 6
No ratings yet
CH 6
42 pages
Bad Geometry
No ratings yet
Bad Geometry
5 pages
Regression and Correlation Analysis
No ratings yet
Regression and Correlation Analysis
16 pages
Econometrics 2
No ratings yet
Econometrics 2
27 pages
ASS#1-FINALS Doromal
No ratings yet
ASS#1-FINALS Doromal
8 pages
Correlation and Regression
No ratings yet
Correlation and Regression
22 pages
Regression Analysis
No ratings yet
Regression Analysis
22 pages
Statistical Analysis (SM 901B) Unit 2 - Regression: Goonjan Jain Department of Applied Mathematics DTU
No ratings yet
Statistical Analysis (SM 901B) Unit 2 - Regression: Goonjan Jain Department of Applied Mathematics DTU
19 pages
How Can We Explore The Association Between Two Quantitative Variables?
No ratings yet
How Can We Explore The Association Between Two Quantitative Variables?
7 pages
What Is Simple Linear Regression?
No ratings yet
What Is Simple Linear Regression?
7 pages
1 - Stat-701 Regression
No ratings yet
1 - Stat-701 Regression
18 pages
8-Simple Regression Analysis
No ratings yet
8-Simple Regression Analysis
9 pages
Correlation and Regression 2
No ratings yet
Correlation and Regression 2
24 pages
VII Pearson R
No ratings yet
VII Pearson R
4 pages
Association
No ratings yet
Association
57 pages
Chapter 3
No ratings yet
Chapter 3
15 pages
Regression and Correlation
No ratings yet
Regression and Correlation
13 pages
Correlation and Linear
No ratings yet
Correlation and Linear
27 pages
Lectures 14 15
No ratings yet
Lectures 14 15
66 pages
Regression and Correlation
No ratings yet
Regression and Correlation
14 pages
Statistics of Two Variables: Functions
No ratings yet
Statistics of Two Variables: Functions
15 pages
Regression and Correlation
No ratings yet
Regression and Correlation
13 pages
Correlation and Regression Analysis
No ratings yet
Correlation and Regression Analysis
23 pages
Handout 05 Regression and Correlation PDF
No ratings yet
Handout 05 Regression and Correlation PDF
17 pages
Regression Analysis (Simple)
100% (1)
Regression Analysis (Simple)
8 pages
(Revised) Simple Linear Regression and Correlation
No ratings yet
(Revised) Simple Linear Regression and Correlation
41 pages

03 ES Regression Correlation

Uploaded by

03 ES Regression Correlation

Uploaded by

Handout 03 Regression and Correlation (1)

03 Regression and Correlation

Fertilizer Applied (x): 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5

Muhammad Naeem, Assistant Professor Department of Mathematics

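$a = \dfrac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n\sum X^2 - (\sum X)^2}, \qquad b = \dfrac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}$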
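
The least-squares estimates of a and b in the line y = a + bx can be computed directly from the column sums of X, Y, X^2 and XY. The following is a minimal Python sketch of that calculation; the function name `fit_line` and the sample data are illustrative assumptions and are not taken from the handout.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Common denominator of both estimates: n*sum(X^2) - (sum X)^2
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom  # intercept
    b = (n * sum_xy - sum_x * sum_y) / denom       # slope
    return a, b


# Illustrative data only (hypothetical): points lying exactly on y = 1 + 2x.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # prints 1.0 2.0
```

Because the sample points lie exactly on a line, the fit recovers the intercept 1 and the slope 2; with real data the line will only approximate the observations.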

Year: 2001-2003 2003-2005 2005-2007 2007-2009 2009-2011

Example (5) (Curve Fitting)

(ii) Fit an equation of the form $Y = aX^2 + bX$ to the following data

(iii) Fit an exponential curve $Y = a e^{bX}$ to the following data

(iv) Fit an equation of the form $Y = a X^b$ to the following data
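
One common way to fit these three forms is sketched below in Python. It assumes the usual textbook approach, which may differ from the handout's own worked solution: the exponential and power curves are linearised by taking logarithms (so the squared error is minimised in ln Y, not in Y), and the form Y = aX^2 + bX is fitted from its two normal equations. All function names and the sample data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares (a, b) for y = a + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx ** 2
    if denom == 0:
        raise ValueError("all x values are identical")
    return (sy * sxx - sx * sxy) / denom, (n * sxy - sx * sy) / denom


def fit_exponential(xs, ys):
    """Fit Y = a*exp(b*X) via ln Y = ln a + b*X (requires Y > 0)."""
    ln_a, b = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), b


def fit_power(xs, ys):
    """Fit Y = a*X**b via ln Y = ln a + b*ln X (requires X > 0 and Y > 0)."""
    ln_a, b = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(ln_a), b


def fit_quadratic_through_origin(xs, ys):
    """Fit Y = a*X**2 + b*X from the normal equations
         sum(X^2 Y) = a*sum(X^4) + b*sum(X^3)
         sum(X Y)   = a*sum(X^3) + b*sum(X^2)
    solved by Cramer's rule."""
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x * x * y for x, y in zip(xs, ys))
    det = s4 * s2 - s3 ** 2
    if det == 0:
        raise ValueError("normal equations are singular")
    a = (sx2y * s2 - s3 * sxy) / det
    b = (s4 * sxy - s3 * sx2y) / det
    return a, b


# Hypothetical data generated from known curves, to check the fits.
xs = [1, 2, 3, 4]
print(fit_exponential(xs, [2 * math.exp(0.5 * x) for x in xs]))  # close to (2, 0.5)
print(fit_power(xs, [3 * x ** 2 for x in xs]))                   # close to (3, 2)
print(fit_quadratic_through_origin(xs, [2 * x ** 2 + 3 * x for x in xs]))  # close to (2, 3)
```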

Standard Deviation of Regression or Standard Error of Estimate
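
As an illustration, the standard error of estimate measures the typical distance of the observed Y values from the fitted line Y = a + bX. The Python sketch below assumes the common definition with divisor n - 2, that is, the square root of sum((Y - Yhat)^2) / (n - 2); some texts divide by n instead. The function name and the data are hypothetical.

```python
import math


def standard_error_of_estimate(xs, ys, a, b):
    """Standard error of estimate for the fitted line y = a + b*x.

    Assumes the divisor n - 2 (two estimated parameters, a and b).
    """
    n = len(xs)
    if n != len(ys) or n <= 2:
        raise ValueError("need more than two paired observations")
    # Sum of squared residuals between observed and predicted Y.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))


# Hypothetical data; a = 0 and b = 1.9 are the least-squares estimates for it.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
print(standard_error_of_estimate(xs, ys, a=0.0, b=1.9))  # about 0.59
```

Here the residuals are 0.1, 0.2, -0.7 and 0.4, so the sum of squared residuals is 0.70 and the result is the square root of 0.70 / 2.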

BMI AGE BMI AGE

PERCENT FAT LEAD

Ans. Coefficient of Rank Correlation = 0.919
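
A coefficient of rank correlation of this kind can be computed with Spearman's formula r_s = 1 - 6 * sum(d^2) / (n (n^2 - 1)), where d is the difference between the two ranks of each observation. The Python sketch below assumes there are no tied values (ties need average ranks and a corrected formula). The data are hypothetical and are not the data behind the answer quoted above.

```python
def ranks(values):
    """Rank the values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rank_correlation(xs, ys):
    """Spearman's r_s = 1 - 6*sum(d^2) / (n*(n^2 - 1)) for untied data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    if len(set(xs)) != n or len(set(ys)) != n:
        raise ValueError("tied values are not handled by this sketch")
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical paired scores: the rank differences are 1, 1, 1, 1, 0.
print(spearman_rank_correlation([10, 20, 30, 40, 50], [2, 1, 4, 3, 5]))  # about 0.8
```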

(a) Calculate all six simple regression coefficients.

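$X_1$ | $X_2$ | $X_3$ | $X_1^2$ | $X_2^2$ | $X_3^2$ | $X_1X_2$ | $X_1X_3$ | $X_2X_3$

$n\sum X_1X_2 - (\sum X_1)(\sum X_2)$, $\quad n\sum X_1X_2 - (\sum X_1)(\sum X_2)$, $\quad n\sum X_1X_3 - (\sum X_1)(\sum X_3)$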

(b) Calculate all the simple correlation coefficients.

$n\sum X_1X_3 - \sum X_1 \sum X_3$

$n\sum X_2X_3 - \sum X_2 \sum X_3$
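
Each simple correlation coefficient uses an expression of this form as its numerator. A minimal Python sketch of the full calculation follows, assuming the standard product-moment formula r = (n sum(XY) - sum(X) sum(Y)) / sqrt((n sum(X^2) - (sum X)^2)(n sum(Y^2) - (sum Y)^2)). The function name and the data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Simple (product-moment) correlation coefficient between two variables."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    if denominator == 0:
        raise ValueError("one of the variables is constant")
    return numerator / denominator


# Hypothetical data with a perfect positive linear relationship.
print(pearson_r([1, 2, 3], [2, 4, 6]))  # prints 1.0
```

Calling the function once for each pair (X1, X2), (X1, X3) and (X2, X3) gives r12, r13 and r23.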

(c) Calculate the multiple correlation coefficient between X1 and the combined effect of X2 and X3.

$R_{1.23} = \sqrt{\dfrac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2}}$

$r_{12.3} = \dfrac{r_{12} - r_{13}\,r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$
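
Once the three simple correlation coefficients r12, r13 and r23 are known, the multiple correlation of X1 with X2 and X3 together, and the partial correlation of X1 and X2 with X3 held fixed, follow from them. The Python sketch below assumes the standard three-variable formulas R_1.23 = sqrt((r12^2 + r13^2 - 2 r12 r13 r23) / (1 - r23^2)) and r_12.3 = (r12 - r13 r23) / sqrt((1 - r13^2)(1 - r23^2)). The function names and input values are hypothetical.

```python
import math


def multiple_correlation(r12, r13, r23):
    """R_1.23: correlation between X1 and the combined effect of X2 and X3."""
    if abs(r23) >= 1:
        raise ValueError("r23 must lie strictly between -1 and 1")
    r_squared = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
    return math.sqrt(r_squared)


def partial_correlation(r12, r13, r23):
    """r_12.3: correlation between X1 and X2 with X3 held constant."""
    if abs(r13) >= 1 or abs(r23) >= 1:
        raise ValueError("r13 and r23 must lie strictly between -1 and 1")
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# Hypothetical simple correlation coefficients.
print(multiple_correlation(0.6, 0.5, 0.4))  # about 0.66
print(partial_correlation(0.6, 0.5, 0.4))   # about 0.50
```

As a sanity check, when X3 is uncorrelated with both X1 and X2 (r13 = r23 = 0), both quantities reduce to r12 in absolute value.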
