
Correlation and Regression

Uploaded by Aman


CORRELATION AND REGRESSION
CORRELATION:
The relationship between two or more variables is known as correlation. For example, the relationships between cost and price, demand and supply, distance and velocity, and the relationship of crop production with soil fertility, amount of rainfall, relative humidity, etc. are all examples of correlation. There are three types of correlation:
(i) Simple correlation,
(ii) Partial correlation and
(iii) Multiple correlation.
SIMPLE CORRELATION:
The relationship between two variables is called simple (or linear) correlation.
The numerical measure of the strength, or degree, of the linear relationship between two variables is known as the simple correlation coefficient. If x and y are two variables, the simple correlation coefficient between them is denoted by $r_{xy}$. The roles of x and y are interchangeable: either may be treated as the dependent variable, with the other as the independent variable. The simple correlation coefficient is given by any of the following equivalent formulas:
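1. $$r_{xy} = \frac{s_{xy}}{\sqrt{s_{xx}}\,\sqrt{s_{yy}}},$$ where
$$s_{xy} = \sum (x-\bar{x})(y-\bar{y}) = \sum xy - \frac{\sum x \sum y}{n},$$
$$s_{xx} = \sum (x-\bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n} \quad\text{and}$$
$$s_{yy} = \sum (y-\bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n}.$$

2. $$r_{xy} = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2}\,\sqrt{\sum (y-\bar{y})^2}}$$

3. $$r_{xy} = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$$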

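4. $$r_{xy} = \frac{n\sum uv - \sum u \sum v}{\sqrt{n\sum u^2 - (\sum u)^2}\,\sqrt{n\sum v^2 - (\sum v)^2}},$$ where $u = \frac{x-a}{h}$ and $v = \frac{y-b}{k}$, with a, b any convenient origins (assumed means) and h, k > 0 any convenient scale factors.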
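The four formulas give the same value. The short Python sketch below (an illustration added here, not part of the original notes) checks this numerically: it computes r from deviations about the means, from raw sums, and from coded values u = (x − a)/h, v = (y − b)/k. It borrows the air-velocity data of worked problem 1 later in these notes; the coding constants a = 200, h = 40, b = 0.8 and k = 0.1 are arbitrary choices.

```python
import math

def r_from_deviations(x, y):
    """Formulas 1 and 2: r = s_xy / (sqrt(s_xx) * sqrt(s_yy)) using deviations from the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))

def r_from_raw_sums(x, y):
    """Formula 3 (and formula 4 when called with coded values u, v)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Air velocity x (cm/sec) and evaporation coefficient y (mm^2/sec)
x = [20, 60, 100, 140, 180, 220, 260, 300, 340, 380]
y = [0.18, 0.37, 0.35, 0.78, 0.56, 0.75, 1.18, 1.36, 1.17, 1.65]

r1 = r_from_deviations(x, y)
r3 = r_from_raw_sums(x, y)
# Formula 4: a change of origin and (positive) scale does not alter r
u = [(xi - 200) / 40 for xi in x]
v = [(yi - 0.8) / 0.1 for yi in y]
r4 = r_from_raw_sums(u, v)
print(f"r = {r1:.4f}, {r3:.4f}, {r4:.4f}; r^2 = {r1 ** 2:.4f}")
```

Squaring the result gives the coefficient of determination discussed below.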

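Theorem:
Prove that the simple correlation coefficient always lies between −1 and +1, i.e. $-1 \le r_{xy} \le 1$.
Proof:
We know
$$s_x^2 = \frac{\sum (x-\bar{x})^2}{n-1} \quad\therefore\quad \sum (x-\bar{x})^2 = (n-1)\,s_x^2,$$
$$s_y^2 = \frac{\sum (y-\bar{y})^2}{n-1} \quad\therefore\quad \sum (y-\bar{y})^2 = (n-1)\,s_y^2,$$
$$r_{xy} = \frac{1}{n-1}\,\frac{\sum (x-\bar{x})(y-\bar{y})}{s_x s_y} \quad\therefore\quad \sum (x-\bar{x})(y-\bar{y}) = (n-1)\,r_{xy}\,s_x s_y.$$

Consider the sum of squares
$$\sum \left(\frac{x-\bar{x}}{s_x} \pm \frac{y-\bar{y}}{s_y}\right)^2 \ge 0.$$
Or, $$\frac{\sum (x-\bar{x})^2}{s_x^2} \pm 2\,\frac{\sum (x-\bar{x})(y-\bar{y})}{s_x s_y} + \frac{\sum (y-\bar{y})^2}{s_y^2} \ge 0$$
Or, $$\frac{(n-1)\,s_x^2}{s_x^2} \pm 2\,\frac{(n-1)\,r_{xy}\,s_x s_y}{s_x s_y} + \frac{(n-1)\,s_y^2}{s_y^2} \ge 0$$
Or, $(n-1) \pm 2(n-1)\,r_{xy} + (n-1) \ge 0$
Or, $2(n-1) \pm 2(n-1)\,r_{xy} \ge 0$
Or, $2(n-1)\,(1 \pm r_{xy}) \ge 0$
Or, $1 \pm r_{xy} \ge 0$ (dividing by $2(n-1) > 0$, since $n \ge 2$).
Taking the positive sign: $r_{xy} \ge -1$.
Taking the negative sign: $r_{xy} \le 1$.
Combining both, we get $-1 \le r_{xy} \le 1$. Proved.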

Properties of simple correlation coefficient

1. The dependent and independent variables are interchangeable: $r_{xy} = r_{yx}$.
2. It has no units.
3. It is independent of change of origin and of scale (for positive scale factors; multiplying one variable by a negative constant only reverses the sign of r).
4. Its value always lies between −1 and +1, i.e. $-1 \le r_{xy} \le 1$.
5. (i) If $r_{xy} = +1$, then x and y have perfect positive correlation.
(ii) If $r_{xy} = -1$, then x and y have perfect negative correlation.
(iii) If $r_{xy} = 0$, then x and y have no linear correlation.
6. The simple correlation coefficient is the geometric mean of the two regression coefficients, i.e. $r_{xy} = \pm\sqrt{b_{xy}\,b_{yx}}$, taking the sign common to both regression coefficients.
INTERPRETING THE LINEAR CORRELATION COEFFICIENT
The value of $r_{xy}$ always falls between −1 and +1 inclusive. If r is close to zero, we conclude that there is no appreciable linear correlation between x and y; if it is close to −1 or +1, we conclude that there is a strong linear correlation between x and y. (Whether a given value of r is statistically significant also depends on the sample size n.)
COEFFICIENT OF LINEAR DETERMINATION:
If $r_{xy}$ is the linear correlation coefficient between two variables x and y, then $r_{xy}^2$ is the coefficient of determination. It is used to interpret the value of the correlation coefficient: it gives the proportion of the variation in one variable that is explained by its linear relationship with the other. For example, if $r_{xy} = 0.6$, then $r_{xy}^2 = 0.36 = 36\%$, which means that 36% of the variation in one variable is explained by the other variable.

PARTIAL CORRELATION
The relationship among three or more variables, in which one is dependent, one is independent and the remaining independent variables are kept constant, is known as partial correlation.
The numerical measure of the strength of the relationship between a dependent variable and one independent variable, keeping the remaining independent variables constant, is known as the partial correlation coefficient. For example, the relationship between crop production and soil fertility, keeping the amount of rainfall, quality of seeds, etc. constant, is an example of partial correlation.
If x₁ is the dependent variable and x₂ and x₃ are independent variables, then the partial correlation coefficient between x₁ and x₂, keeping x₃ constant, is denoted by r₁₂.₃ and given by
$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1-r_{13}^2}\,\sqrt{1-r_{23}^2}}.$$

Similarly,
$$r_{13.2} = \frac{r_{13} - r_{12}\,r_{32}}{\sqrt{1-r_{12}^2}\,\sqrt{1-r_{32}^2}} \quad\text{and}$$

$$r_{23.1} = \frac{r_{23} - r_{21}\,r_{31}}{\sqrt{1-r_{21}^2}\,\sqrt{1-r_{31}^2}}.$$
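A minimal Python sketch of these three formulas (an illustration added here; the function name partial_r is hypothetical). It uses the simple correlations of Problem 3 in the exercise list below and also checks property 1 below: swapping the subscripts before the dot does not change the value.

```python
import math

def partial_r(r_ij, r_ik, r_jk):
    """First-order partial correlation r_ij.k: x_i and x_j with x_k held constant."""
    denominator = math.sqrt(1 - r_ik ** 2) * math.sqrt(1 - r_jk ** 2)
    if denominator == 0:
        raise ValueError("r_ij.k is undefined when x_k is perfectly correlated with x_i or x_j")
    return (r_ij - r_ik * r_jk) / denominator

# Simple correlations from Problem 3 (fertilizer x1, seeds x2, productivity x3)
r12, r13, r23 = 0.69, 0.64, 0.85

r12_3 = partial_r(r12, r13, r23)   # (r12 - r13*r23) / (sqrt(1-r13^2) sqrt(1-r23^2))
r13_2 = partial_r(r13, r12, r23)   # uses r32 = r23
r23_1 = partial_r(r23, r12, r13)   # uses r21 = r12 and r31 = r13
r21_3 = partial_r(r12, r23, r13)   # subscripts before the dot swapped
print(f"r12.3 = {r12_3:.4f}, r13.2 = {r13_2:.4f}, r23.1 = {r23_1:.4f}")
```

Because the formula only needs the three simple correlations, the same helper covers r₁₂.₃, r₁₃.₂ and r₂₃.₁ by permuting its arguments.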

PROPERTIES OF PARTIAL CORRELATION COEFFICIENT:

1. The subscripts before the dot can be interchanged, i.e. r₁₂.₃ = r₂₁.₃.
2. Its value lies between −1 and +1.
3. If only one independent variable is kept constant, the partial correlation is said to be of first order; if two independent variables are kept constant, it is of second order, and so on.
COEFFICIENT OF PARTIAL DETERMINATION
The square of the partial correlation coefficient is called the coefficient of partial determination. Hence $r_{12.3}^2$, $r_{13.2}^2$ and $r_{23.1}^2$ are coefficients of partial determination. They are used to interpret the value of the partial correlation coefficient. For example, if r₁₂.₃ = 0.9, then $r_{12.3}^2 = 0.81 = 81\%$, which means that 81% of the variation in the dependent variable x₁ that is left unexplained by x₃ is explained by the independent variable x₂ (with x₃ kept constant).

MULTIPLE CORRELATION
The relationship between a dependent variable and two or more independent variables, in which the effects of all the independent variables are considered together, is known as multiple correlation.
The numerical measure of the strength of this relationship is known as the multiple correlation coefficient. The multiple correlation coefficient between a dependent variable x₁ and the independent variables x₂ and x₃ is denoted by R₁.₂₃ and given by
$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2\,r_{12}\,r_{13}\,r_{23}}{1 - r_{23}^2}}.$$

Similarly,
$$R_{2.31} = \sqrt{\frac{r_{23}^2 + r_{21}^2 - 2\,r_{23}\,r_{21}\,r_{31}}{1 - r_{13}^2}} \quad\text{and}$$

$$R_{3.12} = \sqrt{\frac{r_{31}^2 + r_{32}^2 - 2\,r_{31}\,r_{32}\,r_{12}}{1 - r_{12}^2}}.$$
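The formula for R₁.₂₃ translates directly into code. The sketch below is an illustration added here (multiple_R is a hypothetical name); it uses the correlations of Problem 7 in the exercise list below and checks properties (ii) and (iii) that follow.

```python
import math

def multiple_R(r_ij, r_ik, r_jk):
    """R_i.jk: multiple correlation of x_i with x_j and x_k taken together."""
    numerator = r_ij ** 2 + r_ik ** 2 - 2 * r_ij * r_ik * r_jk
    denominator = 1 - r_jk ** 2   # involves only the two independent variables
    return math.sqrt(numerator / denominator)

# Simple correlations from Problem 7
r12, r13, r23 = 0.93, 0.50, 0.34

R1_23 = multiple_R(r12, r13, r23)
R1_32 = multiple_R(r13, r12, r23)   # subscripts after the dot interchanged
print(f"R1.23 = {R1_23:.4f}, R1.23^2 = {R1_23 ** 2:.4f}")
```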

PROPERTIES OF MULTIPLE CORRELATION COEFFICIENT

(i) Its value lies between 0 and +1, i.e. $0 \le R_{1.23} \le 1$.
(ii) The subscripts after the dot can be interchanged: R₁.₂₃ = R₁.₃₂.
(iii) The multiple correlation coefficient is not less than the absolute value of any simple correlation coefficient between the dependent variable and one of the independent variables, i.e. $R_{1.23} \ge |r_{12}|$ and $R_{1.23} \ge |r_{13}|$.
(iv) If R₁.₂₃ = 0, then r₁₂ = 0 and r₁₃ = 0.
COEFFICIENT OF MULTIPLE DETERMINATION
The square of the multiple correlation coefficient is known as the coefficient of multiple determination. Thus $R_{1.23}^2$, $R_{2.31}^2$ and $R_{3.12}^2$ are coefficients of multiple determination. They are used to interpret the value of the multiple correlation coefficient.
For example, if R₁.₂₃ = 0.8, then $R_{1.23}^2 = 0.64 = 64\%$, which means that 64% of the variation in the dependent variable x₁ is explained by the independent variables x₂ and x₃ together.
Problems: (Old questions of IOE)
1. Define partial correlation and multiple correlation with suitable examples. Write down the properties of the multiple and partial correlation coefficients.
2. What do you mean by the correlation coefficient? Show that the correlation coefficient lies between −1 and +1.
3. The simple correlation coefficients between fertilizer x₁, seeds x₂ and productivity x₃ are r₁₂ = 0.69, r₁₃ = 0.64 and r₂₃ = 0.85. Calculate the partial correlation coefficient r₁₂.₃ and the multiple correlation coefficient R₁.₂₃.
3. Write the uses of correlation and regression in the field of engineering.

4. Write the properties of the correlation coefficient and describe the condition under which there exists only one regression line.
5. Distinguish between the correlation coefficient and the regression coefficient, and write their importance in the field of engineering.
6. A sample of 10 values of three variables X₁, X₂ and X₃ gave:
$$\sum X_1 = 10,\qquad \sum X_2 = 20,\qquad \sum X_3 = 30,$$
$$\sum X_1^2 = 20,\qquad \sum X_2^2 = 68,\qquad \sum X_3^2 = 170,$$
$$\sum X_1X_2 = 10,\qquad \sum X_1X_3 = 15,\qquad \sum X_2X_3 = 64.$$
Find
a) the partial correlation coefficient between X₁ and X₂, eliminating the effect of X₃;
b) the multiple correlation coefficient between X₁, X₂ and X₃, taking X₁ as dependent.
7. Define multiple correlation with a suitable example. You are given the simple correlation coefficients r₁₂ = 0.93, r₁₃ = 0.50 and r₂₃ = 0.34. Taking the first variable as dependent, compute the multiple correlation coefficient R₁.₂₃ and the coefficient of multiple determination $R_{1.23}^2$. Also interpret the result.
8. Ten steel wires of diameter 0.5 mm and length 2.5 m were extended in a laboratory by applying vertical forces of varying magnitudes. The results are as follows:
Force (kg):               15   19   25   35   42   48   53   56   62   65
Increase in length (mm):  1.7  2.1  2.5  3.4  3.9  4.9  5.4  5.7  6.6  7.2
Determine the correlation coefficient and the coefficient of determination between force and increase in length, and interpret the result using the coefficient of determination.
9. The concentrations of chloride and phosphates in a solution, in milligrams per liter, were determined over a 10-day period:
Chloride:    64    66    64    62    65    64    64    67    74    69
Phosphates:  1.31  1.39  1.59  1.68  1.89  1.98  1.97  1.99  1.98  2.15
i) Compute the correlation coefficient r and comment on the result.
ii) Do you see any role for this association for predictive purposes?
10. An article in the Journal of Environmental Engineering (Vol. 115, No. 3, 1989, pp. 608-619) reported the results of a study on the occurrence of sodium and chloride in surface streams in central Rhode Island. The following data are chloride concentration y (in milligrams per liter) and roadway area in the watershed x (in percentage).
X:  4.4   6.6   9.7   10.6  10.8  10.9  11.8  12.1
Y:  0.19  0.15  0.57  0.70  0.67  0.63  0.47  0.70
Find the correlation coefficient and the coefficient of determination for the given data, and draw your conclusion.
11. A household survey of monthly expenditure on food yields the following results:
Monthly expenditure (Rs 100):  10  15  20  25  30  35  40
Monthly income (Rs 1000):       2   4   5   7   6   6   5
Size of the family:             4   5   7  10   8  11   4
Calculate the coefficient of multiple correlation and the coefficient of multiple determination. Also interpret the result.
12. From the following data, find Karl Pearson's coefficient of correlation and interpret the result:
Marks in statistics:   39  65  62  90  82  75  25  98  36  78
Marks in mathematics:  47  53  58  86  62  68  60  91  51  84

13. In trying to evaluate the effectiveness of antibodies in killing bacteria, a research institute compiled the following information:
Antibodies (mg):  12   15   14   16   17   10
Bacteria:         5    7    5.6  7.2  8.6  6.2
Find the strength and direction of the relationship between them.
REGRESSION ANALYSIS
The study of the relationship between a dependent variable and one or more independent variables, in which the value of the dependent variable is predicted with the help of the independent variables, is known as regression analysis. It establishes a functional relationship between the variables and is often used to describe cause-and-effect relationships, although a fitted regression by itself does not prove causation. Correlation describes to what degree the variables are related; regression, on the other hand, describes how they are related. In other words, regression explains the nature of the relationship, whereas correlation explains its strength.

SIMPLE REGRESSION:
What is Simple Linear Regression?
Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables:
• One variable, denoted x, is regarded as the predictor, explanatory, or independent variable.
• The other variable, denoted y, is regarded as the response, outcome, or dependent variable.
Because the other terms are used less frequently today, we'll use the "predictor" and "response" terms to refer to the variables encountered in this course. The other terms are mentioned only to make you aware of them should you encounter them. Simple linear regression gets its adjective "simple" because it concerns the study of only one predictor variable. In contrast, multiple linear regression, which we study later in this course, gets its adjective "multiple" because it concerns the study of two or more predictor variables.
METHOD OF LEAST SQUARE:

The least square method is the process of finding the best-fitting curve, or line of best fit, for a set of data points by minimizing the sum of the squares of the offsets (residuals) of the points from the curve. When we look for the relation between two variables, the trend of the outcomes is estimated quantitatively; this process is termed regression analysis. Curve fitting is one approach to regression analysis, and least squares is the method of fitting equations that approximate the curve to the given raw data.
Since more than one curve can be fitted to a particular data set, we need the curve with the minimal deviation from all the measured data points. This is known as the best-fitting curve and is found by the least-squares method.

Least Square Method Definition

The least-squares method is a widely used statistical method for finding a regression line, or best-fit line, for a given pattern of data. The line is described by an equation with specific parameters. In regression analysis, least squares is the standard approach for approximating the solution of an overdetermined system, i.e. a set of equations with more equations than unknowns.
The method of least squares defines the solution as the one that minimizes the sum of squares of the deviations, or errors, in the result of each equation. This sum of squared errors measures the variation of the observed data about the fitted model.
The least-squares method is often applied in data fitting. The best fit is taken to be the one that minimizes the sum of squared errors, or residuals, which are the differences between the observed or experimental values and the corresponding fitted values given by the model.
There are two basic categories of least-squares problems:

• Ordinary or linear least squares

• Nonlinear least squares
These depend on whether the residuals are linear or nonlinear in the unknown parameters. Linear problems are common in regression analysis in statistics. Nonlinear problems, on the other hand, are usually solved by iterative refinement, in which the model is approximated by a linear one at each iteration.

Least Square Method Graph

In linear regression, the line of best fit is a straight line.
The line is fitted by minimizing the residuals, or offsets, of the data points from it. In common practice the vertical offsets are minimized, whether the fitted object is a line, a polynomial, a surface or a hyperplane; minimizing the perpendicular offsets instead (total least squares) is an alternative used mainly when both variables are subject to error.

Least Square Method Formula

The least-squares method states that the curve that best fits a given set of observations is the curve with the minimum sum of squared residuals (deviations or errors) from the given data points. Suppose the given data points are (x₁, y₁), (x₂, y₂), (x₃, y₃), …, (xₙ, yₙ), in which all the x's are values of the independent variable and all the y's are values of the dependent variable. Also suppose that f(x) is the fitting curve and d represents the error, or deviation, of each given point from it.
Now we can write:
$$d_1 = y_1 - f(x_1),\quad d_2 = y_2 - f(x_2),\quad d_3 = y_3 - f(x_3),\quad \ldots,\quad d_n = y_n - f(x_n).$$
The least-squares principle states that the best-fitting curve has the property that the sum of squares of all the deviations from the given values is a minimum, i.e.
$$S = \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} \left[y_i - f(x_i)\right]^2 = \text{minimum}.$$
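To see the least-squares principle at work, the Python sketch below (an illustration added here, on hypothetical data) evaluates S for a straight line f(x) = a + bx. It uses the closed-form least-squares slope b = s_xy/s_xx and intercept a = ȳ − b x̄ (the formulas applied in worked problem 1 later in these notes) and confirms that nudging either coefficient away from these values increases S.

```python
def sum_sq_deviations(xs, ys, f):
    """S = sum of d_i^2, where d_i = y_i - f(x_i)."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys))

def least_squares_line(xs, ys):
    """Closed-form least-squares line y = a + b x (b = s_xy / s_xx, a = y_bar - b x_bar)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

xs = [1, 2, 3, 4, 5]   # hypothetical data for illustration
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
s_min = sum_sq_deviations(xs, ys, lambda x: a + b * x)

# Nudging either coefficient away from the least-squares values increases S
nudged = [
    sum_sq_deviations(xs, ys, lambda x, da=da, db=db: (a + da) + (b + db) * x)
    for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]
]
print(f"y = {a:.2f} + {b:.2f}x, S = {s_min:.2f}, nudged S = {[round(s, 2) for s in nudged]}")
```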

Limitations for Least-Square Method

The least-squares method is a very useful method of curve fitting. Despite its many benefits, it has a few shortcomings too. One of the main limitations is discussed here.
Regression analysis that uses the least-squares method for curve fitting implicitly assumes that the errors in the independent variable are negligible or zero. When the errors in the independent variable are not negligible, the model is subject to measurement error, and the least-squares parameter estimates, together with the hypothesis tests and confidence intervals based on them, can be misleading because of the errors present in the independent variable.
The method of finding a and b in y = a + bx ………(i)
The normal equations of the above line are
$\sum y = na + b\sum x$ ……..(ii)
$\sum xy = a\sum x + b\sum x^2$ ……..(iii)
Solving (ii) and (iii), we get the values of a and b. Putting these values in (i), we get the line of regression of y on x.
The method of finding c and d in x = c + dy ………(i)
The normal equations of the above line are
$\sum x = nc + d\sum y$ ……..(ii)
$\sum xy = c\sum y + d\sum y^2$ ……..(iii)
Solving (ii) and (iii), we get the values of c and d. Putting these values in (i), we get the line of regression of x on y.
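The two pairs of normal equations can be solved mechanically. The sketch below (an illustration added here; solve_normal_equations is a hypothetical helper name) applies Cramer's rule to the raw sums, so the same function fits y on x and x on y simply by swapping its arguments. With the air-velocity data of worked problem 1 it should reproduce a ≈ 0.069 and b ≈ 0.00383, and the product of the two slopes equals r².

```python
def solve_normal_equations(u, w):
    """Least-squares line w = p + q*u from the normal equations
         sum(w)   = n*p      + q*sum(u)
         sum(u*w) = p*sum(u) + q*sum(u^2)
    solved by Cramer's rule."""
    n = len(u)
    sum_u, sum_w = sum(u), sum(w)
    sum_uw = sum(ui * wi for ui, wi in zip(u, w))
    sum_uu = sum(ui * ui for ui in u)
    det = n * sum_uu - sum_u ** 2
    p = (sum_w * sum_uu - sum_u * sum_uw) / det
    q = (n * sum_uw - sum_u * sum_w) / det
    return p, q

x = [20, 60, 100, 140, 180, 220, 260, 300, 340, 380]
y = [0.18, 0.37, 0.35, 0.78, 0.56, 0.75, 1.18, 1.36, 1.17, 1.65]

a, b = solve_normal_equations(x, y)   # y on x: y = a + b x
c, d = solve_normal_equations(y, x)   # x on y: x = c + d y
print(f"y on x: y = {a:.4f} + {b:.5f}x")
print(f"x on y: x = {c:.2f} + {d:.2f}y")
```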

Regression Coefficient
Definition: The regression coefficient is the constant 'b' in the regression equation; it gives the change in the value of the dependent variable corresponding to a unit change in the independent variable.

If there are two regression equations, then there will be two regression coefficients:

• Regression Coefficient of X on Y: The regression coefficient of X on Y is represented by the symbol bxy; it measures the change in X for a unit change in Y. Symbolically, it can be represented as $b_{xy} = r\,\frac{\sigma_X}{\sigma_Y}$. When the deviations are taken from the actual means of X and Y, bxy can be obtained by the formula
$$b_{xy} = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sum (Y-\bar{Y})^2}.$$
When the deviations are taken from assumed means A and B, with $d_X = X - A$ and $d_Y = Y - B$, the following formula is used:
$$b_{xy} = \frac{n\sum d_X d_Y - \sum d_X \sum d_Y}{n\sum d_Y^2 - (\sum d_Y)^2}.$$

• Regression Coefficient of Y on X: The symbol byx is used; it measures the change in Y corresponding to a unit change in X. Symbolically, it can be represented as $b_{yx} = r\,\frac{\sigma_Y}{\sigma_X}$. When the deviations are taken from the actual means, the following formula is used:
$$b_{yx} = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sum (X-\bar{X})^2}.$$
When the deviations are taken from the assumed means, byx can be calculated by the formula
$$b_{yx} = \frac{n\sum d_X d_Y - \sum d_X \sum d_Y}{n\sum d_X^2 - (\sum d_X)^2}.$$

The regression coefficient is also called a slope coefficient because it determines the slope of the line, i.e. the change in the dependent variable for a unit change in the independent variable.
Definition: The constant 'b' in the regression equation (Ye = a + bX) is called the regression coefficient. It determines the slope of the line, i.e. the change in the value of Y corresponding to a unit change in X, and therefore it is also called a "slope coefficient."

Properties of Regression Coefficient

1. The correlation coefficient is the geometric mean of the two regression coefficients. Symbolically, $r = \pm\sqrt{b_{xy}\,b_{yx}}$, where r takes the common sign of the regression coefficients.
2. The value of the coefficient of correlation cannot exceed unity in absolute value, so $b_{xy}\,b_{yx} = r^2 \le 1$. Therefore, if one of the regression coefficients is greater than unity (in absolute value), the other must be less than unity.
3. Both regression coefficients have the same sign, i.e. they are either both positive or both negative. Thus, it is not possible for one regression coefficient to be negative while the other is positive.
4. The coefficient of correlation has the same sign as the regression coefficients: if the regression coefficients are positive, then r is positive, and vice versa.
5. The arithmetic mean of the two regression coefficients is greater than or equal to the correlation coefficient.

Symbolically, $\frac{b_{xy} + b_{yx}}{2} \ge r$ when both coefficients are positive (in general, $\left|\frac{b_{xy} + b_{yx}}{2}\right| \ge |r|$).

6. The regression coefficients are independent of change of origin, but not of scale. By origin, we mean that there is no effect on the regression coefficients if any constant is subtracted from the values of X and Y. By scale, we mean that if the values of X and Y are multiplied or divided by some constant, the regression coefficients will in general change.

Thus, all these properties should be kept in mind while solving for the regression coefficients.
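A small numerical check of properties 1, 5 and 6 is sketched below in Python. It is an illustration added here, on hypothetical data (x = 1, …, 5 and y = 2, 4, 5, 4, 5); regression_coefficients is our own helper name.

```python
import math

def regression_coefficients(x, y):
    """Return (b_xy, b_yx, r) using deviations from the actual means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / s_yy, s_xy / s_xx, s_xy / math.sqrt(s_xx * s_yy)

x = [1, 2, 3, 4, 5]   # hypothetical data for illustration
y = [2, 4, 5, 4, 5]
b_xy, b_yx, r = regression_coefficients(x, y)

# Property 1: r is the geometric mean of b_xy and b_yx, with their common sign
r_from_b = math.copysign(math.sqrt(b_xy * b_yx), b_yx)
# Property 5: the arithmetic mean of the coefficients is at least r (both positive here)
am = (b_xy + b_yx) / 2
# Property 6: shifting the origin leaves b_yx unchanged; rescaling x changes it
_, b_yx_shifted, _ = regression_coefficients([xi - 3 for xi in x], [yi - 4 for yi in y])
_, b_yx_scaled, _ = regression_coefficients([10 * xi for xi in x], y)
print(b_xy, b_yx, r, r_from_b, am, b_yx_shifted, b_yx_scaled)
```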
MULTIPLE REGRESSION PLANE:
If x, y and z are three variables, then the regression plane of y on x and z is given by y = a + bx + cz ……………..(i)
The normal equations of (i) are
$\sum y = na + b\sum x + c\sum z$ …………..(ii)
$\sum xy = a\sum x + b\sum x^2 + c\sum xz$ …………..(iii)
$\sum yz = a\sum z + b\sum xz + c\sum z^2$ …………..(iv)
Solving equations (ii), (iii) and (iv), we get the values of a, b and c. Putting these values in equation (i), we obtain the regression plane of y on x and z.
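Solving (ii) to (iv) by hand is tedious. The sketch below (an illustration added here, standard library only; the helper names are hypothetical) builds the 3×3 normal-equation matrix from the sums and solves it by Gaussian elimination with partial pivoting. The test points are hypothetical and lie exactly on y = 1 + 2x + 3z, so the fit should return a = 1, b = 2, c = 3.

```python
def solve_linear_system(A, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= factor * M[col][j]
    sol = [0.0] * n
    for i in reversed(range(n)):
        sol[i] = (M[i][n] - sum(M[i][j] * sol[j] for j in range(i + 1, n))) / M[i][i]
    return sol

def fit_plane(x, z, y):
    """Regression plane y = a + b x + c z from normal equations (ii)-(iv)."""
    def s(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))
    ones = [1.0] * len(y)
    cols = [ones, x, z]
    A = [[s(ci, cj) for cj in cols] for ci in cols]   # [[n, sum x, sum z], [sum x, sum x^2, sum xz], ...]
    rhs = [s(ci, y) for ci in cols]                   # [sum y, sum xy, sum zy]
    return solve_linear_system(A, rhs)

# Hypothetical points lying exactly on y = 1 + 2x + 3z
x = [1, 2, 3, 4, 5, 6]
z = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * xi + 3 * zi for xi, zi in zip(x, z)]
a, b, c = fit_plane(x, z, y)
print(f"y = {a:.3f} + {b:.3f}x + {c:.3f}z")
```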
INFERENCE CONCERNING LEAST SQUARE METHOD:
The regression equation y = a + bx is obtained on the basis of sample data. We are often interested in the corresponding equation y = α + βx for the population from which the sample is drawn. The following tests assume a normal population, i.e. that the errors about the population regression line are independent and normally distributed with a common variance.
A TEST OF HYPOTHESIS CONCERNING THE SLOPE PARAMETER β
To test the hypothesis that the regression coefficient β is equal to some specified value, we use the test statistic
$$t = \frac{b - \beta}{s_e/\sqrt{s_{xx}}}$$
with n − 2 degrees of freedom.
Similarly, the test statistic for inference about the intercept α is
$$t = \frac{a - \alpha}{s_e}\sqrt{\frac{n\,s_{xx}}{s_{xx} + n\,\bar{x}^2}}$$
with n − 2 degrees of freedom, where
$$s_e = \sqrt{\frac{s_{yy} - \frac{(s_{xy})^2}{s_{xx}}}{n-2}}.$$

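CONFIDENCE INTERVAL FOR INTERCEPT AND SLOPE:

i. For intercept α:
$$\text{C.I.} = a \pm t_{\alpha/2,\,n-2} \times s_e\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}}$$

ii. For slope β:
$$\text{C.I.} = b \pm t_{\alpha/2,\,n-2} \times \frac{s_e}{\sqrt{s_{xx}}}$$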
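The test statistics and intervals above can be collected into one routine. The sketch below is an illustration added here (the function slr_inference and its parameter names are hypothetical): it returns a, b, s_e, both t statistics and both confidence intervals. No statistics library is assumed, so the critical value t_{α/2, n−2} must be supplied from a t table; here t_{0.025, 8} = 2.306, as in worked problem 1, whose data it uses.

```python
import math

def slr_inference(x, y, t_crit, beta0=0.0, alpha0=0.0):
    """Least-squares fit with the t statistics and confidence intervals above.
    t_crit is t_{alpha/2, n-2}, read from a t table by the caller."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    s_e = math.sqrt((s_yy - s_xy ** 2 / s_xx) / (n - 2))

    t_slope = (b - beta0) / (s_e / math.sqrt(s_xx))
    t_intercept = (a - alpha0) / s_e * math.sqrt(n * s_xx / (s_xx + n * x_bar ** 2))

    half_a = t_crit * s_e * math.sqrt(1 / n + x_bar ** 2 / s_xx)
    half_b = t_crit * s_e / math.sqrt(s_xx)
    return {
        "a": a, "b": b, "s_e": s_e,
        "t_slope": t_slope, "t_intercept": t_intercept,
        "ci_intercept": (a - half_a, a + half_a),
        "ci_slope": (b - half_b, b + half_b),
    }

# Data of worked problem 1; t_{0.025, 8} = 2.306 from the t table
x = [20, 60, 100, 140, 180, 220, 260, 300, 340, 380]
y = [0.18, 0.37, 0.35, 0.78, 0.56, 0.75, 1.18, 1.36, 1.17, 1.65]
res = slr_inference(x, y, t_crit=2.306)
for key, value in res.items():
    print(key, value)
```

Because beta0 and alpha0 default to zero, t_slope and t_intercept correspond to the null hypotheses β = 0 and α = 0.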
WORKED-OUT PROBLEMS:
1. The following are the measurements of the air velocity and evaporation coefficient of burning fuel droplets in an impulse engine:

Air velocity x (cm/sec)    Evaporation coefficient y (mm²/sec)
20                         0.18
60                         0.37
100                        0.35
140                        0.78
180                        0.56
220                        0.75
260                        1.18
300                        1.36
340                        1.17
380                        1.65
i. Fit a straight line to these data by the method of least squares and use it to estimate the evaporation coefficient of a droplet when the air velocity is 190 cm/sec.
ii. Construct a 95% confidence interval for the intercept α and the slope β.
iii. Test the null hypothesis β = 0 versus the alternative hypothesis β ≠ 0 at the 5% level of significance.
iv. Test the null hypothesis α = 0 versus the alternative hypothesis α ≠ 0 at the 5% level of significance.
Solution:
x      y      x²        y²       xy
20     0.18   400       0.0324   3.6
60     0.37   3600      0.1369   22.2
100    0.35   10000     0.1225   35
140    0.78   19600     0.6084   109.2
180    0.56   32400     0.3136   100.8
220    0.75   48400     0.5625   165
260    1.18   67600     1.3924   306.8
300    1.36   90000     1.8496   408
340    1.17   115600    1.3689   397.8
380    1.65   144400    2.7225   627
∑x = 2000   ∑y = 8.35   ∑x² = 532,000   ∑y² = 9.1097   ∑xy = 2175.40

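Now;
$$s_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 532{,}000 - \frac{(2000)^2}{10} = 132{,}000$$
$$s_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 9.1097 - \frac{(8.35)^2}{10} = 2.13745$$
$$s_{xy} = \sum xy - \frac{\sum x \sum y}{n} = 2175.40 - \frac{(2000)(8.35)}{10} = 505.40$$
$$\bar{x} = \frac{\sum x}{n} = 200, \qquad \bar{y} = \frac{\sum y}{n} = 0.835$$
$$b = b_{yx} = \frac{s_{xy}}{s_{xx}} = \frac{505.40}{132{,}000} = 0.00383$$
$$a = \bar{y} - b\,\bar{x} = 0.835 - (0.00383)(200) = 0.069$$
$$s_e^2 = \frac{s_{yy} - \frac{(s_{xy})^2}{s_{xx}}}{n-2} = \frac{2.13745 - \frac{(505.40)^2}{132{,}000}}{10-2} = 0.0253, \qquad s_e = 0.159$$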

(1 − α)100% = 95% ∴ α = 0.05

(i)
The equation of the straight line that best fits the given data in the sense of least squares is
y = a + bx = 0.069 + 0.00383x
∴ y = 0.069 + 0.00383x
When x = 190 cm/sec,
y = 0.069 + (0.00383)(190) = 0.80 mm²/sec
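(ii) 95% confidence intervals for the intercept α and the slope β.
For intercept α:
$$\text{C.I.} = a \pm t_{\alpha/2,\,n-2} \times s_e\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}}$$
$$= 0.069 \pm (2.306)(0.159)\sqrt{\frac{1}{10} + \frac{(200)^2}{132{,}000}}$$
$$= 0.069 \pm 0.233$$
$$= (-0.164,\ 0.302)$$
For slope β:
$$\text{C.I.} = b \pm t_{\alpha/2,\,n-2} \times \frac{s_e}{\sqrt{s_{xx}}}$$
= (…. , ……)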
(iii) A test of hypothesis concerning the slope β = 0.
STEP I:
H₀: β = 0
H₁: β ≠ 0

STEP II:
α = 5% = 0.05
STEP III:
$t_{tab} = t_{\alpha/2,\,n-2} = t_{0.025,\,8} = 2.306$
STEP IV:
Test statistic under the null hypothesis H₀: β = 0:
$$t_{cal} = \frac{b - \beta}{s_e/\sqrt{s_{xx}}} = \frac{0.00383 - 0}{0.159/\sqrt{132{,}000}} = 8.75$$
STEP V: (Decision)
∴ Since t_cal > t_tab, the null hypothesis is rejected and the alternative hypothesis is accepted.
STEP VI: (Conclusion)
From the above procedure, we conclude that the slope β ≠ 0.
(iv) Test of hypothesis concerning the intercept α:
(Do it yourself.)
2. Ten steel wires of diameter 0.5 mm and length 2.5 m were extended in a laboratory by applying vertical forces of varying magnitudes. The results are as follows:
Force (kg):               15   19   25   35   42   48   53   56   62   65
Increase in length (mm):  1.7  2.1  2.5  3.4  3.9  4.9  5.4  5.7  6.6  7.2
(a) Estimate the parameters of a simple linear regression model with force as the explanatory variable.
(b) Find 95% confidence limits for the slope of the line.
2. Find the equation of the regression line of y on x if the observations (xᵢ, yᵢ) are the following:
(1, 4), (2, 8), (3, 2), (4, 12), (5, 10), (6, 14), (7, 16), (8, 6), (9, 18)
3. The following table shows the weight z to the nearest pound, height x to the nearest inch, and age y to the nearest year of 12 boys:
Weight (z):  64  71  53  67  55  58  77  57  56  51  76  68
Height (x):  57  59  49  62  51  50  55  48  52  42  61  57
Age (y):      8  10   6  11   8   7  10   9  10   6  12   9

(a) Fit a least-squares regression plane.

(b) Find the weight of a boy who is 9 years old and 54 inches tall.
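Solution:

x    y    z    x²     y²    z²     xy    yz    zx
57   8    64   3249   64    4096   456   512   3648
59   10   71   3481   100   5041   590   710   4189
49   6    53   2401   36    2809   294   318   2597
62   11   67   3844   121   4489   682   737   4154
51   8    55   2601   64    3025   408   440   2805
50   7    58   2500   49    3364   350   406   2900
55   10   77   3025   100   5929   550   770   4235
48   9    57   2304   81    3249   432   513   2736
52   10   56   2704   100   3136   520   560   2912
42   6    51   1764   36    2601   252   306   2142
61   12   76   3721   144   5776   732   912   4636
57   9    68   3249   81    4624   513   612   3876
∑x = 643   ∑y = 106   ∑z = 753   ∑x² = 34843   ∑y² = 976   ∑z² = 48139   ∑xy = 5779   ∑yz = 6796   ∑zx = 40830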

(a) The linear regression equation of z on x and y can be written as
z = a + bx + cy …………..(i)
The normal equations of (i) are
$\sum z = na + b\sum x + c\sum y$
$\sum xz = a\sum x + b\sum x^2 + c\sum xy$
$\sum yz = a\sum y + b\sum xy + c\sum y^2$
Then
753 = 12a + 643b + 106c …………..(ii)
40830 = 643a + 34843b + 5779c …………..(iii)
6796 = 106a + 5779b + 976c …………..(iv)
Solving (ii), (iii) and (iv), we get
a = 3.6512, b = 0.8546 and c = 1.5063
The required regression plane is
z = 3.6512 + 0.8546x + 1.5063y
(b)
When x = 54 and y = 9,
z = 3.6512 + (0.8546)(54) + (1.5063)(9)
= 63.356
≈ 63 pounds
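As a cross-check of this solution, the Python sketch below solves the same normal equations in deviation form: subtracting the means eliminates a and leaves a 2×2 system in b and c, which is equivalent to solving (ii), (iii) and (iv) directly. This is an illustrative alternative route added here, not the method used in the solution above; centered_sum is a hypothetical helper name.

```python
height = [57, 59, 49, 62, 51, 50, 55, 48, 52, 42, 61, 57]   # x (inches)
age = [8, 10, 6, 11, 8, 7, 10, 9, 10, 6, 12, 9]              # y (years)
weight = [64, 71, 53, 67, 55, 58, 77, 57, 56, 51, 76, 68]    # z (pounds)

def centered_sum(u, v):
    """Sum of (u - u_bar)(v - v_bar), i.e. sum(uv) - sum(u) * sum(v) / n."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))

s_xx = centered_sum(height, height)
s_yy = centered_sum(age, age)
s_xy = centered_sum(height, age)
s_xz = centered_sum(height, weight)
s_yz = centered_sum(age, weight)

# Eliminating a from (ii)-(iv) leaves a 2x2 system in b and c
det = s_xx * s_yy - s_xy ** 2
b = (s_xz * s_yy - s_xy * s_yz) / det
c = (s_xx * s_yz - s_xy * s_xz) / det
n = len(weight)
a = sum(weight) / n - b * sum(height) / n - c * sum(age) / n

predicted = a + b * 54 + c * 9   # part (b): 54 inches tall, 9 years old
print(f"z = {a:.4f} + {b:.4f}x + {c:.4f}y; predicted weight = {predicted:.2f} lb")
```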
4. The following table gives measurements of train resistance; V is the velocity in miles per hour and R is the resistance in pounds per ton:
V:  20   40   60    80    100   120
R:  5.5  9.1  14.9  22.8  33.3  46
If R is related to V by the relation R = A + BV + CV², find A, B and C.
Solution:
Here the number of observations is even, so the two middle values of V are 60 and 80, whose mean is 70. Since the values of V are equally spaced at intervals of 20, we take
$$x = \frac{V - 70}{10}, \qquad y = R - 22.8,$$
so that x takes the values −5, −3, −1, 1, 3, 5.

Let y = a + bx + cx² …………(i)

The normal equations are
$\sum y = na + b\sum x + c\sum x^2$
$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$
$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$
x    y       xy     x²   x³     x⁴    x²y
−5   −17.3   86.5   25   −125   625   −432.5
−3   −13.7   41.1   9    −27    81    −123.3
−1   −7.9    7.9    1    −1     1     −7.9
1    0       0      1    1      1     0
3    10.5    31.5   9    27     81    94.5
5    23.2    116    25   125    625   580
∑x = 0   ∑y = −5.2   ∑xy = 283   ∑x² = 70   ∑x³ = 0   ∑x⁴ = 1414   ∑x²y = 110.8

Substituting in the above normal equations:

−5.2 = 6a + 0·b + 70c ……………(ii)
283 = 0·a + 70b + 0·c ……………(iii)
110.8 = 70a + 0·b + 1414c …………(iv)
Solving equations (ii), (iii) and (iv), we get
a = −4.216, b = 4.043 and c = 0.2871
Hence, y = −4.216 + 4.043x + 0.2871x²
$$\text{Or,}\quad R - 22.8 = -4.216 + 4.043\left(\frac{V-70}{10}\right) + 0.2871\,\frac{(V-70)^2}{10^2}$$
∴ R = 4.35 + 0.0024V + 0.00287V²
Comparing with R = A + BV + CV², we get
A = 4.35, B = 0.0024 and C = 0.00287
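The coded fit can be checked with the short Python sketch below (an illustration added here, standard library only). It uses the fact that ∑x = ∑x³ = 0 to decouple the normal equations, then undoes the coding x = (V − 70)/10, y = R − 22.8 to recover A, B and C.

```python
V = [20, 40, 60, 80, 100, 120]          # velocity (mph)
R = [5.5, 9.1, 14.9, 22.8, 33.3, 46.0]  # resistance (lb/ton)

x = [(v - 70) / 10 for v in V]          # -5, -3, -1, 1, 3, 5
y = [r - 22.8 for r in R]
n = len(x)

s_y = sum(y)
s_xy = sum(xi * yi for xi, yi in zip(x, y))
s_x2 = sum(xi ** 2 for xi in x)
s_x4 = sum(xi ** 4 for xi in x)
s_x2y = sum(xi ** 2 * yi for xi, yi in zip(x, y))

# Because sum(x) = sum(x^3) = 0, equation (iii) gives b directly,
# and (ii), (iv) form a 2x2 system in a and c.
b = s_xy / s_x2
det = n * s_x4 - s_x2 ** 2
a = (s_y * s_x4 - s_x2 * s_x2y) / det
c = (n * s_x2y - s_x2 * s_y) / det

# Undo the coding: x = V/10 - 7 and x^2 = V^2/100 - 1.4 V + 49
A = 22.8 + a - 7 * b + 49 * c
B = b / 10 - 1.4 * c
C = c / 100
print(f"a = {a:.4f}, b = {b:.4f}, c = {c:.4f}")
print(f"R = {A:.3f} + {B:.5f} V + {C:.6f} V^2")
```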

Scatter Plots

A scatter (XY) plot has points that show the relationship between two sets of data.

For example, each dot might show one person's weight versus their height. (The data are plotted on the graph as Cartesian (x, y) coordinates.)

Example:
The local ice cream shop keeps track of how much ice cream they sell versus the noon temperature on that day. Here are their figures for the last 12 days:

Ice Cream Sales vs Temperature

Temperature (°C)   Ice Cream Sales
14.2°              $215
16.4°              $325
11.9°              $185
15.2°              $332
18.5°              $406
22.1°              $522
19.4°              $412
25.1°              $614
23.4°              $544
18.1°              $421
22.6°              $445
17.2°              $408

When the same data are drawn as a scatter plot, it is easy to see that warmer weather leads to more sales, but the relationship is not perfect.

Line of Best Fit

We can also draw a "line of best fit" (also called a "trend line") on the scatter plot.

Try to have the line as close as possible to all the points, with as many points above the line as below.

But for better accuracy, we can calculate the line using least squares regression.
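For these 12 days, the least-squares line can be computed directly with the slope b = s_xy/s_xx and intercept a = ȳ − b x̄ used earlier in these notes. The Python sketch below is an illustration added here; its result can be compared with the line estimated by eye from two points in the straight-line example that follows.

```python
temperature = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
sales = [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408]

n = len(temperature)
x_bar = sum(temperature) / n
y_bar = sum(sales) / n
s_xx = sum((t - x_bar) ** 2 for t in temperature)
s_xy = sum((t - x_bar) * (s - y_bar) for t, s in zip(temperature, sales))

slope = s_xy / s_xx                # dollars of sales per degree C
intercept = y_bar - slope * x_bar
print(f"sales = {slope:.2f} * temperature + ({intercept:.2f})")
```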

Example: Sea Level Rise

[Figure: scatter plot of sea level rise, with a line of best fit drawn on it]

Interpolation and Extrapolation

Interpolation is where we find a value inside our set of data points.

For example, we can use linear interpolation to estimate the sales at 21 °C.

Extrapolation is where we find a value outside our set of data points.

For example, we can use linear extrapolation to estimate the sales at 29 °C (which is higher than any value we have).

Careful: extrapolation can give misleading results because we are in "uncharted territory".

As well as using a graph, we can create a formula to help us.
Example: Straight Line Equation

We can estimate a straight-line equation from two points on the line of best fit.

Let's estimate two points on the line near actual values: (12°, $180) and (25°, $610).

First, find the slope:

$$m = \frac{\text{change in } y}{\text{change in } x} = \frac{\$610 - \$180}{25^\circ - 12^\circ} = \frac{\$430}{13^\circ} \approx 33 \text{ (rounded)}$$

Now put the slope and the point (12°, $180) into the "point-slope" formula:
y − y₁ = m(x − x₁)
y − 180 = 33(x − 12)
y = 33(x − 12) + 180
y = 33x − 396 + 180
y = 33x − 216

INTERPOLATING

Now we can use that equation to interpolate a sales value at 21°:

y = 33×21° − 216 = $477

EXTRAPOLATING

And to extrapolate a sales value at 29°:

y = 33×29° − 216 = $741

The values are close to what we got on the graph. But that doesn't mean they
are more (or less) accurate. They are all just estimates.

Don't use extrapolation too far! What sales would you expect at 0° ?

y = 33×0° − 216 = −$216

Hmmm... Minus $216? We extrapolated too far!

Note: we used linear (based on a line) interpolation and extrapolation, but there are many other types; for example, we could use polynomials to make curved lines.

Unit 9 Part 2
7 pages
Correlation and Regression: Six Sigma Thinking, #8
From Everand
Correlation and Regression: Six Sigma Thinking, #8
Sumeet Savant
5/5 (1)
A-level Maths Revision: Cheeky Revision Shortcuts
From Everand
A-level Maths Revision: Cheeky Revision Shortcuts
Scool Revision
3.5/5 (8)
Mathematical Analysis 1: theory and solved exercises
From Everand
Mathematical Analysis 1: theory and solved exercises
Alessio Mangoni
5/5 (1)
6 +ARTIKEL+Nur+Rahma
No ratings yet
6 +ARTIKEL+Nur+Rahma
9 pages
Addactis en Ifrs17 Ebook How You Will Crash Project
No ratings yet
Addactis en Ifrs17 Ebook How You Will Crash Project
11 pages
Linreg
No ratings yet
Linreg
54 pages
Module 6.2 Demography and Health Services
No ratings yet
Module 6.2 Demography and Health Services
30 pages
GLM & Logistic
No ratings yet
GLM & Logistic
26 pages
Pre&post Test Grade 5
No ratings yet
Pre&post Test Grade 5
2 pages
Lesson 2 - Applications of Valuing Cash Flows
No ratings yet
Lesson 2 - Applications of Valuing Cash Flows
10 pages
Semester 2 - Actuarial Science
No ratings yet
Semester 2 - Actuarial Science
1 page
STATA 2 Class
No ratings yet
STATA 2 Class
3 pages
004-020guidelines On Valuation Basis For Takaful Family Business
No ratings yet
004-020guidelines On Valuation Basis For Takaful Family Business
39 pages
CR Open Book Notes 2019
No ratings yet
CR Open Book Notes 2019
61 pages
Beard Risk Theory - The Stochastic Basis
No ratings yet
Beard Risk Theory - The Stochastic Basis
206 pages
Exam Handbook
No ratings yet
Exam Handbook
102 pages
Tutorial 2 PSNM (2024-25) Unit-1 Correlation, Regression and Curve Fitting
No ratings yet
Tutorial 2 PSNM (2024-25) Unit-1 Correlation, Regression and Curve Fitting
2 pages
ZTBL 2008
No ratings yet
ZTBL 2008
62 pages
SPA3e 2.5 LecturePPT
No ratings yet
SPA3e 2.5 LecturePPT
15 pages
Gyetvai Attila CV
No ratings yet
Gyetvai Attila CV
2 pages
CS1 and CS2 Guide Jan 20 Final PDF
No ratings yet
CS1 and CS2 Guide Jan 20 Final PDF
9 pages
Preview. Do Not Post or Distribute.: 2014 Edition
No ratings yet
Preview. Do Not Post or Distribute.: 2014 Edition
27 pages
Employee Benefits Pas 19
100% (1)
Employee Benefits Pas 19
44 pages
R Programing 6 Feb
No ratings yet
R Programing 6 Feb
10 pages
Study Designs NCD
No ratings yet
Study Designs NCD
50 pages
Detecting and Resolving Model Specification Errors in STATA
No ratings yet
Detecting and Resolving Model Specification Errors in STATA
7 pages
Introduction To Data Science
No ratings yet
Introduction To Data Science
45 pages
2421-Article Text-7513-2-10-20230126
No ratings yet
2421-Article Text-7513-2-10-20230126
9 pages
Using Multivariate Statistics 7th Edition Barbara G. Tabachnickdownload
100% (2)
Using Multivariate Statistics 7th Edition Barbara G. Tabachnickdownload
51 pages
Lecture 2
No ratings yet
Lecture 2
6 pages
Topic 6 - FE, RE and Tests
No ratings yet
Topic 6 - FE, RE and Tests
46 pages
Biostatistics Notes
No ratings yet
Biostatistics Notes
43 pages

Correlation and Regression

Uploaded by

Correlation and Regression

Uploaded by

CORRELATION

Consider the sum of squares:

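Or, $(n-1) \pm 2(n-1)\,r_{xy} + (n-1) \ge 0$, i.e. $2(n-1)(1 \pm r_{xy}) \ge 0$. Since $n - 1 > 0$, this gives $1 \pm r_{xy} \ge 0$, so $r_{xy} \ge -1$ and $r_{xy} \le 1$. Hence $-1 \le r_{xy} \le 1$.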
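The sketch below is a numerical sanity check of this step. It computes $r_{xy}$ with the computational formula (formula 3). It then compares the sum of squares $\sum (z_x \pm z_y)^2$ with its expanded form $2(n-1)(1 \pm r_{xy})$. The $(n-1)$ terms suggest that x and y are standardized with the sample standard deviation (divisor $n-1$), and the sketch assumes this. The data and the helper names `pearson_r`, `standardize` and `sum_of_squares` are illustrative only.

```python
import math


def pearson_r(x, y):
    """Simple correlation coefficient from the computational formula
    r = (nΣxy − ΣxΣy) / (√(nΣx² − (Σx)²) · √(nΣy² − (Σy)²))."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    )


def standardize(v):
    """(v − mean) / s, with s using the divisor n − 1 (assumption, see above)."""
    n = len(v)
    m = sum(v) / n
    s = math.sqrt(sum((a - m) ** 2 for a in v) / (n - 1))
    return [(a - m) / s for a in v]


def sum_of_squares(x, y, sign):
    """The non-negative quantity Σ(z_x ± z_y)² used in the proof (sign = +1 or -1)."""
    zx, zy = standardize(x), standardize(y)
    return sum((a + sign * b) ** 2 for a, b in zip(zx, zy))


# Hypothetical data, chosen only for illustration
x = [2, 4, 5, 7, 9]
y = [3, 5, 4, 8, 10]
n = len(x)
r = pearson_r(x, y)

for sign in (+1, -1):
    direct = sum_of_squares(x, y, sign)
    expanded = 2 * (n - 1) * (1 + sign * r)  # (n-1) ± 2(n-1)r + (n-1)
    print(f"sign {sign:+d}: direct = {direct:.6f}, expanded = {expanded:.6f}")

print(f"r = {r:.4f}, within [-1, 1]: {-1 <= r <= 1}")
```

Because both printed sums are squares added together, they can never be negative. That non-negativity is exactly what forces $1 \pm r_{xy} \ge 0$.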

Properties of simple correlation coefficient

(ii) If $r_{xy} = -1$, then x and y have perfectly negative correlation.

(iv) If $r_{xy} = 0$, then x and y have no linear correlation (they may still be related non-linearly).
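Here is a quick illustration of these properties with hypothetical data:

- Points lying exactly on an increasing line give $r_{xy} = 1$.
- Points on a decreasing line give $r_{xy} = -1$.
- The exact but non-linear relation $y = x^2$ on symmetric x values gives $r_{xy} = 0$.

The function name `pearson_r` is our own; it uses the deviation form of the formula.

```python
import math


def pearson_r(x, y):
    """r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [3 * a + 2 for a in x]))           # → 1.0  (exact increasing line)
print(pearson_r(x, [10 - 2 * a for a in x]))          # → -1.0 (exact decreasing line)
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # → 0.0  (y = x², not linear)
```

The last case shows that $r_{xy} = 0$ rules out only a linear relationship, not every relationship.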

PROPERTIES OF PARTIAL CORRELATION COEFFICIENT:

1. The subscripts before the dot can be interchanged, i.e. $r_{12.3} = r_{21.3}$.
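The usual first-order partial correlation formula makes this property easy to check:

$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$$

Swapping the roles of variables 1 and 2 only reorders the factors, so the value is unchanged. Below is a minimal sketch with hypothetical simple correlations; the function name `partial_r` is illustrative.

```python
import math


def partial_r(r_ij, r_ik, r_jk):
    """First-order partial correlation r_ij.k = (r_ij − r_ik r_jk) / √((1 − r_ik²)(1 − r_jk²))."""
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


# Hypothetical simple correlations among x1, x2, x3
r12, r13, r23 = 0.8, 0.6, 0.5

r12_3 = partial_r(r12, r13, r23)  # x1 with x2, eliminating x3
r21_3 = partial_r(r12, r23, r13)  # roles of 1 and 2 swapped (r21 = r12)
print(round(r12_3, 4), round(r21_3, 4))
```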

$r_{31}^2 + r_{32}^2 - 2\,r_{31} r_{32} r_{12}$
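This expression is the numerator of the standard formula for the multiple correlation coefficient of $x_3$ on $x_1$ and $x_2$:

$$R_{3.12}^2 = \frac{r_{31}^2 + r_{32}^2 - 2 r_{31} r_{32} r_{12}}{1 - r_{12}^2}$$

The sketch below evaluates it for hypothetical correlations and checks two standard properties:

- $0 \le R_{3.12} \le 1$.
- $R_{3.12}$ is at least as large as $|r_{31}|$ and $|r_{32}|$.

The function name `multiple_R` is illustrative.

```python
import math


def multiple_R(r31, r32, r12):
    """R_3.12 = √((r31² + r32² − 2 r31 r32 r12) / (1 − r12²))."""
    return math.sqrt((r31 ** 2 + r32 ** 2 - 2 * r31 * r32 * r12) / (1 - r12 ** 2))


# Hypothetical simple correlations (they form a valid correlation matrix)
r12, r31, r32 = 0.8, 0.6, 0.5

R = multiple_R(r31, r32, r12)
print(round(R, 4))
print("0 <= R <= 1:", 0 <= R <= 1)
print("R >= |r31| and |r32|:", R >= max(abs(r31), abs(r32)))
```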

PROPERTIES OF MULTIPLE CORRELATION COEFFICIENT

4. Write the properties of correlation coefficient and

Find the correlation coefficient and coefficient of

Calculate the coefficient of multiple correlation and

13. In trying to evaluate the effectiveness of

Find the strength and direction of the relationship between

Least Squares Method

Least Squares Method: Definition

• Ordinary or linear least squares

Least Squares Method: Graph

Least Squares Method: Formula

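Sum of squared residuals: $S = \sum_{i=1}^{n} \left(y_i - f(x_i)\right)^2 = \text{minimum}$, where $f(x_i)$ is the fitted value at $x_i$.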
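Take a straight line $y = a + bx$ (notation assumed here). Minimizing S for this line gives two normal equations:

$$\sum y = na + b\sum x, \qquad \sum xy = a\sum x + b\sum x^2$$

The sketch below solves them in closed form for hypothetical data. It then shows that nudging either coefficient away from the fitted values only increases S. The function names are illustrative.

```python
def fit_line(x, y):
    """Least-squares line y = a + b x from the normal equations
    Σy = n a + b Σx  and  Σxy = a Σx + b Σx²."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b


def sse(x, y, a, b):
    """Sum of squared residuals S = Σ(y − a − b x)²."""
    return sum((v - (a + b * u)) ** 2 for u, v in zip(x, y))


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]  # hypothetical data

a, b = fit_line(x, y)
best = sse(x, y, a, b)
print(f"a = {a:.4f}, b = {b:.4f}, S = {best:.4f}")

# Moving away from the least-squares coefficients can only increase S
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    print(f"a{da:+.1f}, b{db:+.1f}: S = {sse(x, y, a + da, b + db):.4f}")
```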

Limitations of the Least Squares Method

• Regression Coefficient of X on Y: The regression coefficient of X on Y is represented by the

represented as: $b_{xy}$ can be obtained by using the

following formula is used:

the assumed means:
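The sketch below computes both regression coefficients from deviations about assumed means, $u = x - A$ and $v = y - B$, using:

$$b_{yx} = \frac{n\sum uv - \sum u \sum v}{n\sum u^2 - \left(\sum u\right)^2}, \qquad b_{xy} = \frac{n\sum uv - \sum u \sum v}{n\sum v^2 - \left(\sum v\right)^2}$$

With hypothetical data, it checks two things:

- The result does not depend on the assumed means (a change of origin).
- $r^2 = b_{xy}\, b_{yx}$.

The data, the assumed means A = 15 and B = 40, and the function names are illustrative.

```python
import math


def deviation_sums(x, y, A=0, B=0):
    """Return n, Σu, Σv, Σuv, Σu², Σv² for u = x − A and v = y − B."""
    u = [a - A for a in x]
    v = [b - B for b in y]
    return (len(u), sum(u), sum(v),
            sum(p * q for p, q in zip(u, v)),
            sum(p * p for p in u), sum(q * q for q in v))


def regression_coefficients(x, y, A=0, B=0):
    """Return (b_yx, b_xy): Y on X and X on Y, from deviations about A and B."""
    n, su, sv, suv, suu, svv = deviation_sums(x, y, A, B)
    num = n * suv - su * sv
    return num / (n * suu - su ** 2), num / (n * svv - sv ** 2)


x = [10, 12, 15, 18, 20]  # hypothetical data
y = [30, 34, 37, 45, 49]

b_yx, b_xy = regression_coefficients(x, y)               # origin at 0
b_yx_a, b_xy_a = regression_coefficients(x, y, 15, 40)   # assumed means A, B
print(b_yx, b_yx_a)
print(b_xy, b_xy_a)

# r from the same sums: its square equals the product of the two coefficients
n, su, sv, suv, suu, svv = deviation_sums(x, y, 15, 40)
r = (n * suv - su * sv) / math.sqrt((n * suu - su ** 2) * (n * svv - sv ** 2))
print(r ** 2, b_yx * b_xy)
```

Shifting the origin by A and B leaves the numerator and both denominators unchanged. That is why the assumed-means shortcut gives the same coefficients.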

Properties of Regression Coefficients

Symbolically, it can be represented as

CONFIDENCE INTERVAL FOR INTERCEPT AND SLOPE:

ii. For slope 𝜷:
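The sketch below computes both intervals using the standard formulas. It writes the intercept as α, which is an assumption about the notation. Define $s^2 = \sum (y - a - bx)^2 / (n - 2)$ and $S_{xx} = \sum (x - \bar{x})^2$. The intervals are then:

- For β: $b \pm t\, s / \sqrt{S_{xx}}$.
- For α: $a \pm t\, s \sqrt{1/n + \bar{x}^2 / S_{xx}}$.

Here t is the two-sided critical value with $n - 2$ degrees of freedom. Python's standard library has no t-distribution, so the critical value is passed in from t-tables: 3.182 for a 95% interval with 3 degrees of freedom. The data and names are hypothetical.

```python
import math


def regression_intervals(x, y, t_crit):
    """Confidence intervals (intercept, slope) for the fitted line y = a + b x.
    t_crit is the two-sided t critical value with n - 2 d.f., taken from tables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx                      # estimated slope
    a = my - b * mx                    # estimated intercept
    sse = sum((v - a - b * u) ** 2 for u, v in zip(x, y))
    s = math.sqrt(sse / (n - 2))       # residual standard error
    se_b = s / math.sqrt(sxx)
    se_a = s * math.sqrt(1 / n + mx ** 2 / sxx)
    ci_a = (a - t_crit * se_a, a + t_crit * se_a)
    ci_b = (b - t_crit * se_b, b + t_crit * se_b)
    return ci_a, ci_b


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]  # hypothetical data
ci_a, ci_b = regression_intervals(x, y, t_crit=3.182)  # 95%, 3 d.f.
print("intercept:", ci_a)
print("slope:", ci_b)
```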

Air velocity (cm/sec) Evaporation coefficient

(a) Fit a least-squares regression plane.
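A least-squares plane $z = a + bx + cy$ satisfies three normal equations:

$$\begin{aligned}
\sum z &= na + b\sum x + c\sum y \\
\sum xz &= a\sum x + b\sum x^2 + c\sum xy \\
\sum yz &= a\sum y + b\sum xy + c\sum y^2
\end{aligned}$$

The sketch below builds and solves this 3×3 system by Gaussian elimination. Its test data are generated from the hypothetical plane $z = 1 + 2x - 3y$, so the fit should recover those coefficients. The names `solve3` and `fit_plane` are illustrative.

```python
def solve3(A, rhs):
    """Solve the 3×3 system A·s = rhs by Gaussian elimination with partial pivoting."""
    M = [list(map(float, row)) + [float(r)] for row, r in zip(A, rhs)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, 3):
            f = M[i][col] / M[col][col]
            for j in range(col, 4):
                M[i][j] -= f * M[col][j]
    sol = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        sol[i] = (M[i][3] - sum(M[i][j] * sol[j] for j in range(i + 1, 3))) / M[i][i]
    return sol


def fit_plane(x, y, z):
    """Least-squares plane z = a + b·x + c·y via its three normal equations."""
    n = len(x)
    Sx, Sy, Sz = sum(x), sum(y), sum(z)
    Sxx = sum(u * u for u in x)
    Syy = sum(v * v for v in y)
    Sxy = sum(u * v for u, v in zip(x, y))
    Sxz = sum(u * w for u, w in zip(x, z))
    Syz = sum(v * w for v, w in zip(y, z))
    A = [[n, Sx, Sy],
         [Sx, Sxx, Sxy],
         [Sy, Sxy, Syy]]
    return solve3(A, [Sz, Sxz, Syz])


x = [0, 1, 2, 3, 4]
y = [1, 0, 2, 1, 3]
z = [1 + 2 * u - 3 * v for u, v in zip(x, y)]  # built from z = 1 + 2x - 3y
a, b, c = fit_plane(x, y, z)
print(round(a, 6), round(b, 6), round(c, 6))
```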

(a) The linear regression equation of z on x and

Let $y = a + bx + cx^2$ (i)
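Minimize $\sum (y - a - bx - cx^2)^2$ with respect to a, b and c, and set each partial derivative to zero. This gives the standard normal equations for a second-degree parabola, shown here as a worked reference:

```latex
\begin{aligned}
\sum y     &= na + b\sum x + c\sum x^2 \\
\sum xy    &= a\sum x + b\sum x^2 + c\sum x^3 \\
\sum x^2 y &= a\sum x^2 + b\sum x^3 + c\sum x^4
\end{aligned}
```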

Substituting into the above normal equations
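The sketch below carries out this substitution for a parabola $y = a + bx + cx^2$. It forms the sums $\sum x^k$ and $\sum x^k y$ from data, then solves the three normal equations by Cramer's rule. Its test data are generated from the hypothetical curve $y = 1 + 2x + 3x^2$, so the fit should return a = 1, b = 2, c = 3. The names `det3` and `fit_parabola` are illustrative.

```python
def det3(m):
    """Determinant of a 3×3 matrix (cofactor expansion along the first row)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_parabola(x, y):
    """Least-squares parabola y = a + b x + c x² from its normal equations."""
    s = [sum(v ** k for v in x) for k in range(5)]                  # s[k] = Σx^k, s[0] = n
    t = [sum((u ** k) * v for u, v in zip(x, y)) for k in range(3)]  # t[k] = Σx^k y
    A = [[s[0], s[1], s[2]],
         [s[1], s[2], s[3]],
         [s[2], s[3], s[4]]]
    D = det3(A)
    coeffs = []
    for col in range(3):  # Cramer's rule: replace one column by the right-hand side
        Ak = [row[:] for row in A]
        for i in range(3):
            Ak[i][col] = t[i]
        coeffs.append(det3(Ak) / D)
    return tuple(coeffs)  # (a, b, c)


x = [0, 1, 2, 3, 4]
y = [1, 6, 17, 34, 57]  # values of 1 + 2x + 3x²
print(fit_parabola(x, y))
```

Cramer's rule is convenient for a 3×3 hand-style calculation. For larger systems, elimination is the more usual choice.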

Ice Cream Sales vs Temperature

Temperature °C Ice Cream Sales

And here is the same data as a Scatter Plot:

Line of Best Fit

Example: Sea Level Rise

A Scatter Plot of Sea Level

Interpolation and Extrapolation

Here we use linear interpolation to estimate the sales at 21 °C.

Extrapolation is where we find a value outside our set of data points.

Here we use linear extrapolation to estimate the sales at 29 °C (which is

Careful: Extrapolation can give misleading results because we are in

First, find the slope:

slope "m" = (change in y) / (change in x)

y − 180 = 33(x − 12)

y = 33(x − 12) + 180

y = 33x − 396 + 180

y = 33x − 216

Now we can use that equation to interpolate a sales value at 21°:

y = 33×21° − 216 = $477

And to extrapolate a sales value at 29°:

y = 33×29° − 216 = $741

And to extrapolate a sales value at 0°:

y = 33×0° − 216 = −$216

Hmmm... Minus $216? We extrapolated too far!
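The same calculation can be written as a small function. It uses the point-slope form from the example: the point (12, 180) and slope 33, which expand to y = 33x − 216. It then evaluates the line at the three temperatures used above. The function name is illustrative.

```python
def sales_line(temp_c, x0=12, y0=180, m=33):
    """Point-slope form y - y0 = m(x - x0), i.e. y = 33x - 216 for this example."""
    return m * (temp_c - x0) + y0


for t in (21, 29, 0):
    print(t, sales_line(t))
# → 21 477   (interpolation)
# → 29 741   (extrapolation)
# → 0 -216   (extrapolated too far)
```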

Note: we used linear (based on a line) interpolation and extrapolation, but
