
Correlation And Regression Analysis

Parishwar Acharya
Scatter Diagram
A scatter diagram is a graphical method for displaying the relationship between two variables: it plots pairs of bivariate observations (x, y) on the XY plane.
Y is called the dependent variable.
X is called the independent variable.
Correlation
Correlation analysis is used to measure the degree of relationship between two or more variables.
It is concerned only with the strength of the relationship; no causal effect is implied.
Correlation may be simple, partial or multiple.
Scatter Plot Examples
[Figure: example scatter plots of linear and curvilinear relationships]
Scatter Plot Examples
[Figure: example scatter plots of strong and weak relationships]
Scatter Plot Examples
[Figure: example scatter plot showing no relationship]
Simple Correlation Coefficient (r)
It is also called Pearson's correlation or the product moment correlation coefficient.
It measures the nature and degree of the relationship between two quantitative variables.
• If the sign is positive, the relation is direct: an increase in one variable is associated with an increase in the other, and a decrease in one variable is associated with a decrease in the other.
• If the sign is negative, the relation is inverse or indirect: an increase in one variable is associated with a decrease in the other.
• The value of r ranges between -1 and +1.
• The value of r denotes the strength of the association, as illustrated below:
If r = 0, there is no association or correlation between the two variables.
If 0 < r < 0.25: weak correlation.
If 0.25 ≤ r < 0.75: intermediate correlation.
If 0.75 ≤ r < 1: strong correlation.
If r = 1: perfect correlation.
Note: ρ denotes the population correlation coefficient.
Computing Formula
r = Sxy / √(Sxx × Syy)
where
Sxx = ∑x² − (∑x)²/n
Syy = ∑y² − (∑y)²/n
Sxy = ∑xy − (∑x)(∑y)/n
An equivalent form of the same formula expands the S terms directly:
r = [n∑xy − ∑x∑y] / √{[n∑x² − (∑x)²] × [n∑y² − (∑y)²]}
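These formulas translate directly into code. Below is a minimal Python sketch (the function name pearson_r is my own, not from the slides):

```python
import math

def pearson_r(x, y):
    """Pearson's r via the computing formulas Sxx, Syy, Sxy."""
    n = len(x)
    sxx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    syy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    return sxy / math.sqrt(sxx * syy)
```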
Example
The table shows the heights and weights of n = 10 randomly selected college football players.

Player      1   2   3   4   5   6   7   8   9   10
Height (x)  8   9   7   6   13  7   11  12  9   14
Weight (y)  35  49  27  33  60  21  45  51  46  65

• Calculate the correlation coefficient and interpret your result.
• Is there evidence of a linear relationship between weight and height at the 0.05 level of significance?
• Also plot the scatter diagram.
Solution:
X    Y    X²    Y²     XY
8    35   64    1225   280
9    49   81    2401   441
7    27   49    729    189
6    33   36    1089   198
13   60   169   3600   780
7    21   49    441    147
11   45   121   2025   495
12   51   144   2601   612
9    46   81    2116   414
14   65   196   4225   910
∑X = 96   ∑Y = 432   ∑X² = 990   ∑Y² = 20452   ∑XY = 4466
From the table: Sxx = 990 − 96²/10 = 68.4, Syy = 20452 − 432²/10 = 1789.6, Sxy = 4466 − (96)(432)/10 = 318.8.
r = 318.8 / √(68.4 × 1789.6) ≈ 0.911, a strong positive correlation between height and weight.
Testing H0: ρ = 0 with t = r√(n − 2) / √(1 − r²) = 0.911 × √8 / √(1 − 0.830) ≈ 6.26, against ttab = t(0.025, 8 df) = 2.306.
As tcal > ttab, H0 is rejected: there is evidence of a linear relationship between weight and height at the 0.05 level of significance.
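As a cross-check, the same numbers can be reproduced in a few lines (a sketch assuming SciPy is installed):

```python
from scipy import stats

height = [8, 9, 7, 6, 13, 7, 11, 12, 9, 14]
weight = [35, 49, 27, 33, 60, 21, 45, 51, 46, 65]

r, p = stats.pearsonr(height, weight)             # r ≈ 0.911
n = len(height)
t_cal = r * (n - 2) ** 0.5 / (1 - r ** 2) ** 0.5  # ≈ 6.26 > 2.306
print(r, t_cal, p)                                # p < 0.05, so H0 is rejected
```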
Spearman Rank Correlation Coefficient
It is a non-parametric measure of correlation: it does not assume that the data follow a normal distribution.
This procedure makes use of the two sets of ranks that may be assigned to the sample values of x and y.
The Spearman rank correlation coefficient can be computed in the following cases:
• Both variables are quantitative.
• Both variables are qualitative ordinal.
• One variable is quantitative and the other is qualitative ordinal.
Procedure
1. Rank the values of X from 1 to n, where n is the number of pairs of values of X and Y in the sample.
2. Rank the values of Y from 1 to n.
3. Compute di = (rank of Xi − rank of Yi) for each pair of observations.
4. Square each di and compute ∑di², the sum of the squared differences.
5. Compute rs = 1 − 6∑di² / [n(n² − 1)].
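A minimal sketch of this procedure in Python (function name my own; assumes the ranks are already assigned and there are no ties):

```python
def spearman_rs(rank_x, rank_y):
    """Spearman's rs = 1 - 6*sum(d^2) / [n(n^2 - 1)] from two sets of ranks."""
    n = len(rank_x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))
```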
Example
Twelve entries in a painting competition were ranked by two judges as shown below.

Entry     1  2  3  4  5  6  7   8  9   10  11  12
Judge I   5  2  3  4  1  6  8   7  10  9   12  11
Judge II  4  5  2  1  6  7  10  9  11  12  3   8

Find the coefficient of rank correlation.

Answer: rs = 0.4615
Entry  Judge I Rank  Judge II Rank  d (Difference)  d²
1      5             4              1               1
2      2             5              -3              9
3      3             2              1               1
4      4             1              3               9
5      1             6              -5              25
6      6             7              -1              1
7      8             10             -2              4
8      7             9              -2              4
9      10            11             -1              1
10     9             12             -3              9
11     12            3              9               81
12     11            8              3               9

∑d² = 154, so rs = 1 − 6(154) / [12(12² − 1)] = 1 − 924/1716 = 0.4615.
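The result can be verified with SciPy (assumed installed), which performs the same rank-based computation:

```python
from scipy import stats

judge1 = [5, 2, 3, 4, 1, 6, 8, 7, 10, 9, 12, 11]
judge2 = [4, 5, 2, 1, 6, 7, 10, 9, 11, 12, 3, 8]

rs, p = stats.spearmanr(judge1, judge2)
print(round(rs, 4))   # 0.4615
```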
Example
The following are the marks obtained by a group of ten students in two papers.

Marks in Research    78  36  98  25  75  82  92  62  75  39
Marks in Q. Methods  84  51  91  69  68  62  86  58  35  49

Find the coefficient of rank correlation.
Also calculate Karl Pearson's correlation coefficient.

To find the coefficient of rank correlation, we first rank the data for both variables, then calculate the difference between the ranks, and finally apply the formula for Spearman's rank correlation coefficient.
Marks in Research  Rank (R1)  Marks in Q. Methods  Rank (R2)
98                 1          91                   1
92                 2          86                   2
82                 3          84                   3
78                 4          69                   4
75                 5.5        68                   5
75                 5.5        62                   6
62                 7          58                   7
39                 8          51                   8
36                 9          49                   9
25                 10         35                   10
Rank (R1)  Rank (R2)  di = (R1 − R2)  di²
1 1 0 0
2 2 0 0
3 3 0 0
4 4 0 0
5.5 5 0.5 0.25
5.5 6 -0.5 0.25
7 7 0 0
8 8 0 0
9 9 0 0
10 10 0 0

∑di² = 0 + 0 + 0 + 0 + 0.25 + 0.25 + 0 + 0 + 0 + 0 = 0.5
rs = 1 − 6(0.5) / [10(10² − 1)] = 1 − 3/990 = 0.997 (using the simple formula, ignoring the tie correction): a very strong positive rank correlation.
Marks in Research (X)  Marks in Q. Methods (Y)  XY  X²  Y²
98 91 8918 9604 8281
92 86 7912 8464 7396
82 84 6888 6724 7056
78 69 5382 6084 4761
75 68 5100 5625 4624
75 62 4650 5625 3844
62 58 3596 3844 3364
39 51 1989 1521 2601
36 49 1764 1296 2401
25 35 875 625 1225
∑X = 662   ∑Y = 653   ∑XY = 47074   ∑X² = 49412   ∑Y² = 45553

Sxy = 47074 − (662)(653)/10 = 3845.4; Sxx = 49412 − 662²/10 = 5587.6; Syy = 45553 − 653²/10 = 2912.1
r = 3845.4 / √(5587.6 × 2912.1) ≈ 0.953.
Difference between the Spearman rank correlation coefficient and the Karl Pearson correlation coefficient
• Pearson correlation uses the actual data values, while Spearman correlation uses the ranks of the data.
• Pearson correlation measures the linear relationship, while Spearman correlation measures the monotonic relationship.
• Pearson correlation is more sensitive to outliers, while Spearman correlation is more robust to outliers.
So, in this case, the rank correlation based on the marks would not necessarily equal the Pearson correlation coefficient, unless the relationship between the two variables is perfectly linear.
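A tiny illustration of the robustness point, with made-up numbers (SciPy assumed installed):

```python
from scipy import stats

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 100]   # the last point is an outlier

print(stats.pearsonr(x, y)[0])    # ≈ 0.74, pulled down by the outlier
print(stats.spearmanr(x, y)[0])   # 1.0: the ranks are still perfectly monotonic
```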
Assumptions for regression analysis
1. Linearity
The dependent variable is assumed to be a linear function of the independent variable(s) in the linear regression model. If this assumption is violated, there is no linear relationship between the dependent and independent variables.
2. Normality
The residuals (errors) from the fitted regression model are assumed to be normally distributed. If this assumption is violated, the estimation of the regression line and the tests of the regression coefficients become doubtful.
3. Homoscedasticity
This assumption requires that the variation around the line of regression be constant for all values of the independent variable, i.e., the error variance is the same for low values as for high values of the independent variable.
4. Independence of errors
This assumption requires that the errors around the regression line be independent for each value of the independent variable. It is especially important for time series data.
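These assumptions are usually checked from the residuals of a fitted model. A minimal sketch with illustrative data (SciPy assumed installed; in practice one would also inspect residual plots):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8, 12.1, 14.2, 15.8])

fit = stats.linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)

# Normality of residuals: Shapiro-Wilk test (H0: residuals are normal).
# Homoscedasticity and independence are typically judged from plots of
# residuals against fitted values and against observation order.
print(stats.shapiro(residuals))
```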
Regression Analysis
• The process of predicting one variable using another, known variable.
• Most often, a variable x is used to predict some outcome variable y.
• Equally, a variable y can be used to predict x (the regression of x on y).
• It tells you how values of y change as a function of changes in values of x.
• Regression tells us how to draw the straight line described by the correlation.
• Regression may be simple or multiple.
Simple Regression
Regression analysis confined to the study of only two variables is called simple regression.
The variable used for prediction is called the independent, predictor or controlled variable.
The variable to be predicted is called the dependent or response variable.
Lines of regression
The line of regression of y on x is the line which gives the best estimate of y for any specified value of x.
The line of regression of x on y is the line which gives the best estimate of x for any specified value of y.
Regression line of y on x
The regression line of y on x gives the best estimated value of y for given values of x.
The regression equation of y on x is
y = a + bx …(i)
where a is the constant or y-intercept, and b is the slope of regression line (i), i.e., the regression coefficient of y on x, denoted byx.
Computing Formula
• byx = Sxy / Sxx
• a = ȳ − byx·x̄ (so that the fitted line passes through the point of means)
where
• Sxx = ∑x² − (∑x)²/n
• Syy = ∑y² − (∑y)²/n
• Sxy = ∑xy − (∑x)(∑y)/n
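In code, the slope and intercept can be computed together; a minimal sketch (function name my own):

```python
def fit_y_on_x(x, y):
    """Least-squares line y = a + b*x via the computing formulas."""
    n = len(x)
    sxx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    b = sxy / sxx                      # byx = Sxy / Sxx
    a = sum(y) / n - b * sum(x) / n    # a = y-bar - b * x-bar
    return a, b
```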


Interpretation of the parameters of the simple linear regression model
a (y-intercept) = average value of Y when the value of X is equal to zero.
b (slope) = expected change in Y per unit change in X, i.e., the average change in the response variable Y (in either the positive or the negative direction) for a one-unit change in the explanatory variable X.
e = random error in the response variable Y for each observation.
Example
The following data were collected on the height (inches, X) and weight (pounds, Y) of women swimmers.
a. Develop a scatter diagram for these data with height as the independent variable.
b. What does the scatter diagram tell you about the relationship between the two variables?
c. Approximate the relationship between height and weight by drawing a straight line through the data.
d. Obtain the estimated linear regression equation of weight on height.
e. If a swimmer's height is 63 inches, what would you estimate her weight to be?
f. Why would it not be appropriate to use this equation to predict the weight of a woman whose height is 75 inches? Explain.

Height  68   64   62   65   66
Weight  132  108  102  115  128
[Figure: scatter diagram of weight (Y) against height (X)]
Calculation table: n = 5
Height (X)  Weight (Y)  X²     Y²     XY
68          132         4624   17424  8976
64          108         4096   11664  6912
62          102         3844   10404  6324
65          115         4225   13225  7475
66          128         4356   16384  8448
∑X = 325    ∑Y = 585    ∑X² = 21145  ∑Y² = 69101  ∑XY = 38135

Sxy = 38135 − (325)(585)/5 = 110 and Sxx = 21145 − 325²/5 = 20, so b = 110/20 = 5.5 and a = 585/5 − 5.5 × 325/5 = −240.5.
The estimated regression equation is ŷ = −240.5 + 5.5x; at x = 63 inches the estimated weight is −240.5 + 5.5(63) = 106 pounds.
Predicting at 75 inches would be extrapolation well outside the observed heights (62–68 inches), so the fitted line cannot be trusted there.
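The fitted line and the prediction at 63 inches can be checked with NumPy (assumed installed):

```python
import numpy as np

height = [68, 64, 62, 65, 66]
weight = [132, 108, 102, 115, 128]

b, a = np.polyfit(height, weight, 1)   # returns slope first, then intercept
print(a, b)                            # a = -240.5, b = 5.5
print(a + b * 63)                      # estimated weight at 63 inches = 106.0
```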
Questions
1. The annual advertising expenditures (in lakhs of rupees) and the corresponding annual sales (in crores of rupees) for the past 10 years of a company are presented in the following table.
a. Find the correlation coefficient between annual advertising expenditure and annual sales revenue and comment on the result.
b. Develop a regression model of sales as a function of advertising expenditures. Predict the value of annual sales when advertising expenditure is 27 lakhs of rupees.
Regression line of x on y
The regression line of x on y gives the best estimated value of x for given values of y.
The regression equation of x on y is
x = a′ + b′y …(i)
where a′ is the constant or x-intercept, and b′ is the slope of regression line (i), i.e., the regression coefficient of x on y, denoted bxy.
Computing Formula
Here we simply interchange the roles of x and y in every formula, which gives:
• bxy = Sxy / Syy
• a′ = x̄ − bxy·ȳ
• Sxx = ∑x² − (∑x)²/n
• Syy = ∑y² − (∑y)²/n
• Sxy = ∑xy − (∑x)(∑y)/n
When should the dependent variable be treated as Y and the independent variable as X?
Y is the dependent variable and X is the independent variable; we identify them from the context. For example:
The following data were collected on the height (inches, X) and weight (pounds, Y) of women swimmers.
We know that height affects weight, so weight is the dependent variable and height is the independent variable.

Height (X)  68   64   62   65   66
Weight (Y)  132  108  102  115  128
In the linear regression model, if we predict the value of the dependent variable (Y), we use the regression equation of y on x. But if we have to predict the value of the independent variable (X), we switch from the regression of y on x to the regression of x on y. For example:
The following data were collected on the height (inches, X) and weight (pounds, Y) of women swimmers, with the condition:
Predict the value of height for a given weight of 105 pounds.
Then we treat height as the variable to be predicted and weight as the predictor.

Height (X)  68   64   62   65   66
Weight (Y)  132  108  102  115  128

The regression equation is then x = a′ + b′y. Now apply the usual rules to estimate the parameters, and predict x for the given y.
Standard Error of the Estimate
If the least-squares regression of y on x is given by y = a + bx, then the standard error of estimate is given by
se² = [Syy − (Sxy)²/Sxx] / (n − 2)
For the line x = a′ + b′y,
se² = [Sxx − (Sxy)²/Syy] / (n − 2)
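Plugging the sums from the swimmer example above into the first formula gives a quick numeric check (a sketch):

```python
import math

n, sxx, syy, sxy = 5, 20.0, 656.0, 110.0   # sums from the swimmer example
se2 = (syy - sxy ** 2 / sxx) / (n - 2)
print(se2, math.sqrt(se2))                 # 17.0 and se ≈ 4.12 pounds
```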
Test of significance of the slope parameter β
To test the significance of the regression coefficient in the simple linear regression model
Y = α + βX,
the following two statistical tests are applied:
I. t-test for significance in the simple linear regression model
II. F-test for significance in the simple linear regression model
t-test for significance in the simple linear regression model
H0: β = 0 (there is no significant relationship between the dependent and independent variables).
H1: β ≠ 0 (there is a significant relationship between the dependent and independent variables).
Degrees of freedom = n − 2.
Choose the level of significance.
Under H0 the test statistic (which follows a t distribution with n − 2 degrees of freedom) is
t = b / SE(b), where SE(b) = se / √Sxx
You can also use the equivalent form
t = r√(n − 2) / √(1 − r²)
Conclusion: If |tcal| < ttab, H0 is accepted; otherwise it is rejected.
Confidence interval: b ± tα/2 × [se / √Sxx]
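SciPy's linregress (assumed installed) returns the slope, its standard error and the p-value of this t-test directly; using the football-player data from earlier:

```python
from scipy import stats

x = [8, 9, 7, 6, 13, 7, 11, 12, 9, 14]
y = [35, 49, 27, 33, 60, 21, 45, 51, 46, 65]

res = stats.linregress(x, y)
t_cal = res.slope / res.stderr    # t statistic for H0: beta = 0, ≈ 6.26
print(t_cal, res.pvalue)          # reject H0 when the p-value < alpha
```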
Analysis of variance (ANOVA) table for simple linear regression

Source of variation  Sum of squares          Degrees of freedom  Mean square        F
Regression           SSR = (Sxy)²/Sxx        k = 1               MSR = SSR/1        F = MSR/MSE
Error                SSE = Syy − (Sxy)²/Sxx  n − 2               MSE = SSE/(n − 2)
Total                SST = Syy               n − 1

where
k = number of independent variables
n = number of observations
Decision rule
Reject H0 if the computed value of F > the tabulated value of F with one degree of freedom in the numerator and (n − 2) degrees of freedom in the denominator at the α% level of significance; accept otherwise.
Using the p-value, reject the null hypothesis if p-value < α.
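The table can be built from the same S-sums; a minimal sketch (function name my own):

```python
def simple_anova(n, sxx, syy, sxy):
    """ANOVA decomposition for simple linear regression from the S-sums."""
    ssr = sxy ** 2 / sxx        # regression SS, df = 1
    sse = syy - ssr             # error SS, df = n - 2
    mse = sse / (n - 2)
    f = (ssr / 1) / mse         # F = MSR / MSE with (1, n - 2) df
    return ssr, sse, f
```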
Example
Cost accountants often estimate overhead based on the level of production. At the Standard Knitting Co., they have collected information on overhead expenses and units produced at different plants, and want to estimate a regression equation to predict future overhead.

Overhead  191  170  272  155  280  173  234  116  153  178
Units     40   42   53   35   56   39   48   30   37   40

Develop the regression equation for the cost accountants.
Predict overhead when 50 units are produced.
Estimate the number of production units if the company's overhead is 400.
Example
The following measurements show the respective heights in inches of 10 fathers and their eldest sons.

Height of father (X)  66  67  63  71  69  65  62  70  61  72
Height of son (Y)     65  68  66  65  70  67  67  71  62  63

Obtain the regression line of son's height on father's height and estimate the height of the son when his father is 70 inches tall.
Obtain the regression line of father's height on son's height and estimate the height of the father when his son is 80 inches tall.
Multiple Regression
Multiple regression analysis is a straightforward extension of simple
regression analysis which allows more than one independent variable.
It is used to estimate or predict the value of one dependent variable
when the values of two or more independent variables are known.
Multiple regression equation
The multiple regression equation of dependent variable y on two independent
variables x1 and x2 is given by
y = a + b x1 + c x2 ……(i)
where
a = value of y when x1 = 0 and x2 =0
b = Partial regression coefficient of
y on x1 when x2 is constant.
c = Partial regression coefficient of
y on x2 when x1 is constant.
Note that a, b and c are parameters of the equation whose values are to be determined.
Using the Principle of least square estimation , the normal equations of
line (i) are
∑y = na + b∑x1 + c∑x2 ….(ii)
∑ yx1 = a∑x1 + b∑x12 + c∑x1x2 ….(iii)
∑ yx2 = a∑x2 + b∑x1x2 + c∑x22 ….(iv)
Solving (ii), (iii), (iv), we get the values of a, b , c.
Substituting the values of a, b , c in (i), we get required multiple
regression equation of y on x1 and x2.
The multiple regression equation of the dependent variable Y on n independent variables x1, x2, x3, …, xn is given by
Y = a + b1x1 + b2x2 + … + bnxn …(v)
The values of a, b1 and b2 (for two predictors) can be estimated using the least-squares method.
The normal equations with two independent variables are given below:
∑Y = na + b1∑X1 + b2∑X2
∑X1Y = a∑X1 + b1∑X1² + b2∑X1X2
∑X2Y = a∑X2 + b1∑X1X2 + b2∑X2²
The values of a, b1 and b2 are obtained by solving these three equations.
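The normal equations form a 3×3 linear system that NumPy can solve directly; a sketch (function name my own, NumPy assumed installed):

```python
import numpy as np

def fit_two_predictors(y, x1, x2):
    """Solve the normal equations for a, b1, b2 in y = a + b1*x1 + b2*x2."""
    y, x1, x2 = (np.asarray(v, dtype=float) for v in (y, x1, x2))
    n = len(y)
    A = np.array([[n,         x1.sum(),        x2.sum()],
                  [x1.sum(),  (x1 ** 2).sum(), (x1 * x2).sum()],
                  [x2.sum(),  (x1 * x2).sum(), (x2 ** 2).sum()]])
    rhs = np.array([y.sum(), (x1 * y).sum(), (x2 * y).sum()])
    return np.linalg.solve(A, rhs)   # [a, b1, b2]
```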
Source of variation  Sum of squares (SS)  Degrees of freedom (df)  Mean square (MS)       F
Regression           SSR                  k                        MSR = SSR/k            F = MSR/MSE
Error (Residual)     SSE                  n − k − 1                MSE = SSE/(n − k − 1)
Total                SST                  n − 1
t-test for significance in the multiple linear regression model
This test is applied to check whether a regression coefficient βj is statistically significant or not.
Setting of hypotheses
Null hypothesis, H0: βj = 0 (there is no significant relationship between the dependent variable and the j-th independent variable).
Alternative hypothesis, H1: βj ≠ 0 (there is a significant relationship between the dependent variable and the j-th independent variable).
Level of significance
Usually α = 5% if not given.
Question
A random sample of 5 families yields the following data.

Family                  A   B   C   D   E
Saving (Y) (Rs. 000)    6   12  10  7   3
Income (X1) (Rs. 000)   8   11  9   6   6
No. of children (X2)    5   2   1   3   4

I. Estimate the regression equation of Y on X1 and X2.
II. Find the coefficient of multiple determination and interpret its meaning.
Now the normal equations are:
∑Y = na + b1∑X1 + b2∑X2
∑X1Y = a∑X1 + b1∑X1² + b2∑X1X2
∑X2Y = a∑X2 + b1∑X1X2 + b2∑X2²

Y        Y²        X1       X2       X1²        X2²       X1X2        X1Y        X2Y
6        36        8        5        64         25        40          48         30
12       144       11       2        121        4         22          132        24
10       100       9        1        81         1         9           90         10
7        49        6        3        36         9         18          42         21
3        9         6        4        36         16        24          18         12
∑Y = 38  ∑Y² = 338  ∑X1 = 40  ∑X2 = 15  ∑X1² = 338  ∑X2² = 55  ∑X1X2 = 113  ∑X1Y = 330  ∑X2Y = 97
Substituting, the normal equations become:
38 = 5a + 40b1 + 15b2
330 = 40a + 338b1 + 113b2
97 = 15a + 113b1 + 55b2
Solving the normal equations gives
a = 1.829, b1 = 1.076 and b2 = −0.946,
so the estimated regression equation is Ŷ = 1.829 + 1.076X1 − 0.946X2.
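The hand solution can be verified by solving the same system with NumPy (assumed installed):

```python
import numpy as np

A = np.array([[5, 40, 15],
              [40, 338, 113],
              [15, 113, 55]], dtype=float)
rhs = np.array([38, 330, 97], dtype=float)

a, b1, b2 = np.linalg.solve(A, rhs)
print(a, b1, b2)   # ≈ 1.829, 1.076, -0.947
```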
Source of variation  Sum of squares (SS)  df             Mean square (MS)  F
Regression           SSR = 44.076         k = 2          MSR = 22.038      8.60
Error (Residual)     SSE = 5.124          n − k − 1 = 2  MSE = 2.562
Total                SST = 49.2           n − 1 = 4

Fcal = MSR/MSE = 22.038/2.562 = 8.60, while Ftab at the 5% level of significance with (k, n − k − 1) = (2, 2) degrees of freedom is 19.
Decision
Since Fcal (8.60) < Ftab (19), we accept H0.
Conclusion
At the 5% level the overall regression is not statistically significant: with only n = 5 families, the data do not establish a linear relationship between the dependent variable and the independent variables.
The coefficient of multiple determination is R² = SSR/SST = 44.076/49.2 = 0.896, i.e., about 89.6% of the variation in savings is explained by income and number of children.
If the question asks which individual independent variable is significant, we must carry out separate t-tests, just as in the simple linear regression model.
Question
1. Police stations across the country are interested in predicting the number of arrests they can expect to process each month so as to better schedule office employees. Historically, the average number of arrests (Y) each month is influenced by the number of officers on the police force (X1), the population of the city in thousands (X2), and the percentage of unemployed people in the city (X3). The SPSS partial output for these factors in 15 cities is presented below.

Coefficient table
          Coefficients (bi)  Standard error (sbi)  t      p-value
Constant  142.4363           25.96474              5.49   <0.001
X1        3.2741             0.2814354             11.64  <0.001
X2        0.5269             0.4693494             1.12   0.287
X3        -0.3203            1.295351              -0.24  0.812

ANOVA table
Source of variation  Sum of squares  df  Mean square  F
Regression           230500.663      3   76833.5544   246.41 (p-value <0.001)
Residual             3429.9279       11  311.811627
Total                233930.591      14

a. Using the above output, determine the best-fitting regression equation.
b. What percentage of the total variation in the number of arrests (Y) is explained by this equation?
c. The police department in a city is trying to predict the number of monthly arrests. The city has a population of 75,000, a police force of 82 and an unemployment percentage of 10.5 percent. How many arrests do you predict each month?
Solution
a. Ŷ = 142.4363 + 3.2741X1 + 0.5269X2 − 0.3203X3
b. R² = SSR/SST = 230500.663/233930.591 = 0.985, so about 98.5% of the total variation in arrests is explained by the equation.
c. With X1 = 82, X2 = 75 (thousands) and X3 = 10.5: Ŷ = 142.4363 + 3.2741(82) + 0.5269(75) − 0.3203(10.5) ≈ 447 arrests per month.
2. The following is a partially developed SPSS output of the multiple regression where the outcome variable (Y) represents the scores made by 10 assembly-line employees on a test designed to measure job satisfaction. The scores are affected by two factors: an aptitude test (X1) and the number of days absent (X2) during the past year (excluding vacation).

Coefficients table
                            Coefficients (bi)  Standard error (sbi)  t
Intercept                   36.2083            7.3441                ?
Aptitude test (X1)          5.3882             0.9900                ?
Number of days absent (X2)  -1.6191            0.3909                ?

ANOVA table
Source of variation  Sum of squares  df  Mean square  F
Regression           1016.26949      2   ?            ?
Residual             62.6305138      ?   ?
Total                1078.9          9

Answer the following questions:
I. Complete the above ANOVA table and coefficients table.
II. Fit a multiple regression model and predict the value of Y when the aptitude test score is 7 and the number of days absent is 6.
III. Is there any significant relationship between the dependent and the two independent variables? (Test at the 5% level of significance.)
IV. Test the significance of the estimated regression coefficient of X2 at the 5% significance level.
V. What proportion of the variation in scores (Y) is explained by the two independent variables?
VI. Compute the standard error of estimate and interpret its meaning.
Coefficients table (completed)
                            Coefficients (bi)  Standard error (sbi)  t
Intercept                   36.2083            7.3441                4.93
Aptitude test (X1)          5.3882             0.9900                5.44
Number of days absent (X2)  -1.6191            0.3909                -4.14

ANOVA table (completed)
Source of variation  Sum of squares  df  Mean square  F
Regression           1016.26949      2   508.1347     56.79
Residual             62.6305138      7   8.9472
Total                1078.9          9

II. Ŷ = 36.2083 + 5.3882X1 − 1.6191X2; at X1 = 7 and X2 = 6, Ŷ = 36.2083 + 5.3882(7) − 1.6191(6) ≈ 64.21.
III. Fcal = 56.79 > Ftab(2, 7) at 5% = 4.74, so the overall regression is significant.
IV. For X2, |t| = 4.14 > t(0.025, 7 df) = 2.365, so the coefficient of X2 is significant at the 5% level.
V. R² = 1016.26949/1078.9 = 0.942, so about 94.2% of the variation in scores is explained by the two independent variables.
VI. se = √MSE = √8.9472 ≈ 2.99; on average, observed scores deviate from the fitted values by about 3 points.
Regression Analysis Using Excel Toolpak
Run the regression and get the output.
The output has three components:
• Regression Statistics table
• ANOVA (analysis of variance) table
• Regression coefficients table
Regression Statistics Table
Regression Statistics  Value        Explanation
Multiple R             0.913765468  R = square root of R²
R Square               0.834967331  R²
Adjusted R Square      0.768954263  Adjusted R², used if there is more than one x variable
Standard Error         1.027720333  Sample estimate of the standard deviation of the error u
Observations           8            Number of observations used in the regression
Interpretations of the ANOVA table
ANOVA
            df  SS           MS           F            Significance F (p-value)
Regression  2   26.71895459  13.35927295  12.64832192  0.011064322
Residual    5   5.281045415  1.056209083
Total       7   32

It splits the sum of squares into its components:
R² = 1 − Residual SS / Total SS (general formula for R²)
   = 1 − 5.28104/32 (data from the ANOVA table)
   = 0.83496733 (same as in the Regression Statistics table)
The column labeled F gives the overall F-test of H0: β2 = 0 and β3 = 0 versus H1: at least one of β2 and β3 does not equal zero.
F = [regression SS / k] / [residual SS / (n − k − 1)] = [26.71895459/2] / [5.281045415/(8 − 2 − 1)] = 12.6483
The tabulated F at the 5% level of significance with (2, 5) degrees of freedom is 5.79.
The column labeled Significance F gives the associated p-value.
Since 0.01106 < 0.05, we reject H0 at significance level 0.05. (This means the model has some validity.)
Interpretations of the Regression Coefficients Table

                      Coefficients  Standard Error  t Stat        P-value      Lower 95%     Upper 95%
Intercept             -1.788017424  1.810519204     -0.987571642  0.368711965  -6.442105202  2.866070354
Initial weight (lbs)  0.14036806    0.030423539     4.613797827   0.005767709  0.062161862   0.218574258
Initial age (weeks)   0.249729367   0.177666545     1.405607161   0.218830913  -0.206977025  0.70643576

Column "Coefficients" gives the least-squares estimates bj of βj.
Column "Standard Error" gives the standard errors (estimated standard deviations) of the least-squares estimates bj.
Column "t Stat" gives the computed t-statistics for H0: βj = 0 against H1: βj ≠ 0.
Column "P-value" gives the p-value for the test of H0: βj = 0 against H1: βj ≠ 0.
Columns "Lower 95%" and "Upper 95%" define a 95% confidence interval for βj.

A simple summary of the above output is that the fitted line is

Weight gain (Y) = -1.788 + 0.1403 × Initial weight + 0.2497 × Initial age
Confidence intervals for slope coefficients
• The 95% confidence interval for the slope coefficient β2 is (0.06216, 0.21857).
• The 95% confidence interval for the slope coefficient β3 is (-0.20697, 0.70643).

Tests of statistical significance
• The coefficient of initial weight has an estimated standard error of 0.0304, a t-statistic of 4.6137 and a p-value of 0.005767. It is therefore statistically significant at significance level 0.05, as p < 0.05.
• The coefficient of initial age has an estimated standard error of 0.1776, a t-statistic of 1.4056 and a p-value of 0.21883. It is therefore statistically insignificant at significance level 0.05, as p > 0.05.
Regression Analysis Using Excel Toolpak
• Excel 2007 has a built-in regression analysis tool that's packaged as part of its "Analysis Toolpak".
• The Analysis Toolpak is a standard component of Excel, shipped as an add-in.
To install this add-in
• Open Excel, go to the option to modify the installation, and choose the Analysis Toolpak as an add-in that should be activated.
To start the add-in
• Open Excel, go to the "Data" tab, select "Data Analysis" and choose "Regression" to run the regression analysis.
Steps to follow when running a regression with Excel
• The independent X variables used in the analysis must be located together in the worksheet. There must be no blank columns or columns with non-relevant data interrupting the range of X variables.
• The dependent Y variable need not be located adjacent to the X variables, but all Y variable values must be in a single range.
Example
Given the following set of data:

y   25   30   11   22   27   19
x1  3.5  6.7  1.5  0.3  4.6  2.0
x2  5.0  4.2  8.5  1.4  3.6  1.3

Calculate the multiple regression plane.
Predict y when x1 = 3.0 and x2 = 2.7.
Predict x1 when y = 13 and x2 = 2.7.
Predict x2 when y = 13 and x1 = 4.3.
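A sketch of the whole exercise with NumPy least squares (assumed installed); note that predicting x1 or x2 simply means refitting with that variable as the response:

```python
import numpy as np

y  = np.array([25, 30, 11, 22, 27, 19], dtype=float)
x1 = np.array([3.5, 6.7, 1.5, 0.3, 4.6, 2.0])
x2 = np.array([5.0, 4.2, 8.5, 1.4, 3.6, 1.3])

# Plane of y on x1, x2: design-matrix columns are [1, x1, x2]
X = np.column_stack([np.ones_like(x1), x1, x2])
a, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
print(a + b1 * 3.0 + b2 * 2.7)   # predicted y at x1 = 3.0, x2 = 2.7

# To predict x1 when y = 13 and x2 = 2.7, refit with x1 as the response:
X1 = np.column_stack([np.ones_like(y), y, x2])
c0, c1, c2 = np.linalg.lstsq(X1, x1, rcond=None)[0]
print(c0 + c1 * 13 + c2 * 2.7)
```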


Correlation and Regression

Thank you
