Chapter 3 Complete

The document provides an overview of correlation and regression analysis, detailing methods such as scatter diagrams, correlation coefficients, and regression equations. It explains the differences between Pearson and Spearman correlation coefficients, along with the assumptions required for regression analysis. Additionally, it includes examples and formulas for calculating correlation and regression, emphasizing the importance of understanding the relationship between dependent and independent variables.

Uploaded by

Sandesh Shrestha

Correlation And Regression Analysis

Parishwar Acharya
Scatter Diagram
A scatter diagram is a graphical method for displaying the relationship between two variables.
It plots pairs of bivariate observations (x, y) on the XY plane.
Y is called the dependent variable.
X is called the independent variable.
Correlation
Correlation analysis is used to measure the degree of
relationship between two or more variables.
Only concerned with strength of the relationship
No causal effect is implied
It may be Simple, Partial or Multiple
Scatter Plot Examples
[Figure: scatter plots of y against x showing linear and curvilinear relationships, strong and weak relationships, and no relationship]
Simple Correlation coefficient (r)
It is also called Pearson's correlation coefficient or the product-moment correlation coefficient.
It measures the nature (direction) and degree of the linear relationship between two quantitative variables.
If the sign is positive, the relation is direct: an increase in one variable is associated with an increase in the other, and a decrease in one with a decrease in the other.
If the sign is negative, the relationship is inverse (indirect): an increase in one variable is associated with a decrease in the other.
• The value of r ranges between –1 and +1.
• The magnitude |r| denotes the strength of the association, as illustrated below.
• If r = 0, there is no linear association between the two variables.
If 0 < |r| < 0.25: weak correlation.
If 0.25 ≤ |r| < 0.75: intermediate correlation.
If 0.75 ≤ |r| < 1: strong correlation.
If |r| = 1: perfect correlation.
Note: ρ = population correlation coefficient
Computing Formula
r = Sxy / √(Sxx × Syy)
Where,
Sxx = ∑x² – (∑x)²/n
Syy = ∑y² – (∑y)²/n
Sxy = ∑xy – (∑x)(∑y)/n
The following form is equivalent:
r = [n∑xy – (∑x)(∑y)] / √{[n∑x² – (∑x)²] × [n∑y² – (∑y)²]}
Example
The table shows the heights and weights of n = 10 randomly selected college football players.

Player       1   2   3   4   5   6   7   8   9   10
Height (x)   8   9   7   6   13  7   11  12  9   14
Weight (y)   35  49  27  33  60  21  45  51  46  65

• Calculate the correlation coefficient and interpret your result.
• Is there evidence of a linear relationship between weight and height at the 0.05 level of significance?
• Also plot the scatter diagram.
Solution:
X    Y    X²    Y²     XY
8    35   64    1225   280
9    49   81    2401   441
7    27   49    729    189
6    33   36    1089   198
13   60   169   3600   780
7    21   49    441    147
11   45   121   2025   495
12   51   144   2601   612
9    46   81    2116   414
14   65   196   4225   910
∑X = 96, ∑Y = 432, ∑X² = 990, ∑Y² = 20452, ∑XY = 4466
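Sxx = 990 – (96)²/10 = 68.4, Syy = 20452 – (432)²/10 = 1789.6, Sxy = 4466 – (96)(432)/10 = 318.8
r = 318.8 / √(68.4 × 1789.6) ≈ 0.911, a strong positive correlation.
H0: ρ = 0 against H1: ρ ≠ 0
t cal = r√(n – 2) / √(1 – r²) ≈ 6.26, and t tab at the 0.05 level with n – 2 = 8 degrees of freedom is 2.306.
As t cal > t tab, H0 is rejected.
There is evidence of a linear relationship between weight and height at the 0.05 level of significance.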
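The calculation in this example can be checked with a short script. This is a minimal sketch in plain Python: the function name corrected_sums is made up for illustration, and the critical value 2.306 is assumed from a standard two-tailed t table (α = 0.05, 8 degrees of freedom).

```python
import math

# Heights (x) and weights (y) of the n = 10 players in the example.
x = [8, 9, 7, 6, 13, 7, 11, 12, 9, 14]
y = [35, 49, 27, 33, 60, 21, 45, 51, 46, 65]

def corrected_sums(x, y):
    """Return (Sxx, Syy, Sxy) from the computing formulas."""
    n = len(x)
    sxx = sum(v * v for v in x) - sum(x) ** 2 / n
    syy = sum(v * v for v in y) - sum(y) ** 2 / n
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return sxx, syy, sxy

n = len(x)
sxx, syy, sxy = corrected_sums(x, y)
r = sxy / math.sqrt(sxx * syy)

# t statistic for H0: rho = 0, with n - 2 degrees of freedom.
t_cal = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
T_TAB = 2.306  # assumed table value: two-tailed, alpha = 0.05, 8 df

print(f"Sxx={sxx:.1f} Syy={syy:.1f} Sxy={sxy:.1f}")
print(f"r={r:.4f} t_cal={t_cal:.2f} reject H0: {t_cal > T_TAB}")
```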
Spearman Rank Correlation Coefficient
It is a non-parametric measure of correlation, because it does not assume that the data follow a normal distribution.
This procedure makes use of the two sets of ranks that may be assigned to the sample values of x and y.
The Spearman rank correlation coefficient can be computed in the following cases:
Both variables are quantitative.
Both variables are qualitative ordinal.
One variable is quantitative and the other is qualitative ordinal.
Procedure
1. Rank the values of X from 1 to n, where n is the number of pairs of values of X and Y in the sample.
2. Rank the values of Y from 1 to n.
3. Compute di = (rank of Xi) – (rank of Yi) for each pair of observations.
4. Square each di and compute ∑di², the sum of the squared values.
5. Compute rs = 1 – 6∑di² / [n(n² – 1)].
Example
Twelve entries in a painting competition were ranked by two judges as shown below.

Entry      1   2   3   4   5   6   7   8   9   10  11  12
Judge I    5   2   3   4   1   6   8   7   10  9   12  11
Judge II   4   5   2   1   6   7   10  9   11  12  3   8

Find the coefficient of rank correlation.

Entry                Judge I Rank   Judge II Rank   d (Difference)   d²
1 (Ram Bahadur)      5              4               1                1
2 (Shyam Bahadur)    2              5               -3               9
3 (and so on)        3              2               1                1
4                    4              1               3                9
5                    1              6               -5               25
6                    6              7               -1               1
7                    8              10              -2               4
8                    7              9               -2               4
9                    10             11              -1               1
10                   9              12              -3               9
11                   12             3               9                81
12                   11             8               3                9
∑d² = 154
rs = 1 – 6(154) / [12(12² – 1)] = 1 – 924/1716
Answer: r = 0.4615
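The rank-difference calculation for the two judges can be reproduced in a few lines. This is a sketch only: the helper name spearman_from_ranks is hypothetical, and it uses the simple formula rs = 1 – 6∑d²/[n(n² – 1)], which assumes there are no tied ranks (true for this example).

```python
# Ranks given by the two judges to the 12 entries.
judge_1 = [5, 2, 3, 4, 1, 6, 8, 7, 10, 9, 12, 11]
judge_2 = [4, 5, 2, 1, 6, 7, 10, 9, 11, 12, 3, 8]

def spearman_from_ranks(rank_x, rank_y):
    """Return (r_s, sum of squared rank differences); assumes no ties."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
    return r_s, sum_d2

r_s, sum_d2 = spearman_from_ranks(judge_1, judge_2)
print(f"sum of d^2 = {sum_d2}, r_s = {r_s:.4f}")
```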
Example
The following are the marks obtained by a group of ten students in two papers.

Marks in Research      78  36  98  25  75  82  92  62  75  39
Marks in Q. Methods    84  51  91  69  68  62  86  58  35  49

Find the coefficient of rank correlation.

Also calculate Karl Pearson's correlation coefficient.
To find the coefficient of rank correlation, we first rank the marks within each paper (1 = highest mark), keeping each student's two marks paired, then calculate the difference between the two ranks of each student, and finally use the formula for Spearman's rank correlation coefficient.
The two tied marks of 75 in Research share the average rank (5 + 6)/2 = 5.5.

Marks in Research   Rank (R1)   Marks in Q. Methods   Rank (R2)   di = (R1 - R2)   di²
78                  4           84                    3           1                1
36                  9           51                    8           1                1
98                  1           91                    1           0                0
25                  10          69                    4           6                36
75                  5.5         68                    5           0.5              0.25
82                  3           62                    6           -3               9
92                  2           86                    2           0                0
62                  7           58                    7           0                0
75                  5.5         35                    10          -4.5             20.25
39                  8           49                    9           -1               1

∑di² = 1 + 1 + 0 + 36 + 0.25 + 9 + 0 + 0 + 20.25 + 1 = 68.5
Without a correction for the tie, rs = 1 – 6(68.5) / [10(10² – 1)] = 1 – 411/990 ≈ 0.585

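Marks in Research (X)   Marks in Q. Methods (Y)   XY     X²     Y²
78                      84                        6552   6084   7056
36                      51                        1836   1296   2601
98                      91                        8918   9604   8281
25                      69                        1725   625    4761
75                      68                        5100   5625   4624
82                      62                        5084   6724   3844
92                      86                        7912   8464   7396
62                      58                        3596   3844   3364
75                      35                        2625   5625   1225
39                      49                        1911   1521   2401
∑X = 662, ∑Y = 653, ∑XY = 45259, ∑X² = 49412, ∑Y² = 45553
Sxx = 49412 – (662)²/10 = 5587.6, Syy = 45553 – (653)²/10 = 2912.1, Sxy = 45259 – (662)(653)/10 = 2030.4
r = Sxy / √(Sxx × Syy) = 2030.4 / √(5587.6 × 2912.1) ≈ 0.503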
Difference between Spearman rank correlation coefficient and Karl Pearson
correlation coefficient
Pearson correlation uses the actual data values, while Spearman correlation uses the
ranks of the data.
Pearson correlation measures the linear relationship, while Spearman correlation
measures the monotonic relationship.
Pearson correlation is more sensitive to outliers, while Spearman correlation is
more robust to outliers.
So, in this case, the rank correlation based on the marks need not equal the Pearson correlation coefficient. The two are guaranteed to agree only in special cases, such as a perfectly linear relationship, where both equal +1 or –1.
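To see the two coefficients side by side on the marks data, here is a minimal sketch in plain Python. The helper names average_ranks and pearson_r are illustrative, tied marks are given the average of the ranks they occupy, and the Spearman value uses the simple ∑d² formula without a tie correction, which is an assumption of this sketch.

```python
import math

# Paired marks of the ten students (Research, Q. Methods).
research = [78, 36, 98, 25, 75, 82, 92, 62, 75, 39]
q_methods = [84, 51, 91, 69, 68, 62, 86, 58, 35, 49]

def average_ranks(values):
    """Rank with 1 = largest value; tied values share the average rank."""
    ranks = []
    for v in values:
        larger = sum(1 for w in values if w > v)
        ties = sum(1 for w in values if w == v)
        ranks.append(larger + (ties + 1) / 2)
    return ranks

def pearson_r(x, y):
    """Pearson's r from the Sxx, Syy, Sxy computing formulas."""
    n = len(x)
    sxx = sum(v * v for v in x) - sum(x) ** 2 / n
    syy = sum(v * v for v in y) - sum(y) ** 2 / n
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return sxy / math.sqrt(sxx * syy)

r1 = average_ranks(research)
r2 = average_ranks(q_methods)
n = len(research)
sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))  # no tie correction applied
r = pearson_r(research, q_methods)

print(f"sum of d^2 = {sum_d2}, Spearman r_s = {r_s:.4f}, Pearson r = {r:.4f}")
```

The pairing of each student's two marks is preserved throughout. Sorting the two columns independently would change both coefficients.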
Assumptions for regression analysis
1. Linearity
In the linear regression model, the dependent variable is assumed to be a linear function of the independent variables. If this assumption is violated, a straight line does not adequately describe the relationship between the dependent and independent variables.
2. Normality
Residuals (errors) generated from the fitted regression model are assumed to be normally distributed. If this assumption is violated, the estimation of the regression line and the tests of the regression coefficients become doubtful.
3. Homoscedasticity
This assumption requires that the variation around the line of regression be constant for all values of the independent variables. This means that the error variance is the same for low values as for high values of the independent variable.
4. Independence of errors
This assumption requires that the errors around the regression line be independent for each value of the independent variable. This assumption is especially important for time series data.
Regression Analysis
The process of predicting one variable by using another, known variable.
Regression of y on x: uses the variable x to predict the outcome variable y.
Regression of x on y: uses the variable y to predict the outcome variable x.
It tells you how values of y change as a function of changes in values of x.
Regression tells us how to draw the straight line that best describes the relationship measured by the correlation.
Regression may be simple or multiple.

Simple Regression
Regression analysis confined to the study of only two variables is called simple regression.
The variable which is used for prediction is called the independent, predictor or controlled variable.
The variable which is to be predicted is called the dependent or response variable.
Lines of regression
The line of regression of y on x is the line which gives the best estimate of y for any specified value of x.
The line of regression of x on y is the line which gives the best estimate of x for any specified value of y.
Regression line of y on x
The regression line of y on x gives the best estimated value of y for given values of x.
The regression equation of y on x is
y = a + bx …..(i)
where a is the constant or y-intercept and b is the slope of regression line (i), also called the regression coefficient of y on x and denoted by byx.
Computing Formula
• byx = Sxy / Sxx
• a = ȳ – byx x̄, where x̄ = ∑x/n and ȳ = ∑y/n
• Sxx = ∑x² – (∑x)²/n
• Syy = ∑y² – (∑y)²/n
• Sxy = ∑xy – (∑x)(∑y)/n

Interpretation of Parameters of the Simple Linear Regression Model
Model: Y = α + b1 X + e
α = Y-intercept = average value of Y when the value of X is equal to zero.
b1 = slope = expected (average) change in the response variable Y, in either the positive or the negative direction, for a one-unit change in the explanatory variable X.
e = random error in the response variable Y for each observation.
Example
The following data were collected on the height in inches (X) and weight in pounds (Y) of women swimmers.
a. Develop a scatter diagram for these data with height as the independent variable.
b. What does the scatter diagram tell you about the relationship between the two variables?
c. Approximate the relationship between height and weight by drawing a straight line through the data.
d. Obtain the estimated linear regression equation of weight on height.
e. If a swimmer's height is 63 inches, what would you estimate her weight to be?
f. Why would it not be appropriate to predict the weight of a woman whose height is 75 inches? Explain.

Height   68   64   62   65   66
Weight   132  108  102  115  128
[Figure: scatter diagram of weight (Y) against height (X)]
Calculation table: n = 5
Height (X)   Weight (Y)   X²     Y²      XY
68           132          4624   17424   8976
64           108          4096   11664   6912
62           102          3844   10404   6324
65           115          4225   13225   7475
66           128          4356   16384   8448
∑X = 325, ∑Y = 585, ∑X² = 21145, ∑Y² = 69101, ∑XY = 38135
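Parts d and e of this example can be worked through with the computing formulas byx = Sxy/Sxx and a = ȳ – byx x̄. This is a minimal sketch, with fit_line as a hypothetical helper name.

```python
# Height (inches) and weight (pounds) of the five swimmers.
height = [68, 64, 62, 65, 66]
weight = [132, 108, 102, 115, 128]

def fit_line(x, y):
    """Least-squares intercept a and slope b for predicting y from x."""
    n = len(x)
    sxx = sum(v * v for v in x) - sum(x) ** 2 / n
    sxy = sum(p * q for p, q in zip(x, y)) - sum(x) * sum(y) / n
    b = sxy / sxx
    a = sum(y) / n - b * sum(x) / n
    return a, b

a, b = fit_line(height, weight)
predicted = a + b * 63  # estimated weight at a height of 63 inches

print(f"weight = {a:.1f} + {b:.1f} * height")
print(f"estimated weight at 63 inches: {predicted:.1f} pounds")
```

A height of 75 inches lies well outside the observed range of 62 to 68 inches, so the fitted line gives no reliable basis for a prediction there (part f).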
Questions
1. The annual advertising expenditures (in lakhs of rupees) and the corresponding annual sales (in crores of rupees) of a company for the past 10 years are presented in the following table.
a. Find the correlation coefficient between annual advertising expenditure and annual sales revenue and comment on the result.
b. Develop a regression model of sales as a function of advertising expenditure. Predict the annual sales when the advertising expenditure is Rs. 27 lakhs.
Regression line of x on y
The regression line of x on y gives the best estimated value of x for given values of y.
The regression equation of x on y is
x = a’ + b’y …..(ii)
where a’ is the constant or x-intercept and b’ is the slope of regression line (ii), also called the regression coefficient of x on y and denoted by bxy.
Computing Formula
Here we simply interchange x and y in every formula, which gives

• bxy = Sxy / Syy
• a’ = x̄ – bxy ȳ
• Sxx = ∑x² – (∑x)²/n
• Syy = ∑y² – (∑y)²/n
• Sxy = ∑xy – (∑x)(∑y)/n
When should the roles of x and y be interchanged?
The variable that responds is the dependent variable (Y) and the variable that produces the effect is the independent variable (X).
For example, the following data were collected on the height in inches (X) and weight in pounds (Y) of women swimmers. Height has an effect on weight, so weight is the dependent variable and height is the independent variable.

Height (X)   68   64   62   65   66
Weight (Y)   132  108  102  115  128

To predict the value of the dependent variable (Y), we use the regression equation of y on x. To predict the value of the independent variable (X) from a given value of Y, we use the regression equation of x on y instead.
For example, to predict the height for a given weight of 105 pounds from the same data, height becomes the variable to be predicted and weight becomes the predictor.
Then the regression equation is
x = a’ + b’y. Estimate the parameters a’ and b’ and then predict x for the given y.
Standard Error of the Estimate
If the least squares regression line of y on x is y = a + bx, then the standard error of estimate se is given by
se² = [Syy – (Sxy)²/Sxx] / (n – 2)
For the line x = a’ + b’y,
se² = [Sxx – (Sxy)²/Syy] / (n – 2)
Test of significance of slope parameter β
To test the significance of the regression coefficient of the simple linear regression model
Y = α + b1 X,
the following two statistical tests are applied.
I. t-test for significance in simple linear regression model
II. F-test for significance in simple linear regression model
t-test for significance in simple linear regression model
H0: β = 0 (There is no significant relationship between the dependent and independent variable).
H1: β ≠ 0 (There is a significant relationship between the dependent and independent variable).
Degrees of freedom = n – 2
Choose the level of significance α.
Under H0, the test statistic, which follows a t distribution with n – 2 degrees of freedom, is
t = b1 / [se / √Sxx]
Equivalently, in terms of the correlation coefficient r,
t = r√(n – 2) / √(1 – r²)
Conclusion: If |tcal| < ttab, H0 is not rejected; otherwise it is rejected.
Confidence interval for β: b1 ± tα/2 × [se / √Sxx]
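As an illustration of the standard error of estimate and the t-test for the slope, the sketch below reuses the swimmers' height and weight data from the earlier example. The critical value 3.182 is assumed from a standard two-tailed t table (α = 0.05, n – 2 = 3 degrees of freedom), and the variable names are illustrative.

```python
import math

height = [68, 64, 62, 65, 66]       # x
weight = [132, 108, 102, 115, 128]  # y
n = len(height)

sxx = sum(v * v for v in height) - sum(height) ** 2 / n
syy = sum(v * v for v in weight) - sum(weight) ** 2 / n
sxy = sum(p * q for p, q in zip(height, weight)) - sum(height) * sum(weight) / n

b1 = sxy / sxx
se = math.sqrt((syy - sxy ** 2 / sxx) / (n - 2))  # standard error of estimate
se_b1 = se / math.sqrt(sxx)                       # standard error of the slope
t_cal = b1 / se_b1

T_TAB = 3.182  # assumed table value: two-tailed, alpha = 0.05, 3 df
ci = (b1 - T_TAB * se_b1, b1 + T_TAB * se_b1)

print(f"se = {se:.3f}, t_cal = {t_cal:.3f}, reject H0: {abs(t_cal) > T_TAB}")
print(f"95% CI for the slope: ({ci[0]:.2f}, {ci[1]:.2f})")
```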
Analysis of variance (ANOVA) table for simple linear regression
Source of variation   Sum of squares   Degrees of freedom   Mean square          F
Regression            SSR              k = 1                MSR = SSR/k          F = MSR/MSE
Error                 SSE              n – 2                MSE = SSE/(n – 2)
Total                 SST              n – 1
where
k = number of independent variables
n = number of observations
Decision Rule
Reject H0 if the computed value of F > the tabulated value of F with one degree of freedom in the numerator and (n – 2) degrees of freedom in the denominator at the α level of significance; otherwise do not reject H0.
Using the p-value, reject the null hypothesis if p-value < α.
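The ANOVA table can be filled in numerically for the swimmers' data used earlier. This is a sketch under stated assumptions: SST is taken as Syy, SSR as (Sxy)²/Sxx, and the tabulated F value 10.13 for (1, 3) degrees of freedom at α = 0.05 is assumed from a standard F table.

```python
height = [68, 64, 62, 65, 66]       # x
weight = [132, 108, 102, 115, 128]  # y
n, k = len(height), 1

sxx = sum(v * v for v in height) - sum(height) ** 2 / n
syy = sum(v * v for v in weight) - sum(weight) ** 2 / n
sxy = sum(p * q for p, q in zip(height, weight)) - sum(height) * sum(weight) / n

sst = syy              # total sum of squares
ssr = sxy ** 2 / sxx   # regression sum of squares
sse = sst - ssr        # error sum of squares
msr = ssr / k
mse = sse / (n - 2)
f_cal = msr / mse

F_TAB = 10.13  # assumed table value: alpha = 0.05, (1, 3) df

print(f"SSR={ssr:.0f} SSE={sse:.0f} SST={sst:.0f}")
print(f"F = {f_cal:.2f}, reject H0: {f_cal > F_TAB}")
```

In simple linear regression the F statistic equals the square of the t statistic for the slope, so the two tests lead to the same decision.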
Example
Cost accountants often estimate overhead based on the level of production. At the Standard Knitting Co., they have collected information on overhead expenses and units produced at different plants and want to estimate a regression equation to predict future overhead.

Overhead   191  170  272  155  280  173  234  116  153  178
Units      40   42   53   35   56   39   48   30   37   40

Develop the regression equations for the cost accountants.
Predict overhead when 50 units are produced.
Estimate the number of units produced if the company invests 400 in overhead.
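One way to approach this exercise is to fit both regression lines: overhead on units for the first prediction, and units on overhead for the second. The sketch below assumes that reading of the question, and fit_line is a hypothetical helper name.

```python
units = [40, 42, 53, 35, 56, 39, 48, 30, 37, 40]
overhead = [191, 170, 272, 155, 280, 173, 234, 116, 153, 178]

def fit_line(x, y):
    """Least-squares intercept and slope for predicting y from x."""
    n = len(x)
    sxx = sum(v * v for v in x) - sum(x) ** 2 / n
    sxy = sum(p * q for p, q in zip(x, y)) - sum(x) * sum(y) / n
    slope = sxy / sxx
    intercept = sum(y) / n - slope * sum(x) / n
    return intercept, slope

# Regression of overhead (y) on units (x).
a, b = fit_line(units, overhead)
overhead_at_50 = a + b * 50

# Regression of units (x) on overhead (y).
a2, b2 = fit_line(overhead, units)
units_at_400 = a2 + b2 * 400

print(f"overhead = {a:.3f} + {b:.4f} * units")
print(f"predicted overhead at 50 units: {overhead_at_50:.2f}")
print(f"units = {a2:.3f} + {b2:.4f} * overhead")
print(f"estimated units at overhead 400: {units_at_400:.2f}")
```

An overhead of 400 is far above the largest observed value (280), so the second estimate is an extrapolation and should be read with caution.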
Example
The following measurements show the respective heights in inches of 10 fathers and their eldest sons.

Height of father (X)   66  67  63  71  69  65  62  70  61  72
Height of son (Y)      65  68  66  65  70  67  67  71  62  63

Obtain the regression line of son's height on father's height and estimate the height of a son whose father is 70 inches tall.
Obtain the regression line of father's height on son's height and estimate the height of a father whose son is 80 inches tall.
Multiple Regression
Multiple regression analysis is a straightforward extension of simple
regression analysis which allows more than one independent variable.
It is used to estimate or predict the value of one dependent variable
when the values of two or more independent variables are known.
Multiple regression equation
The multiple regression equation of dependent variable y on two independent variables x1 and x2 is given by
y = a + b x1 + c x2 ……(i)
where
a = value of y when x1 = 0 and x2 = 0
b = partial regression coefficient of y on x1 when x2 is held constant.
c = partial regression coefficient of y on x2 when x1 is held constant.
Note that a, b, c are parameters of the equation whose values are to be determined.
Using the principle of least squares estimation, the normal equations of (i) are
∑y = na + b∑x1 + c∑x2 ….(ii)
∑x1y = a∑x1 + b∑x1² + c∑x1x2 ….(iii)
∑x2y = a∑x2 + b∑x1x2 + c∑x2² ….(iv)
Solving (ii), (iii), (iv), we get the values of a, b, c.
Substituting the values of a, b, c in (i), we get the required multiple regression equation of y on x1 and x2.
The multiple regression equation of dependent variable Y on k independent variables x1, x2, x3, …, xk is given by
Y = a + b1 x1 + b2 x2 + … + bk xk …(v)
Writing the intercept as α and the partial regression coefficients as b1 and b2, the normal equations with two independent variables are
∑Y = nα + b1∑X1 + b2∑X2
∑X1Y = α∑X1 + b1∑X1² + b2∑X1X2
∑X2Y = α∑X2 + b1∑X1X2 + b2∑X2²
The values of α, b1 and b2 are obtained by solving these three equations.
Source of variation   Sum of squares (SS)   Degrees of freedom (df)   Mean square (MS)          F
Regression            SSR                   k                         MSR = SSR/k               F = MSR/MSE
Error (Residual)      SSE                   n – k – 1                 MSE = SSE/(n – k – 1)
Total                 SST                   n – 1
t-test for significance in multiple linear regression model
This test is applied to test whether an individual regression coefficient ßj is statistically significant or not.
Setting of Hypothesis
Null Hypothesis, H0: ßj = 0 (There is no significant relationship between the dependent variable and the j-th independent variable)
Alternative Hypothesis, H1: ßj ≠ 0 (There is a significant relationship between the dependent variable and the j-th independent variable)
Level of significance
Usually, α = 5% if not given.
Question
A random sample of 5 families yields the following data.

Family                    A   B   C   D   E
Saving (Y) (Rs. 000)      6   12  10  7   3
Income (X1) (Rs. 000)     8   11  9   6   6
No. of children (X2)      5   2   1   3   4

I. Estimate the regression equation of Y on X1 and X2.

II. Find the coefficient of multiple determination and interpret its meaning.
Now the normal equations are
∑Y = nα + b1∑X1 + b2∑X2
∑X1Y = α∑X1 + b1∑X1² + b2∑X1X2
∑X2Y = α∑X2 + b1∑X1X2 + b2∑X2²

Y    Y²    X1   X2   X1²   X2²   X1X2   X1Y   X2Y
6    36    8    5    64    25    40     48    30
12   144   11   2    121   4     22     132   24
10   100   9    1    81    1     9      90    10
7    49    6    3    36    9     18     42    21
3    9     6    4    36    16    24     18    12
∑Y = 38, ∑Y² = 338, ∑X1 = 40, ∑X2 = 15, ∑X1² = 338, ∑X2² = 55, ∑X1X2 = 113, ∑X1Y = 330, ∑X2Y = 97

Now the normal equations become

38 = 5α + 40 b1 + 15 b2
330 = 40α + 338 b1 + 113 b2
97 = 15α + 113 b1 + 55 b2
Solving the normal equations,
α = 1.829, b1 = 1.076 and b2 = -0.946
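The three normal equations can be built from the data and solved mechanically. The sketch below uses exact fractions and a small Gauss-Jordan elimination written for illustration (solve_linear_system is a hypothetical helper, not a library function). Any linear-equation solver would do the same job.

```python
from fractions import Fraction

y = [6, 12, 10, 7, 3]    # saving
x1 = [8, 11, 9, 6, 6]    # income
x2 = [5, 2, 1, 3, 4]     # number of children
n = len(y)

def dot(u, v):
    """Sum of products of two equally long lists."""
    return sum(p * q for p, q in zip(u, v))

# Coefficient matrix and right-hand side of the normal equations.
lhs = [
    [n, sum(x1), sum(x2)],
    [sum(x1), dot(x1, x1), dot(x1, x2)],
    [sum(x2), dot(x1, x2), dot(x2, x2)],
]
rhs = [sum(y), dot(x1, y), dot(x2, y)]

def solve_linear_system(a, b):
    """Solve a * z = b by Gauss-Jordan elimination with exact fractions."""
    size = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(a, b)]
    for col in range(size):
        pivot_row = next(r for r in range(col, size) if m[r][col] != 0)
        m[col], m[pivot_row] = m[pivot_row], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(size):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[-1] for row in m]

alpha, b1, b2 = solve_linear_system(lhs, rhs)
print(f"alpha = {float(alpha):.3f}, b1 = {float(b1):.3f}, b2 = {float(b2):.3f}")
```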
Source of variation   Sum of squares (SS)   Degrees of freedom (df)       Mean square (MS)           F
Regression            SSR = 44.076          k = 2                         MSR = 44.076/2 = 22.038    F = 22.038/2.562 = 8.60
Error (Residual)      SSE = 5.124           n – k – 1 = 5 – 2 – 1 = 2     MSE = 5.124/2 = 2.562
Total                 SST = 49.2            n – 1 = 5 – 1 = 4

Fcal = 8.60, and Ftab at the 5% level of significance with (k, n – k – 1) = (2, 2) degrees of freedom = 19.00
Decision
Fcal (8.60) < Ftab (19.00), so H0 is not rejected.
Conclusion
At the 5% level of significance, there is no evidence of a significant linear relationship between the dependent variable and the independent variables.
When the overall F-test is significant and the question asks which independent variable is significant, an individual t-test is carried out for each coefficient, as in the simple linear regression model.
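The sums of squares, the F statistic and the coefficient of multiple determination for the family data can be checked as follows. This is a sketch: it solves for b1 and b2 from the mean-corrected sums by Cramer's rule, and the tabulated value 19.00 for (2, 2) degrees of freedom at α = 0.05 is assumed from a standard F table.

```python
y = [6, 12, 10, 7, 3]    # saving
x1 = [8, 11, 9, 6, 6]    # income
x2 = [5, 2, 1, 3, 4]     # number of children
n, k = len(y), 2

def corrected(u, v):
    """Mean-corrected sum of products, e.g. Sxy = sum(xy) - sum(x)sum(y)/n."""
    return sum(p * q for p, q in zip(u, v)) - sum(u) * sum(v) / n

s11, s22, s12 = corrected(x1, x1), corrected(x2, x2), corrected(x1, x2)
s1y, s2y, syy = corrected(x1, y), corrected(x2, y), corrected(y, y)

# Slopes from the two mean-corrected normal equations (Cramer's rule).
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det

sst = syy
ssr = b1 * s1y + b2 * s2y
sse = sst - ssr
f_cal = (ssr / k) / (sse / (n - k - 1))
r_squared = ssr / sst

F_TAB = 19.00  # assumed table value: alpha = 0.05, (2, 2) df

print(f"SSR={ssr:.3f} SSE={sse:.3f} SST={sst:.1f}")
print(f"F = {f_cal:.2f}, reject H0: {f_cal > F_TAB}, R^2 = {r_squared:.4f}")
```

R² of about 0.896 says that roughly 90% of the variation in saving is explained by income and number of children in this sample, even though with only 5 families the F-test does not reach significance at the 5% level.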
Question
1. Police stations across the country are interested in predicting the number of arrests they can expect to process each month, so as to better schedule office employees. Historically, the average number of arrests (Y) each month is influenced by the number of officers on the police force (X1), the population of the city in thousands (X2) and the percentage of unemployed people in the city (X3). The partial SPSS output for these factors in 15 cities is presented below.
Coefficient Table
            Coefficients (bi)   Standard error (sbi)   t       P-value
Constant    142.4363            25.96474               5.49    <0.001
X1          3.2741              0.2814354              11.64   <0.001
X2          0.5269              0.4693494              1.12    0.287
X3          -0.3203             1.295351               -0.24   0.812

ANOVA Table
Source of variation   Sum of squares   df   Mean Square   F
Regression            230500.663       3    76833.5544    246.41 (p-value <0.001)
Residual              3429.9279        11   311.811627
Total                 233930.591       14   16709.3279
a. Using the above output, determine the best-fitting regression equation.
b. What percentage of the total variation in the number of arrests (Y) is explained by this equation?
c. The police department in a city is trying to predict the number of monthly arrests. The city has a population of 75,000, a police force of 82 and an unemployment percentage of 10.5 percent. How many arrests do you predict for each month?
Solution
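Parts b and c can be read directly off the printed output. The sketch below is one way to do the arithmetic, assuming the fitted equation uses all three printed coefficients and that the population of 75,000 enters as X2 = 75 because X2 is measured in thousands.

```python
# Coefficients as printed in the coefficient table.
coef = {"const": 142.4363, "x1": 3.2741, "x2": 0.5269, "x3": -0.3203}

# Sums of squares as printed in the ANOVA table.
ss_regression = 230500.663
ss_total = 233930.591

# Part b: proportion of total variation explained by the equation.
r_squared = ss_regression / ss_total

# Part c: 82 officers, population 75 (thousands), 10.5% unemployment.
arrests = (coef["const"] + coef["x1"] * 82
           + coef["x2"] * 75 + coef["x3"] * 10.5)

print(f"R^2 = {r_squared:.4f} ({100 * r_squared:.1f}% of variation explained)")
print(f"predicted monthly arrests: {arrests:.1f}")
```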
2. The following is the partially developed SPSS output of a multiple regression where the outcome variable (Y) represents the scores made by 10 assembly line employees on a test designed to measure job satisfaction. The scores are affected by two factors: an aptitude test (X1) and the number of days absent (X2) during the past year (excluding vacation).
Coefficients Table
                              Coefficients (bi)   Standard error (sbi)   t
Intercept                     36.2083             7.3441                 ?
Aptitude test (X1)            5.3882              0.9900                 ?
Number of days absent (X2)    -1.6191             0.3909                 ?
ANOVA Table
Source of variation   Sum of squares   df   Mean Square   F
Regression            1016.26949       2    ?             ?
Residual              62.6305138       ?    ?
Total                 1078.9           9
Answer the following questions.
I. Complete the above ANOVA table and coefficient table.
II. Fit a multiple regression model and predict the value of Y when the aptitude test score is 7 and the number of days absent is 6.
III. Is there any significant relationship between the dependent variable and the two independent variables? (Test at the 5% level of significance.)
IV. Test the significance of the estimated regression coefficient of X2 at the 5% significance level.
V. What proportion of the variation in scores (Y) is explained by the two independent variables?
VI. Compute the standard error of estimate and interpret its meaning.
Coefficients Table (completed)
                              Coefficients (bi)   Standard error (sbi)   t
Intercept                     36.2083             7.3441                 4.93
Aptitude test (X1)            5.3882              0.9900                 5.44
Number of days absent (X2)    -1.6191             0.3909                 -4.14
ANOVA Table (completed)
Source of variation   Sum of squares   df   Mean Square   F
Regression            1016.26949       2    508.1347      56.79
Residual              62.6305138       7    8.9472
Total                 1078.9           9
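The missing entries, and the quantities asked for in parts II, V and VI, follow from the printed figures by simple arithmetic. A minimal sketch, with illustrative variable names:

```python
import math

# Printed coefficients and standard errors: (b_i, s_bi).
table = {
    "intercept": (36.2083, 7.3441),
    "aptitude": (5.3882, 0.9900),
    "absent": (-1.6191, 0.3909),
}
ss_reg, ss_res, ss_total = 1016.26949, 62.6305138, 1078.9
df_reg, df_total = 2, 9

# t statistic for each coefficient: t = b_i / s_bi.
t_stats = {name: b / s for name, (b, s) in table.items()}

df_res = df_total - df_reg
ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res
f_cal = ms_reg / ms_res

r_squared = ss_reg / ss_total    # part V
std_error = math.sqrt(ms_res)    # part VI: standard error of estimate

# Part II: predicted score for aptitude = 7 and days absent = 6.
y_hat = table["intercept"][0] + table["aptitude"][0] * 7 + table["absent"][0] * 6

print({name: round(t, 2) for name, t in t_stats.items()})
print(f"df_res={df_res} MSR={ms_reg:.4f} MSE={ms_res:.4f} F={f_cal:.2f}")
print(f"R^2={r_squared:.4f} se={std_error:.3f} predicted Y={y_hat:.2f}")
```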
Regression Analysis Using Excel ToolPak
Run the regression and get the output.
The output has three components:
• Regression Statistics table
• ANOVA (analysis of variance)
• Regression coefficient table

Regression Statistics Table
Regression Statistics   Value         Explanation
Multiple R              0.913765468   R = square root of R²
R Square                0.834967331   R²
Adjusted R Square       0.768954263   Adjusted R², used if there is more than one x variable
Standard Error          1.027720333   Sample estimate of the standard deviation of the error u
Observations            8             Number of observations used in the regression
Interpretation of the ANOVA table
ANOVA
             df   SS            MS            F             Significance F (p-value)
Regression   2    26.71895459   13.35927295   12.64832192   0.011064322
Residual     5    5.281045415   1.056209083
Total        7    32
It splits the total sum of squares into its components.
R² = 1 – Residual SS/Total SS (general formula for R²)
= 1 – 5.28104/32 (data from ANOVA table)
= 0.83496733 (same as in regression statistics table)
The column labeled F gives the overall F-test of H0: ß2 = 0 and ß3 = 0 versus H1: at least one of ß2 and ß3 is not equal to zero.
F = [regression SS / k] / [residual SS / (n – k – 1)] = [26.71895459/2] / [5.281045415/(8 – 2 – 1)] ≈ 12.65
F tabulated at the 5% level of significance with (2, 5) degrees of freedom = 5.79
The column labeled Significance F has the associated p-value.
Since 0.01106 < 0.05, we reject H0 at significance level 0.05. (This means the model has some validity.)
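The figures in the regression statistics and ANOVA tables are tied together by a few formulas, which the sketch below recomputes from the two sums of squares. The adjusted R² formula 1 – (1 – R²)(n – 1)/(n – k – 1) is the standard one and is assumed here, since the slides do not state it. The tabulated F value 5.79 is the one quoted in the text.

```python
import math

ss_regression = 26.71895459
ss_residual = 5.281045415
n, k = 8, 2                       # observations, independent variables

ss_total = ss_regression + ss_residual
r_squared = 1 - ss_residual / ss_total
multiple_r = math.sqrt(r_squared)
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

ms_residual = ss_residual / (n - k - 1)
std_error = math.sqrt(ms_residual)   # "Standard Error" in the statistics table
f_cal = (ss_regression / k) / ms_residual

F_TAB = 5.79  # tabulated value at 5% with (2, 5) df, as quoted in the text

print(f"R = {multiple_r:.4f}, R^2 = {r_squared:.4f}, adj R^2 = {adj_r_squared:.4f}")
print(f"standard error = {std_error:.4f}")
print(f"F = {f_cal:.2f}, reject H0: {f_cal > F_TAB}")
```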
Interpretation of the Regression Coefficients Table
                       Coefficients   Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept              -1.788017424   1.810519204      -0.987571642   0.368711965   -6.442105202   2.866070354
Initial weight (lbs)   0.14036806     0.030423539      4.613797827    0.005767709   0.062161862    0.218574258
Initial age (weeks)    0.249729367    0.177666545      1.405607161    0.218830913   -0.206977025   0.70643576
Column “Coefficients” gives the least squares estimates bj of ßj.
Column “Standard Error” gives the standard errors (estimated standard deviations) of the least squares estimates bj of ßj.
Column “t Stat” gives the computed t-statistics for H0: ßj = 0 against H1: ßj ≠ 0.
Column “P-value” gives the p-value for the test of H0: ßj = 0 against H1: ßj ≠ 0.
Columns “Lower 95%” and “Upper 95%” define a 95% confidence interval for ßj.

A simple summary of the above output is that the fitted equation is

Weight gain (Y) = -1.788 + 0.1404 × Initial weight + 0.2497 × Initial age
Confidence intervals for slope coefficients
• 95% confidence interval for slope coefficient ß2 is (0.06216, 0.21857).
• 95% confidence interval for slope coefficient ß3 is (-0.20697, 0.70643).

Test of statistical significance
• The coefficient of initial weight has an estimated standard error of 0.0304, a t statistic of 4.6137 and a p-value of 0.005767. It is therefore statistically significant at significance level 0.05, as p < 0.05.
• The coefficient of initial age has an estimated standard error of 0.1776, a t statistic of 1.4056 and a p-value of 0.21883. It is therefore statistically insignificant at significance level 0.05, as p > 0.05.
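The t statistics and 95% confidence limits in the coefficients table can be rebuilt from the coefficient and standard error columns alone. In this sketch the critical value 2.5706 is assumed from a standard two-tailed t table (α = 0.05, n – k – 1 = 5 degrees of freedom).

```python
T_CRIT = 2.5706  # assumed table value: two-tailed, alpha = 0.05, 5 df

# (coefficient, standard error) as printed in the Excel output.
rows = {
    "initial weight": (0.14036806, 0.030423539),
    "initial age": (0.249729367, 0.177666545),
}

results = {}
for name, (coef, std_err) in rows.items():
    t_stat = coef / std_err
    lower = coef - T_CRIT * std_err
    upper = coef + T_CRIT * std_err
    # A coefficient is significant at 5% when |t| exceeds the critical
    # value, which is the same as the interval excluding zero.
    significant = abs(t_stat) > T_CRIT
    results[name] = (t_stat, lower, upper, significant)
    print(f"{name}: t = {t_stat:.4f}, CI = ({lower:.5f}, {upper:.5f}), "
          f"significant: {significant}")
```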
Regression Analysis Using Excel ToolPak: Setup
• Excel 2007 has a built-in regression analysis tool that is packaged as part of its “Analysis ToolPak”.
• The Analysis ToolPak is a standard component of Excel, supplied as an add-in.
To install this add-in
• Open Excel, go to the option to modify the installation and choose the Analysis ToolPak as an add-in that should be activated.
To start the add-in
• Open Excel, go to the “Data” tab, select “Data Analysis” and choose “Regression” to run the regression analysis.
Steps to follow when running a regression with Excel
• The independent X variables used in the analysis must be located together in the worksheet. There must be no blank columns or columns with non-relevant data interrupting the range of X variables.
• The dependent Y variable need not be located adjacent to the X variables, but all Y variable values must be in a single range.
Example
Given the following set of data:

y    25   30   11   22   27   19
x1   3.5  6.7  1.5  0.3  4.6  2.0
x2   5.0  4.2  8.5  1.4  3.6  1.3

Calculate the multiple regression plane.
Predict y when x1 = 3.0 and x2 = 2.7
Predict x1 when y = 13 and x2 = 2.7
Predict x2 when y = 13 and x1 = 4.3

Correlation and Regression

Thank you

No ratings yet
Pengaruh Citra Merek (Brand Image) Terhadap Loyalitas Konsumen Bakso Bakar Pak Man Kota Malang
7 pages
Saint Gba334 Module 3 Quiz 2
0% (1)
Saint Gba334 Module 3 Quiz 2
2 pages
PGDFT Question Papers
100% (1)
PGDFT Question Papers
27 pages
Non Spherical Disturbances - Heteroskedasticity 1
No ratings yet
Non Spherical Disturbances - Heteroskedasticity 1
12 pages
MCQ Regression and Correlation With Correct Answers 1
100% (1)
MCQ Regression and Correlation With Correct Answers 1
9 pages
4 ML
No ratings yet
4 ML
41 pages
Beck and Katz 2011
No ratings yet
Beck and Katz 2011
28 pages
OPIANA - MIDTERM+Problem-set-4-5-6-7-and-8 - 9-10
No ratings yet
OPIANA - MIDTERM+Problem-set-4-5-6-7-and-8 - 9-10
73 pages
ED 801 Module 4 Answers
100% (1)
ED 801 Module 4 Answers
23 pages
E-Cld-6003 - Astm - E74
No ratings yet
E-Cld-6003 - Astm - E74
12 pages
Board Diversity and Financial Performance PDF
100% (1)
Board Diversity and Financial Performance PDF
15 pages

Correlation And Regression Analysis

While if the sign is negative this means an inverse or indirect

• The value of r denotes the strength of the association as

• If r = 0 this means no association or correlation between the

It is a non-parametric measure of correlation. (reason it does not assume that the

Spearman Rank correlation coefficient could be computed in the following cases:

Both variables are quantitative.

Both variables are qualitative ordinal.

One variable is quantitative and the other is qualitative ordinal.

2. Rank the values of Y from 1 to n.

3. Compute the value of $d_i = (Y_i - X_i)$ for each pair of observations by

Find the coefficient of rank correlation.

Find the coefficient of rank correlation.

$\sum d_i^2 = 0 + 0 + 0 + 0 + 0.25 + 0.25 + 0 + 0 + 0 + 0 = 0.5$
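The rank-correlation calculation can be sketched in Python as below. This is only an illustration under stated assumptions: it uses the common formula $r_s = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$ without any tie correction, and it takes n = 10 from the ten terms in the sum above. The function names `average_ranks`, `spearman_rho` and `rho_from_sum_d2` are hypothetical, and tied values are given the average of the ranks they occupy.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rho_from_sum_d2(sum_d2, n):
    """Spearman coefficient from the sum of squared rank differences (no tie correction)."""
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def spearman_rho(x, y):
    """Rank both variables, form d_i = rank(y_i) - rank(x_i), then apply the formula."""
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((b - a) ** 2 for a, b in zip(rx, ry))
    return rho_from_sum_d2(sum_d2, len(x))


# Using the sum from the example above, assuming n = 10 pairs.
r_s = rho_from_sum_d2(0.5, 10)
print(round(r_s, 4))
```

If the worked example applies a correction for tied ranks, its final value may differ slightly from this uncorrected figure.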

The process of predicting one variable using another, known variable.

The process of predicting variable y using variable x.

Uses a variable x to predict some outcome variable y.

The process of predicting variable x using variable y.

Uses a variable y to predict some outcome variable x.

Tells you how values in y change as a function of changes in values of x

Regression may be simple or multiple

• $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$

• $S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$

• $S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$

• $b_{xy} = \frac{S_{xy}}{S_{yy}}$
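The corrected sums of squares and products above translate directly into code. The sketch below uses a small hypothetical data set (not taken from the slides) and a hypothetical helper name `corrected_sums`. It also computes $b_{yx} = S_{xy}/S_{xx}$, the usual counterpart for the regression of y on x, which is an assumption here because that formula does not appear in the lines above.

```python
def corrected_sums(x, y):
    """Return (Sxx, Syy, Sxy) using the computational formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sxx = sum(v * v for v in x) - sum_x ** 2 / n
    syy = sum(v * v for v in y) - sum_y ** 2 / n
    sxy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    return sxx, syy, sxy


# Hypothetical illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

sxx, syy, sxy = corrected_sums(x, y)
bxy = sxy / syy  # regression coefficient of x on y
byx = sxy / sxx  # regression coefficient of y on x (assumed counterpart)

print(sxx, syy, sxy, bxy, byx)
```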

You can also use the

Error SSE n-k-1

I. Estimate the regression equation of Y on X1 and X2.

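$Y$  $Y^2$  $X_1$  $X_2$  $X_1^2$  $X_2^2$  $X_1X_2$  $X_1Y$  $X_2Y$

$\sum Y = 38$  $\sum Y^2 = 33$  $\sum X_1 = 40$  $\sum X_2 = 15$  $\sum X_1^2 = 33$  $\sum X_2^2 = 55$  $\sum X_1X_2 =$  $\sum X_1Y = 3$  $\sum X_2Y = 9$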

Now the normal equation becomes
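For a model $Y = a + b_1X_1 + b_2X_2$, the normal equations are usually written as $\sum Y = na + b_1\sum X_1 + b_2\sum X_2$, $\sum X_1Y = a\sum X_1 + b_1\sum X_1^2 + b_2\sum X_1X_2$ and $\sum X_2Y = a\sum X_2 + b_1\sum X_1X_2 + b_2\sum X_2^2$. The sketch below builds and solves this system in plain Python. The data are hypothetical (the column totals in this copy are incomplete, so they are not used), and the helper names `solve_linear` and `fit_two_regressors` are illustrative only.

```python
def solve_linear(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_two_regressors(y, x1, x2):
    """Return (a, b1, b2) from the three normal equations."""
    n = len(y)
    s = lambda u, v: sum(p * q for p, q in zip(u, v))  # sum of products
    lhs = [
        [n,       sum(x1),   sum(x2)],
        [sum(x1), s(x1, x1), s(x1, x2)],
        [sum(x2), s(x1, x2), s(x2, x2)],
    ]
    rhs = [sum(y), s(x1, y), s(x2, y)]
    return solve_linear(lhs, rhs)


# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]

a, b1, b2 = fit_two_regressors(y, x1, x2)
print(round(a, 6), round(b1, 6), round(b2, 6))
```

Because the hypothetical data lie exactly on a plane, the recovered coefficients should match the generating values up to floating-point rounding.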

Error: SSE = 5.124, d.f. = n – k – 1 = 5 – 2 – 1 = 2
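In a standard ANOVA table, the error mean square is the error sum of squares divided by its degrees of freedom. A minimal sketch using the figures from the row above (the function name `error_mean_square` is hypothetical):

```python
def error_mean_square(sse, n, k):
    """MSE = SSE / (n - k - 1), where k is the number of independent variables."""
    df_error = n - k - 1
    if df_error <= 0:
        raise ValueError("need n > k + 1 observations")
    return sse / df_error


# Values from the error row above: SSE = 5.124, n = 5, k = 2.
mse = error_mean_square(5.124, n=5, k=2)
print(mse)
```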

• ANOVA (analysis of variance)

• Regression coefficient table

A simple summary of the above output is that the fitted line is

Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%

Test of statistical significance

x2 5.0 4.2 8.5 1.4 3.6 1.3
