Module9-Correlation and Regression (Business)
Correlation and Regression
Objectives:
In everyday discourse, statements about the relationship between variables are often accepted
without question: age and physical capacity, income and educational attainment, intelligence and
academic performance, cigarette smoking and lung disease, unemployment and the condition of the
economy, and so on. In almost every field, we find that one variable is related in some way to
another. It should be noted, however, that a relationship does not mean causality. That is, a
relationship between two variables does not necessarily imply that one is the cause of the other.
The investigation of two or more variables requires procedures not only for defining and
measuring the variables under study, but also for describing the nature of the relationships
between them. One procedure for determining the relationship between variables is correlation,
which measures the degree of relationship between variables. The statistic used to describe this
degree or magnitude is called the correlation coefficient (r), which conveys both the direction
and the magnitude of the relationship.
Correlations may be classified in terms of magnitude and direction. The degree or magnitude may
be described as perfect, high, moderate or low. The direction may be classified as positive
correlation, negative correlation or zero correlation. A positive correlation means that there is
a direct relationship between the variables. It exists when high values in one variable are
associated with high values in the other variable, and low values in one variable are associated
with low values in the other variable. A negative correlation, on the other hand, exists when
high values in one variable are associated with low values in the other variable, and vice versa.
A zero correlation exists when high values in one variable are associated with neither
systematically high nor systematically low values in the other variable.
Business Statistics
The correlation scale and the corresponding interpretation of r are as follows.
Value of r        Interpretation
±1.00             perfect correlation
±.80 to ±.99      high correlation
±.60 to ±.79      moderately high correlation
±.40 to ±.59      moderate correlation
±.20 to ±.39      low correlation
±.01 to ±.19      negligible correlation
0                 zero correlation
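The scale above can be sketched as a small lookup function. This is only an illustration: the function name `interpret_r` is hypothetical, and the handling of values that fall between two rows of the table (for example 0.795) is an assumption, since the table does not say which band they belong to.

```python
def interpret_r(r: float) -> str:
    """Map a correlation coefficient to the verbal scale in the table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size == 0:
        return "zero correlation"
    # Assumption: a value that falls between two rows of the table
    # (for example 0.795) is assigned to the lower band.
    if size == 1:
        label = "perfect"
    elif size >= 0.80:
        label = "high"
    elif size >= 0.60:
        label = "moderately high"
    elif size >= 0.40:
        label = "moderate"
    elif size >= 0.20:
        label = "low"
    else:
        label = "negligible"
    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction} correlation"


print(interpret_r(0.958))   # high positive correlation
print(interpret_r(-0.45))   # moderate negative correlation
```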
The degree of relationship between two variables may be represented using a scatter diagram. If
the points in the scatter diagram follow a straight line, this suggests a linear relationship.
However, not all relationships are linear. If a curved line fits the points in the scatterplot of
the X and Y variables better than a straight line does, the relationship is curvilinear.
Pearson Product-Moment Correlation Coefficient
The most widely used measure of correlation is the Pearson product-moment correlation
coefficient, or simply Pearson r, which was developed by Karl Pearson. This statistic is used for
interval and ratio data. If two variables, X and Y, are under investigation, the correlation
coefficient is determined by:
$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$
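As a minimal sketch, the formula above can be computed directly from paired observations. The function name `pearson_r` and the sample data are hypothetical and serve only to illustrate the arithmetic.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r from paired observations, using the computational formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Hypothetical illustrative data, not taken from the module
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
print(round(pearson_r(hours, score), 4))  # 0.7746
```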
Example:
Solution:
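$$
\begin{aligned}
r &= \frac{10(71208) - (835)(852)}{\sqrt{[10(69901) - (835)^2][10(72650) - (852)^2]}} \\
  &= \frac{712080 - 711420}{\sqrt{[699010 - 697225][726500 - 725904]}} \\
  &= \frac{660}{\sqrt{[1785][596]}} \\
  &\approx 0.64
\end{aligned}
$$
On the correlation scale, r = 0.64 indicates a moderately high positive correlation.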
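The substitution above can be checked numerically. The sketch below takes the sums used in the solution (n = 10, ∑X = 835, ∑Y = 852, ∑XY = 71208, ∑X² = 69901, ∑Y² = 72650); the function name `pearson_r_from_sums` is hypothetical.

```python
from math import sqrt


def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson r from the summary totals used in the worked solution."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r_from_sums(n=10, sum_x=835, sum_y=852,
                        sum_xy=71208, sum_x2=69901, sum_y2=72650)
print(round(r, 2))  # 0.64
```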
Name: ____________________________________________ Date: ____________________
Activity 1
I. Indicate the direction of the correlation between two variables as positive, negative, or zero
correlation.
___________________1. Grade of a student and number of hours spent in studying
___________________2. Price of a computer unit and monthly water consumption
___________________3. Stress level and blood pressure of a patient
___________________4. Heights of husbands and income of their wives
___________________5. Unemployment rate and interest rates
II. An insurance company wants to know how the amount of life insurance depends on the
income of persons. The research department of the company collected information on ten
policy holders. The following table lists the monthly incomes (in thousand pesos) and
amounts (in million pesos) of their life insurance policies.
Policy Holder   Monthly Income (thousand pesos)   Life Insurance (million pesos)
01              62                                4.0
02              75                                5.3
03              50                                5.0
04              34                                2.0
05              28                                2.2
06              23                                1.5
07              34                                1.8
08              34                                2.0
09              67                                4.5
10              39                                3.4
Find the correlation coefficient between income and insurance amount using Pearson r. Interpret
the result.
Spearman rho (𝝆) Rank Correlation
Spearman rho (𝜌) is a rank correlation coefficient. It is used to find out if there is a
significant relationship between two variables of ordinal type. The formula of Spearman rho is as
follows:
$$\rho = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$
where D is the difference between the paired ranks and n is the number of pairs.
Example:
Using Spearman rho, determine the relationship between the capital and profit of
cinnamon rolls of a certain store at 0.05 level of significance.
Solution:
$$
\begin{aligned}
\rho &= 1 - \frac{6(7)}{10(10^2 - 1)} \\
     &= 1 - \frac{42}{10(100 - 1)} \\
     &= 1 - \frac{42}{990} \\
     &= 0.958
\end{aligned}
$$
There is a high positive correlation between capital and profit of cinnamon rolls.
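The Spearman computation can be sketched in code as follows. The first call reproduces the arithmetic of the solution above (∑D² = 7, n = 10); the second ranks a small set of made-up values. All function names are hypothetical, and the sketch assumes there are no tied values, since the simple formula is exact only in that case.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes no tied values."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch does not handle tied values")
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def rho_from_sum_d2(sum_d2, n):
    """Spearman rho from the sum of squared rank differences."""
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


def spearman_rho(x, y):
    """Spearman rho from paired raw observations."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired")
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return rho_from_sum_d2(sum_d2, len(x))


# Check of the worked solution: sum of D^2 = 7, n = 10
print(round(rho_from_sum_d2(7, 10), 3))  # 0.958

# Made-up illustrative data, not taken from the module
x_demo = [10, 20, 30, 40, 50]
y_demo = [1, 3, 2, 5, 4]
print(round(spearman_rho(x_demo, y_demo), 3))  # 0.8
```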
Name: ____________________________________________ Date: ____________________
Activity 2
1. Using Spearman rho, determine the relationship between the weight and height of
babies who were admitted to a certain hospital at 0.05 level of significance.
Name: ______________________________________________ Date: __________________
Activity 3
A. Pearson’s r
Determine if there is a relationship between the company employees' half-year ratings and
their ratings for the second half of the year. Test at 0.05 significance level.
B. Spearman rho
Linear Regression of Y on X
A regression line is a model that simplifies the relationship between two variables by
fitting a line through the center of the scatterplot of the data, so that the line serves as a
two-dimensional center of the data.
In general, the equation of a line is given by Y = bX + a, where a and b are constants
and b ≠ 0. The constant a is the distance on the Y axis from the origin to the point where the
line cuts the Y axis, in other words, the y-intercept. The quantity b is the slope of the line.
The slope of a line is the ratio of the vertical distance to the horizontal distance between any
two points on the line. It describes the rate of change in Y as X increases.
If X and Y are correlated variables, we can predict or estimate the value of Y given the
value of X by finding the regression equation. The regression line of Y on X is represented by the
equation
$$Y' = b_{yx}X + a_{yx}$$
In the regression equation, the regression coefficient is simply the slope of the regression
line. It represents the change in Y for every one unit change in X.
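To obtain the values of the coefficients byx and ayx, the following formulas may be used:
$$b_{yx} = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}$$
$$a_{yx} = \frac{\sum Y - b_{yx}\sum X}{n}$$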
Given that 10 students have taken the college admission test (X) and have a general
weighted average (Y), the computed values from the data are as follows:
n = 10, ∑X = 745, ∑Y = 836, ∑XY = 62945, ∑X² = 57269
What is the estimated general weighted average of a student who scored 95 in the college
admission test?
Solution:
To find the coefficients byx and ayx, substitute the computed values in the formula:
$$b_{yx} = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} = \frac{(10)(62945) - (745)(836)}{10(57269) - (745)^2} = 0.375$$
The equation of the regression line is given by:
$$Y' = b_{yx}X + a_{yx}$$
$$Y' = 0.375X + 55.64$$
If X = 95, then:
$$Y' = 0.375(95) + 55.64 = 91.265 \approx 91$$
Therefore, if a student scored 95 in the college admission test, the student's estimated or
predicted general weighted average at the end of the semester is 91.
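The regression example can be re-computed from the sums that appear in the substitution above (n = 10, ∑X = 745, ∑Y = 836, ∑XY = 62945, ∑X² = 57269). This is a sketch only: the function name is hypothetical, and the intercept formula a = (∑Y − b∑X)/n is an assumption (the standard least-squares form), chosen because it reproduces the intercept 55.64 used in the regression equation.

```python
def regression_coefficients(n, sum_x, sum_y, sum_xy, sum_x2):
    """Slope and intercept of the regression line of Y on X from summary totals."""
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Assumption: standard least-squares intercept, a = (sum Y - b * sum X) / n
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


b, a = regression_coefficients(n=10, sum_x=745, sum_y=836,
                               sum_xy=62945, sum_x2=57269)
predicted = b * 95 + a  # estimated general weighted average for X = 95

print(round(b, 3), round(a, 2), round(predicted))  # 0.375 55.64 91
```

Using the unrounded slope gives a prediction of about 91.3, which rounds to the same value of 91 obtained with the rounded coefficients.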
Name: ____________________________________________ Date: ____________________
Activity 4
Name: ____________________________________________ Date: ____________________
Activity 5
I. Indicate the direction of the correlation between two variables as positive, negative, or zero
correlation.