
Correlation

The document explains correlation as a statistical technique that measures the relationship between pairs of variables, such as height and weight, and introduces regression analysis as a method to predict outcomes based on independent variables. It outlines the assumptions of linear regression and provides a formula for calculating the regression equation, alongside examples of its application. Additionally, it discusses the correlation coefficient and the use of scatter diagrams to visualize relationships between variables.

Uploaded by

kylajayne205

CORRELATION

CORRELATION:

• Correlation is a statistical technique that can show whether and how
strongly pairs of variables are related. For example, height and
weight are related; taller people tend to be heavier than shorter
people. The relationship isn't perfect. People of the same height vary
in weight, and you can easily think of two people you know where the
shorter one is heavier than the taller one. Nonetheless, the average
weight of people 5'5" is less than the average weight of people 5'6",
and their average weight is less than that of people 5'7", etc.
• Correlation can tell you just how much of the variation in people's
weights is related to their heights. Although this correlation is fairly
obvious, your data may contain unsuspected correlations. You may
also suspect there are correlations, but not know which are the
strongest. An intelligent correlation analysis can lead to a greater
understanding of your data.
LEARNING OUTCOMES:
After successful completion of this unit, you should be able to:
• explain the concept of regression;
• apply the concept of regression in solving problems;
• identify the advantages and disadvantages of regression;
• explain the concept of Chi-square;
• apply the concept of Chi-square in solving problems; and
• identify the advantages and disadvantages of Chi-square.
Regression Analysis – Linear model assumptions
Linear regression analysis is based on six fundamental assumptions:
1. The dependent and independent variables show a linear
relationship, described by a slope and an intercept.
2. The independent variable is not random.
3. The expected (mean) value of the residual (error) is zero.
4. The variance of the residual (error) is constant across all
observations.
5. The residuals (errors) are not correlated across observations.
6. The residual (error) values follow the normal distribution.
Regression Analysis – Simple linear regression
Simple linear regression is a model that assesses the relationship
between a dependent variable and an independent variable. The simple
linear model is expressed using the following equation:
Y = a + bX + ϵ
Where:
• Y – Dependent variable
• X – Independent (explanatory) variable
• a – Intercept
• b – Slope
• ϵ – Residual (error)
Example:
Last year, five randomly selected students took a math aptitude test before
they began their statistics course. The Statistics Department has three
questions.
▪ What linear regression equation best predicts statistics performance, based
on math aptitude scores?
▪ If a student made an 80 on the aptitude test, what grade would we expect
her to make in statistics?
▪ How well does the regression equation fit the data?
How to Find the Regression Equation
In the table below, the xi column shows scores on the aptitude test.
Similarly, the yi column shows statistics grades. The last two columns
show deviation scores - the difference between the student's score and
the average score on each test. The last two rows show sums and mean
scores that we will use to conduct the regression analysis.
• The regression equation is a linear equation of the form:
ŷ = b0 + b1x
To conduct a regression analysis, we need to solve for b0 and b1.
Computations are shown below. Notice that all of our inputs for the
regression analysis come from the above three tables.

First, we solve for the regression coefficient (b1):

b1 = Σ [ (xi - x̄)(yi - ȳ) ] / Σ [ (xi - x̄)² ]
b1 = 470 / 730
b1 = 0.644
• Once we know the value of the regression coefficient (b1), we can
solve for the regression intercept (b0):

b0 = ȳ - b1 * x̄
b0 = 77 - (0.644)(78)
b0 = 26.768

Therefore, the regression equation is: ŷ = 26.768 + 0.644x
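The computation above can be sketched in Python. The score table itself is not reproduced in this text, so the five (xi, yi) pairs below are an assumption: illustrative values chosen so that the means (78 and 77) and the sums (470 and 730) match the figures quoted above. The variable names are also illustrative.

```python
# Assumed data: five (aptitude, statistics) score pairs chosen to match
# the means (78, 77) and sums (470, 730) quoted in the text. The original
# table is not available here, so treat these values as illustrative.
x = [95, 85, 80, 70, 60]   # aptitude test scores (xi)
y = [85, 95, 70, 65, 70]   # statistics grades (yi)

n = len(x)
x_mean = sum(x) / n        # mean aptitude score
y_mean = sum(y) / n        # mean statistics grade

# Deviation scores: each score minus the mean of its column
dx = [xi - x_mean for xi in x]
dy = [yi - y_mean for yi in y]

sxy = sum(a * b for a, b in zip(dx, dy))   # sum of (xi - x_mean)(yi - y_mean)
sxx = sum(a * a for a in dx)               # sum of (xi - x_mean)^2

b1 = sxy / sxx                 # regression coefficient (slope)
b0 = y_mean - b1 * x_mean      # intercept

print(f"b1 = {b1:.3f}")   # b1 = 0.644
print(f"b0 = {b0:.3f}")   # b0 = 26.781
```

Carrying b1 at full precision gives b0 ≈ 26.781. The value 26.768 in the text comes from rounding b1 to 0.644 before computing b0.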


How to Use the Regression Equation:
Once you have the regression equation, using it is straightforward.
Choose a value for the independent variable (x), perform the
computation, and you have an estimated value (ŷ) for the dependent
variable.
In our example, the independent variable is the student's score on the
aptitude test. The dependent variable is the student's statistics grade. If
a student made an 80 on the aptitude test, the estimated statistics
grade (ŷ) would be:
ŷ = b0 + b1x
ŷ = 26.768 + 0.644x
ŷ = 26.768 + 0.644 * 80
ŷ = 26.768 + 51.52 = 78.288
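As a minimal sketch, the same prediction can be written as a small Python function. The function name predict_grade is hypothetical; the default coefficients are the rounded values b0 = 26.768 and b1 = 0.644 from the worked example.

```python
def predict_grade(aptitude_score, b0=26.768, b1=0.644):
    """Estimate a statistics grade from an aptitude score: y_hat = b0 + b1 * x."""
    return b0 + b1 * aptitude_score


# Estimated statistics grade for a student who scored 80 on the aptitude test
estimate = predict_grade(80)
print(round(estimate, 3))   # 78.288
```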
Correlation coefficient

The degree of association is measured by a correlation coefficient,
denoted by r. It is sometimes called Pearson's correlation coefficient
after its originator and is a measure of linear association.
If a curved line is needed to express the relationship, other and more
complicated measures of the correlation must be used.
The correlation coefficient is measured on a scale that varies from
+1 through 0 to -1.
Complete correlation between two variables is expressed by either +1
or -1. When one variable increases as the other increases, the
correlation is positive; when one decreases as the other increases, it is
negative. Complete absence of correlation is represented by 0. Figure
below gives some graphical representations of correlation.
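The text does not spell out a formula for r, so the sketch below uses the standard Pearson definition: the sum of the products of deviations divided by the square root of the product of the two sums of squared deviations. The function name pearson_r and the five score pairs are assumptions; the data are illustrative values consistent with the regression example's means (78 and 77) and sums (470 and 730).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    dx = [xi - x_mean for xi in x]
    dy = [yi - y_mean for yi in y]
    sxy = sum(a * b for a, b in zip(dx, dy))   # sum of deviation products
    sxx = sum(a * a for a in dx)               # sum of squared x deviations
    syy = sum(b * b for b in dy)               # sum of squared y deviations
    return sxy / math.sqrt(sxx * syy)


# Assumed, illustrative data (not taken from a table in this text)
aptitude = [95, 85, 80, 70, 60]
grades = [85, 95, 70, 65, 70]

r = pearson_r(aptitude, grades)
print(f"r   = {r:.3f}")       # r   = 0.693
print(f"r^2 = {r * r:.3f}")   # r^2 = 0.480
```

For these assumed values r is positive, matching the positive slope of the regression line. The square of r is the share of the variation in grades accounted for by the fitted line, which is one way to approach the question of how well the regression equation fits the data.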
Looking at data: scatter diagrams
When an investigator has collected two series of observations and wishes to
see whether there is a relationship between them, he or she should first
construct a scatter diagram. The vertical scale represents one set of
measurements and the horizontal scale the other. If one set of observations
consists of experimental results and the other consists of a time scale or
observed classification of some kind, it is usual to put the experimental
results on the vertical axis. These represent what is called the "dependent
variable". The "independent variable", such as time or height or some other
observed classification, is measured along the horizontal axis, or baseline.
The words "independent" and "dependent" could puzzle the beginner
because it is sometimes not clear what is dependent on what. This confusion
is a triumph of common sense over misleading terminology, because often
each variable is dependent on some third variable, which may or may not be
mentioned. It is reasonable, for instance, to think of the height of children as
dependent on age rather than the converse, but consider a positive
correlation between mean tar yield and nicotine yield of certain brands of
cigarette.
• The nicotine liberated is unlikely to have its origin in the tar: both vary in
parallel with some other factor or factors in the composition of the
cigarettes. The yield of the one does not seem to be "dependent" on the
other in the sense that, on average, the height of a child depends on his
age. In such cases it often does not matter which scale is put on which axis
of the scatter diagram. However, if the intention is to make inferences
about one variable from the other, the observations from which the
inferences are to be made are usually put on the baseline. As a further
example, a plot of monthly deaths from heart disease against monthly
sales of ice cream would show a negative association. However, it is hardly
likely that eating ice cream protects from heart disease! It is simply that the
mortality rate from heart disease is inversely related - and ice cream
consumption positively related - to a third factor, namely environmental
temperature.
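The ice cream example can be illustrated with a short Python sketch. All numbers below are hypothetical, invented only to show the pattern: ice cream sales rise with temperature, heart disease deaths fall with temperature, and so the two appear negatively correlated with each other even though neither causes the other. The helper pearson_r uses the standard Pearson formula.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    syy = sum((b - y_mean) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical monthly figures (made up for illustration only)
temperature = [2, 5, 9, 14, 19, 23, 26]          # mean temperature, deg C
ice_cream_sales = [22, 33, 52, 64, 81, 93, 112]  # arbitrary sales units
heart_deaths = [98, 95, 86, 84, 75, 71, 62]      # deaths per month

r_temp_ice = pearson_r(temperature, ice_cream_sales)    # positive
r_temp_deaths = pearson_r(temperature, heart_deaths)    # negative
r_ice_deaths = pearson_r(ice_cream_sales, heart_deaths) # negative

print(f"temperature vs ice cream sales: {r_temp_ice:+.2f}")
print(f"temperature vs heart deaths:    {r_temp_deaths:+.2f}")
print(f"ice cream sales vs heart deaths: {r_ice_deaths:+.2f}")
```

The negative sign of the last coefficient reflects the shared dependence on temperature, not a protective effect of ice cream.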
Example: Calculation of the correlation coefficient

https://www.youtube.com/watch?v=nUD04ka4goA
ASSESSMENT:
• Determine if there is a correlation between the size of the pulmonary
anatomical dead space and the height of a child. Make a scatter diagram
to show the heights and pulmonary anatomical dead spaces in the 15
children. Use a 5% level of significance.
