Chapter 2
Correlation (A Measure)
Does one variable increase as the other increases?
Direct correlation, e.g., skills and income.
Does one variable decrease as the other increases?
Inverse correlation, e.g., health problems and nutrition.
Correlation can also be viewed as a numerical measure of the degree of
relationship between two variables: strong or weak.
10-1 Correlation and Regression
Scatterplot:
It is a visual way to describe the nature of the relationship between
the variables x and y. It may show:
a positive linear relationship,
a negative linear relationship,
a curvilinear relationship,
or no relationship.
x: number of hours a student studies ⇒ y: grade the student received
Scatterplot:
the independent variable is plotted on the x axis
the dependent variable is plotted on the y axis
10-1 Scatter Plot
The four basic patterns that we may observe in scatter plots are:
positive linear relationship
negative linear relationship
nonlinear (curvilinear) relationship
no relationship
Example 10-1 Car Rental Companies
Construct a scatter plot for the data shown for car rental companies in the United
States for a recent year.
Solution: The scatter plot shows that a positive linear relationship exists: as the
number of cars that an agency owns increases, the total revenue increases as well.
Correlation Coefficient:
A scatter plot lets us assess the relationship between two continuous
variables visually, but the assessment depends on how the viewer reads
the graph. A formal numerical measure can be used instead; in this
chapter the correlation coefficient is considered.
\[ r = \frac{6(682.77) - (153.8)(18.7)}{\sqrt{[6(5859.26) - (153.8)^2][6(80.67) - (18.7)^2]}} = 0.982. \]
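The arithmetic in this computation can be checked with a short script. This is only a sketch: the variable names are ours, and which number is which sum (Σx, Σy, Σxy, Σx², Σy²) is inferred from its position in the formula, not stated in the slides.

```python
import math

# Summary values read off the worked computation (labels are inferred).
n = 6
sum_x, sum_y = 153.8, 18.7
sum_xy, sum_x2, sum_y2 = 682.77, 5859.26, 80.67

# Numerator and denominator of the correlation coefficient.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

r = numerator / denominator
print(round(r, 3))  # 0.982
```

A value this close to +1 indicates a strong direct (positive) linear relationship.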
r = −0.944.
FIGURE 10-11: Scatter Plot with Three Lines Fit to the Data.
10-2 Simple linear regression
With each observed pair (x_i, y_i) there is a quantity called a residual, defined as:
\[ e_i = y_i - y'_i \]
where y'_i is the value predicted by the line at x_i.
A summary measure of the distances of the data points from the regression line
(fitted line) is the residual sum of squares, denoted SSE and defined as:
\[ \text{Sum of Squared Errors (SSE)} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - y'_i)^2 \]
The best-fit line is the one that minimizes the sum of squared differences
between the points and the line.
The smaller the sum of squared differences, the better the line fits the data.
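To make the comparison between candidate lines concrete, here is a minimal sketch that computes the SSE of two lines on the same points. The data set and both lines are hypothetical, invented purely for illustration, and the function name `sse` is ours.

```python
def sse(xs, ys, a, b):
    """Residual sum of squares for the line y' = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, not taken from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Two candidate lines, each given by intercept a and slope b.
sse_line_1 = sse(xs, ys, a=2.2, b=0.6)  # the least-squares line for this data
sse_line_2 = sse(xs, ys, a=1.0, b=1.0)  # an arbitrary alternative line

print(round(sse_line_1, 2), round(sse_line_2, 2))  # 2.4 4.0
```

The first line has the smaller SSE, so by the criterion above it fits these points better than the second.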
Best fit:
Means that the sum of the squares of the vertical distances from
each point to the line is at a minimum.
The simple linear regression model
y' = a + b · x
where
y' is the predicted value of the dependent variable y, and x is the
independent variable.
a is the intercept (the regression constant) and b is the slope (the
regression coefficient). Together with the observed value of x, they
are used to calculate y'.
a and b are unknown parameters, and we will use a statistical
method to estimate their values.
For the line y' = a + b · x, the estimates are:
\[ a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}, \]
\[ b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}. \]
\[ a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} = \frac{(18.7)(5859.26) - (153.8)(682.77)}{6(5859.26) - (153.8)^2} = 0.396. \]
\[ b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} = \frac{6(682.77) - (153.8)(18.7)}{6(5859.26) - (153.8)^2} = 0.106. \]
Note: The sign of the correlation coefficient and the sign of the
slope of the regression line will always be the same.
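The intercept and slope computed above can be reproduced from the same summary values. This is a sketch only: the variable names are ours, and the labels attached to each number are inferred from where it appears in the formulas.

```python
# Summary values read off the worked computation (labels are inferred).
n = 6
sum_x, sum_y = 153.8, 18.7
sum_xy, sum_x2 = 682.77, 5859.26

# Both estimates share the same denominator.
denominator = n * sum_x2 - sum_x ** 2

a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator  # intercept
b = (n * sum_xy - sum_x * sum_y) / denominator       # slope

print(round(a, 3), round(b, 3))  # 0.396 0.106
```

The slope is positive, which agrees in sign with the correlation coefficient r = 0.982 found for the same values.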
The slope indicates the change in the mean of the probability distribution of y per
unit increase in x.
The intercept indicates the expected value of y when x equals zero.
For example, the slope b = 2.1 indicates that preparing one additional homework
assignment in a week leads to an increase of 2.1 hours in the mean of the
probability distribution of y.
Let x = 4; then
y' = 102.493 − 3.622x
= 102.493 − 3.622(4)
= 88.005
≈ 88 (rounded)
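The same prediction can be written as a small function. This is a sketch: the name `predict` and its default arguments are ours, with the coefficients copied from the fitted line above.

```python
def predict(x, a=102.493, b=-3.622):
    """Predicted value y' = a + b*x for the fitted line."""
    return a + b * x

y_pred = predict(4)
print(round(y_pred, 3))  # 88.005
print(round(y_pred))     # 88
```

Predictions of this kind are normally made only for x values within the range of the observed data.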