
Chapter 2

Statistics and Its Laws

Uploaded by

Farooq Alhamdany
Copyright
© © All Rights Reserved

Chapter 10-13

Correlation and Regression


Introduction

Sec.   Title                                                            Page
10-1   Scatter Plots and Correlation                                    547-563
13-6   The Spearman Rank Correlation Coefficient and the Runs Test      715-718
10-2   Regression                                                       563-572
10-1 Correlation and Regression

Researchers are often interested in whether a relationship between two or more variables exists.

For example: Is salt consumption related to blood pressure? Or is there a relationship between a person's age and his or her blood pressure?
The purpose of this chapter is to answer these questions statistically:

Can we build a model that describes the relationship between two variables?
Is there a relationship between x and y?
What is the strength of this relationship?
Can we get a numerical measure of the degree of the relationship?
Motivations
Analyze the specific relationships between the variables.
Forecast the value of a variable from the value of another one.

One of the variables, denoted by y, is called the dependent variable.
The other variable, denoted by x, is called the independent variable.
In this chapter, we investigate simple tools for studying the relationship between two continuous variables.

Correlation analysis is used to assess how one variable is related to another.

Correlation (A Measure)
Does one variable increase as the other increases?
Direct correlation, e.g., skills and income.
Does one variable decrease as the other increases?
Inverse correlation, e.g., health problems and nutrition.
Correlation can also be viewed as a numerical measure of the degree of the relationship: strong or weak.
Regression (The Model): predicts the value of the dependent variable from the value of the independent variable.

Regression (The Model): refers to the statistical technique of modeling the relationship between variables.

The model that we will use is a linear (straight-line) relationship.

In simple correlation and regression studies, the researcher collects data on two numerical or quantitative variables to see whether a relationship exists between the variables.
10-1 Scatter Plots and Correlation

A scatter plot is a graph of the ordered pairs (x, y) that allows us to examine the relationship between any two continuous variables: the dependent (response) variable y and the independent (explanatory, or predictor) variable x.

The independent variable is the one that is capable of influencing the other; it is also known as the controlled input.

The dependent variable is the one that is capable of being influenced by the other.

For example, the number of hours a student studies is the independent variable x, and the grade the student receives on the exam is the dependent variable y.
Scatterplot:
It is a visual way to describe the nature of the relationship between x and y. It may show:
a positive linear relationship,
a negative linear relationship,
a curvilinear relationship,
or no relationship.
Example: x = number of hours a student studies ⇒ y = grade the student received.

Scatterplot:
the independent variable is plotted on the x axis
the dependent variable is plotted on the y axis

The simplest patterns that we may observe in scatter plots are:
a positive linear relationship
a negative linear relationship
a nonlinear (curvilinear) relationship
no relationship
Example 10-1 Car Rental Companies
Construct a scatter plot for the data shown for car rental companies in the United States for a recent year.

Solution: The scatter plot shows a positive linear relationship: as the number of cars that an agency owns increases, the total revenue increases as well.
Example 10-2 Absences and Final Grades
Construct a scatter plot for the data obtained in a study on the number of absences and the final grades of seven randomly selected students from a statistics class.

Solution:
Step 1: Draw and label the x and y axes.
Step 2: Plot each point on the graph.

The scatter plot shows a negative linear relationship: as the number of student absences increases, the final grade decreases.
Example 10-3 Number of Teachers and Pupils per Teacher

School district   Number of teachers (in thousands)   Pupils per teacher
1                 7                                   12.4
2                 34                                  14.3
3                 9                                   14.3
4                 8                                   9.2
5                 16                                  18.3
6                 15                                  12.1
7                 6                                   12.3
8                 14                                  12.4
9                 32                                  15.2
10                10                                  13.4
Solution: The scatter plot shows no pattern (no indication of a strong positive or negative linear relationship).
10-1 Correlation Coefficient

Correlation Coefficient:
A scatter plot assesses the relationship between two continuous variables visually, so the conclusion depends on how the viewer reads the graph. A formal numerical measure can be used instead; in this chapter, the correlation coefficient is considered.

Statisticians use a measure called the correlation coefficient to determine the strength of the linear relationship between two variables. There are several types of correlation coefficients.

The population correlation coefficient, denoted by the Greek letter ρ, is computed by using all possible pairs of data values (x, y) taken from a population.
The correlation coefficient tells us:
the strength of the relationship and the direction of the relationship.

Properties of r:
The range of the correlation coefficient is from −1 to +1.
If there is a strong positive linear relationship between the variables, the value of r will be close to +1.
If there is a strong negative linear relationship between the variables, the value of r will be close to −1.
If the correlation coefficient is close to zero, there is a weak or no linear relationship between the variables.
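The properties above can be checked numerically. The following is a minimal sketch and not part of the original slides: the helper name `corr` and the three small data sets are made up for illustration, and r is computed from the mean-centred definition, which is algebraically equivalent to the Pearson formula.

```python
from math import sqrt

def corr(xs, ys):
    """Sample correlation r from the mean-centred definition (hypothetical helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up x values, symmetric about zero
xs = [-2, -1, 0, 1, 2]

r_up = corr(xs, [2 * x + 1 for x in xs])    # points on an increasing line
r_down = corr(xs, [5 - 3 * x for x in xs])  # points on a decreasing line
r_curve = corr(xs, [x ** 2 for x in xs])    # points on a symmetric parabola

print(r_up, r_down, r_curve)
```

In the third case y is completely determined by x, yet r is zero. This is why a value of r near zero only rules out a linear relationship, not every relationship.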

The linear correlation coefficient, denoted by r, is computed from the sample data and measures the strength and direction of a linear relationship between two quantitative variables.

The formula for the Pearson correlation coefficient:

$$
r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}
$$

where n is the number of data pairs.

The above coefficient is known as the Pearson product moment correlation coefficient (PPMC).
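As a rough illustration of how the formula is applied to raw data, here is a minimal Python sketch. The function name `pearson_r` is an assumption made for this example; the data are the ten school districts of Example 10-3.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from the sum-based (computational) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Example 10-3: number of teachers (in thousands) and pupils per teacher
teachers = [7, 34, 9, 8, 16, 15, 6, 14, 32, 10]
pupils_per_teacher = [12.4, 14.3, 14.3, 9.2, 18.3, 12.1, 12.3, 12.4, 15.2, 13.4]

r = pearson_r(teachers, pupils_per_teacher)
print(round(r, 3))  # 0.442
```

The result agrees with the value worked out by hand for these data in Example 10-6.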
Example 10-4 Car Rental Companies
Use Example 10-1 to compute the linear correlation coefficient for the data shown for car rental companies in the United States for a recent year.
Solution:
Step 1: Find the values of xy, x², and y², and place these values in the corresponding columns of the table.
Step 2: Substitute in the formula and solve for r.

$$
r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}}
  = \frac{6(682.77) - (153.8)(18.7)}{\sqrt{\left[6(5859.26) - (153.8)^2\right]\left[6(80.67) - (18.7)^2\right]}}
  = 0.982
$$

Step 3: Conclusion: There is a strong positive correlation between the number of cars (x) and annual revenues (y).
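The substitution in Step 2 can be double-checked directly from the column sums. This is a minimal sketch; the function name `pearson_r_from_sums` and its parameter names are assumptions made for this example, and the numbers are the sums quoted in the example.

```python
from math import sqrt

def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson r when only the column totals of the table are available."""
    numerator = n * sum_xy - sum_x * sum_y
    x_term = n * sum_x2 - sum_x ** 2
    y_term = n * sum_y2 - sum_y ** 2
    return numerator / sqrt(x_term * y_term)

# Column totals from Example 10-4 (car rental companies)
r = pearson_r_from_sums(n=6, sum_x=153.8, sum_y=18.7,
                        sum_xy=682.77, sum_x2=5859.26, sum_y2=80.67)
print(round(r, 3))  # 0.982
```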
Example 10-5 page 555 (Negative correlation)
Use Example 10-2 to compute the linear correlation coefficient for the data obtained in the study of the number of absences and the final grades of the seven students in the statistics class.
Solution:
Step 1: Find the values of xy, x², and y², and place these values in the corresponding columns of the table.
Step 2: Substitute in the formula and solve for r.

$$
r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}
  = \frac{7(3745) - (57)(511)}{\sqrt{\left[7(579) - (57)^2\right]\left[7(38{,}993) - (511)^2\right]}}
  = -0.944
$$

Step 3: Conclusion: There is a strong negative relationship between the number of absences a student has (x) and the student's final grade (y).
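The slide goes from the substitution straight to the result. The intermediate arithmetic, worked here only from the sums quoted in the example, is:

$$
\begin{aligned}
n\sum xy - (\sum x)(\sum y) &= 26{,}215 - 29{,}127 = -2912 \\
n\sum x^2 - (\sum x)^2 &= 4053 - 3249 = 804 \\
n\sum y^2 - (\sum y)^2 &= 272{,}951 - 261{,}121 = 11{,}830 \\
r &= \frac{-2912}{\sqrt{(804)(11{,}830)}} \approx \frac{-2912}{3084.04} \approx -0.944
\end{aligned}
$$

The negative numerator is what makes r negative; the denominator is always positive.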
Example 10-6 Numbers of Teachers and Pupils per Teacher
Use Example 10-3 to compute the linear correlation coefficient for the data for the number of teachers (in thousands) and the number of pupils per teacher.
Solution:

Step 2: Substitute in the formula and solve for r.

$$
r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}}
  = \frac{10(2117.4) - (151)(133.9)}{\sqrt{\left[10(3187) - (151)^2\right]\left[10(1844.33) - (133.9)^2\right]}}
  = \frac{955.1}{\sqrt{(9069)(514.09)}} = 0.442
$$

Step 3: The value of r indicates a weak positive linear relationship between the number of teachers employed (x) and the number of pupils per teacher (y).
13-6 The Spearman Rank Correlation Coefficient and the Runs Test

In Chapter 10, to determine whether two variables are linearly related, you use the Pearson product moment correlation coefficient. The Pearson coefficient assumes that the populations from which the samples are obtained are normally distributed. If this requirement cannot be met, the nonparametric equivalent, called the Spearman rank correlation coefficient (denoted by r_s), can be used when the data are ranked (p. 715, Section 13-6).

Formula for Computing the Spearman Rank Correlation Coefficient

$$
r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}
$$

where
d = difference in ranks,
n = the sample size (number of data pairs).
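The formula can be sketched in a few lines of Python. The function name `spearman_rs` is an assumption, and the ranks below are invented for illustration; they are not the data of any example in these slides. They were chosen so that Σd² = 12 with n = 8, the same totals that appear in Example 13-7. The sketch also assumes there are no tied ranks.

```python
def spearman_rs(rank_x, rank_y):
    """Spearman rank correlation from two lists of ranks (no ties assumed)."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical ranks for 8 items, invented for this illustration
rank_x = [1, 2, 3, 4, 5, 6, 7, 8]
rank_y = [2, 1, 4, 3, 5, 8, 7, 6]

rs = spearman_rs(rank_x, rank_y)
print(round(rs, 3))  # 0.857
```

Identical rankings give Σd² = 0 and therefore r_s = 1; completely reversed rankings give r_s = −1.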
EXAMPLE 13-7 Bank Branches and Deposits
A researcher wishes to see if there is a relationship between the number of branches a bank has and the total number of deposits (in billions of dollars) the bank receives.

With Σd² = 12 and n = 8:

$$
r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 12}{8(64 - 1)} = 1 - \frac{72}{504} = 0.857
$$

This value indicates a strong positive correlation.
Spearman's correlation can also be calculated when the data are ordinal-level qualitative data.
10-2 Introduction to simple linear regression

To determine the significance of the relationship, we computed the value of the correlation coefficient between the variables. Correlation tells you whether there is an association between x and y, but it does not describe the relationship or allow you to predict one variable from the other.

There may be a simple cause-and-effect relationship between two variables; that is, x causes y.
Example: the relationship between the number of persons employed in a household and the household's monthly income, or the relationship between the height of a building and the number of stories in the building.
To describe such a relationship and to predict y from x, we need REGRESSION!
Regression tells us how to draw the straight line described by the correlation.
If the value of the correlation coefficient is significant, the next step is to determine the equation of the regression line, which is the data's line of best fit.
10-2 Simple linear regression
Assume that the two variables are linearly related.
The question is: what should be considered a good line?
We find the best-fitting line through the points on the scatter plot.
Line of Best Fit

FIGURE 10-11: Scatter Plot with Three Lines Fit to the Data.
With each observed pair (x_i, y_i) there is a quantity called a residual, defined as:

$$
e_i = y_i - y_i'
$$

where y_i′ is the value predicted by the line at x_i.

A summary measure of the distances of the data points to the regression line (fitted line) is called the residual sum of squares, denoted by SSE and defined as:

$$
\text{Sum of Squared Errors (SSE)} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - y_i')^2
$$

The best-fit, or good, line is the one that minimizes the sum of squared differences between the points and the line.

The smaller the sum of squared differences, the better the line fits the data.
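To make the idea of comparing candidate lines concrete, the sketch below computes SSE for three lines through the Example 10-3 data. It is only an illustration: the function names `sse` and `least_squares` and the two alternative lines are assumptions chosen for this example, and the least-squares coefficients use the sum-based formulas for a and b given later in Section 10-2.

```python
def sse(xs, ys, a, b):
    """Sum of squared residuals for the line y' = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def least_squares(xs, ys):
    """Intercept a and slope b from the sum-based formulas."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    d = n * sx2 - sx ** 2
    return (sy * sx2 - sx * sxy) / d, (n * sxy - sx * sy) / d

# Data from Example 10-3 (teachers in thousands, pupils per teacher)
xs = [7, 34, 9, 8, 16, 15, 6, 14, 32, 10]
ys = [12.4, 14.3, 14.3, 9.2, 18.3, 12.1, 12.3, 12.4, 15.2, 13.4]

a, b = least_squares(xs, ys)
best = sse(xs, ys, a, b)
shifted = sse(xs, ys, a + 1.0, b)   # same slope, line moved up by 1
tilted = sse(xs, ys, a, b + 0.05)   # same intercept, steeper slope

print(best, shifted, tilted)
```

Both alternative lines give a larger SSE than the least-squares line, which is what "best fit" means here.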
Best fit:
Means that the sum of the squares of the vertical distances from each point to the line is at a minimum.
The simple linear regression model

$$
y' = a + b \cdot x
$$

where
y′ is the predicted value of the dependent variable, and x is the independent variable.
a is the intercept (the regression constant), b is the slope (the regression coefficient), and x is the observed value of the independent variable; together they are used to calculate y′, the predicted value of the dependent variable.
a and b are unknown parameters, and we will use a statistical method to estimate their values.

Note. The true functional relationship between x and y is almost always unknown in practice.
The amount of tilt in the regression line is its slope.
The point where the regression line crosses the y axis is the intercept.
Why is it called simple linear regression?

Simple linear regression is called simple because there is only one predictor variable, and linear because the model is linear in the parameters:

y′ = a + b · x.

Example of nonlinear regression: y′ = a + e^{bx}.

Multiple linear regression
There may be complex interrelationships among many variables. For example, a researcher may find a significant relationship between students' high school grades and college grades. But there probably are many other variables involved, such as IQ, hours of study, status, age, and instructors.

The graph illustrates the observations for two predictor variables x_1, x_2 and a response y.
Determination of the Regression Line Equation

The formulas for the estimates of a and b are

$$
a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2},
\qquad
b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}.
$$
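An equivalent way to write the intercept, which is not stated in the slides but follows from the two formulas, uses the sample means $\bar{x} = \sum x / n$ and $\bar{y} = \sum y / n$ (notation introduced here):

$$
\begin{aligned}
\bar{y} - b\,\bar{x}
&= \frac{\sum y}{n} - \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}\cdot\frac{\sum x}{n} \\
&= \frac{(\sum y)\left[n\sum x^2 - (\sum x)^2\right] - n(\sum x)(\sum xy) + (\sum x)^2(\sum y)}{n\left[n\sum x^2 - (\sum x)^2\right]} \\
&= \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2} = a .
\end{aligned}
$$

So $a = \bar{y} - b\,\bar{x}$, which means the fitted line always passes through the point $(\bar{x}, \bar{y})$.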
Trying out different values of b is equivalent to changing the slope of the line, while a stays constant.

Trying different values of a is equivalent to shifting the line up and down the scatter plot.
Finding the Regression Line Equation
Step 1: Make a table with columns for x, y, xy, x², and y².
Step 2: Find the values of xy, x², and y². Place them in the appropriate columns and sum each column.

Step 3: When r is significant, substitute in the formulas to find the values of a and b for the regression line equation y′ = a + bx:

$$
a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2},
\qquad
b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}.
$$
Example 10-9 Car Rental Companies

Find the equation of the regression line for the data in Example 10-4, and graph the line on the scatter plot of the data.
Solution:
Step 1: Find the values of xy, x², and y², and place these values in the corresponding columns of the table.
Continue Example 10-9:
The values needed for the equation are n = 6, Σx = 153.8, Σy = 18.7, Σxy = 682.77, and Σx² = 5859.26.
Step 2: Substituting in the formulas, you get

$$
a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}
  = \frac{(18.7)(5859.26) - (153.8)(682.77)}{6(5859.26) - (153.8)^2} = 0.396
$$

$$
b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}
  = \frac{6(682.77) - (153.8)(18.7)}{6(5859.26) - (153.8)^2} = 0.106
$$

Hence, the equation of the regression line y′ = a + bx is

y′ = 0.396 + 0.106x

Interpretation: When the number of cars increases by one unit of x, the predicted total revenue of the company increases by 0.106 billion dollars.
To graph the regression line, select any two values of x and find the corresponding values of y′. Use any x values between 10 and 60.
For example, let x = 15. Substitute in the equation and find the corresponding y′ value.
y′ = 0.396 + 0.106x
= 0.396 + 0.106(15)
= 1.986
Let x = 40; then
y′ = 0.396 + 0.106x
= 0.396 + 0.106(40)
= 4.636
Then plot the two points (15, 1.986) and (40, 4.636) and draw a line connecting the two points. See Figure 10-14.

Note: The sign of the correlation coefficient and the sign of the slope of the regression line will always be the same.
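The calculation in this example can be reproduced from the column totals alone. The following is a minimal sketch; the function names `regression_from_sums` and `predict` are assumptions made for this illustration.

```python
def regression_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Intercept a and slope b of y' = a + b*x from the column totals."""
    d = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / d
    b = (n * sum_xy - sum_x * sum_y) / d
    return a, b

def predict(a, b, x):
    """Predicted value y' for a given x."""
    return a + b * x

# Column totals from Example 10-9 (car rental companies)
a, b = regression_from_sums(n=6, sum_x=153.8, sum_y=18.7,
                            sum_xy=682.77, sum_x2=5859.26)
a3, b3 = round(a, 3), round(b, 3)
print(a3, b3)  # 0.396 0.106

# Two points for drawing the line, using the rounded coefficients as in the text
p15 = round(predict(a3, b3, 15), 3)
p40 = round(predict(a3, b3, 40), 3)
print(p15, p40)  # 1.986 4.636
```

The slope b is positive, matching the sign of r = 0.982 found for the same data.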
EXAMPLE 10-10 Page (567) Absences and Final Grades

The number of absences is the independent variable x, while the final grade is the dependent variable y. The regression line is found to be:
y′ = 102.493 − 3.622 · x
This means that as the number of absences increases by 1, the final grade decreases by 3.622 on average.
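The slide states the line without showing the substitution. Assuming the same sums as in Example 10-5 (n = 7, Σx = 57, Σy = 511, Σxy = 3745, Σx² = 579), the coefficients can be recovered as follows:

$$
\begin{aligned}
a &= \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}
   = \frac{(511)(579) - (57)(3745)}{7(579) - (57)^2}
   = \frac{295{,}869 - 213{,}465}{804} = \frac{82{,}404}{804} \approx 102.493 \\
b &= \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}
   = \frac{7(3745) - (57)(511)}{804} = \frac{-2912}{804} \approx -3.622
\end{aligned}
$$

The slope is negative, in agreement with the negative correlation r = −0.944 found for the same data.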
x: the number of homework assignments a student has during a week.
y: the time (in hours) required to finish the homework.
Suppose that a simple linear regression model describes the relationship between y (response) and x (predictor):
y′ = a + b · x
For illustration purposes, assume a = 9.5 and b = 2.1, so that the regression line is y′ = 9.5 + 2.1x.

The slope indicates the change in the mean of the probability distribution of y per unit increase in x.
The intercept indicates the expected value of y when x equals zero.
For the example: the slope b = 2.1 indicates that the preparation of one additional homework assignment in a week leads to an increase of 2.1 hours in the mean of the probability distribution of y.
For a given x_i (e.g., x_i = 45), the observed y_i is assumed to be normally distributed about the regression line. This population is described by a normal distribution with mean E(y_i) = 9.5 + 2.1x_i (= 104) and variance Var(y_i) = σ².
The observed y_i (e.g., y_i = 108) may not be exactly the same as E(y_i). In particular, for the above y_i we have
e_i = y_i − E(y_i) = 108 − (9.5 + 2.1 × 45) = 4
EXAMPLE 10-11 Page (568)

Use the equation of the regression line in Example 10-10 to predict the final grade for a student who missed 4 classes.
Solution:
Substitute 4 for x in the regression line equation
y′ = 102.493 − 3.622 · x

Let x = 4; then
y′ = 102.493 − 3.622x
= 102.493 − 3.622(4)
= 88.005
≈ 88 (rounded)

Hence, when a student misses 4 classes, the student's grade on the final exam is predicted to be about 88.
HOMEWORK:Chapter 10
Exercises 10 -1: page 384 (1,3,5,7,9,11,15,25).
Exercises 10-2: page 392-393 (31, 33, 35)
