Topic03 Correlation Regression

Correlation and Regression
Cal State Northridge 427
Ainsworth
Major Points - Correlation
 Questions answered by correlation
 Scatterplots
 An example
 The correlation coefficient
 Other kinds of correlations
 Factors affecting correlations
 Testing for significance

The Question
 Are two variables related?
 Does one increase as the other increases?
 e.g. skills and income
 Does one decrease as the other increases?
 e.g. health problems and nutrition
 How can we get a numerical measure of the degree of relationship?
Scatterplots
 AKA scatter diagram or scattergram.
 Graphically depicts the relationship between two variables in two-dimensional space.
Direct Relationship
[Scatterplot: Video Games and Alcohol Consumption. x-axis: Average Hours of Video Games Per Week (0-25); y-axis: Average Number of Alcoholic Drinks Per Week (0-20)]
Inverse Relationship
[Scatterplot: Video Games and Test Score. x-axis: Average Hours of Video Games Per Week (0-20); y-axis: Exam Score (0-100)]
An Example
 Does smoking cigarettes increase systolic blood pressure?
 Plotting number of cigarettes smoked per day against systolic blood pressure
 Fairly moderate relationship
 Relationship is positive
Trend?
[Scatterplot: SMOKING, cigarettes per day (x-axis, 0-30) vs. SYSTOLIC blood pressure (y-axis, 100-170)]
Smoking and BP
 Note relationship is moderate, but real.
 Why do we care about the relationship?
 What would we conclude if there were no relationship?
 What if the relationship were near perfect?
 What if the relationship were negative?
Heart Disease and Cigarettes
 Data on heart disease and cigarette smoking in 21 developed countries (Landwehr and Watkins, 1987)
 Data have been rounded for computational convenience.
 The results were not affected.
The Data

Country  Cigarettes  CHD
1        11          26
2         9          21
3         9          24
4         9          21
5         8          19
6         8          13
7         8          19
8         6          11
9         6          23
10        5          15
11        5          13
12        5           4
13        5          18
14        5          12
15        5           3
16        4          11
17        4          15
18        4           6
19        3          13
20        3           4
21        3          14

Surprisingly, the U.S. is the first country on the list: the country with the highest consumption and highest mortality.
Scatterplot of Heart Disease
 CHD Mortality goes on ordinate (Y axis)
 Why?
 Cigarette consumption on abscissa (X axis)
 Why?
 What does each dot represent?
 Best fitting line included for clarity
[Scatterplot with best-fitting line: Cigarette Consumption per Adult per Day (x-axis, 2-12) vs. CHD Mortality (y-axis, 0-30); the labeled point {X = 6, Y = 11} is one country]


What Does the Scatterplot Show?
 As smoking increases, so does coronary heart disease mortality.
 Relationship looks strong
 Not all data points are on the line.
 This gives us “residuals” or “errors of prediction”
 To be discussed later
Correlation
 Co-relation
 The relationship between two variables
 Measured with a correlation coefficient
 Most popularly seen correlation coefficient: Pearson Product-Moment Correlation
Types of Correlation
 Positive correlation
 High values of X tend to be associated with
high values of Y.
 As X increases, Y increases
 Negative correlation
 High values of X tend to be associated with
low values of Y.
 As X increases, Y decreases
 No correlation
 No consistent tendency for values on Y to
increase or decrease as X increases
Correlation Coefficient
 A measure of degree of relationship.
 Between -1 and +1
 Sign refers to direction.
 Based on covariance
 Measure of degree to which large scores on X go with large scores on Y, and small scores on X go with small scores on Y
 Think of it as variance, but with 2 variables instead of 1 (What does that mean??)
Covariance
 Remember that variance is:

VarX = Σ(X − X̄)² / (N − 1) = Σ(X − X̄)(X − X̄) / (N − 1)

 The formula for co-variance is:

CovXY = Σ(X − X̄)(Y − Ȳ) / (N − 1)

 How this works, and why?
 When would CovXY be large and positive? Large and negative?
Example

Country  X (Cig.)  Y (CHD)  (X − X̄)  (Y − Ȳ)  (X − X̄)(Y − Ȳ)
1        11        26         5.05    11.48     57.97
2         9        21         3.05     6.48     19.76
3         9        24         3.05     9.48     28.91
4         9        21         3.05     6.48     19.76
5         8        19         2.05     4.48      9.18
6         8        13         2.05    -1.52     -3.12
7         8        19         2.05     4.48      9.18
8         6        11         0.05    -3.52     -0.18
9         6        23         0.05     8.48      0.42
10        5        15        -0.95     0.48     -0.46
11        5        13        -0.95    -1.52      1.44
12        5         4        -0.95   -10.52      9.99
13        5        18        -0.95     3.48     -3.31
14        5        12        -0.95    -2.52      2.39
15        5         3        -0.95   -11.52     10.94
16        4        11        -1.95    -3.52      6.86
17        4        15        -1.95     0.48     -0.94
18        4         6        -1.95    -8.52     16.61
19        3        13        -2.95    -1.52      4.48
20        3         4        -2.95   -10.52     31.03
21        3        14        -2.95    -0.52      1.53
Mean      5.95     14.52
SD        2.33      6.69
Sum                                            222.44
Example

Cov(cig.&CHD) = Σ(X − X̄)(Y − Ȳ) / (N − 1) = 222.44 / (21 − 1) = 11.12

 What the heck is a covariance?
 I thought we were talking about correlation?
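As a check on the arithmetic, the covariance can be computed directly from the raw data. This is an illustrative Python sketch, not part of the original slides; the slides' 11.12 reflects deviations rounded to two decimals.

```python
# 21-country data from the Landwehr & Watkins example
cigs = [11, 9, 9, 9, 8, 8, 8, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3, 3]
chd  = [26, 21, 24, 21, 19, 13, 19, 11, 23, 15, 13, 4, 18, 12, 3, 11, 15, 6, 13, 4, 14]

n = len(cigs)                 # 21 countries
mean_x = sum(cigs) / n        # about 5.95
mean_y = sum(chd) / n         # about 14.52

# Cov_XY = sum of (X - Xbar)(Y - Ybar), divided by N - 1
cov_xy = sum((x - mean_x) * (y - mean_y)
             for x, y in zip(cigs, chd)) / (n - 1)
print(round(cov_xy, 2))       # about 11.13; the slides show 11.12 due to rounding
```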
Correlation Coefficient
 Pearson’s Product Moment Correlation
 Symbolized by r
 Covariance ÷ (product of the 2 SDs)

r = CovXY / (sX sY)

 Correlation is a standardized covariance
Calculation for Example
 CovXY = 11.12
 sX = 2.33
 sY = 6.69

r = CovXY / (sX sY) = 11.12 / (2.33 × 6.69) = 11.12 / 15.59 = .713
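The same number falls out of a short script that standardizes the covariance by the two standard deviations (a sketch, not part of the original slides):

```python
cigs = [11, 9, 9, 9, 8, 8, 8, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3, 3]
chd  = [26, 21, 24, 21, 19, 13, 19, 11, 23, 15, 13, 4, 18, 12, 3, 11, 15, 6, 13, 4, 14]

n = len(cigs)
mx, my = sum(cigs) / n, sum(chd) / n
cov = sum((x - mx) * (y - my) for x, y in zip(cigs, chd)) / (n - 1)
sx = (sum((x - mx) ** 2 for x in cigs) / (n - 1)) ** 0.5   # about 2.33
sy = (sum((y - my) ** 2 for y in chd) / (n - 1)) ** 0.5    # about 6.69

r = cov / (sx * sy)   # correlation = standardized covariance
print(round(r, 3))    # 0.713
```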
Example
 Correlation = .713
 Sign is positive
 Why?
 If sign were negative
 What would it mean?
 Would not alter the degree of relationship.
Other calculations
 Z-score method

r = Σ(zx zy) / (N − 1)

 Computational (Raw Score) Method

r = [NΣXY − ΣXΣY] / √{[NΣX² − (ΣX)²][NΣY² − (ΣY)²]}
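Both alternative formulas give the same answer as the covariance method; a quick check (illustrative, not from the slides):

```python
cigs = [11, 9, 9, 9, 8, 8, 8, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3, 3]
chd  = [26, 21, 24, 21, 19, 13, 19, 11, 23, 15, 13, 4, 18, 12, 3, 11, 15, 6, 13, 4, 14]

n = len(cigs)
mx, my = sum(cigs) / n, sum(chd) / n
sx = (sum((x - mx) ** 2 for x in cigs) / (n - 1)) ** 0.5
sy = (sum((y - my) ** 2 for y in chd) / (n - 1)) ** 0.5

# z-score method: r = sum(zx * zy) / (N - 1)
r_z = sum(((x - mx) / sx) * ((y - my) / sy)
          for x, y in zip(cigs, chd)) / (n - 1)

# computational (raw score) method
sxy = sum(x * y for x, y in zip(cigs, chd))
num = n * sxy - sum(cigs) * sum(chd)
den = ((n * sum(x * x for x in cigs) - sum(cigs) ** 2) *
       (n * sum(y * y for y in chd) - sum(chd) ** 2)) ** 0.5
r_raw = num / den

print(round(r_z, 3), round(r_raw, 3))  # 0.713 0.713
```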
Other Kinds of Correlation
 Spearman Rank-Order Correlation Coefficient (rsp)
 used with 2 ranked/ordinal variables
 uses the same Pearson formula

Attractiveness  Symmetry
3               2
4               6
1               1
2               3
5               4
6               5

rsp = 0.77
Other Kinds of Correlation
 Point biserial correlation coefficient (rpb)
 used with one continuous scale and one nominal or ordinal or dichotomous scale.
 uses the same Pearson formula

Attractiveness  Date?
3               0
4               0
1               1
2               1
5               1
6               0

rpb = -0.49
Other Kinds of Correlation
 Phi coefficient (φ)
 used with two dichotomous scales.
 uses the same Pearson formula

Attractiveness  Date?
0               0
1               0
1               1
1               1
0               0
1               1

φ = 0.71
Factors Affecting r
 Range restrictions
 Looking at only a small portion of the total
scatter plot (looking at a smaller portion of
the scores’ variability) decreases r.
 Reducing variability reduces r
 Nonlinearity
 The Pearson r (and its relatives) measure the
degree of linear relationship between two
variables
 If a strong non-linear relationship exists, r will
provide a low, or at least inaccurate measure
of the true relationship.
Factors Affecting r
 Heterogeneous subsamples
 Everyday examples (e.g. height and weight
using both men and women)
 Outliers
 Overestimate Correlation
 Underestimate Correlation
Countries With Low Consumption
Data With Restricted Range: Truncated at 5 Cigarettes Per Day

[Scatterplot: Cigarette Consumption per Adult per Day (x-axis, 2.5-5.5) vs. CHD Mortality per 10,000 (y-axis, 2-20)]
Truncation
Non-linearity
Heterogeneous samples
Outliers
Testing Correlations

 So you have a correlation. Now what?
 In terms of magnitude, how big is big?
 Small correlations in large samples are “big.”
 Large correlations in small samples aren’t always “big.”
 Depends upon the magnitude of the correlation coefficient AND the size of your sample.
Testing r
 Population parameter = ρ
 Null hypothesis H0: ρ = 0
 Test of linear independence
 What would a true null mean here?
 What would a false null mean here?
 Alternative hypothesis (H1): ρ ≠ 0
 Two-tailed
Tables of Significance
 We can convert r to t and test for significance:

t = r √[(N − 2) / (1 − r²)]

 Where DF = N − 2
Tables of Significance
 In our example r was .713
 N − 2 = 21 − 2 = 19

t = r √[(N − 2) / (1 − r²)] = .713 × √(19 / .492) = .713 × 6.22 = 4.43

 t-crit (19) = 2.09
 Since 4.43 is larger than 2.09, reject ρ = 0.
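The r-to-t conversion is easy to script (a sketch, not part of the original slides; 2.09 is the two-tailed .05 critical value for df = 19):

```python
import math

r, N = 0.713, 21
df = N - 2                            # 19
t = r * math.sqrt(df / (1 - r ** 2))  # convert r to t
print(round(t, 2))                    # about 4.43

# compare to t-crit(19) = 2.09 (two-tailed, alpha = .05)
print(t > 2.09)                       # True: reject H0 that rho = 0
```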
Computer Printout
 Printout gives test of significance.

Correlations
                               CIGARET   CHD
CIGARET  Pearson Correlation   1         .713**
         Sig. (2-tailed)       .         .000
         N                     21        21
CHD      Pearson Correlation   .713**    1
         Sig. (2-tailed)       .000      .
         N                     21        21
**. Correlation is significant at the 0.01 level (2-tailed).
Regression
What is regression?
 How do we predict one variable from another?
 How does one variable change as the other changes?
 Influence
Linear Regression
 A technique we use to predict the most likely score on one variable from those on another variable
 Uses the nature of the relationship (i.e. correlation) between two variables to enhance your prediction
Linear Regression: Parts
 Y - the variable you are predicting
 i.e. dependent variable
 X - the variable you are using to predict
 i.e. independent variable
 Ŷ - your predictions (also known as Y')
Why Do We Care?
 We may want to make a prediction.
 More likely, we want to understand the relationship.
 How fast does CHD mortality rise with a one unit increase in smoking?
 Note: we speak about predicting, but often don’t actually predict.
An Example
 Cigarettes and CHD Mortality again
 Data repeated on next slide
 We want to predict level of CHD mortality in a country averaging 10 cigarettes per day.
The Data

Country  Cigarettes  CHD
1        11          26
2         9          21
3         9          24
4         9          21
5         8          19
6         8          13
7         8          19
8         6          11
9         6          23
10        5          15
11        5          13
12        5           4
13        5          18
14        5          12
15        5           3
16        4          11
17        4          15
18        4           6
19        3          13
20        3           4
21        3          14

Based on the data we have, what would we predict the rate of CHD to be in a country that smoked 10 cigarettes on average? First, we need to establish a prediction of CHD from smoking…
[Scatterplot with regression line: Cigarette Consumption per Adult per Day (x-axis, 2-12) vs. CHD Mortality (y-axis, 0-30). For a country that smokes 6 C/A/D, the regression line predicts a CHD rate of about 14.]
Regression Line
 Formula

Ŷ = bX + a

 Ŷ = the predicted value of Y (e.g. CHD mortality)
 X = the predictor variable (e.g. average cig./adult/country)
Regression Coefficients
 “Coefficients” are a and b
 b = slope
 Change in predicted Y for one unit change in X
 a = intercept
 value of Ŷ when X = 0
Calculation
 Slope

b = CovXY / s²X   or   b = r (sY / sX)   or   b = [NΣXY − ΣXΣY] / [NΣX² − (ΣX)²]

 Intercept

a = Ȳ − bX̄
For Our Data
 CovXY = 11.12
 s²X = 2.33² = 5.447
 b = 11.12 / 5.447 = 2.042
 a = 14.524 − 2.042 × 5.952 = 2.37
 See SPSS printout on next slide
Answers are not exact due to rounding error and desire to match SPSS.
SPSS Printout
Note:
 The values we obtained are shown on the printout.
 The intercept is the value in the B column labeled “constant”
 The slope is the value in the B column labeled by the name of the predictor variable.
Making a Prediction
 Second, once we know the relationship we can predict

Ŷ = bX + a = 2.042X + 2.367
Ŷ = 2.042 × 10 + 2.367 = 22.787

 We predict 22.79 people/10,000 in a country with an average of 10 C/A/D will die of CHD
Accuracy of Prediction
 Finnish smokers smoke 6 C/A/D
 We predict:

Ŷ = bX + a = 2.042X + 2.367
Ŷ = 2.042 × 6 + 2.367 = 14.619

 They actually have 23 deaths/10,000
 Our error (“residual”) = 23 − 14.619 = 8.381, a large error
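The slope, intercept, and predictions can all be reproduced from the raw data. A sketch (not part of the original slides), with small rounding differences from the hand calculation:

```python
cigs = [11, 9, 9, 9, 8, 8, 8, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3, 3]
chd  = [26, 21, 24, 21, 19, 13, 19, 11, 23, 15, 13, 4, 18, 12, 3, 11, 15, 6, 13, 4, 14]

n = len(cigs)
mx, my = sum(cigs) / n, sum(chd) / n
cov  = sum((x - mx) * (y - my) for x, y in zip(cigs, chd)) / (n - 1)
varx = sum((x - mx) ** 2 for x in cigs) / (n - 1)

b = cov / varx     # slope, about 2.042
a = my - b * mx    # intercept, about 2.367

pred_10 = b * 10 + a                   # about 22.79 for 10 C/A/D
residual_finland = 23 - (b * 6 + a)    # about 8.38: actual minus predicted
print(round(b, 3), round(a, 3), round(pred_10, 2), round(residual_finland, 2))
```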
[Scatterplot with regression line: Cigarette Consumption per Adult per Day (x-axis, 2-12) vs. CHD Mortality per 10,000 (y-axis, 0-30). The value on the line is the prediction; the vertical distance from a point to the line is the residual.]
Residuals
 When we predict Ŷ for a given X, we will sometimes be in error.
 Y − Ŷ for any X is an error of estimate
 Also known as: a residual
 We want Σ(Y − Ŷ) to be as small as possible.
 BUT, there are infinitely many lines that can do this.
 Just draw ANY line that goes through the mean of the X and Y values.
 Minimize Errors of Estimate… How?
Minimizing Residuals
 Again, the problem lies with this definition of the mean:

Σ(X − X̄) = 0

 So, how do we get rid of the 0’s?
 Square them.
Regression Line:
A Mathematical Definition
 The regression line is the line which when drawn through your data set produces the smallest value of:

Σ(Y − Ŷ)²

 Called the Sum of Squared Residuals or SSresidual
 Regression line is also called a “least squares line.”
Summarizing Errors of Prediction
 Residual variance
 The variability of the observed values around the predicted values

s²(Y−Ŷ) = Σ(Yi − Ŷi)² / (N − 2) = SSresidual / (N − 2)
Standard Error of Estimate
 Standard error of estimate
 The standard deviation of the observed values around the predicted values

s(Y−Ŷ) = √[Σ(Yi − Ŷi)² / (N − 2)] = √[SSresidual / (N − 2)]

 A common measure of the accuracy of our predictions
 We want it to be as small as possible.
Example

Country  X (Cig.)  Y (CHD)  Y'      (Y − Y')  (Y − Y')²
1        11        26       24.829   1.171      1.371
2         9        21       20.745   0.255      0.065
3         9        24       20.745   3.255     10.595
4         9        21       20.745   0.255      0.065
5         8        19       18.703   0.297      0.088
6         8        13       18.703  -5.703     32.524
7         8        19       18.703   0.297      0.088
8         6        11       14.619  -3.619     13.097
9         6        23       14.619   8.381     70.241
10        5        15       12.577   2.423      5.871
11        5        13       12.577   0.423      0.179
12        5         4       12.577  -8.577     73.565
13        5        18       12.577   5.423     29.409
14        5        12       12.577  -0.577      0.333
15        5         3       12.577  -9.577     91.719
16        4        11       10.535   0.465      0.216
17        4        15       10.535   4.465     19.936
18        4         6       10.535  -4.535     20.566
19        3        13        8.493   4.507     20.313
20        3         4        8.493  -4.493     20.187
21        3        14        8.493   5.507     30.327
Mean      5.952    14.524
SD        2.334     6.690
Sum                                  0.04     440.757

s²(Y−Ŷ) = Σ(Yi − Ŷi)² / (N − 2) = 440.757 / (21 − 2) = 23.198
s(Y−Ŷ) = √[Σ(Yi − Ŷi)² / (N − 2)] = √(440.757 / 19) = √23.198 = 4.816
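The standard error of estimate can be verified directly from the raw data (a sketch, not part of the original slides):

```python
cigs = [11, 9, 9, 9, 8, 8, 8, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3, 3]
chd  = [26, 21, 24, 21, 19, 13, 19, 11, 23, 15, 13, 4, 18, 12, 3, 11, 15, 6, 13, 4, 14]

n = len(cigs)
mx, my = sum(cigs) / n, sum(chd) / n
b = (sum((x - mx) * (y - my) for x, y in zip(cigs, chd))
     / sum((x - mx) ** 2 for x in cigs))   # slope
a = my - b * mx                            # intercept

preds  = [b * x + a for x in cigs]                       # Y' for each country
ss_res = sum((y - p) ** 2 for y, p in zip(chd, preds))   # about 440.76
s_est  = (ss_res / (n - 2)) ** 0.5                       # about 4.816
print(round(ss_res, 2), round(s_est, 3))
```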
Regression and Z Scores
 When your data are standardized (linearly transformed to z-scores), the slope of the regression line is called β
 DO NOT confuse this β with the β associated with type II errors. They’re different.
 When we have one predictor, r = β
 ẐY = βZX, since a now equals 0
Partitioning Variability
 Sums of squared deviations
 Total: SStotal = Σ(Y − Ȳ)²
 Regression: SSregression = Σ(Ŷ − Ȳ)²
 Residual (we already covered): SSresidual = Σ(Y − Ŷ)²
 SStotal = SSregression + SSresidual
Partitioning Variability
 Degrees of freedom
 Total
 dftotal = N − 1
 Regression
 dfregression = number of predictors
 Residual
 dfresidual = dftotal − dfregression
 dftotal = dfregression + dfresidual
Partitioning Variability
 Variance (or Mean Square)
 Total Variance
 s²total = SStotal / dftotal
 Regression Variance
 s²regression = SSregression / dfregression
 Residual Variance
 s²residual = SSresidual / dfresidual
Example

Country  X (Cig.)  Y (CHD)  Y'      (Y − Y')  (Y − Y')²  (Y' − Ȳ)²  (Y − Ȳ)²
1        11        26       24.829   1.171      1.371    106.193    131.699
2         9        21       20.745   0.255      0.065     38.701     41.939
3         9        24       20.745   3.255     10.595     38.701     89.795
4         9        21       20.745   0.255      0.065     38.701     41.939
5         8        19       18.703   0.297      0.088     17.464     20.035
6         8        13       18.703  -5.703     32.524     17.464      2.323
7         8        19       18.703   0.297      0.088     17.464     20.035
8         6        11       14.619  -3.619     13.097      0.009     12.419
9         6        23       14.619   8.381     70.241      0.009     71.843
10        5        15       12.577   2.423      5.871      3.791      0.227
11        5        13       12.577   0.423      0.179      3.791      2.323
12        5         4       12.577  -8.577     73.565      3.791    110.755
13        5        18       12.577   5.423     29.409      3.791     12.083
14        5        12       12.577  -0.577      0.333      3.791      6.371
15        5         3       12.577  -9.577     91.719      3.791    132.803
16        4        11       10.535   0.465      0.216     15.912     12.419
17        4        15       10.535   4.465     19.936     15.912      0.227
18        4         6       10.535  -4.535     20.566     15.912     72.659
19        3        13        8.493   4.507     20.313     36.373      2.323
20        3         4        8.493  -4.493     20.187     36.373    110.755
21        3        14        8.493   5.507     30.327     36.373      0.275
Mean      5.952    14.524
SD        2.334     6.690
Sum                                  0.04     440.757    454.307    895.247

Y' = (2.04 × X) + 2.37
Example

SStotal = Σ(Y − Ȳ)² = 895.247;  dftotal = 21 − 1 = 20
SSregression = Σ(Ŷ − Ȳ)² = 454.307;  dfregression = 1 (only 1 predictor)
SSresidual = Σ(Y − Ŷ)² = 440.757;  dfresidual = 20 − 1 = 19

s²total = Σ(Y − Ȳ)² / (N − 1) = 895.247 / 20 = 44.762
s²regression = Σ(Ŷ − Ȳ)² / 1 = 454.307 / 1 = 454.307
s²residual = Σ(Y − Ŷ)² / (N − 2) = 440.757 / 19 = 23.198

Note: s²residual = s²(Y−Ŷ)
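The partitioning can be confirmed in code (a sketch, not part of the original slides; tiny differences from the table reflect rounding of Y'):

```python
cigs = [11, 9, 9, 9, 8, 8, 8, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3, 3]
chd  = [26, 21, 24, 21, 19, 13, 19, 11, 23, 15, 13, 4, 18, 12, 3, 11, 15, 6, 13, 4, 14]

n = len(cigs)
mx, my = sum(cigs) / n, sum(chd) / n
b = (sum((x - mx) * (y - my) for x, y in zip(cigs, chd))
     / sum((x - mx) ** 2 for x in cigs))
a = my - b * mx
preds = [b * x + a for x in cigs]

ss_total = sum((y - my) ** 2 for y in chd)                 # about 895.24
ss_reg   = sum((p - my) ** 2 for p in preds)               # about 454.48
ss_res   = sum((y - p) ** 2 for y, p in zip(chd, preds))   # about 440.76
print(round(ss_total, 2), round(ss_reg + ss_res, 2))       # the two sums agree

ms_reg = ss_reg / 1          # df_regression = 1 predictor
ms_res = ss_res / (n - 2)    # about 23.198
```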
Coefficient of Determination
 It is a measure of the percent of predictable variability

r² = the correlation squared, or

r² = SSregression / SSY

 The percentage of the total variability in Y explained by X
r² for our example
 r = .713
 r² = .713² = .508
 or

r² = SSregression / SSY = 454.307 / 895.247 = .507

 Approximately 50% of the variability in incidence of CHD mortality is associated with variability in smoking.
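Both routes to r² agree, since r² = SSregression / SStotal holds exactly for simple regression; a quick check (not from the slides):

```python
cigs = [11, 9, 9, 9, 8, 8, 8, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3, 3]
chd  = [26, 21, 24, 21, 19, 13, 19, 11, 23, 15, 13, 4, 18, 12, 3, 11, 15, 6, 13, 4, 14]

n = len(cigs)
mx, my = sum(cigs) / n, sum(chd) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(cigs, chd))
ssx = sum((x - mx) ** 2 for x in cigs)
ssy = sum((y - my) ** 2 for y in chd)

r = sxy / (ssx * ssy) ** 0.5
b = sxy / ssx
preds = [b * x + (my - b * mx) for x in cigs]
ss_reg = sum((p - my) ** 2 for p in preds)

r2_from_r  = r ** 2            # square the correlation
r2_from_ss = ss_reg / ssy      # ratio of sums of squares
print(round(r2_from_r, 3), round(r2_from_ss, 3))  # both about 0.508
```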
Coefficient of Alienation
 It is defined as 1 − r², or

1 − r² = SSresidual / SSY

 Example
1 − .508 = .492
1 − r² = SSresidual / SSY = 440.757 / 895.247 = .492
r², SS and s(Y−Ŷ)
 r² × SStotal = SSregression
 (1 − r²) × SStotal = SSresidual
 We can also use r² to calculate the standard error of estimate as:

s(Y−Ŷ) = sY √[(1 − r²) × (N − 1)/(N − 2)] = 6.690 × √(.492 × 20/19) = 4.816
Testing Overall Model
 We can test for the overall prediction of the model by forming the ratio:

F = s²regression / s²residual

 If the calculated F value is larger than a tabled value (F-Table) we have a significant prediction
Testing Overall Model
 Example

F = s²regression / s²residual = 454.307 / 23.198 = 19.59

 F critical is found in the F-Table using two things: dfregression (numerator) and dfresidual (denominator)
 From the F-Table, Fcrit(1, 19) = 4.38
 19.59 > 4.38, significant overall
 Should all sound familiar…
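The F ratio can be reproduced from the raw data (a sketch, not part of the original slides; it matches the SPSS value up to rounding of the sums of squares):

```python
cigs = [11, 9, 9, 9, 8, 8, 8, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3, 3]
chd  = [26, 21, 24, 21, 19, 13, 19, 11, 23, 15, 13, 4, 18, 12, 3, 11, 15, 6, 13, 4, 14]

n = len(cigs)
mx, my = sum(cigs) / n, sum(chd) / n
b = (sum((x - mx) * (y - my) for x, y in zip(cigs, chd))
     / sum((x - mx) ** 2 for x in cigs))
preds = [b * x + (my - b * mx) for x in cigs]

ms_reg = sum((p - my) ** 2 for p in preds) / 1                         # 1 predictor
ms_res = sum((y - p) ** 2 for y, p in zip(chd, preds)) / (n - 2)

F = ms_reg / ms_res
print(round(F, 2))   # about 19.59
print(F > 4.38)      # True: exceeds Fcrit(1, 19), significant overall
```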
SPSS output

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .713a   .508       .482                4.81640
a. Predictors: (Constant), CIGARETT

ANOVA(b)
Model          Sum of Squares   df   Mean Square   F        Sig.
1  Regression  454.482          1    454.482       19.592   .000a
   Residual    440.757          19   23.198
   Total       895.238          20
a. Predictors: (Constant), CIGARETT
b. Dependent Variable: CHD
Testing Slope and Intercept
 The regression coefficients can be tested for significance
 Each coefficient divided by its standard error equals a t value that can also be looked up in a t-table
 Each coefficient is tested against 0
Testing the Slope
 With only 1 predictor, the standard error for the slope is:

seb = s(Y−Ŷ) / (sX √(N − 1))

 For our example:

seb = 4.816 / (2.334 × √(21 − 1)) = 4.816 / 10.438 = .461
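The slope's standard error and its t test can be checked in code (a sketch, not part of the original slides); note that for simple regression t = b / seb equals √F:

```python
cigs = [11, 9, 9, 9, 8, 8, 8, 6, 6, 5, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3, 3]
chd  = [26, 21, 24, 21, 19, 13, 19, 11, 23, 15, 13, 4, 18, 12, 3, 11, 15, 6, 13, 4, 14]

n = len(cigs)
mx, my = sum(cigs) / n, sum(chd) / n
sx = (sum((x - mx) ** 2 for x in cigs) / (n - 1)) ** 0.5
b = (sum((x - mx) * (y - my) for x, y in zip(cigs, chd))
     / sum((x - mx) ** 2 for x in cigs))
preds = [b * x + (my - b * mx) for x in cigs]
s_est = (sum((y - p) ** 2 for y, p in zip(chd, preds)) / (n - 2)) ** 0.5

se_b = s_est / (sx * (n - 1) ** 0.5)   # about 0.461
t_slope = b / se_b                     # about 4.43, i.e. sqrt(F)
print(round(se_b, 3), round(t_slope, 2))
```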
Testing Slope and Intercept
 These are given in the computer printout as a t test.
Testing
 The t values in the second from right column are tests on slope and intercept.
 The associated p values are next to them.
 The slope is significantly different from zero, but not the intercept.
 Why do we care?
Testing
 What does it mean if the slope is not significant?
 How does that relate to the test on r?
 What if the intercept is not significant?
 Does a significant slope mean we predict quite well?