Powerpoint - Regression and Correlation Analysis
IRRI-PBGB-CRIL
Example
For seven randomly selected plots, the nitrogen content in the soil and the grain yield were recorded.

Nitrogen Content (%)   Grain Yield (kg/ha)
0.12                   1652
0.14                   2056
0.15                   2598
0.16                   2734
0.19                   3238
0.22                   4824
0.23                   4858
How would you describe the graph?
[Figure: Grain Yield of Rice at Different Levels of Soil Nitrogen Content; Grain Yield (kg/ha) vs. Nitrogen Content (%)]
Direction of Association
Strength of Linear Association

r value   Interpretation
0         no linear relationship

[Figure: No Linear Correlation]
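A value of r = 0 rules out only a linear relationship, not every relationship. The minimal sketch below uses hypothetical, constructed values that do not come from the source data. In it, y is completely determined by x (y = x²), yet r is exactly 0 because the pattern is symmetric about x = 0. The helper `pearson_r` is an illustrative name, not a library function.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient computed from centred sums of products."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical, constructed data: y depends entirely on x (y = x**2),
# but the curve is symmetric, so it has no linear component.
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]
print(pearson_r(x, y))  # 0.0
```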
Other Strengths of Association
[Figure: Moderate Negative Linear Correlation]
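Formula
$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$
where
∑ = the sum
n = number of paired items
xi = input variable; yi = output variable
x̄ = mean of the x’s; ȳ = mean of the y’s
sx = standard deviation of the x’s; sy = standard deviation of the y’s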
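Here is a minimal Python sketch of the z-score form of the correlation coefficient, applied to the nitrogen and grain-yield data from the first example. It computes r as the sum of products of z-scores divided by n − 1, using sample standard deviations. The function name `correlation_z` is hypothetical. In practice, R's `cor()` or CropStat would do this calculation.

```python
import math

def correlation_z(x, y):
    """r = (1/(n-1)) * sum of (z-score of x) * (z-score of y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample standard deviations (divisor n - 1).
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (n - 1)

nitrogen = [0.12, 0.14, 0.15, 0.16, 0.19, 0.22, 0.23]        # %
grain_yield = [1652, 2056, 2598, 2734, 3238, 4824, 4858]     # kg/ha
r = correlation_z(nitrogen, grain_yield)
print(f"r = {r:.3f}")
```

For these seven plots, r is close to +1. This indicates a strong positive linear association between soil nitrogen content and grain yield.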
Correlation Coefficient
[Figure: Yield/plot vs. Tiller/plant; r = .25**, n = 234]
When the number of observations is large, a low r-value may still be significant.
Correlation Coefficient (r)
To be able to conclude that two variables have a strong linear relationship, r should be both high and significant.
Correlation Coefficient
[Figure: Yield (t/ha) vs. No. of spikelet/panicle; r = .90**, n = 60]
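The two scatter plots in this section show that "significant" and "high" are separate questions. The slides do not show the test formula, so this sketch assumes a common one: t = r√(n − 2)/√(1 − r²), with n − 2 degrees of freedom. The hypothetical helper `t_statistic` applies it to the reported values r = .25 (n = 234) and r = .90 (n = 60).

```python
import math

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r**2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# r and n as reported on the two scatter plots.
examples = {
    "Yield/plot vs. Tiller/plant": (0.25, 234),
    "Yield vs. No. of spikelet/panicle": (0.90, 60),
}
for name, (r, n) in examples.items():
    t = t_statistic(r, n)
    print(f"{name}: r = {r}, r^2 = {r ** 2:.2f}, t = {t:.2f}, df = {n - 2}")
```

Both t values (about 3.93 and 15.72) exceed the two-sided 1% critical t for these degrees of freedom (roughly 2.6 to 2.7). This is consistent with the ** marks. However, r² is only about 0.06 for tillers but 0.81 for spikelets, so only the second relationship is both high and significant.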
Test of significance for r

Degrees of    Probability, p
Freedom       0.05     0.01     0.001
1             0.997    1.000    1.000
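The table lists critical values of r: an observed |r| at or above an entry is significant at that probability level. The Python below is a sketch that reproduces such entries using only the standard library, under the following assumptions:

- Under H0 (ρ = 0), the density of r with df = n − 2 is proportional to (1 − r²)^((df − 2)/2).
- Substituting r = sin θ turns this into an integral of cos^(df − 1) θ, which is evaluated with Simpson's rule.
- Bisection then finds the critical r for a given probability level.

The function names `r_p_value` and `critical_r` are hypothetical. In practice, R's `cor.test()` reports the p-value directly.

```python
import math

def r_p_value(r, n, steps=2000):
    """Two-sided p-value for H0: rho = 0, from the exact null distribution of r.

    With df = n - 2 and r = sin(theta), the null density becomes proportional
    to cos(theta) ** (df - 1), integrated here by Simpson's rule (even steps).
    """
    df = n - 2
    theta = math.asin(min(abs(r), 1.0))
    # Normalising constant Gamma((df+1)/2) / (sqrt(pi) * Gamma(df/2)).
    k = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    h = theta / steps
    total = 1.0 + math.cos(theta) ** (df - 1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
    central = 2 * k * total * h / 3          # P(|R| <= |r|) under H0
    return min(1.0, max(0.0, 1.0 - central))

def critical_r(alpha, n):
    """Smallest |r| significant at level alpha (two-sided), found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if r_p_value(mid, n) > alpha:
            lo = mid
        else:
            hi = mid
    return hi

# Degrees of freedom = n - 2, so df = 1 corresponds to n = 3.
for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(critical_r(alpha, 3), 3))
```

For df = 1, this prints 0.997, 1.0 and 1.0, matching the first row of the table. The 1.000 entries are rounded from values just below 1.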
CORRELATION ANALYSIS
Regression Analysis
Scientific Question
What is the growth rate of a rice plant?
Data Collection
DAS (days after seeding)   Height (cm)
0                          0
10                         12
30                         55
60                         80
90                         110
Statistical Questions
• What is the relationship between age and height? (Linear)
• How do I describe or quantify the relationship? (Regression)
• Is the association significant? (Statistical Test)
[Figure: Plant Height (cm) vs. Days after Seeding]
Linear Regression
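Statistical Model
Data = Model Fit + Residual
$$Y_i = \hat{Y}_i + \varepsilon_i$$
$$\hat{Y}_i = \beta_0 + \beta_1 X_i \qquad (\beta_0 = \text{intercept},\ \beta_1 = \text{slope})$$
[Figure: Y vs. X with fitted line]
The ANOVA model has the same Data = Model Fit + Residual form:
$$Y_i = \mu + \alpha_i + \varepsilon_i$$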
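To see Data = Model Fit + Residual with numbers, the sketch below splits each observed height in the growth-rate data into its fitted value Ŷi and its residual εi. It uses the intercept and slope reported in the regression output for these data (4.912409 and 1.223358). It is illustrative only, and the variable names are hypothetical.

```python
# Growth-rate data: days after seeding (DAS) and plant height (cm).
das = [0, 10, 30, 60, 90]
height = [0, 12, 55, 80, 110]

# Intercept and slope as reported in the regression output for these data.
b0, b1 = 4.912409, 1.223358

fitted = [b0 + b1 * x for x in das]                 # model fit, Y-hat_i
residuals = [y - f for y, f in zip(height, fitted)]  # epsilon_i = Y_i - Y-hat_i

for y, f, e in zip(height, fitted, residuals):
    print(f"{y:6.1f} = {f:8.3f} + {e:8.3f}")
print(f"sum of residuals: {sum(residuals):.4f}")
print(f"residual sum of squares: {sum(e * e for e in residuals):.2f}")
```

The residuals sum to approximately zero (not exactly, because the coefficients are rounded). Their sum of squares is about 257.81, which is the Residuals sum of squares in the ANOVA table.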
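Least Squares Estimates
$$Y_i = \hat{Y}_i + \varepsilon_i, \qquad \hat{Y}_i = \beta_0 + \beta_1 X_i$$
To estimate the intercept and slope, minimize the residual sum of squares (RSS):
$$\mathrm{RSS} = \sum (Y_i - \beta_0 - \beta_1 X_i)^2$$
$$\frac{\partial \mathrm{RSS}}{\partial \beta_0} = -2 \sum (Y_i - \beta_0 - \beta_1 X_i) = 0 \implies \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$
$$\frac{\partial \mathrm{RSS}}{\partial \beta_1} = \frac{\partial \sum (Y_i - \bar{Y} + \beta_1 \bar{X} - \beta_1 X_i)^2}{\partial \beta_1} = -2 \sum (X_i - \bar{X})(Y_i - \bar{Y} + \beta_1 \bar{X} - \beta_1 X_i) = 0 \implies \hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$
The estimation need not be done by hand: R/CropStat or other statistical packages can do the work for us.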
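The closed-form estimates can be checked on the growth-rate data. This minimal Python sketch evaluates β̂1 = Σ(Xi − X̄)(Yi − Ȳ)/Σ(Xi − X̄)² and β̂0 = Ȳ − β̂1X̄. The function name `least_squares` is hypothetical.

```python
def least_squares(x, y):
    """Closed-form least-squares intercept and slope for Y = b0 + b1*X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope: cross-products over sum of squares of X
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

das = [0, 10, 30, 60, 90]
height = [0, 12, 55, 80, 110]
b0, b1 = least_squares(das, height)
print(f"Height = {b0:.6f} + {b1:.6f} DAS")  # Height = 4.912409 + 1.223358 DAS
```

The result agrees with the Parameter Estimates that the statistical package reports for these data.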
Example: Growth Rate Data

LINEAR REGRESSION ANALYSIS
Dependent Variable: Height

Analysis of Variance
SV          Df   Sum Square    Mean Square   F value     Pr (>F)
DAS          1   8201.389781   8201.389781   95.435198   0.002279
Residuals    3    257.810219     85.93674

Model Summary
R Squared        0.969523
Adj. R Squared   0.959364

Parameter Estimates
Parameter     Estimate   Std. Error   t value    Pr (>|t|)
(Intercept)   4.912409   6.311259     0.778356   0.493109
DAS           1.223358   0.125227     9.769094   0.002279
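[Figure: Plant Height (cm) vs. Days after Seeding, with fitted line Height = 4.9 + 1.223 DAS; r = 0.98]
Intercept: The height at age 0 is 4.9 cm.
Slope: Height increases by about 1.22 cm per day after seeding; this is the estimated growth rate.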
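The remaining entries of the regression output follow from the same sums. This sketch assumes the usual simple-regression formulas, where Sxx and Sxy are the centred sums of squares and cross-products:

- SSM = β̂1·Sxy
- MSE = RSS/(n − 2)
- SE(β̂1) = √(MSE/Sxx)
- SE(β̂0) = √(MSE(1/n + X̄²/Sxx))

```python
import math

das = [0, 10, 30, 60, 90]
height = [0, 12, 55, 80, 110]
n = len(das)

x_bar, y_bar = sum(das) / n, sum(height) / n
sxx = sum((x - x_bar) ** 2 for x in das)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(das, height))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sst = sum((y - y_bar) ** 2 for y in height)  # total sum of squares
ssm = b1 * sxy                               # model (DAS) sum of squares
rss = sst - ssm                              # residual sum of squares
mse = rss / (n - 2)                          # residual mean square, df = n - 2
f_value = ssm / mse                          # model has 1 df, so its MS equals its SS

se_b1 = math.sqrt(mse / sxx)
se_b0 = math.sqrt(mse * (1 / n + x_bar ** 2 / sxx))

print(f"SS(DAS) = {ssm:.6f}, SS(Residuals) = {rss:.6f}, F = {f_value:.6f}")
print(f"R^2 = {ssm / sst:.6f}")
print(f"b0 = {b0:.6f} (SE {se_b0:.6f}, t {b0 / se_b0:.6f})")
print(f"b1 = {b1:.6f} (SE {se_b1:.6f}, t {b1 / se_b1:.6f})")
```

The printed values agree with the Analysis of Variance, Model Summary and Parameter Estimates tables to the displayed precision.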
Prediction
[Figure: Plant Height (cm) vs. Days after Seeding, with fitted line Height = 4.9 + 1.223 DAS; r = 0.98]
Given the regression line, the plant height at a given age (DAS) can be predicted.
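This minimal prediction sketch plugs an age into the fitted line. The age of 45 DAS is an arbitrary illustrative value inside the observed range (0 to 90 DAS). The name `predict_height` is hypothetical, and its default coefficients are those reported in the regression output.

```python
def predict_height(das, b0=4.912409, b1=1.223358):
    """Predicted plant height (cm) from the fitted line Height = b0 + b1 * DAS."""
    return b0 + b1 * das

# 45 DAS is an illustrative age inside the observed range (0 to 90 DAS).
print(f"{predict_height(45):.1f} cm")  # 60.0 cm
```

Predictions for ages outside 0 to 90 DAS would be extrapolations beyond the data.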
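Sums of Squares
$$\sum (Y_i - \bar{Y})^2 = \sum (Y_i - \hat{Y}_i + \hat{Y}_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$$
(For the least-squares fit, the cross-product term vanishes, so SST = SSM + RSS.)
$$R^2 = \frac{\mathrm{SSM}}{\mathrm{SST}} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}$$
R² is the fraction of the variation in Y explained by X.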
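Here is a worked check of R² = SSM/SST using the sums of squares from the growth-rate ANOVA table. SST is obtained as SSM plus the residual sum of squares.

```python
# Sums of squares from the ANOVA table for the growth-rate data.
ss_model = 8201.389781            # SSM, variation explained by DAS
ss_residual = 257.810219          # residual sum of squares
ss_total = ss_model + ss_residual # SST = SSM + residual SS

r_squared = ss_model / ss_total
print(f"SST = {ss_total:.6f}, R^2 = {r_squared:.6f}")  # SST = 8459.200000, R^2 = 0.969523
```

Taking the square root gives r ≈ 0.98, the value shown on the fitted-line plot.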
Linear Regression vs. ANOVA
Linear models
ANOVA and regression are the same thing: both are linear models of the form Data = Model Fit + Residual.
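The numbers already in the growth-rate output show one concrete link between the two views. With a single predictor, the ANOVA F test for the model and the t test on the slope are the same test. So F = t², and both report Pr = 0.002279.

```python
# Values reported in the regression output for the growth-rate data.
f_value = 95.435198   # F for DAS in the Analysis of Variance table (1 df)
t_slope = 9.769094    # t value for the DAS slope in Parameter Estimates

# With one predictor, F equals the square of the slope's t statistic.
print(f"t^2 = {t_slope ** 2:.4f}, F = {f_value:.4f}")
```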
Misuse of Regression
and Correlation Analysis
• Performing regression and correlation on spurious data can give significant results, but such results are not a valid indication of a linear relationship.
• Extrapolation of results
  ◦ The scope of the data is extended. Examples:
    ▪ The relationship between the yield of IR8 and stemborer incidence is extended to cover all rice varieties.
    ▪ The relationship between grain yield and protein content from varietal trials is assumed to be applicable to other types of experiments, such as fertilizer trials.
  ◦ A functional relationship is assumed to hold beyond the range of the observed data. Example:
[Figure: Grain Yield (kg/ha) vs. N-rate (kg/ha); fitted line y = 23.751x + 4307.2, r = 0.987**]
There is no evidence that the linear relationship still holds above N = 180 kg/ha.
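This is a sketch of a guarded prediction. The function evaluates the fitted line y = 23.751x + 4307.2 and flags any N-rate outside the observed range. The limits of 0 and 180 kg/ha are assumptions read from the plot's axis and its 180 kg/ha note. The name `predict_yield` is hypothetical.

```python
def predict_yield(n_rate, lo=0.0, hi=180.0):
    """Grain yield (kg/ha) from y = 23.751x + 4307.2, flagging extrapolation.

    lo and hi are the assumed limits of the observed N-rates (kg/ha).
    """
    value = 23.751 * n_rate + 4307.2
    extrapolated = not (lo <= n_rate <= hi)
    return value, extrapolated

for rate in (120, 180, 240):
    y, flag = predict_yield(rate)
    note = "  <- extrapolation: no evidence the line holds here" if flag else ""
    print(f"N = {rate:3d} kg/ha: {y:8.1f} kg/ha{note}")
```

The equation still returns a number at 240 kg/ha. The flag is a reminder that the data give no support for that number.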
Coefficient of Determination (R²)
Problems with R²
• R² tends to increase as additional variables are included in a regression equation, regardless of their true importance in determining the values of the dependent variable.
The adjusted R² (Ra²) compensates for this effect:
$$R_a^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - (p + 1)}$$
where n = no. of observations
p = no. of independent variables
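This is a minimal sketch of the adjusted R² formula. The first call uses the growth-rate example (R² = 0.969523, n = 5, p = 1). The second call is a hypothetical case in which adding a second variable raises R² slightly, to 0.97, while Ra² drops.

```python
def adjusted_r_squared(r2, n, p):
    """Ra^2 = 1 - (1 - R^2) * (n - 1) / (n - (p + 1))."""
    return 1 - (1 - r2) * (n - 1) / (n - (p + 1))

# Growth-rate example: n = 5 observations, p = 1 independent variable (DAS).
print(f"{adjusted_r_squared(0.969523, n=5, p=1):.6f}")  # 0.959364

# Hypothetical: a second variable nudges R^2 up to 0.97, but Ra^2 falls.
print(f"{adjusted_r_squared(0.97, n=5, p=2):.6f}")      # 0.940000
```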