6) CorrelationAndRegression - 27
2023
Prof.Dr. Ahmet DİRİCAN
İÜ-C. CERRAHPAŞA MEDICAL FACULTY
DEPARTMENT OF BIOSTATISTICS
17.10.2023
Correlation Analysis
The term correlation refers to any type of relationship between events and objects.
If we are interested only in determining whether a relationship exists, we employ correlation analysis.
Correlation analysis refers exclusively to a quantifiable relationship between two variables.
For the correlation calculation, there must be two measures (variables) for each subject.
If this condition is satisfied, the data can be inserted into a statistical formulation that will reveal the type and strength of the relationship under study.
[Figure: y plotted against x with the straight line Y = a + bX]
THE MEASURES OF RELATIONSHIP BETWEEN CONTINUOUS VARIABLES
Scatter Diagrams: The most accurate information about the relationship model between two variables is obtained from the scatter diagram of individuals.
[Figure: four example scatter diagrams of y against x for different values of r]
As the scatter in the sample space widens, the strength of the relationship decreases.
[Figure: (1) positive relationship: r = 1, r = 0.85, r = 0.55; (2) negative relationship: r = -1, r = -0.85, r = -0.55; (3) no relation: r = 0.0]
Interpretation of the Correlation Coefficients
"r" gives two pieces of information about the relationship: (1) its strength and (2) its direction.
1) Strength of relationship: the strength of r is judged from its absolute value, as follows:
|r|            Relation
0.00 – 0.19    None (chance effect)
0.20 – 0.34    Weak
0.35 – 0.49    Low
0.50 – 0.64    Moderate
0.65 – 0.79    Strong
0.80 – 0.95    Very strong
2) Direction of relationship: r ranges in value from -1 to +1.
• positive (direct, parallel): variables move in the same direction
• negative (inverse): variables move in opposite directions
-1 (strong negative)    0 (no relation)    +1 (strong positive)
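The strength and direction rules above can be written as a small Python function. This sketch is not part of the original material. The band limits are copied from the table. Rounding |r| to two decimals before the lookup, so that a value such as 0.195 does not fall between two printed bands, is an assumption of this example. The function names `strength_label`, `direction_label` and `describe` are hypothetical.

```python
# Sketch: classify a correlation coefficient with the cut-offs from the table above.
# Assumption (not from the slides): |r| is rounded to two decimals before the lookup.

STRENGTH_BANDS = [
    (0.19, "None (chance effect)"),
    (0.34, "Weak"),
    (0.49, "Low"),
    (0.64, "Moderate"),
    (0.79, "Strong"),
    (0.95, "Very strong"),
]


def strength_label(r: float) -> str:
    """Return the strength label for |r| according to the interpretation table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = round(abs(r), 2)
    for upper, label in STRENGTH_BANDS:
        if size <= upper:
            return label
    # The table stops at 0.95, so larger values are not classified there.
    return "above the listed bands (> 0.95)"


def direction_label(r: float) -> str:
    """Return the direction of the relationship from the sign of r."""
    if r > 0:
        return "positive (direct, parallel)"
    if r < 0:
        return "negative (inverse)"
    return "no direction (r = 0)"


def describe(r: float) -> str:
    """Combine strength and direction into one description."""
    return f"{strength_label(r)}, {direction_label(r)}"


print(describe(0.541788))  # Moderate, positive (direct, parallel)
print(describe(-0.74))     # Strong, negative (inverse)
```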
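Formulas for the correlation coefficient and its significance test
$$ r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{SD_x^2 \, SD_y^2 \, n^2}} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{SD_x \, SD_y \, n} $$
If the number of cases is less than 30, (n - 1)² is used instead of n²; that is, n - 1 is used instead of n.
$$ t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} $$
Example data: Final Score (x) and State Board Score (y), related to the nurse's area of affiliation
Final Score (x)   State Board Score (y)
…
89                545
89                600
90                495
90                545
90                575
91                525
91                575
91                600
92                490
92                510
92                575
93                540
93                595
94                525
94                545
94                600
94                625
n = 25: ∑x = 2263, ∑y = 13425, ∑xy = 1216685, ∑x² = 204977 and ∑y² = 7264525
[Figure: scatter plot of State Board Score (y axis, 400 to 700) against Final Score (x axis, 86 to 96)]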
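The two forms of r agree as long as the standard deviations are computed with the same divisor (n or n - 1) that appears in the denominator. The sketch below checks this on the summary sums of the final score example. It assumes ∑x² = 204977 and ∑y² = 7264525, which are the values that reproduce the r = 0.541788 reported for this example. The helper names are hypothetical.

```python
import math

# Summary sums for the final score (x) / state board score (y) example, n = 25.
# Assumption: sum of x^2 = 204977 and sum of y^2 = 7264525 (they reproduce r = 0.541788).
N, SX, SY, SXY, SXX, SYY = 25, 2263, 13425, 1216685, 204977, 7264525


def corrected_sums(n, sx, sy, sxy, sxx, syy):
    """Return (Sxy, Sxx, Syy): cross-products and squares taken about the means."""
    s_xy = sxy - sx * sy / n
    s_xx = sxx - sx ** 2 / n
    s_yy = syy - sy ** 2 / n
    return s_xy, s_xx, s_yy


def r_from_sums(n, sx, sy, sxy, sxx, syy):
    """Computational formula: r = Sxy / sqrt(Sxx * Syy)."""
    s_xy, s_xx, s_yy = corrected_sums(n, sx, sy, sxy, sxx, syy)
    return s_xy / math.sqrt(s_xx * s_yy)


def r_from_sds(n, sx, sy, sxy, sxx, syy, divisor):
    """SD form: r = Sxy / (divisor * SDx * SDy).

    The SDs use the same divisor as the denominator; mixing n and n - 1
    would give a wrong r.
    """
    s_xy, s_xx, s_yy = corrected_sums(n, sx, sy, sxy, sxx, syy)
    sd_x = math.sqrt(s_xx / divisor)
    sd_y = math.sqrt(s_yy / divisor)
    return s_xy / (divisor * sd_x * sd_y)


r1 = r_from_sums(N, SX, SY, SXY, SXX, SYY)
r2 = r_from_sds(N, SX, SY, SXY, SXX, SYY, divisor=N)      # population SDs, divisor n
r3 = r_from_sds(N, SX, SY, SXY, SXX, SYY, divisor=N - 1)  # sample SDs, divisor n - 1
print(f"{r1:.6f} {r2:.6f} {r3:.6f}")  # 0.541788 0.541788 0.541788
```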
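Is there any correlation between final score and state board score?
$$ r = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n}}{\sqrt{\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]\left[\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}\right]}} $$
H0: ρ = 0
HA: ρ ≠ 0
∑x = 2263, ∑y = 13425, ∑xy = 1216685, ∑x² = 204977, ∑y² = 7264525, n = 25
$$ r = \frac{1216685 - \frac{(2263)(13425)}{25}}{\sqrt{\left[204977 - \frac{5121169}{25}\right]\left[7264525 - \frac{180230625}{25}\right]}} = 0.541788 $$
$$ t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}} = 3.09 > t_{table}(23;\,0.05) = 2.0687, \quad p < 0.05 $$
Reject H0. The strength of the relationship between final score and state board score was moderate (r = 0.54). The relationship was also parallel (positive) and valid (statistically significant).
Example: The findings of a study of 20 women are presented in the table. The study investigated whether there is a relationship between the number of pregnancies (x) and hemoglobin (y) values.
$$ r = \frac{1075 - \frac{(96)(236)}{20}}{\sqrt{\left[616 - \frac{96^2}{20}\right]\left[2824 - \frac{236^2}{20}\right]}} = -0.74 $$
$$ |t| = \frac{|r|}{\sqrt{\frac{1 - r^2}{n - 2}}} = 4.67 > t_{table}(18;\,0.05) = 2.1, \quad p < 0.05 $$
H0 rejected, HA accepted: "r is strong, inverse and valid".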
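The significance test for r can be scripted in a few lines. This sketch reuses the r values and the critical values quoted in the two examples: 2.0687 for 23 degrees of freedom and 2.1 for 18. It does not compute critical values itself; a library routine such as SciPy's `scipy.stats.t.ppf` could do that. `t_from_r` is a hypothetical helper name.

```python
import math


def t_from_r(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


# (label, r, n, two-sided critical t at alpha = 0.05) as quoted in the examples.
CASES = [
    ("final score vs state board score", 0.541788, 25, 2.0687),
    ("pregnancies vs hemoglobin", -0.74, 20, 2.1),
]

results = {}
for label, r, n, t_crit in CASES:
    t = t_from_r(r, n)
    reject = abs(t) > t_crit  # two-sided test: compare |t| with the critical value
    results[label] = (t, reject)
    print(f"{label}: t = {t:.2f}, df = {n - 2}, reject H0: {reject}")
# final score vs state board score: t = 3.09, df = 23, reject H0: True
# pregnancies vs hemoglobin: t = -4.67, df = 18, reject H0: True
```

The sign of t follows the sign of r, so the second example gives t = -4.67. The slide reports its absolute value.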
Correlation Coefficient = r = -0.74, p < 0.05
Interpretation: A strong, inverse and valid (statistically significant) correlation was found between the number of pregnancies and hemoglobin values.
Coefficient of Determination = r² = 0.55
Interpretation: 55% of the variation in hemoglobin values is explained by the number of pregnancies (explained on the next slide).
Coefficient of Determination
Tests thus far have shown whether a linear relationship exists; it is also useful to measure the strength of the relationship. This is done by calculating the coefficient of determination.
The coefficient of determination is the square of the coefficient of correlation (r), hence R² = r².
R² is interpreted as the proportion of the variation in the dependent variable that is predictable from the independent variable(s).
Unlike the value of a test statistic, the coefficient of determination does not have a critical value that enables us to draw conclusions.
In general, the higher the value of R², the better the model fits the data.
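One way to see why r² is read as a proportion of explained variation is to split the total variation of y into two parts: the part captured by the regression line and the residual part. The sketch below does this from the summary sums of the pregnancy/hemoglobin example and checks that the explained share equals r². The variable names are illustrative only.

```python
import math

# Pregnancy (x) / hemoglobin (y) example, n = 20 women.
n, sx, sy, sxy, sxx, syy = 20, 96, 236, 1075, 616, 2824

s_xy = sxy - sx * sy / n   # corrected sum of cross-products
s_xx = sxx - sx ** 2 / n   # corrected sum of squares of x
s_yy = syy - sy ** 2 / n   # total variation of y (SS_total)

r = s_xy / math.sqrt(s_xx * s_yy)
ss_total = s_yy
ss_regression = s_xy ** 2 / s_xx          # part of SS_total explained by the fitted line
ss_residual = ss_total - ss_regression    # part left unexplained

r2_from_r = r ** 2
r2_from_ss = ss_regression / ss_total     # equals 1 - ss_residual / ss_total

print(f"r = {r:.2f}, r^2 = {r2_from_r:.2f}, SSreg/SStot = {r2_from_ss:.2f}")
# r = -0.74, r^2 = 0.55, SSreg/SStot = 0.55
```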
Regression Analysis
The nature and strength of the relationships between variables may be examined by regression and correlation analysis.
The linear correlation coefficient was presented as a quantity that measures the strength of a linear relationship (dependency).
Regression analysis is helpful in ascertaining the probable form of the relationship between variables. When this method of analysis is employed, the ultimate objective is usually to predict or estimate the value of one variable corresponding to a given value of another variable.
The general purpose of regression is to learn more about the relationship between one or several independent (predictor) variables and a dependent (criterion) variable.
The problem is to fit a straight line to the data that in some sense gives the best prediction of y for any value of x.
Intuitively, this will be a line that minimizes the distance between the data and the fitted line.
There are several approaches to this problem, but the standard method is called least squares regression.
When we use this method to fit a regression line, we minimize the sum of squares of the vertical distances of the observations from the line.
Each distance is the difference, for an individual, between the observed value and the value given by the line, known as the fitted value. The technical term for this distance is a residual.
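The least squares idea can be sketched directly. The code computes the line from the means, checks that its residuals sum to zero, and checks that moving the line increases the sum of squared residuals. The five data points are hypothetical, chosen only for illustration; they are not from the slides.

```python
def least_squares_line(xs, ys):
    """Return (a, b) of y = a + b*x minimising the sum of squared vertical distances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b


def residuals(xs, ys, a, b):
    """Observed value minus fitted value (a + b*x) for each individual."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sum_sq_residuals(xs, ys, a, b):
    """Sum of squared residuals for the line y = a + b*x."""
    return sum(e ** 2 for e in residuals(xs, ys, a, b))


# Hypothetical illustration data (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
sse = sum_sq_residuals(xs, ys, a, b)
print(f"a = {a:.2f}, b = {b:.2f}, SSE = {sse:.2f}")  # a = 2.20, b = 0.60, SSE = 2.40

# Shifting or tilting the line gives a larger sum of squared residuals.
for da, db in [(0.1, 0.0), (0.0, 0.05), (-0.2, 0.05)]:
    assert sum_sq_residuals(xs, ys, a + da, b + db) > sse
```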
Example: The findings of a study of 20 women are presented in the table. The study investigated whether there is a relationship between the number of pregnancies (x) and hemoglobin (y) values.
$$ b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} $$
$$ b = \frac{1075 - \frac{(96)(236)}{20}}{616 - \frac{96^2}{20}} = -0.37 $$
$$ a = \bar{y} - b\bar{x} = \frac{236}{20} - \left[(-0.37)\,\frac{96}{20}\right] = 13.57 $$
$$ y = a + bx \quad\Rightarrow\quad y = 13.57 - 0.37x $$
To interpret the direction of the relationship between variables, one looks at the signs (plus or minus) of the regression coefficients.
If a coefficient is positive, then the relationship of this variable with the dependent variable is positive; if the coefficient is negative, then the relationship is negative.
Of course, if the coefficient is equal to 0, then there is no linear relationship between the variables.
[Figure: hemoglobin values (y axis, 0 to 15) against number of pregnancies (x axis, 0 to 12), with the fitted line y = 13.57 - 0.37x marked at x = 0, 5 and 10]
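The slope and intercept of the pregnancy/hemoglobin example can be recomputed from the summary sums as below. This is a sketch. Keeping full precision gives b ≈ -0.372 and a ≈ 13.59, which differ slightly from the hand-rounded values. `predict` is a hypothetical helper name.

```python
# Pregnancy (x) / hemoglobin (y) example, n = 20.
n, sx, sy, sxy, sxx = 20, 96, 236, 1075, 616

b = (sxy - sx * sy / n) / (sxx - sx ** 2 / n)  # slope
a = sy / n - b * sx / n                       # intercept: a = mean(y) - b * mean(x)


def predict(x):
    """Fitted hemoglobin value for x pregnancies."""
    return a + b * x


print(f"a = {a:.2f}, b = {b:.2f}")  # a = 13.59, b = -0.37
for x in (0, 5, 10):
    print(x, round(predict(x), 2))
# 0 13.59
# 5 11.73
# 10 9.86

# The sign of b gives the direction of the relationship.
print("direction:", "negative (inverse)" if b < 0 else "positive" if b > 0 else "none")
```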
Analyze > Correlate > Bivariate
Analyze > Regression > Linear
[Figure: software output with the r and p values marked]
REGRESSION ANALYSIS