6) CorrelationAndRegression - 27

Uploaded by

uwtfme

17.10.2023

Prof. Dr. Ahmet DİRİCAN
İÜ-C. CERRAHPAŞA MEDICAL FACULTY
DEPARTMENT OF BIOSTATISTICS

[Figure: scatter of y against x with the fitted line Y=a+bX]

Correlation Analysis
The term correlation refers to any type of relationship between events and objects.
If we are interested only in determining whether a relationship exists, we employ correlation analysis.
Correlation analysis refers exclusively to a quantifiable relationship between two variables.
For the correlation calculation, there must be two measures (variables) for each subject.
If this condition is satisfied, the data can be inserted into a statistical formulation that will reveal the type and strength of the relationship under study.

The most widely employed measure of statistical correlation is the product-moment correlation coefficient devised by Pearson.
Many other techniques used to describe relationships are analogous to the Pearson approach.

We consider the problem of predicting one variable (the dependent variable "Y") from one or more related variables (the independent variables "X").

Correlation-two variables (Univariate & Bivariate Statistics)

• linear pattern of relationship between one variable (x) and another variable (y)
• an association between two variables
• relative position of one variable correlates with relative distribution of another variable
• graphical representation of the relationship between two variables

Warning:
• No proof of causality
• Cannot assume x causes y

Sample vs. Population

• Sample statistics estimate Population parameters
– x̄ tries to estimate μ ("x bar", the sample mean, estimates "mu", the population mean)
– r tries to estimate ρ ("rho", a Greek symbol, not "p")
• r: correlation for a sample (based on the limited observations we have)
• ρ: actual correlation in the population (the true correlation)
• Beware Sampling Error!!
– even if ρ=0 (there's no actual correlation),
– you might get r=0.08 or r=-0.26 just by chance.
– We look at r, but we want to know about ρ

Hypothesis testing with Correlations

• Two possibilities
– Ho: ρ = 0 (no actual correlation; The Null Hypothesis)
– Ha: ρ ≠ 0 (there is some correlation; The Alternative Hypothesis)
• Case #1 (see correlation worksheet)
– Correlation between distance and points, r = -0.904
– Sample small (n=6), but r is very large
– We guess ρ < 0 (we guess there is some correlation in the population)
• Case #2
– Correlation between aiming and points, r = 0.628
– Sample small (n=6), and r is only moderate in size
– We guess ρ = 0 (we guess there is NO correlation in the population)
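The two guesses above can be checked with the usual t statistic for H0: ρ = 0, t = r·sqrt(n-2)/sqrt(1-r²), with n-2 degrees of freedom. The sketch below is only an illustration: the function name is made up, and the critical value 2.776 is the two-sided 5% value for 4 degrees of freedom taken from a standard t table, not from the worksheet.

```python
import math

def t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Assumption: two-sided 5% critical value of Student's t for df = 4,
# read from a standard t table.
T_CRIT_DF4 = 2.776

cases = {"distance vs. points": -0.904, "aiming vs. points": 0.628}
reject_h0 = {}
for label, r in cases.items():
    t = t_statistic(r, n=6)
    reject_h0[label] = abs(t) > T_CRIT_DF4
    print(f"{label}: r = {r:+.3f}, t = {t:+.2f}, reject H0: {reject_h0[label]}")
```

With only six observations, the very large r of Case #1 exceeds the critical value while the moderate r of Case #2 does not, which agrees with the guesses on the slide.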


THE MEASURES OF RELATIONSHIP BETWEEN CONTINUOUS VARIABLES

A scatterplot can reveal various types of associations between two variables.
Although a scatterplot is an essential first step in studying the association between variables, it is often useful to quantify the strength of association by calculating a summary index.

Scatter Diagrams: The most accurate information about the relationship model between two variables is obtained from the scatter diagram of individuals.

[Figure: four scatter diagrams of y against x, labelled r≈0, r≈1, r≈-1 and r≈0]
• r≈0: There appears to be no discernible relationship between the two variables.
• r≈1 and r≈-1: The y variable can respond to an increase in the x variable with an increase or a decrease. The variables are related linearly.
• r≈0: Small and large values of the x variable are associated with large values of the y variable. The relationship is U-shaped.

TYPES OF RELATIONSHIPS

1-Deterministic (exact) relationships.
The observed (x, y) data points fall directly on a line, for example y=1.5x.
The relationship between degrees Fahrenheit and Celsius is known to be: F=(9/5)*C+32

2-Empirical (experimental), stochastic (probabilistic) relationships.
y=1.5x + Random Error
A stochastic model is a mathematical description (of the relevant properties) of an entropy source using random variables.
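The difference between the two types of relationship can be seen numerically. The sketch below is an illustration under stated assumptions: the temperature grid, the noise standard deviation of 3 and the fixed seed are arbitrary choices, and `pearson_r` is a helper written for this example.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)  # fixed seed so the sketch is reproducible

# Deterministic: every point lies exactly on F = (9/5)*C + 32.
celsius = [float(c) for c in range(0, 101, 10)]
fahrenheit = [9.0 / 5.0 * c + 32.0 for c in celsius]

# Stochastic: y = 1.5x plus a random error (assumed normal, SD = 3).
xs = [float(v) for v in range(1, 21)]
ys_noisy = [1.5 * x + random.gauss(0.0, 3.0) for x in xs]

r_deterministic = pearson_r(celsius, fahrenheit)
r_stochastic = pearson_r(xs, ys_noisy)
print(f"deterministic r = {r_deterministic:.4f}")
print(f"stochastic    r = {r_stochastic:.4f}")
```

The deterministic pair gives r equal to 1 up to rounding, while the random error pulls the stochastic r below 1.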

As the scatter in the sample space widens, the strength of the relationship decreases.

[Figure: scatter diagrams. 1: positive relationship (r = 1, r = 0.85, r = 0.55); 2: negative relationship (r = -1, r = -0.85, r = -0.55); 3: no relation (r = 0.0)]

Interpretation of the Correlation Coefficients
"r" gives two pieces of information about the relationship: strength (1) and direction (2).

1) Strength of relationship:
The strength of "r" is judged from its absolute value, as follows.

"r="           Relation
0.00 – 0.19    None (Chance Effect)
0.20 – 0.34    Weak
0.35 – 0.49    Low
0.50 – 0.64    Moderate
0.65 – 0.79    Strong
0.80 – 0.95    Very Strong
0.96 – 1.00    Perfect

2) Direction of relationship: r ranges in value from "–1" to "+1"
• positive (direct, parallel) – variables move in same direction
• negative (inverse) – variables move in opposite directions

-1 (Strong Negative)     0 (No Relation)     +1 (Strong Positive)
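The strength table and the direction rule can be written as a small lookup. This is a sketch, not part of the slides: the function name `describe_r` is hypothetical, and the table's bands are treated as half-open intervals (for example 0.20 ≤ |r| < 0.35 for "weak") so that values such as 0.195 are still classified.

```python
def describe_r(r: float) -> tuple[str, str]:
    """Return (strength, direction) labels for a correlation coefficient."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")

    # Upper bounds of each band, applied to the absolute value of r.
    bands = [
        (0.20, "none (chance effect)"),
        (0.35, "weak"),
        (0.50, "low"),
        (0.65, "moderate"),
        (0.80, "strong"),
        (0.96, "very strong"),
    ]
    strength = "perfect"
    for upper, label in bands:
        if abs(r) < upper:
            strength = label
            break

    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction

print(describe_r(0.54))
print(describe_r(-0.74))
```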

One commonly used measure is the Pearson correlation coefficient, denoted by r. It is defined by the following formulas.

$$ r = \frac{\sum x_i y_i - \dfrac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}}{\sqrt{\left[\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}\right]\left[\sum y_i^2 - \dfrac{\left(\sum y_i\right)^2}{n}\right]}} $$

$$ r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{SD_x^2 \, SD_y^2 \, n^2}} = \frac{\sum xy - \left(\sum x \sum y / n\right)}{SD_x \, SD_y \, n} $$

If the number of cases is less than 30, (n-1)² is used instead of n² in the first form, and n-1 is used instead of n in the second.

Testing the validity of the correlation coefficient
H0: ρ=0    HA: ρ≠0

$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$

Example: A random sample of 25 nurses selected from a state registry of nurses yielded the following information on each nurse's score on the state board examination and his or her score in school. Both scores relate to the nurse's area of affiliation.

Final score (x)    State board score (y)
87                 440
87                 480
87                 535
88                 460
88                 525
89                 480
89                 510
89                 530
89                 545
89                 600
90                 495
90                 545
90                 575
91                 525
91                 575
91                 600
92                 490
92                 510
92                 575
93                 540
93                 595
94                 525
94                 545
94                 600
94                 625

[Figure: scatter diagram of State Board Score (400 to 700) against Final Score (86 to 96)]

∑x=2263, ∑y=13425, ∑xy=1216685, ∑x²=204977 and ∑y²=7264525


Is there any correlation between final score and state board score?

H0: ρ=0
HA: ρ≠0

$$ r = \frac{\sum_{i=1}^{n} x_i y_i - \dfrac{\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n}}{\sqrt{\left[\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]\left[\sum_{i=1}^{n} y_i^2 - \dfrac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}\right]}} $$

∑x=2263, ∑y=13425, ∑xy=1216685, ∑x²=204977, ∑y²=7264525

$$ r = \frac{1216685 - \dfrac{(2263)(13425)}{25}}{\sqrt{\left[204977 - \dfrac{5121169}{25}\right]\left[7264525 - \dfrac{180230625}{25}\right]}} = 0.541788 $$

$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = 3.09 > t_{table} = 2.0687 \;\Rightarrow\; P < 0.05 $$

Reject H0.

The strength of the relationship between final score and state board score was moderate (r=0.54), parallel (positive) and valid (statistically significant).

Example: The findings of the study of 20 women are presented in the table, to investigate whether there is a relationship between the number of pregnancies (x) and hemoglobin (y) values.

$$ r = \frac{1075 - \dfrac{96 \cdot 236}{20}}{\sqrt{\left(616 - \dfrac{96^2}{20}\right)\left(2824 - \dfrac{236^2}{20}\right)}} = -0.74 $$

$$ |t| = \frac{|r|}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = 4.67 > t_{(18;\,0.05)} = 2.1 \;\Rightarrow\; p < 0.05 $$

H0 rejected, H1 accepted: "r is strong, inverse and valid".
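The nurses calculation can be reproduced directly from the 25 data pairs. The sketch below uses the computational formula for r and the t statistic given on the slides; the variable names are chosen for this illustration only.

```python
import math

# The 25 (final score, state board score) pairs from the example table.
final = [87, 87, 87, 88, 88, 89, 89, 89, 89, 89, 90, 90, 90,
         91, 91, 91, 92, 92, 92, 93, 93, 94, 94, 94, 94]
board = [440, 480, 535, 460, 525, 480, 510, 530, 545, 600, 495, 545, 575,
         525, 575, 600, 490, 510, 575, 540, 595, 525, 545, 600, 625]

n = len(final)
sum_x = sum(final)
sum_y = sum(board)
sum_xy = sum(x * y for x, y in zip(final, board))
sum_x2 = sum(x * x for x in final)
sum_y2 = sum(y * y for y in board)

# Numerator and the two bracketed terms of the computational formula.
sxy = sum_xy - sum_x * sum_y / n
sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n

r = sxy / math.sqrt(sxx * syy)
t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

print(f"sums: {sum_x}, {sum_y}, {sum_xy}, {sum_x2}, {sum_y2}")
print(f"r = {r:.6f}, t = {t:.2f} with {n - 2} degrees of freedom")
```

Printing the sums alongside r makes it easy to check each intermediate value against the slide before comparing t with the table value.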

Correlation Coefficient = r = -0.74, p<0.05
Interpretation: A strong, inverse and valid correlation between the number of pregnancies and hemoglobin values was found.

Coefficient of Determination = r² = 0.55
Interpretation: 55% of the variation in hemoglobin values is explained by the number of pregnancies.

Coefficient of Determination
Tests thus far have shown whether a linear relationship exists; it is also useful to measure the strength of the relationship. This is done by calculating the coefficient of determination.
The coefficient of determination is the square of the coefficient of correlation (r), hence R² = r².
"R²" is interpreted as the proportion of the variation in the dependent variable that is predictable from the independent variable(s).
Unlike the value of a test statistic, the coefficient of determination does not have a critical value that enables us to draw conclusions.
In general, the higher the value of R², the better the model fits the data.
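For a straight line fitted by least squares, squaring r gives the same number as the "proportion of variation explained", 1 - SS_residual/SS_total. The sketch below checks this on the sums from the pregnancies and hemoglobin example; it relies on the least-squares identity SS_residual = Syy - b·Sxy, and the variable names are illustrative.

```python
# Sums from the example of 20 women (x: pregnancies, y: hemoglobin).
n = 20
sum_x, sum_y = 96, 236
sum_xy, sum_x2, sum_y2 = 1075, 616, 2824

sxy = sum_xy - sum_x * sum_y / n   # corrected sum of cross-products
sxx = sum_x2 - sum_x ** 2 / n      # corrected sum of squares of x
syy = sum_y2 - sum_y ** 2 / n      # total variation in y (SS_total)

# Route 1: square of the correlation coefficient.
r_squared = sxy ** 2 / (sxx * syy)

# Route 2: share of the total variation left unexplained by the
# least-squares line, using SS_residual = Syy - b * Sxy.
slope = sxy / sxx
ss_residual = syy - slope * sxy
explained_share = 1.0 - ss_residual / syy

print(f"r^2 = {r_squared:.4f}, explained share = {explained_share:.4f}")
```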

Regression Analysis

The nature and strength of the relationships between variables may be examined by regression and correlation analysis.
The linear correlation coefficient was presented as a quantity that measures the strength of a linear relationship (dependency).
Regression analysis is helpful in ascertaining the probable form of the relationship between variables, and the ultimate objective when this method of analysis is employed usually is to predict or estimate the value of one variable corresponding to a given value of another variable.
The general purpose of regression is to learn more about the relationship between one or several independent or predictor variables and a dependent or criterion variable.

The problem is to fit a straight line to the data that in some sense gives the best prediction of y for any value of x.
Intuitively this will be a line that minimizes the distance between the data and the fitted line.
There are several approaches to this problem, but the standard method is called least squares regression.
When we use this method to fit a regression line we minimize the sum of squares of the vertical distances of the observations from the line.
Each distance is the difference for an individual between the observed value and the value given by the line, known as the fitted value. The technical term for this distance is a residual.
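The least squares idea can be illustrated on a handful of points. The five (x, y) pairs below are made-up numbers used purely for illustration, and the helper `sse` is hypothetical; the sketch fits the line, lists the residuals, and shows that nudging the intercept or slope away from the fitted values increases the sum of squared residuals.

```python
# Made-up illustrative data, not taken from the slides.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def sse(a: float, b: float) -> float:
    """Sum of squared vertical distances from the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least squares slope and intercept.
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
a = mean_y - b * mean_x

# Residual = observed value minus fitted value.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
best = sse(a, b)

# Any small change to a or b gives a larger sum of squares.
perturbed = [sse(a + da, b + db)
             for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]]

print(f"a = {a:.3f}, b = {b:.3f}, SSE = {best:.4f}")
print("residuals:", [round(e, 3) for e in residuals])
print("SSE after perturbing a or b:", [round(v, 4) for v in perturbed])
```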


Bivariate Linear Regression Analysis

Mathematical model of bivariate linear relation: A line in a two-variable space is defined by the following equation. The Y variable can be expressed in terms of a constant (a) and a slope (b) times the X variable.

Y = a + bX + Random Error

Y: Dependent Variable
X: Independent Variable
a: Constant
b: Coefficient of Slope

The regression coefficients "a" and "b" are calculated with the "LEAST SQUARES METHOD".

[Figure: scatter diagram showing (*) the distribution of individuals in the sample space according to their x, y values, with the regression equation Y=a+bX drawn through them]

Calculation of Regression Equation

$$ b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}} = \frac{\sum xy - \left(\sum x \sum y / n\right)}{SD_x^2 \cdot n} $$

If the number of cases is less than 30, n-1 is used instead of n.

The line y=a+bx must pass through the intersection of x̄ and ȳ. Hence ȳ = a + b·x̄, so the constant is calculated as a = ȳ - b·x̄.
Example: The findings of the study of 20 women are presented in the table, to investigate whether there is a relationship between the number of pregnancies (x) and hemoglobin (y) values.

$$ b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}} $$

b = [1075 - ((96*236)/20)] / [616 - (96²/20)] = -0.37
y = a + bx
a = (236/20) - [-0.37 * (96/20)] = 13.57
y = 13.57 - 0.37x

The fitted line is drawn by evaluating the equation at x: 0, x: 5 and x: 10.

[Figure: Hemoglobin Values (0 to 15) against Number of Pregnancies (0 to 12), with the fitted regression line]

To interpret the direction of the relationship between variables, one looks at the signs (plus or minus) of the regression, or β, coefficients (b in y = a + bx).
If a β coefficient is positive, then the relationship of this variable with the dependent variable is positive; if the β coefficient is negative, then the relationship is negative.
Of course, if the β coefficient is equal to 0, then there is no relationship between the variables.
β, the slope, gives the amount of change in the dependent variable when the independent variable changes by one unit.
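The hemoglobin example can be recomputed from its sums. The sketch below is an illustration with names chosen here; it computes b and a without intermediate rounding, so the constant comes out slightly above the slide's 13.57, which follows from rounding b to -0.37 before computing a. Predictions use the slide's rounded equation.

```python
# Sums from the example of 20 women (x: pregnancies, y: hemoglobin).
n = 20
sum_x, sum_y = 96, 236
sum_xy, sum_x2 = 1075, 616

# Slope and constant by the least squares formulas.
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * (sum_x / n)

def predict(x: float) -> float:
    """Predicted hemoglobin from the slide's rounded equation."""
    return 13.57 - 0.37 * x

print(f"unrounded: b = {b:.4f}, a = {a:.4f}")
for x in (0, 5, 10):
    print(f"x = {x:2d} pregnancies -> predicted hemoglobin = {predict(x):.2f}")
```

The negative slope means each additional pregnancy is associated with a predicted hemoglobin value about 0.37 units lower.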

Classical assumptions for regression analysis include:

1. The sample must be representative of the population for the inference prediction.
2. The error is assumed to be a random variable with a mean of zero conditional on the explanatory variables.
3. The independent variables are error-free. If this is not so, modeling may be done using errors-in-variables model techniques.
4. The predictors must be linearly independent, i.e. it must not be possible to express any predictor as a linear combination of the others.
5. The errors are uncorrelated, that is, the variance-covariance matrix of the errors is diagonal and each non-zero element is the variance of the error.
6. The variance of the error is constant across observations (homoscedasticity).

Correlation Analysis in SPSS
(SPSS: Statistical Package for the Social Sciences)

• Regression and correlation analyses are two complementary methods.
• If only correlation analysis is to be performed, then after entering the data into the data page, the analysis is run with these menu options: Analyze>Correlation>Bivariate
• Correlation findings are also included in the final table of regression analysis.


Dialog box of Correlation Analysis

Regression Analysis in SPSS
Analyze>Regression>Linear

Correlation and Regression Outputs of "Math. and Intelligence (IQ)_Points"

CORRELATION ANALYSIS
[SPSS output with the r and p values marked]
a; sign (+), b; strength level (0.93), c; significance (P<0.001***)

REGRESSION ANALYSIS
Interpretation: The mathematical model of the relationship between Math and IQ scores is shown by the linear regression equation.
Math Point = -11.925 + (1.111 * IQ Point)
