Correlation Coefficient

This presentation covers the correlation coefficient in the context of the reliability of tests and test items.


THE CORRELATION COEFFICIENT
PRESENTED BY:
ASNA ZIA (1994)
SHEEZA TARIQ (1995)
MAHNOOR IKLAQ (1996)
IQRA MASOOD CHOHAN (1999)
AMMARA ABID (2002)
QURAT-UL-AIN (2003)
Correlation Coefficient:
The correlation coefficient is a statistical measure that quantifies the strength and direction of the relationship between two variables. It is denoted by r (in the case of Pearson correlation) and ranges from +1 to −1.

Three Essential Facts about Correlation:
1.The degree of relationship between two variables is indicated by the number in
the coefficient, whereas the direction of the relationship is indicated by the sign.
2. Correlation, even if high, does not imply causation
3. High correlations allow us to make predictions
Pearson Product Moment Correlation:
It is used to find the correlation between two continuous variables. The basic formula that Karl Pearson devised for computing the correlation coefficient of bivariate data from a sample is formally known as the Pearson product moment correlation coefficient. The definitional formula of this coefficient, more commonly referred to as the Pearson r, is

rxy = Σxy / (N sx sy)    (2.1)

where x and y are deviation scores from the two variables' means, N is the number of pairs, and sx and sy are the two standard deviations.
Although the computational raw-score formula for the Pearson r is more
complicated than the definitional formula, the easy availability of computer
software to compute correlation coefficients makes the computational formula
practically unnecessary. On the other hand, Formula (2.1) and the even shorter Formula (2.2) are of considerable help in understanding the numerical meaning of the coefficient. The Pearson r is actually the mean of the cross-products of the standard scores of the two correlated variables. The formula that embodies this definition is

rxy = Σ zx zy / N    (2.2)
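Formula (2.2) translates directly into code. A minimal sketch in Python (function and variable names are my own, not from the source), using population standard deviations so that the mean cross-product of z-scores equals r:

```python
from statistics import fmean, pstdev

def pearson_r(x, y):
    # Pearson r as the mean cross-product of standard scores (Formula 2.2):
    # r = sum(zx * zy) / N, with population standard deviations.
    mx, my = fmean(x), fmean(y)
    sx, sy = pstdev(x), pstdev(y)
    n = len(x)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / n
```

For perfectly linear data such as x = [1, 2, 3, 4, 5] and y = [2, 4, 6, 8, 10], this returns 1.0.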

Conditions Necessary for the Use of Pearson r:
1. The pairs of observations are independent of one another.
2. The variables to be correlated are continuous and
measured on interval or ratio scales.
3. The relationship between the variables is linear; that is, it
approximates a straight-line pattern.
Deviations from Linearity:
1. Bending or Nonlinear Relationship
If the scatterplot shows a curve instead of a straight-line pattern, the relationship is nonlinear.
2. Uneven Spread of Data (Heteroscedasticity)
The data points in the scatterplot might spread out unevenly
(heteroscedasticity). For Pearson to work, the data should have a uniform
spread (homoscedasticity).
How to Check Scatterplots
1. Look for a straight-line pattern. If the scatterplot curves or bends, the
relationship is nonlinear, and Pearson isn’t suitable.
2. Check if the spread of points is roughly the same across the range. If not,
the data shows heteroscedasticity, and Pearson might not work.
Range Restriction and Correlation:
The Pearson correlation coefficient depends heavily on the variability (range of values) in the data. If the range of either variable is restricted, the correlation will appear smaller, even if the two variables are strongly related.
1. Extreme Case: No Variability
If one variable has no variability (all values are the same), the correlation will
always be zero because there’s nothing to relate.
2. Range Restriction in Employment Testing
Restricting the range of one variable can lower the correlation.
3. Wide Variability Can Inflate Correlation
A wide range of variability in both variables can exaggerate the correlation.
Other Correlation Methods
1. Spearman's Rank Correlation (Spearman's ρ)
When to Use: For ordinal data (data ranked in order, not measured precisely).
2. Eta (η): For Curvilinear Relationships
When to Use: When the relationship between two variables forms a curve
instead of a straight line.
3. Point-Biserial Correlation (rpb)
When to Use: When one variable is dichotomous (only two possible
values, like yes/no, true/false).
4. Phi Coefficient (φ): For Two Dichotomous Variables
When to Use: When both variables are dichotomous
5. Multiple Correlation Coefficient (R)
When to Use: When predicting a single variable (Y) using multiple predictors
(X1, X2, etc.).
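The point-biserial coefficient is mathematically equivalent to Pearson's r computed with the dichotomous variable coded 0/1 (the same idea underlies phi, with both variables coded 0/1). A small self-contained sketch, with illustrative names of my own:

```python
from statistics import fmean, pstdev

def point_biserial(dichotomous, scores):
    # Code the two-category variable as 0/1, then apply the ordinary
    # Pearson definitional formula; the result is the point-biserial r.
    x = [1 if v else 0 for v in dichotomous]
    mx, my = fmean(x), fmean(scores)
    sx, sy = pstdev(x), pstdev(scores)
    n = len(x)
    return sum(((a - mx) / sx) * ((b - my) / sy)
               for a, b in zip(x, scores)) / n
```

For example, `point_biserial([0, 0, 1, 1], [1, 2, 3, 4])` gives a strong positive correlation, since the "1" group scores higher.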
A SATISFACTORY SIZE FOR THE CORRELATION COEFFICIENT
1. Interpretation of Values:
• The correlation coefficient (r) ranges from -1 to +1.

• A value of +1 indicates a perfect positive correlation, meaning that as one variable increases, the
other variable also increases.

• Conversely, a value of -1 indicates a perfect negative correlation, where one variable increases as the other
decreases.
• A value of 0 indicates no correlation.

[Scatterplots illustrating negative, positive, and zero correlation]

2. Strength of Correlation:
The interpretation of the correlation coefficient is typically categorized as follows:

• Very weak: 0.00 to 0.19


• Weak: 0.20 to 0.39
• Moderate: 0.40 to 0.59
• Strong: 0.60 to 0.79
• Very strong: 0.80 to 1.00
• In psychological testing, a coefficient of 0.30 or higher is often considered satisfactory, indicating a
meaningful relationship between the variables being studied.
3. Sample Size Consideration:
• The reliability of the correlation coefficient is influenced by the sample size.

• Larger samples tend to provide more accurate estimates of the correlation, while smaller samples can
lead to unstable coefficients.

• Therefore, researchers should aim for an adequate sample size to ensure the validity of their findings.

4. Contextual Consideration:
• The satisfactory size of the correlation coefficient can depend on the context of the research.

• In some fields, such as psychology, a correlation of 0.30 may be considered meaningful, while in
other disciplines, a higher threshold might be required.
5. Statistical Significance:
• It is important to consider not just the size of the correlation but also its statistical significance.
• A correlation can be large but not statistically significant if the sample size is small.

• Conversely, a small correlation can be statistically significant in a large sample.


• Researchers should report both the correlation coefficient and the p-value to provide a complete
picture of their findings.
6. Practical Significance:
• Beyond statistical significance, researchers should also consider practical significance.

• This involves evaluating whether the strength of the correlation has real-world implications.

• A small correlation might be statistically significant but may not have practical relevance in a real-world context.

7. Correlation vs. Causation:
• Correlation does not imply causation; a relationship between two variables does not by itself establish a reason or cause.
• A high correlation coefficient does not mean that one variable causes changes in another.
• When causation is involved, we distinguish independent and dependent variables.
• Correlation by itself does not give direction, but in causation we know which variable is the cause (independent variable) and which is the effect (dependent variable).
8. Coefficient of Determination:
• The square of the correlation coefficient (r²) is known as the coefficient of determination.
• The coefficient of determination (r²) measures how well the independent variable explains the variability of the dependent variable.

• It provides insight into the proportion of variance in one variable that can be explained by the
other variable.
• For example, if r = 0.6, then r² = 0.36, indicating that 36% of the variance in one variable can
be explained by the other. This can help in understanding the practical significance of the
correlation.
9. Types of Correlation Coefficient:
• There are different types of correlation coefficients, such as Pearson's r, Spearman's rank correlation, and Kendall's tau.

• Each type is suited for different types of data.

• Pearson's r is used for linear relationships with interval or ratio data, while Spearman's and Kendall's
are used for ordinal data or non-linear relationships.
Pearson Product Moment Correlation:

r = Σ(X − X̄)(Y − Ȳ) / √[Σ(X − X̄)² × Σ(Y − Ȳ)²]

Spearman Rank Order Correlation:

r = 1 − 6Σd² / [n(n² − 1)]
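The Spearman formula above can be sketched for tie-free data (helper names are my own):

```python
def spearman_rho(x, y):
    # Spearman rank-order correlation for data without ties:
    # rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), d = per-pair rank difference.
    def ranks(values):
        order = sorted(range(len(values)), key=values.__getitem__)
        r = [0] * len(values)
        for rank, idx in enumerate(order, start=1):
            r[idx] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))
```

Any monotonically increasing relationship, even a curved one like y = x³, yields ρ = 1, which is what makes the rank coefficient suitable when Pearson's linearity assumption fails.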
Kendall's Tau Correlation:

τ = (C − D) / [n(n − 1)/2], where C and D are the numbers of concordant and discordant pairs.

10. Confidence Interval:
• A confidence interval for the correlation coefficient is a range of values that is likely to contain the true
correlation coefficient of a population based on sample data.
• It provides an estimate of the uncertainty around the correlation measurement.

• Typically, a confidence interval is expressed at a certain confidence level, such as 95%, meaning that if
we were to take many samples and compute the confidence interval for each, approximately 95% of
those intervals would contain the true correlation coefficient.
11. Reporting Correlation:
• When reporting correlation results, it is essential to include the correlation coefficient, the sample
size, the p-value, and the confidence interval.
• This comprehensive reporting allows for better interpretation and understanding of the findings.

• In summary, understanding the satisfactory size of the correlation coefficient involves


considering various factors, including the type of correlation, the limitations of correlation analysis,
sample size, and comprehensive reporting of results. These elements help researchers draw
meaningful conclusions from their data in psychological testing.
Factors Influencing Reliability of Test Scores
• Extrinsic factors
• Intrinsic factors

Test Length
• Longer tests tend to be more reliable.
• Reason: They cover a broader range of content and reduce the impact of outlier items.
Scoring Consistency
• The objectivity and uniformity of scoring.
• Factors: Training of scorers, clear rubrics.
• Impact: Consistent scoring increases reliability.

Test Design
• The design and alignment of test items with the content being measured.
• Well-aligned tests accurately reflect the intended knowledge or skill areas.

Statistical Analysis
• Techniques used to measure internal consistency.
• Impact: Helps quantify and improve test reliability.

Test Administration Conditions
• The environment in which the test is administered.
• Factors: Noise levels, lighting, seating arrangements.
• Impact: Variations can distract or discomfort test-takers, affecting performance.

Test-Retest Interval
• The time interval between repeated administrations of the same test.
• Impact: Short intervals may result in higher reliability due to memory effects; longer intervals can show lower reliability due to changes in knowledge or skills.

Item Quality
• The clarity and unambiguity of test items.
• Impact: High-quality items contribute to the overall reliability of the test.
Strategies to Improve Reliability of Test Scores
1. Addressing Sources of Error:
• Content Sampling Error: This type of error occurs when a test fails to represent all aspects of
the domain it is intended to measure. For example, if a math test overemphasizes algebra but
neglects geometry, it introduces content sampling error.

• Solution: Develop a test blueprint that ensures balanced representation of all content areas.

• Use alternate-form reliability techniques, where different but equivalent versions of the test are created
and compared to estimate and reduce such errors.

• Time Sampling Error: This error arises from fluctuations in test-takers’ performance due to
timing factors, such as mood, fatigue, or external circumstances.

• Solution: Administer the test multiple times (test-retest reliability) to check stability over time.
• Use appropriate intervals between testing sessions: Shorter intervals
minimize practice effects (remembering items from the first test).

• Longer intervals account for natural variability over time.

2. Ensuring Scorer Consistency:
a) Scorer Reliability:
• Challenge: When tests involve subjective scoring (e.g., essays, portfolios), inconsistency among
scorers can reduce reliability.

• Solution: Train scorers to follow clear rubrics and scoring guidelines.

• Use multiple raters and calculate inter-rater reliability. A correlation of 0.90 or higher between scores
from different raters indicates strong agreement.
• Employ double scoring for high-stakes assessments to cross-check accuracy.
3. Improving Test Design:
• Internal consistency: Evaluates whether all test items measure the same underlying
construct.
• Solution: Use statistical measures such as:

• Split-half reliability: Divide the test into two halves and assess how well scores from each
half correlate.
• Cronbach’s alpha: A measure of how well items are interrelated. Higher alpha values indicate
greater reliability.

• Revise or eliminate poorly performing items (items with low item-total correlations).

• Test Length: Rationale: Longer tests generally produce more reliable scores because they reduce the influence of random errors in individual items.

• Solution: Add more items that assess the same construct, ensuring they are high-quality and well-designed.
• Develop parallel test forms with equivalent difficulty levels to increase reliability for repeated
testing.
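The internal-consistency measures named above can be computed directly. A minimal sketch of Cronbach's alpha (the data layout and names are my own, not from the source):

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    # item_scores: one inner list per test item, aligned across the same
    # respondents, e.g. [[item 1 scores...], [item 2 scores...], ...].
    # alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)
    k = len(item_scores)
    sum_item_var = sum(pvariance(item) for item in item_scores)
    totals = [sum(per_person) for per_person in zip(*item_scores)]
    return (k / (k - 1)) * (1 - sum_item_var / pvariance(totals))
```

Perfectly interrelated items (e.g., two items with identical score patterns) give alpha = 1; weakly related items pull alpha down, flagging candidates for revision.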
4. Using Advanced Theoretical Models:
• Item Response Theory (IRT): A modern psychometric approach that focuses on the relationship between item difficulty and test-taker ability.
• Advantages: Allows precise calibration of test items for difficulty and discrimination.

• Enables adaptive testing, where the test adapts to the ability level of the test-taker, improving efficiency
and reliability.
• Application: Use IRT to identify items that function differently across subgroups (differential item
functioning).
• Generalizability Theory (GT): GT goes beyond classical test theory to evaluate multiple
sources of error simultaneously, such as item, time, and scorer variability.
• Application: Use GT to design tests that minimize variance caused by multiple factors, offering a
comprehensive view of reliability.
5. Evaluating Reliability Data in
Context:
• Purpose-Specific Reliability: High reliability is more critical for high-stakes tests (e.g.,
licensing exams) than for exploratory or low-stakes measures.
• Reliability coefficients above 0.80 are generally acceptable; however, thresholds of 0.90 or higher may
be needed for critical decisions.
• Sample Considerations: Ensure the reliability analysis is conducted on a representative sample
of the target population to avoid biased estimates.
• Interpretation: Use confidence intervals to interpret reliability coefficients, considering the
specific purpose and constraints of the test.

6. Standardization of Administration:
• Challenge: Variability in test administration (e.g., differences in instructions, environment, or time
limits) introduces error.

• Solution: Use standardized instructions and administration procedures for all test-takers.
• Control environmental factors such as noise, lighting, and seating arrangements.

• Train administrators to follow consistent protocols and handle unexpected situations uniformly.
TRUE SCORE

True scores are hypothetical entities that would result from error-free measurement. In other words, a true score represents the perfect, unbiased, and reliable measure of the construct being assessed.
CLASSICAL TEST THEORY

In classical test theory, an individual's true score is conceptualized as the average score in a hypothetical distribution of scores that would be obtained if the individual took the same test an infinite number of times. This concept is important because it helps us understand that observed scores are not always a perfect reflection of an individual's true ability or characteristic.


OBSERVED SCORE

Observed scores are the scores that individuals actually obtain from a test. They are directly observable and measurable, meaning that they can be quantified and recorded. An observed score has two components: a true score and an error component.

TRUE SCORE & ERROR SCORE

The true score component reflects the individual's true ability or characteristic, while the error score component represents any other factors that may enter into the observed score.
EQUATION

This relationship can be expressed using the following equation: Xo = Xtrue + Xerror. This equation shows that any observed score (Xo) is made up of two components: a true score component (Xtrue) and an error score component (Xerror).
Quantifying Error in Test Scores: The Standard Error of Measurement

Standard Error of Measurement (SEM):
• Standard deviation of errors of measurement that are associated with “test” scores.
• Represents the range of scores that could be obtained if the test were repeated multiple times.
• Formula: SEM = SD × √(1 - r)
• where;
• SD = the standard deviation of the test
• r = the reliability coefficient
• The larger the SD, the larger the SEM.
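The SEM formula is a one-liner in code. A small sketch (function name is my own) using the WAIS-III values from the example that follows:

```python
import math

def standard_error_of_measurement(sd, reliability):
    # SEM = SD * sqrt(1 - r): the typical size of the error component
    # expected in an observed score.
    return sd * math.sqrt(1 - reliability)

# Values from the text: SD = 3, r = .79
sem = standard_error_of_measurement(3, 0.79)  # about 1.37
```

Note how the formula captures both bullet points: SEM scales directly with SD, and shrinks toward zero as reliability approaches 1.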
Why is SEM Important?
• Reminds us that test scores are not precise.
• Provides a range of possible true scores for an individual.
• Allows us to quantify the extent to which a test provides accurate scores.
• A low SEM indicates high score accuracy.
• A high SEM indicates low score accuracy.
Example: Applying the SEM
• Case study: Maria's Vocabulary subtest score (WAIS-III)
• Calculating the SEM and confidence interval, then interpreting the results
• SD = 3, M = 10, reliability coefficient r = .79

SEM = 3 × √(1 − .79) = 1.37

• After calculating the SEM, a confidence interval is calculated using the SEM and the desired confidence level.
Confidence Interval:
• A statistical tool that provides a range of values within which a population parameter is
likely to lie.
• To calculate confidence interval we apply the percentages of estimated confidence level to
the estimated true scores.
• Importance:
1. Reminds us that test scores are not precise.
2. Prevents overvaluing insignificant score differences.
Confidence Interval:
• Getting estimated true score using formula based on Dudek (1979):
• T ′ = r (Xo – M ) + M
• where:
• T ′ = the individual’s estimated true score
• r= estimated reliability of test scores
• Xo = the individual’s obtained score
• M = the mean of the test score distribution
Confidence Interval:
• Xo = 15, r = .79, and M = 10
T′ = (.79)(15 − 10) + 10 = 13.95 ≈ 14
• If Xo is above M, then T′ is lower than Xo.
• If Xo is below M, then T′ is greater than Xo.
• If Xo = M, then T′ equals M.
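Dudek's estimated-true-score formula maps directly to code. A sketch using the example values (function name is my own):

```python
def estimated_true_score(observed, reliability, mean):
    # T' = r * (Xo - M) + M: regresses the obtained score toward the
    # test mean in proportion to the test's reliability.
    return reliability * (observed - mean) + mean

t = estimated_true_score(15, 0.79, 10)  # 13.95, rounded to 14 in the text
```

The regression-toward-the-mean behavior in the bullets falls out of the formula: scores above M are pulled down, scores below M are pulled up, and a score at M stays put.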
Confidence Interval:
• Now, calculating the confidence interval:
• 68% CI = T′ ± SEM
  68% CI = 14 ± 1.37, i.e., between about 13 and 15
• 95% CI = T′ ± 1.96 × SEM
  95% CI = 14 ± (1.96)(1.37), i.e., between about 11 and 17
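These interval computations can be sketched as follows (function name is my own; z = 1.0 and z = 1.96 correspond to roughly 68% and 95% confidence under the normal curve):

```python
def confidence_interval(estimated_true, sem, z=1.96):
    # CI = T' +/- z * SEM; z = 1.0 gives ~68%, z = 1.96 gives ~95%.
    margin = z * sem
    return (estimated_true - margin, estimated_true + margin)

lo68, hi68 = confidence_interval(14, 1.37, z=1.0)   # 12.63 to 15.37
lo95, hi95 = confidence_interval(14, 1.37, z=1.96)  # about 11.31 to 16.69
```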


Standard Error of the Difference:
• Assessments often involve comparing scores within or between individuals.
• To determine if differences are due to chance, reliability data is used to calculate the Standard Error of the Difference (SEdiff).


• Formulae:

SEdiff Formula 1:
SEdiff = SD × √(2 − r11 − r22)
where
SD = the standard deviation of Test 1 and Test 2
r11 = the reliability estimate for scores on Test 1
r22 = the reliability estimate for scores on Test 2
Standard Error of the Difference:
SEdiff Formula 2:
SEdiff = √(SEM1^2 + SEM2^2)
Where:
SEM1 = Standard Error of Measurement for Score 1
SEM2 = Standard Error of Measurement for Score 2
SEdiff Formula 1 is used if the two test scores being compared are expressed in
the same scale, and SEdiff Formula 2 is used when they are not.
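Both formulas can be sketched in a few lines (function names are my own). When the two tests share the same SD, the formulas agree, since SEM² = SD² × (1 − r) for each test:

```python
import math

def sediff_same_scale(sd, r11, r22):
    # Formula 1: both tests are expressed on the same score scale
    # (and share the same standard deviation).
    return sd * math.sqrt(2 - r11 - r22)

def sediff_different_scales(sem1, sem2):
    # Formula 2: scores on different scales, combined via their SEMs.
    return math.sqrt(sem1 ** 2 + sem2 ** 2)
```

With the WAIS-III values used later (SD = 3, r11 = .79, r22 = .85), both routes give SEdiff = 1.80.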
Example: Applying SEdiff:
• The Standard Error of the Difference (SEdiff) is applied to determine the statistical significance of the difference between Maria's Vocabulary and Information subtest scores on the WAIS-III.
• SEdiff = √(SEM1² + SEM2²)
  = √(3² × (1 − .79) + 3² × (1 − .85))
  = 1.80
• Calculating the critical ratio:
  5 / 1.80 = 2.78
• Determining statistical significance:
  p-value = 2 × .0027 = .0054
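The two-tailed p-value for the critical ratio can be reproduced from the standard normal distribution. A sketch using `math.erfc`, which gives the normal tail directly (function name is my own):

```python
import math

def two_tailed_p(z):
    # Two-tailed p-value under the standard normal curve:
    # p = 2 * P(Z > |z|), computed via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))

z = 5 / 1.80         # the critical ratio from the text, about 2.78
p = two_tailed_p(z)  # roughly .005, in line with the text's table-based .0054
```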
Example: Applying SEdiff:
• Interpretation:
The probability that the 5-point difference between Maria's Vocabulary and Information subtest scores is due
to chance is 5.4 in 1,000.

• Conclusion:
Maria's knowledge of vocabulary most likely exceeds her knowledge of general information.
Profile Analysis:
• Using the SEM to analyze profiles of subtest scores
• Example: Maria's WAIS-III subtest scores

Subtest       Obtained Score   SEM    90% Confidence Interval
Vocabulary    15               1.37   13.63 to 16.37
Information   10               1.44   8.56 to 11.44
Arithmetic    12               1.51   10.49 to 13.51
Digit Span    14               1.58   12.42 to 15.58

Relationship Between Reliability and Validity:
• Reliability is a necessary but not sufficient condition for validity.
• Score reliability can be seen as minimal evidence of validity.
• However, some tests may produce valid results that are not reliable in terms of consistency or

stability.
THE END
THANK YOU