RMBS M2 Lecture 5a

Reliability refers to the consistency or repeatability of measurement. There are several types of reliability, including test-retest, parallel forms, inter-rater, and internal consistency. Test-retest reliability measures consistency over time, parallel forms uses different versions of a test, inter-rater examines consistency between raters, and internal consistency assesses consistency between items measuring the same construct. Reliability is crucial for minimizing measurement error and ensuring accurate interpretation of results.


RESEARCH METHODOLOGY & BIOSTATISTICS
PROF WAQAR AHMED AWAN
PhD In Rehabilitation Sciences

1
RELIABILITY
• Reliability is the consistency of your measurement, or the degree to which an instrument measures the same way each time it is used under the same conditions with the same subjects.
• It is the extent to which measurement is consistent and free
from error.

2
Reliability
Joppe (2000) defined reliability as:
• The extent to which results are consistent over time and an accurate representation of the total population under study is referred to as reliability.
• If the results of a study can be reproduced under a similar methodology, then the research instrument is considered to be reliable.

3
Reliability coefficient
• Reliability: an estimate of the extent to which a test score is free from error, i.e., the extent to which observed scores vary from true scores.
• Reliability coefficient ranges between 0.00 and 1.00 where
o 0 = no reliability
o <0.50 = poor reliability
o 0.50 – 0.75 = moderate reliability
o >0.75 = good reliability
o 1 = perfect reliability
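A minimal Python sketch that maps a reliability coefficient onto the qualitative bands in the list above; the function name is illustrative and the cut-offs simply follow the slide.

```python
# Classify a reliability coefficient using the bands from the slide above.
def interpret_reliability(r: float) -> str:
    if not 0.0 <= r <= 1.0:
        raise ValueError("A reliability coefficient ranges between 0.00 and 1.00")
    if r == 1.0:
        return "perfect reliability"
    if r > 0.75:
        return "good reliability"
    if r >= 0.50:
        return "moderate reliability"
    if r > 0.0:
        return "poor reliability"
    return "no reliability"

print(interpret_reliability(0.82))  # good reliability
```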

4
Types of reliability

5
Each type of reliability measures the consistency of…
• Test-retest: the same test over time.
• Rater: the same test conducted by the same person (intra-rater) or by different people (inter-rater).
• Alternate forms / parallel forms: different versions of a test which are designed to be equivalent.
• Internal consistency: the individual items of a test.
6
AGREEMENT
When the unit of measurement is on a categorical scale, reliability is assessed as a measure of agreement.
• The simplest measure of agreement is percent agreement.
• Percent agreement is calculated as the number of agreements divided by the total number of scores, but it does not take chance agreement into account and therefore overestimates the level of agreement.
• The kappa statistic, κ, is a chance-corrected measure of agreement, but is limited in that it does not differentiate among disagreements.
• To differentiate among disagreements, a modified version of the kappa statistic called weighted kappa can be used to estimate reliability.
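A minimal Python sketch of percent agreement and Cohen's kappa for two raters classifying the same subjects; the ratings are made up for illustration.

```python
import numpy as np

# Made-up categorical ratings of the same eight subjects by two raters.
rater_a = np.array(["yes", "no", "yes", "yes", "no", "yes", "no", "no"])
rater_b = np.array(["yes", "no", "no",  "yes", "no", "yes", "yes", "no"])

# Percent agreement: proportion of subjects on which the raters agree.
p_observed = np.mean(rater_a == rater_b)

# Chance agreement: probability that both raters choose the same category by
# chance, based on each rater's marginal proportions.
categories = np.union1d(rater_a, rater_b)
p_chance = sum(np.mean(rater_a == c) * np.mean(rater_b == c) for c in categories)

# Cohen's kappa corrects the observed agreement for chance agreement.
kappa = (p_observed - p_chance) / (1 - p_chance)

print(f"Percent agreement: {p_observed:.2f}")
print(f"Cohen's kappa:     {kappa:.2f}")
```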

7
INTERNAL CONSISTENCY
• The most commonly applied statistical index for internal consistency is Cronbach's alpha (α).
• It can be used for scales with items that are dichotomous (yes/no) or that have more than two response choices (ordinal scale).
• Inter-item correlations, item-total correlations, and Cronbach's alpha if an item is deleted are used to conduct an item analysis for the instrument.
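A minimal Python sketch of Cronbach's alpha computed from its standard formula; the four-item response matrix is made up.

```python
import numpy as np

# Made-up responses: rows = respondents, columns = items of one scale.
scores = np.array([
    [4, 3, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 4, 4],
    [3, 3, 2, 3],
    [4, 5, 5, 4],
])

k = scores.shape[1]                         # number of items
item_vars = scores.var(axis=0, ddof=1)      # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)  # variance of the total score

# alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha: {alpha:.2f}")
```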
8
ALTERNATE FORMS: LIMITS OF
AGREEMENT
• Two analysis procedures have traditionally been applied for method comparisons.
o The correlation coefficient, r, has been used to demonstrate covariance among methods.
o The second procedure is the paired t-test (or repeated-measures ANOVA), which is used to show that mean scores for two (or more) methods are not significantly different.
o An interesting alternative for examining agreement across methods is an index called the limits of agreement.
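A minimal Python sketch of 95% limits of agreement (the Bland-Altman approach) for two measurement methods; the paired measurements are made up.

```python
import numpy as np

# Made-up paired measurements of the same subjects by two methods.
method_1 = np.array([10.2, 11.5,  9.8, 12.1, 10.9, 11.0,  9.5, 12.4])
method_2 = np.array([10.6, 11.2, 10.1, 12.5, 10.4, 11.3,  9.9, 12.0])

diff = method_1 - method_2
bias = diff.mean()           # mean difference (systematic bias) between methods
sd_diff = diff.std(ddof=1)   # standard deviation of the difference scores

# 95% limits of agreement: bias +/- 1.96 * SD of the differences.
lower, upper = bias - 1.96 * sd_diff, bias + 1.96 * sd_diff
print(f"Bias: {bias:.2f}, limits of agreement: [{lower:.2f}, {upper:.2f}]")
```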

9
Test Retest
Reliability
10
Test-retest reliability
• Test-retest reliability is a measure of reliability obtained by administering the same test twice over a period of time to a group of individuals.
• The scores from Time 1 and Time 2 can then be correlated in order to evaluate the test for stability over time.

11
How to conduct Test-Retest Reliability
The three main components of this method are as follows:
1. Implement your measurement instrument at two separate times for each subject.
2. Choose an appropriate interval: far enough apart to avoid fatigue, learning, or memory effects, but close enough to avoid genuine changes in the measured variable.
3. Compute the correlation between the two separate measurements (a minimal sketch follows below).
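A minimal Python sketch of step 3, correlating made-up scores from two administrations of the same test with Pearson's r.

```python
import numpy as np
from scipy.stats import pearsonr

# Made-up scores for eight subjects tested at two time points.
time_1 = np.array([23, 31, 28, 35, 26, 30, 22, 33])
time_2 = np.array([25, 30, 27, 36, 28, 29, 24, 31])

r, p_value = pearsonr(time_1, time_2)
print(f"Test-retest correlation: r = {r:.2f} (p = {p_value:.3f})")
```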

12
Reliability Coefficients for Test-Retest Reliability
Reliability coefficient by type of data:
• Interval/ratio data: Pearson product-moment coefficient of correlation
• Ordinal data: Spearman rho
• Nominal data: percent agreement or the kappa statistic
• Where stability of response is questioned: standard error of measurement
Correlation coefficients are limited as estimates of reliability, so the INTRACLASS CORRELATION COEFFICIENT has become the preferred index, as it reflects both correlation and agreement.

13
• In statistics, the intraclass correlation, or the intraclass correlation
coefficient (ICC), is a descriptive statistic that can be used when quantitative
measurements are made on units that are organized into groups.
o It describes how strongly units in the same group resemble each other.

• The standard error of measurement (SEM) is a measure of how much measured test
scores are spread around a “true” score.
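A minimal Python sketch of a two-way random-effects intraclass correlation, ICC(2,1), together with the SEM; the subjects-by-raters matrix is made up, and the SEM here simply uses the standard deviation of all observed scores.

```python
import numpy as np

# Made-up scores: rows = subjects, columns = raters (or trials).
scores = np.array([
    [9.0, 2.0, 5.0, 8.0],
    [6.0, 1.0, 3.0, 2.0],
    [8.0, 4.0, 6.0, 8.0],
    [7.0, 1.0, 2.0, 6.0],
    [10.0, 5.0, 6.0, 9.0],
    [6.0, 2.0, 4.0, 7.0],
])
n, k = scores.shape
grand_mean = scores.mean()

# Mean squares from a two-way ANOVA (subjects x raters).
ms_rows = k * ((scores.mean(axis=1) - grand_mean) ** 2).sum() / (n - 1)
ms_cols = n * ((scores.mean(axis=0) - grand_mean) ** 2).sum() / (k - 1)
ss_total = ((scores - grand_mean) ** 2).sum()
ms_error = (ss_total - ms_rows * (n - 1) - ms_cols * (k - 1)) / ((n - 1) * (k - 1))

# Shrout & Fleiss ICC(2,1): single-measure, absolute agreement.
icc = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)

# SEM = SD * sqrt(1 - reliability), using the SD of all observed scores here.
sem = scores.std(ddof=1) * np.sqrt(1 - icc)
print(f"ICC(2,1) = {icc:.2f}, SEM = {sem:.2f}")
```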

14
Improving test-retest reliability
• When designing tests or questionnaires, try to formulate questions, statements and
tasks in a way that won’t be influenced by the mood or concentration of participants.
• When planning your methods of data collection, try to minimize the influence of
external factors, and make sure all samples are tested under the same conditions.
• Remember that changes can be expected to occur in the participants over time, and
take these into account.

15
Rater Reliability
16
Intra-rater reliability
• It refers to the stability of data recorded by one individual across two or more trials.
• When rater skill is relevant to the accuracy of the test, intra-rater and test-retest reliability are essentially the same estimate.
• There is a possibility for bias when one rater takes two measurements.
• Protection against rater bias:
1. Develop objective grading criteria
2. Train testers in use of the instrument
Inter-rater reliability
• It concerns variation between two or more raters who measure the same group of subjects.
• Best assessed when all raters measure the response in a single trial, where they can observe a subject simultaneously and independently.
• E.g. muscle force can decrease if the muscle is fatigued from a first trial; if measuring joint ROM, it can change if joint tissues are stretched from the first trial, affecting inter-rater reliability.
17
Improving rater reliability
• Clearly define your variables and the methods that will be used to measure them.
• Develop detailed, objective criteria for how the variables will be rated, counted or
categorized.
• If multiple researchers are involved, ensure that they all have exactly the same
information and training.

18
Parallel / Alternate /
Equivalent forms
Reliability
19
Parallel forms reliability
• Parallel forms reliability is a measure of reliability obtained by administering different versions of an assessment tool (both versions must contain items that probe the same construct, skill, knowledge base, etc.) to the same group of individuals.

20
Reliability Coefficient for Parallel forms
reliability
• Correlation coefficients are most often used for parallel forms reliability.
• Determination of the limits of agreement is a useful estimate of the range of error expected when using two different versions of an instrument.
• This estimate is based on the standard deviation of the difference scores between the two instruments.

21
Improving parallel forms reliability

• Ensure that all questions or test items are based on the same theory and

formulated to measure the same thing.

22
Internal consistency
Reliability
23
Internal consistency
• Internal consistency assesses the correlation between
multiple items in a test that are intended to measure the
same construct.
• It is done to assess how consistent the results are for
different items for the same construct within the measure.

24
Internal consistency
There are a wide variety of internal consistency measures that can be used.

• Average Inter-item Correlation

• Average Item total Correlation

• Split-half reliability

25
Average Inter-item Correlation
• The average inter-item correlation uses all of the items on an instrument that are designed to measure the same construct.
• We first compute the correlation between each pair of items.
• For example, if we have six items we will have 15 different item pairings (i.e., 15 correlations).
• The average inter-item correlation is simply the average, or mean, of all of these correlations.
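A minimal Python sketch of the average inter-item correlation for six items; the responses are randomly generated purely to show the mechanics.

```python
import numpy as np

# Arbitrary responses: 30 respondents x 6 items of the same construct.
items = np.random.default_rng(0).integers(1, 6, size=(30, 6)).astype(float)

corr = np.corrcoef(items, rowvar=False)        # 6 x 6 correlation matrix
pairs = corr[np.triu_indices_from(corr, k=1)]  # the 15 unique item pairings
print(f"Average inter-item correlation: {pairs.mean():.2f}")
```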

26
Average Item total Correlation
• This approach also uses the inter-item correlations.

• In addition, we compute a total score for the six items and use that as a seventh

variable in the analysis.
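A minimal Python sketch of average item-total correlations, continuing the six-item example and adding the total score as a seventh variable.

```python
import numpy as np

# Arbitrary responses: 30 respondents x 6 items of the same construct.
items = np.random.default_rng(0).integers(1, 6, size=(30, 6)).astype(float)
total = items.sum(axis=1)  # total score used as the seventh variable

# Correlate each item with the total score, then average the correlations.
item_total = np.array([np.corrcoef(items[:, j], total)[0, 1] for j in range(6)])
print(f"Average item-total correlation: {item_total.mean():.2f}")
```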

27
Split-half reliability
• In split-half reliability we randomly divide all items that purport to measure the same construct into two sets.
• We administer the entire instrument to a sample of people and calculate the total score for each randomly divided half.
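A minimal Python sketch of split-half reliability with the Spearman-Brown correction; the responses are made up and an odd/even split stands in for a random split.

```python
import numpy as np

# Made-up responses: 30 respondents x 6 items of the same construct.
items = np.random.default_rng(1).integers(1, 6, size=(30, 6)).astype(float)

half_1 = items[:, 0::2].sum(axis=1)  # total score for items 1, 3, 5
half_2 = items[:, 1::2].sum(axis=1)  # total score for items 2, 4, 6

r_half = np.corrcoef(half_1, half_2)[0, 1]  # correlation between the halves

# Spearman-Brown prophecy formula: steps the half-test correlation up to an
# estimate of the reliability of the full-length test.
r_full = 2 * r_half / (1 + r_half)
print(f"Half-test r = {r_half:.2f}, Spearman-Brown corrected = {r_full:.2f}")
```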

28
Improving internal consistency reliability
• Take care when formulating questions or measures: those intended to reflect the
same concept should be based on the same theory and carefully formulated.

29
Reliability Coefficients for internal
consistency
Ways to compute the internal consistency of a test or questionnaire include:
• Cronbach's alpha: the average of all possible split-half reliabilities
• Average inter-item correlation
• Average item-total correlation
• Split-half reliability: the Spearman–Brown prophecy formula is used to step the correlation between the two halves up to an estimate for the full-length test

30
VALIDITY
• In qualitative research, validity is a matter of trustworthiness, utility, and dependability.
• In quantitative research, validity is the extent to which any measuring instrument measures what it is intended to measure.

31
Sensitivity and Specificity
• The validity of a diagnostic test is described in terms of its ability to accurately assess the presence or absence of the target condition.
• A diagnostic test can have four possible outcomes.

32
1. Sensitivity is the test's ability to obtain a positive test when the target condition
is really present, or the true positive rate.

2. Specificity is the test's ability to obtain a negative test when the condition is
really absent, or the true negative rate.
3. The complement of sensitivity (1 - sensitivity) is the false negative rate, or the
probability of obtaining an incorrect negative test in patients who do have the
target disorder.
4. The complement of specificity (1 - specificity) is the false positive rate,
sometimes called the "false alarm" rate. This is the probability of an incorrect
positive test in those who do not have the target condition.
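A minimal Python sketch of these four quantities, computed from a made-up 2x2 table of test results against a reference standard.

```python
# Made-up 2x2 table: counts of test results against the reference standard.
true_positive, false_negative = 45, 5    # target condition present
false_positive, true_negative = 10, 90   # target condition absent

sensitivity = true_positive / (true_positive + false_negative)  # true positive rate
specificity = true_negative / (true_negative + false_positive)  # true negative rate

false_negative_rate = 1 - sensitivity  # incorrect negatives in those with the condition
false_positive_rate = 1 - specificity  # "false alarm" rate in those without it

print(f"Sensitivity = {sensitivity:.2f}, Specificity = {specificity:.2f}")
print(f"False negative rate = {false_negative_rate:.2f}, "
      f"False positive rate = {false_positive_rate:.2f}")
```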
33
Internal And External Validity In
Research
• Internal validity refers to whether the effects observed in a study are due to
the manipulation of the independent variable and not some other factor.
• In other words, there is a causal relationship between the independent and dependent variables.
• Internal validity can be improved by controlling extraneous variables, using
standardized instructions, counterbalancing, and eliminating demand
characteristics and investigator effects.

34
• External validity refers to the extent to which the results of a study can be
generalized to other settings (ecological validity), other people (population
validity) and over time (historical validity).
• External validity can be improved by setting experiments in a more natural
setting and using random sampling to select participants.

35
Types of validity
Validity is mainly divided into four types:

1. Face validity

2. Content validity

3. Criterion related validity

4. Construct validity.

36
Face validity
• Face validity is simply whether the test appears (at face value) to measure
what it claims to.
• This is the least sophisticated measure of validity.
• Tests wherein the purpose is clear, even to naïve respondents, are said to have
high face validity.
• A direct measurement of face validity is obtained by asking people to rate the validity of a test as it appears to them. These raters could use a Likert scale to assess face validity.
• Individuals who actually take the test are well placed to judge its face validity, so it is important to include them among the raters.

37
Content validity
• Content validity assesses whether a test is representative of all aspects of the
construct.
• To produce valid results, the content of a test, survey or measurement method
must cover all relevant parts of the subject it aims to measure.
• If some aspects are missing from the measurement (or if irrelevant aspects are
included), the validity is threatened.
• Content validity usually depends on the judgment of experts in the field.

38
Criterion-Validity
• Criterion validity compares responses to future performance or to those
obtained from other, more well-established surveys.
• Criterion validity is made up of two subcategories:
• Predictive validity refers to the extent to which a survey measure forecasts
future performance. A graduate school entry examination that predicts who
will do well in graduate school has predictive validity.
• Concurrent validity is demonstrated when two assessments agree or a new
measure is compared favorably with one that is already considered valid.

39
Construct validity
• Construct validity evaluates whether a measurement tool really represents the
thing we are interested in measuring.
• A construct refers to a concept or characteristic that can’t be directly
observed, but can be measured by observing other indicators that are
associated with it.
• Example
o There is no objective, observable entity called “depression” that we can measure
directly. But based on existing psychological research and theory, we can
measure depression based on a collection of symptoms and indicators, such as
low self-confidence and low energy levels.

40
• Construct validity is about ensuring that the method of measurement matches
the construct you want to measure.
• If you develop a questionnaire to diagnose depression, you need to know:
does the questionnaire really measure the construct of depression? Or is it
actually measuring the respondent’s mood, self-esteem, or some other
construct?
• To achieve construct validity, you have to ensure that your indicators and
measurements are carefully developed based on relevant existing knowledge.
• The questionnaire must include only relevant questions that measure known
indicators of depression.
41
• Convergent validity takes two measures that are supposed to be measuring the same construct and shows that they are related.
• Discriminant validity shows that two measures that are not supposed to be related are, in fact, unrelated.

42
