RMBS M2 Lecture 5a

Reliability refers to the consistency or repeatability of measurement. There are several types of reliability, including test-retest, parallel forms, inter-rater, and internal consistency. Test-retest reliability measures consistency over time, parallel forms uses different versions of a test, inter-rater examines consistency between raters, and internal consistency assesses consistency between items measuring the same construct. Reliability is crucial for minimizing measurement error and ensuring accurate interpretation of results.


RESEARCH METHODOLOGY & BIOSTATISTICS
PROF WAQAR AHMED AWAN
PhD In Rehabilitation Sciences

1
RELIABILITY
• Reliability is the consistency of your measurement, or the degree to which an instrument measures the same way each time it is used under the same conditions with the same subjects.
• It is the extent to which measurement is consistent and free
from error.

2
Reliability
Joppe (2000) defined reliability as:
• The extent to which results are consistent over time and an accurate representation of the total population under study is referred to as reliability.
• If the results of a study can be reproduced under a similar methodology, then the research instrument is considered to be reliable.

3
Reliability coefficient
• Reliability: an estimate of the extent to which a test score is free from error, i.e., the extent to which observed scores vary from true scores.
• Reliability coefficient ranges between 0.00 and 1.00 where
o 0 = no reliability
o <0.50 = poor reliability
o 0.50 – 0.75 = moderate reliability
o >0.75 = good reliability
o 1 = perfect reliability
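A minimal Python sketch that maps a reliability coefficient onto the qualitative bands in the list above; the function name is illustrative and the cut-offs simply follow the slide.

```python
# Classify a reliability coefficient using the bands from the slide above.
def interpret_reliability(r: float) -> str:
    if not 0.0 <= r <= 1.0:
        raise ValueError("A reliability coefficient ranges between 0.00 and 1.00")
    if r == 1.0:
        return "perfect reliability"
    if r > 0.75:
        return "good reliability"
    if r >= 0.50:
        return "moderate reliability"
    if r > 0.0:
        return "poor reliability"
    return "no reliability"

print(interpret_reliability(0.82))  # good reliability
```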

4
Types of reliability

5
Each type of reliability measures the consistency of…
• Test-retest: the same test over time.
• Rater: the same test conducted by the same person (intra-rater) or by different people (inter-rater).
• Alternate forms / parallel forms: different versions of a test which are designed to be equivalent.
• Internal consistency: the individual items of a test.
6
AGREEMENT
When the unit of measurement is on a categorical scale, reliability is assessed as a measure of agreement.
• The simplest measure of agreement is percent agreement.
• Percent agreement is calculated as the number of agreements divided by the total number of scores, but it does not take chance agreement into account and therefore overestimates the level of agreement.
• The kappa statistic, κ, is a chance-corrected measure of agreement, but is limited in that it does not differentiate among disagreements.
• To differentiate among disagreements, a modified version of the kappa statistic called weighted kappa can be used to estimate reliability.
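A minimal Python sketch of percent agreement and Cohen's kappa for two raters classifying the same subjects; the ratings are made up for illustration.

```python
import numpy as np

# Made-up categorical ratings of the same eight subjects by two raters.
rater_a = np.array(["yes", "no", "yes", "yes", "no", "yes", "no", "no"])
rater_b = np.array(["yes", "no", "no",  "yes", "no", "yes", "yes", "no"])

# Percent agreement: proportion of subjects on which the raters agree.
p_observed = np.mean(rater_a == rater_b)

# Chance agreement: probability that both raters choose the same category by
# chance, based on each rater's marginal proportions.
categories = np.union1d(rater_a, rater_b)
p_chance = sum(np.mean(rater_a == c) * np.mean(rater_b == c) for c in categories)

# Cohen's kappa corrects the observed agreement for chance agreement.
kappa = (p_observed - p_chance) / (1 - p_chance)

print(f"Percent agreement: {p_observed:.2f}")
print(f"Cohen's kappa:     {kappa:.2f}")
```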

7
INTERNAL CONSISTENCY
• The most commonly applied statistical index for internal consistency is Cronbach's alpha (α).
• It can be used for scales with items that are dichotomous (yes/no) or that have more than two response choices (ordinal scale).
• Inter-item correlations, item-total correlations, and Cronbach's alpha if an item is deleted are used to conduct an item analysis for the instrument.
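A minimal Python sketch of Cronbach's alpha computed from its standard formula; the four-item response matrix is made up.

```python
import numpy as np

# Made-up responses: rows = respondents, columns = items of one scale.
scores = np.array([
    [4, 3, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 4, 4],
    [3, 3, 2, 3],
    [4, 5, 5, 4],
])

k = scores.shape[1]                         # number of items
item_vars = scores.var(axis=0, ddof=1)      # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)  # variance of the total score

# alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha: {alpha:.2f}")
```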
8
ALTERNATE FORMS: LIMITS OF
AGREEMENT
• Two analysis procedures have traditionally been applied for method comparisons.
o The correlation coefficient, r, has been used to demonstrate covariance among methods.
o The second procedure is the paired t-test (or repeated-measures ANOVA), which is used to show that mean scores for two (or more) methods are not significantly different.
o An interesting alternative for examining agreement across methods is an index called the limits of agreement.
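A minimal Python sketch of 95% limits of agreement (the Bland-Altman approach) for two measurement methods; the paired measurements are made up.

```python
import numpy as np

# Made-up paired measurements of the same subjects by two methods.
method_1 = np.array([10.2, 11.5,  9.8, 12.1, 10.9, 11.0,  9.5, 12.4])
method_2 = np.array([10.6, 11.2, 10.1, 12.5, 10.4, 11.3,  9.9, 12.0])

diff = method_1 - method_2
bias = diff.mean()           # mean difference (systematic bias) between methods
sd_diff = diff.std(ddof=1)   # standard deviation of the difference scores

# 95% limits of agreement: bias +/- 1.96 * SD of the differences.
lower, upper = bias - 1.96 * sd_diff, bias + 1.96 * sd_diff
print(f"Bias: {bias:.2f}, limits of agreement: [{lower:.2f}, {upper:.2f}]")
```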

9
Test Retest
Reliability
10
Test-retest reliability
• Test-retest reliability is a measure of reliability obtained by administering the same test twice over a period of time to a group of individuals.
• The scores from Time 1 and Time 2 can then be correlated in order to evaluate the test for stability over time.

11
How to conduct Test-Retest Reliability
The three main components of this method are as follows:
1. Implement your measurement instrument at two separate times for each subject.
2. Choose an appropriate interval: far enough apart to avoid fatigue, learning, or memory effects, but close enough to avoid genuine changes in the measured variable.
3. Compute the correlation between the two separate measurements (a minimal sketch follows below).
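A minimal Python sketch of step 3, correlating made-up scores from two administrations of the same test with Pearson's r.

```python
import numpy as np
from scipy.stats import pearsonr

# Made-up scores for eight subjects tested at two time points.
time_1 = np.array([23, 31, 28, 35, 26, 30, 22, 33])
time_2 = np.array([25, 30, 27, 36, 28, 29, 24, 31])

r, p_value = pearsonr(time_1, time_2)
print(f"Test-retest correlation: r = {r:.2f} (p = {p_value:.3f})")
```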

12
Reliability Coefficients for Test-Retest Reliability
Reliability coefficient by type of data:
• Interval/ratio data: Pearson product-moment coefficient of correlation
• Ordinal data: Spearman rho
• Nominal data: percent agreement or the kappa statistic
• Where stability of response is questioned: standard error of measurement
Correlation coefficients are limited as estimates of reliability, so the INTRACLASS CORRELATION COEFFICIENT has become the preferred index, as it reflects both correlation and agreement.

13
• In statistics, the intraclass correlation, or the intraclass correlation
coefficient (ICC), is a descriptive statistic that can be used when quantitative
measurements are made on units that are organized into groups.
o It describes how strongly units in the same group resemble each other.

• The standard error of measurement (SEM) is a measure of how much measured test
scores are spread around a “true” score.
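A minimal Python sketch of a two-way random-effects intraclass correlation, ICC(2,1), together with the SEM; the subjects-by-raters matrix is made up, and the SEM here simply uses the standard deviation of all observed scores.

```python
import numpy as np

# Made-up scores: rows = subjects, columns = raters (or trials).
scores = np.array([
    [9.0, 2.0, 5.0, 8.0],
    [6.0, 1.0, 3.0, 2.0],
    [8.0, 4.0, 6.0, 8.0],
    [7.0, 1.0, 2.0, 6.0],
    [10.0, 5.0, 6.0, 9.0],
    [6.0, 2.0, 4.0, 7.0],
])
n, k = scores.shape
grand_mean = scores.mean()

# Mean squares from a two-way ANOVA (subjects x raters).
ms_rows = k * ((scores.mean(axis=1) - grand_mean) ** 2).sum() / (n - 1)
ms_cols = n * ((scores.mean(axis=0) - grand_mean) ** 2).sum() / (k - 1)
ss_total = ((scores - grand_mean) ** 2).sum()
ms_error = (ss_total - ms_rows * (n - 1) - ms_cols * (k - 1)) / ((n - 1) * (k - 1))

# Shrout & Fleiss ICC(2,1): single-measure, absolute agreement.
icc = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)

# SEM = SD * sqrt(1 - reliability), using the SD of all observed scores here.
sem = scores.std(ddof=1) * np.sqrt(1 - icc)
print(f"ICC(2,1) = {icc:.2f}, SEM = {sem:.2f}")
```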

14
Improving test-retest reliability
• When designing tests or questionnaires, try to formulate questions, statements and
tasks in a way that won’t be influenced by the mood or concentration of participants.
• When planning your methods of data collection, try to minimize the influence of
external factors, and make sure all samples are tested under the same conditions.
• Remember that changes can be expected to occur in the participants over time, and
take these into account.

15
Rater Reliability
16
Intra-rater reliability
• It refers to the stability of data recorded by one individual across two or more trials.
• When rater skill is relevant to the accuracy of the test, intra-rater and test-retest reliability are essentially the same estimate.
• There is a possibility for bias when one rater takes two measurements.
• Protection against rater bias:
1. Develop objective grading criteria
2. Train testers in use of the instrument
Inter-rater reliability
• It concerns variation between two or more raters who measure the same group of subjects.
• Best assessed when all raters measure the response in a single trial, where they can observe a subject simultaneously and independently.
• E.g. muscle force can decrease if the muscle is fatigued from a first trial; if measuring joint ROM, it can change if joint tissues are stretched from the first trial, affecting inter-rater reliability.
17
Improving rater reliability
• Clearly define your variables and the methods that will be used to measure them.
• Develop detailed, objective criteria for how the variables will be rated, counted or
categorized.
• If multiple researchers are involved, ensure that they all have exactly the same
information and training.

18
Parallel / Alternate /
Equivalent forms
Reliability
19
Parallel forms reliability
• Parallel forms reliability is a measure of reliability obtained by administering different versions of an assessment tool (both versions must contain items that probe the same construct, skill, knowledge base, etc.) to the same group of individuals.

20
Reliability Coefficient for Parallel forms
reliability
• Correlation coefficients are most often used for parallel forms reliability.
• Determination of the limits of agreement is a useful estimate of the range of error expected when using two different versions of an instrument.
• This estimate is based on the standard deviation of the difference scores between the two instruments.

21
Improving parallel forms reliability

• Ensure that all questions or test items are based on the same theory and

formulated to measure the same thing.

22
Internal consistency
Reliability
23
Internal consistency
• Internal consistency assesses the correlation between
multiple items in a test that are intended to measure the
same construct.
• It is done to assess how consistent the results are for
different items for the same construct within the measure.

24
Internal consistency
There are a wide variety of internal consistency measures that can be used.

• Average Inter-item Correlation

• Average Item total Correlation

• Split-half reliability

25
Average Inter-item Correlation
• The average inter-item correlation uses all of the items on an instrument that are designed to measure the same construct.
• We first compute the correlation between each pair of items.
• For example, if we have six items we will have 15 different item pairings (i.e., 15 correlations).
• The average inter-item correlation is simply the average, or mean, of all of these correlations.
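A minimal Python sketch of the average inter-item correlation for six items; the responses are randomly generated purely to show the mechanics.

```python
import numpy as np

# Arbitrary responses: 30 respondents x 6 items of the same construct.
items = np.random.default_rng(0).integers(1, 6, size=(30, 6)).astype(float)

corr = np.corrcoef(items, rowvar=False)        # 6 x 6 correlation matrix
pairs = corr[np.triu_indices_from(corr, k=1)]  # the 15 unique item pairings
print(f"Average inter-item correlation: {pairs.mean():.2f}")
```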

26
Average Item total Correlation
• This approach also uses the inter-item correlations.

• In addition, we compute a total score for the six items and use that as a seventh

variable in the analysis.
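A minimal Python sketch of average item-total correlations, continuing the six-item example and adding the total score as a seventh variable.

```python
import numpy as np

# Arbitrary responses: 30 respondents x 6 items of the same construct.
items = np.random.default_rng(0).integers(1, 6, size=(30, 6)).astype(float)
total = items.sum(axis=1)  # total score used as the seventh variable

# Correlate each item with the total score, then average the correlations.
item_total = np.array([np.corrcoef(items[:, j], total)[0, 1] for j in range(6)])
print(f"Average item-total correlation: {item_total.mean():.2f}")
```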

27
Split-half reliability
• In split-half reliability we randomly divide all items that purport to measure the same construct into two sets.
• We administer the entire instrument to a sample of people and calculate the total score for each randomly divided half.
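A minimal Python sketch of split-half reliability with the Spearman-Brown correction; the responses are made up and an odd/even split stands in for a random split.

```python
import numpy as np

# Made-up responses: 30 respondents x 6 items of the same construct.
items = np.random.default_rng(1).integers(1, 6, size=(30, 6)).astype(float)

half_1 = items[:, 0::2].sum(axis=1)  # total score for items 1, 3, 5
half_2 = items[:, 1::2].sum(axis=1)  # total score for items 2, 4, 6

r_half = np.corrcoef(half_1, half_2)[0, 1]  # correlation between the halves

# Spearman-Brown prophecy formula: steps the half-test correlation up to an
# estimate of the reliability of the full-length test.
r_full = 2 * r_half / (1 + r_half)
print(f"Half-test r = {r_half:.2f}, Spearman-Brown corrected = {r_full:.2f}")
```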

28
Improving internal consistency reliability
• Take care when formulating questions or measures: those intended to reflect the
same concept should be based on the same theory and carefully formulated.

29
Reliability Coefficients for internal
consistency
Ways to compute the internal consistency of a test or questionnaire include:
• Cronbach's alpha: the average of all possible split-half reliabilities
• Average inter-item correlation
• Average item-total correlation
• Split-half reliability: the Spearman–Brown prophecy formula is used to step the correlation between the two halves up to an estimate for the full-length test

30
VALIDITY
• In qualitative research, validity is a matter of trustworthiness, utility, and dependability.
• In quantitative research, validity is the extent to which any measuring instrument measures what it is intended to measure.

31
Sensitivity and Specificity
• The validity of a diagnostic test is described in terms of its ability to accurately assess the presence or absence of the target condition.
• A diagnostic test can have four possible outcomes.

32
1. Sensitivity is the test's ability to obtain a positive test when the target condition
is really present, or the true positive rate.

2. Specificity is the test's ability to obtain a negative test when the condition is
really absent, or the true negative rate.
3. The complement of sensitivity (1 - sensitivity) is the false negative rate, or the
probability of obtaining an incorrect negative test in patients who do have the
target disorder.
4. The complement of specificity (1 - specificity) is the false positive rate,
sometimes called the "false alarm" rate. This is the probability of an incorrect
positive test in those who do not have the target condition.
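A minimal Python sketch of these four quantities, computed from a made-up 2x2 table of test results against a reference standard.

```python
# Made-up 2x2 table: counts of test results against the reference standard.
true_positive, false_negative = 45, 5    # target condition present
false_positive, true_negative = 10, 90   # target condition absent

sensitivity = true_positive / (true_positive + false_negative)  # true positive rate
specificity = true_negative / (true_negative + false_positive)  # true negative rate

false_negative_rate = 1 - sensitivity  # incorrect negatives in those with the condition
false_positive_rate = 1 - specificity  # "false alarm" rate in those without it

print(f"Sensitivity = {sensitivity:.2f}, Specificity = {specificity:.2f}")
print(f"False negative rate = {false_negative_rate:.2f}, "
      f"False positive rate = {false_positive_rate:.2f}")
```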
33
Internal And External Validity In
Research
• Internal validity refers to whether the effects observed in a study are due to
the manipulation of the independent variable and not some other factor.
• In other words, there is a causal relationship between the independent and dependent variables.
• Internal validity can be improved by controlling extraneous variables, using
standardized instructions, counterbalancing, and eliminating demand
characteristics and investigator effects.

34
• External validity refers to the extent to which the results of a study can be
generalized to other settings (ecological validity), other people (population
validity) and over time (historical validity).
• External validity can be improved by setting experiments in a more natural
setting and using random sampling to select participants.

35
Types of validity
Validity is mainly divided into four types:

1. Face validity

2. Content validity

3. Criterion related validity

4. Construct validity.

36
Face validity
• Face validity is simply whether the test appears (at face value) to measure
what it claims to.
• This is the least sophisticated measure of validity.
• Tests wherein the purpose is clear, even to naïve respondents, are said to have
high face validity.
• A direct measurement of face validity is obtained by asking people to rate the validity of a test as it appears to them. These raters could use a Likert scale to assess face validity.
• Individuals who actually take the test are well placed to judge its face validity, so it is important to include them among the raters.

37
Content validity
• Content validity assesses whether a test is representative of all aspects of the
construct.
• To produce valid results, the content of a test, survey or measurement method
must cover all relevant parts of the subject it aims to measure.
• If some aspects are missing from the measurement (or if irrelevant aspects are
included), the validity is threatened.
• Content validity usually depends on the judgment of experts in the field.

38
Criterion-Validity
• Criterion validity compares responses to future performance or to those
obtained from other, more well-established surveys.
• Criterion validity is made up of two subcategories:
• Predictive validity refers to the extent to which a survey measure forecasts
future performance. A graduate school entry examination that predicts who
will do well in graduate school has predictive validity.
• Concurrent validity is demonstrated when two assessments agree or a new
measure is compared favorably with one that is already considered valid.

39
Construct validity
• Construct validity evaluates whether a measurement tool really represents the
thing we are interested in measuring.
• A construct refers to a concept or characteristic that can’t be directly
observed, but can be measured by observing other indicators that are
associated with it.
• Example
o There is no objective, observable entity called “depression” that we can measure
directly. But based on existing psychological research and theory, we can
measure depression based on a collection of symptoms and indicators, such as
low self-confidence and low energy levels.

40
• Construct validity is about ensuring that the method of measurement matches
the construct you want to measure.
• If you develop a questionnaire to diagnose depression, you need to know:
does the questionnaire really measure the construct of depression? Or is it
actually measuring the respondent’s mood, self-esteem, or some other
construct?
• To achieve construct validity, you have to ensure that your indicators and
measurements are carefully developed based on relevant existing knowledge.
• The questionnaire must include only relevant questions that measure known
indicators of depression.
41
• Convergent validity takes two measures that are supposed to be measuring the same construct and shows that they are related.
• Discriminant validity shows that two measures that are not supposed to be related are, in fact, unrelated.

42
