Validity and Reliability of The Test
INTRODUCTION

Students who are assigned research tasks inevitably meet the terms quantitative and qualitative, and their work usually begins with a proposal. According to Arikunto (2002), a proposal is a written plan made by the writer alone or together with colleagues, while Sugiyono (2008) defines a proposal as a manual consisting of the steps to be followed in the research. In both quantitative and qualitative work, validity and reliability are central concerns: students must analyze their tests to find out whether they are valid and reliable. The data from this analysis tell the students whether their questions are good or not, so this paper discusses quantitative and qualitative item analysis.

To judge the quality of test questions, students should carry out both quantitative and qualitative analysis. Quantitative analysis focuses on the internal characteristics of the test obtained through empirical analysis, covering validity, reliability, and the difficulty index, while qualitative analysis focuses on the content, editing, material, construction, and language structure of the test (Surapranata, 2005). Through these analyses the students learn how well their test functions. This paper, however, focuses only on validity, reliability, and the difficulty index, since one purpose of item analysis is to improve the quality of the questions.

DIFFICULTY INDEX

The difficulty index is an index that shows how difficult a question is.
It shows how many testees can answer a given question. The function of the difficulty index is to indicate how difficult or easy a question is: if the question falls in the easy category (p higher than 0.9), it should not be used in the test; if it falls in the moderate category (0.4 lower than or equal to p lower than or equal to 0.9), it can be accepted; and if it falls in the difficult category (p lower than 0.4), it should be revised (Yuntoro in Lestari, 2011).
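As a rough illustration (not part of the original paper), the following sketch computes p for one dichotomously scored item as the proportion of correct answers and applies the cut-offs above; the data and function names are invented:

```python
# Hypothetical sketch: difficulty index p for one 0/1-scored item,
# classified with the cut-offs of Yuntoro (in Lestari, 2011).

def difficulty_index(item_scores, max_score=1):
    """p = (sum of scores) / (maximum score * number of testees)."""
    n = len(item_scores)
    return sum(item_scores) / (max_score * n)

def classify(p):
    """Easy items are rejected, moderate accepted, difficult revised."""
    if p > 0.9:
        return "easy - reject"
    elif p >= 0.4:
        return "moderate - accept"
    else:
        return "difficult - revise"

# Ten testees' answers to one multiple-choice item (Guttman 0/1 scoring)
item = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
p = difficulty_index(item)
print(p)            # 0.7
print(classify(p))  # moderate - accept
```

Seven of ten testees answered correctly, so p = 0.7 and the item lands in the accepted range.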
Before computing the difficulty index, the students must first score the test, for example a multiple-choice test. Under the Guttman pattern, a correct answer is scored 1 and a wrong answer is scored 0. A question is called difficult if nearly all of the testees answer it wrongly, and easy if nearly all of the testees answer it correctly. In general, theory offers several ways to express the difficulty index: (1) the proportion of correct answers, (2) a linear difficulty scale, (3) Davis's index, and (4) a bipartite scale (Surapranata, 2005). The difficulty index can be computed with the following formula:

p = Σx / (Sm · N)
Where:
p = difficulty index
Σx = the number of right answers (or the total score, for essay questions)
Sm = the maximum score
N = the number of testees

This formula also explains why some people say that certain questions have a rising difficulty index: the difficulty index is not determined by such impressions but by how many respondents can answer each question. According to Crocker and Algina (1986) in Surapranata (2005), the difficulty index has two characteristics. First, the difficulty index (p) is a parameter, not a fixed property. Second, the difficulty index is a characteristic both of the question itself and of the group who answers it. The difficulty index also has weaknesses: (1) the difficulty index (p) is actually a measure of the easiness of the question, since the higher the difficulty index, the easier the question, and vice versa; (2) the difficulty index (p) is not linearly related to an underlying difficulty scale. Ideally, the difficulty index is used to improve the learning program. When students develop questions, the items should be ordered by difficulty index from the easiest to the most difficult. When every respondent answers a question wrongly, or every respondent answers it correctly, the tendency is not to use that question: such an item is a bad one. A question whose difficulty index is 0 or 1 contributes nothing to the differences among the testees; it influences only the mean, not the reliability and validity. The
difficulty index influences the score variability and the differences among the testees when the total score is high or low. Score variability is maximal when p = 0.5, because the scores are then most varied. In the classroom context, teachers usually use a fair test, in which p ranges from 0.4 to 0.9.

The difficulty index is closely related to the discrimination index. The discrimination index is a value that discriminates between testees in the high group and testees in the low group. It shows the agreement between the function of the question and the function of the test as a whole. The function of the discrimination index is to determine whether a question can discriminate between groups in the aspect being measured, based on the differences that exist in those groups (Yuntoro in Lestari, 2011). The discrimination index ranges from -1 to 1. A minus sign shows that respondents with lower ability answered correctly while respondents with higher ability did not; such a question suggests that the students' abilities appear inverted. The discrimination index is computed from two groups, a high group and a low group. According to Kelley (1939) and Crocker and Algina (1986) in Surapranata (2005), a stable grouping is obtained by taking the top 27% and the bottom 27% of testees, while Cureton (1957) in Surapranata (2005) divides the testees into the top 33% and the bottom 33%. The discrimination index is influenced by the difficulty index: if the respondents answer with p = 0 or p = 1, the question cannot be said to discriminate among the respondents' abilities. The discrimination index (D) is maximal, D = 1.00, when p = 0.5. Table 1 shows the maximum of D as a function of p.

Table 1. The Maximum of Discrimination Index (D) as a Function of p
p score   : 1.00  0.90  0.80  0.70  0.60  0.50  0.40  0.30  0.20  0.10  0.00
D maximum : 0.00  0.20  0.40  0.60  0.80  1.00  0.80  0.60  0.40  0.20  0.00
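The pattern in Table 1 can be verified numerically: with equal-sized high and low groups, the largest attainable D for an item of overall difficulty p is 2p when p is at most 0.5, and 2(1 − p) otherwise. A small check, not taken from the source:

```python
# Check of Table 1: the maximum attainable discrimination index D
# for an item with overall difficulty p (equal-sized groups).

def d_max(p):
    """D is largest when the high group holds as many of the correct
    answers as possible: 2p for p <= 0.5, otherwise 2(1 - p)."""
    return round(2 * min(p, 1 - p), 2)

for p in [1.00, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00]:
    print(p, d_max(p))  # reproduces the "D maximum" row of Table 1
```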
The discrimination index can be computed with the following formula:

D = pA − pB

Where:
D = discrimination index
pA = the difficulty index of the high group
pB = the difficulty index of the low group
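As a rough illustration (not from the source), the formula can be implemented with Kelley's 27% grouping; all names and data below are invented:

```python
# Hypothetical sketch: D = pA - pB with Kelley's 27% upper and lower groups.

def discrimination_index(total_scores, item_scores, fraction=0.27):
    """Rank testees by total score, take the top and bottom fractions,
    and return the difference in item difficulty between the groups."""
    n = len(total_scores)
    g = max(1, round(n * fraction))  # size of each group
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    high, low = order[:g], order[-g:]
    p_a = sum(item_scores[i] for i in high) / g  # difficulty in high group
    p_b = sum(item_scores[i] for i in low) / g   # difficulty in low group
    return p_a - p_b

totals = [9, 8, 8, 7, 6, 5, 4, 3, 2, 1]  # total test scores of ten testees
item   = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]  # their 0/1 scores on one item
print(discrimination_index(totals, item))  # 1.0: high group all correct
```

With ten testees the 27% rule gives groups of three; the top three all answered the item correctly and the bottom three all missed it, so D = 1.0.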
VALIDITY

Valid means that the instrument used as a tool to gather the data fits the data it is meant to measure. Validity is divided into two kinds: (1) logical validity and (2) empirical validity. Logical validity goes with qualitative analysis and covers material, construction, and language use. The validity of a test is always grounded in empirical study (Nunnally, 1972, in Surapranata, 2005). According to Gronlund (1985) in Surapranata (2005), validity concerns the results of the instrument, shows their level, and must be used in line with the objective of the research. According to the APA in Surapranata (2005), there are four kinds of validity: content, construct, predictive, and concurrent validity. Landy (1987, in Surapranata, 2005) likened these kinds of validity to stamp collecting.

Content validity means that the instrument is in line with the curriculum. Two factors influence content validity: the test itself and the process. The way to judge this validity is to examine the form of the test; for example, a mathematics test should measure mathematics achievement, not English achievement. Construct validity has to do with phenomena and real objects, for example gravity, mathematical ability, or English ability. It means that the instrument used in the research is in line with the theoretical construct on which the instrument is built. The construct is spelled out in the standard competence, basic competence, and indicators, all of which are used to justify the construction of the test. Predictive validity concerns the relation between the score a student obtains now and a later score; an instrument that can predict the later score is a good instrument. For example, a student who achieves the best results in senior high school is predicted to achieve the best results at university.
When that prediction holds, the instrument is valid. Concurrent validity means empirical validity: the score is up to date and based on experience. The validity can be computed with Pearson's product-moment correlation. In deviation scores, the formula is:

r_xy = Σxy / √(Σx² · Σy²)

Where:
r_xy = the correlation coefficient between variables x and y
Σxy = the sum of the products of the deviation scores of x and y
Σx² = the sum of the squared deviation scores of x
Σy² = the sum of the squared deviation scores of y

The validity can also be computed with correlations for dichotomous data: the point-biserial correlation, the tetrachoric correlation, and the phi correlation.
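A small hypothetical sketch of the deviation-score formula, correlating invented 0/1 item scores with invented total scores (a crude stand-in for the item-validity use described above):

```python
# Hypothetical sketch of Pearson's product-moment correlation in
# deviation-score form: r_xy = sum(xy) / sqrt(sum(x^2) * sum(y^2)).
import math

def pearson_r(x_raw, y_raw):
    n = len(x_raw)
    mx = sum(x_raw) / n
    my = sum(y_raw) / n
    x = [v - mx for v in x_raw]  # deviation scores of x
    y = [v - my for v in y_raw]  # deviation scores of y
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

item = [1, 0, 1, 1, 0]    # invented scores on one item (x)
total = [8, 3, 9, 7, 4]   # invented total test scores (y)
print(round(pearson_r(item, total), 3))  # 0.952
```

A coefficient this close to 1 would suggest the item agrees with the test as a whole; real decisions would of course compare r against a critical value.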
RELIABILITY

Reliable means that the instrument gives a constant score: whenever the instrument is used, it yields consistent scores. Factors that affect reliability include differences between individuals, tiredness, guessing, and so on. Reliability also means that the instrument can differentiate between subjects. Nunnally (1970) in Surapranata (2005) states that reliability refers to the consistency of the scores obtained by the same subjects when the test is repeated in different situations. The function of test reliability is to determine how much of the score variability is caused by measurement error and how much is real score variability.

Reliability has two aspects, internal and external. Internal consistency means that the items are homogeneous, from their difficulty to the form of the test, while external consistency is the degree to which the items produce a constant score every time. There are four concepts of reliability: (1) parallel or equivalent forms, (2) test-retest or stability, (3) split-half, and (4) internal consistency. If the scores on the first test are equivalent to the scores on the second test, the test has high reliability; that is, there is a positive correlation between the first and the second test. The size of the reliability is given by the correlation coefficient, which is called the reliability index. Crocker and Algina (1986) in Surapranata (2005) state that this coefficient is influenced by several factors: the length of the test, its speededness, its homogeneity, and its difficulty.

The reliability is computed in several steps. The first step is to find the sum of squares for testees (Jkr):

Jkr = Σ(Xt)² / k − (ΣXt)² / (k · N)

Where:
Jkr = the sum of squares for testees
Xt = the total score of each testee
k = the number of questions
N = the number of testees
(ΣXt)² = the square of the grand total of scores

First, the tester counts the sum of squares for testees (Jkr) by dividing the sum of the squared total scores of the testees by the number of questions.
Next, the result is decreased by the square of the grand total of scores divided by the number of questions times the number of testees. In the second step, the tester determines the sum of squares for questions (Jki) with the equation:

Jki = ΣB² / N − (ΣXt)² / (k · N)

Where:
Jki = the sum of squares for questions
B² = the squared number of right answers on each question
k = the number of questions
N = the number of testees
(ΣXt)² = the square of the grand total of scores
The tester finds the sum of squares for questions (Jki) by dividing the sum of the squared numbers of right answers by the number of testees, then decreasing the result by the square of the grand total of scores divided by the number of questions times the number of testees. In the third step, the tester finds the total sum of squares (Jkt) by multiplying the total of right answers by the total of wrong answers and dividing by the total of right answers plus the total of wrong answers. The equation is:

Jkt = (ΣB · ΣS) / (ΣB + ΣS)

Where:
Jkt = the total sum of squares
B = the total of right answers
S = the total of wrong answers
The fourth step is to find the remnant (residual) sum of squares with the equation:

Jks = Jkt − Jkr − Jki

Where:
Jks = the remnant sum of squares
Jkt = the total sum of squares
Jkr = the sum of squares for testees
Jki = the sum of squares for questions

The tester decreases the total sum of squares by the sum of squares for testees and by the sum of squares for questions. The fifth step is to find the variance: the tester divides each sum of squares by its degrees of freedom, which for the testees is the number of testees minus one. The equation used is:

V = Jk / d.b.
Where:
d.b. = the degrees of freedom; for the testees, the number of testees minus one

The last step is to find the reliability with Hoyt's equation:

r11 = 1 − (Vs / Vr)

Where:
r11 = the reliability of the test
Vs = the remnant variance (Jks divided by its degrees of freedom)
Vr = the testee variance (Jkr divided by the number of testees minus one)
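The steps above can be collected into one sketch. This is an illustration, not the paper's own code: the data are invented, scores are assumed dichotomous (0/1), and the remnant degrees of freedom are taken as (N − 1)(k − 1), the standard value in Hoyt's analysis-of-variance method, since the text states the degrees of freedom only for the testees:

```python
# Hypothetical end-to-end sketch of the Hoyt procedure for a
# dichotomous score matrix of N testees by k questions.

def hoyt_reliability(scores):
    """scores: list of N rows, each a list of k item scores (0 or 1)."""
    n = len(scores)                         # number of testees
    k = len(scores[0])                      # number of questions
    xt = [sum(row) for row in scores]       # total score per testee
    b = [sum(col) for col in zip(*scores)]  # right answers per question
    grand = sum(xt)                         # grand total of scores
    c = grand ** 2 / (k * n)                # correction term (sum Xt)^2/(kN)
    jkr = sum(x * x for x in xt) / k - c    # sum of squares for testees
    jki = sum(v * v for v in b) / n - c     # sum of squares for questions
    right = grand                           # total of right answers
    wrong = k * n - grand                   # total of wrong answers
    jkt = right * wrong / (right + wrong)   # total sum of squares
    jks = jkt - jkr - jki                   # remnant sum of squares
    vr = jkr / (n - 1)                      # testee variance
    vs = jks / ((n - 1) * (k - 1))          # remnant variance (assumed d.b.)
    return 1 - vs / vr                      # Hoyt's r11

scores = [[1, 1, 1, 1],
          [1, 1, 1, 0],
          [1, 1, 0, 0],
          [1, 0, 0, 0],
          [0, 0, 0, 0]]  # five invented testees, four questions
print(hoyt_reliability(scores))  # 0.8
```

For this invented matrix, Jkr = 2.5, Jki = 1.0, Jkt = 5.0, and Jks = 1.5, giving Vr = 0.625, Vs = 0.125, and r11 = 0.8.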
BIBLIOGRAPHY

Arikunto, S. 2002. Prosedur Penelitian Suatu Pendekatan Praktek. Jakarta: Rineka Cipta Press.
Azwar, S. 2004. Reliabilitas dan Validitas. Yogyakarta: Pustaka Pelajar.
Lestari, P.Y. 2011. Lestari's Work Language Testing. Kediri: Uniska.
Sugiyono. 2008. Metode Penelitian Kuantitatif, Kualitatif dan R & D. Bandung: ALFABETA.
Surapranata, S. 2005. Analisis, Validitas, Reliabilitas, dan Interpretasi Hasil Tes: Implementasi Kurikulum 2004. Bandung: PT. Remaja Rosdakarya.