Validity and Reliability of The Test
INTRODUCTION

Students who are assigned research tasks inevitably meet the terms quantitative and qualitative, and their work usually begins with a proposal. According to Arikunto (2002), a proposal is a written plan made by the writer alone or together with colleagues, while Sugiyono (2008) defines a proposal as a manual consisting of the steps to be followed in the research. In both quantitative and qualitative work, validity and reliability are central concerns: students must analyze their tests to find out whether they are valid and reliable. The data from this analysis tell the students whether their questions are good or not, so this paper discusses quantitative and qualitative item analysis.

To judge the quality of test questions, students should carry out both quantitative and qualitative analysis. Quantitative analysis focuses on the internal characteristics of the test obtained through empirical analysis, covering validity, reliability, and the difficulty index, while qualitative analysis focuses on the content, editing, material, construction, and language structure of the test (Surapranata, 2005). Through these analyses the students learn how well their test functions. This paper, however, focuses only on validity, reliability, and the difficulty index, since one purpose of item analysis is to improve the quality of the questions.

DIFFICULTY INDEX

The difficulty index is an index that shows how difficult a question is.
It shows how many testees can answer a given question. The function of the difficulty index is to indicate how difficult or easy a question is: if the question falls in the easy category (p higher than 0.9), it should not be used in the test; if it falls in the moderate category (0.4 lower than or equal to p lower than or equal to 0.9), it can be accepted; and if it falls in the difficult category (p lower than 0.4), it should be revised (Yuntoro in Lestari, 2011).
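As a rough illustration (not part of the original paper), the following sketch computes p for one dichotomously scored item as the proportion of correct answers and applies the cut-offs above; the data and function names are invented:

```python
# Hypothetical sketch: difficulty index p for one 0/1-scored item,
# classified with the cut-offs of Yuntoro (in Lestari, 2011).

def difficulty_index(item_scores, max_score=1):
    """p = (sum of scores) / (maximum score * number of testees)."""
    n = len(item_scores)
    return sum(item_scores) / (max_score * n)

def classify(p):
    """Easy items are rejected, moderate accepted, difficult revised."""
    if p > 0.9:
        return "easy - reject"
    elif p >= 0.4:
        return "moderate - accept"
    else:
        return "difficult - revise"

# Ten testees' answers to one multiple-choice item (Guttman 0/1 scoring)
item = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
p = difficulty_index(item)
print(p)            # 0.7
print(classify(p))  # moderate - accept
```

Seven of ten testees answered correctly, so p = 0.7 and the item lands in the accepted range.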
Before computing the difficulty index, the students must first score the test, for example a multiple-choice test. Under the Guttman pattern, a correct answer is scored 1 and a wrong answer is scored 0. A question is called difficult if nearly all of the testees answer it wrongly, and easy if nearly all of the testees answer it correctly. In general, theory offers several ways to express the difficulty index: (1) the proportion of correct answers, (2) a linear difficulty scale, (3) Davis's index, and (4) a bipartite scale (Surapranata, 2005). The difficulty index can be computed with the following formula:

p = Σx / (Sm · N)
Where:
p = difficulty index
Σx = the number of right answers (or the total score, for essay questions)
Sm = the maximum score
N = the number of testees

This formula also explains why some people say that certain questions have a rising difficulty index: the difficulty index is not determined by such impressions but by how many respondents can answer each question. According to Crocker and Algina (1986) in Surapranata (2005), the difficulty index has two characteristics. First, the difficulty index (p) is a parameter, not a fixed property. Second, the difficulty index is a characteristic both of the question itself and of the group who answers it. The difficulty index also has weaknesses: (1) the difficulty index (p) is actually a measure of the easiness of the question, since the higher the difficulty index, the easier the question, and vice versa; (2) the difficulty index (p) is not linearly related to an underlying difficulty scale. Ideally, the difficulty index is used to improve the learning program. When students develop questions, the items should be ordered by difficulty index from the easiest to the most difficult. When every respondent answers a question wrongly, or every respondent answers it correctly, the tendency is not to use that question: such an item is a bad one. A question whose difficulty index is 0 or 1 contributes nothing to the differences among the testees; it influences only the mean, not the reliability and validity. The
difficulty index influences the score variability and the differences among the testees when the total score is high or low. Score variability is maximal when p = 0.5, because the scores are then most varied. In the classroom context, teachers usually use a fair test, in which p ranges from 0.4 to 0.9.

The difficulty index is closely related to the discrimination index. The discrimination index is a value that discriminates between testees in the high group and testees in the low group. It shows the agreement between the function of the question and the function of the test as a whole. The function of the discrimination index is to determine whether a question can discriminate between groups in the aspect being measured, based on the differences that exist in those groups (Yuntoro in Lestari, 2011). The discrimination index ranges from -1 to 1. A minus sign shows that respondents with lower ability answered correctly while respondents with higher ability did not; such a question suggests that the students' abilities appear inverted. The discrimination index is computed from two groups, a high group and a low group. According to Kelley (1939) and Crocker and Algina (1986) in Surapranata (2005), a stable grouping is obtained by taking the top 27% and the bottom 27% of testees, while Cureton (1957) in Surapranata (2005) divides the testees into the top 33% and the bottom 33%. The discrimination index is influenced by the difficulty index: if the respondents answer with p = 0 or p = 1, the question cannot be said to discriminate among the respondents' abilities. The discrimination index (D) is maximal, D = 1.00, when p = 0.5. Table 1 shows the maximum of D as a function of p.

Table 1. The Maximum of Discrimination Index (D) as a Function of p
p score   : 1.00  0.90  0.80  0.70  0.60  0.50  0.40  0.30  0.20  0.10  0.00
D maximum : 0.00  0.20  0.40  0.60  0.80  1.00  0.80  0.60  0.40  0.20  0.00
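The pattern in Table 1 can be verified numerically: with equal-sized high and low groups, the largest attainable D for an item of overall difficulty p is 2p when p is at most 0.5, and 2(1 − p) otherwise. A small check, not taken from the source:

```python
# Check of Table 1: the maximum attainable discrimination index D
# for an item with overall difficulty p (equal-sized groups).

def d_max(p):
    """D is largest when the high group holds as many of the correct
    answers as possible: 2p for p <= 0.5, otherwise 2(1 - p)."""
    return round(2 * min(p, 1 - p), 2)

for p in [1.00, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.00]:
    print(p, d_max(p))  # reproduces the "D maximum" row of Table 1
```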
The discrimination index can be computed with the following formula:

D = pA − pB

Where:
D = discrimination index
pA = the difficulty index of the high group
pB = the difficulty index of the low group
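As a rough illustration (not from the source), the formula can be implemented with Kelley's 27% grouping; all names and data below are invented:

```python
# Hypothetical sketch: D = pA - pB with Kelley's 27% upper and lower groups.

def discrimination_index(total_scores, item_scores, fraction=0.27):
    """Rank testees by total score, take the top and bottom fractions,
    and return the difference in item difficulty between the groups."""
    n = len(total_scores)
    g = max(1, round(n * fraction))  # size of each group
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    high, low = order[:g], order[-g:]
    p_a = sum(item_scores[i] for i in high) / g  # difficulty in high group
    p_b = sum(item_scores[i] for i in low) / g   # difficulty in low group
    return p_a - p_b

totals = [9, 8, 8, 7, 6, 5, 4, 3, 2, 1]  # total test scores of ten testees
item   = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]  # their 0/1 scores on one item
print(discrimination_index(totals, item))  # 1.0: high group all correct
```

With ten testees the 27% rule gives groups of three; the top three all answered the item correctly and the bottom three all missed it, so D = 1.0.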
VALIDITY

Valid means that the instrument used as a tool to gather the data fits the data it is meant to measure. Validity is divided into two kinds: (1) logical validity and (2) empirical validity. Logical validity goes with qualitative analysis and covers material, construction, and language use. The validity of a test is always grounded in empirical study (Nunnally, 1972, in Surapranata, 2005). According to Gronlund (1985) in Surapranata (2005), validity concerns the results of the instrument, shows their level, and must be used in line with the objective of the research. According to the APA in Surapranata (2005), there are four kinds of validity: content, construct, predictive, and concurrent validity. Landy (1987, in Surapranata, 2005) likened these kinds of validity to stamp collecting.

Content validity means that the instrument is in line with the curriculum. Two factors influence content validity: the test itself and the process. The way to judge this validity is to examine the form of the test; for example, a mathematics test should measure mathematics achievement, not English achievement. Construct validity has to do with phenomena and real objects, for example gravity, mathematical ability, or English ability. It means that the instrument used in the research is in line with the theoretical construct on which the instrument is built. The construct is spelled out in the standard competence, basic competence, and indicators, all of which are used to justify the construction of the test. Predictive validity concerns the relation between the score a student obtains now and a later score; an instrument that can predict the later score is a good instrument. For example, a student who achieves the best results in senior high school is predicted to achieve the best results at university.
When that prediction holds, the instrument is valid. Concurrent validity means empirical validity: the score is up to date and based on experience. The validity can be computed with Pearson's product-moment correlation. In deviation scores, the formula is:

r_xy = Σxy / √(Σx² · Σy²)

Where:
r_xy = the correlation coefficient between variables x and y
Σxy = the sum of the products of the deviation scores of x and y
Σx² = the sum of the squared deviation scores of x
Σy² = the sum of the squared deviation scores of y

The validity can also be computed with correlations for dichotomous data: the point-biserial correlation, the tetrachoric correlation, and the phi correlation.
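A small hypothetical sketch of the deviation-score formula, correlating invented 0/1 item scores with invented total scores (a crude stand-in for the item-validity use described above):

```python
# Hypothetical sketch of Pearson's product-moment correlation in
# deviation-score form: r_xy = sum(xy) / sqrt(sum(x^2) * sum(y^2)).
import math

def pearson_r(x_raw, y_raw):
    n = len(x_raw)
    mx = sum(x_raw) / n
    my = sum(y_raw) / n
    x = [v - mx for v in x_raw]  # deviation scores of x
    y = [v - my for v in y_raw]  # deviation scores of y
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

item = [1, 0, 1, 1, 0]    # invented scores on one item (x)
total = [8, 3, 9, 7, 4]   # invented total test scores (y)
print(round(pearson_r(item, total), 3))  # 0.952
```

A coefficient this close to 1 would suggest the item agrees with the test as a whole; real decisions would of course compare r against a critical value.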
RELIABILITY

Reliable means that the instrument gives a constant score: whenever the instrument is used, it yields consistent scores. Factors that affect reliability include differences between individuals, tiredness, guessing, and so on. Reliability also means that the instrument can differentiate between subjects. Nunnally (1970) in Surapranata (2005) states that reliability refers to the consistency of the scores obtained by the same subjects when the test is repeated in different situations. The function of test reliability is to determine how much of the score variability is caused by measurement error and how much is real score variability.

Reliability has two aspects, internal and external. Internal consistency means that the items are homogeneous, from their difficulty to the form of the test, while external consistency is the degree to which the items produce a constant score every time. There are four concepts of reliability: (1) parallel or equivalent forms, (2) test-retest or stability, (3) split-half, and (4) internal consistency. If the scores on the first test are equivalent to the scores on the second test, the test has high reliability; that is, there is a positive correlation between the first and the second test. The size of the reliability is given by the correlation coefficient, which is called the reliability index. Crocker and Algina (1986) in Surapranata (2005) state that this coefficient is influenced by several factors: the length of the test, its speededness, its homogeneity, and its difficulty.

The reliability is computed in several steps. The first step is to find the sum of squares for testees (Jkr):

Jkr = Σ(Xt)² / k − (ΣXt)² / (k · N)

Where:
Jkr = the sum of squares for testees
Xt = the total score of each testee
k = the number of questions
N = the number of testees
(ΣXt)² = the square of the grand total of scores

First, the tester counts the sum of squares for testees (Jkr) by dividing the sum of the squared total scores of the testees by the number of questions.
Next, the result is decreased by the square of the grand total of scores divided by the number of questions times the number of testees. In the second step, the tester determines the sum of squares for questions (Jki) with the equation:

Jki = ΣB² / N − (ΣXt)² / (k · N)

Where:
Jki = the sum of squares for questions
B² = the squared number of right answers on each question
k = the number of questions
N = the number of testees
(ΣXt)² = the square of the grand total of scores
The tester finds the sum of squares for questions (Jki) by dividing the sum of the squared numbers of right answers by the number of testees, then decreasing the result by the square of the grand total of scores divided by the number of questions times the number of testees. In the third step, the tester finds the total sum of squares (Jkt) by multiplying the total of right answers by the total of wrong answers and dividing by the total of right answers plus the total of wrong answers. The equation is:

Jkt = (ΣB · ΣS) / (ΣB + ΣS)

Where:
Jkt = the total sum of squares
B = the total of right answers
S = the total of wrong answers
The fourth step is to find the remnant (residual) sum of squares with the equation:

Jks = Jkt − Jkr − Jki

Where:
Jks = the remnant sum of squares
Jkt = the total sum of squares
Jkr = the sum of squares for testees
Jki = the sum of squares for questions

The tester decreases the total sum of squares by the sum of squares for testees and by the sum of squares for questions. The fifth step is to find the variance: the tester divides each sum of squares by its degrees of freedom, which for the testees is the number of testees minus one. The equation used is:

V = Jk / d.b.
Where:
d.b. = the degrees of freedom; for the testees, the number of testees minus one

The last step is to find the reliability with Hoyt's equation:

r11 = 1 − (Vs / Vr)

Where:
r11 = the reliability of the test
Vs = the remnant variance (Jks divided by its degrees of freedom)
Vr = the testee variance (Jkr divided by the number of testees minus one)
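The steps above can be collected into one sketch. This is an illustration, not the paper's own code: the data are invented, scores are assumed dichotomous (0/1), and the remnant degrees of freedom are taken as (N − 1)(k − 1), the standard value in Hoyt's analysis-of-variance method, since the text states the degrees of freedom only for the testees:

```python
# Hypothetical end-to-end sketch of the Hoyt procedure for a
# dichotomous score matrix of N testees by k questions.

def hoyt_reliability(scores):
    """scores: list of N rows, each a list of k item scores (0 or 1)."""
    n = len(scores)                         # number of testees
    k = len(scores[0])                      # number of questions
    xt = [sum(row) for row in scores]       # total score per testee
    b = [sum(col) for col in zip(*scores)]  # right answers per question
    grand = sum(xt)                         # grand total of scores
    c = grand ** 2 / (k * n)                # correction term (sum Xt)^2/(kN)
    jkr = sum(x * x for x in xt) / k - c    # sum of squares for testees
    jki = sum(v * v for v in b) / n - c     # sum of squares for questions
    right = grand                           # total of right answers
    wrong = k * n - grand                   # total of wrong answers
    jkt = right * wrong / (right + wrong)   # total sum of squares
    jks = jkt - jkr - jki                   # remnant sum of squares
    vr = jkr / (n - 1)                      # testee variance
    vs = jks / ((n - 1) * (k - 1))          # remnant variance (assumed d.b.)
    return 1 - vs / vr                      # Hoyt's r11

scores = [[1, 1, 1, 1],
          [1, 1, 1, 0],
          [1, 1, 0, 0],
          [1, 0, 0, 0],
          [0, 0, 0, 0]]  # five invented testees, four questions
print(hoyt_reliability(scores))  # 0.8
```

For this invented matrix, Jkr = 2.5, Jki = 1.0, Jkt = 5.0, and Jks = 1.5, giving Vr = 0.625, Vs = 0.125, and r11 = 0.8.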
BIBLIOGRAPHY

Arikunto, S. 2002. Prosedur Penelitian Suatu Pendekatan Praktek. Jakarta: Rineka Cipta Press.
Azwar, S. 2004. Reliabilitas dan Validitas. Yogyakarta: Pustaka Pelajar.
Lestari, P.Y. 2011. Lestari's Work Language Testing. Kediri: Uniska.
Sugiyono. 2008. Metode Penelitian Kuantitatif, Kualitatif dan R & D. Bandung: ALFABETA.
Surapranata, S. 2005. Analisis, Validitas, Reliabilitas, dan Interpretasi Hasil Tes: Implementasi Kurikulum 2004. Bandung: PT. Remaja Rosdakarya.