
MINI RESEARCH

ENGLISH LANGUAGE ASSESSMENT & EVALUATION


“Item Analysis of Final Test for the 9th Grade Students of SMPN 19
MEDAN in the Academic Year of 2021/2022”

Arranged By:
1. Desi Natalia (2213321027)
2. Johanes Tambunan (2212421003)
3. Purnama Sari Hasibuan (2213321025)

FACULTY OF LANGUAGES AND ARTS


ENGLISH DEPARTMENT
MEDAN STATE UNIVERSITY
2023
FOREWORD

Praise be to God Almighty, who has bestowed His mercy, grace, and health on us so that we
could complete this "MINI RESEARCH" task. This assignment was prepared to fulfill one of our
courses, namely "English Language Assessment and Evaluation", in the hope that it adds to our
knowledge and insight, especially in terms of "Item Analysis of Final Test for the 9th Grade
Students of SMPN 19 MEDAN in the Academic Year of 2021/2022". We realize that this work is
still far from perfect; if there are shortcomings and mistakes in it, we apologize, as our knowledge
and understanding are still limited.

We thank the lecturer for this course, Mrs. Dr. Neni Afrida Sari Harahap, S.Pd., M.Hum.,
who gave us this Mini Research assignment as our initial lesson and to fulfill the IQF assignment
score, and we thank the group members who worked on this task together. Since this work is still
far from perfect, we look forward to suggestions and constructive criticism from readers to improve
it. We hope this assignment can be useful for readers and for us in particular, and we thank you
for your attention.

Medan, May 2023


TABLE OF CONTENTS

FOREWORD
CHAPTER I INTRODUCTION
A. The Background of the Study
B. The Problems of the Study
C. The Objectives of the Study
D. The Significances of the Study
CHAPTER II THEORETICAL STUDY
A. The Theoretical Framework
B. The Conceptual Framework
C. The Relevant Studies
CHAPTER III METHODOLOGY
A. The Research Design
B. The Data and the Source of Data
C. Population and Sample
D. The Techniques of Collecting Data
E. The Instruments of Collecting Data
F. The Techniques of Analyzing Data
CHAPTER IV DISCUSSION
A. The Research Findings
B. Discussions
CHAPTER V CLOSING
A. Conclusion
B. Suggestion

REFERENCES
CHAPTER I

INTRODUCTION

A. The Background of the Study

An assessment, also known as a test, plays a significant role in the educational learning process: it
provides the teacher with information about the extent to which learning outcomes have been achieved by a
student. Assessment identifies what students know, understand, can do, and feel at different stages in the
learning process. Assessment can take several forms, either written or oral. Through the educational process,
it is expected that there will be changes on the part of the learners. One suitable way to comprehend students'
ability, especially in using English, is through evaluation or testing. A test is a procedure or an instrument
used to know or measure something, which means that every test should be reliable and should measure
precisely whatever it is supposed to measure (Heaton, 1990, p. 7).

In learning, a test is an evaluation tool that plays an important role in measuring the teaching and
learning process in schools, both in measuring students' ability and in measuring the efficiency of
instruction. Recognizing that evaluation is very important in school, teachers have to know the qualities or
criteria of a good test applied to their students (Arikunto, 2005, p. 53). A good test has several
characteristics: (1) it has high validity; (2) it is reliable, or can be trusted; (3) it is objective; and (4) it is
practical and has clear instructions.

One of the assessments most teachers use to assess their students' knowledge and comprehension is
the multiple-choice test. Multiple-choice items are easy to score, but they are difficult and time-consuming
to construct. It is common knowledge that correct answers should be distributed evenly among the
alternative positions of multiple-choice items, but there are many other important guidelines for
constructing good items that teachers should know and follow. These guidelines form a fairly
comprehensive list of recommendations for constructing multiple-choice test items, focusing on the
content, structure, and options of each item. To obtain a good test item, teachers should therefore attend to
its quality in regard to reliability and validity.

To produce a good test, specifically a multiple-choice test, the criteria need to be considered by
attempting an item analysis. Item analysis, a way of examining test items, uses statistics and judgment to
evaluate tests based on the quality of individual items, item sets, and entire sets of items, as well as the
relationship of each item to the other items. It is done by investigating the performance of items considered
individually, either in relation to some external criterion or in relation to the remaining items on the test
(Thompson & Levitov, 1985, p. 163). In short, it is an appropriate way to improve item and test quality.

B. The Problems of the Study

The problems that the writers want to solve are as follows:

- What is the teacher's role in determining good test questions in an exam?

- What is the quality of the PAS questions for students at SMPN 19 Medan?

- How can "item analysis" be implemented in analyzing multiple-choice questions, the teacher's mainstay
in constructing test items?
C. The Objectives of the Study

This research was conducted with the aim of revealing the test scores' reliability, discriminating
power, and index of difficulty through item analysis, in order to provide detailed information leading to the
improvement of test item construction. Hopefully, the findings of this research will therefore provide deeper
understanding and important information for teachers and other researchers, given that analyzing test items
is part of continuing professional development for teachers. Concerning the explanations mentioned, the
researchers were interested in analyzing the English multiple-choice test items of the final assessment
(PAS, which stands for Penilaian Akhir Semester) in the first semester of the 9th grade at SMPN 19 Medan
in the academic year 2021/2022.

D. The Significances of the Study

The writers hope this research can contribute to English teaching and learning. It has two major
significances, namely theoretical and practical.

1. Theoretical Significance

This research provides input and information regarding the types of test questions examined in this study.

2. Practical Significance

A. For the teachers

The results of this research can serve as knowledge for improving the quality of test questions so that they
become even better.

B. For others

The results of the research can be used as a reference in analyzing questions using "item analysis", as well
as providing benefits and knowledge to other readers.
CHAPTER II

THEORETICAL STUDY

A. THEORETICAL FRAMEWORK

Item analysis is a systematic procedure that provides specific information about a test. This study
is intended to determine the quality of the items, or questions, in the English final examination. The
difficulty level shows whether a question is too difficult or too easy; it is divided into three categories:
hard, moderate, and easy. The higher the difficulty level of an item, the fewer students can answer the
question correctly. Item discrimination power, in turn, is used to distinguish proficient from less proficient
students; it is divided into four categories: poor, satisfactory, good, and excellent. Finally, the purpose of
a distractor is to attract students to select it, and a distractor is declared to function properly if at least 5%
of the students choose it. The results of the item analysis will greatly help provide in-depth information
regarding the difficulty level, the item discrimination power, and the distractor efficiency.
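The 5% distractor criterion above can be sketched in Python as follows. This is a minimal illustration, not part of the study's instruments; the function name, option labels, and answer counts are hypothetical.

```python
from collections import Counter

def functional_distractors(responses, key, threshold=0.05):
    """For one item, report whether each wrong option (distractor)
    was chosen by at least `threshold` (5%) of the test-takers."""
    counts = Counter(responses)
    n = len(responses)
    return {opt: counts[opt] / n >= threshold
            for opt in counts if opt != key}

# Hypothetical example: 20 students answered an item whose key is "B".
answers = ["B"] * 12 + ["A"] * 4 + ["C"] * 3 + ["D"] * 1
print(functional_distractors(answers, key="B"))
# A (20%), C (15%), and D (5%) all meet the 5% criterion here.
```

A distractor flagged as non-functional (chosen by fewer than 5% of students) is a candidate for replacement when the item is revised.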

Item analysis is useful in analyzing the questions in the final English examination of SMPN 19
Medan in the academic year 2021/2022 in order to know the quality of the items tested. Test results
obtained from unqualified items cannot, of course, be a true reflection of students' achievement. The item
analysis activity covers the difficulty level, the item discrimination power, and the distractor efficiency,
and is aimed at providing information to the teacher about the quality of the items used. The teacher can
find out the item quality, and the results can be used to develop and revise items of poor quality.

B. THE CONCEPTUAL FRAMEWORK

The conceptual framework of this research is presented as follows:


Based on the conceptual framework above, to see the content validity of the English test made
by the teacher, the researchers analyzed the test, which is in the form of multiple-choice or essay items.
To determine whether or not each item of the English test made by the teacher is in accordance with
content validity, the researchers used a rational approach such as that proposed by Thoha (2003), namely
comparing the questions with the syllabus, the lesson plans, and the material that has been taught.

C. RELEVANT STUDIES

In a relevant study, Rini (2009) focused her research on the analysis of the test items
administered to seventh grade students of SMP N 1 Moga Pemalang in the academic year of 2008/2009.
The reasons for choosing the topic were as follows:

a. The final test in the academic year of 2008/2009 had just been administered to the students even though
its items had not been analyzed in terms of difficulty and discriminating power.

b. In the teaching and learning process, evaluation is carried out every term. If test constructors do not pay
attention in selecting the items, the validity and reliability of each test will be less guaranteed. For this
reason, every test constructor must be careful in constructing the test items, so that the results will meet
the desired goal.

c. By applying item analysis, we can indicate which items may be reliable and valid, and we can check
properly whether the test has good quality or not.
CHAPTER III

METHODOLOGY

1. The Research Design

This research applied a descriptive method to the formulated problems regarding the test scores'
reliability, discriminating power, index of difficulty, and item analysis. Marczyk, DeMatteo, & Festinger
(2005, p. 209) argued that "descriptive statistics allow the researcher to describe the data and examine
relationships between variables within the research conducted."

2. The Data and Source of Data

The data in this study are the students' answer sheets from the final test, obtained from SMPN 19
Medan in the academic year 2021/2022.

3. Population and Sample

The population used in this study is the junior high school students of SMPN 19 Medan, with class 9-3 as
the sample.

4. The Techniques of Collecting Data

The technique of collecting data in this research is documentation. Documentation is a way of
collecting data by analyzing the notes and documents that are available.

5. The Instruments of Collecting Data

The data collection instrument in this study used a descriptive method to formulate the
problems regarding the test reliability scores, discriminating power, index of difficulty, and
analysis of questions related to quantitative data.

Validity

Validity is the extent to which a test measures what it is supposed to measure; it concerns
the accuracy with which the items measure. The validity of a test needs to be determined in order
to know its quality, and it is the most critical dimension of test development. Simply stated,
validity is what a test measures and how well it does so (Anastasi, 1954; Anastasi & Urbina,
1997; in McCowan & McCowan, 1999, p. 3). Validity is a crucial consideration in evaluating tests.

Reliability

Reliability refers to the consistency of measurement, that is, how consistent test scores or
other evaluation results are from one measurement to another. Reliability is the extent or degree
of consistency of an instrument; it concerns the question of whether a test is trustworthy and gives
results in accordance with the criteria that have been set. A test is reliable if it always gives the
same result when given to the same group at a different time or opportunity (Arifin, 2011, p. 258).
In estimating the reliability of the test scores, the researchers applied the Kuder and Richardson
20 (KR 20) formula, because it provides a measure of internal consistency: it measures test
reliability in terms of inter-item consistency. In this research, the analysis of the 30 test items was
computed using Microsoft Office Excel. A higher value indicates a stronger relationship between
the items on the test. The KR 20 is calculated as follows:

KR 20 = (N / (N − 1)) × (1 − Σpq / V)

where:

KR 20 : Kuder-Richardson 20 reliability coefficient

N : Number of items in the test

V : Variance of the raw scores (standard deviation squared)

p : Proportion of correct answers to a question (number of correct answers / total number of
responses)

q : Proportion of incorrect answers to a question (q = 1 − p)
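As a minimal sketch (not the researchers' actual Excel worksheet), the KR 20 statistic defined by the variables above, KR 20 = (N / (N − 1)) × (1 − Σpq / V), can be computed in plain Python. The 0/1 score matrix below is hypothetical.

```python
def kr20(score_matrix):
    """KR-20 reliability. score_matrix: one list of 0/1 item scores
    per student; V is the (population) variance of the total scores."""
    n_items = len(score_matrix[0])
    n_students = len(score_matrix)
    totals = [sum(student) for student in score_matrix]
    mean = sum(totals) / n_students
    variance = sum((t - mean) ** 2 for t in totals) / n_students
    # Sum of p*q over items, where p is the proportion answering correctly.
    pq = 0.0
    for i in range(n_items):
        p = sum(student[i] for student in score_matrix) / n_students
        pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - pq / variance)

# Hypothetical data: 5 students, 4 items, scored 1 (correct) or 0 (wrong).
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
]
print(round(kr20(scores), 3))  # -> 0.308
```

The same arithmetic can be reproduced in Excel with the VARP, COUNTIF, and SUMPRODUCT worksheet functions.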

6. The Techniques of Analyzing Data

The analysis in this research was documentation-based; documentation is a way of
collecting data by analyzing the notes and documents that are available.

Index of Difficulty

Item difficulty is determined as the proportion of correct responses, signified by the letter
"p". An item is rejected if its proportion of correct answers is less than 0.30 or exceeds 0.70. The
formula for calculating item difficulty is:

p = B / JS

where:

p : Index of item difficulty

B : Number of students answering correctly

JS : Number of students taking the test
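The proportion defined by the variables above, p = B / JS, is a one-line computation; this sketch and its item scores are purely illustrative.

```python
def difficulty_index(item_scores):
    """p = B / JS: proportion of students who answered one item
    correctly, from a list of 0/1 scores for that item."""
    return sum(item_scores) / len(item_scores)

# Hypothetical item: 7 of 10 students answered correctly.
item = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
p = difficulty_index(item)
print(p)                      # 0.7
# An item is flagged if p < 0.30 or p > 0.70, per the criterion above.
print(0.30 <= p <= 0.70)      # True: this item is retained
```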

Discriminating Power

The discriminating power of a test item is its ability to differentiate between students who
have achieved well (the upper group) and those who have achieved poorly (the lower group). Item
discriminating power is estimated by comparing the number of students in the upper and lower
groups who answered the item correctly. According to Gronlund (1982, p. 103), the item
discrimination index (D) for each item can be computed by subtracting the number of students in
the lower group who got the item right (L) from the number of students in the upper group who
got the item right (U), and dividing by one half of the total number of students included in the item
analysis (1/2 T). The first step in computing item discriminability is to separate the highest-scoring
group and the lowest-scoring group from the entire sample on the basis of the total score on the
test. The performance of the students with the highest total scores is then compared with that of
the students with the lowest total scores using the formula:

D = Pu − Pl

where:

D : the index of discrimination

Pu : the proportion of correct answers in the upper group

Pl : the proportion of correct answers in the lower group
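The computation described above, D = Pu − Pl, can be sketched as follows; the group scores are hypothetical, and the split into upper and lower groups (by total test score) is assumed to have been done beforehand.

```python
def discrimination_index(upper_group, lower_group):
    """D = Pu - Pl: proportion correct in the upper group minus
    proportion correct in the lower group, for one item.
    Each argument: 0/1 scores on that item for the group."""
    pu = sum(upper_group) / len(upper_group)
    pl = sum(lower_group) / len(lower_group)
    return pu - pl

# Hypothetical upper and lower groups (by total score), 5 students each:
upper = [1, 1, 1, 1, 0]   # Pu = 0.8
lower = [1, 0, 0, 0, 0]   # Pl = 0.2
print(round(discrimination_index(upper, lower), 2))  # 0.6
```

A value of 0.6 would fall in the "Good" band (0.40-0.69) used later in this report.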


CHAPTER IV

DISCUSSION

A. The Research Findings

The results of the test analysis are presented in order to answer the research questions
about the test validity, the test reliability, and the item analysis covering the level of difficulty and
the discriminating power.

1. Validity

The researchers used content validity to see how well the content of the instrument
represented the entire content that was to be measured. It was examined by making a table of the
distribution of test items, and the items were checked against the sequence of learning outcomes
to identify whether or not the learning outcomes were covered by the test. The distribution of test
items regarding validity is presented in the following table:

Table 1. Distribution of Test Items by Validity

No  Validity  Number of Items                                                                Total  Percentage
1   Valid     1, 2, 3, 5, 7, 8, 10, 11, 12, 14, 15, 16, 18, 19, 21, 24, 25, 26, 28, 29, 30   21     70%
2   Invalid   4, 6, 9, 13, 17, 20, 22, 23, 27                                                9      30%
Total                                                                                        30     100%

2. Reliability

The reliability of the 30 test items was analyzed using the KR 20 formula proposed by
Kuder and Richardson. The reliability coefficient (r11) is interpreted as follows: if r11 ≥ 0.90, the
test has excellent reliability; if r11 ≥ 0.80, very good reliability; if r11 ≥ 0.70, good reliability
(good for a classroom test); if r11 ≥ 0.60, low reliability; if r11 ≥ 0.50, lower reliability, suggesting
a need for revision of the test; and if r11 < 0.50, questionable reliability. The results of this research
indicated that the test items were in the category "Lower Reliability and Revision of Test" because
r11 was 0.521010831 (0.52); that is, the reliability index of 0.521 falls in the 0.50-0.60 range. It
can be concluded that the multiple-choice test items distributed to the 9th grade students of SMPN
19 Medan were categorized as having lower reliability and needing revision.
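The interpretation scale above can be written as a small lookup; the function name and category strings are illustrative, following the bands stated in the text.

```python
def interpret_r11(r11):
    """Map a KR-20 coefficient (r11) to the reliability category
    named in the interpretation scale above."""
    if r11 >= 0.90:
        return "excellent reliability"
    if r11 >= 0.80:
        return "very good reliability"
    if r11 >= 0.70:
        return "good reliability (good for classroom test)"
    if r11 >= 0.60:
        return "low reliability"
    if r11 >= 0.50:
        return "lower reliability; revision of test suggested"
    return "questionable reliability"

print(interpret_r11(0.521010831))
# -> lower reliability; revision of test suggested
```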

Index of Difficulty

The difficulty level is a kind of item analysis concerned with how difficult or easy an item
is for the students (Shohamy, 1985). If an item is too easy, most or all of the students obtain the
correct answer; in contrast, if an item is too difficult, most or all of the students get it wrong. Such
items tell nothing about differences among the students. The difficulty level of an item may range
from 0.00 to 1.00 (p. 73): a difficulty level of 0.00 means the item is difficult, while a difficulty
level of 1.00 means the item is easy. The following classification is used to interpret the results:
0.00-0.29 is categorized as difficult, 0.30-0.69 as medium, and 0.70-1.00 as easy. Based on the
analysis, it was found that among the 30 items there were 5 items (17%) in the difficult category,
15 items (50%) in the medium category, and 10 items (33%) in the easy category. The distribution
of difficulty levels is as follows:

No  Index of Difficulty    Number of Items                                              Total  Percentage
1   0.00-0.29 (Difficult)  2, 5, 7, 8, 12                                               5      17%
2   0.30-0.69 (Medium)     1, 3, 4, 6, 9, 10, 11, 13, 15, 17, 18, 19, 21, 23, 25        15     50%
3   0.70-1.00 (Easy)       14, 16, 20, 22, 24, 26, 27, 28, 29, 30                       10     33%
Total                                                                                   30     100%
Discriminating Power

The discriminating power of a test item tells how well the item performs in separating the
upper group and the lower group (McCowan & McCowan, 1999). The following classification is
used to interpret the discrimination index, which ranges from 0.00 to 1.00: indices of 0.00-0.19
fall in the "Poor" category; 0.20-0.39 in the "Enough" category; 0.40-0.69 in the "Good" category;
and 0.70-1.00 in the "Excellent" category. Based on the results of the analysis, 10 multiple-choice
items (33%) had a good discrimination index, and there was no "Excellent" category within the 30
test items distributed to the students. There were 15 items (50%) classified as having enough
discrimination, and, lastly, 5 items (17%) had a poor discrimination index. The distribution of the
30 items by discrimination index is as follows:

No  Discrimination Index   Number of Items                                              Total  Percentage
1   0.00-0.19 (Poor)       1, 2, 3, 5, 7                                                5      17%
2   0.20-0.39 (Enough)     4, 6, 8, 9, 11, 12, 14, 16, 17, 22, 24, 25, 27, 28, 30       15     50%
3   0.40-0.69 (Good)       10, 13, 15, 18, 19, 20, 21, 23, 26, 29                       10     33%
4   0.70-1.00 (Excellent)  -                                                            0      0%
Total                                                                                   30     100%
B. Discussions

The quality of test items can be seen through several indicators: validity, reliability, level of
difficulty, discrimination index, and distractor efficiency. The indicators of validity, reliability, and item
analysis covering the index of difficulty and the discrimination index are discussed as follows:

1. Validity

Validity is a crucial consideration in evaluating tests; it concerns what a test measures and how
well it does so (McCowan & McCowan, 1999, p. 3). Based on the data examined in this research, 21 items
(70%) were classified as valid, namely items 1, 2, 3, 5, 7, 8, 10, 11, 12, 14, 15, 16, 18, 19, 21, 24, 25, 26,
28, 29, and 30. Meanwhile, 9 items (30%) were classified as invalid, namely items 4, 6, 9, 13, 17, 20, 22,
23, and 27. Based on these findings, the majority of the multiple-choice test items were classified as valid.

2. Reliability

Reliability is a question of the level of consistency that can be trusted, and the reliability of the
questions was measured using KR-20. The reliability coefficient is a measure of the amount of
measurement error associated with a test score (Test Item Analysis & Decision Making, Measurement and
Evaluation Center, 2003). The reliability coefficient (r11) is interpreted as follows: if r11 ≥ 0.90, the items
being tested have excellent reliability; if r11 ≥ 0.80, very good reliability; if r11 ≥ 0.70, good reliability
(good for a classroom test); if r11 ≥ 0.60, low reliability; if r11 ≥ 0.50, lower reliability, suggesting a need
for revision of the test; and if r11 < 0.50, questionable reliability. The results of this research indicated
that the test items were in the category "Lower Reliability and Revision of Test" because r11 was
0.521010831 (0.52), meaning the results will not be stable and may change if the test is given again to the
same group. Based on this description, it can be concluded that the test items distributed to the 30 students
of the 9th grade at SMPN 19 Medan in the academic year 2021/2022 have a problem with low reliability;
in other words, the test items need to be revised.

3. Item Analysis

Concerning the item analysis within this research, the researchers classified the data examined into
two indicators: the index of difficulty and the index of discrimination. Item difficulty lends a hand in
distinguishing easy items from difficult ones; in general, a good test has a good distribution of difficulty
throughout (Sabri, 2013, p. 11). Meanwhile, the item discrimination index can be used to see whether a
question is answered correctly more often by the students in the high-scoring group and missed more
frequently by those in the low-scoring group. This is accomplished by dividing the students into two
groups, namely the high-scoring group and the low-scoring group. Based on the results of this research,
the researchers present the discussion as follows:

a. Index of Difficulty

The item discrimination index can range from -1 to 1 (McCowan & McCowan, 1999). The
interpretation of this index is that if everyone answered the question correctly, the index would be 0. If
everyone in the high-scoring group answered correctly and everyone in the low-scoring group missed the
question, the discrimination index would be 1. Equally, if everyone in the low-scoring group answered the
item correctly and everyone in the high-scoring group missed it, the discrimination index would be -1.
When the discrimination index falls below zero, the testees in the low-scoring group do better on that
question than those in the high-scoring group. The difficulty index should not be used as the only indicator
of a good test. Based on the data examined in this research, there were 5 items (17%) in the difficult
category, 15 items (50%) in the medium category, and, lastly, 10 items (33%) in the easy category.

These results are in accordance with the theory that one of the analyses that should be conducted
to determine whether a question is good enough as an evaluation tool is the analysis of the level of
difficulty. Items belonging to the medium category are to be retained. A relatively difficult question should
be revised, because it is likely that most of the students had not mastered the material in question. A
relatively easy question should be revised by using longer and more complex sentences that require the
learners to think more.

b. Discriminating Power

The discrimination index is the ability of an item's scores to distinguish the group of high-achieving
students from the group of low-achieving students (Thompson & Levitov, 1985). The discrimination index
should not be used as the only indicator of a good test. For example, when one question is missed by every
student in the class, the item discrimination index for that question is 0; if everyone in the class correctly
answers a question, the item discrimination index is also 0. By looking at the item discrimination index
along with the item difficulty index, a picture starts to come into view of the validity of the questions.

Based on the data examined, the researchers found that 10 multiple-choice items (33%) had a good
discrimination index. There was no "Excellent" category within the 30 test items distributed to the
students. Meanwhile, 15 items (50%) were classified as having enough discrimination, and, lastly, 5 items
(17%) had a poor discrimination index. The results showed that the test items whose discrimination value
was below 0.40 should either be rejected or revised, because they were categorized as poor items. In line
with Shohamy (1985), "The discrimination index should not be used as the only one indicator for a good
test and by looking at the item discrimination index standard along with the item difficulty index, a picture
starts to come into view of the validity of the questions".

The discrimination ability of 10 items (33%) was satisfactory, with values between 0.40 and 1.00,
placing them in the good category; no item reached the "excellent" range. Overall, the test items were
quite good at indicating the ability of the testees, with the further consideration that only 10 items
discriminated very well between stronger and weaker students. Thus, the results of the item difficulty and
item discrimination analyses showed that there were many easy items in general, which seems to lower
the discrimination ability of the items. Most of the moderately difficult items discriminate poorly, and
only 5 moderately difficult items had a good discrimination value.
CHAPTER V

CLOSING

CONCLUSION

In this study, the formulation of questions regarding the test scores' validity, reliability, and
item analysis had to be detailed, and for this a very in-depth focus was needed. Statistical tests
were used to calculate the reliability of the test, and the analysis of the test items was carried out
using Microsoft Office Excel. The research used descriptive methods to describe and examine the
data. The results of the study are categorized as having low validity, and the reliability of the test
scores is also low, being only 0.52.

This study also focused on discriminating power and the index of difficulty in order to lead
to improvements in the construction of the test items.

SUGGESTION

From these findings, it is hoped that the measurement of student performance will be
effective and efficient. Important improvements need to be made: items with a poor discrimination
index should be reviewed, and if there is an item error, it must be corrected according to the
respective index classification so that the item meets the criteria. Hopefully, this mini research can
be useful and can serve as initial capital for becoming an effective and efficient teacher in
calculating a good learning score test. The writers also welcome criticism and suggestions for
writing mini research in the future.
REFERENCES

Marczyk, G., DeMatteo, D., & Festinger, D. (2005). Essentials of Research Design and Methodology.
Hoboken, NJ: John Wiley & Sons, Inc.

McCowan, R. J., & McCowan, S. C. (1999). Item Analysis for Criterion-Referenced Tests. New York:
CDHS, Center for Development of Human Services.

Rini, N. I. (2009). Item Analysis of Achievement Test in Final Test for the Seventh Grade Students of SMP
N 1 Moga Pemalang in the Academic Year 2008/2009 (Final project). English Department, Language and
Art Faculty, Semarang State University.

Sabri, S. (2013). Item analysis of student comprehensive test for research in teaching beginner string
ensemble using model based teaching among music students in public universities. International Journal
of Education and Research, Vol. 1, No. 12. Sultan Idris Education University.

Umar, N. I. (2022). Analysis of the Validity of the English Test at SMP Negeri 1 Bontomarannu (English
Education Department thesis). Supervised by Erwin Akib and Muhammad Asrianto Setiadi.
