a final project
by
2201412010
ENGLISH DEPARTMENT
2019
MOTTO AND DEDICATION
For
My beloved family,
ACKNOWLEDGEMENTS
Wata’Ala, the Almighty, for all the blessing given to me during the accomplishment
of my final project. Then, shalawat and salaam are only given to Prophet
Galuh Kirana Dwi Areni, S.S., M.Pd., who gave ideas, time, suggestions, and
guidance to me so patiently during the writing of this final project. I would also like to
extend my gratitude to all lecturers of the English Department and the whole
staff of the Faculty of Language and Arts, Universitas Negeri Semarang, who
taught, guided, and helped me during my study so that I could overcome a lot of
for their love, prayers, support and anything they have done for me. And also to My
2012 especially Izzun, Riyan, Shofy, Roti, Yusti, Rosi, Pas, Wahyu who always
shared their laughter and supported me in past years. In addition, I would like to thank
ABSTRACT
The aim of this study is to find empirical evidence of whether or not the summative
test items have the characteristics of a good test in terms of difficulty level,
discriminating power, and distractor efficiency. The study was conducted in the
seventh grade of SMP Negeri 3 Ungaran, using an item analysis research design.
The respondents were 100 students. The writer took the top 27% as the upper group
and the bottom 27% as the lower group. The data were then analyzed using a
formula from Arikunto’s theory. The writer found that, of the 40 items, 15 (37.5%)
are acceptable according to the criteria of the difficulty index, 1 (2.5%) is too
difficult, and 24 (60%) are unacceptable because they are too easy. For the
discrimination index, 17 items (42.5%) are poor, 1 item (2.5%) has the worst result
and has to be discarded, 11 items (27.5%) are acceptable, and 11 items (27.5%) are
mediocre. For distractor effectiveness, 50 distracters (41.67%) are functioning and
70 (58.33%) are non-functioning. In conclusion, the English summative test for the
seventh grade of SMP N 3 Ungaran did not meet the criteria of an effective and
acceptable test.
TABLE OF CONTENTS
APPROVAL
PERNYATAAN
ACKNOWLEDGEMENTS
ABSTRACT
2.2.2.1 Placement Test
REFERENCES
CHAPTER I
INTRODUCTION
This chapter presents background of the study, reasons for choosing the topic,
limitation of the study, purpose of the study, research question, significance of the
sure whether the teaching learning process has been running well over the term or
not. In education, evaluation plays an important role because it shows the result
of the learning process. The objective of the evaluation itself is to help the teacher
ascertain the degree to which educational objectives have been achieved, to review
the effectiveness of the teaching method, and to help the teacher know their pupils as
individuals.
performance based on information that has been collected, synthesized, and reflected on
by the teacher.
There are several methods of obtaining data for evaluation purposes.
One of them is a test, which may be a teacher-made test or a standardized
test. For a teacher-made test, the teacher who makes the test should know and
master the principles and the steps involved in constructing it. With this
knowledge, the teacher will get a clear picture of the general systematic
can be used efficiently. To be an effective test, it has to fulfill the criteria of a good
test. They are validity, reliability and practicality. Brown (2003) stated that the test
if the result of the test is similar even the test administered with the same standard
It is not easy to determine the quality of a test item. To establish
whether the test meets the standards of a good test, the teacher should
evaluate the test items in several steps. The investigation that a teacher carries
out in order to know the quality of each test item is called item analysis. Item
analysis is helpful for improving teachers’ skills in test construction and for
recognizing specific areas of course content that need greater emphasis or clarity.
The characteristics examined in an item analysis are item difficulty, item
discrimination, and distracter effectiveness. Item difficulty is the level of
difficulty of each test item for the students. Item discrimination tells how well
each test item differentiates between the upper and the lower students in terms of
comprehension ability. Lastly, distracter effectiveness indicates how well each
alternative or option works for an item in multiple-choice questions.
Based on the preliminary study conducted by the writer on the seventh grade
students of SMP Negeri 3 Ungaran, the writer found some facts about the construction
and content of the evaluation. The writer found that some of the questions in the
seventh grade English final summative test for the academic year 2018/2019
were confusing due to ambiguous options. Some of the options were identical, and
the instructions for the questions were not clear. Furthermore, the summative test was
a teacher-made test designed by the teacher herself. The teacher designed
the test items based on an item bank found through an internet search engine, which may
not fit the local content. If the test item is not suitable to the local content, the
When a good test is given, the students have an opportunity to obtain a fair, quality
result from the learning process. Concerning the problems that the writer found in the
preliminary study, the writer believes that a study focusing on item analysis is
necessary. The research will not only find out the
quality of the test items used in SMP Negeri 3 Ungaran but also identify the
weaknesses in each test item. The writer decided to focus on item analysis as the main
topic of this study entitled “An Item Analysis of English Summative Test on
an important key to knowing whether or not each item in English summative test
The discussion of the study will focus on the item analysis of English final
summative test in the second semester of seventh grade students of SMP Negeri 3
Ungaran. The item analysis refers to the process of collecting, summarizing, and
using information about individual test items. The items that will be analyzed are:
1. Difficulty level which helps us to decide if the test items are at the right level for
2. Discriminating Power which allows us to see how well the students know the
3. Distractor efficiency which gives information whether or not the item test has an
efficient distractor.
The objective of the present study is to find empirical evidence of whether or not
the summative test items have the characteristics of a good test in terms of difficulty level,
Based on the limitation of the problem, the writer conducts research to find out
the item analysis percentages for the multiple-choice items. The writer therefore formulates
the problem as follows: “Do the test items of the English final summative test used for
seventh grade students of SMP Negeri 3 Ungaran fulfil the criteria of a good test
The result of this study is expected to encourage the writer in particular, and
society at large, to identify poor items in evaluation tests through item analysis,
so that the test will score students fairly based on their competence.
choosing the topic, limitation of the study, purpose of the study, research question,
theoretical review.
Chapter III presents research methodology. It presents place and time of the
study, population and sample, research instrument, method of collecting data, and
There are two subchapters in this chapter. They are review of previous studies and
Some similar studies have been conducted by other researchers.
Related to this study, the writer chose some references about previous studies
which are close to the present study. Those previous studies discuss item
test analysis.
“An Analysis of the English Summative Test Items in terms of Difficulty Level for
the Second Year Students of MTs. Darul Ma’arif Jakarta”. The purpose of this
research was to measure the difficulty level of the English summative test items by
calculating the students’ correct responses from the upper and lower groups with J.B.
Heaton’s formula, taken from his book “Writing English Language Tests”.
items for the second year students of MTs. Darul Ma’arif Jakarta have a good
quality in terms of difficulty level?”. The result of this research was interpreted using
Suharsimi Arikunto’s criteria for items, taken from the book “Dasar-dasar Evaluasi
All of the items were calculated by dividing the total difficulty level
of the items by the total number of students. The research showed that the result
is 0.45. At the end of this research, the researcher concluded that the
English summative test items for the second year students of MTs. Darul Ma’arif
qualified as a good test. It can be seen from the difficulty level of all items, which is
The next research came from Prayoga (2011), who analyzed the
difficulty level of the English summative test for the second grade of junior high
school in the odd semester of 2010/2011 at SMP N 13 South Tangerang. This study
data which were analyzed statistically. Also, this study was categorized as a descriptive
analysis because it was intended to describe the objective condition about the
In this study the researcher took 93 students as a sample. The findings of the
study were that moderate items had the highest percentage, 66.7%, followed by
difficult items with 20% and easy items with 13.3%. Overall, the difficulty level
of the test was moderate, with a difficulty index of 0.50. It means that the test
about the discriminating power of English summative test at the second year of
it was intended to describe the objective condition about the discriminating power
The findings of the study showed that the English summative test administered to
the second grade students of SMPN 87 Pondok Pinang had a good discriminating
power. There were 35 items (70%) ranging from 0.25 to 0.75, which means the
Based on the previous related findings, there are some differences between
the studies. The researchers above analyzed difficulty level and discriminating
power. There is one analysis of the test items that was not conducted by the researchers.
Besides the difficulty level and discriminating power, the effectiveness of distractor
overcome the weakness that occurs in those previous studies. This research will show
whether the multiple-choice items of the English summative test can be categorized as a good test
or not. This study will hopefully turn out quite different from the studies above,
because the writer not only analyzes the difficulty level and discriminating power
but also includes the distractor efficiency analysis for each test item.
information about the knowledge, skills and attitude of students. This action can
measurement and evaluation take parts. Brown (2003) described a test as a media
monitor the development of the program, to diagnose the difficulties in the program
and to measure the performance of the test taker in and at the end of the program
end of the program by measuring the ability of test taker intelligence both in
important both for the teachers and the students. For the students,
a test lets them know their achievement in learning the material, while for
the teachers, a test shows which students have understood the
material so that the teachers can give more attention to the students who have not
yet understood it.
measuring a student’s control of the language in the light of what he or she will be
(1996) stated that a proficiency test assess the general knowledge or skill commonly
institution.
Arthur Hughes stated that proficiency tests are designed to measure test
takers’ ability in a language regardless of any training they may have had in that
language. In contrast to achievement tests, the content of proficiency tests is not based
language, regardless of any training they may have had in that language. The
language courses that people taking the test may have followed. Rather, it is based
selection, and their relative merit lies in their ability to spread students out according
As its name suggests, the purpose of an achievement test is to establish how successful
Development, Evaluation and Research, achievement tests may be used for program
tests normally come after a program of instruction and that the components or items
of the tests are drawn directly from the content of instruction (Henning, 2001).
the extent to which a pupil has achieved in various subject areas. Based on those
opinions, the measurement is usually done at the end of a learning process or program.
Thus it can be inferred that achievement tests are used to measure the extent
stated objectives of a learning program. Achievement tests are also used by teachers
to motivate students to study. If students know they are going to face a quiz at the
end of the week, or an end-of-semester achievement test, the effect is often an
increase in study time near the time of the test. The primary goal of achievement
tests is to measure past learning, that is, the accumulated knowledge and skills of
explain reports sent to students and their parents, as stated in Arends (2012).
Moreover, Cotton (2004) states that summative test assessment methods are
made to determine what a student has accomplished at the beginning or the end
According to Suwandi (2009), there are four types of summative test that can be used
carry generalizable meaning; that is, the score can be interpreted to mean
to know how successful students have mastered the previous materials of a long
period of course.
achievement tests, they hope to see increased scores indicating the progress made.
Airasian (2012) stated that formative tests take place while interacting with students
and focus on making quick and specific decisions about what to do next in order
to help students learn. They all rely on information collected through either
instruction. Formative tests are typically designed to measure the extent to which
instruction, such as a unit or a textbook chapter. These tests are similar to the
quizzes and unit tests that teachers have traditionally used, but they place greater
emphasis on (1) measuring all of the intended outcomes of the unit of instruction,
and (2) using the results to improve learning (rather than to assign grades) based on
The result of a formative test gives information about how well students have
successes and failures so that adjustments in instruction and learning can be made.
The formative test also shows whether a student has not mastered the
learning tasks being taught, so that a way to remedy the learning failures can be prescribed.
Formative tests are intended to monitor learning progress during instruction and
Brown (1996) stated that a diagnostic test is designed to determine the degree to
which the specific instructional objectives of the course have been accomplished.
Heaton (1988) also stated that the diagnostic test is widely used; few tests are
carried out for groups of students rather than for individuals. In summary, diagnostic tests
are designed to diagnose a particular aspect of a language and can be used to check
the students’ learning of a particular element of the course. For example, they can be
used at the end of a chapter in the course book or after finishing one particular
lesson.
The placement test provides an invaluable aid for placing each student at the most
content validity), and it thereby provides an indication of the point at which the
student will find a level or class to be neither too easy nor too difficult, but
information that will help to place students at the stage or in the part of the teaching
learning program that is most appropriate to their abilities. So the performance
quality of a test will influence the result of the test itself. Once the test has a good
quality, the right information will be gained and used to make accurate decisions about
the students’ achievement. Brown (2001) stated that a well-constructed test should
Validity is the degree to which the test actually measures what is intended
define, especially within the art and the science of evaluating and designing tests.
Moreover, a good test should also show good results in its item analysis. Brown (2004)
also stated that, “there are three main components of item analysis, they are:
Meanwhile, according to Purwanto (2009), a good test item should meet three
criteria: a moderate difficulty level, high discriminating power, and distractors
that work effectively. An effective and good test should have items that
belong to the moderate level. An item that is too easy or too difficult potentially weakens
the quality of the test and the validity of the information about students’ achievement
answered the item correctly. Ryan and Ory (1993) said that the higher the
percentage of students who answer correctly, the easier the item is. To
obtain specific data, the writer divides the students into two groups: an upper and
a lower group. The writer here uses the top 27% as the high group and the 27%
(1971) that it was small enough to clearly identify high and low student
groups, yet large enough to provide a sufficient number of scores as a base for
item statistics. The difficulty index is found by dividing the number of students
who answer the item correctly by the total number of students taking the test, after
$$\mathit{ID} = \frac{\mathit{UR} + \mathit{LR}}{N} \times 100\%$$
Where:
𝑈𝑅 : The number of pupils who answered the item correctly in the upper group
𝐿𝑅 : The number of pupils who answered the item correctly in the lower group
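As an illustration only, the grouping and the difficulty index described above can be sketched in Python. The function names, the dictionary of total scores, and the reading of N as the combined size of the upper and lower groups are assumptions made for this sketch; they are not taken from the source.

```python
from typing import Dict, List, Tuple


def split_groups(total_scores: Dict[str, int],
                 fraction: float = 0.27) -> Tuple[List[str], List[str]]:
    """Return (upper, lower) student IDs: the top and bottom `fraction` by total score."""
    ranked = sorted(total_scores, key=total_scores.get, reverse=True)
    size = round(len(ranked) * fraction)
    if size == 0:
        raise ValueError("not enough students to form the groups")
    return ranked[:size], ranked[-size:]


def difficulty_index(ur: int, lr: int, n: int) -> float:
    """ID = (UR + LR) / N x 100%.

    Assumption: n is the number of students in the upper and lower groups combined.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return (ur + lr) / n * 100


if __name__ == "__main__":
    # Hypothetical item: 20 correct answers in the upper group, 10 in the lower group.
    print(round(difficulty_index(ur=20, lr=10, n=54), 1))  # 55.6
```

With 100 respondents, the 27% rule gives 27 students per group, so N would be 54 under this assumption.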
Table 2.2.4
Difficulty Level
A test is appropriate for use if each item in the examination is
passed by half of the students. Karmel (1978) stated that the difficulty index is
relevant for determining whether students have learned the idea of being
the students who know the concept of materials being tested and those who
do not.
the students on the basis of how well they know the material that has been
tested. It is also important to note that the difficulties of items can identify
the students will also be divided into two groups: an upper and a lower group. Each
group represents 27% of all the students who took
The higher the percentage of the discrimination index means the ability of
students who scored high on the test as a whole than by students who scored
low. Arikunto (1999) stated that a good test item is the one having
obtained by subtracting the number of students in the lower group who get the
item right (L) from the number of students in the upper group who get the
item right (U). The result is then divided by half the total number of
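$$\mathit{DI} = \frac{\mathit{UR}}{\mathit{NU}} - \frac{\mathit{LR}}{\mathit{NL}} \quad \text{or} \quad \mathit{DI} = \mathit{PU} - \mathit{PL}$$
Where:
𝑈𝑅 : the number of pupils who answered the item correctly in the upper group
𝐿𝑅 : the number of pupils who answered the item correctly in the lower group
𝑁𝑈 : the number of pupils in the upper group
𝑁𝐿 : the number of pupils in the lower group
𝑃𝑈 = 𝑈𝑅 / 𝑁𝑈 : the proportion of pupils who answered the item correctly in the upper group
𝑃𝐿 = 𝐿𝑅 / 𝑁𝐿 : the proportion of pupils who answered the item correctly in the lower group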
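The discrimination index defined by the proportions PU and PL can be sketched in Python as follows. This is a minimal illustration; the function name, the parameter names, and the sample numbers are assumptions of this sketch rather than part of the source.

```python
def discrimination_index(ur: int, lr: int, nu: int, nl: int) -> float:
    """DI = UR / NU - LR / NL, i.e. PU - PL.

    ur, lr: pupils who answered the item correctly in the upper / lower group
    nu, nl: number of pupils in the upper / lower group
    """
    if nu <= 0 or nl <= 0:
        raise ValueError("group sizes must be positive")
    pu = ur / nu  # proportion correct in the upper group
    pl = lr / nl  # proportion correct in the lower group
    return pu - pl


if __name__ == "__main__":
    # Hypothetical item with 27 pupils per group: 20 correct (upper), 10 correct (lower).
    print(round(discrimination_index(ur=20, lr=10, nu=27, nl=27), 2))  # 0.37
```

A negative value would mean that more pupils in the lower group than in the upper group answered the item correctly.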
Table 2.2.5
DP QUALITY RECOMMENDATION
An item discriminates positively if more of the upper group answer the
question correctly than the lower group do. Analyzing the discrimination index
of each item gives information that helps the teacher in
identifying the flaws, giving further explanation about the material, and also
how the distracters are functioning. It can be compared with the proportion
distractor is mostly chosen by the upper group, it can be said that the
distracter did not function as it should. One of the objectives of item
students who answered correctly, which distractor is too obvious and makes it
easy for students not to choose it, the misleading distracter and the distractor
An item is the basic unit of language testing. Brown (1996) stated that the
item is the smallest unit that produces distinctive and meaningful information on a
test or rating scale. The items used in classroom tests are commonly divided into two
Nouman (1977) stated that objective test items can be used to measure a variety
but other item types also have a place. Following simple but important rules for
achievement test, the test maker may choose from a variety of item types. One
of them is referred to as objective item. This kind of item test can be scored
The objective item can be classified into two types: selection-type test
items and supply-type test items. Here, the researcher limits the study to
selection-type test items because that is the type of test used in this research.
There are many kinds of selection-type test items, namely
multiple-choice items, true-false items, and matching items. Then, the researcher
Multiple Choice
Multiple choice items are made up of an item stem, which present a problem
problem. The options usually of, a, b, c or d. that will be counted correct, and
the distractors, which are those choices that will be counted as incorrect.
Nouman (1977) said that the multiple-choice item plays such an important
role in the objective testing of knowledge outcomes that it will be treated first
The term options refers collectively to all the alternative choices presented
to the students and includes the correct answer and the distractors. These
In designing a multiple-choice test, the test maker should take several
things into consideration. According to Damien (2014), there are 18 basic rules in designing
c) Put the alternatives at the end of the question, not in the middle.
g) Avoid requiring personal opinion. Other item types are more suitable for
this.
n) Make all options grammatically consistent with the stem of the item.
Multiple-choice items have some advantages. Wilmar (1988) lists the advantages
a) The multiple-choice item can be used for subject matter content in any
b) It gives students less chance of guessing the right answer than the true
c) One advantage of multiple-choice items over true-false items is that
students also know what is correct rather than only knowing that a statement is
incorrect.
Essay Test
opportunity to supply and construct their own responses, making them the
synthesizing and evaluating. The essay question is also the primary means
ideas. The main limitations of essays are that they are time-consuming to
answer and score, and they place a premium on writing ability. On the other
hand, Damien (2014) explained that essay tests are the best measure of
5.1 Conclusion
The objective of this study is to find empirical evidence of whether or not the
summative test items have the characteristics of a good test in terms of difficulty level,
problem for this study is whether the test items of the English summative test used for
seventh grade students of SMP Negeri 3 Ungaran fulfill the criteria based on
difficulty level, discriminating power, and item distracter. The test consists
of 40 multiple-choice items with four options each. In this
research, the writer uses quantitative research as the method of her study. Therefore,
the writer analyzes the existing data by referring to some related theories to describe
the difficulty level of the items, the item discrimination index, and the effectiveness of
the distracters.
Based on the analysis and data interpretation from the previous chapter, the
writer would like to conclude the quality of the English Even Summative Test items
which were tested in the seventh grade of SMP Negeri 3 Ungaran as follows:
of the 40 items, 15 (37.5%) are acceptable according to the criteria of the
difficulty index. Besides, there is 1 (2.5%) item that is too difficult and there are 24 (60%)
For the discrimination index, the writer found that there are 16 (40%) poor items, 1
(2.5%) item with the worst result, which has to be discarded, 12 (30%) acceptable
items, and 11 (27.5%) mediocre items. Moreover, for the distractor
effectiveness, the writer found there are 51 (42.5%) distracters with effective
Most of the items are easy, and the test also has many ineffective
distracters. Ineffective distracters make a question easier to answer, and
the result of an easy question is that the gap between the lower and upper groups becomes
smaller. In conclusion, it can be said that the English summative test did not test real
index and the distractor efficiency perspective of English summative test for
seventh grade of SMP Negeri 3 Ungaran did not meet the criteria of an effective and
acceptable test.
5.2 Suggestions
Based on what the writer found in the data analysis and interpretation, it can be
understood that the test maker of this English even summative test did not construct
ideal test items; instead, the test maker proceeded as she pleased. It will be better for the
test maker to understand how to construct an ideal yet effective test item before
following:
1. The test maker should use correct grammar, punctuation, capitalization, and spelling,
and avoid trick items. Therefore, there will be no misinterpretation among the
2. If the test maker uses an item bank as his or her resource, it is better to recheck from
ambiguity, questions that look alike, and miskeyed responses in the stem will not
3. The test maker should arrange the items in sequence from easiest to most difficult,
with the students taking the test in mind. The test maker will then
know which items the students know very well and which they do not.
4. The test maker should do an item analysis after conducting a pilot test in order to
know whether the test that he or she made can be classified as a good test or not.
Moreover, because the test is intended as a standardized test, items which are too hard or
too easy are better not used in constructing the test.
5. It is better for the test makers to follow the guidelines in writing items of multiple-