Chapter 9
Scoring a test yields a set of numbers known as raw scores. To make the raw scores
comprehensible, they can be arranged in a frequency distribution or displayed graphically as a
histogram or frequency polygon. Once a test has been scored, teachers need to interpret the
results and use these interpretations to make grading, selection, placement, or other decisions. To
interpret test scores accurately, teachers need to analyze overall test performance and individual
test items, and use these data to draw valid conclusions about student performance. This
information also helps teachers prepare for posttest discussions with students about the exam. To
interpret student test scores, then, teachers need appropriate methods. Several methods are
introduced in this paper, namely frequency distribution, measures of central tendency, measures
of dispersion, item analysis, and moderation.
A. Frequency Distribution
The frequency of a value is the number of times it occurs in a dataset. A frequency distribution
is the pattern of frequencies of a variable. It’s the number of times each possible value of a
variable occurs in a dataset. There are four types of frequency distributions, namely ungrouped,
grouped, relative, and cumulative frequency distributions. An ungrouped frequency distribution
records the number of observations of each value of a variable; you can use this type of
frequency distribution for categorical variables.
To build an ungrouped frequency table, create a table with two columns, label the first column
with the variable name and the second column "Frequency," and enter the values in the first
column. For ordinal variables, the values should be ordered from smallest to largest in the table
rows. For nominal variables, the values can be in any order in the table; you may wish to order
them alphabetically or in some other logical order. Next, count the frequencies. The frequencies
are the number of times each value occurs. Enter the frequencies in the second column of the
table beside their corresponding values. Especially if your dataset is large, it may help to count
the frequencies by tallying: add a third column called "Tally," make a tick mark in the
appropriate row of the tally column for each observation as you read it, and then count the tally
marks to determine the frequency. Here is an example of an ungrouped frequency table.
Example 1.1
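The counting steps above can be sketched in a few lines of Python; the letter grades below are hypothetical and only illustrate the tallying:

```python
from collections import Counter

# Hypothetical letter grades from a small class (nominal data)
grades = ["B", "A", "C", "B", "B", "A", "D", "C", "B", "A"]

# Count how many times each value occurs
freq = Counter(grades)

# Print an ungrouped frequency table, values in alphabetical order
for value in sorted(freq):
    print(f"{value}\t{freq[value]}")
```

For nominal data like these grades, any logical ordering of the rows is acceptable; alphabetical order is used here only for readability.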
To build a grouped frequency table, first choose a class interval width; a common rule of thumb
is to divide the range of the data by the number of intervals you want and round up to a
convenient number. Then calculate the class intervals. Each interval is defined by a lower limit
and an upper limit; observations in a class interval are greater than or equal to the lower limit
and less than the upper limit. The lower limit of the first interval is the lowest value in the
dataset. Add the class interval width to find the upper limit of the first interval and the lower
limit of the second interval. Keep adding the interval width to calculate more class intervals until
you exceed the highest value. The next step is to create a table with two columns and as many
rows as there are class intervals. Label the first column using the variable name, label the second
column "Frequency," and enter the class intervals in the first column. The last step is to count
the frequencies. The frequencies are the number of observations in each class interval; you can
count by tallying if you find it helpful. Enter the frequencies in the second column of the table
beside their corresponding class intervals. Here is an example of a grouped frequency table.
Example 1.2
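The class-interval steps above can be sketched in Python; the scores and the interval width of 10 are assumptions made for illustration:

```python
# Hypothetical test scores; an interval width of 10 is assumed
scores = [52, 67, 71, 58, 83, 90, 74, 66, 79, 85, 61, 95]
width = 10

# Build class intervals from the lowest value until the highest is covered
intervals = []
lower = min(scores)
while lower <= max(scores):
    intervals.append((lower, lower + width))
    lower += width

# Count observations with lower <= x < upper in each interval
counts = [sum(lo <= s < hi for s in scores) for lo, hi in intervals]

for (lo, hi), c in zip(intervals, counts):
    print(f"{lo}-{hi - 1}\t{c}")
```

Note that each observation falls into exactly one interval because the upper limit is exclusive, matching the "greater than or equal to the lower limit and less than the upper limit" rule above.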
Frequency distributions are often displayed using frequency tables. A frequency table is an
effective way to summarize or organize a dataset. It is usually composed of two columns: the
values or class intervals, and their frequencies. The method for making a frequency table differs
between the four types of frequency distributions. You can follow the guides above or use
software such as Excel, SPSS, or R to make a frequency table.
Example 1.3
Example 1.4
B. Central Tendency
Central tendency is a descriptive summary of a dataset through a single value that reflects the
center of the data distribution. Along with the variability (dispersion) of a dataset, central
tendency is a branch of descriptive statistics. Central tendency is one of the quintessential
concepts in statistics. Although it does not provide information regarding the individual values
in the dataset, it delivers a comprehensive summary of the whole dataset.
Measures of Central Tendency. Generally, the central tendency of a dataset can be described
using the following measures:
a) Mean (Average): Represents the sum of all values in a dataset divided by the total number of
values.
b) Median: The middle value in a dataset that is arranged in ascending order (from the
smallest value to the largest value). If a dataset contains an even number of values, the median of
the dataset is the mean of the two middle values.
c) Mode: Defines the most frequently occurring value in a dataset. In some cases, a dataset may
contain multiple modes, while some datasets may not have any mode at all.
Even though the measures above are the most commonly used to define central tendency, there
are some other measures, including, but not limited to, geometric mean, harmonic mean,
midrange, and geometric median. The selection of a central tendency measure depends on the
properties of a dataset. For instance, the mode is the only central tendency measure for
categorical data, while a median works best with ordinal data.
Although the mean is regarded as the best measure of central tendency for quantitative data,
that is not always the case. For example, the mean may not work well with quantitative datasets
that contain extremely large or extremely small values. The extreme values may distort the mean.
Thus, you may consider other measures. The measures of central tendency can be found using a
formula or definition. They can also be identified from a frequency distribution graph. Note that
for datasets that follow a normal distribution, the mean, median, and mode are located at the
same point on the graph.
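The three measures can be computed with Python's standard statistics module; the scores below are made up for illustration and include an even number of values, so the median is the mean of the two middle values:

```python
import statistics

# Hypothetical test scores (8 values, so the median averages the middle two)
scores = [70, 85, 75, 90, 85, 60, 80, 85]

mean = statistics.mean(scores)      # sum of values / number of values
median = statistics.median(scores)  # middle of the sorted data
mode = statistics.mode(scores)      # most frequently occurring value

print(mean, median, mode)  # → 78.75 82.5 85
```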
C. Measures of Dispersion
1.) Mean Deviation
Mean deviation is a statistical measure of the average deviation of values from the mean in a
sample. To calculate the mean deviation: first, calculate the mean of the data; second, calculate
the absolute difference of each observation from the mean (the deviations of all the
observations); and third, average these absolute deviations.
2.) Variance
The variance is the average of the squared deviations from the mean; it measures how far the
values are spread out from their average.
3.) Standard Deviation
The standard deviation of a random variable is the square root of its variance. A low standard
deviation shows that the values tend to be close to the mean of the set, while a high standard
deviation indicates that the values are spread out across a wider range.
4.) Range and Interquartile Range
The range is the difference between the highest and lowest values of a dataset. The interquartile
range (IQR), also called the mid-spread, is a measure of statistical dispersion equal to the
difference between the 75th and 25th percentiles, or between the upper and lower quartiles:
IQR = Q3 − Q1. In other words, the IQR is the first quartile subtracted from the third quartile.
These quartiles can be seen on a box plot of the data. The IQR is a trimmed estimator, defined as
the 25% trimmed range, and is a commonly used robust measure of scale.
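These dispersion measures can be sketched with Python's standard statistics module. The scores are hypothetical, and note that statistics.quantiles uses the "exclusive" quartile method by default, so other software may report slightly different quartiles:

```python
import statistics

# Hypothetical test scores
scores = [60, 70, 75, 80, 85, 85, 85, 90]

mean = statistics.mean(scores)

# Mean deviation: average absolute difference from the mean
mean_dev = sum(abs(s - mean) for s in scores) / len(scores)

# Population variance and standard deviation (std dev = sqrt of variance)
variance = statistics.pvariance(scores)
std_dev = statistics.pstdev(scores)

# Range and interquartile range (IQR = Q3 - Q1)
data_range = max(scores) - min(scores)
q1, _, q3 = statistics.quantiles(scores, n=4)  # quartile cut points
iqr = q3 - q1

print(mean_dev, variance, std_dev, data_range, iqr)
```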
D. Item Analysis
In item analysis, the interpretation of the results is fairly straightforward. The only problem that
might occur for a beginner is unfamiliarity with the data (or, worse, never having analyzed it
before). The more we analyze, the more easily we can understand what is happening inside the
data. First and foremost, we have to remember that in medical education it is common that
students must know or be able to perform certain skills. Let us take Basic Life Support (BLS) as
the example. If you as the educationalist have already set the standard that students must be able
to perform BLS, understanding all the knowledge necessary to do BLS is a must. Therefore, if
you have a question about BLS, we will expect the result to be "easy" (in column B). Why?
Because every student will answer it correctly. In this case, having an "easy" warning is not a
problem, and thus the question is not categorized as a bad question. In empirical analysis, there
are several statistics we want to examine, namely item difficulty, item-total correlations, and
item discrimination.
a. Item Difficulty
Item difficulty measures how difficult a question is for the students. It is the ratio of the number
of students who answer correctly to the total number of students who take the exam. This ratio
ranges from 0 to 1. The more students answer correctly, the closer the number is to 1. If the
question is difficult (few students can answer it), the number is close to 0. For example, suppose
100 students take an exam with 100 questions, and 65 students answer question number 1
correctly. This gives question number one an item difficulty of 0.65.
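The worked example above can be expressed as a small Python sketch, with each response coded 1 for correct and 0 for wrong:

```python
# Item difficulty: proportion of examinees who answer the item correctly
def item_difficulty(responses):
    # responses: list of 1 (correct) / 0 (wrong) per student
    return sum(responses) / len(responses)

# 65 of 100 students answered question number 1 correctly
responses_q1 = [1] * 65 + [0] * 35
print(item_difficulty(responses_q1))  # → 0.65
```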
b. Item-Total Correlations
This is the correlation between answering a question correctly and the total score. For example,
does answering question number 2 correctly correlate with the total score? It is expected that
students who answer number 2 correctly will have higher total scores than those who answer it
wrong. In the psychometric library in R, the item criterion gives us the freedom to correlate an
item with any Y criterion (you choose what you want to correlate with). In the script, I correlate
each item with the total score, and the result is in column G. It is a point-biserial correlation.
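Since the point-biserial correlation is simply the Pearson correlation between a 0/1 item score and the total scores, it can also be sketched in Python rather than R. The responses and totals below are invented for illustration and are not the data behind column G:

```python
import statistics

def point_biserial(item, totals):
    """Pearson correlation between a 0/1 item score and total test scores."""
    n = len(item)
    mean_i = statistics.mean(item)
    mean_t = statistics.mean(totals)
    # Population covariance divided by the product of population std devs
    cov = sum((i - mean_i) * (t - mean_t) for i, t in zip(item, totals)) / n
    return cov / (statistics.pstdev(item) * statistics.pstdev(totals))

# Hypothetical data: five students' 0/1 scores on item 2 and their totals
item2 = [1, 1, 0, 1, 0]
totals = [90, 80, 55, 75, 60]
print(round(point_biserial(item2, totals), 3))
```

A positive value, as here, means students who got the item right tended to have higher total scores, which is the expected pattern for a well-functioning item.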
c. Item Discrimination
This is called the D-index. It is the ability of an item to distinguish between the students who
understand the material and those who do not. It is calculated from the proportions of high
achievers and low achievers who answer the item correctly. The value of the D-index is between
-1.0 and 1.0. High positive values indicate good items; low or negative values indicate bad ones.
The simple explanation is that when low-scoring students can answer a question, the question
cannot distinguish the ability of the students that well.
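A minimal sketch of the D-index in Python, assuming the common convention of comparing the top and bottom 27% of scorers (the data here are hypothetical):

```python
def discrimination_index(item, totals, fraction=0.27):
    """D-index: proportion correct in the top group minus the bottom group.

    The 27% group size is a common convention, not a fixed rule.
    """
    n = len(item)
    k = max(1, int(n * fraction))
    # Rank students by total score, ascending
    order = sorted(range(n), key=lambda i: totals[i])
    low, high = order[:k], order[-k:]
    p_high = sum(item[i] for i in high) / k
    p_low = sum(item[i] for i in low) / k
    return p_high - p_low

# Hypothetical data: 10 students' 0/1 scores on one item and their totals
item = [1, 1, 1, 0, 1, 0, 0, 1, 1, 0]
totals = [95, 88, 84, 80, 76, 70, 65, 60, 55, 50]
print(discrimination_index(item, totals))  # → 0.5
```

Here both top scorers got the item right while only one of the two bottom scorers did, giving D = 1.0 − 0.5 = 0.5, a reasonably discriminating item.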
E. Moderation
The importance of moderating class intervals is to find out the average count of observations,
the difference of observations from that average, and the constraints on observations.
Summary
In the testing process, it is critical to consider any sources of bias or inaccuracy. Test-taker
fatigue is one possible source of error; it might affect test results if the person is drained or loses
concentration while taking the test. Cultural or linguistic differences can also introduce bias, as
some people may be disadvantaged on certain assessments because of these factors. When
interpreting the results, these variables should be considered, as they may affect the validity of
the test results.