Using Microsoft Excel® To Calculate Descriptive Statistics and Create Graphs
Nathan T. Carr
To cite this article: Nathan T. Carr (2008) Using Microsoft Excel® to Calculate Descriptive
Statistics and Create Graphs, Language Assessment Quarterly, 5:1, 43-62, DOI:
10.1080/15434300701776336
their results, and they may expect to see them when reading testing reports pro-
duced by high-priced Professional Testing Experts, but aside from calculating the
average score on a test, they may question the need for further statistical or
graphical description of scores. Such additional work is, no doubt, the province
of large testing organizations (see, e.g., Educational Testing Service, 2005). But
for locally developed tests, why bother, especially if it has never been done
before? On the other hand, others may recognize that they need to but find them-
selves unable to articulate any reasons why.
The easy answer is, “Well, we just do, because it is good testing practice, and
therefore one of our responsibilities as test developers.” In other words, it is part
of what is expected when you are conducting serious assessment, or want your
work to be taken seriously. But why is that, and why should it matter, really? No
doubt some will reply that perhaps they do not want to be so serious, thank you
very much. The official answer is that it is, in fact, necessary for discharging our
ethical duties as language testers: Principle 1 Annotation 5 and Principle 8 Anno-
tation 1 of the International Language Testing Association Code of Ethics (2000)
call, in part,1 for communicating information “in as meaningful a way as possi-
ble” and to do so accurately.
There are several practical reasons for calculating descriptive statistics as
well, however. The first is that descriptive statistics let us know whether it is
appropriate to perform certain statistical tests (e.g., whether it is appropriate to
perform a t test to determine whether two groups performed differently to a sta-
tistically significant degree). Discussion of these statistical tests is beyond the
scope of this article, but they cannot be considered appropriately without first
paying attention to the concerns discussed here. Another practical reason for cal-
culating descriptives is that they also let us know which correlation coefficient
we can use appropriately on our test scores. This is particularly important when
it comes to deciding between the Pearson product–moment correlation coeffi-
cient (Pearson r) and Spearman rho (ρ).
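Descriptives guide this choice, and the two coefficients themselves are straightforward to compute. As a quick illustration (a Python sketch with hypothetical scores, not part of the original article; Excel provides the PEARSON and CORREL functions for the former), Spearman rho is simply the Pearson r calculated on ranks:

```python
from statistics import mean

def pearson_r(x, y):
    # Pearson product-moment correlation from two lists of scores
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def average_ranks(x):
    # ranks from 1..n, with tied scores sharing the average of their ranks
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    # Spearman rho is the Pearson r computed on the ranks of the scores
    return pearson_r(average_ranks(x), average_ranks(y))
```

For strictly increasing but nonlinear data, rho reaches 1.0 while r does not, which is one reason the choice between the two matters.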
Descriptive statistics are also used as a part of other statistical analyses that are
important to ensuring test quality, such as estimating test reliability (Bachman,
2004). For example, the formula for Cronbach’s alpha—an estimate of the score
consistency of a test—requires calculating the variance for total test scores as well
as for the scores on each individual item. Similarly, if we are interested in improving
reliability by revising problematic items, the most common way of estimating item
1 The full text of Principle 1, Annotation 5 is “Language testers shall endeavour to communicate
the information they produce to all relevant stakeholders in as meaningful a way as possible.” Princi-
ple 8, Annotation 1 reads “When test results are obtained on behalf of institutions (government
departments, professional bodies, universities, schools, companies) language testers have an obliga-
tion to report those results accurately, however unwelcome they may be to the test takers and other
stakeholders (families, prospective employers etc).”
DESCRIPTIVE STATISTICS, GRAPHS, AND EXCEL 45
difficulty (item facility) involves calculating the mean score for each item, as does
a family of well-known approaches to estimating item discrimination (upper–lower
method item discrimination, the difference index, and the B-index).
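These item statistics reduce to simple means and differences of means. The following Python sketch (not from the article; the data, and the use of thirds for the upper and lower groups, are illustrative assumptions) shows item facility and an upper–lower discrimination index:

```python
from statistics import mean

def item_facility(item_scores):
    # mean score on one item across all examinees
    # (for dichotomous items, the proportion answering correctly)
    return mean(item_scores)

def ul_discrimination(item_scores, total_scores, fraction=1 / 3):
    # upper-lower index: facility in the upper group minus facility in
    # the lower group, with groups formed from total test scores
    n = len(total_scores)
    k = max(1, int(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower = [item_scores[i] for i in order[:k]]
    upper = [item_scores[i] for i in order[-k:]]
    return mean(upper) - mean(lower)
```

An item missed by the weakest third but answered correctly by the strongest third would score 1.0, the maximum discrimination under this convention.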
Even more important, descriptive statistics give us basic information about
how people did on our tests. Presumably, we are interested in how people per-
formed overall. But “how the students did on the test” involves more than just
reporting the average score, as is explained next. Descriptives and related graphi-
cal representations of our results help us determine whether we have the distribu-
tion of scores that we expected—or even needed. In other words, is the test
functioning as we expected it to? Descriptive statistics offer a precise description
of this (Bachman, 2004), whereas graphs of the score distribution provide a more
intuitive and holistic description of score data.
For example, if we are administering a pretest in a language class (a criterion-
referenced pretest, such as a baseline measure intended to verify that students do
not already know the material about to be taught), we normally expect that the
majority of the examinees will score quite poorly, because they have not been
taught the material yet. Knowing whether this expectation actually holds true is
important; if it does not, the students may be in the wrong class, the course con-
tent may need to be revised so as to better meet their learning needs, or perhaps
the test was inappropriately constructed. In contrast, on a final exam (a criterion-
referenced post-test), we expect most of the students to have mastered the mate-
rial. If the descriptive statistics and graphs of the score distribution do not show
this, however, we will know that we need to revise the course content or teach-
ing, or revise the test. Similarly, if we have reason to expect a normal distribution
in our test scores (i.e., the classical “bell curve”), we can use descriptive statistics
and graphs to see how well our results match our expectations. Situations in
which this expectation might be reasonable include proficiency testing, as well as
any other test where we expect most people to be average, and equal numbers to
be above and below average.
Finally, descriptive statistics and graphical representations of data can be use-
ful when making comparisons between sets of test scores. Although there are sta-
tistical tests for doing this more precisely, we can still get some indication of the
degree of similarity or difference between groups by comparing descriptives and
graphs. Furthermore, even if we establish that a difference is statistically “signif-
icant”—large enough that it probably did not happen by chance—that does not
mean that the difference is large enough to really matter. Examining descriptive
statistics such as the means and standard deviations, and comparing graphs of
score distributions, can help us judge whether significant differences are in fact
meaningfully large. One example of when we might want to do this is when we
wish to compare how a group of students performed on two different forms
(i.e., versions) of a test. Another example might be when we wish to compare the
performance of different groups taking the same test.
46 CARR
Before we can construct charts of our data, or calculate any descriptive statistics,
we first need to get our score data into an electronic format. The best way to do
this is with a spreadsheet program; most people will probably use Microsoft
Excel® spreadsheet software, although as Brown (2005) pointed out, the proce-
dures are highly similar regardless of the particular program being used. The only
differences of real importance here are that programs other than Excel (such as
version 15 of SPSS; see Bachman & Kunnan, 2005, for comparison) may use
slightly different names for the functions discussed here and that the procedures
for creating histograms and other graphs will also differ. As Brown also noted,
the procedures described here are virtually identical for all versions of Excel. For
those who are easily intimidated by math in general and statistics in particular, I
should make something clear right now: Excel is not a program that people have
to be good at math to use. Rather, it is a program that people use when they do
not want to do math themselves. None of the procedures outlined in this article
will require you to do anything beyond adding, subtracting, multiplying, or dividing—and very little of that, too.
Unless the data are being imported from a text file, as would be the case with
tests that use optically scanned score sheets, results will probably have to be
entered by hand. The most accurate way to do this is to have one person reading
the names and data while a second person types everything. Not only is this
method usually faster, it also allows the person entering the data to watch the
screen at all times, increasing the likelihood that data entry errors will be caught
immediately. The nearly universal practice for arranging the file is to have each
examinee’s data in their own separate row in the spreadsheet, with each score—
whether for individual items, sections of the test, or just total test score—in its
own column. Even when there are multiple sources of data for each test taker, as
when a speaking test has been scored by two raters, each test taker should have
one row in the data set. In other words, a given examinee’s data should not be put
into more than one row. Aside from custom, there are practical reasons for
arranging the data this way, not least of which is that none of the procedures
described here will work otherwise.
TABLE 1
Example of a Frequency Distribution

 Bin   Frequency
   0       0
  10       0
  20       1
  30       4
  40       7
  50      10
  60      10
  70       1
  80       2
  90       0
 100       0
Frequency distributions and the graphs based on them tell us how many test takers got each score. The first step in doing this is to compile a frequency distribution, a table that shows each score that someone received
on the test. In the past, this might have been done by hand, with paper and pencil;
for example, for a test worth 0 to 100 points, a tally mark would be made next to
each possible score, and after all the tests had been gone through, the marks
would be totaled and reported in a table going from highest score to lowest
(Brown, 2005; Guilford & Fruchter, 1978). As Bachman (2004) pointed out,
however, unless we have a small sample, it is more informative to group scores
in the frequency distribution. Fortunately, the entire process can now be done in
moments using Excel, as will be explained next. An example of a frequency dis-
tribution can be seen in Table 1, which reports frequencies for a small simulated
data set of 35 cases, similar in size to what a classroom teacher might expect to
deal with.
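The grouping itself is mechanical. As a sketch (in Python rather than Excel; the bin semantics below mirror Excel's Histogram tool, where each bin counts the scores greater than the previous boundary and less than or equal to its own, with anything above the last bin falling into a "More" category):

```python
def frequency_distribution(scores, bins):
    # Excel-style bins: each bin b counts the scores that are greater
    # than the previous bin boundary and less than or equal to b
    freqs = []
    prev = float("-inf")
    for b in bins:
        freqs.append(sum(1 for s in scores if prev < s <= b))
        prev = b
    more = sum(1 for s in scores if s > bins[-1])  # Excel's "More" row
    return list(zip(bins, freqs)), more
```

Calling `frequency_distribution(scores, list(range(0, 101, 10)))` on a set of 0–100 test scores yields a grouped table like Table 1.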
Guilford and Fruchter (1978) pointed out the importance, when grouping
scores in a frequency distribution, of using appropriately sized intervals, which
Excel refers to as “bins.”2 They recommended using 10 to 20 intervals, with 10 to 15
being more common. They also recommended using certain sizes for intervals—
generally 2, 3, 5, 10, or 20 of whatever units are being used—and beginning each
interval with a number evenly divisible by the size being used (e.g., if 5-unit
intervals are used, then start each one with a number divisible by 5). Once the bin
size has been determined and the frequency distribution created, it can be
graphed. One way to do this is with a frequency polygon, which is basically a line graph connecting the frequencies of the score intervals.
2 I generally use the term bin instead of the better-sounding score interval in this article to remain
consistent with the usage in Excel. If the term does not seem to make much sense, imagine that we are
sorting potatoes, or buttons, and putting them into a number of storage bins based on their sizes.
[Figures omitted: two charts of the Table 1 frequency distribution (apparently a histogram and a frequency polygon); x-axis test scores 0–100, y-axis frequency 0–12.]
Begin by clicking on the Tools menu. If you see “Data Analysis. . .,” the toolpack
is installed already. If it is not, click on “Add-Ins. . .” and select the “Analysis
ToolPak.” You will then be asked whether you want to install the feature; click
on “Yes” and follow any additional directions. You will not need to reboot your
computer or restart Excel when the installation is finished.
Once the toolpack is installed, you are ready to construct a histogram in Excel.
You begin by setting up the “bins.” Although this is theoretically optional, the
results will be much more useful if you set an appropriate bin size (see Figure 3
for an example of what happens when the bins are not specified in Excel and the
program is left to determine them itself). All this requires is finding an empty col-
umn in the spreadsheet and entering the interval boundaries in ascending order;
see the first column of Table 1 for an example. You do not need to create the fre-
quency distribution yourself, as Excel will do this for you automatically when it
creates the histogram. Once the bins have been created, go to Tools → Data
Analysis. . ., select the “Histogram” option, and click on the “OK” button.
The first text box is the input range; click on the button to its right (the one
with the red arrow), and the dialogue box will almost entirely disappear, aside
from a floating text box. This happens so that you can navigate the spreadsheet
and select the raw data for which you wish to construct a histogram. Once you
have selected the data, click on the button on the right edge of the floating text
box (the one with the red arrow). Then repeat the process for the bins range.
Another important part of the process is choosing from among the three
options for output location. Normally, it is better to select “Output Range” or
“New Worksheet Ply”; the former will put the histogram in the current work-
sheet, whereas the latter will create a new worksheet tab within the current work-
book (i.e., within the same Excel file). Choosing “New Workbook” will create a
new Excel workbook, which will probably be neither necessary nor useful for
most users. Finally, it is important to select the “Chart Output” checkbox, or
Excel will only produce a frequency distribution table, with no histogram. When
this is done, click “OK” and watch the histogram appear.
[Figure 3 omitted: histogram with bins chosen automatically by Excel, with unhelpful bin boundaries such as 3.454545, 14.90909, . . . , 106.5454, plus a “More” category; x-axis Test Score, y-axis Frequency 0–150.]
Note that once one histogram has been created in a session of using Excel, the
next one will contain the previous settings as a default. It is also worth remem-
bering that the ever-popular “undo” button will not work with a histogram—once
created, the chart and the frequency distribution table must be deleted if there has
been a mistake.
Because the histogram is a chart, it can be reformatted like any other chart in
Excel. Resizing works the same as with any chart or picture in Microsoft Office®
System applications. Many users will probably want to revise the labels; for
example, “Bins,” the default label for the x-axis, should probably be replaced
with something more informative, such as “Test Score.” Likewise, unless the
document will be printed and copied in color, it is better to change all graphs to
black, white, and gray. The color of the bars can be changed by right-clicking on
one, selecting “Format Data Series. . .,” and changing the color settings on the
“Patterns” tab. I recommend a dark gray color for clarity. The color of the plot
area can be changed to white in a similar fashion after right-clicking an empty
place in the plot area and selecting “Format Plot Area. . .” The X or Y axis may
be formatted—including the direction of the text—by right-clicking on any of the
text labeling the axis and selecting “Format Axis. . .” Text such as the title and
legend of the graph can be deleted entirely, if desired, by simply clicking on the
box and hitting the Delete key. Text that is not deleted can be formatted by right-
clicking it, selecting the format option (“Format Axis. . . ,” “Format Axis Title. . . ,”
etc.), and clicking on the “Font” tab. One area for particular attention in the
“Font” tab is the “Auto scale” checkbox, which controls whether the text size
stays the same at all times or automatically adjusts as the chart is resized. It is
important to note that formatting must be applied separately for each text box
within the chart.
To add a trendline,3 right-click on one of the bars in the graph, and select
“Add Trendline. . .” Finally, a frequency polygon can be created by changing the
chart type. This is done by right-clicking one of the bars in the graph, selecting
“Chart Type. . .,” and in the “Standard Types” tab selecting “Line” as the chart
type and clicking on “Line with markers displayed at each data value” as the
chart subtype. Note that even if the color of the bars had been changed already,
converting to a frequency polygon or adding a trend line will produce a colored
line, which should then be reformatted to black and white. A slightly reformatted
example of a histogram can be seen in Figure 4, whereas Figure 5 shows the
same histogram with a trend line added. Figure 6 shows a frequency polygon for
the same variable.
To insert a chart into a Microsoft Word® document, simply click on the empty
space inside the borders of the chart—not a section with text or graphics—and
3 Note that Excel does not allow users to superimpose a normal curve over a histogram.
[Figure 4 omitted: slightly reformatted histogram; x-axis Test Score 0–100, y-axis Frequency 0–250.]
[Figure 5 omitted: x-axis Test Score 0–100, y-axis Frequency 0–250.]
FIGURE 5 Histogram showing a relatively normal distribution with a trend line added.
[Figure 6 omitted: frequency polygon for the same variable; x-axis Test Score 0–100, y-axis Frequency 0–250.]
copy it. Open the Word document, put the cursor where you want to insert the
chart, and paste it in. It is important to note that all chart formatting should be
done in Excel first. Reformatting is not always possible in Word, and some
attempts at formatting in Word can even disrupt other, unrelated parts of the chart.
Therefore, once the document is pasted into Word, you should plan to do no addi-
tional formatting beyond changing the size of the chart. If you do need to make
changes, make them in Excel, and then paste in the new version of the chart.
Figures 7 through 9 show histograms for 34, 194, and 991 cases, respectively.
Note that as the sample size increases, the shape of the histogram grows
smoother; that is, the larger the sample, the more it will tend to approximate the
normal distribution. This illustrates the point that large samples tend to yield smoother graphs, although they do not guarantee that you will obtain the distribution you expect.
[Figure 7 omitted: example histogram with 34 cases and automatically scaled y-axis; x-axis Test Score, y-axis Frequency 0–7.]
[Figure 8 omitted: x-axis Test Score, y-axis Frequency 0–30.]
FIGURE 8 Example histogram with 194 cases and automatically scaled y-axis.
[Figure 9 omitted: x-axis Test Score, y-axis Frequency 0–120.]
FIGURE 9 Example histogram with 991 cases and automatically scaled y-axis.
4 When the last number on the axis has more digits than the others, the final digit sometimes may
not display if the labels are oriented vertically. The solution to that problem is to put them at a slight
angle, as in many of the examples in this article, which use a −75° orientation for this very reason.
[Figure 10 omitted: x-axis Test Score, y-axis Frequency 0–110.]
FIGURE 10 Example histogram with 34 cases and y-axis on the same scale as Figures 11
and 12.
[Figure 11 omitted: x-axis Test Score, y-axis Frequency 0–110.]
FIGURE 11 Example histogram with 194 cases and y-axis on the same scale as Figures 10
and 12.
[Figure 12 omitted: x-axis Test Score, y-axis Frequency 0–110.]
FIGURE 12 Example histogram with 991 cases and y-axis on the same scale as Figures 10
and 11.
[Figure 13 omitted: histogram of an approximately normal distribution; x-axis Test Score, y-axis Frequency 0–250.]
[Figure 14 omitted: histogram of a positively skewed distribution; x-axis Test Score, y-axis Frequency 0–250.]
[Figure 15 omitted: histogram of a negatively skewed distribution; x-axis Test Score, y-axis Frequency 0–250.]
A useful mnemonic for the sign of the skewness statistic is that “the tail tells the tale.” In other words, the “tail” points in the direction of
the sign. The sign is not determined by which side the “hump” is on.
The other statistic used to describe the shape of the distribution is the kurtosis.
This tells us how flat or peaked the distribution is. A perfectly normal distribu-
tion has a kurtosis of zero; a distribution in which the scores are clustered tightly
[Figure 16 omitted: histogram of a distribution with positive kurtosis, the scores clustered tightly together; x-axis Test Score, y-axis Frequency 0–250.]
[Figure 17 omitted: histogram of a distribution with negative kurtosis, the scores spread out; x-axis Test Score, y-axis Frequency 0–250.]
together (see Figure 16) has positive kurtosis, whereas one in which they are
spread out (see Figure 17) has negative kurtosis. If this seems difficult to keep
straight at first, remember that high, peaked distributions are positive, and low,
flattened-out ones are negative.
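For readers who want to see what lies behind these two statistics, both can be computed from standardized deviations from the mean. This Python sketch (not from the article) uses the simple moment-based population formulas; note that Excel's SKEW and KURT functions apply sample-size corrections, so their values will differ somewhat, especially in small samples:

```python
from statistics import mean, pstdev

def skewness(x):
    # third standardized moment: positive when the tail points right
    m, s = mean(x), pstdev(x)
    return sum(((v - m) / s) ** 3 for v in x) / len(x)

def kurtosis(x):
    # excess kurtosis: zero for a perfectly normal distribution,
    # positive for peaked distributions, negative for flat ones
    m, s = mean(x), pstdev(x)
    return sum(((v - m) / s) ** 4 for v in x) / len(x) - 3
```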
5 People taking their first—or second, for that matter—course in testing, statistics, or research
methods often wonder what the difference is between an arithmetic average and the “ordinary” aver-
ages they learned to calculate in elementary school. There is no difference.
When there is an even number of scores, the median is the average of the middle two scores. The third measure of central
tendency, the mode, is simply the most common score.
These three measures provide differing levels of information and are useful in
different contexts. The mode provides us with the least information of the three—
knowing the most common score is nice, but it does not necessarily tell us much
about what is going on with everyone else who took our test. The mode is more
useful in cases where the variable being described is not a score, but a category—
for example, when we are classifying students by first language. As Bachman
(2004) pointed out, the median is particularly useful when the distribution is
skewed—extreme values in the “tail” of the distribution will have a disproportion-
ate effect on the mean—and in small samples, although the latter reason is largely
because small samples are unlikely to have a normal distribution. It is also appro-
priate for rankings and for scores where the distance between levels is not necessar-
ily the same in every case (e.g., if an essay test is rated using a 5-point rating scale,
the difference in quality between a 2 and 3 might not be the same as the difference
between a 3 and 4). The mean is appropriate for any case in which the variable is
not highly skewed and the distances between levels of the variable are equivalent;
this is usually the case when test scores are based on a number of items.
In a normal distribution, the three measures of central tendency should be grouped
or clustered together fairly closely. If the distribution is skewed, they will be farther
apart. In particular, the larger the skewness is, the greater the distance will be between
the mean and the median. The median will be closer to the center of the “hump” of
the distribution, and the mean will be closer to the tail. The mode, of course, will be at
the tallest point of the “hump,” because that represents the most common score.
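The effect of skewness on the three measures is easy to demonstrate with Python's statistics module and a small set of hypothetical, positively skewed scores:

```python
from statistics import mean, median, mode

# hypothetical positively skewed scores: one extreme value in the "tail"
scores = [1, 2, 2, 3, 20]

print(mode(scores))    # 2: the most common score
print(median(scores))  # 2: the middle score, resistant to the tail
print(mean(scores))    # 5.6: pulled toward the tail by the extreme score
```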
Measures of Dispersion
Measures of dispersion, as their name suggests, indicate how spread out scores
are for a particular variable. They include the standard deviation, variance, semi-
interquartile range, and range. As with the measures of central tendency, these
indexes provide varying amounts of information about the data they describe.
The standard deviation is the most informative measure of dispersion and is
appropriate any time that it is appropriate to use the mean. To understand what
the standard deviation is, it is useful to keep in mind that on a given test, very few
test takers will receive scores exactly equal to the mean; that is, there will be
some difference between each examinee’s score and the mean. Conceptually, the
standard deviation is similar6 to the average of these differences.
6 Strictly speaking, the standard deviation is not really the average of the differences, which is
referred to as the mean deviation (Gorard, 2004). The mean deviation is used so seldom, however,
that thinking of the standard deviation as the average of the differences is unlikely to cause any
problems.
Further complicating the picture is that there are two formulas for the standard
deviation: one for when we are calculating the standard deviation for a sample,
and one for when we are calculating the standard deviation for the entire
population of interest. The population formula for the standard deviation is
S = √(Σ(X − M)² / N), and the sample formula is s = √(Σ(X − M)² / (n − 1)) (Brown, 2005),
where X is an individual test taker’s score, M is the mean, and N or n is the popu-
lation or sample size, respectively.7 The formulas yield almost identical results if
the number of cases is large, but in small groups, the difference can be notice-
able. If the test takers whose data are being analyzed are all of the test takers who
could be expected to take that test, then the population formula should be used—
as, for example, when all of the students in a particular language program are
included in the analysis (Brown, 2005). This issue arises again in the context of
estimating test reliability, but further discussion of the matter lies beyond the
scope of this article.
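Python's statistics module implements both formulas and can serve as a quick cross-check on Excel's STDEVP and STDEV functions (the scores below are hypothetical):

```python
from statistics import pstdev, stdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores for a whole program
S = pstdev(scores)  # population formula: divides the summed squares by N
s = stdev(scores)   # sample formula: divides by n - 1, so slightly larger
```

With only eight cases the difference is visible (2.0 versus about 2.14); with hundreds of cases it would be negligible.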
The variance is simply the square of the standard deviation; therefore, as there
are two formulas for the standard deviation, there are also two for the variance—
that is, for the variance of a population and the variance of a sample. The vari-
ance is not very useful in and of itself, but it is used in calculating a number of
other statistics, such as Cronbach’s alpha, an estimate of internal consistency reli-
ability (Allen & Yen, 1979).
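As a sketch of how these variances feed into Cronbach's alpha (a Python illustration, not from the article; it assumes the one-row-per-examinee data layout described earlier and uses population variances):

```python
from statistics import pvariance

def cronbach_alpha(item_matrix):
    # item_matrix: one row per examinee, one column (score) per item
    k = len(item_matrix[0])
    items = list(zip(*item_matrix))             # per-item score columns
    totals = [sum(row) for row in item_matrix]  # total score per examinee
    sum_item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))
```

When the items rise and fall together, alpha approaches 1; when they are unrelated, it falls toward zero.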
The semi-interquartile range (Bachman, 2004) is based on the notion of quartiles,
divisions of the scores into four equally sized groups. Also referred to as the
quartile deviation, it is the average of the difference between the median (the
50th percentile) and the 25th and 75th percentiles; that is, between the second,
first, and third quartiles, respectively. Its calculation is very straightforward once
the values of the first and third quartiles are calculated: Q = (Q3 − Q1) / 2. Fortunately,
finding these values is very simple in Excel. The semi-interquartile range should
be reported any time the median is used.
The range is probably the simplest of the indicators of dispersion and is equal
to the highest score minus the lowest score, plus 1. As Bachman (2004) noted,
although it is the simplest of these indicators, it is also the least informative, as
distributions with widely varying shapes may all have the same range. This is the
case, in fact, in Figures 13 through 17.
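Both indexes are simple to compute once the quartiles are found. The sketch below (Python, using one common textbook convention for the quartiles, the medians of the lower and upper halves; Excel's QUARTILE function interpolates differently and may give slightly different values for small samples) also includes the range as defined above:

```python
from statistics import median

def quartiles(scores):
    # Q1 and Q3 as the medians of the lower and upper halves of the data
    xs = sorted(scores)
    n = len(xs)
    lower, upper = xs[: n // 2], xs[(n + 1) // 2 :]
    return median(lower), median(xs), median(upper)

def semi_interquartile_range(scores):
    q1, _, q3 = quartiles(scores)
    return (q3 - q1) / 2

def score_range(scores):
    # range as defined above: highest score minus lowest score, plus 1
    return max(scores) - min(scores) + 1
```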
7 Note the use of capital S in the population formula, and lowercase s for the sample formula. The
abbreviation SD is also commonly used, and does not specifically refer to either the population or
sample version.
What Are “Good” Values for Descriptive Statistics, and How Normal Is
Normal Enough?
A common question from people applying these statistics for the first time is what “good” values are for descriptive statistics. There is no single
answer to this, as it depends on the distribution that is expected or needed in a
particular situation. For example, we would expect scores on a proficiency test to
be normally distributed (see Figure 13 for an example of an approximately nor-
mal distribution). In most cases, we might expect the same for a placement test,
even if it is criterion referenced. A normal distribution is also an assumption for
certain statistical tests. On the other hand, in a criterion-referenced pretest, given
to learners to assess their knowledge of something that has not been taught yet,
we generally expect a positively skewed distribution (see Figure 14). That is
because most of the test takers will have very low scores, although a few may
already know the material they are about to be taught, and will thus score higher
than the others. Similarly, a criterion-referenced posttest should have a nega-
tively skewed distribution (see Figure 15), because the vast majority of the stu-
dents will (we hope!) have mastered the content, and only a few will have very
low scores.
When we do not get the distribution we expect, there is something wrong. The
problem may lie with the test itself, or there may have been something problem-
atic about our assumptions. To learn which it was will require gathering addi-
tional information about our students, further analyzing the test (e.g., item and
reliability analyses), or both. That is why we need to calculate descriptive statis-
tics and create graphs of score distributions: to tell us whether our tests are func-
tioning as we expected or not.
So how normal is “normal enough,” and how skewed should we expect our
pretests and posttests to be? Bachman (2004, p. 74) advised that as long as the
skewness and kurtosis values are between −2 and +2, the distribution is “reason-
ably normal,” meaning that it would be appropriate to perform analyses that
require normality to be appropriate (e.g., calculating the Pearson r, or performing
a t test). On the other hand, that does not automatically mean that a criterion-
referenced pretest should have a skewness of at least +2, or that a criterion-
referenced posttest should have at least a −2 skew. There are no rules of thumb of
which I am aware for these values; I therefore recommend looking at not only the
skewness statistic but also a histogram or frequency polygon of the scores to see
whether it has a shape that seems reasonable in light of the content being tested
and what you expect the students to know already when they take the test.
Finally, skewness and kurtosis are probably the most commonly misinter-
preted and overinterpreted statistics discussed in this article. In particular, it is
important for beginners to keep in mind that any distribution found in real life
will have some degree of positive or negative skewness and kurtosis. Thus, having
TABLE 2
Functions and Formulas for Calculating Descriptive Statistics in Microsoft Excel Spreadsheet Software
[Table body not preserved in this extraction.]
a minor negative skewness (e.g., –0.034) does not necessarily suggest that a test
was a post-test. The same holds true for minor positive skewness and pretests.
8 For those having trouble reading the scores in the figure, they are 45, 50, 38, 79, 56, 38, 57, 3, 30,
and 63.
select the range by holding down the mouse button and dragging the cursor down to the
bottom of the data, or by holding down the Shift key and pressing the down arrow (↓)
or Page Down key on the keyboard. When the entire range is selected, type the close
parentheses sign, and hit the Enter key. Note that the functions are not case sensitive.
When a user is less confident, however, or cannot remember exactly how a
function works, it is possible to click on Insert → Function. . ., and find the desired
function there. Something that may prove initially confusing is that for many func-
tions, Excel has two text boxes, labeled “Number 1” and “Number 2.” Simply
ignore the second one, and click on the button to the right of the first text box (the
one with the red arrow). As with creating histograms, most of the dialogue box will
disappear, and you select the range containing the data. After finishing, press the
Enter key, or click on the button to the right of the floating text box (again, the but-
ton with the red arrow). Click OK, and the function is calculated.
When entering the formulas for the range and semi-interquartile range, instead
of typing the specific cell addresses given in Table 2, it is necessary to use the
addresses in your own spreadsheet. When it would be time to type a cell address,
use the mouse to click on the desired cell (e.g., the cell containing the value for
Q3), and then continue typing—that is, do not then click on the cell where you are
entering the formula. When the formula is finished, hit the Enter key. As a final
point, the spaces in the formulas are optional.
CONCLUSION
ACKNOWLEDGMENT
I thank two anonymous reviewers for the feedback and encouragement that they
offered on a previous draft of this article. Any remaining shortcomings are, of
course, my own responsibility.
REFERENCES
Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.
Bachman, L. F. (2004). Statistical analyses for language assessment. Cambridge, UK: Cambridge
University Press.
Bachman, L. F., & Kunnan, A. J. (2005). Workbook and CD for statistical analyses for language
assessment. Cambridge, UK: Cambridge University Press.
Brown, J. D. (2005). Testing in language programs: A comprehensive guide to English language
assessment (2nd ed.). New York: McGraw-Hill.
Educational Testing Service. (2005). TOEFL test and score data summary: 2004–2005 test year data
(Report No. TOEFL-SUM-0405-DATA). Princeton, NJ: Author. Retrieved December 17, 2006,
from http://www.ets.org/Media/Tests/TOEFL/pdf/Test%20and%20Score%20Data%20Summary
%2004_05.pdf
Gorard, S. (2004, September). Revisiting a 90-year-old-debate: The advantages of the mean devia-
tion. Paper presented at the British Educational Research Association Annual Conference, Univer-
sity of Manchester, England. Retrieved December 20, 2006, from http://www.leeds.ac.uk/educol/
documents/00003759.htm
Guilford, J. P., & Fruchter, B. (1978). Fundamental statistics in psychology and education (6th ed.).
New York: McGraw-Hill.
International Language Testing Association. (2000). Code of ethics. Retrieved December 17, 2006,
from http://www.iltaonline.com/code.pdf