Lesson2 1
2.1 Frequency distribution of data
Estimated time required to achieve the outcomes: 8 hours
How to generate a frequency distribution diagram in Microsoft Excel
Suppose you have a data set consisting of 10 children’s names, surnames and marks for a certain
assessment or evaluation, for example a Test:
This is a typical raw data set. It may be ordered according to several different criteria, for example by
ordering the entries alphabetically with respect to the surnames:
Select all the data, but not the headings: highlight the cells by pressing the left mouse button and dragging the
mouse pointer over the appropriate cells while holding the button down.
Then click on “Data” in the top horizontal task bar and select “Sort”. When the sorting menu opens, select the
column where the surnames are displayed (it is column B):
Important: Check that the option “My data has headers” is NOT checked.
If you now click on “OK”, Excel will sort the data set alphabetically according to the entries in
column B. Note that all the data associated with each surname (namely the name and mark) moves with
the surname, so the data does not become disassociated or jumbled:
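The same idea, sorting whole rows so that each name and mark stays with its surname, can be sketched outside Excel. The short Python sketch below is only an illustration: the names and marks are made up, since the lesson's own data set is not reproduced here.

```python
# Hypothetical records: (name, surname, mark). These are NOT the lesson's data.
records = [
    ("Thabo", "Nkosi", 34),
    ("Anna", "Botha", 41),
    ("Sipho", "Dlamini", 27),
    ("Maria", "Adams", 45),
]

# Sort whole rows by surname (the second field). Because each row moves as
# one unit, the name and mark stay attached to the surname.
sorted_records = sorted(records, key=lambda row: row[1])

for name, surname, mark in sorted_records:
    print(f"{name:<8}{surname:<10}{mark}")
```

Sorting only the surname column on its own would break the link between the columns, which is why all the data must be selected before sorting.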
In order to process and organize our ordered data set further, we may, for example, compute the
percentage achieved by each child. We can do this in Excel by entering a formula into cells D6 to D15. This
formula must simply divide the mark in the C-column by the total for the test in cell C4 and multiply the
result by 100 in order to obtain a percentage.
However, before we can do that we must note that each of the calculations which take place in cells D6 to
D15 divides by the same value (the value in C4). Because C4 is used in each of the rows from 6 to 15,
we must “anchor” the cell C4 in the formula which we enter into D6 by employing the dollar sign. Position
the cursor in D6 and type “=”, left-click on C6, type “/”, left-click on C4, tap the F4 function key and type
“*100”, so that the formula in D6 reads =C6/$C$4*100. Instead of tapping F4, you may also insert a dollar
sign before the column letter and before the row number of the cell you wish to anchor: $C$4
Press “Enter” and duplicate the formula from cell D6 to D15 using the plus/cross-method explained on the
previous page in the shaded paragraph.
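The effect of the anchored cell can be sketched in Python: every row is divided by the same fixed total, just as every copy of the formula keeps pointing at $C$4. The marks and the total of 50 below are assumptions made for illustration only.

```python
TEST_TOTAL = 50  # assumed test total; plays the role of the anchored cell $C$4
marks = [34, 41, 27, 45]  # hypothetical marks (the C-column)

# Each row uses its own mark but the same total, like =C6/$C$4*100 filled down.
percentages = [mark / TEST_TOTAL * 100 for mark in marks]

for mark, percentage in zip(marks, percentages):
    print(f"{mark} out of {TEST_TOTAL} is {percentage:.1f}%")
```

Without the dollar signs, Excel would shift the reference to C5, C6 and so on as the formula is copied down, so each row would be divided by a different cell.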
You may format cells D6 to D15 (or any rectangular section of the spreadsheet) to approximate numbers
correct to a certain number of decimal places. If, for example, you wish to approximate the percentages to
one decimal place, then you must highlight the cells D6 to D15, right-click on the shaded area and
select “Format Cells” from the drop-down menu. Then select “Number” in the next menu and set
the number of decimal places to 1.
If you now click “OK”, the following will happen:
You may also obtain this using the “Decrease Decimal” option:
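Keep in mind that formatting a cell changes only how the number is displayed; the stored value keeps all its decimals. A small Python sketch of the same distinction, using a made-up mark of 23 out of an assumed total of 30:

```python
percentage = 23 / 30 * 100  # hypothetical mark and total; about 76.666...

# The displayed text is rounded to one decimal place...
displayed = f"{percentage:.1f}"
print(displayed)

# ...but the stored number itself is unchanged.
print(percentage)
```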
The next step is to change the ordered data set into a grouped data set. Note that the term “grouped data”
indicates that we subdivide the data into groups (classes) and then determine the frequency for each class.
We wish to obtain a statistical picture or view of the percentages achieved by the class of children.
In the first place, note that a percentage is a number which belongs to the closed interval [0; 100]. By
this we mean that a percentage may be zero, but not less than zero; it may also be 100, but not more
than 100; and all values between 0 and 100 are also possible percentage values. Now think of the
interval [0; 100] as a horizontal number line. We may divide this number line into several parts; one way to do
this is to divide it into five parts of equal width:
In this way we obtain five classes of equal class width (namely 20%). In order to determine the frequencies
associated with each class we must simply sort the percentages achieved into the classes and count how
many numbers end up in each class:
We should now be able to construct the following frequency distribution table by hand:
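The grouping and counting described above can also be sketched in a few lines of Python. The ten percentages below are made up for illustration and are not the lesson's data; the class labels follow the interval notation used in this lesson, with only the last class closed at 100.

```python
# Hypothetical percentages for ten children (NOT the lesson's data).
percentages = [68.0, 82.0, 54.0, 90.0, 20.0, 100.0, 35.5, 47.0, 73.0, 61.0]

CLASS_WIDTH = 20
labels = ["[0; 20)", "[20; 40)", "[40; 60)", "[60; 80)", "[80; 100]"]
frequencies = [0] * len(labels)

for p in percentages:
    # Floor division finds the class; exactly 100 is capped into the last,
    # closed class [80; 100].
    index = min(int(p // CLASS_WIDTH), len(labels) - 1)
    frequencies[index] += 1

for label, count in zip(labels, frequencies):
    print(f"{label:<10}{count}")
```

Note that exactly 20% lands in the second class and exactly 100% in the last class, and that the frequencies add up to the number of children.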
The entire idea, however, is to do the process described above (the grouping and counting) by means of
Microsoft Excel. In order to achieve this, we proceed as follows:
Create for yourself three additional columns to the right of the percentage column. Label these columns
“class interval”, “Bin” and “Frequency”:
You may format cells E6 to E10 as “Text” and cells F6 to G10 as “Number”, correct to two
decimal places.
Now we must fill in the “Bin” column. In order to do so, we must think of the cells F6 to F10 as bins
(containers). Always format the cells in the “Bin” column so that they display one decimal place more than
the data (percentages, in this case). In the “Bin” column we enter the upper value of the class interval.
Because the upper limit of each class is open, except for the last class (note the round brackets), the bin
value of each class should be just smaller than the number to the left of the round bracket in the class
interval notation. So we fill in the “Bin” column values as follows:
The “Bin” column must contain values which are just smaller than the “next” value, which should be sorted
into the class to the right of the current class. If the “Bin” value of the first class, for example, were 20, it
would mean that a percentage of exactly 20% would be sorted into that class. That is not what we want to
happen; 20% should be sorted into the second class. That is the meaning of the notation [0; 20): all values
smaller than 20% should be sorted into that class.
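To see why the bin values matter, the Python sketch below imitates the counting rule of Excel's FREQUENCY function: a value is counted in the first bin whose value is greater than or equal to it, and one extra slot at the end collects anything above the last bin. The function name `frequency`, the sample data and the bin values 19.99, 39.99 and so on are assumptions chosen for illustration.

```python
from bisect import bisect_left


def frequency(data, bins):
    """Count values per bin, imitating Excel's FREQUENCY rule.

    `bins` must be in ascending order. A value goes into the first bin
    whose upper value is >= the value; the final extra slot counts
    values larger than the last bin.
    """
    counts = [0] * (len(bins) + 1)
    for value in data:
        counts[bisect_left(bins, value)] += 1
    return counts


data = [19.9, 20.0, 39.9, 40.0, 100.0]  # hypothetical percentages

# With bins on the class boundaries, 20% and 40% fall into the wrong class.
print(frequency(data, [20, 40, 60, 80, 100]))

# With bins just below the boundaries, 20% and 40% move to the next class.
print(frequency(data, [19.99, 39.99, 59.99, 79.99, 100]))
```

With the first set of bins, exactly 20% is counted in the first class; with the second set it is counted in the second class, as the notation [0; 20) requires.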
We are now ready to employ the built-in “Frequency”-function in Excel in order to compute and display for
us the frequencies of the classes in cells G6 to G10.
Highlight the cells G6 to G10 (where you want to see the frequencies).
Type into the formula bar at the top of the spreadsheet: =FREQUENCY(
You will see that Excel now wants you to select (highlight) two ranges, namely the column where the
data appear (the percentages, that is cells D6 to D15) as well as the “Bin” column (cells F6 to F10). When
you are done selecting (highlighting) cells D6 to D15, type a comma and then select (highlight)
cells F6 to F10. Close the bracket in the formula bar and DO NOT PRESS ENTER.
In order to automatically display the frequencies (cells G6 to G10), first press and hold “CTRL” and “SHIFT” and
then, while you hold both keys in, hit the “Enter” key; then release all the keys. (If you do not follow
the instructions above, it will not work.)
Alternative method:
If you typed the “Frequency” formula in G6 without first selecting the cells up to G10, or if you forgot
to hold “CTRL” and “SHIFT” while you pressed the “Enter” key, you may proceed as follows to remedy the
situation: highlight G6 to G10, press F2, then press and hold “CTRL” and “SHIFT” and, while you hold both keys
in, hit the “Enter” key.
The next step is to generate a histogram from the frequency distribution table. In order to do so,
select (highlight) the class interval cells (cells E6 to E10), then press and hold CTRL and, at the same
time, select the frequency cells (G6 to G10). Release CTRL.
Select “Insert” in the top horizontal task bar and select the column chart.
You may choose any type of column graph in Excel; there is nothing wrong with the first choice at the top
left (experiment with the other types). Choose the first one presented. You should obtain:
In order to make the vertical columns touch (as they should in a histogram), right-click on any column and
select “Format Data Series”. Set the “Gap width” as zero.
In order to vary the colour of the columns, select “Fill” on the left-hand side and select “Vary colours by
point”:
To display the frequencies on each column, right-click on any column and select “Add Data Labels”.
Left-click anywhere on the graph to obtain the “DESIGN” option under “CHART TOOLS” in the top
horizontal task bar.
Select “Add Chart Element” at the left in the top horizontal task bar to format and annotate your graph, for
example to label the axes of the graph and to insert a title for the graph. You can also obtain interesting
effects using the “grid lines” options:
If you right-click on any white area inside the frame of the graph and select “Move Chart”, you
obtain an option to move the graph out of the worksheet and to place it as a large picture on its own sheet,
which you reach from the tabs at the bottom of the display.
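For readers who want to check a frequency table quickly without Excel, a rough plain-text version of the histogram can be sketched in Python. The helper name `render_histogram` and the frequencies below are assumptions for illustration; the bars are drawn sideways, touch one another (like a gap width of zero) and carry their frequency as a data label.

```python
def render_histogram(labels, frequencies):
    """Return a sideways text histogram: one row per class,
    one '#' per counted value, followed by the frequency as a label."""
    lines = []
    for label, count in zip(labels, frequencies):
        lines.append(f"{label:<10}|{'#' * count} {count}")
    return "\n".join(lines)


labels = ["[0; 20)", "[20; 40)", "[40; 60)", "[60; 80)", "[80; 100]"]
frequencies = [0, 2, 2, 3, 3]  # hypothetical frequencies, not the lesson's

print(render_histogram(labels, frequencies))
```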
It would now be relatively easy to generate a frequency polygon of the data as well. Such a representation
does not, however, provide any new insight into the situation and will not be assessed in this module.
Important:
Practice regularly with Excel and electronic mark books. It saves hours and enables you to present your
records in an accurate, neat and professional way.
Exercise 2.1 for self-assessment