Introduction
This user guide is designed to help you effectively use Microsoft Excel for generating and
interpreting several types of frequency distributions. While modern analytics tools abound,
pivot tables in Excel remain a powerful, straightforward means of producing key descriptive
statistics. Through step-by-step instructions and clear explanations, this guide will illustrate
how to create frequency distributions (simple, relative, cumulative, and two-way column
percent), how to use a histogram for grouped data, and how to construct a boxplot to assess
data shape and outliers.
All examples reference an imaginary Excel file named AssignmentData.xlsx, which you
should already have on hand. In practice, you will adapt the column names and data ranges
accordingly. By following these instructions, you will be able to explain your methods
formally, demonstrate a thorough understanding of Excel’s features, and present clean,
accurate results that adhere to academic standards.
This guide assumes a working knowledge of Excel’s basic interface, including navigating the
Ribbon, selecting data ranges, and understanding how to insert charts or pivot tables. If you
require additional support, you may consult standard Excel documentation or the resources
provided in Kingston University’s online materials.
Before diving into techniques, it is helpful to note the structure of the data in
AssignmentData.xlsx. Generally, this file includes numeric columns (such as “DataSet1”) and
categorical columns (such as “Gender” and “Preference”).
Ensure your dataset is arranged with proper headings in the first row and that the columns
containing numeric values have consistent data types. This setup allows Excel’s pivot table
feature to recognize the fields you wish to analyze.
• Drag “DataSet1” again into the Values area. This will typically default to
“Count of DataSet1.”
6. You will now see a table displaying each unique value of “DataSet1” (in the
Rows) and how many times it appears (in the Values).
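The count-of-values logic behind this pivot table can be checked outside Excel. A minimal Python sketch, assuming a small hypothetical sample of “DataSet1” values:

```python
from collections import Counter

# Hypothetical sample of values from the "DataSet1" column
data = [5, 3, 5, 7, 3, 5, 2]

# Counter reproduces what the pivot table shows:
# each unique value (Rows) and how many times it appears (Values)
freq = Counter(data)
print(sorted(freq.items()))  # [(2, 1), (3, 2), (5, 3), (7, 1)]
```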
If each row in the pivot table pairs a unique value (call it x_i) with its corresponding
frequency (f_i), you can estimate the mean by applying the following steps:
1. Insert a helper column next to the pivot table (not strictly inside the pivot
itself).
For example, if in one row the value “5” has a frequency of “4,” that row’s product is 5 ×
4 = 20. Do this for all unique values, sum the products, and divide by the sum of all
frequencies: mean = Σ(x_i × f_i) ÷ Σ f_i. If the pivot table covers every value in the
dataset, this reproduces the exact arithmetic mean.
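The helper-column calculation can be sketched in a few lines of Python, assuming hypothetical (value, frequency) pairs read off the pivot table:

```python
# Pairs of (unique value x_i, frequency f_i) as read from the pivot table
pairs = [(2, 1), (3, 2), (5, 4), (7, 1)]

total_count = sum(f for _, f in pairs)       # sum of all frequencies
weighted_sum = sum(x * f for x, f in pairs)  # sum of the x_i * f_i products
mean = weighted_sum / total_count            # (2 + 6 + 20 + 7) / 8
print(mean)  # 4.375
```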
A relative frequency distribution indicates the proportion (or percentage) that each value
contributes to the total count. To create one using the pivot table:
2. In the PivotTable Fields, place “DataSet1” in both the Rows and Values
areas.
3. Within the pivot table, click on the arrow next to “Count of DataSet1” in the
Values area. Then select Value Field Settings.
5. The table will now display the relative frequency (in percentage terms) of
each value.
1. Note each unique value x_i from the pivot table and its relative
frequency r_i, where r_i is expressed as a fraction of the total (not as a
percentage). If Excel shows percentages, convert them to decimals (e.g.,
25% → 0.25).
2. Compute Σ(x_i × r_i).
For instance, if “5” has a relative frequency of 0.20 (i.e., 20%), you multiply 5 × 0.20 = 1.
You add these products across all distinct values to get the mean.
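This weighted sum can be sketched directly, assuming hypothetical (value, relative frequency) pairs taken from the pivot table:

```python
# (value x_i, relative frequency r_i as a decimal fraction)
rel = [(2, 0.10), (3, 0.20), (5, 0.50), (7, 0.20)]

# Sanity check: the relative frequencies of all distinct values must sum to 1
assert abs(sum(r for _, r in rel) - 1.0) < 1e-9

# The mean is the sum of the x_i * r_i products
mean = sum(x * r for x, r in rel)
print(round(mean, 2))
```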
Excel’s pivot tables do not have a direct “cumulative frequency” option, but you can create it
manually:
1. Insert a pivot table with “DataSet1” in the Rows and Values areas, as before.
2. Right-click within the pivot table and select Sort → Sort Smallest to
Largest, ensuring your numeric values are in ascending order.
3. Next to the pivot table, create a new column labeled “Cumulative Frequency.”
4. Suppose the first count (in the pivot table) appears in cell B4. In cell C4 (the
first cell of the new column), enter =B4.
5. In cell C5, enter =C4 + B5. Drag this formula down to accumulate the
counts.
6. Each cell in column C now represents the total frequency from the first row up
to that row in the sorted list.
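The running total built by the =C4 + B5 formula is a cumulative sum. A minimal Python equivalent, assuming a hypothetical column of sorted counts:

```python
from itertools import accumulate

# Frequencies of each value, sorted ascending by value (column B in the guide)
counts = [1, 2, 4, 1]

# Running total: each entry is the sum of all counts up to that row (column C)
cumulative = list(accumulate(counts))
print(cumulative)  # [1, 3, 7, 8]
```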
• The median is the middle value when the data are sorted. If your total count is N,
the median is the ((N + 1)/2)-th observation (if N is odd) or the average of the
(N/2)-th and (N/2 + 1)-th observations (if N is even).
For instance, if you have 20 data points in total, you look for the 10th and 11th
observations. Find the first rows whose cumulative frequencies reach 10 and 11, and
average the corresponding “DataSet1” values to obtain the median (if both positions fall in
the same row, the median is simply that row’s value).
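The cumulative-frequency lookup described above can be sketched as follows, assuming a hypothetical sorted frequency table with N = 20 observations:

```python
import bisect
from itertools import accumulate

# (value, frequency) pairs sorted ascending by value, as in the cumulative table
pairs = [(2, 3), (4, 5), (5, 7), (8, 5)]
values = [x for x, _ in pairs]
cum = list(accumulate(f for _, f in pairs))  # cumulative frequencies [3, 8, 15, 20]
n = cum[-1]                                  # total count N = 20

def value_at(position):
    # Value of the first row whose cumulative frequency reaches `position`
    return values[bisect.bisect_left(cum, position)]

if n % 2 == 1:
    median = value_at((n + 1) // 2)
else:
    # Even N: average the (N/2)-th and (N/2 + 1)-th observations
    median = (value_at(n // 2) + value_at(n // 2 + 1)) / 2
print(median)
```

Here the 10th and 11th observations both fall in the row for value 5, so the median is 5.0.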
When dealing with two categorical variables (for example, “Gender” and “Preference”), a
two-way table helps you see how frequently each category pair occurs. A column percent
frequency distribution is particularly useful for comparing proportions within each column.
• Drag one of these fields (say, “CategoryB”) again into the Values area, so that
Excel generates a count of occurrences.
• The resulting table entries show the percentage each cell contributes within
its column.
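The column-percent calculation amounts to dividing each cell by its column total. A minimal sketch, assuming hypothetical counts with “Preference” categories as columns and “Gender” as rows:

```python
# Hypothetical two-way counts: outer keys are columns (Preference),
# inner keys are rows (Gender)
table = {
    "A": {"Female": 12, "Male": 8},
    "B": {"Female": 6,  "Male": 14},
}

# Column percents: each cell divided by its column total, times 100
column_percent = {}
for col, cells in table.items():
    total = sum(cells.values())
    column_percent[col] = {row: 100 * n / total for row, n in cells.items()}

print(column_percent["A"]["Female"])  # 60.0
```

Each column of percentages sums to 100, which is what makes within-column comparisons meaningful.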
A histogram, by definition, sorts data into intervals (“bins”) and plots the frequency for each
interval. You can replicate this in a table:
1. Identify the bins (intervals) used in the histogram (e.g., 0–4, 5–9, 10–14, etc.).
2. In a separate area of the worksheet, list these bin intervals as row labels.
3. Next to each bin interval, note the histogram’s frequency (i.e., how many
values fall into that range).
4. You have now formed a grouped frequency distribution, where each interval’s
midpoint can be used for estimation of certain statistics.
Use the formula for the approximate mean from grouped data: mean ≈ Σ(m_i × f_i) ÷ Σ f_i,
where m_i is the midpoint of bin i and f_i is its frequency.
This approximation works best when bin intervals are narrow and the data are relatively
evenly distributed within each bin.
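The grouped-mean approximation can be sketched as follows, assuming hypothetical bins and frequencies read off a histogram:

```python
# (bin lower bound, bin upper bound, frequency) rows taken from a histogram
bins = [(0, 4, 3), (5, 9, 7), (10, 14, 5)]

# Use each bin's midpoint m_i as a stand-in for the values inside it
weighted = sum(((lo + hi) / 2) * f for lo, hi, f in bins)  # sum of m_i * f_i
total = sum(f for _, _, f in bins)                         # sum of f_i
approx_mean = weighted / total
print(round(approx_mean, 2))
```

Here the midpoints are 2.0, 7.0, and 12.0, giving an approximate mean of 115/15 ≈ 7.67.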
A boxplot (or box-and-whisker chart) provides a visual summary of the minimum value, first
quartile (Q1), median (Q2), third quartile (Q3), and maximum, along with identification of
potential outliers. To create one in Excel:
2. First Quartile (Q1): The lower boundary of the box (25th percentile).
3. Third Quartile (Q3): The upper boundary of the box (75th percentile).
5. Outliers: Points that lie more than 1.5 × IQR (interquartile range) below Q1 or
above Q3. These appear as individual markers or dots.
• Right-Skewed (Positive Skew): The median is closer to the lower quartile, and
a longer tail extends to the right.
• Symmetric: The median line is roughly in the center of the box, with the
whiskers near equal length.
You can supplement this visual judgment by comparing the mean and median numerically. If
the mean is larger than the median, the distribution tends to be right-skewed; if the mean is
smaller than the median, the distribution is generally left-skewed.
Any data points beyond the whiskers are flagged by Excel as outliers. While these outliers
might be legitimate extreme values, they can also indicate potential measurement errors or
rare events. You should decide whether to investigate or remove them based on the context
and objectives of your analysis.
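The 1.5 × IQR rule that Excel applies can be sketched in Python. This uses the standard-library `statistics.quantiles` with its default exclusive method, which is similar (though not always identical) to the quartile method Excel's box-and-whisker chart uses; the data are hypothetical:

```python
import statistics

# Hypothetical numeric sample with one extreme value
data = [4, 5, 5, 6, 7, 7, 8, 9, 10, 25]

# Quartiles (exclusive method, akin to Excel's QUARTILE.EXC)
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Fences: anything beyond 1.5 * IQR from the box is flagged as an outlier
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower or x > upper]
print(outliers)  # [25]
```

The value 25 falls above the upper fence and would appear as an individual marker on the boxplot.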
When tasked with analyzing a numeric dataset in AssignmentData.xlsx, you can follow this
sequence to ensure thoroughness:
6. Histogram: Visualize the shape of numeric data in bins, then record the
grouped frequencies to refine your understanding of distribution.
7. Grouped Mean: If intervals are relevant, estimate the mean by using bin
midpoints and frequencies.
By adhering to these steps, you will produce a coherent analysis that addresses the core
requirements of most descriptive statistics assignments.
11. Conclusion
In this guide, we have demonstrated how to leverage Excel’s pivot tables, histogram tool,
and box-and-whisker plots to generate:
• Relative Frequency Distributions (to convey the proportion of each value and
estimate the mean through weighted proportions).
• Histograms (to visualize the distribution of numeric data and develop grouped
frequency tables).
These methods, while straightforward, cover the most critical descriptive statistics functions
you will likely need at an undergraduate level. Excel remains a flexible tool, allowing you to
illustrate key concepts such as central tendency, dispersion, skewness, and outlier analysis
without requiring advanced programming skills.