Test run

This user guide provides detailed instructions on using Microsoft Excel to create various types of frequency distributions, including simple, relative, cumulative, and two-way column percent distributions. It also covers how to generate histograms and boxplots to analyze data shape and identify outliers. The guide assumes a basic understanding of Excel and aims to help users present accurate statistical results in an academic context.

Uploaded by

Saraye Jaykishan
Copyright
© All Rights Reserved

1. Introduction

This user guide is designed to help you effectively use Microsoft Excel for generating and
interpreting several types of frequency distributions. While modern analytics tools abound,
pivot tables in Excel remain a powerful, straightforward means of producing key descriptive
statistics. Through step-by-step instructions and clear explanations, this guide will illustrate
how to create frequency distributions (simple, relative, cumulative, and two-way column
percent), how to use a histogram for grouped data, and how to construct a boxplot to assess
data shape and outliers.

All examples reference an imaginary Excel file named AssignmentData.xlsx, which you
should already have on hand. In practice, you will adapt the column names and data ranges
accordingly. By following these instructions, you will be able to explain your methods
formally, demonstrate a thorough understanding of Excel’s features, and present clean,
accurate results that adhere to academic standards.

2. Purpose and Scope

The main objectives of this guide are:

1. Frequency Distributions Using Pivot Tables

• Construct a frequency distribution for a single numeric dataset.

• Determine the modal value (the most frequently occurring number).

• Estimate the mean from the frequency distribution.

2. Relative Frequency Distribution

• Convert raw counts to proportions or percentages of the total.

• Estimate the mean using relative frequencies.

3. Cumulative Frequency Distribution

• Produce a running total of frequencies in ascending order.

• Locate the median by referencing the cumulative frequencies.

4. Two-Way Column Percent Frequency Distribution

• Display a contingency table (two-way table) of column percentages for two
categorical variables.

• Generate likelihood (probability) values based on those column percentages.

5. Histogram and Grouped Frequency Distribution

• Create a histogram in Excel for a numeric dataset.

• Use grouped intervals to construct a grouped frequency distribution.

• Estimate the mean from the grouped data.


6. Boxplot for Distribution Shape and Outliers

• Generate a box-and-whisker plot.

• Identify skewness (left-skewed, symmetric, or right-skewed).

• Determine whether there are any outliers in the data.

This guide assumes a working knowledge of Excel’s basic interface, including navigating the
Ribbon, selecting data ranges, and understanding how to insert charts or pivot tables. If you
require additional support, you may consult standard Excel documentation or the resources
provided in Kingston University’s online materials.

3. Brief Overview of the Data (AssignmentData.xlsx)

Before diving into techniques, it is helpful to note the structure of the data in
AssignmentData.xlsx. Generally, this file may include:

• Numeric variables (e.g., “DataSet1,” “DataSet2”), which are suitable for
frequency distributions, histograms, and boxplots.

• Categorical variables (e.g., “Gender,” “Category,” or “Location”), which are
essential for two-way tables and column percentages.

Ensure your dataset is arranged with proper headings in the first row and that the columns
containing numeric values have consistent data types. This setup allows Excel’s pivot table
feature to recognize the fields you wish to analyze.

4. Generating a Frequency Distribution Using a Pivot Table

4.1 Steps to Create the Frequency Distribution

1. Open AssignmentData.xlsx and locate the numeric column you want to
analyze (for instance, “DataSet1”).

2. Click any cell within the “DataSet1” column.

3. From the Ribbon, go to Insert → PivotTable. In the “Create
PivotTable” dialog box, confirm the selected range (which should include all data
in “DataSet1”).

4. Choose to place the pivot table on a New Worksheet or within an Existing
Worksheet, depending on your preference.

5. In the PivotTable Fields panel:

• Drag “DataSet1” into the Rows area.

• Drag “DataSet1” again into the Values area. This will typically default to
“Count of DataSet1.”

6. You will now see a table displaying each unique value of “DataSet1” (in the
Rows) and how many times it appears (in the Values).
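
For readers who want to cross-check the pivot table outside Excel, the same count can be sketched in a few lines of Python. The sample list below is hypothetical and simply stands in for the “DataSet1” column; it is not taken from AssignmentData.xlsx.

```python
from collections import Counter

# Hypothetical sample standing in for the "DataSet1" column.
data_set1 = [3, 5, 5, 2, 5, 3, 4, 5, 2, 3]


def frequency_table(values):
    """Return (value, count) pairs sorted by value, like the pivot table rows."""
    counts = Counter(values)
    return sorted(counts.items())


table = frequency_table(data_set1)
for value, count in table:
    print(f"{value}\t{count}")
```

Each printed row corresponds to one row of the pivot table: the unique value, then its count.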

4.2 Identifying the Modal Value


• The modal value is the most frequently occurring observation. Once the pivot
table is generated, look down the “Count of DataSet1” column. The row with the highest
count is your mode.
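
As a rough cross-check, the mode can also be read off programmatically. The sketch below uses a hypothetical sample and returns every value that shares the highest count, since more than one row can tie for the top frequency.

```python
from collections import Counter


def modal_values(values):
    """Return all values that share the highest frequency, sorted ascending."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical sample standing in for the "DataSet1" column.
data_set1 = [3, 5, 5, 2, 5, 3, 4, 5, 2, 3]
modes = modal_values(data_set1)
print(modes)
```

If the returned list holds more than one value, the data has several modes and all of them should be reported.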

4.3 Estimating the Mean from the Frequency Distribution

If each row of the pivot table holds a unique value (call it x_i) and its corresponding
frequency (f_i), you can calculate the mean by applying the following steps:

1. Insert a helper column next to the pivot table (not strictly inside the pivot
itself).

2. In this helper column, multiply each unique value by its frequency:

x_i \times f_i

3. Sum those products to obtain

\sum (x_i \times f_i).

4. Divide by the total number of data points \sum f_i.

For example, if in one row the value “5” has a frequency of “4,” that row’s product is 5 \times
4 = 20. You would do this for all unique values, sum those products, and then divide by the
sum of all frequencies. Because every individual value is listed (nothing has been grouped
into intervals), the result equals the ordinary arithmetic mean of the raw data.
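
The helper-column calculation can be sketched as follows. The (value, frequency) rows are hypothetical, and the function name is only illustrative.

```python
from statistics import mean

# Hypothetical pivot table rows: (unique value x_i, frequency f_i).
frequency_rows = [(2, 2), (3, 3), (4, 1), (5, 4)]


def mean_from_frequencies(rows):
    """Compute sum(x_i * f_i) / sum(f_i)."""
    total_products = sum(x * f for x, f in rows)
    total_count = sum(f for _, f in rows)
    return total_products / total_count


frequency_mean = mean_from_frequencies(frequency_rows)

# Rebuild the raw observations to confirm both routes agree.
raw_values = [x for x, f in frequency_rows for _ in range(f)]
raw_mean = mean(raw_values)
print(frequency_mean, raw_mean)
```

Expanding the table back into raw observations and averaging them directly gives the same figure, which is a convenient sanity check on the helper column.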

5. Generating a Relative Frequency Distribution

5.1 Converting Counts to Relative Frequencies

A relative frequency distribution indicates the proportion (or percentage) that each value
contributes to the total count. To create one using the pivot table:

1. Create a pivot table in the same manner as in Section 4.1.

2. In the PivotTable Fields, place “DataSet1” in both the Rows and Values
areas.

3. Within the pivot table, click on the arrow next to “Count of DataSet1” in the
Values area. Then select Value Field Settings.

4. Go to the Show Values As tab and choose % of Grand Total from the
drop-down list.

5. The table will now display the relative frequency (in percentage terms) of
each value.

5.2 Calculating the Mean from Relative Frequency

To calculate the mean using relative frequencies:

1. Note each unique value x_i from the pivot table and its relative
frequency r_i, where r_i is the fraction of the total occurrences (not just the
percentage). If Excel shows percentages, convert them to decimal form (e.g.,
25% → 0.25).
2. Compute \sum (x_i \times r_i).

3. The result is the mean of your distribution.

For instance, if “5” has a relative frequency of 0.20 (i.e., 20%), that value contributes
5 \times 0.20 = 1 to the mean. You add these products across all distinct values to get the mean.
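
A minimal sketch of the relative-frequency route, again using hypothetical (value, frequency) rows. Counts are converted to decimal fractions first, then the weighted sum is taken.

```python
# Hypothetical pivot table rows: (unique value x_i, frequency f_i).
frequency_rows = [(2, 2), (3, 3), (4, 1), (5, 4)]


def relative_frequencies(rows):
    """Return (value, r_i) pairs where r_i = f_i / N as a decimal fraction."""
    total = sum(f for _, f in rows)
    return [(x, f / total) for x, f in rows]


def mean_from_relative(rel_rows):
    """Compute sum(x_i * r_i)."""
    return sum(x * r for x, r in rel_rows)


rel_rows = relative_frequencies(frequency_rows)
relative_mean = mean_from_relative(rel_rows)
print(rel_rows, relative_mean)
```

The relative frequencies should sum to 1 (allowing for rounding), which is worth checking before trusting the weighted mean.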

6. Generating a Cumulative Frequency Distribution

6.1 Constructing the Pivot Table and Sorting

Excel’s pivot tables do not have a direct “cumulative frequency” option, but you can create it
manually:

1. Insert a pivot table with “DataSet1” in the Rows and Values areas, as before.

2. Right-click within the pivot table and select Sort → Sort Smallest to
Largest, ensuring your numeric values are in ascending order.

6.2 Building the Cumulative Frequency Column

1. Next to the pivot table, create a new column labeled “Cumulative Frequency.”

2. Suppose the first count (in the pivot table) appears in cell B4. In cell C4 (the
first cell of the new column), enter =B4.

3. In the cell C5, enter =C4 + B5. Drag this formula down to accumulate the
counts.

4. Each cell in column C now represents the total frequency from the first row up
to that row in the sorted list.

6.3 Finding the Median

• The median is the middle value when data is sorted. If your total count is N,
the median is the \frac{N+1}{2}-th observation (if N is odd) or the average of the
\frac{N}{2}-th and \left(\frac{N}{2} + 1\right)-th observations (if N is even).

• Examine your Cumulative Frequency column to find the first row where the cumulative
count meets or exceeds the position you need. The value in that row (or the average of
the values found for the two middle positions) is the median.

For instance, if you have 20 data points in total, you will look for the 10th and 11th
observations. The 10th observation lies in the first row whose cumulative frequency is at
least 10, and the 11th in the first row whose cumulative frequency is at least 11. If both
fall in the same row, that row’s “DataSet1” value is the median; otherwise, average the
two values.
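
The running total and the median lookup can be sketched like this. The rows are hypothetical, and the helper names are illustrative rather than anything Excel provides.

```python
from itertools import accumulate
from statistics import median

# Hypothetical pivot table rows, sorted ascending: (value, frequency).
frequency_rows = [(2, 2), (3, 3), (4, 1), (5, 4)]


def cumulative_frequencies(rows):
    """Running total of the frequency column, like =C4 + B5 dragged down."""
    return list(accumulate(f for _, f in rows))


def value_at_position(rows, position):
    """Value of the position-th observation (1-based) in the sorted data."""
    running = 0
    for value, count in rows:
        running += count
        if running >= position:
            return value
    raise ValueError("position exceeds the total count")


def median_from_frequencies(rows):
    rows = sorted(rows)
    n = sum(f for _, f in rows)
    if n % 2 == 1:
        return value_at_position(rows, (n + 1) // 2)
    lower = value_at_position(rows, n // 2)
    upper = value_at_position(rows, n // 2 + 1)
    return (lower + upper) / 2


cumulative = cumulative_frequencies(frequency_rows)
table_median = median_from_frequencies(frequency_rows)
raw_median = median([x for x, f in frequency_rows for _ in range(f)])
print(cumulative, table_median, raw_median)
```

Here N is 10, so the 5th and 6th observations are needed. They fall in different rows, so their values are averaged.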

7. Generating a Two-Way Column Percent Frequency Distribution

7.1 Purpose of a Two-Way Table

When dealing with two categorical variables (for example, “Gender” and “Preference”), a
two-way table helps you see how frequently each category pair occurs. A column percent
frequency distribution is particularly useful for comparing proportions within each column.

7.2 Steps to Create the Pivot Table


1. Identify two categorical fields, for instance “CategoryA” and “CategoryB.”

2. Highlight the entire data range (including both columns).

3. Go to Insert → PivotTable → choose the location for the new pivot
table.

4. In the PivotTable Fields pane:

• Drag “CategoryA” into the Columns area.

• Drag “CategoryB” into the Rows area.

• Drag one of these fields (say, “CategoryB”) again into the Values area, so that
Excel generates a count of occurrences.

7.3 Converting Counts to Column Percentages

• In the pivot table, click the drop-down arrow on “Count of
CategoryB” (located in the Values area). Select Value Field Settings → Show
Values As → % of Column Total.

• The resulting table entries show the percentage each cell contributes within
its column.

7.4 Generating Likelihood Values

• These column percentages can be interpreted as conditional probabilities. For
instance, if the entry in the pivot table for “CategoryA = A1” and “CategoryB = B2” is 30%,
that means:

“Given CategoryA is A1, the likelihood that CategoryB is B2 is 30%.”

• This approach is often used in contingency table analysis to explore
relationships between two categorical variables.
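
To make the column-percent logic concrete, here is a small sketch with hypothetical records. “CategoryA” plays the column role and “CategoryB” the row role, as in Section 7.2; the category labels and counts are invented for illustration.

```python
from collections import Counter

# Hypothetical (CategoryA, CategoryB) pairs, one per record.
records = (
    [("A1", "B1")] * 3 + [("A1", "B2")] * 2
    + [("A2", "B1")] * 1 + [("A2", "B2")] * 4
)


def column_percentages(pairs):
    """Percent of each CategoryA column that falls in each CategoryB row."""
    cell_counts = Counter(pairs)
    column_totals = Counter(a for a, _ in pairs)
    return {
        (a, b): 100 * count / column_totals[a]
        for (a, b), count in cell_counts.items()
    }


percentages = column_percentages(records)
for (a, b), pct in sorted(percentages.items()):
    print(f"Given CategoryA = {a}, likelihood of CategoryB = {b}: {pct:.0f}%")
```

Within each CategoryA column the percentages add up to 100, which is what distinguishes column percentages from percentages of the grand total.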

8. Generating a Histogram and Grouped Frequency Distribution

8.1 Creating a Histogram

For a numeric dataset (say “DataSet2”):

1. Highlight the entire “DataSet2” column (including the header).

2. In modern versions of Excel, go to Insert → Insert Statistic Chart
(look for the histogram icon) → select Histogram.

• Alternatively, if you have the Data Analysis ToolPak enabled
(accessible via Data → Data Analysis → Histogram), you can specify the input
range and bin ranges.

3. Excel will produce a histogram with automatically chosen bin widths. To
adjust these:

• Right-click on the horizontal axis of the histogram.


• Select Format Axis.

• Customize “Bin width” or “Number of bins” according to your preference.

8.2 Using the Histogram to Create a Grouped Frequency Distribution

A histogram, by definition, sorts data into intervals (“bins”) and plots the frequency for each
interval. You can replicate this in a table:

1. Identify the bins (intervals) used in the histogram (e.g., 0–4, 5–9, 10–14, etc.).

2. In a separate area of the worksheet, list these bin intervals as row labels.

3. Next to each bin interval, note the histogram’s frequency (i.e., how many
values fall into that range).

4. You have now formed a grouped frequency distribution, where each interval’s
midpoint can be used for estimation of certain statistics.
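
The tabulation step can be sketched as below, assuming integer data and equal-width bins labelled like 0–4, 5–9, 10–14. The sample values, the function name, and its parameters are all hypothetical. Excel’s own chart may assign values that sit exactly on a bin boundary differently, so edge counts can differ.

```python
# Hypothetical integer sample standing in for the "DataSet2" column.
data_set2 = [1, 3, 4, 6, 7, 7, 8, 11, 12, 14, 2, 9]


def grouped_frequencies(values, start, width, bin_count):
    """Return (lower, upper, count) per bin; bounds are inclusive integers.

    Values outside the covered range are ignored.
    """
    bins = []
    for k in range(bin_count):
        lower = start + k * width
        upper = lower + width - 1  # inclusive upper label for integer data
        count = sum(1 for v in values if lower <= v <= upper)
        bins.append((lower, upper, count))
    return bins


grouped = grouped_frequencies(data_set2, start=0, width=5, bin_count=3)
for lower, upper, count in grouped:
    print(f"{lower}-{upper}\t{count}")
```

The bin counts should add up to the number of observations; if they do not, some values lie outside the bins you listed.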

8.3 Estimating the Mean from the Grouped Frequency Distribution

Use the formula for the approximate mean from grouped data:

1. Find the midpoint of each bin:

\text{Midpoint of bin} = \frac{\text{Lower bound} + \text{Upper bound}}{2}

2. Multiply each midpoint by the frequency of that bin:

m_i \times f_i

3. Sum these products:

\sum (m_i \times f_i)

4. Divide by the total number of observations:

\frac{\sum (m_i \times f_i)}{\sum f_i}

This approximation works best when bin intervals are narrow and the data is relatively
evenly distributed within each bin.
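
The four steps can be sketched as a short function. The bins below are hypothetical (lower bound, upper bound, frequency) triples, not values from AssignmentData.xlsx.

```python
# Hypothetical grouped frequency distribution: (lower, upper, frequency).
grouped_rows = [(0, 4, 4), (5, 9, 5), (10, 14, 3)]


def grouped_mean(rows):
    """Approximate the mean as sum(m_i * f_i) / sum(f_i) using bin midpoints."""
    total_products = 0.0
    total_count = 0
    for lower, upper, frequency in rows:
        midpoint = (lower + upper) / 2
        total_products += midpoint * frequency
        total_count += frequency
    return total_products / total_count


estimate = grouped_mean(grouped_rows)
print(round(estimate, 3))
```

The midpoints here are 2, 7, and 12, so the estimate is (8 + 35 + 36) / 12. Because each observation is replaced by its bin midpoint, the figure will usually differ somewhat from the mean of the raw data.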

9. Creating a Boxplot and Assessing Distribution Shape

9.1 Steps to Generate the Box-and-Whisker Plot

A boxplot (or box-and-whisker chart) provides a visual summary of the minimum value, first
quartile (Q1), median (Q2), third quartile (Q3), and maximum, along with identification of
potential outliers. To create one in Excel:

1. Highlight the numeric data in your chosen column (e.g., “DataSet2”).

2. From the Ribbon, go to Insert → Insert Statistic Chart → Box and
Whisker.

3. Excel will automatically generate the five-number summary and display it in
the form of a boxplot.

9.2 Interpreting the Boxplot

1. Median (Q2): The line inside the box.

2. First Quartile (Q1): The lower boundary of the box (25th percentile).

3. Third Quartile (Q3): The upper boundary of the box (75th percentile).

4. Minimum and Maximum: The whiskers extend up to these points, provided
they are not designated as outliers.

5. Outliers: Points that lie more than 1.5 × IQR (interquartile range) below Q1 or
above Q3. These appear as individual markers or dots.

9.3 Describing the Shape of the Distribution

• Left-Skewed (Negative Skew): The median is often closer to the upper
quartile, and a longer tail extends to the left.

• Right-Skewed (Positive Skew): The median is closer to the lower quartile, and
a longer tail extends to the right.

• Symmetric: The median line is roughly in the center of the box, with the
whiskers near equal length.

You can supplement this visual judgment by comparing the mean and median numerically. If
the mean is larger than the median, the distribution tends to be right-skewed; if the mean is
smaller than the median, the distribution is generally left-skewed.
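
The mean-versus-median comparison can be sketched as a simple rule of thumb. The samples and the `tolerance` parameter are hypothetical, and the result is only a heuristic to read alongside the boxplot, not a formal skewness test.

```python
from statistics import mean, median


def skew_direction(values, tolerance=0.0):
    """Classify skew by comparing mean and median (a heuristic only)."""
    difference = mean(values) - median(values)
    if difference > tolerance:
        return "right-skewed"
    if difference < -tolerance:
        return "left-skewed"
    return "roughly symmetric"


# Hypothetical samples.
print(skew_direction([1, 2, 2, 3, 3, 3, 20]))
print(skew_direction([1, 18, 19, 19, 20, 20, 20]))
print(skew_direction([1, 2, 3, 4, 5]))
```

A non-zero `tolerance` can be passed when small differences between mean and median should still count as symmetric.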

9.4 Detecting Outliers

Any data points beyond the whiskers are flagged by Excel as outliers. While these outliers
might be legitimate extreme values, they can also indicate potential measurement errors or
rare events. You should decide whether to investigate or remove them based on the context
and objectives of your analysis.
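
The 1.5 × IQR rule from Section 9.2 can be sketched as follows, using a hypothetical sample. Quartile conventions vary between tools; this sketch uses the "exclusive" method of Python's `statistics.quantiles`, so the quartiles may differ slightly from those Excel draws on its chart.

```python
from statistics import quantiles


def iqr_outliers(values):
    """Return (Q1, Q3, outliers) using the 1.5 x IQR fences."""
    q1, _median, q3 = quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return q1, q3, outliers


# Hypothetical sample with one extreme value.
sample = [2, 3, 4, 5, 5, 6, 7, 8, 30]
q1, q3, outliers = iqr_outliers(sample)
print(q1, q3, outliers)
```

For this sample the quartiles are 3.5 and 7.5, giving fences at -2.5 and 13.5, so only the value 30 is flagged.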

10. Putting It All Together: Suggested Workflow

When tasked with analyzing a numeric dataset in AssignmentData.xlsx, you can follow this
sequence to ensure thoroughness:

1. Initial Overview: Quickly scan the data for completeness or anomalies.

2. Basic Frequency Distribution: Use a pivot table to generate raw frequencies,
find the mode, and calculate the mean.

3. Relative Frequency Distribution: Convert frequencies to percentages to see
the proportion of total occurrences.

4. Cumulative Frequency: Sort data and accumulate frequencies to identify
quartiles and the median.

5. Two-Way Analysis (if you have categorical variables that need
cross-tabulation): Build a two-way pivot table to examine column percentages and
likelihood values.

6. Histogram: Visualize the shape of numeric data in bins, then record the
grouped frequencies to refine your understanding of distribution.

7. Grouped Mean: If intervals are relevant, estimate the mean by using bin
midpoints and frequencies.

8. Boxplot: Identify medians, quartiles, skewness, and outliers to finalize your
narrative about the data’s overall structure.

By adhering to these steps, you will produce a coherent analysis that addresses the core
requirements of most descriptive statistics assignments.

11. Conclusion

In this guide, we have demonstrated how to leverage Excel’s pivot tables, histogram tool,
and box-and-whisker plots to generate:

• Simple Frequency Distributions (to identify modes and estimate means).

• Relative Frequency Distributions (to convey the proportion of each value and
estimate the mean through weighted proportions).

• Cumulative Frequency Distributions (to locate medians accurately).

• Two-Way Column Percent Frequency Tables (to explore relationships
between two categorical variables and compute likelihoods).

• Histograms (to visualize the distribution of numeric data and develop grouped
frequency tables).

• Boxplots (to summarize quartiles and detect skewness and outliers).

These methods, while straightforward, cover the most critical descriptive statistics functions
you will likely need at an undergraduate level. Excel remains a flexible tool, allowing you to
illustrate key concepts such as central tendency, dispersion, skewness, and outlier analysis
without requiring advanced programming skills.

By systematically applying each technique to the data in AssignmentData.xlsx, you ensure a
rigorous approach that can be adapted to a wide range of data sets and academic tasks.
Whether you are exploring a small sample or a large dataset, these pivot-table-driven
strategies offer clarity, reproducibility, and professional-standard outputs.
