Excel Statistical Formulas
Terms
You accept that this product is intended for your use, and you will not
duplicate in any form or manner, electronic or otherwise, copies of this
product nor distribute this product to anyone else.
You recognize that the product and its content are the sole property of the
Publisher, and that we have copyrighted the product.
You agree that the Publisher is not responsible for any interruption of
service or malfunction that is a consequence of the Internet, a service
provider, personal computer, browser or other software or hardware
components. You accept that there is no guarantee that this product is
totally error free. You further understand and accept that the Publisher
intends to provide reliable information but does not guarantee the accuracy
or completeness of any information, and is not responsible for any results
obtained from the use of such information.
Contents
Introduction This sheet
Ex.1 Analyzing Single-Variable Data Sets using Microsoft Excel
Ex.2 Statistical Analysis using the Summary Statistics Tool
Ex.3 Data Representation - Histograms
Ex.4 Analyzing Two-Variable Data Sets using Correlation
Ex.5 Data Representation - Scatter Diagrams
Ex.6 Simple Forecasting - Adding Trendlines to your Scatter Diagrams
Overview
The Random House College Dictionary defines statistics as the science that deals with the
collection, classification, analysis, and interpretation of information or data. In business we
use statistical analysis to reveal such trends as the number of employees working in
high-tech companies compared to banking or consulting. One might use this data to
determine if the supply of available workers will meet demand. Often the data we analyze are
selected from a larger set of data whose characteristics we want to know something about.
For example, we might collect the number of job openings at a high-tech company as
compared to a bank and a consulting company. The companies surveyed are part of a
sample. By analyzing the data from these sample companies we hope to draw conclusions
about the larger population of all high-tech, banking, and consulting companies.
This brief explanation leads us to break the study of statistics into two broad categories:
descriptive statistics and inferential statistics.
Descriptive statistics utilize numbers and graphs to look for patterns in a data set, summarize
the data, and present the data in a convenient form.
Inferential statistics utilize sample data to help make estimates, decisions, predictions, or
other generalizations about a larger set of data.
This workbook will help introduce several fundamental statistical concepts and provide you
with hands-on experience using the powerful statistical analysis tools built into Microsoft
Excel.
Mean (average)
Median (median)
Mode (mode)
Range (min, max)
Variance (var)
Standard deviation (stdev)
Summary statistics (Data analysis add-in)
Histogram (Data analysis add-in)
Correlation (Data analysis add-in)
Scatter diagram (Chart wizard)
Trendline (Chart wizard)
Introduction to Statistics Using Excel INTRODUCTION
Directions
You may want to print these directions as a reference guide for this tool.
Introduction to Statistics Using Excel is a self-instructional workbook (tutorial) that introduces the user
to ten Microsoft Excel statistical analysis tools and their corresponding statistical concepts. Each
exercise is self-contained, but the workbook is designed to be completed in order from Exercise 1 to
Exercise 6.
Note: you may want to print the entire workbook before you begin so that you can refer to it
as you work through the Excel-based exercises.
HBS Menu
Show Calculator: Launches Windows calculator
Show/Hide Celltips: Toggles in/out red Celltips in documented cells
Print Sheet with Celltips: Prints Celltip documentation on current sheet
Set Zoom: Provides quick access to 80%, 100%, and 125% zoom levels
Visit Web Links: Links to HBS Toolkit website, Toolkit Glossary, and Toolkit
Feedback, as well as HBS and HBS Publishing web sites
About HBS Toolkit: Launches the about box for the HBS Toolkit
Jon B. DeFriese MBA '00 developed this software under the supervision of Professor Frances X.
Frei as the basis for class discussion rather than to illustrate either the effective or ineffective
handling of an administrative situation.
This exercise demonstrates how to analyze sets of data that contain one variable, in this case
the selling price of residential real estate.
Statistics: The study of ways to collect, describe, draw conclusions from, and make projections from data
Population: A group of objects about which information is to be gained
Sample: A subset of a population used to gain information about the whole population
Measures of Central Tendency: Summary measures that describe the center of a data set
Mean: The sum of the data divided by the number of data points in the data set (the average)
Median: The middle number when the data set is arranged in ascending (or descending) order
Mode: The most frequently occurring number in the data set
Measures of Variability: Summary measures that describe the spread of a data set
Range: The largest number in the data set minus the smallest number in the data set
Sample Variance: The sum of the squared distances of each data point from the mean, divided
by the number of data points minus one (consult the Excel Help file for the equation)
Sample Standard Deviation: The positive square root of the sample variance
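As a reference for the definitions above, the mean, sample variance, and sample standard deviation can be written out as follows. The symbols are assumptions introduced here for illustration: x_i is the i-th data point, n is the number of data points, x-bar is the mean, and s is the sample standard deviation.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
\qquad
s = \sqrt{s^2}
```

Dividing by n - 1 rather than n is what makes these the sample (rather than population) measures described in the definitions.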
1. To determine the Mean (the statistical average) of the data set follow these steps:
Formulas:
Mean (Average): =AVERAGE(G66:G76)
Median: =MEDIAN(G66:G76)
Mode: =MODE(G66:G76)
Range: =MAX(G66:G76)-MIN(G66:G76)
Variance: =VAR(G66:G76)
Standard Deviation: =STDEV(G66:G76)
Data Set: Selling Price
$109,360
$137,980
$131,230
$130,230
$125,410
$124,370
$109,360
$139,030
$140,160
$144,220
$154,190
Now follow the same steps, substituting the appropriate Excel function in place of the AVERAGE function.
Check your answers below.
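If you would like to cross-check your Excel answers outside the workbook, the six measures can be sketched in Python with the standard library's statistics module. This is only an illustrative check, not part of the workbook; the function name single_variable_summary is an assumption made for this example, and the data are the eleven selling prices listed above.

```python
import statistics

# Selling prices from the Exercise 1 data set (cells G66:G76 in the workbook).
selling_prices = [
    109360, 137980, 131230, 130230, 125410, 124370,
    109360, 139030, 140160, 144220, 154190,
]


def single_variable_summary(data):
    """Return the six Exercise 1 measures for a list of numbers.

    The function name is hypothetical; each entry notes the Excel
    function it is meant to mirror.
    """
    return {
        "mean": statistics.mean(data),          # AVERAGE
        "median": statistics.median(data),      # MEDIAN
        "mode": statistics.mode(data),          # MODE
        "range": max(data) - min(data),         # MAX - MIN
        "variance": statistics.variance(data),  # VAR (sample, divides by n - 1)
        "std_dev": statistics.stdev(data),      # STDEV (sample, divides by n - 1)
    }


summary = single_variable_summary(selling_prices)
for name, value in summary.items():
    print(f"{name:>8}: {value:,.2f}")
```

Note that statistics.variance and statistics.stdev compute the sample versions, which is why they correspond to VAR and STDEV rather than to the population functions.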
EXERCISE 1: SINGLE-VARIABLE DATA SET ANALYSIS
Maximum: $154,190 =MAX(G66:G76)
Minimum: $109,360 =MIN(G66:G76)
This concludes Exercise 1: Analyzing Single-variable data sets using Microsoft Excel.
Exercise 2 demonstrates how to use the Excel tool Summary Statistics to combine these steps into one command.
This exercise continues with the same data set but introduces a tool that allows you to
quickly calculate all of the individual measures previously introduced (and several others)
using a single Excel tool called Summary Statistics.
Mean: The sum of the data divided by the number of data points in the data set (the average)
Standard Error: The standard error of the mean of the sample
Median: The middle number when the data set is arranged in ascending (or descending) order
Mode: The most frequently occurring number in the data set
Standard Deviation: The positive square root of the sample variance
Sample Variance: The sum of the squared distances of each data point from the mean, divided
by the number of data points minus one (consult the Excel Help file for the equation)
Kurtosis: The relative peakedness or flatness of a distribution compared with the normal distribution
Skewness: The degree of asymmetry of a distribution around its mean
Range: The largest number in the data set minus the smallest number in the data set
Minimum: The smallest number in the data set
Maximum: The largest number in the data set
Sum: The data points added together
Count: The number of data points
Note: Summary Statistics requires the Excel Data Analysis Add-In. If Data Analysis is not
available under the Tools menu, you will need to install the Analysis ToolPak. Under Tools,
click Add-Ins..., select the Analysis ToolPak and then click OK. If the Analysis ToolPak is
already checked, uncheck the box, click OK, and then repeat this procedure.
To analyze the data table using Summary Statistics, follow these steps:
Step 1: Click Tools on the Menu Bar and select Data Analysis
Step 2: Select Descriptive Statistics and click OK
Step 3: With your cursor in the Input Range cell, use your mouse to highlight the
data in the Selling Price column, including the label
Step 4: Select the Columns option in the Grouped By section and check Labels in First Row
Step 5: Under Output Options, place your cursor in the Output Range cell and use your
mouse to select the labeled output cell
Step 6: Check Summary Statistics and click OK
Data Set: Selling Price
$109,360
$137,980
$131,230
$130,230
$125,410
$124,370
$109,360
$139,030
$140,160
$144,220
$154,190
Your table should match the one at the bottom of this page.
EXERCISE 2: SUMMARY STATISTICS
Output cell:
Mean 131412.727272727
Standard Error 4178.0896223707
Median 131230
Mode 109360
Standard Deviation 13857.1556178814
Sample Variance 192020761.818182
Kurtosis -0.2688724255
Skewness -0.3079635474
Range 44830
Minimum 109360
Maximum 154190
Sum 1445540
Count 11
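The table above can be reproduced outside Excel as a rough cross-check. The sketch below is an illustration, not the tool's actual implementation: the function name descriptive_statistics is hypothetical, and the skewness and kurtosis lines assume the sample-adjusted formulas documented for Excel's SKEW and KURT worksheet functions.

```python
import math
import statistics

# Selling prices from the Exercise 2 data set.
selling_prices = [
    109360, 137980, 131230, 130230, 125410, 124370,
    109360, 139030, 140160, 144220, 154190,
]


def descriptive_statistics(data):
    """Return the measures listed in the Summary Statistics table.

    Requires at least four data points, because the kurtosis
    formula divides by (n - 3).
    """
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)           # sample standard deviation
    z = [(x - mean) / s for x in data]   # standardized deviations

    # Assumed to match Excel's SKEW and KURT (sample-adjusted forms).
    skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurtosis = (
        n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    )

    return {
        "Mean": mean,
        "Standard Error": s / math.sqrt(n),
        "Median": statistics.median(data),
        "Mode": statistics.mode(data),
        "Standard Deviation": s,
        "Sample Variance": statistics.variance(data),
        "Kurtosis": kurtosis,
        "Skewness": skewness,
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": n,
    }


table = descriptive_statistics(selling_prices)
for name, value in table.items():
    print(f"{name:<20}{value}")
```

The standard error is simply the sample standard deviation divided by the square root of the count, which is why it shrinks as more data points are added.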
This concludes Exercise 2: Statistical Analysis using the Summary Statistics Tool
Exercise 3 demonstrates how to graphically represent your data using Excel to create a histogram.
This exercise introduces a powerful tool which allows you to graphically represent
your data set in addition to analyzing its summary statistics. In this example we use
the same selling price information to construct a graphical representation of the data
called a histogram.
To generate a histogram, we must first define a range of selling price categories (called
Bins) so the histogram can assign each value to the appropriate category.
To do this, we have added a column next to Selling Price and labeled it Bin Range. The
Bin Range is the equally spaced set of categories into which we want to sort each data point.
To analyze the data and create the Histogram we will use the Histogram Tool which
is part of the Data Analysis Add-In.
Note: The Histogram Tool requires the Excel Data Analysis Add-In. If Data Analysis is not
available under the Tools menu, you will need to install the Analysis ToolPak. Under Tools,
click Add-Ins..., select the Analysis ToolPak and then click OK. If the Analysis ToolPak is
already checked, uncheck the box, click OK, and then repeat this procedure.
Step 1: Click Tools on the Menu Bar and select Data Analysis
Step 2: Select Histogram and click OK
Step 3: With your cursor in the Input Range cell, use your mouse to highlight the
data in the Selling Price column, including the label
Step 4: With your cursor in the Bin Range cell, use your mouse to highlight the
data in the Bin Range column, including the label
Step 5: Check the Labels check box
Step 6: Under Output Options, place your cursor in the Output Range cell and use your
mouse to select the labeled output cell
Step 7: Check the Chart Output check box and click OK
Your Histogram and output table should match the one at the bottom of this page.
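To see what the Histogram Tool does with the Bin Range, here is a minimal Python sketch of the counting step. It assumes each bin value acts as an inclusive upper limit and that values above the last bin fall into a final "More" category. The bin limits below are hypothetical placeholders chosen for illustration (they are not the workbook's Bin Range values), and the function name histogram_counts is likewise an assumption.

```python
from bisect import bisect_left

# Selling prices from the exercise data set.
selling_prices = [
    109360, 137980, 131230, 130230, 125410, 124370,
    109360, 139030, 140160, 144220, 154190,
]

# HYPOTHETICAL bin upper limits, equally spaced, for illustration only.
bin_limits = [110000, 120000, 130000, 140000, 150000]


def histogram_counts(data, limits):
    """Count how many values fall into each bin.

    A value belongs to the first bin whose limit is greater than or
    equal to it; values above every limit go into a final "More" slot.
    """
    limits = sorted(limits)
    counts = [0] * (len(limits) + 1)  # one extra slot for "More"
    for value in data:
        # bisect_left returns the index of the first limit >= value.
        counts[bisect_left(limits, value)] += 1
    return counts


counts = histogram_counts(selling_prices, bin_limits)
labels = [str(limit) for limit in bin_limits] + ["More"]
for label, count in zip(labels, counts):
    print(f"{label:>8}  {count}")
```

The frequencies always add up to the number of data points, which is a quick way to confirm that no value was left out of the table.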
EXERCISE 3: DATA REPRESENTATION - HISTOGRAMS
Output cell
*Note: You may need to resize the Histogram in order to see the y axis values.
This can be done by clicking on the Histogram and dragging one of the points at the
corner with the left mouse button held down.
Answer 8: [Histogram chart; y axis: Frequency]
This exercise presents a data set with two residential real-estate variables: selling price and size in square feet.
The correlation coefficient, a summary statistic, is often used to indicate the degree to which
two variables (x and y) are related (more specifically, the degree to which they are linearly related).
The correlation coefficient is represented by the letter r. A value of r near 0 implies little or
no relationship between x and y. An r value of 1 implies a perfect positive relationship
between x and y. The closer the value is to 1, the stronger the correlation. An r value of -1
implies a perfect negative relationship between x and y. The closer the value of r is to -1 the
stronger the negative correlation.
An example of positive correlation: as the number of rainy days goes up (monsoon season),
raincoat sales go up.
An example of negative correlation: as the number of rainy days goes down (during a drought),
bathing suit sales go up.
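For reference, the correlation coefficient described above is commonly written as shown below. The notation is an assumption introduced here: x_i and y_i are the paired observations, and x-bar and y-bar are their means.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator is positive when x and y tend to move in the same direction and negative when they move in opposite directions; the denominator scales the result so that r always lies between -1 and 1.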
To perform correlation analysis we will use the Correlation Tool which is part of the Data Analysis Add-In.
Note: The Correlation Tool requires the Excel Data Analysis Add-In. If Data Analysis is not
available under the Tools menu, you will need to install the Analysis ToolPak. Under Tools,
click Add-Ins..., select the Analysis ToolPak and then click OK. If the Analysis ToolPak is
already checked, uncheck the box, click OK, and then repeat this procedure.
Step 1: Click Tools on the Menu Bar and select Data Analysis
Step 2: Select Correlation and click OK
Step 3: With your cursor in the Input Range cell, use your mouse to highlight the
data in the Square Feet and Selling Price columns, including the labels
Step 4: Select the Columns option in the Grouped By section and check Labels in First Row
Step 5: Under Output Options, select Output Range and place your cursor in the
Output Range cell
Step 6: Use your mouse to select the labeled output cell and click OK
Output cell
The r value is the value in the Square Feet row and the Selling Price column, in this case 0.8758699.
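The calculation behind an r value can be sketched in a few lines of Python. This is an illustration only: the function name correlation_coefficient is an assumption, and the square-feet and price pairs below are HYPOTHETICAL values invented for the example. They are not the workbook's data set, so the result will not equal the r value quoted above.

```python
import math


def correlation_coefficient(xs, ys):
    """Return the correlation coefficient r for paired lists xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must be the same length, with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL data for illustration only (not the workbook's table).
square_feet = [1500, 1800, 2100, 2400, 2600]
selling_price = [110000, 128000, 139000, 150000, 171000]

r = correlation_coefficient(square_feet, selling_price)
print(f"r = {r:.4f}")
```

Swapping the two lists gives the same r, which is why the Correlation Tool's output shows the same value whichever variable is listed first.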
EXERCISE 4: TWO-VARIABLE DATA SET ANALYSIS - CORRELATION
This exercise presents a data set with two residential real-estate variables: selling price and
size in square feet.
A Scatter Diagram (also called a scatter plot, scatter chart, or scattergram) plots the
points in a data set so that any relationship between two variables, such as an
approximate straight-line relationship, becomes visible. In Scatter
Diagrams the horizontal axis (the x axis) is labeled with one variable (in our example
we use Square Feet) and the vertical axis (the y axis) is labeled with the other variable
(in this case Selling Price). For each observation, a point is plotted whose
coordinates are that observation's values on both x and y.
Step 1: Select Chart Wizard from the top toolbar (or select Chart from the Insert menu)
Step 2: Select the XY (scatter) chart type and click the Next button
Step 3: Place your cursor in the Data Range cell
Step 4: Highlight the entire contents of the table, including labels, and click the Next button
Step 5: Use the default values to show the legend at the right of the graph
Note: You can label the X and Y axes if you wish
Step 6: Select the option to place the chart as an Object in Ex.5 and click the Finish button
Your Scatter Diagram should match the one at the bottom of this page.
EXERCISE 5: DATA REPRESENTATION - SCATTER DIAGRAMS
Note: You may need to click on the chart and drag it into this space.
Answer 10: [Scatter diagram: Selling Price on the y axis ($0 to $250,000) against Square Feet on the x axis (1400 to 2800)]
This exercise will walk you through adding a forward-looking and a backward-looking trendline to the scatter
diagram we created in Exercise 5.
Trendlines
Now that you have identified the straight-line relationship between the x and y points in the
data set, you may want to extrapolate to estimate values above or below
the end-points of your scatter diagram. Trendlines are used to analyze problems of
prediction. You can extend a trendline in a chart forward or backward beyond the actual data
to show a trend. For example, since the maximum house size for which we have data is
2600 square feet, to forecast the price at 3000 square feet we will extend the trendline
forward by 400 units. We might also be interested in what a 1000 square foot house would
sell for based on our sample data; since the minimum house size in the data is 1500 square
feet, we will extend the trendline backward by 500 units.
Note: Although it is beyond the scope of this workbook, the Add Trendline feature uses a
concept known as regression to add the trendline (also known as a regression line) and
extend it beyond the points for which we have data in our data set. For more information
about regression and trendlines, consult the Introduction to Regression Using Excel
Workbook that is part of the HBS Toolkit.
To add a trendline to the scatter diagram you created in Exercise 5, follow these steps (a copy of the graph is
located below):
Step 1: Use the right mouse button to click on any data point in the graph
Step 2: Select Add Trendline from the menu
Step 3: Under the Type tab select Linear
Step 4: Under the Options tab, in the Forecast section, place your cursor in the Forward box
Step 5: Enter 400 units (3000 sq. ft. - 2600 sq. ft.)
Step 6: Now place your cursor in the Backward box
Step 7: Enter 500 units (1500 sq. ft. - 1000 sq. ft.) and click OK
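A linear trendline can be approximated outside Excel with an ordinary least-squares fit, sketched below in Python. This is an illustration under stated assumptions rather than Excel's own implementation: the function names fit_trendline and forecast are hypothetical, and the data pairs are HYPOTHETICAL values invented for the example, not the workbook's data set.

```python
def fit_trendline(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must be the same length, with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(slope, intercept, x):
    """Read the trendline's value at x, inside or beyond the data range."""
    return intercept + slope * x


# HYPOTHETICAL data for illustration only (not the workbook's table).
square_feet = [1500, 1800, 2100, 2400, 2600]
selling_price = [110000, 128000, 139000, 150000, 171000]

slope, intercept = fit_trendline(square_feet, selling_price)

# Forward 400 units past 2600 sq. ft. and backward 500 units below 1500 sq. ft.
print(f"Forecast at 3000 sq. ft.: {forecast(slope, intercept, 3000):,.0f}")
print(f"Forecast at 1000 sq. ft.: {forecast(slope, intercept, 1000):,.0f}")
```

Forecasts read from the extended line become less trustworthy the further they lie from the observed data, since nothing in the sample confirms that the straight-line relationship continues there.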
[Chart: copy of the Exercise 5 scatter diagram, Selling Price (y axis) against Square Feet (x axis)]
EXERCISE 6: SIMPLE FORECASTING - TRENDLINES
Answer 11: [Chart: the Exercise 5 scatter diagram with a linear trendline added]
This concludes Exercise 6: Simple Forecasting - Adding Trendlines to your Scatter Diagrams
This concludes the Introduction to Statistics Using Excel Workbook