Using Excel For Statistics
November 2000
University of Reading
Statistical Services Centre
Biometrics Advisory and
Support Service to DFID
Contents
1. Introduction
2. Adding to Excel
3. Conclusions
Tips

Whenever possible use Lists to keep your data.

Use “names” to refer to each column of data.

Keep column names short; some statistical packages have
problems reading names longer than 8 characters.

Do not mix data with analysis or plots in the same
worksheet.

If you use Excel 97 or a later version, become familiar with
the facilities available for data entry under the Data menu, in
particular Form and Validation.

If you need to enter character data:
(1) Keep them aligned to the left
(2) Do not enter blanks as the initial characters of a cell

Use numerical codes for any well defined classification
variable, e.g. Gender: 0 = Female, 1 = Male.

Use the VLOOKUP function in combination with numerical
codes to display text values attached to the numbers.

Filters can be used to restrict attention to subsets of the data.

Sorting facilities work well for up to 3 sorting criteria.

Become familiar with the use of relative and absolute
references.

Warnings

Be aware that Excel only handles dates after
1st January 1900.
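
The tip on numerical codes and VLOOKUP amounts to keeping a small table of codes and labels, and reading off the label for each stored code. The Python sketch below illustrates the idea outside Excel. The coding is the one given in the tip; the names GENDER_LABELS and label_for, and the sample codes, are hypothetical.

```python
# Lookup table: numerical code -> text label (coding taken from the tip).
GENDER_LABELS = {0: "Female", 1: "Male"}


def label_for(code, table, missing="(unknown code)"):
    """Return the label for a code, much like an exact-match VLOOKUP."""
    return table.get(code, missing)


# Illustrative data only; 2 is deliberately not a defined code.
codes = [0, 1, 1, 0, 2]
labels = [label_for(c, GENDER_LABELS) for c in codes]
print(labels)
```

In Excel itself the corresponding formula would have the form =VLOOKUP(A2,$E$1:$F$2,2,FALSE), where the cell addresses are illustrative and the absolute reference keeps the lookup table fixed when the formula is copied down.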
Tips

Pivot tables are one of the most powerful data summary tools
in Excel. They produce cross-tabulations based on data kept in a
list, a database or other pivot tables.

Pivot tables are also useful for reorganising data as well as
for providing summaries.
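
To make clear what a cross-tabulation of a list involves, here is a minimal Python sketch that counts records for each combination of two factors. It is an illustration of the idea, not of how Excel builds a pivot table. The records are invented, and the function name cross_tabulate is hypothetical.

```python
from collections import Counter

# Illustrative list: one record per row, with two factor columns.
records = [
    {"GENDER": "Female", "GROUP": "A"},
    {"GENDER": "Female", "GROUP": "B"},
    {"GENDER": "Male", "GROUP": "A"},
    {"GENDER": "Male", "GROUP": "A"},
    {"GENDER": "Female", "GROUP": "A"},
]


def cross_tabulate(rows, row_field, col_field):
    """Count records for every combination of row and column levels."""
    counts = Counter((r[row_field], r[col_field]) for r in rows)
    row_levels = sorted({r[row_field] for r in rows})
    col_levels = sorted({r[col_field] for r in rows})
    # Counter returns 0 for combinations that never occur.
    return {
        rl: {cl: counts[(rl, cl)] for cl in col_levels}
        for rl in row_levels
    }


table = cross_tabulate(records, "GENDER", "GROUP")
for level, cells in table.items():
    print(level, cells)
```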
Tips

The hypothesis tests for the differences of means, and for the
variances, available from the Analysis ToolPak, work well.

Warnings

We recommend against the overuse of statistical tests for one
and two-sample problems. Confidence intervals are also
useful. Excel gives the components from which you can
calculate the intervals if you know the formulae, but it would
be better if Excel gave the intervals directly.
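
As an illustration of calculating an interval from its components, the sketch below uses the usual one-sample formula, mean ± t × s/√n. The critical value t is passed in rather than computed; it can be read from tables or obtained in Excel with the TINV worksheet function, for example TINV(0.05, 9) for a 95% interval with 9 degrees of freedom. The function name confidence_interval and the numbers are hypothetical.

```python
import math


def confidence_interval(mean, sd, n, t_crit):
    """Two-sided interval for a mean: mean +/- t_crit * sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    half_width = t_crit * standard_error
    return mean - half_width, mean + half_width


# Illustrative values: n = 10, mean 5.0, standard deviation 2.0.
# 2.262 is the two-sided 95% t value for 9 degrees of freedom.
low, high = confidence_interval(5.0, 2.0, 10, 2.262)
print(round(low, 2), round(high, 2))
```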
Tips

If you need to perform analysis of variance, avoid using Excel,
unless you are dealing with extremely simple problems.

Warnings

Except for Single Factor Analysis, Excel only works if the
number of replications is equal for all treatments (balanced
data).

Does not allow missing values.

Lacks flexibility in the model fitted.

Encourages bad practice for data storage.

Requires extra work if data have been stored appropriately.

Uses incorrect names for the analysis it performs.

Lacks diagnostic tools.

Gives the impression that it is possible to use Excel for
Analysis of Variance when in fact its capabilities are very
limited. It is a very restrictive approach to analysing data,
which is not only unnecessary but also undesirable.
Tips

Before fitting a regression line, plot your data.

The Regression tool works correctly for the estimation of
regression coefficients, their standard errors and the Analysis
of Variance for data sets without missing values and when the
intercept is included in the model.

Warnings

Do not move data points on a scatter plot. Excel will change
your original values to the new position of your point!

Ignore the ANOVA and regression statistics when using the
regression tool for regression through the origin. They are
wrong.
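
For regression through the origin, a standard textbook formulation takes the slope as Σxy/Σx² and partitions the uncorrected total sum of squares Σy² into regression and residual parts. The sketch below follows that formulation so that output can be checked by hand. It is not a description of what Excel computes internally; the data and the function name fit_through_origin are hypothetical.

```python
def fit_through_origin(x, y):
    """Least-squares fit of y = slope * x, with no intercept term."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    slope = sxy / sxx
    # With no intercept, the total sum of squares is NOT corrected
    # for the mean: it is simply the sum of y squared.
    total_ss = sum(yi * yi for yi in y)
    regression_ss = slope * sxy
    residual_ss = total_ss - regression_ss
    residual_df = len(x) - 1  # one parameter (the slope) estimated
    return slope, regression_ss, residual_ss, residual_df


# Illustrative data only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
slope, regression_ss, residual_ss, residual_df = fit_through_origin(x, y)
print(slope, regression_ss, residual_ss, residual_df)
```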
Warning
The fields used for defining structure should normally be factors, i.e. discrete, categorical
variables (numeric, character or other types). Using a measurement variable could produce
a large table of nonsense.
The body of the table, labelled DATA, contains the variable(s) that you want to
summarise in the table. The data fields will usually be numeric, but other data types
are allowed, depending on what you want to summarise.
Adding a field
For this and certain other operations, it is best to use the tools on the PivotTable
toolbar:
The first button on the toolbar gets you back to the PivotTable wizard. You can then
add (or remove) fields in the same way that you constructed the table.
In this example, let us add a breakdown by GROUP to the table.
First click on any cell in the table (if you do not, you will be creating a new table).
Then click the PivotTable wizard button.
Drag the field GROUP to the PAGE space and click the Finish button.
The second button on the PivotTable toolbar is used for editing field specifications.
The particular dialog box used for modifying a field depends on whether it is a DATA
field or a structure field (ROW, COLUMN or PAGE).
First, to make changes to a field in the table structure (i.e. ROWS, COLUMNS or
PAGES), click on either the field name (e.g. GENDER), or one of its labels (e.g.
Male or Female).
Click the PivotTable Field button on the toolbar.
You should get the following dialog box:
Choose the PivotTable drop-down menu from the toolbar, and open the Options…
dialog.
Warning
The entire case (row) corresponding to an empty cell in a data field will be ignored
in the table. Check for empty cells in the data field before using it.
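
The check for empty cells can be sketched as follows in Python, assuming the list has been read into one record per row. The field name YIELD, the sample records and the function name rows_with_empty_cells are hypothetical.

```python
def rows_with_empty_cells(rows, field):
    """Return 1-based positions of rows where the field is empty."""
    return [
        position
        for position, row in enumerate(rows, start=1)
        if row.get(field) in (None, "")
    ]


# Illustrative list; rows 2 and 4 have no value in the data field.
records = [
    {"ID": 1, "YIELD": 4.2},
    {"ID": 2, "YIELD": None},
    {"ID": 3, "YIELD": 0},  # a genuine zero is not an empty cell
    {"ID": 4, "YIELD": ""},
]

empty_rows = rows_with_empty_cells(records, "YIELD")
print(empty_rows)
```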
Tables of Percentages
It is often more informative to present table counts as percentages. These are usually
row or column percentages, but other percentage bases are sometimes required.
To continue with Example 2, suppose we want row percentages instead of absolute
counts.
Open the PivotTable Field dialog box for the data field.
Select % of row. Click on the Number button and select percentage format with
0 decimals. Click OK.
To produce a table with both counts and row percentages, place two copies of the
ID field in the DATA area of the table, one with the Count statistic, the other with
“% of row”…
Here, both of the DATA fields are the ID variable, the first set up as a simple count
and the second as a row percentage.
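
The arithmetic behind row percentages is simple: each count is divided by its row total. A minimal Python sketch, using invented counts and the hypothetical function name row_percentages:

```python
def row_percentages(counts):
    """Convert a table of counts to percentages of each row total."""
    result = {}
    for row_label, cells in counts.items():
        row_total = sum(cells.values())
        result[row_label] = {
            col_label: (100.0 * value / row_total if row_total else 0.0)
            for col_label, value in cells.items()
        }
    return result


# Illustrative counts only: rows are GENDER, columns are GROUP.
counts = {
    "Female": {"A": 2, "B": 6},
    "Male": {"A": 3, "B": 1},
}

percentages = row_percentages(counts)
for row_label, cells in percentages.items():
    print(row_label, cells)
```

Each row of the result sums to 100, which is a quick way to confirm that the row base, not the column base, was used.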
Formatting Tables
To format the numbers in the cells of a PivotTable, use the PivotTable Field dialog
box, as before.
Although many standard Excel formatting techniques can be applied directly to a
table, certain things cannot be done. For example, try changing the title “Grand
Total”.
To have maximum formatting flexibility, make a copy of the entire table using
Paste Special, Paste Values. The copy can be formatted like any other Excel
range.
are listed as …
Warning
A new worksheet is produced for each listing that you request in this way.