Excel-Statistics-Manual For Physics
Advanced Excel Statistical Functions and Formulae
Introduction
Some key terminology and symbols
Data management
  Calculating a new value
  Recoding a variable
  Missing values
Descriptive measures
Measures of central tendency
  Calculating the Mean, Median or Mode using Excel functions
  Using formulae in cells to calculate descriptive statistical measures
  Measures of Dispersion
  Frequency
Measures of Association
  Correlation Coefficient
  Simple Linear Regression
  Trends
The Analysis ToolPak
  Perform an analysis of variance (ANOVA)
Learning more
Significance level (α)
The significance level of a statistical hypothesis test is a fixed probability of
wrongly rejecting the null hypothesis H0, if it is in fact true.
S
A test of Association that allows the comparison of two values in a sample of data
to determine if there is any relationship between them.
We can label column G Mean Results and then enter the following formula in cell
G2
=sum(D2,E2,F2)/3
and then copy the formula using the fill handle down to row 31. This will calculate
the average exam score for each pupil.
Missing values
Sometimes you will not have a recorded observation or score for some case of a
variable; that is, there will be missing values. In this case, you have to decide how
to manage these cases. Usual practice is either to choose a code to be input
whenever a missing value is encountered, or to impute a value for the
missing observations. Since Excel doesn’t have the sophisticated recoding methods
that specialist packages do, you will have to code missing values yourself
in such a way that your analysis can be carried out accurately.
Choose the codes for your missing values carefully. If you have numeric variables,
remember that there is no way to define a particular value as missing and thus
exclude it from calculations. Therefore, while you might be tempted to code a
missing age as 999, if you do this and then compute the mean age, Excel will include all
your 999-year-olds. It may be wise to use a string as the missing value, since strings
will normally be excluded from Excel’s calculations.
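The effect of the two coding choices can be checked with a small sketch outside Excel. The ages below are invented for illustration, and the string "NA" is a hypothetical missing-value code. The filter only mimics the way functions such as AVERAGE skip text cells in a range.

```python
# Hypothetical ages; two observations are missing in each list.
ages_coded_999 = [21, 25, 999, 30, 999, 24]      # missing coded as the number 999
ages_coded_text = [21, 25, "NA", 30, "NA", 24]   # missing coded as a string


def mean_of_numbers(values):
    """Average only the numeric entries, skipping any strings."""
    numbers = [v for v in values if isinstance(v, (int, float))]
    return sum(numbers) / len(numbers)


mean_with_999 = mean_of_numbers(ages_coded_999)    # the 999 codes are averaged in
mean_with_text = mean_of_numbers(ages_coded_text)  # the "NA" codes are skipped

print(mean_with_999)
print(mean_with_text)
```

With the numeric code, the two 999 entries inflate the mean far beyond any plausible age. With the string code, only the four recorded ages are averaged.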
Each of these can be accessed from the menu sequence Insert | Function, through
the function wizard, or by writing a formula in a cell.
Using the mouse, highlight the cells containing the data range just entered, or
select the data by first clicking the collapse icons.
These are the collapse icons and are
used in selecting ranges in many Excel
dialogues.
Notice that as you fill in the ranges, Excel previews the value that will result from
applying the function.
Click OK.
The value of the mean will now appear in the blank cell you selected in step 2.
Measures of Dispersion
Range
The range of a sample is the largest score minus the smallest score. This can be
calculated using the Excel formula
=MAX(A1:A10)-MIN(A1:A10)
Variance
The variance in a population is calculated as follows. We won’t build this equation
ourselves in Excel during this session but I give it here so that you can try it in your
own time.
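For reference, the standard definition of the population variance is sketched below. The symbols are assumptions taken from the surrounding text: X̄ is the mean of the observations and N is the number of observations.

```latex
\sigma^2 = \frac{\sum_{i=1}^{N} \left( X_i - \bar{X} \right)^2}{N}
```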
This formula depends upon first calculating the mean (X̄) and N, which we have
already seen above.
The Excel function to calculate the variance for a population is
varp(range)
And for a sample
var(range)
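The difference between the two functions is the divisor: the population variance divides the sum of squared deviations by N, the sample variance by N - 1. A minimal sketch with invented scores, cross-checked against Python's standard statistics module:

```python
import statistics

scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data

n = len(scores)
mean = sum(scores) / n
sum_sq_dev = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations

population_variance = sum_sq_dev / n      # divisor N, as for a population
sample_variance = sum_sq_dev / (n - 1)    # divisor N - 1, as for a sample

# The library functions should agree with the hand calculation.
assert abs(population_variance - statistics.pvariance(scores)) < 1e-9
assert abs(sample_variance - statistics.variance(scores)) < 1e-9

print(population_variance, sample_variance)
```

The sample value is always the larger of the two, and the gap shrinks as N grows.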
Frequency
Another useful statistical function is FREQUENCY. Given a set of data and a set of
intervals, FREQUENCY counts how many of the values in the data occur within each
interval. The data is called a data array and the interval set is called a bins array.
The format for the FREQUENCY function is:
FREQUENCY(DATA,BINS)
FREQUENCY is an array function. This means that the function returns a set of
values rather than just one value. To enter an array function, the range that the
array is to occupy must first be selected and the function must be entered by
pressing Shift+Ctrl+Enter instead of just Enter or using the mouse.
The following worksheet contains the examination results for 14 students. The
numbers in the column headed Score Below are the bins array.
Before keying in the function, you must select the range of the array for the result.
In this case it will be F8:F17.
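The counting rule can be sketched outside Excel as follows. The scores and bin limits are invented, not those of the worksheet. The rule assumed here is that each value is counted in the first bin whose limit is greater than or equal to it, with one extra count at the end for values above the highest bin.

```python
from bisect import bisect_left


def frequency(data, bins):
    """Count values per interval; a value equal to a bin limit falls in that bin.

    Returns len(bins) + 1 counts; the last one counts values above the highest bin.
    """
    limits = sorted(bins)
    counts = [0] * (len(limits) + 1)
    for value in data:
        # bisect_left gives the index of the first limit >= value
        counts[bisect_left(limits, value)] += 1
    return counts


# Hypothetical scores for 14 students and hypothetical upper bin limits.
scores = [35, 42, 48, 50, 55, 58, 61, 64, 67, 70, 73, 78, 85, 92]
bins = [40, 50, 60, 70, 80, 90]

result = frequency(scores, bins)
print(result)  # [1, 3, 2, 4, 2, 1, 1]
```

The result has one more entry than the bins array, which is why the output range selected in the worksheet may need one more cell than there are bins.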
You input the data and parameters for each analysis and Excel computes the
appropriate statistical measures or test results and displays the results in an output
table. Some tools generate charts in addition to output tables.
Before using an analysis tool, you must arrange the data you want to analyze in
columns or rows on your worksheet. This is your input range.
If the Data Analysis command is not on the Excel Tools menu, you need to install
the Analysis ToolPak:
1. On the Tools menu, click Add-Ins.
2. Select the Analysis ToolPak check box.
3. Install.
5 4 4 5
4 4 4 5
5 3 5 5
3 4 3 4
5 5 3 5
5 3 5 5
4 4 3 5
The null hypothesis (H0) is that there is no difference between the four groups being
compared. In this example, at a significance level of 5% (α = 0.05), since the
calculated value of F (3.23) is greater than Fcrit (3.01), we reject the null hypothesis
that the groups perform equally. A post-hoc comparison or individual pairwise
comparisons would have to be performed to determine which pair or pairs of
means caused rejection of the null hypothesis.
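The F value quoted above can be reproduced by hand. The sketch below assumes that each column of the data table is one group of seven observations; the column headings are not given, so this reading is an assumption. The critical value is not computed here and is simply taken from the text.

```python
# The four columns of the data table, one list per group (assumed layout).
groups = [
    [5, 4, 5, 3, 5, 5, 4],
    [4, 4, 3, 4, 5, 3, 4],
    [4, 4, 5, 3, 3, 5, 3],
    [5, 5, 5, 4, 5, 5, 5],
]


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-groups sum of squares: group size times squared distance
    # of each group mean from the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: squared distance of each value
    # from its own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


f_value, df_between, df_within = one_way_anova_f(groups)
f_crit = 3.01  # critical value quoted in the text, not computed here

print(round(f_value, 2), df_between, df_within)  # 3.23 3 24
print(f_value > f_crit)  # True, so H0 is rejected
```

Under this reading the sketch returns F = 3.23 with 3 and 24 degrees of freedom, which matches the value in the output table.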
As always, the null hypothesis (H0) is that there is no difference between the
groups being compared.
In this example, at a significance level of 5% (α = 0.05), the calculated value
of F (10.57) for the table rows (Orchard 1 vs. Orchard 2) is greater than Fcrit
(2.82), so the hypothesis that there is no difference between the orchards is
rejected.
However, the calculated value of F (0.48) for the table columns (Bait 1 vs. Bait 2)
is less than Fcrit (2.82), so the hypothesis that there is no difference between the
pheromone baits is not rejected.
Online learning
There is also a comprehensive range of online training available via
TheLearningZone at: www.ucl.ac.uk/elearning
Getting help
The following faculties have a dedicated Faculty Information Support Officer (FISO)
who works with faculty staff on one-to-one help as well as group training, and
general advice tailored to your subject discipline:
Arts and Humanities
The Bartlett
Engineering
Maths and Physical Sciences
Life Sciences
Social & Historical Sciences
See the faculty-based support section of the www.ucl.ac.uk/is/fiso Web page for more
details.