0% found this document useful (0 votes)
24 views24 pages

Excel-Statistics-Manual For Physics

xác suất thống kê trong excel

Uploaded by

Trang Trịnh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOC, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
24 views24 pages

Excel-Statistics-Manual For Physics

xác suất thống kê trong excel

Uploaded by

Trang Trịnh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOC, PDF, TXT or read online on Scribd
You are on page 1/ 24

UCL

EDUCATION & INFORMATION SUPPORT DIVISION


INFORMATION SYSTEMS

Advanced
Excel

Statistical
Functions and
Formulae

Document No. IS-113 v1


Contents

Introduction............................................................................................ 4
Some key terminology and symbols........................................................1
Data management...................................................................................2
Calculating a new value 2
Recoding a variable 3
Missing values 3
Descriptive measures..............................................................................4
Measures of central tendency.................................................................5
Calculating the Mean, Median or Mode using Excel functions 5
Using formulae in cells to calculate descriptive statistical measures 6
Measures of Dispersion 6
Frequency 7
Measures of Association.........................................................................8
Correlation Coefficient 8
Simple Linear Regression 8
Trends 9
The Analysis ToolPak............................................................................10
Perform an analysis of variance (ANOVA) 11
Learning more.......................................................................................16
Learning more.......................................................................................17

UCL Information Systems 05/01/2007


Introduction
This workbook has been prepared to help you to:
• Manage and code data for analysis in Excel including recoding, computing
new values and dealing with missing values;
• develop an understanding of Excel Statistical Functions;
• learn to write complex statistical formulae in Excel worksheets.
The course is aimed at those who have a good understanding of the basic use of
Excel and sound statistical understanding.
It is assumed that you have attended the Introduction to Excel Formulae &
Functions course or have a good working knowledge of all the topics covered
on that course. In particular, you should be able to do the following:
• Edit and copy formulae
• Use built-in functions such as Sum, Count, Average, SumIf, CountIf and
AutoSum
• Use absolute and relative cell referencing
• Name cells and ranges
You should also have some familiarity with basic statistical measures and tests.
If you are uncertain about the statistical knowledge assumed by the course you
may wish to use the list of key terminology and symbols to revise.
Excel has a number of useful statistical functions. We could classify these in
several ways but in this workbook we are going to look first at the functions
available through the function wizard and then those that require the Analysis
ToolPak. In order to achieve the goal of teaching you something both about
formulae and functions, we will use some of the simple measures as material to
explore best practice in building Excel formulae and then we will introduce the
equivalent built-in Excel functions which will allow us to check the accuracy of
our formulae. For the more complex functions we will find out how these are
used in Excel and how the results are presented. Finally we will build the
Pearson Correlation Coefficient as practise in building complex formulae. This
last exercise uses only general Excel techniques but illustrates good practise in
using Excel.
Some key terminology and symbols
(…)
Used to group operations in formulae. Do everything inside the brackets before
doing anything outside the brackets.
Median
The middle score in a sample. If there is an even number of scores the median
falls midway between the two middle scores.
Mean
The weighted average of the scores: the sum of all the scores divided by the
number of scores for a measure.
Mode
The most frequently occurring score for a measure.
Central tendency
The location of the middle of a distribution – roughly the average.
One Way Analysis of Variance (ANOVA)
The one way analysis of variance allows us to compare several groups of
observations, all of which are independent but possibly with a different mean for
each group. A test of great importance is whether or not all the means are equal.
P-Value
The probability value (p-value) of a statistical hypothesis test is the probability of
getting a value of the test statistic as extreme as or more extreme than that
observed by chance alone, if the null hypothesis H0, is true.
It is equal to the significance level of the test for which we would only just reject
the null hypothesis. The p-value is compared with the significance level and, if it
is smaller, the result is significant. That is, if the null hypothesis were to be
rejected at
= 0.05, this would be reported as 'p < 0.05'.
Range
The highest score for a measure minus the lowest score for a measure.
Regression Line
A regression line is a line drawn through the points on a scatter plot to
summarise the relationship between the variables being studied. When it slopes
down (from top left to bottom right), this indicates a negative or inverse
relationship between the variables; when it slopes up (from bottom right to top
left), a positive or direct relationship is indicated.
Score
The value recorded as the result of an observation or measurement.

Significance level ( )
The significance level of a statistical hypothesis test is a fixed probability of
wrongly rejecting the null hypothesis H0, if it is in fact true.
S

UCL Information Systems 1 Measures of Association


The variance – a measure of the dispersion or spread of scores around its
average.
σ2
Standard deviation – another measure of the dispersion of scores.

Sum of a series of values.
t-test
A test that compares for significant difference between means, either of paired
samples in a repeated measure test or between groups in the independent
samples test. The test assumes both a normal distribution and homogeneity of
variance.
X
The mean for a variable X.
 2

A test of Association that allows the comparison of two values in a sample of data
to determine if there is any relationship between them.

Measures of central tendency 2 UCL Information Systems


Data management
Although Excel doesn’t provide the sophisticated data coding techniques of a
specialist statistical application, there are useful methods for accomplishing some
common data management tasks.

Calculating a new value


Open the file results.sav. You will see the following data in the sheet:

We can label column G Mean Results and then enter the following formula in cell
G2
=sum(D2,E2,F2)/3
and then copy the formula using the fill handle down to row 31. This will calculate
the average exam score for each pupil.

UCL Information Systems 3 Measures of Association


Recoding a variable
Often analysis requires that we recode a variable. Sometimes this is
straightforwardly because we wish, for example, to change the designation of
gender as M or F to 1 or 2. On other occasions we wish to collapse a continuous
value variable into a categorical variable. In the latter case we should usually
recode into a new variable, ie non-destructively.
To recode a continuous into a categorical variable we will use the if function to
compute a new variable Gender in the results.xls spreadsheet that assigns each
pupil to the value M if the variable Sex has value 1 and the value F if Sex has the
value 2.
The general format of an IF statement is
If(logical_test,value_if_true,value_if_false)
In our example the formula should be this:
=IF(G2=1,”M”,”F”)
Be aware that we could have a nested IF statement and that if we do, our catch all,
default condition comes as the last argument of the nested IF. For example,
suppose that you wished to assign each of the pupils to a stream based on a mean
exam score. The formula would be
=IF(G2>60,"Top",IF(G2>50,"Middle","Bottom"))

Missing values
Sometimes you will not have a recorded observation or score for some case of a
variable - that is there will be missing values. In this case, you have to decide how
to manage these cases. Usual practise involves choosing a code to be input
whenever a missing value is encountered for some case or to impute a value for the
missing observations. Since Excel doesn’t have the sophisticated recoding methods
available that specialist packages do, you will have to code missing values yourself
in such a way that your analysis can be carried out accurately.
Choose the codes for your missing values carefully. If you have numeric variables,
remember that there is no way to define a particular value as missing and thus
exclude it from calculations. Therefore, while you might be tempted to code a
missing age as 999 if you do this and then compute mean age, Excel will include all
your 999 year olds. It may be wise to use a string as the missing value since strings
will normally be excluded from Excel’s calculations.

Measures of central tendency 4 UCL Information Systems


Descriptive measures
Below is a list of Excel functions used for descriptive statistical measures.

Function What it does

SUM(range) Adds a range of cells


(SUMIF(range,criteria,sum Adds cells from sum_range if the condition
_range) specified in criteria on range is met.

AVERAGE(range) Calculates the mean (arithmetic average) of a


range of cells

MEDIAN(range) Calculates the median value for a data set; half


the values in the data set are greater than the
median and half are less than the median

MAX(range) Returns the maximum value of a data set

MIN(range) Returns the minimum value of a data set

SMALL(range,k) Returns the kth smallest or kth largest value in a


LARGE(range,k) specified data range

COUNT(range) Counts the number of cells containing numbers


COUNTA(range) in a range
COUNTBLANK(range) Counts the number of non-blank cells within a
COUNTIF(range,value) range
Counts the number of blank cells within a range
Counts the number of cells in range that are the
same as value.

VAR(range) and Calculates the variance of a sample or an entire


VARP(range) population (VARP); equivalent to the square of
the standard deviation

STDEV(range) and Calculates the standard deviation of a sample or


STEVP(range) an entire population (STDEVP); the standard
deviation is a measure of how much values vary
from the mean.

Each of these can be accessed from the menu sequence Insert |Function or using
the function wizard or by writing a formula in a cell.

UCL Information Systems 5 Measures of Association


Measures of central tendency
The most common measures of central tendency are the mean, median and mode.

Calculating the Mean, Median or Mode using Excel functions


1. First, open a new spreadsheet.
2. Click on a blank cell where you will paste a function to calculate the mean,
median or mode.
3. Using the series fill function, enter the series of integer values 1 to 10 in cells A6
to A15.
4. Next click on the function wizard
button.
5. From the drop down list Or select a
category, select Statistical.
6. Click on Average to highlight it, then
on OK.

Using the mouse, I highlight the cells containing the data range just entered or you
can select data by first clicking the collapse icons.
These are the collapse icons and are
used in selecting ranges in many Excel
dialogues.

Excel previews the result of applying


the function here.

Notice that as you fill in the ranges Excel previews the value that will result from
applying the function
Click OK.
The value of the mean will now appear in the blank cell you selected in step 2.

Measures of central tendency 6 UCL Information Systems


To calculate the median or mode, follow the same procedure but highlight MEDIAN
or MODE in step 4. Alternately you can enter the formulae directly into spreadsheet
cells as shown below. All the statistical functions are accessed in the same way and
have a similar interface.
Using formulae in cells to calculate descriptive statistical measures
Mode
The syntax for this computation is
=Mode(Range)
Median
The syntax for this computation is
=Median(Range)
Mean
There is a built in Excel function that returns the mean as its value
=Average(Range)
It is often useful to put the result of this function into a suitably named cell in a
spreadsheet.

Measures of Dispersion
Range
The range of a sample is the largest score minus the smallest score. This can be
calculated using the Excel Formula
=(Max(A1:A10)-(Min(A1:A10)
Variance
The variance in a population is calculated as follows. We won’t build this equation
ourselves in Excel during this session but I give it here so that you can try it in your
own time.

gives the population variance and

gives the sample variance.

This formula depends upon first calculating X and N which we have already seen
above.
The Excel function to calculate the variance for a population is
varp(range)
And for a sample
var(range)

UCL Information Systems 7 Measures of Association


You can access both from the function wizard or use them by typing formulae in
cells.
Standard Deviation
The Standard Deviation is the square root of the deviance. You can calculate it with
the formula
=sqrt(var(range)) or by using the appropriate function, either
stdev(range)or stdevp(range).

Frequency
Another useful statistical function is FREQUENCY. Given a set a data and a set of
intervals, FREQUENCY counts how many of the values in the data occur within each
interval. The data is called a data array and the interval set is called a bins array.
The format for the FREQUENCY function is:

FREQUENCY(DATA,BINS)
FREQUENCY is an array function. This means that the function returns a set of
values rather than just one value. To enter an array function, the range that the
array is to occupy must first be selected and the function must be entered by
pressing Shift+Ctrl+Enter instead of just Enter or using the mouse.

The following worksheet contains the examination results for 14 students. The
numbers in the column headed Score Below is the bins array.
Before keying in the function, you must select the range of the array for the result.
In this case it will be F8:F17.

Measures of central tendency 8 UCL Information Systems


With this range selected, the following function is keyed into the Formula bar:
=FREQUENCY(C4:C17,E8:E17)
Press Shift+Ctrl+Enter.
The array is now filled with data. This data shows that no student scored below 30, 1
student scored between 30 and 39, 3 between 40 and 49, 1 between 50 and 59, 3
between 60 and 69, 1 between 70 and 79, 3 between 80 and 89, and 2 scored
between 90 and 100.
If any of the results are changed, the data in the No. In Range column will be
updated automatically.

UCL Information Systems 9 Measures of Association


Measures of Association
Correlation Coefficient
The Correlation Coefficient is calculated according to the following formula:

We would build a complicated formula like this in steps – incrementally - having


broken it down to its component parts, each of which could be written simply using
standard Excel features. We will construct this formula in the training session. For
now, you can use the built in Excel function:
Using an Excel function
=CORREL(A1:A15,B1:B15)
You will see when we do a later exercise, that the result from the hand built formula
is more than tolerably close to Excel’s result.

Simple Linear Regression


If the correlation coefficient indicates a sufficiently strong relation ship (direct or
inverse) between variables, you may wish to explore that relationship using
regression techniques.
Using an Excel function
Excel has three built in functions that give information about the line of best fit:
Slope(X_values, Y_values) and Linest(X_values, Y_values,Constant)
The Constant is TRUE or FALSE. If False then the Y intercept of the line is set to 0.
You must enter the formula as an array formula because it will return more than one
value. To create an array formula you select the cells in which you want the results
(ie the slope and intercept) to appear and press control-shift-enter and Excel will
enclose the formula in curly braces to signify that the result is an array of values.

Measures of central tendency 10 UCL Information Systems


The Analysis ToolPak
Microsoft Excel provides a set of data analysis tools - called the Analysis ToolPak -
that you can use to save huge amounts of time when you perform complex statistical
analyses. However, there is one catch (see WARNING below).

You input the data and parameters for each analysis and Excel computes the
appropriate statistical measures or test results and displays the results in an output
table. Some tools generate charts in addition to output tables.
Before using an analysis tool, you must arrange the data you want to analyze in
columns or rows on your worksheet. This is your input range.
If the Data Analysis command is not on the Excel Tools menu, you need to install
the Analysis ToolPak:
1. On the Tools menu, click Add-Ins.
2. Select the Analysis ToolPak check box.
3. Install.

To use the Analysis ToolPak:


1. On the Tools menu, click Data Analysis.
2. In the Analysis Tools box, click the tool you want to use.
3. Enter the input range and the output range, and then select the options you want:

Various statistical tools are available via the Analysis ToolPak

UCL Information Systems 11 Measures of Association


Perform an analysis of variance (ANOVA)
You can perform various types of ANOVA analysis using the Excel Analysis ToolPak.
As an example we look at the Analysis of Variance.
One-way ("Single Factor") ANOVA
This example asks whether or not four treatments (3 drugs and one control group)
have a differential effect on the subjects.
"Pain Score" for 3 Analgesics:

Aspirin: Paracetemol (Acetaminophen): Ibuprophen: Control (no drug):

5 4 4 5

4 4 4 5

5 3 5 5

3 4 3 4

5 5 3 5

5 3 5 5

4 4 3 5

If your data are entered in the worksheet as above, then you

The null hypothesis (H0) is that there is no difference between the four groups being
compared. In this example, with a significance level of 95% (a = 0.05), since the
calculated value of F (3.23) is greater than Fcrit (3.01), we reject the null hypothesis
that the three drugs perform equally. A post-hoc comparison or individual pairwise
comparisons would have to be be performed to determine which pair or pairs of
means caused rejection of the null hypothesis.

Measures of central tendency 12 UCL Information Systems


Two-way ANOVA ("Two-factor without replication")

Apple codling moth (Cydia pomonella) caught in pheromone traps:


Bait 1: Bait 2:
19 20
22 22
19 18
Orchard 1:
18 19
20 19
21 20
22 21
19 19
19 18
Orchard 2:
18 18
20 20
21 22

As always, the null hypothesis (H0) is that there is no difference between the
groups being compared.

In this example, with a significance level of 95% (a = 0.05), the calculated value
of F (10.57) for the table rows (Orchard 1 vs. Orchard 2) is greater than Fcrit
(2.82), so the hypothesis that there is no difference between the orchards is
rejected.

However, the calculated value of F (0.48) for the table columns (Bait 1 vs. Bait 2)
is less than Fcrit (2.82), so the hypothesis that there is no difference between the
pheromone baits is accepted.

UCL Information Systems 13 Measures of Association


:The Analysis ToolPak also contains the following tools
 Correlation analysis tool
 Covariance analysis tool
 Descriptive Statistics analysis tool
 Exponential Smoothing analysis tool
 Fourier Analysis tool
 F-Test: Two-Sample for Variances analysis tool
 Histogram analysis tool
 Moving Average analysis tool
 Perform a t-Test analysis
 Random Number Generation analysis tool
 Rank and Percentile analysis tool
 Regression analysis tool
 Sampling analysis tool
 z-Test: Two Sample for Means analysis tool

Measures of central tendency 14 UCL Information Systems


UCL Information Systems 15 Measures of Association
Measures of central tendency 16 UCL Information Systems
Learning more
Central IT training
Information Systems runs courses for UCL staff, and publishes documents for staff
and students to accompany this workbook as detailed below:
Getting started with This 3hr course is for those who are new to
Excel spreadsheets or to Excel, and wish to explore the
basic features of spreadsheet design. Note that it
does not cover formulae and functions.
Getting more from This 3hr course is for users of Excel who wish to
Excel (no formulae or learn more about the non-mathematical features of
functions) Excel and to work more efficiently.
Using Excel to manage This 3hr course is for those already familiar with
lists Excel and would like to use some of its basic data-
handling functions.
Excel formulae & This 3hr course is aimed at introducing users, who
functions are already familiar with the Excel environment, to
formulae and functions.
More Excel formulae & This 3.5hr course is aimed at competent Excel users
functions who are already familiar with basic functions and
would like to know what else Excel can do and try
some more complex IF statements.
Advanced formulae & This 3.5hr course is aimed at competent Excel users
functions who are already familiar with basic functions. It
aims to introduce you to functions from several
different categories so that you are equipped to try
out other functions on your own.
Excel statistical This course aims to introduce you to built-in Excel
functions statistical functions and those in the analysis tool
pack. The course covers major descriptive,
parametric and non-parametric measures and tests.
Excel statistical This course covers best practise in constructing
formulae complex statistical formulae in spreadsheets using
common statistical measures as example material.
Excel tricks and tips This is a 2hr interactive demonstration of popular
Excel shortcuts. It aims to help you find quicker
ways of doing everyday tasks. This fast-paced course
is also a good all-round revision course for
experienced Excel users.
Pivot tables Pivot tables allow you to organise and summarise
large amounts of data by filtering and rotating
headings around your data. This 2 hr course also
shows you how to create pivot charts.

UCL Information Systems 17 Measures of Association


Advanced Excel – Data This course aims to help you learn to use some less
analysis tools common Excel features to analyse your data.
Advanced Excel – Would you like to customise and automate Excel to
Setting up & perform tasks you do regularly? If you are an
automating Excel experienced user of Excel, then this course is for
you.
Advanced Excel – Do you share workbooks with others? Would you
Importing data and like to see who has updated what? Do you know how
sharing workbooks to import data from text files or databases? This
course aims to show you how.

These workbooks are available for students at the Help Desk.

Open Learning Centre


 The Open Learning Centre is open every afternoon for those who wish to obtain
training on specific features in Excel on an individual or small group basis. For general
help or advice, call in any afternoon between 12:30pm – 5:30pm Monday – Thursday, or
12:30pm – 4:00pm Friday.
 If you want help with specific advanced features of Excel you will need to book a session
in advance at: www.ucl.ac.uk/is/olc/bookspecial.htm
 Sessions will last for up to an hour, or possibly longer, depending on availability. Please
let us know your previous levels of experience, and what areas you would like to cover,
when arranging to attend.
 See the OLC Web pages for more details at: www.ucl.ac.uk/is/olc

Online learning
There is also a comprehensive range of online training available via
TheLearningZone at: www.ucl.ac.uk/elearning

Getting help
The following faculties have a dedicated Faculty Information Support Officer (FISO)
who works with faculty staff on one-to-one help as well as group training, and
general advice tailored to your subject discipline:
 Arts and Humanities
 The Bartlett
 Engineering
 Maths and Physical Sciences
 Life Sciences
 Social & Historical Sciences
See the faculty-based support section of the www.ucl.ac.uk/is/fiso Web page for more
details.

Measures of central tendency 18 UCL Information Systems


A Web search using a search engine such as Google (www.google.co.uk) can also
retrieve helpful Web pages. For example, a search for "Excel tutorial” would return
a useful selection of tutorials.

UCL Information Systems 19 Measures of Association

You might also like