An Introduction To Excel-2019

Document introducing you to excel

Uploaded by

Joel Thompson

MONASH

SCIENCE

Introduction to MS Excel ©
for

basic statistical techniques

School of Mathematical Sciences



CONTENTS

THE VERY BASIC BASICS
1. ENTERING AND MODIFYING INFORMATION
   a) Formulae in Excel
2. USING DATA ANALYSIS TOOLPAK
   2.1 Loading Data Analysis in MS Excel ® 2007 and above
   2.2 Features in Data Analysis
   2.3 To Obtain Descriptive Statistics
   2.4 To Obtain a Histogram
   2.5 Drawing Box-Plots
   2.6 Generating Random Numbers
   2.7 Scatterplot and line of best fit (Regression)
3. EXCEL FUNCTIONS FOR DISTRIBUTIONS
4. EXCEL and INFERENCE
   4.1 Confidence Interval on a Single Mean
   4.2 Difference between Two Means (t-tests)
   4.3 ANOVA (Analysis of Variance) - Single Factor
   4.4 Chi-Squared Test of Independence
5. NAMING CELLS and RANGES
6. FURTHER RESOURCES

This edition prepared by Dr Dianne Atkinson 2019.


School of Mathematical Sciences, Monash University, Clayton, Vic. 3800

The goal of this brief manual is to help you get started in Microsoft Excel 2016 or above
for basic computations, data summary and data analysis. Excel will be a powerful tool for
you in the preparation of reports in all disciplines in science.
Note that this manual is not a comprehensive guide to Excel. Its main purpose is to give you
background in the features used in statistics, with which you have probably had little or no
experience. You will need to refer to it throughout this course.
THE VERY BASIC BASICS
When you first open MS Excel, a new file, called a workbook, is displayed on your
screen. A workbook consists of one or more sheets. In Excel 2016 a new workbook contains
one worksheet by default (older versions opened with three), shown as a tab labelled
Sheet1 near the bottom of the screen. You can add, delete or copy worksheets as needed by
right-clicking on a tab to bring up a menu. Using separate sheets in a workbook can
be helpful to organise your work when you have many sections and related information, all
within the one file. Sheets can also be renamed by double-clicking on the tab and typing.
A spreadsheet is an array of cells organised in rows and columns with an alphanumeric
system:
• Rows are numbered 1, 2, 3, … to 1,048,576 and
• Columns are labelled alphabetically: A, B, C, …, X, Y, Z, AA, AB, AC, … to XFD
(16,384 columns in total).
Each cell is identified by the column and row that intersect at its location. Thus D3 is the
cell reference for the cell in the fourth column and the third row.
There is an alternative cell reference style called R1C1 in which the columns are also
designated by numbers, e.g. R3C4 for the cell at row 3 and column 4, which is D3. If your
computer defaults to this setting it can be changed:
• Go to File > Options > Formulas (in Office 2007, use the Office symbol in the top
left-hand corner > Excel Options > Formulas) and untick the “R1C1 reference style” box.

1. ENTERING AND MODIFYING INFORMATION


Three types of information can be entered into a cell: labels, values and formulae. Press the
<Enter> key when you have typed the information into each cell, or use the arrow keys to
move to another cell. Cells can be formatted to represent different types of data via
Format>Cell on the top toolbar; e.g. choosing the currency number style will put in the $
sign. More on formatting later.
• Labels are character strings that are typically used for headings or comments.
  Example: Growth of investment
• Values are numbers such as 1.21, $6.75 or 5%, entered as 1.21, 6.75 or 0.05 into
  cells formatted for number, currency or percentage, respectively.
• Formulae are mathematical expressions that use the values or formulae in other
  cells to create new values.
  Examples:
  =B3     =A7+1     =B7*(1+B4)     =SUM(A7:A12)

a) Formulae in Excel
All formulas must begin with an equals sign = and are entered directly into the cell. This
can be either by hand using certain operators or codes as described below or by using in-
built functions. A formula may contain constants or use cell references containing values
(see above examples) to calculate the final result.
Operators

Arithmetic operator    Description       Example    Result
+ (plus sign)          Addition          =3+3       6
- (minus sign)         Subtraction       =3-1       2
* (asterisk)           Multiplication    =3*3       9
/ (forward slash)      Division          =3/3       1
^ (caret)              Exponentiation    =2^3       8

The Paste Function Wizard on the toolbar is a library of in-built functions so that you
do not have to know all of Excel’s shorthand. Investigate menus, particularly the
“Statistical” one. Some Common Functions with their formulae built-in to Excel are:

Built-in Function    Description                                           Example
=SQRT                Square root                                           =SQRT(A6)
=LOG                 Log in base 10                                        =LOG(B1)
=LN                  Natural log                                           =LN(A1)
=EXP                 Exponential                                           =EXP(A1)
=SUM                 Sum                                                   =SUM(D1:D23)
=AVERAGE             Average                                               =AVERAGE(D1:D23)
=STDEV.S             SAMPLE standard deviation                             =STDEV.S(D1:D23)
=QUARTILE.EXC        Quartile, e.g. first quartile Q1, excluding median    =QUARTILE.EXC(D1:D23,1)
=SUMSQ               Sum of squares                                        =SUMSQ(D1:D23)

You can also use the function wizard to learn about specific functions. Each function is
described briefly there, and more detail is available via Help (?).

b) Writing Formulae
Exercise 1: Value of an Investment
• Open a new spreadsheet and enter the information in the diagram for annually
  compounded interest.
  (Don’t worry if the columns do not appear wide enough. You can change the column width
  easily by putting the cursor on the line between column labels A and B at the top of
  the sheet and dragging it to the desired width.)

The formula =B3 will show the value 1000 when you press Enter.

• To change the appearance of numbers to reflect currency and percentages, use the
  “Number” section on the toolbar of the “Home” tab. Select the cell of interest and
  press the $ or the % button as required.

• The 0 and 1 in cells A7 and A8 of the Year column show the pattern of counting in 1s.
  Highlight these two cells (A7 and A8) and use the “Fill Handle” described in NOTE 1
  below to copy this pattern down the column to A57 (50 years).

NOTE 1: Copying a cell


Use the mouse to position the cursor on the bottom right corner of the highlighted cell
or cells. The cursor changes from a block cross to a thin cross (+), which is the Fill
Handle. Hold down the left mouse button and drag down.

• Now select cell B8 and enter the formula =B7*(1+B4)

Cell B8 should contain the number 1050, with the above formula displayed in the formula
bar when B8 is the active cell.
We now want to copy the formula in B8 into the balance cells for each later year: B9 to B57.
Both references in cell B8 are relative, so if we were to copy this formula to B9, the
formula in B9 would be
=B8*(1+B5).

The problem here is that cell B5 does not contain the interest rate. We need to make
part of this reference absolute by placing a dollar sign, $, in front of the part of the cell
reference that would otherwise change. When copying down a column, this is the row
number. The formula can then be copied without changing the reference to the constant
value.

Select B8, click into the formula bar at the top and edit the formula to
=B7*(1+B$4). The value 1050 will remain in cell B8. ($B$4 would also be OK.)

• Select cell B8 and copy down to cell B57 using the “fill handle” (+) as before.

Your spreadsheet should now look like this:

NOTE 2: Relative versus Absolute Referencing


Relative references: When you create a formula, references to cells or ranges are usually
used to find values already in the spreadsheet. If you are in C4 and wish to use the value in
cell B3 in the formula, Excel interprets this as “find the value one cell to the left (C→B)
and one row up (4→3) and use it here”. This is a relative reference. When you copy a
formula that uses relative references, Excel automatically adjusts the references in the
pasted formula. So copying down the column by 1 cell would change this to =B4. Copying
across the row by 1 cell would change the formula to =C3, etc.

Absolute references: If you don't want Excel to adjust references when you copy a
formula to a different cell, use an absolute reference or a name. You can create an absolute
reference by placing a dollar sign ($) in front of the row number for copying down a column,
in front of the column letter for copying across a row, OR in front of both.

To see the power of a spreadsheet, we need only change the Principal or Interest Rate, and
the spreadsheet automatically calculates the new values.
• Change the interest rate to 10% and observe the changes
• Change the principal to $5000 and observe the changes
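
For readers who want to check the spreadsheet independently, here is a minimal Python sketch of the same calculation. It assumes the values implied by the exercise (principal 1000 in B3 and an interest rate of 5% in B4); the names `PRINCIPAL`, `RATE` and `balances` are illustrative only and are not part of Excel.

```python
# Sketch of the Exercise 1 spreadsheet: balance after each year of
# annually compounded interest.
PRINCIPAL = 1000.0   # plays the role of cell B3
RATE = 0.05          # plays the role of cell B4 (the fixed reference B$4)
YEARS = 50           # rows A7 (year 0) to A57 (year 50)


def balances(principal, rate, years):
    """Return the balance at year 0, 1, ..., years."""
    values = [principal]                      # B7: =B3
    for _ in range(years):
        # B8 onward: previous balance times (1 + fixed rate),
        # i.e. the copied-down formula =B7*(1+B$4)
        values.append(values[-1] * (1 + rate))
    return values


table = balances(PRINCIPAL, RATE, YEARS)
for year in (0, 1, 2, 50):
    print(f"Year {year:2d}: ${table[year]:,.2f}")
```

Changing `RATE` or `PRINCIPAL` and re-running mirrors the two "change and observe" steps above: every later balance is recomputed from the single fixed value.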

2. USING DATA ANALYSIS TOOLPAK

2.1 Loading Data Analysis in MS Excel ®


Click on the “DATA” tab. With the cursor hovering over this tab, click the right mouse
button. A drop-down menu appears; select “Customise the Quick Access Toolbar”.
On the new screen:

• Select “Add-ins” on the left-hand side.
• At the bottom of this screen you will see “Manage Excel Add-ins” next to “Go”.
  Click “Go”.

An “Add-ins” pop-up screen appears. Tick the box next to “Analysis Toolpak” and click “OK”.
A pop-up screen may appear with “… Feature not currently installed. Would you like to
install now?” Click “YES”.
This results in “Wait for Configuration Process”. Be patient as this may take a few minutes.
The “Data Analysis” option should then appear at the right-hand side of the DATA tab in a
new tool section called “Analysis”.

FOR MAC OPERATING SYSTEMS (Excel 2016 for Mac and later versions only):
Go to Tools on the top tool bar, select Add-ins, and tick “Analysis Toolpak”. The Data
Analysis module is then accessed via the TOOL tab.
This procedure only needs to be done once, as “Data Analysis” will remain available after
you log out.

Versions of Excel for Mac from 2007 up to 2015 do NOT include this Data Analysis feature.
As a Monash student you can update for free to the latest Office version via My Monash
Software … download MS Office 365.

2.2 Features in Data Analysis


Within “Data Analysis” there is a menu of built-in multi-stage procedures or tests.

Note in particular the following features:
• Descriptive Statistics describe the centre and spread of a data set for a single
  variable. Statistics such as mean, median, standard deviation, maximum and minimum
  are displayed in one table.

  The various functions can be found individually using the Paste Function Wizard
  on the toolbar, BUT many descriptors together can be obtained in one table via Data
  Analysis.
• A Histogram is the bar chart showing the frequency distribution of values in a data
  set for a single variable. It gives a visual picture of the spread of the data.
• Random Number Generation is useful for allocating a random number to each item of an
  ordered list; after sorting according to the random number, the list is randomised.
  This is important when choosing a random sample from a list.
• Regression will be used to investigate the appropriateness and strength of a straight
  line fitted to two-variable (x,y) data. How does the variable y change with a
  change in variable x?
• t-Test: … (of different types) is the hypothesis test for comparing the means of
  a quantitative variable across two groups. This is performed directly on the data
  from two samples.
For STA1010 only see also:
• Anova: Single Factor is a hypothesis test involving the question “Is there a difference
  between the means of more than 2 treatment groups?”

2.3 To Obtain Descriptive Statistics


Exercise 2: Uni-variable sample data
In 1798 the English scientist Henry Cavendish measured the density of the earth by careful
work with a torsion balance. The variable recorded was the density of the earth as a
multiple of the density of water. Here are his measurements:

5.50, 5.61, 4.88, 5.07, 5.26, 5.55, 5.36, 5.29, 5.58, 5.65, 5.57, 5.53, 5.62, 5.29,
5.44, 5.34, 5.79, 5.10, 5.27, 5.39, 5.42, 5.47, 5.63, 5.34, 5.46, 5.30, 5.75, 5.68,
5.85

• Download the file from Moodle or enter these values in an Excel column, e.g. cells
  A2 to A30 with a heading in A1.

Select Data Analysis > Descriptive Statistics

• Input Range: A1:A30 by highlighting the data (I have included the heading row!).
• Click the radio button for Grouped by Columns, as your data is set down in a column.
• Tick “Labels in first row” IF you did include the heading in the Input Range.
• Click the radio button for Output range and enter C1 in the space. (Cell C1 will be
  the upper left-hand cell of the output generated.)
• Check the Summary Statistics box
• OK

You will notice that a large amount of information is calculated, much of which (e.g.
kurtosis, skewness) will not concern this unit.

Note the measures of the centre of the data: mean (5.45) and median (5.46).
Note the measures of spread of the data: standard deviation (0.22) and range (= max - min =
5.85 - 4.88 = 0.97). The other important measure of spread is the Interquartile Range
(= Q3 - Q1), which cannot be found here. You can find the quartiles using the built-in
function =QUARTILE.EXC(cell range, x).
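
As a cross-check outside Excel, the same summary values can be reproduced with Python's standard `statistics` module. This is only a sketch: it assumes that `statistics.quantiles` with `method="exclusive"` follows the same (n + 1) position rule as =QUARTILE.EXC, and the variable names are illustrative.

```python
import statistics

# Cavendish's 29 measurements, as listed in Exercise 2
cavendish = [
    5.50, 5.61, 4.88, 5.07, 5.26, 5.55, 5.36, 5.29, 5.58, 5.65,
    5.57, 5.53, 5.62, 5.29, 5.44, 5.34, 5.79, 5.10, 5.27, 5.39,
    5.42, 5.47, 5.63, 5.34, 5.46, 5.30, 5.75, 5.68, 5.85,
]

mean = statistics.mean(cavendish)              # like =AVERAGE
median = statistics.median(cavendish)          # like =MEDIAN
sd = statistics.stdev(cavendish)               # sample SD, like =STDEV.S
data_range = max(cavendish) - min(cavendish)   # max - min

# Quartiles by the "exclusive" method (assumed to match =QUARTILE.EXC)
q1, q2, q3 = statistics.quantiles(cavendish, n=4, method="exclusive")
iqr = q3 - q1                                  # interquartile range

print(f"mean={mean:.2f} median={median:.2f} sd={sd:.2f} range={data_range:.2f}")
print(f"Q1={q1:.3f} Q3={q3:.3f} IQR={iqr:.3f}")
```

The first line of output should agree with the mean, median, standard deviation and range quoted above; the second line supplies the quartiles that the Descriptive Statistics table leaves out.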

2.4 To Obtain a Histogram


A frequency distribution is the count of the occurrences of data values that fall within set
intervals of values over the range of the values in the data set for the sample. For example,
how many data values in the Cavendish experiment are between 5.2 and 5.4?
A histogram is the bar chart representation of the distribution of values. A good level of
detail is about 8-12 bars, with very few intervals having very small counts (0, 1 or 2).
The first thing to do is to identify the range of values and the intervals in which you wish
to count the data (called the BIN range). These intervals should be uniform and “make
sense” in practical terms. To choose our own bin intervals, we need to look at the highest
and lowest values and establish an interval size that fits about 8-10 intervals (bars) in
this range.

For the Cavendish experiment: taking a step below the minimum and a step above the maximum,
the data lie between about 4.8 and 6.0. This is a span of 1.2, so intervals of 0.1 give
12 bars and intervals of 0.2 give 6 bars. Opt for the latter (intervals of 0.2): if 0.1 were
used there would be too many zero and very small counts. The upper limits of the intervals
are then 4.8, 5.0, 5.2, 5.4, 5.6, 5.8 and 6.0.
In the Excel spreadsheet: set up a column (column F) with the heading “Relative density of
earth” in F1 (the variable’s title, which is entered automatically as the label on the x-axis
of the histogram) and enter these values below it, one number per cell, in F2 to F8.
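
The counting that the Histogram tool performs with these bins can be sketched in a few lines of Python. The sketch assumes Excel's usual bin convention (a value is counted in the first bin whose upper limit is greater than or equal to it, with anything above the last limit going to “More”); the names used are illustrative.

```python
from bisect import bisect_left

cavendish = [
    5.50, 5.61, 4.88, 5.07, 5.26, 5.55, 5.36, 5.29, 5.58, 5.65,
    5.57, 5.53, 5.62, 5.29, 5.44, 5.34, 5.79, 5.10, 5.27, 5.39,
    5.42, 5.47, 5.63, 5.34, 5.46, 5.30, 5.75, 5.68, 5.85,
]

# Upper limit of each interval, as entered in the bin column
upper_limits = [4.8, 5.0, 5.2, 5.4, 5.6, 5.8, 6.0]

# One counter per bin, plus a final "More" counter
counts = [0] * (len(upper_limits) + 1)
for value in cavendish:
    # Index of the first upper limit that is >= value
    counts[bisect_left(upper_limits, value)] += 1

labels = [f"<= {limit:.1f}" for limit in upper_limits] + ["More"]
for label, count in zip(labels, counts):
    print(f"{label:>7}  {'#' * count} ({count})")
```

The printed table is a text version of the frequency table, which makes it easy to see whether a chosen interval width leaves too many empty or near-empty bars.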

Select Data Analysis > Histogram

• Input range: A1:A30
• Bin range: F1:F8
• Tick “Labels”
• Output options: Click the radio button for Output range, put the cursor in the blank
  space and enter K1
• Check the Chart output box
• OK

NOTE that the output is a table of frequencies and a chart. The chart from Excel needs
editing to have an acceptable appearance.

The chart was edited as follows:

• Select the “Bin” label and edit the wording by typing and pressing “Enter” (only needed
  IF you did not title the bin range with the actual variable name beforehand).
• Select the heading “Histogram” and press “Delete”. A caption below a figure is a better
  explanation of the figure in a report.
• Remove the legend “Frequency” by highlighting it and pressing “Delete”.
• In the Chart Tools tab go to “Quick Layouts” and select Layout 8 to close the gaps
  between columns.
• Double click on the chart area; the “Format Plot Area” pane slides in from the right,
  where you select the paint can for Fill/Border options. OR click on one of the blue bars
  and right-click for the menu. In “Border” select “Solid Line” and make “Color” → “Black”
  to outline the bars.

NOTE: The x-axis scale and label alignment is a problem with histograms in Excel and
cannot be fixed.

Better axis labelling and ALSO for Office for Mac users:

If you would like a histogram that is a little bit better than the simple histograms that use
the Excel Column chart type (particularly for continuous-valued data with proper labelling
of the horizontal axis), you could try the free “Better Histogram add-in for excel” from
TreePlan, available for download from the Histogram page at
http://www.treeplan.com/better.htm

NEW to Excel 2016: Direct drawing of a histogram is possible by highlighting the data,
then going to Insert and Charts, and choosing Histogram (the left option).

A big problem with this approach is that the x-axis scaling (Bin) is chosen by Excel and
will probably NOT make sense, e.g. counts in columns 4.88-5.13, etc. This is not an
acceptable x-axis scale. It should be 4.75-5.00 (counting in 0.25s) or 4.80-5.00 (counting
in 0.20s), the usual scale tick numbers for easy reading and understanding.
By clicking on the x-axis and right-clicking you can format the x-axis, BUT it is not
straightforward to obtain a correct scale with a reasonable starting point.

I suggest taking the Data Analysis approach, whereby you select the starting value and set
the class intervals for the x-axis scale!

2.5 Drawing Box-Plots


New in the 2016 version of Excel is the ability to draw box-plots.
Highlight the data; go to Insert and Charts, and choose “Box and Whisker”. The outliers are
identified correctly here as well. You will need to select “values” to obtain the 5-number
summary values, which are also required in reporting.

See https://www.youtube.com/watch?v=TxuretcM5Uk BUT use
=QUARTILE.EXC(_:_,x), not the .INC version used in this clip!

2.6 Generating Random Numbers


There are several ways of generating random numbers, including drawing out of a hat,
drawing out of a hat with replacement, and Sampling using Excel Data Analysis.
Here we will use a very efficient way to obtain a random sequence without repetition:
Random Number Generation using Excel Data Analysis.
Exercise 3: Randomising an Ordered List
Set up an ordered list in a column in Excel, e.g. the alphabet in order.
Say we wish to choose 5 letters at random from the alphabet, with all letters having an
equal chance of being selected.
In the column next to the ordered list, place a random number next to every letter in the
list:
Open Data Analysis > Random Number Generation and enter the following values:
• Number of variables = 1 (for 1 column of random numbers)
• Number of random numbers = 26 (appropriate to the question)
• Distribution = Uniform
• Leave as Between 0 and 1.
• Enter the last 4 digits of your ID number as a random seed
• Output range B2 as the start of the column
• Leave everything else blank
• OK.

This is now an ORDERED LIST with a corresponding RANDOM NUMBER. If the two
columns are now linked and the RANDOM column is sorted into order → the LIST
column will correspondingly be RANDOMISED:
• Highlight the entire 26x2 array. You can include the headings to make it easier.
• Open Data > Sort
• Tick “My data has headers” if column headings were included.
• Sort by ... in the drop-down menu select column “Random Number”

The LIST will now be randomised, and the first 5 letters will be my randomly chosen
letters: D, R, P, N and E in this example.
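
The same "attach a random number, then sort" idea can be sketched in Python. The seed value 1234 is a hypothetical stand-in for the last 4 digits of an ID number, and Python's generator is not the one Excel uses, so the letters chosen will differ from the example above.

```python
import random
import string

SEED = 1234                      # hypothetical seed (e.g. last 4 ID digits)
rng = random.Random(SEED)

letters = list(string.ascii_uppercase)      # the ordered list A..Z

# Pair every letter with a uniform random number between 0 and 1,
# like the column produced by Random Number Generation
keyed = [(rng.random(), letter) for letter in letters]

# Sorting on the random number randomises the order of the letters
keyed.sort()

# The first 5 letters of the shuffled list form the random sample
sample = [letter for _, letter in keyed[:5]]
print(sample)
```

Because each letter appears exactly once in the list, the sample cannot contain repeats, which is the point of sorting rather than drawing numbers directly.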

2.7 Scatterplot and line of best fit (Linear Regression)


A scatterplot is the graphical representation of x-y data. The line of best fit is the straight
line placed on the data such that all data points are as close as possible to the line. Excel
uses the built-in Method of Least Squares to position the line of best fit. The line is
described by its equation and the closeness of fit is described by the correlation
coefficient; both can be obtained in the scatterplot.
A full regression analysis, including the residual plot and inference, is obtained in Data
Analysis.

Exercise 4: Relationship between Weight and Blood Pressure


The table lists the systolic blood pressure and weight (kg) of a group of males, aged 50-55,
who have been diagnosed with high BP.
Enter this information into Excel.

(i) Produce scatterplot:

• Highlight the data in the x and y columns. Excel always takes the column furthest to
  the left as x and any columns to the right as y.
• Go to Insert tab > Charts and choose “Scatter” > Scatter with markers only. This
  produces the basic plot.

Like all Excel charts, this plot requires editing (axes labels, line of best fit and its
equation, correlation) to be presentable.

These edits were achieved by:

• With the plot selected (click the left mouse button in its area), go to the Chart
  Tools tab at the top centre. Select “Quick Layouts”. From the drop-down menu choose
  “Layout 9”, which gives you the axes labels and the added trendline with its equation
  directly.
• Select the heading and press “Delete”. A caption below a figure is a better
  explanation of the figure in a report.
• Select the legend and press “Delete”. Only one (x,y) pair (called a series) is plotted,
  so there is no need for a legend.
• Select the minor horizontal gridlines and press “Delete”.
• Change the x-axis and y-axis labels by selecting each and typing an appropriate label
  with units. The text first appears only in the bar at the top; after typing, press
  “Enter” to place the words at the selected axis.
• Change each axis’ scale so that the data fills the whole plot area:
  ✓ Place the cursor near the x-axis or one of its values and click the left mouse button
    to select the axis.
  ✓ Click the right mouse button for a menu and select “Format Axis”.
  ✓ On the right-hand side you will see a “Format Axis” pane; select “Axis Options”.
    Change Minimum to “75” and press Enter.
  ✓ Repeat the steps above for the y-axis. Enter “100” as the minimum. Click on the
    chart to accept.
• Click on the trendline and, in the Format pane, change it to a solid line, black, and
  0.75 pt.
• Move the equation box to a clearer position.

(ii) Regression Analysis including Residual plot


Regression analysis involves placing this “line of best fit” AND assessing the goodness of
the fit of the added trendline through a residual plot. The full inferential statistical analysis
of the slope is also given (STA1010 only).
The residual is the difference between data value and the line’s value at each point. The
residuals will be randomly distributed ± along the line IF the line is an appropriate
representation of the data. If the linear relationship of the line of best fit is NOT
appropriate a pattern (like a curve ∩ or U) would be seen in the residuals as you progress
along the line.
To determine the regression line, residuals and residual plot:
Open Data Analysis > Regression:
Note that it asks for the Y-range first!
• Input Y Range: B2:B12
• Input X Range: A2:A12
• Check the Labels box if headings are included
• Output range: A18
• Tick Residuals and Residual Plots (do not ask for the Line Plot here as it is not a
  good plot.)
• OK

The “SUMMARY OUTPUT” is a large table that contains full regression information and
the important residual plot.

In SCI1020 the important information here is:

• Correlation (“Multiple R”, which is given without its sign!) and correlation squared
  (“R Square”).
• The line of best fit equation given by y = mx + c, where
  c = “Coefficients” value for Intercept
  m = “Coefficients” value for the slope: “weight” in this example (sometimes the title
  is X-Variable).

The equation in the example is
BP = 0.7506 Weight + 72.489
(as seen before on the scatterplot).

In STA1010, the information needed is as for SCI1020 plus the inference on the slope:
the P-value for the test on the slope and the confidence interval on the slope
(Lower 95% and Upper 95%) for the X-variable.

Residual plot

This output needs slight editing also:

• Make the heading font smaller, e.g. 12 pt instead of 18 pt.
• Change the x-axis minimum to coincide with that of your scatterplot.
• Extend the plot vertically and pull it in horizontally to observe the scatter of the
  data points along the line of best fit.

This residual plot is randomly scattered with random + and – along the line of best fit. This
indicates that the linear relationship (described by this line of best fit) is appropriate to the
data in this range.
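
To make the quantities in the regression output concrete, here is a small Python sketch of the least-squares calculation. The five (x, y) pairs are made up purely for illustration and are NOT the weight and blood pressure data of Exercise 4; the formulas are the standard least-squares ones, assumed to correspond to what Excel reports as the slope, intercept, “Multiple R” and residuals.

```python
from math import sqrt

# Hypothetical illustration data only (not the manual's data set)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = sxy / sxx                      # m in y = mx + c
intercept = y_bar - slope * x_bar      # c in y = mx + c
r = sxy / sqrt(sxx * syy)              # correlation coefficient

# Residual = data value minus the line's value at the same x
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"y = {slope:.4f} x + {intercept:.4f}, r = {r:.4f}")
print("residuals:", [round(e, 3) for e in residuals])
```

With an intercept in the model the residuals always sum to zero, so what matters in the residual plot is their pattern along the line, not their total.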

3. EXCEL FUNCTIONS FOR DISTRIBUTIONS


The following functions are used in Inferential Statistics, as will be explained in the unit’s
lectures and support classes.

NORMSDIST
Returns the standard normal cumulative distribution function. The distribution has a mean
of 0 (zero) and a standard deviation of one.
NORMSDIST gives the probability of obtaining the sample’s z-value or LESS.
Syntax in Excel is: =NORMSDIST(z)
z is the value for which you want the distribution.
NB: The Standard Normal distribution in Excel and in tables is a LEFT-SIDED interval
area. If you want the RIGHT-SIDED area (the “greater than” probability) then the formula
is =1-NORMSDIST(z). The total area under any distribution is 1 (100% of possibilities).

NORMSINV
Returns the inverse of the standard normal cumulative distribution. The distribution has a
mean of zero and a standard deviation of one. This is NORMSDIST in reverse: given the
probability (that z-value or less) what is the standardised score, the z-value?
Syntax in Excel is: =NORMSINV(probability)
Probability is a probability corresponding to the normal distribution (as in the diagram: the
left hand interval area).
NB: The Standard Normal distribution in Excel and Tables is LEFT-SIDED interval area.
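
The left-sided and right-sided areas, and the inverse lookup, can be illustrated with Python's standard library. This sketch assumes that `NormalDist().cdf` and `NormalDist().inv_cdf` play the roles of =NORMSDIST and =NORMSINV respectively; the z-value of 1.96 is just a convenient example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
std_normal = NormalDist(mu=0.0, sigma=1.0)

z = 1.96                               # example z-value
left_area = std_normal.cdf(z)          # like =NORMSDIST(z): P(Z <= z)
right_area = 1 - left_area             # like =1-NORMSDIST(z): P(Z > z)

# Going backwards: from a left-sided probability to the z-value
z_back = std_normal.inv_cdf(left_area) # like =NORMSINV(probability)

print(f"P(Z <= {z}) = {left_area:.4f}")
print(f"P(Z >  {z}) = {right_area:.4f}")
print(f"z recovered from the probability = {z_back:.2f}")
```

The two areas add to 1, and feeding the left-sided area back into the inverse function returns the original z-value.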

TDIST
Returns the Percentage Points (probability) for the Student’s t-distribution, where a numeric
value (x) is a calculated value of t for which the Percentage Points are to be computed. The
Student’s t-distribution is used to find the P-value in the hypothesis testing of means when
the population standard deviation, σ, is not known (as is the usual case).
TDIST gives the probability of obtaining the sample’s t-value or one more extreme.
Syntax in Excel is: =TDIST(x,degrees_freedom,tails)
x is the numeric value of the t-value.
degrees_freedom = sample size – 1 = (n-1)
tails specifies the number of distribution tails to return. If tails = 1, TDIST returns the
one-tailed distribution. If tails = 2, TDIST returns the two-tailed distribution.
NB: The t-distribution in Excel and tables gives a TAIL area.

TINV
Returns the t-value of the Student's t-distribution as a function of the probability and the
degrees of freedom. This is TDIST in reverse: given the probability what is the t-value?
Syntax in Excel is =TINV(probability,degrees_freedom)
Probability is the probability associated with the two-tailed Student's t-distribution.
degrees_freedom is the number of degrees of freedom with which to characterize the
distribution, = sample size – 1 = (n-1)
Remarks
• A one-tailed t-value can be returned by replacing probability with 2*probability.
  For a probability of 0.05 and degrees of freedom of 10, the two-tailed value is
  calculated with TINV(0.05,10), which returns 2.228. The one-tailed value for the
  same probability and degrees of freedom can be calculated with TINV(2*0.05,10),
  which returns 1.812.

CHIDIST
Returns the one-tailed probability of the chi-squared distribution. The χ2 distribution is
associated with a χ2 test. Use the χ2 test to compare observed and expected values. By
comparing the observed results with the expected ones, you can decide whether your
original hypothesis is valid.
CHIDIST gives the probability of obtaining the sample’s chi-squared value or one
MORE EXTREME (the tail area).
Syntax in Excel is: =CHIDIST(x,degrees_freedom)
x is the value of chi-squared calculated from observed and expected counts.
degrees_freedom is the number of degrees of freedom = (#rows-1)x(#columns-1)

4. EXCEL and INFERENCE


The features in Data Analysis on Windows PCs that can be used in simple Inference
techniques are:
• Descriptive statistics
• t-Test: Paired Two Sample for Means
• t-Test: Two-Sample Assuming Unequal Variances (un-pooled general approach)
• ANOVA – Single Factor

Excel is limited in its use in Inference. It does NOT include Confidence Intervals
on the difference between means or any form of inference on proportions.

4.1 Confidence Interval on a Single Mean


This is done through the Data/Data Analysis → “Descriptive Statistics” option.
Example: Cavendish’s data from Section 2.3.

To obtain the margin of error about a sample mean, an extra tick is placed on the
“Confidence level for the Mean” choice AND the desired level of confidence is written
as a %.

This margin of error is calculated using a multiplier based on the t-distribution with
n-1 degrees of freedom:

$$\text{Margin of error} = t_{\alpha/2,\,n-1} \times \frac{s}{\sqrt{n}}$$

The resulting output is similar to that of the “descriptive statistics” exercise done
earlier EXCEPT for the extra last line, “Confidence level (95.0%)”.

This indicates that the 95% confidence interval (mean ± margin of error) for the relative
density of the Earth based on Cavendish’s data is 5.4479 ± 0.0840, which evaluates to
(5.36, 5.53) to two decimal places, as in the original data.
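
The interval can be re-derived by hand from the margin-of-error formula. In this Python sketch the multiplier 2.048 is an assumption taken from a standard t-table (two-sided 95%, 28 degrees of freedom), which is the value =TINV(0.05,28) would be expected to return; everything else comes from the data.

```python
from math import sqrt
import statistics

cavendish = [
    5.50, 5.61, 4.88, 5.07, 5.26, 5.55, 5.36, 5.29, 5.58, 5.65,
    5.57, 5.53, 5.62, 5.29, 5.44, 5.34, 5.79, 5.10, 5.27, 5.39,
    5.42, 5.47, 5.63, 5.34, 5.46, 5.30, 5.75, 5.68, 5.85,
]

# Assumed t multiplier: two-sided 95%, df = n - 1 = 28 (from a t-table)
T_CRIT = 2.048

n = len(cavendish)
mean = statistics.mean(cavendish)
s = statistics.stdev(cavendish)            # sample standard deviation

margin = T_CRIT * s / sqrt(n)              # t x s / sqrt(n)
lower, upper = mean - margin, mean + margin

print(f"mean = {mean:.4f}, margin of error = {margin:.4f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f})")
```

The margin printed here should agree with the “Confidence level (95.0%)” line of the Excel output to the precision shown.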

4.2 Difference between Two Means (t-tests)


(i) Difference between Two Independent (or unpaired) Means
This applies when the two populations are not dependent on each other - the data is from
unpaired (or independent) samples.
The hypotheses being tested are:
H0: µ1 - µ2 = µo where µd is the mean difference of pairs
HA: µ1 - µ2 ≠ µo for a two-sided test ( > or < for a one-sided test)
where µo is the null value and is usually 0 for no difference between the two means.
Go to Data/Data Analysis, scroll to bottom of
menu to select:
“t-test: Two-Sample Assuming Unequal
Variances” Press OK.

This is also termed the “unpooled” approach


and is the more general approach.
The full degrees of freedom calculation (using
Welch’s approximation) is done when
UNequal variances is assumed as here.
The “pooled” approach of assuming equal variances is not recommended as Excel has no
test to verify the validity of the assumption
of equal variances.
 The data ranges are entered here; e.g.
 The Hypothesized Mean Difference is
usually zero
 If you included the headings in the data
ranges then tick “Labels” box
 Click “output range” and select a single
cell for the upper left of output.

The output looks like:

• The sample means and sample sizes are given.
• The df is the degrees of freedom, calculated using Welch’s approximation.
• The test statistic is given.
• The P-values for one-sided and two-sided tests are given.

Here, for a two-sided test, the P-value = 0.076, indicating no significant evidence at the 5%
level of significance of a difference in means between the two populations.
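
To see what the tool computes behind the scenes, the test statistic and Welch's degrees of freedom can be sketched as follows. This is a minimal illustration using only Python's standard library; the function name and the two small samples are made up for the example and are not the data shown in the Excel output. The P-value is not computed here.

```python
import math
from statistics import mean, variance

def welch_t(sample1, sample2, null_diff=0.0):
    """Unpooled two-sample t statistic and Welch's approximate df."""
    n1, n2 = len(sample1), len(sample2)
    v1 = variance(sample1) / n1   # s1^2 / n1
    v2 = variance(sample2) / n2   # s2^2 / n2
    t = (mean(sample1) - mean(sample2) - null_diff) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical data, for illustration only
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, df = welch_t(group_a, group_b)
print(t_stat, df)
```

Note that Welch's df is generally not a whole number, which is why it differs from the simple n1 + n2 - 2 of the pooled approach.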

(ii) Difference between Two Paired Means – t-tests


This test applies to the difference between two populations that are linked in some way
such as matched pairs or repeated measures designs. The hypotheses being tested are:
H0: µ1 - µ2 = µd = µo where µd is the mean difference of pairs
HA: µ1 - µ2 = µd ≠ µo for a two-sided test ( > or < for a one-sided test)
Note: µo is the null value and is usually 0 for no difference between the two means.

Go to Data/Data Analysis and scroll to the bottom of the menu to select
“t-test: Paired Two Sample for Means”. Press OK.

In this process, the difference on each pair is obtained (behind the scenes) and tested
against the “Hypothesized Mean Difference” (null value, µo) as given by you.

• The cells containing the data list for each variable are entered here.
• The Hypothesized Mean Difference is usually zero.
• If you included the headings in the data ranges then tick the “Labels” box.
• Click “output range” and select a single cell for the upper left of the output
  table, placing it on this same sheet for convenience.
The output looks like:

• The sample means and sample size are given.
• The df is the degrees of freedom = n-1, where n is the number of paired
  differences.
• The test statistic is given.
• The P-values for one-sided and two-sided tests are given.

Here, for a two-sided test, the P-value = 1.911 × 10⁻⁵, indicating a highly significant mean
difference between the paired populations.
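
The “behind the scenes” step of taking the difference for each pair can be sketched as below. This is an illustrative sketch only: the function name and the four pairs of values are hypothetical, and the P-value is left out.

```python
import math
from statistics import mean, stdev

def paired_t(first, second, null_diff=0.0):
    """Paired t statistic and df = n-1, computed from the pairwise differences."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]  # one difference per pair
    n = len(diffs)
    t = (mean(diffs) - null_diff) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical repeated measures on four units, for illustration only
before = [10.0, 12.0, 9.0, 11.0]
after = [8.0, 9.0, 8.0, 9.0]
t_stat, df = paired_t(before, after)
print(t_stat, df)
```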

4.3 ANOVA (Analysis of Variance) - Single Factor


This test compares the means of the same single variable across two or more populations.
The validity of this test relies on the assumption that the variances are equal across all
populations.
The hypotheses tested are:
H0: µ1 = µ2 = µ3 = ... = µk where k populations are sampled.
HA: There is a difference among the means.
No “post-hoc” tests are available in Excel to determine which populations differ.
Otherwise, only separate t-tests can be performed in Excel.

In Data/Data Analysis select “Anova: Single Factor” at the top of the list.
Click OK.

Example: Number of insects trapped on traps of four different colours. Is there a difference
in the mean number trapped with the different colours of trap?

• Note that the data are entered as a total array of the actual individual units.
• Note whether each sample is by row or by column; here it is by row!
• Remember to tick “Labels” if headings are included in the array, as here.
• Placing the output on the same sheet next to the data is helpful.

The output looks like:

A summary of each sample’s statistics is given.

The F-statistic and corresponding P-value are given.

Here the P-value is 1.2 × 10⁻⁷, a highly significant result that indicates there is a
difference in the mean number trapped between the colours.
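
For readers who want to check where the F-statistic comes from, the calculation can be sketched as below. This is a minimal illustration, not Excel's own code: the function name and the three small groups are invented for the example (they are not the insect-trap data), and the P-value is not computed.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)             # number of samples (populations)
    n_total = len(all_values)   # total number of individual units
    # Variation of the group means about the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual units about their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical data: each inner list is one sample, for illustration only
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f_stat, df_b, df_w)
```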

4.4 Chi-Squared Test of Independence


This test answers the question “Does a unit’s response in one category depend on the
same unit’s response in a second category?” The data are summarised in a “two-way” or
“contingency” table.

NB: Excel does NOT have a direct tool to perform this test on a contingency table.

The test can only be done after manually setting up both the “Observed Counts” table and
the “Expected Counts” table and then using the CHITEST function:

p-value =CHITEST(Observed array, Expected array)
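
The manual set-up of the “Expected Counts” table follows the usual rule: expected count = (row total × column total) / grand total. A short sketch of that step, and of the chi-squared statistic built from the two tables, is given below. The function names and the 2×2 table are hypothetical, chosen only to illustrate the arithmetic; the P-value itself is left to CHITEST.

```python
def expected_counts(observed):
    """Expected count for each cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

# Hypothetical 2x2 contingency table, for illustration only
observed = [[10, 20],
            [30, 40]]
expected = expected_counts(observed)
chi2 = chi_squared_statistic(observed, expected)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (rows-1) * (columns-1)
print(expected, chi2, df)
```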



5. NAMING CELLS and RANGES


Instead of cell references, it is much clearer to use the actual terms for the variables in the
formula. This can be achieved by “naming the cells”.
This is an advanced feature that is very useful to know, as the formulae in your
spreadsheet become directly readable instead of showing cell designations. In business
this is essential for transparency of the procedures.

Exercise 5: Celsius – Fahrenheit Conversion


Use the “Fill in a Series” technique to enter in column A, a heading of “Celsius” and values
ranging from -20 to 50, in steps of 1. In cell B1, enter the heading “Fahrenheit”.

To designate column A cells with the name “Celsius”:

• Select the range A2:A72.
• Select the Formulas tab and select “Name Manager”.
• Select “New”.

Excel anticipates that the column heading will be the name and has it there already.
If you want a different name, just type it in.

Select OK.

Enter in cell B2 the formula for the Fahrenheit temperature conversion as
=1.8*Celsius+32.
This is instead of =1.8*A2+32, which is not as informative.
Note that the spelling and capitals must be the same as in the defined name.

Copy this formula down to cell B72 to show the conversion for all Celsius values.
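
The same idea of a readable, named variable carries over to programming languages. As a rough parallel (not an Excel feature), the exercise could be sketched in Python as follows; the variable names are simply chosen to mirror the column headings.

```python
# Celsius values from -20 to 50 in steps of 1 (the 71 cells A2:A72)
celsius = list(range(-20, 51))

# Same conversion as the named-range formula =1.8*Celsius+32
fahrenheit = [1.8 * c + 32 for c in celsius]

for c, f in zip(celsius[:3], fahrenheit[:3]):
    print(c, f)
```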

Use the Microsoft Excel Help facility to find out more on how to “name cells and ranges”
for yourself.

6. FURTHER RESOURCES
Some useful resources for extra help may be:
• Microsoft Excel Help, found on the top toolbar of the program, and Microsoft’s
  Support page for Excel: http://office.microsoft.com/en-us/excel-help/, or
  http://office.microsoft.com/en-us/excel-help/excel-help-and-how-to-FX101814052.aspx
  Investigate the “Free Training” tutorials and choose your version.

• Excel 2016 - Training - Microsoft Office Online
  Audiovisual course with many and various tutorials in the basics, e.g. “Get to know Excel”:
  https://support.office.com/en-us/article/Excel-training-9bc05390-e94c-46af-a5b3-d7c22f6990bb

• The Data Analysis Toolpak was removed in Office for Mac versions between 2008 and
  2015. However, the following is a free third-party tool (not supported by Microsoft!)
  that offers similar functionality:
  StatPlus:mac LE: http://www.analystsoft.com/en/products/statplusmacle/
• http://www.youtube.com: the programs are of very mixed quality, but one suggestion is
  an uploader called ExcelIsFun, who has a series on Excel Statistics, e.g. “Excel
  Statistics 31 Histogram using Data Analysis Add In”.
  Beware of statistical procedures on YouTube: many are simply wrong and most are
  incomplete.
