An Introduction To Excel-2019
SCIENCE
Introduction to MS Excel ©
The goal of this brief manual is to help you get started in Microsoft Excel 2016 or above
for basic computations, data summary and data analysis. This will be a powerful tool for
you in the preparation of reports in all disciplines in science.
Note that this manual is not a comprehensive guide to Excel. Its main purpose is to give you
background in the features used in statistics, of which you have probably had limited or no
experience. You will need to refer to it throughout this course.
THE VERY BASIC BASICS
When you first open MS Excel, a new file, called a workbook, is displayed on your
screen. A workbook consists of one or more sheets. In Excel 2013 and later a new workbook
contains one worksheet by default (earlier versions opened with three). Near the bottom of
the screen you should see a tab for each worksheet, labelled Sheet1, Sheet2, Sheet3 and so on.
You can add, delete or copy worksheets as needed by right-clicking on a tab for a menu.
Using separate sheets in a workbook can be helpful to organise your work when you have many
sections and related information, all within the one file. Sheets can also be renamed by
double-clicking on the tabs and typing.
A spreadsheet is an array of cells organised in rows and columns with an alphanumeric
system:
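Rows are numbered 1, 2, 3, ... to 1,048,576 and
Columns are labelled alphabetically: A, B, C, ..., X, Y, Z, AA, AB, AC, ... to XFD
(16,384 columns in total). These limits apply to Excel 2007 and above; older versions
stopped at row 65,536 and column IV (256 columns).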
Each cell is identified by the column and row that intersect at its location. Thus D3 is the
cell reference for the cell in the fourth column and the third row.
There is an alternative cell reference style called R1C1 in which the columns are also
designated by numbers, eg R3C4 for the cell at row 3 and column 4, which is D3. If your
computer defaults to this setting it can be changed:
In Excel 2010 and above: go to File > Options > Formulas and untick the box “R1C1
reference style”. (In Excel 2007, use the Office symbol in the top left hand corner > Excel
Options > Formulas.)
a) Formulae in Excel
All formulas must begin with an equals sign = and are entered directly into the cell. This
can be either by hand using certain operators or codes as described below or by using in-
built functions. A formula may contain constants or use cell references containing values
(see above examples) to calculate the final result.
Operators
Arithmetic operator Description Example Result
The Paste Function Wizard on the toolbar is a library of in-built functions so that you
do not have to know all of Excel’s shorthand. Investigate menus, particularly the
“Statistical” one. Some Common Functions with their formulae built-in to Excel are:
SAMPLE Standard Deviation    =STDEV.S    =STDEV.S(D1:D23)
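For readers who want to check what =STDEV.S computes, the sketch below repeats the calculation in Python. The eight values are hypothetical stand-ins for a cell range such as D1:D23; they do not come from this manual.

```python
import math
import statistics

# Hypothetical values standing in for a cell range (not from the manual).
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# By hand: squared deviations from the mean, divided by n - 1, then square root.
n = len(values)
mean = sum(values) / n
by_hand = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

# statistics.stdev also divides by n - 1, i.e. the SAMPLE standard deviation.
sample_sd = statistics.stdev(values)

print(round(by_hand, 4), round(sample_sd, 4))
```

Both numbers agree, which shows that the "sample" standard deviation is the version that divides by n - 1.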
b) Writing Formulae
Exercise 1: Value of an Investment
Open a new spreadsheet and enter the
information in the diagram for an annually
compounded interest:
(Don’t worry that the columns do not appear wide
enough. You can change the column width easily
by putting the cursor on the line between column
labels A and B at the top of the sheet and dragging
it to the desired width).
The formula =B3 will show the value 1000 when you press Enter.
To display numbers as currency or as percentages, use the “Number” section on the
toolbar of the “Home” tab. Select the cell of interest and press the
$ or the % button as required:
The 0 and 1 in cells A7 and A8 of the Year column set the pattern of counting
in 1s. Highlight these two cells (A7 and A8) and use the “Fill Handle” (see NOTE 1
below) to copy this pattern down the column to A57 (50 years).
The problem here is that, when the formula is copied down, the reference to the interest
rate moves from B4 to B5, and cell B5 does not contain the interest rate. We need to make
part of this reference absolute by placing a dollar sign, $, in front of the part of the cell
designation that would otherwise change (when copying down a column, this is the row
number). You can then copy the formula without changing the reference to the constant
value in the formula.
Select B8, click into the formula bar at the top and edit the formula to:
=B7*(1+B$4). The value 1050 will remain in cell B8. ($B$4 would also be OK.)
Absolute references: If you don't want Excel to adjust references when you copy a
formula to a different cell, use an absolute reference or a name. You can create an absolute
reference by placing a dollar sign ($) in front of the row number for copying down a column,
in front of the column letter for copying across a row, OR in front of both.
To see the power of a spreadsheet, we need only change the Principal or Interest Rate, and
the spreadsheet automatically calculates the new values.
Change the interest rate to 10% and observe the changes
Change the principal to $5000 and observe the changes
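The fill-down in this exercise can be sketched in Python as a loop that always reads the same fixed rate, which is the role B$4 plays in the formula. The 5% starting rate is inferred from the values 1000 and 1050 quoted above; the function name is made up for this illustration.

```python
def investment_values(principal, rate, years):
    """Return the value at the end of year 0, 1, ..., years (annual compounding)."""
    values = [principal]  # year 0: the principal itself
    for _ in range(years):
        # Same step as =B7*(1+B$4): previous value times (1 + the fixed rate)
        values.append(values[-1] * (1 + rate))
    return values

base = investment_values(1000, 0.05, 50)         # principal 1000, rate 5%, 50 years
higher_rate = investment_values(1000, 0.10, 50)  # interest rate changed to 10%
bigger_sum = investment_values(5000, 0.05, 50)   # principal changed to 5000

print(round(base[1], 2), round(higher_rate[1], 2), round(bigger_sum[1], 2))
```

Changing one argument recalculates the whole column, just as the spreadsheet does when one input cell is edited.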
AND
FOR MAC OPERATING SYSTEMS (from 2016 Excel for Mac versions only):
Go to Tools on the top menu bar, select Excel Add-ins, and tick “Analysis ToolPak”. The Data
Analysis module is then accessed via the Data tab.
This procedure only needs to be done once, as “Data Analysis” will remain available after you log out.
Excel for Mac versions from 2008 up to 2015 DO NOT include this Data Analysis feature.
You can update for free as a Monash student to the latest Office version via the My Monash
Software … download MS Office 365.
The various functions can be found individually using the Paste Function Wizard
on the toolbar BUT many descriptors together can be obtained in one table via Data
Analysis.
A Histogram is a bar chart showing the frequency distribution of values in a data
set for a single variable. It gives a visual picture of the spread of the data.
Random Number Generation is useful for allocating a random number to each item of an
ordered list; after sorting according to the random number, the list is randomised. This is
important when choosing a random sample from a list.
Regression will be used to investigate the appropriateness and strength of a straight
line fitted to two-variable (x,y) data. How does the variable y change with a
change in variable x?
t-Test: ….. of different types, each a hypothesis test comparing the means of
a quantitative variable across two groups. This is performed directly on the data
from two samples.
For STA1010 only see also:
Anova: Single factor is a hypothesis test involving the question “Is there a difference
between the means of more than 2 treatment groups?”
5.50, 5.61, 4.88, 5.07, 5.26, 5.55, 5.36, 5.29, 5.58, 5.65, 5.57, 5.53, 5.62, 5.29,
5.44, 5.34, 5.79, 5.10, 5.27, 5.39, 5.42, 5.47, 5.63, 5.34, 5.46, 5.30, 5.75, 5.68,
5.85
Download the file from Moodle or enter these values in an Excel column, eg cells
A2 to A30, with a heading in A1.
Input Range: A1:A30, by highlighting the data (I have included the heading row!).
Click radio button for Grouped by Columns, as your data is set down in a column.
Click radio button for “Labels in first row” IF you did include the heading in the Input Range.
Click radio button for Output range and enter C1 in the space. (Cell C1 will be the upper left hand cell of the output generated.)
Check the Summary Statistics box.
Click OK.
You will notice that a large amount of information is calculated, much of which (eg kurtosis,
skewness) will not concern this unit.
Note the measures of the centre of the data: mean (5.45) and median (5.46).
Note the measures of spread of the data: standard deviation (0.22) and range (= max - min =
5.85 - 4.88 = 0.97). The other important measure of spread is the Interquartile Range (= Q3 -
Q1), which cannot be found here. You can find the quartiles using the built-in function
=QUARTILE.EXC(cell range, x), where x = 1 gives Q1 and x = 3 gives Q3.
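As a cross-check on the summary statistics quoted above, here is a minimal Python sketch using the 29 values listed earlier. It assumes that the "exclusive" quantile method of Python's statistics module uses the same rule as =QUARTILE.EXC; that equivalence is an assumption of this sketch, not a statement from the manual.

```python
import statistics

# The 29 density values listed above.
density = [
    5.50, 5.61, 4.88, 5.07, 5.26, 5.55, 5.36, 5.29, 5.58, 5.65, 5.57, 5.53, 5.62, 5.29,
    5.44, 5.34, 5.79, 5.10, 5.27, 5.39, 5.42, 5.47, 5.63, 5.34, 5.46, 5.30, 5.75, 5.68,
    5.85,
]

mean = statistics.mean(density)
median = statistics.median(density)
sd = statistics.stdev(density)             # sample SD (divides by n - 1)
data_range = max(density) - min(density)   # max - min

# Quartiles by the "exclusive" method (assumed to match =QUARTILE.EXC).
q1, q2, q3 = statistics.quantiles(density, n=4, method="exclusive")
iqr = q3 - q1                              # Interquartile Range = Q3 - Q1

print(f"mean={mean:.2f} median={median:.2f} sd={sd:.2f} "
      f"range={data_range:.2f} Q1={q1:.3f} Q3={q3:.3f} IQR={iqr:.3f}")
```

The mean, median, standard deviation and range should agree with the Descriptive Statistics output once rounded to two decimal places.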
For the Cavendish experiment: take a step below the min and a step above the max, so that the
data lie between about 4.8 and 6.0. This is a span of 1.2. Aiming for about 8-10 bars means
intervals of 0.1 (12 bars) or 0.2 (6 bars); opt for the latter (intervals of 0.2). If 0.1 were used
there would be too many zero and very small counts. The upper limits for the intervals are
then 4.8, 5.0, 5.2, 5.4, 5.6, 5.8, and 6.0.
In the Excel spreadsheet: set up a column (column F) with the heading “Relative density of
earth” (the variable’s title, used automatically as the x-axis label of the histogram) in F1 and
enter all these values in the column (one number per cell, eg in F2 to F8).
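The frequency table that the Histogram tool produces from these upper limits can be sketched in Python. The sketch assumes that a value is counted in the first interval whose upper limit is greater than or equal to it, and that anything above the last limit goes to a "More" row; treat that rule as an assumption about the tool rather than a documented guarantee.

```python
from bisect import bisect_left

# The 29 density values listed earlier in this manual.
density = [
    5.50, 5.61, 4.88, 5.07, 5.26, 5.55, 5.36, 5.29, 5.58, 5.65, 5.57, 5.53, 5.62, 5.29,
    5.44, 5.34, 5.79, 5.10, 5.27, 5.39, 5.42, 5.47, 5.63, 5.34, 5.46, 5.30, 5.75, 5.68,
    5.85,
]
upper_limits = [4.8, 5.0, 5.2, 5.4, 5.6, 5.8, 6.0]  # the interval upper limits

counts = [0] * len(upper_limits)
more = 0  # values above the last upper limit
for x in density:
    i = bisect_left(upper_limits, x)  # index of the first upper limit >= x
    if i < len(upper_limits):
        counts[i] += 1
    else:
        more += 1

for limit, count in zip(upper_limits, counts):
    print(f"{limit:.1f}  {count}")
print(f"More {more}")
```

With intervals of 0.2 no count is zero apart from the starting limit 4.8, which is the reason given above for preferring 0.2 over 0.1.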
NOTE that the output is a table of frequencies and a chart. The chart from Excel does need
editing to have an acceptable appearance:
NOTE: The x-axis scale and label alignment is a problem with histograms in Excel and
cannot be fixed.
Better axis labelling and ALSO for Office for Mac users:
If you would like a histogram that is a little bit better than the simple histograms that use
the Excel Column chart type (particularly for continuous-valued data with proper labelling
of the horizontal axis), you could try the free “Better Histogram add-in for excel” from
TreePlan, available for download from the Histogram page at
http://www.treeplan.com/better.htm
A big problem with this approach is that the x-axis scaling (Bin) is chosen by Excel and it
will probably NOT make sense, eg counts in columns 4.88-5.13, etc. This is not an
acceptable x-axis scale. It should be 4.75-5.00 (counting in 0.25s) or 4.80-5.00 (counting
in 0.20s), that is, the usual scale tick numbers for easy reading and understanding.
By clicking on the x-axis and right mouse you can format the x-axis … BUT it is not
straightforward to obtain a correct scale with a reasonable starting point.
I suggest taking the Data Analysis approach whereby you select the starting value and set
class intervals for the x-axis scale!
This is now an ORDERED LIST with a corresponding RANDOM NUMBER. If the two
columns are kept together and sorted by the RANDOM column, the LIST
column will correspondingly be RANDOMISED:
Highlight the entire 26x2 array. You can include the headings to make it easier.
Open Data > Sort.
Tick “My data has headers” if column headings were included.
Sort by ...: in the drop-down menu select column “Random Number”.
The LIST will now be randomised, and the first 5 letters will
be my randomly chosen letters: D, R, P, N, and E in this
example.
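The same randomise-by-sorting idea can be sketched in Python. The list is assumed to be the 26 letters A to Z (suggested by the 26x2 array above), and the seed is arbitrary; a different seed, like a fresh set of Excel random numbers, gives a different sample.

```python
import random
import string

rng = random.Random(2019)  # arbitrary seed, used only so a rerun repeats the result

letters = list(string.ascii_uppercase)  # the ORDERED LIST: A, B, ..., Z
# Attach a random number to each letter (the RANDOM NUMBER column).
pairs = [(rng.random(), letter) for letter in letters]

pairs.sort()  # sort the rows by the random number
randomised = [letter for _, letter in pairs]
sample = randomised[:5]  # the first 5 letters form the random sample

print(sample)
```

Because every letter keeps its own random number during the sort, each letter is equally likely to land in the first five positions.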
Select the legend and “Delete”. Only one (x,y) pair (called a series) is plotted, so there is
no need for a legend.
Select the minor horizontal gridlines and “Delete”.
Change x-axis and y-axis labels by selecting each and typing an appropriate label with
units. This first appears only in the bar at the top … after typing, press “Enter” to place
the words at the selected axis.
Change each axis’ scale to start the plot so that the data fills the whole plot area:
Place the cursor near the x-axis or one of its values, click left mouse to select axis
Click right mouse for menu and select “Format Axis”
On the right hand side you will see a “Format Axis” pane and select “Axis
Options”. Change Minimum to “75” and press enter.
Repeat the step above by selecting the y-axis. Enter “100” as the minimum. Click
on the chart to accept.
Click on the trendline and in Format pane change to a solid line, black, and 0.75pt.
Move the equation box to a clearer position.
The “SUMMARY OUTPUT” is a large table that contains full regression information and
the important residual plot as on the following page:
In STA1010, the information needed is as for SCI1020 plus the inference on the slope:
the test P-value and the confidence interval on the slope (Lower 95% and Upper 95%)
for the X-variable.
Residual plot
This residual plot is randomly scattered with random + and – along the line of best fit. This
indicates that the linear relationship (described by this line of best fit) is appropriate to the
data in this range.
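To make the idea of a residual concrete, the sketch below fits a least-squares line and lists the residuals (observed y minus the y predicted by the line). The six (x, y) pairs are hypothetical and are not the data behind the plot described above.

```python
# Hypothetical (x, y) data, for illustration only.
x = [76.0, 80.0, 85.0, 90.0, 94.0, 99.0]
y = [104.0, 111.0, 118.0, 127.0, 131.0, 140.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept of the line of best fit.
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed y minus predicted y; these are what the residual plot shows.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"y = {slope:.3f}x + {intercept:.3f}")
print([round(r, 2) for r in residuals])
```

For a least-squares line the residuals always sum to zero, so the thing to look for in the plot is a pattern (such as a curve), not the overall balance of + and - values.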
NORMSDIST
Returns the standard normal cumulative distribution
function. The distribution has a mean of 0 (zero) and a
standard deviation of one.
NORMSDIST gives the probability, from the sample's z-value, of that sample or LESS.
Syntax in Excel is: =NORMSDIST(z)
Z is the value for which you want the distribution.
NB: The Standard Normal distribution in Excel and in tables is a LEFT-SIDED interval
area. If you want the RIGHT-SIDED area (the greater than probability) then the formula
is =1-NORMSDIST(z). The total area under any distribution is 1 (100% of possibilities).
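The left-sided and right-sided areas can be checked outside Excel with Python's statistics.NormalDist, used here as a stand-in for NORMSDIST. The z-value 1.96 is just an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)  # mean 0, standard deviation 1

z = 1.96                             # illustrative z-value
left_area = standard_normal.cdf(z)   # like =NORMSDIST(z): probability of z or LESS
right_area = 1 - left_area           # like =1-NORMSDIST(z): probability of MORE than z

print(round(left_area, 4), round(right_area, 4))
```

The two areas add to 1, which is the "total area under any distribution" rule stated above.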
NORMSINV
Returns the inverse of the standard normal cumulative distribution. The distribution has a
mean of zero and a standard deviation of one. This is NORMSDIST in reverse: given the
probability (that z-value or less) what is the standardised score, the z-value?
Syntax in Excel is: =NORMSINV(probability)
Probability is a probability corresponding to the normal distribution (as in the diagram: the
left hand interval area).
NB: The Standard Normal distribution in Excel and Tables is LEFT-SIDED interval area.
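Going in the reverse direction can be sketched the same way, with NormalDist.inv_cdf playing the role of NORMSINV. The probability 0.975 is an illustrative left-sided area.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # defaults to mean 0, standard deviation 1

probability = 0.975                         # LEFT-SIDED area (illustrative)
z = standard_normal.inv_cdf(probability)    # like =NORMSINV(probability)
round_trip = standard_normal.cdf(z)         # feeding z back returns the probability

print(round(z, 3), round(round_trip, 3))
```

For a right-sided area p, pass 1 - p, because the function always works with the left-sided area.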
TDIST
Returns the Percentage Points (probability) for the Student’s t-distribution where a numeric
value (x) is a calculated value of t for which the Percentage Points are to be computed. The
Student’s t-distribution is used to find the p-value in the hypothesis testing of means when
the population standard deviation, σ, is not known (as is the usual case).
TDIST gives the probability, from the sample's t-value, of that sample or more extreme.
Syntax in Excel is: =TDIST(x,degrees_freedom,tails)
X is the numeric value of the t-value.
degrees_freedom = sample size – 1 = (n-1)
Tails specifies the number of distribution tails to return. If
tails = 1, TDIST returns the one-tailed distribution. If tails
= 2, TDIST returns the two-tailed distribution.
NB: The t-distribution in Excel and tables gives a TAIL
area
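Python's standard library has no t-distribution, so the sketch below approximates the tail area by numerically integrating the t density with Simpson's rule. It illustrates what the tail area means and is not how Excel computes TDIST; the function names are made up for this example.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def t_tail(x, df, tails, steps=2000):
    """Approximate tail area for a t-value x >= 0 (tails = 1 or 2)."""
    if x < 0 or tails not in (1, 2):
        raise ValueError("x must be >= 0 and tails must be 1 or 2")
    # Simpson's rule for the area between 0 and x (steps is even).
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3
    one_tail = 0.5 - central  # half the total area lies above 0
    return tails * one_tail

# Sample size 11 gives degrees_freedom = 10.
print(round(t_tail(2.228, 10, 2), 4))  # two-tailed area
print(round(t_tail(1.812, 10, 1), 4))  # one-tailed area
```

The two-tailed value is simply double the one-tailed value, because the t-distribution is symmetric about zero.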
TINV
Returns the t-value of the Student's t-distribution as a function of the probability and the
degrees of freedom. This is TDIST in reverse: given the probability what is the t-value?
Syntax in Excel is =TINV(probability,degrees_freedom)
Probability is the probability associated with the two-tailed Student's t-distribution.
degrees_freedom is the number of degrees of freedom with which to characterize the
distribution, = sample size – 1 = (n-1)
Remarks
A one-tailed t-value can be returned by replacing probability with 2*probability.
For a probability of 0.05 and degrees of freedom of 10, the two-tailed value is
calculated with TINV(0.05,10), which returns 2.228. The one-tailed value for the
same probability and degrees of freedom can be calculated with TINV(2*0.05,10),
which returns 1.812.
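The "in reverse" relationship can be illustrated in Python by searching for the t-value whose two-tailed area equals the given probability. This is a numerical sketch (Simpson's rule plus bisection) with made-up function names and an arbitrary search limit of t = 50; it is not Excel's algorithm.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_area(t, df, steps=4000):
    """Approximate area in both tails beyond +/- t (t >= 0), by Simpson's rule."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

def t_inverse(probability, df, upper=50.0):
    """t-value whose two-tailed area is `probability`, found by bisection."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    if two_tailed_area(upper, df) > probability:
        raise ValueError("t-value lies beyond the search limit of this sketch")
    low, high = 0.0, upper
    for _ in range(60):
        mid = (low + high) / 2
        if two_tailed_area(mid, df) > probability:
            low = mid   # area still too big, so the t-value must be larger
        else:
            high = mid
    return (low + high) / 2

print(round(t_inverse(0.05, 10), 3))      # two-tailed, like TINV(0.05,10)
print(round(t_inverse(2 * 0.05, 10), 3))  # one-tailed, like TINV(2*0.05,10)
```

Doubling the probability for the one-tailed case works because the function always splits the given probability equally between the two tails.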
CHIDIST
Returns the one-tailed probability of the chi-squared distribution. The χ2 distribution is
associated with a χ2 test. Use the χ2 test to compare observed and expected values. By
comparing the observed results with the expected ones, you can decide whether your
original hypothesis is valid.
CHIDIST gives the probability, from the sample's chi-squared value, of that sample or
MORE EXTREME (the tail area).
Syntax in Excel is: =CHIDIST(x,degrees_freedom)
X is the value of chi-squared calculated from observed and expected counts.
degrees_freedom is the number of degrees of freedom = (#rows-1)x(#columns-1)
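The tail area described here can be approximated in Python by integrating the chi-squared density to the right of x. The sketch uses Simpson's rule over a wide but finite interval, so it is only suitable for modest degrees of freedom; the function names and the interval width are choices made for this illustration.

```python
import math

def chi2_pdf(x, df):
    """Density of the chi-squared distribution with df degrees of freedom (x > 0)."""
    k = df / 2
    return x ** (k - 1) * math.exp(-x / 2) / (2 ** k * math.gamma(k))

def chi2_tail(x, df, width=200.0, steps=20000):
    """Approximate area to the RIGHT of x, using Simpson's rule on [x, x + width]."""
    if x <= 0:
        raise ValueError("this sketch needs x > 0")
    h = width / steps
    total = chi2_pdf(x, df) + chi2_pdf(x + width, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(x + i * h, df)
    return total * h / 3

# A 2x2 table has (2-1)x(2-1) = 1 degree of freedom.
print(round(chi2_tail(3.841, 1), 4))
# A 2x3 table has (2-1)x(3-1) = 2 degrees of freedom.
print(round(chi2_tail(5.991, 2), 4))
```

The larger the chi-squared value, the smaller the tail area, and so the stronger the evidence that observed and expected counts differ.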
Here, for a two-sided test, the P-value = 0.076, indicating no significant evidence, at the 5%
level of significance, of a difference in means between the two populations.
Press OK
In this process, the difference for each pair is obtained (behind the scenes) and tested
against the “Hypothesized Mean Difference” (null value, µ0) as given by you.
Here, for a two-sided test, the P-value = 1.911×10^-5, indicating a highly significant mean
difference between the paired populations.
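The "behind the scenes" step can be sketched in Python: take the difference within each pair, then compare the mean difference with the hypothesized value. The five pairs below are hypothetical and are not the data used in this example; the sketch stops at the t statistic and does not compute the P-value.

```python
import math
import statistics

# Hypothetical paired measurements (same subjects measured twice).
before = [12.1, 11.8, 13.0, 12.5, 12.9]
after = [11.6, 11.5, 12.2, 12.1, 12.4]
hypothesized_mean_difference = 0.0  # the null value

# One difference per pair.
differences = [b - a for b, a in zip(before, after)]

n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)  # sample SD of the differences
standard_error = sd_diff / math.sqrt(n)

t_statistic = (mean_diff - hypothesized_mean_difference) / standard_error
degrees_freedom = n - 1

print(round(t_statistic, 3), degrees_freedom)
```

Working with the differences turns the paired problem into a one-sample t-test on a single column of numbers.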
Click OK.
Example: Number of insects trapped on traps of four different colours. Is there a difference
in the mean number trapped with the different colours of trap?
NB: Excel does NOT have a direct tool to perform this test on a contingency table.
It can only be used after setting up manually both the “Observed Counts” table and the
“Expected Counts” table and using a function command of CHITEST.
Select OK.
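Setting up the “Expected Counts” table by hand uses the rule expected = (row total x column total) / grand total. A minimal Python sketch with a hypothetical 2x2 table of observed counts:

```python
# Hypothetical observed counts (rows = groups, columns = outcomes).
observed = [
    [20, 30],
    [30, 20],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell = row total * column total / grand total.
expected = [[r * c / grand_total for c in column_totals] for r in row_totals]

# Chi-squared statistic = sum over all cells of (observed - expected)^2 / expected.
chi_squared = sum(
    (o - e) ** 2 / e
    for observed_row, expected_row in zip(observed, expected)
    for o, e in zip(observed_row, expected_row)
)
degrees_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected, chi_squared, degrees_freedom)
```

In Excel the same two tables are the inputs to CHITEST, which returns the P-value directly; this sketch stops at the chi-squared statistic and its degrees of freedom.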
Copy this formula down to cell B72 to show the conversion for all Celsius values.
Use the Microsoft Excel Help facility to find out more on how to “name cells and ranges”
for yourself.
6. FURTHER RESOURCES
Some useful resources for extra help may be:
Microsoft Excel Help, found on the top toolbar of the program, and Microsoft’s
Support page for Excel: http://office.microsoft.com/en-us/excel-help/, or
http://office.microsoft.com/en-us/excel-help/excel-help-and-how-to-FX101814052.aspx
Investigate the “Free Training” tutorials and choose your version.
The Data Analysis Toolpak was removed in Office for Mac versions between 2008 and
2015. However, the following is a free third-party tool (not supported by Microsoft!)
that offers similar functionality:
StatPlus:mac LE: http://www.analystsoft.com/en/products/statplusmacle/
http://www.youtube.com : the programs are of very mixed quality, but one suggestion is
an uploader called ExcelIsFun, who has a series on Excel Statistics, e.g. “Excel
Statistics 31 Histogram using Data Analysis Add In”.
Beware of statistical procedures on YouTube: many are simply wrong and most are
incomplete.