Week 5 Cheat Sheet

Statistics and Data Analysis with Excel, Part 1

Charlie Nuttelman

Here, I provide the mathematical equations and some of the important Excel functions required to
perform various calculations in Week 5 of the course. The headings represent the screencasts in which
you will find those calculations and concepts. Not all screencasts are referenced below, just the ones
that have complex mathematical formulas or Excel formulas that are tricky to use.

NOTE: For Week 5, I also have provided you with a cheat-sheet specifically for Excel formulas (“Excel
Functions for Continuous Distributions”). So, be sure to check that out, too!

Continuous Variables and Distributions


A continuous variable is one that can take on any value within a range, rather than being restricted to
the discrete values we discussed in the previous week. For example, 𝑥 can range from −∞ to +∞
(−∞ < 𝑥 < ∞), can be constrained to only positive values (𝑥 > 0), or can be constrained to a finite range
(e.g., 1 < 𝑥 < 3). For 𝑓(𝑥) to be a probability density function, all values of the function must be
non-negative [𝑓(𝑥) ≥ 0] and the total area underneath the function must equal unity [∫ 𝑓(𝑥)𝑑𝑥 = 1].
For continuous density functions, the probability that the random variable takes on any exact value is
equal to zero (e.g., 𝑃[𝑋 = 4] = 0).

As with discrete distributions, we can calculate probabilities by taking differences in the cumulative
distribution function $F(x)$:

$$P[a \le x \le b] = \int_a^b f(x)\,dx = F(b) - F(a)$$

We typically don’t need to perform the integration (the term between the two equal signs) since we can
either look up cumulative distribution functions or obtain them from Excel (or another computing tool).
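
As a minimal sketch of the relationship above, the following Python snippet compares a numerical integral of a density with the difference F(b) − F(a). The density 𝑓(𝑥) = 2𝑥 on 0 ≤ 𝑥 ≤ 1, the helper names, and the limits 0.25 and 0.75 are all hypothetical choices made for illustration; they do not come from the course.

```python
def pdf(x):
    """Hypothetical density f(x) = 2x on 0 <= x <= 1 (zero elsewhere)."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def cdf(x):
    """Matching cumulative distribution function F(x) = x^2 on [0, 1]."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return x * x


def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


a, b = 0.25, 0.75
prob_from_cdf = cdf(b) - cdf(a)            # F(b) - F(a)
prob_from_integral = integrate(pdf, a, b)  # numerical integral of f(x) from a to b
total_area = integrate(pdf, 0.0, 1.0)      # should be 1 for a valid density

print(prob_from_cdf, prob_from_integral, total_area)
```

Both routes give the same probability, which is why the cumulative function is normally all that is needed.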

The Uniform Distribution


The uniform distribution is defined by $f(x) = \frac{1}{b-a}$, where $x$ is constrained to $a \le x \le b$.
The function is a flat line with slope zero. The RAND() function in Excel will output a uniformly
distributed variable between 0 and 1, and it can be rescaled to output a uniformly distributed variable
between any two values $a$ and $b$, e.g., =a+(b-a)*RAND().
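
The same idea can be sketched in Python, where `random.random()` plays a role similar to Excel's RAND(). The function names `uniform_pdf`, `uniform_cdf`, and `scaled_rand`, and the limits 2 and 10, are assumptions made for this illustration only.

```python
import random


def uniform_pdf(x, a, b):
    """Density of the uniform distribution: 1 / (b - a) inside [a, b], else 0."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_cdf(x, a, b):
    """Cumulative probability P[X <= x] for the uniform distribution on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def scaled_rand(a, b, rng=random):
    """Rescale a draw from [0, 1) to the range between a and b."""
    return a + (b - a) * rng.random()


rng = random.Random(0)  # seeded so the sketch is repeatable
a, b = 2.0, 10.0        # hypothetical limits
draws = [scaled_rand(a, b, rng) for _ in range(1000)]

print(uniform_pdf(5.0, a, b))                              # height of the flat line
print(uniform_cdf(6.0, a, b) - uniform_cdf(4.0, a, b))     # P[4 <= X <= 6]
print(min(draws), max(draws))
```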

The Normal Distribution


The most common continuous distribution, and one that models many real-world phenomena very well,
is the normal distribution, also known as the Gaussian distribution or the “bell curve”. As always, we can
use the equation above in the “Continuous Variables and Distributions” section to determine
probabilities, where we can obtain the cumulative distribution function from tables or from Excel using
the NORM.DIST function. If we have a cumulative probability for the normal distribution, we can always
determine the corresponding value of 𝑥 using the NORM.INV function in Excel.
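
For readers working outside Excel, a rough Python equivalent can be sketched with `statistics.NormalDist` from the standard library. The mean of 100 and standard deviation of 15 are hypothetical values, and the pairing with the Excel functions in the comments is an approximate analogy rather than anything taken from the course files.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0       # hypothetical population mean and standard deviation
dist = NormalDist(mu, sigma)

# Cumulative probability P[X <= 115], comparable to NORM.DIST(115, 100, 15, TRUE).
p_below_115 = dist.cdf(115.0)

# Probability of an interval as a difference of cumulative probabilities.
p_between = dist.cdf(115.0) - dist.cdf(85.0)

# Value of x for a given cumulative probability, comparable to NORM.INV(0.90, 100, 15).
x_at_90th = dist.inv_cdf(0.90)

print(p_below_115, p_between, x_at_90th)
```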

Some useful properties of the normal distribution are that about 68% of the distribution lies within ±𝜎
(one standard deviation) of the mean, about 95% of the distribution lies within ±2𝜎 (two standard
deviations) of the mean, and about 99.7% of the distribution lies within ±3𝜎 (three standard deviations)
of the mean.
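
These percentages can be checked directly from the cumulative distribution. The short sketch below uses Python's `statistics.NormalDist`; the dictionary name `coverage` is just an illustrative choice.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Fraction of the distribution within k standard deviations of the mean:
# F(k) - F(-k) for k = 1, 2, 3.
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, fraction in coverage.items():
    print(f"within {k} standard deviation(s): {fraction:.4f}")
```

Because the fractions depend only on the number of standard deviations, the same values hold for any mean and standard deviation.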

Standardizing and Z-Values


The standard normal distribution is a normal distribution with mean zero and standard deviation equal
to 1. Standardizing was especially useful before computing tools (Excel being one) became mainstream:
it enabled one to convert a normally distributed variable to the standard normal distribution and use
the z-tables for probability calculations. Nevertheless, it’s still important to be able to convert between
normally distributed variables and the standard normal distribution.

The standardization equation is given by: $Z = \frac{X - \mu}{\sigma}$, where $X$ is the normally
distributed variable of interest, $\mu$ is the population mean, and $\sigma$ is the population standard
deviation from which the samples are derived.

Cumulative standard normal distribution tables can be helpful in calculating probabilities associated with
standard normal distributions. The NORM.S.DIST and NORM.S.INV functions are useful Excel functions
for dealing with the standard normal distribution.
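
A minimal Python sketch of standardizing is shown here, assuming a hypothetical population with mean 50 and standard deviation 4. The helper `standardize` is an illustrative name; the comparison with NORM.S.DIST in the comment is an analogy.

```python
from statistics import NormalDist


def standardize(x, mu, sigma):
    """Convert a value x from a normal distribution to its z-value."""
    return (x - mu) / sigma


mu, sigma = 50.0, 4.0   # hypothetical population parameters
x = 56.0

z = standardize(x, mu, sigma)

# Cumulative probability from the standard normal distribution,
# comparable to NORM.S.DIST(z, TRUE) in Excel.
p_via_z = NormalDist().cdf(z)

# The same probability computed directly from the original distribution.
p_direct = NormalDist(mu, sigma).cdf(x)

print(z, p_via_z, p_direct)
```

The two probabilities agree, which is the point of standardizing: one table (or one function) serves every normal distribution.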

Inverse Normal Distribution Calculations


Oftentimes, we need to know the z-value (for the standard normal distribution) or x-value (for
normally distributed variables) that corresponds to a given cumulative probability. We can perform
inverse normal calculations by looking up data in cumulative standard normal distribution tables.
Alternatively, we can use the NORM.INV or NORM.S.INV functions in Excel to do the same.
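
The inverse calculation can be sketched in Python with `NormalDist.inv_cdf`. The cumulative probability of 0.975 and the population parameters below are hypothetical, and the Excel function names in the comments are given only as rough counterparts.

```python
from statistics import NormalDist

p = 0.975               # hypothetical cumulative probability
mu, sigma = 50.0, 4.0   # hypothetical population parameters

# z-value for the given cumulative probability, comparable to NORM.S.INV(0.975).
z = NormalDist().inv_cdf(p)

# Convert the z-value back to the original scale by rearranging Z = (X - mu) / sigma.
x_from_z = mu + z * sigma

# The same x-value obtained in one step, comparable to NORM.INV(0.975, 50, 4).
x_direct = NormalDist(mu, sigma).inv_cdf(p)

print(z, x_from_z, x_direct)
```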

Exponential Distribution
Another important continuous distribution is the exponential distribution. The exponential distribution
is commonly associated with Poisson processes. It describes the interval (time, length, area, or volume)
between successive events of a Poisson process. It is common to use the cumulative exponential
distribution to calculate probabilities. The cumulative exponential distribution is given by:

$$F(x) = 1 - e^{-\lambda x}$$

Here, $\lambda$ is the long-term average rate (events per interval). The EXPON.DIST function in Excel is
useful for probability calculations related to the exponential distribution.
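
As a worked illustration of the cumulative exponential distribution, the sketch below assumes a hypothetical rate of 3 events per hour. The function name `expon_cdf` is illustrative; its result is comparable to EXPON.DIST(x, lambda, TRUE) in Excel.

```python
import math


def expon_cdf(x, lam):
    """Cumulative exponential distribution F(x) = 1 - exp(-lam * x) for x >= 0."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-lam * x)


lam = 3.0  # hypothetical rate: 3 events per hour

# Probability that the next event occurs within half an hour.
p_within_half_hour = expon_cdf(0.5, lam)

# Probability that the wait is between half an hour and one hour: F(1) - F(0.5).
p_between = expon_cdf(1.0, lam) - expon_cdf(0.5, lam)

print(p_within_half_hour, p_between)
```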

Other Continuous Distributions


Other continuous distributions that you might encounter in statistics and applied data analysis include:
the gamma distribution (GAMMA.DIST in Excel, constrained to 𝑥 > 0); the Weibull distribution
(WEIBULL.DIST in Excel, also constrained to 𝑥 > 0); the triangular distribution (constrained to a finite
range in 𝑥); the beta distribution (BETA.DIST in Excel, constrained to 0 ≤ 𝑥 ≤ 1); and the beta-PERT
distribution (can use the BETA.DIST function in Excel, constrained to a finite range in 𝑥).
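
To get a feel for the ranges listed above, one option is to draw random samples from each distribution and inspect them. The sketch below uses Python's `random` module; all parameter values are hypothetical, and note that it generates samples rather than computing probabilities the way the Excel *.DIST functions do. The last line rescales beta draws to a finite range, which is only the general idea behind a beta-PERT variable, not a full beta-PERT parameterization.

```python
import random

rng = random.Random(1)  # seeded so the sketch is repeatable
n = 1000

# Gamma: shape 2.0, scale 1.5 (values are positive).
gamma_draws = [rng.gammavariate(2.0, 1.5) for _ in range(n)]

# Weibull: scale 1.5, shape 2.0 in Python's argument order (values are positive).
weibull_draws = [rng.weibullvariate(1.5, 2.0) for _ in range(n)]

# Triangular: low 1.0, high 3.0, mode 2.5 (values stay inside the finite range).
triangular_draws = [rng.triangular(1.0, 3.0, 2.5) for _ in range(n)]

# Beta: parameters 2.0 and 5.0 (values stay between 0 and 1).
beta_draws = [rng.betavariate(2.0, 5.0) for _ in range(n)]

# A beta draw rescaled to a hypothetical finite range [10, 20].
low, high = 10.0, 20.0
rescaled_beta_draws = [low + (high - low) * b for b in beta_draws]

for name, draws in [
    ("gamma", gamma_draws),
    ("weibull", weibull_draws),
    ("triangular", triangular_draws),
    ("beta", beta_draws),
    ("rescaled beta", rescaled_beta_draws),
]:
    print(f"{name}: min={min(draws):.3f}, max={max(draws):.3f}")
```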

Probability Plots
A probability plot is used to determine whether a set of sample data likely came from a normally
distributed population. Typically, for the data to be considered consistent with a normally distributed
population, the P-value of the Anderson-Darling (AD) statistic should be greater than 0.05. If the P-value
of the AD statistic is less than 0.05, then it can be concluded that the data do not come from a normally
distributed population.

Many sophisticated software tools will provide probability plots and P-values of the AD statistic. Excel,
unfortunately, won’t do this, but in the course, I provide a file “Probability plot.xlsx” that will do this for
you. Please use this file to explore normality of the data; this will be much more important in Part 2 of
the course.
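
For readers curious about what such a tool computes, here is a rough Python sketch. It is not a reproduction of “Probability plot.xlsx”: the plotting position (i − 0.5)/n is one common convention assumed here, the AD statistic uses the standard textbook formula with the mean and standard deviation estimated from the sample, and the conversion of the statistic to a P-value is deliberately left out. The two samples are synthetic and hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def probability_plot_points(data):
    """Return (theoretical z quantile, sorted observation) pairs for a normal plot."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    # Plotting position (i - 0.5) / n is an assumed convention; others exist.
    return [(std_normal.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]


def anderson_darling_statistic(data):
    """AD statistic A^2 against a normal distribution fitted to the sample."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    cdf = [fitted.cdf(x) for x in xs]
    total = sum(
        (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1.0 - cdf[n - i]))
        for i in range(1, n + 1)
    )
    return -n - total / n


n = 20
positions = [(i - 0.5) / n for i in range(1, n + 1)]

# Synthetic sample that follows a normal shape closely (hypothetical data).
near_normal = [10.0 + 2.0 * NormalDist().inv_cdf(p) for p in positions]

# Synthetic right-skewed sample built from exponential quantiles (hypothetical data).
skewed = [-math.log(1.0 - p) for p in positions]

points = probability_plot_points(near_normal)
a2_normal = anderson_darling_statistic(near_normal)
a2_skewed = anderson_darling_statistic(skewed)

print(a2_normal, a2_skewed)
```

Points that fall close to a straight line on the plot, together with a small AD statistic, are what one would expect from normally distributed data; the skewed sample gives a larger statistic.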
