Stat Quick Overview
Descriptive Statistics
Methods of organizing, summarizing, and presenting data in an informative way.
Example: the United States government reports the population of the United
States was 179,323,000 in 1960; 203,302,000 in 1970; 226,542,000 in 1980;
248,709,000 in 1990; 265,000,000 in 2000; and 308,400,000 in 2010. This information is
descriptive statistics.
Inferential Statistics
The methods used to estimate a property of a population on the basis of a sample.
POPULATION The entire set of individuals or objects of interest or the measurements obtained
from all individuals or objects of interest.
SAMPLE A portion, or part, of the population of interest.
Types of Variables
1. Quantitative (Numeric)
● Discrete (countable whole-number values, e.g., number of children)
● Continuous (can take any value within an interval, e.g., weight)
2. Qualitative (Nonnumeric)
Level of Measurement
Nominal:
1. The variable of interest is divided into categories or outcomes.
2. There is no natural order to the outcomes.
The classification of the six colors of M&M’s milk chocolate candies is an example of the
nominal level of measurement.
Ordinal:
1. Data classifications are represented by sets of labels or names (high, medium, low) that have
relative values.
2. Because of the relative values, the data classified can be ranked or ordered.
One classification is “higher” or “better” than the next one. That is, “Superior” is better than
“Good,” “Good” is better than “Average,” and so on. However, we are not able to distinguish the
magnitude of the differences between groups.
Interval:
1. Data classifications are ordered according to the amount of the characteristic they possess.
2. Equal differences in the characteristic are represented by equal differences in the
measurements.
An example of the interval level of measurement is temperature. Equal differences between two
temperatures are the same, regardless of their position on the scale.
Ratio:
1. Data classifications are ordered according to the amount of the characteristics
they possess.
2. Equal differences in the characteristic are represented by equal differences in the numbers
assigned to the classifications.
3. The zero point is the absence of the characteristic and the ratio between two numbers is
meaningful.
Examples of the ratio scale of measurement include wages, units of production, weight, changes
in stock prices, distance between branch offices, and height.
Chapter 2
PIE CHART A chart that shows the proportion or percentage that each class represents of the
total number of frequencies.
HISTOGRAM A graph in which the classes are marked on the horizontal axis and the class
frequencies on the vertical axis. The class frequencies are represented by the heights of the bars,
and the bars are drawn adjacent to each other.
FREQUENCY POLYGON It consists of line segments connecting the points formed by the
intersections of the class midpoints and the class frequencies.
HISTOGRAM & FREQUENCY POLYGON Both the histogram and the frequency polygon
allow us to get a quick picture of the main characteristics of the data (highs, lows, points of
concentration, etc.). Although the two representations are similar in purpose, the histogram has
the advantage of depicting each class as a rectangle, with the height of the rectangular bar
representing the number in each class. The frequency polygon, in turn, has an advantage over the
histogram. It allows us to compare directly two or more frequency distributions.
Population Mean
The arithmetic mean of all the values in the population: µ = ∑X / N
where:
µ represents the population mean. It is the Greek lowercase letter “mu.”
N is the number of values in the population.
X represents any particular value.
∑ is the Greek capital letter “sigma” and indicates the operation of adding.
∑X is the sum of the X values in the population.
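The formula µ = ∑X / N can be checked with a short sketch; the ages below are a made-up population, not data from the text.

```python
# Hypothetical population: ages of all six employees in a small office
population = [32, 41, 27, 35, 41, 30]

N = len(population)       # N = number of values in the population
mu = sum(population) / N  # mu = (sum of all X values) / N

print(mu)  # 34.33...
```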
PARAMETER A characteristic of a population.
Sample Mean
The sum of the sample values divided by the number of values in the sample: x̄ = ∑x / n.
STATISTIC A characteristic of a sample.
Median:
The major properties of the median are:
1. It is not affected by extremely large or small values. Therefore, the median is a valuable
measure of location when such values do occur.
2. It can be computed for ordinal-level data or higher.
Mode:
Advantage: we can determine the mode for all levels of data—nominal, ordinal, interval, and
ratio. The mode also has the advantage of not being affected by extremely high or low values.
Disadvantage: For many sets of data, there is no mode because no value appears more than
once. Conversely, for some datasets there is more than one mode. (A distribution with two modes
is called bimodal.)
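Python's statistics module computes all three averages directly; the data set below is hypothetical, and multimode is the tool to reach for when a data set is bimodal.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

print(statistics.mean(data))     # arithmetic mean: 40 / 8 = 5
print(statistics.median(data))   # middle values averaged: (4 + 5) / 2 = 4.5
print(statistics.mode(data))     # most frequent value: 4

# multimode returns every mode, so it handles bimodal data gracefully
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2]
```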
Zero Skewness:
● Distribution is symmetrical.
● Mean, Mode, Median are located in the center. All are equal.
Positive Skewness:
● The long tail of the distribution is to the right.
● The arithmetic mean is the highest of the three measures because the mean is most affected
by extreme values: Mean > Median > Mode.
Negative Skewness:
● The long tail of the distribution is to the left.
● The mean is the lowest: Mean < Median < Mode.
Geometric Mean:
Useful for averaging percentage changes. It has wide application in business and economics
because we are often interested in finding the percentage changes in sales, salaries, or economic
figures.
Measures of Dispersion
RANGE The simplest measure of dispersion is the range. It is the difference between the largest
and the smallest values in a data set.
MEAN DEVIATION The arithmetic mean of the absolute values of the deviations
from the arithmetic mean.
VARIANCE The arithmetic mean of the squared deviations from the mean.
STANDARD DEVIATION The square root of the variance.
Interpretation of Standard Deviation: Standard deviation (SD) measures how spread out the
values in a dataset are from the mean (average).
Small SD → Data points are close to the mean (low variability).
Large SD → Data points are spread out from the mean (high variability).
A dataset with a higher SD is more spread out and less consistent.
A dataset with a lower SD is more tightly clustered around the mean.
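A minimal sketch of the measures of dispersion above, using Python's statistics module and invented numbers:

```python
import statistics

data = [10, 12, 14, 16, 18]          # hypothetical data set, mean = 14

data_range = max(data) - min(data)   # range: largest minus smallest
mad = sum(abs(x - statistics.mean(data)) for x in data) / len(data)  # mean deviation
pvar = statistics.pvariance(data)    # population variance (divide by N)
psd = statistics.pstdev(data)        # population SD = sqrt(variance)
svar = statistics.variance(data)     # sample variance (divide by n - 1)

print(data_range, mad, pvar, svar)   # 8 2.4 8 10
```

Note the two variance functions: pvariance divides by N (population), while variance divides by n − 1 (sample).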
CHEBYSHEV’S THEOREM
For any set of observations (sample or population), the proportion of the values that lie within k
standard deviations of the mean is at least 1 − 1/k², where k is any constant greater than 1.
● k = 2 (Two standard deviations) At least 75% of values lie within ±2 SDs of the mean.
● k = 3 (Three standard deviations) At least 88.9% of values lie within ±3 SDs of the
mean.
● k = 4 (Four standard deviations) At least 93.75% of values lie within ±4 SDs of the
mean.
● Unlike the Empirical Rule, which only applies to normal distributions, Chebyshev’s
Theorem applies to all distributions.
● Useful when the shape of the distribution is unknown or skewed.
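A quick empirical check of the theorem on a made-up, right-skewed data set (k = 2, so the guaranteed bound is 1 − 1/4 = 0.75):

```python
import statistics

data = [1, 1, 2, 2, 3, 3, 4, 5, 8, 20]   # hypothetical, right-skewed data
mu = statistics.mean(data)
sd = statistics.pstdev(data)

k = 2
within = [x for x in data if abs(x - mu) <= k * sd]
proportion = len(within) / len(data)

bound = 1 - 1 / k ** 2   # Chebyshev's guaranteed minimum: 0.75 for k = 2
print(proportion, bound, proportion >= bound)
```

The actual proportion inside ±2 SDs exceeds the bound, as the theorem guarantees for any distribution.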
Dot Plot
A dot plot groups the data as little as possible, and we do not lose the identity of an individual
observation. To develop a dot plot, we simply display a dot for each observation along a
horizontal number line indicating the possible values of the data. If there are identical
observations or the observations are too close to be shown individually, the dots are “piled” on
top of each other. This allows us to see the shape of the distribution, the value about which the
data tend to cluster, and the largest and smallest observations.
Dot plots are most useful for smaller data sets, whereas histograms tend to be most useful
for large data sets.
Box Plots
A box plot is a graphical display, based on quartiles, that helps us picture a set of data. To
construct a box plot, we need only five statistics: the minimum value, Q1 (the first quartile), the
median, Q3 (the third quartile), and the maximum value.
For example, a box plot of delivery times might show that the middle 50 percent of deliveries
take between 15 minutes and 22 minutes. The distance between the ends of the box, 7 minutes, is
the interquartile range: the distance between the first and the third quartile. It shows the spread
or dispersion of the majority of deliveries.
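The five statistics can be pulled from raw data with statistics.quantiles; the delivery times below are invented, and method="inclusive" matches the usual textbook percentile positions.

```python
import statistics

deliveries = [13, 15, 15, 17, 18, 20, 22, 22, 25, 30]  # hypothetical minutes

q1, med, q3 = statistics.quantiles(deliveries, n=4, method="inclusive")
five_number = (min(deliveries), q1, med, q3, max(deliveries))
iqr = q3 - q1   # interquartile range: spread of the middle 50 percent

print(five_number, iqr)  # (13, 15.5, 19.0, 22.0, 30) 6.5
```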
Coefficient of Skewness:
Pearson's coefficient of skewness: sk = 3(x̄ − Median) / s, where s is the sample standard
deviation. It ranges from about −3 to +3; a value of 0 indicates a symmetric distribution.
Relationship Between Two Variables
When we study the relationship between two variables, we refer to the data as bivariate. One
graphical technique we use to show the relationship between variables is called a scatter
diagram. We scale one variable along the horizontal axis (X-axis) of a graph and the other
variable along the vertical axis (Y-axis). Usually one variable depends to some degree on the
other. A scatter diagram requires that both of the variables be at least interval scale.
Histogram vs. Dot Plot
● Histogram: useful for large data sets; bars represent the frequency of values falling in each
interval (class).
● Dot plot: useful for small data sets; each dot represents an individual observation, so no
identity is lost.
Chapter 5
PROBABILITY A value between zero and one, inclusive, describing the relative possibility
(chance or likelihood) an event will occur.
EXPERIMENT A process that leads to the occurrence of one and only one of several possible
observations.
OUTCOME A particular result of an experiment.
EVENT A collection of one or more outcomes of an experiment.
Objective probability is subdivided into
(1) classical probability and (2) empirical probability.
1. Classical Probability
Classical probability is based on the assumption that the outcomes of an experiment are equally
likely.
MUTUALLY EXCLUSIVE The occurrence of one event means that none of the other events
can occur at the same time. The variable “gender” presents mutually exclusive outcomes, male
and female. An employee selected at random is either male or female but cannot be both.
COLLECTIVELY EXHAUSTIVE At least one of the events must occur when an experiment
is conducted.
2. Empirical Probability
Empirical, or relative frequency, probability is the second type of objective probability. The
empirical probability of an event happening is the fraction of the time similar events happened
in the past.
The empirical approach to probability is based on what is called the law of large numbers. The
key to establishing probabilities empirically is that more observations will provide a more
accurate estimate of the probability.
LAW OF LARGE NUMBERS Over a large number of trials, the empirical probability of an
event will approach its true probability.
To explain the law of large numbers, suppose we toss a fair coin. Based on the classical
definition of probability, the likelihood of obtaining a head in a single toss of a fair coin is .5.
Based on the empirical or relative frequency approach to probability, the probability of the event
happening approaches the same value based on the classical definition of probability.
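The coin-toss argument can be simulated in a few lines; the seed is arbitrary and just makes the run reproducible.

```python
import random

random.seed(42)  # arbitrary seed for a reproducible run

def empirical_head_probability(tosses):
    # Fraction of simulated fair-coin tosses that came up heads
    heads = sum(random.random() < 0.5 for _ in range(tosses))
    return heads / tosses

# As the number of tosses grows, the empirical probability drifts toward .5
for n in (10, 1_000, 100_000):
    print(n, empirical_head_probability(n))
```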
Subjective Probability
The likelihood (probability) of a particular event happening that is assigned by an individual
based on whatever information is available. Basically, guessing the probability on the basis of
information available.
Rules of Addition
To apply the special rule of addition, the events must be mutually exclusive. Mutually
exclusive means that when one event occurs, none of the other events can occur at the same time.
Complement Rule
The probability that a bag of mixed vegetables selected is underweight, P(A), plus the probability
that it is not an underweight bag, written P(~A) and read “not A,” must logically equal 1.
The General Rule of Addition
If not mutually exclusive,
What is the probability a selected person visited either Disney World or Busch Gardens? (1) Add
the probability that a tourist visited Disney World and the probability he or she visited Busch
Gardens, and (2) subtract the probability of visiting both. For the expression P(A or B), the word
“or” suggests that A may occur or B may occur. This also includes the possibility that A and B
may occur. This use of “or” is sometimes called an inclusive or.
JOINT PROBABILITY A probability that measures the likelihood two or more events will
happen concurrently.
Rules of Multiplication
Special Rule of Multiplication:
The special rule of multiplication requires that two events A and B are independent.
INDEPENDENCE The occurrence of one event has no effect on the probability of the
occurrence of another event. For example, when event B occurs after event A occurs, does A
have any effect on the likelihood that event B occurs? If the answer is no, then A and B are
independent events.
Bayes’ Theorem
It helps revise an initial probability (prior) based on new evidence (likelihood). The updated
probability (posterior) tells us how likely an event is given the new information.
PRIOR PROBABILITY The initial probability based on the present level of information.
POSTERIOR PROBABILITY A revised probability based on additional information.
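A worked sketch of Bayes' theorem with invented numbers: a 1% defect rate plays the role of the prior, the test result is the new evidence, and the posterior is the revised probability.

```python
# Hypothetical inspection problem (all figures invented):
p_defect = 0.01              # prior: 1% of parts are defective
p_flag_given_defect = 0.95   # the test flags 95% of defective parts
p_flag_given_good = 0.08     # ...but also flags 8% of good parts

# Total probability that a randomly chosen part gets flagged
p_flag = (p_flag_given_defect * p_defect
          + p_flag_given_good * (1 - p_defect))

# Bayes' theorem: posterior P(defective | flagged)
posterior = p_flag_given_defect * p_defect / p_flag
print(round(posterior, 3))  # 0.107 -- most flagged parts are still good
```

Even with a sensitive test, the posterior stays low because the prior (1% defective) is so small.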
Chapter 6
A random variable is a numerical value that represents the outcome of a random event. It
assigns numbers to different possible outcomes in a probability experiment.
Chapter 7
Uniform Distribution
The distribution’s shape is rectangular and has a minimum value of a and a maximum of b.
Why is this rectangular?
The uniform distribution is called rectangular because its probability is evenly spread across
all possible values. This means that every outcome in the range has the same probability,
creating a flat, rectangular shape when graphed.
Example: rolling a fair die. Every number has the same probability (a discrete analogue of the
uniform distribution).
z-Value:
The signed distance between a selected value, designated X, and the mean, divided by the
standard deviation: z = (X − µ) / σ.
Interpretation: The z-value (z-score) measures how far a data point (or sample mean) is from
the population mean, in terms of standard deviations. It helps us determine how unusual or
typical a value is in a normal distribution.
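The definition z = (X − µ) / σ in code, with a made-up mean of 70 and SD of 8:

```python
def z_score(x, mu, sigma):
    # Signed distance of x from the mean, in standard-deviation units
    return (x - mu) / sigma

# Hypothetical exam scores: mean 70, SD 8
print(z_score(86, 70, 8))  # 2.0  -> 86 is two SDs above the mean
print(z_score(62, 70, 8))  # -1.0 -> 62 is one SD below the mean
```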
Application of Standard Normal Distribution
Chapter 8
Reasons to Sample
1. To contact the whole population would be time consuming.
2. To cut costs
3. The physical impossibility of checking all items in the population.
4. The destructive nature of some tests.
5. The sample results are adequate.
SIMPLE RANDOM SAMPLE A sample selected so that each item or person in the population
has the same chance of being included.
SYSTEMATIC RANDOM SAMPLE A random starting point is selected, and then every kth
member of the population is selected. k is calculated as the population size divided by the sample
size.
STRATIFIED RANDOM SAMPLE A population is divided into subgroups, called strata, and
a sample is randomly selected from each stratum.
CLUSTER SAMPLE A population is divided into clusters using naturally occurring geographic
or other boundaries. Then, clusters are randomly selected and a sample is collected by randomly
selecting from each cluster.
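The first two designs can be sketched with Python's random module; the population of 100 IDs and the sample size of 10 are invented.

```python
import random

random.seed(1)  # arbitrary, for reproducibility
population = list(range(1, 101))  # hypothetical population of 100 member IDs
sample_size = 10

# Systematic random sample: k = N / n, random start, then every k-th member
k = len(population) // sample_size
start = random.randrange(k)
systematic = population[start::k]

# Simple random sample: every member has the same chance of selection
srs = random.sample(population, sample_size)

print(k, systematic)
print(srs)
```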
Simple Random Sampling (SRS)
● Definition: every individual in the population has an equal chance of being selected.
● How: use a random method to select individuals from the entire population.
● When to use: when the population is homogeneous and a completely random selection is
needed.
● Advantages: reduces bias; easy to implement.
● Disadvantages: may not represent subgroups well; can be inefficient for large populations.
Systematic Random Sampling
● Definition: individuals are selected at regular intervals from an ordered list.
● How: choose a starting point randomly, then select every k-th individual (where
k = population size / sample size).
● When to use: when a simple and quick method is needed for large populations.
● Advantages: easy to execute; ensures good coverage of the population.
● Disadvantages: can introduce bias if there is a hidden pattern in the population.
Sampling Distribution of the Sample Mean
1. The mean of the distribution of sample means will be exactly equal to the population mean if
we are able to select all possible samples of the same size from a given population.
2. There will be less dispersion in the sampling distribution of the sample mean than in the
population.
The z values express the sampling error in standard units—in other words, the standard error.
Chapter 9
Point Estimates
A point estimate is a single value (point) derived from a sample and used to estimate a
population value. For example, suppose we select a sample of 50 junior executives and ask how
many hours they worked last week. Compute the mean of this sample of 50 and use the value of
the sample mean as a point estimate of the unknown population mean.
Confidence Intervals
A range of values constructed from sample data so that the population parameter is likely to
occur within that range at a specified probability. The specified probability is called the level of
confidence.
Population SD known: use the z distribution to construct the interval.
Population SD unknown: use the t distribution with the sample standard deviation.
Point Estimate vs. Confidence Interval
● Precision: a point estimate provides only a single value as an estimate; a confidence interval
provides a range that accounts for variability.
● Accuracy: a point estimate may not always be close to the true population parameter; a
confidence interval is more reliable since it accounts for sampling error.
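When the population SD is known, the z-based interval is x̄ ± z·σ/√n. A sketch with invented numbers (n = 49, x̄ = 24, σ = 3.5, 95% confidence):

```python
import math

n, x_bar, sigma = 49, 24.0, 3.5   # hypothetical sample summary
z = 1.96                          # z value for 95% confidence

margin = z * sigma / math.sqrt(n)     # z times the standard error sigma/sqrt(n)
ci = (x_bar - margin, x_bar + margin)
print(round(ci[0], 2), round(ci[1], 2))  # 23.02 24.98
```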
Sample Proportion
p = X / n, where X = number of successes and n = sample size.
Characteristics of the t Distribution
● The t-distribution is symmetrical and bell-shaped, similar to the normal (Z) distribution.
● However, it has heavier tails, meaning it accounts for more variability.
For proportions,
π = the population proportion.
Chapter 10
NULL HYPOTHESIS A statement about the value of a population parameter developed for the
purpose of testing numerical evidence. Written H0 and read “H sub zero.”
ALTERNATE HYPOTHESIS A statement that is accepted if the sample data provide
sufficient evidence that the null hypothesis is false. Written H1 and read “H sub one.”
The null hypothesis will always contain the equal sign. =, ≥, ≤
For an alternate hypothesis, it must contain ≠, >, <.
LEVEL OF SIGNIFICANCE The probability of rejecting the null hypothesis when it is true.
It is also sometimes called the level of risk.
Decision outcomes:
● H0 is true and we fail to reject it → correct decision.
● H0 is true and we reject it → Type I error (probability α).
● H0 is false and we fail to reject it → Type II error (probability β).
● H0 is false and we reject it → correct decision.
TEST STATISTIC A value, determined from sample information, used to determine whether to
reject the null hypothesis.
CRITICAL VALUE The dividing point between the region where the null hypothesis is
rejected and the region where it is not rejected.
DECISION MAKING
H0 is rejected if:
Left-tailed, Zcalc < Zcrit (where Zcrit is negative). Used when testing whether the mean is
significantly less than a specified value. Example: A researcher wants to test if the average blood
pressure of a group is lower than 120 mmHg.
Right-tailed, Zcalc > Zcrit. Used when testing if the mean is significantly greater than a
specified value. Example: A factory wants to know if a machine produces more than 500 units
per hour on average.
Two-tailed, |Zcalc| > |Zcrit|. It is used when we are concerned with deviations in both
directions, not just one. We reject H0 if the test statistic falls into either of the two extreme tails
(upper or lower). The rejection region is split into two parts (each at α/2).
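The two-tailed decision rule, with invented numbers (H0: µ = 200, α = 0.05, so the critical value is 1.96):

```python
import math

# Hypothetical: H0 says mu = 200; a sample of n = 100 gave x_bar = 203.5
mu0, x_bar, sigma, n = 200, 203.5, 16, 100

z_calc = (x_bar - mu0) / (sigma / math.sqrt(n))  # test statistic
z_crit = 1.96                                    # two-tailed cutoff, alpha = .05

reject_h0 = abs(z_calc) > z_crit
print(round(z_calc, 4), reject_h0)  # 2.1875 True
```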
Differences in One- and Two-Tailed Tests
A one-tailed test places the entire rejection region (α) in one tail; a two-tailed test splits it into
two tails (α/2 in each).
P Value
The probability of observing a sample value as extreme as, or more extreme than, the value
observed, given that the null hypothesis is true.
1. The p-value helps determine statistical significance.
2. A small p-value (≤0.05) suggests rejecting H0
3. The p-value must be compared to the significance level (α)
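For a z-based test, the two-tailed p-value can be computed without any extra libraries using math.erf; the z value of 2.19 below is invented.

```python
import math

def normal_cdf(z):
    # Standard normal cumulative probability via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z_calc = 2.19                                 # hypothetical test statistic
p_value = 2 * (1 - normal_cdf(abs(z_calc)))   # both tails, so times 2

print(round(p_value, 4))
print(p_value <= 0.05)  # small p-value -> reject H0
```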
Type II Error
Failing to reject the null hypothesis when it is actually false. Its probability is denoted β.
Chapter 11
Used when we want to know the difference between the means or proportions of two different
samples.
Degrees of freedom (df) refer to the number of values in a calculation that are free to vary while
still satisfying a given condition. For example, if you have 5 numbers with a known mean, only 4
numbers can be freely chosen because the last one is fixed by the mean.
In a t-test:
● For a one-sample t-test: df=n−1 (since the sample mean is already known).
● For an independent two-sample t-test (pooled): df=n1+n2−2
● For a paired t-test: df=n−1 (since differences are calculated first).
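With invented sample sizes, the three df rules give:

```python
n = 12            # one-sample or paired t-test: df = n - 1
n1, n2 = 10, 14   # independent two-sample (pooled) t-test: df = n1 + n2 - 2

print(n - 1)        # 11
print(n1 + n2 - 2)  # 22
```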
When to use which two-sample t-test?
● Pooled t-test: when the sample variances are similar (check using Levene’s test or an F-test).
● Unpooled (Welch’s) t-test: when the sample variances are significantly different.