Inferential Statistics Lecture 2
ONE-SAMPLE t-TEST
Recall:
LEFT TAIL        TWO-TAIL TEST      RIGHT TAIL
Less than        Not equal          Greater than
Below            Different from     Above
Lower than       Changed from       Higher than
Smaller than     Not the same as    Longer than
Shorter than                        Bigger than
Decreased                           Increased
Reduced from                        At least
At most
Source: Hartmann, K., Krois, J., Waske, B. (2018): E-Learning Project SOGA: Statistics and Geospatial Data Analysis. Department of Earth Sciences, Freie Universitaet Berlin.
A statistician makes decisions about claims through a process called “hypothesis testing”:
i. A hypothesis test involves collecting data from a sample and evaluating the data.
ii. The statistician then decides whether there is sufficient evidence, based on the analysis of the data, to reject the null hypothesis.
Null hypothesis (H0): the hypothesis that assumes there is no difference between the population parameters of the groups being tested.
Under this assumption, any apparent difference between sample statistics is the result of sampling error.
Say,
Shapes of Distribution
1. Symmetry (Symmetrical or asymmetrical)
2. Skew (Right or Left)
3. Peak or Modes (Unimodal, bimodal, or multimodal)
4. Spread (Narrow, Wide)
Source: http://homepage.stat.uiowa.edu/~rdecook/stat1010/notes/Section_4.2_distribution_shapes.pdf
Where,
(a) Left-skewed (Negatively-skewed) Distribution: The mean and median are LESS than the mode. It has a long LEFT tail, and the mean lies to the LEFT of the peak.
(b) Right-skewed (Positively-skewed) Distribution: The mean and median are GREATER than the mode. It has a long RIGHT tail, and the mean lies to the RIGHT of the peak.
(c) Symmetric Distribution: The mean, median, and mode are EQUAL.
Source: lynnschools.org
Several unimodal distributions plotted on the same graph. The green “bell curve” is the normal distribution.
Source: https://www.statisticshowto.com/shapes-of-distributions/
NORMAL DISTRIBUTION
o Is also called the bell curve or the normal curve
o Describes the tendency for the data to cluster around a central value (the population mean μ, which is located in the middle of the curve)
o Parameters affecting the Normal Curve
a. POPULATION MEAN (μ) characterizes the position of the Normal
Distribution:
Recall:
Mean is the mathematical average of the data. It is the sum of
all data values in your data set divided by the total number of
values you have in the data set. It is used to represent the
central tendency of the data. Most of the values in the
normally distributed data will be clustered around the mean.
Source: https://www.omnicalculator.com/statistics/normal-distribution
b. STANDARD DEVIATION (σ) characterizes the spread of the Normal Distribution:
(a) The LARGER the standard deviation, the more spread out the distribution will be; and (b) the SMALLER the standard deviation, the less spread out the distribution will be.
Source: https://mathbitsnotebook.com/Algebra2/Statistics/STnormalDistribution.html
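68.3% of the observations will fall within ±1 standard deviation of the mean;
95.4% of the observations will fall within ±2 standard deviations of the mean; and
99.7% of the observations will fall within ±3 standard deviations of the mean.
This means that data falling outside of three standard deviations ("3-sigma") would signify rare occurrences.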
Source: https://www.omnicalculator.com/statistics/normal-distribution
Example 1:
The weights of stray dogs at a particular pound average 70 lbs with a standard
deviation of 2.5 lbs. Assuming the weights follow a Gaussian distribution:
Example 2:
Source: https://www.youtube.com/watch?v=mtbJbDwqWLE
68-95-99.7 Rule:
Within 1 standard deviation of the mean lies a total area of 0.68, or 68%.
Within 2 standard deviations of the mean lies a total area of 0.95, or 95%.
Within 3 standard deviations of the mean lies a total area of 0.997, or 99.7%.
Therefore, 95% of the population are between 4.5 and 6.5 ft tall (which implies a mean of 5.5 ft and a standard deviation of 0.5 ft).
This means that 99.7% of the population are between 4.0 and 7.0 ft tall.
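The rule can be checked with a short Python sketch. It assumes a mean of 5.5 ft and a standard deviation of 0.5 ft (inferred from the 95% interval of 4.5 to 6.5 ft, since the original problem statement is not reproduced here); the helper names `empirical_rule_intervals` and `normal_coverage` are hypothetical. The exact coverage uses the identity P(|Z| < k) = erf(k/√2).

```python
import math


def empirical_rule_intervals(mu: float, sigma: float) -> dict:
    """Return the interval mu ± k*sigma for k = 1, 2, 3."""
    return {k: (mu - k * sigma, mu + k * sigma) for k in (1, 2, 3)}


def normal_coverage(k: float) -> float:
    """Exact area within ±k standard deviations of a normal mean: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))


# Assumed parameters for Example 2 (inferred, not stated in the notes)
intervals = empirical_rule_intervals(5.5, 0.5)
for k, (low, high) in intervals.items():
    print(f"±{k} SD: {low:.1f} to {high:.1f} ft, exact coverage {normal_coverage(k):.4f}")
```

The exact coverages (about 68.27%, 95.45%, and 99.73%) show why the rule's 68/95/99.7 figures are described as approximate.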
Exercises:
Given:
By the rule above, an area of 95% lies within 2 standard deviations of the mean, so the area between the mean and 2 standard deviations on one side is 95%/2 = 47.5%.
Ans. 47.5%
Q2. For the normal distribution below, approximately what area is contained
between -2 and 1?
Given:
μ = 0; σ = 1
From the rule, the area within 1 standard deviation of the mean is 68%; half of this (from 0 to 1) is 34%.
The area within 2 standard deviations of the mean is 95%; half of this (from −2 to 0) is 47.5%.
Ans. 34% + 47.5% = 81.5%
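As a cross-check of Q2, the sketch below compares the rule-of-thumb answer with the exact standard normal area, computed from Python's `math.erf` via Φ(z) = (1 + erf(z/√2))/2. The function name `std_normal_cdf` is hypothetical. The exact area (about 81.9%) is slightly larger than the 81.5% estimate because the rule rounds 68.27% and 95.45% down to 68% and 95%.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Empirical-rule estimate used in the solution: 34% (0 to 1) + 47.5% (-2 to 0)
rule_estimate = 0.34 + 0.475
# Exact area between z = -2 and z = 1
exact = std_normal_cdf(1) - std_normal_cdf(-2)
print(f"rule: {rule_estimate:.3f}, exact: {exact:.4f}")
```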
Skewness
Skewness measures the degree of asymmetry of a distribution; the normal distribution is symmetric and has a skewness of zero. If the distribution of a data set instead has a skewness less than zero, or negative skewness (left-skewness), then the left tail of the distribution is longer than the right tail; positive skewness (right-skewness) implies that the right tail of the distribution is longer than the left.
Kurtosis
Kurtosis measures the thickness of the tail ends of a distribution in relation to the tails of a normal distribution. The normal distribution has a kurtosis equal to 3.0.
Distributions with kurtosis greater than 3.0 exhibit tail data exceeding the tails of the normal distribution (e.g., five or more standard deviations from the mean). Such distributions are called leptokurtic, and their excess kurtosis is more colloquially known as "fat tails." The occurrence of fat tails in financial markets describes what is known as tail risk.
Distributions with kurtosis less than 3.0 (platykurtic) exhibit tails that are generally less extreme ("skinnier") than the tails of the normal distribution.
Source: https://www.investopedia.com/terms/n/normaldistribution.asp
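Skewness and kurtosis can be estimated from data with the moment formulas in this minimal pure-Python sketch (the function name is hypothetical). It uses population moments, so a normal distribution has kurtosis 3 under this definition; many statistics packages instead report sample-adjusted values or excess kurtosis (kurtosis − 3), so their numbers will differ.

```python
def moment_skewness_kurtosis(data):
    """Moment estimates: skewness = m3 / m2**1.5, kurtosis = m4 / m2**2.

    Uses population (divide-by-n) central moments; with this non-excess
    definition, a normal distribution has kurtosis 3.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


sym_skew, sym_kurt = moment_skewness_kurtosis([1, 2, 3, 4, 5])  # symmetric, light tails
right_skew, _ = moment_skewness_kurtosis([1, 1, 1, 2, 10])      # long right tail
print(sym_skew, sym_kurt, right_skew > 0)
```

The symmetric toy sample has zero skewness and kurtosis 1.7 (below 3, i.e., platykurtic), while the sample with one large value has positive (right) skewness.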
KEY CONCEPTS
Degrees of freedom
represent the number of values that are free to vary in calculating each
statistic
Formula: df = n − 1, where n is the sample size
t Distribution
is actually a family of distributions, where the exact shape of each is determined by its respective degrees of freedom.
HYPOTHESIS TESTING
1. p-Value Approach
A p-value is used in hypothesis testing to help you decide whether to reject the null hypothesis.
The p-value measures the evidence against the null hypothesis.
The smaller the p-value, the stronger the evidence that you should reject the null hypothesis.
It is compared to the level of significance (α).
If
p-value ≤ α, then REJECT H0 at level α.
p-value > α, then FAIL TO REJECT H0 at level α.
When,
p-value > 0.05: If H0 were true, a result at least as extreme as the one observed would occur more than 5% of the time, so the evidence against H0 is not strong enough to reject it at the 0.05 level.
We want a very small p-value, because it tells us that the observed difference is unlikely to be explained by random sampling variation alone, i.e., it is statistically significant.
Say,
We calculated a p-value of 0.53. Is that significantly different or not?
Ans. It is NOT significant, since 0.53 > 0.05. If H0 were true, a result at least this extreme would occur about 53% of the time, so the data give no meaningful evidence against H0.
In many scientific studies, the significance level is set at α = 0.05, which corresponds to 95% confidence.
The following table provides guidelines for using the p-value to assess the
evidence against the null hypothesis (Weiss, 2010).
p-value < 0.05: If H0 were true, a result at least as extreme as the one observed would occur less than 5% of the time, so we reject H0 at the 0.05 level and call the result statistically significant.
2. Critical Value Approach
Reject the null hypothesis if the test statistic lies in the critical (rejection) region. Otherwise, fail to reject the null hypothesis.
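Both decision rules can be sketched in Python. The standard library has no t distribution (SciPy's `scipy.stats.t` provides one, if it is installed), so this minimal sketch integrates the t density numerically with Simpson's rule and finds critical values by bisection. `t_pdf`, `t_cdf`, `p_value`, and `t_critical` are hypothetical helper names, and the results agree with t-table values to about three decimals. The usage lines plug in the Brinell-hardness result from Example 3 (t = 1.22184, df = 24).

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T <= t): Simpson's rule on the symmetric density from 0 to |t| (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area


def p_value(t: float, df: int, tail: str) -> float:
    """p-value of a t statistic for tail = 'left', 'right', or 'two'."""
    if tail == "left":
        return t_cdf(t, df)
    if tail == "right":
        return 1 - t_cdf(t, df)
    return 2 * (1 - t_cdf(abs(t), df))


def t_critical(alpha: float, df: int, two_tailed: bool = False) -> float:
    """Positive critical value with upper-tail area alpha (alpha/2 if two-tailed), by bisection."""
    target = alpha / 2 if two_tailed else alpha
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > target:
            lo = mid  # upper tail still too large: critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2


# p-value approach for Example 3 (right-tailed)
p = p_value(1.22184, 24, "right")
print(f"p = {p:.3f}; reject H0 at 0.05: {p <= 0.05}")
# Critical value approach: compare t with the one-tailed critical value
print(f"critical t (df = 24, alpha = 0.05, one-tailed) = {t_critical(0.05, 24):.3f}")
```

Both approaches lead to the same decision: the p-value exceeds α exactly when the test statistic falls short of the critical value.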
KINDS OF VARIABILITY
Variance: the mean squared difference between each data point and the center of the distribution, as measured by the mean.
Standard Deviation: a measure of how spread out the numbers are. Its symbol is σ (the Greek letter sigma) for the population standard deviation and s for the sample standard deviation. It is the square root of the variance.
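The variance definition and the shortcut formula used in the worked solutions can be compared in a short Python sketch. It computes the sample variance (dividing by n − 1), which is what the s in the t-test uses; the population variance would divide by n instead. The function names are hypothetical, and the data are the steakburger fat percentages from the exercise in these notes.

```python
import math


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def sample_sd_shortcut(data):
    """Computational formula: s = sqrt((n * sum(x^2) - (sum x)^2) / (n * (n - 1)))."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


fat = [21, 18, 19, 16, 18, 24, 22, 19, 24, 14, 18, 15]
variance = sample_variance(fat)
print(f"variance = {variance:.4f}, s = {math.sqrt(variance):.4f}, "
      f"shortcut s = {sample_sd_shortcut(fat):.4f}")
```

Both routes give the same standard deviation (about 3.25); the shortcut form avoids computing each deviation and is convenient for hand calculation from Σx and Σx².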
SAMPLE
T-Test
o Also known as Student’s t-test
o Is the data paired or unpaired? (Paired means that sets of data would
be linked in each row)
Unpaired: Heights of boys and of girls
Paired: Heights of the same boys at age 7 and at age 21
o T-tests are statistical hypothesis tests that analyze one or
two sample means.
TYPES OF t-TESTS
1. One-sample t-test: tests the mean of a single group against a known population mean.
t-VALUE
o Measures the size of the difference in mean values relative to the variation
in the sample data
ONE-SAMPLE t-TEST
$$t = \frac{\bar{x} - \mu}{SE}, \qquad SE = \frac{s}{\sqrt{n}}$$
Degrees of Freedom (df) = n − 1
Where:
x̄ = observed mean of the sample
μ = theoretical mean of the population
s = standard deviation of the sample
n = sample size
SE = standard error of the mean; it represents how much random error is in the sample and how well the sample estimates the population mean.
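A minimal Python sketch of the formula, assuming the raw sample values are available; `one_sample_t` is a hypothetical name, and it relies on the standard library's `statistics.mean` and `statistics.stdev` (sample standard deviation with an n − 1 denominator). The toy data are chosen only for illustration.

```python
import math
import statistics


def one_sample_t(data, mu):
    """Return (t, SE, df) for a one-sample t-test of `data` against the population mean mu."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)   # sample standard deviation
    se = s / math.sqrt(n)        # standard error of the mean
    return (x_bar - mu) / se, se, n - 1


t, se, df = one_sample_t([1, 2, 3, 4, 5], mu=2)
print(f"t = {t:.4f}, SE = {se:.4f}, df = {df}")
```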
Example 1:
Solution:
Given:
n = 25
df = 25 − 1 = 24
x̄ = 60
s = 19
Hypotheses:
H0: μ = 67
Ha: μ < 67
Statistical Test: t-Test (Left-Tailed)
t = (60 − 67)/(19/√25) = −7/3.8 = −1.84
From the t-table, determine the critical value for df = 24 (row) and α = 0.05, one-tailed (left) test (column): −1.711.
DECISION:
−1.84 < −1.711; therefore, we reject H0 at the 0.05 significance level.
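The arithmetic and decision for Example 1 can be reproduced from the summary statistics given. The critical value is hard-coded from the t-table rather than computed, so this sketch only checks the t statistic and the left-tailed comparison.

```python
import math

n, x_bar, s, mu0 = 25, 60, 19, 67  # summary statistics from Example 1
critical = -1.711                  # t-table value: df = 24, alpha = 0.05, left-tailed

se = s / math.sqrt(n)       # 19 / 5 = 3.8
t = (x_bar - mu0) / se      # -7 / 3.8
reject = t < critical       # left-tailed rule: reject H0 if t falls below the critical value
print(f"df = {n - 1}, t = {t:.2f}, reject H0: {reject}")
```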
As you progress through your university career you will be introduced to statistical
packages such as R and Minitab that can perform these tests for you and present
the final significance level. However, you may also be introduced to how to conduct and interpret hypothesis tests without using such software (this is good for demonstrating a thorough knowledge of what is really happening with the data).
Example 2.
Sol’n a. (Manual)
s = 11.155
t = −2.28
From the t-table, determine the critical value for df = 11 (row), α = 0.05, and a two-tailed test (column): ±2.201.
DECISION:
−2.28 < −2.201; therefore, we reject H0 at the 0.05 significance level.
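Example 2's raw data are not reproduced in these notes, but its manual calculation (given elsewhere in the notes) uses n = 12, Σx² = 1630, (Σx)² = 3136, and μ = 12. This sketch assumes Σx = 56 (the square root of 3136, consistent with x̄ ≈ 4.67), recomputes s and t with the shortcut formula, and applies the two-tailed rule with the table value 2.201.

```python
import math

# Summary sums for Example 2 (sum_x = 56 is inferred from (sum x)^2 = 3136)
n, sum_x, sum_x2, mu0 = 12, 56, 1630, 12
critical = 2.201  # t-table value: df = 11, alpha = 0.05, two-tailed

x_bar = sum_x / n
s = math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))
t = (x_bar - mu0) / (s / math.sqrt(n))
reject = abs(t) > critical  # two-tailed rule: reject H0 if |t| exceeds the critical value
print(f"x_bar = {x_bar:.2f}, s = {s:.3f}, t = {t:.2f}, reject H0: {reject}")
```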
You may also use the “DATA ANALYSIS” Option:
Page 27 of 54
If you do not see the “Data Analysis” option, you will need to install the add-in. Do this by clicking “File” in the top left corner and selecting the “Options” button (below left). You will then see the Excel Options menu (below right): click the “Add-Ins” button, select the “Analysis ToolPak”, and click the “Go” button to install it. The “Data Analysis” option should then appear in the “Data” tab.
Note: The Analysis ToolPak has no option for the one-sample case, so we add a column of dummy values to make the data appear as two samples.
Click: Data tab → Data Analysis → t-Test: Paired Two Sample for Means
Then type the value for the Hypothesized Mean Difference and the Alpha (see the problem) and press OK.
DECISION (two-tailed):
p-value = 0.044. Since 0.044 < 0.05, we reject H0 at the 0.05 significance level.
(Screenshot steps: click the indicated option; double-click and type 95, the value given in the problem; press OK.)
DECISION:
p-value = 0.044. Since 0.044 < 0.05, we reject H0 at the 0.05 significance level.
Example 3
An engineer measured the Brinell hardness of 25 pieces of ductile iron that
were subcritically annealed. The resulting data were:
The engineer hypothesized that the mean Brinell hardness of all such ductile iron pieces is greater than 170. Therefore, he was interested in testing the hypotheses at significance level α = 0.05.
Solution:
Given:
n = 25
df = 25 -1 = 24
Hypotheses:
H0: μ = 170
Ha: μ > 170
From the t-table, determine the critical value for df = 24 (row), α = 0.05, and a one-tailed (right) test (column): 1.711.
DECISION:
1.22184 < 1.711; therefore, we fail to reject H0 at the 0.05 significance level.
DECISION:
p-value = 0.117
Since 0.117 > 0.05, we fail to reject H0. There is insufficient evidence, at the α = 0.05 level, to conclude that the mean Brinell hardness of all such ductile iron pieces is greater than 170.
DECISION (right-tailed):
p-value = 0.117 (one-tailed)
Since 0.117 > 0.05, we again fail to reject H0 and reach the same conclusion.
USING MINITAB
DECISION:
p-value = 0.117
Since 0.117 > 0.05, we fail to reject H0, which is the same conclusion as with the other methods.
Exercises
A consumer group, concerned about the mean fat content of a certain grade of steakburger, submits a random sample of 12 steakburgers to an independent laboratory for analysis. The percentage of fat in each of the steakburgers is as follows:
21 18 19 16 18 24 22 19 24 14 18 15
The manufacturer claims that the mean fat content of this grade of steakburger is less than 20%. Assuming percentage fat content to be normally distributed, carry out an appropriate hypothesis test in order to advise the consumer group as to the validity of the manufacturer’s claim.
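Sol’n.
Given:
μ = 20%
n = 12
Hypotheses:
H0: μ = 20%
Ha: μ < 20%
df = 12 − 1 = 11
x̄ = (21+18+19+16+18+24+22+19+24+14+18+15)/12 = 228/12 = 19
s = 3.25
t = (19 − 20)/(3.25/√12) = −1/0.938 = −1.07
Critical value (df = 11, α = 0.05, left-tailed): −1.796
DECISION:
Since −1.07 > −1.796, we fail to reject H0 at the 0.05 significance level.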
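The steakburger solution can be checked in a few lines of Python with the standard library's `statistics` module; the critical value −1.796 is taken from the t-table, as in the solution, rather than computed.

```python
import math
import statistics

fat = [21, 18, 19, 16, 18, 24, 22, 19, 24, 14, 18, 15]  # percentage fat, from the exercise
mu0 = 20           # claimed benchmark (%)
critical = -1.796  # t-table value: df = 11, alpha = 0.05, left-tailed

n = len(fat)
x_bar = statistics.mean(fat)   # sample mean
s = statistics.stdev(fat)      # sample standard deviation
t = (x_bar - mu0) / (s / math.sqrt(n))
reject = t < critical          # left-tailed rule
print(f"x_bar = {x_bar}, s = {s:.2f}, t = {t:.2f}, reject H0: {reject}")
```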
49 50 45 51 47 49 48 54 53 55 45 50 48
Assuming that this sample came from an underlying normal population, test, at the 5% significance level, the hypothesis that the population mean length is 50 cm.
A random sample of 12 steel ingots was taken from a production line. The
masses, in kilograms, of these ingots are given below:
Sol’n.
Given:
μ = 25 Kg
n = 12
Hypotheses: (4 pts)
H0: μ ≤ 25
H1: μ > 25
df = 12 − 1 = 11 (2 pts)
(3 pts) (2 pts)
DECISION:
Since 1.43 < 1.796, we fail to reject H0 at the 0.05 significance level. (2 pts)
$$s = \sqrt{\frac{12(8201.42) - 97593.76}{12(12-1)}} = \sqrt{\frac{98417.04 - 97593.76}{12(11)}} = \sqrt{\frac{823.28}{132}} = \sqrt{6.237} \approx 2.5$$
$$t = \frac{26.0333 - 25}{2.5/\sqrt{12}} = \frac{1.0333}{2.5/3.464} = \frac{1.0333}{0.7217} \approx 1.43$$
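The steel-ingot calculation can be reproduced from the summary sums in the worked solution. The raw masses are not listed in these notes, so this sketch assumes Σx = 312.4 (the square root of 97593.76, consistent with x̄ = 26.0333). Without rounding s to 2.5, t comes out near 1.433 rather than 1.43, which does not change the decision.

```python
import math

n = 12
sum_x2 = 8201.42            # sum of squared masses (from the worked solution)
sum_x = 312.4               # assumed: sqrt(97593.76)
mu0, critical = 25, 1.796   # H0 benchmark and t-table value (df = 11, alpha = 0.05, right-tailed)

x_bar = sum_x / n
s = math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))
t = (x_bar - mu0) / (s / math.sqrt(n))
reject = t > critical       # right-tailed rule
print(f"x_bar = {x_bar:.4f}, s = {s:.3f}, t = {t:.2f}, reject H0: {reject}")
```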
Lesson Summary
A Type I error is when we reject the null hypothesis when it is true, and a Type II error is when we do not reject the null hypothesis even when it is false.
References:
https://www.geo.fu-berlin.de/en/v/soga/Basics-of-statistics/Hypothesis-Tests/Introduction-to-Hypothesis-Testing/Critical-Value-and-the-p-Value-Approach/index.html
https://www.analyticssteps.com/blogs/what-are-differences-between-z-test-and-t-test
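Manual calculation for Example 2:
$$s = \sqrt{\frac{12(1630) - 3136}{12(12-1)}} = \sqrt{\frac{19560 - 3136}{12(11)}} = \sqrt{\frac{16424}{132}} = \sqrt{124.4242} \approx 11.155$$
$$t = \frac{4.67 - 12}{11.155/\sqrt{12}} = \frac{-7.33}{11.155/3.464} = \frac{-7.33}{3.22} \approx -2.28$$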
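Manual calculation of s for the steakburger exercise:
$$s = \sqrt{\frac{n\sum x^2 - \left(\sum x\right)^2}{n(n-1)}} = \sqrt{\frac{12(4448) - 51984}{12(12-1)}} = \sqrt{\frac{53376 - 51984}{12(11)}} = \sqrt{\frac{1392}{132}} = \sqrt{10.5455} \approx 3.25$$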
https://www.youtube.com/watch?v=R7y1dIRIqq8
https://www.youtube.com/watch?v=fiMFqfatieE
https://www.youtube.com/watch?v=5vmb5zafqNk
https://www.youtube.com/watch?v=Fsa-5_XdIMs
https://www.youtube.com/watch?v=yvHQEJnYZBY
https://www.youtube.com/watch?v=rK3mXS3gHyI&t=738s
Q. A coffee shop relocates to Italy and wants to make sure that all lattes are consistent. They believe that each latte has an average of 4 oz of espresso. If this is not the case, they must increase or decrease the amount. A random sample of 25 lattes shows a mean of 4.6 oz of espresso and a standard deviation of 0.22 oz. Use α = 0.05 and run a one-sample t-test to compare the sample with the known population mean.
Solution
1. Hypotheses
H0: μ = 4.0
Ha: μ ≠ 4.0
2. Statistical Tool: t-Test (Two-Tailed)
3. Significance Level: α = 0.05
4. t-Test
n = 25
sample mean = 4.6
μ = 4.0
s = 0.22 oz
t = (4.6 − 4.0)/(0.22/√25)
= 0.6/(0.22/5)
= 0.6/0.044
= 13.6364
Critical Value:
df = 25 − 1 = 24
From the t-table:
CV = ±2.064
5. Conclusion
Since 13.6364 > 2.064, the test statistic lies in the rejection region, so we reject H0 and conclude that there is a significant difference between the sample mean amount of espresso in the lattes in Italy and the expected population amount. Because the sample mean (4.6 oz) is above 4.0 oz, there appears to be too much espresso in each latte, and the amount should be reduced to meet the population mean.
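A short Python check of the coffee-shop arithmetic, using the values stated in the problem and the tabled two-tailed critical value ±2.064 (df = 24) rather than a computed one.

```python
import math

n, x_bar, s, mu0 = 25, 4.6, 0.22, 4.0  # from the problem statement
critical = 2.064                         # t-table value: df = 24, alpha = 0.05, two-tailed

se = s / math.sqrt(n)        # 0.22 / 5 = 0.044
t = (x_bar - mu0) / se       # 0.6 / 0.044
reject = abs(t) > critical   # two-tailed rule: |t| beyond the critical value
print(f"SE = {se:.3f}, t = {t:.4f}, reject H0: {reject}")
```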