Lecture 2 Hypothesis Test I - Updated2

Uploaded by

sharontao

Environmental Data Analysis

Lecture 2

Dr. Zhi NING


Probability distributions

• Normal distribution in nature

Thomas Young experiment, 1801


Probability distributions

• The Normal Distribution


– Often called Gaussian distribution
– Characterized completely by N(η, σ²), “a normal
distribution with mean η and variance σ²”.

Probability distributions

• The Normal Distribution


1. The vertical axis (probability density) is scaled
such that area under the curve is unity (1.0).
2. The standard deviation σ measures the distance
from the mean to the point of inflection.
3. The probability that a positive deviation from the
mean will exceed one σ is 0.1587.
4. Because of symmetry, the probabilities are the
same for negative deviations.
5. The chance that a deviation in either direction will
exceed 2σ is 2(0.0228) = 0.0456.
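
The tail probabilities quoted in items 3 to 5 can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative only. Without intermediate rounding the two-sided value is 0.0455; the 0.0456 on the slide comes from rounding 0.0228 before doubling.

```python
from statistics import NormalDist

z = NormalDist(mu=0.0, sigma=1.0)  # standard normal N(0, 1)

# Item 3: a positive deviation from the mean exceeds one sigma
p_above_1sigma = 1.0 - z.cdf(1.0)

# Item 4: by symmetry, the same probability for a negative deviation
p_below_1sigma = z.cdf(-1.0)

# Item 5: a deviation in either direction exceeds two sigma
p_beyond_2sigma = 2.0 * (1.0 - z.cdf(2.0))

print(round(p_above_1sigma, 4), round(p_below_1sigma, 4), round(p_beyond_2sigma, 4))
```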

Probability density function

• How to find the quantitative relation between
– the probability density function (PDF)
– the x value and the α value (the area under the PDF)

– The standard normal distribution N(0, 1) has mean 0 and variance 1;
any x is standardized by z = (x − η) / σ

Probability distributions

• NORM.DIST(x, mean, standard_dev, cumulative)


– Returns the normal distribution for the given η and σ: the cumulative
probability if cumulative is TRUE, the probability density if FALSE.
– Returns the α value for given x, η and σ values.
• NORM.INV (probability, mean, standard_dev)
– Returns the inverse of the normal cumulative distribution for η and σ.
– Returns the x value for given α, η and σ values.
• NORM.S.DIST (z, cumulative)
– Returns the standard normal cumulative distribution, with η=0 and σ=1
• NORM.S.INV (probability)
– Returns the inverse of the standard normal cumulative distribution, with η=0 and σ=1

• Cumulative or not?
• Left tailed or right tailed?
• How to generate a normal distribution in excel?
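
For readers working outside Excel, the four functions can be mirrored with Python's standard-library `statistics.NormalDist`. This is only a sketch: the wrapper names `norm_dist`, `norm_inv`, `norm_s_dist` and `norm_s_inv` are hypothetical, chosen to echo the Excel names. It also answers the "left tailed or right tailed" question: the cumulative value is the left tail, and the right tail is one minus it.

```python
from statistics import NormalDist


def norm_dist(x, mean, sd, cumulative=True):
    """Like NORM.DIST: cumulative probability, or density if cumulative is False."""
    dist = NormalDist(mean, sd)
    return dist.cdf(x) if cumulative else dist.pdf(x)


def norm_inv(p, mean, sd):
    """Like NORM.INV: the x value with cumulative probability p."""
    return NormalDist(mean, sd).inv_cdf(p)


def norm_s_dist(z, cumulative=True):
    """Like NORM.S.DIST: standard normal, mean 0 and standard deviation 1."""
    return norm_dist(z, 0.0, 1.0, cumulative)


def norm_s_inv(p):
    """Like NORM.S.INV: the z value with cumulative probability p."""
    return norm_inv(p, 0.0, 1.0)


left_tail = norm_s_dist(1.96)      # area to the left of z = 1.96
right_tail = 1.0 - left_tail       # area to the right of z = 1.96
z_975 = norm_s_inv(0.975)          # z value with 97.5% of the area below it
print(round(left_tail, 4), round(right_tail, 4), round(z_975, 3))
```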

Probability distributions

• Examples
– A normal distribution with η = 8 mg/L and σ = 1 mg/L.
– Below which value do 95% of the data fall?
NORM.INV(0.95, 8, 1)
– What is the probability that a reading is
below 6.4 mg/L?
NORM.DIST(6.4, 8, 1, TRUE)

– How to generate normally distributed values in Excel?

– Use the function NORM.INV(RAND(), 8, 1) in each cell
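
The same example can be worked in Python as a cross-check. A minimal sketch with the standard-library `statistics.NormalDist`; the sample size of 1000 and the seed are arbitrary choices for illustration, not values from the lecture.

```python
from statistics import NormalDist, mean, stdev

dist = NormalDist(mu=8.0, sigma=1.0)   # eta = 8 mg/L, sigma = 1 mg/L

x95 = dist.inv_cdf(0.95)       # value with 95% of the data below it
p_below = dist.cdf(6.4)        # probability that a reading is below 6.4 mg/L

# Counterpart of filling cells with NORM.INV(RAND(), 8, 1)
samples = dist.samples(1000, seed=1)

print(round(x95, 3), round(p_below, 4))
print(round(mean(samples), 2), round(stdev(samples), 2))
```

The simulated mean and standard deviation should land close to 8 and 1, but they vary with the seed and the sample size.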

Probability distributions

• t distribution
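– In normal distributions, both η and σ are known;
– In practice, σ is often not known and we use the sample
standard deviation s to replace σ:
t = (x̄ − η) / (s / sqrt(n))
– Bell shaped and symmetric, but the tails are wider than those of the normal distribution.
– The width of the t distribution depends on the degrees of
freedom.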
Probability distributions

• Part of the t table as a function of α and ν

– The x axis is in the units of the data themselves;
– t is a normalized value that links x to the formulas.
Probability distributions

• T.INV (probability, degree of freedom)
– Returns the left-tailed inverse of the Student t distribution
• T.INV.2T (probability, degree of freedom)
– Returns the two-tailed inverse of the Student t distribution

• T.DIST (x, degree of freedom, cumulative)
– Returns the left-tailed Student t distribution
• T.DIST.RT (x, degree of freedom)
– Returns the right-tailed Student t distribution
• T.DIST.2T (x, degree of freedom)
– Returns the two-tailed Student t distribution (x must not be negative)

• If we enter α as probability and n-1 as Deg_freedom, then T.INV.2T
outputs t(n-1, 1-α/2), the (1-α/2)th percentile of a t distribution with n-1
degrees of freedom; T.INV with the same inputs outputs the αth percentile.
Probability distributions

• Example
– What is the 97.5th percentile of a t distribution with
24 degrees of freedom?
– T.INV.2T(0.05, 24) = 2.064
OR -T.INV(0.025, 24)

– What is the probability of a t value larger than 2.064
in a t distribution with 24 degrees of freedom?

– T.DIST.RT(2.064, 24) = 0.025; T.DIST.2T(2.064, 24) = 0.05 is the
probability that |t| exceeds 2.064
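
These Excel results can be cross-checked without Excel. The sketch below is a standard-library approximation, not Excel's algorithm: `t_cdf` integrates the t density numerically with Simpson's rule and `t_inv` inverts it by bisection. Both helper names are hypothetical, and the search range of ±50 is an assumption that suits the degrees of freedom used here. In practice a statistics library such as SciPy (`scipy.stats.t`) would normally be used.

```python
import math


def t_cdf(t, df, steps=2000):
    """Left-tail probability P(T <= t), like T.DIST(t, df, TRUE)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(t)
    h = a / steps  # steps is even, as Simpson's rule requires
    area = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3  # integral of the density from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area


def t_inv(p, df):
    """Value t with P(T <= t) = p, like T.INV(p, df); bisection on t_cdf."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


t_975 = t_inv(0.975, 24)            # counterpart of T.INV.2T(0.05, 24)
p_right = 1.0 - t_cdf(2.064, 24)    # counterpart of T.DIST.RT(2.064, 24)
p_two = 2.0 * p_right               # counterpart of T.DIST.2T(2.064, 24)
print(round(t_975, 3), round(p_right, 3), round(p_two, 3))
```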

Probability distributions

• Consider the sampling distribution of the
average, built from many random samples of size n
collected from a population
• Sample standard deviation:
Sd = sqrt( Σ (xi − x̄)² / (n − 1) )
• Standard error of the mean is:
Se = Sd / sqrt(n)
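
A short sketch of the two quantities in Python. The readings are hypothetical values invented for illustration; `statistics.stdev` uses the n − 1 denominator, which is the sample standard deviation meant here.

```python
import math
from statistics import mean, stdev

# Hypothetical readings in mg/L, for illustration only
readings = [7.9, 8.4, 7.2, 8.1, 7.6, 8.8, 7.4, 8.0]

n = len(readings)
x_bar = mean(readings)

# Sample standard deviation written out, then via the library
sd_manual = math.sqrt(sum((x - x_bar) ** 2 for x in readings) / (n - 1))
sd = stdev(readings)

# Standard error of the mean
se = sd / math.sqrt(n)

print(round(x_bar, 3), round(sd, 3), round(se, 3))
```

The standard error is always smaller than the standard deviation for n > 1, because averaging n readings reduces the scatter by a factor of sqrt(n).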

Probability distributions

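• Example:

– From Sd to Se: Se = 0.266 (rounded to 0.27)
– Normal distribution N(8, 0.27), with 0.27 the standard error: NORM.DIST(7.51, 8, 0.27, 1)
– t distribution, with t = -1.842 and ν = 26: T.DIST(-1.842, 26, 1)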
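
The normal-distribution part of this example, and the t value itself, can be reproduced as follows. This is a sketch using only numbers that appear in the example (7.51, 8, 0.266 and the rounded 0.27); the variable names are illustrative.

```python
from statistics import NormalDist

x_bar, eta, se = 7.51, 8.0, 0.266   # sample mean, reference mean, standard error

# Standardized distance of the sample mean from 8.0
t_stat = (x_bar - eta) / se

# Counterpart of NORM.DIST(7.51, 8, 0.27, 1), using the rounded standard error
p_normal = NormalDist(eta, 0.27).cdf(x_bar)

print(round(t_stat, 3), round(p_normal, 4))
```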

Important notes

• For t distributions, remember the three steps:

– Plug in the inputs from the question
– Visualize the question on the distribution

– Use the two sets of functions for t distributions

• T.INV (probability, degree of freedom)
• T.DIST (x, degree of freedom, cumulative)
Test of Hypothesis

• Background
– Parametric vs. Nonparametric tests
– Sampling distribution
– Test levels and p values
– Error types and power of test
– Confidence intervals
– The elements of hypothesis tests
• Parametric tests
– One sample t test
– Tests for Differences of Mean under Independence
– Tests for Differences of Mean for Paired Samples
• Nonparametric tests
– Resampling test or Monte Carlo test
– Bootstrap
Test of Hypothesis

• Parametric vs. Nonparametric tests


– Parametric test:
• The theoretical distribution of the data is known or
assumed;
– Nonparametric test:
• No theoretical distribution is assumed for the data
– Classical approach: construct the test so that the
distribution of the data is unimportant
– Resampling procedures: repeated computer manipulations
of the observations.
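
As an illustration of a resampling procedure, here is a minimal bootstrap sketch for the mean. Everything in it is an assumption made for the example: the data are hypothetical, 2000 resamples is an arbitrary choice, and the percentile interval is only one of several bootstrap interval methods.

```python
import random
from statistics import mean

# Hypothetical readings in mg/L, for illustration only
data = [7.9, 8.4, 7.2, 8.1, 7.6, 8.8, 7.4, 8.0, 6.9, 7.7]

rng = random.Random(0)   # fixed seed so the run is repeatable
n_boot = 2000            # number of bootstrap resamples (arbitrary)

boot_means = []
for _ in range(n_boot):
    resample = rng.choices(data, k=len(data))   # draw n values with replacement
    boot_means.append(mean(resample))

boot_means.sort()
lower = boot_means[n_boot * 25 // 1000]         # about the 2.5th percentile
upper = boot_means[n_boot * 975 // 1000 - 1]    # about the 97.5th percentile

print(round(mean(data), 2), round(lower, 2), round(upper, 2))
```

No distribution is assumed for the data; the spread of the resampled means stands in for the sampling distribution.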
Parametric tests

– One-sample t Test
• Compare the mean of a distribution with a specified value

– Two Samples t Tests


• Dependent t-test for Mean of two paired samples
– Pair the samples according to time, technician, batch of
material, or other factors that might contribute to a difference;
– Results are produced in pairs that are not independent of
each other.

• Independent t test for differences of means

– Make a series of tests using treatment A and then
independently make a series of tests using treatment B.
– The two treatments are compared by computing the average
for each treatment.
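
The difference between the two designs shows up in how the t statistic is formed. The sketch below uses hypothetical measurements and the textbook formulas (paired: mean difference over its standard error; independent: pooled-variance form, which assumes equal variances in the two groups). The slides do not state these formulas, so treat them as an assumption about what is intended.

```python
import math
from statistics import mean, stdev, variance

# Hypothetical results for methods A and B on the same six specimens
a = [10.1, 12.3, 9.8, 11.5, 13.0, 10.7]
b = [9.6, 11.9, 9.5, 11.0, 12.4, 10.3]

# Paired (dependent) t test: work with the within-pair differences
d = [x - y for x, y in zip(a, b)]
n = len(d)
t_paired = mean(d) / (stdev(d) / math.sqrt(n))
df_paired = n - 1

# Independent t test: pool the two sample variances
na, nb = len(a), len(b)
sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
t_indep = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
df_indep = na + nb - 2

print(round(t_paired, 2), df_paired, round(t_indep, 2), df_indep)
```

With these numbers the paired statistic is far larger, because pairing removes the specimen-to-specimen variation that dominates the independent comparison.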
Important concepts

• Test levels
– The test level (also called the rejection level or level of
significance) is the probability assigned to the sufficiently
improbable region of the null distribution.
– The test level is chosen by the analyst before the test. Commonly 5%;
sometimes 10% or 1% is chosen.
• p value
– The p-value is the smallest level of significance
that would lead to rejection of the null hypothesis
with the given data.
Important concepts

• Confidence intervals
• Instead of asking “what is the probability that Z falls within
limits a and b in the normal distribution”, it is more
important to ask “within what interval or limits does X% of
the population lie in the normal distribution”.
• The X% is referred to as the “confidence level”.
• The interval corresponding to
this level is called the
“confidence interval”,
bounded by the “confidence limits”.
Confidence interval

• An interval within which the value of a
parameter would be expected to lie;
• The 1-α confidence interval for the population
mean based on the t statistic:
x̄ ± tα/2 × Se
• Where tα/2 and Se have n-1 degrees of freedom
– Se is Sd/sqrt(n)
• Meaning:
There is a 1-α probability that the true value falls within
this confidence interval
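
A worked sketch of this interval in Python, using the numbers from the earlier example in this lecture (mean 7.51 mg/L, Se = 0.266, n = 27) at α = 0.05. The helpers `t_cdf` and `t_inv` are hypothetical standard-library stand-ins for Excel's T.DIST and T.INV (numerical integration plus bisection), not Excel's own algorithm.

```python
import math


def t_cdf(t, df, steps=2000):
    """Left-tail probability P(T <= t) by Simpson's rule on the t density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(t)
    h = a / steps
    area = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def t_inv(p, df):
    """Value t with P(T <= t) = p, found by bisection on t_cdf."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x_bar, se, n = 7.51, 0.266, 27
alpha = 0.05

t_half = t_inv(1 - alpha / 2, n - 1)   # t value leaving alpha/2 in the upper tail
lower = x_bar - t_half * se
upper = x_bar + t_half * se
print(round(t_half, 3), round(lower, 3), round(upper, 3))
```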

Confidence interval

• Example, using Se
Steps for hypothesis test

• Steps of hypothesis testing

1. State the hypotheses and choose the α level
2. Locate the critical region in the table of t values
3. Collect sample data and compute the t statistic
4. Evaluate H0 and make a decision

Make use of the visual inspection method for
distributions.
One-sample t Test

• To examine the mean difference between the sample and the


known value of the population mean µ0.

[Figure: sample distribution for the one-sample t test, showing the test level α, the critical t value, the sample value x and its corresponding t value]
http://www.wadsworth.com/psychology_d/special_features/ext/workshops/t_testsample1.html
Read and type Greek letters correctly
https://www.keynotesupport.com/internet/special-characters-greek-letters-symbols.shtml

• Alt 945 α
• Alt 951 η
• Alt 956 μ
• Alt 960 π
• Alt 961 ρ
• Alt 963 σ

https://www.thespruceeats.com/the-greek-alphabet-1705558
Application examples

• Applications:
– 1. In laboratory quality control checks, the analyst measures the
concentration of test specimens that have been prepared or
calibrated so precisely that any error in the quantity is negligible.
The specimens are tested according to a prescribed analytical
method and a comparison is made to determine whether the
measured values and the known concentration of the standard
specimens are in agreement.
– 2. The desired quality of a product is known, by specification or
requirement, and measurements on the process are made at
intervals to see if the specification is accomplished.
– 3. A vendor claims to provide material of a certain quality and the
buyer makes measurements to see whether the claim is met.
– 4. A decision must be made regarding compliance or
noncompliance with a regulatory standard at a hazardous waste
site.
One-sample t Test

• Assumptions:
1. The dependent variable should be normally distributed in the
population.
2. Samples drawn from the population should be random.
3. Cases of the samples should be independent.
4. The population mean µ0 to compare against should be known.
• Hypothesis:
– A. Null hypothesis: assumes that there is no significant
difference between the population mean and the sample
mean.
– B. Alternative hypothesis: assumes that there is a significant
difference between the population mean and the sample
mean.
One-sample t Test

• Procedures:

1. Calculate the standard deviation for the sample by using this
formula:
S = sqrt( Σ (xi − x̄)² / (n − 1) )
Where, S = standard deviation, x̄ = sample mean,
n = number of observations in sample
2. Calculate the value of the one sample t-test, by using this formula:
t = (x̄ − µ0) / (S / sqrt(n))
Where, t = one sample t-test value,
µ0 = population mean

3. Calculate the degrees of freedom by using this formula:
ν = n – 1 Where, ν = degrees of freedom

4. Hypothesis testing:
Compare the calculated value with the table value. If the absolute value of the calculated t is
greater than the critical value, then we reject the null hypothesis and accept
the alternative hypothesis.
One-sample t Test

• Compare the calculated t value with the
statistical table.
– Significance level α = 0.05 corresponds to a critical t
value for the given degrees of freedom;

– |t| > t0.05: H0 is rejected
– Or, the calculated t value corresponds to a p value;
– p < 0.05: H0 is rejected
One-sample t Test

• Example
– 1. Calculate the standard deviation: 0.305 mg/L
– 2. Calculate t value: t = 2.407
– In Excel, TDIST(t, df, tails) = 0.03 < 0.05
– H0 is rejected.

Measured DO concentration (unit: mg/L)
No.   DO
1     1.20
2     1.40
3     1.40
4     1.30
5     1.20
6     1.35
7     1.40
8     2.00
9     1.95
10    1.10
11    1.75
12    1.05
13    1.05
14    1.40
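
The example can be reproduced step by step in Python. One assumption is needed: the slide does not state the reference value µ0, so the sketch uses 1.2 mg/L, which is the value that reproduces the reported t = 2.407 from these data. The helpers `t_cdf` and `t_inv` are hypothetical standard-library stand-ins for Excel's t functions, and the p-value is taken as two-tailed.

```python
import math
from statistics import mean, stdev


def t_cdf(t, df, steps=2000):
    """Left-tail probability P(T <= t) by Simpson's rule on the t density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(t)
    h = a / steps
    area = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def t_inv(p, df):
    """Value t with P(T <= t) = p, found by bisection on t_cdf."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


do = [1.20, 1.40, 1.40, 1.30, 1.20, 1.35, 1.40,
      2.00, 1.95, 1.10, 1.75, 1.05, 1.05, 1.40]   # measured DO, mg/L
mu0 = 1.2   # ASSUMED reference value, inferred from the reported t = 2.407

n = len(do)
s = stdev(do)                                     # step 1: standard deviation
t_stat = (mean(do) - mu0) / (s / math.sqrt(n))    # step 2: t value
df = n - 1                                        # step 3: degrees of freedom

# Step 4: decide, by p-value and by critical value at alpha = 0.05
p_two = 2.0 * (1.0 - t_cdf(abs(t_stat), df))
t_crit = t_inv(0.975, df)
reject = abs(t_stat) > t_crit
print(round(s, 3), round(t_stat, 3), round(p_two, 2), round(t_crit, 3), reject)
```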
Significance test

If a sample mean of 7.51 mg/L is estimated from measurements on 27
specimens, what is the likelihood that the true population mean is 8.0 mg/L?

• Statistical inference is about making assessments from
experimental data about an unknown population parameter:
• Significance tests
• Confidence intervals

• Significance test
• A form of hypothesis test with null and alternative hypotheses;
• The significance level α represents the risk of falsely rejecting H0
Significance test

• Previous example:
– Null hypothesis is “the mean is 8.0 mg/L”
– Alternative hypothesis is “the mean < 8.0 mg/L”
– Test at significance level α = 0.05

– t = -1.842 with n-1 = 27-1 = 26 degrees of freedom

T.DIST(-1.842, 26, 1) = 0.04 < 0.05, so reject H0
Or T.INV(0.05, 26) = -1.706 and -1.842 < -1.706, so reject H0
– Conclusion:
• The population mean is below 8.0 mg/L at a significance level of 0.05
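
Both routes to the decision can be checked with a short Python sketch. It uses the sample mean 7.51, the hypothesized mean 8.0 and Se = 0.266 from the earlier example; `t_cdf` and `t_inv` are hypothetical standard-library stand-ins for T.DIST and T.INV, not Excel's own implementation.

```python
import math


def t_cdf(t, df, steps=2000):
    """Left-tail probability P(T <= t) by Simpson's rule on the t density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(t)
    h = a / steps
    area = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def t_inv(p, df):
    """Value t with P(T <= t) = p, found by bisection on t_cdf."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


alpha, df = 0.05, 26
t_stat = (7.51 - 8.0) / 0.266      # about -1.842

p_left = t_cdf(t_stat, df)         # counterpart of T.DIST(-1.842, 26, 1)
t_crit = t_inv(alpha, df)          # counterpart of T.INV(0.05, 26)

reject_by_p = p_left < alpha       # one-sided (left-tail) p-value route
reject_by_t = t_stat < t_crit      # critical-value route
print(round(p_left, 3), round(t_crit, 3), reject_by_p, reject_by_t)
```

The two routes always agree, since they compare the same tail area in two different units.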

Significance test

• Another example:

• One sided or two sided test?


• In a case where a positive deviation is undesirable but a negative deviation
is not, a one-sided test would be indicated.
– (1) judging compliance with an environmental protection limit where high values indicate a violation,
and
– (2) an experiment intended to investigate whether adding chemical A to the process
increases the efficiency.
• If the experimental question is whether adding chemical A changes the efficiency
(either for better or worse), a two-sided test would be indicated.
