Estimation

Uploaded by

deaththeos954
Copyright
© © All Rights Reserved
Probability

Course Outline

• Introduction to Biostatistics
• Descriptive Biostatistics
• Probability
• Discrete Probability Distributions
• Continuous Probability Distributions
• Sampling Distributions
• Estimation
• Hypothesis Testing
Lecture Outline
Part I

Large Sample Estimation


Probability vs Statistics Reasoning

• In the last chapter we looked at sampling: starting with a population, we imagined
taking many samples and investigated how sample statistics were distributed
(the sampling distribution).

• In this chapter we do the reverse: given one sample, we ask what random system
generated its statistics.

• This shifts our mode of thinking from deductive reasoning to inductive reasoning.

• Deductive reasoning argues forward: from a hypothesis to a conclusion.

• Inductive reasoning argues backward: from a set of observations to a reasonable
hypothesis.

[Diagram: probability (deduction) reasons from the population to the sample; statistics (induction) reasons from the sample back to the population.]
Populations vs. Samples

• The term population is used in statistics to represent all possible measurements or
outcomes that are of interest to us in a particular study or piece of analysis.
– In the example, the population of interest was the voting intentions of all voters in
the Western Nation.

• The term sample refers to a subset of the population that is selected for analysis.
– In the example, the polling company selected a sample of 1,008 voters.
Sampling

• In choosing a sample, it is important that it is representative of the population.
• No bias should exist in the sample.
• There are a number of sampling methods available to help ensure that your data are
representative.
• A simple random sample is the most straightforward of these methods.
Parameters

• Populations are described by their probability distributions and/or parameters.

o For quantitative populations, the location and shape are described by µ and σ.

o Binomial populations are determined by a single parameter, p.

• If the values of parameters are unknown, we make inferences about them using
sample information.
Types of Inference

Estimation
•Estimating or predicting the value of the parameter

•“What is (are) the most likely value(s) of µ, σ or p?”

Hypothesis Testing
•Deciding about the value of a parameter based on some preconceived idea.

•“Did the sample come from a population with a hypothesized value of µ, or with p = .3?”


Types of Inference

Example: A consumer wants to estimate the average price of similar homes in his city
before putting his home on the market.

Estimation: Estimate µ, the average home price.

Example: A manufacturer wants to know if a new type of steel is more resistant to high
temperatures than an old type was.

Hypothesis test: Is the new average resistance, µN greater than the old average
resistance, µO?
Types of Inference

Whether you are estimating parameters or testing hypotheses, statistical methods are
important because they provide:

•Methods for making the inference

•A numerical measure of the goodness or reliability of the inference


Statistical Inference

Statistical inference is the procedure by which we reach a conclusion about a
population on the basis of the information contained in a sample drawn from that
population.

Estimation involves the use of the data in the sample to approximate the corresponding
parameter in the population from which the sample was drawn.
Estimator

• A single computed value has been referred to as an estimate.

• An estimator is a rule, usually a formula, that tells you how to calculate the estimate
based on the sample.

• A sample statistic which is used to estimate a population parameter.


Estimator

A good estimator must satisfy three conditions in order to predict well the value in the
population:

1. Unbiased: The expected value of the estimator must be equal to the parameter
being estimated.

2. Consistent: The value of the estimator approaches the value of the parameter as the
sample size increases.

3. Relatively Efficient: The estimator for a parameter has the smallest variance of all
estimators which could be used.
Unbiased Estimator

• An estimator whose expected value is equal to the parameter being estimated.

• Some measures of a sample can be used as unbiased estimators of the
corresponding parameters in the population. These are listed in the table below.
Types of Estimates

There are two types of estimates:

1. Point Estimates

2. Interval Estimates.
Point Estimate

A point estimate is a single numerical value used to estimate the corresponding
population parameter.
Interval Estimate

• An interval estimate is a range of values used to estimate a parameter.

• Two numbers are calculated to create an interval within which the parameter is
expected to lie. It is constructed so that, with a chosen degree of confidence, the
true unknown parameter will be captured inside the interval.
Confidence Interval

• The point estimate will differ from the population parameter because of sampling
error, and there is no way to know how close it is to the actual parameter. For this
reason, statisticians like to give an interval estimate, which is a range of values
used to estimate the parameter.

• A confidence interval is an interval estimate with a specific level of confidence.


Confidence Interval

• A level of confidence is the probability that the interval estimate will contain the
parameter.

• Confidence intervals depend on sampling distributions.

• The shape of a sampling distribution depends on the sample size.

• For large sample sizes, the central limit theorem applies, which allows us to use
normal distributions.

• For small sample sizes, we need to learn a new distribution.


Confidence Interval

1. Confidence Interval for a Population mean

2. Confidence Interval for the Difference of Two Population Means

3. Confidence Interval for a Population Proportion

4. Confidence Interval for the Difference of Two Population Proportions

5. Confidence Interval for the Variance of a Normally Distributed Population

6. Confidence Interval for the Ratio of Variances of Two Normally Distributed Populations
Confidence Level

The percentage of intervals, over repeated sampling, that will contain the true mean.
Point Estimator of Population Mean

A point estimate of the population mean, µ, is the sample mean:

$$\bar{x} = \frac{\sum x_i}{n}$$
Example
Point Estimation of Population Proportion

• A point estimate of the population proportion, p, is the sample proportion p̂ = x/n,
where x is the number of successes in the sample.

• Example: A sample of 200 students at a large university is selected to estimate the
proportion of students who wear contact lenses. In this sample, 47 wear contact lenses.

p̂ = 47/200 = .235


Properties of Point Estimators

• Since an estimator is calculated from sample values, it varies from sample to sample
according to its sampling distribution.

• An estimator is unbiased if the mean of its sampling distribution equals the parameter
of interest. It does not systematically overestimate or underestimate the target
parameter.

• Both sample mean and sample proportion are unbiased estimators of population mean
and proportion. The following sample variance is an unbiased estimator of population
variance.

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
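The three point estimators above can be computed directly. The following is a minimal sketch using Python's standard library; the list `sample` is hypothetical data invented for illustration, while the proportion reuses the contact-lens counts from the earlier example.

```python
from statistics import mean, variance

# Hypothetical measurements, invented only to illustrate the formulas.
sample = [4.1, 5.3, 4.8, 6.0, 5.2]

x_bar = mean(sample)          # point estimate of the population mean
s_squared = variance(sample)  # sample variance, computed with the n - 1 divisor

# Sample proportion from the contact-lens example: 47 successes out of 200.
p_hat = 47 / 200

print(x_bar, s_squared, p_hat)
```

Note that `statistics.variance` divides by n − 1, matching the unbiased estimator, whereas `statistics.pvariance` divides by n.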
Properties of Point Estimators

• Of all the unbiased estimators, we prefer the estimator whose sampling distribution
has the smallest spread or variability.
Maximum Error of the Estimate

• The maximum error of the estimate is denoted by E and is one-half the width of the
confidence interval for means and proportions.

• The maximum difference between the point estimate and the actual parameter.
Sampled Population & Target Population

The sampled population is the population from which we actually draw the sample.

The target population is the population about which we wish to make an inference.

•These two populations may or may not be the same.


•When they are the same, it is possible to use statistical inference procedures to make
conclusions about the target population.
• If the sampled and target populations are different, conclusions can be made about the
target population only on the basis of non-statistical considerations.
•In many situations the sampled population and the target population are identical; when
this is the case, inferences about the target population are straightforward.
•The researcher, however, should be aware that this is not always the case and not fall
into the trap of drawing unwarranted inferences about a population that is different from
the one that is sampled.
Random and Non-Random Samples

The strict validity of statistical procedures depends on the assumption of random samples.
Random Sample

A random sample is a selection of some members of the population such that each
member is independently chosen and has a known nonzero probability of being selected.
Simple Random Sample

A simple random sample is a random sample in which each group member has the
same probability of being selected.
Confidence Intervals

• For a Normal distribution, we know that 95% of values will be within 1.96 standard
deviations of the mean µ.

• So, given one estimate x̄, we can say that this estimate is within 1.96 standard errors
of the actual population mean µ, with 95% confidence.

• We can turn this knowledge on its head: given x̄, we can be 95% confident that the
true mean µ is within 1.96 standard errors of it.

[Figure: normal curve with the central 95% of the area shaded.]
Confidence Interval

• From this we can specify a range of values within which we are 95% confident
that the population mean (µ) lies.
• This is called a confidence interval.
• 95% confidence interval for a population mean (from a large enough sample):

$$\bar{x} \pm 1.96 \times \text{standard error} = \bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$$

• As a rule of thumb, this result holds for samples of size 30 or more. So a large sample,
in this context, is a sample of 30 or more.
The Margin of Error

• In this note we assume that the sample sizes are large.

• From the Central Limit Theorem (CLT), the sampling distributions of x̄ and p̂ will
be approximately normal under certain assumptions.

• For unbiased estimators with normal sampling distributions, 95% of all point
estimates will lie within 1.96 standard deviations of the parameter of interest.

• Margin of error: provides an upper bound on the difference between a particular
estimate and the parameter that it estimates, a bound that holds for 95% of samples.
It is calculated as

$$1.96 \times \text{standard error of the estimator}$$
The 95% Confidence Interval
Components of an Interval Estimate

• The interval estimate of µ is centered on the point estimate of µ. Approximately 95%
of the values of the standard normal curve lie within 2 standard deviations of the
mean. The z score in this case is called the reliability coefficient. The more precise
value to use is 1.96.
Estimating Means and Proportions

• For a quantitative population,

Point estimator of population mean µ: $\bar{x}$

Margin of error ($n \ge 30$): $\pm 1.96\,\dfrac{\sigma}{\sqrt{n}}$ (when σ is unknown and n ≥ 30, the sample standard deviation s is used in its place)

• For a binomial population,

Point estimator of population proportion p: $\hat{p} = x/n$

Margin of error: $\pm 1.96\sqrt{\dfrac{\hat{p}\hat{q}}{n}}$

Assumption: $np > 5$ and $nq > 5$; or $0 < p \pm 2\sqrt{\dfrac{pq}{n}} < 1$
Estimating Means and Proportions

Example: A homeowner randomly samples 64 homes similar to her own and finds
that the average selling price is ₦250,000 with a standard deviation of ₦15,000.
Estimate the average selling price for all similar homes in the city.

Solution

Point estimator of µ: $\bar{x} = 250{,}000$

Margin of error: $\pm 1.96\,\dfrac{s}{\sqrt{n}} = \pm 1.96\,\dfrac{15{,}000}{\sqrt{64}} = \pm 3675$
64
Estimating Means and Proportions

Example: A quality control technician wants to estimate the proportion of soda
cans that are underfilled. He randomly samples 200 cans of soda and finds 10
underfilled cans.

Solution

n = 200, p = proportion of underfilled cans

Point estimator of p: $\hat{p} = x/n = 10/200 = .05$

Margin of error: $\pm 1.96\sqrt{\dfrac{\hat{p}\hat{q}}{n}} = \pm 1.96\sqrt{\dfrac{(.05)(.95)}{200}} = \pm .03$
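The two margin-of-error calculations above can be checked with a short script. This is a sketch only; the function names are made up for illustration, and the standard deviation passed in is the sample value standing in for σ, as the large-sample rule allows.

```python
import math

Z_95 = 1.96  # reliability coefficient for 95% confidence

def margin_of_error_mean(sd, n, z=Z_95):
    """z * sd / sqrt(n); sd is sigma, or s when n is large."""
    return z * sd / math.sqrt(n)

def margin_of_error_proportion(x, n, z=Z_95):
    """z * sqrt(p_hat * q_hat / n) for x successes in n trials."""
    p_hat = x / n
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

print(round(margin_of_error_mean(15_000, 64)))        # home prices: 3675
print(round(margin_of_error_proportion(10, 200), 2))  # underfilled cans: 0.03
```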
Interval Estimation/Confidence Interval

• Create an interval (a, b) so that you are fairly sure that the parameter lies between
these two values.

• “Fairly sure” means “with high probability”, measured using the confidence
coefficient, 1 − α.

• Usually 1 − α = .90, .95, .99

• For large sample sizes,

100(1 − α)% Confidence Interval:

$$\text{Point estimator} \pm z_{\alpha/2}\,SE$$
Interpretation of A Confidence Interval

• A confidence interval is calculated from one given sample. It either covers or misses
the true parameter. Since the true parameter is unknown, you'll never know which
one is true.

• If independent samples are taken repeatedly from the same population, and a
confidence interval calculated for each sample, then a certain percentage (confidence
level) of the intervals will include the unknown population parameter.

• The confidence level associated with a confidence interval is the success rate of the
confidence interval.
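The repeated-sampling interpretation can be illustrated by simulation. The sketch below assumes a normal population with made-up values µ = 50 and σ = 10 (hypothetical, chosen only for the demonstration), draws many samples, and counts how often the 95% interval covers µ.

```python
import math
import random

def coverage(mu=50.0, sigma=10.0, n=40, trials=2000, z=1.96, seed=1):
    """Fraction of simulated 95% intervals that contain the true mean."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)  # same width for every sample
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials

print(coverage())  # should land close to 0.95
```

Any single interval either covers µ or misses it; the 95% describes the long-run success rate of the procedure.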
General Expression for an Interval Estimate
Table of confidence coefficients

α      1−α    zα/2
.10    .90    1.645
.05    .95    1.960
.01    .99    2.575
Confidence Intervals for Means and Proportions

• For a quantitative population,

Confidence interval for a population mean µ:

$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

• For a binomial population,

Confidence interval for a population proportion p:

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
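The reliability coefficient z with area α/2 to its right does not have to be read from a table. A minimal sketch using `statistics.NormalDist` from the Python standard library follows; the helper names `z_critical` and `z_interval` are illustrative, not part of the lecture.

```python
from statistics import NormalDist

def z_critical(confidence):
    """z_{alpha/2}: the standard normal value with area alpha/2 to its right."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_interval(estimate, standard_error, confidence=0.95):
    """Large-sample interval: point estimator ± z_{alpha/2} * SE."""
    half = z_critical(confidence) * standard_error
    return estimate - half, estimate + half

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
```

The computed 99% value is about 2.5758, so tables round it to either 2.575 or 2.58; both roundings appear in these slides.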
Confidence Intervals for Means and Proportions

Example: One sample of size 30 from the electronic components yields a sample
mean x̄ = 5,873 hours. We know σ = 3,959, so a 95% confidence interval would be:

$$\bar{x} \pm 1.96 \times \text{standard error} = \bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}} = 5873 \pm 1.96\,\frac{3959}{\sqrt{30}} = 5873 \pm 1417$$

that is, 4456 to 7290.

Interpretation: we would say that the average lifetime of all components (μ) is between
4,456 and 7,290 hours with 95% confidence.

Why is this any good?


Before: one estimate, x̄ = 5,873, but no idea of how good or bad it was, i.e. how close to μ it
was likely to be.

Now: 95% confident that μ is between 4,456 and 7,290 hours.

So, using the CLT leads to confidence intervals that enable us to estimate a parameter with
a certain level of confidence.
In other words, it gives us an objective measure of the actual amount of information
contained in our sample about the likely location of μ.
Confidence Intervals for Means and Proportions

Example: A random sample of n = 50 males showed a mean average daily intake of
dairy products equal to 756 grams with a standard deviation of 35 grams. Find a
95% confidence interval for the population average µ.

Solution

$$\bar{x} \pm 1.96\,\frac{s}{\sqrt{n}} = 756 \pm 1.96\,\frac{35}{\sqrt{50}} = 756 \pm 9.70$$

or 746.30 < µ < 765.70 grams.


Confidence Intervals for Means and Proportions

Example: Find a 99% confidence interval for µ, the population average daily intake
of dairy products for men.

Solution

$$\bar{x} \pm 2.58\,\frac{s}{\sqrt{n}} = 756 \pm 2.58\,\frac{35}{\sqrt{50}} = 756 \pm 12.77$$

or 743.23 < µ < 768.77 grams.

The interval must be wider to provide for the increased confidence that it does
indeed enclose the true value of µ.
Confidence Intervals for Means and Proportions

Example: A physical therapist wished to estimate, with 99 percent confidence, the
mean maximal strength of a particular muscle in a certain group of individuals. He
is willing to assume that strength scores are approximately normally distributed
with a variance of 144. A sample of 15 subjects who participated in the experiment
yielded a mean of 84.3.

Solution
•The z value corresponding to a confidence coefficient of .99, found in the standard
normal distribution table, is 2.58. This is our reliability coefficient.
•The standard error is $\sigma/\sqrt{n} = 12/\sqrt{15} = 3.0984$.
•Our 99 percent confidence interval for µ, then, is $84.3 \pm 2.58(3.0984) = 84.3 \pm 8.0$, or (76.3, 92.3).

We say we are 99 percent confident that the population mean is between 76.3 and 92.3
since, in repeated sampling, 99 percent of all intervals that could be constructed in the
manner just described would include the population mean.
Confidence Intervals for Means and Proportions
Example: Punctuality of patients in keeping appointments is of interest to a
research team. In a study of patient flow through the offices of general
practitioners, it was found that a sample of 35 patients were 17.2 minutes late for
appointments, on the average. Previous research had shown the standard
deviation to be about 8 minutes. The population distribution was felt to be non-normal.
What is the 90 percent confidence interval for µ, the true mean amount of
time late for appointments?

Solution
•Since the sample size is fairly large (greater than 30), and since the population standard
deviation is known, we draw on the central limit theorem and assume the sampling
distribution of x̄ to be approximately normally distributed.
•From the standard normal distribution table we find the reliability coefficient
corresponding to a confidence coefficient of .90 to be about 1.645, if we interpolate.
•The standard error is $\sigma/\sqrt{n} = 8/\sqrt{35} = 1.3522$.
•So our 90 percent confidence interval for µ is $17.2 \pm 1.645(1.3522) = 17.2 \pm 2.2$, or (15.0, 19.4).


Confidence Intervals for Means and Proportions

b) At 95% confidence, z = 1.96

5.98 ± 1.96 (.875)

(4.265, 7.695)

c) At 99% confidence, z = 2.575

5.98 ± 2.575 (.875)

(3.7261, 8.2339) (calculator values, which use the unrounded z = 2.5758)

•A higher confidence level gives a wider interval.
•There is less chance of missing the parameter, but the estimate is less precise.
•Calculator answers are slightly more accurate than table-based ones because the
calculator works with unrounded critical values.
Confidence Intervals for Means and Proportions

Example: Of a random sample of n = 150 college students, 104 of the students said
that they had played on a soccer team during their K-12 years. Estimate the
proportion of college students who played soccer in their youth with a 90%
confidence interval.

Solution

$$\hat{p} \pm 1.645\sqrt{\frac{\hat{p}\hat{q}}{n}} = \frac{104}{150} \pm 1.645\sqrt{\frac{.69(.31)}{150}} = .69 \pm .06$$

or .63 < p < .75.


Confidence Intervals for Means and Proportions

Example: The AB Internet and Country BA Life Project reported in a particular year
that 18 percent of Internet users have used it to search for information regarding
experimental treatments or medicines. The sample consisted of 1220 adult Internet
users, and information was collected from telephone interviews. We wish to
construct a 95 percent confidence interval for the proportion of Internet users in
the sampled population who have searched for information on experimental
treatments or medicines.

Solution

•We shall assume that the 1220 subjects were sampled in random fashion.
•The best point estimate of the population proportion is p̂ = .18.
•The reliability coefficient corresponding to a confidence level of .95 is 1.96.
•Our estimate of the standard error is $\sqrt{\hat{p}\hat{q}/n} = \sqrt{(.18)(.82)/1220} = .0110$.
•The 95 percent confidence interval for p, based on these data, is $.18 \pm 1.96(.0110) = .18 \pm .022$, or (.158, .202).
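Both proportion examples can be reproduced with one helper. The sketch below is illustrative only: the function name is invented, and the sample-size check applies the np > 5 and nq > 5 condition with p̂ standing in for the unknown p, which is an assumption of this sketch.

```python
import math

def proportion_interval(p_hat, n, z):
    """Large-sample interval p_hat ± z * sqrt(p_hat * q_hat / n)."""
    q_hat = 1 - p_hat
    # Assumed check: np > 5 and nq > 5, evaluated at the sample proportion.
    if n * p_hat <= 5 or n * q_hat <= 5:
        raise ValueError("sample too small for the normal approximation")
    half = z * math.sqrt(p_hat * q_hat / n)
    return p_hat - half, p_hat + half

print(proportion_interval(104 / 150, 150, 1.645))  # soccer example, 90%
print(proportion_interval(0.18, 1220, 1.96))       # Internet-use example, 95%
```

The first result differs slightly in the third decimal from the hand calculation, which rounds p̂ to .69 before computing the standard error.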


Estimating the Difference between Two Means

• Sometimes we are interested in comparing the means of two populations.
• The average growth of plants fed using two different nutrients.
• The average scores for students taught with two different teaching methods.
• To make this comparison,

A random sample of size n1 is drawn from population 1 with mean µ1 and variance σ1².

A random sample of size n2 is drawn from population 2 with mean µ2 and variance σ2².
Estimating the Difference between Two Means

Notations - Comparing Two Means

                 Mean    Variance    Standard Deviation
Population 1     µ1      σ1²         σ1
Population 2     µ2      σ2²         σ2

                            Sample size    Mean    Variance    Standard Deviation
Sample from Population 1    n1             x̄1      s1²         s1
Sample from Population 2    n2             x̄2      s2²         s2
Estimating the Difference between Two Means

• We compare the two averages by making inferences about µ1 − µ2, the difference in
the two population averages.

• If the two population averages are the same, then µ1 − µ2 = 0.

• The best estimate of µ1 − µ2 is the difference in the two sample means, $\bar{x}_1 - \bar{x}_2$.

1. The mean of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$, the difference in the population means.

2. The standard deviation of $\bar{x}_1 - \bar{x}_2$ is $\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$.

3. If the sample sizes (both $n_1$ and $n_2$) are large, the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is approximately normal, and the standard deviation can be estimated as $SE = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$.
Estimating µ1 − µ2
For large samples, point estimates and their margin of error as well as confidence
intervals are based on the standard normal (z) distribution.

Point estimate for µ1 − µ2: $\bar{x}_1 - \bar{x}_2$

Margin of error: $\pm z_{\alpha/2}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

Assumption: both $n_1 \ge 30$ and $n_2 \ge 30$

Confidence interval for µ1 − µ2:

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Estimating µ1 − µ2
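Example: Compare the average daily intake of dairy products of men and women
using a 95% confidence interval. (The solution uses x̄1 = 756, s1 = 35, n1 = 50 for
the men and x̄2 = 762, s2 = 30, n2 = 50 for the women.)

Solution

$$(\bar{x}_1 - \bar{x}_2) \pm 1.96\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (756 - 762) \pm 1.96\sqrt{\frac{35^2}{50} + \frac{30^2}{50}} = -6 \pm 12.78$$

or −18.78 < µ1 − µ2 < 6.78.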
Estimating µ1 − µ2

• Could you conclude, based on this confidence interval, that there is a difference in the
average daily intake of dairy products for men and women?

• The confidence interval contains the value µ1 − µ2 = 0. Therefore, it is possible that
µ1 = µ2. You would not want to conclude that there is a difference in average daily intake
of dairy products for men and women.
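The interval and the "is 0 inside?" check can be sketched in a few lines. The function name is hypothetical; the inputs are the summary statistics used in the dairy-intake solution (756, 35, 50 for men and 762, 30, 50 for women).

```python
import math

def diff_means_interval(x1, s1, n1, x2, s2, n2, z=1.96):
    """Large-sample interval for mu1 - mu2 from summary statistics."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x1 - x2
    return diff - z * se, diff + z * se

low, high = diff_means_interval(756, 35, 50, 762, 30, 50)
print(round(low, 2), round(high, 2))           # -18.78 6.78
print("0 inside interval:", low <= 0 <= high)  # True: no difference declared
```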
Estimating the Difference between Two Proportions

Sometimes we are interested in comparing the proportion of “successes” in two binomial
populations.
• The germination rates of untreated seeds and seeds treated with a fungicide.
• The proportion of male and female voters who favor a particular candidate for
governor.

A random sample of size n1 is drawn from binomial population 1 with parameter p1.

A random sample of size n2 is drawn from binomial population 2 with parameter p2.
Estimating the Difference between Two Proportions

Notations - Comparing Two Proportions

                            Sample size    Sample Proportion    Sample Variance    Standard Deviation
Sample from Population 1    n1             p̂1 = x1/n1           p̂1q̂1/n1            √(p̂1q̂1/n1)
Sample from Population 2    n2             p̂2 = x2/n2           p̂2q̂2/n2            √(p̂2q̂2/n2)
Estimating the Difference between Two Proportions

We compare the two proportions by making inferences about p1 − p2, the difference in the
two population proportions.

• If the two population proportions are the same, then p1 − p2 = 0.

• The best estimate of p1 − p2 is the difference in the two sample proportions,

$$\hat{p}_1 - \hat{p}_2 = \frac{x_1}{n_1} - \frac{x_2}{n_2}$$
Estimating the Difference between Two Proportions

1. The mean of $\hat{p}_1 - \hat{p}_2$ is $p_1 - p_2$, the difference in the population proportions.

2. The standard deviation of $\hat{p}_1 - \hat{p}_2$ is $\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$.

3. If the sample sizes (both $n_1$ and $n_2$) are large, the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is approximately normal, and the standard deviation can be estimated as $SE = \sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}$.
Estimating the Difference between Two Proportions

For large samples, point estimates and their margin of error as well as confidence intervals
are based on the standard normal (z) distribution.

Point estimate for p1 − p2: $\hat{p}_1 - \hat{p}_2$

Margin of error: $\pm 1.96\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}$

Confidence interval for p1 − p2:

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$

Assumption: both $n_1$ and $n_2$ are sufficiently large so that $-1 < (\hat{p}_1 - \hat{p}_2) \pm 2SE < 1$.
Estimating the Difference between Two Proportions
Example: Compare the proportion of male and female college students who said that they
had played on a soccer team during their K-12 years using a 99% confidence interval.

Youth Soccer     Male    Female
Sample Size      80      70
Played Soccer    65      39

Solution

$$(\hat{p}_1 - \hat{p}_2) \pm 2.58\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} = \left(\frac{65}{80} - \frac{39}{70}\right) \pm 2.58\sqrt{\frac{.81(.19)}{80} + \frac{.56(.44)}{70}} = .25 \pm .19$$

or .06 < p1 − p2 < .44.
Estimating the Difference between Two Proportions

• Could you conclude, based on this confidence interval, that there is a difference in the
proportion of male and female college students who said that they had played on a
soccer team during their K-12 years?

• The confidence interval does not contain the value p1 − p2 = 0. Therefore, it is not
likely that p1 = p2. You would conclude that there is a difference in the proportions for
males and females.

A higher proportion of males than females played soccer in their youth.
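The soccer comparison can be recomputed from the raw counts. This is a sketch with an invented function name; it keeps the sample proportions unrounded, so its endpoints differ by about .01 from the hand calculation, which rounds p̂1 and p̂2 to two decimals first.

```python
import math

def diff_proportions_interval(x1, n1, x2, n2, z):
    """Large-sample interval for p1 - p2 from success counts."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Males: 65 of 80 played soccer; females: 39 of 70. z = 2.58 for 99%.
low, high = diff_proportions_interval(65, 80, 39, 70, z=2.58)
print(round(low, 3), round(high, 3))
print("0 inside interval:", low <= 0 <= high)
```

Either way the whole interval lies above 0, which supports the conclusion that the proportions differ.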


Large Sample Point Estimators (Summary)

To estimate one of four population parameters when the sample sizes are large, use the
following point estimators with the appropriate margins of error.
Large Sample Interval Estimators (Summary)

To estimate one of four population parameters when the sample sizes are large, use the
following interval estimators.
Large Sample Estimation (Summary)

1. All values in the interval are possible values for the unknown population
parameter.

2. Any values outside the interval are unlikely to be the value of the unknown
parameter.

3. To compare two population means or proportions, look for the value 0 in the
confidence interval. If 0 is in the interval, it is possible that the two population
means or proportions are equal, and you should not declare a difference. If 0 is not
in the interval, it is unlikely that the two means or proportions are equal, and you
can confidently declare a difference.
Part II

Small Sample Estimation


Small Sample Estimation of a Population Mean

• When the sample size is small, the estimation procedures for large samples are
not appropriate.

• Point estimators remain the same, but the CLT can no longer be relied on.

• There are small-sample interval estimators (confidence intervals) for:

o µ, the mean of a normal population.
o µ1 − µ2, the difference between two normal population means.
History of the Student t Distribution

• Student's t distribution was developed in 1908 by W. S. Gosset (1876-1937), who
worked for the Guinness Brewery. Gosset was a chief brewer for Guinness. The
brewery wouldn't allow him to publish his work under his name, so he used the
pseudonym "Student".

• He derived the correct sampling distribution for the mean of small samples (n < 30)
and called it the ‘t distribution’.

• In his honour, it is often called the ‘Student t’ distribution.

• The mathematical details are complicated, but it turns out that we perform exactly the
same calculations as before, with the one change that the t distribution is used instead
of the normal distribution.
Properties of the t distribution

The Student's t distribution is very similar to the standard normal distribution.

•It is symmetric about its mean

•It has a mean of zero

•It has a standard deviation and variance greater than 1.

•There are actually many t distributions, one for each degree of freedom

•As the sample size increases, the t distribution approaches the normal distribution.

•It is bell shaped.

•The t-scores can be negative or positive, but the probabilities are always positive.
Degrees of Freedom

• A degree of freedom occurs for every data value which is allowed to vary once a
statistic has been fixed.
• For a single mean, there are n-1 degrees of freedom.
• This value will change depending on the statistic being used.
Properties of the t distribution
Deciding between z and t

• When constructing a confidence interval for a population mean, we must decide
whether to use z or t.

• Which one to use depends on the size of the sample, whether the population is normally
distributed, and whether or not the population variance is known.

• There are various flowcharts and decision keys that can be used to help decide.
Assumptions

• Student's t result applies only to the mean of a sample drawn from a normally
distributed population with some mean μ and finite standard deviation σ.

• This is in contrast to the CLT for large samples, which requires no such assumption
about normality.

• The t-distribution also requires the assumption of independence within the sample.
The t-Distribution

• The t-distribution itself is bell shaped and symmetric – just like the normal distribution
but is ‘flatter’.

• There are many t distributions – one for each sample size.

• The rule used is: for a sample of size n – use the t distribution with degrees of
freedom = n−1

Example: if the sample size is 15, then use a t distribution with degrees of freedom 15
− 1=14.

• Note the degrees of freedom often abbreviated to df.


The Sampling Distribution of the Sample Mean

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

$$\frac{\bar{x} - \mu}{s/\sqrt{n}} \quad \text{is not normal}$$
Student’s t Distribution

Fortunately, this statistic does have a sampling distribution that is well known to
statisticians, called the Student’s t distribution, with n-1 degrees of freedom.

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

We can use this distribution to create estimation procedures for the population mean µ.
Using the t-Table

• The table gives the values of t that cut off specified areas in the right tail of the t
distribution.
• Use the row index df and the appropriate tail area a to find t_a, the value of t with
area a to its right.

For a random sample of size n = 10, find a value of t that cuts off .025 in
the right tail.

Row = df = n − 1 = 9

Column subscript = a = .025

t.025 = 2.262
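A table lookup like this one can be approximated numerically. The sketch below uses only the Python standard library: it integrates the t density with Simpson's rule and finds the critical value by bisection. All function names are illustrative, and this is a teaching sketch rather than a substitute for a statistics library (for example, `math.gamma` overflows for df above roughly 340).

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t) for t >= 0, using Simpson's rule on [0, t] (steps is even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(tail_area, df):
    """Value t_a with area tail_area to its right (0 < tail_area < 0.5)."""
    low, high = 0.0, 1.0
    while 1 - t_cdf(high, df) > tail_area:  # widen until the root is bracketed
        high *= 2
    for _ in range(60):                     # bisection on the right-tail area
        mid = (low + high) / 2
        if 1 - t_cdf(mid, df) > tail_area:
            low = mid
        else:
            high = mid
    return (low + high) / 2

print(round(t_critical(0.025, 9), 3))  # 2.262, matching the table entry
```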
Small Sample Confidence Interval for Population Mean

Small-sample (1 − α)100% confidence interval for the population mean µ:

$$\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$

where $t_{\alpha/2}$ is the value of t that cuts off area α/2 in the right tail of a
t-distribution with df = n − 1.

Assumption: the population must be normal.


Small Sample Confidence Interval for Population Mean µ
Estimating the Difference between Two Means

You can also create a 100(1 − α)% confidence interval for µ1 − µ2:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

with

$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

and $t_{\alpha/2}$ the critical value of t with $n_1 + n_2 - 2$ degrees of freedom.

Remember the three assumptions:
1. Original populations normal
2. Samples random and independent
3. Equal population variances, $\sigma_1^2 = \sigma_2^2$
Estimating the Difference between Two Means

Example: A student recorded the mileage he obtained while commuting to school in
his car. He kept track of the mileage for twelve different tanks of fuel, involving
gasoline of two different octane ratings. Compute the 95% confidence interval for the
difference of mean mileages. His data follow:

87 Octane: 26.4, 27.6, 29.7, 28.9, 29.3, 28.8
90 Octane: 30.5, 30.9, 29.2, 31.7, 32.8, 29.3

Solution

•Let 87 octane fuel be the first group and 90 octane fuel the second group,
•so we have n1 = n2 = 6 and x̄1 = 28.45, s1 = 1.228, x̄2 = 30.73, s2 = 1.392.
•Degrees of freedom (df) = n1 + n2 − 2 = 10.
•The critical value of t is 2.228.

$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{5(1.508) + 5(1.938)}{10} = 1.723$$

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = (28.45 - 30.73) \pm 2.228\sqrt{1.723\left(\frac{1}{6} + \frac{1}{6}\right)} = -2.28 \pm 1.688$$

The 95% confidence interval is (−3.968, −.592).
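The pooled calculation can be run on the raw mileage data. In this sketch the function name is invented, the critical value 2.228 is taken from the solution above rather than computed, and the standard error uses 1/n1 + 1/n2 with n1 = n2 = 6, since there are six tanks per group.

```python
import math
from statistics import mean, variance

def pooled_t_interval(sample1, sample2, t_crit):
    """Small-sample interval for mu1 - mu2, assuming equal population variances."""
    n1, n2 = len(sample1), len(sample2)
    pooled_var = ((n1 - 1) * variance(sample1)
                  + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean(sample1) - mean(sample2)
    return diff - t_crit * se, diff + t_crit * se

octane_87 = [26.4, 27.6, 29.7, 28.9, 29.3, 28.8]
octane_90 = [30.5, 30.9, 29.2, 31.7, 32.8, 29.3]

# t critical value for df = 6 + 6 - 2 = 10 and 95% confidence.
low, high = pooled_t_interval(octane_87, octane_90, t_crit=2.228)
print(round(low, 2), round(high, 2))
```

Because both endpoints are negative, 0 is not in the interval, which points to higher mean mileage with the 90 octane fuel.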
Estimating the Difference between Two Means
Example: The purpose of a study by Granholm et al. was to determine the
effectiveness of an integrated outpatient dual-diagnosis treatment program for
mentally ill subjects. The authors were addressing the problem of substance abuse
issues among people with severe mental disorders. A retrospective chart review
was carried out on 50 consecutive patient referrals to the Substance Abuse /
Mental Illness program at the DHA Healthcare System. One of the outcome
variables examined was the number of inpatient treatment days for psychiatric
disorder during the year following the end of the program. Among 18 subjects with
schizophrenia, the mean number of treatment days was 4.7 with a standard
deviation of 9.3. For 10 subjects with bipolar disorder, the mean number of
psychiatric disorder treatment days was 8.8 with a standard deviation of 11.5. We
wish to construct a 95 percent confidence interval for the difference between the
means of the populations represented by these two samples.

Solution

n1 = 18
s1 = 9.3
x̄1 = 4.7
n2 = 10
s2 = 11.5
x̄2 = 8.8
Estimating the Difference between Two Means

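• Degrees of freedom (df) = n1 + n2 − 2 = 18 + 10 − 2 = 26.

• The reliability factor is 2.0555.

• The pooled variance is $s^2 = \dfrac{17(9.3)^2 + 9(11.5)^2}{26} = 102.33$.

The 95 percent confidence interval for the difference between population means is as follows:

$$(4.7 - 8.8) \pm 2.0555\sqrt{\frac{102.33}{18} + \frac{102.33}{10}} = -4.1 \pm 8.20$$

or (−12.3, 4.1).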

• REFERENCES

• Dr. Muhammad Arif, PhD
