Chapter 2

Chapter Two discusses inference about population means and proportions, emphasizing the importance of making predictions based on data. It outlines two main methods of statistical inference: estimation and hypothesis testing, detailing point and interval estimation techniques. The chapter also covers properties of good estimators, the concept of sampling distributions, and the construction of confidence intervals for population parameters.

Uploaded by

Tigist G


CHAPTER TWO

2. Inference about a population mean and proportion

2.1. Introduction
Inference, specifically decision making and prediction, is centuries old and plays a
very important role in our lives. Each of us faces daily personal decisions and
situations that require predictions concerning the future. The inferences that
individuals make should be based on relevant facts, which we call observations, or
data.
Methods for making inferences about parameters fall into one of two categories.
Either we will estimate (predict) the value of the population parameter of interest or
we will test a hypothesis about the value of the parameter. These two methods of
statistical inference, estimation and hypothesis testing, involve different procedures,
and, more importantly, they answer two different questions about the parameter. In
estimating a population parameter, we are answering the question, "What is the
value of the population parameter?" In testing a hypothesis, we are answering the
question, "Is the parameter value equal to this specific value?"
Inference is the process of drawing conclusions about the whole population from
sample data. Inferential statistics uses the sample results to
make decisions and draw conclusions about the population from which the sample
is drawn. In statistics there are two ways through which inference can be made:
• Statistical estimation
• Statistical hypothesis testing

Parameter and Statistic

• A number that describes a population is called a parameter.
• A number that describes a sample is called a statistic.
• If we take a sample and calculate a statistic, we often use that statistic to
infer something about the population from which the sample was drawn.

Getu D.
The two common forms of statistical inference are:
1. Estimation
2. Null hypothesis tests of significance (NHTS)
There are two forms of estimation:
• Point estimation (the single most likely value for the parameter)
• Interval estimation (also called a confidence interval for the parameter)
Both estimation and NHTS are used to infer parameters. A parameter is a statistical
constant that describes a feature of a phenomenon, population, pmf, or pdf.
2.2 Statistical Estimation:
This is one way of making inference about the population parameter where the
investigator does not have any prior notion about values or characteristics of the
population parameter. There are two types of estimation:

i. Point Estimation: The goal of point estimation is to make a reasonable guess of
the unknown value of a designated population quantity, e.g., the population
mean. The quality of an individual estimate depends on the individual sample
from which it was computed and is therefore affected by chance variation. A point
estimate is a single number, computed from sample information, that is used to
estimate a parameter. The best point estimate of the population mean µ is the
sample mean $\bar{X}$.

ii. Interval estimation: This is the procedure that produces an interval of values as
an estimate for a parameter, that is, an interval that contains the likely values of the
parameter. It deals with identifying the upper and lower limits of a parameter.

Estimator and Estimate
An estimator is the rule or random variable that helps us to approximate a population
parameter, whereas an estimate is one of the possible values which an estimator can
assume. For example, the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is an estimator for the population
mean, and $\bar{X} = 10$ is an estimate, which is one of the possible values of $\bar{X}$.

Properties of best estimator
Three Properties of a Good Estimator
• It should be unbiased.
• It should be consistent.
• It should be relatively efficient.

• The estimator should be an unbiased estimator. That is, the expected value or the
mean of the estimates obtained from samples of a given size is equal to the
parameter being estimated. It is desirable that the sampling distribution be
centered on the true population parameter. An estimator with this property is
called unbiased.

Example: The sample mean is an unbiased estimator of the population mean, and the
sample variance is an unbiased estimator of the population variance.

Solution
Let $X_1, \dots, X_n$ be independent observations from a population with mean µ and
variance σ². Then

$E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \mu$, so it is a UE.

Now, we want to compute the expected value of the sample variance

$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$

Now, let's multiply both sides of the equation by n − 1, just so we don't have to
keep carrying that around, and square out the right side, just like the
shortcut formula for SSX:

$(n-1)S^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$

Taking expected values of both sides, let's write that again as a numbered equation:

$(n-1)E(S^2) = nE(X^2) - nE(\bar{X}^2)$   (1)

Unfortunately, the expected value of the square of something is not equal to
the square of the expected value, so we seem to have hit an impasse with both
terms on the RHS. But we're not out of tricks yet. Each of those terms is an
expected value of something squared: a second moment. Let's use the fact
about moments that $E(Y^2) = Var(Y) + [E(Y)]^2$ for any random variable Y. First, let Y be
the random variable defined by the sample mean. We're trying to figure out the
expected value of its square:

$E(\bar{X}^2) = Var(\bar{X}) + [E(\bar{X})]^2 = \frac{\sigma^2}{n} + \mu^2$

We can substitute this for the second term on the RHS of equation 1. Also, note
that the first term on the RHS of equation 1 is the second moment of X, so that can
also be rewritten as $E(X^2) = \sigma^2 + \mu^2$. Doing both substitutions gives us:

$(n-1)E(S^2) = n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right) = (n-1)\sigma^2$, so $E(S^2) = \sigma^2$.

• The estimator should be consistent. For a consistent estimator, as the sample size
increases, the value of the estimator approaches the value of the parameter
estimated.
• The estimator should be a relatively efficient estimator. That is, of all the
statistics that can be used to estimate a parameter, the relatively efficient
estimator has the smallest variance. It is desirable that our chosen estimator have
a small standard error in comparison with other estimators we might have
chosen.
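The unbiasedness property can be checked exactly on a tiny example. The sketch below assumes a hypothetical population of five values (1, 3, 5, 7, 9) and samples of size 2 drawn with replacement, so that the observations are independent; the helper names are illustrative only. It averages the sample mean and two versions of the sample variance (divisor n − 1 and divisor n) over all possible ordered samples.

```python
from fractions import Fraction
from itertools import product

# Hypothetical small population and sample size (illustrative values only).
population = [1, 3, 5, 7, 9]
n = 2

def mean(xs):
    """Exact arithmetic mean as a Fraction."""
    return Fraction(sum(xs), len(xs))

def sum_sq_dev(xs):
    """Sum of squared deviations from the mean of xs."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

mu = mean(population)                               # population mean
sigma2 = sum_sq_dev(population) / len(population)   # population variance

# All ordered samples of size n drawn WITH replacement (independent draws).
samples = list(product(population, repeat=n))

exp_xbar = mean([mean(s) for s in samples])
exp_s2_unbiased = mean([sum_sq_dev(s) / (n - 1) for s in samples])
exp_s2_biased = mean([sum_sq_dev(s) / n for s in samples])

print("mu =", mu, " E(x-bar) =", exp_xbar)
print("sigma^2 =", sigma2, " E(S^2) with divisor n-1 =", exp_s2_unbiased)
print("E of the divisor-n version =", exp_s2_biased)
```

In this setup the divisor n − 1 version averages exactly to σ², while the divisor n version averages to (n − 1)σ²/n, which is why the former is the one called unbiased.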

2.2.1 Sampling Distribution of the sample mean
Because statistics such as $\bar{x}$ vary from sample to sample, they are random variables.
As such, statistics have probability distributions associated with them. In order to
make probability statements regarding a sample statistic, we need to know the
probability distribution of the sample statistic. That is to say, we need to know the
shape, center and spread of the sample statistic's distribution.
The sampling distribution of a statistic is a probability distribution for all possible
values of the statistic computed from a sample of size n.

There are commonly three properties of interest of a given sampling distribution:
• Its mean
• Its variance
• Its functional form

The sampling distribution of the sample mean is a theoretical probability distribution
that shows the functional relationship between the possible values of a given sample
mean based on samples of size n and the probability associated with each value, for all
possible samples of size n drawn from that particular population.
Steps for the construction of Sampling Distribution of the mean

1. From a finite population of size N, randomly draw all possible samples of size n.
2. Calculate the mean for each sample.
3. Summarize the means obtained in step 2 in terms of a frequency distribution or
relative frequency distribution.

Example: Suppose we have a population of size N = 5, consisting of the ages of five
children: 1, 3, 5, 7 and 9.

Population mean: $\mu = \frac{\sum X_i}{N} = \frac{1 + 3 + 5 + 7 + 9}{5} = \frac{25}{5} = 5$

Population variance: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N} = \frac{(1-5)^2 + (3-5)^2 + (5-5)^2 + (7-5)^2 + (9-5)^2}{5} = \frac{40}{5} = 8$

The standard deviation is σ = 2.828427. In most situations we do not know the
population values µ and σ, so we estimate them from sample values.

Example: Take samples of size 2 without replacement and construct the sampling
distribution of the sample mean.
There are $\binom{N}{n} = \binom{5}{2} = 10$ possible samples of size 2, as shown below.

Sample No   Sample   Mean ($\bar{x}$)
1           1, 3     2
2           1, 5     3
3           1, 7     4
4           1, 9     5
5           3, 5     4
6           3, 7     5
7           3, 9     6
8           5, 7     6
9           5, 9     7
10          7, 9     8

For instance, $\bar{x}_1 = \frac{1 + 3}{2} = 2$, $\bar{x}_2 = \frac{1 + 5}{2} = 3$, etc.
Sampling is random, so each sample has the same probability $1/\binom{N}{n} = \frac{1}{10}$ of being
selected.

$\bar{x}$     f    Probability
2       1    1/10
3       1    1/10
4       2    2/10
5       2    2/10
6       2    2/10
7       1    1/10
8       1    1/10
Total   10   1.0

This is the sampling distribution of $\bar{x}$.
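The construction steps can be sketched in a few lines of Python. This is a minimal illustration using the five ages from the example and samples of size 2 without replacement; the variable names are assumptions made for the sketch, not notation from the notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [1, 3, 5, 7, 9]   # ages of the five children in the example
n = 2                          # sample size

# Step 1: all possible samples of size n without replacement.
samples = list(combinations(population, n))

# Step 2: the mean of each sample (kept exact with Fraction).
means = [Fraction(sum(s), n) for s in samples]

# Step 3: frequency and relative-frequency (probability) distribution.
counts = Counter(means)
sampling_dist = {m: Fraction(f, len(samples)) for m, f in sorted(counts.items())}

expected_mean = sum(m * p for m, p in sampling_dist.items())
variance = sum((m - expected_mean) ** 2 * p for m, p in sampling_dist.items())

for m, p in sampling_dist.items():
    print(f"x-bar = {m}: f = {counts[m]}, probability = {p}")
print("E(x-bar) =", expected_mean)
print("Var(x-bar) =", variance)
```

The mean of the sampling distribution equals the population mean 5. The variance comes out as 3 rather than σ²/n = 4 because sampling here is without replacement from a finite population; with the finite population correction, (σ²/n)(N − n)/(N − 1) = 4 × 3/4 = 3.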

Remark:

1. In general, if sampling is with replacement, $\sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$.

2. The sample mean is an unbiased estimator of the population mean, i.e.
$\mu_{\bar{x}} = \mu$, or equivalently $E(\bar{x}) = \mu$.

Sampling may be from a normally distributed population or from a non-normally
distributed population.
When sampling is from a normally distributed population, the distribution of $\bar{x}$
will possess the following properties:

1. The distribution of $\bar{x}$ will be normal.
2. The mean of $\bar{x}$ is equal to the population mean, i.e. $\mu_{\bar{x}} = \mu$.
3. The variance of $\bar{x}$ is equal to the population variance divided by the sample
size, i.e. $\sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$.

Hence $\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ and $Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$.
2.2.2 Point and Interval estimation of the population mean
i. Point estimation of the population mean
A point estimate is the numeric value of a sample statistic that is used to estimate
the value of a population parameter. The best point estimator of the population
mean µ is the sample mean $\bar{X}$.
ii. Interval estimation (confidence interval) of the population mean
Although $\bar{X}$ possesses nearly all the qualities of a good estimator, different samples
are very likely to result in different sample means, and thus there is some degree of
uncertainty involved. Because the point estimate is unlikely to be exactly correct, we
usually specify a range of values in which the population parameter is likely to be.
Besides, a point estimate does not provide any information about the variability of
the estimator.
Definition: An interval estimator (or confidence interval) is a formula that tells us
how to use sample data to calculate an interval that estimates a population
parameter.

For example, if our confidence level is 95%, then in the long run, 95% of our sample
confidence intervals will contain µ.
Consequently, interval estimation is often preferred. This technique provides a range
of reasonable values that are intended to contain the parameter of interest with a
certain degree of confidence. This range of values is called a confidence interval.
The confidence level of an interval estimate of a parameter is the probability that
the interval estimate will contain the parameter, assuming that a large number of
samples are selected and that the estimation process on the same parameter is
repeated.
A confidence interval is a specific interval estimate of a parameter determined by
using data obtained from a sample and by using the specific confidence level of the
estimate.
There are different cases to be considered when constructing confidence
intervals.
Suppose that a sample of size n is selected from a population that has mean µ and
standard deviation σ. Let $X_1, X_2, \dots, X_n$ be the n observations, which are independent
and identically distributed (i.i.d.). Define the sample mean
and the total of these n observations as follows:
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $T = \sum_{i=1}^{n} X_i$

The central limit theorem states that the sample mean follows approximately the
normal distribution with mean µ and standard deviation $\sigma/\sqrt{n}$, where µ and σ are the
mean and standard deviation of the population from which the sample was selected.
The sample size n has to be large (usually n ≥ 30) if the population from which the
sample is taken is not normal.
If the population follows the normal distribution, then the sample size n can be either
small or large.
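A small simulation can make the central limit theorem statement concrete. The sketch below is only an illustration under assumed settings: the population is taken to be exponential with mean 1 and standard deviation 1 (clearly non-normal), the sample size is n = 30, and the seed and number of repetitions are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(42)   # fixed seed so the run is repeatable
n = 30                    # sample size (the usual "large sample" threshold)
reps = 5000               # number of simulated samples (arbitrary choice)

# Assumed non-normal population: exponential with mean 1 and sd 1.
mu, sigma = 1.0, 1.0

sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(reps)
]

mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)

print("mean of sample means:", round(mean_of_means, 4), "(population mean is", mu, ")")
print("sd of sample means:  ", round(sd_of_means, 4),
      "(sigma / sqrt(n) is", round(sigma / sqrt(n), 4), ")")
```

The simulated sample means should centre near µ with a spread near σ/√n, which is the behaviour the theorem describes; a histogram of `sample_means` would look roughly bell-shaped even though the population is skewed.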

Case 1: When n is large or if the population is normally distributed
• If the variable x of a population is normally distributed with mean µ and standard
deviation σ then, for any sample of size n, the variable $\bar{x}$ is also normally
distributed with mean µ and standard deviation $\sigma/\sqrt{n}$.
In this case $\bar{X}$ is normally distributed with mean µ and variance $\sigma^2/n$. That is,
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$. This allows us to use the normal distribution curve for computing
confidence intervals.
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ has a normal distribution with mean 0 and standard deviation 1.
⇒ $\mu = \bar{X} - Z\frac{\sigma}{\sqrt{n}}$ ⇒ $\varepsilon = \bar{X} - \mu = Z\frac{\sigma}{\sqrt{n}}$, where ε is the error of estimation.
For the interval estimator to be a good estimator the error should be small. How can ε
be small?
o If σ is small
o By increasing the sample size (n)
o By decreasing Z
The most direct way is to decrease Z. To choose Z we attach a probability (the
confidence level) to the standard normal distribution; note that a smaller Z gives a
narrower interval but a lower confidence level.

Figure 2.1: A (1 − α) confidence interval (standard normal curve with central area
1 − α between $-z_{\alpha/2}$ and $z_{\alpha/2}$, and area α/2 in each tail).

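⇒ $P(-Z_{\alpha/2} < Z < Z_{\alpha/2}) = 1 - \alpha$
⇒ $P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha$
⇒ $P\left(-\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < -\mu < -\bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
⇒ $P\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
⇒ A (1 − α)100% confidence interval for µ is $\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$.
However, most of the time σ is not known; in that case (for large n) we estimate σ by its point estimate S.
⇒ $\left(\bar{X} - Z_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{S}{\sqrt{n}}\right)$ is a (1 − α)100% confidence interval for µ.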

The Z values corresponding to the most commonly used confidence levels are given
below.

(1 − α)100%   α      α/2     $Z_{\alpha/2}$
90            0.10   0.05    1.645
95            0.05   0.025   1.96
99            0.01   0.005   2.58

For example, for a 95% confidence interval, $Z_{\alpha/2} = 1.96$.
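The tabulated values can be reproduced from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `z_critical` is an illustrative choice, not notation from the notes.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2}: the point with area alpha/2 to its right under N(0, 1)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.4f}")
```

The 99% value is about 2.576, which the table rounds to 2.58.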
Statistical interpretation of a confidence interval: Suppose we repeated this
sampling experiment 100 times; that is, we collected 100 different sets of data, each
set consisting of 40 observations. Suppose that we computed a 90% confidence interval
based on each of the 100 data sets. On average, we would expect 90 of the confidence
intervals to include the true mean µ; we would expect that 10 would not. The figure
90 comes from the fact that we chose a 90% confidence interval.
More generally, we can choose whatever confidence level we want. The convention is
to specify the confidence level as 1−α, where α is typically 0.1, 0.05 or 0.01. These
three α values correspond to confidence levels 90%, 95% and 99%. (α is the Greek
letter alpha.)
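The repeated-sampling interpretation can be illustrated by simulation. The sketch below assumes a hypothetical normal population with mean 50 and known standard deviation 10 (values chosen only for illustration), draws many samples of 40 observations, builds a 90% interval from each, and counts how often the interval covers the true mean. The seed and number of repetitions are arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist

rng = random.Random(1)            # fixed seed so the run is repeatable
mu, sigma = 50.0, 10.0            # hypothetical true mean and known sd
n = 40                            # observations per data set
reps = 1000                       # number of repeated samples
z = NormalDist().inv_cdf(0.95)    # z_{0.05}, used for a 90% interval

covered = 0
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    half_width = z * sigma / sqrt(n)
    # Does this particular interval contain the true mean?
    if xbar - half_width < mu < xbar + half_width:
        covered += 1

coverage = covered / reps
print(f"{covered} of {reps} intervals contain mu (coverage = {coverage:.3f})")
```

The observed coverage should be close to, but not exactly, 0.90; the confidence level describes the long-run fraction of intervals that capture µ, not any single interval.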

Definition: For any α between 0 and 1, we define $z_\alpha$ to be the point on the z-axis
such that the area to the right of $z_\alpha$ under the standard normal curve is α,
i.e. $P(Z > z_\alpha) = \alpha$.

Figure 1: The area to the right of $z_\alpha$ is α.

For example, $z_{0.05}$ is 1.645. The area outside $\pm z_{\alpha/2}$
is α/2 + α/2 = α. For example, $z_{0.025} = 1.96$, so the area to the right of 1.96 is 0.025, the
area to the left of −1.96 is also 0.025, and the area outside ±1.96 is 0.05.

• Why $z_{\alpha/2}$, rather than $z_\alpha$? We want to make sure that the total area outside the
interval is α. This means that α/2 should be to the left of the interval and α/2
should be to the right. In the special case of a 90% confidence interval, α = 0.1,
so α/2 = 0.05, and $z_{0.05}$ is indeed 1.645.

The expression $z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
is called the half width of the confidence interval or the margin of error. The half
width is a measure of precision; the tighter the interval, the more precise our
estimate. Not surprisingly, the half width

• decreases as the sample size increases;
• increases as the population standard deviation increases;
• increases as the confidence level increases (higher confidence requires a larger
$z_{\alpha/2}$).
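The three effects on the half width can be seen numerically. The sketch below assumes the z-based margin of error $z_{\alpha/2}\,\sigma/\sqrt{n}$; the function name `half_width` and the input values are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def half_width(confidence, sigma, n):
    """Margin of error z_{alpha/2} * sigma / sqrt(n) for a z-based interval."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / sqrt(n)

base = half_width(0.95, sigma=2, n=50)
print("baseline (95%, sigma=2, n=50):", round(base, 4))
print("larger sample (n=200):        ", round(half_width(0.95, 2, 200), 4))
print("larger sigma (sigma=4):       ", round(half_width(0.95, 4, 50), 4))
print("higher confidence (99%):      ", round(half_width(0.99, 2, 50), 4))
```

Because n enters through its square root, quadrupling the sample size only halves the margin of error.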

Case 2: When n is small and the population variance σ² is not known

When σ is known and the variable is normally distributed, or when σ is unknown
and n ≥ 30, the standard normal distribution is used to find confidence intervals.
However, in many situations, the population standard deviation is not known and
the sample size is less than 30. In such situations, the standard deviation from the
sample can be used in place of the population standard deviation for confidence
intervals. But a somewhat different distribution, called the t-distribution, must be
used when the sample size is less than 30 and the variable is normally distributed or
approximately normally distributed. The t-distribution is sometimes called
Student's t distribution.
Characteristics of the t-distribution
o The t-distribution is bell-shaped
o The t-distribution is symmetrical about the mean
o The mean, median, and mode are equal to 0 and are located at the center of
the distribution
o The curve never touches the x-axis
The t-distribution differs from the standard normal distribution in the following
ways:
o The variance is greater than 1
o The t-distribution is actually a family of curves based on the concept of
degrees of freedom, which is related to sample size.
o As the sample size increases, the t-distribution approaches the standard
normal distribution.
Many statistical distributions use the concept of degrees of freedom, and the formula
for finding the degrees of freedom varies between statistical tests. The degrees of
freedom are the number of values that are free to vary after a sample statistic has
been computed.

If the sample size is small and the population variance σ² is not known, $t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ has
a t-distribution with n − 1 degrees of freedom.

Figure: A (1 − α) confidence interval based on the t-distribution (central area 1 − α
between $-t_{\alpha/2}$ and $t_{\alpha/2}$, and area α/2 in each tail).

⇒ A (1 − α)100% confidence interval for µ is given by $\left(\bar{X} - t_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2}\frac{S}{\sqrt{n}}\right)$.

For any sample size n and any confidence level 1 − α, we have $t_{n-1,\alpha/2} > z_{\alpha/2}$.
Consequently, for the same standard deviation, intervals based on the t distribution
are always wider than those based on the standard normal.
As the sample size increases, the df increases. As the df increases, the t distribution
approaches the standard normal distribution.
For example, we know that $z_{0.05} = 1.645$. Look down the .05 column of the t table:
$t_{n,0.05}$ approaches 1.645 as n increases.
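The relationship between t and z critical values can be checked numerically without a printed table. The Python standard library has no t-distribution, so the sketch below integrates the t density with Simpson's rule and finds the critical value by bisection. This is a rough numerical illustration under those assumptions, not a replacement for a statistics package, and all function names are hypothetical.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(x, df, steps=1000):
    """P(T > x) for x > 0, via Simpson's rule on [0, x] and symmetry about 0."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(tail_area, df):
    """Value t with upper-tail area tail_area, found by bisection on [0, 200]."""
    lo, hi = 0.0, 200.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > tail_area:
            lo = mid      # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)   # z_{0.025}
for df in (4, 9, 29, 120):
    print(f"df = {df:3d}: t_(0.025) = {t_critical(0.025, df):.3f}   (z = {z:.3f})")
```

Each t value exceeds the corresponding z value, and the gap shrinks as the degrees of freedom grow.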
When to use t, when to use z? Strictly speaking, the conditions are as follows:

• Z-based confidence intervals are valid if we have a large sample;
• t-based confidence intervals are valid if we have a sample from a normal
distribution with an unknown variance.

Examples:
1) The registrar of Wollega University is interested in estimating the
average age of students who graduate with a BSc degree. From past studies the
population standard deviation is known to be 2 years. A sample of 50 graduating
students is selected, and the mean is found to be 23.2 years. Find the 95%
confidence interval estimate of the population mean age of the graduating students
at the university.

2) A random sample of size 36 selected from a normal population has a mean of 32.
Given that the population standard deviation (σ) is 4.2, find
a) A 95% confidence interval for the population mean
b) A 99% confidence interval for the population mean
c) Which interval is wider? Explain why.

3) The mean operating lifetime for a random sample of n = 10 light bulbs is
$\bar{X}$ = 4,000 hr, with the sample standard deviation S = 200 hr. The operating life of
bulbs in general is assumed to be approximately normally distributed. Find the
95% confidence interval for the true mean operating lifetime.
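Solutions:
1. Given: σ = 2 years, $\bar{X} = 23.2$ years, n = 50 (Case 1)
⇒ A (1 − α)100% confidence interval for µ is $\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$
⇒ $\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
$1 - \alpha = 0.95 \Rightarrow \alpha = 0.05,\ \alpha/2 = 0.025 \Rightarrow Z_{\alpha/2} = 1.96$
⇒ $23.2 - 1.96\left(\frac{2}{\sqrt{50}}\right) < \mu < 23.2 + 1.96\left(\frac{2}{\sqrt{50}}\right)$
⇒ $23.2 - 0.55 < \mu < 23.2 + 0.55$
⇒ $22.65 < \mu < 23.75$
Interpretation: The registrar is 95% confident that the average age of graduating students is
between 22.65 and 23.75 years.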

2. Given: $\bar{X} = 32$, σ = 4.2, n = 36 and the population is normal
⇒ A (1 − α)100% confidence interval for µ is $\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$

a) 95% ⇒ $1 - \alpha = 0.95 \Rightarrow \alpha = 0.05,\ \alpha/2 = 0.025 \Rightarrow Z_{\alpha/2} = 1.96$
⇒ $\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
⇒ $32 - 1.96\left(\frac{4.2}{\sqrt{36}}\right) < \mu < 32 + 1.96\left(\frac{4.2}{\sqrt{36}}\right)$
⇒ $32 \pm 1.372$
⇒ $30.628 < \mu < 33.372$
⇒ The 95% confidence interval is (30.628, 33.372).
Interpretation: We are 95% confident that the population mean is between 30.628 and 33.372.

b) 99% ⇒ $1 - \alpha = 0.99 \Rightarrow \alpha = 0.01,\ \alpha/2 = 0.005 \Rightarrow Z_{\alpha/2} = 2.58$
⇒ $\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
⇒ $32 - 2.58\left(\frac{4.2}{\sqrt{36}}\right) < \mu < 32 + 2.58\left(\frac{4.2}{\sqrt{36}}\right)$
⇒ $32 \pm 1.806$
⇒ $30.194 < \mu < 33.806$
⇒ The 99% confidence interval is (30.194, 33.806).
Interpretation: We are 99% confident that the population mean is between 30.194 and 33.806.

c) The 99% confidence interval is wider than the 95% confidence interval.
⇒ As the confidence level increases, the interval becomes wider.
3. Given: n = 10, $\bar{X} = 4{,}000$ hrs and S = 200 hrs
n is small and σ is unknown (Case 2) ⇒ Use the t-distribution
⇒ A (1 − α)100% confidence interval for µ is $\left(\bar{X} - t_{\alpha/2,(n-1)}\frac{S}{\sqrt{n}},\ \bar{X} + t_{\alpha/2,(n-1)}\frac{S}{\sqrt{n}}\right)$
95% ⇒ $\alpha/2 = 0.025 \Rightarrow t_{\alpha/2,(n-1)} = t_{0.025,9} = 2.262$
⇒ $4000 - 2.262\left(\frac{200}{\sqrt{10}}\right) < \mu < 4000 + 2.262\left(\frac{200}{\sqrt{10}}\right)$
⇒ $3856.9 < \mu < 4143.1$
⇒ (3856.9, 4143.1)
Interpretation: We are 95% confident that the true mean operating lifetime of the bulbs is
between 3856.9 and 4143.1 hours.
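The three worked solutions can be re-computed with a short script. This is a minimal sketch: the helper name `mean_interval` is illustrative, and the critical values (1.96, 2.58 and $t_{0.025,9}$ = 2.262) are simply the table values quoted in the text.

```python
from math import sqrt

def mean_interval(xbar, sd, n, critical):
    """Interval xbar +/- critical * sd / sqrt(n); sd is sigma (z case) or S (t case)."""
    half_width = critical * sd / sqrt(n)
    return xbar - half_width, xbar + half_width

# Example 1: sigma known, n = 50, 95% (z = 1.96)
ex1 = mean_interval(23.2, 2, 50, 1.96)
# Example 2: sigma known, n = 36, 95% and 99% (z = 1.96 and 2.58)
ex2a = mean_interval(32, 4.2, 36, 1.96)
ex2b = mean_interval(32, 4.2, 36, 2.58)
# Example 3: sigma unknown, n = 10, 95% (t with 9 df = 2.262)
ex3 = mean_interval(4000, 200, 10, 2.262)

print("Example 1:  (%.2f, %.2f)" % ex1)    # (22.65, 23.75)
print("Example 2a: (%.3f, %.3f)" % ex2a)   # (30.628, 33.372)
print("Example 2b: (%.3f, %.3f)" % ex2b)   # (30.194, 33.806)
print("Example 3:  (%.1f, %.1f)" % ex3)    # (3856.9, 4143.1)
```

The same function serves both cases because the z and t intervals differ only in the critical value and in whether σ or S is plugged in.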
Exercises:
1. A sociologist found that in a sample of 49 retired men, the average number of
jobs they had during their life-time was 7.2. From previous studies it was found
that the population standard deviation of the number of jobs is 2.1.
a) Find the 90% confidence interval of the mean for the number of jobs a man
had during his life time
b) Find the 95% confidence interval of the mean for the number of jobs a man
had during his life time
c) Compare the intervals in (a) and (b)
2. An electrical firm manufactures light bulbs that have a length of life that is
approximately normally distributed with a standard deviation of 40 hours. If a
random sample of 30 bulbs has an average life of 780 hours, find a 99%
confidence interval for the population mean of all bulbs produced by this firm.

3. A random sample of 400 households was drawn from a town and a survey
generated data on weekly earning. The mean in the sample was Birr 250 with a
standard deviation Birr 80. Construct a 95% confidence interval for the
population mean earning.
4. A sample of 15 private-duty nurses showed an average weekly wage of birr
480.75 with standard deviation of birr 56. Find the 99% confidence interval for
the true mean.
5. A major truck has kept extensive records on various transactions with its
customers. If a random sample of 16 of these records shows average sales of 290
liters of diesel fuel with a standard deviation of 12 liters, construct a 95%
confidence interval for the mean of the population sampled.

2.2.3 Sampling Distribution of sample Proportion

A proportion refers to the fraction of the total that possesses a certain attribute. For
example, suppose we have a sample of four pets: a bird, a fish, a dog, and a cat. We
might ask what proportion has four legs. Only two pets (the dog and the cat) have
four legs, so the proportion is 2/4 = 0.5.
The population and sample proportions, denoted by P and $\hat{p}$ respectively, are
calculated as

$P = \frac{\text{Number of elements in the population with a specific characteristic}}{\text{Total number of elements in the population}}$

$\hat{p} = \frac{\text{Number of elements in the sample with a specific characteristic}}{\text{Total number of elements in the sample}}$

The concept of proportion is the same as the concept of relative frequency
distribution. The relative frequency distribution of a category or class gives the
proportion of the sample or the population that belongs to that category or class.
Example: Suppose a sample of 240 families is taken from the city and 158 of them
are homeowners. Then, the sample proportion is given by $\hat{p} = \frac{a}{n} = \frac{158}{240} = 0.66$, where a
is the number of families who own houses out of the total sample.
Just like the sample mean, the sample proportion is also a random variable. The
value of $\hat{p}$ calculated for a particular sample depends on what elements of the
population are included in that sample.

The probability distribution of the sample proportion $\hat{p}$ is called its sampling
distribution. It lists the various values that $\hat{p}$ can assume and their probabilities.
To illustrate the sampling distribution of $\hat{p}$, let us consider the following small example.
Five employees of a given firm provided information concerning their awareness of
HIV/AIDS.

Name   Awareness of HIV/AIDS
A      Yes
J      No
S      No
L      Yes
T      Yes

Considering this as the population, the proportion P of employees who know about
HIV/AIDS is
P = 3/5 = 0.6 or 60%
Suppose we take all possible samples of three employees each and compute, for each
sample, the proportion of employees who know about HIV/AIDS. The number of
possible samples is $\binom{5}{3} = 10$.

The following table shows the value of $\hat{p}$ (rounded to two decimal places) for
each sample.

Sample No   Sample    Proportion who know about HIV/AIDS
1           A, J, S   1/3 = 0.33
2           A, J, L   2/3 = 0.67
3           A, J, T   2/3 = 0.67
4           A, S, L   2/3 = 0.67
5           A, S, T   2/3 = 0.67
6           A, L, T   3/3 = 1.00
7           J, S, L   1/3 = 0.33
8           J, S, T   1/3 = 0.33
9           J, L, T   2/3 = 0.67
10          S, L, T   2/3 = 0.67

The frequency and sampling distribution of $\hat{p}$ can be prepared from the above table
and is summarized as follows.

$\hat{p}$     f    Probability, $P(\hat{p})$
0.33    3    3/10 = 0.3
0.67    6    6/10 = 0.6
1.00    1    1/10 = 0.1
Total   10   1.0

$E(\hat{p}) = \sum \hat{p}\,P(\hat{p}) = 0.33 \times 0.3 + 0.67 \times 0.6 + 1 \times 0.1 = 0.601$
⇒ $E(\hat{p}) = 0.60 = P$, which is the population proportion.
The sample proportion $\hat{p} = \frac{x}{n}$ is a point estimate of P. Its sampling distribution can be
approximated by a normal distribution with mean $\mu_{\hat{p}} = P$ and standard error
$\sigma_{\hat{p}} = \sqrt{\frac{P(1-P)}{n}}$ if nP and n(1 − P) are both
greater than 5.
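The sampling distribution of the sample proportion in the five-employee example can be enumerated exactly. The sketch below codes "Yes" as 1 and "No" as 0; the variable names are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Awareness of HIV/AIDS in the population of five employees (1 = Yes, 0 = No).
aware = {"A": 1, "J": 0, "S": 0, "L": 1, "T": 1}
n = 3   # sample size

P = Fraction(sum(aware.values()), len(aware))   # population proportion

# All possible samples of n employees, and the sample proportion for each.
samples = list(combinations(aware, n))
p_hats = [Fraction(sum(aware[name] for name in s), n) for s in samples]

counts = Counter(p_hats)
dist = {p: Fraction(f, len(samples)) for p, f in sorted(counts.items())}
expected_p_hat = sum(p * prob for p, prob in dist.items())

for p, prob in dist.items():
    print(f"p-hat = {p}: f = {counts[p]}, probability = {prob}")
print("E(p-hat) =", expected_p_hat, " population P =", P)
```

With exact fractions the expected value is 3/5, so the 0.601 obtained by hand differs from P only because the sample proportions were rounded to two decimals.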
2.2.4 Point and Interval estimation of population proportions (P)
Point estimation of population proportions
If P represents the population proportion, then the sample proportion $\hat{p} = \frac{x}{n}$
provides a good estimate of P. Therefore, the sample proportion $\hat{p}$ is the point
estimator of the population proportion.
Interval estimation of population proportions (P)
In the binomial experiment each trial results in one of two outcomes, which we
label as either a success or a failure. We designate P as the probability of a
success and 1 − P as the probability of a failure. Then the probability distribution for
x, the number of successes in n identical trials, is $P(x) = \frac{n!}{x!(n-x)!}\,P^x (1-P)^{n-x}$.
In a random sample of size n from a population in which the proportion of elements
classified as successes is P, the best estimate of the parameter P is the sample
proportion of successes. Letting x denote the number of successes in the n sample
trials, the sample proportion is $\hat{p} = \frac{x}{n}$. The distribution of x can be approximated by a normal
curve when nP > 5 and n(1 − P) > 5.
In a similar way, the distribution of $\hat{p} = \frac{x}{n}$ can be approximated by a normal
distribution with mean and standard error given by $\mu_{\hat{p}} = P$ and $\sigma_{\hat{p}} = \sqrt{\frac{P(1-P)}{n}}$
respectively.
A general (1 − α)100% confidence interval for the proportion of successes is given
by $\left(\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},\ \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\right)$, where $\hat{q} = 1 - \hat{p}$.

Examples
a. In a random sample of n = 230 voters, 54 voted for candidate A. Find the 90%
confidence interval for the proportion of individuals who voted for candidate A.
b. In a sample of 100 teenage girls, 30% used hair coloring. Find the 95%
confidence interval of the true proportion of teenage girls who use hair
coloring.
Solutions:
a) Let x be the number of individuals who voted for candidate A.
⇒ $\hat{p} = \frac{x}{n} = \frac{54}{230} = 0.235 \Rightarrow \hat{q} = 1 - \hat{p} = 1 - 0.235 = 0.765$; 90% ⇒ $Z_{\alpha/2} = 1.645$
Confidence interval: $\left(\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},\ \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\right)$
⇒ $\left(0.235 - 1.645\sqrt{\frac{0.235 \times 0.765}{230}},\ 0.235 + 1.645\sqrt{\frac{0.235 \times 0.765}{230}}\right)$
⇒ $(0.235 - 0.046,\ 0.235 + 0.046)$
⇒ $(0.189, 0.281) \Rightarrow 0.189 < p < 0.281$
⇒ $18.9\% < p < 28.1\%$
We can be 90% confident that the true population proportion is between 18.9% and 28.1%.
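b) Given $\hat{p} = 0.3 \Rightarrow \hat{q} = 0.7$; 95% ⇒ $Z_{\alpha/2} = 1.96$
Confidence interval: $\left(\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},\ \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}\right)$
⇒ $\left(0.3 - 1.96\sqrt{\frac{0.3 \times 0.7}{100}},\ 0.3 + 1.96\sqrt{\frac{0.3 \times 0.7}{100}}\right)$
⇒ $(0.3 - 0.0898,\ 0.3 + 0.0898)$
⇒ $(0.2102, 0.3898) \Rightarrow 0.2102 < p < 0.3898$
⇒ $21.02\% < p < 38.98\%$
We can be 95% confident that the true population proportion is between 21.02% and 38.98%.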
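Both proportion intervals can be re-computed with a few lines of Python. This is a minimal sketch: the function name `proportion_interval` is illustrative, and the z values 1.645 and 1.96 are the table values used in the text.

```python
from math import sqrt

def proportion_interval(p_hat, n, z):
    """Large-sample interval p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    q_hat = 1 - p_hat
    half_width = z * sqrt(p_hat * q_hat / n)
    return p_hat - half_width, p_hat + half_width

# a) 54 of 230 voters chose candidate A; 90% interval (z = 1.645)
a = proportion_interval(54 / 230, 230, 1.645)
# b) 30% of 100 teenage girls used hair coloring; 95% interval (z = 1.96)
b = proportion_interval(0.30, 100, 1.96)

print("a) (%.3f, %.3f)" % a)   # (0.189, 0.281)
print("b) (%.4f, %.4f)" % b)   # (0.2102, 0.3898)
```

Using the unrounded value 54/230 instead of 0.235 does not change the interval at three decimal places.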
How do you interpret a confidence interval?
• Suppose you calculate a 95% confidence interval for some unknown
parameter µ (the true average price all students spent on books).
IT IS INCORRECT TO SAY:
• “There is a 95% probability that µ (the average price all UNL students spent
on books) is within this interval”
Why is it Incorrect?
Once it has been computed, the confidence interval is NOT a random interval, and µ is a
constant (unfortunately unknown to us), thus there is no randomness. In fact, µ either falls
in that interval or it does not.
What is the Correct Interpretation?
• “We are 95% confident that if µ (the average price all UNL students spent on
books) were known, this interval would cover/contain it”
Note: The probability refers to the interval containing µ, not to µ being in the
interval.
Why is this?
A 95% confidence interval is not so much a statement about any particular interval,
such as (79.3, 80.7), but pertains to what would happen if a very large number of
like intervals were to be constructed. That is, from a practical point of view, the
95% gives the fraction of the time, in repeated sampling, that the intervals
constructed will contain the target parameter µ.
Exercise:
1. A survey of 1000 people who watched the Democratic/Republican debate
resulted in 600 who thought that the Democrats won the debate. Construct a 95%
confidence interval for the proportion of people who thought the Democrats
won the debate.
2. A survey of 120 female freshmen shows that 18% did not wish to work after
marriage. Find the 95% confidence interval of the true proportion of females who
do not wish to work after marriage.
2.3 Hypothesis testing

The idea of hypothesis testing is:

• Ask a question with two possible answers
• Design a test, or calculation of data
• Base the decision (answer) on the test

Hypothesis testing is one way of making inference about the population parameter
where the investigator has a prior notion about the values of the parameter. It is a
common method of drawing inferences about a population based on statistical
evidence from a sample.
Hypothesis testing: A procedure, based on sample evidence and probability theory,
used to determine whether the hypothesis is a reasonable statement and should not
be rejected, or is unreasonable and should be rejected.
A hypothesis is a statement or a claim about the values of the parameter whose
plausibility is to be evaluated on the basis of the sample data.
Hypothesis: A statement about the value of a population parameter developed for the
purpose of testing.
A statistical hypothesis test is a method of making statistical decisions using
experimental data.
2.3.1 Important Concepts in Hypothesis testing
Statistical hypothesis: An assertion, statement, or claim about the population
whose plausibility is to be evaluated on the basis of the sample data.
Test statistic: A statistic whose value serves to determine whether to reject or
not reject the hypothesis to be tested.
There are two types of statistical hypotheses
for each situation: the null hypothesis and the alternative hypothesis.

a. Null hypothesis: A claim or statement about a population parameter that is
usually assumed to be true from the very beginning until it is declared false. It is
a statistical hypothesis that states a hypothesis of equality or the hypothesis of
no difference between a parameter and a specific value. It is usually denoted by
H0.

b. Alternative hypothesis: A claim or statement about a population parameter
that will be true if the null hypothesis is false. It is a statistical hypothesis that
states a hypothesis of difference between a parameter and a specific value. It is
usually denoted by H1 or HA.

Types and size of errors: There are two types of error in hypothesis testing.
Type I error: Rejecting the null hypothesis when it is true. The probability of a
type I error is denoted by α and is called the level of significance. That is,
P(Type I error) = α.
Type II error: Failing to reject the null hypothesis when it is false (accepting the null
hypothesis when it is false). The probability of a type II error is denoted by β. That
is, P(Type II error) = β.
For a fixed sample size, type I and type II errors have an inverse relationship:
lowering α raises β, so the two cannot be minimized at the same time. In practice we
set α at some value and design a test that minimizes β. This is because a type I error
is often considered to be more serious, and therefore more important to avoid, than a
type II error.
The following table gives a summary of the possible results of any hypothesis test:

Decision              H0 is true               H0 is false
Reject H0             Type I error (α)         Correct decision
Do not reject H0      Correct decision         Type II error (β)
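The trade-off between the two error probabilities can be illustrated numerically. The sketch below computes β for a two-tailed Z test at two values of α. The function name `type_ii_error` and the numbers used in the calls (µ0 = 18.7, an assumed true mean of 17.2, σ = 4.2, n = 36) are illustrative assumptions only.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu_true, sigma, n, alpha=0.05):
    """P(do not reject H0 | true mean is mu_true) for a two-tailed Z test."""
    z = NormalDist()                              # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)             # critical value Z_{alpha/2}
    shift = (mu_true - mu0) / (sigma / sqrt(n))   # mean of Z_cal when mu = mu_true
    # H0 is not rejected when -z_crit <= Z_cal <= z_crit
    return z.cdf(z_crit - shift) - z.cdf(-z_crit - shift)

# Illustrative values only (assumed true mean of 17.2).
beta_05 = type_ii_error(18.7, 17.2, 4.2, 36, alpha=0.05)
beta_01 = type_ii_error(18.7, 17.2, 4.2, 36, alpha=0.01)
print(round(beta_05, 3), round(beta_01, 3))
```

With the sample size held fixed, the smaller α gives the larger β, which is the inverse relationship described above.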

STEPS IN THE HYPOTHESIS TESTING PROCEDURES

General steps in hypothesis testing:

State the null hypothesis and the alternate hypothesis.

Null Hypothesis – a statement about the value of a population parameter.
Alternate Hypothesis – a statement that is accepted if the sample evidence leads us
to reject the null hypothesis.
Decide on the significance level (α):

In practice, the level of significance (α) is chosen by the investigator before the
data are examined. The three most common levels are 0.01, 0.05, and 0.10
(corresponding to confidence levels of 99%, 95%, and 90%). The smaller the level of
significance, the stronger the sample evidence must be before the null hypothesis is
rejected. The level of significance determines the values of the test statistic that
would cause us to reject the hypothesis. The test statistic values corresponding to
the level of significance are called the critical values. A given level of significance
has different critical values for one-tailed and two-tailed tests. For the Z-statistic, a
level of significance of 0.05 has critical values of ±1.96 if the test is two-tailed. If
the test is one-tailed, the critical value is 1.645 for a right-tailed test and −1.645 for
a left-tailed test. Note that critical values for a given level of significance also
differ depending on the test statistic used.
The critical value separates the critical region from the noncritical region. The
symbol for critical value is C.V.

 The critical or rejection region is the range of values of the test value that
indicates that there is a significant difference and that the null hypothesis
should be rejected.
 The noncritical or non-rejection region is the range of values of the test value
that indicates that the difference was probably due to chance and that the
null hypothesis should not be rejected.
 The critical and noncritical regions and the critical value are shown in the
following figure for a one-tailed test.
[Figure: critical and noncritical regions and the critical value, one-tailed test]
 The critical and noncritical regions and the critical value are shown in
the following figure for a two-tailed test.
[Figure: critical and noncritical regions and the critical values, two-tailed test]
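The critical values of the Z-statistic quoted above can be reproduced with Python's standard library. This is a minimal sketch; the function name `z_critical` and its `tail` argument are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def z_critical(alpha, tail="two"):
    """Critical value(s) of the standard normal for a given significance level."""
    z = NormalDist()
    if tail == "two":
        c = z.inv_cdf(1 - alpha / 2)
        return (-c, c)                     # reject when Z_cal < -c or Z_cal > c
    c = z.inv_cdf(1 - alpha)
    return c if tail == "right" else -c    # reject beyond this single value

# The three commonly used levels of significance.
for a in (0.01, 0.05, 0.10):
    print(a, z_critical(a, "two"), z_critical(a, "right"), z_critical(a, "left"))
```

For α = 0.05 this gives approximately ±1.96 for a two-tailed test and 1.645 (or −1.645) for a one-tailed test.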

Select the appropriate test statistic.
When testing a hypothesis about a population mean, we use either the z-statistic or
the t-statistic, according to the following conditions. If the population standard
deviation, σ, is known and either the data are normally distributed or the sample
size is large (n ≥ 30), we use the normal distribution (z-statistic). If σ is unknown
but the sample size is large, we still use the z-statistic, with the sample standard
deviation S in place of σ. If σ is unknown, the sample size is small (n < 30), and the
data are approximately normally distributed, we use the t-distribution (t-statistic).
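The choice between the two statistics can be written as a small rule. The sketch below follows the convention used in Case 1, Case 2, and the worked examples of these notes (Z whenever σ is known or the sample is large, t otherwise). The function name `choose_statistic` is hypothetical, and treating n = 30 as "large" is an assumption.

```python
def choose_statistic(sigma_known, n, large_n=30):
    """Return 'z' or 't' for a test about a population mean."""
    if sigma_known:
        return "z"     # sigma known (normal population or large n assumed)
    if n >= large_n:
        return "z"     # sigma unknown, large sample: S is used in place of sigma
    return "t"         # sigma unknown, small sample: approximately normal population assumed

print(choose_statistic(True, 36))    # sigma known, n = 36
print(choose_statistic(False, 64))   # sigma unknown, n = 64
print(choose_statistic(False, 16))   # sigma unknown, n = 16
```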
State the decision rules.
The decision rules state the conditions under which the null hypothesis will be
rejected or not rejected. The critical value for the test statistic is determined by the
level of significance. The critical value is the value that divides the non-rejection
region from the rejection region.
Compute the appropriate test statistic and make the decision.
When we use the z-statistic, we use the formula

$$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$$

When we use the t-statistic, we use the formula

$$t_{cal} = \frac{\bar{X} - \mu_0}{S / \sqrt{n}}$$

where µ0 is the hypothesized value of the population mean.
Compare the computed test statistic with the critical value. If the computed value is
within the rejection region(s), we reject the null hypothesis; otherwise, we do not
reject the null hypothesis.
Interpret the decision.
Based on the decision in the previous step, we state a conclusion in the context of
the original problem.
2.3.2 Hypothesis testing about the population mean (µ)
Let µ0 be the assumed or hypothesized value of µ. One can then formulate two-sided
(1) and one-sided (2 and 3) hypotheses as follows:

1. $H_0: \mu = \mu_0$ vs $H_1: \mu \neq \mu_0$ (two-tailed, two-sided alternative hypothesis)
2. $H_0: \mu = \mu_0$ vs $H_1: \mu > \mu_0$ (one-tailed, one-sided alternative hypothesis)
3. $H_0: \mu = \mu_0$ vs $H_1: \mu < \mu_0$ (one-tailed, one-sided alternative hypothesis)
A one-tailed test indicates that the null hypothesis should be rejected when the test
value is in the critical region on one side of the mean. A one-tailed test is either a
right-tailed test or a left-tailed test, depending on the direction of the inequality in
the alternative hypothesis.
In a two-tailed test, the null hypothesis should be rejected when the test value is in
either of the two critical regions.
The choice of the alternative hypothesis (H1) depends on the prior information on µ.

Case 1: When n is large or the population is normal

Test statistic:

$$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$$

After specifying α we have the following regions (critical and acceptance) on the
standard normal distribution corresponding to the above three hypotheses.
Table: Summary of Decision Rules
H1                    Reject H0 if                    Do not reject H0 (Accept H0) if
$\mu \neq \mu_0$      $|Z_{cal}| > Z_{\alpha/2}$      $|Z_{cal}| \le Z_{\alpha/2}$
$\mu > \mu_0$         $Z_{cal} > Z_{\alpha}$          $Z_{cal} \le Z_{\alpha}$
$\mu < \mu_0$         $Z_{cal} < -Z_{\alpha}$         $Z_{cal} \ge -Z_{\alpha}$

If the population standard deviation σ is unknown and n is large, the sample
standard deviation S is used in its place and the test statistic will be

$$Z_{cal} = \frac{\bar{X} - \mu_0}{S / \sqrt{n}}$$

The decision rule is the same as above.
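The Case 1 statistic and its three decision rules can be sketched as a single function. The name `z_test_mean`, the `alternative` labels, and the numbers in the usage line are assumptions made for illustration; `sd` stands for σ when it is known and for S when a large sample is used.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(x_bar, mu0, sd, n, alpha=0.05, alternative="two-sided"):
    """One-sample Z test for a mean; returns (Z_cal, reject_H0)."""
    z_cal = (x_bar - mu0) / (sd / sqrt(n))
    z = NormalDist()
    if alternative == "two-sided":          # H1: mu != mu0
        reject = abs(z_cal) > z.inv_cdf(1 - alpha / 2)
    elif alternative == "greater":          # H1: mu > mu0
        reject = z_cal > z.inv_cdf(1 - alpha)
    elif alternative == "less":             # H1: mu < mu0
        reject = z_cal < -z.inv_cdf(1 - alpha)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z_cal, reject

# Hypothetical numbers: sample mean 52, mu0 = 50, sd = 6, n = 36.
z_cal, reject = z_test_mean(52, 50, 6, 36)
print(z_cal, reject)
```

Returning both the statistic and the decision keeps the computation step and the decision step visible separately, as in the procedure above.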
Case 2: When n is small and σ unknown
We use the t-test. Test statistic:

$$t_{cal} = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} \sim t_{n-1}$$

After specifying α we have the following regions (critical and acceptance) on the
t distribution with n − 1 degrees of freedom corresponding to the above three
hypotheses.
Table: Summary of decision rules
H1                    Reject H0 if                           Do not reject H0 (Accept H0) if
$\mu \neq \mu_0$      $|t_{cal}| > t_{\alpha/2,\,n-1}$       $|t_{cal}| \le t_{\alpha/2,\,n-1}$
$\mu > \mu_0$         $t_{cal} > t_{\alpha,\,n-1}$           $t_{cal} \le t_{\alpha,\,n-1}$
$\mu < \mu_0$         $t_{cal} < -t_{\alpha,\,n-1}$          $t_{cal} \ge -t_{\alpha,\,n-1}$
For the t distribution to apply strictly we need the following two assumptions:

 The observations are selected at random from the population

 The population distribution is normal

In practice the second assumption may not be met exactly, but the t test is robust to
moderate departures from the normal distribution. That means even when
assumption 2 is not satisfied, the probabilities calculated from the t table are still
approximately correct.
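When the raw observations are available, the t statistic can be computed directly from them. The sketch below uses only the standard library, so the critical value is read from a t table rather than computed. The data values and the function name `t_statistic` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """Return (t_cal, degrees of freedom) for a one-sample t test."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)                  # sample standard deviation (divisor n - 1)
    return (x_bar - mu0) / (s / sqrt(n)), n - 1

# Hypothetical data, for illustration only.
data = [12.1, 11.8, 12.6, 12.0, 12.4, 11.9, 12.3, 12.2]
t_cal, df = t_statistic(data, mu0=12.0)
t_crit = 2.365                         # t_{0.025, 7} read from a t table
print(t_cal, df, abs(t_cal) > t_crit)
```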
Examples:

1. Convicted murderers receive an average sentence of 18.7 years in prison. A
criminologist wants to perform a hypothesis test to determine whether the mean
sentence by one particular judge differs from 18.7 years. A random sample of 36
cases from the court files from this judge is taken. It is found that the sample mean
is 17.2 years. Assume that the population standard deviation is 4.2 years. Test
whether the mean differs from 18.7 years. Use the 0.05 significance level.
2. Wollega University uses thousands of fluorescent light bulbs each year. The
brand of bulb it currently uses has a mean life of 900 hours. A manufacturer
claims that its new brand of bulbs, which cost the same as the brand the
university currently uses, has a mean life of more than 900 hours. The university
has decided to purchase the new brand if, when tested, the test evidence
supports the manufacturer's claim at the 0.05 significance level. Suppose 64
bulbs were tested with the following results: $\bar{X} = 920$ hours, $S = 80$ hours.
Will Wollega University purchase the new brand of fluorescent bulbs?
3. For healthy women aged 18-24, the systolic blood pressure reading has a mean of
114.8. A random sample of 16 women has an average systolic blood pressure of
117.23 with a standard deviation of 5.63. Test the claim that the mean systolic
blood pressure is different from 114.8. Use the 0.05 significance level.
4. A job placement director claims that the average monthly starting salary for
nurses is less than 1600 birr. A sample of 16 nurses has a mean monthly
starting salary of 1570 birr with a sample standard deviation of 120 birr. At
α=0.05 test the claim that nurses earn less than 1600 birr a month.
5. Researchers are interested in the mean level of an enzyme in a certain
population. They take a sample of 36 individuals, determine the level of enzyme
in each and compute a sample mean of 22. It is known that the variable of interest
is approximately normally distributed with a standard deviation of 10. Let's say
that they are asking the following question: Can we conclude that the mean
enzyme level in this population is different from 25?

Solution:
1. Step 1: State the null and alternative hypotheses.
$H_0: \mu = 18.7$
$H_1: \mu \neq 18.7$
Step 2: α = 0.05
Step 3: σ is known and n is large, so use the Z-statistic.
Step 4: Critical region: reject H0 if $|Z_{cal}| > Z_{\alpha/2} = 1.96$.
Step 5: Calculation of the test statistic:

$$Z_{cal} = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} = \frac{17.2 - 18.7}{4.2 / \sqrt{36}} = -2.143$$

Step 6: Decision: Since $|Z_{cal}| = 2.143 > 1.96$, reject H0.
Step 7: Interpretation: At α = 0.05 the criminologist can conclude that the average
sentence is different from 18.7 years.
2. Step 1: State the null and alternative hypotheses.
$H_0: \mu = 900$
$H_1: \mu > 900$
Step 2: α = 0.05
Step 3: σ is unknown but n is large, so use the Z-statistic.
Step 4: Critical region: reject H0 if $Z_{cal} > Z_{\alpha} = 1.645$.
Step 5: Calculation of the test statistic:

$$Z_{cal} = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} = \frac{920 - 900}{80 / \sqrt{64}} = 2$$

Step 6: Decision: Since $Z_{cal} = 2 > 1.645$, reject H0.
Step 7: Interpretation: At α = 0.05 there is enough evidence to indicate that the new
brand of light bulbs has a mean life time of more than 900 hours.
3. Step 1: State the null and alternative hypotheses.
$H_0: \mu = 114.8$
$H_1: \mu \neq 114.8$
Step 2: α = 0.05
Step 3: n is small and σ is unknown, so use the t-test.
Step 4: Critical region: reject H0 if $|t_{cal}| > t_{\alpha/2,\,n-1} = t_{0.025,\,15} = 2.131$.
Step 5: Calculation of the test statistic:

$$t_{cal} = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} = \frac{117.23 - 114.8}{5.63 / \sqrt{16}} = 1.726$$

Step 6: Decision: Since $|t_{cal}| = 1.726 < t_{\alpha/2,\,n-1} = 2.131$, do not reject H0.
Step 7: Interpretation: At α = 0.05 there is not enough evidence to conclude that the
mean systolic blood pressure for healthy women aged 18-24 differs from 114.8.

4. Step 1: State the null and alternative hypotheses.
$H_0: \mu = 1600$
$H_1: \mu < 1600$
Step 2: α = 0.05
Step 3: n is small and σ is unknown, so use the t-statistic.
Step 4: Critical region: reject H0 if $t_{cal} < -t_{\alpha,\,n-1} = -t_{0.05,\,15} = -1.753$.
Step 5: Calculation of the test statistic:

$$t_{cal} = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} = \frac{1570 - 1600}{120 / \sqrt{16}} = -1$$

Step 6: Decision: Since $t_{cal} = -1 > -1.753$, do not reject H0.
Step 7: Interpretation: At α = 0.05 there is not enough evidence to conclude that the
mean monthly starting salary of nurses is less than 1600 birr.
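The arithmetic of the four worked solutions can be checked with a short script. This is a sketch only: the helper name `standardize` is hypothetical, and the critical values are copied from the solutions (1.96, 1.645, 2.131, 1.753) rather than computed.

```python
from math import sqrt

def standardize(x_bar, mu0, sd, n):
    """(x_bar - mu0) / (sd / sqrt(n)); the same form serves Z_cal and t_cal."""
    return (x_bar - mu0) / (sd / sqrt(n))

# example number: (x_bar, mu0, sd, n, critical value, alternative)
examples = {
    1: (17.2, 18.7, 4.2, 36, 1.96, "two-sided"),
    2: (920, 900, 80, 64, 1.645, "greater"),
    3: (117.23, 114.8, 5.63, 16, 2.131, "two-sided"),
    4: (1570, 1600, 120, 16, 1.753, "less"),
}

results = {}
for k, (x_bar, mu0, sd, n, crit, alt) in examples.items():
    stat = standardize(x_bar, mu0, sd, n)
    if alt == "two-sided":
        reject = abs(stat) > crit
    elif alt == "greater":
        reject = stat > crit
    else:
        reject = stat < -crit
    results[k] = (round(stat, 3), reject)
    print(k, results[k])
```

H0 is rejected in Examples 1 and 2 and not rejected in Examples 3 and 4, in agreement with the solutions.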

Exercises:

1. State the null and alternative hypotheses for each of the following:

a) A researcher thinks that if expectant mothers use vitamin pills, the birth
weight of the babies will increase. The average birth weight of the
population is 4.6 kilograms.
b) An engineer claims that she can decrease the mean number of defects in a
manufacturing process of compact discs by using robots instead of humans
for certain tasks. The mean number of defective discs is 18.
c) A psychologist feels that if he plays soft music during a test, the results of the
test will change. He is not sure whether the grades will be higher or lower. In
the past, the mean of the scores was 73.

2. The scores on an aptitude test required for entry into a certain job position are
normally distributed with mean 500 and standard deviation 120. If a
random sample of 36 applicants has a mean of 546, is there evidence that their
mean score is different from 500? Use α=0.05.
3. Ten years ago, the mean age of juveniles held in public custody was 16.0 years.
The mean age of 250 randomly selected juveniles currently being held in public
custody is 15.86 years. Assuming σ=1.01 years, does it appear that the mean
age of all juveniles being held in public custody this year is less than it was ten
years ago? Use α=0.10.
4. The mean life time of light bulbs produced by a company is known to be 1600
hours. The mean life time of a sample of 16 light bulbs produced by the factory
is computed to be 1570 hours.

a) If the population standard deviation is 120 hours, test whether or not the
mean life time is different from 1600 hours.
b) If the population standard deviation is not known and the sample standard
deviation is 110 hours, is there any evidence to say that the mean life time of
the light bulbs is more than 1600 hours?

5. With standard care, cancer patients are expected to survive a mean duration
of time equal to 38.3 months. A clinician claims that a new therapy will
improve survival time. The new therapy is administered to 100 cancer patients.
Their average survival time is 46.9 months. Suppose σ is known to be 43.3
months. Is this statistically significant evidence of improved survival time at the
0.05 level of significance?
6. A recent study shows that the average age of murder victims in a small city is
23.2 years. A random sample of 18 recent victims had a mean of 22.6 years
and a standard deviation of 2 years. At α=0.05, is the average different from
23.2 years? Assume the variable is approximately normally distributed.
7. Oromia International Bank claims that the mean wait time for a teller during
peak hours is less than 4 minutes. A random sample of 20 wait times has a
mean of 2.6 minutes with a sample standard deviation of 2.1 minutes. At
α=0.05 test the bank's claim.

2.3.3 Hypothesis testing about the population proportion: P

The procedure for testing hypotheses about the population proportion P for
large samples is similar in many respects to that for the population mean. The
procedure includes the same seven steps. Similarly, the test can be two-tailed or
one-tailed.
When the sample size is large, the sample proportion $\hat{P}$ is approximately normally
distributed with its mean equal to P and standard deviation equal to $\sqrt{P(1-P)/n}$.
Hence, we use the normal distribution to perform a test of hypothesis about the
population proportion P for a large sample. The sample size is considered to be large
when $n\hat{P}$ and $n(1-\hat{P})$ are both greater than 5.

Suppose the assumed or hypothesized value of P (the parameter of the binomial
distribution) is denoted by P0. Then one can formulate two-sided (1) and one-sided
(2 and 3) hypotheses as follows:

1. $H_0: P = P_0$ vs $H_1: P \neq P_0$
2. $H_0: P = P_0$ vs $H_1: P > P_0$
3. $H_0: P = P_0$ vs $H_1: P < P_0$

The choice of H1 depends on the prior information we have on the value of P.

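Decision Rule:

Null hypothesis      Alternative hypothesis      Reject H0 if
$P = P_0$            $P \neq P_0$                $|Z_{cal}| > Z_{\alpha/2}$
$P = P_0$            $P > P_0$                   $Z_{cal} > Z_{\alpha}$
$P = P_0$            $P < P_0$                   $Z_{cal} < -Z_{\alpha}$

where the test statistic is

$$Z_{cal} = \frac{\hat{P} - P_0}{\sqrt{P_0(1 - P_0)/n}} \sim N(0, 1)$$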
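The large-sample test for a proportion can be sketched in the same way as the test for a mean. The function name `z_test_proportion`, the `alternative` labels, and the numbers in the usage line are assumptions made for illustration. The function also checks the large-sample condition stated above.

```python
from math import sqrt
from statistics import NormalDist

def z_test_proportion(x, n, p0, alpha=0.05, alternative="two-sided"):
    """Large-sample Z test for a proportion; returns (p_hat, Z_cal, reject_H0)."""
    p_hat = x / n
    # Large-sample condition: n * p_hat and n * (1 - p_hat) both greater than 5.
    if not (n * p_hat > 5 and n * (1 - p_hat) > 5):
        raise ValueError("sample too small for the normal approximation")
    z_cal = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    z = NormalDist()
    if alternative == "two-sided":          # H1: P != P0
        reject = abs(z_cal) > z.inv_cdf(1 - alpha / 2)
    elif alternative == "greater":          # H1: P > P0
        reject = z_cal > z.inv_cdf(1 - alpha)
    elif alternative == "less":             # H1: P < P0
        reject = z_cal < -z.inv_cdf(1 - alpha)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return p_hat, z_cal, reject

# Hypothetical numbers: 60 successes in 100 trials, P0 = 0.5.
p_hat, z_cal, reject = z_test_proportion(60, 100, 0.5)
print(p_hat, z_cal, reject)
```

Note that the standard deviation in the denominator uses the hypothesized value P0, not the sample proportion, because the statistic is evaluated under H0.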
Example 8.9: A manufacturing company has submitted a claim that 90% of items
produced by a certain process are non-defective. An improvement in the process is
being considered that they feel will lower the proportion of defectives below the
current 10%. In an experiment 100 items are produced with the new process and 5
are defective. Is this evidence sufficient to conclude that the method has been
improved? Use a 0.05 level of significance.
Solution: As usual, we follow the steps, with P denoting the proportion of
non-defective items:

1. $H_0: P = 0.9$ (actually $P \le 0.9$) vs $H_1: P > 0.9$
2. α = 0.05
3. Critical region: Z > 1.645
4. Computation:

$$\hat{P} = \frac{X}{n} = \frac{95}{100} = 0.95$$

$$Z_{cal} = \frac{\hat{P} - P_0}{\sqrt{P_0(1 - P_0)/n}} = \frac{0.95 - 0.90}{\sqrt{0.9 \times 0.1 / 100}} = 1.67$$

5. Decision: Reject H0, since $Z_{cal} = 1.67 > 1.645$.
6. Conclusion: At the 0.05 level we have evidence to say that the improvement has
reduced the proportion of defectives.

Example: The unemployment rate in a given country at a given period is believed to
be 10%. The government embarked on a series of projects to reduce unemployment.
It was of interest to determine whether unemployment decreased as a result of the
projects. A random sample of 500 people was chosen, and 48 of them were found to
be unemployed. Test at the 1% level of significance whether the government projects
reduced the unemployment rate.
Solution: As usual, we follow the steps:

1. $H_0: P = 0.1$ vs $H_1: P < 0.1$
2. α = 0.01
3. Critical region: $Z_{cal} < -Z_{\alpha} = -Z_{0.01} = -2.33$
4. Computation:

$$\hat{P} = \frac{X}{n} = \frac{48}{500} = 0.096$$

$$Z_{cal} = \frac{\hat{P} - P_0}{\sqrt{P_0(1 - P_0)/n}} = \frac{0.096 - 0.1}{\sqrt{0.1 \times 0.9 / 500}} = -0.3$$

5. Decision: Do not reject H0, since $Z_{cal} = -0.3 > -2.33$.
6. Conclusion: At the 1% level there is not enough evidence to conclude that the
government projects reduced the unemployment rate.
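The figures in this example can be checked with a few lines of Python. The example asks for a test at the 1% level; the 5% level is included here only as an assumption-labelled comparison, to show that the decision does not depend on which of the two levels is used.

```python
from math import sqrt
from statistics import NormalDist

x, n, p0 = 48, 500, 0.10
p_hat = x / n
z_cal = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

decisions = {}
for alpha in (0.01, 0.05):                        # 0.05 is for comparison only
    z_crit = -NormalDist().inv_cdf(1 - alpha)     # left-tailed critical value
    decisions[alpha] = (round(z_crit, 2), z_cal < z_crit)

print(round(p_hat, 3), round(z_cal, 2), decisions)
```

At both levels the computed value lies outside the left-tail rejection region, so H0 is not rejected.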

Example: A large sample of 200 students from the students of a certain high school
is interviewed and 85 of them are found to use city bus. Can you conclude that at
least 40% of the students use city bus? Use a 0.05 level of significance (Exercise)
Examples:

1. A registrar officer believes that the dropout rate for seniors at Wollega University
is 15%. He performs a hypothesis test to determine if the percentage is the same
as or different from 15%. Last year, 38 seniors from a random sample of 200
seniors withdrew. At α=0.05 test the registrar officer's claim.
2. A telephone company representative estimates that more than 25% of its
customers want call waiting service. A sample of 200 customers showed that 63
had the call waiting service. At α=0.05 is his estimate appropriate?

Solutions:
1) Step 1: State the null and alternative hypotheses.
$H_0: p = 0.15$ vs $H_1: p \neq 0.15$
Step 2: α = 0.05
Step 3: The sample is large, so use the Z-statistic.
Step 4: Critical region: reject H0 if $|Z_{cal}| > Z_{\alpha/2} = 1.96$.
Step 5: Calculation of the test statistic:

$$\hat{p} = \frac{38}{200} = 0.19, \quad p_0 = 0.15 \Rightarrow 1 - p_0 = 0.85$$

$$Z_{cal} = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.19 - 0.15}{\sqrt{0.15 \times 0.85 / 200}} = 1.58$$

Step 6: Decision: Since $Z_{cal} = 1.58 < 1.96$, do not reject H0.
Step 7: Interpretation: At α = 0.05 there is not enough evidence to conclude that the
dropout rate for seniors differs from 15%.

2) Step 1: State the null and alternative hypotheses.
$H_0: p = 0.25$ vs $H_1: p > 0.25$
Step 2: α = 0.05
Step 3: The sample is large, so use the Z-statistic.
Step 4: Critical region: reject H0 if $Z_{cal} > Z_{\alpha} = 1.645$.
Step 5: Calculation of the test statistic:

$$\hat{p} = \frac{63}{200} = 0.315, \quad p_0 = 0.25 \Rightarrow 1 - p_0 = 0.75$$

$$Z_{cal} = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.315 - 0.25}{\sqrt{0.25 \times 0.75 / 200}} = 2.12$$

Step 6: Decision: Since $Z_{cal} = 2.12 > 1.645$, reject H0.
Step 7: Interpretation: At α = 0.05 there is enough evidence to support the estimate
that more than 25% of customers have call waiting service.
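The two test statistics computed in these solutions can be verified numerically. In this sketch the helper name `z_prop` is hypothetical, and the critical values 1.96 and 1.645 are taken from the solutions.

```python
from math import sqrt

def z_prop(x, n, p0):
    """Z_cal = (p_hat - p0) / sqrt(p0 * (1 - p0) / n), with p_hat = x / n."""
    return (x / n - p0) / sqrt(p0 * (1 - p0) / n)

z1 = z_prop(38, 200, 0.15)      # dropout example, two-tailed
z2 = z_prop(63, 200, 0.25)      # call waiting example, right-tailed

reject1 = abs(z1) > 1.96        # critical value Z_{0.025}
reject2 = z2 > 1.645            # critical value Z_{0.05}
print(round(z1, 2), reject1)
print(round(z2, 2), reject2)
```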

Exercises: 1) Candidate Chala is one of the two candidates running for mayor of
Nekemte town. A random poll of 672 registered voters finds that 323 will vote for
candidate Chala. At α=0.05 is it reasonable to assume that half of the population
will vote for Chala?
2) Hawi believes that 50% of the brides in Nekemte are younger than their grooms.
She performs a hypothesis test to determine if the percentage is the same as or
different from 50%. Hawi samples 100 brides and 53 reply that they are younger
than their grooms. At the 1% level of significance test Hawi's claim.
2.3.4 Sample size determination
In planning a statistical investigation we should decide the number of units (Sample
size) to be studied in order to answer the study objectives. If the sample size is too
small we may fail to detect important effects, or may estimate effects too imprecisely.
If the sample size is too large then we will waste resources. Therefore it is
recommended to determine the appropriate sample size for our study.
How many units should be included in our study? The sample size depends on
the maximum error of the estimate (ε), the population standard deviation (σ), and
the degree of confidence.
Recall that

$$\varepsilon = Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \;\Rightarrow\; \sqrt{n} = \frac{Z_{\alpha/2}\,\sigma}{\varepsilon} \;\Rightarrow\; n = \left( \frac{Z_{\alpha/2}\,\sigma}{\varepsilon} \right)^2$$

Example: The college president asks the registrar officer to estimate the average age
of the students at their college. From a previous study, the standard deviation of the
ages was found to be σ = 2 years. How large should the sample be if the officer
wishes to be 99% confident that the estimate is accurate within 1 year?
Solution: Given: $Z_{\alpha/2} = 2.58$, σ = 2, ε = 1

$$n = \left( \frac{Z_{\alpha/2}\,\sigma}{\varepsilon} \right)^2 = \left( \frac{2.58 \times 2}{1} \right)^2 = 26.6256 \approx 27$$
Example: A scientist wishes to estimate the average depth of a river. He wants to be
99% confident that the estimate is accurate within 2 feet. From a previous study, the
standard deviation of the depths measured was 4.38 feet. How large a sample is
required?
Solution: Given: $Z_{\alpha/2} = 2.58$, σ = 4.38, ε = 2

$$n = \left( \frac{Z_{\alpha/2}\,\sigma}{\varepsilon} \right)^2 = \left( \frac{2.58 \times 4.38}{2} \right)^2 = 31.92$$

Round the value 31.92 up to 32. Therefore, to be 99% confident that the estimate is
within 2 feet of the true mean depth, the scientist needs a sample of at least 32
measurements. (Always round up to the next whole number.)
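The sample size formula for a mean, including the rule of always rounding up, can be sketched as a one-line function. The name `sample_size_mean` and its parameter names are hypothetical; the two calls use the figures from the college-age and river-depth examples.

```python
from math import ceil

def sample_size_mean(z, sigma, eps):
    """n = (z * sigma / eps)^2, rounded up to the next whole number."""
    return ceil((z * sigma / eps) ** 2)

n_ages = sample_size_mean(2.58, 2, 1)        # college ages example
n_river = sample_size_mean(2.58, 4.38, 2)    # river depth example
print(n_ages, n_river)
```

Using `ceil` rather than ordinary rounding guarantees that the achieved error does not exceed the target ε.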
Similarly, for proportions the sample size required is given by:

$$n = \hat{p}\,\hat{q} \left( \frac{Z_{\alpha/2}}{\varepsilon} \right)^2, \quad \text{where } \hat{q} = 1 - \hat{p}$$
Example: A university administrator wishes to estimate, with 90 percent confidence,
the proportion of students enrolled in M.B.A. programs that also have
undergraduate degrees in business. It was found that in a random sample of 230
students enrolled in M.B.A. programs 54 have undergraduate degrees in business.
What sample size is required if the researcher wishes to be accurate within
5% of the true proportion?
Solution:
Given: 90% ⇒ $Z_{\alpha/2} = 1.645$, $\hat{p} = \frac{54}{230} = 0.235 \Rightarrow \hat{q} = 0.765$, and ε = 0.05

$$n = \hat{p}\,\hat{q} \left( \frac{Z_{\alpha/2}}{\varepsilon} \right)^2 = 0.235 \times 0.765 \times \left( \frac{1.645}{0.05} \right)^2 = 194.59 \approx 195$$
Exercises:
1. A college dean wishes to estimate the average number of hours his part-time
instructors teach per week. The standard deviation from a previous study is 2.6
hours. How large a sample must be selected if he wants to be 99% confident that
the sample mean is within 1 hour of the true mean?
2. A researcher wants to estimate, with 95% confidence, the proportion of people who
own a home computer. A previous study shows that 40% of those interviewed had
a computer at home. The researcher wishes to be accurate within 2% of the true
proportion. Find the minimum sample size necessary.
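The sample size formula for a proportion from this section can be sketched in the same way as the one for a mean. The function name `sample_size_proportion` is hypothetical; the call below reuses the figures from the M.B.A. worked example (Z = 1.645, p̂ = 0.235, ε = 0.05), not the exercises.

```python
from math import ceil

def sample_size_proportion(z, p_hat, eps):
    """n = p_hat * q_hat * (z / eps)^2, rounded up to the next whole number."""
    q_hat = 1 - p_hat
    return ceil(p_hat * q_hat * (z / eps) ** 2)

n_mba = sample_size_proportion(1.645, 0.235, 0.05)   # M.B.A. example
print(n_mba)
```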
