Inference 1 Notes Hsts111

This document introduces Statistics, its branches, and practical applications, emphasizing the importance of Descriptive and Inferential Statistics. It explains key concepts such as population, sample, parameters, and statistics, and discusses sampling distributions and the Central Limit Theorem. The document also provides definitions, objectives, and exercises to reinforce understanding of statistical methods.

CHAPTER 1

1.0 Introduction
In this chapter, the student will learn what Statistics is, its main branches and some practical applications through given problems and exercises.

The word 'Statistics' seems to have been derived from the Latin word 'status', the Italian word 'statista' or the German word 'Statistik', each of which means a political state. Statistics is an old discipline, as old as human activity itself, and its utility has been increasing with time. It was used in the administrative departments of states and governments to keep records of births, deaths, population, etc., for administrative purposes. In the 17th century John Graunt was the first person to make a systematic study of birth and death statistics and to calculate the expectation of life at different ages, which led to the idea of life insurance. Almost all fields, such as agriculture, engineering, health, finance, economics, sociology and management, now use statistical methods for different purposes.

1.1 Objectives of the chapter

At the end of this unit, the student should be able to:

• Define Statistics
• Understand and explain the branches of Statistics
• Distinguish between population and sample
• Distinguish between parameters and statistics

1.2 Definitions and terms

Definition 1.2.1: Statistics is a scientific method of collecting, organising, analysing, and interpreting numerical information or data.

Statistics has many branches, but the two major ones are Descriptive Statistics and Inferential
Statistics.

Definition 1.2.2: Descriptive or Deductive Statistics is concerned with summarizing given data sets through summary measures such as the measures of central tendency, measures of variability, measures of position and others. All graphs also fall under descriptive statistics. Descriptive Statistics does not draw conclusions or generalisations about population parameters.

Definition 1.2.3: Inferential or Inductive Statistics is that branch of Statistics that deals with making conclusions, generalisations or extensions of results obtained from sample data to the entire population.

Definition 1.2.4: A random variable is a function that assigns a numerical value to each outcome of an experiment. When the numerical value of a variable is determined by a chance event, that variable is called a random variable.

Random variables are usually denoted by letters and can be classified as discrete, that is, variables that can assume only specific, countable values within a given range, or continuous, that is, variables that can take any value within a continuous range.

Consider an experiment where a coin is tossed three times. If X represents the number of times the coin shows heads, then X is a discrete random variable that can only take the values 0, 1, 2 and 3. Examples of continuous random variables are the amount of rainfall in a city over a year and the average height of a random sample of 40 girls.
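The coin example can be checked by enumeration. The sketch below assumes the coin is fair, which the text does not state, and lists the probability of each value of X; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 2**3 equally likely outcomes of three tosses of a fair coin.
outcomes = list(product("HT", repeat=3))

# X = number of heads in each outcome.
counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability distribution of X, kept as exact fractions.
distribution = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
```

X takes only the four values 0, 1, 2 and 3, and the probabilities sum to one, as required of a discrete distribution.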

1.3 Population and Sample

Definition 1.3.1: A population is defined as the totality of elements under study.

All the individuals in a certain country make up the population of that country. Similarly, the ages of all the grade one children in the country make up a population, and so do all the numbers in the interval [10, 18]. The term population does not refer only to people, but to all the objects under consideration.

Some populations are finite whilst others are infinite. Finite populations include the people in a
certain area, ages of all first year university students, number of households in Harare, etc. Other
populations are infinite, and these include the Binomial population, etc.

In practical situations it is not possible to analyse every element in the population due to
financial, economic and time constraints. We normally resort to the use of part of the population
for information. This leads to the following definition.

Definition 1.3.2: A sample is any subset of a population selected for study.

The reliability of conclusions drawn about a population rests on the nature of the sample drawn. If the sample is chosen so that it adequately represents the population, then the conclusions drawn about the population will be reliable.

There are probability and non-probability sampling methods. The probability sampling methods include simple random sampling, stratified random sampling, systematic sampling, cluster sampling and multistage sampling. Non-probability sampling methods include convenience sampling, judgmental sampling, etc. It is advisable to use probability sampling rather than non-probability sampling methods because of their sound mathematical and statistical background. You will learn more about sampling in the module that deals with sampling and survey techniques.

1.4 Population Parameters and Sample statistics
Summary statistics can be obtained from a population or sample. These are calculations made
using population or sample data such as the mean, median, standard deviation, to mention just a
few.

Definition 1.4.1: A parameter is any calculation obtained using population data. It can also be defined as any numeric descriptive measure of a population characteristic or feature, such as central tendency or variability.

Examples of population parameters include the population mean $\mu$, population proportion $p$, population correlation $\rho$, population variance $\sigma^2$, population mode and median, etc.

In making inference about a population parameter on the basis of a sample, we normally use a
statistic, which is defined below.

Example
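The probability distribution of a population of figures 1, 2, 3, 4, 5 and 6 is given by

x          1   2   3   4   5   6
P(X = x)   [values not recoverable; one entry is the unknown λ]

Find the (a) parameter λ,

(b) population mean μ, and

(c) population variance $\sigma^2$.

Solution
(a) Since it is a probability distribution, the probabilities must sum to one, that is $\sum_x P(X = x) = 1$. Solving this equation gives λ.

(b) $\mu = \sum_x x\,P(X = x)$

(c) $\sigma^2 = \sum_x x^2\,P(X = x) - \mu^2$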

Definition 1.4.2: A statistic is any calculation obtained using sample data. It can also be defined as any numeric descriptive measure of a sample characteristic or feature, such as the sample mean or sample standard deviation.

Examples of sample statistics include the sample mean $\bar{x}$, sample proportion $\hat{p}$, sample correlation $r$, sample variance $s^2$, sample mode and median, etc.

Exercise 1
1. Distinguish between the following terms:
(a) Statistics and statistics
(b) Parameter and statistic

2. What sample statistic is used to estimate each of the following population parameters? Give reasons for each of your answers.
(a) Population mean
(b) Population proportion
(c) Population variance
(d) Population median

CHAPTER 2

2.0 Introduction
Consider the average birth weight, µ, in kilograms, of all children born in Zimbabwe. The actual mean weight of all children is not known, and it is difficult if not impossible to determine. To get an idea of the value of µ we draw a sample of size n and determine its sample mean $\bar{X}$. Sample means differ from sample to sample, and this variability makes it difficult to draw meaningful conclusions from a single sample. If the extent to which a statistic varies from sample to sample is known, then we can use $\bar{X}$ to make inferences about µ. To make meaningful inferences about µ we therefore need to understand the statistical distribution of the random variable $\bar{X}$, that is, its sampling distribution.

The next section gives us the sampling distributions of sample mean, sample proportions, sample
variances and correlations.

2.1 Objectives of the chapter

At the end of this unit, the student should be able to:

• Determine and use the sampling distribution of the sample mean
• Understand and apply the Central Limit Theorem
• Determine and use the sampling distribution of the sample proportion
• Determine and use the sampling distribution of the difference between two sample means
• Determine and use the sampling distribution of the difference between two sample proportions
• Determine and use the sampling distribution of the sample variance
• Determine and use the sampling distribution of the ratio of two sample variances
• Determine and use the sampling distribution of order statistics

2.2 Sampling Distributions


Definition 2.2.1: Let $X_1, X_2, \dots, X_n$ be a random sample from a distribution depending on a parameter $\theta$. The probability distribution of a statistic $\hat{\theta} = \hat{\theta}(X_1, X_2, \dots, X_n)$ is called the sampling distribution of the statistic.

Sampling distributions allow us to calculate probabilities of events relating to the unknown parameters and also to make inferences about them. They also allow us to check whether the statistics obtained have desirable properties such as unbiasedness, consistency and others to be discussed later in this module.

Consider the experiment where we throw a die and take the uppermost score. Five samples of four throws each gave the following results.

            X1   X2   X3   X4   Mean
Sample 1     6    2    5    6   4.75
Sample 2     2    3    1    6   3
Sample 3     1    1    4    6   3
Sample 4     6    2    2    1   2.75
Sample 5     1    5    1    3   2.5

We know that for a fair die the population mean is 3.5 and the population median is also 3.5. But if we take a sample of four throws, the sample mean may be far from 3.5. Since each sample consists of 4 throws, we say that the sample size is $n = 4$. Notice that none of the five samples gave the population mean exactly, and that the mean of the first sample is far from it. The average (mean) of these five sample means is 3.2. Thus, although the mean of a particular sample may not be a good predictor of the population mean, we tend to get better results if we average a large number of sample means.
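The figures quoted above can be verified with a few lines of Python using only the standard `statistics` module. The sample values are those in the table; the variable names are illustrative.

```python
from statistics import mean

# The five samples of four die throws from the table above.
samples = [
    [6, 2, 5, 6],
    [2, 3, 1, 6],
    [1, 1, 4, 6],
    [6, 2, 2, 1],
    [1, 5, 1, 3],
]

# Population mean of a fair six-sided die.
population_mean = mean(range(1, 7))

# Mean of each sample, then the mean of those sample means.
sample_means = [mean(s) for s in samples]
mean_of_means = mean(sample_means)

print("population mean:", population_mean)
print("sample means:", sample_means)
print("mean of sample means:", mean_of_means)
```

The individual sample means range from 2.5 to 4.75, while their average is closer to the population mean of 3.5.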

The example above has shown us that a sample statistic (such as the sample mean) may be "all
over the place," so a further question is: How confident can we be in the sample statistic?
Sampling distributions will help us address such a question.

2.3 Sampling Distribution of $\bar{X}$

Let $X_1, X_2, \dots, X_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$. Consider $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$. $\bar{X}$ is a linear combination of independent normally distributed random variables, therefore it is also normally distributed.

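Theorem 2.3.1: The sampling distribution of $\bar{X}$ can be summarized by

1. $E(\bar{X}) = \mu_{\bar{X}} = \mu$, that is, the expected value of $\bar{X}$ is µ
2. $Var(\bar{X}) = \sigma^2_{\bar{X}} = \frac{\sigma^2}{n}$
3. $\bar{X}$ is normally distributed

Proof:

1. $E(\bar{X}) = E\left(\frac{1}{n}\sum X_i\right) = \frac{1}{n}\sum E(X_i) = \frac{1}{n}\sum \mu = \mu$

2. $Var(\bar{X}) = Var\left(\frac{1}{n}\sum X_i\right) = \frac{1}{n^2}\sum Var(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$, using the independence of the $X_i$.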

Theorem 2.3.2: Let $X_1, X_2, \dots, X_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$, then $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ follows the standard normal distribution.

Exercise 2.3
1. Prove part 3 of Theorem 2.3.1.
2. Use the method of moment generating functions to show that $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ follows the standard normal distribution.

2.4 Central Limit Theorem


In most practical situations, the data used are not normally distributed. However, if relatively large samples are drawn from a population, the sampling distribution of $\bar{X}$ is approximately normal. For example, if a large sample of Zimbabwean adult weights is collected, very few people would have weights close to 100 kilograms, and we would also find few people as the weight approaches 30 kilograms.

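Theorem 2.4.1: Let $X_1, X_2, \dots, X_n$ be independent and identically distributed random variables each having a mean µ and finite variance $\sigma^2$. Then the distribution of $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ approaches the standard normal as n gets large, i.e.

$$\lim_{n \to \infty} P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z\right) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt$$

This theorem can be applied regardless of the form or type of population probability distribution. The approximation is generally good for values of $n \ge 30$.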

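Example
A sample of 800 households had an average household income of $640 and a standard deviation of $220. Use the Central Limit Theorem to find the

(a) percentage of households whose income is below $400,
(b) proportion of households whose income is above $730, and
(c) proportion of households whose income is between $400 and $600.

Solution
Let X denote household income and standardise with $Z = \frac{X - 640}{220}$.

(a) $P(X < 400) = P\left(Z < \frac{400 - 640}{220}\right) = P(Z < -1.09) = 0.1379$
The percentage of households below $400 income is 13.79%.
(b) $P(X > 730) = P\left(Z > \frac{730 - 640}{220}\right) = P(Z > 0.41) = 1 - 0.6591 = 0.3409$
(c) $P(400 < X < 600) = P(-1.09 < Z < -0.18) = 0.4286 - 0.1379 = 0.2907$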
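The three probabilities in this example can be recomputed with Python's standard `statistics.NormalDist`. This is a minimal sketch that follows the worked solution in treating household income as approximately normal with the sample mean and standard deviation; the variable names are illustrative.

```python
from statistics import NormalDist

# Income modelled as normal with mean 640 and standard deviation 220
# (a modelling assumption carried over from the worked solution).
income = NormalDist(mu=640, sigma=220)

below_400 = income.cdf(400)
above_730 = 1 - income.cdf(730)
between_400_600 = income.cdf(600) - income.cdf(400)

print(f"P(X < 400)       = {below_400:.4f}")
print(f"P(X > 730)       = {above_730:.4f}")
print(f"P(400 < X < 600) = {between_400_600:.4f}")
```

Because the code does not round the z-values to two decimals, its results differ slightly from table-based answers such as the 13.79% in part (a), but they agree to within about 0.001.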

2.5 Sampling Distribution of the Sample variance

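The sample variance is given by $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$. For computational purposes, one can use $S^2 = \frac{1}{n-1}\left(\sum X_i^2 - \frac{\left(\sum X_i\right)^2}{n}\right)$.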
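As a quick check that the two expressions for the sample variance agree, the sketch below evaluates both on a small made-up data set. The numbers are hypothetical, not taken from the notes, and exact fractions are used so that rounding does not obscure the comparison.

```python
from fractions import Fraction

def variance_definition(data):
    """Sample variance from the definition: squared deviations summed, divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

def variance_shortcut(data):
    """Sample variance from the computational formula."""
    n = len(data)
    return (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

# Hypothetical data, stored as exact fractions.
data = [Fraction(v) for v in (4, 7, 9, 10, 15)]

s2_definition = variance_definition(data)
s2_shortcut = variance_shortcut(data)
print(s2_definition, s2_shortcut)
```

The computational form avoids calculating each deviation from the mean, which is convenient when working by hand or with a calculator.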

Definition 2.5.1: A random variable X has a chi-square distribution with k degrees of freedom, i.e. $X \sim \chi^2_k$, if its density function can be written as

$$f(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2 - 1} e^{-x/2}, \quad x > 0$$

From the probability module, it can be shown that if a standard normal random variable is squared, the resulting distribution is chi-square with one degree of freedom.

Lemma 2.5.1: Let X be a standard normal random variable, i.e. $X \sim N(0, 1)$. Then $X^2 \sim \chi^2_1$, that is, $X^2$ has a chi-square distribution with 1 degree of freedom.

Lemma 2.5.2: It can also be shown that if $X_1, X_2, \dots, X_k$ are independent standard normal random variables, then $\sum_{i=1}^{k} X_i^2 \sim \chi^2_k$, i.e. the sum of squares of k independent standard normal variables follows a chi-square distribution with k degrees of freedom.

The proofs of Lemmas 2.5.1 and 2.5.2 can be given using variable transformation techniques or moment generating functions.

Theorem 2.5.1: Let $X_1, X_2, \dots, X_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$, then $\bar{X} = \frac{1}{n}\sum X_i$ and $S^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2$ are independent.

We are only going to use the result. The proof of the theorem is beyond the scope of the course.

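Theorem 2.5.2: Let $X_1, X_2, \dots, X_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$, and let $\bar{X}$ and $S^2$ be the sample mean and variance respectively. Then

$$\frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{\sigma}\right)^2 \sim \chi^2_{n-1}$$

That is, $\frac{(n-1)S^2}{\sigma^2}$ follows a chi-square distribution with $n - 1$ degrees of freedom.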

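Proof: For each $i$ we have

$$(X_i - \mu)^2 = \left[(X_i - \bar{X}) + (\bar{X} - \mu)\right]^2 = (X_i - \bar{X})^2 + 2(X_i - \bar{X})(\bar{X} - \mu) + (\bar{X} - \mu)^2$$

Summing over $i$ yields

$$\sum (X_i - \mu)^2 = \sum (X_i - \bar{X})^2 + 2(\bar{X} - \mu)\sum (X_i - \bar{X}) + n(\bar{X} - \mu)^2$$

But $\sum (X_i - \bar{X}) = n\bar{X} - n\bar{X} = 0$, so we obtain

$$\sum (X_i - \mu)^2 = \sum (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2$$

Dividing through by $\sigma^2$ we get

$$\sum \left(\frac{X_i - \mu}{\sigma}\right)^2 = \sum \left(\frac{X_i - \bar{X}}{\sigma}\right)^2 + \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2$$

Now let $W = \sum \left(\frac{X_i - \mu}{\sigma}\right)^2$, $U = \sum \left(\frac{X_i - \bar{X}}{\sigma}\right)^2$ and $V = \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2$. Then $W \sim \chi^2_n$ with moment generating function $M_W(t) = (1 - 2t)^{-n/2}$ and $V \sim \chi^2_1$ with moment generating function $M_V(t) = (1 - 2t)^{-1/2}$. Since $W = U + V$, and $U$ and $V$ are independent because of the independence of $\bar{X}$ and $S^2$, it can be shown that $M_W(t) = M_U(t)\,M_V(t)$. Solving this equation for $M_U(t)$ gives

$$M_U(t) = \frac{(1 - 2t)^{-n/2}}{(1 - 2t)^{-1/2}} = (1 - 2t)^{-(n-1)/2}$$

which is the moment generating function of a chi-square distribution with $n - 1$ degrees of freedom.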

Example
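The lifetime in months of an electronic switch has a chi-square distribution. Calculate the probability that the switch will last between 15 and 30 months.

Solution

$P(15 < X < 30) = \int_{15}^{30} f(x)\,dx$, where $f(x)$ is the chi-square density of Definition 2.5.1. The integral is evaluated by integrating by parts.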

Exercise 2.5
1. Suppose that the random variables are independent and normally
distributed with mean µ and variance . Compute the probability that
∑ ̅ is greater than .
2. Find such that

2.6 Sampling Distribution of $\frac{\bar{X} - \mu}{S/\sqrt{n}}$

A random variable X has a Student's t-distribution with n degrees of freedom if its density function is given by

$$f(x) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\;\Gamma\left(\frac{n}{2}\right)}\left(1 + \frac{x^2}{n}\right)^{-(n+1)/2}, \quad -\infty < x < \infty$$

In most practical cases $\sigma$ is unknown. The common way round this is to estimate $\sigma$ by the sample standard deviation S. The t-distribution was first derived by W. S. Gosset in 1908 under the pseudonym 'Student'.

Theorem 2.6.1: Let $Z$ be a standard normal random variable and $V$ be a chi-square random variable with $\nu$ degrees of freedom. If $Z$ and $V$ are independent then the ratio

$$T = \frac{Z}{\sqrt{V/\nu}}$$

has a t-distribution with $\nu$ degrees of freedom.

The probability density function of the t-distribution is symmetric about zero, but it has tails that are more spread out than those of the $N(0, 1)$ distribution. As the degrees of freedom increase, the t-distribution tends to look more like the $N(0, 1)$ distribution.

Exercises 2.6
1. Let $X_1, X_2, \dots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Use Theorem 2.6.1 to show that $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ has a t-distribution with $n - 1$ degrees of freedom.


2. Use statistical tables or any computer software to evaluate the following.
(a)
(b)
(c) Find a number such that , for T with 15 degrees of
freedom.
(d) when there are 15 degrees of freedom

2.7 Sampling distribution of the Sample Proportion


Let $p$ be the proportion of items or individuals in a population bearing a characteristic or feature of interest. Let a random sample of size $n$ be drawn from the population. Let $X$ be the number of sample items bearing the characteristic of interest. Then the sample proportion is given as $\hat{p} = \frac{X}{n}$ and is an estimator of $p$.

$X$ has a binomial distribution with mean $np$ and variance $npq$, where $q = 1 - p$, and when the sample size is small this exact distribution is used. When n is large, the binomial variable can be well approximated by the normal distribution with mean $np$ and variance $npq$, that is

$$\frac{X - np}{\sqrt{npq}} \approx N(0, 1)$$

Dividing both numerator and denominator by $n$ we obtain $\frac{\hat{p} - p}{\sqrt{pq/n}}$. We use this result to standardize the sample proportion. From the Central Limit Theorem, $\frac{\hat{p} - p}{\sqrt{pq/n}}$ approaches the standard normal distribution as $n \to \infty$.

Using the laws of expected value and variance, we can show that the mean, variance, and standard deviation of $\hat{p}$ are:

$E(\hat{p}) = p$,

$Var(\hat{p}) = \sigma^2_{\hat{p}} = \frac{pq}{n}$, and

$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$. This standard deviation of $\hat{p}$ is called the standard error of the proportion.
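The standard error and the standardised sample proportion are simple to compute. The sketch below uses hypothetical values (a population proportion of 0.25, a sample of 200 and 60 observed successes) that are not taken from the notes; the function names are illustrative.

```python
from math import sqrt

def standard_error(p, n):
    """Standard error of the sample proportion: sqrt(p * (1 - p) / n)."""
    return sqrt(p * (1 - p) / n)

def standardise(p_hat, p, n):
    """Z-score of an observed sample proportion p_hat when the population proportion is p."""
    return (p_hat - p) / standard_error(p, n)

# Hypothetical values, not taken from the notes.
p, n = 0.25, 200
p_hat = 60 / n

se = standard_error(p, n)
z = standardise(p_hat, p, n)
print(f"standard error = {se:.4f}, z = {z:.3f}")
```

For large n the value of z can be referred to the standard normal distribution, as the Central Limit Theorem result above suggests.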

Exercises 2.7
1. In a sample of 2400 households, 1800 were found to be below the poverty datum line. Find the sample proportion of households below the poverty datum line,

2.8 Sampling Distribution of the Difference between Two Means


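Let $X_{11}, X_{12}, \dots, X_{1n_1}$ and $X_{21}, X_{22}, \dots, X_{2n_2}$ be independent random samples from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ respectively. Let $\bar{X}_1, S_1^2$ and $\bar{X}_2, S_2^2$ be the sample mean and sample variance for the two samples respectively.

(i) When both population variances are known, then $E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$ and $Var(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$. Then

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

is a standard normal random variable, i.e. $Z \sim N(0, 1)$.

(ii) When $n_1$ and $n_2$ are both large, the normal distribution remains valid if $\sigma_1^2$ and $\sigma_2^2$ are replaced by their estimators $S_1^2$ and $S_2^2$. The statistic

$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$$

has an approximate standard normal distribution.

(iii) If both population variances are unknown and both sample sizes small, we assume that both populations are normal and also that the population variances are equal ($\sigma_1^2 = \sigma_2^2 = \sigma^2$). Under these assumptions $E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$ and $Var(\bar{X}_1 - \bar{X}_2) = \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$. The common variance can be estimated by combining information from both samples to obtain a pooled variance denoted by $S_p^2$. The pooled variance is calculated as

$$S_p^2 = \frac{\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2 + \sum_{j=1}^{n_2} (X_{2j} - \bar{X}_2)^2}{n_1 + n_2 - 2}$$

By using the pooled variance as an estimator of $\sigma^2$, the statistic

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

has a t-distribution with $n_1 + n_2 - 2$ degrees of freedom.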
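The small-sample case can be sketched in code as follows. The two samples are hypothetical and the function names are illustrative; the sketch assumes equal population variances, as in case (iii), and tests a hypothesised difference in means of zero by default.

```python
from math import sqrt
from statistics import mean, variance

def pooled_variance(sample1, sample2):
    """Pooled estimate of the common variance of two samples."""
    n1, n2 = len(sample1), len(sample2)
    return ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)

def pooled_t_statistic(sample1, sample2, mean_difference=0.0):
    """T statistic for the difference of two means using the pooled variance."""
    n1, n2 = len(sample1), len(sample2)
    sp2 = pooled_variance(sample1, sample2)
    standard_error = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2) - mean_difference) / standard_error

# Hypothetical small samples, not taken from the notes.
sample1 = [10, 12, 14, 16]
sample2 = [9, 10, 11]

sp2 = pooled_variance(sample1, sample2)
t = pooled_t_statistic(sample1, sample2)
degrees_of_freedom = len(sample1) + len(sample2) - 2
print(f"pooled variance = {sp2:.2f}, t = {t:.3f}, df = {degrees_of_freedom}")
```

The resulting value of t would be compared with the t-distribution on the printed degrees of freedom.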

Sampling Distribution of the Ratio of Two Variances

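Definition: A random variable $X$ has an F-distribution with $\nu_1$ and $\nu_2$ degrees of freedom if its density function is given by

$$f(x) = \frac{\Gamma\left(\frac{\nu_1 + \nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)}\left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2}\frac{x^{\nu_1/2 - 1}}{\left(1 + \frac{\nu_1}{\nu_2}x\right)^{(\nu_1 + \nu_2)/2}}, \quad x > 0$$

In practical cases the F-distribution is useful when testing for equality of variances.

Theorem: Let $U$ and $V$ be independent chi-square random variables with $\nu_1$ and $\nu_2$ degrees of freedom respectively. Then the random variable

$$F = \frac{U/\nu_1}{V/\nu_2}$$

has an F-distribution with $(\nu_1, \nu_2)$ degrees of freedom.

Consider $S_1^2$ and $S_2^2$, sample variances from two independent random samples of sizes $n_1$ and $n_2$ from normal distributions with population variances $\sigma_1^2$ and $\sigma_2^2$ respectively. Let $U = \frac{(n_1 - 1)S_1^2}{\sigma_1^2}$ and $V = \frac{(n_2 - 1)S_2^2}{\sigma_2^2}$. Then the statistic

$$F = \frac{U/(n_1 - 1)}{V/(n_2 - 1)} = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$

has an F-distribution with $(n_1 - 1, n_2 - 1)$ degrees of freedom.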

Sampling Distribution of the Difference between Two Proportions


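Let $p_1$ and $p_2$ be the proportions of items or individuals having a characteristic of interest in two populations respectively. Independent samples of sizes $n_1$ and $n_2$ are drawn from the two populations and give the statistics $\hat{p}_1 = \frac{X_1}{n_1}$, $\hat{p}_2 = \frac{X_2}{n_2}$ and the pooled proportion $\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$. If $\hat{q} = 1 - \hat{p}$ then, when $p_1 = p_2$, the statistic

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left[\frac{1}{n_1} + \frac{1}{n_2}\right]}}$$

is approximately distributed as standard normal as $n_1, n_2 \to \infty$.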

Example
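In a study to determine the proportions of males and females with obesity, a sample of 400 men found that 80 had obesity, and a sample of 700 females found that 175 had obesity. Calculate the probability that the proportion of females with obesity is greater than that of males.

Solution

$\hat{p}_1 = \frac{80}{400} = 0.2$, $\hat{p}_2 = \frac{175}{700} = 0.25$, $\hat{p} = \frac{80 + 175}{400 + 700} = 0.2318$, and

$$P(\hat{p}_2 > \hat{p}_1) = P(\hat{p}_2 - \hat{p}_1 > 0) = P\left(Z > \frac{0 - (0.25 - 0.2)}{\sqrt{(0.2318)(0.7682)\left(\frac{1}{400} + \frac{1}{700}\right)}}\right) = P(Z > -1.89) = 0.9706$$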
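The arithmetic of this example can be reproduced in a few lines. The sketch below is one reading of the calculation: it pools the two samples, standardises the observed difference in proportions, and reads the probability off the standard normal distribution using `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Data from the example: 80 of 400 men and 175 of 700 women with obesity.
x1, n1 = 80, 400
x2, n2 = 175, 700

p1_hat = x1 / n1
p2_hat = x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)

# Standard error of the difference, based on the pooled proportion.
standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

# Standardised observed difference (female minus male).
z = (p2_hat - p1_hat) / standard_error

# P(Z > -z) for a standard normal Z, which equals the cdf at z.
probability = NormalDist().cdf(z)
print(f"z = {z:.3f}, probability = {probability:.4f}")
```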

Sampling Distributions of Order statistics
Most sampling distribution results (except for the CLT) apply to samples from normal populations. If the data do not come from a normal (or at least approximately normal) population, then statistical methods called “distribution-free” or “non-parametric” methods can be used. Non-parametric methods are often based on ordered data (called order statistics: $X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$) or just their ranks. $X_{(1)}$ and $X_{(n)}$ are the minimum and maximum observations in an ordered dataset.

If $X_1, X_2, \dots, X_n$ are from a continuous population with cdf $F(x)$ and pdf $f(x)$ then the pdf of $X_{(k)}$ is given by

$$f_{(k)}(x) = \frac{n!}{(k-1)!\,(n-k)!}\,[F(x)]^{k-1}\,[1 - F(x)]^{n-k} f(x)$$

The confidence intervals for percentiles can be derived using the order statistics and the binomial distribution.

Sampling Distribution of Ordered Statistics

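Theorem: Let $X_1, X_2, \dots, X_n$ be independent and identically distributed with pdf $f(x)$ and cdf $F(x)$. Let $X_{(n)}$ be the maximum of $(X_1, X_2, \dots, X_n)$ and $X_{(1)}$ be the minimum of $(X_1, X_2, \dots, X_n)$. Then
(a) $f_{(n)}(x) = n[F(x)]^{n-1} f(x)$
(b) $f_{(1)}(x) = n[1 - F(x)]^{n-1} f(x)$

Proof

(a) $F_{(n)}(x) = P(X_{(n)} \le x) = P(X_1 \le x, X_2 \le x, \dots, X_n \le x) = [P(X_1 \le x)]^n = [F(x)]^n$

Hence $f_{(n)}(x) = \frac{d}{dx}[F(x)]^n = n[F(x)]^{n-1} f(x)$

(b) The proof of part (b) follows the same steps as part (a), starting from $P(X_{(1)} > x) = [1 - F(x)]^n$.

The sampling distribution of the maximum sample observation is given by $f_{(n)}(x) = n[F(x)]^{n-1} f(x)$ and the sampling distribution of the minimum sample observation is given by $f_{(1)}(x) = n[1 - F(x)]^{n-1} f(x)$.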
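The density of an order statistic can be evaluated numerically from the population cdf and pdf. The sketch below assumes a Uniform(0, 1) population purely for illustration and uses the standard formula for the density of the k-th order statistic; the function names are hypothetical. Setting k = n gives the maximum and k = 1 gives the minimum.

```python
from math import factorial

def order_statistic_pdf(x, k, n, cdf, pdf):
    """Density at x of the k-th order statistic of n iid continuous observations."""
    coefficient = factorial(n) // (factorial(k - 1) * factorial(n - k))
    return coefficient * cdf(x) ** (k - 1) * (1 - cdf(x)) ** (n - k) * pdf(x)

# Uniform(0, 1) population, chosen only for illustration.
def uniform_cdf(x):
    return min(max(x, 0.0), 1.0)

def uniform_pdf(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

n, x = 5, 0.8
pdf_max = order_statistic_pdf(x, n, n, uniform_cdf, uniform_pdf)  # n * F(x)**(n-1) * f(x)
pdf_min = order_statistic_pdf(x, 1, n, uniform_cdf, uniform_pdf)  # n * (1 - F(x))**(n-1) * f(x)
print(f"density of maximum at {x}: {pdf_max:.4f}")
print(f"density of minimum at {x}: {pdf_min:.4f}")
```

With five observations, a value as large as 0.8 is far more plausible for the maximum than for the minimum, which the two densities reflect.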

The sample range is the random variable $R = X_{(n)} - X_{(1)}$.

This statistic gives a measure of the dispersion of the sample. Note that the distribution of the sample range can be obtained from the joint distribution of $X_{(1)}$ and $X_{(n)}$.

Exercises
1. Consider a random sample of size n from the exponential distribution with rate parameter r. Compute the density function of the
(a) $k$th order statistic $X_{(k)}$,

(b) minimum sample observation $X_{(1)}$, and

(c) maximum sample observation $X_{(n)}$.

2. Consider a random sample of size 10 from the uniform distribution on (0, 2).

(a) Find the distribution of


(b) Find .
(c) Find the mean of

3. Four fair dice are rolled. Find the (discrete) density function of each of the order
statistics.

CHAPTER 3

ESTIMATION: POINT ESTIMATION OF POPULATION PARAMETERS

Introduction
Point estimation is concerned with finding a single value which we think best represents the unknown population parameter. Suppose we want a value which best represents the proportion of the Zimbabwean population owning a vehicle. The natural choice is the sample proportion. Similarly, the population variance is naturally represented by the sample variance.

If we are to make inferences about population parameters, we need some observations, or a sample, $x_1, x_2, \dots, x_n$, from the population of interest. Based on these values we can find an approximate probability distribution, which we can then use to address questions relating to the population of interest. The major aim of having these values is to obtain a statistic $\hat{\theta} = \hat{\theta}(X_1, X_2, \dots, X_n)$ which we think is good enough to estimate the population parameter $\theta$. Once $\theta$ has been estimated, the underlying probability distribution can be used for further inference about the population.

Definition: Let $X_1, X_2, \dots, X_n$ be a random sample from a population with probability density function given by $f(x; \theta)$, i.e. the function depends on the unknown parameter $\theta$. Then the statistic $\hat{\theta} = \hat{\theta}(X_1, X_2, \dots, X_n)$ is called an estimator of $\theta$ if it is “close” in some sense to the true value of $\theta$. A possible value of $\hat{\theta}$ is called an estimate of $\theta$.

There are several methods used to find these point estimates. The following section deals with the methods commonly used to find them.

Methods of Finding Estimates

The following methods are usually used to find estimators of population parameters.

(a) Judgemental method
(b) Method of Moments
(c) Maximum Likelihood Method
(d) Method of Least Squares

Judgemental method
In this method of estimation, personal expertise, experience or judgement is used to determine an estimate $\hat{\theta}$ of the population parameter $\theta$. This method is commonly used in Economics, where personal judgement and experience are required to come up with sound judgements about the levels of various parameters of an economic system.

Example
Suppose you are asked to estimate the average household income for the population in Harare. What statistic would you use and why?

Solution
The sample mean $\bar{X}$ is the natural estimate. The parameter being estimated is the population mean $\mu$, so the sample mean is the natural estimate of $\mu$.

Example
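Let $X_1, X_2, \dots, X_n$ be a random sample from the uniform distribution $U(a, b)$.

(a) Find estimators of $a$ and $b$.
(b) If the random sample 1.2, 2.4, 3.1, 0.9, 2.0, 1.8, 4.2 is from the uniform distribution, find the estimates of $a$ and $b$.

Solution

(a) Since all values of $X_i$ are equal to or greater than $a$, the most sensible estimator of $a$ is $\hat{a} = \min(X_1, X_2, \dots, X_n) = X_{(1)}$.
Since all values of $X_i$ are equal to or less than $b$, the most sensible estimator of $b$ is $\hat{b} = \max(X_1, X_2, \dots, X_n) = X_{(n)}$.

(b) Thus $\hat{a} = 0.9$ and $\hat{b} = 4.2$.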
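For the observed sample in part (b), the judgemental estimates are simply the smallest and largest observations. A minimal sketch, with illustrative variable names:

```python
# Observed sample from part (b) of the example.
sample = [1.2, 2.4, 3.1, 0.9, 2.0, 1.8, 4.2]

# Judgemental estimates of the lower and upper limits of the uniform distribution.
lower_estimate = min(sample)
upper_estimate = max(sample)

print("estimate of lower limit:", lower_estimate)
print("estimate of upper limit:", upper_estimate)
```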

Method of Moments
This method is based on both sample and population moments.

Definition: Let $X_1, X_2, \dots, X_n$ be a random sample from a distribution with density function $f(x; \theta)$. The method of moments estimator (MME) of $\theta$ is obtained by equating population moments to sample moments, i.e.

$$E(X^k) = \frac{1}{n}\sum_{i=1}^{n} X_i^k, \quad k = 1, 2, \dots$$

then solving the resulting equations for $\theta$. The solution(s) to the equations are the estimate(s) of $\theta$.

The population parameter can be a vector or a single parameter. One can use moments about zero or moments about the mean; the results obtained are the same.

Example
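Let $X_1, X_2, \dots, X_n$ be a random sample from a distribution with density function $f(x; \theta)$.

(a) Use the method of moments to find an estimator of $\theta$.
(b) If the sample 0.8, 0.2, 0.1, 0.6, 0.7, 0.5, 0.4, 0.9 and 0.1 is from the population with the distribution, find the estimate of $\theta$.

Solution

(a) $E(X) = \int x f(x; \theta)\,dx$ is the first population moment.

$\bar{X} = \frac{1}{n}\sum X_i$ is the first sample moment.

Equate the moments, i.e. set $E(X) = \bar{X}$, and solve for $\theta$ to obtain the estimator $\hat{\theta}$ as a function of $\bar{X}$.

(b) $\bar{x} = \frac{1}{9}\sum x_i = \frac{4.3}{9} \approx 0.478$, which is substituted into the estimator from (a).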

Exercises
1. Find the method of moments estimators of based on a random sample
from each of the following density functions:
(a)
(b)
(c)

Maximum Likelihood Method


This is the most common method of estimating parameters. It is based on a function called the likelihood function.

Definition: Let $x_1, x_2, \dots, x_n$ be possible values of a random sample from a population with a density $f(x; \theta)$; then the likelihood function of the observed sample is defined by

$$L(\theta) = L(\theta; x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} f(x_i; \theta)$$

The likelihood function is the joint density of $X_1, X_2, \dots, X_n$; in the case of a discrete variable it represents the probability of selecting a sample equal to the observed sample. Since this probability depends on the parameter $\theta$, we choose the value of $\theta$ which maximizes the probability of obtaining the observed values.

20
Definition: Let be a random sample from a distribution with density
function , an estimator of that maximizes the likelihood function is
called the Maximum Likelihood Estimator (MLE) for

If the likelihood function is defined on and if is differentiable and assumes a maximum


on , then the MLE will be a solution of the equation (maximum likelihood equation)

If one or more solutions exist, then it should be verified which ones maximises . The value
of that maximises also maximises the log-likelihood, . So for computational
convenience the alternative form of the maximum likelihood equation

can be used.

Example
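Let $X_1, X_2, \dots, X_n$ be a random sample from a Poisson distribution

$$f(x; \lambda) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \dots$$

(a) Find the MLE for $\lambda$.
(b) Use the observed values 2, 4, 6, 4, 7, 10, 3, 5, 5, 4, 3 and 7 to find an estimate of $\lambda$.

Solution
(a) The likelihood function is given by

$$L(\lambda) = \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{x_i}}{x_i!} = e^{-n\lambda}\,\lambda^{\sum x_i}\left(\prod x_i!\right)^{-1}$$

Taking natural logarithms, differentiating with respect to $\lambda$ and setting the derivative to 0, we obtain

$$\ln L(\lambda) = -n\lambda + \sum x_i \ln\lambda - \ln\left(\prod x_i!\right)$$

$$\frac{d}{d\lambda}\ln L(\lambda) = -n + \frac{\sum x_i}{\lambda} = 0$$

$$\lambda = \frac{\sum x_i}{n} = \bar{x}$$

Thus the MLE for $\lambda$ is $\hat{\lambda} = \bar{X}$.

(b) The estimate of $\lambda$ is

$$\hat{\lambda} = \frac{60}{12} = 5$$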
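The Poisson result can be checked numerically. The sketch below computes the sample mean of the observations in part (b) and confirms that the Poisson log-likelihood is lower at nearby parameter values. The helper function is written for this illustration and is not part of any library.

```python
from math import lgamma, log

# Observed values from part (b) of the example.
observations = [2, 4, 6, 4, 7, 10, 3, 5, 5, 4, 3, 7]

def poisson_log_likelihood(lam, data):
    """Poisson log-likelihood: -n*lam + sum(x)*ln(lam) - sum(ln(x!))."""
    n = len(data)
    # lgamma(x + 1) equals ln(x!)
    return -n * lam + sum(data) * log(lam) - sum(lgamma(x + 1) for x in data)

# The MLE of the Poisson parameter is the sample mean.
lambda_hat = sum(observations) / len(observations)

at_mle = poisson_log_likelihood(lambda_hat, observations)
slightly_below = poisson_log_likelihood(lambda_hat - 0.1, observations)
slightly_above = poisson_log_likelihood(lambda_hat + 0.1, observations)
print(lambda_hat, at_mle > slightly_below, at_mle > slightly_above)
```

Comparing the log-likelihood at neighbouring values is only a spot check; the derivation above is what establishes that the sample mean is the maximiser.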

Example
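Let $X_1, X_2, \dots, X_n$ be a random sample from an exponential distribution

$$f(x; \theta) = \frac{1}{\theta} e^{-x/\theta}, \quad x > 0$$

Find the likelihood function $L(\theta)$ and hence the MLE for $\theta$.

Solution

$$L(\theta) = \prod_{i=1}^{n}\left(\frac{1}{\theta} e^{-x_i/\theta}\right) = \left(\frac{1}{\theta}\right)^{n} e^{-\sum x_i/\theta}$$

Taking natural logarithms, differentiating, equating to 0 and solving the resulting equation for $\theta$ gives

$$\ln L(\theta) = -n\ln\theta - \frac{\sum x_i}{\theta}, \qquad \frac{d}{d\theta}\ln L(\theta) = -\frac{n}{\theta} + \frac{\sum x_i}{\theta^2} = 0$$

and

$$\hat{\theta} = \frac{\sum X_i}{n} = \bar{X}$$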

There are cases where the MLE exists but cannot be obtained as a solution of the maximum
likelihood equation. The example below illustrates such a scenario.

Example
Let be a random sample from a population with a density function

(a) Find the MLE for .

(b) Sketch the graph of as a function of .
(c) Hence or otherwise find the MLE for .

Solution

(a) The likelihood function for the sample values is



(b)

(c) From the graph the likelihood attains its maximum at . Thus the MLE for is
̂ .

Exercises
1. Let be a random sample from a geometric distribution

( )( )

Find the MLE for .

2. Let be a random sample from a gamma distribution



.
(a) Find the MLE for .
(b) Find the MME for .
3. Let be a random sample from a normal distribution

{ ( ) }

Find the MLE for and .

Method of Least Squares


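In certain types of models, the principle of least squares is very important. We assume that the mean of the random variable $Y$ is a linear function of an unknown vector of parameters $\beta = (\beta_0, \beta_1, \dots, \beta_p)$ and factors $\mathbf{x} = (x_1, x_2, \dots, x_p)$ that can be fixed and measured without error. We assume that $Var(Y) = \sigma^2$, where $\sigma^2$ is constant and does not depend on $\mathbf{x}$.

The linear model can be written as

$$E(Y) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$$

The least squares estimate of $\beta = (\beta_0, \beta_1, \dots, \beta_p)$ based on the sample $(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \dots, (\mathbf{x}_n, y_n)$, where $\mathbf{x}_i = (x_{i1}, x_{i2}, \dots, x_{ip})$ for $i = 1, 2, \dots, n$, is that $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_p)$ which minimizes the sum of squares

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}\left(y_i - \beta_0 - \beta_1 x_{i1} - \dots - \beta_p x_{ip}\right)^2$$

We differentiate $\sum e_i^2$ with respect to all the $\beta_j$ then solve the resulting equations for $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_p)$.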

Example
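Let $Y$ be a random variable with mean $E(Y) = \beta_0 + \beta_1 x$. Given a sample of size n, $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$, find the least squares estimates of $\beta_0$ and $\beta_1$.

Solution

Let $S = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$.

Set the derivatives $\frac{\partial S}{\partial \beta_0}$ and $\frac{\partial S}{\partial \beta_1}$ to zero and solve the simultaneous equations for $\beta_0$ and $\beta_1$; we obtain

$$\hat{\beta}_1 = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$$

and

$$\hat{\beta}_0 = \frac{\sum y_i - \hat{\beta}_1 \sum x_i}{n}$$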
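The least squares formulas for the slope and intercept of a straight-line model translate directly into code. The sketch below uses a small hypothetical data set that lies exactly on a line, so the result is easy to verify by eye; the function name is illustrative.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line through the points."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope

# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

intercept, slope = least_squares_line(xs, ys)
print(f"fitted line: y = {intercept} + {slope} x")
```

With real data the points will not lie exactly on a line, and the fitted line is the one that minimises the sum of squared vertical deviations.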

Exercises
Consider the data on monthly Income ($) and Savings ($) given in the table below.

Income 980 1500 420 198 785 2800 3845 1890 4210
Savings 180 315 80 42 165 520 760 366 800

Suppose that Income (X) and savings (Y) are related through a relationship with the form

(a) Find the least squares estimates of and .


(b) Estimate when .

CHAPTER 4

Properties of Point Estimators

Introduction
This Unit deals with the various requirements or properties that have to be possessed by
estimators. Different estimation procedures yield different estimators for the same population
parameter. We use the different properties to determine the estimator which is the best in some
sense. These properties include error, mean square error, unbiasedness, consistency, efficiency
and sufficiency.

Error
Consider a random sample $X_1, X_2, \dots, X_n$ and an estimator $\hat{\theta} = \hat{\theta}(X_1, X_2, \dots, X_n)$ of the population parameter θ. Then the error of the estimator $\hat{\theta}$ is given by

$$e(\hat{\theta}) = \hat{\theta} - \theta$$

where θ is the parameter being estimated. Note that the error depends not only on the estimator but also on the sample. Good estimators tend to have small errors whilst poor ones have large errors.

Bias
Bias is defined as the difference between the average of the collection of estimates and the single population parameter being estimated, that is

$$\text{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$$

Mean Square Error
It is defined as the expected value of the squared errors, that is,

$$MSE(\hat{\theta}) = E\left[(\hat{\theta} - \theta)^2\right]$$

It is used to determine how far, on average, the collection of sample estimates is from the population parameter being estimated. A high MSE indicates a poor estimator and a low MSE indicates a good estimator.

Exercise
Show that the MSE can be expressed as $MSE(\hat{\theta}) = Var(\hat{\theta}) + [\text{Bias}(\hat{\theta})]^2$.

Variance
Variance of an estimator $\hat{\theta} = \hat{\theta}(X_1, X_2, \dots, X_n)$ is defined as the expected value of the squared sampling deviations, that is

$$Var(\hat{\theta}) = E\left[\left(\hat{\theta} - E(\hat{\theta})\right)^2\right]$$

It is used to determine how far, on average, the collection of sample estimates is from the expected value of the estimates. A high variance indicates a poor estimator and a low variance usually implies a good estimator.
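The definitions of bias, mean square error and variance can be illustrated by exact enumeration. The sketch below uses a hypothetical setting that is not in the notes: the parameter is the largest face of a fair six-sided die (θ = 6) and the estimator is the maximum of two throws. Exact fractions are used so that the quantities can be compared without rounding.

```python
from fractions import Fraction
from itertools import product

theta = 6  # parameter being estimated: the largest face of the die

# All 36 equally likely samples of two throws; the estimator is the sample maximum.
estimates = [max(sample) for sample in product(range(1, 7), repeat=2)]
weight = Fraction(1, len(estimates))

expected_value = sum(weight * e for e in estimates)
bias = expected_value - theta
variance = sum(weight * (e - expected_value) ** 2 for e in estimates)
mse = sum(weight * (e - theta) ** 2 for e in estimates)

print("bias:", bias)
print("variance:", variance)
print("MSE:", mse)
print("variance + bias^2:", variance + bias ** 2)
```

The estimator is biased downwards because the maximum of two throws can never exceed 6, and the last two printed values coincide, which is a numerical instance of the relationship between MSE, variance and bias.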

Unbiasedness
Definition: Let $X_1, X_2, \dots, X_n$ be a random sample from a population with a parameter $\theta$. An estimator $\hat{\theta}$ of $\theta$ is called an unbiased estimator for $\theta$ if $E(\hat{\theta}) = \theta$, i.e. if the expectation of the estimator is equal to the parameter $\theta$.

Example

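Let $X_1, X_2, \dots, X_n$ be a random sample from a population with density function $f(x)$, with mean $\mu$ and variance $\sigma^2$. Let $\bar{X} = \frac{1}{n}\sum X_i$ be the sample mean. Show that $\bar{X}$ is an unbiased estimator of $\mu$.

Solution

$$E(\bar{X}) = E\left(\frac{1}{n}\sum X_i\right) = \frac{1}{n}E\left(\sum X_i\right) = \frac{1}{n}\left\{\sum E(X_i)\right\} = \frac{1}{n}\{n\mu\} = \mu$$

Since $E(\bar{X}) = \mu$, $\bar{X}$ is an unbiased estimator of $\mu$.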

Exercise
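1. Let $X_1, X_2, \ldots, X_n$ be a random sample from a uniform distribution on $(0, \theta)$.

(a) Find the estimator of $\theta$ using the method of

(i) moments, and

(ii) maximum likelihood.

(b) Which of the two estimators in (a) is unbiased for $\theta$?

2. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with variance $\sigma^2$. Show that
$\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ is an unbiased estimator for $\sigma^2$.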

Consistency
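Consistency is another way of assessing the accuracy of an estimator. This property says that as
the sample size increases, the estimator $\hat{\theta}$ must get closer to the true value of the parameter.

Definition
Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with parameter $\theta$. Then an estimator
$\hat{\theta}_n = \hat{\theta}(X_1, \ldots, X_n)$ is said to be a consistent estimator for $\theta$ if for any $\varepsilon > 0$

$$\lim_{n \to \infty} P\left(|\hat{\theta}_n - \theta| > \varepsilon\right) = 0,$$

and mean square consistent if

$$\lim_{n \to \infty} E\left[(\hat{\theta}_n - \theta)^2\right] = 0.$$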

Example
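Let $X_1, X_2, \ldots, X_n$ be a random sample from the Bernoulli distribution

$$f(x; p) = p^x (1 - p)^{1 - x}, \quad x = 0, 1.$$

Show that $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is an unbiased and consistent estimator of $p$.

Solution

For a Bernoulli distribution $E(X_i) = p$ and $\operatorname{Var}(X_i) = p(1 - p)$.

$$E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\sum_{i=1}^{n} p = p.$$

Hence $\bar{X}$ is an unbiased estimator for $p$.

$$\operatorname{Var}(\bar{X}) = \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{p(1 - p)}{n}.$$

Hence $\operatorname{Var}(\bar{X}) \to 0$ as $n \to \infty$. Since $\bar{X}$ is unbiased, its mean square error equals its variance, so $\bar{X}$ is a consistent estimator for $p$.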

Exercise
1. Let $X_1, X_2, \ldots, X_n$ be a random sample from an exponential distribution with parameter $\theta$.

(a) Determine whether the method of moments estimator for $\theta$ is a consistent estimator.

(b) Show that the maximum likelihood estimator for $\theta$ is a consistent estimator.

Efficiency
Efficiency is a term used in Statistics when comparing various statistical procedures; it refers to a
measure of the optimality of an estimator. A more efficient estimator requires fewer observations than
a less efficient one to achieve a desired level of performance.

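Definition: Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution having a parameter $\theta$
and let $\hat{\theta}_1$ and $\hat{\theta}_2$ be two unbiased estimators of $\theta$. The relative efficiency of the two unbiased
estimators $\hat{\theta}_1$ and $\hat{\theta}_2$ is the ratio of their variances, i.e.

$$\frac{\operatorname{Var}(\hat{\theta}_1)}{\operatorname{Var}(\hat{\theta}_2)}.$$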

Example
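Consider two estimators for the parameter $\theta$ of a uniform distribution $U(0, \theta)$: $\hat{\theta}_1 = 2\bar{X}$ and
$\hat{\theta}_2 = \frac{n+1}{n} X_{(n)}$, where $X_{(n)}$ is the maximum observation in the data $X_1, X_2, \ldots, X_n$.
Determine
(a) whether $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased estimators of $\theta$.

(b) the relative efficiency of $\hat{\theta}_1$ and $\hat{\theta}_2$. Comment on your result.

Solution

(a) $E(\hat{\theta}_1) = E(2\bar{X}) = 2E(\bar{X}) = 2 \cdot \frac{\theta}{2} = \theta$.
Since $E(\hat{\theta}_1) = \theta$ we conclude that $\hat{\theta}_1$ is an unbiased estimator of $\theta$.

From order statistics the pdf of $X_{(n)}$ is given by $f_{X_{(n)}}(x) = \frac{n x^{n-1}}{\theta^n}$, $0 < x < \theta$. Therefore the expectation
of $\hat{\theta}_2$ is given by

$$E(\hat{\theta}_2) = \left(\frac{n+1}{n}\right)\left(\frac{n}{\theta^n}\right)\int_0^{\theta} x^n \, dx = \left(\frac{n+1}{n}\right)\left(\frac{n\theta}{n+1}\right) = \theta.$$

Since $E(\hat{\theta}_2) = \theta$ we conclude that $\hat{\theta}_2$ is an unbiased estimator of $\theta$. Therefore both $\hat{\theta}_1$ and
$\hat{\theta}_2$ are unbiased estimators of $\theta$.

(b) $\operatorname{Var}(\hat{\theta}_1) = \operatorname{Var}(2\bar{X}) = 4\operatorname{Var}(\bar{X}) = (4)\left(\frac{\theta^2}{12n}\right) = \frac{\theta^2}{3n}$.
To find the variance of $\hat{\theta}_2$ we first find $E(X_{(n)}^2)$, that is
$E(X_{(n)}^2) = \int_0^{\theta} x^2 \, \frac{n x^{n-1}}{\theta^n} \, dx = \frac{n\theta^2}{n+2}$. Therefore the variance of $\hat{\theta}_2$ is given by

$$\operatorname{Var}(\hat{\theta}_2) = \left(\frac{n+1}{n}\right)^2 \left\{E(X_{(n)}^2) - \left(E(X_{(n)})\right)^2\right\} = \left(\frac{n+1}{n}\right)^2 \left\{\frac{n\theta^2}{n+2} - \frac{n^2\theta^2}{(n+1)^2}\right\} = \frac{\theta^2}{n(n+2)}.$$

Therefore the relative efficiency is given by

$$\frac{\operatorname{Var}(\hat{\theta}_1)}{\operatorname{Var}(\hat{\theta}_2)} = \frac{\theta^2 / 3n}{\theta^2 / n(n+2)} = \frac{n+2}{3}.$$

This indicates that for values of $n > 1$, $\hat{\theta}_2$ has a lower variance.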
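
The comparison can also be explored by simulation. The sketch below assumes that the two estimators in the example are $2\bar{X}$ and $\frac{n+1}{n}X_{(n)}$ for a uniform distribution on $(0, \theta)$; the values $\theta = 5$, $n = 10$ and the number of replications are hypothetical choices, not taken from the notes.

```python
import random
import statistics

def simulate_estimators(theta, n, reps, seed=0):
    """Draw `reps` samples of size n from U(0, theta) and return both estimates."""
    rng = random.Random(seed)
    est1, est2 = [], []
    for _ in range(reps):
        xs = [rng.uniform(0, theta) for _ in range(n)]
        est1.append(2 * statistics.fmean(xs))      # twice the sample mean
        est2.append((n + 1) / n * max(xs))         # scaled sample maximum
    return est1, est2

est1, est2 = simulate_estimators(theta=5.0, n=10, reps=20000)
var1 = statistics.pvariance(est1)
var2 = statistics.pvariance(est2)
print(f"mean of estimator 1: {statistics.fmean(est1):.3f}")
print(f"mean of estimator 2: {statistics.fmean(est2):.3f}")
print(f"variance ratio: {var1 / var2:.2f}")
```

Both averages should be close to the assumed $\theta$, and the variance ratio should be in the neighbourhood of $(n+2)/3$, although simulated values fluctuate from run to run.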

We previously discussed how to compare at least two unbiased estimators for the same
parameter. The one with least variance is considered the better one. The question that we want
to address is, “Is there a best estimator in the sense of possessing a minimum variance? How do
we know if the estimator is the best?”

In the next section we shall see that the variance of an unbiased estimator cannot be smaller than
a certain bound called the Cramer-Rao bound.

Theorem: The Cramer-Rao Inequality


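Let $X_1, X_2, \ldots, X_n$ be a random sample from $f(x; \theta)$ where $f(x; \theta)$ has continuous first-
order and second-order partial derivatives at all but a finite set of points. Suppose the set of
$x$ for which $f(x; \theta) \neq 0$ does not depend on $\theta$. Let $\hat{\theta}$ be an unbiased estimator of $\theta$. Then

$$\operatorname{Var}(\hat{\theta}) \geq \frac{1}{n E\left[\left(\frac{\partial \ln f(X; \theta)}{\partial \theta}\right)^2\right]}$$

and

$$\operatorname{Var}(\hat{\theta}) \geq \frac{1}{-n E\left[\frac{\partial^2 \ln f(X; \theta)}{\partial \theta^2}\right]}.$$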

The Cramer-Rao Lower Bound (CRLB) sets a lower bound on the variance of an unbiased
estimator. Its uses are

(a) If we find an unbiased estimator that achieves the CRLB, then we know that we have found a
Uniformly Minimum Variance Unbiased Estimator (UMVUE),
(b) The CRLB provides a benchmark against which we can compare the performance of an
estimator,
(c) The CRLB can be used to rule-out impossible estimators, and
(d) The theory behind the CRLB can tell us if an estimator exists that achieves the lower
bound.

Example

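Let $X_1, X_2, \ldots, X_n$ be the number of successes (0 or 1) in each of $n$ independent trials. Let $p$ be
the probability of success in any given trial, where $p$ is an unknown parameter. Let the distribution of
$X$ be

$$f(x; p) = p^x (1 - p)^{1 - x}, \quad x = 0, 1.$$

Let $Y = \sum_{i=1}^{n} X_i$ be the total number of successes. Define $\hat{p} = Y/n$.

(a) Show that $\hat{p}$ is unbiased.

(b) Compare $\operatorname{Var}(\hat{p})$ with the CRLB for $p$.

Solution

(a) $E(\hat{p}) = E\left(\frac{Y}{n}\right) = \frac{1}{n}(np) = p$, therefore $\hat{p}$ is unbiased.

(b) We have
$\operatorname{Var}(\hat{p}) = \operatorname{Var}\left(\frac{Y}{n}\right) = \frac{np(1 - p)}{n^2} = \frac{p(1 - p)}{n}$.
Using the second form of the CRLB we have

$$\ln f(x; p) = x \ln p + (1 - x) \ln(1 - p).$$

Then $\frac{\partial \ln f}{\partial p} = \frac{x}{p} - \frac{1 - x}{1 - p}$ and $\frac{\partial^2 \ln f}{\partial p^2} = -\frac{x}{p^2} - \frac{1 - x}{(1 - p)^2}$.

Taking the expected value of the above equation gives $E\left[\frac{\partial^2 \ln f}{\partial p^2}\right] = -\frac{1}{p} - \frac{1}{1 - p} = -\frac{1}{p(1 - p)}$. Substituting in the CRLB inequality, we get

$$\operatorname{Var}(\hat{p}) \geq \frac{1}{n \left(\frac{1}{p(1 - p)}\right)} = \frac{p(1 - p)}{n}.$$

Conclusion: $\operatorname{Var}(\hat{p})$ equals the CRLB, therefore $\hat{p}$ is a Uniformly Minimum Variance Unbiased Estimator
(UMVUE).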

Sufficiency
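A sufficient statistic with respect to a population parameter $\theta$ is a statistic $\hat{\theta} = \hat{\theta}(X_1, X_2, \ldots, X_n)$ that
contains all the information that is useful for the estimation of $\theta$. It is a very useful data
reduction tool, and studying its properties leads to other useful results.

Definition: A statistic $\hat{\theta}$ is a sufficient estimator for $\theta$ if the conditional distribution of the sample given $\hat{\theta}$, $f(x_1, x_2, \ldots, x_n \mid \hat{\theta})$, is not a
function of $\theta$.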

The intuition behind the sufficient statistic concept is that it contains all the information
necessary for estimating $\theta$. Therefore if one is interested in estimating $\theta$, it is possible to discard
the original data while keeping only the value of the sufficient statistic.

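The definition of a sufficient statistic is very hard to verify directly. A much easier way to find sufficient
statistics is through the factorization theorem.

Definition: Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables
whose distribution is the pdf or the pmf $f(x; \theta)$. The likelihood function is the product of
the pdfs or pmfs, that is

$$L(x_1, x_2, \ldots, x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta).$$

The likelihood function is sometimes viewed as a function of $x_1, x_2, \ldots, x_n$ (fixing $\theta$) and sometimes
as a function of $\theta$ (fixing the data). In the latter case, the likelihood is sometimes denoted $L(\theta)$.

Theorem: (Factorization Theorem). A statistic $\hat{\theta} = \hat{\theta}(X_1, X_2, \ldots, X_n)$ is sufficient for $\theta$ if and
only if the likelihood function or joint density factorizes into the following form

$$L(x_1, x_2, \ldots, x_n; \theta) = g(\hat{\theta}; \theta) \, h(x_1, x_2, \ldots, x_n)$$

where $h(x_1, x_2, \ldots, x_n)$ does not depend on $\theta$ and $g(\hat{\theta}; \theta)$ depends on $\theta$ and on the data only through
the statistic $\hat{\theta}$.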

Exercises
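1. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Bernoulli distribution with unknown
parameter $p$. The pmf of the $X_i$ is $f(x; p) = p^x (1 - p)^{1 - x}$, $x = 0, 1$.
Determine whether $\hat{\theta} = \sum_{i=1}^{n} X_i$ is sufficient for $p$.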
2. Let be a random sample from the uniform distribution over the range
Consider the statistic ̂ , determine whether the
statistic is
(a) unbiased, and
(b) sufficient
3. Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution for which the mean
µ is unknown but the variance $\sigma^2$ is known.
(a) Find the unbiased estimator for µ.
(b) Determine whether $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is a sufficient estimator for µ.

CHAPTER 5

INTERVAL ESTIMATION
Point estimates vary from one sample to another, so the probability that an estimate is exactly equal
to the population parameter is almost zero. A way around this problem is to determine a
range of values that we think contains the parameter we are looking for with some known
probability. Such a range or interval is called an interval estimate of the parameter. In order to
come up with these ranges, we have to preset the required level of confidence and make use of
the sampling distributions that we dealt with in unit ***.

Definition: A $100(1 - \alpha)\%$ confidence interval of a population parameter $\theta$ is an interval $(\hat{\theta}_L, \hat{\theta}_U)$
such that $P(\hat{\theta}_L < \theta < \hat{\theta}_U) = 1 - \alpha$. The probability $1 - \alpha$ is called the level of confidence.

One Sample Problems

Confidence Intervals for Population Mean

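Consider the random sample $X_1, X_2, \ldots, X_n$ from a normal distribution or population with mean
$\mu$ and variance $\sigma^2$. The $100(1 - \alpha)\%$ confidence interval for the population mean $\mu$ is

(a) $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ if the population variance is known.

(b) $\bar{x} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}$ if the population variance is unknown.

(c) $\bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}}$ if the population variance is unknown but the sample size is large.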

Example
A random sample of size 12 was drawn from a normal population with a variance of 69. The
sample gave a sample mean of 28. Determine the

(a) 95%
(b) 98% and
(c) 90%

confidence intervals for the population mean .

Solution

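(a) For the 95% confidence interval $\alpha = 0.05$, $z_{\alpha/2} = z_{0.025} = 1.96$ and $\sigma = \sqrt{69}$.

$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 28 \pm 1.96 \frac{\sqrt{69}}{\sqrt{12}} = 28 \pm 4.6999$$

Thus we are 95% sure that the true population mean lies between 23.3001 and 32.6999.

(b) For the 98% confidence interval $\alpha = 0.02$, $z_{\alpha/2} = z_{0.01} = 2.3263$ and $\sigma = \sqrt{69}$.

$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 28 \pm 2.3263 \frac{\sqrt{69}}{\sqrt{12}} = 28 \pm 5.5783$$

Thus we are 98% sure that the true population mean lies between 22.4217 and 33.5783.

(c) For the 90% confidence interval $\alpha = 0.10$, $z_{\alpha/2} = z_{0.05} = 1.6449$ and $\sigma = \sqrt{69}$.

$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 28 \pm 1.6449 \frac{\sqrt{69}}{\sqrt{12}} = 28 \pm 3.9443$$

Thus we are 90% sure that the true population mean lies between 24.0557 and 31.9443.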
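
The known-variance interval can be reproduced with a short script. This is a sketch using the Python standard library's `statistics.NormalDist` for the normal percentile instead of printed tables; the function name `z_interval` is an illustrative choice, and small differences in the last decimal place relative to table values are to be expected.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma2, n, conf):
    """Confidence interval for the mean when the population variance is known."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point of N(0, 1)
    margin = z * sqrt(sigma2 / n)
    return xbar - margin, xbar + margin

# Data from the example: n = 12, variance 69, sample mean 28
for conf in (0.95, 0.98, 0.90):
    lower, upper = z_interval(28, 69, 12, conf)
    print(f"{conf:.0%} interval: ({lower:.3f}, {upper:.3f})")
```

Note how the interval widens as the confidence level increases.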

Example
Recorded here is the daily number of cars (in hundreds) passing through a toll gate in Nyabira over
a period of two weeks:

34, 49, 53, 22, 69, 55, 47, 60, 54, 44, 37, 42, 50 and 41

Construct a 95% confidence interval for the mean number of cars passing through the toll gate.

Solution

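The sample size is $n = 14$, $\bar{x} = 46.9286$ and $s = 11.6979$. From the t-distribution
tables $t_{\alpha/2,\, n-1} = t_{0.025,\, 13} = 2.160$. The 95% confidence interval is

$$\bar{x} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}} = 46.9286 \pm 2.160 \frac{11.6979}{\sqrt{14}} = 46.9286 \pm 6.7530 = (40.1756,\ 53.6816).$$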

Determination of the Sample Size


During the planning stage of an investigation or survey, it is important to address the question of
the sample size. The researcher needs to know beforehand the sample size that gives the desired
level of accuracy or precision.

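In order to determine the required sample size, we first have to specify the maximum error of
estimation (error margin) $E$ and the confidence level $1 - \alpha$.

Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Then the minimum sample size
required for $\bar{X}$ to be within $E$ of the true mean with probability $1 - \alpha$ is

$$n = \left(\frac{z_{\alpha/2}\, \sigma}{E}\right)^2.$$

In other words, to be $100(1 - \alpha)\%$ sure that the error of estimation $|\bar{X} - \mu|$ does not exceed $E$,
the required sample size is $n = \left(\frac{z_{\alpha/2}\, \sigma}{E}\right)^2$, rounded up to the next whole number.

In most practical cases, $\sigma$ is unknown; in such cases use the sample standard deviation $s$.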
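
As a rough illustration of the sample size calculation, the sketch below rounds the formula up to the next integer. The function name and the input values ($\sigma = 3.2$, error margin 0.75, 95% confidence) are hypothetical and chosen only to show the mechanics.

```python
from math import ceil
from statistics import NormalDist

def min_sample_size(sigma, margin, conf):
    """Smallest n for which the sample mean is within `margin` of the true
    mean with probability `conf`, assuming a known standard deviation."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z * sigma / margin) ** 2)

# Hypothetical inputs: sigma = 3.2, error margin 0.75, 95% confidence
n_required = min_sample_size(sigma=3.2, margin=0.75, conf=0.95)
print(n_required)
```

Rounding up rather than to the nearest integer guarantees that the stated precision is met.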

Example

For each case determine the sample size $n$ that is required for estimating the population
mean. The population standard deviation, confidence level and the desired error margin are
given below.

(a) 95%,
(b) 98%,

Solution


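(a) $n = \left(\frac{z_{\alpha/2}\, \sigma}{E}\right)^2$ with $z_{0.025} = 1.96$ and $E = 0.75$, which rounds up to $n = 70$.
Thus to make sure that the sample average is within 0.75 of the true mean 95% of the
time, we must sample at least 70 units.

(b) $n = \left(\frac{z_{\alpha/2}\, \sigma}{E}\right)^2$ with $z_{0.01} \approx 2.33$ and $E = 25.2$, which rounds up to $n = 192$.
Thus to make sure that the sample average is within 25.2 of the true mean 98% of the
time, we must sample at least 192 units.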

Exercise
1. A random sample of 24 children in Harare had their birth-weights (in kilograms)
recorded. The data obtained is

3.30 3.51 2.97 3.01 2.64 3.00 2.89 3.74 3.66 3.12 2.88 2.97

Assuming the birth weights in Harare have a variance of 0.81,

(a) Construct a 98% confidence interval for true mean birth weight .
(b) Find the minimum sample sizes needed so that one is
(i) 99% confident that the sample mean will be within 0.23kgs of the true mean.
(ii) 90% confident that the sample mean will be within 0.50kgs of the true mean.
2. Measurements on the acidity (pH) of rain samples in an industrial site in Harare recorded
at 15 sites and were

3.6 5.2 4.8 3.7 4.9 3.8 4.6 4.2 4.0 4.6 4.8 4.7 5.0 4.4 5.1

(a) Construct a 98% confidence interval for the mean acidity of rain in that region.

(b) Find the minimum sample size needed so that one is 99% confident that the sample
mean will be within 0.23 pH units of the true mean.

Confidence Intervals for Population Variance

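From the sampling distribution of the sample variance, one can
easily derive the confidence interval of $\sigma^2$. Since $\frac{(n-1)S^2}{\sigma^2}$ has a chi-square distribution with
$n - 1$ degrees of freedom, the $100(1 - \alpha)\%$ confidence interval is obtained from

$$P\left(\chi^2_{1-\alpha/2} < \frac{(n-1)S^2}{\sigma^2} < \chi^2_{\alpha/2}\right) = 1 - \alpha.$$

By inverting the inequality and multiplying all the terms by $(n-1)S^2$ we get

$$P\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}}\right) = 1 - \alpha.$$

Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Then the $100(1 - \alpha)\%$ confidence
interval for $\sigma^2$ is

$$\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right)$$

where $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ are the $100(1 - \alpha/2)$th and $100(\alpha/2)$th percentiles of
the chi-square distribution with $n - 1$ degrees of freedom and $s^2$ is the sample variance.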

Example
Suppose that a random sample of size 18 from a normal distribution gave a mean 6.45 and a
variance of 1.92. Construct a 95% confidence interval for the population variance .

Solution

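The sample size is $n = 18$, so there are $n - 1 = 17$ degrees of freedom, and $s^2 = 1.92$. From the chi-square tables $\chi^2_{0.025,\,17} = 30.191$ and $\chi^2_{0.975,\,17} = 7.564$.

The 95% confidence interval for $\sigma^2$ is

$$\left(\frac{(n-1)s^2}{\chi^2_{0.025}},\ \frac{(n-1)s^2}{\chi^2_{0.975}}\right) = \left(\frac{17(1.92)}{30.191},\ \frac{17(1.92)}{7.564}\right) = (1.0811,\ 4.3152).$$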

Exercise
1. A certain company packages its products in 2 kilogram bags. In order to determine whether
the packaging machine was able to meet the target of 2kgs a random sample of twelve
products was taken and the weights were:
1.89, 2.05, 2.09, 1.89, 1.99, 2.11, 2.08, 1.92, 1.88, 1.79, 1.98, 2.11

If the variance of the weights is greater than 0.012, the packaging machine is failing to meet the
target of 2kgs.

(a) Construct a 99% confidence interval for the population variance .


(b) Using the confidence interval obtained in (a), is the packaging machine meeting the
target?

2. Given the sample data

12, 19, 12, 10, 17, 9, 21, 11

(a) Obtain the point estimate of the population variance .


(b) Construct a 95% confidence interval for .
(c) Examine whether or not your point estimate lies at the centre of the confidence interval.

Confidence Intervals for a Population Proportion


Let $p$ be the population proportion of items or individuals having a characteristic of interest. Suppose a
random sample of size $n$ is drawn from the population and let $X$ be the number of items bearing
the characteristic of interest. Then the $100(1 - \alpha)\%$ confidence interval for $p$ is

$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

where $\hat{p} = X/n$ is the sample proportion and $z_{\alpha/2}$ is the $100(1 - \alpha/2)$th percentile of the
standard normal distribution.

Minimum Sample Size Required for a Specified Error Margin

The minimum sample size required in order to be $100(1 - \alpha)\%$ sure that $\hat{p}$ will be within $E$ of
the true proportion is given by

$$n = \hat{p}(1 - \hat{p}) \left(\frac{z_{\alpha/2}}{E}\right)^2.$$

Example
A public health survey is carried out in order to estimate the prevalence of HIV/AIDS in the
population of a certain town. A random sample of 400 people showed that 72 were HIV/AIDS
infected.

(a) Construct a 98% confidence interval of the population proportion $p$.
(b) How many people should be sampled if one wishes to be 95% sure that the error margin is
below 0.08?

Solution

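(a) From the sample $\hat{p} = \frac{72}{400} = 0.18$, $z_{\alpha/2} = z_{0.01} = 2.3263$,

$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.18 \pm 2.3263 \sqrt{\frac{0.18(0.82)}{400}} = 0.18 \pm 0.0447 = (0.1353,\ 0.2247).$$

(b) From the sample $\hat{p} = 0.18$, $z_{\alpha/2} = z_{0.025} = 1.96$, $E = 0.08$,

$$n = \hat{p}(1 - \hat{p}) \left(\frac{z_{\alpha/2}}{E}\right)^2 = 0.18(0.82) \left(\frac{1.96}{0.08}\right)^2 = 88.6.$$

Thus to make sure that the sample proportion is within 0.08 of the true proportion 95% of
the time, we must sample at least 89 units.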
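
Both proportion formulas are easy to script. The sketch below is illustrative only: the function names are made up for this example, the normal percentile comes from `statistics.NormalDist` rather than tables, and the error margin of 0.05 in the second call is a hypothetical value.

```python
from math import ceil, sqrt
from statistics import NormalDist

def proportion_interval(x, n, conf):
    """Large-sample confidence interval for a population proportion."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def proportion_sample_size(p_hat, margin, conf):
    """Minimum n so that the sample proportion is within `margin` of p."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(p_hat * (1 - p_hat) * (z / margin) ** 2)

# 72 infected people in a sample of 400, 98% confidence
lower, upper = proportion_interval(72, 400, 0.98)
print(f"({lower:.4f}, {upper:.4f})")

# Hypothetical planning question: error margin 0.05 at 95% confidence
print(proportion_sample_size(0.18, 0.05, 0.95))
```

When no pilot estimate of the proportion is available, a common conservative choice is to use 0.5 in place of the sample proportion, since that maximises $\hat{p}(1 - \hat{p})$.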

Exercises
1. In a sample of 800 fuses produced by an electronic company, 69 of them were found to be
defective.
(a) Estimate with a 95% confidence interval the true proportion of defective fuses.
(b) How many fuses should be sampled if one wants to be 97% sure that error margin is
below 0.02?

2. In a survey of 380 voters prior to an election, 260 of them indicated that they would vote for
the incumbent candidate.
(a) Estimate with a 95% confidence interval the population proportion of voters who
support the incumbent.

(b) How many voters should be sampled if one wants to be 99% sure that error margin is
less than 0.03?

Two Sample Problems

Confidence Intervals for the Difference between Two Population Means

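This section deals with confidence intervals for the difference between two population means, $\mu_1 - \mu_2$.
In this section we assume that the samples used are independent.

The $100(1 - \alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is

(a) If the population variances $\sigma_1^2$ and $\sigma_2^2$ are both known,

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)}$$

(b) If the population variances $\sigma_1^2$ and $\sigma_2^2$ are both unknown (but assumed equal) and the sample sizes are small,

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\, n_1 + n_2 - 2}\; s_p \sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ is the pooled sample variance.

(c) If the population variances $\sigma_1^2$ and $\sigma_2^2$ are both unknown and the sample sizes are large, i.e. $n_1, n_2 \geq 30$,

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)}$$

All the results follow immediately from the sampling distributions of $\bar{X}_1 - \bar{X}_2$.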

Example
The following data show summary statistics for two random samples drawn from two normal
populations.

            Sample size    Sample mean    Sample variance
Sample 1        12             71               14
Sample 2         9             64                8

(a) Construct a 90% confidence interval for $\mu_1 - \mu_2$.

(b) Using your results in (a) can we conclude that $\mu_1$ and $\mu_2$ are different?

Solution

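(a) The sample sizes are small and the population variances are unknown, so we use the pooled variance
$s_p^2 = \frac{11(14) + 8(8)}{12 + 9 - 2} = 11.4737$ and $t_{\alpha/2,\, n_1 + n_2 - 2} = t_{0.05,\, 19} = 1.729$.

$$(71 - 64) \pm 1.729 \sqrt{11.4737 \left(\frac{1}{12} + \frac{1}{9}\right)} = 7 \pm 2.5825 = (4.4175,\ 9.5825)$$

(b) Since the confidence interval does not include zero, we conclude that the population means are different.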
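
A sketch of the small-sample calculation is given below. It assumes the pooled-variance form of the interval (equal population variances) and takes the critical value $t_{0.05,\,19} = 1.729$ from t-tables as an input, since the Python standard library has no t-distribution percentile function; the function name is illustrative.

```python
from math import sqrt

def pooled_t_interval(n1, mean1, var1, n2, mean2, var2, t_crit):
    """Confidence interval for mu1 - mu2 using the pooled sample variance.

    t_crit is the tabulated t value with n1 + n2 - 2 degrees of freedom.
    """
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    margin = t_crit * sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - margin, diff + margin

# Summary statistics from the example, 90% confidence (t table value 1.729)
lower, upper = pooled_t_interval(12, 71, 14, 9, 64, 8, t_crit=1.729)
print(f"({lower:.4f}, {upper:.4f})")
print("interval contains zero:", lower <= 0 <= upper)
```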

Exercises
1. Suppose that samples of sizes and are drawn from two normal
populations. The sample statistics are as follows:
̅ ̅
(a) Construct a 95% confidence interval for
(b) Using your results in (a) can we conclude that and are different?

2. Samples of sizes and were drawn from two populations. The sample
statistics

̅ ̅

(a) Construct a 96% confidence interval for


(b) Using your results in (a) can we conclude that and are different?

Confidence Intervals for the Difference between Two Population Proportions


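Let $p_1$ and $p_2$ be two proportions of items or individuals having a characteristic of interest in
population 1 and 2 respectively. Independent samples of sizes $n_1$ and $n_2$ respectively are drawn
from the two populations. Let $X_1$ and $X_2$ be the number of items or individuals of interest in
samples from population 1 and 2 respectively. Then the estimates of $p_1$ and $p_2$ are $\hat{p}_1 = X_1/n_1$ and
$\hat{p}_2 = X_2/n_2$ respectively.

Then the $100(1 - \alpha)\%$ confidence interval for $p_1 - p_2$ is

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$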

If the confidence interval includes a zero, this means we cannot rule out the possibility that the
two population proportions are equal.

Example

In a research that was investigating the sexual practices of men and women, it was found that
284 women in a random sample of 1400 had sexual intercourse below the age of 15 years.
Another sample of 695 men showed that 101 of them had sexual intercourse below the age of 15
years.

(a) Construct a 95% confidence interval for the difference between the proportions of males
and females who had sexual intercourse below the age of 15 years.
(b) Using the results in part (a), can one conclude that the proportion of girls who enter into
sexual unions below the age of 15 years is higher than that of boy?

Solution

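(a) Let $p_1$ and $p_2$ be the proportions of males and females who enter into sexual
relationships below the age of 15 years. Then $\hat{p}_1 = \frac{101}{695} = 0.1453$, $\hat{p}_2 = \frac{284}{1400} = 0.2029$ and $z_{0.025} = 1.96$.
Then the 95% confidence interval for $p_1 - p_2$ is

$$(0.1453 - 0.2029) \pm 1.96 \sqrt{\frac{0.1453(0.8547)}{695} + \frac{0.2029(0.7971)}{1400}} = -0.0575 \pm 0.0336 = (-0.0912,\ -0.0239)$$

(b) Since the confidence interval does not include zero and lies entirely below it, we conclude that the proportion of
girls who enter into sexual unions below the age of 15 years is higher than that of boys.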
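
The interval for a difference of two proportions can be sketched as follows, using the counts from the example (101 of 695 males, 284 of 1400 females). The function name is an illustrative choice and the normal percentile is computed rather than read from tables.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_interval(x1, n1, x2, n2, conf):
    """Large-sample confidence interval for p1 - p2 from independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - margin, diff + margin

# Males first, females second, 95% confidence
lower, upper = two_proportion_interval(101, 695, 284, 1400, 0.95)
print(f"({lower:.4f}, {upper:.4f})")
```

Because both limits are negative when the male proportion is listed first, the data point to a higher proportion among females.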

Exercises
1. Consider the following data from independent samples from two populations with proportions $p_1$ and
$p_2$ respectively of items bearing a characteristic of interest. For each construct the 98%
confidence interval for $p_1 - p_2$ and use the confidence interval obtained to test
whether $p_1 = p_2$.

(a)
(b)

2. In a public opinion survey, 80 out of a sample of 120 high-income voters and 30 out of
80 low-income earners supported a decrease in water charges.
(a) Construct a 95% confidence interval for the difference in the proportion of voters
favouring a decrease in the water charges.
(b) Using your result in (a), can we conclude that the proportions of the voters are
different?

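Matched/ Paired Samples
Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be the paired observations, where the $X_i$ are the
sample values before the experiment and the $Y_i$ are the sample values after the experiment.
Let $D_i = X_i - Y_i$. Assuming that the $D_i$ are independent and are distributed as
$N(\mu_D, \sigma_D^2)$ where $\mu_D = \mu_X - \mu_Y$, then the $100(1 - \alpha)\%$ confidence interval for $\mu_D$ is

$$\bar{d} \pm t_{\alpha/2,\, n-1} \frac{s_D}{\sqrt{n}}$$

where $\bar{d}$ and $s_D$ are the sample mean and sample standard deviation of the differences.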

Example
It is claimed that an industrial safety training programme helps to reduce the number of industrial
accidents. The numbers of accidents were recorded for a period of six weeks before the
programme and another six weeks after the programme in eight different companies. The data
obtained were

Company 1 2 3 4 5 6 7 8
Before Training 13 24 18 9 15 18 30 12
After Training 9 18 19 9 13 16 26 11

(a) Determine the 95% confidence interval for the difference in the mean number of
accidents.
(b) Using the results obtained in (a) can one conclude that the claim is true?

Solution

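(a) Let $D_i = X_i - Y_i$ (before minus after). Then the differences are 4, 6, -1, 0, 2, 2, 4, 1. The summary
statistics from the $D_i$ are $\bar{d} = 2.25$ and $s_D = 2.3146$, and $t_{0.025,\, 7} = 2.365$.

$$\bar{d} \pm t_{\alpha/2,\, n-1} \frac{s_D}{\sqrt{n}} = 2.25 \pm 2.365 \frac{2.3146}{\sqrt{8}} = 2.25 \pm 1.9353 = (0.3147,\ 4.1853)$$

(b) Since the confidence interval does not include zero and lies entirely above it, the data support the claim that the programme reduces the number of accidents.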
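
The paired-sample interval can be computed from the raw data as sketched below. The critical value $t_{0.025,\,7} = 2.365$ is taken from t-tables and passed in as an argument, because the standard library does not provide t percentiles; the function name is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_interval(before, after, t_crit):
    """Confidence interval for the mean of the differences (before - after).

    t_crit is the tabulated t value with n - 1 degrees of freedom.
    """
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = mean(diffs)
    margin = t_crit * stdev(diffs) / sqrt(n)   # stdev uses the n - 1 divisor
    return d_bar - margin, d_bar + margin

before = [13, 24, 18, 9, 15, 18, 30, 12]
after = [9, 18, 19, 9, 13, 16, 26, 11]
lower, upper = paired_t_interval(before, after, t_crit=2.365)
print(f"({lower:.4f}, {upper:.4f})")
```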

Exercise
1. Given following the paired sample data

X 15 13 17 15 18 17
Y 14 12 14 16 18 19

(a) Construct a 95% confidence interval for the mean difference of the responses.
(b) Using your results in (a), test whether .

2. Six strings made of wool are stretched with the same force before and after washing and
their length are observed. The lengths in before and after washing are recorded in the
table below.

Before 1.2 2.4 0.6 1.9 4.5 1.7


After 0.9 2.2 0.4 1.8 3.8 1.5

(a) Construct a 95% confidence interval for the mean differences in stretching.
(b) Using your results in (a), test whether washing reduces stretching length.

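Confidence Intervals for the Ratio $\sigma_1^2 / \sigma_2^2$

Let $X_1, X_2, \ldots, X_{n_1}$ and $Y_1, Y_2, \ldots, Y_{n_2}$ be independent samples from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$
respectively. Let $s_1^2 = \frac{1}{n_1 - 1}\sum_{i=1}^{n_1}(X_i - \bar{X})^2$ and $s_2^2 = \frac{1}{n_2 - 1}\sum_{i=1}^{n_2}(Y_i - \bar{Y})^2$ be the unbiased estimates of $\sigma_1^2$ and
$\sigma_2^2$ respectively. Then the $100(1 - \alpha)\%$ confidence interval for $\sigma_1^2 / \sigma_2^2$ is

$$\left(\frac{s_1^2 / s_2^2}{F_{\alpha/2}},\ \frac{s_1^2 / s_2^2}{F_{1-\alpha/2}}\right)$$

where $F_{\alpha/2}$ and $F_{1-\alpha/2}$ are the $100(1 - \alpha/2)$th and $100(\alpha/2)$th percentiles of the F-
distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom.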

Example
Suppose that two independent samples from two normal populations and gave the
following statistics

Find the 95% confidence interval for the ratio .

Solution

Then the 95% confidence interval for is

⁄ ⁄
( )

Exercise
Suppose that two independent samples from two normal populations and
gave the following statistics

(a) Find the 95% confidence interval for the ratio .


(b) Using the results obtained in (a), can one conclude that ?

CHAPTER 6

TESTING OF HYPOTHESES

Introduction

Testing of hypotheses is mainly concerned with addressing questions arising from experimental
and observational research such as:
• Is the mean age of women in business 35 years?
• Is there a significant difference in the viral loads between males and females infected
  with HIV/AIDS?
• Is there any significant difference in the production levels of 4 maize hybrids in
  Zimbabwe?

The objective of hypothesis testing is to decide or draw conclusions, based on information
gathered from a sample, on the true values of population parameters.

Definition: A statistical hypothesis is a claim, belief or suspicion about the parameter of a
population or a statistical model for a data generating process or system.

Definition: The null hypothesis, denoted by $H_0$, is a statistical
hypothesis which states that there is no change in the level of the population variable we are
interested in.

The null hypothesis is a claim that is established for the purpose of testing. This claim can either
be rejected or not rejected. If there is sufficient evidence to reject the null hypothesis then the
alternative hypothesis is accepted.

Definition: The alternative hypothesis or research hypothesis is a statistical hypothesis
denoted by $H_1$ which says that there is a change in the level of the population variable we are
interested in.

A statistical test of hypothesis is a procedure that is used in conjunction with sample data to
decide whether to reject or not to reject the null hypothesis.

Definition: A test statistic, T, is a calculation made from sample data whose value is used as a
basis for deciding whether or not $H_0$ should be rejected.

Definition: A critical region or rejection region of a test is a set C chosen so that if the value
of the test statistic falls in it, then $H_0$ is rejected.

A critical value is a value that separates the acceptance region from the rejection region, and
these values are usually obtained from some statistical tables.

Type I and Type II Errors


Definition: In a statistical test, if $H_0$ is rejected when it is in fact true, the error so made is
called a Type I error. The probability of making this error is called the significance level and
is usually denoted by $\alpha$, i.e.

$$\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true}) = P(T \in C \mid H_0 \text{ is true}).$$

The significance level is usually set at 5%, so that one has $95\%$ confidence in the results
one obtains.

Definition: In a statistical test, if $H_0$ is not rejected (retained) when it is in fact false, the
error so made is called a Type II error. The probability of making this error is usually denoted
by $\beta$ and it is given by

$$\beta = P(\text{do not reject } H_0 \mid H_1 \text{ is true}) = P(T \in C' \mid H_1 \text{ is true})$$

where $C'$ is the complement of the set C, i.e. $T \notin C$, and
$1 - \beta$ is called the power of the test. The graph of $\beta$ against $\theta$ is called the operating
characteristic curve.

Example
Let be a random sample of size 1 from a population with a probability distribution
{ }
and suppose that one wants to test the hypothesis

using as the test statistic and [ as the rejection region. Calculate


(a) the probability of making a type I error, the significance level of the test,
(b) the probability of type II error and
(c) power of the test.

Solution

(a)

(b)

(c)

In most computer based tests of hypotheses, critical values from tables are not
readily available; instead we make the decision of either rejecting or not rejecting $H_0$ based on what
we call a p-value.

Definition: The p-value is the lowest level of significance at which the null hypothesis can be
rejected. Consequently, we reject $H_0$ at the $\alpha$ level if the p-value is less than or equal to $\alpha$.

Exercise

1. Suppose you are to verify the claim that on the basis of a random sample of size
58 and you are given that .
(a) If you set your rejection region to be ̅ what is the probability of making
Type I error, i.e. the level of significance of your test.
(b) Find the numerical value of c such that the test ̅ has a 5% of significance.

2. Suppose you want to test the hypotheses on the basis


of a random sample of size 10. If the rejection region is set as , determine the
probability of making a type I error.

3. The probability that a person has an allergic reaction to a new drug on the market
is . The drug is modified and retested on 10 people. The null hypothesis
is rejected if at most two people have an allergic reaction.
(a) What is the probability of making Type I error?
(b) Find the probability of making Type II error for and

Types of tests
One-tailed tests are those statistical tests with the rejection region either on the left or right hand
side of the distribution, i.e. those tests whose alternative hypothesis has the form
$H_1: \theta < \theta_0$ or $H_1: \theta > \theta_0$.

Two-tailed tests are those statistical tests with rejection regions on both tails or ends of the
statistical distribution, i.e. those tests whose alternative hypothesis has the form $H_1: \theta \neq \theta_0$.

The position of the rejection region depends on the form of the alternative hypothesis and its size
depends on the level of significance selected. The following diagram shows the critical/rejection
regions corresponding to each form of $H_1$.

Tests concerning the mean of a single population

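Consider the random sample $X_1, X_2, \ldots, X_n$ from a normal distribution or population with mean
$\mu$ and variance $\sigma^2$. The test statistic that is used to test the hypotheses

$H_0: \mu = \mu_0$ against $H_1: \mu \neq \mu_0$, $H_1: \mu > \mu_0$ or $H_1: \mu < \mu_0$

is given by

(a) $Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$ if the population variance is known.

(b) $T = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$, with $n - 1$ degrees of freedom, if the population variance is unknown.

(c) $Z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$ if the population variance is unknown but the sample size is large.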

Example
Suppose we want to ascertain whether the mean amount of sulfur in mustard is 0.7, given that
from a sample of size 9, the mean is 0.706. It is known from past experience that the amount of
sulfur in mustard is normally distributed with a standard deviation of 0.25.

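Solution

The hypotheses are $H_0: \mu = 0.7$ against $H_1: \mu \neq 0.7$.

Test statistic: $Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

We reject $H_0$ if $|Z| > z_{\alpha/2} = 1.96$ (1.96 is obtained from the Normal distribution tables with $\alpha = 0.05$).
Substituting $\bar{x} = 0.706$, $\mu_0 = 0.7$, $\sigma = 0.25$ and $n = 9$ in the test statistic we have

$$Z = \frac{0.706 - 0.7}{0.25 / \sqrt{9}} = 0.072.$$

Since $0.072$ is less than 1.96, we fail to reject $H_0$ and conclude that there is insufficient
evidence that the mean sulfur content is significantly different from 0.7.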
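
The same test can be carried out in a few lines of code. The sketch below uses the sulfur data (sample mean 0.706, hypothesised mean 0.7, standard deviation 0.25, sample size 9) and assumes a two-sided test at the 5% level; it also reports a p-value as an alternative to the table-based critical value.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(xbar, mu0, sigma, n):
    """Test statistic for a mean when the population variance is known."""
    return (xbar - mu0) / (sigma / sqrt(n))

z = z_statistic(xbar=0.706, mu0=0.7, sigma=0.25, n=9)
z_crit = NormalDist().inv_cdf(0.975)              # two-sided test, alpha = 0.05
p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided p-value
reject = abs(z) > z_crit

print(f"Z = {z:.3f}, critical value = {z_crit:.2f}, p-value = {p_value:.3f}")
print("reject H0" if reject else "fail to reject H0")
```

The large p-value leads to the same decision as the comparison with 1.96.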

Example

The following is a random sample of 9 observations on the profits (in $000) realised per month
by women cooperatives; 4.9, 5.8, 5.9, 6.5, 5.5, 5.0, 6.0, 5.6 and 5.7. Test at whether the
mean profit is less than 5.9.

Solution

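Since the population variance is unknown the test statistic is $T = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$, and the hypotheses are
$H_0: \mu = 5.9$ against $H_1: \mu < 5.9$.

We reject $H_0$ if $T < -t_{\alpha,\, 8}$ (from the t-distribution tables; for example $-t_{0.05,\, 8} = -1.860$).
Substitute $\bar{x} = 5.6556$ and $s = 0.4927$ from sample data into your test statistic, to obtain

$$T = \frac{5.6556 - 5.9}{0.4927 / \sqrt{9}} = -1.488.$$

Since $-1.488$ is greater than the critical value, we fail to reject $H_0$ and conclude that there is not enough
evidence that the average profit is less than \$5900.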

Example
Suppose that the legally required level of sodium in the production of a certain beverage is .
A randomly selected sample of 144 bottles of the beverage gave a sample mean of and a
sample standard deviation of . Test whether that amount of sodium is lower than Use

Solution

Since the population variance is unknown but the sample size is large, i.e. $n = 144$, the test
statistic is $Z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$.

We reject $H_0$ if $Z < -z_{0.05} = -1.645$ (from the Normal Tables)

Substitute ̅ and from sample data into your test statistic, we obtain



Since the computed value of $Z$ is less than $-1.645$, we reject $H_0$ and conclude that at $\alpha = 0.05$ we have enough evidence
that the mean sodium level in the beverages is less than 3.5g.

Exercises
1. For each of the following sets of values, test the hypotheses indicated.
(a) ̅
(b) ̅
(c) ̅

2. The following data are amounts (in grams) of a certain vitamin found in raw milk.
7.97 7.83 7.56 6.15 7.99 7.28 8.11
8.09 7.12 6.69 7.54 7.35 4.55 6.77
Is there enough evidence to conclude that the mean amount of the vitamin is 6.00g? Use
.

Tests Concerning the Difference between Two Population Means


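The sampling distributions used in this section are the same as the ones derived in chapter ** and
also the ones used in the construction of confidence intervals. In this section, we look at
situations where we are testing for equality of two population means or situations where the
difference between two population means is equal to some other non-zero value.

The following pairs of hypotheses will be tested.

$H_0: \mu_1 - \mu_2 = d_0$ against $H_1: \mu_1 - \mu_2 \neq d_0$, $H_1: \mu_1 - \mu_2 > d_0$ or $H_1: \mu_1 - \mu_2 < d_0$

If we are testing for the equality of two population means then $d_0 = 0$, and the hypotheses to be
tested become

$H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \neq \mu_2$, $H_1: \mu_1 > \mu_2$ or $H_1: \mu_1 < \mu_2$

We will consider three different scenarios, given below:

(a) If the population variances $\sigma_1^2$ and $\sigma_2^2$ are both known, the test statistic is

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)}}$$

(b) If the population variances $\sigma_1^2$ and $\sigma_2^2$ are both unknown (but assumed equal) and the sample sizes are small, the test statistic is

$$T = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{s_p \sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

with $n_1 + n_2 - 2$ degrees of freedom, where $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$.

(c) If the population variances $\sigma_1^2$ and $\sigma_2^2$ are both unknown and the sample sizes are large, i.e. $n_1, n_2 \geq 30$, the test statistic is

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)}}$$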

Example

Males claim that their weekly savings are higher than those of their female counterparts. It is
known that the variances for their weekly savings are and . A

sample of 40 women gave an average saving of 5.22 whilst another sample of 37 men gave an
average of 6.45. Test at whether this claim can be justified.

Solution

̅ ̅
The test statistic is
√( )

We reject if (from Normal distribution tables)


Substituting all symbols for their respective figures we get

√( )

We reject since and conclude that at males weekly savings of males is


greater than that of females.

Example
In the comparison of two food preservatives purchased by a certain food company, a random
sample of size 8 of one preservative gave an average shelf-life of 6.75 months with a variance of
while the another sample of size 13 of the other preservative gave an average of
5.92 months with a variance of . Test if there is a significant difference between
these two preservatives. Use .

Solution

̅ ̅
The test statistic is
√( )

We reject if

. Therefore

√( )
Since we fail to reject and conclude that the shelf lives of the two preservatives are
not significantly different.

Example
Consider the following sample statistics.

            Sample size    Sample mean    Sample variance
Sample 1       320            59.7             5.76
Sample 2       480            53.2             6.25

Test at the $\alpha$ level of significance whether the difference between the two population means is greater than 3.2.

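Solution

The hypotheses are $H_0: \mu_1 - \mu_2 = 3.2$ against $H_1: \mu_1 - \mu_2 > 3.2$. Since both sample sizes are large, the test statistic is

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - 3.2}{\sqrt{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)}}$$

We reject $H_0$ if $Z > z_{\alpha}$.

Substituting in the test statistic we get

$$Z = \frac{(59.7 - 53.2) - 3.2}{\sqrt{\left(\frac{5.76}{320} + \frac{6.25}{480}\right)}} = \frac{3.3}{0.1761} = 18.74$$

Since $18.74 > z_{\alpha}$ at any conventional level of significance, we reject $H_0$ and conclude that the difference between the two
population means is greater than 3.2.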
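
The large-sample calculation can be sketched in code as follows. The summary statistics come from the table in the example; the significance level of 0.05 is an assumption made for illustration, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, var1, n1, mean2, var2, n2, d0=0.0):
    """Large-sample test statistic for H0: mu1 - mu2 = d0."""
    return (mean1 - mean2 - d0) / sqrt(var1 / n1 + var2 / n2)

z = two_sample_z(59.7, 5.76, 320, 53.2, 6.25, 480, d0=3.2)
alpha = 0.05                                  # assumed level of significance
z_crit = NormalDist().inv_cdf(1 - alpha)      # right-tailed critical value
print(f"Z = {z:.2f}, critical value = {z_crit:.3f}")
print("reject H0" if z > z_crit else "fail to reject H0")
```

Setting `d0=0` gives the usual test for equality of two means.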

Exercises
1. Eight secretaries were taught using method 1 to type their documents and 6 others were
taught using method 2. Their typing speeds measured by words typed per minute are
shown in the table below.
Method 1: 41 34 38 42 44 39 36 40
Method 2: 56 45 50 54 48 47

Are the typing speeds significantly different for the two teaching methods? Test at the 5%
level of significance.

2. An employment consultant asked selected workers from two different industries to fill in
a questionnaire on job satisfaction. The answers were scored 0 to 50 with higher scores
indicating greater job satisfaction. The data recorded are given in table below.
Industry ̅
A 35 38.4 3.1
B 44 30.5 2.2

Is there a significant difference in the job satisfaction of the workers in the two
industries? Use .

Paired Samples
In the previous section we were testing the difference between two population means using
independent samples. In this section we are now considering situations were the samples are not
independent.

Suppose we want to decide on the basis of weights whether a certain diet has an effect of
reducing weight. In this experiment we measure the weights of the people concerned before they
are put on the diet and also measure their weights after the exercise. In this case we obtain two
samples, the before experiment sample and the after experiment sample, and we call these
samples paired/matched samples.

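Item               1        2        ...    n
Before the diet    $x_1$    $x_2$    ...    $x_n$
After the diet     $y_1$    $y_2$    ...    $y_n$

The $x_i$ represent the weights before the diet programme and the $y_i$ are the weights of
individuals after the diet programme. To handle this problem, one has to work with the differences
between the paired measurements, i.e. compute the differences
$d_i = x_i - y_i$, $i = 1, 2, \ldots, n$.

Using the new sample of the $d_i$ one can compute the sample mean $\bar{d} = \frac{1}{n}\sum d_i$ and the
sample variance $s_d^2 = \frac{1}{n-1}\left(\sum d_i^2 - \frac{(\sum d_i)^2}{n}\right)$.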

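The pairs of hypotheses under consideration are

$H_0: \mu_d = 0$ vs $H_1: \mu_d \neq 0$
$H_0: \mu_d = 0$ vs $H_1: \mu_d > 0$
$H_0: \mu_d = 0$ vs $H_1: \mu_d < 0$

where $\mu_d$ is the population mean of the differences. The null hypothesis is equivalent to saying that the programme has no effect on weight.
Interpreting the alternative hypothesis depends on how one defines the differences.

The test statistic for paired samples is given by

$$T = \frac{\bar{d}}{s_d/\sqrt{n}} \sim t(n-1),$$

where $\bar{d}$ and $s_d$ are the sample mean and sample standard deviation of the $n$ differences.
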

Example
Ten people were put on a special exercise programme for 8 weeks in order for them to lose
weight. The table below gives the weights (in kg) of the 10 people before and after the
programme.

Person 1 2 3 4 5 6 7 8 9 10
Before 90 98 89 110 104 78 69 82 66 59
After 87 98 85 105 103 77 62 76 64 55

Test at the level of significance whether this programme reduces weight.

Solution

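Let $d_i = \text{Before}_i - \text{After}_i$; we obtain the sample of differences

3, 0, 4, 5, 1, 1, 7, 6, 2, and 4

with $n = 10$, $\sum d_i = 33$ and $\sum d_i^2 = 157$, so that $\bar{d} = 3.3$ and
$s_d^2 = \frac{1}{9}\left(157 - \frac{33^2}{10}\right) = 5.344$, i.e. $s_d = 2.312$.

The hypotheses are $H_0: \mu_d = 0$ vs $H_1: \mu_d > 0$.

The test statistic is given by

$$T = \frac{\bar{d}}{s_d/\sqrt{n}} \sim t(9).$$

We reject $H_0$ if $t > t_{\alpha}(9)$; for instance $t_{0.05}(9) = 1.833$ and $t_{0.01}(9) = 2.821$.
Substituting $\bar{d} = 3.3$ and $s_d = 2.312$ in the test statistic, we get

$$t = \frac{3.3}{2.312/\sqrt{10}} = 4.51.$$

Since $t = 4.51$ exceeds the critical value, we reject $H_0$ and conclude that the programme indeed reduces weight.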
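
The arithmetic in this example can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name `paired_t_statistic` is an illustrative choice, and the differences are taken as Before minus After.

```python
import math

# Weights (kg) of the 10 people from the example above.
before = [90, 98, 89, 110, 104, 78, 69, 82, 66, 59]
after = [87, 98, 85, 105, 103, 77, 62, 76, 64, 55]


def paired_t_statistic(x, y):
    """Return (mean difference, sd of differences, t statistic) for paired data."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    n = len(x)
    d = [xi - yi for xi, yi in zip(x, y)]  # d_i = x_i - y_i
    d_bar = sum(d) / n
    # Sample variance of the differences, computing formula with divisor n - 1.
    s2 = (sum(di ** 2 for di in d) - sum(d) ** 2 / n) / (n - 1)
    s_d = math.sqrt(s2)
    t = d_bar / (s_d / math.sqrt(n))
    return d_bar, s_d, t


d_bar, s_d, t = paired_t_statistic(before, after)
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.3f}, t = {t:.2f}")
```

The computed statistic is then compared with the tabulated $t$ value with $n - 1 = 9$ degrees of freedom at the chosen significance level.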

Exercises
1. Two different methods of memorizing difficult material are being tested. Nine pairs of
students are matched according to their IQ and background and then assigned to one of
the two methods at random. A test is finally given to all the students and the results
obtained are as follows.

Pair        1     2     3     4     5     6     7     8     9
Method A    90    86    72    65    44    52    46    38    43
Method B    85    87    70    62    4     53    42    35    46

Using the 5% level of significance, test whether there is a difference in the effectiveness of
the two methods.

2. An experiment was conducted to test the effect of continuous music on productivity. Ten
workers were selected at random and their productivity was measured for one month without
music and for another month with music. The table below gives the average number of items
produced per day.

Worker           1       2       3       4       5       6       7       8       9       10
Without music    8.37    5.01    7.24    6.35    6.24    4.73    7.82    5.67    8.01    6.98
With music       7.38    6.96    8.03    6.40    6.02    4.90    8.53    6.28    8.59    7.32

(a) Test the claim that there is no change in production.
(b) Construct a 98% confidence interval for the production difference.

Tests Concerning a Population Proportion


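When testing a hypothesis on a population proportion $p$, the hypotheses to be tested are any of the
following pairs

$H_0: p = p_0$ vs $H_1: p \neq p_0$
$H_0: p = p_0$ vs $H_1: p > p_0$
$H_0: p = p_0$ vs $H_1: p < p_0$

The test statistic for all the given pairs of hypotheses is

$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} \sim N(0, 1),$$

where $\hat{p}$ is the sample estimate of $p$ and $q_0 = 1 - p_0$.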

Example
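A manufacturer claims that at least 98% of his products are defect-free. A sample of 800 items
showed that 56 of them were defective. Test the claim at the 1% level of significance.

Solution

$H_0: p \geq 0.98$ vs $H_1: p < 0.98$, where $p$ is the proportion of defect-free items.

Test statistic is

$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$$

Reject $H_0$ if $Z < -z_{0.01} = -2.33$.
Substitute $\hat{p} = 744/800 = 0.93$, $p_0 = 0.98$, $q_0 = 0.02$ and $n = 800$ in the test statistic:

$$Z = \frac{0.93 - 0.98}{\sqrt{0.98 \times 0.02 / 800}} = -10.10$$

Since $-10.10 < -2.33$ we reject $H_0$ and conclude that the manufacturer’s claim is false.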
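
The calculation for this example can be sketched in a few lines of Python. The function name `one_proportion_z` is an illustrative assumption, and the critical value is the lower 1% point of the standard normal distribution read from tables rather than computed.

```python
import math


def one_proportion_z(successes, n, p0):
    """Return (sample proportion, z statistic) for testing H0: p = p0."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # uses p0, not p_hat, under H0
    return p_hat, (p_hat - p0) / standard_error


# 56 of the 800 sampled items were defective, so 744 were defect-free.
p_hat, z = one_proportion_z(800 - 56, 800, 0.98)

z_crit = -2.326  # lower 1% point of N(0, 1), taken from standard tables
reject_h0 = z < z_crit
print(f"p_hat = {p_hat:.2f}, z = {z:.2f}, reject H0: {reject_h0}")
```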

Exercises

1. A drug manufacturer claims that the new drug cures a certain disease in 80% of the cases.
In a random sample of 180 patients who used the drug, 105 of them found it to be
effective. Test the manufacturer’s claim at the 0.02 level of significance.

2. A random sample of 480 urban adolescents revealed that 96 of them would vote in
favour of a law supporting abortion. Similarly, 88 out of another random sample of 620
rural adolescents were in support of abortion. Test at the 5% level of significance whether
there is a significant difference in the two proportions.

3. A dietician claimed that 42% of children in a certain district were undernourished. In
order to verify this claim a random sample of 800 children was taken and 240 of them
were found undernourished. Test the dietician’s claim at the 3% level of significance.

Tests Concerning the Difference between Two Populations Proportions


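Let $p_1$ and $p_2$ be the proportions of items or individuals having a characteristic of interest in
populations 1 and 2 respectively. Independent samples of sizes $n_1$ and $n_2$ are drawn from the two populations,
$x_1$ and $x_2$ of the sampled items have the characteristic, and the sample proportions are $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$.

The hypotheses to be tested when we are interested in the difference between two population
proportions are one of the following pairs:

$H_0: p_1 = p_2$ vs $H_1: p_1 \neq p_2$
$H_0: p_1 = p_2$ vs $H_1: p_1 > p_2$
$H_0: p_1 = p_2$ vs $H_1: p_1 < p_2$

The test statistic is

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left[\frac{1}{n_1} + \frac{1}{n_2}\right]}} \sim N(0, 1),$$

where $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$ is the pooled estimate of the proportion and $\hat{q} = 1 - \hat{p}$.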

Example
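In order to test the effectiveness of a new anthrax vaccine, 240 infected cattle were given the
vaccine and 300 were not. All the 540 cattle were infected with anthrax. Among those which
were vaccinated, 60 died and among those which were not vaccinated 115 died
from the disease. Does vaccination reduce the mortality rate? Use $\alpha = 0.05$.

Solution
Let the subscript 1 represent the vaccinated cattle and 2 the unvaccinated ones.

$H_0: p_1 = p_2$ vs $H_1: p_1 < p_2$

The test statistic is

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left[\frac{1}{n_1} + \frac{1}{n_2}\right]}}$$

Reject $H_0$ if $Z < -z_{0.05} = -1.645$.

$\hat{p}_1 = 60/240 = 0.25$, $\hat{p}_2 = 115/300 = 0.3833$, $\hat{p} = 175/540 = 0.3241$ and $\hat{q} = 0.6759$, so

$$Z = \frac{0.25 - 0.3833}{\sqrt{0.3241(0.6759)\left[\frac{1}{240} + \frac{1}{300}\right]}} = -3.29$$

Since $-3.29 < -1.645$ we reject $H_0$ and conclude that indeed the vaccine reduces the mortality rate at
the 5% level of significance.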
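
As a check on the vaccine example, the pooled two-proportion statistic can be computed as follows. This is a sketch only; the function name `two_proportion_z` is an illustrative assumption and the critical value $-1.645$ comes from standard normal tables.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 = p2, with x_i successes out of n_i."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # pooled estimate of the common proportion
    standard_error = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return p1_hat, p2_hat, p_pool, (p1_hat - p2_hat) / standard_error


# Deaths among vaccinated (60 of 240) and unvaccinated (115 of 300) cattle.
p1_hat, p2_hat, p_pool, z = two_proportion_z(60, 240, 115, 300)

z_crit = -1.645  # lower 5% point of N(0, 1), from standard tables
reject_h0 = z < z_crit
print(f"p1 = {p1_hat:.4f}, p2 = {p2_hat:.4f}, pooled = {p_pool:.4f}, z = {z:.2f}")
```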

Exercises
1. In a random sample of 400 males, 48 of them were found to have sexually transmitted
diseases. In another random sample of 630 females 111 of them were found to have
sexually transmitted diseases.
(a) Construct a 98% confidence interval of the difference in the proportions of those with
the sexually transmitted diseases.
(b) Use your result in (a) to test the hypotheses

2. In a study to determine the prevalence of poverty in rural and urban areas, a sample of
280 households was taken in urban areas and 99 of them were classified as poor. In a
sample of 600 households from the rural areas, 148 were classified as poor.
Test at the 2% level of significance whether poverty is higher in the urban areas than in
rural areas.
3. Test the following hypotheses for the given data.
(a)

(b)

(c)

Tests Concerning a Population Variance
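From the sampling distribution of the sample variance, it was shown that

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1).$$

When testing any of the following pairs of hypotheses

$H_0: \sigma^2 = \sigma_0^2$ vs $H_1: \sigma^2 \neq \sigma_0^2$
$H_0: \sigma^2 = \sigma_0^2$ vs $H_1: \sigma^2 > \sigma_0^2$
$H_0: \sigma^2 = \sigma_0^2$ vs $H_1: \sigma^2 < \sigma_0^2$

the test statistic is $\chi^2 = \frac{(n-1)S^2}{\sigma_0^2}$.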

Example
A manufacturer of light bulbs claims that the average lifetime of his bulbs is 150 hours with a
standard deviation of 4.8 hours. A sample of 9 bulbs gave the following lifetimes (in hours)

Test whether the lifetimes of the bulbs have a variance greater than the claimed value $\sigma_0^2 = 4.8^2$ hours$^2$. Use the 5% significance
level.

Solution

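The test statistic is $\chi^2 = \frac{(n-1)S^2}{\sigma_0^2}$.

Reject $H_0$ if $\chi^2 > \chi^2_{0.05}(8) = 15.5$.

Substituting $n = 9$, the sample variance $s^2$ and $\sigma_0^2$ into the test statistic gives the computed value $\chi^2_{calc}$.

Since $\chi^2_{calc}$ is less than 15.5 we do not reject $H_0$ and conclude that the variance of the lifetime of the
bulbs is not significantly greater than $\sigma_0^2$ at the 5% significance level.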

Exercises
1. The following data are amounts (in grams) of a certain vitamin found in raw milk.
7.97 7.83 7.56 6.15 7.99 7.28 8.11

8.09 7.12 6.69 7.54 7.35 4.55 6.77
Is there enough evidence to conclude that the variance of the vitamin is ? Use
.

2. The following is a random sample of 9 observations on the profits (in $000) realised per
month by women cooperatives; 4.9, 5.8, 5.9, 6.5, 5.5, 5.0, 6.0, 5.6 and 5.7. Test at
whether the population variance is significantly different from 0.2.

Tests for Equality of Two Population Variances


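There are times when we want to compare or test measures of variability. In order to test for
equality of two population variances we use the sampling distribution of the ratio of two sample
variances.

The hypotheses that will be under consideration are

$H_0: \sigma_1^2 = \sigma_2^2$ vs $H_1: \sigma_1^2 \neq \sigma_2^2$
$H_0: \sigma_1^2 = \sigma_2^2$ vs $H_1: \sigma_1^2 > \sigma_2^2$
$H_0: \sigma_1^2 = \sigma_2^2$ vs $H_1: \sigma_1^2 < \sigma_2^2$

The hypotheses above can also be written as

$H_0: \sigma_1^2/\sigma_2^2 = 1$ vs $H_1: \sigma_1^2/\sigma_2^2 \neq 1$
$H_0: \sigma_1^2/\sigma_2^2 = 1$ vs $H_1: \sigma_1^2/\sigma_2^2 > 1$
$H_0: \sigma_1^2/\sigma_2^2 = 1$ vs $H_1: \sigma_1^2/\sigma_2^2 < 1$

The test statistic for $H_0$ is $F = \frac{S_1^2}{S_2^2} \sim F(n_1 - 1, n_2 - 1)$.

From our probability theory, it can be shown that $F_{1-\alpha}(\nu_1, \nu_2) = \frac{1}{F_{\alpha}(\nu_2, \nu_1)}$. This is very important in
cases where the values of $F_{1-\alpha}$ are not given in the tables.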

Example
Consider the following two samples from two independent normal populations.

Sample 1: 8.2, 5.3, 6.5, 5.1, 9.7 and 10.8

Sample 2: 9.5, 8.3, 7.5, 10.9, 11.3, 9.3, 8.8 and 8.0

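Determine, at significance level $\alpha$, whether there is sufficient evidence that $\sigma_1^2 \neq \sigma_2^2$.

Solution

$H_0: \sigma_1^2 = \sigma_2^2$ vs $H_1: \sigma_1^2 \neq \sigma_2^2$

The test statistic is $F = \frac{S_1^2}{S_2^2}$.

Reject $H_0$ if $F < 0.205$ or if $F > 5.29$.

Substituting $s_1^2 = 5.552$ and $s_2^2 = 1.814$ into the test statistic we get

$$F = \frac{5.552}{1.814} = 3.06$$

Since F is between 0.205 and 5.29, we do not reject $H_0$. We conclude that there is not enough evidence
that the population variances are significantly different.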
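
The sample variances and their ratio for this example can be reproduced with the standard library. This is a sketch, not a full F test: the two critical values are simply the ones quoted in the solution above and are not computed here.

```python
from statistics import variance  # sample variance, divisor n - 1

sample_1 = [8.2, 5.3, 6.5, 5.1, 9.7, 10.8]
sample_2 = [9.5, 8.3, 7.5, 10.9, 11.3, 9.3, 8.8, 8.0]

s1_sq = variance(sample_1)
s2_sq = variance(sample_2)
f_stat = s1_sq / s2_sq  # F = S1^2 / S2^2 with (5, 7) degrees of freedom

# Critical values as quoted in the worked solution (taken from the text, not computed).
lower_crit, upper_crit = 0.205, 5.29
reject_h0 = f_stat < lower_crit or f_stat > upper_crit

print(f"s1^2 = {s1_sq:.3f}, s2^2 = {s2_sq:.3f}, F = {f_stat:.2f}, reject H0: {reject_h0}")
```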

Exercises
1. Test the following hypotheses:
(a)

(b)

2. Consider the following samples from two normal populations. Test whether the two
population variances are different. Use

Sample 1: 16 10 24 9 6 16 22
Sample 2: 27 17 10 32 9 15

3. Using data in question 1,


(a) Construct a 95% confidence interval for .

(b) Use your confidence interval (a) to test

4. The risk of an investment is at times measured by the variance of its return. In a
comparison of the risk associated with two investments, monthly returns
on two $1000 investments were recorded and the data are given in the table below.

Investment 1 15 9 28 -2 21 10 0 10 13 18
Investment 2 16 -2 -13 35 22 -18 36 -12

(a) Is the risk associated with investment 2 more than that of investment 1? Use

(b) What assumptions have you put in place in order to answer part (a)?

(c) Construct a 98% confidence interval for .

CHAPTER 7
Chi-Square Tests

The chi-square test is used to determine whether there is a significant difference between the
expected frequencies and the observed frequencies in one or more categories. The Chi-Square
test can be used when;

(i) Testing to see whether there is an association between two categorical variables.
(Independence or Association tests)
(ii) Testing to see whether a set of observations was drawn from a specified probability
distribution (Goodness-of-Fit tests)

Chi-Square Tests for Independence

This test is used to determine if there is association or no association between two categorical
variables. A set of data is said to be categorical if the data is separable into categories that are
mutually exclusive, for example, race, sex, age groups, and educational level. These factors are
normally displayed in a table called the contingency table and an example of a contingency table
is given below.

                      Economic Class of Parents
Exam Grade    High    Medium    Poor    Very Poor
A             27      39        78      98
B             48      145       121     67
C             58      211       350     147
F             27      74        112     169

This is a 4 by 4 contingency table because it has 4 rows and 4 columns. We use a chi-square test
to determine whether the two factors are independent or whether there is an association between
them.

Procedure

(i) Specify the hypotheses:


H0: The two factors are not associated or are independent
H1: The two factors are not independent

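(ii) The test statistic is given by

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2[(r-1)(c-1)]$$

where $O_{ij}$ and $E_{ij}$ are the observed and expected frequencies in row $i$ and column $j$, $r$ is the number of rows and $c$ is the number of columns.

(iii) We reject the null hypothesis when the value of the test statistic is greater than or equal
to the tabulated or critical value $\chi^2_{\alpha}[(r-1)(c-1)]$.
(iv) Calculate the expected frequencies

$$E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{N},$$

where $N$ is the grand total.
(v) Find the value of the test statistic $\chi^2 = \sum\sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$.

(vi) Compare the value of the test statistic against the critical value and make a decision
whether to reject $H_0$ or not. Finally give a conclusion.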
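
The procedure above can be sketched in Python for a general $r$ by $c$ table. The function name `chi_square_independence` is an illustrative assumption; the data are the 4 by 4 exam grade table shown earlier, and the critical value still has to be read from chi-square tables.

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected frequencies) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Step (iv): E_ij = (row i total) * (column j total) / N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Step (v): sum of (O - E)^2 / E over all cells.
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


# Exam grade (rows A, B, C, F) by economic class of parents (columns).
grades = [
    [27, 39, 78, 98],
    [48, 145, 121, 67],
    [58, 211, 350, 147],
    [27, 74, 112, 169],
]

stat, df, expected = chi_square_independence(grades)
print(f"chi-square = {stat:.2f} on {df} degrees of freedom")
```

Step (vi) is then a comparison of `stat` with the tabulated value for `df` degrees of freedom at the chosen significance level.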

Yates’ correction for a 2 by 2 contingency table

When dealing with a 2 by 2 contingency table, the chi-square distribution with one degree of
freedom is used. The Yates correction should be used when calculating the value of the test
statistic, which is given by

$$\chi^2 = \sum\sum\frac{(|O_{ij} - E_{ij}| - 0.5)^2}{E_{ij}}$$

Example
A study is run to determine whether there is an association between a child’s weight and success
in school. Given the following data, test at α = 0.05 whether there is an association between being overweight
and success in school.

                      Weight
Success    Overweight    Not overweight
Yes        162           263
Not        38            37

Solution
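H0: There is no association between overweight and success in school.
H1: There is an association between overweight and success in school.

           Overweight    Not overweight    Total
Yes        162           263               425
Not        38            37                75
Total      200           300               500

Rejection Criteria: Testing at α = 0.05, we reject H0 if $\chi^2_{calc} > \chi^2_{0.05}((2-1)(2-1)) = \chi^2_{0.05}(1) = 3.84$.

Expected frequencies: $E_{11} = 425(200)/500 = 170$, $E_{12} = 255$, $E_{21} = 30$ and $E_{22} = 45$, so $|O_{ij} - E_{ij}| = 8$ in every cell.

Test Statistic:
Since this is a 2 by 2 contingency table, we apply the Yates’ correction to the test statistic and we
obtain

$$\chi^2_{calc} = \sum\sum\frac{(|O_{ij} - E_{ij}| - 0.5)^2}{E_{ij}} = 7.5^2\left(\frac{1}{170} + \frac{1}{255} + \frac{1}{30} + \frac{1}{45}\right) = 3.68$$

Conclusion
Since $\chi^2_{calc} = 3.68 < 3.84$ we do not reject H0 and conclude that, at the 0.05 level of significance, there is
insufficient evidence of an association between weight and success in school.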
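
A minimal sketch that recomputes the expected frequencies and the Yates-corrected statistic for the 2 by 2 table in this example. The function name `chi_square_2x2` is an illustrative assumption, and clamping the corrected deviation at zero is a common safeguard added here rather than something stated in the notes. The uncorrected statistic is returned as well, so that the effect of the correction on the comparison with the critical value 3.84 can be seen.

```python
def chi_square_2x2(table, yates=True):
    """Chi-square statistic for a 2 x 2 table, optionally with Yates' correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            deviation = abs(observed - expected)
            if yates:
                # Subtract 0.5 from |O - E|, never letting it go below zero.
                deviation = max(deviation - 0.5, 0.0)
            stat += deviation ** 2 / expected
    return stat


# Rows: success Yes / Not; columns: Overweight / Not overweight.
weights = [[162, 263], [38, 37]]

corrected = chi_square_2x2(weights, yates=True)
uncorrected = chi_square_2x2(weights, yates=False)
print(f"with Yates: {corrected:.2f}, without: {uncorrected:.2f}, critical value: 3.84")
```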

Exercises
1. The Statistics results of 1700 students from three types of schools were
analysed and the data obtained are given in the table below.

                         Type of School
Result         Government    Private    Boarding
Distinction    189           78         113
Credit         217           60         128
Pass           329           38         174
Fail           289           22         63

Test whether there is an association between Pass rate and the type of school. Test at the
1% level of significance.

2. In a study of the relationship between a mother’s educational level and her number of children, the
following data were obtained.

                     Mother’s Highest Education Attained
Family Size    No school    Primary    Secondary    Tertiary
Small          17           30         58           78
Medium         42           77         94           41
Large          117          101        46           29

Test at the 5% level of significance whether there is an association between the mother’s
level of education and family size.

3. An experiment was carried out to determine whether there is a relationship between the
type of fertilizer applied and yield. The yield of crops was classified as high, medium and
low. The data obtained are summarized in the table below.

                      Fertilizer
Yield           A     B     C     Row Total
High            20    25    12    57
Medium          16    17    15    48
Low             12    14    17    43
Column Total    48    56    44    148

Stating your hypotheses clearly, test at the 5% level of significance whether or not there
is evidence of an association between type of fertilizer and yield.

McNemar's test

McNemar's test can be viewed as a paired version of the chi-square test. Suppose you asked
the participants whether they liked a device before and after an experiment.

                           After experiment (Y)
Before experiment (X)    Yes    No
Yes                      a      b
No                       c      d

Here, what you want to test is whether the number of participants who liked the device
changed significantly between before and after the experiment. Given two paired variables where
each variable has exactly two possible outcomes (1 = Yes and 2 = No), the McNemar test can be
used to test if there is a statistically significant difference between the proportions before and
after the experiment. It is useful when dealing with paired binary response data.

The McNemar test has the following assumptions:

1. The pairs $(X_i, Y_i)$ are mutually independent.
2. Each $X_i$ and $Y_i$ can be assigned to one of two possible categories.
3. The difference

$P(X_i = 1, Y_i = 2) - P(X_i = 2, Y_i = 1)$

is negative for all $i$, zero for all $i$, or positive for all $i$.

If we let $p_1 = P(X_i = 1, Y_i = 2)$ and $p_2 = P(X_i = 2, Y_i = 1)$, then the McNemar test
can be formulated as follows.

$H_0: p_1 = p_2$ vs $H_1: p_1 \neq p_2$

The test statistic is given by

$$\chi^2 = \frac{(b - c)^2}{b + c}$$

which for large samples is distributed like a chi-squared distribution with 1 degree of freedom. A
closer approximation to the chi-squared distribution uses a continuity correction

$$\chi^2 = \frac{(|b - c| - 1)^2}{b + c}$$

which is distributed as chi-square with 1 degree of freedom.

Under the null hypothesis, with a sufficiently large number of discordants (cells b and c), χ2 has a
chi-squared distribution with 1 degree of freedom. If either b or c is small (b + c < 25) then χ2 is
not well-approximated by the chi-squared distribution. The binomial distribution can be used to
obtain the exact distribution for an equivalent to the uncorrected form of McNemar's test
statistic. In this formulation, b is compared to a binomial distribution with size parameter equal
to b + c and "probability of success" = ½, which is essentially the same as the binomial sign test.
For b + c < 25, the binomial calculation should be performed, and indeed, most software
packages simply perform the binomial calculation in all cases, since the result then is an exact
test in all cases. When comparing the resulting χ2 statistic to the right tail of the chi-squared
distribution, the p-value that is found is two-sided, whereas to achieve a two-sided p-value in the
case of the exact binomial test, the p-value of the extreme tail should be multiplied by 2.

If the χ2 result is significant, this provides sufficient evidence to reject the null hypothesis, in
favour of the alternative hypothesis that p1 ≠ p2, which would mean that the marginal proportions
are significantly different from each other.

Example

A study was carried out to determine if a certain treatment has an effect on a certain disease. A sample of 350
people were diagnosed for the disease (present or absent) before treatment,
given in the rows, and after treatment, given in the columns. The test requires the same
subjects to be included in the before-and-after measurements (matched pairs).

                     After Treatment
Before          Present    Absent    Row Total
Present         110        130       240
Absent          68         42        110
Column Total    178        172       350

Solution

$H_0$: the treatment has no effect vs $H_1$: the treatment has an effect.

Test statistic is

$$\chi^2 = \frac{(b - c)^2}{b + c}$$

We reject $H_0$ if $\chi^2 > \chi^2_{0.05}(1) = 3.84$.

With $b = 130$ and $c = 68$,

$$\chi^2 = \frac{(130 - 68)^2}{130 + 68} = 19.41$$

The value of the test statistic is greater than 3.84; we therefore reject $H_0$ and conclude that there
is a treatment effect.
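
For this example only the discordant cells matter: $b = 130$ (present before, absent after) and $c = 68$ (absent before, present after). The sketch below uses the standard McNemar statistic $(b - c)^2/(b + c)$ and its continuity-corrected form $(|b - c| - 1)^2/(b + c)$; the function name `mcnemar_statistics` is an illustrative assumption.

```python
def mcnemar_statistics(b, c):
    """Return (uncorrected, continuity-corrected) McNemar chi-square statistics."""
    if b + c == 0:
        raise ValueError("no discordant pairs: the statistic is undefined")
    uncorrected = (b - c) ** 2 / (b + c)
    corrected = (abs(b - c) - 1) ** 2 / (b + c)
    return uncorrected, corrected


# Discordant counts from the treatment table above.
uncorrected, corrected = mcnemar_statistics(130, 68)

critical_value = 3.84  # chi-square with 1 degree of freedom at the 5% level
print(f"uncorrected = {uncorrected:.2f}, corrected = {corrected:.2f}")
print("reject H0:", uncorrected > critical_value and corrected > critical_value)
```

Both forms exceed 3.84 here, so the decision does not depend on whether the continuity correction is applied.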

Exercises

1. Candidates of two political parties A and B gave their campaign speeches to a group
of 600 residents of a certain small town. There were 380 people who intended
to vote against political party A before the speech, and after the speech 110 of them
changed their voting intention towards the party. There were 220 residents who intended
to vote for party A before the speech and 70 of them changed their voting intention
against the party after the speech. Does the speech have an effect of changing the voting
intentions of the residents? Use

2. In a study to determine whether prevalence of severe cold increases with age, a sample of
1500 school children were questioned. In the study the children were questioned on the
prevalence of symptoms of severe cold at the age of 12 and again at the age of 14 years.

At age twelve, 447 children were reported to have had severe colds in the past 12 months
compared to 558 at age 14. The data obtained are summarized in the table below.

                                   Severe colds at age of 14 years
Severe colds at age of 12 years    Yes    No     Total
Yes                                257    190    447
No                                 301    752    1053
Total                              558    942    1500

Was there a significant increase of the prevalence of severe cold? Use

Chi-Square Goodness-of-Fit Tests


The chi-square goodness of fit tests are similar to the chi-square tests of association or no
association in that the test statistic is the same and results from comparing the observed and
expected frequencies.

The assumptions for this test statistic are:


(a) The data available for analysis consist of a random sample of n independent observations
(b) The measurement scale may be nominal
(c) The observations can be classified into r non-overlapping categories that exhaust all
classification possibilities, that is, the categories are mutually exclusive and exhaustive.
The number of observations falling into a given category is called the observed frequency
of that category.

Suppose that items from a given population can be placed in any one of $r$ categories
and a random sample of size $n$ is drawn from this population and that the observed
sample frequencies for each category are $O_1, O_2, \ldots, O_r$. There is a probability that a
randomly selected observation may fall in any one of these categories.

Category                1        2        3        ...    i        ...    r
Observed frequencies    $O_1$    $O_2$    $O_3$    ...    $O_i$    ...    $O_r$

Suppose that the probabilities of belonging to each of these $r$ categories
are denoted by $p_1, p_2, \ldots, p_r$ respectively; then the expected frequencies ($E_i$) for each
category are given by $E_i = np_i$, $i = 1, 2, \ldots, r$. The values
of $E_i$ must be greater than or equal to 5 for the test to be good. If the value of $E_i$ is
smaller than 5, combine the class or category with the adjacent one.

The hypotheses that we are to test are

$H_0$: The sample was drawn from the specified distribution
$H_1$: The sample was not drawn from the specified distribution

The test statistic used in testing the given hypotheses is given by

$$\chi^2 = \sum_{i=1}^{r}\frac{(O_i - E_i)^2}{E_i}$$

For large samples this statistic approximately follows a chi-square distribution with $r - k - 1$ degrees of freedom, where $r$ is the
number of categories and $k$ is the number of parameters that we estimate from the sample data; when no
parameter is estimated this reduces to $r - 1$ degrees of freedom. If
the computed value of $\chi^2$ is equal to or greater than the tabulated value of chi-square
with $r - k - 1$ degrees of freedom and significance level $\alpha$, we reject $H_0$ at
the $\alpha$ level of significance.

Example
Consider the data in the table below.

x 0 1 2 3 4 5+
Frequency 160 98 41 19 4 1

Test at 5% level of significance whether the data came from a Poisson distribution.

Solution

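The test statistic is $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$. We are given the observed frequencies in the table above.
We must find the expected frequencies. We have to estimate the Poisson parameter $\lambda$ by $\bar{x}$, that
is $\hat{\lambda} = \bar{x} = \frac{\sum fx}{\sum f} = \frac{258}{323} = 0.799$ (counting the 5+ class as 5). The expected frequencies
are $E_i = 323 \, P(X = x_i)$, where $X$ has a Poisson distribution with mean $\hat{\lambda}$.

x                     0         1         2        3        4       5+
Observed Frequency    160       98        41       19       4       1
Expected Frequency    145.32    116.09    46.35    12.34    2.47    0.45

The last two expected frequencies are less than 5; we merge the last two categories with category 3 to obtain

x                     0         1         2        3+
Observed Frequency    160       98        41       24
Expected Frequency    145.32    116.09    46.35    15.26

We have estimated the mean, therefore the degrees of freedom of the test statistic become
$4 - 1 - 1 = 2$. Therefore we reject $H_0$ when $\chi^2 \geq \chi^2_{0.05}(2) = 5.99$.

The value of the test statistic is

$$\chi^2 = \frac{(160 - 145.32)^2}{145.32} + \frac{(98 - 116.09)^2}{116.09} + \frac{(41 - 46.35)^2}{46.35} + \frac{(24 - 15.26)^2}{15.26} = 9.93$$

The value of the test statistic is greater than the tabulated value 5.99; therefore we reject $H_0$ and
conclude that the data did not come from a Poisson distribution.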
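
The expected frequencies and the test statistic for this example can be reproduced with the sketch below. It makes two assumptions that are labelled in the comments: the open-ended class 5+ is counted as 5 when estimating the mean, and the merged class 3+ takes whatever expected frequency is left after classes 0, 1 and 2. Because it works with unrounded expected frequencies, the result differs slightly in the second decimal place from a hand calculation based on the rounded table.

```python
import math

values = [0, 1, 2, 3, 4, 5]          # assumption: the "5+" class is counted as 5
observed = [160, 98, 41, 19, 4, 1]

n = sum(observed)
lam = sum(x * f for x, f in zip(values, observed)) / n  # estimate of the Poisson mean


def poisson_pmf(x, mean):
    """P(X = x) for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** x / math.factorial(x)


# Classes 0, 1, 2 and a merged "3+" class so that every expected frequency is at least 5.
obs_merged = observed[:3] + [sum(observed[3:])]
exp_merged = [n * poisson_pmf(x, lam) for x in range(3)]
exp_merged.append(n - sum(exp_merged))  # assumption: 3+ takes the remaining probability

chi_sq = sum((o - e) ** 2 / e for o, e in zip(obs_merged, exp_merged))
df = len(obs_merged) - 1 - 1  # r - k - 1 with one estimated parameter

print(f"lambda = {lam:.3f}, chi-square = {chi_sq:.2f}, df = {df}")
```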

Exercise
1. A die is thrown 400 times and the score on the die is noted. The table below gives the
observed number of scores in the 400 throws.

Score 1 2 3 4 5 6
Frequency 54 60 83 77 82 44

Test, at the 5% level of significance whether the die is fair.

2. According to some genetic theory, the number of white, yellow, blue, red and black
petals should appear in the ratio 3:6:9:2. Observed frequencies of white, yellow, blue, red
and black petals in a sample of 1200 plants were 240, 320, 589, and 51. Test whether the
observed frequencies are consistent with the genetic theory. Use

3. Samples of size 6 each were regularly drawn from a production line and the numbers of
defective products were noted. During one week 400 samples were drawn and the
number of defective units in each sample was noted. The data obtained are as follows.

Number of defective items, x    0      1      2     3     4     5    6
Frequency, f                    111    139    88    37    17    5    3

Test whether the number of defective items follows the binomial distribution. Use a 5%
level of significance.

4. The weights of 230 grade one pupils were recorded and the data obtained were as
follows.

Weight, x <19 20-24 25-29 30-34 35-39 40-44 45-49 50-54 55+
f 6 11 27 37 47 42 34 17 9

Test whether the data follows a normal distribution. Use a 5% level of significance.

Summary
