Teaching Notes

Uploaded by

Muskan Aneja

1 Descriptive Statistics
1.1 Introduction
1. There are 1020 females per 1000 males in India (present).

2. Food minister in the Lok Sabha quotes statistics of sugar exports or those of food grain
production.

3. How many people were born in the current year ? How many of them are males
and how many are females ?

4. There are 860 males to 1000 females in Russia (2021).

5. Approximately 30% of Google’s Employees were female in July 2014.

6. Chicago’s O’Hare Airport was the busiest airport in 2014 with a total of 881,933 flight
arrivals and departures.

All these statements contain figures or convey their message in terms of figures. The question
in item 3 also needs an answer in terms of numbers. Hence they may be called numerical
statements of facts. Such statements are clear, precise and meaningful, and analysing them
helps one arrive at certain conclusions. For example, to accommodate the new students who
are likely to take admission in a college, the college authorities need to increase
the seats, build new hostels etc.
Statistics refers to quantitative information or to a method of dealing with quantitative
information. Statistics is a branch of scientific methods used in dealing with such phenomena
that can be described by counts or by measurements.

Definition 1.1. Statistics is the science of collecting, analyzing, presenting and interpreting
data, as well as of making decisions based on such analyses.

Example 1. Suppose someone is interested in purchasing a two-wheeler (motorcycle). The
person may already have some options in mind. Further, relatives, friends
and family members can be contacted for suggestions and guidance. After getting an
idea about the different brands of vehicles, the person may analyse them on the basis of mileage,
maintenance costs, acceleration etc. Finally, whichever two-wheeler suits the needs of the
person as per his/her judgement will be purchased. This whole process of collecting
information, analysing it and making a decision demonstrates Statistics.

There are two aspects of Statistics. They are as follows:

1. Theoretical: It deals with the development, derivation and proof of statistical theorems,
formulas, rules and laws.

2. Applied: It deals with applications of those theorems, formulas, rules and laws to serve
real world problems.

In this course, we will mostly deal with Applied Statistics. Applied Statistics can be
divided into two areas: descriptive statistics and inferential statistics. We will briefly describe
them one by one.

Descriptive Statistics
Suppose we have information on the test scores of students enrolled in a class. The whole set
of numbers that represents the scores is called a dataset, the name of each student is called
an element and the score of each student is called an observation. Descriptive statistics
consists of methods for organizing, displaying and describing data by using tables, graphs and
summary measures.

Example 2. Scores of students in a class need not be presented one by one. We can take
the scores of the students and find the average. Sometimes that gives a better idea about the
performance of the class. We can even use it to compare the sections of a class or two classes of a
school. We may also get an idea about the variability of the scores around the average score.

Inferential Statistics
The collection of all elements of interest is called the population. A portion of the
elements selected from this population is called a sample. Inferential statistics consists of
methods that use sample results to help make decisions or predictions about a population.

Example 3. We may want to find the starting salary of a typical college graduate. We may
want to select 2000 recent college graduates, find their starting salaries and make a decision
based on this information.

Example 4. We may be interested in finding out whether a graduate student has some technical skills
at the time of passing out from the college. We may pick 4000
graduates, list their technical skills and make a decision based on this information.

Need for Statistics


1. Statistical methods are especially appropriate for handling data which are subject to
variation that cannot be fully controlled by experimental methods.

2. They are used in the analysis of problems in the natural, physical and social sciences.

3. Statistical methods are used by governmental bodies, private business firms and research
agencies as an aid in forecasting, controlling and exploring.

Functions of Statistics
• It presents facts in a definite form.

• It simplifies a mass of figures.

• It facilitates comparison.

• It helps in formulating and testing hypotheses.

• It helps in prediction.

• It helps in the formation of policies.

1.2 Organizing Data


Some Basic Terminologies
Definition 1.2. Data recorded in the sequence in which they are collected and before they are
processed or ranked are called raw data.

Example 5. We collect information about ages of a few students.

21 19 24 25 29 34 26 27 37 33
18 20 19 22 19 19 25 22 25 23

Definition 1.3. An ungrouped data set contains information on each member of a sample or
population individually.

The dataset in Example 5 is ungrouped data. If we rank the data in Example 5 now from
the lowest to highest age, it still remains ungrouped but not raw data.

Definition 1.4. A variable refers to the characteristic that varies in amount or magnitude in
a frequency distribution.

A variable may be either continuous or discrete.

Definition 1.5. A continuous variable is capable of manifesting every conceivable fractional
value within the range of possibilities.

For example, continuous variables can be height, weight etc.

Definition 1.6. A discrete variable is the one which cannot manifest every conceivable fractional
value but appears by limited gradations.

For example, discrete variables can be number of persons, number of machines etc.

Definition 1.7. Classification is the grouping of related facts into classes. Facts in one class
differ from those of another class with respect to some characteristic called a basis of classifica-
tion.

A few examples of classification are:

• Sorting letters in a post office.

• Classifying students with regard to the marks secured by them.

• Classifying people on the basis of their incomes.

Classification can be categorised into two parts: Qualitative and Quantitative.

Qualitative Classification
Definition 1.8. The classification of data on the basis of some attributes that can be studied
and not measured is called qualitative classification.

Data is classified on the basis of some attribute or quality such as gender, colour of hair,
literacy, religion etc. In such a case, the attribute cannot be measured. It can only be studied.
So, here two classes are formed, one possessing the attribute and other one not possessing the
attribute.

Example 6. This is an example of twofold classification.

Population
    Males
    Females

The type of classification where only two classes are formed is called a twofold classification.
If we increase the number of classes further on the basis of some attribute or attributes, that
classification is called manifold classification.

Example 7. This is an example of manifold classification.

Population
    Males
        Literates
            Employed
            Unemployed
        Illiterates
            Employed
            Unemployed
    Females
        Literates
            Employed
            Unemployed
        Illiterates
            Employed
            Unemployed


Example 8. This is an example of manifold classification.

Krea University
    IFMR
        Marketing
        Finance
    SIAS
        Mathematics
        Economics


Further, we provide an example of Qualitative Classification.

Example 9. A sample of 30 persons who often consume donuts were asked what variety of
donuts is their favorite. The responses are: glazed filled other plain glazed other frosted filled
filled glazed other frosted glazed plain other glazed glazed filled frosted plain other other frosted
filled filled other frosted glazed glazed filled. Construct a frequency table for these data.
Answer: Frequency distribution of favourite donut variety

Donut Variety    Frequency
Glazed           8
Filled           7
Frosted          5
Plain            3
Other            7
Total            30
A frequency distribution lists all categories and the number of elements that belong to each
of these categories. The relative frequency for a category is obtained by dividing the frequency
of that category by the sum of all frequencies. Thus the relative frequency shows what fractional
part or proportion of the total frequency belongs to the corresponding category. The previous
table can be interpreted in a better way in the following manner:
Donut Variety Frequency Relative Frequency Percentage
Glazed 8 8/30 = 0.267 0.267*100 = 26.7 %
Filled 7 7/30 = 0.233 0.233*100 = 23.3 %
Frosted 5 5/30 = 0.167 0.167*100 = 16.7 %
Plain 3 3/30 = 0.100 0.100*100 = 10 %
Other 7 7/30 = 0.233 0.233*100 = 23.3 %
Therefore, 26.7% + 23.3% = 50% of the persons included in the sample said that glazed or filled
donuts are their favourite.
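
The tabulation in Example 9 can be reproduced with a short script. This is only an illustrative sketch: the helper name `frequency_table` is hypothetical (it does not come from these notes), and the responses are copied from Example 9.

```python
from collections import Counter

# Responses of the 30 persons in Example 9.
responses = (
    "glazed filled other plain glazed other frosted filled "
    "filled glazed other frosted glazed plain other glazed glazed filled "
    "frosted plain other other frosted "
    "filled filled other frosted glazed glazed filled"
).split()


def frequency_table(values):
    """Map each category to (frequency, relative frequency, percentage)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {
        category: (freq, freq / total, 100 * freq / total)
        for category, freq in counts.items()
    }


table = frequency_table(responses)
for category, (freq, rel, pct) in table.items():
    print(f"{category:<8} {freq:>3}  {rel:.3f}  {pct:.1f} %")
```

The relative frequencies always add up to 1 (and the percentages to 100), which is a quick check on any hand-made frequency table.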

Quantitative Classification
Definition 1.9. The classification of data according to some characteristics that can be measured
is called quantitative classification.
Data is classified on the basis of some attributes or quality such as height, weight etc. In
such a case, the attribute can be measured. Quantitative classification can be of two types:
Discrete frequency distribution and Continuous frequency distribution.
Example 10. This is an example of a discrete frequency distribution.
Number of Children Number of Families
0 10
1 40
2 80
3 100
4 250
5 150
6 50

Example 11. This is an example of a continuous frequency distribution.
Weight (lbs) Number of Persons
100-110 10
110-120 15
120-130 40
130-140 45
140-150 20
150-160 5

Example 12. Marks obtained by 25 students of a class are:

10 20 20 30 40 25 25 30 40 20 25 25 50
15 25 30 40 50 40 50 30 25 25 15 40

Calculate the relative frequency and percentage.

Answer:
Marks Frequency Relative Frequency Percentage
10 1 1/25 = 0.04 0.04*100 = 4 %
15 2 2/25 = 0.08 0.08*100 = 8 %
20 3 3/25 = 0.12 0.12*100 = 12 %
25 7 7/25 = 0.28 0.28*100 = 28 %
30 4 4/25 = 0.16 0.16*100 = 16 %
40 5 5/25 = 0.20 0.20*100 = 20 %
50 3 3/25 = 0.12 0.12*100 = 12 %
Total 25

A frequency distribution for quantitative data lists all the classes and the number of values
that belong to each class. Data presented in the form of a frequency distribution are called
grouped data.

Need for a Grouped Frequency Distribution


1. When we deal with 20, 30 or 40 observations at a time, it is acceptable to use ungrouped data
and discrete frequency distributions. When we deal with a large number of data points, such as
100, 200 or 1000, we should use continuous frequency distributions, where grouping
is required.

2. Also, if the data points are repeated, it is acceptable to proceed with ungrouped data. But if the
data points are not repeated, we should group the data into class intervals before proceeding
with a meaningful analysis.

3. Also, if the range of the data is large, a grouped frequency distribution should be considered.

Example 13. This is an example of a grouped frequency distribution.

Weekly Earnings (Dollars) Number of Employees
800-1000 4
1000-1200 11
1200-1400 39
1400-1600 24
1600-1800 16
1800-2000 6
If we take the first row of the table, 800−1000 represents the first class interval. 1000−1200
represents the second class interval and so on. 800 is the lower limit of the first class interval,
1000 is the lower limit of the second class interval and so on. 1000 is the upper limit of the first
class interval, 1200 is the upper limit of the second class interval and so on. 4 is the frequency
of the first class interval, 11 is the frequency of the second class interval and so on.
Definition 1.10. Class mark or class midpoint of a class interval is the average of the lower
limit and upper limit of the class interval.
Definition 1.11. Class size or class width of a class interval is the difference between the upper
limit and lower limit of the class interval.
Definition 1.12. When the upper limit of one class interval is the lower limit of the next class
interval, the method is known as the exclusive method of classification; a value equal to the upper
limit is then counted in the next class. When the upper limit of a class interval
is included in that class interval itself, it is known as the inclusive method of classification.
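
As a rough illustration of Definitions 1.10 to 1.12, the sketch below groups the ages from Example 5 into class intervals using the exclusive method. The starting point 15, the class width 5 and the helper name `group_exclusive` are assumptions chosen only for this illustration; the notes do not prescribe them.

```python
# Ages from Example 5.
ages = [21, 19, 24, 25, 29, 34, 26, 27, 37, 33,
        18, 20, 19, 22, 19, 19, 25, 22, 25, 23]


def group_exclusive(values, start, width, n_classes):
    """Count values in classes [lower, upper): the upper limit is excluded."""
    classes = []
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width
        frequency = sum(1 for v in values if lower <= v < upper)
        classes.append({
            "interval": (lower, upper),
            "class_mark": (lower + upper) / 2,   # Definition 1.10
            "class_width": upper - lower,        # Definition 1.11
            "frequency": frequency,
        })
    return classes


# Assumed grouping: 15-20, 20-25, 25-30, 30-35, 35-40.
grouped = group_exclusive(ages, start=15, width=5, n_classes=5)
for c in grouped:
    lower, upper = c["interval"]
    print(f"{lower}-{upper}  class mark = {c['class_mark']}  f = {c['frequency']}")
```

With the exclusive method an age of exactly 25 is counted in the class 25-30, not in 20-25, so every observation falls into exactly one class.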

Measures of Central Tendency


We can’t interpret large data sets easily. We seek one single value that describes the character-
istic of the entire mass of unwieldy data. Such a value is called central value. In general, we say
average height, average weight, average income etc. But in the context of Statistics, average is
that value of a distribution:
1. which is considered as most representative or typical value for a group of numbers.
2. which depicts characteristic of the whole group.
3. whose value lies between the two extremes, i.e., the largest and the smallest items.

Properties of a good average


• It should be based on all the observations.
• It should be easy to understand.
• It should be simple to compute.
• It should not be unduly affected by extreme items.
• It should be rigidly defined.
• It should be capable of further algebraic treatment.
• It should have the sampling stability.

Mean
Definition 1.13. Mean is the ratio of sum of all values in a dataset to the total number of
values in the dataset.
\[ \text{Mean for population data: } \mu = \frac{\sum_{i=1}^{N} X_i}{N}, \]

\[ \text{Mean for sample data: } \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \]

where \(\mu\) is the population mean, \(\bar{x}\) is the sample mean, \(\sum_{i=1}^{N} X_i\) is the sum of all
values in the population, \(\sum_{i=1}^{n} x_i\) is the sum of all values in a sample, \(N\) is the number of
values in the population and \(n\) is the number of values in the sample.

Example 14. The following table represents the total profits (in million dollars) of 10 U.S.
companies for the year 2014
Company            Profits (millions of dollars)
Apple              37,037
AT&T               18,249
Bank Of America    11,431
Exxon Mobil        32,580
General Motors     5,346
General Electric   13,057
Hewlett Packard    5,113
Home Depot         5,385
IBM                16,483
Walmart            16,022
Find the mean of the 2014 profits for these 10 companies.
Answer: Here,
\[ \sum_{i=1}^{n} x_i = 160{,}703 \quad \text{and} \quad n = 10. \]

Therefore,
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{160{,}703}{10} = 16{,}070.3, \]
i.e., $16,070.3 million.

Thus, these 10 companies earned an average of $16,070.3 million profits in 2014.

Example 15. The following are the ages (in years) of all eight employees of a small company:

53 32 61 27 39 44 49 57

Find the mean age of these employees.

Answer: Here,
\[ \sum_{i=1}^{N} X_i = 362 \quad \text{and} \quad N = 8. \]

Therefore,
\[ \mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{362}{8} = 45.25 \text{ years} = 45 \text{ years } 3 \text{ months}. \]
Hence, the average age of these employees is 45 years 3 months.
Example 16. The following table represents the marks obtained by 100 students of a class in a
subject
Marks Number of Students (f )
0-10 5
10-20 10
20-30 25
30-40 30
40-50 20
50-60 10
Find the mean marks of these 100 students.
Answer: The solution can be obtained as follows:
Marks    Number of Students (f)    Class-Mark (m)    fm
0-10     5                         5                 25
10-20    10                        15                150
20-30    25                        25                625
30-40    30                        35                1050
40-50    20                        45                900
50-60    10                        55                550
Total    N = 100                                     \(\sum fm = 3300\)

\[ \text{Mean} = \frac{\sum fm}{N} = \frac{3300}{100} = 33 \]
Therefore, the average marks of 100 students is 33.
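
The grouped-data calculation of Example 16 can be sketched in a few lines. The function name `grouped_mean` and the representation of each class as a `((lower, upper), frequency)` pair are choices made only for this illustration.

```python
# Class intervals and frequencies from Example 16.
classes = [((0, 10), 5), ((10, 20), 10), ((20, 30), 25),
           ((30, 40), 30), ((40, 50), 20), ((50, 60), 10)]


def grouped_mean(classes):
    """Mean of grouped data: sum of f*m divided by N, with m the class mark."""
    total_f = sum(f for _, f in classes)
    total_fm = sum(f * (lower + upper) / 2 for (lower, upper), f in classes)
    return total_fm / total_f


print(grouped_mean(classes))  # 33.0
```

Because each observation is replaced by its class mark, this value is an approximation to the mean of the underlying raw marks, which are not available once the data have been grouped.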

Other aspects of Mean


1. The sample mean changes from sample to sample. However, the population mean remains
constant. If we pick two different samples of three employees from Example 15, our sample
means would be
\[ \bar{x} = \frac{61 + 44 + 57}{3} = \frac{162}{3} = 54 \text{ years} \]
and
\[ \bar{x} = \frac{53 + 27 + 57}{3} = \frac{137}{3} \approx 45.67 \text{ years} \]
respectively.

2. Sometimes a dataset may contain a few very small or a few very large values. These values
are called outliers or extreme values. For instance, there are 4 students in a group whose
scores in an exam are:
10, 60, 70, 80.
If we calculate the means, we find:
\[ \bar{x} = \frac{60 + 70 + 80}{3} = 70 \quad \text{(without outlier)} \]
\[ \bar{x} = \frac{10 + 60 + 70 + 80}{4} = 55 \quad \text{(with outlier).} \]
Hence, the mean is not considered a good measure of central tendency when outliers are
present in the data.

3. Open end classes are those in which the lower limit of the first class or the upper limit of
the last class is unknown.

Marks       Number of Students (f)
Below 10    4
10-20       6
20-30       10
30-40       15
40-50       8
Above 50    7

In such a case, the class marks of the open end classes cannot be computed, so it is very
difficult to find the mean.

4. The sum of the deviations of the items from the arithmetic mean is always zero.
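
Points 2 and 4 above can be checked numerically with the four exam scores from point 2. This is a minimal sketch; the helper `mean` is defined here only for the illustration.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by their number."""
    return sum(values) / len(values)


scores = [10, 60, 70, 80]

with_outlier = mean(scores)                               # 55.0
without_outlier = mean([s for s in scores if s != 10])    # 70.0

# Deviations of each score from the mean of all four scores.
deviations = [s - with_outlier for s in scores]

print(with_outlier, without_outlier)
print(deviations, sum(deviations))
```

For these scores the deviations are -45, 5, 15 and 25, which add up to exactly zero. With values that are not exactly representable in floating point, the computed sum may differ from zero by a tiny rounding error.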

Merits
• It is simple to understand.

• It is easy to compute.

• It is based on all observations.

• It is rigidly defined.

• It is a calculated value and is not based on position in the series.

Demerits
• It is not a good measure when outliers are present in the data.

• It is not a good measure when open end intervals are present.

Median
Definition 1.14. Median is the value that divides a data set, that has been ranked in increasing
order, in two equal halves.

If a data set has an even number of observations, then the median will be the mean of the
two middle values in the ranked data set. If a data set has an odd number of observations, then
the median will be the value of the middle term in the ranked data set. Please remember that half
of the data values lie to the left of the median and half to the right of the median.

Example 17. The following data gives the cell phone minutes used last month by 12 randomly
selected persons.

230 2053 160 397 510 380 263 3864 184 201 326 721

Find the median for this data.

Answer: We will arrange the data in ascending order first. Hence, the ranked dataset looks
like
160 184 201 230 263 326 380 397 510 721 2053 3864
Since we have an even number of observations, we will take the average of the two middle values
(the sixth and seventh) of the ranked data set. Hence,
\[ \text{median} = \frac{326 + 380}{2} = \frac{706}{2} = 353. \]
Thus, the median cell phone minutes used last month by these 12 persons was 353 minutes.
We can state that half of these 12 persons used less than 353 cell phone minutes and the other
half used more than 353 cell phone minutes last month.
In the above question there are two outliers, 2053 and 3864 minutes respectively, but these
outliers do not affect the value of the median. Median is a positional value.

Example 18. A small company has 11 employees. Their commuting times (rounded to the
nearest minute) from home to work are:

23 36 14 23 47 32 8 14 26 31 18

Find the median for this data.

Answer: We will arrange the data in ascending order first. Hence, the ranked dataset looks
like
8 14 14 18 23 23 26 31 32 36 47
The number of observations is odd. Hence, the median is the middle (sixth) value of the ranked
data set. Therefore, median = 23 minutes.

The median commuting time from home to work is 23 minutes. We can conclude that roughly half
of these 11 employees took less than 23 minutes to commute and roughly half of the employees took
more than 23 minutes to commute from home to work.
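
The even and odd cases of the median rule can be sketched as follows, using the data of Examples 17 and 18. The function below is written out by hand only to mirror the rule stated above; Python's standard library also offers `statistics.median`.

```python
def median(values):
    """Median of a dataset: rank the values, then take the middle one
    (odd count) or the mean of the two middle ones (even count)."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


# Example 17: cell phone minutes of 12 persons (even number of observations).
minutes = [230, 2053, 160, 397, 510, 380, 263, 3864, 184, 201, 326, 721]
# Example 18: commuting times of 11 employees (odd number of observations).
commute = [23, 36, 14, 23, 47, 32, 8, 14, 26, 31, 18]

print(median(minutes))  # 353.0
print(median(commute))  # 23
```

Replacing the outlier 3864 by any other value above 380 leaves the median of Example 17 unchanged, which illustrates why the median is described as a positional value.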

Merits
• It is useful in case of outliers.
• It is useful in open end classes.
• It is most appropriate when dealing with qualitative data, i.e., where attributes are studied
or ranks are allotted.
• It is the middle item in the distribution.

Demerits
• We need to arrange the dataset always before finding the median.
• It does not depend upon all the observations.
• It is not capable of further algebraic treatment.
• Sampling fluctuations affect the median more than mean.

Mode
Definition 1.15. The mode is the value that occurs with the highest frequency in a data set.
Example 19. The following data gives the speeds (in miles per hour) of eight cars that were
stopped on some highways for speeding violations.
77 82 74 81 79 84 74 78
Answer: We can observe that 74 occurs twice in the above dataset, while every other value occurs
only once. Therefore, the mode in this case is 74, i.e., the modal speed is 74 miles per hour.
A dataset may have more than one mode or no mode. A dataset in which each value occurs
only once has no mode. More generally, a dataset in which every value occurs the same number of
times has no mode.
Example 20. The ages of 10 randomly selected students from a class are:
21 19 27 22 29 19 25 21 22 30
Find the mode.
Answer: The dataset has three modes: 19, 21 and 22. Each of these three values occurs with
the highest frequency of 2.
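
A small sketch of the mode rule, covering the single-mode, multi-mode and no-mode cases described above. The function name `modes` and the convention of returning an empty list for "no mode" are assumptions made for this illustration.

```python
from collections import Counter


def modes(values):
    """Return the sorted list of modes; an empty list means 'no mode'."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # If several distinct values all occur equally often, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


speeds = [77, 82, 74, 81, 79, 84, 74, 78]           # Example 19
ages = [21, 19, 27, 22, 29, 19, 25, 21, 22, 30]     # Example 20

print(modes(speeds))     # [74]
print(modes(ages))       # [19, 21, 22]
print(modes([1, 2, 3]))  # []
```

The same function works for qualitative data, for example a list of diagnoses or lunch preferences, as long as the categories can be sorted.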
Example 21. Suppose the patients seen in a mental health clinic during a given year received
one of the following diagnoses: mental retardation, organic brain syndrome, psychosis, neurosis
and personality disorder. The diagnosis occurring most frequently in the group of patients would
be called the modal diagnosis.
Example 22. Suppose 10 patients in a hospital are asked about their non-vegetarian preferences for
lunch. They can have either chicken, mutton or fish for lunch. The option with the maximum
frequency will be treated as the most preferred or popular. The mode is particularly applicable
when dealing with qualitative phenomena.

Merits
• It is useful in case of outliers.

• It can be determined in open end classes easily.

• We can talk of modal wages, modal size of shoe or modal size of family.

• We can compare qualitative phenomenon, e.g., consumer preferences for soap, toothpaste
etc.

Demerits
• It is not capable of further algebraic treatment.

• It does not depend on each item of the series.

• We may not be able to find a mode always.

• It is not rigidly defined.

The mode is a satisfactory average only in clear-cut cases, i.e., situations in which there is
clearly one value or one class that is outstanding in the frequency of its occurrence.

Definition 1.16. A distribution in which the values of the mean, median and mode coincide is
known as a symmetrical distribution.

Definition 1.17. When the values of the mean, median and mode don’t coincide or they are
not equal, the distribution is known as asymmetrical or skewed.

Karl Pearson has provided an empirical relationship between mean, median and mode. For moderately
asymmetrical distributions, the interval between the mean and median is approximately one third
of the interval between mean and mode, i.e.,
\(\text{Mean} - \text{Median} \approx \frac{1}{3}(\text{Mean} - \text{Mode})\). Rearranging,

\[ \text{Mode} \approx 3\,\text{Median} - 2\,\text{Mean}. \]

Which average to use ?


1. If the distribution is badly skewed, we should not use mean.

2. If the distribution is gappy around the middle, we should not use median.

3. If the distribution has unequal class intervals, we should not use mode.

4. Median is the best average in case of outliers and open end distributions.

5. Mode describes qualitative data in a better way. It is particularly useful when we deal
with number of people wearing a given size of shoe or number of children per household
etc. The mode is best suited when there is an outstandingly large frequency.

General limitations of an average
1. An average may not exist in the data. For example, the arithmetic mean of 100, 300, 250, 50,
100 is \(\frac{800}{5} = 160\), a value that does not exist in the dataset.
2. At times average may give a very absurd result. Average size of a family could be 4.8
which is absurd.
3. Two or more series may have the same central value but may differ widely in composition.

Series A Series B
150 300
170 500
190 20
210 78
180 2

The central value is 180 for both the series but their composition is different.
4. An average is a single value representing a group of values, so it should be properly
interpreted. This can be illustrated with the help of a story. A person had to cross a river
from one bank to the other. He was not aware of the depth of the river, so he enquired of
another man, who told him that the average depth of the water was 5′4″. The man was 5′8″ tall and
he thought that he could very easily cross the river because at all times he would be above
the level of the water. So he started. In the beginning, the level of the water was very low, but
as he reached the middle, the water was 15′ deep and he lost his life. The man drowned
because he had the misconception that average depth means uniform depth throughout. But
it is not so.

1.3 Measures of Dispersion


The average alone cannot describe the set of observations. It is necessary to describe the
variability or dispersion of the observations.
Definition 1.18. Dispersion refers to the extent to which the items vary from one another and
from some central value.

Need to opt for Dispersion


Let us consider three series.
Series A            Series B            Series C
200                 200                 1
200                 205                 989
200                 202                 2
200                 203                 3
200                 190                 5
\(\sum A = 1000\)   \(\sum B = 1000\)   \(\sum C = 1000\)
Ā = 200             B̄ = 200             C̄ = 200

Our Observations
1. The arithmetic mean of all three series is the same.
2. In Series A, none of the items deviate from the arithmetic mean and hence there is no
dispersion.
3. In Series B, items vary but the variation is very small as compared to Series C.
4. In Series C, items vary widely from one another.

Hence, measures of central tendency are not enough and should be supplemented with
measures of dispersion to understand the composition of a dataset. Measures of dispersion also
give an idea about the variability present in a dataset.
Definition 1.19. The measures that help us learn about the spread of a dataset are called
measures of dispersion.
A measure of dispersion is designed to state the extent to which the individual values differ
on average from a central value. It reflects the degree of variation but not its direction.

Properties of a good measure of dispersion or variation


• It should be simple to understand.
• It should be easy to compute.
• It should be based on all the observations.
• It should be rigidly defined.
• It should be capable of further algebraic treatment.
• It should have sampling stability.
• It should not be unduly affected by extreme values.

Range
Definition 1.20. It is defined as the difference between the largest and the smallest value in a
dataset.

Range = (largest value - smallest value)


Example 23. The table gives the total areas in square miles of the four western South Central
states of the United States.
State        Total Area (Square Miles)
Arkansas     53,182
Louisiana    49,651
Oklahoma     69,903
Texas        267,277

Find the range for this dataset.
Answer:
Range = (Largest value - Smallest value)
      = (267,277 − 49,651)
      = 217,626 square miles.
Thus, the total areas of these four states are spread over a range of 217,626 square miles.

Example 24. Let us have a look at the following three series:


Series A Series B Series C
6 6 6
46 6 10
46 6 15
46 6 25
46 46 30
46 46 32
46 46 40
46 46 46

Our Observations
1. All three series have the same range, i.e., 46 − 6 = 40.

2. Even though the range is the same, the distributions are not the same.

3. The range takes no account of the form of the distribution within the range.
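
The observations on Example 24 can be verified with a short sketch. The function name `value_range` is chosen here only to avoid clashing with Python's built-in `range`.

```python
from collections import Counter

# The three series of Example 24.
series = {
    "A": [6, 46, 46, 46, 46, 46, 46, 46],
    "B": [6, 6, 6, 6, 46, 46, 46, 46],
    "C": [6, 10, 15, 25, 30, 32, 40, 46],
}


def value_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)


for name, values in series.items():
    # Same range for every series, but very different frequency patterns.
    print(name, value_range(values), dict(Counter(values)))
```

Only the two extreme values enter the calculation, which is why three quite different distributions produce the same range.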

Merits
• It is simple to understand.

• It is easy to compute.

Demerits
• It is not based on each and every item of the series.

• Range takes no account of the form of the distribution.

• It is subject to sampling fluctuations.

Measures of Position
A measure of position determines the position of a single value in relation to other values in a
sample or a population data set.

Definition 1.21. Quartiles are three values that divide a ranked data set into four equal parts.

If we divide the data into two equal parts, median solves the purpose. If we divide the
data into four equal parts, we need quartiles for the same. The second quartile is the same as
median of a data set. The first quartile is the median of the observations that are less than
the median and the third quartile is the median of the observations that are greater than the
median.

Definition 1.22. Inter quartile range (IQR) is the difference between third quartile and first
quartile of a dataset. It contains middle 50% of the observations.

Inter quartile range = Q3 − Q1 .

Example 25. A sample of 12 commuter students was selected from a college. The following
data give the typical oneway commuting times (in minutes) from home to college for these 12
students.
29 14 39 17 7 47 63 37 42 18 24 55

1. Find the values of the three quartiles.

2. Where does the commuting time of 47 fall in relation to the three quartiles ?

3. Find the interquartile range.

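Answer: First we rank the given data in increasing order as follows:

7 14 17 18 24 29 37 39 42 47 55 63

Here Q1 lies between 17 and 18, Q2 between 29 and 37, and Q3 between 42 and 47.

1.
\[ Q_2 = \frac{29 + 37}{2} = 33 \quad \text{(Quartile 2)} \]
\[ Q_1 = \frac{17 + 18}{2} = 17.5 \quad \text{(Quartile 1)} \]
\[ Q_3 = \frac{42 + 47}{2} = 44.5 \quad \text{(Quartile 3)} \]
The three quartiles are 17.5, 33 and 44.5 minutes.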

2. Looking at the position of the commuting time of 47 in the ranked data, it falls in the top 25%
of the commuting times, i.e., to the right of Q3.

3. Inter quartile range = (Q3 − Q1 ) = (44.5 − 17.5) = 27 minutes, i.e., the commuting time
of the middle 50% students varies within a range of 27 minutes.

Interpretations
• 25% of these 12 students in this sample commute for less than 17.5 minutes.

• 75% of them commute for less than 44.5 minutes and 25% of them commute for more than
44.5 minutes.

• 50% of the students commute for less than 33 minutes and 50% of the students commute
for more than 33 minutes.
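
The quartile rule used in Example 25 (the median of the lower half and the median of the upper half of the ranked data) can be sketched as below. The function name `quartiles` is hypothetical, and excluding the middle observation from both halves when the number of observations is odd is an assumption about how the rule is applied. Library routines such as `statistics.quantiles` use interpolation conventions that can give slightly different values.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule."""
    ranked = sorted(values)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]
    # For odd n the middle observation is left out of both halves (assumption).
    upper = ranked[half:] if n % 2 == 0 else ranked[half + 1:]
    return median(lower), median(ranked), median(upper)


# Commuting times from Example 25.
times = [29, 14, 39, 17, 7, 47, 63, 37, 42, 18, 24, 55]

q1, q2, q3 = quartiles(times)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 17.5 33.0 44.5 27.0
```

The same function can be used to check a hand calculation for other datasets, keeping in mind the convention assumed for an odd number of observations.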

Example 26. The following are the ages (in years) of nine employees of an insurance company:

47 28 39 51 33 37 59 24 33

1. Find the values of the three quartiles. Where does the age of 28 years fall in relation to
the ages of these employees ?

2. Find the interquartile range.

Merits of IQR
• It is based on middle 50 percent of observations.

• It is very useful in open end distributions.

• It is not affected by extreme values.

Demerits of IQR
• It is not capable of further algebraic manipulation.

• It shows a distance on a scale. It does not show the scatter around an average.

• It is a positional average.

Standard Deviation
The standard deviation is the most used measure of dispersion. Greater the amount of disper-
sion or variability, greater is the standard deviation and greater will be the magnitude of the
deviations of the values from the mean.
A lower value of the standard deviation for a dataset indicates that the values of that
dataset are spread over a relatively smaller range around the mean. A larger value of standard
deviation for a dataset indicates that the values of dataset are spread over a relatively larger
range around the mean.
A small standard deviation represents a high degree of uniformity of the observations as
well as homogeneity of a series. Thus, if we have two or more comparable series with identical or
nearly identical means, it is the distribution with the smallest standard deviation that has the
most representative mean.
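The standard deviation is obtained by taking the positive square root of the variance. Some basic
formulas are:

\[ \sigma^2 = \frac{1}{N} \sum (X_i - \mu)^2, \qquad s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}, \]

\[ \sigma = \sqrt{\frac{1}{N} \sum (X_i - \mu)^2} \quad \text{and} \quad s = \sqrt{\frac{1}{n - 1} \sum (x_i - \bar{x})^2}, \]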

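where \(\sigma^2\) is the population variance, \(\sigma\) is the population standard deviation, \(s^2\) is the
sample variance and \(s\) is the sample standard deviation. These can be further simplified as:

\[ \sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}, \qquad s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}, \]

\[ \sigma = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}} \quad \text{and} \quad s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}. \]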
Example 27. The following data gives the 2015 earnings (in thousands of dollars) before taxes
for all six employees of a small company.

88.50 108.40 65.50 52.50 79.80 54.60

Calculate the variance and standard deviation for this data.

Answer: We can calculate the variance and standard deviation from the following table:
Earnings (x)           x^2
88.50                  7832.25
108.40                 11750.56
65.50                  4290.25
52.50                  2756.25
79.80                  6368.04
54.60                  2981.16
\(\sum x = 449.30\)    \(\sum x^2 = 35{,}978.51\)
Here, N = 6.
\[ \text{Variance } (\sigma^2) = \frac{35{,}978.51 - \frac{(449.30)^2}{6}}{6} = 388.90. \]
The standard deviation is obtained by taking the (positive) square root of the variance:
\[ \sigma = \sqrt{388.90} = 19.721, \]
i.e., $19.721 thousand.
Thus, the standard deviation of the 2015 earnings of all six employees of this company is
$19.721 thousand.
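
As a cross-check of Example 27, the sketch below computes the population variance in two ways: from the definition (mean squared deviation from µ) and from the simplified formula based on the sum of x and the sum of x². The function names are hypothetical and chosen only for this illustration.

```python
import math

# 2015 earnings (thousands of dollars) of all six employees, Example 27.
earnings = [88.50, 108.40, 65.50, 52.50, 79.80, 54.60]


def population_variance(values):
    """Definition: average squared deviation from the population mean."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def population_variance_shortcut(values):
    """Simplified formula: (sum of x^2 - (sum of x)^2 / N) / N."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / n


variance = population_variance(earnings)
shortcut = population_variance_shortcut(earnings)
sigma = math.sqrt(variance)

print(round(variance, 2), round(shortcut, 2), round(sigma, 3))
```

Both routes agree up to floating-point rounding. The shortcut needs only the two column totals of the table, which is why it is convenient for hand calculation.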

Example 28. The following data represents the compensation (millions of dollars) of 11 female
CEO’s of American companies.

19.3 16.2 19.6 19.3 33.7 21.0 22.5 16.9 28.7 42.1 22.2

Find the variance and standard deviation for this data.

Answer: We can calculate the variance and standard deviation from the following table:

Compensation (x)    x²
19.3                372.49
16.2                262.44
19.6                384.16
19.3                372.49
33.7                1135.69
21.0                441.00
22.5                506.25
16.9                285.61
28.7                823.69
42.1                1772.41
22.2                492.84
Σx = 261.50         Σx² = 6849.07

Here, n = 11.
\[
s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{6849.07 - \frac{(261.5)^2}{11}}{10}
    = \frac{6849.07 - 6216.5682}{10} = 63.2502.
\]
Therefore, the standard deviation is
\[
s = \sqrt{63.2502} = \$7.95 \text{ million}.
\]

Thus, the standard deviation of the compensations of these 11 female CEOs of American
companies is $7.95 million.
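
As a hedged check of Example 28, the sketch below evaluates the short-cut formula with the (n − 1) divisor and compares it with `statistics.variance` and `statistics.stdev`, which also divide by (n − 1). The variable names are illustrative only.

```python
import statistics

# Compensation (millions of dollars) of the 11 CEOs, treated as a sample
compensation = [19.3, 16.2, 19.6, 19.3, 33.7, 21.0, 22.5, 16.9, 28.7, 42.1, 22.2]

n = len(compensation)
sum_x = sum(compensation)
sum_x2 = sum(x * x for x in compensation)

# Short-cut formula for the sample variance (divisor n - 1)
sample_variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
sample_sd = sample_variance ** 0.5

# The library functions for sample data use the same (n - 1) divisor
library_variance = statistics.variance(compensation)
library_sd = statistics.stdev(compensation)

print(f"s^2 = {sample_variance:.4f}, s = {sample_sd:.2f}")
```

Using `statistics.pvariance` here instead would divide by n and give a smaller value, so the choice of function should follow whether the data are a population or a sample.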
Remark 1.1. Σx² is not the same as (Σx)². Σx² is obtained by squaring the x values first
and then adding them. The value of (Σx)² is obtained by squaring the value of Σx.

Remark 1.2. The choice of (n − 1) in the sample variance and standard deviation formulas
acknowledges the loss of degrees of freedom when estimating population parameters from a sam-
ple. This adjustment enhances the accuracy and unbiased nature of the estimation, particularly
in scenarios involving smaller sample sizes.

Remark 1.3. Another reason for dividing by (n − 1) instead of n in s² is that there are only
(n − 1) independent deviations (xi − x̄). Because their sum is always zero, the value of any
one particular deviation is equal to the negative of the sum of the other (n − 1) deviations.
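
The fact that the deviations from the sample mean always sum to zero follows in one line from the definition of x̄. The short derivation below is added only as a reminder and uses the same symbols as the formulas above:

```latex
\[
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0,
\qquad \text{since } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .
\]
```

So once any (n − 1) of the deviations are known, the remaining one is fixed.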

Remark 1.4. The values of the variance and standard deviation are never negative. They can
become zero but not negative.

Remark 1.5. The measurement units of the variance are always the square of the measurement
units of the original data.

Merits
• It is the best measure of dispersion. It is based on each and every observation.

• It is possible to calculate the combined standard deviation of two or more groups.

• For comparing the variability of two or more distributions, the coefficient of variation,
which is built from σ and µ, is considered to be appropriate.

• It has sampling stability.

Demerits
• It is difficult to compute.

• It gives more weight to extreme items and less weight to those near the mean, because
the deviations are squared and large deviations therefore contribute disproportionately.

Coefficient of Variation
Standard deviation is an absolute measure of dispersion. The corresponding relative measure
is known as coefficient of variation. It is used in such problems where we want to compare the
variability of two or more than two series that have different units of measurement. Even if
the measurement units are same, the two means may be quite different. The series for which
coefficient of variation is less, is said to be less variable or more consistent, more uniform, more
stable or more homogeneous. The formula is given as follows:

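\[
\text{coefficient of variation} = \frac{\sigma}{\mu} \times 100\% \quad \text{(population)},
\]
\[
\text{coefficient of variation} = \frac{s}{\bar{x}} \times 100\% \quad \text{(sample)},
\]
\[
\text{coefficient of standard deviation} = \frac{\sigma}{\mu}.
\]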
Remark 1.6. The coefficient of variation is independent of the unit of measurement.

Remark 1.7. The coefficient of variation expresses the standard deviation as a percentage of
the mean.

Example 29. The yearly salaries of all employees working for a large company have a mean of
$72,350 and a standard deviation of $12,820. The years of schooling (education) for the same
employees have a mean of 15 years and a standard deviation of 2 years. Is the relative variation
in the salaries higher or lower than that in years of schooling for these employees ?

Answer: Both the variables have different units of measurement (dollars and years). So, we
need to calculate the coefficient of variation for each of these datasets.
\[
\text{CV for salaries} = \frac{\sigma}{\mu} \times 100\% = \frac{12820}{72350} \times 100\% = 17.72\%
\]
\[
\text{CV for education} = \frac{\sigma}{\mu} \times 100\% = \frac{2}{15} \times 100\% = 13.33\%
\]

This implies the salaries have a higher relative variation than the years of schooling.
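
The comparison in Example 29 can be reproduced with a small helper. The function name `coefficient_of_variation` is our own and is only meant as a sketch of the formula above.

```python
def coefficient_of_variation(sd, mean):
    """Return the standard deviation as a percentage of the mean."""
    return sd / mean * 100

cv_salary = coefficient_of_variation(sd=12820, mean=72350)   # measured in dollars
cv_schooling = coefficient_of_variation(sd=2, mean=15)       # measured in years

print(f"CV for salaries:  {cv_salary:.2f}%")
print(f"CV for education: {cv_schooling:.2f}%")

# The series with the smaller CV is the more consistent one
more_variable = "salaries" if cv_salary > cv_schooling else "years of schooling"
print("Higher relative variation:", more_variable)
```

Because the units cancel in the ratio, the two percentages can be compared directly even though one series is in dollars and the other in years.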

Example 30 (Exercise). The scores of two batsmen A and B in ten innings during a certain
season are:
A : 32 28 47 63 71 39 10 60 96 14
B : 19 31 48 53 67 90 10 62 40 80
Find which of the two batsmen is more consistent in scoring.

Which measure of dispersion to use ?


1. If the observations are few in numbers or contain extreme values, we should avoid the
standard deviation.

2. If the dataset has gaps around the quartiles, the quartile deviation should be avoided.

3. If there are open end classes, the quartile measure of dispersion should be preferred.

4. In all the general cases, we can use range, quartile deviation and standard deviation.
However, the standard deviation uses every observation and is generally the most accurate
of the three.

Use of Standard Deviation


Theorem 1.1. (Chebyshev’s Theorem) For any number k > 1, at least (1 − 1/k²) of the data
values lie within k standard deviations of the mean, i.e.,
\[
P(\mu - k\sigma < X < \mu + k\sigma) \geq 1 - \frac{1}{k^2}.
\]
Example 31. The average systolic blood pressure for 4000 women who were screened for high
blood pressure was found to be 187 mm Hg with a standard deviation of 22. Using Chebyshev’s
theorem, find the minimum percentage of women in this group who have a systolic blood pressure
between 143 and 231 mm Hg.

Answer: Let µ and σ be the mean and standard deviation of the systolic blood pressures of
these women. According to the question,

µ = 187, σ = 22.
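To find the percentage, we first need to find k. As per the question, the lower limit is 143
and the upper limit is 231. Hence,
\[
\begin{aligned}
\mu - k\sigma = 143 &\Rightarrow 187 - k\sigma = 143 \\
                    &\Rightarrow k\sigma = 44 \\
                    &\Rightarrow k(22) = 44 \\
                    &\Rightarrow k = 2.
\end{aligned}
\]
The upper limit gives the same value, since 187 + 2(22) = 231. Therefore, for k = 2,
\[
\text{minimum proportion of data values} = 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4}.
\]
Hence, according to Chebyshev’s theorem, at least 75% of the women have a systolic blood
pressure between 143 and 231 mm Hg.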
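
The two steps of Example 31 (find k, then apply the bound) can be sketched as a small function. The name `chebyshev_lower_bound` and its argument names are hypothetical, and the function assumes the interval is symmetric about the mean, as it is in the example.

```python
def chebyshev_lower_bound(mean, sd, lower, upper):
    """Minimum proportion of values inside (lower, upper) by Chebyshev's theorem."""
    k = (upper - mean) / sd
    # The theorem as stated needs an interval symmetric about the mean with k > 1
    if abs((mean - lower) / sd - k) > 1e-9 or k <= 1:
        raise ValueError("need a symmetric interval with k > 1")
    return 1 - 1 / k ** 2

bound = chebyshev_lower_bound(mean=187, sd=22, lower=143, upper=231)
print(f"At least {bound:.0%} of the values lie between 143 and 231")
```

The result is a guaranteed minimum, not an estimate: the actual percentage in a given dataset may be considerably higher.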

Example 32 (Exercise). The ages of a sample of 5000 persons are distributed with a mean of
40 years and a standard deviation of 12 years. Determine the approximate percentage of people
whose ages lie between 4 and 76 years.

Empirical Rule
Chebyshev’s theorem is applicable to a distribution of any shape. However, the empirical rule
applies only to a specific shape of distribution called a bell shaped distribution (Normal Curve).
It states that for a bell shaped distribution, approximately

1. 68% of the observations lie within one standard deviation of the mean.

2. 95% of the observations lie within two standard deviations of the mean.

3. 99.7% of the observations lie within three standard deviations of the mean.

Remark 1.8. Please note that since k > 1, Chebyshev’s theorem cannot be used to bound the
percentage of observations within one standard deviation of the mean: for k = 1 the bound
1 − 1/k² equals 0, which says nothing. The empirical rule does cover this case, but only for
bell shaped distributions. For k = 2 and k = 3 the two results are consistent: Chebyshev’s
theorem guarantees at least 75% and about 88.9% of the observations, and the empirical rule’s
approximate 95% and 99.7% satisfy these bounds. Moreover, Chebyshev’s theorem gives a lower
bound whereas the empirical rule gives an approximate value.
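
To see how the two results relate, the sketch below tabulates Chebyshev's minimum next to the exact proportion for a normal curve, computed with `statistics.NormalDist` from the Python standard library. The dictionary name `coverage` is our own.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

coverage = {}
for k in (1, 2, 3):
    # Exact proportion within k standard deviations of the mean for a normal curve
    normal = standard_normal.cdf(k) - standard_normal.cdf(-k)
    # Chebyshev's guaranteed minimum is only stated for k > 1
    chebyshev = 1 - 1 / k ** 2 if k > 1 else None
    coverage[k] = (chebyshev, normal)
    bound = "not applicable" if chebyshev is None else f"at least {chebyshev:.1%}"
    print(f"k = {k}: Chebyshev {bound}, normal curve {normal:.1%}")
```

The normal-curve figures are always above the Chebyshev minimum, which is expected because Chebyshev's theorem must hold for every shape of distribution.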

2 Skewness
Measures of central tendency and measures of dispersion don’t reveal the entire story. There are
two other comparable characteristics called skewness and kurtosis that help us to understand a
distribution. Two distributions may have the same mean and standard deviation but may differ
widely in their overall appearance as can be seen.

In both the distributions, the mean and standard deviation are the same (x̄ = 15, σ = 6). The
distribution on the left hand side is a symmetrical one whereas the distribution on the right
hand side is asymmetrical or skewed.

Definition 2.1. Skewness refers to the asymmetry or lack of symmetry in the shape of a fre-
quency distribution.

Symmetrical distribution
In a symmetrical distribution, the values of mean, median and mode coincide. The spread of
the frequencies is the same on both the sides of centre point of the curve.

Positively Skewed distribution


In this case, mean is greater than mode and median lies in between mean and mode. The
frequencies are spread over a greater range of values on the high value end of the curve (the
right hand side).

Negatively Skewed distribution
Here, mode is greater than mean and median lies in between mean and mode. The excess tail
is on the left hand side.

Remark 2.1. In moderately asymmetrical distributions, the interval between the mean and
median is approximately one third of the interval between the mean and mode.

Remark 2.2. Dispersion is concerned with the amount of variation rather than its direction.
Skewness tells us about the direction of the variation or the departure from symmetry.

Measures related to Skewness


Let us discuss a few measures related to skewness.

1. Absolute measure of skewness:

Absolute skewness = ( Mean − Mode )

Observations
(a) If the value of mean is greater than mode, skewness will be positive.
(b) If the value of mode is greater than mean, skewness will be negative.
(c) If the values of mean and mode are same, there is no skewness in the data.

Absolute measure of skewness has two issues:

(a) Two series may have different units.


(b) Difference between mean and mode might be considerable in one series and smaller
in another series.

Hence, we can use a relative measure of skewness.

2. Karl Pearson’s coefficient of skewness:

Mean − Mode
Coefficient of Skewness =
Standard Deviation

Observations
(a) There is no theoretical limit to this coefficient, but in practice its value usually lies
between −1 and +1.
(b) When the distribution is symmetrical, the values of mean, median and mode coincide.
Hence, coefficient of skewness will be zero.
(c) When a distribution is positively skewed, the coefficient of skewness will be positive.
(d) When a distribution is negatively skewed, the coefficient of skewness will be negative.

Example 33. Find the absolute coefficient of skewness from the following data:

6 12 18 24 30 36 42 24

Answer: Here, mean = 192/8 = 24 and mode = 24. Therefore, absolute coefficient of skewness is
mean − mode = 24 − 24 = 0. Hence, the distribution is symmetrical.
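
The same calculation, together with Karl Pearson's coefficient, can be sketched with the standard `statistics` module. Treating the eight values as a population (so that `pstdev` is used for the standard deviation) is an assumption made here for illustration.

```python
import statistics

data = [6, 12, 18, 24, 30, 36, 42, 24]

mean = statistics.mean(data)
mode = statistics.mode(data)      # the most frequent value
sd = statistics.pstdev(data)      # population standard deviation (assumed)

absolute_skewness = mean - mode
pearson_coefficient = (mean - mode) / sd

print(f"mean = {mean}, mode = {mode}")
print(f"absolute skewness = {absolute_skewness}, Pearson coefficient = {pearson_coefficient}")
```

Since the numerator is zero, the coefficient is zero whatever the value of the standard deviation.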

Example 34 (Exercise). Analyze the skewness of the frequency distribution, given by,

2 3 4 5 6 4

3 Kurtosis
Definition 3.1. Kurtosis refers to the degree of flatness or peakedness in the region about the
mode of a frequency curve. The degree of kurtosis of a distribution is measured relative to the
peakedness of a normal curve.

Interpretations
1. If a curve is more peaked than the normal curve, it is called leptokurtic. In such a case,
items are more closely bunched around the mode. Curve L is more peaked than M and is
called leptokurtic.

2. The normal curve itself is known as mesokurtic. Curve M is normal one and it is called
mesokurtic.

3. If a curve is more flat topped than the normal curve, it is called platykurtic. Curve P is
less peaked than M and is called platykurtic.

Skewness and Kurtosis in terms of Moments


We can also calculate skewness and kurtosis in terms of moments of a distribution. Moments
help to characterise a distribution. We have

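\[
\beta_1 = \frac{\mu_3^2}{\mu_2^3} \quad \text{and} \quad \beta_2 = \frac{\mu_4}{\mu_2^2}.
\]

β1 measures skewness and β2 measures kurtosis. We can also have two other measures:
\[
\gamma_1 = \sqrt{\beta_1} \quad \text{and} \quad \gamma_2 = \beta_2 - 3,
\]
where γ1 is given the sign of µ3, so that it can be negative.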

γ1 measures skewness and γ2 measures kurtosis.

Interpretations
1. When the value of γ1 = 0, the distribution is symmetrical.

2. When the value of γ1 is greater than 0, the distribution is positively skewed.

3. When the value of γ1 is less than 0, the distribution is negatively skewed.

Interpretations
1. For any curve, if β2 = 3, the curve is mesokurtic.

2. When the value of β2 is greater than 3, the curve is leptokurtic.

3. When the value of β2 is less than 3, the curve is platykurtic.

Interpretations
1. For any curve, if γ2 = 0, the curve is mesokurtic.

2. When the value of γ2 is greater than 0, the curve is leptokurtic.

3. When the value of γ2 is less than 0, the curve is platykurtic.

Example 35. The first four central moments of a distribution are 0, 2.5, 0.7 and 18.75. Test
the skewness and kurtosis of the distribution.
Answer: Here, µ1 = 0, µ2 = 2.5, µ3 = 0.7 and µ4 = 18.75. We know that
\[
\beta_1 = \frac{\mu_3^2}{\mu_2^3}.
\]
Therefore,
\[
\beta_1 = \frac{(0.7)^2}{(2.5)^3} = +0.031
\]
and
\[
\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{18.75}{(2.5)^2} = \frac{18.75}{6.25} = 3.
\]
Hence, the distribution is slightly positively skewed and mesokurtic.
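
The moment coefficients of Example 35 can be computed with the short sketch below. The function name `moment_coefficients` is hypothetical, and giving γ1 the sign of µ3 is an assumption made so that γ1 can indicate negative skewness, as the interpretations require.

```python
import math

def moment_coefficients(mu2, mu3, mu4):
    """Return (beta1, beta2, gamma1, gamma2) from the central moments."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    # gamma1 is the square root of beta1 carrying the sign of mu3 (assumed convention)
    gamma1 = math.copysign(math.sqrt(beta1), mu3)
    gamma2 = beta2 - 3
    return beta1, beta2, gamma1, gamma2

beta1, beta2, gamma1, gamma2 = moment_coefficients(mu2=2.5, mu3=0.7, mu4=18.75)
print(f"beta1 = {beta1:.3f}, beta2 = {beta2:.1f}")
print(f"gamma1 = {gamma1:.3f}, gamma2 = {gamma2:.1f}")
```

A positive γ1 with γ2 equal to zero matches the conclusion of the example: slight positive skewness and a mesokurtic curve.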

Example 36 (Exercise). In a moderately skewed frequency distribution, the mean is 20 and the
median is 18.5. If the coefficient of variation is 30%, find the Pearsonian coefficient of skewness.

Box and Whisker Plot


A box and whisker plot gives a graphic representation of data using five measures: the median,
the first quartile, the third quartile, and the smallest and the largest values in the data set
between the lower and the upper inner fences.
This helps us to visualize the center, the spread and the skewness of the data set. It helps
to detect outliers.
Definition 3.2. A plot that shows the center, spread and skewness of a data set is called a Box
and Whisker plot.
Example 37. The following data are the incomes for a sample of 12 households.

75 69 84 112 74 104 81 90 94 144 79 98


Construct a box and whisker plot for this data.

Answer: First we need to rank the data. The ranked data are

69 74 75 79 81 84 90 94 98 104 112 144

Let us find the quartiles and IQR first. They can be obtained as:
\[
Q_2 = \frac{84 + 90}{2} = 87
\]
\[
Q_1 = \frac{75 + 79}{2} = 77
\]
\[
Q_3 = \frac{98 + 104}{2} = 101
\]
\[
\text{IQR} = Q_3 - Q_1 = (101 - 77) = 24.
\]

Now, we have to find the points that are 1.5 × IQR below Q1 and 1.5 × IQR above Q3.
These two points are called the lower and the upper inner fences respectively.

1.5 × IQR = (1.5 × 24) = 36


Lower inner fence = 77 − 36 = 41
Upper inner fence = 101 + 36 = 137

Next, we have to find the smallest and largest values in the given dataset within two inner
fences.

Smallest value = 69
Largest value = 112
Now, draw a horizontal line and mark all the income levels on it such that all the values in
the given dataset are covered. Draw a box with its left side at the position of the first quartile
and the right side at the position of the third quartile. Draw a vertical line at the position of
the median.

We now draw two lines joining the box to the smallest and the largest values within the
two inner fences. These values are 69 and 112. The two lines that join the box to
these two values are called whiskers. This completes the box and whisker plot.

The observations that fall outside the two inner fences are called outliers. There are two
types of outliers - mild and extreme outliers. To find the values of outer fences, we can subtract
the value of 3× IQR from the first quartile and add the value of 3× IQR to third quartile. In
our case,
3 × IQR = (3 × 24) = 72
Lower outer fence = 77 − 72 = 5
Upper outer fence = 101 + 72 = 173

Interpretations
1. If an observation is outside either of the two inner fences but within the two outer fences,
it is called a mild outlier.

2. An observation that is outside either of the two outer fences is called an extreme outlier.

3. If the line representing the median is in the middle of the box and the two whiskers are of
about the same length, then the data have a symmetric distribution.

4. The distribution is skewed to the right if the median is to the left of the center of the box
with the right side whisker equal to or longer than the whisker on the left side.

5. The distribution is skewed to the left if the median is to the right of the center of the box
with the left side whisker equal to or longer than the whisker on the right side.
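
The numbers behind the box and whisker plot of Example 37 can be reproduced with the sketch below. It finds the quartiles as medians of the lower and upper halves of the ranked data, which matches the hand calculation because the number of observations is even; other quartile conventions exist and may give slightly different values. All variable names are our own.

```python
import statistics

incomes = [75, 69, 84, 112, 74, 104, 81, 90, 94, 144, 79, 98]
ranked = sorted(incomes)
half = len(ranked) // 2

# Quartiles as medians of the two halves (valid as written for an even count)
q1 = statistics.median(ranked[:half])
q2 = statistics.median(ranked)
q3 = statistics.median(ranked[half:])
iqr = q3 - q1

lower_inner, upper_inner = q1 - 1.5 * iqr, q3 + 1.5 * iqr
lower_outer, upper_outer = q1 - 3 * iqr, q3 + 3 * iqr

# Whiskers end at the smallest and largest values inside the inner fences
inside = [x for x in ranked if lower_inner <= x <= upper_inner]
whisker_low, whisker_high = min(inside), max(inside)

# Mild outliers lie between the inner and outer fences, extreme ones beyond
mild = [x for x in ranked
        if not (lower_inner <= x <= upper_inner) and lower_outer <= x <= upper_outer]
extreme = [x for x in ranked if x < lower_outer or x > upper_outer]

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
print(f"inner fences = ({lower_inner}, {upper_inner}), whiskers = ({whisker_low}, {whisker_high})")
print(f"mild outliers = {mild}, extreme outliers = {extreme}")
```

For these data the value 144 lies above the upper inner fence but below the upper outer fence, so it is flagged as a mild outlier.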

Example 38 (Exercise). Construct a box and whisker plot for the following data:

53 61 67 71 89 107 122 136 175 208 247 258 361 391 781

4 Probability and distributions


4.1 Basic Probability Concepts
Definition 4.1. An experiment is a process that, when performed, results in one of many
observations. These observations are called outcomes of the experiment. The collection of all
outcomes for an experiment is called the sample space.

Experiment Outcomes Sample Space


Play lottery Win, Lose {Win, Lose}
Take a test Pass, Fail {Pass, Fail}
Roll a die once 1, 2, 3, 4, 5, 6 {1, 2, 3, 4, 5, 6}
Toss a coin once Head, Tail {Head, Tail}

Example 39. What is the sample space when a coin is tossed twice ?

Answer: We may get a head or tail on the first toss and a head or tail on the second toss
respectively. Therefore,

sample space = {(H, H), (H, T ), (T, H), (T, T )}.

Remark 4.1. If you toss one coin twice or two coins once, sample space will remain the same.

Example 40. What is the sample space when two dice are thrown once ?

Answer: The first die may show 1, 2, 3, 4, 5 or 6. Similarly, the second die may show
1, 2, 3, 4, 5 or 6. Therefore,

sample space = {(1, 1), (1, 2), . . . , (1, 6), (2, 1), (2, 2), . . . , (2, 6), . . . , (6, 1), (6, 2), . . . , (6, 6)}.

Remark 4.2. If you throw one die twice or two dice once, sample space will remain the same.
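
Sample spaces such as those in Examples 39 and 40 can be listed mechanically. The sketch below uses `itertools.product` from the Python standard library; the variable names are illustrative.

```python
from itertools import product

# Two tosses of a coin: every ordered pair of H and T
two_tosses = list(product("HT", repeat=2))

# Two dice: every ordered pair of faces 1 to 6
two_dice = list(product(range(1, 7), repeat=2))

print(two_tosses)
print(len(two_dice), "outcomes, from", two_dice[0], "to", two_dice[-1])
```

The same list is produced whether we think of one coin tossed twice or two coins tossed once, which is the point of the two remarks above.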

Example 41. Suppose we randomly pick families consisting of two children and we assume that
the child could be a boy or a girl. What is the sample space ?

Answer: The first child could be a boy or a girl. Similarly, for the second child, we have two
options, i.e., a boy or a girl. Therefore,

sample space = {(G, B), (B, G), (G, G), (B, B)}.

Definition 4.2. An event is a collection of one or more outcomes of the experiment.

Definition 4.3. A simple event is the one that includes only one outcome of the experiment.

Definition 4.4. A compound event consists of more than one outcome for an experiment.

Example 42. In a group of college students, some like ice tea and others do not. There is no
student in this group who is indifferent or has no opinion. Two students are randomly selected
from this group.

1. How many outcomes are possible ? List all possible outcomes.

2. Consider the following events. List all the outcomes included in each of these events.
Mention whether each of these events is a simple or a compound event.

(a) Both students like ice tea.


(b) At most one student likes ice tea.
(c) At least one student likes ice tea.
(d) Neither student likes ice tea.

Answer: Let L denote the event that a student likes ice tea and N denote the event that a
student does not like ice tea.
(1) The experiment has four outcomes. Let us define:
LL = Both the students like ice tea,
LN = The first student likes ice tea and the second student does not,
NL = The first student does not like ice tea and the second student likes ice tea,
NN = Neither of them likes ice tea.
(2) The events are:

(a) Both students like ice tea = {(L, L)}. This event includes only one outcome. Hence, it is
a simple event.

(b) The event at most one student likes ice tea = {(N, N), (L, N), (N, L)}. This event includes
three outcomes. Hence, it is a compound event.

(c) The event at least one student likes ice tea = {(L, N), (N, L), (L, L)}. This event includes
three outcomes. Hence, it is a compound event.

(d) The event neither student likes ice tea = {(N, N)}. This event includes only one outcome.
Hence, it is a simple event.
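
The four events of Example 42 can also be built as subsets of the sample space. This is a minimal sketch: the dictionary keys are our own labels, and an event is classed as simple when it contains exactly one outcome.

```python
from itertools import product

# Each outcome is a pair such as ("L", "N"): first student likes, second does not
sample_space = set(product("LN", repeat=2))

events = {
    "both like": {o for o in sample_space if o.count("L") == 2},
    "at most one likes": {o for o in sample_space if o.count("L") <= 1},
    "at least one likes": {o for o in sample_space if o.count("L") >= 1},
    "neither likes": {o for o in sample_space if o.count("L") == 0},
}

for name, outcomes in events.items():
    kind = "simple" if len(outcomes) == 1 else "compound"
    print(f"{name}: {sorted(outcomes)} -> {kind} event")
```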

Idea of Probability
We may hear some statements like:
1. It may rain tomorrow.

2. Mr. X may not come for taking his class today.

3. The chances of both the teams A and B winning today’s match are equal.

4. I may not be able to join you at the coffee on Monday.


Definition 4.5. Probability is the likelihood of occurrence of an event. It is denoted by P . It is
a numerical measure of the likelihood that a specific event will occur.

Four conceptual approaches to probability


1. Classical Probability
Various outcomes for an experiment may have the same probability of occurrence. Such
outcomes are called equally likely outcomes. The classical probability rule is applied to
compute the probabilities of events for an experiment for which all outcomes are equally
likely.
Definition 4.6. Two or more outcomes that have the same probability of occurrence are
said to be equally likely outcomes.

Classical probability rule to find probability

Suppose Ei is a simple event and A is a compound event for an experiment with equally
likely outcomes. Then by the classical probability rule:

\[
P(E_i) = \frac{1}{\text{Total number of outcomes for the experiment}},
\]
\[
P(A) = \frac{\text{Number of outcomes favourable to } A}{\text{Total number of outcomes for the experiment}}.
\]
Example 43. Find the probability of obtaining an even number in one roll of a die.

Answer: When we roll a die, it has six outcomes. Therefore,

sample space = {1, 2, 3, 4, 5, 6}.

Given that the die is fair, these outcomes are equally likely. Let A be the event that an
even number is observed on the die. Event A includes three outcomes, 2, 4, 6. Favourable
cases for A = {2, 4, 6}.
\[
\therefore \; P(A) = \frac{\text{Favourable cases to } A}{\text{Total number of outcomes}} = \frac{3}{6} = \frac{1}{2}.
\]
Example 44. Jim and Kim have been looking for a house to buy in New Jersey. They like
five of the homes they have looked at recently and two of those are in West Orange. They
cannot decide which of the five homes they should pick to make an offer. They put five
balls (of the same size) marked 1 through 5 (each number representing a home) in a box
and asked their daughter to select one of these balls. Assuming their daughter’s selection
is random, what is the probability that the selected home is in West Orange?

Answer: Here, there are 5 houses. Only two out of these five homes are in West Orange
and all the outcomes are equally likely due to the fact that their daughter’s selection is
random. Let A be the event that the selected home is in West Orange.
\[
\therefore \; P(A) = \frac{\text{Favourable cases to } A}{\text{Total number of outcomes}} = \frac{2}{5}.
\]
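
The classical rule used in Examples 43 and 44 amounts to counting. The sketch below uses `fractions.Fraction` so that the answers stay exact; the function name `classical_probability` is hypothetical, and labelling the two West Orange homes as balls 1 and 2 is an assumption, since the example does not say which numbers they carry.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Favourable outcomes over total outcomes, assuming equally likely outcomes."""
    outcomes = list(sample_space)
    favourable = [outcome for outcome in outcomes if outcome in event]
    return Fraction(len(favourable), len(outcomes))

# Example 43: an even number in one roll of a fair die
p_even = classical_probability({2, 4, 6}, range(1, 7))

# Example 44: five homes, two of them (assumed to be balls 1 and 2) in West Orange
p_west_orange = classical_probability({1, 2}, range(1, 6))

print(p_even, p_west_orange)
```

The rule is only valid when every outcome is equally likely, which is why the fairness of the die and the randomness of the selection are stated in the examples.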

Need to look for another approach


Suppose we want to calculate the following probabilities:

• The probability that a bulb will burn less than 2000 hours.
• The probability that a randomly selected family owns a home.
• The probability that a randomly selected person owns a sport utility vehicle.
• The probability that a randomly selected woman is a foodie.

These probabilities cannot be computed using the classical probability rule because the
various outcomes for the corresponding experiments are not equally likely. For example, the
two outcomes “Family owns a home” and “Family does not own a home” are not equally likely.
If they were, exactly half of all families would own their homes, which is
not necessarily true. Generally, the classical definition is difficult to apply as soon as we
deviate from the fields of coins, dice, cards etc.
In such cases, to calculate probabilities, we either use past data or generate new data
by performing the experiment a large number of times. Using these data, we calculate
the frequencies and relative frequencies for various outcomes. The method of assigning a
probability to an event in the context of relative frequency is called the relative frequency
concept of probability.

2. Relative Frequency Approach


If an experiment is repeated n times and an event A is observed f times where f is the
frequency, then according to the relative frequency concept of probability:
\[
P(A) = \frac{f}{n} = \frac{\text{frequency of } A}{\text{sample size}}.
\]

If we toss a coin 10 times, we may not get exactly 5 heads and 5 tails. However, as the
experiment is carried out a larger and larger number of times, say, a coin is tossed 10,000
times, we can expect the proportions of heads and tails to be very close to 50%. In practice,
we can never obtain the exact probability of an event in this way. We can only obtain a close
estimate of P(A) based on a large number of observations. This approach is also called
empirical probability.

Remark 4.3. Relative frequencies are not exact probabilities but are approximate proba-
bilities unless they are based on a census. However if the experiment is repeated again and
again, this approximate probability of an outcome obtained from the relative frequency will
approach the actual probability of that outcome. This is called the Law of Large Numbers.

Definition 4.7. Law of large numbers states that if an experiment is repeated again and
again, the probability of an event obtained from the relative frequency approaches the actual
(or theoretical) probability.
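
The law of large numbers can be illustrated by simulating coin tosses. The sketch below is an assumption-laden illustration rather than a proof: it uses Python's pseudo-random generator with an arbitrary fixed seed so that the run is reproducible, and the function name is our own.

```python
import random

def relative_frequency_of_heads(n_tosses, rng):
    """Toss a fair coin n_tosses times and return the relative frequency of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

rng = random.Random(2024)   # arbitrary fixed seed, chosen only for reproducibility

frequencies = {n: relative_frequency_of_heads(n, rng) for n in (10, 1_000, 100_000)}
for n, freq in frequencies.items():
    print(f"n = {n:>7}: relative frequency of heads = {freq:.4f}")
```

With only 10 tosses the relative frequency can be far from 0.5, while with 100,000 tosses it is typically within a fraction of a percentage point of it.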

Example 45. Ten of the 500 randomly selected cars manufactured at a certain auto factory
are found to be “Suzuki Dzire”. Assuming that Suzuki Dzires are manufactured randomly,
what is the probability that the next car manufactured at this auto factory is a “Suzuki
Dzire” ?

Answer: Let n denote the total number of cars in the sample and f the number of Suzuki
Dzires in n. Then from the given information:

n = 500 and f = 10.

Using the relative frequency concept of probability, we obtain


\[
P(\text{next car is a Suzuki Dzire}) = \frac{f}{n} = \frac{10}{500} = 0.02.
\]
The probability is actually the relative frequency of Suzuki Dzires in 500 cars. The
distribution can be seen in the following table.

Car             Frequency    Relative Frequency
Suzuki Dzire    10           10/500 = 0.02
Other           490          490/500 = 0.98
Total           500          1.00

From this table,

P ( next car is a Suzuki Dzire ) = 0.02 and
P ( next car is not a Suzuki Dzire ) = 0.98.
Example 46. Allison wants to determine the probability that a randomly selected family
from New York state owns a home. How can she determine this probability ?

Answer: This is not a case of equally likely outcomes. Hence, the classical probability
rule cannot be applied. However we can repeat this experiment again and again. In other
words, we can select a sample of families from New York State and observe whether or
not each of them owns a home.

Suppose Allison selects a random sample of 1000 families from New York State and observes
that 730 of them own homes and 270 do not own homes. Then,

n = sample size = 1000

f = number of families who own homes = 730

Therefore,
\[
P(\text{a randomly selected family owns a home}) = \frac{f}{n} = \frac{730}{1000} = 0.730.
\]

Remark 4.4. Suppose she picks a different sample or repeats this experiment, she may
obtain a different probability for this event. However because this sample size is large
(n = 1000), the variation is expected to be relatively small.

Need to look for an approach other than the previous two
Many times, we face experiments that neither have equally likely outcomes nor can be
repeated to generate data. In such cases, we cannot compute the probabilities of events
using the classical probability rule or the relative frequency concept. For example, consider
the probabilities of the following events:

• The probability that Carol, who is taking a statistics course for the first time, will
earn an A in the course.
• The probability that the Dow Jones industrial average will be higher at the end of
the next trading day.
• The probability that the New York Giants will win the Super Bowl next season
provided they are playing for the first time.
• The probability that Joe will lose the lawsuit he has filed against his landlord.

The two events “Carol will earn A” and “Carol will not earn A” are not equally likely. Also
she cannot take the tests again and again to calculate the relative frequency of getting or
not getting an A grade. She will take the test only once. The probability assigned to an
event in such cases is called subjective probability.
It is based on the individual’s judgement, experience, information and belief. Carol may
be very confident and assign a higher probability to the event that she will earn an A in
Statistics whereas her instructor may be more cautious and assign a lower probability to
the same event.

3. Subjective Probability

Definition 4.8. Subjective probability is the probability assigned to an event based on
subjective judgement, experience, information and belief. There are no definite rules to
assign such probabilities.

Because it depends on personal judgement, this approach can lead to somewhat biased results.

4. Axiomatic Approach to Probability
Here, only axioms or postulates have been given. These axioms were developed by Prof.
A. N. Kolmogorov. They are:

(a) The probability of an event ranges from 0 to 1.


(b) The probability of the entire sample space is 1.
(c) If A and B are mutually exclusive (or disjoint) events, then the probability of occur-
rence of either A or B, denoted by P (A ∪ B) shall be given by:

P (A ∪ B) = P (A) + P (B).

The last approach leads us to the following properties.

Properties of probability
1. Whether it is a simple or compound event, the probability of an event is never less than
zero or greater than one. An event that cannot occur has zero probability and is called an
impossible event. An event that is certain to occur has a probability equal to one and is
called a sure event. Some examples are:

P ( getting a 7 in throw of a die ) = 0


P ( Sun will rise in the East ) = 1
P ( a born child will eventually die one day ) = 1

2. The sum of the probabilities of all simple events (or final outcomes) for an experiment,
denoted by ΣP(Ei), is always 1.

Uses of Probability
Some of the uses of probability are as follows:

1. To determine lifespan of a radioactive atom.

2. To predict the outcome of crossing two species of plants.

3. To construct econometric models, and to deal with managerial decisions on planning and
control, with the occurrence of accidents of all kinds and with random disturbances in an
electrical mechanism.

Independent versus dependent events


If we toss two coins, the outcome of the first toss does not affect the outcome of the second toss.
Whether we get a head or tail in the first toss, the probability of obtaining a head in the second
toss is still 0.50 and that of a tail is 0.50. This is an example of a statistical experiment where
the outcomes of two tosses are independent.

Definition 4.9. Two events are said to be independent if the occurrence of one event does
not affect the probability of the occurrence of the other event. In other words, A and B are
independent events if either

P (A/B) = P (A) or P (B/A) = P (B).


Definition 4.10. If the occurrence of one event affects the probability of the occurrence
of the other event, then the two events are said to be dependent events. In other words, A and B
are dependent events if either
P (A/B) ≠ P (A) or P (B/A) ≠ P (B).
Example 47. In a survey, 500 randomly selected adults who drink coffee were asked whether
they usually drink coffee with or without sugar. Of these 500 adults, 240 are men and 175 drink
coffee without sugar. Of the 240 men, 84 drink coffee without sugar. Are the events drinking
coffee without sugar and men independent?
Answer: Let us have a look at the following contingency table:

          Coffee with sugar    Coffee without sugar    Total
Man       156                  84                      240
Woman     169                  91                      260
Total     325                  175                     500

From the table, we have,
\[
P(\text{drinking coffee without sugar}) = \frac{175}{500} = 0.35
\]
\[
P(\text{Man}) = \frac{240}{500} = \frac{12}{25} = 0.48
\]
\[
P(\text{drinking coffee without sugar}/\text{Man}) = \frac{84}{240} = 0.35
\]
Since P(drinking coffee without sugar/Man) = P(drinking coffee without sugar), the two
events drinking coffee without sugar and men are independent.
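
The independence check of Example 47 can be sketched directly from the cell counts. Exact fractions are used so that the comparison of the two probabilities is not affected by rounding; the dictionary layout and variable names are our own.

```python
from fractions import Fraction

# Cell counts from the contingency table of Example 47
counts = {
    ("man", "with sugar"): 156, ("man", "without sugar"): 84,
    ("woman", "with sugar"): 169, ("woman", "without sugar"): 91,
}
total = sum(counts.values())

men = sum(v for (sex, _), v in counts.items() if sex == "man")
without_sugar = sum(v for (_, coffee), v in counts.items() if coffee == "without sugar")

p_without = Fraction(without_sugar, total)                        # marginal probability
p_without_given_man = Fraction(counts[("man", "without sugar")], men)   # conditional on Man

independent = p_without == p_without_given_man
print(p_without, p_without_given_man, independent)
```

With real survey data the two probabilities will rarely match exactly, so in practice an approximate comparison or a formal test is used instead of strict equality.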
Definition 4.11. The complement of event A, denoted by Ā and read as “A bar” or “A com-
plement” is the event that includes all the outcomes for an experiment that are not in A. Math-
ematically,
P (A) + P (Ā) = 1.
Example 48. In a group of 2000 taxpayers, 400 have been audited by the auditors at least once.
If one taxpayer is randomly selected from this group, what are the two complementary events for
this experiment and what are their probabilities ?
Answer: The two complementary events are:
A = the selected taxpayer has been audited by auditors at least once.
Ā = the selected taxpayer has never been audited by the auditors.

Event A includes the 400 taxpayers who have been audited by the auditors at least once,
and Ā includes the 1600 taxpayers who have never been audited by the auditors. Therefore,
P (A) = 400/2000 = 0.20
P (Ā) = 1600/2000 = 0.80.

Marginal Probability
Suppose all 100 employees of a company were asked whether they are in favour of or against
paying high salaries to CEOs of U.S. companies. We assume that every employee responds either
in favour or against.
In Favour Against
Male 15 45
Female 4 36
The above table shows the distribution of 100 employees based on two variables or char-
acteristics:
1. Gender (Male or Female)
2. Opinion (In Favour or Against)
Such a table is called a contingency table or a two way table. Each box that contains a number
is called a cell.
In Favour Against Total
Male 15 45 60
Female 4 36 40
Total 19 81 100
Definition 4.12. Marginal probability is the probability of a single event without consideration
of any other event.
For example,
P(male) = (Number of males)/(Total number of employees) = 60/100 = 0.60.
P(female) = 40/100 = 0.40.
P(in favour) = 19/100 = 0.19.
P(against) = 81/100 = 0.81.
Now, suppose that one employee is selected at random from these 100 employees. Further-
more, assume that it is known that this employee is a male. In other words, the event that the
employee selected is a male has already occurred. Given that this selected employee is a male,
he can be in favour or against. What is the probability that the employee selected is in favour
of paying high salary to CEOs ?

Definition 4.13. Conditional probability is the probability that an event will occur given that
another event has already occurred. If A and B are two events, then the conditional probability
of A given B is written as

P (A/B)

and read as “the probability of A given that B has already occurred.”

Example 49. From the contingency table of the 100 employees above, calculate the conditional
probability that a randomly selected employee is a female given that this employee is in favour
of paying high salaries to CEOs.
Answer: We are to find the probability of a female given that the selected employee is in favour,
that is, we are to compute P(female/in favour). Hence,
P(female/in favour) = (Number of females who are in favour)/(Total number of employees who are in favour) = 4/19.
Example 50. From the contingency table of the 100 employees above, calculate the conditional
probability that a randomly selected employee is in favour of paying high salaries to CEOs given
that this employee is a male.
Answer: We are to find the probability of an employee in favour given that the selected
employee is a male, that is, we are to compute P(in favour/male). Hence,
P(in favour/male) = (Number of males who are in favour)/(Total number of employees who are males) = 15/60 = 1/4.
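The two conditional probabilities in Examples 49 and 50 can be checked with a short Python sketch. The dictionary layout and the helper name count are illustrative choices and are not part of the notes.

```python
from fractions import Fraction

# Counts from the two-way table of 100 employees (gender x opinion).
counts = {
    ("male", "in favour"): 15,
    ("male", "against"): 45,
    ("female", "in favour"): 4,
    ("female", "against"): 36,
}

def count(gender=None, opinion=None):
    """Number of employees matching the given gender and/or opinion."""
    return sum(n for (g, o), n in counts.items()
               if gender in (None, g) and opinion in (None, o))

# P(female / in favour) = females in favour / all in favour
p_female_given_favour = Fraction(count("female", "in favour"), count(opinion="in favour"))
# P(in favour / male) = males in favour / all males
p_favour_given_male = Fraction(count("male", "in favour"), count(gender="male"))

print(p_female_given_favour)  # 4/19
print(p_favour_given_male)    # 1/4
```

Dividing a cell count by a row or column total is all that conditioning on an event amounts to in a contingency table.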
Definition 4.14. Two events are said to be mutually exclusive if they cannot occur together,
i.e., two events cannot happen simultaneously in a single trial. For example, head and tail cannot
occur simultaneously in a single trial. Mathematically,

P (A ∩ B) = 0.

Example 51. Consider the following events for one roll of a die:
A = an even number is observed = {2, 4, 6}
B = an odd number is observed = {1, 3, 5}
C = a number less than 5 is observed = {1, 2, 3, 4}
Are events A and B mutually exclusive ? Are events A and C mutually exclusive ?
Answer: We observe that events A and B have no common element. For one roll of a die, only
one of the two events A and B can happen. Hence these are two mutually exclusive events.
Events A and C have two common outcomes: 2 and 4. Hence, events A and C are not mutually
exclusive.
Example 52. Consider the following two events for a randomly selected adult:
Y = this adult has shopped at least once on the internet.
N = the adult has never shopped on the internet.
Are events Y and N mutually exclusive ?
Answer: We observe that events Y and N have no common element. Therefore, events Y and
N are mutually exclusive.
Example 53. From the contingency table of the 100 employees above, determine whether the events
in favour and female are independent.
Answer: To check the independence of the two events, we check whether
the conditional probability is equal to the marginal probability, i.e.,
P(in favour/female) = (Number of females who are in favour)/(Total number of employees who are females) = 4/40 = 1/10 = 0.10.
Similarly,
P(in favour) = 19/100 = 0.19.
We can see that the two probabilities are not equal. Hence, the events are dependent.

Example 54. In a survey, 500 randomly selected adults who drink coffee were asked whether
they usually drink coffee with or without sugar. Of these 500 adults, 240 are men and 175 drink
coffee without sugar. Of the 240 men, 84 drink coffee without sugar. Are the events drinking
coffee without sugar and men independent?

Answer: For the given data, we have the following table:


Coffee with Sugar Coffee without Sugar Total
Men 156 84 240
Women 169 91 260
Total 325 175 500

P(drinks coffee without sugar/the selected adult is a man)
= (Number of men who drink coffee without sugar)/(Total number of men) = 84/240 = 7/20.

P(drinks coffee without sugar) = 175/500 = 7/20.
We can see that both the probabilities are equal. Hence, the events are independent.
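As a minimal sketch, the independence check of Example 54 can be reproduced in Python with exact fractions. The variable names are assumptions made for illustration.

```python
from fractions import Fraction

total = 500            # adults surveyed
men = 240              # men in the sample
no_sugar = 175         # adults who drink coffee without sugar
men_no_sugar = 84      # men who drink coffee without sugar

p_no_sugar = Fraction(no_sugar, total)               # marginal probability
p_no_sugar_given_man = Fraction(men_no_sugar, men)   # conditional probability

# The events are independent exactly when conditioning on "man"
# leaves the probability unchanged.
independent = p_no_sugar_given_man == p_no_sugar
print(p_no_sugar, p_no_sugar_given_man, independent)  # 7/20 7/20 True
```

Exact fractions avoid the rounding issues that can make two equal probabilities look different when compared as decimals.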

Multiplication Theorem
If we have two events that are independent of each other, then,

P (A ∩ B) = P (A).P (B).
However, if the two events are dependent, then, we have,

P (A ∩ B) = P (A).P (B/A) or P (A ∩ B) = P (B).P (A/B).

Example 55. An office building has two fire detectors. The probability is .02 that any fire
detector of this type will fail to go off during a fire. Find the probability that both of these fire
detectors will fail to go off in case of a fire. Assume that these two fire detectors are independent
of each other.

Answer: It is given that these two fire detectors are independent of each other. Let A denote
the event that first fire detector fails to go off and let B be the event that second fire detector
fails to go off.
Hence,
P (A ∩ B) = P (A).P (B) = 0.02 × 0.02 = 0.0004.

Example 56. Suppose we have a table:
College Graduate Not a College Graduate Total
Male 7 20 27
Female 4 9 13
Total 11 29 40
If one of these employees is selected at random for membership on the employee–management
committee, what is the probability that this employee is a female and a college graduate?
Answer: Let A be the event that the employee is a female and B be the event that the employee
is a college graduate. Clearly, we can see that the events are dependent. Therefore,
P(A ∩ B) = P(A) × P(B/A) = 13/40 × 4/13 = 4/40 = 1/10 = 0.1.
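Both forms of the multiplication theorem can be illustrated with the numbers of Examples 55 and 56. This is only a sketch; the variable names are hypothetical.

```python
from fractions import Fraction

# Example 55: independent events, P(A ∩ B) = P(A) P(B).
p_fail = Fraction(2, 100)
p_both_fail = p_fail * p_fail

# Example 56: dependent events, P(A ∩ B) = P(A) P(B/A).
p_female = Fraction(13, 40)
p_graduate_given_female = Fraction(4, 13)
p_female_and_graduate = p_female * p_graduate_given_female

print(float(p_both_fail))      # 0.0004
print(p_female_and_graduate)   # 1/10
```

The second result agrees with reading the cell directly from the table: 4 female college graduates out of 40 employees.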

Addition Theorem
If A and B are two mutually exclusive events, then the probability of either A or B is given by

P (A ∪ B) = P (A) + P (B).

However, if they are not mutually exclusive, then the probability of either A or B is given
by
P (A ∪ B) = P (A) + P (B) − P (A ∩ B).
Example 57. In a group of 2500 persons, 1400 are female, 600 are vegetarian, and 400 are
female and vegetarian. What is the probability that a randomly selected person from this group
is a male or vegetarian?
Answer: We have the following table:
Vegetarian Non-Vegetarian Total
Female 400 1000 1400
Male 200 900 1100
Total 600 1900 2500
Let M be the event that the person is a male and V be the event that the person is a vegetarian.
Therefore,
P(M ∪ V) = P(M) + P(V) − P(M ∩ V) = 1100/2500 + 600/2500 − 200/2500 = 1500/2500 = 0.6.
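The addition theorem for events that are not mutually exclusive can be sketched in Python using the counts of Example 57. The variable names are illustrative assumptions.

```python
from fractions import Fraction

total = 2500
female, vegetarian, female_vegetarian = 1400, 600, 400

male = total - female                              # 1100
male_vegetarian = vegetarian - female_vegetarian   # 200

# P(M ∪ V) = P(M) + P(V) − P(M ∩ V)
p_male_or_veg = (Fraction(male, total) + Fraction(vegetarian, total)
                 - Fraction(male_vegetarian, total))
print(p_male_or_veg, float(p_male_or_veg))  # 3/5 0.6
```

The same value follows from the complement: the only persons excluded are the 1000 non-vegetarian females, and 1 − 1000/2500 = 0.6.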
Example 58. A university president proposed that all students must take a course in ethics as
a requirement for graduation. Three hundred faculty members and students from this university
were asked about their opinion on this issue. The following table gives a two-way classification
of the responses of these faculty members and students.
Favour Oppose Neutral Total
Faculty 45 15 10 70
Student 90 110 30 230
Total 135 125 40 300

What is the probability that a randomly selected person from these 300 faculty members and
students is in favor of the proposal or is neutral?
Answer: Let C be the event that the person is in favour and D be the event that the person
is neutral. A person cannot be both in favour and neutral, so C and D are mutually exclusive
and P(C ∩ D) = 0. Therefore,
P(C ∪ D) = P(C) + P(D) − P(C ∩ D) = 135/300 + 40/300 − 0 = 175/300 ≈ 0.583.
Definition 4.15. Two or more events are said to be exhaustive if the totality of all the events
include all possible outcomes of a random experiment. Mathematically,
P (A ∪ B) = 1.

Fundamental Principle of Counting


If an experiment consists of three steps, and if the first step can result in m outcomes, the second
step in n outcomes, and the third step in k outcomes, then total outcomes for the experiment
= m × n × k.

Combinations
For example, a student may be required to attempt any two questions out of four in an exam-
ination. Suppose the four questions are denoted by the numbers 1, 2, 3, and 4. Then the six
selections are
(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4).
Note that in combinations, all selections are made without replacement and order is not im-
portant. Each of the possible selections in the above list is called a combination. All six
combinations are distinct; that is, each combination contains a different set of questions. The
number of combinations for selecting x from n distinct elements is given by the formula
nCx = n!/(x!(n − x)!).

Permutations
The three ways of selecting two marbles out of three (R, G, P) are RG, RP, and GP, where R
represents that a red marble is selected, G means a green marble is selected, and P indicates a
purple marble is selected. In these three combinations, the order of selection is not important,
and, thus, RG and GR represent the same selection. However, if the order of selection is
important, then RG and GR are not the same selections, but they are two different selections.
Similarly, RP and P R are two different selections, and GP and P G are two different selections.
Thus, if the order in which the marbles are selected is important, then there are six selections:
RG, GR, RP, P R, GP, P G.
These six selections are called permutations or arrangements. The following formula is used to find
the number of permutations or arrangements of selecting x items out of n items.
nPx = n!/(n − x)!.

Example 59. An ice cream parlour has six flavours of ice cream. Kristen wants to buy two
flavours of ice cream. If she randomly selects two flavours out of six, how many combinations
are possible?
Answer: Here, total number of ice cream flavours is 6 and number of ice cream flavours to
be selected is 2. Therefore, the number of ways in which Kristen can select two flavours of ice
cream out of six is
6C2 = 6!/(2! × 4!) = (6 × 5 × 4 × 3 × 2 × 1)/((2 × 1) × (4 × 3 × 2 × 1)) = 15.
Thus, there are 15 ways for Kristen to select two ice cream flavors out of six.
Example 60. Marv & Sons advertised to hire a financial analyst. The company has received
applications from 10 candidates who seem to be equally qualified. The company manager has
decided to call only 3 of these candidates for an interview. If she randomly selects 3 candidates
from the 10, how many total selections are possible?
Answer: Here, the total number of applications is 10 and the number of candidates to be selected for
interview is 3. Order does not matter, so the total number of possible selections is
10C3 = 10!/(3! × 7!) = 120.
Thus, there are 120 possible selections.
Example 61. A club has 20 members. They are to select three office holders—president, secre-
tary, and treasurer— for next year. They always select these office holders by drawing 3 names
randomly from the names of all members. The first person selected becomes the president, the
second is the secretary, and the third one takes over as treasurer. Thus, the order in which 3
names are selected from the 20 names is important. Find the total number of arrangements of
3 names from these 20.
Answer: Here, the total number of members is 20 and the number of members to be selected for
positions is 3. Please note that here order is important. Therefore, the total number of
arrangements is
20P3 = 20!/17! = 20 × 19 × 18 = 6840.
Thus, there are 6840 possible arrangements.
Example 62. In a country health department, there are 5 adjacent offices to be occupied by 5
nurses, A, B, C, D and E. In how many ways can the 5 nurses be assigned to the offices ?
Answer: Here, there are 5 offices and 5 nurses to whom these offices will be allocated. Therefore,
the number of ways in which the assignments can be made is

5 P5 = 5! = 120.

Thus, there are 120 total ways for such allocations.
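The counts in Examples 59 to 62 can be verified with Python's standard library, which provides math.comb and math.perm (Python 3.8 or later). The helper functions n_c_x and n_p_x are hypothetical names that simply spell out the factorial formulas given above.

```python
import math

print(math.comb(6, 2))    # Example 59: 15
print(math.comb(10, 3))   # Example 60: 120
print(math.perm(20, 3))   # Example 61: 6840
print(math.perm(5, 5))    # Example 62: 120

def n_c_x(n, x):
    """Combinations: n! / (x! (n - x)!)."""
    return math.factorial(n) // (math.factorial(x) * math.factorial(n - x))

def n_p_x(n, x):
    """Permutations: n! / (n - x)!."""
    return math.factorial(n) // math.factorial(n - x)

print(n_c_x(6, 2), n_p_x(20, 3))  # 15 6840
```

Note that nPx = nCx × x!, since each combination of x items can be arranged in x! orders.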


Example 63. A bag contains 6 white, 4 red and 10 black balls. Two balls are drawn at random.
Find the probability that they both will be black.

Answer: The total number of balls is 20. The number of ways in which 2 balls can be drawn
from 20 balls is
20C2 = (20 × 19)/2 = 380/2 = 190
and 2 black balls can be drawn from 10 black balls in
10C2 = (10 × 9)/2 = 90/2 = 45
ways. Therefore, the probability that the two balls drawn randomly are black is
10C2/20C2 = 45/190 ≈ 0.237.
Example 64. In a single throw of 2 dice, what is the probability of getting a total of 8, and a
total different from 8 ?
Answer: Here, the sample space contains 36 equally likely outcomes. Favourable outcomes for the
event of getting a total of 8 are
(2, 6), (6, 2), (3, 5), (5, 3), (4, 4).
Therefore, the probability of getting a total of 8 is 5/36.
Similarly, the probability of getting a total different from 8 is
1 − 5/36 = 31/36
(∵ the event of getting a total of 8 and the event of not getting a total of 8 are complementary).
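For a small sample space such as this one, the result can be confirmed by listing all outcomes. The following Python sketch enumerates the 36 ordered outcomes of two dice; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))      # 36 ordered outcomes
favourable = [o for o in outcomes if sum(o) == 8]    # totals equal to 8

p_eight = Fraction(len(favourable), len(outcomes))
p_not_eight = 1 - p_eight                            # complement rule

print(len(outcomes), favourable)
print(p_eight, p_not_eight)  # 5/36 31/36
```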
Example 65. From a pack of 52 cards, two cards are drawn at random. Find the probability
that one is a king and other is a queen.
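Answer: Ignoring order, two cards can be drawn from 52 cards in 52C2 = 1326 ways. A king can be
drawn in 4C1 = 4 ways and a queen in 4C1 = 4 ways, so 4 × 4 = 16 of these pairs contain one king
and one queen. Therefore, the required probability is
(4C1 × 4C1)/52C2 = 16/1326 = 8/663 ≈ 0.012.
Equivalently, if the order of the draws is taken into account, there are two possibilities:
• first card is a king and second card is a queen.
• first card is a queen and second card is a king.
The ordered count gives the same probability: (4 × 4 × 2)/(52 × 51) = 32/2652 = 8/663.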
Example 66. A card is drawn from a pack of playing cards and then another card is drawn
without the first being replaced. What is the probability of drawing two aces, and two spades
respectively (both questions are different) ?
Answer: Two cards can be drawn from 52 cards in 52C2 ways. Two aces can be drawn from
four aces in 4C2 ways. Two spades can be drawn from thirteen spades in 13C2 ways. Therefore,
the probability of drawing two aces is
4C2/52C2 = (4 × 3)/(52 × 51) = 12/2652 ≈ 0.0045.
Similarly, the probability of drawing two spades is
13C2/52C2 = (13 × 12)/(52 × 51) = 156/2652 ≈ 0.059.
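The card probabilities of Examples 65 and 66 can be recomputed as ratios of combination counts. This is a sketch under the assumption of a standard 52-card pack; the variable names are hypothetical.

```python
from fractions import Fraction
from math import comb

pairs = comb(52, 2)   # unordered two-card hands: 1326

p_king_and_queen = Fraction(comb(4, 1) * comb(4, 1), pairs)
p_two_aces = Fraction(comb(4, 2), pairs)
p_two_spades = Fraction(comb(13, 2), pairs)

print(p_king_and_queen, round(float(p_king_and_queen), 4))  # 8/663 0.0121
print(p_two_aces, round(float(p_two_aces), 4))              # 1/221 0.0045
print(p_two_spades, round(float(p_two_spades), 4))          # 1/17 0.0588
```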

4.2 Random Variable
Definition 4.16. A random variable is a variable whose value is determined by the outcome of
a random experiment.
Some examples are:
• Winning a lottery.
• Your car breaking down.
• Getting sick.
• Getting involved in an accident.
• Making money in the stock market.
• Number of customers visiting a bank, grocery store etc.
• Number of cars passing on a bridge on a given day.
We cannot predict when, where and to whom these things can or will happen. In other words,
there is no definite time, place and person for these events to happen.
In tossing a coin, the random variable can take either of the values: head and tail. In
throwing a die, the random variable can take any of the values: 1, 2, 3, 4, 5, 6. In appearing for an
exam, the random variable can take either of the values: pass, fail. For the number of vehicles owned
by families, the random variable can take any of the values: 0, 1, 2, 3, 4 etc.
Definition 4.17. A random variable that assumes countable values is called a discrete random
variable.
Some examples are:
1. The number of cars sold at a dealership during a given month.
2. The number of houses in a certain block.
3. The number of complaints received at the office of an airline on a given day.
4. The number of customers present in a bank during a specific hour.
5. The number of telephone calls in a telephone exchange.
Definition 4.18. A random variable that can assume any value contained in one or more
intervals is called a continuous random variable.
Some examples are:
1. The time taken to commute from home to work.
2. The weight of a letter.
3. Lifetime of a battery.
4. Height of a Student.

5 Probability distribution of a discrete random variable
Definition 5.1. The probability distribution of a discrete random variable lists all the possible
values that the random variable can assume and their corresponding probabilities.

Characteristics of a discrete probability distribution


1. All the individual probabilities lie between 0 and 1, i.e., 0 ≤ P (x) ≤ 1, for each value of x.

2. The sum of all the probabilities should be equal to 1.

Example 67. Determine whether or not each table represents a valid probability distribution.
1.
X       0      1      2      3
P(X)    0.08   0.11   0.39   0.27

2.
X       2      3      4      5
P(X)    0.25   0.34   0.28   0.13

3.
X       7      8      9
P(X)    0.70   0.50   −0.20
Answer:

1. Here, all the probabilities lie between 0 and 1 but the sum of all the probabilities is 0.85.
Hence, this is not a probability distribution.

2. In this case, all the probabilities lie between 0 and 1 and the sum of all the probabilities
is 1. Hence, this is a probability distribution.

3. Here, the sum of all the probabilities is 1, but one of the probabilities is negative, i.e., less
than 0. Hence this is not a probability distribution.
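The two characteristics can be turned into a small checking function. The sketch below is one possible way to do this in Python; the function name is_valid_distribution is an assumption, and math.isclose is used so that floating-point rounding does not spoil the comparison with 1.

```python
import math

def is_valid_distribution(probabilities):
    """Return True if every value lies in [0, 1] and the values sum to 1."""
    in_range = all(0 <= p <= 1 for p in probabilities)
    sums_to_one = math.isclose(sum(probabilities), 1.0)
    return in_range and sums_to_one

print(is_valid_distribution([0.08, 0.11, 0.39, 0.27]))  # False (sum is 0.85)
print(is_valid_distribution([0.25, 0.34, 0.28, 0.13]))  # True
print(is_valid_distribution([0.70, 0.50, -0.20]))       # False (negative value)
```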

Example 68. The following table lists the probability distribution of the number of breakdowns
per week for a machine based on past data.

Breakdowns per week   0      1      2      3
Probability           0.15   0.20   0.35   0.30

Find the probability that the number of breakdowns for this machine during a given week is

1. exactly 2

2. 0 to 2

3. more than 1

4. at most 1

Answer: Let X be the number of breakdowns for this machine during a given week. From the
table, we have:

1. P (exactly two breakdowns) = P (X = 2) = 0.35
2. P (0 to 2 breakdowns) = P (X = 0 or 1 or 2) = P (X = 0) + P (X = 1) + P (X = 2)
= 0.15 + 0.20 + 0.35 = 0.70
3. P (more than 1 breakdown) = P (X > 1) = P (X = 2) + P (X = 3) = 0.35 + 0.30 = 0.65
4. P(at most 1 breakdown) = P(X ≤ 1) = P(X = 0) + P(X = 1) = 0.15 + 0.20 = 0.35
Example 69. A public health nurse has a caseload of 50 families. Let us construct the probability
distribution of X, the number of children per family for this population.
X = x Frequency Relative Frequency
0 1 1/50
1 4 4/50
2 6 6/50
3 4 4/50
4 9 9/50
5 10 10/50
6 7 7/50
7 4 4/50
8 2 2/50
9 2 2/50
10 1 1/50
What is the probability that a family picked at random from 50 will have:
1. fewer than 5 children ?
2. 5 or more children ?
3. between 3 and 6 children (both inclusive) ?
Answer: Let X be the number of children per family for this population. From the table, we
have:
1. P(fewer than 5 children) = P(X < 5) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4)
= 1/50 + 4/50 + 6/50 + 4/50 + 9/50 = 24/50 = 0.48.
2. P(5 or more children) = P(X ≥ 5) = 1 − P(X < 5) = 1 − 24/50 = 26/50 = 0.52.
3. P(between 3 and 6 children (both inclusive)) = P(3 ≤ X ≤ 6) = P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6)
= 4/50 + 9/50 + 10/50 + 7/50 = 30/50 = 0.6.
Definition 5.2. The mean of a discrete random variable X is the value that is expected to occur
per repetition, on average, if an experiment is repeated a large number of times. It is denoted
by µ and calculated as
µ = Σ x P(X = x).
The mean of a discrete random variable X is also called its expected value and is denoted by
E(X) = Σ x P(X = x).

Example 70. Baier’s Electronics manufactures computer parts that are supplied to many com-
puter companies. Despite the fact that two quality control inspectors at Baier’s Electronics
check every part for defects before it is shipped to another company, a few defective parts do pass
through these inspections undetected. Let X denote the number of defective computer parts in a
shipment of 400. The following table gives the probability distribution of X.
X 0 1 2 3 4 5
P (X = x) 0.02 0.20 0.30 0.30 0.10 0.08

Compute the mean.

Answer: We have the following table.


x       P(X = x)   x P(X = x)
0       0.02       0
1       0.20       0.20
2       0.30       0.60
3       0.30       0.90
4       0.10       0.40
5       0.08       0.40
                   Σ x P(X = x) = 2.50
Therefore, the mean is µ = Σ x P(X = x) = 2.50. Hence, the average number of defective computer
parts in a shipment of 400 is 2.50.

Definition 5.3. The standard deviation of a discrete random variable X measures the spread
of its probability distribution and is computed as:
σ = √(Σ x^2 P(X = x) − µ^2)
  = √(E(X^2) − (E(X))^2)

Example 71. For the question in Example 70, compute the standard deviation.

Answer: We construct the following table.


x       P(X = x)   x P(X = x)   x^2     x^2 P(X = x)
0       0.02       0            0       0
1       0.20       0.20         1       0.20
2       0.30       0.60         4       1.20
3       0.30       0.90         9       2.70
4       0.10       0.40         16      1.60
5       0.08       0.40         25      2.00
                   Σ x P(X = x) = 2.50  Σ x^2 P(X = x) = 7.70
From the table, µ = Σ x P(X = x) = 2.50 and Σ x^2 P(X = x) = 7.70. Hence,
σ = √(Σ x^2 P(X = x) − µ^2) = √(7.70 − (2.50)^2) = √1.45 = 1.204.

Thus, a given shipment of 400 computer parts is expected to contain an average of 2.50 defective
parts with a standard deviation of 1.204 defective parts. Now, we observe,

2.50 − 2(1.204) = 0.092

2.50 + 2(1.204) = 4.908.

Using Chebyshev’s Theorem with k = 2, we can state that at least 75% of the shipments (each containing
400 computer parts) are expected to contain 0.092 to 4.908 defective computer parts each.
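The mean, the standard deviation and the two-standard-deviation interval for the distribution of Example 70 can be recomputed with a few lines of Python. This is a minimal sketch; the dictionary dist and the other names are illustrative.

```python
import math

# Probability distribution of X from Example 70: value -> probability.
dist = {0: 0.02, 1: 0.20, 2: 0.30, 3: 0.30, 4: 0.10, 5: 0.08}

mean = sum(x * p for x, p in dist.items())               # Σ x P(X = x)
second_moment = sum(x**2 * p for x, p in dist.items())   # Σ x^2 P(X = x)
sd = math.sqrt(second_moment - mean**2)

# Chebyshev: at least 1 - 1/k^2 of the values lie within k standard
# deviations of the mean; k = 2 gives at least 75%.
k = 2
low, high = mean - k * sd, mean + k * sd

print(round(mean, 2), round(sd, 3))    # 2.5 1.204
print(round(low, 3), round(high, 3))   # 0.092 4.908
```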

Example 72. Loraine Corporation is planning to market a new makeup product. According to
the analysis made by the financial department of the company, it will earn an annual profit of
$4.5 million if this product has high sales, it will earn an annual profit of $1.2 million if the sales
are mediocre, and it will lose $2.3 million a year if the sales are low. The probabilities of these
three scenarios are 0.32, 0.51, and 0.17, respectively.

1. Let X be the profits (in millions of dollars) earned per annum from this product by the
company. Write the probability distribution of X.

2. Calculate the mean and standard deviation of X.

Answer:

1. Let X be the profits (in millions of dollars) earned per annum from this product by the
company. The probability distribution of X is given by,

X P(X = x)
4.5 0.32
1.2 0.51
-2.3 0.17

2. We have the following table:

x       P(X = x)   x P(X = x)   x^2 P(X = x)
4.5     0.32       1.44         6.48
1.2     0.51       0.612        0.7344
−2.3    0.17       −0.391       0.8993
Total              1.661        8.1137
Therefore, mean = µ = Σ x P(X = x) = 1.661 and standard deviation
σ = √(Σ x^2 P(X = x) − µ^2) = √(8.1137 − (1.661)^2) = √5.3548 = 2.314.

6 Binomial distribution
Binomial distribution is a discrete probability distribution, i.e., the associated random variable
X must be a discrete random variable.

Conditions of a Binomial Experiment
1. There are n identical trials.

2. Each trial has only two possible outcomes (or events). The outcomes should be mutually
exclusive.

3. The probabilities of two outcomes remain constant.

4. The trials are independent.

Example 73. Suppose we toss a coin thrice and are interested in finding the probability of getting
two heads in three tosses. The coin will be tossed thrice under similar conditions. Each
toss has two outcomes, i.e., head and tail. The probabilities of getting a head or a tail remain
constant from toss to toss. The tosses are independent of each other.
Example 74. Given that 70% of students at a college use facebook, we may want to find the
probability that in a random sample of 5 students at this college, exactly three use facebook. The
five students are selected one by one from the same college. The selected student may be either
a facebook user or not. The probabilities of a selected student being a facebook user (0.70) or not being
a facebook user (0.30) remain the same for each selection. The selections of students are independent of each other.
For a Binomial experiment, the probability of exactly x successes in n trials is given by
the Binomial formula. The Binomial formula is also called the probability mass function of the
Binomial distribution. It is written as,

P(X = x) = nCx p^x q^(n−x),   x = 0, 1, . . . , n,
where n = number of trials
x = number of successes in n trials
p = probability of success
q = 1 − p = probability of failure
n − x = number of failures in n trials.

Example 75. 75% of students at a college with a large student population use the social media
site instagram. Three students are randomly selected from this college. What is the probability
that exactly two of these three students use instagram ?
Answer: As per the given information,

n = number of trials = number of students = 3


x = number of successes = number of students in three who use instagram = 2
p = probability of success = probability that a student uses instagram = 0.75
q = 1 − p = probability that a student does not use instagram = 0.25

Therefore,
P(X = 2) = 3C2 (0.75)^2 (0.25)^1
= 3(0.5625)(0.25) = 0.4219.
Hence, there is about a 42% chance that exactly two of the three selected students use instagram.
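The Binomial formula translates directly into code. The following Python sketch defines a hypothetical helper binomial_pmf and applies it to Example 75; it is an illustration of the formula rather than a library routine.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * q^(n - x), with q = 1 - p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Example 75: n = 3 students, p = 0.75, exactly x = 2 successes.
print(binomial_pmf(2, 3, 0.75))  # 0.421875

# The probabilities over x = 0, 1, ..., n add up to 1.
print(sum(binomial_pmf(x, 3, 0.75) for x in range(4)))  # 1.0
```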

Example 76. At the Express House Delivery Service, providing high quality service to customers
is the top priority of the management. The company guarantees a refund of all charges if a
package it is delivering does not arrive at its destination by the specified time. It is known from
the past data that despite all efforts, 2% of the packages mailed through this company do not
arrive at their destinations within the specified time. Suppose a corporation mails 10 packages
through Express House Delivery Service on a certain day.
1. Find the probability that exactly one of these 10 packages will not arrive at its destination
within the specified time.

2. Find the probability that at most one of these 10 packages will not arrive at its destination
within the specified time.
Answer: As per the given information,

n = number of trials = number of packages = 10
x = number of packages that will not arrive at their destinations within the specified time
p = probability that a package will not arrive at its destination within the specified time = 0.02
q = 1 − p = probability that a package will arrive at its destination within the specified time = 0.98

1. Therefore, P(X = 1) = 10C1 (0.02)^1 (0.98)^9 = 0.1667.
So, we can conclude that there is about a 17% chance that exactly one of these 10 packages will not arrive
at its destination within the specified time.

2. Hence, P(X ≤ 1) = P(X = 0) + P(X = 1) = 10C0 (0.02)^0 (0.98)^10 + 10C1 (0.02)^1 (0.98)^9
= 0.8171 + 0.1667 = 0.9838.
In this case, we can conclude that there is about a 98% chance that at most one of these 10
packages will not arrive at its destination within the specified time.
Example 77. According to a survey, 33% of Americans do not plan to change their jobs in the
near future. Let X denote the number of employees in a random sample of three American
employees who do not plan to change their jobs in the near future. Write the probability distribution
of X.
Answer: We can apply the Binomial distribution here because each trial has only two outcomes, i.e.,
planning to change the job and not planning to change the job in the near future, and
the trials are independent. Here X can take the values 0, 1, 2, 3.

n = number of trials = number of persons in sample = 3
x = number of employees who do not plan to change their jobs in the near future
p = probability that an employee does not plan to change his/her job in the near future = 0.33
q = 1 − p = probability that an employee does plan to change his/her job in the near future = 0.67

Therefore,
1. P(X = 0) = 3C0 (0.33)^0 (0.67)^3 = 0.3008.

2. P(X = 1) = 3C1 (0.33)^1 (0.67)^2 = 0.4444.

3. P(X = 2) = 3C2 (0.33)^2 (0.67)^1 = 0.2189.

4. P(X = 3) = 3C3 (0.33)^3 (0.67)^0 = 0.0359.

Hence, the probability distribution of X is given by,


X P(X = x)
0 0.3008
1 0.4444
2 0.2189
3 0.0359

Example 78. Suppose the mortality rate for a certain disease is 0.10 and suppose 10 people in
a community contract the disease. What is the probability that:

1. None will survive ?

2. Fifty percent will die ?

3. At least one will die ?

4. Exactly three will die ?

Answer: Along the lines of previous solutions, we have p = 0.10 and n = 10. Let X be the
number of people who will die. Hence, we have,
1. P(X = 10) = 10C10 (0.10)^10 (0.90)^0 = 0.0000000001.
2. P(X = 5) = 10C5 (0.10)^5 (0.90)^5 = 0.0015.
3. P(X ≥ 1) = 1 − P(X = 0) = 1 − (0.90)^10 = 0.6513.
4. P(X = 3) = 10C3 (0.10)^3 (0.90)^7 = 0.057.

Mean and Standard Deviation



The mean and standard deviation of a Binomial distribution are µ = np and σ = √(npq)
respectively, where n is the number of trials, p is the probability of success and q is the probability of
failure.
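As a sanity check, the closed forms µ = np and σ = √(npq) can be compared with the general formulas for a discrete random variable. The sketch below does this in Python for n = 10 and p = 0.10, the values of Example 78; the variable names are illustrative.

```python
import math
from math import comb

n, p = 10, 0.10          # values taken from Example 78
q = 1 - p

# Binomial probabilities P(X = x) for x = 0, 1, ..., n.
pmf = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

# General formulas: µ = Σ x P(X = x), σ = √(Σ x^2 P(X = x) − µ^2).
mean = sum(x * px for x, px in enumerate(pmf))
sd = math.sqrt(sum(x**2 * px for x, px in enumerate(pmf)) - mean**2)

print(round(mean, 6), n * p)                          # 1.0 1.0
print(round(sd, 6), round(math.sqrt(n * p * q), 6))   # 0.948683 0.948683
```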

Remark 6.1. The parameters in case of a Binomial distribution are n and p respectively.

7 Poisson distribution
Poisson distribution is another discrete probability distribution, i.e., the associated random
variable X should be a discrete random variable.

Conditions to apply Poisson distribution
1. First Scenario

(a) The occurrences are random.


(b) The occurrences are independent.
(c) There should be some interval associated with the problem, i.e., volume or time
interval etc.

2. Second Scenario: n is large and p is small such that np is finite. In this case,
we can apply the Poisson distribution with λ = np. This is known as the limiting case of the Binomial
distribution, i.e., the Poisson distribution is used as an approximation of the Binomial distribution.

Example 79. Consider the number of telemarketing phone calls received by a household during
a given day. Here, the receiving of a telemarketing phone call by a household is called an
occurrence, the interval is one day (an interval of time), and the occurrences are random (that
is there is no specified time for such a phone call to come in) and discrete. The total number
of telemarketing phone calls received by a household during a given day may be 0, 1, 2, 3, 4, and
so forth. The independence of occurrences in this example means that the telemarketing phone
calls are received individually and none of these phone calls are related.

Example 80. Consider the number of defective items in the next 100 items manufactured on a
machine. In this case, the interval is a volume interval (100 items). The occurrences (number of
defective items) are random and discrete because there may be 0, 1, 2, 3, . . . , 100 defective items
in 100 items. The occurrences of defective items are assumed to be independent of one another.

For a Poisson distribution, the probability of x occurrences in an interval is given by the
Poisson formula. The Poisson formula is also called the probability mass function of the Poisson
distribution. It is written as,

P(X = x) = e^(−λ) λ^x / x!,   x = 0, 1, 2, 3, . . . ,
where λ is the mean number of occurrences in that interval and the value of e is approximately
2.71828.

Example 81. On average, a household receives 9.5 telemarketing phone calls per week. Find
the probability that a randomly selected household receives exactly 6 telemarketing phone calls
during a given week.

Answer: Let λ be the mean number of telemarketing phone calls received by a household per
week. Then, λ = 9.5. Let X be the number of telemarketing phone calls received by a household
during a given week. Therefore,

P(X = 6) = e^(−9.5) (9.5)^6 / 6! = (735091.8906)(0.00007485)/720 = 0.0764.
So, we conclude that there is about an 8% chance that a randomly selected household receives exactly 6
telemarketing phone calls during a given week.
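The Poisson formula can be evaluated in the same way. The helper name poisson_pmf below is an assumption made for this sketch.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x!"""
    return math.exp(-lam) * lam**x / math.factorial(x)

# Example 81: λ = 9.5 calls per week, exactly x = 6 calls.
print(round(poisson_pmf(6, 9.5), 4))  # 0.0764
```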

Example 82. A washing machine in a laundromat breaks down an average of three times per
month. Find the probability that during the next month this machine will have:
1. exactly two breakdowns.
2. at most one breakdown.
Answer: Let λ be the mean number of breakdowns of the washing machine per month. Let X
be the actual number of breakdowns observed during the next month for this machine. Here,
λ = 3.
1. Hence,
P(X = 2) = e^(−3) 3^2 / 2! = (9/2)(0.049787) = 0.2240.
So, we conclude that the chance of the washing machine breaking down exactly twice in the next
month is about 22%.
2. Therefore,
P(X ≤ 1) = P(X = 0) + P(X = 1)
= e^(−3) 3^0 / 0! + e^(−3) 3^1 / 1!
= e^(−3) (1 + 3)
= 4e^(−3)
= 4(0.04978707)
= 0.1991.
So, we conclude that the event of the washing machine breaking down at most once in the next
month has a chance of about 20%.
Example 83. Cynthia’s Mail Order Company provides free examination of its products for 7
days. If not completely satisfied, a customer can return the product within that period and get a
full refund. According to past records of the company, an average of 2 of every 10 products sold
by this company are returned for a refund. Using the Poisson probability distribution formula,
find the probability that exactly 6 of the 40 products sold by this company on a given day will be
returned for a refund.
Answer: The given average of 2 returns applies to 10 products sold, whereas the question concerns
40 products. We therefore rescale the mean to the interval of the question: λ = 2 × 4 = 8. Let
X be the number of products sold by this company on a given day that will be returned for a
refund. Therefore, the required probability is
P(X = 6) = e^(−8) 8^6 / 6! = 0.1221.
There is about a 12% chance that exactly 6 of the 40 products sold by this company on a given day
will be returned for a refund.
Example 84. A hospital administrator who has been studying daily emergency admissions over
a period of several years has concluded that they are distributed according to the Poisson law.
Hospital records reveal that emergency admissions have averaged three per day during this period.
If the administrator is correct in assuming a Poisson distribution, find the probability that:

1. exactly two emergency admissions will occur on a given day.
2. no emergency admissions will occur on a particular day.
3. either three or four emergency cases will be admitted on a particular day.
Answer: Here, λ = 3. Let X be the number of emergency admissions on a given day. Therefore,
1. \( P(X = 2) = \dfrac{e^{-3}\,3^{2}}{2!} = 0.224 \).
2. \( P(X = 0) = \dfrac{e^{-3}\,3^{0}}{0!} = 0.050 \).
3. \( P(X = 3 \text{ or } X = 4) = P(X = 3) + P(X = 4) = \dfrac{e^{-3}\,3^{3}}{3!} + \dfrac{e^{-3}\,3^{4}}{4!} = 0.392 \).
Remark 7.1. One important point about the Poisson probability distribution is that the intervals
for λ and X must be equal. If they are not, the mean λ should be redefined to make them equal.
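The rescaling described in Remark 7.1 can be sketched as follows, using the figures of Example 83 (2 returns per 10 products, question asked for 40 products). The helper names `rescale_mean` and `poisson_pmf` are illustrative assumptions.

```python
import math


def rescale_mean(mean: float, given_interval: float, target_interval: float) -> float:
    """Redefine a Poisson mean so that it refers to the target interval."""
    return mean * target_interval / given_interval


def poisson_pmf(k: int, lam: float) -> float:
    """Return P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# 2 returns per 10 products, rescaled to 40 products.
lam_40 = rescale_mean(2, given_interval=10, target_interval=40)
p_six_returns = poisson_pmf(6, lam_40)
print(f"lambda = {lam_40}, P(X = 6) = {p_six_returns:.4f}")
```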

Mean and Standard Deviation



The mean and standard deviation of a Poisson distribution are µ = λ and σ = √λ respectively.
Remark 7.2. The parameter in case of a Poisson distribution is λ.
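As a numerical check that a Poisson distribution has mean λ and standard deviation √λ, the sketch below sums over the first 60 values of X for λ = 3. The truncation point 60 is an assumption made here; the omitted tail is negligible for such a small λ.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Return P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


lam = 3.0
ks = range(60)  # truncated support; the remaining tail is negligible for lam = 3

mean = sum(k * poisson_pmf(k, lam) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
std_dev = math.sqrt(variance)

print(f"mean = {mean:.4f} (lambda = {lam})")
print(f"standard deviation = {std_dev:.4f} (sqrt(lambda) = {math.sqrt(lam):.4f})")
```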

8 Probability distribution of a continuous random variable
As we have already discussed, a continuous random variable is one that can assume any
value over an interval or intervals. Because the number of values contained in any interval is
infinite, the possible number of values that a continuous random variable can assume is also
infinite. Some examples are:
1. Time taken to complete an examination.
2. Weights of babies.
3. Prices of houses.
Remark 8.1. Variables involving money are often represented by continuous random variables.

Characteristics of a continuous probability distribution


1. The probability that the random variable X assumes a value in any interval lies in the
range 0 to 1.
2. The total probability of all the (mutually exclusive) intervals within which the random
variable X can assume a value is 1.
3. The probability that a continuous random variable X assumes a single value is always
zero.

9 Normal distribution
Normal distribution is a continuous probability distribution, i.e., the corresponding random
variable X must be a continuous random variable.

Example 85. The continuous random variables representing heights and weights of people,
scores of an examination, weights of packages, life of an item, time taken to complete a certain
job have all been observed to have an approximate normal distribution.

Properties of a Normal probability distribution


1. The total area under the curve is 1.

2. The curve is symmetric about the mean.

3. The two tails of the curve extend indefinitely.

For a Normal distribution, probabilities are calculated from the probability density function,
which is given by
\[
f_X(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\; e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}, \qquad -\infty < x < \infty,\ -\infty < \mu < \infty,\ \sigma^{2} > 0.
\]

Standard Normal distribution


It is a special case of the Normal distribution. The normal distribution with µ = 0 and σ = 1 is
called the standard normal distribution. Its probability density function is given by
\[
f_X(x) = \frac{1}{\sqrt{2\pi}}\; e^{-x^{2}/2}, \qquad -\infty < x < \infty.
\]
The random variable that possesses the standard normal distribution is denoted by Z.

Z values or Z scores
The units marked on the horizontal axis of the standard normal curve are denoted by Z and
are called the Z values or Z scores. A specific value of Z gives the distance between the mean
and the point represented by Z in terms of the standard deviation.

Example 86. Find the area under the standard normal curve to the left of Z = 1.95.

Answer: We divide the given number into two portions: 1.9 and 0.05. To find the required
area under the standard normal curve, we locate 1.9 in the column for Z on the left side of the
standard normal distribution table and 0.05 in the row for Z at the top. The entry where this
row and column meet is the required area. Therefore, from the standard normal distribution table,

area to the left of 1.95 = P (Z ≤ 1.95) = 0.9744.

Example 87. Find the area under the standard normal curve from Z = −2.17 to Z = 0.

Answer: We divide −2.17 into two portions: −2.1 and 0.07. To find the required
area under the standard normal curve, we locate −2.1 in the column for Z on the left side of the
standard normal distribution table and 0.07 in the row for Z at the top, which gives
P(Z ≤ −2.17) = 0.0150. Therefore, from the standard normal distribution table,
\[
\begin{aligned}
\text{area from } -2.17 \text{ to } 0 &= P(-2.17 < Z < 0) \\
&= P(Z \le 0) - P(Z \le -2.17) \\
&= 0.5000 - 0.0150 \\
&= 0.4850.
\end{aligned}
\]

Example 88. Find the area under the standard normal curve to the right of Z = 2.32.

Answer: Following a similar procedure, as in the previous examples, we have,

\[
\begin{aligned}
\text{area to the right of } 2.32 &= P(Z > 2.32) \\
&= 1 - P(Z \le 2.32) \\
&= 1 - 0.9898 \\
&= 0.0102.
\end{aligned}
\]

Example 89. Find the probability for the standard normal curve: P (−1.56 < Z < 2.31).

Answer: Proceeding along the same lines as above, we get,

\[
\begin{aligned}
P(-1.56 < Z < 2.31) &= P(Z \le 2.31) - P(Z \le -1.56) \\
&= 0.9896 - 0.0594 \\
&= 0.9302.
\end{aligned}
\]
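The table look-ups in Examples 86 to 89 can be reproduced without a table. The sketch below uses the standard identity Φ(z) = ½[1 + erf(z/√2)] for the standard normal cumulative probability; the function name `phi` is chosen here for illustration.

```python
import math


def phi(z: float) -> float:
    """Area under the standard normal curve to the left of z, i.e. P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


area_left_of_1_95 = phi(1.95)              # Example 86
area_minus_2_17_to_0 = phi(0.0) - phi(-2.17)  # Example 87
area_right_of_2_32 = 1.0 - phi(2.32)       # Example 88
prob_between = phi(2.31) - phi(-1.56)      # Example 89

for label, value in [
    ("P(Z <= 1.95)", area_left_of_1_95),
    ("P(-2.17 < Z < 0)", area_minus_2_17_to_0),
    ("P(Z > 2.32)", area_right_of_2_32),
    ("P(-1.56 < Z < 2.31)", prob_between),
]:
    print(f"{label} = {value:.4f}")
```

Because the table entries are rounded to four decimal places, computed values may differ from table-based answers in the last digit.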

Converting an X value to Z value


For a normal random variable X, a particular value x of X can be converted to its corresponding
Z value by using the formula
\[
z = \frac{x - \mu}{\sigma},
\]
where µ and σ are the mean and standard deviation of the normal distribution of X, respectively.
When X follows a Normal distribution, Z follows a Standard Normal distribution.

Example 90. Let X be a continuous random variable that is normally distributed with a mean
of 25 and a standard deviation of 4. Find the area between:

1. X = 25 and X = 32

2. X = 18 and X = 34.

Answer:

1. For the normal distribution, µ = 25 and σ = 4. We standardize the given normal distribution
by converting X = 25 and X = 32 to their respective Z values.
\[
\text{For } X = 25:\quad Z = \frac{25 - 25}{4} = \frac{0}{4} = 0.
\]
\[
\text{For } X = 32:\quad Z = \frac{32 - 25}{4} = \frac{7}{4} = 1.75.
\]
Therefore,
\[
\begin{aligned}
P(25 < X < 32) &= P(0 < Z < 1.75) \\
&= P(Z \le 1.75) - P(Z < 0) \\
&= 0.9599 - 0.5000 \\
&= 0.4599.
\end{aligned}
\]
2. We standardize the given normal distribution by converting X = 18 and X = 34 to their
respective Z values.
\[
\text{For } X = 18:\quad Z = \frac{18 - 25}{4} = \frac{-7}{4} = -1.75.
\]
\[
\text{For } X = 34:\quad Z = \frac{34 - 25}{4} = \frac{9}{4} = 2.25.
\]
Therefore,
\[
\begin{aligned}
P(18 < X < 34) &= P(-1.75 < Z < 2.25) \\
&= P(Z \le 2.25) - P(Z \le -1.75) \\
&= 0.9878 - 0.0401 \\
&= 0.9477.
\end{aligned}
\]
Example 91. Let X be a normal random variable with its mean equal to 40 and standard
deviation equal to 5. Find the following probabilities for this normal distribution.
1. P (X > 55)
2. P (X < 49)
Answer:
1. In this case, µ = 40 and σ = 5. We standardize the given normal distribution by converting
X = 55 to its corresponding Z value.
\[
\text{For } X = 55:\quad Z = \frac{55 - 40}{5} = \frac{15}{5} = 3.
\]
Therefore,
\[
\begin{aligned}
P(X > 55) &= P(Z > 3) \\
&= 1 - P(Z < 3) \\
&= 1 - 0.9987 \\
&= 0.0013.
\end{aligned}
\]

2. We standardize the given normal distribution by converting X = 49 to its corresponding
Z value.
\[
\text{For } X = 49:\quad Z = \frac{49 - 40}{5} = \frac{9}{5} = 1.80.
\]
Therefore,
\[
\begin{aligned}
P(X < 49) &= P(Z < 1.80) \\
&= 0.9641.
\end{aligned}
\]
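The standardization step in Examples 90 and 91 can be sketched with `statistics.NormalDist` from the Python standard library (available from Python 3.8). The helper names `z_score` and `prob_between` are assumptions made for this illustration.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Convert a value x of a normal random variable to its Z value."""
    return (x - mu) / sigma


def prob_between(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a < X < b) for X normally distributed with mean mu and s.d. sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(b) - dist.cdf(a)


# Example 90: mu = 25, sigma = 4
print(z_score(32, 25, 4))                     # 1.75
print(round(prob_between(25, 32, 25, 4), 4))
print(round(prob_between(18, 34, 25, 4), 4))

# Example 91: mu = 40, sigma = 5
x_dist = NormalDist(40, 5)
p_above_55 = 1.0 - x_dist.cdf(55)
p_below_49 = x_dist.cdf(49)
print(round(p_above_55, 4), round(p_below_49, 4))
```

Working directly with the distribution of X gives the same probabilities as converting to Z first, since the Z value only relabels the horizontal axis.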

Example 92. A racing car is one of the many toys manufactured by the Mack Corporation.
The assembly times for this toy follow a normal distribution with a mean of 55 minutes and a
standard deviation of 4 minutes. The company closes at 5 PM every day. Suppose one worker starts
to assemble a racing car at 4 PM. What is the probability that she will finish this job before the
company closes for the day?

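Answer: Let X be the time (in minutes) taken to assemble the toy. Hence, the mean and standard
deviation of X are
µ = 55 minutes
σ = 4 minutes.
The worker has 60 minutes (from 4 PM to 5 PM) to finish the job, so the required probability is
\[
\begin{aligned}
P(X \le 60) &= P\!\left(\frac{X - \mu}{\sigma} \le \frac{60 - 55}{4}\right) \\
&= P(Z \le 1.25) \\
&= 0.8944.
\end{aligned}
\]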
Thus, the probability is 0.8944 that this worker will finish assembling this racing car before the
company closes for the day.

Example 93. According to the 2015 Physician Compensation Report by Medscape (a subsidiary
of WebMD), American internal medicine physicians earned an average of $196, 000 in 2014.
Suppose that the 2014 earnings of all American internal medicine physicians are normally dis-
tributed with a mean of $196, 000 and a standard deviation of $20, 000. Find the probability
that the 2014 earnings of a randomly selected American internal medicine physician are between
$169, 400 and $206, 800.

Answer: Let X represent the earnings of a randomly selected American internal medicine
physician. Hence, the mean and standard deviation of X are,

µ = $196, 000
σ = $20, 000

According to the question,
\[
\begin{aligned}
P(169{,}400 < X < 206{,}800)
&= P\!\left(\frac{169{,}400 - 196{,}000}{20{,}000} < \frac{X - \mu}{\sigma} < \frac{206{,}800 - 196{,}000}{20{,}000}\right) \\
&= P(-1.33 < Z < 0.54) \\
&= P(Z < 0.54) - P(Z < -1.33) \\
&= 0.7054 - 0.0918 \\
&= 0.6136.
\end{aligned}
\]
Thus, the probability is 0.6136 that the 2014 earnings of a randomly selected American internal
medicine physician lie between $169, 400 and $206, 800.

Mean and Standard Deviation


The mean and standard deviation of a Normal distribution are µ and σ respectively; its variance is σ².
Remark 9.1. The parameters in case of a Normal distribution are µ and σ².

10 Sampling Distributions
10.1 Introduction
The value of a population parameter is fixed. For example, for any population data set, there is
only one value of the population mean µ.
However, we cannot say the same about the sample mean x̄. We would expect different samples
of the same size drawn from the same population to yield different values of the sample mean
x̄. Consequently, the sample mean, x̄ is a random variable. Therefore, the sample mean also
possesses a probability distribution which is more commonly called the sampling distribution
of x̄. Other sample statistics such as the median, mode and standard deviation also possess
sampling distributions.
Definition 10.1. The probability distribution of X̄ is called its sampling distribution. It lists
the various values that X̄ can assume and the probability of each of these values. In general, the
probability distribution of a sample statistic is called its sampling distribution.
Suppose there are only five students in an advanced statistics class and the midterm scores
of these five students are
70 78 80 80 95
Consider all possible samples of three scores each that can be selected, without replacement,
from that population. The total number of possible samples that can be drawn is given by
\[
{}^{5}C_{3} = \frac{5!}{3!\,(5-3)!} = 10.
\]
Suppose we assign the letters A, B, C, D, and E to the scores of the five
students, so that A = 70, B = 78, C = 80, D = 80, E = 95. Then, the 10 possible samples of
three scores each are
ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE, BDE, CDE.
The following table represents all possible samples and their means when the sample size is 3.

Sample   Scores in the Sample   X̄
ABC      70, 78, 80             76.00
ABD      70, 78, 80             76.00
ABE      70, 78, 95             81.00
ACD      70, 80, 80             76.67
ACE      70, 80, 95             81.67
ADE      70, 80, 95             81.67
BCD      78, 80, 80             79.33
BCE      78, 80, 95             84.33
BDE      78, 80, 95             84.33
CDE      80, 80, 95             85.00
The upcoming table represents frequency and relative frequency distributions of X̄ when the
sample size is 3.
X̄       Frequency   Relative Frequency
76.00    2           2/10
76.67    1           1/10
79.33    1           1/10
81.00    1           1/10
81.67    2           2/10
84.33    2           2/10
85.00    1           1/10
         Σf = 10     Sum = 1
The sampling distribution of X̄ when the sample size is 3, is given by:
X̄ P(X̄)
76.00 0.20
76.67 0.10
79.33 0.10
81.00 0.10
81.67 0.20
84.33 0.20
85.00 0.10
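The construction of this sampling distribution can be sketched in Python by enumerating every sample of size 3. The function name `sampling_distribution` is illustrative, and sample means are rounded to two decimal places to match the tables above.

```python
from collections import Counter
from itertools import combinations

scores = [70, 78, 80, 80, 95]  # A, B, C, D, E


def sampling_distribution(population, n):
    """Map each distinct sample mean to its probability over all samples of size n."""
    # combinations() treats equal scores at different positions as distinct
    # members of the population, exactly as C and D are distinct students.
    samples = list(combinations(population, n))
    counts = Counter(round(sum(sample) / n, 2) for sample in samples)
    total = len(samples)
    return {mean: count / total for mean, count in sorted(counts.items())}


dist = sampling_distribution(scores, 3)
for mean, prob in dist.items():
    print(f"{mean:6.2f}  {prob:.2f}")
```

The number of enumerated samples is 10, and the probabilities sum to 1.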

10.2 Sampling and Nonsampling Errors


Definition 10.2. Sampling error is the difference between the value of a sample statistic and
the value of the corresponding population parameter. In the case of the mean,
Sampling error = X̄ − µ,
assuming that the sample is random and no nonsampling error has been made. The sampling
error occurs only in a sample survey, and not in a census.
However, in the real world, it is not possible to find the sampling error because µ is not
known. If µ is known then we do not need to find X̄. It is important to remember that a
sampling error occurs because of chance. Sometimes it also occurs due to selection of a faulty
sampling technique.

Definition 10.3. The errors that occur in the collection, recording, and tabulation of data are
called nonsampling errors. The nonsampling errors can occur both in a sample survey and in a
census.
Nonsampling errors can be attributed to many sources, e.g., inability to obtain information
about all cases in the sample, definitional difficulties, differences in the interpretation of ques-
tions, inability or unwillingness on the part of the respondents to provide correct information,
inability to recall information, errors made in collection such as in recording or coding the data,
errors made in processing the data, errors made in estimating values for missing data etc.
Example 94. Suppose one sample of three scores is selected from the population of five midterm
scores considered earlier, and this sample includes the scores 70, 80, and 95. Find the sampling error.
Answer: The scores of the five students are 70, 78, 80, 80, and 95. The population mean is
\[
\mu = \frac{70 + 78 + 80 + 80 + 95}{5} = 80.60.
\]
Now a random sample of three scores from this population is taken and this sample includes
the scores 70, 80, and 95. The mean for this sample is
\[
\bar{X} = \frac{70 + 80 + 95}{3} = 81.67.
\]
Consequently,
\[
\text{Sampling error} = \bar{X} - \mu = 81.67 - 80.60 = 1.07.
\]
That is, the mean score estimated from the sample is 1.07 higher than the mean score of the
population. Note that this difference occurred due to chance, that is, because we used a sample
instead of the population.
Example 95. The following data give the ages (in years) of all four members of a family.
55 53 28 25
1. Let X denote the age of a member of this family. Write the population probability distri-
bution of X.
2. List all the possible samples of size two (without replacement) that can be selected from this
population. Calculate the mean for each of these samples. Write the sampling distribution
of sample mean.
3. Calculate the mean for the population data. Select one random sample of size two and
calculate the sample mean X̄. Compute the sampling error.
Answer:
1. The population probability distribution of X is given by the following table:

X     Frequency   Relative Frequency   P(X)
55    1           1/4                  0.25
53    1           1/4                  0.25
28    1           1/4                  0.25
25    1           1/4                  0.25
      Σf = 4

2. The possible samples of size 2 are: (55, 53), (55, 28), (55, 25), (53, 28), (53, 25), (28, 25). The
following table represents the sampling distribution of the sample mean.
Sample     Sample Mean   Frequency   Relative Frequency   P(Sample Mean)
(55, 53)   54            1           1/6                  0.167
(55, 28)   41.5          1           1/6                  0.167
(55, 25)   40            1           1/6                  0.167
(53, 28)   40.5          1           1/6                  0.167
(53, 25)   39            1           1/6                  0.167
(28, 25)   26.5          1           1/6                  0.167
                         Σf = 6
3. Here, the population mean is,
\[
\mu = \frac{55 + 53 + 28 + 25}{4} = 40.25.
\]
If we select the first sample from the above table, (55, 53), the corresponding sample mean is 54.
Therefore,
\[
\text{Sampling error} = \bar{X} - \mu = 54 - 40.25 = 13.75.
\]

10.3 Mean and Standard Deviation of X̄


The mean of the sampling distribution of X̄, denoted by µX̄, is always equal to the mean of the
population. Thus,
µX̄ = µ.
The sample mean, X̄ is called an estimator of the population mean µ.
Definition 10.4. When the expected value (or mean) of a sample statistic is equal to the value of
the corresponding population parameter, that sample statistic is said to be an unbiased estimator.
For the sample mean X̄, µX̄ = µ. Hence, X̄ is an unbiased estimator of µ. This is a very
important property that an estimator should possess.
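Unbiasedness can be checked on the five midterm scores used earlier: averaging the means of all 10 possible samples of size 3 should return the population mean, and the individual sampling errors should average to zero. This is a minimal sketch; the variable names are illustrative.

```python
from itertools import combinations
from statistics import fmean

scores = [70, 78, 80, 80, 95]
mu = fmean(scores)  # population mean

# Mean of every possible sample of size 3 (without replacement).
sample_means = [fmean(sample) for sample in combinations(scores, 3)]
mean_of_sample_means = fmean(sample_means)

# Individual sampling errors differ from sample to sample but average out.
sampling_errors = [m - mu for m in sample_means]

print(f"population mean      = {mu:.2f}")
print(f"mean of sample means = {mean_of_sample_means:.2f}")
print(f"average sampling error = {fmean(sampling_errors):.2f}")
```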

The standard deviation of the sampling distribution of X̄ is


\[
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}},
\]
where σ is the standard deviation of the population and n is the sample size. This formula is
used when sample size is considered to be small compared to the population size, i.e., if the
sample size is less than or equal to 5% of the population size. Mathematically, the condition is
\[
\frac{n}{N} \le 0.05,
\]
where N is the population size.

If this condition is not satisfied, we use the following formula to calculate σX̄ :
\[
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}},
\]
where the factor \( \sqrt{\dfrac{N - n}{N - 1}} \) is called the finite population correction factor.

Two important observations
1. The spread of the sampling distribution of X̄ is smaller than the spread of the corresponding
population distribution. In other words, σX̄ < σ. This holds whenever n is greater than 1, which is
usually true, because the denominator √n in σ/√n is then greater than 1.

2. The standard deviation of the sampling distribution of X̄ decreases as the sample size
increases. This feature of the sampling distribution of X̄ is also obvious from the formula:
\[
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}.
\]

Definition 10.5. When the standard deviation of a sample statistic decreases as the sample size
is increased, that statistic is said to be a consistent estimator.

This is another important property that an estimator should possess. It is obvious from the
above formula for σX̄ that as n increases, the value of √n also increases and, consequently, the
value of σ/√n decreases. Thus, the sample mean X̄ is a consistent estimator of the population
mean µ.

Example 96. The mean wage per hour for all 5000 employees who work at a large company
is $27.50, and the standard deviation is $3.70. Let X̄ be the mean wage per hour for a random
sample of certain employees selected from this company. Find the mean and standard deviation
of X̄ for a sample size of

1. 30

2. 75

3. 200

Answer: From the given information, for the population of all employees,

N = 5000, µ = $27.50, and σ = $3.70.

1. The mean of the sampling distribution of X̄, µX̄ , is

µX̄ = µ = $27.50.

In this case, n = 30, N = 5000, and n/N = 30/5000 = 0.006. Because n/N is less than
0.05, the standard deviation of X̄ is obtained by using the formula σ/√n. Hence,
\[
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{3.70}{\sqrt{30}} = \$0.676.
\]

2. In this case, n = 75 and n/N = 75/5000 = 0.015, which is less than 0.05. The mean and
standard deviation of X̄ are
\[
\mu_{\bar{X}} = \mu = \$27.50 \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{3.70}{\sqrt{75}} = \$0.427.
\]

3. In this case, n = 200 and n/N = 200/5000 = 0.04, which is less than 0.05. Therefore, the
mean and standard deviation of X̄ are
\[
\mu_{\bar{X}} = \mu = \$27.50 \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{3.70}{\sqrt{200}} = \$0.262.
\]
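The rule used in Example 96 (check n/N first, then choose the formula) can be sketched as a small function. The name `standard_error` and its optional `population_size` argument are assumptions made for this illustration; the 5% threshold is the one stated in the notes.

```python
import math
from typing import Optional


def standard_error(sigma: float, n: int, population_size: Optional[int] = None) -> float:
    """Standard deviation of the sample mean.

    Applies the finite population correction factor when the population
    size is known and the sample exceeds 5% of the population.
    """
    se = sigma / math.sqrt(n)
    if population_size is not None and n / population_size > 0.05:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se


# Example 96: N = 5000, sigma = 3.70, with n = 30, 75 and 200 (all n/N <= 0.05).
for n in (30, 75, 200):
    print(n, round(standard_error(3.70, n, 5000), 3))

# A hypothetical sample of 500 employees (n/N = 0.10) triggers the correction.
print(500, round(standard_error(3.70, 500, 5000), 3))
```

Keeping the check inside the function means the caller does not have to remember which formula applies.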
Example 97. Consider a large population with µ = 60 and σ = 10. Assuming n/N ≤ 0.05, find
the mean and standard deviation of the sample mean, X̄, for a sample size of

1. 18

2. 90

Answer: From the given information, for the population,

µ = 60 and σ = 10.

1. The mean of the sampling distribution of X̄, µX̄ , is µX̄ = µ = 60. In this case, n = 18.
Because n/N ≤ 0.05 (assumption in the question), the standard deviation of X̄ is obtained
by using the formula σ/√n. Hence,
\[
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{18}} = 2.357.
\]

2. In this case, n = 90. Because n/N ≤ 0.05 (assumption in the question), the mean and
standard deviation of X̄ are
\[
\mu_{\bar{X}} = \mu = 60 \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{90}} = 1.054.
\]

Example 98. According to the 2015 Physician Compensation Report by Medscape (a subsidiary
of WebMD), American orthopedists earned an average of $421, 000 in 2014. Suppose that the
mean and standard deviation of the 2014 earnings of all American orthopedists are $421, 000 and
$90, 000, respectively. Let X̄ be the mean 2014 earnings of a random sample of 200 American
orthopedists. Find the mean and standard deviation of the sampling distribution of X̄. Assume
n/N ≤ 0.05.

Answer: From the given information, for the population,

µ = $421, 000 and σ = $90, 000.

The mean of the sampling distribution of X̄, µX̄ , is

µX̄ = µ = $421, 000.

In this case, n = 200. Because n/N ≤ 0.05 (assumption in the question), the standard deviation
of X̄ is obtained by using the formula σ/√n. Hence,
\[
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{90{,}000}{\sqrt{200}} = \$6363.96.
\]

Example 99. The standard deviation of the 2014 gross sales of all corporations is known to be
$16.06 billion. Let X̄ be the mean of the 2014 gross sales of a sample of corporations. What
sample size will produce the standard deviation of X̄ equal to $2.15 billion? Assume n/N ≤ 0.05.
Answer: From the given information, for the population,

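σ = $16.06 billion and σX̄ = $2.15 billion.

Because n/N ≤ 0.05 (assumption in the question), the standard deviation of X̄ is obtained by
using the formula σ/√n. Hence,
\[
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \;\Rightarrow\; 2.15 = \frac{16.06}{\sqrt{n}} \;\Rightarrow\; n = \left(\frac{16.06}{2.15}\right)^{2} = 55.8 \approx 56.
\]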
n n 2.15

10.4 Shape of the Sampling Distribution of X̄


To determine the shape of the sampling distribution of X̄, we need to have an idea about the
population from which the samples are drawn. We can have the following two cases:

1. When the population from which samples are drawn is normally distributed with its mean
equal to µ and standard deviation equal to σ, then:

(a) The mean of X̄, µX̄ , is equal to the mean of the population, µ.
(b) The standard deviation of X̄, σX̄ , is equal to σ/√n, assuming n/N ≤ 0.05.

(c) The shape of the sampling distribution of X̄ is normal, whatever the value of n.

2. When the population from which the samples are selected is not normally distributed, the
shape of the sampling distribution of X̄ is inferred from a very important theorem called
the Central Limit Theorem.

Central Limit Theorem


According to the central limit theorem, for a large sample size, the sampling distribution
of X̄ is approximately normal, irrespective of the shape of the population distribution.
The mean and standard deviation of the sampling distribution of X̄ are, respectively,
\[
\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}.
\]

In case of the mean, the sample size is usually considered to be large if n ≥ 30.
Thus, if the population is non-normal, we have the following observations:

(a) When n ≥ 30, the shape of the sampling distribution of X̄ is approximately normal
irrespective of the shape of the population distribution. This is so due to the Central
Limit Theorem.
(b) The mean of X̄, µX̄ , is equal to the mean of the population, µ.

(c) The standard deviation of X̄, σX̄ , is equal to σ/√n if n/N ≤ 0.05.
Again, remember that for σX̄ = σ/√n to apply, n/N must be less than or equal to 0.05;
otherwise we multiply σ/√n by the finite population correction factor as mentioned
in the previous section.

Remark 10.1. When the population does not have a normal distribution, the shape of the
sampling distribution is not exactly normal, but it is approximately normal for a large sample
size. The approximation becomes more accurate as the sample size increases.

Remark 10.2. The central limit theorem applies to large samples only. Usually, in case of the
mean, if the sample size is 30 or larger, it is considered sufficiently large so that the central limit
theorem can be applied to the sampling distribution of X̄.
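The central limit theorem can be illustrated by simulation. The sketch below draws repeated samples of size 30 from a right-skewed population; the exponential population with mean 1 and standard deviation 1, the 2000 repetitions, and the seed are all assumptions chosen for this illustration and do not come from the notes.

```python
import math
import random
import statistics


def simulate_sample_means(n: int, repetitions: int, rng: random.Random) -> list:
    """Draw `repetitions` samples of size n from an exponential population
    (mean 1, standard deviation 1) and return the mean of each sample."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(repetitions)
    ]


rng = random.Random(0)  # fixed seed so the run is reproducible
means_30 = simulate_sample_means(n=30, repetitions=2000, rng=rng)

print(f"mean of sample means = {statistics.fmean(means_30):.3f} (theory: 1)")
print(f"s.d. of sample means = {statistics.stdev(means_30):.3f} "
      f"(theory: {1 / math.sqrt(30):.3f})")
```

A histogram of `means_30` would look roughly bell-shaped even though the population itself is strongly skewed to the right.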

Example 100. According to the 2015 Physician Compensation Report by Medscape (a subsidiary
of WebMD), American internal medicine physicians earned an average of $196, 000 in 2014.
Suppose that the 2014 earnings of all American internal medicine physicians are approximately
normally distributed with a mean of $196, 000 and a standard deviation of $20, 000. Let X̄ be the
mean 2014 earnings of a random sample of American internal medicine physicians. Calculate
the mean and standard deviation of X̄ and describe the shape of its sampling distribution when
the sample size is

1. 16

2. 50

3. 1000

Answer: Let µ and σ be the mean and standard deviation of the 2014 earnings of all American
internal medicine physicians, and µX̄ and σX̄ be the mean and standard deviation of the sampling
distribution of X̄, respectively. Then, from the given information,

µ = $196, 000 and σ = $20, 000.

1. The mean and standard deviation of X̄ are, respectively,


\[
\mu_{\bar{X}} = \mu = \$196{,}000 \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{20000}{\sqrt{16}} = \$5000.
\]
Because the 2014 earnings of all American internal medicine physicians are approximately
normally distributed, the sampling distribution of X̄ for samples of 16 such physicians is
also approximately normally distributed.

2. The mean and standard deviation of X̄ are, respectively,


\[
\mu_{\bar{X}} = \mu = \$196{,}000 \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{20000}{\sqrt{50}} = \$2828.427.
\]
Again, because the 2014 earnings of all American internal medicine physicians are ap-
proximately normally distributed, the sampling distribution of X̄ for samples of 50 such
physicians is also approximately normally distributed.

3. The mean and standard deviation of X̄ are, respectively,
\[
\mu_{\bar{X}} = \mu = \$196{,}000 \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{20000}{\sqrt{1000}} = \$632.456.
\]
Again, because the 2014 earnings of all American internal medicine physicians are approximately
normally distributed, the sampling distribution of X̄ for samples of 1000 such
physicians is also approximately normally distributed.

Example 101. The delivery times for all food orders at a fast-food restaurant during the lunch
hour are approximately normally distributed with a mean of 7.7 minutes and a standard deviation
of 2.1 minutes. Let X̄ be the mean delivery time for a random sample of 16 orders at this
restaurant. Calculate the mean and standard deviation of X̄, and describe the shape of its
sampling distribution.

Answer: Let µ and σ be the mean and standard deviation of the delivery times for all food
orders at a fast-food restaurant during the lunch hour, and µX̄ and σX̄ be the mean and standard
deviation of the sampling distribution of X̄, respectively. Then, from the given information,

µ = 7.7 minutes and σ = 2.1 minutes.

The mean and standard deviation of X̄ are, respectively,


\[
\mu_{\bar{X}} = \mu = 7.7 \text{ minutes} \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{2.1}{\sqrt{16}} = 0.525 \text{ minutes}.
\]
Because the delivery times for all food orders at a fast-food restaurant during the lunch hour
are approximately normally distributed, the sampling distribution of X̄ for samples of 16 such
orders at this restaurant is also approximately normally distributed.

Example 102. The GPAs of all 5540 students enrolled at a university have an approximate
normal distribution with a mean of 3.02 and a standard deviation of 0.29. Let X̄ be the mean
GPA of a random sample of 48 students selected from this university. Find the mean and
standard deviation of X̄, and comment on the shape of its sampling distribution.

Answer: Let µ and σ be the mean and standard deviation of the GPA’s of all students enrolled
at a University, and µX̄ and σX̄ be the mean and standard deviation of the sampling distribution
of X̄, respectively. Then, from the given information,

µ = 3.02 marks and σ = 0.29 marks.

The mean and standard deviation of X̄ are, respectively,


\[
\mu_{\bar{X}} = \mu = 3.02 \text{ marks} \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{0.29}{\sqrt{48}} = 0.0419 \text{ marks}.
\]
Because the GPA’s of all students enrolled at a University are approximately normally dis-
tributed, the sampling distribution of X̄ for samples of 48 such students is also approximately
normally distributed.

Example 103. The mean rent paid by all tenants in a small city is $1550 with a standard
deviation of $225. However, the population distribution of rents for all tenants in this city is
skewed to the right. Calculate the mean and standard deviation of the sample mean X̄ and describe
the shape of its sampling distribution when the sample size is

1. 30

2. 100

Answer: Although the population distribution of rents paid by all tenants is not normal, in
each case the sample size is large (n ≥ 30). Hence, the central limit theorem can be applied to
infer the shape of the sampling distribution of X̄.

1. Let X̄ be the mean rent paid by a sample of 30 tenants. Then, the sampling distribution
of X̄ is approximately normal with the values of the mean and standard deviation given
as
\[
\mu_{\bar{X}} = \mu = \$1550 \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{225}{\sqrt{30}} = \$41.079.
\]
2. Let X̄ be the mean rent paid by a sample of 100 tenants. Then, the sampling distribution
of X̄ is approximately normal with the values of the mean and standard deviation given
as
\[
\mu_{\bar{X}} = \mu = \$1550 \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{225}{\sqrt{100}} = \$22.50.
\]
Example 104. According to the National Association of Colleges and Employers Spring 2015
Salary Survey, the average starting salary for 2014 college graduates was $48, 127. Suppose that
the mean starting salary of all 2014 college graduates was $48, 127 with a standard deviation of
$9200, and that this distribution was strongly skewed to the right. Let X̄ be the mean starting
salary of 25 randomly selected 2014 college graduates. Find the mean and the standard deviation
of the sampling distribution of X̄. What are the mean and the standard deviation of the sampling
distribution of X̄ if the sample size is 100? How do the shapes of the sampling distributions differ
for the two sample sizes?

Answer: The population distribution of starting salaries of 2014 college graduates is
not normal; it is strongly skewed to the right. In the first case, the sample size is small (n = 25).
Hence, the central limit theorem cannot be applied to infer the shape of the sampling distribution
of X̄, and consequently the sampling distribution of X̄ will be skewed to the right.
However, in the second case, the sample size is large (n = 100). Hence, the central limit
theorem can be applied to infer the shape of the sampling distribution of X̄.

1. Let X̄ be the mean starting salary of a sample of 25 college graduates from 2014.
Then, the sampling distribution of X̄ is positively skewed with the values of the mean and
standard deviation given as
\[
\mu_{\bar{X}} = \mu = \$48{,}127 \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{9200}{\sqrt{25}} = \$1840.
\]

2. Let X̄ be the mean starting salary of a sample of 100 college graduates from 2014.
Then, the sampling distribution of X̄ is approximately normal with the values of the mean
and standard deviation given as
\[
\mu_{\bar{X}} = \mu = \$48{,}127 \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{9200}{\sqrt{100}} = \$920.
\]

For the first case, the shape is positively skewed and for the second case, the shape is
approximately bell-shaped (normal).

Example 105. According to the American Time Use Survey results released by the Bureau of
Labor Statistics on June 24, 2015, Americans age 15 and over watched television for an average
of 168 minutes per day. Suppose that the current distribution of times spent watching television
per day by all Americans age 15 and over has a mean of 168 minutes and a standard deviation
of 20 minutes. Let X̄ be the average time spent watching television per day by 400 randomly
selected Americans age 15 and over. Find the mean and the standard deviation of the sampling
distribution of X̄. What is the shape of the sampling distribution of X̄? Do you need to know
the shape of the population distribution in order to make this conclusion? Explain why or why
not.

Answer: Although the shape of the population distribution of times spent watching television by
Americans aged 15 and over is not given, the sample size is large (n ≥ 30). Hence, the central
limit theorem can be applied to infer the shape of the sampling distribution of X̄.
Let X̄ be the average time spent watching television per day by a sample of 400 Americans
aged 15 and over. Then, the sampling distribution of X̄ is approximately normal with the values
of the mean and standard deviation given as
\[
\mu_{\bar{X}} = \mu = 168 \text{ minutes} \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{20}{\sqrt{400}} = 1 \text{ minute}.
\]
We do not need to know the shape of the population distribution in order to reach this conclusion
because the sample size is large and hence the Central Limit Theorem determines the shape
of the sampling distribution of X̄.

10.5 Applications of the Sampling Distribution of X̄


From the central limit theorem, for large samples, the sampling distribution of X̄ is approximately
normal with mean µX̄ = µ and standard deviation σX̄ = σ/√n. Based on this result, we
can make the following statements about X̄ for large samples.

1. If we take all possible samples of the same (large) size from a population and calculate the
mean for each of these samples, then about 68.26% of the sample means will be within
one standard deviation (σX̄ ) of the population mean. Alternatively, we can state that if
we take one sample (of n ≥ 30) from a population and calculate the mean for this sample,
the probability that this sample mean will be within one standard deviation (σX̄ ) of the
population mean is 0.6826. That is,

P (µ − 1σX̄ ≤ X̄ ≤ µ + 1σX̄ ) = 0.8413 − 0.1587 = 0.6826.

2. If we take all possible samples of the same (large) size from a population and calculate the
mean for each of these samples, then about 95.44% of the sample means will be within
two standard deviations (σX̄ ) of the population mean. Alternatively, we can state that if
we take one sample (of n ≥ 30) from a population and calculate the mean for this sample,
the probability that this sample mean will be within two standard deviations (σX̄ ) of the
population mean is 0.9544. That is,

P (µ − 2σX̄ ≤ X̄ ≤ µ + 2σX̄ ) = 0.9772 − 0.0228 = 0.9544.

3. If we take all possible samples of the same (large) size from a population and calculate the
mean for each of these samples, then about 99.74% of the sample means will be within
three standard deviations (σX̄ ) of the population mean. Alternatively, we can state that if
we take one sample (of n ≥ 30) from a population and calculate the mean for this sample,
the probability that this sample mean will be within three standard deviations (σX̄ ) of the
population mean is 0.9974. That is,

P (µ − 3σX̄ ≤ X̄ ≤ µ + 3σX̄ ) = 0.9987 − 0.0013 = 0.9974.
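The three probabilities above can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics.NormalDist`; the helper name `prob_within` is ours, not part of the notes. Because it uses unrounded areas, its results (about 0.6827, 0.9545 and 0.9973) differ in the fourth decimal place from the table-based values 0.6826, 0.9544 and 0.9974.

```python
from statistics import NormalDist

# Standard normal distribution for the Z statistic of the sample mean.
Z = NormalDist(mu=0.0, sigma=1.0)


def prob_within(k: float) -> float:
    """Return P(-k <= Z <= k) for a standard normal Z."""
    return Z.cdf(k) - Z.cdf(-k)


for k in (1, 2, 3):
    print(f"P(|Z| <= {k}) = {prob_within(k):.4f}")
```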

Remark 10.3. The Z value for a value of X̄ is calculated as

Z = (X̄ − µX̄)/σX̄.
Example 106. Assume that the weights of all packages of a certain brand of cookies are approx-
imately normally distributed with a mean of 32 ounces and a standard deviation of 0.3 ounce.
Find the probability that the mean weight, X̄, of a random sample of 20 packages of this brand
of cookies will be between 31.8 and 31.9 ounces.

Answer: Although the sample size is small (n < 30), the shape of the sampling distribution of
X̄ is approximately normal because the population is approximately normally distributed. The
mean and standard deviation of X̄ are, respectively,
µX̄ = µ = 32 ounces and σX̄ = σ/√n = 0.3/√20 = 0.06708 ounce.

We are to compute the probability that the value of X̄ calculated for one randomly drawn
sample of 20 packages is between 31.8 and 31.9 ounces; that is,

P (31.8 < X̄ < 31.9).

This probability is given by the area under the normal distribution curve for X̄ between
the points X̄ = 31.8 and X̄ = 31.9. The first step in finding this area is to convert the two X̄
values to their respective Z values.
For X̄ = 31.8 : Z = (31.8 − 32)/0.06708 = −2.98

For X̄ = 31.9 : Z = (31.9 − 32)/0.06708 = −1.49
The probability that X̄ is between 31.8 and 31.9 is given by the area under the standard normal
curve between Z = −2.98 and Z = −1.49, which is obtained by subtracting the area to the left
of Z = −2.98 from the area to the left of Z = −1.49. Thus, the required probability is

P (31.8 < X̄ < 31.9) = P (−2.98 < Z < −1.49) = P (Z < −1.49) − P (Z < −2.98) = 0.0667.

Therefore, the probability that the mean weight of a sample of 20 packages will be between 31.8
and 31.9 ounces is 0.0667.
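As a cross-check of Example 106, the same probability can be computed without a Z table. This is a sketch under the example's assumptions (normal population, µ = 32, σ = 0.3, n = 20); the function name `sample_mean_distribution` is hypothetical. Since Z is not rounded to two decimals here, the result agrees with 0.0667 only to about three decimal places.

```python
from math import sqrt
from statistics import NormalDist


def sample_mean_distribution(mu: float, sigma: float, n: int) -> NormalDist:
    """Normal model for X-bar: mean mu, standard deviation sigma / sqrt(n)."""
    return NormalDist(mu=mu, sigma=sigma / sqrt(n))


x_bar = sample_mean_distribution(mu=32, sigma=0.3, n=20)

# Area under the curve of X-bar between 31.8 and 31.9 ounces.
p = x_bar.cdf(31.9) - x_bar.cdf(31.8)

print(f"sigma_x_bar = {x_bar.stdev:.5f}")
print(f"P(31.8 < X-bar < 31.9) = {p:.4f}")
```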
Example 107. According to Moebs Services Inc., an individual checking account at major U.S.
banks costs the banks between $350 and $450 per year. Suppose that the current average cost of
all checking accounts at major U.S. banks is $400 per year with a standard deviation of $30. Let
X̄ be the current average annual cost of a random sample of 225 individual checking accounts
at major banks in America.
1. What is the probability that the average annual cost of the checking accounts in this sample
is within $4 of the population mean ?

2. What is the probability that the average annual cost of the checking accounts in this sample
is less than the population mean by $2.70 or more?
Answer: From the given information, for the annual costs of all individual checking accounts
at major banks in America,
µ = $400 and σ = $30.
Although the shape of the probability distribution of the population (annual costs of all indi-
vidual checking accounts at major U.S. banks) is unknown, the sampling distribution of X̄ is
approximately normal because the sample is large (n ≥ 30). Remember that when the sample
is large, the central limit theorem applies. The mean and standard deviation of the sampling
distribution of X̄ are, respectively,
µX̄ = µ = $400 and σX̄ = σ/√n = 30/√225 = $2.00.
1. The probability that the mean of the annual costs of checking accounts in this sample
is within $4 of the population mean is written as P (396 ≤ X̄ ≤ 404). This probability
is given by the area under the normal distribution curve for X̄ between X̄ = $396 and
X̄ = $404. We find this area as follows:
For X̄ = $396 : Z = (396 − 400)/2.00 = −2.00

For X̄ = $404 : Z = (404 − 400)/2.00 = 2.00
Hence, the required probability is

P ($396 ≤ X̄ ≤ $404) = P (−2.00 ≤ Z ≤ 2.00) = P (Z ≤ 2.00) − P (Z < −2.00) = 0.9544.

Therefore, the probability that the average annual cost of the 225 checking accounts in this
sample is within $4 of the population mean is 0.9544.

2. The probability that the average annual cost of the checking accounts in this sample is
less than the population mean by $2.70 or more is written as

P (X̄ ≤ 397.30)

This probability is given by the area under the normal curve for X̄ to the left of X̄ =
$397.30. We find this area as follows:
For X̄ = $397.30 : Z = (397.30 − 400)/2.00 = −1.35.
Hence, the required probability is

P (X̄ ≤ 397.30) = P (Z ≤ −1.35) = 0.0885.

Thus, the probability that the average annual cost of the checking accounts in this sample
is less than the population mean by $2.70 or more is 0.0885.
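Both parts of Example 107 can be reproduced in a few lines. The sketch below assumes the central limit theorem applies (n = 225 is large) and uses the standard-library `statistics.NormalDist`; the variable names are illustrative only. The unrounded answers are close to, but not digit-for-digit identical with, the table-based 0.9544 and 0.0885.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 400, 30, 225

# Sampling distribution of X-bar: mean 400, standard deviation 30 / sqrt(225) = 2.
x_bar = NormalDist(mu=mu, sigma=sigma / sqrt(n))

# Part 1: sample mean within $4 of the population mean.
p_within_4 = x_bar.cdf(mu + 4) - x_bar.cdf(mu - 4)

# Part 2: sample mean below the population mean by $2.70 or more.
p_below = x_bar.cdf(mu - 2.70)

print(f"P(396 <= X-bar <= 404) = {p_within_4:.4f}")
print(f"P(X-bar <= 397.30)     = {p_below:.4f}")
```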

10.6 Population and Sample Proportions


The population proportion, denoted by p, is obtained by taking the ratio of the number of
elements in a population with a specific characteristic to the total number of elements in the
population. The sample proportion, denoted by p̂ (pronounced p hat), gives a similar ratio for
a sample.
The population and sample proportions, denoted by p and p̂, respectively, are calculated as
p = X/N and p̂ = x/n,
where, N = total number of elements in the population, n = total number of elements in the
sample, X = number of elements in the population that possess a specific characteristic and x
= number of elements in the sample that possess the same specific characteristic.

10.7 Sampling Distribution of p̂


The probability distribution of the sample proportion, p̂ , is called its sampling distribution. It
gives the various values that p̂ can assume and their probabilities.
Suppose there are only five employees in a company and we want to test their knowledge
of Statistics. The following table gives the particulars:
Name     Knowledge of Statistics
Ally     Yes
John     No
Susan    No
Lee      Yes
Tom      Yes
If we define the population proportion, p, as the proportion of employees who know statis-
tics, then
p = 3/5 = 0.60.
Note that this population proportion, p = 0.60 is a constant. As long as the population does
not change, this value of p will not change.
Now, suppose we draw all possible samples of three employees each and compute the
proportion of employees, for each sample, who know Statistics. The total number of samples of
size three that can be drawn from the population of five employees is
Total number of samples = 5C3 = 5!/(3!(5 − 3)!) = 10.
The following table represents all possible samples and their proportions when the sample
size is 3.
Sample Proportion who know Statistics (p̂)
Ally, John, Susan 1/3 = 0.33
Ally, John, Lee 2/3 = 0.67
Ally, John, Tom 2/3 = 0.67
Ally, Susan, Lee 2/3 = 0.67
Ally, Susan, Tom 2/3 = 0.67
Ally, Lee, Tom 3/3 = 1.00
John, Susan, Lee 1/3 = 0.33
John, Susan, Tom 1/3 = 0.33
John, Lee, Tom 2/3 = 0.67
Susan, Lee, Tom 2/3 = 0.67

The following table gives the frequency and relative frequency distributions of p̂ when the
sample size is 3.
p̂      Frequency   Relative Frequency
0.33   3           3/10
0.67   6           6/10
1.00   1           1/10
       Σf = 10     Sum = 1

The sampling distribution of p̂ when the sample size is 3, is given by:


p̂ P(p̂)
0.33 0.30
0.67 0.60
1.00 0.10
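The sampling distribution above can be rebuilt by brute force. The following sketch enumerates all 10 samples of size 3 from the five employees and tallies p̂ exactly with fractions; the dictionary `knows_statistics` and the function `sampling_distribution` are names we introduce for illustration. It also confirms that the mean of p̂ over all samples equals the population proportion p = 0.60.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Population from the table above: True means the employee knows Statistics.
knows_statistics = {"Ally": True, "John": False, "Susan": False, "Lee": True, "Tom": True}


def sampling_distribution(population: dict, n: int) -> dict:
    """Map each possible value of p-hat to its probability over all samples of size n."""
    samples = list(combinations(population, n))
    counts = Counter(Fraction(sum(population[name] for name in s), n) for s in samples)
    return {p_hat: Fraction(freq, len(samples)) for p_hat, freq in sorted(counts.items())}


dist = sampling_distribution(knows_statistics, 3)
for p_hat, prob in dist.items():
    print(f"p-hat = {float(p_hat):.2f}  P(p-hat) = {float(prob):.2f}")

# Mean of the sampling distribution of p-hat.
mean_p_hat = sum(p_hat * prob for p_hat, prob in dist.items())
print(f"mean of p-hat = {float(mean_p_hat):.2f}")
```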

Example 108. Suppose a total of 789,654 families live in a particular city and 563,282 of them
own homes. A sample of 240 families is selected from this city, and 158 of them own homes.
Find the proportion of families who own homes in the population and in the sample. Find the
sampling error.

Answer: For the population of this city,

N = population size = 789,654, X = number of families in the population who own homes = 563,282.

The population proportion of families in this city who own homes is
p = X/N = 563,282/789,654 = 0.71.
Now, a sample of 240 families is taken from this city, and 158 of them are home-owners. Then,

n = sample size = 240, x = number of families in the sample who own homes = 158.

The sample proportion is


p̂ = x/n = 158/240 = 0.66.
As in the case of the mean, the difference between the sample proportion and the corresponding
population proportion gives the sampling error, assuming that the sample is random and no
nonsampling error has been made. Thus, in the case of the proportion,

Sampling error = p̂ − p = 0.66 − 0.71 = −0.05.
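The arithmetic of Example 108 is sketched below; the helper `proportion` is a hypothetical name. Note that the example rounds p and p̂ to two decimals before subtracting, which gives −0.05, whereas subtracting the unrounded proportions gives a sampling error of about −0.055.

```python
def proportion(count: int, total: int) -> float:
    """Ratio of elements with the characteristic to all elements."""
    return count / total


p = proportion(563_282, 789_654)   # population proportion
p_hat = proportion(158, 240)       # sample proportion

# Sampling error, assuming a random sample and no nonsampling error.
sampling_error = p_hat - p

print(f"p = {p:.4f}, p-hat = {p_hat:.4f}, sampling error = {sampling_error:.4f}")
print(f"rounded first: {round(p_hat, 2) - round(p, 2):.2f}")
```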

10.8 Mean and Standard Deviation of the Sampling Distribution of p̂



The mean of the sample proportion, p̂ , is denoted by µp̂ and is equal to the population propor-
tion, p. Thus,
µp̂ = p.
The sample proportion, p̂, is called an estimator of the population proportion, p. Since for
the sample proportion, µp̂ = p, p̂ is an unbiased estimator of p.
The standard deviation of the sample proportion p̂ is denoted by σp̂ and is given by the
formula
σp̂ = √(pq/n),
where p is the population proportion, q = 1 − p, and n is the sample size. This formula is used
when n/N ≤ 0.05, where N is the population size.
However, if n/N is greater than 0.05, then σp̂ is calculated as follows:
σp̂ = √(pq/n) √((N − n)/(N − 1)),
where the factor
√((N − n)/(N − 1))
is called the finite population correction factor.
It is clear from the above formula for σp̂ that as n increases, the value of √(pq/n) decreases.
Thus, the sample proportion, p̂ is a consistent estimator of the population proportion, p.
Example 109. Consider a large population with p = 0.21. Assuming n/N ≤ 0.05, find the mean
and standard deviation of the sample proportion p̂ for a sample size of
1. 400

2. 750

Answer:

1. Here, p = 0.21, q = 1 − p = 1 − 0.21 = 0.79, and n = 400. The mean of the sampling
distribution of p̂ is
µp̂ = p = 0.21.
The standard deviation of p̂ is
σp̂ = √(pq/n) = √((0.21)(0.79)/400) = 0.020.

2. Here, p = 0.21, q = 1 − p = 1 − 0.21 = 0.79, and n = 750. The mean of the sampling
distribution of p̂ is
µp̂ = p = 0.21.
The standard deviation of p̂ is
σp̂ = √(pq/n) = √((0.21)(0.79)/750) = 0.015.
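The two formulas for σp̂ can be combined in one small function. This is a sketch, not a definitive implementation: the name `std_of_p_hat` and the optional parameter `N` are our own, and the population size N = 5000 in the last call is a hypothetical value chosen only to trigger the finite population correction factor (n/N = 0.08 > 0.05).

```python
from math import sqrt
from typing import Optional


def std_of_p_hat(p: float, n: int, N: Optional[int] = None) -> float:
    """Standard deviation of p-hat; applies the correction factor when n/N > 0.05."""
    q = 1 - p
    sd = sqrt(p * q / n)
    if N is not None and n / N > 0.05:
        # Finite population correction factor sqrt((N - n) / (N - 1)).
        sd *= sqrt((N - n) / (N - 1))
    return sd


print(f"n = 400: {std_of_p_hat(0.21, 400):.3f}")   # Example 109, part 1
print(f"n = 750: {std_of_p_hat(0.21, 750):.3f}")   # Example 109, part 2
print(f"n = 400, hypothetical N = 5000: {std_of_p_hat(0.21, 400, N=5000):.4f}")
```

The output for n = 750 is smaller than for n = 400, which illustrates why p̂ is described as a consistent estimator of p.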

10.9 Shape of the Sampling Distribution of p̂


The shape of the sampling distribution of p̂ is inferred from the central limit theorem.

Central Limit Theorem for Sample Proportion


According to the central limit theorem, the sampling distribution of p̂ is approximately normal
for a sufficiently large sample size. In the case of proportion, the sample size is considered to be
sufficiently large if np and nq are both greater than 5; that is, if

np > 5 and nq > 5.

Example 110. According to a recent New York Times/CBS News poll, 55% of adults polled
said that owning a home is a very important part of the American Dream. Assume that this
result is true for the current population of American adults. Let p̂ be the proportion of American
adults in a random sample of 2000 who will say that owning a home is a very important part of
the American Dream. Find the mean and standard deviation of p̂ and describe the shape of its
sampling distribution.

Answer: Let p be the proportion of all American adults who will say that owning a home is a
very important part of the American Dream. Then,

p = 0.55, q = 1 − p = 1 − 0.55 = 0.45, and n = 2000.

The mean of the sampling distribution of p̂ is

µp̂ = p = 0.55.

The standard deviation of p̂ is
σp̂ = √(pq/n) = √((0.55)(0.45)/2000) = 0.0111.
The values of np and nq are

np = 2000(0.55) = 1100 and nq = 2000(0.45) = 900.

Because np and nq are both greater than 5, we can apply the central limit theorem to make an
inference about the shape of the sampling distribution of p̂. Thus, the sampling distribution of
p̂ is approximately normal with a mean of 0.55 and a standard deviation of 0.0111.

Example 111. According to a Gallup poll conducted January 5–8, 2014, 67% of American adults
were dissatisfied with the way income and wealth are distributed in America. Assume that this
percentage is true for the current population of American adults. Let p̂ be the proportion in a
random sample of 400 American adults who hold the above opinion. Find the mean and standard
deviation of the sampling distribution of p̂ and describe its shape.

Answer: Let p be the proportion of American adults who were dissatisfied with the way income
and wealth are distributed in America according to a Gallup poll conducted January 5–8, 2014.
Then,
p = 0.67, q = 1 − p = 1 − 0.67 = 0.33, and n = 400.
The mean of the sampling distribution of p̂ is

µp̂ = p = 0.67.

The standard deviation of p̂ is


σp̂ = √(pq/n) = √((0.67)(0.33)/400) = 0.024.
The values of np and nq are

np = 400(0.67) = 268 and nq = 400(0.33) = 132.

Because np and nq are both greater than 5, we can apply the central limit theorem to make an
inference about the shape of the sampling distribution of p̂. Thus, the sampling distribution of
p̂ is approximately normal with a mean of 0.67 and a standard deviation of 0.024.
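The steps used in Examples 110 and 111 (mean, standard deviation, then the np > 5 and nq > 5 check) can be collected in one helper. The sketch below uses a hypothetical function name, `describe_p_hat`, and assumes n/N ≤ 0.05. The last call uses made-up values (p = 0.02, n = 100) to show a case where the condition fails because np = 2.

```python
from math import sqrt


def describe_p_hat(p: float, n: int) -> dict:
    """Mean, standard deviation and normality check for the sampling distribution of p-hat."""
    q = 1 - p
    return {
        "mean": p,
        "std": sqrt(p * q / n),
        "np": n * p,
        "nq": n * q,
        # Central limit theorem condition for a sample proportion.
        "approximately_normal": n * p > 5 and n * q > 5,
    }


print(describe_p_hat(0.55, 2000))  # Example 110
print(describe_p_hat(0.67, 400))   # Example 111
print(describe_p_hat(0.02, 100))   # hypothetical case where np = 2 is too small
```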

10.10 Applications of the Sampling Distribution of p̂


From the central limit theorem, for large samples, the sampling distribution of p̂ is approximately
normal with mean µp̂ = p and standard deviation σp̂ = √(pq/n). Based on this result, we can make
the following statements about p̂ for large samples.

1. If we take all possible samples of the same (large) size from a population and calculate
the proportion for each of these samples, then about 68.26% of the sample proportions
will be within one standard deviation (σp̂ ) of the population proportion. Alternatively, we
can state that if we take one sample (with np > 5 and nq > 5) from a population and
calculate the proportion for this sample, the probability that this sample proportion will
be within one standard deviation (σp̂ ) of the population proportion is 0.6826. That is,

P (p − 1σp̂ ≤ p̂ ≤ p + 1σp̂ ) = 0.8413 − 0.1587 = 0.6826.

2. If we take all possible samples of the same (large) size from a population and calculate
the proportion for each of these samples, then about 95.44% of the sample proportions
will be within two standard deviations (σp̂ ) of the population proportion. Alternatively,
we can state that if we take one sample (with np > 5 and nq > 5) from a population and
calculate the proportion for this sample, the probability that this sample proportion will
be within two standard deviations (σp̂ ) of the population proportion is 0.9544. That is,

P (p − 2σp̂ ≤ p̂ ≤ p + 2σp̂ ) = 0.9772 − 0.0228 = 0.9544.

3. If we take all possible samples of the same (large) size from a population and calculate
the proportion for each of these samples, then about 99.74% of the sample proportions
will be within three standard deviations (σp̂ ) of the population proportion. Alternatively,
we can state that if we take one sample (with np > 5 and nq > 5) from a population and
calculate the proportion for this sample, the probability that this sample proportion will
be within three standard deviations (σp̂ ) of the population proportion is 0.9974. That is,

P (p − 3σp̂ ≤ p̂ ≤ p + 3σp̂ ) = 0.9987 − 0.0013 = 0.9974.

Remark 10.4. The Z value for a value of p̂ is calculated as


Z = (p̂ − µp̂)/σp̂.

Example 112. In a recent Pew Research Center nationwide telephone survey of American
adults, 75% of adults said that college education has become too expensive for most people.
Suppose that this result is true for the current population of American adults. Let p̂ be the
proportion in a random sample of 1400 adult Americans who will hold the said opinion. Find
the probability that 76.5% to 78% of adults in this sample will hold this opinion.

Answer: From the given information,

n = 1400, p = 0.75, and q = 1 − p = 1 − 0.75 = 0.25,

where p is the proportion of all adult Americans who hold the said opinion. The mean of the
sample proportion p̂ is
µp̂ = p = 0.75.
The standard deviation of p̂ is
σp̂ = √(pq/n) = √((0.75)(0.25)/1400) = 0.011572.

The values of np and nq are

np = 1400(0.75) = 1050 and nq = 1400(0.25) = 350.

The first step in finding the area under the normal distribution curve between p̂ = 0.765 and
p̂ = 0.78 is to convert these two values to their respective Z values.
For p̂ = 0.765 : Z = (0.765 − 0.75)/0.011572 = 1.30.
For p̂ = 0.78 : Z = (0.78 − 0.75)/0.011572 = 2.59.
Hence, the required probability is

P (0.765 < p̂ < 0.78) = P (1.30 < Z < 2.59) = P (Z < 2.59) − P (Z < 1.30) = 0.0920.

Thus, the probability that 76.5% to 78% of American adults in a random sample of 1400 will
say that college education has become too expensive for most people is 0.0920.
Example 113. Maureen Webster, who is running for mayor in a large city, claims that she is
favored by 53% of all eligible voters of that city. Assume that this claim is true. What is the
probability that in a random sample of 400 registered voters taken from this city, less than 49%
will favor Maureen Webster?
Answer: Let p be the proportion of all eligible voters who favor Maureen Webster. Then,

p = 0.53 and q = 1 − p = 1 − 0.53 = 0.47.

The mean of the sampling distribution of the sample proportion p̂ is

µp̂ = p = 0.53.

The population of all voters is large (because the city is large) and the sample size is small
compared to the population. Consequently, we can assume that n/N ≤ 0.05. Hence, the
standard deviation of p̂ is calculated as
σp̂ = √(pq/n) = √((0.53)(0.47)/400) = 0.024954.
The values of np and nq are

np = 400(0.53) = 212 and nq = 400(0.47) = 188.

From the central limit theorem, the shape of the sampling distribution of p̂ is approximately
normal. The Z value for p̂ = 0.49 is
Z = (0.49 − 0.53)/0.024954 = −1.60.
Hence, the required probability is

P (p̂ < 0.49) = P (Z < −1.60) = 0.0548.
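Examples 112 and 113 can be cross-checked with the normal approximation for p̂. The sketch below assumes np > 5, nq > 5 and n/N ≤ 0.05, as in both examples; `p_hat_distribution` is a name we introduce. The Z values are not rounded to two decimals, so the results agree with the table-based 0.0920 and 0.0548 only to about three decimal places.

```python
from math import sqrt
from statistics import NormalDist


def p_hat_distribution(p: float, n: int) -> NormalDist:
    """Normal approximation for p-hat: mean p, standard deviation sqrt(pq/n)."""
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))


# Example 112: P(0.765 < p-hat < 0.78) with p = 0.75 and n = 1400.
ex112 = p_hat_distribution(0.75, 1400)
p112 = ex112.cdf(0.78) - ex112.cdf(0.765)

# Example 113: P(p-hat < 0.49) with p = 0.53 and n = 400.
ex113 = p_hat_distribution(0.53, 400)
p113 = ex113.cdf(0.49)

print(f"Example 112: {p112:.4f}")
print(f"Example 113: {p113:.4f}")
```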

11 Basics of Estimation
Definition 11.1. The procedure of assignment of value(s) to a population parameter based on
a value of the corresponding sample statistic is called estimation.
Remark 11.1. Examples of population parameters are the mean, median, mode, variance and
standard deviation.
Some examples are:
1. To estimate the mean time taken to learn a certain job by new employees, the manager
will take a sample of new employees and record the time taken by each of these employees
to learn the job. Using this information, he or she will calculate the sample mean, x̄.
Then, based on the value of x̄, he or she will assign certain values to µ.
2. A polling agency that wants to find the proportion or percentage of adults who are in
favour of raising taxes on rich people to reduce the budget deficit will take a sample of
adults and determine the value of the sample proportion, p̂, which represents the proportion
of adults in the sample who are in favour of raising taxes on rich people to reduce the
budget deficit. Using this value of sample proportion, p̂, the agency will assign values to
the population proportion, p.
Definition 11.2. The value(s) assigned to a population parameter based on the value of a sample
statistic is called an estimate. The sample statistic used to estimate a population parameter is
called an estimator.
Remark 11.2. Generally the sample taken is a simple random sample.
Remark 11.3. An estimate may be a point or interval estimate.
Definition 11.3. The value of a sample statistic that is used to estimate a population parameter
is called a point estimate.
Example 114. Suppose the Census Bureau takes a random sample of 10,000 households and
determines that the mean housing expenditure per month, x̄, for this sample is $2970. Then
using x̄ as a point estimate of µ, the Bureau can state that the mean housing expenditure per
month, µ, for all households is about $2970. Thus, point estimate of a population parameter is
the value of the corresponding sample statistic.
Definition 11.4. In interval estimation, an interval is constructed around the point estimate,
and it is stated that this interval contains the corresponding population parameter with a certain
confidence level.
Example 115. In the previous example, instead of saying that the mean housing expenditure
per month for all households is $2970, we may obtain an interval by subtracting a number from
$2970 and adding the same number to $2970. Then, we state that this interval contains the
population mean µ. Suppose we subtract $340 from $2970 and add $340 to $2970; we get
$2630 and $3310, respectively. Then we state that the interval $2630 to $3310 is likely to contain
the population mean µ and that the mean housing expenditure per month for all households in
the United States is between $2630 and $3310. This procedure is called interval estimation.
$2630 is called the lower limit of the interval. $3310 is called the upper limit of the interval.
The number we add to and subtract from the point estimate is called the margin of error or the
maximum error of the estimate.

The number which needs to be subtracted from or added to the point estimate to obtain
an interval estimate, depends on two factors:

1. The standard deviation σx̄ of the sample mean x̄.

2. The level of confidence to be attached to the interval.

Remark 11.4. The larger the standard deviation σx̄ , the greater the number subtracted from and
added to the point estimate. If the range over which x̄ can assume values is
larger, then the interval constructed around x̄ must be wider to include µ.

Definition 11.5. Each interval is constructed with regard to a given confidence level and is
called a confidence interval. The confidence interval is given as: point estimate ± Margin of
error.
The confidence level associated with a confidence interval states how much confidence we have
that this interval contains the true population parameter. The confidence level is denoted by
(1 − α)100%.

When can we use z for confidence interval ?


1. If population is approximately normally distributed, population standard deviation (σ) is
known and sample size is less than 30.

2. If population is not normally distributed, population standard deviation (σ) is known and
sample size is greater than or equal to 30.

In both the cases, we can use z for constructing confidence interval for µ.

Confidence interval for µ


The (1 − α)100% confidence interval for µ is

x̄ ± zσx̄ ,

where σx̄ = σ/√n, given that n/N ≤ 0.05. The value of z used here is obtained from the standard
normal distribution table, for the given confidence level. The quantity zσx̄ is called the margin
of error and is denoted by E.

Definition 11.6. The margin of error for the estimate of µ, denoted by E, is the quantity that
is subtracted from and added to the value of x̄ to obtain a confidence interval for µ. Thus,
E = zσx̄ .
We can refer to the following table for obtaining z values, associated with several commonly
used areas.

Example 116. A publishing company has just published a new college textbook. Before the
company decides the price at which to sell this textbook, it wants to know the average price of all
such textbooks in the market. The research department at the company took a random sample of
25 comparable textbooks and collected information on their prices. This information produced a
mean price of $145 for this sample. It is known that the standard deviation of the prices of all
such textbooks is $35 and the population distribution of such prices is approximately normal.
1. What is the point estimate of the mean price of all such college textbooks ?
2. Construct a 90% confidence interval for the mean price of all such college textbooks.
Answer: Here, σ is known and although n < 30, it is given that population is normally
distributed. From the given information, we have,
n = 25, x̄ = $145, σ = $35.
The standard deviation of x̄ is
σx̄ = σ/√n = 35/√25 = 7.
1. The point estimate of the mean price of all such college textbooks is $145, that is point
estimate of µ = x̄ = $145.
2. The confidence level is 90%. Therefore,
α = 0.10, α/2 = 0.05.
From the table, the values are z0.05 = 1.65 and −z0.05 = −1.65 (more precisely, ±1.645).
Therefore, the 90% confidence interval for µ is
x̄ ± zσx̄ = 145 ± 1.65(7.00) = 145 ± 11.55
= (145 − 11.55) to (145 + 11.55)
= $133.45 to $156.55.
Thus, we are 90% confident that the mean price of all such college textbooks is between $133.45
and $156.55.
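The interval in Example 116 can be reproduced as follows. This is a sketch assuming σ is known and the sampling distribution of x̄ is normal; the function name `z_confidence_interval` is hypothetical. It takes z from `statistics.NormalDist().inv_cdf` (about 1.645 for 90%) instead of the rounded table value 1.65, so its limits differ from $133.45 and $156.55 by a few cents.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar: float, sigma: float, n: int, confidence: float):
    """Return (lower, upper) limits of the confidence interval x_bar +/- z * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z value with area alpha/2 in the upper tail
    margin = z * sigma / sqrt(n)             # margin of error E
    return x_bar - margin, x_bar + margin


lower, upper = z_confidence_interval(x_bar=145, sigma=35, n=25, confidence=0.90)
print(f"90% confidence interval: ${lower:.2f} to ${upper:.2f}")
```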

Interpretation of a confidence level
1. According to the previous example, if we take all possible samples of 25 such college
textbooks each and construct a 90% confidence interval for µ around each sample mean,
we can expect that 90% of these intervals will include µ and 10% will not.

2. In the following figure, x̄1 , x̄2 and x̄3 are the sample means of three different samples of the
same size drawn from the same population. In the figure, 90% confidence
intervals are constructed around these three sample means.

3. As we can see, the confidence intervals constructed around x̄1 and x̄2 include µ, but the
one constructed around x̄3 does not include µ.

4. For a 90% confidence level, we can state that if we take many samples of the same
size from a population and construct 90% confidence intervals around the means of these
samples, then we expect 90% of the confidence intervals to be like the ones around x̄1
and x̄2 , which include µ, and 10% to be like the one around x̄3 , which does not include µ.
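This interpretation can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it pretends that the population is exactly normal with a hypothetical true mean µ = 145 and σ = 35, draws many samples of size 25, and counts how often the 90% confidence interval covers µ. The function name `coverage` and the seed are our own choices; the observed fraction should be close to 0.90 but will vary with the seed.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu: float, sigma: float, n: int, confidence: float,
             trials: int, seed: int = 0) -> float:
    """Fraction of simulated confidence intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its mean.
        x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials


fraction = coverage(mu=145, sigma=35, n=25, confidence=0.90, trials=5000)
print(f"Fraction of 90% intervals containing mu: {fraction:.3f}")
```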

Example 117. A city planner wants to estimate the average monthly residential water usage
in the city. He selected a random sample of 40 households from the city, which gave a mean
water usage of 3415.70 gallons over a 1 month period. Based on earlier data, the population
standard deviation of the monthly residential water usage in this city is 389.60 gallons. Make
a 95% confidence interval for the average monthly residential water usage for all households in
this city.

Answer: From the given information, we have,

n = 40, x̄ = 3415.70 gallons, σ = 389.60 gallons,


confidence level = 95% or 0.95.

Here, shape of the population distribution is unknown, the population standard deviation is
known and the sample size is large. Hence we can use the normal distribution to make a
confidence interval for µ. The value of σx̄ is
σx̄ = σ/√n = 389.60/√40 = 61.60122.
To find a 95% confidence interval, we find area in each of the two tails of the normal distribution
curve which is (1 − 0.95)/2 = 0.0250. Then we look for areas 0.0250 and 0.0250 + 0.95 = 0.9750
in the normal distribution table to find the two z values. These two z values are approximately
−1.96 and 1.96. Hence we obtain the 95% confidence interval for µ,

x̄ ± zσx̄ = 3415.70 ± (1.96)(61.60122)


= (3415.70 ± 120.7383)
= 3294.96 gallons to 3536.44 gallons.

Thus, we can state with 95% confidence that average monthly residential water usage for all
households in this city is between 3294.96 gallons and 3536.44 gallons.
Example 118. According to a 2013 study by Moebs Services Inc., an individual checking account
at major U.S. banks costs the banks more than $380 per year. A recent random sample of 600
such checking accounts produced a mean annual cost of $500 to major U.S. banks. Assume that
the standard deviation of annual costs to major U.S. banks of all such checking accounts is $40.
Make a 99% confidence interval for the current mean annual cost to major U.S. banks of all such
checking accounts.
Answer: From the given information, we have,

n = 600, x̄ = $500, σ = $40,


confidence level = 99% or 0.99.

Here, shape of the population distribution is unknown, the population standard deviation is
known and the sample size is large. Hence we can use the normal distribution to make a
confidence interval for µ. The value of σx̄ is
σx̄ = σ/√n = 40/√600 = 1.633.
To find a 99% confidence interval, we find area in each of the two tails of the normal distribution
curve which is (1 − 0.99)/2 = 0.0050. Then we look for areas 0.0050 and 0.0050 + 0.99 = 0.9950
in the normal distribution table to find the two z values. These two z values are approximately
−2.58 and 2.58. Hence we obtain the 99% confidence interval for µ,

x̄ ± zσx̄ = 500 ± (2.58)(1.633)


= (500 ± 4.21)
= $495.79 to $504.21.

Thus, we can state with 99% confidence that the current mean annual cost to major U.S. banks
of all individual checking accounts is between $495.79 and $504.21.

Observations
1. The width of a confidence interval depends on the size of the margin of error, zσx̄ , which
depends on the values of z, σ and n because σx̄ = σ/√n. The width of a confidence interval
can be controlled using:

(a) The value of z which depends on the confidence level.


(b) The sample size n.

2. The confidence level determines the value of z which in turn determines the size of the
margin of error. The value of z increases as the confidence level increases and it decreases
as the confidence level decreases.

3. If we want to decrease the width of a confidence interval, we have two choices:

(a) Lower the confidence level.


(b) Increase the sample size.

4. Lowering the confidence level is not a good choice because a lower confidence level may
give less reliable results. Therefore, we should always prefer to increase the sample size if
we want to decrease the width of a confidence interval.
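The observations above can be seen numerically. The following sketch reuses σ = 389.60 gallons from Example 117 and tabulates the margin of error E = zσ/√n for a few confidence levels and sample sizes; the function name `margin_of_error` and the sample sizes other than 40 are hypothetical choices for illustration. Quadrupling n halves the margin, while raising the confidence level widens it.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma: float, n: int, confidence: float) -> float:
    """Margin of error E = z * sigma / sqrt(n) for a confidence interval for mu."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / sqrt(n)


sigma = 389.60  # population standard deviation from Example 117
for confidence in (0.90, 0.95, 0.99):
    for n in (40, 160, 640):
        e = margin_of_error(sigma, n, confidence)
        print(f"confidence = {confidence:.2f}, n = {n:3d}: E = {e:7.2f}, width = {2 * e:7.2f}")
```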

Determining the sample size for estimation of µ


Given the confidence level and the standard deviation of the population, the sample size that
will produce a predetermined margin of error E of the confidence interval estimate of µ is

n = z²σ²/E².
Example 119. Suppose we want to estimate the mean life of a certain auto battery. If a sample
of 40 batteries can give us the desired margin of error then we will be wasting money and time
if we take a sample of a much larger size - say 500 batteries. In such cases, if we know the
confidence level and the margin of error that we want, then we can find the (approximate) size
of the sample that will produce the required result.


E = zσ/√n
E² = z²σ²/n
n = z²σ²/E².
Example 120. An alumni association wants to estimate the mean debt of this year’s college
graduates. It is known that the population standard deviation of the debts of this year’s college
graduates is $11,800. How large a sample should be selected so that a 99% confidence interval
of the estimate is within $800 of the population mean?

Answer: The alumni association wants the 99% confidence interval for the mean debt of this
year’s college graduates to be
x̄ ± 800.
Hence, the maximum size of the margin of error of estimate is to be $800; that is,

E = $800.

The value of z for a 99% confidence level is 2.58. The value of σ is given to be $11,800. Therefore,
substituting all values in the formula and simplifying, we obtain
n = z²σ²/E² = (2.58)²(11,800)²/(800)² = 1448.18 ≈ 1449.
Thus, the minimum required sample size is 1449. If the alumni association takes a random
sample of 1449 of this year’s college graduates, computes the mean debt for this sample, and
then makes a 99% confidence interval around this sample mean, the margin of error of estimate
will be approximately $800. Note that we have rounded the final answer for the sample size
to the next higher integer. This is always the case when determining the sample size because
rounding down to 1448 will result in a margin of error slightly greater than $800.
Example 121. A company that produces detergents wants to estimate the mean amount of
detergent in 64-ounce jugs at a 99% confidence level. The company knows that the standard
deviation of the amounts of detergent in all such jugs is 0.20 ounce. How large a sample should
the company select so that the estimate is within 0.04 ounce of the population mean?
Answer: The company wants the 99% confidence interval for the mean amount of detergent in
64-ounce jugs to be
x̄ ± 0.04.
Hence, the maximum size of the margin of error of estimate is to be 0.04 ounce; that is,

E = 0.04.

The value of z for a 99% confidence level is 2.58. The value of σ is given to be 0.20 ounce.
Therefore, substituting all values in the formula and simplifying, we obtain
n = z²σ²/E² = (2.58)²(0.20)²/(0.04)² = 166.41 ≈ 167.
Thus, the minimum required sample size is 167.
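The sample size formula, including the rule of always rounding up, can be sketched as below. The function name `required_sample_size` is hypothetical, and z is passed in as the rounded table value 2.58 used in Examples 120 and 121; a more precise z (about 2.576) would give slightly smaller sample sizes.

```python
from math import ceil


def required_sample_size(z: float, sigma: float, E: float) -> int:
    """Smallest n with z * sigma / sqrt(n) <= E, that is n = z^2 sigma^2 / E^2 rounded up."""
    return ceil((z * sigma / E) ** 2)


print(required_sample_size(z=2.58, sigma=11_800, E=800))  # Example 120
print(required_sample_size(z=2.58, sigma=0.20, E=0.04))   # Example 121
```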

When can we use t for a confidence interval?


1. If the population is approximately normally distributed, the population standard deviation (σ)
is not known, and the sample size is less than 30.

2. If the population is not normally distributed, the population standard deviation (σ) is not
known, and the sample size is greater than or equal to 30.

In both cases, we can use the t distribution to construct a confidence interval for µ.

The t-distribution
1. The t-distribution is a specific type of bell shaped distribution with a lower height and a
greater spread than the standard normal distribution.

2. As the sample size becomes larger, the t distribution approaches the standard normal
distribution.

3. The mean of the t-distribution is 0.

4. t-distribution is symmetrical about the mean.

5. The t-distribution has only one parameter called the degrees of freedom.
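
Properties 1, 2 and 4 can be illustrated numerically. The sketch below evaluates the density of the
t distribution directly from its standard formula and compares it with the standard normal density;
the function names t_pdf and normal_pdf are hypothetical helpers introduced only for this
illustration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # Work on the log scale so that large df does not overflow the gamma function.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


# Height at the centre (x = 0) and in the tail (x = 3) for increasing df.
for df in (1, 5, 30, 1000):
    print(df, round(t_pdf(0, df), 4), round(t_pdf(3, df), 4))
print("normal", round(normal_pdf(0), 4), round(normal_pdf(3), 4))
```

For small degrees of freedom the peak is lower and the tail is heavier than for the normal curve,
and both values move toward the normal ones as the degrees of freedom grow.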

Confidence interval for µ


The (1 − α)100% confidence interval for µ is

x̄ ± tsx̄ ,

where sx̄ = s/√n, given that n/N ≤ 0.05. The value of t is obtained from the t distribution table for
(n − 1) degrees of freedom and the given confidence level. Here, tsx̄ is the margin of error or the
maximum error of the estimate, that is

E = tsx̄ .

Example 122. According to a 2014 Kaiser Family Foundation Health Benefits survey released
in 2015, the total mean cost of employer-sponsored family health coverage was $16,834 per
family per year, of which workers were paying an average of $4823. A recent random sample
of 25 workers from New York City who have employer-provided health insurance coverage paid
an average premium of $6600 for family health insurance coverage with a standard deviation of
$800. Make a 95% confidence interval for the current average premium paid for family health
insurance coverage by all workers in New York City who have employer-provided health
insurance coverage. Assume that the premiums paid for family health insurance coverage by all
workers in New York City who have employer-provided health insurance coverage are
approximately normally distributed.

Answer: Here, σ is not known, n < 30 and the population is normally distributed. Therefore
we will use the t distribution to make a confidence interval for µ. From the given information,
we have,
n = 25
x̄ = $6600
s = $800
Confidence level = 95% or 0.95.
The value of sx̄ is
\[
s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{800}{\sqrt{25}} = \$160.
\]

Here, degrees of freedom = (25 − 1) = 24. To find the area in each tail, we divide the confidence
level by 2 and subtract the number obtained from 0.5. Thus,
Area in each tail = {0.5 − (0.95/2)}
= (0.5 − 0.4750)
= 0.025
From the t distribution table, the value of t for 24 degrees of freedom and 0.025 area in the
right tail is 2.064. By substituting all values in the formula for the confidence interval for µ,
we obtain the 95% confidence interval as
x̄ ± tsx̄ = 6600 ± 2.064(160) = 6600 ± 330.24
= $6269.76 to $6930.24.
Thus, we can state with 95% confidence that the current average premium paid for family
health insurance coverage by all workers in New York City who have employer-provided health
insurance coverage is between $6269.76 and $6930.24.
Example 123. Sixty-four randomly selected adults who buy books for general reading were asked
how much they usually spend on books per year. This sample produced a mean of $1450 and a
standard deviation of $300 for such annual expenses. Determine a 99% confidence interval for
the corresponding population mean.
Answer: Here, σ is not known, n > 30 and the population is not normally distributed. Therefore,
we will use the t distribution to make a confidence interval for µ. From the given information,
we have,
n = 64
x̄ = $1450
s = $300
Confidence level = 99% or 0.99.
The value of sx̄ is
\[
s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{300}{\sqrt{64}} = \$37.50.
\]
Here, degrees of freedom = (64 − 1) = 63. To find the area in each tail, we divide the confidence
level by 2 and subtract the number obtained from 0.5. Thus,
Area in each tail = {0.5 − (0.99/2)}
= (0.5 − 0.4950)
= 0.0050
From the t distribution table, the value of t for 63 degrees of freedom and 0.005 area in the
right tail is 2.656. By substituting all values in the formula for the confidence interval for µ,
we obtain the 99% confidence interval as
x̄ ± tsx̄ = 1450 ± 2.656(37.50) = 1450 ± 99.60
= $1350.40 to $1549.60.
Thus, we can state with 99% confidence that the mean annual amount spent on books by all
adults who buy books for general reading is between $1350.40 and $1549.60.
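
The arithmetic of the two t intervals above can be reproduced with a small Python sketch. The
function t_confidence_interval is a hypothetical helper written for this illustration; it takes
the t value as an input, on the assumption that it has been read from a t table as in the examples.

```python
import math


def t_confidence_interval(mean, s, n, t_value):
    """Return (lower, upper, margin) for mean ± t * s / sqrt(n)."""
    std_error = s / math.sqrt(n)   # estimated standard error of the sample mean
    margin = t_value * std_error   # E = t * (s / sqrt(n))
    return mean - margin, mean + margin, margin


# t values read from a t table: df = 24 with 0.025 in each tail, and
# df = 63 with 0.005 in each tail (the values used in the examples above).
for args in ((6600, 800, 25, 2.064), (1450, 300, 64, 2.656)):
    low, high, margin = t_confidence_interval(*args)
    print(f"{low:.2f} to {high:.2f} (E = {margin:.2f})")
```

Keeping the table lookup outside the function keeps the sketch free of third-party libraries; a
statistics package could supply the t value instead.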
Remark 11.5. If the sample size is large and the number of degrees of freedom is not in t
distribution table, then we can use the normal distribution as an approximation to t distribution.
In such a case, replace σ by s and σx̄ by sx̄ in the formula for constructing the confidence interval
for µ when population standard deviation (σ) is known.

12 Sample Proportion
The sample proportion p̂ is a sample statistic and therefore possesses a sampling distribution. We know that

1. The sampling distribution of the sample proportion p̂ is approximately normal for a large
sample.

2. The mean of the sampling distribution of p̂, µp̂ is equal to the population proportion p.

3. The standard deviation of the sampling distribution of p̂ is σp̂ = √(pq/n), where q = 1 − p,
given that n/N ≤ 0.05.

In the case of a proportion, a sample is considered to be large if np and nq are both greater
than 5. If p and q are not known, then np̂ and nq̂ should each be greater than 5 for the sample
to be large. When estimating the value of a population proportion, we do not know the values
of p and q. Consequently, we cannot compute σp̂ . Therefore, in the estimation of a population
proportion, we use the value of sp̂ as an estimate of σp̂ . Here, sp̂ is an estimator of σp̂ . The value
of sp̂ , which gives a point estimate of σp̂ , is calculated as follows:
\[
s_{\hat{p}} = \sqrt{\frac{\hat{p}\hat{q}}{n}}
\]
Please note that the condition n/N ≤ 0.05 must hold true to use this formula.

Confidence interval for p


For a large sample, the (1 − α)100% confidence interval for the population proportion, p, is

p̂ ± zsp̂ .

The value of z is obtained from the standard normal distribution table for the given confidence
level, and sp̂ = √(p̂q̂/n). The term zsp̂ is called the margin of error, or the maximum error of the
estimate, and is denoted by E.

Example 124. PolicyInteractive of Eugene, Oregon, conducted a study in April 2014 for the
Center for a New American Dream that included a sample of 1821 American adults. Seventy-five
percent of the people included in this study said that having basic needs met is very or extremely
important in their vision of the American dream.

1. What is the point estimate of the corresponding population proportion?

2. Find, with a 99% confidence level, the percentage of all American adults who will say that
having basic needs met is very or extremely important in their vision of the American
dream. What is the margin of error of this estimate?

Answer: Let p be the proportion of all American adults who will say that having basic needs
met is very or extremely important in their vision of the American dream, and let p̂ be the
corresponding sample proportion. From the given information, n = 1821, p̂ = 0.75, and q̂ =
1 − p̂ = 1 − 0.75 = 0.25. First, we calculate the value of the standard deviation of the sample
proportion as follows:
\[
s_{\hat{p}} = \sqrt{\frac{\hat{p}\hat{q}}{n}} = \sqrt{\frac{(0.75)(0.25)}{1821}} = 0.010147.
\]
Please note that np̂ and nq̂ are both greater than 5. Consequently, the sampling distribution of
p̂ is approximately normal and we will use the normal distribution to make a confidence interval
about p.
1. The point estimate of the proportion of all American adults who will say that having basic
needs met is very or extremely important in their vision of the American dream is equal
to 0.75; that is, point estimate of p = p̂ = 0.75.
2. The confidence level is 99%, or 0.99. To find z for a 99% confidence level, first we find the
area in each of the two tails of the normal distribution curve, which is (1−0.99)/2 = 0.0050.
Then, we look for 0.0050 and 0.0050 + 0.99 = 0.9950 areas in the normal distribution table
to find the two values of z. These two z values are −2.58 and 2.58. Thus, we will use
z = 2.58 in the confidence interval formula. Substituting all the values in the confidence
interval formula for p, we obtain
p̂ ± zsp̂ = 0.75 ± 2.58(0.010147) = 0.75 ± 0.026 = 0.724 to 0.776 or 72.4% to 77.6%.
Thus, we can state with 99% confidence that 0.724 to 0.776 or 72.4% to 77.6% of all
American adults will say that having basic needs met is very or extremely important in
their vision of the American dream.
The margin of error associated with this estimate of p is 0.026 or 2.6%, that is, Margin of
error = zsp̂ = 0.026 or 2.6%.
Example 125. According to a Gallup-Purdue University study of college graduates conducted
during February 4 to March 7, 2014, 63% of college graduates polled said that they had at least
one college professor who made them feel excited about learning. Suppose that this study was
based on a random sample of 2000 college graduates. Construct a 97% confidence interval for
the corresponding population proportion.
Answer: Let p be the proportion of all college graduates who would say that they had at least
one college professor who made them feel excited about learning, and let p̂ be the corresponding
sample proportion. From the given information, n = 2000, p̂ = 0.63, and q̂ = 1 − p̂ = 1 − 0.63 =
0.37. Here, Confidence level = 97% or 0.97. The standard deviation of the sample proportion is
\[
s_{\hat{p}} = \sqrt{\frac{\hat{p}\hat{q}}{n}} = \sqrt{\frac{(0.63)(0.37)}{2000}} = 0.010795.
\]
Please note that np̂ and nq̂ are both greater than 5. Consequently, the sampling distribution of
p̂ is approximately normal and we will use the normal distribution to make a confidence interval
about p. To find the z value for a 97% confidence level, we look for the areas 0.0150 and 0.9850
in the normal distribution table, which gives z = 2.17. Substituting all the
values in the formula, the 97% confidence interval for p is
p̂ ± zsp̂ = 0.63 ± 2.17(0.010795) = 0.63 ± 0.023 = 0.607 to 0.653 or 60.7% to 65.3%.
Thus, we can state with 97% confidence that the proportion of all college graduates who
would say that they had at least one college professor who made them feel excited about learning
is between 0.607 and 0.653 or between 60.7% and 65.3%.
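
The large-sample interval for p used in the two examples above can be sketched in Python as
follows. The function name proportion_confidence_interval is an illustrative choice. The sketch
assumes Python 3.8 or later, where statistics.NormalDist provides the inverse normal CDF; it
therefore uses unrounded z values (about 2.576 and 2.170) rather than the table values 2.58 and
2.17, so results can differ from the hand calculation beyond the third decimal place.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(p_hat, n, confidence):
    """Large-sample interval p_hat ± z * sqrt(p_hat * q_hat / n)."""
    q_hat = 1 - p_hat
    # The notes treat a sample as large only if n*p_hat and n*q_hat both exceed 5.
    if n * p_hat <= 5 or n * q_hat <= 5:
        raise ValueError("sample too small for the normal approximation")
    # z leaves (1 - confidence) / 2 in each tail of the standard normal curve.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin, margin


for p_hat, n, level in ((0.75, 1821, 0.99), (0.63, 2000, 0.97)):
    low, high, margin = proportion_confidence_interval(p_hat, n, level)
    print(f"{level:.0%}: {low:.3f} to {high:.3f} (E = {margin:.3f})")
```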

Determining the sample size for estimation of p
Given the confidence level and the values of p̂ and q̂, the sample size that will produce a
predetermined margin of error E of the confidence interval estimate of p is

\[
n = \frac{z^{2}\hat{p}\hat{q}}{E^{2}}.
\]
We can observe from this formula that to find n, we need to know the values of p̂ and q̂. However,
the values of p̂ and q̂ are not known to us. In such a situation, we can choose one of the following
two alternatives.

1. We make the most conservative estimate of the sample size n by using p̂ = 0.50 and
q̂ = 0.50. For a given E, these values of p̂ and q̂ will give us the largest sample size in
comparison to any other pair of values of p̂ and q̂ because the product of p̂ = 0.50 and
q̂ = 0.50 is greater than the product of any other pair of values for p̂ and q̂.

2. We take a preliminary sample (of arbitrarily determined size) and calculate p̂ and q̂ for
this sample. Then, we use these values of p̂ and q̂ to find n.

Example 126. Lombard Electronics Company has just installed a new machine that makes a
part that is used in clocks. The company wants to estimate the proportion of these parts produced
by this machine that are defective. The company manager wants this estimate to be within 0.02
of the population proportion for a 95% confidence level. What is the most conservative estimate
of the sample size that will limit the margin of error to within 0.02 of the population proportion?

Answer: The company manager wants the 95% confidence interval to be

p̂ ± 0.02.

Therefore, E = 0.02. The value of z for a 95% confidence level is 1.96. For the most conservative
estimate of the sample size, we will use

p̂ = 0.50 and q̂ = 0.50.

Hence, the required sample size is

\[
n = \frac{z^{2}\hat{p}\hat{q}}{E^{2}} = \frac{(1.96)^{2}(0.50)(0.50)}{(0.02)^{2}} = 2401.
\]

Thus, if the company takes a sample of 2401 parts, there is a 95% chance that the estimate of
p will be within 0.02 of the population proportion.

Example 127. Suppose a preliminary sample of 200 parts produced by this machine showed
that 7% of them are defective. How large a sample should the company select so that the 95%
confidence interval for p is within 0.02 of the population proportion?

Answer: Again, the company wants the 95% confidence interval for p to be

p̂ ± 0.02.

Hence, E = 0.02. The value of z for a 95% confidence level is 1.96. From the preliminary sample,
p̂ = 0.07 and q̂ = 1 − 0.07 = 0.93.
Using these values of p̂ and q̂, we obtain
\[
n = \frac{z^{2}\hat{p}\hat{q}}{E^{2}} = \frac{(1.96)^{2}(0.07)(0.93)}{(0.02)^{2}} = 625.22 \approx 626.
\]
Thus, if the company takes a sample of 626 parts, there is a 95% chance that the estimate of p
will be within 0.02 of the population proportion.
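
Both sample-size calculations for a proportion can be checked with the sketch below. The function
name sample_size_for_proportion and the default p_hat=0.5 (the most conservative choice described
above) are illustrative assumptions, not part of these notes.

```python
from fractions import Fraction
import math


def sample_size_for_proportion(z, margin, p_hat=0.5):
    """Smallest n with z * sqrt(p_hat * q_hat / n) <= margin."""
    # Exact rational arithmetic avoids floating-point noise pushing a whole
    # number such as 2401 up to 2402 when rounding up.
    z, margin, p_hat = (Fraction(str(v)) for v in (z, margin, p_hat))
    n_exact = z ** 2 * p_hat * (1 - p_hat) / margin ** 2
    return math.ceil(n_exact)


print(sample_size_for_proportion(1.96, 0.02))              # conservative: 2401
print(sample_size_for_proportion(1.96, 0.02, p_hat=0.07))  # preliminary sample: 626
```

The gap between the two answers shows how much a preliminary estimate of p far from 0.50 can
reduce the required sample size.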

13 Maximum Likelihood Estimation


To introduce the method of maximum likelihood, consider a very simple estimation problem.
Suppose that an urn contains a number of black and a number of white balls, and suppose that
it is known that the ratio of the numbers is 3/1 but that it is not known whether the black or
the white balls are more numerous. That is, the probability of drawing a black ball is either 3/4
or 1/4. If n balls are drawn with replacement from the urn, the distribution of X, the number of
black balls, is given by the binomial distribution
\[
f(x; p) = {}^{n}C_{x}\, p^{x} q^{n-x} \quad \text{for } x = 0, 1, 2, \ldots, n,
\]
where q = 1 − p and p is the probability of drawing a black ball. Here p = 1/4 or p = 3/4. We
shall draw a sample of three balls, that is, n = 3, with replacement and attempt to estimate
the unknown parameter p of the distribution. The possible outcomes and their probabilities are
given below:
Outcome: x      0        1        2        3
f(x; 3/4)       1/64     9/64     27/64    27/64
f(x; 1/4)       27/64    27/64    9/64     1/64
In the present example, if we found x = 0 in a sample of 3, the estimate 0.25 for p would be
preferred over 0.75 because the probability 27/64 is greater than 1/64, i.e., because a sample with
x = 0 is more likely (in the sense of having larger probability) to arise from a population with
p = 1/4 than from one with p = 3/4. In general, we should estimate p by 0.25 when x = 0 or 1
and by 0.75 when x = 2 or 3. The estimator may be defined as
\[
\hat{p} = \hat{p}(x) =
\begin{cases}
0.25 & \text{for } x = 0, 1 \\
0.75 & \text{for } x = 2, 3.
\end{cases}
\]
The estimator thus selects for every possible x the value of p, say p̂, such that
f(x; p̂) > f(x; p′),
where p′ is the alternative value of p.
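
The urn example can be reproduced exactly with rational arithmetic. In the sketch below, the
function names binomial_pmf and ml_estimate are hypothetical helpers; ml_estimate simply picks,
among the two candidate values of p, the one under which the observed x has the larger probability.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def ml_estimate(x, n, candidates):
    """Pick the candidate p under which the observed x is most probable."""
    return max(candidates, key=lambda p: binomial_pmf(x, n, p))


candidates = (Fraction(1, 4), Fraction(3, 4))
for x in range(4):
    probs = [binomial_pmf(x, 3, p) for p in candidates]
    print(x, probs, "->", ml_estimate(x, 3, candidates))
```

With n = 3 there are no ties between the two candidates, so the selected value is unique for
every x.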
Definition 13.1. The likelihood function of n random variables X1, X2, . . . , Xn is defined to be
the joint density of the n random variables, say fX1,...,Xn(x1, . . . , xn; θ), which is considered to
be a function of θ. In particular, if X1, . . . , Xn is a random sample from the density f(x; θ),
then the likelihood function is f(x1; θ)f(x2; θ) · · · f(xn; θ).

The likelihood function L(θ; x1, x2, . . . , xn) gives the likelihood that the random variables
assume the particular values x1, x2, . . . , xn.
Definition 13.2. We want to find the value of θ, denoted by θ̂, which maximizes the likelihood
function L(θ; x1 , x2 , . . . , xn ). The value θ̂ which maximizes the likelihood function is, in general,
a function of x1 , . . . , xn , say θ̂ = θ̂(x1 , x2 , . . . , xn ). When this is the case, the random variable
θ̂ = θ̂(X1 , X2 , . . . , Xn ) is called the maximum likelihood estimator of θ.
The most important cases which we shall consider are those in which X1 , X2 , . . . , Xn is a
random sample from some density f (x; θ), so that the likelihood function is

L(θ) = f (x1 ; θ)f (x2 ; θ) . . . f (xn ; θ).

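When L(θ) is differentiable and attains its maximum in the interior of the parameter space, the
maximum likelihood estimator is a solution of the equation
\[
\frac{\partial L(\theta)}{\partial \theta} = 0.
\]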
Also L(θ) and ln L(θ) have their maxima at the same value of θ, and it is sometimes easier to
find the maximum of the logarithm of the likelihood.
Remark 13.1. Either notation, log or ln, may be used for the log-likelihood function; in the
examples that follow, log denotes the natural logarithm.
Example 128. Consider a characteristic that occurs in proportion p of a population. Let
X1, . . . , Xn be a random sample of size n, so that P[Xi = 0] = 1 − p and P[Xi = 1] = p for
i = 1, . . . , n, where 0 ≤ p ≤ 1. Obtain the maximum likelihood estimator of p.
Answer: The concerned p.m.f. is
\[
P(X_i = x_i) = p^{x_i}(1-p)^{1-x_i}, \quad x_i \in \{0, 1\}.
\]
The likelihood function is
\[
L(p \mid x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i}
= p^{\sum_{i=1}^{n} x_i}\,(1-p)^{\,n-\sum_{i=1}^{n} x_i}.
\]

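Equivalently, we maximize the log-likelihood function log L(p) over 0 < p < 1. Setting the
derivative equal to zero,
\[
\begin{aligned}
\frac{\partial \log L(p)}{\partial p}
&= \frac{\partial}{\partial p}\left[\sum_{i=1}^{n} x_i \log p + \Big(n - \sum_{i=1}^{n} x_i\Big)\log(1-p)\right] \\
&= \frac{\sum_{i=1}^{n} x_i}{p} - \frac{n - \sum_{i=1}^{n} x_i}{1-p} = 0.
\end{aligned}
\]
We obtain the maximum likelihood estimator
\[
\hat{p} = \frac{1}{n}\sum_{i=1}^{n} x_i .
\]
That is, p̂ is the fraction of persons in the sample that have the characteristic.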
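
As a numerical check of this result, the sketch below evaluates the Bernoulli log-likelihood on a
fine grid of values of p and compares the grid maximizer with the sample fraction. The data in xs
are made up for illustration, and the function name bernoulli_log_likelihood is a hypothetical
helper.

```python
import math


def bernoulli_log_likelihood(p, xs):
    """log L(p) = (sum x_i) log p + (n - sum x_i) log(1 - p)."""
    successes = sum(xs)
    return successes * math.log(p) + (len(xs) - successes) * math.log(1 - p)


xs = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]        # hypothetical 0/1 sample
grid = [k / 1000 for k in range(1, 1000)]  # interior points of (0, 1)
p_grid = max(grid, key=lambda p: bernoulli_log_likelihood(p, xs))
p_closed_form = sum(xs) / len(xs)          # the fraction of ones in the sample

print(p_grid, p_closed_form)
```

A grid search is crude, but it needs no calculus, so it gives an independent check on the
closed-form estimator.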
Example 129. Let X1, . . . , Xn be a random sample of size n from the Poisson distribution
\[
f(x \mid \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots,
\]
where 0 ≤ λ < ∞. Obtain the maximum likelihood estimator of λ.

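Answer: The concerned p.m.f. is
\[
P(X_i = x_i) = \frac{\lambda^{x_i} e^{-\lambda}}{x_i!}.
\]
The likelihood function is
\[
L(\lambda \mid x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!}
= \lambda^{\sum_{i=1}^{n} x_i}\, e^{-n\lambda}\, \frac{1}{\prod_{i=1}^{n} x_i!}.
\]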

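We can maximize the log-likelihood function log L(λ) by setting its derivative equal to zero,
\[
\frac{\partial \log L(\lambda)}{\partial \lambda} = \frac{\sum_{i=1}^{n} x_i}{\lambda} - n = 0.
\]
So, the maximum likelihood estimator is the sample mean,
\[
\hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} x_i .
\]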
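
The Poisson result can be checked numerically: at the sample mean the derivative of the
log-likelihood should vanish, and the log-likelihood should be lower on either side. The counts in
xs are made up for illustration, and the function names are hypothetical helpers.

```python
import math


def poisson_log_likelihood(lam, xs):
    """log L(lambda) = (sum x_i) log(lambda) - n*lambda - sum log(x_i!)."""
    return (sum(xs) * math.log(lam) - len(xs) * lam
            - sum(math.lgamma(x + 1) for x in xs))  # lgamma(x + 1) = log(x!)


def poisson_score(lam, xs):
    """Derivative of the log-likelihood with respect to lambda."""
    return sum(xs) / lam - len(xs)


xs = [2, 0, 3, 1, 4, 2, 1, 3]   # hypothetical counts
lam_hat = sum(xs) / len(xs)     # closed-form MLE: the sample mean

print(lam_hat, poisson_score(lam_hat, xs))
for lam in (lam_hat - 0.1, lam_hat, lam_hat + 0.1):
    print(round(lam, 2), round(poisson_log_likelihood(lam, xs), 4))
```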

Example 130. Let X1, . . . , Xn be a random sample of size n from the Exponential distribution
\[
f(x \mid \lambda) = \lambda e^{-\lambda x}, \quad x \ge 0,
\]
where 0 < λ < ∞. Obtain the maximum likelihood estimator of λ.

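Answer: The concerned p.d.f. is
\[
f(x_i \mid \lambda) = \lambda e^{-\lambda x_i}, \quad x_i \ge 0.
\]
The likelihood function is
\[
L(\lambda \mid x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i}
= \lambda^{n} e^{-\lambda \sum_{i=1}^{n} x_i}.
\]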

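We can maximize the log-likelihood function log L(λ) by setting its derivative equal to zero,
\[
\frac{\partial \log L(\lambda)}{\partial \lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0.
\]
So, the maximum likelihood estimator is
\[
\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i},
\]
the reciprocal of the sample mean.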
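
When no closed form is available, the log-likelihood can be maximized numerically. The sketch below
does this for the exponential case with a simple ternary search and compares the answer with the
closed-form estimator. The search routine maximize_unimodal, the search bounds and the data in xs
are assumptions made for this illustration; ternary search is used here only because the
exponential log-likelihood has a single peak in λ.

```python
import math


def exponential_log_likelihood(lam, xs):
    """log L(lambda) = n log(lambda) - lambda * sum x_i."""
    return len(xs) * math.log(lam) - lam * sum(xs)


def maximize_unimodal(f, low, high, tol=1e-9):
    """Ternary search for the maximizer of a single-peaked function on [low, high]."""
    while high - low > tol:
        m1 = low + (high - low) / 3
        m2 = high - (high - low) / 3
        if f(m1) < f(m2):
            low = m1    # the peak cannot lie to the left of m1
        else:
            high = m2   # the peak cannot lie to the right of m2
    return (low + high) / 2


xs = [0.8, 2.1, 0.3, 1.7, 0.9, 1.2]   # hypothetical waiting times
lam_numeric = maximize_unimodal(
    lambda lam: exponential_log_likelihood(lam, xs), 1e-6, 50.0)
lam_closed_form = len(xs) / sum(xs)   # n divided by the sum of the observations

print(round(lam_numeric, 6), round(lam_closed_form, 6))
```

The two values should agree up to the accuracy of the search, which is limited by floating-point
noise near the flat top of the log-likelihood.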

References
[1] Mann, Prem S. Introductory statistics. John Wiley & Sons, 2010.

[2] Gupta, S. P. Elementary Statistical Methods. Sultan Chand & Sons, 2022.

[3] Daniel, Wayne W. Biostatistics: a foundation for analysis in the health sciences. Vol. 129.
Wiley, 1978.
