Awoke Introduction To Statistics
In the plural sense:- statistics is defined as a collection of numerical facts or figures (or the
raw data themselves).
Eg. 1. Vital statistics (numerical data on marriages, births, deaths, etc.).
2. The statement that the average mark of students in the statistics course is 70% would be
considered statistics, whereas the statement that Abebe has got 90% in the statistics course is not statistics.
Remark: statistics are aggregates of facts. Single and isolated figures are not statistics, as they
cannot be compared and are unrelated.
In its singular sense:- the word Statistics is the subject that deals with the methods of collecting,
organizing, presenting, analyzing and interpreting statistical data.
Classification of Statistics
Statistics is broadly divided into two categories based on how the collected data are used.
Descriptive Statistics:- deals with describing the data collected without going further to draw
conclusions.
Example 1.1: Suppose that the mark of 6 students in Statistics course for Mathematics is given
as 40, 45, 50, 60, 70 and 80. The average mark of the 6 students is 57.5 and it is considered as
descriptive statistics.
Inferential Statistics:- It deals with making inferences and/or conclusions about a population
based on data obtained from a sample of observations. It consists of performing hypothesis
testing, determining relationships among variables and making predictions.
Example 1.2: In the above example, if we say that the average mark in the Statistics course for
all Mathematics students is 57.5, then we are talking about inferential statistics (drawing a
conclusion about the population based on the sample observations).
Interpretation of data: - Interpretation means drawing conclusions from the data collected and
analyzed. Correct interpretation will lead to a valid conclusion of the study & thus can aid in
decision making.
1.3 Definition of some statistical terms
Population: - It is the totality of objects under study. The term population doesn’t necessarily refer to people.
Examples:
All clients of Telephone Company
All students of Debre Markos University (DMU)
Population of families, etc.
The population could be finite or infinite (an imaginary collection of units).
Sample: - is part or subset of population under study.
Sampling frame: - is the list of all possible units of the population from which the sample can be
drawn.
Eg. List of all students of MeU, List of all residential houses in Mett town, etc
Quantitative variables: - are variables which assume numerical values. eg. Age, weight, etc.
ratios, rates, coefficients, etc, are the tools that can be used for the purpose of comparing sets of
data.
• Statistics helps to predict future trends: statistics is very useful for analyzing the past and
present data and forecasting future events.
• Statistics helps to formulate & review policies: Statistics provide the basic material for
framing suitable policies. Statistical study results in the areas of taxation, on unemployment rate,
on inflation, on the performance of every sort of military equipment, etc, may convince a
government to review its policies and plans with the view to meet national needs and aspirations.
• Formulating and testing hypothesis: Statistical methods are extremely useful in formulating
and testing hypothesis and to develop new theories.
Limitations of Statistics
The field of statistics, though widely used in all areas of human knowledge and widely applied in
a variety of disciplines such as engineering, economics and research, has its own limitations.
Some of these limitations are:
a) It does not deal with individual values: as discussed earlier, statistics deals with aggregate of
facts. For example, wage earned by an individual worker at any one time, taken by itself is not a
statistics.
b) It does not deal with qualitative characteristics directly: statistics is not applicable to
qualitative characteristics such as beauty, honesty, poverty, standard of living and so on since
these cannot be expressed in quantitative terms. These characteristics, however, can be
statistically dealt with if some quantitative values can be assigned to these with logical criterion.
For example, intelligence may be compared to some degree by comparing IQs or some other
scores in certain intelligence tests.
c) Statistical conclusions are not universally true: since statistics is not an exact science, as is
the case with natural sciences, the statistical conclusions are true only under certain assumptions.
d) It can be misused: statistics cannot be used to full advantage in the absence of proper
understanding of the subject matter.
1.5 Levels of Measurement
Proper knowledge about the nature and type of data to be dealt with is essential in order to
specify and apply the proper statistical method for their analysis and inferences.
Scale Types
Measurement is the assignment of values to objects or events in a systematic fashion. Four levels
of measurement scales are commonly distinguished: nominal, ordinal, interval, and ratio, and
each possesses different properties of measurement systems. The first two are qualitative while
the last two are quantitative.
Nominal scale: The values of a nominal attribute are just different names, i.e., nominal attributes
provide only enough information to distinguish one object from another. They are qualities with no
ranking or ordering and no numerical or quantitative value. These types of data consist of
names, labels and categories. This is a scale for grouping individuals into different categories.
Example 1.3: Eye color: brown, black, etc.; sex: male, female.
In this scale, one is different from the other.
Arithmetic operations (+, -, *, ÷) are not applicable, and ordering comparisons (<, >) are
impossible; only equality or difference (=, ≠) can be established.
Ordinal scale: - defined as nominal data that can be ordered or ranked.
Can be arranged in some order, but the differences between the data values are
meaningless.
Data consisting of an ordering or ranking of measurements are said to be on an ordinal
scale of measurement. That is, the values of an ordinal scale provide enough information
to order objects.
One is different from and greater /better/ less than the other
Arithmetic operations (+, -, *, ÷) are impossible, comparison (<, >, ≠, etc) is possible.
Example 1.4 -Letter grading (A, B, C, D, F), -Rating scales (excellent, very good, good, fair,
poor), military status (general, colonel, lieutenant, etc).
Interval Level: data are defined as ordinal data for which the differences between data values are
meaningful. However, there is no true zero, or starting point, and the ratios of data values are
meaningless. Note: Celsius & Fahrenheit temperature readings have no meaningful zero and
ratios are meaningless.
In this measurement scale:-
One is different, better/greater and by a certain amount of difference than another.
Possible to add and subtract. For example: 80°C – 50°C = 30°C, 70°C – 40°C = 30°C.
Multiplication and division are not meaningful. For example, 60°C = 3 × (20°C) numerically, but this does
not imply that an object which is at 60°C is three times as hot as an object which is at 20°C.
Most common examples are: IQ, temperature.
Ratio scale: Similar to interval, except there is a true zero (absolute absence), or starting point,
and the ratios of data values have meaning.
Arithmetic operations (+, -, *, ÷) are applicable. For ratio variables, both differences and
ratios are meaningful.
One is different from, and larger/taller/better/less than, the other by a certain amount of
difference and by so many times.
This measurement scale provides better information than interval scale of measurement.
Example 1.5: weight, age, number of students.
Exercise 1
CHAPTER TWO: METHODS OF DATA COLLECTION AND PRESENTATION
Data: - is the raw material of statistics. It can be obtained either by measurement or counting.
Sources of data
There are two types of source of data:
1. Primary source
2. Secondary source
The statistical data may be classified under two categories depending up on the sources.
1. Primary data: - Data collected by the investigator himself for the purpose of a specific
inquiry or study. Such data are original in character & are mostly generated by surveys
conducted by individuals or research institutions.
It is more reliable & accurate since the investigator can extract the correct information by
removing doubts, if any, in the minds of the respondents regarding certain questions.
2. Secondary data: - When an investigator uses data, which have already been collected by
others, such data are called secondary data. Such data are primary data for the agency that
collected them, and become secondary for some one else who uses these data for his own
purposes. Example of secondary data: books, reports, magazines, etc.
When our source is secondary data, check that:
The type and objective of the situations.
The purpose for which the data were collected is compatible with the present problem.
The nature and classification of the data are appropriate to our problem.
There are no biases and misreporting in the published data.
Note: Data which are primary for one may be secondary for the other.
2.2 Methods of Data Presentation
Having collected and edited the data, the next important step is to organize it, that is, to present it
in a readily comprehensible, condensed form that helps in drawing inferences from it. It is
also necessary that the like be separated from the unlike ones.
The presentation of data is broadly classified in to the following two categories:
Tabular presentation
Diagrammatic and Graphic presentation.
The process of arranging data in to classes or categories according to similarities technically is
called classification. It eliminates inconsistency and also brings out the points of similarity
and/or dissimilarity of collected items/data.
Classification is necessary because it would not be possible to draw inferences and conclusions if
we have a large set of collected [raw] data.
2.2.1 Frequency distribution
Frequency:- is the number of times a certain value or class of values occurs.
Frequency distribution (FD):- is the organization of raw data in table form using classes and
frequency.
There are three types of FD and there are specific procedures for constructing each type.
I. Categorical FD
II. Ungrouped FD and
III. Grouped FD
I. Categorical FD: Used for data that can be placed in specific categories; such as nominal,
ordinal level of data.
Example 2.1: Twenty five patients were given a blood test to determine their blood type. The
data are as shown below: A, A, B, B, AB, O, O, O, B, AB, B, B, B, O, A, O, O, O, AB, AB, A, O, O, B, A.
Solution: since the data are categorical by taking the four blood types as classes we can
construct a FD as shown below.
Step 1: Make a table as shown below
Step 2: Tally data and place the result under the column Tally
Step 3: Count the tallies and place the result under the column Frequency.
Step 4: Find the percentage of values in each class by the formula % = (f/n) × 100%, where f = frequency
and n = total number of observations.
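The four steps above can be sketched in Python. This is a minimal illustration, not part of the original notes: the function name `categorical_fd` is an assumption introduced here, and the data are the 25 blood types listed in Example 2.1.

```python
from collections import Counter

# Blood types of the 25 patients in Example 2.1, in the order listed.
blood_types = "A A B B AB O O O B AB B B B O A O O O AB AB A O O B A".split()


def categorical_fd(observations):
    """Return {category: (frequency, percent)} for categorical data.

    `categorical_fd` is a hypothetical helper name used for illustration.
    """
    n = len(observations)            # total number of observations
    counts = Counter(observations)   # Steps 2-3: tally and count
    # Step 4: percent = f / n * 100
    return {cat: (f, f / n * 100) for cat, f in counts.items()}


fd = categorical_fd(blood_types)
for cat in ("A", "B", "AB", "O"):
    f, pct = fd[cat]
    print(f"{cat:<3} {f:>3} {pct:6.1f}%")
```

The frequencies always add up to n and the percentages to 100, which is a quick check on the tally.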
An ungrouped FD is often constructed for a small set of data or for data on a discrete variable.
First find the smallest and largest raw score in the collected data.
Arrange the data in order of magnitude and count the frequency.
To facilitate counting one may include a column of tallies.
80 76 90 85 80
70 60 62 70 85
65 60 63 74 75
76 70 70 80 85
Solution:
Step 1: Find the range, Range=Max-Min=90-60=30.
Step 2: Make a table as shown
Step 3: Tally the data.
Step 4: Compute the frequency.
Mark Tally Frequency
60 // 2
62 / 1
63 / 1
65 / 1
70 //// 4
74 / 1
75 / 1
76 // 2
80 /// 3
85 /// 3
90 / 1
-Each individual value is presented separately, that is why it is named ungrouped frequency
distribution.
3. Grouped Frequency Distribution (GFD).
When the range of the data is large the data must be grouped in to classes that are more than one
unit in width.
Definition of some basic terms
Grouped frequency distribution: is a FD when several numbers are grouped into one
class.
Class limits (CL): It separates one class from another. The limits could actually appear in
the data and have gaps between the upper limits of one class and the lower limit of the next
class.
Unit of measure (U): This is the possible difference between successive values. E.g. 1,
0.1, 0.01, 0.001……
Class boundaries: Separate one class in a grouped frequency distribution from the other.
The boundary has one more decimal place than the raw data. There is no gap between the
upper boundaries of one class and the lower boundaries of the succeeding class. Lower
class boundary is found by subtracting half of the unit of measure from the lower class
limit and upper class boundary is found by adding half unit measure to the upper class
limit.
Class width (W): The difference between the upper and lower boundaries of any
class. The class width is also the difference between the lower limits (or the upper
limits) of two consecutive classes.
Class mark (Mid point): It is found by adding the lower and upper class limits
(or boundaries) and dividing the sum by two.
Cumulative frequency (CF): It is the number of observations less than the upper class
boundary or greater than the lower class boundary of a class.
CF (Less than type): it is the number of values less than the upper class boundary of a
given class.
CF (Greater than type): it is the number of values greater than the lower class boundary
of a given class.
Relative frequency (Rf): The frequency divided by the total frequency. This gives the
proportion (or, multiplied by 100, the percent) of values falling in that class.
$Rf_i = f_i / n = f_i / \sum f_i$
Relative cumulative frequency (RCf): The running total of the relative frequencies, or the
cumulative frequency divided by the total frequency. It gives the proportion of the values which
are less than the upper class boundary (less than type) or greater than the lower class boundary (greater than type).
3. Select the number of class desired (K)
I. Choose arbitrarily between 5 and 15.
II. Use Sturges' formula:
K = 1 + 3.322 log n (logarithm to base 10); n = total frequency
4. Find the class width (W) by dividing the range by the number of classes and round to the
nearest integer.
W = R/K
5. Identify the unit of measure usually as 1, 0.1, 0.01,…..
6. Pick a suitable starting point less than or equal to the minimum value. Your starting point
is lower limit of the first class.
- Then continue to add the class width to get the rest lower class limits.
7. Find the upper class limits: UCLi = LCLi + W - U. Then continue to add the width to get the rest of the
upper class limits.
8. find class boundaries
LCBi = LCLi – ½ U, UCBi = UCLi + ½ U
9. Find class mark
CMi = (UCLi + LCLi) / 2 or CMi = (UCBi + LCBi) / 2.
10. Tally the data
11. Find the frequencies
12. Find the cumulative frequencies. Depending on what you are trying to accomplish, it may
be necessary to find the cumulative frequency.
13. If necessary find RF and RCF.
The groups should normally be of an equal width, so that the counts in different groups can
easily be compared.
Example 2.3: Construct FD for the following data.
11 29 6 33 14 31 22 27 19 20 18 17 22 38 23 21 26 34 39 27
Solution:-
1) Highest value = 39, Lowest value = 6
2) Range = 39 – 6 = 33
3) K = 1 + 3.322 log 20 = 1 + 3.322(1.301) = 5.32 ≈ 6 (rounded up)
4) W = R / K = 33/6 = 5.5 ≈ 6
5) U = 1
6) LCL1= 6
7) Find the upper class limits.
8) Find class boundaries
9) Find class mark
10) Tally the data
Class limit   Class boundary   Class mark   Tally      Frequency   CF(<)   CF(>)   RF             RCF(>)
6 – 11        5.5 – 11.5       8.5          //         2           2       20      2/20 = 0.10    1.00
12 – 17       11.5 – 17.5      14.5         //         2           4       18      2/20 = 0.10    0.90
18 – 23       17.5 – 23.5      20.5         ///// //   7           11      16      7/20 = 0.35    0.80
24 – 29       23.5 – 29.5      26.5         ////       4           15      9       4/20 = 0.20    0.45
30 – 35       29.5 – 35.5      32.5         ///        3           18      5       3/20 = 0.15    0.25
36 – 41       35.5 – 41.5      38.5         //         2           20      2       2/20 = 0.10    0.10
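The construction in Example 2.3 can be reproduced with a short Python sketch. It is illustrative only: the function name `grouped_fd` and the dictionary keys are assumptions introduced here, and both the number of classes and the class width are rounded up (an assumption chosen so that the classes always cover the whole range).

```python
import math

# Data of Example 2.3.
data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]


def grouped_fd(values, unit=1):
    """Build a grouped frequency distribution (hypothetical helper).

    unit is the unit of measure U; K and W are rounded up (assumption).
    """
    n = len(values)
    k = math.ceil(1 + 3.322 * math.log10(n))             # Sturges' formula
    width = math.ceil((max(values) - min(values)) / k)   # W = R / K
    rows = []
    lcl = min(values)                                    # first lower class limit
    for _ in range(k):
        ucl = lcl + width - unit                         # UCL = LCL + W - U
        f = sum(lcl <= v <= ucl for v in values)
        rows.append({
            "limits": (lcl, ucl),
            "boundaries": (lcl - unit / 2, ucl + unit / 2),
            "mark": (lcl + ucl) / 2,
            "f": f,
        })
        lcl += width
    running = 0
    for row in rows:
        running += row["f"]
        row["cf_less"] = running                  # less-than cumulative frequency
        row["cf_more"] = n - running + row["f"]   # greater-than cumulative frequency
        row["rf"] = row["f"] / n                  # relative frequency
    return rows


rows = grouped_fd(data)
for r in rows:
    print(r["limits"], r["boundaries"], r["mark"], r["f"],
          r["cf_less"], r["cf_more"], r["rf"])
```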
The three most commonly used diagrammatic presentations for discrete as well as qualitative data
are:
Pie chart
Bar chart
Pictogram
A) Pie chart
A pie chart is a circle that is divided in to sections or wedges according to the percentage of
frequencies in each category of the distribution. The angle of the sector is obtained using:
$$\text{Angle of a sector} = \frac{\text{Value of the part}}{\text{The whole quantity}} \times 360^\circ$$
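As a small illustration of the sector-angle formula, the following Python sketch converts category frequencies into angles. The function name `sector_angles` is an assumption introduced here, and the counts used are those obtained by tallying the blood types of Example 2.1 (A = 5, B = 7, AB = 4, O = 9).

```python
def sector_angles(frequencies):
    """Angle of each sector = value of the part / whole quantity * 360 degrees.

    `sector_angles` is a hypothetical helper name used for illustration.
    """
    total = sum(frequencies.values())
    return {name: value / total * 360 for name, value in frequencies.items()}


# Frequencies tallied from the 25 blood types of Example 2.1.
blood_type_counts = {"A": 5, "B": 7, "AB": 4, "O": 9}
angles = sector_angles(blood_type_counts)
for name, angle in angles.items():
    print(f"{name:<3} {angle:6.1f} degrees")
```

The angles of all sectors should add up to 360 degrees, which is a convenient check before drawing.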
Example 2.4: Draw a suitable diagram to represent the following population in a town.
Step 3: Using a protractor and compass, graph each section and write its name with
corresponding percentage.
B) Bar Charts
Used to represent & compare the frequency distribution of discrete variables and
attributes or categorical series.
Bars can be drawn either vertically or horizontally.
All bars must have equal width and the distance between bars must be equal.
The height or length of each bar indicates the size (frequency) of the figure represented.
There are different types of bar charts. The most common being:
Example 2.5: The number of students in the four departments of Science College is given as follows:
Solution:
[Figure: Simple bar chart; vertical axis: Frequency]
Example 2.6: Draw a component (sub-divided) bar chart of the number of students by
department given in Example 2.5.
Solution:
[Figure: Component bar chart; vertical axis: Frequency (0 to 800); horizontal axis: Department (Phys, Maths, Chem, Bio); each bar subdivided into Female and Male]
III. Multiple Bar charts
These are used to display data on more than one variable.
They are used for comparing different variables at the same time.
Example 2.7: The following data represent sales by product, 1957- 1959 of a given company
for three products A, B, C.
Solution:
C) Pictograph
In this diagram, we represent data by means of some picture symbols. We decide about a suitable
picture to represent a definite number of units in which the variable is measured.
The histogram, frequency polygon and cumulative frequency graph (ogive) are the most commonly
applied graphical representations for continuous data.
Draw and label the X and Y axis.
Choose a suitable scale for the frequencies or cumulative frequencies and label it on the Y
axis.
Represent the class boundaries for the histogram or ogive or the mid points for the frequency
polygon on the X axis.
Plot the points.
Draw the bars or lines to connect the points.
Histogram
A graph which displays the data by using vertical connected bars of various heights to represent
frequencies. Class boundaries are placed along the horizontal axis. Class marks and class limits
are some times used as quantity on the X axis.
Example 2.8: Construct a histogram to represent the following data.
Solution:
[Figure: Histogram; vertical axis: Frequency (0 to 20); horizontal axis: Class boundaries]
Frequency polygon
If we join the mid-points of the tops of the adjacent rectangles of the histogram with line
segments a frequency polygon is obtained. When the polygon is continued to the x-axis just
outside the range of the lengths the total area under the polygon will be equal to the total area
under the histogram.
Example 2.9: Construct a frequency polygon to represent the previous data in example 2.8.
Solution:
[Table: Class limits, Frequency, Class marks, Class boundaries, R.F., % R.F. (percent), Less than C.F., More than C.F.]
Adding two class marks with $f_i = 0$, namely 9.5 at the beginning and 89.5 at the end, the
following frequency polygon is plotted:
[Figure: Frequency Polygon; vertical axis: Frequency (0 to 16); horizontal axis: Class mark (9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5)]
An Ogive (pronounced as “oh-jive”) is a line that depicts cumulative frequencies, just as the
cumulative frequency distribution lists cumulative frequencies. Note that the Ogive uses class
boundaries along the horizontal scale, and the graph begins with the lower boundary of the first class
and ends with the upper boundary of the last class. An Ogive is useful for determining the number of
values below or above some particular value. There are two types of Ogive, namely the less than
Ogive and the more than Ogive. The difference is that the less than Ogive uses the less than cumulative
frequency and the more than Ogive uses the more than cumulative frequency on the y axis.
Example 2.10: Draw both types of ogives for the F.D. of Example 2.8.
Solutions:
[Figure: The Less than Ogive and The More than Ogive; vertical axis: Cumulative Frequency (0 to 60); horizontal axis: Class Boundaries (14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5)]
Note: For both ogives, one class with frequency zero is added, for the same reason as with the
frequency polygon.
Exercise 2
2. Distinguish between primary and secondary data. What precautions should be taken
before using secondary data?
3. Construct a frequency distribution for a survey taken at a hotel, that 40 tourists arrived by
the following means of transportation:
car car bus plane plane car plane plane bus car plane car car car
plane bus car bus car plane car car car bus car bus bus
plane plane plane car plane plane plane bus bus car car plane car
4. The following are weekly salaries (in birr) of employees of a firm:
a) How many classes can be used? c) What LCL would be used for the first class?
b) What class width should be used? d) Prepare the complete frequency distribution.
5. Given the following frequency distribution:
frequency 16 25 13 4 2
Find a) the class marks; b) the class boundaries; c) the relative frequencies
Objectives
At the end of this chapter students will be able to:
Identify measures of central tendency.
Understand the properties of the arithmetic mean.
Summarize an aggregate of statistical data by using a single measure.
Define and calculate the mean, mode and median.
Measure the position of data using quartiles, deciles and percentiles with their
interpretation.
3.1 The Summation Notation ($\sum$)
Statistical Symbols: Let a data set consist of a number of observations, represented by $x_1, x_2, \ldots, x_n$,
where $n$ (the last subscript) denotes the number of observations in the data and $x_i$ is the $i$th
observation. Then the sum of all the numbers ($x_i$'s), where $i$ goes from 1 up to $n$, is symbolically
given by $\sum_{i=1}^{n} x_i$ or $\sum x_i$ or $\sum x$; that is,
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$$
$x$ - whole set of numbers
$x_i$ - specific score in a set of numbers
$n$ - total number of observations
For instance, a data set consisting of six measurements 2, 3, 9, 10, 8 and -2 is represented by
$x_1, x_2, \ldots, x_6$, where $x_1 = 2$, $x_2 = 3$, $x_3 = 9$, $x_4 = 10$, $x_5 = 8$ and $x_6 = -2$. Their sum becomes
$\sum_{i=1}^{6} x_i = x_1 + x_2 + \cdots + x_6 = 2 + 3 + 9 + 10 + 8 + (-2) = 30$.
2. $\sum_{i=1}^{n} b x_i = b \sum_{i=1}^{n} x_i$, where b is a constant number
3. $\sum_{i=1}^{n} (a + b x_i) = na + b \sum_{i=1}^{n} x_i$
4. $\sum_{i=1}^{n} (x_i \pm y_i) = \sum_{i=1}^{n} x_i \pm \sum_{i=1}^{n} y_i$
5. $\sum_{i=1}^{n} x_i y_i \neq \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i$
Example 3.1: $\sum_{i=1}^{7} x_i = 20$, $\sum_{i=1}^{7} y_i = 30$, $\sum_{i=1}^{7} x_i^2 = 420$, $\sum_{i=1}^{7} y_i^2 = 280$
Find i/ $\sum_{i=1}^{7} (6x_i + 4y_i) = 6\sum_{i=1}^{7} x_i + 4\sum_{i=1}^{7} y_i = 6(20) + 4(30) = 240$
ii/ $3\sum_{i=1}^{7} x_i^2 - 2\sum_{i=1}^{7} y_i^2 = 3(420) - 2(280) = 700$
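The summation rules can be checked numerically with a brief Python sketch. The series `x` is the six-measurement data set used earlier in this section; the second series `y` and the constants `a` and `b` are hypothetical values chosen only for illustration, and the function name `summation_rules_hold` is likewise an assumption.

```python
def summation_rules_hold(x, y, a, b):
    """Check rules 2 to 5 of the summation notation on concrete numbers."""
    n = len(x)
    return {
        "rule 2": sum(b * xi for xi in x) == b * sum(x),
        "rule 3": sum(a + b * xi for xi in x) == n * a + b * sum(x),
        "rule 4": sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y),
        # Rule 5 states an inequality: sum of products != product of sums (in general).
        "rule 5": sum(xi * yi for xi, yi in zip(x, y)) != sum(x) * sum(y),
    }


x = [2, 3, 9, 10, 8, -2]   # six measurements from the text
y = [1, 4, 0, 5, 2, 3]     # hypothetical second series (illustration only)
checks = summation_rules_hold(x, y, a=5, b=3)   # a and b are hypothetical constants
print(checks)

# Example 3.1 uses only the given totals, not the individual values.
sum_x, sum_y, sum_x2, sum_y2 = 20, 30, 420, 280
part_i = 6 * sum_x + 4 * sum_y
part_ii = 3 * sum_x2 - 2 * sum_y2
print(part_i, part_ii)
```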
3.2 Properties of measures of central tendency
A good average should be:
3. Easily understood.
4. Simple to compute.
Suppose $x_1, x_2, \ldots, x_n$ are observed values in a sample of size n from a population of size N,
n < N; then the arithmetic mean of the sample, denoted by $\bar{x}$, is given by
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
If we take an entire population the mean is denoted by μ and is given by:
$$\mu = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$$
Where N stands for the total number of observations in the population.
Example 3.2: Consider the samples given below:
i. 46 54 21 35
ii. 10.5 2.4 3.6 5.9 8.7
Find the arithmetic mean
Solution:
i. The sample values are: 46 54 21 35
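A minimal Python sketch of the sample-mean formula, applied to the two samples of Example 3.2. The function name `arithmetic_mean` is an assumption introduced here for illustration.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


sample_i = [46, 54, 21, 35]
sample_ii = [10.5, 2.4, 3.6, 5.9, 8.7]

mean_i = arithmetic_mean(sample_i)
mean_ii = arithmetic_mean(sample_ii)
print(mean_i)              # 156 / 4
print(round(mean_ii, 2))   # 31.1 / 5, rounded for display
```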
$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_k f_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$$
Example 3.3: Calculate the arithmetic mean of the sample of numbers of students in 10 classes:
50 42 48 60 58 54 50 42 50 42
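The formula for the arithmetic mean for data of this type is
$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_k f_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$$
$$\bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_k f_k}{f_1 + f_2 + \cdots + f_k}$$
where $x_i$ is the class mark of the $i$th class; $i = 1, 2, \ldots, k$, and $f_i$ is the frequency of the $i$th class.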
Example 3.4: The following frequency table gives the height (in inches) of 100 students in a
college.
Class Interval (CI) 60-62 62-64 64-66 66-68 68-70 70-72 Total
Frequency (f) 5 18 42 20 8 7 100
Solution:
The formula to be used for the mean is as follows:
$$\bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$$
Let us calculate these values and make a table for these values for the sake of convenience.
Class Interval (CI) 60-62 62-64 64-66 66-68 68-70 70-72 Total
Frequency (f) 5 18 42 20 8 7 100
Mid-Point ( x i) 61 63 65 67 69 71
f i xi 305 1134 2730 1340 552 497 6558
Substituting these values with $\sum_{i=1}^{6} f_i = 100$, we get
$$\bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i} = \frac{6558}{100} = 65.58$$
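The calculation in Example 3.4 can be checked with a short Python sketch. The function name `grouped_mean` is an assumption introduced here; the class intervals and frequencies are those of the example.

```python
def grouped_mean(classes, freqs):
    """Mean of a grouped FD: sum of (class mark * frequency) / total frequency."""
    marks = [(lower + upper) / 2 for lower, upper in classes]   # class marks
    return sum(m * f for m, f in zip(marks, freqs)) / sum(freqs)


height_classes = [(60, 62), (62, 64), (64, 66), (66, 68), (68, 70), (70, 72)]
height_freqs = [5, 18, 42, 20, 8, 7]

mean_height = grouped_mean(height_classes, height_freqs)
print(round(mean_height, 2))
```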
$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
• The sum of squares of deviations from the mean is the least. That is, $\sum_{i=1}^{n} (x_i - A)^2$ is minimum
when $A = \bar{x}$.
b) The mean of $kx_1, kx_2, \ldots, kx_n$ will be $k\bar{x}$.
Merits of Arithmetic Mean
Arithmetic mean has a rigidly defined mathematical formula so that its value is always
definite or unique. It can be calculated for any set of numerical data.
It is calculated based on all observations.
Arithmetic mean is simple to calculate and easy to understand.
It doesn’t need arrangement of data in increasing or decreasing order.
Arithmetic mean of many samples from the same population does not fluctuate
considerably.
It affords a good standard of comparison.
Demerits of Arithmetic Mean
• It can’t be calculated for data which are not quantifiable.
• It is highly affected by extreme (abnormal) values in the series.
• It can be a number which does not exist in the series.
• It can’t be calculated for grouped continuous open-ended classes.
$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum w_i x_i}{\sum w_i}$$
Example 3.5: A student’s final grades in Mathematics, Physics, Chemistry and Biology are
respectively A, B, D and C. If the respective credits received for these courses are 4, 4, 3 and 2,
determine the approximate average grade the student has got for the courses.
Solution
We use a weighted arithmetic mean, weight associated with each course being taken as the
number of credits received for the corresponding course.
xi 4 3 1 2 Total
wi 4 4 3 2 13
x i wi 16 12 3 4 35
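$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum w_i x_i}{\sum w_i} = \frac{16 + 12 + 3 + 4}{13} = \frac{35}{13} = 2.69$$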
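The weighted mean of Example 3.5 can be verified with a few lines of Python. The function name `weighted_mean` is an assumption introduced here; the grade points (4, 3, 1, 2) and credits are those shown in the solution table.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


grade_points = [4, 3, 1, 2]   # x_i row of the solution table
credits = [4, 4, 3, 2]        # w_i row of the solution table

average_grade = weighted_mean(grade_points, credits)
print(round(average_grade, 2))
```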
Combined mean: When a set of observations is divided into k groups and $\bar{x}_1$ is the mean of $n_1$
observations of group 1, $\bar{x}_2$ is the mean of $n_2$ observations of group 2, …, $\bar{x}_k$ is the mean of $n_k$
observations of group k, then the combined mean, denoted by $\bar{x}_c$, of all observations taken
together is given by
$$\bar{x}_c = \frac{\bar{x}_1 n_1 + \bar{x}_2 n_2 + \cdots + \bar{x}_k n_k}{n_1 + n_2 + \cdots + n_k}$$
This is a special case of the weighted mean. In this case the sample sizes are the weights.
Example 3.6: In the Previous year there were two sections taking Statistics course. At the end of
the semester, the two sections got average marks of 70 & 78. There were 45 and 50 students in
each section respectively. Find the mean mark for the entire students.
Solution:
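A minimal Python sketch of the combined-mean formula, applied to the figures given in Example 3.6 (section means 70 and 78, section sizes 45 and 50). The function name `combined_mean` is an assumption introduced here for illustration.

```python
def combined_mean(means, sizes):
    """Combined mean: group means weighted by their group sizes."""
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)


section_means = [70, 78]
section_sizes = [45, 50]

overall_mean = combined_mean(section_means, section_sizes)
print(round(overall_mean, 2))   # (70*45 + 78*50) / 95
```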
Geometric mean for individual series: The geometric mean, G.M., of an individual series of
positive numbers $x_1, x_2, \ldots, x_n$ is defined as the nth root of their product.
$$G.M. = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = \operatorname{antilog}\left(\frac{1}{n} \sum \log x_i\right)$$
Solution: a) $GM = \sqrt{3 \times 12} = \sqrt{36} = 6$; b) $GM = \sqrt[3]{2 \times 4 \times 8} = \sqrt[3]{64} = 4$
Example 3.8: Compute the geometric mean of the following values: 3, 3, 4, 4, 4, 5, 6 and 6.
Solution
Values 3 4 5 6
Frequency 2 3 1 2
$$G.M. = \sqrt[8]{3^2 \times 4^3 \times 5^1 \times 6^2} = 4.236$$
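The geometric mean can be computed through logarithms, as the antilog form of the definition suggests. The sketch below is illustrative: the function name `geometric_mean` is an assumption, and natural logarithms are used instead of base-10 logarithms, which gives the same result.

```python
import math


def geometric_mean(values):
    """nth root of the product, computed as exp(mean of the logs)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


gm_example = geometric_mean([3, 3, 4, 4, 4, 5, 6, 6])   # data of Example 3.8
gm_cube = geometric_mean([2, 4, 8])                     # cube root of 64
print(round(gm_example, 3), round(gm_cube, 3))
```

Working with logarithms avoids forming a very large product when the series is long.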
Geometric mean for continuous grouped FD:- The above formula can also be used whenever
the frequency distribution is grouped continuous, class marks of the class intervals are
considered as xi.
Harmonic mean for individual series: If $x_1, x_2, \ldots, x_n$ are n observations, then the harmonic mean
can be represented by the following formula:
$$H.M. = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}$$
Example 3.9 A car travels 25 miles at 25 mph, 25 miles at 50 mph, and 25 miles at 75 mph. Find
the harmonic mean of the three velocities.
Solution
$$H.M. = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{3}{\frac{1}{25} + \frac{1}{50} + \frac{1}{75}} = 40.9$$
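The following Python sketch recomputes Example 3.9 and cross-checks it against the average speed obtained from total distance and total time. The function name `harmonic_mean` and the variable names are assumptions introduced here for illustration.

```python
def harmonic_mean(values):
    """n divided by the sum of the reciprocals of the observations."""
    return len(values) / sum(1 / v for v in values)


speeds = [25, 50, 75]   # mph, one per 25-mile leg
hm = harmonic_mean(speeds)

# Cross-check: average speed = total distance / total time.
leg_distance = 25                                   # miles per leg
total_time = sum(leg_distance / s for s in speeds)  # hours
average_speed = leg_distance * len(speeds) / total_time

print(round(hm, 1), round(average_speed, 1))
```

The two values agree because equal distances are covered at each speed, which is the situation in which the harmonic mean is the appropriate average.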
Harmonic mean for discrete data arranged in FD: If the data is arranged in the form of
frequency distribution
$$H.M. = \frac{n}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \cdots + \frac{f_m}{x_m}}, \quad \text{where } n = \sum_{k=1}^{m} f_k$$
Harmonic mean for continuous grouped FD: Whenever the frequency distribution is
grouped continuous, class marks of the class intervals are considered as $x_i$ and the above formula
can be used as
$$H.M. = \frac{n}{\sum_{i=1}^{m} \frac{f_i}{x_i}}, \quad \text{where } n = \sum_{k=1}^{m} f_k$$
ii. For two observations, $\sqrt{\bar{x} \times HM} = GM$.
iii. $\bar{x} = GM = HM$ if all observations are positive and have equal value.
3.3.4 Median
The median is, as its name indicates, the middle most value in the arrangement, which divides the
data into two equal parts. It is obtained by arranging the data in an increasing or decreasing order
of magnitude and is denoted by $\tilde{x}$.
Median for individual series
We arrange the sample in ascending order of the variable of interest. Then the median is the
middle value (if the sample size n is odd) or the average of the two middle values (if the sample
size n is even).
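For individual series the median is obtained by
a/ $\tilde{x} = \left(\frac{n+1}{2}\right)^{th}$ value if n is odd, and
b/ $\tilde{x} = \frac{\left(\frac{n}{2}\right)^{th} \text{value} + \left(\frac{n}{2} + 1\right)^{th} \text{value}}{2}$ if n is even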
Example 3.10: Find the median for the following data.
a/ -5 15 10 5 0 2 1 4 6 and 8
b/ 5 2 2 3 1 8 4
Solution;
i. The data in ascending order is given by:
-5 0 1 2 4 5 6 8 10 15
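n = 10, so n is even. The two middle values are the 5th and 6th observations. So the median
is,
$$\tilde{x} = \frac{\left(\frac{10}{2}\right)^{th} \text{value} + \left(\frac{10}{2} + 1\right)^{th} \text{value}}{2} = \frac{5^{th} + 6^{th}}{2} = \frac{4 + 5}{2} = 4.5$$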
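The odd and even cases of the median rule can be expressed in a short Python sketch, applied here to both data sets of Example 3.10. The function name `median_of` is an assumption introduced for illustration.

```python
def median_of(values):
    """Middle value (n odd) or average of the two middle values (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # the ((n + 1) / 2)th value
    return (ordered[mid - 1] + ordered[mid]) / 2  # (n/2)th and (n/2 + 1)th values


data_a = [-5, 15, 10, 5, 0, 2, 1, 4, 6, 8]   # n = 10 (even)
data_b = [5, 2, 2, 3, 1, 8, 4]               # n = 7 (odd)

median_a = median_of(data_a)
median_b = median_of(data_b)
print(median_a, median_b)
```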
Note: The median is easy to calculate for small samples and is not affected by an "outlier".
Median for Discrete data arranged in a frequency distribution:- In this case also, the median
is obtained by the above formula. After arranging the values in an increasing order, find the
smallest CF greater than or equal to the position obtained from formula a or b above; the
corresponding value is the median.
Median for grouped continuous data:-For continuous data, the median is obtained by the
following formula.
$$\text{Median} = \tilde{x} = L + \frac{w}{f_{med}} \left(\frac{n}{2} - CF\right)$$
Where: L = the lower class boundary of the median class; w = the class width of the median
class;
$f_{med}$ = the frequency of the median class; and CF = the cumulative frequency corresponding to the class
preceding the median class, that is, the sum of the frequencies of all classes lower than the
median class. The median class is the class which contains the (n/2)th observation, whether n
is odd or even, since the items have already lost their originality once they are grouped into
continuous classes.
Example 3.11: Calculate the median for the following frequency distribution.
C.I.          1 - 5   6 - 10   11 - 15   16 - 20   21 - 25   26 - 30   31 - 35   Total
Freq.         4       8        12        6         3         4         3         40
Cuml. Freq.   4       12       24        30        33        37        40
Since n = 40, n/2 = 20, and the smallest CF greater than or equal to 20 is 24; thus, the median class
is the third class. For this class, L = 10.5, w = 5, $f_{med}$ = 12, CF = 12. Then applying the formula,
we get:
$$\tilde{x} = 10.5 + \frac{5}{12}(20 - 12) = 13.8$$
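The grouped-median formula can be sketched in Python as follows. The function name `grouped_median` is an assumption introduced here; the lower class boundaries, common width and frequencies are those of Example 3.11.

```python
def grouped_median(lower_boundaries, width, freqs):
    """Median = L + (w / f_med) * (n/2 - CF) for a grouped FD with equal widths."""
    n = sum(freqs)
    cf = 0   # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_boundaries, freqs):
        if cf + f >= n / 2:   # first class whose CF reaches n/2: the median class
            return lower + (n / 2 - cf) * width / f
        cf += f
    raise ValueError("frequencies must contain at least one observation")


lower_boundaries = [0.5, 5.5, 10.5, 15.5, 20.5, 25.5, 30.5]
freqs = [4, 8, 12, 6, 3, 4, 3]

median_value = grouped_median(lower_boundaries, 5, freqs)
print(round(median_value, 1))
```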
Merits of median
It is less affected by extreme values.
Median can be calculated even in case of open-ended intervals.
It can be computed for ratio, interval, and ordinal level of data.
Demerits of median
Its value is not determined by each & every observation.
It is not a good representative of the data if the number of items (data) is small.
The arrangement of items in order of magnitude is sometimes very tedious process if the
number of items is very large.
Mode of individual series:- The mode or the modal value of individual series (raw data) is simply
obtained by locating the observation with the maximum frequency.
Mode for discrete data arranged in a frequency distribution:-In the case of discrete grouped
data, the mode is determined just by looking to that value (s) having the highest frequency.
In such cases, one can only determine the modal class easily: the class with the highest frequency.
For grouped data, the mode is found by the following formula:
$$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times w$$
where L = the lower class boundary of the modal class; w = the class width of the modal class;
$\Delta_1 = f_{mode} - f_1$; $\Delta_2 = f_{mode} - f_2$; $f_1$ = frequency of the class
immediately preceding the modal class; $f_2$ = frequency of the class immediately succeeding the
modal class; and $f_{mode}$ = frequency of the modal class.
Example 3.13: Calculate the mode for the frequency distribution of data of example 3.11.
Solution: By inspection, the mode lies in the third class, where L = 10.5, $f_{mode}$ = 12, $f_1$ = 8, $f_2$ = 6, w = 5
$$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times w = 10.5 + \frac{(12 - 8)}{(12 - 8) + (12 - 6)} \times 5 = 12.5$$
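The grouped-mode calculation of Example 3.13 can be checked with a brief Python sketch. The function name `grouped_mode` is an assumption introduced here; treating a missing neighbouring class as having frequency 0 is also an assumption, and the sketch presumes a single clear modal class.

```python
def grouped_mode(lower_boundaries, width, freqs):
    """Mode = L + d1 / (d1 + d2) * w, where d1 and d2 are the differences
    between the modal frequency and its two neighbouring frequencies."""
    i = max(range(len(freqs)), key=freqs.__getitem__)    # index of the modal class
    f_mode = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0                    # preceding class (assumed 0 if none)
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0       # succeeding class (assumed 0 if none)
    d1, d2 = f_mode - f1, f_mode - f2
    return lower_boundaries[i] + d1 / (d1 + d2) * width


lower_boundaries = [0.5, 5.5, 10.5, 15.5, 20.5, 25.5, 30.5]
freqs = [4, 8, 12, 6, 3, 4, 3]   # frequency distribution of Example 3.11

mode_value = grouped_mode(lower_boundaries, 5, freqs)
print(mode_value)
```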
Merits of mode
The mean is not a resistant measure of central tendency because it is not resistant to the
influence of the extreme data values or outliers.
The median is resistant to the influence of extreme data values or outliers and its value
does not respond strongly to the changes of a few extreme data values regardless of how
large the change may be.
The mode has an advantage over both the mean and the median when the data is
categorical since it is not possible to calculate the mean or median for this type of data.
Also, the mode usually indicates the location within a large distribution where the data
values are concentrated. However, the mode cannot always be calculated: if all the data
values in a distribution are different, then the distribution has no mode.
In the case of symmetrical distribution; mean, median and mode coincide. That is
mean=median = mode. However, for a moderately asymmetrical (non symmetrical)
distribution, mean and mode lie on the two ends and median lies between them and they
have the following important empirical relationship, which is
Mean – Mode = 3(Mean - Median)
Example 3.14: In a moderately asymmetrical distribution, the mean and the mode are 30 and 42
respectively. What is the median of the distribution?
Solution:
Median = (2 × Mean + Mode)/3 = (2 × 30 + 42)/3 = 34
Hence the median of the distribution is 34.
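The step from the empirical relationship to the median formula used in Example 3.14 can be written out as follows (a short derivation, using only the relationship stated above):

```latex
\begin{aligned}
\text{Mean} - \text{Mode} &= 3(\text{Mean} - \text{Median}) \\
3\,\text{Median} &= 3\,\text{Mean} - \text{Mean} + \text{Mode} = 2\,\text{Mean} + \text{Mode} \\
\text{Median} &= \frac{2\,\text{Mean} + \text{Mode}}{3} = \frac{2(30) + 42}{3} = 34
\end{aligned}
```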
Which of the Three Measures is the Best?
At this stage, one may ask which of these three measures of central tendency is the best.
There is no simple answer to this question, because the three measures are based upon
different concepts. The arithmetic mean is the sum of the values divided by the total number of
observations in the series. The median is the value of the middle observation. The mode is the
value around which the observations tend to concentrate. As such, the use of a particular
measure will largely depend on the purpose of the
study and the nature of the data. For example, when we are interested in knowing the consumers’
preferences for different brands of television sets or kinds of advertising, the choice should go in
favor of mode. The use of mean and median would not be proper. However, the median can
sometimes be used in the case of qualitative data when such data can be arranged in an ascending
or descending order. Let us take another example. Suppose we invite applications for a certain
vacancy in our company. A large number of candidates apply for that post. We are now
interested to know as to which age or age group has the largest concentration of applicants. Here,
obviously the mode will be the most appropriate choice. The arithmetic mean may not be
appropriate as it may be influenced by some extreme values.
3.5 Measures of Non-central Locations
The median is the value of the middle item, which divides the data into two equal parts and is found by
arranging the data in increasing or decreasing order of magnitude, whereas quantiles are
measures which divide a given set of data into approximately equal subdivisions and are
obtained by the same procedure as the median. They are averages of position (non-central
tendency). Some of these are quartiles, deciles and percentiles.
Quartiles: are values which divide the data set into approximately four equal parts, denoted by
Q1, Q2 and Q3. The first quartile (Q1) is also called the lower quartile and the third quartile (Q3) is
the upper quartile. The second quartile (Q2) is the median.
• Quartiles for Individual series:
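Let x1, x2, …, xn be n ordered observations. The ith quartile Qi is the value of the item
corresponding to the [i(n+1)/4]th position, i = 1, 2, 3.
That is, after arranging the data in ascending order, Q1, Q2 and Q3 are obtained by:
\[ Q_1 = \left( \frac{1(n+1)}{4} \right)^{\text{th}} \text{value}, \quad Q_2 = \left( \frac{2(n+1)}{4} \right)^{\text{th}} \text{value} \quad \text{and} \quad Q_3 = \left( \frac{3(n+1)}{4} \right)^{\text{th}} \text{value}. \]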
• Quartiles for discrete data arranged in a frequency distribution:- In this case also, we
follow the same procedure as for the median. That is, we construct
the less than cumulative frequency distribution and apply the formula of quartiles for individual
series.
• Quartiles in continuous data:- For continuous data, use the following formula:
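\[ Q_i = L + \frac{w}{f_{Q_i}} \left( \frac{i\,n}{4} - CF \right) \]
where i = 1, 2, 3, and L, w, f_{Q_i} and CF are defined in the same way as for the median.
i.e.
\[ Q_1 = L + \frac{w}{f_{Q_1}} \left( \frac{n}{4} - CF \right), \quad Q_2 = L + \frac{w}{f_{Q_2}} \left( \frac{2n}{4} - CF \right) \quad \text{and} \quad Q_3 = L + \frac{w}{f_{Q_3}} \left( \frac{3n}{4} - CF \right) \]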
The class in question is the one containing the (i × n/4)th value. That is, the class with the smallest
cumulative frequency greater than or equal to i × n/4 is the class of the ith quartile.
Deciles: are values dividing the data approximately into ten equal parts, denoted by D1, D2, …, D9.
• Deciles for Individual Series:
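Let x1, x2, …, xn be n ordered observations. The ith decile (Di) is the value of the item
corresponding to the [i(n+1)/10]th position, i = 1, 2, …, 9.
That is, after arranging the data in ascending order, D1, D2, …, D9 are obtained by:
\[ D_1 = \left( \frac{1(n+1)}{10} \right)^{\text{th}} \text{value}, \quad D_2 = \left( \frac{2(n+1)}{10} \right)^{\text{th}} \text{value}, \ \ldots \ \text{and} \quad D_9 = \left( \frac{9(n+1)}{10} \right)^{\text{th}} \text{value}. \]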
• Deciles for discrete data arranged in a frequency distribution:- In this case also, we
follow the same procedure as for the median. That is, we construct
the less than cumulative frequency distribution and apply the formula of deciles for individual
series.
• Deciles for continuous data: Apply the following formula and follow the procedure of quartiles
for continuous data.
\[ D_i = L + \frac{w}{f_{D_i}} \left( \frac{i\,n}{10} - CF \right), \quad i = 1, 2, \ldots, 9. \]
The symbols are defined in the same way as in the case of quartiles for continuous data.
Percentiles: are values which divide the data approximately in to one hundred equal parts, and
denoted by P1 , P2 , …, P99 .
• Percentiles for Individual Series:
Let x1, x2, …, xn be n ordered observations. The ith percentile (Pi) is the value of the item
corresponding to the [i(n+1)/100]th position, i = 1, 2, …, 99.
That is, after arranging the data in ascending order, P1, P2, …, P99 are obtained by:
\[ P_1 = \left( \frac{1(n+1)}{100} \right)^{\text{th}} \text{value}, \quad P_2 = \left( \frac{2(n+1)}{100} \right)^{\text{th}} \text{value}, \ \ldots \ \text{and} \quad P_{99} = \left( \frac{99(n+1)}{100} \right)^{\text{th}} \text{value}. \]
• Percentiles for discrete data arranged in a frequency distribution:- In this case also, we
follow the same procedure as for the median. That is, we construct
the less than cumulative frequency distribution and apply the formula of percentiles for individual
series.
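• Percentiles for continuous data: Apply the following formula.
\[ P_i = L + \frac{w}{f_{P_i}} \left( \frac{i\,n}{100} - CF \right), \quad i = 1, 2, \ldots, 99. \]
The symbols are defined in the same way as in the case of quartiles or deciles for continuous data.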
Interpretations
1. Qi is the value below which (i × 25) percent of the observations in the series are found (where
i = 1, 2, 3). For instance, Q3 is the value below which 75 percent of the observations in the given
series are found.
2. Di is the value below which (i × 10) percent of the observations in the series are found (where
i = 1, 2, …, 9). For instance, D4 is the value below which 40 percent of the values are found in the
series.
3. Pi is the value below which i percent of the total observations are found (where i = 1,
2, 3, …, 99). For example, 60 percent of the observations in a given series are below P60.
Example 3.15: Calculate Q1, Q2, Q3, D4, D9, P40 and P90 for the data given in the table
below.
x 10 11 12 13 14 15 16 17 18
f 2 8 25 48 65 40 20 9 2
Solution: The data is arranged in an increasing order. So we need to construct only the
cumulative frequency table before calculating the required values.
x            10   11   12   13   14    15    16    17    18
f            2    8    25   48   65    40    20    9     2
Cum. Freq.   2    10   35   83   148   188   208   217   219
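The total number of observations is n = 219, which is odd. Clearly then the median is 14, i.e.
\[ \tilde{x} = \left( \frac{n+1}{2} \right)^{\text{th}} \text{value} = \left( \frac{219+1}{2} \right)^{\text{th}} \text{value} = 110^{\text{th}} \text{ value} = 14 \]
\[
\begin{aligned}
Q_1 &= \left( \tfrac{1(n+1)}{4} \right)^{\text{th}} \text{value} = \left( \tfrac{1(219+1)}{4} \right)^{\text{th}} \text{value} = 55^{\text{th}} \text{ value} = 13 \\
Q_2 &= \left( \tfrac{2(n+1)}{4} \right)^{\text{th}} \text{value} = \left( \tfrac{2(219+1)}{4} \right)^{\text{th}} \text{value} = 110^{\text{th}} \text{ value} = 14 = \tilde{x} \\
Q_3 &= \left( \tfrac{3(n+1)}{4} \right)^{\text{th}} \text{value} = \left( \tfrac{3(219+1)}{4} \right)^{\text{th}} \text{value} = 165^{\text{th}} \text{ value} = 15 \\
D_4 &= \left( \tfrac{4(n+1)}{10} \right)^{\text{th}} \text{value} = \left( \tfrac{4(219+1)}{10} \right)^{\text{th}} \text{value} = 88^{\text{th}} \text{ value} = 14 \\
D_9 &= \left( \tfrac{9(n+1)}{10} \right)^{\text{th}} \text{value} = \left( \tfrac{9(219+1)}{10} \right)^{\text{th}} \text{value} = 198^{\text{th}} \text{ value} = 16 \\
P_{40} &= \left( \tfrac{40(n+1)}{100} \right)^{\text{th}} \text{value} = \left( \tfrac{40(219+1)}{100} \right)^{\text{th}} \text{value} = 88^{\text{th}} \text{ value} = 14 \\
P_{90} &= \left( \tfrac{90(n+1)}{100} \right)^{\text{th}} \text{value} = \left( \tfrac{90(219+1)}{100} \right)^{\text{th}} \text{value} = 198^{\text{th}} \text{ value} = 16
\end{aligned}
\]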
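The look-ups in Example 3.15 can be reproduced with a small Python sketch. The helper names `value_at_position` and `quantile` are hypothetical. The notes do not say how to treat a fractional position, so this sketch simply returns the value at the next whole position without interpolating; that choice is an assumption.

```python
from itertools import accumulate


def value_at_position(values, freqs, position):
    """Value holding the given 1-based position in the ordered data."""
    for x, cf in zip(values, accumulate(freqs)):
        if cf >= position:  # first cumulative frequency reaching the position
            return x
    raise ValueError("position exceeds the number of observations")


def quantile(values, freqs, i, parts):
    """i-th quantile when the data are split into `parts` parts (4, 10 or 100)."""
    n = sum(freqs)
    return value_at_position(values, freqs, i * (n + 1) / parts)


values = [10, 11, 12, 13, 14, 15, 16, 17, 18]
freqs = [2, 8, 25, 48, 65, 40, 20, 9, 2]

quartiles = [quantile(values, freqs, i, 4) for i in (1, 2, 3)]
d4, d9 = quantile(values, freqs, 4, 10), quantile(values, freqs, 9, 10)
p40, p90 = quantile(values, freqs, 40, 100), quantile(values, freqs, 90, 100)
print(quartiles, d4, d9, p40, p90)  # [13, 14, 15] 14 16 14 16
```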
Example 3.16: Marks of 50 students out of 85 are given below. Based on the data, find Q1,
D4 and P7.
Marks 46-50 51-55 56-60 61-65 66-70 71-75 76-80
fi 4 8 15 5 9 5 4
Solution:- First find the class boundaries and the cumulative frequency distribution.
Marks            46-50       51-55       56-60       61-65       66-70       71-75       76-80
Class boundary   45.5-50.5   50.5-55.5   55.5-60.5   60.5-65.5   65.5-70.5   70.5-75.5   75.5-80.5
fi               4           8           15          5           9           5           4
Cum. frequency   4           12          27          32          41          46          50
Q1 is the measure of the (n/4)th value = 12.5th value, which lies in the class 55.5 to 60.5.
\[ Q_1 = L + \frac{w}{f_{Q_1}} \left( \frac{n}{4} - CF \right) = 55.5 + \frac{5}{15} (12.5 - 12) \approx 55.7 \]
D4 is the measure of the (4n/10)th value = 20th value, which lies in the class 55.5 to 60.5.
\[ D_4 = L + \frac{w}{f_{D_4}} \left( \frac{4n}{10} - CF \right) = 55.5 + \frac{5}{15} (20 - 12) \approx 58.2 \]
P7 is the measure of the (7n/100)th value = 3.5th value, which lies in the class 45.5 to 50.5.
\[ P_7 = L + \frac{w}{f_{P_7}} \left( \frac{7n}{100} - CF \right) = 45.5 + \frac{5}{4} (3.5 - 0) = 49.875 \]
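Since the quartile, decile and percentile formulas for continuous data differ only in the divisor (4, 10 or 100), one function can cover all three. The sketch below checks Example 3.16; the name `grouped_quantile` and its argument layout are assumptions made for illustration.

```python
def grouped_quantile(boundaries, freqs, i, parts):
    """L + (w / f) * (i*n/parts - CF) for a continuous frequency distribution."""
    n = sum(freqs)
    target = i * n / parts
    cum = 0  # CF: cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if cum + f >= target:  # class containing the (i*n/parts)th value
            return lower + (upper - lower) / f * (target - cum)
        cum += f
    raise ValueError("empty distribution")


# Class boundaries 45.5-50.5, 50.5-55.5, ..., 75.5-80.5 from Example 3.16.
boundaries = [(45.5 + 5 * k, 50.5 + 5 * k) for k in range(7)]
freqs = [4, 8, 15, 5, 9, 5, 4]

q1 = grouped_quantile(boundaries, freqs, 1, 4)
d4 = grouped_quantile(boundaries, freqs, 4, 10)
p7 = grouped_quantile(boundaries, freqs, 7, 100)
print(round(q1, 1), round(d4, 1), p7)  # 55.7 58.2 49.875
```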
Exercise- 3
1. Calculate the median, quartiles, 8th decile, and 75th percentile for the following data.
Show that the value of 75th percentile is the same as that of Q3.
Lifetime (C.M) 50 100 150 200 250 300 350 400
No of Batteries 6 8 13 20 9 6 3 2
2. The following data represent the number of offences for various robberies in a town per a
given day.
No. of robberies 26 34 30 15 10 32 12 25 7
No. of days 13 19 12 30 14 8 19 20 3
Compute the mean, median and mode
3. Calculate Q1, Q2, Q3, D5, D8, and P90 for the following table
Temperature (°F) 50-59 60-69 70-79 80-89 90-99
Days 2 8 20 4 1
4. The following data represent the pulse rates (beats per minute) of nine students 76 60 60
81 72 80 80 68 and 73. Calculate the mean, mode and the third quartile.
5. The number of births in a hospital is given below
Days             Monday   Tuesday   Wednesday   Thursday   Friday   Saturday   Sunday
Num. of births   50       60        52          55         62       30         40
Find the average number of births per day and the mode.
6. From the table given below find the mode and 5th decile.
size 11-15 16-20 21-25 26-30 31-35 36-40 41-45 46-50
Frequency 7 10 13 26 35 22 11 5
7. If the arithmetic mean of two items is 5 and G.M. is 4, find their H.M.
8. The following frequency distribution represents the magnitude of earth quake.
Magnitude   0-0.9   1-1.9   2-2.9   3-3.9   4-4.9   5-5.9   6-6.9   7-7.9
Frequency   20      50      45      30      10      8       6       1
Compute the median and verify that it is equal to the second quartile and find 72nd percentile.
CHAPTER FOUR: MEASURES OF DISPERSION (VARIATION)
4.1 Introduction
Just as central tendency can be measured by a number in the form of an average, the amount of
variation (dispersion, spread, or scatter) among the values in the data set can also be measured.
The measures of central tendency describe that the major part of values in the data set appears to
concentrate around a central value called average with the remaining values scattered
(distributed) on either sides of that value. But these measures do not reveal how these values are
dispersed (spread or scatter) on each side of the central value. The dispersion of values is
indicated by the extent to which these values tend to spread over an interval rather than cluster
closely around an average.
The term dispersion is generally used in two senses. Firstly, dispersion refers to the variations of
the items among themselves. If the value of all the items of a series is the same, there will be no
variation among different items of a series. Secondly, dispersion refers to the variation of the
items around an average. If the difference between the value of items and the average is large,
the dispersion will be high and on the other hand if the difference between the value of the items
and the average is small, the dispersion will be low. Thus, dispersion is defined as the scatter or
spread of the individual items in a given series.
statistical unit in which the original data are given, such as kilograms, tonnes, etc.
These measures are suitable for comparing the variability in two distributions
having variables expressed in the same units and of the same average size. These
measures are not suitable for comparing the variability in two distributions having
variables expressed in different units.
[Figure: classification of measures of dispersion.
Absolute measures of dispersion: based on selected items (range and inter-quartile range) or based on all items (mean deviation and standard deviation).
Relative measures of dispersion: coefficient of range and coefficient of quartile deviation; coefficient of mean deviation and coefficient of standard deviation or coefficient of variation.]
R = L − S
where R = range, L = largest value in a given set of data, S = smallest value in a given set of data.
For grouped data, the range can be taken as any one of the following:
• The difference between the upper class limit of the last class and the lower class limit of the
first class, or
• The difference between the largest class mark and the smallest class mark, or
• The difference between the upper class boundary of the last class and the lower class
boundary of the first class.
The range is used to describe quantities such as the maximum change in daily temperature, rainfall, etc.
When the sample size is small, it can be an adequate measure of variation. It is commonly used
in quality control.
Example 4.1: Five students obtained the following marks in statistics: 20, 35, 25, 30, 15. Find the
range and relative range (RR).
\[ \text{Range} = L - S = 35 - 15 = 20 \]
\[ RR = \frac{L - S}{L + S} = \frac{35 - 15}{35 + 15} = 0.4 \]
Example 4.2: Find out range and relative range of the following given data.
Solution: Here,
L = Upper class limit of the largest class = 30
Inter-quartile range and quartile deviation are other measures of dispersion. The difference
between the upper quartile (Q3) and the lower quartile (Q1) is called the inter-quartile range (IQR).
Symbolically,
\[ IQR = Q_3 - Q_1 \]
The inter-quartile range covers the dispersion of the middle 50% of the items of the series. Quartile
deviation, also called semi-inter-quartile range, is half of the difference between the upper and
lower quartiles, that is, half of the inter-quartile range. Its formula is
\[ \text{Quartile Deviation } (QD) = \frac{Q_3 - Q_1}{2} \]
The relative measure of quartile deviation also called the coefficient of quartile deviation (CQD)
is defined as:
\[ CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} \]
Example 4.3: Find inter-quartile range, quartile deviation and coefficient of quartile deviation
from the following data.
Solution: First arrange the data in ascending order. 15, 18, 20, 24, 27, 28, 30
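\[ Q_1 = \left( \frac{n+1}{4} \right)^{\text{th}} \text{item} = \left( \frac{7+1}{4} \right)^{\text{th}} \text{item} = 2^{\text{nd}} \text{ item} = 18 \]
\[ Q_3 = 3 \left( \frac{n+1}{4} \right)^{\text{th}} \text{item} = 3 \left( \frac{7+1}{4} \right)^{\text{th}} \text{item} = 6^{\text{th}} \text{ item} = 28 \]
\[ IQR = Q_3 - Q_1 = 28 - 18 = 10 \]
\[ QD = \frac{Q_3 - Q_1}{2} = \frac{28 - 18}{2} = 5 \]
\[ CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{28 - 18}{28 + 18} = 0.217 \]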
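The three measures in Example 4.3 can be checked with a few lines of Python. The helper `item_at` is a hypothetical name. In this example the quartile positions are whole numbers; for a fractional position the sketch interpolates linearly between neighbouring items, which is an assumption and not something the notes prescribe.

```python
def item_at(sorted_data, position):
    """Value at a 1-based position in sorted data (linear interpolation if fractional)."""
    whole = int(position)
    frac = position - whole
    if frac == 0:
        return sorted_data[whole - 1]
    below, above = sorted_data[whole - 1], sorted_data[whole]
    return below + frac * (above - below)


data = sorted([15, 18, 20, 24, 27, 28, 30])
n = len(data)

q1 = item_at(data, (n + 1) / 4)      # 2nd item
q3 = item_at(data, 3 * (n + 1) / 4)  # 6th item
iqr = q3 - q1
qd = iqr / 2
cqd = (q3 - q1) / (q3 + q1)
print(q1, q3, iqr, qd, round(cqd, 3))  # 18 28 10 5.0 0.217
```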
Example 4.4: Find inter-quartile range, quartile deviation and coefficient of quartile deviation
from the following data
Marks 2 3 4 5 6 7 8 9
No. Of students 10 11 12 13 5 12 7 5
Solution:
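Marks             2    3    4    5    6    7    8    9
No. of students   10   11   12   13   5    12   7    5
CF                10   21   33   46   51   63   70   75 = N
\[ Q_1 = \left( \frac{N+1}{4} \right)^{\text{th}} \text{item} = \left( \frac{75+1}{4} \right)^{\text{th}} \text{item} = 19^{\text{th}} \text{ item} = 3 \]
\[ Q_3 = 3 \left( \frac{N+1}{4} \right)^{\text{th}} \text{item} = 57^{\text{th}} \text{ item} = 7 \]
\[ IQR = Q_3 - Q_1 = 7 - 3 = 4 \]
\[ QD = \frac{Q_3 - Q_1}{2} = \frac{7 - 3}{2} = 2 \]
\[ CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{7 - 3}{7 + 3} = 0.4 \]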
Remark: Q.D or CQD includes only the middle 50% of the observation.
Merits of QD
Demerits of QD
It is not based on all the items (it ignores 50% items, i.e., the first 25% and the last
25%).
It is greatly influenced by sampling fluctuations.
It is not amenable to algebraic manipulations.
The mean deviation (MD) measures the average deviation of a set of observations about their
central value, generally the mean or the median, ignoring the plus/minus sign of the deviations.
In other words the mean deviation of a set of items is defined as the arithmetic mean of the
values of the absolute deviations from a given average. Depending up on the type of averages
used we have different mean deviations.
The mean deviation of a sample of n observations x1, x2, …, xn (individual series) is given
as
\[ MD = \frac{\sum |X_i - A|}{n} \]
where $|X_i - A|$ denotes the absolute value of the deviation. Generally, the arithmetic mean and the
median are used in calculating mean deviation. So, A stands for the average used for calculating
MD. That is, A = median ($\tilde{X}$) or A = mean ($\bar{X}$).
In the case of discrete data arranged in a frequency distribution (FD) and continuous grouped data,
the formula for MD becomes
\[ MD = \frac{\sum f_i |X_i - A|}{n} , \]
where $X_i$ is the class mark of the ith class, $f_i$ is the frequency of the
ith class and $n = \sum f_i$.
1. The mean deviation about the arithmetic mean is, therefore, given by
\[ MD(\bar{X}) = \frac{\sum |X_i - \bar{X}|}{n} \quad \text{for ungrouped data (individual series)} \]
\[ MD(\bar{X}) = \frac{\sum f_i |X_i - \bar{X}|}{n} \quad \text{for discrete data arranged in FD and a grouped continuous frequency distribution} \]
where $X_i$ is the value for discrete data arranged in FD and the class
mark of the ith class for continuous grouped data, $f_i$ is the frequency of the ith class and
$n = \sum f_i$.
Steps to calculate MD($\bar{X}$):
• Find the arithmetic mean, $\bar{X}$.
• Find the deviations of each reading from $\bar{X}$.
• Find the arithmetic mean of the deviations, ignoring sign.
2. The mean deviation about the median is also given by
\[ MD(\tilde{X}) = \frac{\sum |X_i - \tilde{X}|}{n} \quad \text{for ungrouped data (individual series)} \]
\[ MD(\tilde{X}) = \frac{\sum f_i |X_i - \tilde{X}|}{n} \quad \text{for discrete data arranged in FD and a grouped continuous frequency distribution} \]
where $X_i$ is the value for discrete data arranged in FD and the class
mark of the ith class for continuous grouped data, $f_i$ is the frequency of the ith class and
$n = \sum f_i$.
Steps to calculate MD($\tilde{X}$):
• Find the median, $\tilde{X}$.
• Find the deviations of each reading from $\tilde{X}$.
• Find the arithmetic mean of the deviations, ignoring sign.
3. The mean deviation about the mode is given by
\[ MD(\hat{X}) = \frac{\sum |X_i - \hat{X}|}{n} \quad \text{for ungrouped data (individual series)} \]
\[ MD(\hat{X}) = \frac{\sum f_i |X_i - \hat{X}|}{n} \quad \text{for discrete data arranged in FD and a grouped continuous frequency distribution} \]
where $X_i$ is the value for discrete data arranged in FD and the class
mark of the ith class for continuous grouped data, $f_i$ is the frequency of the ith class and
$n = \sum f_i$.
Steps to calculate MD($\hat{X}$):
• Find the mode, $\hat{X}$.
• Find the deviations of each reading from $\hat{X}$.
• Find the arithmetic mean of the deviations, ignoring sign.
Example 4.5
The following are the numbers of visits made by ten mothers to the local doctor's surgery: 8, 6, 5,
5, 7, 4, 5, 9, 7, 4. Find the mean deviation about the mean, median and mode.
Solution:
First calculate the three averages
\[ \bar{X} = 6, \quad \tilde{X} = 5.5, \quad \hat{X} = 5 \]
Then take the deviations of each observation from these averages.
xi                 4     4     5     5     5     6     7     7     8     9     Total
|Xi − mean|        2     2     1     1     1     0     1     1     2     3     14
|Xi − median|      1.5   1.5   0.5   0.5   0.5   0.5   1.5   1.5   2.5   3.5   14
|Xi − mode|        1     1     0     0     0     1     2     2     3     4     14
Since the data are ungrouped, the mean deviations about the mean, median and mode are:
\[ MD(\bar{X}) = \frac{\sum |X_i - \bar{X}|}{n} = \frac{14}{10} = 1.4 \]
\[ MD(\tilde{X}) = \frac{\sum |X_i - \tilde{X}|}{n} = \frac{14}{10} = 1.4 \]
\[ MD(\hat{X}) = \frac{\sum |X_i - \hat{X}|}{n} = \frac{14}{10} = 1.4 \]
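Example 4.5 can be verified with Python's standard `statistics` module. The function name `mean_deviation` is a hypothetical helper introduced for this sketch.

```python
from statistics import mean, median, mode


def mean_deviation(data, center):
    """Arithmetic mean of the absolute deviations of the data from `center`."""
    return sum(abs(x - center) for x in data) / len(data)


visits = [8, 6, 5, 5, 7, 4, 5, 9, 7, 4]

md_mean = mean_deviation(visits, mean(visits))      # about the mean (6)
md_median = mean_deviation(visits, median(visits))  # about the median (5.5)
md_mode = mean_deviation(visits, mode(visits))      # about the mode (5)
print(md_mean, md_median, md_mode)  # 1.4 1.4 1.4
```

That the three values coincide is a feature of this particular data set, not a general rule.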
Merits of MD
Demerit of MD
It does not take in to account the signs of the deviations of items from the average.
Remark: Of all the mean deviations taken about different averages or any arbitrary value, the
mean deviation about the median has the smallest value.
The relative measure of mean deviation, also called the coefficient of mean deviation is obtained
by dividing mean deviation by the particular average used in computing mean deviation. Thus,
\[ CMD(\bar{X}) = \frac{MD(\bar{X})}{\bar{X}} \]
where MD is the mean deviation calculated about the arithmetic mean.
CMD about the median is given by:
\[ CMD(\tilde{X}) = \frac{MD(\tilde{X})}{\tilde{X}} \]
in which case MD is calculated about the median of the observations.
Example 4.6: Calculate the coefficient of mean deviation about the mean, median and mode for
the data in Example 4.5 above.
Solution:
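\[ CMD(\bar{X}) = \frac{MD(\bar{X})}{\bar{X}} = \frac{1.4}{6} = 0.23 \]
\[ CMD(\tilde{X}) = \frac{MD(\tilde{X})}{\tilde{X}} = \frac{1.4}{5.5} = 0.25 \]
\[ CMD(\hat{X}) = \frac{MD(\hat{X})}{\hat{X}} = \frac{1.4}{5} = 0.28 \]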
Like the mean deviation, the variance is also based on all observations in a set of data. But
the variance is the average of squared deviations from the mean. Recall that the sum of squared
deviations is minimum only when taken from the mean. Squared deviations are easier to
manipulate mathematically than absolute deviations. Thus, if we average the squared deviations from the
mean and take the square root of the result (to compensate for the fact that the deviations were
squared), we obtain the standard deviation. This overcomes the limitation of the mean deviation.
Population Variance (σ 2)
If we divide the variation (the sum of squared deviations from the mean) by the number of values
in the population, we get the population variance. This variance is the "average squared deviation from the mean".
For ungrouped data (individual series )
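\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} = \frac{1}{N} \left[ \sum_{i=1}^{N} X_i^2 - N\mu^2 \right] \]
where μ is the population arithmetic mean and N is the population size.
For grouped data
\[ \sigma^2 = \frac{\sum f_i (X_i - \mu)^2}{N} = \frac{1}{N} \left[ \sum f_i X_i^2 - N\mu^2 \right] \]
where μ is the population arithmetic mean, $X_i$ is the class
mark of the ith class, $f_i$ is the frequency of the ith class and $N = \sum f_i$.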
Sample Variance ( S2)
One would expect the sample variance to simply be the population variance with the population
mean replaced by the sample mean. However, one of the major uses of a statistic is to estimate
the corresponding parameter, and that formula has the problem that its value tends to underestimate
the population variance. To offset this, the sum of the squares of the deviations is divided by one
less than the sample size.
For ungrouped data
\[ S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{1}{n-1} \left[ \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \right] \]
where $\bar{x}$ is the sample arithmetic mean and n is the sample size.
If the values $x_i$ have frequencies $f_i$ (i = 1, 2, …, m), then the sample variance is given by:
\[ S^2 = \frac{\sum_{i=1}^{m} f_i (x_i - \bar{x})^2}{n-1} = \frac{1}{n-1} \left[ \sum_{i=1}^{m} f_i x_i^2 - n\bar{x}^2 \right] \]
where $\bar{x}$ is the sample arithmetic mean, $x_i$ is the value (or, for grouped data, the class
mark of the ith class), $f_i$ is the frequency of the ith class and $n = \sum f_i$.
The Standard Deviation
There is a problem with variances. Recall that the deviations were squared. That means that the
units were also squared. To get the units back the same as the original data values, the square
root must be taken.
Population Standard Deviation (σ)
\[ \sigma = \sqrt{\sigma^2} \]
where $\sigma^2$ is the population variance.
Sample Standard Deviation (S)
\[ S = \sqrt{S^2} \]
where $S^2$ is the sample variance.
Example 4.7: Find the sample variance and standard deviation of:
xi 2 4 5 6 8
fi 2 2 3 1 2
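Solution: Here $n = \sum f_i = 10$, $\sum f_i x_i = 49$, so $\bar{x} = 49/10$, and $\sum f_i x_i^2 = 279$. Then
\[ S^2 = \frac{1}{n-1} \left[ \sum f_i x_i^2 - n\bar{x}^2 \right] = \frac{1}{9} \left[ 279 - 10 \left( \frac{49}{10} \right)^2 \right] = \frac{1}{9} (38.9) = 4.32, \quad \text{and} \quad S = \sqrt{4.32} = 2.08. \]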
Example 4.8: Find the sample variance and standard deviation for the distribution:
C.I     1-5   6-10   11-15   16-20
Freq.   4     1      2       3
Solution: In a continuous F.D., xi is the class mark representing the ith class.
C.I      xi   fi   fi xi   fi xi^2
1-5      3    4    12      36
6-10     8    1    8       64
11-15    13   2    26      338
16-20    18   3    54      972
Total         10   100     1410
where $n = \sum f_i = 10$, $\bar{x} = \frac{\sum f_i x_i}{n} = \frac{100}{10} = 10$ and $\sum f_i x_i^2 = 1410$, so that
\[ S^2 = \frac{1}{n-1} \left[ \sum f_i x_i^2 - n\bar{x}^2 \right] = \frac{1}{9} \left[ 1410 - 10 (10)^2 \right] = \frac{410}{9} = 45.56, \]
\[ S = \sqrt{45.56} = 6.75. \]
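Examples 4.7 and 4.8 both use the computational form of the sample variance, so they can be checked with the same short Python sketch. The function name `sample_variance` is hypothetical; for grouped data the values passed in are the class marks.

```python
from math import sqrt


def sample_variance(values, freqs):
    """S^2 = (sum(f * x^2) - n * mean^2) / (n - 1), with n = sum(f)."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    mean = sum_fx / n
    return (sum_fx2 - n * mean ** 2) / (n - 1)


# Example 4.7: discrete values with frequencies.
var_47 = sample_variance([2, 4, 5, 6, 8], [2, 2, 3, 1, 2])
# Example 4.8: class marks 3, 8, 13, 18 of the classes 1-5, ..., 16-20.
var_48 = sample_variance([3, 8, 13, 18], [4, 1, 2, 3])

print(round(var_47, 2), round(sqrt(var_47), 2))  # 4.32 2.08
print(round(var_48, 2), round(sqrt(var_48), 2))  # 45.56 6.75
```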
1. If a constant is added to (or subtracted from) all the values, the variance remains the same.
Example 4.9 Consider the 6 sample values xi: 54,52,53,50,51, and 52.
1. If each and every value is multiplied by a non-zero constant (k), the standard deviation is multiplied by |k| (and the variance by k²).
2. Both the variance and the standard deviation give more weight to extreme values and
less to those which are near to the mean.
Coefficient of Variation
The standard deviation is an absolute measure of dispersion. The corresponding relative measure
is known as the coefficient of variation (CV).
Of course, standard deviation is an absolute measure of dispersion that expresses the variation in
the same unit as the original data, but it cannot be the sole basis for comparing two distributions.
For instance, if we have a standard deviation of 10 and a mean of 5, the values vary by an
amount twice as large as the mean itself. If, on the other hand, we have a standard deviation of
10 and a mean of 5000, the variation relative to the mean is insignificant. Therefore, we cannot
know the dispersion of a set of data until we know the standard deviation, the mean, and how the
standard deviation compares with the mean.
Coefficient of variation is used in such problems where we want to compare the variability of
two or more different series. Coefficient of variation is the ratio of the standard deviation to the
arithmetic mean, usually expressed in percent.
\[ CV = \frac{\text{Standard deviation}}{\text{mean}} \times 100\% \]
Example 4.10: Last semester, the students of Mathematics and Chemistry Departments took
Introduction to Statistics course. At the end of the semester, the following information was
recorded.
Compare the relative dispersions of the two departments’ scores using the appropriate way.
Solution:
Mathematics Department:
\[ CV = \frac{S}{\bar{x}} \times 100 = \frac{25}{85} \times 100 = 29.41\% \]
Chemistry Department:
\[ CV = \frac{S}{\bar{x}} \times 100 = \frac{12}{65} \times 100 = 18.46\% \]
Interpretation: Since the CV of Mathematics Department students is greater than that of
Chemistry Department students, we can say that there is more dispersion relative to the mean in
the distribution of Mathematics students’ scores compared with that of Chemistry students.
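The comparison in Example 4.10 amounts to two divisions, sketched below in Python. The function name `coefficient_of_variation` is hypothetical, and the standard deviations (25 and 12) and means (85 and 65) are the ones used in the solution above.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100


cv_math = coefficient_of_variation(25, 85)
cv_chem = coefficient_of_variation(12, 65)
print(round(cv_math, 2), round(cv_chem, 2))  # 29.41 18.46
print("Mathematics" if cv_math > cv_chem else "Chemistry", "scores are relatively more dispersed")
```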
4.4 Standard Scores (Z-Scores)
A standard score for a value in a data set is obtained by subtracting the mean of the data set
from the value and dividing the result by the standard deviation of the data set. Basically, the
standard score (z-score) tells us how many standard deviations a specific value is above or below
the mean value of the data set. That is, the z-score is the number of standard deviations the data
value falls above (positive z-score) or below (negative z-score) the mean for the data set.
For a population:
\[ Z\text{-score} = \frac{X - \mu}{\sigma} \]
For a sample:
\[ Z\text{-score} = \frac{X - \bar{X}}{S} \]
Example 4.11: What is the Z-score for the value of 14 in the following sample data set?
3 8 6 14 4 12 7 10
Solution:
\[ \bar{X} = 8, \quad SD = 3.8173, \quad \text{thus} \quad Z = \frac{14 - 8}{3.8173} \approx 1.57. \]
The data value of 14 is located 1.57 standard deviations above the mean 8 because the z-
score is positive.
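Example 4.11 can be reproduced with Python's standard `statistics` module, whose `stdev` function uses the sample (n − 1) formula. The variable names are chosen for this sketch only.

```python
from statistics import mean, stdev

data = [3, 8, 6, 14, 4, 12, 7, 10]

x_bar = mean(data)  # sample mean
s = stdev(data)     # sample standard deviation (divides by n - 1)
z = (14 - x_bar) / s

print(x_bar, round(s, 4), round(z, 2))  # 8 3.8173 1.57
```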
Example 4.12: Suppose that a student scored 66 in Statistics and 80 in Mathematics. A summary
of the scores in the two courses is given below.
Course        Average score   Standard deviation of the score
Statistics    51              12
Mathematics   72              16
In which course did the student score better as compared to his classmates?
Solution:
Z-score of the student in Statistics:
\[ Z = \frac{X - \mu}{\sigma} = \frac{66 - 51}{12} = \frac{15}{12} = 1.25 \]
Z-score of the student in Mathematics:
\[ Z = \frac{X - \mu}{\sigma} = \frac{80 - 72}{16} = \frac{8}{16} = 0.5 \]
From these two standard scores, we can conclude that the student has scored better in Statistics
course relative to his classmates than in Mathematics course.
The measures of central tendency and variation discussed in the previous sections do not reveal the entire
story about a frequency distribution. Two distributions may have the same mean and standard
deviation but may differ in the shape of the distribution. A further description of their
characteristics is necessary, and it is provided by measures of skewness and kurtosis.
4.5.1 Moments
Moments are statistical tools used in statistical investigation. The moments of a distribution are
the arithmetic mean of the various powers of the deviations of items from some number. In our
course, we shall use it in the study of Skewness and Kurtosis of statistical distribution.
The rth moment about the origin for an individual series is
\[ M_r = \frac{\sum X_i^r}{n} \]
where r = 0, 1, 2, 3, …
Moments about the origin for a grouped frequency distribution and for an ungrouped frequency
distribution are
\[ M_r = \frac{\sum f_i X_i^r}{n} \]
where $f_i$ is the frequency of $X_i$. $X_i$ is the midpoint in the case of a grouped frequency distribution
or the class value in the case of an ungrouped frequency distribution.
The rth moment about the mean for an individual series is
\[ M'_r = \frac{\sum (X_i - \bar{X})^r}{n} \]
Moments about the mean for a grouped frequency distribution and for an ungrouped frequency
distribution are
\[ M'_r = \frac{\sum f_i (X_i - \bar{X})^r}{n} \]
where $f_i$ is the frequency of $X_i$. $X_i$ is the midpoint in the case of a grouped frequency distribution
or the class value in the case of an ungrouped frequency distribution.
Moments about any arbitrary constant A
\[ M'_r = \frac{\sum (X_i - A)^r}{n} \]
Moments about any arbitrary constant A for a grouped frequency distribution and for an ungrouped
frequency distribution
\[ M'_r = \frac{\sum f_i (X_i - A)^r}{n}. \]
Example 4.13: Find the first four moments about the mean for the following individual series
X i: 3 6 8 10 18
Solution: n = 5,
S.No    Xi   (Xi − mean)   (Xi − mean)^2   (Xi − mean)^3   (Xi − mean)^4
1       3    -6            36              -216            1296
2       6    -3            9               -27             81
3       8    -1            1               -1              1
4       10   1             1               1               1
5       18   9             81              729             6561
Total   45   0             128             486             7940
Thus,
\[ \bar{X} = \frac{45}{5} = 9, \quad M'_1 = \frac{\sum (X_i - 9)^1}{5} = 0, \quad M'_2 = \frac{\sum (X_i - 9)^2}{5} = \frac{128}{5} = 25.6, \]
\[ M'_3 = \frac{\sum (X_i - 9)^3}{5} = \frac{486}{5} = 97.2, \quad M'_4 = \frac{\sum (X_i - 9)^4}{5} = \frac{7940}{5} = 1588 \]
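The table in Example 4.13 can be reproduced with a short Python sketch. The function name `central_moment` is hypothetical; it implements the moment about the mean for an individual series.

```python
def central_moment(data, r):
    """r-th moment about the mean: sum((x - mean)^r) / n."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** r for x in data) / n


x = [3, 6, 8, 10, 18]
moments = [central_moment(x, r) for r in range(1, 5)]
print(moments)  # [0.0, 25.6, 97.2, 1588.0]
```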
4.5.2 Skewness
How to check the presence of skewness in a distribution?
Measures of skewness (α 3)
A measure of skewness gives a numerical expression for the degree and the direction of asymmetry in a
distribution. It gives information about the shape of the distribution and the degree of variation
on either side of the central value. The three most commonly used measures of skewness are
Pearson's coefficient of skewness, Bowley's coefficient of skewness and the coefficient of skewness
based on moments.
\[ \alpha_3 = \frac{M'_3}{(M'_2)^{3/2}} = \frac{M'_3}{\sigma^3} \]
where $M'_r = \sum_{i=1}^{n} (x_i - \bar{x})^r / n$.
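As an illustration, the moment coefficient of skewness can be computed for the series 3, 6, 8, 10, 18 used in Example 4.13. This is only a sketch; the function names `central_moment` and `moment_skewness` are hypothetical.

```python
def central_moment(data, r):
    """r-th moment about the mean: sum((x - mean)^r) / n."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** r for x in data) / n


def moment_skewness(data):
    """alpha_3 = M'_3 / (M'_2)^(3/2)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


alpha3 = moment_skewness([3, 6, 8, 10, 18])
print(round(alpha3, 2))  # 0.75
```

Here the third moment about the mean is 97.2 and the second is 25.6, so the coefficient is positive, which points to a distribution skewed to the right.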
The shape of the curve is determined by the value of α 3
α 3> 0, the distribution is positively skewed/skewed to the right, i.e mode < median <mean
smaller observations are more frequent than larger observations. i.e., the majority of
α 3 < 0, the distribution is negatively skewed/skewed to the left. i.e., mean < median < mode
smaller observations are less frequent than larger observations. i.e., the majority of
4.5.3 Kurtosis
Measures of Kurtosis (α4)
\[ \alpha_4 = \frac{M'_4}{(M'_2)^2} = \frac{M'_4}{\sigma^4} \]
The peakedness depends on the value of α 4
α 4 > 3 the curve is leptokurtic,
Exercise 4
1. Calculate the mean deviation about the mean, median and mode, and their coefficients and
also variance and standard deviation for the following data.
Size of shoes 3 6 11 2 4 10 5 7 8 9
No. of pairs sold 10 15 25 6 4 3 2 8 9 4
2. An analysis of the monthly wages paid (in birr) to workers in two firms A and B belonging to
the same industry gives the following results.
Value Firm A Firm B
Mean wage 52.5 47.5
Variance 100 121
In which firm A or B is there greater variability in individual wages?
3. A meteorologist interested in the consistency of temperatures in three cities during a given
week collected the following data. The temperatures for the five days of the week in the three
cities were
City 1: 25, 24, 23, 26, 17
City 2: 22, 21, 24, 22, 20
City 3: 32, 27, 35, 24, 28
Which city has the most consistent temperature, based on these data?
4. Some characteristics of the annual family income distribution (in Birr) in two regions are as
follows:
Region Mean Median Standard deviation
A 6250 5100 960
B 6980 5500 940
CHAPTER FIVE: ELEMENTARY PROBABILITY
Objectives
After studying this chapter, you should be able to:
Understand the fundamental concepts of probability.
Apply the principle of counting techniques to solve real problem.
Define some basic terms of probability.
5.1 Definition of some probability terms
Experiment: Any process of observation or measurement, or any process which generates
a well-defined outcome.
Random experiment: it is an experiment which can be repeated any number of times under
the same conditions, but does not give unique results. The result will be any one of several
possible outcomes, but for each trial, the result will not be known in advance. A Random
experiment is also called a trial & the outcomes are called events.
Sample space: - is the collection of all possible outcomes or sample points of a random
experiment.
Sample point: -Each element of sample space is called Sample point.
Event: - is a subset of a sample space i.e. an event is a collection of sample points.
Impossible event:- this is an event which will never occur.
Example 5.1: In an experiment of rolling a fair die, S = {1, 2, 3, 4, 5, 6}, each sample point is an
equally likely outcome. It is possible to define many events on this sample space as follows:
Example 5.2
If we toss a coin, the sample space (S) of this experiment is S = {head, tail}, where head and tail are
the two faces of a coin. If we are interested in the outcome that a head turns up, then the event is E =
{head}.
Example 5.3: Find the sample space of tossing a coin three times.
S= {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
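The sample space of Example 5.3 can be listed mechanically with `itertools.product` from Python's standard library. The event "exactly two heads" below is a hypothetical example added for illustration; it is not taken from the notes.

```python
from itertools import product

# Every sequence of three tosses, each toss being H (head) or T (tail).
sample_space = ["".join(outcome) for outcome in product("HT", repeat=3)]
print(sample_space)
# ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']

# An event is a subset of the sample space, e.g. "exactly two heads".
two_heads = [s for s in sample_space if s.count("H") == 2]
print(two_heads)  # ['HHT', 'HTH', 'THH']
```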
Mutually exclusive event: - two events A and B are said to be mutually exclusive if there is
no sample point which is common to A and B. i.e. A ∩ B = ∅
Independent event: two or more events are said to be independent if the occurrence or non-
occurrence of an event does not affect the occurrence or non-occurrence of the other.
Dependent events: Two events are dependent if the occurrence of the first event affects the
occurrence of the second event in such a way that its probability is changed.
Complement of an event: The complement of an event A means the non-occurrence of A. It is
denoted by A' or Ac and contains those points of the sample space which do not belong to A.
Equally likely outcomes: Outcomes are equally likely if each outcome in the sample space has
the same chance of occurring.
Example 5.4: In casting a fair die, all possible outcomes are equally likely.
5.2 Counting rules: addition, multiplication, Permutation & Combination rule
In order to calculate probabilities, we have to know
The number of elements of an event.
The number of elements of the sample space.
That is, in order to judge what is probable, we have to know what is possible.
In order to determine the number of outcomes, one can use several rules of counting:
1. The addition rule
2. The multiplication rule
3. Permutation rule
4. Combination rule
1. The addition Rule
Suppose that a procedure, designated by 1, can be done in n1 ways. Assume that a second
procedure, designated by 2, can be done in n2 ways. Suppose, furthermore, that it is not possible
for both 1 and 2 to be done together. Then the number of ways in which we can do 1 or 2 is
n1 + n2.
Example 5.5: Suppose we are planning a trip to some place. If there are 3 bus routes & 2 train
routes that we can take, then there are 3 + 2 = 5 different routes that we can take.
2. Multiplication rule: If an operation consists of k steps and the 1st step can be done in n1 ways,
the 2nd step can be done in n2 ways (regardless of how the 1st step was performed), and so on until the kth step
can be done in nk ways (regardless of how the preceding steps were performed), then the entire
operation can be performed in n1 · n2 · … · nk ways.
Example 5.6: Suppose that a person has 2 different pairs of trousers and 3 shirts. In how many
ways can he wear his trousers and shirts?
Solution: He can choose the trousers in n1 = 2 ways and the shirts in n2 = 3 ways. Therefore, he
can wear his trousers and shirts in n1 · n2 = 2 · 3 = 6 ways.
Solution:
Assume the 3 persons are Aster (A), Lemma (L) and Yared (Y), so n = 3.
Since n! = 3! = 3 · 2 · 1 = 6, there are 6 possible arrangements: ALY, AYL, LAY, LYA, YLA and
YAL.
Example 5.9: Suppose we have the letters A, B, C, D & E.
a) How many permutations are there taking all five?
b) How many permutations are there taking two letters at a time?
Solution:
a) Here n = 5, since there are five distinct objects.
There are 5! = 120 permutations.
b) Here n = 5, r = 2.
There are 5P2 = 5!/(5-2)! = 120/6 = 20 permutations.
Example 5.10: Fifteen Ethiopian athletes entered a race. In how many different ways
could prizes for the first, the second and the third place be awarded?
Solution
15 objects taken 3 at a time: 15P3 = 15!/(15-3)! = 15 · 14 · 13 = 2730 ways.
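The permutation counts in Examples 5.9 and 5.10 can be checked with Python's standard library. This is only an illustrative sketch: the variable names are assumptions made here, and math.perm requires Python 3.8 or later.

```python
import math
from itertools import permutations

letters = ["A", "B", "C", "D", "E"]

# Example 5.9 a): arrangements of all five letters, counted as 5!.
all_five = math.factorial(len(letters))

# Example 5.9 b): ordered pairs of distinct letters, counted by enumeration.
two_at_a_time = len(list(permutations(letters, 2)))

# Example 5.10: first, second and third place among 15 athletes (15P3).
prizes = math.perm(15, 3)

print(all_five, two_at_a_time, prizes)
```

Enumerating with itertools.permutations is practical only for small n; for larger problems the closed form n!/(n-r)! (math.perm) is the better choice.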
4. Combination: A selection of objects considered without regard to the order in which they occur is
called a combination. The number of combinations of n different objects taking r of them at a time
is
\[ {}_{n}C_{r} = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}, \quad \text{for } r = 0, 1, 2, \dots, n. \]
Example 5.11: Given the letters A, B, C, and D, list the permutations and combinations for
selecting two letters.
Solution:
Permutations              Combinations
AB  BA  CA  DA            AB  BC
AC  BC  CB  DB            AC  BD
AD  BD  CD  DC            AD  DC
Note that in permutation AB is different from BA but in combination AB is the same as BA.
Example 5.12: In a club containing 7 members a committee of 3 people is to be formed. In how
many ways can the committee be formed?
Solution: Here n = 7 and r = 3, so
\[ {}_{7}C_{3} = \binom{7}{3} = \frac{7!}{3!\,(7-3)!} = 35. \]
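As a quick check of Examples 5.11 and 5.12, the sketch below lists unordered pairs with itertools.combinations and counts committees with math.comb. It is an illustration added here, not the notes' own method, and the names pairs and committees are hypothetical.

```python
import math
from itertools import combinations

# Example 5.11: unordered selections of two letters from A, B, C, D.
# In a combination AB and BA are the same, so each pair appears once.
pairs = ["".join(c) for c in combinations("ABCD", 2)]

# Example 5.12: committees of 3 people chosen from 7 members (7C3).
committees = math.comb(7, 3)

print(pairs)
print(committees)
```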
Example 5.13: How many four-digit numbers can be formed with the 10 digits 0, 1, 2, . . ., 9 if
a/ repetitions are allowed,
b/ repetitions are not allowed, and
c/ the last digit must be zero & repetitions are not allowed?
Solution:
a/ The first digit can be any one of 9 (since 0 is not allowed). The second, third and fourth digits
can be any one of 10. Then 9 · 10 · 10 · 10 = 9000 numbers can be formed.
b/ The first digit can be any one of 9 & the remaining three can be chosen in 9P3 ways. Thus
9 · 9P3 = 9 · 504 = 4536 numbers can be formed.
c/ The first digit can be chosen in 9 ways & the next two digits in 8P2 ways. Thus 9 · 8P2 =
504 numbers can be formed.
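Counting problems of this size can be verified by brute force. The following sketch simply scans every four-digit number and counts the three cases (repetition allowed, repetition not allowed, last digit zero without repetition); the helper has_distinct_digits is a name introduced here for illustration.

```python
def has_distinct_digits(n):
    """Return True if no digit of n is repeated."""
    digits = str(n)
    return len(set(digits)) == len(digits)

# Every four-digit number: the first digit cannot be 0.
four_digit = range(1000, 10000)

with_repetition = len(four_digit)
without_repetition = sum(1 for n in four_digit if has_distinct_digits(n))
ending_in_zero = sum(
    1 for n in four_digit if n % 10 == 0 and has_distinct_digits(n)
)

print(with_repetition, without_repetition, ending_in_zero)
```

The counts agree with the multiplication-rule answers 9 · 10 · 10 · 10, 9 · 9P3 and 9 · 8P2.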
5.3 Probability of an event
Definition: Probability is a numerical measure of the chance or likelihood that a particular event
will occur & it lies in the range from 0-1, inclusive. Probability is a building block of inferential
statistics.
Definition: Let E be an experiment. Let S be a sample space associated with E. With each event
A in S we associate a real number designated by P (A) and called the probability of A.
Generally, probability can be divided into two types:
i) Subjective probability: - probability determined based on an individual's own judgment,
experience, information, belief, etc.
ii) Objective probability: - the probability of an event in a certain experiment based on
experimental evidence.
Basic approaches to probability
There are three different conceptual approaches to the study of probability theory.
These are:
1. The classical approach.
2. The frequentist approach.
3. The axiomatic approach.
1. Classical approach:
Definition: If there are n equally likely outcomes of an experiment, and event A occurs in k of
these n outcomes, then the probability of the event A, denoted by P(A), is defined as
\[ P(A) = \frac{\text{Number of outcomes favorable to event } A}{\text{Total number of outcomes}} = \frac{n(A)}{n(S)} = \frac{k}{n}. \]
Note: Classical approach of measuring probability fails to answer for the following conditions:
If total number of outcomes is infinite or if it is not possible to enumerate all elements of the
sample space.
If the outcomes are not all equally likely.
Example 5.14: a/ Compute the probability of having two boys & one girl in a three-child family
using the classical method, assuming boys & girls are equally likely.
b/ Using (a), compute the probability of having three boys in a three-child family.
c/ Using (a), compute the probability of having three girls in a three-child family.
d/ Using (a), compute the probability of having two girls & one boy in a three-child family.
Solution
For the event D = ''two girls & one boy'' = {BGG, GBG, GGB}, we have n(D) = 3. Since
the outcomes are equally likely, the probability of D is P(D) = n(D)/n(S) = 3/8 = 0.375.
Example 5.15: A box of 80 candles consists of 30 defective and 50 non-defective candles. If 10
of these candles are selected at random without replacement, what is the probability that
a) all will be defective?
b) 6 will be non-defective?
c) all will be non-defective?
Solution
Total number of selections: $N = n(S) = \binom{80}{10}$
a) Let A be the event that all will be defective.
\[ P(A) = \frac{n(A)}{n(S)} = \frac{\binom{30}{10}\binom{50}{0}}{\binom{80}{10}} = 0.00001825 \]
b) Let B be the event that 6 will be non-defective (and hence 4 defective).
\[ P(B) = \frac{n(B)}{n(S)} = \frac{\binom{30}{4}\binom{50}{6}}{\binom{80}{10}} \approx 0.2645 \]
c) Let C be the event that all will be non-defective.
\[ P(C) = \frac{n(C)}{n(S)} = \frac{\binom{30}{0}\binom{50}{10}}{\binom{80}{10}} = 0.00624 \]
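The three probabilities in Example 5.15 all have the same form: ways to choose the defective candles, times ways to choose the non-defective ones, divided by all ways to choose 10 candles. A minimal sketch under that reading follows; the function name prob_defective and its default arguments are assumptions made here for illustration.

```python
from math import comb

def prob_defective(k, defective=30, good=50, drawn=10):
    """P(exactly k defective candles among `drawn` taken without replacement)."""
    return comb(defective, k) * comb(good, drawn - k) / comb(defective + good, drawn)

p_all_defective = prob_defective(10)   # part a)
p_six_good = prob_defective(4)         # part b): 6 non-defective means 4 defective
p_all_good = prob_defective(0)         # part c)

print(p_all_defective, p_six_good, p_all_good)
```

Summing prob_defective(k) over k = 0, 1, ..., 10 gives 1, which is a useful sanity check on the counting.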
2. The frequentist approach:
Definition: Suppose we repeat a certain experiment n times. Let A be an event of
the experiment and let k be the number of times that event A occurs. Then the probability of
the event A happening in the long run is given by:
\[ P(A) = \frac{\text{Number of times event } A \text{ has occurred}}{\text{Total number of observations}} = \frac{k}{n} \]
In other words, given a frequency distribution, the probability of an event A being
in a given class is
\[ P(A) = \frac{\text{Frequency of class } A}{\text{Total frequency in the distribution}} \]
Example 5.16: The National Center for Health Statistics reported that of every 539 deaths in
recent years, 24 resulted from automobile accidents, 182 from cancer, and 353 from other
diseases. What is the probability that a particular death is due to an automobile accident?
Solution
P (automobile) = deaths due to automobile accidents / total deaths = 24/539 = 0.0445
The probability that a particular death is due to an automobile accident is 0.0445.
3. The axiomatic approach.
Let E be a random experiment and S be a sample space associated with E. With each event A a
real number called the probability of A satisfies the following properties called axioms of
probability or postulates of probability.
1. 0≤ P (A )≤ 1
2. P(S) =1, S is the sure/certain event.
3. If A1 and A2 are mutually exclusive events, the probability that one or the other occurs equals
the sum of the two probabilities, i.e. P(A1∪A2) = P(A1) + P(A2).
Similarly, for mutually exclusive events A1, A2, …, An,
\[ P(A_1 \cup A_2 \cup \dots \cup A_n) = P(A_1) + P(A_2) + \dots + P(A_n) = \sum_{i=1}^{n} P(A_i). \]
P(S) = P (AUA') = P (A') + P (A) and P(S) = 1
1= P (A') + P (A) => P (A') = 1-P (A)
Rule 2: Let A and B be events of a sample space S. Then
P (A' ∩ B) = P (B) - P (A ∩ B)
Proof: B = S ∩ B = (A U A') ∩ B = (A ∩ B) U (A' ∩ B)
Since A ∩ B and A' ∩ B are mutually exclusive, P(B) = P(A ∩ B) + P(A' ∩ B), and hence
P (A' ∩ B) = P(B) - P(A ∩ B).
Rule 3: Suppose A and B are two events of a sample space, then
P(AUB) = P(A) + P(B) – P(A ∩ B)
Proof:
(AUB) = AU(A' ∩ B), A and A' ∩ B are disjoint sets
∴ P(AU B) = p(A) + p(A' ∩ B) . . . .*
But we have already proved that P (A' ∩ B) = P (B) - P (A ∩ B)
Put this in equation *
P(A U B) = P(A) + P (B) – P (A ∩ B)
Example 5.17: A fair die is thrown twice. Calculate the probability that the sum of spots on the
face of the die that turn up is divisible by 2 or 3.
Solution
S={(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(2,1),(2,2),(2,3),(2,4),(2,5),(2,6),(3,1),(3,2),(3,3),(3,4),(3,5),
(3,6),(4,1),(4,2),(4,3),(4,4),(4,5),(4,6),(5,1),(5,2),(5,3),(5,4),(5,5),(5,6),(6,1),(6,2),(6,3),(6,4),
(6,5),(6,6)}
This sample space has 6*6 =36 elements let A be the event that the sum of the spots on the die is
divisible by 2 and B be the event that the sum of the spots on the die is divisible by three, then
A = {(1,1), (1,3), (1,5), (2,2), (2,4), (2,6), (3,1), (3,3), (3,5), (4,2), (4,4), (4,6), (5,1), (5,3), (5,5),
(6,2), (6,4), (6,6)}
B = {(1,2), (1,5), (2,1), (2,4), (3,3), (3,6), (4,2), (4,5), (5,1), (5,4), (6,3), (6,6)}
A∩B = {(1, 5), (2,4), (3,3), (4,2), (5,1), (6,6)}
P (A or B) = P (A U B)
= P (A) +P (B) – P (A∩B)
= 18/36 + 12/36 -6/36 = 24/36 = 2/3
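Example 5.17 can be reproduced by enumerating the 36 outcomes and applying Rule 3 directly. The sketch below uses exact fractions so that the result can be compared with 2/3; the helper prob and the set names are illustrative choices, not part of the notes.

```python
from fractions import Fraction
from itertools import product

# All ordered results of throwing a fair die twice.
sample_space = list(product(range(1, 7), repeat=2))

# A: sum divisible by 2, B: sum divisible by 3.
A = {s for s in sample_space if sum(s) % 2 == 0}
B = {s for s in sample_space if sum(s) % 3 == 0}

def prob(event):
    """Classical probability: favorable outcomes over total outcomes."""
    return Fraction(len(event), len(sample_space))

# Direct count of A or B, and the same value through the addition rule.
p_union_direct = prob(A | B)
p_union_rule = prob(A) + prob(B) - prob(A & B)

print(p_union_direct, p_union_rule)
```

Both routes give the same value, which illustrates why P(A ∩ B) must be subtracted: outcomes in both A and B would otherwise be counted twice.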
5.5 Conditional Probability and Independence
5.5.1 Conditional Probability
Let A and B be events. The conditional probability of A given B means the probability of occurrence
of A when the event B has already happened.
It is denoted by P (A/B) and is defined by
P (A/B) = P(A ∩ B)/P (B), if P (B)≠0
Conditional probability of B given A means the probability of occurrence of B when the event A
has already happened. It is denoted by P (B/A) and is defined
P (B/A) = P(A ∩ B)/P (A), if P (A)≠0
P (A ∩ B) = P (A) P (B/A) = P (B) P (A/B).
5.5.2 Multiplication Law of Probability
If A and B are events in a sample space S, then
P (A ∩ B) = P (A) P (B/A), P (A) ≠ 0
P (A ∩ B) = P (B) P (A/B), P (B) ≠ 0
Where P (B/A) represents the conditional probability of B given A and P (A/B) represents the
conditional probability of A given B.
Note: Extension of multiplication law of probability for ‘n’ events A1, A2, …, An we have
P (A1 ∩ A2 ∩ …∩An) = P (A1) P (A2/A1) p (A3/A1 ∩ A2)…P(An/A1∩ A2 ∩ …∩An-1)
Example 5.18: A coin is tossed twice. If it is already known that the first coin has thrown a head,
what is the probability of getting two heads?
Solution:
S = {HH, HT, TH, TT}, A = the first shows a head = {HH, HT}, B= two heads occur ={HH}
P (B/A) = P(A ∩ B)/ P(A)
But A ∩ B ={HH}, P(A ∩ B) =1/4, P(A)=1/2, therefore, P (B/A) = P(A ∩ B)/ P(A) = 1/2
Example 5.19: Let A and B be events such that P (A U B) = ¾, P (A ∩ B) = ¼ and P(A') = 2/3.
Find P (A'/B).
Solution:
P(A') = 2/3 ⇒ P (A) = 1 - P(A') = 1 - 2/3 = 1/3
Now, P (A U B) = P (A) + P (B) - P (A ∩ B)
3/4 = 1/3 + P (B) - ¼
P(B) = 3/4 - 1/3 + ¼ = 2/3
Therefore, P (A/B) = P (A ∩ B)/P(B) = (1/4)/(2/3) = 3/8, and P(A'/B) = 1 - P (A/B) = 1 - 3/8 = 5/8.
5.5.3 Probability of Independent Event
Two events A and B are said to be independent if the occurrence of A has no bearing on the
occurrence of B. That means knowing that A has occurred gives no information about the
occurrence of B. Formally, two events A and B are said to be independent if P(A∩B) = P(A)P(B).
Suppose A and B are independent events with 0<P (A) <1 and 0<P (B) <1. Then the following
statements are true:
i. A' and B' are independent, ii. A and B' are independent, iii. A' and B are independent,
iv. P(B|A) = P(B), v. P(B|A') = P(B)
Example 5.20: A box contains four black and six white balls. What is the probability of getting
two black balls in drawing one after the other under the following conditions?
a. The first ball drawn is not replaced
b. The first ball drawn is replaced
Solution
Let A= first drawn ball is black
B= second drawn is black
Required: P (A ∩ B)
a. P (A ∩ B) = P (B/A) P(A) = (4/10) (3/9) = 2/15
b. P (A ∩ B) = P (A) P (B) = (4/10) (4/10) = 16/100 = 4/25.
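The difference between drawing without and with replacement in Example 5.20 is easy to see in a short calculation. This sketch uses exact fractions; the variable names are assumptions introduced for illustration.

```python
from fractions import Fraction

black, white = 4, 6
total = black + white

# a. Without replacement: the second draw depends on the first,
#    so P(A and B) = P(A) * P(B given A).
p_without = Fraction(black, total) * Fraction(black - 1, total - 1)

# b. With replacement: the draws are independent,
#    so P(A and B) = P(A) * P(B).
p_with = Fraction(black, total) * Fraction(black, total)

print(p_without, p_with)
```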
5.6 Total probability and Bayes’ Theorem
Total probability:-If events B1, B2, …,& Bk constitute a partition of the sample space S &
Example 5.21: In a factory, machines A1, A2, A3 manufacture 25%, 35%, 40% of the total
output respectively. Of their products, 5%, 4% & 2%, respectively, are defective. An item
is drawn at random from the products. What is the overall probability that it is
defective, taking all machines together?
Sol/n: P(A1) = 0.25, P(A2) = 0.35, P(A3) = 0.40, P(D/A1) = 0.05, P(D/A2) = 0.04, P(D/A3) = 0.02
By total probability, P(D) = P(A1)P(D/A1) + P(A2)P(D/A2) + P(A3)P(D/A3)
= (0.25)(0.05) + (0.35)(0.04) + (0.40)(0.02) = 0.0345
Bayes' Theorem:- If B1, B2, …, & Bk are events which make an exhaustive partition of the
sample space S, and if A is any event in S, then the conditional probability of Bi given that A has
already occurred is:
\[ P(B_i / A) = \frac{P(B_i)\,P(A / B_i)}{\sum_{j=1}^{k} P(B_j)\,P(A / B_j)} \]
Example 5.22: Based on the above example, what is the probability that the defective item was
manufactured by machine A1?
Sol/n:-
\[ P(A_1 / D) = \frac{P(A_1)\,P(D / A_1)}{\sum_{i=1}^{3} P(A_i)\,P(D / A_i)} = \frac{(0.25)(0.05)}{0.0345} = 0.3623 \]
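The total probability and Bayes' theorem calculations for the three machines can be sketched in a few lines of Python. The dictionaries priors and defect_rates simply hold the figures given in Example 5.21; their names are chosen here for illustration.

```python
# Share of total output produced by each machine: P(Ai).
priors = {"A1": 0.25, "A2": 0.35, "A3": 0.40}

# Proportion of each machine's output that is defective: P(D/Ai).
defect_rates = {"A1": 0.05, "A2": 0.04, "A3": 0.02}

# Total probability: P(D) is the sum of P(Ai) * P(D/Ai) over the partition.
p_defective = sum(priors[m] * defect_rates[m] for m in priors)

# Bayes' theorem: P(Ai/D) = P(Ai) * P(D/Ai) / P(D).
posteriors = {m: priors[m] * defect_rates[m] / p_defective for m in priors}

print(round(p_defective, 4))
print({m: round(p, 4) for m, p in posteriors.items()})
```

The posterior probabilities add up to 1, since a defective item must have come from exactly one of the three machines.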
Exercise 5
1. A fair die is tossed once. What is the probability of getting?
a) Number 4? b) An odd number? c) Outcome of at least number 4? d) Number 8?
2. In how many ways can 10 people be seated on a bench if only 4 seats are available?
3. A committee of 5 is formed by drawing lots from 8 boys and 6 girls. Find the probability that
the committee will consist of 2 boys and 3 girls.
4. A box contains 6 red, 4 white and 5 black balls. A man draws 4 balls from the box at random.
Find the probability that among the balls drawn there is at least one ball of each color.
5. How many four-digit numbers can be formed with the 6 digits 1,2,. . .,6, if
a/ repetitions are allowed b/repetitions are not allowed.
6. The probabilities that A and B solve a given problem independently are 2/3 and 3/5
respectively. If both of them attempt the problem, find the probability that the problem will be
solved.
7. A bag contains 15 items of which 4 are defective. The items are selected at random one by one
and examined. The ones examined are not put back. What is the chance that 10th one examined is
the last defective?
8. A company has two machines M1 and M2. M1 produces 60% of its product and M2 produces
40% of its product. M1 produces 5% defective units and M2 produces 4% defective units. A unit
is selected at random from the whole product.
a/Find the probability that it is defective. b/ What is the probability that it was manufactured by
machine M2.
CHAPTER SIX
PROBABILITY DISTRIBUTION
The purpose of this unit is to introduce you to the concept of random variables and their
probability distributions. In a probability distribution, the values of a variable are distributed according to
some definite probability function. In the previous unit we discussed the concept of
probability. The different rules of probability and frequency distributions were also discussed. In
this unit we use this information to understand the discrete and continuous probability
distributions. Moreover, the concept of mathematical expectation is discussed.
Definition: A variable whose values are determined by chance with associated probabilities is
called a random variable. It is a quantity which in different observations can assume different
values.
In any experiment of chance, the outcomes occur randomly. For example, the total score when a
pair of dice is rolled, the number of heads when a coin is tossed several times, annual household
income, and so on are examples of random variables (or stochastic variables).
Random variables are usually denoted by capital letters X, Y, Z, etc., while the values taken by
them are denoted by lower case letters x, y, z, etc. Thus, P (x1 ≤ X ≤ x2) is the probability that the
random variable X takes values between x1 and x2, both inclusive. A random variable can be
discrete or continuous.
6.1.1 Discrete Random Variable
If the random variable X can assume only a particular finite or countably infinite set of values, it
is said to be a discrete random variable. For example, if you throw a die, the outcome X is a
random variable, which can assume only the values 1, 2, 3, 4, 5 and 6.
Example 6.1: Consider an experiment of "flipping a fair coin 3 times". List the elements of the
sample space that are assumed to be equally likely (as this is what is meant by a fair or balanced
coin) and the corresponding values x of the random variable X, the number of heads observed.
Solution: If H stands for heads and T for tails, then the sample space corresponding to this
experiment is S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
Since X= the number of heads observed, the results are shown in the following table:
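The correspondence between sample points and values of X can be generated with a short Python sketch. This is an illustration added here under the stated assumption of a fair coin (all 8 outcomes equally likely); the names x_values and pmf are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of three coin flips.
sample_space = ["".join(o) for o in product("HT", repeat=3)]

# X assigns to each sample point the number of heads it contains.
x_values = {outcome: outcome.count("H") for outcome in sample_space}

# With equally likely outcomes, P(X = x) is the share of points mapped to x.
counts = Counter(x_values.values())
pmf = {x: Fraction(counts[x], len(sample_space)) for x in sorted(counts)}

for outcome, x in x_values.items():
    print(outcome, x)
print(pmf)
```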
A random variable X is said to be continuous if it can take all possible values (integral as well as
fractional) between certain limits. Continuous random variables occur when we deal with
quantities that are measured on a continuous scale. For instance, the life length of an electric
bulb, the speed of a car, weights, heights, and the like are continuous. In such cases, probabilities
are associated with intervals or regions of a continuous random variable, and not with individual
points.
A probability distribution shows the possible outcomes of an experiment and the probability of
each of these outcomes. That is, a probability distribution is a complete list of all possible
values of a random variable and their corresponding probabilities.
A formula giving the probability of the different values of the random variable X for a:
Discrete variable is the probability mass function (pmf) and is usually denoted by p(x).
If X is a discrete random variable taking at most a countably infinite number of values x1,
x2, …, then P(xi) = P(X = xi), i = 1, 2, …, is called the probability mass function of the random
variable X. The set of ordered pairs {xi, P(xi)}, i = 1, 2, …, gives the probability
distribution of the random variable X. The numbers P(xi), i = 1, 2, …, must satisfy the
following conditions.
i) $P(x_i) \ge 0$
ii) $\sum_{i=1}^{\infty} P(X = x_i) = 1$
Continuous variable is the probability density function (pdf) and is usually denoted by
f(x). A random variable X is said to be a continuous random variable if there is a
non-negative function f such that
\[ F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt. \]
The function f is called the probability density function of X, and it satisfies the following
conditions.
i) $f(x) \ge 0$ for all x, $-\infty < x < \infty$
ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
The probability that X lies between a and b is
\[ P(a \le X \le b) = \int_{a}^{b} f(x)\,dx. \]
The integration from a to b in the case of the continuous variable is
analogous to the summation of probabilities in the discrete case.
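Example 6.4: A continuous random variable X has a probability density function given by
\[ f(x) = \frac{1}{4}x + \frac{1}{2}, \quad 0 \le x \le 1. \]
Find the probability that X lies in the interval between 0 and 1.
Solution:
\[ \int_{0}^{1} \left(\frac{1}{4}x + \frac{1}{2}\right) dx = \left[\frac{1}{8}x^2 + \frac{1}{2}x\right]_{0}^{1} = \frac{1}{8} + \frac{1}{2} = \frac{5}{8} \]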
The objective of this section is to introduce you to the most common parameters of probability
distributions. There are some summary measures in terms of which we can summarize the
behavior of probability distributions. The most common of these are the average, called the expected
value, and the dispersion about the average, called the variance.
6.3.1 Expectation
The averaging process, when applied to a random variable, is called expectation. It is denoted by
E(X) or μ and is read as the expected value of X or the mean value of X.
Case 1: For discrete random variable
Suppose X is a discrete random variable which takes on values in a finite set x1, x2, …, xn with
probabilities P(xi) = P[X = xi], i = 1, 2, …, n. Then the expected value of X, E(X), of the discrete
random variable is given by:
\[ E(X) = \mu = \sum_{i=1}^{n} x_i P(x_i) \]
Properties of Expectation
If X and Y are random variables and a, b are constants then:
1. E(k) = k, where k is any constant
2. E (kX) = k E(X), where k is any constant
3. E (X + k) =E(X) + k
4. E(X + Y) = E(X) +E(Y)
5. E(XY) = E(X) E(Y), if X, Y are independent random variables
6. E(X) ≥ 0, if X ≥ 0.
7. |E(X)| ≤ E(|X|)
8. [E(XY)]² ≤ E(X²) E(Y²).
6.3.2 Mean and variance of a random variables
Mean of X = E(X)
Variance of X: $\sigma_x^2 = E(X^2) - [E(X)]^2 = E[X - E(X)]^2$
Case 1:
If X is a discrete random variable with expected value μ, then the variance of X, denoted by
Var(X), is defined by:
\[ \sigma_x^2 = \mathrm{Var}(X) = E(X-\mu)^2 = E(X^2) - \mu^2 = \sum_{i=1}^{n} x_i^2 P(x_i) - \mu^2 \]
Alternatively,
\[ \mathrm{Var}(X) = \sum_{i=1}^{n} (x_i - \mu_x)^2 P(x_i) \]
Case 2:
If X is a continuous random variable with mean μ and pdf $f_x(x)$, then Var(X) is
\[ \sigma_x^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f_x(x)\,dx \]
Properties of Variances
For any random variable X and constant a, it can be shown that
- Var(aX) = a²Var(X)
- Var(X + a) = Var(X) + 0 = Var(X)
If X and Y are independent random variables, then
Var(X + Y) = Var(X) + Var(Y)
More generally, if X1, X2, …, Xk are independent random variables, then
Var (X1 + X2 + … + Xk) = Var (X1) + Var (X2) + … + Var (Xk),
i.e., $\mathrm{Var}\left(\sum_{i=1}^{k} X_i\right) = \sum_{i=1}^{k} \mathrm{Var}(X_i)$
Example 6.5: Two fair coins are tossed. Determine Var (X) where X is the number of heads that
appear.
a) Use the definition of the variance.
b) Use the fact that the variance of the sum of independent variables is equal to the sum of the
variance.
Solution:
a) Let X be the number of heads, with possible values 0, 1 and 2. The sample space consists of {HH,
TH, HT, TT}.
P (X = 0) = ¼, P (X = 1) = ½, P (X = 2) = ¼
E (X) = 0·P(X=0) + 1·P(X=1) + 2·P(X=2)
= 0(1/4) + 1(1/2) + 2(1/4)
= 1.
E(X²) = 0²·P(X=0) + 1²·P(X=1) + 2²·P(X=2)
= 0(1/4) + 1(1/2) + 4(1/4)
= 3/2.
This implies that Var (X) = E(X²) - μ² = 3/2 - 1 = 1/2.
b) Let X be the number of heads on the first coin, with possible values 0 and 1, and let
Y be the number of heads on the second coin, with possible values 0 and 1.
P(X = 0) = ½, P (X = 1) = ½ and P (Y = 0) = ½, P(Y = 1) = ½
E(X) = 0·P(X=0) + 1·P(X=1) = 0(1/2) + 1(1/2) = 1/2
E(Y) = 0·P(Y=0) + 1·P(Y=1) = 0(1/2) + 1(1/2) = 1/2
E(X²) = 0²·P(X=0) + 1²·P(X=1) = 0(1/2) + 1(1/2) = 1/2
E(Y²) = 0²·P(Y=0) + 1²·P(Y=1) = 0(1/2) + 1(1/2) = 1/2
Var (X) = E (X²) - [E(X)]² = ½ - (1/2)² = ¼
Var (Y) = E (Y²) - [E(Y)]² = ½ - (1/2)² = ¼
X and Y are independent (i.e. the outcome of one coin does not influence the outcome of the
second), so the total number of heads X + Y has variance
Var (X+Y) = Var (X) + Var (Y) = 1/4 + 1/4 = ½.
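The definition-based calculation in part (a) can be written as a small reusable routine. The sketch below is an illustration, not part of the original notes: it stores the pmf of the number of heads in a dictionary and uses a helper named expectation, which is a hypothetical name introduced here.

```python
from fractions import Fraction

# pmf of X = number of heads when two fair coins are tossed.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expectation(pmf, g=lambda x: x):
    """E[g(X)] for a discrete random variable given as {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expectation(pmf)                          # E(X)
second_moment = expectation(pmf, lambda x: x ** 2)   # E(X^2)
variance = second_moment - mean ** 2             # Var(X) = E(X^2) - [E(X)]^2

print(mean, second_moment, variance)
```

The same routine works for any finite pmf, for example the pmf of a single coin in part (b), where it gives a variance of 1/4 for each coin.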
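Example 6.6: Compute the variance of X with pdf $f(x) = \frac{x^2}{9}$ for 0 < x < 3.
\[ E(X^2) = \int_{0}^{3} x^2 \left(\frac{x^2}{9}\right) dx = \int_{0}^{3} \frac{x^4}{9}\,dx = \frac{1}{9}\left[\frac{x^5}{5}\right]_{0}^{3} = \frac{27}{5} \]
\[ E(X) = \int_{0}^{3} x \left(\frac{x^2}{9}\right) dx = \frac{1}{9}\left[\frac{x^4}{4}\right]_{0}^{3} = \frac{9}{4} \]
\[ \mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \frac{27}{5} - \left(\frac{9}{4}\right)^2 = 0.34 \]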
The origin of binomial distribution is Bernoulli's trial. Bernoulli's trial is an experiment where
there are only two possible outcomes, “success" or "failure". In connection with this trial, a
success may be getting heads with a balanced coin; it may be passing an examination. Whenever
we face such experiment, we use binomial distribution under the assumptions stated below. Any
experiment can also be turned into a Bernoulli trial by defining one or more possible results
which we are interested as ‘‘Success” and all other possible results as “Failure”. For instance,
while rolling a fair die, a "success" may be defined as "getting even numbers on top" and odd
numbers as "Failure".
Generally, the sample space in a Bernoulli trial is S = {S, F}, S = Success, F = failure.
Another important discrete probability distribution is the Poisson distribution. It is a discrete
probability distribution which is used in the area of rare events. The Poisson distribution counts
the number of successes in a fixed interval of time or within a specified region.
Examples of random variables that usually obey the Poisson distribution are:
The number of car accidents in a day.
The number of telephone calls arriving over an interval of time.
The number of misprints on a typed page (or a group of pages) of a book.
The number of natural disasters, such as earthquakes.
The number of suicides reported in a particular city.
The number of customers entering a post office on a given day.
To apply the Poisson distribution, two conditions must be met:
i) The number of successes that occur in any interval is independent of those that occur
in other non-overlapping intervals.
ii) The probability of a success in an interval is proportional to the size of the interval. In
short, the two important traits of the Poisson distribution are independence and
proportionality.
Let X be the number of occurrences in a Poisson process and λ be the average number of
occurrences of an event in a unit length of interval. The probability function for the Poisson
distribution is
\[ P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \dots \]
Remarks
The Poisson distribution possesses only one parameter, λ.
If X has a Poisson distribution with parameter λ, then E (X) = λ and Var (X) = λ,
i.e. E (X) = Var (X) = λ.
The probabilities sum to one: $\sum_{x=0}^{\infty} P(X = x) = 1$.
Example 6.8: In a small city, 10 accidents took place in a time of 50 days. Find the probability
that there will be a) two accidents in a day and b) three or more accidents in a day.
Solution:
On average there are 10/50 = 0.2 accidents per day.
Let X be the random variable, the number of accidents per day.
X ~ Poisson (λ = 0.2), X = 0, 1, 2, ….
a) $P(X = 2) = \frac{(0.2)^2 e^{-0.2}}{2!} = 0.0164$
b) P (X ≥ 3) = P(X = 3) + P(X = 4) + P(X = 5) + ...
= 1 - [P(X = 0) + P(X = 1) + P(X = 2)], since $\sum_{x=0}^{\infty} P(X = x) = 1$
= 1 - [0.8187 + 0.1637 + 0.0164] = 0.0012
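The Poisson probabilities in Example 6.8 can be recomputed directly from the probability function. The following is a minimal sketch; poisson_pmf is a helper defined here for illustration rather than a library function.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam ** x * exp(-lam) / factorial(x)

lam = 10 / 50   # average number of accidents per day

# a) exactly two accidents in a day
p_two = poisson_pmf(2, lam)

# b) three or more accidents: complement of 0, 1 or 2 accidents
p_three_or_more = 1 - sum(poisson_pmf(x, lam) for x in range(3))

print(round(p_two, 4), round(p_three_or_more, 4))
```

Computed without intermediate rounding, P(X ≥ 3) is about 0.0011; the 0.0012 in the worked solution comes from rounding each term to four decimals before subtracting.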
6.5. Common Continuous Probability Distributions
6.5.1 Normal Distributions
It is the most important distribution for describing a continuous random variable and is used as an
approximation of other distributions. Many variables in the practical world follow this distribution,
and hence in many ways it is the cornerstone of modern statistical theory. It has been noticed that
empirical distributions of various types of observations in the natural and social sciences are very often
approximately normal. In statistical estimation and testing of hypotheses, the normal
distribution plays an important role.
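A random variable X has a normal distribution with parameters μ & σ², written X ~ N(μ, σ²), if its pdf is
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right\} = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}, \quad -\infty < x < \infty. \]
[Figure: Normal probability curve]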
SOME PROPERTIES OF THE NORMAL CURVE
The following are the important properties of the normal curve:
1. The normal curve is "bell-shaped" and symmetrical about the mean. The property of
symmetry can be shown using the pdf as: $f(\mu + c) = f(\mu - c)$.
Since this is the property of the median, it follows that, for the normal distribution,
Mean = Median.
2. The height of the normal curve is at its maximum when X = μ = mean, which means,
again, Mean = Median = Mode. This property can also be verified using the first and
second derivative tests; that is, $f'(x) = 0 \Rightarrow x = \mu$.
This shows that f(x) may have a maximum or a minimum at x = μ, but using the second
derivative test, $f''(\mu) = -\frac{1}{\sigma^{3}\sqrt{2\pi}} < 0$, we see that the point is a maximum.
Therefore, by properties 1 and 2, we can conclude that the mean, median and mode
coincide for the normal curve.
3. The normal curve is asymptotic to the X-axis.
4. The first and the third quartiles are equidistant from the median,
i.e., $Q_3 - Q_2 = Q_2 - Q_1$, or $Q_2 = \frac{Q_1 + Q_3}{2}$.
5. The Probability that a random variable will have a value between any two points is equal
to the area under the curve between those points.
By standardization we mean that the random variable X will be transformed to another random
variable whose mean is 0 and variance is 1. The normal distribution with zero mean and standard
deviation one is known as the standard normal distribution. If X has a normal distribution with mean
μ and standard deviation σ, then the standard normal variable Z is given by
\[ Z = \frac{x - \mu}{\sigma}, \quad \text{for a population} \]
\[ Z = \frac{x - \bar{x}}{S}, \quad \text{for a sample} \]
Using the properties of expectations, it is now easy to show that E(Z) = 0 and V(Z) = 1. The pdf
of Z is, thus, given by
\[ f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^{2}}, \quad -\infty < z < \infty. \]
The entries in Table A of the Appendix are the values of
\[ P(0 < Z < z) = \int_{0}^{z} f(t)\,dt. \]
That is, the table gives us the probabilities that a random variable Z having the standard normal
distribution will take on a value on the interval from 0 to z, for z = 0.00, 0.01, 0.02, …, 3.98, and
3.99; due to the symmetrical property of the normal curve with respect to its mean, it is
unnecessary to extend the table for negative values of Z.
[Figure: standard normal curve; the table value is the area between 0 and z]
That is, the arrowed region is
\[ P(0 < Z < z) = \int_{0}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}t^{2}}\,dt. \]
Example 6.9: Find the probabilities that a random variable having the standard normal
distribution will take on a value
a) Less than 1.72; b)Less than -0.88;
c) Between 1.30 and 1.75; d) Between -0.25 and 0.45.
a) P(Z < 1.72) = P(Z < 0) + P(0 < Z < 1.72) = 0.5 + 0.4573 = 0.9573.
b) P(Z < -0.88) = P(Z > 0.88) = 0.5 - P(0 < Z < 0.88) = 0.5 - 0.3106 = 0.1894.
c) P(1.30 < Z < 1.75) = P(0 < Z < 1.75) - P(0 < Z < 1.30) = 0.4599 - 0.4032 = 0.0567.
d) P(-0.25 < Z < 0.45) = P(-0.25 < Z < 0) + P(0 < Z < 0.45)
= P(0 < Z < 0.25) + P(0 < Z < 0.45) = 0.0987 + 0.1736 = 0.2723.
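Where no printed table is at hand, standard normal probabilities can be computed from the error function in Python's math module. This is a sketch added for illustration: phi is a name chosen here for the cumulative probability P(Z < z), and it uses the identity P(Z < z) = 0.5 · [1 + erf(z/√2)].

```python
from math import erf, sqrt

def phi(z):
    """Cumulative probability P(Z < z) for the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))

p_a = phi(1.72)                 # a) less than 1.72
p_b = phi(-0.88)                # b) less than -0.88
p_c = phi(1.75) - phi(1.30)     # c) between 1.30 and 1.75
p_d = phi(0.45) - phi(-0.25)    # d) between -0.25 and 0.45

print(round(p_a, 4), round(p_b, 4), round(p_c, 4), round(p_d, 4))
```

Note that phi works for negative z as well, so the symmetry argument needed with Table A is handled automatically.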
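Let X ~ N(μ, σ²). Suppose that we want to find the probability P(a < X < b). Standardizing gives
\[ P(a < X < b) = P\left(\frac{a-\mu}{\sigma} < Z < \frac{b-\mu}{\sigma}\right) = P(z_1 < Z < z_2), \quad z_1 = \frac{a-\mu}{\sigma}, \; z_2 = \frac{b-\mu}{\sigma}. \]
Now, we need only to get the readings from the Z-table corresponding to z1 and z2 to get the
required probabilities, as we have done in the preceding example. Similarly,
\[ P(X < b) = P\left(Z < \frac{b-\mu}{\sigma}\right) = P(Z < z_2), \quad \text{and} \quad P(X > a) = P\left(Z > \frac{a-\mu}{\sigma}\right) = P(Z > z_1). \]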
We have seen that a Z- value measures the distance between a particular value of X and the mean
a) P(μ−σ < X < μ+σ); b) P(μ−2σ < X < μ+2σ); c) P(μ−3σ < X < μ+3σ).
a)
\[ P(\mu-\sigma < X < \mu+\sigma) = P\left(\frac{\mu-\sigma-\mu}{\sigma} < Z < \frac{\mu+\sigma-\mu}{\sigma}\right) = P(-1 < Z < 1) = 2P(0 < Z < 1) = 2(0.3413) \quad \text{(See Table A)} \]
= 0.6826 or 68.26%.
b) Similarly, P( μ−2 σ < X < μ+2 σ )=P(−2< Z <2) =2 P(0<Z<2) =2(0.4772) = 0.9544.
a) About 68.3% lies in the region between μ−σ and μ+σ (1 standard deviation on either side).
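The areas within 1, 2 and 3 standard deviations of the mean can be computed without a table. The sketch below assumes only the identity P(-k < Z < k) = erf(k/√2) for a standard normal Z; the function name within is chosen here for illustration.

```python
from math import erf, sqrt

def within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal random variable X."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(within(k), 4))
```

The results differ from the table-based figures only in the fourth decimal place, because Table A entries are themselves rounded to four decimals.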
Notation: Z α denotes the value of Z for which the area to its right is equal to α .
Solution: a)
Z 0 .01 corresponds to an entry of 0.5 - 0.01 = 0.4900.
In Table A, look for the value closest to 0.4900, which is 0.4901, and the Z value for this is
b) Again, Z 0 .05 is obtained as 0.5 - 0.05 = 0.4500, which lies exactly between 0.4495 and
Example 6.12: Suppose that X ~ N(165, 9), where X = the breaking strength of cotton fabric. A
sample is defective if X < 162. Find the probability that a randomly chosen fabric
will be defective.
Solution: Given that μ = 165 and σ² = 9 (so σ = 3),
P(X < 162) = P(Z < (162 − 165)/3) = P(Z < −1) = P(Z > 1)
= 0.5 − P(0 < Z < 1) (By symmetry)
= 0.5 − 0.3413 = 0.1587.
=0.05, 0.025, 0.01, 0.005 and n = 1, 2, 3, …, 30, where $\chi^2_{\alpha,n}$ is such that the area to its right
under the Chi-square curve with n degrees of freedom is equal to α. That is, $\chi^2_{\alpha,n}$ is such
that if X is a random variable having a Chi-square distribution with n degrees of freedom, then
$P(X \ge \chi^2_{\alpha,n}) = \alpha$. α is known as the level of significance. When n is greater than 30, the
table cannot be used and probabilities related to Chi-square distributions are usually
approximated with normal distributions.
[Figure: Chi-square curve with the area α to the right of $\chi^2_{\alpha,\nu}$]
Properties of Chi-square Distribution
1. The exact shape of the distribution depends upon the number of degrees of freedom n. In
general, when n is small, the shape of the curve is skewed to the right and as n gets larger, the
distribution becomes more and more symmetrical.
2. The mean and variance of the χ² distribution are n and 2n respectively.
3. As n → ∞, the χ² distribution approaches a normal distribution.
4. The sum of independent χ² variates is also a χ² variate.
6.5.3 The t-distribution
Let X1, X2, …, Xn be a random sample drawn from a normal distribution having mean μ and
standard deviation σ (unknown but estimated by S, the sample standard deviation).
The statistic
\[ t = \frac{\bar{X} - \mu}{S/\sqrt{n}} \]
has a t-distribution with (n-1) degrees of freedom, where $\bar{X}$ is the sample mean
and S is the sample standard deviation.
In view of its importance, the t distribution has been tabulated extensively. Table B at the end of
Notation: tα, (n-1) stands for the value of t with (n-1) degrees of freedom to the right of which the area
is equal to α, as read from the tabulated values.
[Figure: Student's t Distribution, with areas α to the left of −tα and to the right of tα]
t α , n−1 =
−t α , n−1 .
2. When (n-1) =30 or more, probabilities related to the t distribution are usually
approximated with the use of normal distributions.
a) $t_{0.05} = 1.725$; b) $t_{0.975} = -2.093$; c) $-t_{0.10} = -1.328$;
d) $t_{\alpha/2} = t_{0.005} = 2.861$ and $-t_{0.005} = -2.861$.
Applications of t Distribution
The t distribution has wide applications in Statistics, only some are listed below:
σ² is unknown, the t distribution with n−1 degrees of freedom is used to test the
alternatives:
$\mu \ne \mu_0$, or $\mu > \mu_0$, or $\mu < \mu_0$.
Then, we calculate $t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$, which is to be compared with the table value $t_{\alpha/2}$ or $t_{\alpha}$
with n−1 degrees of freedom.
Note: The assumptions underlying student’s t-distribution for such tests are:
c) The population standard deviation ( σ ) is unknown.
d) n is small; that is n<30.
Example 6.14: In 16 one-hour test runs, the gasoline consumption of an engine averaged 16.4
gallons with a standard deviation of 2.1 gallons. In order to test the claim that the average
gasoline consumption of this engine is 12.0 gallons per hour, calculate the t value and
$t_{\alpha,\,n-1}$ for α = 0.05.
Solution: Substituting n = 16, μ = 12.0, X̄ = 16.4, and S = 2.1 in the formula, we get
$t = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{16.4 - 12.0}{2.1/\sqrt{16}} = 8.38$; and the table value for n − 1 = 15 is $t_{0.05,\,15} = 1.753$.
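The arithmetic of Example 6.14 can be reproduced with a few lines of code. This is only a sketch: the function name t_statistic is hypothetical, and the critical value 1.753 is copied from the t table quoted in the text rather than computed.

```python
import math

def t_statistic(sample_mean, mu0, s, n):
    """One-sample t statistic: (sample mean - hypothesized mean) / (S / sqrt(n))."""
    return (sample_mean - mu0) / (s / math.sqrt(n))

# Example 6.14: n = 16 runs, sample mean 16.4, S = 2.1, claimed mean 12.0
t_value = t_statistic(16.4, 12.0, 2.1, 16)
t_table = 1.753  # t_{0.05, 15}, taken from the table value quoted in the text

print(round(t_value, 2))   # 8.38
print(t_value > t_table)   # True: the calculated t exceeds the table value
```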
between the means of two population means, μ1 −μ 2 =0 , or the equality of two means
I. The parent populations from which the samples have been drawn are normally
distributed;
II. The two population variances are equal, though unknown: $\sigma_1^2 = \sigma_2^2 = \sigma^2$.
III. The two samples are random and independent of each other;
IV. The sample sizes are small: n1 and/or n2 are <30.
c) t-Test of correlation and regression coefficients
In a normal regression and correlation analyses, it is used to test:
b) if the population correlation coefficient is significantly different from zero.
Exercise 6
1. From a lot containing 20 items, of which 5 are defective, 4 are chosen at random. Let X
be the number of defectives found.
3. The amount of bread X ( in hundreds of kg) that a certain bakery is able to sell in a day
is found to be a continuous r-v with a pdf given as below:
$f(x) = \begin{cases} kx, & 0 \le x < 5 \\ k(10 - x), & 5 \le x < 10 \\ 0, & \text{otherwise} \end{cases}$
a) Find k; b) Find the probability that the amount of bread that will be sold tomorrow is
i) More than 500kg, ii) between 250 and 750 kg;
4. Find the value of Z if the area between -Z and Z is a) 0.4038; b) 0.8812; c) 0.3410.
5. The reduction of a person's oxygen consumption during periods of deep meditation may
a) $t_{0.05}$ for n = 13;
b) $t_{0.01}$ for n = 9;
c) $t_{0.995}$ for n = 16;
d) $t_{\alpha/2}$ such that $P(-t_{\alpha/2} < t < t_{\alpha/2}) = 0.95$ for n = 11.
9. The heights of 10 males of a certain locality are found to be 70, 67, 62, 68, 61, 68, 70, 64,
64, and 66 inches. If it is desired to test if the average height is 64 inches, at α =0.05,
a) calculate the t value;
7.1 Basic Concepts
Population:- is the complete collection of individuals, objects or measurements for which
inferences are to be made. The population represents the target of an investigation, and the
objective of the investigation is to draw conclusions about the population and it should be
defined on the basis of the objective of the study by the investigator.
Example 7.1:
All customers of electric supply company.
All students of DMU.
Population of farms having a certain type of natural fertility.
Population of households in a certain village.
Sample:- A sample from a population is the set of measurements that are actually collected in
the course of an investigation. It should be selected using some predefined sampling technique in
such a way that they represent the population very well.
Sampling (elementary) unit:- the ultimate unit to be sampled or elements of the population to
be sampled.
Example 7.2
If somebody studies the economic status of households, the household is the sampling unit.
If one studies performance of freshman students in some college, the student is the
sampling unit.
Sampling frame:- is the list of all elements (sampling units) in a population.
Example 7.3
List of households of a certain city.
List of students in the registrar office of the university.
Parameter and Statistic:- are basic terms in sampling theory. A parameter is a value calculated
from the population. For instance, the population mean, population variance and population
proportion are parameters. A statistic is a value calculated from a sample. The sample mean,
sample variance, sample proportion, etc. are statistics.
Sampling error:- A type of error that may arise due to inappropriate sampling techniques
applied. A sampling error is the difference between a sample statistic and its corresponding
parameter. We can make probabilistic statements about this sampling error only if we have a
probability sample.
Non-sampling error:- In addition to sampling error, the sample estimate may be subject to other
errors, called non-sampling errors. These include errors in observation, interview or measurement
errors, errors due to non-response, and errors in data processing (editing, coding, etc.).
Non-sampling error is likely to increase with an increase in sample size. For instance, a census
survey may have a large amount of non-sampling error.
7.2 Reasons for sampling
Sample survey saves money:- It is possible to collect information from sample households and
obtain estimates that reasonably approximate the actual characteristics of a large population. It
is obviously cheaper to gather information from 100 households rather than from 10,000
households.
Sample survey saves time:- A sample survey requires a smaller scale of operations at all stages
and it reduces data collection and processing time.
Sample survey provides higher level of accuracy:- This accuracy can be achieved through
more selective recruiting of interviewers and supervisors, more extensive training programs, a
closer supervision of the personnel involved and a more efficient monitoring of the field work.
Sample survey could be the only option for the study in some specialized area. For example,
there are some cases where information of technical nature requires highly trained personnel and
specialized equipment like in medical areas.
Experimentation could be destructive in nature like testing industrial products such as testing
the average duration of burning of bulbs, and testing the quality of wine, beer, etc. In this case
sampling is the only feasible means of study.
7.3 Sampling Techniques
The technique of selecting a sample is important in sampling theory and usually it depends upon
the nature of the investigation. The commonly used sampling techniques may be broadly
classified as: Non Probability and Probability Sampling.
A. Random Sampling or probability sampling.
Probability sampling is a method of sampling in which every element in the population
has a pre-assigned probability of being included in the sample.
In this sub-section, four different techniques of taking a random sample are discussed.
a/ Simple random sampling
b/ Stratified random sampling
c/ Cluster sampling
d/ Systematic sampling
finite population of size N units have the same probability of selection. There are ${}^{N}C_{n}$
distinct possible samples in the case of sampling without replacement; the chance of selecting
each one of them is $1/{}^{N}C_{n}$. There are $N^n$ possible samples in the case of sampling with
replacement; the chance of selecting each one of them is $1/N^n$. Conceptually, simple random
sampling is the simplest of the probability sampling techniques. It requires a complete sampling
frame, which may not be available or feasible to construct for large populations. Even if a
complete frame is available, more efficient approaches may be possible if other useful
information is available about the units in the population.
Simple random sampling is free of classification error, and it requires minimum advance
knowledge of the population. It best suits situations where the population is fairly homogeneous
and not much information is available about the population. If these conditions are not true, some
other types of sampling techniques may be a better choice. Lottery method and computer
generated random numbers are used to select a random sample in simple random sampling:
i) Lottery method: This is a very common method of taking a random sample. Under this
method, we label each member of the population with an identifiable ticket or piece of paper.
Tickets must be of identical size, color and shape. They are placed in a container and well
mixed before each draw, and draws are continued until a sample of the required size is
selected. This ensures that the selection of items depends entirely on chance.
Example 7.4: If we want to take a sample of 25 persons out of a population of 150, the
procedure is to write the names of all the 150 persons on separate slips of papers, fold these slips,
mix them thoroughly and then make a blindfold selection of 25 slips without replacement.
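The lottery draw of Example 7.4 can be imitated on a computer. The following is a minimal sketch, assuming the 150 persons are labelled 1 to 150; the function name lottery_sample and the seed value are illustrative choices, not part of the original notes.

```python
import random

def lottery_sample(population, n, seed=None):
    """Draw n distinct units at random (without replacement), like mixing slips in a box."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(population, n)

# Example 7.4: select 25 persons out of a population of 150
persons = list(range(1, 151))            # labels 1, 2, ..., 150 stand in for the names
chosen = lottery_sample(persons, 25, seed=1)
print(sorted(chosen))
```

Because random.sample draws without replacement, no person can appear twice in the sample, just as a slip that has been drawn is not returned to the container.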
This is an alternative method of selecting a simple random sample. It is constructed from the
digits 0, 1, 2,…, 9. There are several tables available in standard books of Statistics.
Column
Row 1 2 3 4 5 6 7 8
5 56149 55678 38169 47228 49931 94303 67448 31286
Example 7.5: Suppose that N = 40 and we want to select n = 10 units without replacement, starting
with the 3rd row and 2nd column of the above random number table and reading vertically.
Solution: Starting with the 3rd row and 2nd column and reading vertically, we get:
15, 26, 19, 08, 24, 35, 16, 38, 12 and 17.
represent the population size in the ith strata. Then a sample is drawn from each stratum
independently, the sample size within the ith stratum being ni (i=1,2 ,… , k) such that
Remarks:
In stratified random sampling, the following two points are equally important to ensure
accuracy.
to be included in the sample. It is however, much more efficient and much less expensive to do.
Suppose that we have a complete and up-to-date list of the N units in the population numbered
from 1 to N in some order. To select a sample of size n, if N is an integral multiple of n, N = kn
for some integer k, k = population size / sample size = N/n.
The procedure starts by determining the first element to be included in the sample: select a unit i
at random from the first k units (1 ≤ i ≤ k). The second unit is then the (i+k)th element
of the frame. In total, the ith, (i+k)th, (i+2k)th, …, (i+(n−1)k)th elements of the population
form a sample of size n from the population of size N.
Example 7.6: Suppose that N = 20 and we want to select a sample of size 4, so that k = N/n
= 20/4 = 5. The first element in the sample is selected from the first 5 units randomly, say the 3rd,
which is the random start. Then, every 5th unit is selected, and the sample contains the 3rd, 8th, 13th
and 18th units of the population.
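The systematic selection rule described above can be sketched in code as follows. The function name systematic_sample and its arguments are hypothetical; the sketch assumes, as in the text, that N is an integral multiple of n.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the positions i, i+k, ..., i+(n-1)k of a systematic sample, with k = N/n."""
    if N % n != 0:
        raise ValueError("this sketch assumes N is an integral multiple of n")
    k = N // n                                  # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, k)   # random start among the first k units
    if not 1 <= start <= k:
        raise ValueError("the random start must lie between 1 and k")
    return [start + j * k for j in range(n)]

# Example 7.6: N = 20, n = 4, so k = 5; with random start 3
print(systematic_sample(20, 4, start=3))  # [3, 8, 13, 18]
```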
B. Non-Random Sampling or non-probability sampling.
It is a sampling technique in which the choice of individuals for a sample is based on
convenience, personal choice or interest.
Types of non-random sampling are:
1. Judgment sampling.
2. Convenience sampling
3. Quota Sampling.
1. Judgment Sampling
In this case, the person taking the sample has direct or indirect control over which items are
selected for the sample. This method is mainly used for opinion surveys but is not recommended
for general use, as it is subject to the bias of the sampler.
2. Convenience Sampling
In this method, the decision maker selects a sample from the population in a manner that is
relatively easy and convenient.
3. Quota Sampling
This is a type of judgment sampling and may be the most commonly used one in the
non-probability category. In a quota sample, quotas are set up according to some specified
characteristics such as income groups, age groups, political or religious groups, etc. Within the
quota, the selection of sampling units depends upon personal judgment.
7.4 Sampling Distribution of the sample mean
Consider all possible samples of size n that can be drawn from a given population (either with or
without replacement). For each sample, we can compute a statistic (such as the mean & the
standard deviation) that will vary from sample to sample. In this manner we obtain a distribution
of the statistic that is called its sampling distribution.
Steps for the construction of Sampling Distribution of the mean
1. From a finite population of size N, randomly draw all possible samples of size n. There are $N^n$
possible samples if sampling is with replacement and there are ${}^{N}C_{n}$ possible samples if
sampling is without replacement.
2. Calculate the mean for each sample.
3. Summarize the mean obtained in step 2 in terms of frequency distribution
Example 7.7: Suppose we have a population of size 5, consisting of the age of five children 3, 5,
7, 9, and 11. Population mean is 7 and population variance is 8. (Consider sampling without
replacement).
Take samples of size 2 and construct sampling distribution of the sample mean.
Solution:
Step 1: N = 5, n = 2; we have ${}^{5}C_{2} = 10$ possible samples.
(3,5), (3,7), (3,9), (3,11), (5,7), (5,9), (5,11), (7,9), (7,11) and (9, 11)
Step 2: Calculate the sample mean for each sample:
Means = 4, 5,6,7,6,7,8,8,9,10 respectively.
Step 3: Summarize the mean obtained in step 2 in terms of frequency distribution.
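x̄_i               4   5   6    7    8    9   10   Total
f_i               1   1   2    2    2    1   1    10
x̄_i f_i           4   5   12   14   16   9   10   70
f_i (x̄_i − 7)²    9   4   2    0    2    4   9    30
a) Mean of sample means, $E(\bar{X}) = \frac{\sum \bar{x}_i f_i}{\sum f_i} = 70/10 = 7$
b) Variance of sample means, $V(\bar{X}) = \frac{\sum f_i (\bar{x}_i - 7)^2}{\sum f_i} = 30/10 = 3$, which agrees with
$V(\bar{X}) = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right) = \frac{8}{2}\left(\frac{5-2}{5-1}\right) = 3$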
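The three steps of Example 7.7 can be carried out by brute-force enumeration. The sketch below is illustrative only (the variable names are not from the original notes); it lists all samples of size 2 drawn without replacement and compares the variance of the sample means with the finite population correction formula. Exact fractions are used to avoid rounding.

```python
from itertools import combinations
from fractions import Fraction

ages = [3, 5, 7, 9, 11]   # population of Example 7.7
n = 2
N = len(ages)

# Steps 1 and 2: all samples without replacement and their means
means = [Fraction(sum(s), n) for s in combinations(ages, n)]

# Step 3: mean and variance of the sampling distribution of the mean
mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)

# Population parameters and the variance formula with the finite population correction
pop_mean = Fraction(sum(ages), N)
pop_var = sum((x - pop_mean) ** 2 for x in ages) / N
fpc_formula = pop_var / n * Fraction(N - n, N - 1)

print(len(means), mean_of_means, var_of_means, fpc_formula)  # 10 7 3 3
```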
Example 7.8
Three students have taken a class test which is marked out of 10. We want to estimate the mean
mark using the sample mean as the estimate of the population mean. We take a sample of size 2
in two cases and suppose the marks of the three students are 1, 2 and 6.
The population mean μ is (1+2+6)/3 = 3
The population variance $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{(1-3)^2 + (2-3)^2 + (6-3)^2}{3} = 14/3$
The sample mean is a random variable, and we see that it can take three possible values. We can
now write down its probability distribution as follows
x̄_i              1.5    3.5    4     Total
P(X̄ = x̄_i)       1/3    1/3    1/3   1
(x̄_i − 3)²       2.25   0.25   1     3.5
In the case of sampling without replacement,
$V(\bar{X}) = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right) = \frac{14/3}{2}\left(\frac{3-2}{3-1}\right) = 14/12 = 1.17$.
ii) Sampling with replacement
In this type of sampling an observation has a chance to be selected at each draw.
Suppose that we take the sample with replacement; there are $3^2 = 9$ possible samples.
Sample        (1,1)  (1,2)  (1,6)  (2,1)  (2,2)  (2,6)  (6,1)  (6,2)  (6,6)
Sample mean   1      1.5    3.5    1.5    2      4      3.5    4      6
The sample mean is a random variable and its probability distribution is:
x̄_i               1     1.5   2     3.5   4     6     Total
P(X̄ = x̄_i)        1/9   2/9   1/9   2/9   2/9   1/9   1
x̄_i P(X̄ = x̄_i)    1/9   1/3   2/9   7/9   8/9   6/9   3
f_i (x̄_i − 3)²    4     4.5   1     0.50  2     9     21
In the case of sampling with replacement, $V(\bar{X}) = \sigma^2_{\bar{x}} = \frac{\sigma^2}{n} = \frac{14/3}{2} = 14/6 = 2.33$.
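The with-replacement case of Example 7.8 can be verified in the same spirit. This is a sketch under the assumption that each of the 9 ordered samples is equally likely; the variable names are illustrative.

```python
from itertools import product
from fractions import Fraction
from collections import Counter

marks = [1, 2, 6]   # population of Example 7.8
n = 2

# All 3^2 = 9 ordered samples drawn with replacement
samples = list(product(marks, repeat=n))

# Probability distribution of the sample mean
counts = Counter(Fraction(sum(s), n) for s in samples)
dist = {m: Fraction(c, len(samples)) for m, c in counts.items()}

expected = sum(m * p for m, p in dist.items())
variance = sum((m - expected) ** 2 * p for m, p in dist.items())

print(len(samples), expected, variance)  # 9 3 7/3
```

The variance 7/3 equals 14/6 ≈ 2.33, that is σ²/n with σ² = 14/3 and n = 2.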
In each case the expected value of the sample mean equals the population mean. This explains
why the sample mean is a good estimate of the population mean. If we use the sample mean as
an estimate of the population mean we will sometimes overestimate it, and sometimes under-
estimate it, but “on average” we will be accurate.
The example above illustrates an important result:
Remark:
1. Mean of sample means = $E(\bar{X}) = \frac{\sum \bar{x}_i f_i}{\sum f_i} = \sum \bar{x}_i P(\bar{X} = \bar{x}_i)$ = population mean.
2. Variance of sample means, $V(\bar{X}) = \sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$ (if sampling is with replacement).
3. Variance of sample means, $V(\bar{X}) = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right)$ (if sampling is without replacement).
The quantity $\frac{N-n}{N-1}$ is the finite population correction (fpc), and if n/N < 0.05, the fpc is ignored.
Note: the square root of the variance of sample means is known as the standard error.
The distribution of sample means depends on distribution of the population, sample size and
whether population variance is known or unknown. A sample may be from a normally
distributed population or from a non-normally distributed population, from a population with
variance is known or unknown and the sample size may be large or small.
Case-I: If sampling is from a normally distributed population with known variance: When
sampling is from a normally distributed population with known variance, the distribution of
$P(\bar{X} > 70) = P\left(Z > \frac{70 - 68}{\sqrt{0.56}}\right) = P(Z > 2.67) = 0.0038$
Case-II: When sampling from a non normal population and when the sample size is large
If sampling is from a non normal population and when the sample size is large the distribution of
X̄ depends on Central Limit Theorem.
The Central Limit Theorem
If X1, X2, …, Xn is a random sample from a population with mean μ and variance σ², then as
n goes to infinity the distribution of the sample mean, X̄, approaches a normal distribution
with mean μ and variance σ²/n. In short, as n gets large, $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
We can standardize this to get $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$ (approximately as n gets large). When the
population variance is unknown, $Z = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim N(0, 1)$ (approximately as n gets large).
Example 7.10: The mean weight of 500 male students at a certain university is 151 pounds (lb)
and the standard deviation is 15 lb. Assume that the weights are normally distributed. Suppose
that a sample of 64 students is taken; what is the probability that the mean weight in the sample
is more than 154.75 lb?
Solution
As we have taken a large (n = 64) sample we can use the Central Limit Theorem. This says that
the mean weight of the sample can be approximated by a normal random variable with a mean of
151 and a variance of 225/64 ≈ 3.52. If we let X̄ be the mean weight of the students in the
sample, it is required to find
$P(\bar{X} > 154.75) = P\left(Z > \frac{154.75 - 151}{15/\sqrt{64}}\right) = P(Z > 2.00) = 0.5 - 0.4772 = 0.0228$
Solution
We have n = 150 which is large enough to use the Central Limit Theorem. Mean =7.50 and
standard deviation = 3.40.
Let X̄ be the average amount of an individual's expenditure during the day. Then
X̄ ~ N(7.50, 0.077) approximately, and it is required to find P(X̄ > 8.00):
$P(\bar{X} > 8.00) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{8.00 - \mu}{\sigma/\sqrt{n}}\right) = P\left(Z > \frac{8.00 - 7.50}{3.40/\sqrt{150}}\right) = P(Z > 1.80) = 0.5 - P(0 < Z < 1.80)$
= 0.5 − 0.4641 = 0.0359
This means that there is only a 0.0359 probability that the average expenditure in the sample
will be more than 8.00 birr.
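The Central Limit Theorem calculation above can be reproduced numerically. The sketch below assumes the values given in the solution (mean 7.50, standard deviation 3.40, n = 150); the helper name upper_tail is hypothetical. The z value is rounded to two decimals only to mimic reading a standard normal table.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

mu, sigma, n = 7.50, 3.40, 150
standard_error = sigma / math.sqrt(n)          # standard deviation of the sample mean
z = round((8.00 - mu) / standard_error, 2)     # rounded as in a table lookup
p = upper_tail(z)

print(z)             # 1.8
print(round(p, 4))   # about 0.0359
```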
Case-III: When sampling is from normally distributed population with unknown population
variance,
a) If the sample size is large, $Z = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim N(0, 1)$, where S is an estimate of σ.
b) If the sample size is small (n < 30), $t = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n-1)$; that is, t has a t-distribution with
(n−1) degrees of freedom, where S is an estimate of σ.
7.5 Sampling Distribution of the sample Proportion
In situations where it is not possible to measure the characteristic under study, but it is possible
to classify the whole population into various categories with respect to the attributes they possess,
consideration is usually given to estimating the proportion of population elements that belong to a
defined category or class. Suppose that we have two complementary and mutually exclusive classes,
C and C', such that every unit in the population falls into one of them.
In order to know how many of the units fall in class C, we define a counting variable as
Given a simple random sample of n units, the sample proportion is denoted by
$p = \frac{a}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$. From
the formula, we see that X̄ and p are essentially identical. In fact, p is a special case of X̄, the
case where the possible values of Xi are only 0 and 1. Consequently, p possesses all the properties of
X̄.
p is an estimate of P, with variance
σ 2 N−n N
b) Standard error of p is√ ❑ = √ ❑ = 0.0434.
Exercise 7
1. What is the difference between a statistic and a parameter?
2. What is a sampling frame?
3. How do you select a simple random sample?
4. A population consists of the four numbers, 3,7,11 and 15. Consider all possible samples of
size 2 drawn from this population with replacement.
Find a) population mean
b) Population variance
c) the sampling distribution of sample means
d) the mean of sample means
e ) the standard deviation of the sample means.
5. Solve problem 4 if the sampling is without replacement.
6. An electrical firm manufactures light bulbs whose length of life is normally
distributed with population mean 800 hrs and standard deviation 40 hrs. Find the probability that
a bulb burns:
a) Between 778 and 834 hrs
b) Greater than 834 hrs?
7. The amount of sulphur in a daily emission from a factory has a normal distribution with mean
of 134 pounds and a standard deviation of 22 pounds. For a day selected randomly, find the
probability that the mean amount of sulphur emission will be less than 130 pounds.
8. A population consists of the five numbers 3, 7, 11, 13 and 15. Consider all possible samples of
size 2 drawn from this population without replacement.
Find: a) the sampling distribution of sample means
b) The mean of sample means
c) The standard deviation of the sample means.
CHAPTER EIGHT: STATISTICAL INFERENCES
The process of inferring information about a population from a sample is known as statistical
inference. This chapter has two major parts. The first part, statistical estimation, discusses the
method of estimating a population parameter by using a statistic (point estimation). It also explains
the concept of the confidence interval. The second part, hypothesis testing, describes the different
techniques of testing a given tentative assumption by applying an appropriate test statistic.
Objectives:
After completing this unit, the student should be able to
Explain the concepts of statistical estimation and the confidence interval.
Distinguish interval estimation from point estimation.
Calculate and interpret point estimate of population mean and population proportion.
Define the concept of hypothesis testing and differentiate types of tests.
List down the basic steps in hypothesis testing.
Follow the steps to solve problems on hypothesis testing.
Identify the appropriate test statistics for a given practical problem.
8.1 Statistical Estimation
It is the procedure of using a sample statistic to estimate a population parameter. This is one way
of making inference about the population parameter where the investigator does not have any
prior notion about values or characteristics of the population parameter. A statistic used to
estimate a parameter is called an estimator and the value taken by the estimator is called an
estimate. Statistical estimation is divided into two main categories: Point Estimation and Interval
Estimation.
Point Estimation:- When we use a single value of a statistic to estimate the corresponding
parameter of a population, it is called point estimation. It is a common way of estimating a
parameter, where a random sample of n observations is selected from a population and the
statistic is calculated.
Examples:
The sample mean is an estimate of the population mean μ.
The sample variance is an estimate of the population variance σ².
The sample proportion is an estimate of the population proportion.
Properties of best estimator
The following are some qualities of an estimator.
It should be unbiased.
It should be consistent.
It should be relatively efficient.
To explain these properties let θ^ be an estimator of θ.
1. Unbiased Estimator: An estimator whose expected value is the value of the parameter
being estimated. i.e., E(θ^ ) = θ.
2. Consistent Estimator: An estimator which gets closer to the value of the parameter as
the sample size increases. i.e., θ^ gets closer to θ as the sample size increases.
3. Relatively Efficient Estimator: The estimator for a parameter with the smallest variance.
This actually compares two or more estimators for one parameter.
Interval estimation: It is unlikely that any particular estimate will be exactly equal to the
population mean; an estimate can be greater than or less than the parameter. That is, it is
not always possible to estimate a population parameter without any error, so allowance is needed
for such error. We take an interval, a range of values about an estimate, in which the parameter may
lie. This procedure is interval estimation. It is the procedure that results in an interval of values
for a parameter. Interval estimates indicate the precision or accuracy of an estimate and are,
therefore, preferable to point estimates. It deals with identifying the upper and lower limits of a
parameter. Confidence interval for the parameter is:
Estimate ± critical value × Standard error of the estimator
Example 8.1:: Confidence interval for the population mean is:
X́ ± Critical value × Standard error of ( X́ )
8.1.1 Confidence interval Estimation for population means
Although X́ possesses nearly all the qualities of a good estimator, because of sampling error, we
know that it's not likely that our sample statistic will be equal to the population parameter, but
instead will fall into an interval of values. We will have to be satisfied knowing that the statistic
is "close to" the parameter. That leads to the obvious question, what is "close"?
We can phrase the latter question differently: How confident can we be that the value of the
statistic falls within a certain "distance" of the parameter? Or, what is the probability that the
parameter's value is within a certain range of the statistic's value? This range is the confidence
interval.
The confidence level is the probability that the value of the parameter falls within the range
specified by the confidence interval surrounding the statistic. There are different cases to be
considered to construct confidence intervals.
$\bar{X} \sim N(\mu, \sigma^2/n)$. We can standardize this to get $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$.
From the standard normal distribution, we have
$P(-Z_{\alpha/2} < Z < Z_{\alpha/2}) = 1 - \alpha$
where α is the risk probability and 1 − α is the confidence level. The confidence level is the
probability that the value of the parameter falls within the range specified by the confidence
interval surrounding the statistic. σ/√n is the standard error of the statistic. The standard error
is the square root of the variance, where Var(X̄) = σ²/n.
Using the standardized form of the sampling distribution of the sample mean in the above
probability statement, we get the limits of the confidence interval as follows:
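$P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha$
$P\left(-Z_{\alpha/2}\,\sigma/\sqrt{n} < \bar{X} - \mu < Z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$
$P\left(-Z_{\alpha/2}\,\sigma/\sqrt{n} - \bar{X} < -\mu < -\bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$
Multiplying through by −1 reverses the inequalities and gives
$P\left(\bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$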
Here are the Z values corresponding to the most commonly used confidence levels.
(1−α)100%   α      α/2     Z_{α/2}
90          0.10   0.05    1.645
95          0.05   0.025   1.96
99          0.01   0.005   2.58
Example 8.2: The weights of full boxes of a certain kind of cereal are normally distributed
with a standard deviation of 0.27 ounce. If a sample of 15 randomly selected boxes produced a
mean weight of 9.87 ounce, find:
a) The 95% confidence interval for the true mean weight of boxes of this cereal,
b) The 99% confidence interval for the true mean weight of boxes of this cereal,
c) What effect does the increase in the level of confidence have on the width of the
interval?
Solution:
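where $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$.
Substituting these values in $\bar{x} - Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$, the resulting
confidence interval is (9.73, 10.01).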
c) The increase in the confidence level widens the length of the confidence interval.
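The effect of the confidence level on the interval width in Example 8.2 can be illustrated numerically. This is a sketch only: the function name z_interval is hypothetical, and the critical values 1.96 and 2.58 are taken from the table of Z values given earlier in this section.

```python
import math

def z_interval(xbar, sigma, n, z_crit):
    """Confidence interval xbar ± z_crit * sigma / sqrt(n) for a known sigma."""
    margin = z_crit * sigma / math.sqrt(n)
    return (xbar - margin, xbar + margin)

# Example 8.2: n = 15 boxes, sample mean 9.87 ounces, sigma = 0.27 ounce
ci95 = z_interval(9.87, 0.27, 15, 1.96)
ci99 = z_interval(9.87, 0.27, 15, 2.58)

print([round(v, 2) for v in ci95])  # [9.73, 10.01]
print([round(v, 2) for v in ci99])  # [9.69, 10.05]
```

The 99% interval is wider than the 95% interval, which is the point of part c).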
Case-II: When sampling from a non normal population and when the sample size is large the
distribution of X́ depends on Central Limit Theorem (with known and unknown population
variance).
Recall the Central Limit Theorem, which applies to the sampling distribution of the mean of a
sample. Consider samples of size n drawn from a population, whose mean is μ and standard
deviation is σ. The population can have any frequency distribution. The sampling distribution of
X̄ will have a mean μ and standard deviation $\frac{\sigma}{\sqrt{n}}$. The sampling distribution of X̄ is normal
with a mean μ and variance $\frac{\sigma^2}{n}$ as n gets large. That is, $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ (as n gets large). We can
standardize this to get $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$, or $Z = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim N(0,1)$ when σ is unknown.
A (1-α) 100% confidence interval for population mean (μ) is
$\left(\bar{X} - Z_{\alpha/2}\,\sigma/\sqrt{n},\; \bar{X} + Z_{\alpha/2}\,\sigma/\sqrt{n}\right)$ if σ² is known and
$\left(\bar{X} - Z_{\alpha/2}\,S/\sqrt{n},\; \bar{X} + Z_{\alpha/2}\,S/\sqrt{n}\right)$ if σ² is unknown.
Example 8.3: An economist wants to estimate the average amount in checking accounts at banks
in given region. A random sample of 100 accounts gives X́ =$ 357.60 and S= $140.00. Give a
95% confidence interval for μ, the average amount in any checking account at a bank in the
given region.
Solution
Given: n = 100, X́ =$ 357.60, S= $140.00 & α = 0.05
A 95% confidence interval for population mean (μ) is
$\left(\bar{X} - Z_{\alpha/2}\,S/\sqrt{n},\; \bar{X} + Z_{\alpha/2}\,S/\sqrt{n}\right)$ … since n is large and σ² is unknown
$= \left(357.60 - 1.96(140.00/\sqrt{100}),\; 357.60 + 1.96(140.00/\sqrt{100})\right)$
$= (330.16, 385.04)$.
Case-III: When sampling is from normally distributed population with unknown population
variance and when the sample size is small (n<30).
When the population variance σ² is unknown, we estimate it by the sample variance. The standardized
sample mean, $t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$, has a t-distribution with (n−1) degrees of freedom.
From this distribution, a (1−α)100% confidence interval for the population mean is
$\left(\bar{X} - t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}},\; \bar{X} + t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}}\right)$.
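Example 8.4: From a normal sample of size 25, a mean of 32 was found. Given that the standard
deviation is 4.2, find
a) a 95% confidence interval for the population mean;
b) a 99% confidence interval for the population mean.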
Solution:
a) Given: n = 25, X̄ = 32, S = 4.2, 1 − α = 0.95 ⟹ α = 0.05, α/2 = 0.025
⟹ $t_{\alpha/2,\,24} = 2.064$ from the table.
⟹ The required interval will be $\left(\bar{X} - t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}},\; \bar{X} + t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}}\right)$
$= 32 \pm 2.064 \times \frac{4.2}{\sqrt{25}}$
= 32 ± 1.73
= (30.27, 33.73)
b) Given: n = 25, X̄ = 32, S = 4.2, 1 − α = 0.99 ⟹ α = 0.01, α/2 = 0.005
⟹ $t_{\alpha/2,\,24} = 2.797$ from the table.
⟹ The required interval will be $\left(\bar{X} - t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}},\; \bar{X} + t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}}\right)$
$= 32 \pm 2.797 \times \frac{4.2}{\sqrt{25}}$
= 32 ± 2.35
= (29.65, 34.35)
8.1.2 Sample size determination in estimation of population mean
In the process of estimating population mean μ using the sample mean with absolute margin of
error (d) and risk probability α, the sample size is given by:
$n = \left[\frac{Z_{\alpha/2}\,\sigma}{d}\right]^2$
where $|\bar{X} - \mu| = d$
Example 8.5: To determine the average amount of time students take to get from one class to the
next, how large a sample is needed with probability 0.95 that the error will be at most 0.25
minutes, if σ is known from past experience to be 1.50 minutes?
Solution: Using $Z_{0.025} = 1.96$, and substituting d = 0.25 and σ = 1.50 in the formula for n,
we get n = 138.30 ≈ 139 (always rounded up to the next integer) as the sample size required for the estimate.
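The sample size formula is easy to evaluate in code. The following sketch assumes the formula n = (Z σ / d)² from this section, with rounding up to the next integer; the function name sample_size_for_mean is hypothetical.

```python
import math

def sample_size_for_mean(z_crit, sigma, d):
    """Smallest integer n with n >= (z_crit * sigma / d)^2, for margin of error d."""
    return math.ceil((z_crit * sigma / d) ** 2)

# Example 8.5: 95% confidence (Z = 1.96), sigma = 1.50 minutes, error at most 0.25 minutes
n_required = sample_size_for_mean(1.96, 1.50, 0.25)
print(n_required)  # 139
```

Rounding up rather than to the nearest integer guarantees that the margin of error does not exceed d.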
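estimate of variance of sample proportion is $Var(\hat{p}) = \frac{pq}{n-1}$; for large samples, $Var(\hat{p}) = \frac{pq}{n}$.
The confidence interval for the population proportion is then
$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$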
Example 8.6: The Human Resource director of a large organization wanted to know what
proportion of all persons who had ever been interviewed for a job with his organization had been
hired. He was willing to settle for 95% confidence interval. A random sample of 500 interview
records revealed that 76 or 0.152 of the persons in the sample had been hired.
Solution:
$\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.152 \pm 1.96\sqrt{\frac{0.152 \times 0.848}{500}} = 0.152 \pm 0.031$
= (0.121, 0.183)
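The interval of Example 8.6 can be checked with a short calculation. This is a sketch under the large-sample formula used above; the function name proportion_interval is hypothetical, and 1.96 is the table value for 95% confidence.

```python
import math

def proportion_interval(successes, n, z_crit):
    """Return the sample proportion and the margin z_crit * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    margin = z_crit * math.sqrt(p_hat * q_hat / n)
    return p_hat, margin

# Example 8.6: 76 of 500 interviewed persons were hired
p_hat, margin = proportion_interval(76, 500, 1.96)
print(round(p_hat, 3), round(margin, 3))  # 0.152 0.031
```

Using the rounded margin 0.031, as in the text, gives the interval 0.152 ± 0.031 = (0.121, 0.183).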
In section 8.1, we have studied how to make estimations of the mean using point and interval
estimations. The other aspect of statistical inference is known as statistical test of hypothesis.
The branch of statistics which helps us in arriving at the criterion for deciding about the
characteristics of the population, a parameter, based on the information obtained from the
sample data is known as testing of hypothesis. We shall use the theoretical results presented for
the interval estimation, and hence, a test of hypothesis is highly connected with the theory of
estimation we studied before.
In this section, basically we will deal with testing hypotheses about population mean and
population proportion. While doing so, we shall define some important terminologies which we
may face and the errors we are committing in the process. We shall employ the standard normal
distribution (or Z-test) and the t-distribution (or t-test), depending upon the nature of the
population sampled and the sample.
A statistical test of hypothesis can lead to two kinds of errors. If the test rejects Ho when
it is true, the error is a type I error. If the test accepts Ho when it is false, the error is a type II error.
The following table gives a summary of possible results of any hypothesis testing procedure:
Decision       Ho is true          Ho is false
Reject Ho      Type I error        Correct decision
Accept Ho      Correct decision    Type II error
Type I error is the error committed in rejecting the null hypothesis when it is true. Probability of
committing type I error is sometimes called level of significance and denoted by α.
Type II error is the error committed in accepting the null hypothesis when it is false. Probability
of committing type II error is denoted by β.
In both types of errors, a wrong decision has occurred. An ideal test procedure is one which is so
planned as to safeguard against both these errors. However, in practical situations, for a fixed
sample size, an attempt to reduce one of the errors increases the other. In view of this dilemma and
the fact that wrong rejection of Ho is a more serious error, we will hold α at a predetermined low
level, such as 0.1, 0.05, or 0.01, when choosing a rejection region. The level of significance 5%
(α = 0.05) implies that in 5 samples out of 100 we are likely to reject a correct H0. In other words,
when H0 is true, the test will correctly fail to reject it in about 95% of samples.
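The meaning of the 5% level of significance can be illustrated by simulation. The sketch below is only an illustration under assumed values (μ0 = 10, σ = 3.2, n = 16, 2000 repetitions and the seed are arbitrary choices, not from the notes): samples are drawn from a population in which H0 is true, and the proportion of two sided Z tests that wrongly reject H0 should be close to α.

```python
import math
import random
from statistics import NormalDist, mean

def type_one_error_rate(mu0, sigma, n, alpha, trials, seed):
    """Fraction of samples, drawn with H0 true, for which a two sided
    Z test at level alpha rejects H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        zc = (mean(sample) - mu0) / (sigma / math.sqrt(n))
        if abs(zc) > z_crit:
            rejections += 1
    return rejections / trials

rate = type_one_error_rate(mu0=10, sigma=3.2, n=16, alpha=0.05,
                           trials=2000, seed=1)
print(rate)  # expected to be near 0.05
```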
General steps in hypothesis testing on population mean, μ
Step-1 The first step in hypothesis testing is to specify the null hypothesis (H0) and the
alternative hypothesis (H1). Suppose the assumed or hypothesized value of μ is denoted by μo,
then one can formulate two sided and one sided hypothesis as follows:
1. Ho: μ = μo versus H1: μ ≠ μo (two sided test)
2. Ho: μ = μo versus H1: μ < μo (one sided test)
3. Ho: μ = μo versus H1: μ > μo (one sided test)
Step-2: Specify a significance level of α.
Step-3 We should identify the sampling distribution of the estimator and the test statistic.
Case-I: Population variance (σ²) is known and the parent population is normal.
The test statistic is
$$ Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1) $$
Case-II: When sampling from a non-normal population and when the sample size is large, the
distribution of $\bar{X}$ depends on the Central Limit Theorem (with known and unknown variance).
a) With known variance, the test statistic is:
$$ Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1) $$
b) With unknown variance, the test statistic is:
$$ Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim N(0, 1) $$
Case-III: When sampling is from a normally distributed population with unknown population
variance.
i) When the sample size is large:
$$ Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim N(0, 1) $$
ii) When the sample size is small (n < 30):
$$ t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim t(n-1) $$
Step-4: The value of the test statistic can be calculated as follows:
a) $Z_c = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ with known variance.
b) $Z_c = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$ with unknown variance and large sample size.
c) $t_c = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$ with unknown variance and small sample size.
where $\bar{X}$ is the sample mean and μo the parameter value specified by the null hypothesis.
Step-5: Identify the critical (rejection) region or put the decision rule.
a) For a two sided test Ho: μ = μo versus H1: μ ≠ μo, reject Ho if
$Z_c > Z_{\alpha/2}$ or $Z_c < -Z_{\alpha/2}$.
Note: Zc refers to Zcalculated.
[Figure: standard normal curve with rejection regions of area α/2 in each tail, below $-Z_{\alpha/2}$ and above $Z_{\alpha/2}$, and the acceptance region between them.]
b) For a one sided test (right sided test) Ho: μ = μo versus H1: μ > μo, reject Ho if $Z_c > Z_{\alpha}$.
[Figure: standard normal curve with the rejection region in the right tail, above $Z_{\alpha}$.]
c) For a one sided test (left sided test) Ho: μ = μo versus H1: μ < μo, reject Ho if $Z_c < -Z_{\alpha}$.
[Figure: standard normal curve with the rejection region in the left tail, below $-Z_{\alpha}$.]
Decision Table
To test H0: μ = μ0 against the three alternatives, the rules are summarized as:
Alternative hypothesis    Reject H0 if
H1: μ ≠ μ0                Zc > Zα/2 or Zc < −Zα/2
H1: μ > μ0                Zc > Zα
H1: μ < μ0                Zc < −Zα
Example 8.7: Test at α = 0.05 whether the mean of a random sample of size n = 16 is
"significantly less than 10" if the distribution from which the sample was taken is normal,
x̄ = 8.4 and σ = 3.2 (known).
Solution:
* H0: μ = 10 versus H1: μ < 10 (left sided test)
* $Z_{\alpha} = Z_{0.05} = 1.645$ (critical value)
* $Z_c = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{8.4 - 10}{3.2/4} = -2$ (calculated value)
* Since $Z_c = -2 < -Z_{\alpha} = -1.645$, the null hypothesis is rejected. That is, the population
mean is significantly less than 10 at the 5% level of significance.
Example 8.8: Based upon a random sample of size 100 with an average of 3.4 minutes and a
standard deviation of 2.8 minutes, is the claim that the average telephone call is 4 minutes true
with a confidence of 95%?
Solution: Given: n = 100, x̄ = 3.4 min, s = 2.8 min, α = 0.05
To test: H0: μ = 4 versus HA: μ ≠ 4
Since σ is unknown, the test statistic follows a t-distribution; however, since n = 100 is large, the
Z-statistic is used.
$$ Z_c = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{3.4 - 4}{2.8/10} = -2.14 $$
Since the calculated value is less than the tabulated value $-Z_{0.025} = -1.96$ (−2.14 < −1.96), the
null hypothesis is rejected. Therefore the average telephone call is significantly different from
4 minutes at α = 0.05.
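The Z tests of Examples 8.7 and 8.8 follow the same five steps, which can be sketched in Python as below. The function `z_test` and its `alternative` argument are hypothetical names introduced only for this illustration; `sd` stands for σ when the variance is known and for S when a large sample is used.

```python
import math
from statistics import NormalDist

def z_test(xbar, mu0, sd, n, alpha, alternative):
    """One-sample Z test. alternative is 'two-sided', 'less' or 'greater'.
    Returns (calculated Z, critical value, reject H0?)."""
    zc = (xbar - mu0) / (sd / math.sqrt(n))
    std_normal = NormalDist()
    if alternative == "two-sided":
        crit = std_normal.inv_cdf(1 - alpha / 2)   # Z_{alpha/2}
        reject = abs(zc) > crit
    elif alternative == "less":
        crit = std_normal.inv_cdf(1 - alpha)       # Z_{alpha}
        reject = zc < -crit
    elif alternative == "greater":
        crit = std_normal.inv_cdf(1 - alpha)
        reject = zc > crit
    else:
        raise ValueError("unknown alternative")
    return zc, crit, reject

# Example 8.7: left sided test with known sigma
z87, crit87, reject87 = z_test(8.4, 10, 3.2, 16, 0.05, "less")
# Example 8.8: two sided test, large sample with S in place of sigma
z88, crit88, reject88 = z_test(3.4, 4, 2.8, 100, 0.05, "two-sided")
print(round(z87, 2), reject87)  # -2.0 True
print(round(z88, 2), reject88)  # -2.14 True
```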
Example 8.9: A sample of 16 students gave an average mark of 53.8 with a standard deviation
Solution:
H0: μ = 50 versus HA: μ ≠ 50
$$ t_c = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{53.8 - 50}{5.2/\sqrt{16}} = \frac{3.8}{1.3} = 2.92 $$
Since $t_c = 2.92 > 2.131$, H0 is rejected. Therefore the population mean mark is significantly
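$$ Z = \frac{\hat{P} - p_0}{\sqrt{\frac{p_0 q_0}{n}}} \sim N(0, 1), \quad \text{where } q_0 = 1 - p_0 $$
$$ Z_c = \frac{\hat{P} - p_0}{\sqrt{\frac{p_0 q_0}{n}}} = \frac{0.7 - 0.6}{\sqrt{\frac{0.6 \times 0.4}{50}}} = 1.44 $$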
Since computed value of Zc =1.44 is less than the critical value of Z 0.05=1.645, therefore, the null
hypothesis cannot be rejected. Hence, based on this sample data we cannot reject the claim of the
sales clerk.
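The proportion test above can be sketched in Python using the values that appear in the worked calculation (sample proportion 0.7, p0 = 0.6, n = 50). The right sided alternative at α = 0.05 is an assumption inferred from the critical value Z0.05 = 1.645 used in the text, and the function name `proportion_z_test` is hypothetical.

```python
import math
from statistics import NormalDist

def proportion_z_test(p_hat, p0, n, alpha):
    """Right sided large-sample Z test of H0: p = p0 against H1: p > p0."""
    q0 = 1 - p0
    zc = (p_hat - p0) / math.sqrt(p0 * q0 / n)
    crit = NormalDist().inv_cdf(1 - alpha)  # Z_{alpha}
    return zc, crit, zc > crit

zc, crit, reject = proportion_z_test(p_hat=0.7, p0=0.6, n=50, alpha=0.05)
print(round(zc, 2), reject)  # 1.44 False
```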
Exercise 8
1. From a normal population with the standard deviation is 4.2. A sample of size 25 is taken
with mean of 32. Find 99% confidence interval for the population mean.
2. A sample from an assumed normal distribution produced the values 9, 14, 10, 12, 7, 13, 12.
a) What is the single best estimate of μ ? b) Find an 80% C.I. for μ ?
3. Out of a sample of 80 customers 60 of them reply they are satisfied with the service they
received .Calculate a 95% confidence interval for the proportion of satisfied customers.
4. The manager claims that the average content of juice per bottle is less than 50cl. The
machine operator disagrees. A sample of 100 bottles yields an average content of 49cl per
bottle. Does this sample allow the manager to claim he is right (5% significance level)?
Assume that the population standard deviation s = 5 cl.
5. According to the norms established for a reading comprehension test, students should
average 84. If 45 randomly selected students averaged 87.8 with a s.d of 8.6, test the null
a) H0: μ = 55 vs HA: μ ≠ 55, α = 0.01, n = 25, x̄ = 50, s = 10.
b) H0: μ = 327 vs HA: μ > 327, α = 0.10, n = 9, x̄ = 329.3, s = 3.
7. In a study of aviophobia, a psychologist claims that 30% of all women are afraid of flying. If,
in a random sample, 41 of 150 women are afraid of flying, test the null hypothesis p = 0.30
CHAPTER NINE: SIMPLE LINEAR REGRESSION AND CORRELATION
This is because the dependent variable Y is the effect of many independent variables, of which X
is only one. The contribution of other independent variables not considered in the model may be
minor. However, we cannot be certain that Y depends only on X. Thus the contribution of the
variables not included in the model, and of other factors such as measurement error, is
accommodated by ε.
The mean of the values of ε is zero. Some of its values are positive, when the actual value of Y lies
above the line $\hat{Y} = \hat{\alpha} + \hat{\beta}X_i$, and some are negative, when the actual value of Y lies below
the fitted regression line.
Assumptions:
1. The relationship between the dependent variable Y and independent variable X exist and is
linear.
2. For every value of the independent variable X, there is an expected value of the dependent
variable Y.
3. The dependent variable Y is a continuous random variable, whereas values of the independent
variable X are fixed values.
4. The sampling error ε, associated with the expected value of the dependent variable Y, is
assumed to be an independent random variable distributed normally with mean 0 and constant
variance σ² about the regression line.
To estimate this model we take a sample of n independent observations, which give rise to n pairs
(Xi, Yi), and find the best estimates of the parameters (the best fitted line) using the least squares
method of estimation. A best fitting line is one for which the sum of squares of the errors,
$\sum \varepsilon_i^2$, is minimum.
In the principle of the least squares method, one would select $\hat{\alpha}$ and $\hat{\beta}$ such that
$$ \sum \varepsilon_i^2 = \sum (Y_i - \hat{Y}_i)^2 $$
is minimum, where $\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$.
To minimize this function, first we take the partial derivatives of $\sum \varepsilon_i^2$ with respect to
$\hat{\alpha}$ and $\hat{\beta}$ respectively. Then the partial derivatives are equated to zero separately and result
in the following normal equations respectively:
$$ \sum_{i=1}^{n} Y_i = n\hat{\alpha} + \hat{\beta} \sum_{i=1}^{n} X_i $$
$$ \sum_{i=1}^{n} X_i Y_i = \hat{\alpha} \sum_{i=1}^{n} X_i + \hat{\beta} \sum_{i=1}^{n} X_i^2 $$
Solving these normal equations simultaneously we can get the values of $\hat{\alpha}$ and $\hat{\beta}$ as follows:
$$ \hat{\beta} = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2} = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} $$
and
$$ \hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X} $$
These estimates are denoted by $\hat{\alpha}$ and $\hat{\beta}$. The estimated (fitted) regression line is given by:
$$ \hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i $$
Before estimating the regression coefficients, it would be wise to plot the observed data on a
graph known as a scatter diagram. A scatter diagram is a plot of all ordered pairs (xi, yi) on the
coordinate plane, which helps to observe the relationship between two variables. This diagram gives
a preliminary idea of the type of relationship the two variables have.
Regression analysis is useful in predicting the value of one variable from the given value of
another variable, using $\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$.
Example 9.1: For the following example [the number of hours (X) a student spent studying and
the marks (Y) each student received in an examination]:
Student    1     2     3     4     5     6     7     8     9    10   Total
x          8     5    11    13    10     6    18    15     2     9      97
y         65    44    79    72    70    54    90    85    33    56     648
x²        64    25   121   169   100    36   324   225     4    81    1149
xy       520   220   869   936   700   324  1620  1275    66   504    7034
y²      4225  1936  6241  5184  4900  2916  8100  7225  1089  3136   44952
a/ Draw the scatter diagram;
[Figure: scatter diagram of marks obtained (y, vertical axis, 0 to 100) against hours spent (x, horizontal axis, 0 to 20).]
$$ \hat{\beta} = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2} = \frac{7034 - (10)(9.7)(64.8)}{1149 - (10)(9.7)^2} = \frac{748.4}{208.1} = 3.596 $$
and $\hat{\alpha} = 64.8 - 3.596(9.7) = 29.92$.
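The least squares estimates for the data of Example 9.1 can be verified with a minimal Python sketch. The function name `least_squares` is a hypothetical helper; it uses the computational form of the slope formula, with the sums taken directly from the raw data.

```python
def least_squares(x, y):
    """Return (alpha_hat, beta_hat) for the fitted line y = alpha + beta * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    beta = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    alpha = sum_y / n - beta * sum_x / n   # alpha = y_bar - beta * x_bar
    return alpha, beta

hours = [8, 5, 11, 13, 10, 6, 18, 15, 2, 9]
marks = [65, 44, 79, 72, 70, 54, 90, 85, 33, 56]
alpha_hat, beta_hat = least_squares(hours, marks)
print(round(alpha_hat, 2), round(beta_hat, 3))  # 29.92 3.596
```

With these estimates, a predicted mark for a given number of study hours h is `alpha_hat + beta_hat * h`.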
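$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} $$
Alternatively: The correlation coefficient is given by
$$ r = \frac{\sum xy - n\bar{x}\bar{y}}{\sqrt{\left(\sum x^2 - n\bar{x}^2\right)\left(\sum y^2 - n\bar{y}^2\right)}} $$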
The correlation coefficient, r, always lies between –1 and +1, inclusive.
• r = -1 implies a perfect negative linear relationship between the two variables.
• r = +1 implies a perfect positive linear relationship between the two variables.
• r = 0 implies there is no linear relationship between the two variables. But the two variables
may have a non-linear relationship between them.
• r close to +1 indicates a strong positive linear relationship between the two variables.
• r close to -1 indicates a strong negative linear relationship between the two variables.
• r close to 0 indicates a weak linear relationship between the two variables.
9.3 Coefficient of Determination (r²)
The square of the correlation coefficient, r², is called the coefficient of determination. It
measures the proportion of variation in the dependent variable Y explained by the simple linear
regression of Y on X. 1 − r² measures the proportion of variation in Y not explained by the simple
linear regression of Y on X.
Example 9.2: If r = 0.9, then r² = 0.81 and 1 − r² = 0.19. That is, 81% of the variation in
the dependent variable, Y, is explained by the simple linear regression of Y on X. The remaining
1 − r², 19% of the variation in Y, is unexplained by the simple linear regression of Y on X.
Example 9.3: The research director of the Saving and Loan Bank collected 25 observations of
mortgage interest rates X and number of house sales Y at each interest rate. The director
computed that
$$ \sum x_i = 125, \quad \sum y_i = 100, \quad \sum x_i y_i = 520, \quad \sum x_i^2 = 650, \quad \sum y_i^2 = 436 $$
Compute and interpret (i) Coefficient of correlation.
(ii) The coefficient of determination.
Solution: i) Coefficient of correlation.
$$ r = \frac{\sum xy - n\bar{x}\bar{y}}{\sqrt{\left(\sum x^2 - n\bar{x}^2\right)\left(\sum y^2 - n\bar{y}^2\right)}} = \frac{520 - (25)(5)(4)}{\sqrt{(650 - 25(5)(5))(436 - (25)(4)(4))}} = 0.667 $$
The two variables have a positive linear relationship.
ii) Coefficient of determination, r² = (0.667)² = 0.44. This shows that 44% of the variation in the
number of house sales is explained by the variation in the interest rate.
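The calculation in Example 9.3 starts from summary totals rather than raw data, and a short Python sketch makes the arithmetic explicit. The function `correlation_from_sums` is a hypothetical helper; it uses the form with n multiplied through, which is algebraically equivalent to the formula with x̄ and ȳ.

```python
import math

def correlation_from_sums(n, sum_x, sum_y, sum_xy, sum_xx, sum_yy):
    """Pearson correlation coefficient computed from summary totals."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) *
                            (n * sum_yy - sum_y ** 2))
    return numerator / denominator

r = correlation_from_sums(n=25, sum_x=125, sum_y=100,
                          sum_xy=520, sum_xx=650, sum_yy=436)
print(round(r, 3), round(r ** 2, 2))  # 0.667 0.44
```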
The simple correlation coefficient (r) cannot be used when we are dealing with qualitative
data such as judgments about beauty, efficiency, honesty, etc. In such cases, the rank correlation
coefficient is used to measure the correlation, that is, whether there is agreement in ranking. It is
denoted by $r_s$ and is defined as follows:
$$ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$
where d is the difference between the rank of x and the rank of the corresponding y.
To calculate $r_s$, we first rank the x's among themselves from least to best or from best to
least; then we rank the y's in the same way and find the sum of the squares of the differences, d,
between the ranks of the x's and the y's. When there are ties in rank, we assign to each of the tied
observations (having equal value) the mean of their ranks.
Example 9.4: Assume that ten girls in a beauty contest for Miss Debre Markos were ranked by
two judges as follows:
Girl Number 1 2 3 4 5 6 7 8 9 10
Judge A 4 8 6 7 1 3 2 5 10 9
Judge B 3 9 6 5 1 2 4 7 8 10
Calculate $r_s$ and interpret it.
Solution: Since the ranks are given, we need to find only the difference in ranks for each girl
and the square of these differences.
Girl Number    1    2    3    4    5    6    7    8    9   10   Total
d              1   -1    0    2    0    1   -2   -2    2   -1       0
d²             1    1    0    4    0    1    4    4    4    1      20
For these n = 10 pairs, $\sum d^2 = 20$, and
$$ r_s = 1 - \frac{6(20)}{10(100 - 1)} = 0.88 $$
which is positive and close to 1, showing that there is very good agreement (or concordance)
between the two judges regarding the beauty of the girls.
positive agreement; $r_s = -1$ indicates complete disagreement, where the two rankings go in
completely opposite directions.
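The rank correlation of Example 9.4 can be reproduced with the following minimal Python sketch. The function name `spearman_rs` is hypothetical, and the code assumes the ranks are already given and contain no ties, as in the example.

```python
def spearman_rs(rank_x, rank_y):
    """Spearman rank correlation from two lists of ranks (no ties assumed)."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

judge_a = [4, 8, 6, 7, 1, 3, 2, 5, 10, 9]
judge_b = [3, 9, 6, 5, 1, 2, 4, 7, 8, 10]
rs = spearman_rs(judge_a, judge_b)
print(round(rs, 2))  # 0.88
```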
Exercise 9
1. What is scatter diagram? What is the advantage of scatter diagram?
2. What is the coefficient of determination?
3. Based on the following data answer the question.
Sales                     15   18   25   27   30   35
Advertising expenditure   50   65   82   95  110  120
a. Decide which variable should be the independent variable and which should be the dependent
variable.
b. Make a scatter plot of the data.
c. Does it appear from inspection that there is a relationship between the variables?
d. Calculate the least squares line. Put the equation in the form of: $\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$
e. Find and interpret the correlation coefficient.
f. What is the slope of the least squares (best-fit) line? Interpret the slope.
4. Below are the planets distance from the sun and the time it takes for the planet to complete its
orbit around the sun.
a. Make a scatter plot of the data. Does it appear from inspection that there is a relationship
between the variables?
b. Calculate the least squares line. Put the equation in the form of: $\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$
c. Find and interpret the correlation coefficient.
d. What is the slope of the least squares (best-fit) line? Interpret the slope.
e. Find the estimated number of years to complete the orbit if the distance is 1000.
5. The number of cigarettes consumed (in billions)(say, x) and the number of cigarettes exported
from the same country (say, y).
X 525 510 500 485 486 487 486
y 164 179 206 196 220 231 244
Compute and interpret
a) The least squares line. Put the equation in the form of: $\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$
b) The coefficient of correlation, r.
c) The coefficient of determination, r².
ANSWERS FOR EXERCISES
Exercise 1
2. a/ collection of all the 350 one-acre plots, b/a sub-collection of ten one-acre plots selected
randomly, c/ amount of wheat produced in a one-acre plot, d/ mean amount of wheat produced
per one acre plot.
3. a/ collection of enrolled students in a particular semester, b/smoking habit of a student, c/it is
qualitative variable since a person is either a smoker or non-smoker.
4. a/ nominal, b/nominal, c/ratio, d/ratio, e/nominal, f/ratio, g/ratio, h/ratio, i/ordinal
Exercise 2
Frequency 2 8 7 8 2 3 30
i) the 'less than' cf    ii) the 'or more than' cf
Exercise 3
1. Median =191.25, Q1= 135.58, Q2=191.25, Q3=243.06, P75= 243.06
2. Mean=21, median=15, mode=15
3. Q1= 67.9, Q2=73.3, Q3=77.6, D5= 73.3, D8=78.5, P90=83.25
4. Mean=72, mode= 60 & 80, Q3= 80
5. Mean=50, mode= Friday
6. Mode=32.5, D5= 31.7,
7. H.M. =3.2
8. median=2.28, Q2=2.28, P72=3.2
Exercise 4
1. M.D(x̄) = 2.65, C.M.D(x̄) = 0.36, M.D(x̂) = 3.71, C.M.D(x̂) = 0.34, S² = 9.584, S = 3.096
2. CVfirm A = 19.048% and CVfirm B = 23.158%. There is greater variability in individual wages of
firm B.
b) For region A CV = 15.360% and for region B CV= 13.467%. Therefore, in the region B of
family income is more consistent.
5. α3 = −1.01. Since α3 < 0, the distribution is negatively skewed.
Exercise 5
1. a/ 1/6, b/0.5, c/0.5, d/0
2. 5040
3. 0.2797
4. 0.5275
5. a/ 1296, b/360
6. 13/15
7. 4/65
8. a/0.046, b/0.348
Exercise 6
1. a) $f(x) = \frac{\binom{5}{x}\binom{15}{4-x}}{\binom{20}{4}}, \quad x = 0, 1, 2, 3, 4$; b) Find f(0), f(1), f(2), f(3) and f(4) in (a);
Exercise 7
4.a/population mean = 9, b/ popl. Variance = 20, d/sample mean = 9,
e/ st.dev. of sample mean=3.16
5.a/population mean = 9, b/ popl. Variance = 20, d/sample mean = 9,
e/ st.dev. of sample mean=2.58
6. a/0.5111, b/0.1977
7. 0.4286, 8. b/9, c/s.d.=2.64
Exercise 8
1. (29.833, 34.167)
5. Zc = 2.96 > 2.33 ⇒ Ho is rejected
6. a) t = -2.5 ⇒ Ho cannot be rejected; b) t = 2.3 ⇒ Ho must be rejected
7. Z = -0.81 ⇒ H0 cannot be rejected
Exercise 9
3. a/ dependent: sales, independent: advertising expenditure, d/ ŷ = 0.928 + 0.277x,
e/ r = 0.99, strong relationship, f/ 0.277, the dependent variable changes by 0.277 when the
independent variable is changed by one unit.
4. b/ ŷ = -12.497 + 0.066x, c/ r = 0.989, e/ 53.503
5. a/ ŷ = 970.779 - 1.54x, b/ -0.840, c/ 70.6%
Appendix: Table A. Approximate values of the standard normal distribution function (i.e. the area
under the curve between Z = 0 and Z = z, that is, P(0 ≤ Z ≤ z)):
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
3.1 0.4990 0.4991 0.4991 0.4991 0.4992 0.4992 0.4992 0.4992 0.4993 0.4993
3.2 0.4993 0.4993 0.4994 0.4994 0.4994 0.4994 0.4994 0.4995 0.4995 0.4995
3.3 0.4995 0.4995 0.4995 0.4996 0.4996 0.4996 0.4996 0.4996 0.4996 0.4997
3.4 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4998
tα= 0.1 0.05 0.025 0.01 0.005 0.0025 0.001 0.0005
df = 1 3.078 6.314 12.706 31.821 63.656 127.321 318.289 636.578
2 1.886 2.920 4.303 6.965 9.925 14.089 22.328 31.600
3 1.638 2.353 3.182 4.541 5.841 7.453 10.214 12.924
4 1.533 2.132 2.776 3.747 4.604 5.598 7.173 8.610
5 1.476 2.015 2.571 3.365 4.032 4.773 5.894 6.869
6 1.440 1.943 2.447 3.143 3.707 4.317 5.208 5.959
7 1.415 1.895 2.365 2.998 3.499 4.029 4.785 5.408
8 1.397 1.860 2.306 2.896 3.355 3.833 4.501 5.041
9 1.383 1.833 2.262 2.821 3.250 3.690 4.297 4.781
10 1.372 1.812 2.228 2.764 3.169 3.581 4.144 4.587
11 1.363 1.796 2.201 2.718 3.106 3.497 4.025 4.437
12 1.356 1.782 2.179 2.681 3.055 3.428 3.930 4.318
13 1.350 1.771 2.160 2.650 3.012 3.372 3.852 4.221
14 1.345 1.761 2.145 2.624 2.977 3.326 3.787 4.140
15 1.341 1.753 2.131 2.602 2.947 3.286 3.733 4.073
16 1.337 1.746 2.120 2.583 2.921 3.252 3.686 4.015
17 1.333 1.740 2.110 2.567 2.898 3.222 3.646 3.965
18 1.330 1.734 2.101 2.552 2.878 3.197 3.610 3.922
19 1.328 1.729 2.093 2.539 2.861 3.174 3.579 3.883
20 1.325 1.725 2.086 2.528 2.845 3.153 3.552 3.850
21 1.323 1.721 2.080 2.518 2.831 3.135 3.527 3.819
22 1.321 1.717 2.074 2.508 2.819 3.119 3.505 3.792
23 1.319 1.714 2.069 2.500 2.807 3.104 3.485 3.768
24 1.318 1.711 2.064 2.492 2.797 3.091 3.467 3.745
25 1.316 1.708 2.060 2.485 2.787 3.078 3.450 3.725
26 1.315 1.706 2.056 2.479 2.779 3.067 3.435 3.707
27 1.314 1.703 2.052 2.473 2.771 3.057 3.421 3.689
28 1.313 1.701 2.048 2.467 2.763 3.047 3.408 3.674
29 1.311 1.699 2.045 2.462 2.756 3.038 3.396 3.660
30 1.310 1.697 2.042 2.457 2.750 3.030 3.385 3.646
40 1.303 1.684 2.021 2.423 2.704 2.971 3.307 3.551
50 1.299 1.676 2.009 2.403 2.678 2.937 3.261 3.496
60 1.296 1.671 2.000 2.390 2.660 2.915 3.232 3.460
Infinity 1.282 1.645 1.960 2.326 2.576 2.807 3.090 3.290
Table B. t-table with right tail probabilities
Table C. Right tail areas for the Chi-square Distribution
df\area 0.995 0.99 0.975 0.95 0.9 0.25 0.1 0.05 0.025 0.01 0.005
1 0.000 0.000 0.001 0.004 0.016 1.323 2.706 3.841 5.024 6.635 7.879
2 0.010 0.020 0.051 0.103 0.211 2.773 4.605 5.991 7.378 9.210 10.597
3 0.072 0.115 0.216 0.352 0.584 4.108 6.251 7.815 9.348 11.345 12.838
4 0.207 0.297 0.484 0.711 1.064 5.385 7.779 9.488 11.143 13.277 14.860
5 0.412 0.554 0.831 1.145 1.610 6.626 9.236 11.071 12.833 15.086 16.750
6 0.676 0.872 1.237 1.635 2.204 7.841 10.645 12.592 14.449 16.812 18.548
7 0.989 1.239 1.690 2.167 2.833 9.037 12.017 14.067 16.013 18.475 20.278
8 1.344 1.647 2.180 2.733 3.490 10.219 13.362 15.507 17.535 20.090 21.955
9 1.735 2.088 2.700 3.325 4.168 11.389 14.684 16.919 19.023 21.666 23.589
10 2.156 2.558 3.247 3.940 4.865 12.549 15.987 18.307 20.483 23.209 25.188
11 2.603 3.053 3.816 4.575 5.578 13.701 17.275 19.675 21.920 24.725 26.757
12 3.074 3.571 4.404 5.226 6.304 14.845 18.549 21.026 23.337 26.217 28.300
13 3.565 4.107 5.009 5.892 7.042 15.984 19.812 22.362 24.736 27.688 29.819
14 4.075 4.660 5.629 6.571 7.790 17.117 21.064 23.685 26.119 29.141 31.319
15 4.601 5.229 6.262 7.261 8.547 18.245 22.307 24.996 27.488 30.578 32.801
16 5.142 5.812 6.908 7.962 9.312 19.369 23.542 26.296 28.845 32.000 34.267
17 5.697 6.408 7.564 8.672 10.085 20.489 24.769 27.587 30.191 33.409 35.718
18 6.265 7.015 8.231 9.390 10.865 21.605 25.989 28.869 31.526 34.805 37.156
19 6.844 7.633 8.907 10.117 11.651 22.718 27.204 30.144 32.852 36.191 38.582
20 7.434 8.260 9.591 10.851 12.443 23.828 28.412 31.410 34.170 37.566 39.997
21 8.034 8.897 10.283 11.591 13.240 24.935 29.615 32.671 35.479 38.932 41.401
22 8.643 9.542 10.982 12.338 14.041 26.039 30.813 33.924 36.781 40.289 42.796
23 9.260 10.196 11.689 13.091 14.848 27.141 32.007 35.172 38.076 41.638 44.181
24 9.886 10.856 12.401 13.848 15.659 28.241 33.196 36.415 39.364 42.980 45.559
25 10.520 11.524 13.120 14.611 16.473 29.339 34.382 37.652 40.646 44.314 46.928
26 11.160 12.198 13.844 15.379 17.292 30.435 35.563 38.885 41.923 45.642 48.290
27 11.808 12.879 14.573 16.151 18.114 31.528 36.741 40.113 43.195 46.963 49.645
28 12.461 13.565 15.308 16.928 18.939 32.620 37.916 41.337 44.461 48.278 50.993
29 13.121 14.256 16.047 17.708 19.768 33.711 39.087 42.557 45.722 49.588 52.336
30 13.787 14.953 16.791 18.493 20.599 34.800 40.256 43.773 46.979 50.892 53.672