Ch#5# ST
Introduction to Statistics
Contents
1. Introduction
2. Organization and Methods of Data Presentation
3. Measures of Central Tendency and Location
4. Measures of Dispersion (Variation)
5. Elementary Probability
6. Probability Distributions
7. Sampling and Sampling Distribution of the Mean
8. Estimation and Hypothesis Testing
9. Simple Linear Regression and Correlation Analysis
Chapter One
Introduction
The word “statistics” can be used in a singular or a plural sense. The second definition given above can be taken as the singular form of “statistics”.
Statistics, in its singular sense, is a subject area or field of study. It is defined as the science that deals with the collection, processing, analysis, interpretation and presentation of numerical facts.
The subject of statistics is not a new discipline; it is as old as human society itself. Its sphere of use, however, was once very restricted.
The word “statistics” is derived from the Latin for “state”, indicating the historical importance of governmental data gathering related to demographic information (military recruitment and tax collection). Thus, in ancient times the scope of statistics was primarily limited to the collection of demographic, property and wealth data of a country by governments for framing military and fiscal policies.
Nowadays, statistics is used in almost every field of study, such as natural science, social science, engineering, medicine and agriculture.
Classification: Statistics is broadly divided into two categories based on how the collected data
are used.
1. Descriptive Statistics
- deals with describing data without attempting to infer anything that goes beyond the given set of data;
- consists of the collection, organization, summarization and presentation of data.
2. Inferential Statistics
- deals with making inferences and/or conclusions about a population based on data obtained from a limited sample of observations;
- consists of performing hypothesis testing, determining relationships among variables and making predictions.
Example
A newspaper reports the following net paid circulation from 1989 E.C to 1993 E.C :
365,000 368,650 370,475 375,950 383,250
i) If one performs the necessary calculation to show that the average yearly net paid circulation from 1989 to 1993 was 372,665, then the work belongs to the domain of descriptive statistics.
ii) If one says that there was a $\frac{383{,}250 - 365{,}000}{365{,}000} \times 100 = 5$ percent increase from 1989 to 1993, this is again descriptive statistics.
iii) If one uses the data to predict that by the year 1996 E.C. the newspaper's net paid circulation will be 402,413, then the work belongs to the domain of inferential statistics.
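The descriptive calculations in (i) and (ii) can be checked with a short Python sketch; the variable names are illustrative only. The prediction in (iii) needs a forecasting model and is not reproduced here.

```python
# Yearly net paid circulation, 1989-1993 E.C. (from the example above)
circulation = [365_000, 368_650, 370_475, 375_950, 383_250]

# (i) Descriptive: average yearly circulation
average = sum(circulation) / len(circulation)

# (ii) Descriptive: percent increase from the first to the last year
percent_increase = (circulation[-1] - circulation[0]) / circulation[0] * 100

print(f"Average yearly circulation: {average:,.0f}")          # 372,665
print(f"Percent increase 1989-1993: {percent_increase:.1f}%")  # 5.0%
```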
b) Sample: A sample is a part of a population taken so that some generalization about the population can be made. A sample should be representative of the population. Example: If you want to study the mean age of primary school teachers in Sodo town, all primary school teachers in Sodo town constitute the population, as mentioned above; if you study only some of the teachers, the selected ones constitute your sample.
We have defined statistics, in singular sense, as a science that deals with collection,
organization (classification), presentation, analysis, and interpretation of numerical facts.
Accordingly, a statistical investigation passes through the following stages:
Data Collection: This is a stage where we gather information for our purpose.
Data Organization: This is the stage where we edit our data. The large mass of figures collected from surveys frequently needs organization, because the collected data may include irrelevant figures, incorrect facts, omissions and mistakes.
Data Presentation: The organized data can now be presented in the form of tables, charts, diagrams and graphs. At this stage, large data sets are presented in a summarized and condensed manner.
Data Analysis: This is the stage where we critically study the data. The purpose of data
analysis is to dig out information useful for decision making.
Data Interpretation: This is the stage where we draw valid conclusions from the results obtained through data analysis. If the analyzed data are not properly interpreted, the whole purpose of the investigation may be defeated and misleading conclusions may be drawn.
Uses of statistics
The science of statistics is very essential for research and decision making processes in all aspects
of human life. The following are some of the areas for which statistical analysis is required:
- to represent facts in the form of numerical data;
- to summarize a mass of data into a few presentable, understandable and precise figures;
- to predict or forecast future trends;
- to help select a course of action among a number of alternatives;
- to help in formulating policies.
Limitations of statistics
a) It does not study qualitative characteristics directly. Examples: beauty, honesty, poverty and standard of living.
b) It does not study a single individual but deals with aggregates of facts. Example: The population size of a country for a single given year, on its own, does not help us in comparative studies.
c) Statistical results are true only on average. Examples: The probability of getting a head in tossing a coin is 1/2; the germination percentage of a given variety of seed is 80%.
d) It is open to misuse. Example: The number of car accidents caused in a city in a particular year by women drivers is 10, while the number caused by men drivers is 40. Concluding from these figures alone that women are safer drivers is a misuse of statistics, because the figures do not account for the number of women and men drivers.
1.5 TYPES OF VARIABLES AND MEASUREMENT SCALES
a) Quantitative variables: are variables that can be quantified, i.e. can take numerical values. Examples: height, area, income, temperature, etc.
b) Qualitative variables: are variables that cannot be quantified directly. Examples: colour, beauty, sex, location. Qualitative variables are also called categorical variables. Hence we have two types of data: quantitative and qualitative data.
Examples: weight, length, volume, etc.
1. Nominal scale: The word “nominal” comes from the Latin for “name”. This is a scale for grouping individuals into different categories.
Example 1: red, brown, black
Example 2: short, tall
Example 3: pass, fail
On this scale, one category is merely different from another: arithmetic operations (+, −, ×, ÷) and order comparisons are impossible.
2. Ordinal scale: The word “ordinal” comes from the Latin for “order”.
Interval scale data convey better information than nominal and ordinal scale data. There is a constant interval size between any adjacent units on the measurement scale.
4. Ratio scale: There exists a zero point on the measurement scale, and this zero point has physical significance.
Examples: height, weight, volume, etc.
One value can be said to be different from, larger/taller/better than, or less than another by a certain amount, and to be a certain number of times as large as the other (+, −, ×, ÷ are possible on this scale).
This measurement scale provides better information than the interval scale of measurement.
Not every aggregate of numbers can be called statistical data. An aggregate of numbers is statistical data when the numbers are
- comparable,
- meaningful, and
- collected for a well-defined objective.
Raw data: are collected data, which have not been organized numerically.
Examples: 25, 10, 32, 18, 6, 93, 4.
An array: is an arrangement of raw numerical data in ascending or descending order of
magnitude.
It enables us to find the range of the data set easily, and it also gives us some idea about the general characteristics of the distribution.
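Forming an array and reading off the range is a single sort in Python; this minimal sketch uses the raw data from the example above, and the variable names are illustrative.

```python
raw = [25, 10, 32, 18, 6, 93, 4]   # raw data from the example above

ascending = sorted(raw)                     # array in ascending order
descending = sorted(raw, reverse=True)      # array in descending order
data_range = ascending[-1] - ascending[0]   # largest minus smallest value

print(ascending)   # [4, 6, 10, 18, 25, 32, 93]
print(data_range)  # 89
```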
Any scientific investigation requires data related to the study. The required data can be obtained
from either a primary source or a secondary source.
Primary source: A source of data that supplies first-hand information for the immediate purpose.
Primary data: are data originally collected for the immediate purpose.
- Primary data are usually more expensive to obtain than secondary data.
Secondary source: individuals or agencies that supply data originally collected for other purposes, by them or by others.
- Usually these are published or unpublished materials, records, reports, etc.
Secondary data: data collected from a secondary source.
Methods of data collection
There are three major methods of data collection
i. Observation or measurement
ii. Interviews and questionnaires
iii. Use of documentary sources
I. Observation or measurement
In this method, data are obtained through direct observation or measurement.
- It requires training of the persons who measure, in order to ensure the use of standard procedures.
- It provides accurate information, but it is expensive and inconvenient.
b) Telephone interviews
c) Mailed questionnaires (self-administered questionnaires returned by mail)
Exercise
1. How does statistics help for your profession?
2. Differentiate descriptive and inferential statistics.
3. Mention some limitations of statistics (discuss by examples).
4. Explain, with examples, the difference between the following statistical terms:
- Qualitative and quantitative variables
- Nominal and ordinal
- Parameter and statistic
- Secondary and primary data
5. Explain various methods of collecting primary and secondary data.
6. What is a questionnaire?
Chapter Two
Classification eliminates inconsistency and also brings out the points of similarity and/or dissimilarity among the collected items/data.
Classification is necessary because it would not be possible to draw inferences and conclusions from a large set of unorganized [raw] data.
A frequency distribution is a table that presents data according to some criteria with the
corresponding number of items falling in each class (i.e. with the corresponding frequencies.)
Example: A frequency distribution presenting the number of males and females in a class
Sex Frequency
Male 57
Female 39
Generally, there are two basic types of frequency distributions: Ungrouped and Grouped
frequency distributions.
An ungrouped frequency distribution is a table of all potential raw score values that could occur in the data, along with their corresponding frequencies. Ungrouped frequency distributions are often constructed for small data sets or for discrete variables.
Example: The following data are the ages in years of 20 women who attended health education last year:
30, 41, 39, 41, 32, 29, 35, 31, 30, 36, 33, 36, 32, 42, 30, 35, 37, 32, 30, and 41.
Construct a frequency distribution for these data.
STEP 1. Find the range of the data:
Range = Maximum observation − Minimum observation = 42 − 29 = 13
STEP 2. Construct a table, tally the data and complete the frequency column. The frequency
distribution becomes as follows.
Age Tally Frequency
29 / 1
30 //// 4
31 / 1
32 /// 3
33 / 1
35 // 2
36 // 2
37 / 1
39 / 1
41 /// 3
42 / 1
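The two steps above can be reproduced with the standard library's `collections.Counter`. This is a minimal sketch; the tally column it prints is a plain run of slashes rather than hand-tally groups.

```python
from collections import Counter

ages = [30, 41, 39, 41, 32, 29, 35, 31, 30, 36,
        33, 36, 32, 42, 30, 35, 37, 32, 30, 41]

# Step 1: range = maximum observation - minimum observation
data_range = max(ages) - min(ages)

# Step 2: count how often each distinct age occurs (the tally)
freq = Counter(ages)

print("Age  Tally  Frequency")
for age in sorted(freq):
    print(f"{age:<4} {'/' * freq[age]:<6} {freq[age]}")
```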
When the range of the data is large, the data must be grouped into classes. A grouped frequency distribution is a frequency distribution in which several data values are grouped into one class.
– Relative frequency: the frequency of a class divided by the total frequency (i.e. the sum of all frequencies); if multiplied by 100, it gives the percentage of values falling in that class.
$$\text{Relative frequency of a class} = \frac{\text{Frequency of that class}}{\text{Total frequency}}$$
Note:
The relative frequency shows what fractional part or proportion of the total frequency
belongs to the corresponding class.
The sum of all the relative frequencies in the frequency distribution is always 1.
– Relative cumulative frequency (less than type/more than type): the total of the relative frequencies above/below a class, including that class; equivalently, the cumulative frequency (less than type/more than type) divided by the total frequency. Multiplied by 100, it gives the percentage of values that are less than/more than the upper/lower class boundary.
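As a small illustration, the relative frequencies of the Male/Female table at the start of this chapter can be computed as below; this is a sketch, and the dictionary layout is just one convenient choice.

```python
frequencies = {"Male": 57, "Female": 39}   # from the sex table above
total = sum(frequencies.values())          # total frequency

# Relative frequency = frequency of the class / total frequency
relative = {k: f / total for k, f in frequencies.items()}
# Multiplying by 100 gives the percentage of values in each class
percent = {k: 100 * r for k, r in relative.items()}

for k in frequencies:
    print(f"{k:<7} {relative[k]:.4f}  {percent[k]:.2f}%")
```

The relative frequencies sum to 1, as noted above.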
26 31 36 41 46 51 56 61
STEP 5. Upper limit of the first class = 31 − 1 = 30. Hence the upper class limits are
30 35 40 45 50 55 60 65
The lower and the upper class limits (Steps 4 and 5) can be written as follows.
Class limits
26 – 30
31 – 35
36 – 40
41 – 45
46 – 50
51 – 55
56 – 60
61 – 65
STEP 6. By subtracting 0.5 units of measurement from the lower class limits and by adding 0.5 units
of measurement to the upper class limits, we can get lower and upper class boundaries as
follows.
Class boundaries
25.5 – 30.5
30.5 – 35.5
35.5 – 40.5
40.5 – 45.5
45.5 – 50.5
50.5 – 55.5
55.5 – 60.5
60.5 – 65.5
STEPS 8, 9 and 10 are displayed in the following table (columns 3, 4 and 5&6 respectively).
Class limits   Class boundaries   Tally          Frequency   Cumulative frequency (less than type)   Cumulative frequency (more than type)
26 – 30        25.5 – 30.5        /////          5           5                                       40
31 – 35        30.5 – 35.5        /////          5           10                                      35
36 – 40        35.5 – 40.5        /////          5           15                                      30
41 – 45        40.5 – 45.5        ///// ////     9           24                                      25
46 – 50        45.5 – 50.5        ///// //       7           31                                      16
51 – 55        50.5 – 55.5        /              1           32                                      9
56 – 60        55.5 – 60.5        //             2           34                                      8
61 – 65        60.5 – 65.5        ///// /        6           40                                      6
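The boundary and cumulative columns of this table can be generated from the class limits and the frequency column alone. A minimal sketch using `itertools.accumulate`; the 0.5 adjustment assumes, as in Step 6, that data are recorded to whole units.

```python
from itertools import accumulate

# Class limits and frequencies from the grouped table above
limits = [(26, 30), (31, 35), (36, 40), (41, 45),
          (46, 50), (51, 55), (56, 60), (61, 65)]
freq = [5, 5, 5, 9, 7, 1, 2, 6]

# Class boundaries: half a unit of measurement below/above each limit
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in limits]

# Less than type: running total starting from the first (lowest) class
less_than = list(accumulate(freq))

# More than type: running total starting from the last (highest) class
more_than = list(accumulate(reversed(freq)))[::-1]

for (lo, hi), (lb, ub), f, lt, mt in zip(limits, boundaries, freq,
                                          less_than, more_than):
    print(f"{lo}-{hi}  {lb}-{ub}  {f:2d}  {lt:2d}  {mt:2d}")
```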
The data that is presented by a frequency distribution can also be displayed diagrammatically or
graphically.
Diagrams and graphs:
- are techniques for presenting data in visual displays using geometric figures;
- are visual aids which give a bird’s eye view about a given set of numerical data;
- have greater attraction than mere figures (numbers);
- facilitate comparison of data;
- are easily understood by anyone who has no statistical background.
Usually, diagrams are appropriate for presenting discrete data, whereas graphs are appropriate for presenting continuous data.
There are three common diagrammatic presentations of data: bar-diagram/charts, pie-chart and
pictograms, as well as three common graphic presentations of data: histogram, frequency
polygon, and cumulative frequency polygon (ogive).
I. Bar-diagrams/ Bar-charts
- Bar-diagram is a series of equally spaced bars having equal width and the height of each bar
representing the magnitude or frequency of observations in each group.
- Bar-diagrams are usually used to represent one way or simple frequency distribution.
- Bar-diagrams can be drawn either horizontally or vertically. Usually horizontal bar-diagrams
are used for qualitatively classified data whereas vertical bar-diagrams are used for
quantitatively classified data.
[Figure: horizontal bar-diagram of frequency by blood type]
Example: The following frequency distribution shows sales (in million birr) of four products for the 2004 production year.
Product   Sales (in million birr)
A 14
B 21
C 9
D 17
The bar-diagram presentation for these data is given below.
[Figure: vertical bar-diagram of sales (in million birr) by product A, B, C and D]
2. Deviation bar-diagrams
When the data take both positive and negative values (for instance data on profit, net export,
percent change, etc) deviation bar-diagrams are appropriate.
Data: Net profit (in thousands of birr) from oil sales for five years
The deviation bar-diagram for the data looks like the following.
[Figure: deviation bar-diagram of profit (in thousands of birr) by year, 1997 to 2001]
3. Broken bar-diagrams
This kind of bar-diagram is used to present data involving a few extreme values where it will be
difficult to accommodate the magnitude of the bars corresponding to these values within the
graph paper. In this case we use pieces of bars with each piece starting with a jump on the
numerical scale.
Example: Data: Amount of production per day for four products of a factory.
Product   Quantity produced (kg/day)
A         14
B         35
C         23
D         109
4. Component bar-diagrams
When it is desired to show how a total (an aggregate) is divided into component parts, we use
component bar diagram. In such type of bar-diagrams, the bars represent aggregate value of a
variable with each aggregate broken into its component parts and different colors or designs are
used for identification.
[Figure: component bar-diagram of production of maize, wheat and barley by year, 1990 to 1993]
5. Multiple bar-diagrams
Multiple bar-diagrams are used to display data on more than one variable. They are used for
comparing different variables at the same time.
Example: The data given in the above example can be presented by using multiple bar-diagram as
below.
[Figure: multiple bar-diagram of production of barley, wheat and maize by year, 1990 to 1993]
II. Pie-charts
A pie-chart is a circle that is divided into sections or wedges according to the percentages of frequencies in each category of the distribution. The angle of the sector of a class is obtained by multiplying the ratio of the frequency of the class to the total frequency by 360°, i.e.
$$\text{Sector angle of a class} = \frac{\text{Frequency of the class}}{\text{Total frequency}} \times 360^\circ$$
Note that pie-charts are usually used for depicting nominal level data.
Example: A survey showed that a car owner spends birr 2,950 per year on operating expenses.
Below is the breakdown of the various expenditure items. Draw an appropriate chart to portray
the data.
Expenditure item        Amount (in birr)   Percentage (approx.)   Degrees (approx.)
Fuel                    603                20                     74
Interest on car loan    279                9                      34
Repairs                 930                32                     113
Insurance and license   646                22                     79
Depreciation            492                17                     60
Total                   2,950              100                    360
[Figure: pie chart of a car owner's yearly operating expenses: Repairs 32%, Insurance and license 22%, Fuel 20%, Depreciation 17%, Interest on car loan 9%]
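The percentage and degree columns of the car-expense table follow directly from the sector-angle formula. A minimal Python sketch (names are illustrative); rounding each value gives the approximate figures in the table.

```python
expenses = {
    "Fuel": 603,
    "Interest on car loan": 279,
    "Repairs": 930,
    "Insurance and license": 646,
    "Depreciation": 492,
}
total = sum(expenses.values())

for item, amount in expenses.items():
    share = amount / total                 # frequency / total frequency
    print(f"{item:<22} {100 * share:5.1f}%  {360 * share:6.1f} degrees")
```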
III. Pictograms
In pictograms, we represent the data by means of some picture symbols. Here we decide a
suitable picture to represent a definite number of units in which the variable is measured.
Example: Draw a pictorial diagram to present the following data (number of students in a certain
school for four years.)
[Figure: pictogram of the number of students, 1992 to 1995; key: one symbol = 1000 students]
IV. Histogram
A histogram is another way of data presentation which is more suitable for frequency
distributions with continuous classes.
In drawing a histogram, we put the class boundaries of each class on the horizontal axis and its
respective frequency on the vertical axis.
V. Frequency Polygon
A frequency polygon is a line graph drawn by taking the frequencies of the classes along the vertical axis and their respective class marks along the horizontal axis. Then join the plotted points with straight line segments.
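The points of a frequency polygon are just (class mark, frequency) pairs. The sketch below computes them for the grouped age table earlier in this chapter, chosen only for illustration; plotting is left to any graphing tool.

```python
# Class limits and frequencies from the grouped age table earlier in this chapter
limits = [(26, 30), (31, 35), (36, 40), (41, 45),
          (46, 50), (51, 55), (56, 60), (61, 65)]
freq = [5, 5, 5, 9, 7, 1, 2, 6]

# Class mark (mid-point) of each class
class_marks = [(lo + hi) / 2 for lo, hi in limits]

# Points to plot: (class mark, frequency), joined in order of class mark
polygon_points = list(zip(class_marks, freq))
print(polygon_points)
```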
Example: Present the data in the previous example using a frequency polygon.
[Figure: frequency polygon, with frequency on the vertical axis and class marks on the horizontal axis]
Cumulative frequency polygon can be traced on less than or more than cumulative frequency
basis. Place the class boundaries along the horizontal axis and the corresponding cumulative
frequencies (either less than or more than cumulative frequencies) along the vertical axis. Then
join the cross points by a free hand curve.
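For an ogive, less than cumulative frequencies are plotted at the upper class boundaries and more than cumulative frequencies at the lower class boundaries. A minimal sketch, again using the grouped age table earlier in this chapter for illustration only:

```python
from itertools import accumulate

limits = [(26, 30), (31, 35), (36, 40), (41, 45),
          (46, 50), (51, 55), (56, 60), (61, 65)]
freq = [5, 5, 5, 9, 7, 1, 2, 6]
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in limits]

less_than = list(accumulate(freq))                    # from the lowest class up
more_than = list(accumulate(reversed(freq)))[::-1]    # from the highest class down

# Less than ogive: cumulative frequency at each upper class boundary
less_than_points = [(ub, cf) for (_, ub), cf in zip(boundaries, less_than)]
# More than ogive: cumulative frequency at each lower class boundary
more_than_points = [(lb, cf) for (lb, _), cf in zip(boundaries, more_than)]

print(less_than_points)
print(more_than_points)
```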
Example: the data in the previous example can be presented using either a less than or a more
than cumulative frequency polygon as given below (i) and (ii) respectively.
[Figure: (i) less than type cumulative frequency polygon; (ii) more than type cumulative frequency polygon]
Exercise
1. Given the following raw data:
62 50 57 58 51 53 62 64 60 61
60 51 64 55 55 52 60 65 58 60
59 52 63 56 56 58 64 63 62 60
58 54 62 54 54 60 65 60 62 59
56 63 52 53 62 53 61 61 59 65
a) Construct simple frequency distribution table.
b) Construct grouped frequency distribution table.
2. If class mid-points in a frequency distribution of a group of persons are 25, 32, 39, 46, 53, 60,
67, 74 and 81, find (a) size of the class interval, and (b) the class boundaries.
3. In a sample study about coffee drinking habits in two villages A and B, the following
information was recorded:
A: Females were 40%. Total coffee drinkers were 45% and male non-coffee drinkers were 20%.
B: Males were 55%, male non-coffee drinkers were 30% and female coffee drinkers were 15%.
Present the above information in a tabular form.
4. The following table shows the marital status of males and females (18 years and older) in a
certain city. Draw a pie chart separately for males and females to display the data.
Marital Status Male (percent of total) Female (percent of total)
Single 21 16
Married 65 73
Widowed 9 4
Divorced 5 7
5. Prepare (a) histogram (b) frequency polygon (c) Ogive for the following frequency distribution
of marks in a final examination.
Class 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-89
Frequency 6 12 20 14 12 8 6 2
Chapter Three
The most important aspect of studying the distribution of a sample of measurements is the position of the central value, that is, a representative value about which the measurements are distributed. It is often convenient to have one figure that represents each group; this figure is known as the average of the group. If the members of the group are arranged in order of magnitude, the averages tend to fall around the central position of the group, so averages are called measures of central tendency. In short, any measure intended to represent the center of a data set is called a measure of central tendency.
We say a measure of central tendency is best if it possesses most of the following properties. It should:
- be simple to understand and easy to calculate/interpret,
- exist and be unique,
- be rigidly defined by a mathematical formula,
- be based on all observations,
- not be seriously affected by extreme observations,
- be capable of further statistical analysis and/or algebraic manipulation.
Let a data set consist of a number of observations, represented by $x_1, x_2, \ldots, x_n$, where n (the last subscript) denotes the number of observations in the data and $x_i$ is the ith observation. Then the sum of the observations is written compactly as
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$$
For instance, a data set consisting of six measurements 21, 13, 54, 46, 32 and 37 is represented by $x_1, x_2, x_3, x_4, x_5$ and $x_6$, where $x_1 = 21$, $x_2 = 13$, $x_3 = 54$, $x_4 = 46$, $x_5 = 32$ and $x_6 = 37$. Their sum becomes
$$\sum_{i=1}^{6} x_i = 21 + 13 + 54 + 46 + 32 + 37 = 203$$
Similarly,
$$\sum_{i=1}^{n} x_i^2 = x_1^2 + x_2^2 + \cdots + x_n^2$$
Some Properties of the Summation Notation
1. $\sum_{i=1}^{n} c = n\,c$, where c is a constant number.
2. $\sum_{i=1}^{n} b\,x_i = b \sum_{i=1}^{n} x_i$, where b is a constant number.
3. $\sum_{i=1}^{n} (a + b\,x_i) = n\,a + b \sum_{i=1}^{n} x_i$, where a and b are constant numbers.
4. $\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$
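A quick numerical check of the four properties, using the six measurements from the example above. The second variable y and the constants a, b, c are hypothetical values chosen only for illustration; each `assert` would fail if a property did not hold.

```python
x = [21, 13, 54, 46, 32, 37]   # the six measurements from the example above
y = [1, 2, 3, 4, 5, 6]         # hypothetical second variable (for property 4)
n = len(x)
a, b, c = 2, 3, 5              # arbitrary constants for illustration

sum_x = sum(x)
sum_x2 = sum(xi ** 2 for xi in x)

assert sum(c for _ in range(n)) == n * c                        # property 1
assert sum(b * xi for xi in x) == b * sum_x                     # property 2
assert sum(a + b * xi for xi in x) == n * a + b * sum_x         # property 3
assert sum(xi + yi for xi, yi in zip(x, y)) == sum_x + sum(y)   # property 4

print(sum_x, sum_x2)  # 203 8035
```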
Several types of averages or measures of central tendency can be defined; the most common are
- the arithmetic mean or the mean
- the geometric mean
- the harmonic mean
- the mode
- the median
The choice of average (measure of central tendency) depends upon which best represents the
property under discussion.
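The arithmetic mean is defined as the sum of the measurements of the items divided by the total number of items:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
When the data are given in the form of an ungrouped frequency distribution, in which the value $x_i$ occurs with frequency $f_i$ ($i = 1, 2, \ldots, k$), the formula for the mean is
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$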
Example 1: You measure the body lengths (in inches) of 10 full-term infants at birth and record
the following:
17.5 19.5 17.5 19 20
21 18 19.5 18 10.75
Compute the sample mean length of the infants for these data.
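A minimal Python sketch of the simple arithmetic mean for Example 1 (variable names are illustrative):

```python
lengths = [17.5, 19.5, 17.5, 19, 20, 21, 18, 19.5, 18, 10.75]  # inches

mean_length = sum(lengths) / len(lengths)   # sum of observations / n
print(round(mean_length, 3))  # 18.075
```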
Example 2: Monthly incomes of fourth year regular students are given in the following frequency
distribution.
Monthly income (birr) 54.5 64.5 74.5 84.5 94.5 104.5 114.5
Number of students 6 9 15 25 13 7 5
Compute the mean for these data.
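For Example 2, each income value is weighted by its frequency. A short sketch of the frequency-distribution formula for the mean:

```python
incomes = [54.5, 64.5, 74.5, 84.5, 94.5, 104.5, 114.5]   # x_i (birr)
students = [6, 9, 15, 25, 13, 7, 5]                      # f_i

n = sum(students)
mean_income = sum(f * x for f, x in zip(students, incomes)) / n
print(n, mean_income)  # 80 83.375
```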
If the data are given in the form of a continuous frequency distribution, the sample mean can be computed as
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
where $x_i$ = the class mark of the ith class, $i = 1, 2, \ldots, k$;
$f_i$ = the frequency of the ith class, and k = the number of classes.
Note that $\sum_{i=1}^{k} f_i = n$, the total number of observations.
Example: The following table gives the daily wages of laborers. Calculate the average daily
wages paid to a laborer.
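The sum of the squares of the deviations of a set of observations from any number, say A, is least only when $A = \bar{x}$. That is,
$$\sum_{i=1}^{n} (x_i - A)^2 \ge \sum_{i=1}^{n} (x_i - \bar{x})^2 \quad \text{for every number } A$$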
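The least-squares property can be checked numerically. This sketch reuses the six measurements from the summation example earlier in this chapter and tries a few arbitrary values of A; it is an illustration, not a proof.

```python
x = [21, 13, 54, 46, 32, 37]   # measurements from the summation example
n = len(x)
mean_x = sum(x) / n

def sum_sq_dev(data, a):
    """Sum of squared deviations of the data from the number a."""
    return sum((xi - a) ** 2 for xi in data)

ss_at_mean = sum_sq_dev(x, mean_x)

# Every other choice of A gives a sum at least as large
for a in [0, 20, 33, 34, 40, 60]:
    assert sum_sq_dev(x, a) >= ss_at_mean
    print(a, sum_sq_dev(x, a))
print("at the mean:", round(ss_at_mean, 2))
```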
When a set of observations is divided into k groups, and $\bar{x}_1$ is the mean of $n_1$ observations of group 1, $\bar{x}_2$ is the mean of $n_2$ observations of group 2, …, $\bar{x}_k$ is the mean of $n_k$ observations of group k, then the combined mean, denoted by $\bar{x}_c$, of all observations taken together is given by
$$\bar{x}_c = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k}$$
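A minimal sketch of the combined mean; the two groups below are hypothetical and serve only to show that the formula agrees with the mean of all observations pooled together.

```python
def combined_mean(means, sizes):
    """Combined mean from group means x̄_1..x̄_k and group sizes n_1..n_k."""
    return sum(n * m for m, n in zip(means, sizes)) / sum(sizes)

# Hypothetical groups, used only to check the formula
group1 = [1, 2, 3]
group2 = [10, 20]
means = [sum(g) / len(g) for g in (group1, group2)]
sizes = [len(group1), len(group2)]

print(combined_mean(means, sizes))          # 7.2  (via the formula)
print(sum(group1 + group2) / sum(sizes))    # 7.2  (direct mean of all values)
```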
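If a wrong figure has been used in calculating the mean, we can correct the mean if we know the correct figure that should have been used. Let
$x_{\text{wrong}}$ denote the wrong figure used in calculating the mean,
$x_{\text{correct}}$ be the correct figure that should have been used, and
$\bar{x}_{\text{wrong}}$ be the wrong mean calculated using $x_{\text{wrong}}$; then the correct mean, $\bar{x}_{\text{correct}}$, of the n observations is given by
$$\bar{x}_{\text{correct}} = \frac{n\,\bar{x}_{\text{wrong}} - x_{\text{wrong}} + x_{\text{correct}}}{n}$$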
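The correction can be sketched as a small function that also handles several wrong figures at once. The data in the check are hypothetical: true values [10, 20, 30] with 30 recorded by mistake as 3.

```python
def corrected_mean(wrong_mean, n, wrong_values, correct_values):
    """Correct a mean that was computed with some wrong figures.

    wrong_values are the figures used by mistake; correct_values are the
    figures that should have been used.
    """
    total = wrong_mean * n                        # total implied by the wrong mean
    total = total - sum(wrong_values) + sum(correct_values)
    return total / n

# Hypothetical check: true data [10, 20, 30], but 30 was recorded as 3
recorded = [10, 20, 3]
wrong = sum(recorded) / len(recorded)             # 11.0
print(corrected_mean(wrong, 3, [3], [30]))        # 20.0
```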
Solution:
Exercise: The average score on the mid-term examination of 25 students was 75.8 out of 100. After the mid-term exam, however, a student whose score was 41 out of 100 dropped the course. What is the mean score of the remaining 24 students?
Example: A student's final marks in Mathematics, Physics, Chemistry and Biology are 82, 80, 90 and 70, respectively. If the respective credits for these courses are 3, 5, 3 and 1, determine the approximate average mark the student has obtained per course.
Solution: We use a weighted arithmetic mean, taking the weight of each course to be the number of credits received for it.
xi 82 80 90 70
wi 3 5 3 1
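Therefore
$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{(3 \times 82) + (5 \times 80) + (3 \times 90) + (1 \times 70)}{3 + 5 + 3 + 1} = \frac{986}{12} \approx 82.17$$
The student's average mark per course is approximately 82.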
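The same weighted mean in a few lines of Python (a sketch; credits act as the weights $w_i$):

```python
marks = [82, 80, 90, 70]     # x_i: Mathematics, Physics, Chemistry, Biology
credits = [3, 5, 3, 1]       # w_i: credits for each course

weighted_mean = sum(w * x for w, x in zip(credits, marks)) / sum(credits)
print(round(weighted_mean, 2))  # 82.17
```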
3.3.2 Geometric Mean (G.M)
The geometric mean is the nth root of the product of n positive values. If $X_1, X_2, \ldots, X_n$ are n positive values, then their geometric mean is
$$G.M. = (X_1 X_2 \cdots X_n)^{1/n}$$
The geometric mean is usually used for:
average rates of change
ratios
percentage distributions
logarithmic distributions.
When the number of observations is more than two, taking the nth root can be tedious; the calculation can then be simplified by taking common logarithms (base ten). Starting from
$$G.M. = (x_1 x_2 \cdots x_n)^{1/n}$$
and taking logarithms of both sides,
$$\log(G.M.) = \frac{1}{n}\log(x_1 x_2 \cdots x_n) = \frac{1}{n}(\log x_1 + \log x_2 + \cdots + \log x_n) = \frac{1}{n}\sum_{i=1}^{n} \log x_i$$
$$G.M. = \text{antilog}\left(\frac{1}{n}\sum_{i=1}^{n} \log x_i\right)$$
This shows that the logarithm of the G.M. is the mean of the logarithms of the individual observations.
Example 1: The ratios of prices in 1999 to those in 2000 for 4 commodities were 0.92, 1.25, 1.75 and 0.85. Find the average price ratio by means of the geometric mean.
Solution:
$$G.M. = \text{antilog}\left(\frac{\sum \log X_i}{n}\right) = \text{antilog}\left(\frac{\log 0.92 + \log 1.25 + \log 1.75 + \log 0.85}{4}\right)$$
$$= \text{antilog}\left(\frac{-0.0362 + 0.0969 + 0.2430 - 0.0706}{4}\right) = \text{antilog}(0.0583) \approx 1.14$$
What is the arithmetic mean of the above values?
$$\bar{X} = \frac{0.92 + 1.25 + 1.75 + 0.85}{4} = \frac{4.77}{4} \approx 1.19$$
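The log-based calculation can be reproduced with `math.log10` and cross-checked against `statistics.geometric_mean` (available from Python 3.8). A minimal sketch; names are illustrative.

```python
import math
import statistics

ratios = [0.92, 1.25, 1.75, 0.85]

# G.M. = antilog of the mean of the base-10 logarithms
mean_log = sum(math.log10(r) for r in ratios) / len(ratios)
gm = 10 ** mean_log

# Cross-check with the standard library
gm_lib = statistics.geometric_mean(ratios)

am = sum(ratios) / len(ratios)   # arithmetic mean, for comparison

print(round(mean_log, 4), round(gm, 2), round(am, 4))
```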
Note that
1. When the observed values $x_1, x_2, \ldots, x_k$ have the corresponding frequencies $f_1, f_2, \ldots, f_k$, respectively, the geometric mean is obtained by
$$G.M. = \left(x_1^{f_1} x_2^{f_2} \cdots x_k^{f_k}\right)^{1/n} = \text{antilog}\left(\frac{1}{n}\sum_{i=1}^{k} f_i \log x_i\right), \quad \text{where } n = \sum_{i=1}^{k} f_i$$
2. Whenever the frequency distribution is grouped (continuous), the class marks of the class intervals are taken as the $x_i$ and the above formula can be used, that is,
$$G.M. = \left(m_1^{f_1} m_2^{f_2} \cdots m_k^{f_k}\right)^{1/n} = \text{antilog}\left(\frac{1}{n}\sum_{i=1}^{k} f_i \log m_i\right)$$
where $n = \sum_{i=1}^{k} f_i$ and $m_i$ is the class mark of the ith class.
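A sketch of the frequency-weighted geometric mean; for grouped data, pass the class marks $m_i$ as the values. The check data are hypothetical: values 2 and 8 each occurring twice must give the same result as the simple G.M. of [2, 2, 8, 8], which is 4.

```python
import math

def weighted_geometric_mean(values, freqs):
    """G.M. = antilog((1/n) * sum(f_i * log x_i)), where n = sum(f_i)."""
    n = sum(freqs)
    mean_log = sum(f * math.log10(x) for x, f in zip(values, freqs)) / n
    return 10 ** mean_log

# Hypothetical data: 2 occurs twice and 8 occurs twice
print(weighted_geometric_mean([2, 8], [2, 2]))
```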
Example
Find the harmonic mean of the values 2, 3 and 6.
$$H.M. = \frac{3}{\frac{1}{2} + \frac{1}{3} + \frac{1}{6}} = \frac{3}{\frac{3 + 2 + 1}{6}} = \frac{3}{1} = 3$$
The harmonic mean is used to average rates rather than simple values. For example, it is appropriate for averaging speeds in kilometers per hour over equal distances.
Example: A driver covers a distance of 300 km at an average speed of 60 km/hr and makes the return trip at an average speed of 50 km/hr. What is his average speed for the total distance?
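Solution
Trip     Distance (km)   Average speed (km/hr)   Time taken (hr)
Going    300             60                      5
Return   300             50                      6
Total    600                                     11
Average speed = total distance / total time = 600/11 ≈ 54.55 km/hr, which is the harmonic mean of 60 and 50:
$$H.M. = \frac{2}{\frac{1}{60} + \frac{1}{50}} = \frac{600}{11} \approx 54.55 \text{ km/hr}$$
For a frequency distribution in which the value $x_i$ occurs with frequency $f_i$,
$$H.M. = \frac{n}{\sum_{i=1}^{k} \frac{f_i}{x_i}}, \quad \text{where } n = \sum_{i=1}^{k} f_i \text{ is the total number of observations}$$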
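A minimal Python sketch of the harmonic mean (frequencies optional, defaulting to 1 each), applied to the two examples above and compared with total distance divided by total time.

```python
def harmonic_mean(values, freqs=None):
    """H.M. = n / sum(f_i / x_i); without frequencies each value counts once."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    return n / sum(f / x for x, f in zip(values, freqs))

print(harmonic_mean([2, 3, 6]))    # the example with values 2, 3 and 6
print(harmonic_mean([60, 50]))     # two equal 300 km legs at 60 and 50 km/hr

# Same answer from total distance / total time
print((300 + 300) / (300 / 60 + 300 / 50))
```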
Properties of harmonic mean
i. It is based on all observations in a distribution.
ii. It gives small weight to larger observations and larger weight to smaller observations.
iii. It is difficult to calculate and understand.
iv. It is an appropriate measure of central tendency in situations where the data are ratios, speeds or rates.
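3.3.4 The Median
The median of a set of items (numbers) arranged in order of magnitude (i.e. in array form) is the middle value, or the arithmetic mean of the two middle values. We shall denote the median of $x_1, x_2, \ldots, x_n$ by $\tilde{x}$. For ungrouped data the median is obtained by
$$\tilde{x} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ item of the array},$$
which is the middle item when n is odd and the mean of the $(n/2)$th and $(n/2 + 1)$th items when n is even.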
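A minimal sketch of the ungrouped median rule: sort the data into an array, then take the middle item (n odd) or the mean of the two middle items (n even). The checks reuse data sets that appear earlier in these notes.

```python
def median(data):
    """Median of ungrouped data via the ((n+1)/2)th item of the array."""
    arr = sorted(data)
    n = len(arr)
    mid = n // 2
    if n % 2 == 1:
        return arr[mid]                      # the ((n+1)/2)th item
    return (arr[mid - 1] + arr[mid]) / 2     # mean of the (n/2)th and (n/2+1)th items

print(median([21, 13, 54, 46, 32, 37]))      # six measurements (n even)
print(median([25, 10, 32, 18, 6, 93, 4]))    # raw data example (n odd)
```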
The mode, or modal value, is the most frequently occurring score/observation in a series, and is denoted by $\hat{x}$. Note that the mode may not exist in a series or, even if it does exist, it may not be unique.
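For grouped data, the mode is given by
$$\hat{x} = L_{mod} + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times W$$
where $L_{mod}$ = lower class boundary of the modal class,
$\Delta_1$ = the difference between the frequency of the modal class and that of the next lower class,
$\Delta_2$ = the difference between the frequency of the modal class and that of the next higher class,
W = the class width.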
The modal class is the class with the highest frequency in the distribution.
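A sketch of the grouped-mode formula, applied for illustration to the grouped age table in Chapter Two (boundaries 25.5, 30.5, …, 60.5; width 5). Two assumptions are labeled in the code: a tie for the highest frequency picks the first such class, and a missing neighbouring class counts as frequency 0.

```python
def grouped_mode(lower_boundaries, freq, width):
    """Mode of grouped data: L_mod + d1 / (d1 + d2) * W."""
    # Modal class: highest frequency (assumption: first class if tied)
    i = max(range(len(freq)), key=lambda k: freq[k])
    # Assumption: a missing neighbouring class is treated as frequency 0
    lower_neighbour = freq[i - 1] if i > 0 else 0
    upper_neighbour = freq[i + 1] if i + 1 < len(freq) else 0
    d1 = freq[i] - lower_neighbour
    d2 = freq[i] - upper_neighbour
    return lower_boundaries[i] + d1 / (d1 + d2) * width

# Grouped age table from Chapter Two
lower_boundaries = [25.5 + 5 * k for k in range(8)]
freq = [5, 5, 5, 9, 7, 1, 2, 6]
print(round(grouped_mode(lower_boundaries, freq, 5), 2))  # 43.83
```

Here the modal class is 40.5 to 45.5, with $\Delta_1 = 9 - 5 = 4$ and $\Delta_2 = 9 - 7 = 2$.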
Example 1: The marks obtained by ten students in a semester exam in statistics are 70, 65, 68, 70, 75, 73, 80, 70, 83 and 86. Find the mode of the students' marks.
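For ungrouped data, the mode can be found by counting occurrences. This sketch returns every most-frequent value (the mode need not be unique) and, as an assumed convention, an empty list when no value repeats (no mode).

```python
from collections import Counter

def modes(data):
    """All most frequent values; an empty list if no value repeats (no mode)."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)

marks = [70, 65, 68, 70, 75, 73, 80, 70, 83, 86]
print(modes(marks))  # [70]
```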
Merits of mode
- Mode is not affected by extreme values.
- Mode can be calculated even for open-ended intervals, and it is not necessary to know all observations.
Demerits of mode
- Mode may not exist in the series and if it exists it may not be a unique value.
- It does not fulfill most of the requirements of a good measure of central tendency
- It may be unrepresentative in many cases.
I. Quartiles: are values that divide the data set into four equal parts, denoted by $Q_1$, $Q_2$ and $Q_3$. The first quartile is also called the lower quartile and the third quartile the upper quartile. The second quartile is the median.
For ungrouped data:
Let $Q_j$ be the jth quartile value for j = 1, 2, 3. Then
$$Q_j = \left(\frac{j(n+1)}{4}\right)^{\text{th}} \text{ item}; \quad j = 1, 2, 3$$
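The positional rule can be sketched for quartiles, deciles and percentiles at once by passing the number of parts (4, 10 or 100). The text does not say how to handle a fractional position; this sketch assumes linear interpolation between the two neighbouring items, and clamps positions outside 1..n to the ends of the array.

```python
def positional_value(data, j, parts):
    """Value of the (j(n+1)/parts)th item of the array.

    parts = 4 gives quartiles, 10 deciles, 100 percentiles.
    Assumption: fractional positions use linear interpolation.
    """
    arr = sorted(data)
    n = len(arr)
    pos = j * (n + 1) / parts              # 1-based position in the array
    pos = min(max(pos, 1.0), float(n))     # assumption: clamp to the array ends
    k = int(pos)                           # whole part of the position
    frac = pos - k                         # fractional part
    if k == n:
        return arr[-1]
    return arr[k - 1] + frac * (arr[k] - arr[k - 1])

x = [21, 13, 54, 46, 32, 37]               # six measurements used earlier
print([positional_value(x, j, 4) for j in (1, 2, 3)])  # [19.0, 34.5, 48.0]
```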
For grouped data
We can apply the following formula:
$$Q_j = L_{Q_j} + \frac{\frac{jn}{4} - F_{Q_j}}{f_{Q_j}} \times W; \quad j = 1, 2, 3$$
where $Q_j$ = the jth quartile, which is to be worked out,
$L_{Q_j}$ = lower class boundary of the jth quartile class,
$F_{Q_j}$ = sum of the frequencies of all classes lower than the jth quartile class,
$f_{Q_j}$ = frequency of the jth quartile class, and W = class width.
The jth quartile class is the class with the smallest cumulative frequency greater than or equal to jn/4. It can be located by counting jn/4 of the frequencies beginning from the lowest class.
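A sketch of the grouped formula, applied for illustration to the grouped age table in Chapter Two (n = 40, width 5). Replacing the divisor 4 by 10 or 100 (the `parts` parameter, a name chosen here) gives the decile and percentile versions of the same formula.

```python
from itertools import accumulate

def grouped_quantile(lower_boundaries, freq, width, j, parts=4):
    """L + (j*n/parts - F) / f * W for quartiles (4), deciles (10), percentiles (100)."""
    n = sum(freq)
    target = j * n / parts
    cum = list(accumulate(freq))
    # Quantile class: the class with the smallest cumulative frequency >= target
    i = next(k for k, c in enumerate(cum) if c >= target)
    F = cum[i - 1] if i > 0 else 0     # total frequency of all lower classes
    return lower_boundaries[i] + (target - F) / freq[i] * width

lower_boundaries = [25.5 + 5 * k for k in range(8)]
freq = [5, 5, 5, 9, 7, 1, 2, 6]
for j in (1, 2, 3):
    print(f"Q{j} = {grouped_quantile(lower_boundaries, freq, 5, j):.2f}")
```

For this table the sketch gives Q1 = 35.50, Q2 = 43.28 and Q3 = 49.79.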
II. Deciles: are values dividing the data into ten equal parts, denoted by $D_1, D_2, \ldots, D_9$. The fifth decile is the median.
For ungrouped data
Let $D_j$ be the jth decile value for j = 1, 2, …, 9. Then
$$D_j = \left(\frac{j(n+1)}{10}\right)^{\text{th}} \text{ item}; \quad j = 1, 2, \ldots, 9$$
For grouped data
We can apply the following formula:
$$D_j = L_{D_j} + \frac{\frac{jn}{10} - F_{D_j}}{f_{D_j}} \times W; \quad j = 1, 2, \ldots, 9$$
Define the symbols in a similar way as we did in the case of quartiles.
The jth decile class is the class with the smallest cumulative frequency greater than or equal to jn/10. It can be located by counting jn/10 of the frequencies beginning from the lowest class.
III. Percentiles: are values that divide the data into one hundred equal parts, denoted by $P_1, P_2, \ldots, P_{99}$. The fiftieth percentile is the median.
For ungrouped data
Let $P_j$ be the jth percentile value for j = 1, 2, 3, …, 99. Then
$$P_j = \left(\frac{j(n+1)}{100}\right)^{\text{th}} \text{ item}; \quad j = 1, 2, 3, \ldots, 99$$
For grouped data
We can use the following formula:
$$P_j = L_{P_j} + \frac{\frac{jn}{100} - F_{P_j}}{f_{P_j}} \times W; \quad j = 1, 2, 3, \ldots, 99$$
Define the symbols in a similar way as we did in the case of quartiles.
The jth percentile class is the class with the smallest cumulative frequency greater than or equal to jn/100. It can be located by counting jn/100 of the frequencies beginning from the lowest class.
Interpretations
1. $Q_j$ is the value below which (25j) percent of the observations in the series are found (where j = 1, 2, 3). For instance, $Q_3$ is the value below which 75 percent of the observations in the given series are found.
2. $D_j$ is the value below which (10j) percent of the observations in the series are found (where j = 1, 2, …, 9). For instance, $D_4$ is the value below which 40 percent of the values in the series are found.
3. $P_j$ is the value below which j percent of the total observations are found (where j = 1, 2, 3, …, 99). For example, 73 percent of the observations in a given series are below $P_{73}$.
3.5 When to Use the Different Averages
The mean is appropriate if the data are quantitative and there are no extreme (abnormal) observations. For data having extreme values (or for qualitative data on an ordinal measurement scale), it is better to use the median as the measure of central tendency; the median is widely used in psychology, education and other social sciences. On the other hand, the mode is the best measure of central tendency for qualitative data on a nominal scale of measurement. It can also be used as a quick measure of central tendency for both qualitative and quantitative data.
Exercises
1. Explain the desirable properties of measures of central tendency.
2. Discuss the mathematical properties of arithmetic mean.
3. Given the following frequency distribution on wages per week of 100 workers in a certain
factory.
Wage class 39.5-44.5 44.5-49.5 49.5-54.5 54.5-59.5 59.5-64.5 64.5-69.5
No of workers 15 22 30 15 10 8
Calculate the average wage paid by the factory.
4. The mean salary paid to 1000 employees of an establishment was found to be 180.4. After the salaries had been disbursed, it was discovered that the salaries of two employees had been wrongly entered as 297 and 165; their correct salaries were 197 and 185. Find the correct average salary of the employees.
5. A tourist traveled 900 km by train at an average speed of 60 km/hr, 300 km by boat at an average speed of 25 km/hr, 400 km by plane at an average speed of 350 km/hr and finally 15 km by train at a speed of 25 km/hr. What is his average speed? (Use the concept of the harmonic mean.)
6. Given the following data:
Food items Quantity consumed Price (in birr) (per kg.)
Flour 500 kg. 3.25
Ghee 20 kg. 50.00
Sugar 30 kg. 8.00
Oil 40 kg. 20.00
Calculate the weighted mean price.
7. Determine an appropriate average for the following income distribution.
Income Groups: below 100 100-200 200-300 300-400 400-500 above 500
No. of persons: 5 10 18 30 20 17
8. The following table gives the mark distribution of 60 students in a mathematics test marked out of 10%.
Marks: 4.5 5 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0
No. of students 2 5 7 2 6 4 8 4 2 10 5 5
Find the values of $Q_1$, $Q_3$, $P_{30}$, $D_7$ and the modal mark.
Chapter Four
Measures of Dispersion (Variation)
Measures of variation are statistical measures, which provide ways of measuring the extent to
which the data are dispersed or spread out.
When two sets of data are expressed in different units, however, such as quintals of sugar versus tons of sugarcane, or when their average sizes are very different, such as managers' salaries versus workers' salaries, the absolute measures of dispersion are not comparable. In such cases measures of relative dispersion should be used.
4.3 Types of Measures of Dispersion
4.3.1 The Range and Relative Range
Range (R) is defined as the difference between the largest and the smallest observation in a given set of data. That is,
$$R = x_{\max} - x_{\min}$$
where $x_{\max}$ and $x_{\min}$ are the largest and the smallest observations in the series, respectively.
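In the case of grouped data, the range is found by taking the difference between the class mark of the last class and that of the first class. That is,
$$R = M_{\text{last}} - M_{\text{first}}$$
where $M_{\text{last}}$ and $M_{\text{first}}$ are the class marks of the last class and the first class, respectively.
The relative range (RR) divides the range by the sum of the two extreme values: $RR = \dfrac{x_{\max} - x_{\min}}{x_{\max} + x_{\min}}$ for ungrouped data and $RR = \dfrac{M_{\text{last}} - M_{\text{first}}}{M_{\text{last}} + M_{\text{first}}}$ for grouped data.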
Example 1: Find the range and relative range for the monthly salary of ten workers in a certain
paint factory given below.
462 480 534 624 498 552 606 588 516 570
Solution:
$x_{\max} = 624$ birr, $x_{\min} = 462$ birr
$$R = x_{\max} - x_{\min} = 624 \text{ birr} - 462 \text{ birr} = 162 \text{ birr}$$
$$RR = \frac{x_{\max} - x_{\min}}{x_{\max} + x_{\min}} = \frac{624 - 462}{624 + 462} = \frac{162}{1086} = 0.149$$
Example 2: Find the values of the range and relative range for the following frequency distribution, which shows the distribution of the maximum loads supported by a certain number of cables.
Maximum load (in kilonewtons)   Number of cables
93 – 97 2
98 – 102 5
103 – 107 12
108 – 112 17
113 – 117 14
118 – 122 6
123 – 127 3
128 – 132 1
Solution:
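$M_{\text{first}} = 95$ kN, $M_{\text{last}} = 130$ kN
$$R = M_{\text{last}} - M_{\text{first}} = 130 \text{ kN} - 95 \text{ kN} = 35 \text{ kN}$$
$$RR = \frac{M_{\text{last}} - M_{\text{first}}}{M_{\text{last}} + M_{\text{first}}} = \frac{130 - 95}{130 + 95} = \frac{35}{225} = 0.156$$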
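The two worked examples can be checked with a few lines of Python. This is a small sketch; the helper name `range_and_relative_range` is hypothetical, and the relative range is computed as (largest − smallest)/(largest + smallest), the form used in the solutions above. For grouped data the class marks are taken as the midpoints of the stated class limits.

```python
def range_and_relative_range(x_max, x_min):
    """Return R = x_max - x_min and RR = (x_max - x_min) / (x_max + x_min)."""
    r = x_max - x_min
    return r, r / (x_max + x_min)


# Example 1: ungrouped monthly salaries (birr)
salaries = [462, 480, 534, 624, 498, 552, 606, 588, 516, 570]
r1, rr1 = range_and_relative_range(max(salaries), min(salaries))

# Example 2: grouped data, using class marks of the first and last classes (kN)
first_class, last_class = (93, 97), (128, 132)
m_first = sum(first_class) / 2   # class mark 95.0
m_last = sum(last_class) / 2     # class mark 130.0
r2, rr2 = range_and_relative_range(m_last, m_first)

print(r1, round(rr1, 3))   # 162 0.149
print(r2, round(rr2, 3))   # 35.0 0.156
```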
The mean deviation (MD) measures the average deviation of a set of observations about their
central value, generally the mean or the median, ignoring the plus/minus sign of the deviations.
$$MD = \frac{\sum |x_i - A|}{n}$$
where $A$ is a central measure (the mean or the median).
In the case of grouped data, the formula for MD becomes
$$MD = \frac{\sum f_i |x_i - A|}{n}$$
where $x_i$ is the class mark of the $i$th class, $f_i$ is the frequency of the $i$th class and $n = \sum f_i$.
The mean deviation about the arithmetic mean is, therefore, given by
$$MD = \frac{\sum |x_i - \bar{x}|}{n} \quad \text{for ungrouped data}$$
$$MD = \frac{\sum f_i |x_i - \bar{x}|}{n} \quad \text{for a grouped frequency distribution}$$
and the mean deviation about the median $\tilde{x}$ of a grouped frequency distribution is
$$MD = \frac{\sum f_i |x_i - \tilde{x}|}{n}$$
where, in both grouped formulas, $x_i$ is the class mark of the $i$th class, $f_i$ is the frequency of the $i$th class and $n = \sum f_i$.
The coefficient of mean deviation (CMD) is the ratio of the mean deviation of the observations to
their appropriate measure of central tendency: the arithmetic mean or the median.
In general,
$$CMD = \frac{MD}{A}$$
where $A$ is a measure of central tendency: the arithmetic mean or the median.
That is, the CMD about the arithmetic mean is given by $CMD = \dfrac{MD}{\bar{x}}$, where MD is the mean deviation calculated about the arithmetic mean. On the other hand, the CMD about the median is given by $CMD = \dfrac{MD}{\tilde{x}}$, in which case MD is calculated about the median of the observations.
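As an illustration, the sketch below computes the mean deviation and the coefficient of mean deviation for the ten salaries of Example 1 (range section). It is a minimal sketch: `mean_deviation` is a hypothetical helper that handles grouped data through an optional list of frequencies (with class marks as the values), and only the standard-library `statistics.mean` and `statistics.median` are used. For this particular data set the mean and the median both equal 543 birr, so both mean deviations come out as 45 birr.

```python
import statistics


def mean_deviation(values, center, freqs=None):
    """MD = sum(f_i * |x_i - A|) / n, with n = sum(f_i).

    For ungrouped data every f_i is 1; for grouped data the x_i are class marks.
    """
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    return sum(f * abs(x - center) for x, f in zip(values, freqs)) / n


salaries = [462, 480, 534, 624, 498, 552, 606, 588, 516, 570]
mean = statistics.mean(salaries)
median = statistics.median(salaries)

md_mean = mean_deviation(salaries, mean)      # MD about the mean
md_median = mean_deviation(salaries, median)  # MD about the median
cmd_mean = md_mean / mean                     # CMD about the mean
cmd_median = md_median / median               # CMD about the median
print(md_mean, md_median, round(cmd_mean, 4))
```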
The Variance
Variance is the arithmetic mean of the squared deviations of the observations from their arithmetic mean.
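Population Variance ($\sigma^2$)
For ungrouped data
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{1}{N}\left[\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}\right]$$
where $\mu$ is the population arithmetic mean and $N$ is the total number of observations in the population.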
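For grouped data
$$\sigma^2 = \frac{\sum f_i (x_i - \mu)^2}{N} = \frac{1}{N}\left[\sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{N}\right]$$
where $\mu$ is the population arithmetic mean, $x_i$ is the class mark of the $i$th class, $f_i$ is the frequency of the $i$th class and $N = \sum f_i$.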
Sample Variance ($S^2$)
For ungrouped data
$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left[\sum x_i^2 - n\bar{x}^2\right]$$
where $\bar{x}$ is the sample arithmetic mean and $n$ is the total number of observations in the sample.
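For grouped data
$$S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left[\sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{n}\right]$$
where $\bar{x}$ is the sample arithmetic mean, $x_i$ is the class mark of the $i$th class, $f_i$ is the frequency of the $i$th class and $n = \sum f_i$.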
The Standard Deviation
Standard deviation is the positive square root of the variance.
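Population Standard Deviation ($\sigma$)
$$\sigma = \sqrt{\sigma^2}$$
where $\sigma^2$ is the population variance.
Sample Standard Deviation ($S$)
$$S = \sqrt{S^2}$$
where $S^2$ is the sample variance.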
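The definitional and computing (shortcut) forms of the variance give the same result, which the following sketch checks numerically. The function names are hypothetical; grouped data are handled through an optional list of frequencies with class marks as values, and `sample=True` divides by n − 1 while `sample=False` divides by N. Applied to the ten salaries of Example 1, the definitional form gives $S^2 = 2970$ (treating them as a sample) and $\sigma^2 = 2673$ (treating them as a population).

```python
import math


def variance(values, freqs=None, sample=True):
    """Definitional form: sum(f_i (x_i - mean)^2) divided by n - 1 (S^2) or by N (sigma^2)."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    ss = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return ss / (n - 1) if sample else ss / n


def variance_shortcut(values, freqs=None, sample=True):
    """Computing form: [sum(f_i x_i^2) - (sum(f_i x_i))^2 / n] divided by the same divisor."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    ss = sum_fx2 - sum_fx ** 2 / n
    return ss / (n - 1) if sample else ss / n


salaries = [462, 480, 534, 624, 498, 552, 606, 588, 516, 570]
s2 = variance(salaries)                     # sample variance S^2
sigma2 = variance(salaries, sample=False)   # ten values treated as a population
print(s2, math.sqrt(s2), sigma2, math.sqrt(sigma2))
```

The shortcut form avoids a second pass over the deviations, but the definitional form is numerically safer when the values are large and close together.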
Coefficient of Variation
The standard deviation is an absolute measure of dispersion. The corresponding relative measure
is known as the coefficient of variation (CV).
The coefficient of variation is used in problems where we want to compare the variability of two or more different series. It is the ratio of the standard deviation to the arithmetic mean, usually expressed as a percentage.
$$CV = \frac{S}{\bar{x}} \times 100\%$$
where $S$ is the standard deviation and $\bar{x}$ is the arithmetic mean of the observations.
A distribution with a smaller coefficient of variation is said to be less variable, or more consistent, more uniform or more homogeneous.
Example: Last semester, the students of the Biology and Chemistry Departments took the Stat 273 course. At the end of the semester, the following information on their scores was recorded.
Department    Mean score ($\bar{x}$)    Standard deviation ($S$)
Biology       79                        23
Chemistry     64                        11
Compare the relative dispersions of the two departments' scores using the appropriate measure.
Solution:
Biology Department:
$$CV = \frac{S}{\bar{x}} \times 100 = \frac{23}{79} \times 100 = 29.11\%$$
Chemistry Department:
$$CV = \frac{S}{\bar{x}} \times 100 = \frac{11}{64} \times 100 = 17.19\%$$
Interpretation: Since the CV of Biology Department students is greater than that of Chemistry
Department students, we can say that there is more dispersion relative to the mean in the
distribution of Biology students’ scores compared with that of Chemistry students.
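The comparison reduces to two divisions, sketched below with a hypothetical helper `coefficient_of_variation`. The means (79 and 64) and standard deviations (23 and 11) are the values used in the solution above.

```python
def coefficient_of_variation(sd, mean):
    """CV = S / mean * 100, expressed in percent."""
    return sd / mean * 100


cv_bio = coefficient_of_variation(23, 79)
cv_chem = coefficient_of_variation(11, 64)
more_variable = "Biology" if cv_bio > cv_chem else "Chemistry"
print(f"Biology: {cv_bio:.2f}%  Chemistry: {cv_chem:.2f}%")  # Biology: 29.11%  Chemistry: 17.19%
print(more_variable)                                         # Biology
```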
Standard Deviation
Merits:
– It is considered to be the best measure of dispersion.
– It is calculated based on all the observations in the series.
– It is capable of further algebraic treatment.
Demerits:
– If two series have different units of measurement, we cannot compare their variability just by comparing the values of their standard deviations.
– It is neither easy to calculate nor easy to understand.
– Similar to the variance, it gives more weight to extreme values and less to those near the mean.
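Standard Scores (Z-scores)
The standard score of an observation $x$ is $Z = \dfrac{x - \bar{x}}{S}$; it gives the number of standard deviations by which $x$ lies above ($Z > 0$) or below ($Z < 0$) the mean.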
Example: Two sections were given an exam in a course. The average score was 72 with standard
deviation of 6 for section 1 and 85 with standard deviation of 5 for section 2. Student A from
section 1 scored 84 and student B from section 2 scored 90. Who performed better relative to
his/her group?
Solution: Section 1: $\bar{x}_1 = 72$, $S_1 = 6$, and the score of student A is $x_A = 84$.
Section 2: $\bar{x}_2 = 85$, $S_2 = 5$, and the score of student B is $x_B = 90$.
$$Z_A = \frac{x_A - \bar{x}_1}{S_1} = \frac{84 - 72}{6} = 2.00$$
$$Z_B = \frac{x_B - \bar{x}_2}{S_2} = \frac{90 - 85}{5} = 1.00$$
From these two standard scores, we can conclude that student A performed better relative to his/her section, because his/her score is two standard deviations above the mean score of section 1, while the score of student B is only one standard deviation above the mean score of section 2.
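A sketch of the same comparison in Python, using a hypothetical helper `z_score` and the section means, standard deviations and scores given in the example:

```python
def z_score(x, mean, sd):
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_a = z_score(84, 72, 6)   # student A, section 1
z_b = z_score(90, 85, 5)   # student B, section 2
better = "A" if z_a > z_b else "B"
print(z_a, z_b, better)    # 2.0 1.0 A
```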
Exercises
1. Consider the following marks of 20 students in a biology test marked out of 20%.
Marks of Students’ 0-5 5-10 10-15 15-20 Total
Number of students 2 6 8 4 20
Find
i. Range
ii. Quartile deviation
iii. Mean and median deviation
iv. Variance and standard deviation
2. The final exam of a course consists of two exams: Mathematics and History. A student scored 66 in Mathematics and 80 in History. The average score of all students is 51 with a standard deviation of 12 in Mathematics, and 72 with a standard deviation of 16 in History.
a. In which subject did the student perform better?
b. In which subject are the results of all students more similar (consistent)?
Chapter Five
Elementary Probability
Counting techniques:
In order to determine the number of outcomes, one can use several rules of counting.
1. Multiplication rule: if a sequence of $n$ events is such that the first event has $k_1$ possibilities, the second event has $k_2$ possibilities, …, and the $n$th event has $k_n$ possibilities, then the total number of possibilities for the sequence is $k_1 \cdot k_2 \cdots k_n$.
Example: The personnel department of a large corporation wishes to issue each employee an ID card with two letters followed by two digits. How many different ID cards can be issued?
Solution
K1 K2 K3 K4
26 26 10 10
Thus the total number of ID cards that could be issued is
26 × 26 × 10 × 10 = 67,600 (with repetition)
26 × 25 × 10 × 9 = 58,500 (without repetition)
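The multiplication rule can be checked by brute force: the sketch below lists every possible ID card and counts them. It uses only the standard library (`math.prod` needs Python 3.8 or later) and assumes the 26 upper-case English letters and the 10 decimal digits.

```python
import math
import string
from itertools import permutations, product

letters = string.ascii_uppercase   # 26 letters
digits = string.digits             # 10 digits

# Multiplication rule: k1 * k2 * k3 * k4
with_repetition = math.prod([26, 26, 10, 10])
without_repetition = math.prod([26, 25, 10, 9])

# Brute-force check: enumerate every ID card and count them
count_with = sum(1 for _ in product(letters, letters, digits, digits))
count_without = sum(1 for _ in product(permutations(letters, 2), permutations(digits, 2)))

print(with_repetition, count_with)         # 67600 67600
print(without_repetition, count_without)   # 58500 58500
```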
2. Permutation: an arrangement of $n$ objects in a specific order. In this case order is crucial.
a) The number of permutations of $n$ objects taken all together is $n!$, i.e. $n!/(n-n)! = n!/0! = n!$.
b) The number of arrangements of $n$ distinct objects in a specific order, taking $r$ objects at a time, is
$${}_nP_r = \frac{n!}{(n-r)!} = n(n-1)(n-2)\cdots(n-r+1)$$
c) The number of permutations of $n$ objects in which $k_1$ are alike, $k_2$ are alike, …, $k_m$ are alike ($k_1 + k_2 + \cdots + k_m = n$) is
$$\frac{n!}{k_1!\,k_2!\cdots k_m!}$$
Example: A photographer wants to arrange 3 persons in a row for a photograph. How many different photographs are possible?
Solution:
Assume the 3 persons are Aster (A), Lemma (L) and Yared (Y), so n = 3.
Since n! = 3! = 3 × 2 × 1 = 6, there are 6 possible arrangements: ALY, AYL, LAY, LYA, YLA and YAL.
Example 2: Fifteen athletes, including Haile, entered a race.
a) In how many different ways could prizes for the first, second and third places be awarded?
b) How many of the triplets just counted have Haile in the first position?
Solution:
a) 15 objects taken 3 at a time: ${}_{15}P_3 = 15!/(15-3)! = 15 \times 14 \times 13 = 2730$
b) With Haile fixed in first place, the remaining two places are filled from the other 14 athletes: ${}_{14}P_2 = 14!/(14-2)! = 14 \times 13 = 182$
3. Combination: a counting technique in which the order of the objects is immaterial. It is a selection of $r$ objects from a collection of $n$ objects ($r \le n$) without regard to order. The number of combinations of $n$ objects taken $r$ at a time is
$${}_nC_r = \frac{n!}{(n-r)!\,r!}$$
Example: In a club containing 7 members, a committee of 3 people is to be formed. In how many ways can the committee be formed?
Solution: ${}_7C_3 = \dfrac{7!}{(7-3)!\,3!} = 35$
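The permutation and combination results above can be reproduced with Python's standard library; `math.perm` and `math.comb` require Python 3.8 or later. The helper `arrangements_with_alike` is a hypothetical name for rule (c), n!/(k₁!k₂!…kₘ!).

```python
import math
from itertools import permutations


def arrangements_with_alike(group_sizes):
    """n! / (k1! k2! ... km!) where group_sizes = [k1, ..., km] and n = sum of the k_i."""
    result = math.factorial(sum(group_sizes))
    for k in group_sizes:
        result //= math.factorial(k)
    return result


photos = sorted("".join(p) for p in permutations("ALY"))
print(len(photos), photos)      # 6 ['ALY', 'AYL', 'LAY', 'LYA', 'YAL', 'YLA']
print(math.perm(15, 3))         # 2730 ways to award 1st, 2nd and 3rd prizes
print(math.perm(14, 2))         # 182 triplets with Haile first
print(math.comb(7, 3))          # 35 committees of 3 from 7 members
print(arrangements_with_alike([2, 1]))  # 3: AAB, ABA, BAA
```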
Basic approaches to probability
Classical approach: uses the sample space to determine the numerical probability that an event will happen. If an experiment has $n$ equally likely outcomes and event $E$ occurs in $k$ of them, the probability of $E$, denoted by $P(E)$, is defined as
$$P(E) = \frac{n(E)}{n(S)} = \frac{k}{n}$$
$$= \sum_{i=1} P(A_i)$$
Then the ratio $k/n$ is called the relative frequency of event A:
$$P(A) = \frac{\text{number of times event A has occurred}}{\text{total number of observations}} = \frac{k}{n}$$
In other words, given a frequency distribution, the probability of an event E being in a given class is
$$P(E) = \frac{\text{frequency of the class}}{\text{total frequency in the distribution}}$$
Example: The National Center for Health Statistics reported that of every 539 deaths in recent years, 24 resulted from automobile accidents, 182 from cancer and 333 from other diseases. What is the probability that a particular death is due to an automobile accident?
Solution: P(automobile) = (deaths due to automobile accidents)/(total deaths) = 24/539 ≈ 0.045
Rules of probability
Rule 1: Let A be an event and $A'$ the complement of A with respect to the sample space of an experiment. Then $P(A') = 1 - P(A)$.
Proof: Let S be the sample space. Then
$$S = A \cup A', \qquad A \cap A' = \emptyset \;\Rightarrow\; P(A \cap A') = 0$$
$$P(S) = P(A \cup A') = P(A') + P(A) - P(A \cap A')$$
$$1 = P(A') + P(A) - 0 \;\Rightarrow\; P(A') = 1 - P(A)$$
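Rule 2: Let A and B be events of a sample space S. Then
$$P(A' \cap B) = P(B) - P(A \cap B)$$
Proof: $B = S \cap B = (A \cup A') \cap B = (A \cap B) \cup (A' \cap B)$, where $A \cap B$ and $A' \cap B$ are mutually exclusive.
Case 1: if $A \cap B \ne \emptyset$, then $P(B) = P(A \cap B) + P(A' \cap B)$, so $P(A' \cap B) = P(B) - P(A \cap B)$.
Case 2: if $A \cap B = \emptyset$, then $P(A \cap B) = P(\emptyset) = 0$, so $P(B) = P(A' \cap B)$, which again agrees with $P(A' \cap B) = P(B) - P(A \cap B)$.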
Example: Two fair dice are rolled. What is the probability that the sum of the spots is divisible by 2 or by 3?
Solution:
S = {(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(2,1),(2,2),(2,3),(2,4),(2,5),(2,6),(3,1),(3,2),(3,3),(3,4),(3,5),(3,6),(4,1),(4,2),(4,3),(4,4),(4,5),(4,6),(5,1),(5,2),(5,3),(5,4),(5,5),(5,6),(6,1),(6,2),(6,3),(6,4),(6,5),(6,6)}
This sample space has 6 × 6 = 36 elements. Let E1 be the event that the sum of the spots on the dice is divisible by 2 and E2 the event that the sum is divisible by 3, so that E1 ∩ E2 is the event that the sum is divisible by 6. Then
$$P(E_1 \text{ or } E_2) = P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2) = \frac{18}{36} + \frac{12}{36} - \frac{6}{36} = \frac{24}{36} = \frac{2}{3}$$
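Enumerating the 36 outcomes confirms both the counts and the addition rule. This sketch assumes two fair six-sided dice and uses `fractions.Fraction` so the probabilities stay exact.

```python
from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))          # 36 ordered pairs
e1 = {pair for pair in sample_space if sum(pair) % 2 == 0}   # sum divisible by 2
e2 = {pair for pair in sample_space if sum(pair) % 3 == 0}   # sum divisible by 3
n = len(sample_space)

p_direct = Fraction(len(e1 | e2), n)                         # count the union directly
p_rule = (Fraction(len(e1), n) + Fraction(len(e2), n)
          - Fraction(len(e1 & e2), n))                       # addition rule
print(len(e1), len(e2), len(e1 & e2), p_direct, p_rule)      # 18 12 6 2/3 2/3
```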
Conditional probability: the conditional probability of an event A given B is defined as the probability that event A occurs given that event B has already occurred:
$$P(A/B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0$$
Remarks: (i) $P(A \cap B)$ and $P(B)$ are computed with respect to the original sample space.
(ii) $P(S/B) = P(S \cap B)/P(B) = P(B)/P(B) = 1$.
(iii) $P(B/S) = P(B)$, because $P(B/S) = P(B \cap S)/P(S) = P(B)/1 = P(B)$.
(iv) Two events are independent if the occurrence of B does not affect the occurrence of A. If A and B are independent, then $P(A/B) = P(A)$ and $P(B/A) = P(B)$. Since $P(A/B) = P(A \cap B)/P(B)$,
$$P(A \cap B) = P(A/B) \cdot P(B) = P(A) \cdot P(B)$$
Example: Suppose that an office has 100 calculating machines. Some of them use electric power (E) while others are manual (M), and some machines are new (N) while others are used (U). The table below gives the number of machines in each category. A person enters the office, picks a machine at random and discovers that it is new. What is the probability that it uses electric power?
E M Total
N 40 30 70
U 20 10 30
Total 60 40 100
Solution: $P(E/N) = \dfrac{P(E \cap N)}{P(N)} = \dfrac{40/100}{70/100} = \dfrac{40}{70} = \dfrac{4}{7} \approx 0.571$
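The conditional probability can be computed straight from the table counts. In this sketch the dictionary layout (age, power) → count is an assumed representation of the table, and `Fraction` keeps the answer exact. The last line also checks the independence condition P(E ∩ N) = P(E)·P(N), which fails here.

```python
from fractions import Fraction

# Counts from the table: (age, power) -> number of machines
counts = {("N", "E"): 40, ("N", "M"): 30, ("U", "E"): 20, ("U", "M"): 10}
total = sum(counts.values())

p_e_and_n = Fraction(counts[("N", "E")], total)
p_n = Fraction(counts[("N", "E")] + counts[("N", "M")], total)
p_e = Fraction(counts[("N", "E")] + counts[("U", "E")], total)

p_e_given_n = p_e_and_n / p_n
print(p_e_given_n)               # 4/7
print(p_e_and_n == p_e * p_n)    # False: E and N are not independent
```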
Bayes' theorem
Theorem 1.1: Let $\{E_1, E_2, \ldots, E_n\}$ be a partition of the sample space S with $P(E_i) \ne 0$ for $i = 1, 2, \ldots, n$, and let E be any event. Then
$$P(E) = P(E_1)P(E/E_1) + P(E_2)P(E/E_2) + \cdots + P(E_n)P(E/E_n) = \sum_{i=1}^{n} P(E_i)\,P(E/E_i)$$
and, for each $k = 1, 2, \ldots, n$,
$$P(E_k/E) = \frac{P(E_k)\,P(E/E_k)}{\sum_{i=1}^{n} P(E_i)\,P(E/E_i)}$$
Example: Suppose that three machines A1, A2 and A3 produce 60%, 30% and 10%, respectively, of the total production, and that the percentages of defective items produced by these machines are 2%, 4% and 6%, respectively.
a) If an item is selected at random, find the probability that the item is defective.
b) Assuming that an item selected at random is found to be defective, find the probability that it was produced on machine A1.
Solution: Let B be the event of selecting a defective item at random, and let E1, E2 and E3 be the events that the item was produced on machines A1, A2 and A3, respectively. Then
P(B/E1) = 2% = 0.02, P(B/E2) = 4% = 0.04 and P(B/E3) = 6% = 0.06
$$\begin{aligned} P(B) &= P(B \cap [E_1 \cup E_2 \cup E_3]) \\ &= P([B \cap E_1] \cup [B \cap E_2] \cup [B \cap E_3]) \\ &= P(B \cap E_1) + P(B \cap E_2) + P(B \cap E_3) \\ &= P(E_1)P(B/E_1) + P(E_2)P(B/E_2) + P(E_3)P(B/E_3) \\ &= 0.6 \times 0.02 + 0.3 \times 0.04 + 0.1 \times 0.06 \\ &= 0.03 \end{aligned}$$
We use Bayes' formula:
$$P(E_1/B) = \frac{P(E_1 \cap B)}{P(B)} = \frac{P(E_1)\,P(B/E_1)}{\sum_{i=1}^{3} P(E_i)\,P(B/E_i)} = \frac{0.6 \times 0.02}{0.03} = 0.4$$
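The total-probability and Bayes steps above translate directly into a few list operations. This sketch uses the production shares 0.6, 0.3 and 0.1 and the defect rates 0.02, 0.04 and 0.06 that appear in the solution; results are rounded only to hide floating-point noise.

```python
priors = [0.6, 0.3, 0.1]           # P(E1), P(E2), P(E3): share of production
defect_rates = [0.02, 0.04, 0.06]  # P(B/E1), P(B/E2), P(B/E3)

joint = [p * d for p, d in zip(priors, defect_rates)]   # P(Ei) P(B/Ei)
p_defective = sum(joint)                                # total probability, P(B)
posteriors = [j / p_defective for j in joint]           # Bayes: P(Ei/B)

print(round(p_defective, 4))                # 0.03
print([round(p, 4) for p in posteriors])    # [0.4, 0.4, 0.2]
```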
Exercise
1. For two equally likely, exhaustive and independent events A and B, $P(A \cap B) = $ ------
2. Show that i) $P(A/S) = P(A)$ and ii) $P(S/A) = 1$.
3. From a class of 20 female and 30 male students, the department head wants to select 5 female and 7 male students for a specific meeting.
a. In how many possible ways can the required students be selected without any restriction?
b. What is the probability that 6 male and 3 female students are included in the meeting?
4. Five biology, 2 statistics and 3 physics books are to be arranged in a row. If books of the same subject are not distinguishable from each other, how many different arrangements are possible?
5. There are 12 ways in which a manufactured item can have a minor defect and 10 ways in which it can have a major defect. In how many ways can
a. one minor and one major defect occur?
b. two minor and two major defects occur?
6. Out of 3 mathematicians and 7 physicists, a committee consisting of 2 mathematicians and 3 physicists is to be formed.
i. In how many ways can this be done if
a. any mathematicians and physicists can be included?
b. one particular physicist must be on the committee?
Chapter Six
Probability Distributions
Probability distribution: a list of all the possible outcomes of an experiment and the probability associated with each outcome.
Example: Suppose we are interested in the number of heads showing face up on 3 tosses of a coin. This is the experiment, and the possible outcomes are 0 heads, 1 head, 2 heads and 3 heads. What is the probability distribution for the number of heads?
Solution: The experiment has 8 possible outcomes, and below is the list of all the outcomes.
Possible result   1st toss   2nd toss   3rd toss   No. of heads
1. T T T 0
2. T T H 1
3. T H T 1
4. T H H 2
5. H T T 1
6. H T H 2
7. H H T 2
8. H H H 3
From the above table, the probability distribution for the number of heads is
No. of heads, x P (outcome), P (x)
0 1/8
1 3/8
2 3/8
3 1/8
Total 1
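The table can be generated rather than written by hand. This sketch assumes a fair coin, so that the 8 outcomes are equally likely, and counts the heads in each outcome:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))          # 8 equally likely outcomes
heads = Counter(outcome.count("H") for outcome in outcomes)
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(heads.items())}

for x, p in distribution.items():
    print(x, p)   # 0 1/8, 1 3/8, 2 3/8, 3 1/8 (one pair per line)
```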
6.1. Random variables.
A random variable is a quantity resulting from an experiment that can assume different values.
In any experiment of chance, the outcomes occur randomly. For example, rolling a single die is
an experiment; and any one of the six possible outcomes can occur at a time.
A random variable may be either discrete or continuous.
i. Discrete random variable: a variable that results from counting and can assume only certain
clearly separated values of some item of interest.
Example: The number of heads in flipping a fair coin 5 times.
ii. Continuous random variable: a variable that results from measuring and can take any value within a certain range of values.
Example: The distance between Sodo and Addis Ababa could be 330 km, 330.5 km, 331.5 km and so on, depending on the accuracy of our measuring device.
6.2. Discrete probability distributions (probability mass function), expectation and variance
of discrete random variable
If we organize the values of a discrete random variable and their probabilities in a probability distribution, the distribution is called a discrete probability distribution; it is also called a probability mass function (pmf). It can be summarized by its mean and variance.
Mean: The mean of a probability distribution is also referred to as the expected value, E(x), and is given by
$$\mu = E(x) = \sum x\,p(x)$$
where $p(x)$ is the probability that the random variable takes the value $x$.
Variance and standard deviation: Although the mean is a typical value used to summarize a discrete probability distribution, it does not describe the spread of the distribution; the variance does this.
$$\sigma^2 = \sum (x - \mu)^2\,p(x) = \sum x^2\,p(x) - \mu^2, \qquad \sigma = \sqrt{\sigma^2}$$
Example: the following is the probability distribution for the number of cars a company expects
to sell on a particular day.
No. of cars sold, x   Probability, P(x)
0 0.1
1 0.2
2 0.3
3 0.3
4 0.1
Total 1.0
1. What type of distribution is it?
2. On a typical day, how many cars does the company expect to sell?
3. What is the variance of the distribution? What is the standard deviation?
Solution:
1. It is a discrete probability distribution.
2. $\mu = E(x) = \sum x\,p(x) = 0(0.1) + 1(0.2) + 2(0.3) + 3(0.3) + 4(0.1) = 2.1$
Interpretation: Over a large number of days, the company expects to sell 2.1 cars a day on average. Of course, it is not possible to sell exactly 2.1 cars on any particular day; the mean is a long-run average, which is why it is called the expected value.
3. $\sigma^2 = \sum x^2 p(x) - \mu^2 = \left(0^2(0.1) + 1^2(0.2) + \cdots + 4^2(0.1)\right) - (2.1)^2 = 5.7 - 4.41 = 1.29$
$\sigma = \sqrt{\sigma^2} = \sqrt{1.29} \approx 1.136$
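The mean, variance and standard deviation of the car-sales distribution can be computed from the pmf as sketched below. The dictionary representation of the pmf is an assumption made for illustration; the check confirms that the probabilities sum to 1 before they are used.

```python
import math

pmf = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.3, 4: 0.1}   # x -> P(x)
assert math.isclose(sum(pmf.values()), 1.0)       # a valid pmf sums to 1

mean = sum(x * p for x, p in pmf.items())                    # E(x) = sum x p(x)
variance = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2  # sum x^2 p(x) - mu^2
sd = math.sqrt(variance)
print(round(mean, 2), round(variance, 2), round(sd, 3))      # 2.1 1.29 1.136
```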
6.3. Common discrete probability distributions
1. Binomial distribution.
It is used to represent the probability distribution of discrete random variables. Binomial means
two categories. The successive repetition of an observation (trial) may result in an outcome which
possesses or which does not possess a specified character. Our primary interest will be either of
these possibilities. Conventionally, the outcome of primary interest is termed as success. The
alternative outcome is termed as failure. These terminologies are used irrespective of the nature
of the outcome. For example, non-germination of a seed may be termed as success.
Properties:
1. There must be only two mutually exclusive outcomes: success or failure.
2. The probability of success, p, and the probability of failure, q = 1 − p, remain constant from one trial to another.
3. The trials are independent: the outcome of one trial does not affect the outcome of any other trial.
4. The experiment is repeated a fixed number of times, n.
Example: The coin flip experiment has only two possible outcomes: head or tail. The probability
of each is known and constant from one trial to another. We can flip a coin many times.
The binomial distribution is computed by
$$P(x) = {}_nC_x\, p^x q^{n-x}, \quad x = 0, 1, \ldots, n$$
C = combination
n= number of trials
x=number of successes
p=the probability of success
q=1-p=the probability of failure
Mean of a binomial distribution
$$\mu = np$$
Variance of a binomial distribution
$$\sigma^2 = npq$$
Example: There are 5 flights daily from Addis Ababa to Washington. Suppose the probability that any flight arrives late is 0.2. What is the probability that
a. None of the flights are late today?
b. Exactly one flight is late today?
c. Construct the entire probability distribution
d. What is the probability that less than 3 flights are late?
e. What is the probability that more than 4 flights are late?
f. Between 2 and 4 (inclusive) flights are late?
g. Exactly 2 flights are not late?
h. What is the mean?
i. What is the variance?
Solution: Given that the probability that a particular flight is late is 0.2, the probability that a particular flight is not late is 0.8. There are 5 flights, so n = 5, and x refers to the number of successes. In questions a to f we are asked about late flights, so here success = late flight. Then p = 0.2 and q = 0.8.
a. P(none of the flights are late today) = P(0 flights are late) = P(x = 0)
$$P(0) = {}_5C_0\,(0.2)^0 (0.8)^{5} = 0.3277$$
b. P(exactly one flight is late today) = P(1 flight is late) = P(x = 1)
$$P(1) = {}_5C_1\,(0.2)^1 (0.8)^{4} = 0.4096$$
c. The entire distribution is
Number of late flights, x   P(x)
0 0.3277
1 0.4096
2 0.2048
3 0.0512
4 0.0064
5 0.0003
Total 1.0000
d. P (less than 3 flights are late today) = P (x < 3) = P (x = 0) + P (x = 1) + P (x = 2)
From the above table P (x < 3) = 0.3277 + 0.4096 + 0.2048 = 0.9421
e. P (x > 4) = P (x = 5) = 0.0003
f. P (2 ≤ x ≤ 4) = P (x = 2) + P (x = 3) + P (x = 4) = 0.2048 + 0.0512 + 0.0064 = 0.2624
g. P (exactly 2 flights are not late) = ?
Here we are asked about the not late flights, so we let success = not late flights.
So p=0.8, and q=0.2
Then P(exactly 2 flights are not late) $= P(2) = {}_5C_2\,(0.8)^2 (0.2)^{3} = 0.0512$
h. $\mu = np = 5 \times 0.2 = 1$ late flight (or $5 \times 0.8 = 4$ flights that are not late)
i. $\sigma^2 = npq = 5 \times 0.2 \times 0.8 = 0.8$
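The whole flight example can be reproduced from the binomial formula. This sketch defines a hypothetical helper `binomial_pmf` on top of `math.comb` (Python 3.8 or later) and then sums the table entries for the range questions.

```python
import math


def binomial_pmf(x, n, p):
    """P(x) = nCx * p^x * q^(n - x) with q = 1 - p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 5, 0.2                 # 5 flights, P(late) = 0.2
dist = {x: binomial_pmf(x, n, p) for x in range(n + 1)}
for x, px in dist.items():
    print(x, round(px, 4))    # reproduces the table in part c

p_fewer_than_3 = sum(dist[x] for x in range(3))    # d. P(x < 3)
p_more_than_4 = dist[5]                             # e. P(x > 4)
p_2_to_4 = sum(dist[x] for x in range(2, 5))        # f. P(2 <= x <= 4)
p_two_not_late = binomial_pmf(2, n, 1 - p)          # g. success = not late
mean, variance = n * p, n * p * (1 - p)             # h. and i.
```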
2. The Poisson distribution
The Poisson distribution is also used to represent the probability distribution of a discrete
random variable. It is employed in describing random events that occur rarely over some
unit of time or space.
Examples of events for which the Poisson probability function can be used:
Number of telephone calls per hour
Number of typing errors per page
Number of accidents on a particular road per day
Hospital emergencies per day,
etc
Assumptions:
1. The probability of occurrence of an event is the same for any two intervals of time or space of equal length.
2. The occurrence of an event in any interval is independent of the occurrence in any other
interval.
Having these assumptions, the Poisson distribution is given by the function
$$P(x) = \frac{\mu^x e^{-\mu}}{x!}, \quad x = 0, 1, 2, \ldots$$
where x = the number of times the event has occurred,
μ = the mean number of occurrences per unit of time or space,
e = 2.71828, the base of the natural logarithm system.
Example: Simple observation over the past 80 hours has shown that 800 customers have entered
the shop. What is the probability that
a. exactly 5 customers will arrive during any given hour?
b. more than 3 customers will arrive during any given hour?
c. exactly 5 customers will arrive during any 30 minutes?
Solution: $\mu = \dfrac{800 \text{ customers}}{80 \text{ hours}} = 10$ customers per hour
a. $P(x = 5) = \dfrac{10^5 e^{-10}}{5!} = 0.0378$
b. P(x > 3) = P(4) + P(5) + …
By the complement rule discussed earlier, P(x > 3) = 1 − P(x ≤ 3):
$$P(x > 3) = 1 - [P(0) + P(1) + P(2) + P(3)] = 1 - \left[\frac{10^0 e^{-10}}{0!} + \frac{10^1 e^{-10}}{1!} + \frac{10^2 e^{-10}}{2!} + \frac{10^3 e^{-10}}{3!}\right] = 1 - 0.0103 = 0.9897$$
c. P(x = 5 in 30 minutes)
Here, as we are asked about a 30-minute period, we should convert the μ value to 30 minutes; thus
$$\mu = \frac{800 \text{ customers}}{80 \text{ hours}} = \frac{10 \text{ customers}}{60 \text{ minutes}} = \frac{5 \text{ customers}}{30 \text{ minutes}}$$
$$P(x = 5) = \frac{5^5 e^{-5}}{5!} = 0.175$$
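A short Python sketch of the three Poisson calculations; `poisson_pmf` is a hypothetical helper implementing the formula above with the standard `math` module. Note that μ is rescaled to the 30-minute period before part c, exactly as in the solution.

```python
import math


def poisson_pmf(x, mu):
    """P(x) = mu^x e^(-mu) / x!"""
    return mu ** x * math.exp(-mu) / math.factorial(x)


mu_hour = 800 / 80                       # 10 customers per hour
p_a = poisson_pmf(5, mu_hour)            # a. exactly 5 in an hour
p_b = 1 - sum(poisson_pmf(x, mu_hour) for x in range(4))  # b. 1 - P(x <= 3)
mu_half_hour = mu_hour * 30 / 60         # 5 customers per 30 minutes
p_c = poisson_pmf(5, mu_half_hour)       # c. exactly 5 in 30 minutes
print(round(p_a, 4), round(p_b, 4), round(p_c, 3))
```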
3. Hypergeometric distribution
When sampling without replacement from a relatively small population, the probability of success does not remain constant from trial to trial, so the binomial distribution should not be used. Instead, the hypergeometric distribution should be applied.
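Assumptions:
If f(x) is the pdf of a continuous random variable x, then
1. $f(x) \ge 0$ for all x
2. $\displaystyle\int_{-\infty}^{\infty} f(x)\,dx = 1$
i.e. the total area under the graph of f(x) must equal 1, since the sum of the relative frequencies is 1.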
Example: The diameter of an electronic cable, say x, is assumed to be a continuous random variable with pdf $f(x) = 6x(1-x)$, $0 \le x \le 1$.
1. Check that f(x) is a pdf.
2. Determine the number b such that $P(x < b) = P(x > b)$.
ii. $$\int_0^1 6x(1-x)\,dx = \int_0^1 (6x - 6x^2)\,dx = \int_0^1 6x\,dx - \int_0^1 6x^2\,dx = \left[\frac{6x^2}{2}\right]_0^1 - \left[\frac{6x^3}{3}\right]_0^1 = 3 - 2 = 1$$
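Example: Calculate E(x) and Var(x) for the following function:
$f(x) = 2x$, $0 \le x \le 1$
Solution: a. $$E(x) = \int_0^1 x f(x)\,dx = \int_0^1 x(2x)\,dx = \left[\frac{2x^3}{3}\right]_0^1 = \frac{2}{3}$$
b. $$Var(x) = \int_0^1 x^2 f(x)\,dx - [E(x)]^2 = \int_0^1 2x^3\,dx - \left(\frac{2}{3}\right)^2 = \left[\frac{2x^4}{4}\right]_0^1 - \frac{4}{9} = \frac{1}{2} - \frac{4}{9} = \frac{1}{18}$$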
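The integrals in these two examples can be checked numerically. The sketch below uses a hand-written composite Simpson's rule (a swapped-in numerical technique, not the analytic method of the text); the function names are hypothetical, and the integrals are taken only over the support [0, 1] because both densities are zero elsewhere. Simpson's rule is exact for polynomials of degree three or less, so the results match 1, 2/3 and 1/18 up to rounding error.

```python
def simpson(f, a, b, n=100):
    """Composite Simpson's rule on [a, b] with an even number n of sub-intervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def f_cable(x):
    """pdf f(x) = 6x(1 - x) on [0, 1]."""
    return 6 * x * (1 - x)


def f_linear(x):
    """pdf f(x) = 2x on [0, 1]."""
    return 2 * x


area = simpson(f_cable, 0, 1)                          # total area, should be 1
mean = simpson(lambda x: x * f_linear(x), 0, 1)        # E(x) = 2/3
second_moment = simpson(lambda x: x ** 2 * f_linear(x), 0, 1)
var = second_moment - mean ** 2                        # Var(x) = 1/18
print(area, mean, var)
```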
The cumulative distribution function (cdf), F(x)
If x is a continuous random variable with pdf f(x), then
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt, \quad -\infty < x < \infty$$
Properties
1. $0 \le F(x) \le 1$
2. $F'(x) = f(x)$
3. $F(-\infty) = 0$, $F(\infty) = 1$
4. $P(a \le x \le b) = F(b) - F(a)$
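Example: Given $f(x) = 6x(1-x)$, $0 \le x \le 1$,
1. Find F(x).
2. What is $P(0.3 \le x \le 0.8)$?
Solution: 1. $$F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_0^x 6t(1-t)\,dt = \int_0^x 6t\,dt - \int_0^x 6t^2\,dt = \left[\frac{6t^2}{2}\right]_0^x - \left[\frac{6t^3}{3}\right]_0^x, \quad 0 \le x \le 1$$
$$\Rightarrow F(x) = 3x^2 - 2x^3$$
2. $P(0.3 \le x \le 0.8) = F(0.8) - F(0.3) = \left(3(0.8)^2 - 2(0.8)^3\right) - \left(3(0.3)^2 - 2(0.3)^3\right) = 0.896 - 0.216 = 0.68$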
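A minimal sketch of the cdf found above, $F(x) = 3x^2 - 2x^3$ on [0, 1] (taken as 0 below the support and 1 above it), used to evaluate the interval probability via property 4. The central-difference check of property 2, $F'(x) = f(x)$, at x = 0.3 is an illustrative addition; the step size `h` is an arbitrary choice.

```python
def cdf(x):
    """F(x) = 3x^2 - 2x^3 for the pdf f(x) = 6x(1 - x) on [0, 1]."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    return 3 * x ** 2 - 2 * x ** 3


prob = cdf(0.8) - cdf(0.3)           # property 4: P(0.3 <= x <= 0.8)

# Property 2 check: the numerical derivative of F should match f(0.3) = 6(0.3)(0.7)
h = 1e-5
slope = (cdf(0.3 + h) - cdf(0.3 - h)) / (2 * h)
print(round(prob, 4), round(slope, 4))
```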
4. The probability that a random variable will have a value between any two points is equal to the area under the curve between those points.
5. The height of the normal curve attains its maximum at $X = \mu$; this implies that the mean and the mode coincide (are equal).
6.4.2 Standard normal Distribution
It is a normal distribution with mean 0 and variance 1. A normal distribution can be converted to the standard normal distribution as follows. If X has a normal distribution with mean μ and standard deviation σ, then the standard normal variate Z is given by
$$Z = \frac{X - \mu}{\sigma}$$
and its probability density function is
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad -\infty < z < \infty$$
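Given a normally distributed random variable X with mean μ and standard deviation σ,
$$P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right), \qquad P(X < a) = P\left(\frac{X - \mu}{\sigma} < \frac{a - \mu}{\sigma}\right) = P\left(Z < \frac{a - \mu}{\sigma}\right)$$
since $Z = \dfrac{X - \mu}{\sigma}$ is a standard normal random variable.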
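i.e. the probabilities of interest are areas under the standard normal curve such as
[Figure: shaded areas under the standard normal curve for i) P(Z < Z0), ii) P(Z > Z0), iii) P(Z1 < Z < Z2) with Z1 < 0 < Z2, and iv) P(Z1 < Z < Z2) with 0 < Z1 < Z2]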
As the value of σ increases, the normal curve becomes flatter and more spread out, and vice versa.
Solution: a) P(−2.2 < Z < 1.2) = P(0 < Z < 1.2) + P(−2.2 < Z < 0)
= P(0 < Z < 1.2) + P(0 < Z < 2.2)
= 0.3849 + 0.4861
= 0.8710
b) P(Z > 1.05) = 0.5 − P(0 < Z < 1.05)
= 0.5 − 0.3531 = 0.1469
c) P(0 < Z < 0.96) = 0.3315
d) P(−1.45 < Z < 0) = P(0 < Z < 1.45) = 0.4265
NOTE: By determining the z-value, we can find the area (probability) under any normal curve by referring to the standard normal distribution table.
How to use the Normal distribution table to determine probabilities
a. If you wish to find the area between 0 and Z (or – Z), look up the value directly in the table.
Example: P (0 < Z < 0.96) = 0.3315
Example: P (-0.96 < Z < 0) = P (0 < Z < 0.96), because the curve is symmetric about z = 0
= 0.3315
b. To find the area between two points on different sides of the mean, add the corresponding areas found in the standard normal table.
Example: P (-2.2 < Z < 1.2) = P (-2.2 < Z < 0) + P (0 < Z < 1.2)
=P (0 < Z < 2.2) + P (0 < Z < 1.2)
=0.4861 + 0.3849
= 0.8710
c. To find the area between two points on the same side of the mean, determine the areas related
to the two values from the table, and then subtract the smaller area from the larger.
Example: P (0.96 < Z < 1.2) = P (0 < Z < 1.2) – P (0 < Z < 0.96)
= 0.3849 – 0.3315
= 0.0534
Example: P (-1.2 < Z < -0.96) = P (-1.2 < Z < 0) – P (-0.96 < Z < 0)
= P (0 < Z < 1.2) – P (0 < Z < 0.96), because the curve is symmetric about z = 0
= 0.3849 – 0.3315
= 0.0534
d. To find the area beyond a Z (or −Z) value in the same direction (i.e. the tail area), look up the value of Z directly in the table and subtract it from 0.5.
Example: P (Z > 1.05) = 0.5 – P (0 < Z < 1.05)
= 0.5 – 0.3531
= 0.1469
Example: P (Z < -1.05) =0.5 – P (-1.05 < Z < 0)
= 0.5 – P (0 < Z < 1.05), because the curve is symmetric about z = 0
=0.5– 0.3531
=0.1469
e. To find the area beyond a Z (or −Z) value in the opposite direction (i.e. including the centre of the curve), look up the value of Z directly in the table and add 0.5 to it.
Example: P (Z > -1.05) = P (-1.05 < Z < 0) + P (0 < Z < ∞)
= P (0 < Z < 1.05) + 0.5
= 0.3531 + 0.5
= 0.8531
Example: P (Z < 1.05) = P (−∞ < Z < 0) + P (0 < Z < 1.05)
= 0.5 + 0.3531
= 0.8531
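Instead of a printed table, the standard normal cdf can be computed from the error function, $\Phi(z) = \tfrac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$, which Python provides as `math.erf`. In this sketch `phi` and `table_area` are hypothetical helper names; `table_area(z)` returns the area between 0 and z (for z ≥ 0), i.e. the quantity listed in the usual table. Results agree with the table values above to about four decimal places.

```python
import math


def phi(z):
    """Standard normal cdf P(Z < z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def table_area(z):
    """Area between 0 and z (z >= 0), as listed in a standard normal table."""
    return phi(z) - 0.5


print(round(table_area(0.96), 4))       # P(0 < Z < 0.96)
print(round(phi(1.2) - phi(-2.2), 4))   # P(-2.2 < Z < 1.2)
print(round(1 - phi(1.05), 4))          # P(Z > 1.05)
print(round(phi(1.05), 4))              # P(Z < 1.05)
```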
Example: The average satellite transmission lasts 150 seconds, with a standard deviation of 15 seconds. Time appears to be normally distributed. What is the probability that a call will last
a. between 125 and 150 seconds
b. between 145 and 155 seconds
c. more than 175 seconds
d. less than 160 seconds
e. less than 125 seconds
f. between 160 and 165 seconds
g. between 135 and 140 seconds
h. more than 140 seconds
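Solution: Given μ = 150 and σ = 15, let x = time.
a) $$P(125 < x < 150) = P\left(\frac{125 - 150}{15} < \frac{x - \mu}{\sigma} < \frac{150 - 150}{15}\right) = P(-1.67 < Z < 0) = P(0 < Z < 1.67) = 0.4525$$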
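Python's standard library includes `statistics.NormalDist` (Python 3.8 or later), whose `cdf` method gives P(X < x) for any normal distribution, so the standardization step can be checked directly. The sketch uses the mean of 150 seconds and the standard deviation of 15 seconds from the solution; `prob_between` is a hypothetical helper. The result differs slightly in the fourth decimal from a table answer, because a printed table requires rounding z to two decimals.

```python
from statistics import NormalDist

call_time = NormalDist(mu=150, sigma=15)   # transmission time in seconds


def prob_between(dist, a, b):
    """P(a < X < b) = F(b) - F(a) for a continuous distribution."""
    return dist.cdf(b) - dist.cdf(a)


p_a = prob_between(call_time, 125, 150)    # part a
z = (125 - 150) / 15                       # standardized lower limit
print(round(z, 2), round(p_a, 4))
```

The remaining parts follow the same pattern, for example `1 - call_time.cdf(175)` for part c.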
The t- distribution (student’s t distribution)
Suppose we have a sample $X_1, \ldots, X_n$ from a normal population with unknown mean μ and unknown standard deviation σ, and using this sample we want to develop an interval estimator of the population mean μ. Then $\bar{X} \sim N(\mu, \sigma^2/n)$, so
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
has a standard normal distribution. But σ is unknown, so we substitute its point estimator S. Hence
$$t_{n-1} = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
is said to follow a t-distribution with n − 1 degrees of freedom (df). This result is used when n is not sufficiently large (n ≤ 30); for larger samples the t-distribution is close to the standard normal distribution.
N.B. df = degrees of freedom: the number of independent observations in a set of observations.
Note: $t_\alpha(v)$ stands for the value of t with v df to the right of which an area equal to α lies.
Exercises
1. Suppose 20% of the population are victims of crime. In a family of 5, what is the probability that none of the family members is a crime victim?
2. Consider a random variable X that takes the value 1 or 0 with respective probabilities p and 1 − p. Find the expected value as well as the variance of the random variable.
3. The probability that a student entering a college will graduate is 0.4. Determine the probability that out of 5 students (a) none, (b) one, (c) at least one, (d) at most three will graduate.
4. Find the area under the standard normal curve bounded by:
Chapter Seven
Sampling and sampling distribution of the mean
Introduction
When secondary data are not available for the problem under study, a decision may be taken to
collect primary data by using any of the methods discussed in the previous chapter. The required
information may be obtained by following either the census method or the sample method.
7.1 Difference between Census and Sample Methods
Census Method
Under the census or complete enumeration survey method, data are collected for each and every unit (person, household, field, shop, factory, etc., as the case may be) of the population or universe, which is the complete set of items of interest in any particular situation. For example, if the average wage of workers in the sugar industry is to be calculated, wage figures would be obtained from each and every worker in the sugar industry; dividing the total wages received by all these workers by the number of workers gives the average wage.
Merits of Census Method
The merits of the census method are:
1. Data are obtained from each and every unit of the population.
2. The results obtained are likely to be more representative, accurate and reliable.
3. Data from a complete enumeration census can be widely used as a basis for various surveys.
Demerits
However, despite these advantages, the census method is not widely used in practice.
1. The effort, money and time required for carrying out complete enumeration will generally be very large, and in many cases the cost may be so prohibitive that the very idea of collecting information may have to be dropped. This is especially true of underdeveloped countries, where resources are a major constraint.
2. Also, if the population is infinite or the evaluation process destroys the population unit, the method cannot be adopted.
Sample method
Definition: Sampling is simply the process of learning about the population on the basis of a
sample drawn from it. Thus in the sampling technique, instead of every unit of the universe, only
a part of the universe is studied and the conclusions are drawn on that basis for the entire
universe. A sample is a subset of population units. The process of sampling involves three
elements:
a. Selecting the sample.
b. Collecting the information, and
c. Making an inference about the population.
Practical examples of sampling
- A doctor examines a few drops of blood and draws conclusions about the blood constitution of the whole body.
- A businessman places orders for material after examining only a small sample of it.
- A teacher may put questions to one or two students and find out whether the class as a whole is following the lesson.
In fact, there is hardly any field where the technique of sampling is not used, either consciously or unconsciously.
It should be noted that a sample is not studied for its own sake. The basic objective of its study is
to draw inference about the population. In other words, sampling is a tool which helps to know
the characteristics of the universe or population by examining only a small part of it.
Unit: An element of the population from which information can be obtained.
Sampling Frame:- A list of all the units in the population from which information can be
obtained.
Major reasons why sampling is necessary
1) the destructive nature of certain tests
2) the physical impossibility of checking all items in the population
3) the cost of studying all items in the population is often prohibitive
4) the results from a sample are often adequate for the purpose
5) sampling saves time compared with studying every item
7.2. Types of Error
Clearly, an estimate of a population characteristic based on a sample may not be exact. So we categorize the errors that might occur during estimation into two types:
- Sampling errors
- Non-sampling errors
Sampling Errors
Even if we have a representative sample, we may face errors if the sample size is not sufficiently large. We cannot eliminate this type of error entirely (unless we take a census rather than a sample), but we can estimate how large it is likely to be. This enables us to approximate the sample size needed to ensure our estimates are accurate to a certain degree. Our estimates of parameters will also often be inaccurate if the sample is not representative of the population.
Non-sampling Errors
Suppose we have a representative sample and have chosen a sample size large enough to ensure that our parameter estimates are accurate to a good degree of precision. We might still face other kinds of errors, such as measurement errors, recording errors, non-response errors and interviewer errors. Measurement errors and recording errors occur if there is an error in measuring the item being studied or in recording its result. Interviewer errors can occur in surveys when an interviewer introduces bias into an interview or when a questionnaire is badly designed. Non-response can be due to refusals, indifference, lost questionnaires or other factors. If the non-response rate is large, it is important to try to understand why before analyzing the data.
Example:- Suppose a researcher wants to study cultural habits in Ethiopia. She could choose a simple random sample of 100 students from W/Sodo University and continue her work. However, if by chance the sample contained a large number of students from SNNPR, the result could be misleading (SNNPR culture would be over-represented). She would get a better result by using a stratified random sample instead. For this, she might divide the students in the university according to the region they came from and decide in advance how many students to choose from each region. In this way she would have a better chance of assessing cultural habits across Ethiopia.
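As a rough illustration of the stratified design in this example, the Python sketch below draws a simple random sample of a pre-decided size from each stratum. The region names, population sizes and allocation are hypothetical placeholders, not data from the example, and `stratified_sample` is an illustrative helper rather than a library function.

```python
import random

def stratified_sample(units_by_stratum, allocation, seed=None):
    """Draw a simple random sample of the allocated size from each stratum.

    units_by_stratum: dict mapping stratum name -> list of units
    allocation: dict mapping stratum name -> number of units to draw
    """
    rng = random.Random(seed)
    sample = {}
    for stratum, units in units_by_stratum.items():
        # random.sample draws without replacement within the stratum
        sample[stratum] = rng.sample(units, allocation[stratum])
    return sample

# Hypothetical student IDs grouped by region (illustrative numbers only)
students = {
    "Region A": [f"A{i}" for i in range(400)],
    "Region B": [f"B{i}" for i in range(350)],
    "Region C": [f"C{i}" for i in range(250)],
}
# Decided in advance: 40 + 35 + 25 = 100 students in total
allocation = {"Region A": 40, "Region B": 35, "Region C": 25}

sample = stratified_sample(students, allocation, seed=1)
print({region: len(ids) for region, ids in sample.items()})
```

Here the allocation is proportional to the hypothetical stratum sizes; any other allocation decided in advance can be passed in the same way.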
1. Judgment sampling
This involves choosing a sample by the judgment of the investigator. The investigator chooses a
sample of individuals that are thought to be representative of the population.
Example:- We want to choose 100 students in W/Sodo University to ask their opinions on the quality of teaching in the university. We might use personal judgment to choose 100 students who seem to give a good mix of first-, second- and third-year students, making sure the sample contains a fair representation of male and female students. If we try to get this representation without any randomization, simply using judgment, we are using a judgment sample.
2. Quota sampling
This sampling method has some aspects in common with stratified sampling, but involves no randomization. We divide the population into strata as in stratified sampling, but we choose a judgment sample from each stratum.
Example:- We want to choose a sample of 100 students in W/Sodo University, and want 40 to be female and 60 male. If we choose the 40 female students by taking a simple random sample from all the female students in the university, and the 60 male students by taking a simple random sample from all the male students in the university, then we are using a stratified random sample. On the other hand, if we choose the students using our own judgment, it is a quota sample.
3. Convenience sampling
A convenience sample involves taking individuals who are easy to find. This can be very convenient, but it can lead to large biases in the results.
Sample mean   Number of means (frequency)   Probability
7             3                             0.1429
7.5           9                             0.4286
8             6                             0.2857
8.5           3                             0.1429
Total         21                            1
The mean of the sample means is
$$\mu_{\bar x} = \frac{7(3) + 7.5(9) + 8(6) + 8.5(3)}{21} = \frac{162}{21} \approx \$7.71$$
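The mean of this sampling distribution can be recomputed from the frequency table with exact fractions. This short Python sketch (standard library only) simply encodes the table; the variable names are illustrative.

```python
from fractions import Fraction

# Sampling distribution from the table: sample mean -> number of samples
freq = {7: 3, 7.5: 9, 8: 6, 8.5: 3}
total = sum(freq.values())  # 21 samples in all

# Probability of each sample mean = frequency / total
probs = {m: Fraction(f, total) for m, f in freq.items()}

# Mean of the sample means = sum of (value x probability)
mean_of_means = sum(Fraction(m) * p for m, p in probs.items())
print(total, float(mean_of_means))
```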
The sample mean is a random variable, and we see that it can take three possible values. We can now write down its probability distribution as follows.
x̄           1.5   3.5   4
P(X̄ = x̄)   1/3   1/3   1/3
Suppose, instead, that we took the sample with replacement. The following samples are possible.
Possible sample   (1,1)  (1,2)  (1,6)  (2,1)  (2,2)  (2,6)  (6,1)  (6,2)  (6,6)
Sample mean       1      1.5    3.5    1.5    2      4      3.5    4      6
The sample mean is a random variable, and we see that it can take six possible values. We can now write down its probability distribution.
x̄           1     1.5   2     3.5   4     6
P(X̄ = x̄)   1/9   2/9   1/9   2/9   2/9   1/9
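Both sampling distributions can be generated by listing every possible sample. The Python sketch below assumes, as the listed samples imply, a population consisting of the values 1, 2 and 6 and samples of size n = 2; `itertools.combinations` gives the samples without replacement and `itertools.product` the ordered samples with replacement, with every sample treated as equally likely. The helper name is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product

population = [1, 2, 6]  # values implied by the samples listed above
n = 2

def sampling_distribution(samples):
    """Map each possible sample mean to its probability (all samples equally likely)."""
    means = [Fraction(sum(s), n) for s in samples]
    counts = Counter(means)
    return {m: Fraction(c, len(means)) for m, c in sorted(counts.items())}

# Without replacement: unordered samples of distinct units
without_rep = sampling_distribution(list(combinations(population, n)))
# With replacement: ordered samples, a unit may be drawn twice
with_rep = sampling_distribution(list(product(population, repeat=n)))

for m, p in with_rep.items():
    print(float(m), p)
```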
1. $E(\bar X) = \mu$.
2. $Var(\bar X) = \sigma^2/n$.
The first statement is true for any type of population (whether it is finite or infinite). The second statement is true if we have an infinite population, and it is also true if we sample with replacement from a finite population.
A consequence of this result is that if n is large, the sample mean will have an expected value equal to μ and a variance close to zero. This means that the sample mean will be a good estimator of the population mean if the sample size is large.
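The two results can be checked exactly on the with-replacement example, again assuming the population values 1, 2 and 6 with n = 2. The population variance uses divisor N because the whole population is listed. This is a numerical check on one small case, not a proof.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 6]  # population from the with-replacement example
n = 2

# Population mean and variance (divisor N, since this is the whole population)
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / N

# All N**n equally likely samples taken with replacement
means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
k = len(means)
e_xbar = sum(means) / k
var_xbar = sum((m - e_xbar) ** 2 for m in means) / k

print(mu, e_xbar)            # expected value of the sample mean vs mu
print(sigma2 / n, var_xbar)  # variance of the sample mean vs sigma^2 / n
```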
N.B. For the rest of the chapter the term “random sample” will mean a simple random sample
from an infinite population or a sample taken with replacement.
Exercises
1. The marks scored by five students in a test of statistics carrying 100 marks are 50, 60, 50, 60
and 40. If a simple random sample of size 4 is drawn without replacement, construct the
sampling distribution of sample mean and find the standard error of the sample mean.
2. Suppose that the population distribution of the gripping strengths of industrial workers is
known to have a mean of 110 and standard deviation of 10. For a random sample of 75
workers, what is the probability that the sample mean gripping strength will be
a) Between 109 and 112?
b) Greater than 112?
3. The amount of sulphur in a daily emission from a factory has a normal distribution with a mean of 134 pounds and a standard deviation of 22 pounds. For a day selected randomly, find the probability that the amount of sulphur emitted will be less than 130 pounds.
Chapter Eight
Estimation and hypothesis testing
8.1 Introduction
Definitions:
Estimation: the process by which we estimate an unknown population parameter from a sample statistic.
Estimator: any statistic that is used to estimate a population parameter.
Estimate: a numerical value of an estimator.
There are two types of estimation: point estimation and interval estimation.
1. Point estimation
- involves using a single value of a statistic to estimate an unknown parameter value.
2. Interval estimation
- involves using a range of values to estimate an unknown parameter.
Suppose θ is an unknown parameter that we wish to estimate. Let T1 and T2 be functions of x1, x2, …, xn. If P(T1 < θ < T2) = 1 − α, we say that the interval (T1, T2) is a (1 − α)100% confidence interval for θ.
The most common confidence intervals are 90%, 95%, and 99%.
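Point estimator for the mean (μ)
- is the sample mean (X̄)
Interval estimator for the mean (μ)
Case 1. - The variance (σ²) is known and
- either the data are normal or the sample is large.
In this case we know that X̄ ~ N(μ, σ²/n), so
$$Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$
Then by using the standard normal (Z) table,
$$P\left(-Z_{\alpha/2} < Z < Z_{\alpha/2}\right) = 1 - \alpha$$
$$P\left(-Z_{\alpha/2} < \frac{\bar X - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha$$
Rearranging terms gives us
$$P\left(\bar X - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar X + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
Then the (1 − α)100% confidence interval for μ is
$$\left(\bar X - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar X + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$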
Example: The manufacturer of a battery is trying to estimate the lifetime of the battery. He believes each battery will last for a random amount of time that has a normal distribution with mean μ and variance 100. He carries out an experiment to estimate μ. A sample of 400 batteries is tested and their lifetimes are measured. The mean lifetime is found to be 74.2 hours. Calculate the 95% confidence interval for μ, and interpret your result.
So/n:
Given σ² = 100 (so σ = 10), n = 400, X̄ = 74.2
95% CI = ?
Here σ is known and the sample is large, so the (1 − α)100% confidence interval for μ is given by
$$\left(\bar X - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar X + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = \left(74.2 - 1.96\frac{10}{\sqrt{400}},\ 74.2 + 1.96\frac{10}{\sqrt{400}}\right) = (73.22,\ 75.18)$$
Interpretation: we are 95% confident that the mean lifetime μ of the battery lies between 73.22 and 75.18 hours.
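$$Z = \frac{\bar X - \mu}{s/\sqrt{n}} \sim N(0, 1) \text{ approximately (large } n\text{)}$$
Then by using the standard normal table,
$$P\left(-Z_{\alpha/2} < Z < Z_{\alpha/2}\right) = 1 - \alpha$$
$$P\left(-Z_{\alpha/2} < \frac{\bar X - \mu}{s/\sqrt{n}} < Z_{\alpha/2}\right) \approx 1 - \alpha$$
Rearranging terms gives us
$$P\left(\bar X - Z_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar X + Z_{\alpha/2}\frac{s}{\sqrt{n}}\right) \approx 1 - \alpha$$
Then the (1 − α)100% confidence interval for μ is
$$\left(\bar X - Z_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar X + Z_{\alpha/2}\frac{s}{\sqrt{n}}\right)$$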
Example: A study by a professor at AAU is designed to offer inference about unemployment rates by region in Ethiopia. A sample of 200 regions reported a mean rate of 46.2%, with a standard deviation of 1.7%. At the 90% level of confidence, what is the estimate of the mean unemployment rate by region in the country?
So/n:
Given s = 1.7%, X̄ = 46.2%, n = 200
90% CI = ?
Here, although σ is unknown, the sample is large, so by the central limit theorem the (1 − α)100% confidence interval for μ is given by
$$\left(\bar X - Z_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar X + Z_{\alpha/2}\frac{s}{\sqrt{n}}\right) = \left(46.2\% - 1.65\frac{1.7\%}{\sqrt{200}},\ 46.2\% + 1.65\frac{1.7\%}{\sqrt{200}}\right)$$
$$= (46.2 - 0.198,\ 46.2 + 0.198) = (46.002\%,\ 46.398\%)$$
Interpretation: we are 90% confident that the mean unemployment rate by region in Ethiopia lies between 46.002% and 46.398%.
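The two z-based intervals above (σ known, and σ unknown with a large sample) differ only in which standard deviation is plugged in, so they can share one small function. This Python sketch assumes Python 3.8+; `statistics.NormalDist().inv_cdf` supplies Z_{α/2} in place of the table value (1.645 rather than 1.65 at 90%, which leaves the rounded endpoints unchanged). `z_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sd, n, confidence=0.95):
    """(1 - alpha)100% CI for mu: xbar +/- Z_{alpha/2} * sd / sqrt(n).

    Pass sd = sigma when the population variance is known,
    or sd = s when sigma is unknown but n is large.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}
    half_width = z * sd / sqrt(n)
    return xbar - half_width, xbar + half_width

# Battery example: sigma = 10, n = 400, sample mean 74.2 hours, 95% confidence
low, high = z_interval(74.2, sd=10, n=400)
print(round(low, 2), round(high, 2))

# Unemployment example: s = 1.7, n = 200, sample mean 46.2, 90% confidence
low2, high2 = z_interval(46.2, sd=1.7, n=200, confidence=0.90)
print(round(low2, 3), round(high2, 3))
```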
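$$P\left(-t_{\alpha/2,\,n-1} < \frac{\bar X - \mu}{s/\sqrt{n}} < t_{\alpha/2,\,n-1}\right) = 1 - \alpha$$
Rearranging terms gives us
$$P\left(\bar X - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} < \mu < \bar X + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right) = 1 - \alpha$$
Then the (1 − α)100% confidence interval for μ is
$$\left(\bar X - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}},\ \bar X + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right)$$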
Example: The signing bonuses for 10 new players in the National Football League are used to estimate the mean bonus for all new players. The sample mean is $65,890 with a standard deviation of $12,300. What is your 90% confidence interval for the population mean?
So/n:
Given s = $12,300, X̄ = $65,890, n = 10
90% CI = ?
The (1 − α)100% confidence interval for μ is given by
$$\left(\bar X - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}},\ \bar X + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right) = \left(65890 - t_{0.05,\,9}\frac{12300}{\sqrt{10}},\ 65890 + t_{0.05,\,9}\frac{12300}{\sqrt{10}}\right)$$
$$= (65890 - 1.833 \times 3889.6,\ 65890 + 1.833 \times 3889.6) = (65890 - 7129.6,\ 65890 + 7129.6) = (58760.4,\ 73019.6)$$
Interpretation: we are 90% confident that the mean signing bonus of all new players lies between $58,760.4 and $73,019.6.
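For the t-based interval, the critical value has to come from a t table, because Python's standard library has no t-distribution quantile function (a third-party package would be needed, and none is assumed here). The sketch below therefore takes t_{α/2, n−1} as an argument; `t_interval` is an illustrative name.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """CI for mu when sigma is unknown: xbar +/- t_{alpha/2, n-1} * s / sqrt(n).

    t_crit is read from a t table; Python's standard library has no
    t-distribution quantile function.
    """
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width

# Signing-bonus example: n = 10, mean 65,890, s = 12,300, t_{0.05, 9} = 1.833 (90% CI)
low, high = t_interval(65890, 12300, 10, t_crit=1.833)
print(round(low, 1), round(high, 1))
```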
The manufacturer of a certain type of battery is trying to estimate the lifetime of the battery. He believes that each battery will last for a random amount of time that has a normal distribution with mean μ and variance 100. He carries out an experiment to estimate μ. A sample of n batteries is tested and their lifetimes are measured. He wants to choose n so that the 95% confidence interval for μ has a width of less than 4 hours. What value should be taken for n?
So/n:
The 95% CI for μ is given by
$$\left(\bar X - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar X + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = \left(\bar X - 1.96\frac{10}{\sqrt{n}},\ \bar X + 1.96\frac{10}{\sqrt{n}}\right)$$
so its width is
$$\text{Width} = \left(\bar X + 1.96\frac{10}{\sqrt{n}}\right) - \left(\bar X - 1.96\frac{10}{\sqrt{n}}\right) = 2(1.96)\frac{10}{\sqrt{n}} = \frac{39.2}{\sqrt{n}} < 4$$
Hence √n > 39.2/4 = 9.8, i.e. n > 96.04. As n must be an integer, we should take n ≥ 97.
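The sample-size calculation generalizes directly: the width 2·Z_{α/2}·σ/√n must be below the target w, so n must exceed (2·Z_{α/2}·σ/w)². A minimal Python sketch, assuming Python 3.8+ and using the exact normal quantile instead of 1.96 (this does not change the answer here); `min_sample_size` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def min_sample_size(sigma, max_width, confidence=0.95):
    """Smallest integer n with 2 * Z_{alpha/2} * sigma / sqrt(n) < max_width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    bound = (2 * z * sigma / max_width) ** 2  # width < max_width  <=>  n > bound
    # n must be strictly greater than bound, so step to the next integer
    return math.floor(bound) + 1

# Battery example: sigma = 10, desired width below 4 hours
print(min_sample_size(sigma=10, max_width=4))
```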
$$Z = \frac{\bar X - \mu_0}{\sigma/\sqrt{n}}$$
4. Decision rule
i. If H1: μ ≠ μo,
Reject Ho if Z cal > Z α/2 or if Z cal < −Z α/2
=> │Z cal│ > Z α/2
ii. If H1: μ > μo
Reject Ho if Z cal > Z α
iii. If H1: μ < μo
Reject Ho if Z cal < −Z α
5. Conclusion
Example: A large trial was performed to test the hypothesis that the mean blood pressure of all healthy men is 140 mmHg. The blood pressure of 100 healthy men was measured; the sample mean was 137.9 mmHg and the sample standard deviation was 10 mmHg. What will be concluded from the test?
So/n:
1. Hypothesis
Ho: μ = 140 vs. H1: μ ≠ 140
2. Significance level, α = 5%
3. Test statistic
$$Z = \frac{137.9 - 140}{10/\sqrt{100}} = -2.1$$
4. Decision rule
Reject Ho if Z cal > Z α/2 = 1.96 or if Z cal < −Z α/2 = −1.96
=> │Z cal│ > Z α/2 = 1.96
5. Conclusion
As │−2.1│ > 1.96, we reject Ho and conclude that the mean blood pressure of all healthy men is different from 140 mmHg.
Case 2. The variance (σ²) is unknown and the sample is large.
1. Hypothesis
i. Ho: μ = μo vs. H1: μ ≠ μo
ii. Ho: μ = μo vs. H1: μ > μo
iii. Ho: μ = μo vs. H1: μ < μo
2. Significance level, α
3. Test statistic
$$Z = \frac{\bar X - \mu_0}{s/\sqrt{n}}$$
4. Decision rule
i. If H1: μ ≠ μo,
Reject Ho if Z cal > Z α/2 or if Z cal < −Z α/2
=> │Z cal│ > Z α/2
ii. If H1: μ > μo
Reject Ho if Z cal > Z α
iii. If H1: μ < μo
Reject Ho if Z cal < −Z α
5. Conclusion
Example: An economist is trying to test whether the mean earnings of all graduates in the country are more than 500 Birr/month. The distribution of monthly earnings has mean μ and variance σ². The economist has interviewed a sample of 1000 graduates and found that the mean earnings are 532 Birr/month, with a standard deviation of 245 Birr/month. What will he conclude at the 1% significance level?
So/n:
1. Hypothesis
Ho: μ = 500 vs. H1: μ > 500
2. Significance level, α = 1%
3. Test statistic
$$Z = \frac{532 - 500}{245/\sqrt{1000}} = 4.13$$
4. Decision rule
Reject Ho if Z cal > Z α = 2.33
5. Conclusion
As 4.13 > 2.33, we reject Ho and conclude that the mean earnings of all graduates in the country are greater than 500 Birr/month.
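The five-step z test can be sketched as one function that returns the calculated Z and the reject/do-not-reject decision from the rules above, shown here on the blood pressure and graduate earnings examples. This assumes Python 3.8+; critical values come from `NormalDist().inv_cdf` rather than the table (about 2.326 instead of 2.33 at α = 1%, which does not change either conclusion). `z_test` and its `alternative` labels are illustrative choices, not an existing library API.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sd, n, alpha=0.05, alternative="two-sided"):
    """z test of H0: mu = mu0, with sd = sigma (known) or s (large n).

    Returns the calculated Z and whether H0 is rejected.
    """
    z = (xbar - mu0) / (sd / sqrt(n))
    std = NormalDist()
    if alternative == "two-sided":    # H1: mu != mu0
        reject = abs(z) > std.inv_cdf(1 - alpha / 2)
    elif alternative == "greater":    # H1: mu > mu0
        reject = z > std.inv_cdf(1 - alpha)
    elif alternative == "less":       # H1: mu < mu0
        reject = z < -std.inv_cdf(1 - alpha)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, reject

# Blood pressure example: two-sided test at alpha = 0.05
print(z_test(137.9, 140, 10, 100, alpha=0.05))
# Graduate earnings example: upper-tailed test at alpha = 0.01
print(z_test(532, 500, 245, 1000, alpha=0.01, alternative="greater"))
```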
Case 3. - The variance (σ²) is unknown,
- the data are normal, and n is small.
1. Hypothesis
i. Ho: μ = μo vs H1: μ ≠ μo
ii. Ho: μ = μo vs H1: μ > μo
iii. Ho: μ = μo vs H1: μ < μo
2. Significance level, α
3. Test statistic
$$t = \frac{\bar X - \mu_0}{s/\sqrt{n}}$$
4. Decision rule
i. If H1: μ ≠ μo,
Reject Ho if t cal > t α/2, n−1 or if t cal < −t α/2, n−1
=> │t cal│ > t α/2, n−1
ii. If H1: μ > μo
Reject Ho if t cal > t α, n−1
iii. If H1: μ < μo
Reject Ho if t cal < −t α, n−1
5. Conclusion
Example: A soft drinks company sells its drinks in bottles that are supposed to contain 330 ml of drink. In fact, the amount of drink in each bottle has a normal distribution with unknown mean μ and variance σ². A quality control inspector carries out an experiment to test the company's claim that the mean amount of drink in the bottles is 330 ml. Suppose he takes a sample of 25 bottles and measures the volume of their contents. The mean is found to be 328.5 ml and the variance is found to be 9 ml² (so s = 3 ml). Should the inspector believe the company's claim?
So/n:
1. Hypothesis
Ho: μ = 330 vs H1: μ < 330
2. Significance level, α = 1%
3. Test statistic
$$t = \frac{328.5 - 330}{3/\sqrt{25}} = -2.5$$
4. Decision rule
Reject Ho if t cal < −t α, n−1 = −t 0.01, 24 = −2.492
5. Conclusion
As −2.5 < −2.492, we reject Ho and conclude that the inspector should not believe the company's claim.
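The lower-tailed t test in the soft-drink example can be sketched the same way. As with the t interval, the critical value t_{α, n−1} is passed in from a t table, since the standard library cannot compute it; `t_test_less` is an illustrative name.

```python
from math import sqrt

def t_test_less(xbar, mu0, s, n, t_crit):
    """Lower-tailed t test of H0: mu = mu0 vs H1: mu < mu0.

    t_crit is t_{alpha, n-1} read from a t table.
    Returns the calculated t and whether H0 is rejected.
    """
    t = (xbar - mu0) / (s / sqrt(n))
    return t, t < -t_crit

# Soft-drink example: n = 25, mean 328.5 ml, s^2 = 9 so s = 3, t_{0.01, 24} = 2.492
t_cal, reject = t_test_less(328.5, 330, 3, 25, t_crit=2.492)
print(round(t_cal, 3), reject)
```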
Exercises
1. An experiment involves selecting a random sample of 256 middle managers. One item of
interest is annual income. The sample mean is $45,420 and the sample standard deviation
is $2,050.
(i) What is the estimated mean income of all middle managers (point estimate of the population mean)?
(ii) What is the 95 percent confidence interval for the population mean?
(iii) What degree of confidence is being used?
(iv) Interpret the result.
2. The manufacturer of a certain type of battery is trying to estimate the lifetime of the
battery. He believes each battery will last for a random amount of time that has a N (μ, 100)
distribution. (The lifetimes are measured in hours.) He carries out an experiment to estimate μ.
A sample of 400 batteries is tested and their lifetimes are measured. The (sample) mean lifetime
is found to be 74.2 hours. Calculate a 95% confidence interval for μ. How do you interpret this
interval?
3. A biostatistician intends to estimate μ, the mean blood pressure of women between the ages
of 45 and 50. She takes a random sample of 20 women and measures their blood pressure. Based
on past experience she believes the measurements will follow a
N(μ, 100) distribution. (Measurements are in mm mercury.) Suppose she discovers the sample
mean is equal to 136.9 mm mercury. Find a 95% confidence interval for μ.
4. A biologist measured a random sample of 12 fossil skeletons of an extinct species of bird.
He found that their skulls had a mean length of 6.34cm and a standard deviation of 0.45cm. He
believes that the lengths of the skulls follow a normal distribution. Use the data to obtain a 95%
confidence interval for the mean of this distribution.
5. A sports scientist takes a random sample of 17 athletes and asks them to run 5km on a
treadmill. Their heart rates are measured before the start of the run and five minutes after the
finish. The increases in heart rates are measured and are shown below.
53 45 71 74 65 83 47 56 61 74 61 72 54 43 72 65 54
Increase in heart rates (beats per minute)
(i) Calculate the mean and standard deviation of the data.
(ii) The sport scientist wanted to estimate μ, the mean increase in heart rate. Find a
point estimate for μ and construct a 95% confidence interval for it. What
assumptions do you need to make about the population for this interval to be
valid?
6. A consumer service agency examined a new automobile for its gasoline performance. In a random sample of 12 trials under normal conditions, the distance covered averaged 60 km/gallon with a standard deviation of 1.8 km/gallon. Does this result support the manufacturer's claim that the new automobile covers more than 50 km/gallon? Use α = 0.10.
Chapter Nine
Simple Linear Regression and correlation analysis
Regression is concerned with bringing out the nature of the relationship between variables and using it to find the best approximate value of one variable corresponding to a known value of another variable.
Simple linear regression deals with the method of fitting a straight line (the regression line) to a sample of data on two variables, expressed as an equation, so that if the value of one variable is given we can predict the value of the other variable.
In other words, if we have two variables under study, one may represent the cause and the other the effect. The variable representing the cause is known as the independent (predictor or regressor) variable and is usually denoted by X. The variable representing the effect is known as the dependent (predicted) variable and is usually denoted by Y. If the relationship between the two variables is a straight line, it is known as simple linear regression.
When there are more than two variables and one of them is assumed to depend on the others, the functional relationship between the variables is known as multiple linear regression.
Scatter diagram: a plot of all ordered pairs (x, y) on the coordinate plane, which is needed to discover whether the relationship between two variables is indeed best explained by a straight line.
Example:
[Figure: example scatter diagram of Y against X]
If we draw a line through the scatter diagram, the regression line is the one that passes as close as possible to all the points.
[Figure: scatter diagram of Y against X]
$$Y = \alpha + \beta X + \varepsilon$$
where
α = y-intercept
β = slope of the line or regression coefficient
ε = the error term
The y-intercept and the regression coefficient are the population parameters. We obtain the estimates of α and β from the sample. The estimators of α and β are denoted by a and b, respectively. The fitted regression line is thus
$$Y_e = a + bX$$
The above equation is known as a regression line, and the method of finding such a relationship is known as fitting a regression line. For each observed value of the variable X, we can find the corresponding value of Y from the line. The computed values of Y are known as the expected values of Y and are denoted by Ye.
The observed values of Y are denoted by Y. The difference between the observed and the expected values, Y − Ye, is known as the error or residual, and is denoted by e. The residual can be positive, negative or zero.
A best fitting line is one for which the sum of squares of the residuals, Σe², is minimum. For this purpose the method of least squares is used.
According to the principle of least squares, one selects a and b such that
$$\sum e^2 = \sum (Y - Y_e)^2, \quad \text{where } Y_e = a + bX,$$
is minimum.
To minimize this function, we first take the partial derivatives of Σe² with respect to a and b, and then set each partial derivative equal to zero. This results in the following normal equations:
$$\sum y = na + b\sum x$$
$$\sum xy = a\sum x + b\sum x^2$$
Solving these normal equations simultaneously, we get the values of a and b as follows:
$$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}, \qquad a = \bar y - b\bar x$$
Regression analysis is useful in predicting the value of one variable from the given values of
another variable.
Example: A researcher wants to find out if there is any relationship between the height of a son and that of his father. He took a random sample of 6 fathers and their sons. The heights in inches are given in the table below.
(i) Find the regression line of Y on X.
(ii) What would be the height of the son if his father's height is 70 inches?
Height of father (X)   63   65   66   67   67   68
Height of son (Y)      66   68   65   67   69   70
Solution: ΣX = 396, ΣY = 405, ΣX² = 26152, ΣXY = 26740, ΣY² = 27355
(i)
$$b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{6(26740) - (396)(405)}{6(26152) - (396)^2} = \frac{60}{96} = 0.625$$
$$a = \bar y - b\bar x = \frac{\sum Y}{n} - b\frac{\sum X}{n} = \frac{405}{6} - (0.625)\frac{396}{6} = 67.5 - 41.25 = 26.25$$
So the regression line is Ye = 26.25 + 0.625X.
(ii) If X = 70, then
Ye = 26.25 + 0.625(70) = 70; thus the predicted height of the son is 70 inches.
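The least-squares formulas can be checked in a few lines of Python. The sketch uses the father and son heights with the second son's height taken as 68 inches, the value consistent with ΣY = 405, ΣXY = 26740 and ΣY² = 27355 used in the solution. `least_squares` is an illustrative helper that computes b and a exactly as in the formulas for the normal-equation solution.

```python
def least_squares(x, y):
    """Fit Ye = a + b*x by least squares, using the normal-equation solution."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = (sxy - sx * sy / n) / (sxx - sx ** 2 / n)
    a = sy / n - b * sx / n  # a = ybar - b * xbar
    return a, b

father = [63, 65, 66, 67, 67, 68]
son = [66, 68, 65, 67, 69, 70]

a, b = least_squares(father, son)
print(f"Ye = {a:.2f} + {b:.3f}X")
print("Predicted son's height for X = 70:", a + b * 70)
```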
Standard Error of Estimate: measures the average amount by which the estimated Ye values depart from the corresponding observed Y values (the dispersion of observed values around the regression line of Y on X).
$$S_{x.y} = \sqrt{\frac{\sum (y_i - y_{ei})^2}{n - 2}}$$
where $y_{ei} = a + bx_i$ is the fitted value and $y_i$ is the observed (actual) value of y.
Example: Given the observations (2, 2), (4, 5), (6, 4) and (8, 7), we get the regression line Ye = 1 + 0.7X. Find the standard error of estimate of the regression line.
Solution:
Ye_i = 1 + 0.7x_i, i = 1, 2, 3, 4
Then Ye1 = 1 + 0.7(2) = 2.4, Ye2 = 1 + 0.7(4) = 3.8, Ye3 = 1 + 0.7(6) = 5.2, Ye4 = 1 + 0.7(8) = 6.6
$$S_{x.y} = \sqrt{\frac{(2 - 2.4)^2 + (5 - 3.8)^2 + (4 - 5.2)^2 + (7 - 6.6)^2}{4 - 2}} = \sqrt{\frac{3.2}{2}} = \sqrt{1.6} \approx 1.26$$
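A minimal Python sketch of the standard error of estimate, applied to the four observations (2, 2), (4, 5), (6, 4), (8, 7) with the fitted line Ye = 1 + 0.7X; the function name is illustrative.

```python
from math import sqrt

def std_error_of_estimate(x, y, a, b):
    """sqrt( sum (y_i - ye_i)^2 / (n - 2) ) for the fitted line ye = a + b*x."""
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in residuals)  # sum of squared residuals
    return sqrt(sse / (len(x) - 2))

x = [2, 4, 6, 8]
y = [2, 5, 4, 7]
print(round(std_error_of_estimate(x, y, a=1, b=0.7), 2))
```

The divisor n − 2 reflects the two parameters (a and b) estimated from the same data.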
The measure of the degree of relationship between two continuous variables is known as the correlation coefficient. The population correlation coefficient is denoted by ρ and its estimator by r. The correlation coefficient r is also called Pearson's correlation coefficient, since it was developed by Karl Pearson. r is given as the ratio of the covariance of the variables x and y to the product of the standard deviations of x and y. Symbolically,
$$r = \frac{Cov(x, y)}{sd(x)\,sd(y)} = \frac{\frac{\sum (x - \bar x)(y - \bar y)}{n - 1}}{\sqrt{\frac{\sum (x - \bar x)^2}{n - 1}}\,\sqrt{\frac{\sum (y - \bar y)^2}{n - 1}}}$$
$$= \frac{\sum (x - \bar x)(y - \bar y)}{\sqrt{\sum (x - \bar x)^2 \sum (y - \bar y)^2}} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}$$
The numerator is termed the sum of products of x and y, SPxy. In the denominator, the first term is called the sum of squares of x, SSx, and the second term is called the sum of squares of y, SSy. Thus,
$$r = \frac{SP_{xy}}{\sqrt{SS_x\,SS_y}}$$
For example, if r = 0.8, then r² = 0.64. This means that, on the basis of the sample, approximately 64% of the variation in the dependent variable Y is explained by its linear relationship with the independent variable X. The remaining 1 − r² = 36% of the variation in Y is unexplained by variation in X. In other words, variables (factors) other than X could account for the remaining 36% of the variation in Y.
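Example: The research director of the Dubbary Saving and Loan Bank collected 24 observations of mortgage interest rates X and the number of house sales Y at each interest rate. The director computed that
Σx_i = 276, Σy_i = 768, Σx_i² = 3300, Σy_i² = 25000, Σx_i y_i = 8690
Then compute (i) the coefficient of correlation, and (ii) the coefficient of determination.
Solution:
(i)
$$r = \frac{\sum (x - \bar x)(y - \bar y)}{\sqrt{\sum (x - \bar x)^2 \sum (y - \bar y)^2}} = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$$
$$= \frac{24(8690) - 276(768)}{\sqrt{\left(24(3300) - (276)^2\right)\left(24(25000) - (768)^2\right)}} = \frac{-3408}{\sqrt{(3024)(10176)}} \approx -0.61$$
(ii) Coefficient of determination (R²) = r² = (−0.61)² ≈ 0.37. This shows that about 37% of the variation in the number of house sales is explained by the variation in the interest rate.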
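The computational formula r = SPxy/√(SSx·SSy) can be sketched in Python as below. To keep the check easy to follow by hand, it is applied to the four observations (2, 2), (4, 5), (6, 4) and (8, 7) from the standard-error example rather than to the bank data; `pearson_r` is an illustrative name. For these points SPxy = 14, SSx = 20 and SSy = 13, so r = 14/√260 ≈ 0.868 and r² ≈ 0.754.

```python
from math import sqrt

def pearson_r(x, y):
    """r = SPxy / sqrt(SSx * SSy), using the computational (sum) formulas."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sp_xy = sum(xi * yi for xi, yi in zip(x, y)) - sx * sy / n
    ss_x = sum(xi * xi for xi in x) - sx ** 2 / n
    ss_y = sum(yi * yi for yi in y) - sy ** 2 / n
    return sp_xy / sqrt(ss_x * ss_y)

x = [2, 4, 6, 8]
y = [2, 5, 4, 7]
r = pearson_r(x, y)
print(round(r, 4), round(r ** 2, 4))  # correlation and coefficient of determination
```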
Exercise
1. Define, briefly, regression and correlation, in statistics.
2. How do you interpret a calculated value of Karl Pearson's correlation coefficient? Discuss in particular the values r = 0, r = −1 and r = +1.
3. Calculate and interpret Karl Pearson's correlation coefficient for the ages of husbands and wives in the data given below.
Age of husband 23 27 28 29 30 31 33 35 36 39
Age of wife 18 22 23 24 25 26 28 29 30 32
4. Assume that we conduct an experiment with eight fields planted with corn. The amount of nitrogen fertilizer applied (in kg) and the resulting corn yield per hectare are shown in the table below.
Amount of nitrogen fertilizer (kg) (x)   22    26    23    29    20    15    18    32
Corn yield/hectare (y)                   120   130   160   180   120   110   118   190
a) Compute a linear regression equation of corn yield per hectare on the amount of nitrogen fertilizer applied, and use the equation to predict the corn yield for a field treated with 34 kg of fertilizer.
b) Calculate and interpret the simple correlation coefficient between the amount of fertilizer applied and the corn yield obtained, and obtain the coefficient of determination.