Ch#5# ST

This document provides an introduction to statistics including definitions of key terms, classification of statistics, stages of statistical investigation, applications and limitations of statistics, and types of variables and measurement scales. It covers descriptive and inferential statistics, population and sample, parameters and statistics, quantitative and qualitative variables, and nominal, ordinal, interval and ratio measurement scales.

Uploaded by

Gemechis Gurmesa

Wolaita Sodo University

Faculty of Natural and Computational Science

Introduction to Statistics

Contents

Chapter
1. Introduction
2. Organization and Methods of Data Presentation
3. Measures of Central Tendency and Location
4. Measures of Dispersion (Variation)
5. Elementary Probability
6. Probability Distributions
7. Sampling and Sampling Distribution of the Mean
8. Estimation and Hypothesis Testing
9. Simple Linear Regression and Correlation Analysis

Chapter One

Introduction

1.1 Definition and classification of statistics


Definition:
- Statistics is a collection of numerical facts and data.
- Statistics is a mathematical science dealing with the methods of collecting, organizing, presenting, analyzing and interpreting data.
- Statistics is a subject that deals with numbers and figures describing certain situations. It primarily deals with numerical data obtained from surveys and summarizes these data in such a way that the summary gives a good indication of the nature of the data.

The word “statistics” can be singular or plural. The second definition above may be taken as the singular form of “statistics”.
Statistics, in its singular sense, is a subject area or field of study. It is defined as the science that deals with the collection, processing, analysis, interpretation and presentation of numerical facts.

The subject of statistics is not a new discipline; it is as old as human society itself. Its sphere of utility, however, was once very restricted.

The word “statistics” is derived from the Latin for “state” indicating the historical importance of
governmental data gathering, which related to demographic information (military recruitment
and tax collecting). Thus, the scope of statistics in the ancient times was primarily limited to the
collection of demographic, property and wealth data of a country by governments for framing
military and fiscal policies.

Nowadays, statistics is used in almost every field of study, such as natural science, social science, engineering, medicine, agriculture, etc.

Classification: Statistics is broadly divided into two categories based on how the collected data
are used.

1. Descriptive Statistics
- deals with describing data without attempting to infer anything that goes beyond the given set of data,
- consists of collection, organization, summarization and presentation of data.

2. Inferential Statistics
- deals with making inferences and/or conclusions about a population based on data obtained from a limited sample of observations,
- consists of performing hypothesis testing, determining relationships among variables and making predictions.

Example
A newspaper reports the following net paid circulation from 1989 E.C. to 1993 E.C.:
365,000 368,650 370,475 375,950 383,250
i) If one performs the necessary calculation to show that the average yearly net paid circulation from 1989 to 1993 was 372,665, then his work belongs to the domain of descriptive statistics.
ii) If he says there was a 5 percent increase from 1989 to 1993, that is, $\frac{383{,}250 - 365{,}000}{365{,}000} \times 100 = 5$, this is again descriptive statistics.
iii) If he uses the data to predict that by the year 1996 E.C. the newspaper's net paid circulation will be 402,413, then his work belongs to the domain of inferential statistics.
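
The two descriptive calculations in this example can be checked with a short Python sketch. The variable names are illustrative only and are not part of the original notes.

```python
# Net paid circulation for 1989-1993 E.C., taken from the example above.
circulation = [365_000, 368_650, 370_475, 375_950, 383_250]

# Descriptive statistic (i): the average yearly net paid circulation.
mean_circulation = sum(circulation) / len(circulation)

# Descriptive statistic (ii): percent increase from the first year to the last.
percent_increase = (circulation[-1] - circulation[0]) / circulation[0] * 100

print(f"Average circulation: {mean_circulation:,.0f}")
print(f"Percent increase:    {percent_increase:.1f}%")
```

Both results only describe the five observed values; predicting the 1996 E.C. figure would require a model and belongs to inferential statistics.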

1.2 Definition of some basic terms

a) Population: the totality (collection) of all objects or items under consideration.

Example: If you want to study the mean age of primary school teachers in Sodo town, all primary school teachers in Sodo town constitute the population of your study.

b) Sample: a part of a population taken so that some generalization about the population can be made. A sample should be representative of the population. Example: If you want to study the mean age of primary school teachers in Sodo town, all primary school teachers in Sodo town constitute the population as mentioned above, but if you study only some of the teachers, the selected ones constitute your sample.

c) Parameter: a descriptive measure of a population, or a summary value calculated from a population. Examples: the average, range or variance of the population.

d) Statistic: a descriptive measure of a sample, or a summary value calculated from a sample. Examples: the average, range or variance of the sample.

1.3 Stages in Statistical Investigation

We have defined statistics, in singular sense, as a science that deals with collection,
organization (classification), presentation, analysis, and interpretation of numerical facts.
So we consider the following stages of statistical investigation:

Data Collection: This is a stage where we gather information for our purpose.

Data Organization: This is the stage where we edit our data. A large mass of figures collected from surveys frequently needs organization. The collected data may contain irrelevant figures, incorrect facts, omissions and mistakes.

Data Presentation: The organized data can now be presented in the form of tables, charts, diagrams and graphs. At this stage, large data sets are presented in a summarized and condensed manner.

Data Analysis: This is the stage where we critically study the data. The purpose of data
analysis is to dig out information useful for decision making.

Data Interpretation: This is the stage where we draw valid conclusions from the results obtained through data analysis. If the data that have been analyzed are not properly interpreted, the whole purpose of the investigation may be defeated and misleading conclusions may be drawn.

1.4 Applications and limitations of statistics

Uses of statistics
The science of statistics is essential for research and decision-making processes in all aspects of human life. The following are some of the purposes for which statistical analysis is required:
- to represent facts in the form of numerical data.
- to summarize a mass of data into a few presentable, understandable and precise figures.
- to predict or forecast future trends.
- to help select a course of action among a number of alternatives.
- to help in formulating policies.

However, Statistics has the following limitations.

a) It does not study qualitative characteristics directly. Examples: beauty, honesty, poverty, and standard of living.
b) It does not study a single individual but deals with aggregates of facts. Example: The population size of a country for one given year does not, by itself, help us in comparative studies.
c) Statistical results are true only on the average. Examples: The probability of getting a head in tossing a coin is 1/2. The germination percentage of a given variety of seed is 80%.
d) It is liable to misuse. Example: The number of car accidents caused in a city in a particular year by women drivers is 10, while the number caused by men drivers is 40. Concluding from these counts alone that women drivers are safe drivers is a misuse, because the numbers of women and men drivers are not taken into account.

1.5 Types of variables and measurement scales

A variable is a characteristic of an object that can have different possible values.


There are two types of variables.

a) Quantitative variables: variables that can be quantified or can have numerical values. Examples: height, area, income, temperature, etc.
b) Qualitative variables: variables that cannot be quantified directly. Examples: colour, beauty, sex, location. Qualitative variables are also called categorical variables.
Hence we have two types of data: quantitative and qualitative data.

Quantitative variables can be further classified as
- Discrete variables, and
- Continuous variables

a) Discrete variables are variables whose values are counts.
Examples: number of students, number of households (family size), number of pages of a book.
b) Continuous variables are variables that can have any value within an interval.
Examples: weight, length, volume, etc.

1.5 Measurement scales

There are four types of measurement scales for variables

1. Nominal scale: “Nominal” comes from the Latin word for “name”. This is a scale for grouping individuals into different categories.
Example 1: red, brown, black
Example 2: short, tall
Example 3: pass, fail
- In this scale, one category is simply different from another.
- The operations +, -, *, / are impossible, and comparison is impossible.
2. Ordinal scale: “Ordinal” comes from the Latin word for “order”.
- It is a scale for grouping and ordering individuals into different categories.
- Data consisting of an ordering or ranking of measurements are said to be on an ordinal scale of measurement.
Examples: military ranks, ranks in a race, ranks of college academic staff, etc.
- One is different from, and greater/better/less than, the other.
- The operations +, -, *, / are impossible, but comparison is possible.
Ordinal scale data contain and convey more information than nominal scale data because relative magnitudes are known; quantitative comparisons, however, are impossible.

3. Interval scale: a measurement scale in which:
- There is no true zero point (the zero point is arbitrary).
- There is no physical significance to the zero point.
- There is a constant interval size between any adjacent units on the measurement scale.
Example: °C, °F (measuring units of temperature)
- In this measurement scale, one value is different from, and better/greater than, another by a certain amount of difference. (It is possible to add and subtract, but multiplication and division are not meaningful.)
37 °C – 35 °C = 2 °C
45 °C – 43 °C = 2 °C
40 °C = 2(20 °C), but this does not imply that an object which is 40 °C is twice as hot as an object which is 20 °C.

- Interval scale data convey better information than nominal and ordinal scale data.
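
One way to see why ratios are not meaningful on an interval scale is to express the same temperatures in another unit. The sketch below uses the standard Celsius to Fahrenheit conversion; the function and variable names are illustrative and not part of the original notes.

```python
import math

def celsius_to_fahrenheit(c):
    """Standard conversion between two interval scales with different zero points."""
    return c * 9 / 5 + 32

# Differences: equal intervals in Celsius remain equal intervals in Fahrenheit.
diff_a_f = celsius_to_fahrenheit(37) - celsius_to_fahrenheit(35)
diff_b_f = celsius_to_fahrenheit(45) - celsius_to_fahrenheit(43)
print(math.isclose(diff_a_f, diff_b_f))  # the two differences agree

# Ratios: "twice as large" in Celsius is not "twice as large" in Fahrenheit.
ratio_c = 40 / 20
ratio_f = celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20)
print(ratio_c, round(ratio_f, 3))
```

Because the ratio changes when only the unit changes, statements such as "twice as hot" carry no physical meaning on an interval scale, while statements about differences do.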

4. Ratio scale: a measurement scale in which
- There is a constant interval size between any adjacent units on the measurement scale.
- There exists a zero point on the measurement scale, and there is a physical significance to this zero point.
Examples: height, weight, volume, etc.
- One is different from, and larger/taller/better/less than, the other by a certain amount of difference and by so many times.
- The operations +, -, *, / are possible on this scale.
- This measurement scale provides better information than the interval scale of measurement.

1.6 Sources of data and methods of data collection

Not every aggregate of numbers can be called statistical data. We say an aggregate of numbers is statistical data when the numbers are
- Comparable,
- Meaningful, and
- Collected for a well-defined objective.
Raw data: collected data which have not been organized numerically.
Examples: 25, 10, 32, 18, 6, 93, 4.
An array: an arrangement of raw numerical data in ascending or descending order of magnitude.
- It enables us to find the range of the data set easily, and it also gives us some idea about the general characteristics of the distribution.
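
As a minimal sketch, the raw data listed above can be turned into an array with Python's built-in `sorted`, after which the range can be read from the two ends. The variable names are illustrative.

```python
# Raw data from the example above, in the order collected.
raw_data = [25, 10, 32, 18, 6, 93, 4]

# An array: the same values arranged in ascending or descending order.
ascending = sorted(raw_data)
descending = sorted(raw_data, reverse=True)

# Once the data are ordered, the range is the last value minus the first.
data_range = ascending[-1] - ascending[0]

print(ascending)
print(descending)
print(data_range)
```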

Any scientific investigation requires data related to the study. The required data can be obtained
from either a primary source or a secondary source.

Primary source: a source of data that supplies first-hand information for the immediate purpose.

- Primary data: data originally collected for the immediate purpose.
- Primary data are usually more expensive to collect than secondary data.
Secondary source: individuals or agencies which supply data originally collected for other purposes by them or by others.
- Usually they are published or unpublished materials, records, reports, etc.
- Secondary data: data collected from a secondary source.

Methods of data collection
There are three major methods of data collection:
i. Observation or measurement
ii. Interviews and questionnaires
iii. The use of documentary sources

I. Observation or measurement
In this method, data are obtained through direct observation or measurement.
- It requires training of the persons who measure in order to ensure the use of a standard procedure.
- It provides accurate information, but it is expensive and inconvenient.

II. Interviews and Questionnaires

Questionnaire: a written document which instructs the reader or listener to answer the questions written on it.
There are three ways of collecting information under this method:
a) Face-to-face interviews (questionnaires in the charge of interviewers)
b) Telephone interviews
c) Mailed questionnaires (self-administered questionnaires returned by mail)

III. The use of documentary sources

This is the extraction of information from existing sources (e.g. hospital records).

Exercise
1. How does statistics help in your profession?
2. Differentiate between descriptive and inferential statistics.
3. Mention some limitations of statistics (discuss with examples).
4. Explain the difference between the following statistical terms by giving examples.
- Qualitative and quantitative variables
- Nominal and ordinal
- Parameter and statistic
- Secondary and primary data
5. Explain various methods of collecting primary and secondary data.

6. What is a questionnaire?

7. Classify the following data based on scale of measurement.

a. Months of the year: Meskerm, Tikimit, Hedare, …
b. The net wages of a group of workers
c. Socioeconomic status of a family when classified as low, middle and upper classes
d. The daily temperature of W/Sodo town for 30 days
Chapter Two

Organization and Methods of Data Presentation

2.1 Classification and Tabulation of Data

Classification: the process of arranging items/data into classes or categories according to their similarities and/or differences.

Classification eliminates inconsistency and also brings out the points of similarity and/or dissimilarity of collected items/data.

Classification is necessary because it would not be possible to draw inferences and conclusions directly from a large set of collected [raw] data.

2.2 Frequency Distributions


Frequency: - is the number of times a certain value or set of values occurs in a specific group.

A frequency distribution is a table that presents data according to some criteria with the
corresponding number of items falling in each class (i.e. with the corresponding frequencies.)

Example: A frequency distribution presenting the number of males and females in a class
Sex Frequency
Male 57
Female 39

Generally, there are two basic types of frequency distributions: Ungrouped and Grouped
frequency distributions.

1. Ungrouped frequency distribution

An ungrouped frequency distribution is a table of all raw score values that occur in the data along with their corresponding frequencies. It is often constructed for a small data set or for a discrete variable.

Constructing an ungrouped frequency distribution

To construct an ungrouped frequency distribution, first find the smallest and the largest raw scores in the collected data. Then make a columnar table of the raw score values arranged in order of magnitude, with the number of times each value is repeated, i.e., the frequency of that value. To make counting easier, tallies can be used.

Example: The following data are the ages in years of 20 women who attended health education last year:
30, 41, 39, 41, 32, 29, 35, 31, 30, 36, 33, 36, 32, 42, 30, 35, 37, 32, 30, and 41.
Construct a frequency distribution for these data.
STEP 1. Find the range of the data:
Range = Maximum observation − Minimum observation = 42 − 29 = 13
STEP 2. Construct a table, tally the data and complete the frequency column. The frequency distribution becomes as follows.
Age Tally Frequency
29 / 1
30 //// 4
31 / 1
32 /// 3
33 / 1
35 // 2
36 // 2
37 / 1
39 / 1
41 /// 3
42 / 1

2. Grouped frequency distribution

When the range of the data is large, the data must be grouped into classes. A grouped frequency distribution is a frequency distribution in which several data values are grouped into one class.

Some Important Definitions


– Raw data: data collected in original form.
– Array: data arranged, in ascending or descending order.
– Class: the different, non-overlapping groups of data.
– Class limits: limits that separate one class in a grouped frequency distribution from another. The limits could actually appear in the collected data, and there is a gap between the upper limit of one class and the lower limit of the next class.
– Class boundaries: separate one class in a grouped frequency distribution from another. The boundaries have one more decimal place than the raw data and therefore do not appear in the collected data. There is no gap between the upper boundary of one class and the lower boundary of the next class. The lower class boundary (LCB) is found by subtracting 0.5 units of measurement from the lower class limit (LCL), and the upper class boundary (UCB) is found by adding 0.5 units of measurement to the upper class limit (UCL).
That is, $LCB = LCL - \frac{1}{2}U$ and $UCB = UCL + \frac{1}{2}U$
– Class width (W): the difference between the upper and lower boundaries of any class or the
lower limits of two consecutive classes, or the upper limits of two consecutive classes.
N.B. Class width is not equal to the difference between UCL and LCL of the same class.
– Class mark (M): the midpoint of a class interval,
i.e. $M = \frac{UCB_i + LCB_i}{2}$
– Unit of measurement (U): the smallest difference between any two values of the variable
being measured.
– Cumulative frequency (Cf) less than type: the total frequency of all values (observations) less
than or equal to the upper class boundary for the given class.
– Cumulative frequency (Cf) more than type: The total frequency of all values (observations)
greater than or equal to the lower class boundary for the given class.

A tabular arrangement of class intervals together with their corresponding cumulative frequencies (either less than or more than type, as defined above) is called a cumulative frequency distribution.

– Relative frequency: the frequency of a class divided by the total frequency (i.e. the sum of all frequencies); if multiplied by 100, it gives the percent of values falling in that class.
$\text{Relative frequency of a class} = \frac{\text{Frequency of that class}}{\text{Total frequency}}$
Note:
- The relative frequency shows what fractional part or proportion of the total frequency belongs to the corresponding class.
- The sum of all the relative frequencies in the frequency distribution is always 1.
– Relative cumulative frequency (less than type / more than type): the total of the relative frequencies of a class and of all classes before it (less than type) or after it (more than type), or equivalently the cumulative frequency (less than type / more than type) divided by the total frequency. This gives the proportion of values which are less than / more than the upper / lower class boundary.
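
The definitions above can be tied together in a small Python sketch. The three classes and their frequencies below are hypothetical values chosen only for illustration, and the function name `describe_classes` is likewise an assumption rather than something from the notes.

```python
def describe_classes(limits, frequencies, unit=1):
    """Derive boundaries, width, class mark and frequency columns from class limits.

    limits      -- list of (LCL, UCL) pairs
    frequencies -- frequency of each class, in the same order
    unit        -- unit of measurement U (smallest difference between values)
    """
    total = sum(frequencies)
    rows = []
    running = 0
    for (lcl, ucl), f in zip(limits, frequencies):
        lcb = lcl - unit / 2          # LCB = LCL - U/2
        ucb = ucl + unit / 2          # UCB = UCL + U/2
        running += f
        rows.append({
            "limits": (lcl, ucl),
            "boundaries": (lcb, ucb),
            "width": ucb - lcb,        # W = UCB - LCB
            "mark": (ucb + lcb) / 2,   # M = (UCB + LCB) / 2
            "frequency": f,
            "relative": f / total,
            "cf_less": running,                 # values <= UCB
            "cf_more": total - running + f,     # values >= LCB
        })
    return rows

# Hypothetical classes, for illustration only.
rows = describe_classes([(10, 19), (20, 29), (30, 39)], [3, 5, 2])
for row in rows:
    print(row)
```

Note that the width of the first class is 10, not 19 − 10 = 9, which illustrates the remark that class width is not the difference between the limits of the same class.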

Guidelines to construct a grouped frequency distribution


STEP 1. Find the maximum (Max) and the minimum (Min) observations, and then compute their range: $R = Max - Min$
STEP 2. Fix the number of classes desired (k). There are two ways to fix k:
– Fix k arbitrarily between 6 and 20, or
– Use Sturges' formula: $k = 1 + 3.322 \log_{10} N$, where N is the total frequency, and round this value of k up to get an integer.
STEP 3. Find the class width (W) by dividing the range by the number of classes and rounding the result up to get an integer value: $W = \frac{R}{k}$
STEP 4. Pick a suitable starting point less than or equal to the minimum value. This starting point
is the lower limit of the first class. Continue to add the class width to this lower limit to
get the rest of the lower limits.
STEP 5. Find the upper class limits. To find the upper class limit of the first class, subtract one unit of measurement from the lower limit of the second class. Then continue to add the class width to this upper limit so as to get the rest of the upper limits.
STEP 6. Compute the class boundaries as $LCB = LCL - \frac{1}{2}U$ and $UCB = UCL + \frac{1}{2}U$,
where LCL = lower class limit, UCL = upper class limit, LCB = lower class boundary and UCB = upper class boundary. The class boundaries are also halfway between the upper limit of one class and the lower limit of the next class.
STEP 7. Tally the data.
STEP 8. Find the frequencies.
STEP 9. (If necessary) Find the cumulative frequencies (more than and less than types).
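
Steps 2 and 3 can be sketched in Python as follows. The sketch uses the commonly quoted constant 3.322 (approximately 1/log10 2) in Sturges' formula; the function names and the sample values N = 40 and R = 39 are chosen only for illustration.

```python
import math

def sturges_classes(n):
    """Number of classes k = 1 + 3.322 * log10(N), rounded up to an integer."""
    return math.ceil(1 + 3.322 * math.log10(n))

def class_width(data_range, k):
    """Class width W = R / k, rounded up to an integer."""
    return math.ceil(data_range / k)

k = sturges_classes(40)        # suggested number of classes for 40 observations
w_sturges = class_width(39, k) # width when the range is 39 and k comes from Sturges
w_eight = class_width(39, 8)   # width when k is fixed arbitrarily at 8

print(k, w_sturges, w_eight)
```

The formula is only a guideline: fixing k arbitrarily, as in the second call, is equally acceptable under Step 2.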
Example: The number of hours 40 employees spent on their job during the last 7 working days is given below.
62 50 35 36 31 43 43 43
41 31 65 30 41 58 49 41
37 62 27 47 65 50 45 48
27 53 40 29 63 34 44 32
58 61 38 41 26 50 47 37
Construct a suitable frequency distribution for these data using 8 classes.
STEP 1. Max = 65, Min = 26 so that R = 65 − 26 = 39
STEP 2. It is already determined to construct a frequency distribution having 8 classes.
STEP 3. Class width $W = \frac{R}{k} = \frac{39}{8} = 4.875 \approx 5$
STEP 4. Starting point = 26 = lower limit of the first class. Hence the lower class limits become
26 31 36 41 46 51 56 61
STEP 5. Upper limit of the first class = 31 − 1 = 30. Hence the upper class limits become
30 35 40 45 50 55 60 65
The lower and the upper class limits (Steps 4 and 5) can be written as follows.
Class limits
26 – 30
31 – 35
36 – 40
41 – 45
46 – 50
51 – 55
56 – 60
61 – 65
STEP 6. By subtracting 0.5 units of measurement from the lower class limits and by adding 0.5 units of measurement to the upper class limits, we get the lower and upper class boundaries as follows.
Class boundaries
25.5 – 30.5
30.5 – 35.5
35.5– 40.5
40.5– 45.5
45.5– 50.5
50.5– 55.5
55.5– 60.5
60.5– 65.5
STEPS 7, 8 and 9 are displayed in the following table (columns 3, 4 and 5 & 6 respectively).
Class limits   Class boundaries   Tally        Frequency   Cumulative frequency   Cumulative frequency
                                                           (less than type)       (more than type)
26 – 30        25.5 – 30.5        /////        5           5                      40
31 – 35        30.5 – 35.5        /////        5           10                     35
36 – 40        35.5 – 40.5        /////        5           15                     30
41 – 45        40.5 – 45.5        ///// ////   9           24                     25
46 – 50        45.5 – 50.5        ///// //     7           31                     16
51 – 55        50.5 – 55.5        /            1           32                     9
56 – 60        55.5 – 60.5        //           2           34                     8
61 – 65        60.5 – 65.5        ///// /      6           40                     6
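
The whole construction can be reproduced with a short Python sketch. It assumes integer data with unit of measurement U = 1 and takes the minimum observation as the starting point, as in this example; the function name `grouped_frequency` is illustrative only.

```python
import math

# Hours worked by the 40 employees in the example above.
hours = [62, 50, 35, 36, 31, 43, 43, 43,
         41, 31, 65, 30, 41, 58, 49, 41,
         37, 62, 27, 47, 65, 50, 45, 48,
         27, 53, 40, 29, 63, 34, 44, 32,
         58, 61, 38, 41, 26, 50, 47, 37]

def grouped_frequency(data, k, unit=1):
    """Build a grouped frequency distribution with k classes (Steps 1 to 9)."""
    low, high = min(data), max(data)
    width = math.ceil((high - low) / k)      # Step 3: W = R / k, rounded up
    n = len(data)
    rows, cumulative = [], 0
    for i in range(k):
        lcl = low + i * width                # Step 4: lower class limits
        ucl = lcl + width - unit             # Step 5: upper class limits
        f = sum(1 for x in data if lcl <= x <= ucl)   # Steps 7 and 8
        cumulative += f
        rows.append({
            "limits": (lcl, ucl),
            "boundaries": (lcl - unit / 2, ucl + unit / 2),  # Step 6
            "frequency": f,
            "cf_less": cumulative,           # Step 9, less than type
            "cf_more": n - cumulative + f,   # Step 9, more than type
        })
    return rows

table = grouped_frequency(hours, k=8)
for row in table:
    print(row["limits"], row["boundaries"], row["frequency"],
          row["cf_less"], row["cf_more"])
```

The printed rows agree with the table above, which is a useful check that every observation has been tallied exactly once.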

2.3 Diagrammatic and Graphic Presentation of Data

The data that is presented by a frequency distribution can also be displayed diagrammatically or
graphically.
Diagrams and graphs:
- are techniques for presenting data in visual displays using geometric figures;
- are visual aids which give a bird’s eye view about a given set of numerical data;
- have greater attraction than mere figures (numbers);
- facilitate comparison of data;

- are easily understandable by anyone without a statistical background.
Usually diagrams are appropriate for presenting discrete data, whereas graphs are appropriate for presenting continuous types of data.

There are three common diagrammatic presentations of data: bar-diagram/charts, pie-chart and
pictograms, as well as three common graphic presentations of data: histogram, frequency
polygon, and cumulative frequency polygon (ogive).

I. Bar-diagrams/ Bar-charts

- Bar-diagram is a series of equally spaced bars having equal width and the height of each bar
representing the magnitude or frequency of observations in each group.
- Bar-diagrams are usually used to represent one way or simple frequency distribution.
- Bar-diagrams can be drawn either horizontally or vertically. Usually horizontal bar-diagrams
are used for qualitatively classified data whereas vertical bar-diagrams are used for
quantitatively classified data.

Example: Horizontal bar-diagram.

[Figure: horizontal bar-diagram of frequency by blood type]

There are several types of bar-diagram. The most common are:


- Simple bar-diagrams
- Deviation (two-way) bar-diagrams
- Broken bar-diagrams
- Component (subdivided) bar-diagrams
- Multiple bar-diagrams
1. Simple bar-diagrams
Simple bar-diagrams are used to depict data of single variable or one-way variable.

Example: The following frequency distribution shows the sales (in million birr) of four products for the 2004 production year.
Product Sale (in million)
A 14
B 21
C 9
D 17
The bar-diagram presentation for these data is given below.

[Figure: vertical bar-diagram of sales (in million birr) for products A, B, C and D]

2. Deviation bar-diagrams

When the data take both positive and negative values (for instance data on profit, net export,
percent change, etc) deviation bar-diagrams are appropriate.

Example: Present the following data using a suitable bar-diagram.

Data: Net profit (in thousands birr) in oil sales for five years

Year Profit (in thousands)
1997 12
1998 -5
1999 14
2000 9
2001 -6

The deviation bar-diagram for the data looks like the following.

[Figure: deviation bar-diagram of profit (in thousands) by year, 1997 to 2001]

3. Broken bar-diagrams
This kind of bar-diagram is used to present data involving a few extreme values where it will be
difficult to accommodate the magnitude of the bars corresponding to these values within the
graph paper. In this case we use pieces of bars with each piece starting with a jump on the
numerical scale.

Example: Data: Amount of production per day for four products of a factory.

Product Quantity produced (kg/day)
A 14
B 35
C 23
D 109

4. Component bar-diagrams

When it is desired to show how a total (an aggregate) is divided into component parts, we use
component bar diagram. In such type of bar-diagrams, the bars represent aggregate value of a
variable with each aggregate broken into its component parts and different colors or designs are
used for identification.

Example: Represent the following data using bar-charts.

Data: Yields of production of farmers in Southern Ethiopia.

Crop \ Year   1990 EC   1991 EC   1992 EC   1993 EC
Barley        14        15        26        19
Wheat         10        15        14        25
Maize         2         6         10        3
Total         26        36        50        47

The component bar-diagram for this table is as follows.

[Figure: component bar-diagram of production by year (1990 to 1993), each bar subdivided into barley, wheat and maize]

5. Multiple bar-diagrams

Multiple bar-diagrams are used to display data on more than one variable. They are used for
comparing different variables at the same time.

Example: The data given in the above example can be presented by using multiple bar-diagram as
below.

[Figure: multiple bar-diagram of production by year (1990 to 1993), with separate bars for barley, wheat and maize]

II. Pie-charts

A pie-chart is a circle that is divided into sections or wedges according to the percentages of frequencies in each category of the distribution. The angle of the sector of a class is obtained by multiplying the ratio of the frequency of the class to the total frequency by 360°,
i.e. $\text{sector angle of a class} = \frac{\text{frequency of the class}}{\text{total frequency}} \times 360^\circ$
Note that pie-charts are usually used for depicting nominal level data.

Example: A survey showed that a car owner spends birr 2,950 per year on operating expenses.
Below is the breakdown of the various expenditure items. Draw an appropriate chart to portray
the data.

Expenditure item        Amount (in birr)
Fuel                    603
Interest on car loan    279
Repairs                 930
Insurance and license   646
Depreciation            492
Total                   2,950

How to draw a pie-chart

- First find the percentages of each class
- Next calculate the degree measures for each class
- Finally, using a protractor, put each sector /degree measure/ in a circle and give a key for explanation.

Expenditure item        Amount (in birr)   Percentage (approx)   Degree (approx)
Fuel                    603                20                    74
Interest on car loan    279                9                     34
Repairs                 930                32                    113
Insurance and license   646                22                    79
Depreciation            492                17                    60
Total                   2,950              100                   360
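
The percentage and degree columns can be recomputed with a few lines of Python, shown here as a sketch; the dictionary and variable names are illustrative.

```python
# Yearly operating expenses (in birr) from the example above.
expenses = {
    "Fuel": 603,
    "Interest on car loan": 279,
    "Repairs": 930,
    "Insurance and license": 646,
    "Depreciation": 492,
}

total = sum(expenses.values())

# Percentage share and sector angle of each class, rounded to whole numbers.
percentages = {item: round(amount / total * 100) for item, amount in expenses.items()}
angles = {item: round(amount / total * 360) for item, amount in expenses.items()}

for item in expenses:
    print(f"{item:<22} {expenses[item]:>5} {percentages[item]:>4}% {angles[item]:>4} deg")
```

With these data the rounded values happen to sum to exactly 100 and 360; in general, rounding can leave the totals off by a unit or so.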

Now we can draw the pie-chart for the data.

[Figure: pie-chart of the operating expenses with key: fuel 20%, insurance and license 22%, repairs 32%, interest on car loan 9%, depreciation 17%]

III. Pictograms

In pictograms, we represent the data by means of some picture symbols. Here we decide a
suitable picture to represent a definite number of units in which the variable is measured.

Example: Draw a pictorial diagram to present the following data (number of students in a certain
school for four years.)

Year              1992   1993   1994   1995
No. of students   2000   3000   5000   7000

Let a single picture symbol represent one thousand students.

[Figure: pictogram with one row of picture symbols per year, 1992 to 1995. Key: one symbol = 1000 students]

IV. Histogram

A histogram is another way of data presentation which is more suitable for frequency
distributions with continuous classes.
In drawing a histogram, we put the class boundaries of each class on the horizontal axis and its
respective frequency on the vertical axis.

Example: Draw a histogram presenting the following data.

Class Boundaries   Class Mark   Frequency   Cumulative Frequency   Cumulative Frequency
                                            (less than type)       (more than type)
5.5 – 11.5         8.5          2           2                      20
11.5 – 17.5        14.5         2           4                      18
17.5 – 23.5        20.5         7           11                     16
23.5 – 29.5        26.5         4           15                     9
29.5 – 35.5        32.5         3           18                     5
35.5 – 41.5        38.5         2           20                     2
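
As a rough stand-in for a drawn histogram, the sketch below prints one text bar per class, with the class boundaries as labels and the bar length equal to the frequency. The bars run horizontally, so the picture is rotated relative to the usual layout with boundaries on the horizontal axis. The function name `text_histogram` is illustrative only.

```python
# Class boundaries and frequencies from the table above.
boundaries = [(5.5, 11.5), (11.5, 17.5), (17.5, 23.5),
              (23.5, 29.5), (29.5, 35.5), (35.5, 41.5)]
frequencies = [2, 2, 7, 4, 3, 2]

def text_histogram(boundaries, frequencies, symbol="#"):
    """Return one line per class: boundaries, then a bar whose length is the frequency."""
    lines = []
    for (lcb, ucb), f in zip(boundaries, frequencies):
        lines.append(f"{lcb:>5.1f} - {ucb:>4.1f} | {symbol * f} {f}")
    return lines

lines = text_histogram(boundaries, frequencies)
for line in lines:
    print(line)
```

Because consecutive classes share a boundary, the bars of a true histogram touch each other, unlike the separated bars of a bar-diagram.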

V. Frequency Polygon

A frequency polygon is a line graph drawn by taking the frequencies of the classes along the vertical axis and their respective class marks along the horizontal axis. The plotted points are then joined by straight line segments.

Example: Present the data in the previous example using a frequency polygon.

[Figure: frequency polygon; horizontal axis: class marks (8.5 to 38.5), vertical axis: frequency]

VI. Cumulative Frequency Polygon (Ogive)

Cumulative frequency polygon can be traced on less than or more than cumulative frequency
basis. Place the class boundaries along the horizontal axis and the corresponding cumulative
frequencies (either less than or more than cumulative frequencies) along the vertical axis. Then
join the cross points by a free hand curve.

Example: the data in the previous example can be presented using either a less than or a more
than cumulative frequency polygon as given below (i) and (ii) respectively.

(i) Less than type cumulative frequency polygon

[Figure: less than type ogive; horizontal axis: upper class boundaries (11.5 to 41.5), vertical axis: less than type cumulative frequencies]

(ii) More than type cumulative frequency polygon

[Figure: more than type ogive; horizontal axis: lower class boundaries (5.5 to 35.5), vertical axis: more than type cumulative frequencies]

Exercise
1. Given the following raw data:
62 50 57 58 51 53 62 64 60 61
60 51 64 55 55 52 60 65 58 60
59 52 63 56 56 58 64 63 62 60
58 54 62 54 54 60 65 60 62 59
56 63 52 53 62 53 61 61 59 65
a) Construct simple frequency distribution table.
b) Construct grouped frequency distribution table.

2. If class mid-points in a frequency distribution of a group of persons are 25, 32, 39, 46, 53, 60,
67, 74 and 81, find (a) size of the class interval, and (b) the class boundaries.

3. In a sample study about coffee drinking habits in two villages A and B, the following information was recorded:
A: Females were 40%, total coffee drinkers were 45% and male non-coffee drinkers were 20%.
B: Males were 55%, male non-coffee drinkers were 30% and female coffee drinkers were 15%.
Present the above information in a tabular form.
4. The following table shows the marital status of males and females (18 years and older) in a
certain city. Draw a pie chart separately for males and females to display the data.
Marital Status Male (percent of total) Female (percent of total)
Single 21 16
Married 65 73
Widowed 9 4
Divorced 5 7
5. Prepare (a) histogram (b) frequency polygon (c) Ogive for the following frequency distribution
of marks in a final examination.
Class 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-89
Frequency 6 12 20 14 12 8 6 2

Chapter Three

Measures of Central Tendency and Location

3.1 Objectives of Measuring Central Tendency

The most important aspect of studying the distribution of a sample of measurements is the position
of the central value, that is, a representative value about which the measurements are distributed.
It is often convenient to have one figure that is representative of each group; this figure is known
as the average of the group. If the members of the group are arranged in order of magnitude, the
averages tend to fall around the central position in the group, so averages are called measures of
central tendency. In short, any measure intended to represent the center of a data set is called a
measure of central tendency.

The most important objectives of measuring central tendency are:

- To determine a single value around which the other data concentrate
- To summarize/reduce the volume of the data
- To facilitate comparison within one group or between groups of data

Desirable properties of measure of central tendency

We say a measure of central tendency is best if it possesses most of the following properties. It should:
- be simple to understand and easy to calculate/interpret,
- exist and be unique,
- be rigidly defined by a mathematical formula,
- be based on all observations,
- not be seriously affected by extreme observations,
- be capable of further statistical analysis and/or algebraic manipulation.

3.2 The Summation Notation (∑)

Let a data set consist of a number of observations, represented by $x_1, x_2, \dots, x_n$, where n (the last
subscript) denotes the number of observations in the data and $x_i$ is the ith observation. Then the
sum $x_1 + x_2 + \dots + x_n$ is denoted by

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$$

For instance, a data set consisting of six measurements 21, 13, 54, 46, 32 and 37 is represented by
$x_1, x_2, x_3, x_4, x_5$ and $x_6$ where $x_1 = 21$, $x_2 = 13$, $x_3 = 54$, $x_4 = 46$, $x_5 = 32$ and $x_6 = 37$.
Their sum becomes

$$\sum_{i=1}^{6} x_i = 21 + 13 + 54 + 46 + 32 + 37 = 203$$

Similarly, $x_1^2 + x_2^2 + \dots + x_n^2 = \sum_{i=1}^{n} x_i^2$.
Some Properties of the Summation Notation
1. $\sum_{i=1}^{n} c = n \cdot c$, where c is a constant number.
2. $\sum_{i=1}^{n} b\,x_i = b \sum_{i=1}^{n} x_i$, where b is a constant number.
3. $\sum_{i=1}^{n} (a + b\,x_i) = n \cdot a + b \sum_{i=1}^{n} x_i$, where a and b are constant numbers.
4. $\sum_{i=1}^{n} (x_i \pm y_i) = \sum_{i=1}^{n} x_i \pm \sum_{i=1}^{n} y_i$
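The summation properties can be checked numerically. The following is a minimal sketch using the six measurements given above; the second series `y` and the constants `a`, `b` and `c` are hypothetical values chosen only for illustration.

```python
# Numerical check of the summation properties on the six measurements.
x = [21, 13, 54, 46, 32, 37]
y = [1, 2, 3, 4, 5, 6]    # hypothetical second series (property 4 only)
a, b, c = 2, 3, 5         # hypothetical constants
n = len(x)

total = sum(x)                                                      # sum of x_i
prop1 = sum(c for _ in range(n)) == n * c                           # sum of a constant
prop2 = sum(b * xi for xi in x) == b * sum(x)                       # constant factor
prop3 = sum(a + b * xi for xi in x) == n * a + b * sum(x)           # linear expression
prop4 = sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)     # sum of two series

print(total, prop1, prop2, prop3, prop4)
```

A check on one data set is not a proof, but it helps confirm that each property has been read correctly.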

3.3 Types of Measures of Central Tendency

Several types of averages or measures of central tendency can be defined. The most common are
- the arithmetic mean or the mean
- the geometric mean
- the harmonic mean
- the mode
- the median
The choice of average (measure of central tendency) depends upon which best represents the
property under discussion.

3.3.1. The Arithmetic Mean (The Mean)

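The arithmetic mean is defined as the sum of the measurements of the items divided by the total
number of items. For a sample of n observations $x_1, x_2, \dots, x_n$,

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Arithmetic Mean for Ungrouped Frequency Distribution

When the data are given in the form of an ungrouped frequency distribution, in which the value
$x_i$ occurs $f_i$ times (i = 1, 2, …, k), the formula for the mean is

$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$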

Example 1: You measure the body lengths (in inches) of 10 full-term infants at birth and record
the following:
17.5 19.5 17.5 19 20
21 18 19.5 18 10.75
Compute the sample mean length of the infants for these data.
Example 2: Monthly incomes of fourth year regular students are given in the following frequency
distribution.

Monthly income (birr) 54.5 64.5 74.5 84.5 94.5 104.5 114.5
Number of students 6 9 15 25 13 7 5
Compute the mean for these data.
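Both examples can be worked through with a few lines of Python. This is only a sketch of the arithmetic; the variable names are chosen for illustration.

```python
# Example 1: mean of raw (ungrouped) data.
lengths = [17.5, 19.5, 17.5, 19, 20, 21, 18, 19.5, 18, 10.75]
mean_length = sum(lengths) / len(lengths)

# Example 2: mean of an ungrouped frequency distribution,
# i.e. sum of (frequency x value) divided by the total frequency.
incomes = [54.5, 64.5, 74.5, 84.5, 94.5, 104.5, 114.5]
students = [6, 9, 15, 25, 13, 7, 5]
n = sum(students)
mean_income = sum(f * x for f, x in zip(students, incomes)) / n

print(mean_length, n, mean_income)
```

Under these data the mean length is 18.075 inches, and the mean monthly income of the 80 students is 83.375 birr.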

Arithmetic Mean for Grouped Frequency Distribution

If data are given in the form of a continuous (grouped) frequency distribution, the sample mean can be
computed as

$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$

where $x_i$ = the class mark of the ith class; i = 1, 2, …, k,
$f_i$ = the frequency of the ith class and k = the number of classes.
Note that $\sum_{i=1}^{k} f_i = n$ = the total number of observations.

Example: The following table gives the daily wages of laborers. Calculate the average daily
wages paid to a laborer.

Wages in birr 11-13 13-15 15-17 17-19 19-21 21-23 23-25


Number of laborers 3 4 5 6 6 4 3
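A minimal sketch of the calculation for this example follows. It takes the class mark of each wage class as the midpoint of its limits, as the formula for grouped data requires.

```python
# Daily wages: class limits and frequencies from the table above.
classes = [(11, 13), (13, 15), (15, 17), (17, 19), (19, 21), (21, 23), (23, 25)]
freq = [3, 4, 5, 6, 6, 4, 3]

marks = [(lo + hi) / 2 for lo, hi in classes]   # class marks 12, 14, ..., 24
n = sum(freq)                                   # total number of laborers
mean_wage = sum(f * m for f, m in zip(freq, marks)) / n

print(n, round(mean_wage, 2))
```

With 31 laborers and a weighted total of 560 birr, the average daily wage is about 18.06 birr.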

Properties of the Arithmetic Mean

- The sum of the deviations of the items from their arithmetic mean is zero. This means the
algebraic sum of the deviations of a set of numbers $x_1, x_2, \dots, x_n$ from their mean $\bar{x}$ is zero.
That is,

$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$

- The sum of the squares of the deviations of a set of observations from any number, say A, is
the least only when A = $\bar{x}$. That is,

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 \le \sum_{i=1}^{n} (x_i - A)^2$$

- When a set of observations is divided into k groups and $\bar{x}_1$ is the mean of $n_1$ observations of
group 1, $\bar{x}_2$ is the mean of $n_2$ observations of group 2, …, $\bar{x}_k$ is the mean of $n_k$ observations
of group k, then the combined mean, denoted by $\bar{x}_c$, of all observations taken together is
given by

$$\bar{x}_c = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \dots + n_k\bar{x}_k}{n_1 + n_2 + \dots + n_k}$$

- If a wrong figure has been used in calculating the mean, we can correct the mean if we know the
correct figure that should have been used. Let
  - $x_{\text{wrong}}$ denote the wrong figure used in calculating the mean,
  - $x_{\text{correct}}$ be the correct figure that should have been used,
  - $\bar{x}_{\text{wrong}}$ be the wrong mean calculated using $x_{\text{wrong}}$; then the correct mean, $\bar{x}_{\text{correct}}$, is given
by

$$\bar{x}_{\text{correct}} = \frac{n\,\bar{x}_{\text{wrong}} - x_{\text{wrong}} + x_{\text{correct}}}{n}$$

- If the mean of $x_1, x_2, \dots, x_n$ is $\bar{x}$, then
a) the mean of $x_1 \pm k, x_2 \pm k, \dots, x_n \pm k$ will be $\bar{x} \pm k$;
b) the mean of $kx_1, kx_2, \dots, kx_n$ will be $k\bar{x}$.
Example 1: Last year there were three sections taking the Stat 273 course in a certain university. At
the end of the semester, the three sections got average marks of 80, 83 and 76. There were 28, 32
and 35 students in the sections respectively. Find the mean mark for all the students.
Solution:

$$\bar{x}_c = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + n_3\bar{x}_3}{n_1 + n_2 + n_3} = \frac{28(80) + 32(83) + 35(76)}{28 + 32 + 35} = \frac{7556}{95} \approx 79.54$$
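The combined mean is simply a mean of the group means weighted by group size. A short sketch follows; the helper name `combined_mean` is chosen here for illustration.

```python
def combined_mean(sizes, means):
    """Mean of all observations, given each group's size and mean."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

# Three sections with 28, 32 and 35 students and means 80, 83 and 76.
xc = combined_mean([28, 32, 35], [80, 83, 76])
print(round(xc, 2))
```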
Example 2: The average weight of 10 students was calculated to be 65 kg, but later it was
discovered that one measurement was misread as 40 kg instead of 80 kg. Calculate the corrected
average weight.

Solution:
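One way to work this out is to rebuild the total from the wrong mean, swap the misread value for the correct one, and divide by n again. The sketch below does exactly that; the function name and its parameters are chosen here for illustration.

```python
def corrected_mean(wrong_mean, n, wrong_value, correct_value):
    """Correct a mean that was computed with one misread value."""
    wrong_total = wrong_mean * n
    return (wrong_total - wrong_value + correct_value) / n

# 10 students, wrong mean 65 kg, 40 kg recorded instead of 80 kg.
corrected = corrected_mean(65, 10, 40, 80)
print(corrected)
```

The total rises from 650 kg to 690 kg, so the corrected average weight is 69 kg.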

Exercise: The average score on the mid-term examination of 25 students was 75.8 out of 100.
After the mid-term exam, however, a student whose score was 41 out of 100 dropped the course.
What is the average/mean score among the 24 students?

Weighted Arithmetic Mean


In finding the arithmetic mean, all items were assumed to be of equal importance. When the items
differ in importance and each is to be given its proper weight, we find the weighted average.
Weights are assigned to each item in proportion to its relative importance.
If $x_1, x_2, \dots, x_k$ represent values of the items and $w_1, w_2, \dots, w_k$ are the corresponding weights,
then the weighted mean, $\bar{x}_w$, is given by

$$\bar{x}_w = \frac{\sum_{i=1}^{k} w_i x_i}{\sum_{i=1}^{k} w_i}$$

Example: A student’s final marks in Mathematics, Physics, Chemistry and Biology are
respectively 82, 80, 90 and 70. If the respective credits received for these courses are 3, 5, 3 and 1,
determine the approximate average mark the student has got for one course.
Solution: We use a weighted arithmetic mean, the weight associated with each course being taken as
the number of credits received for the corresponding course.

xi   82   80   90   70
wi    3    5    3    1

Therefore

$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{(3 \times 82) + (5 \times 80) + (3 \times 90) + (1 \times 70)}{3 + 5 + 3 + 1} = \frac{986}{12} \approx 82.17$$

The average mark of the student for one course is approximately 82.
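The same weighted mean can be computed programmatically. This is a minimal sketch with the marks as values and the credits as weights.

```python
marks = [82, 80, 90, 70]    # Mathematics, Physics, Chemistry, Biology
credits = [3, 5, 3, 1]      # weights

weighted = sum(w * x for w, x in zip(credits, marks)) / sum(credits)
simple = sum(marks) / len(marks)   # unweighted mean, for comparison

print(round(weighted, 2), simple)
```

The unweighted mean (80.5) differs from the weighted mean because the 5-credit course counts more heavily than the 1-credit course.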

Merits of Arithmetic Mean


- Arithmetic mean is rigidly defined by a mathematical formula so that its value is always definite.
- It is calculated based on all observations.
- Arithmetic mean is simple to calculate and easy to understand. It doesn’t need arraying
(arranging in increasing or decreasing order) of the data.
- Arithmetic mean is also capable of further algebraic treatment.
- It affords a good standard of comparison.

Drawbacks of Arithmetic Mean


- It is highly affected by extreme (abnormal) observations in the series. For instance, if the
monthly incomes of three boys are 37 birr, 53 birr and 48 birr and that of their father is 1026
birr, the average income of these four people is 291 birr, which is not
at all a representative figure.
- It can be a number which does not exist in the series.
- It sometimes gives results which appear almost absurd. For example, we
can get an average of ‘3.6 children’ per family.
- It gives greater importance to bigger items of a series and lesser importance to smaller items.
That means it is an upward-biased measure.
- It can’t be calculated for open-ended classes.

3.3.2 Geometric Mean (G.M)
The geometric mean is the nth root of the product of n positive values. If X1, X2, …, Xn are n
positive values, then their geometric mean is

$$\text{G.M} = (X_1 X_2 \cdots X_n)^{1/n}$$

The geometric mean is usually used for:
- average rates of change
- ratios
- percentage distributions
- logarithmic distributions.

When the number of observations is more than two, extracting the nth root directly may be
tedious. In that case the calculation can be simplified by taking common logarithms (base ten)
of both sides of $\text{G.M} = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n}$:

$$\log(\text{G.M}) = \frac{1}{n}\log(x_1 x_2 \cdots x_n) = \frac{1}{n}(\log x_1 + \log x_2 + \dots + \log x_n) = \frac{1}{n}\sum_{i=1}^{n} \log x_i$$

$$\text{G.M} = \text{antilog}\left(\frac{1}{n}\sum_{i=1}^{n} \log x_i\right)$$
This shows that the logarithm of the G.M is the mean of the logarithms of the individual observations.
Example 1: The ratios of prices in 1999 to those in 2000 for 4 commodities were 0.92, 1.25, 1.75 and
0.85. Find the average price ratio by means of the geometric mean.
Solution:

$$\text{G.M} = \text{antilog}\left(\frac{\sum \log X_i}{n}\right) = \text{antilog}\left(\frac{\log 0.92 + \log 1.25 + \log 1.75 + \log 0.85}{4}\right)$$

$$= \text{antilog}\left(\frac{-0.0362 + 0.0969 + 0.2430 - 0.0706}{4}\right) = \text{antilog}(0.0583) \approx 1.14$$

What is the arithmetic mean of the above values?

$$\bar{X} = \frac{0.92 + 1.25 + 1.75 + 0.85}{4} = \frac{4.77}{4} \approx 1.19$$
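The logarithm method can be sketched in Python as follows. The sketch uses the four price ratios as they appear in the worked computation (0.92, 1.25, 1.75 and 0.85) and base-ten logarithms; the variable names are illustrative.

```python
import math

ratios = [0.92, 1.25, 1.75, 0.85]
n = len(ratios)

# Mean of the base-10 logarithms, then the antilog (10 raised to that mean).
mean_log = sum(math.log10(r) for r in ratios) / n
gm = 10 ** mean_log

# Arithmetic mean of the same values, for comparison.
am = sum(ratios) / n

print(round(mean_log, 4), round(gm, 2), round(am, 4))
```

The geometric mean comes out slightly below the arithmetic mean, as expected for positive values that are not all equal.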

Note that
1. When the observed values $x_1, x_2, \dots, x_k$ have the corresponding frequencies $f_1, f_2, \dots, f_k$
respectively, then the geometric mean is obtained by

$$\text{G.M} = \sqrt[n]{x_1^{f_1}\, x_2^{f_2} \cdots x_k^{f_k}}, \qquad \log(\text{G.M}) = \frac{1}{n}\sum_{i=1}^{k} f_i \log x_i, \qquad \text{where } n = \sum_{i=1}^{k} f_i$$

2. Whenever the frequency distribution is grouped (continuous), the class marks of the class
intervals are taken as the $x_i$ and the above formula can be used. That is,

$$\text{G.M} = \sqrt[n]{m_1^{f_1}\, m_2^{f_2} \cdots m_k^{f_k}}, \qquad \log(\text{G.M}) = \frac{1}{n}\sum_{i=1}^{k} f_i \log m_i$$

where $n = \sum_{i=1}^{k} f_i$ and $m_i$ is the class mark of the ith class.

Properties of geometric mean


a. Its calculation is not easy.
b. It involves all observations in its computation.
c. It may not be defined if even a single observation is negative.
d. If the value of one observation is zero, its value becomes zero.

3.3.3 Harmonic mean (H.M)


The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the single values.
If X1, X2, X3, …, Xn are n values, then their harmonic mean is

$$\text{H.M} = \frac{n}{\dfrac{1}{X_1} + \dfrac{1}{X_2} + \dots + \dfrac{1}{X_n}} = \frac{n}{\sum \dfrac{1}{X_i}}$$

Example
Find the harmonic mean of the values 2, 3 and 6.

$$\text{H.M} = \frac{3}{1/2 + 1/3 + 1/6} = \frac{3}{\dfrac{3 + 2 + 1}{6}} = \frac{3 \times 6}{6} = 3$$

The harmonic mean is used to average rates rather than simple values. It is usually appropriate in
averaging kilometers per hour.
Example: A driver covers a distance of 300 km at an average speed of 60 km/hr and makes the return
trip at an average speed of 50 km/hr. What is his average speed for the total distance?
Solution

Trip    Distance   Average speed   Time taken
1st     300 km     60 km/hr        5 hrs
2nd     300 km     50 km/hr        6 hrs
Total   600 km     ---             11 hrs

Average speed for the whole distance = Total distance / Total time taken = 600 km / 11 hrs = 54.55 km/hr.
Using the harmonic mean formula,

$$\text{H.M} = \frac{2}{1/60 + 1/50} = \frac{600}{11} = 54.55 \text{ km/hr}$$

Note that A.M = (60 + 50)/2 = 55 km/hr and
G.M = $\sqrt{60 \times 50}$ = 54.77 km/hr.
In general, A.M ≥ G.M ≥ H.M
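The three averages of the two speeds can be compared directly. The following sketch assumes equal distances on both legs of the trip, which is what makes the harmonic mean the correct average speed here.

```python
import math

speeds = [60, 50]          # km/hr on the outward and return trips
n = len(speeds)

am = sum(speeds) / n                       # arithmetic mean
gm = math.prod(speeds) ** (1 / n)          # geometric mean
hm = n / sum(1 / s for s in speeds)        # harmonic mean

# Direct check: total distance divided by total time.
direct = 600 / (300 / 60 + 300 / 50)

print(am, round(gm, 2), round(hm, 2), round(direct, 2))
```

The harmonic mean agrees with the direct calculation of total distance over total time, and the ordering A.M ≥ G.M ≥ H.M holds.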
Note that for simple frequency data the harmonic mean is calculated by using the following formula:

$$\text{H.M} = \frac{n}{\sum \left(\dfrac{f_i}{x_i}\right)}$$

where $n = \sum f_i$ is the total number of observations.
Properties of harmonic mean
i. It is based on all observations in a distribution.
ii. It is used in situations where small weight is given to larger observations and
larger weight to smaller observations.
iii. It is difficult to calculate and understand.
iv. It is an appropriate measure of central tendency in situations where the data are ratios,
speeds or rates.
3.3.4 The Median

The median of a set of items (numbers) arranged in order of magnitude (i.e. in an array form) is the
middle value or the arithmetic mean of the two middle values. We shall denote the median of
$x_1, x_2, \dots, x_n$ by $\tilde{x}$. For ungrouped data arranged in order of magnitude, the median is obtained by

$$\tilde{x} = \begin{cases} x_{\frac{n+1}{2}} & \text{if the number of items, } n, \text{ is odd} \\[4pt] \dfrac{1}{2}\left(x_{\frac{n}{2}} + x_{\frac{n}{2}+1}\right) & \text{if the number of items, } n, \text{ is even} \end{cases}$$

For grouped data the median, obtained by the interpolation method, is given by

$$\tilde{x} = L_{med} + \left(\frac{\frac{n}{2} - F_p}{f_{med}}\right) W$$

where $L_{med}$ = lower class boundary of the median class,
$F_p$ = sum of frequencies of all classes lower than the median class (in other words, the
cumulative frequency preceding the median class),
$f_{med}$ = frequency of the median class and W = class width.
The median class is the class with the smallest cumulative frequency greater than or equal to n/2. It
can be located by counting n/2 of the frequencies beginning from the lowest class.
Example 1: The birth weights in pounds of five babies born in a hospital on a certain day are 9.2, 6.4,
10.5, 8.1 and 7.8. Find the median weight of these five babies.
Solution: Arranged in increasing order, the weights are 6.4, 7.8, 8.1, 9.2 and 10.5. The middle (third)
value is 8.1, so the median is 8.1 pounds.
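Both median rules can be sketched in a few lines. The grouped version below assumes equal class widths and takes the lower class boundaries as input. It is applied, purely as an illustration, to the daily wage distribution from the arithmetic mean example (classes 11-13, …, 23-25), for which the text gives no median.

```python
def median_ungrouped(values):
    """Middle value, or mean of the two middle values, of the sorted data."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2

def median_grouped(lower_bounds, freqs, width):
    """Interpolated median: L + ((n/2 - F_p) / f_med) * W."""
    n = sum(freqs)
    cum = 0  # cumulative frequency preceding the current class (F_p)
    for lower, f in zip(lower_bounds, freqs):
        if cum + f >= n / 2:          # first class reaching n/2 is the median class
            return lower + (n / 2 - cum) / f * width
        cum += f

weights = [9.2, 6.4, 10.5, 8.1, 7.8]
med_weight = median_ungrouped(weights)

med_wage = median_grouped([11, 13, 15, 17, 19, 21, 23], [3, 4, 5, 6, 6, 4, 3], 2)
print(med_weight, round(med_wage, 2))
```

For the wage data, n/2 = 15.5 falls in the class 17-19, which is preceded by a cumulative frequency of 12.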
Merits of median
- Median is a positional average and hence it is not influenced by extreme values.
- Median can be calculated even in case of open-ended intervals.
- It gives the best result in the study of phenomena which are incapable of direct quantitative
measurement. Example: intelligence
Demerits of median
- It is not capable of further algebraic treatment.
- It is not a good representative of the data if the number of items (data) is small.
- The arrangement of items in order of magnitude is sometimes very tedious process if the number
of items is very large.

3.3.5 The Mode

The mode or the modal value is the most frequently occurring score/observation in a series and
denoted by x̂ . Note that the mode may not exist in the series or, even if it does exist, it may not be
unique.

For grouped data, the mode is found by the following formula:

$$\hat{x} = L_{mod} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) W$$

where $L_{mod}$ = lower class boundary of the modal class,
$\Delta_1$ = the difference between the frequency of the modal class and that of the next lower class,
$\Delta_2$ = the difference between the frequency of the modal class and that of the next higher class,
W = the class width.
The modal class is the class with the highest frequency in the distribution.
Example 1: The marks obtained by ten students in a semester exam in statistics are: 70, 65, 68, 70,
75, 73, 80, 70, 83 and 86. Find the mode of the students’ marks.
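A short sketch of both cases follows. The ungrouped mode uses `multimode` from Python's standard `statistics` module, which returns every most frequent value. The grouped function and its small frequency table are hypothetical, introduced only to illustrate the formula; the function assumes equal class widths and a single modal class whose frequency exceeds at least one neighbouring class.

```python
from statistics import multimode

marks = [70, 65, 68, 70, 75, 73, 80, 70, 83, 86]
modes = multimode(marks)   # list of all values with the highest frequency

def mode_grouped(lower_bounds, freqs, width):
    """Grouped-data mode: L + (d1 / (d1 + d2)) * W."""
    i = freqs.index(max(freqs))                       # modal class
    prev_f = freqs[i - 1] if i > 0 else 0             # next lower class
    next_f = freqs[i + 1] if i < len(freqs) - 1 else 0  # next higher class
    d1, d2 = freqs[i] - prev_f, freqs[i] - next_f
    return lower_bounds[i] + d1 / (d1 + d2) * width

# Hypothetical table: classes 0-10, 10-20, 20-30 with frequencies 2, 5, 3.
example_mode = mode_grouped([0, 10, 20], [2, 5, 3], 10)
print(modes, example_mode)
```

In the marks data, 70 occurs three times and every other mark occurs once, so the mode is 70.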
Merits of mode
- Mode is not affected by extreme values.
- Mode can be calculated even in the case of open-end intervals. And it is not necessary to know all
observations.

Demerits of mode
- Mode may not exist in the series and if it exists it may not be a unique value.
- It does not fulfill most of the requirements of a good measure of central tendency
- It may be unrepresentative in many cases.

3.4 Measures of Location


Quantiles
Quantiles are values which divide a data set arranged in order of magnitude into a certain number of
equal parts. They are averages of position (non-central tendency). The most commonly used quantiles are
quartiles, deciles and percentiles.

I. Quartiles: are values which divide the data set into four equal parts, denoted by $Q_1$, $Q_2$ and $Q_3$. The
first quartile is also called the lower quartile and the third quartile is the upper quartile. The second
quartile is the median.
- For ungrouped data:
Let $Q_j$ be the jth quartile value for j = 1, 2, 3. Then

$$Q_j = \left[\frac{j}{4}(n + 1)\right]^{\text{th}} \text{ item}; \quad j = 1, 2, 3.$$

- For grouped data
We can apply the following formula:

$$Q_j = L_{Q_j} + \left(\frac{j \times \frac{n}{4} - F_{Q_j}}{f_{Q_j}}\right) W; \quad j = 1, 2, 3.$$

where $Q_j$ = the jth quartile which is to be worked out,
$L_{Q_j}$ = lower class boundary of the jth quartile class,
$F_{Q_j}$ = sum of frequencies of all classes lower than the jth quartile class,
$f_{Q_j}$ = frequency of the jth quartile class and W = class width.
The jth quartile class is the class with the smallest cumulative frequency greater than or equal
to j × n/4. It can be located by counting j × n/4 of the frequencies beginning from the lowest class.

II. Deciles: are values dividing the data into ten equal parts, denoted by $D_1, D_2, \dots, D_9$. The fifth decile
is the median.
- For ungrouped data
Let $D_j$ be the jth decile value for j = 1, 2, …, 9. Then

$$D_j = \left[\frac{j}{10}(n + 1)\right]^{\text{th}} \text{ item}; \quad j = 1, 2, \dots, 9$$

- For grouped data
We can apply the following formula:

$$D_j = L_{D_j} + \left(\frac{j \times \frac{n}{10} - F_{D_j}}{f_{D_j}}\right) W; \quad j = 1, 2, \dots, 9$$

The symbols are defined in the same way as in the case of quartiles.
The jth decile class is the class with the smallest cumulative frequency greater than or equal
to j × n/10. It can be located by counting j × n/10 of the frequencies beginning from the lowest class.

III. Percentiles: are values which divide the data into one hundred equal parts, denoted by $P_1, P_2, \dots, P_{99}$.
The fiftieth percentile is the median.
- For ungrouped data
Let $P_j$ be the jth percentile value for j = 1, 2, 3, …, 99. Then

$$P_j = \left[\frac{j}{100}(n + 1)\right]^{\text{th}} \text{ item}; \quad j = 1, 2, 3, \dots, 99$$

- For grouped data
We can use the following formula:

$$P_j = L_{P_j} + \left(\frac{j \times \frac{n}{100} - F_{P_j}}{f_{P_j}}\right) W; \quad j = 1, 2, 3, \dots, 99$$

The symbols are defined in the same way as in the case of quartiles.
The jth percentile class is the class with the smallest cumulative frequency greater than or equal
to j × n/100. It can be located by counting j × n/100 of the frequencies beginning from the lowest class.

Interpretations

1. $Q_j$ is the value below which (j × 25) percent of the observations in the series are found
(where j = 1, 2, 3). For instance, $Q_3$ is the value below which 75 percent of the observations in the
given series are found.
2. $D_j$ is the value below which (j × 10) percent of the observations in the series are found
(where j = 1, 2, …, 9). For instance, $D_4$ is the value below which 40 percent of the values are
found in the series.
3. $P_j$ is the value below which j percent of the total observations are found
(where j = 1, 2, 3, …, 99). For example, 73 percent of the observations in a given series are
below $P_{73}$.
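Quartiles, deciles and percentiles share one positional rule, so a single function with a `parts` argument (4, 10 or 100) can cover all three. The sketch below makes two assumptions that are not stated in the text: when the position (j/parts)(n + 1) is not a whole number, it interpolates linearly between the two neighbouring items, and for grouped data it assumes equal class widths. The data sets are hypothetical, except for the wage distribution reused from the arithmetic mean example.

```python
def quantile_item(values, j, parts):
    """Value of the (j/parts)*(n+1)-th item of the ordered data."""
    s = sorted(values)
    n = len(s)
    pos = j * (n + 1) / parts     # 1-based position in the ordered data
    k = int(pos)                  # whole part of the position
    frac = pos - k                # fractional part, used for interpolation
    if k < 1:
        return s[0]
    if k >= n:
        return s[-1]
    return s[k - 1] + frac * (s[k] - s[k - 1])

def quantile_grouped(lower_bounds, freqs, width, j, parts):
    """Grouped-data quantile: L + ((j*n/parts - F) / f) * W."""
    n = sum(freqs)
    target = j * n / parts
    cum = 0                       # cumulative frequency below the current class
    for lower, f in zip(lower_bounds, freqs):
        if cum + f >= target:     # first class reaching the target count
            return lower + (target - cum) / f * width
        cum += f

data = [2, 4, 6, 8, 10, 12, 14]   # hypothetical ordered data, n = 7
q1, q2, q3 = (quantile_item(data, j, 4) for j in (1, 2, 3))
d5 = quantile_item(data, 5, 10)
p50 = quantile_item(data, 50, 100)

q2_wage = quantile_grouped([11, 13, 15, 17, 19, 21, 23], [3, 4, 5, 6, 6, 4, 3], 2, 2, 4)
print(q1, q2, q3, d5, p50, round(q2_wage, 2))
```

As the text states, the second quartile, the fifth decile and the fiftieth percentile all coincide with the median.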
3.5 When to Use the Different Averages

The mean is appropriate if the data are quantitative and there are no extreme (abnormal) observations. For
data having extreme values (or for qualitative data measured on an ordinal scale) it is better
to use the median as the measure of central tendency. The median is widely used in
psychology, education and other social sciences. On the other hand, the mode is the best measure of central
tendency for qualitative data with a nominal scale of measurement. It can also be used as a quick
measure of central tendency for both qualitative and quantitative data.

Exercises
1. Explain the desirable properties of measures of central tendency.
2. Discuss the mathematical properties of arithmetic mean.
3. Given the following frequency distribution on wages per week of 100 workers in a certain
factory.
Wage class 39.5-44.5 44.5-49.5 49.5-54.5 54.5-59.5 59.5-64.5 64.5-69.5
No of workers 15 22 30 15 10 8
Calculate average wage paid by the factory
4. The mean salary paid to 1000 employees of an establishment was found to be 180.4. Later on
after disbursement of the salary it was discovered that the salary of two employees was wrongly
entered as 297 and 165. Their correct salaries were 197 and 185. Find the correct average salary
of the employee.
5. A tourist traveled 900 km by train at an average speed of 60 km/hr, 300 km by boat at an
average speed of 25 km/hr, 400 km by plane at an average speed of 350 km/hr and finally 15 km
by train at a speed of 25 km/hr. What is his average speed? (Use the concept of harmonic mean.)
6. Given the following data:
Food items Quantity consumed Price (in birr) (per kg.)
Flour 500 kg. 3.25
Ghee 20 kg. 50.00
Sugar 30 kg. 8.00
Oil 40 kg. 20.00
Calculate the weighted price mean
7. Determine an appropriate average for the following income distribution.

Income Groups: below 100 100-200 200-300 300-400 400-500 above 500
No. of persons: 5 10 18 30 20 17
8. The following table gives the mark distribution of 60 students out of 10% in a mathematics test.
Marks: 4.5 5 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0
No. of students 2 5 7 2 6 4 8 4 2 10 5 5
Find the values of Q1, Q3, P30, D7 and the modal mark.

Chapter Four

Measures of Dispersion (Variation)

4.1 Objectives of Measuring Variation

Variation (dispersion) is the scatter or spread of observations /values/ in a distribution.


The average or central value is of little use unless the degree of variation, which occurs about it,
is given. If the scatter about the measure of central tendency is very large, the average is not a
typical value. Therefore it is necessary to develop a quantitative measure of the dispersion (or
variation) of the values about the average.

Measures of variation are statistical measures, which provide ways of measuring the extent to
which the data are dispersed or spread out.

Measures of variation are needed for the following basic objectives.


 To judge the reliability of a measure of central tendency
 To compare two or more sets of data with regard to their variability
 To control variability itself like in quality control, body temperature, etc
 To make further statistical analysis or to facilitate the use of other statistical measures.

Properties of a good measure of dispersion


A good measure of dispersion should:
- be rigidly defined by a mathematical formula,
- be simple to understand and easy to calculate,
- be unique,
- be based on all observations in the series,
- not be affected by some extreme values existing in the series,
- have sampling stability property, and
- be capable of further algebraic treatment as well as further statistical analysis.

4.2 Absolute and Relative Measures of Dispersion

Measures of dispersion/variation may be either absolute or relative. Absolute measures of
dispersion are expressed in the same unit of measurement in which the original data are given.
These values may be used to compare the variation in two distributions provided that the
variables are in the same units and of the same average size.

If the two sets of data are expressed in different units, however, such as quintals of sugar
versus tonnes of sugarcane, or if the average sizes are very different, such as a manager’s salary
versus a worker’s salary, the absolute measures of dispersion are not comparable. In such cases
measures of relative dispersion should be used.

A measure of relative dispersion is the ratio of a measure of absolute dispersion to an appropriate
measure of central tendency. It is sometimes called a coefficient of dispersion because the word
“coefficient” represents a pure number (that is, one independent of any unit of measurement). It should
be noted that while computing the relative dispersion, the average (the measure of central
tendency) used as a base should be the same one from which the absolute deviations were
measured. Note also that the value of a relative dispersion is a unitless quantity.

4.3 Types of Measures of Dispersion
4.3.1 The Range and Relative Range

Range (R) is defined as the difference between the largest and the smallest observation in a given
set of data. That is, $R = x_{max} - x_{min}$ where $x_{max}$ and $x_{min}$ are the largest and the smallest
observations in the series respectively.

In the case of grouped data, the range is found by taking the difference between the class mark of the
last class and that of the first class. That is, $R = M_{last} - M_{first}$ where $M_{last}$ and $M_{first}$ are the
class marks of the last class and the first class respectively.

A relative range (RR), also known as the coefficient of range, is given by

$$RR = \frac{x_{max} - x_{min}}{x_{max} + x_{min}} = \frac{R}{x_{max} + x_{min}} \quad \text{for ungrouped data}$$

$$RR = \frac{M_{last} - M_{first}}{M_{last} + M_{first}} = \frac{R}{M_{last} + M_{first}} \quad \text{for grouped data}$$

Properties of Range and Relative Range


- Range and relative range are easy to calculate and simple to understand.
- Both cannot be computed for grouped data with open ended classes.
- They do not tell us anything about the distribution of values in the series.

Example 1: Find the range and relative range for the monthly salary of ten workers in a certain
paint factory given below.

462 480 534 624 498 552 606 588 516 570

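Solution:
$x_{max}$ = 624 birr, $x_{min}$ = 462 birr

$$R = x_{max} - x_{min} = 624 \text{ birr} - 462 \text{ birr} = 162 \text{ birr}$$

$$RR = \frac{x_{max} - x_{min}}{x_{max} + x_{min}} = \frac{624 \text{ birr} - 462 \text{ birr}}{624 \text{ birr} + 462 \text{ birr}} = \frac{162 \text{ birr}}{1086 \text{ birr}} \approx 0.149$$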

Example 2: Find the values of the range and relative range for the following frequency
distribution: which shows the distribution of the maximum loads supported by a certain number
of cables.
Maximum load Number
(in kilo-Newton) of cables
93 – 97 2
98 – 102 5
103 – 107 12
108 – 112 17
113 – 117 14
118 – 122 6
123 – 127 3
128 – 132 1

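Solution:
$M_{first}$ = 95 kN, $M_{last}$ = 130 kN

$$R = M_{last} - M_{first} = 130 \text{ kN} - 95 \text{ kN} = 35 \text{ kN}$$

$$RR = \frac{M_{last} - M_{first}}{M_{last} + M_{first}} = \frac{130 \text{ kN} - 95 \text{ kN}}{130 \text{ kN} + 95 \text{ kN}} = \frac{35 \text{ kN}}{225 \text{ kN}} \approx 0.156$$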
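Both range examples can be reproduced with a short sketch. For the grouped data, the class mark is taken as the midpoint of the class limits.

```python
# Example 1: ungrouped data (monthly salaries in birr).
salaries = [462, 480, 534, 624, 498, 552, 606, 588, 516, 570]
r = max(salaries) - min(salaries)
rr = r / (max(salaries) + min(salaries))

# Example 2: grouped data (maximum loads in kN), using class marks.
class_limits = [(93, 97), (98, 102), (103, 107), (108, 112),
                (113, 117), (118, 122), (123, 127), (128, 132)]
marks = [(lo + hi) / 2 for lo, hi in class_limits]
r_grouped = marks[-1] - marks[0]
rr_grouped = r_grouped / (marks[-1] + marks[0])

print(r, round(rr, 3), r_grouped, round(rr_grouped, 3))
```

Note that the units cancel in the relative range, which is why it can be compared across data sets measured in different units.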

4.3.2 The Mean Deviation and Coefficient of Mean Deviation

The mean deviation (MD) measures the average deviation of a set of observations about their
central value, generally the mean or the median, ignoring the plus/minus sign of the deviations.

The mean deviation of a sample of n observations $x_1, x_2, \dots, x_n$ is given as

$$MD = \frac{\sum |x_i - A|}{n}$$

where A is a central measure (the mean or the median).
In case of grouped data, the formula for MD becomes

$$MD = \frac{\sum f_i |x_i - A|}{n}$$

where $x_i$ is the class mark of the ith class, $f_i$ is the frequency of the ith class and $n = \sum f_i$.

- The mean deviation about the arithmetic mean is, therefore, given by

$$MD = \frac{\sum |x_i - \bar{x}|}{n} \quad \text{for ungrouped data}$$

$$MD = \frac{\sum f_i |x_i - \bar{x}|}{n} \quad \text{for grouped frequency distribution}$$

where $x_i$ is the class mark of the ith class, $f_i$ is the frequency of the ith class and $n = \sum f_i$.

- The mean deviation about the median is also given by

$$MD = \frac{\sum |x_i - \tilde{x}|}{n} \quad \text{for ungrouped data}$$

$$MD = \frac{\sum f_i |x_i - \tilde{x}|}{n} \quad \text{for grouped frequency distribution}$$

where $x_i$ is the class mark of the ith class, $f_i$ is the frequency of the ith class and $n = \sum f_i$.

The coefficient of mean deviation (CMD) is the ratio of the mean deviation of the observations to
their appropriate measure of central tendency: the arithmetic mean or the median.
In general, $CMD = \dfrac{MD}{A}$ where A is a measure of central tendency: the arithmetic mean or the
median.

That is, CMD about the arithmetic mean is given by $CMD = \dfrac{MD}{\bar{x}}$ where MD is the mean
deviation calculated about the arithmetic mean. On the other hand, CMD about the median is
given by $CMD = \dfrac{MD}{\tilde{x}}$, in which case MD is calculated about the median of the observations.
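As an illustration of these formulas, the sketch below computes the mean deviation about the mean and about the median, and the corresponding coefficient, for the ten monthly salaries used in the range example of Section 4.3.1. The text does not work this case out, so the figures are offered only as a check on the method.

```python
from statistics import mean, median

def mean_deviation(values, center):
    """Average absolute deviation of the values from a chosen center A."""
    return sum(abs(x - center) for x in values) / len(values)

salaries = [462, 480, 534, 624, 498, 552, 606, 588, 516, 570]

md_mean = mean_deviation(salaries, mean(salaries))       # about the mean
md_median = mean_deviation(salaries, median(salaries))   # about the median
cmd_mean = md_mean / mean(salaries)                      # coefficient of MD

print(md_mean, md_median, round(cmd_mean, 4))
```

Here the mean and the median are both 543 birr, so the two mean deviations coincide at 45 birr.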

Properties of Mean Deviation and coefficient of mean deviation


- It is easy to understand and compute
- It is based on all observations
- It is not affected very much by extreme value(s).
- It is not capable of further mathematical treatments and it is not a very accurate measure
of dispersion.

4.3.3 The Variance, the Standard Deviation and Coefficient of Variation

The Variance
Variance is the arithmetic mean of the squares of the deviations of observations from their
arithmetic mean.
- Population Variance ($\sigma^2$)
For ungrouped data

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{1}{N}\left[\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}\right]$$

where $\mu$ is the population arithmetic mean and N is the total number of observations in the population.
For grouped data

$$\sigma^2 = \frac{\sum f_i (x_i - \mu)^2}{N} = \frac{1}{N}\left[\sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{N}\right]$$

where $\mu$ is the population arithmetic mean, $x_i$ is the class mark of the ith class, $f_i$ is the
frequency of the ith class and $N = \sum f_i$.

- Sample Variance ($S^2$)
For ungrouped data

$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left[\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right]$$

where $\bar{x}$ is the sample arithmetic mean and n is the total number of observations in the sample.
For grouped data

$$S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left[\sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{n}\right]$$

where $\bar{x}$ is the sample arithmetic mean, $x_i$ is the class mark of the ith class, $f_i$ is the
frequency of the ith class and $n = \sum f_i$.
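The computational ("shortcut") forms of the sample variance can be checked against Python's standard `statistics` module. The sketch below reuses the ten monthly salaries from the range example in Section 4.3.1 as illustrative data. The helper functions and the small frequency table at the end are hypothetical and are not part of the text.

```python
from statistics import pvariance, variance

def sample_variance(values):
    """Shortcut form: (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / (n - 1)

def grouped_sample_variance(marks, freqs):
    """Shortcut form for a frequency distribution with class marks."""
    n = sum(freqs)
    sfx = sum(f * x for f, x in zip(freqs, marks))
    sfx2 = sum(f * x * x for f, x in zip(freqs, marks))
    return (sfx2 - sfx ** 2 / n) / (n - 1)

salaries = [462, 480, 534, 624, 498, 552, 606, 588, 516, 570]
s2 = sample_variance(salaries)          # divides by n - 1
sigma2 = pvariance(salaries)            # library version, divides by N

# Hypothetical table: values 1, 2, 3 with frequencies 1, 2, 1.
s2_grouped = grouped_sample_variance([1, 2, 3], [1, 2, 1])
print(s2, sigma2, variance(salaries), round(s2_grouped, 4))
```

The only difference between the sample and population results is the divisor: n − 1 for the sample variance, N for the population variance.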

The Standard Deviation
Standard deviation is the positive square root of the variance.
- Population Standard Deviation ($\sigma$)
$\sigma = \sqrt{\sigma^2}$ where $\sigma^2$ is the population variance.

- Sample Standard Deviation (S)
$S = \sqrt{S^2}$ where $S^2$ is the sample variance.

Coefficient of Variation
The standard deviation is an absolute measure of dispersion. The corresponding relative measure
is known as the coefficient of variation (CV).

The coefficient of variation is used in problems where we want to compare the variability of two
or more different series. The coefficient of variation is the ratio of the standard deviation to
the arithmetic mean, usually expressed in percent:

$$CV = \frac{S}{\bar{x}} \times 100$$

where S is the standard deviation of the observations.
A distribution having a smaller coefficient of variation is said to be less variable, more consistent,
more uniform or more homogeneous.

Example: Last semester, the students of Biology and Chemistry Departments took Stat 273
course. At the end of the semester, the following information was recorded.

Department Biology Chemistry


Mean score 79 64
Standard deviation 23 11

Compare the relative dispersions of the two departments’ scores using the appropriate way.

Solution:
Biology Department: $CV = \dfrac{S}{\bar{x}} \times 100 = \dfrac{23}{79} \times 100 \approx 29.11\%$
Chemistry Department: $CV = \dfrac{S}{\bar{x}} \times 100 = \dfrac{11}{64} \times 100 \approx 17.19\%$

Interpretation: Since the CV of Biology Department students is greater than that of Chemistry
Department students, we can say that there is more dispersion relative to the mean in the
distribution of Biology students’ scores compared with that of Chemistry students.
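The comparison can be reproduced with a small helper. The function name is chosen here for illustration.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100

cv_biology = coefficient_of_variation(23, 79)
cv_chemistry = coefficient_of_variation(11, 64)

more_variable = "Biology" if cv_biology > cv_chemistry else "Chemistry"
print(round(cv_biology, 2), round(cv_chemistry, 2), more_variable)
```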

Properties of the Variance and the Standard Deviation


Variance
– It removes most of the demerits or drawbacks of the measures of dispersion discussed so far.
– Its unit is the square of the unit of measurement of values. For example, if the variable is
measured in kg, the unit of variance is kg2.
– It is calculated based on all the observations/data in the series.
– It gives more weight to extreme values and less to those which are near to the mean.

Standard Deviation
– It is considered to be the best measure of dispersion.
– If the values of two series have different units of measurement, we cannot
compare their variability just by comparing the values of their respective standard deviations (a demerit).
– It is calculated based on all the observations/data in the series. Standard deviation is capable of
further algebraic treatment.
– Standard deviation is neither easy to calculate nor easy to understand.
– Similar to the variance, standard deviation gives more weight to extreme values and less to
those which are near to the mean.

The Standard Scores (Z-Scores)


A standard score is a measure that describes the relative position of a single score in the entire
distribution of scores in terms of the mean and standard deviation. It also gives us the number of
standard deviations a particular observation lies above or below the mean.

Population standard score: $Z = \dfrac{x - \mu}{\sigma}$ where x is the value of the observation, and $\mu$ and $\sigma$ are the
mean and standard deviation of the population respectively.

Sample standard score: $Z = \dfrac{x - \bar{x}}{S}$ where x is the value of the observation, and $\bar{x}$ and S are the mean
and standard deviation of the sample respectively.

Interpretation:

Example: Two sections were given an exam in a course. The average score was 72 with standard
deviation of 6 for section 1 and 85 with standard deviation of 5 for section 2. Student A from
section 1 scored 84 and student B from section 2 scored 90. Who performed better relative to
his/her group?
Solution: Section 1: x̄1 = 72, S1 = 6 and score of student A from Section 1: xA = 84
Section 2: x̄2 = 85, S2 = 5 and score of student B from Section 2: xB = 90

Z-score of student A: ZA = (xA - x̄1)/S1 = (84 - 72)/6 = 2.00
Z-score of student B: ZB = (xB - x̄2)/S2 = (90 - 85)/5 = 1.00
From these two standard scores, we can conclude that student A has performed better relative to
his/her section because his/her score is two standard deviations above the mean score of
section 1, while the score of student B is only one standard deviation above the mean score of
section 2.
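The comparison above can be reproduced with a few lines of Python. This is a minimal sketch; the function name `z_score` is only illustrative and is not part of the original notes.

```python
def z_score(x, mean, sd):
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_a = z_score(84, 72, 6)  # student A, section 1
z_b = z_score(90, 85, 5)  # student B, section 2

print(f"Z_A = {z_a:.2f}, Z_B = {z_b:.2f}")
# The student with the larger standard score did better relative to the group.
print("Student A" if z_a > z_b else "Student B", "performed better relative to the section")
```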

Exercises
1. Consider the marks (out of 20%) of 20 students in a biology test, as follows:
Marks of students     0-5   5-10   10-15   15-20   Total
Number of students    2     6      8       4       20
Find
i. Range
ii. Quartile deviation
iii. Mean and median deviation
iv. Variance and standard deviation
2. The final exam of a course consists of two exams: Mathematics and History. A student
scored 66 in Mathematics and 80 in History. The average score of all students is 51
with a standard deviation of 12 in Mathematics, and 72 with a standard deviation of 16 in
History.
a. In which subject did the student perform better?
b. In which subject do the students have more similar (consistent) results?

Chapter Five

Elementary Probability

5.1. Definition of basic terms of probability

Random experiment: - is a process of measurement or observation which can be repeated at any time
and whose outcome cannot be predicted with certainty. E.g. tossing a coin
Outcome: - a particular result of an experiment (result of single trial of an experiment)
Sample space: - is the set of all possible outcomes of a random experiment. Each possible
outcome is called sample point.
Event: - is a subset of a sample space (one or more outcomes of an experiment)
Example 1: if we toss a coin, the sample space (S) of this experiment is
S = {head, tail}, where head and tail are the two faces of a coin. If we are interested in the outcome
that a head turns up, then the event is E = {head}
Example 2: the sample space of tossing a coin twice is
S= {HH, HT, TH, TT}
Elementary or simple event: - an event having only one sample point.
Mutually exclusive events: - two events E1 and E2 are said to be mutually exclusive if there is no
sample point which is common to E1 and E2, i.e. E1 ∩ E2 = ∅.
If E1 and E2 are mutually exclusive events, then P (E1 ∪ E2) = P (E1) + P (E2).
Independent events: - two events E1 and E2 are said to be independent if the occurrence or
non-occurrence of one does not affect the occurrence or non-occurrence of the other.
Equally likely outcomes: - outcomes are equally likely if each outcome in the sample space has the
same chance of occurring.
Example: In throwing a fair die, all possible outcomes are equally likely. That means the elements
of the sample space have an equal chance of occurring.
Definition of probability
Probability: - is the chance (likelihood) of occurrence of an event. It is expressed by a numerical
value between 0 and 1 inclusive. Probability is a building block of inferential statistics.
Generally, probability can be divided into two:
i) Subjective probability: - the probability of an event in a certain experiment based on an
individual’s belief or attitude.
ii) Objective probability: - the probability of an event in a certain experiment based on
experimental evidence.

Counting techniques:
In order to determine the number of outcomes, one can use several rules of counting.
1. Multiplication rule: - in a sequence of n events in which the first event has k1 possibilities, the
second event has k2 possibilities, …, and the nth event has kn possibilities, the total number of
possibilities of the sequence is k1·k2·…·kn.
Example: - The personnel department of a large corporation wishes to issue each employee an ID
card with two letters followed by a two-digit number. How many different ID cards can be
issued?
Solution
K1 K2 K3 K4
26 26 10 10
Thus the total number of ID cards issued could be:
26*26*10*10=67600(with repetition)
26*25*10*9=58500 (without repetition)
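The multiplication rule is simply a product of the number of choices at each position. A small sketch of the ID card count, using `math.prod` from the Python standard library (the variable names are illustrative):

```python
from math import prod

# Choices for each of the four positions: letter, letter, digit, digit.
with_repetition = prod([26, 26, 10, 10])

# Without repetition, the second letter and the second digit each lose one choice.
without_repetition = prod([26, 25, 10, 9])

print(with_repetition, without_repetition)
```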
2. Permutation: is an arrangement of n objects in a specific order. In this case order is crucial.
a) The number of permutations of n objects taken all together is n!, i.e. n! / (n-n)! (since 0! = 1)
b) The number of arrangements of n distinct objects in a specific order, taking r objects at a time, is given by
nPr = n!/(n-r)! = n(n-1)(n-2)…(n-r+1)
c) The number of permutations of n objects in which k1 are alike, k2 are alike, …, kr are alike is
n! / (k1! k2! … kr!)
Example: a photographer wants to arrange 3 persons in a row for a photograph. How many
different photographs are possible?
Solution:
Assume the 3 persons are Aster (A), Lemma (L) and Yared (Y), so n = 3.
Since n! = 3! = 3*2*1 = 6, there are 6 possible arrangements: ALY, AYL, LAY, LYA, YLA and YAL
Example 2: fifteen athletes, including Haile, entered a race.
a) In how many different ways could prizes for the first, the second and the third place be
awarded?
b) How many of the triplets counted above have Haile in the first position?
Solution:
a) 15 objects taken 3 at a time: 15P3 = 15! / (15-3)! = 15*14*13 = 2730
b) With Haile fixed in first place, the other two places are filled from the remaining 14 athletes:
14P2 = 14! / (14-2)! = 14*13 = 182

3. Combination: - a counting technique in which the order of the objects is immaterial. It is a selection of
r objects from a collection of n objects, where r ≤ n, without regard to order. The number of
combinations of n objects taken r at a time is given by
nCr = n! / ((n-r)! r!)
Example: In a club containing 7 members a committee of 3 people is to be formed. In how many
ways can the committee be formed?
Solution: 7C3 = 7! / (7-3)! 3! = 35
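The permutation and combination examples above can be checked with the standard library functions `math.perm`, `math.comb` and `itertools.permutations` (available in Python 3.8 and later). The variable names below are illustrative only.

```python
from itertools import permutations
from math import comb, factorial, perm

# Photograph example: every ordering of Aster (A), Lemma (L) and Yared (Y).
photo_orders = ["".join(p) for p in permutations("ALY")]
print(photo_orders, len(photo_orders))

# Race example: ordered prizes for 3 of 15 athletes, then with Haile fixed in first place.
prize_orders = perm(15, 3)
haile_first = perm(14, 2)

# Committee example: order does not matter, so use a combination.
committees = comb(7, 3)

print(prize_orders, haile_first, committees)
```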
Basic approaches to probability
Classical approach: - uses the sample space to determine the numerical probability that an event
will happen. If an experiment has n equally likely outcomes, and k of these outcomes belong to
event E, then the probability of event E, denoted by P (E), is defined as
P (E) = n (E)/ n(S) = k/n

Deficiencies of classical approach

The classical approach cannot be applied:
- If the total number of outcomes is infinite or if it is not possible to enumerate all elements of
the sample space.
- If the outcomes are not equally likely.
Example: in the experiment of tossing a coin and a die together, find the probability of the event
E consisting of a head and an even number.
Solution: S = {H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6}, then
E = {H2, H4, H6}; thus P (E) = n (E)/n(S) = 3/12 = 1/4
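The classical approach amounts to listing the sample space and counting. A minimal sketch for the coin and die example, using exact fractions (the names `sample_space`, `event` and `p_event` are illustrative):

```python
from fractions import Fraction
from itertools import product

# Each outcome is a coin face followed by a die face, e.g. "H2".
sample_space = [coin + str(die) for coin, die in product("HT", range(1, 7))]

# Event E: a head together with an even number.
event = [s for s in sample_space if s[0] == "H" and int(s[1]) % 2 == 0]

p_event = Fraction(len(event), len(sample_space))
print(event, p_event)
```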
Let S be the sample space of an experiment. P is called a probability function if it satisfies the
following conditions:
0 ≤ P (A) ≤ 1 for each event A, where P (A) is called the probability of A
P (S) = 1
Note: If A and B are mutually exclusive events, then P (A ∪ B) = P (A) + P (B)
Similarly, if A1, A2, …, An are mutually exclusive events, then
P (A1 ∪ A2 ∪ … ∪ An) = P (A1) + P (A2) + … + P (An) = ∑ P (Ai), where the sum runs over i = 1, …, n

Relative frequency approach (empirical approach): - suppose we repeat a certain experiment n
times, let A be an event of the experiment, and let k be the number of times that event A occurs.
Then the ratio k/n is called the relative frequency of event A.
P (A) = (number of times event A has occurred) / (total number of observations) = k/n
In other words, given a frequency distribution, the probability of an event (E) being in a
given class is
P (E) = (frequency of the class) / (total frequency in the distribution)
Example: the National Center for Health Statistics reported that out of every 539 deaths in recent years,
24 resulted from automobile accidents, 182 from cancer, and 333 from other diseases. What is the
probability that a particular death is due to an automobile accident?
Solution: P (automobile) = deaths due to automobile accidents / total deaths = 24/539

Rules of probability
Rule 1: let A be an event and A’ be the complement of A with respect to a given sample space of
an experiment, then P (A’) = 1 - P (A)
Proof:
Let S be the sample space.
⇒ S = A ∪ A’
⇒ A ∩ A’ = ∅ ⇒ P (A ∩ A’) = 0
⇒ P (S) = P (A ∪ A’) = P (A’) + P (A) - P (A ∩ A’)
⇒ 1 = P (A’) + P (A) - 0 ⇒ 1 = P (A’) + P (A)
⇒ P (A’) = 1 - P (A)
Rule 2: let A and B be events of a sample space S, then
P (A’ ∩ B) = P (B) - P (A ∩ B)
Proof: B = S ∩ B = (A ∪ A’) ∩ B = (A ∩ B) ∪ (A’ ∩ B), where (A ∩ B) and (A’ ∩ B) are mutually exclusive.
Case 1: if A ∩ B ≠ ∅, then P (B) = P (A ∩ B) + P (A’ ∩ B)
⇒ P (A’ ∩ B) = P (B) – P (A ∩ B)
Case 2: if A ∩ B = ∅, then P (B) = P (A ∩ B) + P (A’ ∩ B), and since P (A ∩ B) = P (∅) = 0,
⇒ P (B) = P (A’ ∩ B)


Rule 3: Suppose A and B are two events of a sample space, then
P (A ∪ B) = P (A) + P (B) - P (A ∩ B)
Example: A fair die is thrown twice. Calculate the probability that the sum of the spots on the faces
that turn up is divisible by 2 or 3.

Solution:
S = {(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(2,1),(2,2),(2,3),(2,4),(2,5),(2,6),
(3,1),(3,2),(3,3),(3,4),(3,5),(3,6),(4,1),(4,2),(4,3),(4,4),(4,5),(4,6),
(5,1),(5,2),(5,3),(5,4),(5,5),(5,6),(6,1),(6,2),(6,3),(6,4),(6,5),(6,6)}
This sample space has 6*6 = 36 elements. Let E1 be the event that the sum of the spots
is divisible by 2 and E2 be the event that the sum of the spots is divisible by 3. E1 has 18
elements, E2 has 12 elements (sums 3, 6, 9 and 12) and E1 ∩ E2 has 6 elements (sums 6 and 12), so
P (E1 or E2) = P (E1 ∪ E2)
= P (E1) + P (E2) – P (E1 ∩ E2)
= 18/36 + 12/36 - 6/36 = 24/36 = 2/3
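The addition rule in this example can be verified by enumerating all 36 outcomes. The sketch below counts each event directly and also checks the result against a direct count of "divisible by 2 or 3". The helper `prob` is a hypothetical name used only for this illustration.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (first throw, second throw) of a fair die.
S = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability: favourable outcomes divided by all outcomes."""
    return Fraction(sum(1 for outcome in S if event(outcome)), len(S))

p_e1 = prob(lambda o: sum(o) % 2 == 0)      # sum divisible by 2
p_e2 = prob(lambda o: sum(o) % 3 == 0)      # sum divisible by 3
p_both = prob(lambda o: sum(o) % 6 == 0)    # divisible by 2 and by 3
p_either = prob(lambda o: sum(o) % 2 == 0 or sum(o) % 3 == 0)

print(p_e1, p_e2, p_both, p_either)
```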
Conditional probability: the conditional probability of an event A given B is defined as
the probability that event A occurs given that event B has already occurred.
P (A/B) = P (A ∩ B)/ P (B), where P (B) > 0
Remark: (i) P (A ∩ B) and P (B) are computed with respect to the original sample space
(ii) P (S/B) = P (S ∩ B)/P (B) = P (B)/P (B) = 1
(iii) P (B/S) = P (B), because P (B/S) = P (B ∩ S)/P (S) = P (B)/1 = P (B)
(iv) If A and B are independent events, then P (A/B) = P (A) and P (B/A) = P (B). Two events
are independent if the occurrence of B doesn’t affect the occurrence of A. Since
P (A/B) = P (A ∩ B)/P (B),
P (A ∩ B) = P (A/B) * P (B), but P (A/B) = P (A)
Hence P (A ∩ B) = P (A) * P (B)
Example: Suppose that an office has 100 calculating machines. Some of them use electric power
(E) while others are manual (M), and some machines are new (N) while others are used
(U). The table below gives the number of machines in each category. A person enters the office, picks
a machine at random and discovers that it is new. What is the probability that it uses
electric power?
        E    M    Total
N       40   30   70
U       20   10   30
Total   60   40   100
Solution: P (E/N) = P (E ∩ N) / P (N) = (40/100)/(70/100) = 40/70 = 4/7
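The conditional probability in this example can be computed straight from the table of counts. A small sketch, with the table stored as a dictionary keyed by (condition, power source); the layout of `counts` is an illustrative choice:

```python
from fractions import Fraction

# Counts from the table: N = new, U = used, E = electric, M = manual.
counts = {("N", "E"): 40, ("N", "M"): 30, ("U", "E"): 20, ("U", "M"): 10}
total = sum(counts.values())

p_new = Fraction(sum(v for (cond, _), v in counts.items() if cond == "N"), total)
p_electric_and_new = Fraction(counts[("N", "E")], total)

# P(E/N) = P(E and N) / P(N)
p_electric_given_new = p_electric_and_new / p_new
print(p_electric_given_new)
```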

Bayes’ theorem
Theorem 1.1: let {E1, E2, …, En} be a partition of the sample space S, and suppose each of E1, E2, …, En
has non-zero probability, that is P (Ei) ≠ 0 for i = 1, 2, …, n. Let E be any event, then
P (E) = P (E1)*P (E/E1) + P (E2)*P (E/E2) + … + P (En)*P (E/En)
= ∑ P (Ei)*P (E/Ei), where the sum runs over i = 1, …, n

Theorem 1.2: (Bayes’ theorem)
Let {E1, E2, …, En} be a partition of the sample space S, and suppose each of E1, E2, …, En has non-zero
probability, that is P (Ei) ≠ 0 for i = 1, 2, …, n. Let E be any event for which P (E) > 0. Then
for each integer k, 1 ≤ k ≤ n, we have
P (Ek/E) = P (Ek)*P (E/Ek) / ∑ P (Ei)*P (E/Ei), where the sum runs over i = 1, …, n

Example: suppose that three machines A1, A2 and A3 produce 60%, 30% and 10%
respectively of the total production, and that the percentages of defective output of these
machines are 2%, 4% and 6% respectively.
a) If an item is selected at random, find the probability that the item is defective.
b) Assuming that an item selected at random is found to be defective, find the probability that the
item was produced on machine A1.
Solution: let B be the event of selecting a defective item at random, and let E1, E2, E3 be the events
that an item was produced on machines A1, A2, A3 respectively. Then
P (E1) = 0.6, P (E2) = 0.3 and P (E3) = 0.1
P (B/E1) = 2% = 0.02, P (B/E2) = 4% = 0.04 and P (B/E3) = 6% = 0.06
P (B) = P (B ∩ [E1 ∪ E2 ∪ E3])
= P ([B ∩ E1] ∪ [B ∩ E2] ∪ [B ∩ E3])
= P (B ∩ E1) + P (B ∩ E2) + P (B ∩ E3)
= P (E1)*P (B/E1) + P (E2)*P (B/E2) + P (E3)*P (B/E3)
= 0.6*0.02 + 0.3*0.04 + 0.1*0.06
= 0.03
We use Bayes’ formula:
P (E1/B) = P (E1 ∩ B)/P (B) = P (E1)*P (B/E1) / ∑ P (Ei)*P (B/Ei) = (0.6 * 0.02)/0.03 = 0.4
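The two steps of this example (total probability, then Bayes' formula) can be sketched in Python with exact fractions. The sketch assumes a 10% production share for machine A3, which is the value used in the arithmetic of the solution; the dictionary names are illustrative.

```python
from fractions import Fraction

# Share of total production per machine (A3 taken as 10%, as in the solution's arithmetic).
priors = {"A1": Fraction(60, 100), "A2": Fraction(30, 100), "A3": Fraction(10, 100)}
# Probability that an item is defective, given the machine that produced it.
defect_rates = {"A1": Fraction(2, 100), "A2": Fraction(4, 100), "A3": Fraction(6, 100)}

# Theorem of total probability: P(B) = sum of P(Ei) * P(B/Ei).
p_defective = sum(priors[m] * defect_rates[m] for m in priors)

# Bayes' theorem: P(Ei/B) = P(Ei) * P(B/Ei) / P(B).
posteriors = {m: priors[m] * defect_rates[m] / p_defective for m in priors}

print(float(p_defective), {m: float(p) for m, p in posteriors.items()})
```

The posterior probabilities add up to 1, which is a quick check that the partition has been handled consistently.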

Exercise

1. For two equally likely, exhaustive and independent events A and B, P (A ∩ B) = ------

2. If there is any event A in the sample space(S), prove that

i) P(A/S) = P(A) and ii) P(S/A) = 1
3. From your class of 20 female and 30 male students, the department head wants to
select 5 female and 7 male students for a specific meeting.
a. In how many possible ways can the required students be selected without
any restriction?
b. What is the probability that 6 male and 3 female students are included in the
meeting?
4. Five biology, 2 statistics and 3 physics books are to be arranged in a row. If books of
the same subject are not distinguishable from each other, how many different
arrangements are possible?
5. There are 12 ways in which a manufactured item can have a minor defect and 10 ways in
which it can have a major defect. In how many ways can
a. One minor and one major defect occur?
b. Two minor and 2 major defects occur?
6. Out of 3 mathematicians and 7 physicists, a committee consisting of 2 mathematicians and
3 physicists is to be formed.
i. In how many ways can this be done if
a. Any mathematicians and physicists can be included?
b. One particular physicist must be on the committee?
c. Two particular mathematicians cannot be on the committee?
ii. Find the probabilities of a, b and c above.

Chapter six: Probability Distributions

Probability distribution: is a list of all the possible outcomes of an experiment and the
probability associated with each outcome.
Example: Suppose we are interested in the number of heads showing face up on 3 tosses of a coin.
This is the experiment, and the possible outcomes are 0 heads, 1 head, 2 heads and 3 heads. What
is the probability distribution for the number of heads?
Solution: The experiment has 8 possible outcomes, and below is the list of all the outcomes.
Possible result   1st toss   2nd toss   3rd toss   No. of heads
1.                T          T          T          0
2.                T          T          H          1
3.                T          H          T          1
4.                T          H          H          2
5.                H          T          T          1
6.                H          T          H          2
7.                H          H          T          2
8.                H          H          H          3
From the above table, the probability distribution for the number of heads is
No. of heads, x P (outcome), P (x)
0 1/8
1 3/8
2 3/8
3 1/8
Total 1
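The table above can be generated by listing the eight equally likely outcomes and counting heads in each. A minimal sketch (names are illustrative):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2*2*2 = 8 equally likely results of three tosses, e.g. ('T', 'H', 'T').
outcomes = list(product("TH", repeat=3))

# How many outcomes give 0, 1, 2 or 3 heads.
head_counts = Counter(outcome.count("H") for outcome in outcomes)

distribution = {x: Fraction(k, len(outcomes)) for x, k in sorted(head_counts.items())}
for x, p in distribution.items():
    print(x, p)
```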
6.1. Random variables.
A random variable is a quantity resulting from an experiment that can assume different values.
In any experiment of chance, the outcomes occur randomly. For example, rolling a single die is
an experiment; and any one of the six possible outcomes can occur at a time.
A random variable may be either discrete or continuous.
i. Discrete random variable: a variable that results from counting and can assume only certain
clearly separated values of some item of interest.
Example: The number of heads in flipping a fair coin 5 times.
ii. Continuous random variable: a variable that results from measuring and can take any value
within a certain range of values.
Example: The distance between Sodo and Addis Ababa could be 330 km, 330.5 km, 331.5 km and
so on, depending on the accuracy of our measuring device.
6.2. Discrete probability distributions (probability mass function), expectation and variance
of discrete random variable
If we organize the values of a discrete random variable in a probability distribution, the distribution is
called a discrete probability distribution; it is also called a probability mass function (pmf). It
can be summarized by its mean and variance.
Mean: The mean of a probability distribution is also referred to as the expected value, E (x), and is
given by
Mean = μ = E (x) = ∑(x p(x))
where p(x) is the probability of the possible value x of the random variable.
Variance & standard deviation: Though the mean is a typical value used to summarize a discrete
probability distribution, it does not describe the spread in the distribution; the variance
does this.
Variance = σ² = ∑(x – μ)² p(x) = ∑x² p(x) – μ²
Standard deviation (σ) = √variance

Example: the following is the probability distribution for the number of cars a company expects
to sell on a particular day.
No. of cars sold, x Probability. P(x)
0 0.1
1 0.2
2 0.3
3 0.3
4 0.1
Total 1.0
1. What type of distribution is it?
2. On a typical day, how many cars does the company expect to sell?
3. What is the variance of the distribution? What is the standard deviation?
Solution:
1. It is a discrete probability distribution.
2. μ = E (x) = ∑(x p(x))
= 0(0.1) + 1(0.2) + 2(0.3) + 3(0.3) + 4(0.1)
= 2.1
Interpretation: Over a large number of days, the company expects to sell an average of 2.1 cars a day.
Of course, it is not possible to sell exactly 2.1 cars on any particular day; the mean is a long-run
average, which is why it is called the expected value.
3. σ² = ∑x² p(x) – μ² = (0²(0.1) + 1²(0.2) + … + 4²(0.1)) - (2.1)² = 5.7 - 4.41 = 1.29
σ = √σ² = √1.29 ≈ 1.136

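The mean, variance and standard deviation of the car sales distribution can be computed directly from the definitions. A short sketch; the dictionary `pmf` simply restates the table of the example, and floating-point results are rounded for display:

```python
from math import sqrt

# Probability mass function from the example: number of cars sold -> probability.
pmf = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.3, 4: 0.1}

mean = sum(x * p for x, p in pmf.items())                   # E(x) = sum of x * p(x)
variance = sum(x**2 * p for x, p in pmf.items()) - mean**2  # sum of x^2 * p(x) minus mean^2
std_dev = sqrt(variance)

print(round(mean, 2), round(variance, 2), round(std_dev, 3))
```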
6.3. Common discrete probability distributions
1. Binomial distribution.
It is used to represent the probability distribution of a discrete random variable. Binomial means
two categories. Each repetition of an observation (trial) results in an outcome which either
possesses or does not possess a specified character. Our primary interest will be in one of
these possibilities. Conventionally, the outcome of primary interest is termed a success and the
alternative outcome is termed a failure. These terms are used irrespective of the nature
of the outcome. For example, non-germination of a seed may be termed a success.
Properties:
1. There must be only two mutually exclusive outcomes: success or failure.
2. The probability of success, p, and the probability of failure, q=1-p, remains constant from
one trial to another.
3. The probability of success in one trial is totally independent of any other trial.
4. The experiment can be repeated many times
Example: The coin flip experiment has only two possible outcomes: head or tail. The probability
of each is known and constant from one trial to another. We can flip a coin many times.
The binomial distribution is computed by
P (x) = nCx (p^x)(q^(n-x))
where
C = combination
n = number of trials
x = number of successes
p = the probability of success
q = 1-p = the probability of failure
Mean of a binomial distribution
μ = np
Variance of a binomial distribution
σ² = npq
Example: There are 5 flights daily from Addis Ababa to Washington, suppose the probability that
any flight arrives late is 0.2. What is the probability that
a. None of the flights are late today?
b. Exactly one flight is late today?
c. Construct the entire probability distribution
d. What is the probability that less than 3 flights are late?
e. What is the probability that more than 4 flights are late?
f. Between 2 and 4 (inclusive) flights are late?
g. Exactly 2 flights are not late?
h. What is the mean?
i. What is the variance?
Solution: we are given that the probability that a particular flight is late is 0.2, and thus the probability that
a particular flight is not late is 0.8. There are 5 flights, so n = 5, and x refers to the number of
successes. In questions a to f, we are asked about the late flights, so here let success = late
flight. Then p = 0.2 and q = 0.8.
a. P (none of the flights are late today) = P (0 flights are late) = P (x = 0)
P (x) = nCx (p^x)(q^(n-x))
P (0) = 5C0 (0.2^0)(0.8^(5-0)) = 0.3277
b. P (exactly one flight is late today) = P (1 flight is late) = P (x = 1)
P (1) = 5C1 (0.2^1)(0.8^(5-1)) = 0.4096
c. The entire distribution is
Number of late flights, x    P (x)
0 0.3277
1 0.4096
2 0.2048
3 0.0512
4 0.0064
5 0.0003
Total 1.0000
d. P (less than 3 flights are late today) = P (x < 3) = P (x = 0) + P (x = 1) + P (x = 2)
From the above table P (x < 3) = 0.3277 + 0.4096 + 0.2048 = 0.9421
e. P (x > 4) = P (x = 5) = 0.0003
f. P (2 ≤ x ≤ 4) = P (x = 2) + P (x = 3) + P (x = 4) = 0.2048 + 0.0512 + 0.0064 = 0.2624
g. P (exactly 2 flights are not late) = ?
Here we are asked about the flights that are not late, so we let success = not late flight.
So p = 0.8 and q = 0.2
Then P (exactly 2 flights are not late) = P (2) = 5C2 (0.8^2)(0.2^(5-2)) = 0.0512
h. μ = np = 5 * 0.2 = 1 late flight, or 5 * 0.8 = 4 not late flights
i. σ² = npq = 5 * 0.2 * 0.8 = 0.8
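The whole flight example can be reproduced from the binomial formula. The sketch below defines a small helper, `binomial_pmf` (an illustrative name, not a library function), and evaluates the quantities asked for in parts a to i.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x successes in n independent trials, each with success probability p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 5, 0.2  # success = a late flight

table = [binomial_pmf(x, n, p) for x in range(n + 1)]  # parts a, b and c
p_less_than_3 = sum(table[:3])                         # part d: x = 0, 1, 2
p_more_than_4 = table[5]                               # part e: x = 5
p_2_to_4 = sum(table[2:5])                             # part f: x = 2, 3, 4
p_two_not_late = binomial_pmf(2, n, 0.8)               # part g: success = not late
mean, variance = n * p, n * p * (1 - p)                # parts h and i

for x, prob in enumerate(table):
    print(x, round(prob, 4))
print(round(p_less_than_3, 4), round(p_2_to_4, 4), round(p_two_not_late, 4))
```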
2. The Poisson distribution
The Poisson distribution is also used to represent the probability distribution of a discrete
random variable. It is employed in describing random events that occur rarely over some
unit of time or space.
Examples of events where the Poisson probability function can be used:
- Number of telephone calls per hour
- Number of typing errors per page
- Number of accidents on a particular road per day
- Hospital emergencies per day, etc.
Assumptions:
1. The probability of occurrence of an event is the same for any two intervals of equal length
(of time or space).
2. The occurrence of an event in any interval is independent of the occurrence in any other
interval.
Given these assumptions, the Poisson distribution is given by the function
P (x) = μ^x e^(-μ) / x!
where x = the number of times the event has occurred
μ = the mean number of occurrences per unit of time or space
e = 2.71828, the base of the natural logarithm system.
Example: Simple observation over the past 80 hours has shown that 800 customers have entered
the shop. What is the probability that
a. exactly 5 customers will arrive during any given hour?
b. more than 3 customers will arrive during any given hour?
c. exactly 5 customers will arrive during any 30 minutes?
Solution: μ = 800/80 = 10 customers per hour
a. P (x = 5) = (10^5)(2.71828^(-10))/5! = 0.0378
b. P (x > 3) = P (4) + P (5) + …
By the complement rule that we have discussed earlier, P (x > 3) = 1 – P (x ≤ 3)
= 1 – [P (0) + P (1) + P (2) + P (3)]
= 1 – [(10^0)e^(-10)/0! + (10^1)e^(-10)/1! + (10^2)e^(-10)/2! + (10^3)e^(-10)/3!]
= 1 – 0.0103 = 0.9897
c. P (x = 5 per 30 minutes)
Here, as we are asked per 30 minutes, we should change the μ value to a rate per 30 minutes; thus
μ = 800/80 = 10 customers per hour = 10 customers per 60 minutes = 5 customers per 30 minutes
P (x = 5) = (5^5)(2.71828^(-5))/5! = 0.175
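The three Poisson probabilities in this example can be checked numerically. The helper `poisson_pmf` below is an illustrative name; it simply evaluates the Poisson formula for a given mean.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(exactly x occurrences) when the mean number of occurrences is mu."""
    return mu**x * exp(-mu) / factorial(x)

mu_per_hour = 800 / 80             # 10 customers per hour
mu_per_half_hour = mu_per_hour / 2  # 5 customers per 30 minutes

p_a = poisson_pmf(5, mu_per_hour)                              # exactly 5 in an hour
p_b = 1 - sum(poisson_pmf(x, mu_per_hour) for x in range(4))   # more than 3 in an hour
p_c = poisson_pmf(5, mu_per_half_hour)                         # exactly 5 in 30 minutes

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```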
3. Hypergeometric distribution
When sampling without replacement from a relatively small population, the probability of
success does not remain constant from trial to trial, so the binomial
distribution should not be used. Instead, the hypergeometric distribution should be
applied.
Assumptions:

6.4. Continuous probability distribution


A continuous probability distribution is described by a probability density function (pdf).
Let x be a continuous random variable; then the pdf of x is a function f(x) such that, for any two
numbers a and b with a ≤ b,
P (a ≤ x ≤ b) = ∫_a^b f(x) dx
which is the area under the curve bounded by x = a and x = b.

If f(x) is the pdf of x, then
1. f(x) ≥ 0 for all x
2. ∫_{-∞}^{∞} f(x) dx = 1
i.e. the area under the graph of f(x) must equal 1, since the sum of relative frequencies is 1.
Example: The diameter of an electronic cable, say x, is assumed to be a continuous random
variable with pdf f(x) = 6x(1-x), 0 ≤ x ≤ 1
1. Check that f(x) is a pdf
2. Determine the number ‘b’ such that P (x < b) = P (x > b)

So/n: 1. To check that f(x) is a pdf, we should check the two conditions:
i. f(x) ≥ 0 for all x. For 0 ≤ x ≤ 1, both x and (1-x) are non-negative, so f(x) = 6x(1-x) ≥ 0.
ii. ∫_{-∞}^{∞} f(x) dx = 1:
∫_0^1 6x(1-x) dx = ∫_0^1 (6x - 6x²) dx = ∫_0^1 6x dx - ∫_0^1 6x² dx = [6x²/2]_0^1 - [6x³/3]_0^1 = 3 - 2 = 1

2. P (x < b) = P (x > b) means P (-∞ < x < b) = P (b < x < ∞)
⇒ ∫_{-∞}^b f(x) dx = ∫_b^∞ f(x) dx
⇒ ∫_0^b 6x(1-x) dx = ∫_b^1 6x(1-x) dx
⇒ [6x²/2 - 6x³/3]_0^b = [6x²/2 - 6x³/3]_b^1
⇒ (3b² - 2b³) - (3(0)² - 2(0)³) = (3(1)² - 2(1)³) - (3b² - 2b³)
⇒ 3b² - 2b³ = 1 - 3b² + 2b³
⇒ 4b³ - 6b² + 1 = 0
Then we can solve this equation for b, taking only the value of b that lies in the given
range of the function. Here b = 1/2 satisfies the equation (4(1/8) - 6(1/4) + 1 = 0) and lies
between 0 and 1.
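Both parts of this example can be checked numerically. The sketch below approximates the area under f(x) = 6x(1-x) with a simple midpoint rule and then finds the root of 4b³ - 6b² + 1 = 0 in [0, 1] by bisection. The helper names (`integrate`, `g`) and the step counts are illustrative choices, not part of the original notes.

```python
def f(x):
    """pdf of the example: 6x(1 - x) on [0, 1], zero elsewhere."""
    return 6 * x * (1 - x) if 0 <= x <= 1 else 0.0

def integrate(func, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of func from a to b."""
    h = (b - a) / steps
    return sum(func(a + (i + 0.5) * h) for i in range(steps)) * h

total_area = integrate(f, 0, 1)  # should be very close to 1

def g(b):
    """Left-hand side of 4b^3 - 6b^2 + 1 = 0."""
    return 4 * b**3 - 6 * b**2 + 1

# Bisection on [0, 1]: g(0) = 1 > 0 and g(1) = -1 < 0, so a root lies in between.
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if g(mid) > 0:
        lo = mid
    else:
        hi = mid
b = (lo + hi) / 2

print(round(total_area, 6), round(b, 6))
```

The value of b found this way splits the area under the curve into two equal halves, so it is the median of the distribution.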

Expected value and variance of a continuous random variable:

1. E(x) = μ = ∫_{-∞}^{∞} x f(x) dx
2. Var (x) = σ² = ∫_{-∞}^{∞} (x - μ)² f(x) dx = ∫_{-∞}^{∞} x² f(x) dx - μ²

Example: Calculate E(x) and Var (x) for the following function:
f(x) = 2x, 0 ≤ x ≤ 1
So/n: 1. E (x) = μ = ∫_{-∞}^{∞} x f(x) dx = ∫_0^1 x(2x) dx = ∫_0^1 2x² dx = [2x³/3]_0^1 = 2/3
2. Var (x) = ∫_{-∞}^{∞} x² f(x) dx - μ² = ∫_0^1 x²(2x) dx - (2/3)² = ∫_0^1 2x³ dx - 4/9
= [2x⁴/4]_0^1 - 4/9 = 2/4 - 4/9 = 2/36 = 1/18

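The expected value and variance of f(x) = 2x on [0, 1] can be approximated numerically as a check on the integration. This sketch uses a simple midpoint rule; `integrate` is a hypothetical helper written for the illustration.

```python
def integrate(func, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of func from a to b."""
    h = (b - a) / steps
    return sum(func(a + (i + 0.5) * h) for i in range(steps)) * h

def f(x):
    """pdf of the example: f(x) = 2x on [0, 1]."""
    return 2 * x

mean = integrate(lambda x: x * f(x), 0, 1)                   # E(x), exact value 2/3
variance = integrate(lambda x: x**2 * f(x), 0, 1) - mean**2  # Var(x), exact value 1/18

print(round(mean, 4), round(variance, 4))
```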
The cumulative distribution function (cdf), F(x)
If x is a continuous random variable with pdf f(x), then
F(x) = P (X ≤ x) = ∫_{-∞}^x f(t) dt, -∞ < x < ∞
Properties
1. 0 ≤ F(x) ≤ 1
2. F’(x) = f(x)
3. F(-∞) = 0, F(∞) = 1
4. P (a ≤ x ≤ b) = F(b) - F(a)
Example: Given f(x) = 6x(1-x), 0 ≤ x ≤ 1,
1. Find F(x)
2. What is P (0.3 ≤ x ≤ 0.8)?
So/n: 1. F(x) = ∫_{-∞}^x f(t) dt = ∫_0^x 6t(1-t) dt, 0 ≤ x ≤ 1
= ∫_0^x 6t dt - ∫_0^x 6t² dt = [6t²/2]_0^x - [6t³/3]_0^x
⇒ F(x) = 3x² - 2x³
2. P (0.3 ≤ x ≤ 0.8) = F(0.8) – F(0.3) = (3(0.8)² – 2(0.8)³) – (3(0.3)² – 2(0.3)³) = 0.896 – 0.216 = 0.68
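Once F(x) is known, interval probabilities are just differences of the cdf. A minimal sketch for this example, assuming F(x) = 3x² - 2x³ on [0, 1] with F = 0 below 0 and F = 1 above 1 (the function name `cdf` is illustrative):

```python
def cdf(x):
    """F(x) for the pdf f(x) = 6x(1 - x) on [0, 1]."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    return 3 * x**2 - 2 * x**3

p_interval = cdf(0.8) - cdf(0.3)  # P(0.3 <= x <= 0.8)
print(round(p_interval, 4))
```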

6.5. Common continuous probability distributions


1. Normal distribution (N-distribution)
It is the most important distribution for describing a continuous random variable and is used as an
approximation of other distributions. A random variable X is said to have a normal distribution if
its probability density function is given by
f(x) = (1/(σ√(2π))) e^(-(x - µ)²/(2σ²)), where x is a real value of X,
i.e. -∞ < x < ∞, -∞ < µ < ∞ and σ > 0,
where µ = E(X) and σ² = Variance(X).
µ and σ² are the parameters of the normal distribution.

Properties of Normal Distribution:

1. It is bell shaped and is symmetrical about its mean. The maximum ordinate is at x = µ.
2. The curve approaches the horizontal x-axis as we go in either direction from the mean.
3. The total area under the curve is 1, that is
∫_{-∞}^{∞} f(x) dx = ∫_{-∞}^{∞} (1/(σ√(2π))) e^(-(x - µ)²/(2σ²)) dx = 1
4. The probability that a random variable will have a value between any two points is equal
to the area under the curve between those points.
5. The height of the normal curve attains its maximum at x = µ; this implies that the mean and
the mode coincide (are equal).

6.4.2 Standard normal Distribution
It is a normal distribution with mean 0 and variance 1. A normal distribution can be converted to
the standard normal distribution as follows. If X has a normal distribution with mean µ and standard
deviation σ, then the standard normal variate Z is given by Z = (X - µ)/σ, and its density is
f(z) = (1/√(2π)) e^(-z²/2)

Properties of the standard normal distribution:

- The same as the normal distribution, but the mean is zero and the variance is one.
- Areas under the standard normal distribution curve have been tabulated in various ways.
The most common ones are the areas between Z = 0 and a positive value of Z.

Given a normally distributed random variable X with mean µ and standard deviation σ,
P (a < X < b) = P ((a - µ)/σ < (X - µ)/σ < (b - µ)/σ)
P (X < a) = P ((X - µ)/σ < (a - µ)/σ). But (X - µ)/σ = Z is the standard normal r.v., so
P (X < a) = P (Z < (a - µ)/σ)

Note: i) P (a < X < b) = P (a ≤ X < b)
= P (a < X ≤ b)
= P (a ≤ X ≤ b)
ii) P (-∞ < Z < ∞) = 1
iii) P (a < Z < b) = P (Z < b) - P (Z < a) for a < b
Consider the situations under the standard normal curve. It is clear that
P (0 < Z < ∞) = 0.5 = P (-∞ < Z < 0)
i) Let Z0 be a negative number, then
P (Z < Z0) = P (Z < 0) - P (Z0 < Z < 0)
ii) If Z0 is a positive real number, then
P (Z > Z0) = P (Z > 0) - P (0 < Z < Z0)
iii) Let Z1 be a negative number and Z2 be a positive real number, then
P (Z1 < Z < Z2) = P (Z1 < Z < 0) + P (0 < Z < Z2)
iv) If Z1 and Z2 are positive real numbers with Z1 < Z2, then
P (Z1 < Z < Z2) = P (0 < Z < Z2) - P (0 < Z < Z1)

[Figures: shaded areas under the standard normal curve for i) P (Z < Z0), ii) P (Z > Z0),
iii) P (Z1 < Z < Z2) with Z1 < 0 < Z2, and iv) P (Z1 < Z < Z2) with 0 < Z1 < Z2]
As the value of σ increases, the curve becomes more and more flat and vice versa.

Examples: - For a standard normal variable Z find
a) P (-2.2 < Z < 1.2)
b) P (Z > 1.05)
c) P (0 < Z < 0.96)
d) P (-1.45 < Z < 0)

Solution:
a) P (-2.2 < Z < 1.2) = P (0 < Z < 1.2) + P (-2.2 < Z < 0)
= P (0 < Z < 1.2) + P (0 < Z < 2.2)
= 0.3849 + 0.4861
= 0.8710
b) P (Z > 1.05) = 1 - P (Z < 1.05)
= 1 - 0.8531 = 0.1469
c) P (0 < Z < 0.96) = 0.3315
d) P (-1.45 < Z < 0) = P (0 < Z < 1.45) = 0.4265

NOTE: By determining the z- value, we can find the area or the probability under any normal
curve by referring to the standard normal distribution table.
How to use the Normal distribution table to determine probabilities
a. If you wish to find the area between 0 and Z (or – Z), look up the value directly in the table.
Example: P (0 < Z < 0.96) = 0.3315
Example: P (-0.96 < Z < 0) = P (0 < Z < 0.96); because the curve is symmetric to z = 0
= 0.3315

b. To find area between two points on the different sides of the mean, add the corresponding
areas found in the N table.
Example: P (-2.2 < Z < 1.2) = P (-2.2 < Z < 0) + P (0 < Z < 1.2)
=P (0 < Z < 2.2) + P (0 < Z < 1.2)
=0.4861 + 0.3849
= 0.8710
c. To find the area between two points on the same side of the mean, determine the areas related
to the two values from the table, and then subtract the smaller area from the larger.
Example: P (0.96 < Z < 1.2) = P (0 < Z < 1.2) – P (0 < Z < 0.96)
= 0.3849 – 0.3315
= 0.0534
Example: P (-1.2 < Z < -0.96) = P (-1.2 < Z < 0) – P (-0.96 < Z < 0)
= P (0 < Z < 1.2) – P (0 < Z < 0.96); because the curve is symmetric about z = 0
= 0.3849 – 0.3315
= 0.0534
d. To find the area beyond Z (or -Z), away from the mean, look up the value of Z
directly in the table, and then subtract it from 0.5.
Example: P (Z > 1.05) = 0.5 – P (0 < Z < 1.05)
= 0.5 – 0.3531
= 0.1469
Example: P (Z < -1.05) =0.5 – P (-1.05 < Z < 0)
=0.5 – P (0< Z < 1.05); because the curve is symmetric to z = 0
=0.5– 0.3531
=0.1469
e. To find the area from Z (or –Z) towards and past the mean, look up the value of Z
directly in the table, and then add the probability to 0.5.
Example: P (Z > -1.05) = P (-1.05 < Z < 0) + P (0 < Z < ∞)
= P (0 < Z < 1.05) + 0.5
= 0.3531 + 0.5
= 0.8531
Example: P (Z < 1.05) = P (-∞ < Z < 0) + P (0 < Z < 1.05)
= 0.5 + 0.3531
= 0.8531
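The table look-ups in rules a to e can be cross-checked without a printed table, because the standard normal cdf can be written with the error function `math.erf` from the Python standard library. In the sketch below, `phi` and `area_0_to` are illustrative helper names; `area_0_to(z)` plays the role of the tabulated area between 0 and Z.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cdf, P(Z < z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def area_0_to(z):
    """Area between 0 and z (the quantity given in the table); uses symmetry for negative z."""
    return phi(abs(z)) - 0.5

p_a = area_0_to(0.96)                     # rule a: P(0 < Z < 0.96)
p_b = area_0_to(-2.2) + area_0_to(1.2)    # rule b: P(-2.2 < Z < 1.2)
p_c = area_0_to(1.2) - area_0_to(0.96)    # rule c: P(0.96 < Z < 1.2)
p_d = 0.5 - area_0_to(1.05)               # rule d: P(Z > 1.05)
p_e = 0.5 + area_0_to(1.05)               # rule e: P(Z < 1.05)

print([round(p, 4) for p in (p_a, p_b, p_c, p_d, p_e)])
```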

Example: The average satellite transmission is 150 seconds, with a standard deviation of 15
seconds. Transmission time appears to be normally distributed. What is the probability that a call will last
a. between 125 and 150 seconds
b. between 145 and 155 seconds
c. more than 175 seconds
d. less than 160 seconds
e. less than 125 seconds
f. between 160 and 165 seconds
g. between 135 and 140 seconds
h. more than 140 seconds
So/n: Given µ = 150, σ = 15, and let x = time
a) P (125 < x < 150)
= P ((125 - µ)/σ < (x - µ)/σ < (150 - µ)/σ)
= P (-1.67 < Z < 0)
= P (0 < Z < 1.67)
= 0.4525
Note: The rest are left as an exercise.
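Part a can be checked numerically by standardizing inside a small helper. The sketch assumes µ = 150 and σ = 15, the values used in the solution, and `normal_cdf` is an illustrative name built on `math.erf`.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X < x) for a normal random variable with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma  # standardize
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 150, 15
p_125_to_150 = normal_cdf(150, mu, sigma) - normal_cdf(125, mu, sigma)
print(round(p_125_to_150, 4))
```

The result differs slightly in the fourth decimal place from a table look-up, because the table is read at z rounded to two decimal places (-1.67) while the code uses the unrounded value.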

The t-distribution (Student’s t distribution)
Suppose we have a sample X1, …, Xn from a normal population that has mean µ (unknown) and
standard deviation σ (unknown), and using this sample data we want to develop an interval
estimator of the population mean µ. Then X̄ ~ N (µ, σ²/n), and
Z = (X̄ - µ)/(σ/√n) has a standard normal distribution. But σ is unknown, so we substitute its
point estimator S. The resulting quantity tn-1 = (X̄ - µ)/(S/√n) is said to follow a t-distribution
with n-1 df; it is used when n is not sufficiently large (n ≤ 30).
N.B. df = degrees of freedom: the number of independent observations in a set of observations.
Note: tα (v) stands for the value of t with v df to the right of which an area equal to α lies.

Example: t0.025 (12) = 2.179 means P (t (12) > 2.179) = 0.025
t0.01 (25) = 2.485 means P (t (25) > 2.485) = 0.01

Exercises
1. Suppose 20% of the population is victims of crime. In a family of 5, what is the probability that
none of the family is a crime victim?

2. Consider a random variable X that takes a value either 1 or 0 with respective probabilities P
and 1-P. find the expected value as well as the variance of the r.v.

3. The probability that a student entering a college will graduate is 0.4. Determine the probability
that out of 5 students (a) none, (b) one, (c) at least one, (d) at most three will graduate.
4. Find the area under the standard normal curve bounded by:

________________ (a) Z= -1.3 and Z= 1.39


________________ (b) Z= 0.73 and Z= 1.36
________________ (c) Z=-2.43 and Z=-1.56
5. A production engineer finds that, on average, mechanics working in a machine shop complete
a certain task in 15 minutes. The times required to complete the task are approximately normally
distributed with a standard deviation of 3 minutes. Find the probabilities that the task is
completed:
________________ (a) in less than 8 minutes,
________________(b) in more than 9 minutes, and
________________(c) between 10 and 12 minutes

Chapter Seven
Sampling and sampling distribution of the mean
Introduction
When secondary data are not available for the problem under study, a decision may be taken to
collect primary data by using any of the methods discussed in the previous chapter. The required
information may be obtained by following either the census method or the sample method.
7.1 Difference between Census and Sample Methods
Census Method
Under the census or complete enumeration survey method, data are collected for each and every unit (person, household, field, shop, factory, etc., as the case may be) of the population or universe, which is the complete set of items that are of interest in any particular situation. For example, if the average wage of workers in the sugar industry is to be calculated, then wage figures would be obtained from each and every worker in the sugar industry. Dividing the total wages that all these workers receive by the number of workers gives the average wage.
Merits of the census method
The merits of the census method are:
1. Data are obtained from each and every unit of the population.
2. The results obtained are likely to be more representative, accurate and reliable.
3. It is an appropriate method of obtaining information on rare events, such as areas under some crops and the yield thereof, the number of persons in certain age groups, their distribution by sex, the educational level of people, etc. This is why, throughout the world, population data are obtained by conducting a census, generally every 10 years.
4. Data from a complete enumeration census can be widely used as a basis for various surveys.

Demerits
However, despite these advantages, the census method is not widely used in practice.
1. The effort, money and time required for carrying out a complete enumeration will generally be very large, and in many cases the cost may be so prohibitive that the very idea of collecting information may have to be dropped. This is especially true of underdeveloped countries, where resources are a big constraint.
2. Also, if the population is infinite or the evaluation process destroys the population unit, the method cannot be adopted.

Sample method
Definition: Sampling is simply the process of learning about the population on the basis of a
sample drawn from it. Thus in the sampling technique, instead of every unit of the universe, only
a part of the universe is studied and the conclusions are drawn on that basis for the entire
universe. A sample is a subset of population units. The process of sampling involves three
elements:
a. Selecting the sample.
b. Collecting the information, and
c. Making an inference about the population.
Practical examples of sampling
- A doctor examines a few drops of blood and draws conclusions about the blood constitution of the whole body.
- A businessman places orders for material after examining only a small sample of it.
- A teacher may put questions to one or two students to find out whether the class as a whole is following the lesson.
In fact, there is hardly any field where the technique of sampling is not used, either consciously or unconsciously.

It should be noted that a sample is not studied for its own sake. The basic objective of its study is
to draw inference about the population. In other words, sampling is a tool which helps to know
the characteristics of the universe or population by examining only a small part of it.
Unit: An element of the population from which information can be obtained.

Sampling Frame:- A list of all the units in the population from which information can be
obtained.
Major reasons why sampling is necessary
1) The destructive nature of certain tests.
2) The physical impossibility of checking all items in the population.
3) The cost of studying all items in the population is often prohibitive.
4) Sample results are usually adequate.
5) Studying every item in the population takes too much time.
7.2. Types of Error
Clearly, an estimate of a population quantity based on a sample might not be exact. We categorize the errors that might occur during estimation into two types:
- Sampling errors
- Non-sampling errors

Sampling Errors
Even if we have a representative sample, we might face errors if the sample size is not sufficiently large. We cannot eliminate this type of error (unless we take a census rather than a sample), but we can assess how large it is likely to be. This enables us to approximate the sample size we need to ensure that our estimates are accurate to a certain degree. Our estimates of parameters will also often be inaccurate if our sample is not representative of the population.

Non-sampling Errors
Suppose we have a representative sample and have chosen a sample size large enough to ensure that our parameter estimates are accurate to a good degree of precision. We might still have other kinds of errors, such as measurement errors, recording errors, non-response errors, and interviewer errors. Measurement errors and recording errors occur if there is an error in measuring the item being studied or in recording its result. Interviewer errors can occur in surveys when an interviewer introduces bias into an interview or when a questionnaire is badly designed. Non-responses can be due to refusals, indifference, lost questionnaires or other factors. If the non-response rate is large, it is important to try to understand why before analyzing the data.

7.3. Methods of Sampling


Sampling techniques/methods can be grouped into two categories:
- Random (probability) sampling methods, and
- Non-random (non-probability) sampling methods.

7.3.1. Random Sampling Methods


1. Simple random sampling (S.R.S)
From a population of N objects, if we wish to choose a sample of size n, we have seen that there are ${}^{N}C_{n}$ possible ways. If each of these possible samples has an equal probability of being chosen, namely $1/{}^{N}C_{n}$, we say the method is simple random sampling. If the population is small, we can easily choose a simple random sample by a lottery method. If the population is large, we can perform the same procedure using a computer or a table of random numbers. S.R.S. is the basis of all random sampling.
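
The "computer" version of the lottery method can be sketched in a few lines of Python. This is only an illustration under assumptions: the population of 50 numbered units, the sample size of 5, and the function name `simple_random_sample` are hypothetical and do not come from the notes.

```python
import math
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)          # the seed only makes the draw repeatable
    return rng.sample(population, n)   # sampling without replacement

population = list(range(1, 51))        # hypothetical frame: units numbered 1..50
sample = simple_random_sample(population, 5, seed=1)

n_possible = math.comb(len(population), 5)   # NCn, the number of possible samples
prob_each = 1 / n_possible                   # probability of any one sample
print(sample, n_possible)
```

Here `math.comb` gives the count of possible samples, so `prob_each` is the common selection probability of each one.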

2. Stratified random sampling


Stratified random sampling is often used when the population can be classified into subgroups called “strata”. The different strata are believed to be very different from each other, but the individuals who make up each stratum are thought to be similar. The units are chosen by simple random sampling from each stratum.

Example:- A researcher wants to study cultural habits in Ethiopia. She could choose a simple random sample of 100 students from W/sodo University and continue her work. However, if by chance the sample contained a large number of students from SNNPR, the result could be misleading (as the SNNPR culture would be emphasized most). She will get a better result if she uses a stratified random sample instead. For this, she might divide the students in the university according to the regions (nations) they came from and decide in advance how many students to choose from each region. This gives her a chance to assess cultural habits from all regions of Ethiopia.
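
A minimal sketch of this idea in Python follows. The region names, student IDs and per-region sample sizes are made up for illustration; only the procedure (a simple random sample inside each stratum, with sizes fixed in advance) follows the text.

```python
import random

def stratified_sample(units_by_stratum, sizes, seed=None):
    """Draw a simple random sample of the requested size from each stratum."""
    rng = random.Random(seed)
    return {
        stratum: rng.sample(units, sizes[stratum])
        for stratum, units in units_by_stratum.items()
    }

# Hypothetical frame: student IDs grouped by region of origin.
students = {
    "Region A": [f"A{i}" for i in range(1, 41)],
    "Region B": [f"B{i}" for i in range(1, 31)],
    "Region C": [f"C{i}" for i in range(1, 21)],
}
sizes = {"Region A": 4, "Region B": 3, "Region C": 2}   # decided in advance
chosen = stratified_sample(students, sizes, seed=7)
print(chosen)
```

Because the sizes are fixed beforehand, no region can dominate the sample by chance.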

3. Cluster random sampling


Cluster random sampling is often used to reduce the cost or inconvenience of a survey. It is used when it is possible to divide the population into subgroups called “clusters”. Ideally, each cluster will be a reflection of the population. The researcher chooses a number of clusters at random using simple random sampling. Every unit in the chosen clusters can then be included in the sample, or the researcher can choose a simple random sample from each chosen cluster.
Example:- A researcher wants to identify problems in the W/sodo University students’ cafeteria service. As the population includes all students in W/sodo University, a simple random sample would include students from all departments of each batch, so interviewing them all might be expensive or impossible. Instead, he can divide the students into clusters, where each cluster is a department of each batch (e.g. first-year biology students, second-year chemistry students, etc.). A number of classes are chosen at random, and the interviewer either takes a census in each chosen class or takes a simple random sample from each.

4. Systematic random sampling


Systematic random sampling is often used when the names of the population are held on a list.
Example:- Suppose we want to choose a sample of about 20 students out of a class of 99 students. First we put the class in order (maybe alphabetical order, or by ID number) and give each student a number between 1 and 99. Next we divide 99 by 20 and round to get 99/20 ≈ 5. We now choose a number at random between 1 and 5. The student corresponding to that number is the first student in the sample, and we then take every 5th student after that.
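
The procedure in this example can be sketched as follows. The function name `systematic_sample` and the seed are illustrative assumptions; the class of 99 students and the target of about 20 come from the example.

```python
import random

def systematic_sample(units, target_size, seed=None):
    """Pick a random start among the first k units, then take every k-th unit."""
    k = round(len(units) / target_size)   # sampling interval, 99/20 rounds to 5
    rng = random.Random(seed)
    start = rng.randint(1, k)             # random start between 1 and k inclusive
    # Units are numbered from 1, so position p is units[p - 1].
    return [units[p - 1] for p in range(start, len(units) + 1, k)]

class_list = list(range(1, 100))          # students numbered 1..99
sample = systematic_sample(class_list, 20, seed=3)
print(sample)
```

Depending on the random start, this gives 19 or 20 students, which is why the text speaks of a sample of "about 20".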

7.3.2. Non-random sampling methods


Sometimes random sampling is too costly or inconvenient to perform. We might then use non-random sampling methods such as judgment sampling, quota sampling and convenience sampling.

1. Judgment sampling
This involves choosing a sample by the judgment of the investigator. The investigator chooses a
sample of individuals that are thought to be representative of the population.
Example:- We want to choose 100 students in W/sodo University to ask their opinions on the quality of teaching in the university. We might decide to use personal judgment to choose 100 students who seem to give a good mix of first-, second- and third-year students, and make sure the sample contains a fair representation of male and female students. If we try to get this representation without using any randomization, but simply using judgment, we can say that we are using a judgment sample.

2. Quota sampling
This sampling method has some aspects in common with stratified sampling, but has no randomization. We divide the population into strata as in stratified sampling, but we choose a judgment sample from each stratum.
Example:- We want to choose a sample of 100 students in W/sodo University, and want 40 to be female and 60 male. If we choose the 40 female students by taking a simple random sample from all the female students in the university and the 60 male students by taking a simple random sample from all the male students in the university, then we are using a stratified random sample. On the other hand, if we choose the students using our own judgment, it is a quota sample.

3. Convenience sampling
Convenience sampling involves taking individuals that are easy to find. This can be very convenient, but it can lead to large biases in the results.

7.4. The sampling distribution of the sample mean


Example 1: Tartus Industries has seven production employees. Their hourly earnings are given in the table below.
Employee              Joe  Sam  Sue  Bob  Jan  Art  Ted
Hourly earnings ($)   7    7    8    8    7    8    9
(i) What is the population mean?
(ii) What is the sampling distribution of the sample mean for samples of size 2?
(iii) What is the mean of the sampling distribution?
Solution:
(i) $$\mu = \frac{\$7 + \$7 + \$8 + \$8 + \$7 + \$8 + \$9}{7} = \$7.71$$
(ii) The number of possible samples of size 2 selected without replacement from the population is computed as follows:
$${}^{N}C_{n} = \frac{N!}{n!\,(N-n)!} = \frac{7!}{2!\,(7-2)!} = 21$$
Sample Employees Sample mean
1 Joe & Sam 7
2 Joe & Sue 7.5
3 Joe & Bob 7.5
4 Joe & Jan 7
5 Joe & Art 7.5
6 Joe & Ted 8
7 Sam & Sue 7.5
8 Sam & Bob 7.5
9 Sam & Jan 7
10 Sam & Art 7.5
11 Sam & Ted 8
12 Sue & Bob 8
13 Sue & Jan 7.5
14 Sue & Art 8
15 Sue & Ted 8.5
16 Bob & Jan 7.5
17 Bob & Art 8
18 Bob & Ted 8
19 Jan & Art 7.5
20 Jan & Ted 8
21 Art & Ted 8.5

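Then the sampling distribution of the sample mean for n = 2 is:

Sample mean   Number of means (frequency)   Probability
7             3                             0.1429
7.5           9                             0.4286
8             6                             0.2857
8.5           3                             0.1429
Total         21                            1

(iii) The mean of the sampling distribution is
$$\mu_{\bar{x}} = \frac{7(3) + 7.5(9) + 8(6) + 8.5(3)}{21} = \$7.71$$
which equals the population mean.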
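
The enumeration in Example 1 can be checked by listing every sample of size 2 in code. This is a small sketch using exact fractions; the variable names are illustrative, while the earnings figures are those in the table of the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

earnings = {"Joe": 7, "Sam": 7, "Sue": 8, "Bob": 8, "Jan": 7, "Art": 8, "Ted": 9}

population_mean = Fraction(sum(earnings.values()), len(earnings))

# All samples of size 2 drawn without replacement, with their sample means.
samples = list(combinations(earnings, 2))
means = [Fraction(earnings[a] + earnings[b], 2) for a, b in samples]

# Frequency and probability of each distinct sample mean.
freq = Counter(means)
dist = {m: Fraction(c, len(samples)) for m, c in sorted(freq.items())}

mean_of_means = sum(m * p for m, p in dist.items())

for m, p in dist.items():
    print(float(m), freq[m], round(float(p), 4))
print(round(float(population_mean), 2), round(float(mean_of_means), 2))
```

Working with `Fraction` avoids rounding, so the equality of the population mean and the mean of the sampling distribution is exact rather than approximate.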

Example 2 (N=3, n=2)

Three students have taken a class test (which is marked out of 10). We want to estimate the mean mark, and we do this by taking a sample of size 2 and using the sample mean as the estimate of the population mean. Suppose the marks of the three students are 1, 2 and 6.
If we choose one of the students at random, the mark of that student can be represented by a random variable X with the following probability distribution.
X = x       1     2     6
P(X = x)    1/3   1/3   1/3

Exercise: Show that $\mu = 3$ and $\sigma^2 = 14/3$.

Now let us look at the sample mean. The sample mean is itself a random variable, so it must have a distribution. Let us derive it here. If we are sampling without replacement, we can take ${}^{3}C_{2} = 3$ possible samples; the possibilities are given below.
Possible sample   (1,2)   (1,6)   (2,6)
Sample mean       1.5     3.5     4

The sample mean $\bar{X}$ is a random variable, and we see that it can take three possible values. We can now write down its probability distribution as follows.
$\bar{X} = \bar{x}$       1.5   3.5   4
$P(\bar{X} = \bar{x})$    1/3   1/3   1/3

Suppose, instead, that we took the sample with replacement. The following samples are possible.
Possible sample   (1,1)  (1,2)  (1,6)  (2,1)  (2,2)  (2,6)  (6,1)  (6,2)  (6,6)
Sample mean       1      1.5    3.5    1.5    2      4      3.5    4      6

The sample mean is a random variable, and we see that it can take six possible values. We can now write down its probability function.
$\bar{X} = \bar{x}$       1     1.5   2     3.5   4     6
$P(\bar{X} = \bar{x})$    1/9   2/9   1/9   2/9   2/9   1/9

It is left as an exercise to show that $E(\bar{X}) = 3$ and $\mathrm{Var}(\bar{X}) = 7/3$.
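
One way to check these exercises is to enumerate the samples and compute the moments exactly. The sketch below assumes every listed sample is equally likely, as in the text; the helper name `mean_and_variance` is illustrative.

```python
from fractions import Fraction
from itertools import combinations, product

marks = [1, 2, 6]

def mean_and_variance(values):
    """Mean and variance of equally likely values, as exact fractions."""
    n = len(values)
    mu = sum(Fraction(v) for v in values) / n
    var = sum((Fraction(v) - mu) ** 2 for v in values) / n
    return mu, var

mu, sigma2 = mean_and_variance(marks)

# Sample means for every equally likely sample of size 2.
without_repl = [Fraction(a + b, 2) for a, b in combinations(marks, 2)]
with_repl = [Fraction(a + b, 2) for a, b in product(marks, repeat=2)]

print(mu, sigma2)                         # population mean and variance
print(mean_and_variance(without_repl))    # sampling without replacement
print(mean_and_variance(with_repl))       # sampling with replacement
```

In both sampling schemes the mean of the sample mean is 3. With replacement the variance is 7/3, which is the population variance 14/3 divided by n = 2; without replacement it is smaller (7/6).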


In each case the expected value of the sample mean equals the population mean. This explains
why the sample mean is a good estimator of the population mean. If we use the sample mean as
an estimate of the population mean we will sometimes over-estimate it, and sometimes
underestimate it, but “on average” we will be accurate. The example above illustrates an
important result:

If $X_1, X_2, \ldots, X_n$ form a simple random sample from an infinite population with mean $\mu$ and variance $\sigma^2$, then

1. $E(\bar{X}) = \mu$, and
2. $\mathrm{Var}(\bar{X}) = \sigma^2/n$.

The first statement is true for any type of population (whether it is finite or infinite). The second statement is true if we have an infinite population, but it is also true if we sample with replacement from a finite population.

A consequence of this result is that if n is large, the sample mean will have an expected value equal to $\mu$ and a variance close to zero. This means that the sample mean will be a good estimator of the population mean if the sample size is large.
N.B. For the rest of the chapter the term “random sample” will mean a simple random sample
from an infinite population or a sample taken with replacement.

7.5. The central limit theorem


When sampling from a non-normal population, the distribution of $\bar{X}$ depends on the particular form of the population distribution. A result known as the central limit theorem states that when the sample size is large, the distribution of $\bar{X}$ is approximately normal regardless of the shape of the population distribution. In practice, the normal approximation is usually adequate when the sample size n is greater than 30.

The central limit theorem

If $X_1, X_2, \ldots, X_n$ is a random sample from a population with mean $\mu$ and variance $\sigma^2$, then as n goes to infinity, the distribution of the sample mean $\bar{X}$ approaches a normal distribution with mean $\mu$ and variance $\sigma^2/n$.
In short, as n gets large, $\bar{X} \sim N(\mu, \sigma^2/n)$ approximately.
We can standardize this to get
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$
The central limit theorem is an extremely important result in probability theory. It says that the distribution of the sample mean can be approximated by a normal distribution with mean $\mu$ and variance $\sigma^2/n$ provided n is large, and this is true even if the original data are not normally distributed. This explains why the normal distribution is so important. We will use this theorem a lot in the next chapter when we look at estimation and hypothesis testing.
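
The theorem can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not part of the notes: it uses an exponential population with mean 1 and variance 1 (clearly non-normal), samples of size n = 40, and 5000 repetitions.

```python
import random
import statistics

def sample_means(n, reps, seed=0):
    """Means of `reps` samples of size n from an exponential population (mean 1, variance 1)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

n = 40
means = sample_means(n, reps=5000)
print(round(statistics.fmean(means), 3))      # should be close to mu = 1
print(round(statistics.variance(means), 4))   # should be close to sigma^2/n = 0.025

# Share of sample means within 1.96 standard errors of mu (about 95% if normal).
se = (1 / n) ** 0.5
share = sum(abs(m - 1) < 1.96 * se for m in means) / len(means)
print(round(share, 3))
```

If the normal approximation is adequate, roughly 95% of the simulated sample means should fall within 1.96 standard errors of the population mean.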

Exercises
1. The marks scored by five students in a test of statistics carrying 100 marks are 50, 60, 50, 60
and 40. If a simple random sample of size 4 is drawn without replacement, construct the
sampling distribution of sample mean and find the standard error of the sample mean.

2. Suppose that the population distribution of the gripping strengths of industrial workers is known to have a mean of 110 and a standard deviation of 10. For a random sample of 75 workers, what is the probability that the sample mean gripping strength will be
a) Between 109 and 112?
b) Greater than 112?

3. The amount of sulphur in a daily emission from a factory has a normal distribution with a mean of 134 pounds and a standard deviation of 22 pounds. For a day selected randomly, find the probability that the mean amount of sulphur emission will be less than 130 pounds.

Chapter Eight
Estimation and hypothesis testing
8.1 Introduction
Definitions:
Estimation: the process by which we estimate an unknown population parameter from a sample statistic.
Estimator: any statistic that is used to estimate a population parameter.
Estimate: a numerical value of an estimator.
There are two types of estimation: point estimation and interval estimation.
1. Point estimation
- involves using a single statistic value to estimate an unknown parameter value.
2. Interval estimation
- involves using a range of values to estimate an unknown parameter.

Suppose θ is an unknown parameter that we wish to estimate. Let T1 and T2 be functions of x1, x2, …, xn. If P(T1 < θ < T2) = 1-α, we say that the interval (T1, T2) is a (1-α)100% confidence interval for θ.
The most common confidence intervals are 90%, 95%, and 99%.
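Point estimator for the mean (μ)
- is the sample mean ($\bar{X}$)
Interval estimator for the mean (μ)
Case 1. - The variance ($\sigma^2$) is known, and
- Either the data are normal or the sample is large.
In this case we know that $\bar{X} \sim N(\mu, \sigma^2/n)$, so
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$
Then, using the standard normal table,
$$P\left(-Z_{\alpha/2} < Z < Z_{\alpha/2}\right) = 1 - \alpha,$$
that is,
$$P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha.$$
Rearranging terms gives us
$$P\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha.$$
Then the (1-α)100% confidence interval for μ is
$$\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right).$$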
Example: The manufacturer of a battery is trying to estimate the lifetime of the battery. He believes each battery will last for a random amount of time that has a normal distribution with mean μ and variance 100. He carries out an experiment to estimate μ. A sample of 400 batteries is tested and their lifetimes are measured. The mean lifetime is found to be 74.2 hours. Calculate the 95% confidence interval for μ, and interpret your results.
Solution:
Given $\sigma^2 = 100$, n = 400, $\bar{X} = 74.2$
95% CI = ?
Here the sample is large and σ is known, so the (1-α)100% confidence interval for μ is given by
$$\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = \left(74.2 - 1.96\,\frac{10}{\sqrt{400}},\ 74.2 + 1.96\,\frac{10}{\sqrt{400}}\right) = (73.22,\ 75.18)$$
Interpretation: We are 95% confident that the mean lifetime of the battery is between 73.22 and 75.18 hours.
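
The calculation for the battery example can be sketched with Python's standard library, which provides the standard normal quantile through `statistics.NormalDist`. The function name `z_confidence_interval` is an illustrative choice, not something defined in the notes.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sd, n, confidence):
    """(1 - alpha)100% interval: xbar -/+ z_{alpha/2} * sd / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95% confidence
    margin = z * sd / sqrt(n)
    return xbar - margin, xbar + margin

low, high = z_confidence_interval(xbar=74.2, sd=10, n=400, confidence=0.95)
print(round(low, 2), round(high, 2))
```

The same function applies when the sample is large and the sample standard deviation s is used in place of the population standard deviation.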

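Case 2. The variance ($\sigma^2$) is unknown and the sample is large.

According to the central limit theorem, $\bar{X} \sim N(\mu, \sigma^2/n)$ approximately, and for a large sample σ can be replaced by s, so
$$\frac{\bar{X} - \mu}{s/\sqrt{n}} \sim N(0, 1).$$
Then, using the standard normal table,
$$P\left(-Z_{\alpha/2} < Z < Z_{\alpha/2}\right) = 1 - \alpha,$$
that is,
$$P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{s/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha.$$
Rearranging terms gives us
$$P\left(\bar{X} - Z_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{s}{\sqrt{n}}\right) = 1 - \alpha.$$
Then the (1-α)100% confidence interval for μ is
$$\left(\bar{X} - Z_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{s}{\sqrt{n}}\right).$$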
Example: A study by a professor at AAU is designed to offer inference about unemployment rates by region in Ethiopia. A sample of 200 regions reported a mean rate of 46.2%, with a standard deviation of 1.7%. At the 90% level of confidence, what is the estimate of the mean unemployment rate by region in the country?
Solution:
Given s = 1.7%, $\bar{X}$ = 46.2%, n = 200
90% CI = ?
Here, though σ is unknown, the sample is large, so by the central limit theorem the (1-α)100% confidence interval for μ is given by
$$\left(\bar{X} - Z_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{s}{\sqrt{n}}\right) = \left(46.2 - 1.65\,\frac{1.7}{\sqrt{200}},\ 46.2 + 1.65\,\frac{1.7}{\sqrt{200}}\right)$$
= (46.2 – 0.198, 46.2 + 0.198) = (46.002, 46.398)

Interpretation: We are 90% confident that the mean unemployment rate by region in Ethiopia is between 46.002% and 46.398%.

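Case 3. - The variance ($\sigma^2$) is unknown,
- The data are normal, and n is small.

In this case we know that
$$\frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1}.$$
Then, using the t table,
$$P\left(-t_{\alpha/2,\,n-1} < t < t_{\alpha/2,\,n-1}\right) = 1 - \alpha,$$
that is,
$$P\left(-t_{\alpha/2,\,n-1} < \frac{\bar{X} - \mu}{s/\sqrt{n}} < t_{\alpha/2,\,n-1}\right) = 1 - \alpha.$$
Rearranging terms gives us
$$P\left(\bar{X} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right) = 1 - \alpha.$$
Then the (1-α)100% confidence interval for μ is
$$\left(\bar{X} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}},\ \bar{X} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right).$$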
Example: The signing bonuses for 10 new players in the national football league are used to estimate the mean bonus for all new players. The sample mean is $65,890 with a standard deviation of $12,300. What is your 90% confidence interval for the population mean?
Solution:
Given s = 12,300, $\bar{X}$ = 65,890, n = 10
90% CI = ?
The (1-α)100% confidence interval for μ is given by
$$\left(\bar{X} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}},\ \bar{X} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right) = \left(65890 - t_{0.05,\,9}\,\frac{12300}{\sqrt{10}},\ 65890 + t_{0.05,\,9}\,\frac{12300}{\sqrt{10}}\right)$$
= (65890 – 1.833*3889.6, 65890 + 1.833*3889.6) = (65890 – 7129.6, 65890 + 7129.6)
= (58760.4, 73019.6)
Interpretation: We are 90% confident that the mean signing bonus for all new players is between $58,760.4 and $73,019.6.
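
The arithmetic of the t-interval can be sketched as below. Python's standard library has no t-distribution quantile function, so this sketch assumes the critical value is read from a t table and passed in; the function name `t_confidence_interval` is illustrative.

```python
from math import sqrt

def t_confidence_interval(xbar, s, n, t_crit):
    """Interval xbar -/+ t_crit * s / sqrt(n); t_crit is read from a t table."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin

# t_{0.05, 9} = 1.833 is the table value used in the signing-bonus example.
low, high = t_confidence_interval(xbar=65890, s=12300, n=10, t_crit=1.833)
print(round(low, 1), round(high, 1))
```

Keeping the table lookup outside the function makes the dependence on the degrees of freedom (n - 1 = 9 here) explicit.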

8.2. Determining the sample size for estimating the mean


So far we have been acting as if we were given a set of data and asked to analyze it. But often we have to select a sample and collect the data ourselves, so we have to decide on the sample size. A standard method is to decide how accurate we want our estimate to be.
Choosing an appropriate sample size depends upon three factors:
a) The level of confidence desired
b) The variability in the population being studied, and
c) The maximum allowable error, E
We can express the interaction among these three factors and the sample size in the following formula:
$$Z = \frac{E}{S/\sqrt{n}}$$
Solving the equation for n, we obtain the required sample size
$$n = \left(\frac{Z \cdot S}{E}\right)^2$$
Where: E = the maximum allowable error
S = the estimate of the population standard deviation
n = the sample size
Z = the standard normal value corresponding to the desired level of confidence
Let us illustrate this by looking at the following example.
Example:
The manufacturer of a certain type of battery is trying to estimate the lifetime of the battery. He believes that each battery will last for a random amount of time that has a normal distribution with mean μ and variance 100. He carries out an experiment to estimate μ. A sample of n batteries is tested and their lifetimes are measured. He wants to choose n so that the 95% confidence interval for μ has a width of less than 4 hours. What value should be taken for n?
Solution:
The 95% CI for μ is given by
$$\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = \left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$
$$\text{Width} = \left(\bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) - \left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}}\right) = 2(1.96)\frac{10}{\sqrt{n}} = \frac{39.2}{\sqrt{n}} < 4$$
This gives $\sqrt{n} > 9.8$, so n > 96.04. As n should be an integer, we should take n ≥ 97.
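
The sample-size rule can be sketched in code as follows. The function name `required_sample_size` is an illustrative assumption; the inputs are taken from the battery example, where a width below 4 hours corresponds to a maximum error E of 2 hours on each side of the mean.

```python
from math import ceil

def required_sample_size(z, s, max_error):
    """Smallest integer n with n >= (z * s / E)^2."""
    return ceil((z * s / max_error) ** 2)

# Width below 4 hours means a maximum error E below 2 hours on each side.
n = required_sample_size(z=1.96, s=10, max_error=2)
print(n)
```

Rounding up with `ceil` matters: rounding 96.04 down to 96 would give an interval slightly wider than allowed.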

8.3. Hypothesis Testing


Hypothesis: An assertion or tentative assumption regarding the value of a population parameter.
For example: μ = 2000 Birr, μ < 2000 Birr, etc.
Here we are given (or we believe we know) the value of the parameter, and want to use the data to test whether or not this value is correct.
Basic terms
- Null hypothesis (Ho): the claim that we are trying to test.
- Alternative hypothesis (H1): the claim that we accept if Ho is rejected.
- Type I error: the error that occurs when we reject a true Ho.
- Type II error: the error that occurs when we accept a false Ho.
Ideally we would like to have zero probability of making either type of error, but this is usually not possible. We usually consider a type I error more serious than a type II error, so it is important to make sure that we have a small probability of making a type I error. One way of doing this is to choose a small α value.
- Significance level (α): the probability of committing a type I error.
- Rejection region: the region such that we reject Ho if the test statistic value lies inside it.
- Acceptance region: the region such that we accept Ho if the test statistic value lies inside it.
Procedures/steps in hypothesis test.
1. write the hypotheses
2. fix the significance level, α
3. obtain the appropriate test statistic
4. give decision rule
5. write conclusion

8.4. Testing a hypothesis about a single population mean μ


Case 1. - The variance ($\sigma^2$) is known, and
- Either the data are normal or the sample is large.
1. Hypothesis
i. Ho: μ = μo vs H1: μ ≠ μo
ii. Ho: μ = μo vs H1: μ > μo
iii. Ho: μ = μo vs H1: μ < μo
2. Significance level, α
3. Test statistic
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
4. Decision rule
i. If H1: μ ≠ μo,
Reject Ho if Z cal > Z α/2 or if Z cal < -Z α/2
=> │Z cal│ > Z α/2
ii. If H1: μ > μo
Reject Ho if Z cal > Z α
iii. If H1: μ < μo
Reject Ho if Z cal < -Z α
5. Conclusion

Example: A large trial was performed to test the hypothesis that the mean blood pressure of all healthy men is 140 mmHg. The blood pressure of 100 healthy men was measured, and the sample mean was found to be 137.9 mmHg and the sample standard deviation 10 mmHg. What will be concluded from the test?
Solution:
1. Hypothesis
Ho: μ = 140 vs. H1: μ ≠ 140
2. Significance level, α = 5%
3. Test statistic
$$Z = \frac{137.9 - 140}{10/\sqrt{100}} = -2.1$$
4. Decision rule
Reject Ho if Z cal > Z α/2 = 1.96 or if Z cal < -Z α/2 = -1.96
=> │Z cal│ > Z α/2 = 1.96
5. Conclusion
As │-2.1│ > 1.96, we reject Ho and conclude that the mean blood pressure of all healthy men is different from 140 mmHg.
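
The five steps of the Z test can be sketched as a small function. This is only an illustration: the function name `z_test` and the labels "two-sided", "greater" and "less" for the three forms of H1 are assumptions made here, while the critical values come from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sd, n, alpha, alternative="two-sided"):
    """Return (z, critical value, reject?) for a one-sample Z test of Ho: mu = mu0."""
    z = (xbar - mu0) / (sd / sqrt(n))
    std_normal = NormalDist()
    if alternative == "two-sided":           # H1: mu != mu0
        crit = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > crit
    elif alternative == "greater":           # H1: mu > mu0
        crit = std_normal.inv_cdf(1 - alpha)
        reject = z > crit
    elif alternative == "less":              # H1: mu < mu0
        crit = std_normal.inv_cdf(1 - alpha)
        reject = z < -crit
    else:
        raise ValueError("unknown alternative")
    return z, crit, reject

# Blood pressure example: Ho: mu = 140 vs H1: mu != 140 at alpha = 5%.
z, crit, reject = z_test(xbar=137.9, mu0=140, sd=10, n=100, alpha=0.05)
print(round(z, 2), round(crit, 2), reject)
```

For a one-sided test, the same function is called with `alternative="greater"` or `alternative="less"`, which switches the critical value from Z α/2 to Z α.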
Case 2. The variance ($\sigma^2$) is unknown and the sample is large.
1. Hypothesis
i. Ho: μ = μo vs. H1: μ ≠ μo
ii. Ho: μ = μo vs. H1: μ > μo
iii. Ho: μ = μo vs. H1: μ < μo
2. Significance level, α
3. Test statistic
$$Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$
4. Decision rule
i. If H1: μ ≠ μo,
Reject Ho if Z cal > Z α/2 or if Z cal < -Z α/2
=> │Z cal│ > Z α/2
ii. If H1: μ > μo
Reject Ho if Z cal > Z α
iii. If H1: μ < μo
Reject Ho if Z cal < -Z α
5. Conclusion

Example: An economist is trying to test whether or not the mean earnings of all graduates in the country are more than 500 Birr/month. The distribution of monthly earnings has mean μ and variance $\sigma^2$. The economist has interviewed a sample of 1000 graduates and found a mean earning of 532 Birr/month and a standard deviation of 245 Birr/month. What will he conclude at the 1% significance level?
Solution:
1. Hypothesis
Ho: μ = 500 vs. H1: μ > 500
2. Significance level, α = 1%
3. Test statistic
$$Z = \frac{532 - 500}{245/\sqrt{1000}} = 4.13$$
4. Decision rule
Reject Ho if Z cal > Z α = 2.33
5. Conclusion
As 4.13 > 2.33, we reject Ho and conclude that the mean earnings of all graduates in the country are greater than 500 Birr/month.
Case 3. - The variance ($\sigma^2$) is unknown,
- The data are normal, and n is small.
1. Hypothesis
i. Ho: μ = μo vs H1: μ ≠ μo
ii. Ho: μ = μo vs H1: μ > μo
iii. Ho: μ = μo vs H1: μ < μo
2. Significance level, α
3. Test statistic
$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$
4. Decision rule
i. If H1: μ ≠ μo,
Reject Ho if t cal > t α/2, n-1 or if t cal < -t α/2, n-1
=> │t cal│ > t α/2, n-1
ii. If H1: μ > μo
Reject Ho if t cal > t α, n-1
iii. If H1: μ < μo
Reject Ho if t cal < -t α, n-1
5. Conclusion

Example: A soft drinks company sells its drinks in bottles that are supposed to contain 330 ml of drink. In fact, the amount of drink in each bottle has a normal distribution with unknown mean μ and variance $\sigma^2$. A quality control inspector carries out an experiment to test the company’s claim that the mean amount of drink in the bottles is 330 ml. Suppose he takes a sample of 25 bottles and measures the volume of their contents. The mean is found to be 328.5 ml and the variance is found to be 9 ml². Should the inspector believe the company’s claim?
Solution:
1. Hypothesis
Ho: μ = 330 vs H1: μ < 330
2. Significance level, α = 1%
3. Test statistic
$$t = \frac{328.5 - 330}{3/\sqrt{25}} = -2.5$$
4. Decision rule
Reject Ho if t cal < -t α, n-1 = -t 0.01, 24 = -2.492
5. Conclusion
As -2.5 < -2.492, we reject Ho and conclude that the inspector shouldn’t believe the company’s claim.
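
The t test in this example can be sketched as follows. As with the confidence interval for small samples, the critical value is assumed to be read from a t table, because Python's standard library has no t-distribution quantile; the function name `t_statistic` is illustrative.

```python
from math import sqrt

def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic for Ho: mu = mu0."""
    return (xbar - mu0) / (s / sqrt(n))

s = sqrt(9)                       # sample variance 9 gives standard deviation 3
t_cal = t_statistic(xbar=328.5, mu0=330, s=s, n=25)
t_crit = 2.492                    # t_{0.01, 24}, the table value used in the example
reject = t_cal < -t_crit          # left-tailed test, H1: mu < 330
print(round(t_cal, 2), reject)
```

Note how close the statistic is to the critical value: a slightly larger sample variance would reverse the decision at the 1% level.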

Exercises
1. An experiment involves selecting a random sample of 256 middle managers. One item of
interest is annual income. The sample mean is $45,420 and the sample standard deviation
is $2,050.
(i) What is the estimated mean income of all middle managers (the point estimate of the population mean)?
(ii) What is the 95 percent confidence interval for the population mean?
(iii) What degree of confidence is being used?
(iv) Interpret the result.
2. The manufacturer of a certain type of battery is trying to estimate the lifetime of the battery. He believes each battery will last for a random amount of time that has a N(μ, 100) distribution. (The lifetimes are measured in hours.) He carries out an experiment to estimate μ. A sample of 400 batteries is tested and their lifetimes are measured. The (sample) mean lifetime is found to be 74.2 hours. Calculate a 95% confidence interval for μ. How do you interpret this interval?
3. A biostatistician intends to estimate μ, the mean blood pressure of women between the ages
of 45 and 50. She takes a random sample of 20 women and measures their blood pressure. Based
on past experience she believes the measurements will follow a
N(μ, 100) distribution. (Measurements are in mm mercury.) Suppose she discovers the sample
mean is equal to 136.9 mm mercury. Find a 95% confidence interval for μ.
4. A biologist measured a random sample of 12 fossil skeletons of an extinct species of bird. He found that their skulls had a mean length of 6.34 cm and a standard deviation of 0.45 cm. He believes that the lengths of the skulls follow a normal distribution. Use the data to obtain a 95% confidence interval for the mean of this distribution.

5. A sports scientist takes a random sample of 17 athletes and asks them to run 5km on a
treadmill. Their heart rates are measured before the start of the run and five minutes after the
finish. The increases in heart rates are measured and are shown below.
53 45 71 74 65 83 47 56 61 74 61 72 54 43 72 65 54
Increase in heart rates (beats per minute)
(i) Calculate the mean and standard deviation of the data.
(ii) The sport scientist wanted to estimate μ, the mean increase in heart rate. Find a
point estimate for μ and construct a 95% confidence interval for it. What
assumptions do you need to make about the population for this interval to be
valid?

6. A consumer service agency examined a new automobile for its gasoline performance. A random sample of 12 measurements of kilometres covered per gallon under normal conditions resulted in an average of 60 km/gallon with a standard deviation of 1.8 km/gallon. Does this result support the manufacturer's claim that the new automobile covers more than 50 km/gallon? Use α = 0.10.

Chapter Nine

Simple Linear Regression and correlation analysis

9.1. Simple Linear Regression Analysis

Regression is concerned with bringing out the nature of the relationship between variables and using it to find the best approximate value of one variable corresponding to a known value of the other variable.

Simple linear regression deals with the method of fitting a straight line (the regression line) to sample data on two variables, expressed as an equation, so that if the value of one variable is given we can predict the value of the other variable.

In other words, if we have two variables under study, one may represent the cause and the other may represent the effect. The variable representing the cause is known as the independent (predictor or regressor) variable and is usually denoted by X. The variable representing the effect is known as the dependent (predicted) variable and is usually denoted by Y. If the relationship between the two variables is a straight line, it is known as simple linear regression.

When there are more than two variables and one of them is assumed to be dependent upon the others, the functional relationship between the variables is known as multiple linear regression.

Scatter diagram: a plot of all ordered pairs (x, y) on the coordinate plane, which is necessary to discover whether the relationship between two variables is indeed best explained by a straight line.

Example:

Advertising budget (X)   5   6   7   8    9    10   11
Profit (Y)               8   7   9   10   13   12   13

[Figure: scatter diagram of profit (Y) against advertising budget (X) for the data above.]

So if we draw a line, the regression line is the one that passes through, or closest to, all the
points in the scatter diagram.


The simple linear regression of Y on X in the population is given by:

Y = α + βX + ε
where
α = y-intercept
β = slope of the line, or regression coefficient
ε = the error term

The y-intercept α and the regression coefficient β are the population parameters. We obtain the
estimates of α and β from the sample. The estimators of α and β are denoted by a and b,
respectively. The fitted regression line is thus,

Ye = a + b X

The above algebraic equation is known as a regression line. The method of finding such a
relationship is known as fitting a regression line. For each observed value of the variable X, we can
compute the corresponding value of Y. The computed values of Y are known as the expected values of Y
and are denoted by Ye.

The observed values of Y are denoted by Y. The difference between the observed and the
expected values Y-Ye, is known as error or residual, and is denoted by e. The residual can be
positive, negative or zero.
A best-fitting line is one for which the sum of squares of the residuals, Σe², is a minimum. For
this purpose the principle called the method of least squares is used.
According to the principle of least squares, one selects a and b such that

$$\sum e^2 = \sum (Y - Y_e)^2$$

is a minimum, where Ye = a + bX.

To minimize this function, we first take the partial derivatives of Σe² with respect to a and b.
The partial derivatives are then equated to zero separately. This results in the following
normal equations:

$$\sum y = na + b\sum x$$
$$\sum xy = a\sum x + b\sum x^2$$
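
The normal equations above can be traced back to the two partial derivatives. As a sketch, write each residual as e = y - a - bx (using the lowercase x and y of the normal equations); setting each derivative of Σe² to zero then gives:

$$
\begin{aligned}
\frac{\partial}{\partial a}\sum (y - a - bx)^2 &= -2\sum (y - a - bx) = 0 &&\Longrightarrow\quad \sum y = na + b\sum x \\
\frac{\partial}{\partial b}\sum (y - a - bx)^2 &= -2\sum x\,(y - a - bx) = 0 &&\Longrightarrow\quad \sum xy = a\sum x + b\sum x^2
\end{aligned}
$$

The first line uses Σa = na, since a is the same constant in each of the n terms.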
Solving these normal equations simultaneously, we get the values of a and b as follows:

$$b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}} \quad\text{and}\quad a = \bar{y} - b\bar{x}$$
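
The formulas for a and b translate directly into a few lines of code. The following is a minimal sketch in Python, applied to the advertising budget and profit data from the scatter diagram example. The function name fit_line is an illustrative choice and not part of these notes.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line Ye = a + b*X, using the raw-sum formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # b = (Σxy - ΣxΣy/n) / (Σx² - (Σx)²/n)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # a = ȳ - b·x̄
    a = sum_y / n - b * sum_x / n
    return a, b


# Advertising budget (X) and profit (Y) from the scatter diagram example.
budget = [5, 6, 7, 8, 9, 10, 11]
profit = [8, 7, 9, 10, 13, 12, 13]

a, b = fit_line(budget, profit)
print(f"Ye = {a:.2f} + {b:.4f} X")
```

For these data the sums are Σx = 56, Σy = 72, Σxy = 605 and Σx² = 476, so b = 29/28 ≈ 1.04 and a = 2.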

Regression analysis is useful in predicting the value of one variable from the given values of
another variable.

Example: A researcher wants to find out whether there is any relationship between the height of a son
and that of his father. He took a random sample of 6 fathers and their sons. The heights in inches are
given in the table below.
(i) Find the regression line of Y on X.
(ii) What would be the height of the son if his father's height is 70 inches?

Height of father (X)    63   65   66   67   67   68
Height of son (Y)       66   68   65   67   69   70

Solution: ΣX = 396, ΣY = 405, ΣX² = 26152, ΣXY = 26740, ΣY² = 27355

(i)
$$b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}} = \frac{6(26740) - (396)(405)}{6(26152) - (396)^2} = \frac{60}{96} = 0.625$$

$$a = \bar{y} - b\bar{x} = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} = \frac{405}{6} - (0.625)\frac{396}{6} = 67.5 - 41.25 = 26.25$$

The fitted regression line is therefore Ye = 26.25 + 0.625X.

(ii) If X = 70, then
Ye = 26.25 + 0.625(70) = 70. Thus the predicted height of the son is 70 inches.
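
The worked example can be cross-checked with the deviation form of the slope, b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², which is algebraically the same as the raw-sum formula. The sketch below takes the second son's height as 68 inches, an assumption chosen because it is the value consistent with the sums ΣXY = 26740 and ΣY² = 27355 quoted in the solution. The variable names are illustrative.

```python
fathers = [63, 65, 66, 67, 67, 68]   # X, inches
sons = [66, 68, 65, 67, 69, 70]      # Y, inches (second value assumed to be 68)

n = len(fathers)
x_bar = sum(fathers) / n
y_bar = sum(sons) / n

# Deviation form of the least-squares slope.
sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(fathers, sons))
ss_x = sum((x - x_bar) ** 2 for x in fathers)

b = sp_xy / ss_x
a = y_bar - b * x_bar

predicted = a + b * 70   # predicted son's height for a 70-inch father
print(a, b, predicted)
```

With these data the sketch gives a = 26.25, b = 0.625 and a predicted height of 70 inches.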

Standard error of estimate: measures the average amount by which the observed Y values
depart from the corresponding estimated Ye values (the dispersion of the observed values around the
line of regression of Y on X).

$$S_{x.y} = \sqrt{\frac{\sum (y_i - y_{ei})^2}{n-2}}$$

where Ye = a + bX is the value estimated from the fitted line and
Yi is the observed (actual) value of y.
Example: Given the observations (2, 2), (4, 5), (6, 4) and (8, 7), the regression line is
Ye = 1 + 0.7X. Find the standard error of estimate of the regression line.
Solution:
Yei = 1 + 0.7Xi, i = 1, 2, 3, 4
Then
Ye1 = 1 + 0.7(2) = 2.4
Ye2 = 1 + 0.7(4) = 3.8
Ye3 = 1 + 0.7(6) = 5.2
Ye4 = 1 + 0.7(8) = 6.6

$$S_{x.y} = \sqrt{\frac{\sum (y_i - y_{ei})^2}{n-2}} = \sqrt{\frac{1}{2}\left[(2-2.4)^2 + \cdots + (7-6.6)^2\right]} = \sqrt{\frac{3.2}{2}} \approx 1.26$$
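
The same calculation can be sketched in Python. The observations and the fitted line are those of the example; the variable names are illustrative.

```python
import math

points = [(2, 2), (4, 5), (6, 4), (8, 7)]
a, b = 1.0, 0.7   # fitted line Ye = 1 + 0.7 X, as given in the example

# Residual e = Y - Ye for each observation.
residuals = [y - (a + b * x) for x, y in points]
sse = sum(e ** 2 for e in residuals)

# Two parameters (a and b) were estimated from the data, hence the divisor n - 2.
std_error = math.sqrt(sse / (len(points) - 2))
print(round(sse, 2), round(std_error, 2))
```

The residuals are -0.4, 1.2, -1.2 and 0.4, so the sum of squared residuals is 3.2 and the standard error is about 1.26.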

9.2 Simple Linear Correlation Analysis

The measure of the degree of relationship between two continuous variables is known as the
correlation coefficient. The population correlation coefficient is represented by ρ and its estimator
by r. The correlation coefficient r is also called Pearson's correlation coefficient since it was
developed by Karl Pearson. r is given as the ratio of the covariance of the variables x and y to the
product of the standard deviations of x and y. Symbolically,

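$$
\begin{aligned}
r = \frac{\operatorname{Cov}(x, y)}{sd(x)\, sd(y)}
  &= \frac{\dfrac{\sum (x - \bar{x})(y - \bar{y})}{n-1}}{\sqrt{\dfrac{\sum (x - \bar{x})^2}{n-1}}\,\sqrt{\dfrac{\sum (y - \bar{y})^2}{n-1}}} \\
  &= \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} \\
  &= \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}
\end{aligned}
$$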

The numerator is termed as the sum of products of x and y, SPxy. In the denominator, the first
term is called the sum of squares of x, SSx, and the second term is called the sum of squares of y,
SSy. Thus,
$$r = \frac{SP_{xy}}{\sqrt{SS_x\, SS_y}}$$

The correlation coefficient is always between -1 and +1, i.e., -1 ≤ r ≤ 1.


r = -1 implies perfect negative linear correlation between the variables under
consideration.
r = +1 implies perfect positive linear correlation between the variables under
consideration.
r = 0 implies there is no linear relationship between the two variables, but there could be a
non-linear relationship between them. In other words, when two variables are independent, r = 0, but
when r = 0, it is not necessarily true that the variables are independent.

[Figure: three scatter diagrams showing perfect negative correlation (r = -1), perfect positive
correlation (r = 1) and no correlation (r = 0).]
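
The three cases can also be checked numerically. The sketch below uses a hypothetical helper pearson_r and made-up data: points on a rising line, points on a falling line, and points on the curve y = x², where y is completely determined by x and yet r = 0.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed as SPxy / sqrt(SSx * SSy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return sp_xy / math.sqrt(ss_x * ss_y)


xs = [-2, -1, 0, 1, 2]

r_pos = pearson_r(xs, [2 * x + 1 for x in xs])   # points on a rising line
r_neg = pearson_r(xs, [5 - 3 * x for x in xs])   # points on a falling line
r_curve = pearson_r(xs, [x ** 2 for x in xs])    # exact but non-linear relationship

print(r_pos, r_neg, r_curve)
```

The first two values are +1 and -1. The third is 0 because the positive and negative products of deviations cancel, which illustrates that r = 0 rules out only a linear relationship.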

9.3 Coefficient of Determination (R²)

The square of the correlation coefficient, r², is called the coefficient of
determination. It measures the proportion of the variation in the dependent variable Y that is
explained by variation in the independent variable X.

For example, if r = 0.8, then r² = 0.64. This means that, on the basis of the sample,
approximately 64% of the variation in the dependent variable, say Y, is explained by the variation in
the independent variable, say X. The remaining 1 - r² = 0.36, that is 36% of the variation in Y, is
unexplained by variation in X. In other words, variables (factors) other than X could account for the
remaining 36% of the variation in Y.
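
One way to see what "explained variation" means is to compare r² with the share of the total variation in Y that the fitted line removes. The sketch below reuses the four observations and the least-squares line Ye = 1 + 0.7X from the standard error example in Section 9.1. The variable names are illustrative, and the equality holds only when the line is the least-squares line.

```python
points = [(2, 2), (4, 5), (6, 4), (8, 7)]
a, b = 1.0, 0.7   # least-squares line Ye = 1 + 0.7 X from the Section 9.1 example

n = len(points)
x_bar = sum(x for x, _ in points) / n
y_bar = sum(y for _, y in points) / n

sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
ss_x = sum((x - x_bar) ** 2 for x, _ in points)
ss_y = sum((y - y_bar) ** 2 for _, y in points)   # total variation in Y

r_squared = sp_xy ** 2 / (ss_x * ss_y)            # square of Pearson's r

# Variation in Y left unexplained by the fitted line.
sse = sum((y - (a + b * x)) ** 2 for x, y in points)
explained_share = 1 - sse / ss_y

print(round(r_squared, 4), round(explained_share, 4))
```

Both quantities come out at about 0.7538, so for these data roughly 75% of the variation in Y is explained by X.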

Example: The research director of the Dubbary Saving and Loan Bank collected 24 observations of
mortgage interest rates X and number of house sales Y at each interest rate. The director computed
that

Σxi = 276, Σyi = 768, Σxi² = 3300, Σyi² = 25000, Σxiyi = 8690

Compute (i) the coefficient of correlation;
(ii) the coefficient of determination.
Solution:
(i)
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{24(8690) - 276(768)}{\sqrt{\left[24(3300) - (276)^2\right]\left[24(25000) - (768)^2\right]}} = \frac{-3408}{\sqrt{(3024)(10176)}} \approx -0.61$$

(ii) Coefficient of determination (R²) = r² = (-0.61)² = 0.37. This shows that 37% of the variation
in the number of house sales is explained by the variation in the interest rate.
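
When only the summary sums are available, r can be computed from them directly. The sketch below uses the sums from the example, with one labelled assumption: Σy² is taken as 25000, because with n = 24 and Σy = 768 the sum of squares cannot be smaller than (Σy)²/n = 24576, and 25000 is the value that reproduces the magnitude 0.61 in the solution.

```python
import math

n = 24
sum_x, sum_y = 276, 768
sum_x2 = 3300
sum_y2 = 25000   # assumption (see above): must be at least (Σy)²/n = 24576
sum_xy = 8690

sp_xy = sum_xy - sum_x * sum_y / n   # Σxy - ΣxΣy/n
ss_x = sum_x2 - sum_x ** 2 / n       # Σx² - (Σx)²/n
ss_y = sum_y2 - sum_y ** 2 / n       # Σy² - (Σy)²/n

r = sp_xy / math.sqrt(ss_x * ss_y)
r_squared = r ** 2
print(round(r, 2), round(r_squared, 2))
```

Here SPxy = -142, SSx = 126 and SSy = 424, so r ≈ -0.61: house sales tend to fall as the interest rate rises. The unrounded r gives r² ≈ 0.38, while squaring the rounded value 0.61 gives 0.37.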

Exercise
1. Define, briefly, regression and correlation, in statistics.
2. How do you interpret a calculated value of Karl Pearson's correlation coefficient?
Discuss in particular the values r = 0, r = -1 and r = +1.
3. Calculate and interpret Karl Pearson's correlation coefficient for the ages of
husband and wife for the data given below.

Age of husband    23   27   28   29   30   31   33   35   36   39
Age of wife       18   22   23   24   25   26   28   29   30   32

4. Suppose we conduct an experiment with eight fields planted with corn. The amount
of nitrogen fertilizer applied (in kg) and the resulting corn yield per hectare are shown in the
table below.

Amount of nitrogen fertilizer (kg) (x)    22    26    23    29    20    15    18    32
Corn yield per hectare (y)               120   130   160   180   120   110   118   190

a) Compute the linear regression equation of corn yield per hectare on amount of nitrogen
fertilizer applied, and use the equation to predict the corn yield for a field treated with
34 kg of fertilizer.
b) Calculate and interpret the simple correlation coefficient between amount of fertilizer applied and
corn yield obtained, and also obtain the coefficient of determination.
