Lecture Note Introduction To Stat Seni

The document provides an introduction to statistics including definitions, applications, uses, limitations and basic concepts. It discusses how statistics can be used to summarize large amounts of data, quantify error, deduce properties of populations from samples, and aid in experimental reasoning. Descriptive and inferential statistics are introduced along with the stages of statistical investigation.

Uploaded by

yonasante2121

Chapter One: Introduction to Statistics

1.1 Introduction
The origin of modern statistics can be traced back to the 17th and 18th centuries, when
mathematicians were mainly interested in developing the theory of probability as
applied to games of chance. A commoner named John Graunt, a native of London, began
reviewing a weekly church publication issued by the local parish clerk that listed the
number of births, christenings, and deaths in each parish. These so-called Bills of
Mortality also listed the causes of death. Graunt, who was a shopkeeper, organized these
data in the forms we now call descriptive statistics and published them as Natural and
Political Observations Made upon the Bills of Mortality. Different authors define
Statistics in different ways. For instance, “Statistics is the branch of the scientific
method which deals with the data obtained by counting or measuring the properties of
populations of natural phenomena” (Kendall and Stuart [1963]), and “Statistics is
concerned with the inferential process, in particular with the planning and analysis of
experiments or surveys, and with the efficient summarizing of sets of data” (Kruskal
[1968]).

A growing and increasingly diverse range of human activities uses statistics; however,
the field itself seems to remain obscure to the public. Professor Bradley Efron
expressed this fact nicely:

During the 20th Century statistical thinking and methodology has become
the scientific framework for literally dozens of fields including education,
agriculture, economics, biology and medicine, and with increasing
influence recently on the hard sciences such as astronomy, geology, and
physics. In other words, we have grown from a small obscure field into a
big obscure field.

Bahir Dar University, Department of Statistics.


Wherever data are used in a decision-making process, the expertise of a statistician is
essential. Statisticians are ‘data engineers’ who deal with problems such as how to obtain
reliable information from different disciplines and analyse it. Thus, Statistics is applied
in fields as varied as economic, social and marketing research, as well as in the study of
population problems.

The Statistics component of the course aims to provide the basic statistical tools for
management and business decisions: descriptive statistics, probability distributions,
estimation, hypothesis testing, correlation and regression.
1.2 Definitions of Statistics
The word “statistics” is derived from the Latin word ‘status’, meaning state. In political
leadership the interest was in the numerical description of political units such as
provinces, states, cities and towns; the main concern was the collection of information
on revenue, population, manpower for military service, area of land under cultivation,
and births and deaths.

Recent definition of Statistics: By statistics, we mean aggregates of facts affected to a
marked extent by multiplicity of causes, numerically expressed, enumerated or estimated
according to reasonable standards of accuracy, collected in a systematic manner for a
predetermined purpose and placed in relation to each other. It may also be defined as the
science of collection, organization, presentation, analysis and interpretation of numerical
data.

1.3 Daily Exposure to Statistics


Whether you realize it or not, you are bombarded with statistics every day. Polls, average
temperature for this time of year, crime rates, average income, and traffic counts on city
streets, all are statistics.

Summarizing information: Large amounts of information can be condensed into a few
simple figures and/or statements using statistics.
Tools: Statistical techniques are tools used to organize information and interpret
observations. That is all they are!
Describe and quantify error: Statistical analysis allows one to describe and quantify
sources of uncertainty and/or error in experimental data. With an idea of the uncertainty
in mind, one can assess the usefulness of the data. Obviously, a politician would like to
know just how accurate a poll is, especially if his/her popularity is hovering around the
area where s/he may not get re-elected.
Deduce and infer properties: Statistics allow one to deduce or infer the properties of a
population, based on the information we can derive from a sample of the population. As
mentioned above, we cannot always ask everyone their opinions.
Analytic reasoning of experiment: Statistics forces you to do some analytic reasoning
of the experiment you are considering while it is being planned. If there is some sort of
problem with the experiment itself and how the data are collected, no amount of analysis
will give you worthwhile results!
1.4 Applications, Uses, and Limitations of Statistics
Application of Statistics
Statistics is applied in almost all fields of human endeavor. It has become the scientific
framework for fields including education, agriculture, business and economics, industry
and health.
Uses of statistics
• Presents facts in a summarized and precise form
• Simplifies complex data (data reduction)
• Facilitates comparisons
• Helps in estimating unknown population characteristics
• Helps in studying the relationship between two or more variables
• Helps in prediction and forecasting future values and formulating policies
Limitations of Statistics
• Statistics deals only with aggregates of facts and not with individual data items
• Statistics deals only with quantitative data (information)
• Statistical data are true only on average (approximately)
• Statistics can easily be misused and therefore should be used by experts.
1.5 Classification of Statistics
1.5.1 Descriptive Statistics

This part of statistics deals only with describing the characteristics of the data
collected, without attempting to infer (conclude) anything that goes beyond the sample.
It comprises the first four stages of statistical investigation, namely: collection,
organization, presentation, and analysis of data.
1.5.2 Inferential Statistics
This type of statistics is concerned with drawing statistically valid conclusions about the
characteristics of the population based on information obtained from a sample. It is the
part of statistics that generalizes from the sample to the population using probabilities,
performs hypothesis tests, determines relationships between variables, and makes
predictions.
Stages in Statistical Investigation
According to the definition of statistics, we have the following five stages of a statistical
investigation.
1. Collection of data: The first stage of statistical investigation. The data should be
collected with a specific and well-defined purpose so that the conclusions drawn are
not misleading. There are two methods of data collection, primary and secondary: the
primary method refers to obtaining original, first-hand data, and the secondary
method involves obtaining data from other sources.
2. Organization of data: This is a methodology for classifying and describing the
properties of data in a summary form. Editing, coding and classification are the three
steps in the organization of data.
3. Presentation of data: In this stage the collected and organized data are presented
in some systematic order to facilitate statistical analysis, with the help of tables,
diagrams and graphs.
4. Analysis of data: This stage involves extracting relevant information from the
collected data (such as the mean, median, mode, range, variance…), mainly through the
use of elementary mathematical operations.
5. Interpretation of data: This stage involves drawing valid conclusions from the
analyzed data, that is, making inferences based on the analysis of data.
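
The analysis stage can be illustrated with a minimal Python sketch. The data set below is hypothetical (it does not come from these notes), and the variance shown is assumed to be the sample variance, which divides by n - 1; the measures themselves are the ones listed in stage 4.

```python
import statistics

# Hypothetical raw data: number of customers served per hour in one shift.
data = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
    "range": max(data) - min(data),
    "variance": statistics.variance(data),  # sample variance (divides by n - 1)
}

# Print each summary measure on its own line.
for name, value in summary.items():
    print(f"{name:>8}: {value}")
```

Interpreting these numbers (stage 5) is a separate step: the code only extracts the summary measures.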

1.6 Basic Concepts


Population versus Sample
The population includes all objects of interest, whereas the sample is only a portion of the
population. In other words, a population is the totality of all subjects, measurements or
individuals possessing certain common characteristics that are being studied, and from
which the sample is drawn, while a sample is a representative subset (subgroup) of a
population selected by using valid statistical procedures (sampling techniques). There are
several reasons why we do not work with whole populations. They are usually large, and it
is often impossible to get data for every object we are studying. Collecting data is rarely
free of cost, and the more items surveyed, the larger the cost.
Parameter versus statistic

A parameter is a numerical measure obtained from a population, usually denoted by Greek
letters (μ, σ), whereas a statistic is a numerical measure obtained from a sample, that is,
a measure of some characteristic of a sample. We compute statistics and use them to
estimate parameters. The computation is the first part of the statistics course
(Descriptive Statistics) and the estimation is the second part (Inferential Statistics).

Population parameters: μ, σ, π, ρ
Sample statistics: X̄, S, p, r
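
The distinction between a parameter and a statistic can be sketched in Python. Everything in this example is an assumption made for illustration: the population is simulated (ages of 1,000 members of an imaginary club), and the sample size of 50 and the seed are arbitrary choices.

```python
import random
import statistics

# Hypothetical population: ages of all 1,000 members of an imaginary club.
random.seed(42)  # fixed seed so the sketch is reproducible
population = [random.randint(18, 65) for _ in range(1000)]

# Parameters describe the whole population (usually unknown in practice).
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

# Statistics are computed from a sample and used to estimate the parameters.
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)

print(f"parameter mu    = {mu:.2f}, statistic x_bar = {x_bar:.2f}")
print(f"parameter sigma = {sigma:.2f}, statistic s     = {s:.2f}")
```

The sample values will generally be close to, but not equal to, the population values; quantifying that gap is the task of inferential statistics.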

1.7 Data Collection


Statistical data can be defined as any information collected as part of the study and
expressed in numbers or simply the numerical result of any count or measurement.
Data are only crude information and not knowledge by themselves. The sequence from
data to knowledge is from Data to Information, from Information to Facts, and
finally, from Facts to Knowledge. Data becomes information when it becomes relevant
to your decision problem. Information becomes fact when the data can support it. Fact
becomes knowledge when it is used in the successful completion of decision process. The
following figure illustrates the statistical thinking process based on data in constructing
statistical models for decision making under uncertainties.

In any statistical investigation the first step is to collect a set of related observations
from which conclusions may be drawn. Data are a set of related information (facts) from
which statistical conclusions may be drawn. A variable is a characteristic that can assume
different values.
Depending on the information desired, variables can be classified as qualitative or
quantitative.
Qualitative variable: a variable whose values are non-numeric, that is, a variable that
can be described only in words.
Example: gender, color, religion, ethnic group, etc.
Quantitative variable: a variable that is numeric in nature, that is, one whose values can
be expressed numerically. Quantitative variables can be further classified as discrete or
continuous.
Discrete variables: A variable that assumes a finite or countable number of possible
values is called a discrete variable. You cannot have 2.63 people in the room. Discrete
variables are usually obtained by counting, e.g., the number of children in a family or
the number of cars at a traffic light.
Continuous variables: A variable that can theoretically assume an infinite number of
possible values is called a continuous variable. Continuous variables are obtained by
measuring and can assume any value between two given values. Length, weight, and time
are all examples of continuous variables. Since continuous variables are real numbers, we
usually round them. This implies a boundary depending on the number of decimal places.
For example, 64 is really any value with 63.5 ≤ x < 64.5. Likewise, if there are two
decimal places, then 64.03 is really any value with 64.025 ≤ x < 64.035. Boundaries
always have one more decimal place than the data.
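
The rounding boundaries described above can be computed with a short Python sketch. The function name `rounding_boundaries` is made up for this illustration, and the recorded value is assumed to be given as text so that the number of decimal places is preserved; the `decimal` module is used to avoid binary floating-point error.

```python
from decimal import Decimal

def rounding_boundaries(recorded: str) -> tuple[Decimal, Decimal]:
    """Return (lower, upper) such that lower <= x < upper is recorded as `recorded`."""
    value = Decimal(recorded)
    # The exponent gives the recorded precision: 0 for "64", -2 for "64.03".
    unit = Decimal(1).scaleb(value.as_tuple().exponent)
    half = unit / 2
    return value - half, value + half

for text in ("64", "64.03"):
    lower, upper = rounding_boundaries(text)
    print(f"{text} stands for {lower} <= x < {upper}")
```

In both cases the boundaries carry one more decimal place than the recorded value, as stated in the text.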
Exercises
Classify each of the following as qualitative or quantitative and, if it is quantitative,
classify it as discrete or continuous.
a. Colors of automobiles in a dealer’s showroom.
b. Number of seats in a movie theater.
c. Miles driven by the port authority bus drivers each day.
d. Number of tomatoes on each plant in a field.
e. Weight of newly born babies.

1.7.1 Classification of Data.

A. Classifications by Sources
According to source, data are classified as primary and secondary.
Primary data refer to data collected either by or under the direct supervision and
instruction of the researcher while Secondary data refer to data which are not
originated by the researcher himself/herself, but obtained from other sources such as
newspapers, journals, official records, etc.

B. Classification by the Role of Time


According to the role of time, data are classified into cross-section and time series data.
Cross-section data is a set of observations taken at one point in time.
Time series data is a set of observations collected for a sequence of times, usually at
equal intervals, which may be weekly, monthly, quarterly, yearly, etc.

C. Classification by Scale of Measurement


There are four levels of measurement namely: Nominal, Ordinal, Interval, and Ratio data.
Data are classified according to the highest level they fit.
Nominal Data: are categorical or qualitative data that are converted into numerical data
by coding. These are numerical in name only, because the numbers assigned are mere
symbols and hence cannot have any numerical meaning in the real sense. There is no
mathematical difference between the coded values.
Examples
1. Sex of a person (male and female could be coded as 0 and 1).
2. Ethnic group (black, white and oriental may be coded as 0, 1 and 2).
3. Marital status (single, married, divorced and widowed as 1, 2, 3 and 4).
4. Phone numbers such as 0581111111, 0111550119, etc.
5. Eye color.
6. Type of cars.
7. Religious preferences.
• Note that in all the above cases one cannot say that 1 > 0 or 2 > 1, etc. If we consider
the second example and take the codes “0” and “1”, saying that 1 > 0 would be equivalent
to saying that ‘white is greater than black’, which is meaningless. That is, no
quantitative information is conveyed and no ordering of the items is implied when dealing
with nominal measurement scales.
Ordinal Data: are nominal data which also have a generally agreed order. Measurements with
ordinal scales are ordered in the sense that higher numbers represent higher values, i.e.,
they can have meaningful inequalities (< or >). With such data, only counting and
ranking are possible; exact differences between values cannot be determined.
Examples
1. Military ranks: comparing a 3-star general and a 4-star general.
2. Graduates of a university with distinct programs (Diploma, Degree, Masters and
PhD).
3. Most of the data in social science studies such as IQ, personality scores, attitude
scores and personal evaluation.
4. Health status: very sick, sick and cured.
5. Likert scale such as 1 = poor, 2 = fair, 3 = good, 4 = very good and 5 = excellent.
• Note that with these data the intervals between the numbers are not necessarily equal,
i.e., if we consider the fifth example above, the difference between a rating of 2 (fair)
and a rating of 3 (good) may not represent the same difference as the difference between
a rating of 4 (very good) and a rating of 5 (excellent).
Interval Data: are ordinal data in which the differences between units have meaning.
These data do not have a ‘true’ zero point, and therefore it is not possible to make
statements about how many times higher one score is than another. In other words, the
ratios of different values are meaningless.
Examples
1. Data obtained from the temperature of a town measured in °C or °F.
2. Number of votes in an election.
3. Exam scores of students.
4. Data on shoe size of individuals.
Suppose we take the first example given above. A temperature of 0°C does not mean that
there is no temperature. Furthermore, a temperature of 30°C in town X on a specific day
is not twice as warm as a temperature of 15°C on another day in the same town.
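
A small Python sketch makes the point about ratios concrete. It takes the two readings from the example (30°C and 15°C) and expresses them in Fahrenheit and in kelvin using the standard conversion formulas; the function and variable names are chosen here for illustration only.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def celsius_to_kelvin(c):
    return c + 273.15

warm, cool = 30, 15  # the two readings from the example, in degrees Celsius

# The "ratio" depends entirely on where the scale puts its zero.
ratio_celsius = warm / cool
ratio_fahrenheit = celsius_to_fahrenheit(warm) / celsius_to_fahrenheit(cool)
ratio_kelvin = celsius_to_kelvin(warm) / celsius_to_kelvin(cool)

# Differences, unlike ratios, are meaningful: a gap of 15 Celsius degrees
# always corresponds to the same gap of 27 Fahrenheit degrees.
diff_fahrenheit = celsius_to_fahrenheit(warm) - celsius_to_fahrenheit(cool)

print(ratio_celsius, round(ratio_fahrenheit, 3), round(ratio_kelvin, 3))
print(diff_fahrenheit)
```

The same two temperatures give three different ratios, which is why ratios are not meaningful on an interval scale. Only the kelvin scale has a true zero and therefore behaves as a ratio scale.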
Ratio Data: are interval data which also have a true zero point. With these data, one can
perform addition, subtraction, division and multiplication.
Examples:
1. Income is ratio data because zero dollars is truly “no income”.
2. Measurement data like height, weight, volume and area.
Review Exercises
1. Determine whether the given value is a statistic or a parameter.
a. The Federal parliament consists of 160 women and 387 men.
b. A sample of students is selected and the mean amount of time spent waiting in
line to take grade reports from a registrar is 4 hours.
c. In a study of all 1700 University of Gondar staff members, it is found that
only 500 are academic staff members.

2. Determine whether the given value is from a discrete or continuous data set.
a. The monthly salary of a diploma graduate 20 years ago was birr 347 and
the current annual salary is birr 973.
b. A statistics student obtains sample data and finds that the mean loading weight
of cars in the sample is 70 quintals.
c. In a survey of 6000 business development service providers, it is found that
20% of them are mature enough to create job opportunities for others.
3. Determine which of the four levels of measurement (nominal, ordinal,
interval or ratio) is the most appropriate.
a. Height of women basketball players in the University of Gondar.
b. Rating of fantastic, good, average, poor, or unacceptable for some diets.
c. Current temperature in °C at your office.
d. The number of ‘yes’ responses received when 1250 drivers are asked if they
have ever used a mobile phone while driving.
4. The Nib Insurance Company has 5000 stockholders and a poll is conducted by
randomly selecting 30 stockholders from each of the 9 regions. The number of
shares held by each sampled stockholder is recorded.
a. Are the values obtained discrete or continuous?
b. Identify the level of measurement (nominal, ordinal, interval or ratio) for
the sample data.
c. Which type of sampling (random, systematic, convenience, stratified,
cluster) is being used?
d. If the average (mean) number of shares is calculated, is the result a
statistic or a parameter?
e. What is wrong with gauging stockholders’ views by mailing a
questionnaire that stockholders could complete and mail back?

Chapter Two: Methods of Data collection and Presentation

2.1. Methods of data collection


The first and foremost task in any statistical investigation is collection of data. Depending
on the type of variable and the objective of the study different data collection methods
can be employed. Data collection techniques allow us to systematically collect data about
our objects of study (people, objects, and phenomena) and about the setting in which they
occur.

Broadly, there are two methods of data collection, which are primary and secondary.
The primary method consists of obtaining data or information in any of the following
ways.
• Direct personal interview
• Indirect personal interview
• Information from correspondents
• Mailed questionnaires
• Questionnaires to be filled in by enumerators
The secondary method is a method by which we obtain data from the records of
institutions that collect and publish statistics as part of their routine duties.
2.2 Methods of data presentation
Data presentation is a statistical procedure of arranging and putting data in the form of
tables, graphs, charts and diagrams. After data have been collected and organized, the
next step is to present them in some suitable form. The need for proper presentation
arises because statistical data in their raw form are difficult to comprehend.
In this chapter, we introduce the following concepts.
• Frequency distributions as a tool for describing, exploring or comparing
distributions of data sets.
• Graphs as a tool for understanding the nature of the distribution of a data set.
2.3 Tabular presentation
A statistical table is a presentation of numbers in a logical arrangement, with some brief
explanation to show what they are. However, before tabulating data it is often necessary
first to classify them. The objective of tabular presentation of data, or of classification
in general, is to arrange data in groups or classes according to their resemblance and
affinities. A classification according to characteristics, for example, can be done:
• Chronologically, like monthly, yearly, etc.
• Qualitatively, such as sex, color, etc.
• Geographically, like Africa, Asia, etc.
• Quantitatively, like age, income, height, etc.
For example, assume that a sales slip can have 0, 1, 2, or 3 errors after it is filled out.
Fifty sales slips are randomly selected with the following results: 10 had no errors, 20 had
only one error, 12 had two errors, and 8 had three errors. These results can be presented
in a tabular format as follows:

Number of errors   Number of sales slips
0                  10
1                  20
2                  12
3                   8
Total              50

One of the simplest and most revealing devices for summarizing data and presenting
them in a meaningful fashion is the statistical table. A table is a systematic arrangement
of statistical data in columns and rows. Tables may broadly be classified in two ways:
simple versus complex, and general purpose versus special purpose.

Simple or One-way Table: in this type of table only one characteristic is shown.
Complex Tables: such tables represent two or more characteristics within the same table.
General Purpose Tables: also known as reference tables or repository tables, these provide
information for general reference. On the other hand, special purpose tables, also known
as summary or analytical tables, provide information for a particular discussion.
An ideal table should consist of: table number, title of the table, caption or column
heading, body of the table, stubs or row designation, footnotes and source of data.

Table number: for easy reference and identification a table should be numbered. This
number, if possible, should be written in the center at the top of the table.
Title: the title of the table is a description of the contents of the table. A complete
title has to answer the questions: what precisely are the data in the table, and where and
when did the data occur?
Caption: stands for the brief and self-explanatory headings of the vertical columns; it
explains what each column represents.
Stubs: stand for the brief and self-explanatory headings of the horizontal rows. Stubs
perform the same function for the horizontal rows of the table as the column headings do
for the vertical columns.
Body: the body of a table contains the numerical information, that is, the frequency of
observations in the different cells.
Footnotes: are given at the foot of a table to explain any fact or information included in
the table, which needs some explanation.
Source of data: one should also mention the source of information, from which the data
are taken. This may include author, volume, page and year of publication.
Tabulating data has the following chief advantages:
• To eliminate unnecessary details.
• To facilitate comparison.
• To have a bird’s-eye view of the significant features of the data.
• To utilize the data for further statistical analysis.
Example
In 1990, out of 5000 workers in a factory 4300 were members of a trade union. The
number of women workers employed was 500 out of which 400 did not belong to any
union. In 1989, the number of workers in the union was 4450 of which 4000 were men.
The number of non-union workers was 800 of which 350 were women.
The above information can be presented in a two-way statistical table, since we consider
two characteristics, namely year and union membership.

                            1989                     1990
Membership of union   Male  Female  Total     Male  Female  Total
Member                4000     450   4450     4200     100   4300
Non-member             450     350    800      300     400    700
Total                 4450     800   5250     4500     500   5000
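
The cells that are not stated directly in the example follow by subtraction from the given totals. The Python sketch below retraces that arithmetic; the helper `build_table` is a hypothetical name introduced only for this illustration.

```python
def build_table(member_male, member_female, non_male, non_female):
    """Return rows as (male, female, total), including the marginal totals."""
    table = {
        "Member": (member_male, member_female, member_male + member_female),
        "Non-member": (non_male, non_female, non_male + non_female),
    }
    total_male = member_male + non_male
    total_female = member_female + non_female
    table["Total"] = (total_male, total_female, total_male + total_female)
    return table

# 1989: 4450 union members of which 4000 were men;
#       800 non-members of which 350 were women.
table_1989 = build_table(4000, 4450 - 4000, 800 - 350, 350)

# 1990: 5000 workers of which 4300 were members;
#       500 women of which 400 were non-members.
women_members = 500 - 400
non_members = 5000 - 4300
table_1990 = build_table(4300 - women_members, women_members,
                         non_members - 400, 400)

for year, table in (("1989", table_1989), ("1990", table_1990)):
    for row, cells in table.items():
        print(year, row, cells)
```

Checking that the marginal totals agree with the figures given in the text (for instance, 5000 workers in 1990) is a quick way to catch arithmetic slips when filling in such a table.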

2.4 Frequency Distribution


Definition
It is necessary to define some of the important terms in order to understand the
distribution of frequency.
Frequency
The number of times a certain value or class of values occurs.
Frequency Distribution
The organization of raw data in table form with classes and frequencies.
• Categorical Frequency Distribution: A frequency distribution in which the data
are only nominal or ordinal.
• Ungrouped Frequency Distribution: A frequency distribution of numerical data
in which the values are not grouped. It is a table of all the potential raw score
values that could possibly occur in the data, along with the number of times each
actually occurred. It is often constructed for small sets of data on a discrete variable.
• Grouped Frequency Distribution: When the range of the data is large, the data
must be grouped into classes that are more than one unit in width. A grouped
frequency distribution is a frequency distribution where several numbers are
grouped into one class.
Class Limits
Separate one class in a grouped frequency distribution from another. The limits could
actually appear in the data and have gaps between the upper limit of one class and the
lower limit of the next.

Units of measurement (U or d)

The distance between two possible consecutive measures, or the gap between two
successive classes. It is usually taken as 1, 0.1, 0.01, 0.001, and so on.

Class Boundaries

Separate one class in a grouped frequency distribution from another. The boundaries
have one more decimal place than the raw data and therefore do not appear in the data.
There is no gap between the upper boundary of one class and the lower boundary of the
next class. The lower class boundary is found by subtracting U/2 from the corresponding
lower class limit and the upper class boundary is found by adding U/2 to the
corresponding upper class limit.

Class Width (w)

It is the difference between the upper and lower boundaries of any class. The class width
is also the difference between the lower limits of two consecutive classes or the upper
limits of two consecutive classes. It is not the difference between the upper and lower
limits of the same class.
Class Mark (Midpoint)
The number in the middle of the class. It is found by adding the upper and lower limits
and dividing by two. It can also be found by adding the upper and lower boundaries and
dividing by two.
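
The relations between class limits, class boundaries, class width and class mark can be checked with a minimal Python sketch. The class 6-11 and the unit of measurement U = 1 are assumptions chosen for illustration, and `class_summary` is a hypothetical helper name.

```python
def class_summary(lower_limit, upper_limit, unit=1):
    """Boundaries, width and mark of one class, following the definitions above."""
    lcb = lower_limit - unit / 2            # lower class boundary
    ucb = upper_limit + unit / 2            # upper class boundary
    width = ucb - lcb                       # difference between the boundaries
    mark = (lower_limit + upper_limit) / 2  # midpoint of the class
    return {"boundaries": (lcb, ucb), "width": width, "mark": mark}

first = class_summary(6, 11)    # hypothetical class 6-11
second = class_summary(12, 17)  # the class that follows it

print(first)
# The width equals the gap between consecutive lower limits (12 - 6),
# not the difference between the limits of one class (11 - 6).
print(first["width"] == 12 - 6, first["width"] == 11 - 6)
# The upper boundary of one class is the lower boundary of the next.
print(first["boundaries"][1] == second["boundaries"][0])
```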
Cumulative Frequency
The number of values less/more than the upper/lower class boundary for the current class.
As the definition indicates, there are two cumulative frequencies.
• Cumulative frequency below: the total frequency of all values less than
or equal to the upper class boundary of a given class.
• Cumulative frequency above: the total frequency of all values greater
than or equal to the lower class boundary of a given class.
Relative Frequency
The frequency of the class divided by the total frequency. Multiplied by 100, this gives
the percent of values falling in that class.
Cumulative Relative Frequency (Relative Cumulative Frequency)
It is the running total of the relative frequencies, or the cumulative frequency divided by
the total frequency. Multiplied by 100, it gives the percent of the values which are
less/more than the upper/lower class boundary.
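
The four quantities just defined can be computed from a list of class frequencies. The sketch below is written in Python and uses hypothetical frequencies for six classes, listed from the lowest class to the highest.

```python
from itertools import accumulate

frequencies = [2, 2, 7, 4, 3, 2]  # hypothetical class frequencies, lowest class first
total = sum(frequencies)

# "Less than" type: running total from the lowest class upward.
cum_below = list(accumulate(frequencies))
# "More than" type: running total from the highest class downward.
cum_above = list(accumulate(reversed(frequencies)))[::-1]

relative = [f / total for f in frequencies]
cum_relative = [c / total for c in cum_below]

print("f          ", frequencies)
print("cum. below ", cum_below)
print("cum. above ", cum_above)
print("relative   ", relative)
print("cum. rel.  ", cum_relative)
```

The last entry of the cumulative frequency below and the first entry of the cumulative frequency above both equal the total frequency, which gives a simple check on the arithmetic.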
Histogram
It is a graph that displays the data by using vertical bars of various heights to represent
frequencies. The horizontal axis should be the class boundaries.
Frequency Polygon

It is a line graph of class frequencies against midpoints of the classes. The frequency is
placed along the vertical axis and the class midpoints are placed along the horizontal axis.
These points are connected with lines.
Ogive
A line graph that represents the cumulative frequencies (less than or more than type)
plotted against upper or lower class boundaries respectively. That is class boundaries are
plotted along the horizontal axis and the corresponding cumulative frequencies are
plotted along the vertical axis. The points are joined by a free hand curve. The graph
always starts at zero at the lowest class boundary and will end up at the total frequency
(for a cumulative frequency) or 1.00 (for a relative cumulative frequency).
Pie Chart: Graphical depiction of data as slices of a pie. The frequency determines the
size of the slice. The number of degrees in any slice is the relative frequency multiplied
by 360 degrees.
Pictograph: A graph that uses pictures to represent data.
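
The pie chart rule (degrees = relative frequency × 360) can be sketched in Python. The frequencies are those of the sales slip example in Section 2.3; the function name `pie_angles` is hypothetical.

```python
def pie_angles(frequencies):
    """Return the angle in degrees of each slice: relative frequency x 360."""
    total = sum(frequencies.values())
    return {label: count * 360 / total for label, count in frequencies.items()}

# Sales slip example: number of errors -> number of slips.
slips = {0: 10, 1: 20, 2: 12, 3: 8}
angles = pie_angles(slips)

for errors, degrees in angles.items():
    print(f"{errors} errors: {degrees:.1f} degrees")
```

The angles of all slices should add up to 360 degrees, apart from rounding.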
2.4.1 Guidelines for Construction of Classes
There is no hard and fast rule for determining the number of classes. The number of
classes to use depends largely on how many measurements or observations we have.
Using too few classes gives an inaccurate picture by smoothing out too much detail, and if
the number of classes is too large, the classification loses its effectiveness as a method
of summarizing data. The following are some guidelines for constructing classes.

1. Choose the number of classes to use, preferably between 5 and 20.

2. The classes must be mutually exclusive. This means that no data value can fall into
two different classes.

3. The classes must be all inclusive or exhaustive. This means that all data values
must be included.

4. The classes must be continuous. There are no gaps in a frequency distribution.

5. The classes must be equal in width. The exception here is the first or last class
when we have a "below ..." or "... above" class. This is often used with ages.

2.4.2 Steps for Constructing Grouped Frequency Distribution


1. Find the largest and smallest values and compute the range: R = Maximum –
Minimum.

2. Select the number of classes desired, usually between 5 and 20, or use Sturges'
rule k = 1 + 3.322 log n, where k is the number of classes desired, n is the total
number of observations and log is the logarithm to base 10. Round k up to a whole
number.

3. Find the class width by dividing the range by the number of classes: w = R/k,
rounded up where necessary. It is also the difference between the upper and lower
class boundaries of the class, that is, w = UCB – LCB.

4. Pick a suitable starting point less than or equal to the minimum value. The
starting point is called the lower limit of the first class. Continue to add the class
width to this lower limit to get the rest of the lower limits.

5. To find the upper limit of the first class, subtract U from the lower limit of the
second class, where U is the unit of measurement (for example, U = 1 when the data are
whole numbers). Then continue to add the class width to this upper limit to find the
rest of the upper limits.

6. Find the boundaries by subtracting U/2 units from the lower limits and adding U/2
units to the upper limits.

7. Tally the data.

8. Find the frequencies.

9. Find the cumulative frequencies.

10. If necessary, find the relative frequencies and/or relative cumulative frequencies



Example

1. The following data represent the marks of 20 students. Construct an ungrouped
frequency distribution.

80 76 90 85 80
70 60 62 70 85
65 60 63 74 75
76 70 70 80 85
Solution
Step 1: Arrange the data in the order of magnitude and make a table as shown below.
Step 2: Tally the data.
Step 3: Compute the frequency.
Mark        60   62   63   65   70     74   75   76   80    85    90
Tally       //   /    /    /    ////   /    /    //   ///   ///   /
Frequency   2    1    1    1    4      1    1    2    3     3     1

N. B.: Each individual value is presented separately, which is why it is called an ungrouped
frequency distribution.

2. Construct grouped frequency distribution for the following data.

11 29 6 33 14 31 22 27 19 20

18 17 22 38 23 21 26 34 39 27

Solutions

Step 1: Find the highest and the lowest values, H = 39 and L = 6, and find the range:
R = H – L = 39 – 6 = 33.



Step 2: Find the number of classes using Sturges' formula:
$k = 1 + 3.322 \log_{10} n = 1 + 3.322 \log_{10}(20) \approx 5.32$, which is rounded up to 6.

Step 3: Find the class width: $w = R/k = 33/6 = 5.5$, which is rounded up to 6.

Step 4: Select the starting point. Let it be the minimum observation, and keep adding the class width
(= 6). Therefore, 6, 12, 18, 24, 30, 36 are the lower class limits.

Step 5: Find the upper class limits. The first upper class limit is 12 – U = 12 – 1 = 11; then keep
adding the class width (= 6). Therefore, 11, 17, 23, 29, 35, 41 are the upper class
limits.

So, combining step 4 and step 5, one can construct the following classes.

Class Limits: 6-11 12-17 18 – 23 24 – 29 30 – 35 36 – 41


Step 6: Find the class boundaries;

For the first class: Lower class boundary=6-U/2=5.5

Upper class boundary =11+U/2=11.5

Then continue adding w to both boundaries to obtain the remaining boundaries. By doing so,
one can obtain the following classes.

Class boundary: 5.5 – 11.5 11.5 – 17.5 17.5 – 23.5 23.5 – 29.5 29.5 – 35.5 35.5 – 41.5

Step 7& 8: tally the data and write the numeric values for the tallies in the frequency
column.

Step 9 & 10: Find cumulative, relative or/and relative cumulative frequencies.

The complete grouped frequency distribution is given as:

Class limit   Class boundary   Class mark   Tally     Freq.   Cf (less than type)   Cf (more than type)   rf     rcf (less than type)
6 – 11        5.5 – 11.5       8.5          //        2       2                     20                    0.10   0.10
12 – 17       11.5 – 17.5      14.5         //        2       4                     18                    0.10   0.20
18 – 23       17.5 – 23.5      20.5         ///////   7       11                    16                    0.35   0.55
24 – 29       23.5 – 29.5      26.5         ////      4       15                    9                     0.20   0.75
30 – 35       29.5 – 35.5      32.5         ///       3       18                    5                     0.15   0.90
36 – 41       35.5 – 41.5      38.5         //        2       20                    2                     0.10   1.00
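The steps of Section 2.4.2 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name grouped_frequency and its unit argument are assumptions introduced here, the starting point is taken to be the minimum observation as in the worked example, and classes are added until the maximum value is covered.

```python
import math

# Data of the grouped frequency distribution example above.
data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]

def grouped_frequency(values, unit=1):
    """Return (lower limit, upper limit, frequency, cumulative freq, relative freq) per class."""
    n = len(values)
    data_range = max(values) - min(values)
    k = math.ceil(1 + 3.322 * math.log10(n))   # Sturges' rule, rounded up
    w = math.ceil(data_range / k)              # class width, rounded up
    lower = min(values)                        # assumed starting point
    rows, cum = [], 0
    while lower <= max(values):                # keep adding classes until the maximum is covered
        upper = lower + w - unit               # upper class limit
        f = sum(lower <= v <= upper for v in values)
        cum += f
        rows.append((lower, upper, f, cum, f / n))
        lower += w
    return rows

rows = grouped_frequency(data)
for lower, upper, f, cf, rf in rows:
    print(f"{lower:>2} - {upper:<2}  f={f}  cf={cf}  rf={rf:.2f}")
```

The class boundaries follow by subtracting unit/2 from each lower limit and adding unit/2 to each upper limit.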

2.5 Graphical Presentation of Data


A graphic presentation of the data found in a table is more likely to get the attention of the
casual observer and shows trends or relationships that might be overlooked in a table.
Graphs are among the most commonly used devices for presenting statistical data. The histogram,
frequency polygon and cumulative frequency graph (Ogive) are the most commonly
applied graphical representations for continuous data. These methods enable us to
present statistical data in a simple, clear, unambiguous and effective manner.
The following are procedures for constructing statistical graphs, although these are not hard and fast
rules.
• Draw and label the X and Y axes.
• Choose a suitable scale for the frequencies or cumulative frequencies and label it on
the Y axis.
• Represent the class boundaries for the histogram or Ogive, or the mid points for the
frequency polygon, on the X axis.
• Plot the points.
• Draw the bars or lines to connect the points.



2.5.1 Histogram
It is a graph that displays data by using vertical bars of various heights to represent
frequencies, with the class boundaries placed along the horizontal axis. This means that in
order to construct a histogram one first has to prepare a frequency distribution table.
The heights of the bars correspond to the frequency values, and the bars are drawn
adjacent to each other (without gaps). Here the ends of the bases of two adjacent
rectangles coincide, and a histogram differs from a bar chart in that there is a numerical scale on the
horizontal axis.
Example
The histogram of the following grouped frequency distribution is given below.
Class Boundary   14.5-24.5   24.5-34.5   34.5-44.5   44.5-54.5   54.5-64.5
Frequency        3           4           8           6           7



N.B.: If the size of class interval is not uniform, the height of rectangles can be adjusted
by taking the ‘frequency density (fd)’ of the corresponding classes as scale for vertical
axis. fd = frequency of the class / width of the same class.

2.5.2 Frequency Polygon


A frequency polygon is a line graph of class frequencies plotted against class marks. It is
obtained by joining the middle points of the tops of adjacent rectangles of a histogram
with line segments. Here ends must be joined to the x-axis at mid points of empty classes:
one before the lowest class and the other after the highest class.
Example
Consider grouped frequency distribution given in the above example and construct a
frequency polygon.



[Figure: Frequency Polygon. Vertical axis: Frequency (0 to 10); horizontal axis: Class Boundaries, from 4.5-14.5 to 64.5-74.5.]
2.5.3 Cumulative Frequency Curve (Ogive)
An Ogive is a line plot of the cumulative or relative cumulative
frequencies. The horizontal axis is marked with the class boundaries and the vertical axis
with the cumulative or relative cumulative frequencies. A given frequency
distribution can have both a 'more than' and a 'less than' Ogive, and an Ogive is useful for finding
the values of quantiles (quartiles, deciles and percentiles).
Example
Construct both the less than and the more than Ogive by considering the data set given
below.
CB   19.5-29.5   29.5-39.5   39.5-49.5   49.5-59.5   59.5-69.5   69.5-79.5   79.5-89.5
f    4           6           8           12          9           7           4
Solution
Classes less than   29.5   39.5   49.5   59.5   69.5   79.5   89.5
Cum. Freq.          4      10     18     30     39     46     50
Classes more than   19.5   29.5   39.5   49.5   59.5   69.5   79.5
Cum. Freq.          50     46     40     32     20     11     4
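The two cumulative frequency rows of the solution can be reproduced with a running sum. The sketch below uses the class frequencies of this example; the variable names are illustrative only.

```python
from itertools import accumulate

boundaries = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
freq = [4, 6, 8, 12, 9, 7, 4]

# "Less than" type: running total from the first class, paired with the upper boundaries.
less_than = list(accumulate(freq))
# "More than" type: running total from the last class, paired with the lower boundaries.
more_than = list(accumulate(freq[::-1]))[::-1]

for ub, cf in zip(boundaries[1:], less_than):
    print(f"less than {ub}: {cf}")
for lb, cf in zip(boundaries[:-1], more_than):
    print(f"more than {lb}: {cf}")
```

Plotting less_than against the upper boundaries and more_than against the lower boundaries gives the two Ogives.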



2.5.4 Line Graphs
The line graph is especially useful for the study of some variables according to the
passage of time. The time, in weeks, months or years is marked along the horizontal axis;
and the value of the quantity that is being studied is marked on the vertical axis. The line
graph is suitable for depicting a consecutive trend of a series over a long period. These
are especially useful in showing variations in continuously changing variables such as
temperature, rainfall, pressure, commodity prices, etc.
Example
A biologist measures the efficiency (%) of a polymerization reaction for various vessel
temperatures and pressures and obtains: 74, 81, 85, 76, 85, 88, 76, 82, 91. Present
the data by a line graph.
[Figure: line graph with legend "effi (%)"; horizontal axis numbered 1 to 8.]

2.6 Diagrammatic Representation of Data



Analogous to graphical data presentation, diagrammatic representation of data is
appealing and effective in conveying the trends in qualitative data.
Such representation provides a good visual impression of the important features of the entire
mass of qualitative data. It is appealing to the eye, is easier to remember,
and facilitates comparison.
2.6.1 Bar Diagrams
They are the simplest and most widely used diagrams for the visual
presentation of qualitative data, and they are of the following types:
• Simple
• Multiple
• Subdivided
• Broken
Simple Bar-Chart: consists of a number of rectangles and is used only for
one-dimensional comparisons.
Example
Draw a bar-chart to represent the following data related to students' enrolment in a
university.
Year              1990   1991   1992   1993   1994   1995
No. of students   2005   2338   3412   3900   4967   5788



Multiple Bar-Chart: In this type of chart the component figures are shown as separate
bars adjoining each other. The height of each bar represents the actual value of the
component figure. It is used when the distributional pattern of more than one variable is to be
depicted and comparisons of each component are desired.
Example
Represent the following data, which relate to the faculty-wise enrolment of students in a college, by
using a multiple bar chart.
Years                     1990   1991   1992
No. of Art students       120    115    132
No. of Science students   160    165    190
No. of Health students    80     90     94
Solution
Since year-wise data is compared in three aspects (Art, Science and Health), the
appropriate diagram is a multiple bar chart.



Component Bar chart: Bars are sub-divided into component parts of the figure. These
sorts of diagrams are constructed when each total is built up from two or more
component figures. This is done by dividing the bars into parts representing the
components and shading them accordingly.
Example
Consider the above example and give the Component bar chart.

2.6.2 Pie-Chart
A pie chart is a circle divided into sectors whose areas are proportional to the corresponding
components. It is used for representing the breakdown of an aggregate into its components. The
proportion of a category can be expressed either as a percentage or as an angle.
That is, the central angle of a category = (amount of the category / total amount) ×
360°, and the percentage of a category = (frequency of the category / total frequency) × 100%.
Example



Represent the following data, which refer to students enrolled at three universities in the year
2004, by using a pie chart.
University        Addis Ababa   Gondar   Jimma   Total
No. of students   8000          6000     6000    20000
Solution
The total degree measure of a circle is 360; hence, the amount of degrees corresponding
to each university can be obtained as:
Addis Ababa = [8000/20000]* 360 = 144, Gondar = [6000/20000]*360 = 108 = Jimma
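As a quick check of the angle calculation above, the following sketch computes both the central angles and the percentages for the three universities. The dictionary and variable names are illustrative only.

```python
enrolment = {"Addis Ababa": 8000, "Gondar": 6000, "Jimma": 6000}
total = sum(enrolment.values())

# Central angle in degrees and share in percent for each category.
angles = {name: count * 360 / total for name, count in enrolment.items()}
percents = {name: count * 100 / total for name, count in enrolment.items()}

for name in enrolment:
    print(f"{name}: {angles[name]:.0f} degrees, {percents[name]:.0f}%")
```

The angles always add up to 360 and the percentages to 100, which is a useful check before drawing the chart.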

[Figure: Pie Chart. Addis Ababa 40%, Gondar 30%, Jimma 30%.]

Review Exercises
1. a. Draw a histogram for the following information.
Height (feet): (Number of pupils) Relative frequency:
0-2 0 0
2-4 1 1
4-5 4 8
5-6 8 16
6-8 2 2
b. Find the cumulative frequency
Height (cm) Frequency:



000-100 4
100-120 6
140-160 2
160-180 6
180-220 4
2. Consider the following data and construct ungrouped frequency distribution.
1 0 3 2 0
2 4 1 3 1
4 1 2 2 3
3. The following data show the daily wages of 20 laborers in a given town:
10, 4, 4, 7, 7, 7, 5, 5, 8, 7, 8, 5, 10, 8, 7, 5, 7, 8, 7, 4. Prepare an ungrouped
frequency distribution.
4. Consider the following Grouped frequency distribution of daily income of 30
females.
Daily income   5-9   10-14   15-19   20-24   25-29
Frequency      2     6       12      7       3

a. What is the class frequency of the 3rd class?


b. How many observations are included in the last class?
c. Find
I. The lower and upper class limits of the 4th class.
II. The lower and upper class boundaries of the 3rd class.
III. The class interval (class width) of the 5th class.
IV. The class mark of the 2nd class.
5. The number of customers of Ethiopian Commercial bank for the past consecutive
30 days was



20 35 53 18 49 25 42 57 16 63
48 25 42 65 68 48 22 65 39 29
65 72 23 37 69 49 58 37 42 67

a. Construct a grouped frequency distribution with a suitable number of classes.


b. Find the class boundaries for the distribution constructed in part (a) above.
6. Construct a pie chart for the annual income statement of a certain family shown
below.
Items                  Amount
Expenditure on goods   10000
Saving                 35000
Investment cost        25000
Miscellaneous          30000

Chapter Three: Measures of Central Tendency

3.1 Introduction
Tabulation, diagrammatic and graphic presentation techniques do not give the complete
picture of a phenomenon. The simplest case for highlighting some of the key features
and terminology of statistics is that of making statements about a single population based
on a sample. There is a need to calculate a single representative value that
gives the user of the information a clear picture of the data; such values are known as averages.
The average is the most popular tool for condensing a mass of data and representing it by a
single number. However, to know how close together or far apart the observations are, a
measure of central tendency is not sufficient. Therefore, it is also useful to find some way of
measuring their dispersion. In this chapter we will also learn about relative measures of
variation and statistical measures of position.
3.2 Statistical Measures of Central Tendency
The most important objective of statistical analysis is to determine a single value for the
entire mass of data so that it describes the overall level of the group of observations and
can be considered as a representative value of the whole set of data. It shows the center
of a set of observations.
Before attempting measures of central tendency and dispersion, let’s see some of the
Summation notations that are used frequently.

The summation notation


Suppose x1, x2, x3, …, xn are numerical measurements of a variable X. The sum of all xi,
where i goes from 1 up to n, is symbolically given by:
$\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$
Properties of Summation Notation
1) $\sum_{i=1}^{n} i = 1 + 2 + \dots + n = \frac{n(n+1)}{2}$
2) $\sum_{i=1}^{n} (x_i \pm y_i) = \sum_{i=1}^{n} x_i \pm \sum_{i=1}^{n} y_i$
3) $\sum_{i=1}^{n} c x_i = c \sum_{i=1}^{n} x_i$
4) $\sum_{i=1}^{n} c = nc$
5) $\sum_{i=1}^{n} (x_i \pm a) = \sum_{i=1}^{n} x_i \pm na$
6) $\sum_{i=1}^{n} (x_i \pm a)^2 = \sum_{i=1}^{n} x_i^2 \pm 2a \sum_{i=1}^{n} x_i + na^2$
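The properties of summation notation can be checked numerically. The sketch below uses small hypothetical integer values for x, y, a and c (they are not from the text) and takes the minus sign in properties 5 and 6.

```python
# Hypothetical sample values, chosen only to illustrate the identities.
x = [2, 5, 7, 10]
y = [1, 4, 6, 3]
a, c = 3, 2
n = len(x)

checks = [
    sum(range(1, n + 1)) == n * (n + 1) // 2,                       # property 1
    sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y),        # property 2
    sum(c * xi for xi in x) == c * sum(x),                          # property 3
    sum(c for _ in range(n)) == n * c,                              # property 4
    sum(xi - a for xi in x) == sum(x) - n * a,                      # property 5
    sum((xi - a) ** 2 for xi in x)
        == sum(xi ** 2 for xi in x) - 2 * a * sum(x) + n * a ** 2,  # property 6
]
print(checks)
```

Integers are used so that every comparison is exact.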



3.2.1 "Average"
An average is a value which tends to sum up or describe the mass of the data. Even
though the term "average" is vague, the term "mean" has more specific meanings in the field of
Statistics. It is sometimes stated that the "mean" means average. But this is incorrect if
"mean" is taken in the specific sense of "arithmetic mean", as there are different types of
averages: the mean, median, and mode. For instance, average house prices usually use the
median value for the average.
3.2.1.1 Arithmetic Mean
The arithmetic mean is the "standard" average, often simply called the "mean". For n
observations X1, X2, …, Xn it is determined as:
$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{1}{n} \sum_{i=1}^{n} X_i$

The mean may often be similar with the median or mode for symmetrical set of values, or
distribution; however, for skewed distributions, the mean is not necessarily the same as
the middle value (median), or the most likely occurrence value (mode). For example,
mean income is skewed upwards by a small number of people with very large incomes,
so that the majority has an income lower than the mean. By contrast, the median income
is the level at which half the population is below and half is above. The mode income is
the most likely occurrence income, and favors the larger number of people with lower
incomes. The median or mode is often a more intuitive measure of such data.
Nevertheless, many skewed distributions are best described by their mean, such as the
exponential and Poisson distributions.
For example, the arithmetic mean of six values: 34, 27, 45, 55, 22 and 34 is:
$\bar{X} = \frac{34 + 27 + 45 + 55 + 22 + 34}{6} = \frac{217}{6} \approx 36.17$

If X is a variable having values X1, X2, …, Xm occurring with frequencies f1, f2, …, fm
respectively, then its arithmetic mean is given by:
$\bar{X} = \frac{X_1 f_1 + X_2 f_2 + \dots + X_m f_m}{f_1 + f_2 + \dots + f_m} = \frac{\sum_{i=1}^{m} X_i f_i}{\sum_{i=1}^{m} f_i}$.

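Example: Suppose the X values are 3, 5, 4, 2, 7 and 6 with corresponding frequencies of
2, 1, 3, 2, 1 and 1 respectively. Then find the mean of the frequency data.
$\bar{X} = \frac{\sum_{i=1}^{m} X_i f_i}{\sum_{i=1}^{m} f_i} = \frac{3 \times 2 + 5 \times 1 + 4 \times 3 + 2 \times 2 + 7 \times 1 + 6 \times 1}{2 + 1 + 3 + 2 + 1 + 1} = \frac{40}{10} = 4$.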
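The two arithmetic mean calculations of this section can be verified with a short sketch. The variable names are illustrative; the numbers are those of the examples above.

```python
# Simple arithmetic mean of the six values.
values = [34, 27, 45, 55, 22, 34]
simple_mean = sum(values) / len(values)

# Arithmetic mean of values that occur with frequencies.
x = [3, 5, 4, 2, 7, 6]
f = [2, 1, 3, 2, 1, 1]
freq_mean = sum(xi * fi for xi, fi in zip(x, f)) / sum(f)

print(round(simple_mean, 2), freq_mean)
```

Expanding each value by its frequency and taking the simple mean of the expanded list gives the same result as the frequency formula.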

3.2.1.2 Geometric Mean


If the observed values are measured as ratios, proportions or percentages, and the series
of observations contains one or more unusually large values, the geometric mean gives a
better measure of central tendency than other means. It is obtained by taking the nth root
of the product of the n values, i.e., if the values of the observations are denoted by X1, X2,
…, Xn, then

$GM = \sqrt[n]{X_1 \cdot X_2 \cdots X_n}$, for ungrouped data sets, and
$GM = \sqrt[n]{X_1^{f_1} \cdot X_2^{f_2} \cdots X_m^{f_m}}$, with $n = \sum_{i=1}^{m} f_i$, for data sets of Xi having frequencies fi.

N.B.: The geometric mean is an average that is useful for sets of data that contain no zeros
and do not contain an odd number of negative values, for example rates of growth.
Example: the geometric mean of six values: 34, 27, 45, 55, 22 and 34 is:
$GM = \sqrt[6]{34 \times 27 \times 45 \times 55 \times 22 \times 34} \approx 34.5$

3.2.1.3 Harmonic Mean


Harmonic mean is a suitable measure of central tendency when the data pertain to speeds,
rates and times. The harmonic mean is defined as the reciprocal of the mean of the
reciprocals of a series of observations. That is, let X1, X2, …, Xn be the values of a set of
observations; then the harmonic mean is given by:
$HM = \frac{n}{\frac{1}{X_1} + \frac{1}{X_2} + \dots + \frac{1}{X_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$.

When the observed values X1, X2, …, Xk have the corresponding frequencies f1, f2, …, fk
respectively, then the harmonic mean is given by:
$HM = \frac{n}{\sum_{i=1}^{k} \frac{f_i}{X_i}}$, where $n = \sum_{i=1}^{k} f_i$.

Example: the harmonic mean of the six values 34, 27, 45, 55, 22, and 34 is
$HM = \frac{6}{\frac{1}{34} + \frac{1}{27} + \frac{1}{45} + \frac{1}{55} + \frac{1}{22} + \frac{1}{34}} \approx 33.02$

The relation among the three means: The arithmetic mean (AM), geometric mean (GM) and
harmonic mean (HM) satisfy AM ≥ GM ≥ HM. This statement can be illustrated by considering
two positive observations x1 and x2, for which
$AM = \frac{x_1 + x_2}{2}$, $GM = \sqrt{x_1 x_2}$ and $HM = \frac{2 x_1 x_2}{x_1 + x_2}$,
with AM = GM = HM if x1 = x2.

Since x1 and x2 are positive, $\sqrt{x_1} - \sqrt{x_2}$ is real and $(\sqrt{x_1} - \sqrt{x_2})^2 \ge 0$,
which becomes $x_1 + x_2 - 2\sqrt{x_1 x_2} \ge 0$ and hence $\frac{x_1 + x_2}{2} \ge \sqrt{x_1 x_2}$.
This shows that AM ≥ GM.

Again, $x_1 + x_2 - 2\sqrt{x_1 x_2} \ge 0$ gives $x_1 + x_2 \ge 2\sqrt{x_1 x_2}$. Multiplying both sides by
$\frac{\sqrt{x_1 x_2}}{x_1 + x_2}$, we have $\sqrt{x_1 x_2} \ge \frac{2 x_1 x_2}{x_1 + x_2}$, which shows that
GM ≥ HM.
Together these imply AM ≥ GM ≥ HM.
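The ordering of the three means can be checked on the six values used in the examples of this section. The sketch relies on Python's standard statistics module (geometric_mean requires Python 3.8 or later); the variable names are illustrative.

```python
import statistics

values = [34, 27, 45, 55, 22, 34]

am = statistics.mean(values)            # arithmetic mean
gm = statistics.geometric_mean(values)  # nth root of the product
hm = statistics.harmonic_mean(values)   # n divided by the sum of reciprocals

print(f"AM = {am:.2f}, GM = {gm:.2f}, HM = {hm:.2f}")
print("AM >= GM >= HM:", am >= gm >= hm)
```

For a data set in which all values are equal, the three means coincide.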
3.2.1.4 Weighted Arithmetic Mean
In the computation of the mean we have given equal importance to each observation. However, when
averaging quantities, it is often necessary to account for the fact that not all of them are
equally important in the phenomenon being described. In order to give the quantities being
averaged their proper degree of importance, it is necessary to assign them relative
importance called weights, and then calculate a weighted mean. In general, the weighted
mean, $\bar{X}_w$, of a set of values X1, X2, …, Xn, whose relative importance is expressed
numerically by a corresponding set of weights W1, W2, …, Wn, is given by:
$\bar{X}_w = \frac{X_1 W_1 + X_2 W_2 + \dots + X_n W_n}{W_1 + W_2 + \dots + W_n} = \frac{\sum_{i=1}^{n} X_i W_i}{\sum_{i=1}^{n} W_i}$.

Example
In a given drug shop four different drugs were sold for unit prices of 0.60, 0.85, 0.95 and
0.50 birr, and the total numbers of drugs sold were 10, 10, 5 and 20 respectively. What is
the average price of the four drugs in this drug shop?
Solution: for this example we have to use the weighted mean, using the number of drugs sold as
the respective weight for each drug's price.
Therefore, the average price will be:
Xw = (10*0.60 + 10*0.85 + 5*0.95 + 20*0.50) / (10 + 10 + 5 + 20) = 29.25/45 = 0.65 birr. If
we do not consider the weights, the average price will be 0.725 birr, which overstates the
average price actually paid per drug.
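The drug shop example can be reproduced as follows; the sketch simply contrasts the weighted and the unweighted mean. The variable names are illustrative.

```python
prices = [0.60, 0.85, 0.95, 0.50]   # unit prices in birr
sold = [10, 10, 5, 20]              # number of units sold (the weights)

weighted_mean = sum(p * w for p, w in zip(prices, sold)) / sum(sold)
unweighted_mean = sum(prices) / len(prices)

print(f"weighted: {weighted_mean:.3f} birr, unweighted: {unweighted_mean:.3f} birr")
```

The weighted mean is lower here because the cheapest drug accounts for the largest share of the sales.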

3.2.1.5 Properties of the Mean


i. For a given set of data, there is one, and only one, arithmetic mean.
ii. Its meaning is easily understood.



iii. Since every value goes into its computation, it is affected by the magnitude of each value.
Because of this, the mean may not be the best measure of central tendency when there are
one or two extreme values in the set of data.
iv. Since it is a computation, the mean may not correspond to any actual value.
v. It can, however, be manipulated algebraically, which makes it an especially useful
measure for statistical inference purposes.

3.2.1.6 Mean of Grouped Data


In order to compute the mean from grouped data, we use a weighted mean. The class mark
or midpoint of each class can be used to represent all the observations that have been
included in that class. To calculate the mean it is necessary to undertake the following steps:
1. Multiply each midpoint xj by the number of observations fj it represents.
2. Sum together all these products.
3. Divide the total sum by the number of observations. This is: $\bar{x} = \frac{\sum_{j=1}^{g} x_j f_j}{n}$
Where,
n = number of observations in the sample
g = number of classes in the frequency distribution
xj = midpoint of the jth class
fj = number of observations in the jth class
Example
Using the age of employees example, the frequency distribution is:
Class Interval (CI)   Class mark (xj)   Frequency (fj)   xjfj
15 - 19               17                2                34
20 - 24               22                10               220
25 - 29               27                19               513
30 - 34               32                27               864
35 - 39               37                16               592
40 - 44               42                10               420
45 - 49               47                6                282
50 - 54               52                5                260
55 - 59               57                3                171
60 - 64               62                2                124
Total                                   100              3,480
The mean = 3,480 / 100 = 34.8
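The grouped mean of the age distribution can be computed directly from the class limits and frequencies. In this sketch each class is stored as a (lower limit, upper limit, frequency) tuple, which is a layout chosen here for illustration.

```python
# (lower limit, upper limit, frequency) for the age of employees example.
classes = [(15, 19, 2), (20, 24, 10), (25, 29, 19), (30, 34, 27), (35, 39, 16),
           (40, 44, 10), (45, 49, 6), (50, 54, 5), (55, 59, 3), (60, 64, 2)]

n = sum(f for _, _, f in classes)                               # number of observations
total = sum((lo + hi) / 2 * f for lo, hi, f in classes)         # sum of class mark times frequency
grouped_mean = total / n

print(n, total, grouped_mean)
```

Because every observation is replaced by its class mark, the result is an approximation to the mean of the raw data.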
The word average could mean one of three things: the arithmetic mean, the median, or the
mode. For this reason, it is better to specify which average you are talking about.
3.2.2 Median
Median is defined as the middle measure - that value above and below which half the values
lie. In other words, it is the middle value of an ordered sequence of data.
3.2.2.1 Median of Ungrouped Data
The median is obtained by placing the raw data in numerical order, or in an ordered array.
Rule1: If the size of the sample is an odd number, the median is represented by the
numerical value corresponding to the (n + 1)/2 th observation.
Rule2: If the size of the sample is an even number, the median is represented by the
numerical value corresponding to the average of the (n/2) and (n/2 + 1) th observations.
Example
Five households have total annual incomes of £10,000, £24,500, £15,000, £21,500, and
£13,000. To obtain the median level of income, the 5 values are arranged in order of
magnitude:
£10,000 £13,000 £15,000 £21,500 £24,500

For a sample of 5, the median is the value of the $\left(\frac{5+1}{2}\right)^{th}$ observation, in other words
the third observation, £15,000.
Example
If there was an additional observation - £9,000, the ordered array would be:
£9,000 £10,000 £13,000 £15,000 £21,500 £24,500
The median would be the average of the third and fourth observations (13,000 + 15,000)/ 2
= £14,000.
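Rule 1 and Rule 2 can be written as one small function. This is a minimal sketch; the function name median_ungrouped is introduced here for illustration.

```python
def median_ungrouped(values):
    """Median of raw data: middle value for odd n, mean of the two middle values for even n."""
    ordered = sorted(values)            # place the data in an ordered array
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]             # the ((n + 1)/2)th observation
    return (ordered[mid - 1] + ordered[mid]) / 2   # average of the (n/2)th and (n/2 + 1)th

incomes = [10000, 24500, 15000, 21500, 13000]
print(median_ungrouped(incomes))            # five households
print(median_ungrouped(incomes + [9000]))   # with the additional observation
```

The list index is one less than the position used in the rules because Python counts from zero.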
3.2.2.2 Median for Grouped Data
The median is still defined as the value in a set of data above and below which half the
values lies. With grouped data, it is not possible to find the exact value of the median, since
the individual values are not identifiable from the frequency distribution.
The median for a frequency distribution is that value which divides the area of the histogram
into two equal parts. For this to be valid, it is assumed that the values in each class interval
are evenly distributed over the entire interval.
The median is estimated using the following formula:
$Median = L + \left(\frac{\frac{n}{2} - \sum f_{-1}}{f_m}\right) w$
Where, L = lower class boundary of the median class (the class containing the median)
n = sample size
Σf-1 = cumulative frequency of all classes lower than the median class.
fm = frequency of the median class
w = width of the median class interval.
Example
Using the age data of the above example, the median is calculated as follows:

$Median = 29.5 + \left(\frac{50 - 31}{27}\right) \times 5 \approx 33.0$
The median can also be approximated from a less than percentage ogive. After plotting the relative
cumulative frequencies against the class boundaries, locate 50% on the vertical
axis, move across to the less than ogive, and then down to the horizontal axis. This value
is the median, since 50% of the observations are less than this value.
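The grouped median formula can be sketched as a function that first locates the median class and then interpolates inside it. The function name grouped_median and its arguments are assumptions made for this illustration; the data are the age distribution used in the example, with whole-number class limits (unit = 1).

```python
def grouped_median(lower_limits, freqs, width, unit=1):
    """Median = L + ((n/2 - cumulative freq below the median class) / f_m) * w."""
    n = sum(freqs)
    cum_before = 0                            # cumulative frequency of the lower classes
    for lower, f in zip(lower_limits, freqs):
        if cum_before + f >= n / 2:           # first class whose cumulative frequency reaches n/2
            boundary = lower - unit / 2       # lower class boundary L
            return boundary + (n / 2 - cum_before) / f * width
        cum_before += f

lower_limits = [15, 20, 25, 30, 35, 40, 45, 50, 55, 60]
freqs = [2, 10, 19, 27, 16, 10, 6, 5, 3, 2]
median_age = grouped_median(lower_limits, freqs, width=5)
print(round(median_age, 1))
```

The interpolation step is where the assumption of evenly distributed values within the median class is used.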

3.2.2.3 Properties of the median


1. The median always exists in a set of numerical data.
2. The median is not affected by a few extreme values, whereas the mean is. When the data
is skewed, the median would be the preferable choice for central tendency measure.
3. The median can be used to characterise qualitative data. In situations where it is not
possible to give a numerical score, but it is possible to rank observations or place in an
order of preference, the median can be used. Examples include competitions, where
people or objects are placed in order.
4. The median is easy to calculate unless a large number of values are involved.
5. The median can be located even when the data set is incomplete, provided that the
number and general location of all the measurements are known and that exact
information on the size of measurements near the centre of the data set is available. For
example, in surveys it is often difficult to locate and interview the extreme rich and poor.
Provided we know the tail of the distribution from which the observations are missing
and their amount, we can still obtain an average.
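For example, suppose 12 ordered observations are X1 X2 X3 100 201 320 440 450 480 510 X11 X12,
where the three smallest and two largest values are unknown. The median is still the average of
the 6th and 7th observations: Median = (320 + 440) / 2 = 380.
Disadvantages of Median:
• It does not consider all observations because it is a positional average.
• It does not lend itself to further algebraic treatment.
• The value of the median is affected more by sampling fluctuations.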
3.2.3 Quantiles



Quantiles are often employed when summarizing or describing a set of data where it is
necessary to divide the data into equal parts to indicate the position of data values in the
given data set. The most important position indicators are quartiles, deciles and
percentiles, which are collectively called quantiles.

3.2.3.1 Quartiles
Quartiles are three points which divide a given ordered data set into four equal parts. The
first, second and third points are known as the 1st, 2nd and 3rd quartiles and are denoted by Q1, Q2
and Q3 respectively.
For ungrouped data the kth quartile Qk is the value of the item which is at the $\left(\frac{k(n+1)}{4}\right)^{th}$
position, where k = 1, 2, 3 and n is the total number of observations.
For grouped data, the computation of the three quartiles can be done as follows:
Calculate kn/4 and search for the minimum cumulative frequency which is greater than or
equal to kn/4, k = 1, 2, 3. The class corresponding to this cumulative frequency is the kth
quartile class. This is the class where Qk lies. Thus,
$Q_k = L + \frac{c}{f}\left(\frac{kn}{4} - CF\right)$, k = 1, 2, 3.
3.2.3.2 Deciles
Deciles are nine points which divide a given ordered data set into ten equal parts. Each part
contains an equal number of elements. The first, second, … and ninth points are known as the 1st,
2nd, 3rd, … and 9th deciles and are denoted by D1, D2, … and D9 respectively.
For ungrouped data the kth decile Dk is the value of the item which is at the $\left(\frac{k(n+1)}{10}\right)^{th}$
position, where k = 1, 2, 3, …, 9 and n is the total number of observations.
For grouped data, the computation of the nine deciles can be done as follows:
Calculate kn/10 and search for the minimum cumulative frequency which is greater than
or equal to kn/10, k = 1, 2, 3, …, 9. The class corresponding to this cumulative frequency is
the kth decile class. This is the class where Dk lies. Thus,
$D_k = L + \frac{c}{f}\left(\frac{kn}{10} - CF\right)$, k = 1, 2, 3, …, 9.
3.2.3.3 Percentiles
Percentiles are 99 points which divide a given ordered data set into 100 equal parts. Each
part contains an equal number of elements. The first, second, … and ninety-ninth points are
known as the 1st, 2nd, 3rd, … and 99th percentiles and are denoted by P1, P2, … and P99
respectively.
For ungrouped data the kth percentile Pk is the value of the item which is at the $\left(\frac{k(n+1)}{100}\right)^{th}$
position, where k = 1, 2, 3, …, 99 and n is the total number of observations.
For grouped data, the computation of the 99 percentiles can be done as follows:
Calculate kn/100 and search for the minimum cumulative frequency which is greater than
or equal to kn/100, k = 1, 2, 3, …, 99. The class corresponding to this cumulative frequency is
the kth percentile class. This is the class where Pk lies. Thus,
$P_k = L + \frac{c}{f}\left(\frac{kn}{100} - CF\right)$, k = 1, 2, 3, …, 99.
Where L = lower class boundary of the kth quartile/decile/percentile class
n = the total number of observations
CF = the less than cumulative frequency corresponding to the class immediately
preceding the kth quartile/decile/percentile class
c = the class width of the quartile/decile/percentile class
f = frequency of the kth quartile/decile/percentile class
N.B.: 1) To compute quantiles, we first sort the data in ascending order (present the raw
data in an array form).

2) If k(n + 1)/4, k(n + 1)/10 or k(n + 1)/100 is a fraction, then we use the average of
the (kn/4)th and (kn/4 + 1)th, the (kn/10)th and (kn/10 + 1)th, or the (kn/100)th and (kn/100 + 1)th
values respectively.
3) Q2 = D5 = P50 = median of the distribution, P25 = Q1, P75 = Q3, and Di = P(10i), i = 1, 2, 3, …, 9.
4) Intuitively, the pth percentile is the value Vp such that p percent of the sample points
are less than or equal to Vp. For example, the median, being the 50th percentile,
indicates that half of the observations are above and half of the observations are below the
50th percentile.
5) Quantiles have the advantage of being less sensitive to outliers and of not being
much affected by the sample size (n).
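Since the grouped formulas for quartiles, deciles and percentiles differ only in the divisor (4, 10 or 100), they can be sketched as one function. The function name grouped_quantile and its arguments are assumptions made for this illustration, and the data are the age distribution from the grouped mean example (class width 5, whole-number limits).

```python
def grouped_quantile(k, parts, lower_limits, freqs, width, unit=1):
    """kth quantile = L + (c / f) * (k*n/parts - CF); parts is 4, 10 or 100."""
    n = sum(freqs)
    target = k * n / parts
    cf = 0                                    # less than cumulative frequency before the class
    for lower, f in zip(lower_limits, freqs):
        if cf + f >= target:                  # minimum cumulative frequency >= k*n/parts
            boundary = lower - unit / 2       # lower class boundary L
            return boundary + width / f * (target - cf)
        cf += f

lower_limits = [15, 20, 25, 30, 35, 40, 45, 50, 55, 60]
freqs = [2, 10, 19, 27, 16, 10, 6, 5, 3, 2]

q1 = grouped_quantile(1, 4, lower_limits, freqs, width=5)
q2 = grouped_quantile(2, 4, lower_limits, freqs, width=5)
q3 = grouped_quantile(3, 4, lower_limits, freqs, width=5)
d5 = grouped_quantile(5, 10, lower_limits, freqs, width=5)
p50 = grouped_quantile(50, 100, lower_limits, freqs, width=5)
print(round(q1, 2), round(q2, 2), round(q3, 2))
```

Calling the function with k = 2, 5 and 50 for the three divisors gives the same value, in line with Q2 = D5 = P50 = median.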
3.2.4 Mode
The mode is the value of the observation that occurs with the greatest frequency. A
particular disadvantage is that, with a small number of observations, there may be no
mode. In addition, sometimes, there may be more than one mode such as when dealing
with a bimodal (two-peaks) distribution. It is even less amenable (responsive) to
mathematical treatment than the median.
Find the modal values for the following data: (a) 22, 66, 69, 70, 73 (no modal value). (b)
1.8, 3.0, 3.3, 2.8, 2.9, 3.6, 3.0, 1.9, 3.2, 3.5 (modal value = 3.0 kg). (c) 10, 10, 9, 9, 8, 12, 15,
5 (modal values = 9 and 10). Hence, it is possible for a frequency distribution to have
more than one mode. Distributions with one mode are called unimodal, those with two
modes are called bimodal, and those with more than two modes are called multimodal.
3.2.4.1 Mode of Grouped Data
To find the modal value for a grouped (continuous) frequency distribution, first find the
modal class, which is the class with the highest
frequency. Then, to compute the modal value for grouped data, we use the formula:
$\hat{X} = L + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) w$, where L = lower class boundary of the modal class;
w = the class width of the modal class;
$\Delta_1 = f_2 - f_1$ and $\Delta_2 = f_2 - f_3$;
f2 = frequency of the modal class;
f1 = frequency of the class immediately preceding the modal class;
f3 = frequency of the class immediately succeeding the modal class.
Example
Find the modal value of total circulating albumin in gm for 30 normal males, aged 20-29 years.
Circulating albumin in gm (CL):   100-109   110-119   120-129   130-139   140-149   150-159
Frequency (f):                    2         6         6         7         8         1
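One way to work this example is to apply the grouped mode formula in code. The sketch below is illustrative only: the function name grouped_mode is an assumption, a missing neighbouring class is treated as having frequency 0, and ties for the highest frequency are resolved by taking the first such class.

```python
def grouped_mode(lower_limits, freqs, width, unit=1):
    """Mode = L + (d1 / (d1 + d2)) * w for the class with the highest frequency."""
    i = freqs.index(max(freqs))                       # position of the modal class
    f2 = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0                 # class before the modal class
    f3 = freqs[i + 1] if i < len(freqs) - 1 else 0    # class after the modal class
    d1, d2 = f2 - f1, f2 - f3
    boundary = lower_limits[i] - unit / 2             # lower class boundary L
    return boundary + d1 / (d1 + d2) * width

lower_limits = [100, 110, 120, 130, 140, 150]
freqs = [2, 6, 6, 7, 8, 1]
mode_albumin = grouped_mode(lower_limits, freqs, width=10)
print(mode_albumin)
```

Here the modal class is 140-149, so L = 139.5, the two differences are 8 - 7 = 1 and 8 - 1 = 7, and the modal value is 139.5 + (1/8) × 10 = 140.75 gm.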

3.2.4.2 Properties of the Mode


1. The mode is easy for the layman to understand: it is the most typical, popular or
common value. It is easier to imagine the average family having 2 children rather
than 2.24.
2. It is particularly useful in market research, where the most common value is the most
relevant.
3. The mode is often the most representative value of a distribution; for example, it is
useful for finding the modal wage.
4. The mode can be calculated for open-ended distributions.
5. The mode can also be located graphically.
6. The mode cannot be calculated when the frequency distribution is ill-defined, and it is
not based on all observations.
Relationship between Mean, Median and Mode: In a symmetrical distribution, the mean,
median and mode coincide. However, for a moderately asymmetrical distribution, the mean
and mode usually lie at the two ends with the median between them, and they satisfy the
following important empirical relationship: mean – mode = 3(mean – median).

Chapter Four: Measures of Variation


4.1 Introduction
Measures of central tendency help us describe a set of data by a single number or
typical value. However, they give no information about the extent to which the values
differ from one another or from the average value. Hence, to better understand the
pattern of the data, we must also measure its dispersion, that is, the degree to which
the numerical data tend to spread about an average value. The dispersion of a data set
gives us additional information that enables us to judge the reliability of our measure
of central tendency: if the data are widely dispersed, then the mean (or median or mode)
is a less representative value of the data as a whole than it would be for data with
small dispersion. Measures of dispersion also enable us to compare several samples with
similar averages.
Consider the following data sets:
Data sets   Observations            Mean
Set 1:      60 40 30 50 60 40 70    50
Set 2:      50 49 49 51 48 50 53    50
Set 3:      50 50 50 50 50 50 50    50
The three data sets have a mean of 50, but obviously set 1 is more “spread out” than set 2
and set 3 has no variability. How do we express this numerically? The objective of
measuring this scatter or dispersion is to obtain a single summary figure which
adequately exhibits whether the distribution is compact or spread out. Thus, measures of
dispersion help us in studying the extent to which observations are scattered around the
central value, i.e., the mean, the median or the mode.
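As a rough numerical illustration of the three data sets, the sketch below computes the mean, the range and a standard deviation for each. It assumes that the last number in each row of the table is the mean rather than an observation, and it uses the population form `pstdev` purely for comparison.

```python
from statistics import mean, pstdev

data_sets = {
    "Set 1": [60, 40, 30, 50, 60, 40, 70],
    "Set 2": [50, 49, 49, 51, 48, 50, 53],
    "Set 3": [50, 50, 50, 50, 50, 50, 50],
}
# For each set: (mean, range, standard deviation with divisor n)
summary = {name: (mean(x), max(x) - min(x), pstdev(x)) for name, x in data_sets.items()}
for name, (m, r, sd) in summary.items():
    print(f"{name}: mean={m}, range={r}, sd={sd:.2f}")
```

All three means are equal, while the range and standard deviation shrink from Set 1 to Set 3, which is what the text describes in words.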

Our discussion of dispersion will include absolute and relative measures of dispersion.
An absolute measure of dispersion is expressed in the same units as the original data,
while a relative measure of dispersion is free of units and is obtained by dividing an
absolute measure of dispersion by a measure of central tendency.
4.1.1 Measures of Absolute Variations
Measures of absolute variation are expressed in the same units as the original data.
Hence, they are useful for comparing variation in two or more distributions whose units
of measurement are the same. For example, we can compare the marks of students in two or
more departments in a particular course. Range, interquartile range, quartile deviation,
mean deviation, variance and standard deviation are examples of absolute measures of
variation.
Range
The range is the simplest measure of variation to find. It is the highest value minus
the lowest value, that is, Range (R) = Maximum – Minimum.
Since the range uses only the largest and smallest values, it is greatly affected by
extreme values; that is, it is not a resistant measure. It also takes no account of the
frequencies of the different values or classes.
In the case of grouped data, Range = UCB of the last class – LCB of the first class.
Example: Find the range of the following distributions.
a. 23, 42, 20, 30, 35, 21, 45, 33, 23, 23, 20, 42, 29, 20.
b. Class: 3 – 9 9 – 15 15 – 21 21 – 27 27 – 33 33 – 39
Frequency: 4 7 6 15 35 20
Merits of Range
• It is simple to calculate and understand.
• It quickly gives a broad picture of the scatter in the data.
Demerits of Range
• Its computation is not based on all observations.
• It is affected by extreme observations.
• It is influenced by sampling fluctuations.
• It cannot be calculated for open-ended distributions.
Mean Deviation
The mean deviation is the arithmetic mean of the absolute deviations of the observations
from one of the averages, i.e. the mean, median or mode.
• If $x_1, x_2, \ldots, x_n$ are n given observations, then the mean deviation is given by:
Mean deviation from arithmetic mean = $\frac{1}{n}\sum |x_i - \bar{x}|$
Mean deviation from median = $\frac{1}{n}\sum |x_i - \text{Md}|$
Mean deviation from mode = $\frac{1}{n}\sum |x_i - \text{Mode}|$
• In the case of a frequency distribution, with $n = \sum f_i$:
Mean deviation from arithmetic mean = $\frac{1}{n}\sum f_i |x_i - \bar{x}|$
Mean deviation from median = $\frac{1}{n}\sum f_i |x_i - \text{Md}|$
Mean deviation from mode = $\frac{1}{n}\sum f_i |x_i - \text{Mode}|$
Self check exercises
For the following distributions, compute the mean deviation from arithmetic mean,
median and mode.
a. 2, 3, 4, 6, 8, 2, 10.
b. Xi: 1 2 3 6 8
f i: 4 6 5 3 5
c. Marks: 0–10 10–20 20–30 30 -40 40–50 50–60 60-70
No of students: 6 5 8 15 7 6 3
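To check an answer to part (a), the mean deviation formulas can be sketched in Python as follows. The helper `mean_deviation` is a hypothetical name; the mean, median and mode come from the standard `statistics` module.

```python
from statistics import mean, median, mode

def mean_deviation(data, center):
    """Arithmetic mean of the absolute deviations from `center`."""
    return sum(abs(x - center) for x in data) / len(data)

data = [2, 3, 4, 6, 8, 2, 10]
md_mean = mean_deviation(data, mean(data))      # about the mean (5)
md_median = mean_deviation(data, median(data))  # about the median (4)
md_mode = mean_deviation(data, mode(data))      # about the mode (2)
print(round(md_mean, 3), round(md_median, 3), round(md_mode, 3))
```

The three results differ, which illustrates that the mean deviation depends on the average from which the deviations are taken.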
Advantages of Mean Deviation
• It is easy to calculate and simple to understand.
• Its computation is based on all the observations.
• It is less affected by extreme observations.
Disadvantages
• It ignores the signs of the deviations, which leads to serious difficulties in inference.
• The mean deviations calculated from different averages (mean, median and mode)
may not be the same.
Variance and standard Deviation
The standard deviation is defined as the positive square root of the arithmetic mean of the
squares of the deviations of the observations from the arithmetic mean. The standard
deviation obtained from a population is denoted by $\sigma$ and the sample standard
deviation is denoted by S.
Variance is the average of the squared distances of each value from the mean, denoted by
$\sigma^2$ or $S^2$. Equivalently, the variance is the square of the standard deviation.
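If $x_1, x_2, \ldots, x_n$ is a set of n sample observations, then its standard deviation is given by:
$S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$, where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$,
or $S = \sqrt{\frac{1}{n-1}\left[\sum x_i^2 - n\bar{x}^2\right]}$,
or $S = \sqrt{\frac{1}{n-1}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right]}$.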
In the case of a frequency distribution, the standard deviation is given by:
$S = \sqrt{\frac{1}{n-1}\sum f(x - \bar{x})^2}$, where $\bar{x} = \frac{\sum fx}{\sum f}$ and $n = \sum f$.
Here x is the value of the variable or the mid-value of the class (in the case of a
continuous frequency distribution) and f is the corresponding frequency of the value x. If
the values of x and f are large, the computation of $\sum fx$ and $\sum fx^2$ is quite time
consuming. In such a case, we use the following formulas.

$s = \sqrt{\frac{\sum d_i^2}{n-1} - \frac{(\sum d_i)^2}{n(n-1)}}$ for ungrouped data, where $d_i = x_i - A$ and A is an assumed mean.
$s = \sqrt{\frac{\sum f_i d_i^2}{n-1} - \frac{(\sum f_i d_i)^2}{n(n-1)}}$ for grouped data, where $d_i = X_{mi} - A$, A is an assumed mean and $X_{mi}$ is the class midpoint.
N.B.: For a population distribution, the variance is calculated as $\sigma^2 = \frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}$.
Example
1. Compute the mean, variance and standard deviation for data sets A and B.
A: 10 60 50 30 40 20
B: 40 30 45 35 40 20
After you computed the mean and standard deviation, what did you observe? Comment
on the result.
Solution
      Mean   Variance   Standard deviation
A:    35     350        18.7083
B:    35     80         8.94427
Both A and B have equal means, but the observations of A are more scattered than those of B.
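The figures in this solution can be reproduced with Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample divisor n – 1. This is only a quick check, not part of the original solution.

```python
from statistics import mean, variance, stdev

a = [10, 60, 50, 30, 40, 20]
b = [40, 30, 45, 35, 40, 20]
for name, x in (("A", a), ("B", b)):
    # variance() and stdev() use the sample divisor n - 1
    print(name, mean(x), variance(x), round(stdev(x), 4))
```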

2. Find the sample variance and standard deviation for the following distributions.
a) Xi:  3  4  6  8
   fi:  2  3  4  2
b) Class:      0–10  10–20  20–30  30–40  40–50
   Frequency:  7     6      15     12     10
Solutions:
a) Here n = 11 and the mean is 58/11 = 5.27.
xi    fi    xi – x̄      (xi – x̄)²   fi(xi – x̄)²
3     2     -2.27273    5.165289    10.33
4     3     -1.27273    1.619835    4.86
6     4     0.727273    0.528926    2.12
8     2     2.727273    7.438017    14.88
Sum   11                14.7520     32.18
Variance = 32.18/10 = 3.22 and standard deviation = 1.79.
b) Here n = 50 and the mean is 1370/50 = 27.4.
Class mark (xi)   Frequency (fi)   xifi   xi – x̄   (xi – x̄)²   fi(xi – x̄)²
5                 7                35     -22.4    501.76      3512.32
15                6                90     -12.4    153.76      922.56
25                15               375    -2.4     5.76        86.4
35                12               420    7.6      57.76       693.12
45                10               450    17.6     309.76      3097.6
Sum               50               1370                        8312
Variance = 8312/49 = 169.63 and SD = 13.02.
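The two tabular computations can be cross-checked with a small function for the sample variance of a frequency distribution. The name `grouped_sample_variance` is hypothetical; for part (b) the class marks 5, 15, ..., 45 stand in for the classes.

```python
from math import sqrt

def grouped_sample_variance(values, freqs):
    """Sample variance of a frequency distribution (divisor n - 1)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / (n - 1)

var_a = grouped_sample_variance([3, 4, 6, 8], [2, 3, 4, 2])
var_b = grouped_sample_variance([5, 15, 25, 35, 45], [7, 6, 15, 12, 10])  # class marks
print(round(var_a, 2), round(sqrt(var_a), 2))
print(round(var_b, 2), round(sqrt(var_b), 2))
```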

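3. The mean and standard deviation of a set of 100 sample observations were worked out
as 40 and 5 respectively by a computer. Later, it was detected that the value 50 was
recorded in place of 40 for one observation. Find the correct variance and standard
deviation.
Solution
Treating the mean as 40, the recorded variance is
$S^2 = \frac{\sum_{i=1}^{99}(x_i - 40)^2 + (50 - 40)^2}{99} = 25$.
When the wrong value is replaced by x = 40, the new variance is
$S^2 = \frac{\sum_{i=1}^{99}(x_i - 40)^2 + (40 - 40)^2}{99} = 25 - \frac{(50 - 40)^2}{99} = 25 - \frac{100}{99} = 23.98$,
so the corrected standard deviation is $\sqrt{23.98} = 4.90$.
Strictly, the correction also lowers the mean to 39.9; recomputing with the corrected
mean gives 2374/99 = 23.98, the same value to two decimal places.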
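One way to check this kind of correction, sketched below, is to recover the sums Σx and Σx² from the reported mean and standard deviation, swap the wrong value for the right one, and recompute. Unlike the shortcut above, this sketch also updates the mean; the variable names are illustrative only.

```python
from math import sqrt

n, mean, sd = 100, 40, 5
wrong, right = 50, 40

sum_x = n * mean                             # 4000
sum_x2 = (n - 1) * sd**2 + n * mean**2       # from S^2 = (sum x^2 - n*mean^2)/(n - 1)

# Replace the misrecorded value in both sums
sum_x += right - wrong
sum_x2 += right**2 - wrong**2

new_mean = sum_x / n
new_var = (sum_x2 - n * new_mean**2) / (n - 1)
print(new_mean, round(new_var, 2), round(sqrt(new_var), 2))
```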

4. The mean of five observations is 4.4 and the variance is 8.24. If three of the five
observations are 1, 2 and 6, find the other two.
(Hint: $4.4 = \frac{1 + 2 + 6 + x_1 + x_2}{5}$ and
$8.24 = \frac{(1-4.4)^2 + (2-4.4)^2 + (6-4.4)^2 + (x_1-4.4)^2 + (x_2-4.4)^2}{5}$; the answers are 4 and 9.
Here the variance is taken with divisor n = 5, which reproduces the stated answers.)
5. Based on the following information, find the sample variance and standard deviation:
$\sum (x - 7) = 7$, $\sum (x - 7)^2 = 535$ and $n = 15$. (Answer: $s^2 = 37.98$ and $s = 6.16$)

Algebraic Properties of Variance and Standard Deviation

• If a constant value is added to or subtracted from each observation, the variance and
standard deviation remain the same.
• If each value in a data set is multiplied (divided) by a constant k, then the new
variance and standard deviation are obtained by multiplying (dividing) the original
variance and standard deviation by $k^2$ and $|k|$ respectively.
• $\frac{\sum (x_i - \bar{x})^2}{n-1} \le \frac{\sum (x_i - A)^2}{n-1}$ for any arbitrary value A.
• The combined population variance of k individual groups is given by:
$\sigma^2_{12 \ldots k} = \frac{N_1(\sigma_1^2 + d_1^2) + N_2(\sigma_2^2 + d_2^2) + \cdots + N_k(\sigma_k^2 + d_k^2)}{N_1 + N_2 + \cdots + N_k}$
where $d_j = \mu_j - \mu$ and $\mu = \frac{N_1\mu_1 + N_2\mu_2 + \cdots + N_k\mu_k}{N_1 + N_2 + \cdots + N_k}$.
• The combined sample variance of k individual distributions is given by:
$s^2_{12 \ldots k} = \frac{(n_1 - 1)(s_1^2 + d_1^2) + (n_2 - 1)(s_2^2 + d_2^2) + \cdots + (n_k - 1)(s_k^2 + d_k^2)}{n_1 + n_2 + \cdots + n_k - k}$
where $d_j = \bar{x}_j - \bar{x}$ and $\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k}$.
• In particular, when k = 2:
$s^2_{12} = \frac{(n_1 - 1)(s_1^2 + d_1^2) + (n_2 - 1)(s_2^2 + d_2^2)}{n_1 + n_2 - 2}$
where $d_1 = \bar{X}_1 - \bar{X}$, $d_2 = \bar{X}_2 - \bar{X}$ and $\bar{X} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2}$.

Example
The mean weight of 150 students is 60 kg. The mean weight of boys is 70 kg with a
standard deviation of 10 kg. For the girls, the mean weight is 55 kg and the standard
deviation is 15 kg. Find the number of boys and the combined standard deviation.
(Hint: $60 \times 150 = n_b \times 70 + (150 - n_b) \times 55$ and
$s^2_{12} = \frac{(49)(100 + 100) + (99)(225 + 25)}{150 - 2}$; the answer: $n_b = 50$ and $s = 15.28$ kg.)
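The hint can be followed step by step in code. The sketch below applies the combined sample variance formula exactly as stated in these notes, with divisor n1 + ... + nk – k; the function name `combined_sd` is hypothetical.

```python
from math import sqrt

def combined_sd(groups):
    """groups: list of (n_j, mean_j, sd_j); uses the k-group formula stated in the notes."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    num = sum((n - 1) * (s**2 + (m - grand_mean) ** 2) for n, m, s in groups)
    return sqrt(num / (n_total - len(groups)))

# Number of boys from 60*150 = 70*nb + 55*(150 - nb)
n_boys = (60 * 150 - 55 * 150) // (70 - 55)
s = combined_sd([(n_boys, 70, 10), (150 - n_boys, 55, 15)])
print(n_boys, round(s, 2))  # 50 15.28
```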


The Interquartile Range (IQR) and Quartile Deviation (QD)
The IQR is the distance (difference) between the third (Q3) and first (Q1) quartiles. The
interquartile range is preferable to the range because it is less prone to distortion by
a single large or small value; that is, outliers in the data do not affect the
interquartile range. Also, it can be computed when the distribution has open-ended classes.
It is also possible to calculate the quartile deviation (QD), known as the
semi-interquartile range, as (Q3 – Q1)/2. Both IQR and QD are location-based measures of
dispersion and are computed without considering all observations; for this reason they
are called crude measures.
Example: Given the following data set (age of patients in years): 18, 59, 24, 42, 21, 23,
24, 32, find the interquartile range and quartile deviation.
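A possible check for this example uses `statistics.quantiles` with `method="exclusive"`, which places the p-th quantile at position p(n + 1) in the sorted data. That positional rule is an assumption about the convention intended here; it does reproduce the quartiles 169 and 180.5 quoted for the height data in the coefficient of quartile deviation example of these notes.

```python
from statistics import quantiles

ages = [18, 59, 24, 42, 21, 23, 24, 32]
# method="exclusive" places the p-th quantile at position p*(n + 1) in the sorted data
q1, q2, q3 = quantiles(ages, n=4, method="exclusive")
iqr = q3 - q1
qd = iqr / 2
print(q1, q3, iqr, qd)  # 21.5 39.5 18.0 9.0
```

Other quartile conventions exist and can give slightly different values for small samples.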

4.1.2 Measures of Relative Variations

These measures are used to compare the variation of different data sets that are not
measured in the same standards or units. The measures of relative variation include the
coefficient of range, coefficient of quartile deviation, coefficient of mean deviation,
coefficient of variation and standard scores.
Coefficient of range
The coefficient of range = $\frac{\text{Largest value in the series} - \text{Smallest value in the series}}{\text{Largest value in the series} + \text{Smallest value in the series}}$
Example
Compare the dispersion in the following distribution of scores on two examinations using
coefficient of range.
Marks in History: 20 40 30 35 25 45 24 35 10 29
Marks in Biology: 30 36 24 44 20 38 25 30 20 22
Solution
For History: CR = (45 – 10)/(45 + 10) = 0.64 and for Biology: CR = (44 – 20)/(44 + 20) = 0.38.
Therefore, the marks in Biology are less variable, since their coefficient of range is
smaller than that of History.

The Coefficient of Quartile Deviation

The Coefficient of Quartile Deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$, where $Q_1$ and $Q_3$ are the 1st and 3rd
quartiles.

Example
Consider the following hypothetical distribution on height and weight of individuals.
Height (cm): 152 166 174 181 175 172 190 180 178
Weight (kg): 45 52 54 66 62 59 84 90 70
Using coefficient of quartile deviation check whether the height or weight data is more
variable.

Solution: For height: CQ = (180.5 – 169)/(180.5 + 169) = 0.033 and for weight: CQ = (77 – 53)/
(77 + 53) = 0.18. This shows that weight is more variable than height.

The Coefficient of Mean Deviation

The Coefficient of Mean Deviation (CMD) = $\frac{\text{Mean deviation}}{\text{Average about which it is calculated}}$
CMD from arithmetic mean = $\frac{\text{Mean deviation from arithmetic mean}}{\text{Arithmetic mean}}$
CMD from median = $\frac{\text{Mean deviation from median}}{\text{Median}}$
CMD from mode = $\frac{\text{Mean deviation from mode}}{\text{Mode}}$
Self check exercise
Compute the coefficient of mean deviation from mode for the following series and state,
which one is more variable.
Brand A: 3 4 2 6 4 7 1 4 3
Brand B: 4 9 7 6 8 7 6 7 8

The Coefficient of Variation (CV)


Whenever two or more distributions have the same units of measurement or the same
standards, their variation can be compared directly using the standard deviation.
However, when the units of measurement and/or the standards are different, the standard
deviation cannot be used to compare their dispersion. Instead, we compare the variation
of such distributions using the coefficient of variation.
The coefficient of variation, or coefficient of standard deviation, is the relative measure
of variation corresponding to the standard deviation. It is a pure number independent of
the units of measurement and thus is suitable for comparing the variability, homogeneity
or uniformity of two or more distributions.
The coefficient of variation is also useful for comparing the variability of two or
more distributions measured in the same units but with unequal means.
The formulae for the coefficient of variation for data obtained from a population and a
sample are, respectively,
$CV = \frac{\sigma}{\mu} \times 100\%$ and $CV = \frac{s}{\bar{x}} \times 100\%$.

Exercise
a. A sample of 5 items was taken from the output of a factory. The length and weight of
them are given below.
Length (in inches): 5 6 7 9 12
Weight (in ounces): 13 15 18 19 20
Which of the two characteristics is more variable? Why? (Find the variance of length
and weight)
b. The average IQ of students in one calculus class is 110, with a standard deviation of 5.
The average IQ of students in another class is 106, with a standard deviation of 4. Which
class is more uniform in terms of IQ?
4.2 The Standard scores (Z – score)
Scores are generally meaningless by themselves unless they are compared with the
distribution of scores from some reference group. The Z-score gives the relative standing
of a value within its distribution: it is the number of standard deviations that a given
value X lies below or above the mean of the data. It is defined as
$Z = \frac{X_i - \bar{X}}{S}$ (for sample data) and $Z = \frac{X_i - \mu}{\sigma}$ (for population data).
Properties of Z-Score
• The mean of the standard scores is zero and their standard deviation is 1, no matter
what the original scale was.
• The standard score tells us the number of standard deviations the original
measurement is from the mean. It is a unit-free measure, which is useful for comparing
values relative to the means of two or more distributions.
• If all the data for a variable are transformed into Z-scores, the resulting
distribution has mean 0 and standard deviation 1. The transformation does not change
the shape of the distribution; only if the original data are normally distributed do
the Z-scores follow the standard normal distribution.
Example
A student obtained 80 on a civics exam that had a mean of 70 and a standard deviation of
10. The student obtained 60 on calculus exam, which had a mean of 51 and a variance of
64. On which exams did the student perform better? Why?
Solution
Zci = (80 – 70)/10 = 1 and Zca = (60 – 51)/8 = 9/8 = 1.125, where 8 is the standard
deviation of the calculus marks (the square root of 64). Therefore, the student performed
better in calculus, as its Z-score is greater.
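The comparison in this solution can be written as a few lines of Python. The function name `z_score` is illustrative; the only subtlety is that the calculus exam is described by its variance, so the square root must be taken first.

```python
from math import sqrt

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_civics = z_score(80, 70, 10)
z_calculus = z_score(60, 51, sqrt(64))   # a variance of 64 gives sd = 8
print(z_civics, z_calculus)  # 1.0 1.125
```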
Note that in all the above comparisons of the variability of two or more distributions,
the greater the value of the measure of variation, the more variable the data, and the
smaller the value of the measure of variation, the less variable the data as compared to
the other(s).
4.3 Moments, Skewness and Kurtosis
These are functions of numerical observations that show the degree of departure from
symmetry and the peakedness of frequency distribution.
Moments
Moments are used to measure the skewness and kurtosis.

If X is a variable that assumes the values $x_1, x_2, \ldots, x_n$, then the kth raw moment about an
arbitrary origin A is:
• For population data: $\mu'_k = \frac{\sum (X_i - A)^k}{N}$, k = 0, 1, 2, 3, … and
$\mu'_k = \frac{\sum f_i (X_i - A)^k}{N}$ for grouped data.
• For sample data: $m'_k = \frac{\sum (X_i - A)^k}{n}$, k = 0, 1, 2, … and
$m'_k = \frac{\sum f_i (X_i - A)^k}{n}$ for grouped data.
The kth central moment (centered about the arithmetic mean) is defined as:
• For population data: $\mu_k = \frac{\sum (X_i - \mu)^k}{N}$, k = 0, 1, 2, 3, … and
$\mu_k = \frac{\sum f_i (X_i - \mu)^k}{N}$ for grouped data.
• For sample data: $m_k = \frac{\sum (X_i - \bar{x})^k}{n}$, k = 0, 1, 2, … and
$m_k = \frac{\sum f_i (X_i - \bar{x})^k}{n}$ for grouped data. Here $X_i$ is the class midpoint in
the case of continuous grouped data.
Remark
• The first central moment is zero.
• The second central moment is the variance computed with divisor N (or n for sample data).
• For a symmetric distribution, all odd central moments are zero.
Example
Calculate the first three raw moments about 0 and the first three moments centered about
the arithmetic mean for the following sample data.
1, 2, 3, 3, 4, 5, 6, 8
(Answer: $m'_1 = 4$, $m'_2 = 20.5$, $m'_3 = 122.5$, $m_1 = 0$, $m_2 = 4.5$, $m_3 = 4.5$)
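The answers to this example can be verified with a direct implementation of the sample moment definitions (divisor n). The helper names `raw_moment` and `central_moment` are hypothetical.

```python
def raw_moment(data, k, origin=0):
    """k-th sample moment about `origin` (divisor n)."""
    return sum((x - origin) ** k for x in data) / len(data)

def central_moment(data, k):
    """k-th sample moment about the arithmetic mean."""
    mean = raw_moment(data, 1)
    return raw_moment(data, k, origin=mean)

sample = [1, 2, 3, 3, 4, 5, 6, 8]
raw = [raw_moment(sample, k) for k in (1, 2, 3)]
central = [central_moment(sample, k) for k in (1, 2, 3)]
print(raw)      # [4.0, 20.5, 122.5]
print(central)  # [0.0, 4.5, 4.5]
```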

Relationship between raw moments and central moments:
• $\mu_2 = \mu'_2 - (\mu'_1)^2$
• $\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3$
• $\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4$
In particular, $\mu = \mu'_1 + A$ and $\sigma^2 = \mu'_2 - (\mu'_1)^2$, where A is the arbitrary origin about which the
raw moments are calculated.
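These conversion formulas can be sketched as a single function. For illustration, the sketch uses raw moments 2, 20, 40 and 50 taken about the origin 5, the figures that appear in an exercise at the end of this chapter; the function name `central_from_raw` is hypothetical.

```python
def central_from_raw(r1, r2, r3, r4, origin):
    """Mean and central moments 2-4 from raw moments taken about `origin`."""
    mean = origin + r1
    mu2 = r2 - r1**2
    mu3 = r3 - 3 * r2 * r1 + 2 * r1**3
    mu4 = r4 - 4 * r3 * r1 + 6 * r2 * r1**2 - 3 * r1**4
    return mean, mu2, mu3, mu4

mean, mu2, mu3, mu4 = central_from_raw(2, 20, 40, 50, origin=5)
print(mean, mu2, mu3, mu4)            # 7 16 -64 162
print(mu3 / mu2**1.5, mu4 / mu2**2)   # -1.0 0.6328125
```

The last line forms the moment-based skewness and kurtosis ratios from the central moments.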
Exercise
The first two moments of a distribution about 1 are 2 and 25 respectively. Find the mean
and standard deviation of the distribution.

Skewness
Skewness is a measure of the departure from symmetry of a distribution having a single
mode. The frequency distribution of a set of observations is called symmetrical about the
mean if the frequencies above the mean mirror those below the mean.
Alternatively, a distribution is said to be symmetrical if the observations are arranged
symmetrically around the mean, median and mode. Such a distribution has no skewness.
When a distribution is not symmetrical, it is called skewed. When the mean is greater
than the median and the mode, the distribution is positively skewed; when the mean
is less than the median and the mode, the distribution is negatively skewed.

Measures of Skewness
1) Karl Pearson’s Coefficient of Skewness (SK): Karl Pearson (1895) first
suggested measuring skewness by standardizing the difference between the mean and
the mode, that is, $S_K = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$,
or, from the empirical relationship, $S_K = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$.

Properties of Skewness
• If SK = 0, then the distribution is symmetrical.
• If SK > 0, then the distribution is positively skewed.
• If SK < 0, then the distribution is negatively skewed.
• There is no theoretical limit to this measure; however, in practice the value
given by this formula falls between -3 and 3.
Example
The following facts were gathered before and after an industrial dispute. Compare the
position before and after the dispute in respect of skewness.
                     Before dispute   After dispute
Number of workers:   515              509
Mean wages:          $49.50           $52.75
Median wages:        $52.80           $50.00
Variance of wages:   121.00           144.00
Answer: Skewness before = 3(49.50 – 52.80)/11 = –0.90 (negatively skewed) and
skewness after = 3(52.75 – 50.00)/12 = 0.69 (positively skewed).
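As a check on this example, the sketch below evaluates the empirical form of Pearson's coefficient, 3(mean – median)/standard deviation, directly on the tabulated figures. The function name `pearson_skewness` is hypothetical, and the standard deviation is taken as the square root of the stated variance.

```python
from math import sqrt

def pearson_skewness(mean, median, variance):
    """Pearson's coefficient from the empirical form 3(mean - median)/sd."""
    return 3 * (mean - median) / sqrt(variance)

before = pearson_skewness(49.50, 52.80, 121.00)
after = pearson_skewness(52.75, 50.00, 144.00)
print(round(before, 2), round(after, 2))
```

A negative value before and a positive value after indicate that the direction of skewness of the wage distribution reversed.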

Exercise
The arithmetic mean, the median and coefficient of variations for a distribution are 30,
33 and 40% respectively. Find the coefficient of skewness.
2) Bowley’s Coefficient of Skewness (Sb): it is based on the relative positions of
the median and the two quartiles.
$S_b = \frac{(Q_3 - \text{median}) - (\text{median} - Q_1)}{(Q_3 - \text{median}) + (\text{median} - Q_1)} = \frac{Q_3 + Q_1 - 2\,\text{median}}{Q_3 - Q_1}$
N.B.: i. For a symmetric distribution, twice the median equals the sum of the two
quartiles and hence Sb is zero. If Sb < 0, then the distribution is negatively (left)
skewed, and if Sb > 0, then the distribution is positively (right) skewed.
ii. Bowley’s measure of skewness is recommended when the mode is ill-defined and/or
the distribution has open-ended classes or unequal class intervals.
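A short sketch of Bowley's coefficient follows, alongside the coefficient of quartile deviation. The quartile values are those implied by an exercise later in this chapter (sum of quartiles 110, difference 26, median equal to twice the difference); the function names are hypothetical.

```python
def bowley_skewness(q1, median, q3):
    """(Q3 + Q1 - 2*median) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)

def coefficient_of_quartile_deviation(q1, q3):
    """(Q3 - Q1) / (Q3 + Q1)."""
    return (q3 - q1) / (q3 + q1)

# Q1 + Q3 = 110 and Q3 - Q1 = 26 give Q1 = 42, Q3 = 68; median = 2 * 26 = 52
q1, q3, median = 42, 68, 52
print(round(bowley_skewness(q1, median, q3), 2))            # 0.23
print(round(coefficient_of_quartile_deviation(q1, q3), 3))  # 0.236
```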
3) Measure of Skewness Based on Moments (Sm)
• $S_m = \frac{\mu_3}{(\mu_2)^{3/2}}$ or $S_m = \frac{\mu_3}{\sigma^3}$ … for population data.
• $S_m = \frac{m_3}{(m_2)^{3/2}}$ or $S_m = \frac{m_3}{\sqrt{(m_2)^3}}$ … for sample data.
The interpretation of Sm is the same as for the two measures above.
Exercise
Find the coefficient of skewness based on the following summary statistics that are
obtained from a certain distribution.
a. Q1 = 8, Q3 = 20 and Q2 = 11

b. The first three central moments of a distribution are -1.5, 17 and -30
respectively
Kurtosis
Kurtosis is a measure of whether the data are peaked or flat relative to a normal
distribution. That is, data sets with high kurtosis tend to have a distinct peak near the
mean, decline rather rapidly and have heavy tails. Data sets with low kurtosis tend to
have a flat top near the mean rather than a sharp peak. A uniform distribution would be
the extreme case. The normal curve is mesokurtic. When the curve of a distribution is
relatively flatter than normal it is known as platykurtic and when the distribution is more
peaked than normal, it is called leptokurtic.
A measure of kurtosis (K) is defined in terms of the second and fourth central moments.
• $K = \frac{\mu_4}{(\mu_2)^2}$ … for population data.
• $K = \frac{m_4}{(m_2)^2}$ … for sample data.
Interpretation of the value of K
1. If K = 3, then the distribution is mesokurtic.
2. If K > 3, then the distribution is leptokurtic.
3. If K < 3, then the distribution is platykurtic.
If we want our reference point to be zero, we can change the above coefficient to:
• $\varphi = \frac{\mu_4}{(\mu_2)^2} - 3$ … for population data.
• $\varphi = \frac{m_4}{(m_2)^2} - 3$ … for sample data.
Accordingly, if φ = 0, then the distribution is said to be mesokurtic.
If φ > 0, then the distribution is said to be leptokurtic.
If φ < 0, then the distribution is said to be platykurtic.
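To illustrate the interpretation rules, the sketch below computes the moment-based skewness and kurtosis for the small sample 1, 2, 3, 3, 4, 5, 6, 8 used in the moments example earlier in this chapter. The helper `central_moment` is a hypothetical name and uses the divisor n.

```python
def central_moment(data, k):
    """k-th sample moment about the arithmetic mean (divisor n)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

sample = [1, 2, 3, 3, 4, 5, 6, 8]
m2, m3, m4 = (central_moment(sample, k) for k in (2, 3, 4))
skewness = m3 / m2**1.5
kurtosis = m4 / m2**2
excess = kurtosis - 3
shape = "leptokurtic" if excess > 0 else "platykurtic" if excess < 0 else "mesokurtic"
print(round(skewness, 3), round(kurtosis, 3), shape)
```

For this sample K is below 3, so the classification comes out as platykurtic, with a mild positive skew.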

Exercise
Determine the type of distribution based on kurtosis for the datasets given below.
A. X: 3 1 2 3 5
f: 2 1 4 2 3 (Answer: K=0.42 which is platykurtic).
B. The first four moments of a distribution about the value 5 of the variable are 2, 20, 40
and 50. Find the mean, variance, Sm and K. Comment on the nature of the
distribution. (Answer: $\mu = 7$, $\sigma^2 = 16$, $S_m = -1$ and K = 0.63)
C. Find out the lower and upper quartiles, the coefficient of quartile deviation (CQD) and
the coefficient of skewness from the following information. Sum of the two quartiles
= 110, Difference of the two quartiles = 26 and Median = Double the difference of the
upper and lower quartiles. (Answer: CQD = 0.236, Q1 = 42, Q3 = 68 and Sk = 0.23)
Review Exercises
1. Calculate geometric mean for 9 and 16
2. Suppose that the arithmetic mean and geometric mean of two observations are 25 and
20, respectively. Then find the harmonic mean.
3. If for a certain distribution the coefficient of variation = 20% and the mean = 10,
calculate its standard deviation and variance.
4. Find the value of the first quartile of: 2, 4, 6, 8, 10, 12, and 14.
5. Find quantiles for: 5, 7, 7, 8, 10, 11, 12, 15, and 17.
6. Suppose that you obtained: 16, 17, 18, 19, and 20 on your last five quizzes. If you get
20, 20 and 20 on the next three quizzes, which of the following would change?
I. Q1 II. Q2 III. Q3 IV. Minimum V. maximum
7. Calculate quartile deviation and co-efficient of quartile deviation if Q1=20 and Q3 =40
8. Consider the monthly salaries of 5 individuals: 2000, 2500, 2500, 2800, 4000
• calculate the mean salary,
• find the variance and standard deviation of the salary.

9. Alemu is applying to engineering this year and wants to calculate his average.
The following is a summary of his marks and the weight factors the university
uses.
Subject                          Mark   Weight factor
Discrete Math (requirement)      90%    3
Calculus (requirement)           89%    3
Physics (requirement)            88%    2
Data Management (requirement)    85%    3
English (requirement)            70%    4
Philosophy (elective)            94%    1
a) What is the unweighted mean? b) What is the weighted mean?
c) Is there a difference between the values obtained in (a) and (b)?
10. Five households have total annual incomes of £10,000, £24,500, £15,000, £21,500, and
£13,000 (i) calculate the median level of income (ii) Calculate the median if there was an
additional observation i.e., £9,000.
11. Data on the age of employees of a certain factory are given below.
Age group:            15-19 20-24 25-29 30-34 35-39 40-44 45-49 50-54 55-59 60-64
Number of employees:  17    22    27    32    37    42    47    52    57    62
A. Calculate the sample variance and standard deviation.
B. Calculate Pearson's coefficient of skewness.
12. The following summary statistics are obtained from two groups of datasets:
A: x̄ = 34.8, s = 9.8, CV(A) = 28.2%
B: x̄ = 20.6, s = 9.8, CV(B) = 47.6%
Which one has the wider dispersion?

13. The arithmetic mean and standard deviation of a series of 20 items were computed as
20 and 5 respectively. While calculating these, an item 13 was misread as 30. Find the
correct mean and standard deviation.

14. Consider the following grouped frequency distribution and find: (i) the median (ii)
the mode (iii) the variance (iv) coefficient of variation (v) coefficient of skewness (vi)
coefficient of kurtosis.
Class (cm):   11-13   14-16   17-19   20-22   23-25
Frequency:    11      20      30      15      4

Chapter 5: Elementary Probabilities

5.1 Introduction
The subject of probability can be traced back to the 17th century, when it arose out of the
study of gambling games. As we will see, the range of applications extends beyond games
into business decisions, insurance, law, medical tests, investments, weather forecasting
and the social sciences. The telephone network, call centers, and airline companies with
their randomly fluctuating loads could not have been economically designed without
probability theory. As a general concept, probability can be defined as the
chance of an event occurring.
“Probability is basically common sense reduced to calculation; it makes us appreciate
with exactitude what reasonable minds feel by a sort of instinct.” So said Laplace. In the
modern scientific and technological world, it is even more important to understand
probabilistic arguments. This module introduces the tools needed for probability and goes
on to use them in simple situations related to repeated experiments with applications to
quality control.
Probability axioms and simple properties

The real-valued function P(.) is a probability measure if it acts on subsets of the sample
space S and obeys the following axioms:
I. P(S) = 1.
II. If A ⊆ S then P(A) ≥ 0.
III. If A and B are disjoint (A ∩ B = ∅), then P(A ∪ B) = P(A) + P(B).
Repeated use of Axiom III gives the more general result that if A1, A2, …, An are mutually
disjoint, then $P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i)$. Indeed, we will assume further that this result
holds even if we have a countably infinite collection of disjoint events (n = ∞).
These axioms are all we need to develop a theory of probability, but there is a collection
of commonly used properties which follow directly from these axioms, and which we
make extensive use of when carrying out probability calculations.
Property A: P(Ac) = 1 - P(A).
Property B: P(∅ ) = 0.
Property C: If A⊆ B, then P(A) ≤P(B).
Property D: Addition Law, P(A∪B) = P(A)+ P(B) - P(A∩B).
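Properties A to D can be checked numerically on a small example. The sketch below assumes a single fair die as the experiment (an illustrative choice, not taken from the text) and uses exact fractions so the equalities hold without rounding error.

```python
from fractions import Fraction

S = frozenset(range(1, 7))            # sample space of one fair die (assumed example)

def P(event):
    """Classical probability: equally likely outcomes."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}                          # even number
B = {4, 5, 6}                          # number greater than 3

assert P(S - A) == 1 - P(A)                    # Property A
assert P(set()) == 0                           # Property B
assert P({6}) <= P(A)                          # Property C, since {6} is a subset of A
assert P(A | B) == P(A) + P(B) - P(A & B)      # Property D
print(P(A | B))  # 2/3
```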
5.2 Fundamental Counting Principle
5.2.1 Factorials
If n is a positive integer, then: n! = n (n-1)! Or n! = n (n-1) (n-2) ... (3)(2)(1) and a special
case is 0! = 1.
5.2.2 Permutations
Suppose that we have a collection of n objects and we want to select r of them and
arrange the selected objects in order. Each such ordered arrangement is called a
permutation of r out of n objects.
Therefore, a permutation is an arrangement of objects without repetition where order is
important, using either all n objects or r of the n objects.
The numbers of such arrangements are $P^n_n = n!$ and $P^n_r = \frac{n!}{(n-r)!}$, respectively.

Example 1: Find all permutations of the letters "ABC"
Solutions: ABC, ACB, BAC, BCA, CAB, CBA
Example 2: Find all two-letter permutations of the letters "ABC"
Solutions: AB, AC, BA, BC, CA, CB
Shortcut formula for finding a permutation, assuming that you start at n and count down
towards 1 in your factorials: P(n, r) = product of the first r factors of n!.
If a word has N letters, k of which are distinct, and n1, n2, n3, ..., nk are the
frequencies of the k distinct letters, then the total number of distinguishable
permutations is given by: $\frac{N!}{n_1!\, n_2! \cdots n_k!}$
Example 3: Consider the word "STATISTICS". Find the number of distinguishable permutations of its letters.
Solution: The frequency of each letter is S = 3, T = 3, A = 1, I = 2, C = 1; there are 10 letters
in total and 10! = 10*9*8*7*6*5*4*3*2*1. Therefore, Permutations = $\frac{10!}{3!\,3!\,1!\,2!\,1!} = 50400$.
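The three permutation examples can be reproduced with Python's standard library. `itertools.permutations` lists the arrangements and `math.perm` counts them; the function `distinguishable_permutations` is a hypothetical helper implementing N!/(n1! n2! ... nk!).

```python
from collections import Counter
from itertools import permutations
from math import factorial, perm

print(["".join(p) for p in permutations("ABC")])      # all 3! = 6 orderings
print(["".join(p) for p in permutations("ABC", 2)])   # two-letter permutations

def distinguishable_permutations(word):
    """N! / (n1! n2! ... nk!) for the letter frequencies of `word`."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)
    return total

print(perm(3, 3), perm(3, 2), distinguishable_permutations("STATISTICS"))  # 6 6 50400
```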
5.2.3 Combinations
Suppose that we have a collection of objects and that we wish to make r selections from
this list of objects where the order does not matter. An unordered selection such as this is
referred to as a combination.
Note: The difference between a permutation and a combination is not whether there is
repetition or not, there must not be repetition with either, and if there is repetition, you
cannot use the formulas for permutations or combinations. The only difference in the
definition of a permutation and a combination is whether order is important.
The number of combinations of n objects taken r at a time, without repetition and
without regard to order, is: $C^n_r = \frac{n!}{r!\,(n-r)!}$.
Example 1: Find all two-letter combinations of the letters "ABC"

Solution: AB, AC, BC (since AB = BA, AC = CA and BC = CB, each pair is counted only once).
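The same example in Python: `itertools.combinations` lists unordered selections and `math.comb` counts them. The last line is a small check that each combination of size r corresponds to r! permutations.

```python
from itertools import combinations
from math import comb, perm

pairs = ["".join(c) for c in combinations("ABC", 2)]
print(pairs, comb(3, 2))          # ['AB', 'AC', 'BC'] 3
# Each combination of size r corresponds to r! permutations.
print(perm(3, 2) // comb(3, 2))   # 2
```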

5.3 Definitions of some Probability Terms


Experiment: In statistics, anything that results in a count or a measurement is called an
experiment.
Sample Space: the set of all possible outcomes of a given experiment. Consider the
experiment of flipping two coins. It is possible to get 0 heads, 1 head, or 2 heads. Thus,
the sample space could be {0, 1, 2}. Another way to write it is {HH, HT, TH, TT}. The
second way of representing the sample space is better in this case because
each outcome is as likely to occur as any other. When writing the sample space, it is
highly desirable to use outcomes that are equally likely.
Another example is rolling two dice. The possible sums are {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.
However, these sums are not equally likely. The only way to get a sum of 2 is to roll a 1 on
both dice, but you can get a sum of 4 by rolling 1&3, 2&2, or 3&1.
Event: a subset of the sample space, that is, a set consisting of one or more outcomes of a probability experiment. If A is an event in the sample space S and the outcome of an experiment is an element of A, we say that the event A has occurred.
Simple Event: an event consisting of a single point of S, that is, a single outcome. It is also called an elementary event. Throwing a die and getting a ‘6’, or flipping a coin and getting a ‘Head’, are examples of simple events.
Sure (Certain) Event: an event E is said to be a sure (certain) event if E = S, the sample space.

Impossible Event: an event E is called an impossible event if E = ∅ (the subset containing none of the points in the sample space).

Mutually Exclusive Events: Two events are mutually exclusive if they cannot occur at the same time; that is, mutually exclusive events are disjoint. Thus, if two events A and B are disjoint, the probability of both occurring at the same time is 0, i.e. P(A ∩ B) = 0. And if two events are mutually exclusive, then the probability of either occurring is the sum of the probabilities of each occurring.
Specific Addition Rule: valid only when the events are mutually exclusive.
P(A ∪ B) = P(A) + P(B).
Example: If P(A) = 0.20, P(B) = 0.70, and A and B are disjoint, then the probability of either A or B follows from the general addition rule with P(A ∩ B) = 0:
P(A ∪ B) = P(A) + P(B) - P(A ∩ B) = 0.20 + 0.70 - 0 = 0.90.
It is preferred to use what is called a joint probability distribution. (Since disjoint means
nothing in common, joint is what they have in common).
5.3.1 Conditional Probability
Sometimes the chance that a particular event happens depends on the outcome of some other event. Let A be an event with non-zero probability, and let B be any event. The conditional probability of B given A is written P(B|A), which is read as “the probability that event B occurs given that event A has already occurred.” It is defined as P(B|A) = P(A ∩ B) / P(A). Thus, the probability of an event occurring given that another event has already occurred is called a conditional probability.
Example
Consider the data given in the following table and answer the questions below.
            Do you smoke?
Sex       Yes    No    Total
Male      19     41    60
Female    12     28    40
Total     31     69    100

• What is the probability that a randomly selected individual is a male smoker? This is just a joint probability: the number of "Male and Smoke" divided by the total = 19/100 = 0.19.
• What is the probability that a randomly selected individual is a male? This is the total number of males divided by the total number of individuals = 60/100 = 0.60.
• What is the probability that a randomly selected individual smokes? This is the ratio of the number of people who smoke to the total number of individuals = 31/100 = 0.31.
• What is the probability that a randomly selected male smokes? In this case, the selection is made from among males only. Since 19 of the 60 males smoke, the required probability is 19/60 = 0.31666...
• What is the probability that a randomly selected smoker is male? In this case, the selection is made from among the smokers only. Since 19 of the 31 smokers are male, the required probability is 19/31 = 0.6129.
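The joint, marginal, and conditional probabilities from the smoking table can be reproduced with exact fractions. This is a small sketch; the dictionary layout and the helper `prob` are illustrative choices and not part of the notes.

```python
from fractions import Fraction

# Cell counts from the table: (sex, answer to "Do you smoke?") -> count.
counts = {
    ("Male", "Yes"): 19, ("Male", "No"): 41,
    ("Female", "Yes"): 12, ("Female", "No"): 28,
}
total = sum(counts.values())


def prob(sex=None, smokes=None):
    """Probability of the cells matching the given sex and/or smoking answer."""
    matching = sum(
        c for (s, a), c in counts.items()
        if (sex is None or s == sex) and (smokes is None or a == smokes)
    )
    return Fraction(matching, total)


p_male_and_smoker = prob("Male", "Yes")                 # joint probability
p_male = prob(sex="Male")                               # marginal probability
p_smoker = prob(smokes="Yes")                           # marginal probability
p_smoker_given_male = p_male_and_smoker / p_male        # P(smoker | male)
p_male_given_smoker = p_male_and_smoker / p_smoker      # P(male | smoker)

print(p_male_and_smoker, p_male, p_smoker, p_smoker_given_male, p_male_given_smoker)
```

The last two values apply the definition P(B|A) = P(A ∩ B) / P(A) directly, which gives the same answers as restricting the count to males or to smokers.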
It often happens that the knowledge that a certain event E has occurred has no effect on the probability that some other event F has occurred, that is, P(F|E) = P(F). One would expect that in this case the equation P(E|F) = P(E) would also be true. If these equations are true, we say that F is independent of E.

Definition: Two events E and F are independent if both E and F have positive probability and if P(E|F) = P(E) and P(F|E) = P(F).

Note that: If P(E) > 0 and P(F) > 0, then E and F are independent if and only if

P(E ∩ F) = P(E) P(F).

Example 5.14: Suppose that we roll a pair of fair dice, so each of the 36 possible outcomes is equally likely. Let A denote the event that the first die lands on 3, let B be the event that the sum of the dice is 8, and let C be the event that the sum of the dice is 7.

A. Are A and B independent?



B. Are A and C independent?

Solution:

A. Since A ∩ B is the event that the first die lands on 3 and the second on 5, we see that

$$P(A \cap B) = P(\{(3,5)\}) = \frac{1}{36}.$$

On the other hand,

$$P(A) = P(\{(3,1), (3,2), (3,3), (3,4), (3,5), (3,6)\}) = \frac{6}{36}$$

and

$$P(B) = P(\{(2,6), (3,5), (4,4), (5,3), (6,2)\}) = \frac{5}{36}.$$

Therefore, since $\frac{1}{36} \neq \frac{6}{36} \cdot \frac{5}{36}$, we see that $P(A \cap B) \neq P(A)P(B)$, and so events A and B are not independent.

B. Events A and C are independent. This is seen by noting that

$$P(A \cap C) = P(\{(3,4)\}) = \frac{1}{36},$$

while

$$P(A) = \frac{1}{6} \quad\text{and}\quad P(C) = P(\{(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)\}) = \frac{6}{36}.$$

Therefore $P(A \cap C) = P(A)P(C)$, and so events A and C are independent.
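The independence check in this example can be verified by enumerating all 36 outcomes. The sketch below assumes, as the solution's list of outcomes indicates, that the second event is "the sum of the dice is 8"; the helper names `prob` and `independent` are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event given as a predicate on an outcome."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))


def first_is_three(o):
    return o[0] == 3


def sum_is_eight(o):
    return sum(o) == 8


def sum_is_seven(o):
    return sum(o) == 7


def independent(e, f):
    """Check the product rule P(E and F) = P(E) P(F)."""
    return prob(lambda o: e(o) and f(o)) == prob(e) * prob(f)


print(independent(first_is_three, sum_is_eight))
print(independent(first_is_three, sum_is_seven))
```

Using exact fractions avoids any rounding question when comparing the joint probability with the product of the marginals.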



Chapter 6: Probability Distributions

Definition
A random variable is a variable that assumes numerical values associated with the
random outcomes of an experiment, where one (and only one) numerical value is
assigned to each sample point. A random variable can be either discrete or continuous.
Example
Consider an experiment of counting the number of customers who use the drive-up
window of a bank each day. The random variable can be: “the number of customers”
and the possible values of this random variable range from 0 to the maximum number of
customers the window could possibly serve in a day.
Random variables that can assume a countable number of values are called discrete.
Example
1. The number of sales made by a salesperson in a given week: x = 0, 1, 2, . . .



2. The number of consumers in a sample of 500 who favor a particular product over all competitors: x = 0, 1, 2, . . . , 500
3. The number of bids received in a bond offering: x = 0, 1, 2, . . .
Random variables that can assume values corresponding to any of the points contained in
one or more intervals are called continuous.
Example

1. The length of time between arrivals at a supermarket: x ≥ 0

2. The weight of a food item bought in a supermarket: x ≥ 0


Remark
• A random variable does not mean that the values can be anything (a random number).
• Random variables have a well-defined set of outcomes and well-defined probabilities.
A probability distribution is a display of the probability P(x) for each value of a random variable x. Just like a random variable, a probability distribution can be either discrete or continuous.
The probability distribution of a discrete random variable is a graph, table, or formula
that specifies the probability associated with each possible value the random variable can
assume.
Two requirements must be satisfied by all probability distributions for discrete random variables, namely,
$$p(x) \ge 0 \ \text{for all values of } x, \qquad \sum_{\text{all } x} p(x) = 1.$$
The probability that a continuous random variable will be between limits a and b is given by an integral, or the area under a curve:
$$\Pr[a < X < b] = \int_a^b f(x)\,dx.$$
The function f(x) is called a probability density function.



Introduction to Expectation of Random Variable
Since a probability distribution for a random variable x is a model for a population
relative frequency distribution, we can describe it with numerical descriptive measures,
such as its mean and standard deviation.
The expected value (or mean) of a random variable x, denoted by the symbol E(x), is
defined as follows:
Let x be a discrete random variable with probability distribution p(x). Then the mean or expected value of x is
$$\mu = E(X) = \sum_{\text{all } x} x\,p(x).$$

The second important numerical characteristics of a random variable are its variance and standard deviation, which are defined as follows:
Let x be a discrete random variable with probability distribution p(x). Then the variance of x is
$$\sigma^2 = E[(X - \mu)^2] = \sum_{\text{all } x} (x - \mu)^2 p(x) = \sum_{\text{all } x} x^2 p(x) - \mu^2.$$

The standard deviation of x is the positive square root of the variance of x, i.e. $\sigma = \sqrt{\sigma^2}$.

Let X be a continuous random variable with density function f(x). Then the mean or the expected value of X is given by:
$$E(X) = \int_{-\infty}^{+\infty} x f(x)\,dx.$$

Let X be a continuous random variable with density function f(x) and let g(x) be a function of x. Then the mean or the expected value of g(X) is given by:
$$E[g(X)] = \int_{-\infty}^{+\infty} g(x) f(x)\,dx.$$

Let X be a continuous random variable with the expected value E(X) = μ. Then the variance of X is
$$\sigma^2 = E[(X - \mu)^2] = \int_{-\infty}^{+\infty} (x - \mu)^2 f(x)\,dx = \int_{-\infty}^{+\infty} x^2 f(x)\,dx - \mu^2.$$

The standard deviation of X is the positive square root of the variance, i.e. $\sigma = \sqrt{\sigma^2}$.

Example



1. A balanced coin is tossed twice, and the number X of heads is observed. Then,
find the probability distribution of X, the mean and variance of X.
2. A team of 3 is to be chosen from 3 boys and 4 girls. If X is the random variable
‘the number of girls in the team’, then find the probability distribution and hence
the mean and variance of X.
3. The pdf of the age, x years, of babies being brought to a post-natal clinic is given
by:
f(x) = ¾ x (2 – x), 0 < x < 2. Then, find the mean and variance of the random
variable X.

4. The percentage X of impurities per batch in a certain chemical product is a random variable with probability density function f(x) = 9 x (1 - x), 0 < x < 1. Then find the mean and SD of X.
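Examples 1 to 3 can be worked through numerically. The sketch below uses exact fractions for the two discrete cases and a simple midpoint rule for the continuous one; the helper names `mean_and_variance` and `integrate` are illustrative, and the midpoint rule is an assumed stand-in for evaluating the integrals by hand.

```python
from fractions import Fraction
from math import comb


def mean_and_variance(pmf):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    variance = sum(x**2 * p for x, p in pmf.items()) - mean**2
    return mean, variance


# Example 1: number of heads in two tosses of a balanced coin.
coin_pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Example 2: number of girls in a team of 3 chosen from 3 boys and 4 girls.
girls_pmf = {
    x: Fraction(comb(4, x) * comb(3, 3 - x), comb(7, 3)) for x in range(4)
}


def integrate(g, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    width = (b - a) / steps
    return sum(g(a + (i + 0.5) * width) for i in range(steps)) * width


# Example 3: f(x) = (3/4) x (2 - x) on 0 < x < 2.
def f(x):
    return 0.75 * x * (2 - x)


mean3 = integrate(lambda x: x * f(x), 0, 2)
var3 = integrate(lambda x: x**2 * f(x), 0, 2) - mean3**2

print(mean_and_variance(coin_pmf))
print(mean_and_variance(girls_pmf))
print(round(mean3, 4), round(var3, 4))
```

For the coin the mean is 1 and the variance 1/2; for the team the mean is 12/7 and the variance 24/49; for the density in Example 3 the mean is 1 and the variance 1/5.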

Common Discrete Probability Distributions


A. Binomial Distribution
Binomial Experiment: this is an experiment, which satisfies the following four
conditions.
- There are a fixed number of trials in the experiment.
- Each trial is independent of the other(s).
- There are only two outcomes: success or failure in a single trial.
- The probability of each outcome remains constant from trial to trial.
These can be summarized as an experiment with a fixed number of independent trials
each of which can only have two possible outcomes. The fact that each trial is
independent actually means that the probabilities remain constant.
Examples of Binomial Experiments
1. Tossing a coin 20 times to see how many tails occur.
2. Asking 200 people whether they watch ABC news.



3. Rolling a die to see if a 5 appears.
An example which is not a binomial experiment: rolling a die until a 6 appears
The binomial distribution is given by:
$$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \qquad x = 0, 1, \ldots, n$$
Where, p = probability of success on a single trial
q = 1 - p = probability of failure on a single trial
n = number of trials
x = number of successes in n trials
Example
A machine that produces stampings for automobile engines is malfunctioning and
producing 5% defectives. The defective and non-defective stampings proceed from the
machine in a random manner. If the next five stampings are tested, find the probability
that three of them are defective.
Solution
Let x equal the number of defectives in n = 5 trials. Then x is a binomial random variable with p, the probability that a single stamping will be defective, equal to 0.05, and q = 1 - 0.05 = 0.95. The probability that exactly three stampings are defective is:
$$P(X = 3) = \binom{5}{3} (0.05)^3 (1 - 0.05)^{5-3} = \frac{5!}{3!\,(5-3)!} (0.05)^3 (0.95)^2 = \frac{5 \times 4 \times 3 \times 2 \times 1}{(3 \times 2 \times 1)(2 \times 1)} (0.05)^3 (0.95)^2 = 10\,(0.000125)(0.9025) \approx 0.0011$$
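The binomial formula translates directly into a few lines of Python. This is a minimal sketch; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Stamping example: n = 5 stampings, p = 0.05 defective, exactly 3 defective.
p_three_defective = binomial_pmf(3, 5, 0.05)
print(round(p_three_defective, 6))

# The probabilities over x = 0, 1, ..., n add up to 1.
print(sum(binomial_pmf(x, 5, 0.05) for x in range(6)))
```

The first value is about 0.0011, so three defectives among the next five stampings would be a rare event at a 5% defect rate.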
B. Multinomial Distribution
Suppose that events A1, A2, …, Ak are mutually exclusive, and can occur with respective probabilities p1, p2, …, pk where p1 + p2 + … + pk = 1. If X1, X2, …, Xk are random variables giving the number of times that A1, A2, …, Ak occur respectively in a total of n trials, so that X1 + X2 + … + Xk = n, then
$$P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) = \frac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}, \qquad x_1 + x_2 + \cdots + x_k = n,$$
is known as the multinomial distribution. This distribution is a generalization of the binomial distribution since the equation above is the general term in the multinomial expansion of $(p_1 + p_2 + \cdots + p_k)^n$.
Note
• A multinomial probability distribution is an extension of the binomial distribution.
• The experiment consists of n identical trials.
• There are k possible outcomes to each trial.
• The probabilities of the k outcomes, denoted by p1, p2, . . . , pk, remain the same from trial to trial, where p1 + p2 + . . . + pk = 1.
• The trials are independent.
Example
The probability that a person will pass a College Algebra class is 0.55, the probability
that a person will withdraw before the class is completed is 0.40, and the probability that
a person will fail the class is 0.05. In a class of 30 students, find the probability that
exactly 16 pass, 12 withdraw, and 2 fail.
Outcome     X     p(outcome)
Pass        16    0.55
Withdraw    12    0.40
Fail        2     0.05
Total       30    1.00
Using the multinomial formula with n = 30:
$$p = \frac{30!}{16!\,12!\,2!}\,(0.55)^{16} (0.40)^{12} (0.05)^{2}$$
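The expression for the College Algebra example is tedious to evaluate by hand, so a short computation helps. This is a sketch only; `multinomial_pmf` is an illustrative name, and the counts and probabilities are those in the table.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Multinomial probability of observing the given counts for the given outcome probabilities."""
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)  # exact at every step: n! / (x1! x2! ... xk!)
    return coefficient * prod(p**c for p, c in zip(probs, counts))


# 16 pass, 12 withdraw, 2 fail in a class of 30.
p_class = multinomial_pmf([16, 12, 2], [0.55, 0.40, 0.05])
print(round(p_class, 4))
```

With two outcomes the same function reduces to the binomial formula, for example `multinomial_pmf([1, 1], [0.5, 0.5])` gives 0.5.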



C. Poisson Distribution
A type of probability distribution that is often useful in describing the number of events
that will occur in a specific period of time or in a specific area or volume is the Poisson
distribution.
Typical examples of random variables for which the Poisson probability distribution
provides a good model are:
- the number of industrial accidents per month at a manufacturing plant,
- the number of death claims received per day by an insurance company,
- the number of customer arrivals per unit of time at a supermarket checkout
counter.
The Poisson distribution is given by:
$$P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \qquad x = 0, 1, 2, 3, \ldots$$

Where, λ is the mean number of occurrences.


Example
If there are 500 customers per eight-hour day in a checkout line, what is the probability
that there will be exactly 3 in line during any five-minute period?
Solution
There are 96 five-minute periods in eight hours. Therefore, the expected number of customers during any five-minute period is λ = 500/96 = 5.2083333, i.e., you expect about 5.2 customers in 5 minutes. Thus, the probability of getting exactly 3 customers in line is:
$$P(3;\, 500/96) = \frac{e^{-500/96}\,(500/96)^{3}}{3!} = 0.1288$$
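The checkout-line calculation can be reproduced as follows. This is a minimal sketch with an illustrative function name, `poisson_pmf`.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)


# 500 customers per eight-hour day, i.e. 96 five-minute periods.
lam = 500 / 96
p_exactly_three = poisson_pmf(3, lam)
print(round(p_exactly_three, 4))
```

The rate must always be expressed per the same unit as the interval of interest, which is why 500 per day is first converted to 500/96 per five minutes.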
D. Hypergeometric Distribution
Hypergeometric experiments occur when the trials are not independent of each other, which happens when sampling without replacement.
Characteristics of Hypergeometric distribution
• The experiment consists of randomly drawing n elements without replacement from a set of N elements, r of which are successes and (N - r) of which are failures.
• The sample size n is large relative to the number N of elements in the population, i.e., n/N > 0.05.
• The hypergeometric random variable X is the number of successes in the draw of n elements.

The hypergeometric probability distribution is given by:
$$P(X = x) = \frac{\binom{r}{x}\binom{N-r}{n-x}}{\binom{N}{n}}, \qquad x = \max[0,\, n-(N-r)], \ldots, \min(r, n)$$
Example: An experiment is conducted to select a suitable catalyst for the commercial production of ethylenediamine (EDA), a product used in soaps. Suppose a chemical engineer randomly selects 3 catalysts for testing from among a group of 10 catalysts, 6 of which have low acidity and 4 of which have high acidity. Find (a) the probability that no highly acidic catalyst is selected and (b) the probability that exactly one highly acidic catalyst is selected. [Answer: (a) 1/6 and (b) ½]
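The stated answers to the catalyst example can be confirmed with exact arithmetic. In this sketch a "success" is taken to be a highly acidic catalyst, so N = 10, r = 4 and n = 3; the function name `hypergeometric_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb


def hypergeometric_pmf(x, N, r, n):
    """P(X = x): x successes in n draws without replacement from N items, r of them successes."""
    return Fraction(comb(r, x) * comb(N - r, n - x), comb(N, n))


p_none_high = hypergeometric_pmf(0, N=10, r=4, n=3)
p_one_high = hypergeometric_pmf(1, N=10, r=4, n=3)
print(p_none_high, p_one_high)
```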
Common Continuous Probability Distributions
A. Normal Distribution
A random variable X is normal or normally distributed with parameters μ and σ² (abbreviated N(μ, σ²)), if it is continuous with probability density function:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \qquad -\infty < x < \infty;\ \sigma > 0 \text{ and } -\infty < \mu < \infty$$
Properties of the Theoretical Normal Distribution
1. The curve is bell-shaped.
2. The mean, median and mode are equal and located at the center of the
distribution.



3. It is uni-modal
4. The curve is symmetrical about the mean.
5. The curve is continuous, i.e., for each X, there is a corresponding Y value.
6. It never touches the X axis.
7. The total area under the curve is 1, so the area on each side of the mean is 0.5000.
8. The areas under the curve that lie within one standard deviation, two and three
standard deviations of the mean are approximately 0.68 (68%), 0.95 (95%) and
0.997 (99.7%) respectively.
Graphically, it can be shown as:

Standard Normal Distribution


The standard normal distribution is a normal distribution with, μ = 0 and σ = 1. A
random variable with a standard normal distribution, denoted by the symbol z, is called a
standard normal random variable.
A normal random variable X with mean μ and standard deviation σ is converted to a standard normal variable by Z = (X − μ)/σ, so that areas under the curve can be obtained easily from the standard normal table.
Finding Area under the Standard Normal Distribution Curve
i. Draw the picture
ii. Shade the desired area/region
iii. If the area/region is:
• between 0 and any Z value, then look up the Z value in the table,
• in any tail, then look up the Z value to get the area and subtract the area from 0.5000,
• between two Z values on the same side of the mean, then look up both Z values from the table and subtract the smaller area from the larger,
• between two Z values on opposite sides of the mean, then look up both Z values and add the areas,
• less than any Z value to the right of the mean, then look up the Z value from the table to get the area and add 0.5000 to the area,
• greater than any Z value to the left of the mean, then look up the Z value and add 0.5000 to the area,
• in any two tails, then look up the Z values from the table, subtract the areas from 0.5000 and add the answers.
Note that finding the area under the curve over a region is the same as finding the probability that a randomly chosen Z value falls in that region.
Example
1. Find the area under the standard normal distribution curve between Z = 0 and Z = 2.31 (Ans: 0.4896), Z = 0 and Z = -1.79 (Ans: 0.4633), Z = 2.01 and Z = 2.34 (Ans: 0.0126), Z = -1.35 and Z = -0.71 (Ans: 0.1504), Z = 1.21 and Z = -2.41 (Ans: 0.8789), to the right of Z = 1.54 (Ans: 0.0618), to the left of Z = 1.75 (Ans: 0.9599).
2. If the scores for an IQ test have a mean of 100 and a standard deviation of 15,
find the probability that IQ scores will fall below 112.
Solution
IQ ~ N(100, 225)
$$P(Y < 112) = P\left[\frac{Y - \mu}{\sigma} < \frac{112 - 100}{15}\right] = P[Z < 0.80] = 0.5000 + P(0 < Z < 0.80) = 0.5000 + 0.2881 = 0.7881$$
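Table look-ups like these can be cross-checked with the error function from Python's standard library. This is a sketch under the assumption that a computed cumulative probability is acceptable in place of the printed table; `normal_cdf` is an illustrative name.

```python
from math import erf, sqrt


def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X < x) for a normal random variable with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma  # standardize first
    return 0.5 * (1 + erf(z / sqrt(2)))


# IQ example: mean 100, standard deviation 15, scores below 112.
p_below_112 = normal_cdf(112, mu=100, sigma=15)

# Area between Z = 0 and Z = 2.31, and between Z = -2.41 and Z = 1.21.
area_0_to_231 = normal_cdf(2.31) - normal_cdf(0)
area_opposite_sides = normal_cdf(1.21) - normal_cdf(-2.41)

print(round(p_below_112, 4), round(area_0_to_231, 4), round(area_opposite_sides, 4))
```

Small differences in the fourth decimal place relative to a printed table can arise because table entries are themselves rounded before being added or subtracted.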
B. Chi-square Distribution



Consider n independent random variables with the standard normal distribution. Call these variables Zi, i = 1, 2, . . . , n. The statistic
$$\chi^2 = \sum_{i=1}^{n} Z_i^2$$
is also a random variable, having a chi-square distribution.

The chi-square distribution contains only one parameter, called the degrees of freedom, which is equal to the number of Z values in the sum of squares.
Characteristics of the χ² Distribution
- χ² values cannot be negative since they are sums of squares.
- The χ² distribution is non-symmetric.
- The mean of the χ² distribution is its degrees of freedom (n), and the variance is 2n.
- For large values of n (usually greater than 30), the χ² distribution may be approximated by the normal.
- The degrees of freedom when working with a single population variance is n - 1.
A common use of the χ² distribution is to describe the distribution of the sample variance. Let Y1, Y2, . . . , Yn be a random sample from a normally distributed population with mean μ and variance σ². Then the quantity (n − 1)S²/σ² is a random variable whose distribution is described by a χ² distribution with (n − 1) degrees of freedom.
C. Student’s t-Distribution
This distribution is quite similar to the normal in that it is symmetric and bell shaped.
However, the t distribution has “fatter” tails than the normal. That is, it has more
probability in the extreme or tail areas than does the normal distribution, a characteristic
quite apparent for small values of the degrees of freedom, but barely noticeable if the
degrees of freedom exceed 30 or so.
• It is symmetric about its mean.
• It has a mean of zero.
• It has a standard deviation and variance greater than 1.
• There are actually many t distributions, one for each degree of freedom.
• As the sample size increases, the t distribution approaches the normal distribution.
• It is bell shaped.
• The t-scores can be negative or positive, but the probabilities are always positive.
D. F-Distribution
Suppose that a random variable X has a χ² distribution with v1 degrees of freedom and a random variable Y has a χ² distribution with v2 degrees of freedom. Suppose also that these two chi-square variables are independent. Then the ratio of the two, each divided by its respective degrees of freedom, follows the F-distribution.

Thus, the F-distribution is formed by the ratio of two independent chi-square variables divided by their respective degrees of freedom, i.e.,
$$F(v_1, v_2) = \frac{\chi_1^2(v_1)/v_1}{\chi_2^2(v_2)/v_2}$$

Since F is formed from chi-square variables, many of the characteristics of the chi-square distribution are also possessed by the F distribution.
• The F-values are all non-negative.
• The distribution is non-symmetric.
• The mean is approximately 1.
• There are two independent degrees of freedom, one for the numerator (v1), and one for the denominator (v2).
• A different table is needed for each combination of degrees of freedom.
Review Exercises
1. Determine the probability of each of the following events:
(i) an even number appears in a single throw of a fair die,
(ii) a king appears in drawing a single card from an ordinary deck of 52 cards,



(iii) at least one tail appears in a toss of three fair coins,
(iv) an even score, or a score divisible by three, appears in the throw of a die.

2. A bead is drawn from a bag containing 6 red beads, 4 green beads, 2 yellow beads and
3 blue beads. What is the probability that a bead drawn at random
(i) is either a red bead or blue bead
(ii) is neither a yellow nor a red bead

3. What is the probability of obtaining


(i) two fives in a single throw of two dice?
(ii) one three and one four in a single throw of two dice?
4. A company has a security system comprising four electronic devices (A, B, C, and D)
which operate independently. Each device has a probability of failure of 0.1. The
four electronic devices are arranged so that the whole system operates properly if at
least one of A or B functions and at least one of C or D functions. What is the
probability that the whole system functions properly?
5. A class has 12 boys and 3 girls. If three students are selected at random from the class,
what is the probability that they are all boys?
6. Near-miss collisions near an airport occur on average once every 20 days. In 5
consecutive days, what is the probability that no near-miss collisions occur?
7. A firm submits tenders for two different contracts. The probability that the first tender
will be successful is 60% and the probability that the second tender will be
successful is 40%. Calculate the probability that:
(i) both will be successful,
(ii) neither will be successful,
(iii) only the first will be successful,
(iv) only the second will be successful,
(v) at least one will be successful.



8. When you give a casino $5 for a bet on the pass line in the game of craps, there is a 244/495 probability that you will win $5 and a 251/495 probability that you will lose $5. What is your expected value? In the long run, how much do you lose for each dollar bet?
9. The Nile Insurance Share Company charges Alemu birr 250 for a one-year birr 100000 life insurance policy. Because Alemu is a 21-year-old male, there is a 0.9985 probability that he will live for a year.
a. From Alemu’s perspective, what are the values of the two different outcomes?
b. If Alemu purchases the policy, what is his expected value?
c. What would be the cost of the insurance policy if the company just breaks even
(in the long run with many such policies), instead of making a profit?

Chapter 7: Sampling Techniques

It is incumbent on the researcher to clearly define the target population. There are no
strict rules to follow, and the researcher must rely on logic and judgment. The population
is defined in keeping with the objectives of the study.



Sometimes, the entire population will be sufficiently small, and the researcher can
include the entire population in the study. This type of research is called a census study
because data is gathered on every member of the population.

Usually, the population is too large for the researcher to attempt to survey all of its
members. A small, but carefully chosen sample can be used to represent the population.
The sample reflects the characteristics of the population from which it is drawn.

7.1 Methods of Sampling


Sampling methods are classified as either probability or non probability. In probability
samples, each member of the population has a known non-zero probability of being
selected. Probability methods include random sampling, systematic sampling, and
stratified sampling. In non probability sampling, members are selected from the
population in some nonrandom manner. These include convenience sampling, judgment
sampling, quota sampling, and snowball sampling. The advantage of probability
sampling is that sampling error can be calculated. Sampling error is the degree to which a
sample might differ from the population. When inferring to the population, results are
reported plus or minus the sampling error. In non probability sampling, the degree to
which the sample differs from the population remains unknown.

Random sampling is the purest form of probability sampling. Each member of the population has an equal and known chance of being selected. That is, simple random sampling makes the selection of every possible combination of the desired number of units equally likely. There are two ways to undertake the sample selection: sampling with replacement and sampling without replacement.

Sampling without replacement (swr) means that once a unit has been selected, it can’t be selected again. In other words, no unit can appear more than once in the sample. If n sample units are to be selected from a population having N units, then there are $\binom{N}{n}$ ways of selecting the n units. Hence, simple random sampling is equivalent to the selection of one of the $\binom{N}{n}$ possible samples with an equal probability $1/\binom{N}{n}$ assigned to each sample.

In simple random sampling without replacement, the probability of a specified unit of the population being selected at any given draw is equal to the probability of its being selected at the first draw, that is, $\frac{1}{N}$. However, for a sample of size n, the probability of including a specified unit is $\frac{n}{N}$.

Sampling with replacement (sw): This process allows a unit to be selected on more than one draw. There are $N^n$ ways of selecting n units out of the total N units with replacement. In this case, the order of selection is considered. All selections are independent since the selected unit is returned to the population before making the next selection. Thus, the probability is $\frac{1}{N}$ for any specific element on each of the n draws.

Note that: Simple random sampling with and without replacement are practically identical if the sample size is a very small fraction of the population size. Generally, sampling without replacement yields more precise results and is operationally more convenient.
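The counts and probabilities quoted for the two schemes can be checked by enumeration on a small population. The sketch below uses a hypothetical population of N = 5 units and samples of size n = 2; these numbers are chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

N, n = 5, 2
units = list(range(1, N + 1))

# Without replacement: every unordered sample of n distinct units.
samples_without = list(combinations(units, n))
prob_each_sample = Fraction(1, len(samples_without))

# Probability that a specified unit (here unit 1) is included in the sample.
inclusion_prob = Fraction(
    sum(1 for s in samples_without if 1 in s), len(samples_without)
)

# With replacement: every ordered selection of n units, repeats allowed.
samples_with = list(product(units, repeat=n))

print(len(samples_without), prob_each_sample, inclusion_prob, len(samples_with))
```

Here there are 10 possible samples without replacement, each with probability 1/10, unit 1 is included with probability 2/5 = n/N, and there are 25 ordered selections with replacement.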

Simple Random Sampling Selection Procedure

In a sample survey, the selection of sample units may be biased if a non-random method is used, that is, if the selection is consciously or unconsciously influenced by human judgment. Such bias can be avoided by a random selection method. To ensure true randomness, the method of selection must be independent of human influence.

There are different methods to select a random sample. The important part of each random selection method is that the selection of each unit is based purely on chance. This eliminates selection bias, which may prevent the sample from being representative of the population. A representative sample is one that gives an accurate (valid) picture of the total population. If the population has N units, then a random method of selection is one which gives each of the N units an equal chance of being selected. Two procedures of random selection are described here.

Lottery Method: This is a very common method of taking a random sample. Under this method, we label each member of the population with an identifiable disc, ticket, or piece of paper. The discs or tickets must be of identical size, color and shape. They are placed in a container and well mixed before each draw, and then the designated labels are selected without looking, with or without replacement. The draws continue until a sample of the required size is selected. This ensures that the selection of items depends entirely on chance.

Table of Random Numbers: The members of the population are numbered from 1 to N and n numbers are selected from one of the random number tables in any convenient and systematic way. A table of random numbers consists of the digits from 0 to 9, which are equally represented with no pattern or order. The procedure of selection is outlined as follows:

• Identify the population units (N) and give them serial numbers from 1 to N. This total number determines how many of the random digits we need when selecting each element.
• Decide the sample size (n) to be selected.
• Select a starting point in the table of random numbers; you can start from any one of the columns, which can be determined randomly.
• Since each digit has an equal chance of being selected at any draw, you may read down the columns of digits in the table.
• Depending on the size of N, you can use the digits in pairs, three at a time, four at a time and so on.
• If a selected number is less than or equal to the population size N, then it is considered.
• Ignore all numbers greater than N.
• For sampling without replacement, reject numbers that come up for a second time.
• The selection process continues until n units have been selected.
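The table-reading procedure above can be sketched in code. In this illustration the string `digits` stands in for a column of a random number table and is a made-up sequence, not taken from any published table; the function name `select_from_digits` is also illustrative. The sketch covers sampling without replacement only.

```python
def select_from_digits(digits, N, n):
    """Pick n distinct serial numbers in 1..N by reading fixed-width groups of digits."""
    width = len(str(N))  # digits read at a time: 2 for N <= 99, 3 for N <= 999, ...
    selected = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        if number == 0 or number > N:
            continue  # ignore numbers outside 1..N
        if number in selected:
            continue  # without replacement: reject repeats
        selected.append(number)
        if len(selected) == n:
            break  # stop once n units have been selected
    return selected


digits = "1048015011015360201181647"  # hypothetical run of table digits
print(select_from_digits(digits, N=50, n=6))
```

Reading the digits in pairs gives 10, 48, 01, 50, 11, 01, 53, 60, 20, ...; the repeated 01 and the values above 50 are skipped, so the sample is units 10, 48, 1, 50, 11 and 20.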

Importance and Limitations of Simple Random Sampling

Simple random sampling is very important as a basis for development of the theory of
sampling. It serves as a central reference for all other sampling designs. Under simple
random sampling ant particular sample of n elements from the population of N elements
can be chosen and in addition, is as likely to be chosen as any other sample. In this sense,
it is conceptually the simplest possible method, and hence it is one against which all other
methods can be compared. However, despite such importance, simple random sampling
has the following limitations:

 It can be expensive and often not feasible in practice since it requires that all
elements be identified and labeled prior to the sampling. This prior identification
is not possible and hence a simple random sample of elements can’t be drawn

89

Bahir Dar University, Department of Statistics.


 Since it gives each element in the population an equal chance of being chosen in
the sample, it may result in samples that are spread out over a large geographical
area. Such a geographic distribution of the sample would be very costly to
implement
 It is not well suited to surveys in which interest is focused on subgroups
that comprise a small proportion of the population.
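The equal-likelihood property of simple random sampling described in this subsection can be checked by listing every possible sample from a very small population. The sketch below uses a hypothetical population of N = 4 units labeled A to D and a sample size of n = 2; the variable names are illustrative only.

```python
from itertools import combinations
from math import comb

population = ["A", "B", "C", "D"]   # hypothetical population, N = 4
n = 2                               # sample size

# Every possible sample of n units drawn without replacement.
samples = list(combinations(population, n))
for s in samples:
    print(s)

# Under simple random sampling each of these samples is equally likely.
prob_each = 1 / comb(len(population), n)

# Chance that a given unit (here "A") ends up in the sample.
incl_A = sum("A" in s for s in samples) / len(samples)

print(len(samples), prob_each, incl_A)
```

There are 6 possible samples, each with probability 1/6, and unit A appears in 3 of them, so its chance of inclusion is 3/6 = n/N.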

Systematic sampling is often used instead of random sampling. It is also called the Nth
name selection technique, where "Nth" refers to the selection interval rather than to the
population size N. After the required sample size n has been calculated, every kth record
is selected from a list of population members, where the interval k is approximately N/n.
As long as the list does not contain any hidden order, this sampling method is as good as
the random sampling method. Its only advantage over the random sampling technique is
simplicity. Systematic sampling is frequently used to select a specified number of records
from a computer file.
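A minimal sketch of systematic selection from a list of records is given below. It assumes an interval of k = N // n and a random starting point within the first k records; the random start is a common convention and is not stated in the text above. The function name systematic_sample is hypothetical.

```python
import random

def systematic_sample(records, n, rng=None):
    """Take every k-th record after a random start, where k = N // n."""
    rng = rng or random.Random()
    N = len(records)
    k = N // n                      # selection interval
    if k < 1:
        raise ValueError("sample size cannot exceed the number of records")
    start = rng.randrange(k)        # assumed: random start among the first k records
    return [records[start + i * k] for i in range(n)]

# Hypothetical file of 100 records with serial numbers 1..100.
records = list(range(1, 101))
chosen = systematic_sample(records, n=10, rng=random.Random(1))
print(chosen)
```

With N = 100 and n = 10 the interval is 10, so the selected serial numbers are always 10 apart and the first one lies between 1 and 10.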

Stratified sampling is a commonly used probability method that is superior to random
sampling because it reduces sampling error. A stratum is a subset of the population that
shares at least one common characteristic. Examples of strata might be males and
females, or managers and non-managers. The researcher first identifies the relevant
strata and their actual representation in the population. Random sampling is then used
to select a sufficient number of subjects from each stratum. "Sufficient" refers to a sample
size large enough for us to be reasonably confident that the stratum represents the
population. Stratified sampling is often used when one or more of the strata in the
population have a low incidence relative to the other strata.
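The procedure can be sketched as follows. The sketch assumes proportional allocation (the same sampling fraction in every stratum, with at least one unit per stratum), which is only one way of choosing a "sufficient" number from each stratum. The function name stratified_sample and the staff list are hypothetical.

```python
import random

def stratified_sample(units, stratum_of, fraction, rng=None):
    """Draw a simple random sample of the given fraction within each stratum."""
    rng = rng or random.Random()
    strata = {}
    for unit in units:                      # group the units by stratum
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = {}
    for name, members in strata.items():
        size = max(1, round(fraction * len(members)))   # assumed allocation rule
        sample[name] = rng.sample(members, size)        # random sampling within stratum
    return sample

# Hypothetical population: 20 managers and 180 non-managers.
staff = [("manager", i) for i in range(20)] + [("non-manager", i) for i in range(180)]
result = stratified_sample(staff, stratum_of=lambda u: u[0], fraction=0.1,
                           rng=random.Random(7))
for name, members in result.items():
    print(name, len(members))
```

Here the small stratum (managers) is guaranteed 2 places in the sample of 20, whereas a simple random sample of the same size could contain no managers at all.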

Convenience sampling is used in exploratory research where the researcher is interested
in getting an inexpensive approximation of the truth. As the name implies, the sample
units are selected because they are convenient. This non-probability method is often used
during preliminary research efforts to get a gross estimate of the results, without
incurring the cost or time required to select a random sample.

Judgment sampling is a common non-probability method. The researcher selects the
sample based on judgment. This is usually an extension of convenience sampling. For
example, a researcher may decide to draw the entire sample from one "representative"
city, even though the population includes all cities. When using this method, the
researcher must be confident that the chosen sample is truly representative of the entire
population.

Quota sampling is the non-probability equivalent of stratified sampling. As in stratified
sampling, the researcher first identifies the strata and their proportions as they are
represented in the population. Then convenience or judgment sampling is used to select
the required number of subjects from each stratum. This differs from stratified sampling,
where the strata are filled by random sampling.
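The difference from stratified sampling can be seen in a short sketch. Here the quotas are filled by convenience, that is, in the order in which people happen to be encountered, with no random selection. The function name quota_sample and the list of arrivals are hypothetical.

```python
def quota_sample(arrivals, stratum_of, quotas):
    """Fill each stratum's quota in order of arrival (convenience, not random)."""
    remaining = dict(quotas)            # places still open in each stratum
    selected = []
    for person in arrivals:
        stratum = stratum_of(person)
        if remaining.get(stratum, 0) > 0:
            selected.append(person)
            remaining[stratum] -= 1
        if not any(remaining.values()):  # every quota has been filled
            break
    return selected

# Hypothetical stream of respondents: (sex, id) in the order they were met.
arrivals = [("F", 1), ("F", 2), ("M", 3), ("F", 4), ("M", 5), ("M", 6)]
picked = quota_sample(arrivals, stratum_of=lambda p: p[0], quotas={"F": 2, "M": 2})
print(picked)  # [('F', 1), ('F', 2), ('M', 3), ('M', 5)]
```

Respondent ("F", 4) is skipped because the quota for that stratum is already full, and selection stops as soon as both quotas are met.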

Snowball sampling is a special non-probability method used when the desired sample
characteristic is rare. It may be extremely difficult or cost-prohibitive to locate
respondents in these situations. Snowball sampling relies on referrals from initial subjects
to generate additional subjects. While this technique can dramatically lower search costs,
it comes at the expense of introducing bias, because the technique itself reduces the
likelihood that the sample will represent a good cross section of the population.
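The referral process can be sketched as a simple chain of contacts. The sketch assumes that each respondent's referrals are known in advance and stored in a dictionary, which is a simplification made only for illustration; the function name snowball_sample and the labels A to E are hypothetical.

```python
from collections import deque

def snowball_sample(seeds, referrals, target_size):
    """Start from initial subjects and follow their referrals until the target is met."""
    selected = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(selected) < target_size:
        person = queue.popleft()
        selected.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:      # each person is approached only once
                seen.add(contact)
                queue.append(contact)
    return selected

# Hypothetical referral lists: who each respondent points the researcher to.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": []}
found = snowball_sample(["A"], referrals, target_size=4)
print(found)  # ['A', 'B', 'C', 'D']
```

Everyone in the resulting sample is connected to the initial subject A, which illustrates why the method tends to miss parts of the population that have no link to the starting respondents.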
