Basic Statistics Module

This document provides an overview of statistics and probability concepts across six chapters. Chapter 1 defines statistics and key terms. It describes the stages of a statistical investigation and different scales of measurement. Chapter 2 discusses methods of collecting and presenting data through frequency distributions, diagrams, graphs and charts. Chapter 3 covers measures of central tendency including the mean, median and mode. Chapter 4 examines measures of dispersion like range, variance and standard deviation. Chapter 5 introduces elementary probability concepts such as rules of counting, approaches to probability, and conditional probability. Chapter 6 defines random variables and their distributions, both discrete and continuous. It also discusses the binomial, Poisson and normal distributions.

Uploaded by

getajebesa
Copyright: © All Rights Reserved

Contents

CHAPTER ONE
1. Introduction
1.1 Definitions and classification of Statistics
1.2 Stages in Statistical Investigation
1.3 Definition of Some Terms
1.4 Applications, Uses and Limitations of Statistics
1.5 Scales of Measurement
CHAPTER TWO
2. Methods of Data Collection and Presentation
2.1 Methods of Data Collection
2.1.1 Sources of Data
2.2 Methods of Data Presentation
2.2.1 Introduction
2.2.2 Frequency Distribution
2.2.3 Diagrammatic and Graphical Presentation of Data
2.2.3.1 Diagrammatic display of data: Bar charts, Pie-chart, Cartograms
2.2.3.2 Graphical presentation of data: Histogram, Frequency Polygon, Ogive Curves
CHAPTER THREE
3. Measures of Central Tendency
3.1 Introduction
3.2 Objectives of Measures of Central Tendency
3.3 The Summation Notation (Σ)
3.4 Important Characteristics of Measures of Central Tendency
3.5 Types of Measures of Central Tendency
3.5.1 Arithmetic Mean
3.5.2 Median
3.5.3 The Mode
3.5.4 The Relationship of the Mean, Median and Mode
3.6 The Quantiles (Quartiles, Deciles, Percentiles)
CHAPTER FOUR
4. Measures of Dispersion (Variation)
4.1 Introduction
4.2 Absolute and Relative Measures of Dispersion
4.3 Types of Measures of Variation
4.3.1 The Range and Relative Range
4.3.2 The Quartile Deviation and Coefficient of Quartile Deviation
4.3.3 The Mean Deviation and Coefficient of Mean Deviation
4.3.4 The Variance, Standard Deviation and Coefficient of Variation
4.4 Standard Scores (Z-Scores)
4.5 Moments, Skewness and Kurtosis
4.5.1 Moments
4.5.2 Skewness
4.5.3 Kurtosis
CHAPTER FIVE
5. Elementary Probability
5.1 Introduction
5.2 Definitions of Some concepts of Probability Terms
5.3 Counting Rules
5.3.1 Addition Rule
5.3.2 Multiplication (Fundamental) Rule
5.3.3 Permutation Rule
5.3.4 Combination Rule
5.4 Approaches in Probability Definition
5.4.1 Subjective approach
5.4.2 Objective approach
5.5 Some Probability Rules
5.5 Conditional Probability and Independence
CHAPTER SIX
6. Probability Distribution
6.1 The Definition of Random Variable and Probability Distribution
6.1.1 Discrete Random Variable and Probability Distribution (pmf)
6.1.2 Continuous Random Variable and Probability Distribution
6.2 Introduction to Expectation: Mean and Variance of a Random Variable
6.3 Common Discrete Probability Distribution
6.3.1 Binomial Distribution
6.3.2 The Poisson Distribution
6.4 Common Continuous Probability Distribution
6.4.1 Normal Random Variables
6.4.2 Chi-square Distribution: χ² Distribution
6.4.3 Student’s t Distribution

CHAPTER ONE
1. Introduction
1.1 Definitions and classification of Statistics
Statistics has been defined differently by different authors over time. In earlier days statistics
was confined to affairs of state, but today it embraces almost every sphere of human
activity. Therefore, a number of old definitions, which were confined to a narrow field of enquiry,
have been replaced by newer definitions that are much more comprehensive and exhaustive.
We can define statistics in two senses:
• In the plural sense: statistics are the raw data themselves (numerical facts), such as statistics of
births, statistics of deaths, statistics of students, statistics of imports and exports, etc.
• In the singular sense: statistics is the science of conducting studies to collect, organize,
summarize and analyze data, as well as to draw valid conclusions and make reasonable
decisions on the basis of data.
Classifications:
Depending on how data are used, statistics is divided into two main areas or
branches.
1. Descriptive Statistics:
• Is concerned with summary calculations, graphs, charts and tables.
• In descriptive statistics our objective is to describe a group of data that we have ‘in hand’ i.e.
data that are accessible to us.
• Generally characterizes or describes a set of data elements by graphically displaying the
information or describing its central tendencies and how it is distributed.
Example: the following data refer to the number of malaria patients treated
at Debre Berhan Referral Hospital from 1986 to 1990 (Eth. Calendar).
3645; 4568; 5432; 6751; 7369
If we calculate the average number of malaria patients from 1986 to 1990 as
$\text{Average} = \frac{1}{5}(3645 + 4568 + 5432 + 6751 + 7369) = 5553$, then our work belongs to the
domain of descriptive statistics.
If we say that there was an increase of 3724 patients (7369 − 3645) from 1986 to 1990, then again
this belongs to the domain of descriptive statistics.
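The two descriptive summaries in this example can be reproduced with a few lines of code. The sketch below assumes the five counts are listed in chronological order, one per year from 1986 to 1990; the variable names are illustrative only.

```python
# Yearly counts of malaria patients, 1986-1990 (Eth. Calendar), taken from
# the example above. Assumption: the values are listed in year order.
patients = {1986: 3645, 1987: 4568, 1988: 5432, 1989: 6751, 1990: 7369}

# Descriptive summary 1: arithmetic mean of the five yearly counts.
average = sum(patients.values()) / len(patients)

# Descriptive summary 2: change between the first and the last year.
increase = patients[1990] - patients[1986]

print(f"Average patients per year: {average:.0f}")
print(f"Increase from 1986 to 1990: {increase}")
```

Both numbers only describe the data in hand; nothing is claimed about years outside 1986 to 1990.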

2. Inferential Statistics: consists of generalizing from samples to populations, performing
estimations and hypothesis tests, determining relationships among variables, and making
predictions. Statistical techniques based on probability theory are required.
Example 1.1: In the above example if we predict the number of malaria patients in the year
1995 to be 9917, then our work belongs to the domain of inferential statistics.
Example 1.2: Suppose we want to have an idea about the percentage of illiterates in our
country. We take a sample from the population and find the proportion of illiterates in the
sample. This sample proportion with the help of probability enables us to make some
inferences about the population proportion. This study belongs to inferential statistics.
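As a rough illustration of Example 1.2, the sketch below estimates a population proportion from a sample. The sample size and the count of illiterates are invented for illustration, and the interval uses the usual large-sample normal approximation, which is one common choice and not necessarily the method this module develops later.

```python
import math

# Hypothetical sample: both counts are invented for illustration only.
sample_size = 400
illiterate_in_sample = 120

# Point estimate of the population proportion: the sample proportion.
p_hat = illiterate_in_sample / sample_size

# Approximate 95% confidence interval using the normal approximation
# (z = 1.96); reasonable for a large, randomly drawn sample.
z = 1.96
standard_error = math.sqrt(p_hat * (1 - p_hat) / sample_size)
lower = p_hat - z * standard_error
upper = p_hat + z * standard_error

print(f"Sample proportion: {p_hat:.3f}")
print(f"Approximate 95% CI: ({lower:.3f}, {upper:.3f})")
```

The sample proportion is a descriptive summary of the sample; the interval is the inferential step, because it is a statement about the whole population.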

1.2 Stages in Statistical Investigation


Before we deal with statistical investigation, let us see what statistical data mean. Not every
numerical figure can be considered statistical data; to qualify, the data must satisfy the
following criteria:
• The data must be an aggregate of facts
• They must be affected to a marked extent by a multiplicity of causes
• They must be estimated according to reasonable standards of accuracy
• The data must be collected in a systematic manner for a predefined purpose
• The data should be placed in relation to each other
A statistician should be involved at all stages of a statistical investigation. These include
formulating the problem, and then collecting, organizing and classifying, presenting, analyzing
and interpreting statistical data. Let’s see each stage in detail.
I. Formulating the problem: research begins with a problem. At this stage
the investigator must be sure to understand the problem and then formulate it in statistical
terms. Clarify the objectives very carefully. Ask as many questions as necessary, because “An
approximate answer to the right question is worth a great deal more than a precise answer
to the wrong question” (the first golden rule of applied mathematics).
Therefore, the first stage in any statistical investigation should be to:
• Get a clear understanding of the physical background to the situation under study;
• Clarify the objectives;
• Formulate the objectives in statistical terms.
II. Proper collection of data: in order to draw valid conclusions, it is important to have ‘good’ data.
Data are gathered with the aim of meeting predetermined objectives. In other words, the data must
provide answers to the problem. The data form the foundation of statistical analysis and
hence must be carefully and accurately collected. The methods of data collection are covered
in Section 2.1.
III. Organization and classification of data: in this stage the collected data are organized in a
systematic manner. That means the data must be placed in relation to each other. The
classification or sorting of data is itself a kind of organization of data.
IV. Presentation of data: The purpose of putting the organized data in graphs, charts and tables
is two-fold. First, it is a visual way to look at the data and see what happened and make
interpretations. Second, it is usually the best way to show the data to others. Reading lots of
numbers in the text puts people to sleep and does little to convey information.
V. Analysis of data: the process of examining and summarizing data with the intent to extract
useful information and develop conclusions. Data analysis is closely related to data mining,
but data mining tends to focus on larger data sets, with less emphasis on making inferences,
and often uses data that were originally collected for a different purpose. In this stage
different types of inferential statistical methods are applied, for instance hypothesis tests
such as the χ² test of association.
VI. Interpretation of data: interpretation means drawing valid conclusions from data which
form the basis of decision making. Correct interpretation requires a high degree of skill and
experience.
Note that: analysis and interpretation of data are two sides of the same coin.

1.3 Definition of Some Terms


In this section, we will define those terms which will be used most frequently. These are:
Data: the values (measurements or observations) that the variables can assume; or, facts or figures
from which conclusions can be drawn.
Data set: Facts or figures collected for a particular study. Each value in the data set is called a data value or
datum.
Raw Data: Data in the form in which they were originally recorded. Data sheets are where the data are
originally recorded; they are often hand drawn, but they can also be printouts from spreadsheet
programs like Microsoft Excel.
Population: The totality of all subjects with certain common characteristics that are
being studied in a specified time and place.
Sample: A portion of a population selected using some sampling technique. A sample
must be representative of the population, so it must be selected using an established sampling
technique.
Sampling: The process of selecting units (e.g., people, households, organizations) from a
population of interest so that by studying the sample we may fairly generalize our results back to
the population from which they were chosen.
Sample size: The number of elements or observations to be included in the sample.
Parameter: Any measure computed from the data of a population. Example: population mean
(µ) and population standard deviation (𝜎)
Statistic: Any measure computed from the sample. Example: sample mean (𝑥̅ ), sample standard
deviation (s)
Survey: A collection of quantitative information about members of a population when no special
control is exercised over any of the factors influencing the variable of interest.
Sample survey: A survey that includes only a portion of the population.
Census: A collection of information about every member of a population.
A sample survey has the following advantages over a census:
• It saves time and cost
• It can achieve greater accuracy, because a smaller number of observations can be collected more carefully
• It avoids wastage of material
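The distinction between a parameter and a statistic defined above can be sketched in code. The population below is simulated (heights drawn uniformly between 150 and 190 cm), which is purely an assumption for illustration; in practice the population values are usually unknown, which is exactly why a sample is taken.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: simulated heights (cm) of 10,000 people.
population = [rng.uniform(150, 190) for _ in range(10_000)]

# Parameters: computed from every member of the population (a census).
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

# Statistics: computed from a simple random sample of the population.
sample_size = 100
sample = rng.sample(population, sample_size)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)

print(f"Parameters: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"Statistics: x_bar = {x_bar:.2f}, s = {s:.2f}")
```

Drawing a different sample would change the statistics but not the parameters, which are fixed properties of the population.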
Variable: A variable is a characteristic or attribute that can assume different values. Variables
whose values are determined by chance are called random variables. Variables are often specified
according to their type and intended use, and can be classified into two types, namely
qualitative and quantitative variables.
• Quantitative variable: a variable that is naturally measured as a number and for which
arithmetic operations make sense. Examples: height, age, crop yield, GPA, salary, temperature, area,
air pollution index (measured in parts per million), etc.
• Qualitative variable: any variable that is not quantitative is qualitative. Qualitative
variables take a value that is one of several possible categories. As naturally measured,
qualitative variables have no numerical meaning. Examples: hair color, gender, field of
study, marital status, political affiliation, status of disease infection.
Quantitative variables can be classified as discrete or continuous.
1. Discrete variables can assume only certain numerical values, such as 0, 1, 2, ...; that is, there
are gaps between the possible values. The set of possible values may be finite or countably
infinite. Examples: the number of students in a classroom, the number of children in a family.
2. Continuous variables can take any value within a specified interval, given a fine enough
measuring device. There are no gaps between possible values. They are obtained by measuring. For
example, height is a continuous variable: no matter how close the heights of two people are, we
can find another person whose height falls somewhere between the two.

1.4 Applications, Uses and Limitations of Statistics


I. Applications of Statistics
• Apart from helping elicit an intelligent assessment from a body of figures and facts,
statistics is an indispensable tool for any scientific enquiry, from the planning stage
to the stage of conclusion. It applies to almost all sciences: pure and applied, physical,
natural, biological, medical, agricultural and engineering. It also finds applications in social
and management sciences, in commerce, business and industry.
• It is used in almost all fields of human endeavor.
• Almost all human beings encounter numerical facts in their daily life.
• It is applicable in processes such as the invention of certain drugs and measuring the extent
of environmental pollution.
• It is used in industry, especially in the area of quality control.
II. Uses of Statistics
• Statistics presents facts in the form of numerical data.
• It condenses and summarizes a mass of data into a few presentable and precise figures.
• It facilitates comparison of data.
• It helps in formulating and testing hypotheses.
• It helps in predicting future trends.
• It helps in formulating policies.
III. Limitations of Statistics
Statistics, with all its wide application in every sphere of human activity, has its own limitations.
Some of them are given below.
• Statistics is not suitable for the study of qualitative phenomena: Since statistics is basically
a science that deals with sets of numerical data, it is applicable only to those
subjects of enquiry that can be expressed in terms of quantitative measurements. Qualitative
phenomena like honesty, poverty, beauty and intelligence cannot be
expressed numerically, so statistical analysis cannot be directly applied to them.
Nevertheless, statistical techniques may be applied indirectly by first
reducing the qualitative expressions to accurate quantitative terms. For example, the
intelligence of a group of students can be studied on the basis of their marks in a particular
examination.
• Statistics does not study individuals: Statistics does not give any specific importance to
individual items; it deals with an aggregate of objects. Individual items, taken
individually, do not constitute statistical data and do not serve any purpose for a
statistical enquiry.
• Statistical laws are not exact: It is well known that the mathematical and physical sciences are
exact. Statistical laws, however, are only approximations. Statistical
conclusions are not universally true; they are true only on average.
• Statistics may be misused: Statistics must be used only by experts; otherwise, statistical
methods are the most dangerous tools in the hands of the inexpert. The use of statistical tools
by inexperienced and untrained persons might lead to wrong conclusions. Statistics can
easily be misused by quoting wrong figures. As King aptly says, ‘statistics are like clay of
which one can make a God or Devil as one pleases.’
• Statistics is only one of the methods of studying a problem: Statistical methods do not provide
a complete solution to a problem, because problems must be studied taking the background
of the country’s culture, philosophy or religion into consideration. Thus the statistical study
should be supplemented by other evidence.
1.5 Scales of Measurement
Normally, when one hears the term measurement, one may think of measuring the length
of something (e.g. the length of a piece of wood) or measuring a quantity of something (e.g. a cup
of flour). This represents a limited use of the term measurement. In statistics, the term measurement
is used more broadly and is more appropriately termed scales of measurement. Scales of
measurement refer to ways in which variables or numbers are defined and categorized. Each scale
of measurement has certain properties, which in turn determine the appropriateness of
certain statistical analyses. The four scales of measurement are nominal, ordinal, interval, and
ratio.
1. Nominal Scales
Nominal scales possess the following properties:
• Level of measurement which classifies data into mutually exclusive, all-inclusive
categories in which no order or ranking can be imposed on the data.
• No arithmetic or relational operations can be applied.
• No quantitative information is conveyed.
• Thus it only gives names or labels to various categories.
Examples:
• Political party preference (Republican, Democrat, or Other)
• Sex (Male or Female)
• Marital status (married, single, widowed, divorced)
• Country code
• Regional differentiation of Ethiopia
2. Ordinal Scales
Ordinal scales are measurement systems that possess the following properties:
• Level of measurement which classifies data into categories that can be ranked; however,
precise differences between the ranks do not exist.
• Arithmetic operations are not applicable but relational operations are applicable.
• Ordering is the sole property of the ordinal scale.
Examples:
• Letter grades (A, B, C, D, F)
• Rating scales (Excellent, Very good, Good, Fair, Poor)
• Military status
3. Interval Scales
Interval scales are measurement systems that possess the following properties:
• Level of measurement which classifies data that can be ranked and for which differences are
meaningful. However, there is no meaningful zero, so ratios are meaningless.
• All arithmetic operations except division are applicable.
• Relational operations are also possible.
Examples:
• IQ, temperature in °F.
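A short numerical check can make the interval-scale property concrete. The sketch below uses three arbitrary Fahrenheit readings (40, 60 and 80 °F, chosen only for illustration) and the standard Fahrenheit-to-Celsius conversion to show that a ratio of temperatures depends on the unit, while a comparison of differences does not.

```python
def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# Three illustrative readings, in degrees Fahrenheit.
low_f, mid_f, high_f = 40.0, 60.0, 80.0
low_c, mid_c, high_c = (fahrenheit_to_celsius(t) for t in (low_f, mid_f, high_f))

# Ratio of two temperatures: the answer depends on the unit chosen,
# so "twice as hot" has no meaning on an interval scale.
ratio_f = high_f / low_f
ratio_c = high_c / low_c

# Comparing differences: the answer is the same in both units,
# so differences are meaningful on an interval scale.
diff_ratio_f = (high_f - mid_f) / (mid_f - low_f)
diff_ratio_c = (high_c - mid_c) / (mid_c - low_c)

print(f"high/low in F: {ratio_f:.2f}, in C: {ratio_c:.2f}")
print(f"difference comparison in F: {diff_ratio_f:.2f}, in C: {diff_ratio_c:.2f}")
```

The zero point of each temperature scale is arbitrary, which is why the two ratios disagree.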
4. Ratio Scales
Ratio scales are measurement systems that possess the following properties:
• Level of measurement which classifies data that can be ranked, for which differences are
meaningful, and for which there is a true zero. True ratios exist between the different
units of measure.
• All arithmetic and relational operations are applicable.
Examples:
• Weight
• Height
• Number of students
• Age
Uses of levels of measurement
• They help you decide how to interpret the data from the variable.
• They help you decide what statistical analysis is appropriate for the values that were assigned.
For example, if a measurement is nominal, then you know that you should never average the data
values.
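One way to picture how the scale restricts the analysis is a lookup of permissible summary measures. The mapping below is a conventional simplification assumed for this sketch (mode for nominal data, median and mode for ordinal data, mean as well for interval and ratio data); the function name and the sample values are hypothetical.

```python
import statistics

# Assumed mapping from scale of measurement to permissible summaries.
PERMISSIBLE = {
    "nominal": {"mode"},
    "ordinal": {"mode", "median"},
    "interval": {"mode", "median", "mean"},
    "ratio": {"mode", "median", "mean"},
}

def summarize(values, scale, measure):
    """Return the requested summary, refusing ones the scale does not support."""
    if measure not in PERMISSIBLE[scale]:
        raise ValueError(f"{measure} is not meaningful for {scale} data")
    if measure == "mode":
        return statistics.mode(values)
    if measure == "median":
        # median_low always returns an observed value, so no averaging of
        # ordinal codes takes place.
        return statistics.median_low(values)
    return statistics.mean(values)

blood_types = ["A", "O", "O", "B", "AB", "O"]   # nominal
ratings = [1, 2, 2, 4, 5]                       # ordinal codes, 1 = poor ... 5 = excellent
ages = [20, 22, 24]                             # ratio

print(summarize(blood_types, "nominal", "mode"))
print(summarize(ratings, "ordinal", "median"))
print(summarize(ages, "ratio", "mean"))
```

Asking for the mean of the blood types raises a `ValueError`, mirroring the rule that nominal data are never averaged.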
Exercise 1: Classify the following different measurement systems into one of the four types of
scales.
a) Your checking account number as a name for your account.
b) Your checking account balance as a measure of the amount of money you have in that
account
c) Your score on the first statistics test as a measure of your knowledge of statistics
d) A response to the statement "Abortion is a woman's right" where "Strongly Disagree" =
1, "Disagree" = 2, "No Opinion" = 3, "Agree" = 4, and "Strongly Agree" = 5, as a measure
of attitude toward abortion.
e) Times for swimmers to complete a 50-meter race
f) Months of the year: Meskerem, Tikimit, …
g) Socioeconomic status of a family when classified as low, middle and upper classes.
h) Blood type of individuals, A, B, AB and O.
i) Pollen counts provided as numbers between 1 and 10, where 1 implies there is almost no
pollen and 10 that it is rampant, but for which the values do not represent actual counts
of grains of pollen.
j) Region numbers of Ethiopia
k) The number of students in a college
l) The net wages of a group of workers
m) The height of the men in a town

CHAPTER TWO
2. Methods of Data Collection and Presentation
2.1 Methods of Data Collection
Once it is decided what type of study is to be made, it becomes necessary to collect information
about the subject of the study, mostly in the form of data. In order to draw valid conclusions from
data, the information has to be collected in a systematic manner. Whatever the quality of the sampling
and analysis methods, a haphazardly collected dataset is unlikely to produce valuable and
generalizable information.
2.1.1 Sources of Data
There are two sources of data: primary and secondary sources. Depending on their source,
data can accordingly be classified into two types.
(1). Primary Data (2). Secondary Data
1) Primary data
• Primary data are first-hand information collected, compiled and published by
an organization for some purpose. They are the most original data in character and have not
undergone any sort of statistical treatment.
• They are collected by conducting a survey to meet the specific needs of the problem at
hand.
Example: Population census reports are primary data because they are collected, compiled and
published by the population census organization.
2) Secondary data
• Secondary data are second-hand information that has already been collected by
someone (an organization) for some purpose and is available for the present study.
Secondary data are not pure in character and have undergone some treatment at least once.
• They are taken from already available published or unpublished sources.
There are three major methods of data collection:
1. Self-administered questionnaire
2. Direct investigation: measurement (observation) of the subject and interviewing
(face-to-face, telephone, …)
3. The use of documentary sources
1. Self-administered questionnaire
The questionnaire is the main data collection instrument in a formal sample survey. Before examining
the steps in designing a questionnaire, we need to review the types of questions used in
questionnaires. Depending on the amount of freedom given to the respondent in offering responses,
there are two basic types of questions: open-ended questions and closed-ended questions.
The type of question to use is determined by the form of response wanted, the nature of
the respondents and their ability to answer the questions.
Open-ended questions: allow the respondent to answer freely in his or her own words.
Example: What do you think are the reasons for a high drop-out rate of village health committee
members?
Closed-ended questions: a predetermined list of alternative responses is presented to the
respondent, who checks the appropriate one(s). The respondent’s answers are thus restricted
to a limited range of alternatives.
Advantages
• It is the cheapest method and can be conducted by a single researcher.
• Questionnaires can be sent to a wide geographical area.
• There is no interviewer variability.
Disadvantages
• Low response rate
• No assurance that the questionnaire was answered by the right person
• A mail questionnaire is not suitable for an illiterate community
2. Direct investigation
i. Measurement and/or observation
• Data can be obtained through direct observation or measurement.
• It provides accurate information but is expensive and inconvenient.
Examples: land area measurement, animal weight gain, physical examination, direct observation of
work.
ii. Interview
a) Face-to-Face interview
Advantage:-

• Interviewers can observe the surroundings and can use nonverbal communication and
visual aids.
• The interviewer can help the respondent if he/she has difficulty in understanding the
questions.
• Respondent is likely to answer all the questions alone
Disadvantage:-
• Cost is high
• Interviewer bias is also high
• Untrained interviewer may distort the meaning of the questions
b) Telephone Interview
Advantage:-
• It is less expensive in time and money compared to face to face interviews
• Relatively high response rate
• It reaches people who would not open their doors to an interviewer but might be willing to
talk on the telephone
Disadvantage:-
• Unrepresentative of the groups which do not have telephones
• Unlisted telephone numbers are excluded from the study.
• The respondent may be substituted by another person
3. The use of documentary source
• Extracting information from existing resources.
• It is much less expensive than the other two methods
• It is difficult to get the information needed when records are compiled in
unstandardized manner.
Example: - Hospital records, professional institutes, Official statistics, - - -
Editing of Data:
After collecting the data from either primary or secondary sources, the next step is editing.
Editing means examining the collected data to discover any errors and mistakes before presenting them. It
has to be decided beforehand what degree of accuracy is wanted and what extent of error can be
tolerated in the inquiry. The editing of secondary data is simpler than that of primary data.

2.2 Methods of Data Presentation
2.2.1 Introduction
This topic introduces tabular and graphical methods commonly used to summarize both qualitative and
quantitative data. Tabular and graphical summaries of data can be obtained in annual reports, newspaper
articles and research studies. Everyone is exposed to these types of presentations, so it is important to
understand how they are prepared and how they will be interpreted.
Modern statistical software packages provide extensive capabilities for summarizing data and preparing
graphical presentations. MINITAB, SPSS, STATA and R are four packages that are widely available.
Tabulation of Data: The process of placing classified data into tabular form is known as tabulation. A
table is a systematic arrangement of statistical data in rows and columns. Rows are horizontal
arrangements whereas columns are vertical arrangements.
2.2.2 Frequency Distribution
A frequency distribution is the organization of raw data in table form, using classes and frequencies. There
are three basic types of frequency distributions, and there are specific procedures for constructing each type.
The three types are categorical, ungrouped and grouped frequency distributions.
The reasons for constructing a frequency distribution are as follows
• To organize the data in a meaningful, intelligible way.
• To enable the reader to determine the nature or shape of the distribution
• To facilitate computational procedures for measures of average and spread
• To enable the researcher to draw charts and graphs for the presentation of data
• To enable the reader to make comparisons between different data set
Some basic terms that are frequently used when dealing with frequency distributions are the
following:
• Lower class limits are the smallest numbers that can belong to the different classes.
• Upper class limits are the largest numbers that can belong to the different classes.
• Class boundaries are the numbers used to separate classes, but without the gaps created by
class limits.
• Class midpoints are the midpoints of the classes. Each class midpoint can be found by
adding the lower class limit to the upper class limit and dividing the sum by 2.
• Class width is the difference between two consecutive lower class limits or two
consecutive lower class boundaries.

2.2.2.1 Categorical Frequency Distribution
The categorical frequency distribution is used for data which can be placed in specific categories, such as
nominal or ordinal level data. For example, data such as political affiliation, religious
affiliation, or major field of study would use a categorical frequency distribution.
The major components of categorical frequency distribution are class, tally and frequency. Moreover, even
if percentage is not normally a part of a frequency distribution, it will be added since it is used in certain
types of graphical presentations, such as pie graph.
Steps of constructing categorical frequency distribution
1. Check that the data are on a nominal or ordinal scale of measurement.
2. Make a table as shown below.
A       B       C           D
Class   Tally   Frequency   Percent
3. Put the distinct values of the data set in column A.
4. Tally the data and place the results in column B.
5. Count the tallies and place the results in column C.
6. Find the percentage of values in each class by using the formula
$$\% = \frac{f}{n} \times 100\%$$
where $f$ is the frequency of the class and $n$ is the total number of values.
Example 2.1: Twenty-five army inductees were given a blood test to determine their blood type. The data
set is given as follows:
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
Construct a frequency distribution for the data.

Solution:
A       B            C           D
Class   Tally        Frequency   Percent
A       ////         5           20
B       //// //      7           28
O       //// ////    9           36
AB      ////         4           16
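The counting in Example 2.1 can be checked with a short script. The following is only a minimal Python sketch (Python is not part of the module, and the variable names are illustrative assumptions); it tallies each blood type and applies the percentage formula f/n × 100.

```python
from collections import Counter

# Blood types of the 25 inductees from Example 2.1, row by row.
blood_types = (
    "A B B AB O "
    "O O B AB B "
    "B B O A O "
    "A O O O AB "
    "AB A O B A"
).split()

counts = Counter(blood_types)   # frequency of each class
n = len(blood_types)            # total number of values

# One row per class: (class, frequency, percent = f * 100 / n).
table = [(cls, counts[cls], counts[cls] * 100 / n) for cls in ("A", "B", "O", "AB")]

for cls, f, pct in table:
    print(f"{cls:<3} {f:>2} {pct:>5.0f}%")
```

The printed rows should agree with columns C and D of the solution table.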

2.2.2.2 Ungrouped Frequency Distribution


When the data are numerical instead of categorical, the range of the data is small and each class is only one
unit wide, the distribution is called an ungrouped frequency distribution.
The major components of this type of frequency distributions are class, tally, frequency and cumulative
frequency. The steps are almost similar with that of categorical frequency distribution.
Cumulative frequencies are used to show how many values are accumulated up to and including a specific
class.
Example 2.2: The following data represent the number of days of sick leave taken by each of 50 workers
of a company over the last 6 weeks.
2 0 0 5 8 3 4 1 0 0 7 1
7 1 5 4 0 4 0 1 8 9 7 0
1 7 2 5 5 4 3 3 0 0 2 5
1 3 0 2 4 5 0 5 7 5 1 1
0 2
A. Construct ungrouped frequency distribution
B. How many workers had at least 1 day of sick leave?
Solution:
A. Since this data set contains only a relatively small number of distinct or different values, it is
convenient to represent it in a frequency table which presents each distinct value along with its
frequency of occurrence.
Class   Frequency   Cumulative Frequency
0       12          12
1       8           20
2       5           25
3       4           29
4       5           34
5       8           42
7       5           47
8       2           49
9       1           50

B. Since 12 of the 50 workers had no days of sick leave, the answer is 50 − 12 = 38.
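As a cross-check of Example 2.2, the sketch below rebuilds the frequency and cumulative frequency columns in Python. It is an illustration under the assumption that the 50 values are read row by row as printed; the variable names are not from the module.

```python
from collections import Counter
from itertools import accumulate

# Days of sick leave for the 50 workers in Example 2.2.
sick_days = [
    2, 0, 0, 5, 8, 3, 4, 1, 0, 0, 7, 1,
    7, 1, 5, 4, 0, 4, 0, 1, 8, 9, 7, 0,
    1, 7, 2, 5, 5, 4, 3, 3, 0, 0, 2, 5,
    1, 3, 0, 2, 4, 5, 0, 5, 7, 5, 1, 1,
    0, 2,
]

counts = Counter(sick_days)
classes = sorted(counts)                       # distinct values that occur
frequencies = [counts[c] for c in classes]
cumulative = list(accumulate(frequencies))     # running total up to each class

# Part B: workers with at least one day = all workers minus those with zero days.
at_least_one = len(sick_days) - counts[0]

for c, f, cf in zip(classes, frequencies, cumulative):
    print(c, f, cf)
print("At least one day:", at_least_one)
```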

2.2.2.3 Grouped Frequency Distribution


When the range of the data is large, the data must be grouped in which each class has more than one unit
in width. While we construct this frequency distribution, we have to follow the following steps.
1. Find the highest and the lowest values
2. Find the range; 𝑅𝑎𝑛𝑔𝑒 = 𝑀𝑎𝑥𝑖𝑚𝑢𝑚 − 𝑀𝑖𝑛𝑖𝑚𝑢𝑚 or 𝑅 = 𝐻 − 𝐿
3. Select the number of classes desired. Here, we have two choices to get the desired number of
classes:
a) Use Sturges' rule. That is, $K = 1 + 3.32 \log_{10} n$, where $K$ is the number of classes and $n$
is the number of observations.
b) Select the number of classes arbitrarily between 5 and 20. This is a conventional way. If you
cannot calculate $K$ by Sturges' rule, this method is more appropriate.
When we choose the number of classes, we have to think about the following criteria
• The classes must be mutually exclusive. Mutually exclusive classes have non
overlapping class limits so that values can’t be placed in to two classes.
• The classes must be continuous. Even if there are no values in a class, the class must be
included in the frequency distribution. There should be no gaps in a frequency
distribution. The only exception occurs when the class with a zero frequency is the first
or last. A class width with a zero frequency at either end can be omitted without affecting
the distribution.
• The classes must be equal in width. The reason for having classes with equal width is
so that there is not a distorted view of the data. One exception occurs when a distribution
is open-ended. i.e., it has no specific beginning or end values.
4. Find the class width by dividing the range by the number of classes:
$$\text{Width} = \frac{\text{Range}}{\text{Number of classes}} \quad \text{or} \quad W = \frac{R}{K}$$
Note that: Round the answer up to the next whole number if there is a remainder. For instance,
4.7 ≈ 5 and 4.12 ≈ 5
5. Select the starting point as the lowest class limit. This is usually the lowest score (observation).
Add the width to that score to get the lower class limit of the next class. Keep adding until you
achieve the number of desired class(𝐾) calculated in step 3.
6. Find the upper class limit; subtract unit of measurement(𝑈) from the lower class limit of the second
class in order to get the upper limit of the first class. Then add the width to each upper class limit
to get all upper class limits.
Unit of measurement: this is the gap between a value and the next possible value. For instance, for the data
28, 23, 52 the unit of measurement is one: take one datum arbitrarily, say 23; the next possible
value is 24. Therefore, 𝑈 = 24 − 23 = 1. If the data are 24.12, 30, 21.2, give priority to the
datum with the most decimal places. Take 24.12 and guess the next possible value. It is 24.13.
Therefore, 𝑈 = 24.13 − 24.12 = 0.01.
Note that: 𝑈 = 1 is the maximum value of the unit of measurement and is the value used when we don’t
have a clue about the data.
7. Find the class boundaries.
$$\text{Lower Class Boundary} = \text{Lower Class Limit} - \frac{U}{2}$$
and
$$\text{Upper Class Boundary} = \text{Upper Class Limit} + \frac{U}{2}$$
In short, $LCB = LCL - \frac{U}{2}$ and $UCB = UCL + \frac{U}{2}$.
8. Tally the data and write the numerical values for tallies in the frequency column
9. Find the cumulative frequency. There are two types of cumulative frequency, namely less than
cumulative frequency and more than cumulative frequency. Less than cumulative frequency is
obtained by successively adding the frequencies of all the previous classes, including the class
against which it is written. The accumulation starts from the lowest class and moves to the highest. More than
cumulative frequency is obtained by accumulating the frequencies starting from the
highest class and moving to the lowest.
For example, the following frequency distribution table gives the marks obtained by 40 students:

The above table shows how to find less than cumulative frequency and the table shown below
shows how to find more than cumulative frequency.

Example 2.3: Consider the following set of data and construct the grouped frequency distribution.
11 29 6 33 14 21 18 17 22 38
31 22 27 19 22 23 26 39 34 27
Steps
1. Highest value = 39, Lowest value = 6
2. R = 39 − 6 = 33
3. K = 1 + 3.32 log 20 = 5.32 ≈ 6
4. W = R/K = 33/6 = 5.5 ≈ 6
5. Select starting point. Take the minimum which is 6 then add width 6 on it to get the next class
LCL.
6 12 18 24 30 36
6. Upper class limit. Since unit of measurement is one. 12 − 1 = 11. So 11 is the UCL of the first
class. Therefore, 6 − 11 is the first class
Class limit 6-11 12-17 18-23 24-29 30-35 36-41
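7. Find the class boundaries. Use the formulas in step 7 of the procedure with U/2 = 0.5:
LCB1 = LCL1 − 0.5 and UCB1 = UCL1 + 0.5
Class boundaries 5.5-11.5 11.5-17.5 17.5-23.5 23.5-29.5 29.5-35.5 35.5-41.5
8. Tally the data and find the frequencies and cumulative frequencies (steps 8 and 9 of the procedure).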
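The whole procedure for Example 2.3 can be sketched in a few lines of Python. This is only an illustration of the steps listed above (Sturges' rule, width rounded up, class limits, class boundaries, frequencies and less than cumulative frequencies); the variable names and the tuple layout are assumptions, not part of the module.

```python
import math

# Data of Example 2.3.
data = [11, 29, 6, 33, 14, 21, 18, 17, 22, 38,
        31, 22, 27, 19, 22, 23, 26, 39, 34, 27]
unit = 1  # unit of measurement U (the data are whole numbers)

data_range = max(data) - min(data)                  # R = 39 - 6 = 33
k = math.ceil(1 + 3.32 * math.log10(len(data)))     # Sturges' rule, rounded up: 6
width = math.ceil(data_range / k)                   # W = 33 / 6 = 5.5, rounded up: 6

rows = []
cumulative = 0
for i in range(k):
    lcl = min(data) + i * width          # lower class limit
    ucl = lcl + width - unit             # upper class limit
    f = sum(lcl <= x <= ucl for x in data)
    cumulative += f                      # less than cumulative frequency
    rows.append((lcl, ucl, lcl - unit / 2, ucl + unit / 2, f, cumulative))

for lcl, ucl, lcb, ucb, f, cf in rows:
    print(f"{lcl}-{ucl}  {lcb}-{ucb}  {f}  {cf}")
```

Each printed row gives the class limits, the class boundaries, the frequency and the less than cumulative frequency of one class.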

2.2.2.4 Relative Frequency Distribution


An important variation of the basic frequency distribution uses relative frequencies, which are easily found
by dividing each class frequency by the total of all frequencies. A relative frequency distribution includes
the same class limits as a frequency distribution, but relative frequencies are used instead of actual
frequencies. The relative frequencies are sometimes expressed as percentages.
$$\text{Relative Frequency} = \frac{\text{Class Frequency}}{\text{Sum of all frequencies}}$$
Relative frequency distribution enables us to understand the distribution of the data and to compare different
sets of data.
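As a small illustration, the relative frequencies for the blood type frequencies of Example 2.1 can be computed as follows. This is a sketch in Python with illustrative names; the frequencies are those of the solution table in that example.

```python
# Class frequencies from Example 2.1.
frequencies = {"A": 5, "B": 7, "O": 9, "AB": 4}

total = sum(frequencies.values())

# Relative frequency = class frequency / sum of all frequencies.
relative = {cls: f / total for cls, f in frequencies.items()}

for cls, rf in relative.items():
    print(f"{cls:<3} {rf:.2f}")
```

The relative frequencies always add up to 1 (or 100 when expressed as percentages), which is what makes different data sets comparable.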
2.2.3 Diagrammatic and Graphical Presentation of Data
We have discussed the techniques of classification and tabulation that help us in organizing the collected
data in a meaningful fashion. However, this way of presenting statistical data does not always prove to
be interesting to a layman. Too many figures are often confusing and fail to convey the message effectively.
One of the most effective and interesting alternative ways in which statistical data may be presented is
through diagrams and graphs. There are several ways in which statistical data may be displayed pictorially,
such as different types of graphs and diagrams.
General steps in constructing graphs
1. Draw and label the x and y axes
2. Choose a suitable scale for the frequencies or cumulative frequencies and label it on the y axis.

3. Represent the class boundaries for the histogram or Ogive or the mid-point for the frequency
polygon on the x axis.
4. Plot the points
5. Draw the bars or lines
2.2.3.1 Diagrammatic display of data: Bar charts, Pie-chart, Cartograms
I. Pie chart
A pie chart can be used to compare the relation between the whole and its components. A pie chart is a circular
diagram in which the areas of the sectors of a circle represent the components. Circles are drawn with radii proportional
to the square root of the quantities because the area of a circle is πr².
To construct a pie chart (sector diagram), we draw a circle with radius proportional to the square root of the total. The total
angle of the circle is 360°. The angle of each component is calculated by the formula
$$\text{Angle of Sector} = \frac{\text{Component Part}}{\text{Total}} \times 360^\circ$$
These angles are marked in the circle by means of a protractor to show the different components. The sectors
are usually arranged anticlockwise.
Example 2.4: The following table gives the details of monthly budget of a family. Represent these figures
by a suitable diagram.

Solution: The necessary computations are given below:

[Figure: pie chart "Monthly family budget". Food 40%, House Rent 27%, Miscellaneous 20%, Fuel and Light 7%, Clothing 6%]
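The sector angle formula can be applied directly to the shares shown in the pie chart. The sketch below is an illustration in Python; because the original table of amounts is not reproduced here, it assumes the percentage labels of the chart as the component parts, and the names are illustrative.

```python
# Percentage shares read from the pie chart labels of Example 2.4.
shares = {
    "Food": 40,
    "House Rent": 27,
    "Miscellaneous": 20,
    "Fuel and Light": 7,
    "Clothing": 6,
}

total = sum(shares.values())

# Angle of sector = component part / total * 360 degrees.
angles = {item: part * 360 / total for item, part in shares.items()}

for item, angle in angles.items():
    print(f"{item:<15} {angle:6.1f} degrees")
```

For instance, Food takes 40/100 × 360° = 144°, and the five angles together make up the full 360°.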

II. Bar Charts


The bar chart (simple bar chart, multiple bar chart and stratified or stacked bar chart) uses vertical or
horizontal bars to represent the frequencies of a distribution. While we draw bar chart, we have to consider
the following two points. These are
• Make the bars the same width
• Make the units on the axis that are used for the frequency equal in size
a) A simple bar chart is used to represent data involving only one variable classified on a spatial,
quantitative or temporal basis. In a simple bar chart, we make bars of equal width but variable
length, i.e. the magnitude of a quantity is represented by the height or length of the bars.
The following steps are undertaken in drawing a simple bar diagram:
• Draw two perpendicular lines, one horizontal and the other vertical, at an appropriate
place on the paper.
• Take the basis of classification along the horizontal line (X-axis) and the observed variable
along the vertical line (Y-axis), or vice versa.
• Mark bars of equal breadth for each class and leave a gap of equal breadth, or not less than half
the breadth, between two classes.
• Finally, mark the values of the given variable to prepare the required bars.
Example 2.5: Draw simple bar diagram to represent the profits of a bank for 5 years.

Years                1989   1990   1991   1992   1993
Profit (million $)   10     12     18     25     42

b) Multiple bar charts are used when two or more sets of inter-related data are to be represented (a multiple
bar diagram facilitates comparison between more than one phenomenon). The technique of the
simple bar chart is used to draw this diagram, but the difference is that we use different shades,
colors, or dots to distinguish between the different phenomena.
Example 2.6: Draw a multiple bar chart to represent the import and export of Canada (values in $) for the
years 1991 to 1995.
Years 1991 1992 1993 1994 1995
Imports 7930 8850 9780 11720 12150
Exports 4260 5225 6150 7340 8145

c) Stratified (stacked or component) bar chart is used to represent data in which the total magnitude
is divided into different parts or components. In this diagram, we first make a simple bar for each class taking
the total magnitude in that class, and then divide these simple bars into parts in the ratio of the various
components. This type of diagram shows the variation in the different components within each class as
well as between different classes. The sub-divided bar diagram is also known as a component bar chart or
stacked chart.
Example 2.7: The table below shows the quantity in hundred kgs of Wheat, Barley and Oats produced on
a certain farm during the years 1991 to 1994. Draw a stratified bar chart.
Years 1991 1992 1993 1994
Wheat 34 43 43 45
Barley 18 14 16 13
Oats 27 24 27 34

Solution: To make the component bar chart, first of all we have to take year wise total production.

Years    1991   1992   1993   1994
Wheat    34     43     43     45
Barley   18     14     16     13
Oats     27     24     27     34
Total    79     81     86     92

The required diagram is given below:

2.2.3.2. Graphical presentation of data: Histogram, Frequency Polygon, Ogive Curves


Statistical graphs can be used to describe the data set or to analyze it. Graphs are also useful in getting the
audience’s attention in a publication or a speaking presentation.
They can be used to discuss an issue, reinforce a critical point, or summarize a data set. They can also be
used to discover a trend or pattern in a situation over a period of time.
The three most commonly used graphs in research are
i. The histogram.
ii. The frequency polygon.
iii. The cumulative frequency graph, or ogive (pronounced o-jive).

i. Histogram
A histogram is a special type of bar chart in which the horizontal scale represents classes of data values and
the vertical scale represents frequencies. The heights of the bars correspond to the frequency values, and the
bars are drawn adjacent to each other (without gaps).
We can construct a histogram after we have first completed a frequency distribution table for a data set.
Example 2.8: Take the data in example 2.3.

[Figure: histogram of the data in Example 2.3. Vertical axis: Frequency (0 to 7); horizontal axis: Class boundaries (5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5)]

A relative frequency histogram has the same shape and horizontal (𝑥 axis) scale as a histogram, but the vertical
(𝑦 axis) scale is marked with relative frequencies instead of actual frequencies.
ii. Frequency Polygon
A frequency polygon uses line segments connected to points located directly above the class midpoint values.
The heights of the points correspond to the class frequencies, and the line segments are extended to the left
and right so that the graph begins and ends on the horizontal axis, at the points where the previous
and next class midpoints would be located.
Example 2.9: Take the data in example 2.3.

[Figure: frequency polygon of the data in Example 2.3. Vertical axis: Frequency; horizontal axis: Midpoints (2.5, 8.5, 14.5, 20.5, 26.5, 32.5, 38.5, 44.5)]
iii. Ogive Graph
An Ogive (pronounced as “oh-jive”) is a line graph that depicts cumulative frequencies, just as the cumulative
frequency distribution lists cumulative frequencies. Note that the Ogive uses class boundaries along the
horizontal scale, and the graph begins with the lower boundary of the first class and ends with the upper
boundary of the last class. An Ogive is useful for determining the number of values below some particular
value. There are two types of Ogive, namely the less than Ogive and the more than Ogive. The difference is that
the less than Ogive uses less than cumulative frequencies and the more than Ogive uses more than cumulative
frequencies on the 𝑦 axis.
Example 2.10: Take the data in example 2.3 and draw less than and more than Ogive

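[Figure: less than Ogive and more than Ogive of the data in Example 2.3. Vertical axis: cumulative frequency (0 to 20); horizontal axis: Class Boundaries (5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5)]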

CHAPTER THREE
3. Measures of Central Tendency

3.1 Introduction
When we want to make comparison between groups of numbers it is good to have a single value that is
considered to be a good representative of each group. This single value is called the average of the
group. Averages are also called measures of central tendency.
Measures of central tendency are numerical measures which intend to describe the middle value, the
central value or the typical value in a given data set. An average which is representative is called a typical
average, and an average which is not representative and has only a theoretical value is called a descriptive
average.
At the end of this chapter students will be able to:
 Identify measure of central tendency.
 Understand properties of arithmetic mean.
 Summarize an aggregate of statistical data by using single measure.
 Define and calculate the mean, mode and median.
 Measure the position of data using quartiles, deciles and percentiles with their
interpretation.
3.2 Objectives of Measures of Central Tendency
• To comprehend the data easily or to describe the data in a concise manner.
• To facilitate comparison.
• To summarize or reduce the volume of the data.
• To make further statistical analysis possible.
3.3 The Summation Notation (∑)
Statistical symbols: Let a data set consist of a number of observations, represented by $x_1, x_2, \dots, x_n$,
where n (the last subscript) denotes the number of observations in the data and $x_i$ is the ith
observation. Then the sum of all the numbers $x_i$, where i goes from 1 up to n, is symbolically given
by $\sum_{i=1}^{n} x_i$, $\sum x_i$ or $\sum x$; that is,
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$$
x - Whole set of numbers
𝑥𝑖 - Specific score in a set of numbers
n - Total number of observations
For instance, a data set consisting of the six measurements 2, 3, 9, 10, 8 and -2 is represented by
$x_1, x_2, \dots, x_6$ where $x_1 = 2$, $x_2 = 3$, $x_3 = 9$, $x_4 = 10$, $x_5 = 8$ and $x_6 = -2$. Their sum becomes
$$\sum_{i=1}^{6} x_i = x_1 + x_2 + \dots + x_6 = 2 + 3 + 9 + 10 + 8 + (-2) = 30$$
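Some Properties of the Summation Notation
1. $\sum_{i=1}^{n} c = n \cdot c$, where c is a constant number.
2. $\sum_{i=1}^{n} b x_i = b \sum_{i=1}^{n} x_i$, where b is a constant number.
3. $\sum_{i=1}^{n} (a + b x_i) = n \cdot a + b \sum_{i=1}^{n} x_i$
4. $\sum_{i=1}^{n} (x_i \pm y_i) = \sum_{i=1}^{n} x_i \pm \sum_{i=1}^{n} y_i$
5. $\sum_{i=1}^{n} x_i y_i \neq \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i$ (in general)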


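Example 3.1: Given $\sum_{i=1}^{7} x_i = 20$, $\sum_{i=1}^{7} y_i = 30$, $\sum_{i=1}^{7} x_i^2 = 420$ and $\sum_{i=1}^{7} y_i^2 = 280$, find
i/ $\sum_{i=1}^{7} (6x_i + 4y_i) = 6\sum_{i=1}^{7} x_i + 4\sum_{i=1}^{7} y_i = 6(20) + 4(30) = 240$
ii/ $3\sum_{i=1}^{7} x_i^2 - 2\sum_{i=1}^{7} y_i^2 = 3(420) - 2(280) = 700$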
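The summation properties can be verified numerically. The sketch below is an illustration in Python: the x values are the six measurements used earlier in this section, while the y values and the constants a, b and c are hypothetical choices made only for the check.

```python
x = [2, 3, 9, 10, 8, -2]   # six measurements from the text
y = [1, 4, 0, 5, 2, 7]     # hypothetical second data set
a, b, c = 3, 5, 7          # hypothetical constants
n = len(x)

prop1 = sum(c for _ in x) == n * c                                # sum of a constant
prop2 = sum(b * xi for xi in x) == b * sum(x)                     # constant factor
prop3 = sum(a + b * xi for xi in x) == n * a + b * sum(x)         # linear expression
prop4 = sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)   # sum of sums

# Property 5: the sum of products generally differs from the product of sums.
sum_of_products = sum(xi * yi for xi, yi in zip(x, y))
product_of_sums = sum(x) * sum(y)

print(prop1, prop2, prop3, prop4)
print(sum_of_products, product_of_sums)
```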

3.4 Important Characteristics of Measures of Central Tendency


A typical average should possess the following characteristics:
• It should be rigidly defined, exist and be unique.
• It should be based on all observations under investigation.
• It should be affected as little as possible by extreme observations.
• It should be capable of further algebraic treatment.
• It should be affected as little as possible by fluctuations of sampling.
• It should be easy to calculate and simple to understand.
3.5 Types of Measures of Central Tendency
Measures of central tendency give us information about the location of the center of the
distribution of data values. A single value that approximately describes the characteristics of the
entire mass of data is called a measure of central tendency. We will discuss briefly the three
measures of central tendency, the mean, the median and the mode, in this unit.
The following are the types of measures of central tendency, each of which is suitable for a particular type of data:
• Mean
  - Arithmetic mean
  - Weighted arithmetic mean
  - Combined mean
  - Geometric mean
  - Harmonic mean
• Median
• Mode or modal value

3.5.1 Arithmetic Mean
Arithmetic mean is defined as the sum of the measurements of the items divided by the total number of
items. It is usually denoted by 𝑥̅ .

Arithmetic Mean for individual series


Suppose $x_1, x_2, \dots, x_n$ are observed values in a sample of size n from a population of size N, n < N;
then the arithmetic mean of the sample, denoted by $\bar{x}$, is given by
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
If we take an entire population, the mean is denoted by μ and is given by:
$$\mu = \frac{X_1 + X_2 + \dots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$$
where N stands for the total number of observations in the population.


Example 3.2: The data represent the number of days off per year for a sample of individuals
selected from nine different countries. Find the mean.
20, 26, 40, 36, 23, 42, 35, 24, 30
Solution:
The sample values are 20, 26, 40, 36, 23, 42, 35, 24, and 30.
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{20 + 26 + 40 + 36 + 23 + 42 + 35 + 24 + 30}{9} = \frac{276}{9} \approx 30.7 \text{ days}$$
Hence, the mean number of days off is 30.7 days.
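The calculation in Example 3.2 can be reproduced with a minimal Python sketch (the variable names are illustrative):

```python
# Number of days off per year for the nine sampled individuals (Example 3.2).
days_off = [20, 26, 40, 36, 23, 42, 35, 24, 30]

total = sum(days_off)            # sum of the observations
n = len(days_off)                # sample size
mean = total / n                 # arithmetic mean

print(total, n, round(mean, 1))
```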


Arithmetic mean for discrete data arranged in frequency distribution

When the numbers $x_1, x_2, \dots, x_k$ occur with frequencies $f_1, f_2, \dots, f_k$, respectively, the
mean can be expressed in a more compact form as:
$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \dots + x_k f_k}{f_1 + f_2 + \dots + f_k} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$$
Example 3.3: Calculate the arithmetic mean of the sample of numbers of students in 10 classes:
50 42 48 60 58 54 50 42 50 42
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{50 + 42 + 48 + 60 + 58 + 54 + 50 + 42 + 50 + 42}{10} = \frac{496}{10} = 49.6 \approx 50$$

In this case there are three 42’s, one 48, three 50’s, one 54, one 58 and one 60. The number of
times each number occurs is called its frequency, and the frequency is usually denoted by f. The
information in the sentence above can be written in a table, as follows.
Value, xi       42    48   50    54   58   60
Frequency, fi   3     1    3     1    1    1
xifi            126   48   150   54   58   60
The formula for the arithmetic mean for data of this type is
$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \dots + x_k f_k}{f_1 + f_2 + \dots + f_k} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$$
In this case we have:
$$\bar{x} = \frac{42 \times 3 + 48 \times 1 + 50 \times 3 + 54 \times 1 + 58 \times 1 + 60 \times 1}{3 + 1 + 3 + 1 + 1 + 1} = \frac{126 + 48 + 150 + 54 + 58 + 60}{10} = \frac{496}{10} = 49.6 \approx 50$$
The mean number of students in the ten classes is approximately 50.
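The two routes to the mean in Example 3.3, directly from the raw values and from the frequency table, can be compared with a short Python sketch (names are illustrative):

```python
from collections import Counter

# Numbers of students in the 10 classes (Example 3.3).
class_sizes = [50, 42, 48, 60, 58, 54, 50, 42, 50, 42]

# Frequency table: value -> number of times it occurs.
freq = Counter(class_sizes)

# Mean from the frequency table: sum(x_i * f_i) / sum(f_i).
mean_from_table = sum(x * f for x, f in freq.items()) / sum(freq.values())

# Mean computed directly from the raw observations.
mean_direct = sum(class_sizes) / len(class_sizes)

print(dict(freq))
print(mean_from_table, mean_direct)
```

Both computations give the same value because the frequency table only regroups the same observations.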

Arithmetic Mean for Grouped Continuous Frequency Distribution


If data are given in the form of a continuous frequency distribution, the sample mean can be
computed as
$$\bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i} = \frac{x_1 f_1 + x_2 f_2 + \dots + x_k f_k}{f_1 + f_2 + \dots + f_k}$$
where $x_i$ is the class mark of the ith class, i = 1, 2, . . . , k, $f_i$ is the
frequency of the ith class and k is the number of classes.
Note that $\sum_{i=1}^{k} f_i = n$ = the total number of observations.
Example 3.4: The following frequency table gives the height (in inches) of 100 students in a
college.
Class Interval (CI)   60-62   62-64   64-66   66-68   68-70   70-72   Total
Frequency (f)         5       18      42      20      8       7       100
Calculate the mean.


Solution: The formula to be used for the mean is as follows:
$$\bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$$
Let us calculate these values and put them in a table for the sake of convenience.
Class Interval (CI)   60-62   62-64   64-66   66-68   68-70   70-72   Total
Frequency (f)         5       18      42      20      8       7       100
Mid-Point (xi)        61      63      65      67      69      71
fi xi                 305     1134    2730    1340    552     497     6558
Substituting these values with $\sum_{i=1}^{6} f_i = 100$, we get
$$\bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i} = \frac{6558}{100} = 65.58$$
The mean height of the students is 65.58 inches.
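The grouped mean of Example 3.4 can be reproduced as follows. This is a minimal Python sketch with illustrative names; each class is represented by its midpoint (class mark), as in the solution table.

```python
# Class intervals (in inches) and frequencies from Example 3.4.
classes = [(60, 62), (62, 64), (64, 66), (66, 68), (68, 70), (70, 72)]
frequencies = [5, 18, 42, 20, 8, 7]

# Class mark = midpoint of each class interval.
midpoints = [(low + high) / 2 for low, high in classes]

# Products f_i * x_i, their total, and the grouped mean.
products = [f * x for f, x in zip(frequencies, midpoints)]
n = sum(frequencies)
mean = sum(products) / n

print(midpoints)
print(products, sum(products))
print(mean)
```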

Properties of the Arithmetic Mean


• The algebraic sum of the deviations of a set of numbers $x_1, x_2, \dots, x_n$ from their mean $\bar{x}$ is
always zero, i.e.
$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$
• The sum of the squares of the deviations of the observations is least when the deviations are taken about the mean.
That is, $\sum_{i=1}^{n} (x_i - A)^2$ is minimum when $A = \bar{x}$.
• If the mean of $x_1, x_2, \dots, x_n$ is $\bar{x}$, then
a) The mean of $x_1 \pm k, x_2 \pm k, \dots, x_n \pm k$ will be $\bar{x} \pm k$.
b) The mean of $kx_1, kx_2, \dots, kx_n$ will be $k\bar{x}$.
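These properties can be checked numerically. The following Python sketch uses the six measurements from Section 3.3; the alternative centers and the constant k are hypothetical values chosen only for the illustration.

```python
x = [2, 3, 9, 10, 8, -2]   # six measurements from Section 3.3
n = len(x)
mean = sum(x) / n          # 5.0

# Property 1: deviations from the mean add up to zero.
deviation_sum = sum(xi - mean for xi in x)

# Property 2: the sum of squared deviations about A is smallest at A = mean.
def sum_sq(a):
    return sum((xi - a) ** 2 for xi in x)

candidates = [mean - 1, mean - 0.5, mean + 0.5, mean + 1]   # hypothetical alternatives
mean_is_best = all(sum_sq(mean) < sum_sq(a) for a in candidates)

# Property 3: adding k shifts the mean by k; multiplying by k scales it by k.
k = 4   # hypothetical constant
shifted_mean = sum(xi + k for xi in x) / n
scaled_mean = sum(k * xi for xi in x) / n

print(deviation_sum, sum_sq(mean), mean_is_best, shifted_mean, scaled_mean)
```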
Merits of Arithmetic Mean
 Arithmetic mean has a rigidly defined mathematical formula so that its value is always
definite or unique. It can be calculated for any set of numerical data.
 It is calculated based on all observations.
 Arithmetic mean is simple to calculate and easy to understand.
 It doesn’t need arrangement of data in increasing or decreasing order to calculate the
results.
 Arithmetic mean of many samples from the same population does not fluctuate
considerably.
 It affords a good standard of comparison.
Demerits of Arithmetic Mean
• It can’t be calculated for data which are not quantifiable.
• It is highly affected by extreme (abnormal) values in the series.
• It can be a number which does not exist in the series.
• It can’t be calculated for grouped continuous open-ended classes.
Weighted Arithmetic Mean

While calculating the simple arithmetic mean, all items were assumed to be of equal importance
(each value in the data set has equal weight). When the observations have different weights, we use
the weighted average. Weights are assigned to each item in proportion to its relative importance.
If $x_1, x_2, \dots, x_n$ represent the values of the items and $w_1, w_2, \dots, w_n$ are the corresponding weights,
then the weighted mean, $\bar{x}_w$, is given by
$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum w_i x_i}{\sum w_i}$$
Example 3.5: A student’s final marks in Mathematics, Physics, Chemistry and Biology are
respectively A, B, D and C. If the respective credits (weights) received for these courses are 4, 4, 3
and 2, determine the average grade the student has got for the courses.
Solution
We use a weighted arithmetic mean, the weight associated with each course being taken as the number
of credits received for the corresponding course. The grades are converted to grade points with A = 4, B = 3, C = 2 and D = 1.
xi      4    3    1   2   Total
wi      4    4    3   2   13
xi wi   16   12   3   4   35
$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum w_i x_i}{\sum w_i} = \frac{16 + 12 + 3 + 4}{13} = \frac{35}{13} \approx 2.69$$
The average grade of the student is approximately 2.69.
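Example 3.5 can be reproduced with a short Python sketch. The grade point mapping (A = 4, B = 3, C = 2, D = 1) is an assumption read off the x values in the solution table, and the names are illustrative.

```python
# Assumed grade points, inferred from the solution table of Example 3.5.
grade_points = {"A": 4, "B": 3, "C": 2, "D": 1}

# (course, letter grade, credits) for the four courses.
courses = [
    ("Mathematics", "A", 4),
    ("Physics", "B", 4),
    ("Chemistry", "D", 3),
    ("Biology", "C", 2),
]

weighted_sum = sum(grade_points[grade] * credit for _, grade, credit in courses)
total_weight = sum(credit for _, _, credit in courses)
weighted_mean = weighted_sum / total_weight

print(weighted_sum, total_weight, round(weighted_mean, 2))
```

An unweighted mean of the four grade points would be (4 + 3 + 1 + 2)/4 = 2.5, so the credits pull the average toward the grades earned in the heavier courses.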

Combined mean: When a set of observations is divided into k groups and $\bar{x}_1$ is the mean of $n_1$
observations of group 1, $\bar{x}_2$ is the mean of $n_2$ observations of group 2, …, $\bar{x}_k$ is the mean of $n_k$
observations of group k, then the combined mean, denoted by $\bar{x}_c$, of all observations taken together
is given by
$$\bar{x}_c = \frac{\bar{x}_1 n_1 + \bar{x}_2 n_2 + \dots + \bar{x}_k n_k}{n_1 + n_2 + \dots + n_k}$$
This is a special case of the weighted mean. In this case the sample sizes are the weights.

Example 3.6: In the previous year there were two sections taking the Statistics course. At the end of
the semester, the two sections got average marks of 70 and 78. There were 45 and 50 students in
the two sections, respectively. Find the mean mark for all the students.
Solution:
$$\bar{x}_c = \frac{\bar{x}_1 n_1 + \bar{x}_2 n_2}{n_1 + n_2} = \frac{70 \times 45 + 78 \times 50}{45 + 50} = \frac{7050}{95} \approx 74.21$$
The combined mean for all the students is 74.21.
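The combined mean formula can be written as a small helper. The sketch below is an illustration in Python; the function name `combined_mean` is hypothetical and is applied here to the two sections of Example 3.6.

```python
def combined_mean(means, sizes):
    """Combined mean of several groups, weighting each group mean by its size."""
    if len(means) != len(sizes):
        raise ValueError("means and sizes must have the same length")
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)

# Example 3.6: section means 70 and 78 with 45 and 50 students.
result = combined_mean([70, 78], [45, 50])

print(round(result, 2))
```

Note that the simple average of 70 and 78 is 74, which is slightly lower than the combined mean because the larger section has the higher average.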


Geometric Mean

The geometric mean, like the arithmetic mean, is a calculated average. It is used when the observed values
are measured as ratios, percentages, proportions, indices or growth rates.
Geometric mean for individual series: The geometric mean, G.M., of an individual series of
positive numbers (> 0) $x_1, x_2, \dots, x_n$ is defined as the nth root of their product.
$$G.M. = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = \operatorname{antilog}\left(\frac{1}{n} \sum_{i=1}^{n} \log x_i\right)$$

Example 3.7: Find the G.M. of a) 3 and 12 b) 2, 4 and 8
Solution: a) $G.M. = \sqrt{3 \times 12} = \sqrt{36} = 6$; b) $G.M. = \sqrt[3]{2 \times 4 \times 8} = \sqrt[3]{64} = 4$

Properties of geometric mean
• It is less affected by extreme values. E.g. for x = 2, 5, 8, 72, compare the arithmetic and
geometric means.
• It takes each and every observation into consideration.
• If the value of any one observation is zero, the geometric mean becomes zero.
Geometric mean for discrete data arranged in FD: When the numbers $x_1, x_2, \dots, x_m$ occur
with frequencies $f_1, f_2, \dots, f_m$, respectively, the geometric mean is obtained by
$$G.M. = \sqrt[n]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_m^{f_m}} = \operatorname{antilog}\left(\frac{1}{n} \sum_{i=1}^{m} f_i \log x_i\right)$$
where n is the sum of $f_i$ for all i.
Example 3.8: Compute the geometric mean of the following values: 3, 3, 4, 4, 4, 5, 6 and 6.
Solution
Values      3   4   5   6
Frequency   2   3   1   2
$$G.M. = \sqrt[8]{3^2 \times 4^3 \times 5^1 \times 6^2} = 4.236$$
The geometric mean for the given data is 4.236.
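The two equivalent forms of the geometric mean, the nth root of the product and the antilog of the mean logarithm, can be compared on the data of Example 3.8. This is a minimal Python sketch (it assumes Python 3.8 or later for `math.prod`; the names are illustrative).

```python
import math

# Values of Example 3.8 with their frequencies.
frequency_table = {3: 2, 4: 3, 5: 1, 6: 2}
n = sum(frequency_table.values())   # total number of observations

# Form 1: nth root of the product of x_i ** f_i.
product = math.prod(x ** f for x, f in frequency_table.items())
gm_root = product ** (1 / n)

# Form 2: antilog of (1/n) * sum(f_i * log x_i), using base-10 logarithms.
mean_log = sum(f * math.log10(x) for x, f in frequency_table.items()) / n
gm_log = 10 ** mean_log

print(product, round(gm_root, 3), round(gm_log, 3))
```

The logarithmic form is usually preferred for long series because the raw product grows very quickly.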

Geometric mean for continuous grouped FD:- The above formula can also be used whenever
the frequency distribution is grouped continuous, class marks of the class intervals are considered
as xi.
Harmonic Mean
It is a suitable measure of central tendency when the data pertain to speed, rate and time. The
harmonic mean of n values is defined as n divided by the sum of their reciprocals.
Harmonic mean for individual series: If $x_1, x_2, \dots, x_n$ are n observations, then the harmonic mean
can be represented by the following formula:
$$H.M. = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}}$$

Example 3.9: A car travels 25 miles at 25 mph, 25 miles at 50 mph, and 25 miles at 75 mph. Find
the average speed (the harmonic mean) of the three speeds.
Solution:
\[ H.M. = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{3}{\frac{1}{25} + \frac{1}{50} + \frac{1}{75}} = 40.9 \text{ mph} \]
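The example can be checked numerically, and the result compared with total distance divided by total time, which is why the harmonic mean suits speeds over equal distances. The function name `harmonic_mean` and the `freqs` argument are illustrative assumptions.

```python
def harmonic_mean(values, freqs=None):
    """n divided by the (frequency-weighted) sum of reciprocals."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    return n / sum(f / v for v, f in zip(values, freqs))

speeds = [25, 50, 75]          # mph, each held for 25 miles
hm = harmonic_mean(speeds)

# Cross-check: average speed = total distance / total time.
total_time = sum(25 / s for s in speeds)
average_speed = 75 / total_time

print(round(hm, 1), round(average_speed, 1))
```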

Harmonic mean for discrete data arranged in FD: If the data are arranged in the form of a
frequency distribution, then
\[ H.M. = \frac{n}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \cdots + \frac{f_m}{x_m}}, \quad \text{where } n = \sum_{k=1}^{m} f_k. \]

Harmonic mean for continuous grouped FD: When the frequency distribution is grouped and
continuous, the class marks of the class intervals are taken as $x_i$ and the above formula can be
used as
\[ H.M. = \frac{n}{\sum_{i=1}^{m} \frac{f_i}{x_i}}, \quad \text{where } n = \sum_{k=1}^{m} f_k \]
and $x_i$ is the class mark of the ith class.


Properties of harmonic mean
 It is unique for a given set of data.
 It takes each and every observation into consideration.
 Difficult to calculate and understand.

 Appropriate measure of central tendency in situations where data is in time, speed or rate.
Relations among different means
i. If all the observations are positive, the three means are related by $\bar{x} \ge G.M. \ge H.M.$
ii. For two observations, $\sqrt{\bar{x} \times H.M.} = G.M.$
iii. $\bar{x} = G.M. = H.M.$ if all observations are positive and have equal value.
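These relations can be verified numerically. The sketch below uses Python's standard `statistics` module (version 3.8 or later); the helper name `three_means` is an assumption for illustration. The first data set is the one suggested under the properties of the geometric mean, the second is Example 3.7 (a).

```python
import math
import statistics

def three_means(values):
    """Return (arithmetic, geometric, harmonic) means of positive values."""
    am = statistics.fmean(values)
    gm = statistics.geometric_mean(values)
    hm = statistics.harmonic_mean(values)
    return am, gm, hm

# Relation i: AM >= GM >= HM for positive observations.
am, gm, hm = three_means([2, 5, 8, 72])
print(am >= gm >= hm)

# Relation ii: for two observations, sqrt(AM * HM) equals GM.
am2, gm2, hm2 = three_means([3, 12])
print(math.isclose(math.sqrt(am2 * hm2), gm2))
```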
3.5.2 Median
The median is, as its name indicates, the middle-most value in the arrangement, which divides the
data into two equal parts. It is obtained by arranging the data in increasing or decreasing order
of magnitude and is denoted by $\tilde{x}$.
Median for individual series
We arrange the sample in ascending order of the variable of interest. Then the median is the middle
value (if the sample size n is odd) or the average of the two middle values (if the sample size n is
even).
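For an individual series the median is obtained by
a/ $\tilde{x} = \left(\frac{n+1}{2}\right)^{th}$ value if n is odd, and
b/ $\tilde{x} = \frac{\left(\frac{n}{2}\right)^{th} \text{value} + \left(\frac{n}{2}+1\right)^{th} \text{value}}{2}$ if n is even.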

Example 3.10: Find the median for the following data.


a/ -5 15 10 5 0 2 1 4 6 and 8
b/ 5 2 2 3 1 8 4
Solution:
a/ The data in ascending order are:
-5 0 1 2 4 5 6 8 10 15
Here n = 10, which is even. The two middle values are the 5th and 6th observations, so the median
is
\[ \tilde{x} = \frac{\left(\frac{10}{2}\right)^{th} \text{value} + \left(\frac{10}{2}+1\right)^{th} \text{value}}{2} = \frac{5^{th} \text{ value} + 6^{th} \text{ value}}{2} = \frac{4+5}{2} = 4.5 \]
b/ The data in ascending order are:
1 2 2 3 4 5 8
Here n = 7, which is odd. The middle value is the 4th observation, so the median is 3.
Note: The median is easy to calculate for small samples and is not affected by an "outlier".
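The two rules (odd and even n) translate directly into code. This is a minimal sketch with an illustrative function name, applied to the two data sets of Example 3.10.

```python
def median(values):
    """Middle value for odd n, average of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # the ((n+1)/2)th value
    return (ordered[mid - 1] + ordered[mid]) / 2  # (n/2)th and (n/2 + 1)th

print(median([-5, 15, 10, 5, 0, 2, 1, 4, 6, 8]))  # part a/
print(median([5, 2, 2, 3, 1, 8, 4]))              # part b/
```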
Median for discrete data arranged in a frequency distribution: In this case too, the median
is obtained by the above formula. After arranging the values in increasing order, find the smallest
cumulative frequency (CF) greater than or equal to the rank (position) of the median value, i.e.,
the position obtained from formula a/ or b/ above; the corresponding value is the median.
Median for grouped continuous data: For continuous data, the median is obtained by the following
formula.
\[ \tilde{x} = L + \frac{w}{f_{med}}\left(\frac{n}{2} - CF\right) \]
Where: L = the lower class boundary of the median class;

w = the class width of the median class;

$f_{med}$ = the frequency of the median class; and CF = the cumulative frequency of the class
preceding the median class, that is, the sum of the frequencies of all classes below the median
class. The median class is the class that contains the (n/2)th observation, whether n is odd or
even, since the items lose their individual identity once they are grouped into continuous classes.

Example 3.11: Calculate the median for the following frequency distribution.

C.I     1-5   6-10   11-15   16-20   21-25   26-30   31-35   Total
Freq.   4     8      12      6       3       4       3       40

Solution: Construct the less-than cumulative frequency distribution:

C.I           1-5   6-10   11-15   16-20   21-25   26-30   31-35   Total
Freq.         4     8      12      6       3       4       3       40
Cuml. Freq.   4     12     24      30      33      37      40

Since n = 40, n/2 = 20, and the smallest CF greater than or equal to 20 is 24; thus, the median class
is the third class. For this class, L = 10.5, w = 5, $f_{med}$ = 12 and CF = 12. Applying the formula,
we get:
\[ \tilde{x} = 10.5 + \frac{5}{12}(20 - 12) = 13.8 \]
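The grouped-median procedure can be sketched as follows. The function name and argument layout (`lower_boundaries`, `freqs`, `width`) are assumptions for illustration, and a common class width is assumed as in Example 3.11.

```python
from itertools import accumulate

def grouped_median(lower_boundaries, freqs, width):
    """Median of a grouped continuous distribution with a common class width."""
    n = sum(freqs)
    target = n / 2
    cum = list(accumulate(freqs))
    for i, cf in enumerate(cum):
        if cf >= target:                       # first class whose CF reaches n/2
            cf_before = cum[i - 1] if i > 0 else 0
            return lower_boundaries[i] + width / freqs[i] * (target - cf_before)
    raise ValueError("no observations")

boundaries = [0.5, 5.5, 10.5, 15.5, 20.5, 25.5, 30.5]
freqs = [4, 8, 12, 6, 3, 4, 3]
print(round(grouped_median(boundaries, freqs, 5), 2))
```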
Merits of median
 It is less affected by extreme values.
 Median can be calculated even in case of open-ended intervals.
 It can be computed for ratio, interval, and ordinal level of data.
Demerits of median
 Its value is not determined by each & every observation.
 It is not a good representative of the data if the number of items (data) is small.
• Arranging the items in order of magnitude is sometimes a very tedious process if the
number of items is very large.
3.5.3 The Mode
The mode or the modal value is the value with the highest frequency and is denoted by $\hat{x}$. A data set
may not have a mode or may have more than one mode. A distribution is called a bimodal
distribution if it has two data values that appear with the greatest frequency. If a distribution has
more than two modes, then the distribution is multimodal. If a distribution has no mode, then the
distribution is non-modal.

Mode of individual series:- The mode or the modal value of individual series (raw data) is simply
obtained by locating the observation with the maximum frequency.

Example 3.12: Consider the following data:


a. 30 45 69 70 32 18 32. The mode (𝑥̂ ) = 32.
b. 10 20 30 10 40 30. The mode (𝑥̂ ) = 10 and 30.
c. 10 40 30 20 50 60. No mode.
Note that in some samples there may be more than one mode or there may not be a mode. The
mode is not a suitable measure of central tendency in these cases. We use the mode as a measure
of central tendency if we require a measure that takes on one of the sample values. The mode can
be used for variables that are measured on a category (nominal) scale, e.g. the most popular
computer type.

Mode for discrete data arranged in a frequency distribution: In the case of discrete grouped data,
the mode is determined simply by locating the value(s) having the highest frequency.

Mode for Grouped Continuous Frequency Distribution


For grouped continuous data, only the modal class, i.e. the class with the highest frequency, can
be identified directly. After locating this class, the mode is interpolated using:
\[ \hat{x} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times w \]
where L = the lower class boundary of the modal class; $\Delta_1 = f_{mod} - f_1$; $\Delta_2 = f_{mod} - f_2$;
w = the common class width; $f_1$ = frequency of the class immediately preceding the modal class;
$f_2$ = frequency of the class immediately succeeding the modal class; and $f_{mod}$ = frequency
of the modal class.
Example 3.13: Calculate the mode for the frequency distribution of Example 3.11.
Solution: By inspection, the mode lies in the third class, where L = 10.5, $f_{mod}$ = 12, $f_1$ = 8, $f_2$ = 6 and w = 5.
Using the formula, the mode is:
\[ \hat{x} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times w = 10.5 + \frac{12-8}{(12-8)+(12-6)} \times 5 = 12.5 \]
Merits of mode
 Mode is not affected by extreme values.
 We can change the size of the observations without changing the mode.
 It can be computed for all level of data i.e. ratio, interval, ordinal or nominal.
Demerits of mode
• It may not exist in the series, and if it exists it may not be unique.
• It does not take every value into consideration.
3.5.4 The Relationship of the Mean, Median and Mode
Comparing the Mean, Median, and the Mode
• If the data are skewed, avoid the mean.
• If there is a large gap around the middle, avoid the median.
 A measure is a resistant measure if its value is not affected by an outlier or an extreme
data value.
 The mean is not a resistant measure of central tendency because it is not resistant to the
influence of the extreme data values or outliers.
 The median is resistant to the influence of extreme data values or outliers and its value does
not respond strongly to the changes of a few extreme data values regardless of how large
the change may be.

• The mode has an advantage over both the mean and the median when the data are
categorical, since it is not possible to calculate the mean or median for this type of data.
Also, the mode usually indicates the location within a large distribution where the data
values are concentrated. However, the mode cannot always be determined: if all the data
values in a distribution are different, the distribution is non-modal.
 In the case of symmetrical distribution; mean, median and mode coincide. That is
mean=median = mode. However, for a moderately asymmetrical (nonsymmetrical)
distribution, mean and mode lie on the two ends and median lies between them and they
have the following important empirical relationship, which is
Mean – Mode = 3(Mean - Median)
Example 3.14: In a moderately asymmetrical distribution, the mean and the mode are 30 and 42
respectively. What is the median of the distribution?
Solution: From Mean - Mode = 3(Mean - Median) we get Median = (2 × Mean + Mode)/3, so
Median = (2 × 30 + 42)/3 = 34
Hence the median of the distribution is 34.
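As a quick check, the empirical relationship can be rearranged for the median and evaluated. The function name is an assumption, and the relationship itself is only approximate for moderately asymmetrical distributions.

```python
def median_from_mean_and_mode(mean, mode):
    """Solve Mean - Mode = 3 (Mean - Median) for the median."""
    return (2 * mean + mode) / 3

median_est = median_from_mean_and_mode(30, 42)
print(median_est)

# The result satisfies the original relationship.
print(30 - 42 == 3 * (30 - median_est))
```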
Which of the Three Measures is the ‘’Best’’?
At this stage, one may ask which of these three measures of central tendency is the best. There
is no simple answer to this question, because the three measures are based upon different
concepts. The arithmetic mean is the sum of the values divided by the total number of observations
in the series. The median is the value of the middle observation, and the mode is the value around
which the observations tend to concentrate. As such, the
use of a particular measure will largely depend on the purpose of the study and the nature of the
data. For example, when we are interested in knowing the consumers’ preferences for different
brands of television sets or kinds of advertising, the choice should go in favor of mode. The use of
mean and median would not be proper. However, the median can sometimes be used in the case
of qualitative data when such data can be arranged in an ascending or descending order. Let us
take another example. Suppose we invite applications for a certain vacancy in our company. A
large number of candidates apply for that post. We are now interested to know as to which age or
age group has the largest concentration of applicants. Here, obviously the mode will be the most
appropriate choice. The arithmetic mean may not be appropriate as it may be influenced by some
extreme values.

3.6 The Quantiles (Quartiles, Deciles, Percentiles)
The median is the value of the middle item, which divides the data into two equal parts and is found by
arranging the data in increasing or decreasing order of magnitude, whereas quantiles are
measures which divide a given set of data into approximately equal subdivisions and are obtained
by the same procedure as the median. They are averages of position (non-central tendency).
Some of these are quartiles, deciles and percentiles.
Quartiles: are values which divide the data set into approximately four equal parts, denoted by
$Q_1$, $Q_2$ and $Q_3$. The first quartile ($Q_1$) is also called the lower quartile and the third quartile ($Q_3$)
is the upper quartile. The second quartile ($Q_2$) is the median.
• Quartiles for individual series:

Let $x_1, x_2, \ldots, x_n$ be n ordered observations. The ith quartile ($Q_i$) is the value of the item
in the [i(n+1)/4]th position, i = 1, 2, 3.

That is, after arranging the data in ascending order, $Q_1$, $Q_2$ and $Q_3$ are obtained by:
\[ Q_1 = \left(\frac{1(n+1)}{4}\right)^{th} \text{value}, \quad Q_2 = \left(\frac{2(n+1)}{4}\right)^{th} \text{value} \quad \text{and} \quad Q_3 = \left(\frac{3(n+1)}{4}\right)^{th} \text{value}. \]

• Quartiles for discrete data arranged in a frequency distribution: In this case also, we follow
the same procedure as for the median. That is, we construct the less-than cumulative frequency
distribution and apply the formula of quartiles for individual series.

• Quartiles for continuous data: For continuous data, use the following formula:
\[ Q_i = L + \frac{w}{f_{Q_i}}\left(\frac{in}{4} - CF\right) \]
where i = 1, 2, 3, and L, w, $f_{Q_i}$ and CF are defined in the same way as for the median, i.e.
\[ Q_1 = L + \frac{w}{f_{Q_1}}\left(\frac{n}{4} - CF\right), \quad Q_2 = L + \frac{w}{f_{Q_2}}\left(\frac{2n}{4} - CF\right) \quad \text{and} \quad Q_3 = L + \frac{w}{f_{Q_3}}\left(\frac{3n}{4} - CF\right) \]
The class in question is the one that includes the (i × n/4)th value. That is, the class with the smallest
cumulative frequency greater than or equal to i × n/4 is the class of the ith quartile.
Deciles: are values dividing the data into approximately ten equal parts, denoted by $D_1, D_2, \ldots, D_9$.
• Deciles for individual series:
Let $x_1, x_2, \ldots, x_n$ be n ordered observations. The ith decile ($D_i$) is the value of the item
in the [i(n+1)/10]th position, i = 1, 2, ..., 9.

That is, after arranging the data in ascending order, $D_1, D_2, \ldots, D_9$ are obtained by:
\[ D_1 = \left(\frac{1(n+1)}{10}\right)^{th} \text{value}, \quad D_2 = \left(\frac{2(n+1)}{10}\right)^{th} \text{value}, \quad \ldots \quad \text{and} \quad D_9 = \left(\frac{9(n+1)}{10}\right)^{th} \text{value}. \]

• Deciles for discrete data arranged in a frequency distribution: In this case also, we follow
the same procedure as for the median. That is, we construct the less-than cumulative frequency
distribution and apply the formula of deciles for individual series.
• Deciles for continuous data: Apply the following formula and follow the procedure for quartiles
of continuous data.
\[ D_i = L + \frac{w}{f_{D_i}}\left(\frac{in}{10} - CF\right), \quad i = 1, 2, \ldots, 9. \]
The symbols are defined in the same way as in the case of quartiles for continuous data.
Percentiles: are values which divide the data into approximately one hundred equal parts, and are
denoted by $P_1, P_2, \ldots, P_{99}$.
• Percentiles for individual series:

Let $x_1, x_2, \ldots, x_n$ be n ordered observations. The ith percentile ($P_i$) is the value of the item
in the [i(n+1)/100]th position, i = 1, 2, ..., 99.

That is, after arranging the data in ascending order, $P_1, P_2, \ldots, P_{99}$ are obtained by:
\[ P_1 = \left(\frac{1(n+1)}{100}\right)^{th} \text{value}, \quad P_2 = \left(\frac{2(n+1)}{100}\right)^{th} \text{value}, \quad \ldots \quad \text{and} \quad P_{99} = \left(\frac{99(n+1)}{100}\right)^{th} \text{value}. \]

• Percentiles for discrete data arranged in a frequency distribution: In this case also, we follow
the same procedure as for the median. That is, we construct the less-than cumulative frequency
distribution and apply the formula of percentiles for individual series.
• Percentiles for continuous data: Apply the following formula:
\[ P_i = L + \frac{w}{f_{P_i}}\left(\frac{in}{100} - CF\right), \quad i = 1, 2, \ldots, 99. \]
The symbols are defined in the same way as in the case of quartiles or deciles for continuous data.

Interpretations
1. 𝑄𝑖 is the value below which ( i × 25) percent of the observations in the series are found
(where i = 1, 2,3). For instance 𝑄3 means the value below which 75 percent of observations in
the given series are found.
2. 𝐷𝑖 is the value below which ( i ×10) percent of the observations in the series are found (where
i = 1, 2,...,9 ). For instance 𝐷4 is the value below which 40 percent of the values are found in the
series.
3. 𝑃𝑖 is the value below which i percent of the total observations are found (where i = 1, 2,3,...,99
). For example 60 percent of the observations in a given series are below 𝑃60 .
Example 3.15: Calculate 𝑄1 , 𝑄2 , 𝑄3, 𝐷4, 𝐷9, 𝑃40 & 𝑃90 for the following data given on the table
below.
x   10   11   12   13   14   15   16   17   18
f   2    8    25   48   65   40   20   9    2

Solution: The data are arranged in increasing order, so we need to construct only the
cumulative frequency table before calculating the required values.
x            10   11   12   13   14    15    16    17    18
f            2    8    25   48   65    40    20    9     2
Cum. Freq.   2    10   35   83   148   188   208   217   219

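The total number of observations is n = 219, which is odd. Clearly then the median is 14, i.e.
\[ \tilde{x} = \left(\frac{n+1}{2}\right)^{th} \text{value} = \left(\frac{219+1}{2}\right)^{th} \text{value} = 110^{th} \text{ value} = 14 \]
\[ Q_1 = \left(\frac{1(n+1)}{4}\right)^{th} \text{value} = \left(\frac{1(219+1)}{4}\right)^{th} \text{value} = 55^{th} \text{ value} = 13 \]
\[ Q_2 = \left(\frac{2(n+1)}{4}\right)^{th} \text{value} = \left(\frac{2(219+1)}{4}\right)^{th} \text{value} = 110^{th} \text{ value} = 14 = \tilde{x} \]
\[ Q_3 = \left(\frac{3(n+1)}{4}\right)^{th} \text{value} = \left(\frac{3(219+1)}{4}\right)^{th} \text{value} = 165^{th} \text{ value} = 15 \]
\[ D_4 = \left(\frac{4(n+1)}{10}\right)^{th} \text{value} = \left(\frac{4(219+1)}{10}\right)^{th} \text{value} = 88^{th} \text{ value} = 14 \]
\[ D_9 = \left(\frac{9(n+1)}{10}\right)^{th} \text{value} = \left(\frac{9(219+1)}{10}\right)^{th} \text{value} = 198^{th} \text{ value} = 16 \]
\[ P_{40} = \left(\frac{40(n+1)}{100}\right)^{th} \text{value} = \left(\frac{40(219+1)}{100}\right)^{th} \text{value} = 88^{th} \text{ value} = 14 \]
\[ P_{90} = \left(\frac{90(n+1)}{100}\right)^{th} \text{value} = \left(\frac{90(219+1)}{100}\right)^{th} \text{value} = 198^{th} \text{ value} = 16 \]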
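The lookup used in this example, finding the value whose cumulative frequency first reaches the required position, can be sketched as below. The function name `fd_quantile` and its parameters are illustrative assumptions; the values are assumed to be sorted in ascending order, and a fractional position is simply compared against the cumulative frequencies without interpolation.

```python
from itertools import accumulate

def fd_quantile(values, freqs, i, parts):
    """ith quantile when the data are split into `parts` equal parts
    (parts = 4 for quartiles, 10 for deciles, 100 for percentiles)."""
    n = sum(freqs)
    position = i * (n + 1) / parts
    for value, cum_freq in zip(values, accumulate(freqs)):
        if cum_freq >= position:     # smallest CF that reaches the position
            return value
    return values[-1]                # position beyond n: use the largest value

x = [10, 11, 12, 13, 14, 15, 16, 17, 18]
f = [2, 8, 25, 48, 65, 40, 20, 9, 2]

quartiles = [fd_quantile(x, f, i, 4) for i in (1, 2, 3)]
print(quartiles, fd_quantile(x, f, 4, 10), fd_quantile(x, f, 90, 100))
```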

Example 3.16: The marks of 50 students, out of 85, are given below. Based on the data find $Q_1$,
$D_4$ and $P_7$.
Marks   46-50   51-55   56-60   61-65   66-70   71-75   76-80
fi      4       8       15      5       9       5       4

Solution: First find the class boundaries and the cumulative frequency distribution.
Marks            46-50       51-55       56-60       61-65       66-70       71-75       76-80
Class boundary   45.5-50.5   50.5-55.5   55.5-60.5   60.5-65.5   65.5-70.5   70.5-75.5   75.5-80.5
fi               4           8           15          5           9           5           4
Cum. frequency   4           12          27          32          41          46          50

$Q_1$ is the (n/4)th value = 12.5th value, which lies in the class 55.5 - 60.5.
\[ Q_1 = L + \frac{w}{f_{Q_1}}\left(\frac{n}{4} - CF\right) = 55.5 + \frac{5}{15}(12.5 - 12) = 55.7 \]
$D_4$ is the (4n/10)th value = 20th value, which lies in the class 55.5 - 60.5.
\[ D_4 = L + \frac{w}{f_{D_4}}\left(\frac{4n}{10} - CF\right) = 55.5 + \frac{5}{15}(20 - 12) = 58.2 \]
$P_7$ is the (7n/100)th value = 3.5th value, which lies in the class 45.5 - 50.5.
\[ P_7 = L + \frac{w}{f_{P_7}}\left(\frac{7n}{100} - CF\right) = 45.5 + \frac{5}{4}(3.5 - 0) = 49.875 \]
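For grouped continuous data the same interpolation serves quartiles, deciles and percentiles alike. The following is a sketch under the assumption of a common class width; the function name `grouped_quantile` and its parameters are illustrative.

```python
def grouped_quantile(lower_boundaries, freqs, width, i, parts):
    """ith quantile of grouped data: L + (w / f) * (i*n/parts - CF)."""
    n = sum(freqs)
    target = i * n / parts
    cf_before = 0                       # cumulative frequency of earlier classes
    for lower, f in zip(lower_boundaries, freqs):
        if cf_before + f >= target:     # this class contains the target position
            return lower + width / f * (target - cf_before)
        cf_before += f
    raise ValueError("no observations")

lower = [45.5, 50.5, 55.5, 60.5, 65.5, 70.5, 75.5]
freqs = [4, 8, 15, 5, 9, 5, 4]

q1 = grouped_quantile(lower, freqs, 5, 1, 4)
d4 = grouped_quantile(lower, freqs, 5, 4, 10)
p7 = grouped_quantile(lower, freqs, 5, 7, 100)
print(round(q1, 1), round(d4, 1), p7)
```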

Exercise- 3
1. Calculate the median, quartiles, 8th decile, and 75th percentile for the following data. Show
that the value of 75th percentile is the same as that of Q3.
Lifetime (C.M) 50 100 150 200 250 300 350 400
No of Batteries 6 8 13 20 9 6 3 2

2. The following data represent the number of robbery offences recorded in a town per
day.
No. of robberies   26   34   30   15   10   32   12   25   7
No. of days        13   19   12   30   14   8    19   20   3
Compute the mean, median and mode.
3. Calculate Q1, Q2, Q3, D5, D8, and P90 for the following table
Temperature (oF) 50-59 60-69 70-79 80-89 90-99
Days 2 8 20 4 1

4. The following data represent the pulse rates (beats per minute) of nine students 76 60 60
81 72 80 80 68 and 73. Calculate the mean, mode and the third quartile.
5. The number of births in a hospital is given below
Days            Monday   Tuesday   Wednesday   Thursday   Friday   Saturday   Sunday
Num. of births  50       60        52          55         62       30         40

Find the average number of births per day and the mode.
6. From the table given below find the mode and 5th decile.
Size 11-15 16-20 21-25 26-30 31-35 36-40 41-45 46-50
Frequency 7 10 13 26 35 22 11 5

7. If the arithmetic mean of two items is 5 and G.M. is 4, find their H.M.
8. The following frequency distribution represents the magnitudes of earthquakes.
Magnitude 0-0.9 1-1.9 2-2.9 3-3.9 4-4.9 5-5.9 6-6.9 7-7.9
Frequency 20 50 45 30 10 8 6 1

Compute the median and verify that it is equal to the second quartile and find 72nd percentile.
CHAPTER FOUR
4. Measures of Dispersion (Variation)

4.1 Introduction
Just as central tendency can be measured by a number in the form of an average, the amount of
variation (dispersion, spread, or scatter) among the values in the data set can also be measured.
The measures of central tendency describe that the major part of values in the data set appears to
concentrate around a central value called average with the remaining values scattered (distributed)
on either sides of that value. But these measures do not reveal how these values are dispersed
(spread or scatter) on each side of the central value. The dispersion of values is indicated by the

extent to which these values tend to spread over an interval rather than cluster closely around an
average.
The term dispersion is generally used in two senses. Firstly, dispersion refers to the variations of
the items among themselves. If the value of all the items of a series is the same, there will be no
variation among different items of a series. Secondly, dispersion refers to the variation of the items
around an average. If the difference between the values of the items and the average is large, the
dispersion will be high; on the other hand, if the difference between the values of the items and
the average is small, the dispersion will be low. Thus, dispersion is defined as the scatteredness or
spread of the individual items in a given series.
After studying this chapter, you should be able to:
 Explain the meaning of measures of dispersion

 Compare two or more sets of data using relative measures of dispersion.


 Apply the Z-score to find out the relative standing of values.
 Explain measures of skewness and kurtosis.
Objectives of measuring Variation:
 To judge the reliability of measures of central tendency
 To control variability itself.
 To compare two or more groups of numbers in terms of their variability.
 To make further statistical analysis.

4.2 Absolute and Relative Measures of Dispersion


Absolute measures of dispersion: An absolute measure is expressed in the same
statistical unit in which the original data are given, such as kilograms, tonnes, etc. These
measures are suitable for comparing the variability in two distributions having
variables expressed in the same units and of the same average size. These measures
are not suitable for comparing the variability in two distributions having variables
expressed in different units.

Relative measures of dispersion: A relative measure of dispersion is the ratio of a measure of
absolute dispersion to an appropriate average or the selected items of the data.

Relative measures of dispersion are classified as follows:
• Based on selected items: coefficient of range and coefficient of quartile deviation.
• Based on all items: coefficient of mean deviation and coefficient of standard deviation
(or coefficient of variation).

4.3 Types of Measures of Variation


4.3.1 The Range and Relative Range
The range is the simplest measure of dispersion. It is defined as the difference between the largest
and smallest values in a given set of data. Its formula is:
𝑅 =𝐿−𝑆
Where R=Range, L= Largest value in a given set of data, S= smallest value in a given set of data.
For a continuous grouped distribution, the range may be obtained as:

 The difference between upper class limit of the last class and the lower class limit of the
first class, or
 The difference between the largest class mark and the smallest class mark, or
 The difference between the upper class boundary of the last class and the lower class
boundary of the first class.
The range is used to describe quantities such as the maximum change in daily temperature, rainfall,
etc. When the sample size is small, it can be an adequate measure of variation. It is commonly used
in quality control.
The relative measure of range, also called the coefficient of range, is defined as
\[ \text{Relative Range (RR)} = \frac{L - S}{L + S} \]
Example 4.1: Five students obtained the following marks in statistics: 20, 35, 25, 30, 15. Find the
range and relative range.
Solution: Here, L = 35 and S = 15.
\[ \text{Range} = L - S = 35 - 15 = 20 \]
\[ RR = \frac{L - S}{L + S} = \frac{35 - 15}{35 + 15} = 0.4 \]

Example 4.2: Find the range and relative range of the following data.

Size        5-10   11-15   16-20   21-25   26-30
Frequency   4      9       15      30      40
Solution: Here,
L = upper class limit of the last class = 30
S = lower class limit of the first class = 5
\[ \text{Range} = 30 - 5 = 25, \quad RR = \frac{30 - 5}{30 + 5} = 0.7143 \]
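Both examples can be reproduced with a few lines. The function name is an illustrative assumption; for the grouped case the sketch simply passes the two extreme class limits used in Example 4.2.

```python
def range_and_relative_range(values):
    """Return (range, relative range) = (L - S, (L - S) / (L + S))."""
    largest, smallest = max(values), min(values)
    spread = largest - smallest
    return spread, spread / (largest + smallest)

# Example 4.1: raw marks.
print(range_and_relative_range([20, 35, 25, 30, 15]))

# Example 4.2: extreme class limits 5 and 30 of the grouped data.
r, rr = range_and_relative_range([5, 30])
print(r, round(rr, 4))
```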

Merits of the Range


 It is well-defined, easy to compute and simple to understand.
 It helps in giving an idea about the variation, just by giving the lowest value and the
greatest value of variable.
Demerits of the Range

 It is not based on all observations of the series.
 It can’t be calculated in case of open-ended distribution.
 It is affected by sampling fluctuation.
 It is affected by extreme values in the series.
4.3.2 The Quartile Deviation and Coefficient of Quartile Deviation
The inter-quartile range and the quartile deviation are other measures of dispersion. The difference
between the upper quartile ($Q_3$) and the lower quartile ($Q_1$) is called the inter-quartile range.
Symbolically,
\[ \text{Inter-Quartile Range (IQR)} = Q_3 - Q_1 \]
The inter-quartile range covers the dispersion of the middle 50% of the items of the series. The quartile
deviation, also called the semi-inter-quartile range, is half of the difference between the upper and
lower quartiles, that is, half of the inter-quartile range. Its formula is
\[ \text{Quartile Deviation (QD)} = \frac{Q_3 - Q_1}{2} \]
The relative measure of quartile deviation, also called the coefficient of quartile deviation (CQD),
is defined as:
\[ CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} \]
Example 4.3: Find the inter-quartile range, quartile deviation and coefficient of quartile deviation
from the following data.
28, 18, 20, 24, 27, 30, 15
Solution: First arrange the data in ascending order: 15, 18, 20, 24, 27, 28, 30
\[ Q_1 = \text{size of } \left(\frac{n+1}{4}\right)^{th} \text{item} = \text{size of } \left(\frac{7+1}{4}\right)^{th} \text{item} = \text{size of } 2^{nd} \text{ item} = 18 \]
\[ Q_3 = \text{size of } 3\left(\frac{n+1}{4}\right)^{th} \text{item} = \text{size of } 3\left(\frac{7+1}{4}\right)^{th} \text{item} = \text{size of } 6^{th} \text{ item} = 28 \]
\[ IQR = Q_3 - Q_1 = 28 - 18 = 10 \]
\[ QD = \frac{Q_3 - Q_1}{2} = \frac{28 - 18}{2} = 5 \]
\[ CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{28 - 18}{28 + 18} = 0.217 \]

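Example 4.4: Find the inter-quartile range, quartile deviation and coefficient of quartile deviation
from the following data.
Marks             2    3    4    5    6   7    8   9
No. of students   10   11   12   13   5   12   7   5
Solution:
Marks             2    3    4    5    6    7    8    9
No. of students   10   11   12   13   5    12   7    5
CF                10   21   33   46   51   63   70   75 = N

\[ Q_1 = \left(\frac{N+1}{4}\right)^{th} \text{item} = \left(\frac{75+1}{4}\right)^{th} \text{item} = 19^{th} \text{ item} = 3 \]
\[ Q_3 = 3\left(\frac{N+1}{4}\right)^{th} \text{item} = 3\left(\frac{75+1}{4}\right)^{th} \text{item} = 57^{th} \text{ item} = 7 \]
\[ IQR = Q_3 - Q_1 = 7 - 3 = 4 \]
\[ QD = \frac{Q_3 - Q_1}{2} = \frac{7 - 3}{2} = 2 \]
\[ CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{7 - 3}{7 + 3} = 0.4 \]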
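The two examples above can be reproduced with the positional rule i(n+1)/4. In the sketch below the function names are illustrative; linear interpolation for fractional positions is an added assumption that the examples never exercise (both give whole-number positions), and at least three observations are assumed.

```python
def value_at(ordered, position):
    """Value at a 1-based position; fractional positions are linearly
    interpolated between neighbours (an assumption, not from the text)."""
    lower = min(max(int(position), 1), len(ordered))
    upper = min(lower + 1, len(ordered))
    frac = position - int(position)
    return ordered[lower - 1] + frac * (ordered[upper - 1] - ordered[lower - 1])

def quartile_dispersion(values):
    """Q1, Q3, inter-quartile range, quartile deviation and its coefficient."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at(ordered, (n + 1) / 4)
    q3 = value_at(ordered, 3 * (n + 1) / 4)
    iqr = q3 - q1
    return {"Q1": q1, "Q3": q3, "IQR": iqr, "QD": iqr / 2, "CQD": iqr / (q3 + q1)}

ex_43 = quartile_dispersion([28, 18, 20, 24, 27, 30, 15])

# Example 4.4: expand the frequency table into individual marks first.
marks = range(2, 10)
students = [10, 11, 12, 13, 5, 12, 7, 5]
ex_44 = quartile_dispersion([m for m, f in zip(marks, students) for _ in range(f)])

print(ex_43)
print(ex_44)
```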
Remark: Q.D or CQD includes only the middle 50% of the observation.
Merits of QD
 It is well-defined, easy to compute and simple to understand.
 It helps in studying the middle 50% item in the series.
 It is not affected by the extreme items.
 It is useful in measuring variations in the case of open-ended distributions.
Demerits of QD
 It is not based on all the items (it ignores 50% items, i.e., the first 25% and the last
25%).
 It is greatly influenced by sampling fluctuations.
 It is not amenable to algebraic manipulations.

4.3.3 The Mean Deviation and Coefficient of Mean Deviation


The mean deviation (MD) measures the average deviation of a set of observations about their
central value, generally the mean or the median, ignoring the plus/minus sign of the deviations. In
other words the mean deviation of a set of items is defined as the arithmetic mean of the values of

the absolute deviations from a given average. Depending up on the type of averages used we have
different mean deviations.
• The mean deviation of a sample of n observations $x_1, x_2, \ldots, x_n$ (individual series) is given
as
\[ MD = \frac{\sum |X_i - A|}{n} \]
where $|X_i - A|$ denotes the absolute value of the deviation. Generally, the arithmetic mean and
the median are used in calculating the mean deviation, so A stands for the average used for
calculating MD. That is, $A = \tilde{X}$ (median) or $A = \bar{X}$ (mean).
• In the case of discrete data arranged in FD and continuous grouped data, the formula for MD
becomes
\[ MD = \frac{\sum f_i |X_i - A|}{n} \]
where $X_i$ is the class mark of the ith class, $f_i$ is the frequency of the ith class and $n = \sum f_i$.
1. The mean deviation about the arithmetic mean is, therefore, given by
\[ MD(\bar{X}) = \frac{\sum |X_i - \bar{X}|}{n} \quad \text{for ungrouped data (individual series),} \]
\[ MD(\bar{X}) = \frac{\sum f_i |X_i - \bar{X}|}{n} \quad \text{for discrete data arranged in FD and a grouped continuous frequency distribution,} \]
where $X_i$ is the value for discrete data arranged in FD and the class mark of the ith class for
continuous grouped data, $f_i$ is the frequency of the ith class and $n = \sum f_i$.
Steps to calculate $MD(\bar{X})$
• Find the arithmetic mean, $\bar{X}$
• Find the deviations of each reading from $\bar{X}$
• Find the arithmetic mean of the deviations, ignoring sign.
2. The mean deviation about the median is given by
\[ MD(\tilde{X}) = \frac{\sum |X_i - \tilde{X}|}{n} \quad \text{for ungrouped data (individual series),} \]
\[ MD(\tilde{X}) = \frac{\sum f_i |X_i - \tilde{X}|}{n} \quad \text{for discrete data arranged in FD and a grouped continuous frequency distribution,} \]
where $X_i$ is the value for discrete data arranged in FD and the class mark of the ith class for
continuous grouped data, $f_i$ is the frequency of the ith class and $n = \sum f_i$.
Steps to calculate $MD(\tilde{X})$
• Find the median, $\tilde{X}$
• Find the deviations of each reading from $\tilde{X}$
• Find the arithmetic mean of the deviations, ignoring sign.
3. The mean deviation about the mode is given by
\[ MD(\hat{X}) = \frac{\sum |X_i - \hat{X}|}{n} \quad \text{for ungrouped data (individual series),} \]
\[ MD(\hat{X}) = \frac{\sum f_i |X_i - \hat{X}|}{n} \quad \text{for discrete data arranged in FD and a grouped continuous frequency distribution,} \]
where $X_i$ is the value for discrete data arranged in FD and the class mark of the
ith class for continuous grouped data, $f_i$ is the frequency of the ith class and $n = \sum f_i$.
Steps to calculate $MD(\hat{X})$
• Find the mode, $\hat{X}$
• Find the deviations of each reading from $\hat{X}$
• Find the arithmetic mean of the deviations, ignoring sign.
Example 4.5
The following are the numbers of visits made by ten mothers to the local doctor's surgery: 8, 6, 5, 5,
7, 4, 5, 9, 7, 4. Find the mean deviation about the mean, median and mode.
Solution:
First calculate the three averages:
$\bar{X} = 6$, $\tilde{X} = 5.5$, $\hat{X} = 5$
Then take the deviations of each observation from these averages.
$X_i$                 4     4     5     5     5     6     7     7     8     9     Total
$|X_i - \bar{X}|$     2     2     1     1     1     0     1     1     2     3     14
$|X_i - \tilde{X}|$   1.5   1.5   0.5   0.5   0.5   0.5   1.5   1.5   2.5   3.5   14
$|X_i - \hat{X}|$     1     1     0     0     0     1     2     2     3     4     14

Since the distribution is ungrouped, the mean deviations about the mean, median and mode are:
\[ MD(\bar{X}) = \frac{\sum |X_i - \bar{X}|}{n} = \frac{14}{10} = 1.4 \]
\[ MD(\tilde{X}) = \frac{\sum |X_i - \tilde{X}|}{n} = \frac{14}{10} = 1.4 \]
\[ MD(\hat{X}) = \frac{\sum |X_i - \hat{X}|}{n} = \frac{14}{10} = 1.4 \]
Merits of 𝑴𝑫

 It is well-defined, easy to compute and simple to understand.
 It is based on all observations.
 It is not greatly affected by the extreme items.
 It can be calculated by using any average.
Demerit of 𝑴𝑫
 It does not take in to account the signs of the deviations of items from the average.
Remark: Of all the mean deviations taken about different averages or any arbitrary value, the
mean deviation about the median has the smallest value.
Coefficient of mean deviation (CMD):
The relative measure of mean deviation, also called the coefficient of mean deviation, is obtained
by dividing the mean deviation by the particular average used in computing it. Thus,
• CMD about the arithmetic mean is given by:
\[ CMD(\bar{X}) = \frac{MD(\bar{X})}{\bar{X}} \]
where MD is the mean deviation calculated about the arithmetic mean.
• CMD about the median is given by:
\[ CMD(\tilde{X}) = \frac{MD(\tilde{X})}{\tilde{X}} \]
in which case MD is calculated about the median of the observations.
• CMD about the mode is given by:
\[ CMD(\hat{X}) = \frac{MD(\hat{X})}{\hat{X}} \]
in which case MD is calculated about the mode of the observations.

Example 4.6: Calculate the coefficient of mean deviation about the mean, median and mode for
the data in Example 4.5 above.
Solution:
\[ CMD(\bar{X}) = \frac{MD(\bar{X})}{\bar{X}} = \frac{1.4}{6} = 0.23 \]
\[ CMD(\tilde{X}) = \frac{MD(\tilde{X})}{\tilde{X}} = \frac{1.4}{5.5} = 0.25 \]
\[ CMD(\hat{X}) = \frac{MD(\hat{X})}{\hat{X}} = \frac{1.4}{5} = 0.28 \]
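Examples 4.5 and 4.6 can be reproduced with the standard `statistics` module. The two helper functions are illustrative names introduced for this sketch; they follow the ungrouped formulas given above.

```python
import statistics

def mean_deviation(values, centre):
    """Average absolute deviation of the values from the chosen centre."""
    return sum(abs(x - centre) for x in values) / len(values)

def coefficient_of_mean_deviation(values, centre):
    """Mean deviation divided by the average it was computed about."""
    return mean_deviation(values, centre) / centre

visits = [8, 6, 5, 5, 7, 4, 5, 9, 7, 4]
centres = {
    "mean": statistics.mean(visits),
    "median": statistics.median(visits),
    "mode": statistics.mode(visits),
}

for name, centre in centres.items():
    md = mean_deviation(visits, centre)
    cmd = coefficient_of_mean_deviation(visits, centre)
    print(name, centre, round(md, 2), round(cmd, 2))
```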
4.3.4 The Variance, Standard Deviation and Coefficient of Variation
Variance and Standard Deviation
Like the mean deviation, the variance is also based on all observations in a set of data. But
the variance is the average of squared deviations from the mean. Recall that the sum of squared
deviations is minimum only when taken from the mean. Squared deviations are more easily
manipulated mathematically than absolute deviations. Thus, if we average the squared deviations from the mean
and take the square root of the result (to compensate for the fact that the deviations were squared),
we obtain the standard deviation. This overcomes the limitation of the mean deviation.
Population Variance ($\sigma^2$)
If we divide the sum of squared deviations from the mean by the number of values in the population,
we get the population variance. This variance is the "average squared deviation from the mean".
• For ungrouped data (individual series)
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} = \frac{1}{N}\left[\sum_{i=1}^{N} X_i^2 - N\mu^2\right] \]
where $\mu$ is the population arithmetic mean and N is the total number of observations in the population.

• For discrete data arranged in FD and for continuous grouped data
\[ \sigma^2 = \frac{\sum f_i (X_i - \mu)^2}{N} = \frac{1}{N}\left[\sum f_i X_i^2 - N\mu^2\right] \]
where $\mu$ is the population arithmetic mean, $X_i$ is the class mark of the ith class, $f_i$ is the
frequency of the ith class and $N = \sum f_i$.
Sample Variance ($S^2$)
One would expect the sample variance to simply be the population variance with the population
mean replaced by the sample mean. However, one of the major uses of statistics is to estimate the
corresponding parameter, and that formula has the problem that, on average, its value is not the
same as the parameter: it tends to underestimate the population variance. To offset this, the sum of
the squares of the deviations is divided by one less than the sample size.
• For ungrouped data
\[ S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right] \]
where $\bar{x}$ is the sample arithmetic mean and n is the total number of observations in the sample.

• For discrete data arranged in FD

If the values $x_i$ have frequencies $f_i$ (i = 1, 2, ..., m), then the sample variance is given by:
\[ S^2 = \frac{1}{n-1}\sum_{i=1}^{m} f_i (x_i - \bar{x})^2 = \frac{1}{n-1}\left[\sum f_i x_i^2 - n\bar{x}^2\right] \]
• For continuous grouped data
\[ S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n-1} = \frac{1}{n-1}\left[\sum f_i x_i^2 - n\bar{x}^2\right] \]
where $\bar{x}$ is the sample arithmetic mean, $x_i$ is the class mark of the ith class, $f_i$ is the
frequency of the ith class and $n = \sum f_i$.
The Standard Deviation
There is a problem with variances. Recall that the deviations were squared, which means that the units were also squared. To get back to the same units as the original data values, the square root must be taken.
• Population Standard Deviation ($\sigma$)
$\sigma = \sqrt{\sigma^2}$, where $\sigma^2$ is the population variance.
• Sample Standard Deviation (S)
$S = \sqrt{S^2}$, where $S^2$ is the sample variance.

Example 4.7: Find the sample variance and standard deviation of:

xi    2    4    5    6    8
fi    2    2    3    1    2

Solution: Prepare the following table:

xi     fi    fixi    xi^2    fixi^2
2      2     4       4       8
4      2     8       16      32
5      3     15      25      75
6      1     6       36      36
8      2     16      64      128
Sum    10    49              279

Thus, $n = \sum f_i = 10$, $\sum f_ix_i = 49$, $\sum f_ix_i^2 = 279$ and $\bar{x} = \frac{49}{10} = 4.9$.

$$S^2 = \frac{1}{n-1}\left[\sum f_ix_i^2 - n\bar{x}^2\right] = \frac{1}{9}\left[279 - 10\left(\frac{49}{10}\right)^2\right] = \frac{1}{9}(38.9) = 4.32, \quad \text{and} \quad S = \sqrt{4.32} = 2.08.$$
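
The arithmetic of Example 4.7 can be checked with a short Python sketch. The function name `grouped_sample_variance` is an illustrative choice, not part of the module; it evaluates both the definitional and the shortcut form of the sample variance so that the two can be compared.

```python
import math

def grouped_sample_variance(values, freqs):
    """Sample variance (divisor n - 1) for values x_i with frequencies f_i."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    # Definitional form: sum of f_i * (x_i - mean)^2, divided by n - 1
    by_definition = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / (n - 1)
    # Shortcut form: (sum of f_i * x_i^2 - n * mean^2), divided by n - 1
    by_shortcut = (sum(f * x * x for x, f in zip(values, freqs)) - n * mean ** 2) / (n - 1)
    return by_definition, by_shortcut

# Data of Example 4.7
var_def, var_short = grouped_sample_variance([2, 4, 5, 6, 8], [2, 2, 3, 1, 2])
std_dev = math.sqrt(var_def)
print(round(var_def, 2), round(var_short, 2), round(std_dev, 2))
```

Both forms should agree up to floating-point rounding, which is a quick way to catch a slip in a hand-built table.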

Example 4.8: Find the sample variance and standard deviation for the distribution:

C.I      1-5    6-10    11-15    16-20
Freq.    4      1       2        3

Solution: In a continuous F.D., xi is the class mark representing the ith class.

C.I      xi    fi    fixi    fixi^2
1-5      3     4     12      36
6-10     8     1     8       64
11-15    13    2     26      338
16-20    18    3     54      972
Total          10    100     1410

Here $n = \sum f_i = 10$, $\bar{x} = \frac{\sum f_ix_i}{n} = \frac{100}{10} = 10$ and $\sum f_ix_i^2 = 1410$, so that

$$S^2 = \frac{1}{n-1}\left[\sum f_ix_i^2 - n\bar{x}^2\right] = \frac{1}{9}\left[1410 - 10(10)^2\right] = \frac{410}{9} = 45.56,$$
$$S = \sqrt{45.56} = 6.75.$$

Properties of Variance & Standard Deviation

1. If a constant is added to (or subtracted from) all the values, the variance remains the same; i.e., for any constant k, $V(x_i + k) = V(x_i)$.

Example 4.9: Consider the 6 sample values $x_i$: 54, 52, 53, 50, 51 and 52. The sample variance is $V(x_i) = 2$. Now subtract 50 from each value to get $y_i$: 4, 2, 3, 0, 1, 2; the variance of this new series is also 2, i.e., $V(x) = V(y) = 2$.

2. If each and every value is multiplied by a non-zero constant k, the standard deviation is multiplied by |k| and the variance is multiplied by $k^2$; i.e., $V(kx_i) = k^2V(x_i)$.

3. Both the variance and the standard deviation give more weight to extreme values and less to those which are near to the mean.
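
The first two properties can be illustrated numerically. The sketch below uses Python's standard `statistics` module on the values of Example 4.9; the multiplier `k = 3` is an arbitrary value chosen only for illustration.

```python
import statistics

x = [54, 52, 53, 50, 51, 52]   # sample values from Example 4.9
k = 3                          # illustrative constant (an assumption, not from the text)

var_x = statistics.variance(x)                            # sample variance, divisor n - 1
var_shifted = statistics.variance([xi - 50 for xi in x])  # subtracting a constant
var_scaled = statistics.variance([k * xi for xi in x])    # multiplying by k
sd_scaled = statistics.stdev([-k * xi for xi in x])       # multiplying by -k

print(var_x, var_shifted)           # equal: shifting does not change the variance
print(var_scaled, k ** 2 * var_x)   # equal: variance scales by k squared
print(sd_scaled, abs(k) * statistics.stdev(x))  # standard deviation scales by |k|
```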
Coefficient of Variation
The standard deviation is an absolute measure of dispersion. The corresponding relative measure
is known as the coefficient of variation (CV).
The standard deviation expresses the variation in the same unit as the original data, but it cannot be the sole basis for comparing two distributions. For instance, if we have a standard deviation of 10 and a mean of 5, the values vary by an amount twice as large as the mean itself. If, on the other hand, we have a standard deviation of 10 and a mean of 5000, the variation relative to the mean is insignificant. Therefore, we cannot judge the dispersion of a set of data until we know the standard deviation, the mean, and how the standard deviation compares with the mean.
The coefficient of variation is used in problems where we want to compare the variability of two or more different series. It is the ratio of the standard deviation to the arithmetic mean, usually expressed in percent.
$$CV = \frac{\text{Standard deviation}}{\text{Mean}} \times 100\%$$

For population data:
$$CV = \frac{\sigma}{\mu} \times 100$$
where $\sigma$ is the population standard deviation and $\mu$ is the population mean.

For sample data:
$$CV = \frac{S}{\bar{x}} \times 100$$
where $S$ is the sample standard deviation and $\bar{x}$ is the sample mean.


Remark: A distribution with a smaller coefficient of variation is said to be less variable, or more consistent, more uniform, or more homogeneous.
Example 4.10: Last semester, the students of the Mathematics and Chemistry Departments took the Introduction to Statistics course. At the end of the semester, the following information was recorded.

Department            Mathematics    Chemistry
Mean score            85             65
Standard deviation    25             12

Compare the relative dispersions of the two departments’ scores using an appropriate measure.
Solution:
Mathematics Department: $CV = \frac{S}{\bar{x}} \times 100 = \frac{25}{85} \times 100 = 29.41\%$
Chemistry Department: $CV = \frac{S}{\bar{x}} \times 100 = \frac{12}{65} \times 100 = 18.46\%$
Interpretation: Since the CV of the Mathematics Department students is greater than that of the Chemistry Department students, there is more dispersion relative to the mean in the distribution of Mathematics students’ scores than in that of Chemistry students.
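
As a quick check of Example 4.10, the comparison can be written as a few lines of Python. The helper name `coefficient_of_variation` is hypothetical and introduced only for this sketch.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the CV in percent: (standard deviation / mean) * 100."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean * 100

# Figures from Example 4.10
cv_math = coefficient_of_variation(25, 85)
cv_chem = coefficient_of_variation(12, 65)

# The series with the smaller CV is the more consistent one
more_consistent = "Chemistry" if cv_chem < cv_math else "Mathematics"
print(round(cv_math, 2), round(cv_chem, 2), more_consistent)
```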
4.4 Standard Scores (Z-Scores)
A standard score for a value in a data set is obtained by subtracting the mean of the data set from the value and dividing the result by the standard deviation of the data set. The standard score (z-score) tells us how many standard deviations a specific value lies above (positive z-score) or below (negative z-score) the mean of the data set.
Z-score computed from the population:
$$Z = \frac{X - \mu}{\sigma}$$
Z-score computed from the sample:
$$Z = \frac{X - \bar{X}}{S}$$
Example 4.11: What is the Z-score for the value of 14 in the following sample data set?
3 8 6 14 4 12 7 10

Solution:
$\bar{X} = 8$ and $S = 3.8173$; thus $Z = \frac{14 - 8}{3.8173} \approx 1.57$.

• The data value of 14 is located 1.57 standard deviations above the mean of 8, because the z-score is positive.
Example 4.12: Suppose that a student scored 66 in Statistics and 80 in Mathematics. A summary of the scores in the two courses is given below.

Course         Average score    Standard deviation of the score
Statistics     51               12
Mathematics    72               16

In which course did the student score better as compared to his classmates?
Solution:
Z-score of the student in Statistics: $Z = \frac{X - \mu}{\sigma} = \frac{66 - 51}{12} = \frac{15}{12} = 1.25$
Z-score of the student in Mathematics: $Z = \frac{X - \mu}{\sigma} = \frac{80 - 72}{16} = \frac{8}{16} = 0.5$

From these two standard scores, we can conclude that the student has scored better in the Statistics course relative to his classmates than in the Mathematics course.
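
The z-score calculations in Examples 4.11 and 4.12 can be reproduced with the following Python sketch. The function name `z_score` is illustrative; the sample standard deviation comes from the standard library's `statistics.stdev`, which uses the divisor n - 1.

```python
import statistics

def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above or below the mean."""
    return (value - mean) / std_dev

# Example 4.11: sample data, so the sample standard deviation is used
data = [3, 8, 6, 14, 4, 12, 7, 10]
z_14 = z_score(14, statistics.mean(data), statistics.stdev(data))

# Example 4.12: the course means and standard deviations are given
z_stat = z_score(66, 51, 12)
z_math = z_score(80, 72, 16)

print(round(z_14, 2), z_stat, z_math)
```

The larger z-score identifies the course in which the student stands higher relative to the class.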
4.5 Moments, Skewness and Kurtosis
The measures of central tendency and variation discussed in the previous sections do not reveal the entire story about a frequency distribution. Two distributions may have the same mean and standard deviation but differ in shape. Further description of their characteristics is necessary, and this is provided by measures of skewness and kurtosis.
4.5.1 Moments
Moments are statistical tools used in statistical investigation. The moments of a distribution are the arithmetic means of the various powers of the deviations of items from some number. In this course, we shall use them in the study of the skewness and kurtosis of a statistical distribution.
Moments about the origin
$$M_r = \frac{\sum X_i^r}{n}, \quad r = 0, 1, 2, 3, \ldots$$
Moments about the origin for a grouped or an ungrouped frequency distribution:
$$M_r = \frac{\sum f_iX_i^r}{n}$$
where $f_i$ is the frequency of $X_i$, and $X_i$ is the midpoint in the case of a grouped frequency distribution or the class value in the case of an ungrouped frequency distribution.
Note that: $M_1 = \bar{X}$, $M_0 = 1$.
Moments about the Mean (Central Moments)
$$M'_r = \frac{\sum (X_i - \bar{X})^r}{n}$$
Moments about the mean for a grouped or an ungrouped frequency distribution:
$$M'_r = \frac{\sum f_i(X_i - \bar{X})^r}{n}$$
where $f_i$ is the frequency of $X_i$, and $X_i$ is the midpoint in the case of a grouped frequency distribution or the class value in the case of an ungrouped frequency distribution.

Note that: $M'_2$ equals the sample variance $S^2$ if n − 1 is approximated by n.

Moments about any arbitrary constant A
$$M'_r = \frac{\sum (X_i - A)^r}{n}$$
Moments about any arbitrary constant A for a grouped or an ungrouped frequency distribution:
$$M'_r = \frac{\sum f_i(X_i - A)^r}{n}$$

Example 4.13: Find the first four moments about the mean for the following individual series
Xi: 3 6 8 10 18
Solution: n = 5, $\bar{X} = \frac{45}{5} = 9$

S.No     Xi     (Xi − X̄)    (Xi − X̄)^2    (Xi − X̄)^3    (Xi − X̄)^4
1        3      -6           36             -216           1296
2        6      -3           9              -27            81
3        8      -1           1              -1             1
4        10     1            1              1              1
5        18     9            81             729            6561
Total    45     0            128            486            7940

Thus,
$$M'_1 = \frac{\sum (X_i - 9)}{5} = 0, \quad M'_2 = \frac{\sum (X_i - 9)^2}{5} = \frac{128}{5} = 25.6,$$
$$M'_3 = \frac{\sum (X_i - 9)^3}{5} = \frac{486}{5} = 97.2, \quad M'_4 = \frac{\sum (X_i - 9)^4}{5} = \frac{7940}{5} = 1588.$$
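
The central moments of Example 4.13 follow directly from the definition, as this minimal Python sketch shows. The function name `central_moment` is an illustrative choice.

```python
def central_moment(data, r):
    """r-th moment about the mean: sum of (x - mean)^r divided by n."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** r for x in data) / n

data = [3, 6, 8, 10, 18]  # individual series from Example 4.13
moments = [central_moment(data, r) for r in range(1, 5)]
print(moments)
```

The first central moment is always zero, because the deviations from the mean sum to zero.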

4.5.2 Skewness
Skewness refers to lack of symmetry (or departure from symmetry) in a distribution.
 A skewed frequency distribution is one that is not symmetrical.
 Skewness is concerned with the shape of the curve not size.
A distribution is said to be symmetrical when the values are distributed evenly on both sides of the mean (the distribution of the data below the mean mirrors that above the mean). In a symmetrical distribution, the mean, median and mode coincide (i.e., mean = median = mode).
Positively skewed distribution: if the value of mean is greater than the mode, skewness is said to
be positive. In a positively skewed distribution mean is greater than the mode and the median lies
somewhere in between mean and mode. A positively skewed distribution contains some values
that are much larger than the majority of other observations.
Negatively Skewed distribution: if the value of mode is greater than the mean, skewness is said
to be negative. In a negatively skewed distribution mode is greater than the mean and the median
lies in between mean and mode. The mean is pulled towards the low-valued item (that is, to the
left). A negatively skewed distribution contains some values that are much smaller than the
majority of observations.
Note that: In moderately skewed distributions the averages have the following
relationship.
(Mean – mode) = 3(mean - median)

How to check the presence of skewness in a distribution?

Skewness is present in the data if:
i) The graph is not symmetrical.
ii) The mean, median and mode do not coincide.
iii) The sum of positive and negative deviations from the median is not zero.
iv) The frequencies are not similarly distributed on either side of the mode.
Measures of skewness ($\alpha_3$)
A measure of skewness gives a numerical expression for the degree and direction of asymmetry in a distribution. It gives information about the shape of the distribution and the degree of variation on either side of the central value. The three most commonly used measures of skewness are Pearson’s coefficient of skewness, Bowley’s coefficient of skewness and the coefficient of skewness based on moments.
1. Pearson’s coefficient of skewness (Pearsonian coefficient of skewness)
The skewness of the distribution can be measured by Pearson’s coefficient of skewness ($\alpha_3$), for which the formula is given below:
$$\alpha_3 = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$$

2. Bowley’s Coefficient of Skewness
Bowley’s coefficient of skewness is based on quartiles. The formula for calculating the coefficient of skewness is:
$$\alpha_3 = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$$

3. Moment Coefficient of Skewness
The moment coefficient of skewness is based on moments. The formula for calculating the coefficient of skewness is:
$$\alpha_3 = \frac{M'_3}{(M'_2)^{3/2}} = \frac{M'_3}{\sigma^3}, \quad \text{where } M'_r = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^r$$

The shape of the curve is determined by the value of $\alpha_3$:
• $\alpha_3 > 0$: the distribution is positively skewed (skewed to the right), i.e., mode < median < mean. Smaller observations are more frequent than larger observations; that is, the majority of the observations have a value below the average.
• $\alpha_3 = 0$: the distribution is symmetric, i.e., mean = median = mode.
• $\alpha_3 < 0$: the distribution is negatively skewed (skewed to the left), i.e., mean < median < mode. Smaller observations are less frequent than larger observations; that is, the majority of the observations have a value above the average.
4.5.3 Kurtosis
Kurtosis is a measure of the peakedness of a distribution. The degree of kurtosis of a distribution is measured relative to the peakedness of a normal curve. If a curve is more peaked than the normal curve it is called ‘leptokurtic’; if it is flatter than the normal curve it is called ‘platykurtic’ or flat-topped. The normal curve itself is known as ‘mesokurtic’.

Measures of Kurtosis ($\alpha_4$)

The moment coefficient of kurtosis:
$$\alpha_4 = \frac{M'_4}{(M'_2)^2} = \frac{M'_4}{\sigma^4}$$

The peakedness depends on the value of $\alpha_4$:
• $\alpha_4 > 3$: the curve is leptokurtic,
• $\alpha_4 = 3$: the curve is mesokurtic,
• $\alpha_4 < 3$: the curve is platykurtic.

Example: Based on the following data:
$M'_0 = 1$, $M'_1 = -0.6$, $M'_2 = 1.6$, $M'_3 = -2.4$, $M'_4 = 5.8$
a/ Find the coefficient of skewness and discuss the distribution type.
b/ Find the coefficient of kurtosis and discuss the distribution type.

Solution:
a/ $\alpha_3 = \frac{M'_3}{(M'_2)^{3/2}} = \frac{-2.4}{1.6^{3/2}} = -1.19 < 0$, so the distribution is negatively skewed.
b/ $\alpha_4 = \frac{M'_4}{(M'_2)^2} = \frac{5.8}{1.6^2} = 2.27 < 3$, so the curve is platykurtic.

Example 4.14: Find the coefficient of skewness and the coefficient of kurtosis for Example 4.13 above.
Solution:
i) $\alpha_3 = \frac{M'_3}{(M'_2)^{3/2}} = \frac{97.2}{(25.6)^{3/2}} = \frac{97.2}{129.527} = 0.75$; the distribution is positively skewed.
ii) $\alpha_4 = \frac{M'_4}{(M'_2)^2} = \frac{1588}{25.6^2} = 2.42$; the curve is platykurtic.
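
The moment coefficients can be evaluated mechanically once the central moments are known. The sketch below takes the moments computed in Example 4.13 as inputs; the function names and the tolerance used to decide "equal to 3" are assumptions made for illustration.

```python
def moment_skewness(m2, m3):
    """Moment coefficient of skewness: M'3 / (M'2)^(3/2)."""
    return m3 / m2 ** 1.5

def moment_kurtosis(m2, m4):
    """Moment coefficient of kurtosis: M'4 / (M'2)^2."""
    return m4 / m2 ** 2

def kurtosis_type(alpha4, tol=1e-9):
    """Classify the curve relative to the normal curve (alpha4 = 3)."""
    if abs(alpha4 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if alpha4 > 3 else "platykurtic"

# Central moments from Example 4.13
m2, m3, m4 = 25.6, 97.2, 1588
alpha3 = moment_skewness(m2, m3)
alpha4 = moment_kurtosis(m2, m4)
print(round(alpha3, 2), round(alpha4, 2), kurtosis_type(alpha4))
```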


Exercise 4
1. Calculate the mean deviation about the mean, median and mode, and their coefficients, and also the variance and standard deviation for the following data.

Size of shoes       3     6     11    2    4    10    5    7    8    9
No. of pairs sold   10    15    25    6    4    3     2    8    9    4

2. An analysis of the monthly wages paid (in birr) to workers in two firms A and B belonging to the same industry gives the following results.

Value        Firm A    Firm B
Mean wage    52.5      47.5
Variance     100       121

In which firm, A or B, is there greater variability in individual wages?
3. A meteorologist interested in the consistency of temperatures in three cities during a given week collected the following data. The temperatures for the five days of the week in the three cities were
City 1: 25, 24, 23, 26, 17
City 2: 22, 21, 24, 22, 20
City 3: 32, 27, 35, 24, 28
Which city has the most consistent temperature, based on these data?
4. Some characteristics of the annual family income distribution (in Birr) in two regions are as follows:

Region    Mean    Median    Standard deviation
A         6250    5100      960
B         6980    5500      940

a) Calculate the coefficient of skewness for each region.
b) For which region is the income more consistent?
5. The median and the mode of a mesokurtic distribution are 32 and 34 respectively. The 4th moment about the mean is 243. Compute the Pearsonian coefficient of skewness and identify the type of skewness. Assume (n − 1 = n).
6. If the standard deviation of a symmetric distribution is 10, what should be the value of the fourth moment so that the distribution is mesokurtic?

CHAPTER FIVE
5 Elementary Probability

“Life is a school of probability.”

Walter Bagehot

The notion that chance, or probability, can be treated numerically is relatively recent. Indeed, for most of recorded history it was felt that what occurred in life was determined by forces that were beyond one’s ability to understand. It was only during the first half of the 17th century, near the end of the Renaissance, that people became curious about the world and the laws governing its operation. Among the curious were the gamblers.

A cynical person once said, “The only two sure things are death and taxes.” This philosophy no
doubt arose because so much in people’s lives is affected by chance.

5.1 Introduction
Probability as a general concept can be defined as the chance of an event occurring. Most people
are familiar with probability from observing or playing games of chance, such as card games or
lotteries. Probability is the basis of inferential statistics.

The basic concepts of probability are explained in this chapter. These concepts include probability experiments, sample spaces, the addition and multiplication rules, and the probabilities of complementary events. Also in this chapter, you will learn the rules for counting, the differences between permutations and combinations, and how to determine the number of possible combinations in specific situations. Section 4–5 explains how the counting rules and the probability rules can be used together to solve a wide variety of problems. Finally, in section six, the concept of probability is extended to conditional probability and independence.
At the end of this chapter students are expected to:
 Know what is meant by sample space, event, relative frequency, probability,
conditional probability, independence.

5.2 Definitions of Some Probability Terms

The terms that are used most frequently, and that form the cornerstone of probability, are defined as follows:
Probability experiment: A process that leads to well-defined results called outcomes. For example, flipping a coin once, rolling one die once, or the like.
Outcome: The result of a single trial of a probability experiment. It is sometimes called a sample point. Example 5.1: When a coin is tossed once, there are two possible outcomes: head or tail. In the roll of a single die, there are six possible outcomes: 1, 2, 3, 4, 5, or 6.
Sample space: The set of all possible outcomes of a probability experiment, denoted by S or Ω. Example 5.2: For the experiments in Example 5.1, S = {H, T} and S = {1, 2, 3, 4, 5, 6}, respectively.
Event: A subset of the sample space (it contains one or more outcomes of the sample space), defined for a particular purpose. A simple event is an event having only a single outcome. A compound event consists of two or more outcomes or simple events. Events are denoted by capital letters such as A, B, F, etc. Example 5.3: Let A be the event of getting an odd number when rolling a die; then A = {1, 3, 5}.
Mutually exclusive events: Suppose you have two events, say A and B. If these events have no sample point in common, that is, they cannot occur simultaneously, then the two events are called mutually exclusive. Example 5.4: Consider the experiment of rolling a die. Let A be the event of odd numbers and B be the event of even numbers, A = {1, 3, 5}, B = {2, 4, 6}; then A and B are mutually exclusive events.
Exhaustive events: A situation in which the events together contain all the elements of the sample space. For example, S = {Head, Tail} is exhaustive for the experiment of tossing a coin.
Union of events: The union of two events A and B, denoted by A ∪ B, consists of all outcomes that are in A or in B or in both A and B. Example 5.5: Let A = {1, 3, 5}, B = {2, 4, 5, 6}; then A ∪ B = {1, 2, 3, 4, 5, 6}.
Intersection of events: The intersection of events A and B, denoted by A ∩ B, consists of all outcomes that are in both A and B. Example 5.6: A = {1, 3, 5}, B = {2, 4, 5, 6}; then A ∩ B = {5}.
Complement of an event: The complement of event A, denoted by $A^c$ or $A'$, consists of all outcomes of the sample space that are not in A. Example 5.7: Let the sample space be S = {1, 2, 3, 4, 5, 6} and the event A = {1, 3, 5}; then $A^c$ = {2, 4, 6}.
Null event: The event containing no outcomes. It is the complement of the sample space.
Probability of an event: The probability of event A, denoted by P(A), is the probability that the outcome of the experiment is contained in A.
Equally likely events: A situation in which one event is as likely to occur as the other; that is, the events have equal probability of occurrence. Example 5.8: In Example 5.1, the outcomes 1, 2, 3, 4, 5 and 6 of rolling a fair die are equally likely.
Independent events: Two events are said to be independent if knowing whether one of them has occurred does not change the probability that the other occurs. (An example is given in section 5.6.)

5.3 Counting Rules


In order to calculate probabilities, we have to know
• The number of elements of an event.
• The number of elements of the sample space.
That is, in order to judge what is probable, we have to know what is possible.
In order to determine the number of outcomes, one can use several rules of counting:
1. Addition rule
2. Multiplication rule
3. Permutation rule
4. Combination rule
5.3.1 Addition Rule
If a choice can be made in $n_1$ ways, or in $n_2$ ways, …, or in $n_k$ ways, and these alternatives cannot be performed together, then the choice can be made in $n_1 + n_2 + n_3 + \cdots + n_k$ different ways.

Example 5.9: If there are two bus routes and three railway routes from Addis Ababa to Debre Berhan, then altogether we have 2 + 3 = 5 different ways to reach Debre Berhan.

5.3.2 Multiplication (Fundamental) Rule


In a sequence of n events in which the first one has $k_1$ possibilities, the second event has $k_2$, the third has $k_3$, and so forth, the total number of possibilities of the sequence will be $k_1 \cdot k_2 \cdot k_3 \cdots k_n$.

Example 5.10: How many different 7-place license plates are possible if the first 3 places are to be occupied by letters and the final 4 by numbers?

Solution: By the multiplication rule the answer is 26 · 26 · 26 · 10 · 10 · 10 · 10 = 175,760,000.

Example 5.11: In the above example, how many license plates would be possible if repetition among letters or numbers were prohibited?
Solution: In this case there would be 26 · 25 · 24 · 10 · 9 · 8 · 7 = 78,624,000 possible plates.
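
Both license-plate counts can be verified with a short Python sketch. It relies on `math.perm` (available from Python 3.8), which counts ordered selections without repetition; the variable names are illustrative.

```python
import math

# Example 5.10: repetition allowed, so each place is chosen independently
plates_with_repetition = 26 ** 3 * 10 ** 4

# Example 5.11: no repetition, so the available choices shrink at each place
plates_without_repetition = math.perm(26, 3) * math.perm(10, 4)

print(plates_with_repetition, plates_without_repetition)
```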
5.3.3 Permutation Rule
How many different ordered arrangements of the letters a, b, c are possible? By direct enumeration we see that there are 6: namely, abc, acb, bac, bca, cab and cba. Each arrangement is known as a permutation. That is, a permutation is an arrangement of n objects in a specific order. Thus, there are six possible permutations of a set of 3 objects. This result could also have been obtained from the basic principle (the multiplication rule), since the first object in the permutation can be any of the 3, the second object can then be chosen from any of the remaining 2, and the third object is then the remaining one. Thus there are 3 · 2 · 1 = 6 possible permutations.

Permutation Rule 1: Suppose now that we have n objects. Reasoning similar to that just used for the 3 letters shows that there are
$$n(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1 = n!$$
different permutations of the n objects.
Example 5.12: A class of Stat 173 consists of 6 men and 4 women. An examination is given, and the students are ranked according to their performance. Assume that no two students obtain the same score.
A. How many different rankings are possible?
B. If the men are ranked just among themselves and the women among themselves, how many different rankings are possible?
Solution:
A. As each ranking corresponds to a particular ordered arrangement of the 10 people, we see that the answer to this part is 10! = 3,628,800.
B. As there are 6! possible rankings of the men among themselves and 4! possible rankings of the women among themselves, it follows from the basic principle that there are 6! · 4! = 720 · 24 = 17,280 possible rankings in this case.
Permutation Rule 2: We shall now determine the number of permutations of a set of n objects when certain of the objects are indistinguishable from each other. There are
$$\frac{n!}{n_1! \cdot n_2! \cdots n_r!}$$
different permutations of n objects, of which $n_1$ are alike, $n_2$ are alike, …, $n_r$ are alike.
Example 5.13: How many different letter arrangements can be formed using the letters PEPPER?
Solution: $\frac{6!}{3! \cdot 2! \cdot 1!} = 60$ possible letter arrangements.

Permutation Rule 3: Generally, if we are asked to arrange r objects chosen from among n objects, then the total number of arrangements is
$$_nP_r = \frac{n!}{(n-r)!}$$

Example 5.14: Suppose a businessman has a choice of five locations in which to establish his business. He wishes to rank only the top three locations. In how many different ways can he rank them?
Solution:
$$_5P_3 = \frac{5!}{(5-3)!} = 60 \text{ ways}$$
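
The three permutation rules can be cross-checked in Python. The sketch below uses `math.factorial` and `math.perm` from the standard library, and confirms the PEPPER count by brute-force enumeration with `itertools.permutations`; the variable names are illustrative.

```python
import math
from itertools import permutations

# Rule 1: n! orderings of n distinct objects (Example 5.12 A, 10 students)
rankings = math.factorial(10)

# Rule 2: n! / (n1! n2! ... nr!) when some objects are alike (Example 5.13)
pepper_formula = math.factorial(6) // (
    math.factorial(3) * math.factorial(2) * math.factorial(1)
)
# Brute force: generate all 720 orderings and keep the distinct ones
pepper_enumerated = len(set(permutations("PEPPER")))

# Rule 3: nPr = n! / (n - r)! (Example 5.14, top 3 of 5 locations)
locations = math.perm(5, 3)

print(rankings, pepper_formula, pepper_enumerated, locations)
```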
5.3.4 Combination Rule
We are often interested in determining the number of different groups of r objects that could be formed from a total of n objects. A selection of objects without regard to order is called a combination. That is, combinations are used when the order or arrangement is not important. The number of combinations of r objects selected from n objects is denoted by $_nC_r$ and is given by the formula
$$_nC_r = \frac{n!}{r!(n-r)!} = \binom{n}{r}$$

Example 5.15: From a group of 5 women and 7 men, how many different committees consisting of 2 women and 3 men can be formed? What if 2 of the men are feuding and refuse to serve on the committee together?
Solution: As there are $\binom{5}{2}$ possible groups of 2 women and $\binom{7}{3}$ possible groups of 3 men, it follows from the basic principle that there are
$$\binom{5}{2}\binom{7}{3} = \left(\frac{5 \cdot 4}{2 \cdot 1}\right)\left(\frac{7 \cdot 6 \cdot 5}{3 \cdot 2 \cdot 1}\right) = 350$$
possible committees consisting of 2 women and 3 men. On the other hand, if 2 of the men refuse to serve on the committee together, then, as there are $\binom{2}{0}\binom{5}{3}$ possible groups of 3 men not containing either of the 2 feuding men and $\binom{2}{1}\binom{5}{2}$ groups of 3 men containing exactly 1 of the feuding men, it follows that there are
$$\binom{2}{0}\binom{5}{3} + \binom{2}{1}\binom{5}{2} = 10 + 20 = 30$$
groups of 3 men not containing both of the feuding men. Since there are $\binom{5}{2}$ ways to choose the 2 women, it follows that in this case there are $30\binom{5}{2} = 300$ possible committees.
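
The committee counts in Example 5.15 can be confirmed both by formula and by enumeration. In the sketch below the men are labelled 0 to 6 and the two feuding men are arbitrarily taken to be 0 and 1; these labels are assumptions made only for the illustration.

```python
import math
from itertools import combinations

# Unrestricted case: choose 2 of 5 women and 3 of 7 men
all_committees = math.comb(5, 2) * math.comb(7, 3)

# Restricted case: enumerate the male groups and drop those with both feuding men
men = range(7)
feuding = {0, 1}  # hypothetical labels for the two feuding men
valid_male_groups = [g for g in combinations(men, 3) if not feuding.issubset(g)]
restricted_committees = math.comb(5, 2) * len(valid_male_groups)

print(all_committees, len(valid_male_groups), restricted_committees)
```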
5.4 Approaches in Probability Definition
The probability of an event is denoted by $P(\cdot)$, where P stands for probability and the dot stands for any event, say A, B, G, etc.
Generally, approaches to probability can be divided into two, namely the subjective approach and the objective approach.
5.4.1 Subjective approach:
A subjective probability is derived from an individual's personal judgment about whether a specific outcome is likely to occur. Subjective probabilities contain no formal calculations and only reflect the subject's opinions and past experience.
Subjective probabilities differ from person to person. Because the probability is subjective, it contains a high degree of personal bias. An example of subjective probability would be asking an Arsenal fan, before the football season starts, about the chances of Arsenal becoming world champions. While there is no mathematical proof behind the answer, fans might still reply in actual percentage terms, for example by saying that Arsenal has a 95% chance of becoming world champions.
5.4.2 Objective approach:
Here the probability of an event in a certain experiment is based on experimental evidence or on a random process. Within this approach there are three sub-approaches:
The classical approach
The frequentist approach
The axiomatic approach
5.4.3.1 The Classical Approach
If a procedure has n different simple events, each with an equal chance of occurring, and event A can occur in n(A) of these ways, then
$$P(A) = \frac{n(A)}{n(S)} = \frac{\text{number of elements in } A}{\text{number of elements in the sample space}}$$
Assumptions in the classical approach
The outcomes must be equally likely
The experiment should never be repeated more than once
The sample space should be finite

Example 5.16: Toss a fair coin once and find the probability of the occurrence of a head.
Solution: Since the sample space is finite (either head or tail) and the outcomes are equally likely,
$$P(\text{Head}) = \frac{n(\text{Head})}{n(\text{Sample space})} = \frac{1}{2} = 0.5$$
Example 5.17: For a card drawn from an ordinary deck, find the probability of getting a queen.
Solution: Since there are 4 queens and 52 cards, $P(\text{queen}) = \frac{4}{52} = \frac{1}{13}$.

If one of the assumptions stated above is violated, the classical approach is no longer valid.
5.4.3.2 Frequentist (empirical) Approach
If, after n repetitions of an experiment, where n is very large, an event is observed to occur in h of these, then the probability of the event is $\frac{h}{n}$. In other words, conduct an experiment a large number of times and count the number of times event A actually occurs; then an estimate of P(A) is
$$P(A) \approx \frac{\text{number of times } A \text{ occurred}}{\text{number of times the trial was repeated}}$$

Example 5.18: Suppose a coin was tossed 1000 times and the result was 587 tails. The relative frequency of tails is $\frac{587}{1000}$. Another 1000 tosses lead to 511 tails. Then the relative frequency of tails over all 2000 tosses is $\frac{587 + 511}{1000 + 1000} = \frac{1098}{2000}$. Proceeding in this manner, we obtain a sequence of numbers which gets closer and closer to the number defined as the probability of a tail in a single toss.
Therefore,
$$P(A) = \lim_{n \to \infty} \frac{n(A)}{n}$$
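
The limiting behaviour described above can be explored with a small simulation. This is only a sketch: it assumes a fair coin, uses Python's pseudo-random generator with a fixed seed (an arbitrary choice for repeatability), and the function name `relative_frequency_of_tails` is hypothetical.

```python
import random

def relative_frequency_of_tails(n_tosses, seed=0):
    """Simulate n_tosses of a fair coin and return the share of tails."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    tails = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return tails / n_tosses

# Pooled relative frequency from Example 5.18
pooled = (587 + 511) / (1000 + 1000)
print("Example 5.18 pooled frequency:", pooled)

# Relative frequency for increasing numbers of simulated tosses
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, relative_frequency_of_tails(n))
```

For a fair coin the printed frequencies should tend to settle near 0.5 as n grows, although any single run can still deviate.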

5.4.3.3 Axiomatic Approach


Both the classical and frequentist approaches have serious drawbacks, the first because the words
“equally likely” are vague and the second because the “large number” involved is vague. Because
of these difficulties, statisticians have been led to an axiomatic approach of probability.
Axiom 1: For every event A, 𝑃(𝐴) ≥ 0
Axiom 2: For the sure or certain event, 𝑃(𝑆) = 1
Axiom 3: For any number of mutually exclusive events 𝐴1 , 𝐴2 , 𝐴3 …
𝑃(𝐴1 ∪ 𝐴2 ∪ 𝐴3 ∪ … ) = 𝑃(𝐴1 ) + 𝑃(𝐴2 ) + 𝑃(𝐴3 ) + ⋯
In particular, for two mutually exclusive events 𝐴1 𝑎𝑛𝑑 𝐴2
𝑃(𝐴1 ∪ 𝐴2 ) = 𝑃(𝐴1 ) + 𝑃(𝐴2 )

5.5 Some Probability Rules
Rule 1: If $A_1 \subset A_2$, then $P(A_1) \le P(A_2)$.

Rule 2: For every event A, $0 \le P(A) \le 1$, i.e., a probability is between 0 and 1.

Rule 3: For $\emptyset$, the empty set, $P(\emptyset) = 0$, i.e., the impossible event has probability zero.
Rule 4: If $A'$ is the complement of A, then $P(A') = 1 - P(A)$.
Rule 5: If A and B are any two events, then $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
More generally, if $A_1, A_2, A_3$ are three events, then
$$P(A_1 \cup A_2 \cup A_3) = P(A_1) + P(A_2) + P(A_3) - P(A_1 \cap A_2) - P(A_2 \cap A_3) - P(A_3 \cap A_1) + P(A_1 \cap A_2 \cap A_3)$$
Rule 6: $P(A \cap B) = P(A)P(B \mid A)$ or $P(A \cap B) = P(B)P(A \mid B)$; for independent events, $P(A \cap B) = P(A)P(B)$.
Example 5.11: Suppose we toss two coins, and suppose that each of the four points in the sample space $S = \{HH, HT, TH, TT\}$ is equally likely and hence has probability $\frac{1}{4}$. Let E be the event that the first coin falls heads, and F be the event that the second coin falls heads.
Solution: $E = \{HH, HT\}$ and $F = \{HH, TH\}$. Then the probability that either the first or the second coin falls heads is
$$P(E \cup F) = P(E) + P(F) - P(E \cap F) = \frac{1}{2} + \frac{1}{2} - \frac{1}{4} = \frac{3}{4}$$
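
The two-coin calculation can be checked by listing the sample space explicitly. The following Python sketch assumes equally likely outcomes and uses exact fractions; the helper `prob` is an illustrative name.

```python
from fractions import Fraction
from itertools import product

# Sample space for two coin tosses: (H,H), (H,T), (T,H), (T,T)
sample_space = set(product("HT", repeat=2))

def prob(event):
    """Classical probability: favourable outcomes over all outcomes."""
    return Fraction(len(event), len(sample_space))

E = {o for o in sample_space if o[0] == "H"}  # first coin falls heads
F = {o for o in sample_space if o[1] == "H"}  # second coin falls heads

direct = prob(E | F)                           # count the union directly
by_rule = prob(E) + prob(F) - prob(E & F)      # addition rule (Rule 5)
print(direct, by_rule)
```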
5.6 Conditional Probability and Independence
Let A and B be two events such that $P(A) > 0$. Denote by $P(B \mid A)$ the probability of B given that A has occurred. Since A is known to have occurred, it becomes the new sample space, replacing the original S. From this we are led to the definition
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \quad P(A) > 0, \quad \text{or} \quad P(A \cap B) = P(A)\,P(B \mid A)$$

In words, this says that the probability that both A and B occur is equal to the probability that A occurs times the probability that B occurs given that A has occurred. We call $P(B \mid A)$ the conditional probability of B given A, i.e., the probability that B will occur given that A has occurred.
Example 5.12: A jar contains black and white marbles. Two marbles are chosen without replacement. The probability of selecting a black marble and then a white marble is 0.34, and the probability of selecting a black marble on the first draw is 0.47. What is the probability of selecting a white marble on the second draw, given that the first marble drawn was black?

Solution: $P(\text{White} \mid \text{Black}) = \frac{P(\text{Black and White})}{P(\text{Black})} = \frac{0.34}{0.47} = 0.72$
Example 5.13: The probability that it is Friday and that a student is absent is 0.03. Since there are 5 school days in a week, the probability that it is Friday is 0.2. What is the probability that a student is absent given that today is Friday?

Solution: $P(\text{Absent} \mid \text{Friday}) = \frac{P(\text{Friday and Absent})}{P(\text{Friday})} = \frac{0.03}{0.2} = 0.15$
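
The two conditional-probability examples above apply the same ratio, which a small Python function makes explicit. The function name `conditional_probability` is an illustrative choice.

```python
def conditional_probability(p_a_and_b, p_a):
    """P(B | A) = P(A and B) / P(A), defined only when P(A) > 0."""
    if p_a <= 0:
        raise ValueError("P(A) must be positive")
    return p_a_and_b / p_a

# Marble example: P(Black and White) = 0.34, P(Black) = 0.47
p_white_given_black = conditional_probability(0.34, 0.47)

# Absence example: P(Friday and Absent) = 0.03, P(Friday) = 0.2
p_absent_given_friday = conditional_probability(0.03, 0.2)

print(round(p_white_given_black, 2), round(p_absent_given_friday, 2))
```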
It often happens that the knowledge that a certain event E has occurred has no effect on the probability that some other event F has occurred, that is, $P(F \mid E) = P(F)$. One would expect that in this case the equation $P(E \mid F) = P(E)$ would also be true. If these equations are true, we say that F is independent of E.
Definition: Two events E and F are independent if both E and F have positive probability and if
$$P(E \mid F) = P(E) \quad \text{and} \quad P(F \mid E) = P(F)$$

Note that: If $P(E) > 0$ and $P(F) > 0$, then E and F are independent if and only if
$$P(E \cap F) = P(E)P(F)$$
Example 5.14: Suppose that we roll a pair of fair dice, so each of the 36 possible outcomes is equally likely. Let A denote the event that the first die lands on 3, let B be the event that the sum of the dice is 8, and let C be the event that the sum of the dice is 7.
A. Are A and B independent?
B. Are A and C independent?
Solution:
A. Since $A \cap B$ is the event that the first die lands on 3 and the second on 5, we see that
$$P(A \cap B) = P(\{(3,5)\}) = \frac{1}{36}.$$
On the other hand,
$$P(A) = P(\{(3,1), (3,2), (3,3), (3,4), (3,5), (3,6)\}) = \frac{6}{36}$$
and
$$P(B) = P(\{(2,6), (3,5), (4,4), (5,3), (6,2)\}) = \frac{5}{36}.$$
Therefore, since $\frac{1}{36} \neq \frac{6}{36} \cdot \frac{5}{36}$, we see that $P(A \cap B) \neq P(A)\,P(B)$, and so events A and B are not independent.
B. Events A and C are independent. This is seen by noting that
$$P(A \cap C) = P(\{(3,4)\}) = \frac{1}{36}$$
and
$$P(C) = P(\{(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)\}) = \frac{6}{36} = \frac{1}{6},$$
while $P(A) = \frac{1}{6}$. Therefore $P(A \cap C) = P(A)\,P(C)$, and so events A and C are independent.
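The independence check in Example 5.14 can also be carried out by brute-force enumeration of the 36 outcomes. The following is a minimal sketch, not part of the original module; the helper names `prob` and `independent` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on an outcome."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

def independent(e, f):
    """Check the product rule P(E and F) == P(E) * P(F) exactly."""
    return prob(lambda o: e(o) and f(o)) == prob(e) * prob(f)

A = lambda o: o[0] == 3      # first die lands on 3
B = lambda o: sum(o) == 8    # sum of the dice is 8
C = lambda o: sum(o) == 7    # sum of the dice is 7

a_b_independent = independent(A, B)
a_c_independent = independent(A, C)
print(a_b_independent, a_c_independent)
```

Exact fractions are used so that the equality test is not affected by floating-point rounding.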

CHAPTER SIX
6 Probability Distribution
Before a probability distribution is defined formally, the definition of a variable is reviewed. In the first chapter, a variable was defined as a characteristic or attribute that can assume different values. Various letters of the alphabet are used to represent variables.
At the end of this chapter students are expected to:
• Know what is meant by random variable, probability distribution, probability density function, expected value and variance;
• Be familiar with some standard discrete and continuous probability distributions;
• Be able to use standard statistical tables for the Normal, t and Chi-square distributions.

6.1 The Definition of Random Variable and Probability Distribution


Definition: Let S be the sample space of an experiment and let X be a real-valued function defined over S. Then X is called a random variable (or stochastic variable).
A random variable, usually shortened to r.v. (rv), is a function defined on a sample space S and taking values in the real line $\mathbb{R}$; it is denoted by a capital letter, such as X, Y, Z. Thus, the value of the r.v. X at the sample point s is X(s), and the set of all values of X, that is, the range of X, is usually denoted by X(S) or $R_X$.

The difference between a r.v. and a function in the usual sense is that the domain of a r.v. is a sample space S, whereas the domain of an ordinary function is a subset of $\mathbb{R}$ or of a Euclidean space of higher dimension. The term “random variable” is used here rather than “function” because a r.v. is associated with the outcomes of a random experiment. Of course, on the same sample space, one may define many distinct r.v.s.
Example 6.1: Suppose we are about to learn the sexes of the three children of a certain family. The sample space of this experiment consists of the following 8 outcomes:
$$S = \{(b,b,b), (b,b,g), (b,g,b), (b,g,g), (g,b,b), (g,b,g), (g,g,b), (g,g,g)\}$$
The outcome $(g,b,b)$ means, for instance, that the youngest child is a girl, the next youngest is a boy, and the oldest is a boy. Suppose that each of these 8 possible outcomes is equally likely, so that each has probability 1/8. If we let X denote the number of female children in this family, then the value of X is determined by the outcome of the experiment. That is, X is a random variable whose value will be 0, 1, 2 or 3, i.e.
$$X(bbb) = 0, \quad X(gbb) = X(bgb) = X(bbg) = 1,$$
$$X(ggb) = X(gbg) = X(bgg) = 2, \quad X(ggg) = 3.$$
Example 6.2: Recording the lifetime of an electronic device, or of an electrical appliance. Here S
is the interval (0, T) or for some justifiable reasons, S = (0, ∞), a r.v. X of interest is X(s) = s, s ∈
S.
Example 6.3: Measuring the dosage of a certain medication administered to a patient, until a
positive reaction is observed. Here S = (0, D) for some suitable D.
In the examples discussed above we have seen r.v.s that take different kinds of values. Random variables can therefore be grouped into two broad categories: discrete and continuous random variables.
6.1.1 Discrete Random Variable and Probability Distribution (pmf)
Definition 6.2: A random variable X is called discrete (or of the discrete type) if X takes on a finite or countably infinite number of values; that is, either finitely many values such as $x_1, \ldots, x_n$, or countably infinitely many values such as $x_0, x_1, x_2, \ldots$.
Alternatively, a discrete random variable can be described as one that
• typically takes whole-number values (such as 0, 1, 2, 3, etc.);
• takes a finite or countably infinite number of values;
• jumps from one value to the next and cannot take any value in between.
Example 6.4:
Experiment | Random Variable (X) | Variable values
Children of one gender in a family | Number of girls | 0, 1, 2, …
Answer 23 questions of an exam | Number of correct answers | 0, 1, 2, ..., 23
Count cars at toll between 11:00 am & 1:00 pm | Number of cars arriving | 0, 1, 2, ..., n

Definition: If X is a discrete random variable, the function given by $f(x) = P(X = x)$ for each x within the range of X is said to be the probability distribution or probability mass function (pmf) of X if it satisfies the following two conditions:
1. The probabilities of all the values of X must sum to 1; that is, $\sum_x f(x) = 1$.
2. The probability of each value of X must lie between 0 and 1 inclusive; that is, $0 \le f(x) \le 1$.
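Example 6.5: Consider the r.v. X in Example 6.1 and construct the probability distribution of X.

Solution: Since X will equal 0 if the outcome is $(b,b,b)$, we see that
$$P(X = 0) = P\{(b,b,b)\} = \frac{1}{8}.$$
Since X will equal 1 if the outcome is $(g,b,b)$, $(b,g,b)$ or $(b,b,g)$, we have
$$P(X = 1) = P\{(g,b,b) \text{ or } (b,g,b) \text{ or } (b,b,g)\} = \frac{3}{8}.$$
Similarly,
$$P(X = 2) = P\{(b,g,g) \text{ or } (g,b,g) \text{ or } (g,g,b)\} = \frac{3}{8}, \qquad P(X = 3) = P\{(g,g,g)\} = \frac{1}{8}.$$
Therefore,
$$f(x) = \begin{cases} \frac{1}{8} & \text{if } x = 0, 3 \\ \frac{3}{8} & \text{if } x = 1, 2 \\ 0 & \text{otherwise} \end{cases}$$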
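The probability mass function of the number of girls can be reproduced by listing the eight equally likely outcomes and counting. This is only an illustrative sketch; the variable names are hypothetical and the equal-likelihood assumption is the one stated in Example 6.1.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Each outcome is a tuple such as ('g', 'b', 'b'); all 8 are equally likely.
outcomes = list(product("bg", repeat=3))

# X(outcome) = number of girls in the outcome.
counts = Counter(outcome.count("g") for outcome in outcomes)

# pmf[x] = P(X = x), kept as exact fractions.
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
```

The two pmf conditions (each value between 0 and 1, total equal to 1) can then be checked directly on `pmf`.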
Example 6.6: Suppose we toss a coin three times. The sample space is
$$S = \{TTT, TTH, THT, HTT, HHT, HTH, THH, HHH\}.$$
Let the random variable X be the number of heads.
A. List the values that the random variable can take.
B. Find the probability distribution of X.
Solution:
A. Once the random variable X is defined as the number of heads, its possible values are X = 0, 1, 2, or 3.
B.
Number of heads, x | 0 | 1 | 2 | 3
Probability, P(X = x) | 1/8 | 3/8 | 3/8 | 1/8

We can check that $\sum_x P(X = x) = \frac{1}{8} + \frac{3}{8} + \frac{3}{8} + \frac{1}{8} = 1$.

Example 6.7: Suppose that X is a random variable that takes on one of the values 1, 2, or 3. If $P\{X = 1\} = 0.4$ and $P\{X = 2\} = 0.1$, what is $P\{X = 3\}$?

Solution: Since the probabilities must sum to 1, we have
$$\begin{aligned} 1 &= P\{X = 1\} + P\{X = 2\} + P\{X = 3\} \\ 1 &= 0.4 + 0.1 + P\{X = 3\} \\ P\{X = 3\} &= 1 - 0.4 - 0.1 \\ &= 0.5 \end{aligned}$$

Example 6.8: A saleswoman has scheduled two appointments to sell encyclopedias. She feels her first appointment will lead to a sale with probability 0.3. She also feels that the second will lead to a sale with probability 0.6 and that the results from the two appointments are independent. What is the probability distribution of X, the number of sales made?

Solution: The random variable X can take on any of the values 0, 1, 2. It will equal 0 if neither appointment leads to a sale, and so
$$\begin{aligned} P\{X = 0\} &= P(\text{no sale on first, no sale on second}) \\ &= P(\text{no sale on first})\,P(\text{no sale on second}) \\ &= (1 - 0.3)(1 - 0.6) \\ &= 0.28 \end{aligned}$$
The random variable X will equal 1 either if there is a sale on the first and not on the second appointment, or if there is no sale on the first and a sale on the second appointment. Since these two events are disjoint, we have
$$\begin{aligned} P\{X = 1\} &= P(\text{sale on first, no sale on second}) + P(\text{no sale on first, sale on second}) \\ &= P(\text{sale on first})\,P(\text{no sale on second}) + P(\text{no sale on first})\,P(\text{sale on second}) \\ &= 0.3(1 - 0.6) + (1 - 0.3)(0.6) \\ &= 0.54 \end{aligned}$$
Finally, the random variable X will equal 2 if both appointments result in sales; thus
$$\begin{aligned} P\{X = 2\} &= P(\text{sale on first, sale on second}) \\ &= P(\text{sale on first})\,P(\text{sale on second}) \\ &= 0.3 \times 0.6 \\ &= 0.18 \end{aligned}$$
As a check on this result, we note that
$$P\{X = 0\} + P\{X = 1\} + P\{X = 2\} = 0.28 + 0.54 + 0.18 = 1.$$
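The same distribution can be obtained mechanically by enumerating every sale/no-sale combination and multiplying the probabilities, which is valid because the appointments are assumed independent. The sketch below is illustrative only; the function name `sales_distribution` is hypothetical.

```python
from itertools import product

def sales_distribution(success_probs):
    """pmf of the number of sales for independent appointments.

    success_probs[i] is the probability that appointment i ends in a sale.
    """
    pmf = {k: 0.0 for k in range(len(success_probs) + 1)}
    # Each outcome is a tuple of 0/1 flags: 1 = sale, 0 = no sale.
    for outcome in product([0, 1], repeat=len(success_probs)):
        prob = 1.0
        for sold, p in zip(outcome, success_probs):
            prob *= p if sold else (1.0 - p)   # independence: multiply
        pmf[sum(outcome)] += prob              # disjoint outcomes: add
    return pmf

pmf = sales_distribution([0.3, 0.6])
print({k: round(v, 2) for k, v in pmf.items()})
```

Because the outcomes are enumerated explicitly, the same function works for more than two appointments with different success probabilities.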
Exercise 6.1: Check whether the function given by $f(x) = \frac{x+2}{25}$ for x = 1, 2, 3, 4, 5 is a p.m.f.

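Definition: If X is a discrete random variable, the function given by
$$F(x) = P(X \le x) = \sum_{t \le x} f(t) \qquad \text{for all } x \in \mathbb{R},$$
where the sum runs over the values t in the range of X and f(t) is the value of the probability distribution (p.m.f.) of X at t, is called the distribution function, or the cumulative distribution function, of X.
If X takes on only a finite number of values $x_1 < x_2 < \cdots < x_n$, then the distribution function is given by
$$F(x) = \begin{cases} 0 & \text{for } x < x_1 \\ f(x_1) & \text{for } x_1 \le x < x_2 \\ f(x_1) + f(x_2) & \text{for } x_2 \le x < x_3 \\ \quad\vdots \\ 1 & \text{for } x \ge x_n \end{cases}$$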

Example 6.9:
Find the distribution function F of the total number of heads obtained in four tosses of a balanced coin.
Solution: The distribution function, or cumulative distribution function, F(x) is the following:
$$F(x) = \begin{cases} 0 & \text{for } x < 0 \\ \frac{1}{16} & \text{for } 0 \le x < 1 \\ \frac{5}{16} & \text{for } 1 \le x < 2 \\ \frac{11}{16} & \text{for } 2 \le x < 3 \\ \frac{15}{16} & \text{for } 3 \le x < 4 \\ 1 & \text{for } x \ge 4 \end{cases}$$
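A cumulative distribution function of a discrete variable is just the running total of its pmf. The sketch below rebuilds the values of Example 6.9 for four tosses of a balanced coin; the names `pmf`, `cdf` and `F` are illustrative and not taken from the module.

```python
from fractions import Fraction
from itertools import accumulate
from math import comb

n = 4  # number of tosses of a balanced coin

# pmf[k] = P(X = k) = C(n, k) / 2^n for k heads.
pmf = [Fraction(comb(n, k), 2 ** n) for k in range(n + 1)]

# cdf[k] = P(X <= k): running sum of the pmf.
cdf = list(accumulate(pmf))

def F(x):
    """Step function F(x) = P(X <= x) for any real x."""
    if x < 0:
        return Fraction(0)
    k = min(int(x), n)  # int() floors a non-negative x
    return cdf[k]

print([str(v) for v in cdf])
print(F(2.5))
```

Note that F is constant between the integer jump points, which is why `F(2.5)` equals the value at 2.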
Exercise 6.2: A telephone survey of households throughout Washington State is given below:

a. What is the probability that a household will have no telephone?


b. What is the probability that a household will have 2 or more telephone lines?
c. What is the probability that a household will have 2 to 4 phone lines?
d. What is the probability a household will have no phone lines or more than 4 phone lines?

e. Who do you think is in that 3.5% of the population?


6.1.2 Continuous Random Variable and Probability Distribution
Definition: A r.v. X is called continuous (or of the continuous type) if X takes all values in a proper interval $I \subseteq \mathbb{R}$.
Alternatively, continuous random variables can be described as follows. They
• take whole or fractional values;
• are obtained by measuring;
• take infinitely many values in an interval, too many to list as for a discrete variable.
Example 6.10:
The following are examples of continuous r.v.s.
Experiment | Random Variable X | Variable values
Weigh 100 people | Weight | 45.1, 78, ...
Measure part life | Hours | 900, 875.9, …
Ask food spending | Spending | 54.12, 42, ...
Measure time between arrivals | Inter-arrival time | 0, 1.3, 2.78, ...

Definition 6.4: A function with values f(x), defined over the set of all real numbers, is called a probability density function of the continuous random variable X if and only if
$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
for any real constants a ≤ b.
Probability density functions are also referred to as probability densities (p.d.f.), probability functions, or simply densities.
Remark:
• The probability density function f(x) of the continuous random variable X has the following properties:
1. $f(x) \ge 0$ for all x, $-\infty < x < \infty$;
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
• If X is a continuous random variable and a and b are real constants with a ≤ b, then
$$P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b).$$
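Example 6.11: If X has the probability density
$$f(x) = \begin{cases} k\,e^{-3x} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
find the constant k and $P(0.5 \le X \le 1)$.

Solution: Since f(x) is a pdf, $\int_0^{\infty} f(x)\,dx = 1$. Hence
$$\int_0^{\infty} k\,e^{-3x}\,dx = 1 \;\Rightarrow\; -\frac{k}{3}\left[\left(\lim_{x \to \infty} e^{-3x}\right) - 1\right] = 1 \;\Rightarrow\; \frac{k}{3} = 1 \;\Rightarrow\; k = 3,$$
and
$$P(0.5 \le X \le 1) = \int_{0.5}^{1} 3e^{-3x}\,dx = \left[-e^{-3x}\right]_{0.5}^{1} = -e^{-3} + e^{-1.5} = e^{-1.5} - e^{-3} \approx 0.1733.$$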
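The result of Example 6.11 can be sanity-checked numerically. The sketch below uses a simple midpoint rule, which is only one of many possible numerical integration schemes and is an assumption of this illustration; the upper limit 20 stands in for infinity because the remaining tail is negligible.

```python
import math

def f(x, k=3.0):
    """Density of Example 6.11 with the constant k found in the solution."""
    return k * math.exp(-3.0 * x) if x > 0 else 0.0

def integrate(func, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

total_area = integrate(f, 0.0, 20.0)        # should be close to 1
prob = integrate(f, 0.5, 1.0)               # P(0.5 <= X <= 1)
exact = math.exp(-1.5) - math.exp(-3.0)     # closed form from the solution

print(round(total_area, 6), round(prob, 4), round(exact, 4))
```

Agreement between `prob` and `exact` confirms both the value k = 3 and the sign of the exponents in the final answer.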

Exercise 6.3:
The p.d.f. of the random variable X is given by
$$f(x) = \begin{cases} \frac{C}{\sqrt{x}} & \text{for } 0 < x < 4 \\ 0 & \text{elsewhere} \end{cases}$$
Find
a. the value of C;
b. $P(X < \frac{1}{4})$ and $P(X > 1)$.

Definition: If X is a continuous random variable and the value of its probability density at t is f(t), then the function given by
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$$
is called the distribution function, or the cumulative distribution function, of the continuous r.v. X.

Theorem 6.1: If f(x) and F(x) are the values of the probability density and the distribution function of X at x, then
$$P(a \le X \le b) = F(b) - F(a)$$
for any real constants a and b with a ≤ b, and
$$f(x) = \frac{dF(x)}{dx}$$
where the derivative exists.


Exercise 6.4: Find the distribution function of the random variable X and evaluate $P(0.5 \le X \le 1)$, if the probability density of X is
$$f(x) = \begin{cases} 3e^{-3x} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$

Exercise 6.5:
A r.v. X has d.f. F given by:

(i) Determine the corresponding p.d.f. (f(x)).


(ii) Determine the constant c.
6.2 Introduction to Expectation- Mean and Variance of a Random Variable
A key concept in probability is the expected value of a random variable.
Definition: If X is a discrete random variable that takes on one of the possible values $x_1, x_2, \ldots, x_n$, then the expected value of X, denoted by E(X) or E[X], is defined by
$$E(X) = \sum_{i=1}^{n} x_i\,p(x_i),$$
where $p(x_i) = P(X = x_i)$ is the probability mass function (pmf) of X. If X is a continuous random variable with probability density function f(x), then
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx.$$
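Example 6.12: Find the expected value of the following random variable.
x | 0 | 1 | 2 | 3 | 4
P(X = x) | 0.18 | 0.34 | 0.23 | 0.21 | 0.04
Solution:
$$\begin{aligned} E(X) &= \sum_{x=0}^{4} x\,P(X = x) \\ &= 0(0.18) + 1(0.34) + 2(0.23) + 3(0.21) + 4(0.04) \\ &= 0.34 + 0.46 + 0.63 + 0.16 \\ &= 1.59 \end{aligned}$$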
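The weighted sum that defines the expected value of a discrete variable is easy to compute directly. This is a minimal sketch using the table of Example 6.12; the dictionary layout and the function name `expected_value` are illustrative choices, not part of the module.

```python
def expected_value(pmf):
    """E(X) = sum of x * P(X = x) over all values x of a discrete r.v."""
    total = sum(pmf.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in pmf.items())

# Table of Example 6.12: value -> probability.
pmf = {0: 0.18, 1: 0.34, 2: 0.23, 3: 0.21, 4: 0.04}

mean = expected_value(pmf)
print(round(mean, 2))
```

The validity check on the total probability guards against tables that have been mistyped.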
Note that: The expected value of a random variable is the same as the mean of the random variable:
$$\mu = E(X) = \begin{cases} \sum_x x\,P(x), & \text{if } X \text{ is discrete} \\ \int x f(x)\,dx, & \text{if } X \text{ is continuous.} \end{cases}$$

Suppose that we are given a random variable X along with its probability mass function (pmf) if it is discrete, or its probability density function (pdf) if it is continuous, and that we want to compute the expected value of some function of X, say g(X). How can we accomplish this? One way is as follows: since g(X) is itself a random variable, its pmf/pdf can be determined from the pmf/pdf of X. Once we have determined the pmf/pdf of g(X), we can compute E[g(X)] by using the definition of expected value. This leads to
$$E[g(X)] = \begin{cases} \sum_x g(x)\,P(x), & \text{if } X \text{ is discrete} \\ \int g(x) f(x)\,dx, & \text{if } X \text{ is continuous.} \end{cases}$$
Example 6.13: Let X denote a random variable that takes on any of the values −1, 0, 1 with respective probabilities $P\{X = -1\} = 0.2$, $P\{X = 0\} = 0.5$, $P\{X = 1\} = 0.3$. Compute $E(X^2)$.
Solution: Letting $Y = g(X) = X^2$,
$$\begin{aligned} E[g(X)] = \sum_x g(x)\,P(x) &= (-1)^2 P(X = -1) + 0^2\,P(X = 0) + 1^2\,P(X = 1) \\ &= 1(0.2) + 0(0.5) + 1(0.3) = 0.5 \end{aligned}$$
The reader should note that $E[X] = -1(0.2) + 0(0.5) + 1(0.3) = 0.1$, so that $(E[X])^2 = 0.01$ and
$$0.5 = E[X^2] \neq (E[X])^2 = 0.01.$$
If a and b are constants, then
$$E(aX + b) = a\,E[X] + b.$$
The expected value of a random variable X, E[X], is also referred to as the mean or the first moment of X. The quantity $E[X^n]$, $n \ge 1$, is called the nth moment of X. By definition,
$$E[X^n] = \begin{cases} \sum_x x^n P(X = x), & \text{if } X \text{ is discrete} \\ \int x^n f(x)\,dx, & \text{if } X \text{ is continuous.} \end{cases}$$
Exercise 6.6: The following are the annual incomes of 7 men and 7 women residents of a certain community.

Annual income (in $1000)
Men | Women
33.5 | 24.2
25.0 | 19.5
28.6 | 27.4
41.0 | 28.6
30.5 | 32.2
29.6 | 22.4
32.8 | 21.6

Suppose that a woman and a man are randomly chosen. Find the expected value of the sum of their incomes.
Solution: Let X be the man's income and Y the woman's income. Since X is equally likely to be any of the values in the men's column, we see that
$$E[X] = \frac{1}{7}(33.5 + 25 + \cdots + 32.8) = 31.571$$
Similarly,
$$E[Y] = \frac{1}{7}(24.2 + 19.5 + \cdots + 21.6) = 25.129$$
Therefore, the expected value of the sum of their incomes is
$$E[X + Y] = E[X] + E[Y] = 56.7$$
That is, the expected value of the sum of their incomes is approximately $56,700.
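Definition: If X is a random variable with mean μ, then the variance of X, denoted by Var(X), is defined by
$$\mathrm{Var}(X) = E[(X - \mu)^2].$$
An alternative formula for Var(X) is derived as follows (written here for a discrete X; the continuous case is analogous):
$$\begin{aligned} \mathrm{Var}(X) &= E[(X - \mu)^2] \\ &= \sum_x (x - \mu)^2 P(x) \\ &= \sum_x (x^2 - 2\mu x + \mu^2)\,P(x) \\ &= E[X^2] - 2\mu E[X] + \mu^2 \\ &= E[X^2] - 2\mu^2 + \mu^2 \\ &= E[X^2] - \mu^2 \end{aligned}$$
That is, $\mathrm{Var}(X) = E[X^2] - (E[X])^2$.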


Example 6.14: The return from a certain investment is a random variable X with probability distribution $P\{X = -1\} = 0.7$, $P\{X = 4\} = 0.2$, $P\{X = 8\} = 0.1$. Find Var(X), the variance of the return.
Solution: Let us first compute the expected return as follows:
$$\mu = E(X) = -1(0.7) + 4(0.2) + 8(0.1) = 0.9$$
To compute Var(X), we use the formula $\mathrm{Var}(X) = E(X^2) - \mu^2$. Now, since $X^2$ will equal $(-1)^2$, $4^2$, or $8^2$ with respective probabilities of 0.7, 0.2, and 0.1, we have
$$E[X^2] = 1(0.7) + 16(0.2) + 64(0.1) = 10.3$$
Therefore, $\mathrm{Var}(X) = 10.3 - (0.9)^2 = 10.3 - 0.81 = 9.49$.
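Both variance formulas, the definition $E[(X-\mu)^2]$ and the shortcut $E[X^2] - \mu^2$, can be compared on the investment example. The sketch below is illustrative; the function names are hypothetical.

```python
def mean(pmf):
    """E(X) for a discrete pmf given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance_by_definition(pmf):
    """Var(X) = E[(X - mu)^2]."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def variance_by_shortcut(pmf):
    """Var(X) = E[X^2] - (E[X])^2."""
    second_moment = sum(x * x * p for x, p in pmf.items())
    return second_moment - mean(pmf) ** 2

# Return distribution of Example 6.14.
returns = {-1: 0.7, 4: 0.2, 8: 0.1}

var_def = variance_by_definition(returns)
var_short = variance_by_shortcut(returns)
print(round(mean(returns), 2), round(var_def, 2), round(var_short, 2))
```

The two functions agree up to floating-point rounding, as the derivation of the shortcut formula predicts.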
Properties of Variance
1. For any random variable X and constant C, it can be shown that
$\mathrm{Var}(CX) = C^2\,\mathrm{Var}(X)$
$\mathrm{Var}(C + X) = \mathrm{Var}(X)$
2. If X and Y are independent random variables, then $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.
3. The square root of Var(X) is called the standard deviation of X, and we denote it by SD(X). That is, $SD(X) = \sqrt{\mathrm{Var}(X)}$.
6.3 Common Discrete Probability Distribution
6.3.1. Binomial Distribution
Many types of probability problems have only two outcomes, or they can be reduced to two
outcomes. For example, when a coin is tossed, it can land heads or tails.
A binomial experiment is a probability experiment that satisfies the following four requirements:
1. Each trial can have only two outcomes, or outcomes that can be reduced to two outcomes.
2. There must be a fixed number of trials.
3. The outcomes of the trials must be independent of each other.
4. The probability of a success must remain the same for each trial.
The outcomes of a binomial experiment and the corresponding probabilities of these outcomes are called a binomial distribution. The probability mass function of a binomial random variable X having parameters (n, p) is given by
$$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \qquad x = 0, 1, \ldots, n.$$

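Example 6.15: Five fair coins are flipped. If the outcomes are assumed independent, find the probability distribution of the number of heads obtained.
Solution: If we let X equal the number of heads (successes), then X is a binomial random variable with parameters (n = 5, p = 1/2). Hence,
$$P\{X = 0\} = \binom{5}{0}\left(\frac{1}{2}\right)^0\left(\frac{1}{2}\right)^5 = \frac{1}{32}$$
$$P\{X = 1\} = \binom{5}{1}\left(\frac{1}{2}\right)^1\left(\frac{1}{2}\right)^4 = \frac{5}{32}$$
$$P\{X = 2\} = \binom{5}{2}\left(\frac{1}{2}\right)^2\left(\frac{1}{2}\right)^3 = \frac{10}{32}$$
$$P\{X = 3\} = \binom{5}{3}\left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right)^2 = \frac{10}{32}$$
$$P\{X = 4\} = \binom{5}{4}\left(\frac{1}{2}\right)^4\left(\frac{1}{2}\right)^1 = \frac{5}{32}$$
$$P\{X = 5\} = \binom{5}{5}\left(\frac{1}{2}\right)^5\left(\frac{1}{2}\right)^0 = \frac{1}{32}$$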
Example 6.16:
A. Determine $P\{X \le 12\}$ when X is a binomial random variable with parameters n = 20 and p = 0.4.
B. Determine $P\{Y \ge 10\}$ when Y is a binomial random variable with parameters n = 16 and p = 0.5.
Solution:
A. $$P\{X \le 12\} = 1 - P\{X > 12\} = 1 - P\{X = 13\} - P\{X = 14\} - \cdots - P\{X = 20\} = 0.9790$$
B. $$P\{Y \ge 10\} = 1 - P\{Y < 10\} = 1 - P\{Y \le 9\} = 0.2272$$
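Binomial probabilities such as those in Examples 6.15 and 6.16 are normally read from tables, but they can also be computed from the pmf directly. The following sketch uses only the Python standard library; the function names `binomial_pmf` and `binomial_cdf` are illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """P(X <= x): sum of the pmf from 0 up to x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

# Example 6.15: five fair coins.
five_coins = [binomial_pmf(x, 5, 0.5) for x in range(6)]

# Example 6.16 A: P(X <= 12) with n = 20, p = 0.4.
p_a = binomial_cdf(12, 20, 0.4)

# Example 6.16 B: P(Y >= 10) = 1 - P(Y <= 9) with n = 16, p = 0.5.
p_b = 1 - binomial_cdf(9, 16, 0.5)

print(five_coins)
print(round(p_a, 4), round(p_b, 4))
```

Part B uses the complement rule exactly as in the worked solution, which avoids summing the upper tail term by term.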
If X is a binomial random variable with parameters n and p, then
$$E(X) = np$$
$$\mathrm{Var}(X) = np(1 - p) = npq, \quad \text{where } q = 1 - p$$
Example 6.17: Suppose that each screw produced is independently defective with probability 0.01. Find the expected value and variance of the number of defective screws in a shipment of size 1000.
Solution: The number of defective screws in the shipment of size 1000 is a binomial random variable with parameters n = 1000, p = 0.01. Hence, the expected number of defective screws is
$$E[\text{number of defectives}] = 1000(0.01) = 10$$
and the variance of the number of defective screws is
$$\mathrm{Var}(\text{number of defectives}) = 1000(0.01)(0.99) = 9.9$$


6.3.2 The Poisson Distribution
A discrete probability distribution that is useful when n is large and p is small, and when the independent variables occur over a period of time, is called the Poisson distribution, named for Simeon D. Poisson (1781-1840). In addition to being used for the stated conditions (i.e. n is large, p is small, and the variables occur over a period of time), the Poisson distribution can be used when a density of items is distributed over a given area or volume, such as the number of plants growing per acre of woods or the number of defects in a given length of videotape.
If X is a Poisson random variable with parameter λ, then
$$P(x) = P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots$$

Example 6.18: If X is a Poisson random variable with parameter λ = 2, find P(X = 0).

Solution: $P(X = 0) = \frac{e^{-2}\,2^0}{0!}$. Using the facts that $2^0 = 1$ and $0! = 1$, we obtain $P(X = 0) = e^{-2} \approx 0.135$.

Both the expected value and the variance of a Poisson random variable are equal to λ. That is, if X is a Poisson random variable with parameter λ, λ > 0, then
$$E[X] = \lambda, \qquad \mathrm{Var}(X) = \lambda$$
Example 6.19: Suppose the average number of accidents occurring weekly on a particular highway is equal to 1.2. Approximate the probability that there is at least one accident this week.
Solution: Let X denote the number of accidents that will occur this week. Because it is reasonable to suppose that there are a large number of cars passing along the highway, each having a small probability of being involved in an accident, the number of such accidents should be approximately a Poisson random variable. That is, X is approximately a Poisson random variable with mean value λ = 1.2. The desired probability is now obtained as follows:
$$P\{X > 0\} = 1 - P\{X = 0\} = 1 - \frac{e^{-1.2}(1.2)^0}{0!} = 1 - e^{-1.2} = 0.6988$$
Therefore, there is approximately a 70% chance that there will be at least one accident this week.
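The Poisson probabilities in Examples 6.18 and 6.19 follow directly from the pmf formula. A minimal sketch, assuming only the standard library and using the hypothetical function name `poisson_pmf`:

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x! for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Example 6.18: P(X = 0) when lambda = 2.
p_zero = poisson_pmf(0, 2)

# Example 6.19: P(at least one accident) = 1 - P(X = 0) when lambda = 1.2.
p_at_least_one = 1 - poisson_pmf(0, 1.2)

print(round(p_zero, 3), round(p_at_least_one, 4))
```

"At least one" is computed through the complement because the event X = 0 is a single term, whereas X ≥ 1 is an infinite sum.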
We can approximate the binomial distribution by the Poisson distribution if n is large and p is small. The approximating Poisson distribution has parameter
$$\lambda = np$$
Example 6.20: Suppose that items produced by a certain machine are independently defective with probability 0.1. What is the probability that a sample of 10 items will contain at most one defective item? What is the Poisson approximation for this probability?

Solution: If we let X denote the number of defective items, then X is a binomial random variable with parameters n = 10 and p = 0.1. Thus the desired probability is
$$P\{X = 0\} + P\{X = 1\} = \binom{10}{0}(0.1)^0(0.9)^{10} + \binom{10}{1}(0.1)^1(0.9)^9 = 0.7361$$
Since np = 10(0.1) = 1, the Poisson approximation yields the value
$$P\{X = 0\} + P\{X = 1\} = e^{-1} + e^{-1} = 0.7358$$
Thus, even in this case, where n is equal to 10 (which is not that large) and p is equal to 0.1 (which is not that small), the Poisson approximation to the binomial probability is quite accurate.
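The quality of the Poisson approximation with λ = np can be checked by computing both probabilities side by side. The sketch below reproduces the numbers of Example 6.20; the function names are illustrative.

```python
import math
from math import comb

def binomial_at_most(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k + 1))

def poisson_at_most(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam ** x / math.factorial(x) for x in range(k + 1))

n, p = 10, 0.1
exact = binomial_at_most(1, n, p)          # binomial probability
approx = poisson_at_most(1, n * p)         # Poisson with lambda = np

print(round(exact, 4), round(approx, 4), round(abs(exact - approx), 4))
```

Changing `n` and `p` while keeping `n * p` fixed is a quick way to see the approximation improve as n grows and p shrinks.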
6.4 Common Continuous Probability Distribution
Every continuous random variable X has a curve associated with it. This curve, formally known as a probability density function, can be used to obtain probabilities associated with the random variable. This is accomplished as follows. Consider any two points a and b, where a is less than b. The probability that X assumes a value that lies between a and b is equal to the area under the curve between a and b. That is,
$$P\{a \le X \le b\} = \text{area under the curve between } a \text{ and } b$$
Since X must assume some value, it follows that the total area under the density curve must equal 1. Also, the area under the graph of the probability density function between points a and b is the same regardless of whether the end points a and b are themselves included. That is,
$$P\{a \le X \le b\} = P\{a < X < b\}$$


6.4.1 Normal Random Variables
The most important type of random variable is the normal random variable. The probability density function of a normal random variable X is determined by two parameters: the expected value and the standard deviation of X. We designate these values as μ and σ, respectively:
$$\mu = E[X] \quad \text{and} \quad \sigma = SD(X)$$
The normal probability density function is a bell-shaped density curve that is symmetric about the value μ; its variability is measured by σ. The larger σ is, the more variability there is in the curve.
Since the probability density function of a normal random variable X is symmetric about its expected value μ, it follows that X is equally likely to be on either side of μ. That is,
$$P\{X < \mu\} = P\{X \ge \mu\} = 0.5$$
Not all bell-shaped symmetric density curves are normal. The normal density curves are specified by a particular formula:
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty$$
A normal random variable having mean value 0 and standard deviation 1 is called a standard normal variable, and its density curve is called the standard normal curve. The letter Z represents a standard normal random variable. Its density is
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$$

Probabilities Associated with a Standard Normal Random Variable
The standardizing formula is
$$Z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{X - \mu}{\sigma}$$
Once the X values are transformed by using the above formula, they are called Z values. A Z value is the number of standard deviations that a particular X value lies away from the mean.
Steps to find areas under the normal distribution curve (using a table that gives the area between 0 and Z)
1. Between 0 and any Z value: look up the Z value in the table to get the area.
2. In any tail:
a. Look up the Z value to get the area.
b. Subtract the area from 0.5.
3. Between two Z values on the same side of the mean:
a. Look up both Z values to get the areas.
b. Subtract the smaller area from the larger area.
4. Between two Z values on opposite sides of the mean:
a. Look up both Z values to get the areas.
b. Add the areas.
5. Less than any Z value to the right of the mean:
a. Look up the Z value to get the area.
b. Add 0.5 to the area.
6. Greater than any Z value to the left of the mean:
a. Look up the Z value in the table to get the area.
b. Add 0.5 to the area.
7. In any two tails:
a. Look up the Z values in the table to get the areas.
b. Subtract both areas from 0.5.
c. Add the answers.
The general procedure is:
1. Draw the picture.
2. Shade the area desired.
3. Find the correct figure (the matching case among the steps above).
4. Follow the directions.
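The table-lookup steps above can be mimicked in code by replacing the printed table with a function that returns the area between 0 and Z. The sketch below builds that function from the error function `math.erf`; the names `phi` and `table_area` and the particular Z values are illustrative assumptions.

```python
import math

def phi(z):
    """Standard normal CDF, P(Z <= z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def table_area(z):
    """Area between 0 and |z|, i.e. the quantity a standard Z table lists."""
    return phi(abs(z)) - 0.5

# Step 1: between 0 and a Z value.
area_0_to_234 = table_area(2.34)
# Step 2: in a tail (to the right of Z = 0.8).
right_tail_08 = 0.5 - table_area(0.8)
# Step 3: between two Z values on the same side of the mean.
between_1_and_2 = table_area(2.0) - table_area(1.0)
# Step 4: between two Z values on opposite sides of the mean.
between_m15_and_25 = table_area(-1.5) + table_area(2.5)
# Step 5: less than a Z value to the right of the mean.
below_15 = 0.5 + table_area(1.5)

print(round(area_0_to_234, 4), round(right_tail_08, 4),
      round(between_1_and_2, 4), round(between_m15_and_25, 4),
      round(below_15, 4))
```

Taking the absolute value inside `table_area` encodes the symmetry of the curve, so the same lookup serves Z values on either side of the mean.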
Example 6.15: Find the area under the normal distribution curve between Z = 0 and Z = 2.34.
Solution: Draw the area as follows:
[Figure: standard normal curve with the area between Z = 0 and Z = 2.34 shaded]
Since the Z table gives the area between 0 and any Z value to the right of 0, one need only look up the Z value in the table. Find 2.3 in the left column and 0.04 in the top row. The value where the column and row meet in the table is the answer, 0.4904.

Z | 0.00 | 0.01 | 0.02 | 0.03 | 0.04 | …
0.0 |
0.1 |
0.2 |
… |
2.2 |
2.3 | | | | | 0.4904

Example 6.16: Find
A. $P\{Z < 1.5\}$
B. $P\{Z \ge 0.8\}$
Solution:
A. Draw the area as follows:
[Figure: standard normal curve with the area to the left of Z = 1.50 shaded]
$$P\{Z < 1.5\} = 0.5 + P\{0 < Z < 1.5\} = 0.5 + 0.4332 = 0.9332$$
B. Draw the required area as follows:
[Figure: standard normal curve with the area to the right of Z = 0.8 shaded]
$$P\{Z \ge 0.8\} = 0.5 - P\{0 < Z < 0.8\} = 0.5 - 0.2881 = 0.2119$$
Example 6.17: Find
A. $P\{1 < Z < 2\}$
B. $P\{-1.5 < Z < 2.5\}$
Solution:
A. Draw the graph as follows:
[Figure: standard normal curve with the area between Z = 1 and Z = 2 shaded]
$$P\{1 < Z < 2\} = P\{0 < Z < 2\} - P\{0 < Z < 1\} = 0.4772 - 0.3413 = 0.1359$$
B. Draw the graph as follows:
[Figure: standard normal curve with the area between Z = -1.5 and Z = 2.5 shaded]
$$P\{-1.5 < Z < 2.5\} = P\{-1.5 < Z < 0\} + P\{0 < Z < 2.5\}$$
Since $P\{-1.5 < Z < 0\} = P\{0 < Z < 1.5\}$, due to the symmetry of the normal distribution,
$$P\{-1.5 < Z < 2.5\} = P\{0 < Z < 1.5\} + P\{0 < Z < 2.5\} = 0.4332 + 0.4938 = 0.9270$$
Finding Normal Probabilities: Conversion to the Standard Normal
Let X be a normal random variable with mean μ and standard deviation σ. We can determine probabilities concerning X by using the fact that the variable Z defined by
$$Z = \frac{X - \mu}{\sigma}$$
has a standard normal distribution.
We can express any probability statement about X in terms of Z. For example,
$$P\{X < a\} = P\left\{\frac{X - \mu}{\sigma} < \frac{a - \mu}{\sigma}\right\} = P\left\{Z < \frac{a - \mu}{\sigma}\right\}$$
where Z is a standard normal random variable.
Example 6.18: IQ examination scores for sixth-graders are normally distributed with mean value 100 and standard deviation 14.2.
A. What is the probability that a randomly chosen sixth-grader has a score greater than 130?
B. What is the probability that a randomly chosen sixth-grader has a score between 90 and 115?

Solution: Let X denote the score of a randomly chosen student. We compute probabilities concerning X by making use of the fact that the standardized variable
$$Z = \frac{X - 100}{14.2}$$
has a standard normal distribution.
A. $$P\{X > 130\} = P\left\{\frac{X - 100}{14.2} > \frac{130 - 100}{14.2}\right\} = P\{Z > 2.1127\} \approx 0.0173$$
B. The inequality $90 < X < 115$ is equivalent to
$$\frac{90 - 100}{14.2} < \frac{X - 100}{14.2} < \frac{115 - 100}{14.2}$$
or, equivalently,
$$-0.7042 < Z < 1.0560$$
Therefore,
$$\begin{aligned} P\{90 < X < 115\} &= P\{-0.7042 < Z < 1.0560\} \\ &= P\{0 < Z < 0.7042\} + P\{0 < Z < 1.0560\} \\ &\approx 0.2593 + 0.3545 = 0.6138 \end{aligned}$$
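The conversion to the standard normal can be automated: standardize the bound, then evaluate the standard normal CDF. The sketch below applies this to the IQ example using `math.erf` instead of a printed table, so results may differ from table-based answers in the last digit; the function name `normal_cdf` is an illustrative choice.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ Normal(mu, sigma), computed via Z = (x - mu) / sigma."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 100.0, 14.2

# A. P(X > 130) = 1 - P(X <= 130).
p_above_130 = 1.0 - normal_cdf(130.0, mu, sigma)

# B. P(90 < X < 115) = F(115) - F(90).
p_90_to_115 = normal_cdf(115.0, mu, sigma) - normal_cdf(90.0, mu, sigma)

print(round(p_above_130, 4), round(p_90_to_115, 4))
```

Part B uses the difference of two CDF values, which is equivalent to adding the two table areas on either side of the mean.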
Properties of the Normal distribution
1. The normal distribution curve is bell-shaped
2. The mean, median and mode are equal and located at the center of the distribution
3. The normal distribution curve is unimodal
4. The curve is symmetrical about the mean, which is equivalent to saying that its shape is the same on both sides of a vertical line passing through the center
5. The curve is continuous. That is, no gaps or holes
6. The curve never touches the 𝑥 axis
7. The total area under the normal distribution curve is equal to 1
Relation between Binomial and Normal Distribution
The normal distribution is a limiting case of the binomial probability distribution under the following conditions:
I. n, the number of trials, is indefinitely large.
II. Neither p nor q is very small.
We know that for a binomial variable X with parameters n and p,
$$E[X] = np$$
$$\mathrm{Var}[X] = npq$$

De Moivre proved that under the above two conditions, the distribution of the standardized binomial variable
$$Z = \frac{X - E[X]}{\sigma} = \frac{X - np}{\sqrt{npq}}$$
tends to the standard normal distribution. If p and q are nearly equal (i.e., p is nearly 0.5), then the normal approximation is surprisingly good even for small values of n.
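To see how close the normal approximation is for a small n with p = 0.5, one can compare an exact binomial tail with its normal counterpart. The sketch below does this for n = 16; note that the 0.5 shift is a continuity correction, a standard refinement that is an assumption of this illustration and is not discussed in the text above.

```python
import math
from math import comb

def binomial_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k, n + 1))

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_tail_approx(k, n, p):
    """Approximate P(X >= k) using Z = (X - np) / sqrt(npq).

    The 0.5 shift is a continuity correction (an added assumption).
    """
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return 1.0 - normal_cdf((k - 0.5 - mean) / sd)

exact = binomial_tail(10, 16, 0.5)
approx = normal_tail_approx(10, 16, 0.5)
print(round(exact, 4), round(approx, 4))
```

Repeating the comparison with p far from 0.5 shows the approximation degrading, in line with condition II.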
Relation between Poisson and Normal Distribution

If X is a random variable following the Poisson distribution with parameter λ, then E[X] = λ and Var[X] = λ. Thus the standardized Poisson variable becomes
$$Z = \frac{X - E[X]}{\sigma} = \frac{X - \lambda}{\sqrt{\lambda}}.$$
It has been proved that this variable tends to a standard normal variable as λ → ∞.


6.4.2 Chi-square Distribution: χ² Distribution
The square of a standard normal variable is called a chi-square variable with one degree of freedom. Thus if X is a random variable following a normal distribution with mean μ and standard deviation σ, then $\frac{X - \mu}{\sigma}$ is a standard normal variable, and $\left(\frac{X - \mu}{\sigma}\right)^2$ is a chi-square variate with 1 degree of freedom.
If $x_1, x_2, \ldots, x_v$ are v independent random variables following normal distributions with means $\mu_1, \mu_2, \ldots, \mu_v$ and standard deviations $\sigma_1, \sigma_2, \ldots, \sigma_v$ respectively, then the variate
$$\chi^2 = \left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2 + \left(\frac{x_2 - \mu_2}{\sigma_2}\right)^2 + \cdots + \left(\frac{x_v - \mu_v}{\sigma_v}\right)^2 = Z_1^2 + Z_2^2 + \cdots + Z_v^2 = \sum_{i=1}^{v} Z_i^2,$$
which is the sum of the squares of v independent standard normal variates, follows the chi-square distribution with v degrees of freedom.
Applications of chi-square distribution
The chi-square distribution has a number of applications, some of which are listed below:
• Chi-square test of goodness of fit
• Chi-square test for independence of attributes
• Test of whether the population has a specified value of the variance

6.4.3 Student’s t Distribution
It is often the case that one wants to calculate the size of sample needed to obtain a certain level of confidence in survey results. Unfortunately, this calculation requires prior knowledge of the population standard deviation (σ). Realistically, σ is unknown. Often a preliminary sample will be taken so that a reasonable estimate of this critical population parameter can be made. If such a preliminary sample is not taken, but confidence intervals for the population mean are to be constructed using an unknown σ, then the distribution known as the Student t distribution can be used.

A random variable X has a t distribution (Student's t distribution) with v degrees of freedom if its probability density is given by
$$f(x; v) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\;\Gamma\left(\frac{v}{2}\right)} \left(1 + \frac{x^2}{v}\right)^{-(v+1)/2} \qquad \text{for } -\infty < x < \infty.$$
If v is large (v ≥ 30), the graph of f(x) closely approximates the standard normal curve.
• Properties of the t Distribution
a. Let X be a t-distributed random variable with parameter v. Then
Mean: $E(X) = \int_{-\infty}^{\infty} x\, f(x; v)\,dx = 0$ for v > 1.
Variance: $\mathrm{Var}(X) = E[(X - E(X))^2] = \frac{v}{v - 2}$ for v > 2.
b. The Student t distribution is different for different sample sizes.
c. The Student t distribution is generally bell-shaped, but with smaller sample sizes it shows increased variability (it is flatter). In other words, the distribution is less peaked than a normal distribution and has thicker tails. As the sample size increases, the distribution approaches a normal distribution. For n > 30, the differences are negligible.
d. The distribution is symmetrical about the mean, i.e. about zero.
e. The variance is greater than one, but approaches one from above as the sample size increases (σ = 1 for the standard normal distribution).
f. The t distribution is used when the population is essentially normal (unimodal and basically symmetric) and the population variance is unknown.
