Basic Statistics Module
CHAPTER FOUR
4. Measures of Dispersion (Variation)
4.1 Introduction
4.2 Absolute and Relative Measures of Dispersion
4.3 Types of Measures of Variation
4.3.1 The Range and Relative Range
4.3.2 The Quartile Deviation and Coefficient of Quartile Deviation
4.3.3 The Mean Deviation and Coefficient of Mean Deviation
4.3.4 The Variance, Standard Deviation and Coefficient of Variation
4.4 Standard Scores (Z-Scores)
4.5 Moments, Skewness and Kurtosis
4.5.1 Moments
4.5.2 Skewness
4.5.3 Kurtosis
CHAPTER FIVE
5 Elementary Probability
5.1 Introduction
5.2 Definitions of Some Concepts of Probability Terms
5.3 Counting Rules
5.3.1 Addition Rule
5.3.2 Multiplication (Fundamental) Rule
5.3.3 Permutation Rule
5.3.4 Combination Rule
5.4 Approaches in Probability Definition
5.4.1 Subjective approach
5.4.2 Objective approach
5.5 Some Probability Rules
5.5 Conditional Probability and Independence
CHAPTER SIX
6 Probability Distribution
6.1 The Definition of Random Variable and Probability Distribution
6.1.1 Discrete Random Variable and Probability Distribution (pmf)
6.1.2 Continuous Random Variable and Probability Distribution
6.2 Introduction to Expectation: Mean and Variance of a Random Variable
6.3 Common Discrete Probability Distribution
6.3.1 Binomial Distribution
6.3.2 The Poisson Distribution
6.4 Common Continuous Probability Distribution
6.4.1 Normal Random Variables
CHAPTER ONE
1. Introduction
1.1 Definitions and classification of Statistics
Statistics has been defined differently by different authors over time. In the olden days statistics
was confined to state affairs only, but in modern days it embraces almost every sphere of human
activity. Therefore, a number of old definitions, which were confined to a narrow field of enquiry,
were replaced by newer definitions, which are much more comprehensive and exhaustive.
We can define statistics in two senses:
• In the plural sense: statistics are the raw data themselves (numerical facts), like statistics of
births, statistics of deaths, statistics of students, statistics of imports and exports, etc.
• In the singular sense: Statistics is the science of conducting studies to collect, organize,
summarize, and analyze data, as well as to derive valid conclusions and make reasonable
decisions on the basis of data.
Classifications:
Depending on how data can be used, statistics is sometimes divided into two main areas or
branches.
1. Descriptive Statistics:
• Is concerned with summary calculations, graphs, charts and tables.
• In descriptive statistics our objective is to describe a group of data that we have ‘in hand’ i.e.
data that are accessible to us.
• Generally characterizes or describes a set of data elements by graphically displaying the
information or describing its central tendencies and how it is distributed.
Example: the following data refer to the number of malaria patients who were treated
in Debre Berhan Referral Hospital from 1986 to 1990 (Eth. Calendar).
3645; 4568; 5432; 6751; 7369
If we calculate the average number of malaria patients from 1986 to 1990 as
Average = (1/5)(3645 + 4568 + 5432 + 6751 + 7369) = 5553,
then our work belongs to the domain of descriptive statistics.
If we say that there was an increase of 3724 patients from 1986 to 1990, then again this belongs
to the domain of descriptive statistics.
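The two descriptive calculations in this example can be checked with a short Python sketch. The variable names are illustrative only, and pairing each value with a consecutive year from 1986 to 1990 is an assumption based on the order in which the values are listed.

```python
# Malaria patients treated per year (Eth. Calendar), from the example.
# Assumption: the five values are listed in year order, 1986 to 1990.
patients = {1986: 3645, 1987: 4568, 1988: 5432, 1989: 6751, 1990: 7369}

# Arithmetic mean: sum of the observations divided by their number.
average = sum(patients.values()) / len(patients)

# Change between the first and the last year.
increase = patients[1990] - patients[1986]

print(average)   # 5553.0
print(increase)  # 3724
```

Both results only summarize the data in hand, which is why they count as descriptive statistics.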
2. Inferential Statistics: consists of generalizing from samples to populations, performing
estimations and hypothesis tests, determining relationships among variables, and making
predictions. Statistical techniques based on probability theory are required.
Example 1.1: In the above example if we predict the number of malaria patients in the year
1995 to be 9917, then our work belongs to the domain of inferential statistics.
Example 1.2: Suppose we want to have an idea about the percentage of illiterates in our
country. We take a sample from the population and find the proportion of illiterates in the
sample. This sample proportion with the help of probability enables us to make some
inferences about the population proportion. This study belongs to inferential statistics.
II. Proper collection of data: in order to draw valid conclusions, it is important to have ‘good’ data.
Data are gathered with the aim of meeting predetermined objectives. In other words, the data must
provide answers to problems. The data themselves form the foundation of statistical analyses and
hence must be carefully and accurately collected. In section 1.6 we will see the
methods of data collection.
III. Organization and classification of data: in this stage the collected data are organized in a
systematic manner. That means the data must be placed in relation to each other. The
classification or sorting out of data is, by itself, a kind of organization of data.
IV. Presentation of data: The purpose of putting the organized data in graphs, charts and tables
is two-fold. First, it is a visual way to look at the data and see what happened and make
interpretations. Second, it is usually the best way to show the data to others. Reading lots of
numbers in the text puts people to sleep and does little to convey information.
V. Analysis of data: is the process of looking at and summarizing data with the intent to extract
useful information and develop conclusions. Data analysis is closely related to data mining,
but data mining tends to focus on larger data sets, with less emphasis on making inference,
and often uses data that were originally collected for a different purpose. In this stage
different types of inferential statistical methods are applied, for instance hypothesis tests
such as the χ² (chi-square) test of association.
VI. Interpretation of data: interpretation means drawing valid conclusions from data which
form the basis of decision making. Correct interpretation requires a high degree of skill and
experience.
Note that: Analyses and interpretation of data are the two sides of the same coin.
Population: The totality of all subjects with certain common characteristics that are
being studied in a specified time and place.
Sample: Is a portion of a population which is selected using some technique of sampling. A sample
must be representative of the population, so it must be selected using one of the established
sampling techniques.
Sampling: Is the process of selecting units (e.g., people, households, organizations) from a
population of interest so that by studying the sample we may fairly generalize our results back to
the population from which they were chosen.
Sample size: The number of elements or observations to be included in the sample.
Parameter: Any measure computed from the data of a population. Example: population mean
(µ) and population standard deviation (𝜎)
Statistic: Any measure computed from the sample. Example: sample mean (𝑥̅ ), sample standard
deviation (s)
Survey: A collection of quantitative information about members of a population when no special
control is exercised over any of the factors influencing the variable of interest.
Sample survey: A survey that includes only a portion of the population.
Census: A collection of information about every member of a population
A sample survey has the following advantages over a census:
• It saves time and cost
• It can achieve great accuracy
• It avoids wastage of material
Variable: A variable is a characteristic or attribute that can assume different values. Variables
whose values are determined by chance are called random variables. Variables are often specified
according to their type and intended use, and hence variables can be classified into two types, namely
qualitative and quantitative variables.
• A quantitative variable is naturally measured as a number for which meaningful arithmetic
operations make sense. Examples: Height, age, crop yield, GPA, salary, temperature, area,
air pollution index (measured in parts per million), etc.
• Qualitative variable: Any variable that is not quantitative is qualitative. Qualitative
variables take a value that is one of several possible categories. As naturally measured,
qualitative variables have no numerical meaning. Examples: Hair color, gender, field of
study, marital status, political affiliation, status of disease infection.
Quantitative variables can be classified as discrete and continuous variables.
1. Discrete variables can assume only certain numerical values. That is, there are gaps between the
possible values, such as 0, 1, 2, ... The set of values may be countably finite or countably infinite. For example,
the number of students in a classroom, or the number of children in a family.
2. Continuous variables can take any value within a specified interval, given a fine enough
measuring device. There are no gaps between possible values. They are obtained by measuring. For
example, height is a continuous variable: no matter how close the heights of two people are, we can
find another person whose height falls somewhere between the two.
Statistics, with all its wide application in every sphere of human activity, has its own limitations.
Some of them are given below.
Statistics is not suitable to the study of qualitative phenomena: Since statistics is basically
a science and deals with a set of numerical data, it is applicable only to the study of those
subjects of enquiry which can be expressed in terms of quantitative measurements. As a matter
of fact, qualitative phenomena like honesty, poverty, beauty, intelligence, etc. cannot be
expressed numerically, and statistical analysis cannot be directly applied to these
qualitative phenomena. Nevertheless, statistical techniques may be applied indirectly by first
reducing the qualitative expressions to accurate quantitative terms. For example, the
intelligence of a group of students can be studied on the basis of their marks in a particular
examination.
Statistics does not study individuals: Statistics does not give any specific importance to the
individual items; in fact it deals with an aggregate of objects. Individual items, when they are
taken individually do not constitute any statistical data and do not serve any purpose for any
statistical enquiry.
Statistical laws are not exact: It is well known that mathematical and physical sciences are
exact. But statistical laws are not exact and statistical laws are only approximations. Statistical
conclusions are not universally true. They are true only on an average.
Statistics may be misused: Statistics must be used only by experts; otherwise, statistical
methods are the most dangerous tools in the hands of the inexpert. The use of statistical tools
by inexperienced and untrained persons might lead to wrong conclusions. Statistics can be
easily misused by quoting wrong figures of data. As King says aptly, ‘statistics are like clay of
which one can make a God or Devil as one pleases.’
Statistics is only one of the methods of studying a problem: Statistical methods do not provide
a complete solution to problems, because problems are to be studied taking the background
of the country's culture, philosophy or religion into consideration. Thus the statistical study
should be supplemented by other evidence.
1.5 Scales of Measurement
Normally, when one hears the term measurement, one may think in terms of measuring the length
of something (e.g. the length of a piece of wood) or measuring a quantity of something (e.g. a cup
of flour). This represents a limited use of the term measurement. In statistics, the term measurement
is used more broadly and is more appropriately termed scales of measurement. Scales of
measurement refer to ways in which variables or numbers are defined and categorized. Each scale
of measurement has certain properties which in turn determine the appropriateness for use of
certain statistical analyses. The four scales of measurement are nominal, ordinal, interval, and
ratio.
1. Nominal Scales
Nominal scales possess the following properties.
Level of measurement which classifies data into mutually exclusive, all-inclusive
categories in which no order or ranking can be imposed on the data.
No arithmetic and relational operation can be applied.
No quantitative information is conveyed
Thus only gives names or labels to various categories.
Examples:
Political party preference (Republican, Democrat, or Other,)
Sex (Male or Female.)
Marital status (married, single, widowed, divorced)
Country code
Regional differentiation of Ethiopia.
2. Ordinal Scales
Ordinal Scales are measurement systems that possess the following properties:
Level of measurement which classifies data into categories that can be ranked; however,
precise differences between the ranks do not exist.
Arithmetic operations are not applicable but relational operations are applicable.
Ordering is the sole property of ordinal scale.
Examples:
Letter grades (A, B, C, D, F).
Rating scales (Excellent, Very good, Good, Fair, poor).
Military status.
3. Interval Scales
Interval scales are measurement systems that possess the following properties:
Level of measurement which classifies data that can be ranked and differences are
meaningful. However, there is no meaningful zero, so ratios are meaningless.
All arithmetic operations except division are applicable.
Relational operations are also possible.
Examples:
IQ, Temperature in °F.
4. Ratio Scales
Ratio scales possess the following properties: Level of measurement which classifies
data that can be ranked, differences are meaningful, and there is a true zero. True ratios exist
between the different units of measure.
All arithmetic and relational operations are applicable.
Examples:
Weight
Height
Number of students
Age
Use of levels of measurement
Helps you decide how to interpret the data from the variable.
Helps you decide what statistical analysis is appropriate on the values that were assigned.
For example, if a measurement is nominal, then you know that you should never average the
data.
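As a rough illustration of how the scale of measurement limits the summaries that make sense, the Python sketch below picks one summary per scale. All data values are hypothetical, and the choice of mode for nominal data, median for ordinal data and mean for ratio data is a common convention rather than something prescribed by this module.

```python
from statistics import mean, median_low, mode

# Hypothetical observations, one list per scale of measurement.
blood_type = ["A", "O", "B", "O", "AB", "O"]             # nominal
rating = ["Good", "Fair", "Excellent", "Good", "Poor"]   # ordinal
weight_kg = [61.0, 72.5, 58.0, 80.5]                     # ratio

# Nominal: categories are only labels, so counting is the only
# meaningful operation. Report the most frequent category.
most_common_type = mode(blood_type)

# Ordinal: categories can be ranked, so the middle rank is meaningful,
# but differences between ranks are not.
order = ["Poor", "Fair", "Good", "Very good", "Excellent"]
ranks = [order.index(r) for r in rating]
median_rating = order[median_low(ranks)]

# Ratio: all arithmetic operations apply, so the mean is meaningful.
mean_weight = mean(weight_kg)

print(most_common_type, median_rating, mean_weight)
```

Averaging `blood_type` would raise an error, which mirrors the rule that nominal data are never averaged.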
Exercise 1: Classify the following different measurement systems into one of the four types of
scales.
a) Your checking account number as a name for your account.
b) Your checking account balance as a measure of the amount of money you have in that
account
c) Your score on the first statistics test as a measure of your knowledge of statistics
d) A response to the statement "Abortion is a woman's right" where "Strongly Disagree" =
1, "Disagree" = 2, "No Opinion" = 3, "Agree" = 4, and "Strongly Agree" = 5, as a measure
of attitude toward abortion.
e) Times for swimmers to complete a 50-meter race
f) Months of the year Meskerm, Tikimit…
g) Socioeconomic status of a family when classified as low, middle and upper classes.
h) Blood type of individuals, A, B, AB and O.
i) Pollen counts provided as numbers between 1 and 10, where 1 implies there is almost no
pollen and 10 that it is rampant, but for which the values do not represent actual counts
of grains of pollen.
j) Region numbers of Ethiopia
k) The number of students in a college
l) The net wages of a group of workers
m) The height of the men in a town
CHAPTER TWO
2. Methods of Data Collection and Presentation
2.1 Methods of Data Collection
Once it is decided what type of study is to be made, it becomes necessary to collect information
about the concerned study, mostly in the form of data. In order to draw valid conclusions from
data, information has to be collected in a systematic manner. Whatever the quality of the sampling
and analysis methods, a haphazardly collected dataset is unlikely to produce valuable and
generalizable information.
2.1.1 Sources of Data
There are two sources of data: primary and secondary sources. Depending on its source,
data can also be classified into two types.
(1) Primary Data (2) Secondary Data
1) Primary data
• Primary data are first-hand information collected, compiled and published by an
organization for some purpose. They are the most original data in character and have not
undergone any sort of statistical treatment.
• They are collected by conducting a survey to meet the specific needs of the problem at
hand.
Example: Population census reports are primary data because these are collected, compiled and
published by the population census organization.
2) Secondary data
• The secondary data are the second hand information which are already collected by
someone (organization) for some purpose and are available for the present study. The
secondary data are not pure in character and have undergone some treatment at least once.
• Data taken from already available published or unpublished source.
There are three major methods of data collection
1. self-administered questionnaire
2. direct investigation: measurement (observation) of the subject and interviewing (face-to-face,
telephone, …)
3. the use of documentary source
1. Self-administered questionnaire
Questionnaire is the main data collection instrument in formal sample survey. Before examining
the steps in designing a questionnaire we need to review the types of questions used in
questionnaires. Depending on the amount of freedom given to respondent in offering responses,
there are two basic types of questions that can be used in questionnaires: open-ended questions
and closed ended questions.
The type of questions for use will be determined by the form of responses wanted, the nature of
the respondents and their ability to answer the questions.
Open-ended questions: - allows the respondent to answer it freely in his or her own words
Example: what do you think are the reasons for a high drop-out rate of village health committee
members?
Closed-ended questions: - a predetermined list of alternative responses is presented to the
respondent for checking the appropriate one(s). It implies that the respondent’s answers are
restricted in some way to a limited range of alternatives.
Advantages
• It is the cheapest method and can be conducted by a single researcher.
• Questionnaires can be sent to a wide geographical area.
• There is no interviewer variability.
Disadvantages
• Low response rate
• No assurance that the questionnaire was answered by the right person.
• Mail questionnaires are not suitable for an illiterate community.
2. direct investigation
i. measurement or/and observation
• data can be obtained through direct observation or measurement
• provides accurate information but it is expensive and inconvenient
e.g. land area measurement, animal weight gain, physical examination, direct observation of
work.
ii. Interview
a) Face-to-Face interview
Advantage:-
• Interviewers can observe the surroundings and can use nonverbal communication and
visual aids.
• The interviewer can help the respondent if he/she has difficulty in understanding the
questions.
• Respondent is likely to answer all the questions alone
Disadvantage:-
• Cost is high
• Interviewer bias is also high
• Untrained interviewer may distort the meaning of the questions
b) Telephone Interview
Advantage:-
• It is less expensive in time and money compared to face to face interviews
• Relatively high response rate
• Reaches people who would not open their doors to an interviewer, but might be willing to
talk on the telephone
Disadvantage:-
• Unrepresentative of the groups which do not have telephones
• Unlisted telephone numbers are excluded from the study.
• The respondent may be substituted by another person
3. The use of documentary source
• Extracting information from existing resources.
• Is much less expensive than the other two methods
• It is difficult to get the information needed when records are compiled in
unstandardized manner.
Example: - Hospital records, professional institutes, Official statistics, - - -
Editing of Data:
After collecting the data from either a primary or a secondary source, the next step is its editing.
Editing means the examination of collected data to discover any errors and mistakes before presenting them. It
has to be decided beforehand what degree of accuracy is wanted and what extent of errors can be
tolerated in the inquiry. The editing of secondary data is simpler than that of primary data.
2.2 Methods of Data Presentation
2.2.1 Introduction
This topic introduces tabular and graphical methods commonly used to summarize both qualitative and
quantitative data. Tabular and graphical summaries of data can be found in annual reports, newspaper
articles and research studies. Everyone is exposed to these types of presentations, so it is important to
understand how they are prepared and how they should be interpreted.
Modern statistical software packages provide extensive capabilities for summarizing data and preparing
graphical presentations. MINITAB, SPSS, STATA and R are four packages that are widely available.
Tabulation of Data: The process of placing classified data into tabular form is known as tabulation. A
table is a systematic arrangement of statistical data in rows and columns. Rows are horizontal
arrangements whereas columns are vertical arrangements.
2.2.2 Frequency Distribution
A frequency distribution is the organization of raw data in table form, using classes and frequencies. There
are three basic types of frequency distributions, and there are specific procedures for constructing each type.
The three types are categorical, ungrouped and grouped frequency distributions.
The reasons for constructing a frequency distribution are as follows
• To organize the data in a meaningful, intelligible way.
• To enable the reader to determine the nature or shape of the distribution
• To facilitate computational procedures for measures of average and spread
• To enable the researcher to draw charts and graphs for the presentation of data
• To enable the reader to make comparisons between different data sets
Some of the basic terms that are most frequently used when we deal with frequency distributions are the
following:
• Lower Class Limits are the smallest numbers that can belong to the different classes.
• Upper Class Limits are the largest numbers that can belong to the different classes.
• Class Boundaries are the numbers used to separate classes, but without the gaps created by
class limits.
• Class midpoints are the midpoints of the classes. Each class midpoint can be found by
adding the lower class limit to the upper class limit and dividing the sum by 2.
• Class width is the difference between two consecutive lower class limits or two
consecutive lower class boundaries.
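The terms above can be made concrete with a small Python sketch. The two classes 6-11 and 12-17 and the unit of measurement of 1 are illustrative assumptions for whole-number data.

```python
# Two consecutive classes given by their class limits (illustrative values).
lower1, upper1 = 6, 11
lower2, upper2 = 12, 17
unit = 1  # assumed unit of measurement for whole-number data

# Class midpoint: (lower class limit + upper class limit) / 2.
midpoint1 = (lower1 + upper1) / 2

# Class width: difference between two consecutive lower class limits.
width = lower2 - lower1

# Class boundaries close the gap between 11 and 12 by moving each
# limit half a unit outward.
lower_boundary1 = lower1 - unit / 2
upper_boundary1 = upper1 + unit / 2
lower_boundary2 = lower2 - unit / 2

print(midpoint1, width, lower_boundary1, upper_boundary1)
```

Note that the upper boundary of the first class equals the lower boundary of the second, and that the width computed from consecutive lower boundaries gives the same value as the width computed from the limits.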
2.2.2.1 Categorical Frequency Distribution
The categorical frequency distribution is used for data which can be placed in specific categories, such as
nominal or ordinal level data. For example, data such as political affiliation, religious
affiliation, or major field of study would use a categorical frequency distribution.
The major components of categorical frequency distribution are class, tally and frequency. Moreover, even
if percentage is not normally a part of a frequency distribution, it will be added since it is used in certain
types of graphical presentations, such as pie graph.
Steps of constructing categorical frequency distribution
1. Identify that the data are in a nominal or ordinal scale of measurement
2. Make a table as shown below
A       B       C           D
Class   Tally   Frequency   Percent
Solution:
A       B           C           D
Class   Tally       Frequency   Percent
A       ////        5           20
B       //// //     7           28
O       //// ////   9           36
AB      ////        4           16
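The frequency and percent columns of such a table can be produced with a short Python sketch. The raw observations are not shown in this excerpt, so the list below is a hypothetical reconstruction built from the frequency column (A: 5, B: 7, O: 9, AB: 4).

```python
from collections import Counter

# Hypothetical raw observations, rebuilt from the frequency column.
observations = ["A"] * 5 + ["B"] * 7 + ["O"] * 9 + ["AB"] * 4

# Frequency: how many times each class occurs.
counts = Counter(observations)
total = sum(counts.values())

# Percent: frequency divided by the total, times 100.
percents = {cls: 100 * freq / total for cls, freq in counts.items()}

print(f"{'Class':<6}{'Frequency':>10}{'Percent':>9}")
for cls in ["A", "B", "O", "AB"]:
    print(f"{cls:<6}{counts[cls]:>10}{percents[cls]:>9.0f}")
```

The percentages always sum to 100, which is a quick check that no observation was missed while tallying.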
5 8 42
7 5 47
8 2 49
9 1 50
B. Since 12 of the 50 workers had no days of sick leave, the answer is 50 − 12 = 38
Width = Range / Number of Classes, or W = R / K
Note that: Round the answer up to the nearest whole number if there is a remainder. For instance,
4.7 ≈ 5 and 4.12 ≈ 5
5. Select the starting point as the lowest class limit. This is usually the lowest score (observation).
Add the width to that score to get the lower class limit of the next class. Keep adding until you
reach the desired number of classes (K) calculated in step 3.
6. Find the upper class limits. Subtract the unit of measurement (U) from the lower class limit of the second
class to get the upper limit of the first class. Then add the width to each upper class limit
to get all upper class limits.
Unit of measurement: the difference between a value and the next possible value. For instance, if the data
are 28, 23, 52, then the unit of measurement is one: take one datum arbitrarily, say 23; the next possible
value is 24. Therefore, U = 24 − 23 = 1. If the data are 24.12, 30, 21.2, then give priority to the
datum with the most decimal places. Take 24.12 and find the next possible value. It is 24.13.
Therefore, U = 24.13 − 24.12 = 0.01.
Note that: U = 1 is the maximum value of the unit of measurement and is the value used when we don’t
have a clue about the data.
7. Find the class boundaries.
Lower Class Boundary = Lower Class Limit − U/2 and
Upper Class Boundary = Upper Class Limit + U/2.
In short, LCB = LCL − U/2 and UCB = UCL + U/2.
8. Tally the data and write the numerical values for tallies in the frequency column
9. Find the cumulative frequencies. There are two types of cumulative frequency, namely less than
cumulative frequency and more than cumulative frequency. Less than cumulative frequency is
obtained by successively adding the frequencies of all the previous classes, including the class
against which it is written. The cumulation starts from the lowest class and proceeds to the highest. More than
cumulative frequency is obtained by finding the cumulative total of frequencies starting from the
highest class and proceeding to the lowest.
For example, the following frequency distribution table gives the marks obtained by 40 students:
The above table shows how to find less than cumulative frequency and the table shown below
shows how to find more than cumulative frequency.
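Both kinds of cumulative frequency can be sketched in Python with a running sum. The class frequencies below are hypothetical (they only share the total of 40 with the example) because the original tables are not reproduced in this excerpt.

```python
from itertools import accumulate

# Hypothetical class frequencies, listed from the lowest class to the highest.
frequencies = [3, 8, 14, 10, 5]

# Less than cumulative frequency: running total from the lowest class upward.
less_than_cf = list(accumulate(frequencies))

# More than cumulative frequency: running total from the highest class
# downward, then flipped back so it lines up with the class order.
more_than_cf = list(accumulate(reversed(frequencies)))[::-1]

print(less_than_cf)   # [3, 11, 25, 35, 40]
print(more_than_cf)   # [40, 37, 29, 15, 5]
```

The last less than value and the first more than value both equal the total number of observations.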
Example 2.3: Consider the following set of data and construct the grouped frequency distribution.
11 29 6 33 14 21 18 17 22 38
31 22 27 19 22 23 26 39 34 27
Steps
1. Highest value = 39, Lowest value = 6
2. R = 39 − 6 = 33
3. K = 1 + 3.32 log 20 = 5.32 ≈ 6
4. W = R / K = 33 / 6 = 5.5 ≈ 6
5. Select starting point. Take the minimum which is 6 then add width 6 on it to get the next class
LCL.
6 12 18 24 30 36
6. Upper class limit. Since the unit of measurement is one, 12 − 1 = 11, so 11 is the UCL of the first
class. Therefore, 6-11 is the first class.
Class limit 6-11 12-17 18-23 24-29 30-35 36-41
7. Find the class boundaries using the formulas in step 7:
LCB1 = LCL1 − 0.5, and UCB1 = UCL1 + 0.5
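The steps of Example 2.3 can be traced with the following Python sketch. It assumes the rule K = 1 + 3.32 log n used in step 3, rounding up as the note on rounding requires, and a unit of measurement of 1; the variable names are illustrative.

```python
import math

data = [11, 29, 6, 33, 14, 21, 18, 17, 22, 38,
        31, 22, 27, 19, 22, 23, 26, 39, 34, 27]
unit = 1  # unit of measurement U for whole-number data

# Steps 1 to 4: range, number of classes and class width (both rounded up).
data_range = max(data) - min(data)
num_classes = math.ceil(1 + 3.32 * math.log10(len(data)))
width = math.ceil(data_range / num_classes)

# Steps 5 to 8: class limits, class boundaries and frequencies.
table = []
lower = min(data)
for _ in range(num_classes):
    upper = lower + width - unit
    frequency = sum(1 for x in data if lower <= x <= upper)
    table.append({
        "limits": (lower, upper),
        "boundaries": (lower - unit / 2, upper + unit / 2),
        "frequency": frequency,
    })
    lower += width

for row in table:
    print(row["limits"], row["boundaries"], row["frequency"])
```

Each upper class boundary coincides with the lower boundary of the next class, so the classes cover the data without gaps.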
3. Represent the class boundaries for the histogram or Ogive or the mid-point for the frequency
polygon on the x axis.
4. Plot the points
5. Draw the bars or lines
2.2.3.1 Diagrammatic display of data: Bar charts, Pie-chart, Cartograms
I. Pie chart
A pie chart can be used to compare the relation between the whole and its components. A pie chart is a circular
diagram in which the areas of the sectors of a circle represent the components. Circles are drawn with radii proportional
to the square root of the quantities because the area of a circle is πr².
To construct a pie chart (sector diagram), we draw a circle with radius proportional to the square root of the total. The total
angle of the circle is 360°. The angle of each component is calculated by the formula
Angle of Sector = (Component Part / Total) × 360°
These angles are marked in the circle by means of a protractor to show the different components. The arrangement
of the sectors is usually anti-clockwise.
Example 2.4: The following table gives the details of monthly budget of a family. Represent these figures
by a suitable diagram.
[Figure: pie chart "Monthly family budget": Food 40%, House Rent 27%, Miscellaneous 20%, Fuel and Light 7%, Clothing 6%]
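The sector angles for this pie chart can be computed from the sector formula with a short Python sketch. It uses the percentage shares shown in the chart as the component parts, since the original budget table is not reproduced here.

```python
# Percentage share of each budget item, as labelled on the pie chart.
shares = {
    "Food": 40,
    "House Rent": 27,
    "Miscellaneous": 20,
    "Fuel and Light": 7,
    "Clothing": 6,
}
total = sum(shares.values())

# Angle of sector = (component part / total) x 360 degrees.
angles = {item: part * 360 / total for item, part in shares.items()}

for item, angle in angles.items():
    print(f"{item:<15}{angle:6.1f} degrees")
```

The angles add up to 360 degrees, so the sectors fill the whole circle.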
b) Multiple bar charts are used when two or more sets of inter-related data are represented (a multiple
bar diagram facilitates comparison between more than one phenomenon). The technique of
the simple bar chart is used to draw this diagram, but the difference is that we use different shades,
colors, or dots to distinguish between different phenomena.
Example 2.6: Draw a multiple bar chart to represent the import and export of Canada (values in $) for the
years 1991 to 1995.
Years 1991 1992 1993 1994 1995
Imports 7930 8850 9780 11720 12150
Exports 4260 5225 6150 7340 8145
c) Stratified (Stacked or component) Bar Chart is used to represent data in which the total magnitude
is divided into different parts or components. In this diagram, we first make simple bars for each class taking
the total magnitude in that class and then divide these simple bars into parts in the ratio of the various
components. This type of diagram shows the variation in different components within each class as
well as between different classes. A sub-divided bar diagram is also known as a component bar chart or
stacked chart.
Example 2.7: The table below shows the quantity in hundred kgs of Wheat, Barley and Oats produced on
a certain farm during the years 1991 to 1994. Draw a stratified bar chart.
Years 1991 1992 1993 1994
Wheat 34 43 43 45
Barley 18 14 16 13
Oats 27 24 27 34
Solution: To make the component bar chart, first of all we have to take year wise total production.
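The year-wise totals needed for the component bars can be computed as in the following sketch. The variable names are illustrative; the figures are those of the table in Example 2.7.

```python
# Production in hundred kgs, one list entry per year (Example 2.7).
years = [1991, 1992, 1993, 1994]
production = {
    "Wheat":  [34, 43, 43, 45],
    "Barley": [18, 14, 16, 13],
    "Oats":   [27, 24, 27, 34],
}

# Height of each stacked bar = total production of all crops in that year.
totals = [sum(crop[i] for crop in production.values()) for i in range(len(years))]

for year, total in zip(years, totals):
    print(year, total)
```

Each bar is then divided into three parts whose heights are the individual crop quantities for that year.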
i. Histogram
A histogram is a special type of bar chart in which the horizontal scale represents classes of data values and
the vertical scale represents frequencies. The heights of the bars correspond to the frequency values, and the
bars are drawn adjacent to each other (without gaps).
We can construct a histogram after we have first completed a frequency distribution table for a data set.
Example 2.8: Take the data in example 2.3.
[Figure: histogram of the data in Example 2.3; vertical axis: Frequency]
A relative frequency histogram has the same shape and horizontal (𝑥 axis) scale as a histogram, but the vertical
(𝑦 axis) scale is marked with relative frequencies instead of actual frequencies.
ii. Frequency Polygon
A frequency polygon uses line segments connected to points located directly above class midpoint values.
The heights of the points correspond to the class frequencies, and the line segments are extended to the left
and right so that the graph begins and ends on the horizontal axis with the same distance that the previous
and next midpoint would be located.
Example 2.9: Take the data in example 2.3.
[Figure: frequency polygon of the data in Example 2.3; horizontal axis: Midpoints (2.5, 8.5, 14.5, 20.5, 26.5, 32.5, 38.5, 44.5); vertical axis: Frequency]
iii. Ogive Graph
An Ogive (pronounced as “oh-jive”) is a line that depicts cumulative frequencies, just as the cumulative
frequency distribution lists cumulative frequencies. Note that the Ogive uses class boundaries along the
horizontal scale, and graph begins with the lower boundary of the first class and ends with the upper
boundary of the last class. Ogive is useful for determining the number of values below some particular
value. There are two types of Ogive, namely the less than Ogive and the more than Ogive. The difference is that
the less than Ogive uses the less than cumulative frequency and the more than Ogive uses the more than cumulative
frequency on the 𝑦 axis.
Example 2.10: Take the data in example 2.3 and draw less than and more than Ogive
[Figure: less than and more than Ogives of the data in Example 2.3]
CHAPTER THREE
3. Measures of Central Tendency
3.1 Introduction
When we want to make comparison between groups of numbers it is good to have a single value that is
considered to be a good representative of each group. This single value is called the average of the
group. Averages are also called measures of central tendency.
Measures of central tendency are numerical measures intended to describe the middle value or the
central value or the typical value in a given data set. An average which is representative is called a typical
average, and an average which is not representative and has only a theoretical value is called a descriptive
average.
At the end of this chapter students will be able to:
Identify measure of central tendency.
Understand properties of arithmetic mean.
Summarize an aggregate of statistical data by using single measure.
Define and calculate the mean, mode and median.
Measure the position of data using quartiles, deciles and percentiles with their
interpretation.
3.2 Objectives of Measures of Central Tendency
3. $\sum_{i=1}^{n}(a + b x_i) = na + b\sum_{i=1}^{n} x_i$
3.5.1 Arithmetic Mean
Arithmetic mean is defined as the sum of the measurements of the items divided by the total number of
items. It is usually denoted by 𝑥̅ .
Solution:
The sample values are 20, 26, 40, 36, 23, 42, 35, 24, and 30
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{20+26+40+36+23+42+35+24+30}{9} = \frac{276}{9} = 30.7$ days
Example 3.3: Calculate the arithmetic mean of the sample of numbers of students in 10 classes:
50 42 48 60 58 54 50 42 50 42
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{50+42+48+60+58+54+50+42+50+42}{10} = \frac{496}{10} = 49.6 \approx 50$
In this case there are three 42’s, one 48, three 50’s, one 54, one 58 and one 60. The number of
times each number occurs is called its frequency and the frequency is usually denoted by f. The
information in the sentence above can be written in a table, as follows.
Value, xi 42 48 50 54 58 60
Frequency, fi 3 1 3 1 1 1
xifi 126 48 150 54 58 60
The formula for the arithmetic mean for data of this type is
$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \dots + x_k f_k}{f_1 + f_2 + \dots + f_k} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$
Let us calculate these values and make a table for these values for the sake of convenience.
Class Interval (CI) 60-62 62-64 64-66 66-68 68-70 70-72 Total
Frequency (f) 5 18 42 20 8 7 100
Mid-Point (𝑥𝑖 ) 61 63 65 67 69 71
𝑓𝑖 𝑥𝑖 305 1134 2730 1340 552 497 6558
$\bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i} = \frac{6558}{100} = 65.58$
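As a minimal sketch, the frequency-weighted mean can be checked in Python. The function name `frequency_mean` is illustrative; the data are the class midpoints and frequencies of the table above and the frequency table of Example 3.3.

```python
def frequency_mean(values, freqs):
    """Arithmetic mean of values x_i with frequencies f_i: sum(x_i*f_i) / sum(f_i)."""
    return sum(x * f for x, f in zip(values, freqs)) / sum(freqs)

# Grouped data: class midpoints stand in for the individual values.
midpoints = [61, 63, 65, 67, 69, 71]
class_freqs = [5, 18, 42, 20, 8, 7]
grouped_mean = frequency_mean(midpoints, class_freqs)

# Discrete data of Example 3.3 (number of students in 10 classes).
sizes = [42, 48, 50, 54, 58, 60]
size_freqs = [3, 1, 3, 1, 1, 1]
discrete_mean = frequency_mean(sizes, size_freqs)

print(grouped_mean, discrete_mean)
```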
$\sum_{i=1}^{n}(x_i - \bar{x}) = 0$
• The sum of squares of deviations from the mean is the least compared with that taken from any other measure of central
tendency. That is, $\sum_{i=1}^{n}(x_i - A)^2$ is minimum when $A = \bar{x}$.
While calculating the simple arithmetic mean, all items were assumed to be of equal importance
(each value in the data set has equal weight). When the observations have different weight, we use
weighted average. Weights are assigned to each item in proportion to its relative importance.
If 𝑥1 , 𝑥2 , … , 𝑥𝑛 represent values of the items and 𝑤1 , 𝑤2 , … , 𝑤𝑛 are the corresponding weights,
then the weighted mean, (𝑥̅𝑤 ) is given by
$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum w_i x_i}{\sum w_i}$
Example 3.5: A student’s final mark in Mathematics, Physics, Chemistry and Biology are
respectively A, B, D and C. If the respective credits (weight) received for these courses are 4, 4, 3
and 2, determine the average grade the student has got for the course.
Solution
We use a weighted arithmetic mean, weight associated with each course being taken as the number
of credits received for the corresponding course.
𝑥𝑖 4 3 1 2 Total
𝑤𝑖 4 4 3 2 13
𝑥𝑖 𝑤𝑖 16 12 3 4 35
$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum w_i x_i}{\sum w_i} = \frac{16+12+3+4}{13} = \frac{35}{13} = 2.69$
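The same calculation can be sketched in Python. The mapping of letter grades to points (A = 4, B = 3, C = 2, D = 1) is the one used in the table of Example 3.5; the function name is illustrative.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

grade_points = [4, 3, 1, 2]   # Mathematics A, Physics B, Chemistry D, Biology C
credits = [4, 4, 3, 2]        # weights

average_grade = weighted_mean(grade_points, credits)
print(round(average_grade, 2))
```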
Combined mean: When a set of observations is divided into k groups and 𝑥̅1 is the mean of n1
observations of group 1, 𝑥̅2 is the mean of n2 observations of group2, …, 𝑥̅𝑘 is the mean of nk
observations of group k, then the combined mean, denoted by 𝑥̅𝑐 , of all observations taken together
is given by
$\bar{x}_c = \frac{\bar{x}_1 n_1 + \bar{x}_2 n_2 + \dots + \bar{x}_k n_k}{n_1 + n_2 + \dots + n_k}$
This is a special case of the weighted mean. In this case the sample sizes are the weights.
Example 3.6: In the previous year there were two sections taking the Statistics course. At the end of
the semester, the two sections got average marks of 70 and 78. There were 45 and 50 students in
the two sections respectively. Find the mean mark for all the students.
Solution:
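$\bar{x}_c = \frac{\bar{x}_1 n_1 + \bar{x}_2 n_2 + \dots + \bar{x}_k n_k}{n_1 + n_2 + \dots + n_k} = \frac{\bar{x}_1 n_1 + \bar{x}_2 n_2}{n_1 + n_2} = \frac{70 \times 45 + 78 \times 50}{45 + 50} = \frac{7050}{95} = 74.21$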
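A short Python sketch of the combined mean, using the figures of Example 3.6 (the function name is illustrative):

```python
def combined_mean(group_means, group_sizes):
    """Combined mean of k groups: sum(mean_j * n_j) / sum(n_j)."""
    total = sum(m * n for m, n in zip(group_means, group_sizes))
    return total / sum(group_sizes)

section_means = [70, 78]
section_sizes = [45, 50]

overall_mean = combined_mean(section_means, section_sizes)
print(round(overall_mean, 2))
```

Note that the result is not the simple average of 70 and 78, because the larger section carries more weight.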
The geometric mean, like the arithmetic mean, is a calculated average. It is used when observed values
are measured as ratios, percentages, proportions, indices or growth rates.
Geometric mean for individual series: The geometric mean, G.M. of an individual series of
positive numbers (> 0) 𝑥1 , 𝑥2 , … , 𝑥𝑛 is defined as the nth root of their product.
$G.M. = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = \operatorname{antilog}\left(\frac{1}{n}\sum \log x_i\right)$
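The logarithmic form can be sketched as follows. The values 2, 4 and 8 are made-up illustrative data, not taken from the module, and base-10 logarithms are assumed for "log" and "antilog".

```python
import math

def geometric_mean(values):
    """G.M. = antilog of the mean of the logs (base 10), for positive values only."""
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean needs strictly positive values")
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log

ratios = [2, 4, 8]                       # illustrative data
gm = geometric_mean(ratios)
gm_root = (2 * 4 * 8) ** (1 / 3)         # nth root of the product, for comparison
print(gm, gm_root)
```

Both forms agree up to rounding; the logarithmic form is usually safer when the product of many values would become very large.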
Geometric mean for continuous grouped FD:- The above formula can also be used whenever
the frequency distribution is grouped continuous, class marks of the class intervals are considered
as xi.
Harmonic Mean
It is a suitable measure of central tendency when the data pertain to speed, rate and time. The
harmonic mean of n values is defined as n divided by the sum of their reciprocals.
Harmonic mean for individual series: If 𝑥1 , 𝑥2 , … , 𝑥𝑛 are n observations, then harmonic mean
can be represented by the following formula:
$H.M. = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}}$
Example 3.9: A car travels 25 miles at 25 mph, 25 miles at 50 mph, and 25 miles at 75 mph. Find
the average speed (the harmonic mean) of the three velocities.
Solution:
$H.M. = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}} = \frac{3}{\frac{1}{25} + \frac{1}{50} + \frac{1}{75}} = 40.9$
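A brief Python sketch of Example 3.9 follows. It also checks the result against total distance divided by total time, which is why the harmonic mean is the right average when equal distances are covered at different speeds. The function name is illustrative.

```python
def harmonic_mean(values):
    """H.M. = n divided by the sum of the reciprocals of the values."""
    return len(values) / sum(1 / x for x in values)

speeds = [25, 50, 75]            # mph, each held for 25 miles
average_speed = harmonic_mean(speeds)

# Cross-check: total distance / total time.
total_distance = 25 * len(speeds)
total_time = sum(25 / s for s in speeds)
print(round(average_speed, 1), round(total_distance / total_time, 1))
```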
Harmonic mean for discrete data arranged in FD: If the data are arranged in the form of a
frequency distribution, then
$H.M. = \frac{n}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \dots + \frac{f_m}{x_m}}$, where $n = \sum_{k=1}^{m} f_k$
Harmonic mean for continuous grouped FD: Whenever the frequency distribution is grouped
continuous, the class marks of the class intervals are considered as $x_i$ and the above formula can be
used as
$H.M. = \frac{n}{\sum_{i=1}^{m} \frac{f_i}{x_i}}$, where $n = \sum_{k=1}^{m} f_k$
Appropriate measure of central tendency in situations where data is in time, speed or rate.
Relations among different means
i. If all the observations are positive, the three means satisfy $\bar{x} \ge GM \ge HM$.
ii. For two observations, $\sqrt{\bar{x} \times HM} = GM$.
iii. $\bar{x} = GM = HM$ if all observations are positive and have equal value.
3.5.2 Median
The median is as its name indicates the middle most value in the arrangement which divides the
data into two equal parts. It is obtained by arranging the data in an increasing or decreasing order
of magnitude and denoted by 𝑥̃.
Median for individual series
We arrange the sample in ascending order of the variable of interest. Then the median is the middle
value (if the sample size n is odd) or the average of the two middle values (if the sample size n is
even).
For individual series the median is obtained by
a/ $\tilde{x} = \left(\frac{n+1}{2}\right)^{th}$ value if n is odd, and
b/ $\tilde{x} = \frac{\left(\frac{n}{2}\right)^{th}\text{ value} + \left(\frac{n}{2}+1\right)^{th}\text{ value}}{2}$ if n is even
CF greater than or equal to the rank/position of the median value (i.e., that value obtained by a &
b above formula) and the corresponding value is the median.
Median for grouped continuous data:-For continuous data, the median is obtained by the following
formula.
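$\tilde{x} = L + \frac{w}{f_{med}}\left(\frac{n}{2} - CF\right)$
where L = the lower class boundary of the median class; w = the class width;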
𝑓𝑚𝑒𝑑 = the frequency of the median class; and CF = the cum. freq. corresponding to the class
preceding the median class. That is, the sums of the frequencies of all classes lower than the median
class. Where the median class is the class which contains the (n/2)th observation whether n is odd or
even, since the items have already lost their originality once they are grouped in to continuous classes.
Example 3.11: Calculate the median for the following frequency distribution.
C.I 1 - 5 6 - 10 11 – 15 16 – 20 21 - 25 26 – 30 31 - 35 Total
Freq. 4 8 12 6 3 4 3 40
Freq. 4 8 12 6 3 4 3 40
Cuml. Freq. 4 12 24 30 33 37 40
Since n = 40, 40/2 = 20, and the smallest CF greater than or equal to 20 is 24; thus, the median class
is the third class. And for this class, L = 10.5, w = 5, f med =12, CF = 12. Then applying the formula,
we get:
$\tilde{x} = 10.5 + \frac{5}{12}(20 - 12) = 13.8$
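The steps of Example 3.11 can be sketched in Python as below. The function name and the way classes are passed in (lower class boundaries plus a common width) are assumptions made for illustration.

```python
def grouped_median(lower_boundaries, freqs, width):
    """Median of grouped continuous data: L + (w / f_med) * (n/2 - CF)."""
    n = sum(freqs)
    target = n / 2
    cum_before = 0                       # CF of the classes before the current one
    for lower, f in zip(lower_boundaries, freqs):
        if cum_before + f >= target:     # smallest cumulative frequency >= n/2
            return lower + (width / f) * (target - cum_before)
        cum_before += f
    raise ValueError("empty frequency distribution")

# Classes 1-5, 6-10, ..., 31-35 have lower boundaries 0.5, 5.5, ..., 30.5.
boundaries = [0.5, 5.5, 10.5, 15.5, 20.5, 25.5, 30.5]
frequencies = [4, 8, 12, 6, 3, 4, 3]

median = grouped_median(boundaries, frequencies, width=5)
print(round(median, 1))
```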
Merits of median
It is less affected by extreme values.
Median can be calculated even in case of open-ended intervals.
It can be computed for ratio, interval, and ordinal level of data.
Demerits of median
Its value is not determined by each & every observation.
It is not a good representative of the data if the number of items (data) is small.
The arrangement of items in order of magnitude is sometimes very tedious process if the
number of items is very large.
3.5.3 The Mode
The mode or the modal value is the value with the highest frequency and denoted by 𝑥̂. A data set
may not have a mode or may have more than one mode. A distribution is called a bimodal
distribution if it has two data values that appear with the greatest frequency. If a distribution has
more than two modes, then the distribution is multi modal. If a distribution has no modes, then the
distribution is no modal.
Mode of individual series:- The mode or the modal value of individual series (raw data) is simply
obtained by locating the observation with the maximum frequency.
Mode for discrete data arranged in a frequency distribution:-In the case of discrete grouped data,
the mode is determined just by looking to that value (s) having the highest frequency.
After locating this class, the mode is interpolated using:
$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2}\, w$, where L = the lower class boundary of the modal class; $\Delta_1 = f_{mod} - f_1$;
$\Delta_2 = f_{mod} - f_2$; w = the common class width; $f_1$ = frequency of the class immediately preceding the
modal class; $f_2$ = frequency of the class immediately succeeding the modal class; and $f_{mod}$ = frequency
of the modal class.
Example 3.13: Calculate the mode for the frequency distribution of data of example 3.11.
Solution: By inspection, the mode lies in the third class, where L =10.5, fmod = 12, f1=8, f2=6, w = 5
Using the formula, the mode is:
$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2}\, w = 10.5 + \frac{(12-8)}{(12-8)+(12-6)} \times 5 = 12.5$
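A minimal Python sketch of the interpolation for Example 3.13 is given below. The function name is illustrative, and treating a missing neighbouring class as having frequency 0 is an assumption, not something stated in the text.

```python
def grouped_mode(lower_boundaries, freqs, width):
    """Mode of grouped data: L + d1 / (d1 + d2) * w for the modal class."""
    i = max(range(len(freqs)), key=freqs.__getitem__)   # first class with the highest frequency
    f_mod = freqs[i]
    f_prev = freqs[i - 1] if i > 0 else 0                # assumption: 0 if there is no preceding class
    f_next = freqs[i + 1] if i < len(freqs) - 1 else 0   # assumption: 0 if there is no succeeding class
    d1 = f_mod - f_prev
    d2 = f_mod - f_next
    return lower_boundaries[i] + d1 / (d1 + d2) * width

boundaries = [0.5, 5.5, 10.5, 15.5, 20.5, 25.5, 30.5]
frequencies = [4, 8, 12, 6, 3, 4, 3]

mode = grouped_mode(boundaries, frequencies, width=5)
print(mode)
```

The sketch does not handle the degenerate case where the modal class and both neighbours have equal frequencies, since the formula is undefined there.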
Merits of mode
Mode is not affected by extreme values.
We can change the size of the observations without changing the mode.
It can be computed for all level of data i.e. ratio, interval, ordinal or nominal.
Demerits of mode
It may not exist.
It does not take every value into consideration.
Mode may not exist in the series and if it exists it may not be unique.
3.5.4 The Relationship of the Mean, Median and Mode
Comparing the Mean, Median, and the Mode
If the data is skewed –avoid the mean.
If there is high gap around the middle- avoid the median.
A measure is a resistant measure if its value is not affected by an outlier or an extreme
data value.
The mean is not a resistant measure of central tendency because it is not resistant to the
influence of the extreme data values or outliers.
The median is resistant to the influence of extreme data values or outliers and its value does
not respond strongly to the changes of a few extreme data values regardless of how large
the change may be.
The mode has an advantage over both the mean and the median when the data is
categorical since it is not possible to calculate the mean or median for this type of data.
Also, the mode usually indicates the location within a large distribution where the data
values are concentrated. However, the mode cannot always be calculated because if a
distribution has all different data values, then the distribution is non modal.
In the case of symmetrical distribution; mean, median and mode coincide. That is
mean=median = mode. However, for a moderately asymmetrical (nonsymmetrical)
distribution, mean and mode lie on the two ends and median lies between them and they
have the following important empirical relationship, which is
Mean – Mode = 3(Mean - Median)
Example 3.14: In a moderately asymmetrical distribution, the mean and the mode are 30 and 42
respectively. What is the median of the distribution?
Solution:
Median = (2 Mean + Mode)/3 = (2*30 + 42)/3 = 34
Hence the median of the distribution is 34.
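Rearranging Mean – Mode = 3(Mean – Median) gives Median = (2 Mean + Mode)/3. A tiny Python sketch (the function name is illustrative) confirms the value for Example 3.14:

```python
def median_from_mean_and_mode(mean, mode):
    """Empirical relationship for moderately asymmetrical distributions.

    Mean - Mode = 3 * (Mean - Median)  =>  Median = (2 * Mean + Mode) / 3
    """
    return (2 * mean + mode) / 3

estimated_median = median_from_mean_and_mode(mean=30, mode=42)
print(estimated_median)
```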
Which of the Three Measures is the ‘’Best’’?
At this stage, one may ask which of these three measures of central tendency is the best. There
is no simple answer to this question, because the three measures are based upon different
concepts. The arithmetic mean is the sum of the values divided by the total number of observations
in the series. The median is the value of the middle observation, and the mode is the value around
which the observations tend to concentrate. As such, the
use of a particular measure will largely depend on the purpose of the study and the nature of the
data. For example, when we are interested in knowing the consumers’ preferences for different
brands of television sets or kinds of advertising, the choice should go in favor of mode. The use of
mean and median would not be proper. However, the median can sometimes be used in the case
of qualitative data when such data can be arranged in an ascending or descending order. Let us
take another example. Suppose we invite applications for a certain vacancy in our company. A
large number of candidates apply for that post. We are now interested to know as to which age or
age group has the largest concentration of applicants. Here, obviously the mode will be the most
appropriate choice. The arithmetic mean may not be appropriate as it may be influenced by some
extreme values.
3.6 The Quantiles (Quartiles, Deciles, Percentiles)
The median is the value of the middle item, which divides the data into two equal parts and is found by
arranging the data in increasing or decreasing order of magnitude, whereas quantiles are
measures which divide a given set of data into approximately equal subdivisions and are obtained
by the same procedure as the median. They are averages of position (non-central tendency).
Some of these are quartiles, deciles and percentiles.
Quartiles: are values which divide the data set in to approximately four equal parts, denoted by
𝑄1 , 𝑄2 𝑎𝑛𝑑 𝑄3 . The first quartile (𝑄1) is also called the lower quartile and the third quartile (𝑄3 )
is the upper quartile. The second quartile ( 𝑄2 ) is the median.
• Quartiles for Individual series:
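Let $x_1, x_2, \dots, x_n$ be n ordered observations. The ith quartile $Q_i$ is the value of the item
corresponding to the $\left[\frac{i(n+1)}{4}\right]^{th}$ position, i = 1, 2, 3.
That is, after arranging the data in ascending order, Q1, Q2, & Q3 are obtained by:
$Q_i = \left(\frac{i(n+1)}{4}\right)^{th}$ value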
• Quartiles in continuous data:- For continuous data, use the following formula:
$Q_i = L + \frac{w}{f_{Q_i}}\left(\frac{i \times n}{4} - CF\right)$
Where i = 1, 2, 3, and L, w, $f_{Q_i}$ and CF are defined in the same way as for the median.
i.e. $Q_1 = L + \frac{w}{f_{Q_1}}\left(\frac{n}{4} - CF\right)$, $Q_2 = L + \frac{w}{f_{Q_2}}\left(\frac{2n}{4} - CF\right)$ and $Q_3 = L + \frac{w}{f_{Q_3}}\left(\frac{3n}{4} - CF\right)$
The class under question is the one including the (i×n/4)th value. That is, the class with the minimum
cumulative frequency greater than or equal to (i×n/4) is the class of the ith quartile.
Deciles: are values dividing the data approximately in to ten equal parts, denoted by 𝐷1 , 𝐷2,…, 𝐷9 .
• Deciles for Individual Series:
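Let $x_1, x_2, \dots, x_n$ be n ordered observations. The ith decile ($D_i$) is the value of the item
corresponding to the $\left[\frac{i(n+1)}{10}\right]^{th}$ position, i = 1, 2, …, 9.
That is, after arranging the data in ascending order, D1, D2, . . . & D9 are obtained by:
$D_i = \left(\frac{i(n+1)}{10}\right)^{th}$ value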
Then define the symbols in similar ways as we did in the case of quartiles for continuous data.
Percentiles: are values which divide the data approximately in to one hundred equal parts, and
denoted by 𝑃1 , 𝑃2,…, 𝑃99 .
• Percentiles for Individual Series:
Let $x_1, x_2, \dots, x_n$ be n ordered observations. The ith percentile ($P_i$) is the value of the item
corresponding to the $\left[\frac{i(n+1)}{100}\right]^{th}$ position, i = 1, 2, …, 99.
Define the symbols similar ways as we did in the case of quartiles or deciles for continuous data.
Interpretations
1. 𝑄𝑖 is the value below which ( i × 25) percent of the observations in the series are found
(where i = 1, 2,3). For instance 𝑄3 means the value below which 75 percent of observations in
the given series are found.
2. 𝐷𝑖 is the value below which ( i ×10) percent of the observations in the series are found (where
i = 1, 2,...,9 ). For instance 𝐷4 is the value below which 40 percent of the values are found in the
series.
3. 𝑃𝑖 is the value below which i percent of the total observations are found (where i = 1, 2,3,...,99
). For example 60 percent of the observations in a given series are below 𝑃60 .
Example 3.15: Calculate 𝑄1 , 𝑄2 , 𝑄3, 𝐷4, 𝐷9, 𝑃40 & 𝑃90 for the following data given on the table
below.
x 10 11 12 13 14 15 16 17 18
f 2 8 25 48 65 40 20 9 2
Solution: The data is arranged in an increasing order. So we need to construct only the
cumulative frequency table before calculating the required values.
x 10 11 12 13 14 15 16 17 18
f 2 8 25 48 65 40 20 9 2
Cum. 2 10 35 83 148 188 208 217 219
Freq.
The total number of observations is 219 which is odd. Clearly then the median is 14. i.e.
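$\tilde{x} = \left(\frac{n+1}{2}\right)^{th}$ value $= \left(\frac{219+1}{2}\right)^{th}$ value = 110th value = 14
$Q_1 = \left(\frac{1(n+1)}{4}\right)^{th}$ value $= \left(\frac{1(219+1)}{4}\right)^{th}$ value = 55th value = 13
$Q_2 = \left(\frac{2(n+1)}{4}\right)^{th}$ value $= \left(\frac{2(219+1)}{4}\right)^{th}$ value = 110th value = 14 = $\tilde{x}$
$Q_3 = \left(\frac{3(n+1)}{4}\right)^{th}$ value $= \left(\frac{3(219+1)}{4}\right)^{th}$ value = 165th value = 15
$D_4 = \left(\frac{4(n+1)}{10}\right)^{th}$ value $= \left(\frac{4(219+1)}{10}\right)^{th}$ value = 88th value = 14
$D_9 = \left(\frac{9(n+1)}{10}\right)^{th}$ value $= \left(\frac{9(219+1)}{10}\right)^{th}$ value = 198th value = 16
$P_{40} = \left(\frac{40(n+1)}{100}\right)^{th}$ value $= \left(\frac{40(219+1)}{100}\right)^{th}$ value = 88th value = 14
$P_{90} = \left(\frac{90(n+1)}{100}\right)^{th}$ value $= \left(\frac{90(219+1)}{100}\right)^{th}$ value = 198th value = 16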
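The lookups in Example 3.15 can be reproduced with a short Python sketch. The function name `quantile_from_table` is illustrative, and the sketch simply picks the value whose cumulative frequency first reaches the position i(n+1)/parts; it makes no attempt to interpolate fractional positions.

```python
from bisect import bisect_left
from itertools import accumulate

def quantile_from_table(values, freqs, i, parts):
    """Value at position i*(n+1)/parts in a discrete frequency distribution."""
    cumulative = list(accumulate(freqs))
    position = i * (sum(freqs) + 1) / parts
    # Index of the smallest cumulative frequency >= position.
    return values[bisect_left(cumulative, position)]

x = [10, 11, 12, 13, 14, 15, 16, 17, 18]
f = [2, 8, 25, 48, 65, 40, 20, 9, 2]

quartiles = [quantile_from_table(x, f, i, 4) for i in (1, 2, 3)]
d4, d9 = quantile_from_table(x, f, 4, 10), quantile_from_table(x, f, 9, 10)
p40, p90 = quantile_from_table(x, f, 40, 100), quantile_from_table(x, f, 90, 100)
print(quartiles, d4, d9, p40, p90)
```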
Example 3.16: The marks of 50 students (out of 85) are given below. Based on the data find $Q_1$,
$D_4$ and $P_7$.
Marks 46-50 51-55 56-60 61-65 66-70 71-75 76-80
fi 4 8 15 5 9 5 4
Solution: first find the class boundaries and cumulative frequency distributions.
Marks 46-50 51-55 56-60 61-65 66-70 71-75 76-80
class boundary 45.5-50.5 50.5-55.5 55.5-60.5 60.5-65.5 65.5-70.5 70.5-75.5 75.5-80.5
fi 4 8 15 5 9 5 4
Cum. 4 12 27 32 41 46 50
frequency
Q1: the (n/4)th value = 12.5th value, which lies in the class 55.5 – 60.5.
$Q_1 = L + \frac{w}{f_{Q_1}}\left(\frac{n}{4} - CF\right) = 55.5 + \frac{5}{15}(12.5 - 12) = 55.7$
D4: the (4n/10)th value = 20th value, which lies in the class 55.5 – 60.5.
$D_4 = L + \frac{w}{f_{D_4}}\left(\frac{4n}{10} - CF\right) = 55.5 + \frac{5}{15}(20 - 12) = 58.2$
P7: the (7n/100)th value = 3.5th value, which lies in the class 45.5 – 50.5.
$P_7 = L + \frac{w}{f_{P_7}}\left(\frac{7n}{100} - CF\right) = 45.5 + \frac{5}{4}(3.5 - 0) = 49.875$
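The three calculations of Example 3.16 follow one pattern, which the following Python sketch captures. The function name and argument layout are assumptions made for illustration.

```python
def grouped_quantile(lower_boundaries, freqs, width, i, parts):
    """i-th quantile of grouped data: L + (w / f) * (i*n/parts - CF)."""
    n = sum(freqs)
    target = i * n / parts
    cum_before = 0                      # CF: cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, freqs):
        if cum_before + f >= target:    # first class whose cumulative frequency reaches the target
            return lower + (width / f) * (target - cum_before)
        cum_before += f
    raise ValueError("empty frequency distribution")

boundaries = [45.5, 50.5, 55.5, 60.5, 65.5, 70.5, 75.5]
frequencies = [4, 8, 15, 5, 9, 5, 4]

q1 = grouped_quantile(boundaries, frequencies, 5, 1, 4)
d4 = grouped_quantile(boundaries, frequencies, 5, 4, 10)
p7 = grouped_quantile(boundaries, frequencies, 5, 7, 100)
print(round(q1, 1), round(d4, 1), p7)
```

Calling the same function with `i=2, parts=4` gives the grouped median, since the second quartile and the median share one formula.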
Exercise- 3
1. Calculate the median, quartiles, 8th decile, and 75th percentile for the following data. Show
that the value of 75th percentile is the same as that of Q3.
Lifetime (C.M) 50 100 150 200 250 300 350 400
No of Batteries 6 8 13 20 9 6 3 2
2. The following data represent the number of offences for various robberies in a town per a
given day.
No. of robberies 26 34 30 15 10 32 12 25 7
No. of days 13 19 12 30 14 8 19 20 3
Compute the mean, median and mode
3. Calculate Q1, Q2, Q3, D5, D8, and P90 for the following table
Temperature (oF) 50-59 60-69 70-79 80-89 90-99
Days 2 8 20 4 1
4. The following data represent the pulse rates (beats per minute) of nine students 76 60 60
81 72 80 80 68 and 73. Calculate the mean, mode and the third quartile.
5. The number of births in a hospital is given below
Days Monday Tuesday Wednesday Thursday Friday Saturday Sunday
Num. of 50 60 52 55 62 30 40
births
Find the average number of births per day and the mode.
6. From the table given below find the mode and 5th decile.
Size 11-15 16-20 21-25 26-30 31-35 36-40 41-45 46-50
Frequency 7 10 13 26 35 22 11 5
7. If the arithmetic mean of two items is 5 and G.M. is 4, find their H.M.
8. The following frequency distribution represents the magnitudes of earthquakes.
Magnitude 0-0.9 1-1.9 2-2.9 3-3.9 4-4.9 5-5.9 6-6.9 7-7.9
Frequency 20 50 45 30 10 8 6 1
Compute the median and verify that it is equal to the second quartile and find 72nd percentile.
CHAPTER FOUR
4. Measures of Dispersion (Variation)
4.1 Introduction
Just as central tendency can be measured by a number in the form of an average, the amount of
variation (dispersion, spread, or scatter) among the values in the data set can also be measured.
The measures of central tendency describe that the major part of values in the data set appears to
concentrate around a central value called average with the remaining values scattered (distributed)
on either sides of that value. But these measures do not reveal how these values are dispersed
(spread or scatter) on each side of the central value. The dispersion of values is indicated by the
extent to which these values tend to spread over an interval rather than cluster closely around an
average.
The term dispersion is generally used in two senses. Firstly, dispersion refers to the variations of
the items among themselves. If the value of all the items of a series is the same, there will be no
variation among different items of a series. Secondly, dispersion refers to the variation of the items
around an average. If the difference between the value of the items and the average is large, the
dispersion will be high; on the other hand, if the difference between the value of the items and the
average is small, the dispersion will be low. Thus, dispersion is defined as the scatteredness or
spread of the individual items in a given series.
After studying this chapter, you should be able to:
Explain the meaning of measures of dispersion
Relative measures of dispersion: A relative measure of dispersion is the ratio of a measure of
absolute dispersion to an appropriate average or the selected items of the data.
Relative measures of dispersion are classified as follows:
Based on selected items: coefficient of range and coefficient of quartile deviation.
Based on all items: coefficient of mean deviation, and coefficient of standard deviation or coefficient of variation.
The difference between upper class limit of the last class and the lower class limit of the
first class, or
The difference between the largest class mark and the smallest class mark, or
The difference between the upper class boundary of the last class and the lower class
boundary of the first class.
The range is used in describing quantities such as the maximum change in daily temperature, rainfall, etc. When
the sample size is small, it can be an adequate measure of variation. It is commonly used in quality
control.
The relative measure of range, also called the coefficient of range, is defined as
$\text{Relative Range } (RR) = \frac{L - S}{L + S}$
Example 4.1: Five students obtained the following marks in statistics: 20, 35, 25, 30, 15. Find the
range and relative range
Solution: Here, 𝐿 = 35, 𝑎𝑛𝑑 𝑆 = 15
𝑅𝑎𝑛𝑔𝑒 = 𝐿 − 𝑆 = 35 − 15 = 20
$RR = \frac{L - S}{L + S} = \frac{35 - 15}{35 + 15} = 0.4$
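Example 4.1 can be checked with a few lines of Python (the variable names are illustrative):

```python
marks = [20, 35, 25, 30, 15]

largest, smallest = max(marks), min(marks)                    # L and S
data_range = largest - smallest                               # Range = L - S
relative_range = (largest - smallest) / (largest + smallest)  # RR = (L - S) / (L + S)

print(data_range, relative_range)
```

The range carries the unit of the data (marks), while the relative range is a unit-free ratio, which is what makes it usable for comparing series measured in different units.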
Example 4.2: Find out range and relative range of the following given data.
It is not based on all observations of the series.
It can’t be calculated in case of open-ended distribution.
It is affected by sampling fluctuation.
It is affected by extreme values in the series.
4.3.2 The Quartile Deviation and Coefficient of Quartile Deviation
Inter-quartile range and quartile deviation are other measures of dispersion. The difference
between the upper quartile (𝑄3 ) and lower quartile (𝑄1 ) is called inter-quartile range.
Symbolically,
$\text{Inter-Quartile Range } (IQR) = Q_3 - Q_1$
The inter-quartile range covers the dispersion of the middle 50% of the items of the series. Quartile
deviation, also called semi-inter-quartile range, is half of the difference between the upper and
lower quartiles, that is, half of the inter-quartile range. Its formula is
$\text{Quartile Deviation } (QD) = \frac{Q_3 - Q_1}{2}$
The relative measure of quartile deviation also called the coefficient of quartile deviation (CQD)
is defined as:
$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1}$
Example 4.3: Find inter-quartile range, quartile deviation and coefficient of quartile deviation
from the following data.
28, 18, 20, 24, 27, 30, 15
Solution: First arrange the data in ascending order. 15, 18, 20, 24, 27, 28, 30
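$Q_1$ = size of the $\left(\frac{n+1}{4}\right)^{th}$ item = size of the $\left(\frac{7+1}{4}\right)^{th}$ item
= size of the 2nd item = 18 marks
$Q_3$ = size of the $3\left(\frac{n+1}{4}\right)^{th}$ item = size of the $3\left(\frac{7+1}{4}\right)^{th}$ item
= size of the 6th item = 28 marks
$IQR = Q_3 - Q_1 = 28 - 18 = 10$
$QD = \frac{Q_3 - Q_1}{2} = \frac{28 - 18}{2} = 5$
$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{28 - 18}{28 + 18} = 0.217$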
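The calculation in Example 4.3 can be sketched in Python as follows. The function name is illustrative. In this example the positions i(n+1)/4 are whole numbers; the linear interpolation used for fractional positions is an assumption added for completeness and is not described in the text.

```python
def quartile(data, i):
    """i-th quartile of an individual series at position i*(n+1)/4 (1-based)."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 observations")
    position = i * (n + 1) / 4
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    # Assumption: interpolate linearly between neighbouring items.
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

marks = [28, 18, 20, 24, 27, 30, 15]
q1, q3 = quartile(marks, 1), quartile(marks, 3)

iqr = q3 - q1                 # inter-quartile range
qd = (q3 - q1) / 2            # quartile deviation
cqd = (q3 - q1) / (q3 + q1)   # coefficient of quartile deviation
print(q1, q3, iqr, qd, round(cqd, 3))
```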
Example 4.4: Find inter-quartile range, quartile deviation and coefficient of quartile deviation
from the following data
Marks 2 3 4 5 6 7 8 9
No. Of students 10 11 12 13 5 12 7 5
Solution:
Marks 2 3 4 5 6 7 8 9
No. of students 10 11 12 13 5 12 7 5
CF 10 21 33 46 51 63 70 75=N
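$Q_1 = \left(\frac{N+1}{4}\right)^{th}$ item $= \left(\frac{75+1}{4}\right)^{th}$ item = 19th item = 3
$Q_3 = 3\left(\frac{N+1}{4}\right)^{th}$ item $= 3\left(\frac{75+1}{4}\right)^{th}$ item = 57th item = 7
$IQR = Q_3 - Q_1 = 7 - 3 = 4$
$QD = \frac{Q_3 - Q_1}{2} = \frac{7 - 3}{2} = 2$
$CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{7 - 3}{7 + 3} = 0.4$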
Remark: Q.D or CQD includes only the middle 50% of the observation.
Merits of QD
It is well-defined, easy to compute and simple to understand.
It helps in studying the middle 50% item in the series.
It is not affected by the extreme items.
It is useful in measuring variations in the case of open-ended distributions.
Demerits of QD
It is not based on all the items (it ignores 50% items, i.e., the first 25% and the last
25%).
It is greatly influenced by sampling fluctuations.
It is not amenable to algebraic manipulations.
the absolute deviations from a given average. Depending up on the type of averages used we have
different mean deviations.
The mean deviation of a sample of n observations x1, x2, . . .,xn (individual series)is given
as
$MD = \frac{\sum |X_i - A|}{n}$
Where |𝑋𝑖 − 𝐴| denotes the absolute value of the deviation. Generally, arithmetic mean and
median are used in calculating mean deviation. So, 𝐴 stands for the average used for
calculating 𝑀𝐷. That is, 𝐴 = 𝑚𝑒𝑑𝑖𝑎𝑛(𝑋̃ ) 𝑜𝑟 𝐴 = 𝑚𝑒𝑎𝑛(𝑋̅).
In case of discrete data arranged in FD and continuous grouped data, the formula for MD
becomes
$MD = \frac{\sum f_i |X_i - A|}{n}$, where $X_i$ is the class mark of the ith class, $f_i$ is the frequency of the ith
class and $n = \sum f_i$.
1. The mean deviation about the arithmetic mean is, therefore, given by
$MD(\bar{X}) = \frac{\sum |X_i - \bar{X}|}{n}$ … for ungrouped data (individual series).
$MD(\bar{X}) = \frac{\sum f_i |X_i - \bar{X}|}{n}$ … for discrete data arranged in FD and a grouped continuous
frequency distribution; where $X_i$ is the value for discrete data arranged in FD and the class
mark of the ith class for continuous grouped data, $f_i$ is the frequency of the ith class and $n = \sum f_i$.
Steps to calculate M.D for (𝑋̅)
Find the arithmetic mean, 𝑋̅
Find the deviations of each reading from 𝑋̅
Find the arithmetic mean of the deviations, ignoring sign.
2. The mean deviation about the median is also given by
$MD(\tilde{X}) = \frac{\sum |X_i - \tilde{X}|}{n}$ … for ungrouped data (individual series).
$MD(\tilde{X}) = \frac{\sum f_i |X_i - \tilde{X}|}{n}$ … for discrete data arranged in FD and a grouped continuous
frequency distribution; where $X_i$ is the value for discrete data arranged in FD and the class
mark of the ith class for continuous grouped data, $f_i$ is the frequency of the ith class and $n = \sum f_i$.
Steps to calculate M.D (𝑋̃ )
Find the median, 𝑋̃
Find the deviations of each reading from 𝑋̃
Find the arithmetic mean of the deviations, ignoring sign.
3. The mean deviation about the mode is also given by
$MD(\hat{X}) = \frac{\sum |X_i - \hat{X}|}{n}$ … for ungrouped data (individual series).
$MD(\hat{X}) = \frac{\sum f_i |X_i - \hat{X}|}{n}$ … for discrete data arranged in FD and a grouped continuous frequency
distribution; where $X_i$ is the value for discrete data arranged in FD and the class mark of the
ith class for continuous grouped data, $f_i$ is the frequency of the ith class and $n = \sum f_i$.
Steps to calculate M.D (x̂)
Find the mode, x̂
Find the deviations of each reading from x̂
Find the arithmetic mean of the deviations, ignoring sign.
Example 4.5
The following are the number of visit made by ten mothers to the local doctor’s surgery. 8, 6, 5, 5,
7, 4, 5, 9, 7, 4. Find mean deviation about mean, median and mode.
Solution:
First calculate the three averages
𝑋̅ = 6, 𝑋̃ = 5.5, x̂ = 5
Then take the deviations of each observation from these averages.
xi 4 4 5 5 5 6 7 7 8 9 Total
|𝑋𝑖 − 𝑋̅| 2 2 1 1 1 0 1 1 2 3 14
|𝑋𝑖 − x̃| 1.5 1.5 0.5 0.5 0.5 0.5 1.5 1.5 2.5 3.5 14
|𝑋𝑖 − 𝑋̂| 1 1 0 0 0 1 2 2 3 4 14
Since the distribution is ungrouped the mean deviation about mean, median and mode:
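$MD(\bar{X}) = \frac{\sum |X_i - \bar{X}|}{n} = \frac{14}{10} = 1.4$
$MD(\tilde{X}) = \frac{\sum |X_i - \tilde{X}|}{n} = \frac{14}{10} = 1.4$
$MD(\hat{X}) = \frac{\sum |X_i - \hat{X}|}{n} = \frac{14}{10} = 1.4$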
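Example 4.5 can be verified with a short Python sketch that uses the standard `statistics` module for the three averages; the helper name `mean_deviation` is illustrative.

```python
import statistics

def mean_deviation(data, centre):
    """Mean of the absolute deviations of the data from a chosen average."""
    return sum(abs(x - centre) for x in data) / len(data)

visits = [8, 6, 5, 5, 7, 4, 5, 9, 7, 4]

mean = statistics.mean(visits)        # 6
median = statistics.median(visits)    # 5.5
mode = statistics.mode(visits)        # 5

md_mean = mean_deviation(visits, mean)
md_median = mean_deviation(visits, median)
md_mode = mean_deviation(visits, mode)
print(md_mean, md_median, md_mode)
```

That all three mean deviations coincide is a feature of this particular data set, not a general rule.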
Merits of 𝑴𝑫
It is well-defined, easy to compute and simple to understand.
It is based on all observations.
It is not greatly affected by the extreme items.
It can be calculated by using any average.
Demerit of 𝑴𝑫
It does not take in to account the signs of the deviations of items from the average.
Remark: Of all the mean deviations taken about different averages or any arbitrary value, the
mean deviation about the median has the smallest value.
Coefficient of mean deviation (CMD):
The relative measure of mean deviation, also called the coefficient of mean deviation is obtained
by dividing mean deviation by the particular average used in computing mean deviation. Thus,
CMD about the arithmetic mean is given by:
$CMD(\bar{X}) = \frac{MD(\bar{X})}{\bar{X}}$, where $MD(\bar{X})$ is the mean deviation calculated about the arithmetic mean.
Example 4.6: Calculate the coefficient of mean deviation about the mean, median and mode for
the data in Example 4.5 above.
Solution:
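$CMD(\bar{X}) = \dfrac{MD(\bar{X})}{\bar{X}} = \dfrac{1.4}{6} = 0.23$
$CMD(\tilde{X}) = \dfrac{MD(\tilde{X})}{\tilde{X}} = \dfrac{1.4}{5.5} = 0.25$
$CMD(\hat{X}) = \dfrac{MD(\hat{X})}{\hat{X}} = \dfrac{1.4}{5} = 0.28$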
4.3.4 The Variance, Standard Deviation and Coefficient of Variation
Variance and Standard Deviation
Like the mean deviation, the variance is based on all observations in a set of data. The variance,
however, is the average of the squared deviations from the mean. Recall that the sum of squared
deviations is a minimum only when taken from the mean. Squared deviations are also easier to
manipulate mathematically than absolute deviations. Thus, if we average the squared deviations
from the mean and take the square root of the result (to compensate for the fact that the deviations
were squared), we obtain the standard deviation. This overcomes the limitation of the mean deviation.
Population Variance (𝝈𝟐 )
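If we divide the sum of squared deviations from the mean by the number of values in the population,
we get the population variance. This variance is the "average squared deviation from the mean".
For ungrouped data (individual series)
$\sigma^2 = \dfrac{\sum_{i=1}^{N}(X_i - \mu)^2}{N} = \dfrac{1}{N}\left[\sum_{i=1}^{N} X_i^2 - N\mu^2\right]$, where $\mu$ is the population arithmetic mean and N is the total number of observations in the population.
For grouped data
$\sigma^2 = \dfrac{\sum f_i (X_i - \mu)^2}{N} = \dfrac{1}{N}\left[\sum f_i X_i^2 - N\mu^2\right]$, where $\mu$ is the population arithmetic mean, $X_i$ is the class mark of the ith class, $f_i$ is the frequency of the ith class and $N = \sum f_i$.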
Sample Variance (𝑺𝟐 )
One would expect the sample variance to simply be the population variance with the population
mean replaced by the sample mean. However, one of the major uses of a sample statistic is to
estimate the corresponding population parameter, and this formula has the problem that, on
average, the estimated value is not the same as the parameter (it tends to underestimate the
population variance). To offset this, the sum of the squares of the deviations is divided by one
less than the sample size.
For ungrouped data
$S^2 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} = \dfrac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right]$, where $\bar{x}$ is the sample arithmetic mean and n is the sample size.
If the values $x_i$ have frequencies $f_i$ (i = 1, 2, …, m), then the sample variance is given by:
$S^2 = \dfrac{\sum_{i=1}^{m} f_i (x_i - \bar{x})^2}{n-1} = \dfrac{1}{n-1}\left[\sum_{i=1}^{m} f_i x_i^2 - n\bar{x}^2\right]$
For continuous grouped data
$S^2 = \dfrac{\sum f_i (x_i - \bar{x})^2}{n-1} = \dfrac{1}{n-1}\left[\sum f_i x_i^2 - n\bar{x}^2\right]$, where $\bar{x}$ is the sample arithmetic mean, $x_i$ is the class mark of the ith class, $f_i$ is the frequency of the ith class and $n = \sum f_i$.
The Standard Deviation
There is a problem with variances. Recall that the deviations were squared. That means that the
units were also squared. To get the units back the same as the original data values, the square root
must be taken.
Population Standard Deviation ($\sigma$)
$\sigma = \sqrt{\sigma^2}$, where $\sigma^2$ is the population variance.
Sample Standard Deviation (S)
$S = \sqrt{S^2}$, where $S^2$ is the sample variance.
Example 4.7: Find the sample variance and standard deviation of:
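$x_i$   2   4   5   6   8
$f_i$   2   2   3   1   2
Solution: Here $n = \sum f_i = 10$, $\sum f_i x_i = 49$ so that $\bar{x} = \dfrac{49}{10} = 4.9$, and $\sum f_i x_i^2 = 279$. Hence
$S^2 = \dfrac{1}{n-1}\left[\sum f_i x_i^2 - n\bar{x}^2\right] = \dfrac{1}{9}\left[279 - 10\left(\dfrac{49}{10}\right)^2\right] = \dfrac{1}{9}(38.9) = 4.32$, and $S = \sqrt{4.32} = 2.08$.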
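The shortcut formula used in Example 4.7 can be sketched in Python as follows. The function name `sample_variance_freq` is a hypothetical helper introduced only for illustration.

```python
from math import sqrt

def sample_variance_freq(values, freqs):
    """Sample variance of a frequency table using the shortcut formula
    S^2 = [sum(f * x^2) - n * mean^2] / (n - 1)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    return (sum_fx2 - n * mean ** 2) / (n - 1)

x = [2, 4, 5, 6, 8]  # values from Example 4.7
f = [2, 2, 3, 1, 2]  # their frequencies

s2 = sample_variance_freq(x, f)
s = sqrt(s2)
print(round(s2, 2), round(s, 2))
```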
Example 4.8: Find the sample variance and standard deviation for the distribution:
Class   1-5   6-10   11-15   16-20
Freq.   4     1      2       3
Solution: In a continuous F.D., $x_i$ is the class mark representing the ith class.
C.I.     $x_i$   $f_i$   $f_i x_i$   $f_i x_i^2$
1-5      3       4       12          36
6-10     8       1       8           64
11-15    13      2       26          338
16-20    18      3       54          972
Total            10      100         1410
where $n = \sum f_i = 10$, $\bar{x} = \dfrac{\sum f_i x_i}{n} = \dfrac{100}{10} = 10$ and $\sum f_i x_i^2 = 1410$, so that
$S^2 = \dfrac{1}{n-1}\left[\sum f_i x_i^2 - n\bar{x}^2\right] = \dfrac{1}{9}\left[1410 - 10(10)^2\right] = \dfrac{410}{9} = 45.56$,
$S = \sqrt{45.56} = 6.75$.
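As a cross-check of Example 4.8, the sketch below derives the class marks from the class limits and applies the definitional form of the sample variance (weighted squared deviations divided by n − 1) instead of the shortcut formula. The variable names are illustrative only.

```python
from math import sqrt

# Class limits and frequencies from Example 4.8
classes = [(1, 5), (6, 10), (11, 15), (16, 20)]
freqs = [4, 1, 2, 3]

# Class mark = midpoint of the lower and upper class limits
marks = [(lower + upper) / 2 for lower, upper in classes]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, marks)) / n

# Definitional form: weighted squared deviations divided by n - 1
s2 = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / (n - 1)
s = sqrt(s2)
print(marks, mean, round(s2, 2), round(s, 2))
```

Both forms of the formula give the same variance; the definitional form is usually less sensitive to rounding when the mean is not a whole number.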
1. If a constant is added to (or subtracted from) all the values, the variance remains the same;
The sample variance is 2 = V xi . Now, subtract 50 from each value to get:
1. If each and every value is multiplied by a non-zero constant (k), the standard deviation is
2. Both the variance and the standard deviation give more weight to extreme values and less
to those which are near to the mean.
Coefficient of Variation
The standard deviation is an absolute measure of dispersion. The corresponding relative measure
is known as the coefficient of variation (CV).
Of course, the standard deviation is an absolute measure of dispersion that expresses the variation
in the same unit as the original data, but it cannot be the sole basis for comparing two distributions.
For instance, if we have a standard deviation of 10 and a mean of 5, the values vary by an amount
twice as large as the mean itself. If, on the other hand, we have a standard deviation of 10 and a
mean of 5000, the variation relative to the mean is insignificant. Therefore, we cannot judge the
dispersion of a set of data until we know the standard deviation, the mean, and how the standard
deviation compares with the mean.
Coefficient of variation is used in such problems where we want to compare the variability of two
or more different series. Coefficient of variation is the ratio of the standard deviation to the
arithmetic mean, usually expressed in percent.
$CV = \dfrac{\text{Standard deviation}}{\text{mean}} \times 100\%$
Compare the relative dispersions of the two departments’ scores using the appropriate way.
Solution:
Mathematics Department: $CV = \dfrac{S}{\bar{x}} \times 100\% = \dfrac{25}{85} \times 100\% = 29.41\%$
Chemistry Department: $CV = \dfrac{S}{\bar{x}} \times 100\% = \dfrac{12}{65} \times 100\% = 18.46\%$
Interpretation: Since the CV of Mathematics Department students is greater than that of
Chemistry Department students, we can say that there is more dispersion relative to the mean in
the distribution of Mathematics students’ scores compared with that of Chemistry students.
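The comparison above can be reproduced with a minimal Python sketch. It uses the standard deviations and means that appear in the solution (25 and 85 for Mathematics, 12 and 65 for Chemistry); the function name is an illustrative assumption.

```python
def coefficient_of_variation(std_dev, mean):
    """Coefficient of variation as a percentage: (S / mean) * 100."""
    return std_dev / mean * 100

cv_math = coefficient_of_variation(25, 85)
cv_chem = coefficient_of_variation(12, 65)

print(round(cv_math, 2), round(cv_chem, 2))
print("Mathematics scores are relatively more dispersed:", cv_math > cv_chem)
```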
4.4 Standard Scores (Z-Scores)
A standard score for a sample value in a data set is obtained by subtracting the mean of the data set
from the value and dividing the result by the standard deviation of the data set. Basically, the
standard score (z-score) tells us how many standard deviations a specific value is above or below
the mean value of the data set. That is, the z-score is the number of standard deviations the data
value falls above (positive z-score) or below (negative z-score) the mean for the data set.
Z-score computed from the population
$Z = \dfrac{X - \mu}{\sigma}$
Z-score computed from the sample
$Z = \dfrac{X - \bar{X}}{S}$
Example 4.11: What is the Z-score for the value of 14 in the following sample data set?
3 8 6 14 4 12 7 10
Solution:
$\bar{X} = 8$ and $S = 3.8173$; thus $Z = \dfrac{14 - 8}{3.8173} \approx 1.57$.
The data value of 14 is located 1.57 standard deviations above the mean of 8, because the z-score
is positive.
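Example 4.11 can be verified with Python's standard `statistics` module, whose `stdev` function computes the sample standard deviation (dividing by n − 1). This is only a sketch for checking the arithmetic.

```python
from statistics import mean, stdev

data = [3, 8, 6, 14, 4, 12, 7, 10]  # sample from Example 4.11

x_bar = mean(data)   # sample mean
s = stdev(data)      # sample standard deviation (n - 1 in the denominator)
z = (14 - x_bar) / s

print(x_bar, round(s, 4), round(z, 2))
```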
Example 4.12: Suppose that a student scored 66 in Statistics and 80 in Mathematics. A summary of
the class scores in the two courses is given below.
Course        Average score   Standard deviation of the score
Statistics    51              12
Mathematics   72              16
In which course did the student score better compared to his classmates?
Solution:
Z-score of the student in Statistics: $Z = \dfrac{X - \mu}{\sigma} = \dfrac{66 - 51}{12} = \dfrac{15}{12} = 1.25$
Z-score of the student in Mathematics: $Z = \dfrac{X - \mu}{\sigma} = \dfrac{80 - 72}{16} = \dfrac{8}{16} = 0.5$
From these two standard scores, we can conclude that the student scored better in Statistics
relative to his classmates than in Mathematics.
4.5 Moments, Skewness and Kurtosis
The measures of central tendency and variation discussed in the previous sections do not reveal the
entire story about a frequency distribution. Two distributions may have the same mean and standard
deviation but may differ in shape. A further description of their characteristics is therefore
necessary, and it is provided by measures of skewness and kurtosis.
4.5.1 Moments
Moments are statistical tools used in statistical investigation. The moments of a distribution are the
arithmetic mean of the various powers of the deviations of items from some number. In our course,
we shall use it in the study of Skewness and Kurtosis of statistical distribution.
Moments about the origin
$M_r = \dfrac{\sum X_i^r}{n}$, where r = 0, 1, 2, 3, …
Moments about the origin for a grouped frequency distribution and for an ungrouped frequency
distribution are given by
$M_r = \dfrac{\sum f_i X_i^r}{n}$
Where 𝑓𝑖 is the frequency of 𝑋𝑖 . 𝑋𝑖 is the midpoint in the case of grouped frequency distribution
or class value in the case of ungrouped frequency distribution.
Note that: 𝑀1 = 𝑋̅, 𝑀0 = 1
Moments about the Mean (Central Moments)
$M_r' = \dfrac{\sum (X_i - \bar{X})^r}{n}$
Moments about the mean for a grouped frequency distribution and for an ungrouped frequency
distribution are given by
$M_r' = \dfrac{\sum f_i (X_i - \bar{X})^r}{n}$
Where 𝑓𝑖 is the frequency of 𝑋𝑖 . 𝑋𝑖 is the midpoint in the case of grouped frequency distribution
or class value in the case of ungrouped frequency distribution.
Example 4.13: Find the first four moments about the mean for the following individual series
𝑋𝑖 : 3 6 8 10 18
Solution: n = 5 and $\bar{X} = \dfrac{45}{5} = 9$.
S.No    $X_i$   $(X_i - \bar{X})$   $(X_i - \bar{X})^2$   $(X_i - \bar{X})^3$   $(X_i - \bar{X})^4$
1       3       -6                  36                    -216                  1296
2       6       -3                  9                     -27                   81
3       8       -1                  1                     -1                    1
4       10      1                   1                     1                     1
5       18      9                   81                    729                   6561
Total   45      0                   128                   486                   7940
Thus,
$M_1' = \dfrac{\sum (X_i - 9)}{5} = 0$, $M_2' = \dfrac{\sum (X_i - 9)^2}{5} = \dfrac{128}{5} = 25.6$, $M_3' = \dfrac{\sum (X_i - 9)^3}{5} = \dfrac{486}{5} = 97.2$, $M_4' = \dfrac{\sum (X_i - 9)^4}{5} = \dfrac{7940}{5} = 1588$
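The central moments of Example 4.13 can be reproduced with the sketch below. The function `central_moment` is a hypothetical helper that divides by n, following the definition of moments about the mean given in this section.

```python
def central_moment(data, r):
    """r-th moment about the mean, dividing by n as in the definition above."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** r for x in data) / n

series = [3, 6, 8, 10, 18]  # individual series from Example 4.13

moments = [central_moment(series, r) for r in range(1, 5)]
print(moments)
```

The first central moment is always zero, because the deviations from the mean sum to zero.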
4.5.2 Skewness
Skewness refers to lack of symmetry (or departure from symmetry) in a distribution.
A skewed frequency distribution is one that is not symmetrical.
Skewness is concerned with the shape of the curve not size.
A distribution is said to be symmetrical when the values are evenly distributed around the mean
(the distribution of the data below the mean mirrors the distribution above the mean). In a
symmetrical distribution, the mean, median and mode coincide (i.e., mean = median = mode).
Positively skewed distribution: if the value of mean is greater than the mode, skewness is said to
be positive. In a positively skewed distribution mean is greater than the mode and the median lies
somewhere in between mean and mode. A positively skewed distribution contains some values
that are much larger than the majority of other observations.
Negatively Skewed distribution: if the value of mode is greater than the mean, skewness is said
to be negative. In a negatively skewed distribution mode is greater than the mean and the median
lies in between mean and mode. The mean is pulled towards the low-valued item (that is, to the
left). A negatively skewed distribution contains some values that are much smaller than the
majority of observations.
Note that: In moderately skewed distributions the averages have the following
relationship.
(Mean – mode) = 3(mean - median)
Skewness is present in the data if:
i) The graph is not symmetrical.
ii) The mean, median and mode do not coincide.
iii) The sum of positive and negative deviations from the median is not zero.
iv) The frequencies are not similarly distributed on either side of the mode.
Measures of skewness (𝜶𝟑 )
A measure of skewness gives a numerical expression for the extent and the direction of asymmetry in a
distribution. It gives information about the shape of the distribution and the degree of variation on
either side of the central value. The three most commonly used measures of skewness are
Pearson’s coefficient of skewness, Bowley’s coefficient of skewness and coefficient of skewness
based on moments.
1. Pearson’s coefficient of skewness (Pearsonian coefficient of skewness)
The skewness of the distribution can be measured by Pearson’s coefficient of skewness
($\alpha_3$), for which the formula is given below:
$\alpha_3 = \dfrac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$
α3 < 0, the distribution is negatively skewed/skewed to the left. i.e., mean < median < mode
smaller observations are less frequent than larger observations. i.e., the majority of
the observations have a value above an average.
4.5.3 Kurtosis
Kurtosis is a measure of the peakedness of a distribution. The degree of kurtosis of a distribution is measured
relative to the peakedness of a normal curve. If a curve is more peaked than the normal curve it is called
‘leptokurtic’; if it is flatter than the normal curve it is called ‘platykurtic’ or flat-topped. The
normal curve itself is known as ‘mesokurtic’.
Solution:
a/ $\alpha_3 = \dfrac{M_3'}{(M_2')^{3/2}} = \dfrac{-2.4}{1.6^{3/2}} = -1.19 < 0$; the distribution is negatively skewed.
b/ $\alpha_4 = \dfrac{M_4'}{(M_2')^2} = \dfrac{5.8}{1.6^2} \approx 2.27 < 3$; the curve is platykurtic.
Example 4.14: Find the coefficient of skewness and the coefficient of kurtosis for the
above example 4.13.
Solution:
i) $\alpha_3 = \dfrac{M_3'}{(M_2')^{3/2}} = \dfrac{97.2}{(25.6)^{3/2}} = \dfrac{97.2}{129.527} = 0.75$
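The moment-based coefficients can be computed directly from the data of Example 4.13, as in the sketch below. The helper name `moment_coefficients` is an assumption made for illustration, and the value of $\alpha_4$ it returns is obtained from the same definition $\alpha_4 = M_4' / (M_2')^2$ used earlier in this section rather than quoted from the text.

```python
def moment_coefficients(data):
    """Return (alpha3, alpha4): moment-based skewness and kurtosis."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    alpha3 = m3 / m2 ** 1.5   # skewness: M3' / (M2')^(3/2)
    alpha4 = m4 / m2 ** 2     # kurtosis: M4' / (M2')^2
    return alpha3, alpha4

alpha3, alpha4 = moment_coefficients([3, 6, 8, 10, 18])
print(round(alpha3, 2), round(alpha4, 2))
```

A positive $\alpha_3$ indicates positive skewness, and an $\alpha_4$ below 3 indicates a curve flatter than the normal curve.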
2. An analysis of the monthly wages paid (in birr) to workers in two firms A and B belonging to
the same industry gives the following results.
Value Firm A Firm B
Mean wage 52.5 47.5
Variance 100 121
In which firm, A or B, is there greater variability in individual wages?
3. A meteorologist interested in the consistency of temperatures in three cities during a given
week collected the following data. The temperatures for the five days of the week in the three
cities were
City 1: 25, 24, 23, 26, 17
City 2: 22, 21, 24, 22, 20
City 3: 32, 27, 35, 24, 28
Which city has the most consistent temperature, based on these data?
4. Some characteristics of the annual family income distribution (in Birr) in two regions are as
follows:
Region Mean Median Standard deviation
A 6250 5100 960
B 6980 5500 940
CHAPTER FIVE
5 Elementary Probability
Walter Bagehot
The notion that chance, or probability, can be treated numerically is relatively recent. Indeed, for
most of recorded history it was felt that what occurred in life was determined by forces that were
beyond one’s ability to understand. It was only during the first half of the 17th century, near the
end of the Renaissance, that people became curious about the world and the laws governing its
operation. Among the curious were the gamblers.
A cynical person once said, “The only two sure things are death and taxes.” This philosophy no
doubt arose because so much in people’s lives is affected by chance.
5.1 Introduction
Probability as a general concept can be defined as the chance of an event occurring. Most people
are familiar with probability from observing or playing games of chance, such as card games or
lotteries. Probability is the basis of inferential statistics.
The basic concepts of probability are explained in this chapter. These concepts
include probability experiments, sample spaces, the addition and multiplication rules,
and the probabilities of complementary events. Also in this chapter, you will learn the rule
for counting, the differences between permutations and combinations, and how to figure
out how many different combinations for specific situations exist. Section 4–5
explains how the counting rules and the probability rules can be used together to solve a
wide variety of problems. Finally in section six, the concept of probability is extended to
conditional probability and independence.
At the end of this chapter students are expected to:
Know what is meant by sample space, event, relative frequency, probability,
conditional probability, independence.
Mutually exclusive events: Suppose you have two events, say A and B. If these events
have no common sample point(s), that is, they cannot occur simultaneously, then the two
events are called mutually exclusive events. Example 5.4: Consider the experiment of tossing
a die. Let A be the event of odd numbers and B be the event of even numbers, A = {1, 3, 5},
B = {2, 4, 6}; then A and B are mutually exclusive events.
Exhaustive events: It is a situation where the events together contain all the elements of the
sample space. For example, S = {Head, Tail} is exhaustive for the coin-tossing
experiment.
Union of events: The union of two events A and B, denoted by A ∪ B, consists of all
outcomes that are in A or in B or in both A and B. Example 5.5: Let A = {1, 3, 5},
B = {2, 4, 5, 6}; then A ∪ B = {1, 2, 3, 4, 5, 6}.
Intersection of events: The intersection of events A and B, denoted by A ∩ B, consists of all
outcomes that are in both A and B. Example 5.6: Let A = {1, 3, 5}, B = {2, 4, 5, 6}; then
A ∩ B = {5}.
Complement of an event: The complement of event A, denoted by $A^c$ or $A'$, consists of all
outcomes that are not in A. Example 5.7: Let the sample space be S = {1, 2, 3, 4, 5, 6} and the
event A = {1, 3, 5}; then $A^c$ = {2, 4, 6}.
Null event: The event containing no outcomes. It is the complement of the sample space.
Probability of an event: The probability of event A, denoted by P(A), is the probability
that the outcome of the experiment is contained in A.
Equally-likely events: It is a situation where the occurrence of one event is as likely as that
of the other. That is, they must have equal probability of occurrence.
Example 5.8: In Example 5.1, the outcomes 1, 2, 3, 4, 5 and 6 are equally likely.
Independent events: Two events are said to be independent if knowing whether a specific one
has occurred does not change the probability that the other occurs. (An example is explained
in Section 5.6.)
Example 5.9: If there are two bus routes and three railway routes from Addis Ababa to Debre Berhan,
then altogether we have 2 + 3 = 5 different ways to reach Debre Berhan.
Example 5.10: How many different 7-place license plates are possible if the first 3 places are to
be occupied by letters and the final 4 by numbers?
Example 5.11: In the above example, how many license plates would be possible if repetition
among letters or numbers were prohibited?
Solution: In this case there would be 26 · 25 · 24 · 10 · 9 · 8 · 7 = 78,624,000 possible plates.
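The two license-plate counts can be checked with the multiplication principle in Python. The sketch below assumes 26 letters and 10 digits; the count with repetition allowed is what the multiplication principle gives for Example 5.10 and is not quoted from the text.

```python
from math import perm

# Example 5.10: repetition allowed, 3 letter places then 4 digit places
with_repetition = 26 ** 3 * 10 ** 4

# Example 5.11: repetition among letters or among digits prohibited
without_repetition = perm(26, 3) * perm(10, 4)

print(with_repetition, without_repetition)
```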
5.3.3 Permutation Rule
How many different ordered arrangements of letters 𝑎, 𝑏, 𝑐 are possible? By direct enumeration
we see that there are 6: namely, 𝑎𝑏𝑐, 𝑎𝑐𝑏, 𝑏𝑎𝑐, 𝑏𝑐𝑎, 𝑐𝑎𝑏 and 𝑐𝑏𝑎. Each arrangement is known as a
permutation. That is, a permutation is an arrangement of 𝑛 objects in a specific order. Thus, there
are six possible permutations of a set of 3 objects. This result could also have been obtained from
the basic principle, since the first object in the permutation can be any of the 3, the second object
in the permutation can then be chosen from the remaining 2, and the third object in the
permutation is then the remaining one. Thus there are 3 · 2 · 1 = 6 possible permutations.
Permutation Rule 1: Suppose now that we have 𝑛 objects. Reasoning similar to that we have
just used for the 3 letters shows that there are
$n(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1 = n!$
different permutations of the 𝑛 objects.
Example 5.12: A class of stat 173 consists of 6 men and 4 women. An examination is given, and
the students are ranked according to their performance. Assume that no two students obtain the
same score.
A. How many different rankings are possible?
B. If the men are ranked just among themselves and women among themselves, how many
different rankings are possible?
Solution:
A. As each ranking corresponds to a particular ordered arrangement of the 10 people, we see
that the answer to this part is 10! = 3,628,800.
B. As there are 6! possible rankings of the men among themselves and 4! possible rankings
of the women among themselves, and the two groups can themselves be arranged in 2! ways,
it follows from the basic principle that we have a total of 6! · 4! · 2! = 34,560 possible rankings.
Permutation Rule 2: We shall now determine the number of permutations of a set of 𝑛 objects
when certain of the objects are indistinguishable from each other. There are
$\dfrac{n!}{n_1!\, n_2! \cdots n_r!}$
different permutations of 𝑛 objects, of which $n_1$ are alike, $n_2$ are alike, …, $n_r$ are alike.
Example 5.13: How many different letter arrangements can be formed using the letters of the word PEPPER?
Solution: $\dfrac{6!}{3!\, 2!\, 1!} = 60$ possible letter arrangements.
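Permutation Rule 2 can be checked against direct enumeration for the word PEPPER. In the sketch below, `distinct_arrangements` is a hypothetical helper that applies the formula, while the brute-force count lists all 6! orderings and keeps only the distinct ones.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def distinct_arrangements(word):
    """n! divided by the factorial of each letter's multiplicity."""
    count = factorial(len(word))
    for multiplicity in Counter(word).values():
        count //= factorial(multiplicity)
    return count

formula = distinct_arrangements("PEPPER")
brute_force = len(set(permutations("PEPPER")))

print(formula, brute_force)
```

Enumeration is only practical for short words, since the number of orderings grows as n!.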
Permutation Rule 3: Generally, if we are asked to arrange 𝑟 objects chosen from 𝑛 objects, then
the total number of arrangements is
$_nP_r = \dfrac{n!}{(n-r)!}$
Example 5.14: Suppose a businessman has a choice of five locations in which to establish his
business. He wishes to rank only the top three locations. How many different ways can he
rank them?
Solution:
$_5P_3 = \dfrac{5!}{(5-3)!} = 60$ ways
5.3.4 Combination Rule
We are often interested in determining the number of different groups of 𝑟 objects that could be
formed from a total of 𝑛 objects. A selection of objects without regard to order is called a
combination. That is, combinations are used when the order or arrangement is not important. The
number of combinations of 𝑟 objects selected from 𝑛 objects is denoted by $_nC_r$ and is given by the
formula $_nC_r = \dfrac{n!}{r!\,(n-r)!} = \dbinom{n}{r}$
Example 5.15: From a group of 5 women and 7 men, how many different committees consisting
of 2 women and 3 men can be formed? What if 2 of the men are feuding and refuse to serve on
the committee together?
Solution: As there are $\binom{5}{2}$ possible groups of 2 women and $\binom{7}{3}$ possible groups of 3 men, it
follows from the basic principle that there are
$\dbinom{5}{2}\dbinom{7}{3} = \left(\dfrac{5 \cdot 4}{2 \cdot 1}\right)\left(\dfrac{7 \cdot 6 \cdot 5}{3 \cdot 2 \cdot 1}\right) = 350$
possible committees consisting of 2 women and 3 men. On the other hand, if 2 of the men refuse
to serve on the committee together, then, as there are $\binom{2}{0}\binom{5}{3}$ possible groups of 3 men
containing neither of the 2 feuding men and $\binom{2}{1}\binom{5}{2}$ groups of 3 men containing exactly 1 of the
feuding men, it follows that there are $\binom{2}{0}\binom{5}{3} + \binom{2}{1}\binom{5}{2} = 30$ groups of 3 men not containing
both of the feuding men. Since there are $\binom{5}{2}$ ways to choose the 2 women, it follows that in this
case there are $30\binom{5}{2} = 300$ possible committees.
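The committee counts in Example 5.15 can be confirmed both with the combination formula and by enumerating the groups of men. In this sketch the men are labelled 0 to 6, and men 0 and 1 are assumed (purely for illustration) to be the feuding pair.

```python
from itertools import combinations
from math import comb

# Unrestricted case: choose 2 of 5 women and 3 of 7 men
unrestricted = comb(5, 2) * comb(7, 3)

# Restricted case: enumerate groups of 3 men and drop those containing both
# feuding men (labelled 0 and 1 here as an illustrative assumption)
men = range(7)
valid_male_groups = [g for g in combinations(men, 3) if not {0, 1} <= set(g)]
restricted = comb(5, 2) * len(valid_male_groups)

print(unrestricted, len(valid_male_groups), restricted)
```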
5.4 Approaches in Probability Definition
The probability of an event is denoted by 𝑃(. ) where 𝑃 stands for probability and the dot stands
for any event, say A, B, G etc.
Generally approaches to probability can be divided into two, namely subjective approach and
objective approach.
5.4.1 Subjective approach:
A probability derived from an individual's personal judgment about whether a specific outcome is
likely to occur. Subjective probabilities contain no formal calculations and only reflect the subject's
opinions and past experience.
Subjective probabilities differ from person to person. Because the probability is subjective, it
contains a high degree of personal bias. An example of subjective probability could be asking an
Arsenal fan, before the football season starts, about the chances of Arsenal winning the
championship. While there is no absolute mathematical proof behind the answer to the example, fans
might still reply in actual percentage terms, such as Arsenal having a 95% chance of winning
the championship.
5.4.2 Objective approach:
The probability of an event in a certain experiment is based on experimental evidence or a random
process. In this approach to the study of probability theory there are three sub-approaches.
These are
The classical approach
The frequentist approach and
The axiomatic approach
5.4.3.1 The Classical Approach
If a procedure has 𝑛 different simple events, each with an equal chance of occurring, and event A
can occur in 𝑠 of these ways, then
$P(A) = \dfrac{s}{n} = \dfrac{n(A)}{n(S)} = \dfrac{\text{number of elements in } A}{\text{number of elements in the sample space}}$
Assumptions in classical approach
The outcomes must be equally-likely
The experiment should never be repeated more than once
The sample space should be finite
Example 5.16: Toss a fair coin once and find the probability of the occurrence of a head.
Solution: Since the sample space is finite (i.e., either head or tail) and the outcomes are
equally likely,
$P(\text{Head}) = \dfrac{n(\text{head})}{n(\text{sample space})} = \dfrac{1}{2} = 0.5$
Example 5.17: For a card drawn from an ordinary deck, find the probability of getting a queen.
Solution: Since there are 4 queens and 52 cards, $P(\text{queen}) = \dfrac{4}{52} = \dfrac{1}{13}$
If any one of the assumptions stated above is violated, the classical approach is no longer valid.
5.4.3.2 Frequentist (empirical) Approach
If after 𝑛 repetitions of an experiment, where 𝑛 is very large, an event is observed to occur in ℎ of
these, then the probability of the event is $\dfrac{h}{n}$. In other words, conduct an experiment a large
number of times and count the number of times event A actually occurs; the relative frequency of A
is then an estimate of 𝑃(𝐴).
Example 5.18: Suppose a coin was tossed 1000 times and the result was 587 tails. The relative
frequency of tails is $\dfrac{587}{1000}$. Another 1000 tosses lead to 511 tails. Then the relative frequency of tails
is $\dfrac{587 + 511}{1000 + 1000} = \dfrac{1098}{2000}$. Proceeding in this manner we obtain a sequence of numbers, which gets
closer and closer to the number defined as the probability of a tail in a single toss.
Therefore,
$P(A) = \lim_{n \to \infty} \dfrac{n(A)}{n}$
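The frequentist idea can be illustrated by simulation. The sketch below tosses a simulated fair coin using Python's pseudo-random generator with a fixed seed (an arbitrary choice that only makes the run reproducible) and reports the relative frequency of tails for increasing numbers of tosses. The counts in Example 5.18 are not reproduced; only the tendency of the relative frequency to settle near 0.5 is shown.

```python
import random

def relative_frequency_of_tails(num_tosses, seed=0):
    """Simulate `num_tosses` fair coin tosses and return tails / tosses."""
    rng = random.Random(seed)
    tails = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return tails / num_tosses

for n in (100, 10_000, 100_000):
    print(n, relative_frequency_of_tails(n))
```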
5.5 Some Probability Rules
Rule 1: If $A_1 \subset A_2$, then $P(A_1) \le P(A_2)$
$P(E \cup F) = P(E) + P(F) - P(E \cap F) = \dfrac{1}{2} + \dfrac{1}{2} - \dfrac{1}{4} = \dfrac{3}{4}$
5.6 Conditional Probability and Independence
Let A and B be two events such that $P(A) > 0$. Denote by $P(B \mid A)$ the probability of B given that A
has occurred. Since A is known to have occurred, it becomes the new sample space, replacing the original S.
From this we are led to the definition
$P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}$, $P(A) > 0$, or
$P(A \cap B) = P(A)\, P(B \mid A)$
In words, this says that the probability that both A and B occur is equal to the probability that
A occurs times the probability that B occurs given that A has occurred. We call $P(B \mid A)$ the
conditional probability of B given A, i.e. the probability that B will occur given that A has occurred.
Example 5.12: A jar contains black and white marbles. Two marbles are chosen without
replacement. The probability of selecting a black marble and then a white marble
is 0.34, and the probability of selecting a black marble on the first draw is 0.47.
What is the probability of selecting white marble on the second draw, given that
the first marble drawn was black?
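Applying the definition of conditional probability to the marble example gives a one-line calculation, sketched below. The function name `conditional_probability` is an illustrative assumption; A denotes "first marble is black" and B denotes "second marble is white".

```python
def conditional_probability(p_a_and_b, p_a):
    """P(B | A) = P(A and B) / P(A), defined only when P(A) > 0."""
    if p_a <= 0:
        raise ValueError("P(A) must be positive")
    return p_a_and_b / p_a

# P(black then white) = 0.34 and P(black on first draw) = 0.47
p_white_given_black = conditional_probability(0.34, 0.47)
print(round(p_white_given_black, 2))
```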
$P(A \cap B) = P(\{(3,5)\}) = \dfrac{1}{36}$
On the other hand
Therefore, since $\dfrac{1}{36} \ne \left(\dfrac{6}{36}\right)\left(\dfrac{5}{36}\right)$, we see that $P(A \cap B) \ne P(A)P(B)$ and so events A
and B are not independent.
B. Events A and C are independent. This is seen by noting that
$P(A \cap C) = P(\{(3,4)\}) = \dfrac{1}{36}$
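Independence of events for two dice can be checked by enumerating all 36 outcomes. The definitions of the events are not shown in this part of the text, so the sketch below assumes, consistently with the outcomes (3,5) and (3,4) and the probabilities 6/36 and 5/36 quoted above, that A is "the first die shows 3", B is "the sum is 8" and C is "the sum is 7". These definitions are assumptions, not statements from the module.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes

def prob(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def independent(e1, e2):
    """Events are independent when P(E1 and E2) = P(E1) * P(E2)."""
    return prob(lambda o: e1(o) and e2(o)) == prob(e1) * prob(e2)

# Assumed event definitions (see the lead-in above)
A = lambda o: o[0] == 3    # first die shows 3
B = lambda o: sum(o) == 8  # sum of the dice is 8
C = lambda o: sum(o) == 7  # sum of the dice is 7

print(independent(A, B), independent(A, C))
```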
CHAPTER SIX
6 Probability Distribution
Before a probability distribution is defined formally, the definition of a variable is reviewed. In the
first chapter, a variable was defined as a characteristic or attribute that can assume different values.
Various letters of the alphabet are used to represent variables.
At the end of this chapter students are expected to:
Know what meant by random variable, probability distribution, probability density
function, expected value and variance;
Be familiar with some standard discrete and continuous probability distributions;
Be able to use standard statistical tables for Normal, t, Chi-square distributions.
the r.v. X at the sample point s is X(s), and the set of all values of X, that is, the range of X, is
usually denoted by X(S) or RX.
The difference between a r.v. and a function is that the domain of a r.v. is a sample space S, unlike
the usual concept of a function, whose domain is a subset of the real line or of a Euclidean space of
higher dimension. The use of the term “random variable” here, rather than that of a function,
may be explained by the fact that a r.v. is associated with the outcomes of a random experiment.
Of course, on the same sample space, one may define many distinct r.v.s.
Example 6.1: Suppose we are about to learn the sexes of the three children of a certain family.
The sample space of this experiment consists of the following 8 outcomes.
𝑆 = {(𝑏, 𝑏, 𝑏), (𝑏, 𝑏, 𝑔), (𝑏, 𝑔, 𝑏), (𝑏, 𝑔, 𝑔), (𝑔, 𝑏, 𝑏), (𝑔, 𝑏, 𝑔), (𝑔, 𝑔, 𝑏), (𝑔, 𝑔, 𝑔)}
The outcome (𝑔, 𝑏, 𝑏) means, for instance, that the youngest child is a girl, the next youngest is a
boy, and the oldest is a boy. Suppose that each of these 8 possible outcomes is equally likely, and
so each has probability 1/8. If we let X denote the number of female children in this family, then
the value of X is determined by the outcomes of the experiment. That is, X is a random variable
whose value will be 0, 1, 2 𝑜𝑟 3. i.e.
𝑋(𝑏𝑏𝑏) = 0, 𝑋(𝑔𝑏𝑏) = 𝑋(𝑏𝑔𝑏) = 𝑋(𝑏𝑏𝑔) = 1,
𝑋(𝑔𝑔𝑏) = 𝑋(𝑔𝑏𝑔) = 𝑋(𝑏𝑔𝑔) = 2, 𝑋(𝑔𝑔𝑔) = 3
Example 6.2: Recording the lifetime of an electronic device, or of an electrical appliance. Here S
is the interval (0, T) or for some justifiable reasons, S = (0, ∞), a r.v. X of interest is X(s) = s, s ∈
S.
Example 6.3: Measuring the dosage of a certain medication administered to a patient, until a
positive reaction is observed. Here S = (0, D) for some suitable D.
In the examples discussed above we have seen r.v.s with different kinds of values. Hence, random
variables can be categorized into two broad categories: discrete and continuous random variables.
6.1.1 Discrete Random Variable and Probability Distribution (pmf)
Definition 6.2: A random variable X is called discrete (or of the discrete type) if X takes on a finite
or countably infinite number of values; that is, either finitely many values such as x1, . . . , xn, or
countably infinitely many values such as x0, x1, x2, . . . .
Alternatively, we can describe a discrete random variable as one that:
Takes whole numbers (like 0, 1, 2, 3 etc.)
Takes a finite or countably infinite number of values
Jumps from one value to the next and cannot take any values in between.
Example 6.4:
Experiment                                        Random Variable (X)        Variable values
Children of one gender in a family                Number of girls            0, 1, 2, …
Count cars at toll between 11:00 am & 1:00 pm     Number of cars arriving    0, 1, 2, ..., n
Definition: If X is a discrete random variable, the function given by f(x) = P(X = x)
for each 𝑥 within the range of X is said to be the probability distribution or probability mass function
of X if it satisfies the following two conditions:
1. The sum of the probabilities of all the events in the sample space must equal 1; that is,
∑ 𝑝(𝑥) = 1
2. The probability of each event in the sample space must lie between 0 and 1 inclusive; that
is, 0 ≤ 𝑝(𝑥) ≤ 1.
Example 6.5: Consider r.v. X in Example 6.1 and construct probability distribution of X.
Solution: Since X will equal 0 if the outcome is (𝑏, 𝑏, 𝑏), we see that
$P(X = 0) = P(bbb) = \dfrac{1}{8}$
Since X will equal 1 if the outcome is (𝑔𝑏𝑏), (𝑏𝑔𝑏) or (𝑏𝑏𝑔), we have
$P(X = 1) = P\{(gbb) \text{ or } (bgb) \text{ or } (bbg)\} = \dfrac{3}{8}$
Similarly, $P(X = 2) = P\{(bgg) \text{ or } (gbg) \text{ or } (ggb)\} = \dfrac{3}{8}$ and $P(X = 3) = P(ggg) = \dfrac{1}{8}$
Therefore,
$f(x) = \begin{cases} \dfrac{1}{8} & \text{if } x = 0, 3 \\ \dfrac{3}{8} & \text{if } x = 1, 2 \\ 0 & \text{otherwise} \end{cases}$
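The probability distribution in Example 6.5 can be obtained by listing the eight equally likely outcomes and counting girls in each, as in the sketch below (variable names are illustrative).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes for three children: 'b' = boy, 'g' = girl
outcomes = list(product("bg", repeat=3))

# X = number of girls in each outcome
counts = Counter(outcome.count("g") for outcome in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
```

The probabilities are non-negative and sum to 1, so both conditions of a probability mass function are satisfied.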
Example 6.6: Suppose we toss a coin three times. The sample space is
S = {TTT, TTH, THT, HTT, HHT, HTH, THH, HHH}. Let the random variable X be the
number of heads.
A. Assign values to the random variable.
B. Find the probability distribution of X.
Solution:
A. Once the random variable X is defined as the number of heads, X = 0, 1, 2 or 3.
B.
Number of heads (X = x)    0     1     2     3
Probability p(X = x)       1/8   3/8   3/8   1/8
We can check that $\sum p(X = x) = \dfrac{1}{8} + \dfrac{3}{8} + \dfrac{3}{8} + \dfrac{1}{8} = 1$
Example 6.7: Suppose that X is a random variable that takes on one of the value 0,1, 2, or3 . If
Example 6.8: A saleswoman has scheduled two appointments to sell encyclopedias. She feels
her first appointment will lead to a sale with probability 0.3. She also feels that
the second will lead to a sale with probability 0.6 and that the results from the two
appointments are independent. What is the probability distribution of X, the
number of sales made?
Solution: The random variable X can take on any of the values 0, 1, 2. It will equal 0 if neither
appointment leads to a sale, and so
$P\{X = 0\} = P\{\text{no sale on first, no sale on second}\}$
$= P\{\text{no sale on first}\}\, P\{\text{no sale on second}\}$
$= (1 - 0.3)(1 - 0.6)$
$= 0.28$
The random variable X will equal 1 either if there is a sale on the first and not on the second
appointment or if there is no sale on the first and a sale on the second appointment. Since these
two events are disjoint, we have
$P\{X = 1\} = P\{\text{sale on first, no sale on second}\} + P\{\text{no sale on first, sale on second}\}$
$= P\{\text{sale on first}\}\, P\{\text{no sale on second}\} + P\{\text{no sale on first}\}\, P\{\text{sale on second}\}$
$= 0.3(1 - 0.6) + 0.6(1 - 0.3)$
$= 0.54$
Finally, the random variable X will equal 2 if both appointments result in sales; thus
$P\{X = 2\} = P\{\text{sale on first, sale on second}\}$
$= P\{\text{sale on first}\}\, P\{\text{sale on second}\}$
$= 0.3 \times 0.6$
$= 0.18$
As a check on this result, we note that
$P\{X = 0\} + P\{X = 1\} + P\{X = 2\} = 0.28 + 0.54 + 0.18 = 1$
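The distribution in Example 6.8 can be rebuilt by enumerating the four combinations of sale and no sale and multiplying the probabilities, which is valid because the appointments are assumed independent. The variable names below are illustrative.

```python
from itertools import product

p_sale = [0.3, 0.6]  # probability of a sale at the first and second appointment

pmf = {0: 0.0, 1: 0.0, 2: 0.0}
for result in product([0, 1], repeat=2):  # 1 = sale, 0 = no sale
    prob = 1.0
    for sold, p in zip(result, p_sale):
        prob *= p if sold else 1 - p      # independence: multiply probabilities
    pmf[sum(result)] += prob              # X = total number of sales

print({x: round(p, 2) for x, p in pmf.items()})
```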
Exercise 6.1: Check whether the function given by $f(x) = \dfrac{x + 2}{25}$ for x = 1, 2, 3, 4, 5 is a p.m.f.
Where f(t) is the value of probability distribution or p.m.f of X at t, is called the distribution
function, or the cumulative distribution function of X.
If X takes on only a finite number of values x1, x2, . . . , xn, then the distribution function is given
by
Example 6.9:
Find the distribution function F of the total number of heads obtained in four tosses of a balanced
coin?
The distribution function, or the cumulative distribution function F(X) will be the following;
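\[
F(x) =
\begin{cases}
0 & \text{for } x < 0 \\
\frac{1}{16} & \text{for } 0 \le x < 1 \\
\frac{5}{16} & \text{for } 1 \le x < 2 \\
\frac{11}{16} & \text{for } 2 \le x < 3 \\
\frac{15}{16} & \text{for } 3 \le x < 4 \\
1 & \text{for } x \ge 4
\end{cases}
\]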
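The steps of Example 6.9 can be checked with exact fractions: first build the p.m.f. of the number of heads, then accumulate it. This is a sketch only; `heads_pmf` and `cdf` are hypothetical helper names introduced for illustration.

```python
from fractions import Fraction
from math import comb

def heads_pmf(n):
    """p.m.f. of the number of heads in n tosses of a balanced coin."""
    return {k: Fraction(comb(n, k), 2 ** n) for k in range(n + 1)}

def cdf(pmf, x):
    """F(x) = P(X <= x): add the p.m.f. over all values not exceeding x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

pmf = heads_pmf(4)
for x in [-1, 0, 1, 2, 3, 4]:
    print(f"F({x}) = {cdf(pmf, x)}")
```

Because F is a step function, evaluating it at a non-integer point such as 2.5 gives the same value as at 2.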
Exercise 6.2: A telephone survey of households throughout Washington State is given below:
The following are examples of continuous random variables:
Experiment Random Variable X Variable values
Weigh 100 People Weight 45.1, 78, ...
Definition 6.4: A function with values f(x), defined over the set of all real numbers, is called a
probability density function of the continuous random variable X if and only if
\[
P(a \le X \le b) = \int_a^b f(x)\,dx
\]
for any real constants a and b with a ≤ b.
Probability density functions are also referred to as probability densities (p.d.f.), probability functions, or
simply densities.
Remark:
The probability density function f (x) of the continuous random variable X, has the following
properties (satisfy the conditions)
1. f(x) ≥ 0 for all x, or for −∞ < x < ∞
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$
If X is a continuous random variable and a and b are real constants with a ≤ b, then
P (a ≤ x ≤ b) = P (a < x ≤ b) = P (a ≤ x < b) = P (a < x < b)
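Example 6.11: If X has the probability density
\[
f(x) =
\begin{cases}
k\,e^{-3x} & \text{for } x > 0 \\
0 & \text{elsewhere}
\end{cases}
\]
And
\[
P(0.5 \le X \le 1) = \int_{0.5}^{1} 3e^{-3x}\,dx = \left[-e^{-3x}\right]_{0.5}^{1} = -e^{-3} + e^{-1.5} = \frac{1}{e^{1.5}} - \frac{1}{e^{3}} \approx 0.1733
\]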
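As a numerical check of Example 6.11, the sketch below integrates the density with a simple midpoint rule. It assumes k = 3, which is the constant appearing in the integrand of the worked probability; the helper `integrate`, the step count, and the upper cutoff of 20 used in place of infinity are illustrative assumptions.

```python
import math

def f(x, k=3.0):
    """Density k * exp(-3x) for x > 0 and 0 elsewhere (k = 3 assumed)."""
    return k * math.exp(-3.0 * x) if x > 0 else 0.0

def integrate(func, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    width = (b - a) / steps
    return sum(func(a + (i + 0.5) * width) for i in range(steps)) * width

# The density is negligible beyond x = 20, so [0, 20] stands in for (0, infinity).
total_area = integrate(f, 0.0, 20.0)
prob = integrate(f, 0.5, 1.0)
exact = math.exp(-1.5) - math.exp(-3.0)

print(f"total area  ~ {total_area:.6f}")
print(f"P(0.5<=X<=1) ~ {prob:.4f} (closed form {exact:.4f})")
```

A total area close to 1 confirms that k = 3 satisfies the second property of a density listed in the Remark.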
Exercise 6.3:
The p.d.f. of the random variable X is given by
\[
f(x) =
\begin{cases}
\frac{C}{\sqrt{x}} & \text{for } 0 < x < 4 \\
0 & \text{elsewhere}
\end{cases}
\]
Find a. the value of C
b. $P(X < \frac{1}{4})$ and $P(X > 1)$
Definition: If X is a continuous random variable and the value of its probability density at t is f(t),
then the function given by
\[
F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt
\]
is called the distribution function, or the cumulative distribution function, of X.
Exercise 6.5:
A r.v. X has d.f. F given by:
Definition: If X is a discrete random variable that takes on one of the possible values $x_1, x_2, \dots, x_n$,
then the expected value of X is
\[
E(X) = \sum_{i=1}^{n} x_i\, p(x_i)
\]
where $p(x_i)$ is the probability mass function (pmf) of X, which is the name given to the probability
density function in the case of a discrete random variable.
Example 6.12: Find the expected value of the following random variable
𝑿 0 1 2 3 4
𝑷(𝑿) 0.18 0.34 0.23 0.21 0.04
Solution:
\[
\begin{aligned}
E(X) &= \sum_{x=0}^{4} x\,P(X = x) \\
&= 0(0.18) + 1(0.34) + 2(0.23) + 3(0.21) + 4(0.04) \\
&= 0 + 0.34 + 0.46 + 0.63 + 0.16 \\
&= 1.59
\end{aligned}
\]
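The weighted sum in Example 6.12 is easy to verify by machine. The sketch below is illustrative; the function name `expected_value` and the validity check on the probabilities are additions for the example.

```python
def expected_value(values, probs):
    """E(X) = sum of x * P(X = x) for a discrete random variable."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probs))

# Table from Example 6.12.
values = [0, 1, 2, 3, 4]
probs = [0.18, 0.34, 0.23, 0.21, 0.04]

mean = expected_value(values, probs)
print(f"E(X) = {mean:.2f}")
```

Summing the products term by term gives 0 + 0.34 + 0.46 + 0.63 + 0.16 = 1.59.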
Note that the expected value of a random variable is the same as the mean of the
random variable:
\[
\bar{X} = E(X) =
\begin{cases}
\sum x\,P(x), & \text{if } X \text{ is discrete} \\
\int x\,f(x)\,dx, & \text{if } X \text{ is continuous.}
\end{cases}
\]
Suppose that we are given a random variable X along with its probability mass
function (pmf) if it is discrete or probability density function (pdf) if it is continuous, and that we
want to compute the expected value of some function of X, say g(X). How can we accomplish
this? One way is as follows: since g(X) is itself a random variable, its pmf/pdf can be determined
from the pmf/pdf of X. Once we have determined the pmf/pdf of g(X), we can compute E[g(X)] by
using the definition of expected value. Equivalently, E[g(X)] can be computed directly from the
pmf/pdf of X:
\[
E[g(X)] =
\begin{cases}
\sum g(x)\,P(x), & \text{if } X \text{ is discrete} \\
\int g(x)\,f(x)\,dx, & \text{if } X \text{ is continuous.}
\end{cases}
\]
Example 6.13: Let X denote a random variable that takes on any of the values −1, 0, 1 with
respective probabilities P{X = −1} = 0.2, P{X = 0} = 0.5, P{X = 1} = 0.3. Compute
$E(X^2)$.
Solution: Letting $Y = g(X) = X^2$,
\[
\begin{aligned}
E[g(X)] = \sum g(x)\,P(x) &= (-1)^2\,P(X = -1) + 0^2\,P(X = 0) + 1^2\,P(X = 1) \\
&= 1(0.2) + 0(0.5) + 1(0.3) = 0.5
\end{aligned}
\]
The reader should note that $E[X] = (-1)(0.2) + 0(0.5) + 1(0.3) = 0.1$, so that
\[
0.5 = E[X^2] \ne (E[X])^2 = 0.01
\]
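The formula for E[g(X)] translates directly into a short function that accepts g as an argument. This is a sketch under the assumption that the pmf is stored as a dictionary; the name `expectation` is hypothetical.

```python
def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum of g(x) * P(X = x); g defaults to the identity, giving E[X]."""
    return sum(g(x) * p for x, p in pmf.items())

# pmf from Example 6.13.
pmf = {-1: 0.2, 0: 0.5, 1: 0.3}

second_moment = expectation(pmf, lambda x: x ** 2)   # E[X^2]
mean_squared = expectation(pmf) ** 2                 # (E[X])^2

print(f"E[X^2]   = {second_moment:.2f}")
print(f"(E[X])^2 = {mean_squared:.2f}")
```

The two printed values differ, which illustrates that the expectation of a function is generally not the function of the expectation.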
If 𝑎 and 𝑏 are constants then
𝐸(𝑎𝑋 + 𝑏) = 𝑎𝐸[𝑋] + 𝑏
The expected value of a random variable X, E[X], is also referred to as the mean or the first moment
of X. The quantity $E[X^n]$, n ≥ 1, is called the nth moment of X. By definition
\[
E[X^n] =
\begin{cases}
\sum x^n\,P(X = x), & \text{if } X \text{ is discrete} \\
\int x^n f(x)\,dx, & \text{if } X \text{ is continuous.}
\end{cases}
\]
Exercise 6.6: The following are the annual incomes of 7 men and 7 women residents of a certain
community.
Men Women
33.5 24.2
25.0 19.5
28.6 27.4
41.0 28.6
30.5 32.2
29.6 22.4
32.8 21.6
Suppose that a woman and a man are randomly chosen. Find the expected value of the sum of their
incomes.
Solution: Let X be the man's income and Y be the woman's income. Since X is equally likely to
be any of the values in the men's column, we see that
\[
E(X) = \frac{1}{7}(33.5 + 25 + \dots + 32.8) = 31.571
\]
Similarly,
\[
E(Y) = \frac{1}{7}(24.2 + 19.5 + \dots + 21.6) = 25.129
\]
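The two averages above can be combined using the general fact that the expectation of a sum is the sum of the expectations, E[X + Y] = E[X] + E[Y]. The sketch below computes the result both ways; the brute-force check assumes that each of the 49 man/woman pairs is equally likely, which is an assumption made for the illustration.

```python
# Income columns from Exercise 6.6.
men = [33.5, 25.0, 28.6, 41.0, 30.5, 29.6, 32.8]
women = [24.2, 19.5, 27.4, 28.6, 32.2, 22.4, 21.6]

def mean(values):
    """Average of equally likely values."""
    return sum(values) / len(values)

e_x = mean(men)
e_y = mean(women)
e_sum = e_x + e_y                     # linearity of expectation

# Brute-force check over all (man, woman) pairs, assumed equally likely.
brute_force = mean([m + w for m in men for w in women])

print(f"E[X] = {e_x:.3f}, E[Y] = {e_y:.3f}")
print(f"E[X + Y] = {e_sum:.3f} (brute force: {brute_force:.3f})")
```

Linearity holds whether or not X and Y are independent, so the shortcut needs no assumption about how the pair is chosen beyond each person being picked uniformly from their column.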
\[
\begin{aligned}
&= \sum (x^2 - 2\mu x + \mu^2)\,P(x) \\
&= E[X^2] - 2\mu^2 + \mu^2 \\
&= E[X^2] - \mu^2
\end{aligned}
\]
Now, since $X^2$ will equal $(-1)^2$, $4^2$, or $8^2$ with respective probabilities of
0.7, 0.2, and 0.1, we have
\[
E[X^2] = 1(0.7) + 16(0.2) + 64(0.1) = 10.3
\]
Therefore, $Var(X) = 10.3 - (0.9)^2 = 10.3 - 0.81 = 9.49$
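The shortcut Var(X) = E[X²] − μ² can be compared with the defining sum of squared deviations. The sketch below assumes the underlying values of X are −1, 4 and 8, as implied by the squares and probabilities quoted above; `moment` is a hypothetical helper name.

```python
def moment(pmf, n):
    """n-th moment E[X^n] of a discrete random variable."""
    return sum(x ** n * p for x, p in pmf.items())

# Values inferred from the squares (-1)^2, 4^2, 8^2 and their probabilities.
pmf = {-1: 0.7, 4: 0.2, 8: 0.1}

mu = moment(pmf, 1)                                   # E[X]
var_shortcut = moment(pmf, 2) - mu ** 2               # E[X^2] - mu^2
var_definition = sum((x - mu) ** 2 * p for x, p in pmf.items())

print(f"E[X] = {mu:.1f}, E[X^2] = {moment(pmf, 2):.1f}")
print(f"Var(X) = {var_shortcut:.2f} (from the definition: {var_definition:.2f})")
```

Both routes agree, which is exactly what the derivation preceding the example establishes.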
Properties of Variance
1. For any random variable X and constant C, it can be shown that
$Var(CX) = C^2\,Var(X)$
$Var(C + X) = Var(X)$
2. If X and Y are independent random variables, $Var(X + Y) = Var(X) + Var(Y)$
3. The square root of Var(X) is called the standard deviation of X, and we denote it by
SD(X). That is, $SD(X) = \sqrt{Var(X)}$
6.3 Common Discrete Probability Distribution
6.3.1. Binomial Distribution
Many types of probability problems have only two outcomes, or they can be reduced to two
outcomes. For example, when a coin is tossed, it can land heads or tails.
A binomial experiment is a probability experiment that satisfies the following four
requirements:
1. Each trial can have only two outcomes or outcomes that can be reduced to two
outcomes.
2. There must be a fixed number of trials
3. The outcomes of each trial must be independent
4. The probability of a success must remain the same for each trial
The outcomes of a binomial experiment and the corresponding probabilities of these outcomes are
called a binomial distribution. The probability mass function of a binomial random variable having
parameters (n, p) is given by
\[
P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, \dots, n
\]
Example 6.15: Five fair coins are flipped. If the outcomes are assumed independent, find
the probability distribution of the number of heads obtained.
Solution: If we let X equal the number of heads (successes), then X is a binomial random
variable with parameters (n = 5, p = ½). Hence,
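\[
P(X = 3) = \binom{5}{3}\left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right)^2 = \frac{10}{32}
\]
\[
P(X = 4) = \binom{5}{4}\left(\frac{1}{2}\right)^4\left(\frac{1}{2}\right)^1 = \frac{5}{32}
\]
\[
P(X = 5) = \binom{5}{5}\left(\frac{1}{2}\right)^5\left(\frac{1}{2}\right)^0 = \frac{1}{32}
\]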
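The binomial probability mass function can be evaluated for every value of x at once. The following sketch applies it to the five-coin example; `binomial_pmf` is an illustrative name, and `math.comb` supplies the binomial coefficient.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with parameters (n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 5, 0.5
distribution = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

for x, prob in distribution.items():
    # With p = 1/2 every probability is a multiple of 1/32.
    print(f"P(X = {x}) = {round(prob * 32)}/32")
print("total =", sum(distribution.values()))
```

The distribution is symmetric about 2.5 because p = 1/2, so P(X = x) = P(X = 5 − x).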
Example 6.16:
A. Determine PX 12 when X is a Binomial random variable with parameters 𝑛 =
20 and 𝑃 = 0.4
B. Determine PY 10 when Y is a Binomial random variable with parameters 𝑛 = 16 and
𝑃 = 0.5
Solution:
A. PX 12 1 PX 12
1840). In addition to being used for the stated conditions (i.e., n is large, p is small, and the variable occurs
over a period of time), the Poisson distribution can be used when a density of items is distributed over a
given area or volume, such as the number of plants growing per acre of woods or the number of defects in
a given length of videotape.
If X is a Poisson random variable with parameter λ, then
\[
P(x) = P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}
\]
Both the expected value and the variance of a Poisson random variable are equal to λ. That is, if X is a
Poisson random variable with parameter λ, λ > 0, then
\[
E[X] = \lambda, \qquad Var[X] = \lambda
\]
Example 6.19: Suppose the average number of accidents occurring weekly on a particular highway is
equal to 1.2. Approximate the probability that there is at least one accident this week.
Solution: Let X denote the number of accidents that will occur this week. Because it is reasonable to
suppose that there are a large number of cars passing along the highway, each having a small
probability of being involved in an accident, the number of such accidents should be
approximately a Poisson random variable. That is, X is approximately a Poisson random variable
with mean value λ = 1.2. The desired probability is now obtained as follows.
\[
P(X \ge 1) = 1 - P(X = 0) = 1 - \frac{e^{-1.2}(1.2)^{0}}{0!} = 1 - e^{-1.2} \approx 0.6988
\]
Therefore, there is approximately a 70% chance that there will be at least one accident this
week.
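The accident example can be reproduced with a direct implementation of the Poisson probability mass function. This is a minimal sketch; the name `poisson_pmf` is chosen for illustration.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with parameter lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 1.2                                   # average accidents per week
p_no_accident = poisson_pmf(0, lam)
p_at_least_one = 1 - p_no_accident          # complement of "no accident"

print(f"P(X = 0)  = {p_no_accident:.4f}")
print(f"P(X >= 1) = {p_at_least_one:.4f}")
```

Working through the complement avoids summing the infinitely many terms P(X = 1) + P(X = 2) + ….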
We can approximate the binomial distribution by the Poisson distribution if n is large and p is small.
The approximating Poisson distribution has parameter
\[
\lambda = np
\]
Example 6.20: Suppose that items produced by a certain machine are independently
defective with probability 0.1. What is the probability that a sample of 10 items
will contain at most one defective item? What is the Poisson approximation for this
probability?
Solution: If we let X denote the number of defective items, then X is a binomial random variable
with parameters n = 10 and p = 0.1. Thus the desired probability is
\[
P(X = 0) + P(X = 1) = \binom{10}{0}(0.1)^{0}(0.9)^{10} + \binom{10}{1}(0.1)^{1}(0.9)^{9} = 0.7361
\]
Since np = 10(0.1) = 1, the Poisson approximation yields the value $e^{-1} + e^{-1} = 2e^{-1} \approx 0.7358$.
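The exact binomial probability and its Poisson approximation can be placed side by side. The sketch below assumes, as the solution's terms indicate, that the event of interest is "at most one defective item in a sample of 10"; the variable names are illustrative.

```python
from math import comb, exp, factorial

n, p = 10, 0.1
lam = n * p                                  # Poisson parameter, lambda = np

# Exact binomial probability of 0 or 1 defective items.
exact = sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in (0, 1))

# Poisson approximation of the same event.
approx = sum(exp(-lam) * lam ** x / factorial(x) for x in (0, 1))

print(f"binomial: {exact:.4f}")
print(f"Poisson : {approx:.4f}")
print(f"absolute difference: {abs(exact - approx):.4f}")
```

Even with n as small as 10 the two values agree to about three decimal places; the agreement generally improves as n grows and p shrinks with np held fixed.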
Since the probability density function of a normal random variable 𝑋 is symmetric about its
expected value 𝜇; it follows that 𝑋 is equally likely to be on either side of 𝜇. That is,
𝑃{𝑋 < 𝜇} = 𝑃{𝑋 ≥ 𝜇} = 0.5
Not all bell-shaped symmetric density curves are normal. The normal density curves are specified
by a particular formula:
\[
f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
\]
A normal random variable having mean value 0 and standard deviation 1 is called a standard
normal variable, and its density curve is called the standard normal curve. The letter 𝑍 represents
a standard normal random variable.
\[
f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}
\]
Once the X values are transformed by using the formula $Z = (X - \mu)/\sigma$, they are called Z values.
The Z value is actually the number of standard deviations that a particular X value is away from the mean.
Steps to find areas under the normal distribution curve
1. Between 0 and any 𝑍 value: Look up the 𝑍 value in the table to get the area
2. In any tail
a. Look up the 𝑍 value to get the area
b. Subtract the area from 0.5
3. Between 𝑍 values on the same side of the mean
a. Look up both 𝑍 values to get the area
b. Subtract the smaller area from the larger area
4. Between two 𝑍 values on opposite sides of the mean
a. Look up both 𝑍 values to get the area
b. Add the areas
5. Less than any 𝑍 value to the right of the mean
a. Look up the 𝑍 value to get the area
b. Add 0.5 to the area
6. Greater than any 𝑍 value to the left of the mean
a. Look up the 𝑍 value in the table to get the area
b. Add 0.5 to the area
7. In any two tails
a. Look up both 𝑍 values in the table to get the areas
b. Subtract both areas from 0.5
c. Add the answers
The general procedure is:
Draw the picture
Shade the area desired
Find the correct figure among the cases above
Follow the directions for that case
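The table-based steps above can be mirrored in code by replacing the table lookup with the error function from the standard library. This is a sketch only: the function names are hypothetical, and `table_area` plays the role of the Z table, returning the area between 0 and a given Z value.

```python
import math

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def table_area(z):
    """Area between 0 and z, the quantity listed in the Z table (step 1)."""
    return phi(abs(z)) - 0.5

def tail_area(z):
    """Area in the tail beyond z (step 2): 0.5 minus the table area."""
    return 0.5 - table_area(z)

def area_between(z1, z2):
    """Area between two Z values (steps 3 and 4)."""
    if z1 * z2 >= 0:                          # same side of the mean: subtract
        return abs(table_area(z2) - table_area(z1))
    return table_area(z1) + table_area(z2)    # opposite sides: add

def area_left_of(z):
    """Area to the left of z (step 5 when z is right of the mean)."""
    return 0.5 + table_area(z) if z >= 0 else tail_area(z)

def area_right_of(z):
    """Area to the right of z (step 6 when z is left of the mean)."""
    return 0.5 + table_area(z) if z <= 0 else tail_area(z)

print(f"between 0 and 2.34   : {table_area(2.34):.4f}")
print(f"left of 1.5          : {area_left_of(1.5):.4f}")
print(f"right of 0.8         : {area_right_of(0.8):.4f}")
print(f"between -1.5 and 2.5 : {area_between(-1.5, 2.5):.4f}")
```

Values computed this way may differ from a printed table in the last decimal place, because the table entries are rounded.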
Example 6.15: Find the area under the normal distribution curve between 𝑍 = 0 and 𝑍 = 2.34
Solution: Draw the area as follows:
[Figure: standard normal curve with the area between 𝑍 = 0 and 𝑍 = 2.34 shaded]
Since the 𝑍 table gives the area between 0 and any 𝑍 value to the right of 0, one need only look up the
𝑍 value in the table. Find 2.3 in the left column and 0.04 in the top row. The value where the
column and row meet in the table is the answer, 0.4904.
[Table excerpt: 𝑍 table in which the row for 2.3 and the column for 0.04 meet at the entry 0.4904]
[Figure: standard normal curve with the area to the left of 𝑍 = 1.50 shaded]
𝑃{𝑍 < 1.5} = 0.5 + 𝑃{0 < 𝑍 < 1.5} = 0.5 + 0.4332 = 0.9332
[Figure: standard normal curve with the area to the right of 𝑍 = 0.8 shaded]
𝑃{𝑍 ≥ 0.8} = 0.5 − 𝑃{0 < 𝑍 < 0.8} = 0.5 − 0.2881 = 0.2119
Example 6.17: Find
A. 𝑃{1 < 𝑍 < 2}
B. 𝑃{−1.5 < 𝑍 < 2.5}
Solution:
A. Draw the graph as follows:
[Figure: standard normal curve with the area between 𝑍 = 1 and 𝑍 = 2 shaded]
𝑃{1 < 𝑍 < 2} = 𝑃{0 < 𝑍 < 2} − 𝑃{0 < 𝑍 < 1} = 0.4772 − 0.3413 = 0.1359
B. [Figure: standard normal curve with the area between 𝑍 = −1.5 and 𝑍 = 2.5 shaded]
𝑃{−1.5 < 𝑍 < 2.5} = 𝑃{−1.5 < 𝑍 < 0} + 𝑃{0 < 𝑍 < 2.5}
Since 𝑃{−1.5 < 𝑍 < 0} = 𝑃{0 < 𝑍 < 1.5}, due to the symmetry of the normal distribution,
𝑃{−1.5 < 𝑍 < 2.5} = 𝑃{0 < 𝑍 < 1.5} + 𝑃{0 < 𝑍 < 2.5} = 0.4332 + 0.4938 = 0.9270
Finding Normal Probabilities: Conversion to the Standard Normal
Let X be a normal random variable with mean μ and standard deviation σ. We can determine
probabilities concerning X by using the fact that the variable Z defined by
\[
Z = \frac{X - \mu}{\sigma}
\]
has a standard normal distribution.
We can compute any probability statements in terms of Z. For example,
\[
P\{X < a\} = P\left\{\frac{X - \mu}{\sigma} < \frac{a - \mu}{\sigma}\right\} = P\left\{Z < \frac{a - \mu}{\sigma}\right\}
\]
where Z is a standard normal random variable.
Example 6.18: IQ examination scores for sixth-graders are normally distributed with mean value
100 and standard deviation 14.2.
A. What is the probability a randomly chosen sixth-grader has a score greater
than 130?
B. What is the probability a randomly chosen sixth-grader has score between 90
and 115?
Solution: Let X denote the score of a randomly chosen student. We compute probabilities
concerning X by making use of the fact that the standardized variable
\[
Z = \frac{X - 100}{14.2}
\]
has a standard normal distribution.
A.
\[
P\{X > 130\} = P\left\{\frac{X - 100}{14.2} > \frac{130 - 100}{14.2}\right\} = P\{Z > 2.1127\} \approx 0.017
\]
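B. The inequality 90 < X < 115 is equivalent to
\[
\frac{90 - 100}{14.2} < \frac{X - 100}{14.2} < \frac{115 - 100}{14.2}
\]
Or equivalently,
\[
-0.7042 < Z < 1.0560
\]
Therefore,
\[
\begin{aligned}
P\{90 < X < 115\} &= P\{-0.7042 < Z < 1.0560\} \\
&= P\{0 < Z < 0.7042\} + P\{0 < Z < 1.0560\} \approx 0.614
\end{aligned}
\]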
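The conversion to the standard normal can be wrapped in a single function that standardizes and then evaluates the cumulative area. The sketch below uses `math.erf` instead of a table, so its results may differ from table lookups in the third decimal place; `normal_cdf` is a hypothetical name.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X < x) for a normal random variable with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma                      # conversion to the standard normal
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 100.0, 14.2                       # IQ scores in Example 6.18

p_above_130 = 1.0 - normal_cdf(130, mu, sigma)
p_90_to_115 = normal_cdf(115, mu, sigma) - normal_cdf(90, mu, sigma)

print(f"z for 130        : {(130 - mu) / sigma:.4f}")
print(f"P(X > 130)       : {p_above_130:.4f}")
print(f"P(90 < X < 115)  : {p_90_to_115:.4f}")
```

Subtracting two cumulative areas handles the interval directly, without splitting it at the mean as the table method requires.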
Properties of the Normal distribution
1. The normal distribution curve is bell-shaped
2. The mean, median and mode are equal and located at the center of the distribution
3. The normal distribution curve is unimodal
4. The curve is symmetrical about the mean, which is equivalent to saying that its shape is the
same on both sides of a vertical line passing through the center
5. The curve is continuous. That is, it has no gaps or holes
6. The curve never touches the 𝑥 axis
7. The total area under the normal distribution curve is equal to 1
Relation between Binomial and Normal Distribution
The normal distribution is a limiting case of the binomial probability distribution under the following
conditions:
I. n, the number of trials, is indefinitely large
II. Neither p nor q = 1 − p is very small
We know that for a binomial variable X with parameters n and p
\[
E[X] = np, \qquad Var[X] = npq
\]
De Moivre proved that under the above two conditions, the distribution of the standardized binomial
variable
\[
Z = \frac{X - E[X]}{\sigma} = \frac{X - np}{\sqrt{npq}}
\]
tends to the standard normal distribution. If p and q are nearly equal (i.e., p is nearly
0.5), then the normal approximation is surprisingly good even for small values of n.
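The quality of the normal approximation for p near 0.5 can be examined numerically. The sketch below compares an exact binomial interval probability with its normal approximation for n = 20 and p = 0.5. The half-unit continuity correction is an addition that the text does not discuss, and the parameter values and function names are illustrative assumptions.

```python
import math
from math import comb

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binomial_interval(n, p, a, b):
    """Exact P(a <= X <= b) for a binomial random variable."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(a, b + 1))

def normal_interval(n, p, a, b, continuity=True):
    """Normal approximation of P(a <= X <= b) using Z = (X - np) / sqrt(npq)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    shift = 0.5 if continuity else 0.0        # continuity correction (assumption)
    return phi((b + shift - mu) / sigma) - phi((a - shift - mu) / sigma)

n, p = 20, 0.5
exact = binomial_interval(n, p, 8, 12)
approx = normal_interval(n, p, 8, 12)

print(f"exact binomial : {exact:.4f}")
print(f"normal approx. : {approx:.4f}")
```

Passing `continuity=False` shows how much the approximation degrades when the discrete values are not widened by half a unit on each side.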
Relation between Poisson and Normal Distribution
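If X is a random variable following the Poisson distribution with parameter λ, then E[X] = λ and
Var[X] = λ.
Thus the standardized Poisson variable becomes
\[
Z = \frac{X - E[X]}{\sigma} = \frac{X - \lambda}{\sqrt{\lambda}}
\]
It has been proved that this variable tends to the standard normal distribution as λ → ∞.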
of freedom.
If $x_1, x_2, \dots, x_v$ are v independent random variables following normal distributions with means
$\mu_1, \mu_2, \dots, \mu_v$ and standard deviations $\sigma_1, \sigma_2, \dots, \sigma_v$ respectively, then the variate
\[
\chi^2 = \left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2 + \left(\frac{x_2 - \mu_2}{\sigma_2}\right)^2 + \dots + \left(\frac{x_v - \mu_v}{\sigma_v}\right)^2 = Z_1^2 + Z_2^2 + \dots + Z_v^2 = \sum_{i=1}^{v} Z_i^2
\]
which is the sum of the squares of v independent standard normal variates, follows the chi-square
distribution with v degrees of freedom.
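The construction of the chi-square variate can be illustrated by simulation: draw v independent normal values, standardize each, and add the squares. The sketch below uses v = 3 with made-up means and standard deviations. It relies on the standard fact, not stated in the text above, that a chi-square variable with v degrees of freedom has mean v; the seed and sample size are arbitrary choices.

```python
import random

def chi_square_variate(xs, mus, sigmas):
    """Sum of squared standardized values: sum of ((x_i - mu_i) / sigma_i)^2."""
    return sum(((x - m) / s) ** 2 for x, m, s in zip(xs, mus, sigmas))

random.seed(0)                                # fixed seed for reproducibility
mus = [10.0, -2.0, 5.0]                       # hypothetical means
sigmas = [2.0, 0.5, 3.0]                      # hypothetical standard deviations
v = len(mus)                                  # degrees of freedom

samples = []
for _ in range(20_000):
    draws = [random.gauss(m, s) for m, s in zip(mus, sigmas)]
    samples.append(chi_square_variate(draws, mus, sigmas))

sample_mean = sum(samples) / len(samples)
print(f"degrees of freedom: {v}")
print(f"sample mean of the simulated variate: {sample_mean:.3f}")
```

Standardizing removes the individual means and standard deviations, so the result depends only on the number of terms v.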
Applications of chi-square distribution
Chi-square distribution has a number of applications. Some of which are listed below
Chi-square test of goodness of fit
Chi-square test for independence of attributes
To test whether the population has a specified value of the variance
6.4.3 Student's 𝒕 Distribution
It is often the case that one wants to calculate the size of sample needed to obtain a certain level of
confidence in survey results. Unfortunately, this calculation requires prior knowledge of the
population standard deviation (𝜎). Realistically, 𝜎 is unknown. Often a preliminary sample will be
taken so that a reasonable estimate of this critical population parameter can be made. If such
a preliminary sample is not taken, but confidence intervals for the population mean are to be
constructed using an unknown 𝜎, then the distribution known as the Student t distribution can be
used.
A random variable X has a t distribution (Student's t distribution) if its probability distribution is
given by
\[
f(x, v) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\;\Gamma\left(\frac{v}{2}\right)} \left(1 + \frac{x^2}{v}\right)^{-(v+1)/2} \quad \text{for } -\infty < x < \infty
\]
with v degrees of freedom. If v is large (v ≥ 30), the graph of f(x) closely approximates the standard
normal curve.
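The claim that the t density approaches the standard normal curve as v grows can be checked by evaluating both densities on a grid. The following sketch uses `math.gamma` for the gamma function; the grid from −4 to 4 and the chosen values of v are illustrative.

```python
import math

def t_pdf(x, v):
    """Density of the Student t distribution with v degrees of freedom."""
    coeff = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return coeff * (1 + x * x / v) ** (-(v + 1) / 2)

def std_normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

grid = [i / 10 for i in range(-40, 41)]       # x from -4.0 to 4.0 in steps of 0.1

# Largest vertical gap between the t curve and the standard normal curve.
gap = {v: max(abs(t_pdf(x, v) - std_normal_pdf(x)) for x in grid) for v in (2, 5, 30)}

for v, g in gap.items():
    print(f"v = {v:2d}: largest difference from the normal curve = {g:.4f}")
```

The gap shrinks as v increases, and the t curve sits below the normal curve at the center while carrying more area in the tails.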
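Properties of t Distribution
a. Let X be a t-distributed random variable with parameter v. Then
Mean:
\[
E(X) = \int_{-\infty}^{\infty} x\, \frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\;\Gamma\left(\frac{v}{2}\right)} \left(1 + \frac{x^2}{v}\right)^{-(v+1)/2} dx = 0 \quad \text{for } v > 1
\]
Variance:
\[
Var(X) = E\left[(X - E(X))^2\right] = \frac{v}{v - 2} \quad \text{for } v > 2
\]
The t distribution is used when the population variance is unknown.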
b. The Student t distribution is different for different sample sizes.
c. The Student t distribution is generally bell-shaped, but with smaller sample sizes shows
increased variability (flatter). In other words, the distribution is less peaked than a normal
distribution and with thicker tails. As the sample size increases, the distribution approaches
a normal distribution. For n > 30, the differences are negligible.
d. The distribution is symmetrical about the mean. i.e. about zero.
e. The variance is greater than one, but approaches one from above as the sample size increases
(𝜎=1 for the standard normal distribution).
f. The population is essentially normal (unimodal and basically symmetric)