
MTH212

Uploaded by

Gadisa Bedane
Copyright
© © All Rights Reserved

CHAPTER ONE

INTRODUCTION TO BUSINESS
STATISTICS

By Baldada K. (MBA)
1.1. Definition of Statistics
 Statistics has been defined differently by different authors
over time. In earlier days statistics was confined to
state affairs only, but today it embraces
almost every sphere of human activity.
 Statistics can be defined in a broader sense and in a
narrower sense.
 In the broader sense, statistics is defined as aggregates of
figures or facts about certain events, situations, or
attributes of variables.
Example: the number of daily accidents recorded per
month.
 In the narrower sense, statistics is defined as “a science
of collecting, summarizing, presenting and analyzing
data as well as drawing conclusions or making
predictions about a population on the basis of sample
analysis.”
• Statistics is defined as the science which deals with the method
of collecting, classifying, presenting, comparing, and interpreting
numerical data. “Seligman”
• The science of statistics is the method of judging collective,
natural or social phenomena from the results obtained from the
analysis or enumeration or collection of estimates. “King”
• A.L. Bowley has defined statistics as: (i) statistics is the science
of counting, (ii) Statistics may rightly be called the science of
averages, and (iii) statistics is the science of measurement of
social organism regarded as a whole in all its manifestations.
• Statistics is defined as the science of estimates and probabilities.
“Boddington”
• Generally, for our purpose, statistics is defined as methods
specially adapted to the collection, classification, analysis, and
interpretation of data for making effective decisions in all functional
areas of management.
1.2. Some key terms in statistics
• Population - is the collection of all possible observations of a
specified characteristic of interest.
• Sample – is the portion or part of the population of interest.
• Element – entity on which data are collected. For example, a single
student from whom information is collected.
• Census or complete enumeration – a study that includes every
member of the target population; it is usually costly and time
consuming. For example, a study that involves all management
department students of Arba Minch University.
• Parameter – a characteristic of the population of interest. For
example, the mean and standard deviation of the population.
• Statistic – a characteristic of a sample. For example, the mean and
standard deviation of a sample.
• Variable- is a characteristic that assumes different values for
different elements. E.g. Students’ GPA, height of students, weight
of students or it could be price of commodities or other things.
Characteristics of Statistics
 Statistics are aggregates of facts: This means a single
figure is not statistics.
 Statistics are affected by a number of factors: For
example sale of a product depends on a number of
factors such as its price, quality, competition, the
income of consumers, and so on.
 Statistics must be reasonably accurate: if wrong
figures are analyzed, they will lead to erroneous conclusions.
 Statistics must be collected in a systematic manner: If
data are collected in a haphazard manner, they will not
be reliable and will lead to misleading conclusions.
Importance of Statistics in Business
 There are three major functions in any business enterprise
in which statistical methods are useful.
The planning functions: This may relate to either special projects
or to the recurring activities of the firm over specified period.
The setting up standards: This may relate to the size of
employment, volume of sales, fixation of quality norms
for the manufactured products, norms for daily output,
and so forth.
The function of control: This involves comparison of
actual production achieved against the norm or target
set earlier.
1.3 Types of Statistics
 Statisticians commonly classify the subject into two
broad categories:
 Descriptive statistics: Descriptive statistics describes the
data set being analyzed but does not allow us to
draw conclusions or make inferences beyond
that data. Example: the number of students who graduated
from Arba Minch University was 2050 in 2005,
2500 in 2006 and 3500 in 2007; this belongs to the
domain of descriptive statistics.
 Inferential statistics: Inferential statistics is also a set of
methods, but it is used to draw conclusions or inferences
about characteristics of populations based on data from
a sample. Example: the average per capita income of the whole
Ethiopian population can be estimated (for instance, as $3) from
figures obtained from a sample of the population.
1.4. TYPES OF VARIABLES OR DATA

• Variable: an item of interest that can take
on many different numerical values.
• Variables can be categorized as continuous
or discrete. A continuous variable is
measured along a continuum. A discrete
variable, on the other hand, is measured in
whole units or categories, so discrete
variables are not measured along a
continuum.
• Variables can also be categorized as
quantitative or qualitative.
1.5. Scope or Application of Statistics
 Statistics is applied in every sphere of human activity, in the
social as well as the natural sciences. It is almost impossible to
find a single department of human activity where statistics
cannot be applied.
o Marketing: Forecasting Sales, Demand And Market Shares, to
compare Sales Performances, Customer profiling, market
research, customer buying behavior and Pattern: Cluster
Analysis And Correlation Regression.
o Operations: Input Stage: Sampling for sampling inspection &
inventory control, Process stage: Normal distribution for
statistical quality control and six sigma method, Output stage:
Sampling And Binomial Distribution for QC.
o HR management: Performance appraisal: normal distribution,
Reward system: percentile
o Finance: Helps in ascertaining risks in quantitative
terms, Helps in comparing volatility and selection of
portfolio of stocks.
o Economics: Statistical methods render valuable
assistance in proper understanding of economic
problems.
o Insurance: Forecasting for determining the premium,
Regression analysis for finding out impact of different
factors on Health and life
Why Study Statistics
 Decision makers use statistics to:
 Present and describe business data and information properly:
statistics transforms data into useful information for decision makers
 Draw conclusions about large populations, using information
collected from samples
 Make reliable forecasts about a business activity
 Improve business processes
Limitations of statistics
 Statistics with all its wide application in every sphere of
human activity has its own limitations. Some of them
are given below:
Statistics is not suitable for the study of qualitative
phenomena.
Statistics does not study individuals.
Statistical laws are not exact.
Statistical tables may be misused.
Statistics is only one of the methods of studying a
problem.
Misuses of statistics
Many people knowingly or unknowingly use statistical data in
wrong manner. Let us see what the main misuses of statistics
are:
 Sources of data not given: in the absence of the sources, the
reader does not know how far the data are reliable.
Defective data: another misuse is that sometimes one gives
inaccurate data.
Unrepresentative sample: One may choose a sample just on
the basis of convenience.
Unfair comparison: an important misuse of statistics is
making unfair comparison from the data collected.
Unwarranted conclusion: another misuse of statistics is
unwarranted conclusion. This may be as a result of making
false assumptions.
Mistake in arithmetic: finally, one may come across certain
mistakes in calculations or in the application of wrong
formula.


CHAPTER TWO
DATA COLLECTION AND
PRESENTATION
2.1 Definition of Data
 Everybody collects, interprets and uses data or
information, much of it in a numerical or statistical
form in day-to-day life.
 Data is the value that the variables can take, which is
either numerical or categorical value. It is
unprocessed figures or raw facts that do not provide
full meaning. The purpose of statistics is to
transform data into information, which would be
relevant for making reasonable decisions.
 Information: is manipulated data that will be used for
managerial decision making purpose.
Nature of data
 Different types of data can be collected for different purposes.
The data can be collected in connection with time or
geographical location or in connection with time and location.
The following are the three types of data:
 Time series data.
 Spatial data
 Spatio-temporal data.
i. Time series data
 It is a collection of a set of numerical values, collected over
a period of time. The data might have been collected either
at regular intervals of time or irregular intervals of time.
Example: The data collected for the three types of expenditures
in birrs for a family for the four years 2001,2002,2003,2004.
ii. Spatial Data:
If the data collected is connected with that of a
place, then it is termed as spatial data.
Example: Assume the population of the southern
nation and nationality of Ethiopia in 2006.
iii. Spatio-temporal Data:
If the data collected are connected to time as
well as place, then they are known as spatio-temporal
data.
Example: Assume the population of the southern
nation and nationality of Ethiopia in 2006 and 2007.
Types of scale Measurements

Classification of Data
 Any statistical data can be classified under two
categories depending upon the sources utilized. These
categories are:
Primary data
Secondary data
 Primary data: are those which are collected
afresh and for the first time, and thus happen to be original
in character. The user is the
data collector. Primary data can be collected by the
following five methods.
Direct personal interviews.
Indirect Oral interviews.
Information from correspondents.
Mailed questionnaire method.
Schedules sent through enumerators.
 Secondary Data: are those data which have been
already collected and analyzed by some earlier
agency for its own use; and later the same data are
used by a different agency. The sources of secondary
data can broadly be classified under two heads:
Published sources,
Unpublished sources.
2.3 Tabular Methods of Data Presentation
 Tabulation is the process of summarizing classified or grouped data
in the form of a table so that it is easily understood and an
investigator is quickly able to locate the desired information.
 A table is a systematic arrangement of classified data in columns
and rows. On the basis of the number of characteristics, tables may
be classified as follows:
 Simple or one-way Table: is the simplest table which contains
data of one characteristic only.
 Two-way Table: a table that contains data on two
characteristics; in such a case, either the stub or the caption is
divided into two co-ordinate parts.
 Manifold Table: a table that has more than two characteristics
of data; the caption or stub sub-headings may be classified further.
Frequency Distributions
 Frequency distribution is a series when a number of
observations with similar or closely related values are put in
separate bunches or groups, each group being in order of
magnitude in a series.
 It is simply a table in which the data are grouped into classes
and the numbers of cases which fall in each class are recorded.
 A frequency distribution is constructed for three main reasons:
To facilitate the analysis of data.
To estimate frequencies of the unknown population
distribution from the distribution of sample data and
To facilitate the computation of various statistical measures
Raw data: the statistical data collected are generally raw
data or ungrouped data. A better way to present the figures is to arrange them in ascending or
descending order of magnitude, an arrangement commonly known as an array. Let us
consider the daily wages (in birr) of 10 laborers in a factory.
80  70  55  50  60
60  55  38  65  75
In the 2nd table the same data are arranged in order:
38  50  55  55  60
60  65  70  75  80
Frequency distributions are classified into two types:
i. Discrete (or) Ungrouped frequency distribution:
 In this form of distribution, the frequency refers to discrete value.
 Here the data are presented in a way that exact measurements of units are
clearly indicated.
 There are definite differences between the variables of different groups of
items. Each class is distinct and separate from the other class.
 Non-continuity from one class to another class exists.
 The process of preparing this type of distribution is:
o Count the number of times a particular value is repeated.
o Prepare a column of tallies.
o Finally count the number of bars or tallies and get frequency.
 Example: In a survey of 40 families in a village, the number of
children per family was recorded and the following data
obtained.
1 0 3 2 1 5 6 2
2 1 0 3 4 2 1 6
3 2 1 5 3 3 2 4
2 2 3 0 2 1 4 5
3 3 4 4 1 2 4 5

Solution:
Frequency distribution of the number of children
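The tally-and-count procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `children` and `frequency` are our own, and the data are the 40 values listed in the example.

```python
from collections import Counter

# Number of children per family for the 40 surveyed families (from the example).
children = [
    1, 0, 3, 2, 1, 5, 6, 2,
    2, 1, 0, 3, 4, 2, 1, 6,
    3, 2, 1, 5, 3, 3, 2, 4,
    2, 2, 3, 0, 2, 1, 4, 5,
    3, 3, 4, 4, 1, 2, 4, 5,
]

# Counting how often each value occurs replaces the manual tally column.
frequency = Counter(children)

print("Children  Families")
for value in sorted(frequency):
    print(f"{value:>8}  {frequency[value]:>8}")
print(f"{'Total':>8}  {sum(frequency.values()):>8}")
```

Each distinct value of the variable becomes one row of the ungrouped frequency distribution, and the frequencies add up to the 40 families surveyed.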
ii. Continuous frequency distribution:
 In this form of distribution, the frequency refers to groups of values.
 This becomes necessary in the case of some variables
which can take any fractional value and in which case an
exact measurement is not possible.
 Hence a discrete variable can be presented in the form of a
continuous frequency distribution.
 Wage distribution of 100 employees
Weekly wages Number of
(birr) employees
50-100 4
100-150 12
150-200 22
200-250 33
250-300 16
300-350 8
350-400 5
Total 100
Nature of class/terms to develop class
 The following are some basic technical terms when a
continuous frequency distribution is formed or data are
classified according to class intervals:
i. Class limits:
 The class limits are the lowest and the highest values that can be
included in the class.
 The two boundaries of class are known as the lower limits and the
upper limit of the class.
 In statistical calculations, lower class limit is denoted by L and upper
class limit by U.
ii. Class Interval:
 It is the size of each grouping of data.
 Each grouping begins with the lower limit of a class interval and ends
at the lower limit of the next succeeding class interval.
 For example, 50-75, 75-100, 100-125… are class intervals.
iii. Width or size of the class interval:
 The difference between the lower and upper class limits is called the width
or size of the class interval and is denoted by ‘C’.
\[ C = \frac{\text{Range}}{\text{Number of classes}} = \frac{\text{Range}}{1 + 3.322 \log_{10} N} \]
iv. Range:
 The difference between largest and smallest value of the observation
and is denoted by ‘R’
R = Largest value – Smallest value
v. Mid-value or mid-point:
 The central point of a class interval is called the mid value or mid-point.
 It is found by adding the upper and lower limits of a class and
dividing the sum by 2:
\[ \text{Mid value} = \frac{L + U}{2} \]
vi. Frequency:
 Number of observations falling within a particular class interval
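As a rough illustration of the terms above, the following Python sketch computes the number of classes suggested by the 1 + 3.322 log10 N rule, the class width C, and a class mid-value. The function names are our own and are not part of the original notes.

```python
import math

def number_of_classes(n_observations):
    """Number of classes suggested by the rule 1 + 3.322 * log10(N)."""
    return 1 + 3.322 * math.log10(n_observations)

def class_width(largest, smallest, n_observations):
    """Width C = Range / number of classes, with Range = largest - smallest."""
    value_range = largest - smallest
    return value_range / number_of_classes(n_observations)

def mid_value(lower, upper):
    """Mid-point of a class: (L + U) / 2."""
    return (lower + upper) / 2

# For N = 50 observations the rule suggests between 6 and 7 classes.
k = number_of_classes(50)
print(round(k, 2))

# Mid-point of the class 82-104.
print(mid_value(82, 104))
```

In practice the computed number of classes and width are rounded to convenient values before the classes are formed.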
Types of class intervals
i. Exclusive method:
 When the class intervals are so fixed that the upper limit of one class is
the lower limit of the next class; it is known as the exclusive method of
classification.
 It is clear that the exclusive method ensures continuity of data.
ii. Inclusive method:
 Both the lower and upper limits are included in the class interval.
 In this method, the overlapping of the class intervals is avoided.
 This type of classification may be used for a grouped frequency
distribution for discrete variable like members in a family, number of
workers in a factory etc.
 It cannot be used with fractional values like age, height, weight etc.
iii. Open end classes:
 A class limit is missing either at the lower end of the first class interval
or at the upper end of the last class interval or both are not specified.
 The necessity of open end classes arises in a number of practical
situations, particularly relating to economic and medical data when
there are few very high values or few very low values which are far apart
from the majority of observations.
 Example: The following data show the cost of electricity
during July 2013 for a random sample of 50 one-bedroom
apartments:

 Construct a grouped frequency distribution and a
percentage distribution.
 Construct a relative frequency distribution
 Construct a cumulative frequency distribution.

 Step-3: Construct grouped frequency distribution table and
relative frequency distribution.
Classes    Frequency   Relative frequency (percentage)
82-104         5            10%
104-126        7            14%
126-148       13            26%
148-170       14            28%
170-192        8            16%
192-214        3             6%
Total         50           100%
 Proportion = Relative frequency:
\[ \text{Relative frequency} = \frac{\text{number of values in each class}}{\text{total number of values}} \]
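The relative frequency column can be checked with a short Python sketch. The variable names are our own; the class frequencies are the ones in the table for the 50 apartments.

```python
classes = ["82-104", "104-126", "126-148", "148-170", "170-192", "192-214"]
frequencies = [5, 7, 13, 14, 8, 3]

total = sum(frequencies)

# Relative frequency of a class = class frequency / total number of values,
# expressed here as a percentage.
percentages = [100 * f / total for f in frequencies]

for label, f, pct in zip(classes, frequencies, percentages):
    print(f"{label:<8} {f:>3} {pct:>5.0f}%")
print(f"{'Total':<8} {total:>3} {sum(percentages):>5.0f}%")
```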
Cumulative frequency distribution
 Cumulative frequency is defined as the sum of all
frequencies up to or above the current point. Cumulative
frequency can be less-than cumulative frequency or
greater-than cumulative frequency.
 Less-than cumulative frequency indicates the number of
elements in the data set that lie below the current value.
 Greater-than cumulative frequency indicates the number
of elements in the data set that lie above the current value.
 Step-4: Construct less than and greater than cumulative
frequency table.
Classes    Frequency   Less-than cumulative frequency   Greater-than cumulative frequency
82-104         5                   5                               50
104-126        7                  12                               45
126-148       13                  25                               38
148-170       14                  39                               25
170-192        8                  47                               11
192-214        3                  50                                3
Total         50
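Both cumulative columns can be derived from the class frequencies by running sums, as in this minimal Python sketch (variable names are our own):

```python
from itertools import accumulate

frequencies = [5, 7, 13, 14, 8, 3]

# Less-than cumulative frequency: running total from the first class downward.
less_than = list(accumulate(frequencies))

# Greater-than cumulative frequency: running total from the last class upward.
greater_than = list(accumulate(reversed(frequencies)))[::-1]

print(less_than)
print(greater_than)
```

The last less-than value and the first greater-than value both equal the total frequency, which is a quick check on the table.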
iii. Histogram:
 A graph in which the classes are marked on the horizontal axis by using a
series of adjacent rectangles and the class frequencies represented by the
height of the rectangles on the vertical axis.
 Histogram is similar to bar graph but there are two critical differences:
o The horizontal (x-axis) is a continuous scale. As a result of this there
are no gaps between the bars.
o The height of the rectangle is only proportional to the frequency if the
class intervals are all equal.
 The following data regarding the cost of electricity for a random sample of
50 one-bedroom apartments can be depicted on histogram.
Class Frequency
82-104 5
104-126 7
126-148 13
148-170 14
170-192 8
192-214 3
 Histogram showing the cost of electricity for a random sample
of 50 one-bedroom apartments.

[Figure: histogram of the class frequencies; x-axis: class interval, from [82 - 104) to [192 - 214); y-axis: class frequencies, 0 to 16]

Class Mid-point Frequency


82-104 93 5
104-126 115 7
126-148 137 13
148-170 159 14
170-192 181 8
192-214 203 3
V. Ogive:
 For a set of observations, we know how to construct a frequency
distribution.
 In some cases we may require the number of observations less than a
given value or more than a given value.
 This is obtained by accumulating (adding) the frequencies up to (or
above) the given value.
 This accumulated frequency is called cumulative frequency.
 A table listing these cumulative frequencies is called a cumulative
frequency table.
 The curve obtained by plotting cumulative frequencies is called a
cumulative frequency curve or an Ogive.
 There are two methods of constructing Ogive namely:
o The ‘ less than Ogive’ method
o The ‘more than Ogive’ method.
 In less than Ogive method we start with the upper limits of the classes and
go adding the frequencies. When these frequencies are plotted, we get a
rising curve.
 In more than Ogive method, we start with the lower limits of the classes
and from the total frequencies we subtract the frequency of each class.
When these frequencies are plotted we get a declining curve.
 Ogive (less-than Ogive Curve)
Class Frequency Less than
Ogive
82-104 5 5
104-126 7 12
126-148 13 25
148-170 14 39
170-192 8 47
192-214 3 50
[Figure: less-than Ogive; x-axis: class interval, from [82 - 104) to [192 - 214); y-axis: less-than cumulative frequency, 0 to 60]
 Ogive (greater than Ogive Curve)
Class Frequency Greater than
Ogive
82-104 5 50
104-126 7 45
126-148 13 38
148-170 14 25
170-192 8 11
192-214 3 3

[Figure: greater-than Ogive; x-axis: class interval, from [82 - 104) to [192 - 214); y-axis: greater-than cumulative frequency, 0 to 60]
CHAPTER THREE

MEASURES OF CENTRAL TENDENCY
AND DISPERSION
Summation Notation

Measures of central tendency
 A measure of central tendency may be defined as a single
expression of the net result of a complex group.
 Central tendency is a statistical measure that determines a
single value that accurately describes the center of the
distribution and represents the entire distribution of scores.
 There are two main objectives for the study of measures of
central tendency
• To get one single value that represent the entire data
• To facilitate comparison
 There are five averages. Among them mean, median and
mode are called simple averages and the other two averages
geometric mean and harmonic mean are called special
averages.
Arithmetic mean or mean

Arithmetic mean grouped data

 Example: Given the following frequency distribution, calculate
the arithmetic mean.
Marks                64   63   62   61   60   59
Number of students    8   18   12    9    7    6
Solution:
X       f      fx     d = X-A   d’ = d/1   fd and fd’
64      8     512        2         2          16
63     18    1134        1         1          18
62     12     744        0         0           0     (A)
61      9     549       -1        -1          -9
60      7     420       -2        -2         -14
59      6     354       -3        -3         -18
Total  60    3713                             -7
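The table above supports two routes to the mean: the direct method (Σfx / N) and the assumed-mean method (A + Σfd / N). The following Python sketch, with variable names of our own choosing, shows that both routes agree for this data when A = 62.

```python
marks = [64, 63, 62, 61, 60, 59]
students = [8, 18, 12, 9, 7, 6]

n = sum(students)

# Direct method: mean = sum(f * x) / N.
sum_fx = sum(f * x for x, f in zip(marks, students))
direct_mean = sum_fx / n

# Assumed-mean method: mean = A + sum(f * d) / N, with d = x - A.
assumed = 62
sum_fd = sum(f * (x - assumed) for x, f in zip(marks, students))
assumed_mean = assumed + sum_fd / n

print(sum_fx, sum_fd)
print(round(direct_mean, 2), round(assumed_mean, 2))
```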

Income Birr (100)    0-10   10-20   20-30   30-40   40-50   50-60   60-70
Number of persons      6       8      10      12       7       4       3

Income C.I   Frequency (f)   Mid (X)    fx      d      fd    d’ = d/10   fd’
0-10               6            5        30    -30    -180      -3       -18
10-20              8           15       120    -20    -160      -2       -16
20-30             10           25       250    -10    -100      -1       -10
30-40             12           35 (A)   420      0       0       0         0
40-50              7           45       315     10      70       1         7
50-60              4           55       220     20      80       2         8
60-70              3           65       195     30      90       3         9
Total             50                   1550           -200               -20
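For the grouped income data, the step-deviation shortcut (A + c · Σfd’ / N, with d’ = (X − A) / c) can be sketched as follows. The variable names are our own; A = 35 and c = 10 are the values used in the table.

```python
mid_points = [5, 15, 25, 35, 45, 55, 65]
persons = [6, 8, 10, 12, 7, 4, 3]

n = sum(persons)
assumed = 35   # assumed mean A (mid-point of the class 30-40)
width = 10     # common class width c

# Step deviations d' = (X - A) / c and their weighted sum.
sum_fd_step = sum(f * (x - assumed) / width for x, f in zip(mid_points, persons))
step_mean = assumed + width * sum_fd_step / n

# Direct method for comparison: sum(f * X) / N.
direct_mean = sum(f * x for x, f in zip(mid_points, persons)) / n

print(sum_fd_step)
print(step_mean, direct_mean)
```

Since income is recorded in units of Birr 100, a mean of 31 corresponds to Birr 3100.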
Merits and demerits of Arithmetic mean
 Merits:
It is rigidly defined.
It is easy to understand and easy to calculate.
If the number of items is sufficiently large, it is more accurate and
more reliable.
It is a calculated value and is not based on its position in the series.
It is possible to calculate even if some of the details of the data
are lacking.
It provides a good basis for comparison.
 Demerits:
It cannot be obtained by inspection nor located through a
frequency graph.
It cannot be used in the study of qualitative phenomena not capable of
numerical measurement, i.e. intelligence, beauty, honesty etc.
It can ignore any single item only at the risk of losing its accuracy.
It is affected very much by extreme values.
Weighted Arithmetic mean

Combined Mean

Harmonic mean (H.M)
 Harmonic mean of a set of observations is defined as the
reciprocal of the arithmetic average of the reciprocals of the given
values. If x1, x2, ..., xn are n observations:
\[ H.M = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} \]
and, for a frequency distribution with N = Σf:
\[ H.M = \frac{N}{\sum f_i \left(\frac{1}{x_i}\right)} \]
 Example: From the given data calculate H.M: 5, 10, 17, 24, 30
X        1/X
5       0.2000
10      0.1000
17      0.0588
24      0.0417
30      0.0333
Total   0.4338
\[ H.M = \frac{5}{0.4338} = 11.53 \]
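The calculation can be reproduced in Python as a quick check. This sketch applies the definition directly (n divided by the sum of reciprocals) and compares it with the standard library's `statistics.harmonic_mean`; the variable names are our own.

```python
import statistics

values = [5, 10, 17, 24, 30]

# Definition: reciprocal of the arithmetic mean of the reciprocals.
sum_of_reciprocals = sum(1 / x for x in values)
harmonic = len(values) / sum_of_reciprocals

print(round(sum_of_reciprocals, 4))
print(round(harmonic, 2))

# The standard library gives the same result.
print(round(statistics.harmonic_mean(values), 2))
```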
 Example: The marks secured by some students of a class are
given below. Calculate the harmonic mean.
Marks             20   21   22   23   24   25
No. of students    4    2    7    1    3    1
Solution:
Marks (x)   No. of students (f)    1/x      f(1/x)
20                  4             0.0500    0.2000
21                  2             0.0476    0.0952
22                  7             0.0454    0.3178
23                  1             0.0435    0.0435
24                  3             0.0417    0.1251
25                  1             0.0400    0.0400
Total              18                       0.8216
\[ H.M = \frac{N}{\sum f\left(\frac{1}{x}\right)} = \frac{18}{0.8216} = 21.91 \]
Geometric mean
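 The geometric mean of a series containing n observations is the
nth root of the product of the values. If x1, x2, ..., xn are
observations then:
For ungrouped data:
\[ G.M = \sqrt[n]{x_1 x_2 \cdots x_n} = \text{Antilog}\left(\frac{\sum \log x_i}{n}\right) \]
For grouped data, with N = Σf:
\[ G.M = \text{Antilog}\left(\frac{\sum f_i \log x_i}{N}\right) \]
 Example: Calculate the geometric mean of the following series
of monthly income of a batch of families: 180, 250, 490, 1400,
1050.
x        log x
180      2.2553
250      2.3979
490      2.6902
1400     3.1461
1050     3.0212
Total   13.5107
\[ G.M = \text{Antilog}\left(\frac{13.5107}{5}\right) = \text{Antilog}(2.7021) = 503.6 \]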
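The logarithm route to the geometric mean can be sketched in Python as below. The variable names are our own; because the code uses full-precision logarithms rather than four-figure tables, its result is close to, but not digit-for-digit identical with, the 503.6 obtained above.

```python
import math

incomes = [180, 250, 490, 1400, 1050]

# G.M = antilog(sum(log x) / n), using base-10 logarithms.
log_sum = sum(math.log10(x) for x in incomes)
geometric = 10 ** (log_sum / len(incomes))

# Equivalent definition: nth root of the product of the values.
root_of_product = math.prod(incomes) ** (1 / len(incomes))

print(round(log_sum, 4))
print(geometric, root_of_product)
```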
 Example: Calculate the average income per head from the
data given below .Use geometric mean.
Class of people      Number of families   Monthly income per head (Birr)
Landlords                    2                    5000
Cultivators                100                     400
Landless – labors           50                     200
Money – lenders              4                    3750
Office Assistants            6                    3000
Shop keepers                 8                     750
Carpenters                   6                     600
Weavers                     10                     300
Solution
Class of people      Number of families (f)   Monthly income per head (Birr) x   log x     f log x
Landlords                    2                        5000                       3.6990      7.398
Cultivators                100                         400                       2.6021    260.210
Landless – labors           50                         200                       2.3010    115.050
Money – lenders              4                        3750                       3.5740     14.296
Office Assistants            6                        3000                       3.4771     20.863
Shop keepers                 8                         750                       2.8751     23.001
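The weighted geometric mean for all eight classes of people listed in the question can be sketched in Python as follows. The structure and names (`groups`, `weighted_log_sum`) are our own; the code applies the grouped-data formula Antilog(Σ f log x / N) with full-precision logarithms.

```python
import math

# (class of people, number of families f, monthly income per head x in Birr)
groups = [
    ("Landlords", 2, 5000),
    ("Cultivators", 100, 400),
    ("Landless labors", 50, 200),
    ("Money lenders", 4, 3750),
    ("Office Assistants", 6, 3000),
    ("Shop keepers", 8, 750),
    ("Carpenters", 6, 600),
    ("Weavers", 10, 300),
]

n = sum(f for _, f, _ in groups)

# Sum of f * log10(x) over all classes.
weighted_log_sum = sum(f * math.log10(x) for _, f, x in groups)

# G.M = antilog(sum(f log x) / N).
geometric = 10 ** (weighted_log_sum / n)

print(n)
print(round(weighted_log_sum, 2))
print(round(geometric))
```

The large number of low-income families pulls the geometric mean well below the incomes of the smaller, richer groups.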
Median
 The median is that value of the variant which divides the group
into two equal parts, one part comprising all values greater,
and the other, all values less than median.
 Median is defined as the value of the middle item or the mean
of the values of the two middle items when the data are
arranged in an ascending or descending order of magnitude.
 Ungrouped or Raw data:
 In an ungrouped frequency distribution if the n values are arranged in
ascending or descending order of magnitude, the median is the middle
value if n is odd.
 When n is even, the median is the mean of the two middle values. By
the formula:
\[ \text{Median (Md)} = \left(\frac{n + 1}{2}\right)^{\text{th}} \text{ item} \]
 Example: When odd number of values are given. Find median for the following data
25, 18, 27, 10, 8, 30, 42, 20, 53
Solution:
 Arranging the data in increasing order: 8, 10, 18, 20, 25, 27, 30, 42, 53
 The middle value is the 5th item, i.e., 25 is the median.
\[ \text{Median (Md)} = \frac{n + 1}{2} = \frac{9 + 1}{2} = \frac{10}{2} = 5^{\text{th}} \text{ item} = 25 \]
 Example: When even number of values are given. Find median for the following data
5, 8, 12, 30, 18, 10, 2, 22
Solution:
 Arranging the data in the increasing order 2, 5, 8, 10, 12, 18, 22, 30
 Here the median is the mean of the middle two items, i.e. the mean of (10, 12):
\[ \frac{10 + 12}{2} = 11 \]
 Using the formula,
\[ \text{Median (Md)} = \frac{n + 1}{2} = \frac{8 + 1}{2} = \left(\frac{9}{2}\right)^{\text{th}} \text{ item} = 4.5^{\text{th}} \text{ item} \]
= 4th item + (1/2) (5th item - 4th item)
= 10 + (1/2) (12 - 10)
= 10 + (1/2) x 2
= 10 + 1 = 11
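The positional rule used in both examples, locating the (n + 1)/2-th item and interpolating when the position is fractional, can be sketched in Python. The function name `median_by_position` is our own.

```python
def median_by_position(values):
    """Median as the ((n + 1) / 2)-th item of the ordered data."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2     # 1-based position of the median
    whole = int(position)                  # item just at or below the position
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    # Fractional position: move part of the way toward the next item.
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])

print(median_by_position([25, 18, 27, 10, 8, 30, 42, 20, 53]))  # odd n
print(median_by_position([5, 8, 12, 30, 18, 10, 2, 22]))        # even n
```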
 Grouped Data:
 In a grouped distribution, values are associated with frequencies.
 Grouping can be in the form of a discrete frequency distribution or a
continuous frequency distribution.
 Whatever may be the type of distribution, cumulative frequencies
have to be calculated to know the total number of items.
 In the case of a grouped series, the median is calculated by linear
interpolation with the help of the following formula:
\[ M = l_1 + \frac{l_2 - l_1}{f} \times (m - c) \]
Where, M = the median
l1 = the lower limit of the class in which the median
lies
l2 = the upper limit of the class in which the median
lies
f = the frequency of the class in which the median lies
m = the middle item, i.e. the (n + 1)/2 th item
c = the cumulative frequency of the class
preceding the one in which the median lies.
 Example: find the median for the following grouped data.

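Marks of students   Frequency   Cumulative frequency
12 – 17                 1                1
18 – 23                 2                3
24 – 29                 4                7
30 – 35                13               20
36 – 41                18               38
42 – 47                12               50
Total                  50
\[ m = \frac{50 + 1}{2} = \frac{51}{2} = 25.5 \]
It means the median lies in the class interval 36-41, the first class whose
cumulative frequency (38) reaches 25.5.
\[ M = l_1 + \frac{l_2 - l_1}{f} \times (m - c) = 36 + \frac{41 - 36}{18} \times (25.5 - 20) = 36 + \frac{5 \times 5.5}{18} = 37.5 \text{ marks} \]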
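The interpolation can be sketched in Python as below. This sketch follows the formula exactly as given in these notes (stated class limits and m = (n + 1)/2); the function name and the tuple layout are our own assumptions.

```python
def grouped_median(classes):
    """Median by linear interpolation: M = l1 + (l2 - l1) / f * (m - c).

    `classes` is a list of (lower limit, upper limit, frequency) tuples.
    """
    n = sum(f for _, _, f in classes)
    m = (n + 1) / 2                      # position of the middle item
    cumulative = 0                       # c: cumulative frequency so far
    for lower, upper, f in classes:
        if cumulative + f >= m:          # median class found
            return lower + (upper - lower) / f * (m - cumulative)
        cumulative += f
    raise ValueError("empty frequency distribution")

marks = [(12, 17, 1), (18, 23, 2), (24, 29, 4),
         (30, 35, 13), (36, 41, 18), (42, 47, 12)]

print(round(grouped_median(marks), 1))
```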
Graphic method for Location of median
 Median can be located with the help of the cumulative frequency curve or
‘Ogive’.
 From the point of intersection of ‘ less than’ and ‘more than’ Ogive, if a
perpendicular is drawn on the x-axis, the point so obtained on the
horizontal axis gives the value of the median.
 Merits of Median:
 Median is not influenced by extreme values because it is a
positional average.
 Median can be calculated in case of distribution with open
end intervals.
 Median can be located even if the data are incomplete.
 Median can be located even for qualitative factors such as
ability, honesty etc.
 Demerits of Median:
 A slight change in the series may bring drastic change in
median value.
 In case of even number of items or continuous series,
median is an estimated value other than any value in the
series.
 It is not suitable for further mathematical treatment except
its use in mean deviation.
 It does not take into account all the observations.
Mode
 The mode refers to that value in a distribution which occurs
most frequently.
 It is an actual value, which has the highest concentration of
items in and around it.
 It shows the center of concentration of the frequency in
around a given value.
 Therefore, where the purpose is to know the point of the
highest concentration it is preferred.
 It is, thus, a positional measure. Its importance is very great in
marketing studies where a manager is interested in knowing
about the size, which has the highest concentration of items.
 Ungrouped or Raw Data:
Example: 2, 7, 10, 15, 10, 17, 8, 10, 2
Mode = M0=10
 In some cases the mode may be absent while in some cases there may be
more than one mode.
Example:
1. 12, 10, 15, 24, 30 (no mode)
2. 7, 10, 15, 12, 7, 14, 24, 10, 7, 20, 10
The modes are 7 and 10
 Grouped Data:
 For a discrete distribution, the value of X corresponding to the highest frequency is the
mode. In the case of grouped data, the mode is determined by the following formula:
\[ \text{Mode} = l + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times c \]
Where, l = the lower value of the class in which the mode lies.
f1 = the frequency of the class in which the mode lies
f0 = the frequency of the class preceding the modal class
f2 = the frequency of the class succeeding the modal class
c = the class interval of the modal class
While applying the above formula, we should ensure that the class intervals are uniform.
 Example: The following is the distribution of the size of certain farms
selected at random from a district. Calculate the mode of the distribution.
Size of farms   No. of farms
5-15                 8
16-25               12
26-35               17
36-45               29
46-55               31
56-65                5
66-75                3
Solution:
 The highest frequency is 31 and the corresponding class interval is 46 – 55,
which is the modal class. Here l=46, f1=31, f0=29, f2=5, c=10
\[ \text{Mode} = 46 + \frac{31 - 29}{(31 - 29) + (31 - 5)} \times 10 = 46 + \frac{2}{28} \times 10 = 46.7 \]
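A minimal Python sketch of the grouped-mode formula, applied to the farm data, is given below. The function name is our own, and treating a missing neighbouring class as frequency 0 is an assumption of this sketch, not something stated in the notes.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode = l + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * c for the modal class."""
    i = freqs.index(max(freqs))                       # modal class: highest frequency
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                 # assumption: no class before -> 0
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0    # assumption: no class after -> 0
    return lower_limits[i] + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * width

lower_limits = [5, 16, 26, 36, 46, 56, 66]
farms = [8, 12, 17, 29, 31, 5, 3]

print(round(grouped_mode(lower_limits, farms, 10), 1))
```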
Determination of Modal class
 For a frequency distribution modal class corresponds to the maximum frequency. But in
any one (or more) of the following cases:
 If the maximum frequency is repeated
 If the maximum frequency occurs in the beginning or at the end of the distribution
 If there are irregularities in the distribution, the modal class is determined by the
method of grouping.
 Steps for Calculation:
We prepare a grouping table with 6 columns:
1. In column I, we write down the given frequencies.
2. Column II is obtained by combining the frequencies two by two.
3. Leave the 1st frequency and combine the remaining frequencies two by two and write
in column III.
4. Column IV is obtained by combining the frequencies three by three.
5. Leave the 1st frequency and combine the remaining frequencies three by three and
write in column V.
6. Leave the 1st and 2nd frequencies and combine the remaining frequencies three by
three and write in column VI.


 Mark the highest frequency in each column. Then form an analysis table to
find the modal class. After finding the modal classes use the formula to
calculate the modal value.
 Example: Calculate mode for the following frequency
distribution.
Class 0-5 5-10 10-15 15-20 20-25 25-30 30-35 35-40
interval
Frequency 9 12 15 16 17 15 10 13

Solution:
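Grouping Table (each combined total is written against the first class of its group; "-" marks an empty cell)
CI       f (1)    2     3     4     5     6
0-5        9     21     -    36     -     -
5-10      12      -    27     -    43     -
10-15     15     31     -     -     -    48
15-20     16      -    33    48     -     -
20-25     17     32     -     -    42     -
25-30     15      -    25     -     -    38
30-35     10     23     -     -     -     -
35-40     13      -     -     -     -     -
Analysis Table
Columns   0-5   5-10   10-15   15-20   20-25   25-30   30-35   35-40
1          -     -       -       -       1       -       -       -
2          -     -       -       -       1       1       -       -
3          -     -       -       1       1       -       -       -
4          -     -       -       1       1       1       -       -
5          -     1       1       1       -       -       -       -
6          -     -       1       1       1       -       -       -
Total      0     1       2       4       5       2       0       0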
 The maximum occurred corresponding to 20-25, and hence it is the modal
class.
Here l=20, f1=17, f0=16, f2=15, c=5

\[ \text{Mode} = 20 + \frac{17 - 16}{(17 - 16) + (17 - 15)} \times 5 = 20 + \frac{1}{3} \times 5 = 21.67 \]
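The grouping method can also be sketched in Python. The sketch below builds the six columns, marks the highest total in each, and counts how often every class is marked, which reproduces the Total row of the analysis table. The function names are our own, and the final step assumes the modal class is not the first or last class.

```python
def grouping_tallies(freqs):
    """Count, for each class, how often it belongs to a column's highest group."""
    n = len(freqs)
    tallies = [0] * n
    # (group size, number of classes left out at the start) for columns I to VI.
    columns = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
    for size, skip in columns:
        starts = range(skip, n - size + 1, size)
        totals = {s: sum(freqs[s:s + size]) for s in starts}
        highest = max(totals.values())
        for start, total in totals.items():
            if total == highest:
                for i in range(start, start + size):
                    tallies[i] += 1
    return tallies

def mode_from_modal_class(lower, f1, f0, f2, width):
    """Mode = l + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * c."""
    return lower + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * width

lower_limits = [0, 5, 10, 15, 20, 25, 30, 35]
freqs = [9, 12, 15, 16, 17, 15, 10, 13]

tallies = grouping_tallies(freqs)
modal = tallies.index(max(tallies))   # class marked most often (assumed interior)
mode = mode_from_modal_class(lower_limits[modal], freqs[modal],
                             freqs[modal - 1], freqs[modal + 1], 5)

print(tallies)
print(round(mode, 2))
```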
Graphic Location of mode
 Steps:
1. Draw a histogram of the given distribution.
2. Join the top right corner of the highest rectangle (modal class rectangle)
by a straight line to the top right corner of the preceding rectangle.
Similarly, join the top left corner of the highest rectangle to the top left
corner of the rectangle on the right.
3. From the point of intersection of these two diagonal lines, draw a
perpendicular to the x-axis.
4. The value read on the x-axis at the foot of the perpendicular gives the mode.
 Merits of Mode:
 It is easy to calculate and in some cases it can be located
by mere inspection.
 Mode is not at all affected by extreme values.
 It can be calculated for open-end classes.
 It is usually an actual value of an important part of the
series.
 In some circumstances it is the best representative of data.
 Demerits of mode:
 It is not based on all observations.
 It is not capable of further mathematical treatment.
 Mode is ill-defined generally; it is not possible to find mode
in some cases.
 As compared with mean, mode is affected to a great extent,
by sampling fluctuations.
 It is unsuitable in cases where relative importance of items
has to be considered.
Empirical Relationship Between Averages
 In a symmetrical distribution the three simple averages coincide: mean = median =
mode.
 For a moderately asymmetrical distribution, the relationship between them was
given by Prof. Karl Pearson as: mode = 3 median - 2 mean.
Example: If the mean and median of a moderately asymmetrical series are 26.8
and 27.9 respectively, what would be its most probable mode?
Solution:
Mode = 3 median - 2 mean
= 3 x 27.9 - 2 x 26.8
= 30.1
Example: In a moderately asymmetrical distribution the values of mode and mean
are 32.1 and 35.4 respectively. Find the median value.
Solution:
Median =1/3[2mean+mode]
=1/3 [2 x 35.4 + 32.1]
= 34.3
Example: In a moderately asymmetrical distribution the values of mode and
median are 30.1 and 34.3 respectively. Find the mean value.
Solution:
Mean = ½(3 median - mode)
= ½(3 x 34.3 - 30.1)
= 36.4
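The three rearrangements of Pearson's empirical relation can be checked with a short Python sketch. The function names are illustrative assumptions; each one is just mode = 3 median - 2 mean solved for a different unknown.

```python
def mode_from(mean, median):
    # mode = 3 median - 2 mean
    return 3 * median - 2 * mean


def median_from(mean, mode):
    # median = (2 mean + mode) / 3
    return (2 * mean + mode) / 3


def mean_from(median, mode):
    # mean = (3 median - mode) / 2
    return (3 * median - mode) / 2


print(round(mode_from(26.8, 27.9), 1))
print(round(median_from(35.4, 32.1), 1))
print(round(mean_from(34.3, 30.1), 1))
```

Because all three functions come from the same relation, feeding the output of one into another returns the starting value, which is a quick consistency check on hand calculations.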
Quartiles
 The quartiles divide the distribution in to four parts.
 The first (lower) quartile (Q1) marks off the first one-fourth, the third
(upper) quartile (Q3) marks off the three-fourth.
 The second quartile divides the distribution into two halves and
therefore is the same as the median.

25% Q1 25% Q2 25% Q3 25%


25% of the items are less than Q1,
50% of the items are less than Q2,
75% of the items are less than Q3.
 Raw or ungrouped data:
 First arrange the given data in increasing order, then use the formulas for Q1 and Q3:
Q1 = ((n + 1)/4)th item and Q3 = (3(n + 1)/4)th item
 Example: Compute quartiles for the data given below 25, 18,
30, 8, 15, 5, 10, 35, 40, 45
Solution:
5, 8, 10, 15, 18, 25, 30, 35, 40, 45
Q1 = ((n + 1)/4)th item = ((10 + 1)/4)th item = 2.75th item
= 2nd item + (3/4) (3rd item - 2nd item)
= 8 + (3/4) (10 - 8)
= 9.5
Q3 = (3(n + 1)/4)th item
= (3 x 2.75)th item
= 8.25th item
= 8th item + (1/4) (9th item - 8th item)
= 35 + (1/4) (40 - 35) = 36.25
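The positional rule for raw data can be sketched in Python as follows. The helper names `item_at` and `quartiles` are assumptions made for this illustration; the interpolation is the same "kth item plus a fraction of the gap to the next item" used in the solution above.

```python
import math


def item_at(sorted_values, position):
    """Value at a 1-based, possibly fractional, position (linear interpolation)."""
    lower = math.floor(position)
    fraction = position - lower
    if lower < 1:
        return sorted_values[0]
    if lower >= len(sorted_values):
        return sorted_values[-1]
    below, above = sorted_values[lower - 1], sorted_values[lower]
    return below + fraction * (above - below)


def quartiles(values):
    data = sorted(values)
    n = len(data)
    q1 = item_at(data, (n + 1) / 4)        # ((n + 1)/4)th item
    q3 = item_at(data, 3 * (n + 1) / 4)    # (3(n + 1)/4)th item
    return q1, q3


q1, q3 = quartiles([25, 18, 30, 8, 15, 5, 10, 35, 40, 45])
print(q1, q3)  # 9.5 36.25
```

Statistical packages often use other interpolation conventions, so their quartiles may differ slightly from this (n + 1) rule.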
 Discrete Series:
 Step 1: Find the cumulative frequencies.
 Step 2: Find (n + 1)/4, where n is the total frequency.
 Step 3: In the cumulative frequencies, find the value just
greater than (n + 1)/4; the corresponding value of X is Q1.
 Step 4: Find 3(n + 1)/4.
 Step 5: In the cumulative frequencies, find the value just
greater than 3(n + 1)/4; the corresponding value of X is Q3.
 Example: Compute quartiles for the data given below.
X    5    8    12    15    19    24    30
f    4    3     2     4     5     2     4

Solution:
X        f     c.f
5        4      4
8        3      7
12       2      9
15       4     13
19       5     18
24       2     20
30       4     24
Total   24

Q1 = ((n + 1)/4)th item = ((24 + 1)/4)th item = 6.25th item
Q3 = (3(n + 1)/4)th item = (3 x 6.25)th item = 18.75th item
The cumulative frequency just greater than 6.25 is 7, so Q1 = 8.
The cumulative frequency just greater than 18.75 is 20, so Q3 = 24.
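The cumulative-frequency lookup for a discrete series can be sketched in Python as below. The function name `discrete_quartile` is an assumption for this illustration.

```python
from itertools import accumulate


def discrete_quartile(xs, fs, k):
    """kth quartile of a discrete series: first X whose c.f. reaches k(n + 1)/4."""
    n = sum(fs)
    target = k * (n + 1) / 4
    for x, cf in zip(xs, accumulate(fs)):
        if cf >= target:
            return x
    return xs[-1]


xs = [5, 8, 12, 15, 19, 24, 30]
fs = [4, 3, 2, 4, 5, 2, 4]
q1 = discrete_quartile(xs, fs, 1)
q3 = discrete_quartile(xs, fs, 3)
print(q1, q3)  # 8 24
```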
Continuous series:
Step 1: Find the cumulative frequencies.
Step 2: Find N/4.
Step 3: In the cumulative frequencies, find the value just greater than N/4; the
corresponding class interval is called the first quartile class.
Step 4: Find 3(N/4). In the cumulative frequencies, find the value just greater than
3(N/4); the corresponding class interval is called the third quartile class.
 Then apply the respective formulae:
Q1 = l1 + [(N/4 - m1) / f1] x c1
Q3 = l3 + [(3(N/4) - m3) / f3] x c3
Where, l1 = the lower limit of the first quartile class
f1 = the frequency of the first quartile class
c1 = width of the first quartile class
m1 = c.f. preceding the first quartile class
and l3, f3, c3 and m3 are the corresponding values for the third quartile class.
Example: The following series relates to the marks secured by students
in an examination. Find the quartiles:
Solution:
Marks (CI)   No. of students (f)   cf
0-10              11               11
10-20             18               29
20-30             25               54
30-40             28               82
40-50             30              112
50-60             33              145
60-70             22              167
70-80             15              182
80-90             12              194
90-100            10              204
Total            204
N/4 = 204/4 = 51, so the first quartile class is 20-30.
3(N/4) = 3(51) = 153, so the third quartile class is 60-70.
Q1 = l1 + [(N/4 - m1) / f1] x c1
= 20 + [(51 - 29) / 25] x 10
= 28.8
Q3 = l3 + [(3(N/4) - m3) / f3] x c3
= 60 + [(153 - 145) / 22] x 10
= 63.64
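The interpolation formula for a continuous series can be sketched in Python. The function name `grouped_quantile` and its `parts` parameter are assumptions for this illustration; with `parts=4` it gives quartiles, and the same idea carries over to other equal divisions of the total frequency.

```python
def grouped_quantile(lower_limits, width, freqs, k, parts=4):
    """l + ((k*N/parts - m) / f) * c for the class containing position k*N/parts."""
    total = sum(freqs)
    target = k * total / parts
    cum_before = 0                      # m: c.f. preceding the current class
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cum_before + f >= target:
            return lower + (target - cum_before) / f * width
        cum_before += f
    raise ValueError("target position exceeds the total frequency")


lower_limits = list(range(0, 100, 10))             # 0-10, 10-20, ..., 90-100
freqs = [11, 18, 25, 28, 30, 33, 22, 15, 12, 10]
q1 = grouped_quantile(lower_limits, 10, freqs, 1)
q3 = grouped_quantile(lower_limits, 10, freqs, 3)
print(round(q1, 2), round(q3, 2))  # 28.8 63.64
```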
Deciles
 These are the values which divide the total number of
observations into 10 equal parts. There are 9 deciles: D1, D2, …,
D9. They are called the first decile, second decile, etc.
 Deciles for Raw data or ungrouped data
Example: Compute D5 for the data given below
5, 24, 36, 12, 20, 8
Solution: Arranging the given values in the increasing order
5, 8, 12, 20, 24, 36
D5 = (5(n + 1)/10)th item = (5(6 + 1)/10)th item = (35/10)th item = 3.5th item
= 3rd item + ½(4th item - 3rd item)
= 12 + ½(20 - 12)
= 16
 Deciles for Grouped data:
Example: Calculate D3 and D7 for the data given below.
Class Interval 0-10 10-20 20-30 30-40 40-50 50-60 60-70
Frequency 5 7 12 16 10 8 4
Solution:
C.I      f     cf
0-10      5     5
10-20     7    12
20-30    12    24
30-40    16    40
40-50    10    50
50-60     8    58
60-70     4    62
Total    62
D3 item = (3N/10)th item = (3(62)/10)th item
= 18.6th item, which lies in the interval 20-30
D3 = l + [(3N/10 - m) / f] x c
= 20 + [(18.6 - 12) / 12] x 10
= 25.5
D7 item = (7N/10)th item = (7(62)/10)th item
= 43.4th item, which lies in the interval 40-50
D7 = l + [(7N/10 - m) / f] x c
= 40 + [(43.4 - 40) / 10] x 10
= 43.4
Percentiles
 The percentile values divide the distribution into 100 parts
each containing 1 percent of the cases. The percentile (Pk) is
that value of the variable up to which lies exactly k% of the
total number of observations.
 Relationship: P25 = Q1 ; P50 = D5 = Q2 = Median and P75 = Q3
Example: Calculate P15 for the data given below:
5, 24, 36, 12, 20, 8
Solution: Arranging the given values in the increasing order.
5, 8, 12, 20, 24, 36
P15 = (15(n + 1)/100)th item = (15(6 + 1)/100)th item = 1.05th item
= 1st item + 0.05(2nd item - 1st item)
= 5 + 0.05(8-5)
= 5.15
 Percentile for grouped data:
Example: Find P53 for the following frequency distribution.
Class Interval   0-5   5-10   10-15   15-20   20-25   25-30   30-35   35-40
Frequency          5      8      12      16      20      10       4       3
Solution:
C.I      f     cf
0-5       5     5
5-10      8    13
10-15    12    25
15-20    16    41
20-25    20    61
25-30    10    71
30-35     4    75
35-40     3    78
Total    78
P53 item = (53N/100)th item = (53(78)/100)th item
= 41.34th item, which lies in the interval 20-25
P53 = l + [(53N/100 - m) / f] x c
= 20 + [(41.34 - 41) / 20] x 5
= 20.085
Measures of Dispersion
 The measures of central tendency serve to locate the center of
the distribution, but they do not reveal how the items are
spread out on either side of the center.
 This characteristic of a frequency distribution is commonly
referred to as dispersion.
 Dispersion is the degree to which numerical data tend to
spread about an average value.
 In a series all the items are not equal; there is difference or
variation among the values.
 The degree of variation is evaluated by various measures of
dispersion. Small dispersion indicates high uniformity of the
items, while large dispersion indicates less uniformity.
Absolute and Relative Measures
 Absolute measure of dispersion indicates the amount of variation in
a set of values in terms of units of observations.
 Absolute measures are not suitable for comparing the variability of
two distributions which are expressed in different units of
measurement and different average size.
 Relative measures of dispersion are free from the units of
measurements of the observations. They are pure numbers.
 They are used to compare the variation in two or more sets, which
are having different units of measurements of observations.
 The various absolute and relative measures of dispersion are listed
below.
Absolute measure Relative measure
1. Range 1.Co-efficient of Range
2. Quartile deviation 2.Co-efficient of Quartile deviation
3. Mean deviation 3. Co-efficient of Mean deviation
4. Standard deviation 4.Co-efficient of variation
Range and coefficient of Range
i. Range:
 It is defined as the difference between the largest and smallest values of the
variable.
Range = L – S.
Where L = Largest value.
S = Smallest value.
 In individual observations and discrete series, L and S are easily
identified. In continuous series, the following two methods are followed.
Method 1: L = Upper boundary of the highest class
S = Lower boundary of the lowest class.
Method 2: L = mid value of the highest class.
S = mid value of the lowest class.
ii. Co-efficient of Range:
Co-efficient of Range = (L - S) / (L + S)
Example: Find the value of range and its co-efficient for the
following data.
7, 9, 6, 8, 11, 10, 4
Solution:
L = 11, S = 4.
Range = L - S = 11 - 4 = 7
Co-efficient of Range = (L - S) / (L + S) = (11 - 4) / (11 + 4) = 7/15 = 0.4667
Example 2:
Calculate range and its co efficient from the following
distribution.
Size: 60-63 63-66 66-69 69-72 72-75
Number: 5 18 42 27 8
Solution:
L = Upper boundary of the highest class = 75
S = Lower boundary of the lowest class = 60
Range = L – S = 75 – 60 = 15
Co-efficient of Range = (L - S) / (L + S) = (75 - 60) / (75 + 60) = 15/135 = 0.1111
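Both range examples can be reproduced with a few lines of Python. The function name `range_and_coefficient` is an assumption for this sketch; for the continuous series it is called with the two class boundaries (Method 1) rather than with raw observations.

```python
def range_and_coefficient(values):
    """Return (L - S, (L - S) / (L + S)) for the largest and smallest value."""
    largest, smallest = max(values), min(values)
    spread = largest - smallest
    return spread, spread / (largest + smallest)


r1, c1 = range_and_coefficient([7, 9, 6, 8, 11, 10, 4])
r2, c2 = range_and_coefficient([60, 75])   # lowest and highest class boundaries
print(r1, round(c1, 4))
print(r2, round(c2, 4))
```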
 Merits and Demerits of Range:
 Merits:
It is simple to understand.
It is easy to calculate.
In certain types of problems like quality control, weather
forecasts, share price analysis, etc., range is most widely
used.
 Demerits:
It is very much affected by the extreme items.
It is based on only two extreme observations.
It cannot be calculated from open-end class intervals.
It is not suitable for mathematical treatment.
It is a very rarely used measure.
Quartile Deviation and Co-efficient of Quartile Deviation
i. Quartile Deviation (Q.D):
 Quartile Deviation is half of the difference between the third and first
quartiles.
Q.D = (Q3 - Q1) / 2
ii. Co-efficient of Quartile Deviation:
Co-efficient of Q.D = (Q3 - Q1) / (Q3 + Q1)
Example: Find the Quartile Deviation for the following data:
391, 384, 591, 407, 672, 522, 777, 733, 1490, 2488
Solution:
Arrange the given values in ascending order.
384, 391, 407, 522, 591, 672, 733, 777, 1490, 2488.
Position of Q1 is (n + 1)/4 = (10 + 1)/4 = 2.75th item
Q1 = 2nd value + 0.75 (3rd value - 2nd value)
= 391 + 0.75 (407 - 391)
= 391 + 0.75 x 16
= 391 + 12
= 403
Position of Q3 is 3(n + 1)/4 = 3 x 2.75 = 8.25th item
Q3 = 8th value + 0.25 (9th value - 8th value)
= 777 + 0.25 (1490 - 777)
= 777 + 0.25 (713)
= 777 + 178.25 = 955.25
Q.D = (Q3 - Q1) / 2 = (955.25 - 403) / 2
= 276.125
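A minimal Python sketch of the quartile deviation for raw data follows. The function names are assumptions, and the helper `item_at` assumes the fractional position falls strictly inside the data (true for this example).

```python
import math


def item_at(sorted_values, position):
    """Interpolated value at a 1-based fractional position (1 <= position < n)."""
    lower = math.floor(position)
    below, above = sorted_values[lower - 1], sorted_values[lower]
    return below + (position - lower) * (above - below)


def quartile_deviation(values):
    data = sorted(values)
    n = len(data)
    q1 = item_at(data, (n + 1) / 4)
    q3 = item_at(data, 3 * (n + 1) / 4)
    # Q.D = (Q3 - Q1) / 2 and its coefficient (Q3 - Q1) / (Q3 + Q1)
    return q1, q3, (q3 - q1) / 2, (q3 - q1) / (q3 + q1)


data = [391, 384, 591, 407, 672, 522, 777, 733, 1490, 2488]
q1, q3, qd, coeff = quartile_deviation(data)
print(q1, q3, qd, round(coeff, 4))
```

The coefficient is not asked for in the example, but it comes from the same two quartiles at no extra cost.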
Example: For the data given below, give the quartile deviation and coefficient
of quartile deviation.
X:   351-500   501-650   651-800   801-950   951-1100
f:      48       189        88        47        28
Solution:
x            f      True class intervals    Cumulative frequency
351-500      48     350.5-500.5              48
501-650     189     500.5-650.5             237
651-800      88     650.5-800.5             325
801-950      47     800.5-950.5             372
951-1100     28     950.5-1100.5            400
Total     N = 400
N/4 = 400/4 = 100
Q1 class is 500.5-650.5
l1 = 500.5, m1 = 48, f1 = 189, c1 = 150
Q1 = l1 + [(N/4 - m1) / f1] x c1
= 500.5 + [(100 - 48) / 189] x 150
Q1 = 541.77
3(N/4) = 3 x 100 = 300
Q3 class is 650.5-800.5
l3 = 650.5, m3 = 237, f3 = 88, c3 = 150
Q3 = l3 + [(3(N/4) - m3) / f3] x c3
= 650.5 + [(300 - 237) / 88] x 150
Q3 = 757.89
Q.D = (Q3 - Q1) / 2 = (757.89 - 541.77) / 2
Q.D = 108.06
Coefficient of Q.D = (Q3 - Q1) / (Q3 + Q1) = (757.89 - 541.77) / (757.89 + 541.77) = 0.1663
 Merits and Demerits of Quartile Deviation
 Merits:
It is Simple to understand and easy to calculate
 It is not affected by extreme values.
It can be calculated for data with open end classes
also.
 Demerits:
It is not based on all the items. It is based on two
positional values Q1 and Q3 and ignores the extreme
50% of the items.
 It is not amenable to further mathematical
treatment.
It is affected by sampling fluctuations.
Mean Deviation and Coefficient of Mean Deviation

i. Mean Deviation:
 The range and quartile deviation are not based on all
observations. They are positional measures of dispersion.
They do not show any scatter of the observations from an
average.
 The mean deviation is a measure of dispersion based on all
items in a distribution.
 Mean deviation is the arithmetic mean of the deviations of a
series computed from any measure of central tendency, i.e.,
the mean, median or mode. All the deviations are taken as
positive, i.e., signs are ignored.
 According to Clark and Schekade, “Average deviation is the
average amount scatter of the items in a distribution from
either the mean or the median, ignoring the signs of the
deviations”.
ii. Coefficient of mean deviation:
 Mean deviation calculated by any measure of central
tendency is an absolute measure.
 For the purpose of comparing variation among different series,
a relative mean deviation is required.
 The relative mean deviation is obtained by dividing the mean
deviation by the average used for calculating mean deviation.
Coefficient of mean deviation = Mean deviation / (Mean or Median or Mode)
 If the result is desired as a percentage:
Coefficient of mean deviation = [Mean deviation / (Mean or Median or Mode)] x 100
Computation of mean deviation - Individual Series:
 Calculate the average (mean, median or mode) of the series.
 Take the deviations of the items from the average, ignoring signs, and denote
these deviations by |D|.
 Compute the total of these deviations, i.e., ∑|D|
 Divide this total by the number of items.
Symbolically: M.D. = ∑|D| / n
Example: Calculate mean deviation from mean and median for the following data:
100,150,200,250,360,490,500,600,671 also calculate coefficients of M.D.
Solution:
Arrange the data in ascending order:
100, 150, 200, 250, 360, 490, 500, 600, 671
Mean = ∑X / n = 3321 / 9 = 369
Median = Value of ((n + 1)/2)th item
= Value of ((9 + 1)/2)th item
= Value of 5th item
= 360
X          |D| = |X - mean|    |D| = |X - Md|
100             269                 260
150             219                 210
200             169                 160
250             119                 110
360               9                   0
490             121                 130
500             131                 140
600             231                 240
671             302                 311
Total = 3321   1570                1561

NB: X = value of the item, mean = 369, Md = median = 360

M.D from mean = ∑|D| / n = 1570 / 9 = 174.44
Co-efficient of M.D = M.D / Mean = 174.44 / 369 = 0.47
M.D from median = ∑|D| / n = 1561 / 9 = 173.44
Co-efficient of M.D = M.D / Median = 173.44 / 360 = 0.48
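The individual-series calculation can be sketched in Python using the standard `statistics` module. The function name `mean_deviation` is an assumption for this illustration.

```python
from statistics import mean, median


def mean_deviation(values, center):
    """Average of the absolute deviations |X - center|."""
    return sum(abs(x - center) for x in values) / len(values)


data = [100, 150, 200, 250, 360, 490, 500, 600, 671]
avg, med = mean(data), median(data)

md_mean = mean_deviation(data, avg)
md_median = mean_deviation(data, med)
print(round(md_mean, 2), round(md_mean / avg, 2))
print(round(md_median, 2), round(md_median / med, 2))
```

The mean deviation taken from the median comes out slightly smaller, which is a general property: the sum of absolute deviations is least when measured from the median.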
Mean Deviation - Discrete series:
 Find out an average (mean, median or mode).
 Find out the deviations of the variable values from the average, ignoring
signs, and denote them by |D|.
 Multiply the deviation of each value by its respective frequency and find
the total ∑f|D|.
 Divide ∑f|D| by the total frequency N.
Symbolically, M.D. = ∑f|D| / N
Example: Compute Mean deviation from mean and median from the following
data:
Height in cms    158   159   160   161   162   163   164   165   166
No. of persons    15    20    32    35    33    22    20    10     8
Also compute coefficient of mean deviation.
Height X   No. of persons f   d = X - A (A = 162)    fd    |D| = |X - mean|   f|D|
158             15                  -4               -60        3.51          52.65
159             20                  -3               -60        2.51          50.20
160             32                  -2               -64        1.51          48.32
161             35                  -1               -35        0.51          17.85
162             33                   0                 0        0.49          16.17
163             22                   1                22        1.49          32.78
164             20                   2                40        2.49          49.80
165             10                   3                30        3.49          34.90
166              8                   4                32        4.49          35.92
Total          195                                   -95                     338.59
NB: A stands for the assumed mean, an arbitrary value chosen near the center of
the data; d shows how far each observation lies from it.
Mean = A + ∑fd / N
= 162 + (-95 / 195)
= 162 - 0.49 = 161.51
M.D. = ∑f|D| / N = 338.59 / 195 = 1.74
Coefficient of M.D. = M.D / Mean = 1.74 / 161.51 = 0.0108
 Compute coefficient of mean deviation from the median.
Height X   No. of persons f   c.f.   |D| = |X - median|   f|D|
158             15             15            3              45
159             20             35            2              40
160             32             67            1              32
161             35            102            0               0
162             33            135            1              33
163             22            157            2              44
164             20            177            3              60
165             10            187            4              40
166              8            195            5              40
Total          195                                  ∑f|D| = 334
Median = Size of ((N + 1)/2)th item = ((195 + 1)/2)th item
= size of 98th item
= 161
M.D. = ∑f|D| / N = 334 / 195 = 1.71
Coefficient of M.D. = M.D / Median = 1.71 / 161 = 0.0106
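For a discrete frequency distribution the same calculation weights each deviation by its frequency. The sketch below uses illustrative function names (`freq_mean`, `freq_median`, `freq_mean_deviation`) that are assumptions, and it works with the unrounded mean, so intermediate figures can differ in the last digit from the hand calculation.

```python
from itertools import accumulate


def freq_mean(xs, fs):
    return sum(x * f for x, f in zip(xs, fs)) / sum(fs)


def freq_median(xs, fs):
    """First X whose cumulative frequency reaches the ((N + 1)/2)th item."""
    target = (sum(fs) + 1) / 2
    for x, cf in zip(xs, accumulate(fs)):
        if cf >= target:
            return x
    return xs[-1]


def freq_mean_deviation(xs, fs, center):
    """M.D. = sum of f|D| divided by N, with D = X - center."""
    return sum(f * abs(x - center) for x, f in zip(xs, fs)) / sum(fs)


heights = [158, 159, 160, 161, 162, 163, 164, 165, 166]
persons = [15, 20, 32, 35, 33, 22, 20, 10, 8]

avg = freq_mean(heights, persons)
med = freq_median(heights, persons)
md_mean = freq_mean_deviation(heights, persons, avg)
md_median = freq_mean_deviation(heights, persons, med)
print(round(avg, 2), round(md_mean, 2), med, round(md_median, 2))
```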
Mean deviation - Continuous series:
 The method of calculating mean deviation in a continuous series is the same
as in a discrete series.
 In a continuous series we have to find the mid-points of the various
classes and take the deviations of these points from the average selected.
M.D = ∑f|D| / N
Where D = m - average
m = Mid-point
Example: Find out the mean deviation from mean and median from the
following series.
Age in years No. of persons
0-10 20
10-20 25
20-30 32
30-40 40
40-50 42
50-60 35
60-70 10
70-80 8

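Solution:
 Calculation of mean and M.D. from mean:
X        m     f     d = (m - A)/c (A = 35, c = 10)    fd    |D| = |m - mean|    f|D|
0-10      5    20               -3                     -60        31.5          630.0
10-20    15    25               -2                     -50        21.5          537.5
20-30    25    32               -1                     -32        11.5          368.0
30-40    35    40                0                       0         1.5           60.0
40-50    45    42                1                      42         8.5          357.0
50-60    55    35                2                      70        18.5          647.5
60-70    65    10                3                      30        28.5          285.0
70-80    75     8                4                      32        38.5          308.0
Total         212                                       32                     3193.0
Mean = A + (∑fd / N) x c = 35 + (32 / 212) x 10 = 36.51, taken as 36.5
M.D = ∑f|D| / N = 3193 / 212 = 15.06
Coefficient of M.D. = M.D / Mean = 15.06 / 36.5 = 0.41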
 Calculation of median and M.D. from median:
X m f c.f |D| = |m-Md| f|D|
0-10 5 20 20 32.25 645.00
10-20 15 25 45 22.25 556.25
20-30 25 32 77 12.25 392.00
30-40 35 40 117 2.25 90.00
40-50 45 42 159 7.75 325.50
50-60 55 35 194 17.75 621.25
60-70 65 10 204 27.75 277.50
70-80 75 8 212 37.75 302.00
Total 212 3209.50
N/2 = 212/2 = 106, so the median class is 30-40
l = 30, m = 77, f = 40, c = 10
Median = l + [(N/2 - m) / f] x c
= 30 + [(106 - 77) / 40] x 10
= 37.25
M.D = ∑f|D| / N = 3209.50 / 212 = 15.14
Coefficient of M.D. = M.D / Median = 15.14 / 37.25 = 0.41
 Merits and Demerits of M.D:
Merits:
It is simple to understand and easy to compute.
It is rigidly defined.
It is based on all items of the series.
It is not much affected by the fluctuations of sampling.
It is less affected by the extreme items.
It is flexible, because it can be calculated from any
average.
It is better measure of comparison.
Demerits:
It is not a very accurate measure of dispersion.
It is not suitable for further mathematical calculation.
It is rarely used. It is not as popular as standard deviation.
Algebraic positive and negative signs are ignored. It is
mathematically unsound and illogical.
Standard Deviation and Coefficient of variation

b. Deviations taken from assumed mean:
 This method is adopted when the arithmetic mean is a fractional value.
 Taking deviations from a fractional value would be a very difficult and
tedious task.
 To save time and labour, we apply the short-cut method; deviations are
taken from an assumed mean. The formula is:
σ = √[∑d²/n - (∑d/n)²]
Where d stands for the deviation from the assumed mean = (X - A)
Steps:
 Assume any one of the items in the series as an average (A)
 Find out the deviations from the assumed mean, i.e., X - A denoted by d,
and also the total of the deviations ∑d
 Square the deviations, i.e., d², and add up the squares of deviations, i.e.,
∑d²
 Then substitute the values in the following formula:
σ = √[∑d²/n - (∑d/n)²]
 For the frequency distribution:

Values (X)    X - mean (mean = 15)    (X - mean)²
14                  -1                     1
22                   7                    49
9                   -6                    36
15                   0                     0
20                   5                    25
17                   2                     4
12                  -3                     9
11                  -4                    16
Total 120                                140
Example: The table below gives the marks obtained by 10 students in
statistics. Calculate standard deviation.
Student Nos:    1    2    3    4    5    6    7    8    9   10
Marks:         43   48   65   57   31   60   37   48   78   59
Solution:
(Deviations from assumed mean)
Nos.    Marks (X)    d = X - A (A = 57)     d²
1          43              -14             196
2          48               -9              81
3          65                8              64
4          57                0               0
5          31              -26             676
6          60                3               9
7          37              -20             400
8          48               -9              81
9          78               21             441
10         59                2               4
n = 10                  ∑d = -44      ∑d² = 1952
σ = √[∑d²/n - (∑d/n)²]
= √[1952/10 - (-44/10)²]
= √(195.2 - 19.36)
= √175.84 = 13.26
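The assumed-mean (short-cut) method can be sketched in Python as below. The function name `sd_assumed_mean` is an assumption; the formula used is σ = √[∑d²/n - (∑d/n)²] with d = X - A, and the result is compared with `statistics.pstdev`, the population standard deviation from the standard library.

```python
import math
from statistics import pstdev


def sd_assumed_mean(values, assumed):
    """Standard deviation via deviations d = X - A from an assumed mean A."""
    n = len(values)
    d = [x - assumed for x in values]
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    return math.sqrt(sum_d2 / n - (sum_d / n) ** 2)


marks = [43, 48, 65, 57, 31, 60, 37, 48, 78, 59]
sd = sd_assumed_mean(marks, 57)
print(round(sd, 2), round(pstdev(marks), 2))
```

Any assumed mean gives the same answer, because the correction term (∑d/n)² removes the effect of the choice of A.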

 Assumed mean method:
 Here deviations are taken not from the actual mean but from an
assumed mean. This method is also used if the given variable values
are not in equal intervals.
Steps:
 Assume any one of the items in the series as an assumed mean and
denote it by A.
 Find out the deviations from the assumed mean, i.e., X - A, and denote them by d.
 Multiply these deviations by the respective frequencies and get ∑fd.
 Square the deviations (d²).
 Multiply the squared deviations (d²) by the respective frequencies (f)
and get ∑fd².
 Substitute the values in the following formula:
σ = √[∑fd²/N - (∑fd/N)²]
Where d = X - A, N = ∑f.
Example: Calculate Standard deviation from the following data.
X: 20 22 25 31 35 40 42 45
f: 5 12 15 20 25 14 10 6
Solution:
Deviations from assumed mean
x      f     d = x - A (A = 31)     d²      fd      fd²
20      5          -11             121     -55      605
22     12           -9              81    -108      972
25     15           -6              36     -90      540
31     20            0               0       0        0
35     25            4              16     100      400
40     14            9              81     126     1134
42     10           11             121     110     1210
45      6           14             196      84     1176
N = 107                                ∑fd = 167   ∑fd² = 6037
σ = √[∑fd²/N - (∑fd/N)²]
= √[6037/107 - (167/107)²]
= √(56.42 - 2.44)
= √53.98 = 7.35
 Calculation of standard deviation: Continuous series
 If the variable values are in equal intervals, then we adopt this method (step deviation).
Steps:
 Assume the center value of the series as the assumed mean A.
 Find out d' = (m - A)/C, where m is the mid-value of each class and C is the
class interval.
 Multiply these deviations d' by the respective frequencies
and get ∑fd'.
 Square the deviations and get d'².
 Multiply the squared deviations (d'²) by the respective
frequencies (f) and obtain the total ∑fd'².
 Substitute the values in the following formula to get the
standard deviation:
σ = √[∑fd'²/N - (∑fd'/N)²] x C
Example: Let us take the following distribution relating to marks obtained by
students in an examination:
Marks             0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80  80-90  90-100
No. of students     1     3      6     10     12     11      6      3      2      1
Solution:
Marks     No. of students f   Mid value (m)   d' = (m - A)/C (A = 55, C = 10)   fd'    fd'²
0-10            1                  5                     -5                      -5     25
10-20           3                 15                     -4                     -12     48
20-30           6                 25                     -3                     -18     54
30-40          10                 35                     -2                     -20     40
40-50          12                 45                     -1                     -12     12
50-60          11                 55                      0                       0      0
60-70           6                 65                      1                       6      6
70-80           3                 75                      2                       6     12
80-90           2                 85                      3                       6     18
90-100          1                 95                      4                       4     16
Total        N = 55                                                   ∑fd' = -45   ∑fd'² = 231

σ = √[∑fd'²/N - (∑fd'/N)²] x C
= √[231/55 - (-45/55)²] x 10
= √(4.2 - 0.6694) x 10
= 18.8 marks
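The step-deviation method for a continuous series can be sketched in Python. The function name `grouped_sd` is an assumption for this illustration; it applies σ = √[∑fd'²/N - (∑fd'/N)²] x C with d' = (m - A)/C.

```python
import math


def grouped_sd(midpoints, freqs, assumed, width):
    """Standard deviation of grouped data by the step-deviation method."""
    n = sum(freqs)
    steps = [(m - assumed) / width for m in midpoints]       # d'
    sum_fd = sum(f * d for f, d in zip(freqs, steps))        # sum of f d'
    sum_fd2 = sum(f * d * d for f, d in zip(freqs, steps))   # sum of f d'^2
    return math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * width


midpoints = list(range(5, 100, 10))          # 5, 15, ..., 95
freqs = [1, 3, 6, 10, 12, 11, 6, 3, 2, 1]
sd = grouped_sd(midpoints, freqs, assumed=55, width=10)
print(round(sd, 2))
```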
Combined Standard Deviation


                               Village A   Village B
No. of people                     600         500
Average income                    175         186
Standard deviation of income       10           9
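The combined-deviation formula itself did not survive on this slide, so the sketch below assumes the standard textbook result: the combined mean is the size-weighted mean of the group means, and the combined variance is [n1(σ1² + d1²) + n2(σ2² + d2²)] / (n1 + n2), where d1 and d2 are the differences between each group mean and the combined mean. The function name `combined_sd` is also an assumption.

```python
import math


def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined mean and standard deviation of two groups (assumed standard formula)."""
    total = n1 + n2
    combined_mean = (n1 * mean1 + n2 * mean2) / total
    d1 = mean1 - combined_mean
    d2 = mean2 - combined_mean
    variance = (n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / total
    return combined_mean, math.sqrt(variance)


# Village A: 600 people, mean 175, s.d. 10; Village B: 500 people, mean 186, s.d. 9
mean_ab, sd_ab = combined_sd(600, 175, 10, 500, 186, 9)
print(mean_ab, round(sd_ab, 2))
```

The combined deviation is larger than either village's own deviation because the two village means differ from the combined mean.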
 Merits and Demerits of Standard Deviation:
 Merits:
It is rigidly defined and its value is always definite and
based on all the observations and the actual signs of
deviations are used.
As it is based on arithmetic mean, it has all the merits of
arithmetic mean.
It is the most important and widely used measure of
dispersion.
It is possible for further algebraic treatment.
It is less affected by the fluctuations of sampling and
hence stable.
It is the basis for measuring the coefficient of correlation
and sampling.
 Demerits:
It is not easy to understand and it is difficult to calculate.
It gives more weight to extreme values because the values
are squared up.
As it is an absolute measure of variability, it cannot be
used for the purpose of comparison.
Coefficient of Variation


Factory Average Standard Deviation No. of workers
A 34.5 5 476
B 28.5 4.5 524
Example: Prices of a particular commodity in five years in two cities are given below: Which city has
more stable prices?

Price in city A Price in city B


20 10
22 20
19 18
23 12
16 15
Solution:
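Actual mean method
City A                                       City B
Prices (X)   dx = X - 20   dx²               Prices (Y)   dy = Y - 15   dy²
20                0          0                10              -5          25
22                2          4                20               5          25
19               -1          1                18               3           9
23                3          9                12              -3           9
16               -4         16                15               0           0
∑x = 100      ∑dx = 0    ∑dx² = 30            ∑y = 75       ∑dy = 0    ∑dy² = 68
City A: Mean = ∑x / n = 100 / 5 = 20
σ = √(∑dx² / n) = √(30 / 5) = 2.45
C.V = (σ / Mean) x 100 = (2.45 / 20) x 100 = 12.25%
City B: Mean = ∑y / n = 75 / 5 = 15
σ = √(∑dy² / n) = √(68 / 5) = 3.69
C.V = (σ / Mean) x 100 = (3.69 / 15) x 100 = 24.6%
 City A had more stable prices than City B, because the coefficient of
variation is less in City A.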
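The comparison of the two cities can be reproduced with the standard `statistics` module. This sketch assumes the usual definition C.V = (σ / mean) x 100 with the population standard deviation; the function name `coefficient_of_variation` is an assumption.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """C.V. in percent: population standard deviation relative to the mean."""
    return pstdev(values) / mean(values) * 100


city_a = [20, 22, 19, 23, 16]
city_b = [10, 20, 18, 12, 15]

cv_a = coefficient_of_variation(city_a)
cv_b = coefficient_of_variation(city_b)
print(round(cv_a, 2), round(cv_b, 2))
print("City A" if cv_a < cv_b else "City B", "has the more stable prices")
```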
CHAPTER FOUR
PROBABILITY THEORY AND DISTRIBUTION
Probability Theory
 Probability is the chance of occurrence of an
outcome of a particular experiment.
 It is the measure of how likely an outcome is to
occur.
 Probability theory helps us:
To cope with uncertainty in making
decisions
To evaluate the possible risks of the future and
To plan for reducing the risks.
 An event that can’t occur has a probability zero,
and an event that is certain to occur has a
probability of one.
SOME BASIC CONCEPTS


Fundamental rules of probability
Rule 1: the probability of any event is a positive real number or zero. It cannot be
negative. Symbolically, P (A) ≥ 0
Rule 2: the sum of the probability of all possible mutually exclusive events is
unity. Symbolically, P (A) + P (B) + P(C) + ….. P (N) = 1
Rule 3: the probability of either of two mutually exclusive events, say A and B,
occurring is equal to the sum of their probabilities: P (A or B) = P (A) + P (B).
Example 1: Suppose we have a box with 3 red, 2 black and 5 white balls. Each
time a ball is drawn, it is returned to the box. What is the probability of drawing:
 Either a red or a black ball?
 Either a white or a black ball?
Solution
The probability of drawing the specific color of ball is:
P (red) = 0.3, P (black) = 0.2 and P (white) = 0.5
Applying rule 2 = P (red) + P (black) + P (white)
= 0.3 + 0.2 + 0.5 = 1
Probability of drawing either a red or black ball
P (red) + P (black) = 0.3 +0.2 = 0.5
Probability of drawing either a white or a black ball
P (white) + P (black) = 0.5 + 0.2 = 0.7
Approaches to Probability
 The assignment of probability of the event or outcome of an
experiment depends on two approaches of probability:
 Objective probability

 Subjective probability.
i. Objective Probability:
 Based on the objective of assigning probability, there are two
types of probability:
 Classical probability

 Empirical probability.


Computation of probability


 Multiplication rule:
 If there are m ways of doing one thing and n ways of doing
another thing, there are m*n ways of doing both. The rule applies when
selections are made from more than one group.
Example: If the home builder offered you four different exterior
styles of a home to choose from and three interior floor plans,
how many different arrangements of interior and exterior styles
of a home can be offered?
Solution:
4*3 = 12. Let the exterior styles be A, B, C, D and the interior floor plans be E, F, G:
A-E   A-F   A-G
B-E   B-F   B-G
C-E   C-F   C-G
D-E   D-F   D-G
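The twelve arrangements can be listed mechanically with `itertools.product` from the Python standard library. The variable names are illustrative.

```python
from itertools import product

exteriors = ["A", "B", "C", "D"]   # four exterior styles
interiors = ["E", "F", "G"]        # three interior floor plans

# Every (exterior, interior) pair: m * n = 4 * 3 arrangements
arrangements = list(product(exteriors, interiors))
print(len(arrangements))
print(arrangements[:3])
```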


Counting Rules
 In order to calculate probabilities, we have to know the number of
elements of an event and the number of elements of the sample space.
 In order to determine the number of outcomes, one can use several rules
of counting.
 Permutation rule
 Combination rule

i. Permutation:
 Permutation is applied to find the possible number of arrangements when
there is only one group of objects in a specified order.
Permutation Rules:
 The number of permutations of n distinct objects taken all together is n!
Where n! = n*(n-1)*(n-2)*…*3*2*1
 The arrangement of n objects in a specified order using r objects at a time
is called the permutation of n objects taken r objects at a time. It is
written as nPr and the formula is: nPr = n! / (n - r)!
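As a quick numerical check, Python's `math` module (version 3.8 or later) provides `math.factorial` and `math.perm`. The sketch below assumes the standard formula nPr = n! / (n - r)!; the values n = 5 and r = 2 are arbitrary illustration values, not taken from the slides.

```python
import math
from itertools import permutations

n, r = 5, 2   # illustrative values only

by_formula = math.factorial(n) // math.factorial(n - r)   # n! / (n - r)!
by_library = math.perm(n, r)
by_listing = len(list(permutations(range(n), r)))          # count every ordered selection

print(math.factorial(n), by_formula, by_library, by_listing)
```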


Probability Distribution
 Probability distribution describes how the probability
is spread over the possible numerical values
associated with the outcomes.
 Example: Consider the experiment of tossing a coin
three times. Construct the probability distribution of
the number of heads X.
Outcomes   TTT   TTH   THT   THH   HTH   HTT   HHT   HHH
X           0     1     1     2     2     1     2     3
Each of the 8 outcomes has probability 1/8, so adding the outcomes with the same X gives:
X       0     1     2     3
P(X)   1/8   3/8   3/8   1/8
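The distribution of the number of heads can be built by enumerating the sample space, assuming a fair coin so that all eight outcomes are equally likely. The variable names in this Python sketch are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))               # 8 equally likely outcomes
counts = Counter(outcome.count("H") for outcome in outcomes)

# P(X = x) = (number of outcomes with x heads) / (total number of outcomes)
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}
for x, p in distribution.items():
    print(x, p)
```

The probabilities add up to 1, as every probability distribution must.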
Random Variable
 Random variable is a variable whose values are determined by chance
with associated probabilities.
 A random variable (R.V.) is a rule that assigns a numerical value to each
possible outcome of a random experiment.
 Random variable can be classified into two:
 Discrete random variable
 Continuous random variable
i. Discrete Random Variable:
 Discrete random variables produce outcomes that come from a
counting process.
 They are variables which can assume only a countable number of
values.
Example: Consider an experiment of "flipping a fair coin 3 times". Count the
number of heads observed.
S = {TTT, TTH, THT, THH, HTH, HHT, HTT, HHH}
Let X be the number of heads observed in each outcome.
ii. Continuous Random Variable:
 Continuous random variables produce outcomes that come
from a measuring process.
 Continuous random variables occur when we deal with
quantities that are measured on a continuous scale.
 They are variables that can assume all values between any two given
values.
Examples:
 Height of students at certain college.

 Mark of a student.

 Life time of light bulbs.


Expected Value and Variance


Discrete Probability Distributions
A. Binomial Distribution:
 The outcomes of the binomial experiment and the corresponding probabilities of
these outcomes are called Binomial Distribution.

 Conditions necessary for Binomial Distribution:


 Each observation is classified in to two categories such as success and
failure.
 It is necessary that the probability of success (or failure) remains the same
for each observation in each trial.
 The trials or individual observations must be independent of each other. In
other words, no trial should influence the outcome of another trial.
 The general term of the binomial expansion (q + p)^n is: P(r) = nCr x q^(n-r) x p^r
Where nCr = n! / (r!(n-r)!)
is the number of ways in which we can get r successes and
n-r failures out of n trials, p is the probability of success and q = 1 - p.
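The general term can be evaluated directly with `math.comb` (Python 3.8 or later). The function name `binomial_pmf` is an assumption for this sketch, and the check uses n = 3 tosses of a fair coin (p = 0.5) as an illustrative case.

```python
import math


def binomial_pmf(r, n, p):
    """P(r successes in n trials) = nCr * q^(n - r) * p^r, with q = 1 - p."""
    q = 1 - p
    return math.comb(n, r) * q ** (n - r) * p ** r


probabilities = [binomial_pmf(r, 3, 0.5) for r in range(4)]
print(probabilities)        # [0.125, 0.375, 0.375, 0.125]
print(sum(probabilities))   # the terms of (q + p)^n add up to 1
```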




Continuous Probability Distributions
 Normal Distribution:
 In the normal distribution, you can calculate the probability that values
occur within certain ranges or intervals.
 A normal distribution is a continuous probability distribution for a random
variable x.
 The graph of a normal distribution is called the normal curve.
 A normal distribution has the following properties.
 The mean, median, and mode are equal.
 The normal curve is bell shaped and is symmetric about the mean.
 The total area under the normal curve is equal to one.
 The normal curve approaches, but never touches, the x-axis as it
extends farther and farther away from the mean.
 Between μ - σ and μ + σ (in the center of the curve) the graph curves
downward. The graph curves upward to the left of μ - σ and to the right
of μ + σ. The points at which the curve changes from curving upward to
curving downward are called inflection points.

 If necessary, we can then convert back to the original units of
measurement. To do this, simply note that, if we take the
formula for Z, Z = (X - μ) / σ, multiply both sides by σ, and then add μ to both
sides, we get:
X = Zσ + μ
Example: The weekly incomes of a large group of middle managers
are normally distributed with a mean of 2685 birr and a standard
deviation of 268.5 birr. What is the Z value for an income of
a. 3000 birr
b. 2500 birr
c. What is the probability that a particular manager will earn
an income of 3200 birr?
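Parts (a) and (b) can be worked with the standardisation Z = (X - μ)/σ, which is the assumed form of the Z formula referred to above. The Python sketch below uses illustrative function names. For part (c) the slide does not say whether "at most" or "at least" 3200 birr is intended, so the code only shows, as an assumption, how either tail could be read off with `statistics.NormalDist`.

```python
from statistics import NormalDist

MU, SIGMA = 2685, 268.5   # mean and standard deviation of weekly income (birr)


def z_value(x, mu=MU, sigma=SIGMA):
    return (x - mu) / sigma            # Z = (X - mu) / sigma


def x_value(z, mu=MU, sigma=SIGMA):
    return z * sigma + mu              # X = Z sigma + mu


z_a = z_value(3000)
z_b = z_value(2500)
print(round(z_a, 2), round(z_b, 2))    # 1.17 -0.69

# Assumed reading of part (c): probability of earning at most / at least 3200 birr
p_at_most = NormalDist(MU, SIGMA).cdf(3200)
p_at_least = 1 - p_at_most
print(round(p_at_most, 4), round(p_at_least, 4))
```

For a continuous variable the probability of any single exact value is zero, which is why a tail or an interval has to be chosen before part (c) has a non-trivial answer.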

Exercise
1. A random variable X has a normal distribution with
mean 80 and standard deviation 4.8. What is the
probability that it will take a value
a. Less than 87.2
b. Greater than 76.4
c. Between 81.2 and 86.0
2. Of a large group of men, 5% are less than 60 inches
in height and 40% are between 60 & 65 inches.
Assuming a normal distribution, find the mean and
standard deviation of heights.
Normal Approximation to Binomial Probabilities

Continuity correction factor (0.5)
 When you use a continuous normal distribution to
approximate a binomial probability, you need to move 0.5 unit
to the left and right of the midpoint to include all possible
x-values in the interval.
 It is the addition of 0.5 to, or the subtraction of 0.5 from, a discrete
random variable. The correction is applied as in the following example.
 Example: Use a correction for continuity to convert each of
the following binomial intervals to a normal distribution
interval.
1. The probability of getting between 270 and 310 successes,
inclusive
2. The probability of at least 158 successes
3. The probability of getting less than 63 successes
Solution:
1. The discrete midpoint values are 270, 271, …, 310. The
corresponding interval for the continuous normal
distribution is: 269.5 < x < 310.5.
2. The discrete midpoint values are 158, 159, 160, …. The
corresponding interval for the continuous normal
distribution is: x > 157.5.
3. The discrete midpoint values are …, 60, 61, 62. The
corresponding interval for the continuous normal
distribution is: x < 62.5.
 Example: Thirty-eight percent of people in the United States admit that
they snoop in other people’s medicine cabinets. You randomly select 200
people in the United States and ask each if he or she snoops in other
people’s medicine cabinets. What is the probability that at least 70 will say
yes?
Solution:
 Because np = 200(0.38) = 76 and nq = 200(0.62) = 124 are both at least 5, the binomial
variable x is approximately normally distributed with
µ = np = 76, σ = √(npq) = √(200 x 0.38 x 0.62) = 6.86.
 Using the correction for continuity, you can rewrite the discrete probability
P(x ≥ 70) as the continuous probability P(x ≥ 69.5).
 The corresponding normal curve has µ = 76 and σ = 6.86, with the area
to the right of 69.5 shaded.
 The z-score that corresponds to 69.5 is z = (69.5 - 76) / 6.86 = -0.95. So,
the probability that at least 70 will say yes is
P(x ≥ 69.5) = P (z ≥ -0.95)
= 1 - P (z ≤ -0.95)
= 1 - 0.1711 = 0.8289
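The approximation can be compared with the exact binomial sum in Python. This is a sketch using `statistics.NormalDist` and `math.comb` from the standard library; because it does not round z to two decimals, the approximate value differs from the table-based 0.8289 in the third decimal place.

```python
import math
from statistics import NormalDist

n, p = 200, 0.38
q = 1 - p
mu = n * p                       # np
sigma = math.sqrt(n * p * q)     # sqrt(npq)

# Continuity correction: P(x >= 70) for the binomial becomes P(x >= 69.5)
approx = 1 - NormalDist(mu, sigma).cdf(69.5)

# Exact binomial probability of at least 70 "yes" answers
exact = sum(math.comb(n, k) * p ** k * q ** (n - k) for k in range(70, n + 1))

print(round(sigma, 2), round(approx, 4), round(exact, 4))
```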