MTH212
INTRODUCTION TO BUSINESS
STATISTICS
By Baldada K. (MBA)
1.1.Definition of Statistics
Statistics is defined differently by different authors over a
period of time. In the olden days statistics was confined
to only state affairs but in modern days it embraces
almost every sphere of human activity.
Statistics can be defined in broader sense and narrower
sense.
In a broader sense, statistics is defined as aggregates of
figures or facts about certain events, situations, or
attributes of variables.
Example: the number of daily accidents recorded over a
month.
In a narrower sense, statistics is defined as: “a science
of collecting, summarizing, presenting and analyzing
data as well as drawing conclusions or making
predictions about population on the basis of sample
analysis.”
• Statistics is defined as the science which deals with the method
of collecting, classifying, presenting, comparing, and interpreting
numerical data. “Seligman”
• The science of statistics is the method of judging collective
natural or social phenomena from the results obtained from the
analysis or enumeration or collection of estimates. “King”
• A.L. Bowley has defined statistics as: (i) statistics is the science
of counting, (ii) Statistics may rightly be called the science of
averages, and (iii) statistics is the science of measurement of
social organism regarded as a whole in all its manifestations.
• Statistics is defined as the science of estimates and probabilities.
“Boddington”
• Generally, for our purposes, statistics is defined as methods
specially adapted to the collection, classification, analysis, and
interpretation of data for making effective decisions in all functional
areas of management.
1.2. Some key terms in statistics
• Population - is the collection of all possible observations of a
specified characteristic of interest.
• Sample – is the portion or part of the population of interest.
• Element – entity on which data are collected. For example, a single
student from whom information is collected.
• Census or complete enumeration – a study that includes every
member of the target population. It is often too costly and time
consuming. For example, a study that involves all management
department students of Arba Minch University.
• Parameter – is a characteristic of the population of interest. For
example, the mean and standard deviation of the population.
• Statistic – is a characteristic of the sample. For example, the mean and
standard deviation of the sample.
• Variable- is a characteristic that assumes different values for
different elements. E.g. Students’ GPA, height of students, weight
of students or it could be price of commodities or other things.
Characteristics of Statistics
Statistics are aggregates of facts: This means a single
figure is not statistics.
Statistics are affected by a number of factors: For
example sale of a product depends on a number of
factors such as its price, quality, competition, the
income of consumers, and so on.
Statistics must be reasonably accurate: If wrong
figures are analyzed, they will lead to erroneous conclusions.
Statistics must be collected in a systematic manner: If
data are collected in a haphazard manner, they will not
be reliable and will lead to misleading conclusions.
Importance of Statistics in Business
There are three major functions in any business enterprise
in which statistical methods are useful.
The planning functions: This may relate to either special projects
or to the recurring activities of the firm over specified period.
The setting up standards: This may relate to the size of
employment, volume of sales, fixation of quality norms
for the manufactured products, norms for daily output,
and so forth.
The function of control: This involves comparison of
actual production achieved against the norm or target
set earlier.
1.3 Types of Statistics
Statisticians commonly classify this subject into two
broad categories:
Descriptive statistics: Descriptive statistics describes the
data set being analyzed, but does not allow us to
draw any conclusions or make any inferences beyond
the data. Example: Arba Minch University graduated
2050 students in 2005, 2500 in 2006 and 3500 in 2007;
this statement belongs to the domain of descriptive
statistics.
Inferential statistics: Inferential statistics is also a set of
methods, but it is used to draw conclusions or inferences
about characteristics of populations based on data from
a sample. Example: The average per capita income of the whole
Ethiopian population can be estimated from figures
obtained from a sample of the population, for instance as $3.
1.4. TYPES OF VARIABLES OR DATA
Frequency distributions are classified into two:
i. Discrete (or) ungrouped frequency distribution:
In this form of distribution, the frequency refers to discrete value.
Here the data are presented in a way that exact measurements of units are
clearly indicated.
There are definite differences between the variables of different groups of
items. Each class is distinct and separate from the other class.
Non-continuity from one class to another class exists.
The process of preparing this type of distribution is:
o Count the number of times a particular value is repeated.
o Prepare a column of tallies.
o Finally count the number of bars or tallies and get frequency.
Example: In a survey of 40 families in a village, the number of
children per family was recorded and the following data
obtained.
1 0 3 2 1 5 6 2
2 1 0 3 4 2 1 6
3 2 1 5 3 3 2 4
2 2 3 0 2 1 4 5
3 3 4 4 1 2 4 5
Solution:
Frequency distribution of the number of children
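The tallying steps above can be sketched in Python. This is a minimal illustration using the 40 observations from the example; the variable names are chosen here for illustration and are not part of the course material.

```python
from collections import Counter

# Number of children per family for the 40 surveyed families (from the example).
children = [
    1, 0, 3, 2, 1, 5, 6, 2,
    2, 1, 0, 3, 4, 2, 1, 6,
    3, 2, 1, 5, 3, 3, 2, 4,
    2, 2, 3, 0, 2, 1, 4, 5,
    3, 3, 4, 4, 1, 2, 4, 5,
]

# Counter plays the role of the tally column: it counts how often each value occurs.
frequency = Counter(children)

print("Children  Families")
for value in sorted(frequency):
    print(f"{value:>8}  {frequency[value]:>8}")
print(f"{'Total':>8}  {sum(frequency.values()):>8}")
```

Each distinct value forms its own class, which is what makes the distribution discrete (ungrouped).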
ii. Continuous frequency distribution:
In this form of distribution, the frequency refers to groups of values.
This becomes necessary in the case of some variables
which can take any fractional value and in which case an
exact measurement is not possible.
Hence a discrete variable can be presented in the form of a
continuous frequency distribution.
Wage distribution of 100 employees
Weekly wages (birr) Number of employees
50-100 4
100-150 12
150-200 22
200-250 33
250-300 16
300-350 8
350-400 5
Total 100
Nature of class/terms to develop class
The following are some basic technical terms when a
continuous frequency distribution is formed or data are
classified according to class intervals:
i. Class limits:
The class limits are the lowest and the highest values that can be
included in the class.
The two boundaries of class are known as the lower limits and the
upper limit of the class.
In statistical calculations, lower class limit is denoted by L and upper
class limit by U.
ii. Class Interval:
It is the size of each grouping of data.
Each grouping begins with the lower limit of a class interval and ends
at the lower limit of the next succeeding class interval.
For example, 50-75, 75-100, 100-125… are class intervals.
iii. Width or size of the class interval:
The difference between the lower and upper class limits is called the width
or size of the class interval and is denoted by ‘C’.
C = Range / Number of class intervals (CI) = Range / (1 + 3.322 log10 N)
iv. Range:
The difference between largest and smallest value of the observation
and is denoted by ‘R’
R = Largest value – Smallest value
v. Mid-value or mid-point:
The central point of a class interval is called the mid value or mid-point.
It is found by adding the upper and lower limits of a class and
dividing the sum by 2. Mid value = (L + U) / 2
vi. Frequency:
Number of observations falling within a particular class interval
Types of class intervals
i. Exclusive method:
When the class intervals are so fixed that the upper limit of one class is
the lower limit of the next class; it is known as the exclusive method of
classification.
It is clear that the exclusive method ensures continuity of data.
ii. Inclusive method:
Both the lower and upper limits are included in the class interval.
In this method, the overlapping of the class intervals is avoided.
This type of classification may be used for a grouped frequency
distribution for discrete variable like members in a family, number of
workers in a factory etc.
It cannot be used with fractional values like age, height, weight etc.
iii. Open end classes:
A class limit is missing either at the lower end of the first class interval
or at the upper end of the last class interval or both are not specified.
The necessity of open end classes arises in a number of practical
situations, particularly relating to economic and medical data when
there are few very high values or few very low values which are far apart
from the majority of observations.
Example: The following data show the cost of electricity
during July 2013 for a random sample of 50 one-bedroom
apartments:
Class Frequency Relative frequency
82-104 5 10%
104-126 7 14%
126-148 13 26%
148-170 14 28%
170-192 8 16%
192-214 3 6%
Total 50 100%
Proportion = Relative frequency = (number of values in each class) / (total number of values)
Cumulative frequency distribution
Cumulative frequency is defined as the sum of all
frequencies up to or above the current point. Cumulative
frequency can be less-than cumulative frequency or
greater-than cumulative frequency.
Less-than cumulative frequency indicates the number of
elements in the data set that lie below the current value.
Greater-than cumulative frequency indicates the number
of elements in the data set that lie above the current value.
Step-4: Construct less than and greater than cumulative
frequency table.
Classes Frequency Less-than cumulative frequency Greater-than cumulative frequency
82-104 5 5 50
104-126 7 12 45
126-148 13 25 38
148-170 14 39 25
170-192 8 47 11
192-214 3 50 3
Total 50
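The relative and cumulative frequency columns can be reproduced with a short Python sketch. It assumes only the class frequencies of the electricity example; the variable names are illustrative.

```python
from itertools import accumulate

classes = ["82-104", "104-126", "126-148", "148-170", "170-192", "192-214"]
frequencies = [5, 7, 13, 14, 8, 3]
total = sum(frequencies)

# Relative frequency: class frequency divided by the total number of values.
relative = [f / total for f in frequencies]

# Less-than cumulative frequency: running total from the first class downwards.
less_than = list(accumulate(frequencies))

# Greater-than cumulative frequency: running total from the last class upwards.
greater_than = list(accumulate(reversed(frequencies)))[::-1]

for row in zip(classes, frequencies, relative, less_than, greater_than):
    print("{:<8} {:>3} {:>5.0%} {:>3} {:>3}".format(*row))
```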
iii. Histogram:
A graph in which the classes are marked on the horizontal axis by using a
series of adjacent rectangles and the class frequencies represented by the
height of the rectangles on the vertical axis.
Histogram is similar to bar graph but there are two critical differences:
o The horizontal (x-axis) is a continuous scale. As a result of this there
are no gaps between the bars.
o The height of the rectangle is only proportional to the frequency if the
class intervals are all equal.
The following data regarding the cost of electricity for a random sample of
50 one-bedroom apartments can be depicted on histogram.
Class Frequency
82-104 5
104-126 7
126-148 13
148-170 14
170-192 8
192-214 3
Histogram showing the cost of electricity for a random sample
of 50 one-bedroom apartments.
[Figure: histogram. Horizontal axis: class interval, from [82 - 104) to [192 - 214). Vertical axis: class frequencies, from 0 to 16.]
Ogive (greater than Ogive Curve)
Class Frequency Greater-than cumulative frequency
82-104 5 50
104-126 7 45
126-148 13 38
148-170 14 25
170-192 8 11
192-214 3 3
[Figure: greater-than ogive. Horizontal axis: class interval, from [82 - 104) to [192 - 214). Vertical axis: greater-than cumulative frequency, from 0 to 60.]
CHAPTER THREE
X 1/X
5 0.2000
10 0.1000
17 0.0588
24 0.0417
30 0.0333
Total 0.4338
H.M = n / ∑(1/X) = 5 / 0.4338 = 11.53
Example: The marks secured by some students of a class are
given below. Calculate the harmonic mean.
Marks 20 21 22 23 24 25
No. of students 4 2 7 1 3 1
Solution:
Marks (X) No. of students (f) 1/X f(1/X)
20 4 0.0500 0.2000
21 2 0.0476 0.0952
22 7 0.0454 0.3178
23 1 0.0435 0.0435
24 3 0.0417 0.1251
25 1 0.0400 0.0400
Total 18 0.8216
H.M = ∑f / ∑f(1/X) = 18 / 0.8216 = 21.91
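Both harmonic mean calculations can be checked with a small Python sketch. The function name and its optional frequency argument are assumptions made for illustration. Because the code does not round the reciprocals to four decimals, the last digit can differ slightly from the hand calculation.

```python
def harmonic_mean(values, freqs=None):
    """Harmonic mean: total frequency divided by the sum of f * (1/x)."""
    if freqs is None:
        freqs = [1] * len(values)  # ungrouped data: every value counts once
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))

# Ungrouped example: 5, 10, 17, 24, 30
hm_simple = harmonic_mean([5, 10, 17, 24, 30])

# Marks example with frequencies
marks = [20, 21, 22, 23, 24, 25]
students = [4, 2, 7, 1, 3, 1]
hm_marks = harmonic_mean(marks, students)

print(round(hm_simple, 2), round(hm_marks, 2))
```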
Geometric mean
The geometric mean of a series containing n observations is the
nth root of the product of the values. If x1, x2…, xn are
observations then:
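For ungrouped data: G.M = (x1 · x2 · … · xn)^(1/n) = Antilog(∑log x / n)
For grouped data: G.M = Antilog(∑f log x / N), where N = ∑f
Example:
x log x
180 2.2553
250 2.3979
490 2.6902
1400 3.1461
1050 3.0212
Total 13.5107
G.M = Antilog(13.5107 / 5)
= Antilog 2.7021 = 503.6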
Example: Calculate the average income per head from the
data given below .Use geometric mean.
Class of people Number of families Monthly income per head (Birr)
Landlords 2 5000
Cultivators 100 400
Landless – labors 50 200
Money – lenders 4 3750
Office Assistants 6 3000
Shop keepers 8 750
Carpenters 6 600
Weavers 10 300
Solution
Class of people Number of families (f) Monthly income per head (Birr) x log x f log x
Landlords 2 5000 3.6990 7.398
Cultivators 100 400 2.6021 260.210
Landless labors 50 200 2.3010 115.050
Money – lenders 4 3750 3.5740 14.296
Office Assistants 6 3000 3.4771 20.863
Shop keepers 8 750 2.8751 23.001
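The logarithm method can be sketched in Python as follows. The function name is an assumption for illustration; the data are the two geometric mean examples in this section, including all eight classes of people from the income table.

```python
import math

def geometric_mean(values, freqs=None):
    """Geometric mean as the antilog of the (weighted) mean of log10 values."""
    if freqs is None:
        freqs = [1] * len(values)  # ungrouped data
    total = sum(freqs)
    mean_log = sum(f * math.log10(x) for x, f in zip(values, freqs)) / total
    return 10 ** mean_log  # antilog

# Ungrouped example
gm_simple = geometric_mean([180, 250, 490, 1400, 1050])

# Income example: families (f) and monthly income per head (x)
families = [2, 100, 50, 4, 6, 8, 6, 10]
income = [5000, 400, 200, 3750, 3000, 750, 600, 300]
gm_income = geometric_mean(income, families)

print(round(gm_simple, 1), round(gm_income, 1))
```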
Median
The median is the value of the variate that divides the group
into two equal parts, one part comprising all values greater
than the median and the other all values less than it.
Median is defined as the value of the middle item or the mean
of the values of the two middle items when the data are
arranged in an ascending or descending order of magnitude.
Ungrouped or Raw data:
In an ungrouped frequency distribution if the n values are arranged in
ascending or descending order of magnitude, the median is the middle
value if n is odd.
When n is even, the median is the mean of the two middle values. By
the formula:
Median (Md) = value of the ((n + 1)/2)th item
Example: When odd number of values are given. Find median for the following data
25, 18, 27, 10, 8, 30, 42, 20, 53
Solution:
Arranging the data in increasing order: 8, 10, 18, 20, 25, 27, 30, 42, 53
The middle value is the 5th item, i.e., 25 is the median.
Median (Md) = ((n + 1)/2)th item = ((9 + 1)/2)th item = (10/2)th item = 5th item = 25
Example: When even number of values are given. Find median for the following data
5, 8, 12, 30, 18, 10, 2, 22
Solution:
Arranging the data in the increasing order 2, 5, 8, 10, 12, 18, 22, 30
Here the median is the mean of the middle two items, i.e. the mean of 10 and 12:
(10 + 12)/2 = 11
Using the formula,
Median (Md) = value of the ((n + 1)/2)th item
= ((8 + 1)/2)th item
= (9/2)th item = 4.5th item
= 4th item + (1/2) (5th item- 4th item)
= 10 + (1/2) (12-10)
= 10 + (1/2) x 2
= 10 +1 = 11
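The positional rule used in both examples can be written as a short Python sketch. The helper `item_at` is a hypothetical name introduced here; it reads a value at a fractional position by interpolating between neighbouring items, exactly as done for the 4.5th item above.

```python
def item_at(sorted_values, position):
    """Value at a 1-based, possibly fractional position (linear interpolation)."""
    lower = int(position)
    fraction = position - lower
    value = sorted_values[lower - 1]
    if fraction > 0:
        # e.g. 4.5th item = 4th item + 0.5 * (5th item - 4th item)
        value += fraction * (sorted_values[lower] - sorted_values[lower - 1])
    return value

def median(values):
    ordered = sorted(values)
    return item_at(ordered, (len(ordered) + 1) / 2)

print(median([25, 18, 27, 10, 8, 30, 42, 20, 53]))  # odd number of values
print(median([5, 8, 12, 30, 18, 10, 2, 22]))        # even number of values
```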
Grouped Data:
In a grouped distribution, values are associated with frequencies.
Grouping can be in the form of a discrete frequency distribution or a
continuous frequency distribution.
Whatever may be the type of distribution, cumulative frequencies
have to be calculated to know the total number of items.
In the case of a grouped series, the median is calculated by linear
interpolation with the help of the following formula:
M = l1 + [(l2 – l1) / f] x (m – c)
Where, M = the median
l1 = the lower limit of the class in which the median lies
l2 = the upper limit of the class in which the median lies
f = the frequency of the class in which the median lies
m = the position of the middle item, i.e. the ((n + 1)/2)th item
c = the cumulative frequency of the class
preceding the one in which the median lies.
Example: find the median for the following grouped data.
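For grouped data, the mode is calculated by the formula:
Mode = l + [(f1 – f0) / ((f1 – f0) + (f1 – f2))] x c
Where, l = the lower value of the class in which the mode lies.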
f1= the frequency of the class in which the mode lies
f0= the frequency of the class preceding the modal class
f2= the frequency of the class succeeding the modal class
c= the class interval of the modal class
While applying the above formula, we should ensure that the class intervals are uniform
Example: The following is the distribution of the size of certain farms
selected at random from a district. Calculate the mode of the distribution.
Size of farms No. of farms
5-15 8
16-25 12
26-35 17
36-45 29
46-55 31
56-65 5
66-75 3
Solution:
The highest frequency is 31 and corresponding class interval is 46 – 55,
which is the modal class. Here l=46, f1=31, f0=29, f2=5, c=10
Mode = 46 + [(31 – 29) / ((31 – 29) + (31 – 5))] x 10
= 46 + (2/28) x 10
= 46.7
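As a quick check of the arithmetic, the grouped-data mode formula can be expressed as a small Python function. The function name is an assumption for illustration; the parameter names follow the symbols l, f1, f0, f2 and c defined above.

```python
def grouped_mode(l, f1, f0, f2, c):
    """Mode = l + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * c"""
    return l + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * c

# Farm-size example: modal class 46-55
farm_mode = grouped_mode(l=46, f1=31, f0=29, f2=5, c=10)
print(round(farm_mode, 1))
```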
Determination of Modal class
For a frequency distribution, the modal class corresponds to the maximum frequency. But in
any one (or more) of the following cases, the modal class is determined by the
method of grouping:
If the maximum frequency is repeated
If the maximum frequency occurs at the beginning or at the end of the distribution
If there are irregularities in the distribution
Steps for Calculation:
We prepare a grouping table with 6 columns:
1. In column I, we write down the given frequencies.
2. Column II is obtained by combining the frequencies two by two.
3. Leave the 1st frequency and combine the remaining frequencies two by two and write
in column III
4. Column IV is obtained by combining the frequencies three by three.
5. Leave the 1st frequency and combine the remaining frequencies three by three and
write in column V
6. Leave the 1st and 2nd frequencies and combine the remaining frequencies three by
three and write in column VI
Solution:
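Grouping Table
CI f (column 1)
0-5 9
5-10 12
10-15 15
15-20 16
20-25 17
25-30 15
30-35 10
35-40 13
Column 2 (frequencies combined two by two): 21, 31, 32, 23
Column 3 (leaving the 1st frequency, two by two): 27, 33, 25
Column 4 (frequencies combined three by three): 36, 48
Column 5 (leaving the 1st frequency, three by three): 43, 42
Column 6 (leaving the 1st and 2nd frequencies, three by three): 48, 38
Analysis Table
Columns  0-5  5-10  10-15  15-20  20-25  25-30  30-35  35-40
1         -    -     -      -      1      -      -      -
2         -    -     -      -      1      1      -      -
3         -    -     -      1      1      -      -      -
4         -    -     -      1      1      1      -      -
5         -    1     1      1      -      -      -      -
6         -    -     1      1      1      -      -      -
Total     -    1     2      4      5      2      -      -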
The maximum occurred corresponding to 20-25, and hence it is the modal
class.
Here l=20, f1=17, f0=16, f2=15, c=5
Mode = 20 + [(17 – 16) / ((17 – 16) + (17 – 15))] x 5
= 20 + (1/3) x 5
= 21.67
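The grouping and analysis tables can be reproduced with a short Python sketch. It assumes the six columns described in the steps above; the function name `grouping_votes` is hypothetical, and leftover frequencies that do not fill a complete group are ignored, as in the table.

```python
def grouping_votes(freqs):
    """Count how often each class falls in a maximum group across columns 1 to 6."""
    n = len(freqs)
    votes = [0] * n
    # (group size, number of leading frequencies left out) for columns 1..6
    columns = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
    for size, skip in columns:
        starts = range(skip, n - size + 1, size)
        totals = {i: sum(freqs[i:i + size]) for i in starts}
        best = max(totals.values())
        for i, total in totals.items():
            if total == best:
                for j in range(i, i + size):
                    votes[j] += 1  # every class inside a maximum group gets a mark
    return votes

lower_limits = [0, 5, 10, 15, 20, 25, 30, 35]
freqs = [9, 12, 15, 16, 17, 15, 10, 13]
width = 5

votes = grouping_votes(freqs)
k = votes.index(max(votes))  # modal class; assumed here not to be the first or last class
f1, f0, f2 = freqs[k], freqs[k - 1], freqs[k + 1]
mode = lower_limits[k] + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * width
print(votes, round(mode, 2))
```

The `votes` list corresponds to the Total row of the analysis table, and the class with the most marks is taken as the modal class.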
Graphic Location of mode
Steps:
1. Draw a histogram of the given distribution.
2. Join the top right corner of the highest rectangle (modal class rectangle)
by a straight line to the top right corner of the preceding rectangle.
Similarly, join the top left corner of the highest rectangle to the top left
corner of the rectangle on the right.
3. From the point of intersection of these two diagonal lines, draw a
perpendicular to the x-axis.
4. The value read on the x-axis at the foot of the perpendicular gives the mode.
Merits of Mode:
It is easy to calculate and in some cases it can be located
by mere inspection.
Mode is not at all affected by extreme values.
It can be calculated for open-end classes.
It is usually an actual value of an important part of the
series.
In some circumstances it is the best representative of data.
Demerits of mode:
It is not based on all observations.
It is not capable of further mathematical treatment.
Mode is ill-defined generally; it is not possible to find mode
in some cases.
As compared with mean, mode is affected to a great extent,
by sampling fluctuations.
It is unsuitable in cases where relative importance of items
has to be considered.
Empirical Relationship Between Averages
In a symmetrical distribution the three simple averages: mean = median =
mode.
For a moderately asymmetrical distribution, the relationship between them is
given by Prof. Karl Pearson as: mode = 3 median - 2 mean.
Example: If the mean and median of a moderately asymmetrical series are 26.8
and 27.9 respectively, what would be its most probable mode?
Solution:
Mode = 3 median - 2 mean
= 3 x 27.9 - 2 x 26.8
= 30.1
Example: In a moderately asymmetrical distribution the values of mode and mean
are 32.1 and 35.4 respectively. Find the median value.
Solution:
Median =1/3[2mean+mode]
=1/3 [2 x 35.4 + 32.1]
= 34.3
Example: In a moderately asymmetrical distribution the values of mode and
median are 30.1 and 34.3 respectively. Find the mean value.
Solution:
Mean = ½(3 median - mode)
= ½(3 x 34.3 - 30.1)
= 36.4
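The three rearrangements of Pearson's empirical relation, mode = 3 median - 2 mean, can be checked with a small Python sketch. The function names are assumptions for illustration. Solving the relation for the mean gives mean = (3 median - mode) / 2.

```python
def mode_from(mean, median):
    return 3 * median - 2 * mean

def median_from(mean, mode):
    return (2 * mean + mode) / 3

def mean_from(median, mode):
    return (3 * median - mode) / 2

print(round(mode_from(mean=26.8, median=27.9), 1))
print(round(median_from(mean=35.4, mode=32.1), 1))
print(round(mean_from(median=34.3, mode=30.1), 1))
```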
Quartiles
The quartiles divide the distribution in to four parts.
The first (lower) quartile (Q1) marks off the first one-fourth, the third
(upper) quartile (Q3) marks off the three-fourth.
The second quartile divides the distribution into two halves and
therefore is the same as the median.
If the n observations are arranged in ascending order, then:
Q1 = ((n + 1)/4)th item and Q3 = (3(n + 1)/4)th item
Example: Compute quartiles for the data given below 25, 18,
30, 8, 15, 5, 10, 35, 40, 45
Solution:
5, 8, 10, 15, 18, 25, 30, 35, 40, 45
Q1 = ((n + 1)/4)th item = ((10 + 1)/4)th item = 2.75th item
= 2nd item + (3/4) (3rd item – 2nd item)
= 8 + (3/4) (10 – 8)
= 9.5
Q3 = (3(n + 1)/4)th item
= 3 x (2.75)th item
= (8.25)th item
= 8th item + (1/4) (9th item – 8th item)
= 35 + (1/4) (40 – 35) = 36.25
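The same positional calculation can be sketched in Python. The helper `item_at` is a hypothetical name; it interpolates between neighbouring items when the position is fractional, as in the worked solution.

```python
def item_at(sorted_values, position):
    """Value at a 1-based, possibly fractional position (linear interpolation)."""
    lower = int(position)
    fraction = position - lower
    value = sorted_values[lower - 1]
    if fraction > 0:
        value += fraction * (sorted_values[lower] - sorted_values[lower - 1])
    return value

def quartiles(values):
    ordered = sorted(values)
    n = len(ordered)
    q1 = item_at(ordered, (n + 1) / 4)       # ((n + 1)/4)th item
    q3 = item_at(ordered, 3 * (n + 1) / 4)   # (3(n + 1)/4)th item
    return q1, q3

q1, q3 = quartiles([25, 18, 30, 8, 15, 5, 10, 35, 40, 45])
print(q1, q3)
```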
Discrete Series:
Step1: Find cumulative frequencies.
Step2: Find (n + 1)/4.
Step3: See in the cumulative frequencies the value just
greater than (n + 1)/4; the corresponding value of X is Q1.
Step4: Find 3(n + 1)/4.
Step5: See in the cumulative frequencies the value just
greater than 3(n + 1)/4; the corresponding value of X is Q3.
Example: Compute quartiles for the data given below.
X 5 8 12 15 19 24 30
f 4 3 2 4 5 2 4
Solution:
X F c.f
5 4 4
8 3 7
12 2 9
15 4 13
19 5 18
24 2 20
30 4 24
Total 24
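Steps 1 to 5 can be applied to this table with a short Python sketch. The function name is an assumption for illustration, and "just greater than" is implemented as a strict comparison.

```python
from itertools import accumulate

def discrete_quartile(xs, fs, k):
    """k-th quartile (k = 1 or 3) of a discrete series with values xs and frequencies fs."""
    n = sum(fs)
    position = k * (n + 1) / 4
    for x, cf in zip(xs, accumulate(fs)):
        if cf > position:  # first cumulative frequency just greater than the position
            return x
    return xs[-1]

xs = [5, 8, 12, 15, 19, 24, 30]
fs = [4, 3, 2, 4, 5, 2, 4]

q1 = discrete_quartile(xs, fs, 1)  # position (24 + 1)/4 = 6.25
q3 = discrete_quartile(xs, fs, 3)  # position 3(24 + 1)/4 = 18.75
print(q1, q3)
```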
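C.I f cf
0-5 5 5
5-10 8 13
10-15 12 25
15-20 16 41
20-25 20 61
25-30 10 71
30-35 4 75
35-40 3 78
Total 78
P53 = l + [(53N/100 – m) / f] x c
53N/100 = 53 x 78/100 = 41.34, so P53 lies in the class 20-25, where l = 20, m = 41, f = 20, c = 5.
P53 = 20 + [(41.34 – 41) / 20] x 5
= 20.085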
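The interpolation for a percentile of a continuous distribution can be sketched in Python. The function name and argument layout are assumptions for illustration; equal class widths are assumed, as in the table above.

```python
def grouped_percentile(lower_limits, width, freqs, p):
    """p-th percentile: l + ((p*N/100 - m) / f) * c, found by scanning cumulative frequencies."""
    n = sum(freqs)
    target = p * n / 100
    cumulative = 0  # m: cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("p must lie between 0 and 100")

lower_limits = [0, 5, 10, 15, 20, 25, 30, 35]
freqs = [5, 8, 12, 16, 20, 10, 4, 3]

p53 = grouped_percentile(lower_limits, 5, freqs, 53)
print(round(p53, 3))
```

Setting p to 25, 50 or 75 gives Q1, the median and Q3 under the N/4 convention used for grouped data.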
Measures of Dispersion
The measures of central tendency serve to locate the center of
the distribution, but they do not reveal how the items are
spread out on either side of the center.
This characteristic of a frequency distribution is commonly
referred to as dispersion.
Dispersion is the degree to which numerical data tend to
spread about an average value.
In a series all the items are not equal; there is difference or
variation among the values.
The degree of variation is evaluated by various measures of
dispersion. Small dispersion indicates high uniformity of the
items, while large dispersion indicates less uniformity.
Absolute and Relative Measures
Absolute measure of dispersion indicates the amount of variation in
a set of values in terms of units of observations.
Absolute measures are not suitable for comparing the variability of
two distributions which are expressed in different units of
measurement and different average size.
Relative measures of dispersion are free from the units of
measurements of the observations. They are pure numbers.
They are used to compare the variation in two or more sets, which
are having different units of measurements of observations.
The various absolute and relative measures of dispersion are listed
below.
Absolute measure Relative measure
1. Range 1.Co-efficient of Range
2. Quartile deviation 2.Co-efficient of Quartile deviation
3. Mean deviation 3. Co-efficient of Mean deviation
4. Standard deviation 4.Co-efficient of variation
Range and coefficient of Range
i. Range:
It is defined as the difference between the largest and smallest values of the
variable.
Range = L – S.
Where L = Largest value.
S = Smallest value.
In individual observations and discrete series, L and S are easily
identified. In continuous series, the following two methods are followed.
Method 1: L = Upper boundary of the highest class
S = Lower boundary of the lowest class.
Method 2: L = mid value of the highest class.
S = mid value of the lowest class.
ii. Co-efficient of Range:
Co-efficient of Range = (L – S) / (L + S)
Example: Find the value of the range and its co-efficient for the
following data.
7, 9, 6, 8, 11, 10, 4
Solution:
L=11, S = 4.
Range = L – S = 11- 4 = 7
Co-efficient of Range = (L – S)/(L + S) = (11 – 4)/(11 + 4) = 7/15 = 0.4667
Example 2:
Calculate range and its co efficient from the following
distribution.
Size: 60-63 63-66 66-69 69-72 72-75
Number: 5 18 42 27 8
Solution:
L = Upper boundary of the highest class = 75
S = Lower boundary of the lowest class = 60
Range = L – S = 75 – 60 = 15
Co-efficient of Range = (L – S)/(L + S) = (75 – 60)/(75 + 60) = 15/135 = 0.1111
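Both examples can be verified with a minimal Python sketch. The function names are assumptions for illustration; for the continuous series, L and S are taken as the upper boundary of the highest class and the lower boundary of the lowest class (Method 1).

```python
def value_range(largest, smallest):
    return largest - smallest

def coefficient_of_range(largest, smallest):
    return (largest - smallest) / (largest + smallest)

# Example 1: individual observations
data = [7, 9, 6, 8, 11, 10, 4]
r1 = value_range(max(data), min(data))
c1 = coefficient_of_range(max(data), min(data))

# Example 2: continuous series with classes from 60-63 to 72-75
r2 = value_range(75, 60)
c2 = coefficient_of_range(75, 60)

print(r1, round(c1, 4), r2, round(c2, 4))
```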
Merits and Demerits of Range:
Merits:
It is simple to understand.
It is easy to calculate.
In certain types of problems like quality control, weather
forecasts, share price analysis, etc., range is most widely
used.
Demerits:
It is very much affected by the extreme items.
It is based on only two extreme observations.
It cannot be calculated from open-end class intervals.
It is not suitable for mathematical treatment.
It is a very rarely used measure.
Quartile Deviation and Co efficient of Quartile
Deviation
i. Quartile Deviation ( Q.D) :
Quartile Deviation is half of the difference between the first and third
quartiles.
Q.D = (Q3 – Q1) / 2
ii. Co-efficient of Quartile Deviation:
Co-efficient of Q.D = (Q3 – Q1) / (Q3 + Q1)
Example: Find the Quartile Deviation for the following data:
391, 384, 591, 407, 672, 522, 777, 733, 1490, 2488
Solution:
Arrange the given values in ascending order.
384, 391, 407, 522, 591, 672, 733, 777, 1490, 2488.
Position of Q1 is (n + 1)/4 = (10 + 1)/4 = 2.75th item
Q1 = 2nd value + 0.75 (3rd value – 2nd value)
= 391 + 0.75 (407 – 391)
= 391 + 0.75 x 16
= 391 + 12
= 403
Position of Q3 is 3(n + 1)/4 = 3 x 2.75 = 8.25th item
Q3 = 8th value + 0.25 (9th value – 8th value)
= 777 + 0.25 (1490 – 777)
= 777 + 0.25 (713)
= 777 + 178.25 = 955.25
Q.D = (Q3 – Q1)/2 = (955.25 – 403)/2
= 276.125
Example: For the data given below, give the quartile deviation and coefficient
of quartile deviation.
X: 351 – 500 501 – 650 651 – 800 801 – 950 951 – 1100
f: 48 189 88 47 28
Solution:
x f True class intervals Cumulative frequency
351-500 48 350.5 - 500.5 48
501-650 189 500.5 - 650.5 237
651-800 88 650.5 - 800.5 325
801-950 47 800.5 - 950.5 372
951-1100 28 950.5 - 1100.5 400
Total N = 400
N/4 = 400/4 = 100, so the Q1 class is 500.5 – 650.5
Q1 = l1 + [(N/4 – m1) / f1] x c1
l1 = 500.5, m1 = 48, f1 = 189, c1 = 150
Q1 = 500.5 + [(100 – 48) / 189] x 150
Q1 = 541.77
3(N/4) = 3 x 100 = 300, so the Q3 class is 650.5 – 800.5
Q3 = l3 + [(3(N/4) – m3) / f3] x c3
l3 = 650.5, m3 = 237, f3 = 88, c3 = 150
Q3 = 650.5 + [(300 – 237) / 88] x 150
Q3 = 757.89
Q.D = (Q3 – Q1)/2 = (757.89 – 541.77)/2
Q.D = 108.06
Coefficient of Q.D = (Q3 – Q1)/(Q3 + Q1) = (757.89 – 541.77)/(757.89 + 541.77) = 0.1663
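Once Q1 and Q3 are known, the quartile deviation and its coefficient follow directly. The sketch below, with illustrative function names, applies the two formulas to the quartiles obtained in the ungrouped and grouped examples of this section.

```python
def quartile_deviation(q1, q3):
    return (q3 - q1) / 2

def coefficient_of_qd(q1, q3):
    return (q3 - q1) / (q3 + q1)

# Ungrouped example: Q1 = 403, Q3 = 955.25
qd_raw = quartile_deviation(403, 955.25)

# Grouped example: Q1 = 541.77, Q3 = 757.89
qd_grouped = quartile_deviation(541.77, 757.89)
coef_grouped = coefficient_of_qd(541.77, 757.89)

print(qd_raw, round(qd_grouped, 2), round(coef_grouped, 4))
```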
Merits and Demerits of Quartile Deviation
Merits:
It is Simple to understand and easy to calculate
It is not affected by extreme values.
It can be calculated for data with open end classes
also.
Demerits:
It is not based on all the items. It is based on two
positional values Q1 and Q3 and ignores the extreme
50% of the items.
It is not amenable to further mathematical
treatment.
It is affected by sampling fluctuations.
Mean Deviation and Coefficient of Mean Deviation
i. Mean Deviation:
The range and quartile deviation are not based on all
observations. They are positional measures of dispersion.
They do not show any scatter of the observations from an
average.
The mean deviation is measure of dispersion based on all
items in a distribution.
Mean deviation is the arithmetic mean of the deviations of a
series computed from any measure of central tendency; i.e.,
the mean, median or mode, all the deviations are taken as
positive i.e., signs are ignored.
According to Clark and Schekade, “Average deviation is the
average amount of scatter of the items in a distribution from
either the mean or the median, ignoring the signs of the
deviations”.
ii. Coefficient of mean deviation:
Mean deviation calculated by any measure of central
tendency is an absolute measure.
For the purpose of comparing variation among different series,
a relative mean deviation is required.
The relative mean deviation is obtained by dividing the mean
deviation by the average used for calculating mean deviation.
Coefficient of mean deviation = Mean deviation / (Mean or Median or Mode)
If the result is desired in percentage:
Coefficient of mean deviation = [Mean deviation / (Mean or Median or Mode)] x 100
Computation of mean deviation – Individual Series:
Calculate the average (mean, median or mode) of the series.
Take the deviations of items from the average, ignoring signs, and denote
these deviations by |D|.
Compute the total of these deviations, i.e., ∑|D|.
Divide this total by the number of items.
Symbolically: M.D. = ∑|D| / n
Example: Calculate mean deviation from mean and median for the following data:
100,150,200,250,360,490,500,600,671 also calculate coefficients of M.D.
Solution:
Mean = ∑X / n = 3321 / 9 = 369
Median = value of the ((n + 1)/2)th item
= value of the ((9 + 1)/2)th item
= value of the 5th item
= 360
X |D| = |X – mean| |D| = |X – Md|
100 269 260
150 219 210
200 169 160
250 119 110
360 9 0
490 121 130
500 131 140
600 231 240
671 302 311
Total = 3321 1570 1561
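The column totals above lead to the mean deviations and their coefficients. A minimal Python sketch, with an illustrative function name, carries the calculation through using the formula M.D. = ∑|D| / n.

```python
import statistics

def mean_deviation(values, center):
    """Arithmetic mean of the absolute deviations from the chosen average."""
    return sum(abs(x - center) for x in values) / len(values)

data = [100, 150, 200, 250, 360, 490, 500, 600, 671]
mean = statistics.mean(data)      # 3321 / 9
median = statistics.median(data)  # 5th item

md_mean = mean_deviation(data, mean)
md_median = mean_deviation(data, median)

# Coefficient of M.D. = mean deviation / average used
coef_mean = md_mean / mean
coef_median = md_median / median

print(round(md_mean, 2), round(coef_mean, 4))
print(round(md_median, 2), round(coef_median, 4))
```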
Values (X) X – mean (X – mean)²
14 -1 1
22 7 49
9 -6 36
15 0 0
20 5 25
17 2 4
12 -3 9
11 -4 16
Total 120 140
Example: The table below gives the marks obtained by 10 students in
statistics. Calculate standard deviation.
Student Nos: 1 2 3 4 5 6 7 8 9 10
Marks: 43 48 65 57 31 60 37 48 78 59
Solution:
(Deviations from assumed mean)
Nos. Marks (x) d = X – A (A = 57) d²
1 43 -14 196
2 48 -9 81
3 65 8 64
4 57 0 0
5 31 -26 676
6 60 3 9
7 37 -20 400
8 48 -9 81
9 78 21 441
10 59 2 4
n = 10 ∑d=-44 ∑d2 =1952
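The totals in the table can be turned into a standard deviation with the usual assumed-mean formula, σ = √[∑d²/n – (∑d/n)²]. The formula itself is not shown in the text above, so the Python sketch below states it as an assumption; the function name is illustrative.

```python
import math
import statistics

def sd_assumed_mean(values, assumed):
    """Standard deviation from deviations d = X - A about an assumed mean A."""
    n = len(values)
    d = [x - assumed for x in values]
    return math.sqrt(sum(di ** 2 for di in d) / n - (sum(d) / n) ** 2)

marks = [43, 48, 65, 57, 31, 60, 37, 48, 78, 59]
sd_marks = sd_assumed_mean(marks, assumed=57)

# The result does not depend on the choice of A; it matches the population SD.
print(round(sd_marks, 2), round(statistics.pstdev(marks), 2))
```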
Assumed mean method:
Here deviations are taken not from an actual mean but from an
assumed mean. Also this method is used, if the given variable values
are not in equal intervals.
Steps:
Assume any one of the items in the series as an assumed mean and
denoted by A.
Find out the deviations from assumed mean, i.e., X-A and denote it by d.
Multiply these deviations by the respective frequencies and get the ∑fd
Square the deviations (d²).
σ = √[231/55 – (–45/55)²] x 10
= 18.8 marks
Combined Standard Deviation
Village A B
No of people 600 500
Average income 175 186
Standard deviation of income 10 9
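Assuming the task is to find the combined mean and combined standard deviation of income for the two villages, the calculation can be sketched in Python. The formula used, σ12 = √[(n1(σ1² + d1²) + n2(σ2² + d2²)) / (n1 + n2)] with d1 and d2 the deviations of each group mean from the combined mean, is the standard one and is stated here as an assumption because the slide's own formula did not survive.

```python
import math

def combined_mean_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined mean and standard deviation of two groups."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    d1, d2 = mean1 - mean, mean2 - mean  # deviations of group means from combined mean
    variance = (n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / n
    return mean, math.sqrt(variance)

combined_mean, combined_sd = combined_mean_sd(600, 175, 10, 500, 186, 9)
print(combined_mean, round(combined_sd, 2))
```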
Merits and Demerits of Standard Deviation:
Merits:
It is rigidly defined and its value is always definite and
based on all the observations and the actual signs of
deviations are used.
As it is based on arithmetic mean, it has all the merits of
arithmetic mean.
It is the most important and widely used measure of
dispersion.
It is possible for further algebraic treatment.
It is less affected by the fluctuations of sampling and
hence stable.
It is the basis for measuring the coefficient of correlation
and sampling.
Demerits:
It is not easy to understand and it is difficult to calculate.
It gives more weight to extreme values because the values
are squared up.
As it is an absolute measure of variability, it cannot be
used for the purpose of comparison.
Coefficient of Variation
Factory Average Standard Deviation No. of workers
A 34.5 5 476
B 28.5 4.5 524
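Assuming the table is meant for comparing the variability of the two factories, the coefficient of variation can be computed as C.V = (standard deviation / mean) x 100. This is the standard definition, given here as an assumption since the slide's formula is missing; the function name is illustrative.

```python
def coefficient_of_variation(mean, sd):
    """Relative dispersion in percent: (standard deviation / mean) * 100."""
    return sd / mean * 100

cv_a = coefficient_of_variation(mean=34.5, sd=5)
cv_b = coefficient_of_variation(mean=28.5, sd=4.5)

print(round(cv_a, 2), round(cv_b, 2))
# The factory with the smaller coefficient shows less relative variation.
print("More consistent:", "A" if cv_a < cv_b else "B")
```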
Example: Prices of a particular commodity in five years in two cities are given below: Which city has
more stable prices?
Subjective probability.
i. Objective Probability:
Based on the objective of assigning probability, there are two
types of probability:
Classical probability
Empirical probability.
Computation of probability
Multiplication rule:
If there are m ways of doing one thing and n ways of doing
another thing, there are m*n ways of doing both. The rule applies
when the selections are made from more than one group.
Example: If the home builder offered you four different exterior
styles of a home to choose from and three interior floor plans,
how many different arrangements of interior and exterior styles
of a home can be offered?
Solution:
4*3 = 12. Let the exterior styles be A, B, C, D and the interior floor plans be E, F, G.
The possible arrangements are:
AE, AF, AG
BE, BF, BG
CE, CF, CG
DE, DF, DG
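The multiplication rule can be illustrated by listing every pairing explicitly. This short Python sketch uses the labels from the example; `itertools.product` generates one pair for each combination of an exterior style and an interior plan.

```python
from itertools import product

exteriors = ["A", "B", "C", "D"]  # four exterior styles
interiors = ["E", "F", "G"]       # three interior floor plans

# Every exterior can be paired with every interior: m * n arrangements.
arrangements = [e + i for e, i in product(exteriors, interiors)]

print(len(arrangements))
print(arrangements)
```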
Counting Rules
In order to calculate probabilities, we have to know the number of
elements of an event and the number of elements of the sample space.
In order to determine the number of outcomes, one can use several rules
of counting.
Permutation rule
Combination rule
i. Permutation:
Permutation is applied to find the possible number of arrangements when
there is only one group of objects in a specified order.
Permutation Rules:
The number of permutations of n distinct objects taken all together is n!
Where n!=n*(n-1)*(n-2)……3*2*1
The arrangement of n objects in a specified order using r objects at a time
is called the permutation of n objects taken r objects at a time. It is
written as nPr and the formula is: nPr = n! / (n – r)!
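The two permutation rules can be checked numerically. In this sketch the four objects A, B, C, D and the choice r = 2 are hypothetical values picked for illustration; the count from the formula n! / (n – r)! is compared with an explicit listing of the ordered arrangements.

```python
import math
from itertools import permutations

def n_p_r(n, r):
    """Number of permutations of n distinct objects taken r at a time."""
    return math.factorial(n) // math.factorial(n - r)

objects = ["A", "B", "C", "D"]  # hypothetical objects, n = 4

all_at_once = n_p_r(4, 4)   # n! arrangements of all n objects
two_at_a_time = n_p_r(4, 2)

# Cross-check by listing every ordered arrangement of 2 objects.
listed = list(permutations(objects, 2))
print(all_at_once, two_at_a_time, len(listed))
```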
Probability Distribution
Probability distribution describes how the probability
is spread over the possible numerical values
associated with the outcomes.
Mark of a student.