
Wordpress Documentation

The document discusses various methods for representing statistical data, including classification, tabulation, frequency distributions, and graphical representations such as bar charts and pie charts. Examples are provided to demonstrate how to construct frequency distributions and calculate related measures. Key terms are defined throughout.

Uploaded by

sameerfraz1122
Copyright
© All Rights Reserved
REPRESENTATION OF DATA:

The raw data that have been collected are usually very large in quantity. Therefore, we
have to organize and summarize the collected data in a form that is easy to understand.
This is called presentation of statistical data.
Array: The arrangement of data in ascending or descending order of magnitude is called an
array.
Different methods used in the presentation of statistical data
1. Classification 2. Tabulation 3. Diagram 4. Graph
Classification: The process of arranging data into relatively homogeneous groups or classes
according to some common characteristics is called classification. For example, the population of
a country is classified according to age, sex, religion and marital status.
Tabulation: The systematic arrangement of the data in the form of rows and columns for the
purpose of comparison and analysis is known as tabulation.
Frequency distribution: A frequency distribution is a tabular arrangement of data in which
various items are arranged into classes or groups and the number of items falling in that class
is stated. The number of observations falling in a particular class is called class frequency or
simply frequency of that class and is denoted by "f".
Class and Class frequency: When a set of data is divided into non-overlapping
homogeneous groups, each group is called a class or class interval. The number of observations
falling in a particular class is called the frequency of that class, or simply frequency, and is
denoted by "f".
Class limits: The class limits are defined as the number or the values of the variables which
are used to separate two classes. The smaller number is called lower class limit and larger
number is called upper class limit.
Class boundaries: The class boundaries are obtained by subtracting from each lower class limit,
and adding to each upper class limit, half of the difference between the upper limit of one class
and the lower limit of the next class. They can also be obtained by subtracting h/2 from, and
adding h/2 to, the midpoint of each class.
Class mark or mid points: The class mark or the midpoint is that value which divides a
class into two equal parts. It is obtained by dividing the sum of lower and upper class limits
or class boundaries of a class by 2.
Class interval: Class interval is the length of a class. A class interval is usually denoted by "h".
It is obtained by
(i) The difference between the upper class boundary and the lower class boundary (not
the difference between class limits), OR
(ii) The difference between either two successive lower class limits or two successive
upper class limits, OR
(iii) The difference between two successive midpoints.
CONSTRUCTION OF A FREQUENCY DISTRIBUTION:
Decide the number of classes: The number of classes is determined by the formula
K = 1 + 3.3 log(n) OR K ≈ √n (approximately)
where K denotes the number of classes and n denotes the total number of observations.
Determine the range of the data: The difference between the largest and smallest values in
the data is called the range of the data, i.e. R = largest observation - smallest observation,
where R denotes the range of the data.
Determine the approximate size of class interval: The size of the class interval is
determined by dividing the range of the data by the number of classes i.e. h= R/K
Where h denotes the size of the class interval. In case of fractional results, the next higher
whole number is usually taken as the size of the class interval.
Decide where to locate the class limits: The lower class limit of the first class is placed at or just
below the smallest value in the data. Add the class interval to get the lower class limit of the
next class, and repeat this process until the lower class limit of the last class is reached.
Distribute the data into appropriate classes: Take each observation and mark a vertical bar
"I" (tally) against the class to which it belongs.
Cumulative Frequency: The cumulative frequency of a class is obtained by adding the
frequencies of all preceding classes to the frequency of that class, and is denoted by c.f.
Relative Frequency: The frequency of a class divided by the total frequency of all the classes
is called Relative frequency and is denoted by r.f.
Cumulative relative frequency: Cumulative relative frequency of a class is obtained by
adding all the relative frequencies of all preceding classes including that class.
Percentage frequency: The percentage frequency of a class is obtained by multiplying the
relative frequency of that class by 100.
Cumulative percentage frequency: Cumulative percentage frequency of a class is
obtained by adding all the percentage frequencies of all preceding classes including that
class.
Example # 1. The following data are the final plant heights (cm) of thirty wheat plants.
Construct a frequency distribution.
87 91 89 88 89 91 87 92 90 98 95 97 96
100 101 96 98 99 98 100 102 99 101 105 103 107
105 106 107 112
(i) Number of classes: The number of classes is determined by the formula
K = 1 + 3.3 log(n) = 1 + 3.3 log(30) = 1 + 3.3(1.4771) = 5.87 ≈ 6
(ii) Size of class interval: The size of the class interval is h = R/K.
R = Largest observation - Smallest observation = 112 - 87 = 25
h = 25/6 = 4.17 ≈ 5
FREQUENCY DISTRIBUTION

Class limits   Class boundaries   Frequency (f)   Midpoint (X)   c.f.   r.f.     % Freq.   Cumulative % Freq.
86-90          85.5-90.5           6               88             6     0.2000    20.00      20.00
91-95          90.5-95.5           4               93            10     0.1333    13.33      33.33
96-100         95.5-100.5         10               98            20     0.3333    33.33      66.67
101-105        100.5-105.5         6              103            26     0.2000    20.00      86.67
106-110        105.5-110.5         3              108            29     0.1000    10.00      96.67
111-115        110.5-115.5         1              113            30     0.0333     3.33     100.00
Total                             30                                    1.0000   100.00
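
The construction steps for Example # 1 can be sketched in Python. This is a minimal illustration, not part of the original notes: rounding K to the nearest whole number, rounding h up, and starting the first class at 86 (one below the smallest value) are assumptions chosen to match the table above.

```python
import math
from itertools import accumulate

# Plant heights (cm) from Example # 1
heights = [87, 91, 89, 88, 89, 91, 87, 92, 90, 98, 95, 97, 96,
           100, 101, 96, 98, 99, 98, 100, 102, 99, 101, 105, 103, 107,
           105, 106, 107, 112]

n = len(heights)
k = round(1 + 3.3 * math.log10(n))         # number of classes: 5.87 -> 6
data_range = max(heights) - min(heights)   # R = 112 - 87 = 25
h = math.ceil(data_range / k)              # class interval: 4.17 -> 5
first_lower = min(heights) - 1             # assumption: start just below the minimum

# Class limits (inclusive), e.g. (86, 90), (91, 95), ...
classes = [(first_lower + i * h, first_lower + (i + 1) * h - 1) for i in range(k)]

freq = [sum(lo <= x <= hi for x in heights) for lo, hi in classes]
cum_freq = list(accumulate(freq))          # c.f.
rel_freq = [f / n for f in freq]           # r.f.
cum_pct = [100 * c / n for c in cum_freq]  # cumulative percentage frequency

for (lo, hi), f, cf, rf, cp in zip(classes, freq, cum_freq, rel_freq, cum_pct):
    print(f"{lo:>3}-{hi:<3}  f={f:>2}  c.f={cf:>2}  r.f={rf:.4f}  cum%={cp:6.2f}")
```

Computing the cumulative percentages from the cumulative frequencies, rather than by adding rounded percentages, avoids the small rounding drift that otherwise leaves the last class at 99.99 instead of 100.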

Example # 2. The following data represent the number of goals scored by a team in 10 matches:
0, 0, 1, 1, 3, 1, 3, 0, 2, 0. Construct a frequency distribution.

Number of Goals (X)   Number of Matches (f)
0                     4
1                     3
2                     1
3                     2
Example # 3. The following data represent the gender of 10 students:
Male, Male, Female, Male, Female, Male, Female, Male, Male, Female. Construct a frequency
distribution.

Gender   Number of Students
Male     6
Female   4


GRAPHICAL REPRESENTATION: The visual representation of statistical data in
the form of points, lines, areas and other geometrical forms and symbols is known as
graphical representation. Such visual representation can be divided into two groups.
(i) Graph (ii) Diagram
The basic difference between a graph and a diagram is that a graph is a representation of data
by a continuous curve, usually shown on graph paper, while a diagram is any other one, two
or three-dimensional form of visual representation.
Diagrams: A diagram is a device used for representing statistical data in such a way that it
provides maximum information about the movement of the data.
Advantages of Diagrams:
• Diagrams are good looking and attractive.
• Diagrams leave a more effective and long-lasting impression on the mind of a reader.
• Diagrams make it easier to compare two or more things at a time.
Disadvantages of Diagrams:
• Diagrams are less accurate than tables.
• Diagrams cost money and time to prepare, and the amount of information they convey is
limited.
Types of Diagrams: Different types of diagrams or charts commonly used for displaying
statistical data are described below:
1. Simple Bar Diagram/Chart 2. Multiple Bar Diagram/Chart
3. Sub-divided or Component Bar Diagram/Chart 4. Pie Diagram/Chart
Simple bar chart/Diagram: A simple bar chart consists of horizontal or vertical bars of
equal widths and lengths proportional to the magnitudes of the observations. The space
separating the bars should not exceed the width of a bar and should not be less than half of
its width. When the data do not relate to time, they should be arranged in ascending or
descending order before charting.
Multiple bar chart/Diagram: A multiple bar chart shows two or more characteristics
corresponding to the values of a common variable in the form of grouped bars whose lengths
are proportional to the values of the characteristics and each of which is shaded or colored
differently for identification. OR
It is an extension of the simple bar chart and is used to represent two or more related sets of data
in the form of groups of bars side by side. Multiple bar charts provide more information
about the same problem.
Sub-divided/component bar chart/Diagram: In a component bar chart, each bar is divided
into two or more sections. The length of the bar represents the total and the various sections
represent the components of the total. OR
A sub-divided bar chart is obtained by dividing a simple bar chart into different components.
Pie diagram/Diagram: A pie diagram consists of a circle divided into sectors whose areas
are proportional to the various parts into which the whole quantity is divided. OR
A pie diagram is the division of a circular region of any convenient radius into different
sectors. As a circle consists of 360°, the total is divided into its different components, each
shown by a sector whose angle is
$\text{Angle of a sector} = \frac{\text{Component part}}{\text{Whole quantity}} \times 360^\circ$
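
As a small illustration of the sector-angle formula, the sketch below converts component values into angles. The function name `sector_angles` is hypothetical, and the gender counts from Example # 3 are reused here only as sample input.

```python
def sector_angles(parts):
    """Return the pie-chart angle (in degrees) for each component.

    angle = component part / whole quantity * 360
    """
    whole = sum(parts.values())
    return {name: value * 360 / whole for name, value in parts.items()}


# Sample input: gender counts from Example # 3
angles = sector_angles({"Male": 6, "Female": 4})
for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

The angles always add up to 360 degrees, which is a quick check on the arithmetic when a pie diagram is drawn by hand.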

Graphs: Graph means the drawing of geometrical curves in conformity with the given data.
It is a representation of data by a continuous curve.
Advantages of Graphs:
• Graphs are an effective way to represent data.
• Graphs are an effective way to compare two sets of data at a time.
• Graphs are helpful to show the general trend of data.
• Graphs are helpful in prediction and forecasting.
• Graphs are useful to locate some of the averages.
Types of graphs: Different types of graphs commonly used for displaying statistical data
are described below:
(i) Historigram/Graph of time series (ii) Histogram
(iii) Frequency polygon & Frequency curve (iv) Cumulative Frequency polygon or Ogive
Historigram/Graph of time series: A graph of time series is called historigram. A
Historigram is constructed by taking time along X-axis and the value of the variable along Y-
axis. Points are plotted and are then connected by straight line segments to get the
Historigram.
Histogram: A histogram is the graphical representation of a frequency distribution by a set of
adjacent rectangles in which the area of each rectangle is proportional to the corresponding
frequency. In constructing a histogram, class boundaries are taken along the X-axis and
rectangles are drawn whose heights are proportional to the frequencies of the respective
classes (frequency along the Y-axis). In the case of unequal class intervals, the adjusted
frequency is used in place of the frequency, where the adjusted frequency is obtained by
dividing the frequency by the class interval.
Frequency polygon: A frequency polygon is a line graph of a frequency distribution in which
the frequencies are plotted against the midpoints of the classes. It is constructed by taking the
midpoints along the X-axis and the class frequencies along the Y-axis. Points are plotted and are
then connected by straight line segments. To get a polygon, add an extra class midpoint at both
ends of the distribution with zero frequency so that the polygon forms a closed figure
with the horizontal axis.
Frequency curve: If the frequency polygon is smoothed out, the resulting graph is called a
frequency curve. OR
A frequency curve is constructed by taking the midpoints along X-axis and class frequency
along Y-axis. Points are plotted and are then connected by free hand curve.
Cumulative frequency polygon/Ogive: A cumulative frequency polygon is obtained by
plotting the cumulative frequencies (along the Y-axis) against the upper class boundaries (along
the X-axis) and joining the points by straight line segments. To get a polygon, include the lower
class boundary of the first class with zero frequency and join the last point to the last upper
class boundary on the X-axis.
Types of frequency curve:
(1) Symmetrical distribution (2) Skewed distribution
Symmetrical distribution: A frequency distribution or curve is said to be symmetrical if
values equidistant from a central maximum have the same frequencies. For example, the Normal
curve.
Skewed distribution: A frequency distribution or curve is said to be skewed when it
departs from symmetry.
STEM-AND-LEAF DISPLAY:- A clear disadvantage of using a frequency table is that the
identity of individual observations is lost in the grouping process. To overcome this drawback,
John Tukey (1977) introduced a technique known as the stem-and-leaf display. This
technique offers a quick and novel way of simultaneously sorting and displaying data sets,
where each number in the data set is divided into two parts, a stem and a leaf. A stem is the
leading digit(s) of each number and is used in sorting, while a leaf is the rest of the number, i.e.
the trailing digit(s), and is shown in the display. A vertical line separates the leaf from the stem.
For example, the number 243 could be split in two ways:

Leading digit   Trailing digits   OR   Leading digits   Trailing digit
2               43                     24               3
Stem            Leaf                   Stem             Leaf


Represent the following data by STEM-AND-LEAF DISPLAY
(i) Taking 10 units as the width of the class.
(ii) Taking 5 units as the width of the class.
32 45 38 41 49 36 52 56 51 62 63 59 68

(i) Width 10      (ii) Width 5
3│286             3*│2
4│519             3.│86
5│2619            4*│1
6│238             4.│59
                  5*│21
                  5.│69
                  6*│23
                  6.│8

NOTE: * indicates leaves 0-4 and . indicates leaves 5-9. The symbols * and . are called placeholders.

Two data sets can be compared by using a back-to-back stem-and-leaf display.
Data 1) 32 45 38 41 49 36 52 56 51 62 63 59 68
Data 2) 23 58 26 57 55 65 29 36 59 69 60

Data 1          Data 2
(# 13)          (# 11)
      │ 2 │369
   682│ 3 │6
   915│ 4 │
  9162│ 5 │8759
   832│ 6 │590
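
The splitting of each number into a stem and a leaf can be sketched in Python. This is an illustrative sketch only: the function name `stem_and_leaf` and its `split` flag are hypothetical, and it assumes non-negative two-digit whole numbers as in the data above.

```python
def stem_and_leaf(values, split=False):
    """Group leaves (units digit) under stems (tens digit).

    With split=True each stem is divided into two rows, marked with the
    placeholders '*' (leaves 0-4) and '.' (leaves 5-9), i.e. class width 5.
    Leaves keep the order in which the values appear in the data.
    """
    display = {}
    for v in values:
        stem, leaf = divmod(v, 10)
        key = f"{stem}{'*' if leaf <= 4 else '.'}" if split else str(stem)
        display.setdefault(key, []).append(leaf)
    return dict(sorted(display.items()))


data1 = [32, 45, 38, 41, 49, 36, 52, 56, 51, 62, 63, 59, 68]

for key, leaves in stem_and_leaf(data1).items():
    print(f"{key}|{''.join(map(str, leaves))}")

for key, leaves in stem_and_leaf(data1, split=True).items():
    print(f"{key}|{''.join(map(str, leaves))}")
```

Because the leaves are stored in data order, sorting each list of leaves afterwards gives an ordered display from which the median and quartiles can be read directly.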
MEASURE OF CENTRAL TENDENCY
Average: An average is a single numerical value that is used to represent a set of data. As an
average tends to lie at the center of the distribution or data, it is also called a measure of
location or measure of central tendency.
Properties of a good Average: A good average must have the following properties:
• It should be clearly defined by a mathematical formula.
• It should be easy to calculate and simple to understand.
• It should be based on all the observations of the data.
• It should be capable of further algebraic treatment.
• It should be least affected by fluctuations of sampling.
• It should not be affected by extreme values.
Types of averages: The commonly used averages are:
(i) Arithmetic mean (ii) Geometric mean (iii) Harmonic mean
(iv) Median and quantiles (v) Mode
Arithmetic mean: The arithmetic mean (A.M) of a set of data is obtained by dividing the sum of
all the observations by the total number of observations. For population data it is denoted by the
Greek letter $\mu$, read as "mu". The population mean for N values is given as
For ungrouped data
$\mu = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$
The estimate of the population mean $\mu$ is the sample mean, which is denoted by $\bar{X}$ and
read as "X-bar". The sample mean for n values is given as
For ungrouped data
Direct Method: $\bar{X} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$
In-direct Method/Short-cut Method: $\bar{X} = A + \frac{\sum_{i=1}^{n} D_i}{n}$
where $D = X - A$ and "A" is an arbitrary value.
Step-deviation Method/Coding Method: $\bar{X} = A + \frac{\sum_{i=1}^{n} U_i}{n} \times h$
where $U = \frac{X - A}{h}$, "A" is an arbitrary value and "h" is the common interval.
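Example # 1. Find the arithmetic mean for the following data set.
$X_i$ = 87, 91, 89, 88, 89, 91, 87, 92, 90, 98.
Direct Method: $\bar{X} = \frac{\sum x}{n} = \frac{902}{10} = 90.2$
Short-cut Method (A = 89): $\bar{X} = A + \frac{\sum D}{n} = 89 + \frac{12}{10} = 89 + 1.2 = 90.2$
Step-deviation Method (A = 89, h = 1): $\bar{X} = A + \frac{\sum U}{n} \times h = 89 + \frac{12}{10} \times 1 = 89 + 1.2 \times 1 = 90.2$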
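
The three methods for ungrouped data can be checked against each other with a short Python sketch. The variable names are illustrative; A = 89 and h = 1 are the arbitrary value and common interval used in Example # 1.

```python
x = [87, 91, 89, 88, 89, 91, 87, 92, 90, 98]
n = len(x)
A = 89   # arbitrary origin, as in Example # 1
h = 1    # common interval, as in Example # 1

# Direct method: sum of observations divided by their number
direct = sum(x) / n

# Short-cut method: work with deviations D = X - A
sum_d = sum(xi - A for xi in x)
shortcut = A + sum_d / n

# Step-deviation method: work with coded values U = (X - A) / h
sum_u = sum((xi - A) / h for xi in x)
step_deviation = A + (sum_u / n) * h

print(direct, shortcut, step_deviation)
```

All three expressions are algebraically identical, so any difference between them in a hand calculation points to an arithmetic slip rather than a property of the method.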
When the number of observations is very large, the data is organized into a frequency
distribution, which is used to calculate the approximate values of descriptive measures as the
identity of the observations is lost. To calculate the approximate value of the mean, the
observations in each class are assumed to be identical with the class midpoint so that the
product of the midpoint by the number of observations, i.e. frequency would be
approximately equal to the sum of observations for each class
For grouped data
Direct Method: $\bar{X} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \dots + f_n x_n}{\sum_{i=1}^{n} f_i} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$
In-direct Method/Short-cut Method: $\bar{X} = A + \frac{\sum_{i=1}^{n} f_i D_i}{\sum f}$
where $D = X - A$ and "A" is an arbitrary value.
Step-deviation Method/Coding Method: $\bar{X} = A + \frac{\sum_{i=1}^{n} f_i U_i}{\sum f} \times h$
where $U = \frac{X - A}{h}$, "A" is an arbitrary value and "h" is the common interval.
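Example # 2. Find the arithmetic mean for the following frequency distribution of marks.

Marks    Frequency (f)   Midpoint (X)   fX     D = X - 32   fD     U = (X - 32)/5   fU
20-24     1              22               22   -10          -10    -2               -2
25-29     4              27              108    -5          -20    -1               -4
30-34     8              32              256     0            0     0                0
35-39    11              37              407     5           55     1               11
40-44    15              42              630    10          150     2               30
45-49     9              47              423    15          135     3               27
TOTAL    48                             1846                310                     62

Direct Method: $\bar{X} = \frac{\sum fX}{\sum f} = \frac{1846}{48} = 38.46$
Short-cut Method: $\bar{X} = A + \frac{\sum fD}{\sum f} = 32 + \frac{310}{48} = 32 + 6.46 = 38.46$, where A = 32
Step-deviation Method: $\bar{X} = A + \frac{\sum fU}{\sum f} \times h = 32 + \frac{62}{48} \times 5 = 32 + 1.2917 \times 5 = 38.46$, where A = 32 and h = 5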
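
For grouped data the same cross-check can be sketched in Python, using the class midpoints and frequencies of Example # 2. The names are illustrative; A = 32 and h = 5 follow the example.

```python
midpoints = [22, 27, 32, 37, 42, 47]
freqs = [1, 4, 8, 11, 15, 9]
A, h = 32, 5                     # arbitrary origin and common class interval

total_f = sum(freqs)                                               # sum of f
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))              # sum of fX
sum_fd = sum(f * (x - A) for f, x in zip(freqs, midpoints))        # sum of fD
sum_fu = sum(f * (x - A) // h for f, x in zip(freqs, midpoints))   # sum of fU

direct = sum_fx / total_f
shortcut = A + sum_fd / total_f
step_deviation = A + (sum_fu / total_f) * h

print(f"sum f = {total_f}, sum fX = {sum_fx}, sum fD = {sum_fd}, sum fU = {sum_fu}")
print(f"{direct:.2f} {shortcut:.2f} {step_deviation:.2f}")
```

The integer division in `sum_fu` is safe here only because every deviation X - A is an exact multiple of h; with unequal class widths the step-deviation method does not apply in this form.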
NOTE: Unless all the observations are equal, at least one observation will be below and at least
one will be above the mean.
PROPERTIES OF ARITHMETIC MEAN: Following are the properties of the arithmetic
mean.
• The mean of a constant is equal to that constant.
• The sum of the deviations of the observations from their mean is equal to zero:
$\sum (X - \bar{X}) = 0$
• The sum of squared deviations of the observations from their mean is a minimum, i.e. it is
less than the sum of squared deviations of the observations from any other value:
$\sum (X - \bar{X})^2 < \sum (X - a)^2$
where 'a' is any value other than the mean of the data.
• If $n_1$ values have mean $\bar{X}_1$, $n_2$ values have mean $\bar{X}_2$, $n_3$ values have mean $\bar{X}_3$, and
so on, then the mean of all the values is
$\bar{X}_c = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2 + \dots + n_k \bar{X}_k}{n_1 + n_2 + \dots + n_k}$
• The arithmetic mean depends on origin and scale, i.e. if a variable X has mean $\bar{X}$ and
$Y = a + bX$, where a and b are any constants, then the mean of the new variable Y is
$\bar{Y} = a + b\bar{X}$
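Example # 3. (Mean of X = 548/8 = 68.5)

X       (X-68.5)   (X-68.5)^2   (X-70)   (X-70)^2   Y=2X+3
67      -1.5         2.25       -3         9        137
72       3.5        12.25        2         4        147
68      -0.5         0.25       -2         4        139
70       1.5         2.25        0         0        143
65      -3.5        12.25       -5        25        133
68      -0.5         0.25       -2         4        139
75       6.5        42.25        5        25        153
63      -5.5        30.25       -7        49        129
TOTAL    0         102         -12       120       1120

The deviations from the mean sum to zero, and the sum of squared deviations from the mean (102)
is less than the sum of squared deviations from the arbitrary value 70 (120).
Mean of Y = 1120/8 = 140 (By transforming the original variable)
Mean of Y = 2 (68.5) + 3 = 140 (By using property)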
Example # 4. The mean weight of 10 students is 50 Kg. When two students left the class, the
mean weight became 48 Kg. Find the mean weight of the students who left the class.
SOLUTION:-
Total weight of 10 students = (10) (50) = 500
Total weight of 8 students (after 2 students left the class) = (8) (48) = 384
Total weight of 2 students (who left the class) = 500 - 384 = 116
Mean weight of the students who left the class = 116/2 = 58
Example # 5. In a class of 25 students, 20 students took a Math test on Tuesday
and their mean marks were 80. On Friday the remaining students took the Math
test and their mean marks were 90. Find the mean marks of the entire class.
SOLUTION:-
Total marks of 20 students who took the test on Tuesday = (20) (80) = 1600
Total marks of 5 students who took the test on Friday = (5) (90) = 450
Total marks of 25 students = 1600 + 450 = 2050
Mean marks of 25 students = 2050/25 = 82
Example # 6. Ali Shah took five Math tests during the semester and the mean of his test
scores was 85. If his mean after the first three tests was 83, what was the mean of his 4th and 5th
tests?
SOLUTION:-
Total marks of all five tests = (5) (85) = 425
Total marks of first three tests = (3) (83) = 249
Total marks of last two tests = 425 - 249 = 176
Mean marks of last two tests = 176/2 = 88
Example # 7. Ali has grades of 84, 65, and 76 on three math tests. What grade must he obtain
on the next test to have an average of exactly 80 for the four tests?
SOLUTION:-
Total marks required to have a mean of 80 in 4 tests = (4) (80) = 320
Total marks of first three tests = 84 + 65 + 76 = 225
Marks required in 4th test to have an average of 80 = 320 - 225 = 95
Example # 8. The mean marks of students from three sections A, B and C are 45, 40 and 35
respectively, and the numbers of students in the three sections are 50, 40 and 60. Find the mean
marks of the students from the three sections (combined mean).
SOLUTION:-
Total marks of 50 students from Section A = (50) (45) = 2250
Total marks of 40 students from Section B = (40) (40) = 1600
Total marks of 60 students from Section C = (60) (35) = 2100
Total marks of 150 students from three sections = 5950
Mean marks of 150 students from three sections = 5950/150 = 39.67
Merits of Arithmetic Mean:
• It is clearly defined by a mathematical formula.
• It is easy to calculate and simple to understand.
• It is based on all the observations.
• It is capable of further algebraic treatment.
• It is least affected by sampling fluctuations.
De-Merits of Arithmetic Mean:
• It is not an appropriate average for highly skewed distributions.
• It is greatly affected by extreme values.
• It cannot be calculated for open-end classes.
• It may be a value which is not present in the data.
• It cannot be computed accurately if even one item is missing.
Weighted Arithmetic mean: Sometimes, we want to find the average of certain values which
are not of equal importance. When the values are not of equal importance, we assign them
certain numerical values to express their relative importance. These numerical values are called
weights.
If X1, X2, ..., Xk have weights W1, W2, ..., Wk, then the weighted mean is defined as
$\bar{X}_w = \frac{W_1 X_1 + W_2 X_2 + \dots + W_k X_k}{W_1 + W_2 + \dots + W_k} = \frac{\sum WX}{\sum W}$
Example # 9. An examination was held to decide about the award of a scholarship in an
institution. The weights of the various subjects were different. The marks obtained by 3
candidates (A, B, C) out of 100 are given below. If the candidate getting the highest average
score is to be awarded the scholarship, who should get it?

Subject       Weights (%)   XA    XB    XC    WXA    WXB    WXC
Statistics     40            70    80    85   2800   3200   3400
Mathematics    30            90    75    75   2700   2250   2250
Economics      20            50    60    45   1000   1200    900
English        10            60    45    65    600    450    650
TOTAL         100           270   260   270   7100   7100   7200

Un-Weighted Mean (A.M)
A = 270/4 = 67.5
B = 260/4 = 65.0
C = 270/4 = 67.5
Weighted Mean
A = 7100/100 = 71
B = 7100/100 = 71
C = 7200/100 = 72
Candidate C has the highest weighted mean and should get the scholarship.
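
The weighted and un-weighted means of Example # 9 can be reproduced with a short Python sketch. The function name `weighted_mean` and the dictionary layout are illustrative choices, not part of the original notes.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(W * X) / sum(W)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


weights = [40, 30, 20, 10]   # Statistics, Mathematics, Economics, English
marks = {
    "A": [70, 90, 50, 60],
    "B": [80, 75, 60, 45],
    "C": [85, 75, 45, 65],
}

for candidate, scores in marks.items():
    simple = sum(scores) / len(scores)
    weighted = weighted_mean(scores, weights)
    print(f"{candidate}: un-weighted = {simple:.1f}, weighted = {weighted:.1f}")

best = max(marks, key=lambda c: weighted_mean(marks[c], weights))
print("Highest weighted mean:", best)
```

Candidates A and C tie on the un-weighted mean, so the weights are what separate them: C scores higher in the most heavily weighted subject.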
NOTE:-
• From the mean alone we cannot find how many values are above and how many are below it.
• At least one value will be greater and at least one will be less than the mean, unless all the
values are equal.
• The geometric mean and the harmonic mean are useful measures of central tendency for
averaging rates and ratios.
Geometric mean:- The geometric mean "G" is defined as the nth root of the product of n
positive values X1, X2, X3, ..., Xn.
For ungrouped data the geometric mean is given by
$G = \left(\prod X\right)^{1/n}$ OR $G = (X_1 \times X_2 \times X_3 \times \dots \times X_n)^{1/n}$ OR
$\log G = \frac{\log X_1 + \log X_2 + \dots + \log X_n}{n} = \frac{\sum \log X}{n}$
$G = \text{Antilog}\left(\frac{\sum \log X}{n}\right)$
For grouped data the geometric mean is given by
$G = \left[(X_1)^{f_1} \times (X_2)^{f_2} \times (X_3)^{f_3} \times \dots \times (X_n)^{f_n}\right]^{1/\sum f}$
where n denotes the total number of classes,
OR
$\log G = \frac{f_1 \log X_1 + f_2 \log X_2 + f_3 \log X_3 + \dots + f_n \log X_n}{f_1 + f_2 + f_3 + \dots + f_n} = \frac{\sum f \log X}{\sum f}$
$G = \text{Antilog}\left(\frac{\sum f \log X}{\sum f}\right)$
Merits of Geometric Mean:
• It is clearly defined by a mathematical formula.
• It is based on all the observations.
• It is less affected by extreme values than the arithmetic mean.
• It is suitable for further algebraic treatment.
• It gives equal weight to all the values.
• It is an appropriate average for averaging rates of change and ratios.
De-Merits of Geometric Mean:
• It is neither easy to calculate nor simple to understand.
• It cannot be calculated if any value in the data is zero or negative.
• It cannot be calculated in the case of an open-end frequency distribution.
Example # 10. Find the Geometric mean of the values 3, 5, 6, 6, 7, 10, 12.

X        3        5        6        6        7        10       12       Total
log(X)   0.4771   0.6990   0.7782   0.7782   0.8451   1.0000   1.0792   5.6568

$G = \text{Antilog}\left(\frac{\sum \log X}{n}\right) = \text{Antilog}\left(\frac{5.6568}{7}\right)$
= Antilog (0.8081) = 6.43
Example # 11. Grouped data are available on an insect population, giving age classes and the
corresponding frequencies. Find the Geometric mean.

CLASS    f    X    log(X)   f log(X)
0-4      2     2   0.3010   0.6021
4-8      5     6   0.7782   3.8908
8-12     7    10   1.0000   7.0000
12-16    8    14   1.1461   9.1690
16-20    7    18   1.2553   8.7869
20-24    4    22   1.3424   5.3697
24-28    1    26   1.4150   1.4150
TOTAL   34                  36.2334

$G = \text{Antilog}\left(\frac{\sum f \log X}{\sum f}\right) = \text{Antilog}\left(\frac{36.2334}{34}\right)$
= Antilog (1.0657) = 11.63
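
The logarithmic form of the geometric mean can be sketched in Python for both ungrouped and grouped data. The function name `geometric_mean` and its optional `freqs` argument are hypothetical; the data are those of Examples # 10 and # 11.

```python
import math


def geometric_mean(values, freqs=None):
    """G = antilog( sum(f * log X) / sum(f) ), with f = 1 for ungrouped data."""
    if freqs is None:
        freqs = [1] * len(values)
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    log_sum = sum(f * math.log10(v) for f, v in zip(freqs, values))
    return 10 ** (log_sum / sum(freqs))


# Example # 10 (ungrouped)
g_ungrouped = geometric_mean([3, 5, 6, 6, 7, 10, 12])

# Example # 11 (grouped: class midpoints and frequencies)
g_grouped = geometric_mean([2, 6, 10, 14, 18, 22, 26], [2, 5, 7, 8, 7, 4, 1])

print(f"{g_ungrouped:.2f} {g_grouped:.2f}")
```

Working through logarithms, as the hand method does, also avoids forming the very large product of all the values directly.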
Harmonic mean:- The harmonic mean "H" of a set of n values X1, X2, X3, ..., Xn is defined as
the reciprocal of the arithmetic mean of the reciprocals of the values. It is abbreviated as H.M
and is given by
For ungrouped data
$H.M = \text{Reciprocal of } \frac{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \dots + \frac{1}{x_n}}{n} = \frac{n}{\sum \frac{1}{X}}$
For grouped data
$H.M = \text{Reciprocal of } \frac{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \frac{f_3}{x_3} + \dots + \frac{f_k}{x_k}}{f_1 + f_2 + f_3 + \dots + f_k} = \frac{\sum f}{\sum \frac{f}{X}}$
Merits of Harmonic Mean:
• It is clearly defined by a mathematical formula.
• It is based on all the observations of the data.
• It is suitable for further algebraic treatment.
• It is not much affected by extremely large observations.
• It is not much affected by sampling fluctuations.
• It gives more weight to the small values and less weight to the large values.
• Unlike the weighted mean, it needs no separately chosen weights, since the values are
automatically weighted.
De-Merits of Harmonic Mean:
• It is neither easy to calculate nor simple to understand.
• It cannot be calculated if any value of the data is zero.
• It is affected by extremely small observations.
• It may be a value which is not present in the data.
Example # 12. Calculate the Harmonic mean for the following data.

CLASS    f    X    1/X      f (1/X)
0-4      2     2   0.5000   1.0000
4-8      5     6   0.1667   0.8333
8-12     7    10   0.1000   0.7000
12-16    8    14   0.0714   0.5714
16-20    7    18   0.0556   0.3889
20-24    4    22   0.0455   0.1818
24-28    1    26   0.0385   0.0385
TOTAL   34                  3.7139

$H = \frac{\sum f}{\sum \frac{f}{X}} = \frac{34}{3.7139} = 9.15$
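
A minimal Python sketch of the grouped harmonic mean, using the midpoints and frequencies of Example # 12. The function name `harmonic_mean` is illustrative and shadows nothing from the standard library here because no module is imported.

```python
def harmonic_mean(values, freqs=None):
    """H = sum(f) / sum(f / X), with f = 1 for ungrouped data."""
    if freqs is None:
        freqs = [1] * len(values)
    if any(v == 0 for v in values):
        raise ValueError("harmonic mean is undefined when a value is zero")
    return sum(freqs) / sum(f / v for f, v in zip(freqs, values))


midpoints = [2, 6, 10, 14, 18, 22, 26]
freqs = [2, 5, 7, 8, 7, 4, 1]

sum_f_over_x = sum(f / x for f, x in zip(freqs, midpoints))
h_mean = harmonic_mean(midpoints, freqs)
print(f"sum f/X = {sum_f_over_x:.4f}, H = {h_mean:.2f}")
```

The first class contributes 1.0000 of the total 3.7139 even though it holds only 2 of the 34 observations, which shows how strongly small values pull the harmonic mean down.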

Relation between A.M, G.M & H.M: A.M ≥ G.M ≥ H.M, or equivalently H.M ≤ G.M ≤ A.M.
The three means are equal only when all the observations are identical: A.M = G.M = H.M.
For two positive values, $G.M = \sqrt{A.M \times H.M}$.
Example # 13. Verify the relation A.M > G.M > H.M for the following data.

CLASS    f    X    fX    log X     f log X   1/X     f/X
1-3      5     2    10   0.30103    1.5051   0.500   2.500
4-6      8     5    40   0.69897    5.5918   0.200   1.600
7-9     12     8    96   0.90309   10.8371   0.125   1.500
10-12    9    11    99   1.04139    9.3725   0.091   0.818
13-15    3    14    42   1.14613    3.4384   0.071   0.214
TOTAL   37         287             30.7449           6.632

A.M = 287/37 = 7.76 > G.M = Antilog (30.7449/37) = 6.78 > H.M = 37/6.632 = 5.58
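
The ordering of the three means in Example # 13 can be verified numerically. This sketch recomputes each mean from the class midpoints and frequencies; the variable names are illustrative.

```python
import math

midpoints = [2, 5, 8, 11, 14]
freqs = [5, 8, 12, 9, 3]
total_f = sum(freqs)

# Arithmetic mean: sum(fX) / sum(f)
am = sum(f * x for f, x in zip(freqs, midpoints)) / total_f

# Geometric mean: antilog( sum(f log X) / sum(f) )
gm = 10 ** (sum(f * math.log10(x) for f, x in zip(freqs, midpoints)) / total_f)

# Harmonic mean: sum(f) / sum(f / X)
hm = total_f / sum(f / x for f, x in zip(freqs, midpoints))

print(f"A.M = {am:.2f}, G.M = {gm:.2f}, H.M = {hm:.2f}")
print("A.M > G.M > H.M:", am > gm > hm)
```

Replacing the midpoints with identical values makes all three results coincide, which is the equality case of the relation.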
Median & Quantiles:-
Median: The median is the value that divides a set of data into two equal parts after the values
have been arranged in ascending or descending order of magnitude. It is simply the middle
value of the data when the number of values is odd, and the mean of the two middle
values when the number of values is even. The median is denoted by $\tilde{X}$, read as "X-tilde".
In both cases,
For ungrouped data Median = Size of $\left(\frac{n+1}{2}\right)$th item.
Merits of Median:
• It is easy to calculate and simple to understand.
• It is not affected by extreme values.
• It can be calculated for an open-end frequency distribution.
• It is a useful average when the data are of a qualitative nature and can be ranked.
• It is an appropriate average for highly skewed distributions.
De-Merits of Median:
• It is not rigorously defined by a mathematical formula.
• It is not based on all the values.
• It is not suitable for further algebraic treatment.
• It is affected by sampling fluctuations.
• It is difficult to arrange a large number of values in order.
Example # 14. Given below are the marks obtained by 20 students.
53, 74, 82, 42, 39, 28, 20, 81, 68, 58, 54, 93, 70, 30, 61, 55, 36, 37, 29, 94. Find the Median.
Solution:- First arrange the data in ascending order of magnitude:
20, 28, 29, 30, 36, 37, 39, 42, 53, 54, 55, 58, 61, 68, 70, 74, 81, 82, 93, 94.
Median = Size of $\left(\frac{n+1}{2}\right)$th item = Size of $\left(\frac{20+1}{2}\right)$th item = Size of 10.5th item
Size of 10.5th item = 10th + 0.5(11th - 10th)
Median = 54 + 0.5(55 - 54) = 54.5
50% of the students, i.e. 10 students, obtained marks of 54.5 or below.
Quantiles: Quantiles are the values that divide a set of data into more than two equal parts.
Quartiles, deciles and percentiles are collectively called quantiles.
Often we are interested in the position of an observation in a set of data. For example, we may
want to know the percentage of students whose height is less than some specific value.
The measures used for this purpose are called quantiles or fractiles and are usually calculated
under the following headings.
I) Quartiles II) Deciles III) Percentiles
Quartiles: Quartiles are the values that divide a set of data into four equal parts after
arranging them in ascending or descending order of magnitude. Quartiles are denoted by Q1,
Q2 and Q3. Q1 is called the lower quartile, Q3 is called the upper quartile and Q2 is the median.
For ungrouped data
$Q_j$ = Size of $j\left(\frac{n+1}{4}\right)$th observation, where j = 1, 2, 3
For the marks of Example # 14 (n = 20):
$Q_1$ = Size of $1\left(\frac{20+1}{4}\right)$th item = Size of 5.25th item
Size of 5.25th item = 5th + 0.25(6th - 5th)
= 36 + 0.25(37 - 36) = 36.25
Q1 = 36.25 indicates that 25% of the students (i.e. 5) have marks of 36.25 or below OR 75% of the
students (i.e. 15) have marks of 36.25 or above.
$Q_3$ = Size of $3\left(\frac{20+1}{4}\right)$th item = Size of 15.75th item
Size of 15.75th item = 15th + 0.75(16th - 15th)
= 70 + 0.75(74 - 70) = 73
Q3 = 73 indicates that 75% of the students (i.e. 15) have marks of 73 or below OR 25% of the
students (i.e. 5) have marks of 73 or above.
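
The "size of the j(n+1)/parts-th item" rule, with linear interpolation between neighbouring items, can be sketched as a single Python function that also covers deciles (parts = 10) and percentiles (parts = 100). The function name `quantile` is hypothetical, and clamping positions that fall outside the data to the smallest or largest value is an assumption not stated in the notes.

```python
def quantile(values, j, parts):
    """Value at position j * (n + 1) / parts in the sorted data.

    parts = 2 gives the median, 4 quartiles, 10 deciles, 100 percentiles.
    """
    data = sorted(values)
    n = len(data)
    position = j * (n + 1) / parts
    whole = int(position)          # e.g. 5 for the 5.25th item
    fraction = position - whole    # e.g. 0.25
    if whole < 1:                  # assumption: clamp below the first item
        return data[0]
    if whole >= n:                 # assumption: clamp above the last item
        return data[-1]
    lower, upper = data[whole - 1], data[whole]
    return lower + fraction * (upper - lower)


marks = [53, 74, 82, 42, 39, 28, 20, 81, 68, 58,
         54, 93, 70, 30, 61, 55, 36, 37, 29, 94]

median = quantile(marks, 1, 2)
q1 = quantile(marks, 1, 4)
q3 = quantile(marks, 3, 4)
print(median, q1, q3)
```

Statistical packages offer several slightly different interpolation conventions, so their quartiles may differ a little from the values this (n + 1) rule produces.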
Deciles:- Deciles are the values that divide a set of data into ten equal parts after arranging
them in ascending order of magnitude. Deciles are denoted by D1, D2, D3, ..., D9.
For ungrouped data
$D_j$ = Size of $j\left(\frac{n+1}{10}\right)$th observation, where j = 1, 2, 3, ..., 9
Percentiles:- Percentiles are the values that divide a set of data into 100 equal parts after
arranging them in ascending order of magnitude. Percentiles are denoted by P1, P2, P3, ..., P99.
For ungrouped data
$P_j$ = Size of $j\left(\frac{n+1}{100}\right)$th observation, where j = 1, 2, 3, ..., 99
Median and Quantiles for group data
Median for Group data
$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - C\right)$
l = Lower class boundary of the class containing the median*
h = Class interval of the class containing the median
f = Frequency of the class containing the median
n = Total number of observations
C = Cumulative frequency of the class preceding the class containing the median.
*The median class is the class which corresponds to the cumulative frequency in which (n/2)
lies.
Example # 15. Estimate the Median and the Quartiles.

Daily Income (Rs. 00)   f    cf   Class Boundaries
5-24                     4    4   4.5-24.5
25-44                    6   10   24.5-44.5
45-64                   14   24   44.5-64.5
65-84                   22   46   64.5-84.5
85-104                  14   60   84.5-104.5
105-124                  5   65   104.5-124.5
125-144                  7   72   124.5-144.5
145-164                  3   75   144.5-164.5

Since n/2 = 75/2 = 37.5, the class containing the median is 64.5-84.5.
$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - C\right) = 64.5 + \frac{20}{22}\left(\frac{75}{2} - 24\right) = 76.77$
Quartiles for group data
$Q_j = l + \frac{h}{f}\left(\frac{jn}{4} - C\right)$, where j = 1, 2, 3
l = Lower class boundary of the class containing the jth quartile
(i.e. the class corresponding to the cumulative frequency in which the 'j(n/4)th' observation
lies).
h = Class interval of the class containing the jth quartile
f = Frequency of the class containing the jth quartile
n = Total number of observations
C = Cumulative frequency of the class preceding the class containing the jth quartile.
Deciles and percentiles for group data are obtained in the same way, replacing jn/4 by jn/10
and jn/100 respectively.
Calculate Q1, Q3, D3 & P70
Since n/4 = 75/4 = 18.75, the class containing Q1 is 44.5-64.5.
$Q_1 = 44.5 + \frac{20}{14}\left(1 \times \frac{75}{4} - 10\right) = 57$
Since 3n/4 = 225/4 = 56.25, the class containing Q3 is 84.5-104.5.
$Q_3 = 84.5 + \frac{20}{14}\left(3 \times \frac{75}{4} - 46\right) = 99.14$
Since 3n/10 = 225/10 = 22.5, the class containing D3 is 44.5-64.5.
$D_3 = 44.5 + \frac{20}{14}\left(3 \times \frac{75}{10} - 10\right) = 62.36$
Since 70n/100 = 5250/100 = 52.5, the class containing P70 is 84.5-104.5.
$P_{70} = 84.5 + \frac{20}{14}\left(70 \times \frac{75}{100} - 46\right) = 93.79$
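
The grouped-data formula l + (h/f)(jn/parts - C) can be sketched as one Python function that locates the class by cumulative frequency and then interpolates inside it. The function name `grouped_quantile` is hypothetical; the boundaries and frequencies are those of Example # 15.

```python
def grouped_quantile(boundaries, freqs, j, parts):
    """Quantile of a frequency distribution: l + (h / f) * (j * n / parts - C).

    boundaries: list of (lower, upper) class boundaries in ascending order
    parts: 2 for the median, 4 quartiles, 10 deciles, 100 percentiles
    """
    n = sum(freqs)
    target = j * n / parts
    cum = 0                                  # C: cumulative frequency so far
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cum + f >= target:      # first class whose c.f. reaches target
            return lower + (upper - lower) / f * (target - cum)
        cum += f
    raise ValueError("target position lies beyond the last class")


boundaries = [(4.5, 24.5), (24.5, 44.5), (44.5, 64.5), (64.5, 84.5),
              (84.5, 104.5), (104.5, 124.5), (124.5, 144.5), (144.5, 164.5)]
freqs = [4, 6, 14, 22, 14, 5, 7, 3]

median = grouped_quantile(boundaries, freqs, 1, 2)
q1 = grouped_quantile(boundaries, freqs, 1, 4)
q3 = grouped_quantile(boundaries, freqs, 3, 4)
d3 = grouped_quantile(boundaries, freqs, 3, 10)
p70 = grouped_quantile(boundaries, freqs, 70, 100)
print(f"{median:.2f} {q1:.2f} {q3:.2f} {d3:.2f} {p70:.2f}")
```

Locating the class from the running cumulative frequency, rather than by eye, guards against interpolating in the wrong class, which is the most common slip in these calculations.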
Mode:- The mode is defined as that value in the data which occurs the greatest number of
times, provided such a value exists. A set of data may have more than one mode, or no mode at
all when each observation occurs the same number of times. A distribution having only one
mode is called a uni-modal distribution, one having two modes is called a bi-modal distribution
and a distribution having more than two modes is called a multi-modal distribution.
For grouped data
$\text{Mode} = l + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$
Where
l = Lower class boundary of the class containing the mode (i.e. the class corresponding to the
highest frequency)
h = Class interval of the class containing the mode
fm = Frequency of the class containing the mode
f1 = Frequency of the class preceding the class containing the mode
f2 = Frequency of the class following the class containing the mode
Merits of Mode:
• It is easy to calculate and simple to understand.
• It is not affected by extreme values.
• It is a suitable average for qualitative data.
• It can be located even in open-end classes.
De-Merits of Mode:
• It is an ill-defined average.
• It is not based on all the values.
• It is not suitable for further algebraic treatment.
• It is affected by sampling fluctuations.
Example # 16. Calculate the Mode for the data.

Weight     No of students   Class boundaries
118-126     3               117.5-126.5
127-135     5               126.5-135.5
136-144     9               135.5-144.5
145-153    12               144.5-153.5
154-162     5               153.5-162.5
163-171     4               162.5-171.5
172-180     2               171.5-180.5

Since the highest frequency is 12, the class containing the mode is 144.5-153.5.
$\text{Mode} = l + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$
$\text{Mode} = 144.5 + \frac{12 - 9}{(12 - 9) + (12 - 5)} \times 9 = 147.2$
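
The grouped mode formula can be sketched in Python as follows. The function name `grouped_mode` is hypothetical, and treating a missing neighbouring class as having frequency 0 when the modal class is first or last is an assumption; ties for the highest frequency are resolved by taking the first such class.

```python
def grouped_mode(boundaries, freqs):
    """Mode = l + (fm - f1) / ((fm - f1) + (fm - f2)) * h for the modal class."""
    m = freqs.index(max(freqs))                      # first class with highest frequency
    fm = freqs[m]
    f1 = freqs[m - 1] if m > 0 else 0                # preceding class (assumed 0 if none)
    f2 = freqs[m + 1] if m < len(freqs) - 1 else 0   # following class (assumed 0 if none)
    lower, upper = boundaries[m]
    h = upper - lower
    return lower + (fm - f1) / ((fm - f1) + (fm - f2)) * h


boundaries = [(117.5, 126.5), (126.5, 135.5), (135.5, 144.5), (144.5, 153.5),
              (153.5, 162.5), (162.5, 171.5), (171.5, 180.5)]
freqs = [3, 5, 9, 12, 5, 4, 2]

mode = grouped_mode(boundaries, freqs)
print(f"Mode = {mode:.1f}")
```

The mode falls in the lower part of the modal class because the preceding class (9) is closer in frequency to the modal class than the following class (5).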
NOTE:- (i) A data set may have more than one mode or no mode at all.
(ii) A data set with one mode is called uni-modal, with two modes bi-modal, and with more than
two modes multi-modal.
Relation between mean, median, mode.
Mean = Median = Mode For symmetrical distribution
Mean > Median > Mode For positively skewed distribution
Mean < Median < Mode For negatively skewed distribution
For a moderately skewed distribution (empirical relation)
Mode = 3 Median - 2 Mean
