Wordpress Documentation
The raw data that have been collected are usually very large in quantity. Therefore, we
have to organize and summarize the collected data in a form that is easy to understand.
This is called presentation of statistical data.
Array: The arrangement of data in ascending or descending order of magnitude is called an
array.
Different methods used in the presentation of statistical data
1. Classification 2. Tabulation 3. Diagram 4. Graph
Classification: The process of arranging data into relatively homogeneous groups or classes
according to some common characteristic is called classification. For example, the population of
a country is classified according to age, sex, religion and marital status.
Tabulation: The systematic arrangement of the data in the form of rows and columns for the
purpose of comparison and analysis is known as tabulation.
Frequency distribution: A frequency distribution is a tabular arrangement of data in which
various items are arranged into classes or groups and the number of items falling in that class
is stated. The number of observations falling in a particular class is called class frequency or
simply frequency of that class and is denoted by "f".
Class and Class frequency: When a set of data is divided into non-overlapping
homogeneous groups, each group is called a class or class interval. The number of observations
falling in a particular class is called the frequency of that class, or simply frequency, and is
denoted by "f".
Class limits: The class limits are defined as the number or the values of the variables which
are used to separate two classes. The smaller number is called lower class limit and larger
number is called upper class limit.
Class boundaries: The class boundaries are obtained by taking half of the difference between
the upper limit of one class and the lower limit of the next class, then subtracting it from each
lower class limit and adding it to each upper class limit. They can also be obtained by
subtracting h/2 from, and adding h/2 to, the midpoint of each class.
Class mark or mid points: The class mark or the midpoint is that value which divides a
class into two equal parts. It is obtained by dividing the sum of lower and upper class limits
or class boundaries of a class by 2.
Class interval: The class interval is the length of a class and is usually denoted by "h".
It is obtained as
(i) the difference between the upper class boundary and the lower class boundary (not
the difference between the class limits), OR
(ii) the difference between either two successive lower class limits or two successive
upper class limits, OR
(iii) the difference between two successive midpoints.
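The definitions of class boundaries, class marks and class interval can be checked with a small sketch. The class limits used here (5-24, 25-44, ...) are borrowed from the grouped table that appears later in these notes; the variable names are only illustrative.

```python
# Class limits borrowed from the grouped table later in these notes.
limits = [(5, 24), (25, 44), (45, 64), (65, 84)]

# Gap between the upper limit of one class and the lower limit of the next.
gap = limits[1][0] - limits[0][1]  # 25 - 24 = 1

# Boundaries: subtract half the gap from lower limits, add it to upper limits.
boundaries = [(lo - gap / 2, hi + gap / 2) for lo, hi in limits]

# Class marks: half the sum of the lower and upper class limits.
midpoints = [(lo + hi) / 2 for lo, hi in limits]

# Class interval h: upper boundary minus lower boundary of a class.
h = boundaries[0][1] - boundaries[0][0]

print(boundaries)
print(midpoints)
print(h)
```

Note that h also equals the difference between two successive lower limits (25 - 5) and between two successive midpoints (34.5 - 14.5), as stated in (ii) and (iii).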
CONSTRUCTION OF A FREQUENCY DISTRIBUTION:
Decide the number of classes: The number of classes is determined by the formula
$K = 1 + 3.3\log_{10}(n)$ OR $K \approx \sqrt{n}$ (approximately),
where K denotes the number of classes and n denotes the total number of observations.
Determine the range of the data: The difference between the largest and smallest values in
the data is called the range of the data, i.e. R = largest observation - smallest observation,
where R denotes the range of the data.
Determine the approximate size of class interval: The size of the class interval is
determined by dividing the range of the data by the number of classes i.e. h= R/K
Where h denotes the size of the class interval. In case of fractional results, the next higher
whole number is usually taken as the size of the class interval.
Decide where to locate the class limits: The lower class limit of the first class is placed just
below the smallest value in the data. Add the class interval to it to get the lower class limit of the
next class, and repeat this process until the lower class limit of the last class is reached.
Distribute the data into appropriate classes: Take each observation in turn and mark a vertical bar
"|" (tally) against the class to which it belongs.
Cumulative Frequency: The cumulative frequency of a class is obtained by adding the
frequencies of all preceding classes to the frequency of that class and is denoted by c.f.
Relative Frequency: The frequency of a class divided by the total frequency of all the classes
is called Relative frequency and is denoted by r.f.
Cumulative relative frequency: Cumulative relative frequency of a class is obtained by
adding all the relative frequencies of all preceding classes including that class.
Percentage frequency: Percentage frequency of a class is obtained by multiplying the
relative frequency of that class by 100.
Cumulative percentage frequency: Cumulative percentage frequency of a class is
obtained by adding all the percentage frequencies of all preceding classes including that
class.
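As a minimal sketch of the five derived columns defined above, the following uses the class frequencies 4, 6, 14, 22, 14, 5, 7, 3 from the grouped table that appears later in these notes. The variable names are illustrative only.

```python
from itertools import accumulate

# Class frequencies borrowed from the grouped table later in these notes.
frequencies = [4, 6, 14, 22, 14, 5, 7, 3]
total = sum(frequencies)

# c.f.: running total of the frequencies.
cum_freq = list(accumulate(frequencies))

# r.f.: each frequency divided by the total frequency.
rel_freq = [f / total for f in frequencies]

# Cumulative relative frequency, percentage and cumulative percentage frequency.
cum_rel_freq = [c / total for c in cum_freq]
pct_freq = [100 * r for r in rel_freq]
cum_pct_freq = [100 * c for c in cum_rel_freq]

for row in zip(frequencies, cum_freq, rel_freq, pct_freq, cum_pct_freq):
    print("{:3d} {:3d} {:.4f} {:6.2f} {:6.2f}".format(*row))
```

The last cumulative frequency equals the total number of observations, and the last cumulative relative and percentage frequencies equal 1 and 100.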
Example # 1. The following data is the final plant height (cm) of thirty plants of wheat.
Construct a frequency distribution
87 91 89 88 89 91 87 92 90 98 95 97 96
100 101 96 98 99 98 100 102 99 101 105 103 107
105 106 107 112
(i) Number of classes: The number of classes is determined by the formula
$K = 1 + 3.3\log_{10}(n) = 1 + 3.3\log_{10}(30) = 1 + 3.3(1.4771) = 5.87 \approx 6$
(ii) Size of class interval: The size of the class interval is h = R/K, where
R = largest observation - smallest observation = 112 - 87 = 25, so
$h = 25/6 = 4.17 \approx 5$
FREQUENCY DISTRIBUTION
Total: frequency 30, relative frequency 1.0000, percentage frequency 100
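The construction steps for Example # 1 can be sketched as follows. The class limits of the original table are not shown here, so the first lower class limit (86, one unit below the smallest value) is an assumption; a different starting point would give different class frequencies.

```python
import math

# Final plant heights (cm) from Example # 1.
heights = [87, 91, 89, 88, 89, 91, 87, 92, 90, 98, 95, 97, 96,
           100, 101, 96, 98, 99, 98, 100, 102, 99, 101, 105, 103, 107,
           105, 106, 107, 112]

n = len(heights)
K = round(1 + 3.3 * math.log10(n))   # number of classes
R = max(heights) - min(heights)      # range
h = math.ceil(R / K)                 # class interval, rounded up

# Assumption: the first class starts one unit below the smallest value.
first_lower = min(heights) - 1
lower_limits = [first_lower + i * h for i in range(K)]

# Tally each observation into the class it belongs to.
frequencies = [0] * K
for x in heights:
    frequencies[(x - first_lower) // h] += 1

for lo, f in zip(lower_limits, frequencies):
    print(f"{lo}-{lo + h - 1}: {'|' * f} ({f})")
```

Under this assumption the six classes run from 86-90 to 111-115 and the tallies add up to the 30 observations.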
Example # 2. The following data represent the number of goals scored by a team in 10 matches:
0, 0, 1, 1, 3, 1, 3, 0, 2, 0. Construct a frequency distribution.
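For discrete data such as the goals above, no class intervals are needed; each distinct value is its own class. A minimal sketch using the standard-library Counter:

```python
from collections import Counter

# Goals scored in the 10 matches of the example.
goals = [0, 0, 1, 1, 3, 1, 3, 0, 2, 0]

# Count how many times each distinct value occurs.
freq = Counter(goals)

for value in sorted(freq):
    print(value, "|" * freq[value], freq[value])
```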
Male 6
Female 4
Graphs: Graph means the drawing of geometrical curves in conformity with the given data.
It is a representation of data by a continuous curve.
Advantages of Graphs:
Graphs are an effective way to represent data.
Graphs are an effective way to compare two sets of data at a time.
Graphs are helpful to show the general trend of data.
Graphs are helpful in prediction and forecasting.
Graphs are useful to locate some of the averages.
Types of graphs: The types of graphs commonly used for displaying statistical data
are described below:
(i) Historigram/Graph of time series (ii) Histogram
(iii) Frequency polygon & Frequency curve (iv) Cumulative Frequency polygon or Ogive
Historigram/Graph of time series: A graph of time series is called historigram. A
Historigram is constructed by taking time along X-axis and the value of the variable along Y-
axis. Points are plotted and are then connected by straight line segments to get the
Historigram.
Histogram: A histogram is the graphical representation of a frequency distribution by a set of
adjacent rectangles in which the area of each rectangle is proportional to the corresponding
frequency. To construct a histogram, the class boundaries are taken along the X-axis and
rectangles are drawn on them whose heights are proportional to the frequencies of the respective
classes (frequency along the Y-axis). In the case of unequal class intervals, the adjusted frequency
is used in place of the frequency, where the adjusted frequency is obtained by dividing the
frequency by the class interval.
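The adjusted frequency for unequal class intervals can be illustrated with a short sketch. The classes and frequencies below are hypothetical, chosen only to show the arithmetic.

```python
# Hypothetical unequal classes: (lower boundary, upper boundary, frequency).
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 16), (40, 80, 8)]

# Adjusted frequency = frequency / class interval; this is the rectangle height.
adjusted = [f / (hi - lo) for lo, hi, f in classes]

# Area of each rectangle = width * height, which recovers the frequency.
areas = [(hi - lo) * a for (lo, hi, _), a in zip(classes, adjusted)]

for (lo, hi, f), a in zip(classes, adjusted):
    print(f"{lo}-{hi}: f = {f}, adjusted f = {a}")
```

With heights taken as adjusted frequencies, the rectangle areas stay proportional to the class frequencies even though the widths differ.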
Frequency polygon: A frequency polygon is a line graph of a frequency distribution in which
the frequencies are plotted against the midpoints of the classes. It is constructed by taking the
midpoints along the X-axis and the class frequencies along the Y-axis. Points are plotted and are then
connected by straight line segments. To get a polygon, add an extra class midpoint with zero
frequency at both ends of the distribution so that the polygon forms a closed figure
with the horizontal axis.
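The points of a frequency polygon can be computed as below; plotting is left out so the sketch needs only the standard library. The boundaries and frequencies are borrowed from the grouped table later in these notes.

```python
# Class boundaries and frequencies borrowed from the grouped table later in these notes.
boundaries = [4.5, 24.5, 44.5, 64.5, 84.5, 104.5, 124.5, 144.5, 164.5]
frequencies = [4, 6, 14, 22, 14, 5, 7, 3]

h = boundaries[1] - boundaries[0]
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

# Extra midpoints with zero frequency close the polygon at both ends.
xs = [midpoints[0] - h] + midpoints + [midpoints[-1] + h]
ys = [0] + frequencies + [0]

for x, y in zip(xs, ys):
    print(x, y)
```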
Frequency curve: If the frequency polygon is smoothed out, the resulting graph is called a
frequency curve. OR
A frequency curve is constructed by taking the midpoints along X-axis and class frequency
along Y-axis. Points are plotted and are then connected by free hand curve.
Cumulative frequency polygon/Ogive: A cumulative frequency polygon is obtained by
plotting the cumulative frequencies (along the Y-axis) against the upper class boundaries (along the
X-axis) and joining the points by straight line segments. To get a polygon, include the lower class
boundary of the first class with zero cumulative frequency and join the last point to the last upper
class boundary on the X-axis.
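A minimal sketch of the ogive points, again using the boundaries and frequencies of the grouped table later in these notes (plotting omitted):

```python
from itertools import accumulate

boundaries = [4.5, 24.5, 44.5, 64.5, 84.5, 104.5, 124.5, 144.5, 164.5]
frequencies = [4, 6, 14, 22, 14, 5, 7, 3]

# Start at the first lower boundary with cumulative frequency zero,
# then pair every upper boundary with its cumulative frequency.
cum_freq = [0] + list(accumulate(frequencies))
ogive_points = list(zip(boundaries, cum_freq))

for x, y in ogive_points:
    print(x, y)
```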
Types of frequency curve:
(1) Symmetrical distribution (2) Skewed distribution
Symmetrical distribution: A frequency distribution or curve is said to be symmetrical if
values equidistant from a central maximum have the same frequencies, for example the normal
curve.
Skewed distribution: A frequency distribution or curve is said to be skewed when it
departs from symmetry.
STEM-AND-LEAF DISPLAY: A clear disadvantage of using a frequency table is that the
identity of individual observations is lost in the grouping process. To overcome this drawback,
John Tukey (1977) introduced a technique known as the stem-and-leaf display. This
technique offers a quick and novel way of simultaneously sorting and displaying data sets
where each number in the data set is divided into two parts, a stem and a leaf. A stem is the
leading digit(s) of each number and is used in sorting, while a leaf is the rest of the number or
the trailing digit(s) and is shown in the display. A vertical line separates the leaf from the stem.
For example, the number 243 could be split in two ways:
2│43 or 24│3
A display with one line per stem:
3│286
4│519
5│2619
6│238
The same data with each stem split into two lines (* for leaves 0 to 4, . for leaves 5 to 9):
3*│2
3.│86
4*│1
4.│59
5*│21
5.│69
6*│23
6.│8
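The one-line-per-stem display can be reproduced with a short sketch. It assumes two-digit values, so the tens digit is the stem and the units digit is the leaf; the function name is illustrative only.

```python
from collections import defaultdict

# The thirteen two-digit values shown in the displays above.
data = [32, 45, 38, 41, 49, 36, 52, 56, 51, 62, 63, 59, 68]

def stem_and_leaf(values):
    """Group two-digit values by tens digit; leaves keep the input order."""
    stems = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        stems[stem].append(leaf)
    return dict(sorted(stems.items()))

display = stem_and_leaf(data)
for stem, leaves in display.items():
    print(f"{stem}│{''.join(map(str, leaves))}")
```

Unlike a frequency table, every original observation can be read back from the display.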
Two data sets can be compared by using a back-to-back stem-and-leaf display.
Data 1) 32 45 38 41 49 36 52 56 51 62 63 59 68
Data 2) 23 58 26 57 55 65 29 36 59 69 60
Data 1       Data 2
(# 13)       (# 11)
    │ 2 │369
 682│ 3 │6
 915│ 4 │
9162│ 5 │8759
 832│ 6 │590
MEASURE OF CENTRAL TENDENCY
An average is a single value that summarizes the data. As an average tends to lie at the center
of the distribution or data, it is called a measure of location or measure of central tendency.
Average:
An average is a numerical value that is used to represent a set of data.
Properties of a good Average: A good average must have the following properties:
It should be clearly defined by a mathematical formula.
It should be easy to calculate and simple to understand.
It should be based on all the observations of the data.
It should be capable of further algebraic treatment.
It should be least affected by fluctuations of sampling.
It should not be affected by extreme values.
Types of averages: The commonly used averages are:
(i) Arithmetic mean (ii) Geometric mean (iii) Harmonic mean
(iv) Median and quantiles (v) Mode
Arithmetic mean: The arithmetic mean (A.M) of a set of data is obtained by dividing the sum of
all the observations by the total number of observations. For population data it is denoted by
the Greek letter "μ", read as "mu". The population mean for N values is given as
$\mu = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$
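The estimate of the population mean is the sample mean, which is denoted by "$\bar{X}$" and read as
"X-bar". The sample mean for n values is given as follows.
For ungrouped data
Direct Method: $\bar{X} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$
In-direct Method/Short-cut Method: $\bar{X} = A + \frac{\sum_{i=1}^{n} D_i}{n}$, where $D = X - A$ and "A" is an arbitrary value.
Step-deviation Method/Coding Method: $\bar{X} = A + \frac{\sum_{i=1}^{n} U_i}{n} \times h$, where $U = \frac{X - A}{h}$.
For a data set with n = 10, $\sum x = 902$, A = 89, h = 1, $\sum D = 12$ and $\sum U = 12$:
$\bar{X} = \frac{\sum x}{n} = \frac{902}{10} = 90.2$
$\bar{X} = A + \frac{\sum D}{n} = 89 + \frac{12}{10} = 89 + 1.2 = 90.2$
$\bar{X} = A + \frac{\sum U}{n} \times h = 89 + \frac{12}{10} \times 1 = 89 + 1.2 \times 1 = 90.2$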
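The three methods for ungrouped data can be compared in a short sketch. The ten values below are the first ten plant heights of Example # 1; they happen to reproduce the sum 902 and, with A = 89, a deviation sum of 12, but treating them as the data of this worked example is an assumption.

```python
# Assumed data: the first ten plant heights of Example # 1 (sum = 902).
x = [87, 91, 89, 88, 89, 91, 87, 92, 90, 98]
n = len(x)
A = 89   # arbitrary origin
h = 1    # scale used for the step-deviation method

# Direct method: sum of the observations divided by their number.
direct = sum(x) / n

# Short-cut method: A plus the mean of the deviations D = x - A.
D = [xi - A for xi in x]
shortcut = A + sum(D) / n

# Step-deviation method: A plus h times the mean of U = (x - A) / h.
U = [(xi - A) / h for xi in x]
step = A + (sum(U) / n) * h

print(direct, shortcut, step)
```

All three methods give the same mean; the short-cut and step-deviation forms only reduce the size of the numbers handled by hand.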
When the number of observations is very large, the data are organized into a frequency
distribution, which is used to calculate approximate values of descriptive measures because the
identity of the observations is lost. To calculate the approximate value of the mean, the
observations in each class are assumed to be identical to the class midpoint, so that the
product of the midpoint and the number of observations (i.e. the frequency) is
approximately equal to the sum of the observations in that class.
For grouped data
Direct Method: $\bar{X} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \dots + f_n x_n}{\sum_{i=1}^{n} f_i} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$
In-direct Method/Short-cut Method: $\bar{X} = A + \frac{\sum_{i=1}^{n} f_i D_i}{\sum f}$
where $D = X - A$ and "A" is an arbitrary value.
Step-deviation Method/Coding Method: $\bar{X} = A + \frac{\sum_{i=1}^{n} f_i U_i}{\sum f} \times h$
where $U = \frac{X - A}{h}$, "A" is an arbitrary value and "h" is the common class interval.
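Example # 2. Find the arithmetic mean for grouped data with $\sum f = 48$, $\sum fx = 1846$, A = 32, h = 5, $\sum fD = 310$ and $\sum fU = 62$.
$\bar{X} = \frac{\sum fx}{\sum f} = \frac{1846}{48} = 38.46$
$\bar{X} = A + \frac{\sum fD}{\sum f} = 32 + \frac{310}{48} = 32 + 6.46 = 38.46$, where A = 32
$\bar{X} = A + \frac{\sum fU}{\sum f} \times h = 32 + \frac{62}{48} \times 5 = 32 + 1.29 \times 5 = 38.46$, where A = 32 and h = 5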
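The grouped-data formulas can be sketched as follows. The table for the example above is not available here, so this sketch instead uses the class boundaries and frequencies of the grouped table that appears later in these notes; the choices A = 74.5 and h = 20 are assumptions made for illustration.

```python
# Boundaries and frequencies borrowed from the grouped table later in these notes.
boundaries = [4.5, 24.5, 44.5, 64.5, 84.5, 104.5, 124.5, 144.5, 164.5]
f = [4, 6, 14, 22, 14, 5, 7, 3]

# Each class is represented by its midpoint.
x = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
total_f = sum(f)

A = 74.5   # assumed arbitrary origin (one of the midpoints)
h = 20     # common class interval

# Direct method.
direct = sum(fi * xi for fi, xi in zip(f, x)) / total_f

# Short-cut method with D = x - A.
shortcut = A + sum(fi * (xi - A) for fi, xi in zip(f, x)) / total_f

# Step-deviation method with U = (x - A) / h.
sum_fU = sum(fi * (xi - A) / h for fi, xi in zip(f, x))
step = A + (sum_fU / total_f) * h

print(round(direct, 2), round(shortcut, 2), round(step, 2))
```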
NOTE: Unless all the observations are equal, at least one observation will lie below the mean and at least one will lie above it.
PROPERTIES OF ARITHMETIC MEAN: Following are the properties of the arithmetic
mean.
The mean of constant values is equal to that constant.
The sum of the deviations of the observations from their mean is equal to zero:
$\sum (X - \bar{X}) = 0$
The sum of squared deviations of the observations from their mean is a minimum, i.e. it is less
than the sum of squared deviations of the observations from any arbitrary value:
$\sum (X - \bar{X})^2 < \sum (X - a)^2$
where 'a' is any value other than the mean of the data.
If $n_1$ values have mean $\bar{X}_1$, $n_2$ values have mean $\bar{X}_2$, $n_3$ values have mean $\bar{X}_3$, and
so on, then the mean of all the values is
$\bar{X}_c = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + \dots + n_k\bar{X}_k}{n_1 + n_2 + \dots + n_k}$
The arithmetic mean is dependent on origin and scale, i.e. if a variable X has mean $\bar{X}$
and $Y = a + bX$, where a and b are any constants, then the mean of the new variable Y is
$\bar{Y} = a + b\bar{X}$
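These properties can be checked numerically. The sketch below uses the seven values 3, 5, 6, 6, 7, 10, 12 that appear in a later example of these notes; the split into two groups and the constants a = 2, b = 3 are arbitrary choices made for illustration.

```python
x = [3, 5, 6, 6, 7, 10, 12]
n = len(x)
mean = sum(x) / n

# Deviations from the mean sum to zero.
dev_sum = sum(xi - mean for xi in x)

# Sum of squared deviations about any value a; smallest when a is the mean.
def sum_sq(a):
    return sum((xi - a) ** 2 for xi in x)

# Combined mean of two groups (an arbitrary split of x).
g1, g2 = x[:3], x[3:]
m1, m2 = sum(g1) / len(g1), sum(g2) / len(g2)
combined = (len(g1) * m1 + len(g2) * m2) / (len(g1) + len(g2))

# Change of origin and scale: Y = a + bX.
a, b = 2, 3
y = [a + b * xi for xi in x]
mean_y = sum(y) / n

print(mean, dev_sum, sum_sq(mean), sum_sq(6), combined, mean_y)
```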
Example # 3.
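Geometric mean: For ungrouped data the geometric mean of n positive values is given by
$G = (X_1 \times X_2 \times X_3 \times \dots \times X_n)^{1/n}$ OR
$\log G = \frac{\log X_1 + \log X_2 + \dots + \log X_n}{n} = \frac{\sum \log X}{n}$
$G = \text{Antilog}\left(\frac{\sum \log X}{n}\right)$
For grouped data the geometric mean is given by
$G = \left[(X_1)^{f_1} \times (X_2)^{f_2} \times (X_3)^{f_3} \times \dots \times (X_n)^{f_n}\right]^{1/\sum f}$
where n denotes the total number of classes,
OR
$\log G = \frac{f_1 \log X_1 + f_2 \log X_2 + f_3 \log X_3 + \dots + f_n \log X_n}{f_1 + f_2 + f_3 + \dots + f_n} = \frac{\sum f \log X}{\sum f}$
$G = \text{Antilog}\left(\frac{\sum f \log X}{\sum f}\right)$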
Merits of Geometric Mean:
It is clearly defined by mathematical formula.
It is based on all the observations.
It is least affected by extreme values.
It is suitable for further algebraic treatments.
It gives equal weights to all the values.
It is an appropriate average for averaging rates of change and ratios.
De-Merits of Geometric Mean:
It is neither easy to calculate nor simple to understand.
It cannot be calculated if any value is zero or negative in the data.
It cannot be calculated in case of open-end frequency distribution.
Example # 10. Find the Geometric mean of the values 3, 5, 6, 6, 7, 10, 12.
X        3      5      6      6      7      10     12      Total
log(X)   .4771  .6989  .7782  .7782  .8451  1.000  1.0792  5.65677
$G = \text{Antilog}\left(\frac{\sum \log X}{n}\right) = \text{Antilog}\left(\frac{5.65677}{7}\right) = \text{Antilog}(.80811) = 6.43$
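The logarithm route of Example # 10 can be reproduced in a few lines; base-10 logarithms are used so that the antilog is a power of 10. The direct product form is included only as a cross-check.

```python
import math

x = [3, 5, 6, 6, 7, 10, 12]
n = len(x)

# Log method: average the base-10 logs, then take the antilog.
log_sum = sum(math.log10(v) for v in x)
G = 10 ** (log_sum / n)

# Cross-check with the product form (X1 * X2 * ... * Xn)^(1/n).
G_product = math.prod(x) ** (1 / n)

print(round(log_sum, 4), round(G, 2))
```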
Example # 11. The grouped data is available on insect growth population for age and
corresponding frequencies. Find Geometric mean
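Harmonic mean for grouped data:
$H.M = \text{Reciprocal of } \frac{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \frac{f_3}{x_3} + \dots + \frac{f_k}{x_k}}{f_1 + f_2 + f_3 + \dots + f_k} = \frac{\sum f}{\sum \frac{f}{x}}$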
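As a minimal sketch of the frequency-weighted harmonic mean, the values 3, 5, 6, 6, 7, 10, 12 from Example # 10 are written below as distinct values with frequencies (6 occurs twice). Using these values here is an illustrative choice, not part of the original example.

```python
import statistics

# Distinct values and their frequencies (6 occurs twice).
x = [3, 5, 6, 7, 10, 12]
f = [1, 1, 2, 1, 1, 1]

# H.M = sum of f divided by the sum of f / x.
H = sum(f) / sum(fi / xi for fi, xi in zip(f, x))

# Cross-check against the standard library on the expanded data.
expanded = [xi for xi, fi in zip(x, f) for _ in range(fi)]
H_check = statistics.harmonic_mean(expanded)

print(round(H, 2))
```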
Merits of Harmonic Mean:
It is clearly defined by mathematical formula.
It is based on all the observations of the data.
It is suitable for further algebraic treatments.
It is not affected by extreme large observations.
It is not affected by sampling fluctuations.
It gives more weightage to the small values and less weightage to the large values.
It is better than weighted mean since in this, values are automatically weighted.
De-Merits of Harmonic Mean:
It is neither easy to calculate nor simple to understand.
It cannot be calculated if any value of the data is zero.
It is affected by extremely small observations.
It may be a value which is usually not present in the data.
Example # 12. Calculate Harmonic mean for the following data
Relation between A.M, G.M & H.M: $A.M \ge G.M \ge H.M$
The three means are equal only when all the observations are identical, i.e. A.M = G.M = H.M.
For two positive values, $G.M = \sqrt{A.M \times H.M}$
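The ordering of the three means can be checked numerically with the standard-library statistics module (geometric_mean requires Python 3.8 or later). The data sets below are illustrative: the seven values of Example # 10 and an arbitrary pair of values, 4 and 9.

```python
import statistics

x = [3, 5, 6, 6, 7, 10, 12]
am = statistics.mean(x)
gm = statistics.geometric_mean(x)
hm = statistics.harmonic_mean(x)
print(am, round(gm, 2), round(hm, 2))

# Identical observations: all three means coincide.
same = [5, 5, 5, 5]
means_same = (statistics.mean(same),
              statistics.geometric_mean(same),
              statistics.harmonic_mean(same))

# Two positive values: G.M squared equals A.M times H.M.
pair = [4, 9]
am2 = statistics.mean(pair)
gm2 = statistics.geometric_mean(pair)
hm2 = statistics.harmonic_mean(pair)
print(gm2 ** 2, am2 * hm2)
```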
Example # 13. Verify the relation A.M > G.M > H.M for the following
Class limits    f     c.f.   Class boundaries
5-24            4     4      4.5-24.5
25-44           6     10     24.5-44.5
45-64           14    24     44.5-64.5
65-84           22    46     64.5-84.5
85-104          14    60     84.5-104.5
105-124         5     65     104.5-124.5
125-144         7     72     124.5-144.5
145-164         3     75     144.5-164.5
Since n/2 = 75/2 = 37.5, the class containing the median is 64.5-84.5.
$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - c\right) = 64.5 + \frac{20}{22}\left(\frac{75}{2} - 24\right) = 76.77$
Quartiles for grouped data
$Q_j = l + \frac{h}{f}\left(\frac{jn}{4} - C\right)$
l= Lower class boundary of the class containing jth quartile
(i.e the class corresponding to the cumulative frequency in which 'j(n/4)th' observation
lies).
h= Class interval of the class containing jth quartile
f= Frequency of the class containing jth quartile
n= Total number of observations
C= Cumulative frequency of the class preceding the class containing jth quartile.
Calculate Q1, Q3, D3 & P70
Since n/4 = 75/4 = 18.75, the class containing Q1 is 44.5-64.5.
$Q_1 = 44.5 + \frac{20}{14}\left(1 \times \frac{75}{4} - 10\right) = 57$
Since 3n/4 = 225/4 = 56.25, the class containing Q3 is 84.5-104.5.
$Q_3 = 84.5 + \frac{20}{14}\left(3 \times \frac{75}{4} - 46\right) = 99.14$
Since 3n/10 = 225/10 = 22.5, the class containing D3 is 44.5-64.5.
$D_3 = 44.5 + \frac{20}{14}\left(3 \times \frac{75}{10} - 10\right) = 62.36$
Since 70n/100 = 5250/100 = 52.5, the class containing P70 is 84.5-104.5.
$P_{70} = 84.5 + \frac{20}{14}\left(70 \times \frac{75}{100} - 46\right) = 93.79$
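The median, quartile and decile calculations all follow the same pattern, l + (h/f)(position - C), and differ only in the position j*n/parts. A minimal sketch of that pattern, using the class boundaries and frequencies of the table above; the function name grouped_quantile and its argument names are illustrative only.

```python
from itertools import accumulate

boundaries = [4.5, 24.5, 44.5, 64.5, 84.5, 104.5, 124.5, 144.5, 164.5]
frequencies = [4, 6, 14, 22, 14, 5, 7, 3]

def grouped_quantile(j, parts, boundaries, frequencies):
    """j-th of `parts` quantiles for grouped data: l + (h / f) * (j*n/parts - C)."""
    n = sum(frequencies)
    position = j * n / parts
    cum = list(accumulate(frequencies))
    # First class whose cumulative frequency reaches the required position.
    idx = next(i for i, c in enumerate(cum) if c >= position)
    l = boundaries[idx]                        # lower class boundary
    h = boundaries[idx + 1] - boundaries[idx]  # class interval
    f = frequencies[idx]                       # frequency of that class
    C = cum[idx - 1] if idx > 0 else 0         # c.f. of the preceding class
    return l + (h / f) * (position - C)

median = grouped_quantile(1, 2, boundaries, frequencies)
q1 = grouped_quantile(1, 4, boundaries, frequencies)
q3 = grouped_quantile(3, 4, boundaries, frequencies)
d3 = grouped_quantile(3, 10, boundaries, frequencies)

print(round(median, 2), round(q1, 2), round(q3, 2), round(d3, 2))
```

Percentiles follow from the same function with parts = 100.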
Mode: The mode is defined as the value in the data which occurs the greatest number of
times, provided such a value exists. A set of data may have more than one mode, or no mode at
all when each observation occurs the same number of times. A distribution having only one
mode is called a uni-modal distribution, one having two modes is called a bi-modal distribution and
a distribution having more than two modes is called a multi-modal distribution.
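For ungrouped data the definition can be sketched as below. The helper name modes is illustrative; it returns an empty list when every distinct value occurs equally often, following the "no mode" convention stated above.

```python
from collections import Counter

def modes(values):
    """Return all modes in sorted order, or [] when no value stands out."""
    counts = Counter(values)
    # No mode: several distinct values, all with the same frequency.
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([3, 5, 6, 6, 7, 10, 12]))   # uni-modal
print(modes([1, 1, 2, 2, 3]))           # bi-modal
print(modes([1, 2, 3]))                 # no mode
```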
For grouped data
$\text{Mode} = l + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$
Where
l= Lower class boundary of the class containing mode (i.e the class corresponding to the
highest frequency)
h= class interval of the class containing mode
fm=Frequency of the class containing mode
f1=Frequency of the class preceding the class containing mode
f2=Frequency of the class following the class containing mode
Merits of Mode:
It is easy to calculate and simple to understand.
It is not affected by extreme values.
It is suitable average for qualitative data.
It can be located even in open end classes.
De-Merits of Mode:
It is an ill-defined average.
It is not based on all the values.
It is not suitable for further algebraic treatments.
It is affected by sampling fluctuations.
Example # 16. Calculate Mode for the data
$\text{Mode} = l + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$
$\text{Mode} = 144.5 + \frac{12 - 9}{(12 - 9) + (12 - 5)} \times 9 = 147.2$
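The grouped-mode formula can be written as a small function and checked against the figures of Example # 16 (l = 144.5, fm = 12, f1 = 9, f2 = 5, h = 9). The function name grouped_mode is illustrative only.

```python
def grouped_mode(l, fm, f1, f2, h):
    """Mode = l + (fm - f1) / ((fm - f1) + (fm - f2)) * h."""
    return l + (fm - f1) / ((fm - f1) + (fm - f2)) * h

# Figures taken from Example # 16.
mode = grouped_mode(l=144.5, fm=12, f1=9, f2=5, h=9)
print(round(mode, 1))
```

The fraction is the share of the class interval by which the mode is pushed away from the lower boundary: the larger the drop to the following class relative to the rise from the preceding class, the closer the mode stays to l.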
NOTE: (i) A data set may have more than one mode or no mode at all.
(ii) A data set with one mode is called uni-modal, with two modes bi-modal, and with more than
two modes multi-modal.
Relation between mean, median, mode.
Mean = Median = Mode For symmetrical distribution
Mean > Median > Mode For positively skewed distribution
Mean < Median < Mode For negatively skewed distribution
For a moderately skewed distribution, approximately
Mode = 3 Median - 2 Mean