Lecture 2
through:
1. Frequency Distributions
2. Graphical Representations
3. Measures of Central Tendency
4. Measures of Variability
1. Frequency distribution:
• The actual summarization and
organization of data starts from
frequency distribution.
• Frequency distribution: A table
which has a list of each of the
possible values that the data can
assume along with the number of
times each value occurs.
• For nominal and ordinal data, frequency distributions are
often used as a summary.
• Example:
Sturges' rule:
$$K = 1 + 3.322 \log_{10} n$$
$$W = \frac{L - S}{K}$$
where
K = number of class intervals
n = number of observations
W = width of the class interval
L = the largest value
S = the smallest value
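As a rough illustration of Sturges' rule, the sketch below computes K and W for the eight data values used in the range example of this lecture (5, 9, 12, 16, 23, 34, 37, 42). The function names are hypothetical, and rounding K to the nearest whole number is an assumption; some texts round up instead.

```python
import math


def sturges_classes(n: int) -> int:
    """Number of class intervals, K = 1 + 3.322 * log10(n).

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return round(1 + 3.322 * math.log10(n))


def class_width(largest: float, smallest: float, k: int) -> float:
    """Width of each class interval, W = (L - S) / K."""
    return (largest - smallest) / k


data = [5, 9, 12, 16, 23, 34, 37, 42]
k = sturges_classes(len(data))
w = class_width(max(data), min(data), k)
print(f"K = {k}, W = {w}")
```

In practice W is usually rounded to a convenient value before the class limits are written down.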
• Cumulative frequencies: The running
totals obtained when the frequencies of
successive classes are added.
• Cumulative relative frequency: The
percentage of the total number of
observations that have a value either in
that interval or below it.
• Mid-point: The value of the interval which
lies midway between the lower and the
upper limits of a class.
• True limits: Those limits that make an interval of a
continuous variable continuous in both directions.
• Used for smoothing of the class intervals.
• Subtract 0.5 from the lower limit and add 0.5 to the
upper limit (for limits recorded in whole units).
Time (hours)   True limits   Mid-point   Frequency
Total                                    40
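The definitions of true limits, mid-points, cumulative frequencies and cumulative relative frequencies can be sketched in a few lines of Python. The values of the time table are not available here, so this sketch uses the age classes and frequencies from the grouped-data age example in this lecture. The function names are hypothetical, and the 0.5 adjustment assumes class limits recorded in whole units.

```python
# Age classes and frequencies taken from the age example in this lecture.
classes = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]
freqs = [4, 66, 47, 36, 12, 4]


def true_limits(lower, upper, unit=1.0):
    """Extend each stated limit by half a measurement unit."""
    half = unit / 2
    return (lower - half, upper + half)


def midpoint(lower, upper):
    """Value midway between the lower and upper class limits."""
    return (lower + upper) / 2


total = sum(freqs)
running = 0
rows = []
for (lower, upper), f in zip(classes, freqs):
    running += f  # cumulative frequency up to and including this class
    rows.append({
        "class": f"{lower}-{upper}",
        "true_limits": true_limits(lower, upper),
        "midpoint": midpoint(lower, upper),
        "freq": f,
        "cum_freq": running,
        "cum_rel_pct": 100 * running / total,
    })

for row in rows:
    print(row)
```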
Guidelines for constructing tables
• Keep them simple,
• Limit the number of variables to three or fewer,
• All tables should be self-explanatory,
• Include a clear title telling what, when and where,
• Clearly label the rows and columns,
• State clearly the unit of measurement used,
• Explain codes and abbreviations in the footnote,
• Show totals,
• If data is not original, indicate the source in the footnote.
Diagrammatic Representation
Importance of diagrammatic representation:
Diagrams for quantitative data:
• Histogram
• Frequency polygon
• Stem-and-leaf plot
• Box plot
• Scatter plot
• Line graph
MEASURES OF CENTRAL TENDENCY (MCT)
For grouped data, the arithmetic mean is
$$\bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i}$$
where
k = the number of class intervals
$m_i$ = the mid-point of the ith class interval
$f_i$ = the frequency of the ith class interval
EXAMPLE. COMPUTE THE MEAN AGE OF 169 SUBJECTS FROM THE GROUPED DATA.
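A minimal sketch of this calculation, assuming the mid-points and frequencies listed in the variance table of this lecture (six age classes, 169 subjects in total). The function name is hypothetical.

```python
# Mid-points (m_i) and frequencies (f_i) of the six age classes.
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
freqs = [4, 66, 47, 36, 12, 4]


def grouped_mean(m, f):
    """Grouped mean: sum of m_i * f_i divided by the sum of f_i."""
    weighted_sum = sum(mi * fi for mi, fi in zip(m, f))
    return weighted_sum / sum(f)


weighted_sum = sum(mi * fi for mi, fi in zip(midpoints, freqs))
mean_age = grouped_mean(midpoints, freqs)
print(f"sum(m*f) = {weighted_sum}, n = {sum(freqs)}, mean = {mean_age:.2f} years")
```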
• For a given set of data there is one and only one arithmetic
mean (uniqueness).
• Easy to calculate and understand (simple).
• Influenced by every value in a data set.
• Greatly affected by extreme values.
• In the case of grouped data, if any class interval is open, the
arithmetic mean cannot be calculated.
2. MEDIAN
a) Ungrouped data
• The median is the value which divides the data set into two
equal parts.
• If the number of values is odd, the median will be the middle
value when all values are arranged in order of magnitude.
• When the number of observations is even, there is no single
middle value but two middle observations.
• In this case the median is the mean of these two middle
observations, when all observations have been arranged in the
order of their magnitude.
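The odd and even cases can be sketched as follows, using the eight values from the range example of this lecture for the even case and the first seven of them for the odd case. The function name is hypothetical.

```python
def median_ungrouped(values):
    """Median of raw data: middle value, or mean of the two middle values."""
    ordered = sorted(values)  # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two


even_case = median_ungrouped([5, 9, 12, 16, 23, 34, 37, 42])
odd_case = median_ungrouped([5, 9, 12, 16, 23, 34, 37])
print(even_case, odd_case)
```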
Median of grouped data (example: total frequency n = 169)
• n/2 = 169/2 = 84.5, which falls in the 3rd class interval
• True limits of this class: lower = 29.5, upper = 39.5
• Frequency of the class = 47
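The slide values can be combined using the usual interpolation formula for a grouped median, median = L + ((n/2 - F) / f) × w. This formula is an assumption here, since it does not appear in the surviving text. F is the cumulative frequency below the median class, taken as 4 + 66 = 70 from the age frequencies in this lecture. The function name is hypothetical.

```python
def grouped_median(lower_true, upper_true, n, cum_below, class_freq):
    """Interpolated median: L + ((n/2 - F) / f) * w."""
    width = upper_true - lower_true  # w, width of the median class
    return lower_true + ((n / 2 - cum_below) / class_freq) * width


median_age = grouped_median(
    lower_true=29.5,   # lower true limit of the 3rd class
    upper_true=39.5,   # upper true limit of the 3rd class
    n=169,             # total frequency
    cum_below=70,      # 4 + 66, frequencies of the first two classes
    class_freq=47,     # frequency of the median class
)
print(f"median = {median_age:.2f} years")
```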
• Example
• Data are: 1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8
• There are two modes – 2 & 5
• This distribution is said to be “bi-modal”
• Example
• Data are: 2.62, 2.75, 2.76, 2.86, 3.05, 3.12
• No mode, since all the values are different
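Both examples can be checked with a short sketch. The function name is hypothetical, and returning an empty list when every value occurs exactly once is a convention chosen here to match the "no mode" case above.

```python
from collections import Counter


def modes(values):
    """Return all values with the highest count, or [] if nothing repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # all values different: no mode
    return sorted(v for v, c in counts.items() if c == top)


bimodal = modes([1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8])
no_mode = modes([2.62, 2.75, 2.76, 2.86, 3.05, 3.12])
print(bimodal, no_mode)
```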
B) GROUPED DATA
[Figure: relative positions of the mode, median and mean on skewed distribution curves, including panel (d), skewed to the left (negatively skewed).]
QUIZ 5%
• Example –
• Data values: 5, 9, 12, 16, 23, 34, 37, 42
• Range = 42 - 5 = 37
• A data set with a higher range exhibits more variability.
PROPERTIES OF RANGE
2. INTERQUARTILE RANGE (IQR)
IQR = Q3 - Q1
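As a sketch, the range and IQR of the eight values from the range example can be computed with Python's standard library. Several quartile conventions exist. `statistics.quantiles` with its default "exclusive" method is used here as an assumption, so hand calculations based on another convention may give slightly different quartiles.

```python
import statistics

data = [5, 9, 12, 16, 23, 34, 37, 42]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Quartiles Q1, Q2 (median), Q3 using the default "exclusive" method.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(f"range = {data_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```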
$$S^2 = \frac{\sum_{i=1}^{k} (m_i - \bar{x})^2 f_i}{\sum_{i=1}^{k} f_i - 1}$$
where
$m_i$ = the mid-point of the ith class interval
$f_i$ = the frequency of the ith class interval
k = the number of class intervals
$\bar{x}$ = the sample mean
Properties of Variance:
• The main disadvantage of variance is that
its unit is the square of the unit of the
original measurement values.
• The variance gives more weight to
extreme values than to those near the
mean, because the differences are squared.
• The drawbacks of variance are overcome
by the standard deviation.
7. STANDARD DEVIATION (σ, S)
• It is the square root of the variance.
• This produces a measure having the same
scale as that of the individual values.
$$\sigma = \sqrt{\sigma^2} \quad \text{and} \quad S = \sqrt{S^2}$$
EXAMPLE. COMPUTE THE VARIANCE AND SD OF THE AGE OF
169 SUBJECTS FROM THE GROUPED DATA.
MEAN = 5810.5/169 = 34.38 YEARS
S2 = 20197.63/(169 - 1) = 120.22
SD = √S2 = √120.22 = 10.96 YEARS
Class interval   (mi)   (fi)   (mi-Mean)   (mi-Mean)2   (mi-Mean)2 fi
10-19            14.5     4     -19.88      395.2144      1580.8576
20-29            24.5    66      -9.88       97.6144      6442.5504
30-39            34.5    47       0.12        0.0144         0.6768
40-49            44.5    36      10.12      102.4144      3686.9184
50-59            54.5    12      20.12      404.8144      4857.7728
60-69            64.5     4      30.12      907.2144      3628.8576
Total                   169                              20197.6336
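The table calculation can be reproduced with a short sketch that applies the grouped variance formula to the mid-points and frequencies above. The function name is hypothetical. The code keeps the mean unrounded, so its results may differ from hand-computed values in the second decimal place.

```python
import math

# Mid-points (m_i) and frequencies (f_i) from the table above.
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
freqs = [4, 66, 47, 36, 12, 4]


def grouped_variance(m, f):
    """Sample variance: sum((m_i - mean)^2 * f_i) / (sum(f_i) - 1)."""
    n = sum(f)
    mean = sum(mi * fi for mi, fi in zip(m, f)) / n
    squared_dev = sum(fi * (mi - mean) ** 2 for mi, fi in zip(m, f))
    return squared_dev / (n - 1)


variance = grouped_variance(midpoints, freqs)
sd = math.sqrt(variance)  # standard deviation, in years
print(f"S^2 = {variance:.2f}, SD = {sd:.2f} years")
```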