Unit II
Descriptive Statistics:
Measures of Variance
(Standard Deviation for Sample & Population),
and Measure of Skewness
For which of the following distributions is the mean a true
representative of the data as a whole? Why?
Dispersion
• The median provides information about the sales of the person in the middle, but
what about the other salespeople?
• Are all of them selling $1.2 million annually, or do the sales figures vary widely,
with one person selling $5 million annually and another selling only $150,000
annually?
RANGES: USEFUL MEASURES OF DISPERSION
• The range is the difference between the largest value and the smallest value of a
data set.
• One important use of the range is in quality assurance, where the range is used to
construct control charts.
• A disadvantage of the range is that, because it is computed with the values that are on
the extremes of the data, it is affected by extreme values, and its application as a
measure of variability is limited.
Interquartile Range
• The interquartile range is the range of values between the first and third quartiles.
• Essentially, it is the range of the middle 50% of the data and is determined by computing
the value of Q3 - Q1.
• The interquartile range is especially useful in situations where data users are more
interested in values toward the middle and less interested in extremes.
• In describing a real estate housing market, Realtors might use the interquartile range
as a measure of housing prices when describing the middle half of the market for buyers
who are interested in houses in the midrange.
The middle quartile (the median) marks the central point of the distribution and shows which data lie near
that central point. The lower quartile describes the half of the data set that falls below the median, and the
upper quartile describes the remaining half, which falls above the median. Taken together, the quartiles depict
the distribution or dispersion of the data set.
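The range and interquartile range described above can be sketched in a few lines of Python. The sales figures are hypothetical and are not taken from the slides. Textbooks use several slightly different quartile conventions; this sketch assumes the "inclusive" method of `statistics.quantiles` (Python 3.8 or later), so other conventions may give slightly different quartiles.

```python
import statistics

# Hypothetical annual sales figures, in thousands of dollars (illustration only).
sales = [150, 400, 800, 1000, 1200, 1500, 2000, 3000, 5000]


def value_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)


def interquartile_range(data):
    """Q3 - Q1, assuming the 'inclusive' quartile convention."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1


print("Range:", value_range(sales))
print("IQR:  ", interquartile_range(sales))
```

In this made-up data set the single 5000 figure stretches the range, while the interquartile range only reflects the middle half of the salespeople.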
Ungrouped data
• Percentiles tell you how a value compares to other values. The general rule is
that if a value X is at the kth percentile, then X is greater than k% of the
values.
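As a rough illustration of that rule, the sketch below computes the percentage of values that lie strictly below a given value. The function name `percentile_rank` and the scores are hypothetical, and other definitions of percentile rank (for example, counting ties as half) also exist.

```python
def percentile_rank(data, x):
    """Percentage of values in data that are strictly less than x."""
    below = sum(1 for value in data if value < x)
    return 100 * below / len(data)


# Hypothetical test scores (illustration only).
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# 7 of the 10 scores are below 8, so 8 sits at the 70th percentile here.
print(percentile_rank(scores, 8))
```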
Standard Deviation
Advantages
• Shows how much data is clustered around a mean value
• It gives a more accurate idea of how the data is distributed
• Not as affected by extreme values
Disadvantages
• It doesn't give you the full range of the data
• Only used with data where an independent variable is plotted against the frequency of it
• Assumes a normal distribution pattern
Empirical Rule of Standard Deviation (Three-Sigma Rule or 68-95-99.7 Rule)
• The Empirical Rule states that about 99.7% of observations from a normal distribution lie
within 3 standard deviations of the mean.
• Under this rule, about 68% of the data falls within one standard deviation, 95% within
two standard deviations, and 99.7% within three standard deviations of the mean.
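The 68-95-99.7 figures can be checked numerically. For a normal distribution, the probability of falling within k standard deviations of the mean equals erf(k/√2). The sketch below evaluates that expression with the standard library; the function name `normal_coverage` is only illustrative.

```python
import math


def normal_coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sd: {100 * normal_coverage(k):.2f}%")
```

The rule applies only when the data are approximately normal; for strongly skewed data these percentages can be far off.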
Sample and Population Standard Deviation
Population standard deviation (grouped data): $\sigma = \sqrt{\dfrac{\sum f\,(x - \mu)^2}{N}}$
• σ² = population variance
• σ = population standard deviation
• f = frequency of each of the classes
• x = midpoint for each class
• μ = population mean
• N = size of the population
Sample Standard Deviation?
$s = \sqrt{\dfrac{\sum f\,(x - \bar{x})^2}{n - 1}}$
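A minimal sketch of the grouped-data calculation follows, using class midpoints x and frequencies f. The only difference between the population and sample versions is the divisor (N versus n − 1). The function name `grouped_std` and the small data set are assumptions made for illustration.

```python
import math


def grouped_std(midpoints, freqs, sample=False):
    """Standard deviation from class midpoints and frequencies.

    sample=False divides by N (population); sample=True divides by n - 1.
    """
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    sum_sq_dev = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))
    divisor = n - 1 if sample else n
    return math.sqrt(sum_sq_dev / divisor)


# Hypothetical grouped data: midpoints 1, 2, 3 with frequencies 1, 2, 1.
midpoints = [1, 2, 3]
freqs = [1, 2, 1]

print("population sd:", grouped_std(midpoints, freqs))
print("sample sd:    ", grouped_std(midpoints, freqs, sample=True))
```

The sample value is always slightly larger than the population value for the same data, because the sum of squared deviations is divided by a smaller number.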
Coefficient of Variation
• The coefficient of variation (CV) is a statistical measure of the dispersion of data points in a data series
around the mean.
• The coefficient of variation represents the ratio of the standard deviation to the mean, and it is a
useful statistic for comparing the degree of variation from one data series to another, even if the
means are drastically different from one another.
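To illustrate comparing two series with very different means, the sketch below expresses CV as a percentage: 100 × (standard deviation ÷ mean). It assumes the sample standard deviation (`statistics.stdev`); using the population version would be an equally valid choice. Both data series are hypothetical.

```python
import statistics


def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean, as a percentage."""
    return 100 * statistics.stdev(data) / statistics.mean(data)


# Hypothetical series with very different means (illustration only).
series_a = [10, 20, 30]
series_b = [1000, 1100, 1200]

print("CV of A:", coefficient_of_variation(series_a))
print("CV of B:", coefficient_of_variation(series_b))
```

Series B has the larger standard deviation (100 versus 10), yet relative to its mean it is the less variable of the two.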
Problem
Compute for the following frequency distribution:
No. of students (f):  3  12  15  24  2
Q.03 A study of the age of 100 persons grouped into intervals 20-22, 22-24,
24-26, ... revealed the mean age and standard deviation to be 32.02 and 13.18
respectively. While checking, it was discovered that the observation 57 was
misread as 27. Calculate the correct mean age and SD.
Problem
• The mean and standard deviation of 20 items are found to be 10 and 2 respectively. At the time of checking it was
found that an item 12 was wrongly entered as 8. Calculate the correct mean and standard deviation.
• Mean of 100 items is 48 and their standard deviation is 10. Find the sum of all the items and the sum of the squares
of all the items.
• A student obtained the mean and the standard deviation of 100 observations as 40 and 5.1. It was later found that
one observation was wrongly copied as 50, the correct figure being 40. Find the correct mean and the S.D.
• The mean and variance of seven observations are 8 and 16 respectively. If five of these are 2, 4, 10, 12 and 14, then
find the remaining two observations.
• For a group of 100 candidates the mean and standard deviation of their marks were found to be 60 and 15
respectively. Later on it was found that the scores 45 and 72 were wrongly entered as 40 and 27. Find the correct
mean and standard deviation.
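Correction problems of this kind are usually solved by recovering Σx = n·mean and Σx² = n·(SD² + mean²), swapping the wrong entries for the right ones, and recomputing. The sketch below follows that approach and assumes the SD was computed with divisor n (the population formula), which is the usual convention in such exercises. The function name `corrected_mean_sd` is hypothetical.

```python
import math


def corrected_mean_sd(n, mean, sd, wrong, right):
    """Correct a reported mean and SD after replacing wrong entries with right ones.

    Assumes the reported SD used divisor n (population formula).
    """
    total = n * mean                      # sum of all items
    total_sq = n * (sd ** 2 + mean ** 2)  # sum of squares of all items
    total += sum(right) - sum(wrong)
    total_sq += sum(r * r for r in right) - sum(w * w for w in wrong)
    new_mean = total / n
    new_variance = total_sq / n - new_mean ** 2
    return new_mean, math.sqrt(new_variance)


# First problem above: 20 items, mean 10, SD 2, item 12 wrongly entered as 8.
new_mean, new_sd = corrected_mean_sd(20, 10, 2, wrong=[8], right=[12])
print(round(new_mean, 2), round(new_sd, 2))
```

The same two identities also give the sum of the items and the sum of their squares directly when only the mean and SD are known.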
Skewness
• Skewness means “Lack of Symmetry”.
• When a curve is not symmetrical, the values of Mean, Median and Mode fall at different
points. The bulk of the bell-shaped curve may shift either to the right or to the left of the
Mean value. This is called skewness to the left or right of the mean.
Karl Pearson’s coefficient of skewness
Problem
Calculate Karl Pearson's coefficient of skewness for a distribution
having mean = 3.41, median = 3.4 and standard deviation = 0.70.
Sk = 3(3.41 − 3.4)/0.70
Sk = 0.03/0.70
Sk ≈ 0.043
Problem
Calculate Karl Pearson's coefficient of skewness for a distribution
having mean = 75, median = 80 and standard deviation = 20.
Sk = 3(75 − 80)/20
Sk = −15/20
Sk = −0.75
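The two worked problems above use the median-based form, Sk = 3(Mean − Median)/SD. A short sketch of that form, together with the mode-based form Sk = (Mean − Mode)/SD, is given below. The function names are illustrative only.

```python
def pearson_skew_median(mean, median, sd):
    """Karl Pearson's coefficient of skewness based on the median."""
    return 3 * (mean - median) / sd


def pearson_skew_mode(mean, mode, sd):
    """Karl Pearson's coefficient of skewness based on the mode."""
    return (mean - mode) / sd


# The two worked problems above.
print(round(pearson_skew_median(3.41, 3.4, 0.70), 3))
print(pearson_skew_median(75, 80, 20))
```

A positive value indicates a longer tail to the right of the mean, and a negative value a longer tail to the left.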
• Karl Pearson's coefficient of skewness of a distribution is 0.32. Its S.D. is
6.5 and the mean is 29.6. Find the mode and median of the
distribution.
Problem
Calculate the Pearson’s coefficient of skewness based on Mean and Mode
from the following information.