
Unit II

The document discusses various statistical measures used to describe the dispersion or spread of data in a distribution, including range, interquartile range, quartiles, deciles, and percentiles. It provides formulas and explanations for calculating each measure, whether for an ungrouped or grouped data set. The key measures of dispersion discussed are range, interquartile range, standard deviation, and coefficient of variation.

Uploaded by

Vidhi Maheshwari

Unit II

Descriptive Statistics:
Measures of Variance
(Standard Deviation for Sample & Population),
and Measure of Skewness
For which of the following distributions is the mean a true
representative of the data as a whole? Why?
Dispersion

Dispersion is the spread of the data in a distribution, that is, the extent to
which the observations are scattered. Notice that curve ….. in Figure has a
wider spread, or dispersion, than curve …..
• A company has 25 salespeople in the field, and the median annual sales figure for
these people is $1.2 million.

• Are the salespeople being successful as a group or not?

• The median provides information about the sales of the person in the middle, but
what about the other salespeople?

• Are all of them selling $1.2 million annually, or do the sales figures vary widely,
with one person selling $5 million annually and another selling only $150,000
annually?

RANGES: USEFUL MEASURES OF DISPERSION
• The range is the difference between the largest value of a data set and the smallest
value of the set.

• An advantage of the range is its ease of computation.

• One important use of the range is in quality assurance, where the range is used to
construct control charts.

• A disadvantage of the range is that, because it is computed with the values that are on
the extremes of the data, it is affected by extreme values, and its application as a
measure of variability is limited.
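
As a minimal sketch of the points above, the range can be computed in one line, and a single extreme value changes it sharply. The sales figures below are hypothetical (in millions of dollars) and are not taken from the slides.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

# Hypothetical annual sales in millions of dollars (illustrative only).
sales = [0.15, 0.9, 1.1, 1.2, 1.3, 1.6, 5.0]
sales_without_top = sales[:-1]  # same data with the $5 million seller removed

print("Range with the extreme value:   ", data_range(sales))
print("Range without the extreme value:", data_range(sales_without_top))
```

Dropping one observation shrinks the range from about 4.85 to about 1.45, which is why the range alone is a limited measure of variability.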
Interquartile Range

• Another measure of variability is the interquartile range.

• The interquartile range is the range of values between the first and third quartile.

• Essentially, it is the range of the middle 50% of the data and is determined by computing
the value of Q3 - Q1.

• The interquartile range is especially useful in situations where data users are more
interested in values toward the middle and less interested in extremes.
• In describing a real estate housing market, Realtors might use the interquartile range as a
measure of housing prices when describing the middle half of the market for buyers who are
interested in houses in the midrange.

• In addition, the interquartile range is used in the construction of box-and-whisker plots.
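
A short sketch of the Q3 - Q1 computation, assuming Python 3.8+ and its standard `statistics.quantiles` function, whose default "exclusive" method is one of several quartile conventions. The house prices (in thousands of dollars) are hypothetical.

```python
from statistics import quantiles

# Hypothetical house prices in thousands of dollars (illustrative only).
prices = [150, 180, 200, 210, 230, 250, 270, 300, 320, 400, 950]

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(prices, n=4)
iqr = q3 - q1
full_range = max(prices) - min(prices)

print("Q1 =", q1, " Q3 =", q3)
print("Interquartile range:", iqr)
print("Range:", full_range)
```

The one very expensive house inflates the range but leaves the interquartile range untouched, which is the sense in which the IQR focuses on the middle of the market.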
Quartiles
Quartiles are three values that divide an ordered data set into four equal parts.

The middle quartile marks the central point of the distribution and shows which data lie near
the centre. The lower quartile marks the middle of the half of the data that falls below the
median, and the upper quartile marks the middle of the remaining half, which falls above the
median. Together, the quartiles depict the distribution or dispersion of the data set.
Ungrouped data

Q1 = [(n+1)/4]th item

Q2 = [(n+1)/2]th item

Q3 = [3(n+1)/4]th item
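
The position rules above can be sketched as follows. The slides only say "th item"; interpolating linearly when the position is not a whole number is an assumption made here, and the helper names `item_at` and `quartiles` as well as the data are hypothetical.

```python
import math

def item_at(sorted_values, position):
    """Value of the position-th item (1-based) of sorted data.

    A fractional position is resolved by linear interpolation between
    the two neighbouring items (an assumed convention).
    """
    n = len(sorted_values)
    position = min(max(position, 1), n)  # keep the position inside the data
    lower = math.floor(position)
    fraction = position - lower
    if lower == n:
        return sorted_values[-1]
    below, above = sorted_values[lower - 1], sorted_values[lower]
    return below + fraction * (above - below)

def quartiles(values):
    """Q1, Q2, Q3 located at the r(n+1)/4-th items, r = 1, 2, 3."""
    data = sorted(values)
    n = len(data)
    return tuple(item_at(data, r * (n + 1) / 4) for r in (1, 2, 3))

odd_sample = [12, 15, 11, 20, 18, 14, 16]   # n = 7, positions 2, 4, 6
even_sample = [2, 4, 4, 5, 7, 8, 9, 10]     # n = 8, positions 2.25, 4.5, 6.75
print(quartiles(odd_sample))
print(quartiles(even_sample))
```

The same helper works for deciles and percentiles by replacing the divisor 4 with 10 or 100.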


Grouped data

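$$Q_r = l_1 + \frac{\frac{rN}{4} - c}{f}\,(l_2 - l_1)$$

Where Qr is the rth quartile (r = 1, 2, 3), N is the total frequency, and, for the quartile class:

• l1 is the lower limit
• l2 is the upper limit
• f is the frequency
• c is the cumulative frequency of the class preceding the quartile class.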
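
A minimal sketch of the grouped-data interpolation, assuming the standard form l1 + ((rN/4 - c) / f)(l2 - l1), where N is the total frequency. The function name `grouped_quantile` is hypothetical; the data are the wages distribution used in a later problem of this unit.

```python
def grouped_quantile(lower_limits, upper_limits, freqs, fraction):
    """Interpolated quantile for grouped data.

    fraction is r/4 for quartiles, r/10 for deciles, r/100 for percentiles.
    Computes l1 + ((fraction * N - c) / f) * (l2 - l1).
    """
    total = sum(freqs)          # N, the total frequency
    target = fraction * total   # e.g. rN/4 for the rth quartile
    cumulative = 0              # c, cumulative frequency of preceding classes
    for l1, l2, f in zip(lower_limits, upper_limits, freqs):
        if f > 0 and cumulative + f >= target:
            return l1 + (target - cumulative) / f * (l2 - l1)
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")

# Wages (Rs.) classes and number of workers from the skewness problem.
lower = [0, 10, 20, 30, 40]
upper = [10, 20, 30, 40, 50]
workers = [15, 20, 30, 25, 10]

q1 = grouped_quantile(lower, upper, workers, 1 / 4)
q2 = grouped_quantile(lower, upper, workers, 2 / 4)
q3 = grouped_quantile(lower, upper, workers, 3 / 4)
print(q1, q2, q3)
```

For example, N/4 = 25 falls in the 10-20 class (c = 15, f = 20), so Q1 = 10 + (10/20)(10) = 15.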
Quartile Deviation

• Quartile deviation is defined as half of the distance between the third and the first quartile.

• It is also called the semi-interquartile range.

• If Q1 is the first quartile and Q3 is the third quartile, then the formula for quartile
deviation is given by:

$$QD = \frac{Q_3 - Q_1}{2}$$
Decile
• The term “decile” refers to the nine values that split the population data into ten equal
fragments such that each fragment is representative of 1/10th of the population.
• The concept of the decile matters because it is widely used in the field of portfolio management to
assess the performance of a portfolio. The ranking helps to compare the performance of
an asset with other similar assets.
• The decile method is also used by the government to determine the income distribution or
level of income equality in a nation.
Ungrouped data

D1 = [(n+1)/10]th item

D2 = [2(n+1)/10]th item

D9 = [9(n+1)/10]th item


The rth decile (a measure of the relative standing of an observation) for
grouped data is

$$D_r = l_1 + \frac{\frac{rN}{10} - c}{f}\,(l_2 - l_1)$$

Where Dr is the rth decile (r = 1, 2, ..., 9), N is the total frequency, and, for the decile class:

• l1 is the lower limit
• l2 is the upper limit
• f is the frequency
• c is the cumulative frequency of the class preceding the decile class.
Percentile

• Percentiles tell you how a value compares to other values. The general rule is
that if value X is at the kth percentile, then X is greater than k% of the
values.

• Percentiles are a measure of the relative standing of an observation within a data set.
Percentiles divide a set of observations into 100 equal parts, and percentile scores are
frequently used to report results from national standardized tests such as NAT, GAT, etc.
• Note that the 50th percentile is the median by definition, as half of the values in the data
are smaller than the median and half of the values are larger than the median.

• Similarly, the 25th and 75th percentiles are the lower (Q1) and upper (Q3) quartiles respectively.

• The quartiles, deciles, and percentiles are also called quantiles or fractiles.
Ungrouped data

P1 = [(n+1)/100]th item

P2 = [2(n+1)/100]th item

P99 = [99(n+1)/100]th item
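
As a quick check that the 25th, 50th and 75th percentiles coincide with Q1, the median and Q3, here is a sketch using Python's standard `statistics.quantiles` (Python 3.8+), whose default method places quantiles at (n+1)-based positions. The test scores are hypothetical.

```python
from statistics import median, quantiles

# Hypothetical test scores (illustrative only), n = 11.
scores = [55, 61, 64, 68, 70, 72, 75, 79, 83, 88, 94]

percentiles = quantiles(scores, n=100)   # 99 cut points: P1 ... P99
q1, q2, q3 = quantiles(scores, n=4)      # the three quartiles

# percentiles[k - 1] holds the kth percentile.
p25, p50, p75 = percentiles[24], percentiles[49], percentiles[74]

print("P25 =", p25, " Q1 =", q1)
print("P50 =", p50, " median =", median(scores))
print("P75 =", p75, " Q3 =", q3)
```

With 11 observations, the positions 25(12)/100 = 3, 50(12)/100 = 6 and 75(12)/100 = 9 are whole numbers, so the percentiles are simply the 3rd, 6th and 9th ordered scores.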


The rth percentile (a measure of the relative standing of an observation) for
grouped data is

$$P_r = l_1 + \frac{\frac{rN}{100} - c}{f}\,(l_2 - l_1)$$

Where Pr is the rth percentile (r = 1, 2, ..., 99), N is the total frequency, and, for the percentile class:

• l1 is the lower limit
• l2 is the upper limit
• f is the frequency
• c is the cumulative frequency of the class preceding the percentile class.
Standard Deviation
• One of the most common methods of determining the risk an investment
poses is standard deviation.
• When prices move wildly, standard deviation is high, meaning an investment
will be risky.
• Low standard deviation means prices are calm, so investments come with
low risk.
Example 01 of Standard Deviation Using Investments
• Let’s say you invest in Company XYZ which has returned an average of 10% per year
for the last 10 years. We’ll compare how risky this stock is compared to Company ABC.
We’ll take a closer look at the year-by-year returns that compose that average:
XYZ's returns

SD of XYZ stock is 20.68%.

ABC's returns

SD of ABC stock is a much lower 0.0129, or 1.29%.


What Is a Good Standard Deviation? 

• There isn’t a standard benchmark of what is considered a “good” standard deviation; it all
depends on your investing goals.

• For someone who wants to be less risky with their portfolio, a high standard deviation would
be considered “bad”, whereas someone who desires to be more aggressive would consider it “good”.
Advantages and Disadvantages of Standard Deviation

Advantages
• Shows how much data is clustered around a mean value
• It gives a more accurate idea of how the data is distributed
• Not as affected by extreme values

Disadvantages
• It doesn't give you the full range of the data
• Only used with data where an independent variable is plotted against the frequency of it
• Assumes a normal distribution pattern
Empirical Rule of Standard Deviation (Three-Sigma Rule or 68-95-99.7 Rule)
• The Empirical Rule states that about 99.7% of data observed following a normal distribution lies
within 3 standard deviations of the mean.
• Under this rule, about 68% of the data falls within one standard deviation, 95% within
two standard deviations, and 99.7% within three standard deviations of the mean.
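
The three percentages can be checked numerically. This is a small sketch using `statistics.NormalDist` from the Python standard library (Python 3.8+); the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

def share_within(k, dist=NormalDist()):
    """Share of a normal distribution lying within k standard deviations of its mean."""
    upper = dist.cdf(dist.mean + k * dist.stdev)
    lower = dist.cdf(dist.mean - k * dist.stdev)
    return upper - lower

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

The shares come out at roughly 0.6827, 0.9545 and 0.9973, which the rule rounds to 68%, 95% and 99.7%. They apply only when the data are approximately normal.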
Sample and Population Standard Deviation

• The formula we use for standard deviation depends on whether the data is being considered a
population of its own, or the data is a sample representing a larger population.

• If the data is being considered a population on its own, we divide by the number of data
points, N.

• If the data is a sample from a larger population, we divide by one fewer than the number of
data points in the sample, n − 1.
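
The two divisors can be compared side by side. This is a minimal sketch with a hypothetical helper `std_dev` and a small made-up data set; Python's own `statistics.pstdev` and `statistics.stdev` implement the same two conventions.

```python
import math

def std_dev(values, sample=False):
    """Standard deviation dividing by N (population) or n - 1 (sample)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n
    return math.sqrt(squared_deviations / divisor)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, mean = 5

population_sd = std_dev(data)            # divides by N = 8
sample_sd = std_dev(data, sample=True)   # divides by n - 1 = 7
print(population_sd, sample_sd)
```

The squared deviations sum to 32, so the population value is sqrt(32/8) = 2, while the sample value sqrt(32/7) is slightly larger.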
Population Standard Deviation

$$\sigma^2 = \frac{\sum f\,(x - \mu)^2}{N}, \qquad \sigma = \sqrt{\sigma^2}$$

• σ² = population variance
• σ = population standard deviation
• f = frequency of each of the classes
• x = midpoint for each class
• μ = population mean
• N = size of the population
Sample Standard Deviation

$$s = \sqrt{\frac{\sum f\,(x - \bar{x})^2}{n - 1}}$$
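
For grouped data, both formulas use class midpoints weighted by frequency. The sketch below applies them to the internet-time distribution from the Problems section of this unit; taking the midpoint of each class (for example 11 for 10-12) follows the definition of x above, and the function name `grouped_std_dev` is hypothetical.

```python
import math

def grouped_std_dev(midpoints, freqs, sample=False):
    """Standard deviation from class midpoints x and frequencies f."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    squared = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))
    divisor = n - 1 if sample else n
    return math.sqrt(squared / divisor)

# Classes 10-12, 13-15, 16-18, 19-21, 22-24 (minutes) and student counts.
midpoints = [11, 14, 17, 20, 23]
students = [3, 12, 15, 24, 2]

sigma = grouped_std_dev(midpoints, students)           # treats the 56 students as a population
s = grouped_std_dev(midpoints, students, sample=True)  # treats them as a sample
print(round(sigma, 4), round(s, 4))
```

Whether the 56 students count as a population or a sample is not stated in the problem, so both values are shown; they differ only in the divisor.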
Coefficient of Variation 
• The coefficient of variation (CV) is a statistical measure of the dispersion of data points in a data series
around the mean.
• The coefficient of variation represents the ratio of the standard deviation to the mean, and it is a
useful statistic for comparing the degree of variation from one data series to another, even if the
means are drastically different from one another.
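
A short sketch of the ratio, expressed as a percentage. Using the population standard deviation is an assumption here (the sample version works the same way), and the two data series are hypothetical.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """CV = (standard deviation / mean) * 100, using the population SD."""
    return pstdev(values) / mean(values) * 100

# Two illustrative series with very different means but the same relative spread.
small_scale = [2, 4, 4, 4, 5, 5, 7, 9]           # mean 5, SD 2
large_scale = [x * 100 for x in small_scale]     # mean 500, SD 200

cv_small = coefficient_of_variation(small_scale)
cv_large = coefficient_of_variation(large_scale)
print(cv_small, cv_large)
```

The standard deviations differ by a factor of 100, yet both series have a CV of about 40%, which is why the CV is suited to comparing series whose means are far apart.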
Problem 

iv. Determine the Coefficient of Quartile Deviation
v. Determine the 8th Decile
vi. Compute the Variance
Problems 
• The following table gives the amount of time (in minutes) spent on the internet each
evening by a group of 56 students. Compute the five-number summary for the following
frequency distribution.

Time spent on Internet (x) : 10-12 13-15 16-18 19-21 22-24

No. of students (f) : 3 12 15 24 2

Compute for the above frequency distribution:

1. Coefficient of Quartile Deviation
2. 7th Decile
3. Variance
4. 71st Percentile
5. Standard Deviation
6. Coefficient of Variation
Practice Problems

• The following data represent the difference in scores between the winning and losing teams
in a sample of 15 college football bowl games from 2004-2005.

Point Difference    Number of Bowl Games
1-5                 8
6-10                0
11-15               2
16-20               3
21-25               1
26-30               0
31-35               1

Compute for the above frequency distribution:

1. Coefficient of Quartile Deviation
2. 7th Decile
3. Variance
4. 71st Percentile
5. Standard Deviation
6. Coefficient of Variation
Problem

Q.03 A study of the ages of 100 persons, grouped into intervals 20-22, 22-24, 24-26, ….., revealed
the mean age and standard deviation to be 32.02 and 13.18 respectively. While checking, it was
discovered that the observation 57 had been misread as 27. Calculate the correct mean age and SD.
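
One way to approach Q.03 is to recover the sum and the sum of squares from the reported mean and SD, swap the wrong observation for the right one, and recompute. The sketch below assumes the SD was computed with divisor n (the population formula); the function name `corrected_mean_sd` is hypothetical.

```python
import math

def corrected_mean_sd(n, mean, sd, wrong, right):
    """Correct a mean and SD after one observation was misrecorded.

    Assumes sd uses divisor n, so that sum(x^2) = n * (sd^2 + mean^2).
    """
    total = n * mean                          # sum of x with the wrong value
    total_sq = n * (sd ** 2 + mean ** 2)      # sum of x^2 with the wrong value
    total += right - wrong                    # replace the wrong value
    total_sq += right ** 2 - wrong ** 2
    new_mean = total / n
    new_variance = total_sq / n - new_mean ** 2
    return new_mean, math.sqrt(new_variance)

new_mean, new_sd = corrected_mean_sd(n=100, mean=32.02, sd=13.18, wrong=27, right=57)
print(round(new_mean, 2), round(new_sd, 2))
```

The sum rises from 3202 to 3232, giving a corrected mean of 32.32, and the corrected SD works out to about 13.40. The same helper applies to the practice problems with a single wrongly entered item.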
Problem 

Q.04 The mean of 5 observations is 15 and the variance is 9. If two more observations having
values -3 and 10 are combined with these 5 observations, what will be the new mean and
variance of the 7 observations?
Practice Problems 

• The mean and standard deviation of 20 items are found to be 10 and 2 respectively. At the time of checking it was
found that an item 12 was wrongly entered as 8. Calculate the correct mean and standard deviation.

• Mean of 100 items is 48 and their standard deviation is 10. Find the sum of all the items and the sum of the squares
of all the items.

• A student obtained the mean and the standard deviation of 100 observations as 40 and 5.1. It was later found that
one observation was wrongly copied as 50, the correct figure being 40. Find the correct mean and the S.D.

• The mean and variance of seven observations are 8 and 16 respectively. If five of these are 2, 4, 10, 12 and 14, then
find the remaining two observations.

• For a group of 100 candidates the mean and standard deviation of their marks were found to be 60 and 15
respectively. Later on it was found that the scores 45 and 72 were wrongly entered as 40 and 27. Find the correct
mean and standard deviation.
Skewness 
• Skewness means “lack of symmetry”.

• When a curve is not symmetrical, the values of the Mean, Median and Mode fall at different
points. The bulk of the bell-shaped curve may shift either to the right or to the left of the Mean
value. This is called skewness to the left or right of the mean.
Karl Pearson’s coefficient of skewness
Problem 
Calculate the Karl Pearson coefficient of skewness for a distribution
having mean = 3.41, median = 3.4 and standard deviation = 0.70.
Sk=(3(3.41-3.4))/0.70
Sk=0.03/0.70
Sk=0.043
Problem 
Calculate the Karl Pearson coefficient of skewness for a distribution
having mean = 75, median = 80 and standard deviation = 20.
Sk=(3(75-80))/20
Sk=-15/20
Sk=-0.75
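
The two worked problems above use the median-based form Sk = 3(mean − median) / SD. A small sketch that reproduces both results (the function name is hypothetical):

```python
def pearson_skewness_median(mean, median, sd):
    """Karl Pearson's coefficient of skewness based on the mean and median."""
    return 3 * (mean - median) / sd

first = pearson_skewness_median(mean=3.41, median=3.4, sd=0.70)
second = pearson_skewness_median(mean=75, median=80, sd=20)
print(round(first, 3), round(second, 3))
```

A positive value (first problem) indicates a mean above the median, while a negative value (second problem) indicates a mean below the median.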
• The Karl Pearson coefficient of skewness of a distribution is 0.32. Its s.d. is
6.5 and the mean is 29.6. Find the mode and median of the
distribution.
Problem 
Calculate the Pearson’s coefficient of skewness based on Mean and Mode
from the following information.

Wages (Rs.)    :  0-10  10-20  20-30  30-40  40-50

No. of workers :    15     20     30     25     10
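
A possible way to work this problem is sketched below. The mode-based coefficient is Sk = (mean − mode) / SD. Two things are assumptions not shown in the slides: the mode is found with the usual grouped-data interpolation l + (f1 − f0) / (2·f1 − f0 − f2) × h, and the SD divides by N. The function names are hypothetical.

```python
import math

lower = [0, 10, 20, 30, 40]
upper = [10, 20, 30, 40, 50]
workers = [15, 20, 30, 25, 10]

def grouped_mean_sd(lower, upper, freqs):
    """Mean and SD (divisor N) computed from class midpoints."""
    mids = [(a + b) / 2 for a, b in zip(lower, upper)]
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(mids, freqs)) / n
    variance = sum(f * (x - mean) ** 2 for x, f in zip(mids, freqs)) / n
    return mean, math.sqrt(variance)

def grouped_mode(lower, upper, freqs):
    """Mode by interpolation inside the class with the highest frequency."""
    i = freqs.index(max(freqs))
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0               # frequency of the class before
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0  # frequency of the class after
    width = upper[i] - lower[i]
    return lower[i] + (f1 - f0) / (2 * f1 - f0 - f2) * width

mean, sd = grouped_mean_sd(lower, upper, workers)
mode = grouped_mode(lower, upper, workers)
skewness = (mean - mode) / sd
print(round(mean, 2), round(mode, 2), round(sd, 2), round(skewness, 3))
```

Under these assumptions the mean is 24.5, the mode is about 26.67 and the SD is about 12.03, so the coefficient is slightly negative (about −0.18).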
Problem 
Calculate the Pearson’s coefficient of skewness based on Mean and Mode from the following
information.

Class     :  0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80

Frequency :     5      6     11     21     35     30     22     11
Practice Problem 
The radio music listener market is diverse. Listener formats might include adult
contemporary, album rock, top 40, oldies, rap, country and western, classical, and
jazz. In targeting audiences, market researchers need to be concerned about the
ages of the listeners attracted to particular formats. Suppose a market researcher
surveyed a sample of 170 listeners of country music radio stations and obtained
the following age distribution.
Age            Frequency
15–under 20    9
20–under 25    16
25–under 30    27
30–under 35    44
35–under 40    42
40–under 45    23
45–under 50    7
50–under 55    2

A. What are the mean and modal ages of country music listeners?
B. What are the variance and standard deviation of the ages of country music listeners?
C. Calculate the Pearson’s coefficient of skewness.
