
MCS Lecture 3

This document covers numerical descriptive measures in probability and statistics, focusing on measures of central tendency (mean, median, mode) and measures of dispersion (range, variance, standard deviation, inter-quartile range). It explains how to calculate these measures for both ungrouped and grouped data, and discusses the importance of understanding data distribution and outliers. Additionally, it introduces graphical representations like box-and-whisker plots to visualize data characteristics.

Uploaded by

Cynosure Wolf
Copyright © All Rights Reserved

Probability and Statistics

Lecture # 3

NUMERICAL DESCRIPTIVE
MEASURES
CHAPTER 3
TABLE OF CONTENTS

• Measures of Central Tendency

• Measures of Dispersion


MEASURES OF CENTRAL TENDENCY
• Central tendency denotes the tendency of quantitative data to cluster around some central value.
• This central value may also be called the center or location of the distribution.
• As such, measures of central tendency are sometimes called measures of central location.
MAIN TYPES
• Mean
• Median
• Mode
OTHER TYPES
• Geometric Mean: The nth root of the product of the n data values.

• Harmonic Mean: The reciprocal of the arithmetic mean of the reciprocals of the data values.

• Mid-Range: The arithmetic mean of the maximum and minimum values of a data set.

• Trimmed Mean: The arithmetic mean of the data values after a certain number or proportion of the highest and lowest data values have been discarded.
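The four alternative averages above can be sketched in Python as follows. This is a minimal illustration: the function names (geo_mean, harm_mean, mid_range, trimmed_mean) are hypothetical, and the trimmed mean assumes the common convention of dropping the same number of values, floor(proportion × n), from each end of the ranked data. Python's standard statistics module also provides geometric_mean and harmonic_mean.

```python
import math

def geo_mean(values):
    # nth root of the product of n positive values, computed through
    # logarithms so that large products do not overflow
    return math.exp(sum(math.log(v) for v in values) / len(values))

def harm_mean(values):
    # reciprocal of the arithmetic mean of the reciprocals
    return len(values) / sum(1 / v for v in values)

def mid_range(values):
    # arithmetic mean of the maximum and minimum values
    return (max(values) + min(values)) / 2

def trimmed_mean(values, proportion):
    # assumed convention: discard floor(proportion * n) values from each end
    # of the ranked data, then average the values that remain
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    k = int(proportion * len(data))
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

scores = [18, 24, 38, 36, 21, 40, 33, 22]
print(mid_range(scores))            # 29.0
print(trimmed_mean(scores, 0.125))  # 29.0 (drops 18 and 40)
print(geo_mean([1, 4, 16]))         # approximately 4
```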
MEAN FOR UNGROUPED DATA
• The mean is the sum of all the data values divided by the total number of data values.

• For a population: $$\mu = \frac{\sum x}{N}$$

• For a sample: $$\bar{x} = \frac{\sum x}{n}$$

• Here x is an arbitrary data value,
• N is the number of elements in the population, and
• n is the number of elements in the sample.
EXAMPLE
• The following data represent the scores of Aqil Khan in a tournament:

18, 24, 38, 36, 21, 40, 33, 22

Find the average score.

Solution:

$$\bar{x} = \frac{18+24+38+36+21+40+33+22}{8} = \frac{232}{8} = 29$$
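The calculation above can be checked with a short Python sketch; mean is a hypothetical helper name, and the same arithmetic gives μ for a population or x̄ for a sample.

```python
def mean(values):
    # sum of all data values divided by the number of data values;
    # the same arithmetic gives mu (population) or x-bar (sample)
    return sum(values) / len(values)

scores = [18, 24, 38, 36, 21, 40, 33, 22]
print(mean(scores))  # 29.0
```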
MEAN FOR GROUPED DATA
• First, the data must be arranged in a frequency distribution.

• For grouped data, the mean is given by the following equations:

• For a population: $$\mu = \frac{\sum mf}{N}$$

• For a sample: $$\bar{x} = \frac{\sum mf}{n}$$

• m represents the midpoint of each class.
• f is the frequency of the corresponding class.
• n and N are the same as described previously.
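A minimal Python sketch of the grouped-data formula follows. The function name grouped_mean and the three-class table are hypothetical, made up for illustration only (the commuting-hours table of the following example is not reproduced here). Each class midpoint m is taken as the average of the class limits.

```python
def grouped_mean(classes):
    # classes: list of (lower_limit, upper_limit, frequency) tuples
    n = sum(f for _, _, f in classes)                         # sum of f
    sum_mf = sum((lo + hi) / 2 * f for lo, hi, f in classes)  # sum of m*f
    return sum_mf / n

# Hypothetical frequency distribution, for illustration only
table = [(0, 4, 2), (5, 9, 3), (10, 14, 5)]
print(grouped_mean(table))  # 8.5
```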
EXAMPLE
• The following frequency distribution table represents the daily commuting hours of all the workers in an office. Find the average commuting hours for a worker.
SOLUTION
• We have to draw a table as follows:
MEDIAN
Definition
The median is the value of the middle term in a data set that has been ranked in increasing order.
MEDIAN FOR UNGROUPED DATA
Steps to calculate the median

As is obvious from its definition, the median divides a ranked data set into two equal parts.

The calculation of the median consists of the following two steps:

• Rank the data set in increasing order.

• Find the middle term. The value of this term is the median.
MEDIAN FOR UNGROUPED DATA
For an odd number of observations
If the number of observations in a data set is odd, the median is given by the value of the middle term in the ranked data.

Example
The following data give the prices (in thousands of dollars) of seven houses selected from all houses sold last month in a city.

312 257 421 289 526 374 497

Find the median.

Solution
First, we rank the given data in increasing order as follows:
257 289 312 374 421 497 526

Since there are seven homes in this data set, the middle term is the fourth term, so the median is given by the value of the fourth term in the ranked data.

257 289 312 [374] 421 497 526

Thus, the median price of a house is 374, or $374,000.


MEDIAN FOR UNGROUPED DATA
For an even number of observations
If the number of observations is even, the median is given by the average of the values of the two middle terms.

Example
Consider the following 12 data values, already arranged in ascending order:
7 8 9 10 11 12 13 13 14 17 17 45

The two middle terms are the 6th and 7th values, 12 and 13.

Median = (12 + 13)/2 = 12.5
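The two-step procedure (rank, then take the middle term or the average of the two middle terms) can be sketched in Python as follows; median is a hypothetical helper name, and the standard library's statistics.median applies the same rule.

```python
def median(values):
    data = sorted(values)                   # step 1: rank in increasing order
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                    # odd n: value of the middle term
    return (data[mid - 1] + data[mid]) / 2  # even n: average of the two middle terms

prices = [312, 257, 421, 289, 526, 374, 497]
values = [7, 8, 9, 10, 11, 12, 13, 13, 14, 17, 17, 45]
print(median(prices))  # 374
print(median(values))  # 12.5
```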


MEDIAN FOR GROUPED DATA
The median for grouped data can be calculated by the following formula:

$$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - c\right)$$
Where
l = Lower class boundary of the median class
h = Class width or interval
f = Frequency of the median class
n = Total number of observations
c = Cumulative frequency of the class preceding the median class
MEDIAN FOR GROUPED DATA

Example
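Here n/2 = 75 (so n = 150). The class 5-9 is the median class, because the cumulative frequency first reaches 75 in this class (32 + 71 = 103). The remaining values are

c = 32
f = 71
h = 9.5 − 4.5 = 5
l = 4.5

$$\text{Median} = 4.5 + \frac{5}{71}(75 - 32) = 4.5 + \frac{215}{71} \approx 7.53 \text{ minutes}$$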
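The grouped-median formula translates directly into Python. This sketch (grouped_median is a hypothetical name) plugs in the values from the example, with n = 150 inferred from n/2 = 75.

```python
def grouped_median(l, h, f, n, c):
    # l: lower class boundary of the median class, h: class width,
    # f: frequency of the median class, n: total number of observations,
    # c: cumulative frequency of the class preceding the median class
    return l + (h / f) * (n / 2 - c)

print(round(grouped_median(l=4.5, h=5, f=71, n=150, c=32), 2))  # 7.53
```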


MODE
Definition
The mode is the value that occurs with the highest frequency in a data set.

Mode is a French word that means fashion—an item that is most popular or common.

In statistics, the mode represents the most common value in a data set.
MODE FOR UNGROUPED DATA
The mode, in this case, is simply the most frequently repeated value in the data set.

A data set can contain one or more values that are repeated with the same peak frequency. In this respect, the data set can be
• Unimodal
• Bimodal
• Multimodal

A data set in which all the values are repeated with the same frequency has no modal value.
EXAMPLES

Unimodal
77 82 74 81 79 84 74 78
Mode = 74
Bimodal
77 82 74 81 77 84 74 78
Mode = 74, 77
Multimodal
77 82 74 82 77 84 74 78
Mode = 74, 77, 82
No Mode
77 82 73 81 79 84 74 78
Mode: none (every value occurs exactly once)
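The rules above (return every value tied at the peak frequency, and no mode when all values occur equally often) can be sketched with collections.Counter; modes is a hypothetical helper name. Note that the standard library's statistics.multimode differs on the last case: it returns every value when all frequencies are equal.

```python
from collections import Counter

def modes(values):
    counts = Counter(values)
    peak = max(counts.values())
    # if every value occurs equally often, the data set has no mode
    if all(c == peak for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == peak)

print(modes([77, 82, 74, 81, 79, 84, 74, 78]))  # [74]
print(modes([77, 82, 74, 81, 77, 84, 74, 78]))  # [74, 77]
print(modes([77, 82, 74, 82, 77, 84, 74, 78]))  # [74, 77, 82]
print(modes([77, 82, 73, 81, 79, 84, 74, 78]))  # []
```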
MODE FOR GROUPED DATA
The mode for grouped data can be calculated by the following formula:

$$\text{Mode} = l + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$$

Where
l = Lower class boundary of the modal class
h = Class width or interval
fm = Frequency of the modal class
f1 = Frequency of the class preceding the modal class
f2 = Frequency of the class succeeding the modal class
EXAMPLE
Consider the following table

l = 15
h = 5
fm = 7
f1 = 5
f2 = 2
$$\text{Mode} = 15 + \frac{7 - 5}{(7 - 5) + (7 - 2)} \times 5 = 15 + \frac{10}{7} \approx 16.43$$
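The grouped-mode formula can be checked numerically with a small Python sketch (grouped_mode is a hypothetical name). With the example's values, 15 + 10/7 rounds to 16.43 at two decimals.

```python
def grouped_mode(l, h, fm, f1, f2):
    # l: lower class boundary of the modal class, h: class width,
    # fm: modal class frequency, f1/f2: frequencies of the preceding/succeeding classes
    d1 = fm - f1
    d2 = fm - f2
    return l + d1 / (d1 + d2) * h

print(round(grouped_mode(l=15, h=5, fm=7, f1=5, f2=2), 2))  # 16.43
```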
FOR GROUPED DATA

Data can be symmetric or skewed.

Symmetric data: the values are spread equally on both sides of a central axis, and the mean lies on that axis.
SKEWNESS

[Figure: positively skewed and negatively skewed distributions]

CENTRAL TENDENCIES ON DISTRIBUTION CURVE

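Normal curve
The mean, median and mode are at the centre and coincide at the same point.

For an unsymmetrical (skewed) curve
For a typical single-peaked skewed curve, the median lies between the mode and the mean: the mode stays at the peak, while the mean is pulled furthest toward the longer tail.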
DECISION CRITERIA

For symmetric data, the mean lies in the middle of the spread, but that is not true for unsymmetrical data.
In unsymmetrical data, the values are centred around the median rather than the mean.

Data shape          Preferred measure of central tendency
Symmetric data      MEAN
Unsymmetric data    MEDIAN
UNGROUPED DATA

CASE 1
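We have an ungrouped data set of the incomes of 7 people:
10000, 12000, 15000, 20000, 25000, 20000, 50000000

Mean ≈ 7157429    Median = 20000

RESULT:
The single extreme income of 50000000 drags the mean far above every other value, so the MEDIAN explains the data better.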
UNGROUPED DATA

CASE 2
We again have a sample of the incomes of 7 people:
10000, 15000, 15000, 20000, 25000, 10000, 15000

Mean ≈ 15714    Median = 15000

RESULT:
With no extreme values, the MEAN explains the data better.
NOTE: The mode (15000) is also the same as the median.
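Both cases can be reproduced with Python's standard statistics module. This short sketch just recomputes the figures quoted above, showing how a single extreme value pulls the mean but not the median.

```python
from statistics import mean, median, mode

case1 = [10000, 12000, 15000, 20000, 25000, 20000, 50000000]
case2 = [10000, 15000, 15000, 20000, 25000, 10000, 15000]

for name, data in [("Case 1", case1), ("Case 2", case2)]:
    print(name, round(mean(data)), median(data))
# Case 1 7157429 20000
# Case 2 15714 15000

print(mode(case2))  # 15000, the same as the median
```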
MODE

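• Consider discrete categorical data consisting of the car colours chosen by buyers from three colours: red, white and black.

Colour    Frequency
RED       20
BLACK     30
WHITE     50

The mean and median cannot be computed for such unordered categories, so the mode is the appropriate measure: here the mode is WHITE.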
HOW TO RELATE ALL THE CENTRAL TENDENCIES WITH SPREAD
MEASURES OF DISPERSION

The measures of central tendency (mean, median and mode) by themselves are usually not sufficient to reveal the shape of the distribution of a data set.

Two data sets with similar measures of central tendency might have different spreads, i.e. the variation among their values might be different.

Data set A: 40 50 60 70 80    Mean: 60    Spread: 40
Data set B: 58 59 60 61 62    Mean: 60    Spread: 4

To describe a data set completely, measures of dispersion are used alongside the measures of central tendency.
DISPERSION

Dispersion (also called variability, scatter, or spread) is the extent to which a distribution is stretched or squeezed.

The measures of Statistical Dispersion include:

• Range
• Variance
• Standard Deviation
• Inter-quartile Range
RANGE
• Range is the simplest measure of statistical dispersion; it gives the total spread of the data set.
• Range is the difference between the largest and smallest observations in the data set.
• Range = Largest value − Smallest value
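As a small illustration, the range of the two data sets compared earlier (both with mean 60) can be computed as follows; data_range is a hypothetical helper name.

```python
def data_range(values):
    # largest value minus smallest value
    return max(values) - min(values)

set_a = [40, 50, 60, 70, 80]
set_b = [58, 59, 60, 61, 62]
print(data_range(set_a), data_range(set_b))  # 40 4
```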

VARIANCE
• In probability theory and statistics, variance is the expectation of the squared deviation of a data value from the mean of the data set.
• It measures how far a set of numbers is spread out from its average value.
• For ungrouped data:

Population data: $$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
Sample data: $$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
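A minimal Python sketch of the two ungrouped formulas follows; variance is a hypothetical helper and the data set is made up for illustration. The standard library's statistics.pvariance and statistics.variance compute the population and sample versions respectively.

```python
def variance(values, sample=True):
    n = len(values)
    avg = sum(values) / n
    ss = sum((x - avg) ** 2 for x in values)  # sum of squared deviations
    # sample variance divides by n - 1, population variance by N
    return ss / (n - 1) if sample else ss / n

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5
print(variance(data, sample=False))  # 4.0
print(variance(data))                # 32/7, about 4.571
```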


VARIANCE
For grouped data:
• f: frequency
• m: class midpoint

Population data: $$\sigma^2 = \frac{\sum f(m - \mu)^2}{N}$$
Sample data: $$s^2 = \frac{\sum f(m - \bar{x})^2}{n - 1}$$
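The grouped version weights each squared deviation of a class midpoint by its frequency. In this sketch the function name grouped_variance and the (midpoint, frequency) table are hypothetical, chosen only to illustrate the arithmetic.

```python
def grouped_variance(classes, sample=True):
    # classes: list of (class_midpoint, frequency) pairs
    n = sum(f for _, f in classes)
    avg = sum(m * f for m, f in classes) / n
    ss = sum(f * (m - avg) ** 2 for m, f in classes)  # sum of f*(m - mean)^2
    return ss / (n - 1) if sample else ss / n

# Hypothetical midpoints and frequencies, for illustration only
table = [(2, 2), (7, 3), (12, 5)]
print(grouped_variance(table, sample=False))  # 15.25
```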

STANDARD DEVIATION
• The standard deviation of a random variable, statistical population or data set is the square root of its variance.

• Standard deviation is used to quantify the amount of variation or dispersion of a set of data values. A low standard deviation indicates that the data points tend to be close to the mean of the set, while a high standard deviation indicates that the data points are spread out over a wider range of values.
Use of Standard Deviation
• By using the mean and standard deviation, we can find the proportion or percentage of the total observations that fall within a given interval about the mean. This section briefly discusses Chebyshev’s theorem and the empirical rule, both of which demonstrate this use of the standard deviation.

Chebyshev’s Theorem
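• Chebyshev’s theorem gives a lower bound for the area under a curve between two points that are on opposite sides of the mean and at the same distance from the mean. For any number k greater than 1, at least (1 − 1/k²) of the data values lie within k standard deviations of the mean.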
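The bound 1 − 1/k² can be computed and compared with the fraction actually observed in a data set. In this sketch the helper names are hypothetical and the data reuse the outlier example from these slides; population standard deviation is assumed.

```python
from statistics import mean, pstdev

def chebyshev_bound(k):
    # minimum fraction of values within k standard deviations of the mean
    if k <= 1:
        raise ValueError("Chebyshev's theorem needs k > 1")
    return 1 - 1 / k ** 2

def fraction_within(values, k):
    # observed fraction of values within k population standard deviations
    mu, sigma = mean(values), pstdev(values)
    return sum(abs(x - mu) <= k * sigma for x in values) / len(values)

data = [1, 3, 5, 7, 10, 12, 15, 10000]
for k in (1.5, 2, 3):
    print(k, round(chebyshev_bound(k), 4), fraction_within(data, k))
```

Whatever the shape of the data, the observed fraction never falls below the bound.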
Example
Empirical Rule
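Whereas Chebyshev’s theorem is applicable to any kind of distribution, the empirical rule applies only to a specific type of distribution called a bell-shaped distribution. For such a distribution, approximately 68% of the observations lie within one standard deviation of the mean, about 95% within two, and about 99.7% within three.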
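The empirical-rule percentages can be recovered by idealising the bell-shaped distribution as a normal distribution (an assumption of this sketch). For a standard normal variable Z, P(|Z| ≤ k) = erf(k/√2), which Python's math.erf evaluates directly; normal_fraction_within is a hypothetical name.

```python
import math

def normal_fraction_within(k):
    # P(|Z| <= k) for a standard normal variable Z, via the error function
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * normal_fraction_within(k):.1f}%")
```

This prints roughly 68.3%, 95.4% and 99.7%, matching the rule.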
STANDARD DEVIATION
• In statistics, an outlier is an observation point that is distant from other observations.

• An outlier may be due to variability in the measurement or it may indicate experimental error; the latter are sometimes excluded from the data set.

• An outlier can cause serious problems in statistical analyses. (The standard deviation might not depict the true behavior of the data set.)

• If a data set consists of the values 1, 3, 5, 7, 10, 12, 15 and 10000, the value 10000 is clearly an outlier, and it inflates the overall standard deviation and variance of the data set.

• To identify outliers, another measure of dispersion, the interquartile range, is used.
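The effect of the outlier 10000 can be seen by computing the population standard deviation and the median with and without it, using Python's statistics module. This is only an illustration of the point above.

```python
from statistics import median, pstdev

data = [1, 3, 5, 7, 10, 12, 15, 10000]
without_outlier = [x for x in data if x != 10000]

# the standard deviation explodes when 10000 is included...
print(pstdev(data), pstdev(without_outlier))
# ...while the median barely moves
print(median(data), median(without_outlier))  # 8.5 7
```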
INTER-QUARTILE RANGE
• In statistics, the interquartile range (IQR), also called the midspread, middle 50%, or technically H-spread, is a measure of statistical dispersion equal to the difference between the 75th and 25th percentiles, or between the upper and lower quartiles: IQR = Q3 − Q1.

• Unlike the total range, the interquartile range has a breakdown point of 25% and is thus often preferred to the total range. The IQR is used to build box plots, simple graphical representations of a probability distribution.

• The IQR can be used to identify outliers. Because the values between the first and third quartiles are not affected by extreme observations, they represent the distribution of the data set reliably.
Measures of position
• Quartiles: Quartiles are three summary measures that divide a ranked data set into four equal parts. The second quartile is the same as the median of a data set. The first quartile is the value of the middle term among the observations that are less than the median, and the third quartile is the value of the middle term among the observations that are greater than the median.

• Approximately 25% of the values in a ranked data set are less than Q1 and about 75% are greater than Q1. The second quartile, Q2, divides a ranked data set into two equal parts; hence, the second quartile and the median are the same. Approximately 75% of the data values are less than Q3 and about 25% are greater than Q3. The difference between the third quartile and the first quartile for a data set is called the interquartile range (IQR).
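The quartile definition above (Q1 and Q3 as the medians of the lower and upper halves of the ranked data, excluding the middle term when n is odd) can be sketched as follows. The helper names are hypothetical. Software such as numpy.percentile uses interpolation by default, so its quartiles may differ slightly from this hand method.

```python
def median_of_sorted(data):
    n = len(data)
    mid = n // 2
    return data[mid] if n % 2 else (data[mid - 1] + data[mid]) / 2

def quartiles(values):
    data = sorted(values)
    n, half = len(data), len(data) // 2
    lower = data[:half]          # observations below the median position
    upper = data[half + n % 2:]  # observations above it (middle term skipped when n is odd)
    return median_of_sorted(lower), median_of_sorted(data), median_of_sorted(upper)

prices = [312, 257, 421, 289, 526, 374, 497]
q1, q2, q3 = quartiles(prices)
print(q1, q2, q3, "IQR =", q3 - q1)  # 289 374 497 IQR = 208
```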
Percentiles and Percentile Rank
• Percentiles are the summary measures that divide a ranked data set into 100 equal parts. Each (ranked) data set has 99 percentiles that divide it into 100 equal parts. The data should be ranked in increasing order to compute percentiles. The kth percentile is denoted by Pk, where k is an integer in the range 1 to 99. For instance, the 25th percentile is denoted by P25.

• Thus, the kth percentile, Pk, can be defined as a value in a data set such that about k% of the measurements are smaller than the value of Pk and about (100 − k)% of the measurements are greater than the value of Pk. The approximate value of the kth percentile is determined as explained next.
BOX-AND-WHISKER PLOT
• A box-and-whisker plot gives a graphic presentation of data using five measures: the median, the first quartile, the third quartile, and the smallest and the largest values in the data set between the lower and the upper inner fences.

• A box-and-whisker plot can help us visualize the center, the spread, and the skewness of a data set.

• It also helps to detect outliers.

• We can compare different distributions by making box-and-whisker plots for each of them.
BOX-AND-WHISKER PLOT
• The following data are the incomes (in thousands of dollars) for a sample of 12 households.
75 69 84 112 74 104 81 90 94 144 79 98
Construct a box-and-whisker plot for these data.
BOX-AND-WHISKER PLOT
[Figures: step-by-step construction of the box-and-whisker plot for the income data (not reproduced)]
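The five measures for the income data can be computed with a short Python sketch. The slides mention inner fences without defining them here, so this sketch assumes the usual convention: lower inner fence = Q1 − 1.5 × IQR and upper inner fence = Q3 + 1.5 × IQR. The quartiles follow the halves method described under Measures of position; helper names are hypothetical.

```python
def median_of_sorted(data):
    n = len(data)
    mid = n // 2
    return data[mid] if n % 2 else (data[mid - 1] + data[mid]) / 2

def box_plot_summary(values):
    data = sorted(values)
    n, half = len(data), len(data) // 2
    q1 = median_of_sorted(data[:half])          # middle of the lower half
    q2 = median_of_sorted(data)                 # median
    q3 = median_of_sorted(data[half + n % 2:])  # middle of the upper half
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr                # assumed 1.5 * IQR inner fences
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "Q1": q1, "median": q2, "Q3": q3, "IQR": iqr,
        "fences": (lower_fence, upper_fence),
        "whiskers": (inside[0], inside[-1]),    # smallest/largest values within the fences
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }

incomes = [75, 69, 84, 112, 74, 104, 81, 90, 94, 144, 79, 98]
for key, value in box_plot_summary(incomes).items():
    print(key, value)
```

Under these assumptions, the whiskers end at 69 and 112, and 144 lies beyond the upper inner fence of 137, so it would be drawn as an outlier.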
Thank you!