MCS Lecture 3

This document covers numerical descriptive measures in probability and statistics, focusing on measures of central tendency (mean, median, mode) and measures of dispersion (range, variance, standard deviation, inter-quartile range). It explains how to calculate these measures for both ungrouped and grouped data, and discusses the importance of understanding data distribution and outliers. Additionally, it introduces graphical representations like box-and-whisker plots to visualize data characteristics.

Uploaded by

Cynosure Wolf

Probability and Statistics

Lecture # 3

NUMERICAL DESCRIPTIVE
MEASURES
PROBABILITY AND STATISTICS

CHAPTER 3
NUMERICAL descriptive measures
TABLE OF CONTENTS

 Measures of Central Tendency

 Measures of Dispersion

MEASURES OF CENTRAL TENDENCY
 Central Tendency denotes the tendency of quantitative data to cluster around some
central value.
 It may also be called a center or location of the distribution.
 As such, measures of central tendency are sometimes called measures of central
location.
MAIN TYPES

 Mean
 Median
 Mode
OTHER TYPES
 Geometric Mean: The nth root of the product of the data values.

 Harmonic Mean: The reciprocal of the arithmetic mean of the reciprocals of the data
values.

 Mid-Range: The arithmetic mean of the maximum and minimum values of a data set.

 Trimmed Mean: The arithmetic mean of data values after a certain number or
proportion of the highest and lowest data values have been discarded.
MEAN FOR UNGROUPED DATA
 It is the sum of all the data values divided by the number of total data values

 For population: $\mu = \frac{\sum x}{N}$

 For sample: $\bar{x} = \frac{\sum x}{n}$

 Here ‘x’ is an arbitrary data value

 ‘N’ is the number of elements in a population.
 ‘n’ is the number of elements in a sample.
EXAMPLE
 The following data represent the scores of Aqil Khan in a tournament:

18, 24, 38, 36, 21, 40, 33, 22

Find the average score.

Solution:

$\bar{x} = \frac{18+24+38+36+21+40+33+22}{8} = \frac{232}{8} = 29$
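
The same arithmetic can be checked in Python. This is a minimal sketch; the names `scores` and `average` are ours, not part of the lecture.

```python
# Scores from the example above
scores = [18, 24, 38, 36, 21, 40, 33, 22]

# Mean = sum of all data values / number of data values
average = sum(scores) / len(scores)

print(average)  # 29.0
```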
MEAN FOR GROUPED DATA
 First, the data must be arranged in a frequency distribution.

 For grouped data, the mean is given by the following equations:

 For population: $\mu = \frac{\sum mf}{N}$

 For sample: $\bar{x} = \frac{\sum mf}{n}$

 ‘m’ represents the mid-point of each class.

 ‘f’ is the frequency of the corresponding class
 ‘n’ and ‘N’ are the same as described previously.
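
A short sketch of the grouped-data mean follows. The classes, midpoints and frequencies below are hypothetical values chosen for illustration; they are not taken from the lecture's table, and the function name `grouped_mean` is ours.

```python
def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum(m * f) divided by the total frequency."""
    total = sum(m * f for m, f in zip(midpoints, frequencies))
    return total / sum(frequencies)


# Hypothetical classes 0-9, 10-19, 20-29 (NOT the lecture's table)
midpoints = [4.5, 14.5, 24.5]   # m: mid-point of each class
frequencies = [2, 5, 3]         # f: frequency of each class

print(grouped_mean(midpoints, frequencies))  # 15.5
```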
EXAMPLE
 The following frequency distribution table represents the daily commuting hours of all
the workers in an office. Find the average commuting hours for a worker.
SOLUTION
 We have to draw a table as follows:
MEDIAN
Definition
The median is the value of the middle term in a data set that has been ranked in
increasing order.
MEDIAN FOR UNGROUPED DATA
Steps to calculate the median

As is evident from the definition of the median, it divides a ranked data set into two equal
parts.

The calculation of the median consists of the following two steps:

 Rank the data set in increasing order.

 Find the middle term. The value of this term is the median.
MEDIAN FOR UNGROUPED DATA
For odd number of observations
If the number of observations in a data set is odd, then the median is given by the value
of the middle term in the ranked data

Example
The following data give the prices (in thousands of dollars) of seven houses selected from
all houses sold last month in a city.

312 257 421 289 526 374 497

Find the median.
MEDIAN FOR UNGROUPED DATA
Solution
First, we rank the given data in increasing order as follows:
257 289 312 374 421 497 526

Since there are seven homes in this data set and the middle term is the fourth term, the
median is given by the value of the fourth term in the ranked data.

257 289 312 374 421 497 526

Thus, the median price of a house is 374, or $374,000.

MEDIAN FOR UNGROUPED DATA
For even number of observations
If the number of observations is even, then the median is given by the average of the
values of the two middle terms.

Example
Consider the data
7 8 9 10 11 12 13 13 14 17 17 45

Arranging the data in ascending order

7 8 9 10 11 12 13 13 14 17 17 45

Median = (12+13)/2 = 12.5
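
The two cases (odd and even number of observations) can be sketched in one small function. The function and variable names are ours; the data are the two examples above.

```python
def median(values):
    """Rank the data, then take the middle term (odd count)
    or the average of the two middle terms (even count)."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


house_prices = [312, 257, 421, 289, 526, 374, 497]        # odd count
data = [7, 8, 9, 10, 11, 12, 13, 13, 14, 17, 17, 45]      # even count

print(median(house_prices))  # 374
print(median(data))          # 12.5
```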

MEDIAN FOR GROUPED DATA
The median for grouped data can be calculated by the following formula

$\text{Median} = l + \frac{h}{f}\left(\frac{n}{2} - c\right)$

Where
l = Lower class boundary of the median class
h = Class width or interval
f = Frequency of the median class
n = Total number of observations
c = Cumulative frequency of the class preceding the median class
MEDIAN FOR GROUPED DATA

Example
MEDIAN FOR GROUPED DATA
Here
$\frac{n}{2} = 75$

So the class 5-9 is the median class. The remaining values are

c = 32
f = 71
h = 9.5 – 4.5 = 5
l = 4.5

Median = $4.5 + \frac{5}{71}(75 - 32) \approx 7.53$ minutes
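
As a check, the grouped-median formula can be evaluated with the values listed above. This is only a sketch: the function name `grouped_median` is ours, and n = 150 is inferred from n/2 = 75.

```python
def grouped_median(l, h, f, n, c):
    """Median of grouped data: l + (h / f) * (n / 2 - c)."""
    return l + (h / f) * (n / 2 - c)


# Values from the example; n = 150 follows from n/2 = 75
result = grouped_median(l=4.5, h=5, f=71, n=150, c=32)

print(round(result, 2))  # 7.53
```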

MODE
Definition
The mode is the value that occurs with the highest frequency in a data set.

Mode is a French word that means fashion—an item that is most popular or common.

In statistics, the mode represents the most common value in a data set.
MODE FOR UNGROUPED DATA
The mode, in this case, is simply the most repeated value in the data set.

A data set can contain one or more values that share the same peak frequency.
From this perspective, the data set can be
 Uni-modal
 Bimodal
 Multimodal

A data set in which all the values are repeated with the same frequency has no modal
value.
EXAMPLES

Unimodal
77 82 74 81 79 84 74 78
Mode = 74
Bimodal
77 82 74 81 77 84 74 78
Mode = 74, 77
Multimodal
77 82 74 82 77 84 74 78
Mode = 74, 77, 82
No Mode
77 82 73 81 79 84 74 78
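
The four cases above can be reproduced with a small helper. This is a sketch under one stated convention: when every value occurs equally often, the helper returns an empty list ("no mode"). The name `modes` is ours.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] when all values
    occur with the same frequency (no modal value)."""
    if not values:
        return []
    counts = Counter(values)
    # All distinct values equally frequent: no mode
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    peak = max(counts.values())
    return sorted(v for v, c in counts.items() if c == peak)


print(modes([77, 82, 74, 81, 79, 84, 74, 78]))  # [74]
print(modes([77, 82, 74, 81, 77, 84, 74, 78]))  # [74, 77]
print(modes([77, 82, 74, 82, 77, 84, 74, 78]))  # [74, 77, 82]
print(modes([77, 82, 73, 81, 79, 84, 74, 78]))  # []
```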
MODE FOR GROUPED DATA
The mode for grouped data can be calculated by the following formula
$\text{Mode} = l + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$

Where
l = Lower class boundary of the modal class
h = Class width or interval
fm = Frequency of the modal class
f1 = Frequency of the class preceding the modal class
f2 = Frequency of the class succeeding the modal class
EXAMPLE
Consider the following table

l = 15
h = 5
fm = 7
f1 = 5
f2 = 2
Mode = $15 + \frac{7 - 5}{(7 - 5) + (7 - 2)} \times 5 = 15 + \frac{2}{7} \times 5 \approx 16.43$
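
The grouped-mode formula can be evaluated directly with these values. In this sketch (the function name `grouped_mode` is ours) the result is 16.4286 to four decimals, i.e. about 16.43 when rounded.

```python
def grouped_mode(l, h, fm, f1, f2):
    """Mode of grouped data: l + (fm - f1) / ((fm - f1) + (fm - f2)) * h."""
    return l + (fm - f1) / ((fm - f1) + (fm - f2)) * h


result = grouped_mode(l=15, h=5, fm=7, f1=5, f2=2)

print(round(result, 2))  # 16.43
```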
FOR GROUPED DATA

Symmetric data Skewed data

Symmetric data

Data equally spaced around an axis about which the mean lies
SKEWNESS

Positive skewed Negative skewed

CENTRAL TENDENCIES ON DISTRIBUTION CURVE

Normal curve
Mean, median and mode are in
the centre and at the same point.

For an unsymmetrical curve

The median lies between the mode and the mean, near the centre of the data.
DECISION CRITERIA

For symmetric data, the mean lies in the middle of the spread, but that is not true for
unsymmetrical data.
In unsymmetrical data, the spread is around the median.

Data type            Central tendency
Symmetric data       MEAN
Unsymmetric data     MEDIAN
UNGROUPED DATA

CASE 1
We have an ungrouped data set of income of 7 people
10000, 12000, 15000, 20000, 25000, 20000, 50000000

Mean 7157429 Median 20000

RESULT:
MEDIAN explains the data better
UNGROUPED DATA

CASE 2
We again have a sample of incomes of 7 people
10000,15000,15000,20000,25000,10000,15000

Mean 15714 Median 15000

RESULT:
MEAN explains the data better
NOTE: the mode is also the same as the median (15000)
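
Both cases can be compared side by side with Python's standard `statistics` module. This is a small sketch; the names `case_1` and `case_2` are ours.

```python
from statistics import mean, median

case_1 = [10000, 12000, 15000, 20000, 25000, 20000, 50000000]  # one extreme income
case_2 = [10000, 15000, 15000, 20000, 25000, 10000, 15000]     # no extreme income

for name, incomes in (("Case 1", case_1), ("Case 2", case_2)):
    # The mean is pulled towards the extreme value; the median is not
    print(name, "mean:", round(mean(incomes)), "median:", median(incomes))
```

In Case 1 the single income of 50000000 drags the mean far above every other value, while the median stays at a typical income.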
MODE

 Consider discrete categorical data consisting of buyers' choices among cars of three
colours: red, white, and black.

RED 20

BLACK 30

WHITE 50

The mode is WHITE, the colour with the highest frequency (50).
HOW TO RELATE ALL THE CENTRAL TENDENCIES WITH SPREAD
MEASURES OF DISPERSION

The measures of central tendency that include mean, median, or mode by themselves are
usually not sufficient enough to reveal the shape of the distribution of a data set.

Two data sets having similar measures of central tendency might have different spreads i.e.
the variations in the data set values might be different.

First data set: 40 50 60 70 80   Mean: 60   Spread: 40
Second data set: 58 59 60 61 62   Mean: 60   Spread: 4

To completely describe a data set, ‘Measures of Dispersion’ are used alongside the
measures of central tendency.
DISPERSION

Dispersion (also called variability, scatter, or spread) is the extent to which
a distribution is stretched or squeezed.

The measures of Statistical Dispersion include:

• Range
• Variance
• Standard Deviation
• Inter-quartile Range
RANGE
• Range is the simplest measure of statistical dispersion; it tells the spread of the data set.
• Range is the difference between the largest and smallest observations in the data set.
• Range = Largest value – smallest value
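
A minimal sketch of the range, applied to two data sets that share a mean of 60 (40 to 80 and 58 to 62). The names `data_range`, `wide` and `narrow` are ours.

```python
def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)


wide = [40, 50, 60, 70, 80]
narrow = [58, 59, 60, 61, 62]

print(data_range(wide))    # 40
print(data_range(narrow))  # 4
```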

VARIANCE
• In probability theory and statistics, variance is the expectation of the squared deviation of a
data set value from the mean of the data set.
• It measures how far a set of numbers is spread out from its average value.
• For ungrouped data:

Population data: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Sample data: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

VARIANCE
For grouped data:
• f: frequency
• m: class midpoint
Population data: $\sigma^2 = \frac{\sum f(m - \mu)^2}{N}$
Sample data: $s^2 = \frac{\sum f(m - \bar{x})^2}{n - 1}$

STANDARD DEVIATION
• The standard deviation of a random variable, statistical population or data set is the
square root of its variance.

• Standard deviation is a measure that is used to quantify the amount of variation
or dispersion of a set of data values. A low standard deviation indicates that the data
points tend to be close to the mean of the set, while a high standard deviation indicates
that the data points are spread out over a wider range of values.
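
Variance and standard deviation can be computed with Python's standard `statistics` module. The sketch below assumes the usual definitions (division by N for a population, by n - 1 for a sample) and reuses the tournament scores from the mean example, whose mean is 29.

```python
from statistics import pvariance, variance, pstdev, stdev

scores = [18, 24, 38, 36, 21, 40, 33, 22]

# Treating the scores as a whole population (divide by N)
print(pvariance(scores))            # 65.75
print(round(pstdev(scores), 2))     # 8.11

# Treating the scores as a sample (divide by n - 1)
print(round(variance(scores), 2))   # 75.14
print(round(stdev(scores), 2))      # 8.67
```

In both cases the standard deviation is the square root of the corresponding variance.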
Use of Standard Deviation
 By using the mean and standard deviation, we can find the proportion or percentage
of the total observations that fall within a given interval about the mean. This section
briefly discusses Chebyshev’s theorem and the empirical rule, both of which
demonstrate this use of the standard deviation.

Chebyshev’s Theorem
 Chebyshev’s theorem gives a lower bound for the area under a curve between two
points that are on opposite sides of the mean and at the same distance from the
mean.
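
In its usual form (stated here from the standard result, since no formula appears in this text), the theorem says that for any k > 1, at least 1 - 1/k² of the values lie within k standard deviations of the mean. A short sketch, with a function name of our own choosing:

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations
    of the mean, valid for any distribution when k > 1."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


for k in (1.5, 2, 3):
    print(k, round(chebyshev_lower_bound(k) * 100, 1), "%")
```

For k = 2, for example, the bound is 75%: at least three quarters of the values lie within two standard deviations of the mean.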
Example
Empirical Rule
Whereas Chebyshev’s theorem is applicable to any kind of distribution, the empirical
rule applies only to a specific type of distribution called a bell-shaped distribution.
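
The percentages usually quoted for the empirical rule (about 68%, 95% and 99.7% within 1, 2 and 3 standard deviations) are not given in this text; the sketch below derives them from the standard normal distribution as an illustration of a bell-shaped curve.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def proportion_within(k):
    """Proportion of a normal distribution lying within
    k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(proportion_within(k) * 100, 1), "%")
```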
STANDARD DEVIATION
• In statistics, an outlier is an observation point that is distant from other observations.

• An outlier may be due to variability in the measurement or it may indicate experimental
error; the latter are sometimes excluded from the data set.

• An outlier can cause serious problems in statistical analyses. (The standard deviation
might not depict the true behavior of the data set)

• If a data set consists of values 1,3,5,7,10,12,15 and 10000, it is clearly visible that the data
set value of 10000 is an outlier and it affects the overall standard deviation and variance of
the data set.

• To identify outliers, another measure of dispersion is used: the inter-quartile
range.
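
The effect of the outlier in the data set above can be seen by computing the mean and standard deviation with and without the value 10000. This is a sketch using the sample standard deviation; the variable names are ours.

```python
from statistics import mean, stdev

data = [1, 3, 5, 7, 10, 12, 15, 10000]
without_outlier = data[:-1]   # drop the value 10000

print("with outlier:   ", round(mean(data), 2), round(stdev(data), 2))
print("without outlier:", round(mean(without_outlier), 2),
      round(stdev(without_outlier), 2))
```

With the outlier included, the standard deviation is several hundred times larger than without it, so it no longer describes the spread of the other seven values.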
INTER-QUARTILE RANGE
• In statistics, the interquartile range (IQR), also called the midspread or middle 50%, or
technically H-spread, is a measure of statistical dispersion, being equal to the difference
between 75th and 25th percentiles, or between upper and lower quartiles, IQR = Q3 − Q1.

• Unlike total range, the interquartile range has a breakdown point of 25% and is thus often
preferred to the total range. The IQR is used to build box plots, simple graphical
representations of a probability distribution.

• The IQR can be used to identify outliers. The behavior of the data set values between first
and third quartiles represents the distribution of data set in a satisfactory manner.
Measures of position
 Quartiles: Quartiles are three summary measures that divide a ranked data set into four
equal parts. The second quartile is the same as the median of a data set. The first quartile
is the value of the middle term among the observations that are less than the median,
and the third quartile is the value of the middle term among the observations that are
greater than the median.

 Approximately 25% of the values in a ranked data set are less than Q1 and about 75% are
greater than Q1. The second quartile, Q2, divides a ranked data set into two equal parts;
hence, the second quartile and the median are the same. Approximately 75% of the data
values are less than Q3 and about 25% are greater than Q3. The difference between the
third quartile and the first quartile for a data set is called the interquartile range (IQR).
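
The quartile definition above can be sketched as follows. One assumption is made: "observations less than the median" is taken to mean the lower half of the ranked data by position, which matters only when values are tied with the median. The function names are ours.

```python
def median(ranked):
    """Median of an already ranked list."""
    n = len(ranked)
    mid = n // 2
    return ranked[mid] if n % 2 else (ranked[mid - 1] + ranked[mid]) / 2


def quartiles(values):
    """Q2 is the median; Q1 and Q3 are the medians of the
    lower and upper halves of the ranked data."""
    ranked = sorted(values)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]             # terms below the median position
    upper = ranked[half + n % 2:]     # terms above the median position
    return median(lower), median(ranked), median(upper)


data = [1, 3, 5, 7, 10, 12, 15, 10000]
q1, q2, q3 = quartiles(data)

print(q1, q2, q3)       # 4.0 8.5 13.5
print("IQR =", q3 - q1)  # IQR = 9.5
```

Unlike the standard deviation, the IQR of this data set is not affected by how large the extreme value 10000 is.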
Percentiles and Percentile Rank
 Percentiles are the summary measures that divide a ranked data set into 100 equal parts.
Each (ranked) data set has 99 percentiles that divide it into 100 equal parts. The data
should be ranked in increasing order to compute percentiles. The kth percentile is
denoted by Pk, where k is an integer in the range 1 to 99. For instance, the 25th
percentile is denoted by P25.

 Thus, the kth percentile, Pk, can be defined as a value in a data set such that about k% of
the measurements are smaller than the value of Pk and about (100- k)% of the
measurements are greater than the value of Pk. The approximate value of the kth
percentile is determined as explained next.
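
The procedure itself is not reproduced in this text. The sketch below therefore assumes a common textbook approximation: Pk is the value of the (kn/100)th term in the ranked data, with the position rounded to the nearest whole number, and the percentile rank of a value x is the percentage of values less than x. Both functions and the choice of the 12 household incomes as input are illustrative.

```python
def approx_percentile(values, k):
    """Approximate kth percentile: value of the (k * n / 100)th
    ranked term (assumed rule, position rounded)."""
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    ranked = sorted(values)
    n = len(ranked)
    position = round(k * n / 100)
    position = min(max(position, 1), n)   # keep within 1..n
    return ranked[position - 1]


def percentile_rank(values, x):
    """Percentage of values that are less than x."""
    below = sum(1 for v in values if v < x)
    return below / len(values) * 100


incomes = [75, 69, 84, 112, 74, 104, 81, 90, 94, 144, 79, 98]

print(approx_percentile(incomes, 42))          # 81
print(round(percentile_rank(incomes, 98), 1))  # 66.7
```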
BOX-AND-WHISKER PLOT
 A box-and-whisker plot gives a graphic presentation of data using five measures: the
median, the first quartile, the third quartile, and the smallest and the largest values in
the data set between the lower and the upper inner fences.

 A box-and-whisker plot can help us visualize the center, the spread, and the skewness
of a data set.

 It also helps to detect outliers.

 We can compare different distributions by making box-and-whisker plots for each of
them.
BOX-AND-WHISKER PLOT
 The following data are the incomes (in thousands of dollars) for a sample of 12
households.
75 69 84 112 74 104 81 90 94 144 79 98
Construct a box-and-whisker plot for these data.
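
The five measures needed for the plot can be computed as sketched below. Two assumptions are made: quartiles are the medians of the lower and upper halves of the ranked data, and the inner fences lie 1.5 × IQR below Q1 and above Q3 (a common convention, not stated in this text). The function names are ours.

```python
def median(ranked):
    """Median of an already ranked list."""
    n = len(ranked)
    mid = n // 2
    return ranked[mid] if n % 2 else (ranked[mid - 1] + ranked[mid]) / 2


def box_plot_measures(values, fence_factor=1.5):
    """Five measures for a box-and-whisker plot, plus any outliers.
    fence_factor=1.5 is an assumed convention for the inner fences."""
    ranked = sorted(values)
    n = len(ranked)
    half = n // 2
    q1 = median(ranked[:half])
    q3 = median(ranked[half + n % 2:])
    iqr = q3 - q1
    lower_fence = q1 - fence_factor * iqr
    upper_fence = q3 + fence_factor * iqr
    inside = [v for v in ranked if lower_fence <= v <= upper_fence]
    return {
        "median": median(ranked),
        "q1": q1,
        "q3": q3,
        "smallest_within_fences": inside[0],
        "largest_within_fences": inside[-1],
        "outliers": [v for v in ranked if v < lower_fence or v > upper_fence],
    }


incomes = [75, 69, 84, 112, 74, 104, 81, 90, 94, 144, 79, 98]
summary = box_plot_measures(incomes)
print(summary)
```

Under these assumptions the box runs from Q1 = 77 to Q3 = 101 with the median at 87, the whiskers extend to 69 and 112, and the income of 144 falls outside the upper inner fence (137), so it is flagged as an outlier.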
Thank you!
