Chapter 3

The document discusses various measures for describing numeric data, including central tendency (mean, median, mode), dispersion (range, variance, standard deviation), and shape (skewness). It explains how to calculate and interpret these statistics, and how to identify and treat outliers. The appropriate measure depends on factors like the data type and presence of outliers. For example, the median and mode are less impacted by outliers than the mean.

Uploaded by

Evelyn Maile


Learning unit 2

Exploring Data

Week 2: Summarising Data: Summary Tables and Graphs

Week 3: Describing Data: Numeric Descriptive Statistics

Describing Data: Numeric Descriptive Statistics

Learning outcomes
➢ describe the various central and non-central location measures
➢ calculate and interpret each of these location measures
➢ describe the appropriate central location measure for different data types
➢ describe the various measures of spread (or dispersion)
➢ calculate and interpret each measure of dispersion
➢ describe the concept of skewness
➢ calculate and interpret the coefficient of skewness
➢ explain how to identify and treat outliers
➢ calculate the five-number summary table and construct its box plot
➢ explain how outliers influence the choice of valid descriptive statistical measures
Describing the data profile of a random variable
Measures of location (both central and non-central)
➢ the arithmetic mean (also called the average) – valid for numeric data
➢ the median (also called the second quartile, the middle quartile or the 50th percentile) – valid for numeric data
➢ the mode (or modal value) – valid for numeric and categorical data
Measures of spread (or dispersion)
➢ Range
➢ Variance
➢ Standard deviation
➢ Coefficient of variation
Measure of shape (skewness)
➢ Symmetrical Distribution
➢ Positively Skewed Distribution
➢ Negatively Skewed Distribution
Measures of Central Tendency
➢ Where data are centred

Real Equity Returns of 16 Major Equity Markets

Australia 9.0% Japan 9.3%
Belgium 4.8% Netherlands 7.7%
Canada 7.7% South Africa 9.1%
Denmark 6.2% Spain 5.8%
France 6.3% Sweden 9.9%
Germany 8.8% Switzerland 6.9%
Ireland 7.0% United Kingdom 7.6%
Italy 6.8% United States 8.7%
➢ Arithmetic mean = 7.6%: the centre of gravity of the data, but subject to distortion by extremely large or small values (outliers)
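As a quick check of the figure above, the mean can be recomputed directly from the 16 listed returns. This is a minimal sketch; the dictionary name `returns` is only an illustrative choice.

```python
from statistics import mean

# Real equity returns (%) as listed in the table above
returns = {
    "Australia": 9.0, "Belgium": 4.8, "Canada": 7.7, "Denmark": 6.2,
    "France": 6.3, "Germany": 8.8, "Ireland": 7.0, "Italy": 6.8,
    "Japan": 9.3, "Netherlands": 7.7, "South Africa": 9.1, "Spain": 5.8,
    "Sweden": 9.9, "Switzerland": 6.9, "United Kingdom": 7.6,
    "United States": 8.7,
}

# Arithmetic mean = sum of all values divided by the number of values
mean_return = mean(returns.values())
print(f"Arithmetic mean = {mean_return:.1f}%")
```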

Arithmetic Mean for Grouped Numeric Data

When numeric data are grouped into intervals and shown in a numeric frequency distribution, the arithmetic mean can be approximated by:

➢ finding the midpoint of each interval, which represents all the x values in that interval

➢ multiplying each interval’s midpoint value by its frequency count

➢ summing these products over all intervals

➢ dividing the total sum by the sample size, n.

Arithmetic mean for grouped numeric data - example

Fuel consumption (km/l) of 20 trucks:

Trucks 1-10:    13  11  10  13  10  13   8  10  10  13
Trucks 11-20:   11   8  16  16  11   9  11  13   7  12

Interval (km/l)    Midpoint xi    Frequency fi    xi × fi
6 - < 9            7.5            4               30
9 - < 12           10.5           9               94.5
12 - < 15          13.5           5               67.5
15 - < 18          16.5           2               33
Total                             n = 20          225

x̄ = 225/20 = 11.25 km/litre
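The grouped-mean steps can be sketched in a few lines of code. The example below uses the interval limits and frequency counts from the frequency table above; the function name `grouped_mean` is hypothetical.

```python
# (lower limit, upper limit, frequency) for each interval in the table
intervals = [(6, 9, 4), (9, 12, 9), (12, 15, 5), (15, 18, 2)]

def grouped_mean(intervals):
    """Approximate the mean from interval midpoints and frequencies."""
    total = 0.0   # running sum of midpoint * frequency
    n = 0         # running sample size
    for lower, upper, freq in intervals:
        midpoint = (lower + upper) / 2   # stands in for every x in the interval
        total += midpoint * freq
        n += freq
    return total / n

approx_mean = grouped_mean(intervals)
print(f"Approximate mean = {approx_mean:.2f} km/litre")
```

The result is an approximation because every value in an interval is treated as if it were equal to the interval midpoint.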
Arithmetic mean (advantages and disadvantages)

Advantages:

➢ It uses all the data values in its calculation

➢ It is an unbiased statistic (meaning that, on average, it represents the true mean)

Disadvantages:

➢ It is not appropriate for categorical (i.e. nominal or ordinal-scaled) data; it can only be applied to numeric (i.e. interval and ratio-scaled) data.

➢ It is distorted by outliers. An outlier is an extreme value in a data set.

Median for ungrouped data

➢ The middle number of an ordered set of data

➢ Divides an ordered set of data values into two equal halves

➢ 50% of the data values lie below the median and 50% lie above it

To calculate the median for ungrouped (raw) numeric data:

➢ Arrange the n data values in ascending order.

➢ Find the median by first identifying the middle position in the data set as follows:

If n is odd: median position = (n + 1)/2

If n is even: median positions = n/2 and (n + 2)/2 [the middle two items; the median is their average]

Median for ungrouped data - example
Outliers - example
Problem 4: P/Es for a Client Portfolio
Stock    Price    EPS     P/E
A        16.83    1.23    13.68
D        16.54    1.06    15.60
C        86.92    4.95    17.56
B        60.83    3.19    19.07
G        38.66    1.84    21.01
F        28.43    1.11    25.61
E        13.30    0.03    443.33

➢ Mean P/E = 79.41 (inflated by outlier E)

➢ n = 7 is odd, so the median position is (n + 1)/2 = 4, i.e. the 4th ordered value (stock B). For an even n, the median positions would be n/2 and (n + 2)/2 [middle 2 items].

➢ Median P/E = 19.07: a better indication of central location (not affected by the outlier)
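The effect of the outlier can be reproduced from the P/E column of this example. The sketch below uses Python's standard `statistics` module; the variable names are illustrative only.

```python
from statistics import mean, median

# P/E ratios from the client portfolio example
pe = {"A": 13.68, "D": 15.60, "C": 17.56, "B": 19.07,
      "G": 21.01, "F": 25.61, "E": 443.33}

values = list(pe.values())
mean_pe = mean(values)      # pulled upwards by stock E
median_pe = median(values)  # 4th of the 7 ordered values

# Mean of the remaining stocks if outlier E is set aside
mean_without_e = mean(v for stock, v in pe.items() if stock != "E")

print(f"Mean P/E        = {mean_pe:.2f}")
print(f"Median P/E      = {median_pe:.2f}")
print(f"Mean without E  = {mean_without_e:.2f}")
```

With stock E excluded, the mean falls close to the median, which illustrates why the median is the safer summary when an outlier is present.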
Median for grouped data
Graphical approach
Using the ‘less than’ ogive graph, the median value is found by reading off the
data value on the x-axis that is associated with the 50% cumulative frequency
located on the y-axis.
Arithmetic approach
Based on the sample size, n, calculate n/2 to find the median position.

Using the cumulative frequency counts of the ‘less than’ ogive summary
table, find the median interval (i.e. the interval that contains the median
position [the (n/2)th data value]).

The median value can be approximated using the midpoint of the median interval, or calculated by interpolating within the median interval to give a more representative median value.
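A common way to refine the estimate is linear interpolation within the median interval: median ≈ lower limit + (n/2 − cumulative frequency before the interval) × interval width / interval frequency. This formula is an assumption here, since it is not reproduced in the text. The sketch applies it to the truck fuel frequency table from the grouped-mean example, and `grouped_median` is a hypothetical function name.

```python
# (lower limit, upper limit, frequency) from the truck fuel frequency table
intervals = [(6, 9, 4), (9, 12, 9), (12, 15, 5), (15, 18, 2)]

def grouped_median(intervals):
    """Interpolate the median inside the interval that holds position n/2."""
    n = sum(freq for _, _, freq in intervals)
    position = n / 2          # median position
    cum_before = 0            # cumulative frequency below the current interval
    for lower, upper, freq in intervals:
        if cum_before + freq >= position:
            width = upper - lower
            # Assumed formula: linear interpolation within the median interval
            return lower + (position - cum_before) * width / freq
        cum_before += freq
    raise ValueError("empty frequency table")

median_estimate = grouped_median(intervals)
print(f"Approximate median = {median_estimate:.2f} km/litre")
```

Here n/2 = 10 falls in the interval 9 - < 12, whose midpoint (10.5) would be the cruder approximation.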
Median for grouped data - example
Courier Delivery Times Study: A courier company recorded 30 delivery times (in minutes) to deliver parcels to its clients from its depot. The data are summarised in the numeric frequency and cumulative frequency distributions shown in Table 3.3.
Median (advantages and disadvantages)

Advantage over the mean

➢ It is not affected by outliers → a more representative measure of central location than the mean when significant outliers occur in a set of data.

Disadvantages

➢ It cannot be calculated for categorical data; it can only be applied to numeric data.

➢ It is more affected by sampling fluctuations than the mean, as it uses only the middle data values (and not all the data values), and is therefore less stable than the mean.
Mode
➢ the most frequently occurring value in a set of data
➢ can be calculated both for categorical data and numeric data

To calculate the mode:

Ungrouped data
➢ rank the data from lowest to highest
➢ identify the data value that occurs most frequently.
Large samples of discrete or categorical (nominal and ordinal-scaled) data:
➢ construct a categorical frequency table
➢ identify the modal value or modal category that occurs most frequently.
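Both procedures amount to building a frequency table and picking the most frequent entry. The sketch below does this for the 20 truck fuel values from the grouped-mean example, and for a small categorical list that is purely hypothetical and included only to show that the mode also works for categories.

```python
from collections import Counter
from statistics import multimode

# Fuel consumption (km/l) of the 20 trucks in the grouped-mean example
fuel = [13, 11, 10, 13, 10, 13, 8, 10, 10, 13,
        11, 8, 16, 16, 11, 9, 11, 13, 7, 12]

fuel_counts = Counter(fuel)      # frequency table: value -> count
fuel_modes = multimode(fuel)     # list of the most frequent value(s)

# Hypothetical categorical responses (not from the source data)
brands = ["A", "B", "A", "C", "A", "B"]
brand_modes = multimode(brands)

print("Fuel frequency table:", sorted(fuel_counts.items()))
print("Modal fuel value(s):", fuel_modes)
print("Modal brand(s):", brand_modes)
```

`multimode` returns a list, so a data set with more than one peak is reported with all of its modes rather than an arbitrary one.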
Mode – example
Refer to previous example – Courier Delivery Times
Mode (advantages and disadvantages)

Advantages
➢ Valid measure of central location for all data types (i.e. categorical and numeric)
➢ For categorical data → the mode defines the most frequently occurring category
➢ For numeric data → the mode is the most frequently occurring data value
(ungrouped) / the midpoint value of a modal interval (grouped)
➢ Not influenced by outliers → represents the most frequently occurring data value
(or response category).

Disadvantages
➢ Representative measure of central location only if the histogram of the numeric
random variable is unimodal (i.e. has one peak only)
Which Central Location Measure is Best?
Depends on:
Data Type
➢ For categorical (nominal or ordinal-scaled) data → the mode is the only valid and representative measure
➢ For numeric (interval or ratio-scaled) data → all three measures (mean, median and mode) are valid and representative

Outliers
➢ Outliers distort the mean but do not affect the median or the mode.
➢ If outliers are detected in a set of data, choose the median (or mode); the median is preferred to the mode as it can be used in further analysis.

However, if there are good reasons to remove the outlier(s) from the data set then
the mean can again be used as the best central location measure.
Other Measures of Central Location

Geometric mean
➢ used to find the average of percentage change data, such as
indexes, growth rates or rates of change.

When each data value is calculated from a different base, the appropriate measure of central location is the geometric mean.
Example: Share price at end of
Week 1 = R25
Week 2 = R30
Week 3 = R33
% change week 1 to 2 = 20% [(MVend/MVbegin − 1) × 100]
% change week 2 to 3 = 10%
Geometric Mean (example)

The percentage changes must be expressed as decimal values. For example, a 7% increase must be written as 1.07 (1 + 0.07) and a 4% decrease must be written as 0.96 (1 + (−0.04)).
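For the share price example (R25, R30, R33), the geometric mean is the n-th root of the product of the n growth factors. The sketch below is a minimal illustration of that standard definition; the variable names are assumptions.

```python
import math

# Share prices at the end of weeks 1, 2 and 3 (rand)
prices = [25, 30, 33]

# Growth factor for each week-on-week change: 1.20 and 1.10
growth_factors = [later / earlier for earlier, later in zip(prices, prices[1:])]

# Geometric mean = n-th root of the product of the n growth factors
n = len(growth_factors)
geometric_mean = math.prod(growth_factors) ** (1 / n)
average_pct_change = (geometric_mean - 1) * 100

print(f"Growth factors: {[round(g, 2) for g in growth_factors]}")
print(f"Average weekly change = {average_pct_change:.2f}%")
```

Compounding the starting price of R25 by this average rate for two weeks returns the final price of R33, which the arithmetic mean of 20% and 10% (15%) would overshoot.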
Other Measures of Central Location (continued)

Weighted Mean
➢ Different weights are given to each data value to arrive at an average value
➢ Use when the importance (weight) of each data value is different

To calculate the weighted arithmetic mean:

➢ Each observation, xi, is first multiplied by its weight (frequency count), fi
➢ The weighted observations are then summed
➢ The sum is then divided by the sum of the weights

Formula
Weighted mean = Σ(fi × xi) / Σfi
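The three steps translate directly into code. The values and weights below are hypothetical and chosen only to make the arithmetic easy to follow; `weighted_mean` is an illustrative function name.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    weighted_sum = sum(x * f for x, f in zip(values, weights))
    return weighted_sum / sum(weights)

# Hypothetical data: three values with weights 1, 2 and 7
values = [10, 20, 30]
weights = [1, 2, 7]

result = weighted_mean(values, weights)   # (10 + 40 + 210) / 10
print(f"Weighted mean = {result:.1f}")
```

The unweighted mean of the same three values is 20, so the heavier weight on the largest value pulls the weighted mean upwards.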
Weighted Mean (example)
Non-central Location Measures

Quartiles are non-central measures that divide an ordered data set into quarters
(i.e. four equal parts).

The lower quartile, Q1, is that data value that separates the lower (bottom) 25% of
(ordered) data values from the top 75% of ordered data values.

The middle quartile, Q2, is the median. It divides an ordered data set into two
equal halves.

The upper quartile, Q3, is that data value that separates the top (upper) 25% of
(ordered) data values from the bottom 75% of ordered data values.

Quartiles
➢ Calculated in a similar way to the median
➢ Difference lies in the identification of the quartile position & the choice of the quartile
interval.

Steps to calculate quartiles (lower, middle and upper) for ungrouped (raw) data:
➢ Sort the data in ascending order

➢ Each quartile position is determined as follows (regardless of whether n is even or odd):

Q1 position = 0.25(n + 1)
Q2 position = 0.50(n + 1)
Q3 position = 0.75(n + 1)

➢ Count to the quartile position (rounded down to the nearest integer) to find the (approximate) quartile value.

➢ If the quartile position has a fractional part, adjust the approximate value as follows:

Quartile value = approximate quartile value + fraction part of quartile position × (consecutive value after quartile position − approximate quartile value)
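The steps can be sketched as a small function. It assumes quartile positions of 0.25(n + 1), 0.50(n + 1) and 0.75(n + 1), in line with the 0.40(n + 1) position given for the 40th percentile, and applies them to the 16 equity returns from the start of the chapter. The function name `quartile` is hypothetical.

```python
def quartile(sorted_values, fraction):
    """Value at position fraction * (n + 1), interpolating between neighbours."""
    n = len(sorted_values)
    position = fraction * (n + 1)   # 1-based position in the ordered data
    k = int(position)               # position rounded down
    remainder = position - k        # fractional part of the position
    if k < 1:                       # position falls before the first value
        return sorted_values[0]
    if k >= n:                      # position falls at or after the last value
        return sorted_values[-1]
    approx = sorted_values[k - 1]   # approximate quartile value
    following = sorted_values[k]    # consecutive value after the position
    return approx + remainder * (following - approx)

# Real equity returns (%) of the 16 markets, sorted in ascending order
returns = sorted([9.0, 4.8, 7.7, 6.2, 6.3, 8.8, 7.0, 6.8,
                  9.3, 7.7, 9.1, 5.8, 9.9, 6.9, 7.6, 8.7])

q1 = quartile(returns, 0.25)   # position 4.25
q2 = quartile(returns, 0.50)   # position 8.5 (the median)
q3 = quartile(returns, 0.75)   # position 12.75
print(f"Q1 = {q1:.3f}%, Q2 = {q2:.3f}%, Q3 = {q3:.3f}%")
```

For Q1 the position is 4.25, so the result is the 4th ordered value (6.3) plus 0.25 of the gap to the 5th value (6.8).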
Quartiles (Example)
Quartiles for grouped data

➢ Use a formula similar to the median formula to find both the lower and upper quartiles

➢ Modify the formula to identify either the lower or the upper quartile position

➢ Then find the lower or upper quartile interval

Quartiles (Example - grouped data)
Percentiles
Similar to quartiles
lower quartile = 25th percentile
upper quartile = 75th percentile

Percentiles are calculated in the same way as quartiles

➢ First find the percentile position
➢ Then identify the percentile value in that position.

Example: to find the 40th percentile

➢ the percentile position is 0.40(n + 1)

Once the percentile position is found, apply the same rules as for quartiles to
find the appropriate percentile value.
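As an illustration, the 40th percentile of the 20 truck fuel values from the grouped-mean example can be found with the position rule 0.40(n + 1). The interpolation step mirrors the quartile rule, and the function name `percentile` is hypothetical.

```python
def percentile(values, p):
    """p-th percentile using position (p / 100) * (n + 1) with interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    position = (p / 100) * (n + 1)  # 1-based percentile position
    k = int(position)               # position rounded down
    remainder = position - k        # fractional part of the position
    if k < 1:                       # position falls before the first value
        return ordered[0]
    if k >= n:                      # position falls at or after the last value
        return ordered[-1]
    # Move the fractional part of the way from the k-th to the (k+1)-th value
    return ordered[k - 1] + remainder * (ordered[k] - ordered[k - 1])

# Fuel consumption (km/l) of the 20 trucks
fuel = [13, 11, 10, 13, 10, 13, 8, 10, 10, 13,
        11, 8, 16, 16, 11, 9, 11, 13, 7, 12]

p40 = percentile(fuel, 40)   # position 0.40 * 21 = 8.4
p25 = percentile(fuel, 25)   # same as the lower quartile
print(f"40th percentile = {p40:.1f} km/l, 25th percentile = {p25:.1f} km/l")
```

Position 8.4 lies between the 8th ordered value (10) and the 9th (11), so the 40th percentile is 10 plus 0.4 of the gap between them.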
