Descriptive Statistics

Statistics

Uploaded by

HAPPY PHIRI

Descriptive Statistics

Mrs. M. Mushabati
Objectives
By the end of this lecture the student should
understand:
•measures of central tendency (mean, median, mode) and how to calculate them.
•quartiles, i.e. Q1, Q2 and Q3.
•measures of variation (range, inter-quartile range, variance and standard deviation) and how to calculate them.
Introduction
In statistics we usually deal with large volumes of data that need to be organized, summarized and described.
We need some form of summary to:
• permit us to deal with data in a manageable form
• be able to share our findings with others in scientific talks and publications.
• A histogram or bar diagram of the frequency distribution is one type of summary.
• However, for most purposes, a numerical summary is needed to describe concisely the properties of the observed frequency distribution.
• Quantities providing such a summary are called descriptive statistics.
Introduction
• Frequency distributions from continuous data are described using measures of central tendency and measures of dispersion.
• The measures of central tendency locate observations on a measurement scale.
• The measures of dispersion suggest how widely the observations are spread out.
Measures of Central Tendency
• Measures of central tendency are single numeric values that describe the centre of the distribution of a numeric variable.
• The main numerical measures of central tendency used in statistics are the mean and the median.
• The other measure of central tendency used is the mode.
◦ It is not often used in statistical computations.
The Mean
• The mean is the average value, or the sum (∑) of all the observed values (xi) divided by the total number of observations (N):
$\bar{x} = \frac{\sum x_i}{N}$
The Mean
• A researcher wants to know the mean number of decayed teeth among 10 children aged 5 years. The following are the numbers of decayed teeth in the children:
• 2, 7, 3, 11, 4, 6, 5, 9, 0, 1
• The mean number of decayed teeth is:
(2+7+3+11+4+6+5+9+0+1) / 10
= 48/10
= 4.8 teeth.
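The calculation above can be checked with a short Python sketch. The variable names are illustrative only; `statistics.mean` is part of the Python standard library.

```python
from statistics import mean

# Number of decayed teeth in the 10 children from the example.
decayed_teeth = [2, 7, 3, 11, 4, 6, 5, 9, 0, 1]

# Mean = sum of the observed values divided by the number of observations.
manual_mean = sum(decayed_teeth) / len(decayed_teeth)

# The standard library function should agree with the manual calculation.
library_mean = mean(decayed_teeth)

print(manual_mean)   # 4.8
print(library_mean)  # 4.8
```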
The Mean
• The international domestic workers association wishes to know the average minimum monthly wage of the domestic workers in Zambia. Assume that the workers reported the following:
K200, K230, K300, K320, K350
• Their mean would be:
(200+230+300+320+350) / 5
= 1400/5
= K280
The mean is affected by extreme values
• Considering example 2, if one worker had a considerably higher salary than the others, the mean would be different:
K200, K230, K300, K320, K1500
• The mean would then be (200+230+300+320+1500) / 5 = K510, which is not representative of the set of data as a whole.
Properties of the mean
1. It is unique: for a given set of data, there is only
one arithmetic mean
2. It is easy to compute and understand
3. All values in a set of data contribute to the computation of the mean.
• The mean is affected by extreme values.
The median
• The median is the midpoint of the distribution.
• It is the value that divides an ordered data set into two equal parts such that half of the observations fall above, and half fall below.
• To calculate the median:
◦ Order the data from smallest to largest
◦ Consider whether the number of observations (n) is even or odd
The Median
• If n is even, the median is the mean of the two centre observations in the ordered list.
• These are the values sitting in positions n/2 and (n/2)+1.
• Considering example 1 regarding tooth decay in children, n = 10:
2, 7, 3, 11, 4, 6, 5, 9, 0, 1
◦ Order the observations in the data set:
0, 1, 2, 3, 4, 5, 6, 7, 9, 11
◦ The centre observations are in positions 5 and 6 (the values 4 and 5).
Median = (4+5)/2
= 4.5
The Median
• If n is odd, the median is the centre observation in the ordered list.
• It is the value that sits in position (n+1)/2.
• Considering the minimum wage of domestic workers in Zambia with an outlier, n = 5:
K200, K230, K300, K320, K1500
• The median wage is the value in position (5+1)/2 = 3 of the ordered list:
Median = K300
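The two position rules can be sketched in Python as follows. This is a minimal illustration, not a library routine: the function name `median_by_position` is hypothetical, and it uses the fact that for even n the two centre observations sit in 1-based positions n/2 and n/2 + 1.

```python
def median_by_position(values):
    """Median using the position rules for even and odd n."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 0:
        # Slide positions are 1-based; Python indices are 0-based.
        lower = ordered[n // 2 - 1]   # position n/2
        upper = ordered[n // 2]       # position n/2 + 1
        return (lower + upper) / 2
    return ordered[(n + 1) // 2 - 1]  # position (n + 1)/2

teeth = [2, 7, 3, 11, 4, 6, 5, 9, 0, 1]   # n = 10 (even)
wages = [200, 230, 300, 320, 1500]        # n = 5 (odd), with an outlier

print(median_by_position(teeth))  # 4.5
print(median_by_position(wages))  # 300
print(sum(wages) / len(wages))    # 510.0
```

For the wage data the outlier pulls the mean up to K510 while the median stays at K300.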
Properties of the median
• It is unique: there is only one median for a set of data
• It is easy to compute and understand
• It is not as drastically affected by extreme values as is the mean
◦ It gives a more representative measure in a data set that has a skewed distribution
The Mode
• This is the value in the data set which occurs most frequently
◦ The mode is the value that is most frequently occurring, not the frequency itself
• Mostly used with qualitative data
The Mode
• The following are the ages (in years) of patients visiting a hospital with chemical poisoning:
13, 21, 15, 16, 13, 19, 14, 16, 16, 13
• The most frequently occurring ages are 16 and 13, each with a frequency of three.
◦ The data has 2 modal values
• A small sample consisting of the observations
13, 21, 15, 16, 19
does not have a modal value, since all values are different.
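A minimal Python sketch of finding the modal values by counting frequencies is given below. The helper name `modal_values` is hypothetical, and returning an empty list when every value occurs only once follows the convention used in the slides (no repeated value means no mode).

```python
from collections import Counter

def modal_values(values):
    """Return all values sharing the highest frequency, or [] if none repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

ages = [13, 21, 15, 16, 13, 19, 14, 16, 16, 13]

print(modal_values(ages))                  # [13, 16]
print(modal_values([13, 21, 15, 16, 19]))  # []
```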
Properties of the Mode
• The mode is easy to calculate and understand
• Its computation is not based on all values, as is the case for the mean
◦ It is not affected by extremely large or small values
• It may not be well defined if the data consist of a small number of values
◦ It is possible that there can be more than one modal value
◦ Sometimes the data may not have a mode at all
• It is not capable of further mathematical treatment
Measures of dispersion
• Measures of variation give information on the spread or variability of the data values.
• This can be done by calculating measures based on percentiles or measures based on the mean.
[Figure: two distributions. The centre is the same but the variation is different.]
Quartiles
• Percentiles are sometimes called quantiles.
• A percentile is the value below which the indicated percentage of observations falls when all the observations are ranked in ascending order.
• After data are ranked from lowest to highest, the data can be divided into quarters (quartiles),
• each subset containing the same number of observations.
Quartiles
These are values which divide a series of observations, arranged in ascending order, into 4 equal parts. (Thus the 2nd quartile is the median.)
• The data is ranked and then split into 4 segments with an equal number of values per segment.
• The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger.
• Q2 is the same as the median (50% are smaller, 50% are larger).
• Lastly, only 25% of the observations are greater than the third quartile, Q3.
Calculating quartiles
Find a quartile by determining the value in the appropriate position in the ranked data, where:
• First quartile position: Q1 = (n+1)/4
• Second quartile position: Q2 = (n+1)/2 (the median position)
• Third quartile position: Q3 = 3(n+1)/4
• where n is the number of observed values
Example 1
Find the first, second and third quartiles for the data:
11 16 12 16 17 22 18 13 21
Sample data in ordered array: 11 12 13 16 16 17 18 21 22 (n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data, so use the value halfway between the 2nd and 3rd values (12 and 13).
Therefore, Q1 = 12.5
• Q1 and Q3 are measures of non-central location
• Q2 = median, a measure of central location
Example cont.
• Dataset: 11 12 13 16 16 17 18 21 22
• Q2 is in the (9+1)/2 = 5th position of the ranked data.
Therefore, Q2 = median = 16
• Q3 is in the 3(9+1)/4 = 7.5 position of the ranked data, halfway between the 7th and 8th values (18 and 21).
Therefore, Q3 = 19.5
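The position formulas (n+1)/4, (n+1)/2 and 3(n+1)/4 can be sketched in Python as follows. The helper names are hypothetical. The slides only describe taking the value halfway between two neighbours for a .5 position; using linear interpolation for other fractional positions is an assumption made here, and the sketch assumes n is at least 3 so that every position falls inside the data.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in ranked data."""
    whole = int(position)
    fraction = position - whole
    lower = ordered[whole - 1]        # convert 1-based position to 0-based index
    if fraction == 0:
        return lower
    upper = ordered[whole]
    # Assumption: interpolate linearly between the two neighbouring values.
    return lower + fraction * (upper - lower)

def quartiles(values):
    """Return (Q1, Q2, Q3) using the positions k(n + 1)/4 for k = 1, 2, 3."""
    ordered = sorted(values)
    n = len(ordered)
    return tuple(value_at_position(ordered, k * (n + 1) / 4) for k in (1, 2, 3))

data = [11, 16, 12, 16, 17, 22, 18, 13, 21]

print(quartiles(data))  # (12.5, 16, 19.5)
```

Other quartile conventions exist, so statistical software may report slightly different values for the same data depending on the method it uses.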
Inter-quartile range
Formula: IQR = Q3 − Q1
• STEP 1
• Find the median (Q2).
• STEP 2
• Divide the dataset into two sub-datasets: the 50% of the values below the median (Q2) and the 50% of the values above the median (Q2).
• The median of the values below the median (Q2) is Q1.
• The median of the values above the median (Q2) is Q3.
Inter-quartile range
• Interquartile range = 77 − 64 = 13, or it can be expressed simply as (IQR 64, 77)
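The two-step method (median of each half) can be sketched in Python. The function name `iqr_by_halves` is hypothetical, the data set is the nine-value quartile example reused for illustration, and leaving the median itself out of both halves when n is odd is an assumption, since the slides do not state how to treat it.

```python
from statistics import median

def iqr_by_halves(values):
    """Return (Q1, Q3, IQR) using the median of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]          # values below the median
    # Assumption: when n is odd, the median itself is left out of both halves.
    upper_half = ordered[half + n % 2:]  # values above the median
    q1 = median(lower_half)
    q3 = median(upper_half)
    return q1, q3, q3 - q1

data = [11, 16, 12, 16, 17, 22, 18, 13, 21]

print(iqr_by_halves(data))  # (12.5, 19.5, 7.0)
```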
The Range
• The range is the simplest measure of variation: it is the difference between the greatest and least data values.
• Range = Xlargest − Xsmallest
• However, it is more informative to provide the minimum and the maximum values rather than providing only the range.
• Example: The scores of 6 students in a test are: 50, 70, 64, 94, 78, 88. Find the range of values.
• Answer: 94 − 50 = 44. The range can also be expressed as 50 to 94.
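The range example can be reproduced with Python's built-in `min` and `max`; the variable names are illustrative.

```python
scores = [50, 70, 64, 94, 78, 88]

smallest = min(scores)
largest = max(scores)
value_range = largest - smallest  # Range = Xlargest - Xsmallest

print(smallest, largest, value_range)  # 50 94 44
```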
Standard Deviation
• Standard deviation (SD) is the most commonly used measure of dispersion.
• It is a measure of the spread of the data about the mean.
• SD is the square root of the sum of squared deviations from the mean divided by the number of observations.
• It has the same units as the original data.

Standard Deviation (population)
• Steps in calculating the standard deviation
STEP 1
• Calculate the mean as a measure of central location (MEAN)
STEP 2
• Calculate the difference between each observation and the mean (x − MEAN)
STEP 3
• Next, square the differences; these are known as the SQUARED DEVIATIONS (x − MEAN)²
What is the effect of this?
• Negative and positive deviations will not cancel each other out.
Standard Deviation (population)
STEP 4
• Sum up these squared deviations (SUM OF THE SQUARED DEVIATIONS)
• Σ (x − MEAN)²
STEP 5
• Divide this SUM OF THE SQUARED DEVIATIONS by the total number of observations (N) to give the VARIANCE
• This is a measure of the variability of the data
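Variance (population)
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$, where $\mu$ is the population mean (MEAN) and N is the total number of observations.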
Standard Deviation (population)
• STEP 6
• Take the square root of the variance to obtain the standard deviation for the population:
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
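Steps 1 to 6 can be followed line by line in a short Python sketch. The function name is hypothetical, and the tooth-decay data from the mean example are reused purely as an illustration, treating the 10 children as the whole population.

```python
from math import sqrt

def population_variance_and_sd(values):
    """Follow STEP 1 to STEP 6 for a whole population."""
    n = len(values)
    mean = sum(values) / n                       # STEP 1: the mean
    deviations = [x - mean for x in values]      # STEP 2: x - MEAN
    squared = [d ** 2 for d in deviations]       # STEP 3: squared deviations
    sum_of_squares = sum(squared)                # STEP 4: sum of squared deviations
    variance = sum_of_squares / n                # STEP 5: divide by N
    return variance, sqrt(variance)              # STEP 6: square root

teeth = [2, 7, 3, 11, 4, 6, 5, 9, 0, 1]
variance, sd = population_variance_and_sd(teeth)

print(round(variance, 2), round(sd, 2))  # 11.16 3.34
```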
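Variance (sample)
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the sample mean and n is the number of observations in the sample.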
Standard deviation (sample)
The sample standard deviation is obtained by taking the square root of the sample variance.
Therefore, the sample standard deviation is given by:
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
• Why divide by n−1?
• This is an adjustment for the fact that the sample mean is just an estimate of the true population mean. Dividing by n−1 rather than n makes the variance slightly bigger.
Calculating the variance and standard deviation
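As a hedged illustration (not the slide's original worked example, which is not preserved here), the sketch below treats the tooth-decay data as a sample and compares dividing by n − 1 with dividing by n. The functions `variance`, `stdev` and `pvariance` come from Python's standard `statistics` module.

```python
from math import sqrt
from statistics import pvariance, stdev, variance

teeth = [2, 7, 3, 11, 4, 6, 5, 9, 0, 1]
n = len(teeth)

sample_mean = sum(teeth) / n
sum_of_squares = sum((x - sample_mean) ** 2 for x in teeth)

sample_variance = sum_of_squares / (n - 1)  # sample formula: divide by n - 1
sample_sd = sqrt(sample_variance)

print(round(sample_variance, 2), round(sample_sd, 2))     # 12.4 3.52
print(round(variance(teeth), 2), round(stdev(teeth), 2))  # 12.4 3.52
print(round(pvariance(teeth), 2))                         # 11.16 (divides by n)
```

The sample variance (12.4) is larger than the population variance (11.16) computed from the same values, which is the effect of dividing by n − 1.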
Features of the standard deviation
• It is usually positive and NEVER negative
• It is 0 only when all data values are the same number
• The larger the value of the SD, the more the data vary
• It can increase dramatically with the inclusion of outliers
• The units (minutes, feet, etc.) are the same as the units of the original values
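Two of these features can be checked with a brief Python sketch using `statistics.pstdev` (the population standard deviation). The wage lists are the ones from the mean examples, and the list of identical values is made up for illustration.

```python
from statistics import pstdev

identical = [5, 5, 5, 5]                      # made-up data: all values the same
wages = [200, 230, 300, 320, 350]
wages_with_outlier = [200, 230, 300, 320, 1500]

sd_identical = pstdev(identical)
sd_wages = pstdev(wages)
sd_outlier = pstdev(wages_with_outlier)

print(sd_identical)           # 0.0
print(sd_wages, sd_outlier)   # the second value is far larger than the first
```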
Lecture Summary
We have:
•Described measures of central tendency: mean, median, mode.
•Described quartiles Q1, Q2 and Q3, and measures of variation: range, inter-quartile range, variance and standard deviation.
Any questions?
