Lecture 2 Core Statistics 101 Mean Median Mode Distribution

This document provides an introduction to core statistics, focusing on descriptive statistics, types of data, and measures of central tendency such as mean, median, and mode. It explains the importance of summarizing large datasets and introduces concepts like sampling, range, and distribution shape. Additionally, it covers how to create histograms and box-and-whisker plots for data visualization.

Day 2: Core statistics 101

UDM MSc Course in Education & Development
2013

NicholasSpaull@gmail.com | www.nicspaull.com/teaching
Introduction

What are statistics?
• “the practice or science of collecting and analysing numerical data in large quantities”

Why do we need descriptive statistics?
• When we look at large amounts of data, there is very little “face value” information. If you had a dataset listing the income of 10,000 people and someone asked you if the income of the group was high or low, it would be difficult to answer that question without using summary statistics (mean, median, mode etc.).
Types of Data

Data
• Categorical (defined categories)
  Examples: Marital Status, Political Party, Eye Color
• Numerical
  • Discrete (counted items)
    Examples: Number of Children, Defects per hour
  • Continuous (measured characteristics)
    Examples: Weight, Voltage
Collecting Data

• Primary Sources (Data Collection)
  • Observation
  • Survey
  • Experimentation
• Secondary Sources (Data Compilation)
  • Print or Electronic
Sampling

What is a sample?
• A sample is “a small part or quantity intended to show what the whole is like”

Why do we use samples rather than the population?
Descriptive Statistics

Collect data
• e.g., Survey
Present data
• e.g., Tables and graphs
Characterize data
• e.g., Sample mean = $\frac{\sum X_i}{n}$
Measures of Central Tendency

Central Tendency
• Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
• Median: midpoint of ranked values
• Mode: most frequently observed value
Mean

• The most common measure of central tendency
• Mean = sum of values divided by the number of values
• Affected by extreme values (outliers)

[Figure: two dot plots on a 0 to 10 axis. Left: Mean = 3. Right: Mean = 4.]

$\frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$

$\frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4$
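The pull of a single extreme value on the mean can be checked with a short Python sketch. The two lists are the values from the slide; the helper name `mean` is our own and is only for illustration.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)


without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]   # same data, but the 5 is replaced by 10

print(mean(without_outlier))  # 3.0
print(mean(with_outlier))     # 4.0
```

Changing one value out of five moves the mean by a full unit, which is what "affected by extreme values" means in practice.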
Median

• In an ordered array, the median is the “middle” number (50% above, 50% below)

[Figure: two dot plots on a 0 to 10 axis. Left: Median = 3. Right: Median = 3.]

• Not affected by extreme values
Finding the Median

• The location of the median:

$\text{Median position} = \frac{n + 1}{2}$ position in the ordered data

• If the number of values is odd, the median is the middle number
• If the number of values is even, the median is the average of the two middle numbers
• Note that $\frac{n + 1}{2}$ is not the value of the median, only the position of the median in the ranked data
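The position rule can be sketched in Python as follows. The function name `median_by_position` and the example lists are our own; in practice the standard library's `statistics.median` gives the same result.

```python
def median_by_position(values):
    """Find the median using the (n + 1) / 2 position rule on ranked data."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")
    ranked = sorted(values)
    n = len(ranked)
    position = (n + 1) / 2            # 1-based position in the ranked data
    if n % 2 == 1:
        # Odd n: the position is a whole number, so take that value directly.
        return ranked[int(position) - 1]
    # Even n: the position falls halfway between two values, so average them.
    lower = ranked[int(position) - 1]
    upper = ranked[int(position)]
    return (lower + upper) / 2


print(median_by_position([1, 2, 3, 4, 10]))  # 3   (position 3 of 5)
print(median_by_position([4, 1, 3, 2]))      # 2.5 (position 2.5 of 4)
```

Note that the function sorts the data first: the position only makes sense in the ranked list, not in the order the data were collected.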
Mode

• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical (nominal) data
• There may be no mode
• There may be several modes

[Figure: two dot plots. Left, on a 0 to 14 axis: Mode = 9. Right, on a 0 to 6 axis: No Mode.]
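A minimal Python sketch of the "no mode" and "several modes" cases is given below. The function name `modes` and the example data are hypothetical, and treating a dataset in which no value repeats as having no mode is an assumption that matches the slide's usage.

```python
from collections import Counter


def modes(values):
    """Return every most-frequent value, or [] if no value occurs more than once."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []                      # no value repeats: no mode
    return [value for value, count in counts.items() if count == highest]


print(modes([1, 2, 2, 3, 3]))              # [2, 3]  (several modes)
print(modes([0, 1, 2, 3]))                 # []      (no mode)
print(modes(["blue", "brown", "brown"]))   # ['brown'] (categorical data)
```

Because the function only counts occurrences, it works for categorical data as well as numerical data.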
Review Example

• Five houses on a hill by the beach

[Figure: five houses labelled $2,000 K, $500 K, $300 K, $100 K and $100 K.]

House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Review Example: Summary Statistics

House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Sum $3,000,000

• Mean: ($3,000,000/5) = $600,000
• Median: middle value of ranked data = $300,000
• Mode: most frequent value = $100,000
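The three summary statistics for the house prices can be reproduced with Python's standard `statistics` module. The variable names are our own, and the second half of the sketch (dropping the most expensive house) is an added illustration rather than part of the slide.

```python
import statistics

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(house_prices)      # 3,000,000 / 5 = 600,000
median_price = statistics.median(house_prices)  # middle of the ranked data = 300,000
mode_price = statistics.mode(house_prices)      # most frequent value = 100,000

# Illustration: remove the $2,000,000 house and recompute.
without_top = sorted(house_prices)[:-1]
mean_without_top = statistics.mean(without_top)      # 1,000,000 / 4 = 250,000
median_without_top = statistics.median(without_top)  # (100,000 + 300,000) / 2 = 200,000

print(mean_price, median_price, mode_price)
print(mean_without_top, median_without_top)
```

Removing one house moves the mean from 600,000 to 250,000 but the median only from 300,000 to 200,000, which is why the median is often preferred when outliers are present.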
Mean, median, mode and range

• Mean = the average value
• Median = the middle value in an ordered list of data
• Mode = the most common value
• Range = difference between highest and lowest value

Example: If we measured the heights of a class and we found:

In cm: 160, 160, 162, 163, 164, 164, 165, 165, 165, 180, 190
Mean = (160+160+162+163+164+164+165+165+165+180+190)/11 = 1838/11 ≈ 167
Median = 164 (the 6th of the 11 ordered values)
Mode = 165 (occurs three times)
Range = 190 - 160 = 30

If you are still confused about how to calculate the mean, median and mode,
watch this 4min video on YouTube: http://www.youtube.com/watch?v=k3aKKasOmIw
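As a cross-check of the class-height example, the sketch below recomputes all four measures in Python. It uses the eleven heights that appear in the mean calculation; the variable names are our own.

```python
import statistics

heights_cm = [160, 160, 162, 163, 164, 164, 165, 165, 165, 180, 190]

mean_height = sum(heights_cm) / len(heights_cm)    # 1838 / 11
median_height = statistics.median(heights_cm)      # 6th of the 11 ranked values
mode_height = statistics.mode(heights_cm)          # 165 occurs three times
height_range = max(heights_cm) - min(heights_cm)   # 190 - 160

print(round(mean_height, 1), median_height, mode_height, height_range)
# 167.1 164 165 30
```

The two tall students (180 and 190 cm) pull the mean above both the median and the mode.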
Which measure of location is the “best”?

• Mean is generally used, unless extreme values (outliers) exist
• Then median is often used, since the median is not sensitive to extreme values.
  • Example: Median home prices may be reported for a region because they are less sensitive to outliers
Range

• Simplest measure of variation
• Difference between the largest and the smallest values in a set of data:

Range = $X_{\text{largest}} - X_{\text{smallest}}$

Example:

[Figure: dot plot on a 0 to 14 axis.]

Range = 14 - 1 = 13
Disadvantages of the Range

• Ignores the way in which data are distributed

[Figure: two dot plots on a 7 to 12 axis with differently distributed data but the same range.]

Range = 12 - 7 = 5
Range = 12 - 7 = 5

• Sensitive to outliers

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5

Range = 5 - 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120

Range = 120 - 1 = 119
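The sensitivity of the range to a single outlier can be sketched in Python using the two datasets from the slide. The helper name `value_range` is our own, and the comparison with the median is an added illustration.

```python
import statistics


def value_range(values):
    """Return the difference between the largest and smallest values."""
    return max(values) - min(values)


# Shared part of both datasets: eleven 1s, eight 2s, four 3s and one 4.
common = [1] * 11 + [2] * 8 + [3] * 4 + [4]
ends_with_5 = common + [5]
ends_with_120 = common + [120]

print(value_range(ends_with_5))     # 4
print(value_range(ends_with_120))   # 119

# The median is the 13th of the 25 ranked values in both cases.
print(statistics.median(ends_with_5), statistics.median(ends_with_120))  # 2 2
```

Only one of the 25 values differs between the two datasets, yet the range jumps from 4 to 119 while the median stays at 2.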
Getting from the real world to a distribution

• When we collect data from the ‘real world’ we need to then represent it in numerically and graphically useful ways. This is where graphical analysis and numerical statistical analysis are helpful.

• Say we went into one classroom and observed 22 students with the following reading and mathematics scores.

• To help understand the distribution of performance in this class we will calculate the mean, median and mode and also create a histogram of the data. (Do UDM Tut1)
• UDM Tutorial 1: Mean, median, mode

student_id  reading_score  math_score
1           508            483
2           437            454
3           378            454
4           355            469
5           388            353
6           378            439
7           399            439
8           437            454
9           447            469
10          355            454
11          399            424
12          490            483
13          437            469
14          419            353
15          516            535
16          456            439
17          525            522
18          447            353
19          437            454
20          456            454
21          456            424
22          551            454
Mean Median Mode
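The tutorial is done in Excel, but the mean, median and mode of the 22 reading and mathematics scores can also be cross-checked with a short Python sketch. The scores are copied from the slide in student_id order; the variable and function names are our own.

```python
import statistics

reading_scores = [508, 437, 378, 355, 388, 378, 399, 437, 447, 355, 399,
                  490, 437, 419, 516, 456, 525, 447, 437, 456, 456, 551]
math_scores = [483, 454, 454, 469, 353, 439, 439, 454, 469, 454, 424,
               483, 469, 353, 535, 439, 522, 353, 454, 454, 424, 454]


def summarise(scores):
    """Return (mean rounded to 2 decimals, median, mode) for a list of scores."""
    mean = round(sum(scores) / len(scores), 2)
    return mean, statistics.median(scores), statistics.mode(scores)


reading_summary = summarise(reading_scores)
math_summary = summarise(math_scores)

print("reading:", reading_summary)
print("math:   ", math_summary)
```

For reading this gives a mean of about 439.59 with median and mode both 437; for mathematics a mean of about 446.91 with median and mode both 454.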
Create a histogram

• To create a histogram:
  • Ensure that your analysis module in Excel is enabled
  • File > Options > Add-Ins > Analysis ToolPak (click Analysis ToolPak and click “Go” at the bottom)
  • Under the “Data” tab in Excel you should now have a button which says “Data Analysis” on the far right
  • Click “Data Analysis” > Click “Histogram” > Highlight the reading marks for input range > Highlight the Bin ranges for bin range > Click OK
  • Relabel the Bin ranges 0-299, 300-399, 400-449 and so on. Insert graph.

If you are still confused about how to create a histogram in Excel
watch this 4min video on YouTube: http://www.youtube.com/watch?v=RyxPp22x9PU
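The binning step behind a histogram can also be sketched in Python. This is not the Excel procedure itself. Only the first three bins (0-299, 300-399, 400-449) come from the slide; the remaining bin boundaries are an assumption made for this sketch, as is the final "More" bin for anything above the last boundary.

```python
from bisect import bisect_left

reading_scores = [508, 437, 378, 355, 388, 378, 399, 437, 447, 355, 399,
                  490, 437, 419, 516, 456, 525, 447, 437, 456, 456, 551]

# Upper bound of each bin. The first three follow the slide; the rest are assumed.
bin_upper_bounds = [299, 399, 449, 499, 549, 599]
labels = ["0-299", "300-399", "400-449", "450-499", "500-549", "550-599", "More"]

# One counter per bin, plus a final "More" slot for scores above the last bound.
counts = [0] * (len(bin_upper_bounds) + 1)
for score in reading_scores:
    # bisect_left finds the first bin whose upper bound is >= the score.
    counts[bisect_left(bin_upper_bounds, score)] += 1

for label, count in zip(labels, counts):
    print(f"{label:>8} | {'#' * count}")
```

With these assumed bins the counts are 0, 7, 7, 4, 3, 1 and 0, so most students fall between 300 and 449, with a thinner tail of higher scores.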
The normal distribution

• In a perfect normal distribution the mean, median and mode are equal to each other (75 here).
Skewness

[Figure: two distributions, one labelled Negative/Left skew and one labelled Positive/Right skew.]

TIP: To remember if it is positive skew or negative skew, think of the distribution like a door-stop. Does the door touch the positive side or the negative side of the distribution?
Shape of a Distribution

• Describes how data are distributed
• Measures of shape
  • Symmetric or skewed

Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean
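The mean-versus-median comparison above can be turned into a small Python sketch. The function name `skew_direction` and the third example list are hypothetical. The comparison is a rule of thumb: equal mean and median suggest symmetry but do not guarantee it.

```python
import statistics


def skew_direction(values):
    """Classify the shape of a dataset by comparing its mean with its median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean < median:
        return "left-skewed"
    if mean > median:
        return "right-skewed"
    return "symmetric"


house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

print(skew_direction(house_prices))        # right-skewed (mean 600,000 > median 300,000)
print(skew_direction([1, 2, 3, 4, 5]))     # symmetric    (mean 3 = median 3)
print(skew_direction([1, 7, 8, 9, 10]))    # left-skewed  (mean 7 < median 8)
```

The house prices from the review example come out right-skewed because the single $2,000,000 house drags the mean above the median.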
Positive and negative skew
Example question

For this graph, which of the following are true?
• The mean > mode?
• The median < mean?
• The mean = mode?
• The mean = median?

Hint: The “highest” point in the distribution is always the mode…
Tutorial quiz 1

• Go to http://quizstar.4teachers.org/indexs.jsp
• Enter your username and password
• Click on the “Basic Stats 101” quiz and complete it
• If you have any questions, raise your hand and I will come and help you

• For those not already registered, you can register as a student on http://quizstar.4teachers.org/indexs.jsp and then search for my class “UDM Msc Education”; anyone can join the class
End of Lecture 1

• For questions email me at NicholasSpaull@gmail.com
• All slides/tutorials available at www.nicspaull.com/teaching
Exploratory Data Analysis

• Box-and-Whisker Plot: a graphical display of data using the 5-number summary:

Minimum -- Q1 -- Median -- Q3 -- Maximum

Example:

[Figure: box-and-whisker plot marking Minimum, 1st Quartile, Median, 3rd Quartile and Maximum, with 25% of the data in each of the four segments.]
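A 5-number summary can be computed with Python's standard `statistics.quantiles`, as sketched below on the house prices from the review example. The function name `five_number_summary` is our own. Quartile conventions differ between textbooks and software, and the `"inclusive"` method used here is one choice among several, so other tools may report slightly different Q1 and Q3 values.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a dataset."""
    ranked = sorted(values)
    # n=4 splits the data into quartiles and returns the three cut points.
    q1, q2, q3 = statistics.quantiles(ranked, n=4, method="inclusive")
    return ranked[0], q1, q2, q3, ranked[-1]


house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]
minimum, q1, median, q3, maximum = five_number_summary(house_prices)

print(minimum, q1, median, q3, maximum)
```

For the house prices the summary is 100,000, 100,000, 300,000, 500,000 and 2,000,000. The distance from Q3 to the maximum is far larger than the distance from the minimum to Q1, which is the long right whisker of a right-skewed distribution.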
Shape of Box-and-Whisker Plots

• The Box and central line are centered between the endpoints if data are symmetric around the median

[Figure: box-and-whisker plot marking Min, Q1, Median, Q3 and Max.]

• A Box-and-Whisker plot can be shown in either vertical or horizontal format
Distribution Shape and Box-and-Whisker Plot

[Figure: three box-and-whisker plots labelled Left-Skewed, Symmetric and Right-Skewed, each marking Q1, Q2 and Q3.]
