
2023 Topic 1 Investigating and Comparing Data Distributions

The document outlines the study design for VCE General Mathematics Unit 1, focusing on investigating and comparing data distributions. It covers various topics including types of data, measures of center and spread, and different methods for displaying data such as frequency tables, histograms, and boxplots. Each section includes definitions, examples, and exercises to reinforce learning.

Uploaded by

paige.kranz
Copyright
© © All Rights Reserved

Name: ………………………………………………

VCE General Mathematics Unit 1


Topic 1
Investigating and Comparing Data Distributions

Topic Timeline
Study Design Content

Contents
Topic Timeline
Study Design Content
1.1 Introduction to data distributions
Types of data
Measures of centre and spread
1.2 Tables and Charts
Frequency tables
Grouped frequency tables
Bar charts
1.3 Histograms
Histograms and grouped frequency tables
Centre and spread of histograms
Shapes of histograms
1.4 Boxplots
Five-number-summary
IQR, outliers and fences
Boxplots
Comparing boxplots and histograms
1.5 Dot Plots and Stem-and-Leaf Plots
Dot plots
Stem-and-leaf plot
1.6 Back-to-back stem plots and parallel boxplots
Back-to-back stem plots
Parallel boxplots
Which display do we use?
1.7 Mean and Standard Deviation
Measures of centre
Measures of spread
Standard deviations away from the mean
Topic 1 Review

1.1 Introduction to data distributions

Types of data
▪ We have two main types of data: numerical (quantitative, values are numbers) and
categorical (qualitative, values can’t be quantified).

▪ Categorical data involves either numbers where adding makes no sense or categories that
don’t involve any numbers.
o Ordinal data has a natural order. E.g. star ratings from 1-4 on Uber, house numbers,
letter grades.
o Nominal data has no natural order. E.g. eye colour.
▪ Numerical data can be divided into several types, depending on context.
o Continuous data can be measured to as many decimal places as you can physically
manage. E.g. handspan, rainfall, temperature.
o Discrete data can only take specific, separate values, with nothing in between.
E.g. number of pets, shoe size (6, 6 ½, 7 but not in between these).
o Ratio data has a fixed numerical beginning (a true zero). E.g. handspan (ratio and
continuous) and number of pets (ratio and discrete) both start at zero, but
temperature does not.
o Interval data has no fixed beginning. E.g. temperature (interval and continuous) and
calendar years (interval and discrete).

Use the following flowchart to assist.

Example 1
We do…
Classify the following sets of data as categorical or numerical, and then nominal/ordinal or
discrete/continuous/interval/ratio.

a) Height of a person
b) Annual profit or loss of a business to the nearest cent
c) Hair colour
d) Finishing position in a 100 metre race
e) Number of friends on a social media platform
f) Temperature at Vladivostok

Measures of centre and spread


▪ Measures of centre give us a single value that can describe the distribution of the data
o Mode – most frequently occurring data value. There can be zero, or more than one.
o Median – the middle value, when the data is in order from lowest to highest. If
there are two middle values, add them and divide the sum by 2.

▪ Measures of spread give us an idea of how ‘spread apart’ the data is


o Range – the difference between the largest value and the smallest value.
= maximum – minimum
Example 2
I do…
Find the median, mode, and range of the following:
0, 1, 1, 5, 2, 0, 3, 1, 3, 2

Median: put in order. 0, 0, 1, 1, 1, 2, 2, 3, 3, 5
There are two middle values (1 and 2), so the median is $\frac{1+2}{2} = 1.5$

Mode: 1 (it occurs 3 times; every other value occurs 2 times or fewer)

Range: 5 − 0 = 5

You do…
Find the median, mode, and range of the following:
15, 19, 17, 6, 12, 18, 18, 17, 15
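As a cross-check on the "I do" working, here is a minimal Python sketch using the standard `statistics` module. Python is not part of these notes (CAS is the tool used in class), and the variable names are illustrative only.

```python
from statistics import median, multimode

data = [0, 1, 1, 5, 2, 0, 3, 1, 3, 2]  # "I do" data from Example 2

data_median = median(data)           # sorts internally; averages the two middle values
data_modes = multimode(data)         # list of every most-frequent value
data_range = max(data) - min(data)   # maximum minus minimum

print(data_median, data_modes, data_range)
```

`multimode` returns a list because a data set can have more than one mode.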

TO DO:
Nelson Ex 1.1 p. 9
q’s 2 (every 2nd), 3bcd, 4-7, 11, 13

1.2 Tables and Charts

Revision: Ex 1.2, p. 15, q’s 1-2


Frequency tables
▪ A frequency table gives the variable values, along with their frequency and percentage
frequency.
▪ Can be used to display both categorical and numerical data
▪ Frequency = the number of times a value occurs (the count)
▪ Percentage frequency = the percentage of times a value occurs:
$\text{percentage frequency} = \frac{\text{frequency}}{\text{total}} \times 100$
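The frequency and percentage frequency definitions above can be sketched in Python. This is an illustration only: it reuses the data from Example 2 in section 1.1 rather than the car-colour data, and the name `table` is an assumption of this sketch.

```python
from collections import Counter

data = [0, 1, 1, 5, 2, 0, 3, 1, 3, 2]  # data reused from section 1.1, Example 2

counts = Counter(data)   # value -> frequency (the count)
total = len(data)

# value -> (frequency, percentage frequency)
table = {value: (freq, freq * 100 / total) for value, freq in sorted(counts.items())}

print("Value Frequency Percentage")
for value, (freq, pct) in table.items():
    print(f"{value:>5} {freq:>9} {pct:>9.1f}%")
```

The percentage column should always add to 100%, which is a quick way to check a table built by hand.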

Example 1
Construct frequency tables for the following sets of data:
I do…
The different car colours along a quiet road are counted:

Colour    Frequency    Percentage frequency

Grouped frequency tables
▪ Numerical data, usually continuous data
▪ When it’s impractical to list each individual value
▪ Gives the frequency of certain ranges
▪ Intervals must all be the same size

Example 2

We do…
Create a grouped frequency table for the following data, and find the modal interval:
45, 78, 80, 67, 43, 59, 32, 12, 100, 45, 58, 56, 69, 16

Range Frequency Percentage frequency

Total 100%

Modal interval:
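One possible way to check a grouped frequency table for this data is sketched below in Python. The class width of 20 is an assumption of this sketch (the example does not fix one), so a table drawn up in class with a different width will have different intervals.

```python
from collections import Counter

data = [45, 78, 80, 67, 43, 59, 32, 12, 100, 45, 58, 56, 69, 16]
width = 20  # assumed class width; the example does not fix one

# Each value falls in the interval [start, start + width)
starts = Counter((x // width) * width for x in data)

# Keep every interval from 0 up to the maximum, including any empty ones
grouped = {(s, s + width): starts[s] for s in range(0, max(data) + 1, width)}
modal_interval = max(grouped, key=grouped.get)  # interval with the highest frequency

for (low, high), freq in grouped.items():
    print(f"{low}-<{high}: {freq} ({freq * 100 / len(data):.1f}%)")
print("Modal interval:", modal_interval)
```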

Bar charts
▪ Bar charts provide a visual display for categorical data sets.
▪ The bars are drawn with gaps between to show that values are separate categories
▪ You’ve done plenty of these.

TO DO:
Nelson Ex 1.2 p. 15
q’s 3, 4ac, 5, 8-15

1.3 Histograms

Revision: Ex 1.3 p. 24 q’s 1-2


Histograms and grouped frequency tables
▪ A histogram is a graphical representation of numerical data from a frequency table.
Example 1
We do…

Siblings Frequency

Centre and spread of histograms

Shapes of histograms
Symmetrical
The mean, median, and mode are all
approximately in the middle
𝑚𝑒𝑎𝑛 ≈ 𝑚𝑒𝑑𝑖𝑎𝑛 ≈ 𝑚𝑜𝑑𝑒

Multimodal
More than one mode

Negatively skewed
The mean is ‘below’ the mode
𝑚𝑒𝑎𝑛 < 𝑚𝑒𝑑𝑖𝑎𝑛 < 𝑚𝑜𝑑𝑒

Positively skewed
The mean is ‘above’ the mode
𝑚𝑒𝑎𝑛 > 𝑚𝑒𝑑𝑖𝑎𝑛 > 𝑚𝑜𝑑𝑒

▪ We can also identify outliers. An outlier is an extremely high or low value compared with the rest of the data.

Example 2
You do…

TO DO:
Nelson Ex 1.3 p. 24
q’s 3-6, 9-13

1.4 Boxplots

Revision: Ex 1.4 p. 34 q’s 1-2


Five-number-summary
▪ The five-number-summary (or five-figure-summary) is a quick representation of a given
data set. It includes:
o The minimum = the lowest data point
o The lower quartile, Q1 = the median of the lowest half of the data
o The median = the middle data point
o The upper quartile, Q3 = the median of the highest half of the data
o The maximum = the highest data point
▪ These can all be found by hand or by using CAS. CAS is usually quicker.
Example 1
I do…
Find the five-figure summary of the following data:
16, 11, 4, 25, 15, 7, 14, 13, 14, 12, 15, 13, 16

By hand:
Put in order and split in halves –
(4, 7, 11, 12, 13, 13), 14, (14, 15, 15, 16, 16, 25)
Minimum = 4
Q1 = $\frac{11+12}{2}$ = 11.5
Median = 14
Q3 = $\frac{15+16}{2}$ = 15.5
Maximum = 25

On CAS:

You do…
Find the five-figure summary by hand and then using CAS of the following data:
15, 19, 17, 6, 12, 18, 18, 17
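Separately from CAS, the by-hand method from the "I do" column (order the data, split it into halves, take the median of each half) can be sketched in Python. The function name `five_number_summary` is an assumption of this sketch. Other software may use a different quartile rule and give slightly different values for Q1 and Q3.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the split-in-halves method."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]                  # lowest half (middle value left out if n is odd)
    upper = ordered[len(ordered) - half:]   # highest half
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

summary = five_number_summary([16, 11, 4, 25, 15, 7, 14, 13, 14, 12, 15, 13, 16])
print(summary)
```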

IQR, outliers and fences

▪ The Interquartile Range (IQR) is the difference between the lower and upper quartiles. This
gives us the range of the middle 50% of the data.
o 𝐼𝑄𝑅 = 𝑄3 − 𝑄1
▪ The IQR is used in a calculation to identify possible outliers.
▪ A data value is a possible outlier if it is:
o Lower than the lower fence: 𝑄1 − 1.5 × 𝐼𝑄𝑅
o Higher than the upper fence: 𝑄3 + 1.5 × 𝐼𝑄𝑅
▪ We calculate both fences and then compare our possible outliers to these numbers.

Example 2
I do…
Identify any outliers from the following data set, using calculations.
50, 57, 62, 64, 65, 65, 65, 68, 70, 71, 72, 72, 73, 74, 77, 79, 79

Using CAS:

𝐼𝑄𝑅 = 73.5 − 64.5 = 9

𝐿𝑜𝑤𝑒𝑟 𝑓𝑒𝑛𝑐𝑒 = 64.5 − 1.5 × 9 = 51
𝑈𝑝𝑝𝑒𝑟 𝑓𝑒𝑛𝑐𝑒 = 73.5 + 1.5 × 9 = 87

There is one value below the lower fence: 50

There are no values above the upper fence.

There is one outlier, 50.

You do…
Identify any outliers from the following data set, using calculations.
3, 7, 20, 22, 22, 22, 25, 25, 28, 31, 34, 34, 39
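The fence calculation for the "I do" data can be checked with a short Python sketch. It assumes the split-in-halves quartile method from the five-number summary, and the function name `fences_and_outliers` is illustrative only.

```python
from statistics import median

def fences_and_outliers(values):
    """Return (lower fence, upper fence, list of outliers)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])                  # median of the lowest half
    q3 = median(ordered[len(ordered) - half:])   # median of the highest half
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

data = [50, 57, 62, 64, 65, 65, 65, 68, 70, 71, 72, 72, 73, 74, 77, 79, 79]
result = fences_and_outliers(data)
print(result)
```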

Boxplots
▪ AKA ‘box and whisker plots’
▪ A box is used to represent the middle 50% of the data points
▪ The median is shown by a vertical line drawn within the box
▪ Whiskers extend out from Q1 and Q3 to the minimum and maximum, respectively (or, if there are outliers, to the lowest and highest values that are not outliers)
▪ Any outliers are drawn as dots, below the lower fence or above the upper fence

Without outliers: With outliers:

Example 3
We do:
Calculate and construct the:
a) five-figure-summary
b) lower and upper fences, and identify outliers
c) boxplot (first by hand then using CAS)

15, 2, 24, 30, 25, 19, 24, 33, 41, 60, 42, 35, 35
28, 28, 19, 19, 28, 25, 20, 36, 38, 43, 45, 39

Comparing boxplots and histograms

TO DO:
Nelson Ex 1.4 p. 34
q’s 3-15

1.5 Dot Plots and Stem-and-Leaf Plots
Revision: Ex 1.5 p. 42 q’s 1-2
Dot plots
▪ Like a bar chart, but each data point is marked by a dot

Stem-and-leaf plot
▪ Presents the numerical data in a different format that is easier for the reader to interpret
▪ The stem is the ‘tens’ column of the numbers, and the leaf is the ‘ones’
▪ Make sure that the data points are ordered from low to high!
▪ When describing stem-and-leaf plots, you turn them on their side in your head and do the
same as histograms:
Symmetrical Negatively skewed Positively skewed

Example 1
Draw a stem-and-leaf plot of the following data sets:
I do… You do…
The following is a set of marks obtained by a
group of students on a test:
15, 2, 24, 30, 25, 19, 24, 33, 41, 60, 42, 35, 35
28, 28, 19, 19, 28, 25, 20, 36, 38, 43, 45, 39
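The construction steps (order the data, use the tens as the stem and the ones as the leaf) can be sketched in Python for the marks above. The sketch assumes non-negative whole numbers, and the function name `stem_and_leaf` is illustrative only.

```python
data = [15, 2, 24, 30, 25, 19, 24, 33, 41, 60, 42, 35, 35,
        28, 28, 19, 19, 28, 25, 20, 36, 38, 43, 45, 39]

def stem_and_leaf(values):
    """Map each stem (tens) to its ordered list of leaves (ones)."""
    ordered = sorted(values)
    stems = range(ordered[0] // 10, ordered[-1] // 10 + 1)
    plot = {stem: [] for stem in stems}   # keep empty stems so gaps stay visible
    for value in ordered:
        plot[value // 10].append(value % 10)
    return plot

plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 1 | 5 = 15")
```

Empty stems are kept on purpose: leaving them out would hide gaps in the data and distort the shape of the distribution.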

Example 3
You do…
Calculate the:
a) i. mode
ii. range
iii. five-number summary
b) Identify any outliers, justifying your response with calculations.

TO DO:
Nelson Ex 1.5 p. 42
q’s 3-7, 11-13, 14b, 15abc

1.6 Back-to-back stem plots and parallel boxplots
Revision: Ex 1.6 p. 52 q’s 1-2
We can directly compare the distribution of data for two groups using stem plots or boxplots.
Back-to-back stem plots
▪ Allows for easy comparison of two small sets of discrete data

▪ Make sure you have headings for each set of leaves!

Example 1
I do…
Use the following data sets to
a) construct a back-to-back stem-and-leaf plot
b) compare the median, range, and IQR for both sets

a)

b) Medians (middle numbers):


Girls = 24, boys = 25
Ranges:
Girls = 28, boys = 26
IQR:
Girls = 16, boys = 14

You do…
Construct a back-to-back stem plot for the following data.

Parallel boxplots
▪ Two or more boxplots drawn on the same scale.
▪ Allows for easy comparison of two or more sets of data.
▪ When comparing, address the median, IQR, and possible outliers.

Example 2
Use the following data sets to construct parallel boxplots, then compare the distributions.
I do…

The median for Saturday is 73.5, slightly
higher than Sunday at 70.5. The IQR for
Saturday is much lower than Sunday, at
10.5 as opposed to 40. This shows that
while Saturday and Sunday have
approximately the same centre, Sunday
has much more variation than Saturday.

Which display do we use?

TO DO:
Nelson Ex 1.6 p. 52
q’s 4-14

1.7 Mean and Standard Deviation
Revision: Ex 1.7 p. 64 q’s 1, 2
We summarise data using summary statistics, which are certain calculations using the data. These
typically summarise the centre or spread of the data.
Measures of centre
▪ The mean ($\bar{x}$) is the ‘average’
o For ungrouped data:
$\bar{x} = \frac{\text{sum of data values}}{\text{number of data values}}$
o For a grouped frequency table:
$\bar{x} = \frac{\text{sum of (each value} \times \text{frequency)}}{\text{sum of frequencies}}$

▪ Remember:
o Median = ‘middle’ data point
o Mode = highest frequency (most common)
o If 𝑚𝑒𝑎𝑛 − 𝑚𝑒𝑑𝑖𝑎𝑛 > 0 (i.e. positive), the data is positively skewed; if 𝑚𝑒𝑎𝑛 − 𝑚𝑒𝑑𝑖𝑎𝑛 < 0 (i.e. negative), the data is negatively skewed
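The frequency-table formula for the mean, together with the mean − median check for skew, can be sketched in Python. This is an illustration only: the table is built from the data in section 1.1, Example 2 (0, 1, 1, 5, 2, 0, 3, 1, 3, 2), not from the tables in this section.

```python
# value -> frequency, built from 0, 1, 1, 5, 2, 0, 3, 1, 3, 2
table = {0: 2, 1: 3, 2: 2, 3: 2, 5: 1}

n = sum(table.values())                                    # sum of frequencies
mean = sum(value * freq for value, freq in table.items()) / n

# Expand the table back into an ordered list to read off the middle value(s)
ordered = [value for value, freq in sorted(table.items()) for _ in range(freq)]
mid = n // 2
median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

mode = max(table, key=table.get)   # returns only one value even if several modes tie

print(mean, median, mode)
print("positively skewed" if mean - median > 0 else "not positively skewed")
```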

Example 1
Calculate the mean, median and mode of the following data sets, then comment on the shape of
the distribution:
I do…
$\text{mean} = \frac{672}{25} = 26.88$
Median = middle number
Median = 21
Mode = number with highest frequency
Mode = 21 (happens 4 times)
𝑚𝑒𝑎𝑛 − 𝑚𝑒𝑑𝑖𝑎𝑛 = 26.88 − 21 = +5.88
This dataset is positively skewed.

You do…
Score    Frequency
37       2
38       4
39       7
40       4
41       1

Measures of spread
▪ The standard deviation measures the spread of the data distribution about the mean
o $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$ (always do on CAS!)
▪ Remember:
o The range is the difference between the smallest and largest data points
▪ 𝑟𝑎𝑛𝑔𝑒 = 𝑚𝑎𝑥 − 𝑚𝑖𝑛
o The interquartile range (IQR) gives the spread of the middle 50% of data values
▪ 𝐼𝑄𝑅 = 𝑄3 − 𝑄1
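To show what the standard deviation formula is doing, here is a minimal Python sketch that works it out step by step. It reuses the data from section 1.1, Example 2 as an illustration. It also computes the version that divides by n − 1, which the standard library (and many calculators) report alongside the divide-by-n version, so it is worth checking which value is expected.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

data = [0, 1, 1, 5, 2, 0, 3, 1, 3, 2]  # data reused from section 1.1, Example 2

x_bar = mean(data)
sum_sq = sum((x - x_bar) ** 2 for x in data)    # sum of squared deviations from the mean

sd_n = sqrt(sum_sq / len(data))                 # divides by n, as in the formula above
sd_n_minus_1 = sqrt(sum_sq / (len(data) - 1))   # divides by n - 1

print(round(sd_n, 3), round(sd_n_minus_1, 3))
```

The library functions `pstdev` (divide by n) and `stdev` (divide by n − 1) give the same two values without the intermediate steps.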

Example 2:
Calculate the range, IQR, and standard deviation of the following data sets:
I do…
Range = 𝑚𝑎𝑥 − 𝑚𝑖𝑛
Range = 49 − 16 = 33
Q1 = middle number of lower half.
Q3 = middle number of upper half.
Count in from both ends until you meet in the middle:
Q1 = halfway between 19 and 19 = 19
Q3 = halfway between 31 and 35 = 33
𝐼𝑄𝑅 = 𝑄3 − 𝑄1
𝐼𝑄𝑅 = 33 − 19 = 14

You do…
Score    Frequency
37       2
38       4
39       7
40       4
41       1

Standard deviations away from the mean
▪ You can calculate how many standard deviations a data point is from the mean by finding
the ‘z score’:
$z = \frac{x - \bar{x}}{s}$
(i.e. the data point minus the mean, divided by the standard deviation)
▪ A lot of data collected in real life has histograms that (when smoothed out) look like the
‘normal distribution’:

▪ The mean, median, and mode are all pretty much in the ‘middle’ of the data, and very few
data points exist towards the ‘left’ and ‘right’ sides.
▪ If the histogram resembles a bell curve, we say that the data is ‘approximately normally
distributed’.
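▪ If the data is normally distributed, then we know that the following percentages of the
data are within a given number of standard deviations from the mean:
o about 68% within 1 standard deviation
o about 95% within 2 standard deviations
o about 99.7% within 3 standard deviations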

Example 3:
The weights of bags of red gravel may be modelled by a normal distribution with mean 25.5kg
and standard deviation 0.5kg.
Determine the probability that a randomly selected bag of red gravel will weigh:
I do…
a) Less than 24.5kg
b) Between 24kg and 25.5kg
c) Less than 23.5kg or more than 26.5kg

You do…
a) More than 26kg
b) Between 23.5kg and 24.5kg
c) Less than 24.5kg or more than 26kg
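Part (a) of the "I do" column can be checked with a short Python sketch. It assumes the usual 68–95–99.7% rule for normal distributions (about 95% of the data lies within 2 standard deviations of the mean) and compares that estimate with the value from the normal model itself, using `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

mean, sd = 25.5, 0.5   # model from Example 3
x = 24.5               # part (a) of the "I do" column: less than 24.5kg

z = (x - mean) / sd    # number of standard deviations from the mean

# 95% lies within 2 SDs, so the remaining 5% is split equally between the two tails
rule_estimate = (100 - 95) / 2

# Percentage below x according to the normal model itself
exact = NormalDist(mean, sd).cdf(x) * 100

print(z, rule_estimate, round(exact, 2))
```

The two answers differ slightly because the 95% figure is itself a rounded value.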

TO DO:
Nelson Ex 1.7 p. 64
q’s 4-12, 14

Topic 1 Review

▪ As you complete the questions, write one double-sided A4 sheet of notes. These notes
should include any information, theories, formulas, and examples that you used to help
you answer the questions. You may use the Chapter Summary on page 67 as a guide.

▪ These pages will make useful summaries to place at the very start of
your bound reference.

▪ Use the following excerpt from the Study Design to help structure your notes:

TO DO:
Nelson page 71-75
all questions
