Level 1 Multivariate Workbook
ANSWERS

By Liz Sneddon & Kathie Albertson
© Liz Sneddon 2020
Problem
Writing Investigation Questions
A comparison question needs:

● Categorical variable (2 groups),

● Numerical variable (Measurement or Count),

● The word “tends”,

● Direction,

● Population (use the word ALL to describe it).

Example:

I wonder if the travel time (minutes) for students who catch the bus tends to be
longer than the travel time (minutes) for students who walk, for ALL high school
students in NZ, for data from Census at School in 2015.

Exercise:

1) In the example given above:

a. What are the two groups that are being compared?

Students who catch the bus, and students who walk to school
b. What is the numerical (measurement or count) variable? And its units?

Travel time (minutes)


c. What is the population?

ALL high school students in NZ, for data from Census at School in
2015.
d. What is the direction?

I suggest that students who catch the bus will take LONGER than
students who walk.
2) The questions below are all missing one or more requirements. Identify each of
the requirements, and which parts are missing. Then rewrite the question.
a) Are 8 year old boys generally taller (cm) than 8 year old girls in NZ?

Categorical variable: Gender: girls and boys

Numerical variable and units: Height (cm)


Median: yes / no
Direction: yes / no boys TALLER than girls

Population: 8 year old students in NZ

Rewritten question:

Does the height (cm) of 8 year old boys tend to be greater than the
height (cm) of 8 year old girls, for ALL 8 year olds in NZ?

b) Do all 18 year old males tend to have a longer right foot than all 18 year old
females in NZ?

Categorical variable: Gender: males and females

Numerical variable and units: Right foot length (cm); the units are missing
Median: yes / no
Direction: yes / no males have LONGER foot length than females

Population: 18 year olds in NZ

Rewritten question:

Do all 18 year old males tend to have a longer right foot length (cm)
than all 18 year old females in NZ?

c) I wonder if there is a difference between the average school bag weight for girls
and boys?

Categorical variable: Gender: girls and boys

Numerical variable and units: School bag weight (kg); the units are missing
Median: yes / no
Direction: yes / no

Population: this is missing.

Rewritten question:

I wonder if the bag weight (kg) for girls tends to be heavier than the bag
weight for boys, for high school students in NZ?

d) How does the number of text messages all teenage girls send daily compare with
the number of text messages all teenage boys send daily, in Auckland?

Categorical variable: Gender: girls and boys

Numerical variable and units: number of text messages sent daily


Median: yes / no
Direction: yes / no

Population: teenagers in Auckland.

Rewritten question:

Does the number of text messages sent each day by ALL teenage girls in
Auckland tend to be higher than the number of text messages sent each day
by teenage boys?

3) The dataset described below is a sample taken from marathons in NZ. It is a
simple random sample of 200 athletes.

Variable         Description
Minutes          How many minutes they took to complete the marathon
Gender           Male (M) or Female (F)
AgeGroup         Younger (under 40) or older (over 40)
StridelengthCM   The person's average stride length over the marathon, in cm

There are four possible comparison questions that you can write from the dataset
above. Write all four questions below.

I wonder if the completion time (minutes) for females running a
marathon tends to be longer than the completion time for males, for
athletes in NZ?

I wonder if the marathon completion time (minutes) for younger
athletes (under 40) tends to be shorter than the completion time for
older (over 40) athletes in NZ?

I wonder if the stride length (cm) for males tends to be longer than the
stride length for females, for athletes doing marathons in NZ?

I wonder if the stride length (cm) for younger (under 40) athletes tends
to be longer than the stride length (cm) for older (over 40) athletes
running marathons in NZ?

4) The dataset described below is a sample of rugby data taken from
http://www.rugby-sidestep-central.com/

Variable   Description
Country    New Zealand or South Africa
Position   Forward or Back
Weight     The weight of the player in kilograms (kg)
Height     The height of the player in metres (m)

There are four possible comparison questions that you can write from the dataset
above. Write all four questions below.

I wonder if the weight (kg) of a NZ rugby player tends to be more than
the weight of a South African rugby player, for data from the website
http://www.rugby-sidestep-central.com/

I wonder if the height (m) of a NZ rugby player tends to be more than
the height of a South African rugby player, for data from the website
http://www.rugby-sidestep-central.com/

I wonder if the weight (kg) of a Forward tends to be more than the
weight of a Back, for rugby players from data on the website
http://www.rugby-sidestep-central.com/

I wonder if the height (m) of a Forward tends to be more than the
height of a Back, for rugby players from data on the website
http://www.rugby-sidestep-central.com/

Plan
When we collect data, we usually have a sample not a population
(census).

Data
Add summary statistics and a box plot to your graph.

You need to take a sample of between 20 and 40 in each group and change the point
size (to better see the shape).

Watch this video to learn how to do that: http://tiny.cc/SampleNZGrapher

Analysis
Your analysis is all about the sample that you have taken, and what patterns you can
see.

Measure of Center

There are 3 measures of center - mean, median and mode. For this assessment, we
are going to focus only on the median.

Median = the number in the middle (when the data is in order)

Example:

Estimate the center and find the median.
Data: 9, 3, 1, 8, 3, 6

Median
Put the numbers in order: 1, 3, 3, 6, 8, 9
Find the number(s) in the middle: 3 and 6 (from 1, 3, 3, 6, 8, 9)

Find the median = (3 + 6) / 2 = 4.5
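
The steps in this example can be sketched in Python. This is only an illustration and not part of the workbook; the function name `median_by_hand` is made up for the example.

```python
def median_by_hand(data):
    """Return the median by sorting and taking the middle value(s)."""
    ordered = sorted(data)      # put the numbers in order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # odd number of values: there is exactly one middle number
        return ordered[middle]
    # even number of values: average the two middle numbers
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_by_hand([9, 3, 1, 8, 3, 6]))  # 4.5
```

With six values there are two middle numbers (3 and 6), so the function averages them, which matches the working shown above.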

Exercises:

Estimate the center on the graph. Calculate the median.

Data: 4, 6, 3, 8, 2, 4, 9
Median = 4

Data: 4.4, 4.7, 3.5, 2.2, 4.2, 6.7, 2.9, 4.4, 1.5, 2.0, 3.3
Median = 3.5

Data: 25, 35, 37, 36, 28, 29, 36, 26, 22
Median = 29

Measure of Spread

A measure of spread looks at how spread out (how variable) the data is.

You will only look at the IQR (Interquartile range).

That means the spread of the middle 50% of the data (the box part of the box and
whisker graph).

IQR = UQ - LQ

where UQ = Upper Quartile = the number where one quarter of the data lies above it,
and LQ = Lower Quartile = the number where one quarter of the data lies below it.
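
As a rough sketch, the quartiles and IQR can be found in Python. Quartile rules differ slightly between textbooks and software, so the "median of each half" rule used here is an assumption, and a graphing tool may give slightly different values. The function name `quartiles` is made up for this example.

```python
from statistics import median


def quartiles(data):
    """Return (LQ, UQ) using the 'median of each half' rule (an assumed convention)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower_half = ordered[:half]                  # values below the middle
    upper_half = ordered[len(ordered) - half:]   # values above the middle
    return median(lower_half), median(upper_half)


lq, uq = quartiles([1, 3, 3, 6, 8, 9])
iqr = uq - lq   # IQR = UQ - LQ
print(lq, uq, iqr)  # 3 8 5
```

When the number of values is odd, the middle value is left out of both halves. The function needs at least two values.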

The box and whisker plot.
The Box and whisker plot (or just box plot) shows minimum, lower quartile (LQ),
median, upper quartile (UQ) and maximum values of a dataset.

Example:

Estimate the summary statistics from the graphs below:

Minimum = 0
LQ = 165
Median = 230
UQ = 315
Maximum = 650

Calculate the IQR = UQ - LQ = 315 - 165 = 150

Exercise:

Estimate the summary statistics from the graphs below:


1)

Minimum = 76

LQ = 95

Median = 102

UQ = 115

Maximum = 140

Calculate the IQR = 115 - 95 = 20

2)

Minimum = 760

LQ = 1170

Median = 1390

UQ = 1600

Maximum = 1870

Calculate the IQR = 1600 - 1170 = 430

Writing Comparative Statements
Here are the features you need to analyse and COMPARE.
1. Shape
2. Center
3. Spread

We will now go through each feature, before putting it all together.

1. Shape Exercise
Normal distribution
(hill/mound shapes, symmetric, bell shaped curve)

Left skewed
(Tail is on the left hand side)

Right Skewed
(tail is on the right hand side)

Bimodal
(there are two peaks)

Uniform
(the sides are straight and it looks like a box)

Example:

The data has 2 peaks, so looks approximately bimodal in shape.

Exercise:1

Sketch over the top of each graph and then state what shape it most closely matches.

1. Normal
2. Uniform
3. Bimodal
4. Right skewed
5. Normal
6. Left skewed
7. Right skewed and bimodal
8. Right skewed
9. Normal
10. Normal
11. Left skewed
12. Right skewed and bimodal
13. Right skewed
14. Normal
15. Right skewed
16. Right skewed and bimodal
17. Right skewed
18. Normal

1 Thanks to Dr Pip Arnold for the graphs.

Justifying Shape
To justify the shape, think about the following features:
● Symmetry
● Number of peaks
● What shape the tails are

Exercise:

For each graph, select the correct symmetry, peaks and tail description.

Symmetric (circle one): Yes / No

Number of peaks = 1

Tails (circle one):


Both tails the same size /
Left hand tail longer /
Right hand tail longer

Symmetric (circle one): Yes / No

Number of peaks = 1

Tails (circle one):


Both tails the same size /
Left hand tail longer /
Right hand tail longer

Symmetric (circle one): Yes / No

Number of peaks = 2

Tails (circle one):


Both tails the same size /
Left hand tail longer /
Right hand tail longer

Symmetric (circle one): Yes / No

Number of peaks = 1

Tails (circle one):


Both tails the same size /
Left hand tail longer /
Right hand tail longer

Symmetric (circle one): Yes / No

Number of peaks = 1

Tails (circle one):


Both tails the same size /
Left hand tail longer /
Right hand tail longer

Symmetric (circle one): Yes / No

Number of peaks = 1

Tails (circle one):


Both tails the same size /
Left hand tail longer /
Right hand tail longer

Symmetric (circle one): Yes / No

Number of peaks = 1

Tails (circle one):


Both tails the same size /
Left hand tail longer /
Right hand tail longer

Comparing the Centers
Locate the medians, and tell me which group's median is bigger and by how much.
For Merit, you need to add the justification and evidence.

Example:

The median weight for my sample of females is 68.45kg.

The median weight for my sample of males is 81.85kg.

Difference = male median - female median

= 81.85kg - 68.45kg

= 13.4kg

The median weight for my sample of males is heavier than for my sample of
females by 13.4kg.
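
The comparison in this example can be sketched in Python. The function name `compare_medians` and the wording of the sentence it builds are made up for illustration; the numbers are the sample medians from the example.

```python
def compare_medians(name_a, median_a, name_b, median_b, units):
    """Say which sample median is bigger and by how much."""
    if median_a == median_b:
        return f"The medians for my samples of {name_a} and {name_b} are the same."
    # round to avoid tiny floating point errors in the subtraction
    difference = round(abs(median_a - median_b), 2)
    if median_a > median_b:
        higher, lower = name_a, name_b
    else:
        higher, lower = name_b, name_a
    return (f"The median for my sample of {higher} is higher than the median "
            f"for my sample of {lower} by {difference}{units}.")


print(compare_medians("males", 81.85, "females", 68.45, "kg"))
```

The sentence always names the group with the bigger median first, then gives the difference with its units, which is the same pattern as the written answer.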

Exercise:

Compare the medians for the graphs below.


1)

The median height for the sample of males is 185.45 cm

The median height for the sample of females is 174.7cm

Difference = median males - median females

= 185.45 - 174.7

= 10.75cm

The median height for the sample of males is taller than the median height for the
sample of females by 10.75cm.

2)

The median amount of money the sample of girls spent for the school ball is $310

The median amount of money the sample of boys spent for the school ball is $200

Difference = median girls - median boys

= $310 - $200

= $110

In the sample, the median amount of money girls spent for the school ball is greater
than the median amount of money boys spent by $110.

Spread – comparing the spread

Find the IQR, and tell me which group’s spread is bigger and by how much.
For Merit, you need to add the justification and evidence.

Compare how wide the boxes are on the box plot. Is one group wider than the other?

Then calculate the IQR = UQ - LQ

Example:

IQR (females) = UQ - LQ
= 5.5 - 2
= 3.5kg
The IQR for bag weights for females is 3.5kg.

IQR (males) = UQ - LQ
= 5.1 - 2
= 3.1kg
The IQR for bag weights for males is 3.1kg.

In the sample, the spread of the middle 50% of bag weights for females is a little
wider than the spread of the middle 50% of bag weights for males.
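
The same comparison can be sketched in Python, using the quartiles read from this example. The helper name `iqr` is made up for illustration, and the rounding is only there to tidy up floating point arithmetic.

```python
def iqr(lq, uq):
    """Interquartile range: the width of the box on a box plot."""
    return uq - lq


# Quartiles from the bag weight example (kg).
iqr_females = round(iqr(2, 5.5), 1)
iqr_males = round(iqr(2, 5.1), 1)

wider = "females" if iqr_females > iqr_males else "males"
difference = round(abs(iqr_females - iqr_males), 1)
print(f"The middle 50% is wider for {wider} by {difference}kg.")
```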

Exercise:

Compare the spread for the graphs below.


1)

IQR (females) = UQ - LQ

= 179.7 - 170.8

= 8.9cm

The IQR for heights of females is 8.9cm

IQR (males) = UQ - LQ

= 191.5 - 179.6

= 11.9 cm

The IQR for heights of males is 11.9 cm

In the sample, the spread of the middle 50% of heights for females is smaller than
the spread of the middle 50% of heights for males.

2)

IQR (girls) = UQ - LQ

= $580 - $210

= $370

The IQR for amount of money spent for the school ball for girls is $370

IQR (boys) = UQ - LQ

= $260 - $150

= $110

The IQR for amount of money spent for the school ball for boys is $110

In the sample, the spread of the middle 50% of how much girls spend for the school
ball is over 3 times larger than the spread of the middle 50% of boys spending for
the school ball.

Full Example:

Problem:
Is the median weight of girls’ school bags greater than the median weight of boys’
school bags, for ALL students at Intermediate schools in NZ?

For my sample, I notice:

● The female and male bag weights in my sample have the same right skewed
shape. They are right skewed because they have one peak on the left hand
side, are asymmetric, and there is a longer tail on the right hand side.

● The median bag weight for females is a little heavier than the median bag
weight for males, by 0.5kg. My evidence is that the median bag weight for
females is around 3.8kg while the median bag weight for males is around 3.3kg.

● The spread of the middle 50% of females' bag weights is slightly larger than the
spread of males' bag weights, because the IQR of the females is approximately
3.5 kg compared to the IQR for males of 3.1 kg.

Exercise
For the samples below, write a complete analysis. Discuss features such as: shape,
center and spread.

1. I wonder if the median weight of male kiwi birds in NZ is heavier (kg) than the
median weight of female kiwis, for ALL kiwi birds in NZ.

The shape of the female kiwi birds' weights is a normal distribution
because it is a symmetric mound shape. The male kiwi birds'
graph has a slight right skew.

The median of the female kiwis' weight is 2.879 kg, which is
heavier than the median for males, which is 2.248 kg. The female
median is 0.631 kg heavier than the males' median weight.

The spread of the middle 50% of females is slightly larger than
the spread for males. The IQR is (3.151 - 2.6135), which is 0.5375
kg for females, and (2.441 - 2.0425), which is 0.3985 kg for males.

2. For high school students in New Zealand, is the median travel time (minutes)
for ALL students who take the bus to school longer than the median travel time
for ALL high school students who walk to school?

The shape for both bus and walk students is right skewed
because there is a tail on the right side and the data is piled up on
the left side.

The students who walk have a distribution with a stronger skew
to the right.

The median travel time for bus students is 25 minutes; for walk
students it is 10 minutes, so the median for walk students is 15
minutes less.

The spread of the middle 50% of bus students is wider than the
spread of the middle 50% of walking students, because the
interquartile range for bus students is 25 minutes, while for walking
students it is 15 minutes.

3. For ALL high school students in New Zealand, is the median age for students
who have a device older than the median age for those who do not have a
device?

The shape of ages of students who DO NOT have their own device
is normal, because the shape is symmetric, one peak, and the
tails are both similar. The shape of ages of students who DO have
a device is left skewed because there is one peak, no symmetry,
and a longer tail on the left hand side.

The median age of students who DO NOT have their own device is
11 years old, while the median age of students who DO have their
own device is 14 years old. This shows that the median age of
students who DO have their own device is 3 years older than the
median age of students who DO NOT have their own device.
The spread of the middle 50% of ages for students who DO have
their own device is slightly larger than the spread of the middle 50% of
ages of students who DO NOT have their own device.
IQR (No device) = UQ - LQ = 12.5 - 10 = 2.5 years

IQR (Have device) = UQ - LQ = 15 - 12 = 3 years


4. I wonder if the median memory test percentage is higher for all female high
school students from schools in NZ than the median memory test percentage for
all male high school students.

The shape of the memory test result for both males and females is
right skewed, because the shape is not symmetric, there is one
peak and a longer tail on the right hand side.
The median memory test result for females is 47% and the median test
result for males is 50%. This shows that males have a higher
median test mark by 3 percentage points.
The spread of the middle 50% of memory test results for females
is about the same as the middle 50% of memory test results for
males.
IQR (female) = UQ - LQ = 57 - 40 = 17%
IQR (male) = UQ - LQ = 61 - 42 = 19%

Conclusion
Your conclusion is about using the sample to infer (suggest) what might be
happening in the population.

You need to include the following in your conclusion:


● Answer your investigation question (can you make the call?),
● Discuss sampling variability.

Answer the investigation question.
Because we have a sample, and don’t know the whole population, we need to decide
if we can make the call whether one group tends to be larger than the other group.

Making the call - method 1


Steps:
1) Shade in 50% of the lower group.

2) Shade in 75% of the higher group.

3) Do these areas overlap?

Evidence:
If there is no overlap, then the results tend to be higher for one group.

75% of the data in one group is not bigger than 50% of the second group, so
I do not have enough evidence that one group tends to be larger than the
second group.

75% of the data in one group is bigger than 50% of the second group, so I do
have enough evidence that one group tends to be larger than the second group.
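
Method 1 can be sketched in Python under one reading of the steps, which is an assumption because the shaded graphs are not shown here: the shaded 50% of the lower group runs from its minimum up to its median, and the shaded 75% of the higher group runs from its lower quartile up to its maximum. Under that reading there is no overlap when the higher group's LQ is above the lower group's median. The function name is made up for this example, and the numbers come from the kiwi and memory test exercises in the Analysis section.

```python
def can_make_call_method_1(lower_group_median, higher_group_lq):
    """True when the higher group's top 75% sits above the lower group's bottom 50%."""
    return higher_group_lq > lower_group_median


# Kiwi weights (kg): males are the lower group (median 2.248), females' LQ is 2.6135.
print(can_make_call_method_1(lower_group_median=2.248, higher_group_lq=2.6135))  # True

# Memory test (%): females are the lower group (median 47), males' LQ is 42.
print(can_make_call_method_1(lower_group_median=47, higher_group_lq=42))  # False
```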

Making the call - method 2
This uses the same idea, looking for evidence about whether the majority of one
group tends to be larger than the majority of the second group. For this method we
will use the medians to make the call.

The median of both groups is inside the box of the other group, so I do not
have enough evidence that one group tends to be larger than the second group.

The median of both groups is outside the box of the other group, so I do
have enough evidence that one group tends to be larger than the second group.
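
Method 2 can be sketched in Python as well. The statements above cover the cases where both medians are inside or both are outside; the exercises that follow ask whether "one or both" medians lie outside, so this sketch assumes the call can be made when at least one median is outside the other group's box. The function names are made up for this example, and the quartiles come from the kiwi and memory test exercises in the Analysis section.

```python
def outside_box(value, lq, uq):
    """True when a value lies outside the box from LQ to UQ."""
    return value < lq or value > uq


def can_make_call_method_2(group_a, group_b):
    """Each group is a (LQ, median, UQ) tuple.

    Assumption: one median outside the other group's box is enough to make the call.
    """
    lq_a, median_a, uq_a = group_a
    lq_b, median_b, uq_b = group_b
    return outside_box(median_a, lq_b, uq_b) or outside_box(median_b, lq_a, uq_a)


female_kiwis = (2.6135, 2.879, 3.151)
male_kiwis = (2.0425, 2.248, 2.441)
print(can_make_call_method_2(female_kiwis, male_kiwis))  # True

female_memory = (40, 47, 57)
male_memory = (42, 50, 61)
print(can_make_call_method_2(female_memory, male_memory))  # False
```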

Example:

Problem:
I wonder if the median weight of babies born to mothers who smoked is smaller than
the median weight of babies born to mothers who didn’t smoke, for ALL participants
at Baystate Medical Center, Springfield, Mass. during 1986.

Conclusion:
(You can use either method 1 or method 2, you don’t need both)

Method 1:

50% of the weights of babies born to mothers who don’t smoke are larger than 75%
of the weights of babies born to smoking mothers, so I have enough evidence to
make the call.

Method 2:

The median weight of babies born to mothers who don’t smoke is OUTSIDE the box
of the baby weights for mothers who smoke, so I have enough evidence to make the
call.

Inference:

I can make the call, so I DO have enough evidence that the weight of babies born to
mothers who smoked tends to be smaller than the weight of babies born to mothers
who didn’t smoke, for ALL participants at Baystate Medical Center, Springfield, Mass.
during 1986.

Exercise:

1) Here is a sample of the weights of male and female kiwi birds from around NZ.

Problem:

I wonder if the weight of female kiwis tends to be heavier than the weight for male
kiwis, for ALL kiwi birds from around NZ?

Conclusion:

Method 1: Shade in 50% and 75%. Is there any overlap? Yes / No

Method 2: Do one or both medians lie outside the other box? Yes / No

Can you make the call? Yes / No

I do / don’t have enough evidence that the weight for female kiwis tends to be
heavier than the weight for male kiwis, for ALL kiwi birds from around NZ.

2)

Problem:

I wonder if the height for males tends to be taller than the height for females,
for all adults in NZ?

Conclusion:

Method 1: Shade in 50% and 75%. Is there any overlap? Yes / No

Method 2: Do one or both medians lie outside the other box? Yes / No

Can you make the call? Yes / No

I do / don’t have enough evidence that the height for males tends to be taller
than the height for females, for all adults in NZ.

3)

Problem:

I wonder if the amount of money that female students spend on ball wear tends to be
more than the amount of money males spend, for all high school students in NZ?

Conclusion:

Method 1: Shade in 50% and 75%. Is there any overlap? Yes / No

Method 2: Do one or both medians lie outside the other box? Yes / No

Can you make the call? Yes / No

I have enough evidence that the amount of money that female
students spend on ball wear tends to be more than the amount of
money males spend, for all high school students in NZ.

4)

Problem:

I wonder if the time it takes older people to complete a marathon tends to be longer
than the time it takes younger people to complete a marathon, for all marathon
runners in NZ?

Conclusion:

Method 1: Shade in 50% and 75%. Is there any overlap? Yes / No

Method 2: Do one or both medians lie outside the other box? Yes / No

Can you make the call? Yes / No

I do have enough evidence that the time it takes older people to
complete a marathon tends to be longer than the time it takes
younger people to complete a marathon, for all marathon runners
in NZ.

Sampling variation
Go to the following website: http://tiny.cc/Variation

Exercise
1) If each blue dot represents the height of a girl aged 12, why do they keep
changing each time they take another sample?

They keep changing because every time we take a different
sample we get slightly different members of the population.

Variation in the medians
Go to the following website: http://tiny.cc/VariationInMedian

Exercise
1) Look at the median (the line in the middle of the box). Notice how it changes
every time another sample is taken. Explain why this happens.

Every time we take a sample we get a slightly different group of
girls (in this case), and so we get a slightly different median.

2) Complete the following sentence. Think about the data and the medians.

If I took another sample …

I would get slightly different people in my sample, and so a
slightly different median. I would expect to make the same
conclusion at the end, because the sample has come from the
same population.
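
Sampling variation can also be seen with a small simulation. This is a sketch only: the population below is made up (1000 random heights in cm, not real data), and the sample size of 30 is an assumed value within the 20 to 40 range suggested in the Data section.

```python
import random
from statistics import median

random.seed(1)  # fixed seed so the run can be repeated

# A made-up population of 1000 heights (cm); not real data.
population = [round(random.gauss(152, 7), 1) for _ in range(1000)]

sample_medians = []
for _ in range(5):
    sample = random.sample(population, 30)   # a new random sample of 30 each time
    sample_medians.append(median(sample))

print("Population median:", median(population))
print("Sample medians:", sample_medians)
```

Each sample contains different members of the population, so the sample medians differ a little from each other and from the population median, while staying in the same general region.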
