Bivariate Data Analysis HSC Questions

The document contains a series of questions related to Bivariate Data Analysis for Advanced Mathematics students at Cherrybrook Technology High School. It covers topics such as scatterplots, correlation coefficients, least squares regression analysis, and interpretation of data. The analysis emphasizes the importance of understanding common pitfalls and key skills necessary for success in HSC exams.

Mathematics Advanced (Cherrybrook Technology High School)

Questions
ADVANCED MATHEMATICS
Statistics (Adv), S2 Interpretation and Bivariate Data (Adv)
Bivariate Data Analysis (Y12)
Teacher: Elena Efimova
Exam Equivalent Time: 109.5 minutes (based on allocation of 1.5 minutes per mark)

HISTORICAL CONTRIBUTION
S2 Interpretation and Bivariate Data has contributed an average of 6.0% per new syllabus Advanced exam since it began in 2020.
We have split the area into six sub-topics for analysis purposes which are: 1-Classifying Data (0%), 2-Bar Charts and Histograms (0.3%), 3-Other Chart Types (0.3%), 4-Summary Statistics - Box Plots (0%), 5-Summary Statistics - No Graph (1.0%) and 6-Bivariate Data Analysis (4.4%).
This analysis looks at Bivariate Data Analysis.

HSC ANALYSIS - What to expect and common pitfalls
Bivariate Data Analysis (4.4%) investigates scatterplots, correlation, lines of best fit and least squares regression analysis, and represents the key area where common questions appear in both the Adv and Std2 exams.
Common Adv-Std2 exam questions have appeared in all new syllabus Adv exams, with significant allocations of between 4-5 marks on each occasion. This is a trend we expect to continue and any revision should reflect this.
The word-heavy, narrative-style 2021 Q17 and 2020 Q27 are critical revision questions that students must review carefully.
Just as importantly, the 2022 Q24 example presented students with a new and unusually broad question to "describe and interpret the data and other information provided, with reference to the context given". Revision here should include a critical focus on an efficient structure of any solution.
Calculating a least squares regression line from raw data is a key skill that students must be able to execute efficiently. EQ-Bank Q1-3 are important revision examples covering this area.
Pitfalls: Markers' comments have highlighted issues in finding equations of best fit, interpreting gradients and identifying limitations of an equation, all areas that are thoroughly covered in this database.

1. Algebra, STD2 A2 2017 HSC 3 MC

The graph shows the relationship between infant mortality rate (deaths per 1000 live births) and life expectancy at birth (in years) for different countries.

[Graph: life expectancy at birth (years, axis 50 to 90) against infant mortality rate (deaths per 1000 live births, axis 0 to 120)]

What is the life expectancy at birth in a country which has an infant mortality rate of 60?
A. 68 years
B. 69 years
C. 86 years
D. 88 years

2. Statistics, STD2 S4 2017 HSC 12 MC

Which of the data sets graphed below has the largest positive correlation coefficient value?

[Options A to D are scatterplots, not reproduced]

3. Statistics, 2ADV S2 2023 HSC 1 MC

The number of bees leaving a hive was observed and recorded over 14 days at different times of the day.

Which Pearson's correlation coefficient best describes the observations?

A. −0.8
B. −0.2
C. 0.2
D. 0.8

4. Statistics, STD2 S4 2016 HSC 3 MC

The graph shows a scatterplot for a set of data.

Which of the following is the best approximation for the correlation coefficient of this set of data?

[Scatterplot and answer options not reproduced]

5. Statistics, STD2 S4 2011 HSC 8 MC

In which graph would the data have a correlation coefficient closest to −0.9?

[Options (A) to (D) are scatterplots, not reproduced]
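
As a rough aid for the correlation questions above, the sketch below computes Pearson's correlation coefficient directly from its definition. The data values are invented for illustration and are not read from any exam graph, and `pearson_r` is a hypothetical helper name. In the exam itself this is normally done with a calculator's statistics mode.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented data: y tends to fall as x rises, with some scatter.
xs = [1, 2, 3, 4, 5, 6]
ys = [9, 7, 8, 4, 5, 2]
r = pearson_r(xs, ys)
print(round(r, 2))  # negative and fairly close to -1 for this data
```

A value near −1 or 1 means the points sit close to a straight line, and a value near 0 means there is little linear pattern. The result always lies between −1 and 1.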

6. Statistics, STD2 S4 2013 HSC 2 MC

Which graph best shows data with a correlation closest to 0.3?

[Answer options are graphs, not reproduced]

7. Statistics, STD2 S4 2008 HSC 12 MC

A scatterplot is shown.

[Scatterplot not reproduced]

Which of the following best describes the correlation between the two variables shown?

(A) Positive
(B) Negative
(C) Positively skewed
(D) Negatively skewed

8. Statistics, STD2 S4 2007 HSC 9 MC


Which of the following would be most likely to have a positive correlation?
(A) The population of a town and the number of schools in that town
(B) The price of petrol per litre and the number of litres of petrol sold
(C) The hours training for a marathon and the time taken to complete the marathon
(D) The number of dogs per household and the number of televisions per household

9. Statistics, STD2 S4 2012 HSC 11 MC


Which of the following relationships would most likely show a negative correlation?
(A) The population of a town and the number of hospitals in that town.
(B) The hours spent training for a race and the time taken to complete the race.
(C) The price per litre of petrol and the number of people riding bicycles to work.
(D) The number of pets per household and the number of computers per household.

10. Statistics, STD2 S4 SM-Bank 1

A student claimed that as time spent swimming training increases, the time to run a 1 kilometre time trial decreases.
After collecting and analysing some data, the student found the correlation coefficient, r, to be −0.73.
What does this correlation indicate about the relationship between the time a student spends swimming training and their 1 kilometre run time trial times? (1 mark)

11. Statistics, STD2 S4 EQ-Bank 2

Pedro is planning a statistical investigation.
List the steps that Pedro must follow to execute the statistical investigation correctly. (2 marks)

12. Statistics, 2ADV S2 2021 HSC 17

Part b: RAP Data - Bottom 4%: School result (59%) was 4% below state average (63%)

For a sample of 17 inland towns in Australia, the height above sea level, x (metres), and the average maximum daily temperature, y (°C), were recorded.
The graph shows the data as well as a regression line.

[Graph: average maximum daily temperature (y °C, axis 0 to 40) against height above sea level (x metres, axis 0 to 1000), with regression line]

The equation of the regression line is [equation not reproduced].
The correlation coefficient is [value not reproduced].
a. i. By using the equation of the regression line, predict the average maximum daily temperature, in degrees Celsius, for a town that is 540 m above sea level. Give your answer correct to one decimal place. (1 mark)
ii. The gradient of the regression line is −0.011. Interpret the value of this gradient in the given context. (2 marks)
b. The graph below shows the relationship between the latitude, x (degrees south), and the average maximum daily temperature, y (°C), for the same 17 towns, as well as a regression line.

[Graph: average maximum daily temperature (y °C, axis 0 to 40) against latitude (x degrees south, axis 10 to 40), with regression line]

The equation of the regression line is [equation not reproduced].
The correlation coefficient is [value not reproduced].
Another inland town in Australia is 540 m above sea level. Its latitude is 28 degrees south.
Which measurement, height above sea level or latitude, would be better to use to predict this town’s average maximum daily temperature? Give a reason for your answer. (1 mark)

13. Statistics, 2ADV S2 EQ-Bank 2

The table below lists the average body weight (in kilograms) and average brain weight (in grams) of nine animal species.

[Table not reproduced]

A least squares regression line is fitted to the data using body weight as the independent variable.
i. Calculate the equation of the least squares regression line. (1 mark)
ii. If dingos have an average body weight of 22.3 kilograms, calculate the predicted average brain weight of a dingo using your answer to part i. (1 mark)

14. Statistics, 2ADV S2 EQ-Bank 3

The table below lists the average life span (in years) and average sleeping time (in hours/day) of 9 animal species.

[Table not reproduced]

i. Using sleeping time as the independent variable, calculate the least squares regression line. (1 mark)
ii. A wallaby species sleeps for 4.5 hours, on average, each day.
Use your equation from part i to predict its expected life span, to the nearest year. (1 mark)
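
The raw-data regression questions above are meant to be done with a calculator, but the underlying arithmetic can be sketched as follows. The data tables are not reproduced in this copy, so the values below are invented purely for illustration, and `least_squares_line` is a hypothetical helper name.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The line always passes through the point of means (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented values (NOT the exam table): sleeping time in hours/day is the
# independent variable, life span in years is the dependent variable.
sleep_hours = [2, 4, 6, 8, 10]
life_span = [30, 26, 20, 15, 9]

slope, intercept = least_squares_line(sleep_hours, life_span)
predicted = intercept + slope * 4.5  # prediction for 4.5 hours of sleep
print(slope, intercept, predicted)
```

The independent variable named in the question goes in as the x-values. Swapping the two lists gives a different line, which is a common source of lost marks.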

15. Statistics, 2ADV S2 2023 HSC 18

A university uses gas to heat its buildings. Over a period of 10 weekdays during winter, the gas used each day was measured in megawatts (MW) and the average outside temperature each day was recorded in degrees Celsius (°C).
Using x as the average daily outside temperature and y as the total daily gas usage, the equation of the least-squares regression line was found.
The equation of the regression line predicts that when the temperature is 0°C, the daily gas usage is 236 MW.
The ten temperatures measured were: 0°, 0°, 0°, 2°, 5°, 7°, 8°, 9°, 9°, 10°.
The total gas usage for the ten weekdays was 1840 MW.
In any bivariate dataset, the least-squares regression line passes through the point (x̄, ȳ), where x̄ is the sample mean of the x-values and ȳ is the sample mean of the y-values.
a. Using the information provided, plot the point (x̄, ȳ) and the y-intercept of the least-squares regression line on the grid. (3 marks)

[Grid not reproduced]

b. What is the equation of the regression line? (2 marks)
c. In the context of the dataset, identify ONE problem with using the regression line to predict gas usage when the average outside temperature is 23°C. (1 mark)

16. Statistics, 2ADV S2 EQ-Bank 1

The arm spans (in cm) and heights (in cm) for a group of 13 boys have been measured. The results are displayed in the table below.

[Table not reproduced]

The aim is to find a linear equation that allows arm span to be predicted from height.
a. What will be the independent variable in the equation? (1 mark)
b. Assuming a linear association, determine the equation of the least squares regression line that enables arm span to be predicted from height. Write this equation in terms of the variables arm span and height. Give the coefficients correct to two decimal places. (2 marks)
c. Using the equation that you have determined in part b., interpret the slope of the least squares regression line in terms of the variables height and arm span. (1 mark)
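
For the gas usage question, the figures quoted in the question are enough to pin down the regression line, because the line passes through both the y-intercept and the point of means. The sketch below is one way to work through that arithmetic under that reading. It is not the official worked solution, and the variable names are illustrative.

```python
# Figures quoted in the question.
temps = [0, 0, 0, 2, 5, 7, 8, 9, 9, 10]  # average outside temperatures (°C)
total_gas = 1840                          # total gas usage over the ten days (MW)
intercept = 236                           # predicted usage at 0°C (MW)

mean_x = sum(temps) / len(temps)          # mean temperature
mean_y = total_gas / len(temps)           # mean daily gas usage

# Slope of the line through (0, intercept) and (mean_x, mean_y).
slope = (mean_y - intercept) / (mean_x - 0)


def predict(temp):
    """Predicted daily gas usage (MW) at the given temperature (°C)."""
    return intercept + slope * temp


print(mean_x, mean_y, slope)
print(predict(23))
```

The prediction at 23°C comes out negative, which is impossible for gas usage. 23°C is also well outside the 0°C to 10°C range of the recorded temperatures, so the prediction is an extrapolation.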

17. Statistics, 2ADV S2 2022 HSC 24

Jo is researching the relationship between the ages of teenage characters in television series and the ages of actors playing these characters.
After collecting the data, Jo finds that the correlation coefficient is 0.4564.
A scatterplot showing the data is drawn. The line of best fit with equation [equation not reproduced] is also drawn.

[Scatterplot not reproduced]

Describe and interpret the data and other information provided, with reference to the context given. (4 marks)

18. Statistics, STD2 S4 EQ-Bank 4

Ten high school students have their height and the length of their right foot measured.
The results are recorded in the table below.

[Table not reproduced]

i. Using technology, calculate Pearson's correlation coefficient for the data. Give your answer to 3 decimal places. (1 mark)
ii. Describe the strength of the association between height and length of right foot for these students. (1 mark)
iii. Using technology, determine the least squares regression line that allows height to be predicted from right foot length. (1 mark)

19. Algebra, STD2 A2 2016 HSC 29e

The graph shows the life expectancy of people born between 1900 and 2000.

[Graph not reproduced]

i. According to the graph, what is the life expectancy of a person born in 1932? (1 mark)
ii. With reference to the value of the gradient, explain the meaning of the gradient in this context. (2 marks)

20. Statistics, STD2 S4 2013 HSC 28b

Part ii: RAP Data - Bottom 12%: School result (26%) was equal to state average (26%)

Ahmed collected data on the age and height of males aged 11 to 16 years.
He created a scatterplot of the data and constructed a line of best fit to model the relationship between the age and height of males.

[Scatterplot not reproduced]

i. Determine the gradient of the line of best fit shown on the graph. (1 mark)
ii. Explain the meaning of the gradient in the context of the data. (1 mark)
iii. Determine the equation of the line of best fit shown on the graph. (2 marks)
iv. Use the line of best fit to predict the height of a typical 17-year-old male. (1 mark)
v. Why would this model not be useful for predicting the height of a typical 45-year-old male? (1 mark)

21. Statistics, STD2 S4 2015 HSC 28e

The shoe size and height of ten students were recorded.

[Table and scatterplot not reproduced]

i. Complete the scatter plot AND draw a line of fit by eye. (2 marks)
ii. Use the line of fit to estimate the height difference between a student who wears a size 7.5 shoe and one who wears a size 9 shoe. (1 mark)
iii. A student calculated the correlation coefficient to be 1 for this set of data. Explain why this cannot be correct. (1 mark)

22. Statistics, STD2 S4 2019 HSC 23

A set of bivariate data is collected by measuring the height and arm span of seven children. The graph shows a scatterplot of these measurements.

[Scatterplot not reproduced]

a. Calculate Pearson's correlation coefficient for the data, correct to two decimal places. (1 mark)
b. Identify the direction and the strength of the linear association between height and arm span. (1 mark)
c. The equation of the least-squares regression line is shown.
Height = 0.866 × (arm span) + 23.7
A child has an arm span of 143 cm.
Calculate the predicted height for this child using the equation of the least-squares regression line. (1 mark)

23. Statistics, 2ADV S2 2020 HSC 27

RAP Data - Bottom 10%: School result (45%) was 1% below state average (46%)

A cricket is an insect. The male cricket produces a chirping sound.
A scientist wants to explore the relationship between the temperature in degrees Celsius and the number of cricket chirps heard in a 15-second time interval.
Once a day for 20 days, the scientist collects data. Based on the 20 data points, the scientist provides the information below.
A box-plot of the temperature data is shown.

[Box-plot not reproduced]

The mean temperature in the dataset is 0.525°C below the median temperature in the dataset.
A total of 684 chirps was counted when collecting the 20 data points.

The scientist fits a least-squares regression line using the data (x, y), where x is the temperature in degrees Celsius and y is the number of chirps heard in a 15-second time interval. The equation of this line is [equation not reproduced], where [symbol not reproduced] is the slope of the regression line.
The least-squares regression line passes through the point (x̄, ȳ), where x̄ is the sample mean of the temperature data and ȳ is the sample mean of the chirp data.
Calculate the number of chirps expected in a 15-second interval when the temperature is 19° Celsius. Give your answer correct to the nearest whole number. (5 marks)
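
For question 22 part c, the prediction is a direct substitution into the regression equation quoted in the question (Height = 0.866 × (arm span) + 23.7). A minimal sketch follows, with `predicted_height` as an illustrative function name.

```python
def predicted_height(arm_span_cm):
    """Predicted height (cm) from the regression equation quoted in the question."""
    return 0.866 * arm_span_cm + 23.7


height = predicted_height(143)
print(round(height, 1))  # → 147.5
```

An arm span of 143 cm gives 0.866 × 143 + 23.7 = 147.538, or about 147.5 cm.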

24. Statistics, STD2 S4 2006 HSC 27b

Each member of a group of males had his height and foot length measured and recorded. The results were graphed and a line of fit drawn.

[Graph not reproduced]

i. Why does the value of the y-intercept have no meaning in this situation? (1 mark)
ii. George is 10 cm taller than his brother Harry. Use the line of fit to estimate the difference in their foot lengths. (1 mark)
iii. Sam calculated a correlation coefficient of −1.2 for the data. Give TWO reasons why Sam must be incorrect. (2 marks)

25. Statistics, STD2 S4 2014* HSC 30b

Part ii: RAP Data - Bottom 21%: School result (75%) was 3% above state average (72%)
Part viii: RAP Data - Bottom 14%: School result (1%) was 1% above state average (0%)

The scatterplot shows the relationship between expenditure per primary school student, as a percentage of a country’s Gross Domestic Product (GDP), and the life expectancy in years for 15 countries.

[Scatterplot not reproduced]

i. For the given data, the correlation coefficient, r, is 0.83. What does this indicate about the relationship between expenditure per primary school student and life expectancy for the 15 countries? (1 mark)
ii. For the data representing expenditure per primary school student, the lower quartile is 8.4 and the upper quartile is 22.5.
What is the interquartile range? (1 mark)
iii. Another country has an expenditure per primary school student of 47.6% of its GDP.
Would this country be an outlier for this set of data? Justify your answer with calculations. (2 marks)
iv. On the scatterplot, draw the least-squares line of best fit [equation not reproduced]. (2 marks)
v. Using this line, or otherwise, estimate the life expectancy in a country which has an expenditure per primary school student of 18% of its GDP. (1 mark)
vi. Why is this line NOT useful for predicting life expectancy in a country which has expenditure per primary school student of 60% of its GDP? (1 mark)
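
The outlier check in question 25 parts ii and iii can be sketched as below. Two assumptions are made here: that the values 8.4 and 22.5 quoted in part ii are the lower and upper quartiles (the symbols are missing from this copy), and that the usual 1.5 × IQR rule is the intended test for outliers.

```python
# Assumed meaning of the two values quoted in part ii.
lower_quartile = 8.4
upper_quartile = 22.5

iqr = upper_quartile - lower_quartile

# Usual 1.5 x IQR fences (assumed to be the intended outlier test).
lower_fence = lower_quartile - 1.5 * iqr
upper_fence = upper_quartile + 1.5 * iqr

value = 47.6  # expenditure per student (% of GDP) for the extra country
is_outlier = value < lower_fence or value > upper_fence

print(round(iqr, 2), round(upper_fence, 2), is_outlier)
```

Under these assumptions the interquartile range is 14.1 and the upper fence is 22.5 + 1.5 × 14.1 = 43.65, so a value of 47.6 lies above the fence.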

26. Statistics, STD2 S4 2009 HSC 28b

The height and mass of a child are measured and recorded over its first two years.
This information is displayed in a scatter graph.

[Scatter graph not reproduced]

i. Describe the correlation between the height and mass of this child, as shown in the graph. (1 mark)
ii. A line of best fit has been drawn on the graph.
Find the equation of this line. (2 marks)

Copyright © 2004-22 The State of New South Wales (Board of Studies, Teaching and Educational Standards NSW)

Worked Solutions

1. Algebra, STD2 A2 2017 HSC 3 MC

[Graph: life expectancy at birth (years) against infant mortality rate (deaths per 1000 live births)]

2. Statistics, STD2 S4 2017 HSC 12 MC

3. Statistics, 2ADV S2 2023 HSC 1 MC

NOTE: Inputting all data points into a calculator is unnecessary and time consuming here.

4. Statistics, STD2 S4 2016 HSC 3 MC

5. Statistics, STD2 S4 2011 HSC 8 MC

6. Statistics, STD2 S4 2013 HSC 2 MC

7. Statistics, STD2 S4 2008 HSC 12 MC

8. Statistics, STD2 S4 2007 HSC 9 MC

9. Statistics, STD2 S4 2012 HSC 11 MC

♦ Mean mark 43%

10. Statistics, STD2 S4 SM-Bank 1

11. Statistics, STD2 S4 EQ-Bank 2

12. Statistics, 2ADV S2 2021 HSC 17

a.i.

a.ii.

b.

13. Statistics, 2ADV S2 EQ-Bank 2

i.

COMMENT: Know this critical calculator skill!

ii.

14. Statistics, 2ADV S2 EQ-Bank 3

i.

COMMENT: Issues here? YouTube has short and excellent help videos. Search your calculator model and topic, e.g. “fx-82 regression line”.

ii.

15. Statistics, 2ADV S2 2023 HSC 18

a.

b.

c.

16. Statistics, 2ADV S2 EQ-Bank 1

a.

b.

COMMENT: Calculator skills for finding the least squares regression line were required in NESA sample exam. Know this critical skill well!

c.

17. Statistics, 2ADV S2 2022 HSC 24

18. Statistics, STD2 S4 EQ-Bank 4

i.

COMMENT: Issues here? YouTube has short and excellent help videos. Search your calculator model and topic, e.g. “fx-82 correlation”.

ii.

iii.

19. Algebra, STD2 A2 2016 HSC 29e

i.

ii.

♦♦ Mean mark part (ii) 33%

20. Statistics, STD2 S4 2013 HSC 28b

i.

ii.

♦♦ Mean marks of 38%, 26% and 25% respectively for parts (i)-(iii).
MARKER’S COMMENT: Interpreting gradients has been consistently examined in recent history and almost always poorly answered.

iii.

iv.

v.

21. Statistics, STD2 S4 2015 HSC 28e

i.

ii.

iii.

♦ Mean mark 39%.

22. Statistics, STD2 S4 2019 HSC 23

a.

♦ Mean mark 40%.
COMMENT: Issues here? YouTube has short and excellent help videos. Search your calculator model and topic, e.g. “fx-82 correlation”.

b.

c.

23. Statistics, 2ADV S2 2020 HSC 27

♦ Mean mark 46%.

24. Statistics, STD2 S4 2006 HSC 27b

i.

ii.

iii.

25. Statistics, STD2 S4 2014* HSC 30b

i.

ii.

iii.

♦ Mean mark 35%

iv.

v.

♦♦ Mean mark 39%

vi.

♦♦♦ Mean mark 0%. The toughest question on the 2014 paper.
COMMENT: Examiners regularly ask students to identify and comment on outliers where linear relationships break down.

26. Statistics, STD2 S4 2009 HSC 28b


i.

♦ Mean mark 48%.

ii.

♦♦♦ Mean mark 18%.
MARKER’S COMMENT: Many students had difficulty due to the fact the horizontal axis started at [value not reproduced] and not the origin.

Copyright © 2016-2023 M2 Mathematics Pty Ltd (SmarterMaths.com.au)
