Bivariate Data Analysis HSC Questions
Questions
ADVANCED MATHEMATICS
Statistics (Adv), S2 Interpretation and Bivariate Data (Adv)
Bivariate Data Analysis (Y12)
Teacher: Elena Efimova
Exam Equivalent Time: 109.5 minutes (based on allocation of 1.5 minutes per mark)
1. Algebra, STD2 A2 2017 HSC 3 MC
The graph shows the relationship between infant mortality rate (deaths per 1000 live births) and life
expectancy at birth (in years) for different countries.
[Graph: infant mortality rate (deaths per 1000 live births) and life expectancy at birth (years)]
Which of the following is the best approximation for the correlation coefficient of this set of data?
(A)
(B)
(C)
(D)
A least squares regression line is fitted to the data using body weight as the independent variable.
i. Calculate the equation of the least squares regression line. (1 mark)
ii. If dingos have an average body weight of 22.3 kilograms, calculate the predicted average brain
weight of a dingo using your answer to part i. (1 mark)
The aim is to find a linear equation that allows arm span to be predicted from height.
a. What will be the independent variable in the equation? (1 mark)
b. Assuming a linear association, determine the equation of the least squares regression line that
enables arm span to be predicted from height. Write this equation in terms of the variables arm span
and height. Give the coefficients correct to two decimal places. (2 marks)
c. Using the equation that you have determined in part b., interpret the slope of the least squares
regression line in terms of the variables height and arm span. (1 mark)
c. In the context of the dataset, identify ONE problem with using the regression line to predict gas
usage when the average outside temperature is 23°C. (1 mark)
17. Statistics, 2ADV S2 2022 HSC 24
Jo is researching the relationship between the ages of teenage characters in television series and the
ages of actors playing these characters.
After collecting the data, Jo finds that the correlation coefficient is 0.4564.
A scatterplot showing the data is drawn. The line of best fit with equation , is
also drawn.
Describe and interpret the data and other information provided, with reference to the context given. (4
marks)
19. Algebra, STD2 A2 2016 HSC 29e
The graph shows the life expectancy of people born between 1900 and 2000.
i. According to the graph, what is the life expectancy of a person born in 1932? (1 mark)
ii. With reference to the value of the gradient, explain the meaning of the gradient in this context. (2
marks)
i. Using technology, calculate Pearson's correlation coefficient for the data. Give your answer to 3
decimal places. (1 mark)
ii. Describe the strength of the association between height and length of right foot for these students.
(1 mark)
iii. Using technology, determine the least squares regression line that allows height to be predicted
from right foot length. (1 mark)
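For readers who want to see what "using technology" computes in parts i and iii, the following is a minimal sketch in plain Python. The foot length and height values are invented for illustration only (the question's data table is not reproduced here), and the function names `pearson_r` and `least_squares_line` are hypothetical helpers, not part of any syllabus or calculator.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired samples xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the line y = gradient * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


# Hypothetical measurements in cm, invented for illustration only.
foot_length = [22.0, 23.5, 24.0, 25.5, 26.0, 27.5]
height = [155.0, 160.0, 166.0, 170.0, 171.0, 180.0]

r = pearson_r(foot_length, height)
gradient, intercept = least_squares_line(foot_length, height)
print(f"r = {r:.3f}")
print(f"height = {gradient:.2f} x (foot length) + {intercept:.2f}")
```

Foot length is passed as the first argument because it is the independent variable when height is being predicted from it.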
20. Statistics, STD2 S4 2013 HSC 28b
Part ii: RAP Data - Bottom 12%: School result (26%) was equal to state average (26%)
Ahmed collected data on the age ( ) and height ( ) of males aged 11 to 16 years.
He created a scatterplot of the data and constructed a line of best fit to model the relationship
between the age and height of males.
i. Determine the gradient of the line of best fit shown on the graph. (1 mark)
ii. Explain the meaning of the gradient in the context of the data. (1 mark)
iii. Determine the equation of the line of best fit shown on the graph. (2 marks)
iv. Use the line of best fit to predict the height of a typical 17-year-old male. (1 mark)
v. Why would this model not be useful for predicting the height of a typical 45-year-old male? (1 mark)
21. Statistics, STD2 S4 2015 HSC 28e
The shoe size and height of ten students were recorded.
ii. Use the line of fit to estimate the height difference between a student who wears a size 7.5 shoe
and one who wears a size 9 shoe. (1 mark)
iii. A student calculated the correlation coefficient to be 1 for this set of data. Explain why this cannot
be correct. (1 mark)
22. Statistics, STD2 S4 2019 HSC 23
A set of bivariate data is collected by measuring the height and arm span of seven children. The
graph shows a scatterplot of these measurements.
a. Calculate Pearson's correlation coefficient for the data, correct to two decimal places. (1 mark)
b. Identify the direction and the strength of the linear association between height and arm span. (1
mark)
c. The equation of the least-squares regression line is shown.
Height = 0.866 × (arm span) + 23.7
A child has an arm span of 143 cm.
Calculate the predicted height for this child using the equation of the least-squares regression line.
(1 mark)
23. Statistics, 2ADV S2 2020 HSC 27
RAP Data - Bottom 10%: School result (45%) was 1% below state average (46%)
A cricket is an insect. The male cricket produces a chirping sound.
A scientist wants to explore the relationship between the temperature in degrees Celsius and the
number of cricket chirps heard in a 15-second time interval.
Once a day for 20 days, the scientist collects data. Based on the 20 data points, the scientist
provides the information below.
A box-plot of the temperature data is shown.
The mean temperature in the dataset is 0.525°C below the median temperature in the dataset.
A total of 684 chirps was counted when collecting the 20 data points.
The scientist fits a least-squares regression line using the data , where is the temperature in
degrees Celsius and is the number of chirps heard in a 15-second time interval. The equation of
this line is
,
where is the slope of the regression.
The least-squares regression line passes through the point , where is the sample mean of
the temperature data and is the sample mean of the chirp data.
Calculate the number of chirps expected in a 15-second interval when the temperature is 19° Celsius.
Give your answer correct to the nearest whole number. (5 marks)
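The cricket question (2020 HSC 27) relies on the fact that a least-squares regression line passes through the point of sample means. The sketch below shows one way that reasoning might be set out, assuming the line has the form chirps = intercept + slope × temperature. The box-plot and the line's equation are not reproduced in this copy, so `median_temperature` and `intercept` are placeholder values invented for illustration; only the 684 chirps, the 20 data points, the 0.525°C difference and the 19°C temperature come from the question.

```python
def slope_through_mean_point(intercept, mean_x, mean_y):
    """Slope of y = intercept + slope * x, given the line passes through (mean_x, mean_y)."""
    return (mean_y - intercept) / mean_x


# Values stated in the question.
n_points = 20
total_chirps = 684
mean_chirps = total_chirps / n_points  # sample mean of the chirp data

# Placeholder values (assumptions): read the real ones from the box-plot
# and from the regression equation given in the exam paper.
median_temperature = 20.0
intercept = -8.0

# The mean temperature is 0.525 degrees below the median temperature.
mean_temperature = median_temperature - 0.525

slope = slope_through_mean_point(intercept, mean_temperature, mean_chirps)

# Predict the number of chirps at 19 degrees Celsius, to the nearest whole number.
predicted_chirps = round(intercept + slope * 19)
print(predicted_chirps)
```

With the true median and intercept substituted, the same three steps (mean chirps, mean temperature, slope) lead to the required prediction.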
24. Statistics, STD2 S4 2006 HSC 27b
Each member of a group of males had his height and foot length measured and recorded. The results
were graphed and a line of fit drawn.
i. Why does the value of the -intercept have no meaning in this situation? (1 mark)
ii. George is 10 cm taller than his brother Harry. Use the line of fit to estimate the difference in their
foot lengths. (1 mark)
iii. Sam calculated a correlation coefficient of −1.2 for the data. Give TWO reasons why Sam must be
incorrect. (2 marks)
25. Statistics, STD2 S4 2014* HSC 30b
Part ii: RAP Data - Bottom 21%: School result (75%) was 3% above state average (72%)
Part viii: RAP Data - Bottom 14%: School result (1%) was 1% above state average (0%)
The scatterplot shows the relationship between expenditure per primary school student, as a
percentage of a country’s Gross Domestic Product (GDP), and the life expectancy in years for 15
countries.
i. For the given data, the correlation coefficient, r, is 0.83. What does this indicate about the
relationship between expenditure per primary school student and life expectancy for the 15
countries? (1 mark)
ii. For the data representing expenditure per primary school student, the lower quartile is 8.4 and the upper quartile is 22.5.
What is the interquartile range? (1 mark)
iii. Another country has an expenditure per primary school student of 47.6% of its GDP.
Would this country be an outlier for this set of data? Justify your answer with calculations. (2 marks)
v. Using this line, or otherwise, estimate the life expectancy in a country which has an expenditure per
primary school student of 18% of its GDP. (1 mark)
vi. Why is this line NOT useful for predicting life expectancy in a country which has expenditure per
primary school student of 60% of its GDP? (1 mark)
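Parts ii and iii of this question turn on the interquartile range and the usual outlier test, in which a value counts as an outlier if it lies more than 1.5 × IQR below the lower quartile or above the upper quartile. The sketch below illustrates that rule with invented quartiles; reading the two values given in part ii as the lower and upper quartiles is an assumption, and the function names are hypothetical.

```python
def interquartile_range(lower_quartile, upper_quartile):
    """IQR is the upper quartile minus the lower quartile."""
    return upper_quartile - lower_quartile


def is_outlier(value, lower_quartile, upper_quartile):
    """Apply the 1.5 x IQR rule to a single value."""
    iqr = interquartile_range(lower_quartile, upper_quartile)
    lower_fence = lower_quartile - 1.5 * iqr
    upper_fence = upper_quartile + 1.5 * iqr
    return value < lower_fence or value > upper_fence


# Hypothetical quartiles, invented for illustration only.
q1, q3 = 10.0, 20.0

print(interquartile_range(q1, q3))  # IQR = 10.0, so the fences are -5.0 and 35.0
print(is_outlier(36.0, q1, q3))     # above the upper fence
print(is_outlier(30.0, q1, q3))     # inside the fences
```

In a written answer, the fence calculation itself is the justification the question asks for.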
This information is displayed in a scatter graph.
[Scatter graph: axis labelled in years]
Copyright © 2004-22 The State of New South Wales (Board of Studies, Teaching and Educational Standards NSW)
4. Statistics, STD2 S4 2016 HSC 3 MC
19. Algebra, STD2 A2 2016 HSC 29e
20. Statistics, STD2 S4 2013 HSC 28b
21. Statistics, STD2 S4 2015 HSC 28e
22. Statistics, STD2 S4 2019 HSC 23
24. Statistics, STD2 S4 2006 HSC 27b
25. Statistics, STD2 S4 2014* HSC 30b
♦ Mean mark 35%