
Regression

The document contains a series of mathematical problems related to regression analysis, including finding regression lines, calculating correlation coefficients, and making predictions based on given data. It covers various scenarios involving students' test scores, social media use, and temperature effects on park attendance and ice cream sales. Each problem specifies maximum marks and includes parts that require calculations and interpretations.

Uploaded by

ryurdakul123

Regression [157 marks]

1. [Maximum mark: 5] SPM.2.SL.TZ0.5


The following table shows the marks scored by seven students on two
different mathematics tests.

Let L1 be the regression line of x on y. The equation of the line L1 can be written in
the form x = ay + b.

(a) Find the value of a and the value of b. [2]

(b) Let L2 be the regression line of y on x. The lines L1 and L2 pass
through the same point with coordinates (p, q).

Find the value of p and the value of q. [3]

2. [Maximum mark: 7] SPM.2.AHL.TZ0.4


The following table shows the marks scored by seven students on two
different mathematics tests.

Let L1 be the regression line of x on y. The equation of the line L1 can be written in
the form x = ay + b.

(a) Find the value of a and the value of b. [2]

Let L2 be the regression line of y on x. The lines L1 and L2 pass through the same
point with coordinates (p, q).

(b) Find the value of p and the value of q. [3]


(c) Jennifer was absent for the first test but scored 29 marks on the
second test. Use an appropriate regression equation to estimate
Jennifer’s mark on the first test. [2]
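
Parts (b) and (c) rest on two facts: both regression lines pass through the mean point (x̄, ȳ), and a first-test mark is estimated from a second-test mark using the line of x on y. The following is a minimal Python sketch of that reasoning. The marks in it are invented for illustration only (the table is not reproduced in this copy), and the helper name `least_squares` is an assumption of the sketch.

```python
# Hypothetical marks, NOT the data from the question's table.
x = [10, 14, 18, 22, 26, 30, 34]   # first test
y = [12, 15, 21, 20, 27, 29, 33]   # second test


def mean(values):
    return sum(values) / len(values)


def least_squares(u, v):
    """Return (slope, intercept) of the least-squares regression line of v on u."""
    u_bar, v_bar = mean(u), mean(v)
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    slope = s_uv / s_uu
    return slope, v_bar - slope * u_bar


# L1: regression line of x on y, written x = a*y + b
a, b = least_squares(y, x)
# L2: regression line of y on x, written y = c*x + d
c, d = least_squares(x, y)

# The common point (p, q) is the mean point of the data.
p, q = mean(x), mean(y)
on_L1 = abs(a * q + b - p) < 1e-9
on_L2 = abs(c * p + d - q) < 1e-9

# Estimating a first-test mark from a second-test mark of 29 uses x on y.
estimate = a * 29 + b
print(on_L1, on_L2, round(estimate, 1))
```

With the real table, only the two lists change; the choice of x on y for part (c) follows from the known value being a second-test mark.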

3. [Maximum mark: 15] EXM.2.SL.TZ0.1


The principal of a high school is concerned about the effect social media use
might be having on the self-esteem of her students. She decides to survey a
random sample of 9 students to gather some data. She wants the number of
students in each grade in the sample to be, as far as possible, in the same
proportion as the number of students in each grade in the school.

(a) State the name for this type of sampling technique. [1]

The number of students in each grade in the school is shown in the table.

(b.i) Show that 3 students will be selected from grade 12. [3]

(b.ii) Calculate the number of students in each grade in the sample. [2]

In order to select the 3 students from grade 12, the principal lists their names in
alphabetical order and selects the 28th, 56th and 84th student on the list.

(c) State the name for this type of sampling technique. [1]

Once the principal has obtained the names of the 9 students in the random
sample, she surveys each student to find out how long they used social media
the previous day and measures their self-esteem using the Rosenberg scale. The
Rosenberg scale is a number between 10 and 40, where a high number
represents high self-esteem.

(d.i) Calculate Pearson’s product moment correlation coefficient, r. [2]

(d.ii) Interpret the meaning of the value of r in the context of the
principal’s concerns. [1]

(d.iii) Explain why the value of r makes it appropriate to find the
equation of a regression line. [1]

(e) Another student at the school, Jasmine, has a self-esteem value
of 29.

By finding the equation of an appropriate regression line,
estimate the time Jasmine spent on social media the previous
day. [4]

4. [Maximum mark: 7] 23M.2.SL.TZ1.4


The total number of children, y, visiting a park depends on the highest
temperature, T, in degrees Celsius (°C). A park official predicts the total number
of children visiting his park on any given day using the model
y = −0.6T² + 23T + 110, where 10 ≤ T ≤ 35.

(a) Use this model to estimate the number of children in the park
on a day when the highest temperature is 25 °C. [2]

An ice cream vendor investigates the relationship between the total number of
children visiting the park and the number of ice creams sold, x. The following
table shows the data collected on five different days.

Total number of children (y)    81   175   202   346   360
Ice creams sold (x)             15    27    23    35    46

(b) Find an appropriate regression equation that will allow the
vendor to predict the number of ice creams sold on a day when
there are y children in the park. [3]

(c) Hence, use your regression equation to predict the number of
ice creams that the vendor sells on a day when the highest
temperature is 25 °C. [2]
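
The chain of reasoning in this question (temperature gives children through the quadratic model, children give ice creams through the regression line of x on y) can be sketched in Python as below. This is one possible check under the assumption that an ordinary least-squares line of x on y is the intended "appropriate regression equation"; the function names are introduced here for illustration.

```python
def park_model(T):
    """Park official's model for the number of children at highest temperature T (°C)."""
    if not 10 <= T <= 35:
        raise ValueError("the model is stated only for 10 <= T <= 35")
    return -0.6 * T**2 + 23 * T + 110


# Data from the table in the question.
children = [81, 175, 202, 346, 360]   # y
ice_creams = [15, 27, 23, 35, 46]     # x


def least_squares(u, v):
    """Return (slope, intercept) of the least-squares regression line of v on u."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    slope = s_uv / s_uu
    return slope, v_bar - slope * u_bar


# (a) children expected at 25 °C
y_25 = park_model(25)

# (b) x is predicted from y, so regress x on y: x = slope*y + intercept
slope, intercept = least_squares(children, ice_creams)

# (c) feed the model's output into the regression line
x_25 = slope * y_25 + intercept

print(y_25)                                   # about 310 children
print(round(slope, 4), round(intercept, 2))   # about 0.0935 and 7.43
print(round(x_25))                            # about 36 ice creams
```

The regression is taken of x on y because the vendor predicts ice creams (x) from a known number of children (y).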

5. [Maximum mark: 5] 22N.2.SL.TZ0.1


The following table shows the Mathematics test scores (x) and the Science test
scores (y) for a group of eight students.

The regression line of y on x for this data can be written in the form
y = ax + b.

(a) Find the value of a and the value of b. [2]

(b) Write down the value of the Pearson’s product-moment
correlation coefficient, r. [1]

(c) Use the equation of your regression line to predict the Science
test score for a student who has a score of 78 on the
Mathematics test. Express your answer to the nearest integer. [2]

6. [Maximum mark: 7] 22M.1.SL.TZ1.3

A survey at a swimming pool is given to one adult in each family. The age of
the adult, a years old, and of their eldest child, c years old, are recorded.

The ages of the eldest child are summarized in the following box and whisker
diagram.

(a) Find the largest value of c that would not be considered an
outlier. [3]

The regression line of a on c is a = (7/4)c + 20. The regression line of c on a is
c = (1/2)a − 9.

(b.i) One of the adults surveyed is 42 years old. Estimate the age of
their eldest child. [2]

(b.ii) Find the mean age of all the adults surveyed. [2]
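
Part (b) depends on choosing the right line (c on a when the adult's age is known) and on the fact that both regression lines pass through the mean point, so their intersection gives the mean ages. A short Python sketch with exact fractions, assuming the two equations as printed above; the function names are illustrative only.

```python
from fractions import Fraction


def adult_from_child(c):
    """Regression line of a on c: a = (7/4)c + 20."""
    return Fraction(7, 4) * c + 20


def child_from_adult(a):
    """Regression line of c on a: c = (1/2)a - 9."""
    return Fraction(1, 2) * a - 9


# (b.i) the adult's age is known, so use the line of c on a
child_estimate = child_from_adult(42)

# (b.ii) both lines pass through the mean point. Substituting c = a/2 - 9
# into a = (7/4)c + 20 gives a = (7/8)a + 17/4, so a * (1 - 7/8) = 17/4.
mean_adult = Fraction(17, 4) / (1 - Fraction(7, 8))
mean_child = child_from_adult(mean_adult)

# The intersection must satisfy the other line as well.
assert adult_from_child(mean_child) == mean_adult

print(child_estimate, mean_adult, mean_child)   # 12 34 8
```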

7. [Maximum mark: 7] 21N.2.AHL.TZ0.1


In Lucy’s music academy, eight students took their piano diploma examination
and achieved scores out of 150. For her records, Lucy decided to record the
average number of hours per week each student reported practising in the
weeks prior to their examination. These results are summarized in the table
below.

(a) Find Pearson’s product-moment correlation coefficient, r, for
these data. [2]

(b) The relationship between the variables can be modelled by the
regression equation D = ah + b. Write down the value of a
and the value of b. [1]

(c) One of these eight students was disappointed with her result
and wished she had practised more. Based on the given data,
determine how her score could have been expected to alter had
she practised an extra five hours per week. [2]

(d) Lucy asserts that the number of hours a student practises has a
direct effect on their final diploma result. Comment on the
validity of Lucy’s assertion. [1]

(e) Lucy suspected that each student had not been practising as
much as they reported. In order to compensate for this, Lucy
deducted a fixed number of hours per week from each of
the students’ recorded hours.

State how, if at all, the value of r would be affected. [1]
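
The idea behind part (e), that subtracting the same constant from every value of one variable shifts the data without changing how the points scatter about a line, can be checked numerically. The sketch below uses invented hours and scores (the table is not reproduced in this copy) and a hypothetical deduction of 3 hours; `pearson_r` is a helper written for this illustration.

```python
from math import sqrt


def pearson_r(u, v):
    """Pearson's product-moment correlation coefficient of paired lists u and v."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    s_vv = sum((vi - v_bar) ** 2 for vi in v)
    return s_uv / sqrt(s_uu * s_vv)


# Hypothetical data, NOT the values from Lucy's table.
hours = [6, 8, 10, 12, 15, 18, 20, 25]
scores = [72, 80, 95, 88, 110, 105, 128, 140]

deduction = 3   # hypothetical fixed number of hours removed from each record
adjusted_hours = [h - deduction for h in hours]

r_before = pearson_r(hours, scores)
r_after = pearson_r(adjusted_hours, scores)

# The two values agree up to floating-point rounding.
print(abs(r_before - r_after) < 1e-12)
```

The deviations from the mean, which are all that r depends on, are unchanged by the shift, so the same comparison holds for any fixed deduction.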

8. [Maximum mark: 7] 21M.2.SL.TZ1.2


The following table shows the data collected from an experiment.

The data is also represented on the following scatter diagram.


The relationship between x and y can be modelled by the regression line of y
on x with equation y = ax + b, where a, b ∈ R.

(a) Write down the value of a and the value of b. [2]

(b) Use this model to predict the value of y when x = 18. [2]

(c) Write down the value of x̄ and the value of ȳ. [1]

(d) Draw the line of best fit on the scatter diagram. [2]

9. [Maximum mark: 6] 21M.2.SL.TZ2.1


At a café, the waiting time between ordering and receiving a cup of coffee is
dependent upon the number of customers who have already ordered their
coffee and are waiting to receive it.

Sarah, a regular customer, visited the café on five consecutive days. The
following table shows the number of customers, x, ahead of Sarah who have
already ordered and are waiting to receive their coffee and Sarah’s waiting time,
y minutes.

The relationship between x and y can be modelled by the regression line of y
on x with equation y = ax + b.

(a.i) Find the value of a and the value of b. [2]

(a.ii) Write down the value of Pearson’s product-moment correlation
coefficient, r. [1]

(b) Interpret, in context, the value of a found in part (a)(i). [1]

(c) On another day, Sarah visits the café to order a coffee. Seven
customers have already ordered their coffee and are waiting to
receive it.

Use the result from part (a)(i) to estimate Sarah’s waiting time to
receive her coffee. [2]

10. [Maximum mark: 6] 20N.2.SL.TZ0.S_2


Lucy sells hot chocolate drinks at her snack bar and has noticed that she sells
more hot chocolates on cooler days. On six different days, she records the
maximum daily temperature, T, measured in degrees centigrade, and the
number of hot chocolates sold, H. The results are shown in the following table.

The relationship between H and T can be modelled by the regression line
with equation H = aT + b.

(a.i) Find the value of a and of b. [3]

(a.ii) Write down the correlation coefficient. [1]

(b) Using the regression equation, estimate the number of hot
chocolates that Lucy will sell on a day when the maximum
temperature is 12 °C. [2]

11. [Maximum mark: 6] 19N.2.SL.TZ0.S_1


The number of messages, M, that six randomly selected teenagers sent during
the month of October is shown in the following table. The table also shows the
time, T, that they spent talking on their phone during the same month.

The relationship between the variables can be modelled by the regression
equation M = aT + b.

(a) Write down the value of a and of b. [3]

(b) Use your regression equation to predict the number of
messages sent by a teenager that spent 154 minutes talking on
their phone in October. [3]

12. [Maximum mark: 7] 19N.3.AHL.TZ0.Hsp_1


Peter, the Principal of a college, believes that there is an association between the
score in a Mathematics test, X, and the time taken to run 500 m, Y seconds, of
his students. The following paired data are collected.

It can be assumed that (X, Y) follow a bivariate normal distribution with
product moment correlation coefficient ρ.

(a.i) State suitable hypotheses H0 and H1 to test Peter’s claim,
using a two-tailed test. [1]

(a.ii) Carry out a suitable test at the 5 % significance level. With
reference to the p-value, state your conclusion in the context of
Peter’s claim. [4]

(b) Peter uses the regression line of y on x as
y = 0.248x + 83.0 and calculates that a student with a
Mathematics test score of 73 will have a running time of 101
seconds. Comment on the validity of his calculation. [2]
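
Peter's arithmetic in part (b) can be reproduced with the short Python sketch below, using only the regression equation quoted in the question. It checks the substitution alone; whether the prediction is valid also depends on the outcome of the test in part (a) and on the range of the collected data, neither of which is reproduced in this copy. The function name is an assumption of the sketch.

```python
def predicted_time(score):
    """Running time in seconds from Peter's line y = 0.248x + 83.0."""
    return 0.248 * score + 83.0


estimate = predicted_time(73)
print(round(estimate))   # 101, matching Peter's figure
```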

13. [Maximum mark: 6] 19M.1.SL.TZ2.T_2


Colorado beetles are a pest, which can cause major damage to potato crops. For
a certain Colorado beetle the amount of oxygen, in millilitres (ml), consumed
each day increases with temperature as shown in the following table.

This information has been used to plot a scatter diagram.


(a) Find the equation of the regression line of y on x. [2]

The mean point has coordinates (20, 230).

(b) Draw the regression line of y on x on the scatter diagram. [2]

(c) In order to estimate the amount of oxygen consumed, this
regression line is considered to be reliable for a temperature x
such that a ≤ x ≤ b.

Write down the value of a and of b. [2]

14. [Maximum mark: 6] 19M.1.SL.TZ2.T_2


Colorado beetles are a pest, which can cause major damage to potato crops. For
a certain Colorado beetle the amount of oxygen, in millilitres (ml), consumed
each day increases with temperature as shown in the following table.

This information has been used to plot a scatter diagram.


(a) Find the equation of the regression line of y on x. [2]

The mean point has coordinates (20, 230).

(b) Draw the regression line of y on x on the scatter diagram. [2]

(c) In order to estimate the amount of oxygen consumed, this
regression line is considered to be reliable for a temperature x
such that a ≤ x ≤ b.

Write down the value of a and of b. [2]

15. [Maximum mark: 6] 19M.2.SL.TZ1.S_5


A jigsaw puzzle consists of many differently shaped pieces that fit together to
form a picture.

Jill is doing a 1000-piece jigsaw puzzle. She started by sorting the edge pieces
from the interior pieces. Six times she stopped and counted how many of each
type she had found. The following table indicates this information.

Jill models the relationship between these variables using the regression
equation y = ax + b.

(a) Write down the value of a and of b. [3]

(b) Use the model to predict how many edge pieces she had found
when she had sorted a total of 750 pieces. [3]

16. [Maximum mark: 16] 19M.2.SL.TZ1.T_1


A healthy human body temperature is 37.0 °C. Eight people were medically
examined and the difference in their body temperature (°C), from 37.0 °C, was
recorded. Their heartbeat (beats per minute) was also recorded.

(a) Draw a scatter diagram for temperature difference from 37 °C (x)
against heartbeat (y). Use a scale of 2 cm for 0.1 °C on the
horizontal axis, starting with −0.3 °C. Use a scale of 1 cm for 2
heartbeats per minute on the vertical axis, starting with 60 beats
per minute. [4]

(b.i) Write down, for this set of data, the mean temperature
difference from 37 °C, x̄. [1]

(b.ii) Write down, for this set of data, the mean number of heartbeats
per minute, ȳ. [1]

(c) Plot and label the point M(x̄, ȳ) on the scatter diagram. [2]

(d.i) Use your graphic display calculator to find the Pearson’s
product–moment correlation coefficient, r. [2]

(d.ii) Hence describe the correlation between temperature difference
from 37 °C and heartbeat. [2]

(e) Use your graphic display calculator to find the equation of the
regression line y on x. [2]

(f) Draw the regression line y on x on the scatter diagram. [2]

17. [Maximum mark: 6] 19M.2.SL.TZ2.S_1


A group of 7 adult men wanted to see if there was a relationship between their
Body Mass Index (BMI) and their waist size. Their waist sizes, in centimetres, were
recorded and their BMI calculated. The following table shows the results.

The relationship between x and y can be modelled by the regression equation
y = ax + b.

(a.i) Write down the value of a and of b. [3]

(a.ii) Find the correlation coefficient. [1]

(b) Use the regression equation to estimate the BMI of an adult
man whose waist size is 95 cm. [2]

18. [Maximum mark: 14] 18N.2.SL.TZ0.T_1


The marks obtained by nine Mathematical Studies SL students in their projects
(x) and their final IB examination scores (y) were recorded. These data were used
to determine whether the project mark is a good predictor of the examination
score. The results are shown in the table.

(a.i) Use your graphic display calculator to write down x̄, the mean
project mark. [1]

(a.ii) Use your graphic display calculator to write down ȳ, the mean
examination score. [1]

(a.iii) Use your graphic display calculator to write down r, Pearson’s
product–moment correlation coefficient. [2]

The equation of the regression line y on x is y = mx + c.

(b.i) Find the exact value of m and of c for these data. [2]

(b.ii) Show that the point M(x̄, ȳ) lies on the regression line y on x. [2]

A tenth student, Jerome, obtained a project mark of 17.

(c.i) Use the regression line y on x to estimate Jerome’s examination
score. [2]

(c.ii) Justify whether it is valid to use the regression line y on x to
estimate Jerome’s examination score. [2]

(d) In his final IB examination Jerome scored 65.

Calculate the percentage error in Jerome’s estimated
examination score. [2]
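
The percentage error in part (d) compares the estimate from part (c.i) with the exact score of 65. A minimal Python sketch of the usual formula, |estimate − exact| / |exact| × 100, is given below. The estimated value of 60 is a placeholder invented for the sketch, since the data table is not reproduced in this copy; the function name is also illustrative.

```python
def percentage_error(estimated, exact):
    """Percentage error of an estimate relative to the exact value."""
    return abs(estimated - exact) / abs(exact) * 100


exact_score = 65           # Jerome's actual examination score, from the question
hypothetical_estimate = 60 # placeholder only, NOT the answer to part (c.i)

error = percentage_error(hypothetical_estimate, exact_score)
print(f"{error:.2f} %")
```

Dividing by the exact value rather than the estimate is what makes this a percentage error in the estimate.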

19. [Maximum mark: 6] 18N.2.SL.TZ0.S_2


The following table shows the hand lengths and the heights of five athletes on a
sports team.

The relationship between x and y can be modelled by the regression line with
equation y = ax + b.

(a.i) Find the value of a and of b. [3]

(a.ii) Write down the correlation coefficient. [1]

(b) Another athlete on this sports team has a hand length of 21.5
cm. Use the regression equation to estimate the height of this
athlete. [2]

20. [Maximum mark: 6] 18M.1.SL.TZ1.T_4


A scientist measures the concentration of dissolved oxygen, in milligrams per
litre (y), in a river. She takes 10 readings at different temperatures, measured in
degrees Celsius (x).

The results are shown in the table.

It is believed that the concentration of dissolved oxygen in the river varies
linearly with the temperature.

(a.i) For these data, find Pearson’s product-moment correlation
coefficient, r. [2]

(a.ii) For these data, find the equation of the regression line y on x. [2]

(b) Using the equation of the regression line, estimate the
concentration of dissolved oxygen in the river when the
temperature is 18 °C. [2]

21. [Maximum mark: 6] 18M.1.SL.TZ2.T_1


The following scatter diagram shows the scores obtained by seven students in
their mathematics test, m, and their physics test, p.

The mean point, M, for these data is (40, 16).

(a) Plot and label the point M(m̄, p̄) on the scatter diagram. [2]

(b) Draw the line of best fit, by eye, on the scatter diagram. [2]

(c) Using your line of best fit, estimate the physics test score for a
student with a score of 20 in their mathematics test. [2]

© International Baccalaureate Organization, 2024
