Assignment_on_Probability_and_statistics

The document outlines an assignment on probability and statistics worth 400 marks, consisting of various problems related to probability calculations, data analysis, regression, and correlation coefficients. It includes tasks such as finding probabilities, interpreting statistical relationships, and conducting hypothesis tests. The assignment covers a range of topics including cumulative frequency, sampling techniques, and information theory.

Assignment on Probability and statistics [400 marks]

1. [Maximum mark: 6] SPM.1.SL.TZ0.13


Mr Burke teaches a mathematics class with 15 students. In this class there are 6 female students and 9 male students.

Each day Mr Burke randomly chooses one student to answer a homework question.

(a) Find the probability that on any given day Mr Burke chooses a female student to answer a question. [1]

In the first month, Mr Burke will teach his class 20 times.

(b) Find the probability he will choose a female student 8 times. [2]

(c) Find the probability he will choose a male student at most 9 times. [3]
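As a numerical check on parts (a) to (c), the following is a minimal sketch that assumes the daily choices are independent, so the number of female students chosen in 20 lessons follows a binomial distribution. The helper names `binom_pmf` and `binom_cdf` are illustrative and not part of the question.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ B(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


n_lessons = 20
p_female = 6 / 15  # part (a): 6 of the 15 students are female
p_male = 9 / 15

# Part (b): exactly 8 female students chosen in 20 lessons.
p_eight_females = binom_pmf(8, n_lessons, p_female)

# Part (c): at most 9 male students chosen in 20 lessons.
p_at_most_nine_males = binom_cdf(9, n_lessons, p_male)

print(round(p_female, 3), round(p_eight_females, 4), round(p_at_most_nine_males, 4))
```

Choosing at most 9 male students is the same event as choosing at least 11 female students, which gives an independent way to check part (c).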

2. [Maximum mark: 6] SPM.1.SL.TZ0.3


At the end of a school day, the Headmaster conducted a survey asking students in how many classes they had used the internet.

The data is shown in the following table.

(a) State whether the data is discrete or continuous. [1]

The mean number of classes in which a student used the internet is 2.

(b) Find the value of k. [4]

(c) It was not possible to ask every person in the school, so the Headmaster arranged the student names in
alphabetical order and then asked every 10th person on the list.

Identify the sampling technique used in the survey. [1]

3. [Maximum mark: 17] SPM.2.SL.TZ0.3


The Malvern Aquatic Center hosted a 3 metre spring board diving event. The judges, Stan and Minsun, awarded 8 competitors a
score out of 10. The raw data is collated in the following table.

(a.i) Write down the value of the Pearson’s product–moment correlation coefficient, r. [2]

(a.ii) Using the value of r, interpret the relationship between Stan’s score and Minsun’s score. [2]

(b) Write down the equation of the regression line y on x. [2]

(c.i) Use your regression equation from part (b) to estimate Minsun’s score when Stan awards a perfect 10. [2]

(c.ii) State whether this estimate is reliable. Justify your answer. [2]

The Commissioner for the event would like to find the Spearman’s rank correlation coefficient.

(d) Copy and complete the information in the following table.

[2]

(e.i) Find the value of the Spearman’s rank correlation coefficient, $r_s$. [2]

(e.ii) Comment on the result obtained for $r_s$. [2]

(f) The Commissioner believes Minsun’s score for competitor G is too high and so decreases the score from 9.5 to 9.1.

Explain why the value of the Spearman’s rank correlation coefficient $r_s$ does not change. [1]

4. [Maximum mark: 7] EXN.1.SL.TZ0.4


A food scientist measures the weights of 760 potatoes taken from a single field and the distribution of the weights is shown by the
cumulative frequency curve below.

(a) Find the number of potatoes in the sample with a weight of more than 200 grams. [2]

(b.i) Find the median weight. [1]

(b.ii) Find the lower quartile. [1]


(b.iii) Find the upper quartile. [1]

(c) The weight of the smallest potato in the sample is 20 grams and the weight of the largest is 400 grams.

Use the scale shown below to draw a box and whisker diagram showing the distribution of the weights of the
potatoes. You may assume there are no outliers.

[2]

5. [Maximum mark: 7] EXN.1.SL.TZ0.8


The water temperature (T) in Lake Windermere is measured on the first day of eight consecutive months (m) from January to
August (months 1 to 8) and the results are shown below. The value for May (month 5) has been accidentally deleted.

(a) Assuming the data follows a linear model for this period, find the regression line of T on m for the remaining
data. [2]

(b) Use your line to find an estimate for the water temperature on the first day of May. [2]

(c.i) Explain why your line should not be used to estimate the value of m at which the temperature is 10.0 °C. [1]

(c.ii) Explain in context why your line should not be used to predict the value for December (month 12). [1]

(d) State a more appropriate model for the water temperature in the lake over an extended period of time. You are
not expected to calculate any parameters. [1]

6. [Maximum mark: 7] EXN.1.AHL.TZ0.12


It is believed that the power P of a signal at a point d km from an antenna is inversely proportional to $d^n$, where $n \in \mathbb{Z}^+$.

The value of P is recorded at distances of 1 m to 5 m and the values of $\log_{10} d$ and $\log_{10} P$ are plotted on the graph below.

(a) Explain why this graph indicates that P is inversely proportional to $d^n$. [2]

The values of $\log_{10} d$ and $\log_{10} P$ are shown in the table below.

(b) Find the equation of the least squares regression line of $\log_{10} P$ against $\log_{10} d$. [2]

(c.i) Use your answer to part (b) to write down the value of n to the nearest integer. [1]

(c.ii) Find an expression for P in terms of d. [2]

7. [Maximum mark: 2] EXN.2.SL.TZ0.c


Which of the correlation coefficients would you recommend is used to assess whether or not there is an association
between total number of minutes late and distance from school? Fully justify your answer. [2]

8. [Maximum mark: 10] EXM.1.AHL.TZ0.15


Adesh wants to model the cooling of a metal rod. He heats the rod and records its temperature as it cools.

He believes the temperature can be modeled by $T(t) = ae^{bt} + 25$, where $a, b \in \mathbb{R}$.

(a) Show that ln (T − 25) = bt + ln a . [2]

(b) Find the equation of the regression line of ln (T − 25) on t. [3]

Hence

(c.i) find the value of a and of b. [3]

(c.ii) predict the temperature of the metal rod after 3 minutes. [2]

9. [Maximum mark: 18] 23N.2.AHL.TZ0.2


The heights, h, of 200 university students are recorded in the following table.

(a.i) Write down the mid-interval value of 140 ≤ h < 160. [1]

(a.ii) Calculate an estimate of the mean height of the 200 students. [2]

This table is used to create the following cumulative frequency graph.

(b) Use the cumulative frequency curve to estimate the interquartile range. [2]

Laszlo is a student in the data set and his height is 204 cm.

(c) Use your answer to part (b) to estimate whether Laszlo’s height is an outlier for this data. Justify your answer. [3]

It is believed that the heights of university students follow a normal distribution with mean 176 cm and standard deviation 13.5 cm.

It is decided to perform a $\chi^2$ goodness of fit test on the data to determine whether this sample of 200 students could have
plausibly been drawn from an underlying distribution $N(176, 13.5^2)$.

(d) Write down the null and the alternative hypotheses for the test. [2]

As part of the test, the following table is created.

(e.i) Find the value of a and the value of b. [4]

(e.ii) Hence, perform the test to a 5% significance level, clearly stating the conclusion in context. [4]

10. [Maximum mark: 27] 23N.3.AHL.TZ0.2

This question is about applying ideas from logarithms, calculus and probability to an unfamiliar mathematical theory called
information theory.

Claude Shannon developed a mathematical theory called information theory to measure the information gained when random
events occur. He defined the information, I , that is gained when an event with probability p occurs as

I = − ln p

where 0 < p ≤ 1 . For example, no information is gained (I = 0) when an event is certain to occur (p = 1) .

(a.i) Sketch the graph of I = − ln p , for 0 < p ≤ 1 , labelling all axes intercepts and asymptotes. [3]

(a.ii) Show, using calculus, that I is a decreasing function of p. [3]

(a.iii) Interpret what “I is a decreasing function of p” means in the given context. [1]

(b) A computer selects at random an integer x from 1 to 10, inclusive. Each outcome is equally likely.

Alessia is trying to determine the value of x and asks if x is odd.

(b.i) Write down the probability that x is odd. [1]

(b.ii) Alessia is told that x is odd. Find how much information Alessia gains. [2]

The computer then selects at random an integer y from 1 to 10, inclusive. Each outcome is equally likely.

Daniel is trying to determine the value of y and asks if y is 7. He is told that it is not 7.

(b.iii) Find how much information Daniel gains. [2]

If a random variable has n possible outcomes with probabilities $p_1, p_2, \ldots, p_n$, then the expected information gained, $E(I)$, is
defined as

$$E(I) = \sum_{r=1}^{n} -p_r \ln p_r .$$

(c) For the integer guessing game described in part (b), when Daniel asks if y is 7, there are two possible outcomes: “y
is 7” or “y is not 7”.

(c.i) Show that the expected information gained by Daniel is 0.325, correct to three significant figures. [2]

(c.ii) Alessia asks if x is odd. Show that her expected information gained is greater than Daniel’s expected information
gained. [2]

Information theory can be applied to a variety of situations.

(d) When a coin is flipped, the outcome is either heads or tails. The coin may be biased. Let p be the probability of the
outcome being heads.

(d.i) Find, in terms of p, the information gained when the outcome is tails. [1]

(d.ii) Find, in terms of p, the expected information gained when the coin is flipped once. [1]

(d.iii) Hence, find the value of p when the expected information gained is maximized. [2]
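The definitions of I and E(I) used in parts (b) to (d) can be evaluated directly. The following is a minimal sketch; the function names `information` and `expected_information` are illustrative, and the coarse grid search in the last step is only a numerical sanity check, not the calculus argument that "Hence" in part (d.iii) asks for.

```python
from math import log


def information(p: float) -> float:
    """Information gained, I = -ln p, when an event of probability p occurs."""
    return -log(p)


def expected_information(probs: list[float]) -> float:
    """E(I) = sum of -p_r ln p_r over all outcomes with p_r > 0."""
    return sum(-p * log(p) for p in probs if p > 0)


# Part (b): information gained from a single answer.
alessia_gain = information(5 / 10)   # told that x is odd
daniel_gain = information(9 / 10)    # told that y is not 7

# Part (c): expected information for each question before the answer is known.
daniel_expected = expected_information([1 / 10, 9 / 10])
alessia_expected = expected_information([5 / 10, 5 / 10])

# Part (d): coin with P(heads) = p; search a grid of p values for the maximum.
candidates = [k / 100 for k in range(1, 100)]
best_p = max(candidates, key=lambda p: expected_information([p, 1 - p]))

print(round(daniel_expected, 3), round(alessia_expected, 3), best_p)
```

The comparison in part (c.ii) reflects a general property of this measure: a question whose two answers are equally likely yields more expected information than a question with one very likely answer.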

A famous puzzle uses 12 balls which appear identical. 11 have the same weight, but one is either lighter or heavier than the others.
A pair of scales can be repeatedly used to compare the weights of different combinations of the balls.
The outcome of each weighing can be “balanced”, “left-hand side heavier” or “right-hand side heavier”. The aim of the puzzle is to
identify the ball which is the different weight, and whether it is heavier or lighter than the others, in as few weighings as possible.

(e) Angela wants to decide how many balls should be compared to each other in the first weighing. She produces
the following table to help plan her strategy.

(e.i) Find the value of x. Justify your answer. [2]

(e.ii) Find the value of y. [2]

(e.iii) Find the value of z. [2]

(e.iv) Use the table to suggest the best choice for Angela’s first weighing. Justify your answer. [1]

11. [Maximum mark: 7] 23M.1.SL.TZ2.3


In a school, 200 students solved a problem in a mathematics competition. Their times to solve the problem were recorded and the
following cumulative frequency graph was produced.

(a) Use the graph to find

(a.i) the median time; [1]

(a.ii) the lower quartile; [1]

(a.iii) the upper quartile; [1]

(a.iv) the interquartile range. [1]

Cedric took 14 seconds to solve the problem.

(b) Determine whether Cedric’s time is an outlier. [3]

12. [Maximum mark: 6] 23M.1.SL.TZ2.4


At a running club, Sung-Jin conducts a test to determine if there is any association between an athlete’s age and their best time
taken to run 100 m. Eight athletes are chosen at random, and their details are shown below.

Athlete          A     B     C     D     E     F     G     H
Age (years)      13    17    22    18    19    25    11    36
Time (seconds)   13.4  14.6  13.4  12.9  12.0  11.8  17.0  13.1

Sung-Jin decides to calculate the Spearman’s rank correlation coefficient for his set of data.

(a) Complete the table of ranks.

Athlete A B C D E F G H

Age rank 3

Time rank 1

[2]

(b) Calculate the Spearman’s rank correlation coefficient, $r_s$. [2]

(c) Interpret this value of $r_s$ in the context of the question. [1]

(d) Suggest a mathematical reason why Sung-Jin may have decided not to use Pearson’s product-moment
correlation coefficient with his data from the original table. [1]
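For reference, the calculation in parts (a) and (b) can be sketched as below. This assumes both variables are ranked in ascending order (smallest value gets rank 1) and that tied values share the mean of the ranks they occupy. The direction used in the paper's partly completed rank table cannot be recovered from the text, but the coefficient is the same as long as both variables are ranked in the same direction. The helper names are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """Rank values ascending (smallest = 1); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson's product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


ages = [13, 17, 22, 18, 19, 25, 11, 36]                    # athletes A to H
times = [13.4, 14.6, 13.4, 12.9, 12.0, 11.8, 17.0, 13.1]

age_ranks = average_ranks(ages)
time_ranks = average_ranks(times)

# Spearman's r_s is Pearson's r applied to the ranks.
r_s = pearson(age_ranks, time_ranks)
print(age_ranks, time_ranks, round(r_s, 3))
```

Applying Pearson's formula to the ranks is used here instead of the shortcut based on squared rank differences, because athletes A and C have tied times.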

13. [Maximum mark: 4] 23M.1.SL.TZ2.5


The following frequency distribution table shows the test grades for a group of students.

Grade       1   2   3   4   5   6   7
Frequency   1   4   7   9   p   9   4

For this distribution, the mean grade is 4.5.

(a) Write down the total number of students in terms of p. [1]

(b) Calculate the value of p. [3]
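The mean condition in part (b) is a linear equation in p. The following is a minimal sketch of solving it exactly with rational arithmetic; the variable names are illustrative.

```python
from fractions import Fraction

# Known frequencies from the table; grade 5 has the unknown frequency p.
known_frequencies = {1: 1, 2: 4, 3: 7, 4: 9, 6: 9, 7: 4}
unknown_grade = 5
target_mean = Fraction(9, 2)  # the stated mean grade, 4.5

n_known = sum(known_frequencies.values())                  # students excluding p
total_known = sum(g * f for g, f in known_frequencies.items())

# Part (a): total number of students is n_known + p.
# Part (b): (total_known + 5p) / (n_known + p) = 4.5, rearranged for p.
p = (target_mean * n_known - total_known) / (unknown_grade - target_mean)

print(n_known, total_known, p)
```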

14. [Maximum mark: 6] 23M.1.SL.TZ2.6


A company that owns many restaurants wants to determine if there are differences in the quality of the food cooked for three
different meals: breakfast, lunch and dinner.

Their quality assurance team randomly selects 500 items of food to inspect. The quality of this food is classified as perfect,
satisfactory, or poor. The data is summarized in the following table.

An item of food is chosen at random from these 500.

(a) Find the probability that its quality is not perfect, given that it is from breakfast. [2]

A $\chi^2$ test at the 5% significance level is carried out to determine if there is significant evidence of a difference in the quality of the
food cooked for the three meals.

The critical value for this test is 9.488.

The hypotheses for this test are:


$H_0$: The quality of the food and the type of meal are independent.
$H_1$: The quality of the food and the type of meal are not independent.

(b) Find the $\chi^2$ statistic. [2]

(c) State, with justification, the conclusion for this test. [2]

15. [Maximum mark: 7] 23M.1.AHL.TZ1.8


The random variables (X, Y) follow a bivariate normal distribution with product-moment correlation coefficient ρ. The values of
six random observations of (X, Y) are shown in the table.

x    6.3   4.1   5.6   9.2    7.8   8.2
y    9.2   4.9   8.9   10.3   8.9   9.8

(a) State null and alternative hypotheses which could be used to test whether there is a linear correlation between X
and Y. [2]

(b) Determine the value of

(b.i) the product-moment correlation coefficient, r, of the sample. [1]

(b.ii) the corresponding p-value. [2]

(c) State whether your result from part (b)(ii) indicates there is sufficient evidence to claim that, at the 5%
significance level, X and Y are not linearly correlated.

Give a reason for your answer. [2]

16. [Maximum mark: 9] 23M.1.AHL.TZ2.9


At a running club, Sung-Jin conducts a test to determine if there is any association between an athlete’s age and their best time
taken to run 100 m. Eight athletes are chosen at random, and their details are shown below.

Athlete          A     B     C     D     E     F     G     H
Age (years)      13    17    22    18    19    25    11    36
Time (seconds)   13.4  14.6  13.4  12.9  12.0  11.8  17.0  13.1

Sung-Jin decides to calculate the Spearman’s rank correlation coefficient for his set of data.

(a) Complete the table of ranks.

Athlete A B C D E F G H

Age rank 3

Time rank 1

[2]

(b) Calculate the Spearman’s rank correlation coefficient, $r_s$. [2]

(c) Interpret this value of $r_s$ in the context of the question. [1]

(d) Suggest a mathematical reason why Sung-Jin may have decided not to use Pearson’s product-moment
correlation coefficient with his data from the original table. [1]

(e.i) Find the coefficient of determination for the data from the original table. [2]

(e.ii) Interpret this value in the context of the question. [1]

17. [Maximum mark: 9] 23M.1.AHL.TZ2.9


At a running club, Sung-Jin conducts a test to determine if there is any association between an athlete’s age and their best time
taken to run 100 m. Eight athletes are chosen at random, and their details are shown below.

Athlete          A     B     C     D     E     F     G     H
Age (years)      13    17    22    18    19    25    11    36
Time (seconds)   13.4  14.6  13.4  12.9  12.0  11.8  17.0  13.1

Sung-Jin decides to calculate the Spearman’s rank correlation coefficient for his set of data.

(a) Complete the table of ranks.

Athlete A B C D E F G H

Age rank 3

Time rank 1

[2]

(b) Calculate the Spearman’s rank correlation coefficient, $r_s$. [2]

(c) Interpret this value of $r_s$ in the context of the question. [1]

(d) Suggest a mathematical reason why Sung-Jin may have decided not to use Pearson’s product-moment
correlation coefficient with his data from the original table. [1]

(e.i) Find the coefficient of determination for the data from the original table. [2]

(e.ii) Interpret this value in the context of the question. [1]

18. [Maximum mark: 15] 23M.2.SL.TZ1.1


The mean annual temperatures for Earth, recorded at fifty-year intervals, are shown in the table.

Year (x)             1708   1758   1808   1858   1908   1958   2008
Temperature °C (y)   8.73   9.22   9.10   9.12   9.13   9.45   9.76

Tami creates a linear model for this data by finding the equation of the straight line passing through the points with coordinates
(1708, 8.73) and (1958, 9.45).

(a) Calculate the gradient of the straight line that passes through these two points. [2]

(b.i) Interpret the meaning of the gradient in the context of the question. [1]

(b.ii) State appropriate units for the gradient. [1]

(c) Find the equation of this line giving your answer in the form y = mx + c . [2]

(d) Use Tami’s model to estimate the mean annual temperature in the year 2000. [2]

Thandizo uses linear regression to obtain a model for the data.

(e.i) Find the equation of the regression line y on x. [2]

(e.ii) Find the value of r, the Pearson’s product-moment correlation coefficient. [1]

(f) Use Thandizo’s model to estimate the mean annual temperature in the year 2000. [2]

Thandizo uses his regression line to predict the year when the mean annual temperature will first exceed 15 °C.

(g) State two reasons why Thandizo’s prediction may not be valid. [2]
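The two models in this question can be compared numerically. The following is a minimal sketch: Tami's line through two chosen points, and Thandizo's least squares regression line of y on x computed from the usual sums of squares. The function names `tami` and `thandizo` are illustrative.

```python
from math import sqrt

years = [1708, 1758, 1808, 1858, 1908, 1958, 2008]
temps = [8.73, 9.22, 9.10, 9.12, 9.13, 9.45, 9.76]

# Tami's model: straight line through (1708, 8.73) and (1958, 9.45).
(x1, y1), (x2, y2) = (1708, 8.73), (1958, 9.45)
tami_gradient = (y2 - y1) / (x2 - x1)        # degrees Celsius per year
tami_intercept = y1 - tami_gradient * x1


def tami(year: float) -> float:
    return tami_gradient * year + tami_intercept


# Thandizo's model: least squares regression line of y on x.
n = len(years)
mean_x = sum(years) / n
mean_y = sum(temps) / n
sxx = sum((x - mean_x) ** 2 for x in years)
syy = sum((y - mean_y) ** 2 for y in temps)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, temps))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r = sxy / sqrt(sxx * syy)                    # Pearson's correlation coefficient


def thandizo(year: float) -> float:
    return slope * year + intercept


print(tami_gradient, tami(2000))
print(slope, intercept, r, thandizo(2000))
```

Both estimates for the year 2000 are interpolations inside the recorded range, whereas a prediction for 15 °C lies far outside it, which is relevant to part (g).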

19. [Maximum mark: 13] 23M.2.AHL.TZ1.1


The mean annual temperatures for Earth, recorded at fifty-year intervals, are shown in the table.

Year (x)             1708   1758   1808   1858   1908   1958   2008
Temperature °C (y)   8.73   9.22   9.10   9.12   9.13   9.45   9.76

Tami creates a linear model for this data by finding the equation of the straight line passing through the points with coordinates
(1708, 8.73) and (1958, 9.45).

(a) Calculate the gradient of the straight line that passes through these two points. [2]

(b.i) Interpret the meaning of the gradient in the context of the question. [1]

(b.ii) State appropriate units for the gradient. [1]

(c) Find the equation of this line giving your answer in the form y = mx + c . [2]

(d) Use Tami’s model to estimate the mean annual temperature in the year 2000. [2]

Thandizo uses linear regression to obtain a model for the data.

(e.i) Find the equation of the regression line y on x. [2]

(e.ii) Find the value of r, the Pearson’s product-moment correlation coefficient. [1]

(f) Use Thandizo’s model to estimate the mean annual temperature in the year 2000. [2]

20. [Maximum mark: 17] 23M.2.AHL.TZ1.3

A large international sports tournament tests their athletes for banned substances.
They interpret a positive test result as meaning that the athlete uses banned substances.
A negative result means that they do not.

The probability that an athlete uses banned substances is estimated to be 0.06.

If an athlete uses banned substances, the probability that they will test positive is 0.71.

If an athlete does not use banned substances, the probability that they will test negative is 0.98.

(a) Using the information given, complete the following tree diagram.

[2]

(b.i) Determine the probability that a randomly selected athlete does not use banned substances and tests negative. [2]

(b.ii) If two athletes are selected at random, calculate the probability that both athletes do not use banned substances
and both test negative. [2]

(c.i) Calculate the probability that a randomly selected athlete will receive an incorrect test result. [3]

(c.ii) A random sample of 1300 athletes at the tournament are selected for testing. Calculate the expected number of
athletes in the sample that will receive an incorrect test result. [2]

Team X are competing in the tournament. There are 20 athletes in this team. It is known that none of the athletes in Team X use
banned substances.

(d) Calculate the probability that none of the athletes in Team X will test positive. [4]

(e) Determine the probability that more than 2 athletes in Team X will test positive. [2]
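The tree diagram probabilities and the Team X calculations can be checked with a short script. This is a minimal sketch that assumes test results for different athletes are independent; the variable names are illustrative.

```python
from math import comb

p_user = 0.06              # P(athlete uses banned substances)
p_pos_given_user = 0.71    # P(positive | user)
p_neg_given_clean = 0.98   # P(negative | non-user)

p_clean = 1 - p_user

# (b.i) does not use banned substances and tests negative
p_clean_and_negative = p_clean * p_neg_given_clean

# (b.ii) the same outcome for two independently selected athletes
p_both_clean_and_negative = p_clean_and_negative**2

# (c.i) incorrect result: a user tests negative, or a non-user tests positive
p_incorrect = p_user * (1 - p_pos_given_user) + p_clean * (1 - p_neg_given_clean)

# (c.ii) expected number of incorrect results among 1300 athletes
expected_incorrect = 1300 * p_incorrect

# (d), (e) Team X: 20 non-users, so positives are false positives, X ~ B(20, 0.02)
n_team = 20
p_false_positive = 1 - p_neg_given_clean
p_none_positive = p_neg_given_clean**n_team
p_at_most_two = sum(
    comb(n_team, k) * p_false_positive**k * (1 - p_false_positive) ** (n_team - k)
    for k in range(3)
)
p_more_than_two = 1 - p_at_most_two

print(p_clean_and_negative, p_both_clean_and_negative, p_incorrect)
print(expected_incorrect, p_none_positive, p_more_than_two)
```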

21. [Maximum mark: 7] 22N.1.SL.TZ0.2


In the first month of a reforestation program, the town of Neerim plants 85 trees. Each subsequent month the number of trees
planted will increase by an additional 30 trees.

The number of trees to be planted in each of the first three months are shown in the following table.

(a) Find the number of trees to be planted in the 15th month. [3]

(b) Find the total number of trees to be planted in the first 15 months. [2]

(c) Find the mean number of trees planted per month during the first 15 months. [2]
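A minimal sketch of parts (a) to (c), assuming the monthly totals form an arithmetic sequence with first term 85 and common difference 30 (each month 30 more trees than the month before); the table referred to in the question is not reproduced here, so this reading is taken from the wording alone.

```python
first_term = 85          # trees planted in month 1
common_difference = 30   # additional trees each subsequent month
n_months = 15

# Trees planted in months 1..15 under the arithmetic-sequence assumption.
trees_per_month = [first_term + common_difference * (k - 1) for k in range(1, n_months + 1)]

trees_in_month_15 = trees_per_month[-1]     # part (a)
total_trees = sum(trees_per_month)          # part (b)
mean_per_month = total_trees / n_months     # part (c)

print(trees_in_month_15, total_trees, mean_per_month)
```

The total agrees with the closed form for an arithmetic series, n/2 × (first term + last term).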

22. [Maximum mark: 17] 22N.2.SL.TZ0.1


Elsie, a librarian, wants to investigate the length of time, T minutes, that people spent in her library on a particular day.

(a) State whether the variable T is discrete or continuous. [1]

Elsie’s data for 160 people who visited the library on that particular day is shown in the following table.

(b) Find the value of k. [2]

(c.i) Write down the modal class. [1]

(c.ii) Write down the mid-interval value for this class. [1]

(d) Use Elsie’s data to calculate an estimate of the mean time that people spent in the library. [2]

(e) Using the table, write down the maximum possible number of people who spent 35 minutes or less in the library
on that day. [1]

Elsie assumes her data to be representative of future visitors to the library.

(f) Find the probability a visitor spends at least 60 minutes in the library. [2]

The following box and whisker diagram shows the times, in minutes, that the 160 visitors spent in the library.

(g) Write down the median time spent in the library. [1]

(h) Find the interquartile range. [2]

(i) Hence show that the longest time that a person spent in the library is not an outlier. [3]

Elsie believes the box and whisker diagram indicates that the times spent in the library are not normally distributed.

(j) Identify one feature of the box and whisker diagram which might support Elsie’s belief. [1]

23. [Maximum mark: 18] 22N.2.AHL.TZ0.6


A company makes doors for kitchen cupboards from two layers. The inside layer is wood, and its thickness is normally distributed
with mean 7 mm and standard deviation 0.3 mm. The outside layer is plastic, and its thickness is normally distributed with mean
3 mm and standard deviation 0.16 mm. The thickness of the plastic is independent of the thickness of the wood.

(a) Find the probability that a randomly chosen door has a total thickness of less than 9.5 mm. [5]

Eight doors are to be packed into a box to send to a customer. The width of the box is 82 mm. The thickness of each door is
independent.

(b) Find the probability that the total thickness of the eight doors is greater than the width of the box. [4]

The company buys two new machines, A and B, to make the wooden layers. An employee claims that the layers from machine B
are thinner than the layers from machine A. In order to test this claim, a random sample is taken from each machine.

The seven layers in the sample from machine A have a thickness, in mm, of

6.23, 7.04, 7.31, 6.79, 6.91, 6.79, 7.47.

Find the

(c.i) mean. [1]

(c.ii) unbiased estimate of the population variance. [2]

The eight layers in the sample from machine B have a mean thickness of 6.89 mm and $s_{n-1} = 0.31$.

(d) Perform a suitable test, at the 5% significance level, to test the employee’s claim. You may assume the thickness of
the wooden layers from each machine are normally distributed with equal population variance. [6]
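Parts (a) to (c) rely on the fact that a sum of independent normal variables is normal, with the means and the variances added. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative, and the two-sample test in part (d) is not attempted here.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

wood_mean, wood_sd = 7, 0.3
plastic_mean, plastic_sd = 3, 0.16

# (a) Door thickness = wood + plastic: means add, variances add.
door = NormalDist(mu=wood_mean + plastic_mean, sigma=sqrt(wood_sd**2 + plastic_sd**2))
p_door_under_9_5 = door.cdf(9.5)

# (b) Total thickness of eight independent doors compared with an 82 mm box.
n_doors = 8
stack = NormalDist(mu=n_doors * door.mean, sigma=sqrt(n_doors) * door.stdev)
p_stack_over_82 = 1 - stack.cdf(82)

# (c) Sample from machine A: mean and unbiased estimate of the population variance.
sample_a = [6.23, 7.04, 7.31, 6.79, 6.91, 6.79, 7.47]
mean_a = mean(sample_a)
variance_a = variance(sample_a)   # divides by n - 1

print(p_door_under_9_5, p_stack_over_82, mean_a, variance_a)
```

Note that the standard deviation of the eight-door total scales by √8, not by 8, because the doors are independent rather than eight copies of the same door.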

24. [Maximum mark: 29] 22N.3.AHL.TZ0.1


In this question, you will explore possible approaches to using historical sports results for making predictions about future
sports matches.

Two friends, Peter and Helen, are discussing ways of predicting the outcomes of international football matches involving
Argentina.

Peter suggests analysing historical data to help make predictions. He lists the results of the most recent 240 matches in which
Argentina played, in chronological order, then considers blocks of four matches at a time. He counts how many times Argentina
has won in each block. The following table shows his results for the 60 blocks of four matches.

(a) Determine the mean number of wins per block of four matches for Argentina. [2]

Peter thinks that this data can be modelled by a binomial distribution with n = 4 and decides to carry out a $\chi^2$ goodness of fit test.

(b) Use Peter’s data to write down an estimate for the probability p for this binomial model. [1]

(c.i) Use the binomial model to find the probability that Argentina win zero matches in a block of four matches. [1]

(c.ii) Find the expected frequency for zero wins. [2]

As some expected frequencies are less than 5, Peter combines rows in his table to produce the following observed frequencies. He
then uses his binomial model to find appropriate expected frequencies, correct to one decimal place.

Peter uses this table to carry out a $\chi^2$ goodness of fit test, to test the hypothesis that the data follows a binomial distribution with
n = 4, at the 5% significance level.

For this test, state

(d.i) the null hypothesis; [1]

(d.ii) the number of degrees of freedom; [1]

(d.iii) the p-value; [2]

(d.iv) the conclusion, justifying your answer. [2]

(e) Using Peter’s binomial model, find the probability that Argentina will win at least one of their next four
international football matches. [2]

Helen thinks that a better prediction might be made by considering the transition between matches. To keep the model simple, she
decides to use only two states: Argentina won (A) or Argentina did not win (B). Helen looks at Peter’s list of results and counts the
number of times that:

Argentina won, twice in succession (AA),
Argentina won, then did not win (AB),
Argentina did not win, then won (BA),
Argentina did not win, twice in succession (BB).

She recorded the following results.


Helen uses the relative frequencies to estimate the probabilities in a transition matrix.

(f.i) Given that Argentina won the previous match, show that Helen’s estimate for the probability of Argentina
winning the next match is $\frac{17}{29}$. [2]

(f.ii) Write down the transition matrix, T, for Helen’s model. [2]

(g.i) Show that the characteristic polynomial of T is $1363\lambda^2 - 1263\lambda - 100 = 0$. [3]

(g.ii) Hence or otherwise, find the eigenvalues of T . [1]

(g.iii) Find the corresponding eigenvectors. [3]

(h) In her retirement, many years from now, Helen is planning to travel to three consecutive international football
matches involving Argentina. Use Helen’s model to find the probability that Argentina will win all three matches. [4]

25. [Maximum mark: 7] 22M.1.SL.TZ2.5


A polygraph test is used to determine whether people are telling the truth or not, but it is not completely accurate. When a person
tells the truth, they have a 20% chance of failing the test. Each test outcome is independent of any previous test outcome.

10 people take a polygraph test and all 10 tell the truth.

(a) Calculate the expected number of people who will pass this polygraph test. [2]

(b) Calculate the probability that exactly 4 people will fail this polygraph test. [2]

(c) Determine the probability that fewer than 7 people will pass this polygraph test. [3]

26. [Maximum mark: 7] 22M.1.SL.TZ2.5


A polygraph test is used to determine whether people are telling the truth or not, but it is not completely accurate. When a person
tells the truth, they have a 20% chance of failing the test. Each test outcome is independent of any previous test outcome.

10 people take a polygraph test and all 10 tell the truth.

(a) Calculate the expected number of people who will pass this polygraph test. [2]

(b) Calculate the probability that exactly 4 people will fail this polygraph test. [2]

(c) Determine the probability that fewer than 7 people will pass this polygraph test. [3]

27. [Maximum mark: 5] 22M.1.SL.TZ2.7


A college runs a mathematics course in the morning. Scores for a test from this class are shown below.

25 33 51 62 63 63 70 74 79 79 81 88 90 90 98

For these data, the lower quartile is 62 and the upper quartile is 88.

(a) Show that the test score of 25 would not be considered an outlier. [3]

The box and whisker diagram showing these scores is given below.

Test scores

Another mathematics class is run by the college during the evening. A box and whisker diagram showing the scores from this class
for the same test is given below.

Test scores

A researcher reviews the box and whisker diagrams and believes that the evening class performed better than the morning class.

(b) With reference to the box and whisker diagrams, state one aspect that may support the researcher’s opinion and
one aspect that may counter it. [2]
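The outlier check in part (a) can be sketched as follows, assuming the usual convention that a value is an outlier when it lies more than 1.5 × IQR below the lower quartile or above the upper quartile. The quartiles are the ones stated in the question; the variable names are illustrative.

```python
scores = [25, 33, 51, 62, 63, 63, 70, 74, 79, 79, 81, 88, 90, 90, 98]
lower_quartile, upper_quartile = 62, 88   # as given in the question

iqr = upper_quartile - lower_quartile
lower_fence = lower_quartile - 1.5 * iqr  # values below this count as outliers
upper_fence = upper_quartile + 1.5 * iqr  # values above this count as outliers

outliers = [s for s in scores if s < lower_fence or s > upper_fence]

print(iqr, lower_fence, upper_fence, outliers)
```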

28. [Maximum mark: 6] 22M.1.SL.TZ2.2


A group of 130 applicants applied for admission into either the Arts programme or the Sciences programme at a university. The
outcomes of their applications are shown in the following table.

(a) Find the probability that a randomly chosen applicant from this group was accepted by the university. [1]

An applicant is chosen at random from this group. It is found that they were accepted into the programme of their choice.

(b) Find the probability that the applicant applied for the Arts programme. [2]

(c) Two different applicants are chosen at random from the original group.

Find the probability that both applicants applied to the Arts programme. [3]

29. [Maximum mark: 16] 22M.2.SL.TZ1.3


The scores of the eight highest scoring countries in the 2019 Eurovision song contest are shown in the following table.

For this data, find

(a.i) the upper quartile. [2]

(a.ii) the interquartile range. [2]

(b) Determine if the Netherlands’ score is an outlier for this data. Justify your answer. [3]

Chester is investigating the relationship between the highest-scoring countries’ Eurovision score and their population size to
determine whether population size can reasonably be used to predict a country’s score.

The populations of the countries, to the nearest million, are shown in the table.

Chester finds that, for this data, the Pearson’s product moment correlation coefficient is r = 0.249.

(c) State whether it would be appropriate for Chester to use the equation of a regression line for y on x to predict a
country’s Eurovision score. Justify your answer. [2]

Chester then decides to find the Spearman’s rank correlation coefficient for this data, and creates a table of ranks.

Write down the value of:

(d.i) a. [1]

(d.ii) b. [1]

(d.iii) c. [1]

(e.i) Find the value of the Spearman’s rank correlation coefficient $r_s$. [2]

(e.ii) Interpret the value obtained for $r_s$. [1]

(f) When calculating the ranks, Chester incorrectly read the Netherlands’ score as 478. Explain why the value of the
Spearman’s rank correlation $r_s$ does not change despite this error. [1]

30. [Maximum mark: 15] 22M.2.SL.TZ1.5


The aircraft for a particular flight has 72 seats. The airline’s records show that historically for this flight only 90% of the people who
purchase a ticket arrive to board the flight. They assume this trend will continue and decide to sell extra tickets and hope that no
more than 72 passengers will arrive.

The number of passengers that arrive to board this flight is assumed to follow a binomial distribution with a probability of 0.9.

(a) The airline sells 74 tickets for this flight. Find the probability that more than 72 passengers arrive to board the
flight. [3]

(b.i) Write down the expected number of passengers who will arrive to board the flight if 72 tickets are sold. [2]

(b.ii) Find the maximum number of tickets that could be sold if the expected number of passengers who arrive to board
the flight must be less than or equal to 72. [2]

Each passenger pays $150 for a ticket. If too many passengers arrive, then the airline will give $300 in compensation to each
passenger that cannot board.

(c) Find, to the nearest integer, the expected increase or decrease in the money made by the airline if they decide to
sell 74 tickets rather than 72. [8]
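
One possible way to check the calculations in this question numerically is sketched below in Python. It uses only the figures stated in the question (72 seats, arrival probability 0.9, $150 per ticket, $300 compensation); the function and variable names are assumptions introduced for illustration, and the sketch assumes compensation is the only cost of overbooking.

```python
from math import comb

SEATS = 72
P_ARRIVE = 0.9
TICKET_PRICE = 150
COMPENSATION = 300


def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def expected_income(tickets):
    """Ticket revenue minus expected compensation for arrivals beyond SEATS."""
    expected_bumped = sum(
        (k - SEATS) * binom_pmf(k, tickets, P_ARRIVE)
        for k in range(SEATS + 1, tickets + 1)
    )
    return tickets * TICKET_PRICE - COMPENSATION * expected_bumped


# (a) P(X > 72) when 74 tickets are sold: P(X = 73) + P(X = 74)
p_over = sum(binom_pmf(k, 74, P_ARRIVE) for k in (73, 74))

# (b.ii) largest n with 0.9 * n <= 72, done in integers to avoid rounding issues
max_tickets = (SEATS * 10) // 9

# (c) change in expected income from selling 74 tickets rather than 72
change = expected_income(74) - expected_income(72)

print(p_over, max_tickets, round(change))
```

Working with the expected number of passengers who cannot board, rather than listing each case by hand, keeps the same function usable for other ticket counts.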

31. [Maximum mark: 17] 22M.2.SL.TZ2.1


Mackenzie conducted an experiment on the reaction times of teenagers. The results of the experiment are displayed in the
following cumulative frequency graph.
Use the graph to estimate the

(a.i) median reaction time. [1]

(a.ii) interquartile range of the reaction times. [3]

(b) Find the estimated number of teenagers who have a reaction time greater than 0.4 seconds. [2]

(c) Determine the 90th percentile of the reaction times from the cumulative frequency graph. [2]

Mackenzie created the cumulative frequency graph using the following grouped frequency table.

(d.i) Write down the value of a. [1]

(d.ii) Write down the value of b. [1]

(e) Write down the modal class from the table. [1]

(f) Use your graphic display calculator to find an estimate of the mean reaction time. [2]

Upon completion of the experiment, Mackenzie realized that some values were grouped incorrectly in the frequency table. Some
reaction times recorded in the interval 0 < t ≤ 0.2 should have been recorded in the interval 0.2 < t ≤ 0.4.

(g) Suggest how, if at all, the estimated mean and estimated median reaction times will change if the errors are
corrected. Justify your response. [4]
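
The effect described in part (g) can be explored with a small Python sketch. The frequencies and the classes beyond 0.4 are invented for illustration, since the question's table is not reproduced here; only the first two intervals come from the question. The helper names are hypothetical, and the conclusion about the median relies on the median lying above 0.4, which is true of this made-up data and would need checking against the actual graph.

```python
# Classes beyond 0.4 and all frequencies are made up for illustration.
classes = [(0.0, 0.2), (0.2, 0.4), (0.4, 0.6), (0.6, 0.8), (0.8, 1.0)]
freq_before = [3, 6, 13, 12, 6]
freq_after = [1, 8, 13, 12, 6]  # two readings moved from class 1 to class 2


def estimated_mean(classes, freqs):
    """Estimate the mean using class midpoints weighted by frequency."""
    midpoints = [(lo + hi) / 2 for lo, hi in classes]
    return sum(m * f for m, f in zip(midpoints, freqs)) / sum(freqs)


def median_class(classes, freqs):
    """Return the class in which the cumulative frequency reaches n/2."""
    half = sum(freqs) / 2
    running = 0
    for cls, f in zip(classes, freqs):
        running += f
        if running >= half:
            return cls


mean_before = estimated_mean(classes, freq_before)
mean_after = estimated_mean(classes, freq_after)
print(mean_before, mean_after)
print(median_class(classes, freq_before), median_class(classes, freq_after))
```

In this sketch the cumulative frequency at 0.4 is the same before and after the correction, which is why the median class does not move while the estimated mean rises.
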
32. [Maximum mark: 13] 22M.2.AHL.TZ1.3
A Principal would like to compare the students in his school with a national standard. He decides to give a test to eight students
made up of four boys and four girls. One of the teachers offers to find the volunteers from his class.

(a) Name the type of sampling that best describes the method used by the Principal. [1]

The marks out of 40, for the students who took the test, are:

25, 29, 38, 37, 12, 18, 27, 31.

For the eight students find

(b.i) the mean mark. [2]

(b.ii) the standard deviation of the marks. [1]

The national standard mark is 25.2 out of 40.

(c) Perform an appropriate test at the 5% significance level to see if the mean marks achieved by the students in the
school are higher than the national standard. It can be assumed that the marks come from a normal population. [5]

(d) State one reason why the test might not be valid. [1]

Two additional students take the test at a later date and the mean mark for all ten students is 28.1 and the standard deviation is
8.4.

For further analysis, a standardized score out of 100 for the ten students is obtained by multiplying the scores by 2 and adding 20.

For the ten students, find

(e.i) their mean standardized score. [1]

(e.ii) the standard deviation of their standardized score. [2]
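
As a rough check on parts (b), (c) and (e), the following Python sketch uses only the marks and figures given in the question. It assumes a one-tailed one-sample t-test is the intended test in (c), takes the 5% critical value for 7 degrees of freedom (about 1.895) from standard t tables, and reports both the population and the sample standard deviation, since the question does not say which is meant. Variable names are illustrative.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

marks = [25, 29, 38, 37, 12, 18, 27, 31]
NATIONAL_MEAN = 25.2
T_CRITICAL = 1.895  # one-tailed 5% value for 7 degrees of freedom (t tables)

# (b) summary statistics for the eight students
sample_mean = mean(marks)
sd_population = pstdev(marks)  # divides by n
sd_sample = stdev(marks)       # divides by n - 1, used in the t-test

# (c) H0: mu = 25.2 against H1: mu > 25.2
t_stat = (sample_mean - NATIONAL_MEAN) / (sd_sample / sqrt(len(marks)))
reject_h0 = t_stat > T_CRITICAL

# (e) a linear transformation y = 2x + 20 shifts and scales the mean,
# but only the scale factor affects the standard deviation
scaled_mean = 2 * 28.1 + 20
scaled_sd = 2 * 8.4

print(sample_mean, sd_population, sd_sample)
print(t_stat, reject_h0)
print(scaled_mean, scaled_sd)
```

Comparing the statistic with a tabulated critical value avoids needing a t-distribution function; a calculator's p-value would lead to the same decision.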

33. [Maximum mark: 15] 22M.2.AHL.TZ2.4


A student investigating the relationship between chemical reactions and temperature finds the Arrhenius equation on the internet.

$$k = Ae^{-\frac{c}{T}}$$

This equation links a variable k with the temperature T , where A and c are positive constants and T > 0 .

(a) Show that $\frac{dk}{dT}$ is always positive. [3]

(b) Given that $\lim_{T \to \infty} k = A$ and $\lim_{T \to 0} k = 0$, sketch the graph of k against T. [3]

The Arrhenius equation predicts that the graph of $\ln k$ against $\frac{1}{T}$ is a straight line.

Write down

(c) (i) the gradient of this line in terms of c;

(ii) the y-intercept of this line in terms of A. [4]


The following data are found for a particular reaction, where T is measured in Kelvin and k is measured in $\mathrm{cm^3\,mol^{-1}\,s^{-1}}$:

(d) Find the equation of the regression line for $\ln k$ on $\frac{1}{T}$. [2]

Find an estimate of

(e.i) c.

It is not required to state units for this value. [1]

(e.ii) A.

It is not required to state units for this value. [2]
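
The algebra linking the parts of this question can be sketched as follows, assuming the Arrhenius equation has the form $k = Ae^{-c/T}$ (the form consistent with the stated limits and with $A, c > 0$). The symbols $m$ and $b$ for the gradient and intercept of a fitted regression line are introduced here only for illustration.

```latex
\begin{align*}
k &= A e^{-c/T} \\
\frac{dk}{dT} &= A e^{-c/T} \cdot \frac{c}{T^{2}}
  = \frac{Ac}{T^{2}}\, e^{-c/T} > 0
  \quad \text{since } A > 0,\ c > 0,\ T > 0 \\
\ln k &= \ln A - c \cdot \frac{1}{T}
\end{align*}
```

Read against $\frac{1}{T}$, the last line has gradient $-c$ and intercept $\ln A$. If a regression line $\ln k = m \cdot \frac{1}{T} + b$ is fitted to the data, this suggests the estimates $c \approx -m$ and $A \approx e^{b}$.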

34. [Maximum mark: 28] 22M.3.AHL.TZ1.1


This question is about modelling the spread of a computer virus to predict the number of computers in a city which will be
infected by the virus.

A systems analyst defines the following variables in a model:

t is the number of days since the first computer was infected by the virus.
Q(t) is the total number of computers that have been infected up to and including day t.

The following data were collected:

(a.i) Find the equation of the regression line of Q(t) on t. [2]

(a.ii) Write down the value of r, Pearson’s product-moment correlation coefficient. [1]

(a.iii) Explain why it would not be appropriate to conduct a hypothesis test on the value of r found in (a)(ii). [1]

A model for the early stage of the spread of the computer virus suggests that

Q′(t) = βN Q(t)

where N is the total number of computers in a city and β is a measure of how easily the virus is spreading between computers.
Both N and β are assumed to be constant.
(b.i) Find the general solution of the differential equation Q′(t) = βN Q(t) . [4]

(b.ii) Using the data in the table write down the equation for an appropriate non-linear regression model. [2]

(b.iii) Write down the value of $R^2$ for this model. [1]

(b.iv) Hence comment on the suitability of the model from (b)(ii) in comparison with the linear model found in part (a). [2]

(b.v) By considering large values of t write down one criticism of the model found in (b)(ii). [1]

(c) Use your answer from part (b)(ii) to estimate the time taken for the number of infected computers to double. [2]

The data above are taken from city X which is estimated to have 2.6 million computers.
The analyst looks at data for another city, Y. These data indicate a value of $\beta = 9.64 \times 10^{-8}$.

(d) Find in which city, X or Y, the computer virus is spreading more easily. Justify your answer using your results from
part (b). [3]
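
The reasoning that connects parts (b), (c) and (d) can be sketched as below. The symbols $Q_0$ (initial value), $\lambda$ (the exponent in a fitted exponential model) and $t_d$ (doubling time) are not in the question and are introduced here as assumptions.

```latex
\begin{align*}
\frac{dQ}{dt} = \beta N Q
  &\;\Longrightarrow\; \int \frac{dQ}{Q} = \int \beta N \, dt
  \;\Longrightarrow\; \ln Q = \beta N t + \text{const}
  \;\Longrightarrow\; Q(t) = Q_0 e^{\beta N t} \\
Q(t + t_d) = 2\,Q(t)
  &\;\Longrightarrow\; e^{\beta N t_d} = 2
  \;\Longrightarrow\; t_d = \frac{\ln 2}{\beta N}
\end{align*}
```

If an exponential regression model $Q = Q_0 e^{\lambda t}$ is fitted to the data, matching exponents gives $\lambda = \beta N$, so $t_d = \frac{\ln 2}{\lambda}$ and, for a city with $N$ computers, $\beta = \frac{\lambda}{N}$. The value of $\beta$ obtained this way for city X can then be compared directly with the value quoted for city Y.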

An estimate for Q′(t), t ≥ 5, can be found by using the formula:

$$Q'(t) \approx \frac{Q(t+5) - Q(t-5)}{10}.$$

The following table shows estimates of Q′(t) for city X at different values of t.

(e) Determine the value of a and of b. Give your answers correct to one decimal place. [2]

An improved model for Q(t), which is valid for large values of t, is the logistic differential equation

$$Q'(t) = kQ(t)\left(1 - \frac{Q(t)}{L}\right)$$

where k and L are constants.

Based on this differential equation, the graph of $\frac{Q'(t)}{Q(t)}$ against Q(t) is predicted to be a straight line.

(f.i) Use linear regression to estimate the value of k and of L. [5]

(f.ii) The solution to the differential equation is given by

$$Q(t) = \frac{L}{1 + Ce^{-kt}}$$

where C is a constant.

Using your answer to part (f )(i), estimate the percentage of computers in city X that are expected to have been
infected by the virus over a long period of time. [2]
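
The method in parts (e) and (f) can be tried out on synthetic data, as in the Python sketch below. The parameter values `K_TRUE`, `L_TRUE` and `C_TRUE` are hypothetical and serve only to generate a logistic curve; they are not the values for city X, and the helper names are illustrative. The sketch applies the central difference formula from the question, then fits a line to Q′/Q against Q. Since Q′/Q = k − (k/L)Q, the intercept estimates k and the gradient estimates −k/L.

```python
from math import exp

# Hypothetical parameters, used only to generate synthetic data.
K_TRUE, L_TRUE, C_TRUE = 0.05, 1000.0, 99.0


def logistic(t):
    """Logistic curve Q(t) = L / (1 + C e^{-kt}) with the made-up parameters."""
    return L_TRUE / (1 + C_TRUE * exp(-K_TRUE * t))


def central_difference(q, t, h=5):
    """Estimate Q'(t) as (Q(t+h) - Q(t-h)) / (2h); h = 5 matches the question."""
    return (q(t + h) - q(t - h)) / (2 * h)


def least_squares(xs, ys):
    """Return (gradient, intercept) of the least-squares line of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    gradient = s_xy / s_xx
    return gradient, y_bar - gradient * x_bar


times = range(5, 200, 5)
q_values = [logistic(t) for t in times]
ratios = [central_difference(logistic, t) / logistic(t) for t in times]

# Intercept estimates k; gradient estimates -k/L, so L = -intercept / gradient.
gradient, intercept = least_squares(q_values, ratios)
k_est = intercept
l_est = -intercept / gradient
print(k_est, l_est)
```

The estimates differ slightly from the generating values because the central difference is only an approximation to the derivative. For part (f)(ii), the long-run number of infected computers is L, so the corresponding percentage would be 100 × L / N with N = 2.6 million for city X.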

© International Baccalaureate Organization, 2024
