The FINAL Final
A random sample of 149 scores for a university exam is given in the table.
Markscheme
52.8 A1
[1 mark]
Markscheme
$s_{n-1}^2 = 23.7^2 = 562$ M1A1
[2 marks]
The university wants to know if the scores follow a normal distribution, with the mean
and variance found in part (a).
(b) Show that the expected frequency for 20 < x ≤ 40 is 31.5 correct to 1
decimal place. [3]
Markscheme
0.211 × 149 M1
= 31.5 AG
[3 marks]
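The expected frequency in part (b) can be checked numerically. The sketch below assumes the model is N(52.8, 23.7²), using the rounded mean and standard deviation quoted in the markscheme for part (a), and assumes the class interval is 20 < x ≤ 40. The function name is illustrative only.

```python
from statistics import NormalDist

# Assumed inputs: rounded values quoted in the markscheme for part (a).
MEAN = 52.8
SD = 23.7
N_SCORES = 149


def expected_frequency(lower, upper, mean=MEAN, sd=SD, n=N_SCORES):
    """Expected count in the interval (lower, upper] under N(mean, sd^2)."""
    model = NormalDist(mu=mean, sigma=sd)
    probability = model.cdf(upper) - model.cdf(lower)
    return n * probability


# Interval 20 < x <= 40 (assumed class boundaries).
frequency = expected_frequency(20, 40)
print(round(frequency, 1))
```

The probability of the interval is about 0.211, which matches the factor multiplied by 149 in the markscheme.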
Markscheme
ν = 5 − 1 − 2 = 2 A1
p-value = 0.569 A2
[8 marks]
The university assigns a pass grade to students whose scores are in the top 80%.
(d) Use the normal distribution model to find the score required to pass. [2]
Markscheme
$\Phi^{-1}(0.2) = 32.8$ M1A1
[2 marks]
The university also wants to know if the exam is gender neutral. They obtain random
samples of scores for male and female students. The mean, sample variance and sample
size are shown in the table.
Markscheme
use of a t-test M1
$H_0: \mu_m = \mu_f$ and $H_1: \mu_m \neq \mu_f$ A1
p-value = 0.180 A2
[6 marks]
The university awards a distinction to students who achieve high scores in the exam.
Typically, 15% of students achieve a distinction. A new exam is trialed with a random
selection of students on the course. 5 out of 20 students achieve a distinction.
Markscheme
use of test for proportion using Binomial distribution M1
Since 5 < 7 R1
[6 marks]
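The comparison "5 < 7" is consistent with a one-tailed binomial test with X ~ B(20, 0.15). The sketch below finds the upper critical value, assuming a 5% significance level (the level is not visible in the surviving text, so it is an assumption). The helper names are illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(k, n, p):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))


def upper_critical_value(n, p, alpha):
    """Smallest c such that P(X >= c) <= alpha."""
    for c in range(n + 2):  # c = n + 1 always satisfies the condition
        if upper_tail(c, n, p) <= alpha:
            return c


# Assumed: 5% one-tailed test of p = 0.15 against p > 0.15 with n = 20.
critical = upper_critical_value(20, 0.15, 0.05)
print(critical, upper_tail(5, 20, 0.15))
```

Under these assumptions the observed value of 5 lies below the critical value, so the null hypothesis is not rejected.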
Markscheme
[3 marks]
(g.ii) Given that p = 0.2 find the probability of making a Type II error. [3]
Markscheme
P (X ⩽ 3) = 0.598 M1A1
[3 marks]
t is the number of days since the first computer was infected by the virus.
Q(t) is the total number of computers that have been infected up to and including
day t.
Markscheme
Note: Award at most A1A0 if answer is not an equation. Award A1A0 for an answer
including either x or y.
[2 marks]
Markscheme
0.755 (0.754741…) A1
[1 mark]
Markscheme
[1 mark]
A model for the early stage of the spread of the computer virus suggests that
Q′(t) = βN Q(t)
where N is the total number of computers in a city and β is a measure of how easily the
virus is spreading between computers. Both N and β are assumed to be constant.
Markscheme
ln|Q| = βN t + c A1A1A1
[4 marks]
(b.ii) Using the data in the table write down the equation for an
appropriate non-linear regression model. [2]
Markscheme
$Q = 1.15e^{0.292t}$ ($Q = 1.14864\ldots e^{0.292055\ldots t}$) A1
OR
$Q = 1.15 \times 1.34^t$ ($1.14864\ldots \times 1.33917\ldots^t$) A1
[2 marks]
[1 mark]
Markscheme
$R^2 > r^2$
$R > r$
The exponential model shows better correlation (since not clear how it is being
measured)
Model 2 has a better fit
Model 2 is more correlated
[2 marks]
(b.v) By considering large values of t write down one criticism of the model
found in (b)(ii). [1]
Markscheme
it suggests that there will be more infected computers than the entire population
R1
[1 mark]
(c) Use your answer from part (b)(ii) to estimate the time taken for the
number of infected computers to double. [2]
Markscheme
$1.15e^{0.292t} = 2.3$ OR $1.15 \times 1.34^t = 2.3$ OR $t = \frac{\ln 2}{0.292}$ OR using the
model to find two specific times with values of Q(t) which double M1
t = 2.37 (days) A1
Note: Do not FT from a model which is not exponential. Award M0A0 for an answer of
2.13 which comes from using (10, 20) from the data or any other answer which
finds a doubling time from figures given in the table.
[2 marks]
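For an exponential model the doubling time does not depend on the starting value, so it reduces to ln 2 divided by the growth rate. A minimal sketch, using the unrounded coefficients quoted in the markscheme for part (b)(ii); the function names are illustrative.

```python
from math import log


def doubling_time_from_rate(rate):
    """Doubling time for Q = A * exp(rate * t)."""
    return log(2) / rate


def doubling_time_from_base(base):
    """Doubling time for Q = A * base**t."""
    return log(2) / log(base)


# Coefficients taken from the markscheme for part (b)(ii).
print(round(doubling_time_from_rate(0.292055), 2))
print(round(doubling_time_from_base(1.33917), 2))
```

Both forms of the model give the same doubling time, as expected, since 1.33917… is e raised to 0.292055….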
The data above are taken from city X which is estimated to have 2.6 million computers.
The analyst looks at data for another city, Y. These data indicate a value of
$\beta = 9.64 \times 10^{-8}$.
(d) Find in which city, X or Y, the computer virus is spreading more easily.
Justify your answer using your results from part (b). [3]
Markscheme
$\beta = \frac{0.292055\ldots}{2.6 \times 10^6}$ OR $\beta = \frac{\ln 1.33917\ldots}{2.6 \times 10^6}$
$= 1.12328\ldots \times 10^{-7}$ A1
this is larger than $9.64 \times 10^{-8}$ so the virus spreads more easily in city X R1
[3 marks]
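Since the early-stage model gives Q′(t) = βNQ(t), the exponent in the regression model equals βN, so β for city X is the growth rate divided by N. A short sketch of that comparison, using the figures quoted above; the variable names are illustrative.

```python
GROWTH_RATE_X = 0.292055   # exponent from the regression model in (b)(ii)
COMPUTERS_X = 2.6e6        # estimated number of computers in city X
BETA_Y = 9.64e-8           # value quoted for city Y


def beta_from_growth_rate(rate, population):
    """beta such that rate = beta * population."""
    return rate / population


beta_x = beta_from_growth_rate(GROWTH_RATE_X, COMPUTERS_X)
spreads_more_easily = "X" if beta_x > BETA_Y else "Y"
print(beta_x, spreads_more_easily)
```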
$Q'(t) \approx \frac{Q(t+5) - Q(t-5)}{10}$.
The following table shows estimates of Q′(t) for city X at different values of t.
(e) Determine the value of a and of b. Give your answers correct to one
decimal place. [2]
Markscheme
[2 marks]
An improved model for Q(t), which is valid for large values of t, is the
logistic differential equation
$Q'(t) = kQ(t)\left(1 - \frac{Q(t)}{L}\right)$
Based on this differential equation, the graph of $\frac{Q'(t)}{Q(t)}$
against Q(t) is predicted to be a
straight line.
Markscheme
$\frac{Q'}{Q} = 0.42228 - 2.5561 \times 10^{-6} Q$ (A1)(A1)
Note: Award A1 for each coefficient seen – not necessarily in the equation. Do not
penalize seeing in the context of y and x.
L = 165000 (165205) A1
[5 marks]
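Dividing the logistic equation by Q gives Q′/Q = k − (k/L)Q, so the intercept of the fitted line is k and the gradient is −k/L. A minimal sketch of recovering L from the coefficients quoted in the markscheme; because those coefficients are rounded, the result agrees with 165205 only to within a few units.

```python
INTERCEPT = 0.42228      # k, from the fitted line
GRADIENT = -2.5561e-6    # -k / L, from the fitted line


def carrying_capacity(intercept, gradient):
    """L in Q'/Q = k - (k / L) * Q, given intercept k and gradient -k/L."""
    return -intercept / gradient


limit = carrying_capacity(INTERCEPT, GRADIENT)
print(round(limit))
```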
$Q(t) = \frac{L}{1 + Ce^{-kt}}$
where C is a constant.
Markscheme
$\frac{165205\ldots}{2600000}$
= 6.35% (6.35403…%) A1
Note: Accept any final answer consistent with their answer to part (f)(i) unless their
L is less than 120146 in which case award at most M1A0.
[2 marks]
It can be assumed that (X, Y ) follow a bivariate normal distribution with product
moment correlation coefficient ρ.
(a.i) State suitable hypotheses $H_0$ and $H_1$ to test Peter’s claim, using a two-tailed test. [1]
Markscheme
$H_0: \rho = 0$, $H_1: \rho \neq 0$ A1
Note: It must be ρ.
[1 mark]
(a.ii) Carry out a suitable test at the 5 % significance level. With reference to
the p-value, state your conclusion in the context of Peter’s claim. [4]
Markscheme
p = 0.649 A2
Note: The A mark depends on the R mark and the answer must be given in context.
Follow through the p-value in part (b).
[4 marks]
(b) Peter uses the regression line of y on x as y = 0.248x + 83.0 and
calculates that a student with a Mathematics test score of 73 will have a
running time of 101 seconds. Comment on the validity of his
calculation. [2]
Markscheme
a statement along the lines of ‘(we have accepted that) the two variables are
independent’ or ‘the two variables are weakly correlated’ R1
a statement along the lines of ‘the use of the regression line is invalid’ or ‘it would
give an inaccurate result’ R1
Note: FT the conclusion in (a)(ii). If a candidate concludes that the claim is correct,
mark as follows: (as we have accepted H1) the 2 variables are dependent and 73 lies
in the range of x values R1, hence the use of the regression line is valid R1.
[2 marks]
Aimmika is the manager of a grocery store in Nong Khai. She is carrying out a statistical
analysis on the number of bags of rice that are sold in the store each day. She collects the
following sample data by recording how many bags of rice the store sells each day over
a period of 90 days.
(a.i) Find the mean and variance for the sample data given in the table. [2]
Markscheme
[2 marks]
(a.ii) Hence state why Aimmika believes her data follows a Poisson
distribution. [1]
Markscheme
[1 mark]
(b) State one assumption that Aimmika needs to make about the sales of
bags of rice to support her belief that it follows a Poisson distribution. [1]
Markscheme
the number of bags sold each day is independent of any other day
the sales of bags of rice (each day) occur at a constant mean rate A1
[1 mark]
Aimmika knows from her historic sales records that the store sells an average of 4.2
bags of rice each day. The following table shows the expected frequency of bags of rice
sold each day during the 90 day period, assuming a Poisson distribution with mean 4.2.
Markscheme
a = 7.018 A1
b = 17.498 A1
EITHER
c = 5.755 A1
OR
90 − 7.018 − 11.903 − 16.665 − 17.498 − 14.698 − 10.289 − 6.173 (M1)
c = 5.756 A1
Note: Do not penalize the omission of clear a, b and c labelling as this will be
penalized later if correct values are interchanged.
[5 marks]
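The expected frequencies can be reproduced from the Poisson probabilities with mean 4.2 over 90 days. The table itself is missing here, so the sketch assumes the cells are "≤ 1", 2, 3, …, 7 and "≥ 8", with a, b and c being the entries for "≤ 1", 4 and "≥ 8" respectively. That assignment is inferred from the quoted values and is not stated in the surviving text.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """P(X = k) for X ~ Po(mean)."""
    return exp(-mean) * mean**k / factorial(k)


def expected_frequencies(mean, days, last_cell):
    """Expected counts for cells '<=1', 2, ..., last_cell - 1, '>=last_cell'."""
    cells = {"<=1": days * (poisson_pmf(0, mean) + poisson_pmf(1, mean))}
    for k in range(2, last_cell):
        cells[str(k)] = days * poisson_pmf(k, mean)
    # The final cell takes whatever is left so the counts sum to `days`.
    cells[f">={last_cell}"] = days - sum(cells.values())
    return cells


table = expected_frequencies(mean=4.2, days=90, last_cell=8)
for cell, count in table.items():
    print(cell, round(count, 3))
```

Computing the last cell by subtraction from rounded values gives 5.756 instead of 5.755, which is why the markscheme accepts both.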
Aimmika decides to carry out a $\chi^2$ goodness of fit test at the 5% significance level to
see whether the data follows a Poisson distribution with mean 4.2.
(d.i) Write down the number of degrees of freedom for her test. [1]
Markscheme
7 A1
[1 mark]
(d.ii) Perform the $\chi^2$ goodness of fit test and state, with reason, a conclusion. [7]
Markscheme
$H_0$: The number of bags of rice sold each day follows a Poisson distribution with
mean 4.2. A1
$H_1$: The number of bags of rice sold each day does not follow a Poisson
distribution with mean 4.2. A1
Note: Award A1A1 for both hypotheses correctly stated and in correct order. Award
A1A0 if reference to the data and/or “mean 4.2” is not included in the hypotheses,
but otherwise correct.
evidence of attempting to group data to obtain the observed frequencies for ≤ 1
and ≥ 8 (M1)
the result is not significant so there is no reason to reject $H_0$ (the number of bags
sold each day follows a Poisson distribution) A1
Note: Do not award R0A1. The conclusion MUST follow through from their
hypotheses. If no hypotheses are stated, the final A1 can still be awarded for a
correct conclusion as long as it is in context (e.g. therefore the data follows a
Poisson distribution).
[7 marks]
Aimmika claims that advertising in a local newspaper for 300 Thai Baht (THB) per
day will increase the number of bags of rice sold. However, Nichakarn, the owner of the
store, claims that the advertising will not increase the store’s overall profit.
Nichakarn agrees to advertise in the newspaper for the next 60 days. During that
time, Aimmika records that the store sells 282 bags of rice with a profit of 495 THB
on each bag sold.
(e.i) By finding a critical value, perform this test at a 5 % significance level. [6]
Markscheme
METHOD 1
$H_1: \mu > 252$ A1
282 ≥ 279 R1
(the advertising increased the number of bags sold during the 60 days)
Note: Do not award R0A1. Accept statements referring to the advertising being
effective for A1 as long as the R mark is satisfied. For the R1A1, follow through within
the part from their critical value.
METHOD 2
$H_0: \mu = 4.2$
$H_1: \mu > 4.2$ A1
4.7 > 4.63518… R1
(the advertising increased the number of bags sold during the 60 days)
Note: Do not award R0A1. Accept statements referring to the advertising being
effective for A1 as long as the R mark is satisfied. For the R1A1, follow through within
the part from their critical value.
[6 marks]
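Both critical values can be reproduced numerically. The sketch below assumes METHOD 1 uses the total over 60 days, X ~ Po(252), and that METHOD 2 uses a normal approximation for the daily sample mean with variance 4.2/60. The second assumption is inferred from the quoted value 4.63518… and is not stated in the surviving text.

```python
from math import exp, sqrt
from statistics import NormalDist


def poisson_upper_critical_value(mean, alpha):
    """Smallest c such that P(X >= c) <= alpha for X ~ Po(mean)."""
    pmf = exp(-mean)   # P(X = 0)
    cdf_below = 0.0    # P(X <= c - 1)
    c = 0
    while 1.0 - cdf_below > alpha:
        cdf_below += pmf
        c += 1
        pmf *= mean / c  # P(X = c) from P(X = c - 1)
    return c


def normal_critical_mean(daily_mean, days, alpha):
    """Critical sample mean, assuming the mean is roughly N(m, m / days)."""
    z = NormalDist().inv_cdf(1 - alpha)
    return daily_mean + z * sqrt(daily_mean / days)


print(poisson_upper_critical_value(252, 0.05))
print(normal_critical_mean(4.2, 60, 0.05))
```

The observed total of 282 bags (a daily mean of 4.7) exceeds the critical value under either method, so both lead to the same conclusion.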
(e.ii) Hence state the probability of a Type I error for this test. [1]
Markscheme
[1 mark]
EITHER
OR
OR
THEN
EITHER
Even though the number of bags of rice increased, the advertising is not worth it as
the overall profit did not increase. R1
OR
The advertising is worth it even though the cost is greater than the increased profit,
since the number of customers increased (possibly buying other products and/or
returning in the future after advertising stops) R1
Note: Follow through within the part for correct reasoning consistent with their
comparison.
[3 marks]
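The comparison behind these conclusions can be sketched from the figures given in the question: 300 THB per day of advertising for 60 days, 282 bags sold at 495 THB profit each, and a historic mean of 4.2 bags per day. Using the historic mean as the baseline for sales without advertising is an assumption of this sketch.

```python
AD_COST_PER_DAY = 300       # THB
DAYS = 60
BAGS_SOLD = 282
PROFIT_PER_BAG = 495        # THB
HISTORIC_DAILY_MEAN = 4.2   # assumed baseline without advertising


def advertising_comparison():
    """Return (total advertising cost, extra profit from additional bags)."""
    cost = AD_COST_PER_DAY * DAYS
    expected_bags = HISTORIC_DAILY_MEAN * DAYS
    extra_profit = (BAGS_SOLD - expected_bags) * PROFIT_PER_BAG
    return cost, extra_profit


cost, extra_profit = advertising_comparison()
print(cost, round(extra_profit), extra_profit > cost)
```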
For the test, a group of eight students were randomly selected from each school. Both
samples were given a standardized test at the start of the course and a prediction for
total IB points was made based on that test; this was then compared to their points total
at the end of the course.
Previous results indicate that both the predictions from the standardized tests and the
final IB points can be modelled by a normal distribution.
the standardized test is a valid method for predicting the final IB points
that variations from the prediction can be explained through the circumstances
of the student or school.
(a) Identify a test that might have been used to verify the null hypothesis
that the predictions from the standardized test can be modelled by a
normal distribution. [1]
Markscheme
$\chi^2$ (goodness of fit) A1
[1 mark]
(b) State why comparing only the final IB points of the students from the
two schools would not be a valid test for the effectiveness of the two
different teaching methods. [1]
Markscheme
EITHER
OR
[1 mark]
For each student, the change from the predicted points to the final points (f − p) was
calculated.
[1 mark]
Markscheme
2.46 (M1)A1
[2 marks]
Markscheme
p-value = 0.423 A1
[4 marks]
Markscheme
p-value = 0.0984 A1
0.0984 > 0.05 (not significant at the 5 % level) so do not reject the null hypothesis
R1A1
Note: The final A1 cannot be awarded following an incorrect reason. The final R1A1
can follow through from their incorrect p-value. Award a maximum of A1(M1)A0R1A1
for p-value = 0.0993.
[5 marks]
(e.ii) State why it was important to test that both sets of points were normally
distributed. [1]
Markscheme
sample too small for the central limit theorem to apply (and t-tests assume normal
distribution) R1
[1 mark]
School A also gives each student a score for effort in each subject. This effort score is
based on a scale of 1 to 5 where 5 is regarded as outstanding effort.
It is claimed that the effort put in by a student is an important factor in improving upon
their predicted IB points.
Markscheme
$H_0: \rho = 0$
$H_1: \rho > 0$ A1
Note: Allow hypotheses to be expressed in words.
p-value = 0.00157 A1
[3 marks]
(f.ii) Hence, find the expected improvement between predicted and final
points for an increase of one unit in effort grades, giving your answer to
one decimal place. [1]
Markscheme
[1 mark]
A mathematics teacher in school A claims that the comparison between the two schools
is not valid because the sample for school B contained mainly girls and that for school A,
mainly boys. She believes that girls are likely to show a greater improvement from their
predicted points to their final points.
She collects more data from other schools, asking them to class their results into four
categories as shown in the following table.
Markscheme
H0 : improvement and gender are independent
groups first two columns as expected values in first column less than 5 M1
(A1)
p-value = 0.581 A1
[6 marks]
(h) If you were to repeat the test performed in part (e) intending to
compare the quality of the teaching between the two schools, suggest
two ways in which you might choose your sample to improve the
validity of the test. [2]
Markscheme
For example:
Note: Award R1 for each reasonable suggestion to improve the validity of the test.
[2 marks]
A factory produces components for a tractor. They have designed a new technique to
produce one of their components that they hope will increase its useful lifespan.
They test 120 components made with the new technique and 240 with the technique
they currently use. At the end of 250 hours of use, they check the components, and
record whether they have no cracks, minor cracks or major cracks.
Markscheme
141 − 88 A1
= 53 AG
[1 mark]
Markscheme
(120 − 53 − 54 =) 13 A1
[1 mark]
(b) Given that this component had minor cracks find the probability that it
was produced by the new technique. [2]
Markscheme
Restricting the size of the sample space to 150 (54 + 96) (M1)
$= 0.36 \left(\frac{54}{150}, \frac{9}{25}\right)$ A1
[2 marks]
Markscheme
Note: Condone equivalent statements such as ‘not dependent’ but do not accept
“uncorrelated” or “not related” in place of “independent”.
[1 mark]
Markscheme
(p-value =) 0.0170 (0.0169864…) (M1)A1
[2 marks]
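The p-value can be reproduced from the contingency table. The table is missing here, so the counts below are reconstructed from figures that appear elsewhere in the markscheme (53, 54 and 13 for the new technique; 88, 96 and 56 for the current technique) and should be read as an assumption. With 2 degrees of freedom the upper tail of the chi-squared distribution has the closed form exp(−x/2), so no statistics library is needed.

```python
from math import exp

# Reconstructed counts: [no cracks, minor cracks, major cracks].
observed = {"new": [53, 54, 13], "current": [88, 96, 56]}


def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a two-way table of counts."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    column_totals = [sum(column) for column in zip(*rows)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for row, row_total in zip(rows, row_totals):
        for count, column_total in zip(row, column_totals):
            expected = row_total * column_total / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic


def p_value_two_df(statistic):
    """Upper-tail probability for chi-squared with exactly 2 degrees of freedom."""
    return exp(-statistic / 2)


statistic = chi_squared_statistic(observed)
print(statistic, p_value_two_df(statistic))
```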
(c.iii) State the conclusion of the test in context, justifying your answer. [2]
Markscheme
0.0170 < 0.05 R1
hence there is sufficient evidence to reject the null hypothesis that the
development of cracks and the technique used are independent. A1
[2 marks]
(d) For the components in the trial that were made with the current
technique, show that the proportion which developed cracks is $\frac{19}{30}$. [1]
Markscheme
$\frac{96+56}{240} \left(= \frac{152}{240}\right)$ A1
$= \frac{19}{30}$ AG
[1 mark]
$H_0: p = \frac{19}{30}$, $H_1: p < \frac{19}{30}$.
In a randomly selected sample of 120 components made with the new technique let X
be the number which developed cracks. The researchers assume that, under the null
hypothesis, $X \sim B\left(120, \frac{19}{30}\right)$.
(e) State one additional assumption that the researchers are making in
choosing this distribution. [1]
Markscheme
EITHER
Note: Do not accept the word “independence” on its own. Appropriate context
must be seen.
OR
[1 mark]
(f) Use appropriate data from the trial to perform the test proposed by the
researchers, at the 5% significance level. State the conclusion of the
test, justifying your answer. [5]
Markscheme
67 seen (A1)
EITHER
0.0549 > 0.05 R1
OR
critical region is X ≤ 66 A1
THEN
EITHER
do not reject the null hypothesis (as there is insufficient evidence that the new
technique reduces the number of cracks). A1
OR
do not accept the alternative hypothesis (as there is insufficient evidence that the
new technique reduces the number of cracks). A1
[5 marks]
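Both routes in the markscheme, the p-value and the critical region, come from the lower tail of B(120, 19/30). A minimal sketch, taking 67 as the observed number of components with cracks; the helper names are illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


def lower_critical_value(n, p, alpha):
    """Largest c such that P(X <= c) <= alpha (returns -1 if there is none)."""
    cumulative = 0.0
    critical = -1
    for k in range(n + 1):
        cumulative += binomial_pmf(k, n, p)
        if cumulative > alpha:
            break
        critical = k
    return critical


p_value = binomial_cdf(67, 120, 19 / 30)
critical = lower_critical_value(120, 19 / 30, 0.05)
print(p_value, critical)
```

The observed value lies just outside the critical region, which is why the test does not reject the null hypothesis at the 5% level.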
(g) In comparison with the test in part (c), state one mathematical reason
why
Markscheme
the test for a proportion is directional and so considers whether the new treatment
reduces the number of components developing cracks. R1
[1 mark]
Markscheme
EITHER
there could be variation in the value of p chosen for the null hypothesis /
the value of p from the sample might not be a representative of the current
technique R1
OR
the test in (f) does not treat minor and major cracks as different attributes /
the test in (c) does treat minor and major cracks as different attributes R1
OR
the test in (f) has to make an additional assumption (for example ‘independence’)
R1
[1 mark]
For these components, the researchers also consider the mean time taken until cracks
develop. It is hoped that using the new technique will increase this value. A second trial
is carried out and the times, in hours, taken for cracks to appear is recorded.
The mean time taken for cracks to appear ($\bar{t}$) and the value of $s_{n-1}$ for each technique
are given in the following table.
(h) Perform an appropriate test at the 5% significance level to determine
whether the new technique increases the mean time taken for cracks to
appear. [7]
Markscheme
EITHER
let $\mu_1$ be the mean length of time before cracks appear with the new technique
and $\mu_2$ be the mean length with the current technique
$H_0: \mu_1 = \mu_2$ A1
$H_1: \mu_1 > \mu_2$ A1
Note: Award A1A0 for correct hypotheses in which the two population means are
not clearly defined (e.g. unsupported $\mu_1$ and $\mu_2$).
OR
$H_0$: the POPULATION mean length of time before cracks appear is the same for
both groups A1
$H_1$: the new technique increases the POPULATION mean length of time before
cracks appear. A1
OR
$H_0$: the mean length of time before cracks appear in ALL components made with
the new technique is the same as for ALL components made with the current
technique. A1
$H_1$: the mean length of time before cracks appear in ALL components made with
the new technique is greater than the mean for ALL components made with the
current technique. A1
Note: Award A1A0 if “population” (or equivalent, such as “all”) is omitted from an
otherwise correct answer.
THEN
0.0162 < 0.05 R1
[7 marks]
The company decides to go ahead with the new technique and publishes the following
statement: “statistical tests show the new technique will significantly increase the time
before components crack and need to be replaced”.
Markscheme
EITHER
(though statistically significant) the new technique only seems to increase the time
before cracks appear by 1 hour out of 250, so it is not a significant increase (i.e. the
effect size is small) R1
OR
the minimum time (not mean time) before cracks appear should be considered
given the context / An appropriate confidence interval should be considered, and
not simply the mean. R1
Note: If a not significant p-value was seen in part (h), do not award R1 for an answer
of “the result is not significant” in part (i).
[1 mark]
(a) Find the probability that an apple from the tree has a weight greater
than 90 grams. [2]
Markscheme
[2 marks]
A sample of apples is taken from 2 trees, A and B, in different parts of the orchard.
The owner of the orchard wants to know whether the mean weight of the apples from
tree A ($\mu_A$) is greater than the mean weight of the apples from tree B ($\mu_B$) so sets up
the following test:
$H_0: \mu_A = \mu_B$ and $H_1: \mu_A > \mu_B$
(b.i) Find the p-value for the owner’s test. [2]
Markscheme
[2 marks]
State the conclusion of the test, giving a reason for your answer. [2]
Markscheme
0.0189 < 0.05 R1
Sufficient evidence to reject the null hypothesis (that the weights of apples from
the two trees are equal) A1
[2 marks]
Every year an accountancy firm recruits new employees for a trial period of one year
from a large group of applicants.
At the start, all applicants are interviewed and given a rating. Those with a rating of
either Excellent, Very good or Good are recruited for the trial period. At the end of this
period, some of the new employees will stay with the firm.
It is decided to test how valid the interview rating is as a way of predicting which of the
new employees will stay with the firm.
Markscheme
$H_0$: Staying (or leaving) the firm and interview rating are independent.
$H_1$: Staying (or leaving) the firm and interview rating are not independent A1
0.487 > 0.05 R1
Note: Do not award R0A1. The final R1A1 can follow through from their incorrect p-value.
[6 marks]
The next year’s group of applicants are asked to complete a written assessment which is
then analysed. From those recruited as new employees, a random sample of size 18 is
selected.
The sample is stratified by department. Of the 91 new employees recruited that year, 55
were placed in the national department and 36 in the international department.
(b) Show that 11 employees are selected for the sample from the national
department. [2]
Markscheme
$\frac{55}{91} \times 18 = 10.9$ (10.8791…) M1A1
≈ 11 AG
[2 marks]
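Proportional allocation for a stratified sample multiplies each stratum's share of the population by the sample size and rounds. A short sketch using the figures in the question; the function name is illustrative. Rounding each stratum separately is not guaranteed to reproduce the total sample size in general, although it does here.

```python
def stratified_allocation(strata, sample_size):
    """Proportional allocation, rounded to the nearest whole person per stratum."""
    total = sum(strata.values())
    return {
        name: round(size * sample_size / total)
        for name, size in strata.items()
    }


departments = {"national": 55, "international": 36}
allocation = stratified_allocation(departments, sample_size=18)
print(allocation)
```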
At the end of their first year, the level of performance of each of the 18 employees in
the sample is assessed by their department manager. They are awarded a score between
1 (low performance) and 10 (high performance).
The marks in the written assessment and the scores given by the managers are shown
in both the table and the scatter diagram.
The firm decides to find a Spearman’s rank correlation coefficient, $r_s$, for this data.
(c.i) Without calculation, explain why it might not be appropriate to
calculate a correlation coefficient for the whole sample of 18
employees. [2]
Markscheme
the international department manager seems to be less generous than the national
department manager R1
Note: The A1 is for commenting there is a difference between the two departments
and the R1 is for correctly commenting on the direction of the difference
[2 marks]
Markscheme
(M1)(A1)
Note: Award (M1) for an attempt to rank the data, and (A1) for correct ranks for both
variables. Accept either set of rankings in reverse.
Note: The (M1) is for calculating the PMCC for their ranks.
(A1)A1.
Accept −0.909 if one set of ranks has been ordered in reverse.
[4 marks]
Markscheme
EITHER
there is a (strong) association between the written assessment mark and the
manager scores. A1
OR
there is a (strong) agreement in the rank order of the written assessment marks and
the rank order of the manager scores. A1
OR
there is a (strong linear) correlation between the rank order of the written
assessment marks and the rank order of the manager scores. A1
THEN
the written assessment is likely to be a valid measure (of the level of employee
performance) R1
[2 marks]
The same seven employees are given the written assessment a second time, at the end
of the first year, to measure its reliability. Their marks are shown in the table below.
(d.i) State the name of this type of test for reliability. [1]
Markscheme
test-retest A1
[1 mark]
(d.ii) For the data in this table, test the null hypothesis, $H_0: \rho = 0$, against
Markscheme
0.00209 < 0.05 R1
Note: Do not award R0A1. Accept “accept $H_1$”. The final R1A1 can follow through
from their incorrect p-value.
[4 marks]
Markscheme
Note: Follow through from their answer in part (d)(ii). Do not award if there is no
conclusion in (d)(ii).
[1 mark]
The written assessment is in five sections, numbered 1 to 5. At the end of the year, the
employees are also given a score for each of five professional attributes: V, W, X, Y
and Z.
The firm decides to test the hypothesis that there is a correlation between the mark in a
section and the score for an attribute.
They compare marks in each of the sections with scores for each of the attributes.
(e.i) Write down the number of tests they carry out. [1]
Markscheme
25 A1
[1 mark]
Assuming that:
find the probability that at least one of the tests will be significant. [4]
Markscheme
$1 - 0.95^{25}$ (M1)(A1)
Note: Award (M1) for use of 1 − P(0) or the binomial distribution with any value
of p.
= 0.723 (0.722610…) A1
[4 marks]
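The markscheme's expression corresponds to 25 tests, each with a 5% chance of a significant result when there is no real correlation. The listed assumptions are missing from the text here, so the sketch assumes the tests are independent, each is carried out at the 5% level, and the null hypothesis is true in every case.

```python
def prob_at_least_one_significant(alpha, number_of_tests):
    """P(at least one significant result) for independent tests under H0."""
    return 1 - (1 - alpha) ** number_of_tests


# Assumed: 25 independent tests, each at the 5% significance level.
probability = prob_at_least_one_significant(0.05, 25)
print(round(probability, 3))
```

A value this high is the reason a single significant result among the 25 comparisons carries little weight on its own.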
(e.iii) The firm obtains a significant result when comparing section 2 of the
written assessment and attribute X. Interpret this result. [1]
Markscheme
(though the result is significant) it is very likely that one significant result would be
achieved by chance, so it should be disregarded or further evidence sought R1
[1 mark]
(a) Assuming that the shopkeeper’s claim is correct, find the probability
that the weight of six randomly chosen carrots is more than two times
the weight of one randomly chosen broccoli. [6]
Markscheme
* This question is from an exam for a previous syllabus, and may contain minor
differences in marking or structure.
Let $X = \sum_{i=1}^{6} C_i - 2B$ M1
E(X) = 6 × 130 − 2 × 400 = −20 (M1)(A1)
Note: Condone the notation 6C − 2B only if the (M1) is awarded for the variance.
[6 marks]
Dong Wook decides to investigate the shopkeeper’s claim that the mean weight of
carrots is 130 grams. He plans to take a random sample of n carrots in order to
calculate a 98 % confidence interval for the population mean weight.
(b) Find the least value of n required to ensure that the width of the
confidence interval is less than 2 grams. [3]
Markscheme
z = 2.326… (A1)
$\frac{2z\sigma}{\sqrt{n}} < 2$ M1
$\sqrt{n} > 11.6\ldots$
n > 135.2…
n = 136 A1
[3 marks]
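The width condition 2zσ/√n < 2 can be solved directly for n. The standard deviation of the carrot weights is not visible in the text here; the sketch assumes σ = 5 grams, which is the value implied by √n > 11.6… with z = 2.326…, and that assumption should be checked against the original question.

```python
from math import floor
from statistics import NormalDist

ASSUMED_SIGMA = 5.0  # grams; inferred from the working, not stated in the text


def least_sample_size(sigma, confidence, max_width):
    """Smallest n for which the z-interval width 2*z*sigma/sqrt(n) is below max_width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    bound = (2 * z * sigma / max_width) ** 2  # width < max_width exactly when n > bound
    return floor(bound) + 1


n = least_sample_size(ASSUMED_SIGMA, confidence=0.98, max_width=2)
print(n)
```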
Anjali thinks the mean weight, μ grams, of the broccoli is less than 400 grams. She
decides to perform a hypothesis test, using a random sample of size 8. Her hypotheses
are
$H_0: \mu = 400$; $H_1: \mu < 400$.
She decides to reject $H_0$ if the sample mean is less than 395 grams.
Markscheme
variance $= \frac{80}{8} = 10$ (A1)
= 0.0569 or 5.69% A1
[3 marks]
(d) Given that the weights of the broccoli actually follow a normal
distribution with mean 392 grams and variance 80 grams², find
the probability of Anjali making a Type II error. [3]
Markscheme
$= P(\bar{B} > 395 \mid \bar{B} \sim N(392, 10))$ (A1)
= 0.171 A1
[3 marks]
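Both error probabilities for Anjali's test follow from the distribution of the sample mean, which has variance 80/8 = 10. A minimal sketch: the Type I error is the chance of a sample mean below 395 when μ = 400, and the Type II error is the chance of a sample mean above 395 when μ = 392. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def error_probabilities(null_mean, true_mean, variance, n, cutoff):
    """Return (Type I, Type II) error probabilities for 'reject if mean < cutoff'."""
    standard_error = sqrt(variance / n)
    type_one = NormalDist(null_mean, standard_error).cdf(cutoff)
    type_two = 1 - NormalDist(true_mean, standard_error).cdf(cutoff)
    return type_one, type_two


type_one, type_two = error_probabilities(
    null_mean=400, true_mean=392, variance=80, n=8, cutoff=395
)
print(round(type_one, 4), round(type_two, 3))
```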
© International Baccalaureate Organization, 2025