Stats EQ
1. (a) Max wants to take a random sample of students from his year group.
(i) Explain what is meant by a random sample.
(ii) Describe a method Max could use to take his random sample.
(2)
(b) The table below shows the numbers of students in 5 year groups at a school.
Year    Number of students
9       239
10      257
11      248
12      190
13      206
Work out the number of students from Year 9 she has in her sample.
(2)
(Total for Question is 4 marks)
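Stratified sampling by year group allocates places in proportion to each group's share of the population. The sample size for part (b) is missing from the source, so this minimal sketch assumes a hypothetical sample of 50 students purely to show the arithmetic. Substitute the real figure.

```python
# Year-group sizes from the table in part (b)
year_counts = {9: 239, 10: 257, 11: 248, 12: 190, 13: 206}

def stratified_allocation(counts, sample_size):
    """Return the unrounded number to sample from each stratum (proportional allocation)."""
    total = sum(counts.values())
    return {stratum: n * sample_size / total for stratum, n in counts.items()}

# HYPOTHETICAL sample size: the question as extracted does not state it.
SAMPLE_SIZE = 50
alloc = stratified_allocation(year_counts, SAMPLE_SIZE)
year9 = alloc[9]  # 239 / 1140 * 50
print(f"Year 9: {year9:.2f} -> {round(year9)} students")
```

The same proportional rule extends to two factors (year group and gender, as in Question 2): divide the cell count by the overall total and multiply by the sample size.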
2. The table shows information about 1065 students.
Elena takes a stratified sample of 120 students by year group and by gender.
Work out the number of Year 8 female students in her sample
4. For a school project, Neville is investigating the types of music people in the UK like to listen to.
He collects data by asking friends from his year group.
Is this sample likely to be representative of the population?
Give one way in which the sample could be improved.
5. Sara is investigating the variation in daily maximum gust, t kn, for Camborne in June and July 1987.
She used the large data set to select a sample of size 20 from the June and July data for 1987. Sara selected the
first value using a random number from 1 to 4 and then selected every third value after that.
(a) State the sampling technique Sara used.
(1)
(b) From your knowledge of the large data set, explain why this process may not generate a sample of size
20.
(1)
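June and July together give 61 daily records. With a random start from 1 to 4 and a step of 3, there are always at least 20 day numbers available, so the shortfall in part (b) most plausibly comes from days with no recorded gust value. The sketch below illustrates systematic sampling under that assumption; the missing day numbers are invented for illustration only.

```python
import random

DAYS = 30 + 31  # June + July 1987: 61 daily records

def systematic_indices(start, step, n_wanted, population):
    """Day numbers (1-based) picked by systematic sampling, capped at n_wanted."""
    picks = list(range(start, population + 1, step))
    return picks[:n_wanted]

start = random.randint(1, 4)  # random start from 1 to 4, as in the question
days = systematic_indices(start, 3, 20, DAYS)

# HYPOTHETICAL: days whose maximum-gust reading is missing in the data set.
missing_days = {7, 22, 40}
usable = [d for d in days if d not in missing_days]
print(start, len(days), len(usable))
```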
2.1 Interpret diagrams for single-variable data, including understanding that area in a histogram
represents frequency
Connect to probability distributions
2.2 Interpret scatter diagrams and regression lines for bivariate data, including recognition of scatter
diagrams which include distinct sections of the population (calculations involving regression lines are
excluded)
Understand informal interpretation of correlation
Understand that correlation does not imply causation
2.3 Interpret measures of central tendency and variation, extending to standard deviation
Be able to calculate standard deviation, including from summary statistics
2.4 Recognise and interpret possible outliers in data sets and statistical diagrams
Select or critique data presentation techniques in the context of a statistical problem
Be able to clean data, including dealing with missing data, errors and outliers
1. The number of hours of sunshine each day, y, for the month of July at Heathrow is summarised in
the table below.
A histogram was drawn to represent these data. The 8 ≤ y < 11 group was represented by a bar of
width 1.5 cm and height 8 cm.
(a) Find the width and the height of the 0 ≤ y < 5 group.
(3)
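In a histogram the bar area is proportional to frequency, so the horizontal scale comes from the known bar (3 hours drawn as 1.5 cm) and the vertical scale from its frequency density. The width of the 0 ≤ y < 5 bar (2.5 cm) follows from the question alone; the height needs the frequencies from the table, which is missing here, so the sketch uses hypothetical frequencies of 8 and 5 days.

```python
def bar_dimensions(known_width_cm, known_height_cm, known_class, known_freq,
                   target_class, target_freq):
    """Scale a histogram bar so that bar area stays proportional to frequency."""
    known_span = known_class[1] - known_class[0]    # class width in hours
    target_span = target_class[1] - target_class[0]
    cm_per_hour = known_width_cm / known_span        # horizontal scale
    # Height is proportional to frequency density = frequency / class width.
    cm_per_density = known_height_cm / (known_freq / known_span)
    width = target_span * cm_per_hour
    height = (target_freq / target_span) * cm_per_density
    return width, height

# Known bar from the question: 8 <= y < 11 drawn 1.5 cm wide and 8 cm high.
# HYPOTHETICAL frequencies (source table missing): 8 days in 8-11, 5 days in 0-5.
w, h = bar_dimensions(1.5, 8.0, (8, 11), 8, (0, 5), 5)
print(w, h)
```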
(b) Use your calculator to estimate the mean and the standard deviation of the number of hours of
sunshine each day, for the month of July at Heathrow. Give your answers to 3 significant figures.
(3)
The mean and standard deviation for the number of hours of daily sunshine for the same month in Hurn
are 5.98 hours and 4.12 hours respectively. Thomas believes that the further south you are, the more
consistent the number of hours of daily sunshine should be.
(c) State, giving a reason, whether or not the calculations in part (b) support Thomas’ belief.
(2)
(d) Estimate the number of days in July at Heathrow where the number of hours of sunshine is more
than 1 standard deviation above the mean.
(2)
(Total for Question is 10 marks)
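Parts (b) and (d) rely on two standard estimates for grouped data: the mean and standard deviation from class midpoints (divisor n, as a calculator's σ gives), and a count found by assuming values are spread evenly within each class (linear interpolation). The frequency table is missing from the source, so the groups below are hypothetical and serve only to show the method.

```python
import math

# HYPOTHETICAL grouped data (source table missing): (lower, upper, frequency)
groups = [(0, 5, 12), (5, 8, 6), (8, 11, 8), (11, 15, 5)]  # 31 days in July

def grouped_mean_sd(groups):
    """Estimate mean and standard deviation (divisor n) using class midpoints."""
    n = sum(f for _, _, f in groups)
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in groups)
    sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in groups)
    mean = sum_fx / n
    return mean, math.sqrt(sum_fx2 / n - mean ** 2)

def count_above(groups, threshold):
    """Estimate how many values exceed threshold, interpolating linearly within a class."""
    total = 0.0
    for lo, hi, f in groups:
        if threshold <= lo:
            total += f                                 # whole class lies above
        elif threshold < hi:
            total += f * (hi - threshold) / (hi - lo)  # part of the class lies above
    return total

mean, sd = grouped_mean_sd(groups)
print(round(mean, 3), round(sd, 3), round(count_above(groups, mean + sd), 2))
```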
2. The partially completed histogram and the partially completed table show the time, to the nearest
minute, that a random sample of motorists were delayed by roadworks on a stretch of motorway.
Estimate the percentage of these motorists who were delayed by the roadworks for between 8.5 and
13.5 minutes.
(5)
3. The mark, x, scored by each student who sat a statistics examination is coded using y = 1.4x − 20
The coded marks have mean 60.8 and standard deviation 6.60
Find the mean and the standard deviation of x.
(Total 4 marks)
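Linear coding shifts the mean by the same transformation but scales the standard deviation only by the multiplier: mean(y) = 1.4 mean(x) − 20 and sd(y) = 1.4 sd(x). The sketch below inverts these relations with the values given in the question.

```python
# Coding y = a*x + b with a = 1.4 and b = -20 (from the question)
a, b = 1.4, -20
mean_y, sd_y = 60.8, 6.60

# Undo the coding: mean(y) = a*mean(x) + b and sd(y) = |a| * sd(x)
mean_x = (mean_y - b) / a
sd_x = sd_y / abs(a)
print(f"mean of x = {mean_x:.3g}, sd of x = {sd_x:.3g}")
```

This gives a mean of about 57.7 and a standard deviation of about 4.71; the additive constant −20 has no effect on the spread.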
4. Sara was studying the relationship between rainfall, r mm, and humidity, h %, in the UK. She takes
a random sample of 11 days from May 1987 for Leuchars from the large data set. She obtained the
following results.
h 93 86 95 97 86 94 97 97 87 97 86
r 1.1 0.3 3.7 20.6 0 0 2.4 1.1 0.1 0.9 0.1
A value that is more than 1.5 times the interquartile range (IQR) above Q3 is called an outlier.
Sara decided to exclude this day’s reading and drew the following scatter diagram for the remaining 10
days’ values of r and h.
The equation of the regression line of r on h for these 10 days is r = −12.8 + 0.15h.
(e) (i) Comment on the suitability of Sara’s sampling method for this study.
(ii) Suggest how Sara could make better use of the large data set for her study.
(2)
(Total for Question is 7 marks)
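The sketch below applies the outlier rule from the question to Sara's rainfall values and then fits the least-squares regression line of r on h to the remaining days, which can be compared with the equation given in the question. Quartiles use one common textbook convention (position n/4 or 3n/4, rounded up if not whole, otherwise averaged with the next value); other conventions can give slightly different quartiles.

```python
import math

# Data from the question: humidity h (%) and rainfall r (mm) for 11 days
h = [93, 86, 95, 97, 86, 94, 97, 97, 87, 97, 86]
r = [1.1, 0.3, 3.7, 20.6, 0, 0, 2.4, 1.1, 0.1, 0.9, 0.1]

def quartile(sorted_vals, frac):
    """n*frac rule: if not whole, round up; if whole, average with the next value."""
    pos = len(sorted_vals) * frac
    if pos != int(pos):
        return sorted_vals[math.ceil(pos) - 1]
    k = int(pos)
    return (sorted_vals[k - 1] + sorted_vals[k]) / 2

r_sorted = sorted(r)
q1, q3 = quartile(r_sorted, 0.25), quartile(r_sorted, 0.75)
upper_fence = q3 + 1.5 * (q3 - q1)
outliers = [v for v in r if v > upper_fence]

# Least-squares line r = a + b*h for the days that are not outliers
keep = [(hv, rv) for hv, rv in zip(h, r) if rv <= upper_fence]
n = len(keep)
mean_h = sum(hv for hv, _ in keep) / n
mean_r = sum(rv for _, rv in keep) / n
s_hh = sum((hv - mean_h) ** 2 for hv, _ in keep)
s_hr = sum((hv - mean_h) * (rv - mean_r) for hv, rv in keep)
b = s_hr / s_hh
a = mean_r - b * mean_h
print(q1, q3, round(upper_fence, 2), outliers, f"r = {a:.1f} + {b:.2f}h")
```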
3 Probability 26 marks
understand and be able to use mutually exclusive and independent events when calculating probabilities;
be able to make links to discrete and continuous distributions.
1. The Venn diagram shows the probabilities for students at a college taking part in various sports.
The probability that a student selected at random takes part in Athletics or Tennis is 0.75.
(b) State, giving a reason, whether or not the events A and T are statistically independent. Show your
working clearly.
(3)
(c) Find the probability that a student selected at random does not take part in Athletics or Cricket.
(1)
(c) Find the probability that the soft toy has none of these 3 defects.
(2)
(d) Find the probability that the soft toy has exactly one of these 3 defects.
(4)
(Total 12 marks)
3. (a) State in words the relationship between two events R and S when P(R ∩ S) = 0
(1)
The events A and B are independent with P(A) = and P(A∪B) =
Find
(b) P(B)
(4)
(c) P(A'∩B)
(2)
(d) P(B' | A)
(2)
(Total 9 marks)
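For independent events P(A ∩ B) = P(A)P(B), so P(A ∪ B) = P(A) + P(B) − P(A)P(B) can be solved for P(B), and complements of independent events are also independent. The actual values of P(A) and P(A ∪ B) were lost from the source, so this sketch uses hypothetical values 2/5 and 7/10 with exact fractions.

```python
from fractions import Fraction as F

# HYPOTHETICAL values: the source's P(A) and P(A u B) were lost in extraction.
p_a = F(2, 5)
p_a_or_b = F(7, 10)

# Independence: P(A u B) = P(A) + P(B) - P(A)P(B), solved for P(B)
p_b = (p_a_or_b - p_a) / (1 - p_a)
# A' and B are also independent
p_notA_and_b = (1 - p_a) * p_b
# P(A n B') = P(A)P(B') under independence, so P(B' | A) reduces to P(B')
p_notB_given_a = (p_a * (1 - p_b)) / p_a
print(p_b, p_notA_and_b, p_notB_given_a)
```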
4 Statistical distributions 34 marks
Understand and use simple, discrete probability distributions (calculation of mean and variance of
discrete random variables is excluded), including the binomial distribution, as a model; calculate
probabilities using the binomial distribution
1. The discrete random variable X can take only the values 2, 3, 4 or 6. For these values the
probability distribution function is given by
(2)
Find
(b) F(3)
(1)
(Total 3 marks)
2. The discrete random variable X can take only the values 1, 2 and 3. For these values the
cumulative distribution function is defined by
F(x) = …, x = 1, 2, 3
(2)
(4)
(Total 6 marks)
3. In a large restaurant an average of 3 out of every 5 customers ask for water with their meal.
A random sample of 10 customers is selected.
where the random variable X represents the number of these customers who ask for water.
(3)
(Total 8 marks)
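"3 out of every 5 customers" suggests modelling the number asking for water as X ~ B(10, 0.6), assuming customers decide independently. The individual parts of this question are missing, so the probabilities computed below (P(X = 6) and P(X ≥ 7)) are illustrative only.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p = 10, 3 / 5  # 10 customers; 3 in every 5 ask for water
print(round(binom_pmf(6, n, p), 4), round(1 - binom_cdf(6, n, p), 4))
```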
(2)
(b) more than 3 sales are made in 20 calls.
(2)
(c) Find the least number of calls each day a representative should make to achieve this requirement.
(2)
(d) Calculate the least number of calls that need to be made by a representative for the probability of at least
1 sale to exceed 0.95
(3)
(Total 9 marks)
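Part (d) asks for the least n with P(at least 1 sale) = 1 − (1 − p)^n > 0.95, assuming calls are independent with a constant probability p of a sale. The stem giving p is missing from the source, so this sketch uses a hypothetical p = 0.1 and cross-checks the search with logarithms.

```python
import math

def least_calls(p_sale, target=0.95):
    """Smallest n with P(at least one sale in n calls) = 1 - (1 - p)^n > target."""
    n = 1
    while 1 - (1 - p_sale) ** n <= target:
        n += 1
    return n

# HYPOTHETICAL probability of a sale on one call (the question stem is missing)
P_SALE = 0.1
n = least_calls(P_SALE)
# Cross-check: n must exceed ln(1 - target) / ln(1 - p)
print(n, math.log(0.05) / math.log(1 - P_SALE))
```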
5. A manufacturer supplies DVD players to retailers in batches of 20. Of its players, 5% are returned
because they are faulty.
(a) Write down a suitable model for the distribution of the number of faulty DVD players in a batch.
(2)
(2)
(2)
(d) Find the mean and variance of the number of faulty DVD players in a batch.
(2)
(Total 8 marks)
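If faults occur independently with probability 0.05, a batch of 20 gives X ~ B(20, 0.05), whose mean is np and variance np(1 − p). The sketch computes these and checks them against sums over the full probability distribution.

```python
from math import comb

n, p = 20, 0.05  # batch of 20, 5% faulty: X ~ B(20, 0.05)

mean = n * p
variance = n * p * (1 - p)

# Sanity check: the same values computed directly from the distribution
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
mean_direct = sum(k * pk for k, pk in enumerate(pmf))
var_direct = sum(k * k * pk for k, pk in enumerate(pmf)) - mean_direct**2
print(mean, variance)
```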
Understand and apply the language of statistical hypothesis testing, developed through a binomial
model: null hypothesis, alternative hypothesis, significance level, test statistic, 1-tail test, 2-tail test,
critical value, critical region, acceptance region, p-value
Conduct a statistical hypothesis test for the proportion in the binomial distribution and interpret the
results in context
Understand that a sample is being used to make an inference about the population
and appreciate that the significance level is the probability of incorrectly rejecting the null hypothesis
1. In a manufacturing process 25% of articles are thought to be defective. Articles are produced in batches
of 20
(a) A batch is selected at random. Using a 5% significance level, find the critical region for a two-tailed test
that the probability of an article chosen at random being defective is 0.25
You should state the probability in each tail which should be as close as possible to 0.025
(5)
The manufacturer changes the production process to try to reduce the number of defective articles. She then
chooses a batch at random and discovers there are 3 defective articles.
(b) Test at the 5% level of significance whether or not there is evidence that the changes to the process have
reduced the percentage of defective articles. State your hypotheses clearly.
(5)
(Total 10 marks)
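Under H0, X ~ B(20, 0.25). The sketch picks each tail so its probability is as close as possible to 0.025, as part (a) requires, and then computes the one-tailed p-value P(X ≤ 3) for part (b). It should report the critical region X ≤ 1 (about 0.0243) or X ≥ 10 (about 0.0139), and P(X ≤ 3) ≈ 0.2252, which is above 0.05, so 3 defectives does not give significant evidence of a reduction.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p); returns 0 for k < 0."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p0, tail = 20, 0.25, 0.025

# Lower tail: choose c so that P(X <= c) is as close as possible to 0.025
lower = min(range(n + 1), key=lambda c: abs(binom_cdf(c, n, p0) - tail))
# Upper tail: choose c so that P(X >= c) = 1 - P(X <= c - 1) is closest to 0.025
upper = min(range(1, n + 1), key=lambda c: abs(1 - binom_cdf(c - 1, n, p0) - tail))

# Part (b): one-tailed test of H1: p < 0.25 with 3 defective articles observed
p_value = binom_cdf(3, n, p0)
print(f"CR: X <= {lower} (P = {binom_cdf(lower, n, p0):.4f}) "
      f"or X >= {upper} (P = {1 - binom_cdf(upper - 1, n, p0):.4f})")
print(f"P(X <= 3) = {p_value:.4f}")
```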
2. Sammy manufactures wallpaper. She knows that defects occur randomly in the manufacturing process at
a rate of 1 every 8 metres. Once a week the machinery is cleaned and reset. Sammy then takes a random sample
of 40 metres of wallpaper from the next batch produced to test if there has been any change in the rate of defects.
(a) Stating your hypotheses clearly and using a 10% level of significance, find the critical region for this test.
You should choose your critical region so that the probability of rejection is less than 0.05 in each tail.
(4)
(2)
Thomas claims that his new machine would reduce the rate of defects and invites Sammy to test it. Sammy takes
a random sample of 200 metres of wallpaper produced on Thomas' machine and finds 19 defects.
(c) Using a suitable approximation, test Thomas' claim. You should use a 5% level of significance and state your
hypotheses clearly.
(7)
(Total 13 marks)
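Defects occurring randomly at a constant rate suggest a Poisson model: under H0, 40 m gives X ~ Po(5) and 200 m gives Y ~ Po(25). This sketch assumes that model, finds the two-tailed critical region with each tail below 0.05, and, for part (c), uses a normal approximation N(25, 25) with a continuity correction, which is a common choice for a large Poisson mean. It should give X ≤ 1 or X ≥ 10 (total rejection probability about 0.072) and P(Y ≤ 19) ≈ 0.136, which is not significant at 5%.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# (a) 1 defect per 8 m, so 40 m gives X ~ Po(5) under H0; each tail must be < 0.05
lam = 40 / 8
lower = max(c for c in range(0, 50) if poisson_cdf(c, lam) < 0.05)
upper = min(c for c in range(1, 50) if 1 - poisson_cdf(c - 1, lam) < 0.05)
# Probability of rejecting H0 when it is true
actual_level = poisson_cdf(lower, lam) + 1 - poisson_cdf(upper - 1, lam)

# (c) 200 m gives Y ~ Po(25) under H0; approximate by N(25, 25) with a
# continuity correction. H1: rate reduced, so find P(Y <= 19)
mu = 200 / 8
p_value = normal_cdf((19 + 0.5 - mu) / math.sqrt(mu))
print(lower, upper, round(actual_level, 4), round(p_value, 4))
```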
3. Sue throws a fair coin 15 times and records the number of times it shows a head.
(a) State the distribution to model the number of times the coin shows a head.
(2)
(2)
(2)
Sue has a different coin which she believes is biased in favour of heads. She throws the coin 15 times and
obtains 13 heads.
(c) Test Sue's belief at the 1% level of significance. State your hypotheses clearly.
(6)
(Total 12 marks)
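For part (c), H0: p = 0.5 against H1: p > 0.5, with X ~ B(15, 0.5) under H0. The p-value P(X ≥ 13) can be found exactly with fractions: (105 + 15 + 1)/2^15 = 121/32768 ≈ 0.0037, which is below 0.01, so H0 is rejected at the 1% level.

```python
from fractions import Fraction
from math import comb

n, heads = 15, 13
# H0: p = 0.5, H1: p > 0.5 (biased towards heads), one-tailed at 1%
p_value = sum(Fraction(comb(n, k), 2**n) for k in range(heads, n + 1))
decision = "reject H0" if p_value < Fraction(1, 100) else "do not reject H0"
print(p_value, float(p_value), decision)
```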
4. (a) State the conditions under which the normal distribution may be used as an approximation to the
binomial distribution.
(2)
A company sells seeds and claims that 55% of its pea seeds germinate.
(b) Write down a reason why the company should not justify their claim by testing all the pea seeds they
produce.
(1)
To test the company's claim, a random sample of 220 pea seeds was planted.
(c) State the hypotheses for a two-tailed test of the company's claim.
(1)
Given that 135 of the 220 pea seeds germinated,
(d) use a normal approximation to test, at the 5% level of significance, whether or not the company's claim is
justified.
(7)
(Total 11 marks)
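Under H0: p = 0.55, X ~ B(220, 0.55) is approximated by N(121, 54.45). Because the test is two-tailed at 5%, the upper-tail probability is compared with 0.025, using the continuity correction P(X ≥ 135) ≈ P(Y > 134.5). The sketch gives z ≈ 1.83 and an upper tail of about 0.034, so there is insufficient evidence to reject the company's claim.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p0, observed = 220, 0.55, 135
mu = n * p0                           # 121
sigma = math.sqrt(n * p0 * (1 - p0))  # sqrt(54.45)

# Two-tailed test at 5%: compare the upper-tail probability with 0.025.
# Continuity correction: P(X >= 135) is approximated by P(Y > 134.5)
z = (observed - 0.5 - mu) / sigma
upper_tail = 1 - normal_cdf(z)
print(round(z, 3), round(upper_tail, 4),
      "reject H0" if upper_tail < 0.025 else "do not reject H0")
```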