Modelling and Stats Guide

Uploaded by

Jordy Van Kollenburg

Modelling and Statistics Techniques
A guide to help with the exploration:
Maths IB Standard Level and Higher Level
Applications and Interpretations
Analysis and Approaches

(For first examination in 2021).

Author: Andrew Chambers

All rights reserved. No part of this publication may be reproduced, distributed, or
transmitted in any form or by any means, including photocopying, recording, or other
electronic or mechanical methods, without the prior written permission of the
publisher, except in the case of brief quotations embodied in critical reviews and
certain other noncommercial uses permitted by copyright law.

Copyright © 2020 Andrew Chambers. All rights reserved. 300 IA ideas: https://ibmathsresources.com.
Table of Contents

Introduction

Part 1: Modelling techniques

Linear regression
Quadratic regression
Cubic regression
Exponential regression
Linearisation using log scales
Trigonometric regression
Other useful graphs

Part 2: Statistics techniques

Pearson’s Product investigation: Height and arm span
Binomial investigation: Extra Sensory Powers
Poisson investigation: Customers in a shop
2 sample t-tests: Comparing different classes, Reaction times
Paired t-tests: Comparing the same class, Reaction times
Chi Squared test: Efficiency of vaccinations
Chi Squared Goodness of Fit: Are IB results normally distributed?
Bernoulli trials for polling data
Spearman’s rank: Does cola taste preference increase with price?
Sampling techniques and experiment design

Introduction

I’ve written this guide to supplement the main Exploration Guide I put together. You
should consult the main guide for guidance on choosing topics, an explanation of the
marking criteria, common student mistakes and technology advice. In this guide I
look at various modelling techniques and also a number of different statistical tests.
In many cases these are taught in textbooks simply using technology, whereas it is
often desirable to demonstrate a greater understanding through non-calculator
methods in your maths exploration. So, where possible I’ve included non-calculator
techniques.

It’s important to note that these methods are not intended to be exemplars. There
are many different ways of explaining the following techniques and ideas, and these
are just my ideas! You should attempt to put your methods into your own words so that
you can demonstrate a good personal understanding. The students who do best in
their exploration consult a variety of sources, collate the ideas and are therefore
able to show a deep understanding.

If you do use this guide then it is essential that you correctly cite this source in your
exploration - failure to cite sources correctly can lead to malpractice investigations by
the IB, so make sure everything is done correctly.

The exploration is a great opportunity to apply your maths knowledge to an area of
personal interest - so choose something you are passionate about, and enjoy it!

Linear Regression

Method 1

If you are doing a correlation investigation then you should use the Pearson’s
Product formula first to check the strength of correlation. Once you have done this
you can then find the equation of the line of best fit.

Method 2

If you are simply trying to find a linear regression line and not measure correlation
then you can use the least squares regression formula.

The equation of the line of best fit is given by:

y = mx + c

Where:

Say for example we have the following data points we want to fit a line through:

Let’s see how accurate this line is:

It’s pretty good! If we use the linear regression tool on Desmos by typing:

y_1 ~ mx_1 + c

We find we get exactly the same equation to 3 significant figures.
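
Method 2 can also be sketched in a few lines of Python. This is a minimal sketch, assuming the standard least squares formulas m = S_xy / S_xx and c = ȳ - m x̄ (the guide may write them in a different but equivalent form). The function name and the data points are hypothetical and are not the ones from the worked example above.

```python
def least_squares_line(xs, ys):
    """Return (m, c) for the least squares line y = mx + c."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # S_xy and S_xx: sums of products / squares of deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    m = s_xy / s_xx
    c = y_bar - m * x_bar  # the line passes through (x_bar, y_bar)
    return m, c


# Hypothetical data points, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
m, c = least_squares_line(xs, ys)
print(f"y = {m:.3g}x + {c:.3g}")
```

Comparing the printed equation with the Desmos regression for the same points is a quick way to check hand working.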

Quadratic regression

Completing the square

If I have the following graph I would notice that it follows a general quadratic shape.
So, usually the easiest method to fit a quadratic curve is to use the form:

$y = p(x - q)^2 + r$

Because my graph uses time and height, I will rewrite this as:

$h(t) = p(t - q)^2 + r$

When written in this form, p represents the vertical stretch factor and will be negative
because the graph is concave down. (q,r) will be the coordinates of the vertex of the
graph.

Looking at the graph, I need to decide where a best-fit quadratic curve would have
its vertex. In this case it looks like the coordinate point (3, 6.5) is quite close.

Therefore I have:

$h(t) = p(t - 3)^2 + 6.5$

Next I just need to find p. To do this I can choose any point that I want my curve to
go through. If I decide that my curve must go through (0,0) so that my model has a
height of 0 metres after 0 seconds, then I can substitute these values to find p:

$h(0) = 0 = p(0 - 3)^2 + 6.5$

$p = -\frac{6.5}{9}$

$p \approx -0.722$

Therefore:

$h(t) = -0.722(t - 3)^2 + 6.5$

This gives the following curve:

We can see that it goes through the point we chose as a vertex as well as the origin.
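
The same substitution can be checked with a short Python sketch. It uses the vertex (3, 6.5) and the point (0, 0) chosen above; the function name vertex_form_stretch is hypothetical and only there for illustration.

```python
def vertex_form_stretch(vertex, point):
    """Find p in h(t) = p(t - q)^2 + r given the vertex (q, r) and one other point."""
    q, r = vertex
    t, h = point
    return (h - r) / (t - q) ** 2


p = vertex_form_stretch(vertex=(3, 6.5), point=(0, 0))


def h(t):
    # Model built from the chosen vertex and the value of p found above
    return p * (t - 3) ** 2 + 6.5


print(round(p, 3))  # -0.722
```

Changing the chosen point (for example to the last data point) shows how sensitive p is to that choice.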

Desmos regression

I can also see what regression line Desmos will draw for these points by typing in:

y_1 ~ p(x_1 - q)^2 + r

This generates the following graph:

$h(t) = -0.727(t - 3.06)^2 + 6.39$

We can see that this time Desmos fits the best possible quadratic for all the points -
and so it does not quite fit the maximum point or go through the origin.

Cubic regression

Simultaneous equations

The general form of a cubic is:

$y = ax^3 + bx^2 + cx + d$

Because my graph uses time and height, I will rewrite this as:

$h(t) = at^3 + bt^2 + ct + d$

Here I have 4 unknowns and so need 4 equations. Luckily my graph goes through
(0,0) so I immediately know that d = 0. If your graph doesn’t pass through the origin
you can still use the same method but will use your GDC simultaneous equation
solver with 4 unknowns.

I will then choose the coordinate points which I want my graph to pass through. I’d
like it to pass through the origin (0, 0), the first maximum (1, 5.8), the first minimum
(3, 1.8) and the end point (4.5, 9).

This will therefore generate the following equations:

$h(0) = 0 = a(0)^3 + b(0)^2 + c(0) + d$

Therefore d = 0.

$h(1) = 5.8 = a(1)^3 + b(1)^2 + c(1)$

$h(3) = 1.8 = a(3)^3 + b(3)^2 + c(3)$

$h(4.5) = 9 = a(4.5)^3 + b(4.5)^2 + c(4.5)$

These simplify to give:

5.8 = a + b + c

1.8 = 27a + 9b + 3c

9 = 91.125a + 20.25b + 4.5c

Simultaneous equations can be solved using a GDC. For those doing HL maths you
might want to explore how to use the inverse of a 3x3 matrix to solve this. However
just using a Casio we could use the simultaneous equation solver:

This gives us (to 3 sf): a ≈ 1.01, b ≈ -6.64, c ≈ 11.4

Therefore we have the equation:

$h(t) = 1.01t^3 - 6.64t^2 + 11.4t$
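
If you want to check the GDC output without a calculator, the three simultaneous equations can be solved with Cramer's rule. The sketch below is one possible approach in plain Python, not the method used by the GDC; the helper names det3 and solve3 are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(matrix, rhs):
    """Solve a 3x3 linear system using Cramer's rule."""
    d = det3(matrix)
    solution = []
    for col in range(3):
        # Replace one column of the matrix with the right-hand side values
        replaced = [row[:col] + [rhs[r]] + row[col + 1:]
                    for r, row in enumerate(matrix)]
        solution.append(det3(replaced) / d)
    return solution


# Points chosen in the text (d = 0 because the curve passes through the origin)
points = [(1, 5.8), (3, 1.8), (4.5, 9)]
matrix = [[t ** 3, t ** 2, t] for t, _ in points]
rhs = [height for _, height in points]

a, b, c = solve3(matrix, rhs)
print(f"a = {a:.3g}, b = {b:.3g}, c = {c:.3g}")
```

Substituting the three points back into the resulting cubic is a useful final check that the curve really passes through them.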

If we plot this graph we get:

We can see that it passes through the points we specified and it is a very good fit.

Desmos regression

We can type the following in to see what regression line Desmos will create:

y_1 ~ ax_1^3 + bx_1^2 + cx_1 + d

Desmos creates a very similar graph:

$h(t) = 1.01t^3 - 6.63t^2 + 11.4t$

Higher powers regression

You can use the same technique as for cubic regression to fit curves with higher
powers. For example a quartic curve has general equation:

$y = ax^4 + bx^3 + cx^2 + dx + e$

Example of a quartic curve:

In order to fit a quartic curve you will need to have 5 equations because
you have 5 unknowns. So, choose 5 points you want your graph to pass through.
You can then use your GDC simultaneous equation solver to solve.

Exponential regression

Method 1: Simultaneous equations

The general equation for an exponential graph is:

$y = Ae^{Bx} + C$

Because my graph plots infected (I) and time (t), I’ll rewrite it as:

$I(t) = Ae^{Bt} + C$

This method needs your graph to have an asymptote y = 0 so that you can set C = 0.
If your graph has an asymptote at (say) y = 3 then move all your points down by 3
(i.e. take away 3 from each y coordinate). Then follow the method below to find A
and B. Finally you can set C = 3.

$I(t) = Ae^{Bt}$

Next I need to choose 2 points that I want the exponential to pass through. I’m going
to choose coordinates one third and two thirds along so I can represent the curve in
the middle section: (5, 4.2) and (10, 14).

$4.2 = Ae^{B(5)}$ (1)

$14 = Ae^{B(10)}$ (2)

I can eliminate A by doing equation (2) divided by equation (1):

$\frac{14}{4.2} = \frac{Ae^{B(10)}}{Ae^{B(5)}}$

$\frac{10}{3} = \frac{e^{10B}}{e^{5B}}$

$\frac{10}{3} = e^{5B}$

$\ln(\frac{10}{3}) = 5B$

$B = 0.2\ln(\frac{10}{3})$

I can then find A by substituting into one of the equations (e.g equation (1) ):

$4.2 = Ae^{B(5)}$

$4.2 = Ae^{5(0.2\ln(\frac{10}{3}))}$

$4.2 = Ae^{\ln(\frac{10}{3})}$

$4.2 = A(\frac{10}{3})$

$A = 1.26$

And I can now round B to 3 sf to give:

$B \approx 0.241$

So my equation is:

$I(t) = 1.26e^{0.241t}$
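
The elimination above generalises to any two points with positive y values. Here is a minimal Python sketch of that idea; the function name exponential_through is hypothetical.

```python
import math


def exponential_through(p1, p2):
    """Return (A, B) so that y = A e^(Bt) passes through both points."""
    (t1, y1), (t2, y2) = p1, p2
    # Dividing the two equations eliminates A: y2 / y1 = e^(B (t2 - t1))
    B = math.log(y2 / y1) / (t2 - t1)
    # Substitute B back into the first equation to find A
    A = y1 / math.exp(B * t1)
    return A, B


A, B = exponential_through((5, 4.2), (10, 14))
print(f"I(t) = {A:.3g} e^({B:.3g} t)")
```

Trying the end point of the data as one of the two points is a quick way to explore the alternative model suggested later in this section.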

This gives the following graph:

We can see that it goes through the 2 points we specified. It is a reasonable fit over
the first 10 days - but then fits less well over the next 10 days. We could try again
choosing the end point coordinate to ensure the curve fits through this point, or we
could choose to plot a piecewise function (i.e represent this using 2 different
equations, one equation for the first 10 days and another equation for the next 10
days).

Regression using Desmos

Desmos manages to fit a much better exponential curve - which shows that we
should try our exponential model again choosing one of the end coordinates.

Exponential regression

Method 2: Linearisation

Let’s take the same graph and the same starting point:

$I(t) = Ae^{Bt}$

We now do the following:

$\ln(I(t)) = \ln(Ae^{Bt})$

$\ln(I(t)) = \ln A + \ln(e^{Bt})$

$\ln(I(t)) = \ln A + (Bt)\ln(e)$

$\ln(I(t)) = \ln A + Bt$

$\ln(I(t)) = Bt + \ln A$

Therefore this is in the form of the equation of a straight line, when we plot ln(I(t)) on
the y axis against t. When we do this, the gradient will be B and the y-intercept will
be lnA .

For example with the coordinate (5, 4.2), I will plot (5, ln(4.2)) etc. This will give:

I now can draw a straight line of best fit to find the gradient.

This has a gradient of 0.260. This means B = 0.260.

The y-intercept is -0.1832. This means that

$\ln A = -0.1832$

$A = e^{-0.1832}$

$A \approx 0.833$

So my equation is:

$I(t) = 0.833e^{0.260t}$
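
The linearisation method can be sketched in Python as follows. The original data points are not listed in the text, so this sketch uses hypothetical values generated from I(t) = 0.833e^(0.26t); the point is only to show that fitting a least squares line to (t, ln I) recovers A and B. The function name is also hypothetical.

```python
import math


def fit_exponential_by_linearisation(ts, values):
    """Fit y = A e^(Bt) by fitting a straight line to (t, ln y)."""
    logs = [math.log(v) for v in values]
    n = len(ts)
    t_bar = sum(ts) / n
    log_bar = sum(logs) / n
    # Gradient of the line of best fit through (t, ln y) gives B
    B = (sum((t - t_bar) * (l - log_bar) for t, l in zip(ts, logs))
         / sum((t - t_bar) ** 2 for t in ts))
    # The y-intercept of that line is ln A
    ln_A = log_bar - B * t_bar
    return math.exp(ln_A), B


# Hypothetical data lying exactly on I(t) = 0.833 e^(0.26 t)
ts = [0, 5, 10, 15, 20]
values = [0.833 * math.exp(0.26 * t) for t in ts]

A, B = fit_exponential_by_linearisation(ts, values)
print(f"A = {A:.3g}, B = {B:.3g}")
```

With real, noisy data the recovered A and B will depend on every point rather than just two, which is the main advantage over Method 1.
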
This gives the following graph:

Other uses of linearisation:

The idea of linearisation is to transform a graph into a straight line, so that we can
then easily read off the gradient and y-intercept.

We can also use linearisation to find the equation of graphs of the form:

$y = Ax^B$

$\ln y = \ln(Ax^B)$

$\ln y = \ln A + \ln(x^B)$

$\ln y = \ln A + B\ln(x)$

In this case we would plot ln y against ln(x) and would have B as the gradient and
ln A as the y-intercept.

Trigonometric regression

The general form for a sine regression is:

$y = a\sin(b(x - c)) + d$

Because I’m looking at months t and average hours of sunlight S(t) I’ll rewrite this as:

$S(t) = a\sin(b(t - c)) + d$

a represents the amplitude and can be thought of as a vertical stretch of $S(t) = \sin(t)$.

b is related to the period by the equation:

$\text{period} = \frac{2\pi}{b}$

and we have a translation from the standard $S(t) = \sin(t)$ graph by the vector $\begin{pmatrix} c \\ d \end{pmatrix}$.

Step 1 is to find the amplitude (a). We do this by finding the difference between the
maximum and minimum points then dividing by 2.

$a = \frac{16.93 - 7.57}{2}$

$a = 4.68$

Step 2 is to find b. We note that the period of the graph is 12 therefore:

$\text{period} = \frac{2\pi}{b}$

$12 = \frac{2\pi}{b}$

$b = \frac{\pi}{6}$

Step 3 is to look at the maximum point. This has a y coordinate of 16.93.

The graph of

$S(t) = 4.68\sin(\frac{\pi}{6}t)$

would have a y-coordinate maximum of 4.68. Therefore the vertical translation must
be:

$d = 16.93 - 4.68 = 12.25$

Step 4 is to plot the graph found so far:

$S(t) = 4.68\sin(\frac{\pi}{6}t) + 12.25$

Then we can see that the maximum point at (3, 16.93) should be at (7, 16.93).
Therefore we need a horizontal translation of 4. So c = 4.

So our final graph is:

$S(t) = 4.68\sin(\frac{\pi}{6}(t - 4)) + 12.25$
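
The four steps can be collected into a short Python sketch. It uses the values read from the graph in the text (maximum 16.93, minimum 7.57, period 12, maximum at t = 7); the function name sine_parameters is hypothetical. The horizontal shift uses the fact that sin(bt) has its first maximum a quarter of a period after t = 0.

```python
import math


def sine_parameters(y_max, y_min, period, t_of_max):
    """Return (a, b, c, d) for S(t) = a sin(b(t - c)) + d."""
    a = (y_max - y_min) / 2          # Step 1: amplitude
    b = 2 * math.pi / period         # Step 2: from period = 2*pi / b
    d = y_max - a                    # Step 3: vertical translation
    c = t_of_max - period / 4        # Step 4: move the peak from period/4 to t_of_max
    return a, b, c, d


a, b, c, d = sine_parameters(y_max=16.93, y_min=7.57, period=12, t_of_max=7)


def S(t):
    return a * math.sin(b * (t - c)) + d


print(f"a = {a:.3g}, b = {b:.3g}, c = {c:.3g}, d = {d:.4g}")
```

Evaluating S(7) and S(1) should return the maximum and minimum values, which is a simple check on the signs of c and d.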

Regression using Desmos

I can type the following into Desmos to see what regression line it will create:

y_1 ~ a sin((π/6)(x_1 - c)) + d

Note I need to specify the value of b (see what happens if you don’t do this!)

$S(t) = 4.57\sin(\frac{\pi}{6}(t - 3.67)) + 12.3$

We can see that our initial model is pretty close to Desmos’ model here.

Other useful graphs

1. Logistic regression model:

$y = \frac{a}{1 + be^{-r(x - c)}}$

The logistic model can be very useful for modelling population growth - and will
appear when you use the SIR model for infections. The value a is the carrying
capacity and is the maximum that the population can reach.

2. Damped harmonic motion

$y = e^{-f(x - g)}\sin(b(x - c)) + d$

Sometimes when modelling harmonic motion you will notice that the amplitude
changes - eg. the height of tides or the vertical height of a pendulum. In this case we
can plot a damped graph by multiplying the trig function by an exponential term.

3. Circles

$(x - h)^2 + (y - k)^2 = r^2$

This generates a circle with radius r centred at (h,k). For example:

$(x - 2)^2 + (y - 3)^2 = 2^2$

4. Ellipses

$\frac{(x - h)^2}{a^2} + \frac{(y - k)^2}{b^2} = 1$

This generates an ellipse with distance from the centre to the edges of a horizontally
and b vertically, centred at (h,k). For example:

$\frac{(x - 2)^2}{1^2} + \frac{(y - 3)^2}{2^2} = 1$

5. Piecewise functions

Sometimes you’ll not be able to represent your graph using just one function - so you
can instead use a piecewise function like this:

This tells me that the function behaves like a linear function for all x values up to and
including 2, then behaves like a quadratic function for x values greater than 2. Note
here that you should usually aim to have a continuous function (i.e the value when x
= 2 is the same for both equations) when using this for modelling.
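
As an illustration of the continuity point, here is a small Python sketch of a piecewise function that is linear up to x = 2 and quadratic afterwards. The two pieces (2x and x^2) are hypothetical choices that happen to agree at x = 2; they are not necessarily the ones used in the example above.

```python
def f(x):
    """Hypothetical piecewise model: linear for x <= 2, quadratic for x > 2."""
    if x <= 2:
        return 2 * x      # linear piece, equals 4 at x = 2
    return x ** 2         # quadratic piece, approaches 4 as x approaches 2


# Both pieces give the same value at the join, so the model is continuous there
left_value = 2 * 2
right_value = 2 ** 2
print(left_value == right_value)  # True
```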

This gives the following graph:

Pearson’s Product Correlation investigation: height and arm span.

Correlation investigations are very common - but also have lots of things that can go
wrong with them, so here is an example where I highlight common mistakes and
show good practice.

Step 1: Personal engagement

You need to work quite hard to justify a personal interest in order to get
more than C1 on correlation topics. Two ways of showing personal engagement are
doing some reasonably time-consuming data collection and creating a narrative as
to why you are investigating this topic.

“Is there a correlation between the height and arm span of Y13 boys?” is a
reasonable topic question which will be possible to complete, but it’s quite
depersonalised. Why do you care about this?

“Can understanding the relationship between height and arm span help me design
better fitting suits for Y13 boys?” is immediately more engaging. Now there is a
genuine purpose, and plenty of scope for reflection based on this topic question.

Step 2: Collecting data.

The two main problems here are not collecting enough data for the investigation to
be meaningful, and not showing any awareness of sampling methods. I would
recommend trying to collect 40-50 data points if you are collecting your own data. If
you are using secondary data then 50-75 would be better.

You should show a clear explanation of the method used to collect data. For
example, “I borrowed the height measuring machine from the school nurse and
during a Y13 PE lesson asked my sample to line up with a straight back (no shoes).
I measured in cm to 1 decimal place. etc.”

Your choices of sampling method are simple random, convenience, systematic, quota
and stratified sampling. You need to show an awareness of which one you
are using, a justification for why you are using it and a discussion about potential
limitations.

For example, if you decide that the population you are interested in is limited to Y13
boys then you could conduct a simple random sample by assigning a number to
every Y13 boy in the school and then using a random number generator to generate
your sample.

Step 3: Data presentation. If you have a lot of data then you probably are best
including the first part of a table in the main body and then the full table in the
appendix. I’ll work through some maths with 10 data points as an example.

Height (cm), rounded to 4 sf    Arm span (cm), rounded to 4 sf
156.4                           162.0
177.7                           176.5
161.1                           160.8
170.9                           170.4
173.3                           185.2
173.0                           176.5
162.9                           170.8
161.2                           162.3
188.7                           190.9
178.6                           180.0

Here we have arm span on the y-axis therefore we are investigating if arm span is
dependent on height. We can clearly see a positive linear correlation so it is relevant
to do a Pearson’s Product correlation calculation.

Note - we might want to also show the graph with x and y axes starting from 150cm
to show the trend in the data points more clearly.

Step 4: Maths processes.

Mean and standard deviation are going to be relevant when considering suit
measurements. We will then do a Pearson’s Product calculation to see how strong
the correlation is.

Finding the mean.

Note, there is a convention of using $\mu_x$ for the mean of the x values when you are
measuring every x value in the population and to use $\bar{x}$ when you are finding the
mean from a sample. For example if I survey all Y13 students in my class and only
care about what the results tell me about this class then this is a population survey
(i.e. I have surveyed the whole population). If I survey all Y13 students in my class
and use this to draw conclusions on other Y13 students in the school then this is a
sample.

Finding the standard deviation

This gives the standard deviation as though you were using the entire population
data. There is a very similar equation if you want to estimate the population standard
deviation from a sample (it divides by n - 1 instead of n, and its square is the
unbiased estimator for the population variance):

At IB SL you’re not really expected to appreciate the difference but you could
mention it. I will use the standard deviation as though it is from the whole population.

x values (height)    $(x - \bar{x})^2$
156.4                195.4404
177.7                53.5824
161.1                86.1184
170.9                0.2704
173.3                8.5264
173.0                6.8644
162.9                55.9504
161.2                84.2724
188.7                335.6224
178.6                67.5684
Sum                  894.216
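
The table above can be reproduced with a short Python sketch using the ten heights from the data table. The variable names are my own; the sample version (dividing by n - 1) is included only for comparison with the population version used in this guide.

```python
import math

# Heights (cm) from the data table
heights = [156.4, 177.7, 161.1, 170.9, 173.3, 173.0, 162.9, 161.2, 188.7, 178.6]

n = len(heights)
mean = sum(heights) / n

# Sum of squared deviations from the mean (the total of the second column)
sum_sq = sum((x - mean) ** 2 for x in heights)

population_sd = math.sqrt(sum_sq / n)        # divides by n
sample_sd = math.sqrt(sum_sq / (n - 1))      # divides by n - 1

print(f"mean = {mean:.2f}")
print(f"sum of squared deviations = {sum_sq:.3f}")
print(f"population sd = {population_sd:.3f}, sample sd = {sample_sd:.3f}")
```

With only 10 data points the two standard deviations differ noticeably, which is worth a sentence of reflection in an exploration.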

Now we have both the mean and standard deviations we can do some reflection on
what these show - linking back to our aim.

Because we are looking at arm span and height it is probably reasonable to
assume that both of these are normally distributed (we could discuss how
reasonable this assumption is and even test this assumption).

We could then use the normal distribution with our mean and standard deviation to
work out (say) the range of heights that 95% of students will have. This requires the
inverse normal function:

This returns the result that 95% of student heights would be between 152cm and
189cm.
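
The inverse normal calculation can also be done with Python's standard library. In this sketch the mean (170.38) and population standard deviation (9.456) are the values computed from the ten heights in the table, and the middle 95% is taken as the interval between the 2.5th and 97.5th percentiles.

```python
from statistics import NormalDist

# Normal model for height, using the mean and population sd of the ten heights
heights_model = NormalDist(mu=170.38, sigma=9.456)

# The middle 95% lies between the 2.5th and 97.5th percentiles
lower = heights_model.inv_cdf(0.025)
upper = heights_model.inv_cdf(0.975)

print(f"95% of heights between {lower:.0f}cm and {upper:.0f}cm")
```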

Pearson’s Product Correlation formula

There are many other different versions of the Pearson’s Product formula. I think
that the ones below are the most useful as they make use of both the mean and
standard deviation calculations. The equation is in effect the average of the product
of the standardised scores.

1) Pearson’s Product Correlation for a population:

$\mu_x$, $\sigma_x$: population mean and standard deviation of x values. $\mu_y$, $\sigma_y$: population
mean and standard deviation of y values.

2) Pearson’s Product using sample mean and population standard deviation:

If we obtain our means from a sample ($\bar{x}$, $\bar{y}$) we can replace $\mu_x$ and $\mu_y$ with $\bar{x}$ and
$\bar{y}$ because the sample mean is an unbiased estimator for our population mean
values. Therefore we have:

3) Pearson’s Product using sample mean and sample standard deviation:

Technically there is a slightly different version depending on whether we are using
the population standard deviation or the sample standard deviation. Sample
standard deviation is on the HL Applied syllabus but not the HL Analysis.

Both equations will give the same answer - so choose to use the one which matches
the standard deviation you used. I will use equation (2) in the working out below.
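
For reference, the three versions described above can be written out as follows. This is a sketch based on the description of the formula as the average of the product of the standardised scores; the index i and the symbols $s_x$ and $s_y$ for the sample standard deviations are my own notation choices and may differ from other sources.

```latex
% (1) Population means and population standard deviations
r = \frac{1}{n}\sum_{i=1}^{n}
    \left(\frac{x_i - \mu_x}{\sigma_x}\right)
    \left(\frac{y_i - \mu_y}{\sigma_y}\right)

% (2) Sample means and population standard deviations
r = \frac{1}{n}\sum_{i=1}^{n}
    \left(\frac{x_i - \bar{x}}{\sigma_x}\right)
    \left(\frac{y_i - \bar{y}}{\sigma_y}\right)

% (3) Sample means and sample standard deviations (note the n - 1)
r = \frac{1}{n-1}\sum_{i=1}^{n}
    \left(\frac{x_i - \bar{x}}{s_x}\right)
    \left(\frac{y_i - \bar{y}}{s_y}\right)
```

Versions (2) and (3) agree because the factor of n - 1 in (3) cancels with the n - 1 inside the two sample standard deviations, in the same way that n cancels in (2).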

$\frac{(x - \bar{x})}{\sigma_x} \cdot \frac{(y - \bar{y})}{\sigma_y}$

1.763634539
0.236863645
1.292447823
-0.017849605
0.372200556
0.084779064
0.224051459
1.127988333
3.476728722
0.580496961

Therefore if we sum these values we can calculate the r value:
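
The whole calculation can be checked with a short Python sketch. It uses the height and arm span data from the table and takes r to be the mean of the products of the standardised scores, using population standard deviations as in equation (2). The function names are hypothetical.

```python
import math

heights = [156.4, 177.7, 161.1, 170.9, 173.3, 173.0, 162.9, 161.2, 188.7, 178.6]
arm_spans = [162.0, 176.5, 160.8, 170.4, 185.2, 176.5, 170.8, 162.3, 190.9, 180.0]


def mean(values):
    return sum(values) / len(values)


def population_sd(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson_r(xs, ys):
    """Average of the products of the standardised x and y scores."""
    x_bar, y_bar = mean(xs), mean(ys)
    sd_x, sd_y = population_sd(xs), population_sd(ys)
    products = [((x - x_bar) / sd_x) * ((y - y_bar) / sd_y)
                for x, y in zip(xs, ys)]
    return sum(products) / len(xs)


r = pearson_r(heights, arm_spans)
print(f"r = {r:.3f}")
```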

We could then reflect on this and what it shows. Because it is close to 1 it is
appropriate to find the equation of the regression line.

To find c we can then use the fact that a line of best fit will always go through the
mean x and y coordinates:
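
A minimal sketch of the regression line calculation for the same data is given below. It assumes the gradient is found from m = S_xy / S_xx (which is equivalent to m = r σ_y / σ_x); the guide may present the gradient in a different form. The intercept then follows from the line passing through the mean point.

```python
heights = [156.4, 177.7, 161.1, 170.9, 173.3, 173.0, 162.9, 161.2, 188.7, 178.6]
arm_spans = [162.0, 176.5, 160.8, 170.4, 185.2, 176.5, 170.8, 162.3, 190.9, 180.0]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(arm_spans) / n

# Assumed gradient formula: m = S_xy / S_xx
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, arm_spans))
s_xx = sum((x - x_bar) ** 2 for x in heights)
m = s_xy / s_xx

# The line of best fit passes through (x_bar, y_bar), so c = y_bar - m * x_bar
c = y_bar - m * x_bar

print(f"arm span = {m:.3f} x height + {c:.2f}")
```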

Then we can show we have checked all this using our GDC:

It’s good to include a screen-capture here to prove to the moderator you’ve done
everything correctly. When n is large there will be little difference between the
population standard deviation and the sample standard deviation - but with small n
there will be a greater discrepancy.

Clearly now we would look to use our regression line, make some reflections based
on the aim of the topic etc.

Standardisation in the Pearson Product equation

For those of you who know about the standardised normal calculation, you will notice
the similarity in the standardisation process.

In the normal distribution we have:

And our standardised x value in our working out is:

Both of these are doing the same thing - the standardised score tells me how many
standard deviations away from the mean I am.

For example if I have an x value of 2, with a mean of 4 and standard deviation of 1,
then I am 2 standard deviations away from the mean.

The standardised score returns -2, since (2 - 4)/1 = -2, because the x value is smaller
than the mean. By using standardised x and y values, Pearson’s Product formula is then
able to compare how similarly the 2 sets of values are distributed.

If on average each standardised x value is paired with a standardised y value of the
same sign (i.e. they are both positive meaning they are both to the right of their
respective means, or both negative meaning they are both to the left of their means)
then the Pearson’s Product formula will return a positive correlation.

Binomial investigation: Extra Sensory Powers

Step 1
We can define our data collection process - we’re going to test if Person 1 has extra
sensory perception (ESP). We’ll use ESP cards with 5 symbols. For each trial one
of these symbols will be “transmitted” to Person 1. We will then record if they
correctly guess the transmitted symbol. We will do this trial 50 times.

A binomial distribution is appropriate because we have a fixed number of trials, a
fixed probability of success and the trials are independent of each other.

Step 2: Maths processes

We have a binomial distribution:

X ~ B (50, 0.2)

There are 50 trials and the probability of success by guessing is 0.2.

The mean (expected value) for a binomial is given by:

E (X) = np
E (X) = 50(0.2)
E (X) = 10

So, we would expect people who do this test to get around 10 correct if they have no
ESP powers.

The standard deviation for a binomial is given by:

$\sigma = \sqrt{np(1 - p)}$

$\sigma = \sqrt{50(0.2)(1 - 0.2)}$

$\sigma = 2\sqrt{2}$

Therefore if we expect most of our data to be within 2 standard deviations of the mean
then we would expect most people to get:

$10 - 4\sqrt{2} < X < 10 + 4\sqrt{2}$

Which means that all the following values are within 2 standard deviations of the
mean:

5 ≤ X ≤ 15

Next, let’s do a hypothesis test for our binomial investigation.

$H_0: p = 0.2$

$H_1: p > 0.2$

Our null hypothesis ($H_0$) is that the probability of Person 1 correctly guessing the
transmitted symbol is 0.2. Our alternative hypothesis ($H_1$) is that the probability of
Person 1 correctly guessing the transmitted symbol is greater than 0.2.

Next, we work out the critical region. We will conduct our hypothesis test to the 5%
significance level. We are interested in finding the smallest value a such that:

$P(X > a) < 0.05$

This is the same as:

$P(X \leq a) > 0.95$

Using our GDC we can see that:

$P(X \leq 14) \approx 0.939$
$P(X \leq 15) \approx 0.969$

Therefore

$P(X > 15) < 0.05$

And so if Person 1 gets 16 or more correct then this will be in our critical region - and
this will be evidence to reject our null hypothesis and accept the alternative
hypothesis.

We can notice that the value of 16 is just outside our boundary of values within 2
standard deviations of the mean found earlier:

5 ≤ X ≤ 15

As n gets large the binomial distribution can be approximated by the normal
distribution - and in a normal distribution we have around 95% of values within 2
standard deviations of the mean.

Let’s say that Person 1 got a remarkable 20 correct out of 50. In this case we can
clearly see that it is in our critical region, and so we would reject our null hypothesis.
We might then want to see how likely this result was to happen by chance.

$P(X \geq 20) = 1 - P(X \leq 19)$
$P(X \geq 20) = 1 - 0.99906756$

$P(X \geq 20) \approx 0.000932$

Therefore the probability of this happening by chance is only around 0.0932% - or
around 9 times in every 10,000 experiments.
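
The GDC values in this investigation can be reproduced with a short Python sketch built on the binomial probability formula. The function name binomial_cdf is hypothetical; the critical value is taken to be the smallest number of correct guesses k for which P(X ≥ k) is below the 5% significance level.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


n, p, alpha = 50, 0.2, 0.05

# Smallest k with P(X >= k) = 1 - P(X <= k - 1) below the significance level
critical_value = next(k for k in range(n + 1)
                      if 1 - binomial_cdf(k - 1, n, p) < alpha)

# Probability of getting 20 or more correct purely by guessing
p_value_20 = 1 - binomial_cdf(19, n, p)

print(f"P(X <= 14) = {binomial_cdf(14, n, p):.3f}")
print(f"P(X <= 15) = {binomial_cdf(15, n, p):.3f}")
print(f"critical value = {critical_value}")
print(f"P(X >= 20) = {p_value_20:.6f}")
```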

Poisson investigation: Customers in a shop

Step 1
Let’s say we are interested in finding out whether having a person standing outside a
shop handing out leaflets to passers-by has a significant impact on customer
numbers entering a shop.

A Poisson distribution is appropriate because we can assume that the arrivals of
groups of customers are independent (we count groups rather than individuals
because if a family of 3 people arrive together the arrival of the children is not
independent of the parent). We have a fixed time frame and we assume that the
mean number of arrivals during that specific time period on that day is constant.

We spend one hour at (say) 3-4pm on a Saturday counting the number of groups of
customers who enter the shop. We count 15 groups of people therefore we have:

X ~ P (15)

Step 2: Maths processes

For a Poisson we have:

μ = 15

σ = √μ

σ = √15

Therefore if we expect most of our data to be within 2 standard deviations of the mean
then we would expect most hours to have:

$15 - 2\sqrt{15} < X < 15 + 2\sqrt{15}$

Which means that all the following values are within 2 standard deviations of the
mean:

8 ≤ X ≤ 22

Next, let’s do a hypothesis test for our Poisson investigation.

$H_0: \mu = 15$

$H_1: \mu > 15$

Our null hypothesis ($H_0$) is that the mean number of groups entering the shop will
be 15. Our alternative hypothesis ($H_1$) is that the mean number of groups entering
the shop will be more than 15.

Next, we work out the critical region. We will conduct our hypothesis test to the 5%
significance level. We are interested in finding the smallest value a such that:

$P(X > a) < 0.05$

This is the same as:

$P(X \leq a) > 0.95$

Using our GDC we can see that:

$P(X \leq 21) \approx 0.947$
$P(X \leq 22) \approx 0.967$

Therefore

$P(X > 22) < 0.05$

Therefore if there are 23 groups or more that enter our shop this will be in our critical
region - and this will be evidence to reject our null hypothesis and accept the
alternative hypothesis.
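
The Poisson critical region can be checked in the same way as the binomial one. This is a minimal Python sketch using the Poisson probability formula directly; the function name poisson_cdf is hypothetical and the search limit of 200 is an arbitrary upper bound chosen for illustration.

```python
from math import exp, factorial


def poisson_cdf(k, mean):
    """P(X <= k) for a Poisson distribution with the given mean."""
    return sum(exp(-mean) * mean ** i / factorial(i) for i in range(k + 1))


mean, alpha = 15, 0.05

# Smallest k with P(X >= k) = 1 - P(X <= k - 1) below the significance level
critical_value = next(k for k in range(200)
                      if 1 - poisson_cdf(k - 1, mean) < alpha)

print(f"P(X <= 21) = {poisson_cdf(21, mean):.3f}")
print(f"P(X <= 22) = {poisson_cdf(22, mean):.3f}")
print(f"critical value = {critical_value}")
```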

We notice that 23 is just outside the 2 standard deviation bound that we found
earlier.

We then spend another hour (3-4pm on the next Saturday) counting customers
whilst the shop employs someone to hand out leaflets to passers-by. We find that
the shop this time has 21 groups of customers. This is not in our critical region, and
so we do not have evidence to reject the null hypothesis.

2 sample t-tests: Comparing different classes, Reaction times

One of the main uses of the t-test is to be able to compare 2 samples and decide if
they both came from the same population. To do this we assume that the two
samples do come from the same population and therefore have the same standard
deviation. We want to test whether they share the same population mean.

If our population distribution is X, we need to assume that the sample mean X̄ follows a
normal distribution. This is a reasonable assumption as long as either:

1) X is normally distributed
2) Or our sample size (n) is sufficiently large (usually it should be at least n >
30).

If X meets either of the above criteria and has population mean μ and population
standard deviation σ then:

X̄ ~ N (μ, σ²/n)

Step 1:
Say for example I want to compare whether a class of Year 7 students have different
reaction times to a class of Year 13 students. I could conduct a 2 sample t-test.
Reaction times reasonably approximate a normal distribution (I could discuss this in
more detail) so my test assumption is met. I would then collect some data on
reaction times, explaining the collection process.

Step 2: Data collection

Reaction times (milliseconds):

x: Year 7     221  215  212  320  295  209  211  349  220  198
y: Year 13    312  341  225  214  238  188  378  301  205  226

Step 3: Maths processes

I’m conducting a 2 tailed t-test because I am interested in any difference (better or
worse) between the 2 groups. Therefore:

H 0 : μx = μy

H 1 : μx ≠ μy

Our null hypothesis ( H 0 ) is that the mean for the Year 7 students ( μx ) will be the
same as the mean for the Year 13 students ( μy ). Our alternative hypothesis ( H 1 ) is
that the two means are not equal.

I can conduct a 2 sample t-test on my GDC. I choose pooled data as an option,
which assumes that the variances of both populations (Y13 students and Y7
students) are equal.

Here the GDC returns a p value of 0.515.

0.515 > 0.05

Therefore I do not have sufficient evidence to reject the null hypothesis - and I can
conclude that there is no significant difference between the reaction times of Y13
and Y7 students.

I can see that whilst the means of the 2 samples are different (the mean for Y7
students is 245 milliseconds and is 262.8 milliseconds for Y13 students), this
difference is not significant at the 5% level.
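
As a cross-check on the GDC, the pooled test statistic can be sketched in Python with the standard library. The standard library has no t-distribution, so this sketch compares the statistic with a table value rather than computing a p value. The critical value 2.101 (two-tailed, 5% level, 18 degrees of freedom) is taken from standard t tables and is an assumption of the sketch, not something quoted in the guide.

```python
import math
import statistics

year7 = [221, 215, 212, 320, 295, 209, 211, 349, 220, 198]
year13 = [312, 341, 225, 214, 238, 188, 378, 301, 205, 226]

n_x, n_y = len(year7), len(year13)
mean_x, mean_y = statistics.mean(year7), statistics.mean(year13)

# Pooled variance: the two sample variances weighted by their degrees of freedom
pooled_var = ((n_x - 1) * statistics.variance(year7)
              + (n_y - 1) * statistics.variance(year13)) / (n_x + n_y - 2)

t_stat = (mean_x - mean_y) / math.sqrt(pooled_var * (1 / n_x + 1 / n_y))
df = n_x + n_y - 2

T_CRIT = 2.101   # assumed table value: two-tailed 5% level, 18 d.f.
reject_h0 = abs(t_stat) > T_CRIT
print(mean_x, mean_y, round(t_stat, 3), df, reject_h0)
```

A test statistic well inside the critical values is consistent with the large p value returned by the GDC.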

Paired t-tests: Comparing the same class, Reaction times

Step 1
Here I will do a paired t-test because the results are paired: each pair of results
belongs to one student. I am comparing the same students’ reaction times before and
after they are given some training on how to improve their reaction times. This will
be one-tailed because I want to see if there is a significant improvement in reaction
times in the second trial. I need to make the same assumptions as for the 2 sample
t-test.

Step 2: Data collection

Year 7 reaction times (milliseconds):

x: before training    221  215  212  320  295  209  278  349  220  198
y: after training     210  250  188  318  238  188  211  301  205  167
x - y                  11  -35   24    2   57   21   67   48   15   31

Maths processes:

I’m conducting a 1 tailed paired t-test because I am interested in an improvement in
the 2nd trial. Therefore:

H 0 : μx - μy = 0

H 1 : μx - μy > 0


Our null hypothesis ( H 0 ) is that the difference between the means for the Year 7
students before training ( μx ) and after training ( μy ) will be 0. Our alternative
hypothesis ( H 1 ) is that the difference between the two means will be positive
(because in this context it will mean that the trial with training was faster). I will do
the test at the 5% level.

Here the GDC returns a p value of 0.0292.

0.0292 < 0.05

Therefore I have evidence to reject the null hypothesis - and I can conclude that
there is a significant improvement in the reaction times of Y7 students after training.
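
The paired test works on the differences x - y, so it can be sketched in Python as a one sample calculation on those differences. As the standard library has no t-distribution, the statistic is compared with a table value: 1.833 (one-tailed, 5% level, 9 degrees of freedom) is taken from standard t tables and is an assumption of this sketch.

```python
import math
import statistics

before = [221, 215, 212, 320, 295, 209, 278, 349, 220, 198]
after = [210, 250, 188, 318, 238, 188, 211, 301, 205, 167]

# A paired test uses the per-student differences x - y
diffs = [b - a for b, a in zip(before, after)]

n = len(diffs)
mean_diff = statistics.mean(diffs)
std_error = statistics.stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
t_stat = mean_diff / std_error
df = n - 1

T_CRIT = 1.833   # assumed table value: one-tailed 5% level, 9 d.f.
reject_h0 = t_stat > T_CRIT
print(diffs, round(mean_diff, 1), round(t_stat, 2), df, reject_h0)
```

Working with the differences removes the variation between students, which is why a paired design can detect an improvement that a 2 sample test on the same numbers might miss.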

Chi Squared test: Efficiency of vaccinations

Say there are several different vaccinations being used to help prevent influenza,
and I want to find out whether people are able to avoid catching influenza
independently of which vaccine they receive. My null hypothesis is therefore that
there is no difference in the efficacy of the different vaccines against influenza. A Chi
squared test is appropriate here as I have categorical data and I am testing for
independence. I do not have a 2x2 table, therefore I can avoid having to use Yates’
Continuity Correction, and all my expected frequencies are greater than 5.

H 0 : Avoidance of influenza is independent of which vaccine is received

H 1 : Avoidance of influenza is not independent of which vaccine is received

Data analysis:

Observed data    Got influenza    Avoided influenza    Total
Vaccine 1        43               237                  280
Vaccine 2        52               198                  250
Vaccine 3        25               245                  270
Vaccine 4        48               212                  260
Vaccine 5        57               233                  290
Total            225              1125                 1350

Expected data    Got influenza                  Avoided influenza                 Total
Vaccine 1        (225/1350)(280) = 46.666...    (1125/1350)(280) = 233.333...     280
Vaccine 2        (225/1350)(250) = 41.666...    (1125/1350)(250) = 208.333...     250
Vaccine 3        (225/1350)(270) = 45           (1125/1350)(270) = 225            270
Vaccine 4        (225/1350)(260) = 43.333...    (1125/1350)(260) = 216.666...     260
Vaccine 5        (225/1350)(290) = 48.333...    (1125/1350)(290) = 241.666...     290
Total            225                            1125                              1350

Next we use the equation:

χ² calc = ∑ (f_o - f_e)² / f_e

f_o : observed frequencies

f_e : expected frequencies

We take away each of the expected frequencies from the corresponding observed
frequency, square the result and divide by the expected frequency. We then sum
all our answers together.

(f_o - f_e)² / f_e    Got influenza                               Avoided influenza
Vaccine 1             (43 - 46.666...)² / 46.666... = 0.288...    (237 - 233.333...)² / 233.333... = 0.0576...
Vaccine 2             2.56...                                     0.512...
Vaccine 3             8.88...                                     1.77...
Vaccine 4             0.502...                                    0.100...
Vaccine 5             1.55...                                     0.310...

Therefore

χ² calc = ∑ (f_o - f_e)² / f_e

χ² calc ≈ 16.6

Next I calculate the degrees of freedom (d.f) for my test. This is

d.f = (columns - 1)(rows - 1)

d.f = (2 - 1)(5 - 1) = 4

At the 5% level the critical value for Chi squared with 4 degrees of freedom is 9.488.
At the 1% level the critical value is 13.277. Therefore as:

16.6 > 13.277

My result is significant at the 1% level. We will therefore reject the null hypothesis
that avoidance of influenza is independent of the type of vaccination received.
Therefore we have evidence that some of the vaccines are more effective than
others in helping prevent influenza.
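
The whole calculation (expected frequencies, the Chi squared sum and the degrees of freedom) can be sketched in a few lines of Python. The dictionary layout and variable names are illustrative choices. The critical value is the 1% figure quoted above.

```python
# Observed counts: (got influenza, avoided influenza) for each vaccine
observed = {
    "Vaccine 1": (43, 237),
    "Vaccine 2": (52, 198),
    "Vaccine 3": (25, 245),
    "Vaccine 4": (48, 212),
    "Vaccine 5": (57, 233),
}

grand_total = sum(sum(row) for row in observed.values())
column_totals = [sum(row[j] for row in observed.values()) for j in range(2)]

chi_squared = 0.0
for row in observed.values():
    row_total = sum(row)
    for j, f_o in enumerate(row):
        # Expected frequency = row total x column total / grand total
        f_e = row_total * column_totals[j] / grand_total
        chi_squared += (f_o - f_e) ** 2 / f_e

df = (len(column_totals) - 1) * (len(observed) - 1)

CRITICAL_1_PERCENT = 13.277   # critical value quoted in the text for 4 d.f.
print(round(chi_squared, 2), df, chi_squared > CRITICAL_1_PERCENT)
```

Keeping the expected frequencies unrounded inside the loop avoids the small rounding differences that appear when truncated table values are added by hand.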

We can use a Chi Squared Goodness of Fit test to test if data follows a certain
distribution. Say for example I want to see if the IB results follow a normal
distribution. I can proceed as follows:

Step 1: Data

Data from the 2020 May IB Provisional Stats Bulletin.

IB Score X       Number of students
0 ≤ X ≤ 15       1,726
16 ≤ X ≤ 23      12,401
24 ≤ X ≤ 27      16,375
28 ≤ X ≤ 31      19,369
32 ≤ X ≤ 35      17,112
36 ≤ X ≤ 39      12,256
40 ≤ X ≤ 43      5,807
44 ≤ X ≤ 45      559

Step 2: Mathematical processes:

I can then find the mean and the sample standard deviation using my GDC (entering
the mid-point value of each class):

I will take the sample mean as 29.7 to 3sf and the sample standard deviation (which
is the same as the population standard deviation to 3sf in this case) as 7.13.

H 0 : X ~ N (29.7, 7.13²)

H 1 : X does not follow this normal distribution.

Next I can use the normal distribution X ~ N (29.7, 7.13²) to calculate the following
expected values (leaving the lowest class unbounded below and the highest class
unbounded above).

For example I can use my GDC to find:

P (16 ≤ X ≤ 23) = 0.14635097

Therefore the expected number of students is the probability multiplied by the total
number of students:

0.14635097(85605) = 12528.37479

I’ll round to the nearest whole number of students to give 12,528.
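
The same expected value can be sketched in Python using NormalDist from the standard library statistics module. The variable names are illustrative. The mean, standard deviation and total number of students are the figures used above.

```python
from statistics import NormalDist

ib_scores = NormalDist(mu=29.7, sigma=7.13)   # the normal model being tested
total_students = 85605

# P(16 <= X <= 23) is the difference of two cumulative probabilities
prob_16_to_23 = ib_scores.cdf(23) - ib_scores.cdf(16)
expected_16_to_23 = round(prob_16_to_23 * total_students)

# The lowest class is left unbounded below, so only one cumulative probability is needed
expected_up_to_15 = round(ib_scores.cdf(15) * total_students)

print(prob_16_to_23, expected_16_to_23, expected_up_to_15)
```

Repeating the same two lines for each class boundary fills in the rest of the expected values.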

IB Score X       Expected number of students
X ≤ 15           1,679
16 ≤ X ≤ 23      12,528
24 ≤ X ≤ 27      12,023
28 ≤ X ≤ 31      14,259
32 ≤ X ≤ 35      12,401
36 ≤ X ≤ 39      7,910
40 ≤ X ≤ 43      3,700
44 ≤ X           1,922

I can then use my GDC to perform a Chi Squared Goodness of Fit test. For this test
the degrees of freedom when I have n cells of data is:

d.f = n - 1
But I also estimated both the mean and the standard deviation, therefore my degrees
of freedom are:

d.f = n - 1 - 1 - 1
d.f = 8 - 1 - 1 - 1

d.f = 5
Entering this data into my GDC gives:

χ2 calc = 9753

(You can see a Youtube video for how to do this for a CASIO here )

The critical value at the 5% level for 5 degrees of freedom is 11.07.

Therefore as:

9753 > 11.07

We reject the null hypothesis - i.e. we accept the alternative hypothesis that X does
not follow the normal distribution we tested.
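
The goodness of fit statistic can also be sketched in Python directly from the observed and expected tables above. The list names are illustrative. The critical value is the 5% figure quoted in the text.

```python
# Observed and expected numbers of students, in the same class order as the tables
observed = [1726, 12401, 16375, 19369, 17112, 12256, 5807, 559]
expected = [1679, 12528, 12023, 14259, 12401, 7910, 3700, 1922]

# Sum of (observed - expected)^2 / expected over all classes
chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# n - 1, then 2 more removed for the estimated mean and standard deviation
df = len(observed) - 1 - 2

CRITICAL_5_PERCENT = 11.07   # critical value quoted in the text for 5 d.f.
print(round(chi_squared), df, chi_squared > CRITICAL_5_PERCENT)
```

Printing the individual terms of the sum is a quick way to see which classes contribute most to the poor fit.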

Bernoulli trials for polling data

Say I interview 1000 people and ask them a binary question such as “will you vote
Republican or Democrat in the next US election?” (with “don’t knows” or “won’t vote
for either” discarded). I might then want to know how confident I can be that my
results can be applied to the whole population. As long as my sampling technique
accurately reflects the population make-up, 1000 people will give a pretty accurate
gauge of public opinion (which is why many polling companies poll 1000 people).

Maths processes

We model this as 1000 repeated Bernoulli trials. A Bernoulli trial is just defined as a
binomial trial with n = 1. i.e:

X ~ B (1, p)

Each trial is defined as asking a person who they are going to vote for. We can
define “success” in this case as someone polled choosing Democrat (for example),
with p the probability of success.

And if we sum identical, independent Bernoulli trials we get:

∑_{k=1}^{n} X_k ~ B (n, p)

Therefore if we conduct 1000 trials we have:

∑_{k=1}^{1000} X_k ~ B (1000, p)

So, let’s define our distribution as:

Y ~ B (1000, p)

So, say we interview 1000 people and 685 people say they will vote Democrat. We
want to know between what percentages we can be 95% confident people will
actually vote Democrat.

We use our data to give an estimation of the probability someone will vote Democrat:

p ≈ 685/1000

For a binomial distribution:

Y ~ B (1000, 0.685)

we have the following equation for the mean:

μ = np

Therefore for our sample mean (m) we have the following estimate:

m = 1000(0.685) = 685

And for a binomial we have the following for the standard deviation:

σ = √(np(1 - p))

Therefore for our sample standard deviation (d) we have the following estimate:

d = √(685(1 - 0.685))
d ≈ 14.689

We can now use the fact that when n is large the binomial distribution is
approximated by the normal distribution.

The 95% confidence interval has z value ±1.96, which means that 95% of
values will lie within ±1.96 standard deviations of the mean.

I.e. we can be 95% confident that the actual mean lies between:

m - 1.96(d) < μ < m + 1.96(d)

685 - 1.96(14.689) < μ < 685 + 1.96(14.689)
685 - 1.96(14.689) < np < 685 + 1.96(14.689)

685 - 1.96(14.689) < 1000p < 685 + 1.96(14.689)

656.2 < 1000p < 713.8

0.656 < p < 0.714

We can see that our original estimate of 68.5% therefore has a margin of error of
about ±2.9%. This means that it will be quite accurate (as long as our sampling was done
correctly). You can read more on the maths behind this polling method here - which
provided the idea behind the example used here.
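
The interval for p can be sketched in Python as follows. The variable names follow the symbols m and d used above, and the z value 1.96 is the one quoted in the text. The other names are illustrative.

```python
import math

n = 1000          # number of people polled
successes = 685   # number who said they would vote Democrat

p_hat = successes / n                      # estimate of p from the sample
m = n * p_hat                              # estimated mean, np
d = math.sqrt(n * p_hat * (1 - p_hat))     # estimated standard deviation, sqrt(np(1 - p))

z = 1.96                                   # z value for a 95% confidence interval

# Divide the bounds on np by n to get bounds on p
lower_p = (m - z * d) / n
upper_p = (m + z * d) / n
margin_of_error = z * d / n

print(round(d, 3), round(lower_p, 3), round(upper_p, 3), round(margin_of_error, 3))
```

Changing n in this sketch shows how slowly the margin of error shrinks: it is proportional to 1/√n, so halving it requires four times as many people.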

In many cases, drawing a scatter plot followed either by Pearson’s Product or a
non-linear regression is going to be appropriate. But when you are ranking
something by its position in a sequence, Spearman’s rank should be used. Note
that Spearman’s rank will tell you if the relationship between the 2 variables is
monotonic increasing or decreasing (i.e. that as one increases the other always
increases, or always decreases) but it will not tell you if this relationship is linear or not.

Step 1
I want to investigate if students will prefer the most expensive cola drinks when given
a blind tasting test. I could choose 5 different colas, give them to (say) 30 students -
and each student has to rank them from 1 to 5. 1 being the best, 5 being the worst. I
would then tally all the scores and then give an overall rank to the 5 colas.

Step 2: Data analysis

Taste rank 1 2 3 4 5

Cost (Baht) 55 38 40 20 25

Rewriting the second row in terms of ranks (with 1 for the most expensive) gives:

Taste rank 1 2 3 4 5

Cost rank 1 3 2 5 4

I can then use the following formula to calculate Spearman’s rank (as long as there
are no tied ranks):

rs = 1 - 6∑d² / (n(n² - 1))

Here d is the difference between each pair of ranks, and n is the number of ranks we
have used.

Taste rank 1 2 3 4 5

Cost rank 1 3 2 5 4

d 0 -1 1 -1 1

So this gives:

∑d² = 0² + (-1)² + 1² + (-1)² + 1² = 4

And we have n = 5. Therefore:

rs = 1 - 6∑d² / (n(n² - 1))

rs = 1 - 6(4) / (5(5² - 1))

rs = 0.8

This shows that there is a positive relationship between the 2 variables - i.e as the
cost of the cola increases, the extent to which people prefer it also increases.

We can check this on the GDC by simply finding Pearson’s Product for the 2 ranks:
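
The same check can be sketched in Python. The pearson function below is an illustrative helper written for this example (it is not a library call), and the ranking step assumes there are no tied costs, as in this data.

```python
import math

taste_rank = [1, 2, 3, 4, 5]
cost_baht = [55, 38, 40, 20, 25]

# Rank the costs with 1 for the most expensive (valid here because there are no ties)
ordered = sorted(cost_baht, reverse=True)
cost_rank = [ordered.index(cost) + 1 for cost in cost_baht]

n = len(taste_rank)
sum_d_squared = sum((t - c) ** 2 for t, c in zip(taste_rank, cost_rank))
r_s = 1 - 6 * sum_d_squared / (n * (n**2 - 1))

def pearson(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

print(cost_rank, sum_d_squared, r_s, pearson(taste_rank, cost_rank))
```

When there are no tied ranks, the formula for rs and Pearson’s Product applied to the two sets of ranks give the same value, which is what the comparison illustrates.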

Sampling techniques and experiment design

1) What is your population?

It’s important to know what your population is. If I want to know about one Year 13
class - without drawing any further conclusions for other Year 13 students then my
population is that Year 13 class. If I include every student in that class then I will
have population data - not sample data. There are slightly different formulae (and
notation) used for population data compared with sample data so you should be
aware of this.

If I take data from one Year 13 class but want to draw wider conclusions about a
bigger population then this is a sample. Be clear what your wider population is - is it
the Year 13 cohort in your school? Is it all Year 13 students in the world? Is it all
students in your school?

What your population is will determine how reasonable your choice of sample is. It
probably is reasonable to use a sample of Year 13 students in your school to give
you data about the population of Year 13 students in your school. If you use a
sample of Year 13 students in your school to draw conclusions about all Year 13
students in the world then you need to consider how representative your school is of
the average Year 13 student. It may be that you could narrow down your population
(say to Year 13 students at international schools in your country) so that the sample
data you collect is more representative.

If your sample is not representative of the population then the conclusions you draw
will not be valid. For example if my population is school aged children and I want to
find the average height - but only sample Year 13 students, clearly the data I get is
useless in drawing any wider conclusions about the height of my population.

2) Sampling

You are expected to show an awareness of sampling when conducting a statistics
investigation - your sampling technique might not be perfect (and you can discuss
these limitations) but you should have a technique! The IB explicitly mentions the
following techniques:

a) Simple random sampling.

Every member of the population has an equal chance of being selected. For
example, if my population is all Y13 students at my school, I assign a
number to each one and then use a random number generator to select 10
students.

b) Systematic sampling
There is a system to select members by using a random starting point and a
fixed interval. For example, if my population is all Y13 students at my school,
I assign a number to each one and then use a random number generator to
select the first student. I then add 10 to this number to select the second
student etc.

c) Stratified sampling
My population is divided into non-overlapping strata which share common
characteristics and I then choose a random sample from each stratum.
For example if my population is all Y13 students at my school I could divide
the students into the two strata boys and girls. I would then choose a sample
from both groups. If my population was all students at my school then my
strata could be the different year groups.
You can add a quota to your stratified sample - for example I divide my
population into year group strata - but then also want to make sure that I
survey 55% girls and 45% boys in each year group (perhaps because this
reflects the overall gender mix of the school etc).

d) Convenience sampling
Here you just choose whatever is easiest to do! As you might expect this is
not an especially good technique when trying to draw wider conclusions from
your sample. For example you may do a survey on the path to the lunch-hall
and stop the first 10 students you see. However it may be appropriate
depending on what you are hoping to achieve. If your population is Year 13
students then it may be convenient to just survey everyone in your form class
- and as long as your form class is representative of the Year 13 population
then this would still allow valid wider conclusions.
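
The four techniques above can be sketched in Python with the standard library random module. Everything about the population here is hypothetical: 100 students numbered 1 to 100, with the first 55 assumed to be girls and the remaining 45 assumed to be boys purely for illustration.

```python
import random

random.seed(1)   # fixed seed so the illustration gives the same samples each run

# Hypothetical population: 100 Y13 students numbered 1 to 100
students = list(range(1, 101))
# Hypothetical strata: students 1-55 are girls, students 56-100 are boys
strata = {"girls": students[:55], "boys": students[55:]}

# a) Simple random sampling: every student equally likely to be chosen
simple_sample = random.sample(students, 10)

# b) Systematic sampling: random starting point, then every 10th student
start = random.randint(1, 10)
systematic_sample = list(range(start, 101, 10))

# c) Stratified sampling: a random 20% taken from each stratum
stratified_sample = {
    name: random.sample(members, len(members) // 5)
    for name, members in strata.items()
}

# d) Convenience sampling: simply the first 10 students on the list
convenience_sample = students[:10]

print(simple_sample, systematic_sample, stratified_sample, convenience_sample)
```

Taking the same percentage from each stratum keeps the sample in the same proportions as the population, which is the point of stratifying.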

You should try to design a study and a data collection process that reduces errors
and bias. At the very least you should show an awareness of potential errors/bias
when discussing your design.

Some common errors/biases in survey design include:

a) Email /internet surveys - even if these are sent to the whole population, the
people who respond might not be representative of that population. Perhaps
they are more diligent than usual (or perhaps they have more time on their
hands!)

b) Poor design in data collection which does not allow you to accurately compare
students. For example if you are measuring heights of students you need to
very clearly set out the standardised process you used with every student.
Perhaps every student must take off their shoes, stand with their back straight
against the wall, be measured by the same person etc.

c) A lack of anonymity leading to untrue answers. You will likely get different
answers to some questions depending on whether the survey is anonymous
or not. People will (unsurprisingly) be less likely to tell the truth if they think it
makes them look bad. If you do a survey on time spent doing exercise a
week some people will tell you what they think they should be doing, not what
they are doing! This is a common problem for doctors when talking to
patients.

Say for example I want to conduct an experiment to test whether listening to music
can help with memorising a list of words. A simple way of doing this would be:

Simple experiment

Twenty students are given a list of 10 words and 2 minutes to memorise. They then
write down how many they can remember.
The same 20 students are now given 10 different words and 2 minutes to memorise.
This time they listen to music whilst memorising. They then write down how many
they can remember.

Whilst this looks superficially like a fair experiment there are some serious flaws.
Firstly I can’t be sure that the 2 lists of 10 words are equally easy to remember. This
flaw on its own makes the whole experiment completely useless in drawing any
conclusions from. Secondly it may be the case that the students perform better on
the second trial because they have already got their brains in-gear (or worse
because they are tired etc). So we need to control for both of these problems.

Better experiment

Five students are given a list of 10 words [LIST 1] and 2 minutes to memorise.
There is no music.
The same 5 students are now given 10 different words [LIST 2] and 2 minutes to
memorise. They listen to music whilst memorising.

Five other students are given a list of 10 words [LIST 2] and 2 minutes to memorise.
There is no music.
The same 5 students are now given 10 different words [LIST 1] and 2 minutes to
memorise. They listen to music whilst memorising.

Five other students are given a list of 10 words [LIST 1] and 2 minutes to memorise.
They listen to music whilst memorising.
The same 5 students are now given 10 different words [LIST 2] and 2 minutes to
memorise. There is no music.

Five other students are given a list of 10 words [LIST 2] and 2 minutes to memorise.
They listen to music whilst memorising.
The same 5 students are now given 10 different words [LIST 1] and 2 minutes to
memorise. There is no music.

This experiment design means that half the students had LIST 1 first and half had
LIST 2 first, half the time LIST 1 was memorised with music and half the time LIST 2
was memorised with music, half the time the students listened to music first and half
the time they listened to music second.

If we denote A: LIST 1, B: LIST 2, N: No music, M: Music, we can write the
combinations as:

AN-BM
BN-AM
AM-BN
BM-AN

We can see that we have covered all the possible pairings.
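
The four orders can be generated systematically, which helps when a design has more factors. The sketch below uses itertools.product from the standard library. The student labels S1 to S20 are hypothetical placeholders.

```python
from itertools import product

# A: LIST 1, B: LIST 2, N: no music, M: music (labels from the text)
other_list = {"A": "B", "B": "A"}
other_condition = {"N": "M", "M": "N"}

# Each first trial (list, condition) determines the second trial: the other list, the other condition
orders = [
    f"{word_list}{condition}-{other_list[word_list]}{other_condition[condition]}"
    for word_list, condition in product("AB", "NM")
]

# Hypothetical student labels S1..S20, allocated five to each order
students = [f"S{i}" for i in range(1, 21)]
allocation = {order: students[i * 5:(i + 1) * 5] for i, order in enumerate(orders)}

print(orders)
print(allocation)
```

In practice the students would be shuffled before being allocated, so that which order a student gets is itself random.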

I can then conduct a paired t-test, where I simply compare the student “with music”
score with the student “without music” score.

For more complicated designs you can use the idea of Latin Squares to design fair
experiments like this.
