Assignment - MATHS IV
Engineering Maths IV
Assignment
Probability and Statistics
FRANCIS WANTONO
Question 1
a) Explain what you understand by a statistical model.
b) Write down a random variable which could be modelled by
i. a discrete uniform distribution,
ii. a normal distribution.
Question 2
A group of students believes that the time taken to travel to college, T minutes, can be assumed
to be normally distributed. Within the college 5% of students take at least 55 minutes to travel
to college and 0.1% take less than 10 minutes.
Find the mean and standard deviation of T.
Question 3
The discrete random variable X has probability function
a) Show that 𝑘 = 1/15
The distances, in kilometres, travelled to school by the teachers in two schools, A and B, in the
same town were recorded. The data for School A are summarized in Diagram 1.
For School B, the least distance travelled was 3 km and the longest distance travelled was 55 km.
The three quartiles were 17, 24 and 31 respectively.
An outlier is an observation that falls either 1.5 × (interquartile range) above the upper quartile
or 1.5 × (interquartile range) below the lower quartile.
b) Draw a box plot for School B.
c) Compare and contrast the two box plots.
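The outlier rule stated in this question can be sketched in Python as follows. This is only an illustration: the quartiles and observations are hypothetical (they are not the School A or School B data), and the function name `outlier_fences` is an assumption introduced here.

```python
def outlier_fences(q1, q3):
    """Return (lower, upper) limits; observations beyond them count as outliers."""
    iqr = q3 - q1                      # interquartile range
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical quartiles and observations, chosen only to illustrate the rule.
lower, upper = outlier_fences(q1=10, q3=20)
values = [2, 12, 18, 40]
outliers = [v for v in values if v < lower or v > upper]
print(lower, upper, outliers)
```

On a box plot, the whiskers would normally stop at the most extreme observations that lie inside these limits, with any outliers marked individually.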
Question 6
For any married couple who are members of a tennis club, the probability that the husband has
a degree is 3/5 and the probability that the wife has a degree is 1/2. The probability that the
husband has a degree, given that the wife has a degree, is 11/12.
Question 7
A piece of string AB has length 12 cm. A child cuts the string at a randomly chosen point P, into
two pieces. The random variable X represents the length, in cm, of the piece AP.
a) Suggest a suitable model for the distribution of X and specify it fully
b) Find the cumulative distribution function of X.
c) Write down P(X < 4).
Question 8
A manufacturer of chocolates produces 3 times as many soft centred chocolates as hard centred
ones.
Assuming that chocolates are randomly distributed within boxes of chocolates, find the
probability that in a box containing 20 chocolates there are:
a) Equal numbers of soft centred and hard centred chocolates,
b) Fewer than 5 hard centred chocolates.
A large box of chocolates contains 100 chocolates.
c) Write down the expected number of hard centred chocolates in a large box.
Question 9
The continuous random variable X has probability density function f(x) given by
Question 10
At the end of a season a league of eight ice hockey clubs produced the following table showing
the position of each club in the league and the average attendances (in hundreds) at home
matches.
a) Calculate the Spearman rank correlation coefficient between position in the league and
average home attendance.
b) Stating clearly your hypotheses and using a 5% two-tailed test, interpret your rank
correlation coefficient.
Many sets of data include tied ranks.
c) Explain briefly how tied ranks can be dealt with.
Question 11
The three tasks most frequently carried out in a garage are A, B and C. For each of the tasks the
times, in minutes, taken by the garage mechanics are assumed to be normally distributed with
means and standard deviations given in the following table.
Assuming that the times for the three tasks are independent, calculate the probability that:
d) the total time taken by a single randomly chosen mechanic to carry out all three tasks
lies between 533 and 655 minutes,
e) a randomly chosen mechanic takes longer to carry out task B than task C.
Question 12
In 1789, Henry Cavendish estimated the density of the earth by using a torsion balance. His 29
measurements follow, expressed as a multiple of the density of water.
a) Calculate;
ii. Use the data to obtain measures of central tendency
b) Using information provided below about the hydrocarbon levels and oxygen purity;
THE END
HINT
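The regression equation is:
$\hat{y} = a + bx$
The exact equation is:
$y = a + bx + e$
where $e = y - \hat{y}$ is the residual (error), $\hat{y}$ is the estimated value and $y$ is the observed value.
The slope and intercept are:
$b = \dfrac{S_{xy}}{S_{xx}}$
$a = \bar{y} - b\bar{x}$
Variance of the error:
$S_\varepsilon^2 = \dfrac{SSE}{n - 2}$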
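As a rough illustration of the hint above, the sketch below fits a line by least squares in plain Python. It assumes the usual definitions $S_{xx} = \sum (x - \bar{x})^2$, $S_{xy} = \sum (x - \bar{x})(y - \bar{y})$ and SSE = sum of squared residuals, which the hint does not spell out. The data and the function name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit y_hat = a + b*x; returns (a, b, error variance)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Assumed standard definitions of the corrected sums of squares/products.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx                    # slope
    a = y_bar - b * x_bar              # intercept
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)
    return a, b, sse / (n - 2)

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, s2 = fit_line(xs, ys)
print(round(a, 3), round(b, 3), round(s2, 3))
```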
Also:
The coefficient of determination is used to measure the strength of the linear relationship.
Coefficient of determination = $R^2 = r^2$
where r = correlation coefficient:
$r = \dfrac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - (\sum x)^2}\,\sqrt{n \sum y^2 - (\sum y)^2}}$
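The correlation formula can be evaluated directly from the five sums it contains. The following is a minimal sketch in plain Python; the data are hypothetical and the function name `pearson_r` is an assumption made for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r from the raw sums of x, y, xy, x^2 and y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
r_squared = r ** 2                     # coefficient of determination
print(round(r, 4), round(r_squared, 4))
```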
The Spearman Correlation
Spearman’s correlation coefficient is a statistical measure of the strength of a monotonic
relationship between paired data. In a sample it is denoted by rs or ρ and is by design constrained
to −1 ≤ rs ≤ 1.
Its interpretation is similar to that of Pearson's: the closer rs is to ±1, the stronger the
monotonic relationship.
Verbally, we can describe the strength of the correlation using the absolute value of rs:
– 0.00-0.19 “very weak”
– 0.20-0.39 “weak”
– 0.40-0.59 “moderate”
– 0.60-0.79 “strong”
– 0.80-1.0 “very strong”
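The verbal scale above can be written as a small lookup, sketched here in Python. The function name `describe_strength` is hypothetical, and treating each band as running up to the start of the next one (so that 0.195, say, counts as "very weak") is an assumption, since the listed bands are given to two decimal places only.

```python
def describe_strength(rs):
    """Map a Spearman coefficient to the verbal label for its absolute value."""
    size = abs(rs)
    if size > 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if size < 0.20:
        return "very weak"
    if size < 0.40:
        return "weak"
    if size < 0.60:
        return "moderate"
    if size < 0.80:
        return "strong"
    return "very strong"

print(describe_strength(-0.85))
print(describe_strength(0.45))
```

The sign is ignored on purpose: it gives the direction of the relationship, not its strength.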
The Spearman correlation is used in two general situations:
• It measures the relationship between two ordinal variables; that is, X and Y both consist of
ranks.
• It measures the consistency of direction of the relationship between two variables. In this case,
the two variables must be converted to ranks before the Spearman correlation is computed.
The calculation of the Spearman correlation requires:
Two variables are observed for each individual.
The observations for each variable are rank ordered.
Note that the X values and the Y values are ranked separately.
After the variables have been ranked, the Spearman correlation is computed by either:
i. Using the Pearson formula with the ranked data.
ii. Using the special Spearman formula (assuming there are few, if any, tied ranks).
When there are no tied ranks:
$\rho = 1 - \dfrac{6 \sum d_i^2}{n(n^2 - 1)}$
where $d_i$ = difference in paired ranks and n = number of cases.
When there are tied ranks:
$\rho = \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}$
where i = paired score.
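Both routes to the Spearman coefficient can be sketched in plain Python. The example assumes that tied values share the mean of the ranks they would otherwise occupy, which is a common convention but is not stated in these notes. The data and the function names `average_ranks`, `spearman_from_differences` and `spearman_from_ranks` are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1     # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman_from_differences(xs, ys):
    """Special formula; appropriate when there are no tied ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

def spearman_from_ranks(xs, ys):
    """Pearson formula applied to the ranks; also valid with tied ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    den = sqrt(sum((p - mx) ** 2 for p in rx) * sum((q - my) ** 2 for q in ry))
    return num / den

# Hypothetical paired data with no ties: both routes should agree.
xs = [10, 20, 30, 40, 50]
ys = [1, 3, 2, 5, 4]
print(spearman_from_differences(xs, ys), spearman_from_ranks(xs, ys))
```

With tied ranks the two functions can give slightly different values, which is why the Pearson form is the one to use in that case.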
Reading Assignment
Read about how to test for the significance of a correlation
Question 13 (Not Compulsory)
The following data comprises 23 groundwater samples that were collected and analyzed to
determine the Uranium concentration (ppb) and the TDS (mg/L).
a) Plot a scatter diagram for the data
b) From the scatter diagram plotted, would you use the Pearson or Spearman’s method to
determine the correlation? Explain the reasons for your choice
c) Determine and explain the relationship between the 2 variables.
d) Investigate the significance of the correlation established
Question 14 (Not Compulsory)
Using the data provided below, construct rankings for each variable and calculate the rank order
correlation coefficient using Spearman's Rho. Determine if the correlation is statistically
significant. Show all work. Draw a conclusion related to the correlation.
Limitations of r
• Observe that seemingly high values of r, e.g. r = 0.70, explain only about 50% of the
variability in the response variable y (r² = 0.49). So take care when interpreting correlation
coefficients.
• A low value of r does not necessarily imply the absence of a relationship; the relationship
could be curved. So plotting the data is also important.
• Tests exist for testing whether there is no association. But with a large enough sample, even
low values of r, e.g. r = 0.20, can give significant results, which is not a very useful finding.
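The first and last points can be checked numerically. The sketch below assumes the standard t test for a Pearson correlation, $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with n − 2 degrees of freedom; this formula is not given in these notes, and the sample sizes and the function name `t_statistic` are hypothetical.

```python
from math import sqrt

def t_statistic(r, n):
    """t statistic for H0: no linear association (n - 2 degrees of freedom)."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Proportion of variability explained when r = 0.70.
explained = round(0.70 ** 2, 2)
print(explained)

# The same weak correlation, r = 0.20, at three hypothetical sample sizes.
for n in (20, 100, 400):
    print(n, round(t_statistic(0.20, n), 2))
```

For a two-tailed 5% test the critical value of t is about 2.10 with 18 degrees of freedom and about 1.98 with 98, so under these assumptions r = 0.20 is not significant at n = 20 but is at n = 100, even though it explains only 4% of the variability.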