Tutorial Sheet EN

This document provides an overview of key topics in statistics and probability, including: 1. Descriptive statistical concepts like statistical units, attributes, frequency distributions, and distribution functions. 2. Common measures used to describe distributions, such as the arithmetic mean, median, variance, and quantiles. 3. Fundamental probability concepts like random variables, probability distributions, and limit theorems. 4. Examples and exercises are provided to illustrate statistical and probabilistic calculations and analyses for real-world data.

Uploaded by

Valentin
Copyright © All Rights Reserved

Tutorial Statistics and Probability

Summer Term 2022

Contents
1 Statistical attributes and variables

2 Measures to describe statistical distributions

3 Two dimensional distributions

4 Linear regression

5 Combinatorics and counting principles

6 Fundamentals of probability theory

7 Random variables in one dimension

8 Multidimensional random variables

9 Stochastic models and special distributions

10 Limit theorems

11 Point estimation of population parameters

12 Interval estimation

13 Statistical hypotheses testing


1 Statistical attributes and variables

Comprehension questions

Explain in your own words:

• What is statistics about and what does descriptive statistics mean?

• What is a frequency density function and what is a distribution function? Which (mathematical) properties must they satisfy, and how are they linked to each other?

• What kind of information can easily be read off a frequency density function, and what kind off a distribution function?

Exercise 1
Consider the following attributes and give examples of their statistical units and characteristic values. Indicate the type of each attribute and the scale on which it can be measured.
hair color body height
income weight
school grades religion
gender social affiliation
profession wealth
bank account transfers per month

Exercise 2
The owner of a kiosk records how many copies of a certain newspaper he sold on each of the last 200 days.
newspapers sold   number of days
0                 21
1                 46
2                 54
3                 40
4                 24
5                 10
6                 5
(a) What is the statistical unit in this case and what are the possible attributes? What type of
attributes would you consider here?
(b) Draw the distribution function for the given table.
(c) Interpret the value of the distribution function at x = 2.
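The step heights of the empirical distribution function can be checked with a short Python sketch that accumulates the relative frequencies from the table. The function name `ecdf` is an illustrative choice, not part of the exercise.

```python
from fractions import Fraction

# Kiosk data from the table: newspapers sold per day -> number of days
sold = [0, 1, 2, 3, 4, 5, 6]
days = [21, 46, 54, 40, 24, 10, 5]
n = sum(days)  # 200 days in total


def ecdf(x):
    """Empirical distribution function: share of days with at most x newspapers sold."""
    return Fraction(sum(d for k, d in zip(sold, days) if k <= x), n)


# Value of the step function at each observed number of newspapers
for k in sold:
    print(k, ecdf(k), float(ecdf(k)))
```

Between two observed values the function stays constant, so for example `ecdf(2.5)` equals `ecdf(2)`.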

Exercise 3
Students were asked about their monthly rent expenses. The results are shown by the distribution function below:

[Figure: distribution function H̄(x) in % (vertical axis, 0 to 100) plotted against rent expenses x in EUR (horizontal axis, 250 to 650)]

(a) What percentage of the respondents pay a rent between 350 and 450 EUR?
(b) What is the minimum rent paid by the 20% of respondents with the highest rental expenses?
(c) Determine the median of the distribution.
(d) Draw the histogram of the distribution.

Exercise 4
The table below gives a summary of the results of the last statistics exam. A maximum of 100 points
could be achieved.
Points (from ... to below ...)   Number of exams
0 – 25                           50
25 – 50                          90
50 – 75                          170
75 – ···                         90
(a) Draw the histogram and the distribution function (approximated by a polygon).
(b) How many students achieved at most 90 points? Justify your answer and describe the necessary assumptions.
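One way to answer (b) is a minimal Python sketch under two assumptions that are not stated in the table: the open last class ends at the maximum of 100 points, and points are spread uniformly within each class, so the distribution function is a polygon and values inside a class are obtained by linear interpolation.

```python
# Classes (lower, upper, count); the upper bound 100 of the last class is an
# assumption based on the stated maximum of 100 points.
classes = [(0, 25, 50), (25, 50, 90), (50, 75, 170), (75, 100, 90)]
total = sum(c for _, _, c in classes)  # 400 exams


def count_at_most(x):
    """Number of exams with at most x points, assuming points are spread
    uniformly within each class (linear interpolation of the polygon)."""
    result = 0.0
    for lo, hi, c in classes:
        if x >= hi:
            result += c
        elif x > lo:
            result += c * (x - lo) / (hi - lo)
    return result


print(count_at_most(90))  # 364.0
```

With a different assumed upper bound for the last class the answer changes, which is why the exercise asks for the assumptions.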

Exercise 5
Consider the daily turnover X (in EUR) of “Luigi’s Pizza Service” over 50 days, given by the ordered data set x1, x2, . . . , x50:
115.7 148.2 209.8 225.2 256.5 290.0 293.2 294.0 295.0 301.0
301.8 305.1 333.1 339.5 356.7 361.2 361.6 388.0 403.5 419.7
442.7 459.7 461.6 467.4 489.9 498.0 505.6 510.8 539.7 547.0
564.7 568.5 612.6 638.6 642.7 650.6 651.1 651.3 687.8 689.6
690.7 717.1 784.1 824.6 927.0 976.9 982.5 1016.2 1154.0 1197.6
(a) For the interval [100, 1200] and a class width of 100, prepare a table of (i) absolute frequencies, (ii) relative frequencies and (iii) cumulative relative frequencies of the daily turnovers.
(b) Depict the (empirical) density function of classes using a histogram.

(c) Graphically depict the related (empirical) distribution function of classes using an approximating
polygon.
(d) Determine the median of the distribution
i. using the given data set,
ii. using the distribution function of classes.
(e) On how many days was Luigi’s turnover 650 EUR or less? Again, answer the question
i. using the given data set,
ii. using the distribution function of classes.
Briefly explain why the answers to (i) and (ii) differ.
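The difference between working with the raw data and with the classes can be made visible in a Python sketch. It assumes half-open classes [lo, hi) and, for the class-based answers, turnovers spread uniformly within each class (the approximating polygon); the helper names are illustrative.

```python
import statistics

# Ordered daily turnovers (EUR) of Luigi's Pizza Service, x1..x50
data = [
    115.7, 148.2, 209.8, 225.2, 256.5, 290.0, 293.2, 294.0, 295.0, 301.0,
    301.8, 305.1, 333.1, 339.5, 356.7, 361.2, 361.6, 388.0, 403.5, 419.7,
    442.7, 459.7, 461.6, 467.4, 489.9, 498.0, 505.6, 510.8, 539.7, 547.0,
    564.7, 568.5, 612.6, 638.6, 642.7, 650.6, 651.1, 651.3, 687.8, 689.6,
    690.7, 717.1, 784.1, 824.6, 927.0, 976.9, 982.5, 1016.2, 1154.0, 1197.6,
]
n = len(data)
edges = list(range(100, 1300, 100))  # class limits 100, 200, ..., 1200
classes = list(zip(edges[:-1], edges[1:]))

# (a) absolute, relative and cumulative relative frequencies of classes [lo, hi)
counts = [sum(lo <= x < hi for x in data) for lo, hi in classes]
cum = 0
for (lo, hi), c in zip(classes, counts):
    cum += c
    print(f"[{lo}, {hi}): {c:2d}  {c / n:.2f}  {cum / n:.2f}")


def count_at_most_classes(x):
    """Approximate number of days with turnover <= x from the class polygon."""
    total = 0.0
    for (lo, hi), c in zip(classes, counts):
        if x >= hi:
            total += c
        elif x > lo:
            total += c * (x - lo) / (hi - lo)
    return total


def quantile_classes(p):
    """Approximate p-quantile by linear interpolation in the class polygon."""
    cum = 0
    for (lo, hi), c in zip(classes, counts):
        if cum + c >= p * n:
            return lo + (p * n - cum) / c * (hi - lo)
        cum += c


# (d) median from the raw data and from the classes
median_data = statistics.median(data)
median_classes = quantile_classes(0.5)
# (e) days with a turnover of 650 EUR or less
days_data = sum(x <= 650 for x in data)
days_classes = count_at_most_classes(650)
print(median_data, median_classes, days_data, days_classes)
```

The class-based figures differ from the raw-data figures because the polygon replaces the actual positions of the values inside each class by an even spread.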

2 Measures to describe statistical distributions
• One of the most important measures is the arithmetic mean. Which alternatives do you know
and when would you prefer one measure over another?

• What is the difference between a quartile and a quantile?

• Why is the variance defined via squared deviations? Why not simply as the sum of deviations from the arithmetic mean, or as the sum of absolute deviations?

Exercise 6
The table below shows how often each of the lotto numbers 1 to 49 was drawn over a period of 25 years (9,114 numbers drawn in total).
number       1    2    3    4    5    6    7
frequency  187  194  194  178  176  187  175
number       8    9   10   11   12   13   14
frequency  179  201  173  175  180  157  181
number      15   16   17   18   19   20   21
frequency  172  180  190  183  191  181  200
number      22   23   24   25   26   27   28
frequency  191  186  177  196  200  180  167
number      29   30   31   32   33   34   35
frequency  186  182  199  211  189  175  183
number      36   37   38   39   40   41   42
frequency  199  175  199  199  195  183  182
number      43   44   45   46   47   48   49
frequency  189  181  188  191  175  195  207
(a) Calculate the arithmetic mean of these 9,114 numbers.
(b) Given that the possible lotto numbers range from 1 to 49 and are drawn uniformly, which number would you expect as the average?
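The mean in (a) is a frequency-weighted mean of the numbers 1 to 49. A Python sketch of both parts, reading the frequencies row by row from the table:

```python
# Frequencies of the lotto numbers 1..49 as listed in the table (row by row)
freq = [
    187, 194, 194, 178, 176, 187, 175,
    179, 201, 173, 175, 180, 157, 181,
    172, 180, 190, 183, 191, 181, 200,
    191, 186, 177, 196, 200, 180, 167,
    186, 182, 199, 211, 189, 175, 183,
    199, 175, 199, 199, 195, 183, 182,
    189, 181, 188, 191, 175, 195, 207,
]
numbers = range(1, 50)

# (a) arithmetic mean of all drawn numbers, weighted by their frequencies
n_total = sum(freq)
mean_drawn = sum(k * h for k, h in zip(numbers, freq)) / n_total

# (b) mean of a uniform distribution on 1..49
mean_uniform = sum(numbers) / len(numbers)

print(n_total, round(mean_drawn, 4), mean_uniform)
```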

Exercise 7
In a software company, the median of the employees’ incomes is 3100 € and the arithmetic mean is 3400 €. Due to increasing demand for programming specialists and rising staff turnover, the HR division decides to raise incomes by 12%, but only for its best employees. Before the raise, this group of best employees consisted of those with the top 20% of incomes, and they earned 40% of the total income the firm paid to its employees.
Calculate the arithmetic mean and the median after the income increase.
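A small Python sketch of the reasoning: only the 40% income share of the top group grows by 12%, so the total (and hence the mean) is scaled by 0.6 + 0.4 · 1.12. The median stays put under the assumption that everyone in the top 20% earns more than the median earner, which holds because 20% lies entirely above the 50% mark.

```python
mean_before = 3400.0
median_before = 3100.0
share_top = 0.40   # share of total income earned by the top 20%
raise_factor = 1.12

# Only the top group's income (40% of the total) grows by 12%
mean_after = mean_before * ((1 - share_top) + share_top * raise_factor)

# The top 20% all earn more than the median earner, so raising their incomes
# does not change which income lies in the middle of the ordered list.
median_after = median_before

print(round(mean_after, 2), median_after)
```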

Exercise 8
You go shopping for vitamins: in the first shop you see an offer of 12 oranges for 6 € and you buy them. In the second shop you see an even better offer of 24 oranges for 6 €, and you buy them too. On your way home, you see the best offer ever, 36 oranges for 6 €. Regretting your previous decisions, you buy these oranges as well and head home. Once home, you note the prices you paid per 12 oranges as
6 €, 3 €, 2 €
respectively. But what is the average price you paid on your shopping trip? Calculate
(a) the unweighted arithmetic mean p̄ of the three prices you paid,
(b) the unweighted harmonic mean Hp,
(c) the quantity-weighted arithmetic mean.
Which of your calculations (a), (b) or (c) do you think gives the correct answer?
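The three averages can be compared exactly with Python's `fractions` module. Quantities are counted in units of 12 oranges (1, 2 and 3 dozen), which is an illustrative choice of unit.

```python
from fractions import Fraction

prices = [6, 3, 2]   # price paid per 12 oranges in each shop (EUR)
dozens = [1, 2, 3]   # quantities bought, in units of 12 oranges

# (a) unweighted arithmetic mean
p_bar = Fraction(sum(prices), len(prices))
# (b) unweighted harmonic mean: n / sum(1 / p)
h_p = len(prices) / sum(Fraction(1, p) for p in prices)
# (c) quantity-weighted arithmetic mean = total spent / total dozens
p_weighted = Fraction(sum(p * q for p, q in zip(prices, dozens)), sum(dozens))

print(p_bar, h_p, p_weighted)
```

Here the harmonic mean and the quantity-weighted mean coincide because the same amount of money (6 €) was spent in each shop.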

Exercise 9
Arithmetic, geometric and harmonic mean
(a) A racing cyclist rides for one hour at a speed of 60 km/h. Then, for one hour and 15 minutes, he rides at a speed of 50 km/h. What is his average speed altogether?
(b) During the last three years the yearly salary increases in a company were 1.8%, 2.5% and 2.0%. Calculate the average growth rate of salary.
(c) For a certain reporting day the employment agency reports the following unemployment data for five districts.
district                       1      2      3      4      5
rate of unemployment in %      4      3      5      9      6
number of unemployed        1600    750   1000   3600   1500
Determine the average rate of unemployment over the five districts.
(d) For the solutions x1, x2 of the quadratic equation

$$x^2 - 15x + 49 = 0$$

calculate
i. the arithmetic mean x̄,
ii. the geometric mean GX,
iii. the harmonic mean HX.
(Optional: Can you solve this problem without actually calculating x1 and x2? Try it!)
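The four parts each call for a different kind of mean. A Python sketch: (a) total distance over total time, (b) the geometric mean of the growth factors, (c) total unemployed over total labour force (the labour force of a district is its number of unemployed divided by its rate), and (d) Vieta's formulas x1 + x2 = 15 and x1 · x2 = 49, which avoid computing the roots.

```python
import math
from statistics import geometric_mean

# (a) average speed = total distance / total time
hours = [1.0, 1.25]
speeds = [60.0, 50.0]
avg_speed = sum(h * v for h, v in zip(hours, speeds)) / sum(hours)

# (b) average growth rate via the geometric mean of the growth factors
factors = [1.018, 1.025, 1.020]
avg_growth = geometric_mean(factors) - 1

# (c) average unemployment rate = total unemployed / total labour force
rates = [0.04, 0.03, 0.05, 0.09, 0.06]
unemployed = [1600, 750, 1000, 3600, 1500]
labour_force = [u / r for u, r in zip(unemployed, rates)]
avg_rate = sum(unemployed) / sum(labour_force)

# (d) means of the roots of x^2 - 15x + 49 = 0 via Vieta: x1 + x2 = 15, x1 * x2 = 49
s, p = 15, 49
arith = s / 2
geom = math.sqrt(p)
harm = 2 * p / s

print(round(avg_speed, 2), round(100 * avg_growth, 3), round(100 * avg_rate, 3))
print(arith, geom, round(harm, 4))
```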

Exercise 10
The famous psychologist A. Jensen conducts tests to measure the intelligence quotient (IQ) of students. In particular, he considers two groups (w and sch) of individuals of almost the same age and gathers the following information:
IQ(w) 90 90 97 99 98 145 114 80 85 102
IQ(sch) 95 99 90 105 85 98 110 96 69 103
(a) Calculate the average IQ for the two groups and indicate for each participant the deviation from
the respective mean.
(b) In which of the two groups is the dispersion (variance) larger?
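A Python sketch for both groups. It assumes the variance is computed with divisor n, as is usual in descriptive statistics; this is what `statistics.pvariance` does.

```python
from statistics import mean, pvariance

iq_w = [90, 90, 97, 99, 98, 145, 114, 80, 85, 102]
iq_sch = [95, 99, 90, 105, 85, 98, 110, 96, 69, 103]

for name, iq in [("w", iq_w), ("sch", iq_sch)]:
    m = mean(iq)
    deviations = [x - m for x in iq]
    # pvariance divides by n, as in descriptive statistics
    print(name, m, deviations, pvariance(iq))
```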

Exercise 11
Consider the following statistical sequences:
X 0 0 0 0 0 0 0 0
Y a a a a a a a a a
U 0 1 0 1 0 1 0 1
V 1 0 1 0 1 0 1 0 1
W 3 3 3 7 5 3 5 4
T 1 2 3 4 5 6 7 8 9
Calculate the variance for each of the 6 sequences.
(Hint: Don’t forget that there is a simplified formula for the calculation.)
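A Python sketch of the simplified (shortcut) formula s² = mean(x²) − x̄², with divisor n. The constant a in sequence Y is not specified in the exercise, so the value 7 below is purely hypothetical; any value gives the same variance.

```python
from fractions import Fraction


def variance_shortcut(xs):
    """Variance with divisor n via the shortcut formula s^2 = mean(x^2) - mean(x)^2."""
    n = len(xs)
    m = Fraction(sum(xs), n)
    m2 = Fraction(sum(x * x for x in xs), n)
    return m2 - m * m


a = 7  # hypothetical value for the constant a; any number gives the same result
sequences = {
    "X": [0] * 8,
    "Y": [a] * 9,
    "U": [0, 1] * 4,
    "V": [1, 0] * 4 + [1],
    "W": [3, 3, 3, 7, 5, 3, 5, 4],
    "T": list(range(1, 10)),
}
for name, xs in sequences.items():
    print(name, variance_shortcut(xs))
```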

Exercise 12
Variance, standard deviation

Students in Europe and in the U.S. were asked about their monthly expenses on books. The results are shown in the tables below:

Europe
expenses from 0 50 100 150 200 250
to below [EUR] 50 100 150 200 250 ···
midpoint of class [EUR] 25 75 125 175 225 275
relative frequency 0.1 0.2 0.3 0.2 0.1 0.1

U.S.
expenses from 0 50 100 150 200 250
to below [USD] 50 100 150 200 250 ···
midpoint of class [USD] 25 75 125 175 225 275
relative frequency 0.2 0.25 0.25 0.2 0.05 0.05

(a) Calculate variance, standard deviation and coefficient of variation for both data sets.
(b) Where do you observe a higher dispersion in expenses, in Europe or in the U.S.? Which measure
do you apply?
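A Python sketch that represents every value in a class by the class midpoint given in the tables (divisor n). The coefficient of variation is dimensionless, which is what makes a comparison between EUR and USD data meaningful.

```python
import math
from fractions import Fraction

midpoints = [25, 75, 125, 175, 225, 275]
rel_freq = {
    "Europe (EUR)": [Fraction(x) for x in ("0.1", "0.2", "0.3", "0.2", "0.1", "0.1")],
    "U.S. (USD)": [Fraction(x) for x in ("0.2", "0.25", "0.25", "0.2", "0.05", "0.05")],
}

results = {}
for region, h in rel_freq.items():
    # All values in a class are represented by the class midpoint
    mean = sum(m * hj for m, hj in zip(midpoints, h))
    var = sum(m * m * hj for m, hj in zip(midpoints, h)) - mean ** 2
    sd = math.sqrt(var)
    cv = sd / mean  # coefficient of variation: dimensionless, comparable across currencies
    results[region] = (mean, var, sd, cv)
    print(region, mean, var, round(sd, 2), round(cv, 3))
```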

Exercise 13
Suppose you have a statistical sequence X with mean $\bar{x} = 12$ and variance $s_X^2 = 25$. A new sequence Y is defined from X: each element of X is multiplied by the constant factor b = −2.5, and then the constant a = 4 is added. Calculate $\bar{y}$, $s_Y^2$ and $s_Y$.
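For a linear transformation y = a + b·x the rules are ȳ = a + b·x̄, s²_Y = b²·s²_X and s_Y = |b|·s_X. The Python sketch below applies them and checks them on a hypothetical two-element sequence [7, 17], chosen only because it has mean 12 and variance 25 (divisor n).

```python
from statistics import mean, pstdev, pvariance

a, b = 4, -2.5
x_bar, s2_x = 12, 25

# Rules for a linear transformation y = a + b*x
y_bar = a + b * x_bar
s2_y = b ** 2 * s2_x
s_y = abs(b) * s2_x ** 0.5

# Check with a hypothetical sequence that has mean 12 and variance 25 (divisor n)
xs = [7, 17]
ys = [a + b * x for x in xs]
assert mean(xs) == x_bar and pvariance(xs) == s2_x
print(y_bar, s2_y, s_y, mean(ys), pvariance(ys), pstdev(ys))
```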

Exercise 14
For a data sample of size n = 12 we observed the arithmetic mean $\bar{x} = 9$ and the standard deviation $s_X = 2.5$. Afterwards it became apparent that three values $x_{13} = 8$, $x_{14} = 12$, $x_{15} = 13$ had been omitted. What are the arithmetic mean and the standard deviation of the complete sample of size $n_{new} = 15$?
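The idea is to recover the sum and the sum of squares of the original sample from x̄ and s_X, add the three missing values, and recompute. A Python sketch, assuming the standard deviation uses divisor n:

```python
import math
from fractions import Fraction

n, x_bar, s_x = 12, Fraction(9), Fraction(5, 2)
extra = [8, 12, 13]

# Recover the sums of the original sample (standard deviation with divisor n)
sum_x = n * x_bar
sum_x2 = n * (s_x ** 2 + x_bar ** 2)   # from s^2 = mean(x^2) - mean(x)^2

# Add the three omitted values
n_new = n + len(extra)
sum_x_new = sum_x + sum(extra)
sum_x2_new = sum_x2 + sum(x * x for x in extra)

mean_new = sum_x_new / n_new
var_new = sum_x2_new / n_new - mean_new ** 2
print(mean_new, float(mean_new), var_new, math.sqrt(var_new))
```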

3 Two dimensional distributions
Comprehension questions
Explain in your own words:

• What is the purpose of distributions with two or more dimensions?

• How do the calculations of mean and variance differ from the univariate case?

• What does “statistical independence” mean and why is it important?

• What is the difference between “statistical dependence” and “correlation”?

Exercise 15
About twins: A psychologist conducts an experiment to measure the IQ of seven pairs of identical twins who were separated after birth. The following table summarizes his findings: the first row (X) gives the values for the twin who grew up with his biological parents, whereas the second row (Y) gives the respective IQ of the twin who grew up with foster parents.
X 98 100 104 104 102 102 104
Y 94 94 103 105 99 102 103
Explore the relationship between these numbers statistically:
(a) Show the statistical relationship in an appropriate plot and interpret the resulting figure.
(b) Then, to check your claim and interpretation, calculate a measure of the statistical relationship between the numbers in the table.

Exercise 16
For the sequences below, calculate the following covariances and correlation coefficients:

$c_{XU}$, $c_{YV}$, $c_{VT}$, $c_{UV}$, $r_{UW}$, $r_{YV}$ and $r_{VT}$.

X 0 0 0 0 0 0 0 0
Y a a a a a a a a a
U 0 1 0 1 0 1 0 1
V 1 0 1 0 1 0 1 0 1
W 3 3 3 7 5 3 5 4
T 1 2 3 4 5 6 7 8 9
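A Python sketch with exact fractions (divisor n). It deliberately refuses two cases instead of returning a number: pairing sequences of different lengths (U has 8 elements, V has 9) and a correlation involving a constant sequence (zero variance). The value of a in Y is hypothetical.

```python
import math
from fractions import Fraction


def cov(xs, ys):
    """Empirical covariance with divisor n; both sequences must have equal length."""
    if len(xs) != len(ys):
        raise ValueError("sequences of different length cannot be paired")
    n = len(xs)
    mx, my = Fraction(sum(xs), n), Fraction(sum(ys), n)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def corr(xs, ys):
    """Correlation coefficient; undefined if one sequence has zero variance."""
    vx, vy = cov(xs, xs), cov(ys, ys)
    if vx == 0 or vy == 0:
        raise ZeroDivisionError("correlation undefined: a sequence is constant")
    return float(cov(xs, ys)) / math.sqrt(vx * vy)


a = 7  # hypothetical value for the constant a
X = [0] * 8
Y = [a] * 9
U = [0, 1] * 4
V = [1, 0] * 4 + [1]
W = [3, 3, 3, 7, 5, 3, 5, 4]
T = list(range(1, 10))

print(cov(X, U), cov(Y, V), cov(V, T), corr(U, W))
for label, f, args in [("c_UV", cov, (U, V)), ("r_YV", corr, (Y, V)), ("r_VT", corr, (V, T))]:
    try:
        print(label, f(*args))
    except (ValueError, ZeroDivisionError) as err:
        print(label, "->", err)
```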

Exercise 17
Employment: In Germany, the Federal Statistical Office regularly publishes figures on the distribution of employment. The table below gives an overview:
                 Type of employment
Age class        Self-employed    Employees (dep.)
15 – 25                102              4080
25 – 35                604              6597
35 – 45               1453              9672
45 – 55               1272              7827
55 – 65                798              3643
65 – 95                273               246
(a) What are the statistical units, the basic population and attributes here?

(b) Determine the joint distribution and both marginal distributions for the attributes age class and
type of employment.
(c) Determine the two conditional distributions for the attribute age class. Draw a histogram of the
distribution of age class conditional on dependently employed people.
(d) Determine both conditional medians of the attribute age class.
(e) Determine both conditional means of the attribute age class.
(f) What proportions of the self-employed and of the dependently employed are 55 years or older? What proportion of those aged 65 or older is self-employed?
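A Python sketch for (b) to (f). Two assumptions are not stated in the table: the conditional means represent each age class by its midpoint, and the conditional medians interpolate linearly within the median class.

```python
from fractions import Fraction

age_classes = [(15, 25), (25, 35), (35, 45), (45, 55), (55, 65), (65, 95)]
counts = {
    "self-employed": [102, 604, 1453, 1272, 798, 273],
    "employees": [4080, 6597, 9672, 7827, 3643, 246],
}
grand_total = sum(sum(c) for c in counts.values())

# (b) marginal distribution of the type of employment
marginal_type = {t: Fraction(sum(c), grand_total) for t, c in counts.items()}


def conditional_median(c):
    """Median of the age class distribution, interpolating linearly within the class."""
    half, cum = Fraction(sum(c), 2), 0
    for (lo, hi), k in zip(age_classes, c):
        if cum + k >= half:
            return lo + (half - cum) / k * (hi - lo)
        cum += k


for t, c in counts.items():
    total = sum(c)
    cond = [Fraction(k, total) for k in c]  # (c) conditional distribution of age class
    # (e) conditional mean, representing each class by its midpoint (assumption)
    cond_mean = sum(Fraction(lo + hi, 2) * h for (lo, hi), h in zip(age_classes, cond))
    print(t, total, [round(float(h), 3) for h in cond],
          round(float(conditional_median(c)), 2), round(float(cond_mean), 2))

# (f) shares aged 55 or older, and share of self-employed among those aged 65+
share_55_self = Fraction(798 + 273, sum(counts["self-employed"]))
share_55_emp = Fraction(3643 + 246, sum(counts["employees"]))
share_self_65 = Fraction(273, 273 + 246)
print(float(share_55_self), float(share_55_emp), float(share_self_65))
```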

Exercise 18

(a) Consider the following contingency table with absolute frequencies of statistical variables (X, Y ).

y1 y2 y3
x1 10 5 3
x2 6 5 1

i. Determine the conditional relative frequencies

$$h_{1|X=x_2} = \mathrm{relH}(Y = y_1 \mid X = x_2) \quad\text{and}\quad h_{2|Y=y_3} = \mathrm{relH}(X = x_2 \mid Y = y_3).$$

ii. Are the variables X and Y independent?


(b) The marginal distributions of a two-dimensional frequency distribution of the independent variables X and Y are given. Calculate the (absolute) frequencies for all empty fields of the table.

        y1    y2    y3    y4    y5     Σ
x1                                    20
x2                                    40
x3                                    60
x4                                    40
x5                                    40
Σ       50    30    10    70    40   200
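A Python sketch of both parts. In (a), independence would require every joint relative frequency to equal the product of the marginal relative frequencies; in (b), independence implies that each cell equals (row sum · column sum) / n.

```python
from fractions import Fraction

# (a) contingency table: rows x1, x2; columns y1, y2, y3
table = [[10, 5, 3],
         [6, 5, 1]]
n = sum(map(sum, table))
row_sums = [sum(r) for r in table]
col_sums = [sum(c) for c in zip(*table)]

h_y1_given_x2 = Fraction(table[1][0], row_sums[1])  # relH(Y = y1 | X = x2)
h_x2_given_y3 = Fraction(table[1][2], col_sums[2])  # relH(X = x2 | Y = y3)

# Independence requires h(xi, yj) = h(xi) * h(yj) for every cell
independent = all(
    Fraction(table[i][j], n) == Fraction(row_sums[i], n) * Fraction(col_sums[j], n)
    for i in range(2) for j in range(3)
)
print(h_y1_given_x2, h_x2_given_y3, independent)

# (b) under independence each cell equals row sum * column sum / n
rows_b = [20, 40, 60, 40, 40]
cols_b = [50, 30, 10, 70, 40]
n_b = 200
cells = [[Fraction(r * c, n_b) for c in cols_b] for r in rows_b]
for r in cells:
    print([str(v) for v in r])
```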

Exercise 19
Consider the table of relative frequencies h(xi, yj) of the characteristics X and Y for a total of n = 200 observation pairs.
h(xi , yj ) y1 = 0 y2 = 1 y3 = 4 Σ
x1 = 1 0.1
x2 = 2 0.2
Σ 0.4 1
Furthermore, the conditional relative frequency relH(Y = y3 | X = x1) = 0.5 and the absolute frequency absH(Y = y1) = 80 are known.
(a) Complete the frequency table.
(b) Check whether X and Y are statistically dependent or independent.
(c) Determine the number of observation pairs (x, y) that satisfy x + y = 2.
(d) Calculate the covariance $c_{XY}$ of X and Y.

4 Linear regression
Comprehension questions
Explain in your own words:

• What is the purpose of linear regression analysis?

• Discuss how a regression line and causality are related to each other!

• Interpret the coefficient of determination $R^2$!

Exercise 20
A company produces fan heaters. The accounting department gathers the following figures for the year 2007:

Month        Quantity produced (X)   Overtime (Y)
January      3000                    200
February     3200                    250
March        2900                    200
April        2700                    150
May          2700                    150
June         2800                    150
July         2600                    100
August       0                       0
September    2500                    50
October      2600                    70
November     2800                    150
December     3000                    180
August was taken off entirely for vacation.
(a) Calculate the correlation coefficient for production and overtime. Is there a linear statistical
relation?
(b) Check this relation by conducting a linear regression analysis: calculate both the regression function (Y depending on X) and the reverse regression function (that is, the regression of X depending on Y).
(c) Plot the paired data points together with the two resulting regression lines.
(d) Estimate the quantity produced in a “usual” month in which no overtime is worked.

Exercise 21
In five households the following numbers for two attributes X and Y are collected:
X   income of household in thousand euros per year       40    45    60    80    75
Y   monthly expenditures for dairy products in euros     80    80    90   140   100

(a) Calculate $s_X^2$, $s_Y^2$ and $c_{XY}$.


(b) Determine if there exists a statistical relationship between those two factors. If so, indicate the
direction (positive or negative) and the intensity of this relation.

(c) Describe the relationship by means of a linear regression.
(d) Is it possible to calculate a second plausible regression function? If so, calculate it and show both
graphs in a plot.
(e) Explain the intuition behind both graphs by discussing cause and effect.
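A Python sketch with exact fractions (divisor n) for the variances, the covariance, the correlation coefficient and both regression lines.

```python
from fractions import Fraction

x = [40, 45, 60, 80, 75]    # household income, thousand EUR per year
y = [80, 80, 90, 140, 100]  # monthly dairy expenditure, EUR
n = len(x)

mx, my = Fraction(sum(x), n), Fraction(sum(y), n)
s2_x = sum((v - mx) ** 2 for v in x) / n
s2_y = sum((v - my) ** 2 for v in y) / n
c_xy = sum((u - mx) * (v - my) for u, v in zip(x, y)) / n
r_xy = float(c_xy) / (float(s2_x) * float(s2_y)) ** 0.5

# (c) regression of Y on X: y = a + b*x
b = c_xy / s2_x
a = my - b * mx
# (d) reverse regression of X on Y: x = a2 + b2*y
b2 = c_xy / s2_y
a2 = mx - b2 * my
print(s2_x, s2_y, c_xy, round(r_xy, 4), a, b, float(a2), float(b2))
```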

Exercise 22
In order to calculate the linear regression line y = a + bx, the following figures are given:

$$n = 26,\quad \bar{x} = 20,\quad \bar{y} = 3,\quad \overline{x^2} = 418,\quad \overline{y^2} = 24,\quad \overline{x \cdot y} = 77$$

Are these numbers sufficient to determine the regression coefficients? Calculate the coefficient of determination. What do you notice? What do you conclude from this?
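A Python sketch that reads the figures 20, 3, 418, 24 and 77 as the means of x, y, x², y² and x·y (an interpretation of the summary figures), and applies the shortcut formulas for variance and covariance.

```python
from fractions import Fraction

# Summary figures read as means: mean(x), mean(y), mean(x^2), mean(y^2), mean(x*y)
mx, my = Fraction(20), Fraction(3)
mx2, my2, mxy = Fraction(418), Fraction(24), Fraction(77)

s2_x = mx2 - mx ** 2   # shortcut formula for the variance
s2_y = my2 - my ** 2
c_xy = mxy - mx * my

b = c_xy / s2_x
a = my - b * mx
r2 = c_xy ** 2 / (s2_x * s2_y)   # coefficient of determination
print(s2_x, s2_y, c_xy, a, b, r2, float(r2))
```

Since a coefficient of determination can never exceed 1, a result above 1 signals that the given figures cannot come from one and the same data set.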

Exercise 23
For the monthly returns X of a market index and the monthly returns Y of a stock you observed the data below:
month i   index return xi [%]   stock return yi [%]
1          3.6                   5.0
2          2.6                   3.4
3          2.8                   1.0
4          0.0                  −0.8
5         −0.4                   1.6
6          2.2                   3.0
(a) Using an appropriate working table calculate the coefficient of correlation rXY .
(b) Calculate the parameters a and b for a linear regression Ŷ = a + b · X.
(c) Draw a scatter plot of the observed data for X and Y and add your regression line.
(d) Split the total variance of Y into a part that is explained by the regression and an unexplained, residual part. What is the value of the coefficient of determination $R^2$?
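A Python sketch of the working table with exact fractions (divisor n). It also verifies the variance decomposition total = explained + residual, which holds exactly for a least-squares line.

```python
from fractions import Fraction

x = [Fraction(v) for v in ("3.6", "2.6", "2.8", "0.0", "-0.4", "2.2")]  # index returns [%]
y = [Fraction(v) for v in ("5.0", "3.4", "1.0", "-0.8", "1.6", "3.0")]  # stock returns [%]
n = len(x)

mx, my = sum(x) / n, sum(y) / n
sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
sxx = sum((u - mx) ** 2 for u in x)
syy = sum((v - my) ** 2 for v in y)

r = float(sxy) / float(sxx * syy) ** 0.5
b = sxy / sxx
a = my - b * mx

# (d) variance decomposition: total = explained + residual (divisor n)
y_hat = [a + b * u for u in x]
var_total = syy / n
var_explained = sum((v - my) ** 2 for v in y_hat) / n
var_residual = sum((v - vh) ** 2 for v, vh in zip(y, y_hat)) / n
r2 = var_explained / var_total
print(round(r, 4), float(a), float(b), float(var_total), float(var_explained),
      float(var_residual), float(r2))
```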

5 Combinatorics and counting principles
Comprehension questions
Explain in your own words:

• What is combinatorics and what purpose does it serve?

• Which problems/questions are typically asked in this subject?

• Are you able to transfer (the methods of) combinatorics to other issues? Give examples of problems beyond your statistics and math classes.

Exercise 24
The English language has many words consisting of three letters, like cat, dog, man, kid, . . . How many such words can be formed if the second letter is a vowel?
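A Python sketch that counts arbitrary three-letter strings, once with the multiplication rule and once by brute-force enumeration. It assumes the 26-letter alphabet and the five vowels a, e, i, o, u ("y" not counted), and it ignores whether a string is a real English word.

```python
import string
from itertools import product

vowels = set("aeiou")  # assumption: the five vowels a, e, i, o, u ("y" not counted)
letters = string.ascii_lowercase

# Count by the multiplication rule and by brute-force enumeration
by_rule = len(letters) * len(vowels) * len(letters)
by_enumeration = sum(1 for w in product(letters, repeat=3) if w[1] in vowels)
print(by_rule, by_enumeration)  # 3380 3380
```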

Exercise 25
In the lottery “6 out of 49” you win if you have six correct numbers, five correct numbers plus the bonus number (Zusatzzahl), five correct numbers, four correct numbers or three correct numbers. How many winning combinations exist in total? (Remark: These rules are somewhat outdated today.)
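A Python sketch using `math.comb`. It assumes that a ticket has six numbers, that the draw consists of six winning numbers plus one bonus number (leaving 42 other numbers), and that the bonus number only matters for the “five correct” classes.

```python
from math import comb

# 6 winning numbers, 1 bonus number (Zusatzzahl), 42 other numbers; a ticket has 6 numbers
six = comb(6, 6)
five_plus_bonus = comb(6, 5) * 1            # sixth ticket number is the bonus number
five = comb(6, 5) * comb(42, 1)             # sixth ticket number is neither winning nor bonus
four = comb(6, 4) * comb(43, 2)             # bonus number irrelevant here
three = comb(6, 3) * comb(43, 3)

total = six + five_plus_bonus + five + four + three
print(six, five_plus_bonus, five, four, three, total)
```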

Exercise 26
A consulting company has twelve permanently employed consultants and five part-time consulting professors. For a newly acquired project they plan to form a team consisting of one supervisor, two consultants and one professor. How many different consulting teams can be formed?
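The exercise does not say who may act as supervisor. The Python sketch below assumes the supervisor is one of the twelve consultants and differs from the two (unordered) team consultants; other readings give other counts.

```python
from math import comb

consultants, professors = 12, 5

# Assumption: the supervisor is one of the twelve consultants and is distinct
# from the two (unordered) team consultants.
teams = consultants * comb(consultants - 1, 2) * professors
print(teams)  # 3300
```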

Exercise 27
For the upcoming soccer World Cup, national coach Jogi Löw plans to take 23 players in the national team to Russia. Three of them are goalkeepers, who may not play in the field; the other 20 are field players, who may not play in goal. How many different combinations does the coach have for selecting a team for the final? To calculate the result, the following information is given:
Two of the field players are injured and can’t play. Further, it is known that the coach prefers a 3-5-2 formation (defense - midfield - front). For this, he has five strikers, seven midfield players and six defenders.
Determine the number of possible combinations for the coach (a) if the order within an area (front, midfield and defense) matters, and (b) if it does not.
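A Python sketch assuming a team consists of one of the three goalkeepers plus 3 defenders, 5 midfielders and 2 strikers from the available players. Ordered selections use `math.perm`, unordered ones `math.comb` (both Python 3.8+).

```python
from math import comb, perm

goalkeepers, defenders, midfielders, strikers = 3, 6, 7, 5
need_def, need_mid, need_str = 3, 5, 2   # 3-5-2 formation

# (a) order within each area matters: ordered selections (permutations)
ordered = (goalkeepers * perm(defenders, need_def)
           * perm(midfielders, need_mid) * perm(strikers, need_str))
# (b) order within each area does not matter: combinations
unordered = (goalkeepers * comb(defenders, need_def)
             * comb(midfielders, need_mid) * comb(strikers, need_str))
print(ordered, unordered)
```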

Exercise 28
Peter has 12 friends and plans to invite 7 of them to a party.
(a) How many choices are possible if Alan and Bob are feuding and will not both go to the party?
(b) How many choices are possible if Alan and Betty insist that they both go together or neither one
of them goes to the party?
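A Python sketch that checks both counts by enumerating all 7-element guest lists and compares them with closed forms. Apart from Alan, Bob and Betty, the friends' names are placeholders.

```python
from itertools import combinations
from math import comb

# 12 friends; apart from Alan, Bob and Betty the names are placeholders
friends = ["Alan", "Bob", "Betty"] + [f"friend{i}" for i in range(1, 10)]
parties = list(combinations(friends, 7))

# (a) Alan and Bob do not both attend
a_count = sum(1 for p in parties if not ("Alan" in p and "Bob" in p))
# (b) Alan and Betty attend together or not at all
b_count = sum(1 for p in parties if ("Alan" in p) == ("Betty" in p))

# Closed forms: (a) all minus those with both; (b) both (5 of 10 others) plus neither (7 of 10)
print(a_count, comb(12, 7) - comb(10, 5), b_count, comb(10, 5) + comb(10, 7))
```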

6 Fundamentals of probability theory
Comprehension questions
Explain in your own words:

• How did Laplace define probability? What is statistical probability and what does subjective
probability mean?
• Explain Bayes’ theorem and give an example for its application.

Exercise 29
A fair die is rolled twice in succession.
(a) List each element of the sample space.
(b) What is the probability that the sum of the numbers is 10 or larger, given that the first throw was a 5?
(c) . . . , given that at least one of the two throws shows a 5?
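Because all 36 ordered outcomes are equally likely, conditional probabilities can be obtained by counting. A Python sketch:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # (a) 36 equally likely ordered pairs


def cond_prob(event, given):
    """P(event | given) by counting equally likely outcomes."""
    g = [w for w in omega if given(w)]
    return Fraction(sum(1 for w in g if event(w)), len(g))


def sum_at_least_10(w):
    return w[0] + w[1] >= 10


p_b = cond_prob(sum_at_least_10, lambda w: w[0] == 5)   # first throw is a 5
p_c = cond_prob(sum_at_least_10, lambda w: 5 in w)      # at least one 5
print(len(omega), p_b, p_c)
```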

Exercise 30
Six fair coins are tossed at the same time (or one fair coin six times). What is the probability that
“head” shows up exactly three times?
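A Python sketch that counts the sequences with exactly three heads among all 2⁶ equally likely outcomes and compares the result with the binomial formula.

```python
from fractions import Fraction
from itertools import product
from math import comb

outcomes = list(product("HT", repeat=6))  # 64 equally likely sequences
p_enum = Fraction(sum(1 for o in outcomes if o.count("H") == 3), len(outcomes))
p_formula = Fraction(comb(6, 3), 2 ** 6)  # binomial formula with p = 1/2
print(p_enum, p_formula)  # 5/16 5/16
```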

Exercise 31
Suppose you roll a red and a green die. Let the event A be “The red die shows a 2 or a 5” and B
“The sum of the two dice is at least 7.” Are A and B independent?
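Independence can be checked by comparing P(A ∩ B) with P(A) · P(B). A Python sketch over the 36 equally likely (red, green) outcomes:

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # (red, green)


def prob(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))


def a(w):
    return w[0] in (2, 5)       # red die shows 2 or 5


def b(w):
    return w[0] + w[1] >= 7     # sum is at least 7


p_a, p_b, p_ab = prob(a), prob(b), prob(lambda w: a(w) and b(w))
print(p_a, p_b, p_ab, p_a * p_b, p_ab == p_a * p_b)
```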

Exercise 32
A family has two children. Assume that boys and girls are each born with probability 0.5 and that the births of boys and girls are independent of each other. What is the probability that both children are boys if
(a) there is no further information;
(b) it is known that one child is a boy;
(c) it is known that the older child is a boy?
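A Python sketch that enumerates the four equally likely (older, younger) combinations. It reads (b) as “at least one child is a boy”, which is the usual interpretation but an assumption here.

```python
from fractions import Fraction
from itertools import product

# Children as (older, younger); all four combinations equally likely
omega = list(product("BG", repeat=2))


def cond_prob(event, given):
    g = [w for w in omega if given(w)]
    return Fraction(sum(1 for w in g if event(w)), len(g))


def both_boys(w):
    return w == ("B", "B")


p_a = cond_prob(both_boys, lambda w: True)
p_b = cond_prob(both_boys, lambda w: "B" in w)        # read as "at least one child is a boy"
p_c = cond_prob(both_boys, lambda w: w[0] == "B")     # older child is a boy
print(p_a, p_b, p_c)
```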

Exercise 33
Every Saturday a student meets acquaintances who live in places A, B and C. To do so, she arrives at the railway station at a random point in time between 2 and 4 p.m. and takes the next train that departs to one of the three places. (The student’s arrival time is uniformly distributed over the period between 2 and 4 p.m.) What is the probability that she visits each of the three places, given the timetable below?
trains to departure
A 2:10 pm and 3:10 pm
B 2:30 pm and 3:30 pm
C 3:00 pm and 4:00 pm
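With a uniform arrival time, the probability of each destination is the total length of the waiting windows that end with a train to that place, divided by 120 minutes. A Python sketch (a train departing exactly at the arrival time is assumed to be caught, which does not affect the result):

```python
from fractions import Fraction

# Departures in minutes after 2:00 p.m.; arrival is uniform on [0, 120]
departures = [(10, "A"), (30, "B"), (60, "C"), (70, "A"), (90, "B"), (120, "C")]

prob = {"A": Fraction(0), "B": Fraction(0), "C": Fraction(0)}
previous = 0
for t, place in sorted(departures):
    # Arriving in (previous, t] means the next train is the one leaving at t
    prob[place] += Fraction(t - previous, 120)
    previous = t
print(prob)
```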

Exercise 34
For the events A and B the following probabilities are given:

P(A) = 0.5 P(B) = 0.3 P(A ∩ B) = 0.2

Calculate the following probabilities:
a) P(A ∪ B) b) P(A|B)
c) P(A ∩ B) d) P(A ∪ B)
e) P(A ∪ B) f) P(A ∩ B)

Exercise 35
The following multiple choice questions are given. Mark the correct statements with a cross. Note
that all, none, or some can be correct.
(a) If (A implies B) and P(A) > 0, then it follows:

( ) P(A|B) ≥ P(A)
( ) P(A|B) < P(A)
( ) P(A|B) ≥ P(B)
( ) P(B|A) > 0

(b) If (A and B are disjoint) and P(A) > 0, then it follows:

( ) P(A ∩ B) = P(A) · P(B)
( ) P(A ∪ B) = P(A) + P(B)
( ) P(A ∪ B) = 0
( ) P(A|B) = P(A)

Exercise 36
Suppose two machines (A and B) produce the same product. Machine A produces 70% of the total quantity and machine B only 30%. While 8% of A’s output is defective, 6% of B’s output is defective.
Suppose you randomly pick one of the goods and find that it is defective. What is the probability
that it was produced by machine A?
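A Python sketch of the law of total probability followed by Bayes' theorem, with exact fractions:

```python
from fractions import Fraction

p_machine = {"A": Fraction(70, 100), "B": Fraction(30, 100)}
p_defect_given = {"A": Fraction(8, 100), "B": Fraction(6, 100)}

# Law of total probability, then Bayes' theorem
p_defect = sum(p_machine[m] * p_defect_given[m] for m in p_machine)
p_a_given_defect = p_machine["A"] * p_defect_given["A"] / p_defect
print(p_defect, p_a_given_defect, float(p_a_given_defect))
```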

Exercise 37
Five out of 25 machined parts are faulty.
(a) If two parts are selected at random and without replacement, what is the probability that the second part selected is faulty?

(b) If three parts are selected at random, and without replacement, what is the probability that the
third part selected is faulty?
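A Python sketch that enumerates all ordered draws without replacement and counts those whose last part is faulty. It illustrates that the position of a draw does not change the probability.

```python
from fractions import Fraction
from itertools import permutations

parts = [True] * 5 + [False] * 20   # True marks a faulty part


def p_kth_faulty(k):
    """P(k-th part drawn without replacement is faulty), by enumerating ordered draws."""
    draws = list(permutations(range(len(parts)), k))
    return Fraction(sum(1 for d in draws if parts[d[-1]]), len(draws))


print(p_kth_faulty(2), p_kth_faulty(3))
```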

Exercise 38

(a) In a specific group of people, 4% of the men and 1% of the women are taller than 1.90 m. Moreover, 60% of the people in the group are women. A person randomly picked from the group is taller than 1.90 m. What is the probability that this person is a woman?

(b) Three ponds contain one, two and three fish, respectively. You randomly choose a pond, catch a fish from it, mark the fish and release it again. The next day you catch a fish from the same pond, and it is not marked. What is the probability that the chosen pond contains
(i) one, (ii) two, (iii) three
fish?
(c) A box contains 5 red balls and 4 white balls. You randomly pick two balls one after another without replacement. Determine the probability
i. that the second ball is white;
ii. that the first ball is white, when you know, that the second ball was white.
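A Python sketch of the three Bayes calculations with exact fractions. For the ponds it assumes each pond is chosen with probability 1/3 and that every fish in the chosen pond is equally likely to be caught on the second day, so P(unmarked | k fish) = (k − 1)/k.

```python
from fractions import Fraction

# (a) Bayes' theorem: P(woman | taller than 1.90 m)
p_w, p_m = Fraction(60, 100), Fraction(40, 100)
p_tall_w, p_tall_m = Fraction(1, 100), Fraction(4, 100)
p_woman_given_tall = p_w * p_tall_w / (p_w * p_tall_w + p_m * p_tall_m)

# (b) ponds with 1, 2, 3 fish, each chosen with probability 1/3; assumption: every fish
# in the pond is equally likely to be caught, so P(unmarked | k fish) = (k - 1) / k
prior = {k: Fraction(1, 3) for k in (1, 2, 3)}
likelihood = {k: Fraction(k - 1, k) for k in prior}
evidence = sum(prior[k] * likelihood[k] for k in prior)
posterior = {k: prior[k] * likelihood[k] / evidence for k in prior}

# (c) 5 red, 4 white balls, two draws without replacement
p_second_white = Fraction(5, 9) * Fraction(4, 8) + Fraction(4, 9) * Fraction(3, 8)
p_first_white_given_second_white = Fraction(4, 9) * Fraction(3, 8) / p_second_white

print(p_woman_given_tall, posterior, p_second_white, p_first_white_given_second_white)
```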

7 Random variables in one dimension
Comprehension questions
Explain in your own words:

• What is a random variable? How is the sample space Ω linked to the set of events E?
• What are the domain and codomain of every distribution function?
• What is the relationship between probability mass functions/probability density functions of random variables and frequency functions/histograms of an attribute in a sample?
• Explain the difference between the expected value and the average value or mean.

Exercise 39
Which of the figures below show graphs of distribution functions of continuous random variables?
Provide a short reasoning for your answers.

Exercise 40
A professional basketball player throws the ball during training from a distance of seven meters. He
gets the ball into the basket with a probability of p = 50%. The random variable X is defined as the
number of hits in a series of four throws.
(a) State the probability mass function of this random variable.
(b) What are the values of mode, median, expected value and variance of X?
(c) Draw the graph of the distribution function.
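A Python sketch assuming the four throws are independent, so X is binomially distributed with n = 4 and p = 1/2. The median is taken as the smallest value x with F(x) ≥ 1/2, which is one common convention.

```python
from fractions import Fraction
from math import comb

n, p = 4, Fraction(1, 2)  # assumption: the four throws are independent
pmf = {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}

mode = max(pmf, key=pmf.get)
expected = sum(k * pk for k, pk in pmf.items())
variance = sum(k * k * pk for k, pk in pmf.items()) - expected ** 2

# Distribution function F(x) = P(X <= x) at the jump points
cdf = {}
running = Fraction(0)
for k in range(n + 1):
    running += pmf[k]
    cdf[k] = running
median = min(k for k in cdf if cdf[k] >= Fraction(1, 2))
print(pmf, mode, median, expected, variance, cdf)
```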

Exercise 41
The distribution of a continuous random variable X is defined by the following probability density function $f: \mathbb{R} \to \mathbb{R}_0^+$:

$$f(x) = \begin{cases} 2ax & \text{for } 0 < x < 1 \\ 3a - ax & \text{for } 1 \le x < 3 \\ 0 & \text{otherwise} \end{cases}$$

where a is a constant to be chosen appropriately.
(a) What must the value of a be?

(b) Sketch the graph of the probability density function.
(c) What are the values of the following probabilities?

P(X = 1), P(0.5 < X < 2), P(X < 2)

(d) What is the value of the conditional probability

P(X < 1 | X < 0.5)?

(e) Is the mode larger than the expected value?
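A Python sketch. The density is a triangle with total area a·1 + a·2 = 3a, so a = 1/3. The distribution function below is obtained by integrating f piecewise for that value, and a simple midpoint rule serves as an independent numerical check.

```python
from fractions import Fraction


def make_density(a):
    def f(x):
        if 0 < x < 1:
            return 2 * a * x
        if 1 <= x < 3:
            return 3 * a - a * x
        return 0
    return f


def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h


# (a) the total area is a*1 (left part) + a*2 (right part) = 3a, so a = 1/3
a = Fraction(1, 3)
f = make_density(float(a))


def cdf(x):
    """Distribution function obtained by integrating f piecewise (a = 1/3)."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x <= 1:
        return x * x / 3
    if x <= 3:
        return 1 - (3 - x) ** 2 / 6
    return Fraction(1)


p_between = cdf(2) - cdf(Fraction(1, 2))    # P(0.5 < X < 2)
p_below_2 = cdf(2)                          # P(X < 2); P(X = 1) = 0 for a continuous X
mean = integrate(lambda x: x * f(x), 0, 3)  # expected value; the mode is at x = 1
print(integrate(f, 0, 3), p_between, p_below_2, round(mean, 4))
```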

Exercise 42
A continuous random variable X has the probability density function

$$f: \mathbb{R} \to \mathbb{R}_0^+, \quad f(x) = \begin{cases} \frac{2}{9}\,x(3 - x) & \text{for } 0 \le x \le 3 \\ 0 & \text{otherwise.} \end{cases}$$

(a) Sketch the graph of the probability density function.


(b) Calculate the probability P(X ≤ 2) .
(c) Determine the expected value.
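Since the density is a polynomial on [0, 3], all required integrals can be evaluated exactly from antiderivatives. A Python sketch:

```python
from fractions import Fraction


def antiderivative(x):
    """Antiderivative of f(x) = (2/9) x (3 - x) = (2/9)(3x - x^2) on [0, 3]."""
    x = Fraction(x)
    return Fraction(2, 9) * (Fraction(3, 2) * x ** 2 - x ** 3 / 3)


def antiderivative_xf(x):
    """Antiderivative of x f(x) = (2/9)(3x^2 - x^3), used for the expected value."""
    x = Fraction(x)
    return Fraction(2, 9) * (x ** 3 - x ** 4 / 4)


total = antiderivative(3) - antiderivative(0)        # must equal 1
p_le_2 = antiderivative(2) - antiderivative(0)       # P(X <= 2)
expected = antiderivative_xf(3) - antiderivative_xf(0)
print(total, p_le_2, expected)  # 1 20/27 3/2
```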

Exercise 43
Consider the distribution function $F: \mathbb{R} \to [0, 1]$ of a random variable X given below:

$$F(x) = \begin{cases} 0 & \text{for } x < 1 \\ \left(\frac{x-1}{a}\right)^2 & \text{for } 1 \le x \le 3 \\ 1 & \text{for } 3 < x \end{cases}$$

(a) What is the value of the constant a?


(b) Sketch the graph of the distribution function F .
(c) Specify the density function of X and sketch its graph.
(d) Calculate the expectation value and variance of X.
(e) Calculate the quartiles Q1 , Q2 and Q3 of X. (These are the 25%, 50% and 75% quantiles of X.)
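A Python sketch that reads F(x) = ((x − 1)/a)² on [1, 3]. Continuity at x = 3 then forces a = 2 (taking a > 0), the density is f(x) = (x − 1)/2 on [1, 3], and the moments follow from exact polynomial integration. Quartiles solve ((q − 1)/2)² = p.

```python
import math
from fractions import Fraction

# F must be continuous at x = 3: ((3 - 1) / a)^2 = 1, so a = 2 (taking a > 0)
a = 2


def F(x):
    if x < 1:
        return 0.0
    if x <= 3:
        return ((x - 1) / a) ** 2
    return 1.0


def poly_integral(coeffs, lo, hi):
    """Exact integral of sum(c_k x^k) over [lo, hi] for Fraction coefficients."""
    def antider(x):
        return sum(c * Fraction(x) ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))
    return antider(hi) - antider(lo)


# Density on [1, 3]: f(x) = F'(x) = 2 (x - 1) / a^2 = (x - 1) / 2
half = Fraction(1, 2)
e_x = poly_integral([0, -half, half], 1, 3)        # x (x - 1)/2 = -x/2 + x^2/2
e_x2 = poly_integral([0, 0, -half, half], 1, 3)    # x^2 (x - 1)/2
var_x = e_x2 - e_x ** 2

# Quartiles: solve ((q - 1)/2)^2 = p, i.e. q = 1 + 2 sqrt(p)
quartiles = [1 + a * math.sqrt(p) for p in (0.25, 0.5, 0.75)]
print(e_x, var_x, [round(q, 4) for q in quartiles])
```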

Exercise 44
Chebyshev’s inequality: If a continuous random variable X has an expected value of 10 and a variance of 2, how large are the following probabilities at most or at least?
(a) P(8 < X < 12)
(b) P({X < 7} ∪ {13 < X})
(c) P(5 < X < 15)
(d) P({X < 2} ∪ {18 < X})
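Each event can be rewritten as a statement about the distance |X − 10|, after which Chebyshev's inequality P(|X − µ| ≥ k) ≤ σ²/k² gives an upper bound (or, via the complement, a lower bound). A Python sketch:

```python
from fractions import Fraction

var = 2  # Var(X); all events below are centred on E(X) = 10


def chebyshev_upper(k):
    """Upper bound for P(|X - 10| >= k) from Chebyshev's inequality."""
    return min(Fraction(1), Fraction(var) / k ** 2)


# (a) P(8 < X < 12) = P(|X - 10| < 2) >= 1 - var/2^2
a_lower = 1 - chebyshev_upper(2)
# (b) P(X < 7 or X > 13) = P(|X - 10| > 3) <= var/3^2
b_upper = chebyshev_upper(3)
# (c) P(5 < X < 15) = P(|X - 10| < 5) >= 1 - var/5^2
c_lower = 1 - chebyshev_upper(5)
# (d) P(X < 2 or X > 18) = P(|X - 10| > 8) <= var/8^2
d_upper = chebyshev_upper(8)
print(a_lower, b_upper, c_lower, d_upper)  # 1/2 2/9 23/25 1/32
```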

Exercise 45
Phil Taylor has designed a simple dartboard for his son Chris. This dartboard consists of four big
squares (side length of one square d = 4 units) and a bull’s eye (circle with radius r = 1 unit) in the
middle:

If Chris throws a dart at the board, he receives the corresponding score (assumption: Chris always hits the board, at a random point, and never throws outside of it). The random variable X corresponds to the score of one throw.
(a) Specify the probability mass function of the random variable X.
(b) What is the distribution function of X?
(c) Calculate expected value, variance and standard deviation of X.

8 Multidimensional random variables
Comprehension questions
Explain in your own words:
• What do joint probability distribution and marginal probability distribution mean?
• What does “stochastically independent” mean and how can you examine the stochastic independence of two random variables?
• Give examples of multidimensional random variables.

Exercise 46
The mass function of a two-dimensional random variable (X, Y ) is described by the following table
            Y = 2    Y = 3    Y = 4
X = 10      1/9      1/9      1/9
X = 20      1/6      0        1/6
X = 30      1/18     2/9      1/18
On the top are the possible values of Y , on the left side are the possible values of X, inside the
table are the corresponding values of the joint probability mass function.
(a) Calculate both marginal probability distributions.
(b) Calculate the distribution of X under the condition of Y = 4.
(c) Calculate V(X) and V(Y ).
(d) Calculate the covariance and the coefficient of correlation.
(e) Are X and Y stochastically independent?
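A Python sketch with the joint probabilities from the table as exact fractions. It also illustrates that a covariance of zero does not imply independence.

```python
import math
from fractions import Fraction as Fr

xs, ys = [10, 20, 30], [2, 3, 4]
# Joint pmf f(x, y) from the table (rows: X, columns: Y)
joint = {
    (10, 2): Fr(1, 9), (10, 3): Fr(1, 9), (10, 4): Fr(1, 9),
    (20, 2): Fr(1, 6), (20, 3): Fr(0), (20, 4): Fr(1, 6),
    (30, 2): Fr(1, 18), (30, 3): Fr(2, 9), (30, 4): Fr(1, 18),
}
assert sum(joint.values()) == 1

# (a) marginal distributions
f_x = {x: sum(joint[x, y] for y in ys) for x in xs}
f_y = {y: sum(joint[x, y] for x in xs) for y in ys}
# (b) conditional distribution of X given Y = 4
f_x_given_y4 = {x: joint[x, 4] / f_y[4] for x in xs}

# (c), (d) variances, covariance and correlation
e_x = sum(x * p for x, p in f_x.items())
e_y = sum(y * p for y, p in f_y.items())
v_x = sum(x * x * p for x, p in f_x.items()) - e_x ** 2
v_y = sum(y * y * p for y, p in f_y.items()) - e_y ** 2
cov = sum(x * y * p for (x, y), p in joint.items()) - e_x * e_y
corr = float(cov) / math.sqrt(v_x * v_y)

# (e) independence requires f(x, y) = f_X(x) f_Y(y) for all pairs
independent = all(joint[x, y] == f_x[x] * f_y[y] for x in xs for y in ys)
print(f_x, f_y, f_x_given_y4, v_x, v_y, cov, corr, independent)
```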

Exercise 47
A bet is said to carry 3 to 1 odds if you win $3 for each $1 you bet. What must the probability of
winning be for this to be a fair bet? (Fair means: The expectation of your profit & loss balance is
zero.)
In a popular gambling game, three dice are rolled. For a $1 bet you win $1 for each 6 that appears
(plus your dollar back). If no 6 appears you lose your dollar. What is the expected value of your profit
& loss?
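A Python sketch of both parts. For the dice game, the number of sixes among three fair dice is binomial with n = 3 and p = 1/6; the profit is k dollars for k ≥ 1 sixes and −1 dollar otherwise.

```python
from fractions import Fraction
from math import comb

# Fair 3-to-1 bet: 3p - (1 - p) = 0  =>  p = 1/4
p_fair = Fraction(1, 4)
assert 3 * p_fair - (1 - p_fair) == 0

# Three dice: profit k dollars for k sixes (k >= 1), loss of 1 dollar if no six
p6 = Fraction(1, 6)
pmf_sixes = {k: comb(3, k) * p6 ** k * (1 - p6) ** (3 - k) for k in range(4)}
expected_profit = sum((k if k > 0 else -1) * p for k, p in pmf_sixes.items())
print(p_fair, expected_profit, float(expected_profit))
```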

Exercise 48
For the joint mass function of a random vector (X, Y) the following is known:
f (xm , yn ) y1 = −5 y2 = 0 y3 = 5
x1 = 3 0.15 0.3
x2 = 4 p 0.1
0.4
(a) Determine p in a way such that X and Y are uncorrelated.
(b) Using your result from a) calculate the values:
$E(X \cdot Y)$, $V(X)$, $E[(X + 3)^2]$

9 Stochastic models and special distributions
Comprehension questions
Explain in your own words:
• How is an arbitrary normal distribution linked to the standard normal distribution? Can a nor-
mally distributed random variable be transformed into a standard normally distributed variable?
• Name some well-known distributions.
• Is it possible to give an exact density function for a real-world random variable such as a stock return?

Exercise 49
A program on your PC generates random numbers in the half-open interval [0, 1). It promises that every number 0 ≤ x < 1 comes out “with the same probability”. Random numbers are used, among other things, to simulate the behavior of stochastic systems.
(a) Consider the random numbers as realisations of a continuous random variable X. Draw its probability density function and distribution function, and calculate the mean value and the variance.
(b) In reality your PC is a digital computer, so the continuum [0, 1) is not completely available to you; instead, its random numbers have only five decimal places. Specify the mass function and calculate the mean value and the variance of this discrete random variable Y.
(c) What can you do if you need random numbers Z with mean value 0 and variance 1, but your PC has only the random generator described in (a)? Specify the probability density function of Z.
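A Python sketch. For (b) it assumes the five-decimal numbers are exactly the values k/10⁵ for k = 0, …, 99999, each equally likely, and checks the closed forms by summation. For (c) it uses the linear transformation Z = √12 (X − 1/2), which gives mean 0 and variance 12 · 1/12 = 1.

```python
import math
from fractions import Fraction

# (a) continuous uniform distribution on [0, 1): mean 1/2, variance 1/12
mean_x, var_x = Fraction(1, 2), Fraction(1, 12)

# (b) discrete uniform on {0, 0.00001, ..., 0.99999}: N = 10^5 equally likely values k/N
N = 10 ** 5
s1 = sum(range(N))                    # sum of k
s2 = sum(k * k for k in range(N))     # sum of k^2
mean_y = Fraction(s1, N * N)                       # E[k/N]
var_y = Fraction(s2, N ** 3) - mean_y ** 2         # E[(k/N)^2] - E[k/N]^2
assert mean_y == Fraction(N - 1, 2 * N)
assert var_y == Fraction(N * N - 1, 12 * N * N)


# (c) Z = sqrt(12) (X - 1/2) has mean 0 and variance 1;
# it is uniform on [-sqrt(3), sqrt(3)) with density 1 / (2 sqrt(3)) there
def z_from_uniform(u):
    return math.sqrt(12) * (u - 0.5)


half_width = math.sqrt(3)
density_z = 1 / (2 * half_width)
print(float(mean_y), float(var_y), z_from_uniform(0.0), density_z)
```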

Exercise 50
On average, Mr. Kaiser sells a life insurance policy on two out of ten home visits. He is very diligent and makes exactly 16 home visits every day.
(a) Calculate the mean value and the variance of the number of life insurance policies sold per day.
(b) What is the probability that Mr. Kaiser sells at most 3 life insurance policies during a working day?
(c) Sometimes, however, Mr. Kaiser gets tired and makes only 12 home visits. On what percentage of those working days does he sell more than 10 life insurance policies?
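A Python sketch assuming the home visits are independent and each leads to a sale with probability 0.2, so the daily number of sales is binomial with n = 16 (or n = 12 in part (c)).

```python
from fractions import Fraction
from math import comb


def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


p = Fraction(2, 10)  # assumption: visits are independent, each a sale with probability 0.2
n = 16
mean, var = n * p, n * p * (1 - p)                                   # (a)
p_at_most_3 = sum(binom_pmf(k, n, p) for k in range(4))              # (b)
p_more_than_10 = sum(binom_pmf(k, 12, p) for k in range(11, 13))     # (c), 12 visits
print(float(mean), float(var), round(float(p_at_most_3), 4), float(p_more_than_10))
```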

Exercise 51
For lack of time, a student cannot prepare for a multiple-choice exam consisting of 20 questions and decides to tick answers randomly. She knows that every question has five options and that exactly one of them is correct.
(a) What is the distribution of the random variable describing the number of the student’s correct answers? How many questions will she answer correctly on average?
(b) The exam is passed if at least ten correct answers are given. What is the probability that the student passes the exam? What would the threshold have to be for the probability of passing by random guessing to exceed 5%?
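A Python sketch assuming independent guesses, so the number of correct answers is binomial with n = 20 and p = 1/5. The last question of (b) is read as asking for the highest pass threshold at which random guessing still succeeds with probability above 5%.

```python
from fractions import Fraction
from math import comb

n, p = 20, Fraction(1, 5)  # 20 questions, one of five options correct, independent guesses
pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
expected = n * p            # (a) mean number of correct answers


def p_pass(threshold):
    """P(at least `threshold` correct answers)."""
    return sum(pmf[threshold:])


# (b) probability of passing with threshold 10, and the highest threshold
# for which passing by pure guessing is still more likely than 5%
p_10 = p_pass(10)
max_threshold = max(t for t in range(n + 1) if p_pass(t) > Fraction(5, 100))
print(expected, round(float(p_10), 5), max_threshold, round(float(p_pass(max_threshold)), 4))
```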

Exercise 52
Using Chebyshev’s inequality,
(a) give an upper bound for the probability that in ten coin tosses heads occurs fewer than two times or more than eight times, and
(b) compare this bound with the true value of the probability in question.
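A Python sketch for a fair coin: X ~ Bin(10, 1/2) has mean 5 and variance 5/2, and for integer X the event {X < 2} ∪ {X > 8} is the same as |X − 5| ≥ 4.

```python
from fractions import Fraction
from math import comb

n, p = 10, Fraction(1, 2)
mu, var = n * p, n * p * (1 - p)   # 5 and 5/2

# {X < 2} or {X > 8} is the same event as |X - 5| >= 4 for integer X
chebyshev_bound = var / 4 ** 2
# Each sequence of ten fair tosses has probability (1/2)^10
exact = sum(comb(n, k) * p ** n for k in range(n + 1) if k < 2 or k > 8)
print(chebyshev_bound, float(chebyshev_bound), exact, float(exact))
```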

Exercise 53
A large bank manages approximately 1,230,000 deposit accounts of employees. A survey carried out in the year 2000 showed that the annual saving per account has a mean value of 400 € and a standard deviation of 200 € (annual saving = change of deposit balance; negative values denote a decrease) and is approximately normally distributed.
Using this information, calculate the number of deposit accounts of employees with an annual saving of
(1) below 0 €
(2) between 0 € and 200 €
(3) between 200 € and 300 €
(4) between 300 € and 400 €
(5) between 400 € and 600 €
(6) between 600 € and 800 €
(7) above 800 €
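Under the stated normal approximation, each class count is the number of accounts times the probability mass between the class boundaries. The following sketch uses `statistics.NormalDist` from the Python standard library (3.8+). It treats the open-ended classes (1) and (7) as extending to ∓∞.

```python
from statistics import NormalDist

accounts = 1_230_000
saving = NormalDist(mu=400, sigma=200)   # annual saving in €

# class boundaries in €; -inf / +inf close the open-ended classes (1) and (7)
edges = [float("-inf"), 0, 200, 300, 400, 600, 800, float("inf")]

counts = []
for lo, hi in zip(edges[:-1], edges[1:]):
    prob = saving.cdf(hi) - saving.cdf(lo)   # P(lo < saving <= hi)
    counts.append(accounts * prob)
    print(f"({lo}, {hi}]: {accounts * prob:,.0f} accounts")
```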

Exercise 54
A normally distributed stochastic variable X has the expectation µ = 18 and the standard deviation
σ = 4.
(a) The standardized variable $Z = \frac{X - 18}{4}$ is then standard normally distributed. What is the
probability that Z falls into the interval [−1, 1]?
(b) Specify the corresponding interval, centered on µ, into which the random variable X falls
with this probability.
(c) What is the probability that X falls into the interval (10, 26]?
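The standardization and its inverse, x = µ + z·σ, can be checked numerically. The following sketch uses `statistics.NormalDist` and evaluates the three parts directly.

```python
from statistics import NormalDist

Z = NormalDist()                  # standard normal
X = NormalDist(mu=18, sigma=4)

p_a = Z.cdf(1) - Z.cdf(-1)                   # (a) P(-1 <= Z <= 1)
interval_b = (18 - 1 * 4, 18 + 1 * 4)        # (b) back-transform x = mu + z * sigma
p_c = X.cdf(26) - X.cdf(10)                  # (c) P(10 < X <= 26), i.e. |Z| <= 2

print(round(p_a, 4), interval_b, round(p_c, 4))
```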

Exercise 55
The random variable X is normally distributed, $X \sim N(\mu, \sigma^2)$ with µ = 50 and σ² = 25. Determine
a, b and c from:

$P(X < a) = 0.6, \quad P(X \ge b) = 0.75, \quad P(|X - 50| < c) = 0.6.$
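Each condition reduces to a quantile of the normal distribution. The following sketch uses `NormalDist.inv_cdf`. The symmetric case uses P(|X − 50| < c) = 0.6 ⇔ c = σ·z₀.₈.

```python
from statistics import NormalDist

X = NormalDist(mu=50, sigma=5)    # sigma = sqrt(25)

a = X.inv_cdf(0.6)                # P(X < a) = 0.6
b = X.inv_cdf(1 - 0.75)           # P(X >= b) = 0.75  <=>  P(X < b) = 0.25
c = 5 * NormalDist().inv_cdf(0.8) # P(|X - 50| < c) = 0.6  <=>  c = sigma * z_{0.8}

print(round(a, 3), round(b, 3), round(c, 3))
```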

Exercise 56
The random variable X is binomially distributed, X ∼ Bi(45; 0.4). Calculate the probability
P(10 ≤ X ≤ 12):
(a) exactly,
(b) approximately, using the central limit theorem with continuity correction,
(c) approximately, without continuity correction.
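The following sketch puts the three results side by side, using the standard library only. The normal approximation uses µ = np = 18 and σ² = np(1 − p) = 10.8. The continuity correction widens the interval by 0.5 on each side, which brings the approximation noticeably closer to the exact value here.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 45, 0.4
approx = NormalDist(n * p, sqrt(n * p * (1 - p)))   # N(18, 10.8)

# (a) exact: sum of the binomial point probabilities for k = 10, 11, 12
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(10, 13))
# (b) with continuity correction: P(9.5 <= X <= 12.5) under the normal law
with_cc = approx.cdf(12.5) - approx.cdf(9.5)
# (c) without continuity correction
without_cc = approx.cdf(12) - approx.cdf(10)

print(round(exact, 4), round(with_cc, 4), round(without_cc, 4))
```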

Exercise 57
A normally distributed stochastic variable W has the expectation µ = −5 and the standard deviation
σ = 0.5.
(a) The transformed variable U = 2 · (W + 5) is therefore standard normally distributed. Into which
symmetric intervals around zero does U fall with probabilities of 80%, 90%, 95% and 100%?
(b) Specify the symmetric intervals around −5 into which the random variable W falls with these
probabilities.
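For a symmetric interval [−u, u] with probability 1 − α, u is the (1 + level)/2 quantile of N(0, 1). Inverting U = 2(W + 5) gives W = U/2 − 5. The following sketch computes both sets of intervals. For 100% no finite interval exists, so the sketch only notes this case in a comment.

```python
from statistics import NormalDist

std_normal = NormalDist()          # U = 2 (W + 5) ~ N(0, 1)
u_bounds, w_intervals = {}, {}

for level in (0.80, 0.90, 0.95):
    u = std_normal.inv_cdf((1 + level) / 2)          # P(-u <= U <= u) = level
    u_bounds[level] = u
    w_intervals[level] = (-5 - u / 2, -5 + u / 2)    # back-transform W = U / 2 - 5
    print(f"{level:.0%}: U in [{-u:.4f}, {u:.4f}], W in [{-5 - u / 2:.4f}, {-5 + u / 2:.4f}]")

# 100 %: U and W are unbounded, so only (-inf, inf) has probability 1.
```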

Exercise 58
For an exam with a maximum of 100 achievable points, the results are approximately
normally distributed with µ = 60 and σ = 10.
(a) Determine the percentage of students who failed the exam if at least 50 points were needed to
pass.
(b) Determine the percentage of students who achieve the grade "good" if this grade is awarded for
scores from 80 to 95 points (inclusive).
(c) To what value must the minimum number of points needed to pass be set if no
more than 10% of the students are to fail?
Note: Apply continuity correction.
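Because points are integers, the continuity correction shifts every boundary by 0.5. The following sketch applies this. For (c), it searches for the largest pass mark k whose failure probability P(points ≤ k − 1) ≈ Φ((k − 0.5 − 60)/10) stays at or below 10%.

```python
from statistics import NormalDist

points = NormalDist(mu=60, sigma=10)

# (a) fail = at most 49 points -> with continuity correction: X <= 49.5
p_fail = points.cdf(49.5)

# (b) "good" = 80 to 95 points inclusive -> 79.5 < X <= 95.5
p_good = points.cdf(95.5) - points.cdf(79.5)

# (c) largest pass mark k with P(fail) = P(X <= k - 0.5) <= 10 %
pass_mark = max(k for k in range(101) if points.cdf(k - 0.5) <= 0.10)

print(f"{p_fail:.2%}, {p_good:.2%}, pass mark {pass_mark}")
```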

10 Limit theorems
Comprehension questions
Explain in your own words:

• Does the central limit theorem put requirements on the original distribution of random variables?

• What does convergence with probability 1 mean? For which law do we need this definition?

Exercise 59
The distribution of a metric attribute X in a very large statistical population is unknown. However,
it is known that the mean value is µ = 1700 and the standard deviation is σ = 144. A sample of
size n = 200 is now drawn.
(a) What is the expectation of the mean value of this sample?
(b) What is the standard deviation of the mean value of this sample?
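The two results follow from E(X̄) = µ and σ(X̄) = σ/√n. Neither formula depends on the (unknown) shape of the distribution. The following sketch assumes that the population is large enough to ignore a finite-population correction, as the exercise's wording "very large" suggests.

```python
from math import sqrt

mu, sigma, n = 1700, 144, 200

expected_mean = mu            # (a) E(X-bar) = mu, whatever the distribution of X
sd_mean = sigma / sqrt(n)     # (b) sigma(X-bar) = sigma / sqrt(n)

print(expected_mean, round(sd_mean, 4))
```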

Exercise 60
A large company receives 120 applications for positions advertised for university graduates. Consider
the applicants as a random sample from all corresponding graduates of an age group, 40% of whom are
women.
(a) What is the probability that the proportion of women among the applicants is between 35% and 45%?
Hint: Approximate the number of female applicants using a normal distribution.
(b) What is this probability when the continuity correction is applied?
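The number of women among the 120 applicants is Bi(120; 0.4). By the hint, it is approximated by N(48, 28.8). The following sketch computes the probability that the count lies between 42 and 54 (35% and 45% of 120), with and without the continuity correction.

```python
from math import sqrt
from statistics import NormalDist

n, p = 120, 0.4
approx = NormalDist(n * p, sqrt(n * p * (1 - p)))   # N(48, 28.8)

lo, hi = 42, 54                                     # 35 % and 45 % of 120 applicants
p_plain = approx.cdf(hi) - approx.cdf(lo)           # (a)
p_cc = approx.cdf(hi + 0.5) - approx.cdf(lo - 0.5)  # (b) continuity correction

print(round(p_plain, 4), round(p_cc, 4))
```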

Exercise 61
An insurance company has n = 1 000 000 clients. The probability that a client has an insurance claim
in a given year is $p = \frac{1}{100}$.
(a) Compute the expectation and variance of the number of insurance claims within a given year.
(Assume independence of the individual claims.)
(b) Find a symmetric interval centred on the expectation that contains the number of insurance
claims in a given year with a probability of 95%.
(c) Find an upper bound for the number of claims in a given year that is exceeded only with a
probability of 5%.
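With independent claims, the count is Bi(10⁶; 0.01). For such a large n, the central limit theorem justifies the normal approximation N(np, np(1 − p)). The following sketch computes the two-sided 95% interval and the one-sided 95% upper bound under that approximation.

```python
from math import sqrt
from statistics import NormalDist

n, p = 1_000_000, 0.01
mean = n * p              # (a) E = n p
var = n * p * (1 - p)     # (a) V = n p (1 - p)
sd = sqrt(var)
claims = NormalDist(mean, sd)

z = NormalDist().inv_cdf(0.975)              # (b) two-sided 95 %
interval = (mean - z * sd, mean + z * sd)
upper = claims.inv_cdf(0.95)                 # (c) exceeded with probability 5 %

print(round(mean), round(var), [round(v, 1) for v in interval], round(upper, 1))
```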

11 Point estimation of population parameters
Comprehension questions
Explain in your own words:

• What does unbiasedness mean? What is consistency? What is efficiency?

• How do you compare two different estimators? Which criterion do you use?

• You can define many different estimators. Is there one estimator which is always “correct”?

Exercise 62
An industry sector consists of N = 12100 individual companies. Since no official profit statistics exist,
a random sample of n = 225 individual companies is drawn to determine the annual profit G. The
random sample in 2016 yields

$\bar{g} = 600\,000$ € and $s_G = 90\,000$ €.

(a) Give point estimates for the parameters µ and σ of the distribution of annual profit in the
population.
(b) What is the standard deviation of the estimator of the mean?
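The point estimates are simply the sample values. The standard deviation of the mean is estimated by s_G/√n. The following sketch also shows the finite-population correction √((N − n)/(N − 1)) for sampling without replacement. Whether the course expects this correction here is an assumption; with n/N ≈ 1.9% it changes the result only slightly.

```python
from math import sqrt

N, n = 12_100, 225
g_bar, s_g = 600_000, 90_000

mu_hat, sigma_hat = g_bar, s_g        # (a) point estimates taken from the sample
se = s_g / sqrt(n)                    # (b) estimated standard deviation of the mean
fpc = sqrt((N - n) / (N - 1))         # optional finite-population correction
se_fpc = se * fpc

print(mu_hat, sigma_hat, se, round(se_fpc, 1))
```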

Exercise 63
A machine produces ball bearings. We draw a random sample of n = 225 ball bearings from the
weekly production of this machine. In the sample, the average weight of a ball bearing is

$\bar{g} = 0.824$ kg with $s_G = 0.005$ kg.

(a) What is the standard deviation of the sample mean? Is this an estimate or a true value?
(b) Give a numerical point estimate of the true mean.

Exercise 64
Suppose n random variables $X_1, \dots, X_n$ are independent and identically distributed. Assume that
only the expected value µ is known to you, but not the variance σ². Three estimators are proposed:

$\hat{\sigma}_A^2 = \frac{1}{n} \sum_{j=1}^{n} (X_j - \mu)^2$

$\hat{\sigma}_B^2 = \frac{1}{n-1} \sum_{j=1}^{n} (X_j - \mu)^2$

$\hat{\sigma}_C^2 = \frac{1}{3} \left( X_1^2 + X_2^2 + X_n^2 \right) - \mu^2$

(a) Which estimators are unbiased?


(b) One of the estimators is not consistent, which one is it?
(c) How large are the biases of the three estimators (for n = 9)?
(d) Would you prefer the third estimator to the second one (again for n = 9)?
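A Monte Carlo simulation can make bias and mean squared error (MSE) of the three estimators tangible for n = 9. The following sketch assumes a normal population with µ = 2 and σ² = 1; the exercise does not fix a distribution, so these values are purely illustrative. Analytically, E(σ̂²_B) = n/(n − 1)·σ², while σ̂²_A and σ̂²_C are unbiased. The MSE of σ̂²_C, which uses only three observations and raw squares, depends strongly on the chosen µ.

```python
import random
from statistics import fmean

random.seed(1)
mu, sigma, n, reps = 2.0, 1.0, 9, 50_000   # illustrative assumption: X ~ N(2, 1)

est_a, est_b, est_c = [], [], []
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    ss = sum((xj - mu) ** 2 for xj in x)          # uses the known expectation mu
    est_a.append(ss / n)
    est_b.append(ss / (n - 1))
    est_c.append((x[0] ** 2 + x[1] ** 2 + x[-1] ** 2) / 3 - mu**2)


def mse(values, target):
    """Mean squared error of simulated estimates around the true value."""
    return fmean((v - target) ** 2 for v in values)


bias = {name: fmean(v) - sigma**2 for name, v in (("A", est_a), ("B", est_b), ("C", est_c))}
mse_b, mse_c = mse(est_b, sigma**2), mse(est_c, sigma**2)

print({k: round(v, 3) for k, v in bias.items()}, round(mse_b, 3), round(mse_c, 3))
```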

12 Interval estimation
Comprehension questions
Explain in your own words:

• What is the connection between the probability of error and the confidence level?

• What is a confidence interval?

• What is the connection between the normal distribution and the chi-square distribution?

Exercise 65
The characteristic X of a large population has the frequency function

$h(x) = \begin{cases} \frac{1}{5} & \text{for } x = 0, 1, 2, 3, 4 \\ 0 & \text{otherwise.} \end{cases}$

A random sample is drawn from this population.

(a) Specify numerically the mean and variance of the population.
(b) Specify the complete distribution of the sample mean $\bar{X}^{(2)}$ for a sample size n = 2.
(c) Calculate $E(\bar{X}^{(2)})$ and $V(\bar{X}^{(2)})$ using this distribution.
(d) Calculate the same variance without explicit use of this distribution.
(e) Calculate $P(2.5 < \bar{X}^{(2)} \le 3.5)$ for a sample size of n = 2.
(f) Calculate $P(2.5 < \bar{X}^{(50)} \le 3.5)$ for a sample size of n = 50 (approximation).
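For n = 2 the distribution of the sample mean can be obtained by enumerating all 25 equally likely pairs. The following sketch assumes that draws are independent, which is reasonable because the population is "large". It uses exact fractions for parts (a) to (e) and the central limit theorem approximation N(µ, σ²/50) for (f).

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from statistics import NormalDist

values = [0, 1, 2, 3, 4]                      # each with probability 1/5
pop_mean = Fraction(sum(values), 5)           # (a)
pop_var = sum((Fraction(v) - pop_mean) ** 2 for v in values) / 5

# (b) distribution of the sample mean for n = 2: enumerate all 25 ordered pairs
counts = Counter(Fraction(a + b, 2) for a, b in product(values, repeat=2))
dist = {m: Fraction(c, 25) for m, c in sorted(counts.items())}

# (c) moments from the distribution, (d) shortcut V(X-bar) = sigma^2 / n
e_mean = sum(m * p for m, p in dist.items())
v_mean = sum((m - e_mean) ** 2 * p for m, p in dist.items())
v_mean_formula = pop_var / 2

# (e) exact probability for n = 2
p_e = sum(p for m, p in dist.items() if Fraction(5, 2) < m <= Fraction(7, 2))

# (f) n = 50: X-bar approximately N(mu, sigma^2 / 50)
approx = NormalDist(float(pop_mean), float(pop_var / 50) ** 0.5)
p_f = approx.cdf(3.5) - approx.cdf(2.5)

print(pop_mean, pop_var, dist, e_mean, v_mean, v_mean_formula, p_e, round(p_f, 5))
```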

Exercise 66
An industry sector consists of N = 12100 individual companies. Since no official profit statistics exist,
a random sample of n = 225 individual companies is drawn to determine the annual profit G. The
random sample in 2016 yields

$\bar{g} = 600\,000$ € and $s_G = 90\,000$ €.

Give an interval estimate for µ at an error probability of α = 4.55% and, for this error probability,
an interval estimate of the total income of this industry sector (extrapolation).

Exercise 67
The mean µ of a normal distribution with variance σ² = 9 is to be estimated. A sample
of size n = 100 yields the mean $\bar{x} = 53.97$.
(a) Specify a 95% confidence interval for µ.
(b) How large must the sample be to obtain a 95% confidence interval of length 0.4?
(c) How large must the sample be to obtain a 99% confidence interval of length 0.4?
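With known σ, the interval has length 2zσ/√n. Solving for n and rounding up yields the required sample sizes. The following sketch implements this; `z` and `n_required` are hypothetical helper names.

```python
from math import ceil, sqrt
from statistics import NormalDist

sigma, n, x_bar = 3.0, 100, 53.97


def z(conf: float) -> float:
    """Two-sided standard normal quantile for a given confidence level."""
    return NormalDist().inv_cdf((1 + conf) / 2)


half = z(0.95) * sigma / sqrt(n)
ci_95 = (x_bar - half, x_bar + half)          # (a)


def n_required(conf: float, length: float) -> int:
    # length = 2 z sigma / sqrt(n)  ->  n = (2 z sigma / length)^2, rounded up
    return ceil((2 * z(conf) * sigma / length) ** 2)


n_95 = n_required(0.95, 0.4)                  # (b)
n_99 = n_required(0.99, 0.4)                  # (c)

print([round(v, 3) for v in ci_95], n_95, n_99)
```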

Exercise 68
The distribution of individual IQ is approximately normal with a mean of 100 and a variance of
225.
(a) What is the probability that a person you meet at random has an IQ of more than 130?
(b) Randomly select 100 test persons and determine the average IQ of this group, $\overline{iq}$. Specify
numerically the values $E(\overline{IQ})$ and $V(\overline{IQ})$. What is the probability that the mean value $\overline{iq}$
lies in the interval between 98 and 103?
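Part (a) concerns a single observation, X ∼ N(100, 15²). Part (b) concerns the mean of 100 observations, which is N(100, 225/100). The following sketch evaluates both with `statistics.NormalDist`.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)        # variance 225
p_a = 1 - iq.cdf(130)                    # (a) P(IQ > 130)

n = 100
e_mean, v_mean = 100, 225 / n            # (b) E = mu, V = sigma^2 / n = 2.25
mean_iq = NormalDist(e_mean, v_mean ** 0.5)
p_b = mean_iq.cdf(103) - mean_iq.cdf(98)

print(round(p_a, 4), e_mean, v_mean, round(p_b, 4))
```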

Exercise 69
Let X be a continuous random variable with density

$f(x) = \begin{cases} \frac{1}{x^2} & \text{for } x > 1 \\ 0 & \text{otherwise.} \end{cases}$

(a) Compute the expectation of X.
(b) Find the median of X.
(c) Find a 95% confidence interval of the form [1, a]. In other words, find a such that
$P(1 \le X \le a) = 95\%$ holds.
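Integrating the density gives the distribution function F(x) = 1 − 1/x for x > 1 and the quantile function F⁻¹(q) = 1/(1 − q). The following sketch uses these closed forms. For the expectation, it shows that the truncated integral ∫₁ᴹ x·x⁻² dx = ln M keeps growing, so the expectation does not exist as a finite number.

```python
from math import log


def cdf(x: float) -> float:
    """F(x) = integral of t^-2 from 1 to x, for x > 1."""
    return 1 - 1 / x if x > 1 else 0.0


def quantile(q: float) -> float:
    """Inverse of F on (1, inf)."""
    return 1 / (1 - q)


# (a) truncated expectation: integral of x * x^-2 from 1 to M equals ln M (unbounded)
truncated_means = [log(m) for m in (10, 1e3, 1e6)]

median = quantile(0.5)   # (b)
a = quantile(0.95)       # (c) P(1 <= X <= a) = 0.95

print(truncated_means, median, round(a, 6), cdf(a))
```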

Exercise 70
For the normally distributed yearly returns of a listed stock, the following sample values are given:
2%, −4%, 3%, −2%, −1%
Determine an interval estimate with confidence level 1 − α = 98% for the expectation µ of the yearly
stock returns.
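Because σ is unknown and n = 5 is small, the interval uses the t-distribution with 4 degrees of freedom. The Python standard library has no t-quantile function. The following sketch therefore hard-codes t₀.₉₉;₄ ≈ 3.747 from a standard t-table (rounded to three decimals).

```python
from math import sqrt
from statistics import mean, stdev

returns = [2, -4, 3, -2, -1]    # in %
n = len(returns)
x_bar = mean(returns)
s = stdev(returns)              # sample standard deviation (divisor n - 1)

T_4_099 = 3.747                 # t-quantile t_{0.99; 4}, taken from a t-table
half = T_4_099 * s / sqrt(n)
ci = (x_bar - half, x_bar + half)

print(x_bar, round(s, 4), [round(v, 3) for v in ci])
```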

Exercise 71
The losses of a bank’s credit portfolio were observed over 7 cycles [in MU]:
x1    x2    x3    x4    x5    x6    x7
560   532   550   525   540   530   515
Assume that x1, …, x7 represent a sample from a normally distributed population. Determine
confidence intervals with a confidence level of 1 − α = 90% for
(a) the expectation µ, when the variance is known: σ² = 95.5 MU²,
(b) the variance σ², when it is not known.
(c) How large must the sample size in (a) be chosen so that the length of the confidence interval
is at most 10 MU while keeping the confidence level at 1 − α = 90%?
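The following sketch combines the three parts. It uses a z-interval for (a) because the variance is known, and the chi-square interval ((n − 1)s²/χ²₀.₉₅, (n − 1)s²/χ²₀.₀₅) with 6 degrees of freedom for (b). For (c) it solves 2zσ/√n ≤ 10. The chi-square quantiles are hard-coded from a standard table, since the standard library does not provide them.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean, variance

losses = [560, 532, 550, 525, 540, 530, 515]   # in MU
n = len(losses)
x_bar = mean(losses)
z = NormalDist().inv_cdf(0.95)                 # two-sided 90 %

# (a) known variance
sigma2 = 95.5
half = z * sqrt(sigma2 / n)
ci_mu = (x_bar - half, x_bar + half)

# (b) unknown variance, chi-square with df = n - 1 = 6
s2 = variance(losses)
CHI2_6_005, CHI2_6_095 = 1.635, 12.592         # table values chi^2_{0.05; 6}, chi^2_{0.95; 6}
ci_var = ((n - 1) * s2 / CHI2_6_095, (n - 1) * s2 / CHI2_6_005)

# (c) 2 z sigma / sqrt(n) <= 10  ->  n >= (2 z sigma / 10)^2
n_needed = ceil((2 * z * sqrt(sigma2) / 10) ** 2)

print(x_bar, [round(v, 2) for v in ci_mu], [round(v, 1) for v in ci_var], n_needed)
```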

Exercise 72

(a) A restaurant receives a delivery of many crayfish. To save time, and since crayfish are rather
delicate, a sample of size n = 5 is taken to determine the average weight of the crayfish. The
following values (in g) were observed:
100; 103; 104; 106; 112.
Determine
i. a 99% confidence interval for the average weight µ and
ii. a 95% confidence interval for the variance of the weight σ².
(b) A large sample was taken from the current production process of a certain product. The sample
revealed a mean weight of $\bar{x} = 150$ g and a standard deviation of s = 28 g. At a significance
level of 10%, a confidence interval for the average weight µ had the boundaries 143.4 g and 156.6 g.
What was the sample size n, assuming that a normal distribution was applied?
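Part (a) needs a t-interval for µ and a chi-square interval for σ², each with 4 degrees of freedom. Part (b) inverts the half-width 6.6 g = z₀.₉₅·28/√n. The following sketch computes all three. The t- and chi-square quantiles are table values hard-coded for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev, variance

weights = [100, 103, 104, 106, 112]   # g
n = len(weights)
x_bar, s = mean(weights), stdev(weights)

# (a) i. 99 % interval for mu, sigma unknown -> t-distribution, df = 4
T_4_0995 = 4.604                      # t_{0.995; 4} from a t-table
half = T_4_0995 * s / sqrt(n)
ci_mu = (x_bar - half, x_bar + half)

# (a) ii. 95 % interval for sigma^2 -> chi-square, df = 4
CHI2_4_0025, CHI2_4_0975 = 0.484, 11.143
ss = (n - 1) * variance(weights)      # sum of squared deviations
ci_var = (ss / CHI2_4_0975, ss / CHI2_4_0025)

# (b) half-width 6.6 g = z_{0.95} * 28 / sqrt(n)  ->  n = (z * 28 / 6.6)^2
z = NormalDist().inv_cdf(0.95)
n_b = round((z * 28 / 6.6) ** 2)

print(x_bar, [round(v, 2) for v in ci_mu], [round(v, 2) for v in ci_var], n_b)
```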

13 Statistical hypotheses testing
Comprehension questions
Explain in your own words:

• What do type 1 and type 2 errors mean?

• When is the normal distribution used, and when the t-distribution?

Exercise 73
To monitor the ongoing production of a machine, small samples of size n = 5 are taken
hourly; the thickness of the workpieces is measured and the arithmetic mean is calculated.
From long-term experience, it is known that the variance of the thickness remains unchanged at
σ² = 0.45 mm² and that the thickness is normally distributed to a very good approximation. The mean
thickness of the workpieces, however, varies occasionally, and the machine then needs to be adjusted.
Whether a readjustment is necessary should be decided on the basis of the small sample.
(a) What is the standard deviation of the sample mean?
(b) Is your statement in (a) an estimate?
(c) What deviation of the sample mean from the target value $\mu_0$ would cause you to stop
and readjust the machine, if the machine is to be stopped erroneously only with a probability
of α = 0.03 (type 1 error)?
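Since σ² is known, the standard deviation of the mean is exact rather than estimated. Stopping the machine when the deviation is too large in either direction amounts to a two-sided test. The following sketch therefore uses the quantile z₁₋α/₂.

```python
from math import sqrt
from statistics import NormalDist

sigma2, n, alpha = 0.45, 5, 0.03

sd_mean = sqrt(sigma2 / n)                   # (a) 0.3 mm, exact because sigma^2 is known
z = NormalDist().inv_cdf(1 - alpha / 2)      # two-sided: stop if |x-bar - mu_0| > z * sd_mean
critical_deviation = z * sd_mean             # (c)

print(round(sd_mean, 4), round(z, 4), round(critical_deviation, 4))
```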

Exercise 74
A sample taken from a normally distributed population yields the following numbers:

12.7 13.3 13.0 12.9 13.1

(a) Given this sample, can the null hypothesis that the true mean µ of the population equals
$\mu_0 = 12.83$ be rejected at a significance level of α = 0.05?
(b) Suppose you knew that the variance of the population is σ² = 0.036. Can the null hypothesis
now be rejected?
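The two parts differ only in whether σ must be estimated (t-test, df = 4) or is known (Gauss test). The following sketch runs both tests side by side. The t-quantile t₀.₉₇₅;₄ ≈ 2.776 is taken from a standard table.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

sample = [12.7, 13.3, 13.0, 12.9, 13.1]
mu0, alpha = 12.83, 0.05
n = len(sample)
x_bar = mean(sample)

# (a) sigma unknown -> one-sample t-test with df = 4
t_stat = (x_bar - mu0) / (stdev(sample) / sqrt(n))
T_4_0975 = 2.776                                  # t_{0.975; 4} from a t-table
reject_a = abs(t_stat) > T_4_0975

# (b) sigma^2 = 0.036 known -> Gauss test
z_stat = (x_bar - mu0) / sqrt(0.036 / n)
reject_b = abs(z_stat) > NormalDist().inv_cdf(1 - alpha / 2)

print(round(t_stat, 3), reject_a, round(z_stat, 3), reject_b)
```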

Exercise 75
An industry association claims that the average annual profit per individual company in 2016
is no more than 600 000 €. In a hurry, it takes a small random sample of only n = 36 companies and
finds
$\bar{g} = 636\,000$ € and $s_G = 90\,000$ €.

With this sample, the null hypothesis of the industry association is to be tested at a significance
level of α = 1%.
(a) What is the result if it can be assumed that the true value of the standard deviation is also
$\sigma_G = 90\,000$ € in 2016?
(b) What is the result if it must be assumed that the variance may have changed and needs to be
estimated?
Hint: Assume a normal distribution for the statistical population in (a) and (b).
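The test statistic is the same in both parts; only the critical value changes. Part (a) uses the standard normal quantile z₀.₉₉, and part (b) uses the t-quantile with 35 degrees of freedom. The following sketch hard-codes the latter as t₀.₉₉;₃₅ ≈ 2.438 from a standard table.

```python
from math import sqrt
from statistics import NormalDist

n, g_bar, s_g, mu0, alpha = 36, 636_000, 90_000, 600_000, 0.01

stat = (g_bar - mu0) / (s_g / sqrt(n))     # same statistic in (a) and (b)

# (a) sigma known -> one-sided Gauss test, H1: mu > 600 000
reject_a = stat > NormalDist().inv_cdf(1 - alpha)

# (b) sigma estimated -> t-distribution with df = 35
T_35_099 = 2.438                           # t_{0.99; 35} from a t-table
reject_b = stat > T_35_099

print(stat, reject_a, reject_b)
```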

Exercise 76

(a) For a stock A we assume returns to be normally distributed and observe the sample values below:

1.512 1.477 1.464 1.439 1.440 1.427 1.406 1.396 in %

The variance of the returns is 0.01 %² (square of the volatility). At a significance level of
1%, test the hypothesis that for the unknown mean µ of the returns it holds:

$H_0\colon \mu \le 1.40\%$ vs. $H_1\colon \mu > 1.40\%$.

(b) For a stock B we assume returns to be normally distributed and observe the sample values below:

8.4 9.3 8.1 8.3 8.8 9.0 8.3 in %

The variance of the returns is 0.5 %² (square of the volatility). At a significance level of 5%,
test the hypothesis that for the unknown mean µ of the returns it holds:

$H_0\colon \mu = 8\%$ vs. $H_1\colon \mu \neq 8\%$.
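Both parts are Gauss tests, because the variance is given. Part (a) is one-sided (reject for large values) and part (b) is two-sided. The following sketch assumes that the stated variances are in the same squared-percent units as the sample values.

```python
from math import sqrt
from statistics import NormalDist, mean

std_normal = NormalDist()

# (a) stock A: one-sided Gauss test, H1: mu > 1.40, sigma^2 = 0.01 known
a = [1.512, 1.477, 1.464, 1.439, 1.440, 1.427, 1.406, 1.396]
z_a = (mean(a) - 1.40) / (sqrt(0.01) / sqrt(len(a)))
reject_a = z_a > std_normal.inv_cdf(0.99)

# (b) stock B: two-sided Gauss test, H1: mu != 8, sigma^2 = 0.5 known
b = [8.4, 9.3, 8.1, 8.3, 8.8, 9.0, 8.3]
z_b = (mean(b) - 8.0) / (sqrt(0.5) / sqrt(len(b)))
reject_b = abs(z_b) > std_normal.inv_cdf(0.975)

print(round(z_a, 3), reject_a, round(z_b, 3), reject_b)
```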

Exercise 77
For the return of the stocks A and B you observe the sample values below:

A 6.8 6.1 4.9 7.1 5.4 5.9 5.1 in %


B 7.8 8.2 7.2 6.9 8.4 in %

Assume normally distributed returns and, at a significance level of 5%, test the hypothesis that for the
unknown mean returns $\mu_A$ (stock A) and $\mu_B$ (stock B) it holds:

$H_0\colon \mu_A = \mu_B$ vs. $H_1\colon \mu_A \neq \mu_B$.
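The exercise does not say whether the two return variances may be assumed equal. The following sketch assumes that they are and uses the pooled two-sample t-test with 7 + 5 − 2 = 10 degrees of freedom; this is an assumption, not something the sheet states. The quantile t₀.₉₇₅;₁₀ ≈ 2.228 is a table value.

```python
from math import sqrt
from statistics import mean, variance

a = [6.8, 6.1, 4.9, 7.1, 5.4, 5.9, 5.1]   # returns of stock A in %
b = [7.8, 8.2, 7.2, 6.9, 8.4]             # returns of stock B in %
na, nb = len(a), len(b)

# pooled variance estimate (assumes equal population variances)
sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
t_stat = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))

T_10_0975 = 2.228                          # t_{0.975; 10} from a t-table
reject = abs(t_stat) > T_10_0975

print(round(sp2, 4), round(t_stat, 3), reject)
```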

Exercise 78

(a) The changes in value of a portfolio are assumed to be normally distributed. You observe the
sample (in MU) below:
5 0 1 3 −4 in MU
At a significance level of 5%, test the hypothesis that for the variance σ² of the changes in value it
holds:
$H_0\colon \sigma^2 = 10\ \text{MU}^2$ vs. $H_1\colon \sigma^2 \neq 10\ \text{MU}^2$.

(b) The changes in value of a portfolio are assumed to be normally distributed. You observe the
sample (in MU) below:

3 −5 0 2 7 −4 in MU

At a significance level of 5%, test the hypothesis that for the variance σ² of the changes in value it
holds:
$H_0\colon \sigma^2 \le 10\ \text{MU}^2$ vs. $H_1\colon \sigma^2 > 10\ \text{MU}^2$.
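Both parts use the statistic (n − 1)s²/σ₀², which follows a chi-square distribution with n − 1 degrees of freedom under H₀. Part (a) is two-sided (df = 4) and part (b) is right-sided (df = 5). The following sketch hard-codes the chi-square quantiles from a standard table.

```python
from statistics import variance

# (a) two-sided chi-square test, H0: sigma^2 = 10, df = 4
a = [5, 0, 1, 3, -4]
chi_a = (len(a) - 1) * variance(a) / 10
CHI2_4_0025, CHI2_4_0975 = 0.484, 11.143     # table values
reject_a = not (CHI2_4_0025 <= chi_a <= CHI2_4_0975)

# (b) right-sided chi-square test, H0: sigma^2 <= 10, df = 5
b = [3, -5, 0, 2, 7, -4]
chi_b = (len(b) - 1) * variance(b) / 10
CHI2_5_095 = 11.070                          # table value
reject_b = chi_b > CHI2_5_095

print(chi_a, reject_a, chi_b, reject_b)
```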

Exercise 79
[Optional exercise, slightly beyond the scope of the lecture notes]
In its annual report, the ADAC published that, according to a survey of all new cars sold in Germany
in 2011, 21% had defects within the first year.
Among the cars sold in the first half of 2012, a random sample of size n = 400 was considered, and
only 17.4% of them had defects within the first year. The ADAC claimed that this noticeable
improvement in quality was due to its journalistic activity.
Can the null hypothesis that the quality has not changed be rejected at an error probability
of 5%?
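A natural approach is an approximate test for a proportion, with H₀: p = 0.21 and the statistic z = (p̂ − p₀)/√(p₀(1 − p₀)/n). Because the exercise leaves open whether the alternative is two-sided ("has changed") or one-sided ("has improved"), the following sketch evaluates both decisions. Which one is intended is an assumption left to the reader.

```python
from math import sqrt
from statistics import NormalDist

p0, n, p_hat = 0.21, 400, 0.174
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # normal approximation under H0

std_normal = NormalDist()
reject_two_sided = abs(z) > std_normal.inv_cdf(0.975)   # H1: p != 0.21
reject_one_sided = z < std_normal.inv_cdf(0.05)          # H1: p < 0.21

print(round(z, 3), reject_two_sided, reject_one_sided)
```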
