Erg PRFG

The document provides an overview of fundamental mathematical concepts including: 1) Important sets of numbers such as natural numbers, integers, rational numbers, and real numbers. 2) Basic set operations including intersection, union, difference, and complement. 3) Notation for factorials, binomial coefficients, and the binomial theorem.

Uploaded by

azmat

Contents

I Mathematics 1
1 Sets, numbers, transformation of terms
2 Solving equations, inequalities and systems of equations
3 Vectors
4 Matrices and determinants
5 Functions
6 Measures of change
7 Trigonometry and applications in the right triangle
8 Basic concepts of differentiation and integration
9 Introduction to probability theory and statistics

Part I
Mathematics 1
1 Sets, numbers, transformation of terms
1. Important sets of numbers:

• The natural numbers

N := {0, 1, 2, 3, 4, . . . }.

Some authors define the natural numbers without 0; in this case

N := {1, 2, 3, 4, . . . }.

The notation N0 is then sometimes used to emphasize that 0 is included in the set.

• The integers

Z := {. . . , −3, −2, −1, 0, 1, 2, 3, . . . } = N ∪ {0} ∪ {−n | n ∈ N}.

We see, N is a subset of Z,

N ⊆ Z.

• The rational numbers


$$Q := \left\{ \frac{a}{b} \;\middle|\; a \in Z \text{ and } b \in N,\ b \neq 0 \right\}.$$

As for any a ∈ Z we have $a = \frac{a}{1}$, it holds that Z is a subset of Q, i.e. the set of integers is a subset of the set of fractions, hence

N ⊆ Z ⊆ Q.

The decimal representation of a rational number is either finite, e.g. $\frac{7}{2} = 3.5$, $-\frac{18}{8} = -2.25$, or infinite but with periodic digits, e.g. $\frac{1}{3} = 0.33333\ldots = 0.\overline{3}$, $-\frac{10}{7} = -1.428571428571\ldots = -1.\overline{428571}$.
• The set of irrational numbers is the collection of the numbers whose decimal representation is neither finite nor periodic, e.g. $\sqrt{2} = 1.41421\ldots$, $-\pi = -3.14159\ldots$.
• The real numbers

R := Q ∪ {r | r is an irrational number}.

Every point on the number line represents a real number:

[Figure: number line from −5 to 5 with the points $-\pi$, $-\frac{18}{8}$, $-\frac{10}{7}$, $\frac{1}{3}$, $\sqrt{2}$ and $\frac{7}{2}$ marked.]

As Q ⊆ R, we have

N ⊆ Z ⊆ Q ⊆ R.

[Figure: the nested sets N ⊆ Z ⊆ Q ⊆ R.]

2. Finite sets can be denoted by listing their elements between { and }.


For a finite set A, we introduce the notion of the size of a set: |A| is
the number of elements contained in A. Examples
• The set of vowels in the (Latin) alphabet
{a, e, i, o, u}, with |{a, e, i, o, u}| = 5.

• The set of natural numbers between 5 and 10


{n ∈ N : 5 ≤ n ≤ 10} = {5, 6, 7, 8, 9, 10}, with |{5, 6, 7, 8, 9, 10}| = 6.

• The set of federal provinces in Austria


|{V, T, Sbg, St, K, B, W, Nö, Oö}| = 9.

• The empty set ∅ = { }, with |∅| = 0.


3. Set operations.
(a) Intersection. If A and B are sets, then A ∩ B is the set that
contains all the elements, which are contained in A and contained
in B.

[Figure: Venn diagram illustrating A ∩ B.]

Example:
A := {1, 2, c, d}, B := {car, coffee, 1, 2}, A ∩ B = {1, 2}.

(b) Union. If A and B are sets, then A ∪ B is the set that contains
all the elements, which are contained in A or in B (or both).

[Figure: Venn diagram illustrating A ∪ B.]

Example:
A ∪ B = {1, 2, c, d, car, coffee}.
(c) Set Difference. If A and B are sets, then A \ B is the set that
contains all the elements, which are contained in A but not in B.

[Figure: Venn diagram illustrating A \ B.]

Example:
A \ B = {c, d}.
(d) Power Set. If A is a set, then P(A) is the set that contains all
subsets of A. Example:
P(A) = {∅, {1}, {2}, {c}, {d}, {1, 2}, {1, c}, {1, d}, {2, c}, {2, d}, {c, d},
{1, 2, c}, {1, 2, d}, {1, c, d}, {2, c, d}, A}.
If A is finite and |A| = n, then P(A) is finite and $|P(A)| = 2^n$.
(e) Set complement. If A and B are sets such that A ⊆ B, then we
can define the set complement of A under B, denoted by $A^C$, as
the set of elements of B that are not contained in A.

[Figure: the set A contained in the set B.]

Example:

A := {1, 2, 3}, B := {1, 2, 3, 4, 5}, $A^C$ = {4, 5}.

If it is implicitly clear with respect to which set we are taking
the complement, we might just write “the complement of A”, i.e.
we omit mentioning B.
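The set operations above can be tried out directly with Python's built-in `set` type. The following is only a minimal sketch: the helper names `power_set` and `complement` are chosen here for illustration and do not come from the notes.

```python
from itertools import combinations

# The example sets from the text (the letters c, d are modelled as strings).
A = {1, 2, "c", "d"}
B = {"car", "coffee", 1, 2}

intersection = A & B   # elements contained in A and in B
union = A | B          # elements contained in A or in B (or both)
difference = A - B     # elements contained in A but not in B


def power_set(s):
    """Return all subsets of s as a set of frozensets (hypothetical helper)."""
    items = list(s)
    return {
        frozenset(subset)
        for size in range(len(items) + 1)
        for subset in combinations(items, size)
    }


def complement(a, b):
    """Complement of a under b; only defined if a is a subset of b."""
    if not a <= b:
        raise ValueError("a must be a subset of b")
    return b - a
```

For a set with n elements, `len(power_set(...))` should equal 2 ** n, in line with the formula for |P(A)|.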
4. Using powers of ten to represent large or small numbers.
• $1\,\text{cm}^2 = 0.0000000001\,\text{km}^2 = 1 \cdot 10^{-10}\,\text{km}^2$.
• The mass of the earth is around $5.9 \cdot 10^{24}$ kg. The mass of the sun
is around $3.3 \cdot 10^{5}$ times the mass of the earth. As $3.3 \cdot 5.9 = 19.47$,
the mass of the sun is around $1.947 \cdot 10^{24+5+1}$ kg $= 1.947 \cdot 10^{30}$ kg.
5. Transformation of terms:
• Newton's second law:
$$F = ma.$$
Assuming m > 0, we conclude
$$a = \frac{F}{m}.$$
• The energy of an object in motion can be described as the sum of
potential energy and kinetic energy:
$$E = \frac{1}{2} \cdot m \cdot v^2 + m \cdot g \cdot h.$$
Again, for m > 0, we have
$$v = \pm\sqrt{2\left(\frac{E}{m} - g \cdot h\right)},$$
where in most cases only v ≥ 0 will be considered a solution. If
we are interested in obtaining m instead of v, we note
$$\frac{1}{2} \cdot m \cdot v^2 + m \cdot g \cdot h = \left(\frac{1}{2} v^2 + g \cdot h\right) m,$$
hence for $\frac{1}{2} v^2 + g \cdot h \neq 0$, we have
$$m = \frac{E}{\frac{1}{2} v^2 + g \cdot h}.$$

• Assuming $v \neq v_0$ and t > 0, solve the following expression for v:
$$t = \frac{s}{v - v_0}.$$
We get $(v - v_0)\,t = s$, hence $v - v_0 = \frac{s}{t}$, thus $v = \frac{s}{t} + v_0$.
6. For n ∈ N, the factorial of n, denoted by n!, is defined as 0! = 1 and
$$n! = n \cdot (n-1)! = n \cdot (n-1) \cdots 2 \cdot 1 = \prod_{i=1}^{n} i$$
for n ≠ 0. Example:
• 3! = 3 · 2! = 3 · (2 · 1!) = 3 · (2 · (1 · 0!)) = 3 · 2 · 1 = 6.
• 10! = 10 · 9 · 8 · · · 2 · 1 = 3628800.
7. The binomial coefficients, usually denoted as a pair of numbers, e.g. $\binom{n}{k}$,
“n choose k”, are the positive integers that occur as coefficients in the
binomial theorem
$$(a+b)^n = \sum_{i=0}^{n} \binom{n}{i} a^{n-i} b^{i}.$$

Example:
• Case n = 2:
$$(a+b)^2 = \sum_{i=0}^{2} \binom{2}{i} a^{2-i} b^{i} = \binom{2}{0} a^{2-0} b^{0} + \binom{2}{1} a^{2-1} b^{1} + \binom{2}{2} a^{2-2} b^{2} = 1 \cdot a^2 + 2 \cdot ab + 1 \cdot b^2.$$
• Case n = 3:
$$(a+b)^3 = \sum_{i=0}^{3} \binom{3}{i} a^{3-i} b^{i} = a^3 + 3a^2 b + 3ab^2 + b^3.$$

For given n, k ∈ N0 with n ≥ k, we calculate $\binom{n}{k}$ via
$$\binom{n}{k} = \frac{n!}{k! \cdot (n-k)!}.$$

We have

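• $\binom{2}{0} = \frac{2!}{0! \cdot (2-0)!} = \frac{2}{2} = 1$,
• $\binom{2}{1} = \frac{2!}{1! \cdot (2-1)!} = \frac{2}{1} = 2$,
• $\binom{2}{2} = \frac{2!}{2! \cdot (2-2)!} = \frac{2}{2} = 1$,
• $\binom{3}{0} = \frac{3!}{0! \cdot (3-0)!} = \frac{6}{6} = 1$,
• $\binom{3}{1} = \frac{3!}{1! \cdot (3-1)!} = \frac{6}{2} = 3$,
• $\binom{3}{2} = \frac{3!}{2! \cdot (3-2)!} = \frac{6}{2} = 3$,
• $\binom{3}{3} = \frac{3!}{3! \cdot (3-3)!} = \frac{6}{6} = 1$.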

Important properties of the binomial coefficients are

• $\binom{n}{k} = 0$, if n < k or n < 0 or k < 0,
• $\binom{n}{0} = \binom{n}{n} = 1$ for all n ∈ N0,
• $\binom{n}{k} = \binom{n}{n-k}$ for all n, k ∈ N0,
• $\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}$ for all n, k ∈ N \ {0}.
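The factorial formula and the listed properties can be checked numerically. The sketch below is illustrative only: the function names `binomial` and `binomial_expansion` are made up for this example, and the convention that the coefficient is 0 for n < k or negative arguments is taken from the first property above.

```python
from math import comb, factorial


def binomial(n, k):
    """Binomial coefficient via n! / (k! * (n - k)!), with 0 outside 0 <= k <= n."""
    if n < 0 or k < 0 or n < k:
        return 0
    return factorial(n) // (factorial(k) * factorial(n - k))


def binomial_expansion(a, b, n):
    """Right-hand side of the binomial theorem for the numbers a and b."""
    return sum(binomial(n, i) * a ** (n - i) * b ** i for i in range(n + 1))


# Symmetry and the recursion (Pascal's rule) for small arguments.
symmetry_holds = all(
    binomial(n, k) == binomial(n, n - k) for n in range(8) for k in range(n + 1)
)
recursion_holds = all(
    binomial(n, k) == binomial(n - 1, k) + binomial(n - 1, k - 1)
    for n in range(1, 8)
    for k in range(1, 8)
)
```

For non-negative arguments the standard library function `math.comb` returns the same values, so it can serve as a cross-check.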

8. Elementary Combinatorics:

• If A, B are finite sets with |A| = n and |B| = k, then there are n · k
possibilities to choose one element from each set, i.e. |A × B| = n · k.
Example: If A := {soup, rice, noodles} and B := {water, juice},
then we have 3 · 2 = 6 possibilities to choose a meal.

• If A is a finite set and |A| = n, then there are n! different
possibilities to put (all) its elements in order. These orderings are
called permutations of A. Example: Consider the following set
of books: A := {Math, History, German}. We can put these
books into a bookshelf in 3! = 6 different ways:
– Math, History, German,
– Math, German, History,
– History, Math, German,
– History, German, Math,
– German, Math, History,
– German, History, Math.
The general idea is we make a list:
– in the first place of the list, we can put any of the n elements,
– in the second place of the list, we can put any of the n − 1
remaining elements,
– in the third place of the list, we can put any of the n − 2
remaining elements,
– ...,
– for the last place of the list, there is only 1 remaining element.
Thus, we get n · (n − 1) · (n − 2) · . . . · 1 = n!.
• (Variation without repetition: order important, no repeat)
If A is a finite set with |A| = n and k ≤ n, then there are
$\frac{n!}{(n-k)!}$ different possibilities to order k (different) elements from A.
To see this, we again imagine making a list of length k:
– in the first place of the list, we can put any of the n elements,
– in the second place of the list, we can put any of the n − 1
remaining elements,
– in the third place of the list, we can put any of the n − 2
remaining elements,
– ...,
– for the last place of the list, there are only n − k + 1 remaining
elements.
Hence, we get
$$n \cdot (n-1) \cdot (n-2) \cdots (n-k+1) = \frac{n \cdot (n-1) \cdots (n-k+1) \cdot (n-k) \cdot (n-k-1) \cdots 1}{(n-k) \cdot (n-k-1) \cdots 1} = \frac{n!}{(n-k)!}.$$
• (Variation with repetition: order important, repeat allowed)
If A is a finite set with |A| = n and k ∈ N, then there are
$n^k$ different variations to put elements from A in k possible spots.
To see this, we again imagine making a list of length k:
– in the first place of the list, we can put any of the n elements,
– in the second place of the list, we can (again) put any of the
n elements - repetition is now allowed,
– in the third place of the list, we can (again) put any of the n
elements - repetition is now allowed,
– ...,
– in the last place of the list, we can (again) put any of the n
elements - repetition is now allowed.
Hence, we get
$$\underbrace{n \cdot n \cdots n}_{k \text{ times}} = n^k.$$

• (Combination without repetition: order not important,
no repeat) If A is a finite set with |A| = n and k ≤ n, then there
are $\binom{n}{k}$ different possibilities to choose k (different) elements from
A. To see this, we first notice the following: there are always
k! different orderings of a list of k elements (permutations, see
above). We then take ordered lists of length k of the n elements,
this yields $\frac{n!}{(n-k)!}$ different lists (variation without repetition).
Always k! of these lists contain the same elements, just in a different
order. Hence the number of combinations, i.e. when we do not
care about order, is
$$\frac{\frac{n!}{(n-k)!}}{k!} = \frac{n!}{k! \, (n-k)!} = \binom{n}{k}.$$

• (Combinations with repetition: order not important, repeat
allowed) If A is a finite set with |A| = n and k ∈ N, then
there are $\binom{n+k-1}{k} = \binom{n+k-1}{n-1}$ different ways to select k elements of
A, if the order is not important and repetition is allowed. To see
this, we imagine a table with the elements of A and marks “+”
that indicate how often we want to select the element. (Important:
“how often” and not “in what order”.) E.g.

x1    x2    x3    ...    xn
++          +     ...    +

We can translate the above table into a string

+ + | | + | ... | + .

(Take the first element 2 times, the second 0 times, the third 1
time, . . . , the n-th 1 time.) Each string represents exactly one
combination with repetition. These strings always have length
n + k − 1, the n − 1 bars | and the k marks + to indicate that
the corresponding element is chosen. If we just have a look at the
string, we have n + k − 1 places to put either
– n − 1 bars (in this case, the places where to put the k marks
is already determined),
– or k marks (in this case, the place where to put the n − 1 bars
is already determined).
This means we get $\binom{n+k-1}{k}$, which is the same as $\binom{n+k-1}{n-1}$, different
strings. As each string represents exactly one combination with
repetition, we get that the number of combinations with repetition
is also $\binom{n+k-1}{k}$.
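The counting formulas of this item can be compared against brute-force enumeration with Python's `itertools`. This is just a sketch for small n and k; the function name `count_selections` and the dictionary keys are invented for the illustration.

```python
from itertools import (
    combinations,
    combinations_with_replacement,
    permutations,
    product,
)
from math import comb, factorial


def count_selections(n, k):
    """Count the selections from an n-element set by listing them all."""
    elements = range(n)
    return {
        # all orderings of the n elements
        "permutations": sum(1 for _ in permutations(elements)),
        # order important, no repeat
        "variation_without_repetition": sum(1 for _ in permutations(elements, k)),
        # order important, repeat allowed
        "variation_with_repetition": sum(1 for _ in product(elements, repeat=k)),
        # order not important, no repeat
        "combination_without_repetition": sum(1 for _ in combinations(elements, k)),
        # order not important, repeat allowed
        "combination_with_repetition": sum(
            1 for _ in combinations_with_replacement(elements, k)
        ),
    }


counts = count_selections(4, 2)
```

Enumeration grows very quickly, so for larger n and k the closed formulas are the practical choice.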

2 Solving equations, inequalities and systems of equations
1. Important notions:

• The so-called universe (or underlying set) U is the set of admissible
values for the variable(s). If no universe is explicitly specified,
we assume U = R.

• The so-called domain of definition D consists of those values such
that the equation/inequality etc. is an admissible expression, thus
it is a subset of the universe.
• The so-called solution set L is the set of solutions, hence it is a
subset of the domain. The solution set can be empty, i.e. it can
happen that L = ∅, in this case we say “The equation/inequality
etc. has no solution”.

2. Linear equation in one variable:

• Solve in R: x − 3 = 4.
Solution: We have U = R, D = R and L = {7}.
• Solve: $\frac{1}{x} = 4$.
Solution: We have U = R, D = R \ {0} and $L = \{\frac{1}{4}\}$.
• Solve in Z: x · 2 = 7.
Solution: We have U = Z, D = Z and L = ∅.
• Solve: 2x + 3 = 4 + x − 1 + x.
Solution: We have U = R, D = R and L = R, as both sides simplify to 2x + 3.
• Such problems can also be posed as a mathematical word problem.
E.g. “There is a number, such that: if you take three times this
number, you get the same result you would get, if you add 2 to
this number.”
This can be understood as: Solve in R: 3x = x + 2.
Solution: We have U = R, D = R and L = {1}.

3. Linear inequality in one variable:

• Solve in R: x − 3 ≥ 4.
Solution: We have U = R, D = R and L = [7, ∞).

[Figure: number line with the interval [7, ∞) marked.]

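• Solve: $\frac{1}{x} < 2$.
Solution: We have U = R, D = R \ {0}.
In order to get the solution, we multiply by x and consider two cases, either x < 0
or x > 0:
In the first case, the inequality sign flips and we solve 1 > 2x; in the second
case, we solve 1 < 2x.
We get two intermediate solution sets $L_1 = (-\infty, 0)$ and $L_2 = (\frac{1}{2}, \infty)$.
The result is $L = L_1 \cup L_2$.

[Figure: number line with the intervals $(-\infty, 0)$ and $(\frac{1}{2}, \infty)$ marked.]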

• Again, as before, such inequalities can be posed as mathematical
word problems.

4. Quadratic equations:

• Solve: $x^2 + 8x - 9 = 0$.
We have U = R, D = R. Then either
– we notice
$$x^2 + 8x - 9 = (x - 1)(x + 9),$$
hence L = {−9, 1},
– or we solve using the so-called p-q-formula
$$x = -\frac{8}{2} \pm \sqrt{\left(\frac{8}{2}\right)^2 - (-9)} = -4 \pm 5,$$
hence L = {−9, 1},
– or we solve using the so-called a-b-c-formula
$$x = \frac{-8 \pm \sqrt{8^2 - 4 \cdot 1 \cdot (-9)}}{2 \cdot 1} = \frac{-8 \pm 10}{2},$$
hence L = {−9, 1}.
The p-q-formula is applicable if the quadratic equation is of the
form
$$x^2 + px + q = 0,$$
and the candidates for the solution are
$$x = -\frac{p}{2} \pm \sqrt{\left(\frac{p}{2}\right)^2 - q}.$$
The a-b-c-formula is applicable if the quadratic equation is of the
form
$$ax^2 + bx + c = 0$$
with a ≠ 0, and the candidates for the solution are
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$

Geometric interpretation:

[Figure: graph of $x^2 + 8x - 9$, crossing the x-axis at x = −9 and x = 1.]

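• Solve: $5x - 2x^2 = x^2 - 2$.
Solution: We have U = R, D = R. First we bring this into the
form $-3x^2 + 5x + 2 = 0$. Then we solve using the a-b-c-formula:
$$x = \frac{-5 \pm \sqrt{25 - 4 \cdot (-3) \cdot 2}}{2 \cdot (-3)} = \frac{-5 \pm \sqrt{49}}{-6} = \frac{-5 \pm 7}{-6},$$
hence $L = \{-\frac{1}{3}, 2\}$.

Geometric interpretation:

[Figure: graphs of $x^2 - 2$ and $5x - 2x^2$; the values $x = -\frac{1}{3}$ and x = 2 are marked on the x-axis.]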

• Depending on the discriminant d, which is the term “under” the
square root, we may consider the following situations:
– Case d > 0: two solutions in R.
– Case d = 0: one solution in R.
– Case d < 0: no solutions in R.
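The three discriminant cases translate directly into a small solver based on the a-b-c-formula. This is a minimal sketch in floating-point arithmetic; the function name `solve_quadratic` is chosen for this illustration only.

```python
from math import sqrt


def solve_quadratic(a, b, c):
    """Real solutions of a*x**2 + b*x + c = 0 (a != 0), sorted ascending."""
    if a == 0:
        raise ValueError("not a quadratic equation: a must be non-zero")
    d = b * b - 4 * a * c  # the discriminant
    if d < 0:
        return []                   # no solutions in R
    if d == 0:
        return [-b / (2 * a)]       # one solution in R
    root = sqrt(d)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])  # two solutions


first_example = solve_quadratic(1, 8, -9)    # x^2 + 8x - 9 = 0
second_example = solve_quadratic(-3, 5, 2)   # -3x^2 + 5x + 2 = 0
```

Because of rounding, results such as −1/3 are only approximate; comparisons should allow a small tolerance.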
5. Some equations can be solved using the binomial formulae
• $(a + b)^2 = a^2 + 2ab + b^2$,
• $(a - b)^2 = a^2 - 2ab + b^2$,
• $(a + b)(a - b) = a^2 - b^2$,
and the fact that a product of real numbers is equal to 0 if and only
if at least one of the factors is equal to 0, i.e. for x, y ∈ R it holds that
x · y = 0 if and only if x = 0 or y = 0 (or both).
Example: if we want to solve $x^2 - 4 = 0$, we solve (x − 2)(x + 2) = 0,
hence either x − 2 = 0 or x + 2 = 0. Thus we get the two solutions x = 2
and x = −2. Using the above mentioned p-q-formula yields the same
result.

6. Solving equations of degree 3 and 4: Again using the fact that a product
of real numbers is equal to 0 if and only if at least one of the factors is equal to
0, we are also able to solve some equations of degree 3 and degree 4.
• Example: Solve $x^3 - x = 0$. We factor out x and solve $x(x^2 - 1) = 0$.
We recognize $x^2 - 1 = (x + 1)(x - 1)$ and solve $x(x + 1)(x - 1) = 0$,
hence the three solutions x = 0, x = 1, x = −1.
• Example: Solve $x^4 - x^2 = 0$. We factor out $x^2$ and solve $x^2(x^2 - 1) = 0$.
We recognize (again) $x^2 - 1 = (x + 1)(x - 1)$ and solve
$x \cdot x \cdot (x + 1)(x - 1) = 0$, hence the three solutions x = 0, x = 1,
x = −1. Here one solution, namely x = 0, “appears two times”.
• Example: Solve $x^4 - 16 = 0$. We recognize $x^4 - 16 = (x^2 + 4)(x^2 - 4)$
and solve $(x^2 + 4)(x^2 - 4) = 0$. Using for example the p-q-formula,
we see that $x^2 + 4 = 0$ has no solutions; we emphasize that this
means there is no real number x such that $x^2 + 4 = 0$. On the
other hand, $x^2 - 4 = 0$ has the solutions x = 2 and x = −2, hence
$x^4 - 16 = 0$ has the solutions x = 2 and x = −2.
7. We are also able to solve the following more complicated looking equation
of degree 4:
$$x^4 - \frac{41}{8} x^2 + \frac{81}{256} = 0.$$
We use the following trick: we substitute $y = x^2$, i.e. we first solve the
following equation of degree 2
$$y^2 - \frac{41}{8} y + \frac{81}{256} = 0,$$
which possibly yields some solutions for y, and then, for each of these
solutions, we solve $x^2 = y$. Using the p-q-formula, we get
$$y = \frac{41}{16} \pm \sqrt{\left(\frac{41}{16}\right)^2 - \frac{81}{256}} = \frac{41}{16} \pm \sqrt{\frac{1681}{256} - \frac{81}{256}} = \frac{41}{16} \pm \sqrt{\frac{1600}{256}} = \frac{41}{16} \pm \frac{40}{16}.$$
Hence we get the intermediate solutions $y = \frac{1}{16}$ and $y = \frac{81}{16}$. Now, as
we set $x^2 = y$, we solve both $x^2 = \frac{1}{16}$ and $x^2 = \frac{81}{16}$. The first
equation yields the solutions $x = \frac{1}{4}$ and $x = -\frac{1}{4}$, the second equation
yields $x = \frac{9}{4}$ and $x = -\frac{9}{4}$. Hence in total, we get 4 solutions.
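The four solutions can be verified exactly by substituting them back into the equation, using rational arithmetic so that no rounding occurs. The names `quartic`, `candidates` and `residuals` below are illustrative, not part of the notes.

```python
from fractions import Fraction


def quartic(x):
    """Left-hand side x^4 - (41/8) x^2 + 81/256 of the equation above."""
    return x ** 4 - Fraction(41, 8) * x ** 2 + Fraction(81, 256)


# The four candidate solutions obtained from y = 1/16 and y = 81/16.
candidates = [Fraction(1, 4), Fraction(-1, 4), Fraction(9, 4), Fraction(-9, 4)]

# Every residual should be exactly zero if the candidates are solutions.
residuals = [quartic(x) for x in candidates]
```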

8. Systems of linear equations in two variables:

• Solve in R × R, i.e. x ∈ R and y ∈ R:

5x − 2y = 1
3x + 3y = 9

We have U = R × R and D = R × R. Transforming the first
equation to
$$x = \frac{1 + 2y}{5},$$
and replacing x in the second equation with this expression, we get
$$3 \cdot \frac{1 + 2y}{5} + 3y = 9,$$
thus y = 2, hence x = 1. We have a unique solution (x, y) = (1, 2),
hence L = {(1, 2)}.
Geometric interpretation: two lines intersecting.

[Figure: the lines 5x − 2y = 1 and 3x + 3y = 9, intersecting at the point (1, 2).]

• Solve:

5x − 2y = 1
10x − 4y = −12

As no universe is explicitly specified, we assume x ∈ R and y ∈ R,
hence we take U = R × R and D = R × R. Transforming the first
equation to
$$x = \frac{1 + 2y}{5},$$
and replacing x in the second equation with this expression, we get
$$10 \cdot \frac{1 + 2y}{5} - 4y = -12,$$
hence
$$2 + 4y - 4y = -12,$$
finally
$$2 = -12.$$
As this statement is a contradiction, we have no solution, thus
L = ∅.
Geometric interpretation: two parallel lines.

[Figure: the parallel lines 10x − 4y = −12 and 5x − 2y = 1.]
• Solve:

5x − 2y = 1
10x − 4y = 2

We have U = R × R and D = R × R. Transforming the first
equation to
$$x = \frac{1 + 2y}{5},$$
and replacing x in the second equation with this expression, we get
$$10 \cdot \frac{1 + 2y}{5} - 4y = 2,$$
hence
$$2 + 4y - 4y = 2,$$
finally
$$2 = 2.$$
As this is a tautology, we have infinitely many solutions, thus
$$L = \left\{ (x, y) \in R \times R \;\middle|\; y = \frac{5}{2} \cdot x - \frac{1}{2} \right\}.$$
Geometric interpretation: two identical lines.

[Figure: the identical lines 5x − 2y = 1 and 10x − 4y = 2.]

• We put emphasis on the fact that students must be able to apply
the underlying solution strategies also to mathematical word
problems.

3 Vectors
1. Using the common notation $R^2 := R \times R$, a two-dimensional vector
$v \in R^2$, v = (x, y), can be interpreted geometrically both as the point
with the coordinates (x, y) and as the arrow from the origin to the point
(x, y). Sometimes vectors are denoted using an arrow, e.g. $\vec{v}$.

[Figure: the vector v = (x, y) drawn as an arrow from the origin to the point (x, y).]

2. The length of a vector $v \in R^2$, usually denoted by ||v|| (or |v|), can be
computed using the Pythagorean theorem:
$$||v|| := \sqrt{x^2 + y^2}.$$
Example: $||(1, 2)|| = \sqrt{1^2 + 2^2} = \sqrt{5}$.

3. Two vectors $v_1, v_2 \in R^2$, with $v_1 = (x_1, y_1)$ and $v_2 = (x_2, y_2)$, can be
added; the resulting vector is $v_1 + v_2 = (x_1 + x_2, y_1 + y_2)$.

[Figure: geometric construction of $v_1 + v_2$ from $v_1$ and $v_2$.]

4. Furthermore, we can subtract one vector from another:
$v_1 - v_2 = (x_1 - x_2, y_1 - y_2)$.

[Figures: geometric construction of $v_1 - v_2$ from $v_1$, $v_2$ and $-v_2$.]

5. Moreover, we can scale vectors, i.e. stretch or shrink them, by multiplying
the vector with a real number. Therefore, in this context, real numbers
are also called scalars, i.e. one might say “We multiply a vector with a
scalar”. Example: if $v \in R^2$ with v = (x, y), then 3 · v = (3x, 3y).

[Figure: a vector $v_1$ and the scaled vector $3 \cdot v_1$.]

Scaled vectors point in the same direction (or, for a negative scalar, in the
opposite direction). Such vectors are often called parallel.

6. Given a vector $v \in R^2$ with v ≠ (0, 0), another vector $v_0 \in R^2$ with
the properties

(a) $||v_0|| = 1$
(b) v and $v_0$ are parallel

can be obtained by $v_0 := \frac{1}{||v||} \cdot v$. This is called normalizing v. Vectors
with length 1 are called unit vectors. Example: If v = (1, 2), then
$v_0 = \frac{1}{\sqrt{5}} \cdot (1, 2) = \left(\frac{1}{\sqrt{5}}, \frac{2}{\sqrt{5}}\right)$.

7. The scalar product $\langle \cdot, \cdot \rangle : R^2 \times R^2 \to R$ is a function that maps two
vectors to a real number. For $v_1, v_2 \in R^2$, with $v_1 = (x_1, y_1)$ and
$v_2 = (x_2, y_2)$, we have
$$\langle v_1, v_2 \rangle := x_1 x_2 + y_1 y_2.$$
Example: $\langle (2, 3), (4, -1) \rangle = 2 \cdot 4 + 3 \cdot (-1) = 5$.

Geometric interpretation: If $v_2$ is a unit vector, then $\langle v_1, v_2 \rangle$ is the
(oriented) length of the projection of $v_1$ onto $v_2$.

• The value of $\langle v_1, v_2 \rangle$ is positive, if the angle between $v_1, v_2$ is acute.
• The value of $\langle v_1, v_2 \rangle$ is negative, if the angle between $v_1, v_2$ is
obtuse.
• Important: The value of $\langle v_1, v_2 \rangle$ is equal to 0, if the angle between
$v_1, v_2$ is 90°.

[Figure: projection of $v_1$ onto $v_2$, with a right angle marked.]

The scalar product is also often called dot product or inner product and
denoted $v_1 . v_2$ or $\vec{v_1} \cdot \vec{v_2}$.
8. We use the fact that two vectors $v_1, v_2 \in R^2$ are perpendicular if and
only if $\langle v_1, v_2 \rangle = 0$ to find, for a given vector $v_1 = (x_1, y_1)$, a
vector $v_2$ such that $v_1$ and $v_2$ are perpendicular. One candidate
is $v_{21} := (-y_1, x_1)$, as $\langle v_1, v_{21} \rangle = x_1(-y_1) + y_1 x_1 = 0$ in this case.
Another candidate is $v_{22} := (y_1, -x_1)$, as $\langle v_1, v_{22} \rangle = x_1 y_1 + y_1(-x_1) = 0$.
Geometric interpretation: The vector $v_{21}$ is $v_1$ rotated by 90° to the left
(counterclockwise) and $v_{22}$ is $v_1$ rotated by 90° to the right (clockwise).

[Figure: the vector $v_1$ together with the perpendicular vectors $v_{21}$ and $v_{22}$.]

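The vector operations introduced so far (length, addition, scaling, normalizing, scalar product, perpendicular vectors) can be sketched with plain tuples. All function names below are chosen for this illustration and are not taken from the notes.

```python
from math import hypot


def length(v):
    """Length ||v|| of a two-dimensional vector v = (x, y)."""
    return hypot(v[0], v[1])


def add(v, w):
    """Component-wise sum v + w."""
    return (v[0] + w[0], v[1] + w[1])


def scale(c, v):
    """Multiply the vector v with the scalar c."""
    return (c * v[0], c * v[1])


def normalize(v):
    """Unit vector parallel to v; v must not be the zero vector."""
    norm = length(v)
    if norm == 0:
        raise ValueError("the zero vector cannot be normalized")
    return scale(1 / norm, v)


def dot(v, w):
    """Scalar product <v, w> = x1*x2 + y1*y2."""
    return v[0] * w[0] + v[1] * w[1]


def perpendicular_left(v):
    """The candidate (-y, x), i.e. v rotated by 90 degrees counterclockwise."""
    return (-v[1], v[0])
```

For example, `dot((2, 3), (4, -1))` evaluates to 5, matching the worked example, and `dot(v, perpendicular_left(v))` is 0 for every v.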
9. We are able to define lines using vectors by collecting all points that
we can reach by adding a scaled version of a given vector to a given point.
Given a point $P \in R^2$ and a vector $r \in R^2$, r ≠ (0, 0), we set

g := {P + λ · r | λ ∈ R}.

The vector r is often called the direction vector of g.

The line g is sometimes denoted using the notation g : X = P + λ · r;
this is sometimes called the parametric form of the line g.

10. Some examples for problems that can be solved using vectors are:

• Given two distinct points $P, Q \in R^2$, determine the parametric
form of the line through P and Q.
• Given a line g through $P \in R^2$, determine the parametric form of
a line through P which is perpendicular to g.
• Given two distinct points $P, Q \in R^2$, determine
$\{X \in R^2 \mid ||X - P|| = ||X - Q||\}$ (perpendicular bisector).
• Given two lines g, h in parametric form, determine if they intersect,
are parallel or describe the same set.

11. The parametric form of a line g : X = P + λ · r can be transformed into
the so-called explicit form g : y = kx + d and vice versa (if r ≠ µ · (0, 1)
for all µ ∈ R, i.e. if g is not vertical):
If $P = (x_P, y_P)$ and $r = (x_r, y_r)$, we can compute $k = \frac{y_r}{x_r}$ and determine
d by solving $y_P = k x_P + d$.
If g is given in explicit form, we can set r := (1, k) and P := (0, d).
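The conversion between the two forms can be sketched as follows. The function names are hypothetical, and the sketch assumes integer coordinates so that `Fraction` gives exact values for k and d.

```python
from fractions import Fraction


def parametric_to_explicit(P, r):
    """Return (k, d) with y = k*x + d for the line X = P + lambda * r.

    Assumes integer coordinates; vertical lines (x_r = 0) have no explicit form.
    """
    x_p, y_p = P
    x_r, y_r = r
    if x_r == 0:
        raise ValueError("a vertical line has no explicit form")
    k = Fraction(y_r, x_r)   # slope k = y_r / x_r
    d = y_p - k * x_p        # from y_P = k * x_P + d
    return k, d


def explicit_to_parametric(k, d):
    """Return (P, r) with P = (0, d) and r = (1, k)."""
    return (0, d), (1, k)
```

For instance, the line through P = (1, 2) with direction vector r = (2, 4) has slope k = 2 and intercept d = 0.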

4 Matrices and determinants


1. A system of linear equations in two variables can also be written (and
solved) in matrix form. A matrix can be understood as a table of
numbers. In order to transform a system of equations with unknowns
x, y into matrix form, we first write the coefficients of x in the first column
of the matrix and the coefficients of y in the second column of the
matrix. E.g. the left hand side of the equations

ax + by = k
cx + dy = ℓ

can be represented by the matrix
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$
In order to represent the system of equations, the right hand side is
added to the matrix. Often a vertical line is used to separate the coefficients
from the values on the right hand side. Continuing the example, this
results in
$$\left(\begin{array}{cc|c} a & b & k \\ c & d & \ell \end{array}\right).$$
Such matrices are often called augmented matrices.
In order to solve the system, we are now allowed to

• swap rows,
• divide/multiply rows by any real number λ ∈ R \ {0},
• add multiples of another row to any row.

Performing (possibly several) such steps, we want to obtain a matrix
of the form
$$\left(\begin{array}{cc|c} 1 & 0 & m \\ 0 & 1 & n \end{array}\right).$$
As this again represents a system of equations, we can read off the
solution x = m, y = n. If we cannot obtain such a matrix using the
above described steps, but we obtain a row of the form
$$\left(\begin{array}{cc|c} 0 & 0 & n \end{array}\right),$$
then either the system of equations has no solutions, or it has infinitely
many solutions. If in this case n ≠ 0, then the system has no solutions;
if n = 0, then depending on the other row, the system has either infinitely
many solutions or no solutions. E.g.
$$\left(\begin{array}{cc|c} 1 & 0 & 3 \\ 0 & 0 & 1 \end{array}\right)$$
has no solutions,
$$\left(\begin{array}{cc|c} 1 & 0 & 3 \\ 0 & 0 & 0 \end{array}\right)$$
has infinitely many solutions and
$$\left(\begin{array}{cc|c} 0 & 0 & 1 \\ 0 & 0 & 0 \end{array}\right)$$
has no solutions.

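(a) Solve in R × R, i.e. x ∈ R and y ∈ R:

5x − 2y = 1
3x + 3y = 9

We have U = R × R and D = R × R.
Transforming the system of equations to matrix form gives
$$\left(\begin{array}{cc|c} 5 & -2 & 1 \\ 3 & 3 & 9 \end{array}\right),$$
which, by dividing the second row by 3, can be transformed to
$$\left(\begin{array}{cc|c} 5 & -2 & 1 \\ 1 & 1 & 3 \end{array}\right).$$
Swapping the first and the second row yields
$$\left(\begin{array}{cc|c} 1 & 1 & 3 \\ 5 & -2 & 1 \end{array}\right),$$
and subtracting 5 times the first row from the second we get
$$\left(\begin{array}{cc|c} 1 & 1 & 3 \\ 0 & -7 & -14 \end{array}\right).$$
We divide the second row by −7, thus we have
$$\left(\begin{array}{cc|c} 1 & 1 & 3 \\ 0 & 1 & 2 \end{array}\right),$$
finally we subtract the second row from the first and get
$$\left(\begin{array}{cc|c} 1 & 0 & 1 \\ 0 & 1 & 2 \end{array}\right).$$
This represents the system of equations

1x + 0y = 1
0x + 1y = 2

Hence we can read off the solution x = 1, y = 2.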
(b) Solve for x ∈ R and y ∈ R:

5x − 2y = 1
10x − 4y = −12

We have U = R × R and D = R × R. Transforming the system of
equations to matrix form gives
$$\left(\begin{array}{cc|c} 5 & -2 & 1 \\ 10 & -4 & -12 \end{array}\right).$$
We subtract 2 times the first row from the second and get
$$\left(\begin{array}{cc|c} 5 & -2 & 1 \\ 0 & 0 & -14 \end{array}\right).$$
This represents the system of equations

5x − 2y = 1
0x + 0y = −14

This is a contradiction, as 0 ≠ −14, hence we conclude there is no
solution, i.e. L = ∅.
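(c) Solve:

5x − 2y = 1
10x − 4y = 2

We have U = R × R and D = R × R. Transforming the system of
equations to matrix form gives
$$\left(\begin{array}{cc|c} 5 & -2 & 1 \\ 10 & -4 & 2 \end{array}\right).$$
We subtract 2 times the first row from the second and get
$$\left(\begin{array}{cc|c} 5 & -2 & 1 \\ 0 & 0 & 0 \end{array}\right).$$
This corresponds to the system of equations

5x − 2y = 1
0x + 0y = 0

The second equation is a tautology, thus
$$L = \left\{ (x, y) \in R \times R \;\middle|\; y = \frac{5}{2} \cdot x - \frac{1}{2} \right\}.$$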

2. For a 2 × 2 matrix
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},$$
the determinant of A is defined as det(A) = ad − bc. We can use
the determinant to determine how many solutions a given system of
equations has: We already know that the left hand side of

ax + by = k
cx + dy = ℓ

can be represented by the matrix
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$
If det(A) ≠ 0, then the system has a unique solution; if on the other
hand det(A) = 0, then the corresponding system has either no solutions
or infinitely many solutions. We already saw that this depends on the
right hand side of the equations.
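The determinant criterion can be turned into a small classifier for 2 × 2 systems. The sketch below is not the row-reduction procedure described in these notes: for the unique case it uses Cramer's rule, a different but equivalent technique, and for det(A) = 0 it checks whether the right hand side is compatible with the rows. The function name `solve_2x2` and the integer-coefficient assumption are choices made for this example.

```python
from fractions import Fraction


def solve_2x2(a, b, k, c, d, l):
    """Classify the system  a*x + b*y = k,  c*x + d*y = l  (integer coefficients).

    Returns ("unique", (x, y)), ("infinite", None) or ("none", None).
    """
    det = a * d - b * c
    if det != 0:
        # Cramer's rule (swapped in here instead of row operations).
        x = Fraction(k * d - b * l, det)
        y = Fraction(a * l - c * k, det)
        return "unique", (x, y)
    if (a, b, c, d) == (0, 0, 0, 0):
        # Both equations read 0 = right hand side.
        consistent = k == 0 and l == 0
    else:
        # The rows are multiples of each other; the right hand side must
        # scale in the same way, i.e. the remaining 2x2 minors must vanish.
        consistent = (a * l - c * k == 0) and (b * l - d * k == 0)
    return ("infinite", None) if consistent else ("none", None)
```

Applied to the three worked examples (a), (b) and (c), this should report a unique solution (1, 2), no solution, and infinitely many solutions respectively.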

5 Functions
1. Let A, B ≠ ∅. We say f : A → B is a function from A to B if and only
if each a ∈ A is assigned exactly one f(a) ∈ B. The mapping rule is
denoted by a ↦ f(a). In this case, A is called domain of definition, or
simply domain, and B is called codomain of f. Let us for example consider
the following function:
$$f : N \to Q, \quad n \mapsto \frac{n}{3}.$$
In this example, we see the domain of f is N, the codomain is Q. The
mapping rule is $n \mapsto \frac{n}{3}$, hence $1 \mapsto \frac{1}{3}$, $2 \mapsto \frac{2}{3}$, $3 \mapsto \frac{3}{3}$, . . . .

2. Often one deals with functions, with the property that their domain
and codomain are both subsets of the real numbers R. Such functions
are often called real functions.

3. Sometimes the function is given implicitly by the function term, e.g.
$f(x) = x^2$. In this case, we assume the domain to be the largest subset
of R such that the function term is well defined. Let us consider for
example $f(x) = x^2$, then we have that the domain of f is R. If for
example $g(x) = \frac{1}{x}$, we have that the domain of g is R \ {0}.

4. For a function f : A → B, the graph of f is the set {(a, f (a)) : a ∈ A}.

5. We need to be able to determine the function term and the domain, and to
roughly plot the graph, of the following important types of real functions:

• constant functions: f(x) = a, for a ∈ R,
• linear functions: f(x) = ax + b, for a, b ∈ R and a ≠ 0,
• quadratic functions: $f(x) = ax^2 + bx + c$, for a, b, c ∈ R and a ≠ 0,
• other polynomial functions (polynomials):
$$f(x) = \sum_{i=0}^{n} a_i x^i = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 x^0,$$
for n ∈ N, $a_n, a_{n-1}, \ldots, a_1, a_0 \in R$ and $a_n \neq 0$,
• power functions: $f(x) = a x^b$, for a ∈ R and b ∈ Z,
• root functions: $f(x) = \sqrt[a]{x^b}$, for a ∈ N, a ≠ 0, and b ∈ Z,
• exponential functions: $f(x) = a \cdot b^x$, for a, b ∈ R and b > 0, b ≠ 1,
• logarithmic functions: $f(x) = \log_a x$, for a ∈ R and a > 0, a ≠ 1,
• sine and cosine function: f(x) = sin(x) and f(x) = cos(x).

6. The degree of a polynomial f is the largest number n such that the
coefficient $a_n$ of $x^n$ is non-zero. A convention is that the degree of the
zero polynomial, f(x) = 0, is equal to −∞.

7. A real number $x_0$ is called zero of a polynomial f, if $f(x_0) = 0$.
Example: let
$$f(x) = x^2 - 1.$$
Then the degree of f is 2 and as f(1) = 0, we have that 1 is a zero of
f. Furthermore, as f(−1) = 0, we have that also −1 is a zero of f.

8. A very important statement, related to the fundamental theorem
of algebra, is that a non-zero polynomial f of degree n has at most n
zeros. That is, the number of real solutions x of the equation f(x) = 0
is at most n. In the previous example, $f(x) = x^2 - 1$, we saw that the
degree of the polynomial and the number of zeros were both equal to
2.

9. Given two real functions f, g, s.t. the codomain of g is a subset of the
domain of f, we can determine the composition of f and g: (f ◦ g)(x) =
f(g(x)). Examples:

• Let f(x) = 2x and $g(x) = x^2 + 1$, then $f(g(x)) = 2x^2 + 2$,
• Let f(x) = sin(x) + cos(x) and $g(x) = 3x^4$, then
$f(g(x)) = \sin(3x^4) + \cos(3x^4)$,
• Let $f(x) = e^x$ and g(x) = ln(x), then f(g(x)) = x.
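Composition can be illustrated with functions as ordinary Python values. The helper name `compose` is chosen for this sketch; the three composed functions mirror the examples above.

```python
from math import cos, exp, log, sin


def compose(f, g):
    """Return the composition f ∘ g, i.e. the function x -> f(g(x))."""
    return lambda x: f(g(x))


# f(x) = 2x and g(x) = x^2 + 1, so (f ∘ g)(x) = 2x^2 + 2.
double_after_square = compose(lambda x: 2 * x, lambda x: x ** 2 + 1)

# f(x) = sin(x) + cos(x) and g(x) = 3x^4.
trig_after_power = compose(lambda x: sin(x) + cos(x), lambda x: 3 * x ** 4)

# f(x) = e^x and g(x) = ln(x); defined for x > 0, where (f ∘ g)(x) = x.
exp_after_log = compose(exp, log)
```

Note that `exp_after_log` is only defined for positive arguments, since ln(x) requires x > 0.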
10. Given a function f of one of the types described above, we are able
to investigate how the graph changes if we perform
the following transformations:
• f(x) ↦ c · f(x),
• f(x) ↦ f(x) + c,
• f(x) ↦ f(c · x),
• f(x) ↦ f(x + c),
for c ∈ R.
11. Given the graph, or a section of the graph, of a function f we are able
to discuss monotonicity and local as well as global extrema. I.e. we
understand the notion of
• (strictly) increasing functions,
• (strictly) decreasing functions,
• local/global minima and maxima of a function.
12. To analyze exponential and logarithmic functions, we sometimes need
to use calculation rules for exponentials and logarithms, such as
$$a^b \cdot a^c = a^{b+c}, \quad \frac{a^b}{a^c} = a^{b-c}, \quad \ldots$$
13. Curve sketching. Given a function f by function term, we can deter-
mine the following properties of f and its graph respectively:
(a) domain and codomain,
(b) zeros of f ,
(c) (symmetry),
(d) extrema (local/global minima/maxima),
(e) poles (e.g. $f(x) = \frac{1}{x}$ has a pole at x = 0),
(f) monotonicity,
(g) (periodicity).

6 Measures of change
Considering a real function f and a, b ∈ R in the domain of f , we imagine
to “replace” a by b and analyze how the corresponding function values f (a)
and f (b) change using the following measures of change: (assuming all the
terms are well defined)

1. absolute change, sometimes denoted by ∆f: ∆f = f(b) − f(a),
2. relative change: $\frac{f(b) - f(a)}{f(a)}$,
3. mean/average rate of change: $\frac{f(b) - f(a)}{b - a}$; if f is a function of x, this can
be denoted as $\frac{\Delta f}{\Delta x}$,
4. factor of change: $c = \frac{f(b)}{f(a)}$, i.e. we determine a number c ∈ R, s.t.
f(b) = c · f(a).
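The four measures of change can be written down as one-line functions. This is a minimal sketch; the function names are illustrative, and the divisions assume f(a) ≠ 0 and a ≠ b respectively, in line with the remark that all terms must be well defined.

```python
def absolute_change(f, a, b):
    """Absolute change f(b) - f(a)."""
    return f(b) - f(a)


def relative_change(f, a, b):
    """Relative change (f(b) - f(a)) / f(a); requires f(a) != 0."""
    return (f(b) - f(a)) / f(a)


def average_rate_of_change(f, a, b):
    """Mean rate of change (f(b) - f(a)) / (b - a); requires a != b."""
    return (f(b) - f(a)) / (b - a)


def factor_of_change(f, a, b):
    """Factor c with f(b) = c * f(a); requires f(a) != 0."""
    return f(b) / f(a)


def square(x):
    """Example function f(x) = x^2."""
    return x * x
```

For f(x) = x², a = 1 and b = 3, the function values change from 1 to 9, so the absolute change is 8, the relative change is 8, the average rate of change is 4 and the factor of change is 9.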

7 Trigonometry and applications in the right triangle
1. The unit circle $\{(x, y) \in R^2 : x^2 + y^2 = 1\}$ is the circle around the
origin with radius 1. Given an angle α ∈ [0, 2π), we determine the
corresponding point p on the circle.

[Figure: unit circle with the angle α measured from the x-axis.]

The sine of α is the y-coordinate of the corresponding point p, the
cosine of α is the x-coordinate of the corresponding point p.

[Figure: unit circle with the right triangle with hypotenuse 1 and legs cos(α) and sin(α).]

By the Pythagorean Theorem we immediately see
$$\sin^2(\alpha) + \cos^2(\alpha) = 1,$$
for all α ∈ [0, 2π). If α ≥ 2π, we subtract 2π as often as necessary,
such that the result is again an angle between 0 (inclusive) and 2π
(exclusive). This corresponds to the notion of “going around the circle”
more than 1 time. If α < 0, we add 2π as often as necessary, such
that the result is again between 0 (inclusive) and 2π (exclusive). This
corresponds to the notion of “going around the circle” in the opposite
direction. Hence the formula above holds in fact for all α ∈ R.

2. Let us consider the circle with radius r ∈ R+ : {(x, y) ∈ R2 : x2 + y 2 =


r2 }, an angle α and again the corresponding point p on the circle with
radius r. If we compare the resulting triangle with the triangle we get
in the unit circle, we see these two triangles are similar, i.e. one of the
triangles is just a scaled version of the other. Hence we get that the
y-coordinate of p is r sin(α) and the x-coordinate of p is r cos(α).

[Figure: unit circle and circle with radius r; the point p for the angle α has the coordinates (r cos(α), r sin(α)).]

3. Following the considerations above, we notice the following: Assume we know p = (x, y), then we are able to compute the sine and cosine value of the corresponding angle α by $\sin(\alpha) = \frac{y}{r}$ and $\cos(\alpha) = \frac{x}{r}$.
4. We are able to use trigonometry in many applications. Assume for
example we are given a parallelogram with side-lengths a, b and an
angle α. We can compute the height ha :

[Figure: parallelogram ABCD with side lengths a and b, the angle α at A and the height $h_a = b \sin(\alpha)$.]

Hence the area of the parallelogram can be determined via the formula
A = ab sin(α).
5. Considering again the unit circle, for an angle α ∈ [0, 2π), we compare
sin(α) to sin(−α) and cos(α) to cos(−α):

[Figure: unit circle with the angles α and −α; axes: x-axis, y-axis.]

We notice: sin(−α) = − sin(α) and cos(−α) = cos(α).


6. Considering again the unit circle, for an angle α ∈ [0, 2π), we compare
sin(α) to sin(α + π) and cos(α) to cos(α + π):

[Figure: unit circle with the angles α and α + π; axes: x-axis, y-axis.]

We notice: sin(α + π) = − sin(α) and cos(α + π) = − cos(α).

7. Another important property is the fact that we can express the cosine
function via the sine function:
$\cos(\alpha) = \sin\left(\alpha + \frac{\pi}{2}\right)$.
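The identities of this section can be checked numerically for a single angle. This is only a sanity check under floating-point arithmetic, not a proof; the helper names `point_on_circle` and `parallelogram_area` and the sample values are assumptions made for this sketch.

```python
import math


def point_on_circle(r, alpha):
    """Point on the circle with radius r that belongs to the angle alpha."""
    return (r * math.cos(alpha), r * math.sin(alpha))


def parallelogram_area(a, b, alpha):
    """Area A = a * b * sin(alpha) of a parallelogram with sides a, b."""
    return a * b * math.sin(alpha)


alpha = 0.7  # an arbitrary angle in radians (assumption)
x, y = point_on_circle(2.0, alpha)

checks = {
    "pythagoras": math.isclose(math.sin(alpha) ** 2 + math.cos(alpha) ** 2, 1.0),
    "on_circle": math.isclose(x ** 2 + y ** 2, 2.0 ** 2),
    "sin(-a) = -sin(a)": math.isclose(math.sin(-alpha), -math.sin(alpha)),
    "cos(-a) = cos(a)": math.isclose(math.cos(-alpha), math.cos(alpha)),
    "sin(a+pi) = -sin(a)": math.isclose(math.sin(alpha + math.pi), -math.sin(alpha)),
    "cos(a+pi) = -cos(a)": math.isclose(math.cos(alpha + math.pi), -math.cos(alpha)),
    "cos(a) = sin(a+pi/2)": math.isclose(math.cos(alpha), math.sin(alpha + math.pi / 2)),
}
print(checks)

# Parallelogram with a = 3, b = 2 and a right angle: a rectangle with area 6
area = parallelogram_area(3.0, 2.0, math.pi / 2)
print(area)
```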

8 Basic concepts of differentiation and integration
1. For b − a ≠ 0, we already considered the average rate of change $\frac{f(b) - f(a)}{b - a}$.
This expression is also called difference quotient. The difference quo-
tient is the slope of the line (secant) through the points A = (a, f (a))
and B = (b, f (b)).

[Figures: graph of f(x) with the points A = (a, f(a)) and B = (b, f(b)); the secant through both points has the slope $\frac{f(b) - f(a)}{b - a}$.]

2. If we now consider a sequence b1, b2, b3, . . . with bn ≠ a and bn → a, i.e. $\lim_{n \to \infty} b_n = a$, we imagine that the sequence of difference quotients

$\frac{f(b_n) - f(a)}{b_n - a}$

describes more and more accurately the slope of the function f at a. The limit of the difference quotients (if it exists)

$\lim_{n \to \infty} \frac{f(b_n) - f(a)}{b_n - a}$

is called the differential quotient of f at a, denoted by f′(a).
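To see the difference quotients approach a limit, one can evaluate them along a concrete sequence. The sketch below assumes f(x) = x², a = 3 and the sequence bn = a + 1/n; these choices are illustrative and not taken from the text.

```python
def difference_quotient(f, a, b):
    """Slope of the secant through (a, f(a)) and (b, f(b)); requires a != b."""
    return (f(b) - f(a)) / (b - a)


def f(x):
    return x ** 2


a = 3.0
# b_n = a + 1/n tends to a as n grows
quotients = [difference_quotient(f, a, a + 1.0 / n) for n in (1, 10, 100, 1000)]
print(quotients)
```

For this f the difference quotient equals 2a + (b − a), so the printed values approach f′(3) = 6.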

3. Geometrically, the sequence of secants approaches its limit, the tangent to f at a.

[Figure: graph of f(x) with the tangent at (a, f(a)); the slope of the tangent is f′(a).]

We notice, f ′ (a) can be interpreted as the slope of f at a. If f ′ (a) > 0,


the function f is increasing near a, if f ′ (a) < 0, then f is decreasing
near a.

4. The function x ↦ f′(x), i.e. each value x gets mapped to the value of
the differential quotient of f at x, is called the derivative of f .

5. Some calculation rules for derivatives:

(a) For constant functions f(x) = a, a ∈ R, it holds that f′(x) = 0. Constant functions are precisely those functions where the slope is 0 everywhere.
(b) For functions $f(x) = a x^b$, with a ∈ R and b ∈ Z, it holds that $f'(x) = b \cdot a x^{b-1}$. This is sometimes denoted as $(a x^b)' = b a x^{b-1}$.
(c) For functions $f(x) = a \cdot e^x$, a ∈ R, it holds that $f'(x) = a \cdot e^x$. This is sometimes denoted as $(a e^x)' = a e^x$.
(d) The derivative of sin(x) is cos(x) and the derivative of cos(x) is
− sin(x). This is sometimes denoted as

sin′ (x) = (sin(x))′ = sin(x)′ = cos(x)

and
cos′ (x) = (cos(x))′ = cos(x)′ = − sin(x).
(e) If f′(x) is the derivative of f(x) and g′(x) is the derivative of g(x), then f′(x) + g′(x) is the derivative of f(x) + g(x). This rule is sometimes called the sum rule. Hence if e.g.

$h(x) = \sum_{i=0}^{n} a_i x^i$,

then

$h'(x) = \sum_{i=1}^{n} i a_i x^{i-1}$.

(f) If f′(x) is the derivative of f(x) and g′(x) is the derivative of g(x), then f′(x) · g(x) + f(x) · g′(x) is the derivative of f(x) · g(x). This rule is sometimes called the product rule. Some authors refer to this rule as the Leibniz rule (for differentiation). E.g.

$(\sin(x)\cos(x))' = \sin(x)'\cos(x) + \sin(x)\cos(x)' = \cos^2(x) - \sin^2(x) \; (= \cos(2x))$.

Let a ∈ R be constant, then another useful rule, which follows from the product rule, is (a · f(x))′ = a′ · f(x) + a · f′(x) = 0 · f(x) + a · f′(x) = a · f′(x).
(g) If f ′ (x) is the derivative of f (x) and g ′ (x) is the derivative of
g(x), then f ′ (g(x)) · g ′ (x) is the derivative of f (g(x)). This rule is
sometimes called the chain rule.
Using $a b^x = a e^{\ln(b) x}$ and the chain rule, we are able to compute the derivative of $h(x) = a b^x$. We set $f(x) = e^x$ and g(x) = ln(b) · x. Hence $f'(x) = e^x$ and g′(x) = ln(b). Thus

h(x) = a · f(g(x)).

By the product rule and by the chain rule we get

$h'(x) = a \cdot f'(g(x)) \cdot g'(x) = a \cdot e^{\ln(b) x} \cdot \ln(b) = a \cdot \ln(b) \cdot b^x$.
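The result h′(x) = a · ln(b) · bˣ can be compared with a numerical slope. The sketch below uses a symmetric difference quotient, which is an assumption of this example (the text only uses one-sided quotients), and the values a = 3, b = 2, x0 = 1.5 are arbitrary.

```python
import math


def numerical_derivative(h, x, eps=1e-6):
    """Symmetric difference quotient as an approximation of h'(x)."""
    return (h(x + eps) - h(x - eps)) / (2 * eps)


a, b = 3.0, 2.0  # arbitrary constants with b > 0 (assumption)


def h(x):
    return a * b ** x


def h_prime(x):
    # Result of the chain rule computation: a * ln(b) * b^x
    return a * math.log(b) * b ** x


x0 = 1.5
approx = numerical_derivative(h, x0)
exact = h_prime(x0)
print(approx, exact)
```

Both numbers should agree up to a small numerical error.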

6. Higher derivatives: If g(x) = f ′ (x) is the derivative of f (x) and g ′ (x)


is the derivative of g(x), then we have

f ′′ (x) = (f ′ (x))′ = (g(x))′ = g ′ (x),

i.e. g ′ (x) is the second derivative of f (x).

7. A very important property is that if f has a local extremum at a (and f′(a) exists), then f′(a) = 0. I.e. given a function f(x), solving f′(x) = 0 yields candidates for local extrema.

8. Examples for applications:

• In physics:
– distance-time diagrams, if s(t) is distance travelled by time t,
then v(t) = s′ (t) is speed at time t,
– speed-time diagrams, if v(t) is speed at time t, then a(t) =
v ′ (t) is acceleration at time t.
Remark 1: it is very common to write ṡ(t) instead of s′ (t) and
v̇(t) instead of v ′ (t), i.e. to replace the usual symbol “ ′ ” by a “ ·
” if the variable denotes time.
Remark 2: the letter v for speed is used because often one is not
only interested in the time rate at which an object is moving, but
also the direction. Velocity is the rate and direction of an object’s
movement.
• In economics: if f (x) is a cost/benefit function, depending on the
quantity x, we can ask for optimal values.

9. Given a function f (x), we computed g(x), s.t. f ′ (x) = g(x) and we
called g(x) the derivative of f (x). On the other hand if we consider a
function g(x), and find a function f (x) with f ′ (x) = g(x), then we call
f (x) an antiderivative of g(x). Most authors emphasize the property of
being an antiderivative by using capital letters, e.g. an antiderivative
of f (x) is denoted by F (x). Examples:

• As v(t) = ṡ(t), we have that v(t) is the derivative of s(t), hence


s(t) is an antiderivative of v(t).
• As a(t) = v̇(t), we have that a(t) is the derivative of v(t), hence
v(t) is an antiderivative of a(t).

10. Let c ∈ R be any number, if F (x) is an antiderivative of f (x), then


(F (x) + c)′ = F ′ (x) + c′ = f (x) + 0 = f (x), hence also the function
F (x) + c is an antiderivative of f (x). (We used the sum rule and the
rule that tells us how to differentiate constant functions here.)

11. When looking at a velocity-time diagram, given a speed function v(t), one might not only be interested in the acceleration at a given point of time t, but also in the average speed between two points in time t1 and t2, with t1 < t2. We use knowledge about derivatives and antiderivatives:

• Speed is the derivative of distance, hence, for a given time t, v(t)


is the current rate of change of s(t) at time t.
• We already know how to compute the average rate of change of distance in an interval (t1, t2) with t1 < t2, namely $\frac{s(t_2) - s(t_1)}{t_2 - t_1}$.

Thus we get

$v_{avg} = \frac{s(t_2) - s(t_1)}{t_2 - t_1}$.

We can generalize this: given a function f (x) with antiderivative F (x)
and an interval (a, b) ⊂ R, then the average function value favg in the
non-empty interval (a, b) can be computed as

$f_{avg} = \frac{F(b) - F(a)}{b - a}$.

12. If F (x) and G(x) are both antiderivatives of f (x), we immediately see
that it makes no difference if we use F (x) or G(x) to calculate the
average function value favg . We already saw if F (x) and G(x) are
antiderivatives, then there is some c ∈ R, s.t. G(x) = F (x) + c. Hence

$\frac{G(b) - G(a)}{b - a} = \frac{(F(b) + c) - (F(a) + c)}{b - a} = \frac{F(b) - F(a) + c - c}{b - a} = \frac{F(b) - F(a)}{b - a}$.

In essence: If F(x) and G(x) are any antiderivatives of a given function f(x), then F(b) − F(a) = G(b) − G(a) for any a, b ∈ R. Of course, the value of favg still depends on the function f(x) and the interval (a, b) we consider. To emphasize this property of independence of the actual antiderivative, we introduce the notion of the integral of f(x) from a to b:

$F(b) - F(a) = \int_a^b f(x)\,dx$. (1)

Formula (1) is one side of the coin of the so-called fundamental the-
orem of calculus, i.e. it states that we can compute the integral of
f (x) by plugging in the upper and the lower bound into an antideriva-
tive and subtracting the results.
Using this notation, the average function value of f(x) between a < b can be expressed as

$f_{avg} = \frac{1}{b - a} \int_a^b f(x)\,dx$. (2)

13. In many applications, we will be just given a function f (x), i.e. we


will not always know an antiderivative F (x). But we still might be
interested in calculating average function values. In fact, we do not
need to know an explicit antiderivative, we saw that we just need to be
able to compute the integral of f (x). From formula (2), we get
$\int_a^b f(x)\,dx = f_{avg} \cdot (b - a)$.

As for a given interval (a, b) and a given function f (x), we can interpret
the term (b − a) as length and favg as height, the integral of f (x) is the
(possibly negative) area of the rectangle with side length b − a and the
height favg .

[Figures: graph of f(x) on (a, b); the rectangle over (a, b) with the height $f_{avg} = \frac{1}{b - a} \int_a^b f(x)\,dx$.]

We now consider splitting the interval (a, b) into smaller and smaller sub-intervals, all of equal length. In each of those sub-intervals, we assume we are able to compute the average function value, and we multiply it by the sub-interval length. favg can then be computed by taking the average of those sub-averages.

[Figures: four panels showing the graph of f(x) on (a, b), with the interval split into more and more sub-intervals; the rectangles with the sub-interval averages as heights are marked in red.]

The total of the red areas is the same for all four pictures above, it is equal to

$(b - a) \cdot f_{avg} = \int_a^b f(x)\,dx$.

We also notice that by considering smaller and smaller sub-intervals, the red area more and more accurately describes the (signed) area between the graph of f(x) and the x-axis. We conclude: the integral of f(x) from a to b is the signed area between the graph of f(x) and the x-axis in the interval (a, b).

[Figure: graph of f(x) with the signed area $\int_a^b f(x)\,dx$ between the graph and the x-axis over (a, b).]

14. In order to approximately calculate the integral of f (x) from a to b, we


proceed as follows:

• We fix n ∈ N, with n ≥ 1.
• Similarly as before, we split the interval (a, b) into n parts. Each part has length $h = \frac{b - a}{n}$.
• The left border of the i-th interval, denoted by xi, is given by xi = a + (i − 1) · h, the right border is given by a + i · h = xi + h, for i = 1, 2, . . . , n. Note that a = x1 and b = xn + h.
• In each interval (xi, xi + h), we can find values xi:min and xi:max, such that f(xi:min) is a minimum and f(xi:max) is a maximum on (xi, xi + h).
• We compute

$\sum_{i=1}^{n} f(x_{i:min}) \cdot h$ and $\sum_{i=1}^{n} f(x_{i:max}) \cdot h$,

the lower sum and upper sum of f(x) in the interval (a, b).

Example with n = 8, a = −4, b = 4, hence h = 1:

[Figures: graph of f(x) on (a, b); the interval split at x1, x2, . . . , x8; the values f(xi:min) and f(xi:max) on each sub-interval; the resulting lower sum and upper sum.]

We observe that always

$\sum_{i=1}^{n} f(x_{i:min}) \cdot h \le \int_a^b f(x)\,dx \le \sum_{i=1}^{n} f(x_{i:max}) \cdot h$.

As n, the number of sub-intervals, gets larger, the lower sum gets larger and the upper sum gets smaller. Hence

$\lim_{n \to \infty} \sum_{i=1}^{n} f(x_{i:min}) \cdot h = \int_a^b f(x)\,dx = \lim_{n \to \infty} \sum_{i=1}^{n} f(x_{i:max}) \cdot h$.
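The procedure of lower and upper sums can be sketched in a few lines. As an assumption of this sketch, the minimum and maximum on each sub-interval are only approximated by sampling the function at finitely many points (exact for monotone functions, since the end points are included); the test function f(x) = x² on (0, 3) with integral 9 is also an illustrative choice.

```python
def lower_and_upper_sum(f, a, b, n, samples=50):
    """Approximate lower and upper sum of f on (a, b) with n sub-intervals.

    The extreme values on each sub-interval are estimated by sampling
    samples + 1 equally spaced points, including both end points.
    """
    h = (b - a) / n
    lower = 0.0
    upper = 0.0
    for i in range(1, n + 1):
        x_i = a + (i - 1) * h  # left border of the i-th sub-interval
        values = [f(x_i + k * h / samples) for k in range(samples + 1)]
        lower += min(values) * h
        upper += max(values) * h
    return lower, upper


def f(x):
    return x ** 2


for n in (4, 16, 64, 256):
    print(n, lower_and_upper_sum(f, 0.0, 3.0, n))
```

With growing n the two printed values enclose the integral 9 more and more tightly.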

15. For constant and linear functions we are already able to compute the
integral:

• Constant functions f (x) = c for c ∈ R. We have


$\int_a^b f(x)\,dx = \int_a^b c\,dx = (b - a) \cdot c$. (∗)

[Figure: graph of the constant function f(x) = c; the rectangle over (a, b) has the area $\int_a^b f(x)\,dx = (b - a) \cdot c$.]

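• Linear functions f(x) = kx + d for k, d ∈ R. We have

$\int_a^b f(x)\,dx = \left(k \cdot \frac{a + b}{2} + d\right) \cdot (b - a)$, (∗∗)

i.e. the average function value is the function value at the midpoint of the interval, $f\left(\frac{a + b}{2}\right)$. For a = 0 the midpoint is $\frac{b}{2}$.

[Figure: graph of f(x) = kx + d on (a, b) with the function value at the midpoint of the interval marked.]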

16. Again, we consider constant functions f (t) = c and linear functions


f (t) = kt + d. If we fix the lower bound of the integral a = 0 and
consider the variable upper bound b = x, we get for each value of
x ∈ R the area of the corresponding integral
$\int_0^x f(t)\,dt$,

which is again a real number. This means, we can define a function

$g(x) = \int_0^x f(t)\,dt$.

We emphasize, only the upper bound of the integral is variable. Ex-


amples:

• Let f(t) = 3. Then

o Using (∗) we get $g(1) = \int_0^1 3\,dt = (1 - 0) \cdot 3 = 3$.
o Using (∗) we get $g(3) = \int_0^3 3\,dt = (3 - 0) \cdot 3 = 9$.
o Using (∗) we get $g(5) = \int_0^5 3\,dt = (5 - 0) \cdot 3 = 15$.
o In general: $g(x) = \int_0^x 3\,dt = 3x$.

[Figures: three panels showing f(t) = 3 with the area over (0, 1), (0, 3) and (0, 5); axes: t-axis, y-axis.]

• Let f(t) = −0.5t + 3. Then

o Using (∗∗) we get $g(1) = \int_0^1 -0.5t + 3\,dt = \left(-\frac{1}{2} \cdot \frac{0 + 1}{2} + 3\right) \cdot (1 - 0) = 2.75$.
o Using (∗∗) we get $g(3) = \int_0^3 -0.5t + 3\,dt = \left(-\frac{1}{2} \cdot \frac{0 + 3}{2} + 3\right) \cdot (3 - 0) = 6.75$.
o Using (∗∗) we get $g(5) = \int_0^5 -0.5t + 3\,dt = \left(-\frac{1}{2} \cdot \frac{0 + 5}{2} + 3\right) \cdot (5 - 0) = 8.75$.
o In general: $g(x) = \int_0^x -0.5t + 3\,dt = \left(-\frac{1}{2} \cdot \frac{x}{2} + 3\right) \cdot x = -\frac{1}{2} \cdot \frac{x^2}{2} + 3x$.

[Figures: three panels showing $f(t) = -\frac{1}{2}t + 3$ with the area over (0, 1), (0, 3) and (0, 5); axes: t-axis, y-axis.]
17. We already know that if we do not only know f(t), but also an antiderivative F(t), we are able to compute the value of

$g(x) = \int_0^x f(t)\,dt$

by

g(x) = F(x) − F(0),

where F(0) ∈ R is a constant. An interesting fact is that the derivative of g(x) is given by the sum rule and the rule for differentiating constants as

g′(x) = F′(x) − (F(0))′ = f(x) − 0 = f(x).

In other words, the functions g(x) and f(t) use different variables, but g(x) is an antiderivative of f(x). We state this important property, which is the other side of the coin of the fundamental theorem of calculus. Given f(x), we can compute an antiderivative F(x) by considering the function

$F(x) = \int_0^x f(t)\,dt$.

18. Some calculation rules for antiderivatives, which can be justified by


comparing the corresponding rules for derivatives. Again, we emphasize
an antiderivative is only unique up to an additive constant c ∈ R.
(a) For functions $f(x) = a x^b$, with a ∈ R and b ∈ N0, it holds that an antiderivative is given by $F(x) = \frac{a x^{b+1}}{b + 1}$.
(b) For functions $f(x) = a \cdot e^x$, a ∈ R, it holds that an antiderivative is given by $F(x) = a \cdot e^x$.
(c) An antiderivative of sin(x) is − cos(x) and an antiderivative of
cos(x) is sin(x).
(d) If F (x) is an antiderivative of f (x) and G(x) is an antiderivative
of g(x), then F (x) + G(x) is an antiderivative of f (x) + g(x). This
rule is also sometimes called the sum rule.
(e) If f(x) is an antiderivative of f′(x) and g′(x) is the derivative of g(x), then

$f(x) \cdot g(x) - f(0) \cdot g(0) - \int_0^x f(t) \cdot g'(t)\,dt$

is an antiderivative of f′(x) · g(x). This rule is sometimes called partial integration or integration by parts. This rule is also often denoted as

$\int f' g = f g - \int f g'$ or $\int f'(x) g(x)\,dx = f(x) g(x) - \int f(x) g'(x)\,dx$.

Here the notion of an indefinite integral is used, i.e. the symbol $\int f$ without lower and upper bound indicates we are not interested in the area between f and the x-axis, but in an antiderivative of f. Some authors refer to integrals with lower and upper bound as definite integrals. Example: Let $f(x) = f'(x) = e^x$ and g(x) = x. Then g′(x) = 1.

$\int x e^x\,dx = \int e^x \cdot x\,dx = e^x \cdot x - e^0 \cdot 0 - \int e^x \cdot 1\,dx = x e^x - e^x$.

19. Some authors emphasize the dependence on a constant c ∈ R of the


antiderivative by writing
$\int f(x)\,dx = F(x) + c$,

for example

$\int x^2 + 2x\,dx = \frac{x^3}{3} + x^2 + c$

or

$\int \pi \cos(x) + 2e^x\,dx = \pi \sin(x) + 2e^x + c$.

[Figure: graphs of f(x) = cos(x) and of the antiderivatives F1(x) = sin(x), F2(x) = sin(x) + 2 and F3(x) = sin(x) + 4.]

20. If we are given a function f (x) and an additional condition, such as


F (0) = 1 (in general F (x0 ) = y0 ), then we can fix the constant c ∈ R
and thus find the unique antiderivative satisfying the condition. Ex-
amples:

• Find an antiderivative F(x) of the function f(x) = 4x + 1 with F(0) = 1. We compute

$\int f(x)\,dx = \int 4x + 1\,dx = 4 \cdot \frac{x^2}{2} + x + c$,

and solve F(0) = 1. As $2 \cdot 0^2 + 0 + c = 1$, we get c = 1. Hence, we get $F(x) = 2x^2 + x + 1$.
• Find an antiderivative F(x) of the function f(x) = cos(x) with F(π) = 0. We compute

$\int f(x)\,dx = \int \cos(x)\,dx = \sin(x) + c$,

and solve F(π) = sin(π) + c = 0. As sin(π) = 0, we get c = 0. As a result we find F(x) = sin(x).

9 Introduction to probability theory and statistics
1. In statistics, we often consider a (statistical) population from which a
sample is drawn. The sample is always a subset of the population.
Example:

• From all the students in a course (=population), those with a


matriculation number that contains 9 are drawn (=sample).

{s : s is stud. with mat. nr. that contains 9} ⊆ {s : s is stud. in this course}.

• From the bicycles parked on campus (=population), those which


are older than 2 years are drawn (=sample).

{b : b is older than 2 years} ⊆ {b : b is a bicycle that is parked on campus}.

• Those electrical devices in the lab, which are broken.

{e : e is a broken electrical device} ⊆ {d : d is a device in the lab}.

2. Sometimes we want to consider the entire population, in this case the


sample is just the entire population.

3. Given a sample, one or more features are considered. We might consider

• qualitative features, such as


– hair color,
– profession,
– manufacturer,
– etc.;
• and quantitative features, such as
– size,
– temperature,
– income,
– etc.

4. Data representation.

(a) In most cases, data is first collected using unsorted lists. In order
to more easily process data, this list is then sorted. The frequency
of a feature can be specified as
• absolute frequency: number of occurrences in the sample,
• relative frequency: number of occurrences in the sample divided by sample size. This value is often multiplied by 100, to represent a percentage.
Example: Consider the following sample of students who own a pet:
Student Pet
Ann Cat
Bob Cat
Charly Dog
Diane Cat
Eve Dog
Fatima Dog
Georg Dog
Holger Dog
Irene Cat
Jules Dog
The size of the sample is 10, the absolute frequency of dogs (cats) is 6 (4), the relative frequency of dogs (cats) is $\frac{6}{10}$ ($\frac{4}{10}$). The percentage of dog owners in the sample is 60%, the percentage of cat owners is 40%.
(b) The frequencies can be entered in a table to visualize data.
Feature   absolute frequency   relative frequency   percentage
Cats      4                    4/10                 40%
Dogs      6                    6/10                 60%
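The frequency table can be produced directly from the unsorted list. The following sketch uses Python's standard library; the variable names are chosen for this example only.

```python
from collections import Counter
from fractions import Fraction

# Pets of Ann, Bob, Charly, Diane, Eve, Fatima, Georg, Holger, Irene, Jules
pets = ["Cat", "Cat", "Dog", "Cat", "Dog", "Dog", "Dog", "Dog", "Cat", "Dog"]

n = len(pets)                 # sample size
absolute = Counter(pets)      # absolute frequencies
relative = {pet: Fraction(count, n) for pet, count in absolute.items()}
percentage = {pet: float(100 * freq) for pet, freq in relative.items()}

for pet in sorted(absolute):
    print(pet, absolute[pet], relative[pet], f"{percentage[pet]}%")
```

Note that `Fraction` reduces 4/10 and 6/10 to 2/5 and 3/5.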
(c) Let us consider the following sample (favourite subjects of stu-
dents):
Subject Mathematics Physics Chemistry History
Students 79 70 30 19
We now use the following diagrams as a graphical representation
of the sample data:

• a pie chart.

[Figure: pie chart with n = 198: Mathematics 40%, Physics 35%, Chemistry 15%, History 10%.]

• a bar chart with absolute numbers or with percentages.

[Figure: bar chart with absolute numbers: History 19, Chemistry 30, Physics 70, Mathematics 79.]

[Figure: bar chart with percentages: History 10%, Chemistry 15%, Physics 35%, Mathematics 40%.]


• Other statistical graphs: histograms, scatter plots, etc.

(d) The following statistical measures are often used to explore prop-
erties of (large amounts of) data. We will by far not present an
exhaustive list, but rather focus on the most important ones. Tak-
ing a sample, we consider a certain feature and get a sorted list of
values x1 ≤ x2 ≤ · · · ≤ xn .
• The (arithmetic) mean or mean value is defined as

$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.

Sometimes there are only m different values x1, . . . , xm, i.e. it might happen that some numbers occur more often. If the list is given by the distinct and sorted values x1 < x2 < · · · < xm and the corresponding frequencies $h_1, h_2, \dots, h_m \in \mathbb{N} \setminus \{0\}$, we are able to compute the total number of results n via

$n = \sum_{i=1}^{m} h_i$

and the mean as

$\bar{x} = \frac{1}{n} \sum_{i=1}^{m} h_i x_i$.

This is also often called weighted mean. If the sample is the whole population, many authors use the symbol µ instead of $\bar{x}$.
• The geometric mean is defined as

$\bar{x}_{geom} = \left(\prod_{i=1}^{n} x_i\right)^{\frac{1}{n}}$,

or using the notation from above

$\bar{x}_{geom} = \left(\prod_{i=1}^{m} x_i^{h_i}\right)^{\frac{1}{n}}$.


• A disadvantage of the arithmetic mean is that it is very sensitive to outliers, i.e. a small number of values, which are either very small or very large, when compared with the others, may change the result by a lot. A more robust measure is the so-called median, which is defined for odd n as

$\tilde{x} = x_{\frac{n+1}{2}}$,

and for even n as the arithmetic mean of $x_{\frac{n}{2}}$ and $x_{\frac{n}{2}+1}$,

$\tilde{x} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2}$,

thus it splits the sorted list of values in two halves: the values on the left are less than or equal to the median, the values on the right are greater than or equal to the median.
• The mode(s) or modal value(s) is the (are the) value(s) that occur most often. If, as above, h1, h2, . . . , hm are the frequencies of the distinct values x1, . . . , xm, then we consider all hj, s.t. hj ≥ hi for all i = 1, . . . , m. The mode(s) is (are) then the value(s) xj.
• Often some crucial information is lost, when only considering
the mean. Assume for example the following two lists of data:
1, 1, 9, 9 and 5, 5, 5, 5.
The arithmetic mean of both lists is 5, but the values in the
first list are obviously more scattered. The variance measures this spread: it is the sum of the squared differences of the values xi from the mean value, divided by n − 1 or by n. If the sample is not the population, this value can be determined using

$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$.

If the sample is the entire population, then the variance is determined by

$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$.

The reason for dividing by n − 1 in the first formula and n in the second formula is that if the sample is not the population, then $\bar{x}$ is just an estimate of, hence might (and in most real cases will) differ from, the true mean value µ. Thus in most cases, if we only consider samples which are not the entire population, dividing by n − 1 gives a more accurate estimate of the true variance. If of course the sample is the entire population, we do not need to estimate the variance, we can calculate it directly, hence we divide by n. The reason for writing s² and σ² is that usually the values xi and also $\bar{x}$, µ represent some quantities in a certain unit, but this unit is squared during calculation. E.g. if the xi represent meters, then the variance represents square meters. In order to “convert the variance back” to the original unit, the so-called (standard) deviation is used, which is just $s = \sqrt{s^2}$ and $\sigma = \sqrt{\sigma^2}$.


• Example:
Player  A  B  C  D  E  F  G
Goals   8  4  4  1  7  4  7
Sorted list:
x1  x2  x3  x4  x5  x6  x7
1   4   4   4   7   7   8
Arithmetic mean:
$\bar{x} = \frac{1}{7}(1 + 4 + 4 + 4 + 7 + 7 + 8) = \frac{35}{7} = 5$.
Geometric mean:
$\bar{x}_{geom} = \sqrt[7]{1 \cdot 4 \cdot 4 \cdot 4 \cdot 7 \cdot 7 \cdot 8} = \sqrt[7]{25088} \approx 4.251$.
Median: As 7 is odd, we calculate $\frac{7 + 1}{2} = 4$ and as x4 = 4, we get $\tilde{x} = 4$.
Mode: 4 occurs most often, hence the mode is 4.
Variance and standard deviation: We calculate the result of

$\frac{(1 - 5)^2 + (4 - 5)^2 + (4 - 5)^2 + (4 - 5)^2 + (7 - 5)^2 + (7 - 5)^2 + (8 - 5)^2}{7 - 1}$,

which is

$\frac{16 + 1 + 1 + 1 + 4 + 4 + 9}{6} = 6$.

Hence $s^2 = 6$, $s = \sqrt{6} \approx 2.449$.

• Example:
           x1   x2  x3  x4  x5
Value      -10  3   10  13  15
Frequency  5    3   7   2   3
This means in a complete sorted list, the first 5 values are (−10), the next 3 are 3, and so on.
Arithmetic mean:
$\bar{x} = \frac{1}{20}(5 \cdot (-10) + 3 \cdot 3 + 7 \cdot 10 + 2 \cdot 13 + 3 \cdot 15) = \frac{100}{20} = 5$.
Geometric mean: (not applicable)
Median: As 20 is even, we calculate $\frac{20}{2} = 10$ and $\frac{20}{2} + 1 = 11$, hence we take the 10th and 11th value of the complete list and calculate their mean. As both these values are equal to 10, we get $\tilde{x} = 10$.
Mode: h3 = 7 is the largest frequency, i.e. x3 = 10 occurs most often, hence the mode is 10.
Variance and standard deviation:
$s^2 = \frac{1740}{19} \approx 91.579$.
Hence s ≈ 9.57.

Remark: In both examples we assumed the sample is not the


entire population.
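Both examples can be reproduced with Python's `statistics` module (the function `geometric_mean` requires Python 3.8 or newer). This is a sketch for checking the hand calculations; `variance` and `stdev` divide by n − 1, matching the assumption that the sample is not the entire population.

```python
import statistics

# First example: goals of the players A to G
goals = [8, 4, 4, 1, 7, 4, 7]

# Second example: expand the value/frequency table into a complete list
table = {-10: 5, 3: 3, 10: 7, 13: 2, 15: 3}
values = [value for value, freq in table.items() for _ in range(freq)]

summary_goals = {
    "mean": statistics.mean(goals),
    "geometric_mean": statistics.geometric_mean(goals),
    "median": statistics.median(goals),
    "mode": statistics.mode(goals),
    "variance": statistics.variance(goals),  # divides by n - 1
    "stdev": statistics.stdev(goals),
}

summary_values = {
    "n": len(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
    "variance": statistics.variance(values),
    "stdev": statistics.stdev(values),
}

print(summary_goals)
print(summary_values)
```

The geometric mean is not computed for the second list, since it contains negative values. For the case where the sample is the entire population, the module also offers `pvariance` and `pstdev`, which divide by n.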

5. Rolling dice.

• Assume we roll a blue die 12, 1200 and 120000 times. From these
samples, we consider the feature “number of dots on the side that
is facing upwards”. The results may be
Result(12) 1 2 3 4 5 6
Frequency 1 3 1 2 5 0
Percentage 8.3% 25% 8.3% 16.7% 41.7% 0%

Result(1200) 1 2 3 4 5 6
Frequency 178 207 200 210 194 211
Percentage 14.8% 17.25% 16.7% 17.5% 16.2% 17.6%

(Percentages are rounded. How is it possible that they do not sum
up to 100% each time?)

Result(120000) 1 2 3 4 5 6
Frequency 19915 20034 19978 20148 19908 20017
Percentage 16.6% 16.7% 16.6% 16.8% 16.6% 16.7%
• Assume we roll a green die 12, 1200 and 120000 times. Again,
we consider the feature “number of dots on the side that is facing
upwards”. The results may be
Result(12) 1 2 3 4 5 6
Frequency 0 0 8 2 2 0
Percentage 0% 0% 66.7% 16.7% 16.7% 0%

Result(1200) 1 2 3 4 5 6
Frequency 75 49 887 62 66 61
Percentage 6.25% 4.1% 73.9% 5.2% 5.5% 5.1%

Result(120000) 1 2 3 4 5 6
Frequency 5920 5907 90478 5846 5941 5908
Percentage 4.9% 4.9% 75.4% 4.9% 5.0% 4.9%

Again, we are able to compute the statistical measures discussed above.


Moreover, we notice that as the number of die rolls increases, the percentages of the different outcomes of the blue die seem to tend to the
same number. For the green die, this is not the case. What might be
the cause of the two completely different outcomes?

6. A random experiment is an experiment with multiple possible out-


comes, for which it is not predictable beforehand which one will be
the result of a particular execution of the experiment. E.g. coin flip or
die/dice roll.

7. The sample space is a set of outcomes. It is usually denoted by Ω. We


will consider Ω to be either finite or infinite, but such that we can label
the elements using natural numbers. We call such sets enumerable. For
example

• Coin flip: Ω = {Head, Tail}.

• Die roll: Ω = {1, 2, 3, 4, 5, 6}.
• Rolling three dice: Ω = {(x1 , x2 , x3 ) : xi ∈ {1, 2, 3, 4, 5, 6} for i =
1, 2, 3}.
• Number of coin flips before “Head” was up the first time: Ω =
{1, 2, 3, 4, . . . }.

Subsets of Ω are called events. For the die roll, we could for example consider the following subsets.

• “1” was rolled: A = {1}.


• “5” was not rolled: B = {1, 2, 3, 4, 6}.
• An odd number was rolled: C = {1, 3, 5}.
• The certain event: Ω.
• The impossible event: ∅.

The probability measure P is a function that assigns to each A ⊆ Ω a


number between 0 and 1, i.e.

P : P(Ω) → [0, 1],

with the following properties

(a) P(∅) = 0 and P(Ω) = 1,


(b) If A and B are disjoint events, i.e. A ∩ B = ∅, then P(A ∪ B) =
P(A) + P(B).

The idea is that P(A), which is a number between 0 and 1, repre-


sents how likely it is that the outcome of an actual execution of the
experiment is a member of A. The sets Ω and P(Ω) together with a
corresponding probability measure P are called probability space. This
then is the model of the experiment we are working with. Examples

• Coin flip (fair coin):


– Ω = {Head, Tail},
– P(Ω) = {∅, {Head}, {Tail}, Ω},
– P(∅) = 0, $P(\{Head\}) = \frac{1}{2}$, $P(\{Tail\}) = \frac{1}{2}$, P(Ω) = 1.
• Coin flip (example of unfair coin):

– Ω = {Head, Tail},
– P(Ω) = {∅, {Head}, {Tail}, Ω},
– P(∅) = 0, $P(\{Head\}) = \frac{1}{4}$, $P(\{Tail\}) = \frac{3}{4}$, P(Ω) = 1.
• Coin flip (fair coin), two times:
– Ω = {HH, HT, TH, TT},
– We write down explicitly:

P(Ω) = {∅,
{HH}, {HT}, {TH}, {TT},
{HH, HT}, {HH, TH}, {HH, TT},
{HT, TH}, {HT, TT}, {TH, TT},
{HH, HT, TH}, {HH, HT, TT}, {HH, TH, TT}, {HT, TH, TT},
Ω}.

As |Ω| = 4, we get $|P(\Omega)| = 2^4 = 16$.

– For the probability measure P, we have:

P(∅) = 0, $P(\{HH\}) = \frac{1}{4}$,
$P(\{HT\}) = \frac{1}{4}$, $P(\{TH\}) = \frac{1}{4}$,
$P(\{TT\}) = \frac{1}{4}$, $P(\{HH, HT\}) = \frac{1}{2}$,
$P(\{HH, TH\}) = \frac{1}{2}$, $P(\{HH, TT\}) = \frac{1}{2}$,
$P(\{HT, TH\}) = \frac{1}{2}$, $P(\{HT, TT\}) = \frac{1}{2}$,
$P(\{TH, TT\}) = \frac{1}{2}$, $P(\{HH, HT, TH\}) = \frac{3}{4}$,
$P(\{HH, HT, TT\}) = \frac{3}{4}$, $P(\{HH, TH, TT\}) = \frac{3}{4}$,
$P(\{HT, TH, TT\}) = \frac{3}{4}$, P(Ω) = 1.
From the last example, we might notice the following: If Ω is given,
we do not need to write down P(Ω) explicitly. Furthermore, it suffices
to define P(A) for all A ⊆ Ω, with |A| = 1. This is because P(∅) = 0 and P(Ω) = 1 are always true. Moreover for any event B ⊆ Ω, we may find A1, A2, A3, . . . , such that Ai ∩ Aj = ∅ if i ≠ j and |Ai| = 1, with A1 ∪ A2 ∪ A3 ∪ · · · = B. As the Ai are now pairwise disjoint, we are able to use the property P(Ai ∪ Aj) = P(Ai) + P(Aj) and so on. In the previous example, this means we can also define the probability space by
• Ω = {HH, HT, TH, TT},
• $P(\{HH\}) = P(\{HT\}) = P(\{TH\}) = P(\{TT\}) = \frac{1}{4}$.
The subsets A ⊆ Ω with |A| = 1 are called elementary events or atomic
events.
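The observation that the elementary events determine the whole probability measure can be sketched as follows. The helper names `power_set` and `probability` are assumptions of this example; exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import chain, combinations

omega = ["HH", "HT", "TH", "TT"]

# Probabilities of the elementary events (fair coin, two flips)
elementary = {outcome: Fraction(1, 4) for outcome in omega}


def power_set(outcomes):
    """All subsets of the sample space, i.e. all events."""
    subsets = chain.from_iterable(
        combinations(outcomes, k) for k in range(len(outcomes) + 1)
    )
    return [frozenset(s) for s in subsets]


def probability(event):
    """P(event) as the sum over its pairwise disjoint elementary events."""
    return sum((elementary[o] for o in event), Fraction(0))


events = power_set(omega)
print(len(events))
for event in events:
    print(sorted(event), probability(event))
```

The loop prints all 16 events together with their probabilities, matching the explicit list for the two coin flips.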
8. A finite probability space, where each outcome is equally likely is called
Laplace probability space, Laplace space or Laplace model. In this case,
the probability of an event A ⊆ Ω is given by
$P(A) = \frac{|A|}{|\Omega|} = \frac{\text{“favorable cases”}}{\text{“possible cases”}}$.
This is sometimes called Laplace’s rule. Examples:
• Randomly drawing a card from a (standard) 52 cards deck, what is the probability of the following events?
– A := {an ace is drawn}: $P(A) = \frac{4}{52} = \frac{1}{13}$.
– B := {a red card is drawn}: $P(B) = \frac{26}{52} = \frac{1}{2}$.

• In an urn are 5 red, 7 white and 4 green balls.
– A := {a white ball is drawn}: $P(A) = \frac{7}{16}$.
– B := {a green ball or a red ball is drawn}: $P(B) = \frac{4}{16} + \frac{5}{16} = \frac{9}{16}$ $\left(= 1 - \frac{7}{16}\right)$.
As $B = A^C$, we are able to compute the probability of B in two ways. Either directly, or by using the rule of complementary probability:
For any A ⊆ Ω: $P(A^C) = 1 - P(A)$.
9. We expect that rolling a fair die n-times, for n ∈ N, can be modelled us-
ing a Laplace space. Hence we expect the probability of an elementary
event A to be
$P(A) = \frac{1}{|\Omega|} = \frac{1}{6^n}$.

We also expect the model to (at least roughly) describe reality, hence we
expect that by repeatedly executing the (real life) random experiment,
the (real life) relative frequency (percentage) of a certain event tends
more and more to the (theoretical) probability of that event. This is
sometimes called the law of large numbers: If hn (A) is the relative
frequency of an event A ⊆ Ω, then

$\lim_{n \to \infty} h_n(A) = P(A)$.

If we compare this to the 12, 1200 and 120000 rolls of the blue and green die above, we see that the blue die can be modelled by a Laplace space, the green die (most likely) not. A reason for this might be that the green die is biased or unfair. (Why can’t we be sure it is unfair?) If the green die is unfair, then some outcomes might be more likely than others.
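The behaviour of the relative frequencies can be imitated with a pseudo-random simulation. The sketch below is an illustration only: the weights of the biased die (probability 0.75 for the face 3) are a hypothetical choice loosely inspired by the green die, and the seed is fixed so that the run is reproducible.

```python
import random
from collections import Counter


def relative_frequencies(weights, n, seed=0):
    """Roll a die n times and return the relative frequency of each face.

    weights[i] is proportional to the probability of the face i + 1.
    """
    rng = random.Random(seed)
    rolls = rng.choices([1, 2, 3, 4, 5, 6], weights=weights, k=n)
    counts = Counter(rolls)
    return {face: counts[face] / n for face in range(1, 7)}


fair = [1, 1, 1, 1, 1, 1]      # Laplace model: each face has probability 1/6
biased = [1, 1, 15, 1, 1, 1]   # hypothetical: face 3 has probability 15/20

for n in (12, 1200, 120000):
    print(n, "fair  ", relative_frequencies(fair, n))
    print(n, "biased", relative_frequencies(biased, n))
```

For small n the relative frequencies vary strongly; for large n they should be close to the probabilities of the respective model.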

10. Following the idea of using percentages to compute probabilities and using Laplace’s rule, we consider also the following examples:

• Consider the following dart board:

[Figure: dart board consisting of a blue circle with radius 1 inside a circle with radius 2.]

We assume that a player hits the board with each dart he throws. Each point on the dart board gets hit equally likely. How probable is it that a dart hits a point within the blue area? We compute the total area of the dart board $|\Omega| = 2^2 \pi$ and the area of the blue circle $|A| = 1^2 \pi$, thus

$P(A) = \frac{|A|}{|\Omega|} = \frac{\pi}{4\pi} = \frac{1}{4}$.

• A needle is dropped on a square tile with side length 1. How likely is it that the distance between the tip of the needle and the lower left corner is greater than one?
We can assume the lower left point has the coordinates (0, 0) and we set Ω = [0, 1]². The set of points (x, y) ∈ Ω with distance greater than one to the lower left point is

$A = \{(x, y) \in \Omega : \sqrt{(x - 0)^2 + (y - 0)^2} > 1\}$.

We see $A^C$ is the following quarter circle with radius 1.

[Figure: unit square with the quarter circle of radius 1 around the lower left corner (green area).]

Thus we get

$P(A^C) = \frac{\text{green area}}{\text{area of the square}} = \frac{\frac{1}{4} r^2 \pi}{1 \cdot 1} = \frac{\frac{1}{4} \cdot 1^2 \pi}{1} = \frac{\pi}{4}$.

Hence $P(A) = 1 - P(A^C) = \frac{4 - \pi}{4}$. (This gives us a possibility to empirically determine an estimate for π.)
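The empirical estimate of π mentioned above can be imitated by a simulation: random points in the unit square play the role of the needle tip, and the relative frequency of points inside the quarter circle approximates π/4. The function name `estimate_pi`, the number of points and the fixed seed are assumptions of this sketch.

```python
import random


def estimate_pi(n, seed=0):
    """Estimate pi from n pseudo-random points in the unit square [0, 1]^2."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:   # the point lies in the quarter circle
            inside += 1
    # The relative frequency inside / n approximates pi / 4
    return 4 * inside / n


pi_estimate = estimate_pi(200000)
print(pi_estimate)
```

By the law of large numbers the estimate tends to improve with more points, although only slowly.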

11. Repeated random experiments (with finite sample space) can be vi-
sualized using tree diagrams. For the following example, we consider
drawing 2 balls from an urn, which contains 5 red and 3 green balls. If
we put balls we have drawn back into the urn, we get

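[Figure: tree diagram for drawing with replacement. First draw: R with probability 5/8, G with probability 3/8. Second draw, after R as well as after G: R with probability 5/8, G with probability 3/8. Leaves: RR, RG, GR, GG.]

If we remove balls we have drawn, we get

[Figure: tree diagram for drawing without replacement. First draw: R with probability 5/8, G with probability 3/8. Second draw after R: R with probability 4/7, G with probability 3/7. Second draw after G: R with probability 5/7, G with probability 2/7. Leaves: RR, RG, GR, GG.]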

If we now want to find the probability of a leaf of the tree, e.g. the
event {RR}, we multiply the numbers on the edges along the path from
the root to the corresponding leaf, in the case of {RR} we have
• $P(\{RR\}) = \frac{5}{8} \cdot \frac{5}{8} = \frac{25}{64}$ in the case where we put balls back into the urn and
• $P(\{RR\}) = \frac{5}{8} \cdot \frac{4}{7} = \frac{5}{14}$ in the case where we remove balls we have drawn.
This rule is sometimes called multiplication rule.
12. If we are interested in an event that corresponds to more than one leaf, we add up the corresponding probabilities. E.g. if we are interested in the event {one green and one red ball is drawn} = {RG, GR}, we have

• $P(\{RG, GR\}) = P(\{RG\}) + P(\{GR\}) = \frac{5}{8} \cdot \frac{3}{8} + \frac{3}{8} \cdot \frac{5}{8} = \frac{15}{32}$ in the case where we put balls back into the urn and
• $P(\{RG, GR\}) = P(\{RG\}) + P(\{GR\}) = \frac{5}{8} \cdot \frac{3}{7} + \frac{3}{8} \cdot \frac{5}{7} = \frac{15}{28}$ in the case where we remove balls we have drawn.

This rule is called addition rule. In fact, as the leafs are mutually
exclusive events, this rule directly follows from P(A∪B) = P(A)+P(B)
when A ∩ B = ∅.
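The multiplication and addition rules for the urn example can be checked with exact fractions. This is a minimal sketch; the helper `two_draw_tree` and its parameters are hypothetical names introduced only for illustration.

```python
from fractions import Fraction


def two_draw_tree(red: int, green: int, replace: bool) -> dict:
    """Return the probability of each leaf (RR, RG, GR, GG) for two draws."""
    leaves = {}
    total = red + green
    for first, n_first in (("R", red), ("G", green)):
        p_first = Fraction(n_first, total)
        # Urn contents as seen by the second draw
        r2, g2 = red, green
        if not replace:
            if first == "R":
                r2 -= 1
            else:
                g2 -= 1
        for second, n_second in (("R", r2), ("G", g2)):
            p_second = Fraction(n_second, r2 + g2)
            # Multiplication rule: multiply the edge labels along the path
            leaves[first + second] = p_first * p_second
    return leaves


for replace in (True, False):
    leaves = two_draw_tree(5, 3, replace)
    # Addition rule: add the probabilities of the leaves RG and GR
    one_of_each = leaves["RG"] + leaves["GR"]
    print(replace, leaves["RR"], one_of_each)
```

With 5 red and 3 green balls this reproduces 25/64 and 15/32 with replacement, and 5/14 and 15/28 without.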

13. In the case where we remove balls we have drawn, the experiment
changes at each level of the tree. In fact, we can also visualize random
experiments that consist of different sub-experiments. For example, we flip a
coin; if we get “Head”, we roll a die; if we get “Tail”, we flip a different
coin.

[Tree diagram: root → H (1/2), T (1/2); H → H1, H2, H3, ..., H6 (1/6 each); T → TH, TT (1/2 each).]

14. In the previous example, assume we know that the event

{an even number was rolled with the die}

occurred. What do we know about the first coin flip? We notice that
if we know some event occurred, this might influence the probability
of other events to occur. This is known as conditional probability. If
A, B ⊆ Ω are events with P(B) > 0, then

\[ P(A|B) = \frac{P(A \cap B)}{P(B)} \]

is the probability of A given B. This is sometimes called the probability
of A under the condition B. The formula above is sometimes used to
determine the probability of the intersection:

P(A ∩ B) = P(A|B) · P(B),

i.e. if P(A|B) and P(B) are given, we are able to determine P(A ∩ B).
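The question about the first coin flip can be answered by applying the definition to the leaves of the coin/die tree. The following sketch assumes a fair coin and a fair die, as in the tree; the names `leaves`, `prob` and `conditional` are illustrative only.

```python
from fractions import Fraction

# Leaves of the tree and their probabilities (multiplication rule)
leaves = {f"H{i}": Fraction(1, 2) * Fraction(1, 6) for i in range(1, 7)}
leaves["TH"] = Fraction(1, 2) * Fraction(1, 2)
leaves["TT"] = Fraction(1, 2) * Fraction(1, 2)


def prob(event):
    """P(event) as the sum over the leaves contained in the event."""
    return sum(leaves[outcome] for outcome in event)


def conditional(a, b):
    """P(a | b) = P(a and b) / P(b); requires P(b) > 0."""
    return prob(a & b) / prob(b)


first_head = {o for o in leaves if o.startswith("H")}
even = {"H2", "H4", "H6"}  # an even number was rolled with the die

print(prob(even))                     # 1/4
print(conditional(first_head, even))  # 1
print(conditional(even, first_head))  # 1/2
```

Knowing that an even number was rolled, the first flip must have been “Head”, because the die is only rolled in that branch.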

15. If an event A does not depend on B, we say A is independent of B.
In this case P(A|B) = P(A), i.e. the probability of A is the same
whether B occurred or not, and we get

P(A ∩ B) = P(A|B) · P(B) = P(A) · P(B).
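As an illustration, the urn with 5 red and 3 green balls can be used to compare P(second ball is red) with P(second ball is red | first ball is red). This is a sketch with a hypothetical helper `second_draw_probs`; the unconditional probability is obtained by adding the two paths of the tree that end in a red second ball.

```python
from fractions import Fraction


def second_draw_probs(red: int, green: int, replace: bool):
    """Return (P(second is R), P(second is R | first is R))."""
    total = red + green
    p_first_r = Fraction(red, total)
    p_first_g = Fraction(green, total)
    if replace:
        p_r_given_r = Fraction(red, total)
        p_r_given_g = Fraction(red, total)
    else:
        p_r_given_r = Fraction(red - 1, total - 1)
        p_r_given_g = Fraction(red, total - 1)
    # Add the probabilities of the paths RR and GR
    p_second_r = p_first_r * p_r_given_r + p_first_g * p_r_given_g
    return p_second_r, p_r_given_r


for replace in (True, False):
    p_b, p_b_given_a = second_draw_probs(5, 3, replace)
    print(replace, p_b, p_b_given_a, p_b == p_b_given_a)
```

With replacement both values are 5/8, so the two draws are independent. Without replacement the conditional probability is 4/7 while the unconditional one is still 5/8, so the draws are not independent.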

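16. Law of total probability:

• (Simple version) If $A, B_1, B_2 \subseteq \Omega$ are events such that $B_1 \cap B_2 = \emptyset$
and $B_1 \cup B_2 = \Omega$, then

\[ P(A) = P(B_1) \cdot P(A|B_1) + P(B_2) \cdot P(A|B_2). \]

Remark: In this case, we always have $B_2 = B_1^C$.

[Figure: Ω split into $B_1$ and $B_2$; the event A is divided into $A \cap B_1$ and $A \cap B_2$.]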

• (General version) If $A, B_1, B_2, \dots, B_n \subseteq \Omega$ are events such that
$B_i \cap B_j = \emptyset$ for $i \neq j$ and $B_1 \cup B_2 \cup \dots \cup B_n = \Omega$, then

\[ P(A) = P(B_1) \cdot P(A|B_1) + P(B_2) \cdot P(A|B_2) + \dots + P(B_n) \cdot P(A|B_n) = \sum_{i=1}^{n} P(B_i) \cdot P(A|B_i). \]

17. If A, B ⊆ Ω, we have P(A ∩ B) = P(B) · P(A|B) and P(B ∩ A) =
P(A) · P(B|A). As A ∩ B = B ∩ A, and thus P(A ∩ B) = P(B ∩ A), we
get Bayes' law (for P(A) > 0)

\[ P(B|A) = \frac{P(B) \cdot P(A|B)}{P(A)}. \]

Example: A company knows that the probability that a file they receive
via mail is infected by a virus is 0.1 (10%). Their anti-virus software
detects a virus in 99% of the cases, but the software also falsely detects
a virus in a file that is not infected in 2 out of 100 cases (2%). How
likely is it that a file that got reported by the anti-virus software really
is infected?
We denote the events
• V := {mail contains a virus} and
• $V^C$ := {mail does not contain a virus}.
We have P(V) = 0.1, hence $P(V^C) = 1 - P(V) = 0.9$. Furthermore, we
denote the event R := {a mail gets reported}. We know P(R|V) = 0.99
and $P(R|V^C) = 0.02$. We want to know P(V|R). By the law of total
probability, we determine P(R):

\[ \begin{aligned} P(R) &= P(V) \cdot P(R|V) + P(V^C) \cdot P(R|V^C) \\ &= 0.1 \cdot 0.99 + 0.9 \cdot 0.02 \\ &= 0.117. \end{aligned} \]

Hence, by Bayes' law, we have

\[ P(V|R) = \frac{P(V) \cdot P(R|V)}{P(R)} = \frac{0.1 \cdot 0.99}{0.117} \approx 0.846. \]

Hence the probability that a reported file is infected is around 84.6%.
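The two steps of this example (law of total probability, then Bayes' law) can be sketched in a few lines of Python. The function names `total_probability` and `bayes` are illustrative, and the numbers are the ones used in the calculation above.

```python
def total_probability(priors, likelihoods):
    """P(A) = sum of P(B_i) * P(A|B_i) over a partition B_1, ..., B_n."""
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("the probabilities P(B_i) of a partition must sum to 1")
    return sum(p * l for p, l in zip(priors, likelihoods))


def bayes(prior, likelihood, evidence):
    """P(B|A) = P(B) * P(A|B) / P(A)."""
    return prior * likelihood / evidence


p_v = 0.1               # P(V): mail contains a virus
p_r_given_v = 0.99      # P(R|V): infected mail gets reported
p_r_given_not_v = 0.02  # P(R|V^C): clean mail gets reported

p_r = total_probability([p_v, 1 - p_v], [p_r_given_v, p_r_given_not_v])
p_v_given_r = bayes(p_v, p_r_given_v, p_r)
print(round(p_r, 3), round(p_v_given_r, 3))  # 0.117 0.846
```

Changing the false-alarm rate `p_r_given_not_v` shows how strongly the result depends on it: the smaller the rate, the closer P(V|R) gets to 1.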
18. Many random experiments describe a situation where each outcome is
assigned a real number. Examples:
• Die roll: each outcome is assigned the number of dots on the side
that is facing upwards.
• Simple game: we flip a coin; if we get “Head”, you win 1 Euro; if
we get “Tail”, I win 1 Euro, which means you lose 1 Euro.

We can model this by a function X : Ω → R. Such a function is called a
random variable; the result X(ω) ∈ R for ω ∈ Ω is called a realization
of the random variable X. Examples:

• Die roll: Ω = {1, 2, 3, 4, 5, 6}, X(1) = 1, X(2) = 2, etc.
• Coin flip: Ω = {Head, Tail}, X(Head) = 1, X(Tail) = −1.

Now we are able to ask questions like

• Die roll: How likely is it to get a “6”? The corresponding probability
is $P(X = 6) = P(\{6\}) = \frac{1}{6}$.
• Coin flip: How likely is it that you win? The corresponding probability
is $P(X = 1) = P(\{\text{Head}\}) = \frac{1}{2}$.
• Die roll: How likely is it to get a number greater than or equal to “3”? The
corresponding probability is $P(X \geq 3) = P(\{3, 4, 5, 6\}) = \frac{4}{6}$.
• Die roll: How likely is it to get either “1”, “3” or “4”? The
corresponding probability is $P(X \in \{1, 3, 4\}) = P(\{1, 3, 4\}) = \frac{3}{6}$.
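These questions can be phrased in code by treating the random variable as an ordinary function on Ω. The sketch below assumes that all outcomes are equally likely (a fair die and a fair coin); the helper `prob_of` is a hypothetical name.

```python
from fractions import Fraction


def prob_of(X, omega, predicate):
    """P(X satisfies predicate) when all outcomes in omega are equally likely."""
    favourable = [w for w in omega if predicate(X(w))]
    return Fraction(len(favourable), len(omega))


die = [1, 2, 3, 4, 5, 6]
coin = ["Head", "Tail"]


def die_X(w):
    return w  # number of dots


def coin_X(w):
    return 1 if w == "Head" else -1  # your winnings in Euro


print(prob_of(die_X, die, lambda x: x == 6))          # 1/6
print(prob_of(coin_X, coin, lambda x: x == 1))        # 1/2
print(prob_of(die_X, die, lambda x: x >= 3))          # 2/3
print(prob_of(die_X, die, lambda x: x in {1, 3, 4}))  # 1/2
```

`Fraction` reduces automatically, so 4/6 is printed as 2/3 and 3/6 as 1/2.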

19. Suppose that for a countable set Ω, i.e. Ω is either finite or we
can label the elements of Ω using the natural numbers, we collect all
real numbers $x_i = X(\omega_i)$ for $\omega_i \in \Omega$, i.e. all the values we get from
the random variable X, in a set D ⊂ R. The function f : D → [0, 1]
with $f(x_i) = P(X = x_i)$ is called the probability (mass) function of the
random variable X. The function F : R → [0, 1] with F(x) = P(X ≤ x) is called
the probability distribution function or cumulative distribution function
(cdf) of the random variable X. Examples:

• Coin flip: Recall X(Head) = 1, X(Tail) = −1, then D := {−1, 1}
and the probability function f is given by $f(-1) = P(X = -1) = P(\{\text{Tail}\}) = \frac{1}{2}$,
$f(1) = P(X = 1) = P(\{\text{Head}\}) = \frac{1}{2}$. We also plot
the cumulative distribution function:

[Figure: plot of the step function F(x), which jumps to 1/2 at x = −1 and to 1 at x = 1.]

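• Die roll: we have D := {1, 2, 3, 4, 5, 6}, $f(1) = f(2) = f(3) = f(4) = f(5) = f(6) = \frac{1}{6}$.
A plot of the cdf looks like

[Figure: plot of the step function F(x), which increases by 1/6 at each of x = 1, 2, ..., 6 and takes the values 1/6, 1/3, 1/2, 2/3, 5/6, 1.]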
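The step shape of the cdf follows directly from summing the probability function over all values up to x. A minimal sketch, in which the probability function is stored as a dictionary and `cdf` is an illustrative helper name:

```python
from fractions import Fraction


def cdf(pmf, x):
    """F(x) = P(X <= x) for a discrete random variable with the given pmf."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))


coin_pmf = {-1: Fraction(1, 2), 1: Fraction(1, 2)}
die_pmf = {k: Fraction(1, 6) for k in range(1, 7)}

print([str(cdf(coin_pmf, x)) for x in (-2, -1, 0, 1)])   # ['0', '1/2', '1/2', '1']
print([str(cdf(die_pmf, x)) for x in (0.5, 1, 2.5, 6)])  # ['0', '1/6', '1/3', '1']
```

F stays constant between the values in D and jumps by f(x_i) at each x_i.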

20. If we play the coin flip game often, we expect that in the long run we
will neither win nor lose money, i.e. we expect that in around 50%
of the cases we win 1 Euro and in 50% of the cases we lose 1 Euro,
hence on average we win/lose 0 Euro. Following this idea, we define
the expectation or expected value of a random variable X with values
$x_i \in D$ as

\[ E(X) = \sum_{x_i \in D} P(X = x_i) \cdot x_i. \]

Examples:

• Die roll: Ω = {1, 2, 3, 4, 5, 6}, X(1) = 1, X(2) = 2, etc. and

\[ E(X) = \frac{1}{6} \cdot 1 + \frac{1}{6} \cdot 2 + \frac{1}{6} \cdot 3 + \frac{1}{6} \cdot 4 + \frac{1}{6} \cdot 5 + \frac{1}{6} \cdot 6 = 3.5. \]

• Coin flip: Ω = {Head, Tail}, X(Head) = 1, X(Tail) = −1 and

\[ E(X) = \frac{1}{2} \cdot (-1) + \frac{1}{2} \cdot 1 = 0. \]

As for the arithmetic mean of a population, the symbol
µ = E(X) is often used.
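The definition translates directly into a weighted sum. A short sketch with exact fractions; `expectation` is an illustrative name and the probability function is again stored as a dictionary.

```python
from fractions import Fraction


def expectation(pmf):
    """E(X) = sum of P(X = x_i) * x_i over all values x_i."""
    return sum(p * x for x, p in pmf.items())


die_pmf = {k: Fraction(1, 6) for k in range(1, 7)}
coin_pmf = {-1: Fraction(1, 2), 1: Fraction(1, 2)}

print(expectation(die_pmf))   # 7/2
print(expectation(coin_pmf))  # 0
```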

21. The variance of a random variable X with values $x_i \in D$ and expectation
µ is defined as

\[ V(X) = \sum_{x_i \in D} P(X = x_i) \cdot (x_i - \mu)^2. \]

Often the symbol $\sigma^2 = V(X)$ is used to denote the variance. Examples:

• Die roll: Ω = {1, 2, 3, 4, 5, 6}, X(1) = 1, X(2) = 2, etc., µ = E(X) = 3.5 and

\[ \begin{aligned} V(X) &= \frac{1}{6} \cdot (1 - 3.5)^2 + \frac{1}{6} \cdot (2 - 3.5)^2 + \frac{1}{6} \cdot (3 - 3.5)^2 \\ &\quad + \frac{1}{6} \cdot (4 - 3.5)^2 + \frac{1}{6} \cdot (5 - 3.5)^2 + \frac{1}{6} \cdot (6 - 3.5)^2 \\ &= \frac{35}{12} \approx 2.92. \end{aligned} \]

• Coin flip: Ω = {Head, Tail}, X(Head) = 1, X(Tail) = −1, µ = E(X) = 0 and

\[ V(X) = \frac{1}{2} \cdot (-1 - 0)^2 + \frac{1}{2} \cdot (1 - 0)^2 = 1. \]

The standard deviation σ is again defined using the variance:

\[ \sigma = \sqrt{V(X)}. \]
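Both examples can be recomputed numerically. The sketch below uses illustrative helper names (`variance`, `std_dev`) and evaluates the variance exactly with fractions before taking the square root as a floating-point number.

```python
import math
from fractions import Fraction


def variance(pmf):
    """V(X) = sum of P(X = x_i) * (x_i - mu)^2 with mu = E(X)."""
    mu = sum(p * x for x, p in pmf.items())
    return sum(p * (x - mu) ** 2 for x, p in pmf.items())


def std_dev(pmf):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(pmf))


die_pmf = {k: Fraction(1, 6) for k in range(1, 7)}
coin_pmf = {-1: Fraction(1, 2), 1: Fraction(1, 2)}

print(variance(die_pmf), round(std_dev(die_pmf), 3))  # 35/12 1.708
print(variance(coin_pmf), std_dev(coin_pmf))          # 1 1.0
```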

22. In many cases, we are not interested in the actual result of a random
experiment, but in the realization of a corresponding random variable.
Consider the following game: we roll a die and we lose 1 Euro if “5” is
rolled, but we gain 1 Euro if any other number is rolled. Now, we might
not be interested in whether the other number is “1” or “2” etc.; we just want
to know if we win or lose, i.e. we are only interested in the realization
of the random variable

\[ X(\omega) = \begin{cases} -1 & \omega = 5, \\ 1 & \text{else.} \end{cases} \]

A random experiment with only two possible outcomes (e.g. win or lose) is called a
Bernoulli experiment. If the probability of winning is p ∈ [0, 1], then
the probability of losing is q = 1 − p. If we repeat the same Bernoulli
experiment n times independently, then we might be interested in the probability of
winning k times. Examples (we consider again the game described
above, hence $p = \frac{5}{6}$; from now on, X denotes the number of wins in the n repetitions):
• If we roll the die only once, i.e. n = 1, then the probability of
winning k = 0 times is

\[ P(X = 0) = 1 \cdot p^0 \cdot q^{1-0} = 1 \cdot p^0 \cdot (1 - p)^{1-0} = \frac{1}{6}. \]

The probability of winning k = 1 time is

\[ P(X = 1) = 1 \cdot p^1 \cdot q^{1-1} = 1 \cdot p^1 \cdot (1 - p)^0 = \frac{5}{6}. \]

• If we roll the die n = 2 times, then the probability of winning
k = 0 times is

\[ P(X = 0) = 1 \cdot p^0 \cdot q^{2-0} = 1 \cdot p^0 \cdot (1 - p)^{2-0} = \frac{1}{36}. \]

(We have to lose both games.)
The probability of winning k = 1 time is

\[ P(X = 1) = 2 \cdot p^1 \cdot q^{2-1} = 2 \cdot p^1 \cdot (1 - p)^1 = 2 \cdot \frac{5}{36} = \frac{10}{36}. \]

(We can either win in the first or the second die roll, but not
both.)
The probability of winning k = 2 times is

\[ P(X = 2) = 1 \cdot p^2 \cdot q^{2-2} = 1 \cdot p^2 \cdot (1 - p)^0 = \frac{25}{36}. \]

(We have to win both games.)
• If we roll the die n = 3 times, then the probability of winning
k = 0 times is

\[ P(X = 0) = 1 \cdot p^0 \cdot q^{3-0} = 1 \cdot p^0 \cdot (1 - p)^{3-0} = \frac{1}{216}. \]

(We have to lose all three games.)
The probability of winning k = 1 time is

\[ P(X = 1) = 3 \cdot p^1 \cdot q^{3-1} = 3 \cdot p^1 \cdot (1 - p)^2 = 3 \cdot \frac{5}{216} = \frac{15}{216}. \]

(We can either win in the first or the second or the third die roll,
but not in more than one.)
The probability of winning k = 2 times is

\[ P(X = 2) = 3 \cdot p^2 \cdot q^{3-2} = 3 \cdot p^2 \cdot (1 - p)^1 = 3 \cdot \frac{25}{216} = \frac{75}{216}. \]

(We lose exactly one game, which can be the first or the second or the third die roll.)
The probability of winning k = 3 times is

\[ P(X = 3) = 1 \cdot p^3 \cdot q^{3-3} = 1 \cdot p^3 \cdot (1 - p)^0 = \frac{125}{216}. \]

(We have to win all three games.)

A random variable X with probability mass function

\[ f(k) = P(X = k) = \binom{n}{k} p^k \cdot (1 - p)^{n-k}, \quad k = 0, 1, \dots, n, \]

is called a random variable that follows the binomial distribution with
parameters n and p, often denoted by X ∼ B(n, p).
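The probabilities computed by hand for n = 1, 2, 3 can be reproduced from this formula. A minimal sketch using Python's `math.comb` for the binomial coefficient; the function name `binomial_pmf` is an illustrative choice.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, k, p):
    """P(X = k) for X ~ B(n, p): (n choose k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p = Fraction(5, 6)  # probability of winning one round of the die game
for n in (1, 2, 3):
    probabilities = [binomial_pmf(n, k, p) for k in range(n + 1)]
    print(n, [str(q) for q in probabilities], sum(probabilities))
```

The fractions are printed in lowest terms, e.g. 10/36 appears as 5/18 and 75/216 as 25/72. For every n the probabilities add up to 1.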

23. If X ∼ B(n, p) then

• E(X) = np,
• V(X) = np(1 − p).
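These two formulas can be checked for concrete parameters by evaluating the definitions of expectation and variance on the binomial probability mass function. The sketch below is a numerical check for chosen values, not a proof; `binomial_moments` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb


def binomial_moments(n, p):
    """Return (E(X), V(X)) for X ~ B(n, p), computed from the definitions."""
    pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}
    mean = sum(prob * k for k, prob in pmf.items())
    var = sum(prob * (k - mean) ** 2 for k, prob in pmf.items())
    return mean, var


n, p = 3, Fraction(5, 6)  # three rounds of the die game
mean, var = binomial_moments(n, p)
print(mean, var)                                  # 5/2 5/12
print(mean == n * p, var == n * p * (1 - p))      # True True
```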
