Erg PRFG
I Mathematics 1
1 Sets, numbers, transformation of terms
3 Vectors
5 Functions
6 Measures of change
Part I
Mathematics 1
1 Sets, numbers, transformation of terms
1. Important sets of numbers:
N := {0, 1, 2, 3, 4, . . . }.
• The integers
We see that N is a subset of Z,
N ⊆ Z.
N ⊆ Z ⊆ Q.
R := Q ∪ {r | r is an irrational number}.
[Figure: number line from −5 to 5]
As Q ⊆ R, we have
N ⊆ Z ⊆ Q ⊆ R.
[Figure: nested sets N ⊆ Z ⊆ Q ⊆ R]
[Figure: Venn diagram of two sets A and B]
Example:
A := {1, 2, c, d}, B := {car, coffee, 1, 2}, A ∩ B = {1, 2}.
(b) Union. If A and B are sets, then A ∪ B is the set that contains
all the elements, which are contained in A or in B (or both).
[Figure: Venn diagram of A ∪ B]
Example:
A ∪ B = {1, 2, c, d, car, coffee}.
(c) Set Difference. If A and B are sets, then A \ B is the set that
contains all the elements, which are contained in A but not in B.
[Figure: Venn diagram of A \ B]
Example:
A \ B = {c, d}.
(d) Power Set. If A is a set, then P(A) is the set that contains all
subsets of A. Example:
P(A) = {∅, {1}, {2}, {c}, {d}, {1, 2}, {1, c}, {1, d}, {2, c}, {2, d}, {c, d},
{1, 2, c}, {1, 2, d}, {1, c, d}, {2, c, d}, A} .
If A is finite and |A| = n, then P(A) is finite and $|P(A)| = 2^n$.
(e) Set complement. If A and B are sets, such that A ⊆ B, then we
can define the set complement of A under B, denoted by $A^C$, as
the set of elements of B that are not contained in A.
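The set operations above can be tried out directly with Python's built-in `set` type. This is a minimal sketch: the sets A and B are the ones from the examples, while the superset `U` used for the complement is a hypothetical choice made only for illustration.

```python
from itertools import combinations

A = {1, 2, "c", "d"}
B = {"car", "coffee", 1, 2}

intersection = A & B   # elements contained in A and in B
union = A | B          # elements contained in A or in B (or both)
difference = A - B     # elements contained in A but not in B


def power_set(s):
    """Return all subsets of s as a list of sets."""
    items = list(s)
    return [set(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


subsets = power_set(A)  # contains 2**4 = 16 subsets

# Complement: only defined relative to a superset of A.
# U is a hypothetical superset, not taken from the text.
U = A | {"x"}
complement = U - A
```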
[Figure: set A contained in set B]
Example:
where in most cases only v ≥ 0 will be considered a solution. If
we are interested in obtaining m instead of v, we note
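$$\frac{1}{2} \cdot m \cdot v^2 + m \cdot g \cdot h = \left(\frac{1}{2} v^2 + g \cdot h\right) m,$$
hence for $\frac{1}{2} v^2 + g \cdot h \neq 0$, we have
$$m = \frac{E}{\frac{1}{2} v^2 + g \cdot h}.$$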
for k ̸= 0. Example:
• 3! = 3 · 2! = 3 · (2 · 1!) = 3 · (2 · (1 · 0!)) = 3 · 2 · 1 = 6.
• 10! = 10 · 9 · 8 · · · 2 · 1 = 3628800.
7. The binomial coefficients, usually denoted as a pair of numbers, e.g. $\binom{n}{k}$
(“n choose k”), are the positive integers that occur as coefficients in the
binomial theorem
$$(a + b)^n = \sum_{i=0}^{n} \binom{n}{i} a^{n-i} b^i.$$
Example:
• Case n = 2:
$$(a + b)^2 = \sum_{i=0}^{2} \binom{2}{i} a^{2-i} b^i = \binom{2}{0} a^{2-0} b^0 + \binom{2}{1} a^{2-1} b^1 + \binom{2}{2} a^{2-2} b^2 = 1 \cdot a^2 + 2 \cdot ab + 1 \cdot b^2.$$
• Case n = 3:
$$(a + b)^3 = \sum_{i=0}^{3} \binom{3}{i} a^{3-i} b^i = a^3 + 3a^2 b + 3ab^2 + b^3.$$
For given n, k ∈ N0, with n ≥ k, we calculate $\binom{n}{k}$ via
$$\binom{n}{k} = \frac{n!}{k! \cdot (n - k)!}.$$
We have
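• $\binom{2}{0} = \frac{2!}{0! \cdot (2-0)!} = \frac{2}{2} = 1$,
• $\binom{2}{1} = \frac{2!}{1! \cdot (2-1)!} = \frac{2}{1} = 2$,
• $\binom{2}{2} = \frac{2!}{2! \cdot (2-2)!} = \frac{2}{2} = 1$,
• $\binom{3}{0} = \frac{3!}{0! \cdot (3-0)!} = \frac{6}{6} = 1$,
• $\binom{3}{1} = \frac{3!}{1! \cdot (3-1)!} = \frac{6}{2} = 3$,
• $\binom{3}{2} = \frac{3!}{2! \cdot (3-2)!} = \frac{6}{2} = 3$,
• $\binom{3}{3} = \frac{3!}{3! \cdot (3-3)!} = \frac{6}{6} = 1$.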
• $\binom{n}{0} = \binom{n}{n} = 1$ for all n ∈ N0,
• $\binom{n}{k} = \binom{n}{n-k}$ for all n, k ∈ N0,
• $\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}$ for all n, k ∈ N \ {0}.
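The factorial formula and the identities above can be checked numerically. The following is a minimal sketch; the function names are illustrative, and the recursion identity is only tested for 1 ≤ k ≤ n − 1 so that every coefficient involved is covered by the factorial formula.

```python
from math import factorial


def binomial(n, k):
    """n choose k via n! / (k! * (n - k)!), for integers 0 <= k <= n."""
    if not 0 <= k <= n:
        raise ValueError("binomial(n, k) requires 0 <= k <= n")
    return factorial(n) // (factorial(k) * factorial(n - k))


def binomial_expansion(a, b, n):
    """Right-hand side of the binomial theorem for numbers a, b."""
    return sum(binomial(n, i) * a ** (n - i) * b ** i for i in range(n + 1))


row2 = [binomial(2, k) for k in range(3)]  # coefficients of (a + b)^2
row3 = [binomial(3, k) for k in range(4)]  # coefficients of (a + b)^3


def check_identities(n_max):
    """Check symmetry and the recursion for all n up to n_max."""
    for n in range(n_max + 1):
        for k in range(n + 1):
            if binomial(n, k) != binomial(n, n - k):
                return False
        for k in range(1, n):
            if binomial(n, k) != binomial(n - 1, k) + binomial(n - 1, k - 1):
                return False
    return True
```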
8. Elementary Combinatorics:
• If A, B are finite sets and |A| = n and |B| = k, then there are n · k
possibilities to choose one element from each set, i.e. |A × B| = n · k.
Example: If A := {soup, rice, noodles} and B := {water, juice},
then we have 3 · 2 = 6 possibilities to choose a meal.
• If A is a finite set and |A| = n, then there are n! different
possibilities to put (all) its elements in order. These orderings are
called permutations of A. Example: Consider the following set
of books: A := {Math, History, German}. We can put these
books in 3! = 6 different ways onto a bookshelf:
– Math, History, German,
– Math, German, History,
– History, Math, German,
– History, German, Math,
– German, Math, History,
– German, History, Math.
The general idea is we make a list:
– in the first place of the list, we can put any of the n elements,
– in the second place of the list, we can put any of the n − 1
remaining elements,
– in the third place of the list, we can put any of the n − 2
remaining elements,
– ...,
– for the last place of the list, there is only 1 remaining element.
Thus, we get n · (n − 1) · (n − 2) · . . . · 1 = n!.
• (Variation without repetition: order important, no repeat)
If A is a finite set with |A| = n and k ≤ n, then there are
$\frac{n!}{(n-k)!}$
different possibilities to order k (different) elements from A.
To see this, we again imagine making a list of length k:
– in the first place of the list, we can put any of the n elements,
– in the second place of the list, we can put any of the n − 1
remaining elements,
– in the third place of the list, we can put any of the n − 2
remaining elements,
– ...,
– for the last place of the list, there are only n − k + 1 remaining
elements.
Hence, we get
$$n \cdot (n-1) \cdot (n-2) \cdots (n-k+1) = \frac{n \cdot (n-1) \cdot (n-2) \cdots (n-k+1) \cdot (n-k) \cdot (n-k-1) \cdots 1}{(n-k) \cdot (n-k-1) \cdots 1} = \frac{n!}{(n-k)!}.$$
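The counting argument can be compared with an explicit enumeration. This is a small sketch using `itertools.permutations` from the Python standard library; the book titles are taken from the example above, and the helper name `count_variations` is illustrative.

```python
from itertools import permutations
from math import factorial


def count_variations(n, k):
    """Number of ordered selections of k different elements out of n."""
    if not 0 <= k <= n:
        raise ValueError("requires 0 <= k <= n")
    return factorial(n) // factorial(n - k)


books = ["Math", "History", "German"]

orderings = list(permutations(books))   # all permutations of the 3 books
pairs = list(permutations(books, 2))    # ordered selections of 2 books
```

Enumerating is only practical for small sets, since the number of orderings grows like n!.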
• (Combinations with repetition: order not important, repeat
allowed) If A is a finite set with |A| = n and k ∈ N, then
there are $\binom{n+k-1}{k} = \binom{n+k-1}{n-1}$
different ways to select k elements of
A, if the order is not important and repetition is allowed. To see
this, we imagine a table with the elements of A and marks “+”
that indicate how often we want to select the element. (Important:
“how often” and not “in what order”.) E.g.
x1    x2    x3    ...    xn
++          +     ...    +
We can translate the above table into a string
+ + | | + | ... | + .
(Take the first element 2 times, the second 0 times, the third 1
time, . . . , the n-th 1 time.) Each string represents exactly one
combination with repetition. These strings always have length
n + k − 1, the n − 1 bars | and the k marks + to indicate that
the corresponding element is chosen. If we just have a look at the
string, we have n + k − 1 places to put either
– n − 1 bars (in this case, the places where to put the k marks
is already determined),
– or k marks (in this case, the place where to put the n − 1 bars
is already determined).
This means we get $\binom{n+k-1}{k}$, which is the same as $\binom{n+k-1}{n-1}$, different
strings. As each string represents exactly one combination with
repetition, we get that the number of combinations with repetition
is also $\binom{n+k-1}{k}$.
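The correspondence between selections and strings of marks and bars can be sketched as follows. The element names and the helper `to_string` are illustrative assumptions; `itertools.combinations_with_replacement` enumerates the selections.

```python
from itertools import combinations_with_replacement
from math import factorial


def binomial(n, k):
    """n choose k for integers 0 <= k <= n."""
    return factorial(n) // (factorial(k) * factorial(n - k))


def to_string(selection, elements):
    """Encode a selection as marks '+' separated by bars '|'."""
    return "|".join("+" * selection.count(x) for x in elements)


elements = ["x1", "x2", "x3"]   # n = 3
k = 2

selections = list(combinations_with_replacement(elements, k))
strings = [to_string(s, elements) for s in selections]

# x1 twice, x2 never, x3 once: two marks, bar, bar, one mark
example = to_string(("x1", "x1", "x3"), elements)
```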
• The so-called domain of definition D consists of those values such
that the equation/inequality etc. is an admissible expression, thus
it is a subset of the universe.
• The so-called solution set L is the set of solutions, hence it is a
subset of the domain. The solution set can be empty, i.e. it can
happen that L = ∅, in this case we say “The equation/inequality
etc. has no solution”.
• Solve in R: x − 3 = 4.
Solution: We have U = R, D = R and L = {7}.
• Solve: $\frac{1}{x} = 4$.
Solution: We have U = R, D = R \ {0} and L = {$\frac{1}{4}$}.
• Solve in Z: x · 2 = 7.
Solution: We have U = Z, D = Z and L = ∅.
• Solve: 2x − 3 = 4 + x − 1 + x.
Solution: We have U = R, D = R and L = R.
• Such problems can also be posed as a mathematical word problem.
E.g. “There is a number, such that: if you take three times this
number, you get the same result you would get, if you add 2 to
this number.”
This can be understood as: Solve in R: 3x = x + 2.
Solution: We have U = R, D = R and L = {1}.
• Solve in R: x − 3 ≥ 4.
Solution: We have U = R, D = R and L = [7, ∞).
[Figure: number line from 3 to 12 showing the solution set [7, ∞)]
• Solve: $\frac{1}{x} < 2$.
Solution: We have U = R, D = R \ {0}.
In order to get the solution, we multiply by x and consider two cases,
either x < 0 or x > 0 (for x < 0 the inequality sign is reversed):
In the first case, we solve 1 > 2x, in the second case, we solve
1 < 2x.
We get two intermediate solution sets L1 = (−∞, 0) and L2 =
($\frac{1}{2}$, ∞).
The result is L = L1 ∪ L2.
[Figure: number line from −3 to 3 showing the solution set]
4. Quadratic equations:
• Solve: $x^2 + 8x - 9 = 0$.
We have U = R, D = R, either
– we notice
$x^2 + 8x - 9 = (x - 1)(x + 9)$,
hence L = {−9, 1},
– we solve using the so-called p-q-formula
$$x = -\frac{8}{2} \pm \sqrt{\left(\frac{8}{2}\right)^2 - (-9)} = -4 \pm 5,$$
The a-b-c-formula is applicable if the quadratic equation is of the
form
$$ax^2 + bx + c = 0,$$
and the candidates for the solution are the solutions of
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$
Geometric interpretation:
[Figure: graph with zeros at x = −9 and x = 1 on the x-axis]
• Solve: $5x - 2x^2 = x^2 - 2$.
Solution: We have U = R, D = R. First we bring this into the
form $-3x^2 + 5x + 2 = 0$. Then we solve using the a-b-c-formula:
$$x = \frac{-5 \pm \sqrt{25 - 4 \cdot (-3) \cdot 2}}{2 \cdot (-3)} = \frac{-5 \pm \sqrt{49}}{-6} = \frac{-5 \pm 7}{-6},$$
hence $L = \{-\frac{1}{3}, 2\}$.
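The a-b-c-formula translates directly into a short function. This is a minimal sketch working with floating point numbers; the function name is illustrative, and equations without real solutions return an empty list.

```python
from math import sqrt


def solve_quadratic(a, b, c):
    """Real solutions of a*x^2 + b*x + c = 0 (a != 0), sorted ascending."""
    if a == 0:
        raise ValueError("not a quadratic equation: a must be non-zero")
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return []                      # no real solution
    if discriminant == 0:
        return [-b / (2 * a)]          # one (double) solution
    root = sqrt(discriminant)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


first = solve_quadratic(1, 8, -9)     # x^2 + 8x - 9 = 0
second = solve_quadratic(-3, 5, 2)    # -3x^2 + 5x + 2 = 0
none = solve_quadratic(1, 0, 4)       # x^2 + 4 = 0 has no real solution
```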
Geometric interpretation:
[Figures: graphs of $x^2 - 2$ and $5x - 2x^2$, with x = −1/3 and x = 2 marked on the x-axis]
6. Solving equations of degree 3 and 4: Again using the fact that a product
of real numbers is equal to 0 if and only if one of the factors is equal to
0, we are also able to solve some equations of degree 3 and degree 4.
• Example: Solve $x^3 - x = 0$. We factor out x and solve $x(x^2 - 1) = 0$.
We recognize $x^2 - 1 = (x + 1)(x - 1)$ and solve $x(x + 1)(x - 1) = 0$,
hence the three solutions x = 0, x = 1, x = −1.
• Example: Solve $x^4 - x^2 = 0$. We factor out $x^2$ and solve
$x^2 (x^2 - 1) = 0$. We recognize (again) $x^2 - 1 = (x + 1)(x - 1)$ and solve
$x \cdot x \cdot (x + 1)(x - 1) = 0$, hence the three solutions x = 0, x =
1, x = −1. Here one solution, namely x = 0, “appears two times”.
• Example: Solve $x^4 - 16 = 0$. We recognize $x^4 - 16 = (x^2 + 4)(x^2 - 4)$
and solve $(x^2 + 4)(x^2 - 4) = 0$. Using for example the p-q-formula,
we see that $x^2 + 4 = 0$ has no solutions; we emphasize that this
means there is no real number x such that $x^2 + 4 = 0$. On the
other hand, $x^2 - 4 = 0$ has the solutions x = 2 and x = −2, hence
$x^4 - 16 = 0$ has the solutions x = 2 and x = −2.
7. We are also able to solve the following more complicated looking equation
of degree 4:
$$x^4 - \frac{41}{8} x^2 + \frac{81}{256} = 0.$$
We use the following trick: we rewrite $x^2 = y$, i.e. we first solve the
following equation of degree 2
$$y^2 - \frac{41}{8} y + \frac{81}{256} = 0,$$
which possibly yields some solutions for y, and then, for each of these
solutions, we solve $x^2 = y$. Using the p-q-formula, we get
$$y = \frac{41}{16} \pm \sqrt{\left(\frac{41}{16}\right)^2 - \frac{81}{256}} = \frac{41}{16} \pm \sqrt{\frac{1681}{256} - \frac{81}{256}} = \frac{41}{16} \pm \sqrt{\frac{1600}{256}} = \frac{41}{16} \pm \frac{40}{16}.$$
Hence we get the intermediate solutions $y = \frac{1}{16}$ and $y = \frac{81}{16}$. Now, as
we set $x^2 = y$, we solve both $x^2 = \frac{1}{16}$ and $x^2 = \frac{81}{16}$. The first
equation yields the solutions $x = \frac{1}{4}$ and $x = -\frac{1}{4}$, the second equation
yields $x = \frac{9}{4}$ and $x = -\frac{9}{4}$. Hence in total, we get 4 solutions.
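The substitution trick can be sketched in code for equations of the form x^4 + p·x^2 + q = 0. The function names are illustrative; the computation uses floating point numbers, so in general results are only approximate.

```python
from math import sqrt


def solve_pq(p, q):
    """Real solutions of y^2 + p*y + q = 0 via the p-q-formula."""
    discriminant = (p / 2) ** 2 - q
    if discriminant < 0:
        return []
    if discriminant == 0:
        return [-p / 2]
    root = sqrt(discriminant)
    return [-p / 2 - root, -p / 2 + root]


def solve_biquadratic(p, q):
    """Real solutions of x^4 + p*x^2 + q = 0 using y = x^2."""
    solutions = set()
    for y in solve_pq(p, q):
        if y > 0:
            solutions.update({sqrt(y), -sqrt(y)})
        elif y == 0:
            solutions.add(0.0)
        # y < 0: x^2 = y has no real solution
    return sorted(solutions)


solutions = solve_biquadratic(-41 / 8, 81 / 256)
simple = solve_biquadratic(0, -16)    # x^4 - 16 = 0
```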
[Figure: the lines 5x − 2y = 1 and 3x + 3y = 9 intersecting at the point (1, 2)]
• Solve:
$$\begin{aligned} 5x - 2y &= 1 \\ 10x - 4y &= -12 \end{aligned}$$
As no universe is explicitly specified, we assume x ∈ R and y ∈ R,
hence we take U = R × R and D = R × R. Transforming the first
equation to
$$x = \frac{1 + 2y}{5},$$
and replacing x in the second equation with this expression we get
$$10 \cdot \frac{1 + 2y}{5} - 4y = -12,$$
hence
$$2 + 4y - 4y = -12,$$
finally
$$2 = -12.$$
As this statement is a contradiction, we have no solution, thus
L = ∅.
Geometric interpretation: two parallel lines.
[Figure: the parallel lines 10x − 4y = −12 and 5x − 2y = 1]
• Solve:
$$\begin{aligned} 5x - 2y &= 1 \\ 10x - 4y &= 2 \end{aligned}$$
We have U = R × R and D = R × R. Transforming the first
equation to
$$x = \frac{1 + 2y}{5},$$
and replacing x in the second equation with this expression we get
$$10 \cdot \frac{1 + 2y}{5} - 4y = 2,$$
hence
$$2 + 4y - 4y = 2,$$
finally
$$2 = 2.$$
As this is a tautology, we have infinitely many solutions, thus
$$L = \left\{ (x, y) \in R \times R \;\middle|\; y = \frac{5}{2} \cdot x - \frac{1}{2} \right\}.$$
Geometric interpretation: two identical lines.
[Figure: the identical lines 5x − 2y = 1 and 10x − 4y = 2]
3 Vectors
1. Using the common notation $R^2 := R \times R$, a two-dimensional vector
v ∈ $R^2$, v = (x, y), can be interpreted geometrically both as the point
with the coordinates (x, y) and as the arrow from the origin to the point
(x, y). Sometimes vectors are denoted using an arrow, e.g. $\vec{v}$.
[Figures: a vector (x, y) drawn as a point and as an arrow from the origin; the sum v1 + v2; the difference v1 − v2, obtained by adding −v2 to v1; the scaled vector 3 · v1]
Scaled vectors point in the same direction. Vectors that point in the
same direction are often called parallel.
(a) ||v0 || = 1
(b) v and v0 are parallel
can be obtained by $v_0 := \frac{1}{||v||} \cdot v$. This is called normalizing v. Vectors
with length 1 are called unit vectors. Example: If v = (1, 2), then
$v_0 = \frac{1}{\sqrt{5}} \cdot (1, 2) = \left(\frac{1}{\sqrt{5}}, \frac{2}{\sqrt{5}}\right)$.
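Normalizing a vector can be sketched in a few lines. Vectors are represented as plain tuples (x, y) here, which is an assumption made for illustration; the zero vector has no direction and is rejected.

```python
from math import hypot, isclose, sqrt


def norm(v):
    """Length ||v|| of a two-dimensional vector v = (x, y)."""
    return hypot(v[0], v[1])


def normalize(v):
    """Return the unit vector pointing in the same direction as v."""
    length = norm(v)
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return (v[0] / length, v[1] / length)


v = (1, 2)
v0 = normalize(v)   # equals (1/sqrt(5), 2/sqrt(5)) up to rounding
```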
⟨v1 , v2 ⟩ := x1 x2 + y1 y2 .
[Figure: two perpendicular vectors v1 and v2]
The scalar product is also often called dot product or inner product and
denoted v1 .v2 or v⃗1 · v⃗2 .
8. We use the fact that two vectors v1, v2 ∈ $R^2$ are perpendicular if and
only if ⟨v1, v2⟩ = 0 to find, for a given vector v1 = (x1, y1), a vector v2
such that v1 and v2 are perpendicular. One candidate
is v21 := (−y1, x1), as ⟨v1, v21⟩ = x1(−y1) + y1x1 = 0 in this case.
Another candidate is v22 := (y1, −x1), as ⟨v1, v22⟩ = x1y1 + y1(−x1) = 0.
Geometric interpretation: The vector v21 is v1 rotated by 90° to the left
(counterclockwise) and v22 is v1 rotated by 90° to the right (clockwise).
[Figure: the vector v1 with the two perpendicular vectors v21 and v22]
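The two perpendicular candidates can be checked with the scalar product. This is a minimal sketch with vectors as tuples; the function names and the sample vector are illustrative.

```python
def dot(v1, v2):
    """Scalar product <v1, v2> = x1*x2 + y1*y2."""
    return v1[0] * v2[0] + v1[1] * v2[1]


def perpendicular_left(v):
    """v rotated by 90 degrees counterclockwise: (-y, x)."""
    return (-v[1], v[0])


def perpendicular_right(v):
    """v rotated by 90 degrees clockwise: (y, -x)."""
    return (v[1], -v[0])


v1 = (3, 1)                      # sample vector, chosen arbitrarily
v21 = perpendicular_left(v1)     # (-1, 3)
v22 = perpendicular_right(v1)    # (1, -3)
```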
9. We are able to define lines using vectors by collecting all points that
we can reach adding a scaled version of a given vector to a given point.
Given a point P ∈ $R^2$ and a vector r ∈ $R^2$ we set
g := {P + λ · r | λ ∈ R}.
10. Some examples for problems that can be solved using vectors are:
• Given two distinct points P, Q ∈ $R^2$, determine {X ∈ $R^2$ | ||X −
P|| = ||X − Q||} (perpendicular bisector).
• Given two lines g, h in parametric form, determine if they
intersect, are parallel or describe the same set.
$$\begin{aligned} ax + by &= k \\ cx + dy &= \ell \end{aligned}$$
• Swap rows,
• divide/multiply rows by any real number λ ∈ R \ {0},
• add multiples of another row to any row.
has no solutions.
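which, by dividing the second line by 3, can be transformed to
$$\left(\begin{array}{cc|c} 5 & -2 & 1 \\ 1 & 1 & 3 \end{array}\right).$$
Swapping the first and the second row yields
$$\left(\begin{array}{cc|c} 1 & 1 & 3 \\ 5 & -2 & 1 \end{array}\right),$$
and subtracting 5 times the first row from the second we get
$$\left(\begin{array}{cc|c} 1 & 1 & 3 \\ 0 & -7 & -14 \end{array}\right).$$
We divide the second row by −7, thus we have
$$\left(\begin{array}{cc|c} 1 & 1 & 3 \\ 0 & 1 & 2 \end{array}\right),$$
finally we subtract the second row from the first and get
$$\left(\begin{array}{cc|c} 1 & 0 & 1 \\ 0 & 1 & 2 \end{array}\right).$$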
This represents the system of equations
1x +0y = 1
0x +1y = 2
Hence we can read off the solution x = 1, y = 2.
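The three row operations can be automated. The following is a minimal sketch of such a reduction using exact fractions; it assumes the system has exactly one solution, and it takes the starting system to be 5x − 2y = 1, 3x + 3y = 9 (the second row before it was divided by 3). The order of the steps differs from the hand calculation, but the result is the same.

```python
from fractions import Fraction


def gauss_jordan(rows):
    """Reduce an augmented matrix [A | b] using the three row operations.

    Assumes a square system with exactly one solution.
    """
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    for col in range(n):
        # Swap rows so that the pivot entry is non-zero.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Divide the pivot row so that the pivot entry becomes 1.
        factor = m[col][col]
        m[col] = [x / factor for x in m[col]]
        # Subtract multiples of the pivot row from all other rows.
        for r in range(n):
            if r != col:
                multiple = m[r][col]
                m[r] = [x - multiple * y for x, y in zip(m[r], m[col])]
    return m


reduced = gauss_jordan([[5, -2, 1], [3, 3, 9]])
solution = [row[-1] for row in reduced]   # values of x and y
```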
(b) Solve for x ∈ R and y ∈ R:
$$\begin{aligned} 5x - 2y &= 1 \\ 10x - 4y &= -12 \end{aligned}$$
We have U = R × R and D = R × R. Transforming the system of
equations to matrix form gives
$$\left(\begin{array}{cc|c} 5 & -2 & 1 \\ 10 & -4 & -12 \end{array}\right).$$
We subtract 2 times the first row from the second and get
$$\left(\begin{array}{cc|c} 5 & -2 & 1 \\ 0 & 0 & -14 \end{array}\right).$$
This represents the system of equations
$$\begin{aligned} 5x - 2y &= 1 \\ 0x + 0y &= -14 \end{aligned}$$
This is a contradiction, as 0 ≠ −14, hence we conclude there is no
solution, i.e. L = ∅.
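(c) Solve:
$$\begin{aligned} 5x - 2y &= 1 \\ 10x - 4y &= 2 \end{aligned}$$
We have U = R × R and D = R × R. Transforming the system of
equations to matrix form:
$$\left(\begin{array}{cc|c} 5 & -2 & 1 \\ 10 & -4 & 2 \end{array}\right).$$
We subtract 2 times the first row from the second and get
$$\left(\begin{array}{cc|c} 5 & -2 & 1 \\ 0 & 0 & 0 \end{array}\right).$$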
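2. For a 2 × 2 matrix A,
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},$$
the determinant of A is defined as det(A) = ad − bc. We can use
the determinant to determine how many solutions a given system of
equations has: We already know that
$$\begin{aligned} ax + by &= k \\ cx + dy &= \ell \end{aligned}$$
can be represented by the matrix
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$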
5 Functions
1. Let A, B ≠ ∅. We say f : A → B is a function from A to B, if and only
if each a ∈ A is assigned exactly one f(a) ∈ B. The mapping rule is
denoted by a ↦ f(a). In this case, A is called the domain of definition, or
simply domain, and B is called the codomain of f. Let us for example consider
the following function:
$$f : N \to Q, \quad n \mapsto \frac{n}{3}.$$
In this example, we see the domain of f is N, the codomain is Q. The
mapping rule is $n \mapsto \frac{n}{3}$, hence $1 \mapsto \frac{1}{3}$, $2 \mapsto \frac{2}{3}$, $3 \mapsto \frac{3}{3}$, . . . .
2. Often one deals with functions, with the property that their domain
and codomain are both subsets of the real numbers R. Such functions
are often called real functions.
• linear functions: f (x) = ax + b, for a, b ∈ R and a ̸= 0,
• quadratic functions: $f(x) = ax^2 + bx + c$, for a, b, c ∈ R and a ≠ 0,
• other polynomial functions (polynomials):
$$f(x) = \sum_{i=0}^{n} a_i x^i = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 x^0,$$
• Let f(x) = sin(x) + cos(x) and $g(x) = 3x^4$, then
$f(g(x)) = \sin(3x^4) + \cos(3x^4)$,
• Let $f(x) = e^x$ and g(x) = ln(x), then f(g(x)) = x for all x > 0.
10. Given a function f which is of one of the types, which are described
above, we are able to investigate, how the graph changes, if we perform
the following transformations:
• f(x) ↪ c · f(x),
• f(x) ↪ f(x) + c,
• f(x) ↪ f(c · x),
• f(x) ↪ f(x + c),
for c ∈ R.
11. Given the graph, or a section of the graph, of a function f we are able
to discuss monotonicity and local as well as global extrema. I.e. we
understand the notion of
• (strictly) increasing functions,
• (strictly) decreasing functions,
• local/global minima and maxima of a function.
12. To analyze exponential and logarithmic functions, we sometimes need
to use calculation rules for exponentials and logarithms, such as
$$a^b \cdot a^c = a^{b+c}, \quad \frac{a^b}{a^c} = a^{b-c}, \quad \ldots$$
13. Curve sketching. Given a function f by function term, we can deter-
mine the following properties of f and its graph respectively:
(a) domain and codomain,
(b) zeros of f ,
(c) (symmetry),
(d) extrema (local/global minima/maxima),
(e) poles (e.g. $f(x) = \frac{1}{x}$ has a pole at x = 0),
(f) monotonicity,
(g) (periodicity).
6 Measures of change
Considering a real function f and a, b ∈ R in the domain of f , we imagine
to “replace” a by b and analyze how the corresponding function values f (a)
and f (b) change using the following measures of change: (assuming all the
terms are well defined)
[Figures: an angle α measured from the x-axis; the unit circle with sin(α) and cos(α) as the legs of a right triangle with hypotenuse 1]
for all α ∈ [0, 2π). If α ≥ 2π, we subtract 2π as often as necessary,
such that the result is again an angle between 0 (inclusive) and 2π
(exclusive). This corresponds to the notion of “going around the circle”
more than 1 time. If α < 0, we add 2π as often as necessary, such
that the result is again between 0 (inclusive) and 2π (exclusive). This
corresponds to the notion of “going around the circle” in the opposite
direction. Hence the formula above holds in fact for all α ∈ R.
[Figure: a circle of radius r and the unit circle, with r sin(α) and r cos(α) marked for the angle α]
value of the corresponding angle α by $\sin(\alpha) = \frac{y}{r}$ and $\cos(\alpha) = \frac{x}{r}$.
4. We are able to use trigonometry in many applications. Assume for
example we are given a parallelogram with side-lengths a, b and an
angle α. We can compute the height ha :
ha = b sin(α)
[Figure: parallelogram ABCD with side lengths a and b, the angle α at A and the height ha]
Hence the area of the parallelogram can be determined via the formula
A = ab sin(α).
5. Considering again the unit circle, for an angle α ∈ [0, 2π), we compare
sin(α) to sin(−α) and cos(α) to cos(−α):
[Figure: unit circle with the angles α and −α]
[Figure: unit circle with the angles α and π]
7. Another important property is the fact that we can express the cosine
function via the sine function:
$$\cos(\alpha) = \sin\left(\alpha + \frac{\pi}{2}\right).$$
[Figure: graph of f(x) with the points A = (a, f(a)) and B = (b, f(b))]
[Figure: the line through (a, f(a)) and (b, f(b)) with slope (f(b) − f(a))/(b − a)]
is called the differential quotient of f at a, denoted by f ′ (a).
[Figure: graph of f(x)]
4. The function x ↦ f′(x), i.e. each value x gets mapped to the value of
the differential quotient of f at x, is called the derivative of f.
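The idea of letting b approach a in the difference quotient can be illustrated numerically. This is a rough sketch: the symmetric quotient with a small fixed step `h` is an approximation chosen for this illustration, not the definition of the differential quotient.

```python
from math import cos, isclose, sin


def difference_quotient(f, a, b):
    """Average rate of change (f(b) - f(a)) / (b - a), for a != b."""
    return (f(b) - f(a)) / (b - a)


def approximate_derivative(f, a, h=1e-6):
    """Approximate f'(a) by a difference quotient on (a - h, a + h)."""
    return difference_quotient(f, a - h, a + h)


def square(x):
    return x * x


slope = difference_quotient(square, 1, 3)      # (9 - 1) / (3 - 1)
approx = approximate_derivative(sin, 0.5)      # close to cos(0.5)
```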
(c) For functions $f(x) = a \cdot e^x$, a ∈ R, it holds that $f'(x) = a \cdot e^x$.
This is sometimes denoted as $(ae^x)' = ae^x$.
(d) The derivative of sin(x) is cos(x) and the derivative of cos(x) is
− sin(x). This is sometimes denoted as
sin′(x) = (sin(x))′ = sin(x)′ = cos(x)
and
cos′(x) = (cos(x))′ = cos(x)′ = − sin(x).
(e) If f ′ (x) is the derivative of f (x) and g ′ (x) is the derivative of
g(x), then f ′ (x) + g ′ (x) is the derivative of f (x) + g(x). This rule
is sometimes called the sum rule. Hence if e.g.
$$h(x) = \sum_{i=0}^{n} a_i x^i,$$
then
$$h'(x) = \sum_{i=1}^{n} i a_i x^{i-1}.$$
(f) If f ′ (x) is the derivative of f (x) and g ′ (x) is the derivative of g(x),
then f ′ (x) · g(x) + f (x) · g ′ (x) is the derivative of f (x) · g(x). This
rule is sometimes called the product rule. Some authors refer to
this rule as the Leibniz rule (for differentiation). E.g.
derivative of $h(x) = a b^x$. We set $f(x) = e^x$ and g(x) = ln(b) · x.
Hence $f'(x) = e^x$ and g′(x) = ln(b). Thus
h(x) = a · f(g(x)).
• In physics:
– distance-time diagrams, if s(t) is distance travelled by time t,
then v(t) = s′ (t) is speed at time t,
– speed-time diagrams, if v(t) is speed at time t, then a(t) =
v ′ (t) is acceleration at time t.
Remark 1: it is very common to write ṡ(t) instead of s′ (t) and
v̇(t) instead of v ′ (t), i.e. to replace the usual symbol “ ′ ” by a “ ·
” if the variable denotes time.
Remark 2: the letter v for speed is used because often one is not
only interested in the time rate at which an object is moving, but
also the direction. Velocity is the rate and direction of an object’s
movement.
• In economics: if f(x) is a cost/benefit function, depending on the
quantity x, we can ask for optimal values.
9. Given a function f (x), we computed g(x), s.t. f ′ (x) = g(x) and we
called g(x) the derivative of f (x). On the other hand if we consider a
function g(x), and find a function f (x) with f ′ (x) = g(x), then we call
f (x) an antiderivative of g(x). Most authors emphasize the property of
being an antiderivative by using capital letters, e.g. an antiderivative
of f (x) is denoted by F (x). Examples:
Thus we get
$$v_{avg} = \frac{s(t_2) - s(t_1)}{t_2 - t_1}.$$
We can generalize this: given a function f(x) with antiderivative F(x)
and an interval (a, b) ⊂ R, then the average function value $f_{avg}$ in the
non-empty interval (a, b) can be computed as
$$f_{avg} = \frac{F(b) - F(a)}{b - a}.$$
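The formula for the average function value can be compared with a plain average over many sample points. This is a minimal sketch; the function f(x) = 2x with antiderivative F(x) = x^2 and the interval (1, 3) are example choices, not taken from the text.

```python
from math import isclose


def average_value(F, a, b):
    """Average value of f on (a, b), given an antiderivative F of f."""
    if a == b:
        raise ValueError("the interval must be non-empty")
    return (F(b) - F(a)) / (b - a)


def sampled_average(f, a, b, n=1000):
    """Plain average of f at the midpoints of n equal sub-intervals."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) / n


def f(x):
    return 2 * x


def F(x):
    return x * x


exact = average_value(F, 1, 3)       # (9 - 1) / (3 - 1)
sampled = sampled_average(f, 1, 3)   # approximately the same value
```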
12. If F (x) and G(x) are both antiderivatives of f (x), we immediately see
that it makes no difference if we use F (x) or G(x) to calculate the
average function value favg . We already saw if F (x) and G(x) are
antiderivatives, then there is some c ∈ R, s.t. G(x) = F (x) + c. Hence
Formula (1) is one side of the coin of the so-called fundamental the-
orem of calculus, i.e. it states that we can compute the integral of
f (x) by plugging in the upper and the lower bound into an antideriva-
tive and subtracting the results.
Using this notation, the average function value of f(x) between a < b
can be expressed as
$$f_{avg} = \frac{1}{b - a} \int_a^b f(x)\,dx. \qquad (2)$$
As for a given interval (a, b) and a given function f (x), we can interpret
the term (b − a) as length and favg as height, the integral of f (x) is the
(possibly negative) area of the rectangle with side length b − a and the
height favg .
[Figure: graph of f(x) on (a, b) with the height $f_{avg} = \frac{1}{b-a} \int_a^b f(x)\,dx$ marked]
[Figure: graph of f(x) on (a, b)]
We now split the interval (a, b) into smaller and smaller sub-intervals,
all of equal length. In each of those sub-intervals,
we assume we are able to compute the average function value, and we
multiply it by the sub-interval length. favg can then be computed by
taking the average of those sub averages.
[Figures: four graphs of f(x) on (a, b), with red areas over increasingly fine sub-intervals]
The total of the red areas is the same for all four pictures above, it is
equal to
$$(b - a) \cdot f_{avg} = \int_a^b f(x)\,dx.$$
We also notice that by considering smaller and smaller sub-intervals,
the red area more and more accurately describes the (signed) area
between the graph of f(x) and the x-axis. We conclude: the integral of
f(x) from a to b is the signed area between the graph of f(x) and the
x-axis in the interval (a, b).
[Figure: the signed area $\int_a^b f(x)\,dx$ between the graph of f(x) and the x-axis on (a, b)]
• We fix n ∈ N, with n ≥ 1.
• Similarly as before, we split the interval (a, b) into n parts. Each part
has length $h = \frac{b-a}{n}$.
• The left border of the i-th interval, denoted by $x_i$, is given by
$x_i = a + (i - 1) \cdot h$, the right border is given by $a + i \cdot h = x_i + h$,
for i = 1, 2, . . . , n. Note that $a = x_1$ and $b = x_n + h$.
• In each interval $(x_i, x_i + h)$, we can find values $x_{i:min}$ and $x_{i:max}$,
such that $f(x_{i:min})$ is a minimum and $f(x_{i:max})$ is a maximum on
$(x_i, x_i + h)$.
• We compute
$$\sum_{i=1}^{n} f(x_{i:min}) \cdot h \quad \text{and} \quad \sum_{i=1}^{n} f(x_{i:max}) \cdot h,$$
the lower sum and upper sum of f(x) in the interval (a, b).
[Figures: graph of f(x) on (a, b); the subdivision of (a, b) at the points x1, . . . , x8, b; the values f(xi:min) and f(xi:max) on each sub-interval; the lower sum and the upper sum]
As n, the number of sub-intervals, gets larger, the lower sum gets larger
and the upper sum gets smaller. Hence
$$\lim_{n \to \infty} \sum_{i=1}^{n} f(x_{i:min}) \cdot h = \int_a^b f(x)\,dx = \lim_{n \to \infty} \sum_{i=1}^{n} f(x_{i:max}) \cdot h.$$
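Lower and upper sums can be computed for a concrete function. The sketch below assumes that f is increasing on the interval, so that the minimum on each sub-interval sits at its left border and the maximum at its right border; the example f(x) = x^2 on (0, 1), whose integral is 1/3, is a choice made for illustration.

```python
def lower_upper_sums(f, a, b, n):
    """Lower and upper sum of an increasing function f on (a, b).

    For increasing f the minimum on each sub-interval is attained at the
    left border and the maximum at the right border.
    """
    h = (b - a) / n
    lower = sum(f(a + (i - 1) * h) * h for i in range(1, n + 1))
    upper = sum(f(a + i * h) * h for i in range(1, n + 1))
    return lower, upper


def square(x):
    return x * x


# The bounds tighten around 1/3 as n grows.
results = {n: lower_upper_sums(square, 0.0, 1.0, n) for n in (10, 100, 1000)}
lower, upper = results[1000]
```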
15. For constant and linear functions we are already able to compute the
integral:
[Figure: graph of the constant function f(x) = c on (a, b)]
$$\int_a^b f(x)\,dx = (b - a) \cdot c$$
[Figure: graph of the linear function f(x) = kx + d on (a, b)]
o Using (∗) we get
$$g(5) = \int_0^5 3\,dt = (5 - 0) \cdot 3 = 15.$$
o In general:
$$g(x) = \int_0^x 3\,dt = 3x.$$
17. We already know that if we do not only know f (t), but also an an-
tiderivative F (t), we are able to compute the value of
$$g(x) = \int_0^x f(t)\,dt$$
by
g(x) = F (x) − F (0),
where F (0) ∈ R is a constant. An interesting fact is that the derivative
of g(x) is given by the sum rule and the rule for differentiating constants
as
g ′ (x) = F ′ (x) − (F (0))′ = f (x) − 0 = f (x).
In other words, the functions g(x) and f (t) use different variables, but
g(x) is an antiderivative of f (x). We state this important property,
which is the other side of the coin of the fundamental theorem
of calculus. Given f (x), we can compute an antiderivative F (x) by
considering the function
$$F(x) = \int_0^x f(t)\,dt.$$
is an antiderivative of f ′ (x) · g(x). This rule is sometimes called
partial integration or integration by parts. This rule is also often
denoted as
$$\int f' g = f g - \int f g' \quad \text{or} \quad \int f'(x) g(x)\,dx = f(x) g(x) - \int f(x) g'(x)\,dx.$$
Here the notion of an indefinite integral is used, i.e. the symbol
$\int f$ without lower and upper bound indicates we are not interested
in the area between f and the x-axis, but in an antiderivative of
f. Some authors refer to integrals with lower and upper bound as
definite integrals. Example: Let $f(x) = f'(x) = e^x$ and g(x) = x.
Then g′(x) = 1.
$$\int x e^x\,dx = \int e^x \cdot x\,dx = e^x \cdot x - e^0 \cdot 0 - \int e^x \cdot 1\,dx = x e^x - e^x.$$
for example
$$\int x^2 + 2x\,dx = \frac{x^3}{3} + x^2 + c$$
or
$$\int \pi \cos(x) + 2e^x\,dx = \pi \sin(x) + 2e^x + c.$$
[Figure: graphs of f(x) = cos(x) and of its antiderivatives F1(x) = sin(x), F2(x) = sin(x) + 2 and F3(x) = sin(x) + 4]
$$\int f(x)\,dx = \int 4x + 1\,dx = 4 \frac{x^2}{2} + x + c,$$
9 Introduction to probability theory and statistics
1. In statistics, we often consider a (statistical) population from which a
sample is drawn. The sample is always a subset of the population.
Example:
4. Data representation.
(a) In most cases, data is first collected using unsorted lists. In order
to more easily process data, this list is then sorted. The frequency
of a feature can be specified as
• absolute frequency: number of occurrences in the sample,
• relative frequency: number of occurrences in the sample
divided by sample size. This value is often multiplied by 100
to represent a percentage.
Example: Consider the following sample of students who own a
pet:
Student Pet
Ann Cat
Bob Cat
Charly Dog
Diane Cat
Eve Dog
Fatima Dog
Georg Dog
Holger Dog
Irene Cat
Jules Dog
The size of the sample is 10, the absolute frequency of dogs (cats)
is 6 (4), the relative frequency of dogs (cats) is $\frac{6}{10}$ ($\frac{4}{10}$). The
percentage of dog owners in the sample is 60%, the percentage of
cat owners is 40%.
(b) The frequencies can be entered in a table to visualize data.
Feature   absolute frequency   relative frequency   percentage
Cats      4                    4/10                 40%
Dogs      6                    6/10                 60%
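The frequency table can be produced from the unsorted list with a few lines of code. This is a minimal sketch using `collections.Counter`; the list `pets` repeats the Pet column of the sample above, and the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Pet column of the sample, in the order Ann, Bob, ..., Jules
pets = ["Cat", "Cat", "Dog", "Cat", "Dog", "Dog", "Dog", "Dog", "Cat", "Dog"]

n = len(pets)                                     # sample size
absolute = Counter(pets)                          # absolute frequencies
relative = {pet: Fraction(count, n) for pet, count in absolute.items()}
percentage = {pet: float(r * 100) for pet, r in relative.items()}
```

Note that `Fraction` reduces automatically, so 6/10 is stored as 3/5.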
(c) Let us consider the following sample (favourite subjects of stu-
dents):
Subject    Mathematics   Physics   Chemistry   History
Students   79            70        30          19
We now use the following diagrams as a graphical representation
of the sample data:
• a pie chart.
[Figure: pie chart with n = 198: Mathematics 40%, Physics 35%, Chemistry 15%, History 10%]
• a bar chart with absolute numbers or with percentages.
[Figure: bar chart with absolute numbers: History 19, Chemistry 30, Physics 70, Mathematics 79]
[Figure: bar chart with percentages: History 10%, Chemistry 15%, Physics 35%, Mathematics 40%]
(d) The following statistical measures are often used to explore prop-
erties of (large amounts of) data. We will by far not present an
exhaustive list, but rather focus on the most important ones. Tak-
ing a sample, we consider a certain feature and get a sorted list of
values x1 ≤ x2 ≤ · · · ≤ xn .
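• The (arithmetical) mean or mean value is defined as
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.$$
$$x_{geom} = \left( \prod_{i=1}^{n} x_i \right)^{\frac{1}{n}},$$
$$x_{geom} = \left( \prod_{i=1}^{m} x_i^{h_i} \right)^{\frac{1}{n}}.$$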
very small or very large, when compared with the others, may
change the result by a lot. A more robust measure is the
so-called median, which is defined for odd n as
$$\tilde{x} = x_{\frac{n+1}{2}},$$
most real cases will) differ from, the true mean value µ. Thus
in most cases, if we only consider samples which are not the
entire population, dividing by n − 1 gives a more accurate
estimate of the true variance. If of course the sample is the
entire population, we do not need to estimate the variance,
we can calculate it directly, hence we divide by n. The
reason for writing $s^2$ and $\sigma^2$ is that usually the values $x_i$ and
also $\bar{x}$, µ represent some quantities in a certain unit, but this
unit is squared during calculation. E.g. if the $x_i$ represent
meters, then the variance represents square meters. In order
to “convert the variance back” to the original unit, the
so-called (standard) deviation is used, which is just $s = \sqrt{s^2}$ and
$\sigma = \sqrt{\sigma^2}$.
• Example:
Player   A   B   C   D   E   F   G
Goals    8   4   4   1   7   4   7
Sorted list:
x1   x2   x3   x4   x5   x6   x7
1    4    4    4    7    7    8
Arithmetic mean:
$$\bar{x} = \frac{1}{7} (1 + 4 + 4 + 4 + 7 + 7 + 8) = \frac{35}{7} = 5.$$
Geometric mean:
$$x_{geom} = \sqrt[7]{1 \cdot 4 \cdot 4 \cdot 4 \cdot 7 \cdot 7 \cdot 8} = \sqrt[7]{25088} \approx 4.251.$$
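The two means of the goals example can be recomputed as follows. This is a minimal sketch; `math.prod` requires Python 3.8 or newer, and the variable names are illustrative.

```python
from math import isclose, prod

goals = [8, 4, 4, 1, 7, 4, 7]          # goals of the players A to G
sorted_goals = sorted(goals)           # the sorted list x1 <= ... <= x7
n = len(goals)

arithmetic_mean = sum(goals) / n       # 35 / 7
product = prod(goals)                  # 1 * 4 * 4 * 4 * 7 * 7 * 8
geometric_mean = product ** (1 / n)    # 7th root of the product
```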
• Example:
            x1    x2   x3   x4   x5
Value       −10   3    10   13   15
Frequency   5     3    7    2    3
This means in a complete sorted list, the first 5 values are
(−10), the next 3 are 3, and so on.
Arithmetic mean:
$$\bar{x} = \frac{1}{20} (5 \cdot (-10) + 3 \cdot 3 + 7 \cdot 10 + 2 \cdot 13 + 3 \cdot 15) = \frac{100}{20} = 5.$$
Geometric mean: (not applicable)
Median: As 20 is even, we calculate $\frac{20}{2} = 10$ and $\frac{20}{2} + 1 = 11$,
hence we take the 10th and 11th value of the complete list
and calculate their mean. As both these values are equal to
10, we get $\tilde{x} = 10$.
Mode: The value x3 = 10 occurs most often (h3 = 7), hence the mode is 10.
Variance and standard deviation:
$$s^2 = \frac{1740}{19} \approx 91.579.$$
Hence s ≈ 9.57.
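The measures in this example can be verified with Python's `statistics` module. This is a minimal sketch: the frequency table is first expanded into the complete list, and `variance` and `stdev` use the sample versions that divide by n − 1.

```python
from statistics import mean, median, mode, stdev, variance

values = [-10, 3, 10, 13, 15]
frequencies = [5, 3, 7, 2, 3]

# Expand the frequency table into the complete sorted list of 20 values.
data = [v for v, h in zip(values, frequencies) for _ in range(h)]

x_mean = mean(data)
x_median = median(data)     # mean of the 10th and 11th value
x_mode = mode(data)         # value with the highest frequency
s_squared = variance(data)  # sample variance, divides by n - 1
s = stdev(data)             # square root of the sample variance
```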
5. Rolling dice.
• Assume we roll a blue die 12, 1200 and 120000 times. From these
samples, we consider the feature “number of dots on the side that
is facing upwards”. The results may be
Result(12) 1 2 3 4 5 6
Frequency 1 3 1 2 5 0
Percentage 8.3% 25% 8.3% 16.7% 41.7% 0%
Result(1200) 1 2 3 4 5 6
Frequency 178 207 200 210 194 211
Percentage 14.8% 17.25% 16.7% 17.5% 16.2% 17.6%
(Percentages are rounded. How is it possible that they do not sum
up to 100% each time?)
Result(120000) 1 2 3 4 5 6
Frequency 19915 20034 19978 20148 19908 20017
Percentage 16.6% 16.7% 16.5% 16.6% 16.8% 16.7%
• Assume we roll a green die 12, 1200 and 120000 times. Again,
we consider the feature “number of dots on the side that is facing
upwards”. The results may be
Result(12) 1 2 3 4 5 6
Frequency 0 0 8 2 2 0
Percentage 0% 0% 66.7% 16.7% 16.7% 0%
Result(1200) 1 2 3 4 5 6
Frequency 75 49 887 62 66 61
Percentage 6.25% 4.1% 73.9% 5.2% 5.5% 5.1%
Result(120000) 1 2 3 4 5 6
Frequency 5920 5907 90478 5846 5941 5908
Percentage 4.9% 4.9% 75.4% 4.9% 5.0% 4.9%
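Tables like the ones above can be generated by simulation. The sketch below uses a seeded pseudo-random generator; the weights for the second die are a hypothetical choice that makes the outcome 3 occur in about 75% of the rolls, and they are not taken from the text.

```python
import random
from collections import Counter

FACES = [1, 2, 3, 4, 5, 6]


def roll_frequencies(n, weights=None, seed=0):
    """Roll a die n times and return the relative frequency of each face.

    weights=None simulates a fair die; otherwise face i is drawn with
    probability proportional to weights[i - 1].
    """
    rng = random.Random(seed)
    rolls = rng.choices(FACES, weights=weights, k=n)
    counts = Counter(rolls)
    return {face: counts[face] / n for face in FACES}


fair = roll_frequencies(120000)
# Hypothetical biased die: face 3 has probability 15/20 = 0.75.
biased = roll_frequencies(120000, weights=[1, 1, 15, 1, 1, 1])
```

With more rolls, the relative frequencies typically move closer to the underlying probabilities.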
• Die roll: Ω = {1, 2, 3, 4, 5, 6}.
• Rolling three dice: Ω = {(x1 , x2 , x3 ) : xi ∈ {1, 2, 3, 4, 5, 6} for i =
1, 2, 3}.
• Number of coin flips before “Head” was up the first time: Ω =
{1, 2, 3, 4, . . . }.
Subsets of Ω are called events. For the die roll, we could for
example consider the following subsets.
– Ω = {Head, Tail},
– P(Ω) = {∅, {Head}, {Tail}, Ω},
– P(∅) = 0, P({Head}) = $\frac{1}{4}$, P({Tail}) = $\frac{3}{4}$, P(Ω) = 1.
• Coin flip (fair coin), two times:
– Ω = {HH, HT, TH, TT},
– We write down explicitly:
P(Ω) = {∅,
{HH}, {HT}, {TH}, {TT},
{HH, HT}, {HH, TH}, {HH, TT},
{HT, TH}, {HT, TT}, {TH, TT},
{HH, HT, TH}, {HH, HT, TT}, {HH, TH, TT}, {HT, TH, TT},
Ω}.
and P(Ω) = 1 are always true. Moreover for any event B ⊆ Ω, we may
find A1, A2, A3, . . . , such that Ai ∩ Aj = ∅ if i ≠ j and |Ai| = 1, with
A1 ∪ A2 ∪ A3 ∪ · · · = B. As the Ai are pairwise disjoint, we are
able to use the property P(Ai ∪ Aj) = P(Ai) + P(Aj) and so on. In the
previous example, this means we can also define the probability space
by
• Ω = {HH, HT, TH, TT},
• P({HH}) = P({HT}) = P({TH}) = P({TT}) = $\frac{1}{4}$.
The subsets A ⊆ Ω with |A| = 1 are called elementary events or atomic
events.
8. A finite probability space, where each outcome is equally likely is called
Laplace probability space, Laplace space or Laplace model. In this case,
the probability of an event A ⊆ Ω is given by
$$P(A) = \frac{|A|}{|\Omega|} = \frac{\text{“favorable cases”}}{\text{“possible cases”}}.$$
This is sometimes called Laplace’s rule. Examples:
• Randomly drawing a card from a (standard) deck of 52 cards, what
is the probability of the following events?
– A := {an ace is drawn}: $P(A) = \frac{4}{52} = \frac{1}{13}$.
– B := {a red card is drawn}: $P(B) = \frac{26}{52} = \frac{1}{2}$.
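Laplace's rule amounts to counting. The following sketch models the deck as pairs of rank and suit; the rank and suit labels and the function name are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]

# Sample space: all 52 cards as (rank, suit) pairs
deck = list(product(RANKS, SUITS))


def laplace_probability(event, omega):
    """P(A) = |A| / |Omega| for equally likely outcomes."""
    return Fraction(len(event), len(omega))


aces = [card for card in deck if card[0] == "A"]
reds = [card for card in deck if card[1] in ("hearts", "diamonds")]

p_ace = laplace_probability(aces, deck)
p_red = laplace_probability(reds, deck)
```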
We also expect the model to (at least roughly) describe reality, hence we
expect that by repeatedly executing the (real life) random experiment,
the (real life) relative frequency (percentage) of a certain event tends
more and more to the (theoretical) probability of that event. This is
sometimes called the law of large numbers: If hn(A) is the relative
frequency of an event A ⊆ Ω after n repetitions, then
hn(A) → P(A) as n → ∞.
If we compare this to the 12, 1200 and 120000 rolls of the blue and green
die above, we see that the blue die can be modelled by a Laplace space,
the green die (most likely) not. A reason for this might be that the
green die is biased or unfair. (Why can’t we be sure it is unfair?) If
the green die is unfair, then some outcomes might be more likely
than others.
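The behaviour of relative frequencies can be explored with a small simulation. The sketch below simulates a fair die with Python's pseudo-random generator, which is an assumption on our side; it does not reproduce the data of the blue or green die. The function name relative_frequency is hypothetical.

```python
import random


def relative_frequency(event, n, rng):
    """Roll a simulated fair die n times and return the share of rolls in event."""
    hits = sum(1 for _ in range(n) if rng.randint(1, 6) in event)
    return hits / n


rng = random.Random(0)  # fixed seed so that the run is repeatable
for n in (12, 1200, 120000):
    # For a fair die the values should approach 1/6 as n grows.
    print(n, relative_frequency({6}, n, rng))
```

For small n the relative frequency can be far from 1/6 even for a fair die, which is one reason why a finite number of rolls cannot prove that a die is unfair.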
[Figure: dart board with radius 2 and a blue circle with radius 1]
We assume that a player hits the board with each dart he throws and that
each point on the dart board is equally likely to be hit. How probable
is it that a dart hits a point within the blue area? We compute
the total area of the dart board |Ω| = 2^2 · π and the area of the blue
circle |A| = 1^2 · π, thus
P(A) = |A| / |Ω| = π / (4π) = 1/4.
We can assume that the lower left point has the coordinates (0, 0), and
we set Ω = [0, 1]^2. The set of points (x, y) ∈ Ω with distance
greater than one to the lower left point is
A = {(x, y) ∈ Ω : x^2 + y^2 > 1}.
Thus we get
P(A^C) = green area / area of the square = (1/4 · r^2 · π) / (1 · 1) = (1/4 · 1^2 · π) / 1 = π/4.
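The value π/4 for the green quarter circle can be checked numerically by drawing many uniformly distributed points in the unit square and counting how many have distance at most one to the lower left corner. This is only a Monte Carlo sketch; the function name and the number of points are our own choices.

```python
import math
import random


def estimate_quarter_circle(n, rng):
    """Share of n uniform points in [0, 1]^2 with distance <= 1 to (0, 0)."""
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside / n


estimate = estimate_quarter_circle(200_000, random.Random(0))
print(estimate, math.pi / 4)  # the two values should be close
```

The estimate is a relative frequency, so it only approximates the exact area ratio.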
11. Repeated random experiments (with finite sample space) can be vi-
sualized using tree diagrams. For the following example, we consider
drawing 2 balls from an urn, which contains 5 red and 3 green balls. If
we put balls we have drawn back into the urn, we get
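Start
├─ 5/8 → R
│   ├─ 5/8 → RR
│   └─ 3/8 → RG
└─ 3/8 → G
    ├─ 5/8 → GR
    └─ 3/8 → GG
If we instead remove the balls we have drawn, we get
Start
├─ 5/8 → R
│   ├─ 4/7 → RR
│   └─ 3/7 → RG
└─ 3/8 → G
    ├─ 5/7 → GR
    └─ 2/7 → GG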
If we now want to find the probability of a leaf of the tree, e.g. the
event {RR}, we multiply the numbers on the edges along the path from
the root to the corresponding leaf, in the case of {RR} we have
• P({RR}) = 5/8 · 5/8 = 25/64 in the case where we put balls back into the
urn and
• P({RR}) = 5/8 · 4/7 = 5/14 in the case where we remove balls we have
drawn.
This rule is sometimes called the multiplication rule.
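The multiplication along a path can be written as a short program. The sketch below computes the probabilities of all four leaves for an urn with 5 red and 3 green balls, with and without putting balls back; the function name leaf_probabilities and its parameters are our own assumptions.

```python
from fractions import Fraction


def leaf_probabilities(red, green, replace):
    """Probabilities of the leaves RR, RG, GR, GG when drawing two balls."""
    leaves = {}
    for first in "RG":
        counts = {"R": red, "G": green}
        total = red + green
        p_first = Fraction(counts[first], total)
        if not replace:
            # The drawn ball is removed, so the urn changes for the second draw.
            counts[first] -= 1
            total -= 1
        for second in "RG":
            p_second = Fraction(counts[second], total)
            # Multiply the edge probabilities along the path.
            leaves[first + second] = p_first * p_second
    return leaves


with_replacement = leaf_probabilities(5, 3, replace=True)
without_replacement = leaf_probabilities(5, 3, replace=False)
print(with_replacement["RR"])     # 25/64
print(without_replacement["RR"])  # 5/14
```

In both cases the four leaf probabilities add up to 1, as they must for a complete tree.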
12. If we are interested in an event that corresponds to more than one leaf,
we add up the corresponding probabilities. E.g. if we are interested in
the event {one green and one red ball is drawn} = {RG, GR}, we have
• P({RG, GR}) = P({RG}) + P({GR}) = 5/8 · 3/8 + 3/8 · 5/8 = 15/32 in the
case where we put balls back into the urn and
• P({RG, GR}) = P({RG}) + P({GR}) = 5/8 · 3/7 + 3/8 · 5/7 = 15/28 in the
case where we remove balls we have drawn.
This rule is called the addition rule. In fact, as the leaves are mutually
exclusive events, this rule directly follows from P(A∪B) = P(A)+P(B)
when A ∩ B = ∅.
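The result for drawing without putting balls back can be cross-checked by counting instead of using the tree. If we number the 8 balls, every ordered pair of two different balls is equally likely, so we obtain a Laplace space. The following sketch uses this idea; numbering the balls is our own modelling assumption.

```python
from fractions import Fraction
from itertools import permutations

balls = ["R"] * 5 + ["G"] * 3  # balls 0..4 are red, balls 5..7 are green

# Sample space: all ordered draws of two different balls (by index).
omega = list(permutations(range(len(balls)), 2))

# Favorable cases: the two drawn balls have different colours (RG or GR).
favorable = [(i, j) for (i, j) in omega if balls[i] != balls[j]]

p_mixed = Fraction(len(favorable), len(omega))
print(p_mixed)  # 15/28
```

There are 8 · 7 = 56 possible cases and 5 · 3 + 3 · 5 = 30 favorable cases, which matches the sum of the two leaf probabilities.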
13. In the case where we remove balls we have drawn, the experiment
changes at each level of the tree. In fact, we can also visualize random
experiments that consist of different sub-experiments. E.g. we flip a
coin; if we get “Head”, we roll a die; if we get “Tail”, we flip a different
coin.
Start
├─ 1/2 → H
│   ├─ 1/6 → H1
│   ├─ 1/6 → H2
│   ├─ 1/6 → H3
│   ├─ ...
│   └─ 1/6 → H6
└─ 1/2 → T
    ├─ 1/2 → TH
    └─ 1/2 → TT
occurred. What do we know about the first coin flip? We notice that
if we know some event occurred, this might influence the probability
of other events to occur. This is known as conditional probability. If
A, B ⊆ Ω are events, then
P(A|B) = P(A ∩ B) / P(B)
is the probability of A given B. This is sometimes called the probability
of A under the condition B. The formula above is sometimes used to
determine the probability of the intersection:
P(A ∩ B) = P(B) · P(A|B),
i.e. if P(A|B) and P(B) are given, we are able to determine P(A ∩ B).
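The definition of P(A|B) can be tried out on a small Laplace space. As a hypothetical example that does not appear in these notes, we take a fair die with A = {even number} and B = {number greater than 3}. The helper names are our own.

```python
from fractions import Fraction


def probability(event, omega):
    """Laplace probability |event| / |omega|."""
    return Fraction(len(event), len(omega))


def conditional_probability(a, b, omega):
    """P(A|B) = P(A ∩ B) / P(B); only defined if P(B) > 0."""
    return probability(a & b, omega) / probability(b, omega)


omega = {1, 2, 3, 4, 5, 6}
a = {2, 4, 6}  # even number
b = {4, 5, 6}  # number greater than 3

p_a_given_b = conditional_probability(a, b, omega)
print(p_a_given_b)  # 2/3

# Recover the probability of the intersection from P(B) and P(A|B).
p_intersection = probability(b, omega) * p_a_given_b
print(p_intersection)  # 1/3
```

Knowing that B occurred raises the probability of A from 1/2 to 2/3 in this example.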
[Figure: Ω split into B1 and B2, with the regions A ∩ B1 and A ∩ B2 marked]
17. If A, B ⊆ Ω, we have P(A ∩ B) = P(B) · P(A|B) and P(B ∩ A) =
P(A) · P(B|A). As A ∩ B = B ∩ A, and thus P(A ∩ B) = P(B ∩ A), we
get Bayes' law
P(B|A) = P(B) · P(A|B) / P(A).
Example: A company knows that the probability that a file they receive
via mail is infected by a virus is 0.1 (10%). Their anti-virus software
detects a virus in 99% of the cases, but the software also falsely detects
a virus in a file that is not infected in 2 out of 100 cases (2%). How
likely is it that a file that got reported by the anti-virus software really
is infected?
We denote the events
• V := {mail contains a virus} and
• V^C := {mail does not contain a virus}.
We have P(V) = 0.1, hence P(V^C) = 1 − P(V) = 0.9. Furthermore we
denote the event R := {a mail gets reported}. We know P(R|V) = 0.99
and P(R|V^C) = 0.02. We want to know P(V|R). By the law of total
probability, we determine P(R):
P(R) = P(V) · P(R|V) + P(V^C) · P(R|V^C)
= 0.1 · 0.99 + 0.9 · 0.02
= 0.117.
Hence, by Bayes' law, we have
P(V|R) = P(V) · P(R|V) / P(R)
= (0.1 · 0.99) / 0.117
≈ 0.846.
Hence the probability that a reported file is infected is around 84.6%.
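The two steps of this computation, the law of total probability followed by Bayes' law, can be retraced in a few lines. The sketch uses the values P(V) = 0.1, P(R|V) = 0.99 and P(R|V^C) = 0.02 from the computation above; the function and variable names are our own.

```python
def total_probability(p_v, p_r_given_v, p_r_given_not_v):
    """P(R) = P(V) * P(R|V) + P(V^C) * P(R|V^C)."""
    return p_v * p_r_given_v + (1 - p_v) * p_r_given_not_v


def bayes(p_v, p_r_given_v, p_r):
    """P(V|R) = P(V) * P(R|V) / P(R)."""
    return p_v * p_r_given_v / p_r


p_v = 0.1               # probability that a file is infected
p_r_given_v = 0.99      # detection rate for infected files
p_r_given_not_v = 0.02  # false alarm rate for clean files

p_r = total_probability(p_v, p_r_given_v, p_r_given_not_v)
p_v_given_r = bayes(p_v, p_r_given_v, p_r)
print(round(p_r, 3), round(p_v_given_r, 3))  # 0.117 0.846
```

Changing the false alarm rate in this sketch shows how strongly P(V|R) depends on it.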
18. Many random experiments describe a situation, where each outcome is
again assigned a real number. Examples:
• Die roll, each outcome is assigned the number of dots on the side
that is facing upwards.
• Simple game: we flip a coin. If we get “Head”, you win 1 Euro; if
we get “Tail”, I win 1 Euro, which means you lose 1 Euro.
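[Figure: F(x) plotted against x, with x-axis ticks −1 and 1]
[Figure: F(x) plotted against x, with y-axis ticks 1/6, 1/3, 1/2, 2/3, 5/6, 1 and x-axis ticks 1 to 6]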
20. If we play the coin flip game often, we expect that in the long run, we
will neither win nor lose money, i.e. we expect that in around 50%
of the cases we win 1 Euro and in 50% of the cases we lose 1 Euro,
hence on average we win/lose 0 Euro. Following this idea, we define
the expectation or expected value of a random variable X with values
xi ∈ D as
E(X) = Σ_{xi ∈ D} P(X = xi) · xi.
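The definition of the expected value translates directly into a sum over the values of X. The following sketch stores a distribution as a dictionary that maps each value xi to P(X = xi); this representation and the function name are our own choices. It evaluates the coin flip game and, as a second example, the roll of a fair die.

```python
from fractions import Fraction


def expectation(distribution):
    """E(X) = sum of P(X = x) * x over all values x of X."""
    return sum(p * x for x, p in distribution.items())


# Coin flip game: win 1 Euro or lose 1 Euro, each with probability 1/2.
coin_game = {1: Fraction(1, 2), -1: Fraction(1, 2)}

# Fair die: the values 1, ..., 6, each with probability 1/6.
die = {k: Fraction(1, 6) for k in range(1, 7)}

print(expectation(coin_game))  # 0
print(expectation(die))        # 7/2
```

The value 0 for the coin flip game agrees with the intuition that in the long run we neither win nor lose.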
Examples:
22. In many cases, we are not interested in the actual result of a random
experiment, but in the realization of a corresponding random variable.
Consider the following game: we roll a die and we lose 1 Euro if “5” is
rolled, but we gain 1 Euro if any other number is rolled. Now, we might
not be interested in whether the other number is “1” or “2” etc.; we just want
to know if we win or lose, i.e. we are only interested in the realization
of the random variable
X(ω) = −1 if ω = 5, and X(ω) = 1 otherwise.
A random experiment with only two options (e.g. win or lose) is called a
Bernoulli experiment. If the probability of winning is p ∈ [0, 1], then
the probability of losing is q = 1 − p. If we repeat the same Bernoulli
experiment n times, then we might be interested in the probability of
winning k times. Examples: (we consider again the game described
above, hence p = 5/6)
• If we roll the die only once, i.e. n = 1, then the probability of
winning k = 0 times is
P(X = 0) = 1 · p^0 · q^(1−0) = 1 · p^0 · (1 − p)^(1−0) = 1/6.
The probability of winning k = 1 time is
P(X = 1) = 1 · p^1 · q^(1−1) = 1 · p^1 · (1 − p)^0 = 5/6.
• If we roll the die n = 2 times, then the probability of winning
k = 0 times is
P(X = 0) = 1 · p^0 · q^(2−0) = 1 · p^0 · (1 − p)^(2−0) = 1/36.
(We have to lose both games.)
The probability of winning k = 1 time is
P(X = 1) = 2 · p^1 · q^(2−1) = 2 · p^1 · (1 − p)^1 = 2 · 5/36 = 10/36.
(We can either win in the first or the second die roll, but not
both.)
The probability of winning k = 2 times is
P(X = 2) = 1 · p^2 · q^(2−2) = 1 · p^2 · (1 − p)^0 = 25/36.
(We have to win both games.)
• If we roll the die n = 3 times, then the probability of winning
k = 0 times is
P(X = 0) = 1 · p^0 · q^(3−0) = 1 · p^0 · (1 − p)^(3−0) = 1/216.
(We have to lose all three games.)
The probability of winning k = 1 time is
P(X = 1) = 3 · p^1 · q^(3−1) = 3 · p^1 · (1 − p)^2 = 3 · 5/216 = 15/216.
(We can either win in the first or the second or the third die roll,
but not in more than one.)
The probability of winning k = 2 times is
P(X = 2) = 3 · p^2 · q^(3−2) = 3 · p^2 · (1 − p)^1 = 3 · 25/216 = 75/216.
(We can either lose in the first or the second or the third die roll,
but we have to lose exactly one time.)
The probability of winning k = 3 times is
P(X = 3) = 1 · p^3 · q^(3−3) = 1 · p^3 · (1 − p)^0 = 125/216.
(We have to win all three games.)
• E(X) = np,
• V(X) = np(1 − p).
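The probabilities in the examples above and the two formulas E(X) = np and V(X) = np(1 − p) can be verified for n = 3 and p = 5/6. The sketch assumes that the leading factors 1, 3, 3, 1 in the examples are the binomial coefficients “n choose k”; the function name binomial_pmf is our own.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, k, p):
    """Probability of exactly k wins in n independent Bernoulli experiments."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 3
p = Fraction(5, 6)
pmf = {k: binomial_pmf(n, k, p) for k in range(n + 1)}

# Expected value and variance computed directly from the distribution.
mean = sum(k * prob for k, prob in pmf.items())
variance = sum((k - mean) ** 2 * prob for k, prob in pmf.items())

print(pmf[2])    # 25/72, i.e. 75/216 in lowest terms
print(mean)      # 5/2, equal to n * p
print(variance)  # 5/12, equal to n * p * (1 - p)
```

Fraction reduces results automatically, so 75/216 is printed as 25/72.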