
Errata

(Mathematical Introduction to Data Science by Sven A. Wegner)


May 1, 2025

• Page 8, Line −4:
  \[ \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \bar{y}\sum_{i=1}^{n} x_i - \bar{x}\sum_{i=1}^{n} y_i + n\bar{x}\bar{y} \]

• Page 9, Line −3:
  \[ 0 = \sum_{i=1}^{n} (a x_i + b - y_i) = a\sum_{i=1}^{n} x_i + nb - \sum_{i=1}^{n} y_i = a n\bar{x} + nb - n\bar{y}, \]

• Page 13, Line 10: . . . constant random variable av.


• Page 13, Line −5: since $\overline{x_{(n)}^{2}} = \operatorname{var}(x_{(n)}) + \overline{x_{(n)}}^{\,2}$ as the sum . . .
• Page 15, Line −9: $\cdots + \frac{2}{n^2}\sum_{i<j} x_i x_j\,\mathrm{E}(E_i)\,\mathrm{E}(E_j)$
• Page 16, Line 10: . . . , i.e., $f^*(x) = \langle a^*, x\rangle + b^*$ . . .
• Page 14, Line −2: If the latter is the case, then $\operatorname{sign}(r_{xy}) = \operatorname{sign}(\langle u, v\rangle) = \cdots$
• Page 17, Line 13: . . . we calculate (with, just for now, $\varphi(\tilde{a}) = \langle \tilde{a}, X^T X\tilde{a}\rangle$):
• Page 23, Line 9: In the picture it must be the z-axis.
• Page 23, Line −3: . . . and $(w, b) = (w_1, \dots, w_d, b)$ for . . .
• Page 24, Line 16: $\mathrm{P}\bigl(Y_i(f) = y_i \text{ for all } i\bigr) = \cdots$
• Page 24, Line 18: $L\colon \bigl\{ f\colon \mathbb{R}^d \to (0,1) \mid f \text{ logistic function} \bigr\} \to \mathbb{R}$
• Page 26, Line 13: $h'(t) = \operatorname{sig}(t) + 1$


• Page 27, Line 18: . . . the data overlaps if for every $w \in \mathbb{R}^{d+1}\setminus\{0\}$ there exists . . .
• Page 27, Line 22: . . . , the rounded logistic regressor, . . .
• Page 27, Line −10: if $\operatorname{sig}(\langle w, x\rangle) \geq 1/2$
• Page 37, Line 1:
  1: function k-NN Classifier(D, k, x)
  2:   D′ ← D, A ← ∅
  3:   for j ← 1 to k do
  4:     z∗ ← argmin_{z ∈ D′} ρ(x, π1(z))
  5:     A ← A ∪ {z∗}, D′ ← D′ \ {z∗}
  6:   for y in Y do
  7:     N(y) ← #{a ∈ A | π2(a) = y}
  8:   ℓ ← argmax_{y ∈ Y} N(y)
  9:   return ℓ
  Here, π2(x, y) = y denotes the projection onto the second entry of (x, y) ∈ D, and y∗ is the label of z∗.
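For illustration, a minimal Python sketch of the same classifier (not from the book; the list-of-pairs data format, the name knn_classify and the distance function rho are assumptions):

    from collections import Counter

    def knn_classify(data, k, x, rho):
        """Sketch of the k-NN classifier above: data is a list of (point, label)
        pairs, rho a distance function; returns the majority label among the k
        points closest to x (ties broken arbitrarily, as in the pseudocode)."""
        nearest = sorted(data, key=lambda z: rho(x, z[0]))[:k]   # k nearest pairs
        votes = Counter(label for _, label in nearest)           # N(y) for each label y
        return votes.most_common(1)[0][0]

    # toy usage with a 1-d Euclidean distance
    rho = lambda a, b: abs(a - b)
    print(knn_classify([(0.0, "A"), (1.0, "A"), (5.0, "B")], k=2, x=0.4, rho=rho))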
• Page 39, Line 9: . . . and k < n.
• Page 39, Line 10: The calculation of the k-nearest neighbors of x can be implemented such that at most (n · d · k)-many multiplications have to be carried out.
• Page 39, Line 14: In the Euclidean metric, it requires (d − 1)-many multiplications to compute one distance if we omit the root, which we can do as it does not change the argmin. This leads to
  \[ (d-1)\cdot\bigl(n + (n-1) + \cdots + (n-k+1)\bigr) \le C\cdot d\cdot k\cdot n \]
  multiplications with a suitable $C \in \mathbb{N}$.
• Page 40, Line 3: . . . , we choose k-nearest neighbors $x_1, \dots, x_k$ of x and denote their labels by $y_1, \dots, y_k$.
• Page 40, Line 9:
  \[ f\colon X \to Y,\qquad f(x) = \frac{\sum_{i=1}^{k} w(x_i, x)\cdot y_i}{\sum_{i=1}^{k} w(x_i, x)} \]
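As a small illustration of this weighted average (not from the book; the inverse-distance weight below is just one possible choice for w):

    def weighted_knn_predict(neighbors, labels, x, w):
        """Weighted k-NN value: (sum_i w(x_i, x) * y_i) / (sum_i w(x_i, x))."""
        weights = [w(xi, x) for xi in neighbors]
        return sum(wi * yi for wi, yi in zip(weights, labels)) / sum(weights)

    # toy usage: neighbors of x = 1.5 with inverse-distance weights
    w = lambda xi, x: 1.0 / (abs(xi - x) + 1e-9)
    print(weighted_knn_predict([1.0, 2.0, 4.0], [10.0, 20.0, 40.0], x=1.5, w=w))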
• Page 41, Line 5:
  \[ \tilde{x}^{(i)} = \Bigl( a + \frac{\bigl(x^{(i)}_1 - \min_{j=1,\dots,n} x^{(j)}_1\bigr)(b-a)}{\max_{j=1,\dots,n} x^{(j)}_1 - \min_{j=1,\dots,n} x^{(j)}_1},\ \dots \Bigr) \]
• Page 41, Line 7:
  \[ \tilde{x}^{(i)} = \Bigl( \frac{x^{(i)}_1 - x^{(\cdot)}_1}{\sigma_1},\ \dots \Bigr) \]
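Both rescalings are easy to state in code; the following NumPy sketch applies them column-wise (illustrative only, the function names are not from the book):

    import numpy as np

    def minmax_scale(X, a=0.0, b=1.0):
        """Column-wise min-max scaling onto the interval [a, b]."""
        lo, hi = X.min(axis=0), X.max(axis=0)
        return a + (X - lo) * (b - a) / (hi - lo)

    def standardize(X):
        """Column-wise standardization (subtract mean, divide by standard deviation)."""
        return (X - X.mean(axis=0)) / X.std(axis=0)

    X = np.array([[1.0, 10.0], [2.0, 20.0], [4.0, 40.0]])
    print(minmax_scale(X))
    print(standardize(X))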
• Page 42, Line 14: $\rho(x^{(1)}, x^{(4)}) = 3.681$
• Page 46, Line 1: We discuss some of these methods in Exercise 3.10.
• Page 45, Line 28: We thus see that text no. 1 and text no. 2 are significantly more cosine similar than text no. 1 and text no. 3 or text no. 2 and text no. 3.
• Page 48, Line 26: The cosine distance, on the other hand, may appear more natural here, as the scalar product increases if the frequency of the fixed word increases in the second text.
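For reference, the cosine similarity behind these comparisons as a short NumPy sketch (the three count vectors are made-up stand-ins for the texts discussed in the book):

    import numpy as np

    def cosine_similarity(u, v):
        """Cosine of the angle between u and v: <u, v> / (||u|| * ||v||)."""
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    t1 = np.array([3, 0, 1, 2])   # hypothetical word counts for text no. 1
    t2 = np.array([2, 0, 1, 3])   # text no. 2: similar word profile
    t3 = np.array([0, 4, 0, 1])   # text no. 3: different word profile
    print(cosine_similarity(t1, t2), cosine_similarity(t1, t3))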
• Page 52, Line 11: For finite subsets A, B ⊆ X we define . . .
• Page 53, Line 20:
  1: function Linkage-based Clustering(X, ρ, D, δ)
  2:   k ← #D
  3:   for i ← 1 to k do
  4:     Ci ← {xi}
  5:   while min_{i≠j} ρ(Ci, Cj) ≤ δ and k ≥ 2 do
  6:     m ← 0
  7:     (i∗, j∗) ← argmin_{i≠j} ρ(Ci, Cj)
  8:     for ℓ ← 1 to k − 1 do
  9:       if ℓ = min(i∗, j∗) then
  10:        Cℓ ← Ci∗ ∪ Cj∗
  11:      if ℓ = max(i∗, j∗) then
  12:        m ← 1
  13:        Cℓ ← Cℓ+m
  14:      else
  15:        Cℓ ← Cℓ+m
  16:    k ← k − 1
  17:  return C1, . . . , Ck
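A plain Python sketch of the same merging loop, using single linkage (minimum pairwise point distance) as the cluster distance ρ; the function name and the choice of linkage are assumptions, not the book's code:

    def linkage_clustering(points, rho, delta):
        """Merge the two closest clusters while their distance is <= delta
        and at least two clusters remain (single-linkage cluster distance)."""
        clusters = [[p] for p in points]

        def cluster_dist(A, B):
            return min(rho(a, b) for a in A for b in B)

        while len(clusters) >= 2:
            pairs = [(cluster_dist(clusters[i], clusters[j]), i, j)
                     for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
            d, i, j = min(pairs)
            if d > delta:
                break
            clusters[i] = clusters[i] + clusters[j]   # merge cluster j into cluster i
            del clusters[j]
        return clusters

    rho = lambda a, b: abs(a - b)
    print(linkage_clustering([0.0, 0.4, 5.0, 5.3, 9.0], rho, delta=1.0))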
• Page 54, Line −6: $K\colon C^k \to \mathbb{R}$
• Page 56, Line −16: The following pseudocode approximates a minimizer of the k-means cost function.
• Page 56, Pseudocode:
  1: function k-means(D, k, X, ρ)
  2:   µ1, . . . , µk ← pairwise different points from X
  3:   for i ← 1 to k do
  4:     Ci ← {x ∈ D | i ∈ argmin_{j=1,...,k} ρ(x, µj)}
  5:   U ← True
  6:   while U = True do
  7:     U ← False
  8:     for i ← 1 to k do
  9:       µi ← µ(Ci)
  10:    for i ← 1 to k do
  11:      C′i ← {x ∈ D | i ∈ argmin_{j=1,...,k} ρ(x, µj)}
  12:      if C′i ≠ Ci then
  13:        Ci ← C′i
  14:        U ← True
  15:  return C1, . . . , Ck
  In lines 4 and 9 of the pseudocode we pick a single i in the case that the argmin is not unique.
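The same alternation in a compact NumPy sketch (Lloyd's algorithm with Euclidean distance; taking the first k distinct points as initial centers is just one of several possible choices):

    import numpy as np

    def k_means(D, k, max_iter=100):
        """Alternate between assigning points to the nearest center and
        recomputing each center as the mean of its cluster."""
        centers = np.unique(D, axis=0)[:k].astype(float)
        for _ in range(max_iter):
            # assignment step: index of the nearest center for every point
            dists = np.linalg.norm(D[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # update step: new centers as cluster means (keep old center if a cluster is empty)
            new_centers = np.array([D[labels == i].mean(axis=0) if np.any(labels == i)
                                    else centers[i] for i in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        return labels, centers

    D = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
    print(k_means(D, 2))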
• Page 57, Line 7:
  \[ \mu(A) \in \operatorname*{argmin}_{\mu\in A} \sum_{x\in A} \rho(x,\mu)^2, \quad\text{respectively}\quad \mu(A) \in \operatorname*{argmin}_{\mu\in X} \sum_{x\in A} \rho(x,\mu). \]
• Page 57, Line −7: For $j \ge 1$ denote by $(C_1^{(j)}, \dots, C_k^{(j)})$ that clustering which the algorithm produces in the j-th round. For $j \ge 2$ we have
  \[ K\bigl(C_1^{(j)}, \dots, C_k^{(j)}\bigr) = \min_{\mu_1,\dots,\mu_k\in X} \sum_{i=1}^{k} \sum_{x\in C_i^{(j)}} \rho(x, \mu_i)^2 \]
• Page 58, Line 3: . . . line 11 of Algorithm 4.9 . . .
• Page 58, Line 7: There, Picture 1b corresponds to the penultimate line in the estimate and Picture 2a to the line above that.
• Page 58, Line 10: . . . as we have just moved point $x_3$ from cluster $C_2$ in Figure 1b to cluster $C_1$ in Figure 2a, . . .
• Page 59, Line −4: We assume that we start with the initial values $\mu_1 = 2$ and . . .
• Page 62, Line 11: $A = (a_{ij})_{i,j=1,\dots,n}$
• Page 62, Line 13: $L = (\ell_{ij})_{i,j=1,\dots,n}$
• Page 63, Line −2: . . . In Example 5.7, $\lambda_2 \neq 0$ and there are no clusters (or, depending on how one prefers to see it, one single cluster), in Example 5.8 . . .
• Page 67, Line 8: For the other direction let $\{v_1, \dots, v_n\}$ be a basis consisting of eigenvectors corresponding to the $\lambda_i$ and let $U \subseteq \mathbb{R}^n$ be a subspace with $\dim U = n - k + 1$. By construction $U \cap \operatorname{span}\{v_1, \dots, v_k\} \neq \{0\}$ and we can select $0 \neq x = \alpha_1 v_1 + \cdots + \alpha_k v_k \in U$. Then it follows
  \[ \frac{\langle x, Mx\rangle}{\langle x, x\rangle} = \frac{\sum_{i=1}^{k}\lambda_i\alpha_i^2}{\sum_{i=1}^{k}\alpha_i^2} \le \frac{\sum_{i=1}^{k}\lambda_k\alpha_i^2}{\sum_{i=1}^{k}\alpha_i^2} = \lambda_k, \]
  since the $\lambda_i$'s are increasing. With this we get $\min_{0\neq x\in U} \frac{\langle x, Mx\rangle}{\langle x, x\rangle} \le \lambda_k$, which then leads to
  \[ \max_{\substack{U\subseteq\mathbb{R}^n \\ \dim U = n-k+1}}\ \min_{\substack{x\in U \\ x\neq 0}} \frac{\langle x, Mx\rangle}{\langle x, x\rangle} \le \lambda_k. \]
• Page 67, Line −4: . . . and any corresponding eigenvector $v_2$ of norm 1:
• Page 68, Line 14: Let G = (V, E) be a graph with deg(v) > 0 for all v ∈ V.
• Page 70, Line 10:
  \[ \lambda_2(\mathcal{L}) = \min_{\substack{x\neq 0 \\ \langle Dx, \mathbf{1}\rangle = 0}} \frac{\sum_{\{i,j\}\in E}(x_i - x_j)^2}{\sum_{i=1}^{n} x_i^2 d_i}. \]
• Page 70, Line 16:
  \[ \mathcal{L} D^{1/2}\mathbf{1} = D^{-1/2} L D^{-1/2} D^{1/2}\mathbf{1} = D^{-1/2} L\mathbf{1} \overset{\text{Prop.\ 5.6}}{=} D^{-1/2}\, 0\cdot\mathbf{1} = 0\cdot D^{1/2}\mathbf{1}. \]
• Page 70, Line 18:
  \[ \lambda_2(\mathcal{L}) \overset{\text{Thm.\ 5.13}}{=} \min_{\substack{x\neq 0 \\ \langle x, D^{1/2}\mathbf{1}\rangle = 0}} \frac{\langle x, \mathcal{L}x\rangle}{\langle x, x\rangle} \overset{(*)}{=} \min_{\substack{y\neq 0 \\ \langle D^{1/2}y, D^{1/2}\mathbf{1}\rangle = 0}} \frac{\langle D^{1/2}y, \mathcal{L}D^{1/2}y\rangle}{\langle D^{1/2}y, D^{1/2}y\rangle} = \min_{\substack{y\neq 0 \\ \langle y, D\mathbf{1}\rangle = 0}} \frac{\langle y, Ly\rangle}{\langle y, Dy\rangle} = \min_{\substack{y\neq 0 \\ \langle Dy, \mathbf{1}\rangle = 0}} \frac{\sum_{\{i,j\}\in E}(y_i - y_j)^2}{\sum_{i=1}^{n} y_i^2 d_i} \]
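These identities are easy to check numerically; the following NumPy sketch builds L = D − A and the normalized Laplace matrix for a small example graph and reads off the second smallest eigenvalue (the graph is made up for illustration):

    import numpy as np

    # adjacency matrix of a small graph on 4 vertices (illustrative example)
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    d = A.sum(axis=1)                      # degrees d_i
    D = np.diag(d)
    L = D - A                              # (unnormalized) Laplace matrix
    L_norm = np.diag(d**-0.5) @ L @ np.diag(d**-0.5)   # normalized Laplace matrix

    eigvals = np.sort(np.linalg.eigvalsh(L_norm))
    lambda_2 = eigvals[1]                  # second smallest eigenvalue
    print(lambda_2)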
• Page 71, Line 14: (ii) $\min(\operatorname{vol} S_k, \operatorname{vol} S_k^c) = \operatorname{vol} S_k^c$ and $\operatorname{vol} S_k^c - \operatorname{vol} S_{k+1}^c = d_{k+1}$ hold whenever $r \le k \le n - 1$.
• Page 71, Line 17:
  \[ \operatorname{vol} S_k^c - \operatorname{vol} S_{k+1}^c = \sum_{i=k+1}^{n} d_i - \sum_{i=k+2}^{n} d_i = d_{k+1} \]
• Page 72, Line 9:
  \[ \langle Dx, \mathbf{1}\rangle = \sum_{i=1}^{n} d_i x_i = \dots \]
• Page 72, Line −1 (and Page 73, Line 1):
  \[ \cdots = \operatorname{vol} S - \operatorname{vol} S \cdot \frac{\operatorname{vol} S}{\operatorname{vol} S + \operatorname{vol} S^c} \ge \operatorname{vol} S - \operatorname{vol} S \cdot \frac{\operatorname{vol} S}{2\operatorname{vol} S}, \]
• Page 73, Line 9: . . . and our goal in the following will be to show $\lambda_2 \ge \alpha^2/2$, . . .
• Page 73, Line 12 (Equation (5.2)):
  \[ \cdots \text{ and } \langle Dx, \mathbf{1}\rangle = \sum_{i=1}^{n} d_i x_i = 0 \]
• Page 73, Line 21:
  \[ \begin{pmatrix} x_1 - x_r \\ \vdots \\ x_{r-1} - x_r \\ 0 \\ x_{r+1} - x_r \\ \vdots \\ x_n - x_r \end{pmatrix} = \begin{pmatrix} x_1 - x_r \\ \vdots \\ x_{r-1} - x_r \\ 0 \\ \vdots \\ 0 \end{pmatrix} - \begin{pmatrix} 0 \\ \vdots \\ 0 \\ x_r - x_{r+1} \\ \vdots \\ x_r - x_n \end{pmatrix} =: p - n. \]
• Page 74, Line −3 (until top of page 75):
  \begin{align*}
  \lambda_2 &= \frac{\sum_{\{i,j\}\in E}(x_i - x_j)^2}{\sum_{i=1}^{n} x_i^2 d_i} \\
  &\underset{(5.3),\,(5.4)}{\ge} \frac{\sum_{\{i,j\}\in E}\bigl((p_i - p_j)^2 + (n_i - n_j)^2\bigr)}{\sum_{i=1}^{n}(p_i^2 + n_i^2)\, d_i} \\
  &= \frac{\sum_{\{i,j\}\in E}(p_i - p_j)^2 + \sum_{\{i,j\}\in E}(n_i - n_j)^2}{\sum_{i=1}^{n} p_i^2 d_i + \sum_{i=1}^{n} n_i^2 d_i} \\
  &\underset{(5.5)}{\ge} \min\Bigl( \frac{\sum_{\{i,j\}\in E}(p_i - p_j)^2}{\sum_{i=1}^{n} p_i^2 d_i},\ \frac{\sum_{\{i,j\}\in E}(n_i - n_j)^2}{\sum_{i=1}^{n} n_i^2 d_i} \Bigr) \\
  &= \min\Bigl( \frac{\sum_{\{i,j\}\in E}(p_i - p_j)^2}{\sum_{i=1}^{n} p_i^2 d_i} \cdot \frac{\sum_{\{i,j\}\in E}(p_i + p_j)^2}{\sum_{\{i,j\}\in E}(p_i + p_j)^2},\ \dots \Bigr) \\
  &=: \min\Bigl( \frac{Z}{N},\ \dots \Bigr).
  \end{align*}
• Page 75, Line 8:
  \[ N = \sum_{i=1}^{n} p_i^2 d_i \cdot \sum_{\{i,j\}\in E}(p_i + p_j)^2 \underset{(5.5)}{\ge} \sum_{i=1}^{n} p_i^2 d_i \cdot \sum_{\{i,j\}\in E} 2(p_i^2 + p_j^2) \]
• Page 76, Line 3:
  \[ \cdots \underset{\text{telescoping sum}}{=} \Bigl( \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \mathbf{1}_E(i, j) \sum_{k=i}^{j-1} \bigl(p_k^2 - p_{k+1}^2\bigr) \Bigr)^{2} \]
• Page 79, Line 13: Finally, we want to note that Theorem 5.21 together with Remark 5.17(i) provides an upper bound for the eigenvalue $\lambda_2(\mathcal{L})$:
  Corollary 5.22. Let G = (V, E) be a graph with deg(i) > 0 for all i ∈ V. Then for the second smallest eigenvalue $\lambda_2$ of the normalized Laplace matrix of G the estimate $\lambda_2 \le 2$ holds.
• Page 82, Line −5: Let a dataset $D = \{x_1, \dots, x_n\} \subseteq \mathbb{R}^d$ be given, . . .
• Page 99, Line −3: Multiplication with $V^T$ from the left in . . .
• Page 105, Line −1 and Page 106, Line 1:
  \[
  A = \underbrace{\begin{pmatrix}
  0.07 & 0.29 & 0.32 & 0.51 & 0.66 & 0.18 & -0.23 \\
  0.13 & -0.02 & -0.01 & -0.79 & 0.59 & -0.02 & -0.06 \\
  0.68 & -0.11 & -0.05 & -0.05 & -0.24 & 0.56 & -0.35 \\
  0.15 & 0.59 & 0.65 & -0.25 & -0.33 & -0.09 & 0.11 \\
  0.41 & -0.07 & -0.03 & 0.10 & -0.02 & -0.78 & -0.43 \\
  0.07 & 0.73 & -0.67 & 0.00 & -0.00 & 0.00 & 0.00 \\
  0.55 & -0.09 & -0.04 & 0.17 & 0.17 & -0.11 & 0.78
  \end{pmatrix}}_{U}
  \underbrace{\begin{pmatrix}
  12.4 & 0.0 & 0.0 & 0.0 & 0.0 \\
  0.0 & 9.5 & 0.0 & 0.0 & 0.0 \\
  0.0 & 0.0 & 1.3 & 0.0 & 0.0 \\
  0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
  0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
  0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
  0.0 & 0.0 & 0.0 & 0.0 & 0.0
  \end{pmatrix}}_{\Sigma}
  \underbrace{\begin{pmatrix}
  0.56 & 0.09 & 0.56 & 0.09 & 0.59 \\
  -0.12 & 0.69 & -0.12 & 0.69 & 0.02 \\
  -0.40 & -0.09 & -0.40 & -0.09 & 0.80 \\
  0.51 & 0.48 & -0.51 & -0.48 & -0.00 \\
  0.48 & -0.51 & -0.48 & 0.51 & -0.00
  \end{pmatrix}}_{V^{T}}
  \]
  \[
  = \begin{pmatrix}
  0.07 & 0.29 & 0.32 \\
  0.13 & -0.02 & -0.01 \\
  0.68 & -0.11 & -0.05 \\
  0.15 & 0.59 & 0.65 \\
  0.41 & -0.07 & -0.03 \\
  0.07 & 0.73 & -0.67 \\
  0.55 & -0.09 & -0.04
  \end{pmatrix}
  \begin{pmatrix} 12.4 & & \\ & 9.5 & \\ & & 1.3 \end{pmatrix}
  \begin{pmatrix}
  0.56 & 0.09 & 0.56 & 0.09 & 0.59 \\
  -0.12 & 0.69 & -0.12 & 0.69 & 0.02 \\
  -0.40 & -0.09 & -0.40 & -0.09 & 0.80
  \end{pmatrix}.
  \]
• Page 106, Line 6:
  \[
  \check{A} = \begin{pmatrix}
  0.15 & 1.97 & 0.15 & 1.97 & 0.56 \\
  0.92 & 0.01 & 0.92 & 0.01 & 0.94 \\
  4.84 & 0.03 & 4.84 & 0.03 & 4.95 \\
  0.36 & 4.03 & 0.36 & 4.03 & 1.20 \\
  2.92 & -0.00 & 2.92 & -0.00 & 2.98 \\
  -0.34 & 4.86 & -0.34 & 4.86 & 0.65 \\
  3.92 & 0.02 & 3.92 & 0.02 & 4.00
  \end{pmatrix}
  = \begin{pmatrix}
  0.07 & 0.29 \\
  0.13 & -0.02 \\
  0.68 & -0.11 \\
  0.15 & 0.59 \\
  0.41 & -0.07 \\
  0.07 & 0.73 \\
  0.55 & -0.09
  \end{pmatrix}
  \begin{pmatrix} 12.4 & \\ & 9.5 \end{pmatrix}
  \begin{pmatrix}
  0.56 & 0.09 & 0.56 & 0.09 & 0.59 \\
  -0.12 & 0.69 & -0.12 & 0.69 & 0.02
  \end{pmatrix}
  \]
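The rank-2 matrix above is a truncated SVD; as a sketch of the computation in NumPy (the ratings matrix A below is a placeholder, since the book's actual data is not reproduced in this errata):

    import numpy as np

    # placeholder ratings matrix (7 users x 5 movies); not the book's data
    A = np.array([[1, 2, 1, 2, 1],
                  [1, 0, 1, 0, 1],
                  [5, 0, 5, 0, 5],
                  [1, 4, 1, 4, 1],
                  [3, 0, 3, 0, 3],
                  [0, 5, 0, 5, 0],
                  [4, 0, 4, 0, 4]], dtype=float)

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = 2                                             # keep the two largest singular values
    A_check = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]   # rank-2 approximation of A
    print(np.round(A_check, 2))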
• Page 107, Line 15:
  \[
  \cdots = \begin{pmatrix} 0 & 1 & 0 & 0 & \cdots \end{pmatrix}
  \begin{pmatrix}
  0.07 & 0.29 & \cdots & -0.23 \\
  0.13 & -0.02 & \cdots & -0.06 \\
  0.68 & -0.11 & \cdots & -0.35 \\
  0.15 & 0.59 & \cdots & 0.11 \\
  0.41 & -0.07 & \cdots & -0.43 \\
  0.07 & 0.73 & \cdots & 0.00 \\
  0.55 & -0.09 & \cdots & 0.78
  \end{pmatrix}
  \begin{pmatrix}
  12.4 & & & \\
  & 9.5 & & \\
  & & 1.3 & \\
  & & & \ddots
  \end{pmatrix}
  \begin{pmatrix}
  0.56 & 0.09 & 0.56 & 0.09 & 0.59 \\
  -0.12 & 0.69 & -0.12 & 0.69 & 0.02 \\
  & & \vdots & & \\
  0.48 & -0.51 & -0.48 & 0.51 & 0.00
  \end{pmatrix}
  \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix}
  \]
• Page 107, Line −2:
  \[ u_2 = 0.29\cdot\text{Abbie} - 0.02\cdot\text{Bailey} + \cdots - 0.09\cdot\text{Gladys}, \]
• Page 109, Line 10:
  \[
  \check{A} =
  \begin{matrix} \text{Abbie} \to \\ \text{Bailey} \to \\ \text{Catherine} \to \\ \text{Darlene} \to \\ \text{Elena} \to \\ \text{Fatima} \to \\ \text{Gladys} \to \end{matrix}
  \begin{pmatrix}
  0.07 & 0.29 \\
  0.13 & -0.02 \\
  0.68 & -0.11 \\
  0.15 & 0.59 \\
  0.41 & -0.07 \\
  0.07 & 0.73 \\
  0.55 & -0.09
  \end{pmatrix}
  \begin{pmatrix} 12.4 & \\ & 9.5 \end{pmatrix}
  \begin{pmatrix}
  0.56 & 0.09 & 0.56 & 0.09 & 0.59 \\
  -0.12 & 0.69 & -0.12 & 0.69 & 0.02
  \end{pmatrix}
  \]
  with the five columns corresponding to Alien, Casablanca, Star Wars, Titanic, The Matrix.
• Page 109, Line −3: . . . and $\check{V} = \{v_1, v_2\}$.
• Page 229, Line 13: Alternatively, with sigmoid activation, . . .
• Page 282, Line 6: (ii) For $A, B \in \Sigma$ with $\mathrm{P}(B) \neq 0$, $\mathrm{P}(A \mid B) := \dfrac{\mathrm{P}(A \cap B)}{\mathrm{P}(B)}$ . . .
• Page 283, Line 19:
  \[ \rho(x) = \frac{1}{(2\pi\sigma^2)^{d/2}}\, e^{-\frac{\|x-\mu\|^2}{2\sigma^2}} \quad\text{respectively}\quad \rho(x) = \frac{1}{\lambda^d(B)}\cdot \mathbf{1}_B(x), \]
• Page 286, Line −5: For $A = A_1 \times \cdots \times A_d \subseteq \mathbb{R}^d$ with $A_i \in \mathcal{B}^d$ we calculate
• Page 288, Line 2:
  \begin{align*}
  (\rho_1 * \rho_2)(s) &= \frac{1}{2\pi\sqrt{ab}} \int_{\mathbb{R}} \exp\Bigl(-\frac{(s-t)^2}{2a}\Bigr)\exp\Bigl(-\frac{t^2}{2b}\Bigr)\,dt \\
  &= \frac{1}{2\pi\sqrt{ab}} \int_{\mathbb{R}} \exp\Bigl(-\frac{b(s^2 - 2st + t^2) + at^2}{2ab}\Bigr)\,dt \\
  &= \frac{1}{2\pi\sqrt{ab}} \int_{\mathbb{R}} \exp\Bigl(-\frac{t^2(b+a) - 2stb + bs^2}{2ab}\Bigr)\,dt \\
  &= \frac{1}{2\pi\sqrt{ab}} \int_{\mathbb{R}} \exp\Bigl(-\frac{t^2(b+a)/c - 2stb/c + bs^2/c}{2ab/c}\Bigr)\,dt \\
  &= \frac{1}{\sqrt{2\pi c}}\,\frac{1}{\sqrt{2\pi(ab/c)}} \int_{\mathbb{R}} \exp\Bigl(-\frac{(t - (bs)/c)^2 - (sb/c)^2 + s^2(b/c)}{2ab/c}\Bigr)\,dt \\
  &= \frac{1}{\sqrt{2\pi c}}\exp\Bigl(+\frac{(sb/c)^2 - s^2(b/c)}{2ab/c}\Bigr)\,\frac{1}{\sqrt{2\pi(ab/c)}} \int_{\mathbb{R}} \exp\Bigl(-\frac{(t - (bs)/c)^2}{2ab/c}\Bigr)\,dt \\
  &= \frac{1}{\sqrt{2\pi c}}\exp\Bigl(+\frac{(sb/c)^2 c^2 - s^2(b/c)c^2}{2abc}\Bigr) \\
  &= \frac{1}{\sqrt{2\pi c}}\exp\Bigl(+\frac{s^2(b^2 - bc)}{2abc}\Bigr) \\
  &= \frac{1}{\sqrt{2\pi c}}\exp\Bigl(-\frac{s^2}{2c}\Bigr),
  \end{align*}
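The end result, that convolving the densities of N(0, a) and N(0, b) yields the density of N(0, a + b), can also be checked numerically; a small NumPy sketch (the values of a, b and the grid are arbitrary choices):

    import numpy as np

    a, b = 2.0, 3.0
    c = a + b
    t = np.linspace(-30, 30, 20001)
    dt = t[1] - t[0]

    rho1 = np.exp(-t**2 / (2 * a)) / np.sqrt(2 * np.pi * a)
    rho2 = np.exp(-t**2 / (2 * b)) / np.sqrt(2 * np.pi * b)

    conv = np.convolve(rho1, rho2, mode="same") * dt       # (rho1 * rho2)(s) on the grid
    target = np.exp(-t**2 / (2 * c)) / np.sqrt(2 * np.pi * c)
    print(np.max(np.abs(conv - target)))                    # should be close to 0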