Algodat 2 Merged
Contact [email protected]
Please always mention your student id (8 digits)
and the course id (6 digits) in the subject line!
https://studienhandbuch.jku.at/125317
Date Topic
Mo. 11.10.2021 Introduction + Randomized Algorithms
Mo. 18.10.2021 Height Balanced Trees
Mo. 25.10.2021 Weight Balanced Trees
Mo. 08.11.2021 Hashing
Mo. 15.11.2021 Double Hashing
Mo. 22.11.2021 Monte Carlo Tree Search
Mo. 29.11.2021 Graphs: Structure
Mo. 06.12.2021 Graphs: Flows
Mo. 13.12.2021 Social Graphs
Mo. 10.01.2022 Community Detection
Mo. 17.01.2022 Community Analysis
Mo. 24.01.2022 PRAM Algorithms
Mo. 31.01.2022 Exam
A.V. Aho, J.E. Hopcroft, J.D. Ullman: Data Structures and Algorithms,
Addison Wesley, 1987.
Donald E. Knuth:
The Art of Computer Programming. Volume 1 / Fundamental Algorithms.
The Art of Computer Programming. Volume 2 / Seminumerical Algorithms.
Reading, Mass.: Addison Wesley Longman, 1997.
▪ Early standard works in foundational computer science education.
◦ If you see the moodle icon but cannot access the course, please contact [email protected].
Date Topic
Oct 05/06 Introduction
Oct 19/20 Exercise 1: Randomized Algorithms
Nov 09/10 Exercise 2: Search Trees
Nov 23/24 Exercise 3: Hashing
Nov 30/Dec 01 Exercise 4: Graphs (Structure)
Dec 14/15 Exercise 5: Graphs (Shortest paths)
Jan 11/12 Exercise 6: Graphs (Network flows)
Jan 25/26 (reserved)
On computable numbers,
with an application to the Entscheidungsproblem
A. M. Turing
Proceedings of the London Mathematical Society, s2-42.1 (1937): 230–265.
Gambling
◦ card games
◦ dice games
Security, Cryptography
Simulation
Machine Learning
Direct methods
◦ modulo calculation
§ z = randint() % n
◦ scaling
§ z = randint() * n / m
Problems
◦ overflow of the value range with scaling
◦ loss of randomness with modulo (the low-order bits of many generators are less random)
Maximum Density
◦ Ri values should avoid large gaps in [0,1]
◦ Problem: Ri is discrete (not continuous)
◦ Solution: a very large integer for modulus m (e.g., 2^31 − 1)
Maximum Period
◦ achieve maximum density and avoid cycling
◦ needs good choice of a, c, m, and X0
Efficiency
◦ binary representation of numbers
◦ modulus m a power of 2 (2^b or close)
◦ X(i+1) is obtained from a·Xi + c by dropping the leftmost and using the b rightmost binary digits
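To make the recurrence concrete, here is a minimal Python sketch of a mixed linear congruential generator. The constants are the classic ANSI C rand() parameters and are used here purely for illustration; the function name make_lcg is not from the slides.

def make_lcg(a=1103515245, c=12345, m=2**31, seed=42):
    # X_{i+1} = (a*X_i + c) mod m
    x = seed
    def randint():
        nonlocal x
        x = (a * x + c) % m
        return x
    return randint

randint = make_lcg()
print([randint() % 10 for _ in range(5)])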
def smallRand(n):
    k = m // n                # how often does [0..n) fit in [0..m)
    max = k * n               # greatest multiple of n that is <= m
    r = randint()
    while r >= max:           # restrict to values in [0..max-1]
        r = randint()
    return r // k             # scale from [0..max-1] to [0..n-1]
Uniformity
◦ uniformly/equally distributed
◦ divide the interval [0,1] into n classes of equal length
◦ given N random numbers (observations):
expected number of observations in each interval is N/n
Independence
◦ probability of observing a value in a particular interval is independent of the previous value
◦ no correlation between consecutive random numbers
Determinism
◦ algorithm implies uniqueness
◦ generating numbers using a known method removes the potential for true randomness
◦ if the method is known, the set of random numbers can be replicated
◦ however, random numbers should not be predictable
⇒ Generate random numbers in a way that is completely opaque to the user of the generator
Pseudo-random
◦ random number sequence is reproducible and the same for every program run
◦ e.g., 17th call always returns 5761
⇒ Important for recreating exactly the same scenarios in simulations
⇒ Not desirable in games, to avoid, e.g., dice sequences that always start with 5, 3, 4
Verification of randomness
◦ tests for uniformity and independence for a particular generation scheme
Considerations
T. Song, Relationships among some univariate distributions, IIE Transactions (2005) 37, 651–656
RANDOM NUMBERS :: PROBABILITY DENSITY
RNGs randint(), randf() generate a
uniform distribution,
but we may want RNs with other PDFs:
Method
def gauss(n):
    # the sum of n uniform values in [-1,1] approximates
    # a normal distribution (central limit theorem)
    s = 0.0
    for i in range(n):
        s = s + (randf()*2.0 - 1.0)
    return s
Box-Muller Transform
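The slides only name the transform; as a hedged sketch, the standard Box-Muller transform turns two independent uniform samples into two independent standard normal samples:

import math, random

def box_muller():
    u1 = random.random() or 1e-12          # avoid log(0)
    u2 = random.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return (r * math.cos(2 * math.pi * u2),
            r * math.sin(2 * math.pi * u2))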
Method
def shuffle(a):
    n = len(a)
    for i in range(n, 1, -1):            # i = n, n-1, ..., 3, 2
        k = smallRand(i)                 # random index from a[0 : i-1]
        a[i-1], a[k] = a[k], a[i-1]      # chosen element goes to the end
m = 8; a = 3; c = 5; X0 = 0
i:   1 2 3 4 5 6 7 8 9 10 11 12
Xi:  5 4 1 0 5 4 1 0 5 4  1  ...  ⇒ period = 4
m = 8; a = 3; c = 2; X0 = 0
i:   1 2 3 4 5 6 7 8 9 10 11 12
Xi:  2 0 2 0 2 0 2 0 2 0  2  ...  ⇒ period = 2
Find the period of the multiplicative congruential generator for a = 13, m = 2^6 = 64 and X0 = 1, 2, 3, and 4
◦ m = 64, c = 0; the maximal period P = m/4 = 16 is achieved by using the odd seeds X0 = 1 and X0 = 3 (a = 13 is of the form 5 + 8k with k = 1)
◦ X0 = 1 → the generated sequence {1, 5, 9, 13, …, 53, 57, 61} has large gaps
◦ density insufficient
◦ period too short
◦ not a viable generator
Example
m = 18
18 = 2·3^2 ⇒ prime factors are {2, 3}
⇒ c ∈ {1, 5, 7, 11, 13, 17}
⇒ a = 2·3·k + 1 ⇒ a ∈ {7, 13}
If m is a prime number
⇒ m itself is the only prime factor
⇒ the smallest possible a would be m + 1
⇒ because of 2 ≤ a < m the maximum period length is not reachable
X1 = 7^5 · 123457 mod (2^31 − 1) = 2,074,941,799
R1 = X1 / 2^31 = 0.9662
X2 = 7^5 · 2,074,941,799 mod (2^31 − 1) = 559,872,160
R2 = X2 / 2^31 = 0.2607
X3 = 7^5 · 559,872,160 mod (2^31 − 1) = 1,645,535,613
R3 = X3 / 2^31 = 0.7662
…
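A minimal sketch reproducing the numbers above (the "minimal standard" Lehmer generator X(i+1) = 7^5 · Xi mod (2^31 − 1), seeded with 123457):

M = 2**31 - 1
A = 7**5
x = 123457
for _ in range(3):
    x = (A * x) % M
    print(x, x / 2**31)   # 2074941799 (~0.9662), 559872160 (~0.2607), 1645535613 (~0.7662)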
Period length is determined by the size of the "memory" (number of possible states)
§ 16 bit ⇒ max. period length 2^16
[Figure: two random generators RG, RG1 combined via ⊕]
Abbreviation ⊕
a) bitwise XOR: z = t[p] ^ t[r]
b) addition modulo m: z = (t[p] + t[r]) % m
Level of significance
When to test
◦ if a well-known simulation language or random number generator is used, it is probably
unnecessary to test
◦ if the generator is not explicitly known or documented, e.g., spreadsheet programs,
symbolic/numerical calculators, tests should be applied to many sample numbers
Types of tests
◦ theoretical
§ evaluate the choices of m, a, and c without actually generating any numbers
◦ empirical
§ applied to actual sequences of numbers produced
Consecutive dice rolls may produce the same face value, due to the independence of rolls
Probability is 1/6
If the entire range of values [0, m-1] is used, the serial test may fail
When mapping to [0, n-1] with n < m, a repeat should ideally occur in one of n cases
Example
◦ random numbers in [0, r-1]
◦ probability of gaps of length k
r = 10
Procedure
◦ generate n runs-up
◦ count the number of runs-up of length 0, 1, …, k
Hypothesis
◦ the actual distribution of observations comes from a given target distribution (not necessarily uniform)
◦ n is the number of classes, Ei the expected number of observations in the i-th class, Oi the observed number
Procedure
◦ compute the test statistic χ² = Σ(i=1..n) (Oi − Ei)² / Ei
◦ reject if the deviations are too large (and be suspicious if they are too regular)
RANDOM NUMBERS :: EXAMPLE: CHI-SQUARE TEST
Let α = 0.05 and n = 10 intervals of equal length, namely [0, 0.1), [0.1, 0.2), …, [0.9, 1.0)
χ0² = 3.4
Critical value from table: χ²(0.05, 9) = 16.9
Hence, the null hypothesis is not rejected
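A small Python sketch of this test (the helper name chi_square_uniformity is illustrative): bin N samples into n equal intervals, compute χ², and compare with the critical value 16.9 from the example.

import random

def chi_square_uniformity(samples, n=10):
    observed = [0] * n
    for r in samples:
        observed[min(int(r * n), n - 1)] += 1    # interval index of r
    expected = len(samples) / n                  # N/n per interval
    return sum((o - expected) ** 2 / expected for o in observed)

chi2 = chi_square_uniformity([random.random() for _ in range(1000)])
print(chi2, "reject" if chi2 > 16.9 else "do not reject")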
RANDOM NUMBERS :: VISUAL TESTS
Bar test
◦ visualization of the frequency distribution
Sky test
◦ x/y pairs form a starry sky
Stripe test
◦ randomly coloured lines
• In addition to the input, a randomized algorithm takes random numbers to make random
choices during execution
• Behavior (output) of the algorithm depends on a random experiment
◦ Output may vary between executions, even for the same input
◦ Runtime and correctness can only be guaranteed with a certain probability
• Hence, Randomized Algorithms are nondeterministic
Advantages
◦ Efficiency
◦ Simplicity
Attention
◦ Randomized inputs (by themselves) do not make an algorithm randomized
import random

def find_char_monte_carlo(A, k):
    # Monte Carlo: at most k random probes; may miss 'a' even if present
    i = 0
    while i < k:
        if random.choice(A) == 'a':
            return True
        i = i + 1
    return False
Probabilistic determination of pi
Solving Integrals
• rectangle ((0,0), (n,m)) has area A1 = n × m
• draw random samples (x,y) from the rectangle
• count the number of random pairs below the curve (nin) and above it; the area under the curve is approximately A1 · nin / ntotal
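A sketch of the pi example using the unit square and quarter circle (the sample count is an arbitrary choice):

import random

def estimate_pi(samples=1_000_000):
    inside = sum(1 for _ in range(samples)
                 if random.random() ** 2 + random.random() ** 2 <= 1.0)
    return 4.0 * inside / samples       # quarter-circle area ratio is pi/4

print(estimate_pi())                    # ~3.141 for large sample counts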
RANDOMIZED ALGORITHMS :: LAS VEGAS ALGORITHM
def find_a_las_vegas(A):
    # Las Vegas: repeats until 'a' is found, so the answer is always
    # correct, but the runtime is a random variable (the bound k of the
    # Monte Carlo variant is no longer needed)
    while True:
        if random.choice(A) == 'a':
            return True
Markov Chains
◦ discrete-time stochastic processes moving from one state to the next
◦ sequence of states Xi, for which the probability P(Xs) to reach a certain state Xs is dependent
only on the previous state of the system
Finite State Markov Chain M = (S, P)
◦ S is a finite set of states and P an S×S stochastic matrix holding the transition probabilities (for all i, j ∈ S we have 0 ≤ Pij ≤ 1 and Σj Pij = 1)
◦ for all i, j ∈ S, Pij is the probability that M moves from state i to state j
◦ the probability only depends on i and not on previous steps
Let Xt be a random variable which determines the state of M at time t
P(Xt+1 = j | Xt = i) = Pij
◦ the initial state X0 is usually set stochastically based on some initial distribution
    0    0.2  0.8
M = 0.5  0    0.5
    0.1  0.9  0
stochastic matrix where the rows represent the current state and the columns the next state
P(Xt+1 = i | Xt = 1) = 0 if i = 1, 0.2 if i = 2, 0.8 if i = 3
P(Xt+1 = i | Xt = 2) = 0.5 if i = 1, 0 if i = 2, 0.5 if i = 3
P(Xt+1 = i | Xt = 3) = 0.1 if i = 1, 0.9 if i = 2, 0 if i = 3
transition probabilities of the Markov chain
[Figure: state graph of the Markov chain]
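A small sketch simulating this example chain (the step helper is illustrative; rows of M are the current state):

import random

M = [[0.0, 0.2, 0.8],
     [0.5, 0.0, 0.5],
     [0.1, 0.9, 0.0]]

def step(i):
    u, acc = random.random(), 0.0
    for j, p in enumerate(M[i]):        # sample next state from row i
        acc += p
        if u < acc:
            return j
    return len(M) - 1

state = 0
for _ in range(5):
    state = step(state)
print("state after 5 steps:", state)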
Given a G = (V, E) finite connected undirected graph with 𝑉 = 𝑛 vertices and 𝐸 = 𝑚 edges
◦ neighbors of vertex u, Γ(u) = {v | (u, v) ∈ E}
Consider a random walker who, at each point in time, moves from its current vertex to one of the vertices connected via an edge, chosen with equal probability
◦ muv is the expected number of steps needed to get from vertex u to vertex v
◦ Cuv = muv + mvu is the expected number of steps to get from vertex u back to u while visiting vertex v at least once (commute time)
Each random walk on a graph G is a finite Markov chain where Pij = 0 if there is no edge from i to j, and Pij = 1/d(i) otherwise (with d(i) being the degree of vertex i)
• Use probabilistic control for counter-steps, i.e. adopt the thermodynamics of the annealing process, modeled using the Boltzmann probability distribution:
Prob(E) ~ exp(−E/kT)
E ... possible energy state
T ... system temperature
k ... Boltzmann’s constant
THE BOLTZMANN DISTRIBUTION
[Figure: moderate probability for up-hill stepping at low temperature; reasonable probability for up-hill stepping at high temperature]
Anneal(I,S)
PRE: I = Instance of problem
S = initial feasible solution to I of size n > 0
h(S) = objective function, cost of solution
POST: S = feasible solution for I
Temperature(T)
POST: T provides about 95% chance
of changing solutions
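A hedged Python sketch of the Anneal scheme above; the geometric cooling schedule and the parameter values are illustrative assumptions, not the slides' Temperature function.

import math, random

def anneal(s, h, neighbor, t0=1.0, cooling=0.95, steps=1000):
    # s: initial feasible solution, h: cost function to minimize,
    # neighbor: produces a random feasible modification of s
    t = t0
    best = s
    for _ in range(steps):
        s2 = neighbor(s)
        dE = h(s2) - h(s)
        if dE < 0 or random.random() < math.exp(-dE / t):
            s = s2                      # accept (possibly up-hill) step
            if h(s) < h(best):
                best = s
        t *= cooling                    # cool down
    return best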
A tree is
• an acyclic, simple, connected graph
• i.e. it contains no loops and no cycles:
between each pair of nodes there is at most one edge
Generalization of lists:
• Element (node) may have multiple successors.
• Exactly 1 node has no predecessor: root
• Nodes without successors: leaves
Frequently used data structure: decision trees, syntax trees, derivation trees, search trees, ...
Frequently used representation forms: set notation, bracket notation, recursive indentation, graph drawing, ...
Here: Use of trees to store keys and realization of dictionary operations (search, insert, remove)
in e.g. binary trees
Tree B is ordered, if successors of each node are ordered (1., 2., 3. etc.; left, right).
In an ordered tree, the subtrees Bi of each node form an ordered set (e.g.: arithmetic expression)
Order of B: maximum number of successors of a node
Path of length k: sequence p0, ..., pk of nodes, such that pi is a successor of pi-1
• Height of a tree:
maximum distance of a leaf to the root.
• Depth of a node:
distance to the root (the number of edges on a path from this node to the root)
The nodes at level i are all nodes with depth i.
A tree of order n is called complete if all leaves have the same depth
and the maximum number of nodes is present at each level.
A is the root
B is the parent of D and E
C is a sibling of B
D and E are children of B
D, E, F, G, I are external nodes or leaves
A, B, C, H are internal nodes
The depth of E is 2.
The height of the tree is 3.
The order of B is 2, the order of C is 3.
If T is a binary tree (order of all nodes <= 2) with n nodes and a height h, then T has the following properties
• Preorder:
28, 16, 12, 8, 15, 19, 34, 31, 29, 49
• Postorder:
8, 15, 12, 19, 16, 29, 31, 49, 34, 28
• Inorder:
8, 12, 15, 16, 19, 28, 29, 31, 34, 49
Hints: If you walk around the tree starting from the root, you obtain
• Preorder by visiting each node when passing it on its left side.
• Postorder by visiting each node when passing it on its right side.
• Inorder by visiting each node when passing it on its bottom side.
Assumption: For each node n with children cl, cr and key k we have (search tree condition):
• All keys stored in subtree of cl are smaller than k
• All keys stored in subtree of cr are larger than k
• All keys stored in subtree of cl are smaller than all keys stored in subtree of cr
Pseudocode
Algorithm TreeSearch(k, v)
Input: key k to be searched, node v of a binary search tree
Output: node w in the subtree of v; for a successful search, the internal node with key k; for an unsuccessful search, the external node lying after all internal nodes smaller than k and before all internal nodes greater than k
if v is an external node then
return v
if k = key(v) then
return v
else if k < key(v) then
return TreeSearch(k, T.leftChild(v))
else
return TreeSearch(k, T.rightChild(v))
For searching, inserting and removal, the nodes along a root-leaf path (plus possibly siblings of such nodes) need to be traversed.
• Complexity per node: O(1)
• Therefore the runtime is O(h), where h is the height of the tree
In worst case the height of the binary search tree is h = N
(tree „degenerates“ to sorted sequence)
Removal of a node
Cost measure: Number of nodes visited or number of search steps or key comparisons required.
The (internal) path length is PL(B) = Σ(i=1..N) Level(Ki)
• The mean path length is calculated by l = PL(B) / N
Maximum access cost: We have the longest search path, and thus the maximum access cost, when the binary search tree degenerates into a linear list.
height h = lmax + 1 = N
zmax = (1/N) · Σ(i=1..N) i = (1/N) · N(N+1)/2 = (N+1)/2 = O(N)
[Figure: root i with i−1 nodes (subtree cost Z(i−1)) on the left and N−i nodes (subtree cost Z(N−i)) on the right]
The probability that the first key has the value i is 1/N
(Assumption: same access probability for all nodes)
We get:
zN = 2 · ((N+1)/N) · HN − 3 ≈ 2 ln N − c
The balanced binary search tree causes the least cost for all basic operations.
Efficiency of dictionary operations (insert, search, remove) on trees depends directly on the tree height.
Aim:
• Fast access with zmax ~ O (log2 N)
• Insert and remove operations in logarithmic complexity.
Heuristics:
For each node in the tree, the numbers of nodes in its two subtrees should be kept as equal as possible.
• Height-balanced trees:
The heights of the two subtrees of each node differ by at most a fixed amount.
• Weight-balanced trees:
The ratio of the node weights (number of nodes) of both subtrees meets certain conditions.
• B-Trees
always balanced due to balance-sustaining search, insert, and remove operations
number of children per inner node between t (= m/2) and 2t (= m)
access in O(log(n))
• AVL Trees
The heights of each internal node’s children only differ by a maximum of 1
access in O(log(n))
1024^4 − 1 = (2^10)^4 − 1 = 2^40 − 1 = 1,099,511,627,775 keys (i.e. access time 10 times lower)
Remove
◦ removing a key from an inner node could destroy the balance
◦ before stepping down, check whether there are enough keys in the subtree
Move (transfer)
◦ transfer a key from a sibling to expand a (minimum) inner node before stepping down
Merge
◦ before stepping down, merge two siblings if both (left and right child) are at the minimum number of keys
Remove from inner node
◦ before deleting a key from an inner node, separate/adjust the key ranges of the left and right child (here: inorder transfer, i.e. the rightmost key of the left child)
An AVL tree is a binary search tree, in which the heights of each internal node’s children only differ by a maximum of 1.
Example (numbers next to the nodes indicate their height):
[Figure: AVL tree with root 44; the numbers next to the nodes indicate their heights]
Sketch of a proof:
Find n(h), the smallest possible number of internal nodes in an AVL tree of height h.
By inserting a node into an AVL tree, the height of some nodes in this tree changes
[Figures: AVL tree before and after inserting key 54; after the insertion, node 78 (labeled z) becomes unbalanced]
[Figures: the unbalanced subtree with x = 62, y = 50, z = 78 (renamed in in-order as a = y, b = x, c = z) and subtrees T0–T3; after restructuring, b = 62 becomes the root of the subtree with a = 50 and c = 78 as its children]
RESTRUCTURING IN AVL TREES (SINGLE ROTATIONS)
[Figures: single rotations: b = y becomes the subtree root, a = x and c = z its children; the subtrees T0–T3 keep their in-order positions]
RESTRUCTURING IN AVL TREES
(DOUBLE ROTATIONS)
[Figures: double rotations: b = x becomes the subtree root (with a = z in one case and c = z in the mirrored case); the subtrees T0–T3 keep their in-order positions]
ALGORITHM FOR RESTRUCTURING
Input:
node x of a binary tree T with parent y and grandparent z
Output:
tree T restructured by (single or double) rotation of nodes x, y, z
Create a new tree from the 7 parts, which is balanced and in which the in-order sequence of the parts is retained.
[Figure: the unbalanced subtree cut into 7 parts: the nodes x, y, z and the subtrees T0–T3]
Example
Number the 7 parts according to the in-order traversal.
[Figure: example tree with z = 44, y = 62, x = 78; in-order numbering: T0 (1), z (2), T1 (3), y (4), T2 (5), x (6), T3 (7)]
Example
Create an array with indices 1..7.
“Cut“ the 4 subtrees and the 3 nodes and place them into the array according to their numbering.
Reassemble the tree again:
• Set the element at position 4 (b = 62) as the root.
• Set the elements at positions 2 (a = 44) and 6 (c = 78) as its children.
• Set the elements at positions 1, 3 (T0, T1) and 5, 7 (T2, T3) as children of positions 2 and 6.
[Figures: the array T0 | z=a | T1 | y=b | T2 | x=c | T3 and the reassembled, balanced tree]
Cut/Link restructuring algorithm has the same effects as the four rotation cases previously considered
Advantage:
• No case distinction necessary
• More „elegant“ solution
Disadvantage:
• May require more code
• Let z be the first unbalanced node which is visited while traversing up in the tree.
• Let y be the child of z with the largest height and let x be the child of y with largest height.
• The algorithm restructure(x) can be used to restructure and balance the subtree with root z.
• However, restructuring can destroy the balance at higher levels, so that the verification
(and restructuring if necessary) must be continued until the root node is reached.
Example
[Figures: removing key 32 from the AVL tree with root 44 makes z = 44 unbalanced; choosing y = 62 and x = 78 and restructuring makes 62 the new root (single rotation); alternatively, choosing x = 50 leads to a double rotation with 50 as the new root]
• From this the estimation follows: h ≤ 1.44 · log2(N(h)+2) ⇒ h = O(log N(h))
The minimum number of nodes grows exponentially with the height
⇒ so, vice versa: the height grows logarithmically with the number of nodes
◦ Height-balanced trees
◦ Weight-balanced trees
Weight-balanced trees are introduced as a compromise between perfectly balanced and natural search trees, whereby logarithmic search complexity is required in the worst case.
Definition:
Let B be a binary search tree with left subtree Bl, let l be the number of nodes in Bl and N the corresponding number of nodes in B.
◦ ρ(B) = (l + 1) / (N + 1) is the root balance of B.
◦ A tree B is weight-balanced (BB(α)) or of limited balance α, if for each subtree B‘ of B we have: α ≤ ρ(B‘) ≤ 1 − α
Rebalancing
◦ uses the same rotation types as the AVL tree
◦ is guaranteed by the choice of α ≤ 1 − √2/2
Example: Search 8
[Figure: multiway search tree with keys 11, 13, 17, 20, 21]
Example: Search 12
11 < 12 < 13 → middle subtree
External node → unsuccessful!
[Figure: 2-3-4 tree with root 5 10 15 and leaves 3 4, 6 8, 11 13 14, 17]
Insert the key into the lowest internal node that has been reached during the search.
Case 1: the node receives its first record
(insert g into node [d] → [d g])
Case 2: the node has 2 records
(insert f into node [d g] → [d f g])
Case 3: the node already has 3 records
◦ Node splitting: split the node into two 1-element nodes and move the middle element into the parent node
(insert e into node [d f g]: f moves up, e joins the left node → [d e], [g])
Top-Down Insertion
◦ Starting with the root, node-splitting is done for each node with three elements, that is
visited on the way when searching for the insertion position
◦ This ensures that inserting can be done according to case 1 or 2.
a split concerns a constant
number of nodes → O(1)
Bottom-Up Insertion
◦ Insertion position searched
◦ If the node at the insertion position has already 3 elements, node-splitting is done
◦ If this results in an “overflow” in the parent node (by moving the middle element upwards),
node-splitting is done again.
Principle
◦ Find the record to be deleted via key
◦ Remove the entry and merge (inverse operation to split), if node has too few entries
Example: Remove 3
[Figure: 2-3-4 tree with inner node 5 10 and leaves 3, 6 8, 11: removing 3 causes an underflow, resolved by a transfer (6 moves up, 5 moves down); the result has inner node 6 10 and leaves 5, 8, 11]
Example: Remove 11
[Figure: removing 11 causes an underflow, again resolved by a transfer: the neighboring keys 8 and 10 are redistributed]
Example: Remove 14
[Figure: 2-3-4 tree with root 5 10 15 and leaves 3 4, 6 8, 11 14, 17: removing 14 causes an underflow that is resolved by merging with a sibling; the separating key 15 moves down (result: leaf 15 17)]
Red-Black trees
• balanced
• Search, Insert, Remove: O(log N)
• Binary tree structure!
Let
N be the number of internal nodes
L be the number of leaves (L = N+1)
H be the height
B be the black height (height according to black edges)
Property 1:
2^B ≤ N+1 ≤ 4^B
Property 2:
1/2 · log2(N+1) ≤ B ≤ log2(N+1)
Property 3:
log2(N+1) ≤ H ≤ 2 · log2(N+1)
The search algorithm in Red-Black trees is identical to search in binary search trees.
[Figure: inserting key 1 into a red-black tree]
• If the parent of the new node already has an incoming red edge, two red edges would follow each other. Therefore restructuring by rotation or promotion is required.
Restructuring by rotation (single)
Restructuring by rotation (double)
[Figures: single and double restructuring rotations with subtrees T1, T2]
Runtime
◦ Restructuring: O(1)
◦ Recoloring: O(log N)
if it propagates until root is reached
◦ Therefore the overall complexity for insert is: O(log N)
Case 1:
„remove 8“ The node to be deleted has at
least one external node as a
child.
Case 2:
„remove 7“ The node to be deleted has
no external node as child,
then replace node with in-
order predecessor (or
successor).
[Figure: node v with children u and w; after the removal, u takes the place of v]
If the incoming edge to v was black, color the incoming edge to u double-black.
3. As long as there are double-black colored edges, “color compensation” by restructuring or
recoloring is required (total number of black edges must be preserved).
[Figures: resolving a double-black edge: restructuring via the sibling s (and possibly its child) of the double-black node v under parent p, or recoloring the edges to v and s]
Insert or Remove can cause a local interference (successive red or double-black colored edges)
Complexity
◦ One restructure or recolor step: O(1)
◦ Insert: at maximum 1 restructure step or O(log N) recolor steps
◦ Remove: at maximum 2 restructure steps or O(log N) recolor steps
◦ Overall complexity: O(log N)
Binary search tree in which “splaying” is done after each access operation => adaptation to the search queries
◦ Splaying: a special move-to-root operation applied to node x
◦ Three cases for one step in splaying:
§ zig-zig: x is the right (left) child of y, and y is the right (left) child of z
[Figure: zig-zig splaying step on x, y, z with subtrees T1–T4: x moves to the top, z to the bottom]
[Figures: zig-zag splaying step (x between y and z) and zig step (x child of the root y), with the subtrees reattached in in-order]
The splaying operation starts at the lowest node x which is visited in an access operation (insert, delete, find).
It is executed until this node x is the root.
In each zig-zig or zig-zag step the depth of x decreases by 2, in one zig step by 1
◦ Therefore, splaying the node x with depth d requires ⌊d/2⌋ zig-zags or zig-zigs, plus an additional zig if d is odd
Each of these operations affects only a constant number of nodes, so its complexity is O(1)
◦ Therefore we have a complexity for splaying of O(d)
d ... depth of node x (bounded by the height of the tree)
Example
[Figures: splaying node 14 up the tree: after a zig-zag step, then a zig-zig step, then a final zig-zig step, node 14 has become the root]
Insertion of key k
◦ Splaying is done with new node x containing k
(previous example could be considered as splaying after “insert 14”)
Removal of key k
◦ Splaying is done with the parent of the removed node
• original tree; after insertion of 2 and splaying; after insertion of 3 and splaying
[Figures: splay tree examples: each inserted or accessed node is splayed to the root, restructuring the tree along its access path]
• An amortized analysis (using the accounting method) can show that on average we have:
O(log N)
Hashing:
• Try to do this without key comparisons, i.e. determine by calculation where a data set with key k ∈ K is stored.
Hashtable:
• Data set are stored in an array A[0..N-1]
Hashfunction:
• h: K → {0, ..., N-1} assigns a hash address to each key k (= index in the hash table)
0 ≤ h(k) ≤ N-1
• Since N is generally much smaller than K, h() is generally not injective
• Example: Symbol table: 51 reserved words in Java versus more than 62^80 allowed identifiers of length ≤ 80.
Synonyms
• Keys k, k‘∈ K are synonymous if h(k) = h(k‘)
Address collision
• The same hash address is assigned to synonyms
• No synonyms, no collision
• Address collision requires special handling
Occupancy factor
• For a hash table of size N that currently stores n keys, we specify α = n/N as the occupancy factor.
Hash tables:
Efficient implementation of a dictionary with regard to storage space and complexity of
search, insert and remove operations (usually better than implementations based on key
comparisons)
• Key-value pairs are stored in an array of size N
• Index is calculated from the hash function value of the key h(k).
Aim: store item(k,e) at A[h(k)]
• Example: Use key k modulo array size as index and use chaining,
if two keys are mapped to the same index (collision)
A[0] → 5
A[1] → 11 → 1
A[2] →
A[3] → 13
A[4] → 9 → 24
Chaining: keys with the same index are stored in a list.
First part of the hash function (h1) assigns an integer to any key k
= Hash code or hash value
in Java: hashCode() method returns 32 bit int (!) for each object
• (in many Java implementations, however, this is only the memory address of the object,
i.e. a bad distribution => bad hash codes => overload with better method)
Consider the binary representation of the key as (x0, x1, x2, ... xk-1):
simple accumulation results in a bad hash code because e.g. “spot”, “stop”, “tops”, ... collide.
for (Java-)Strings therefore:
Consider the character values (ASCII or Unicode) x0 x1 ... xk-1 as coefficients of a polynomial
x0·a^(k-1) + x1·a^(k-2) + ... + x(k-2)·a + x(k-1)
calculated according to the Horner scheme (overflows are ignored) for a certain value a ≠ 1:
x(k-1) + a·(x(k-2) + a·(x(k-3) + ... + a·(x1 + a·x0)...))
For e.g. a = 33, 37, 39, or 41 there are only 6 collisions in a vocabulary of 50,000 (English) words
public static int hashCode(String s) {
int h = 0;
for (int i = 0; i < s.length(); i++) {
h = (h << 5) | (h >>> 27); // 5-bit cyclic shift
h += (int) s.charAt(i); // add next character
}
return h;
}
Division-Remainder Method
• h(k) = |k| mod N
• Bad choice: N even, since then h(k) has the same parity as k
• Bad if e.g. the last bit expresses a fact (e.g. 0 = male, 1 = female)
• Choice of N = 2^p
• h(k) returns the p lowest binary digits of k: bad, because the remaining bits are neglected
• Choice of N as a prime number with N ≠ r^i ± j, 0 ≤ j ≤ r−1, r = radix
(proves best in practice, empirically best results)
If you place the points Y − ⌊Y⌋, 2Y − ⌊2Y⌋, ..., nY − ⌊nY⌋ in the interval [0,1], then the resulting n+1 intervals have at most three different lengths.
If you divide further, the next point (n+1)Y − ⌊(n+1)Y⌋ falls into the largest partial interval.
Of all numbers 0 ≤ Y ≤ 1, the golden ratio Y = (√5 − 1)/2 ≈ 0.618034 leads to the most balanced intervals.
[Figure: the golden ratio 1.618 in logo design, e.g. McDonald’s, Toyota, National Geographic]
Multiplicative Method
• Choose a constant Y with 0 < Y < 1
• Calculate k·Y mod 1 = k·Y − ⌊k·Y⌋
• h(k) = ⌊N · (k·Y mod 1)⌋
• Choice of N is not critical; with N = 2^p the calculation of h(k) can be accelerated (h(k) = the p leading bits of the fractional part of k·Y)
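A small sketch of the multiplicative method with the golden-ratio constant (the function name mult_hash is illustrative):

import math

def mult_hash(k, N):
    Y = (math.sqrt(5) - 1) / 2        # golden ratio conjugate, ~0.618
    return int(N * ((k * Y) % 1.0))   # h(k) = floor(N * (k*Y mod 1))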
Example:
If the number of keys to be stored is known and |K| ≤ N, collision-free storing is always possible!
Application example:
Keywords of a programming language are assigned to fixed places in a symbol table.
Assumptions
• n data items inserted into a memory with N places
• there have been no deletions
• all configurations of n occupied and N−n unoccupied storage locations have the same probability
If Pr is the probability that exactly r places must be tested in an unsuccessful search, then we have:
• the first r−1 places are occupied, the r-th place is free
• on the remaining N−r places the other n−(r−1) occupied places can be distributed arbitrarily
Observation:
the current key set K‘ ⊂ K is generally not “equally distributed” over the universe of keys K
(example: programmers’ preference for variables i, i1, i2, i3, ...)
In other words:
H is universal if, for each pair of different keys k, l, the number of hash functions with h(k) = h(l) is at most |H|/N.
Universal Hash functions exist / and are „easy“ to create:
Hash table A of size N = 3 and p = 5 (prime number), keys K = {0, 1, 2, 3, 4}
20 hash functions H = { hi,j(x) = ((ix + j) mod 5) mod 3 }, 1 ≤ i < p, 0 ≤ j < p:
1x+0 2x+0 3x+0 4x+0
1x+1 2x+1 3x+1 4x+1
1x+2 2x+2 3x+2 4x+2
1x+3 2x+3 3x+3 4x+3
1x+4 2x+4 3x+4 4x+4
each taken (mod 5) (mod 3)
Example: Consider e.g. the two keys 1 and 4.
h(1) = h(4) occurs in 4 of the 20 hash functions (x+0, x+4, 4x+0, 4x+4):
(1·1 + 0) mod 5 mod 3 = 1 = (1·4 + 0) mod 5 mod 3
(1·1 + 4) mod 5 mod 3 = 0 = (1·4 + 4) mod 5 mod 3
(4·1 + 0) mod 5 mod 3 = 1 = (4·4 + 0) mod 5 mod 3
(4·1 + 4) mod 5 mod 3 = 0 = (4·4 + 4) mod 5 mod 3
in the other 16 hash functions there are 0 collisions
i.e. PrH(hi,j(x) = hi,j(y)) ≤ 4/20 = 1/5 for all pairs of different keys
i.e. H is universal
Recommended approach:
Known:
The number of keys |K| which has to be mapped to N hash addresses.
Choose:
1. a prime number p which is greater than or equal to |K|
2. two numbers i, j in the range 1 ≤ i < p, 0 ≤ j < p
Then:
h(x) = ((ix + j) mod p) mod N
is a “good” hash function
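A sketch of this recommended approach, choosing i and j at random (make_universal_hash is an illustrative name):

import random

def make_universal_hash(p, N):
    i = random.randrange(1, p)        # 1 <= i < p
    j = random.randrange(0, p)        # 0 <= j < p
    return lambda x: ((i * x + j) % p) % N

h = make_universal_hash(p=5, N=3)
print([h(x) for x in range(5)])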
Definition
δ(x, y, h) = 1 if h(x) = h(y) and x ≠ y, and 0 otherwise
δ shows whether a collision occurs for two keys x, y regarding h()
δ(x, Y, h) = Σ(y∈Y) δ(x, y, h)
δ(x, y, H) = Σ(h∈H) δ(x, y, h)
The expected number of collisions of a new key x with the already stored set S is
(1/|H|) Σ(h∈H) δ(x, S, h) = (1/|H|) Σ(h∈H) Σ(y∈S) δ(x, y, h)
= (1/|H|) Σ(y∈S) Σ(h∈H) δ(x, y, h)
= (1/|H|) Σ(y∈S) δ(x, y, H)
≤ (1/|H|) Σ(y∈S) |H| / N
= |S| / N
i.e. the expected number of already inserted elements colliding with x is at most |S| / N
This means that an arbitrarily chosen hash function h from a universal set H will map sequences of keys (no matter how skewed they are) as evenly as possible to the available hash addresses.
Inserting a synonym k‘, if key k is already stored: Collision (place h(k) = h(k‘) is already occupied)
h(k‘) is referred to as overflow
Example:
Insert sequence:
25, 2, 15, 50, 13, 6, 20
Hash Function:
h(k) = k mod 7
Method:
Each element of the hash table is a reference to an overflow chain (linked list); overflowing keys are stored in this list.
Insert a key k
• search for k as described above (ends unsuccessfully – otherwise it will not be inserted)
• create list element for k and insert it in the overflow list
Remove a key k
• search for k as described above
• if successful, remove from overflow list
Complexity of the search (new keys are always appended at the end of the overflow list):
C′n ... expected number of inspected positions in an unsuccessful search
C′n = n/N = α
Cn ... expected number of inspected positions in a successful search
Cn = (1/n) · Σ(j=1..n) (1 + (j−1)/N) = 1 + (n−1)/(2N) ≈ 1 + α/2
Properties of s(j,k)
Sequence
h(k) - s(0,k) mod N
h(k) - s(1,k) mod N
...
h(k) - s(N-2,k) mod N
h(k) - s(N-1,k) mod N
[Figure: linear probing of keys 15, 2, 5, 53, 12, 19 in A[0..6]: occupied runs “coalesce” and cause “primary clustering”; finding 19 takes 6 inspections]
Efficiency gets drastically worse near α = 1.
Open Hashing :: Quadratic Probing
[Figure: quadratic probing of the synonyms 53, 12, 5 in A[0..6]]
Two synonyms always traverse the same probe sequence, i.e. they interfere with each other.
Linear Probing
• Probe sequence: h(k), h(k)−1, h(k)−2, ...
• Problem: primary clustering
• C′n ≈ ½ · (1 + 1/(1−α)²)    Cn ≈ ½ · (1 + 1/(1−α))
Quadratic Probing
• Probe sequence: h(k), h(k)−1, h(k)+1, h(k)−4, h(k)+4, ...
• Permutation, if N = 4i+3 is prime
• Problem: secondary clustering
• C′n ≈ 1/(1−α) − α + ln(1/(1−α))    Cn ≈ 1 − α/2 + ln(1/(1−α))
Uniform Probing
• s(j,k) = πk(j), πk one of the N! permutations of {0,...,N−1}
• each permutation has equal probability
• C′n ≤ 1/(1−α)    Cn ≈ (1/α) · ln(1/(1−α))
Random Probing
• s(j,k) = random number dependent on k
• s(j,k) = s(j′,k) possible, but unlikely
Example
• h1(k) = k mod 7, h2(k) = 1 + k mod 5
• key sequence 15, 22, 1, 29, 26
h1(15) = 1
h1(22) = 1, h2(22) = 3
h1(1) = 1, h2(1) = 2
h1(29) = 1, h2(29) = 5
h1(26) = 5, h2(26) = 2
Resulting table A[0..6]: A[1] = 15, A[3] = 29, A[4] = 26, A[5] = 22, A[6] = 1
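A sketch of double hashing with the example functions; it reproduces the table above (insert_double_hashing is an illustrative name):

def insert_double_hashing(table, k):
    N = len(table)
    h1, h2 = k % 7, 1 + k % 5
    for j in range(N):
        idx = (h1 - j * h2) % N       # probe sequence h(k) - j*h2(k) mod N
        if table[idx] is None:
            table[idx] = k
            return idx
    raise OverflowError("table full")

table = [None] * 7
for k in (15, 22, 1, 29, 26):
    insert_double_hashing(table, k)
print(table)                          # [None, 15, None, 29, 26, 22, 1]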
When inserting:
• k encounters kold in A[i], i.e. i = h(k) − s(j,k) = h(kold) − s(j′,kold)
• kold is already stored in A[i]
• Idea: search a vacant position for k or for kold
Two options:
M1: kold remains in A[i] and k tries the insert position h(k) − s(j+1,k)
M2: k pushes kold to h(kold) − s(j′+1,kold)
[Figure: Brent's algorithm example with h1(k) = k mod 7, h2(k) = 1 + k mod 5 and keys 12, 53, 5, 15, 2, 19: when inserting 19, position h1(19) = 5 is occupied; instead of probing on, a resident key is pushed along its own probe sequence (e.g. key 2 with h2(2) = 3 moves to the vacant position (2 − 3) mod 7 = 6)]
Example
h1(k) = k mod 7, h2(k) = 1 + k mod 5
key sequence 15, 22, 1, 29, 26 with h1(15) = 1, h1(29) = 1, h2(29) = 5
[Figure: the insertion alternatives branch like a binary tree: at each collision either k probes on or the resident key is displaced, recursively]
• time for an unsuccessful search remains unchanged: C′n ≈ 1/(1−α)
• time for a successful search is reduced to Cn(binary tree) < 2.2
Example:
N = 7, K = {0, 1, ..., 500}, keys: 12, 53, 5, 15, 2, 19, 21
h1(k) = k mod 7, h2(k) = 1 + k mod 5
[Figure: tables A[0..6] before and after inserting 21: 19 15 12 53 5 2 becomes 19 15 12 21 53 5 2]
Example
§ h1(k) = k mod 7, h2(k) = 1 + k mod 5
§ key sequence 15, 22, 1, 29, 26
§ h1(15) = 1; h1(29) = 1, h2(29) = 5; h1(26) = 5, h2(26) = 2; h2(22) = 3
§ average search time = (1+2+2+2+2)/5 = 9/5 = 1.8
§ here identical to Brent's algorithm, since in each case a gap is found in the first probing step
Search for k:
• k′ > k in the probe sequence => search unsuccessful
Rule for inserting:
• smaller keys displace larger keys (ordered hashing)
Invariant:
• all keys in the probe sequence before k are smaller than k
(but not necessarily sorted in ascending order)
Problems:
• displacement can trigger a "chain reaction"
• k′ displaced by k: position of k′ in the probe sequence? => requires s(j,k) − s(j−1,k) = s(1,k) for 1 ≤ j ≤ N
Hash functions:
h1(k) = k mod 7, h2(k) = 1 + k mod 5
Key sequence: 15, 2, 43, 4, 8
h1(15) = 1
h1(2) = 2
h1(43) = 1; 43 > 15 → 43 probes on: h2(43) = 4, (1−4) mod 7 = 4
h1(4) = 4; 4 < 43 → 4 displaces 43: h2(43) = 4, (4−4) mod 7 = 0
h1(8) = 1; 8 < 15 → 8 displaces 15: h2(15) = 1, (1−1) mod 7 = 0;
15 < 43 → 15 displaces 43: h2(43) = 4, (0−4) mod 7 = 3
[Figure: table contents after each insertion]
Minimax value
• Is the best utility that can be reached from a current node n onwards,
assuming that both players play optimally from n to the end of the game:
Step 1:
• The entire decision tree is generated (meaning we expand every possible move).
• The utility function is applied to get the terminal values for each node.
[Figure: game tree with MAX node A, MIN nodes B and C, MAX nodes D, E, F, G, and terminal nodes H–O with utilities 4, 8, 9, 3, 2, −2, 9, −1]
Steps 2-5:
• The first minimax values for MAX are determined.
• Node D: max(4, 8) = 8
• Node E: max(9, 3) = 9
• Node F: max(2, −2) = 2
• Node G: max(9, −1) = 9
Steps 6-7:
• The minimax values for MIN are determined.
• Node B: min(8, 9) = 8
• Node C: min(2, 9) = 2
Step 8:
• The minimax value for MAX in the root node is determined.
• Node A: max(8, 2) = 8
Result
• With this we found our optimal playing strategy.
• MAX moves to node B.
• MIN moves to node D.
• MAX moves to node I.
Completeness:
• Minimax is complete if the game tree is finite.
Optimality:
• Optimal if the opponent also plays optimally.
Time Complexity:
• O(b^m)
Space Complexity:
• O(b·m)
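A compact sketch of minimax on the worked example; the nested-list tree encoding is an assumption for illustration, not the slides' representation.

def minimax(node, maximizing):
    if isinstance(node, int):                 # terminal node: utility value
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

tree = [[[4, 8], [9, 3]], [[2, -2], [9, -1]]]  # B,C -> D..G -> H..O
print(minimax(tree, True))                     # -> 8, as derived above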
[Figure: the minimax tree from above]
The entire right branch of node A will never lead to success.
Method
Propagate two parameters [α, β] along the expansion of a path, and update them when backing up:
• α ... best (largest) value found so far for MAX.
• β ... best (smallest) value found so far for MIN.
Pruning (see the sketch after this list)
• Whenever a minimax value of a child of a MIN node is less than or equal to the current α:
➔ ignore the remaining nodes (subtrees) below this MIN node.
• Whenever a minimax value of a child of a MAX node is greater than or equal to the current β:
➔ ignore the remaining nodes (subtrees) below this MAX node.
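The same tree encoding with alpha-beta pruning added, as a hedged sketch (not the slides' literal pseudocode):

def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    if isinstance(node, int):
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if best >= beta:                  # beta cut-off
                break
        return best
    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if best <= alpha:                     # alpha cut-off
            break
    return best

tree = [[[4, 8], [9, 3]], [[2, -2], [9, -1]]]
print(alphabeta(tree, True))                  # -> 8; K and G are never evaluated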
[Figure: the same game tree with empty α/β slots at nodes A, B, C, D, E, F, G]
Step 1:
• The entire decision tree is generated.
• The utility function is applied to get the terminal values for each node.
• In node A, α is set to −∞ and β to +∞; [α, β] is propagated down through B to node D.
Step 2:
• In node D, MAX finds the value 4 of node H.
• 4 > α(−∞): α is updated to 4 and the value of D is updated to 4.
Based on an example from John Levine (https://www.youtube.com/watch?v=zp3VMe0Jpf8)
Step 3:
• In node D, MAX finds the value 8 of node I.
• 8 > α(4): α is updated to 8 and the value of D is updated to 8.
Steps 4-5:
• The value 8 of node D is backed up to the MIN node B, whose β becomes 8; [α, β] = [−∞, 8] is propagated to node E.
• In node E, MAX finds the value 9 of node J.
• 9 > β(8): the remaining nodes below E are pruned.
Step 6:
• In node A, MAX finds the value 8 of node B.
• 8 > α(−∞): α is updated to 8 and the value of node A is updated to 8.
• α is propagated down to node F.
Step 7:
• In node F, MAX finds the value 2 of node L: 2 < α(8).
Step 8:
• In node F, MAX finds the value −2 of node M: −2 < α(8); the value of F is 2.
Step 9:
• In node C, MIN finds the value 2 of node F.
• 2 < α(8): the remaining branches of C are pruned.
Result
[Figure: final tree: the root A has minimax value 8; the pruned subtrees below E and C were never evaluated]
Can we do better?
• While both algorithms have many applications,
in certain scenarios they might reach their limits.
• This is where we can apply Monte Carlo Tree Search.
At this point the search is halted and the best performing root action is returned.
Tree Policy
In MCTS the most widely used utility function is called Upper Confidence Bound (UCB1):
UCB1(i) = v̄i + C · √(ln N / ni)
where v̄i is the average value estimate of child i, ni its visit count, and N the visit count of its parent.
A value of 2 for the tunable parameter C has been used in the past to yield promising results.
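As a sketch, the UCB1 score of a child can be computed as follows (the parameter names are illustrative):

import math

def ucb1(child_value, child_visits, parent_visits, c=2.0):
    if child_visits == 0:
        return float("inf")           # always try unvisited children first
    return (child_value / child_visits
            + c * math.sqrt(math.log(parent_visits) / child_visits))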
Default Policy
Play out the domain from a given non-terminal state to produce a value estimate (simulation). In the
simplest case these are just random moves.
[Figure: a simulation from the new node returns the value estimate v = 20; backpropagation adds it to every node on the path up to the root]
• UCB1(s2) = 10 + 2·√(ln 2 / 1) = 11.67
[Figure: s1 has children s3, s4 with v = 0, n = 0]
• UCB1(s2) = 10 + 2·√(ln 3 / 1) = 12.10
• Since the UCB1 score of s2 is higher, we explore this branch.
[Figure: s2 is expanded with children s5, s6; a simulation from s5 returns v = 14]
Result
• Following the design of the MCTS algorithm we could do as many iterations as we want.
• However, if we were to stop now, the branches with the highest total scores would be optimal to choose (which is s2 followed by s5).
• More iterations often improve results.
[Figure: tree after 4 iterations: s0 (v = 44, n = 4) with children s1 (v = 20, n = 2) and s2 (v = 24, n = 2); below s1: s3 (n = 1), s4 (n = 0); below s2: s5 (v = 14, n = 1), s6 (n = 0)]
Background
• Remember, in a 19x19 Go board there are 2.08 × 10^170 valid game states.
• While boards with the size of 5x5 have successfully been solved in 2002,
19x19 boards have long been assumed unsolvable.
Silver, D., Huang, A., Maddison, C. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).
Value Networks
• A value network approximates the optimal value function.
• Trained by 30 million moves sampled from distinct games of self-play from RL-policy.
• While the policy networks reveal which moves are promising,
the value network determines how good a board position is.
Network overview
Silver, D., Huang, A., Maddison, C. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).
Example:
V = { a, b, c, d, e }
E = { (a,b), (a,c), (a,d), (b,e), (c,d), (c,e), (d,e) }
[Figure: drawing of this graph]
Two vertices are adjacent to each other, if they are connected by an edge.
• Example: a and c are adjacent to each other
The degree of a vertex deg(v) is defined as the number of vertices adjacent to it.
A cyclic path (short: cycle) is a simple path with the exception that the first vertex in the path is identical to the last vertex in the path.
• Example: a, b, e, c, a
A graph is connected if any two of its vertices are connected to each other by a path.
A subgraph is a subset of vertices and edges of a graph, which in turn form a graph.
[Figure: several subgraphs of the example graph]
• in a complete graph we have: m = ½ · Σ(v∈V) deg(v) = ½ · Σ(v∈V) (n − 1) = n(n−1)/2
If m < n-1, then the graph is not connected (consists of more than one connected component).
[Figure: graph G and a spanning tree of G]
An edge (v,w) is defined as directed, if it leads v to w, but not vice versa. Then (v,w) is an ordered pair.
• Illustration: as arrow
v w
A graph is directed (digraph) if it contains directed edges.
[Figure: directed example graph]
[Figures: edge list and adjacency list structures for the flight network with vertices LNZ, FRA, VIE, ZRH, LHR, BRU, CDG and edges LH2829, OS931, OS122, SR533, BA901, OS205, AF1539, SN4054, IW700, SN4051, SN3640]
• each vertex keeps an incidence container (separately for in- and out-edges), giving its in-degree and out-degree
to
from LNZ FRA VIE ZRH LHR BRU CDG
LNZ F T T F F F F
FRA F F T T T F F
VIE F F F T F F T
ZRH F F F F F T T
LHR F F F F F F F
BRU F F F T F F F
CDG F F F F F T F
to
from LNZ FRA VIE ZRH LHR BRU CDG
LHR F F F F F F F
BRU F F F SN4051 F F F
CDG F F F F F SN3640 F
Complexity
Edge list Adjacency list Adjacency matrix
Space O(m+n) O(m+n) O(n2)
Operations:
size, isEmpty O(1) O(1) O(1)
vertices O(n) O(n) O(n)
edges O(m) O(m) O(m)
endVertices, opposite, isDirected O(1) O(1) O(1)
incidentEdges O(m) O(deg(v)) O(n)
areAdjacent O(m) O(min(deg(u),deg(v))) O(1)
insertVertex O(1) O(1) O(n2)
removeVertex O(m) O(deg(v)) O(n2)
insertEdge, insertDirectedEdge, removeEdge O(1) O(1) O(1)
Algorithm DFS(v):
mark v as visited
for each edge e incident on v do
    if edge e is unexplored then
        let w be the other endpoint of e
        if vertex w is unexplored then
            label e as a discovery edge
            DFS(w)
        else
            label e as a back edge
[Figure: DFS on a 4×4 grid graph with vertices A–P; discovery edges and back edges are marked]
Properties of DFS
For an undirected graph G in which a DFS starting with vertex s is executed, we have:
• The traversal visits all vertices in the connected component (=maximum connected subgraph), which contains s.
• The set of discovery edges form a spanning tree for this connected component.
Complexity of DFS
• If ns is the number of vertices in a connected component with s and if ms is the number of edges in the connected
component of s, then the complexity of DFS is O(ns + ms) on the assumptions:
• Graph is stored so that access to the vertices and edges is O(1)
• Marking and testing of the edges is O(1)
• There is a mechanism that systematically searches the edges of a node without looking at an edge more than
once.
Principle:
• Let s be the start node of BFS, set s to level 0
• In the first step, visit all (not yet visited) vertices that are adjacent to s and set them to level 1.
• In the next step visit for all vertices on level 1 all not yet visited adjacent vertices and set them to level 2.
• Repeat this step as long as all vertices have been reached.
Result:
• Traversal of the graph
• Level of a vertex v shows the length of the shortest path from v to s.
i ← 0
create container L0 containing the start vertex s
while Li is not empty do
    create container Li+1 to initially be empty
    for each vertex v in Li do
        for each edge e incident on v do
            if edge e is unexplored then
                let w be the other endpoint of e
                if vertex w is unexplored then
                    label e as a discovery edge
                    insert w into Li+1
                else
                    label e as a cross edge
    i ← i+1
[Figure: BFS on the 4×4 grid graph A–P with levels 1–5; discovery edges and cross edges are marked]
Let G be an undirected graph in which BFS is executed starting with vertex s, then we have:
• The traversal visits all vertices in the connected component, which contains s.
• The set of discovery edges form a spanning tree (BFS tree) for this connected component.
• For each vertex v on level i, the path to s along the BFS tree has length i
and any other path from v to s has at least length i.
• If (u,v) is an edge that is not in the BFS tree, then the levels of u and v differ by 1 at maximum.
Transitive closure
• The transitive closure G* of a graph G is obtained by inserting a directed edge (v,w) whenever w is reachable from v (i.e. there is a directed path from v to w)
Assumption: Operations areAdjacent and insertDirectedEdge have complexity O(1) (Graph is e.g. as adjacency matrix stored)
Algorithm FloydWarshall(G);
let v1 ... vn be an arbitrary ordering of the vertices of G0 = G
for k = 1 to n do
// consider all possible routing vertices vk
Gk = G k-1
for each ( i, j = 1,..,n ) ( i != j ) ( i,j != k ) do
// for each pair of vertices vi and vj
if Gk-1.areAdjacent(vi,vk) and Gk-1.areAdjacent(vk,vj) then
Gk. insertDirectedEdge(vi,vj,null)
return Gn
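A Python sketch of the same idea on a boolean adjacency matrix (the matrix representation is an assumption; the slides use graph operations instead):

def transitive_closure(adj):
    n = len(adj)
    reach = [row[:] for row in adj]
    for k in range(n):                        # routing vertex v_k
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    return reach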
[Figure: weighted flight network; e.g. weight(u,z) = 110 for u = FRA, z = VIE; further edge weights 85, 80, 55, 65, 75, 45, 75, 80, 130]
Dijkstra‘s algorithm
• Finds shortest path for all vertices z to start vertex s in a graph
• with undirected edges and
• with non-negative edge weights
• based on greedy method
Algorithmic idea
• The set of vertices for which a shortest path has already been found is stored in set C.
• D(z) denotes the length of the shortest path from s to z found so far.
• When a new vertex u is visited, check for each adjacent vertex z whether the route via u is shorter than the shortest route found so far, i.e. whether D[u] + weight(u,z) < D[z]
• If yes, the path via u is saved as the new shortest path to z and D[z] is updated (relaxation)
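A hedged sketch of this idea using a priority queue; the adjacency-dict representation is an assumption for illustration:

import heapq

def dijkstra(graph, s):
    # graph: {u: [(v, weight), ...]} with non-negative weights
    D = {s: 0}
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > D.get(u, float("inf")):
            continue                          # stale queue entry
        for v, w in graph.get(u, []):
            if d + w < D.get(v, float("inf")):
                D[v] = d + w                  # relaxation
                heapq.heappush(pq, (D[v], v))
    return D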
Algorithm ShortestPath(G,s):
[Figures: Dijkstra's algorithm on the weighted flight network: all distance labels start at ∞ (the start vertex at 0) and are successively relaxed, e.g. VIE 45, ZRH 125, BRU 185, CDG 205 → 200]
Algorithm PrimJarnik(G):
[Figures: the Prim-Jarnik algorithm on the weighted flight network: after the initialization with ∞ labels, the cheapest edge connecting the growing tree to a new vertex is chosen in each step]
Put one edge after the other into the MST under the following conditions:
• Select the edge with the lowest weight
• An edge is only inserted, if no cycle will result from the insertion
Data structure:
• Algorithm manages a set of trees (forest)
• An edge is accepted if it connects vertices from different trees
Therefore the data structure must manage disjunctive subsets and support the following operations:
• find(u) returns the set, which u contains
• union(u,v) merges the sets, which u and v contain
T ← ∅
while Q ≠ ∅ do
    (u,v) ← Q.removeMinElement()
    if P.find(u) ≠ P.find(v) then
        add edge (u,v) to T
        P.union(u,v)
return T
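A sketch of the algorithm with a simple union-find structure taking the role of P (the edge format (weight, u, v) and integer vertex ids are assumptions):

def kruskal(n, edges):
    parent = list(range(n))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]     # path halving
            u = parent[u]
        return u
    T = []
    for w, u, v in sorted(edges):             # cheapest edge first
        ru, rv = find(u), find(v)
        if ru != rv:                          # different trees: no cycle
            parent[ru] = rv                   # union
            T.append((u, v, w))
    return T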
[Figure: Kruskal's algorithm on the weighted flight network; the edges accepted into the MST are highlighted]
Eccentricity
• Maximum (shortest) distance from the vertex to any other vertex in a connected graph
Diameter
• is the maximum (shortest) distance between two vertices of a connected graph
Radius
• minimum eccentricity among all vertices
Relation: radius ≤ diameter ≤ 2 · radius
[Figure: graph with vertices a, b, c, d (top row) and h, g, f, e (bottom row)]
• longest shortest path (diameter): h-a-g-f-e-d = 5
• eccentricities of a, b, c, d, e, f, g, h: 4, 4, 3, 5, 4, 3, 3, 5
radius:
• 3 (minimum of the eccentricities)
Interpretation
• low average distance => short paths between most of the vertices
• high average distance => general difficult to reach one vertex from another
Notation:
• vertex degree
• (minimum) distance between vertex i and vertex j
• diameter
• average distance
Moore Bound
[Figure: Moore bound tree for degree d = 4]
Moore Graph: a regular graph of degree d and diameter Δ (also k) whose number of vertices equals the upper bound
n = 2Δ + 1 if d = 2
n = (d·(d−1)^Δ − 2) / (d − 2) if d > 2
GCR Construction
Obtained from a ring by adding chords (= additional link between nodes) by variations over:
• Number c of chords
• Chord length w = number of edges between two nodes (on the ring)
• Period p = number of nodes having the same chord length (same connection pattern)
Example: M = 14
Example: p = 1, w = 2 → diameter = 4
Example: p = 1, w = 3 → diameter = 3
Example: p = 1, w = 5, 9 → diameter = 3
Example: p = 7; w1 = 3, 10; w2 = 5, 8; w3 = 6, 9; w4 = 4, 11; w5 = 5, 8; w6 = 6, 8; w7 = 6, 9 → diameter = 2
[Figure: example graph]
Edge-disjoint paths (with common source/sink) do not have any common edges
Vertex-disjoint paths do not have any common vertices
[Figure: two paths p1 and p2 between a and d over b, c, e: edge-disjoint on the left, vertex-disjoint on the right]
[Figure: graph with vertices a–g; the numbers at the vertices are the resulting values]
a: a,b,c,d,g → d,a,c,{b,g} → a=2
b: a,b,c → a,b,c → b=2
c: a,b,c,d → d,a,c,b → c=3
d: a,c,d,e,f,g → d,a,c,g,e,f → d=1
e: d,e → d,e → e=2
f: d,f → d,f → f=2
g: a,d,g → d,a,g → g=3
1-hop-degree
[Figures: step-by-step illustration on the graph with vertices a, b, c, d, e, f]
Stress Centrality
total number of all pairs shortest paths that pass through a vertex v
It is an estimation of the stress that a vertex in a network bears, assuming all communication
will be carried along the shortest path.
Betweenness Centrality
If the shortest paths between nodes of a network pass through some vertices more often than through others, then these vertices are significantly more important than others for communication purposes.
A simple procedure to find CB is to calculate all shortest paths in the graph G and count the number of paths that pass through v, excluding the paths that start or end at v.
forall v in V
    bv ← 1
    find BFS paths from v to all other nodes
end
forall s in V
    starting from the farthest nodes, move from u towards s along the paths using vertex v:
    bv ← bv + bu
end
• The score of a node can simply be its degree. We are adding the degrees of the neighbors of a
vertex i to find its new score.
• To find the eigenvalue centrality of a graph G, we need to find the eigenvalues of the adjacency matrix A. We
select the largest eigenvalue and its associated eigenvector. This eigenvector contains the eigenvalue
centralities of all vertices.
/**
 * Determine whether the graph is connected.
 */
public class ConnectivityTesterDFS<V, E> extends DFS<V, E> {
protected int reached;
/** Find a path between the start vertex and a given target vertex. */
public class FindPathDFS<V, E> extends DFS<V, E> {
protected LinkedList<Vertex<V>> path;
protected boolean done;
protected Vertex<V> target;
@Override
protected void finishVisit(Vertex<V> v) {
if (!cycle.isEmpty() && !done) cycle.removeLast();
}
@Override
protected void traverseDiscovery(Edge<E> e, Vertex<V> from) {
if (!done) cycle.addLast(e);
}
@Override
protected void traverseBack(Edge<E> e, Vertex<V> from) {
if (!done) {
cycle.addLast(e); // back edge e creates a cycle
cycleStart = g.opposite(from, e);
done = true;
}
}
@Override
protected boolean isDone() {
return done;
}
}
à at.jku.pervasive.ad2.vo11.FindCycleDFS.java
Find the flow f in a given network N with maximum value.
[Figure: flow network with edge labels flow/capacity; the maximum flow is maxFlow = 5]
[Figure: a cut X through the example network separating S from T; c(X) = 1+2+2+4+2 = 11]
A cut X = (Vs, Vt) is a cut through the network that separates the set of vertices into two partitions.
The capacity of a cut is the sum of the capacities of the “cut” edges:
c(X) = Σ(v∈Vs, w∈Vt) capacity(v, w)
We have:
Value of the maximum flow = capacity of the minimum cut
Maximum Flow
[Figure: network fragment with flow/capacity labels]
• Forward edge: flow(u,v) < capacity(u,v) → flow can be increased
• Backward edge: flow(u,v) > 0 → flow can be decreased
Method FindFlow
if augmenting paths exist then
find augmenting path;
increase flow;
recursive call to FindFlow;
A path s → t (regardless of the direction of the arrows) on which the flow can be increased is called an augmenting path.
[Figure: residual graph construction: forward edge cf(u,v) = capacity(u,v) − f(u,v), backward edge cf(v,u) = f(u,v); edges with capacity 0 are removed]
An augmenting path in N corresponds to a directed path from s to t in Nf and can therefore be determined using DFS in Nf.
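A hedged Python sketch of the FindFlow method: find an augmenting path by DFS in the residual graph and push flow along it until none exists (the n×n capacity matrix and the name max_flow are assumptions):

def max_flow(capacity, s, t):
    n = len(capacity)
    flow = [[0] * n for _ in range(n)]

    def dfs(u, limit, seen):
        if u == t:
            return limit
        seen.add(u)
        for v in range(n):
            residual = capacity[u][v] - flow[u][v]
            if residual > 0 and v not in seen:
                pushed = dfs(v, min(limit, residual), seen)
                if pushed:
                    flow[u][v] += pushed
                    flow[v][u] -= pushed      # allows later back-flow
                    return pushed
        return 0

    total = 0
    while True:
        pushed = dfs(s, float("inf"), set())
        if not pushed:
            return total
        total += pushed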
AUGMENTED PATH :: EXAMPLE STEP 1/7
[Figures: capacity, current flow, augmenting path, and residual graph; the flow is increased by 1 along the augmenting path and the residual graph is updated]
AUGMENTED PATH :: EXAMPLE STEPS 2/7 – 6/7
[Figures: in each step an augmenting path is found in the residual graph and the flow is increased by 1; capacity, current flow, augmenting path, and updated residual graph are shown]
AUGMENTED PATH :: EXAMPLE STEP 7/7
No augmenting path from source S to sink T exists where the flow could be increased (when flow = capacity).
[Figure: final flow and residual graph]
O((n + m) × n × m) = O(n⁵) for dense graphs with m = O(n²).
Advantage:
The runtime is independent of the value of the maximum flow.
Erdős Pál
(1913-1996)
Small-world network
• most nodes are not neighbors of one another
• but most nodes can be reached from every other node by a small number of hops or steps
Stanley Milgram (1933-1984)
The "six degrees of separation" model; "lost letter technique" (1960s):
took the U.S. cities of Omaha, Nebraska, and Wichita, Kansas, to be the starting points, and Boston, Massachusetts, to be the end point of a chain of correspondence.
(Figure: random network with p = 1/6, N = 10; mean degree ⟨k⟩ ≈ 1.5)
Degree distribution (in very large networks): expected vs. found
Web: over 1 trillion documents, found by starting at a document and following its links recursively
Albert-László Barabási (1967-)
(Figure: degree distribution over the number of links k — Gaussian random network vs. power-law scale-free network)
Costa, Evsukoff et al. (2010) Complex Networks
The Network Behind an Organisation :: Shareholder Network of the Japanese Automotive Industry (1985 vs. 2003)
https://www.quantamagazine.org/20130904-evolution-as-opportunist/
Krischke & Röpcke (2014) Graphen und Netzwerktheorie
(Figure: spreading rate of a virus)
Small-World Effect
• Six Degrees of Separation
• A famous experiment conducted by Travers and Milgram (1969)
- Subjects were asked to forward a chain letter via an acquaintance in order to reach a target person
- The average path length was around 5.5
J. Leskovec and E. Horvitz. Planetary-scale views on a large instant-messaging network. In Proceedings of the 17th international conference on World Wide Web, WWW ’08, pages 915–924, New York, NY, USA, 2008. ACM
Diameter
Measures used to calibrate the small-world effect:
• Diameter: the longest shortest path between any two nodes of the network
• Average shortest path length
Link prediction
Network-Based Classification
Community Analysis
Density of connections
• Friends of a friend are likely to be friends as well
- clustering coefficient Ci: density of connections among one's friends,
  Ci = ki / (di (di − 1) / 2)
  di … number of neighbours of node i
  ki … number of connections among the neighbouring friends
Example:
d6 = 4, N6 = {4, 5, 7, 8}
k6 = 4, as e(4,5), e(5,7), e(5,8), e(7,8)
C6 = 4 / (4·3/2) = 2/3
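A minimal Python sketch of this coefficient (an assumed helper, not from the slides; the adjacency representation adj as a dict of neighbour sets is an illustrative choice):

def clustering_coefficient(adj, i):
    # adj: dict mapping node -> set of neighbouring nodes
    neighbours = adj[i]
    d = len(neighbours)
    if d < 2:
        return 0.0
    # k = number of edges among the neighbours of i
    k = sum(1 for u in neighbours for v in neighbours if u < v and v in adj[u])
    return k / (d * (d - 1) / 2)

# The slide's example: node 6 with N6 = {4, 5, 7, 8}
adj = {6: {4, 5, 7, 8}, 4: {6, 5}, 5: {6, 4, 7, 8}, 7: {6, 5, 8}, 8: {6, 5, 7}}
print(clustering_coefficient(adj, 6))   # 2/3 ≈ 0.667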
Degree of Centrality
The importance of a node is determined by the number of nodes adjacent to it
• The larger the degree, the more important the node is
• Only a small number of nodes have high degrees in many real-life networks
Degree Centrality
Closeness Centrality
• "Central" nodes are important, as they can reach the whole network more quickly than non-central nodes
• Average distance: Davg(v) = (1 / (n − 1)) Σ_{u≠v} d(v, u)
• Closeness centrality: Cc(v) = (n − 1) / Σ_{u≠v} d(v, u)
Example (n = 9):
Cc(3) = 8 / (1+1+1+2+2+3+3+4) = 8/17 ≈ 0.47
Cc(4) = 8 / (1+2+1+1+1+2+2+3) = 8/13 ≈ 0.62
Node 4 is more central than node 3
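A minimal BFS-based sketch of closeness centrality (an assumed illustration for a connected, unweighted network; closeness and the dict-of-sets adj are illustrative choices):

from collections import deque

def closeness(adj, v):
    # BFS from v over an unweighted, connected network
    dist = {v: 0}
    q = deque([v])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    n = len(adj)
    return (n - 1) / sum(dist.values())   # Cc(v) = (n-1) / sum of distances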
Betweenness Centrality
• Node betweenness counts the number of shortest paths that pass through a node:
  CB(v) = Σ_{s≠v≠t} σst(v) / σst
  (σst: number of shortest paths between s and t; σst(v): number of those passing through v)
• Nodes with high betweenness are important in communication and information diffusion
• Betweenness centrality example: CB(4) = 15
• What is the betweenness centrality of node 5?
(A) Example of a hypothetical social network illustrating the individual-level metrics. Letters label individuals in the network.
(B) The network is restructured into a hierarchy such that the node with the highest relevant centrality measure is on top (following the arrow). For example, node C has the highest eigenvector centrality, but node E has the highest betweenness centrality.
Social Media allows users to connect to each other more easily than ever
• One user might have thousands of friends online
• Who are the most important ones among your Facebook friends?
Bridge ("shortcut")
• Bridges are rare in real-life networks
• Alternatively, one can relax the definition by checking whether the distance between the two terminal nodes increases if the edge is removed
• The larger the resulting distance, the weaker the tie is
Neighbourhood Overlap
Tie strength can be measured based on neighborhood overlap; the larger the overlap, the stronger the tie is.
Influence Modeling
Influence modeling is one of the fundamental questions for understanding information diffusion, the spread of new ideas, and word-of-mouth (viral) marketing.
• A social network is represented by a directed graph, with each actor being one node;
• Each node v chooses a threshold θv randomly from a uniform distribution on the interval [0, 1].
• In each discrete step, all nodes that were active in the previous step remain active.
• The nodes satisfying the following condition will be activated: the total influence weight of v's active neighbours reaches its threshold, Σ_{w active neighbour of v} bv,w ≥ θv (linear threshold model).
• In the independent cascade model, a node w, once activated at step t, has one chance to activate each of its neighbors randomly:
- for a neighboring node (say, v), the activation succeeds with probability pw,v (e.g., p = 0.5)
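A minimal simulation sketch of the cascade step (an assumed illustration; independent_cascade, adj, and the uniform probability p standing in for pw,v are illustrative choices):

import random

def independent_cascade(adj, seeds, p=0.5):
    active = set(seeds)
    frontier = list(seeds)                 # nodes activated in the previous step
    while frontier:
        next_frontier = []
        for w in frontier:
            for v in adj[w]:               # w has one chance per neighbour v
                if v not in active and random.random() < p:
                    active.add(v)          # activation succeeded with probability p
                    next_frontier.append(v)
        frontier = next_frontier
    return active                          # all nodes active at the end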
Influence Maximization
• Given a network and a parameter k, which k nodes should be selected to be in the activation set B in order to maximize the influence in terms of active nodes at the end?
• Let σ(B) denote the expected number of nodes that can be influenced by B; the optimization problem can be formulated as follows:
  max σ(B) subject to |B| = k
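A greedy sketch of this optimization (an assumed illustration, not the slides' method; it estimates σ(B) by Monte Carlo runs of the independent_cascade function above, and greedy_influence and runs are illustrative names):

def greedy_influence(adj, k, runs=100, p=0.5):
    def sigma(B):                          # Monte Carlo estimate of sigma(B)
        return sum(len(independent_cascade(adj, B, p)) for _ in range(runs)) / runs
    B = set()
    for _ in range(k):                     # add the node with the largest marginal gain
        best = max((v for v in adj if v not in B), key=lambda v: sigma(B | {v}))
        B.add(best)
    return B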
§ Suppose we have a binary attribute associated with each node (say, whether or not one is a smoker)
§ If the attribute is correlated with the network, we expect actors sharing the same attribute value to be positively correlated with social connections
§ That is, smokers are more likely to interact with other smokers, and non-smokers with non-smokers
• Red nodes denote non-smokers, and green ones smokers. If there were no correlation, the probability of an edge connecting a smoker and a non-smoker would be 2 × 4/9 × 5/9 ≈ 49%.
• In this example the fraction is 2/14 ≈ 14% < 49%, so this network demonstrates some degree of correlation with respect to smoking behavior.
• A more formal way is to conduct a χ² test for independence of social connections and attributes [1]
[1] T. La Fond and J. Neville. Randomization tests for distinguishing social influence and homophily effects. In Proceedings of the 19th International Conference on World Wide Web, WWW '10, pages 601–610, New York, NY, USA, 2010. ACM.
• Node-Centric Community
• Each node in a group satisfies certain properties
• Group-Centric Community
• Consider the connections within a group as a whole;
the group has to satisfy certain properties without zooming into the node level
• Network-Centric Community
• Partition the whole network into several disjoint sets
• Hierarchy-Centric Community
• Construct a hierarchical structure of communities
Complete Mutuality
• Cliques
Reachability of members
• k-clique, k-clan, k-club
Nodal degrees
• k-plex, k-core
Clique
A maximum complete subgraph in which all nodes are adjacent to each other
• A node with degree < k − 1 cannot be part of a clique of size k and can be pruned
• Many nodes will be pruned, as social media networks follow a power-law distribution of node degrees
• Suppose we sample a sub-network with nodes {1, 2, 3, 4, 5} and find a clique {1, 2, 3} of size 3
• In order to find a clique of size > 3, repeatedly remove all nodes with degree ≤ 3 − 1 = 2:
- remove nodes 2 and 9
- remove nodes 1 and 3
- remove node 4
Input
• A parameter k, and a network
Procedure
• Find all cliques of size k in the given network
• Construct a clique graph: two cliques are adjacent if they share k − 1 nodes
• Each connected component in the clique graph forms a community (the union of its cliques' nodes)
CPM Example
Cliques of size 3:
{1, 2, 3}, {1, 3, 4}, {4, 5, 6}, {5, 6, 7},
{5, 6, 8}, {5, 7, 8}, {6, 7, 8}
Communities:
{1, 2, 3, 4}
{4, 5, 6, 7, 8}
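A minimal Python sketch of this procedure (an assumed illustration; cpm and the frozenset representation of cliques are illustrative choices), reproducing the example above:

from itertools import combinations

def cpm(cliques, k):
    # cliques: list of k-cliques, each a frozenset of nodes
    adjacent = {c: [] for c in cliques}
    for a, b in combinations(cliques, 2):
        if len(a & b) == k - 1:            # adjacent if they share k-1 nodes
            adjacent[a].append(b)
            adjacent[b].append(a)
    communities, seen = [], set()
    for c in cliques:                      # connected components via DFS
        if c in seen:
            continue
        stack, members = [c], set()
        while stack:
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            members |= cur                 # community = union of clique nodes
            stack.extend(adjacent[cur])
        communities.append(members)
    return communities

cliques = [frozenset(c) for c in [{1, 2, 3}, {1, 3, 4}, {4, 5, 6}, {5, 6, 7},
                                  {5, 6, 8}, {5, 7, 8}, {6, 7, 8}]]
print(cpm(cliques, 3))   # [{1, 2, 3, 4}, {4, 5, 6, 7, 8}]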
k-cliques
Community Analysis :: Clustering
Cliques: {1, 2, 3}
2-cliques: {1, 2, 3, 4, 5}, {2, 3, 4, 5, 6}
2-clubs: {1,2,3,4}, {1, 2, 3, 5}, {2, 3, 4, 5, 6}
The group-centric criterion requires the whole group to satisfy a certain condition
• e.g., the group density ≥ a given threshold:
  |Es| / (|Vs| (|Vs| − 1) / 2) ≥ threshold
Network-Centric clustering
Approaches:
• Clustering based on vertex similarity
• Latent space models
• Block model approximation
• Spectral clustering
• Modularity maximization
Clustering based on vertex similarity:
Jaccard(4, 6) = |{5}| / |{1, 3, 4, 5, 6, 7, 8}| = 1/7
cosine(4, 6) = 1 / √(4 · 4) = 1/4
Latent space models (multidimensional scaling, MDS):
S Sᵀ ≈ −(1/2) (I − (1/n) 1 1ᵀ) (P ∘ P) (I − (1/n) 1 1ᵀ) = P̃
• Solution: S = V Λ^(1/2), where V holds the top eigenvectors of P̃ and Λ the corresponding eigenvalues
MDS Example
P: matrix of geodesic distances between the nodes

P =
0 1 1 1 2 2 3 3 4
1 0 1 2 3 3 4 4 5
1 1 0 1 2 2 3 3 4
1 2 1 0 1 1 2 2 3
2 3 2 1 0 1 1 1 2
2 3 2 1 1 0 1 1 2
3 4 3 2 1 1 0 1 1
3 4 3 2 1 1 1 0 2
4 5 4 3 2 2 1 2 0

(Figure: the centered matrix P̃ computed from P, and the resulting two-dimensional embedding S of the nodes)
Two communities:
{1, 2, 3, 4} and {5, 6, 7, 8, 9}
Block Models
Two communities:
{1, 2, 3, 4} and {5, 6, 7, 8, 9}
Cut
• Most interactions are within a group, whereas interactions between groups are few
• community detection (clustering) → minimum cut problem
• Cut: a partition of the vertices of a graph into two disjoint sets
• Minimum cut problem: find a graph partition such that the number of edges between the two sets is minimized
• Minimum cut often returns an imbalanced partition, with one set being a singleton (e.g., node 9)
• Change the objective function to take the community size into account
Ratio Cut(π) = (1/k) Σ_{i=1..k} cut(Ci, C̄i) / |Ci|
Normalized Cut(π) = (1/k) Σ_{i=1..k} cut(Ci, C̄i) / vol(Ci)
Ci: a community
cut(A, B): number of edges between the two sets of the cut
|Ci|: number of nodes in Ci (community size)
vol(Ci): sum of degrees of the nodes in Ci
Spectral Clustering
• Both ratio cut and normalized cut can be reformulated as
  min_{S ∈ {0,1}^{n×k}} Tr(Sᵀ L̃ S)
• where
  L̃ = D − A (graph Laplacian, for ratio cut)
  L̃ = I − D^(−1/2) A D^(−1/2) (normalized graph Laplacian, for normalized cut)
  D = diag(d1, d2, ..., dn): diagonal matrix of degrees
• Spectral relaxation:
  min_S Tr(Sᵀ L̃ S) s.t. Sᵀ S = Ik
Modularity Maximization
• Modularity measures the strength of a community partition by taking the degree distribution into account
• Given a network with m edges, the expected number of edges between two nodes with degrees di and dj is di dj / 2m
• Strength of a community C: Σ_{i∈C, j∈C} (Aij − di dj / 2m)
• Modularity: Q = (1/2m) Σ_l Σ_{i,j ∈ Cl} (Aij − di dj / 2m)
Modularity Maximization
max Q = (1/2m) Tr(Sᵀ B S) s.t. Sᵀ S = Ik, with modularity matrix B = A − d dᵀ / 2m
• Optimal solution: top eigenvectors of the modularity matrix B
• Apply k-means to S as a post-processing step to obtain the community partition
Two communities:
{1, 2, 3, 4} and {5, 6, 7, 8, 9}
k-means
(Figure: modularity matrix B and the relaxed indicator matrix S for the example network)
Utility Matrix M:
• modified proximity matrix P̃, if latent space models
• adjacency matrix A, if block models
• graph Laplacian L̃, if spectral clustering
• modularity matrix B, if modularity maximization
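A compact numpy sketch of this recipe (an assumed illustration; spectral_communities is an illustrative name, shown with the ratio-cut Laplacian as utility matrix and a tiny fixed-iteration k-means instead of a library call):

import numpy as np

def spectral_communities(A, k):
    d = A.sum(axis=1)
    L = np.diag(d) - A                      # utility matrix: graph Laplacian (ratio cut)
    vals, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    S = vecs[:, :k]                         # relaxed indicator matrix (smallest eigenvectors)
    rng = np.random.default_rng(0)
    centers = S[rng.choice(len(S), size=k, replace=False)]
    for _ in range(50):                     # k-means post-processing on the rows of S
        labels = np.argmin(((S[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        centers = np.array([S[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

For modularity maximization, one would swap the utility matrix to B and take its largest eigenvectors instead of the smallest; the k-means step stays the same.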
Hierarchy-Centric Clustering
• Representative approaches:
• Divisive Hierarchical Clustering
• Agglomerative Hierarchical Clustering
Divisive clustering
• Partition the nodes into several sets
• Each set is further divided into smaller ones
• Network partitioning (e.g., minimum cut) can be applied to compute each division
Edge Betweenness
edge-betweenness(e) = Σ_{s<t} σst(e) / σst
(σst: number of shortest paths between s and t; σst(e): number of those passing through edge e)
Example: the edge betweenness of e(1, 2) is 4, as all the shortest paths from 2 to {4, 5, 6, 7, 8, 9} have to pass either e(1, 2) or e(2, 3) (contributing 6 · 1/2 = 3), and e(1, 2) is the unique shortest path between 1 and 2 (contributing 1).
• An edge with higher betweenness tends to be a bridge between two communities.
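A brute-force sketch of this measure for small teaching examples (an assumed illustration; edge_betweenness is an illustrative name, and it enumerates all shortest paths per pair rather than using the faster Brandes algorithm):

from collections import deque
from itertools import combinations

def edge_betweenness(adj):
    eb = {}
    for s, t in combinations(sorted(adj), 2):
        dist = {s: 0}                      # BFS distances from s
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        if t not in dist:
            continue
        paths = []                         # all shortest s-t paths, as edge lists
        def back(v, path):
            if v == s:
                paths.append(path)
                return
            for u in adj[v]:
                if dist.get(u) == dist[v] - 1:
                    back(u, path + [(min(u, v), max(u, v))])
        back(t, [])
        for path in paths:                 # each path contributes 1/sigma_st per edge
            for e in path:
                eb[e] = eb.get(e, 0.0) + 1.0 / len(paths)
    return eb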