Data Structures 3
Data Structures 3
Bundle 8
Data Structures - 3
Editor
Murat Ekici
Reviewers
Kayacan Vesek
1
Contents
3 Mo’s Algorithm 10
3.1 Mo’s algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4 Trie 13
4.1 Insertion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4.2 Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2
1 Segment Tree With Lazy Propagation
In the previous lectures, update function was called to update only a single value in array. Please
note that a single value update in array may cause changes in multiple nodes in Segment Tree as
there may be many segment tree nodes that have this changed single element in it’s range.
3
1.2 Lazy Propagation Algorithm
• Return the sum of all elements in the given range [l, r].
Notice that if update was for single element, we could use the segment tree we have learned before.
Trivial structure comes to mind is to use an array and do the operations by traversing and increasing
the elements one by one. Both operations would take O(L) time complexity in this structure where
L is the number of elements in the given range.
Let’s use segment tree’s we have learned. Second operation is easy, We can do it in O(logN ). What
about the first operation. Since we can do only single element update in the regular segment tree,
we have to update all elements in the given range one by one. Thus we have to perform update
operation L times. This works in O(LlogN ) for each range update. This looks bad, even worse than
just using an array in a lot of cases.
So we need a better structure. People developed a trick called lazy propagation to perform range
updates on a structure that can perform single update (This trick can be used in segment trees,
treaps, k-d trees ...).
Trick is to be lazy i.e, do work only when needed. Do the updates only when you have to. Using
Lazy Propagation we can do range updates in O(logN ) on standart segment tree. This is definitely
fast enough.
Let’s be lazy as told, when we need to update an interval, we will update a node and mark its
children that it needs to be updated and update them when needed. For this we need an array
lazy[] of the same size as that of segment tree. Initially all the elements of the lazy[] array will be 0
representing that there is no pending update. If there is non-zero element lazy[k] then this element
needs to update node k in the segment tree before making any query operation, then lazy[2 ∗ k] and
lazy[2 ∗ k + 1] must be also updated correspondingly.
• If current segment tree node has any pending update, then first add that pending update to
current node and push the update to it’s children.
• If the interval represented by current node lies completely in the interval to update, then update
the current node and update the lazy[] array for children nodes.
• If the interval represented by current node overlaps with the interval to update, then update
the nodes as the earlier update function
4
1 void update(int node, int start, int end, int l, int r, int val) {
2 if(lazy[node] != 0) {
3 tree[node] += (end - start + 1) * lazy[node];
4 if(start != end) {
5 lazy[2 * node] += lazy[node];
6 lazy[2 * node + 1] += lazy[node];
7 }
8 lazy[node] = 0;
9 }
10 if(start > r || end < l)
11 return;
12 if(l <= start && end <= r) [
13 tree[node] += (end - start + 1) * val;
14 if(start != end) {
15 lazy[2 * node] += val;
16 lazy[2 * node + 1] += val;
17 }
18 return;
19 }
20 int mid = (start + end) / 2;
21 update(2 * node, start, mid, l, r, val);
22 update(2 * node + 1, mid + 1, end, l, r, val);
23 tree[node] = tree[2 * node] + tree[2 * node + 1];
24 }
This is the update function for given problem. Notice that when we arrive a node, all the updates that
we postponed that would effect this node will be performed since we are pushing them downwards
as we go to this node. Thus this node will keep the exact values when the range updates are done
without lazy. So it’s seems like it is working. How about queries?
Since we have changed the update function to postpone the update operation, we will have to change
the query function also. The only change we need to make is to check if there is any pending update
operation on that node. If there is a pending update operation, first update the node and then work
same as the earlier query function. As we told in the last subsection, all the postponed updates that
would effect this node will be performed before we reach this node. So sum value we look for will be
there, correctly!
5
1 int query(int node, int start, int end, int l, int r) {
2 if(start > r || end < l)
3 return 0;
4 if(lazy[node] != 0) {
5 tree[node] += (end - start + 1) * lazy[node];
6 if(start != end) {
7 lazy[2 * node] += lazy[node];
8 lazy[2 * node + 1] += lazy[node];
9 }
10 lazy[node] = 0;
11 }
12 if(l <= start && end <= r)
13 return tree[node];
14 int mid = (start + end) / 2;
15 int p1 = query(2 * node, start, mid, l, r);
16 int p2 = query(2 * node + 1, mid + 1, end, l, r);
17 return (p1 + p2);
18 }
Notice that only difference with regular query function is pushing the lazy values downwards as we
travel. This is a widely used trick. You can use it for dozens of different problems. But not all range
problems. You may have noticed that we used some properties of adding operation here. We used
the fact that increasing updates has associative property. We merged more than one updates in lazy
array, we didn’t care about the order of this updates. This assumption must be made to use lazy
propagation. Other properties that needed left as an exercise to the readers.
6
2 Binary Search on Segment Tree
Assume we have an array A that contains elements between 1 and M . We have to perform 2 kinds
of operations.
Let’s construct a frequency array, F [i] will keep how many times number i occurs in our original
array. So we want to find smallest i such that j=ij=1 F [i] ≥ k. Then the number i will be our answer
P
for the query. And for updates we just have to change F array accordingly.
6 int query(int k) {
7 int sum = 0, ans = 0;
8 for(int i = 1; i <= M; i++) {
9 sum += F[i];
10 if(sum >= k)
11 return i;
12 }
13 }
7
2.2 How to Solve It With Segment Trees?
This is of course, slow. Let’s use segment tree’s to improve it. First we will construct a segment tree
on F array. Segment tree will perform single element updates and range sum queries. We will use
binary search to find corresponding i for k th element queries.
If you look at the code above you can notice that each update takes O(logM ) time and each query
takes O(log 2 M ) time, but we can do better.
8
2.3 Speed It Up?
If you look at the segment tree solution on preceding subsection you can see that queries are performed
in O(log 2 M ) time. We can make is faster, actually we can reduce the time complexity to O(logM )
which is same with the time complexity for updates. We will do the binary search when we are
traversing the segment tree. We first will start from the root and look at its left child’s sum value, if
this value is greater than k, this means our answer is somewhere in the left child’s subtree. Otherwise
it is somewhere in the right child’s subtree. We will follow a path using this rule until we reach a
leaf, then this will be our answer. Since we just traversed O(logM ) nodes (one node at each level),
time complexity will be O(logM ). Look at the code below for better understanding.
9
3 Mo’s Algorithm
This method will be a key for solving offline range queries on an array. By offline, we mean we can
find the answers of these queries in any order we want and there are no updates. Let’s introduce a
problem and construct an efficient solution for it.
You have an array a with N elements such that it’s elements ranges from 1 to M . You have to answer
Q queries. Each is in the same type. You will be given a range [l, r] for each query, you have to print
how many different values are there in the subarray [al , al+1 ..ar−1 , ar ].
First let’s find a naive solution and improve it. Remember the frequency array we mentioned before.
We will keep a frequency array that contains only given subarray’s values. Number of values in this
frequency array bigger than 0 will be our answer for given query. Then we have to update frequency
array for next query. We will use O(N ) time for each query, so total complexity will be O(Q × N ).
Look at the code below for implementation.
1 class Query {
2 public: int l, r, ind;
3 Query(int l, int r, int ind) {
4 this->l = l, this->r = r, this->ind = ind;
5 }
6 };
7
8 void del(int ind, vector< int > &a, vector< int > &F, int &num) {
9 if(F[a[ind]] == 1) num--; F[a[ind]]--;
10 }
11
12 void add(int ind, vector< int > &a, vector< int > &F, int &num) {
13 if(F[a[ind]] == 0) num++; F[a[ind]]++;
14 }
15
16 vector< int > solve(vector< int > &a, vector< Query > &q) {
17 int Q = q.size(), N = a.size();
18 int M = *max_element(a.begin(), a.end());
19 vector< int > F(M + 1, 0); // This is frequency array we mentioned before
20 vector< int > ans(Q, 0);
21 int l = 0, r = -1, num = 0;
22 for(int i = 0; i < Q; i++) {
23 int nl = q[i].l, nr = q[i].r;
24 while(l < nl) del(l++, a, F, num);
25 while(l > nl) add(--l, a, F, num);
26 while(r > nr) del(r--, a, F, num);
27 while(r < nr) add(++r, a, F, num);
28 ans[q[i].ind] = num;
29 } return ans;
30 }
Time complexity for each query here is O(N ). So total complexity √is O(Q × N ). Just by changing
the order of queries we will reduce this complexity to O((Q + N ) × N ).
10
3.1 Mo’s algorithm
We will change the order of answering the queries such that overall complexity will be reduced
drastically. We will use following
√ cmp function to sort our queries and will answer them in this
sorted order. Block size here is N .
Why does that work? Let’s examine what we do here first then find the complexity. We divide l0 s of
l
queries into blocks. Block number of a given l is blocksize (integer division). We sort the queries first
by their block numbers then for same block numbers, we sort them by their r0 s. Sorting all queries
will take O(QlogQ) time. Let’s look at how many times we will call add and del operations to change
current r. For the same block r0 s always increases. So for same block it is O(N ) since it can only
N
increase. Since there are blocksize blocks in total, it will be O(N × N/blocksize) operations in total.
For same block, add and del operations that changes l will be called at most O(blocksize) times for
each query, since if block number is same then their l0 s must differ at most by O(blocksize). So overall
it is O(Q × blocksize). Also when consecutive queries has different block numbers we will perform at
most O(N ) operations, but notice that there are at most O(N/blocksize) √ such consecutive queries,
√ the overall time complexity. If we pick blocksize = N overall complexity will
so it doesn’t change
be O((Q + N ) × N ). Full code is given below.
11
1 int block_size;
2
3 class Query {
4 public:
5 int l, r, ind;
6 Query(int l, int r, int ind) {
7 this->l = l, this->r = r, this->ind = ind;
8 }
9 bool operator<(Query other) const {
10 return make_pair(l / block_size, r) <
11 make_pair(other.l / block_size, other.r);
12 }
13 };
14
15 void del(int ind, vector< int > &a, vector< int > &F, int &num) {
16 if(F[a[ind]] == 1) num--;
17 F[a[ind]]--;
18 }
19
20 void add(int ind, vector< int > &a, vector< int > &F, int &num) {
21 if(F[a[ind]] == 0) num++;
22 F[a[ind]]++;
23 }
24
25 vector< int > solve(vector< int > &a, vector< Query > &q) {
26 int Q = q.size(), N = a.size();
27 int M = *max_element(a.begin(), a.end());
28 block_size = sqrt(N);
29 sort(q.begin(), q.end());
30 vector< int > F(M + 1, 0); // This is frequency array we mentioned before
31 vector< int > ans(Q, 0);
32 int l = 0, r = -1, num = 0;
33 for(int i = 0; i < Q; i++) {
34 int nl = q[i].l, nr = q[i].r;
35 while(l < nl) del(l++, a, F, num);
36 while(l > nl) add(--l, a, F, num);
37 while(r > nr) del(r--, a, F, num);
38 while(r < nr) add(++r, a, F, num);
39 ans[q[i].ind] = num;
40 } return ans;
41 }
12
4 Trie
Trie is an efficient information reT rieval data structure. Using Trie, search complexities can be
brought to optimal limit (key length). If we store keys in binary search tree, a well balanced BST
will need time proportional to M * log N, where M is maximum string length and N is number of
keys in tree. Using Trie, we can search the key in O(M) time. However the penalty is on Trie storage
requirements (Please refer Applications of Trie for more details)
Every node of Trie consists of multiple branches. Each branch represents a possible character of keys.
We need to mark the last node of every key as end of word node. A Trie node field isEndOfWord is
used to distinguish the node as end of word node. A simple structure to represent nodes of English
alphabet can be as following,
1 // Trie node
2 class TrieNode {
3 public:
4 TrieNode *children[ALPHABET_SIZE];
5 bool isEndOfWord;
6 TrieNode() {
7 isEndOfWord = false;
8 for(int i = 0; i < ALPHABET SIZE; i++)
9 children[i] = NULL;
10 }
11 };
13
4.1 Insertion
Inserting a key into Trie is simple approach. Every character of input key is inserted as an individual
Trie node. Note that the children is an array of pointers (or references) to next level Trie nodes. The
key character acts as an index into the array children. If the input key is new or an extension of
existing key, we need to construct non-existing nodes of the key, and mark end of word for last node.
If the input key is prefix of existing key in Trie, we simply mark the last node of key as end of word.
The key length determines Trie depth.
4.2 Search
Searching for a key is similar to insert operation, however we only compare the characters and move
down. The search can terminate due to end of string or lack of key in Trie. In the former case, if
the isEndofWord field of last node is true, then the key exists in Trie. In the second case, the search
terminates without examining all the characters of key, since the key is not present in Trie.
Insert and search costs O(keyl ength). However the memory requirements of Trie high.
It is O(ALP HABET SIZE × keyl ength × N ) where N is number of keys in Trie. There are ef-
ficient representation of trie nodes (e.g. compressed trie, ternary search tree, etc.) to minimize
memory requirements of trie.
14
References
15