Count Min Sketch Algorithm
The count min sketch algorithm estimates the frequency of elements in a data stream using constant time per update or query and sublinear space. It does not store the stream itself. Instead, it keeps a matrix of counters. The number of rows equals the number of hash functions used, and the number of columns equals the range of the hash functions (the number of distinct values each one can output).
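To make the structure concrete, here is a minimal Python sketch of the counter matrix with its update and query operations. The class name CountMinSketch, the width/depth parameters and the choice of BLAKE2b hashing seeded by the row index are assumptions for illustration; the text does not say which hash functions are used.

```python
import hashlib


class CountMinSketch:
    """A depth x width matrix of counters, one row per hash function."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width  # number of columns = range of each hash function
        self.depth = depth  # number of rows = number of hash functions
        self.table = [[0] * width for _ in range(depth)]

    def _column(self, row: int, item: str) -> int:
        # Assumption: row i's hash function is BLAKE2b over "i:item", reduced mod width.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def update(self, item: str, count: int = 1) -> None:
        # Increment one counter in every row (count is assumed non-negative).
        for row in range(self.depth):
            self.table[row][self._column(row, item)] += count

    def estimate(self, item: str) -> int:
        # Every counter is >= the true count, so the smallest one is the best estimate.
        return min(self.table[row][self._column(row, item)] for row in range(self.depth))


sketch = CountMinSketch(width=50, depth=4)
for word in ["apple", "banana", "apple", "cherry", "apple"]:
    sketch.update(word)
print(sketch.estimate("apple"))  # never less than 3, the true count
```

Each row of the table sums to the total number of updates (5 here), which is a quick sanity check on any implementation.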
Example: -
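Hash outputs of four hash functions, H1 to H4, for the elements A, B, C and D (every output lies between 1 and 6):
Element  H1  H2  H3  H4
A        1   6   3   1
B        1   2   4   6
C        3   4   1   6
D        6   2   4   1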
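With 4 hash functions whose outputs run from 1 to 6, the counter matrix has 4 rows (H1 to H4) and 6 columns (outputs 1 to 6), all starting at 0:
    1 2 3 4 5 6
H1  0 0 0 0 0 0
H2  0 0 0 0 0 0
H3  0 0 0 0 0 0
H4  0 0 0 0 0 0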
Next, for each element arriving from the stream, calculate its hash outputs and increment the corresponding counter in each row. In this example the stream is A, B, C, A, A, C, D, and the matrix after each element is:
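After element 1 (A):
H1  1 0 0 0 0 0
H2  0 0 0 0 0 1
H3  0 0 1 0 0 0
H4  1 0 0 0 0 0

After element 2 (B):
H1  2 0 0 0 0 0
H2  0 1 0 0 0 1
H3  0 0 1 1 0 0
H4  1 0 0 0 0 1

After element 3 (C):
H1  2 0 1 0 0 0
H2  0 1 0 1 0 1
H3  1 0 1 1 0 0
H4  1 0 0 0 0 2

After element 4 (A):
H1  3 0 1 0 0 0
H2  0 1 0 1 0 2
H3  1 0 2 1 0 0
H4  2 0 0 0 0 2

After element 5 (A):
H1  4 0 1 0 0 0
H2  0 1 0 1 0 3
H3  1 0 3 1 0 0
H4  3 0 0 0 0 2

After element 6 (C):
H1  4 0 2 0 0 0
H2  0 1 0 2 0 3
H3  2 0 3 1 0 0
H4  3 0 0 0 0 3

After element 7 (D):
H1  4 0 2 0 0 1
H2  0 2 0 2 0 3
H3  2 0 3 2 0 0
H4  4 0 0 0 0 3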
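The snapshots can be reproduced with a short Python sketch that replays the stream against the example's hash table. The names HASHES and STREAM are hypothetical, and the stream order A, B, C, A, A, C, D is read off from which counters change between successive matrices. Because the hash outputs run from 1 to 6, output k is stored in column k - 1 of a zero-based list.

```python
# Hash outputs (H1, H2, H3, H4) from the example table.
HASHES = {
    "A": (1, 6, 3, 1),
    "B": (1, 2, 4, 6),
    "C": (3, 4, 1, 6),
    "D": (6, 2, 4, 1),
}
# Stream order inferred from the successive matrices in the example.
STREAM = ["A", "B", "C", "A", "A", "C", "D"]

WIDTH, DEPTH = 6, 4  # 6 possible outputs, 4 hash functions
table = [[0] * WIDTH for _ in range(DEPTH)]

for element in STREAM:
    for row, output in enumerate(HASHES[element]):
        table[row][output - 1] += 1  # one increment per hash function

for row in table:
    print(" ".join(map(str, row)))
```

The four printed rows are 4 0 2 0 0 1, 0 2 0 2 0 3, 2 0 3 2 0 0 and 4 0 0 0 0 3, matching the last snapshot.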
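The frequency of A is estimated as the minimum of the counters A maps to in each row, using H1(A) = 1, H2(A) = 6, H3(A) = 3 and H4(A) = 1: min(4, 3, 3, 4) = 3. A occurs three times in the stream, so in this case the estimate is exact.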
If hash collisions occur, the estimated frequency may be greater than the actual frequency. It is never smaller, because every occurrence of an element increments all of that element's counters.
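A query only needs the final matrix. The following sketch, with a hypothetical estimate helper, applies the min rule to all four elements of the example and compares each estimate with the true count in the stream A, B, C, A, A, C, D. It shows the overcount in action: B occurs once but is estimated as 2, because in every row its counter is shared with another element (A in H1, D in H2 and H3, C in H4).

```python
# Final counter matrix from the example: rows H1..H4, columns = outputs 1..6.
FINAL = [
    [4, 0, 2, 0, 0, 1],
    [0, 2, 0, 2, 0, 3],
    [2, 0, 3, 2, 0, 0],
    [4, 0, 0, 0, 0, 3],
]
HASHES = {"A": (1, 6, 3, 1), "B": (1, 2, 4, 6), "C": (3, 4, 1, 6), "D": (6, 2, 4, 1)}
TRUE_COUNTS = {"A": 3, "B": 1, "C": 2, "D": 1}  # counted from the stream


def estimate(element: str) -> int:
    # Read the counter the element maps to in each row and keep the smallest.
    return min(FINAL[row][output - 1] for row, output in enumerate(HASHES[element]))


for element in "ABCD":
    print(element, estimate(element), TRUE_COUNTS[element])
```

It prints A 3 3, B 2 1, C 2 2 and D 1 1 (element, estimate, true count): every estimate is at least the true count, and only B is overcounted.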
Time complexity of count min sketch algorithm: -
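O(1) per update and per query: each operation evaluates every hash function once and touches one counter per row, so its cost grows only with the number of hash functions, which is fixed in advance.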
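Writing d for the number of hash functions (rows) and w for the number of columns (symbols introduced here for illustration), the cost works out as below. The last line is the standard sizing rule from the usual analysis of the count min sketch, quoted as background rather than taken from this document: f(x) is the true frequency of x, \hat{f}(x) its estimate, N the total number of elements in the stream, ε the allowed error and δ the failure probability. Because w and d depend only on ε and δ, not on how many distinct elements arrive, the space is sublinear.

```latex
\begin{aligned}
T_{\text{update}} &= d \cdot \left(T_{\text{hash}} + T_{\text{increment}}\right) = O(d) \\
T_{\text{query}}  &= d \cdot \left(T_{\text{hash}} + T_{\text{lookup}}\right) = O(d) \\
\text{space}      &= d \cdot w \ \text{counters} \\
\text{example: } d = 4,\ w = 6 &\Rightarrow 4 \ \text{counters per operation},\ 24 \ \text{counters in total} \\
w = \lceil e/\varepsilon \rceil,\ d = \lceil \ln(1/\delta) \rceil
  &\Rightarrow \Pr\left[\hat{f}(x) \le f(x) + \varepsilon N\right] \ge 1 - \delta
\end{aligned}
```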
Script: -
Count min sketch is a probabilistic data structure that estimates the frequency of elements in a data stream. Like a hash table, it uses hash functions to map each element to counters, but unlike a hash table it uses sub-linear space, because it never stores the elements themselves. It stores the counts in a matrix where the number of rows equals the number of hash functions used and the number of columns equals the range of the hash functions.
The main drawback of the algorithm is that hash collisions can make it overcount the frequency of an element, although it never undercounts.
Talking about the experience, I learnt how to work as a team, and I also learnt different things from my teammates. Overall, the experience was great.