Count Min Sketch Algorithm

The Count-min sketch algorithm uses a matrix to estimate the frequency of elements in a data stream using sub-linear space and constant time. It maps data to the matrix using hash functions, where the number of rows equals the number of hash functions and the number of columns depends on the maximum hash output. The algorithm may overestimate frequencies due to hash collisions but provides an approximation of frequencies in sub-linear space.

Count Min Sketch Algorithm description

The count-min sketch algorithm is used to estimate the frequency of the elements in a stream using
constant time per operation and sub-linear space. It does not store the data stream itself; instead
it keeps a matrix of counters from which the frequencies are computed. In this matrix the number of
rows is equal to the number of hash functions used, and the number of columns depends on the maximum
output of the hash functions.
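
The description above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation used by the authors: the class name, the `width` and `depth` parameters, and the use of a salted SHA-256 digest to obtain one hash function per row are all assumptions made for the example.

```python
import hashlib


class CountMinSketch:
    """Minimal count-min sketch: `depth` rows (one per hash function), `width` columns."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        # All counters start at zero.
        self.table = [[0] * width for _ in range(depth)]

    def _column(self, item, row):
        # Assumption: one hash function per row, obtained by salting SHA-256 with the row index.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item):
        # Increment exactly one counter in every row.
        for row in range(self.depth):
            self.table[row][self._column(item, row)] += 1

    def estimate(self, item):
        # The smallest of the item's counters is the frequency estimate.
        return min(self.table[row][self._column(item, row)] for row in range(self.depth))


sketch = CountMinSketch(width=64, depth=4)
stream = ["A", "B", "C", "A", "A", "C", "D"]
for element in stream:
    sketch.add(element)

print(sketch.estimate("A"))  # never below the true count of 3
```

The memory used is `width * depth` counters, whatever the length of the stream.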

Example: -

Stream= {A, B, C, A, A, C, D, …...}

     H1   H2   H3   H4
A    1    6    3    1
B    1    2    4    6
C    3    4    1    6
D    6    2    4    1

Number of rows = number of hash functions = 4

Number of columns = maximum value of the hash functions = 6

Initially all the elements of the matrix will be 0. Row i belongs to hash function Hi, and column j
holds the counter for hash output j.

0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0

Now, for each element of the stream, calculate the hash outputs and increment the corresponding
counters in the table.

For the first element (A)

H1(A)=1 H2(A)=6 H3(A)=3 H4(A)=1

1 0 0 0 0 0
0 0 0 0 0 1
0 0 1 0 0 0
1 0 0 0 0 0
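
The update step for this first element can be sketched in Python as below. The hash outputs are copied from the example table rather than computed; the names `HASHES` and `update` are made up for this illustration.

```python
# Hash outputs copied from the example table (1-based column numbers, order H1..H4).
HASHES = {
    "A": (1, 6, 3, 1),
    "B": (1, 2, 4, 6),
    "C": (3, 4, 1, 6),
    "D": (6, 2, 4, 1),
}
ROWS, COLS = 4, 6


def update(matrix, element):
    # Row i is owned by hash function H(i+1); its output selects the column to increment.
    for row, column in enumerate(HASHES[element]):
        matrix[row][column - 1] += 1


matrix = [[0] * COLS for _ in range(ROWS)]
update(matrix, "A")
for row in matrix:
    print(*row)
# 1 0 0 0 0 0
# 0 0 0 0 0 1
# 0 0 1 0 0 0
# 1 0 0 0 0 0
```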

For the second element (B)

H1(B)=1 H2(B)=2 H3(B)=4 H4(B)=6

2 0 0 0 0 0
0 1 0 0 0 1
0 0 1 1 0 0
1 0 0 0 0 1

For the third element (C)

H1(C)=3 H2(C)=4 H3(C)=1 H4(C)=6

2 0 1 0 0 0
0 1 0 1 0 1
1 0 1 1 0 0
1 0 0 0 0 2

For the fourth element (A)

H1(A)=1 H2(A)=6 H3(A)=3 H4(A)=1

3 0 1 0 0 0
0 1 0 1 0 2
1 0 2 1 0 0
2 0 0 0 0 2

For the fifth element (A)

H1(A)=1 H2(A)=6 H3(A)=3 H4(A)=1

4 0 1 0 0 0
0 1 0 1 0 3
1 0 3 1 0 0
3 0 0 0 0 2

For the sixth element (C)

H1(C)=3 H2(C)=4 H3(C)=1 H4(C)=6

4 0 2 0 0 0
0 1 0 2 0 3
2 0 3 1 0 0
3 0 0 0 0 3

For the seventh element (D)

H1(D)=6 H2(D)=2 H3(D)=4 H4(D)=1

4 0 2 0 0 1
0 2 0 2 0 3
2 0 3 2 0 0
4 0 0 0 0 3

Now all seven elements are loaded into the table.

The frequency of A is the minimum of the counters selected by H1(A)=1, H2(A)=6, H3(A)=3 and H4(A)=1:
min (4, 3, 3, 4) = 3
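
The same lookup can be applied to every element. The Python sketch below replays the seven stream elements shown in the example (the elided remainder of the stream is not used) with the hash outputs from the example table, then compares each estimate with the true count. The names `HASHES`, `STREAM` and `estimate` are made up for this illustration.

```python
# Hash outputs copied from the example table (1-based column numbers, order H1..H4).
HASHES = {
    "A": (1, 6, 3, 1),
    "B": (1, 2, 4, 6),
    "C": (3, 4, 1, 6),
    "D": (6, 2, 4, 1),
}
STREAM = ["A", "B", "C", "A", "A", "C", "D"]

# Load the stream: one increment per row for every element.
matrix = [[0] * 6 for _ in range(4)]
for element in STREAM:
    for row, column in enumerate(HASHES[element]):
        matrix[row][column - 1] += 1


def estimate(element):
    # Read the element's counter in each row and keep the smallest one.
    return min(matrix[row][column - 1] for row, column in enumerate(HASHES[element]))


estimates = {element: estimate(element) for element in HASHES}
true_counts = {element: STREAM.count(element) for element in HASHES}
print(estimates)    # {'A': 3, 'B': 2, 'C': 2, 'D': 1}
print(true_counts)  # {'A': 3, 'B': 1, 'C': 2, 'D': 1}
```

A, C and D are exact, but B is reported as 2 although it occurs once: in every row B shares its counter with another element (A in H1, D in H2 and H3, C in H4), so even the minimum is inflated.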

Drawback of count min sketch algorithm: -

In case of a hash collision, the estimated frequency may be greater than the actual frequency. It is
never smaller, because counters are only ever incremented.

Time complexity of count min sketch algorithm: -

O (1) per update or query for a fixed number of hash functions (one counter is accessed per row)

Space complexity of count min sketch algorithm: -

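O (rows × columns), i.e. the number of counters in the matrix. This size is fixed in advance and is
sub-linear in the number of stream elements n.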

Accuracy of count min sketch algorithm: -

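Accuracy increases with the number of hash functions (rows), because the minimum is then taken over
more counters. Adding columns also helps, since fewer elements share each counter.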
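
As a rough guide to choosing the matrix size, the standard analysis of the count-min sketch (which this document does not derive) says that with ceil(e / epsilon) columns and ceil(ln(1 / delta)) rows, an estimate exceeds the true count by more than epsilon × N with probability at most delta, where N is the total number of elements seen. The Python sketch below only evaluates these two formulas; the function name and the sample values of `epsilon` and `delta` are assumptions for illustration.

```python
import math


def sketch_dimensions(epsilon, delta):
    # Columns control how large the overcount can be (as a fraction of the stream length).
    width = math.ceil(math.e / epsilon)
    # Rows (hash functions) control how likely it is that the bound is exceeded.
    depth = math.ceil(math.log(1 / delta))
    return width, depth


width, depth = sketch_dimensions(epsilon=0.01, delta=0.01)
print(width, depth, width * depth)  # 272 5 1360
```

So 1360 counters are enough for a 1% error bound that holds 99% of the time, independent of how long the stream is.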

Script: -

Count-min sketch is a probabilistic data structure which helps in finding the frequency of the
elements in a data stream. It uses hash functions to map the data to counters but, unlike a hash
table, it uses sub-linear space. It stores the counts in the form of a matrix where the number of
rows is equal to the number of hash functions used and the number of columns depends on the maximum
output of the hash functions.

And yes, the only drawback of the algorithm is that in case of a hash collision it may overcount the
frequency of an element.

Talking about the experience, I learnt how to work as a team, and I also learnt different things
from my teammates. Overall, the experience was great.
