Count-Min Sketch in Python
Last Updated: 20 May, 2024
Count-Min Sketch is a probabilistic data structure that approximates the frequency of items in a stream of data. It uses a small, fixed amount of memory no matter how much data passes through it, at the cost of returning estimates rather than exact counts. In this post, we'll explore the idea behind Count-Min Sketch and how it's implemented in Python.
What is Count-Min Sketch?
Count-Min Sketch is a probabilistic data structure used to estimate how often each item occurs in a large stream of data, without storing every item. Note that it does not count the number of distinct items; it approximates the frequency of individual events in the stream.
The idea behind Count-Min Sketch is to use several hash functions and a two-dimensional array (or matrix) of counters to store item frequencies compactly. Each row of the array belongs to one hash function, and each column within a row is a bucket that holds a count. When an item is updated or queried, each hash function picks the bucket in its own row to increment or read.
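As a rough illustration of this layout, the sketch below maps one item to a single cell in each row. The helper name `cells_for` and the trick of salting SHA-256 with the row number are assumptions made for this example, not part of any particular library.

```python
import hashlib


def cells_for(item, rows, cols):
    """Return the (row, column) cell that `item` maps to in each row.

    Hypothetical helper: salting one hash with the row number stands in
    for a family of different hash functions.
    """
    cells = []
    for row in range(rows):
        digest = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        cells.append((row, int(digest, 16) % cols))
    return cells


# Every item touches exactly one counter per row
print(cells_for("apple", rows=3, cols=10))
```

The same item always lands on the same cells, which is what lets a later query find the counters that earlier updates incremented.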
Key Operations in Count-Min Sketch:
Initialization: Choose the number of rows (one per hash function) and columns (buckets per row), and set every counter to zero.
Update: To increase an element's count, hash it with each row's hash function and increment the corresponding bucket in that row.
Query: Hash the element with each hash function and return the lowest count among the corresponding buckets as its estimated frequency. Collisions can only inflate a counter, so the minimum is the least-inflated estimate.
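The initialization step leaves open how many rows and columns to pick. One common rule of thumb, taken from the standard analysis of the structure rather than from this article, sets the width to ceil(e / epsilon) and the depth to ceil(ln(1 / delta)). Assuming well-behaved hash functions, an estimate then exceeds the true count by more than epsilon times the total number of updates with probability at most delta. The function name `sketch_dimensions` below is hypothetical.

```python
import math


def sketch_dimensions(epsilon, delta):
    """Suggest (rows, cols) for error factor epsilon and failure probability delta."""
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    cols = math.ceil(math.e / epsilon)     # width limits how large the error can get
    rows = math.ceil(math.log(1 / delta))  # depth limits how often that limit is exceeded
    return rows, cols


print(sketch_dimensions(0.01, 0.01))  # (5, 272)
```

Halving epsilon roughly doubles the number of columns, while tightening delta only adds rows logarithmically, so width is usually the main memory cost.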
Implementation of Count-Min Sketch in Python:
Below is a simple implementation of Count-Min Sketch in Python. It uses MD5, SHA-1, and SHA-256 from hashlib as its hash functions, one per row:
Python
import hashlib


class CountMinSketch:
    # Hash functions available to this implementation, one per row
    HASH_FUNCTIONS = [hashlib.md5, hashlib.sha1, hashlib.sha256]

    def __init__(self, rows, cols):
        # Each row needs its own hash function, so rows cannot exceed
        # the number of hash functions listed above
        if not 1 <= rows <= len(self.HASH_FUNCTIONS):
            raise ValueError(
                f"rows must be between 1 and {len(self.HASH_FUNCTIONS)}"
            )
        self.rows = rows
        self.cols = cols
        self.count_matrix = [[0] * cols for _ in range(rows)]
        self.hash_functions = self.HASH_FUNCTIONS[:rows]

    def _bucket_index(self, hash_func, element):
        # Interpret the hex digest as an integer and reduce it to a column
        hash_value = int(hash_func(element.encode()).hexdigest(), 16)
        return hash_value % self.cols

    def update(self, element):
        # Increment one counter in every row
        for i, hash_func in enumerate(self.hash_functions):
            self.count_matrix[i][self._bucket_index(hash_func, element)] += 1

    def query(self, element):
        # The smallest counter is the one least inflated by collisions
        return min(
            self.count_matrix[i][self._bucket_index(hash_func, element)]
            for i, hash_func in enumerate(self.hash_functions)
        )


# Example usage
cms = CountMinSketch(rows=3, cols=10)
data_stream = ["apple", "banana", "apple", "orange", "apple", "banana", "banana"]
for element in data_stream:
    cms.update(element)

print("Frequency of 'apple':", cms.query("apple"))
Output
Frequency of 'apple': 3
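To see why the query takes the minimum, it can help to force collisions. The sketch below is a simplified variant written for this illustration, not the class above: it salts SHA-256 with the row number instead of using three different hash functions, and it uses only four columns so that distinct items are very likely to share counters. The name `TinySketch` is hypothetical. Comparing against exact counts from `collections.Counter` shows that estimates can be too high but never too low.

```python
import hashlib
from collections import Counter


class TinySketch:
    """Hypothetical minimal sketch, kept deliberately narrow to provoke collisions."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.table = [[0] * cols for _ in range(rows)]

    def _col(self, item, row):
        # Salting with the row number gives each row a different mapping
        digest = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.cols

    def update(self, item):
        for row in range(self.rows):
            self.table[row][self._col(item, row)] += 1

    def query(self, item):
        return min(self.table[row][self._col(item, row)] for row in range(self.rows))


stream = [f"item{i % 20}" for i in range(200)]  # 20 distinct items, 10 occurrences each
sketch = TinySketch(rows=2, cols=4)
for item in stream:
    sketch.update(item)

exact = Counter(stream)
for item in sorted(exact)[:5]:
    print(item, "exact:", exact[item], "estimate:", sketch.query(item))
```

With 20 items squeezed into four columns, every estimate is at least 10, and most will be noticeably larger. Widening the table is what brings the estimates back toward the exact counts.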
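Count-Min Sketch provides approximate element frequencies for massive data streams using a fixed amount of memory. Its estimates are never lower than the true count, but they can be higher when different elements hash to the same buckets. Adding columns reduces such collisions, and adding rows makes it less likely that an element collides in every row. Python developers can use Count-Min Sketch for a range of frequency estimation problems, provided they understand these limits.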