Problem
You need to identify the most frequently occurring items in a sequence.
Solution
We can use Counter, from the collections module, to keep track of the items in a sequence.
What is a Counter?
The “Counter” is a mapping that holds an integer count for each key. Updating an existing key adds to its count. This object is used for counting the instances of hashable objects, or as a multiset.
The “Counter” is one of your best friends when you are performing data analysis.
This object has been present in Python for quite some time, so for many of you this will be a quick review. We will start by importing Counter from collections.
from collections import Counter
A traditional Python dictionary raises a KeyError when you look up a key that is not present.
# An empty dictionary (named my_dict so it does not shadow the built-in dict)
my_dict = {}
# check for a key in an empty dict
my_dict['mystring']
# Error message
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-12-1e03564507c6> in <module>
      3
      4 # check for a key in an empty dict
----> 5 my_dict['mystring']
      6
      7 # Error message
KeyError: 'mystring'
How can we avoid KeyError exceptions in this situation?
The Counter is a subclass of dictionary and behaves very much like one. However, if you look up a missing key, it returns zero rather than raising a KeyError.
# define the counter
c = Counter()
# check for the unavailable key
print(f"Output\n{c['mystring']}")
Output
0
c['mystring'] += 1
print(f"Output\n{c}")
Output
Counter({'mystring': 1})
Example
print(f"Output\n{type(c)}")
Output
<class 'collections.Counter'>
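The missing-key behaviour described above can be checked directly. The following is a minimal sketch; the variable names (is_dict, missing, stored_after_lookup, plain, plain_missing) are made up for illustration. It shows that a Counter is a dict, that looking up a missing key returns 0 without storing that key, and that a plain dict can get a similar effect with get() and a default.

```python
from collections import Counter

c = Counter()

# Counter subclasses dict, so the usual dict methods are available.
is_dict = isinstance(c, dict)

# Looking up a missing key returns 0 and does not store the key.
missing = c['mystring']
stored_after_lookup = 'mystring' in c

# A plain dict can imitate the lookup with get() and a default value.
plain = {}
plain_missing = plain.get('mystring', 0)

print(is_dict, missing, stored_after_lookup, plain_missing)
# True 0 False 0
```

Because the lookup does not insert anything, reading many missing keys does not make the Counter grow.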
Most Frequently Occurring Items in a Sequence
One of the other nice things about the Counter is that you can pass it a list of objects and it will count them for you. It saves us from building a loop in order to construct our counter.
Counter('Peas porridge hot peas porridge cold peas porridge in the pot nine days old'.split())
Output
Counter({'Peas': 1, 'porridge': 3, 'hot': 1, 'peas': 2, 'cold': 1, 'in': 1, 'the': 1, 'pot': 1, 'nine': 1, 'days': 1, 'old': 1})
The split() call takes the string and splits it on whitespace into a list of words.
The “Counter” will loop over that list and count all of the words, giving us the counts shown in the output.
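For comparison, here is a rough sketch of the loop that Counter saves us from writing. The names phrase, words, manual and counted are illustrative only; the loop is one possible hand-written version, not what Counter does internally.

```python
from collections import Counter

phrase = 'Peas porridge hot peas porridge cold peas porridge in the pot nine days old'
words = phrase.split()

# Hand-written counting loop with a plain dict.
manual = {}
for word in words:
    manual[word] = manual.get(word, 0) + 1

# Counter does the same work in one call.
counted = Counter(words)

# Convert to a plain dict so we compare like with like.
print(dict(counted) == manual)
# True
```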
There is more: we can also find the most common words in the phrase.
The most_common(n) method returns the n most frequently occurring items, ordered from the highest count to the lowest.
count = Counter('Peas porridge hot peas porridge cold peas porridge in the pot nine days old'.split())
print(f"Output\n{count.most_common(1)}")
Output
[('porridge', 3)]
Example
print(f"Output\n{count.most_common(2)}")
Output
[('porridge', 3), ('peas', 2)]
Example
print(f"Output\n{count.most_common(3)}")
Output
[('porridge', 3), ('peas', 2), ('Peas', 1)]
Notice that it returned a list of tuples. The first part of the tuple is the word, and the second part is its count.
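Since each entry is a (word, count) tuple, it can be unpacked directly in a loop. The sketch below rebuilds the same counter so that it runs on its own; the names n and everything are illustrative. It also shows that calling most_common() with no argument returns all items.

```python
from collections import Counter

count = Counter('Peas porridge hot peas porridge cold peas porridge in the pot nine days old'.split())

# Unpack each (word, count) tuple while looping.
for word, n in count.most_common(2):
    print(f"{word}: {n}")
# porridge: 3
# peas: 2

# With no argument, most_common() returns every item, highest count first.
everything = count.most_common()
print(len(everything))
# 11
```

Note that 'Peas' and 'peas' are counted separately because the comparison is case-sensitive.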
A little-known feature of Counter instances is that they can be easily combined using various mathematical operations.
string = 'Peas porridge hot peas porridge cold peas porridge in the pot nine days old'
another_string = 'Peas peas hot peas peas peas cold peas'
a = Counter(string.split())
b = Counter(another_string.split())
# Add counts
add = a + b
print(f"Output\n{add}")
Output
Counter({'peas': 7, 'porridge': 3, 'Peas': 2, 'hot': 2, 'cold': 2, 'in': 1, 'the': 1, 'pot': 1, 'nine': 1, 'days': 1, 'old': 1})
# Subtract counts
sub = a - b
print(f"Output\n{sub}")
Output
Counter({'porridge': 3, 'in': 1, 'the': 1, 'pot': 1, 'nine': 1, 'days': 1, 'old': 1})
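Notice that 'peas', 'Peas', 'hot' and 'cold' are missing from the subtraction result: the - operator keeps only positive counts. If the negative and zero counts matter, the subtract() method is one alternative. The following is a small sketch using the same two phrases; diff and signed are illustrative names.

```python
from collections import Counter

a = Counter('Peas porridge hot peas porridge cold peas porridge in the pot nine days old'.split())
b = Counter('Peas peas hot peas peas peas cold peas'.split())

# The - operator drops any result that is zero or negative.
diff = a - b
print('peas' in diff)
# False

# subtract() changes the counter in place and keeps zero and negative counts,
# so we work on a copy to leave a untouched.
signed = a.copy()
signed.subtract(b)
print(signed['peas'], signed['hot'])
# -3 0
```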
Finally, Counter stores each distinct word once, together with its count.
As the output above shows, equal words are grouped under a single key, which lets us treat them together; a container like this is commonly referred to as a multiset.
We can pull the words out one at a time using elements(). It does not reproduce the original word order; instead, all the repetitions of a word come out together.
Example
print(f"Output\n{list(a.elements())}")
Output
['Peas', 'porridge', 'porridge', 'porridge', 'hot', 'peas', 'peas', 'cold', 'in', 'the', 'pot', 'nine', 'days', 'old']
Example
print(f"Output\n{list(a.values())}")
Output
[1, 3, 1, 2, 1, 1, 1, 1, 1, 1, 1]
Example
print(f"Output\n{list(a.items())}")
Output
[('Peas', 1), ('porridge', 3), ('hot', 1), ('peas', 2), ('cold', 1), ('in', 1), ('the', 1), ('pot', 1), ('nine', 1), ('days', 1), ('old', 1)]
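These views can be combined to answer simple questions about the phrase. The sketch below uses illustrative names (total_words, distinct_words, rebuilt). It sums the values to get the total number of words, uses len() for the number of distinct words, and rebuilds an equal Counter from elements().

```python
from collections import Counter

a = Counter('Peas porridge hot peas porridge cold peas porridge in the pot nine days old'.split())

# Total number of words is the sum of all counts.
total_words = sum(a.values())

# Number of distinct words is the number of keys.
distinct_words = len(a)

# elements() repeats each word by its count, so counting them again
# gives back an equal Counter.
rebuilt = Counter(a.elements())

print(total_words, distinct_words, rebuilt == a)
# 14 11 True
```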