This article is aimed at developers who want to find the N largest or smallest items in a collection with Python. I will show a few methods you can use and conclude with which method suits which situation.
Method – 1: Slice approach on a List
If you are simply trying to find the single smallest or largest item, i.e. N = 1, it is faster to use min() and max().
Let us begin by generating some random integers.
import random

# Create a random list of integers
random_list = random.sample(range(1, 10), 9)
random_list
Output
[2, 4, 5, 1, 7, 9, 6, 8, 3]
FINDING THE SMALLEST & LARGEST ITEM (N=1)
# Find the smallest number (N=1)
min(random_list)
Output
1
# Find the largest number (N=1)
max(random_list)
Output
9
FINDING THE 3 SMALLEST & LARGEST ITEMS (N=3)
Similarly, if N is about the same size as the collection itself, it is usually faster to sort it first and take a slice of N.
# Let's get the N smallest items using a slice approach (N=3)
sorted(random_list)[:3]
Output
[1, 2, 3]
# Let's get the N largest items using a slice approach (N=3)
sorted(random_list)[-3:]
Output
[7, 8, 9]
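The slice approach can be wrapped in two small helpers. This is a minimal sketch; the names n_smallest_by_sort and n_largest_by_sort are made up for illustration and are not part of any library. Sorting with reverse=True and slicing from the front returns the largest items in descending order, and it also sidesteps a quirk of the [-N:] slice: with N = 0, [-0:] is the same as [0:] and returns the whole list.

```python
def n_smallest_by_sort(items, n):
    """Return the n smallest items in ascending order (hypothetical helper)."""
    return sorted(items)[:n]


def n_largest_by_sort(items, n):
    """Return the n largest items in descending order (hypothetical helper)."""
    return sorted(items, reverse=True)[:n]


random_list = [2, 4, 5, 1, 7, 9, 6, 8, 3]

print(n_smallest_by_sort(random_list, 3))  # [1, 2, 3]
print(n_largest_by_sort(random_list, 3))   # [9, 8, 7]

# Edge case: a negative slice with N = 0 returns everything, not an empty list
print(sorted(random_list)[-0:])            # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(n_largest_by_sort(random_list, 0))   # []
```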
Method – 2: heapq Method on a List
The heapq module has two functions, nlargest() and nsmallest(), that can be used to find the N largest or N smallest items.
import heapq
import random

random_list = random.sample(range(1, 10), 9)

# nsmallest items (N=3)
heapq.nsmallest(3, random_list)
Output
[1, 2, 3]
# nlargest items (N=3)
heapq.nlargest(3, random_list)
Output
[9, 8, 7]
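To get a feel for what a heap does here, the sketch below builds one by hand with heapq.heapify() and pops the smallest item N times. It is only an illustration of the idea and should not be read as the way heapq.nsmallest() is implemented internally; the helper name n_smallest_via_heap is made up for this example.

```python
import heapq


def n_smallest_via_heap(items, n):
    """Pop the n smallest items off a heap (illustrative helper, not a library API)."""
    heap = list(items)    # copy, so the caller's list is left untouched
    heapq.heapify(heap)   # rearrange in place; heap[0] is now the smallest item
    count = min(n, len(heap))
    # Each heappop removes and returns the current smallest item
    return [heapq.heappop(heap) for _ in range(count)]


numbers = [2, 4, 5, 1, 7, 9, 6, 8, 3]
print(n_smallest_via_heap(numbers, 3))  # [1, 2, 3]
print(numbers)                          # original list is unchanged
```

Note that nlargest() returns its results from largest to smallest, which is why the output above is [9, 8, 7] while the slice sorted(random_list)[-3:] gives [7, 8, 9].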
If your data is a bit more complicated, such as a list of dictionaries, both heapq functions accept a key parameter that tells them which value to compare.
import heapq

grandslams = [
    {'name': 'Roger Federer', 'titles': 20},
    {'name': 'Rafael Nadal', 'titles': 19},
    {'name': 'Novak Djokovic', 'titles': 17},
    {'name': 'Andy Murray', 'titles': 3},
]

# Players with the fewest titles (N=3)
less = heapq.nsmallest(3, grandslams, key=lambda s: s['titles'])
less
Output
[{'name': 'Andy Murray', 'titles': 3}, {'name': 'Novak Djokovic', 'titles': 17}, {'name': 'Rafael Nadal', 'titles': 19}]
# Players with the most titles (N=3)
more = heapq.nlargest(3, grandslams, key=lambda s: s['titles'])
more
Output
[{'name': 'Roger Federer', 'titles': 20}, {'name': 'Rafael Nadal', 'titles': 19}, {'name': 'Novak Djokovic', 'titles': 17}]
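As an alternative to a lambda, the key can also be built with operator.itemgetter() from the standard library. The sketch below reuses the same data; the variable name by_titles is just an illustrative choice. The same key works with max() and min() when only a single item is needed.

```python
import heapq
from operator import itemgetter

grandslams = [
    {'name': 'Roger Federer', 'titles': 20},
    {'name': 'Rafael Nadal', 'titles': 19},
    {'name': 'Novak Djokovic', 'titles': 17},
    {'name': 'Andy Murray', 'titles': 3},
]

# itemgetter('titles') behaves like: lambda s: s['titles']
by_titles = itemgetter('titles')

top_two = heapq.nlargest(2, grandslams, key=by_titles)
print([player['name'] for player in top_two])  # ['Roger Federer', 'Rafael Nadal']

# For N = 1, max() and min() accept the same key
print(max(grandslams, key=by_titles)['name'])  # Roger Federer
print(min(grandslams, key=by_titles)['name'])  # Andy Murray
```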
Finding N Largest and Smallest from a DataFrame
Well, the world is made up of CSV files. Yes, it is!
So it is safe to assume that at some point in your Python development you will encounter CSV files and, with them, pandas DataFrames.
I will show you a couple of methods to find the N largest or smallest rows in a DataFrame.
In the first method we sort the values with the sort_values() method and pick the top rows with the head() method.
import io

import pandas as pd

# Define your data
data = """player,titles
Djokovic,17
Nadal,19
Federer,20
Murray,3
"""
throwaway_storage = io.StringIO(data)
df = pd.read_csv(throwaway_storage, index_col="player")
# nsmallest (N = 3)
df.sort_values("titles").head(3)
Output
player      titles
__________________
Murray           3
Djokovic        17
Nadal           19
# nlargest (N = 3)
df.sort_values("titles", ascending=False).head(3)
Output
player      titles
__________________
Federer         20
Nadal           19
Djokovic        17
Instead of sorting the rows and using the .head() method, we can call the .nsmallest() and .nlargest() methods.
df.nsmallest(3, columns="titles")
Output
player      titles
__________________
Murray           3
Djokovic        17
Nadal           19
df.nlargest(3, columns="titles")
Output
player      titles
__________________
Federer         20
Nadal           19
Djokovic        17
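If pandas is not available, a similar result can be sketched with the standard library alone by combining csv.DictReader with heapq. This is only an illustrative alternative under the assumption that the CSV data is small enough to load into a list; the variable names rows, fewest and most are made up for this example.

```python
import csv
import heapq
import io

data = """player,titles
Djokovic,17
Nadal,19
Federer,20
Murray,3
"""

# Read each CSV row into a dictionary keyed by the header names
rows = list(csv.DictReader(io.StringIO(data)))

# csv returns every field as a string, so convert titles to int before comparing
for row in rows:
    row["titles"] = int(row["titles"])

fewest = heapq.nsmallest(3, rows, key=lambda r: r["titles"])
most = heapq.nlargest(3, rows, key=lambda r: r["titles"])

print([r["player"] for r in fewest])  # ['Murray', 'Djokovic', 'Nadal']
print([r["player"] for r in most])    # ['Federer', 'Nadal', 'Djokovic']
```

The int() conversion matters: compared as strings, "3" would sort after "20".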
Conclusion
If you are trying to find a relatively small number of items, then the nlargest() and nsmallest() functions are most appropriate.
If you are simply trying to find the single smallest or largest item (N=1), it is faster to use min() and max().
Similarly, if N is about the same size as the collection itself, it is usually faster to sort it first and take a slice.
Finally, the actual implementation of nlargest() and nsmallest() in CPython is adaptive and will carry out some of these optimizations on your behalf, such as falling back to min() or max() when N = 1.
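If you want to check these guidelines on your own data, a rough timing sketch with the timeit module could look like the following. The list size, the value of n and the number of repetitions are arbitrary assumptions chosen for illustration, and the timings will vary from machine to machine, so no expected numbers are shown.

```python
import heapq
import random
import timeit

# Arbitrary test data: 10,000 distinct integers, looking for the 10 smallest
values = random.sample(range(100_000), 10_000)
n = 10

# Both approaches should agree on the result
by_sort = sorted(values)[:n]
by_heap = heapq.nsmallest(n, values)
print(by_sort == by_heap)  # True

# Time each approach over the same number of runs
sort_time = timeit.timeit(lambda: sorted(values)[:n], number=20)
heap_time = timeit.timeit(lambda: heapq.nsmallest(n, values), number=20)
print(f"sorted slice:    {sort_time:.4f}s")
print(f"heapq.nsmallest: {heap_time:.4f}s")
```

Try changing n to 1 or to len(values) and compare the numbers with min() or a plain sort to see where each approach pays off.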