Sorting Data Structures in Python
● Sorting Techniques: Python provides two primary ways to sort: the list method sort()
and the built-in function sorted().
○ list.sort(): Sorts the list in place and returns None.
○ sorted(): Returns a new sorted list from the elements of any iterable.
Note that the sort method does not return a sorted version of the list. In fact, it returns the value
None. But the list itself has been modified. This kind of operation that works by having a side
effect on the list can be quite confusing.
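The difference between the two can be seen in a minimal sketch. The list `numbers` and its contents are hypothetical sample data chosen only for illustration.

```python
# Hypothetical sample data, chosen only for illustration.
numbers = [3, 1, 2]

# sorted() builds and returns a new list; the original is left untouched.
new_list = sorted(numbers)
print(new_list)  # [1, 2, 3]
print(numbers)   # [3, 1, 2]

# list.sort() reorders the list in place and returns None.
result = numbers.sort()
print(result)    # None
print(numbers)   # [1, 2, 3]
```

A common mistake is writing `numbers = numbers.sort()`, which replaces the list with None.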
The sorted function takes some optional parameters (see the Optional Parameters
page). The first optional parameter is a key function, which will be described in the next
section. The second optional parameter, reverse, is a Boolean value that determines whether
to sort the items in reverse order. By default it is False; if you set it to True, the result will
be sorted in reverse order.
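As a small illustration of the reverse parameter, the sketch below sorts a hypothetical list of scores in both directions. The variable names are assumptions made for this example.

```python
# Hypothetical sample data, chosen only for illustration.
scores = [70, 95, 82]

ascending = sorted(scores)                 # reverse defaults to False
descending = sorted(scores, reverse=True)  # largest value first

print(ascending)   # [70, 82, 95]
print(descending)  # [95, 82, 70]
```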
If you want to sort things in some order other than the “natural” order or its reverse,
you can provide an additional parameter, the key parameter. For example, suppose you want
to sort a list of numbers based on their absolute value, so that -4 comes after 3. Or suppose
you have a dictionary with strings as the keys and numbers as the values. Instead of sorting
the entries in alphabetic order based on the keys, you might like to sort them in order based
on their values.
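The absolute-value case might look like the sketch below. The discussion that follows refers to a function named absolute; its definition and the sample list used here are assumptions, since the original example is not part of this text.

```python
def absolute(x):
    """Return the absolute value of x (same result as the built-in abs)."""
    if x >= 0:
        return x
    return -x

# Hypothetical sample data, chosen only for illustration.
L1 = [1, 7, 4, -2, 3]

# Pass the function object itself (no parentheses); sorted calls it for us.
L2 = sorted(L1, key=absolute)
print(L2)  # [1, -2, 3, 4, 7]

# The key and reverse parameters can be combined.
print(sorted(L1, key=absolute, reverse=True))  # [7, 4, 3, -2, 1]
```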
What’s really going on there? We’ve done something pretty strange. Before, all the values
we have passed as parameters have been pretty easy to understand: numbers, strings, lists,
Booleans, dictionaries. Here we have passed a function object: absolute is a variable name whose
value is the function. When we pass that function object, it is not automatically invoked. Instead, it is
just bound to the formal parameter key of the function sorted.
We are not going to look at the source code for the built-in function sorted. But if we did, we
would find somewhere in its code a parameter named key with a default value of None. When a
value is provided for that parameter in an invocation of the function sorted, it has to be a function.
What the sorted function does is call that key function once for each item in the list that’s getting
sorted. It associates the result returned by that function (the absolute function in our case) with the
original value. Think of those associated values as being little post-it notes that decorate the original
values. The value 4 has a post-it note that says 4 on it, but the value -2 has a post-it note that says 2
on it. Then the sorted function rearranges the original items in order of the values written on their
associated post-it notes.
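The post-it note idea can be sketched in plain Python. The helper sorted_with_notes below is hypothetical and only mimics the behavior described above; it is not how the built-in sorted is actually implemented.

```python
def absolute(x):
    return abs(x)

def sorted_with_notes(items, key):
    """Hypothetical illustration of sorting with a key function."""
    # Decorate: attach a "post-it note" (the key result) to every item.
    # The index is included so that tied notes keep their original order
    # and the items themselves never need to be compared.
    decorated = [(key(item), index, item) for index, item in enumerate(items)]
    # Sort by the notes.
    decorated.sort()
    # Undecorate: throw away the notes and keep the original items.
    return [item for _, _, item in decorated]

values = [1, 7, 4, -2, 3]  # hypothetical sample data
print(sorted_with_notes(values, absolute))  # [1, -2, 3, 4, 7]
print(sorted(values, key=absolute))         # [1, -2, 3, 4, 7]
```

Note that the key function is called exactly once per item, no matter how many comparisons the sort performs.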
L = ['E', 'F', 'B', 'A', 'D', 'I', 'I', 'C', 'B', 'A', 'D', 'D', 'E', 'D']
# count how many times each letter appears
d = {}
for x in L:
    if x in d:
        d[x] = d[x] + 1
    else:
        d[x] = 1
for x in d.keys():
    print("{} appears {} times".format(x, d[x]))
The dictionary’s keys are not sorted in any particular order. Since Python 3.7 a dictionary
keeps its keys in insertion order (in older versions the order was arbitrary and could differ from
one run to the next), so here the letters appear in the order they were first seen. We can force the
results to be displayed in sorted order by sorting the keys.
L = ['E', 'F', 'B', 'A', 'D', 'I', 'I', 'C', 'B', 'A', 'D', 'D', 'E', 'D']
d = {}
for x in L:
    if x in d:
        d[x] = d[x] + 1
    else:
        d[x] = 1
# sort the keys alphabetically before printing
y = sorted(d.keys())
for k in y:
    print("{} appears {} times".format(k, d[k]))
With a dictionary that’s maintaining counts or some other kind of score, we might prefer to
get the outputs sorted based on the count rather than based on the items. The standard way to do
that in Python is to sort based on a property of the key, in particular its value in the dictionary.
Here things get a little confusing because we have two different meanings of the word “key”.
One meaning is a key in a dictionary. The other meaning is the parameter name for the function that
you pass into the sorted function.
Remember that the key function always takes as input one item from the sequence and
returns a property of the item. In our case, the items to be sorted are the dictionary’s keys, so each
item is one key from the dictionary. To remind ourselves of that, we’ve named the parameter in the
lambda expression k. The property of key k that is supposed to be returned is its associated value in
the dictionary. Hence, we have the lambda expression lambda k: d[k].
L = ['E', 'F', 'B', 'A', 'D', 'I', 'I', 'C', 'B', 'A', 'D', 'D', 'E', 'D']
d = {}
for x in L:
    if x in d:
        d[x] = d[x] + 1
    else:
        d[x] = 1
# sort the keys by their counts, largest count first
y = sorted(d.keys(), key=lambda k: d[k], reverse=True)
for k in y:
    print("{} appears {} times".format(k, d[k]))
L = ['E', 'F', 'B', 'A', 'D', 'I', 'I', 'C', 'B', 'A', 'D', 'D', 'E', 'D']
d = {}
for x in L:
    if x in d:
        d[x] = d[x] + 1
    else:
        d[x] = 1
# a named function that does the same job as lambda k: d[k]
def g(k):
    return d[k]
y = sorted(d.keys(), key=g, reverse=True)
# now loop through the keys
for k in y:
    print("{} appears {} times".format(k, d[k]))
Note that passing key=lambda x: d[x] is not what tells sorted to sort the keys of the
dictionary. The keys to be sorted are passed as the first parameter value in the invocation of
sorted. The key parameter only provides a function that says how to order them. Because
iterating over a dictionary yields its keys, sorted(d, ...) sorts the same items as
sorted(d.keys(), ...).
L = ['E', 'F', 'B', 'A', 'D', 'I', 'I', 'C', 'B', 'A', 'D', 'D', 'E', 'D']
d = {}
for x in L:
    if x in d:
        d[x] = d[x] + 1
    else:
        d[x] = 1
# now loop through the sorted keys
for k in sorted(d, key=lambda k: d[k], reverse=True):
    print("{} appears {} times".format(k, d[k]))
5. Tie-Breaking in Sorting
What happens when two items are “tied” in the sort order? For example, suppose we
sort a list of words by their lengths. Which five-letter word will appear first?
The answer is that Python’s sort is stable: tied items stay in the same order they
were in before the sorting.
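A minimal sketch of this behavior, using a hypothetical word list in which three words have five letters:

```python
# Hypothetical sample data, chosen only for illustration.
words = ["grape", "kiwi", "apple", "fig", "mango"]

# Sort by length only. The five-letter words are tied, so they keep
# the relative order they had in the original list.
by_length = sorted(words, key=len)
print(by_length)  # ['fig', 'kiwi', 'grape', 'apple', 'mango']
```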
What if we wanted to sort them by some other property, say alphabetically, when the
words were the same length? Python allows us to specify multiple conditions when we perform
a sort by returning a tuple from a key function.
First, let’s see how Python sorts tuples. We’ve already seen that there’s a built-in sort
order, if we don’t specify any key function. For numbers, it’s lowest to highest. For strings, it’s
alphabetic order. For a sequence of tuples, the default sort order is based on the default sort
order for the first elements of the tuples, with ties being broken by the second elements, and
then third elements if necessary, etc. For example, (1, 'b') sorts before (2, 'a'), and (1, 'a')
sorts before (1, 'b').
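The default ordering of tuples can be checked with a short sketch. The list tups is hypothetical sample data.

```python
# Hypothetical sample data, chosen only for illustration.
tups = [('A', 3, 2), ('C', 1, 4), ('B', 3, 1), ('A', 2, 4), ('C', 1, 2)]

# No key function: tuples are compared element by element, left to right.
sorted_tups = sorted(tups)
for tup in sorted_tups:
    print(tup)
# ('A', 2, 4)
# ('A', 3, 2)
# ('B', 3, 1)
# ('C', 1, 2)
# ('C', 1, 4)
```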
Now suppose we want to sort a list of fruit words first by their length, smallest to
largest, and then alphabetically to break ties among words of the same length. To do that, we
have the key function return a tuple whose first element is the length of the fruit’s name, and
whose second element is the fruit name itself.
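A sketch of such a key function is shown below. The fruit list is hypothetical sample data; the parameter name fruit_name follows the wording used in this section.

```python
# Hypothetical sample data, chosen only for illustration.
fruits = ['peach', 'kiwi', 'apple', 'blueberry', 'papaya', 'mango', 'pear']

# The key returns a tuple: length first, then the name to break ties.
new_order = sorted(fruits, key=lambda fruit_name: (len(fruit_name), fruit_name))
for fruit in new_order:
    print(fruit)
# kiwi
# pear
# apple
# mango
# peach
# papaya
# blueberry
```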
Here, each word is evaluated first on its length, then by its alphabetical order. Note that we
could continue to specify other conditions by including more elements in the tuple.
What would happen though if we wanted to sort it by largest to smallest, and then by
alphabetical order?
One solution is to add a negative sign in front of len(fruit_name), which will convert all positive
numbers to negative, and all negative numbers to positive. As a result, the longest elements would
be first and the shortest elements would be last.
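The negation trick might be sketched as follows, again with a hypothetical fruit list. For contrast, the sketch also shows reverse=True, which flips both parts of the tuple and so reverses the alphabetical tie-break as well.

```python
# Hypothetical sample data, chosen only for illustration.
fruits = ['peach', 'kiwi', 'apple', 'blueberry', 'papaya', 'mango', 'pear']

# Longest first, ties broken alphabetically (A to Z).
longest_first = sorted(fruits, key=lambda fruit_name: (-len(fruit_name), fruit_name))
print(longest_first)
# ['blueberry', 'papaya', 'apple', 'mango', 'peach', 'kiwi', 'pear']

# reverse=True reverses the whole ordering, including the tie-break (Z to A).
fully_reversed = sorted(fruits, key=lambda fruit_name: (len(fruit_name), fruit_name),
                        reverse=True)
print(fully_reversed)
# ['blueberry', 'papaya', 'peach', 'mango', 'apple', 'pear', 'kiwi']
```

Negation only works for numeric properties; a string such as the fruit name cannot be negated this way.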
For our first sort order, we want to sort the states in order by the length of the first city name.
Here, it’s pretty easy to compute that property. states[state] is the list of cities associated with a
particular state. Since states[state] is a list of city strings, len(states[state][0]) is the length of the
first city name. Thus, we can use a lambda expression:
states = {"Minnesota": ["St. Paul", "Minneapolis", "Saint Cloud",
                        "Stillwater"],
          "Michigan": ["Ann Arbor", "Traverse City", "Lansing",
                       "Kalamazoo"],
          "Washington": ["Seattle", "Tacoma", "Olympia", "Vancouver"]}
# sort the state names by the length of each state's first city name
print(sorted(states, key=lambda state: len(states[state][0])))
That’s already pushing the limits of how complex a lambda expression can be before it’s really
hard to read (or debug).
For our second sort order, the property we want to sort by is the number of cities that begin
with the letter ‘S’. The function defining this property is harder to express, requiring a filter and count
accumulation pattern. So we are better off defining a separate, named function. Here, we’ve chosen
to make a lambda expression that looks up the value associated with the particular state and pass
that value to the named function s_cities_count. We could have passed just the key, but then the
function would have to look up the value, and it would be a little confusing, from the code, to figure
out what dictionary the key is supposed to be looked up in. Here, we’ve done the lookup right in the
lambda expression, which makes it a little bit clearer that we’re just sorting the keys of the states
dictionary based on a property of their values. It also makes it easier to reuse the counting function
on other city lists, even if they aren’t embedded in that particular states dictionary.
def s_cities_count(city_list):
    # count the cities whose names begin with the letter 'S'
    ct = 0
    for city in city_list:
        if city[0] == "S":
            ct += 1
    return ct
states = {"Minnesota": ["St. Paul", "Minneapolis", "Saint Cloud",
                        "Stillwater"],
          "Michigan": ["Ann Arbor", "Traverse City", "Lansing", "Kalamazoo"],
          "Washington": ["Seattle", "Tacoma", "Olympia", "Vancouver"]}
# sort the state names by how many of their cities begin with 'S'
print(sorted(states, key=lambda state: s_cities_count(states[state])))
Discussion Questions:
1. What are the differences between sort() and sorted()? When would you use one over
the other?
2. How does the key parameter enhance the sorting functionality? Can you think of a
scenario where it would be particularly useful?
3. In what situations might you need to implement tie-breaking in your sorting logic?
4. How do nested data structures complicate sorting? Can you provide an example from
real-world data?
Sample Problems:
1. Problem 1: Given a list of integers, sort them in ascending order and then in descending
order using both sort() and sorted().
2. Problem 2: Create a dictionary of products with their prices. Sort the dictionary by
product names and then by prices.
3. Problem 3: You have a list of tuples containing names and scores of students. Sort the
list first by scores in descending order and then by names in