Sorting Data Structures in Python

This lesson teaches students various sorting techniques in Python, including the use of the sort() and sorted() functions, customization of sorting behavior, and sorting of dictionaries. It covers handling tie-breaking in sorting and working with nested data structures. Additionally, it includes discussion questions and sample problems to reinforce learning.

Uploaded by

hannnlady
Copyright
© All Rights Reserved

Sorting and Nested Data Structures in Python

By the end of this lesson, students will be able to:

1. Understand and apply different sorting techniques in Python.
2. Utilize the sort() and sorted() functions effectively.
3. Customize sorting behavior using optional parameters.
4. Sort dictionaries based on keys and values.
5. Handle tie-breaking in sorting using secondary criteria.
6. Work with nested data structures, including lists of complex items and nested
dictionaries.

1. Introduction to Sorting in Python

● Sorting Techniques: Introduce the two primary methods for sorting in Python: sort()
and sorted().
○ list.sort(): Sorts the list in place and returns None.
○ sorted(): Returns a new sorted list from the elements of any iterable.

Note that the sort method does not return a sorted version of the list. In fact, it returns the value
None. But the list itself has been modified. This kind of operation that works by having a side
effect on the list can be quite confusing.

L2 = ["Cherry", "Apple", "Blueberry"]

L3 = sorted(L2)
print(L3)
print(sorted(L2))
print(L2)  # unchanged

print("----")

L2.sort()
print(L2)
print(L2.sort())  # return value is None

2. Basic Sorting Examples
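
As a minimal sketch of the basics (the variable names nums, words, and letters are illustrative and do not come from the lesson), the following shows sorted() applied to a list of numbers, a list of strings, and a non-list iterable, followed by an in-place sort():

```python
nums = [5, 2, 9, 1]
words = ["pear", "apple", "kiwi"]

# sorted() returns a new list and leaves the original untouched
sorted_nums = sorted(nums)
sorted_words = sorted(words)
print(sorted_nums)    # [1, 2, 5, 9]
print(sorted_words)   # ['apple', 'kiwi', 'pear']
print(nums)           # [5, 2, 9, 1]

# sorted() accepts any iterable, such as a string, and always returns a list
letters = sorted("python")
print(letters)        # ['h', 'n', 'o', 'p', 't', 'y']

# list.sort() rearranges the list itself
nums.sort()
print(nums)           # [1, 2, 5, 9]
```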

3. Customizing Sort Behavior


● Reverse Parameter: Discuss how to sort in descending order.
● Key Parameter: Explain how to use a function to determine the sort order.

The sorted function takes two optional parameters, key and reverse, which are passed by
keyword. The key parameter is a function and is described below. The reverse parameter is a
Boolean value which determines whether to sort the items in reverse order. By default, it is
False, but if you set it to True, the list will be sorted in reverse order.

L2 = ["Cherry", "Apple", "Blueberry"]

print(sorted(L2, reverse=True))

● Optional key parameter

If you want to sort things in some order other than the “natural” order or its reverse, you
can provide an additional parameter, the key parameter. For example, suppose you want to
sort a list of numbers based on their absolute value, so that -4 comes after 3. Or suppose
you have a dictionary with strings as the keys and numbers as the values. Instead of sorting
them in alphabetic order based on the keys, you might like to sort them in order based on
their values.

L1 = [1, 7, 4, -2, 3]

def absolute(x):
    if x >= 0:
        return x
    else:
        return -x

print(absolute(3))
print(absolute(-119))

for y in L1:
    print(absolute(y))

Now, we can pass the absolute function to sorted in order to specify that we want the
items sorted in order of their absolute value, rather than in order of their actual value.

L1 = [1, 7, 4, -2, 3]

def absolute(x):
    if x >= 0:
        return x
    else:
        return -x

L2 = sorted(L1, key=absolute)
print(L2)

# or in reverse order
print(sorted(L1, reverse=True, key=absolute))

What’s really going on there? We’ve done something pretty strange. Before, all the values
we have passed as parameters have been pretty easy to understand: numbers, strings, lists,
Booleans, dictionaries. Here we have passed a function object: absolute is a variable name whose
value is the function. When we pass that function object, it is not automatically invoked. Instead, it is
just bound to the formal parameter key of the function sorted.

We are not going to look at the source code for the built-in function sorted. But if we did, we
would find somewhere in its code a parameter named key with a default value of None. When a
value is provided for that parameter in an invocation of the function sorted, it has to be a function.
What the sorted function does is call that key function once for each item in the list that’s getting
sorted. It associates the result returned by that function (the absolute function in our case) with the
original value. Think of those associated values as being little post-it notes that decorate the original
values. The value 4 has a post-it note that says 4 on it, but the value -2 has a post-it note that says 2
on it. Then the sorted function rearranges the original items in order of the values written on their
associated post-it notes.

To illustrate that the absolute function is invoked once on each item:

L1 = [1, 7, 4, -2, 3]

def absolute(x):
    print("--- figuring out what to write on the post-it note for " + str(x))
    if x >= 0:
        return x
    else:
        return -x

print("About to call sorted")
L2 = sorted(L1, key=absolute)
print("Finished execution of sorted")
print(L2)

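
The post-it note picture can be sketched in plain Python. This is only an illustration of the idea, not the actual implementation of the built-in sorted; the helper name sorted_with_notes is hypothetical, and the built-in abs stands in for the lesson's absolute function. The sketch attaches a note to each item, sorts by the notes, and then throws the notes away:

```python
def sorted_with_notes(items, key):
    # "Decorate": attach a post-it note (the key value) to each item.
    # The index keeps tied items in their original order and avoids
    # comparing the items themselves.
    decorated = [(key(item), index, item) for index, item in enumerate(items)]
    # Sort by the note first, then by original position
    decorated.sort()
    # "Undecorate": throw away the notes and keep the items
    return [item for note, index, item in decorated]

L1 = [1, 7, 4, -2, 3]
result = sorted_with_notes(L1, abs)
print(result)                         # [1, -2, 3, 4, 7]
print(result == sorted(L1, key=abs))  # True
```
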
4. Sorting Dictionaries

● Sorting by Keys: Show how to sort a dictionary by its keys.
● Sorting by Values: Show how to sort a dictionary by its values.

L = ['E', 'F', 'B', 'A', 'D', 'I', 'I', 'C', 'B', 'A', 'D', 'D', 'E', 'D']

d = {}
for x in L:
    if x in d:
        d[x] = d[x] + 1
    else:
        d[x] = 1
for x in d.keys():
    print("{} appears {} times".format(x, d[x]))

The dictionary’s keys are not sorted. Since Python 3.7, a dictionary preserves insertion order,
so the keys come out in the order in which they were first added, not in alphabetical order (in
older versions the order was arbitrary). We can force the results to be displayed in sorted
order by sorting the keys.

L = ['E', 'F', 'B', 'A', 'D', 'I', 'I', 'C', 'B', 'A', 'D', 'D', 'E', 'D']

d = {}
for x in L:
    if x in d:
        d[x] = d[x] + 1
    else:
        d[x] = 1
y = sorted(d.keys())
for k in y:
    print("{} appears {} times".format(k, d[k]))

With a dictionary that’s maintaining counts or some other kind of score, we might prefer to
get the outputs sorted based on the count rather than based on the items. The standard way to do
that in Python is to sort based on a property of the key, in particular its value in the dictionary.

Here things get a little confusing because we have two different meanings of the word “key”.
One meaning is a key in a dictionary. The other meaning is the parameter name for the function that
you pass into the sorted function.

Remember that the key function always takes as input one item from the sequence and
returns a property of the item. In our case, the items to be sorted are the dictionary’s keys, so each
item is one key from the dictionary. To remind ourselves of that, we’ve named the parameter in the
lambda expression k. The property of key k that is supposed to be returned is its associated value in
the dictionary. Hence, we have the lambda expression lambda k: d[k].

L = ['E', 'F', 'B', 'A', 'D', 'I', 'I', 'C', 'B', 'A', 'D', 'D', 'E', 'D']

d = {}
for x in L:
    if x in d:
        d[x] = d[x] + 1
    else:
        d[x] = 1

y = sorted(d.keys(), key=lambda k: d[k], reverse=True)
for k in y:
    print("{} appears {} times".format(k, d[k]))

Here’s a version of that using a named function.

L = ['E', 'F', 'B', 'A', 'D', 'I', 'I', 'C', 'B', 'A', 'D', 'D', 'E', 'D']

d = {}
for x in L:
    if x in d:
        d[x] = d[x] + 1
    else:
        d[x] = 1

def g(k):
    return d[k]

y = sorted(d.keys(), key=g, reverse=True)

# now loop through the keys
for k in y:
    print("{} appears {} times".format(k, d[k]))

When we sort the keys, the argument key=lambda x: d[x] is not what tells sorted to sort the
keys of a dictionary. The keys are passed as the first argument in the call to sorted; the key
parameter only provides a function that says how to order them. Because iterating over a
dictionary yields its keys, passing d itself works the same way as passing d.keys().

L = ['E', 'F', 'B', 'A', 'D', 'I', 'I', 'C', 'B', 'A', 'D', 'D', 'E', 'D']

d = {}
for x in L:
    if x in d:
        d[x] = d[x] + 1
    else:
        d[x] = 1

# now loop through the sorted keys
for k in sorted(d, key=lambda k: d[k], reverse=True):
    print("{} appears {} times".format(k, d[k]))
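
A possible alternative, shown here only as a sketch, is to sort the (key, value) pairs from d.items() instead of the keys alone. The names pairs, letter, and count are illustrative, and d.get(x, 0) is used as a shorter form of the if/else counting above:

```python
L = ['E', 'F', 'B', 'A', 'D', 'I', 'I', 'C', 'B', 'A', 'D', 'D', 'E', 'D']

d = {}
for x in L:
    # d.get(x, 0) returns the current count, or 0 if x is not yet a key
    d[x] = d.get(x, 0) + 1

# d.items() yields (key, value) tuples; pair[1] is the count
pairs = sorted(d.items(), key=lambda pair: pair[1], reverse=True)
for letter, count in pairs:
    print("{} appears {} times".format(letter, count))
```

With this form the loop body no longer needs to look each key up in the dictionary again.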

5. Tie-Breaking in Sorting

What happens when two items are “tied” in the sort order? For example, suppose we
sort a list of words by their lengths. Which five-letter word will appear first?

The answer is that Python’s sort is stable: tied items stay in the same relative order they
were in before the sorting.
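
A small sketch of this behavior (the list words and the name by_length are illustrative): sorting by length alone leaves words of equal length in their original relative order.

```python
words = ["peach", "kiwi", "apple", "pear", "mango"]

# Sort by length only; words of equal length are "tied"
by_length = sorted(words, key=len)
print(by_length)   # ['kiwi', 'pear', 'peach', 'apple', 'mango']
```

Among the five-letter words, peach stays ahead of apple and mango because it came first in the input.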

What if we wanted to sort them by some other property, say alphabetically, when the
words were the same length? Python allows us to specify multiple conditions when we perform
a sort by returning a tuple from a key function.

First, let’s see how Python sorts tuples. We’ve already seen that there’s a built-in sort
order, if we don’t specify any key function. For numbers, it’s lowest to highest. For strings, it’s
alphabetic order (with all uppercase letters sorting before lowercase ones). For a sequence of
tuples, the default sort order is based on the default sort order for the first elements of the
tuples, with ties being broken by the second elements, and then third elements if necessary,
etc. For example,

tups = [('A', 3, 2),
        ('C', 1, 4),
        ('B', 3, 1),
        ('A', 2, 4),
        ('C', 1, 2)]
for tup in sorted(tups):
    print(tup)

In the code below, we are going to sort a list of fruit words first by their length, smallest to
largest, and then alphabetically to break ties among words of the same length. To do that, we
have the key function return a tuple whose first element is the length of the fruit’s name, and
second element is the fruit name itself.

fruits = ['peach', 'kiwi', 'apple', 'blueberry', 'papaya', 'mango', 'pear']

new_order = sorted(fruits, key=lambda fruit_name: (len(fruit_name), fruit_name))
for fruit in new_order:
    print(fruit)

Here, each word is evaluated first on its length, then by its alphabetical order. Note that we
could continue to specify other conditions by including more elements in the tuple.

What would happen though if we wanted to sort the words from longest to shortest, and then
by alphabetical order?

fruits = ['peach', 'kiwi', 'apple', 'blueberry', 'papaya', 'mango', 'pear']

new_order = sorted(fruits, key=lambda fruit_name: (len(fruit_name), fruit_name), reverse=True)
for fruit in new_order:
    print(fruit)

Do you see a problem here? Not only does it sort the words from longest to shortest, but also in
reverse alphabetical order! Can you think of any ways you can solve this issue?

One solution is to add a negative sign in front of len(fruit_name). Negating the lengths reverses
their numeric order: the longest name gets the most negative value, so it sorts first, and the
shortest name sorts last. The names themselves are left unchanged, so ties are still broken in
ordinary alphabetical order.

fruits = ['peach', 'kiwi', 'apple', 'blueberry', 'papaya', 'mango', 'pear']

new_order = sorted(fruits, key=lambda fruit_name: (-len(fruit_name), fruit_name))
for fruit in new_order:
    print(fruit)
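
The negation trick only works for numeric values. When a string tie-breaker needs to run in the opposite direction, one common approach (sketched here; it is not part of the original lesson, and the name by_name_desc is illustrative) relies on the sort being stable: sort by the secondary criterion first, then by the primary one.

```python
fruits = ['peach', 'kiwi', 'apple', 'blueberry', 'papaya', 'mango', 'pear']

# Pass 1: secondary criterion, names in reverse alphabetical order
by_name_desc = sorted(fruits, reverse=True)
# Pass 2: primary criterion, shortest first; ties keep the order from pass 1
new_order = sorted(by_name_desc, key=len)
for fruit in new_order:
    print(fruit)
```

This prints pear, kiwi, peach, mango, apple, papaya, blueberry: shortest to longest, with names of the same length in reverse alphabetical order.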

When to Use a Lambda Expression


Though you can often use a lambda expression or a named function interchangeably when
sorting, it’s generally best to use lambda expressions until the process is too complicated, and then a
named function should be used. For example, in the following examples, we’ll be sorting a dictionary’s
keys by properties of its values. Each key is a state name and each value is a list of city names.

For our first sort order, we want to sort the states in order by the length of the first city name.
Here, it’s pretty easy to compute that property. states[state] is the list of cities associated with a
particular state. So if state is a key in the states dictionary, states[state][0] is the first city name
and len(states[state][0]) is its length. Thus, we can use a lambda expression:

states = {"Minnesota": ["St. Paul", "Minneapolis", "Saint Cloud", "Stillwater"],
          "Michigan": ["Ann Arbor", "Traverse City", "Lansing", "Kalamazoo"],
          "Washington": ["Seattle", "Tacoma", "Olympia", "Vancouver"]}

print(sorted(states, key=lambda state: len(states[state][0])))

That’s already pushing the limits of how complex a lambda expression can be before it’s really
hard to read (or debug).

For our second sort order, the property we want to sort by is the number of cities that begin
with the letter ‘S’. The function defining this property is harder to express, requiring a filter and count
accumulation pattern. So we are better off defining a separate, named function. Here, we’ve chosen
to make a lambda expression that looks up the value associated with the particular state and pass
that value to the named function s_cities_count. We could have passed just the key, but then the
function would have to look up the value, and it would be a little confusing, from the code, to figure
out what dictionary the key is supposed to be looked up in. Here, we’ve done the lookup right in the
lambda expression, which makes it a little bit clearer that we’re just sorting the keys of the states
dictionary based on a property of their values. It also makes it easier to reuse the counting function
on other city lists, even if they aren’t embedded in that particular states dictionary.

def s_cities_count(city_list):
    ct = 0
    for city in city_list:
        if city[0] == "S":
            ct += 1
    return ct

states = {"Minnesota": ["St. Paul", "Minneapolis", "Saint Cloud", "Stillwater"],
          "Michigan": ["Ann Arbor", "Traverse City", "Lansing", "Kalamazoo"],
          "Washington": ["Seattle", "Tacoma", "Olympia", "Vancouver"]}

print(sorted(states, key=lambda state: s_cities_count(states[state])))

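
As a hedged variation on the counting function above, the same filter-and-count pattern can be written with sum() and str.startswith(). This is a sketch of one possible rewrite, not the lesson's own version; the name result is illustrative.

```python
def s_cities_count(city_list):
    # Count the cities whose name starts with "S"; startswith() is also
    # safe for an empty string, where city[0] would raise an IndexError
    return sum(1 for city in city_list if city.startswith("S"))

states = {"Minnesota": ["St. Paul", "Minneapolis", "Saint Cloud", "Stillwater"],
          "Michigan": ["Ann Arbor", "Traverse City", "Lansing", "Kalamazoo"],
          "Washington": ["Seattle", "Tacoma", "Olympia", "Vancouver"]}

result = sorted(states, key=lambda state: s_cities_count(states[state]))
print(result)   # ['Michigan', 'Washington', 'Minnesota']
```

Michigan has no cities starting with "S", Washington has one, and Minnesota has three, which gives the order shown.
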
Discussion Questions:

1. What are the differences between sort() and sorted()? When would you use one over
the other?
2. How does the key parameter enhance the sorting functionality? Can you think of a
scenario where it would be particularly useful?
3. In what situations might you need to implement tie-breaking in your sorting logic?
4. How do nested data structures complicate sorting? Can you provide an example from
real-world data?

Sample Problems:

1. Problem 1: Given a list of integers, sort them in ascending order and then in descending
order using both sort() and sorted().

# List of dictionaries with K-Pop idols
kpop_idols = [
    {"name": "Jimin", "age": 27},
    {"name": "Lisa", "age": 26},
    {"name": "BamBam", "age": 26},
    {"name": "Suga", "age": 30},
    {"name": "Rosé", "age": 26}
]

# Sort by age
sorted_kpop_idols = sorted(kpop_idols, key=lambda x: x['age'])
print("Sorted K-Pop idols by age:", sorted_kpop_idols)

2. Problem 2: Create a dictionary of products with their prices. Sort the dictionary by
product names and then by prices.
3. Problem 3: You have a list of tuples containing names and scores of students. Sort the
list first by scores in descending order and then by names in
