Representing Data - Collections
Set data
Our first kind of collection is the set data type, which is a collection of zero or more distinct values,
where order does not matter. Examples include: the set of all people in Toronto; the set of words of
the English language; and the set of all countries on Earth.
Each piece of data contained in a set is called an element of that set. In your previous studies, you
may have seen sets written by using curly braces, with each element inside the braces separated by
commas. For example, {1, 2, 3} or {‘hi’, ‘bye’}.
In Python, we represent sets with the set data type. Set literals are written using curly braces,
matching the mathematical notation we just described. Here are some examples of set literals:
>>> {'David'}
{'David'}
>>> {1, 2, 3}
{1, 2, 3}
>>> {1, 2.0, 'three'}
{1, 2.0, 'three'}
In Python, sets can have elements of the same type, or of different types. We call a set where every
element has the same type a homogeneous set, and a set whose elements have different
types a heterogeneous set. In this course we’ll typically work with homogeneous sets, but
occasionally it will be useful to have heterogeneous sets.
There is another way of writing sets: set builder notation, which defines the form of elements of a
set using variables. We saw an example of this earlier when defining the set of rational numbers,
Q = { p/q ∣ p, q ∈ Z and q ≠ 0 }. We’ll explore this notation in more detail in Section 1.7, but for
now we’ll stick with plain set literals to keep things simple.
Operations on sets
As with strings, there are many different operations we can perform on sets. In this section we’ll
just illustrate three that reuse Python operators we’ve already seen, and then in the next chapter
we’ll see a few more.
First we have set equality. Two sets are equal when they contain the exact same elements
(ignoring what order the elements are written in). In Python, we (unsurprisingly!) use the ==
operator to compare sets.
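As a minimal sketch of set equality (the variable names here are illustrative assumptions, not part of the original notes), the following compares a few small sets with ==:

```python
# Two sets with the same elements, written in different orders.
numbers1 = {1, 2, 3}
numbers2 = {3, 1, 2}

same = numbers1 == numbers2        # True: order is ignored
different = numbers1 == {1, 2}     # False: 3 is missing from the second set
duplicates = {1, 1, 2} == {1, 2}   # True: repeated values count as one element

print(same, different, duplicates)
```

The last comparison shows that writing a value twice in a set literal does not change the set, since elements are distinct.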
The analog of “substrings” for sets is the notion of subsets. Given two sets S1 and S2, we say that
S1 is a subset of S2 when all elements of S1 are contained in S2. 1 In Python, we use the <=
operator to check whether one set is a subset of another.
1 This is analogous to the definition of substrings, except without any mention of order, because order doesn’t matter in sets.
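The subset relation described above can be sketched in Python as follows. The sets and variable names are made up for illustration:

```python
small = {1, 2}
large = {1, 2, 3}

is_subset = small <= large      # True: 1 and 2 are both in large
not_subset = large <= small     # False: 3 is not in small
self_subset = large <= large    # True: every set is a subset of itself

print(is_subset, not_subset, self_subset)
```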
You might have noticed an inconsistency with strings here: Python uses in to check for
substrings, but <= to check for subsets. This is because in is used to express a different
fundamental set operation: element checking. In mathematics, we use the symbol ∈ to mean “is an
element of” or “is in”, e.g. 1 ∈ {1, 2, 3}. In Python, we use the in operator to express the same
computation:
>>> 1 in {1, 2, 3}
True
>>> 10 in {1, 2, 3}
False
List data
The second form of collection we will study is the list, which is a sequence of zero or more values
that may contain duplicates. Like sets, the values contained in a list are called the elements of the
list. Unlike sets, lists can contain duplicates, and order matters in a list. 2 List data is used instead of
a set when the elements of the collection should be in a specified order, or if it may contain
duplicates. Examples include: the list of all people in Toronto, ordered by age; the list of words of
the English language, ordered alphabetically; and the list of names of students at the University of
Toronto (two students may have the same name!), ordered alphabetically.
2 So both strings and lists are examples of sequences, where order matters. In fact, in some programming languages (but not Python) strings are represented simply as lists of characters, rather than a separate data type.
In mathematics, lists are written using square brackets with each element contained in the list
separated by commas, for example, [1, 2, 3]. In Python, lists are represented using the list data
type, 3 and list literals use this same syntax:
>>> [1, 2, 3]
[1, 2, 3]
>>> ['David']
['David']
>>> [1, 2.0, 'three']
[1, 2.0, 'three']
Like sets, Python lists can be either homogeneous (all elements have the same type) or
heterogeneous (elements have different types).
3 There is another closely related Python data type called tuple that we’ll see later in this course.
Operations on lists
As with sets, we’ll just illustrate a few different operations on lists here, leaving others to future
chapters.
First, list equality: two lists are equal when they contain the same elements in the same order. In
Python, we use == to compare lists for equality. Lists also support element checking with the in
operator, just as sets do:
>>> 3 in [1, 2, 3]
True
>>> 10 in [1, 2, 3]
False
And like strings, lists support list concatenation (using the + operator) and list indexing (using
square brackets), where the indexing also starts at 0.
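The list operations described in this section (equality, concatenation, and indexing) can be sketched as follows. The values and variable names are illustrative assumptions:

```python
# Equality: same elements in the same order.
same_order = [1, 2, 3] == [1, 2, 3]       # True
different_order = [1, 2, 3] == [3, 2, 1]  # False: order matters for lists

# Concatenation with +, which produces a new list.
combined = [1, 2] + [3, 4]                # [1, 2, 3, 4]

# Indexing with square brackets, starting at 0.
first = combined[0]                       # 1
last = combined[3]                        # 4

print(same_order, different_order, combined, first, last)
```

Compare the second line with sets, where the order the elements are written in makes no difference to equality.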
Mapping data
Our final data type for this section is the mapping, which is a collection of association pairs, 4
where each pair consists of a key and associated value for that key. (Note that keys are themselves
pieces of data like names or numbers.) In a mapping, each key must be unique, but values can be
duplicated. A key cannot exist in the mapping without a corresponding value.
4 “Association pairs” are also called “key-value pairs”.
For example, to represent a restaurant menu with a mapping, each food item would be a key, and
the corresponding value would be its price. In the mapping, each food item is unique (and must be
associated with exactly one price), but it is possible for two food items to have the same price.
We use curly braces to represent a mapping. 5 Each key-value association pair in a mapping is
written using a colon, with the key on the left side of the colon and its associated value on the
right.
5 This is similar to sets, because mappings are quite similar to sets. Both data types are unordered, and both have a uniqueness constraint (a set’s elements are unique; a mapping’s keys are unique).
For example, here is how we could write a mapping representing the menu items of a
restaurant:
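The original example is not available here, so the following is a minimal sketch with made-up food items and prices. Each key is a food name and each associated value is its price:

```python
# Hypothetical menu: the items and prices are assumptions for illustration.
menu = {'fries': 5.99, 'steak': 25.99, 'soup': 8.99, 'salad': 8.99}

print(menu)
```

Each food item appears once as a key, but two different items ('soup' and 'salad') share the same price, which is allowed because values may be duplicated.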
In Python, we represent mappings using the dict (short for “dictionary”) data type, and write
literals using the syntax described above.
Like sets and lists, Python dictionaries can have both keys and associated values of different types.
A homogeneous dictionary is a dictionary where every key has the same type, and every associated
value has the same type, but the key type and associated value type can be different. 6 A
heterogeneous dictionary is a dictionary where the keys have different types or the associated
values have different types, or both.
6 So the “menu” dictionary example above is a homogeneous dictionary.
Operations on mappings
We’ll end off by discussing three fundamental operations on mappings.
As usual, our first operation is mapping equality: two mappings are equal when they contain the
exact same key-value pairs. In Python, we use == to compare two dictionaries for equality.
The second operation is analogous to element checking for sets and lists: key checking. Given a
mapping M and value k, we use k ∈ M to express that k is a key in M. In Python, we accomplish
the same task with the in operator:
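A minimal sketch of key checking, using a hypothetical menu dictionary (the items and prices are assumptions for illustration):

```python
# Hypothetical menu used only for this example.
menu = {'fries': 5.99, 'steak': 25.99, 'soup': 8.99}

has_fries = 'fries' in menu   # True: 'fries' is a key
has_pizza = 'pizza' in menu   # False: 'pizza' is not a key

print(has_fries, has_pizza)
```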
One warning: in Python there is no equivalent operator to check whether there is a given
“associated value” in a dictionary. So for example, we can’t use in to check for the presence of
prices in our menu example:
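As a sketch of this warning, again using a hypothetical menu with made-up prices, checking for a price with in evaluates to False even though that price appears as an associated value:

```python
# Hypothetical menu used only for this example.
menu = {'fries': 5.99, 'steak': 25.99, 'soup': 8.99}

# `in` only looks at keys, so a price is not found even though it is a value.
price_found = 5.99 in menu    # False

print(price_found)
```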
The third operation on mappings is key lookup. Given a mapping M and key k that appears in the
mapping, this operation returns the value that is associated with k in M. We use the same square
bracket syntax as string/list indexing, writing M[k] to denote this operation. Here is an example of
this syntax in Python:
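A minimal sketch of key lookup, assuming a hypothetical menu dictionary whose items and prices are invented for illustration:

```python
# Hypothetical menu used only for this example.
menu = {'fries': 5.99, 'steak': 25.99, 'soup': 8.99}

fries_price = menu['fries']   # 5.99
steak_price = menu['steak']   # 25.99

print(fries_price, steak_price)
```

Note that the value inside the square brackets is a key, not a position: unlike string and list indexing, menu[0] would not look up the "first" item.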
That’s it for now—as with all of the other data types, we’ll explore Python dictionaries in more
detail throughout this course. Before we wrap up, we’ll leave you with one final note about empty
collections.
Empty collections
All three kinds of collections we’ve studied allow for a collection of size zero. An empty set or list
contains zero elements, and an empty mapping contains zero key-value pairs. In Python, the literal
[] represents an empty list.
But we have a problem with sets and dictionaries: both of their literals use curly braces in Python,
so does {} represent an empty set or an empty dictionary? The answer (for historical reasons) is
that {} is the literal for an empty dictionary—Python has no literal to represent an empty set.
Instead, we can create an empty set in Python with the expression set(), which is syntax we
haven’t seen yet but will explore in the next chapter.
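The three empty collections described above can be sketched as follows. The variable names are illustrative, and the type checks are included only to confirm which data type each expression produces:

```python
empty_list = []
empty_dict = {}       # {} is an empty dictionary, not an empty set
empty_set = set()

braces_are_dict = type(empty_dict) is dict   # True
set_call_is_set = type(empty_set) is set     # True

# Each collection has size zero.
sizes = [len(empty_list), len(empty_dict), len(empty_set)]   # [0, 0, 0]

print(braces_are_dict, set_call_is_set, sizes)
```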
Summary
The collection data types are a bit more complex than the earlier data types we looked at, and they
might blur together as you are first encountering them. We’ve put together a summary table for
the three data types we studied in this section to help you review what you’ve learned.
                             Set                  List                 Mapping
Definition of homogeneous    all elements have    all elements have    all keys have same type, and
                             same type            same type            all values have same type
References
• Appendix A.2 Python Built-In Data Types Reference