Python Session 23-26 Module 3-Dictionaries
Module 3
• Topic 1: Lists
• Topic 2: Dictionaries
• Topic 3: Tuples
• Topic 4: Regular expressions
VIDYA VIKAS INSTITUTE OF ENGINEERING & TECHNOLOGY
Department of Electronics & Communication Engineering
Topic 2: Dictionaries
2.1 Dictionary as a set of counters
2.2 Dictionaries and files
2.3 Looping and dictionaries
2.4 Advanced text parsing
2.5 Debugging
2.6 Exercises
Introduction to Dictionaries
A dictionary is like a list, but more general. In a list, the index positions have to be integers; in a dictionary, the
indices can be (almost) any type.
You can think of a dictionary as a mapping between a set of indices (which are called keys) and a set of values.
Each key maps to a value. The association of a key and a value is called a key-value pair or sometimes an item.
As an example, we’ll build a dictionary that maps from English to Spanish words, so the keys and the values are
all strings.
The function dict creates a new dictionary with no items. Because dict is the name of a built-in function, you
should avoid using it as a variable name.
>>> eng2sp = dict()
>>> print(eng2sp)
{}
The curly brackets, {}, represent an empty dictionary. To add items to the dictionary, you can use square brackets:
>>> eng2sp['one'] = 'uno'
This line creates an item that maps from the key 'one' to the value 'uno'.
If we print the dictionary again, we see a key-value pair with a colon between the key and
value:
>>> print(eng2sp)
{'one': 'uno'}
This output format is also an input format. For example, you can create a new dictionary
with three items and print it:
>>> eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}
>>> print(eng2sp)
{'one': 'uno', 'two': 'dos', 'three': 'tres'}
In Python 3.7 and later, a dictionary keeps its items in the order they were inserted, so the
output matches the order in which the pairs were written. In earlier versions the order was
unpredictable, and the same example might print {'one': 'uno', 'three': 'tres', 'two': 'dos'}.
But that’s not a problem because the elements of a dictionary are never indexed with
integer indices. Instead, you use the keys to look up the corresponding values:
>>> print(eng2sp['two'])
dos
The key 'two' always maps to the value 'dos', so the order of the items doesn’t matter. If
the key isn’t in the dictionary, you get an exception:
>>> print(eng2sp['four'])
KeyError: 'four'
The len function works on dictionaries; it returns the number of key-value pairs:
>>> len(eng2sp)
3
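The operations above can be tried together in a short script. This is a minimal sketch: the helper name lookup is hypothetical (it is not part of Python or of the original slides), and it simply shows one way to avoid the KeyError by catching the exception.

```python
# Build the English-to-Spanish dictionary one item at a time.
eng2sp = dict()
eng2sp['one'] = 'uno'
eng2sp['two'] = 'dos'
eng2sp['three'] = 'tres'

def lookup(table, key):
    """Return table[key], or None when key is missing (hypothetical helper)."""
    try:
        return table[key]
    except KeyError:
        # Indexing with a missing key raises KeyError; report it as None instead.
        return None

print(len(eng2sp))             # number of key-value pairs
print(lookup(eng2sp, 'two'))   # an existing key
print(lookup(eng2sp, 'four'))  # a missing key
```

Returning None is only one possible design; a program that treats a missing key as a real error may prefer to let the KeyError propagate.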
The in operator works on dictionaries; it tells you whether something appears as a key in the dictionary (appearing
as a value is not good enough).
>>> 'one' in eng2sp
True
>>> 'uno' in eng2sp
False
To see whether something appears as a value in a dictionary, you can use the method values, which returns the
values as a type that can be converted to a list, and then use the in operator:
>>> vals = list(eng2sp.values())
>>> 'uno' in vals
True
The in operator uses different algorithms for lists and dictionaries. For lists, it uses a linear search. As the
list gets longer, the search time gets longer in direct proportion to the length of the list. For dictionaries, Python
uses a data structure called a hash table that has a remarkable property: the in operator takes about the same amount
of time no matter how many items there are in the dictionary.
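The difference between the two search strategies can be observed with a rough timing experiment. The sketch below is an illustration, not a benchmark: the size n, the repeat count, and the variable names are assumptions, and the measured times depend on the machine.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_dict = dict.fromkeys(as_list, True)  # same items stored as dictionary keys

missing = -1  # not present, so the list search must examine every element

# Time 20 membership tests against each container.
list_time = timeit.timeit(lambda: missing in as_list, number=20)
dict_time = timeit.timeit(lambda: missing in as_dict, number=20)

print('list search:', list_time, 'seconds')
print('dict search:', dict_time, 'seconds')
```

Increasing n should make the list time grow roughly in proportion, while the dictionary time stays about the same.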
Exercise 1: Download a copy of the file www.py4e.com/code3/words.txt
Write a program that reads the words in words.txt and stores them as keys in a dictionary. It
doesn’t matter what the values are. Then you can use the in operator as a fast way to check
whether a string is in the dictionary.
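One possible approach to Exercise 1 is sketched below. The function name build_word_dict and the sample lines are assumptions made for illustration; the sample list stands in for the contents of words.txt so that the example runs without the file.

```python
def build_word_dict(lines):
    """Store every whitespace-separated word found in lines as a dictionary key."""
    words = dict()
    for line in lines:
        for word in line.split():
            words[word] = True  # the value is arbitrary; only the keys matter
    return words

# Assumed stand-in for the lines of words.txt.
sample_lines = ['Writing programs is a very creative',
                'and rewarding activity']

word_dict = build_word_dict(sample_lines)
print('creative' in word_dict)
print('python' in word_dict)
```

With the real file, an open file handle can be passed in place of the list, for example build_word_dict(open('words.txt')), because iterating over a file handle also yields one line at a time.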
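2.2 Dictionaries and files
The program below counts how often each word appears in a file. It has two for loops: the outer loop reads the lines of the file and the inner loop iterates through the words on each line. This is
an example of a pattern called nested loops because one of the loops is the outer loop and the other loop is the inner loop. Because the inner loop executes all of its
iterations each time the outer loop makes a single iteration, we think of the inner loop as iterating “more quickly” and the outer loop as iterating more slowly. The
combination of the two nested loops ensures that we will count every word on every line of the input file.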
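fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except OSError:
    # The file is missing or unreadable, so stop here.
    print('File cannot be opened:', fname)
    exit()

counts = dict()
for line in fhand:               # outer loop: one line at a time
    words = line.split()
    for word in words:           # inner loop: one word at a time
        if word not in counts:
            counts[word] = 1     # first time we see this word
        else:
            counts[word] += 1    # seen before, so add one

print(counts)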
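The if/else inside the inner loop can also be written with the dictionary method get, which returns a default value when the key is missing. The sketch below is an alternative formulation, not the version used in the program above; the function name count_words and the sample lines are assumptions.

```python
def count_words(lines):
    """Return a dictionary mapping each word in lines to how often it occurs."""
    counts = dict()
    for line in lines:
        for word in line.split():
            # get returns 0 for a word we have not seen yet.
            counts[word] = counts.get(word, 0) + 1
    return counts

# Assumed sample text standing in for the lines of a file.
sample = ['the clown ran after the car',
          'and the car ran into the tent']

counts = count_words(sample)
print(counts)
```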
It is a bit inconvenient to look through the dictionary to find the most common words and
their counts, so we need to add some more Python code to produce more helpful output.
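A for loop over a dictionary visits its keys, which gives two simple ways to make the output easier to read. The sketch below uses a small made-up dictionary; the names, the counts, and the threshold of 10 are assumptions chosen only for illustration.

```python
# Hypothetical counts used only for this example.
counts = {'chuck': 1, 'annie': 42, 'jan': 100}

# Print only the entries whose count is above a threshold.
for key in counts:
    if counts[key] > 10:
        print(key, counts[key])

# Print every entry in alphabetical order of the keys.
for key in sorted(counts):
    print(key, counts[key])
```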
Since the Python split method treats words as tokens separated by whitespace, we would
treat the words “soft!” and “soft” as different words and create a separate dictionary entry
for each word.
Also, since the file has capitalization, we would treat “who” and “Who” as different words
with different counts.
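Both problems can be reduced by cleaning each line before splitting it. The sketch below is one way to do this with the string methods translate and lower; the function name normalize is an assumption, and string.punctuation covers ASCII punctuation only.

```python
import string

def normalize(line):
    """Lowercase a line and delete ASCII punctuation, then split it into words."""
    # An empty from/to mapping plus a third argument means "delete these characters".
    table = str.maketrans('', '', string.punctuation)
    return line.translate(table).lower().split()

print(normalize('But soft! What light through yonder window breaks?'))
print(normalize('Who is already sick and pale with grief, who'))
```

After this step “soft!” and “soft” become the same key, and so do “Who” and “who”.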
Looking through this output is still unwieldy and we can use Python to give us exactly what
we are looking for, but to do so, we need to learn about Python tuples. We will pick up this
example once we learn about tuples.
2.5 Debugging
As you work with bigger datasets it can become unwieldy to debug by printing and checking
data by hand. Here are some suggestions for debugging large datasets:
Scale down the input: If possible, reduce the size of the dataset. For example, if the
program reads a text file, start with just the first 10 lines, or with the smallest example you
can find. You can either edit the files themselves, or (better) modify the program so it reads
only the first n lines. If there is an error, you can reduce n to the smallest value that
manifests the error, and then increase it gradually as you find and correct errors.
Check summaries and types: Instead of printing and checking the entire dataset, consider
printing summaries of the data: for example, the number of items in a dictionary or the
total of a list of numbers. A common cause of runtime errors is a value that is not the right
type. For debugging this kind of error, it is often enough to print the type of a value.
Write self-checks: Sometimes you can write code to check for errors automatically. For
example, if you are computing the average of a list of numbers, you could check that the
result is not greater than the largest element in the list or less than the smallest. This is
called a “sanity check” because it detects results that are “completely illogical”. Another
kind of check compares the results of two different computations to see if they are
consistent. This is called a “consistency check”.
Pretty print the output: Formatting debugging output can make it easier to spot an error.
Time you spend building scaffolding can reduce the time you spend debugging.
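The four suggestions can be combined in a few lines of scaffolding. The sketch below is only an illustration: the function count_words, its max_lines parameter, and the sample data are assumptions, not part of the original programs.

```python
import pprint

def count_words(lines, max_lines=None):
    """Count words, optionally reading only the first max_lines lines."""
    counts = dict()
    for number, line in enumerate(lines):
        if max_lines is not None and number >= max_lines:
            break  # scale down the input: stop after max_lines lines
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

sample = ['a b a', 'b c', 'c c c']  # assumed stand-in for a large file
counts = count_words(sample, max_lines=2)

# Check summaries and types instead of printing the whole dataset.
print(len(counts), type(counts))

# Consistency check: the counts must add up to the number of words read.
total_words = sum(len(line.split()) for line in sample[:2])
assert sum(counts.values()) == total_words

# Pretty print the output for easier inspection.
pprint.pprint(counts)
```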
2.7 Exercises
Exercise 2: Write a program that categorizes each mail message by which day of the week
the commit was done. To do this, look for lines that start with “From”, then look for the third
word and keep a running count of each of the days of the week. At the end of the program
print out the contents of your dictionary (order does not matter).
Sample Line: From [email protected] Sat Jan 5 09:14:16 2008
Sample Execution:
python dow.py
Enter a file name: mbox-short.txt
{'Fri': 20, 'Thu': 6, 'Sat': 1}
Method 1
fname = input('Enter the file name: ')
fhand = open(fname)
daycounts = dict()
for line in fhand:
    if line.startswith('From'):
        words = line.split()
        # 'From:' header lines have only two words, so this test skips them.
        if len(words) > 2:
            print(words)  # debugging output: the words of the current line
            word = words[2]  # third word is the day of the week
            if word not in daycounts:
                daycounts[word] = 1
            else:
                daycounts[word] += 1
print(daycounts)

Method 2
fname = input('Enter the file name: ')
fhand = open(fname)
daycounts = dict()
for line in fhand:
    if line.startswith('From'):
        # Skip the 'From:' header lines explicitly.
        if line.startswith('From:'): continue
        words = line.split()
        word = words[2]  # third word is the day of the week
        if word not in daycounts:
            daycounts[word] = 1
        else:
            daycounts[word] += 1
print(daycounts)
Exercise 3: Write a program to read through a mail log, build a histogram using a dictionary
to count how many messages have come from each email address, and print the dictionary.
Enter file name: mbox-short.txt
{'[email protected]': 1, '[email protected]': 3, '[email protected]':
5, '[email protected]': 1, '[email protected]': 2, '[email protected]': 3,
'[email protected]': 4, '[email protected]': 1, '[email protected]': 4,
'[email protected]': 2, '[email protected]': 1}
Method 1
fname = input('Enter the file name: ')
fhand = open(fname)
usercounts = dict()
for line in fhand:
    if line.startswith('From'):
        # Skip the 'From:' header lines explicitly.
        if line.startswith('From:'): continue
        words = line.split()
        word = words[1]  # second word is the sender's address
        if word not in usercounts:
            usercounts[word] = 1
        else:
            usercounts[word] += 1
print(usercounts)

Method 2
fname = input('Enter the file name: ')
fhand = open(fname)
usercounts = dict()
for line in fhand:
    if line.startswith('From'):
        words = line.split()
        # 'From:' header lines have only two words, so this test skips them.
        if len(words) > 2:
            word = words[1]  # second word is the sender's address
            if word not in usercounts:
                usercounts[word] = 1
            else:
                usercounts[word] += 1
print(usercounts)
Exercise 4: Add code to the above program to figure out who has the most messages in the
file. After all the data has been read and the dictionary has been created, look through the
dictionary using a maximum loop (see Chapter 5: Maximum and minimum loops) to find
who has the most messages and print how many messages the person has.
Enter a file name: mbox-short.txt
[email protected] 5
Enter a file name: mbox.txt
[email protected] 195
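fname = input('Enter the file name: ')
fhand = open(fname)
usercounts = dict()
for line in fhand:
    if line.startswith('From'):
        if line.startswith('From:'): continue
        words = line.split()
        word = words[1]  # second word is the sender's address
        if word not in usercounts:
            usercounts[word] = 1
        else:
            usercounts[word] += 1
print(usercounts)  # debugging output

# The two lists are in the same order, so lst[i] is the count for lst1[i].
lst = list(usercounts.values())
lst1 = list(usercounts.keys())
print(lst1)  # debugging output
print(lst)   # debugging output

# Maximum loop: remember the largest count and the position where it occurs.
maximum = 0
count = 0
index = 0
for value in lst:
    if value > maximum:
        maximum = value
        count = index
    index = index + 1
print(maximum)  # debugging output
print(count)    # debugging output
print(lst1[count], maximum)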
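The same maximum loop can be written without the two parallel lists by looping over the key-value pairs directly with the dictionary method items. The sketch below is an alternative, not the method shown above; the function name most_frequent is an assumption, and the addresses are placeholders rather than data from mbox-short.txt.

```python
def most_frequent(counts):
    """Return the (key, count) pair with the largest count, or (None, None) if empty."""
    best_key = None
    best_count = None
    for key, count in counts.items():
        # The first pair always becomes the current best.
        if best_count is None or count > best_count:
            best_key = key
            best_count = count
    return best_key, best_count

# Hypothetical counts; the addresses are placeholders.
usercounts = {'alice@example.com': 2, 'bob@example.com': 5, 'carol@example.com': 1}

sender, total = most_frequent(usercounts)
print(sender, total)
```

Starting from None instead of 0 means the loop also behaves sensibly for an empty dictionary. The loop header unpacks each pair into two variables, a feature covered with tuples in Topic 3.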
Exercise 5: Write a program that records the domain name where each message was sent
from, instead of who the mail came from (i.e., the whole email address).
At the end of the program, print out the contents of your dictionary.
python schoolcount.py
Enter a file name: mbox-short.txt
{'media.berkeley.edu': 4, 'uct.ac.za': 6, 'umich.edu': 7, 'gmail.com': 1, 'caret.cam.ac.uk': 1,
'iupui.edu': 8}
fname = input('Enter the file name: ')
fhand = open(fname)
addrcounts = dict()
for line in fhand:
    line = line.rstrip()
    if line.startswith('From'):
        if line.startswith('From:'): continue
        words = line.split()
        word = words[1]  # second word is the sender's address
        print(word)      # debugging output
        # Everything after the '@' is the domain name.
        atpos = word.find('@')
        mailaddr = word[atpos+1:]
        if mailaddr not in addrcounts:
            addrcounts[mailaddr] = 1
        else:
            addrcounts[mailaddr] += 1
print(addrcounts)
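In the program above, find returns -1 when a word contains no '@', so the slice would then return the whole word as if it were a domain. The sketch below shows one way to make that case explicit with the string method partition; the function name domain_of and the example addresses are assumptions.

```python
def domain_of(address):
    """Return the part of address after the first '@', or None if there is no '@'."""
    # partition splits at the first '@' into (before, separator, after).
    name, at, domain = address.partition('@')
    if at == '':
        return None  # no '@' found, so this is not an email address
    return domain

print(domain_of('someone@example.edu'))
print(domain_of('not-an-address'))
```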