Lecture 5 - Dictionaries
Science
in Python
Lecture 5: Dictionaries
Lists - reminder
A list is an ordered sequence of elements.
Create a list in Python:
>>> my_list = [2,3,5,7,11]
>>> my_list
[2, 3, 5, 7, 11]
>>> my_list[0]
2
>>> my_list[-1]
11
List Methods
Method                    Description
lst.append(item)          Append an item to the end of the list
lst.count(val)            Return the number of occurrences of val
lst.extend(another_lst)   Extend the list by appending the items of another list
lst.index(val)            Return the first index of val
lst.insert(ind, item)     Insert an item before the item at index ind
lst.pop(), lst.pop(ind)   Remove and return the last item, or the item at index ind
lst.remove(value)         Remove the first occurrence of value
lst.reverse()             Reverse the list in place
lst.sort()                Sort the list in place
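The methods in the table can be tried out on the list from the reminder slide. This is a minimal sketch; the extra values appended here are illustrative and not from the lecture.

```python
primes = [2, 3, 5, 7, 11]     # the list from the reminder slide
primes.append(13)             # [2, 3, 5, 7, 11, 13]
primes.extend([17, 19])       # [2, 3, 5, 7, 11, 13, 17, 19]
primes.insert(0, 1)           # [1, 2, 3, 5, 7, 11, 13, 17, 19]
last = primes.pop()           # removes and returns 19
first = primes.pop(0)         # removes and returns 1
primes.remove(17)             # [2, 3, 5, 7, 11, 13]
primes.reverse()              # [13, 11, 7, 5, 3, 2]
primes.sort()                 # [2, 3, 5, 7, 11, 13]
position = primes.index(7)    # 3
```

All of these except count() and index() modify the list in place, which is exactly what tuples do not allow.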
Tuples
A tuple is similar to a list, but it is immutable.
Syntax: note the parentheses!
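A minimal sketch of the tuple syntax. The variable names and values are illustrative assumptions, not taken from the lecture.

```python
# Illustrative values only.
t = ('re', 'mi', 'fa')      # parentheses (with commas) create a tuple
single = ('re',)            # a one-item tuple needs a trailing comma
first, second, third = t    # tuples support indexing and unpacking, like lists

try:
    t[0] = 'do'             # tuples are immutable, so this raises TypeError
except TypeError as err:
    message = str(err)
```

Reading from a tuple works as it does for a list; only assignment to an item fails.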
>>> t[0] = 'do'  # try to change an item
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    t[0] = 'do'
TypeError: 'tuple' object does not support item assignment
Tuples
• Fixed size
• Immutable (like strings)
• What are they good for (compared to lists)?
  • Simpler (“lightweight”)
  • Stuff multiple things into a single container
  • Immutable (e.g., records in a database, safer code)
Dictionaries (Hash Tables)
[Figure: keys mapped to values]
Dictionaries
Dictionary: a set of key-value pairs.
Dictionaries
Example: “144” - Map names to phone numbers:
PythonTutor: https://goo.gl/yNJhKH
Dictionaries
Retrieve dictionary value by key:
>>> phonebook = {'Eric Cartman': '2020', 'Stan March': '5711', 'Kyle Broflovski': '2781'}
>>> phonebook['Eric Cartman']
'2020'
Dictionaries
Idea: Add address to the key
new key
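One way to add the address to the key is to use a (name, address) tuple as the key. The sketch below assumes that design; the addresses and the second phone number are made up for illustration.

```python
# Hypothetical data: the addresses and the second number are invented.
phonebook = {
    ('Eric Cartman', 'South Park'): '2020',
    ('Eric Cartman', 'Denver'): '7777',
}

# The whole tuple is the key, so two people with the same name can coexist.
number = phonebook[('Eric Cartman', 'South Park')]
```

A tuple works as a key because it is immutable; a list in the same position would be rejected.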
Dictionary Methods
Operation        Description
D = {}           Creates an empty dictionary
D[k] = value     Sets D[k] to value
k in D           Returns True if k is a key in D, False otherwise
D[k]             Returns the value mapped to key k (raises KeyError if k is not in D)
D.get(k[, d])    Returns D[k] if k is in D, otherwise d (default: None)
D.keys()         Returns a view of D's keys
D.values()       Returns a view of D's values
D.items()        Returns a view of D's (key, value) pairs, as 2-tuples
D.pop(k[, d])    Removes key k and returns its value; if k is not found, returns d
                 (raises KeyError if d is not specified)
Example:
>>> d = {1 : 'a'}
>>> 2 in d
False
>>> 1 in d
True
>>> 'a' in d
False
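The remaining operations from the table can be exercised in the same way. This is a minimal sketch on a small made-up dictionary; the variable names are illustrative.

```python
# A small dictionary with made-up contents, used only to exercise the operations.
d = {1: 'a', 2: 'b'}

missing = d.get(3)             # None: key 3 is absent and no default was given
fallback = d.get(3, 'z')       # 'z': the supplied default
keys = list(d.keys())          # views can be turned into lists
pairs = list(d.items())        # (key, value) pairs as 2-tuples
removed = d.pop(1)             # returns 'a' and deletes key 1
safe = d.pop(1, None)          # no KeyError, because a default was given
has_value = 'b' in d.values()  # membership among values needs .values()
```

Note that `in` on the dictionary itself tests keys only, which is why `'a' in d` is False in the example above.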
Sorting Dictionaries
http://pythoncentral.io/how-to-sort-python-dictionaries-by-key-or-value/
Fourth: 4
first: 1
second: 2
third: 3
Sorting Dictionaries
>>> numbers = {'first': 1, 'second': 2, 'third': 3, 'fourth': 4}
>>> {key: value for (key, value) in sorted(numbers.items(), reverse=True)}
{'third': 3, 'second': 2, 'fourth': 4, 'first': 1}
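Sorting the items compares the keys first, so the example above orders the dictionary by key. A sketch of the other common orderings follows, assuming Python 3.7 or later, where dictionaries preserve insertion order; the variable names are illustrative.

```python
numbers = {'first': 1, 'second': 2, 'third': 3, 'fourth': 4}

# Sort by key: tuples compare on their first element, the key.
by_key = dict(sorted(numbers.items()))

# Sort by value: the key function picks the value out of each (key, value) pair.
by_value = dict(sorted(numbers.items(), key=lambda item: item[1]))

# Largest value first.
by_value_desc = dict(sorted(numbers.items(), key=lambda item: item[1], reverse=True))
```

In each case sorted() returns a list of pairs, and dict() rebuilds a dictionary that remembers that order.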
Example: Frequency Counter
▪ Suppose you want to learn how often each English letter is used in a text, for example:
supercalifragilisticexpialidocious
Frequency Counter
text = 'supercalifragilisticexpialidocious'

# count letters – build letters histogram
def get_char_count(text):
    char_count = {}
    for char in text:
        if char not in char_count:
            char_count[char] = 1
        else:
            char_count[char] = char_count[char] + 1
    return char_count
Frequency Counter
(has_key version; dict.has_key() was removed in Python 3, where the in operator is used instead)
PythonTutor: https://goo.gl/u19on5
Frequency Counter
(get(k,0) version)
text = 'supercalifragilisticexpialidocious'

# count letters – build letters histogram
def get_char_count(text):
    char_count = {}
    for char in text:
        count = char_count.get(char, 0)
        count += 1
        char_count[char] = count
    return char_count

# a shorter version of the loop inside get_char_count:
for char in text:
    char_count[char] = char_count.get(char, 0) + 1
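As an aside, the standard library offers a ready-made alternative to the hand-written loop. The sketch below uses collections.Counter, which is not part of the lecture's code, to build the same histogram.

```python
from collections import Counter

text = 'supercalifragilisticexpialidocious'

# Counter is a dict subclass that builds the letter histogram in one call.
char_count = Counter(text)

# most_common(n) returns the n (element, count) pairs with the highest counts.
top = char_count.most_common(1)

# Unlike a plain dict, a Counter returns 0 for a missing key instead of raising KeyError.
absent = char_count['z']
```

Writing the loop by hand, as in the lecture, remains the better way to see how get(k, 0) works.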
Frequency Counter
In what order can we print the counts?
1. We can print the results in alphabetical order: print the count for 'a', then 'b', and so on.
Print by keys order
sorted(iterable): returns a sorted list of the objects in iterable (e.g., lists, strings)

def print_by_keys_order(char_count):
    sorted_chars = sorted(char_count)
    for char in sorted_chars:
        print(char, ':', char_count[char])

text = 'supercalifragilisticexpialidocious'
cc = get_char_count(text)
print_by_keys_order(cc)

The output is:
a : 3
c : 3
d : 1
e : 2
f : 1
…
Print by values order
# text = 'supercalifragilisticexpialidocious'
def print_by_values_order(char_count):
    sorted_chars = sorted(char_count, key=char_count.get, reverse=True)
    for char in sorted_chars:
        print(char, ':', char_count[char])
More about dictionaries…
Hash Functions
• The type of a dictionary's keys must be
  • Immutable
  • Hashable
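The requirement on keys can be checked directly in Python. This is a minimal sketch with illustrative names and values.

```python
# Immutable built-in values are hashable and can be used as keys.
ok = {42: 'int', 'name': 'str', (1, 2): 'tuple'}

# hash() returns an integer; equal values always have equal hashes.
same_hash = hash((1, 2)) == hash((1, 2))

# A list is mutable and unhashable, so using it as a key raises TypeError.
try:
    bad = {[1, 2]: 'list'}
except TypeError as err:
    error_message = str(err)
```

If a key could change after insertion, its hash would change too, and the dictionary could no longer find it.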
Hashing
• Hash function h: a mapping from the universe of keys U to the slots of a hash table T[0..m–1].
  h : U → {0, 1, …, m–1}
• With arrays, key k maps to slot A[k].
• With hash tables, key k maps or “hashes” to slot T[h(k)].
• h(k) is the hash value of key k.
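A toy hash function of this form can be sketched by reducing Python's built-in hash() modulo the table size. The table size M and the sample keys are arbitrary choices for illustration, and this is not how Python's dict computes slots internally.

```python
M = 8  # number of slots in the toy table (an arbitrary choice)

def h(key):
    """Toy hash function: map any hashable key to a slot in 0..M-1."""
    return hash(key) % M

# Small non-negative integers hash to themselves, so these slots are predictable.
slots = [h(k) for k in (3, 11, 20)]
```

Here the keys 3 and 11 land in the same slot, a first example of a collision.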
Hashing
[Figure: the hash function h maps the actual keys K = {k1, …, k5}, a subset of the universe of keys U, to slots 0..m–1 of the table; k2 and k5 collide: h(k2) = h(k5).]
Issues with Hashing
• Multiple keys can hash to the same slot: collisions are possible.
  • Design hash functions such that collisions are minimized.
  • But avoiding collisions entirely is impossible when there are more possible keys than slots.
  • Design collision-resolution techniques.
• If keys are well dispersed in the table, all operations can be made to run very fast (constant time on average).
Method of Resolution
• Chaining:
  • Store all elements that hash to the same slot in a list (a “chain”) kept at that slot.
Collision Resolution by Chaining
[Figure: collision resolution by chaining. The actual keys K, a subset of the universe of keys U, hash to slots 0..m–1 with collisions h(k1) = h(k4), h(k2) = h(k5) = h(k6) and h(k3) = h(k7), while k8 is alone in its slot. Each occupied slot holds a chain of the keys that hash to it: (k1, k4), (k5, k2, k6), (k7, k3) and (k8); the other slots are empty.]
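Chaining can be sketched in a few lines. The class below is a toy under stated assumptions: it uses Python's built-in hash() reduced modulo m as h, plain Python lists as chains, and the names ChainedTable, put and get are illustrative. It is not how Python's own dict is implemented.

```python
class ChainedTable:
    """Toy hash table with chaining; each slot holds a list of (key, value) pairs."""

    def __init__(self, m=8):
        self.m = m
        self.slots = [[] for _ in range(m)]   # one empty chain per slot

    def _chain(self, key):
        # h(key) = hash(key) mod m selects the slot; the slot holds the chain.
        return self.slots[hash(key) % self.m]

    def put(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)       # key already present: overwrite
                return
        chain.append((key, value))            # new key: extend the chain

    def get(self, key, default=None):
        for k, v in self._chain(key):
            if k == key:
                return v
        return default


table = ChainedTable(m=4)
table.put(1, 'k1')
table.put(5, 'k4')   # 1 and 5 both hash to slot 1 when m == 4
table.put(2, 'k2')
```

A lookup scans only the one chain selected by the hash, so it stays fast as long as the chains stay short.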
Text Categorization /
Document Classification
Application: Data Analysis
• To understand the information we “see”, we need to:
• inspect
• clean
• transform
• model
• Descriptive / Predictive
How is it Done?
• Manually
• Automatically
• Gather document statistics
• Measure how similar the document is to documents in each category
• Today we will collect word statistics from several well-known books
Identifying frequent words in
text
Steps:
• Find textual data of interest
import urllib.request
url = urllib.request.urlopen('https://raw.githubusercontent.com/GITenberg/Alice-s-Adventures-in-Wonderland_11/master/11.txt')
alice = url.read().decode()
Try:
urllib.request.urlopen('http://www.google.com').read().decode()
Print Most Popular Words
(High Level)
import urllib.request

def main():
    alice = urllib.request.urlopen('https://raw.githubusercontent….')
    n = 20
    popular_list = get_most_popular(alice, n)
get_most_popular:
Input:
a url address url,
an integer n
• Easy to maintain
Get most popular words - Plan
• Split the given text into words
• Count occurrences in text for each word using a dictionary
• For a given n: return the n most popular words.
Word    w1  w2  w3  w4  w5  w8  w9
Count    8  16   4   1  23  42  15
Observation: the 3 most frequent words are the first 3 after sorting by values in reverse (descending) order.
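The observation can be checked on the counts from the table. This is a minimal sketch; the dictionary simply copies the table, and the variable names are illustrative.

```python
# Counts copied from the table above; w1..w9 are placeholder words.
word_count = {'w1': 8, 'w2': 16, 'w3': 4, 'w4': 1, 'w5': 23, 'w8': 42, 'w9': 15}

n = 3
# Sort the words by their counts, largest first, then keep the first n.
most_popular = sorted(word_count, key=word_count.get, reverse=True)[:n]
```

This is the same sorted(..., key=char_count.get, reverse=True) pattern used for printing letters by frequency.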
Get Most Popular
Build Word-Occurrences Dictionary
def get_most_popular(url, n):
    '''
    url - object representing an internet address (URL)
    n - num of popular words to print
    '''
    text = url.read().decode()
    # str.split() returns a list; with no arguments it splits the string on whitespace
    words = text.split()
    word_count = {}
    for w in words:
        word_count[w] = word_count.get(w, 0) + 1
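The dictionary-building step can be combined with the sorting observation from the plan to return the n most popular words. The sketch below is one possible completion, not the lecture's own: most_popular_words is a hypothetical helper that takes the text directly instead of a URL object, so it runs without network access, and the sample sentence is made up.

```python
def most_popular_words(text, n):
    """Return the n most frequent whitespace-separated words in text, with their counts.

    Hypothetical helper: it receives the text itself rather than a URL object.
    """
    word_count = {}
    for w in text.split():
        word_count[w] = word_count.get(w, 0) + 1
    # Sort words by count, largest first; words with equal counts keep first-seen order.
    ranked = sorted(word_count, key=word_count.get, reverse=True)
    return [(w, word_count[w]) for w in ranked[:n]]


sample = 'the cat and the hat and the bat'   # made-up text for illustration
top_two = most_popular_words(sample, 2)
```

For a real book, the text would come from url.read().decode() as in the lecture's code. Splitting on whitespace alone leaves punctuation and capitalization attached to words, so 'Alice' and 'Alice,' would be counted separately.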