Lecture 5 - Dictionaries

This lecture discusses dictionaries in Python, highlighting their key-value mapping, immutability of keys, and methods for accessing and manipulating data. It also covers the differences between lists, tuples, and dictionaries, as well as collision resolution techniques in hash tables. Examples include creating a phonebook and counting letter frequencies in a string.

Uploaded by

Raed Qassem

Foundations of Computer Science in Python
Lecture 5: Dictionaries
Lists - reminder
A list is an ordered sequence of elements.
Create a list in Python:
>>> my_list = [2,3,5,7,11]
>>> my_list
[2, 3, 5, 7, 11]
>>> my_list[0]
2
>>> my_list[-1]
11

List Methods
Function                   Description
lst.append(item)           append an item to the end of the list
lst.count(val)             return the number of occurrences of val
lst.extend(another_lst)    extend the list by appending the items of another list
lst.index(val)             return the first index of val
lst.insert(ind, item)      insert item before the item at index ind
lst.pop(), lst.pop(ind)    remove and return the last item, or the item at index ind
lst.remove(value)          remove the first occurrence of value
lst.reverse()              reverse the list in place
lst.sort()                 sort the list in place

Note: lst.count(val) and lst.index(val) are queries that do not change the list; the other methods modify it in place.

Tuples
A tuple is similar to a list, but it is immutable.
Syntax: note the parentheses!

>>> t = ("don't", "worry", "be", "happy") # definition


>>> t
("don't", 'worry', 'be', 'happy')
>>> t[0] # indexing
"don't"
>>> t[-1] # backwards indexing
'happy'
>>> t[1:3] # slicing
('worry', 'be')

Tuples
>>> t[0] = 'do' # try to change
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    t[0]='do'
TypeError: 'tuple' object does not support item assignment

No append / extend / remove in tuples!

Tuples
• Fixed size
• Immutable (like strings)
• What are they good for (compared to lists)?
  • Simpler ("lightweight")
  • Stuff multiple things into a single container
  • Immutable (e.g., records in a database, safe code)

Dictionaries (Hash Tables)

• Key – Value mapping


• (No order guarantee before Python 3.7)
• Fast!
• Usage examples:
• Database
• Dictionary
• Phone book
Dictionaries
Access to the data in the dictionary:
• Given a key, it is easy to get the value.
• Given a value, you need to go over the entire dictionary to get the key.

Intuition - Yellow Pages:
• Given a name, it's easy to find the right phone number
• Given a phone number, it's difficult to match the name
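
The asymmetry between the two lookups can be sketched as follows. This is a minimal illustration, not part of the original slides: the helper name find_names_by_number is hypothetical, and the phone book is the one used later in the lecture.

```python
# Phone book from the lecture example.
phonebook = {'Eric Cartman': '2020', 'Stan March': '5711', 'Kyle Broflovski': '2781'}

def find_names_by_number(book, number):
    """Hypothetical helper: return every key whose value equals number.

    There is no direct value -> key lookup, so the whole dictionary is scanned.
    """
    return [name for name, value in book.items() if value == number]

fast = phonebook['Stan March']                   # key -> value: direct lookup
slow = find_names_by_number(phonebook, '5711')   # value -> key: linear scan
print(fast, slow)
```

The scan returns a list because several keys may share the same value, while each key maps to exactly one value.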

Dictionaries
Dictionary: a set of key-value pairs.

>>> dict_name = {key1:val1, key2:val2,…}

Keys are unique and must be hashable (in practice, immutable).

Dictionaries
Example: “144” - Map names to phone numbers:

>>> phonebook = {'Eric Cartman': '2020', 'Stan March': '5711', 'Kyle Broflovski': '2781'}
>>> phonebook
{'Eric Cartman': '2020', 'Stan March': '5711', 'Kyle Broflovski': '2781'}

Note: Before Python 3.7, the dictionary pairs were not necessarily presented in the same order as in the dictionary definition, because dictionaries did not guarantee any key order.
Changed in version 3.7: Dictionary order is guaranteed to be insertion order.
However, relying on this order is not good practice in code that may run on older Python versions or be ported to another language.

PythonTutor: https://goo.gl/yNJhKH
Dictionaries
Retrieve dictionary value by key:
>>> phonebook
{'Eric Cartman': '2020', 'Stan March': '5711', 'Kyle Broflovski': '2781'}
>>> phonebook['Eric Cartman']
'2020'

Test if a key already exists in dictionary:


>>> 'Kenny McCormick' in phonebook
False

Add a new person:


>>> phonebook['Kenny McCormick'] = '1234'
>>> phonebook
{'Eric Cartman': '2020', 'Stan March': '5711', 'Kyle Broflovski': '2781', 'Kenny McCormick': '1234'}
Dictionaries
What happens when we add a key that already exists?
>>> phonebook['Kenny McCormick']= '2222'
>>> phonebook
{'Eric Cartman': '2020', 'Stan March': '5711', 'Kyle Broflovski': '2781', 'Kenny McCormick': '2222'}

The existing value is overwritten: Kenny's phone was previously 1234 and is now changed to 2222.

How can we add another Kenny McCormick to the phone book?

Dictionaries
Idea: add the address to the key, so that the new key is a [name, address] list

>>> phonebook = {['Kenny McCormick', 'Southpark']: '2222'}
Traceback (most recent call last):
  File "<pyshell#15>", line 1, in <module>
    phonebook= {['Kenny McCormick', 'Southpark']: '2222'}
TypeError: unhashable type: 'list'

What's the problem?

Keys must be hashable, and a mutable list is not!

Dictionaries
Fix: use tuples as keys!

>>> phonebook= {('Kenny McCormick', 'Southpark'): '2222'}


>>> phonebook
{('Kenny McCormick', 'Southpark'): '2222'}
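
With tuple keys, two people who share a name can coexist in the phone book. The sketch below is illustrative only: the second address and both extra phone numbers are made-up values, not taken from the slides.

```python
phonebook = {}
phonebook[('Kenny McCormick', 'Southpark')] = '2222'
phonebook[('Kenny McCormick', 'North Park')] = '3333'   # hypothetical second Kenny

# Each (name, address) pair is a distinct key.
print(phonebook[('Kenny McCormick', 'Southpark')])
print(len(phonebook))

# A tuple is hashable only if all of its elements are hashable,
# so a tuple that contains a list is still rejected as a key.
try:
    phonebook[('Kenny McCormick', ['Southpark'])] = '4444'
    list_inside_tuple_ok = True
except TypeError:
    list_inside_tuple_ok = False
print(list_inside_tuple_ok)
```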

Dictionary Methods
Function         Description
D = {}           Creates an empty dictionary
D[k] = value     Sets D[k] to value
k in D           Returns True if k is a key in D, False otherwise
D[k]             Returns the value mapped to key k (raises KeyError if k is not in D)
D.get(k[, d])    Returns D[k] if k is in D, otherwise d (default: None)
D.keys()         Returns a view* of D's keys
D.values()       Returns a view* of D's values
D.items()        Returns a view* of D's (key, value) pairs, as 2-tuples
D.pop(k[, d])    Removes the specified key k and returns its value. If k is not found, returns d (if d is not specified, raises KeyError)

• An argument shown in [] is optional: we don't have to specify the second argument of the get and pop methods.
• * keys(), values() and items() provide iterable views, in insertion order from Python 3.7 (no guaranteed order in earlier versions).

What is: v in D.values() ?
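
A short sketch of how these methods behave on present and missing keys; the dictionary contents are arbitrary illustration values. It also answers the question above: v in D.values() tests whether v appears among the values.

```python
D = {'A': 65, 'B': 66}

missing_default = D.get('Z')        # key absent, no default given -> None
missing_custom = D.get('Z', 0)      # key absent, default given -> 0

popped = D.pop('A')                 # removes 'A' and returns 65
popped_default = D.pop('Z', -1)     # key absent, default given -> -1

value_found = 66 in D.values()      # membership test on the values view
key_found = 66 in D                 # 'in' on the dictionary itself tests keys only

try:
    D['Z']                          # plain indexing on a missing key
    raised = False
except KeyError:
    raised = True

print(missing_default, missing_custom, popped, popped_default)
print(value_found, key_found, raised, D)
```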


More Dictionary Methods
Function           Description
D.copy()           Returns a shallow copy of dictionary D (only the dictionary itself is duplicated, not the objects it holds)
D.update(other)    Adds all items of dictionary other to D, overwriting the values of keys that already exist in D
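
What "shallow" means in practice is easiest to see with a mutable value. The following is a small sketch with made-up data: rebinding a key in the copy leaves the original untouched, but mutating a shared object is visible through both dictionaries.

```python
original = {'names': ['Stan', 'Kyle'], 'count': 2}
shallow = original.copy()

shallow['count'] = 3               # rebinding a key affects only the copy
shallow['names'].append('Kenny')   # the list object is shared, so both see the change

print(original)
print(shallow)

# update() overwrites existing keys and adds new ones.
original.update({'count': 10, 'city': 'Southpark'})
print(original)
```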

Use in to test if a certain value is a key in the dictionary*.

*Same as has_key() in Python 2 (removed in Python 3), as you might encounter in old exams…

Example:
>>> d = {1 : 'a'}
>>> 2 in d
False
>>> 1 in d
True
>>> 'a' in d
False

Full list of dictionary related commands:
https://docs.python.org/3/library/stdtypes.html#dict
Dictionary Views
• The dictionary methods keys(), values(), items() return a dynamic iterable view.
• Starting from Python 3.7, dictionaries and their views are iterated in insertion order; do not rely on this in code that must also run on older versions.
• A view can be iterated using a loop, tested for membership using in, and it dynamically changes as the dictionary changes.
>>> d = {'A': 65, 'B': 66, 'C': 67}
>>> d
{'A': 65, 'B': 66, 'C': 67}
>>> d.keys()
dict_keys(['A', 'B', 'C'])
>>> d.values()
dict_values([65, 66, 67])
>>> a = d.items()
>>> a
dict_items([('A', 65), ('B', 66), ('C', 67)])
>>> d['D'] = 68
>>> a # changed automatically as d changed!
dict_items([('A', 65), ('B', 66), ('C', 67), ('D', 68)])
>>> for k in d.keys():
...     print(k, end=' ')
...
A B C D
>>> 'E' in d.keys()
False

• Do not add or remove keys while iterating over the dictionary or one of its views (this raises a RuntimeError).
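
A minimal sketch of what goes wrong and of one common workaround: iterate over a snapshot of the keys (here made with list()) so that the dictionary can be modified safely inside the loop. The data values are illustrative.

```python
d = {'A': 65, 'B': 66, 'C': 67}

# Adding keys while iterating changes the dictionary's size mid-loop.
try:
    for k in d:
        d[k.lower()] = d[k]
    error_raised = False
except RuntimeError:
    error_raised = True
print(error_raised)

# Safe variant: list(d) copies the keys first, so deleting is allowed.
d = {'A': 65, 'B': 66, 'C': 67}
for k in list(d):
    if d[k] % 2 == 0:
        del d[k]
print(d)
```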

Sorting Dictionaries
http://pythoncentral.io/how-to-sort-python-dictionaries-by-key-or-value/

numbers = {'first':1, 'second':2, 'third':3, 'Fourth':4}

Printing keys + values sorted by keys

for key in sorted(numbers):
    print(key + ":", numbers[key])

Fourth: 4
first: 1
second: 2
third: 3

Sorting Dictionaries

numbers = {'first': 1, 'second': 2, 'third': 3, 'Fourth': 4}

Dictionaries are ordered (from Python 3.7)


>>> numbers.keys() # order is guaranteed
dict_keys(['first', 'second', 'third', 'Fourth'])
>>> numbers.values() # order is guaranteed
dict_values([1, 2, 3, 4])

Sorting the keys of a dictionary


# This is the same as calling sorted(numbers.keys())
>>> sorted(numbers)
['Fourth', 'first', 'second', 'third']

Sorting the values of a dictionary


# We have to call numbers.values() here
>>> sorted(numbers.values())
[1, 2, 3, 4]
Sorting Dictionaries

numbers = {'first':1, 'second':2, 'third':3, 'Fourth':4}

Printing the keys sorted by values


# Use the get method as the key function;
# the keys are returned in the order of their sorted values: [1, 2, 3, 4]
>>> sorted(numbers, key = numbers.get)
['first', 'second', 'third', 'Fourth']

Printing the keys sorted by values, in reverse


>>> sorted(numbers, key = numbers.get, reverse = True)
['Fourth', 'third', 'second', 'first']

Sorting Dictionaries

List comprehension is a natural way to generate a list:

>>> S = [x**2 for x in range(10)]
>>> S
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Useful for dictionaries too. Here sorted() sorts the (key, value) pairs alphabetically by key, and reverse=True returns them in reversed order:
>>> numbers = {'first' : 1, 'second' : 2, 'third' : 3, 'fourth' : 4}
>>> [value for (key, value) in sorted(numbers.items(), reverse=True)]
[3, 2, 4, 1]

Sorting Dictionaries
numbers = {'first' : 1, 'second' : 2, 'third' : 3, 'fourth' : 4}
>>> {key: value for (key, value) in sorted(numbers.items(), reverse=True)}
{'third': 3, 'second': 2, 'fourth': 4, 'first': 1}

>>> dict([(key, value) for (key, value) in sorted(numbers.items(), reverse=True)])
{'third': 3, 'second': 2, 'fourth': 4, 'first': 1}

Example: Frequency Counter
▪ Assume you want to learn about the frequencies of English
letters usage in text.

▪ You find a long and representative text and start counting.

▪ Which data structure will you use to keep your findings?

supercalifragilisticexpialidocious

Frequency Counter
text = 'supercalifragilisticexpialidocious'
# count letters – build letters histogram
def get_char_count(text):
    char_count = {}
    for char in text:
        if char not in char_count:
            char_count[char] = 1
        else:
            char_count[char] = char_count[char] + 1

    return char_count

Frequency Counter
(membership-test version, formerly has_key)

PythonTutor: https://goo.gl/u19on5
Frequency Counter
(get(k,0) version)
text = 'supercalifragilisticexpialidocious'
# count letters – build letters histogram
def get_char_count(text):
    char_count = {}
    for char in text:
        count = char_count.get(char, 0)
        count += 1
        char_count[char] = count
    return char_count

# a shorter version of the loop:
for char in text:
    char_count[char] = char_count.get(char, 0) + 1
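
As a sanity check, the get-based counter can be compared against collections.Counter from the standard library, which builds the same histogram. This is a side note rather than part of the lecture's approach; it is shown only to confirm that the hand-written loop counts correctly.

```python
from collections import Counter

def get_char_count(text):
    """Count letters with the get(k, 0) idiom from the slide."""
    char_count = {}
    for char in text:
        char_count[char] = char_count.get(char, 0) + 1
    return char_count

text = 'supercalifragilisticexpialidocious'
manual = get_char_count(text)
builtin = Counter(text)          # standard-library equivalent

print(manual == dict(builtin))   # both histograms should agree
print(manual['i'])               # count of the most frequent letter
```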
Frequency Counter
In what order can we print the counts?
1. We can print the results in alphabetical order: print the count for 'a', then 'b' and so on.
   Sort by the keys

2. We can print the results ordered by the counts: start from the most frequent letter and finish with the least frequent.
   Sort by the values

Print by keys order
def print_by_keys_order(char_count):
    # sorted(iterable) returns a sorted list of the objects in iterable
    # (e.g., lists, strings); for a dictionary, these are its keys
    sorted_chars = sorted(char_count)
    for char in sorted_chars:
        print(char, ':', char_count[char])

text = 'supercalifragilisticexpialidocious'
cc = get_char_count(text)
print_by_keys_order(cc)

The output starts with:
a : 3
c : 3
d : 1
e : 2
f : 1
…

Print by values order
# text = 'supercalifragilisticexpialidocious'

def print_by_values_order(char_count):
    sorted_chars = sorted(char_count, key=char_count.get, reverse=True)
    for char in sorted_chars:
        print(char, ':', char_count[char])

The output starts with (from Python 3.7, letters with equal counts keep their insertion order):
i : 7
s : 3
c : 3
a : 3
l : 3
…

Sorted: https://www.programiz.com/python-programming/methods/built-in/sorted

More about dictionaries…
Hash Functions
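• The type of a dictionary key must be hashable
• Built-in immutable types (numbers, strings, tuples of hashable items) are hashable
• Mutable containers (lists, dictionaries) are not hashable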
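
Whether an object can serve as a key can be checked with the built-in hash() function, which raises TypeError for unhashable objects. The helper name is_hashable below is hypothetical and exists only for this illustration.

```python
def is_hashable(obj):
    """Hypothetical helper: True if obj can be used as a dictionary key."""
    try:
        hash(obj)
        return True
    except TypeError:
        return False

print(is_hashable('Kenny'))          # strings are hashable
print(is_hashable((1, 2)))           # tuples of hashable items are hashable
print(is_hashable([1, 2]))           # lists are mutable and unhashable
print(is_hashable({'a': 1}))         # dictionaries are unhashable too
print(is_hashable((1, [2])))         # a tuple holding a list is unhashable
```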

Hashing
• Hash function h: a mapping from the universe of keys U to the slots of a hash table T[0..m–1]:
  h : U → {0,1,…, m–1}
• With arrays, key k maps to slot A[k].
• With hash tables, key k maps or "hashes" to slot T[h(k)].
• h(k) is the hash value of key k.
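
To make the definition concrete, here is a toy hash function for string keys. It is an assumption made for illustration only (sum of character codes modulo m) and is not how Python's built-in hash() works; the table size m = 8 is arbitrary.

```python
m = 8  # number of slots in the table T (arbitrary choice)

def h(key):
    """Toy hash function: maps a string to a slot index in {0, ..., m-1}."""
    return sum(ord(ch) for ch in key) % m

T = [None] * m                 # the hash table T[0..m-1]
for key in ['Stan', 'Kyle', 'Eric']:
    print(key, '->', h(key))   # slot that each key hashes to

# Different keys can hash to the same slot: 'ab' and 'ba' have the
# same character sum, so this toy function always makes them collide.
print(h('ab') == h('ba'))
```

Because many more strings exist than slots, such collisions are unavoidable for any choice of h; only their frequency can be reduced.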

Hashing
[Figure: a hash function h maps the actual keys K = {k1, …, k5}, a subset of the universe of keys U, to slots 0..m–1 of the table; k2 and k5 collide because h(k2) = h(k5).]

Issues with Hashing
• Multiple keys can hash to the same slot: collisions are possible.
• Design hash functions such that collisions are minimized.
• But avoiding collisions entirely is impossible when there are more possible keys than slots.
• Design collision-resolution techniques.
• If keys are well dispersed in the table, then all operations can be made to have very fast running time.

Method of Resolution
• Chaining:
  • Store all elements that hash to the same slot in a linked list.
  • Store a pointer to the head of the linked list in the hash table slot.
[Figure: table slots 0..m–1, each pointing to a chain: k1 → k4; k5 → k2 → k6; k7 → k3; k8.]
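
Chaining can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the class name ChainedHashTable is hypothetical, Python lists stand in for the linked lists, and the built-in hash() is used as the hash function. It is not how Python's own dict is implemented.

```python
class ChainedHashTable:
    """Toy hash table that resolves collisions by chaining."""

    def __init__(self, m=8):
        # One chain (a list of (key, value) pairs) per slot.
        self.slots = [[] for _ in range(m)]

    def _index(self, key):
        # Map the key to a slot in {0, ..., m-1}.
        return hash(key) % len(self.slots)

    def put(self, key, value):
        chain = self.slots[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:               # key already stored: overwrite its value
                chain[i] = (key, value)
                return
        chain.append((key, value))     # new key: add it to the slot's chain

    def get(self, key, default=None):
        for k, v in self.slots[self._index(key)]:
            if k == key:
                return v
        return default

# With only 2 slots and 5 keys, collisions are guaranteed.
table = ChainedHashTable(m=2)
for i, key in enumerate(['k1', 'k2', 'k3', 'k4', 'k5']):
    table.put(key, i)
table.put('k1', 100)                   # overwrites, does not duplicate
print(table.get('k1'), table.get('k5'), table.get('missing'))
print([len(chain) for chain in table.slots])
```

Lookups stay fast as long as the chains stay short, which is why well-dispersed keys matter.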

Collision Resolution by Chaining
[Figure: keys from K ⊂ U hashed into slots 0..m–1, with collisions h(k1) = h(k4), h(k2) = h(k5) = h(k6) and h(k3) = h(k7); h(k8) has no collision.]

Collision Resolution by Chaining
[Figure: the same table with each slot holding a linked list: k1 → k4; k5 → k2 → k6; k7 → k3; k8.]

Text Categorization /
Document Classification
Application: Data Analysis
• To understand the information we "see", we need to:
• inspect
• clean
• transform
• model

• This process is crucial for decision making

• Descriptive / Predictive
How is it Done?
• Manually 
• Automatically
• Gather document statistics
• Measure how similar it is to documents in
each category
• Today we will collect word-statistics from
several well-known books
Identifying frequent words in text
Steps:
• Find textual data of interest

• Collect word statistics

• Analyze the results


Find Data
• This might be the hardest task for many
applications!
• Project Gutenberg (http://www.gutenberg.org/)
• Alice's Adventures in Wonderland
  (https://raw.githubusercontent.com/GITenberg/Alice-s-Adventures-in-Wonderland_11/master/11.txt)
• The Bible, King James version, Book 1: Genesis
  (https://raw.githubusercontent.com/GITenberg/The-Bible-King-James-Version-Complete_30/master/30.txt)
Reading an online text using Python
import urllib.request

url = urllib.request.urlopen('https://raw.githubusercontent.com/GITenberg/Alice-s-Adventures-in-Wonderland_11/master/11.txt')
alice = url.read().decode()

• URL = uniform resource locator: a web address, which is a string that constitutes a reference to a web resource.
• decode() converts the bytes returned by read() from their encoding scheme (UTF-8 by default) into a Python string.

Try:
urllib.request.urlopen('http://www.google.com').read().decode()
Print Most Popular Words
(High Level)
import urllib.request

def main():
    alice = urllib.request.urlopen('https://raw.githubusercontent….')

    n = 20
    popular_list = get_most_popular(alice, n)

get_most_popular:
Input:
  url - an object representing an open URL address,
  n - an integer

The function reads the text, finds the top n most popular words and returns them as a sorted list with their counts (descending order).
Modular Programming
• Top-down approach: first write what you plan
to do and then implement the details

• Clear for readers

• Easy to debug and test

• Easy to maintain
Get most popular words - Plan
• Split the given text into words
• Count occurrences in text for each word using a dictionary
• For a given n: return the n most popular words.

Word w1 w2 w3 w4 w5 w8 w9
Count 8 16 4 1 23 42 15

What are the 3 most frequent words in the text ?


Sorting words by counts can help: 42 23 16 15 8 4 1

Observation: the 3 most frequent words are the first 3 ones after
sorting by values in reverse (descending) order
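
The observation can be checked directly on the table above. The snippet is a small sketch that stores the seven counts in a dictionary and takes the first three keys after sorting by value in descending order.

```python
# Word counts copied from the table above.
word_count = {'w1': 8, 'w2': 16, 'w3': 4, 'w4': 1, 'w5': 23, 'w8': 42, 'w9': 15}

# Sort the keys by their counts, largest first, then keep the first 3.
by_count = sorted(word_count, key=word_count.get, reverse=True)
top_3 = by_count[:3]

print(top_3)                                  # ['w8', 'w5', 'w2']
print([word_count[w] for w in by_count])      # [42, 23, 16, 15, 8, 4, 1]
```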
Get Most Popular
Build Word-Occurrences Dictionary
def get_most_popular(url, n):
    '''
    url - object representing an internet address (URL)
    n - num of popular words to print
    '''
    text = url.read().decode()
    # str.split() returns a list: it splits the string by whitespace
    words = text.split()

>>> text = 'Hello world, 2023!'
>>> text.split()
['Hello', 'world,', '2023!']
Get Most Popular
Build Word-Occurrences Dictionary
def get_most_popular(url, n):
    '''
    url - object representing an internet address (URL)
    n - num of popular words to print
    '''
    text = url.read().decode()
    words = text.split()

    # word_count holds the counts for each word
    word_count = {}
    for w in words:
        word_count[w] = word_count.get(w, 0) + 1
Get Most Popular
Build Word-Occurrences Dictionary
def get_most_popular(url, n):
    '''
    url - object representing an internet address (URL)
    n - num of popular words to print
    '''
    text = url.read().decode()
    words = text.split()

    word_count = {}
    for w in words:
        word_count[w] = word_count.get(w, 0) + 1

    sorted_words = sorted(word_count, key=word_count.get, reverse=True)

    top_n_words = sorted_words[:n]
    top_n_words_count = [(word, word_count[word]) for word in top_n_words]
    return top_n_words_count
Print Results
alice = urllib.request.urlopen('https://raw.githubusercontent.com/GITenberg/Alice-s-Adventures-in-Wonderland_11/master/11.txt')
book_name = "Alice In Wonderland"
top_n_words_count = get_most_popular(alice, 20)
print('*****', book_name, '*****')
for word, count in top_n_words_count:
    print((word, count))  # print as tuples
How is it Really Done?
• Preprocessing (e.g., words to lower case, remove punctuation signs)
• Enhance statistics
  • Discard stop words (e.g., and, of, a), https://bit.ly/2zfn8SY
  • Stemming / lemmatization (e.g., go & went)
  • Synonyms
  • Entity recognition (e.g., N.Y.C.)
• Similarity measures to existing documents / categories
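
The first two steps (preprocessing and discarding stop words) can be sketched with the standard library alone. This is a rough illustration: the function names and the tiny STOP_WORDS set are assumptions made for the example, and real stop-word lists are much longer.

```python
import string

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {'and', 'of', 'a', 'the'}

def preprocess(text):
    """Lower-case the text, strip punctuation, split, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans('', '', string.punctuation))
    return [w for w in no_punct.split() if w not in STOP_WORDS]

def count_words(words):
    """Build a word -> count dictionary with the get(k, 0) idiom."""
    word_count = {}
    for w in words:
        word_count[w] = word_count.get(w, 0) + 1
    return word_count

words = preprocess('Hello world, 2023! The end of a WORLD.')
counts = count_words(words)
print(words)    # ['hello', 'world', '2023', 'end', 'world']
print(counts)
```

After preprocessing, 'world,' and 'WORLD.' are counted as the same word, which they would not be with a plain text.split().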
