Mapping Words to Properties Using Python Dictionaries

The document discusses the use of Python dictionaries for mapping words to part-of-speech tags, highlighting their efficiency in storing and retrieving associations. It explains how to create, update, and utilize dictionaries, including the use of default dictionaries for handling non-existent keys. Additionally, it covers advanced topics such as inverting dictionaries for reverse lookups and provides examples of practical applications in language processing tasks.

S. Karthik, Assistant Professor,
Department of Information Technology,
Sri Ramakrishna College of Arts & Science, Coimbatore

➢ A tagged word of the form (word, tag) is an association between a word and a part-of-speech tag.

➢ Once we start doing part-of-speech tagging, we will be creating programs that assign to each word the tag that is most likely in a given context.

➢ We can think of this process as mapping from words to tags.

➢ The most natural way to store mappings in Python uses the so-called dictionary data type.

➢ Indexing Lists Versus Dictionaries

➢ A text is treated in Python as a list of words.

➢ An important property of lists is that we can “look up” a particular item by giving its index, e.g., text1[100].

➢ We specify a number and get back a word.

➢ We can think of a list as a simple kind of table, as shown in the figure below.


➢ Contrast this with frequency distributions, where we specify a word and get back a number, e.g., fdist['monstrous'], which tells us the number of times a given word has occurred in a text.

➢ Lookup using words is familiar to anyone who has used a dictionary. Some more examples are shown in the figure below.

➢ In the case of a phonebook, we look up an entry using a name and get back a number.

➢ When we type a domain name in a web browser, the computer looks this up to get back an IP address.

➢ A word frequency table allows us to look up a word and find its frequency in a text collection.

➢ In all these cases, we are mapping from names to numbers, rather than the other way around as with a list.

➢ In general, we would like to be able to map between arbitrary types of information.
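
The contrast between the two kinds of lookup can be sketched as follows. This is a minimal illustration only: the token list, phone number, IP address, and counts are made-up values, not data from the slides.

```python
# A list maps integer positions to items: number in, word out.
text = ['Call', 'me', 'Ishmael', '.']      # hypothetical four-token text
word = text[2]

# A dictionary maps arbitrary keys (here, strings) to values: name in, number out.
phonebook = {'Alice': '555-0100'}          # made-up phone number
dns_table = {'example.com': '192.0.2.1'}   # made-up address from a documentation range
frequency = {'whale': 3, 'sea': 2}         # made-up word counts

number = phonebook['Alice']
address = dns_table['example.com']
count = frequency['whale']

print(word, number, address, count)
```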

➢ The table below lists a variety of linguistic objects, along with what they map.

➢ Most often, we are mapping from a “word” to some structured object.

➢ For example, a document index maps from a word to a list of pages.


➢ Dictionaries in Python

➢ Python provides a dictionary data type that can be used for mapping between arbitrary types.

➢ It is like a conventional dictionary, in that it gives you an efficient way to look things up.

➢ However, Python dictionaries have a much wider range of uses.

➢ To illustrate, we define pos to be an empty dictionary and then add four entries to it, specifying the part-of-speech of some words.

➢ We add entries to a dictionary using the familiar square bracket notation.


➢ Example.
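
A minimal sketch of such an example follows. The slides name the entries 'colorless' (tagged 'ADJ') and 'sleep' (tagged 'V'); the other two words and their tags are assumptions chosen to make up the four entries.

```python
pos = {}                      # start with an empty dictionary

# Add entries with square-bracket assignment: pos[key] = value
pos['colorless'] = 'ADJ'
pos['ideas'] = 'N'            # assumed entry, for illustration
pos['sleep'] = 'V'
pos['furiously'] = 'ADV'      # assumed entry, for illustration

print(pos)                    # shows the key-value pairs
```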

➢ For example, pos['colorless'] = 'ADJ' says that the part-of-speech of colorless is adjective, or more specifically, that the key 'colorless' is assigned the value 'ADJ' in dictionary pos.

➢ When we inspect the value of pos we see a set of key-value pairs.

➢ Once we have populated the dictionary in this way, we can employ the keys to retrieve values.

➢ We might accidentally use a key that hasn’t been assigned a value; this raises a KeyError.
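
The lookup behaviour described above can be sketched like this. The contents of pos and the word 'green' are illustrative assumptions; the `in` test and `get()` are standard ways to avoid the exception.

```python
pos = {'colorless': 'ADJ', 'ideas': 'N', 'sleep': 'V', 'furiously': 'ADV'}

tag = pos['ideas']                 # retrieve a value by its key

try:
    pos['green']                   # 'green' was never assigned a value
except KeyError as err:
    missing = err.args[0]          # the key that caused the error

has_green = 'green' in pos                 # membership test, no exception
fallback = pos.get('green', 'UNKNOWN')     # get() returns a fallback instead

print(tag, missing, has_green, fallback)
```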


➢ With lists and strings, we can use len() to work out which integers will be legal indexes. How do we work out the legal keys for a dictionary?

➢ If the dictionary is not too big, we can simply inspect its contents by evaluating the variable pos; this gives us the key-value pairs.

➢ In versions of Python before 3.7, the pairs may not appear in the order they were entered, because dictionaries are mappings, not sequences, and the keys were not inherently ordered. Since Python 3.7, dictionaries preserve insertion order, but entries are still accessed by key, not by position.

➢ Alternatively, to find just the keys, we can convert the dictionary to a list with list(), or use it wherever an iterable is expected, for example as the argument of sorted() or in a for loop.
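
A short sketch of these three ways of getting at the keys, using a small pos dictionary whose entries are assumed for illustration:

```python
pos = {'colorless': 'ADJ', 'ideas': 'N', 'sleep': 'V', 'furiously': 'ADV'}

keys_as_list = list(pos)        # converting a dictionary to a list yields its keys
keys_sorted = sorted(pos)       # sorted() iterates over the keys

# A for loop (here inside a list comprehension) also iterates over the keys.
ends_in_s = [word for word in pos if word.endswith('s')]

print(keys_as_list)
print(keys_sorted)
print(ends_in_s)
```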

➢ As well as iterating over all the keys with a for loop, we can use the loop to print each key with its value, just as we did when printing lists.

➢ Finally, the dictionary methods keys(), values(), and items() give us access to the keys, values, and key-value pairs. In Python 3 these return view objects rather than lists; wrap them in list() when a list is needed.

➢ We can even sort tuples, which orders them according to their first element (and, if the first elements are equal, their second elements).
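
The loop and the three methods can be sketched together. The dictionary contents are assumed for illustration.

```python
pos = {'colorless': 'ADJ', 'ideas': 'N', 'sleep': 'V', 'furiously': 'ADV'}

# Print each word with its tag, in alphabetical order of the words.
for word in sorted(pos):
    print(word + ':', pos[word])

keys = list(pos.keys())         # the keys
values = list(pos.values())     # the values
pairs = list(pos.items())       # (key, value) tuples

# Sorting the (key, value) tuples orders them by key, the first element.
sorted_pairs = sorted(pos.items())
for word, tag in sorted_pairs:
    print(word, tag)
```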

➢ We want to be sure that when we look something up in a dictionary, we get only one value for each key.

➢ Now suppose we try to use a dictionary to store the fact that the word sleep can be used as both a verb and a noun.

➢ Initially, pos['sleep'] is given the value 'V'.

➢ But this is immediately overwritten with the new value, 'N'.

➢ In other words, there can be only one entry in the dictionary for 'sleep'.

➢ However, there is a way of storing multiple values in that entry: we use a list value, e.g., pos['sleep'] = ['N', 'V'].
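
The overwriting behaviour and the list-valued workaround can be sketched as:

```python
pos = {}

pos['sleep'] = 'V'
pos['sleep'] = 'N'                 # replaces 'V'; a key holds only one value
after_overwrite = pos['sleep']

pos['sleep'] = ['N', 'V']          # one entry whose value is a list of tags
print(after_overwrite, pos)
```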

➢ Defining Dictionaries

➢ We can use the same key-value pair format to create a dictionary.

➢ There are a couple of ways to do this, and we will normally use the first.

➢ Dictionary keys must be immutable (hashable) types, such as strings and tuples. If we try to define a dictionary using a mutable key, such as a list, we get a TypeError.
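
As a hedged illustration, two common ways to define a dictionary in one step are a brace literal and the dict() constructor with keyword arguments; the slides do not show which two ways they mean, so treat this pairing as an assumption. The entries are made up.

```python
# Way 1: a literal in braces, written as key: value pairs.
pos = {'colorless': 'ADJ', 'ideas': 'N', 'sleep': 'V', 'furiously': 'ADV'}

# Way 2: the dict() constructor with keyword arguments (keys must be valid identifiers).
pos_kw = dict(colorless='ADJ', ideas='N', sleep='V', furiously='ADV')
same = (pos == pos_kw)

# A list is mutable and therefore unhashable, so it cannot be a key.
try:
    bad = {['ideas', 'blogs', 'adventures']: 'N'}
except TypeError as err:
    message = str(err)

# A tuple of strings is immutable, so it works.
ok = {('ideas', 'blogs', 'adventures'): 'N'}
print(same, message)
```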

➢ Default Dictionaries

➢ If we try to access a key that is not in a dictionary, we get a KeyError.

➢ However, it’s often useful if a dictionary can automatically create an entry for this new key and give it a default value, such as zero or the empty list.

➢ Since Python 2.5, a special kind of dictionary called a defaultdict has been available in the collections module.

➢ In order to use it, we have to supply a callable that creates the default value, e.g., int, float, str, list, dict, tuple.

➢ Example.
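
A minimal sketch of such an example, with made-up words and counts: int supplies a default of zero and list supplies a default of the empty list.

```python
from collections import defaultdict

frequency = defaultdict(int)          # missing keys default to int(), i.e. 0
frequency['colorless'] = 4
unseen_count = frequency['ideas']     # no KeyError; 'ideas' is created with value 0

pos = defaultdict(list)               # missing keys default to list(), i.e. []
pos['sleep'] = ['NOUN', 'VERB']
unseen_tags = pos['ideas']            # created with an empty list

print(unseen_count, unseen_tags)
```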

➢ The preceding examples specified the default value of a dictionary entry to be the default value of a particular data type.

➢ However, we can specify any default value we like, simply by providing the name of a function that can be called with no arguments to create the required value.

➢ Let’s return to our part-of-speech example, and create a dictionary whose default value for any entry is 'N'.

➢ When we access a non-existent entry, it is automatically added to the dictionary.
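
One way to sketch this is with a lambda expression as the zero-argument function. The words used here are illustrative assumptions.

```python
from collections import defaultdict

pos = defaultdict(lambda: 'N')     # any zero-argument callable can supply the default
pos['colorless'] = 'ADJ'

blog_tag = pos['blog']             # 'blog' is not a key, so it is added with value 'N'
entries = sorted(pos.items())      # the accessed key now appears among the entries

print(blog_tag, entries)
```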

➢ Let’s see how default dictionaries could be used in a more substantial language processing task.

➢ Many language processing tasks, including tagging, struggle to correctly process the hapaxes of a text (the words that occur only once).

➢ They can perform better with a fixed vocabulary and a guarantee that no new words will appear.

➢ We can preprocess a text to replace low-frequency words with a special “out of vocabulary” token, UNK, with the help of a default dictionary.

➢ We need to create a default dictionary that maps each word to its replacement.

➢ The most frequent n words will be mapped to themselves.

➢ Everything else will be mapped to UNK.


➢ Example.
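
The preprocessing step can be sketched with the standard library alone. This is an assumption-laden stand-in: collections.Counter is used here in place of whatever frequency distribution the original listing used, and the sentence and the value of n are made up.

```python
from collections import Counter, defaultdict

tokens = 'the cat sat on the mat and the dog sat on the log'.split()
n = 3                                        # hypothetical vocabulary size

vocab = Counter(tokens)                      # word -> frequency
mapping = defaultdict(lambda: 'UNK')         # unknown words map to 'UNK'
for word, _count in vocab.most_common(n):    # the n most frequent words...
    mapping[word] = word                     # ...map to themselves

replaced = [mapping[word] for word in tokens]
print(replaced)
```

After this step the text contains at most n + 1 distinct tokens, which is the fixed-vocabulary guarantee described above.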

➢ Incrementally Updating a Dictionary

➢ We can employ dictionaries to count occurrences, emulating the method for tallying words.

➢ We begin by initializing an empty defaultdict, then process each part-of-speech tag in the text.

➢ If the tag hasn’t been seen before, it will have a zero count by default.

➢ Each time we encounter a tag, we increment its count using the += operator.

➢ Example.
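
A small sketch of the counting idiom follows. The tagged sentence is invented for illustration (the original presumably drew on a tagged corpus), and the tag names are assumptions.

```python
from collections import defaultdict
from operator import itemgetter

# Hypothetical (word, tag) pairs standing in for a tagged corpus.
tagged_words = [('the', 'DET'), ('dog', 'NOUN'), ('saw', 'VERB'),
                ('the', 'DET'), ('big', 'ADJ'), ('cat', 'NOUN'),
                ('on', 'ADP'), ('the', 'DET'), ('mat', 'NOUN')]

counts = defaultdict(int)              # unseen tags start at zero
for word, tag in tagged_words:
    counts[tag] += 1                   # increment the tag's count

# Sort the (tag, count) pairs by count, largest first.
ranked = sorted(counts.items(), key=itemgetter(1), reverse=True)

second_of_pair = itemgetter(1)(('NOUN', 3))   # itemgetter(1) picks the element at index 1
print(ranked)
```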

➢ The listing in the previous example illustrates an important idiom for sorting a dictionary by its values, to show entries in decreasing order of frequency.

➢ The first parameter of sorted() is the items to sort, which is a list of tuples consisting of a POS tag and a frequency.

➢ The second parameter specifies the sort key using the function itemgetter() from the operator module.

➢ In general, itemgetter(n) returns a function that can be called on some other sequence object to obtain its element at index n (counting from zero).

➢ The last parameter of sorted() specifies that the items should be returned in reverse order, i.e., decreasing values of frequency.

➢ There’s a second useful programming idiom at the beginning of the previous example, where we initialize a defaultdict and then use a for loop to update its values. Here’s a schematic version.
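
The schematic pattern can be written as runnable code with placeholder choices. The sequence, the key (item length), and the update (a count) are all assumptions picked only to show the shape of the idiom.

```python
from collections import defaultdict

sequence = ['a', 'bb', 'cc', 'ddd']    # hypothetical items to process

table = defaultdict(int)               # 1. choose a function that creates the default value
for item in sequence:                  # 2. loop over the items
    key = len(item)                    # 3. derive a key from the item
    table[key] += 1                    # 4. update the entry with information about the item

print(dict(table))
```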

➢ Here’s another instance of this pattern, where we index words according to their last two letters.
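
A sketch of this indexing, using a short made-up word list in place of a real corpus vocabulary:

```python
from collections import defaultdict

words = ['money', 'honey', 'abbey', 'happy', 'sunny', 'funny']   # hypothetical word list

last_letters = defaultdict(list)
for word in words:
    key = word[-2:]                    # the last two letters
    last_letters[key].append(word)     # accumulate words under that key

print(last_letters['ey'])
print(last_letters['ny'])
```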

➢ The following example uses the same pattern to create an anagram dictionary.

➢ Since accumulating words like this is such a common task, NLTK provides a more convenient way of creating a defaultdict(list), in the form of nltk.Index().
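
An anagram dictionary can be sketched with the standard library only; the word list is made up, and nltk.Index() is deliberately not used here so that the sketch has no external dependency. Words that are anagrams of each other share the same sorted letters, so the sorted letters serve as the key.

```python
from collections import defaultdict

words = ['enlist', 'listen', 'silent', 'tinsel', 'inlets', 'google']   # hypothetical list

anagrams = defaultdict(list)
for word in words:
    key = ''.join(sorted(word))        # e.g. 'listen' -> 'eilnst'
    anagrams[key].append(word)

print(anagrams['eilnst'])
```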

➢ We can use default dictionaries with complex keys and values.

➢ Let’s study the range of possible tags for a word, given the word itself and the tag of the previous word.

➢ We will see how this information can be used by a POS tagger.


➢ The example uses a dictionary whose default value for an entry is a dictionary (whose default value is int(), i.e., zero).

➢ Notice how we iterated over the bigrams of the tagged corpus, processing a pair of word-tag pairs for each iteration.

➢ Each time through the loop we updated our pos dictionary’s entry for (t1, w2), a tag and its following word.

➢ When we look up an item in pos we must specify a compound key, and we get back a dictionary object.

➢ A POS tagger could use such information to decide that the word right, when preceded by a determiner, should be tagged as ADJ.
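
The nested default dictionary can be sketched on a tiny invented tagged text. Adjacent pairs are formed with zip() here as a stand-in for a bigram helper; the words and tag names are assumptions.

```python
from collections import defaultdict

# Hypothetical tagged text standing in for a tagged corpus.
tagged = [('the', 'DET'), ('right', 'ADJ'), ('answer', 'NOUN'),
          ('is', 'VERB'), ('right', 'ADV'), ('there', 'ADV'),
          ('the', 'DET'), ('right', 'ADJ'), ('side', 'NOUN'),
          ('turn', 'VERB'), ('right', 'ADV')]

# Outer default: a new inner dictionary; inner default: int(), i.e. zero.
pos = defaultdict(lambda: defaultdict(int))

# Each bigram is a pair of (word, tag) pairs.
for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
    pos[(t1, w2)][t2] += 1             # key: previous tag and current word

after_det = dict(pos[('DET', 'right')])     # tags seen for 'right' after a determiner
after_verb = dict(pos[('VERB', 'right')])   # tags seen for 'right' after a verb
print(after_det, after_verb)
```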

➢ Inverting a Dictionary

➢ Dictionaries support efficient lookup, so long as you want to get the value for any key.

➢ If d is a dictionary and k is a key, we type d[k] and immediately obtain the value.

➢ Finding a key given a value is slower and more cumbersome.
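
The asymmetry can be seen in a short sketch with made-up entries: the forward lookup is a single indexing operation, while the reverse lookup has to examine every key-value pair.

```python
pos = {'colorless': 'ADJ', 'ideas': 'N', 'sleep': 'V', 'furiously': 'ADV'}

tag = pos['ideas']                                        # key -> value: direct lookup
words_tagged_n = [w for w, t in pos.items() if t == 'N']  # value -> keys: full scan

print(tag, words_tagged_n)
```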


➢ If we expect to do this kind of “reverse lookup” often, it helps to construct a dictionary that maps values to keys.

➢ In the case that no two keys have the same value, this is an easy thing to do.

➢ We just get all the key-value pairs in the dictionary, and create a new dictionary of value-key pairs.

➢ The example below also illustrates another way of initializing a dictionary pos with key-value pairs.
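
A sketch of inverting a dictionary whose values are all distinct. The entries are made up, and the dictionary comprehension is one of several equivalent ways to build the value-key pairs.

```python
# Initialize pos directly from key-value pairs.
pos = {'colorless': 'ADJ', 'ideas': 'N', 'sleep': 'V', 'furiously': 'ADV'}

# Swap each (key, value) pair to get a value -> key dictionary.
pos2 = {value: key for key, value in pos.items()}

print(pos2['N'])
```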

➢ Let’s first make our part-of-speech dictionary a bit more realistic and add some more words to pos using the dictionary update() method, to create the situation where multiple keys have the same value.

➢ Then the technique just shown for reverse lookup will no longer work.

➢ Instead, we have to use append() to accumulate the words for each part-of-speech, as follows.
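
A sketch of this accumulation, with assumed starting entries and assumed extra words added through update():

```python
from collections import defaultdict

pos = {'colorless': 'ADJ', 'ideas': 'N', 'sleep': 'V', 'furiously': 'ADV'}

# Add more words so that several keys share the same value.
pos.update({'cats': 'N', 'scratch': 'V', 'peacefully': 'ADV', 'old': 'ADJ'})

# Invert: each part-of-speech maps to the list of words that carry it.
pos2 = defaultdict(list)
for word, tag in pos.items():
    pos2[tag].append(word)

print(sorted(pos2['ADV']))
```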

➢ Now we have inverted the pos dictionary, and can look up any part-of-speech and find all words having that part-of-speech.

➢ We can do the same thing even more simply using NLTK’s support for indexing, as follows.

➢ A summary of Python’s dictionary methods is given in the table that follows.

