Natural language processing-Section (3)
import nltk
nltk.download("punkt")
nltk.download("wordnet")
nltk.download("averaged_perceptron_tagger")
Tokenizing
text = """To be, or not to be, that is the question. Whether 'tis nobler in the mind to suffer.
The slings and arrows of outrageous fortune, or to take arms against a sea of troubles.
And by opposing end them. To die—to sleep, no more; and by a sleep to say we end. The
heart-ache and the thousand natural shocks"""
from nltk.tokenize import sent_tokenize
tokenized_text = sent_tokenize(text)  # split the text into sentences
print(tokenized_text)
Tokenizing (cont.)
text = """To be, or not to be, that is the question. Whether 'tis nobler in the mind to suffer.
The slings and arrows of outrageous fortune, or to take arms against a sea of troubles.
And by opposing end them. To die—to sleep, no more; and by a sleep to say we end. The
heart-ache and the thousand natural shocks"""
from nltk.tokenize import word_tokenize
tokenized_text = word_tokenize(text)  # split the text into words and punctuation
print(tokenized_text)
Stemming
Stemming (cont.)
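Stemming strips suffixes by rule, so the result is not always a dictionary word. A minimal sketch using NLTK's PorterStemmer follows; the word list is an illustrative assumption and is not taken from the slides.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Illustrative words (assumed for this sketch, not from the original slides)
words = ["running", "studies", "connection"]

# Map each word to its stem; stems are not guaranteed to be real words
stems = {word: stemmer.stem(word) for word in words}

for word, stem in stems.items():
    print(f"{word} -> {stem}")
```

The Porter stemmer needs no downloaded corpus, which makes it a quick first thing to try.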
Lemmatization
Lemmatization (cont.)
Lemmatization (cont.)
Lemmatization (cont.)
Parts of Speech Tagging
Parts of Speech Tagging (cont.)
import nltk
print(nltk.pos_tag(tokenized_text))
Parts of Speech Tagging (cont.)
WordNet (cont.)
• Word-sense disambiguation
• Information retrieval
• Automatic text classification
• Automatic text summarization
• Machine translation
Example (Synsets and Lemmas)
Code
Output
Of the synsets returned, the first four have the name 'room' and are nouns, while the last
one's name is 'board' and is a verb.
from nltk.corpus import wordnet
word = "hungry"
synset = wordnet.synsets(word)[0]  # first synset returned for the word
Try it out yourself
Code:
https://colab.research.google.com/drive/1wLjqqi4aLEY2PWDcpax-_4tCyh946yVQ
Parts of Speech tagger:
https://parts-of-speech.info/
WordNet search:
http://wordnetweb.princeton.edu/perl/webwn
Task #1
Task #2
Task #3
Thank you for your attention!
References
https://medium.com/@gianpaul.r/tokenization-and-parts-of-speech-pos-tagging-in-pythons-nltk-library-2d30f70af13b
https://medium.com/@gaurav5430/using-nltk-for-lemmatizing-sentences-c1bfff963258
https://www.datacamp.com/community/tutorials/stemming-lemmatization-python
https://www.nltk.org/book/ch05.html#tab-universal-tagset