
Language Engineering

Prepared by: Abdelrahman M. Safwat

Section (3) – NLTK Basics


Installing NLTK

!pip install nltk

import nltk
nltk.download("punkt")
nltk.download("wordnet")
nltk.download("averaged_perceptron_tagger")

2
Tokenizing

 NLTK has a module that can tokenize text. You can tokenize text based on sentences or words.
from nltk.tokenize import sent_tokenize

text = """To be, or not to be, that is the question. Whether 'tis nobler in the mind to suffer.
The slings and arrows of outrageous fortune, or to take arms against a sea of troubles.
And by opposing end them. To die—to sleep, no more; and by a sleep to say we end. The
heart-ache and the thousand natural shocks"""

tokenized_text = sent_tokenize(text)

print(tokenized_text)
3
Tokenizing (cont.)

from nltk.tokenize import word_tokenize

text = """To be, or not to be, that is the question. Whether 'tis nobler in the mind to suffer.
The slings and arrows of outrageous fortune, or to take arms against a sea of troubles.
And by opposing end them. To die—to sleep, no more; and by a sleep to say we end. The
heart-ache and the thousand natural shocks"""

tokenized_text = word_tokenize(text)

print(tokenized_text)

4
Stemming

 If we want to get the root form of a word, we use a stemmer.
 For example, stemming the word “connection,” “connecting,” or “connected” would all result in the word “connect.”

5
Stemming (cont.)

from nltk.stem import PorterStemmer

words = ["connection", "connected", "connecting"]

for word in words:
    print(PorterStemmer().stem(word))

6
Lemmatization

 Stemming can sometimes produce the wrong root word, or a word that doesn’t exist.
 In that case, we can use lemmatization, which is similar to looking up the base form of a word in a dictionary.

7
Lemmatization (cont.)

import nltk
from nltk.stem import WordNetLemmatizer

text = "The rabbit was running quickly towards the carrot"

tokenized_text = nltk.word_tokenize(text)

for word in tokenized_text:
    print(WordNetLemmatizer().lemmatize(word))

8
Lemmatization (cont.)

 In the above example, you’ll see that there is no meaningful change after lemmatization.
 That’s because you need to provide the lemmatization function with Parts of Speech tags. A sketch that combines the tagger with the lemmatizer follows the Parts of Speech tagging example below.

9
Lemmatization (cont.)

import nltk
from nltk.stem import WordNetLemmatizer

text = "The rabbit was running quickly towards the carrot"

tokenized_text = nltk.word_tokenize(text)

for word in tokenized_text:
    print(WordNetLemmatizer().lemmatize(word, pos="v"))

10
Parts of Speech Tagging

 Parts of Speech tagging is the process of tagging a word in a text based on its definition and context.
 Example: Tagging “likes” as a verb.
 Note: To tag words in a text, you need to tokenize it first.

11
Parts of Speech Tagging (cont.)

import nltk

text = "The rabbit was running quickly towards the carrot"


tokenized_text = nltk.word_tokenize(text)

print(nltk.pos_tag(tokenized_text))

12
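Putting the two previous ideas together: nltk.pos_tag returns Penn Treebank tags (e.g. "VBD", "NN"), which can be mapped to the POS values the lemmatizer expects. Below is a minimal sketch; the get_wordnet_pos helper is not part of NLTK, just a small mapping function written here for illustration.

import nltk
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

def get_wordnet_pos(tag):
    # Map a Penn Treebank tag (e.g. "VBD") to a WordNet POS constant
    if tag.startswith("J"):
        return wordnet.ADJ
    if tag.startswith("V"):
        return wordnet.VERB
    if tag.startswith("R"):
        return wordnet.ADV
    return wordnet.NOUN  # default to noun

text = "The rabbit was running quickly towards the carrot"
lemmatizer = WordNetLemmatizer()
for word, tag in nltk.pos_tag(nltk.word_tokenize(text)):
    print(lemmatizer.lemmatize(word, pos=get_wordnet_pos(tag)))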
Parts of Speech Tagging (cont.)

Tag    Meaning      Examples
ADJ    Adjective    new, good, high, special, big
ADP    Adposition   on, of, at, with, by, into, under
ADV    Adverb       really, already, still, early, now
CONJ   Conjunction  and, or, but, if, while
DET    Determiner   the, a, some, most, every
NOUN   Noun         year, home, costs, time
NUM    Numeral      twenty-four, fourth, 1991
PRT    Particle     at, on, out, over, per, that, up
PRON   Pronoun      he, their, her, its, my, I, us

13
WordNet

 In lemmatization, we mentioned a process similar to looking up a word in a dictionary. WordNet is what we use to look up words.
 WordNet is similar to a database or a dictionary of links and relationships between words.
14
WordNet (cont.)

WordNet is a large database of English nouns, adjectives, adverbs, and verbs; Python’s Natural Language Toolkit provides an interface to it.

WordNet has been used for a number of purposes in information systems, including:

• Word-sense disambiguation
• Information retrieval
• Automatic text classification
• Automatic text summarization
• Machine translation

15
Example (Synsets and Lemmas)

In WordNet, similar words are grouped into a set known as a Synset.

Every Synset has a name, a part of speech, and a number. The words in a Synset are known as Lemmas.
16
Code

The function wordnet.synsets(word) returns a list containing all the Synsets related to the word passed to it as the argument.

from nltk.corpus import wordnet
synset = wordnet.synsets("room")
print(synset)

17
Output

[Synset('room.n.01'), Synset('room.n.02'), Synset('room.n.03'), Synset('room.n.04'), Synset('board.v.02')]

The first four Synsets have the name 'room' and are nouns, while the last one’s name is 'board' and it is a verb.

This also suggests that the word 'room' has a total of five meanings or contexts.
18
WordNet

from nltk.corpus import wordnet

word = "hungry"
synset = wordnet.synsets(word)[0]

print("Name: " + synset.name())


print("Description: " + synset.definition())
print("Antonym: " + synset.lemmas()[0].antonyms()[0].name())
print("Examples: " + synset.examples()[0])

19
Try it out yourself

 Code:
https://colab.research.google.com/drive/1wLjqqi4aLEY2PWDcpax-_4tCyh946yVQ
 Parts of Speech tagger:
https://parts-of-speech.info/
 WordNet search:
http://wordnetweb.princeton.edu/perl/webwn
20
Task #1

 Read a PDF file using the PyPDF2 library, extract the text from the first page, tokenize it into sentences, and then tag it with the Parts of Speech tagger (see the sketch below).

21
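A minimal sketch for Task #1, assuming a recent PyPDF2 version (which exposes the PdfReader API) and a hypothetical local file named sample.pdf:

import nltk
from PyPDF2 import PdfReader
from nltk.tokenize import sent_tokenize

reader = PdfReader("sample.pdf")                   # hypothetical file name
first_page_text = reader.pages[0].extract_text()   # text of the first page

for sentence in sent_tokenize(first_page_text):    # tokenize into sentences
    tokens = nltk.word_tokenize(sentence)          # the tagger works on word tokens
    print(nltk.pos_tag(tokens))                    # Parts of Speech tags per sentence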
Task #2

 Use stemming to transform a word to its root form (see the sketch below).

22
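A minimal sketch for Task #2, reusing the PorterStemmer from slide 6; the input word is just an assumed example:

from nltk.stem import PorterStemmer

word = "connections"                # example input word (assumed)
print(PorterStemmer().stem(word))   # -> "connect"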
Task #3

 Write code to determine the stems of the words in an input sentence (see the sketch below).

23
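A minimal sketch for Task #3, combining word tokenization with the PorterStemmer; the input sentence is just an assumed example:

import nltk
from nltk.stem import PorterStemmer

sentence = "The connected devices were connecting to the network"   # example input (assumed)
stemmer = PorterStemmer()
for word in nltk.word_tokenize(sentence):
    print(word, "->", stemmer.stem(word))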
Thank you for your attention!

24
References

 https://medium.com/@gianpaul.r/tokenization-and-parts-of-speech-pos-tagging-in-pythons-nltk-library-2d30f70af13b
 https://medium.com/@gaurav5430/using-nltk-for-lemmatizing-sentences-c1bfff963258
 https://www.datacamp.com/community/tutorials/stemming-lemmatization-python
 https://www.nltk.org/book/ch05.html#tab-universal-tagset

25
