Language Engineering - Section
import nltk
# Download the NLTK resources used in these slides
nltk.download("punkt")                        # sentence/word tokenizer models
nltk.download("wordnet")                      # WordNet lexical database
nltk.download("averaged_perceptron_tagger")   # part-of-speech tagger model
Tokenizing
NLTK provides a tokenization module that can split text into sentences or into words.
from nltk.tokenize import sent_tokenize
text = """To be, or not to be, that is the question. Whether 'tis nobler in the mind to suffer. The slings and arrows
of outrageous fortune, or to take arms against a sea of troubles. And by opposing end them. To die—to sleep, no
more; and by a sleep to say we end. The heart-ache and the thousand natural shocks"""
tokenized_text = sent_tokenize(text)   # split the text into a list of sentences
print(tokenized_text)
Tokenizing
text = """To be, or not to be, that is the question. Whether 'tis nobler in the mind to suffer. The slings and arrows
of outrageous fortune, or to take arms against a sea of troubles. And by opposing end them. To die—to sleep, no
more; and by a sleep to say we end. The heart-ache and the thousand natural shocks"""
from nltk.tokenize import word_tokenize
tokenized_text = word_tokenize(text)   # split the text into a list of word tokens
print(tokenized_text)
Stemming
Stemming reduces a word to a base form (its stem) by stripping affixes with simple rules, so the result is not always a dictionary word.
Stemming
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["connection", "connected", "connecting"]
for word in words:
    print(stemmer.stem(word))   # all three stem to "connect"
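A small assumed example (not on the original slides) showing that the rule-based stemmer can produce stems that are not dictionary words, which motivates the lemmatization approach on the following slides:

from nltk.stem import PorterStemmer, WordNetLemmatizer

word = "studies"
print(PorterStemmer().stem(word))            # "studi" - a truncated, non-dictionary stem
print(WordNetLemmatizer().lemmatize(word))   # "study" - an actual dictionary form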
Lemmatization
Lemmatization reduces a word to its dictionary form (its lemma) using a vocabulary and morphological analysis; NLTK's lemmatizer is based on WordNet.
Lemmatization
import nltk
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
text = "The rabbit was running quickly towards the carrot"
tokenized_text = nltk.word_tokenize(text)
for word in tokenized_text:
    print(lemmatizer.lemmatize(word))   # by default every word is treated as a noun
Lemmatization
With the default settings the lemmatizer treats every word as a noun, so "was" and "running" come back unchanged; passing the correct part of speech gives better results.
Lemmatization
import nltk
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
text = "The rabbit was running quickly towards the carrot"
tokenized_text = nltk.word_tokenize(text)
for word in tokenized_text:
    print(lemmatizer.lemmatize(word, pos="v"))   # treat each word as a verb: "was" -> "be", "running" -> "run"
Parts of Speech Tagging
Part-of-speech tagging labels each token with its grammatical category (noun, verb, adjective, and so on).
Parts of Speech Tagging
import nltk

text = "The rabbit was running quickly towards the carrot"
tokenized_text = nltk.word_tokenize(text)
print(nltk.pos_tag(tokenized_text))   # list of (token, Penn Treebank tag) pairs
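A hedged variation (not on the original slides): nltk.pos_tag can also map its output to the coarser universal tagset referenced in the NLTK book link at the end of these slides.

import nltk

# May first require: nltk.download("universal_tagset")
text = "The rabbit was running quickly towards the carrot"
tokenized_text = nltk.word_tokenize(text)
print(nltk.pos_tag(tokenized_text, tagset="universal"))   # coarse tags such as NOUN, VERB, ADJ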
Parts of Speech Tagging
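One common pattern that combines the two previous topics (a sketch, not from the original slides; penn_to_wordnet is a hypothetical helper name) is to let the POS tags choose the lemmatizer's pos argument.

import nltk
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

def penn_to_wordnet(tag):
    # Map a Penn Treebank tag to the corresponding WordNet POS (default: noun)
    if tag.startswith("J"):
        return wordnet.ADJ
    if tag.startswith("V"):
        return wordnet.VERB
    if tag.startswith("R"):
        return wordnet.ADV
    return wordnet.NOUN

lemmatizer = WordNetLemmatizer()
text = "The rabbit was running quickly towards the carrot"
for word, tag in nltk.pos_tag(nltk.word_tokenize(text)):
    print(lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag)))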
WordNet
• Word-sense disambiguation (see the sketch after this list)
• Information retrieval
• Automatic text classification
• Automatic text summarization
• Machine translation
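As a small illustration of the first application above, NLTK ships a simplified Lesk word-sense disambiguator; the sentence and the ambiguous word "bank" below are assumed examples, not from the original slides.

import nltk
from nltk.wsd import lesk

# Pick a WordNet sense for "bank" based on the surrounding words
sentence = "I went to the bank to deposit my money"
tokens = nltk.word_tokenize(sentence)
sense = lesk(tokens, "bank", pos="n")
if sense is not None:
    print(sense.name(), "-", sense.definition())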
Example (Synsets and Lemmas)
Code
The function wordnet.synsets('word') returns all synsets (sets of synonyms) in which the given word appears.
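A sketch of that call for the word 'room', which is the word the output on the next slide describes:

from nltk.corpus import wordnet

# List every synset that contains the word "room", with its name and part of speech
for synset in wordnet.synsets("room"):
    print(synset.name(), synset.pos())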
Output
Four of the returned synsets have the name 'room' and are nouns, while the last one is named 'board' and is a verb. This also suggests that the word 'room' has a total of five meanings, or contexts.
WordNet
from nltk.corpus import wordnet

word = "hungry"
synset = wordnet.synsets(word)[0]   # take the first synset of the word
print("Name: " + synset.name())
print("Description: " + synset.definition())
print("Antonym: " + synset.lemmas()[0].antonyms()[0].name())
print("Examples: " + synset.examples()[0])
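A possible follow-up (not from the original slides): each synset groups several lemmas, so collecting the lemma names of a word's synsets gives a rough list of synonyms.

from nltk.corpus import wordnet

# Collect the lemma names (rough synonyms) from every synset of a word
word = "hungry"
synonyms = set()
for synset in wordnet.synsets(word):
    for lemma in synset.lemmas():
        synonyms.add(lemma.name())
print(synonyms)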
Try it out yourself
Code:
https://colab.research.google.com/drive/1wLjqqi4aLEY2PWDcpax-_4tCyh946yVQ
Parts of Speech tagger:
https://parts-of-speech.info/
WordNet search:
http://wordnetweb.princeton.edu/perl/webwn
Task #1
Read a PDF file using the PyPDF2 library, extract the text from the first page, tokenize it into sentences and then tag them with the parts-of-speech tagger.
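A minimal sketch of one way to approach Task #1 (not an official solution), assuming PyPDF2 3.x and a hypothetical local file named sample.pdf:

import nltk
from nltk.tokenize import sent_tokenize
from PyPDF2 import PdfReader

# Read the PDF and extract the text of the first page
reader = PdfReader("sample.pdf")   # hypothetical file name
first_page_text = reader.pages[0].extract_text()

# Split the page text into sentences, then POS-tag each sentence
for sentence in sent_tokenize(first_page_text):
    tokens = nltk.word_tokenize(sentence)
    print(nltk.pos_tag(tokens))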
Task #2
Thank you for your attention!
References
https://medium.com/@gianpaul.r/tokenization-and-parts-of-speech-pos-tagging-in-pythons-nltk-library-2d30f70af13b
https://medium.com/@gaurav5430/using-nltk-for-lemmatizing-sentences-c1bfff963258
https://www.datacamp.com/community/tutorials/stemming-lemmatization-python
https://www.nltk.org/book/ch05.html#tab-universal-tagset