
Natural Language Processing

Practical 1 BS(SE) 5th Semester

Dr. Nayyar Iqbal, Lecturer Department of Computer Science


Task
• Open Text File in Python
• Check the Contents of File
• Detect and Erase URLs
• Check if URLs are removed or not
• Save it as a New File
Open Text File in Python
• Create a “Nayyar.txt” file using Notepad
• Save the “Nayyar.txt” file on hard disk drive
• Open text file using open() function
• file = open("C:\\Users\\HP\\Desktop\\Practical NLP\\Nayyar.txt", "r")

• First parameter is the path of the file (drive, folder, sub-folder, filename and extension)
• Second parameter is the mode of opening the file
• "r" is read-only
Check the Contents of File
• Save the contents of the file in a variable by reading the file

• read() reads the contents of the file from start to end and stores them in a variable (e.g., text)

• Print the contents of the file

• print() displays the contents line by line
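A minimal sketch of these two steps, continuing from the open() call above:

text = file.read()   # read() returns the whole file as one string
print(text)          # the newlines in the string make the output appear line by line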


Detect and Erase URLs
• We need the regular expression library: re
• We need to import it using: import re

• Use the re.sub() function to replace HTTP/HTTPS URLs with a blank space and store the result in a separate variable

• First parameter is the regular expression for detecting URLs with http/https
• Second parameter is the string to replace URLs with (a blank space)
• Third parameter is the contents of the text file
• Fourth parameter is the flags argument
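A minimal sketch of this step, assuming the file contents are already stored in text; the exact regular expression is an assumption, since the slide shows it only as a screenshot:

import re

# Replace anything starting with http:// or https:// (up to the next whitespace) with a blank space
clean_text = re.sub(r"https?://\S+", " ", text, flags=re.MULTILINE)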
Check if URLs are removed or not
• Use print() to display the contents after removing URLs
Save it as a New File
• Open a new file (choose a name for the file you want to save)

• The open() method opens the new file in "w+" mode (write and read)

• Write the contents to the new file

• Close the newly created file to save it on disk
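A minimal sketch, assuming the cleaned text is stored in clean_text; the output filename is an assumption:

new_file = open("C:\\Users\\HP\\Desktop\\Practical NLP\\Nayyar_NoURLs.txt", "w+")  # hypothetical output name
new_file.write(clean_text)
new_file.close()   # closing flushes the contents to disk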


Complete Code
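The original slide shows the full program as a screenshot; a sketch that follows the steps above (file names and the regular expression are assumptions):

import re

# Step 1: open and read the text file
file = open("C:\\Users\\HP\\Desktop\\Practical NLP\\Nayyar.txt", "r")
text = file.read()
print(text)

# Step 2: detect and erase http/https URLs
clean_text = re.sub(r"https?://\S+", " ", text, flags=re.MULTILINE)
print(clean_text)

# Step 3: save the result as a new file
new_file = open("C:\\Users\\HP\\Desktop\\Practical NLP\\Nayyar_NoURLs.txt", "w+")
new_file.write(clean_text)
new_file.close()
file.close()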
Natural Language Processing
Practical 2 BS(SE) 5th Semester

Dr. Nayyar Iqbal, Lecturer Department of Computer Science


Stem or Lemmatize Words
Python Practical 6
Stemming
• Process of reducing a word to its word stem
• Important in natural language understanding (NLU) and natural
language processing (NLP)
• Stemming is also a part of queries and Internet search engines
• Example:
• Words = Used, User, Using, Usable
• Stem = USE
Stemming and Lemmatization
• Stemming and Lemmatization are text (word) normalization techniques
• They are used in Natural Language Processing to prepare text, words, and documents for further processing
How to do it in Python?
• Using NLTK [Natural Language Tool Kit]
• Steps:
• Download and Install NLTK
• Stemming Words using PorterStemmer and LancasterStemmer
• Stemming Sentences (with TOKENIZATION)
• Stemming Documents
Step 1-Download and Install NLTK
• NLTK stands for Natural Language Toolkit
• This is a suite of libraries and programs for statistical natural language processing
for English written in Python
• Open command prompt as Administrator
• Use command pip install NLTK in command prompt
Step 1-Download and Install NLTK

• Use: nltk.download() in Python Shell


• Use the downloader to download packages, corpora, models
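A minimal sketch of this step:

import nltk

nltk.download()           # opens the NLTK downloader for packages, corpora and models
nltk.download('punkt')    # or download a specific resource, e.g. the Punkt tokenizer models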
Step 2-Stemming Words using
PorterStemmer and LancasterStemmer
• PorterStemmer uses suffix stripping to produce stems. Notice how the PorterStemmer gives the root (stem) of the word "cats" by simply removing the 's' after "cat"
• Simple and speedy

• The LancasterStemmer (Paice-Husk stemmer) is an iterative algorithm with rules saved externally
• One table contains about 120 rules indexed by the last letter of a suffix
• On each iteration, it tries to find an applicable rule by the last character of the word
• Each rule specifies either a deletion or replacement of an ending
Step 2-Stemming Words using
PorterStemmer and LancasterStemmer
Import Stemmers

Create Objects of Stemmers

Stemming using PorterStemmer

Stemming using LancasterStemmer
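The code for these captions appears as screenshots in the original; a minimal sketch of the same steps (the example words are assumptions):

# Import the stemmers
from nltk.stem import PorterStemmer, LancasterStemmer

# Create objects of the stemmers
porter = PorterStemmer()
lancaster = LancasterStemmer()

# Stem individual words with each stemmer; the two can produce different stems for the same word
for word in ["cats", "used", "using", "usable"]:
    print(word, porter.stem(word), lancaster.stem(word))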


Step 2-Stemming Sentences (without
TOKENIZATION)
Import Stemmers

Create Objects of Stemmers

Stemming using

1. PorterStemmer
2. LancasterStemmer

As you can see, the stemmer treats the entire sentence as a single word, so it returns it as it is. We need to stem each word in the sentence and return a combined sentence. To separate the sentence into words, you can use a tokenizer.
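A minimal sketch of what the slide demonstrates; the sentence is an assumption:

from nltk.stem import PorterStemmer, LancasterStemmer

porter = PorterStemmer()
lancaster = LancasterStemmer()

sentence = "Pythoners are very intelligent and work very pythonly"  # example sentence, an assumption

# The stemmer receives the whole sentence as a single token,
# so the words inside it are not stemmed individually
print(porter.stem(sentence))
print(lancaster.stem(sentence))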
Step 2-Stemming Sentences (with
TOKENIZATION)
Import Stemmers

Create Objects of Stemmers

Import TOKENIZER

Set the Sentence

Tokenize the Sentence & Display TOKENS

Stem TOKENS and store in a separate list

Join the list containing STEM words
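A minimal sketch of these steps, with an example sentence as an assumption:

# Import the stemmer and the tokenizer
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

porter = PorterStemmer()

# Set the sentence
sentence = "Pythoners are very intelligent and work very pythonly"  # assumption

# Tokenize the sentence and display the tokens
tokens = word_tokenize(sentence)
print(tokens)

# Stem the tokens and store them in a separate list
stemmed = [porter.stem(token) for token in tokens]

# Join the list containing the stemmed words
print(" ".join(stemmed))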
Natural Language Processing
Practical 2 BS(SE) 5th Semester

Dr. Nayyar Iqbal, Lecturer Department of Computer Science


Split Text into Words via Tokenization

Python Practical
Objectives
• What is tokenization?
• Different ways to perform tokenization
What is Tokenization?
• Tokenization is a common task a data scientist comes
across when working with text data
• It consists of splitting an entire text into small units, also
known as tokens
• NLP projects have tokenization as the first step because it
is the foundation for developing good models and helps
better understand the text we have
Different Ways to Tokenize Text
1. Using split() function
2. NLTK
Tokenization with split() - Steps
• Open the text document

• Print the text document

• Tokenize the text document

• Display tokens on screen


Complete Code
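A minimal sketch of the split() method; the file path is an assumption:

# Open and read the text document
file = open("C:\\Users\\HP\\Desktop\\Practical NLP\\Nayyar.txt", "r")
text = file.read()

# Print the text document
print(text)

# Tokenize the text by splitting on whitespace, then display the tokens
tokens = text.split()
print(tokens)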
2. NLTK – Steps
• NLTK stands for Natural Language Toolkit
• This is a suite of libraries and programs for statistical natural language processing
for English written in Python
• Recipe:
• Install NLTK
• Import NLTK
• Read text document file
• Display the text
• Tokenize text
• Display tokens on screen
Install NLTK
• Open command prompt as Administrator
• Use command pip install NLTK in command prompt
Import NLTK in Python
• NLTK needs to be imported in Python program
• NLTK contains a module called tokenize – we’ll be importing this module
• import nltk
• from nltk.tokenize import word_tokenize
Open and read text document file
• Same method used previously
Display text document
• Same method used previously
Tokenize text with NLTK & Display Tokens
• Use word_tokenize() method

• print() method to display tokens


Complete Code
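A minimal sketch of the NLTK recipe; the file path is an assumption:

import nltk
from nltk.tokenize import word_tokenize

# Read the text document file and display the text
file = open("C:\\Users\\HP\\Desktop\\Practical NLP\\Nayyar.txt", "r")
text = file.read()
print(text)

# Tokenize the text and display the tokens
tokens = word_tokenize(text)
print(tokens)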
Error! (Use nltk.download)

• If word_tokenize() raises a LookupError because the Punkt tokenizer data is missing, use: nltk.download() (or nltk.download('punkt')) to fetch it
Natural Language Processing
Practical 3 BS(SE) 5th Semester

Dr. Nayyar Iqbal, Lecturer Department of Computer Science


Remove Punctuation Marks from Text

Python Practical
Punctuations
• Symbols
• Used to punctuate text
• . , ; " ' ! - [different punctuation marks]
How to remove punctuation marks?
• Three methods:
• Using string library
• Using for each loop
• Using replace() function
Method # 1 – Using string library
1. Import string library
2. Open a text file and read it
3. Print text
4. Remove punctuations
5. Print text without punctuations
Step # 1 – Import string library
• Python's string module is a built-in module that contains constants, utility functions, and classes for string manipulation
Step # 2 – Open text file and read it
• Use already discussed method
Step # 3 – Print text
• Use already discussed method
Step # 4 – Remove punctuations
• Use translate() and maketrans() methods

https://datagy.io/python-remove-punctuation-from-string/
Step # 5 – Print text without punctuations
Complete Code
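A minimal sketch of the string library method; the file path is an assumption:

import string

# Open and read the text file, then print it
file = open("C:\\Users\\HP\\Desktop\\Practical NLP\\Nayyar.txt", "r")
text = file.read()
print(text)

# Remove punctuation: maketrans() builds a mapping that deletes every
# character in string.punctuation, and translate() applies it
clean_text = text.translate(str.maketrans("", "", string.punctuation))

# Print the text without punctuation
print(clean_text)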
Detect Language of Text
Python Practical 4
Using "TextBlob"
• Steps:
• Install TextBlob library
• Import TextBlob
• Detect the language
Step 1: Install TextBlob Library
• Open Command Prompt
• Install TextBlob library using command:
• pip install textblob
Step 2: Import textblob Library
• Open Python IDLE Shell
• Write the code to import TextBlob
• from textblob import TextBlob
Step 3: Detect the language
• Create text variable and store some sentence:

• Convert sentence into Blob and detect language


Complete Code
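A minimal sketch following the steps above. Note that TextBlob's detect_language() relies on an online translation service and has been deprecated in recent TextBlob releases, so it may not work on newer versions; the example sentence is an assumption:

from textblob import TextBlob

# Create a text variable and store some sentence
text = "Comment allez-vous aujourd'hui ?"

# Convert the sentence into a Blob and detect the language
blob = TextBlob(text)
print(blob.detect_language())   # expected to print a language code such as 'fr'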
Remove Stop Words from
Text
Python Practical 5
Stop Words
• Common words in any language like:
• Articles e.g., the and a/an
• Prepositions
A preposition is a word or group of words used before a noun, pronoun, or noun phrase to show direction, time, place, location, spatial relationships, or to introduce an object. Some examples of prepositions are words like "in," "at," "on," "of," and "to."
• Pronouns
A pronoun (I, me, he, she, herself, you, it, that, they, each, few, many, who,
whoever, whose, someone, everybody, etc.) is a word that takes the place
of a noun. In the sentence Joe saw Jill, and he waved at her, the pronouns
he and her take the place of Joe and Jill, respectively. There are three
types of pronouns: subject (for example, he ); object (him); or possessive
(his).
• Conjunctions
A conjunction is a word that is used to connect words,
phrases, and clauses. There are many conjunctions in the
English language, but some common ones include and, or, but,
because, for, if, and when.
Methods to remove stop words
• Different methods:
• Using NLTK
• Using spaCy
• Using Gensim
• Using SKLearn
Method # 1: Using NLTK
• NLTK = Natural Language Toolkit
• Steps:
• Install nltk
• Import nltk
• Open and read text file
• Display the text in the text file (optional)
• Get a list of stop words
• Remove stop words
• Print modified output
Step 1: Install NLTK
• Open command prompt as Administrator
• Use command pip install NLTK in command prompt
Step 2: Import NLTK in Python
• NLTK needs to be imported in Python program
• NLTK contains a module called tokenize – we’ll be importing this module
• import nltk
• from nltk.tokenize import word_tokenize
Step 3: Open and read text document
file
• Same method used previously
Step 4: Display the text in the text
file
• Same method used previously
Step 5: Get list of stop words
Step 6: Remove stop words
Step 7: Print modified text
Complete code
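A minimal sketch of the NLTK method; the file path is an assumption, and the stop-word list comes from nltk.corpus (run nltk.download('stopwords') once if that resource is missing):

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Open and read the text file, then display it
file = open("C:\\Users\\HP\\Desktop\\Practical NLP\\Nayyar.txt", "r")
text = file.read()
print(text)

# Get the list of English stop words
stop_words = set(stopwords.words('english'))

# Remove stop words: keep only the tokens that are not in the stop-word list
tokens = word_tokenize(text)
filtered = [word for word in tokens if word.lower() not in stop_words]

# Print the modified output
print(" ".join(filtered))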
