Write a Python Program for the Following Preprocessing of Text in NLP: Tokenization, Filtration, Script Validation, Stop Word Removal, Stemming

This document provides a Python program for text preprocessing in NLP. The program performs tokenization, filtration, script validation, stop word removal, and stemming. It uses the NLTK library for tokenization, the English stop word list, and Porter stemming, and the langdetect library for language detection. An example run on a sample sentence shows the result of each step.

Uploaded by Nidhi Rao


1. Write a Python program for the following preprocessing of text in NLP:

● Tokenization
● Filtration
● Script Validation
● Stop Word Removal
● Stemming
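As a rough illustration of the filtration step listed above: the program below filters tokens with Python's built-in str.isalpha(), which keeps a token only if every character in it is a letter. The short sketch here shows what that rule keeps and drops. The sample tokens are made up for illustration and are not part of the original program.

```python
# Sketch: how str.isalpha() treats different kinds of tokens.
# The sample tokens are illustrative only (an assumption, not from the source).
sample_tokens = ["NLP", "2024", "!", "n't", "state-of-the-art", "caf\u00e9", "word2vec"]

# A token is kept only if it is non-empty and consists entirely of letters.
kept = [tok for tok in sample_tokens if tok.isalpha()]
dropped = [tok for tok in sample_tokens if not tok.isalpha()]

print("Kept:", kept)
print("Dropped:", dropped)
```

Note that hyphenated words, contraction pieces, and tokens mixing letters with digits are all dropped by this rule, while accented letters pass. Whether that is acceptable depends on the task.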

pip install nltk langdetect

import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from langdetect import detect
from langdetect.lang_detect_exception import LangDetectException

# Download necessary resources
nltk.download('punkt')
nltk.download('punkt_tab')  # newer NLTK releases look for 'punkt_tab' instead of 'punkt'
nltk.download('stopwords')

def preprocess_text(text):
    # Tokenization
    tokens = word_tokenize(text)
    print("Tokens:", tokens)

    # Filtration (removing non-alphabetic tokens such as punctuation and numbers)
    filtered_tokens = [word for word in tokens if word.isalpha()]
    print("Filtered Tokens:", filtered_tokens)

    # Script Validation (checking that the text is detected as English)
    try:
        if detect(text) != 'en':
            return "Text is not in English, skipping preprocessing."
    except LangDetectException:
        # Raised when langdetect cannot find usable features, e.g. empty input
        return "Language detection failed."

    # Stop Word Removal (compare in lowercase because the stop word list is lowercase)
    stop_words = set(stopwords.words('english'))
    filtered_tokens = [word for word in filtered_tokens if word.lower() not in stop_words]
    print("After Stop Word Removal:", filtered_tokens)

    # Stemming (PorterStemmer also lowercases its output by default)
    stemmer = PorterStemmer()
    stemmed_tokens = [stemmer.stem(word) for word in filtered_tokens]
    print("Stemmed Tokens:", stemmed_tokens)

    return ' '.join(stemmed_tokens)

# Example Usage
text = "This is an example sentence demonstrating text preprocessing in NLP!"
processed_text = preprocess_text(text)
print("Processed Text:", processed_text)

OUTPUT

Tokens: ['This', 'is', 'an', 'example', 'sentence', 'demonstrating', 'text', 'preprocessing', 'in', 'NLP', '!']
Filtered Tokens: ['This', 'is', 'an', 'example', 'sentence', 'demonstrating', 'text', 'preprocessing', 'in', 'NLP']
After Stop Word Removal: ['example', 'sentence', 'demonstrating', 'text', 'preprocessing', 'NLP']
Stemmed Tokens: ['exampl', 'sentenc', 'demonstr', 'text', 'preprocess', 'nlp']
Processed Text: exampl sentenc demonstr text preprocess nlp
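The script validation step in the program above checks the detected language of the whole text with langdetect. If the goal is instead to check the writing system of each token, one possible approach uses only the standard library's unicodedata module. The following is a minimal sketch under that assumption: the helper names is_latin_token and validate_script are hypothetical and do not appear in the original program, and matching on Unicode character names is a simple heuristic rather than a complete script classifier.

```python
import unicodedata

def is_latin_token(token):
    """Return True if every character is a letter whose Unicode name starts with 'LATIN'.

    Hypothetical helper for illustration; not part of the original program.
    """
    if not token:
        return False
    for ch in token:
        if not ch.isalpha():
            return False
        # The second argument is a default, so unnamed characters do not raise ValueError.
        if not unicodedata.name(ch, "").startswith("LATIN"):
            return False
    return True

def validate_script(tokens):
    """Split tokens into (latin, other) according to is_latin_token."""
    latin = [t for t in tokens if is_latin_token(t)]
    other = [t for t in tokens if not is_latin_token(t)]
    return latin, other

# Illustrative tokens: two English words, one Devanagari word,
# one Cyrillic word, and one accented Latin word.
tokens = ["text", "NLP", "\u092a\u093e\u0920", "\u0442\u0435\u043a\u0441\u0442", "caf\u00e9"]
latin, other = validate_script(tokens)
print("Latin-script tokens:", latin)
print("Other tokens:", other)
```

A per-token check like this could run right after filtration, so that tokens in an unexpected script are dropped individually instead of rejecting the whole text.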
