NLP - Lab - 1.ipynb - Colab

The document outlines various Natural Language Processing (NLP) techniques using the NLTK library, including tokenization, stopwords removal, part-of-speech tagging, stemming, lemmatization, and word frequency counting. Each technique is demonstrated with example code and outputs. The document serves as a practical guide for implementing these NLP methods in Python.


Tokenization
import nltk
nltk.download('punkt_tab')
from nltk.tokenize import word_tokenize

def tokenize_text(text):
    tokens = word_tokenize(text)
    return tokens

# Example usage:
text = "This is an example sentence. Tokenization is important in NLP."
tokens = tokenize_text(text)
tokens

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
['This',
'is',
'an',
'example',
'sentence',
'.',
'Tokenization',
'is',
'important',
'in',
'NLP',
'.']
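
word_tokenize works at the word level; NLTK also ships sent_tokenize for splitting text into sentences first. A minimal sketch of combining the two (the variable names are illustrative, not part of the lab; punkt data is already downloaded above):

from nltk.tokenize import sent_tokenize, word_tokenize

text = "This is an example sentence. Tokenization is important in NLP."

# Split into sentences, then tokenize each sentence into words
sentences = sent_tokenize(text)
nested_tokens = [word_tokenize(s) for s in sentences]
print(nested_tokens)
# [['This', 'is', 'an', 'example', 'sentence', '.'],
#  ['Tokenization', 'is', 'important', 'in', 'NLP', '.']]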

Stopwords removal


import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download('stopwords')
nltk.download('punkt')

text = "This is a sample sentence, showing off stop word filtering."


words = word_tokenize(text)
filtered_text = [word for word in words if word.lower() not in stopwords.words('english')]

print(filtered_text)  # Output: ['sample', 'sentence', ',', 'showing', 'stop', 'word', 'filtering', '.']

['sample', 'sentence', ',', 'showing', 'stop', 'word', 'filtering', '.']


[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data] Package stopwords is already up-to-date!

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Package punkt is already up-to-date!
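
One subtlety in the cell above: stopwords.words('english') is re-evaluated for every token inside the list comprehension. Building the list into a set once is a common speed-up, and the set can be extended with extra words. A small sketch (the two added words are purely illustrative; it assumes the stopwords and punkt downloads above):

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

stop_words = set(stopwords.words('english'))  # build the lookup set once
stop_words.update({'sample', 'showing'})      # hypothetical domain-specific additions

text = "This is a sample sentence, showing off stop word filtering."
filtered = [w for w in word_tokenize(text) if w.lower() not in stop_words]
print(filtered)  # ['sentence', ',', 'stop', 'word', 'filtering', '.']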

POS tagging


import nltk
nltk.download('averaged_perceptron_tagger_eng')

text = "The quick brown fox jumps over the lazy dog."
words = word_tokenize(text)
pos_tags = nltk.pos_tag(words)

filtered_text = [word for word, tag in pos_tags if tag.startswith('NN')]  # Keep only nouns (NN, NNS, NNP, NNPS)
print(filtered_text)  # Output: ['brown', 'fox', 'dog'] (the tagger mislabels 'brown' as a noun here)

[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger_eng.zip.
['brown', 'fox', 'dog']
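
The startswith('NN') filter generalizes to the other Penn Treebank tag families: VB* covers the verb forms and JJ* the adjectives. A minimal sketch selecting verbs instead (assumes the tagger and punkt data downloaded above):

import nltk
from nltk.tokenize import word_tokenize

text = "The quick brown fox jumps over the lazy dog."
pos_tags = nltk.pos_tag(word_tokenize(text))

# VB, VBD, VBG, VBN, VBP, VBZ all begin with 'VB'
verbs = [word for word, tag in pos_tags if tag.startswith('VB')]
print(verbs)  # ['jumps']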

Stemming
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
import nltk
nltk.download('punkt')
# Initialize the Porter Stemmer
ps = PorterStemmer()
# Example sentence
text = "Running runners run easily and are loving the adventure."
# Tokenize the sentence
words = word_tokenize(text)
# Apply stemming
stemmed_words = [ps.stem(word) for word in words]
print("Stemmed Words:", stemmed_words)

Stemmed Words: ['run', 'runner', 'run', 'easili', 'and', 'are', 'love', 'the', 'adventur', '.']
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Package punkt is already up-to-date!
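
Porter is only one of NLTK's stemmers. SnowballStemmer (sometimes called Porter2) is a newer alternative with slightly different rules; a quick side-by-side sketch:

from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer('english')

# Compare the two stemmers word by word
for word in ['running', 'easily', 'loving', 'adventure']:
    print(word, '->', porter.stem(word), '|', snowball.stem(word))

Note that stems such as 'easili' and 'adventur' are not dictionary words; stemming only chops suffixes, which is the motivation for lemmatization in the next section.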

Lemmatization

import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
# Download necessary datasets
nltk.download('wordnet')
nltk.download('punkt')
nltk.download('omw-1.4')
# Initialize lemmatizer
lemmatizer = WordNetLemmatizer()
# Sample text
text = "The leaves are falling from the trees and the wolves are howling."
# Tokenize words
words = word_tokenize(text)
# Apply lemmatization
lemmatized_words = [lemmatizer.lemmatize(word) for word in words]
print("Lemmatized Words:", lemmatized_words)

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
Lemmatized Words: ['The', 'leaf', 'are', 'falling', 'from', 'the', 'tree', 'and', 'the', 'wolf', 'are', 'howling', '.']
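
By default lemmatize() treats every token as a noun, which is why 'are', 'falling', and 'howling' pass through unchanged above. Supplying a part of speech fixes this; a minimal sketch using WordNet's single-letter POS codes ('v' for verb):

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

# Look the words up as verbs instead of nouns
print(lemmatizer.lemmatize('are', pos='v'))      # be
print(lemmatizer.lemmatize('falling', pos='v'))  # fall
print(lemmatizer.lemmatize('howling', pos='v'))  # howl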

Word frequency count


from collections import Counter
import re
# Sample text
text = "Natural Language Processing is amazing! NLP is a subset of AI, and AI is the future.
# Preprocessing: Convert to lowercase and remove punctuation
text = re.sub(r'[^\w\s]', '', text.lower())
# Tokenize words
words = text.split()
# Count word frequency
word_counts = Counter(words)
print("Word Frequency:", word_counts)

Word Frequency: Counter({'is': 3, 'ai': 2, 'natural': 1, 'language': 1, 'processing': 1, 'amazing': 1, 'nlp': 1, 'a': 1, 'subset': 1, 'of': 1, 'and': 1, 'the': 1, 'future': 1})
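
Counter also exposes most_common(n) for the top-n entries, and the stopword list from earlier can be reused so fillers like 'is' and 'the' do not dominate the counts. A minimal sketch (assumes the stopwords download from the earlier cell):

from collections import Counter
import re
from nltk.corpus import stopwords

text = "Natural Language Processing is amazing! NLP is a subset of AI, and AI is the future."
words = re.sub(r'[^\w\s]', '', text.lower()).split()

# Drop stopwords before counting, then take the top 3
stop_words = set(stopwords.words('english'))
content_words = [w for w in words if w not in stop_words]
print(Counter(content_words).most_common(3))  # [('ai', 2), ('natural', 1), ('language', 1)]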
