
R22 Nlp Python Programs

The document provides a comprehensive guide on installing the NLTK library and performing various text processing tasks using Python. It includes instructions for tokenization, stop word removal, stemming, word analysis, word generation, and morphological analysis, along with example code snippets and outputs. Additionally, it covers the implementation of word sense disambiguation and part-of-speech tagging.

Steps to install the NLTK library

Install the latest version of Python (e.g. Python 3.9.6)

Open a command prompt, type python --version and press Enter

Type pip --version and press Enter

Type pip install nltk and press Enter

Open the IDLE shell, click File and select New File

Click Save As and give the program a name with the .py extension

Click Run and select Run Module

If any package is not found in the NLTK library, open the IDLE shell, type the commands below and press Enter:

import nltk

nltk.download()

Example:

nltk.download('stopwords')
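For convenience, all of the resources used by the programs in this document can be fetched in one short script. This is a sketch, with the resource names collected from the download() calls that appear in the later exercises:

```python
import nltk

# Corpora and models used by the programs below;
# each call is skipped if the resource is already installed.
for resource in ["punkt", "stopwords", "wordnet", "omw-1.4",
                 "averaged_perceptron_tagger"]:
    nltk.download(resource)
```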

1. Write a Python program to perform the following tasks on text

a) Tokenization

Word tokenization:

import nltk

word_data = "It originated from the idea that there are readers who prefer learning new skills from the comforts of their drawing rooms"

nltk_tokens = nltk.word_tokenize(word_data)

print(nltk_tokens)

Output:

['It', 'originated', 'from', 'the', 'idea', 'that', 'there', 'are', 'readers', 'who', 'prefer', 'learning',
'new', 'skills', 'from', 'the', 'comforts', 'of', 'their', 'drawing', 'rooms']
Sentence tokenization:

import nltk

sentence_data = "The First sentence is about Python. The Second: about Django. You can learn Python, Django and Data Analysis here."

nltk_tokens = nltk.sent_tokenize(sentence_data)

print(nltk_tokens)

Output:

['The First sentence is about Python.', 'The Second: about Django.', 'You can learn Python, Django and Data Analysis here.']

Character tokenization:

# No NLTK needed here: the built-in list() splits a string into characters

charact_data = "Python programming"

charact_tokens = list(charact_data)

print(charact_tokens)

Output:

['P', 'y', 't', 'h', 'o', 'n', ' ', 'p', 'r', 'o', 'g', 'r', 'a', 'm', 'm', 'i', 'n', 'g']

b) Stop word Removal

from nltk.corpus import stopwords

en_stops = set(stopwords.words('english'))

all_words = ['There', 'is', 'a', 'tree', 'near', 'the', 'river']

for word in all_words:
    if word not in en_stops:
        print(word)

Output:

There
tree

near

river

2) Write a Python program to implement the Porter stemmer algorithm for stemming

import nltk

from nltk.stem import PorterStemmer

nltk.download('punkt')

stemmer = PorterStemmer()

words = ["running", "beautifulness", "rivers", "caresses", "happily", "studies", "banking"]

stemmed_words = [stemmer.stem(word) for word in words]

print("Original Words:", words)

print("Stemmed Words:", stemmed_words)

Output:

Original Words: ['running', 'beautifulness', 'rivers', 'caresses', 'happily', 'studies', 'banking']

Stemmed Words: ['run', 'beauti', 'river', 'caress', 'happili', 'studi', 'bank']

3) Write a Python program for

a) Word Analysis

import re

from collections import Counter

def word_analysis(text):
    # Convert text to lowercase and remove punctuation
    text = text.lower()
    text = re.sub(r'[^\w\s]', '', text)
    # Split text into words
    words = text.split()
    # Count frequency of each word
    word_freq = Counter(words)
    # Calculate length of each word
    word_len = {word: len(word) for word in words}
    # Identify most common words
    most_common_words = word_freq.most_common(10)
    return word_freq, word_len, most_common_words

text = "This is an example sentence for word analysis. This sentence is just an example."

word_freq, word_len, most_common_words = word_analysis(text)

print("Word Frequency:")

for word, freq in word_freq.items():
    print(f"{word}: {freq}")

print("\nWord Length:")

for word, length in word_len.items():
    print(f"{word}: {length}")

print("\nMost Common Words:")

for word, freq in most_common_words:
    print(f"{word}: {freq}")

Output:

Word Frequency:

this: 2

is: 2

an: 2
example: 2

sentence: 2

for: 1

word: 1

analysis: 1

just: 1

Word Length:

this: 4

is: 2

an: 2

example: 7

sentence: 8

for: 3

word: 4

analysis: 8

just: 4

Most Common Words:

this: 2

is: 2

an: 2

example: 2

sentence: 2
for: 1

word: 1

analysis: 1

just: 1

b) Word Generation

import random

import nltk

from nltk.corpus import wordnet

nltk.download('wordnet')

def generate_meaningful_words(part_of_speech, num_words):
    synsets = list(wordnet.all_synsets(part_of_speech))
    words = []
    for _ in range(num_words):
        synset = random.choice(synsets)
        lemma = random.choice(synset.lemmas())
        words.append(lemma.name())
    return words

nouns = generate_meaningful_words('n', 10)

verbs = generate_meaningful_words('v', 10)

adjectives = generate_meaningful_words('a', 10)

adverbs = generate_meaningful_words('r', 10)

print("Nouns:")
for noun in nouns:
    print(noun)

print("\nVerbs:")
for verb in verbs:
    print(verb)

print("\nAdjectives:")
for adjective in adjectives:
    print(adjective)

print("\nAdverbs:")
for adverb in adverbs:
    print(adverb)

Output (varies on each run, since the words are chosen at random):

Nouns:

Haastia_pulvinaris

televangelist

genus_Estrilda

E._H._Weber

insidiousness

Evangelical_and_Reformed_Church

garnishee

semigloss

powder_keg

townspeople

Verbs:

encapsulate
remain

salve

cruise

credit

charge

drone_on

up

fume

sandblast

Adjectives:

stipendiary

reportable

stilly

live

adscititious

bindable

upper-class

god-awful

organized

untechnical

Adverbs:

pitty-patty
naturally

managerially

smartly

providently

dumbly

worse

tight

magniloquently

pointlessly

4. Create a sample list of at least 5 words with ambiguous senses and write a Python program to implement WSD (word sense disambiguation).

import nltk

from nltk.corpus import wordnet as wn

from nltk.wsd import lesk

# Ensure that NLTK resources are downloaded

nltk.download('punkt')

nltk.download('wordnet')

nltk.download('omw-1.4')

# List of ambiguous words

ambiguous_words = ["bank", "bat", "bark", "pitch", "lead"]


# Example context sentences for each word

contexts = {
    "bank": "I went to the river bank to relax by the water.",
    "bat": "The bat flew out of the cave at dusk.",
    "bark": "The dog barked loudly at the stranger.",
    "pitch": "He gave a brilliant pitch to the investors.",
    "lead": "He decided to lead the team on the project."
}

# Function to disambiguate word senses using Lesk algorithm

def disambiguate_word(word, context):
    sense = lesk(context.split(), word)
    if sense:
        return sense.name()  # Return the sense (meaning) of the word
    else:
        return "No sense found"

# Iterate over each ambiguous word and print its disambiguated sense based on context

for word in ambiguous_words:
    context = contexts[word]
    print(f"Word: {word}")
    print(f"Context: {context}")
    print(f"Disambiguated Sense: {disambiguate_word(word, context)}")
    print("-" * 50)

Output:

Word: bank

Context: I went to the river bank to relax by the water.

Disambiguated Sense: bank.v.07

--------------------------------------------------

Word: bat

Context: The bat flew out of the cave at dusk.

Disambiguated Sense: bat.v.03

--------------------------------------------------

Word: bark

Context: The dog barked loudly at the stranger.

Disambiguated Sense: bark.n.04

--------------------------------------------------

Word: pitch

Context: He gave a brilliant pitch to the investors.

Disambiguated Sense: pitch.v.04

--------------------------------------------------

Word: lead

Context: He decided to lead the team on the project.

Disambiguated Sense: spark_advance.n.01

--------------------------------------------------

5. Install the NLTK toolkit and perform stemming

import nltk

from nltk.stem import PorterStemmer


stemmer = PorterStemmer()

words = ['running', 'jumping', 'hiking', 'swimming']

stemmed_words = [stemmer.stem(word) for word in words]

print(stemmed_words)

Output:

['run', 'jump', 'hike', 'swim']

6. Create a sample list of at least 10 words for POS tagging and find the POS tag for any given word

import nltk

from nltk import pos_tag, word_tokenize

nltk.download('punkt')

nltk.download('averaged_perceptron_tagger')

def find_pos_tag(word):
    tokens = word_tokenize(word)
    pos_tags = pos_tag(tokens)
    return pos_tags[0][1]

word = input("Enter a word: ")

# Use a distinct variable name so the imported pos_tag function is not shadowed
tag = find_pos_tag(word)

print("The POS tag for '{}' is '{}'".format(word, tag))

Output:

Enter a word: say

The POS tag for 'say' is 'VB'

Enter a word: the

The POS tag for 'the' is 'DT'

Enter a word: karimnagar


The POS tag for 'karimnagar' is 'NN'

Enter a word: good

The POS tag for 'good' is 'JJ'

7. Write a Python program to

a) Perform Morphological Analysis using the NLTK library

import nltk

from nltk.stem import WordNetLemmatizer

from nltk.tokenize import word_tokenize

from nltk import pos_tag

nltk.download('punkt')

nltk.download('averaged_perceptron_tagger')

nltk.download('wordnet')

lemmatizer = WordNetLemmatizer()

def morphological_analysis(text):
    tokens = word_tokenize(text)
    tagged_tokens = pos_tag(tokens)
    lemmatized_tokens = []
    for token, tag in tagged_tokens:
        # Map the Penn Treebank tag to the WordNet POS tag
        if tag.startswith('J'):
            wordnet_tag = 'a'
        elif tag.startswith('V'):
            wordnet_tag = 'v'
        elif tag.startswith('N'):
            wordnet_tag = 'n'
        elif tag.startswith('R'):
            wordnet_tag = 'r'
        else:
            wordnet_tag = ''
        if wordnet_tag:
            lemmatized_token = lemmatizer.lemmatize(token, wordnet_tag)
        else:
            lemmatized_token = token
        lemmatized_tokens.append(lemmatized_token)
    return lemmatized_tokens

text = "The quick brown fox jumps over the lazy dog."

print("Original Text:")

print(text)

print("\nLemmatized Tokens:")

print(morphological_analysis(text))

Output:

Original Text:

The quick brown fox jumps over the lazy dog.

Lemmatized Tokens:

['The', 'quick', 'brown', 'fox', 'jump', 'over', 'the', 'lazy', 'dog', '.']

b) Generate n-grams using NLTK N-Grams library
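The document ends before this section's listing, so the following is a minimal sketch using nltk.util.ngrams. The sample sentence is an assumption, and plain str.split() is used for tokenization so no extra NLTK downloads are needed:

```python
from nltk.util import ngrams

text = "The quick brown fox jumps over the lazy dog"
tokens = text.split()  # whitespace split; avoids needing the punkt tokenizer

# ngrams() yields tuples of n consecutive tokens
bigrams = list(ngrams(tokens, 2))
trigrams = list(ngrams(tokens, 3))

print("Bigrams:", bigrams)
print("Trigrams:", trigrams)
```

For 9 input tokens this produces 8 bigrams and 7 trigrams, starting with ('The', 'quick') and ('The', 'quick', 'brown') respectively.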
