Natural Language Processing
def preprocess_text(text):
    """Tokenize *text* and remove English stopwords.

    Returns the list of tokens whose lowercased form is not an
    English stopword. Requires the NLTK 'punkt' and 'stopwords' data.
    """
    # Tokenization
    tokens = word_tokenize(text)
    stop_words = set(stopwords.words('english'))
    # Keep only tokens that are not stopwords (case-insensitive check).
    filtered_tokens = [word for word in tokens if word.lower() not in stop_words]
    return filtered_tokens
def main():
    """Demo driver: tokenize a sample sentence and print the result."""
    text = ("NLTK is a leading platform for building Python programs to "
            "work with human language data.")
    preprocessed_text = preprocess_text(text)
    print("Original Text:")
    print(text)
    print("\nTokenized Text:")
    print(preprocessed_text)


if __name__ == "__main__":
    main()
Output:-
Original Text:
Tokenized Text:
def preprocess_text(text):
    """Split *text* into tokens and drop English stopwords.

    The stopword comparison is case-insensitive; punctuation tokens
    are kept because they never appear in the stopword list.
    """
    # Tokenization
    tokens = word_tokenize(text)
    stop_words = set(stopwords.words('english'))
    filtered_tokens = []
    for word in tokens:
        if word.lower() not in stop_words:
            filtered_tokens.append(word)
    return filtered_tokens
def apply_stemming(tokens):
    """Reduce every token to its Porter stem.

    Returns a new list; punctuation tokens pass through unchanged
    (the Porter stemmer leaves non-alphabetic tokens as-is).
    """
    porter = PorterStemmer()
    stemmed_tokens = [porter.stem(token) for token in tokens]
    return stemmed_tokens
def main():
    """Demo driver: tokenize, remove stopwords, then stem a sample sentence."""
    text = ("NLTK is a leading platform for building Python programs to "
            "work with human language data.")
    preprocessed_text = preprocess_text(text)
    stemmed_text = apply_stemming(preprocessed_text)
    print("Original Text:")
    print(text)
    print("\nTokenized Text:")
    print(preprocessed_text)
    print("\nStemmed Text:")
    print(stemmed_text)


if __name__ == "__main__":
    main()
Output:-
Original Text:
NLTK is a leading platform for building Python programs to work with human
language data.
Tokenized Text:
Stemmed Text:
['nltk', 'lead', 'platform', 'build', 'python', 'program', 'work', 'human', 'languag', 'data',
'.']
def word_analysis():
    """Print the 10 most frequent tokens of the Brown corpus."""
    nltk.download('brown')
    corpus_tokens = brown.words()
    distribution = nltk.FreqDist(corpus_tokens)
    # Print 10 most common words
    print(distribution.most_common(10))
def word_generation():
    """Generate a random 11-word sentence from Brown-corpus bigrams.

    Builds a successor table mapping each word to the list of words
    observed to follow it, then performs a random walk of 10 steps
    from a random starting word. Output is non-deterministic.
    """
    nltk.download('brown')
    words = brown.words()
    bigrams = nltk.bigrams(words)
    word_dict = {}
    # Successor table: word -> list of words seen immediately after it.
    for w1, w2 in bigrams:
        if w1 not in word_dict:
            word_dict[w1] = []
        word_dict[w1].append(w2)
    # Generate a sentence
    import random
    sentence = []
    current_word = random.choice(list(word_dict.keys()))
    sentence.append(current_word)
    for _ in range(10):
        next_word = random.choice(word_dict[current_word])
        sentence.append(next_word)
        current_word = next_word
    print("\nGenerated Sentence:")
    print(' '.join(sentence))
def main():
    """Run the frequency-analysis demo, then the sentence-generation demo."""
    print("Word Analysis:")
    word_analysis()
    print("\nWord Generation:")
    word_generation()


if __name__ == "__main__":
    main()
Output:-
Word Analysis:
[('the', 62713), (',', 58334), ('.', 49346), ('of', 36080), ('and', 27915), ('to', 25732), ('a',
21881), ('in', 19536), ('that', 10237), ('is', 10011)]
Word Generation:
Generated Sentence:
combination of radiologist in their own issues for financing their ability to create a
different thing . And in contrast to learn to the games where you have been
4. Create a sample list of at least 5 words with ambiguous senses and
write a Python program to implement WSD (Word Sense Disambiguation).
from nltk.wsd import lesk
def wsd(sample_sentences):
    """Disambiguate every word of every sentence with the Lesk algorithm.

    For each token, prints the chosen WordNet synset's definition and
    examples. Tokens for which lesk() finds no synset are skipped.
    NOTE(review): the iteration structure was lost in extraction and is
    reconstructed from the printed output — confirm against the original.
    """
    for sentence in sample_sentences:
        words = word_tokenize(sentence)
        for word in words:
            # lesk() picks the synset whose gloss best overlaps the context.
            synset = lesk(words, word)
            if synset is not None:
                print("Word:", word)
                print("Definition:", synset.definition())
                print("Example:", synset.examples())
                print("-------------------------------------------------")
def main():
    """Demo driver: run WSD over one sentence with ambiguous 'bank'."""
    sample_sentences = [
        "The bank can guarantee deposits will eventually cover future "
        "tuition costs because it invests in adjustable-rate mortgage "
        "securities.",
    ]
    wsd(sample_sentences)


if __name__ == "__main__":
    main()
Output:-
Word: bank
Example: ['he cashed a check at the bank', 'that bank holds the mortgage
on my home']
-------------------------------------------------
Word: bank
Example: ['he cashed a check at the bank', 'that bank holds the mortgage
on my home']
-------------------------------------------------
Word: bark
-------------------------------------------------
Word: bark
Definition: tough protective covering of the woody stems and roots of trees
and other woody plants
-------------------------------------------------
Word: address
Example: ['he didn't leave an address', 'my address is 123 Main Street']
-------------------------------------------------
Word: address
-------------------------------------------------
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer

nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

# Sample text
# NOTE(review): the original sample text was lost in extraction;
# substitute any sentence — the pipeline below is text-agnostic.
text = "NLTK is a leading platform for building Python programs."
words = word_tokenize(text)
porter = PorterStemmer()
# Stem every token with the Porter stemmer.
stemmed_words = [porter.stem(word) for word in words]
print("Original text:")
print(text)
print("\nStemmed text:")
print(" ".join(stemmed_words))
output:-
Original text:
Stemmed text:
import nltk
Output:-
import nltk
import math
def morphological_analysis(text):
    """Tokenize *text*, remove English stopwords, and lemmatize the rest.

    NOTE(review): the expected output keeps 'The' but drops 'the' and
    'over', so the stopword filter must be case-SENSITIVE — confirm
    this is intended before changing it.
    """
    tokens = nltk.word_tokenize(text)
    # Remove stopwords
    stop_words = set(stopwords.words('english'))
    tokens = [tok for tok in tokens if tok not in stop_words]
    # Perform lemmatization
    lemmatizer = WordNetLemmatizer()
    tokens = [lemmatizer.lemmatize(tok) for tok in tokens]
    return tokens


def generate_ngrams(text, n):
    """Return the list of n-grams over the stopword-filtered tokens of *text*.

    NOTE(review): the printed 3-grams omit 'over'/'the' but keep 'jumps'
    unlemmatized, so n-grams are built from filtered (not lemmatized)
    tokens — reconstructed accordingly; confirm against the original.
    """
    tokens = nltk.word_tokenize(text)
    stop_words = set(stopwords.words('english'))
    tokens = [tok for tok in tokens if tok not in stop_words]
    # Generate n-grams
    n_grams = list(nltk.ngrams(tokens, n))
    return n_grams
def calculate_ngram_smoothing(n_grams):
    """Estimate add-one (Laplace) smoothed probabilities for *n_grams*.

    For each distinct n-gram, P(w_n | context) =
    (count(n-gram) + 1) / (count(context) + V), where the context is the
    n-gram minus its last word and V is the vocabulary size. Returns a
    dict mapping each distinct n-gram tuple to its probability. An empty
    input yields an empty dict.
    """
    n_gram_counts = Counter(n_grams)
    # Count each (n-1)-gram context and the vocabulary size once, up front.
    context_counts = Counter(ng[:-1] for ng in n_grams)
    vocab_size = len({word for ng in n_grams for word in ng})
    n_gram_probabilities = {}
    for n_gram, count in n_gram_counts.items():
        context = n_gram[:-1]
        probability = (count + 1) / (context_counts[context] + vocab_size)
        n_gram_probabilities[n_gram] = probability
    return n_gram_probabilities
def main():
    """Demo driver: morphological analysis, n-gram extraction, smoothing."""
    text = "The quick brown fox jumps over the lazy dog."
    print("Original Text:", text)
    # a) Morphological Analysis
    morph_analysis_result = morphological_analysis(text)
    print("Morphological Analysis:", morph_analysis_result)
    # b) Generate n-grams
    n = 3
    n_grams = generate_ngrams(text, n)
    print("\n{}-grams:".format(n), n_grams)
    # c) N-Grams Smoothing
    n_gram_probabilities = calculate_ngram_smoothing(n_grams)
    print("\nSmoothed n-gram probabilities:", n_gram_probabilities)


if __name__ == "__main__":
    main()
Output:-
Original Text: The quick brown fox jumps over the lazy dog.
Morphological Analysis: ['The', 'quick', 'brown', 'fox', 'jump', 'lazy', 'dog', '.']
3-grams: [('The', 'quick', 'brown'), ('quick', 'brown', 'fox'), ('brown', 'fox', 'jumps'),
('fox', 'jumps', 'lazy'), ('jumps', 'lazy', 'dog'), ('lazy', 'dog', '.')]
8. Use the SpeechRecognition and pyttsx3 packages to convert an audio
file to text and a text string to an audio file.
import speech_recognition as sr
import pyttsx3
def audio_to_text(audio_file):
    """Transcribe *audio_file* (WAV/AIFF/FLAC) via the Google Web Speech API.

    Returns the recognized text, or a human-readable error message when
    the audio is unintelligible or the web service cannot be reached
    (requires network access).
    """
    recognizer = sr.Recognizer()
    # Load the whole file into an AudioData object.
    with sr.AudioFile(audio_file) as source:
        audio_data = recognizer.record(source)
    # Convert audio to text
    try:
        text = recognizer.recognize_google(audio_data)
        return text
    except sr.UnknownValueError:
        return "Speech Recognition could not understand the audio"
    except sr.RequestError as e:
        return f"Could not request results from Speech Recognition service; {e}"
def text_to_audio(text, output_file):
    """Synthesize *text* to speech and save it to *output_file*.

    Uses the platform's default TTS voice via pyttsx3; runAndWait()
    blocks until the file has been written.
    """
    engine = pyttsx3.init()
    engine.save_to_file(text, output_file)
    engine.runAndWait()
if __name__ == "__main__":
audio_file = "audio_sample.wav"
text = audio_to_text(audio_file)
# Text to audio
output_file = "output_audio.wav"
text_to_audio(text, output_file)
Output:-