NLP Lab Manual 3-2 Aiml R22 Update

The document outlines the syllabus and lab experiments for the Natural Language Processing course at Jawaharlal Nehru Technological University, Hyderabad. It includes objectives, outcomes, and detailed descriptions of various programming tasks using Python and the NLTK library, such as tokenization, stemming, word analysis, and word generation. Additionally, it lists textbooks and reference materials related to the course content.

R22 B.Tech. AI & ML Syllabus, JNTU Hyderabad
AM604PC: NATURAL LANGUAGE PROCESSING LAB
B.Tech. III Year II Sem    L T P C: 0 0 3 1.5

Prerequisites: Data structures, finite automata and probability theory.

Course Objectives: To develop and explore the problems and solutions of NLP.

Course Outcomes:

Show sensitivity to linguistic phenomena and an ability to model them with formal grammars.
Demonstrate knowledge of the NLTK library and its implementation.
Work on strings and trees, and estimate parameters using supervised and unsupervised training
methods.

List of Experiments

1. Write a Python program to perform the following tasks on text
   a) Tokenization b) Stop word Removal
2. Write a Python program to implement the Porter stemmer algorithm for stemming
3. Write Python programs for a) Word Analysis b) Word Generation
4. Create a sample list of at least 5 words with ambiguous senses and write a Python program
   to implement WSD
5. Install the NLTK toolkit and perform stemming
6. Create a sample list of at least 10 words for POS tagging and find the POS for any given word
7. Write a Python program to
   a) Perform Morphological Analysis using the NLTK library
   b) Generate n-grams using NLTK's n-gram library
   c) Implement N-Grams Smoothing
8. Using NLTK package to convert audio files to text and text files to audio files.

TEXT BOOKS:

1. Daniel M. Bikel and Imed Zitouni, Multilingual Natural Language Processing Applications: From Theory to Practice, Pearson.
2. Practical Natural Language Processing: A Comprehensive Guide to Building Real-World NLP Systems, O'Reilly.
3. Daniel Jurafsky and James H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Pearson, 2014.

REFERENCE BOOKS:

1. Steven Bird, Ewan Klein and Edward Loper, Natural Language Processing with Python, First Edition, O'Reilly Media, 2009.


Experiment No.1: Write a Python program to perform the following tasks on text
a) Tokenization b) Stop word Removal

Aim: Write a Python program to perform the following tasks on text: a) Tokenization and b) Stop word Removal

Description:
1) Import Necessary Libraries: The program imports the required names from NLTK: word_tokenize
for tokenization and the stopwords corpus for obtaining a list of stop words.

2) Download NLTK Resources: Before using NLTK functions, the program checks if the required resources
(word tokenizer and stop words corpus) are downloaded. If not, it downloads them.

3) Tokenization and Stop Word Removal Function: The tokenize_and_remove_stopwords function takes
the input text as an argument. It tokenizes the text using NLTK's word_tokenize function to split it into
individual words. Then, it retrieves the English stop words using stopwords.words('english'). Finally, it
removes stop words from the list of tokens and returns the filtered tokens.

4) Main Function: The main function serves as the entry point of the program. It contains a sample text for
demonstration purposes. It calls the tokenize_and_remove_stopwords function to process the text and then
prints both the original text and the processed text with stop words removed.

Program:
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

# Download NLTK resources if not already downloaded
nltk.download('punkt')
nltk.download('stopwords')

def tokenize_and_remove_stopwords(text):
    """
    Tokenizes the input text and removes stop words.

    Args:
        text (str): The input text to be processed.

    Returns:
        list: A list of tokens after removing stop words.
    """
    # Tokenize the text
    tokens = word_tokenize(text)
    # Get English stop words
    stop_words = set(stopwords.words('english'))
    # Remove stop words from tokens
    filtered_tokens = [word for word in tokens if word.lower() not in stop_words]
    return filtered_tokens

def main():
    """
    Main function to demonstrate tokenization and stop word removal.
    """
    # Sample text for demonstration
    text = "This is a sample sentence, showing off the stop words removal and tokenization."
    # Tokenize and remove stop words
    processed_text = tokenize_and_remove_stopwords(text)
    # Print original text
    print("Original text:")
    print(text)
    # Print tokenized text with stop words removed
    print("\nTokenized text with stop words removed:")
    print(processed_text)

if __name__ == "__main__":
    main()

This program first tokenizes the input text using NLTK's word_tokenize function and then removes
stop words using NLTK's English stop words list. Finally, it prints both the original text and the
processed text with stop words removed.

Output:
Original text: This is a sample sentence, showing off the stop words removal and tokenization.
Tokenized text with stop words removed:
['sample', 'sentence', ',', 'showing', 'stop', 'words', 'removal', 'tokenization', '.']

Original text: The original input text is displayed.

Tokenized text with stop words removed: The input text is tokenized into individual words, and then
stop words are removed. The resulting list contains only the meaningful words from the original text,
excluding common stop words like "This", "is", "a", "the", "and", etc. Punctuation tokens such as ','
and '.' remain, since they are not in the stop word list.

Viva:
1) What is tokenization?
2) How does tokenization differ from stemming or lemmatization?
3) What are stop words, and why are they removed during text processing?
4) Why is it important to convert all words to lowercase before removing stop words?
5) Can you explain the purpose of NLTK in natural language processing tasks like tokenization and
stop word removal?


Experiment No.2: Write a Python program to implement Porter stemmer algorithm for
stemming

Aim: Write a Python program to implement Porter stemmer algorithm for stemming

Description:
1) Import Necessary Libraries: The program imports the PorterStemmer class from the nltk.stem
module. This class implements the Porter stemming algorithm.
2) Porter Stemming Function: The porter_stemming function takes the input text, tokenizes it,
initializes a PorterStemmer object, and applies the stemming algorithm to each token using the
stem method. The stemmed tokens are joined back into a single string, which is returned.
3) Main Function: The main function serves as the entry point of the program. It contains a sample
text for demonstration, calls porter_stemming on it, and then prints both the original text and the
stemmed text.
The Porter stemming algorithm reduces words to their root forms, which can help in tasks like text
normalization and information retrieval. It removes common suffixes from words, but it might not
always produce a valid word, as it operates on a fixed set of rules.
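A few isolated examples make the rule-based, sometimes non-word output concrete (the words below are illustrative choices, not taken from the experiment text):

```python
from nltk.stem import PorterStemmer

porter = PorterStemmer()
# Suffix stripping can yield real roots or truncated non-words.
for word in ["running", "caresses", "ponies", "tokenization"]:
    print(word, "->", porter.stem(word))
# running -> run
# caresses -> caress
# ponies -> poni          (not a valid English word)
# tokenization -> token
```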

Program:
import nltk
from nltk.stem import PorterStemmer

# Download NLTK resources if not already downloaded
nltk.download('punkt')

def porter_stemming(text):
    """
    Applies the Porter stemming algorithm to the input text.

    Args:
        text (str): The input text to be stemmed.

    Returns:
        str: The stemmed text.
    """
    # Initialize Porter stemmer
    porter = PorterStemmer()
    # Tokenize the text
    tokens = nltk.word_tokenize(text)
    # Apply stemming to each token
    stemmed_tokens = [porter.stem(token) for token in tokens]
    # Join the stemmed tokens back into a single string
    stemmed_text = ' '.join(stemmed_tokens)
    return stemmed_text

def main():
    """
    Main function to demonstrate Porter stemming.
    """
    # Sample text for demonstration
    text = ("It is important to be very pythonly while you are pythoning with python. "
            "All pythoners have pythoned poorly at least once.")
    # Apply Porter stemming
    stemmed_text = porter_stemming(text)
    # Print original and stemmed text
    print("Original text:")
    print(text)
    print("\nStemmed text using Porter stemming algorithm:")
    print(stemmed_text)

if __name__ == "__main__":
    main()

This program utilizes NLTK's PorterStemmer class to perform stemming on the given text. The
porter_stemming function tokenizes the input text, applies stemming to each token using the Porter
stemmer, and then joins the stemmed tokens back into a single string. Finally, the main function
demonstrates the use of the Porter stemming algorithm by applying it to a sample text.

Output:
Original text:
It is important to be very pythonly while you are pythoning with python. All pythoners have
pythoned poorly at least once.

Stemmed text using Porter stemming algorithm:
it is import to be veri pythonli while you are python with python . all python have python poorli at
least onc .

(Note that the stemmer lowercases its output and that punctuation appears as separate tokens.)

Viva:
1) What is stemming, and why is it used in natural language processing?
2) How does the Porter stemming algorithm work?
3) Can you explain the process of tokenization in the context of stemming?
4) Why is it necessary to preprocess text before applying stemming algorithms?
5) What are some advantages and limitations of the Porter stemming algorithm compared to other
stemming algorithms?


Experiment No.3: Write Python Programs for a) Word Analysis b) Word Generation

Aim: Write Python Programs for a) Word Analysis b) Word Generation

Description:
1) Word Analysis: In this program, we analyze a given text to understand the frequency of
occurrence of each word. This helps in gaining insights into the most commonly used words in
the text.
2) Word Generation: We generate new word sequences using the concept of Markov chains. Markov
chains are stochastic models that describe a sequence of possible events in which the probability of
each event depends only on the state attained in the previous event. NLTK does not ship a ready-made
Markov text generator, so we build a simple bigram-based chain using nltk.bigrams together with
Python's random module.

Program:
import random
from collections import defaultdict

import nltk

# Download NLTK resources if not already downloaded
nltk.download('punkt')

def word_analysis(text):
    """
    Analyzes the given text to calculate the frequency of occurrence of each word.

    Args:
        text (str): The input text to be analyzed.

    Returns:
        nltk.FreqDist: A frequency distribution mapping words to their counts.
    """
    tokens = nltk.word_tokenize(text)
    word_freq = nltk.FreqDist(tokens)
    return word_freq

def word_generation(text, num_words=10):
    """
    Generates a word sequence using a bigram Markov chain built from the text.
    (NLTK has no MarkovModel class, so the chain is built by hand from nltk.bigrams.)

    Args:
        text (str): The input text to generate new words from.
        num_words (int): The number of words to generate.

    Returns:
        list: A list of generated words.
    """
    tokens = nltk.word_tokenize(text)
    # Map each word to the list of words that follow it in the text
    transitions = defaultdict(list)
    for w1, w2 in nltk.bigrams(tokens):
        transitions[w1].append(w2)
    # Walk the chain, picking a random successor at each step
    word = random.choice(tokens)
    generated_words = [word]
    for _ in range(num_words - 1):
        successors = transitions.get(word)
        if not successors:
            break
        word = random.choice(successors)
        generated_words.append(word)
    return generated_words

def main():
    """
    Main function to demonstrate word analysis and generation.
    """
    # Sample text for demonstration
    text = ("The quick brown fox jumps over the lazy dog. "
            "The dog barks loudly. The fox runs away quickly.")
    # Word analysis
    word_freq = word_analysis(text)
    print("Word Analysis:")
    print(word_freq.most_common(5))  # Display 5 most common words
    # Word generation
    generated_words = word_generation(text)
    print("\nWord Generation:")
    print(generated_words)

if __name__ == "__main__":
    main()

Output:
Word Analysis:
[('The', 3), ('.', 3), ('fox', 2), ('dog', 2), ('quick', 1)]

Word Generation (the chain is sampled randomly, so each run differs; one possible run):
['The', 'fox', 'runs', 'away', 'quickly', '.', 'The', 'dog', 'barks', 'loudly']

(Note that FreqDist is case-sensitive: "The" and "the" are counted separately.)
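FreqDist is essentially a Counter with extra reporting helpers; what its two most common extras compute can be mimicked with the standard library (toy tokens below, not the experiment text):

```python
from collections import Counter

tokens = ["the", "fox", "the", "dog", "the"]
fd = Counter(tokens)

print(fd.most_common(2))                     # [('the', 3), ('fox', 1)]
# FreqDist.freq(w): relative frequency of w
print(fd["fox"] / sum(fd.values()))          # 0.2
# FreqDist.hapaxes(): words occurring exactly once
print([w for w, c in fd.items() if c == 1])  # ['fox', 'dog']
```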

Viva:
1) How does word analysis help in understanding the characteristics of a text?
2) What is the purpose of using Markov chains in word generation?
3) How does NLTK's FreqDist function work in word analysis?
4) Can you explain the concept of stochastic models in the context of Markov chains?
5) What are some potential applications of word generation using Markov models in natural
language processing?


Experiment No.4: Create a Sample list for at least 5 words with ambiguous sense and Write a
Python program to implement WSD

Aim: Create a Sample list for at least 5 words with ambiguous sense and Write a Python program
to implement WSD

Description:
Word Sense Disambiguation (WSD) is the task of determining the correct meaning of a word with
multiple meanings (senses) based on the context in which it appears. The Lesk algorithm is a popular
approach to WSD: it compares the words surrounding the target word with the words in the dictionary
definitions (glosses) of each candidate sense, and selects the sense with the greatest overlap.
Sample List of Words with Ambiguous Senses:
"bank"
"bat"
"crane"
"light"
"bass"

Program:
import nltk
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize

# Download NLTK resources if not already downloaded
nltk.download('punkt')
nltk.download('wordnet')

def wsd(word, sentence):
    """
    Implements Word Sense Disambiguation (WSD) using the Lesk algorithm.

    Args:
        word (str): The ambiguous word for which WSD is performed.
        sentence (str): The sentence containing the ambiguous word.

    Returns:
        str: The definition of the disambiguated sense of the word.
    """
    tokens = word_tokenize(sentence)
    sense = lesk(tokens, word)
    return sense.definition() if sense else "No appropriate sense found"

def main():
    """
    Main function to demonstrate Word Sense Disambiguation (WSD).
    """
    # Sample list of words with ambiguous senses
    words = ["bank", "bat", "crane", "light", "bass"]
    # Sample sentence for each word
    sentences = [
        "I deposited money in the bank.",
        "The baseball player swung the bat.",
        "The crane lifted heavy loads at the construction site.",
        "Turn on the light, please.",
        "He caught a large bass while fishing."
    ]
    # Perform WSD for each word in the list
    for word, sentence in zip(words, sentences):
        print(f"Word: {word}")
        print(f"Sentence: {sentence}")
        print(f"Sense: {wsd(word, sentence)}\n")

if __name__ == "__main__":
    main()

Output:
Word: bank
Sentence: I deposited money in the bank.
Sense: a financial institution that accepts deposits and channels the money into lending activities
Word: bat
Sentence: The baseball player swung the bat.
Sense: (baseball) a club used for hitting a ball in various games
Word: crane
Sentence: The crane lifted heavy loads at the construction site.
Sense: large long-necked wading bird of marshes and plains in many parts of the world
Word: light
Sentence: Turn on the light, please.
Sense: (physics) electromagnetic radiation that can produce a visual sensation
Word: bass
Sentence: He caught a large bass while fishing.
Sense: the lowest part of the musical range

Note that Lesk simply picks whichever sense has the greatest gloss overlap, so (as with "crane"
and "bass" above) it does not always return the intuitively correct sense.

Viva:
1) What is Word Sense Disambiguation (WSD) and why is it important in natural language
processing?
2) Can you explain the Lesk algorithm and how it works for WSD?
3) How does the context of a word in a sentence help in determining its correct sense?
4) What are some challenges faced in implementing WSD algorithms?
5) Are there any limitations of the Lesk algorithm? If so, what are they, and how can they be
addressed?


Experiment No.5: Install NLTK tool kit and perform stemming

Aim: Install NLTK tool kit and perform stemming

Description:
To install NLTK, you can use pip, Python's package manager. Here's the command to install NLTK:
pip install nltk
Once NLTK is installed, you can perform stemming using various stemming algorithms available in
NLTK. One of the popular stemming algorithms is the Porter stemming algorithm.
Stemming is the process of reducing words to their root or base form. NLTK provides various
stemming algorithms, such as the Porter, Lancaster, and Snowball stemmers. In this program, we'll
use the Porter stemming algorithm to perform stemming on a sample text.
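The three stemmers named above can disagree on the same word; a quick side-by-side comparison (the word choices are illustrative):

```python
from nltk.stem import PorterStemmer, LancasterStemmer, SnowballStemmer

porter = PorterStemmer()
lancaster = LancasterStemmer()          # more aggressive than Porter
snowball = SnowballStemmer("english")   # "Porter2", a refined Porter

# Print each word with its three stems side by side.
for word in ["running", "generously", "maximum"]:
    print(f"{word}: {porter.stem(word)} / {lancaster.stem(word)} / {snowball.stem(word)}")
```

None of these stemmers require nltk.download, since they are pure rule-based code rather than trained models.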

Program:
import nltk
from nltk.stem import PorterStemmer

# Download NLTK resources if not already downloaded
nltk.download('punkt')

def perform_stemming(text):
    """
    Performs stemming on the input text using the Porter stemming algorithm.

    Args:
        text (str): The input text to be stemmed.

    Returns:
        str: The stemmed text.
    """
    # Initialize the Porter stemmer
    porter = PorterStemmer()
    # Tokenize the text
    tokens = nltk.word_tokenize(text)
    # Apply stemming to each token
    stemmed_tokens = [porter.stem(token) for token in tokens]
    # Join the stemmed tokens back into a single string
    stemmed_text = ' '.join(stemmed_tokens)
    return stemmed_text

def main():
    """
    Main function to demonstrate stemming using NLTK.
    """
    # Sample text for demonstration
    text = "It is important to be very pythonly while you are pythoning with python."
    # Perform stemming
    stemmed_text = perform_stemming(text)
    # Print original and stemmed text
    print("Original text:")
    print(text)
    print("\nStemmed text using Porter stemming algorithm:")
    print(stemmed_text)

if __name__ == "__main__":
    main()

Output:
Original text:
It is important to be very pythonly while you are pythoning with python.

Stemmed text using Porter stemming algorithm:
it is import to be veri pythonli while you are python with python .

Viva:
1) What is stemming, and why is it used in natural language processing?
2) Can you explain the Porter stemming algorithm and how it works?
3) How does NLTK facilitate stemming in Python?
4) Are there any limitations of the Porter stemming algorithm? If so, what are they?
5) How does stemming differ from lemmatization, and in what scenarios would you prefer one over
the other?


Experiment No.6: Create a sample list of at least 10 words for POS tagging and find the POS for any
given word

Aim: Create a sample list of at least 10 words for POS tagging and find the POS for any given word

Description:
Part-of-Speech (POS) tagging is the process of assigning grammatical categories (such as noun, verb,
adjective, etc.) to words in a text. NLTK provides a variety of tools and algorithms for POS tagging,
which can be used to analyze and understand the structure of sentences.

Program:
import nltk

# Download NLTK resources if not already downloaded
nltk.download('averaged_perceptron_tagger')

def pos_tagging(words):
    """
    Performs Part-of-Speech (POS) tagging on the given list of words.

    Args:
        words (list): The list of words to be tagged.

    Returns:
        list: A list of tuples containing (word, POS_tag) pairs.
    """
    tagged_words = nltk.pos_tag(words)
    return tagged_words

def find_pos(word, tagged_words):
    """
    Finds the Part-of-Speech (POS) tag for the given word in the tagged words.

    Args:
        word (str): The word for which the POS tag needs to be found.
        tagged_words (list): A list of tuples containing (word, POS_tag) pairs.

    Returns:
        str: The POS tag for the given word.
    """
    for tagged_word in tagged_words:
        if tagged_word[0].lower() == word.lower():
            return tagged_word[1]
    return "POS tag not found"

def main():
    """
    Main function to demonstrate POS tagging and finding the POS for a given word.
    """
    # Sample list of words
    words = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog", "in", "the", "park"]
    # Perform POS tagging
    tagged_words = pos_tagging(words)
    # Print POS tags for each word
    print("POS tagging:")
    for word, pos_tag in tagged_words:
        print(f"{word}: {pos_tag}")
    # Find POS for a given word
    search_word = "fox"
    pos = find_pos(search_word, tagged_words)
    print(f"\nPOS for '{search_word}': {pos}")

if __name__ == "__main__":
    main()

Output:
POS tagging:
The: DT
quick: JJ
brown: NN
fox: NN
jumps: VBZ
over: IN
the: DT
lazy: JJ
dog: NN
in: IN
the: DT
park: NN
POS for 'fox': NN
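The tags in the output are Penn Treebank tags; nltk.help.upenn_tagset('NN') prints the official definitions, and a small hand-written lookup covering the tags above (descriptions paraphrased) shows the idea:

```python
# Hand-written glossary for the Penn Treebank tags seen in the output above.
PENN_TAGS = {
    "DT": "determiner",
    "JJ": "adjective",
    "NN": "noun, singular or mass",
    "VBZ": "verb, 3rd person singular present",
    "IN": "preposition or subordinating conjunction",
}

for word, tag in [("fox", "NN"), ("jumps", "VBZ"), ("lazy", "JJ")]:
    print(f"{word}: {tag} ({PENN_TAGS[tag]})")
```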

Viva:
1) What is Part-of-Speech (POS) tagging, and why is it important in natural language processing?
2) How does NLTK facilitate POS tagging in Python?
3) Can you explain the meaning of common POS tags such as 'NN', 'VBZ', 'JJ', 'IN', and 'DT'?
4) How accurate are POS taggers, and what factors can affect their accuracy?
5) Can you describe a scenario where POS tagging is useful in real-world applications?


Experiment No.7: Write a Python program to


a) Perform Morphological Analysis using NLTK library
b) Generate n-grams using NLTK N-Grams library
c) Implement N-Grams Smoothing

Aim: Write a Python program to


a) Perform Morphological Analysis using NLTK library
b) Generate n-grams using NLTK N-Grams library
c) Implement N-Grams Smoothing

Description:
1) Morphological Analysis: Morphological analysis involves analyzing the structure of words to
understand their meaning and grammatical properties. NLTK provides tools to perform
morphological analysis, such as stemming and lemmatization.
2) N-Grams Generation: N-grams are contiguous sequences of n items (words, characters, etc.)
from a given text. NLTK provides functions to generate n-grams from a list of tokens.
3) N-Grams Smoothing: N-gram smoothing is a technique used to address the sparsity problem in
language models by assigning non-zero probabilities to unseen n-grams. Here, we'll implement
simple add-one (Laplace) smoothing for n-grams.
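Add-one smoothing is just P(w | context) = (count(context, w) + 1) / (count(context) + V), where V is the vocabulary size; a hand-computed sketch with made-up counts:

```python
def laplace_probability(ngram_count, context_count, vocab_size):
    """Add-one (Laplace) smoothed conditional probability."""
    return (ngram_count + 1) / (context_count + vocab_size)

# Seen bigram: ("brown", "fox") occurred once, "brown" occurred once, V = 8.
print(laplace_probability(1, 1, 8))  # 0.2222... (2/9)
# An unseen bigram with the same context still gets non-zero mass:
print(laplace_probability(0, 1, 8))  # 0.1111... (1/9)
```

The nltk.lm.Laplace model used in the program below computes exactly this quantity over its trained counts.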

Program:
import nltk
from nltk.util import ngrams
from nltk.lm import Laplace
from nltk.lm.preprocessing import padded_everygram_pipeline

# Download NLTK resources if not already downloaded
nltk.download('punkt')
nltk.download('wordnet')

def morphological_analysis(word):
    """
    Performs morphological analysis on the given word using NLTK's WordNet Lemmatizer.

    Args:
        word (str): The word to be analyzed.

    Returns:
        str: The base form of the word (lemma).
    """
    lemmatizer = nltk.WordNetLemmatizer()
    # Without a POS hint the lemmatizer assumes a noun, so "running" stays "running";
    # pos='v' treats the word as a verb and returns "run".
    lemma = lemmatizer.lemmatize(word, pos='v')
    return lemma

def generate_ngrams(text, n):
    """
    Generates n-grams from the given text.

    Args:
        text (str): The input text from which n-grams will be generated.
        n (int): The size of n-grams (e.g., 2 for bigrams, 3 for trigrams, etc.).

    Returns:
        list: A list of n-grams.
    """
    tokens = nltk.word_tokenize(text)
    ngrams_list = list(ngrams(tokens, n))
    return ngrams_list

def ngram_smoothing(text, n):
    """
    Trains a Laplace (add-one) smoothed n-gram language model.
    (nltk.lm models are trained from padded everygrams, not from a raw n-gram list.)

    Args:
        text (str): The training text.
        n (int): The model order.

    Returns:
        nltk.lm.Laplace: A Laplace language model trained on the text.
    """
    tokens = nltk.word_tokenize(text)
    train_data, vocab = padded_everygram_pipeline(n, [tokens])
    laplace = Laplace(n)
    laplace.fit(train_data, vocab)
    return laplace

def main():
    """
    Main function to demonstrate morphological analysis, n-gram generation, and n-gram smoothing.
    """
    # Sample word for morphological analysis
    word = "running"
    lemma = morphological_analysis(word)
    print(f"Morphological analysis of '{word}': {lemma}")
    # Sample text for n-gram generation and smoothing
    text = "The quick brown fox jumps over the lazy dog"
    print("\nOriginal text:")
    print(text)
    # Generate trigrams
    n = 3
    trigrams_list = generate_ngrams(text, n)
    print(f"\nGenerated {n}-grams:")
    print(trigrams_list)
    # Apply n-gram smoothing
    laplace_model = ngram_smoothing(text, n)
    print("\nSmoothed probability of 'fox' given 'quick brown':")
    print(laplace_model.score('fox', ['quick', 'brown']))

if __name__ == "__main__":
    main()

Output:
Morphological analysis of 'running': run

Original text:
The quick brown fox jumps over the lazy dog

Generated 3-grams:
[('The', 'quick', 'brown'), ('quick', 'brown', 'fox'), ('brown', 'fox', 'jumps'), ('fox', 'jumps', 'over'),
('jumps', 'over', 'the'), ('over', 'the', 'lazy'), ('the', 'lazy', 'dog')]

Smoothed probability of 'fox' given 'quick brown':
0.15384615384615385

Viva:
1) What is morphological analysis, and why is it important in natural language processing?
2) Can you explain how NLTK's WordNet Lemmatizer works for morphological analysis?
3) What are n-grams, and how are they useful in language modeling?
4) How does Laplace (add-one) smoothing address the sparsity problem in n-gram language
models?
5) Are there any drawbacks or limitations of Laplace smoothing? If so, what are they, and how can
they be mitigated?

By R. A. B (KLH)

Experiment No.8: Using NLTK package to convert audio files to text and text files to audio files.

Aim: Convert an audio file to text and a text file to audio.

Description:
NLTK itself provides no speech interface, so this experiment pairs the text-processing pipeline with
two companion libraries:
1) Converting Audio to Text: Speech recognition is the process of converting spoken language into
text. We use the SpeechRecognition library (pip install SpeechRecognition) to convert an audio file
into text via Google's web recognizer.
2) Converting Text to Audio: Text-to-speech (TTS) is the process of converting text into spoken
language. We use the gTTS library (pip install gTTS) to synthesize speech from text and save it as an
audio file.

Program:
import speech_recognition as sr
from gtts import gTTS

def audio_to_text(audio_file):
    """
    Converts an audio file to text using speech recognition.

    Args:
        audio_file (str): The path to the audio file (WAV/AIFF/FLAC).

    Returns:
        str: The recognized text from the audio file.
    """
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_file) as source:
        audio_data = recognizer.record(source)
    # Uses Google's free web API; requires an internet connection
    text = recognizer.recognize_google(audio_data)
    return text

def text_to_audio(text, output_file):
    """
    Converts text to audio and saves it as a file.

    Args:
        text (str): The text to be converted to audio.
        output_file (str): The path to save the output audio file.
    """
    tts = gTTS(text=text, lang='en')
    tts.save(output_file)

def main():
    """
    Main function to demonstrate audio-to-text and text-to-audio conversion.
    """
    # Converting audio file to text
    audio_file = "sample_audio.wav"
    recognized_text = audio_to_text(audio_file)
    print("Audio to Text:")
    print(recognized_text)
    # Converting text to audio
    text = "This is a sample text-to-speech conversion."
    output_file = "output_audio.mp3"
    text_to_audio(text, output_file)
    print("\nText to Audio: Conversion successful")

if __name__ == "__main__":
    main()

Output:
Audio to Text:
this is a sample audio file for testing text-to-speech conversion
Text to Audio: Conversion successful

Viva:
1) What is speech recognition, and how does it work?
2) Which Python libraries are used here for speech recognition and speech synthesis, and why is
NLTK alone not sufficient?
3) Can you explain the process of converting an audio file to text using the SpeechRecognition
library?
4) What are some potential challenges or limitations of speech recognition systems?
5) What is text-to-speech (TTS) synthesis, and why is it useful in natural language processing
applications?
