NLP Lab Manual 3-2 Aiml R22 Update

The document outlines the syllabus and lab experiments for the Natural Language Processing course at Jawaharlal Nehru Technological University, Hyderabad. It includes objectives, outcomes, and detailed descriptions of various programming tasks using Python and the NLTK library, such as tokenization, stemming, word analysis, and word generation. Additionally, it lists textbooks and reference materials related to the course content.

R22 B.Tech. AI & ML Syllabus, JNTU Hyderabad
AM604PC: NATURAL LANGUAGE PROCESSING LAB
B.Tech. III Year II Sem    L T P C: 0 0 3 1.5

Prerequisites: Data structures, finite automata and probability theory.

Course Objectives: To develop and explore the problems and solutions of NLP.

Course Outcomes:

Show sensitivity to linguistic phenomena and an ability to model them with formal grammars.
Demonstrate knowledge of the NLTK library and its implementation.
Work on strings and trees, and estimate parameters using supervised and unsupervised training
methods.

List of Experiments

1. Write a Python program to perform the following tasks on text
   a) Tokenization b) Stop word Removal
2. Write a Python program to implement the Porter stemmer algorithm for stemming
3. Write Python programs for a) Word Analysis b) Word Generation
4. Create a sample list of at least 5 words with ambiguous senses and write a Python program
   to implement WSD
5. Install the NLTK toolkit and perform stemming
6. Create a sample list of at least 10 words for POS tagging and find the POS for any given word
7. Write a Python program to
   a) Perform Morphological Analysis using the NLTK library
   b) Generate n-grams using NLTK's n-gram library
   c) Implement N-Grams Smoothing
8. Using NLTK package to convert audio files to text and text files to audio files.

TEXT BOOKS:

1. Daniel M. Bikel and Imed Zitouni, Multilingual Natural Language Processing Applications: From Theory to Practice, Pearson.
2. Practical Natural Language Processing: A Comprehensive Guide to Building Real-World NLP Systems, O'Reilly.
3. Daniel Jurafsky and James H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Pearson, 2014.

REFERENCE BOOKS:

1. Steven Bird, Ewan Klein and Edward Loper, Natural Language Processing with Python, First Edition, O'Reilly Media, 2009.


Experiment No.1: Write a Python program to perform the following tasks on text
a) Tokenization b) Stop word Removal

Aim: Write a Python program to perform the following tasks on text: a) Tokenization and b) Stop word Removal

Description:
1) Import Necessary Libraries: The program imports the required names from NLTK: word_tokenize
for tokenization and the stopwords corpus for obtaining a list of stop words.

2) Download NLTK Resources: Before using NLTK functions, the program checks if the required resources
(word tokenizer and stop words corpus) are downloaded. If not, it downloads them.

3) Tokenization and Stop Word Removal Function: The tokenize_and_remove_stopwords function takes
the input text as an argument. It tokenizes the text using NLTK's word_tokenize function to split it into
individual words. Then, it retrieves the English stop words using stopwords.words('english'). Finally, it
removes stop words from the list of tokens and returns the filtered tokens.

4) Main Function: The main function serves as the entry point of the program. It contains a sample text for
demonstration purposes. It calls the tokenize_and_remove_stopwords function to process the text and then
prints both the original text and the processed text with stop words removed.

Program:
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

# Download NLTK resources if not already downloaded
nltk.download('punkt')
nltk.download('stopwords')

def tokenize_and_remove_stopwords(text):
    """
    Tokenizes the input text and removes stop words.

    Args:
        text (str): The input text to be processed.

    Returns:
        list: A list of tokens after removing stop words.
    """
    # Tokenize the text
    tokens = word_tokenize(text)
    # Get English stop words
    stop_words = set(stopwords.words('english'))
    # Remove stop words from tokens
    filtered_tokens = [word for word in tokens if word.lower() not in stop_words]
    return filtered_tokens

def main():
    """
    Main function to demonstrate tokenization and stop word removal.
    """
    # Sample text for demonstration
    text = "This is a sample sentence, showing off the stop words removal and tokenization."
    # Tokenize and remove stop words
    processed_text = tokenize_and_remove_stopwords(text)
    # Print original text
    print("Original text:")
    print(text)
    # Print tokenized text with stop words removed
    print("\nTokenized text with stop words removed:")
    print(processed_text)

if __name__ == "__main__":
    main()

This program first tokenizes the input text using NLTK's word_tokenize function and then removes
stop words using NLTK's English stop words list. Finally, it prints both the original text and the
processed text with stop words removed.

Output:
Original text: This is a sample sentence, showing off the stop words removal and tokenization.
Tokenized text with stop words removed:
['sample', 'sentence', ',', 'showing', 'stop', 'words', 'removal', 'tokenization', '.']

Original text: The original input text is displayed.

Tokenized text with stop words removed: The input text is tokenized into individual words, and then
stop words are removed. The resulting list contains only the meaningful words from the original text,
excluding common stop words like "This", "is", "a", "the", "and", etc. Punctuation tokens such as ','
and '.' remain, since they are not in the stop word list.

Viva:
1) What is tokenization?
2) How does tokenization differ from stemming or lemmatization?
3) What are stop words, and why are they removed during text processing?
4) Why is it important to convert all words to lowercase before removing stop words?
5) Can you explain the purpose of NLTK in natural language processing tasks like tokenization and
stop word removal?


Experiment No.2: Write a Python program to implement Porter stemmer algorithm for
stemming

Aim: Write a Python program to implement Porter stemmer algorithm for stemming

Description:
1) Import Necessary Libraries: The program imports the PorterStemmer class from the nltk.stem
module. This class implements the Porter stemming algorithm.
2) Porter Stemming Function: The porter_stemming function takes the input text, tokenizes it,
initializes a PorterStemmer object, and applies the stemming algorithm to each token using the
stem method. The stemmed tokens are joined back into a single string, which is returned.
3) Main Function: The main function serves as the entry point of the program. It contains a sample
text for demonstration, calls porter_stemming on it, and then prints both the original text and the
stemmed text.
The Porter stemming algorithm reduces words to their root forms, which can help in tasks like text
normalization and information retrieval. It removes common suffixes from words, but it might not
always produce a valid word, as it operates on a fixed set of rules.
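A few isolated examples make the rule-based, sometimes non-word output concrete (the words below are illustrative choices, not taken from the experiment text):

```python
from nltk.stem import PorterStemmer

porter = PorterStemmer()
# Suffix stripping can yield real roots or truncated non-words.
for word in ["running", "caresses", "ponies", "tokenization"]:
    print(word, "->", porter.stem(word))
# running -> run
# caresses -> caress
# ponies -> poni          (not a valid English word)
# tokenization -> token
```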

Program:
import nltk
from nltk.stem import PorterStemmer

# Download NLTK resources if not already downloaded
nltk.download('punkt')

def porter_stemming(text):
    """
    Applies the Porter stemming algorithm to the input text.

    Args:
        text (str): The input text to be stemmed.

    Returns:
        str: The stemmed text.
    """
    # Initialize Porter stemmer
    porter = PorterStemmer()
    # Tokenize the text
    tokens = nltk.word_tokenize(text)
    # Apply stemming to each token
    stemmed_tokens = [porter.stem(token) for token in tokens]
    # Join the stemmed tokens back into a single string
    stemmed_text = ' '.join(stemmed_tokens)
    return stemmed_text

def main():
    """
    Main function to demonstrate Porter stemming.
    """
    # Sample text for demonstration
    text = ("It is important to be very pythonly while you are pythoning with python. "
            "All pythoners have pythoned poorly at least once.")
    # Apply Porter stemming
    stemmed_text = porter_stemming(text)
    # Print original and stemmed text
    print("Original text:")
    print(text)
    print("\nStemmed text using Porter stemming algorithm:")
    print(stemmed_text)

if __name__ == "__main__":
    main()

This program utilizes NLTK's PorterStemmer class to perform stemming on the given text. The
porter_stemming function tokenizes the input text, applies stemming to each token using the Porter
stemmer, and then joins the stemmed tokens back into a single string. Finally, the main function
demonstrates the use of the Porter stemming algorithm by applying it to a sample text.

Output:
Original text:
It is important to be very pythonly while you are pythoning with python. All pythoners have
pythoned poorly at least once.

Stemmed text using Porter stemming algorithm:
it is import to be veri pythonli while you are python with python . all python have python poorli at
least onc .

(Note that the stemmer lowercases its output and that punctuation appears as separate tokens.)

Viva:
1) What is stemming, and why is it used in natural language processing?
2) How does the Porter stemming algorithm work?
3) Can you explain the process of tokenization in the context of stemming?
4) Why is it necessary to preprocess text before applying stemming algorithms?
5) What are some advantages and limitations of the Porter stemming algorithm compared to other
stemming algorithms?


Experiment No.3: Write Python Programs for a) Word Analysis b) Word Generation

Aim: Write Python Programs for a) Word Analysis b) Word Generation

Description:
1) Word Analysis: In this program, we analyze a given text to understand the frequency of
occurrence of each word. This helps in gaining insights into the most commonly used words in
the text.
2) Word Generation: We generate new word sequences using the concept of Markov chains. Markov
chains are stochastic models that describe a sequence of possible events in which the probability of
each event depends only on the state attained in the previous event. NLTK does not ship a ready-made
Markov text generator, so we build a simple bigram-based chain using nltk.bigrams together with
Python's random module.

Program:
import random
from collections import defaultdict

import nltk

# Download NLTK resources if not already downloaded
nltk.download('punkt')

def word_analysis(text):
    """
    Analyzes the given text to calculate the frequency of occurrence of each word.

    Args:
        text (str): The input text to be analyzed.

    Returns:
        nltk.FreqDist: A frequency distribution mapping words to their counts.
    """
    tokens = nltk.word_tokenize(text)
    word_freq = nltk.FreqDist(tokens)
    return word_freq

def word_generation(text, num_words=10):
    """
    Generates a word sequence using a bigram Markov chain built from the text.
    (NLTK has no MarkovModel class, so the chain is built by hand from nltk.bigrams.)

    Args:
        text (str): The input text to generate new words from.
        num_words (int): The number of words to generate.

    Returns:
        list: A list of generated words.
    """
    tokens = nltk.word_tokenize(text)
    # Map each word to the list of words that follow it in the text
    transitions = defaultdict(list)
    for w1, w2 in nltk.bigrams(tokens):
        transitions[w1].append(w2)
    # Walk the chain, picking a random successor at each step
    word = random.choice(tokens)
    generated_words = [word]
    for _ in range(num_words - 1):
        successors = transitions.get(word)
        if not successors:
            break
        word = random.choice(successors)
        generated_words.append(word)
    return generated_words

def main():
    """
    Main function to demonstrate word analysis and generation.
    """
    # Sample text for demonstration
    text = ("The quick brown fox jumps over the lazy dog. "
            "The dog barks loudly. The fox runs away quickly.")
    # Word analysis
    word_freq = word_analysis(text)
    print("Word Analysis:")
    print(word_freq.most_common(5))  # Display 5 most common words
    # Word generation
    generated_words = word_generation(text)
    print("\nWord Generation:")
    print(generated_words)

if __name__ == "__main__":
    main()

Output:
Word Analysis:
[('The', 3), ('.', 3), ('fox', 2), ('dog', 2), ('quick', 1)]

Word Generation (the chain is sampled randomly, so each run differs; one possible run):
['The', 'fox', 'runs', 'away', 'quickly', '.', 'The', 'dog', 'barks', 'loudly']

(Note that FreqDist is case-sensitive: "The" and "the" are counted separately.)
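FreqDist is essentially a Counter with extra reporting helpers; what its two most common extras compute can be mimicked with the standard library (toy tokens below, not the experiment text):

```python
from collections import Counter

tokens = ["the", "fox", "the", "dog", "the"]
fd = Counter(tokens)

print(fd.most_common(2))                     # [('the', 3), ('fox', 1)]
# FreqDist.freq(w): relative frequency of w
print(fd["fox"] / sum(fd.values()))          # 0.2
# FreqDist.hapaxes(): words occurring exactly once
print([w for w, c in fd.items() if c == 1])  # ['fox', 'dog']
```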

Viva:
1) How does word analysis help in understanding the characteristics of a text?
2) What is the purpose of using Markov chains in word generation?
3) How does NLTK's FreqDist function work in word analysis?
4) Can you explain the concept of stochastic models in the context of Markov chains?
5) What are some potential applications of word generation using Markov models in natural
language processing?


Experiment No.4: Create a Sample list for at least 5 words with ambiguous sense and Write a
Python program to implement WSD

Aim: Create a Sample list for at least 5 words with ambiguous sense and Write a Python program
to implement WSD

Description:
Word Sense Disambiguation (WSD) is the task of determining the correct meaning of a word with
multiple meanings (senses) based on the context in which it appears. The Lesk algorithm is a popular
approach to WSD: it compares the words surrounding the target word with the words in the dictionary
definitions (glosses) of each candidate sense, and selects the sense with the greatest overlap.
Sample List of Words with Ambiguous Senses:
"bank"
"bat"
"crane"
"light"
"bass"

Program:
import nltk
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize

# Download NLTK resources if not already downloaded
nltk.download('punkt')
nltk.download('wordnet')

def wsd(word, sentence):
    """
    Implements Word Sense Disambiguation (WSD) using the Lesk algorithm.

    Args:
        word (str): The ambiguous word for which WSD is performed.
        sentence (str): The sentence containing the ambiguous word.

    Returns:
        str: The definition of the disambiguated sense of the word.
    """
    tokens = word_tokenize(sentence)
    sense = lesk(tokens, word)
    return sense.definition() if sense else "No appropriate sense found"

def main():
    """
    Main function to demonstrate Word Sense Disambiguation (WSD).
    """
    # Sample list of words with ambiguous senses
    words = ["bank", "bat", "crane", "light", "bass"]
    # Sample sentence for each word
    sentences = [
        "I deposited money in the bank.",
        "The baseball player swung the bat.",
        "The crane lifted heavy loads at the construction site.",
        "Turn on the light, please.",
        "He caught a large bass while fishing."
    ]
    # Perform WSD for each word in the list
    for word, sentence in zip(words, sentences):
        print(f"Word: {word}")
        print(f"Sentence: {sentence}")
        print(f"Sense: {wsd(word, sentence)}\n")

if __name__ == "__main__":
    main()

Output:
Word: bank
Sentence: I deposited money in the bank.
Sense: a financial institution that accepts deposits and channels the money into lending activities
Word: bat
Sentence: The baseball player swung the bat.
Sense: (baseball) a club used for hitting a ball in various games
Word: crane
Sentence: The crane lifted heavy loads at the construction site.
Sense: large long-necked wading bird of marshes and plains in many parts of the world
Word: light
Sentence: Turn on the light, please.
Sense: (physics) electromagnetic radiation that can produce a visual sensation
Word: bass
Sentence: He caught a large bass while fishing.
Sense: the lowest part of the musical range

Note that Lesk simply picks whichever sense has the greatest gloss overlap, so (as with "crane"
and "bass" above) it does not always return the intuitively correct sense.

Viva:
1) What is Word Sense Disambiguation (WSD) and why is it important in natural language
processing?
2) Can you explain the Lesk algorithm and how it works for WSD?
3) How does the context of a word in a sentence help in determining its correct sense?
4) What are some challenges faced in implementing WSD algorithms?
5) Are there any limitations of the Lesk algorithm? If so, what are they, and how can they be
addressed?


Experiment No.5: Install NLTK tool kit and perform stemming

Aim: Install NLTK tool kit and perform stemming

Description:
To install NLTK, you can use pip, Python's package manager. Here's the command to install NLTK:
pip install nltk
Once NLTK is installed, you can perform stemming using various stemming algorithms available in
NLTK. One of the popular stemming algorithms is the Porter stemming algorithm.
Stemming is the process of reducing words to their root or base form. NLTK provides various
stemming algorithms, such as the Porter, Lancaster, and Snowball stemmers. In this program, we'll
use the Porter stemming algorithm to perform stemming on a sample text.
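The three stemmers named above can disagree on the same word; a quick side-by-side comparison (the word choices are illustrative):

```python
from nltk.stem import PorterStemmer, LancasterStemmer, SnowballStemmer

porter = PorterStemmer()
lancaster = LancasterStemmer()          # more aggressive than Porter
snowball = SnowballStemmer("english")   # "Porter2", a refined Porter

# Print each word with its three stems side by side.
for word in ["running", "generously", "maximum"]:
    print(f"{word}: {porter.stem(word)} / {lancaster.stem(word)} / {snowball.stem(word)}")
```

None of these stemmers require nltk.download, since they are pure rule-based code rather than trained models.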

Program:
import nltk
from nltk.stem import PorterStemmer

# Download NLTK resources if not already downloaded
nltk.download('punkt')

def perform_stemming(text):
    """
    Performs stemming on the input text using the Porter stemming algorithm.

    Args:
        text (str): The input text to be stemmed.

    Returns:
        str: The stemmed text.
    """
    # Initialize the Porter stemmer
    porter = PorterStemmer()
    # Tokenize the text
    tokens = nltk.word_tokenize(text)
    # Apply stemming to each token
    stemmed_tokens = [porter.stem(token) for token in tokens]
    # Join the stemmed tokens back into a single string
    stemmed_text = ' '.join(stemmed_tokens)
    return stemmed_text

def main():
    """
    Main function to demonstrate stemming using NLTK.
    """
    # Sample text for demonstration
    text = "It is important to be very pythonly while you are pythoning with python."
    # Perform stemming
    stemmed_text = perform_stemming(text)
    # Print original and stemmed text
    print("Original text:")
    print(text)
    print("\nStemmed text using Porter stemming algorithm:")
    print(stemmed_text)

if __name__ == "__main__":
    main()

Output:
Original text:
It is important to be very pythonly while you are pythoning with python.

Stemmed text using Porter stemming algorithm:
it is import to be veri pythonli while you are python with python .

Viva:
1) What is stemming, and why is it used in natural language processing?
2) Can you explain the Porter stemming algorithm and how it works?
3) How does NLTK facilitate stemming in Python?
4) Are there any limitations of the Porter stemming algorithm? If so, what are they?
5) How does stemming differ from lemmatization, and in what scenarios would you prefer one over
the other?


Experiment No.6: Create a sample list of at least 10 words for POS tagging and find the POS for any
given word

Aim: Create a sample list of at least 10 words for POS tagging and find the POS for any given word

Description:
Part-of-Speech (POS) tagging is the process of assigning grammatical categories (such as noun, verb,
adjective, etc.) to words in a text. NLTK provides a variety of tools and algorithms for POS tagging,
which can be used to analyze and understand the structure of sentences.

Program:
import nltk

# Download NLTK resources if not already downloaded
nltk.download('averaged_perceptron_tagger')

def pos_tagging(words):
    """
    Performs Part-of-Speech (POS) tagging on the given list of words.

    Args:
        words (list): The list of words to be tagged.

    Returns:
        list: A list of tuples containing (word, POS_tag) pairs.
    """
    tagged_words = nltk.pos_tag(words)
    return tagged_words

def find_pos(word, tagged_words):
    """
    Finds the Part-of-Speech (POS) tag for the given word in the tagged words.

    Args:
        word (str): The word for which the POS tag needs to be found.
        tagged_words (list): A list of tuples containing (word, POS_tag) pairs.

    Returns:
        str: The POS tag for the given word.
    """
    for tagged_word in tagged_words:
        if tagged_word[0].lower() == word.lower():
            return tagged_word[1]
    return "POS tag not found"

def main():
    """
    Main function to demonstrate POS tagging and finding the POS for a given word.
    """
    # Sample list of words
    words = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog", "in", "the", "park"]
    # Perform POS tagging
    tagged_words = pos_tagging(words)
    # Print POS tags for each word
    print("POS tagging:")
    for word, pos_tag in tagged_words:
        print(f"{word}: {pos_tag}")
    # Find POS for a given word
    search_word = "fox"
    pos = find_pos(search_word, tagged_words)
    print(f"\nPOS for '{search_word}': {pos}")

if __name__ == "__main__":
    main()

Output:
POS tagging:
The: DT
quick: JJ
brown: NN
fox: NN
jumps: VBZ
over: IN
the: DT
lazy: JJ
dog: NN
in: IN
the: DT
park: NN
POS for 'fox': NN
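The tags in the output are Penn Treebank tags; nltk.help.upenn_tagset('NN') prints the official definitions, and a small hand-written lookup covering the tags above (descriptions paraphrased) shows the idea:

```python
# Hand-written glossary for the Penn Treebank tags seen in the output above.
PENN_TAGS = {
    "DT": "determiner",
    "JJ": "adjective",
    "NN": "noun, singular or mass",
    "VBZ": "verb, 3rd person singular present",
    "IN": "preposition or subordinating conjunction",
}

for word, tag in [("fox", "NN"), ("jumps", "VBZ"), ("lazy", "JJ")]:
    print(f"{word}: {tag} ({PENN_TAGS[tag]})")
```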

Viva:
1) What is Part-of-Speech (POS) tagging, and why is it important in natural language processing?
2) How does NLTK facilitate POS tagging in Python?
3) Can you explain the meaning of common POS tags such as 'NN', 'VBZ', 'JJ', 'IN', and 'DT'?
4) How accurate are POS taggers, and what factors can affect their accuracy?
5) Can you describe a scenario where POS tagging is useful in real-world applications?


Experiment No.7: Write a Python program to


a) Perform Morphological Analysis using NLTK library
b) Generate n-grams using NLTK N-Grams library
c) Implement N-Grams Smoothing

Aim: Write a Python program to


a) Perform Morphological Analysis using NLTK library
b) Generate n-grams using NLTK N-Grams library
c) Implement N-Grams Smoothing

Description:
1) Morphological Analysis: Morphological analysis involves analyzing the structure of words to
understand their meaning and grammatical properties. NLTK provides tools to perform
morphological analysis, such as stemming and lemmatization.
2) N-Grams Generation: N-grams are contiguous sequences of n items (words, characters, etc.)
from a given text. NLTK provides functions to generate n-grams from a list of tokens.
3) N-Grams Smoothing: N-gram smoothing is a technique used to address the sparsity problem in
language models by assigning non-zero probabilities to unseen n-grams. Here, we'll implement
simple add-one (Laplace) smoothing for n-grams.
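Add-one smoothing is just P(w | context) = (count(context, w) + 1) / (count(context) + V), where V is the vocabulary size; a hand-computed sketch with made-up counts:

```python
def laplace_probability(ngram_count, context_count, vocab_size):
    """Add-one (Laplace) smoothed conditional probability."""
    return (ngram_count + 1) / (context_count + vocab_size)

# Seen bigram: ("brown", "fox") occurred once, "brown" occurred once, V = 8.
print(laplace_probability(1, 1, 8))  # 0.2222... (2/9)
# An unseen bigram with the same context still gets non-zero mass:
print(laplace_probability(0, 1, 8))  # 0.1111... (1/9)
```

The nltk.lm.Laplace model used in the program below computes exactly this quantity over its trained counts.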

Program:
import nltk
from nltk.util import ngrams
from nltk.lm import Laplace
from nltk.lm.preprocessing import padded_everygram_pipeline

# Download NLTK resources if not already downloaded
nltk.download('punkt')
nltk.download('wordnet')

def morphological_analysis(word):
    """
    Performs morphological analysis on the given word using NLTK's WordNet Lemmatizer.

    Args:
        word (str): The word to be analyzed.

    Returns:
        str: The base form of the word (lemma).
    """
    lemmatizer = nltk.WordNetLemmatizer()
    # Without a POS hint the lemmatizer assumes a noun, so "running" stays "running";
    # pos='v' treats the word as a verb and returns "run".
    lemma = lemmatizer.lemmatize(word, pos='v')
    return lemma

def generate_ngrams(text, n):
    """
    Generates n-grams from the given text.

    Args:
        text (str): The input text from which n-grams will be generated.
        n (int): The size of n-grams (e.g., 2 for bigrams, 3 for trigrams, etc.).

    Returns:
        list: A list of n-grams.
    """
    tokens = nltk.word_tokenize(text)
    ngrams_list = list(ngrams(tokens, n))
    return ngrams_list

def ngram_smoothing(text, n):
    """
    Trains a Laplace (add-one) smoothed n-gram language model.
    (nltk.lm models are trained from padded everygrams, not from a raw n-gram list.)

    Args:
        text (str): The training text.
        n (int): The model order.

    Returns:
        nltk.lm.Laplace: A Laplace language model trained on the text.
    """
    tokens = nltk.word_tokenize(text)
    train_data, vocab = padded_everygram_pipeline(n, [tokens])
    laplace = Laplace(n)
    laplace.fit(train_data, vocab)
    return laplace

def main():
    """
    Main function to demonstrate morphological analysis, n-gram generation, and n-gram smoothing.
    """
    # Sample word for morphological analysis
    word = "running"
    lemma = morphological_analysis(word)
    print(f"Morphological analysis of '{word}': {lemma}")
    # Sample text for n-gram generation and smoothing
    text = "The quick brown fox jumps over the lazy dog"
    print("\nOriginal text:")
    print(text)
    # Generate trigrams
    n = 3
    trigrams_list = generate_ngrams(text, n)
    print(f"\nGenerated {n}-grams:")
    print(trigrams_list)
    # Apply n-gram smoothing
    laplace_model = ngram_smoothing(text, n)
    print("\nSmoothed probability of 'fox' given 'quick brown':")
    print(laplace_model.score('fox', ['quick', 'brown']))

if __name__ == "__main__":
    main()

Output:
Morphological analysis of 'running': run

Original text:
The quick brown fox jumps over the lazy dog

Generated 3-grams:
[('The', 'quick', 'brown'), ('quick', 'brown', 'fox'), ('brown', 'fox', 'jumps'), ('fox', 'jumps', 'over'),
('jumps', 'over', 'the'), ('over', 'the', 'lazy'), ('the', 'lazy', 'dog')]

Smoothed probability of 'fox' given 'quick brown':
0.15384615384615385

Viva:
1) What is morphological analysis, and why is it important in natural language processing?
2) Can you explain how NLTK's WordNet Lemmatizer works for morphological analysis?
3) What are n-grams, and how are they useful in language modeling?
4) How does Laplace (add-one) smoothing address the sparsity problem in n-gram language
models?
5) Are there any drawbacks or limitations of Laplace smoothing? If so, what are they, and how can
they be mitigated?

By R. A. B (KLH)

Experiment No.8: Using NLTK package to convert audio files to text and text files to audio files.

Aim: Convert an audio file to text and a text file to audio.

Description:
NLTK itself provides no speech interface, so this experiment pairs the text-processing pipeline with
two companion libraries:
1) Converting Audio to Text: Speech recognition is the process of converting spoken language into
text. We use the SpeechRecognition library (pip install SpeechRecognition) to convert an audio file
into text via Google's web recognizer.
2) Converting Text to Audio: Text-to-speech (TTS) is the process of converting text into spoken
language. We use the gTTS library (pip install gTTS) to synthesize speech from text and save it as an
audio file.

Program:
import speech_recognition as sr
from gtts import gTTS

def audio_to_text(audio_file):
    """
    Converts an audio file to text using speech recognition.

    Args:
        audio_file (str): The path to the audio file (WAV/AIFF/FLAC).

    Returns:
        str: The recognized text from the audio file.
    """
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_file) as source:
        audio_data = recognizer.record(source)
    # Uses Google's free web API; requires an internet connection
    text = recognizer.recognize_google(audio_data)
    return text

def text_to_audio(text, output_file):
    """
    Converts text to audio and saves it as a file.

    Args:
        text (str): The text to be converted to audio.
        output_file (str): The path to save the output audio file.
    """
    tts = gTTS(text=text, lang='en')
    tts.save(output_file)

def main():
    """
    Main function to demonstrate audio-to-text and text-to-audio conversion.
    """
    # Converting audio file to text
    audio_file = "sample_audio.wav"
    recognized_text = audio_to_text(audio_file)
    print("Audio to Text:")
    print(recognized_text)
    # Converting text to audio
    text = "This is a sample text-to-speech conversion."
    output_file = "output_audio.mp3"
    text_to_audio(text, output_file)
    print("\nText to Audio: Conversion successful")

if __name__ == "__main__":
    main()

Output:
Audio to Text:
this is a sample audio file for testing text-to-speech conversion
Text to Audio: Conversion successful

Viva:
1) What is speech recognition, and how does it work?
2) Which Python libraries are used here for speech recognition and speech synthesis, and why is
NLTK alone not sufficient?
3) Can you explain the process of converting an audio file to text using the SpeechRecognition
library?
4) What are some potential challenges or limitations of speech recognition systems?
5) What is text-to-speech (TTS) synthesis, and why is it useful in natural language processing
applications?
