
1. Write a Python program to perform the following tasks on text

a) Tokenization

# Sample text
text = "This is a simple example of tokenization using split."

# Tokenizing the text using the split() function
tokens = text.split()

# Displaying the tokens
print("Tokens:", tokens)

b) Stop Word Removal

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Download the necessary resources for tokenization and stop words
nltk.download('punkt')
nltk.download('stopwords')

# Sample text
text = "This is an example sentence demonstrating stopword removal using NLTK."

# Tokenizing the text using word_tokenize()
tokens = word_tokenize(text)

# Get the set of stopwords in English
stop_words = set(stopwords.words('english'))

# Filter out stopwords from the tokens
filtered_tokens = [word for word in tokens if word.lower() not in stop_words]

# Displaying the filtered tokens
print("Filtered Tokens:", filtered_tokens)


2. Write a Python program to implement the Porter stemmer algorithm for stemming

import re

class PorterStemmer:
    # A simplified, rule-based approximation of the Porter stemmer
    def stem(self, word):
        # Ordered (suffix pattern, replacement) rules, applied in sequence
        suffixes = [
            (r"(sses|ss)$", ""),
            (r"(ies|ied)$", "i"),
            (r"(ing|ed)$", ""),
            (r"(es|s)$", ""),
            (r"(ly|ness)$", ""),
            (r"(er|ful)$", ""),
        ]
        word = word.lower()
        for pattern, replacement in suffixes:
            word = re.sub(pattern, replacement, word)
        return word

# Test the Porter Stemmer
porter_stemmer = PorterStemmer()
words = ["running", "better", "happiness", "jumps", "faster", "running", "beauty", "kindness"]
stemmed_words = [porter_stemmer.stem(word) for word in words]
print("Original Words:", words)
print("Stemmed Words:", stemmed_words)


3. Write Python programs for

a) Word Analysis

import string

class WordAnalyzer:
    def __init__(self):
        self.vowels = "aeiou"

    def analyze_word(self, word):
        word = word.lower()
        word_length = len(word)
        vowels_count = sum(1 for char in word if char in self.vowels)
        consonants_count = sum(1 for char in word
                               if char in string.ascii_lowercase and char not in self.vowels)
        unique_chars = len(set(word))
        return {
            "Word": word,
            "Length": word_length,
            "Vowels": vowels_count,
            "Consonants": consonants_count,
            "Unique Characters": unique_chars
        }

# Usage example for Word Analysis:
word_analyzer = WordAnalyzer()
word = input("Enter a word for analysis: ")
analysis = word_analyzer.analyze_word(word)
print("\nWord Analysis:")
for key, value in analysis.items():
    print(f"{key}: {value}")

b) Word Generation

import random
import string

def generate_word(length):
    # Randomly selects letters to form a word of the specified length
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(length))

def generate_words(num_words, word_length):
    words = [generate_word(word_length) for _ in range(num_words)]
    return words

# Example usage
num_words = 5    # Number of words to generate
word_length = 8  # Length of each word
generated_words = generate_words(num_words, word_length)
print("Generated Words:", generated_words)


4. Create a sample list of at least 5 words with ambiguous senses and write a Python program to implement WSD

from collections import Counter

# Ambiguous words and their possible senses
word_senses = {
    'bank': ['financial institution', 'side of a river'],
    'bark': ['sound a dog makes', 'outer covering of a tree'],
    'bat': ['flying mammal', 'sports equipment'],
    'lead': ['metal', 'to guide someone'],
    'spring': ['season', 'coiled object used for bouncing']
}

# Sample sentences with ambiguous words
sentences = [
    "I went to the bank to deposit some money.",
    "The dog started to bark loudly.",
    "He hit the ball with a bat.",
    "She will lead the team to victory.",
    "The flowers bloom every spring."
]

# Function to determine the sense of a word based on context
def wsd(word, sentence):
    senses = word_senses.get(word, [])
    sense_counter = Counter()
    # Count how many context words overlap with each sense description
    for sense in senses:
        for word_in_context in sentence.split():
            if word_in_context.lower() in sense.lower():
                sense_counter[sense] += 1
    return sense_counter.most_common(1)[0][0] if sense_counter else "No clear sense"

# Implement WSD for each sentence
for sentence in sentences:
    for word in word_senses:
        if word in sentence.lower():
            print(f"Sentence: {sentence}")
            print(f"Word: {word} -> Predicted Sense: {wsd(word, sentence)}")
            print("-" * 50)
5. Install the NLTK toolkit and perform stemming

import nltk
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# Download the tokenizer data
nltk.download('punkt')

# Initialize the PorterStemmer
stemmer = PorterStemmer()

# Sample text
text = "The cats were playing with the scratched balls, and they enjoyed the games."

# Tokenize the sentence into words
words = word_tokenize(text)

# Perform stemming
stemmed_words = [stemmer.stem(word) for word in words]

# Display the stemmed words
print("Original Text: ", text)
print("Stemmed Words: ", stemmed_words)


6. Create a sample list of at least 10 words, perform POS tagging, and find the POS for any given word

import nltk
from nltk import pos_tag

# Download the tagger model
nltk.download('averaged_perceptron_tagger')

# Sample list of words
words = ["run", "quickly", "dog", "happily", "under", "the", "sky", "ate", "jump", "beautiful"]

# Perform POS tagging
tagged_words = pos_tag(words)

# Print tagged words
print("Tagged Words:", tagged_words)

# Function to find POS for a given word
def find_pos(word):
    for w, tag in tagged_words:
        if w.lower() == word.lower():
            return f"POS for '{word}': {tag}"
    return f"'{word}' not found."

# Example usage
print(find_pos("dog"))
print(find_pos("run"))
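
The tags follow the Penn Treebank convention (NN = singular noun, VB = base-form verb, RB = adverb, and so on). NLTK can print the full definition of any tag after downloading its tag documentation:

nltk.download('tagsets')
nltk.help.upenn_tagset('NN')  # noun, common, singular or mass
nltk.help.upenn_tagset('RB')  # adverb
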
7. Write a Python program to

a) Perform Morphological Analysis using the NLTK library

import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize

# Download necessary NLTK data
nltk.download('punkt')
nltk.download('wordnet')

# Initialize stemmer and lemmatizer
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# Sample text
text = "hi this is me."

# Tokenize text and perform morphological analysis
words = word_tokenize(text)
stemmed = [stemmer.stem(word) for word in words]
lemmatized = [lemmatizer.lemmatize(word, pos='v') for word in words]

# Display results
print("Stemmed:", stemmed)
print("Lemmatized:", lemmatized)
b) Generate n-grams using the NLTK n-grams library

import nltk
from nltk.util import ngrams
from nltk.tokenize import word_tokenize

# Download necessary NLTK data
nltk.download('punkt')

# Sample text
text = "hi this is me."

# Tokenize and generate bigrams (n=2)
bigrams = list(ngrams(word_tokenize(text), 2))

# Display the bigrams
print("Bigrams:", bigrams)

# Generate trigrams (n=3)
trigrams = list(ngrams(word_tokenize(text), 3))

# Display the trigrams
print("Trigrams:", trigrams)
c) Implement N-Gram Smoothing

import nltk
from nltk.util import ngrams
from nltk.tokenize import word_tokenize
from collections import Counter

# Download necessary NLTK data
nltk.download('punkt')

# Sample text
text = "The quick brown fox jumps over the lazy dog."

# Tokenize the text
tokens = word_tokenize(text)

# Define n (for bigrams, n=2)
n = 2

# Generate n-grams
ngram_list = list(ngrams(tokens, n))

# Count the occurrences of n-grams
ngram_counts = Counter(ngram_list)

# Calculate total n-grams
total_ngrams = len(ngram_list)

# Laplace (add-one) smoothing
vocab_size = len(set(tokens))  # Number of unique words

# Function to calculate the smoothed probability of an n-gram
def laplace_smoothing(ngram):
    ngram_count = ngram_counts[ngram] + 1  # Add-one smoothing
    return ngram_count / (total_ngrams + vocab_size)

# Test with a bigram
bigram = ('quick', 'brown')
smoothed_prob = laplace_smoothing(bigram)
print(f"Smoothed probability of {bigram}: {smoothed_prob:.4f}")
