NLP Lab1

The document provides examples of code to perform various natural language processing tasks including tokenization, stemming, part-of-speech tagging, n-gram modeling, and shallow parsing.

1. Read the paragraph and obtain the frequency of words.

code:
import nltk
from nltk.tokenize import word_tokenize
from nltk.probability import FreqDist

paragraph = "Sukumar is good at coding and pratcing lot of problems in


leetcode .sukumar is very nice guy"

words = word_tokenize(paragraph)

fdist = FreqDist(words)
for word, frequency in fdist.items():
print(f"{word}: {frequency}")

2. Write a program to split a document into sentences.


code:
import nltk
from nltk.tokenize import sent_tokenize

# Sample document
document = "sukumar is good boy. Sukumar in vitap"
# Tokenize the document into sentences
sentences = sent_tokenize(document)

# Print each sentence
for sentence in sentences:
    print(sentence)

3. Perform tokenizing and stemming by reading the input string.


code:
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer

# Sample input string
input_string = "i am running"

# Tokenize the input string into words
words = word_tokenize(input_string)

# Initialize the Porter stemmer
stemmer = PorterStemmer()

# Perform stemming on each word
stemmed_words = [stemmer.stem(word) for word in words]

# Print the original words and their stemmed forms
for original, stemmed in zip(words, stemmed_words):
    print(f"{original} -> {stemmed}")

4. Remove the stopwords and rare words from the document.


code:
import nltk
nltk.download('stopwords')
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.probability import FreqDist

# Sample document
document = "running in the forest is most dangerous than any ting in world of
human. sukumar sukumar hero model model run"

# Tokenize the document into words


words = word_tokenize(document)

# Remove stopwords
stop_words = set(stopwords.words('english'))
filtered_words = [word for word in words if word.lower() not in stop_words]

# Calculate the frequency distribution of words
fdist = FreqDist(filtered_words)

# Define a threshold for rare words (e.g., words that occur less than 2 times)
rare_words = [word for word, frequency in fdist.items() if frequency < 2]

# Remove rare words from the filtered words
filtered_words = [word for word in filtered_words if word not in rare_words]

# Join the filtered words back into a document
filtered_document = ' '.join(filtered_words)

print(filtered_document)
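
FreqDist also exposes hapaxes(), which returns the tokens that occur exactly once, so with the threshold of 2 used above the rare-word list could equivalently be built as follows (a short sketch reusing fdist from the snippet above):

# hapaxes() lists tokens with frequency 1, i.e. the rare words under a threshold of 2
rare_words = fdist.hapaxes()
print(rare_words)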

5. Identify the parts of speech in the document.


code:
import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag

# Sample document
document = "NLTK is a leading platform for building Python programs. It provides
easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet."

# Tokenize the document into words


words = word_tokenize(document)

# Perform part-of-speech tagging


pos_tags = pos_tag(words)

# Print the part-of-speech tags


for word, pos_tag in pos_tags:
print(f"{word}: {pos_tag}")

6. Write a program to read the words from a string variable/text and perform tokenizing and Lancaster stemming on the input string.
code:
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import LancasterStemmer

# Sample input string
input_string = "NLTK is a leading platform for building Python programs."

# Tokenize the input string into words
words = word_tokenize(input_string)

# Initialize the Lancaster stemmer
stemmer = LancasterStemmer()

# Perform stemming on each word
stemmed_words = [stemmer.stem(word) for word in words]

# Print the original words and their stemmed forms
for original, stemmed in zip(words, stemmed_words):
    print(f"{original} -> {stemmed}")

7. N-grams:
CODE:

import nltk
from nltk.tokenize import word_tokenize,sent_tokenize
from nltk.util import ngrams
import re

s= """Natural language processing is the ability of a computer program to


understand
human language as it is spoken and written referred to as natural language. It is
a
component of Artificial intelligence."""

s = s.lower()
s = re.sub(r'[^a-zA-Z0-9\s]',' ',s)
tokens = [token for token in s.split(" ") if token!=""]
ouput = list(ngrams(tokens,5))
print(ouput)
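
The same ngrams helper produces bigrams or trigrams simply by changing n; a minimal self-contained sketch with an illustrative sentence:

from nltk.util import ngrams

tokens = "natural language processing is fun".split()

# n=2 gives bigrams, n=3 gives trigrams
print(list(ngrams(tokens, 2)))
print(list(ngrams(tokens, 3)))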

9. Unigram, Bigram, and Trigram taggers

CODE:
import nltk
from nltk.corpus import treebank
from nltk.tag import UnigramTagger, BigramTagger, TrigramTagger

# Download the Treebank corpus if not already downloaded
nltk.download('treebank')

# Get tagged sentences from the Treebank corpus
tagged_sentences = treebank.tagged_sents()

# Split the tagged sentences into train and test sets
train_size = int(0.8 * len(tagged_sentences))
train_sents = tagged_sentences[:train_size]
test_sents = tagged_sentences[train_size:]

# Train Unigram, Bigram, and Trigram taggers, each backing off to the previous one
unigram_tagger = UnigramTagger(train_sents)
bigram_tagger = BigramTagger(train_sents, backoff=unigram_tagger)
trigram_tagger = TrigramTagger(train_sents, backoff=bigram_tagger)

# Evaluate the taggers on the test set
print(f"Unigram tagger accuracy: {unigram_tagger.evaluate(test_sents)}")
print(f"Bigram tagger accuracy: {bigram_tagger.evaluate(test_sents)}")
print(f"Trigram tagger accuracy: {trigram_tagger.evaluate(test_sents)}")

# Tag a sample sentence
sentence = "Barack Obama was born in Hawaii."
words = nltk.word_tokenize(sentence)
tags = trigram_tagger.tag(words)
print(tags)
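
Note that newer NLTK releases also provide accuracy() and mark evaluate() as deprecated, and that words never seen in training come back tagged None. A common remedy is to put a DefaultTagger at the bottom of the backoff chain; a minimal sketch reusing train_sents from above (the 'NN' default is an illustrative choice):

import nltk
from nltk.tag import DefaultTagger, UnigramTagger, BigramTagger, TrigramTagger

# DefaultTagger('NN') guesses 'NN' for any token the n-gram taggers have not seen,
# so the chain never returns None tags
default_tagger = DefaultTagger('NN')
unigram_tagger = UnigramTagger(train_sents, backoff=default_tagger)
bigram_tagger = BigramTagger(train_sents, backoff=unigram_tagger)
trigram_tagger = TrigramTagger(train_sents, backoff=bigram_tagger)
print(trigram_tagger.tag(nltk.word_tokenize("Barack Obama was born in Hawaii.")))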

10. Affix Tagger
code:
import nltk
from nltk.corpus import treebank
from nltk.tag import AffixTagger

# Download the Treebank corpus if not already downloaded
nltk.download('treebank')

# Get tagged sentences from the Treebank corpus
tagged_sentences = treebank.tagged_sents()

# Split the tagged sentences into train and test sets
train_size = int(0.8 * len(tagged_sentences))
train_sents = tagged_sentences[:train_size]
test_sents = tagged_sentences[train_size:]

# Specify the affix tagger parameters
# (a positive affix_length uses word prefixes; a negative value uses suffixes)
prefix_length = 3
min_stem_length = 2

# Train an affix tagger on 3-character prefixes
affix_tagger = AffixTagger(train_sents, affix_length=prefix_length,
                           min_stem_length=min_stem_length)

# Tag a sample sentence
sentence = "Barack Obama was born in Hawaii."
words = nltk.word_tokenize(sentence)
tags = affix_tagger.tag(words)
print(tags)
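
For comparison, passing a negative affix_length trains the tagger on word suffixes instead of prefixes; a short sketch reusing train_sents and test_sents from above:

# A negative affix_length keys the tagger on 3-character suffixes
suffix_tagger = AffixTagger(train_sents, affix_length=-3, min_stem_length=2)
print(f"Suffix-based affix tagger accuracy: {suffix_tagger.evaluate(test_sents)}")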

12. Dependency parser


code:
import nltk

# Define a simple context-free grammar for parsing
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | N
VP -> V NP | V
Det -> 'the'
N -> 'dog' | 'cat' | 'man' | 'ball'
V -> 'chased' | 'saw' | 'caught'
""")

# Input sentences
input_sentences = ['the dog chased the cat', 'the man saw the ball']

# Create a chart parser
parser = nltk.ChartParser(grammar)

# Iterate over input sentences
for sent in input_sentences:
    # Tokenize the sentence
    tokens = nltk.word_tokenize(sent)
    # Parse the sentence
    for tree in parser.parse(tokens):
        # nltk.ChartParser yields constituency parse trees; convert to a
        # ParentedTree so each node keeps a reference to its parent
        parented_tree = nltk.tree.ParentedTree.convert(tree)
        # Print the original sentence
        print("Input Sentence:", sent)
        # Print the parse tree
        print("Parse Tree:")
        print(parented_tree)
        print()
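
The chart parser above yields constituency trees rather than dependency structures. For an actual dependency parse, NLTK provides a rule-based projective dependency parser; a minimal sketch for the first input sentence, where the head -> dependent rules are hand-written illustrative assumptions rather than part of the lab:

import nltk
from nltk.grammar import DependencyGrammar
from nltk.parse import ProjectiveDependencyParser

# Hand-written head -> dependent rules for "the dog chased the cat"
dep_grammar = DependencyGrammar.fromstring("""
'chased' -> 'dog' | 'cat'
'dog' -> 'the'
'cat' -> 'the'
""")

dep_parser = ProjectiveDependencyParser(dep_grammar)

# Each parse is returned as a tree rooted at the head word ('chased')
for dep_tree in dep_parser.parse("the dog chased the cat".split()):
    print(dep_tree)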

13. Shallow parsing

code:
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
text = "The quick brown fox jumps over the lazy dog"

tokens = nltk.word_tokenize(text)
pos_tags = nltk.pos_tag(tokens)

chunk_grammar = r"""
NP: {<DT|JJ|NN.*>+} # Chunk sequences of DT, JJ, NN
PP: {<IN><NP>} # Chunk prepositions followed by NP
VP: {<VB.*><NP|PP|CLAUSE>+$} # Chunk verbs and their arguments
CLAUSE: {<NP><VP>} # Chunk NP followed by VP
"""

chunk_parser = nltk.RegexpParser(chunk_grammar)

chunks = chunk_parser.parse(pos_tags)

print(chunks)
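
To pull out only the noun-phrase chunks from the resulting tree, the subtrees can be filtered by label; a short sketch reusing chunks from above:

# Print each NP chunk, joining its (word, tag) leaves back into a phrase
for subtree in chunks.subtrees(filter=lambda t: t.label() == 'NP'):
    print(' '.join(word for word, tag in subtree.leaves()))
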
14. Named Entity Recognition (NER)
code:
import nltk
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag
from nltk.chunk import ne_chunk
nltk.download("punkt")
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')

doc = "Harry Potter, the young wizard with a lightning-shaped scar, attended
Hogwarts School, faced challenges, and triumphed over the dark wizard Voldemort,
bringing an end to the magical conflict."

words = word_tokenize(doc)

pos_tags = pos_tag(words)

ne_tags = ne_chunk(pos_tags)

# Print each named-entity chunk with its label
for chunk in ne_tags:
    if hasattr(chunk, 'label'):
        print(chunk.label(), ':', ' '.join(c[0] for c in chunk))
