
NLP LAB ASSIGNMENT-2 (L55+56)

NAME - NIKHIL GARIGIPATI

REG NO - 22BCE8334

Q1. Write a program to split sentences in a document.

CODE -

import nltk

nltk.download('punkt')
nltk.download('punkt_tab')  # required by newer NLTK releases

def split_sentences(text):
    # NLTK's pre-trained Punkt model splits the text at sentence boundaries
    sentences = nltk.sent_tokenize(text)
    return sentences

document = """Every Saturday I have NLP lab and NLP class. My professor is vishalakshi annepu"""

sentences = split_sentences(document)
for sentence in sentences:
    print(sentence)

OUTPUT -
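A sample run (exact splitting depends on the NLTK version, but Punkt splits this document at the period):

Every Saturday I have NLP lab and NLP class.
My professor is vishalakshi annepu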
Q2. Perform tokenization and stemming on an input string.

CODE -

import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer

nltk.download('punkt')

# Read a string from the user, tokenize it, and stem each token
input_string = input("Enter a string: ")
tokens = word_tokenize(input_string)

stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

print("Tokens:", tokens)
print("Stems:", stems)

OUTPUT -
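A sample run, assuming the user enters "running runs easily studied" (Porter stems are produced by suffix stripping, so they need not be dictionary words):

Enter a string: running runs easily studied
Tokens: ['running', 'runs', 'easily', 'studied']
Stems: ['run', 'run', 'easili', 'studi']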

Q3. Remove the stopwords and rare words from a document.

CODE -

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from collections import Counter
from google.colab import drive

nltk.download('punkt')
nltk.download('stopwords')

drive.mount('/content/drive')

file_path = '/content/nlpLab2.txt'
with open(file_path, 'r') as file:
    document = file.read()

print("Document Content:")
print(document)

# Lowercase and tokenize, then drop punctuation and stopwords
tokens = word_tokenize(document.lower())
stop_words = set(stopwords.words('english'))
filtered_tokens = [word for word in tokens if word.isalnum() and word not in stop_words]

# Treat any word that occurs only once as a rare word and remove it
word_counts = Counter(filtered_tokens)
filtered_tokens = [word for word in filtered_tokens if word_counts[word] > 1]

print("Filtered Tokens:", filtered_tokens)

OUTPUT -
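A sample result, assuming nlpLab2.txt contains the hypothetical text "NLP is fun. NLP labs are fun. The labs run weekly.": after stopword and punctuation removal the tokens are ['nlp', 'fun', 'nlp', 'labs', 'fun', 'labs', 'run', 'weekly'], and 'run' and 'weekly' are dropped because they occur only once:

Filtered Tokens: ['nlp', 'fun', 'nlp', 'labs', 'fun', 'labs']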

Q4. Identify the parts of speech in the document.

CODE -

import spacy

nlp = spacy.load("en_core_web_sm")

file_path = '/content/nlpLab2.txt'  # Adjust file path here
with open(file_path, 'r') as file:
    text = file.read()

# Print each token with its coarse-grained (pos_) and fine-grained (tag_) tag
doc = nlp(text)
for token in doc:
    print(f'{token.text}: {token.pos_} ({token.tag_})')

OUTPUT -
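A sample of the output, assuming the file begins "Every Saturday I have NLP lab." (tags vary with the spaCy model version):

Every: DET (DT)
Saturday: PROPN (NNP)
I: PRON (PRP)
have: VERB (VBP)
NLP: PROPN (NNP)
lab: NOUN (NN)
.: PUNCT (.)

The same task can also be done entirely in NLTK with its averaged perceptron tagger; a minimal sketch, assuming the same file path:

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

file_path = '/content/nlpLab2.txt'
with open(file_path, 'r') as file:
    text = file.read()

# nltk.pos_tag returns (token, Penn Treebank tag) pairs
tokens = nltk.word_tokenize(text)
for token, tag in nltk.pos_tag(tokens):
    print(f'{token}: {tag}')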
END.
