
NLP LAB ASSIGNMENT-2 (L55+56)

NAME - NIKHIL GARIGIPATI

REG NO - 22BCE8334

Q1. Write a program to split sentences in a document.

CODE -

import nltk

nltk.download('punkt')
nltk.download('punkt_tab')  # required by newer NLTK releases

def split_sentences(text):
    # NLTK's pre-trained Punkt model splits the text at sentence boundaries
    sentences = nltk.sent_tokenize(text)
    return sentences

document = """Every Saturday I have NLP lab and NLP class. My professor is vishalakshi annepu"""

sentences = split_sentences(document)
for sentence in sentences:
    print(sentence)

OUTPUT -
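A sample run (exact splitting depends on the NLTK version, but Punkt splits this document at the period):

Every Saturday I have NLP lab and NLP class.
My professor is vishalakshi annepu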
Q2. Perform tokenization and stemming on an input string.

CODE -

import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer

nltk.download('punkt')

# Read a string from the user, tokenize it, and stem each token
input_string = input("Enter a string: ")
tokens = word_tokenize(input_string)

stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

print("Tokens:", tokens)
print("Stems:", stems)

OUTPUT -
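A sample run, assuming the user enters "running runs easily studied" (Porter stems are produced by suffix stripping, so they need not be dictionary words):

Enter a string: running runs easily studied
Tokens: ['running', 'runs', 'easily', 'studied']
Stems: ['run', 'run', 'easili', 'studi']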

Q3. Remove the stopwords and rare words from a document.

CODE -

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from collections import Counter
from google.colab import drive

nltk.download('punkt')
nltk.download('stopwords')

drive.mount('/content/drive')

file_path = '/content/nlpLab2.txt'
with open(file_path, 'r') as file:
    document = file.read()

print("Document Content:")
print(document)

# Lowercase and tokenize, then drop punctuation and stopwords
tokens = word_tokenize(document.lower())
stop_words = set(stopwords.words('english'))
filtered_tokens = [word for word in tokens if word.isalnum() and word not in stop_words]

# Treat any word that occurs only once as a rare word and remove it
word_counts = Counter(filtered_tokens)
filtered_tokens = [word for word in filtered_tokens if word_counts[word] > 1]

print("Filtered Tokens:", filtered_tokens)

OUTPUT -
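A sample result, assuming nlpLab2.txt contains the hypothetical text "NLP is fun. NLP labs are fun. The labs run weekly.": after stopword and punctuation removal the tokens are ['nlp', 'fun', 'nlp', 'labs', 'fun', 'labs', 'run', 'weekly'], and 'run' and 'weekly' are dropped because they occur only once:

Filtered Tokens: ['nlp', 'fun', 'nlp', 'labs', 'fun', 'labs']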

Q4. Identify the parts of speech in the document.

CODE -

import spacy

nlp = spacy.load("en_core_web_sm")

file_path = '/content/nlpLab2.txt'  # Adjust file path here
with open(file_path, 'r') as file:
    text = file.read()

# Print each token with its coarse-grained (pos_) and fine-grained (tag_) tag
doc = nlp(text)
for token in doc:
    print(f'{token.text}: {token.pos_} ({token.tag_})')

OUTPUT -
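A sample of the output, assuming the file begins "Every Saturday I have NLP lab." (tags vary with the spaCy model version):

Every: DET (DT)
Saturday: PROPN (NNP)
I: PRON (PRP)
have: VERB (VBP)
NLP: PROPN (NNP)
lab: NOUN (NN)
.: PUNCT (.)

The same task can also be done entirely in NLTK with its averaged perceptron tagger; a minimal sketch, assuming the same file path:

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

file_path = '/content/nlpLab2.txt'
with open(file_path, 'r') as file:
    text = file.read()

# nltk.pos_tag returns (token, Penn Treebank tag) pairs
tokens = nltk.word_tokenize(text)
for token, tag in nltk.pos_tag(tokens):
    print(f'{token}: {tag}')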
END.
