NLP Lab - Manual
NAME :
REG.NO :
(24P2CSEP01)
SEMESTER–II
2024-2026
VIVEKANANDHA
COLLEGE OF ARTS AND SCIENCES FOR WOMEN
(AUTONOMOUS)
An ISO 9001:2015 Certified Institution
(Affiliated to Periyar University-Salem, Approved by AICTE,
Reaccredited with “A++” Grade by NAAC, Recognized U/S 12(B), 2(f) of UGC Act 1956)
Elayampalayam, Tiruchengode-637205.
1. Tokenize a given text
5. A. Perform Stemming
   B. Lemmatize a given Text
1. Tokenize a text
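import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
# Download the tokenizer models (only needed once);
# newer NLTK releases may ask for "punkt_tab" instead
nltk.download("punkt")
# Example text
text = "NLP makes machines understand language. Tokenization is the first step."
# Sentence Tokenization
print("Sentences:", sent_tokenize(text))
# Word Tokenization
print("Words:", word_tokenize(text))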
OUTPUT
PROGRAM
# Sentence Tokenization
sentences = sent_tokenize(text)
OUTPUT
PROGRAM
# Example text
text = "Python is great! It's simple and powerful."
5. A. Perform Stemming
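from nltk.stem import PorterStemmer
# words: list of tokens to stem (its definition is not shown in this listing)
ps = PorterStemmer()
for w in words:
    print(w, " : ", ps.stem(w))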
OUTPUT
PROGRAM
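from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
def lemmatize_text(text):
    # Split the text into tokens and lemmatize each one (default POS: noun)
    lemmatizer = WordNetLemmatizer()
    tokens = word_tokenize(text)
    lemmatized_text = ' '.join([lemmatizer.lemmatize(word) for word in tokens])
    return lemmatized_text
text = "The cats are chasing mice and playing in the garden"
lemmatized_text = lemmatize_text(text)
print(lemmatized_text)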
import nltk
from nltk.corpus import stopwords
from collections import Counter
import string
# Download stopwords (only needed once)
nltk.download("stopwords")
11. a) Load the iris data from a given CSV file into a DataFrame and print
the shape of the data, the type of the data and the first 3 rows.
import pandas as pd
data = pd.read_csv("iris.csv")
print("Shape of the data:")
print(data.shape)
print("\nData Type:")
print(type(data))
print("\nFirst 3 rows:")
print(data.head(3))
OUTPUT
Shape of the data:
(150, 6)
Data Type:
<class 'pandas.core.frame.DataFrame'>
First 3 rows:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
def chunk_sentence(sentence):
grammar = r"""
return chunked_sentence
# Example sentence
sentence = "The quick brown fox jumps over the lazy dog"
# Perform chunking
chunked_sentence = chunk_sentence(sentence)
print(chunked_sentence)
chunked_sentence.draw()
OUTPUT
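The grammar string in the chunking program above is cut off, so the following is only a minimal sketch of regular-expression chunking under an assumed grammar, not the manual's original one. The grammar (`NP`: optional determiner, any adjectives, then a noun), the helper name `chunk_tagged`, and the hand-assigned part-of-speech tags are all assumptions made for illustration. Using pre-tagged tokens keeps the sketch independent of the tagger download.

```python
from nltk import RegexpParser

# Assumed grammar (not from the manual): a noun phrase is an optional
# determiner, any number of adjectives, then a singular noun.
GRAMMAR = r"NP: {<DT>?<JJ>*<NN>}"

def chunk_tagged(tagged_tokens, grammar=GRAMMAR):
    """Group (word, tag) pairs into chunks according to the grammar."""
    return RegexpParser(grammar).parse(tagged_tokens)

# Hand-tagged tokens (hypothetical tags; a real tagger may label them differently)
tagged = [("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
          ("jumps", "VBZ"), ("over", "IN"),
          ("the", "DT"), ("lazy", "JJ"), ("dog", "NN")]

tree = chunk_tagged(tagged)

# Collect the words of every NP subtree found by the parser
noun_phrases = [" ".join(word for word, tag in subtree.leaves())
                for subtree in tree.subtrees(lambda t: t.label() == "NP")]
print(noun_phrases)
```

In a full program the tagged tokens would normally come from `nltk.pos_tag(nltk.word_tokenize(sentence))`, and `tree.draw()` would open the tree viewer as in the listing above.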
PROGRAM
12. Write a Python NLTK program to find the sets of synonyms and
antonyms of a given word.
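import nltk
from nltk.corpus import wordnet
nltk.download("wordnet")  # only needed once
def synonym_antonym_extractor(phrase):
    synonyms = []
    antonyms = []
    # Walk every synset of the word, then every lemma in that synset
    for syn in wordnet.synsets(phrase):
        for l in syn.lemmas():
            synonyms.append(l.name())
            if l.antonyms():
                antonyms.append(l.antonyms()[0].name())
    print(set(synonyms))
    print(set(antonyms))
synonym_antonym_extractor(phrase="word")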
OUTPUT
PROGRAM
13. Print the first 15 names from a random combination of the labeled male
and labeled female names in the names corpus.
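One possible approach to this exercise, given as a sketch and not as the manual's own program: label every name with its gender, shuffle the combined list, and keep the first 15 entries. The function names `sample_labeled_names` and `print_from_names_corpus`, and the `k` and `seed` parameters, are hypothetical. The corpus access is kept in a separate function so that the labeling and shuffling logic can be tried on any two lists.

```python
import random

def sample_labeled_names(male_names, female_names, k=15, seed=None):
    """Label each name, shuffle the combined list, and return the first k pairs."""
    labeled = [(name, "male") for name in male_names]
    labeled += [(name, "female") for name in female_names]
    # A dedicated Random instance makes the shuffle repeatable when a seed is given
    random.Random(seed).shuffle(labeled)
    return labeled[:k]

def print_from_names_corpus():
    """Apply the sampler to the NLTK names corpus (needs nltk and the 'names' data)."""
    import nltk
    from nltk.corpus import names
    nltk.download("names")  # only needed once
    pairs = sample_labeled_names(names.words("male.txt"), names.words("female.txt"))
    for pair in pairs:
        print(pair)
```

Calling `print_from_names_corpus()` prints 15 `(name, label)` pairs; the selection changes on each run unless a seed is passed to `sample_labeled_names`.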