Ccs369-Lab Ex 3,4,5
AIM:
ALGORITHM:
PROGRAM:
1. Install NLTK:
pip install nltk
2. Import NLTK:
import nltk
3. Download Corpora:
nltk.download('gutenberg')
OUTPUT:
True
4. Access a Corpus:
from nltk.corpus import gutenberg
print(gutenberg.fileids())
text=gutenberg.raw('austen-emma.txt')
print(text[:1000])
OUTPUT:
['austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt', 'bible-kjv.txt',
'blake-poems.txt', 'bryant-stories.txt', 'burgess-busterbrown.txt', 'carroll-alice.txt',
'chesterton-ball.txt', 'chesterton-brown.txt', 'chesterton-thursday.txt',
'edgeworth-parents.txt', 'melville-moby_dick.txt', 'milton-paradise.txt',
'shakespeare-caesar.txt', 'shakespeare-hamlet.txt', 'shakespeare-macbeth.txt',
'whitman-leaves.txt']
[Emma by Jane Austen 1816]
VOLUME I
CHAPTER I
Emma Woodhouse, handsome, clever, and rich, with a comfortable home
and happy disposition, seemed to unite some of the best blessings
of existence; and had lived nearly twenty-one years in the world
with very little to distress or vex her.
She was the youngest of the two daughters of a most affectionate,
indulgent father; and had, in consequence of her sister's marriage,
been mistress of his house from a very early period. Her mother
had died too long ago for her to have more than an indistinct
remembrance of her caresses; and her place had been supplied
by an excellent woman as governess, who had fallen little short
of a mother in affection.
Sixteen years had Miss Taylor been in Mr. Woodhouse's family,
less as a governess than a friend, very fond of both daughters,
but particularly of Emma. Between _them_ it was more the intimacy
of sisters. Even before Miss Taylor had ceased to hold the nominal
office of governess, the mildness o
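Because gutenberg.raw() returns an ordinary Python string, standard string tools can be applied to it. The following is a minimal sketch, using only the standard library, of how basic counts could be taken from such a string. The function name corpus_summary, the regular expressions, and the sample sentence are assumptions made for illustration and are not part of NLTK.

```python
import re

def corpus_summary(raw_text):
    """Return rough character, word, sentence and vocabulary counts."""
    # Crude word tokenisation: runs of letters with an optional internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", raw_text)
    # Crude sentence split: ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", raw_text) if s.strip()]
    return {
        "chars": len(raw_text),
        "words": len(words),
        "sentences": len(sentences),
        "vocabulary": len({w.lower() for w in words}),
    }

# Hypothetical sample; in the lab, the string returned by gutenberg.raw() would be passed instead.
sample = "Emma Woodhouse was clever. Emma was rich! Was she happy?"
summary = corpus_summary(sample)
print(summary)
```

For real corpus work, NLTK's own readers gutenberg.words() and gutenberg.sents() tokenise more carefully than these regular expressions do.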
5. Download the Brown Corpus:
import nltk
nltk.download('brown')
OUTPUT:
True
6. Working with Other Corpora:
from nltk.corpus import brown
print(brown.categories())
news_text=brown.raw(categories='news')
print(news_text[:1000])
OUTPUT:
['adventure', 'belles_lettres', 'editorial', 'fiction', 'government', 'hobbies', 'humor',
'learned', 'lore', 'mystery', 'news', 'religion', 'reviews', 'romance', 'science_fiction']
OUTPUT:
['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've",
"you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his']
RESULT:
Thus the Python program for accessing text corpora using NLTK was
executed successfully and the output was verified.
EX.NO:04
WRITE A FUNCTION THAT FINDS THE 50 MOST FREQUENTLY OCCURRING WORDS
OF A TEXT THAT ARE NOT STOP WORDS
AIM:
ALGORITHM:
PROGRAM:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from collections import Counter
import string
nltk.download('stopwords')
nltk.download('punkt')
def get_most_frequent_words(text, num_words=50):
    # English stop words as a set for fast membership tests
    stop_words = set(stopwords.words('english'))
    # Tokenize, then keep only alphabetic tokens in lower case
    words = word_tokenize(text)
    words = [word.lower() for word in words if word.isalpha()]
    # Drop stop words and count what remains
    filtered_words = [word for word in words if word not in stop_words]
    word_counts = Counter(filtered_words)
    most_common_words = word_counts.most_common(num_words)
    return most_common_words

if __name__ == "__main__":
    example_text = """Everything we see around us constitutes nature, including
the sun, the moon, trees, flowers, fruits, human beings, birds, animals, etc. In
nature, everyone depends on one another to keep the ecosystem healthy. For
survival, every creature is interrelated and reliant on one another. Humans, for
example, rely on nature for their survival, and nature provides us with oxygen,
food, water, shelter, medicines, and clothing, among other things."""
    top_words = get_most_frequent_words(example_text)
    print(top_words)
OUTPUT:
[('nature', 4), ('us', 2), ('one', 2), ('another', 2), ('survival', 2), ('everything', 1),
('see', 1), ('around', 1), ('constitutes', 1), ('including', 1), ('sun', 1), ('moon', 1),
('trees', 1), ('flowers', 1), ('fruits', 1), ('human', 1), ('beings', 1), ('birds', 1),
('animals', 1), ('etc', 1), ('everyone', 1), ('depends', 1), ('keep', 1), ('ecosystem', 1),
('healthy', 1), ('every', 1), ('creature', 1), ('interrelated', 1), ('reliant', 1), ('humans',
1), ('example', 1), ('rely', 1), ('provides', 1), ('oxygen', 1), ('food', 1), ('water', 1),
('shelter', 1), ('medicines', 1), ('clothing', 1), ('among', 1), ('things', 1)]
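The counting step itself does not depend on NLTK. As a minimal sketch, assuming a regular-expression tokenizer and a deliberately tiny stop-word set in place of NLTK's, the same idea can be written with the standard library alone. The names STOP_WORDS and top_words and the sample sentence are hypothetical and used only for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set (an assumption; NLTK's English list is far longer).
STOP_WORDS = {"the", "a", "an", "and", "are", "of", "us", "is", "in", "on", "to", "for"}

def top_words(text, num_words=50, stop_words=STOP_WORDS):
    """Return the num_words most frequent non-stop words as (word, count) pairs."""
    # Lower-case the text and keep runs of letters only; punctuation and digits are dropped.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts.most_common(num_words)

sample = ("The sun and the moon are part of nature. "
          "Nature gives us food, and nature gives us water.")
result = top_words(sample, num_words=3)
print(result)
```

Unlike word_tokenize, the pattern [a-z]+ splits contractions such as "you're" into separate pieces, so counts can differ from the NLTK version on text that contains apostrophes.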
RESULT:
Thus the Python program for a function that finds the 50 most
frequently occurring words of a text that are not stop words was executed
successfully and the output was verified.
EX.NO:05
IMPLEMENT THE WORD2VEC MODEL
AIM:
ALGORITHM:
PROGRAM:
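As a minimal sketch of one step behind the skip-gram form of Word2Vec, the code below generates the (target, context) word pairs on which such a model is trained. It does not train any embeddings. The function name skipgram_pairs, the window size, and the toy sentence are assumptions made for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for each token and its neighbours within `window`."""
    pairs = []
    for i, target in enumerate(tokens):
        # Clamp the context window to the bounds of the sentence.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs

# Toy sentence (hypothetical); a real run would use tokenized corpus sentences.
tokens = "nature provides us with oxygen".split()
pairs = skipgram_pairs(tokens, window=1)
print(pairs)
```

A skip-gram model learns word vectors by predicting the context word from the target word in each pair. In practice a library such as gensim, which provides a Word2Vec class, would handle both pair generation and training.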
OUTPUT:
RESULT: