0% found this document useful (0 votes)
36 views6 pages

BR PRB 2

The document discusses analyzing inaugural speeches from three US presidents - Franklin D. Roosevelt in 1941, John F. Kennedy in 1961, and Richard Nixon in 1973 - using the NLTK library in Python. It performs the following analysis: 1. Finds the number of characters, words, and sentences for each speech. 2. Removes common stop words from the speeches. 3. Determines the top three most commonly used words for each president after removing stop words. 4. Generates a word cloud visualization for each speech to depict the most used words.

Uploaded by

Pratigya pathak
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
36 views6 pages

BR PRB 2

The document discusses analyzing inaugural speeches from three US presidents - Franklin D. Roosevelt in 1941, John F. Kennedy in 1961, and Richard Nixon in 1973 - using the NLTK library in Python. It performs the following analysis: 1. Finds the number of characters, words, and sentences for each speech. 2. Removes common stop words from the speeches. 3. Determines the top three most commonly used words for each president after removing stop words. 4. Generates a word cloud visualization for each speech to depict the most used words.

Uploaded by

Pratigya pathak
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 6

Problem 2:

In this particular project, we are going to work on the inaugural corpora from the nltk

in Python. We will be looking at the following speeches of the Presidents of the

United States of America:

1. President Franklin D. Roosevelt in 1941

2. President John F. Kennedy in 1961

3. President Richard Nixon in 1973

2.1 Find the number of characters, words, and

sentences for the mentioned documents. – 3Marks.

Import Libraries.

import nltk

nltk.download('inaugural')

from nltk.corpus import inaugural

inaugural.fileids()

inaugural.raw('1941-Roosevelt.txt')

inaugural.raw('1961-Kennedy.txt')

inaugural.raw('1973-Nixon.txt')

[nltk_data] Downloading package stopwords to

[nltk_data] C:\Users\Hp\AppData\Roaming\nltk_data...

[nltk_data] Package stopwords is already up-to-date!

[nltk_data] Downloading package punkt to

[nltk_data] C:\Users\Hp\AppData\Roaming\nltk_data...

[nltk_data] Package punkt is already up-to-date!

[nltk_data] Downloading package movie_reviews to

[nltk_data] C:\Users\Hp\AppData\Roaming\nltk_data...
[nltk_data] Package movie_reviews is already up-to-date!

[nltk_data] Downloading package inaugural to

[nltk_data] C:\Users\Hp\AppData\Roaming\nltk_data...

[nltk_data] Package inaugural is already up-to-date!

y0 = pd.DataFrame({'Text':inaugural.raw('1961-Kennedy.txt')},index = [0])

y1 = pd.DataFrame({'Text':inaugural.raw('1941-Roosevelt.txt')},index = [0])

y2 = pd.DataFrame({'Text':inaugural.raw( '1973-Nixon.txt')},index = [0])

Text wordcount char count sent c

[('the', 9446),

('of', 7087),

(',', 7045),

('and', 5146),

('.', 4856),

('to', 4414),

('in', 2561),

('a', 2184),

('our', 2021),

('that', 1748)]

Most Common top (10) Words Used by all 3 Presidents during the Inaugural Ceremony since the

Time.

2.2 Remove all the stop words from all three

speeches. – 3 Marks.

We can filter the stop words with the help to Filter, Sort & Stop function.

'i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "yo

u're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yours


elves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', '

herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'thei

rs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "tha

t'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been'

, 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing'

, 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until',

'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between'

, 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to

', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'ag

ain', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why

', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other',

'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than'

, 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'shou

ld', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', '

aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "does

n't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "i

sn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn'

t", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren'

, "weren't", 'won', "won't", 'wouldn', "wouldn't"]

from nltk.tokenize import word_tokenize

text =inaugural.raw('1941-Roosevelt.txt')

text_tokens = word_tokenize(y1['Text'][0])

tokens_without_sw = [word for word in text_tokens if not word in stop_t

est]

print(tokens_without_sw)
We need to tokenize the all three speeches to get the stop words and to get out the special

characters, Sentences and Words out of the Speeches.

filtered_sentence = (" ").join(tokens_without_sw)

print(filtered_sentence)

Need to Filter all speeches to get the speech in proper Maner., we can use function Filter Sentences.

2.3 Which word occurs the most number of

times in his inaugural address for each

president? Mention the top three words. (after

removing the stopwords) – 3 Marks¶

from collections import Counter

Roosevelt_split = filtered_sentence.split()#y0['Text'][0].split()

Roosevelt_counter = Counter(Roosevelt_split)

Kennedy_split = filtered_sentence.split()#y1['Text'][0].split()

Kenndey_counter = Counter(Kennedy_split)

Nixon_split = filtered_sentence.split()#y2['Text'][0].split()

Nixon_counter = Counter(Nixon_split)

In [39]:

Roosevelt_most_occur = Roosevelt_counter.most_common(10)

print("Most common word of Roosevelt speech ",Roosevelt_most_occur )

Roosevelt_freq = pd.DataFrame(Roosevelt_most_occur, columns= ['Roosevelt_Fr

equent_words', 'Roosevelt_total_words'])

Roosevelt_freq

Kennedy_most_occur = Kenndey_counter.most_common(10)

print("Most common word of Kennedy speech ",Kennedy_most_occur )

Kennedy_freq = pd.DataFrame(Kennedy_most_occur, columns= ['Kennedy_Frequent


_words', 'Kennedy_total_words'])

Kennedy_freq

Nixon_most_occur = Nixon_counter.most_common(10)

print("Most common word of Nixon speech ",Nixon_most_occur )

Nixon_freq = pd.DataFrame(Nixon_most_occur, columns= ['Nixon_Frequent_words

', 'Nixon_total_words'])

Nixon_freq

Nixon_Frequent_words Nixon_total_words

0 , 77

1 . 68

2 -- 25

3 It 13

4 The 10

5 know 10

6 We 10

7 spirit 9

8 life 9

9 us 8

The Most Common words use by the all 3 President during the Speech.

Most common word of Roosevelt speech [(',', 77), ('.', 68), ('--', 25

), ('It', 13), ('The', 10), ('know', 10), ('We', 10), ('spirit', 9),

('life', 9), ('us', 8)]

Most common word of Kennedy speech [(',', 77), ('.', 68), ('--', 25),

('It', 13), ('The', 10), ('know', 10), ('We', 10), ('spirit', 9), ('l

ife', 9), ('us', 8)]


Most common word of Nixon speech [(',', 77), ('.', 68), ('--', 25), (

'It', 13), ('The', 10), ('know', 10), ('We', 10), ('spirit', 9), ('li

fe', 9), ('us', 8)]

2.4 Plot the word cloud of each of the speeches of

the variable. (after removing the stopwords) – 3

Marks¶

from wordcloud import WordCloud,STOPWORDS

from wordcloud import WordCloud,STOPWORDS

words = ' '.join(y0['Text'])

cleaned_word = " ".join([word for word in words.split()

if '\n' not in word

With the Help of World Cloud Function, we can distinguish the most used word by the all 3 Presidents

During the Speech. We need to change the Vales of y0,y1,& y2 for app

You might also like