0% found this document useful (0 votes)

36 views6 pages

BR PRB 2

The document discusses analyzing inaugural speeches from three US presidents - Franklin D. Roosevelt in 1941, John F. Kennedy in 1961, and Richard Nixon in 1973 - using the NLTK library in Python. It performs the following analysis: 1. Finds the number of characters, words, and sentences for each speech. 2. Removes common stop words from the speeches. 3. Determines the top three most commonly used words for each president after removing stop words. 4. Generates a word cloud visualization for each speech to depict the most used words.

Uploaded by

Pratigya pathak

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

36 views6 pages

BR PRB 2

Uploaded by

Pratigya pathak

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 6

Problem 2:

In this particular project, we are going to work on the inaugural corpora from the nltk

in Python. We will be looking at the following speeches of the Presidents of the

United States of America:

1. President Franklin D. Roosevelt in 1941

2. President John F. Kennedy in 1961

3. President Richard Nixon in 1973

2.1 Find the number of characters, words, and

sentences for the mentioned documents. – 3Marks.

Import Libraries.

import nltk

nltk.download('inaugural')

from nltk.corpus import inaugural

inaugural.fileids()

inaugural.raw('1941-Roosevelt.txt')

inaugural.raw('1961-Kennedy.txt')

inaugural.raw('1973-Nixon.txt')

[nltk_data] Downloading package stopwords to

[nltk_data] C:\Users\Hp\AppData\Roaming\nltk_data...

[nltk_data] Package stopwords is already up-to-date!

[nltk_data] Downloading package punkt to

[nltk_data] C:\Users\Hp\AppData\Roaming\nltk_data...

[nltk_data] Package punkt is already up-to-date!

[nltk_data] Downloading package movie_reviews to

[nltk_data] C:\Users\Hp\AppData\Roaming\nltk_data...
[nltk_data] Package movie_reviews is already up-to-date!

[nltk_data] Downloading package inaugural to

[nltk_data] C:\Users\Hp\AppData\Roaming\nltk_data...

[nltk_data] Package inaugural is already up-to-date!

y0 = pd.DataFrame({'Text':inaugural.raw('1961-Kennedy.txt')},index = [0])

y1 = pd.DataFrame({'Text':inaugural.raw('1941-Roosevelt.txt')},index = [0])

y2 = pd.DataFrame({'Text':inaugural.raw( '1973-Nixon.txt')},index = [0])

Text wordcount char count sent c

[('the', 9446),

('of', 7087),

(',', 7045),

('and', 5146),

('.', 4856),

('to', 4414),

('in', 2561),

('a', 2184),

('our', 2021),

('that', 1748)]

Most Common top (10) Words Used by all 3 Presidents during the Inaugural Ceremony since the

Time.

2.2 Remove all the stop words from all three

speeches. – 3 Marks.

We can filter the stop words with the help to Filter, Sort & Stop function.

'i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "yo

u're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yours

elves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', '

herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'thei

rs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "tha

t'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been'

, 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing'

, 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until',

'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between'

, 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to

', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'ag

ain', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why

', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other',

'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than'

, 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'shou

ld', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', '

aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "does

n't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "i

sn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn'

t", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren'

, "weren't", 'won', "won't", 'wouldn', "wouldn't"]

from nltk.tokenize import word_tokenize

text =inaugural.raw('1941-Roosevelt.txt')

text_tokens = word_tokenize(y1['Text'][0])

tokens_without_sw = [word for word in text_tokens if not word in stop_t

est]

print(tokens_without_sw)
We need to tokenize the all three speeches to get the stop words and to get out the special

characters, Sentences and Words out of the Speeches.

filtered_sentence = (" ").join(tokens_without_sw)

print(filtered_sentence)

Need to Filter all speeches to get the speech in proper Maner., we can use function Filter Sentences.

2.3 Which word occurs the most number of

times in his inaugural address for each

president? Mention the top three words. (after

removing the stopwords) – 3 Marks¶

from collections import Counter

Roosevelt_split = filtered_sentence.split()#y0['Text'][0].split()

Roosevelt_counter = Counter(Roosevelt_split)

Kennedy_split = filtered_sentence.split()#y1['Text'][0].split()

Kenndey_counter = Counter(Kennedy_split)

Nixon_split = filtered_sentence.split()#y2['Text'][0].split()

Nixon_counter = Counter(Nixon_split)

In [39]:

Roosevelt_most_occur = Roosevelt_counter.most_common(10)

print("Most common word of Roosevelt speech ",Roosevelt_most_occur )

Roosevelt_freq = pd.DataFrame(Roosevelt_most_occur, columns= ['Roosevelt_Fr

equent_words', 'Roosevelt_total_words'])

Roosevelt_freq

Kennedy_most_occur = Kenndey_counter.most_common(10)

print("Most common word of Kennedy speech ",Kennedy_most_occur )

Kennedy_freq = pd.DataFrame(Kennedy_most_occur, columns= ['Kennedy_Frequent

_words', 'Kennedy_total_words'])

Kennedy_freq

Nixon_most_occur = Nixon_counter.most_common(10)

print("Most common word of Nixon speech ",Nixon_most_occur )

Nixon_freq = pd.DataFrame(Nixon_most_occur, columns= ['Nixon_Frequent_words

', 'Nixon_total_words'])

Nixon_freq

Nixon_Frequent_words Nixon_total_words

0 , 77

1 . 68

2 -- 25

3 It 13

4 The 10

5 know 10

6 We 10

7 spirit 9

8 life 9

9 us 8

The Most Common words use by the all 3 President during the Speech.

Most common word of Roosevelt speech [(',', 77), ('.', 68), ('--', 25

), ('It', 13), ('The', 10), ('know', 10), ('We', 10), ('spirit', 9),

('life', 9), ('us', 8)]

Most common word of Kennedy speech [(',', 77), ('.', 68), ('--', 25),

('It', 13), ('The', 10), ('know', 10), ('We', 10), ('spirit', 9), ('l

ife', 9), ('us', 8)]

Most common word of Nixon speech [(',', 77), ('.', 68), ('--', 25), (

'It', 13), ('The', 10), ('know', 10), ('We', 10), ('spirit', 9), ('li

fe', 9), ('us', 8)]

2.4 Plot the word cloud of each of the speeches of

the variable. (after removing the stopwords) – 3

Marks¶

from wordcloud import WordCloud,STOPWORDS

words = ' '.join(y0['Text'])

cleaned_word = " ".join([word for word in words.split()

if '\n' not in word

With the Help of World Cloud Function, we can distinguish the most used word by the all 3 Presidents

During the Speech. We need to change the Vales of y0,y1,& y2 for app

Supply Chin Management of Himalaya Company
83% (6)
Supply Chin Management of Himalaya Company
79 pages
New General Service List
No ratings yet
New General Service List
154 pages
Medicare ACOs 2015 Starters PDF
No ratings yet
Medicare ACOs 2015 Starters PDF
30 pages
Car Rental Mobile Apps
60% (5)
Car Rental Mobile Apps
56 pages
UNIT-2-Starter Motor and Drives
100% (1)
UNIT-2-Starter Motor and Drives
34 pages
Ecological Concern in Mahabharata
100% (1)
Ecological Concern in Mahabharata
3 pages
Hydraulic Motors: Concentric AB
100% (1)
Hydraulic Motors: Concentric AB
47 pages
E7125v3.0 (With RoHS)
No ratings yet
E7125v3.0 (With RoHS)
120 pages
BSSW 3 2 Proposal
No ratings yet
BSSW 3 2 Proposal
5 pages
Tutorial 2
No ratings yet
Tutorial 2
82 pages
Living with Complexity
From Everand
Living with Complexity
Donald A. Norman
3.5/5 (18)
Assignment 7 Text Analysis
No ratings yet
Assignment 7 Text Analysis
76 pages
Problem 2 Businessreport ML Docx 1
No ratings yet
Problem 2 Businessreport ML Docx 1
9 pages
20BCP123 - NLP Lab Manual
No ratings yet
20BCP123 - NLP Lab Manual
45 pages
Harvard Problem of The Week 19
No ratings yet
Harvard Problem of The Week 19
3 pages
Text Summarizer
No ratings yet
Text Summarizer
30 pages
Us Presidential Vocabulary - Ipynb
No ratings yet
Us Presidential Vocabulary - Ipynb
40 pages
Text-Summarizer-Using-Nlp-Advanced-Copy1 Updated
No ratings yet
Text-Summarizer-Using-Nlp-Advanced-Copy1 Updated
31 pages
Arnold & Son Catalogue 2013-2014
No ratings yet
Arnold & Son Catalogue 2013-2014
168 pages
Facility Location 9
No ratings yet
Facility Location 9
20 pages
DAV Example
No ratings yet
DAV Example
5 pages
The Reptilian Blood Legacy
No ratings yet
The Reptilian Blood Legacy
1 page
Captura de Pantalla 2024-09-24 A La(s) 1.28.16 P. M.
No ratings yet
Captura de Pantalla 2024-09-24 A La(s) 1.28.16 P. M.
34 pages
Filariasis
No ratings yet
Filariasis
2 pages
Text Mining Basics
No ratings yet
Text Mining Basics
16 pages
Online Bakery Business Plan Template
No ratings yet
Online Bakery Business Plan Template
9 pages
A Stop List For General Text
No ratings yet
A Stop List For General Text
17 pages
Natural Language Processing Lab Manual
No ratings yet
Natural Language Processing Lab Manual
24 pages
Assignment No.1 (Code 6507)
No ratings yet
Assignment No.1 (Code 6507)
12 pages
English Practice 23: Ture Ture Ture Ture TH TH TH TH C C C C Ed Ed Ed Ed H H H H
No ratings yet
English Practice 23: Ture Ture Ture Ture TH TH TH TH C C C C Ed Ed Ed Ed H H H H
12 pages
Order-Splitting and Long-Memory in An Order-Driven Market: T E P J B
No ratings yet
Order-Splitting and Long-Memory in An Order-Driven Market: T E P J B
7 pages
Hindawi Interview PDF
No ratings yet
Hindawi Interview PDF
60 pages
ARC 073 - Urban Design
No ratings yet
ARC 073 - Urban Design
62 pages
MANGANESE
No ratings yet
MANGANESE
6 pages
Final Project ML Nikita Chaturvedi 03.10.2021 Text Analytics
No ratings yet
Final Project ML Nikita Chaturvedi 03.10.2021 Text Analytics
32 pages
President Words - Ipynb
No ratings yet
President Words - Ipynb
8 pages
Comprehensive Land Use Plans
No ratings yet
Comprehensive Land Use Plans
8 pages
Elements and Principles of Art
No ratings yet
Elements and Principles of Art
5 pages
The Fitness Club Topo
No ratings yet
The Fitness Club Topo
6 pages
Exercises For Morphology
No ratings yet
Exercises For Morphology
7 pages
Rev Insurance Business Report
No ratings yet
Rev Insurance Business Report
4 pages
Document
No ratings yet
Document
5 pages
Excel Worksheet Meg - S General Store - Teacher Version
No ratings yet
Excel Worksheet Meg - S General Store - Teacher Version
7 pages
Lesson 2.3 - Progressive Presidents
No ratings yet
Lesson 2.3 - Progressive Presidents
5 pages
Ex4 Lab
No ratings yet
Ex4 Lab
4 pages
Past Tense Worksheet PDF
No ratings yet
Past Tense Worksheet PDF
4 pages
9.8 Text Analysis-Dispersion Plots
No ratings yet
9.8 Text Analysis-Dispersion Plots
6 pages
Text Analysis
No ratings yet
Text Analysis
2 pages
Ccs369 - Text and Speech Analysis - Lab Manual
100% (1)
Ccs369 - Text and Speech Analysis - Lab Manual
23 pages
NLP Programming
No ratings yet
NLP Programming
39 pages
Performance Management
No ratings yet
Performance Management
18 pages
General Comprehension: Words From The Speech
No ratings yet
General Comprehension: Words From The Speech
8 pages
NLP Pratical
No ratings yet
NLP Pratical
14 pages
HW2 Answers
No ratings yet
HW2 Answers
3 pages
Monash - Bachelor of Pharmaceutical Science
No ratings yet
Monash - Bachelor of Pharmaceutical Science
1 page
Team Assignment 1
No ratings yet
Team Assignment 1
3 pages
High Low Voltage Protection Device Syste
No ratings yet
High Low Voltage Protection Device Syste
6 pages
Examen Final Verbos Irregulares
No ratings yet
Examen Final Verbos Irregulares
5 pages
Vocabulary Practice
No ratings yet
Vocabulary Practice
3 pages
Summary of Dissertation
No ratings yet
Summary of Dissertation
5 pages
Lab - Manual - IR - BE AI&DS CL II
No ratings yet
Lab - Manual - IR - BE AI&DS CL II
38 pages
Write A Dialogue Using The
No ratings yet
Write A Dialogue Using The
3 pages
Problem 1:: Readingcsv PD Read - Excel (Readingcsv) Readingcsv Head
No ratings yet
Problem 1:: Readingcsv PD Read - Excel (Readingcsv) Readingcsv Head
18 pages
The 1200 Most Frequently Used Words
100% (1)
The 1200 Most Frequently Used Words
7 pages
(Ebook) The New Global Rulers: The Privatization of Regulation in The World Economy by Tim Büthe Walter Mattli ISBN 9781400838790 Instant Download
100% (4)
(Ebook) The New Global Rulers: The Privatization of Regulation in The World Economy by Tim Büthe Walter Mattli ISBN 9781400838790 Instant Download
41 pages
Cuet PG History Question Paper 2023
No ratings yet
Cuet PG History Question Paper 2023
75 pages
NLP EXP 3 (A) - Word Analysis
No ratings yet
NLP EXP 3 (A) - Word Analysis
2 pages
Tsarecord
No ratings yet
Tsarecord
22 pages
English - 9 Grade - 5 Level - Worksheet 3
No ratings yet
English - 9 Grade - 5 Level - Worksheet 3
2 pages
Bag of Words 03 and 04 Model
No ratings yet
Bag of Words 03 and 04 Model
4 pages
English - 9 Grade - 5 Level - Worksheet 3
No ratings yet
English - 9 Grade - 5 Level - Worksheet 3
2 pages
1 A World Language: Despite Various Attempts To - Create Universal
No ratings yet
1 A World Language: Despite Various Attempts To - Create Universal
3 pages
7 Idf
No ratings yet
7 Idf
5 pages
Frequency Density Word List
No ratings yet
Frequency Density Word List
8 pages
Files in Python
No ratings yet
Files in Python
22 pages
Clanbridge Food Menu 10-16 Feb 2
No ratings yet
Clanbridge Food Menu 10-16 Feb 2
1 page
Wsma Final Manual
No ratings yet
Wsma Final Manual
58 pages
Text Processing
No ratings yet
Text Processing
16 pages
Palompon South - Tambis ES - 3rd Portfolio Day - 24-25 - ACCOMPLISHMENT REPORT
No ratings yet
Palompon South - Tambis ES - 3rd Portfolio Day - 24-25 - ACCOMPLISHMENT REPORT
2 pages
Preliminaries Zeta PREPBOOK 2023 2024 A 1687770309
No ratings yet
Preliminaries Zeta PREPBOOK 2023 2024 A 1687770309
14 pages
TSA Student
No ratings yet
TSA Student
20 pages
1-NLP - Lab Manual
No ratings yet
1-NLP - Lab Manual
15 pages
Problem 2
100% (1)
Problem 2
10 pages
Business Report Problem 2
No ratings yet
Business Report Problem 2
10 pages
Learn Python through Nursery Rhymes and Fairy Tales: Classic Stories Translated into Python Programs (Coding for Kids and Beginners)
From Everand
Learn Python through Nursery Rhymes and Fairy Tales: Classic Stories Translated into Python Programs (Coding for Kids and Beginners)
Shari Eskenas
5/5 (1)
The Following Table Is A List of The Most Common Irregular Verbs in English
No ratings yet
The Following Table Is A List of The Most Common Irregular Verbs in English
3 pages
ML Project Report: (Text Learning Case Study)
No ratings yet
ML Project Report: (Text Learning Case Study)
9 pages
Problem 2 Businessreport ML
No ratings yet
Problem 2 Businessreport ML
9 pages
Editing, Omissions, Cloze Exercise
No ratings yet
Editing, Omissions, Cloze Exercise
5 pages
Test 30
No ratings yet
Test 30
9 pages
Text Analysis With NLTK Cheatsheet
No ratings yet
Text Analysis With NLTK Cheatsheet
3 pages
Text Analysis With NLTK Cheatsheet PDF
No ratings yet
Text Analysis With NLTK Cheatsheet PDF
3 pages
Text Analysis With NLTK Cheatsheet PDF
No ratings yet
Text Analysis With NLTK Cheatsheet PDF
3 pages
(PDF Download) Environment The Science Behind The Stories 5th Edition Withgott Test Bank Full Chapter
100% (11)
(PDF Download) Environment The Science Behind The Stories 5th Edition Withgott Test Bank Full Chapter
44 pages

BR PRB 2

Uploaded by

BR PRB 2

Uploaded by

Problem 2:

in Python. We will be looking at the following speeches of the Presidents of the

United States of America:

1. President Franklin D. Roosevelt in 1941

2. President John F. Kennedy in 1961

3. President Richard Nixon in 1973

2.1 Find the number of characters, words, and

sentences for the mentioned documents. – 3Marks.

from nltk.corpus import inaugural

[nltk_data] Downloading package stopwords to

[nltk_data] Package stopwords is already up-to-date!

[nltk_data] Downloading package punkt to

[nltk_data] Package punkt is already up-to-date!

[nltk_data] Downloading package movie_reviews to

[nltk_data] Downloading package inaugural to

[nltk_data] Package inaugural is already up-to-date!

y2 = pd.DataFrame({'Text':inaugural.raw( '1973-Nixon.txt')},index = [0])

Text wordcount char count sent c

2.2 Remove all the stop words from all three

u're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yours

herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'thei

rs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "tha

, 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing'

'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between'

, 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to

ain', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why

aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "does

n't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "i

sn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn'

t", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren'

, "weren't", 'won', "won't", 'wouldn', "wouldn't"]

from nltk.tokenize import word_tokenize

tokens_without_sw = [word for word in text_tokens if not word in stop_t

characters, Sentences and Words out of the Speeches.

filtered_sentence = (" ").join(tokens_without_sw)

2.3 Which word occurs the most number of

times in his inaugural address for each

president? Mention the top three words. (after

removing the stopwords) – 3 Marks¶

from collections import Counter

print("Most common word of Roosevelt speech ",Roosevelt_most_occur )

Roosevelt_freq = pd.DataFrame(Roosevelt_most_occur, columns= ['Roosevelt_Fr

print("Most common word of Kennedy speech ",Kennedy_most_occur )

Kennedy_freq = pd.DataFrame(Kennedy_most_occur, columns= ['Kennedy_Frequent

print("Most common word of Nixon speech ",Nixon_most_occur )

Nixon_freq = pd.DataFrame(Nixon_most_occur, columns= ['Nixon_Frequent_words

('life', 9), ('us', 8)]

ife', 9), ('us', 8)]

fe', 9), ('us', 8)]

2.4 Plot the word cloud of each of the speeches of

the variable. (after removing the stopwords) – 3

from wordcloud import WordCloud,STOPWORDS

from wordcloud import WordCloud,STOPWORDS

words = ' '.join(y0['Text'])

cleaned_word = " ".join([word for word in words.split()

if '\n' not in word

You might also like