DAV Example
The resultant graph will look similar to the one given below:
Task 1 Read the file speech.txt using Python and extract each word in the file. (Remember, this list will and should contain duplicates.)
Solution
# Import libraries
import numpy as np
import pandas as pd
import string
import matplotlib.pyplot as plt
%matplotlib inline
# Read the speech.txt file and create a word list
file_name = "speech.txt"
word_list = []
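The word-extraction step could be filled in along the following lines. This is a minimal sketch rather than the worksheet's own solution: the helper names extract_words and read_word_list are hypothetical, and it assumes that the text is lowercased and ASCII punctuation is stripped, which is consistent with the sample output shown for this task.

```python
import string


def extract_words(text):
    """Lowercase the text, drop ASCII punctuation, and split on whitespace."""
    # Translation table that deletes every character in string.punctuation
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def read_word_list(file_name):
    """Read a text file and return all of its words, duplicates included."""
    with open(file_name, encoding="utf-8") as handle:
        return extract_words(handle.read())


# Illustrative sentence only; the real input would be speech.txt
sample_words = extract_words("Thank you. Today, I want to talk about the United Kingdom.")
```

With these helpers, word_list = read_word_list(file_name) followed by print(word_list[:20]) would show the first 20 words. Note that stripping punctuation also removes apostrophes, so a contraction such as "don't" becomes "dont" and will no longer match the stopword "don't".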
OUTPUT
['thank', 'you', 'today', 'i', 'want', 'to', 'talk', 'about', 'the', 'united', 'kingdom', 'our', 'place', 'in', 'the',
'world', 'and', 'our', 'membership', 'of']
Task 2 Read stopwords.csv using pandas and extract all the stopwords in the file that belong to the English language.
Solution
# Read the 'stopwords.csv' file into a pandas DataFrame and print its head()
data = pd.read_csv('stopwords.csv')
data.head()
OUTPUT
# Selecting stopwords that belong to English language
english_stopwords = set(data[data['Language'] == 'English']['Words'].tolist())
print(english_stopwords)
OUTPUT
{'was', 're', 'himself', 'after', 'ourselves', 'whom', 'out', 'against', 'has', 'aren', 't', 'we', 'have', 'own',
'it', 'as', 'do', "should've", "aren't", "hadn't", 'where', 'll', 'its', 'their', 'again', 'below', 's', 've', 'the',
'too', 'wouldn', 'more', 'y', 'can', 'are', 'itself', 'needn', 'she', 'having', 'now', 'or', 'at', 'nor', 'about',
"don't", 'very', 'were', 'mustn', "mustn't", 'his', "you're", "wasn't", 'once', 'which', 'doing', 'don',
"mightn't", 'but', "shouldn't", 'ain', 'most', 'then', "isn't", 'with', 'both', 'them', 'weren', 'be', 'had',
'will', 'hasn', 'isn', 'that', "couldn't", 'there', 'only', 'into', 'here', 'does', 'doesn', 'herself', 'me', 'by',
'is', "shan't", 'if', "it's", 'from', 'while', 'theirs', 'wasn', 'of', 'because', 'just', 'm', 'than', 'in', 'haven',
'yourself', 'him', "she's", 'my', 'our', 'did', 'off', 'each', "won't", 'shouldn', 'above', 'other',
'yourselves', 'how', 'they', 'hers', 'ours', 'hadn', 'am', 'her', 'shan', 'such', 'couldn', "haven't", 'those',
'when', 'been', 'for', 'same', "you've", 'between', "didn't", 'why', 'and', 'ma', 'on', 'should', 'under',
"you'll", 'over', 'some', 'further', "needn't", "doesn't", 'myself', 'until', "weren't", 'before', 'any', 'he',
'o', 'didn', 'being', 'not', 'few', 'to', 'up', 'won', 'an', 'down', 'd', "that'll", 'through', "wouldn't", 'you',
'during', 'your', 'what', 'mightn', 'i', 'all', 'yours', 'no', 'so', 'a', 'these', 'themselves', "you'd", 'who',
"hasn't", 'this'}
Task 3 Remove all the words extracted from the speech file if those words are present in the stopwords.
Solution
word_list = [word for word in word_list if word not in english_stopwords]
OUTPUT
['thank', 'today', 'want', 'talk', 'united', 'kingdom', 'place', 'world', 'membership', 'european',
'union', 'start', 'want', 'make', 'clear', 'see', 'rally', 'attack', 'even', 'criticism']
Task 4 Sort the pruned list of words from the file in descending order of frequency of occurrence.
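One possible way to build the frequency table is sketched below. It assumes that word_list is the pruned list from Task 3 and that the result should be a dict named sorted_word_freq, since the plotting code in Task 5 indexes it by word. The use of collections.Counter is a choice made here for illustration, and the sample words are made up.

```python
from collections import Counter

# Stand-in for the pruned word list from Task 3 (illustrative data only)
word_list = ["want", "europe", "want", "union", "europe", "want"]

# Count how many times each word occurs
word_freq = Counter(word_list)

# most_common() returns (word, count) pairs, highest count first;
# dict() keeps that order (Python 3.7+)
sorted_word_freq = dict(word_freq.most_common())

print(sorted_word_freq)
# {'want': 3, 'europe': 2, 'union': 1}
```

Passing a number, as in word_freq.most_common(15), would keep only the 15 most frequent words, which is what the bar chart in Task 5 needs.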
Task 5 Plot a bar chart that represents the frequency of the top 15 words of the speech.
Solution
# Separate sorted_word_freq into two lists (words and counts) for easy plotting
keys = [key for key in sorted_word_freq]
values = [sorted_word_freq[key] for key in sorted_word_freq]
OUTPUT
# Plotting the words on the x-axis and frequency along the y-axis
OUTPUT
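The plotting step named in the comment above might look like the following. This is a sketch under stated assumptions, not the worksheet's own code: the function name plot_top_words and the demo frequencies are hypothetical, and sorted_word_freq is assumed to be a dict already ordered from most to least frequent.

```python
import matplotlib.pyplot as plt


def plot_top_words(sorted_word_freq, top_n=15):
    """Draw a bar chart of the top_n most frequent words and return the axes."""
    # Assumes sorted_word_freq is a dict ordered from most to least frequent
    top_items = list(sorted_word_freq.items())[:top_n]
    keys = [word for word, _ in top_items]
    values = [count for _, count in top_items]

    fig, ax = plt.subplots(figsize=(12, 5))
    ax.bar(keys, values)                          # words on x, counts on y
    ax.set_xlabel("Word")
    ax.set_ylabel("Frequency")
    ax.set_title(f"Top {len(keys)} words in the speech")
    ax.tick_params(axis="x", labelrotation=45)    # keep long words readable
    return ax


# Illustrative frequencies only; not taken from speech.txt
demo_freq = {f"word{i}": 20 - i for i in range(20)}
ax = plot_top_words(demo_freq)
```

In a notebook with %matplotlib inline the figure is rendered when the cell finishes; in a plain script, a call to plt.show() after plot_top_words would be needed to display it.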
THE END