Python NLP Assignment
1. Define Natural Language Processing (NLP). Provide three real-world applications of NLP and explain
how they impact society.
Answer:
Natural Language Processing (NLP) is a field of computer science, artificial intelligence, and linguistics that
focuses on enabling computers to understand, interpret, and generate human language.
Three Real-World Applications of NLP:
1. Machine Translation
o Example: Google Translate
o Impact: Facilitates global communication by breaking down language barriers.
2. Sentiment Analysis
o Example: Social media sentiment analysis
o Impact: Helps businesses understand customer feedback and improve products/services.
3. Chatbots and Virtual Assistants
o Example: Amazon Alexa, Apple Siri
o Impact: Enhances customer service efficiency and reduces human labor costs.
2. Explain the following terms and their significance in NLP: Tokenization, Stemming, Lemmatization
Answer:
• Tokenization
The process of splitting text into individual words or sentences.
Significance: It helps NLP systems understand the basic units of text.
• Stemming
The process of reducing words to their root form (e.g., "running" → "run").
Significance: Reduces vocabulary diversity and improves processing efficiency.
• Lemmatization
The process of reducing words to their base form (e.g., "better" → "good").
Significance: Provides more accurate word meanings compared to stemming.
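A minimal NLTK sketch illustrating the difference between the two (it assumes the WordNet data has been downloaded; the lemmatizer needs a part-of-speech hint to map "better" to "good"):
from nltk.stem import PorterStemmer, WordNetLemmatizer
# Requires a one-time nltk.download('wordnet') before the lemmatizer can be used
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print(stemmer.stem("running"))               # run
print(lemmatizer.lemmatize("running", "v"))  # run (treated as a verb)
print(stemmer.stem("better"))                # better; stemming cannot relate it to "good"
print(lemmatizer.lemmatize("better", "a"))   # good (treated as an adjective)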
3. What is Part-of-Speech (POS) tagging, and why is it important in NLP?
Answer:
POS Tagging: The process of labeling each word in a text with its grammatical part of speech (e.g., noun,
verb, adjective).
Importance: It helps understand the grammatical structure of sentences, which is essential for many NLP
tasks.
Example:
output:
This is a TextBlob
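The example code that produced the output above is not included; a minimal sketch using TextBlob, assuming the sample string "This is a TextBlob" from the output (TextBlob's corpora must be installed first, e.g. with python -m textblob.download_corpora):
from textblob import TextBlob
blob = TextBlob("This is a TextBlob")  # sample string taken from the output shown above
print(blob)       # prints the text itself: This is a TextBlob
print(blob.tags)  # POS tags, e.g. [('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('TextBlob', 'NN')]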
5. Write a Python script to perform the following tasks on the given text:
• Tokenize the text into words and sentences.
• Perform stemming and lemmatization using NLTK or SpaCy.
• Remove stop words from the text.
• Sample Text:
”Natural Language Processing enables machines to understand and process human languages.
It is a fascinating field with numerous applications, such as chatbots and language translation.”
import nltk
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import stopwords
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
sample_text = ("Natural Language Processing enables machines to understand and process human languages. "
               "It is a fascinating field with numerous applications, such as chatbots and language translation.")
# Tokenize, stem, lemmatize, and remove stop words
words = word_tokenize(sample_text)
stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
stop_words = set(stopwords.words('english'))
print("Tokenized Words:", words)
print("Tokenized Sentences:", sent_tokenize(sample_text))
print("Stemmed Words:", [stemmer.stem(w) for w in words])
print("Lemmatized Words:", [lemmatizer.lemmatize(w) for w in words])
print("Filtered Words (Stop Words Removed):", [w for w in words if w.lower() not in stop_words])
Output:
Tokenized Words: ['Natural', 'Language', 'Processing', 'enables', 'machines', 'to', 'understand', 'and', 'process',
'human', 'languages', '.', 'It', 'is', 'a', 'fascinating', 'field', 'with', 'numerous', 'applications', ',', 'such', 'as', 'chatbots',
'and', 'language', 'translation', '.']
Tokenized Sentences: ['Natural Language Processing enables machines to understand and process human
languages.', 'It is a fascinating field with numerous applications, such as chatbots and language translation.']
Stemmed Words: ['Natur', 'Languag', 'Process', 'enabl', 'machin', 'to', 'understand', 'and', 'process', 'human',
'languag', '.', 'It', 'is', 'a', 'fascin', 'field', 'with', 'numer', 'applic', ',', 'such', 'as', 'chatbot', 'and', 'languag', 'translat', '.']
Lemmatized Words: ['Natural', 'Language', 'Processing', 'enable', 'machines', 'to', 'understand', 'and', 'process',
'human', 'languages', '.', 'It', 'is', 'a', 'fascinating', 'field', 'with', 'numerous', 'applications', ',', 'such', 'as', 'chatbots',
'and', 'language', 'translation', '.']
Filtered Words (Stop Words Removed): ['Natural', 'Language', 'Processing', 'enables', 'machines', 'understand',
'process', 'human', 'languages', '.', 'fascinating', 'field', 'numerous', 'applications', ',', 'chatbots', 'language',
'translation', '.']
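The question also allows spaCy. For comparison, a minimal spaCy sketch, assuming the en_core_web_sm model is installed (spaCy provides lemmas and stop-word flags but has no stemmer):
import spacy
nlp = spacy.load("en_core_web_sm")
sample_text = ("Natural Language Processing enables machines to understand and process human languages. "
               "It is a fascinating field with numerous applications, such as chatbots and language translation.")
doc = nlp(sample_text)
print("Sentences:", [sent.text for sent in doc.sents])
print("Tokens:", [token.text for token in doc])
print("Lemmas:", [token.lemma_ for token in doc])
print("Without Stop Words:", [token.text for token in doc if not token.is_stop and not token.is_punct])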
url = "https://www.python.org"
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
text = soup.get_text()
nltk.download('punkt')
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))
words = word_tokenize(text)
filtered_words = [word for word in words if word.lower() not in stop_words and word.isalnum()]
7. (Tokenizing Text and Noun Phrases) Using the text from above problem, create a TextBlob, then
tokenize it into Sentences and Words, and extract its noun phrases.
text = "Natural Language Processing enables machines to understand and process human languages. It is a
fascinating field with numerous applications, such as chatbots and language translation."
blob = TextBlob(text)
sentences = blob.sentences
words = blob.words
noun_phrases = blob.noun_phrases
print("Sentences:", sentences)
print("Words:", words)
print("Noun Phrases:", noun_phrases)
output:
Sentences: [Sentence("Natural Language Processing enables machines to understand and process human
languages."), Sentence("It is a fascinating field with numerous applications, such as chatbots and language
translation.")]
Words: WordList(['Natural', 'Language', 'Processing', 'enables', 'machines', 'understand', 'process', 'human',
'languages', '.', 'It', 'is', 'fascinating', 'field', 'numerous', 'applications', ',', 'such', 'chatbots', 'language', 'translation',
'.'])
Noun Phrases: WordList(['Natural Language Processing', 'machines', 'human languages', 'fascinating field',
'numerous applications', 'chatbots', 'language translation'])
8. (Sentiment of a News Article) Using the techniques in problem no. 6, download a web page for a
current news article and create a TextBlob. Display the sentiment for the entire TextBlob and for each
Sentence.
Code:
from textblob import TextBlob
from bs4 import BeautifulSoup
import requests
url = "https://example-news-article.com"
response = requests.get(url)
# Extract the article's plain text from the HTML before analyzing sentiment
article_text = BeautifulSoup(response.content, 'html.parser').get_text()
blob = TextBlob(article_text)
print("Overall Sentiment:", blob.sentiment)
for sentence in blob.sentences:
    print("Sentence:", sentence)
    print("Sentiment:", sentence.sentiment)
output:
Overall Sentiment: Sentiment(polarity=0.5, subjectivity=0.6)
Sentence: This is a sample news article.
Sentiment: Sentiment(polarity=0.5, subjectivity=0.6)
...
9. (Sentiment of a News Article with the NaiveBayesAnalyzer) Repeat the previous exercise but use
the NaiveBayesAnalyzer for sentiment analysis.
Code:
from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer
import requests
url = "https://example-news-article.com"
response = requests.get(url)
article_text = response.text
# NaiveBayesAnalyzer is trained on a movie-review corpus (requires TextBlob's downloaded corpora)
blob = TextBlob(article_text, analyzer=NaiveBayesAnalyzer())
print("Overall Sentiment:", blob.sentiment)
for sentence in blob.sentences:
    print("Sentence:", sentence)
    print("Sentiment:", sentence.sentiment)
output:
Overall Sentiment: Sentiment(classification='pos', p_pos=0.8, p_neg=0.2)
Sentence: This is a sample news article.
Sentiment: Sentiment(classification='pos', p_pos=0.8, p_neg=0.2)
...
10. (Spell Check a Project Gutenberg Book) Download a Project Gutenberg book and create a TextBlob.
Tokenize the TextBlob into Words and determine whether any are misspelled. If so, display the possible
corrections.
Code:
from textblob import TextBlob
import requests
url = "https://www.gutenberg.org/files/1342/1342-0.txt"
response = requests.get(url)
book_text = response.text
blob = TextBlob(book_text)
words = blob.words
# spellcheck() returns (word, confidence) pairs; a top confidence below 1.0 suggests a misspelling.
# Checking every word of a full book is slow, so you may want to limit the number of words checked.
misspelled_words = [word for word in words if word.spellcheck()[0][1] < 1.0]
for word in misspelled_words:
    print(f"Word: '{word}'")
    print("Corrections:", word.spellcheck())
output:
Word: 'Thou'
Corrections: [('Thou', 1.0)]
...
11. (Textatistic: Readability of News Articles) Using the above techniques, download from several
news sites current news articles on the same topic. Perform readability assessments on them to determine
which sites are the most readable. For each article, calculate the average number of words per
sentence, the average number of characters per word, and the average number of syllables per word.
Code:
from textatistic import Textatistic
from bs4 import BeautifulSoup
import requests
# Placeholder URLs for articles on the same topic from different news sites
for url in ["https://example-news-site1.com", "https://example-news-site2.com"]:
    text = BeautifulSoup(requests.get(url).content, 'html.parser').get_text()
    stats = Textatistic(text)  # word, sentence, character, and syllable counts
    print(f"URL: {url}")
    print(f"Average Words per Sentence: {stats.word_count / stats.sent_count}")
    print(f"Average Characters per Word: {stats.char_count / stats.word_count}")
    print(f"Average Syllables per Word: {stats.sybl_count / stats.word_count}")
output:
URL: https://example-news-site1.com
Average Words per Sentence: 15.2
Average Characters per Word: 4.8
Average Syllables per Word: 1.2
...
12. (spaCy: Named Entity Recognition) Using the above techniques, download a current news article,
then use the spaCy library’s named entity recognition capabilities to display the named entities
(people, places, organizations, etc.) in the article.
Code:
import spacy
import requests
from bs4 import BeautifulSoup
nlp = spacy.load("en_core_web_sm")
url = "https://example-news-article.com"
response = requests.get(url)
# Extract the article's plain text from the HTML before running the NLP pipeline
text = BeautifulSoup(response.content, 'html.parser').get_text()
doc = nlp(text)
for ent in doc.ents:
    print(f"Entity: {ent.text}, Label: {ent.label_}")
13. (spaCy: Shakespeare Similarity Detection) Using the spaCy techniques, download a Shakespeare
comedy from Project Gutenberg and compare it for similarity with Romeo and Juliet.
Code:
import spacy
import requests
nlp = spacy.load("en_core_web_md")  # a model with word vectors is needed for similarity
# Plain-text Project Gutenberg URLs; substitute the file URLs for the two plays being compared
url1 = "https://..."  # Romeo and Juliet
url2 = "https://..."  # a Shakespeare comedy of your choice
response1 = requests.get(url1)
response2 = requests.get(url2)
doc1 = nlp(response1.text)
doc2 = nlp(response2.text)
print("Similarity:", doc1.similarity(doc2))
output:
Similarity: 0.78
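Note that en_core_web_md computes document similarity from averaged word vectors, so the score mainly reflects overall vocabulary overlap between the two plays rather than plot or structure; long texts of the same genre therefore tend to produce high similarity values.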
14. (textblob.utils Utility Functions) Use the strip_punc and lowerstrip functions of TextBlob’s textblob.utils
module with the all=True keyword argument to remove punctuation and to get a string in all lowercase
letters with whitespace and punctuation removed. Experiment with each function on Romeo and
Juliet.
Code:
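The code for this exercise is not included above; a minimal sketch of the two textblob.utils calls, applied here to a short sample string for readability (the same calls can be applied to the full downloaded text of Romeo and Juliet):
from textblob.utils import strip_punc, lowerstrip
text = "Romeo and Juliet"  # short sample string; substitute the downloaded play text
print("Original Text:", text)
print("Stripped Punctuation:", strip_punc(text, all=True))
print("Lowercase and Stripped:", lowerstrip(text, all=True))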
output:
Original Text: Romeo and Juliet
Stripped Punctuation: Romeo and Juliet
Lowercase and Stripped: romeo and juliet
15. (Research: Funny Newspaper Headlines) To understand how tricky it is to work with natural language
and its inherent ambiguity issues, research “funny newspaper headlines.” List the challenges
you find.