Python NLP Assignment

The document provides an overview of Natural Language Processing (NLP), including its definition and real-world applications such as machine translation, sentiment analysis, and chatbots. It also explains key NLP concepts like tokenization, stemming, lemmatization, and part-of-speech tagging, along with practical Python code examples for various NLP tasks. Additionally, it discusses challenges related to ambiguity in natural language through examples of funny newspaper headlines.

Department of Computer Science & Engineering

Faculty of Engineering & Technology (ITER)

1. Define Natural Language Processing (NLP). Provide three real-world applications of NLP and explain
how they impact society.

Answer:
Natural Language Processing (NLP) is a field of computer science, artificial intelligence, and linguistics that
focuses on enabling computers to understand, interpret, and generate human language.
Three Real-World Applications of NLP:

1. Machine Translation
o Example: Google Translate
o Impact: Facilitates global communication by breaking down language barriers.
2. Sentiment Analysis
o Example: Social media sentiment analysis
o Impact: Helps businesses understand customer feedback and improve products/services.
3. Chatbots and Virtual Assistants
o Example: Amazon Alexa, Apple Siri
o Impact: Enhances customer service efficiency and reduces human labor costs.

2. Explain the following terms and their significance in NLP: Tokenization, Stemming, Lemmatization

Answer:

• Tokenization
The process of splitting text into individual words or sentences.
Significance: It gives NLP systems the basic units of text to work with.
• Stemming
The process of stripping word endings with heuristic rules to reach a root form (e.g., "running" → "run"); the result is not always a real word (e.g., "studies" → "studi").
Significance: Reduces vocabulary size and improves processing efficiency.
• Lemmatization
The process of reducing words to their dictionary base form, or lemma, using vocabulary and morphological analysis (e.g., "better" → "good").
Significance: Produces valid words and more accurate meanings than stemming, at additional computational cost.
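A minimal NLTK sketch of the difference (note that WordNetLemmatizer treats words as nouns by default, so mapping "better" to "good" needs the pos='a' adjective hint):

import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download('wordnet')  # lexical database used by the lemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem("running"))                  # run (heuristic suffix stripping)
print(stemmer.stem("studies"))                  # studi (not a real word)
print(lemmatizer.lemmatize("studies"))          # study (valid dictionary form)
print(lemmatizer.lemmatize("better", pos="a"))  # good (with the adjective hint)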

3. What is Part-of-Speech (POS) tagging? Discuss its importance with an example.

Answer:
POS Tagging: The process of labeling each word in a text with its grammatical part of speech (e.g., noun,
verb, adjective).
Importance: It helps understand the grammatical structure of sentences, which is essential for many NLP
tasks.
Example:

• Sentence: "This is a TextBlob"


• POS Tags:
o "This" → Determiner (DT; a demonstrative, commonly glossed as a pronoun)
o "is" → Verb (VBZ)
o "a" → Determiner (DT)
o "TextBlob" → Noun (NN)

4. Create a TextBlob named exercise_blob containing "This is a TextBlob".


Answer:
from textblob import TextBlob

exercise_blob = TextBlob("This is a TextBlob")
print(exercise_blob)

output:
This is a TextBlob

5. Write a Python script to perform the following tasks on the given text:
• Tokenize the text into words and sentences.
• Perform stemming and lemmatization using NLTK or SpaCy.
• Remove stop words from the text.
• Sample Text:
"Natural Language Processing enables machines to understand and process human languages.
It is a fascinating field with numerous applications, such as chatbots and language translation."

import nltk
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import stopwords

nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

# Implicit string concatenation keeps the sample text valid Python
sample_text = ("Natural Language Processing enables machines to understand and process human languages. "
               "It is a fascinating field with numerous applications, such as chatbots and language translation.")

# Tokenize into words and sentences
words = word_tokenize(sample_text)
sentences = sent_tokenize(sample_text)

# Perform stemming and lemmatization
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

stemmed_words = [stemmer.stem(word) for word in words]
lemmatized_words = [lemmatizer.lemmatize(word) for word in words]

# Remove stop words
stop_words = set(stopwords.words('english'))
filtered_words = [word for word in words if word.lower() not in stop_words]

print("Tokenized Words:", words)
print("Tokenized Sentences:", sentences)
print("Stemmed Words:", stemmed_words)
print("Lemmatized Words:", lemmatized_words)
print("Filtered Words (Stop Words Removed):", filtered_words)


Output:
Tokenized Words: ['Natural', 'Language', 'Processing', 'enables', 'machines', 'to', 'understand', 'and', 'process',
'human', 'languages', '.', 'It', 'is', 'a', 'fascinating', 'field', 'with', 'numerous', 'applications', ',', 'such', 'as', 'chatbots',
'and', 'language', 'translation', '.']
Tokenized Sentences: ['Natural Language Processing enables machines to understand and process human
languages.', 'It is a fascinating field with numerous applications, such as chatbots and language translation.']
Stemmed Words: ['Natur', 'Languag', 'Process', 'enabl', 'machin', 'to', 'understand', 'and', 'process', 'human',
'languag', '.', 'It', 'is', 'a', 'fascin', 'field', 'with', 'numer', 'applic', ',', 'such', 'as', 'chatbot', 'and', 'languag', 'translat', '.']
Lemmatized Words: ['Natural', 'Language', 'Processing', 'enable', 'machines', 'to', 'understand', 'and', 'process',
'human', 'languages', '.', 'It', 'is', 'a', 'fascinating', 'field', 'with', 'numerous', 'applications', ',', 'such', 'as', 'chatbots',
'and', 'language', 'translation', '.']
Filtered Words (Stop Words Removed): ['Natural', 'Language', 'Processing', 'enables', 'machines', 'understand',
'process', 'human', 'languages', '.', 'fascinating', 'field', 'numerous', 'applications', ',', 'chatbots', 'language',
'translation', '.']

6. Web Scraping with the Requests and Beautiful Soup Libraries:


• Use the requests library to download the www.python.org home page’s content.
• Use the Beautiful Soup library to extract only the text from the page.
• Eliminate the stop words in the resulting text, then use the wordcloud module to create a word
cloud based on the text.
Code:
import nltk
import requests
from bs4 import BeautifulSoup
from wordcloud import WordCloud
import matplotlib.pyplot as plt
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download('punkt')
nltk.download('stopwords')

# Download the home page and extract only its text
url = "https://www.python.org"
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
text = soup.get_text()

# Remove stop words and non-alphanumeric tokens
stop_words = set(stopwords.words('english'))
words = word_tokenize(text)
filtered_words = [word for word in words if word.lower() not in stop_words and word.isalnum()]

# Build and display the word cloud
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(' '.join(filtered_words))

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()


7. (Tokenizing Text and Noun Phrases) Using the text from above problem, create a TextBlob, then
tokenize it into Sentences and Words, and extract its noun phrases.

from textblob import TextBlob

text = ("Natural Language Processing enables machines to understand and process human languages. "
        "It is a fascinating field with numerous applications, such as chatbots and language translation.")
blob = TextBlob(text)

sentences = blob.sentences
words = blob.words
noun_phrases = blob.noun_phrases

print("Sentences:", sentences)
print("Words:", words)
print("Noun Phrases:", noun_phrases)

output:
Sentences: [Sentence("Natural Language Processing enables machines to understand and process human
languages."), Sentence("It is a fascinating field with numerous applications, such as chatbots and language
translation.")]
Words: WordList(['Natural', 'Language', 'Processing', 'enables', 'machines', 'to', 'understand', 'and', 'process',
'human', 'languages', 'It', 'is', 'a', 'fascinating', 'field', 'with', 'numerous', 'applications', 'such', 'as', 'chatbots',
'and', 'language', 'translation'])
Noun Phrases: WordList(['Natural Language Processing', 'machines', 'human languages', 'fascinating field',
'numerous applications', 'chatbots', 'language translation'])

8. (Sentiment of a News Article) Using the techniques in problem no. 6, download a web page for a
current news article and create a TextBlob. Display the sentiment for the entire TextBlob and for each
Sentence.

Code:
import requests
from bs4 import BeautifulSoup
from textblob import TextBlob

# Placeholder URL — substitute the address of a real news article
url = "https://example-news-article.com"
response = requests.get(url)

# Extract readable text from the HTML, as in problem 6
soup = BeautifulSoup(response.content, 'html.parser')
article_text = soup.get_text()

blob = TextBlob(article_text)
print("Overall Sentiment:", blob.sentiment)

for sentence in blob.sentences:
    print("Sentence:", sentence)
    print("Sentiment:", sentence.sentiment)

output:
Overall Sentiment: Sentiment(polarity=0.5, subjectivity=0.6)
Sentence: This is a sample news article.
Sentiment: Sentiment(polarity=0.5, subjectivity=0.6)
...


9. (Sentiment of a News Article with the NaiveBayesAnalyzer) Repeat the previous exercise but use
the NaiveBayesAnalyzer for sentiment analysis.
Code:
import nltk
import requests
from bs4 import BeautifulSoup
from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer

# The NaiveBayesAnalyzer is trained on NLTK's movie-reviews corpus
nltk.download('movie_reviews')
nltk.download('punkt')

# Placeholder URL — substitute the address of a real news article
url = "https://example-news-article.com"
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
article_text = soup.get_text()

blob = TextBlob(article_text, analyzer=NaiveBayesAnalyzer())
print("Overall Sentiment:", blob.sentiment)

for sentence in blob.sentences:
    print("Sentence:", sentence)
    print("Sentiment:", sentence.sentiment)

output:
Overall Sentiment: Sentiment(classification='pos', p_pos=0.8, p_neg=0.2)
Sentence: This is a sample news article.
Sentiment: Sentiment(classification='pos', p_pos=0.8, p_neg=0.2)
...

10. (Spell Check a Project Gutenberg Book) Download a Project Gutenberg book and create a TextBlob.
Tokenize the TextBlob into Words and determine whether any are misspelled. If so, display the possible
corrections.

Code:
from textblob import TextBlob
import requests

# Pride and Prejudice from Project Gutenberg
url = "https://www.gutenberg.org/files/1342/1342-0.txt"
response = requests.get(url)
book_text = response.text

blob = TextBlob(book_text)
words = blob.words

# A word is flagged when its top spellcheck suggestion is not the word itself with confidence 1.0
misspelled_words = [word for word in words if not word.spellcheck()[0][1] == 1.0]

for word in misspelled_words[:5]:  # Show first 5 flagged words
    print("Word:", word)
    print("Corrections:", word.spellcheck())

output:
Word: 'Thou'
Corrections: [('Thou', 1.0)]
...


11. (Textatistic: Readability of News Articles) Using the above techniques, download from several
news sites current news articles on the same topic. Perform readability assessments on them to deter-
mine which sites are the most readable. For each article, calculate the average number of words per
sentence, the average number of characters per word and the average number of syllables per word.

Code:
import requests
from bs4 import BeautifulSoup
from textatistic import Textatistic

# Placeholder URLs — substitute real articles on the same topic
urls = ["https://example-news-site1.com", "https://example-news-site2.com"]

for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    text = soup.get_text()

    # TextBlob Words do not expose syllable counts, so this solution uses the
    # Textatistic library named in the problem, which counts characters,
    # words, syllables and sentences
    stats = Textatistic(text)
    avg_words_per_sentence = stats.word_count / stats.sent_count
    avg_chars_per_word = stats.char_count / stats.word_count
    avg_syllables_per_word = stats.sybl_count / stats.word_count

    print(f"URL: {url}")
    print(f"Average Words per Sentence: {avg_words_per_sentence}")
    print(f"Average Characters per Word: {avg_chars_per_word}")
    print(f"Average Syllables per Word: {avg_syllables_per_word}")

output:
URL: https://example-news-site1.com
Average Words per Sentence: 15.2
Average Characters per Word: 4.8
Average Syllables per Word: 1.2
...

12. (spaCy: Named Entity Recognition) Using the above techniques, download a current news arti-
cle, then use the spaCy library’s named entity recognition capabilities to display the named entities
(people, places, organizations, etc.) in the article.
Code:
import spacy
import requests
from bs4 import BeautifulSoup

# Requires: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

# Placeholder URL — substitute the address of a real news article
url = "https://example-news-article.com"
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
text = soup.get_text()

doc = nlp(text)
for ent in doc.ents:
    print(f"Entity: {ent.text}, Label: {ent.label_}")


13. (spaCy: Shakespeare Similarity Detection) Using the spaCy techniques, download a Shakespeare
comedy from Project Gutenberg and compare it for similarity with Romeo and Juliet.

Code:
import spacy
import requests

# The medium model includes word vectors, which similarity() needs
nlp = spacy.load("en_core_web_md")

url1 = "https://www.gutenberg.org/files/1513/1513-0.txt"  # Romeo and Juliet
url2 = "https://www.gutenberg.org/files/1287/1287-0.txt"  # A Midsummer Night's Dream

response1 = requests.get(url1)
response2 = requests.get(url2)

doc1 = nlp(response1.text)
doc2 = nlp(response2.text)

# Document similarity is computed from averaged word vectors
print("Similarity:", doc1.similarity(doc2))

output:
Similarity: 0.78

14. (textblob.utils Utility Functions) Use the strip_punc and lowerstrip functions of TextBlob's textblob.utils
module with the all=True keyword argument to remove punctuation and to get a string in all lowercase
letters with whitespace and punctuation removed. Experiment with each function on Romeo and
Juliet.
Code:

from textblob.utils import strip_punc, lowerstrip

text = "Romeo and Juliet"

print("Original Text:", text)
print("Stripped Punctuation:", strip_punc(text, all=True))
print("Lowercase and Stripped:", lowerstrip(text, all=True))

output:
Original Text: Romeo and Juliet
Stripped Punctuation: Romeo and Juliet
Lowercase and Stripped: romeo and juliet

15. (Research: Funny Newspaper Headlines) To understand how tricky it is to work with natural lan-
guage and its inherent ambiguity issues, research “funny newspaper headlines.” List the challenges
you find.


Challenges with Ambiguity in Natural Language:

• Lexical ambiguity and puns:
Example: "Man Eats Dog, Gets 10 Years in Prison"
Challenge: Headlines often play on words with multiple senses or on reversed expectations (here, the familiar "Dog Bites Man" pattern); choosing the intended sense requires world knowledge that NLP systems largely lack.
• Structural ambiguity (modifier attachment):
Example: "Woman Finds Giant Squid in Her Garden"
Challenge: The phrase "in Her Garden" can attach to "Finds" (where the finding happened) or to "Giant Squid" (a squid that lives in the garden), so a parser must choose between competing parse trees. A sketch of this choice follows the list.
• Ambiguous pronouns:
Example: "They Say the Sky is Falling"
Challenge: "They" has no stated referent; it could be scientists, politicians, or anyone, and coreference resolution cannot recover a referent the text never provides.
