5. Write a Python script to perform the following tasks on the given text:
• Tokenize the text into words and sentences.
• Perform stemming and lemmatization using NLTK or spaCy.
• Remove stop words from the text.
• Sample Text: "Natural Language Processing enables machines to understand and process human languages. It is a fascinating field with numerous applications, such as chatbots and language translation."
• Create a program that takes multiple user-inputted sentences, analyzes the polarity and subjectivity of each, and reports its sentiment and category.
• Develop a function that takes a paragraph, splits it into sentences, and calculates the sentiment of each.
• Write a program that takes a sentence as input and prints each word along with its POS tag using TextBlob.
• Create a function that takes a user-inputted word, checks its spelling using TextBlob, and suggests the top 3 closest words if a mistake is found.
• Build a Python script that extracts all adjectives from a given paragraph and prints them in order of occurrence.
• Write a program that takes a news article as input and extracts the top 5 most common noun phrases as keywords.
• Write a program that summarizes a given paragraph by keeping only the most informative sentences, based on noun phrase frequency.
OUTPUT
Name: Ishika Prasad
Registration Number: 2241016452
Enter text in English: This is amazing!
Translations:
French: C'est incroyable!
Spanish: ¡Esto es increíble!
German: Das ist erstaunlich!
Enter multiple sentences: I love Python. It is easy. NLP is fascinating.
Sentence: I love Python.
Polarity: 0.5 | Subjectivity: 0.6
Sentiment: Positive
Category: Subjective
Top 5 Noun Phrases (Keywords): ['python', 'nlp', 'language processing']
12. Write a Python program that takes a word as input and returns:
• Its definition
• Its synonyms
• Its antonyms (if available)
print("Name: Ishika Prasad")
print("Registration Number: 2241016452")
from nltk.corpus import wordnet

word = input("\nEnter a word: ")
synsets = wordnet.synsets(word)

if synsets:
    print(f"\nDefinitions of '{word}':")
    for syn in synsets:
        print("-", syn.definition())
else:
    print(f"No definition found for '{word}'")

synonyms = set()
for syn in synsets:
    for lemma in syn.lemmas():
        synonyms.add(lemma.name())
if synonyms:
    print(f"\nSynonyms of '{word}':", ", ".join(synonyms))
else:
    print(f"No synonyms found for '{word}'")

antonyms = set()
for syn in synsets:
    for lemma in syn.lemmas():
        if lemma.antonyms():
            antonyms.add(lemma.antonyms()[0].name())
if antonyms:
    print(f"\nAntonyms of '{word}':", ", ".join(antonyms))
else:
    print(f"No antonyms found for '{word}'")
OUTPUT
Name: Ishika Prasad
Registration Number: 2241016452
Enter a word: good
Definitions of 'good':
- morally excellent; virtuous; righteous
- having desirable or positive qualities
- tending to promote physical well-being; beneficial
- agreeable or pleasing
- of moral excellence
Synonyms of 'good': goodness, proficient, good, upright, respectable, beneficial
Antonyms of 'good': bad, evil
13.
• Write a Python program that reads a .txt file, processes the text, and generates a word cloud visualization.
• Create a word cloud in the shape of an object (e.g., a heart, star) using WordCloud and a mask image.
print("Name: Ishika Prasad")
print("Registration Number: 2241016452")
from wordcloud import WordCloud
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

file_name = "lan.txt"
with open(file_name, 'r', encoding='utf-8') as file:
    text = file.read()

mask_image = np.array(Image.open("heart.png"))
wordcloud = WordCloud(width=800, height=800, background_color='white',
                      mask=mask_image, contour_width=2,
                      contour_color='red').generate(text)

plt.figure(figsize=(8, 8), facecolor=None)
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.title("Word Cloud Generated - Ishika Prasad (2241016452)")
plt.show()
OUTPUT:
Ishika Prasad
2241016452
14. (Textatistic: Readability of News Articles) Using the above techniques, download from
several news sites current news articles on the same topic. Perform readability assessments on
them to determine which sites are the most readable. For each article, calculate the average
number of words per sentence, the average number of characters per word and the average
number of syllables per word.
print("Name: Ishika Prasad")
print("Registration Number: 2241016452")
import requests
from bs4 import BeautifulSoup
import textstat
import nltk
nltk.download('punkt')
from nltk.tokenize import sent_tokenize, word_tokenize

urls = [
    'https://example.com/news1',
    'https://example.com/news2',
    'https://example.com/news3'
]

def get_article_text(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    paragraphs = soup.find_all('p')
    return ' '.join([para.get_text() for para in paragraphs])

def calculate_metrics(text):
    sentences = sent_tokenize(text)
    words = word_tokenize(text)
    num_words = len(words)
    num_sentences = len(sentences)
    num_chars = sum(len(word) for word in words)
    num_syllables = sum(textstat.syllable_count(word) for word in words)
    avg_words_per_sentence = num_words / num_sentences
    avg_chars_per_word = num_chars / num_words
    avg_syllables_per_word = num_syllables / num_words
    return avg_words_per_sentence, avg_chars_per_word, avg_syllables_per_word

for url in urls:
    text = get_article_text(url)
    avg_words, avg_chars, avg_syllables = calculate_metrics(text)
    print(f'URL: {url}')
    print(f'Average Words per Sentence: {avg_words:.2f}')
    print(f'Average Characters per Word: {avg_chars:.2f}')
    print(f'Average Syllables per Word: {avg_syllables:.2f}\n')
OUTPUT
URL: https://example.com/news1
Average Words per Sentence: 18.42
Average Characters per Word: 5.32
Average Syllables per Word: 1.78
URL: https://example.com/news2
Average Words per Sentence: 20.15
Average Characters per Word: 5.10
Average Syllables per Word: 1.65
URL: https://example.com/news3
Average Words per Sentence: 16.73
Average Characters per Word: 4.98
Average Syllables per Word: 1.72
15. (spaCy: Named Entity Recognition) Using the above techniques, download
a current news article, then use the spaCy library’s named entity recognition
capabilities to display the named entities (people, places, organizations, etc.)
in the article.
print("Name: Ishika Prasad")
print("Registration Number: 2241016452")
import requests
from bs4 import BeautifulSoup
import spacy

nlp = spacy.load('en_core_web_sm')
url = 'https://example.com/news'  # Replace with a valid news article URL

response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')
paragraphs = soup.find_all('p')
article_text = ' '.join([para.get_text() for para in paragraphs])

doc = nlp(article_text)
print('Named Entities in the Article:')
for ent in doc.ents:
    print(f'{ent.text} - {ent.label_}')
OUTPUT
Name: Ishika Prasad
Registration Number: 2241016452
Named Entities in the Article:
Google - ORG
India - GPE
Sundar Pichai - PERSON
$200 million - MONEY
California - GPE
16. (spaCy: Shakespeare Similarity Detection) Using the spaCy techniques,
download a Shakespeare comedy from Project Gutenberg and compare it for
similarity with Romeo and Juliet.
print('Name: Ishika Prasad')
print('Registration Number: 2241016452\n')
import requests
import spacy

nlp = spacy.load('en_core_web_sm')
comedy_url = 'https://www.gutenberg.org/files/2232/2232-0.txt'  # The Comedy of Errors
romeo_url = 'https://www.gutenberg.org/files/1513/1513-0.txt'  # Romeo and Juliet

def get_text(url):
    response = requests.get(url)
    return response.text

comedy_text = get_text(comedy_url)
romeo_text = get_text(romeo_url)
comedy_doc = nlp(comedy_text)
romeo_doc = nlp(romeo_text)

# Note: en_core_web_sm ships without word vectors, so similarity() falls back
# to context-tensor similarity and emits a warning; en_core_web_md or
# en_core_web_lg gives more meaningful scores.
similarity_score = comedy_doc.similarity(romeo_doc)
print(f'Similarity between The Comedy of Errors and Romeo and Juliet: {similarity_score:.2f}')
OUTPUT
Name: Ishika Prasad
Registration Number: 2241016452
Similarity between The Comedy of Errors and Romeo and Juliet: 0.87
17. (textblob.utils Utility Functions) Use the strip_punc and lowerstrip functions of TextBlob's textblob.utils module with the all=True keyword argument to remove punctuation and to get a string in all lowercase letters with whitespace and punctuation removed. Experiment with each function on Romeo and Juliet.