Assignment-10 (NLP-part-2)

Lab Assignment 10 for UCS420 Cognitive Computing focuses on Natural Language Processing (NLP) using Python. It includes tasks such as text preprocessing, feature extraction, sentiment analysis, and text generation through various techniques like tokenization, stemming, and similarity metrics. Students are required to apply libraries like NLTK, TextBlob, and Keras to analyze and generate text based on their own inputs.

Uploaded by skaushal1be23
Copyright © All Rights Reserved

Lab Assignment 10

UCS420 Cognitive Computing

Assignment Title: NLP using Python-II


(Feature extraction from text, sentiment analysis and text generation)

Q1. Write a unique paragraph (5-6 sentences) about your favorite topic (e.g., sports,
technology, food, books, etc.).

1. Convert text to lowercase and remove punctuation using re.
2. Tokenize the text into words and sentences.
3. Split the text using Python's split() and NLTK's word_tokenize(), and compare how
the two differ.
4. Remove stopwords (using NLTK's stopwords list).
5. Display word frequency distribution (excluding stopwords).
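
Steps 1 and 5 can be sketched with the standard library alone. This is a minimal illustration, not a model answer: the sample sentence and the small STOPWORDS set are placeholders (assumptions) standing in for your own paragraph and for NLTK's list, nltk.corpus.stopwords.words("english"), which needs the stopwords corpus downloaded first.

```python
import re
from collections import Counter

# Placeholder stopword set (an assumption for illustration). The assignment
# asks for NLTK's list: nltk.corpus.stopwords.words("english").
STOPWORDS = {"the", "is", "a", "and", "of", "it", "to", "in"}

def clean(text):
    """Lowercase the text and drop every character that is not a word
    character or whitespace (this removes punctuation)."""
    return re.sub(r"[^\w\s]", "", text.lower())

def word_frequencies(text):
    """Count the tokens of the cleaned text, skipping stopwords."""
    tokens = clean(text).split()
    return Counter(t for t in tokens if t not in STOPWORDS)

# Hypothetical sample text; replace it with your own paragraph.
sample = "Cricket is fun. The game of cricket is popular, and it is old!"
print(clean(sample))
print(word_frequencies(sample).most_common(3))
```

For step 3, note that str.split() breaks only on whitespace, so on raw text punctuation stays attached to words ("fun."), while word_tokenize() emits punctuation as separate tokens and splits contractions such as "don't" into "do" and "n't".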

Q2. Using the same paragraph from Q1:


1. Extract all words with only alphabets using re.findall()
2. Remove stop words using NLTK’s stopword list
3. Perform stemming with PorterStemmer
4. Perform lemmatization with WordNetLemmatizer
5. Compare the stemmed and lemmatized outputs and explain when you’d prefer one over
the other.
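
One possible shape for steps 1, 3 and 4 is sketched below. The helper names alphabetic_words and stem_and_lemmatize are hypothetical. The NLTK import is placed inside the second function so that the regex step runs even where NLTK is not installed; WordNetLemmatizer additionally needs the WordNet corpus (nltk.download("wordnet")).

```python
import re

def alphabetic_words(text):
    """Return every maximal run of ASCII letters, dropping digits and punctuation."""
    return re.findall(r"[A-Za-z]+", text)

def stem_and_lemmatize(words):
    """Return (word, stem, lemma) triples for side-by-side comparison."""
    # Imported here so the regex helper above works without NLTK installed.
    from nltk.stem import PorterStemmer, WordNetLemmatizer
    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()  # requires the WordNet corpus
    return [(w, stemmer.stem(w), lemmatizer.lemmatize(w)) for w in words]

# Hypothetical input; use the paragraph from Q1 instead.
print(alphabetic_words("AI in 2024: it's great!"))

# With NLTK and its data available, the comparison table could be printed as:
# for word, stem, lemma in stem_and_lemmatize(["studies", "running", "better"]):
#     print(word, stem, lemma)
```

As a rule of thumb, a stemmer chops suffixes by rule and may return non-words (the Porter stemmer typically turns "studies" into "studi"), whereas a lemmatizer looks words up and returns dictionary forms ("study"). lemmatize() treats every word as a noun unless a pos argument is passed, which matters when comparing outputs for verbs.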

Q3. Choose 3 short texts of your own (e.g., different news headlines, product reviews).

1. Use CountVectorizer to generate the Bag of Words representation.
2. Use TfidfVectorizer to compute TF-IDF scores.
3. Print and interpret the top 3 keywords from each text using TF-IDF.
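
To build intuition for step 3, the sketch below ranks keywords with a hand-rolled TF-IDF in plain Python. It is a swapped-in illustration, not TfidfVectorizer itself: it uses the smoothed formula idf(t) = ln((1 + n) / (1 + df(t))) + 1 that scikit-learn documents as its default, but a simpler tokenizer and no row normalization, so the scores will not match the library's numbers. The three headlines and the function names are made up for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep alphabetic runs only (simpler than scikit-learn's tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def top_keywords(docs, k=3):
    """Return the k highest-scoring TF-IDF terms for each document."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in token_lists for term in set(tokens))
    result = []
    for tokens in token_lists:
        tf = Counter(tokens)
        scores = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        # Sort by descending score; break ties alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:k])
    return result

# Hypothetical short texts; replace with your own.
docs = [
    "Stock markets rally as tech shares surge",
    "New phone review: great camera, great battery",
    "Local team wins the final as fans celebrate",
]
for doc, keywords in zip(docs, top_keywords(docs)):
    print(keywords, "<-", doc)
```

With scikit-learn, the same ranking idea applies to the matrix returned by TfidfVectorizer().fit_transform(docs), with column names taken from get_feature_names_out().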

Q4. Write 2 short texts (4–6 lines each) describing two different technologies (e.g., AI vs
Blockchain).

1. Preprocess and tokenize both texts.
2. Calculate:
a. Jaccard Similarity using sets
b. Cosine Similarity using TfidfVectorizer + cosine_similarity()
c. Analyze which similarity metric gives better insights in your case.
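
The two metrics in step 2 could be sketched as follows. Function names and the sample sentences are hypothetical. Jaccard similarity needs only Python sets; the cosine helper imports scikit-learn inside the function, so the Jaccard part runs without it.

```python
import re

def tokens(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(text_a, text_b):
    """Jaccard similarity: |A intersect B| / |A union B| over the token sets."""
    a, b = tokens(text_a), tokens(text_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty texts (an assumption)
    return len(a & b) / len(a | b)

def tfidf_cosine(text_a, text_b):
    """Cosine similarity between the TF-IDF vectors of the two texts."""
    # Imported here so jaccard() works even without scikit-learn installed.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity
    matrix = TfidfVectorizer().fit_transform([text_a, text_b])
    return float(cosine_similarity(matrix[0:1], matrix[1:2])[0, 0])

# Hypothetical one-line texts; the assignment asks for 4-6 lines each.
ai = "AI learns from data"
blockchain = "Blockchain stores data"
print(jaccard(ai, blockchain))
```

When comparing the two, keep in mind that Jaccard ignores how often a word occurs and treats all words equally, while TF-IDF cosine weights repeated and distinctive terms more heavily.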

Q5. Write a short review for a product or service.


1. Use TextBlob or VADER to find polarity & subjectivity for each review.
2. Classify reviews into Positive / Negative / Neutral.
3. Create a word cloud using the wordcloud library for all positive reviews.
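
A minimal sketch of steps 1 and 2 with TextBlob follows. The threshold of 0.05 is an assumption chosen for illustration (it mirrors the cutoff commonly used with VADER's compound score), and the helper names are hypothetical. Note that VADER reports negative, neutral, positive and compound scores but no subjectivity, so TextBlob is the simpler choice if both values are needed.

```python
def classify(polarity, threshold=0.05):
    """Map a polarity score in [-1, 1] to a label.
    The threshold is an assumed cutoff, not a value fixed by the assignment."""
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"

def analyze(review):
    """Return (polarity, subjectivity, label) for one review using TextBlob."""
    # Imported here so classify() can be used and tested without TextBlob.
    from textblob import TextBlob
    sentiment = TextBlob(review).sentiment
    return sentiment.polarity, sentiment.subjectivity, classify(sentiment.polarity)

# The labelling rule on its own, with made-up scores:
for score in (0.6, 0.0, -0.3):
    print(score, classify(score))

# With TextBlob installed, a review could be processed as:
# print(analyze("The battery life is excellent and the screen is sharp."))
```

For step 3, the reviews labelled Positive can be joined into a single string and passed to WordCloud().generate(...) from the wordcloud package, then displayed with matplotlib.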

Q6. Choose your own paragraph (~100 words) as training data.


1. Tokenize text using Tokenizer() from keras.preprocessing.text
2. Create input sequences and build a simple LSTM or Dense model
3. Train the model and generate 2–3 new lines of text starting from any seed word you
provide.
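
The data preparation in step 2 is often the least obvious part, so the sketch below builds the input sequences in plain Python to show what they look like. It is a stand-in for Tokenizer().texts_to_sequences() plus padding, not the Keras API itself: ids are assigned in order of first appearance here, whereas Keras' Tokenizer orders them by frequency (both reserve 0 for padding). The function names, layer sizes and sample text are assumptions. The Keras import sits inside build_model so the sequence code runs without TensorFlow; the Tokenizer class itself may be deprecated or unavailable in newer Keras versions.

```python
def make_sequences(text):
    """Turn text into (vocab, X, y): every prefix of the word-id list becomes
    one training example whose last id is the next-word target."""
    words = text.lower().split()
    # Ids start at 1 in order of first appearance; 0 is reserved for padding.
    vocab = {w: i + 1 for i, w in enumerate(dict.fromkeys(words))}
    ids = [vocab[w] for w in words]
    prefixes = [ids[: i + 1] for i in range(1, len(ids))]
    max_len = len(ids)
    # Left-pad with zeros so all sequences have the same length.
    padded = [[0] * (max_len - len(p)) + p for p in prefixes]
    X = [p[:-1] for p in padded]  # context words
    y = [p[-1] for p in padded]   # word to predict
    return vocab, X, y

def build_model(vocab_size):
    """A small next-word model; layer sizes are arbitrary assumptions."""
    # Imported here so make_sequences() works without TensorFlow installed.
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding, LSTM, Dense
    model = Sequential([
        Embedding(vocab_size, 32),
        LSTM(64),
        Dense(vocab_size, activation="softmax"),
    ])
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    return model

# Hypothetical three-word text; the assignment asks for about 100 words.
vocab, X, y = make_sequences("the cat sat")
print(vocab, X, y)

# With TensorFlow installed, training could then look like:
# import numpy as np
# model = build_model(len(vocab) + 1)  # +1 for the padding id 0
# model.fit(np.array(X), np.array(y), epochs=200, verbose=0)
```

To generate text, encode the seed word, pad it to the same length as X, take the most probable next id from model.predict, append the matching word, and repeat.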
