NLP Assignment 2
Q1. A company has a dataset containing raw customer reviews. Design an NLP pipeline to
preprocess this data for sentiment analysis. Discuss each step in detail.
Example Data
"The product was fantastic! But delivery was delayed."
"Horrible customer service. Would not recommend!"
"Great quality for the price."
Solution:
The NLP pipeline below prepares the raw customer reviews for sentiment analysis so that the
data is structured and ready for modeling. Each step is explained in detail:
1. Text Cleaning:
o Standardize the text by removing special characters, punctuation, numbers, and
extra whitespace. This reduces noise that could mislead the sentiment model.
o Example: "The product was fantastic! But delivery was delayed." → "The product
was fantastic But delivery was delayed"
2. Tokenization:
o Split the text into smaller units (tokens), typically words or sentences, for easier
processing. Tokenizing sentences can aid in capturing contextual sentiment for
compound sentences.
o Example: "The product was fantastic But delivery was delayed" → ["The", "product",
"was", "fantastic", "But", "delivery", "was", "delayed"]
3. Lowercasing:
o Convert all text to lowercase to ensure that identical words in different cases are
treated uniformly.
o Example: ["The", "product", "was", "fantastic"] → ["the", "product", "was",
"fantastic"]
4. Stopword Removal:
o Remove frequent words (like "the," "is," "and") that do not contribute to sentiment.
This reduces noise and keeps the focus on sentiment-carrying words; negation words
such as "not" are often retained for sentiment tasks, since removing them can flip
the meaning of a review.
o Example: ["the", "product", "was", "fantastic"] → ["product", "fantastic"]
5. Stemming or Lemmatization:
o Convert words to their base or root forms to reduce dimensionality. Lemmatization
is usually preferred over stemming for sentiment analysis because it returns valid
dictionary forms and preserves meaning.
o Example: ["delayed", "fantastic"] → ["delay", "fantastic"]
6. Feature Extraction:
o Transform textual data into numerical format. Use techniques like TF-IDF for
traditional approaches or embeddings like Word2Vec/BERT for context-sensitive
representations. Embeddings are particularly effective for capturing sentiment
nuances in phrases like "not bad."
7. Sentiment Labeling:
o Assign a sentiment score or label (positive, negative, or neutral) based on patterns in
the preprocessed data, typically using a supervised model trained on annotated
examples.
This pipeline turns raw input such as "The product was fantastic! But delivery was delayed."
into a structured form like ["product", "fantastic", "delivery", "delay"] plus numerical
features or embeddings for downstream sentiment analysis, as the sketch below shows.
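A minimal sketch of steps 1 through 6 using NLTK and scikit-learn follows; the exact cleaning
rules, the kept negation words, and the TF-IDF settings are illustrative assumptions rather
than the only valid choices.

import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer

nltk.download("stopwords")  # one-time setup
nltk.download("wordnet")    # one-time setup

def preprocess(review):
    # 1. Text cleaning: keep only letters and spaces
    text = re.sub(r"[^a-zA-Z\s]", "", review)
    # 2. Tokenization + 3. Lowercasing
    tokens = text.lower().split()
    # 4. Stopword removal (negations like "not" are kept for sentiment)
    stops = set(stopwords.words("english")) - {"not", "no"}
    tokens = [t for t in tokens if t not in stops]
    # 5. Lemmatization (verb mode so "delayed" -> "delay")
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(t, pos="v") for t in tokens]

reviews = [
    "The product was fantastic! But delivery was delayed.",
    "Horrible customer service. Would not recommend!",
    "Great quality for the price.",
]
cleaned = [" ".join(preprocess(r)) for r in reviews]
print(cleaned)  # e.g. ['product fantastic delivery delay', ...]

# 6. Feature extraction: TF-IDF vectors for a traditional classifier
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(cleaned)
print(features.shape)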
Q2. Write a Python function to tokenize and remove stop words from the “The quick brown
fox jumps over the lazy dog.”. Explain how this step affects the quality of an NLP model.
Solution:
Python Function for Tokenization and Stopword Removal
Below is a minimal Python function that tokenizes the input sentence and removes stopwords
using the NLTK library (the downloads are one-time setup):
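import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

nltk.download("punkt")      # tokenizer models (one-time)
nltk.download("stopwords")  # English stopword list (one-time)

def tokenize_and_remove_stopwords(sentence):
    # Split the sentence into lowercase word tokens
    tokens = word_tokenize(sentence.lower())
    # Keep alphabetic tokens that are not in the stopword list
    stop_words = set(stopwords.words("english"))
    return [t for t in tokens if t.isalpha() and t not in stop_words]

print(tokenize_and_remove_stopwords("The quick brown fox jumps over the lazy dog."))
# Expected output: ['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']

Effect on model quality: tokenization converts raw strings into discrete units a model can
count or embed, and stopword removal shrinks the vocabulary and filters out high-frequency
words that carry little signal, which reduces noise and training cost and sharpens the model's
focus on content words. The trade-off is that aggressive removal can hurt tasks where function
words matter; for sentiment analysis in particular, negations like "not" are often kept, since
dropping them can invert the meaning of a sentence.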
Q3. Compare and contrast traditional feature extraction techniques like TF-IDF with modern
embeddings like Word2Vec and BERT. Discuss the impact of these advancements.
You are tasked with classifying emails as spam or non-spam. Justify the choice of feature
extraction technique you would use and why.
Solution:
Comparing TF-IDF, Word2Vec, and BERT
TF-IDF, the traditional technique, generates sparse vectors that weight each word by its
frequency in a document relative to the rest of the corpus. It is computationally simple, but it
has significant limitations: it ignores word order, semantics, and context, so it falls short on
more complex NLP tasks. Word2Vec, a modern embedding method, captures semantic relationships
between words, such as "king" and "queen," but it fails on out-of-vocabulary words and assigns
each word a single vector regardless of context. BERT is the state-of-the-art approach: it
builds contextual embeddings that capture sentence-level meaning, so it can handle polysemy and
subtle shades of meaning; however, it is resource-intensive in terms of computation.
Justification for Spam Classification
For spam classification, the choice of feature extraction depends on the size, complexity, and
computational resources of a dataset:
• TF-IDF works well for smaller or resource-constrained datasets. It is interpretable, fast,
and integrates well with traditional models like SVM or logistic regression.
• Word2Vec tends to do better when the dataset is moderately large and semantic similarity
between words (e.g., "free" and "offer") matters.
• BERT suits large datasets, or cases where subtle contextual meaning, such as how the word
"urgent" shifts across phrases, must be captured. Its embeddings, fed into deep learning
classifiers, achieve state-of-the-art results but can be computationally expensive.
A minimal TF-IDF baseline is sketched below.
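As a concrete illustration of the TF-IDF route, here is a small sketch using scikit-learn; the
email texts and labels are invented placeholders, not real data:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled data (placeholder examples; 1 = spam, 0 = non-spam)
emails = [
    "Claim your free offer now, limited time only",
    "Meeting moved to 3pm, see agenda attached",
    "You have won a cash prize, click here",
    "Can you review the quarterly report draft?",
]
labels = [1, 0, 1, 0]

# TF-IDF features (unigrams + bigrams) feeding a logistic regression classifier
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(emails, labels)

print(model.predict(["Free prize offer, click now"]))  # likely [1]

This baseline is fast and interpretable (each TF-IDF weight maps to a visible word or bigram),
which is exactly why it is a sensible first choice before reaching for Word2Vec or BERT.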
Q4. Write a Python script to train a Recurrent Neural Network (RNN) on the Shakespeare text
dataset available at this link. Follow the NLP pipeline to train the model and generate
Shakespearean-style text.
Solution:
Recurrent Neural Networks (RNNs) are designed for sequential data, making them ideal for text
generation tasks. When applied to the Shakespeare text dataset, an RNN learns patterns and
structures in the text by processing it in sequences and predicting the next character or word.
Following the NLP pipeline, the dataset is preprocessed, tokenized, and used to train the model.
Once trained, the RNN can generate Shakespearean-style text by predicting and appending
tokens sequentially, capturing the unique style and flow of the original works. The trained
model is available at the following link: RNN MODEL LINK. A compact training sketch follows.
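Below is a minimal character-level sketch of such a training script in Keras, using the public
TinyShakespeare file from the TensorFlow text-generation tutorial; the URL, hyperparameters,
and sampling loop are illustrative assumptions, not the exact script behind the linked model:

import numpy as np
import tensorflow as tf

# Download the Shakespeare corpus (URL from the TensorFlow text-generation tutorial)
path = tf.keras.utils.get_file(
    "shakespeare.txt",
    "https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt",
)
text = open(path, "rb").read().decode("utf-8")

# Character-level vocabulary and integer encoding
vocab = sorted(set(text))
char2idx = {c: i for i, c in enumerate(vocab)}
idx2char = np.array(vocab)
encoded = np.array([char2idx[c] for c in text])

# Build (input, target) pairs: predict the next character at every position
seq_len = 100
ds = tf.data.Dataset.from_tensor_slices(encoded)
ds = ds.batch(seq_len + 1, drop_remainder=True)
ds = ds.map(lambda s: (s[:-1], s[1:]))
ds = ds.shuffle(10000).batch(64, drop_remainder=True)

# Simple recurrent model: embedding -> GRU -> per-step logits over characters
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(len(vocab), 256),
    tf.keras.layers.GRU(512, return_sequences=True),
    tf.keras.layers.Dense(len(vocab)),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
model.fit(ds, epochs=10)  # more epochs yield more convincing output

# Generate text by repeatedly sampling the next character from the model
def generate(seed, length=300, temperature=1.0):
    indices = [char2idx[c] for c in seed]
    for _ in range(length):
        # Re-run the whole sequence each step: slow but simple for a sketch
        logits = model.predict(np.array([indices]), verbose=0)[0, -1]
        next_id = tf.random.categorical([logits / temperature], 1)[0, 0].numpy()
        indices.append(int(next_id))
    return "".join(idx2char[i] for i in indices)

print(generate("ROMEO: "))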