NLP Module 3

SUBJECT CODE: 22AI632
BY: RACHEL E C, BITM, BALLARI
Text classification is the task of assigning one or more categories to a
given piece of text from a larger set of possible categories.
This task of categorizing texts based on some properties has a wide
range of applications across diverse domains, such as social media,
e-commerce, healthcare, law, and marketing.
Supervised classification approaches fall into three types based on the
number of categories involved:
1. Binary – 2 classes
2. Multiclass – more than 2 classes
3. Multilabel – one or more labels/classes attached to each text

Text classification is sometimes also referred to as topic classification,
text categorization, or document categorization.
APPLICATIONS

Content classification and organization

Customer support

E-commerce

Language identification

Authorship attribution

Triaging posts

Segregate fake news from real news


One typically follows these steps when building a text classification
system:
1. Collect or create a labeled dataset suitable for the task.
2. Split the dataset into two (training and test) or three parts: training,
validation (i.e., development), and test sets, then decide on evaluation
metric(s).
3. Transform raw text into feature vectors.
4. Train a classifier using the feature vectors and the corresponding
labels from the training set.
5. Using the evaluation metric(s) from Step 2, benchmark the model
performance on the test set.
6. Deploy the model to serve the real-world use case and monitor its
performance.
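A minimal sketch of Steps 1–5 with scikit-learn; the dataset file and its "text"/"label" columns are hypothetical placeholders:

```python
# Minimal sketch of the classification pipeline (Steps 1-5).
# The CSV file name and its "text"/"label" columns are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, f1_score

# Step 1: collect or create a labeled dataset
df = pd.read_csv("tickets.csv")                       # columns: text, label

# Step 2: split into training and test sets; choose evaluation metrics
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42)

# Step 3: transform raw text into feature vectors
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Step 4: train a classifier using the feature vectors and labels
clf = MultinomialNB()
clf.fit(X_train_vec, y_train)

# Step 5: benchmark on the test set with the chosen metrics
pred = clf.predict(X_test_vec)
print("accuracy:", accuracy_score(y_test, pred))
print("macro F1:", f1_score(y_test, pred, average="macro"))
```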
A Simple Classifier
Lexicon – based Sentiment Analysis
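A lexicon-based sentiment classifier needs no training: it simply counts positive and negative words from a hand-built lexicon and compares the counts. A toy sketch (the tiny word lists below are illustrative, not a real lexicon):

```python
# Lexicon-based sentiment analysis: count positive vs. negative words.
# The small word sets below are illustrative only, not a real lexicon.
POSITIVE = {"good", "great", "excellent", "happy", "love"}
NEGATIVE = {"bad", "terrible", "poor", "sad", "hate"}

def lexicon_sentiment(text: str) -> str:
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_sentiment("The movie was great and the acting was excellent"))
```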
Bayes’ Theorem is used to determine the conditional probability of an
event: the probability of an event based on prior knowledge of
conditions that might be related to that event.
Pr (A | B) = Pr (B | A) Pr (A) / Pr (B)
The Naive Bayes classifier additionally assumes that features are
conditionally independent given the class:
Pr (A ∩ B | C) = Pr (A | C) Pr (B | C)
Types of Naïve Bayes Model

Gaussian: The Gaussian model assumes that features follow a normal
distribution. This means that if predictors take continuous values
instead of discrete ones, the model assumes these values are sampled
from a Gaussian distribution.
Multinomial: The Multinomial Naïve Bayes classifier is used when the
data is multinomially distributed. It is primarily used for document
classification problems, i.e., determining which category a particular
document belongs to, such as sports, politics, or education.
The classifier uses the frequencies of words as the predictors.
Bernoulli: The Bernoulli classifier works similarly to the Multinomial
classifier, but the predictor variables are independent Boolean
variables, such as whether or not a particular word is present in a
document. This model is also popular for document classification tasks.
The Naive Bayes classifier learns the probability of a text belonging to
each class and chooses the one with the maximum probability. Such a
classifier is called a generative classifier.
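A minimal sketch contrasting the Multinomial and Bernoulli variants with scikit-learn; the toy documents and labels are illustrative:

```python
# Multinomial NB uses word frequencies; Bernoulli NB uses binary presence/absence.
# The toy documents and labels are illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB, BernoulliNB

docs = ["the team won the match", "parliament passed the bill",
        "the striker scored twice", "the senate debated the law"]
labels = ["sports", "politics", "sports", "politics"]

vec = CountVectorizer()
X_counts = vec.fit_transform(docs)                    # word-frequency features
mnb = MultinomialNB().fit(X_counts, labels)

bvec = CountVectorizer(binary=True)
X_bool = bvec.fit_transform(docs)                     # word presence/absence features
bnb = BernoulliNB().fit(X_bool, labels)

test = ["the striker won the match"]
print(mnb.predict(vec.transform(test)))               # expected: ['sports']
print(bnb.predict(bvec.transform(test)))              # expected: ['sports']
```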
Logistic regression is an example of a discriminative classifier; it is
commonly used as a baseline in research and as an MVP in real-world
industry scenarios.
While Naive Bayes estimates probabilities based on feature occurrence
in classes, logistic regression “learns” the weights for individual
features based on how important they are to the classification decision.
The goal of logistic regression is to learn a linear separator between
classes in the training data with the aim of maximizing the probability
of the data.
This “learning” of feature weights and of the probability distribution
over all classes is done through a function called the “logistic”
function, hence the name logistic regression.
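A minimal sketch of a logistic regression text classifier with scikit-learn, reusing the hypothetical X_train/y_train split from the pipeline sketch above:

```python
# Logistic regression over TF-IDF features; X_train/y_train are the
# hypothetical raw-text split from the earlier pipeline sketch.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

lr_clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
lr_clf.fit(X_train, y_train)                  # learns one weight per feature, per class
print(lr_clf.predict_proba(X_test.iloc[:3]))  # probability distribution over classes
```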
Support Vector Machine
• A support vector machine (SVM), first invented in the early 1960s, is a
discriminative classifier.
• SVM is a powerful machine learning algorithm used for linear or
nonlinear classification, regression, and even outlier detection tasks.
• It looks for an optimal hyperplane in a higher-dimensional space that
can separate the classes in the data by the maximum possible margin.
• SVMs are capable of learning even non-linear separations between
classes.
The main objective of the SVM algorithm is to find the optimal
hyperplane in an N-dimensional space that separates the data points of
different classes in the feature space.
The hyperplane is chosen so that the margin between the closest points
of different classes is as large as possible.
If the number of input features is two, the hyperplane is just a line;
if the number of input features is three, the hyperplane becomes a
2-D plane.
The best hyperplane is the one that represents the largest separation,
or margin, between the two classes.
It is the hyperplane whose distance to the nearest data point on each
side is maximized; if such a hyperplane exists, it is known as the
maximum-margin hyperplane (hard margin).
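A minimal sketch of a linear SVM text classifier with scikit-learn, again reusing the hypothetical X_train/y_train split from the pipeline sketch:

```python
# Linear SVM (maximum-margin separator) over TF-IDF features.
# X_train/y_train are the hypothetical split from the pipeline sketch.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

svm_clf = make_pipeline(TfidfVectorizer(), LinearSVC())
svm_clf.fit(X_train, y_train)                 # finds the maximum-margin hyperplane
print(svm_clf.predict(X_test.iloc[:3]))       # predicted class labels
```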
Deep Learning for Text Classification
The steps involved in converting training and test data into a format
suitable for the neural network input layers:

1. Tokenize the texts and convert them into word index vectors.
2. Pad the text sequences so that all text vectors are of the same
length.
3. Map every word index to an embedding vector.
We do that by multiplying word index vectors with the embedding
matrix.
The embedding matrix can either be populated using pre-trained
embeddings or it can be trained for embeddings on this corpus.
4. Use the output from Step 3 as the input to a neural network
architecture.
The code snippet below illustrates Steps 1 and 2:
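A minimal sketch of what such a snippet might look like, using the Keras preprocessing utilities; the vocabulary size and sequence length are arbitrary illustrative values, and train_texts/test_texts are assumed to be lists of raw strings:

```python
# Steps 1 and 2: tokenize texts into word-index vectors, then pad them
# to a fixed length. train_texts / test_texts are assumed lists of strings;
# MAX_NUM_WORDS and MAX_SEQUENCE_LENGTH are arbitrary illustrative values.
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_NUM_WORDS = 20000
MAX_SEQUENCE_LENGTH = 1000

tokenizer = Tokenizer(num_words=MAX_NUM_WORDS)
tokenizer.fit_on_texts(train_texts)                     # build the vocabulary
train_seqs = tokenizer.texts_to_sequences(train_texts)  # word-index vectors
test_seqs = tokenizer.texts_to_sequences(test_texts)

x_train = pad_sequences(train_seqs, maxlen=MAX_SEQUENCE_LENGTH)
x_test = pad_sequences(test_seqs, maxlen=MAX_SEQUENCE_LENGTH)
word_index = tokenizer.word_index                       # word -> index mapping
```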
Step 3: To use pre-trained embeddings (e.g., GloVe), we have to
download them and use them to convert our data into the input format
for the neural networks.
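A sketch of Step 3, building an embedding matrix from pre-trained GloVe vectors; the file path, embedding dimension, and the word_index from the tokenizer above are assumptions:

```python
# Step 3: map every word index to a pre-trained GloVe vector.
# The GloVe file path is a placeholder for a separately downloaded file;
# word_index, MAX_NUM_WORDS, MAX_SEQUENCE_LENGTH come from the sketch above.
import numpy as np
from tensorflow.keras.layers import Embedding
from tensorflow.keras.initializers import Constant

EMBEDDING_DIM = 100
embeddings_index = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        values = line.split()
        embeddings_index[values[0]] = np.asarray(values[1:], dtype="float32")

# Row i of the matrix holds the vector for the word with index i.
embedding_matrix = np.zeros((MAX_NUM_WORDS, EMBEDDING_DIM))
for word, i in word_index.items():
    if i < MAX_NUM_WORDS and word in embeddings_index:
        embedding_matrix[i] = embeddings_index[word]

embedding_layer = Embedding(MAX_NUM_WORDS, EMBEDDING_DIM,
                            embeddings_initializer=Constant(embedding_matrix),
                            trainable=False)            # keep GloVe vectors fixed
```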
Step 4: DL architectures consist of an input layer, an output layer, and
several hidden layers in between the two. Depending on the
architecture, different hidden layers are used. The input layer
for textual input is typically an embedding layer. The output
layer, especially in the context of text classification, is a
softmax layer with categorical output.
CNNs for Text Classification
CNNs typically consist of a series of convolution and pooling layers as
the hidden layers.
CNNs can be thought of as learning the most useful bag-of-words/n-
grams features instead of taking the entire collection of words/n-
grams as features.
Word Embeddings: Each word in the text is represented as a dense
vector (embedding), often using pre-trained embeddings like
Word2Vec, GloVe, or contextual embeddings from models like BERT.

Filters/Kernels: The CNN applies convolutional filters (or kernels)
across the word embeddings. Each filter slides over the matrix of
word embeddings (representing the text) and performs convolution
operations. These filters detect local patterns or features in the text,
such as phrases or combinations of words that may be indicative of
certain classes.

Feature Maps: The result of applying a filter is a feature map, which
captures specific patterns or features from the text. For instance, a
filter might be tuned to recognize the presence of negations or
sentiment-laden phrases.
Max Pooling: After the convolution operation, a pooling layer is often
used to reduce the dimensionality of the feature maps and retain the
most important features. Max pooling, a common technique, involves
taking the maximum value from a set of features within a specified
window. This operation helps in capturing the most prominent features
and reduces the spatial size of the feature maps.

Global Max Pooling: For text classification, global max pooling might be
used to condense the entire feature map into a single vector by taking
the maximum value across all positions. This vector represents the most
salient features extracted by the convolutional layers.
Dense Layers: After pooling, the resulting feature vector is passed
through one or more fully connected (dense) layers. These layers are
responsible for combining the extracted features and making the final
classification decision.

Activation Function: Typically, the final layer uses an activation
function such as softmax (for multi-class classification) or sigmoid (for
binary classification) to produce the probability scores for each class.

Class Prediction: The output layer provides the final prediction, which
is usually a probability distribution over the possible classes. For
instance, if you’re classifying movie reviews as positive or negative,
the network will output probabilities indicating how likely the review
is to belong to each class.
[Figure: a deep neural network with an input layer, hidden layers, and an output layer]
Specifying the model involves choices such as activation functions,
hidden layers, layer sizes, loss function, optimizer, metrics, epochs,
and batch size.
We set the number of epochs to 10 or above, but that also increases
the amount of time it takes to train the model.
Another thing to note is that, if you want to train an embedding layer
instead of using pre-trained embeddings in this model, the only thing
that changes is the line cnnmodel.add(embedding_layer).
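A sketch of such a CNN model in Keras, assuming the embedding_layer built in Step 3 and the padded x_train from Steps 1–2; the filter sizes, layer widths, and two output classes are illustrative choices:

```python
# 1-D CNN text classifier sketch; layer sizes, filter widths, and the
# number of classes (2) are illustrative. y_train is assumed to hold
# integer class ids; embedding_layer and x_train come from the steps above.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, GlobalMaxPooling1D, Dense

cnnmodel = Sequential()
cnnmodel.add(embedding_layer)                     # input layer: word embeddings
cnnmodel.add(Conv1D(128, 5, activation="relu"))   # filters slide over the embeddings
cnnmodel.add(MaxPooling1D(5))                     # keep the strongest local features
cnnmodel.add(Conv1D(128, 5, activation="relu"))
cnnmodel.add(GlobalMaxPooling1D())                # condense each feature map to one value
cnnmodel.add(Dense(128, activation="relu"))       # dense layer combines the features
cnnmodel.add(Dense(2, activation="softmax"))      # class probabilities

cnnmodel.compile(loss="sparse_categorical_crossentropy",
                 optimizer="adam", metrics=["accuracy"])
cnnmodel.fit(x_train, y_train, epochs=10, batch_size=128, validation_split=0.1)
```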
Recurrent Neural Network (RNN)
The recurrent connection enables RNNs to maintain an internal memory:
the output of each step is fed back as an input to the next step,
allowing the network to capture information from previous steps and use
it in the current step. This enables the model to learn temporal
dependencies and handle inputs of variable length.
LSTMs for Text Classification

• Language is sequential in nature, and RNNs are specialized in working
with sequential data.
• The current word in a sentence depends on its context: the words
before and after it.
• RNNs work on the principle of using this context while learning the
language representation or a model of language.
• Long Short-Term Memory (LSTM) is a special kind of Recurrent Neural
Network (RNN), capable of learning long-term dependencies.
• These long-term dependencies have a great influence on the meaning
and overall polarity of a document.
• Long short-term memory networks (LSTMs) address this long-term
dependency problem by introducing a memory into the network.
• LSTM networks are designed to handle the vanishing gradient problem
and learn long-term dependencies better than traditional RNNs.
• LSTM was first introduced by Hochreiter & Schmidhuber.
• The LSTM architecture has a repeated module for each time step, as
in a standard RNN.
At each time step, the output of the LSTM module is controlled by a
set of gates, as a function of the old hidden state h_{t−1} and the
input at the current time step x_t: the forget gate f_t, the input
gate i_t, and the output gate o_t.
These gates collectively decide how to update the current memory
cell C_t and the current hidden state h_t.
The LSTM transition functions are defined as follows:

i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)
f_t = σ(W_f · [h_{t−1}, x_t] + b_f)
o_t = σ(W_o · [h_{t−1}, x_t] + b_o)
C_t = f_t ∗ C_{t−1} + i_t ∗ C̃_t
h_t = o_t ∗ tanh(C_t)

Here σ is the sigmoid function and ∗ denotes element-wise multiplication.
Long Short-Term Memory (LSTM) was designed to overcome the
problems of the simple Recurrent Neural Network (RNN) by allowing the
network to store data in a sort of memory that it can access at a
later time.
The key to the LSTM model is the cell state.
The cell state is updated twice, with few computations, which helps
stabilize the gradients.
It also has a hidden state that acts like a short-term memory.
The first step is to decide what information we’re going to throw away
from the cell state. This decision is made by a sigmoid layer called the
“Forget Gate” layer.
The second step is to decide what new information we’re going to
store in the cell state. This has two parts.
First, a sigmoid layer called the “Input Gate” layer decides which
values we’ll update.
Next, a tanh layer creates a vector of new candidate values that could
be added to the state.
Finally, we need to decide what we are going to give as output. This
output will be based on our cell state, but will be a filtered version.
First, we run a sigmoid layer which decides what parts of the cell state
we’re going to give as output.
Then, we put the cell state through tanh (to push the values to be
between −1 and 1) and multiply it by the output of the sigmoid gate,
so that we only output the parts we decided to.
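A sketch of an LSTM text classifier in Keras, reusing the padded x_train/x_test from Steps 1–2; the embedding size, LSTM units, and dropout rates are illustrative, and the labels are assumed to be binary (0/1):

```python
# LSTM text classifier sketch; layer sizes and dropout rates are illustrative.
# x_train/x_test come from the earlier preprocessing, and y_train/y_test are
# assumed binary (0/1) labels.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

rnnmodel = Sequential()
rnnmodel.add(Embedding(MAX_NUM_WORDS, 128))                  # trainable embedding layer
rnnmodel.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))  # gated memory over the sequence
rnnmodel.add(Dense(1, activation="sigmoid"))                 # binary classification output

rnnmodel.compile(loss="binary_crossentropy", optimizer="adam",
                 metrics=["accuracy"])
rnnmodel.fit(x_train, y_train, epochs=5, batch_size=32,
             validation_data=(x_test, y_test))
```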
Case Study: Corporate Ticketing
Imagine we’re asked to build a ticketing system for our organization
that will track all the tickets or issues people face in the organization
and route them to either internal or external agents.
Now let’s say our company has recently hired a medical counsel and
partnered with a hospital.
So our system should also be able to pinpoint any medical-related issue
and route it to the relevant people and teams.

1. Use existing APIs or libraries
2. Use public datasets
3. Utilize weak supervision
4. Active learning
5. Learning from implicit and explicit feedback
Phase 1: Initial Data Collection and Model

At the beginning of the project, there is no labeled data available to
train a text classification model.
The company needs a way to generate an initial dataset to kickstart the
model-building process.

1. Map Public API or Library


The team searches for public APIs or libraries that can provide relevant
data.

2. Map Public Dataset


Another approach is to find existing public datasets that are similar to
the corporate environment, such as datasets containing labeled
customer service tickets or product reviews. These datasets can
provide a base for understanding how to classify tickets.
3. Weak Supervision to Create Initial Dataset
Weak supervision involves using less accurate, noisy, or heuristic-based
methods to label the initial dataset.
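Weak supervision can be as simple as a handful of keyword heuristics that assign noisy labels to unlabeled tickets; a toy sketch in which the keyword lists and routing classes are hypothetical:

```python
# Toy weak-supervision sketch: heuristic keyword rules produce noisy labels
# for unlabeled tickets. The keyword sets and class names are hypothetical.
MEDICAL_KEYWORDS = {"doctor", "hospital", "medicine", "injury", "clinic"}
IT_KEYWORDS = {"laptop", "password", "vpn", "email", "printer"}

def weak_label(ticket: str) -> str:
    words = set(ticket.lower().split())
    if words & MEDICAL_KEYWORDS:
        return "medical"
    if words & IT_KEYWORDS:
        return "it_support"
    return "other"

tickets = ["My laptop will not connect to the vpn",
           "Need an appointment with the company doctor"]
noisy_labels = [weak_label(t) for t in tickets]   # seed labels for a first model
print(noisy_labels)                               # ['it_support', 'medical']
```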

4. Build Model
Using the initial dataset, the team builds a basic text classification
model. This model will likely be simple and less accurate but will serve
as a foundation for further development.

Phase 2: Improved Model with Continuous Iteration

5. Collect Explicit & Implicit Data


As the ticketing system is deployed, it collects explicit data (e.g., direct
feedback from users categorizing tickets) and implicit data (e.g.,
patterns in how tickets are resolved). This data is used to refine the
model.
6. Active Learning
Active learning involves the model selecting the most uncertain or
challenging cases and presenting them to human experts for labeling.
By focusing on the most difficult tickets, the model learns more
efficiently and improves its accuracy over time.
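A minimal uncertainty-sampling sketch for active learning, assuming a fitted classifier with predict_proba and a feature matrix for the still-unlabeled tickets (both hypothetical):

```python
# Uncertainty sampling: pick the tickets the current model is least sure about.
# `model` is any fitted classifier with predict_proba; `unlabeled_pool` is a
# feature matrix for tickets without labels (both hypothetical).
import numpy as np

def select_most_uncertain(model, unlabeled_pool, n=10):
    """Return indices of the n tickets the model is least confident about."""
    probs = model.predict_proba(unlabeled_pool)
    confidence = probs.max(axis=1)         # highest class probability per ticket
    return np.argsort(confidence)[:n]      # lowest confidence = most uncertain

# These tickets are sent to human experts for labeling, added to the
# training set, and the model is retrained.
```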

7. Analyze & Iterate


The team continually analyzes the model's performance, identifying
areas for improvement.
They iterate on the model, retraining it with newly collected and more
accurately labeled data.
This feedback loop ensures that the model becomes increasingly
reliable and effective at classifying tickets.
