NLP Module 3
22AI632 RACHEL E C
BITM, BALLARI
Text classification is the task of assigning one or more categories to a
given piece of text from a larger set of possible categories.
This task of categorizing texts based on their properties has a wide
range of applications across diverse domains such as social media,
e-commerce, healthcare, law, and marketing.
Supervised classification approaches fall into three types based on
the number of categories involved:
1. Binary classification – two classes
2. Multiclass classification – more than two classes
3. Multilabel classification – one or more labels/classes attached to each text
Common application scenarios include:
• Customer support
• E-commerce
• Language identification
• Authorship attribution
• Triaging posts
The hyperplane is chosen so that the margin between the closest points of
the different classes is as large as possible.
The best hyperplane is the one that gives the largest separation,
or margin, between the two classes.
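This maximum-margin idea is the basis of the support vector machine (SVM). Below is a minimal sketch of an SVM text classifier with scikit-learn; the toy corpus, the labels, and the pipeline choices are illustrative assumptions, not an implementation from the original.

# A minimal sketch of SVM-based text classification using scikit-learn.
# The toy corpus and labels below are invented purely for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

texts = ["great product, works well", "terrible quality, broke quickly",
         "excellent support team", "worst purchase ever"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# TF-IDF features followed by a linear SVM that maximizes the margin between classes
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(texts, labels)

print(clf.predict(["the product quality is great"]))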
1. Tokenize the texts and convert them into word index vectors.
2. Pad the text sequences so that all text vectors are of the same
length.
3. Map every word index to an embedding vector.
We do that by multiplying word index vectors with the embedding
matrix.
The embedding matrix can either be populated with pre-trained
embeddings or trained from scratch on this corpus.
4. Use the output from Step 3 as the input to a neural network
architecture.
The code snippet below illustrates Steps 1 and 2:
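(A minimal sketch of these two steps using the Keras Tokenizer and pad_sequences; the example texts and the values of MAX_NUM_WORDS and MAX_SEQUENCE_LENGTH are illustrative assumptions.)

# A minimal sketch of Steps 1 and 2 with Keras; the example texts and the
# MAX_NUM_WORDS / MAX_SEQUENCE_LENGTH values are assumptions for illustration.
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_NUM_WORDS = 20000        # vocabulary size
MAX_SEQUENCE_LENGTH = 100    # pad/truncate every text to this length

train_texts = ["the movie was wonderful", "a dull and boring film"]

# Step 1: tokenize the texts and convert them into word index vectors
tokenizer = Tokenizer(num_words=MAX_NUM_WORDS)
tokenizer.fit_on_texts(train_texts)
sequences = tokenizer.texts_to_sequences(train_texts)

# Step 2: pad the text sequences so that all text vectors are of the same length
data = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
print(data.shape)  # (number of texts, MAX_SEQUENCE_LENGTH)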
Step 3: We download the pre-trained embeddings and use them to convert our
data into the input format expected by the neural network.
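A minimal sketch of Step 3 follows, reusing tokenizer and MAX_NUM_WORDS from the previous sketch; the GloVe file name and EMBEDDING_DIM are assumptions for illustration.

# A minimal sketch of Step 3: build an embedding matrix from pre-trained GloVe vectors.
# The file path glove.6B.100d.txt and EMBEDDING_DIM are illustrative assumptions.
import numpy as np

EMBEDDING_DIM = 100
embeddings_index = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        values = line.split()
        word = values[0]
        embeddings_index[word] = np.asarray(values[1:], dtype="float32")

# Map every word index to its pre-trained embedding vector (zeros if the word is unseen)
num_words = min(MAX_NUM_WORDS, len(tokenizer.word_index) + 1)
embedding_matrix = np.zeros((num_words, EMBEDDING_DIM))
for word, i in tokenizer.word_index.items():
    if i < num_words and word in embeddings_index:
        embedding_matrix[i] = embeddings_index[word]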
Step 4: DL architectures consist of an input layer, an output layer, and
several hidden layers in between the two. Depending on the
architecture, different hidden layers are used. The input layer
for textual input is typically an embedding layer. The output
layer, especially in the context of text classification, is a
softmax layer with categorical output.
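As a rough sketch of this generic layout (an embedding input layer, hidden layers, and a softmax output layer), assuming Keras and reusing the sizes from the earlier sketches; num_classes and the hidden layer size are illustrative choices.

# A minimal sketch of the generic layout: embedding input, hidden layers, softmax output.
# num_words, EMBEDDING_DIM and MAX_SEQUENCE_LENGTH come from the earlier sketches.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, Flatten, Dense

num_classes = 2
model = Sequential([
    Input(shape=(MAX_SEQUENCE_LENGTH,)),        # word-index input
    Embedding(num_words, EMBEDDING_DIM),        # embedding (input) layer
    Flatten(),
    Dense(64, activation="relu"),               # hidden layer(s) vary by architecture
    Dense(num_classes, activation="softmax"),   # softmax output over the classes
])
model.summary()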
CNNs for Text Classification
CNNs typically consist of a series of convolution and pooling layers as
the hidden layers.
CNNs can be thought of as learning the most useful bag-of-words/n-gram
features, rather than taking the entire collection of words/n-grams as
features.
Word Embeddings: Each word in the text is represented as a dense
vector (embedding), often using pre-trained embeddings like
Word2Vec, GloVe, or contextual embeddings from models like BERT.
Global Max Pooling: For text classification, global max pooling might be
used to condense the entire feature map into a single vector by taking
the maximum value across all positions. This vector represents the most
salient features extracted by the convolutional layers.
Dense Layers: After pooling, the resulting feature vector is passed
through one or more fully connected (dense) layers. These layers are
responsible for combining the extracted features and making the final
classification decision.
Class Prediction: The output layer provides the final prediction, which
is usually a probability distribution over the possible classes. For
instance, if you’re classifying movie reviews as positive or negative,
the network will output probabilities indicating how likely the review
is to belong to each class.
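Putting these pieces together, here is a minimal sketch of such a CNN classifier in Keras, reusing num_words, EMBEDDING_DIM, MAX_SEQUENCE_LENGTH, embedding_matrix and num_classes from the earlier sketches; the filter count, kernel size, and dense layer size are illustrative choices.

# A minimal sketch of a CNN text classifier: embeddings -> convolution ->
# global max pooling -> dense layers -> softmax. All sizes are illustrative.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, Conv1D, GlobalMaxPooling1D, Dense

embedding = Embedding(num_words, EMBEDDING_DIM, trainable=False)
cnn = Sequential([
    Input(shape=(MAX_SEQUENCE_LENGTH,)),
    embedding,                                              # word embeddings
    Conv1D(filters=128, kernel_size=5, activation="relu"),  # learns n-gram-like features
    GlobalMaxPooling1D(),                                   # keeps the strongest response per filter
    Dense(64, activation="relu"),                           # combines the extracted features
    Dense(num_classes, activation="softmax"),               # probability distribution over classes
])
embedding.set_weights([embedding_matrix])  # load the pre-trained vectors from Step 3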
[Diagram: a neural network with an input layer, hidden layers, and an output layer]
Specifying the model involves choosing the activation functions, hidden
layers, layer sizes, loss function, optimizer, metrics, number of epochs,
and batch size.
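For example, a minimal sketch of this specification step for the CNN sketched earlier; the optimizer, loss, epoch count, batch size, and the dummy training arrays are all illustrative choices.

# Dummy training data, purely for illustration: padded word-index sequences and integer labels.
import numpy as np
x_train = np.random.randint(1, num_words, size=(8, MAX_SEQUENCE_LENGTH))
y_train = np.random.randint(0, num_classes, size=(8,))

# Specify the loss function, optimizer and metrics, then train for a chosen
# number of epochs with a chosen batch size.
cnn.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
cnn.fit(x_train, y_train, epochs=5, batch_size=32, validation_split=0.25)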
• The current word in a sentence depends on its context: the words
before and after it.
• RNNs work on the principle of using this context while learning the
language representation or a model of language.
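To make this concrete, below is a minimal sketch of an RNN-based classifier using a bidirectional LSTM (one common RNN variant), so each word's representation can use context from both directions; sizes reuse the earlier assumptions.

# A minimal sketch of an RNN (bidirectional LSTM) text classifier; sizes are illustrative.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, Bidirectional, LSTM, Dense

rnn = Sequential([
    Input(shape=(MAX_SEQUENCE_LENGTH,)),
    Embedding(num_words, EMBEDDING_DIM),
    Bidirectional(LSTM(64)),                    # reads the sequence left-to-right and right-to-left
    Dense(num_classes, activation="softmax"),   # class probabilities
])
rnn.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])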
4. Build Model
Using the initial dataset, the team builds a basic text classification
model. This model will likely be simple and less accurate but will serve
as a foundation for further development.