Improving Text Classifiers Through Controlled Text Generation Using Transformer Wasserstein Autoencoder
1 Introduction
Generative models have improved greatly in the past few years. These
models learn the distribution of the original data and generate samples from
that distribution. In the domain of natural language, text generation using a
variational autoencoder (VAE) has proven effective [3]. That architecture used
a recurrent neural network (RNN) based VAE to generate texts similar to the
training dataset. The same architecture, with a few modifications, was used to
perform controlled text generation [6]; this model obtained meaningful sentences
by restricting the sentence length and achieved better accuracy with sentiment
attributes.
2 Background
– Oversampling: The data belonging to the minority class is replicated
randomly to match the number of instances in the majority class. The
disadvantage of this approach is that it can lead the model to overfit on
the training data.
– Undersampling: Instances belonging to the majority class are removed to
match the count of instances in the minority class. There is a chance of
losing important information, which leads to poor generalization of the
model.
– SMOTE: SMOTE generates synthetic data for the minority class. These
instances are created by interpolating between a minority point and its
nearest neighbours (a minimal sketch is given after this list). While SMOTE
has shown some promise on numerical datasets, it does not work very well
on text data.
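A minimal sketch of the SMOTE interpolation step, assuming numerical feature vectors; the function name, the array X_min, and the parameters are illustrative placeholders, not the exact implementation used in the experiments:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sample(X_min, k=5, n_new=100, seed=0):
    """Generate synthetic minority samples by interpolating between a
    minority instance and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)            # idx[:, 0] is the point itself
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))         # pick a minority instance
        j = rng.choice(idx[i, 1:])           # pick one of its neighbours
        gap = rng.random()                   # interpolation factor in [0, 1]
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.vstack(synthetic)
```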
In the Wasserstein autoencoder with adversarial latent matching [13], a discriminator $D_\gamma$ is trained to separate samples $z_i$ drawn from the prior from encoded samples $\hat{z}_i$:
\[
\frac{\lambda}{m} \sum_{i=1}^{m} \left[ \log D_\gamma(z_i) + \log\!\left(1 - D_\gamma(\hat{z}_i)\right) \right] \tag{1}
\]
The above objective is maximized by gradient ascent. The encoder and decoder are trained on the following objective function, which is minimized:
\[
\frac{1}{m} \sum_{i=1}^{m} \left[ c(x_i, G_\theta(\hat{z}_i)) - \lambda \cdot \log D_\gamma(\hat{z}_i) \right] \tag{2}
\]
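A minimal PyTorch sketch of how these two objectives could be computed, assuming the latent discriminator outputs probabilities in (0, 1) and the decoder produces per-token vocabulary logits of shape (batch, sequence, vocabulary); all names and shapes here are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def discriminator_loss(d_real, d_fake, lam):
    """Eq. (1): d_real = D_gamma(z_i) on prior samples, d_fake = D_gamma(z_hat_i)
    on encoded samples. Maximising Eq. (1) equals minimising its negated mean."""
    return -lam * (torch.log(d_real) + torch.log(1.0 - d_fake)).mean()

def encoder_decoder_loss(token_logits, target_tokens, d_fake, lam):
    """Eq. (2): negative log-likelihood reconstruction cost c(x_i, G_theta(z_hat_i))
    plus the adversarial penalty -lambda * log D_gamma(z_hat_i)."""
    recon = F.cross_entropy(token_logits.transpose(1, 2), target_tokens)
    return recon - lam * torch.log(d_fake).mean()
```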
2.3 Transformers
The transformer is an architecture for sequence-to-sequence tasks. It owes
its performance to the self-attention mechanism, which learns how much weight
to give each word in a sentence. Self-attention is further enhanced by multi-head
attention, where h heads each perform the self-attention operation in parallel.
This helps the model capture the different meanings of a single sentence. The
self-attention operation can be expressed as follows.
\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right) V \tag{3}
\]
\[
\mathrm{softmax}(y)_i = \frac{e^{y_i / T}}{\sum_{j=1}^{N} e^{y_j / T}} \tag{4}
\]
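A minimal sketch of the scaled dot-product attention in Eq. (3); the tensor shapes are illustrative assumptions:

```python
import torch

def scaled_dot_product_attention(Q, K, V):
    """Eq. (3): softmax(Q K^T / sqrt(d_k)) V for tensors of shape (batch, seq, d_k)."""
    d_k = K.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5   # (batch, seq, seq)
    weights = torch.softmax(scores, dim=-1)         # attention weights per query position
    return weights @ V                              # weighted sum of the values
```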
The Covid dataset has a total of 10201 headlines, of which 9727 are real news
and 474 are fake. In the spam-or-ham dataset, there are a total of 5572 mail
subjects, of which 4825 are not spam and 747 are spam.
For preprocessing the text data, after tokenization, words that do not occur
more than once were replaced by the <unk> token. Numbers were replaced by
the <num> token. To train the Transformer WAE, start-of-sentence <sos> and
end-of-sentence <eos> tokens were appended at the beginning and end of each
text.
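A minimal sketch of this preprocessing, assuming simple whitespace tokenization; the helper below is illustrative, not the exact pipeline used:

```python
from collections import Counter

def preprocess(texts):
    """Replace singleton words with <unk>, numbers with <num>,
    and wrap each text in <sos> ... <eos>."""
    tokenised = [t.lower().split() for t in texts]
    counts = Counter(w for toks in tokenised for w in toks)
    processed = []
    for toks in tokenised:
        toks = ["<num>" if w.isdigit() else w for w in toks]
        toks = [w if counts[w] > 1 or w == "<num>" else "<unk>" for w in toks]
        processed.append(["<sos>"] + toks + ["<eos>"])
    return processed
```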
The Transformer WAE was trained using teacher forcing, with negative log-
likelihood as the reconstruction loss and a divergence loss determined by the
discriminator. Training is performed on the complete dataset. After training,
the encoder is used to train a controlling network: the dataset is downsampled,
the balanced dataset is encoded into latent representations by the encoder, and
the controlling network is trained to distinguish the latent representations by
class.
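A minimal sketch of how the controlling network could be trained on the encoded, balanced data; the encoder, data loader, controller architecture, and dimensions are assumed placeholders:

```python
import torch
import torch.nn as nn

def train_controller(encoder, balanced_loader, latent_dim, num_classes, epochs=10):
    """Train a small feed-forward controller to classify latent codes produced
    by the (frozen) Transformer WAE encoder on the downsampled, balanced data."""
    controller = nn.Sequential(
        nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, num_classes))
    optimizer = torch.optim.Adam(controller.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x_batch, y_batch in balanced_loader:
            with torch.no_grad():
                z = encoder(x_batch)          # latent representation of the batch
            loss = criterion(controller(z), y_batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return controller
```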
For controlled text generation, a random noise vector z is sampled from a unit
Gaussian distribution. This noise is passed to the controller network $C_z$. The
label output by the controller network is set as the expected output, and a
cross-entropy loss is calculated with respect to the noise z. The noise z is then
updated with a gradient step scaled by a factor of η.
\[
L = \frac{1}{m} \sum_{i=1}^{m} y \cdot \log(C_z(z)) \tag{6}
\]
\[
z = z + \eta \cdot \frac{dL}{dz} \tag{7}
\]
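A minimal sketch of the iterative update in Eqs. (6)–(7), written as descent on the cross-entropy (equivalently, ascent on the log-likelihood of Eq. (6)); the step size and iteration count are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def steer_noise(controller, latent_dim, eta=0.1, steps=50):
    """Sample z ~ N(0, I), take the controller's own prediction as the target
    label, and update z until the controller is confident about that label."""
    z = torch.randn(1, latent_dim, requires_grad=True)
    with torch.no_grad():
        target = controller(z).argmax(dim=-1)          # label output by the controller
    for _ in range(steps):
        loss = F.cross_entropy(controller(z), target)  # cross-entropy w.r.t. z
        grad, = torch.autograd.grad(loss, z)
        with torch.no_grad():
            z -= eta * grad                            # scaled gradient update of z
    return z.detach()
```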
The noise is updated iteratively until it reaches a value about which the controller
is confident. A combination of decoding strategies is used while generating the
text: softmax with temperature is applied over the output probabilities and
greedy decoding is used to pick the next word. Top-k sampling is used to find a
replacement word whenever an <unk> token is encountered.
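A minimal sketch of this decoding step, assuming a vector of vocabulary logits at each position; the temperature and k values are illustrative:

```python
import torch

def decode_step(logits, unk_id, temperature=0.8, k=10):
    """Greedy decoding over temperature-scaled probabilities (Eq. (4));
    if the greedy choice is <unk>, fall back to top-k sampling."""
    probs = torch.softmax(logits / temperature, dim=-1)
    token = int(probs.argmax())                        # greedy choice
    if token == unk_id:
        top_p, top_idx = probs.topk(k)                 # restrict to the k best words
        top_p[top_idx == unk_id] = 0.0                 # never sample <unk> itself
        token = int(top_idx[torch.multinomial(top_p, 1)])
    return token
```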
The classifier is first trained on a balanced, downsampled dataset for 100
epochs. The same classifier is then fine-tuned for 5 epochs on a dataset that
combines the downsampled dataset with the generated dataset. The downsampled
dataset is kept in this combination to prevent the model from forgetting the
original data.
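A minimal sketch of this two-stage training, where classifier, downsampled_loader, and combined_loader are assumed placeholders and the epoch counts follow the description above:

```python
import torch
import torch.nn as nn

def run_epochs(model, loader, epochs, lr):
    """Standard supervised training loop with cross-entropy loss."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x_batch, y_batch in loader:
            loss = criterion(model(x_batch), y_batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

# Stage 1: train on the balanced, downsampled data for 100 epochs.
run_epochs(classifier, downsampled_loader, epochs=100, lr=1e-3)
# Stage 2: fine-tune for 5 epochs on downsampled + generated data,
# keeping the original samples in the mix to avoid forgetting them.
run_epochs(classifier, combined_loader, epochs=5, lr=1e-4)
```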
The setup shown in Fig. 4 was modified by swapping out the transformer model
for the decoder of an RNN variational autoencoder (RNN VAE). This was done
to compare how transformer-based text generation affects the classifier.
Lastly, another set of classifiers was trained on a combination of the real
data and synthetic data generated by SMOTE. This was done to understand
how much SMOTE helps in text classification.
4 Results
To compare the different models explained in the previous section, accuracy
and F1-score were chosen as the metrics. The models were tested on a validation
set that was not part of the training set. The validation set was chosen such
that there is no significant skew in the proportion of class instances.
From the results, it can be inferred that fine-tuning the classifier on the
text generated by the transformer-based model produces better results. From
Table 1 and Table 2 it is evident that SMOTE does not function well with text
classification and prevents the classifier from generalizing.
The downside of this approach is that the text generated by the generator
depends on the random noise, so we have no control over what kind of text
will be generated. Another issue is memory consumption: while training the
classifier, at least three models are loaded into memory, i.e. the generator model,
the controller network, and the classifier network. If the models are large, this
can cause an out-of-memory error.
5 Conclusions
References
[1] Anjali, B., Reshma, R., Geetha Lekshmy, V.: Detection of Counterfeit News
Using Machine Learning. 2019 2nd International Conference on Intelligent
Computing, Instrumentation and Control Technologies, ICICICT 2019, pp.
1382–1386 (2019). https://doi.org/10.1109/ICICICT46008.2019.8993330
[2] Banik, S.: Covid fake news dataset (Nov 2020).
https://doi.org/10.5281/zenodo.4282522
[3] Bowman, S.R., Vilnis, L., Vinyals, O., Dai, A.M., Jozefowicz, R., Bengio, S.:
Generating sentences from a continuous space. CoNLL 2016 - 20th SIGNLL
Conference on Computational Natural Language Learning, Proceedings pp.
10–21 (2016). https://doi.org/10.18653/v1/k16-1002
[4] Dathathri, S., Madotto, A., Lan, J., Hung, J., Frank, E., Molino, P., Yosinski,
J., Liu, R.: Plug and play language models: A simple approach to controlled
text generation. arXiv pp. 1–34 (2019)
[5] Hinton, G., Vinyals, O., Dean, J.: Distilling the Knowledge in a Neural
Network pp. 1–9 (2015), http://arxiv.org/abs/1503.02531
[6] Hu, Z., Yang, Z., Liang, X., Salakhutdinov, R., Xing, E.P.: Toward controlled
generation of text. 34th International Conference on Machine Learning,
ICML 2017 4, 2503–2513 (2017)
[7] Keskar, N.S., McCann, B., Varshney, L.R., Xiong, C., Socher, R.: CTRL: A
conditional transformer language model for controllable generation. arXiv
pp. 1–18 (2019)
[8] Kingma, D.P., Welling, M.: Auto-encoding variational bayes. 2nd International
Conference on Learning Representations, ICLR 2014 - Conference Track
Proceedings, pp. 1–14 (2014)
[9] Klimt, B., Yang, Y.: The Enron corpus: A new dataset for email classification
research. pp. 217–226 (2004)
[10] Liu, D., Liu, G.: A Transformer-Based Variational Autoencoder for Sentence
Generation. Proceedings of the International Joint Conference on Neural
Networks (IJCNN), pp. 1–7 (2019).
https://doi.org/10.1109/IJCNN.2019.8852155
[11] Mansourifar, H., Shi, W.: Deep synthetic minority over-sampling technique.
arXiv (2020)
[12] Srinivasan, S., Ravi, V., Alazab, M., Ketha, S., Al-Zoubi, A.M., Kotti
Padannayil, S.: Spam Emails Detection Based on Distributed Word
Embedding with Deep Learning. Studies in Computational Intelligence
919(December), 161–189 (2021)
[13] Tolstikhin, I., Bousquet, O., Gelly, S., Schölkopf, B.: Wasserstein auto-
encoders. arXiv pp. 1–20 (2017)
[14] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N.,
Kaiser, L., Polosukhin, I.: Attention is all you need. Advances in Neural
Information Processing Systems, pp. 5999–6009 (2017)
[15] Vinayakumar, R., Soman, K.P., Poornachandran, P., Akarsh, S.: Application
of deep learning architectures for cyber security. Springer International
Publishing (2019)
[16] Vishagini, V., Rajan, A.K.: An Improved Spam Detection Method with
Weighted Support Vector Machine. 2018 International Conference on Data
Science and Engineering, ICDSE 2018 (2018).
https://doi.org/10.1109/ICDSE.2018.8527737