TAM GAN: Tamil Text to Naturalistic Image Synthesis Using Conventional Deep Adversarial Networks
DIVIYA M and KARMEL A, School of Computer Science and Engineering, VIT Chennai Campus,
Chennai, Tamilnadu, India
Text-to-image synthesis has recently emerged as a promising area for improvement in computer vision
applications. Image synthesis models build on significant neural network architectures such as Generative
Adversarial Networks (GANs). Current text-to-image generation approaches can broadly reflect the
meaning of the text in the generated images, but they still struggle to provide the necessary details and
expressive object features. Intelligent systems have been trained for text-to-image synthesis in various
languages; however, their application to regional languages is yet to be explored. Autoencoders can
synthesize images, but their outputs are blurry, which costs clarity and essential features of
the picture. The GAN model can produce realistic, high-quality images from textual descriptions
that can be used in various applications, such as fashion design, photo editing, computer-aided design,
and educational platforms. The proposed method uses two-stage processing: it first creates a language model
using a BERT model called TAM-BERT and an existing MuRIL BERT, and then performs image synthesis using a GAN.
The work was conducted using the Oxford-102 dataset, and the model’s efficiency was evaluated using the
F1-Score measure.
CCS Concepts: • Computing methodologies → Information extraction;
Additional Key Words and Phrases: Computer vision, Generative Adversarial Network (GAN), BERT, MuRIL
BERT, language model, L1Norm, feature matching, latent vectors
ACM Reference format:
Diviya M and Karmel A. 2023. TAM GAN: Tamil Text to Naturalistic Image Synthesis Using Conventional
Deep Adversarial Networks. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 22, 5, Article 128 (May 2023),
18 pages.
https://doi.org/10.1145/3584019
1 INTRODUCTION
Technology is enhancing everything. The rapid growth in the technological era paves the way for
text-to-image synthesis applications to spread their wings. The proposed work aims at providing
a text-based image synthesis based on Tamil textual descriptions. The work focuses on generating
images that relate to the given text from the dataset under study. Previously, many researchers
have contributed to image synthesis in other languages, including English, which has undergone
extensive development. Balakrishnan et al. took on the computational challenge of creating
new human poses. Given a reference picture of a person, they create a new
image of that person in the required posture while keeping the original photo’s aesthetic qualities
intact, including its lighting and background. The authors provided a modular
generative neural network capable of synthesizing poses from training pairs of images and
poses extracted from human action videos. To accomplish this, the network dissects the scene
into layers consisting of body parts and background, then relocates and modifies the appearance
of those body parts before finally compositing the new foreground onto a hole-filled
background [1]. It would be fascinating and practical if AI could automatically synthesize realistic visuals
from text, but existing systems are far from this aim. Generic and powerful recurrent neural network
architectures have been developed in recent years. To effectively combine these developments in text
and image modeling, Reed et al. created a new deep architecture and Generative Adversarial
Network (GAN) formulation and showed that their model can create
convincing pictures of birds and flowers based on textual descriptions. The work follows GAN-CLS
and GAN-INT-CLS as a two-stage process [2]. For fashion image synthesis on the
Fashion Gen and Deep Fashion synthesis datasets, Kenan et al. proposed a novel method named
enhanced-Attentional Generative Adversarial Network. The model incorporates both real-world
and synthetic picture feature matching losses, as well as multimodal similarity learning for text
and image characteristics [3]. It uses feature-wise linear modulations for a clear understanding of
the context. Meanwhile, Krishna et al. took up the challenging new task of generating
images across scene views. They proposed a cross-view image translation approach that performs better than
the traditional methods.
While image synthesis is the prime motive, textual descriptions need more attention, and handling a
morphologically rich language is challenging in itself. As a starting point, the text features
are studied and preprocessing is done [4]. For the Tamil language, notable work has been done on the
study of language features. Saraswathi et al. built language models at various phases to identify
errors in syllables and words, resulting in a better speech recognition system. At the phonetic level,
speech signals were segmented using their acoustic characteristics [5]. For symbol-level language models,
the authors proposed a lexicon for text corpora in the Tamil language that improves accuracy [6]. Suresh et
al. proposed a bigram language model for online handwritten words. The model addresses
the idiosyncrasies and ambiguities that are present in Indic scripts. The work contributes in two
areas that are rarely discussed in the online Indic word recognition literature: (1) language models
that take advantage of the peculiarities of Indic scripts and (2) well-framed
classifiers for the disambiguation of confusing symbols. Each symbol in the input
word is extracted before being recognized by a main Support Vector Machine (SVM) classifier.
To further improve recognition accuracy, they use (i) a bigram language model at the symbol
or letter level and (ii) well-equipped classifiers to review and disambiguate the multiple sets of
ambiguous symbols [7].
Generative adversarial networks, proposed by Ian Goodfellow et al., have been used for realistic image
synthesis in many real-time applications. A GAN consists of a generator and a discriminator.
The model does not need any additional inference network, which reflects the ability of GANs to
generate high-quality output. Building on the work of various researchers, this new GAN
formulation was proposed [8]. More recently, convolutional neural networks (CNNs) have become popular in
the computer vision discipline for supervised learning. However, little focus has been placed on
unsupervised learning using CNNs. The prime goal of that work is to fill the gap between
supervised and unsupervised learning with CNNs. The authors presented a new type
of CNN, termed DCGAN, that meets specific architectural constraints and shows a promising
platform for unsupervised learning. They demonstrated that, when trained on numerous image
datasets, a deep convolutional adversarial pair learns a hierarchy of representations from object
parts to scenes in both the generator and the discriminator. These representations are then
applied to new problems, demonstrating their generalizability as image representations [9].
Following Alec et al., Mehran et al. proposed a regularized DCGAN for representation
learning. They developed and deployed deep neural networks (DNNs) in tandem with GANs
to provide unsupervised representation learning. The suggested approach is better than
the competing approaches: it not only aids feature extraction but
also speeds up learning in GANs, leading to more appropriate
feature extraction [10]. Zhu and Ngo proposed CookGAN, a novel network design that learns image
features and yields an efficient model that can distinguish real and fake images. It up-samples images
gradually while preserving fine-grained features and simulating visual effects in causality chains.
In particular, the researchers present a culinary simulator
sub-network that, over time, modifies input images of food based on how various ingredients and
preparation techniques interact. CookGAN has been shown experimentally on Recipe1M to
produce food photos with a respectable inception score. In addition, the visuals can be interpreted and
manipulated meaningfully [11]. Memory networks bring a new outlook to the existing GAN
architecture by helping to generate high-resolution images using external memory. This prompted many
researchers to concentrate on generating high-resolution images along with the traditional
GAN architecture [12]. Tan et al. brought a new framework to the existing GAN network, called
Knowledge-Transfer (KT). The KT mechanism addresses the cross-domain gap that
exists between image and text input [13]. For complex image captions, the image-quality metrics
used in practice do not meet the required standards. For such cases, researchers have introduced
a new metric named Semantic Object Accuracy, which evaluates an image against the
caption that belongs to it. Even though image synthesis from text has been explored to a great
extent, drawbacks remain in handling pose variations, shape variations, viewpoints, and so on [14].
Wang et al. argued that learning through a semantic layout, instead of a direct text-image mapping,
proves to be a better model. GANs also have other applications, such as image inpainting,
which is another branch of GAN research [15]. The authors
applied a two-stage GAN model on a custom dataset to improve the performance.
While text-based image synthesis using GANs has been studied, processing Tamil text remains a
challenging part. Abundant research has been carried out to
understand text features. Handwritten word recognition for Tamil and Devanagari scripts was
addressed by Bharath et al., who proposed an HMM model to study the features of the words [16].
Sarveswaran et al. created a morphological analyzer cum generator called ThamizhiMorph for
processing text, which in turn has been applied to machine translation applications. The preprocessing
step plays a vital role in text feature extraction because Tamil is a morphologically rich language. Their
study details the Finite-state Transducer (FST)-based design and Foma-based implementation
of ThamizhiMorph. It brings out the specifics of Tamil’s nominal and verbal paradigms that
informed the design choices. To efficiently characterize the inflectional morphology of the language,
the authors define a high-level meta-language [17]. Suriyah et al. developed a morphological analyzer named
Piripori for analyzing words using morphological rules. A morphological analyzer is an important tool
for understanding word-form structure [18]. From machine learning to deep learning,
almost every challenge in NLP has been taken up. Even so, translating a foreign
language into a local one remains difficult, and languages other than English pose tangled NLP
problems, including entity extraction, optical character recognition, and sequence modeling for
classification and prediction. As more people choose to communicate in their native
tongues on social forums, it becomes increasingly vital to automate the process of categorizing
content written in languages like Tamil, Telugu, Hindi, and so on. The objective here is to categorize
Tamil news items according to their respective subjects (Sports, Cinema, Politics). Existing
work has applied conventional machine learning techniques to TF-IDF word features.
The purpose of this research was to compare the efficiency of CNNs trained with
pre-trained embeddings against those trained on TF-IDF features learned from scratch [19]. The authors
processed the text before running the CNN by removing stop words and then generating an
embedding matrix and embedding vector. Another way of processing text is to use bag-of-words (BoW) and
TF-IDF features, as done by Sajeetha et al. Their work considered approaches
based on supervised machine learning and a hybrid approach. Many algorithms and language
models have been used to convert a text file to vector form [20]. One of them uses Conv-LSTM
for understanding text and part-of-speech tagging of Swahili words through syllables. The
effectiveness of a method can be best understood by employing it on the corpus in hand. Reviewing
and understanding various methodologies leads to better research ideas in an area [21]. Deep learning
models give a better focus for the work to be carried out, but in the context
of NLP researchers need in-depth knowledge of these models before they employ
them. Such methods include word2vec, fastText, GloVe, RNNs, LSTMs, GRUs, BRNNs, and so
on. The authors also focused on explaining how activation functions are to be chosen in
generative models, followed by the optimization techniques to be used in such models.
Normally, research is carried out on generic text corpora [22]. However, researchers have shown
how scientific data can be handled using the SciBERT model, which improves
NLP tasks in scientific areas [23].
Language models and image synthesis algorithms have been analyzed in depth. Major
research contributions have been made for the English language, whereas applications for
regional languages are still in the initial stages of exploration. This motivates
taking up research in the Tamil language. Since Tamil is a historically old language
with a rich morphology, it opens the way for various applications that could be developed for it.
The research in hand concentrates on how a language model can be proposed for
the Tamil language and, further, how an image synthesis deep learning architecture can be developed
for Tamil text. As this research is at an initial stage, many more applications and
research problems for the Tamil language can be addressed in the future. The main difficulty with
regional languages is the lack of corpora and tools for processing. The main aim of the research is
to support research carried out in Tamil; the work is at a budding stage and could
be enhanced further in the future.
2.1 TAM-BERT
The Bidirectional Encoder Representations from Transformers (BERT) model developed by
Google created a revolution in the field of NLP. The BERT model [23] was developed as a generic
model to process textual information in multiple languages. For Indic languages like Tamil, Hindi,
and Telugu, the model was trained on publicly available datasets. The model proposed in this work is
trained on the sentences of the text files belonging to the Oxford 102 flower dataset. The Tamil
sentences for the corresponding image files were collected as a corpus and used for training. In
total, the flower dataset has 102 classes, each with 30 to 40 image files and a description
in English for those images. As Figure 1 shows, the available English
sentences were translated using Google Translate and stored as text files in the Tamil language,
amounting to about 3,000 sentences. The Tamil sentences were used as inputs to the BERT model,
which is trained on each of those inputs.
The language model consists of hidden layers, linear layers, and latent layers. Figure 2 gives the
schematic representation of TAM-BERT. The input
sentence in the Tamil language is denoted S and has T words, indexed from 1 to T.
Indic BERT is used as the base language model, which tokenizes
the Tamil input text [24]. After tokenization, the tokens are converted into latent vectors
by the latent layer. Each latent layer model has a defined latent size and hidden states. The
output of the latent layer model is the set of latent vectors, which serve as inputs to the GAN.
Each layer performs its own function, starting from the tokenization of the given sentences into
individual words. Once tokenization is done, the result is provided to the hidden
layer as input, where the processing happens. Based on the tuning parameters, the
linear layer then performs its function, and its output is passed to the latent layer. The final
output is the latent vectors of the input word sequence, represented in Table 1. The various layers that
combine to build the TAM-BERT model are all encoders, which can be trained on
the target corpus.
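The layer sequence described above (tokenization, hidden layer, linear layer, latent layer) can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the whitespace tokenizer, the hash-seeded embeddings, the sizes, and the names `tokenize`, `embed`, `linear`, and `encode_sentence` are all hypothetical stand-ins for the Indic BERT tokenizer and the trained encoder layers, and the Tamil sentence is an invented example.

```python
import hashlib
import math
import random

HIDDEN_SIZE = 8   # assumed hidden-state width (not given in the text)
LATENT_SIZE = 4   # assumed latent-vector size (not given in the text)


def tokenize(sentence):
    """Split a sentence into word tokens (stand-in for the Indic BERT tokenizer)."""
    return sentence.split()


def embed(token, size=HIDDEN_SIZE):
    """Map a token to a deterministic pseudo-random hidden vector."""
    seed = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % (2 ** 32)
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(size)]


def linear(vector, weights):
    """Apply a bias-free linear layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def encode_sentence(sentence, weights):
    """Tokenize, embed, mean-pool the hidden states, and project to a latent vector."""
    tokens = tokenize(sentence)
    if not tokens:
        raise ValueError("sentence must contain at least one token")
    hidden = [embed(tok) for tok in tokens]
    pooled = [sum(col) / len(hidden) for col in zip(*hidden)]
    return [math.tanh(x) for x in linear(pooled, weights)]


# Untrained, randomly initialised projection weights (illustration only).
rng = random.Random(0)
WEIGHTS = [[rng.uniform(-0.5, 0.5) for _ in range(HIDDEN_SIZE)] for _ in range(LATENT_SIZE)]
latent = encode_sentence("இந்த மலர் மஞ்சள் நிறம்", WEIGHTS)
print(len(latent))
```

In a real system the embeddings and projection weights would be learned on the target corpus; here they are fixed so that the data flow from sentence to latent vector is easy to follow.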
2.3 Tam-Gan: A Generative Model for Tamil Text Image Synthesis Using TAMBERT
and MuRIL BERT Language Model
The generative network plays a major role in image synthesis. The basic units of a GAN model are
a generator network and a discriminator network. Both models learn a probability
distribution: the discriminator learns a conditional probability
distribution, whereas generative models learn a joint probability distribution. The two views are
related by Bayes’ rule. The GAN model has an interesting history in
its development. Belief networks, autoencoders, and Boltzmann machines are its chronological
predecessors. The Fully Visible Belief Network works by employing the chain
rule of probability, but it is inefficient at generating samples. The next
improvement came from the change of variables in non-linear ICA, but this approach
requires the data and the latent variables to have the same dimension.
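For reference, the two probabilistic facts used in this paragraph can be written out. These are the standard textbook identities rather than formulas taken from the paper; the symbols $x$ (a data point with $n$ components) and $y$ (a label) are introduced here only for illustration.

```latex
% Bayes' rule: links the conditional (discriminative) and joint (generative) views
p(y \mid x) = \frac{p(x, y)}{p(x)} = \frac{p(x \mid y)\, p(y)}{\sum_{y'} p(x \mid y')\, p(y')}

% Chain rule of probability, as used by fully visible belief networks
p(x) = \prod_{i=1}^{n} p(x_i \mid x_1, \ldots, x_{i-1})
```

The product form of the chain rule also explains the sampling cost: each component is drawn conditioned on the ones before it, so a sample is produced one component at a time.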
Autoencoders work by maximizing a lower bound on the log-likelihood of the data, but the
network performs poorly when there is a gap between the lower bound and the actual density
of the data, and the final output of these models is of low quality. A Boltzmann machine works with an
energy function that determines the probability of a defined state. It requires
Monte Carlo methods and Markov chains, which fail in high-dimensional spaces. The limitations of the
previous approaches are overcome by the GAN model, which is consistent, does not require a
Markov chain, and produces the best samples. The major function of the generator is to fool the
discriminator by generating samples that are indistinguishable from real data, while the discriminator
tries to separate fake data from real data. The TAM-GAN model for the given data works in the following fashion.
The proposed system works by preprocessing the images to be trained on and the text file under
study. Figure 3 describes the initial process as a set of functions applied to the text and
images. The work starts by tokenizing the given text file and creating individual tokens that serve
as inputs to the language model. The TAM-BERT language model and MuRIL BERT generate latent
vectors, which are stored. In parallel, each image is preprocessed, which involves
resizing, normalization, cropping, and conversion to tensors. The output of the language
model and the image vectors are fed into the GAN along with the loss functions. The
resulting images are synthesised, and the loss of the network is reported.
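The image preprocessing steps named above can be illustrated with a small, dependency-free sketch. It is only an assumption-laden illustration: the step order (crop, then resize, then normalize), the nearest-neighbour resize, the mean and standard deviation of 0.5, and the function names are choices made here, and a nested list stands in for a tensor.

```python
def center_crop(image, size):
    """Crop the central size x size window from an H x W image (list of rows)."""
    height, width = len(image), len(image[0])
    if size > height or size > width:
        raise ValueError("crop size exceeds image dimensions")
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def resize_nearest(image, new_height, new_width):
    """Resize with nearest-neighbour sampling."""
    height, width = len(image), len(image[0])
    return [
        [image[r * height // new_height][c * width // new_width] for c in range(new_width)]
        for r in range(new_height)
    ]


def normalize(image, mean=0.5, std=0.5):
    """Scale 8-bit pixel values to [0, 1], then shift and scale to [-1, 1]."""
    return [[(pixel / 255.0 - mean) / std for pixel in row] for row in image]


def preprocess(image, crop_size, out_size):
    """Crop, resize, and normalize one grayscale image."""
    cropped = center_crop(image, crop_size)
    resized = resize_nearest(cropped, out_size, out_size)
    return normalize(resized)


# A 6 x 8 synthetic grayscale gradient as a stand-in for a flower photo.
raw = [[(r * 8 + c) * 5 for c in range(8)] for r in range(6)]
tensor = preprocess(raw, crop_size=6, out_size=3)
print(len(tensor), len(tensor[0]))
```

Mapping pixel values to the range [-1, 1] is a common choice when the generator ends in a tanh activation; whether the paper does so is not stated.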
the corresponding Tamil text using MuRIL BERT, which helps in dealing with the Tamil language.
The vectorized forms of text have been generated based on segment embedding, positional em-
bedding, and sentence embedding. The word vectors are given as input to TAMGAN along with
a preprocessed image vector. The Generator and Discriminator work together to synthesise the
images for the Tamil text.
ALGORITHM 1:
Input: images x, text description t of each image, training batch size B
For n = 1 to B, do
    h ← Φ(t)  (encode the text description of the image)
    Z ~ Noise(0, 1)  (sample a noise vector)
    x̄ ← G(Z, h)  (generator output: fake image)
    y ← D(x, h)  (real image, text description)
    ŷ ← D(x̄, h)  (fake image, text description)
    G ← G − η∇f(G)  (update Generator)
    D ← D − η∇f(D)  (update Discriminator)
    Prediction ← D(x, x̄)
Output: synthesised realistic image x̄
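The update loop of Algorithm 1 can be made concrete with a deliberately tiny sketch. Everything here is a simplifying assumption for illustration: "images" are single numbers, the text encoder is a keyword test, the generator and discriminator are linear models with three parameters each, the gradients are derived by hand, and all function names are hypothetical. The generator minimizes log(1 − D(x̄, h)) and the discriminator minimizes −(log D(x, h) + log(1 − D(x̄, h))).

```python
import math
import random


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def encode_text(t):
    """Phi(t): toy text encoder mapping a description to a scalar code (assumption)."""
    return 1.0 if "yellow" in t else -1.0


def generator(params, z, h):
    a, b, c = params
    return a * z + b * h + c


def discriminator(params, x, h):
    u, v, k = params
    return sigmoid(u * x + v * h + k)


def train_step(g_params, d_params, x, t, rng, lr=0.05):
    h = encode_text(t)                          # h <- Phi(t)
    z = rng.gauss(0.0, 1.0)                     # Z ~ Noise(0, 1)
    x_fake = generator(g_params, z, h)          # x_bar <- G(Z, h)
    y = discriminator(d_params, x, h)           # score for (real image, text)
    y_hat = discriminator(d_params, x_fake, h)  # score for (fake image, text)

    u = d_params[0]
    # Gradient of the generator loss log(1 - y_hat) with respect to (a, b, c).
    g_grad = [-y_hat * u * z, -y_hat * u * h, -y_hat * u]
    # Gradient of the discriminator loss -(log y + log(1 - y_hat)) w.r.t. (u, v, k).
    d_grad = [
        (y - 1.0) * x + y_hat * x_fake,
        (y - 1.0) * h + y_hat * h,
        (y - 1.0) + y_hat,
    ]

    new_g = [p - lr * g for p, g in zip(g_params, g_grad)]  # G <- G - eta * grad
    new_d = [p - lr * g for p, g in zip(d_params, d_grad)]  # D <- D - eta * grad
    return new_g, new_d, y, y_hat


rng = random.Random(0)
g_params, d_params = [0.1, 0.1, 0.0], [0.1, 0.1, 0.0]
batch = [(2.0, "a yellow flower"), (-2.0, "a red flower")]
for x, t in batch:
    g_params, d_params, y, y_hat = train_step(g_params, d_params, x, t, rng)
print(g_params, d_params)
```

Both gradients are computed from the same forward pass before either model is updated, which is one common way to order the two updates; a practical model would use a deep learning framework with automatic differentiation rather than hand-written gradients.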
where $\mathbb{E}_X$ is the expectation over all real data, $G(Z)$ is the generator output for the noise input $Z$,
$D_{Real}(G(Z))$ is the discriminator’s estimate of the probability that a fake instance is real, and
$D_{Real}(X)$ is the discriminator’s estimate of the probability that a real instance is real. In Equation (1),
the min-max function over $G$ and $D_{Real}$ expresses the generator’s aim of minimizing the discriminator’s
ability to determine whether it is being fooled. In turn, the discriminator tries to maximize its ability
to tell real images from fake images. When $D_{Real}(X)$ approaches 1, $\log D_{Real}(X)$ approaches 0, and when
$D_{Real}(G(Z))$ approaches 1, $\log(1 - D_{Real}(G(Z)))$ approaches negative infinity, which corresponds to the
generator synthesizing images that the discriminator accepts as real.
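Equation (1) itself is not reproduced in the text above. For reference, the standard min-max objective of Goodfellow et al. [8], written with the symbols defined in this paragraph, has the following form; the expectation subscripts $X$ and $Z$ and the name $V$ for the value function are notational assumptions.

```latex
\min_{G} \max_{D_{Real}} V(D_{Real}, G)
  = \mathbb{E}_{X}\left[\log D_{Real}(X)\right]
  + \mathbb{E}_{Z}\left[\log\left(1 - D_{Real}(G(Z))\right)\right]
```

The first term rewards the discriminator for scoring real data close to 1, and the second term rewards it for scoring generated data close to 0, which is the quantity the generator tries to drive down.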
multiplied by the tensor value T. The final output with the tensor is represented as $T \in M^{A \times B \times C}$,
where the tensor is a three-dimensional representation, $S_i \in M^{A \times B}$ is the matrix resulting from
$T \times f(x_i)$, and $\mathrm{Output}(x_i)$ is the output of the L1 norm values. For the given input
embeddings and image vector, the model initially shows a higher loss value for the generator. Beyond
300 epochs, however, the model synthesises images at a steady rate, and the loss
stays around 2.5. The L1 norm and mini-batch discrimination play a major role in stabilizing
the model and avoiding mode collapse. Feature matching between the input images and the generated
synthetic images, using the L1 norm, helps to measure the difference in the images synthesised by
the generator. Finally, the loss reaches a minimal value, as shown in Figure 7.
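The two stabilizing ideas mentioned above, L1 feature matching and mini-batch discrimination, can be sketched in a few lines. This is a hedged illustration of the commonly used formulation, not the paper's exact computation: feature vectors are plain lists, the tensor shape convention (feature length A, projected matrix B × C), the exclusion of a sample's similarity to itself, and all function names are assumptions made here.

```python
import math


def l1_distance(a, b):
    """L1 norm of the difference between two equal-length vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))


def mean_vector(rows):
    """Column-wise mean over a batch of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def feature_matching_loss(real_features, fake_features):
    """L1 norm between the batch-mean features of real and generated images."""
    return l1_distance(mean_vector(real_features), mean_vector(fake_features))


def project(feature, tensor):
    """Multiply a feature vector (length A) by an A x B x C tensor, giving a B x C matrix."""
    n_rows, n_cols = len(tensor[0]), len(tensor[0][0])
    return [
        [sum(feature[a] * tensor[a][b][c] for a in range(len(feature))) for c in range(n_cols)]
        for b in range(n_rows)
    ]


def minibatch_output(features, tensor):
    """For each sample and matrix row, sum exp(-L1 distance) to every other sample."""
    mats = [project(f, tensor) for f in features]
    outputs = []
    for i, m_i in enumerate(mats):
        row_sums = []
        for b in range(len(m_i)):
            total = sum(
                math.exp(-l1_distance(m_i[b], m_j[b]))
                for j, m_j in enumerate(mats)
                if j != i
            )
            row_sums.append(total)
        outputs.append(row_sums)
    return outputs


T = [[[1.0], [0.5]], [[-1.0], [2.0]]]  # hypothetical 2 x 2 x 1 tensor
collapsed = minibatch_output([[1.0, 0.0], [1.0, 0.0]], T)  # identical samples
diverse = minibatch_output([[1.0, 0.0], [0.0, 1.0]], T)    # distinct samples
print(collapsed[0], diverse[0])
```

Identical samples produce the largest similarity values, so a discriminator that sees these statistics can detect a batch of near-duplicate images, which is the signal that discourages mode collapse.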
generator. Moreover, it focuses on the probability value of discriminating real images from fakes
that are supplied by the generator. The mathematical expressions are given as follows in Equa-
tions (4) and (5):
Loss of Generator: $L(G) = \mathbb{E}_{\bar{x}} \log\left(1 - D(\bar{x})\right)$, (4)
Loss of Discriminator: $L(D) = \sum_{1}^{N} -\log D_{Real}(G(Z))$. (5)
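As a small numerical illustration, the two expressions can be evaluated on a batch of discriminator scores. The sketch below follows Equations (4) and (5) as printed; the function names and the sample scores are hypothetical, the expectation in Equation (4) is approximated by a batch mean, and scores are assumed to lie strictly between 0 and 1.

```python
import math


def generator_loss(fake_scores):
    """Equation (4): batch mean of log(1 - D(x_bar)) over generated images."""
    return sum(math.log(1.0 - s) for s in fake_scores) / len(fake_scores)


def discriminator_loss(scores):
    """Equation (5) as printed: sum over N samples of -log of the discriminator score."""
    return sum(-math.log(s) for s in scores)


# Hypothetical discriminator scores for a batch of three generated images.
scores = [0.2, 0.5, 0.8]
print(generator_loss(scores), discriminator_loss(scores))
```

The generator loss becomes more negative as the scores on generated images rise, which is why minimizing it pushes the generator toward samples that the discriminator rates as real.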
Fig. 9. Discriminator losses of the TAM GAN model across a number of epochs.
Fig. 10. Generator loss of the TAM GAN model across a number of epochs.
in the initial epochs for real and fake images is 0.8216 and 0.0261, respectively. Over
further epochs it varies, and in the final phase the F1 score is 0.9707 for the real images and
0.1829 for the fake images. The score combines precision and recall into a single
value. The model performs well on the validation set. Moreover, it is not
computationally expensive, because the word vectors are obtained during the initial stages of
processing. The model produces images with significant resolution. The evaluation
of the TAM GAN model using the latent vectors generated by MuRIL BERT achieved an
F1-Score of 0.9721, which is similar to that of the previous model. The model’s
error rate is higher in the initial stages, and as the epochs go on, the model
produces synthesised images of better resolution. The synthesised images are shown in
Figure 14.
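For readers unfamiliar with the measure, the F1 score is the harmonic mean of precision and recall. The sketch below shows the standard computation; the function name and the counts in the usage line are hypothetical and are not taken from the paper's experiments.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall; returns 0.0 when there are no true positives."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts: 8 correct detections, 2 false alarms, 2 misses.
print(f1_score(8, 2, 2))  # approximately 0.8
```

Because it is a harmonic mean, the score is high only when precision and recall are both high, which makes it stricter than a simple average of the two.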
Fig. 11. The score of synthetic images generated by the Generator network.
Fig. 12. True positive rate of the real images synthesised by the model.
Figure 15 shows the scores of the real and fake images synthesised with the MuRIL BERT model
and TAM GAN, which indicate that the model has achieved better performance.
Fig. 13. The scores of Real and Fake images across a number of epochs—TAMBERT+TAMGAN.
Fig. 14. Synthesised images using MuRIL BERT and TAM GAN.
Fig. 15. The scores of Real and Fake images across a number of epochs—MuRIL BERT + TAMGAN.
the GAN architecture. The text processing is done with a bidirectional LSTM. They
worked with the popular MS-COCO dataset, from which they created Train-R, Test-R, and Train-S subsets
that contain data corresponding to groups such as white, top, pillows, and table. The results
are evaluated using BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores. Another work presents a brand-new
end-to-end approach to text-to-image synthesis based on spatial constraints obtained by mining the
spatial location and shape information of objects. Directly under the supervision of the
developed semantic information, the system generates multi-object fine-grained images rather than
learning a hierarchical mapping from text to image [27]. However, when applied to real generation
tasks, vanilla deep neural networks tend to approximate continuous mappings rather than discontinuous
mappings with discrete points. The failure of a GAN to synthesise a variety of images, referred to as mode
collapse, occurs during training on datasets that contain many types. The researchers
present the Multi-generator Text Conditioned GAN, also known as MTC-GAN, as a potential
solution to this problem [28]. The proposed work focuses on generating images for Tamil text
input using the TAM-BERT and MuRIL-BERT models along with the TAM-GAN model [29]. The
synthesised images and their resolution depend on the text embedding model adopted for
the morphology of the input and on the GAN model employed for image synthesis. They also
depend on the available resources and corpus for the particular language. The proposed model
also overcomes mode collapse and reduces the loss through feature matching and minibatch
discrimination, which employ the L1 norm. The comparative analysis of the
existing methods and the proposed methodology is depicted in Figure 16. The model’s performance
on the given dataset is nearly on par with the existing methods, which is encouraging for the Tamil
language. Moreover, the performance of each algorithm depends on the language, since every language has its
own morphological representation.
4 CONCLUSION
Image synthesis from textual descriptions is an interesting area of exploration. The
proposed image synthesis for Tamil text plays a vital role in developing tools that pave the way for
education in an easier way. Similarly, the proposed model can be employed for image generation in
any environment where regional languages play a major role. The TAM-BERT and MuRIL-BERT
models, as well as TAM-GAN, represent an initial phase of combining the Tamil language
with the well-known GAN model. In the future, major text domains, including literature,
science, and so on, can be efficiently handled. By enhancing the model, super-resolution
photorealistic images can be obtained. Moreover, GAN-based loss functions can be used alongside the
existing functions to attain accurate image generation. In the future, auto-encoder models and diffusion
models could be added to enhance the work, which could result in better performance.
REFERENCES
[1] Guha Balakrishnan, Amy Zhao, Adrian V. Dalca, Fredo Durand, and John Guttag. 2018. Synthesizing images of hu-
mans in unseen poses. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8340–8348.
[2] Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee. 2016. Generative
adversarial text to image synthesis. In Proceedings of the International Conference on Machine Learning. PMLR, 1060–
1069.
[3] Kenan E. Ak, Joo Hwee Lim, Jo Yew Tham, and Ashraf A. Kassim. 2020. Semantically consistent text to fashion image
synthesis with an enhanced attentional generative adversarial network. Pattern Recogn. Lett. 135 (2020), 22–29.
[4] Krishna Regmi and Ali Borji. 2018. Cross-view image synthesis using conditional GANs. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition. 3501–3510.
[5] S. Saraswathi and T. V. Geetha. 2004. Building language models for Tamil speech recognition system. In Proceedings
of the Asian Applied Computing Conference. Springer, Berlin, 161–168.
[6] Suresh Sundaram and A. G. Ramakrishnan. 2012. Language models for online handwritten Tamil word recognition.
In Proceedings of the Workshop on Document Analysis and Recognition. 42–48.
[7] Suresh Sundaram and A. G. Ramakrishnan. 2015. Bigram language models and reevaluation strategy for improved
recognition of online handwritten Tamil words. ACM Trans. Asian Low-Res. Lang. Info. Process. 14, 2 (2015), 1–28.
[8] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and
Yoshua Bengio. 2020. Generative adversarial networks. Commun. ACM 63, 11 (2020), 139–144.
[9] Alec Radford, Luke Metz, and Soumith Chintala. 2015. Unsupervised representation learning with deep convolutional
generative adversarial networks. Retrieved from https:// arXiv:1511.06434.
[10] Mehran Mehralian and Babak Karas. 2018. RDCGAN: Unsupervised representation learning with regularized deep
convolutional generative adversarial networks. In Proceedings of the 9th Conference on Articial Intelligence and Ro-
botics and 2nd Asia-Pacic International Symposium. IEEE. 31–38.
[11] Bin Zhu and Chong-Wah Ngo. 2020. CookGAN: Causality-based text-to-image synthesis. In Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5519–5527.
[12] Minfeng Zhu, Pingbo Pan, Wei Chen, and Yi Yang. 2019. Dm-gan: Dynamic memory generative adversarial networks
for text-to-image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
5802–5810.
[13] Hongchen Tan, Xiuping Liu, Meng Liu, Baocai Yin, and Xin Li. 2020. KT-GAN: Knowledge-transfer generative ad-
versarial network for text-to-image synthesis. IEEE Trans. Image Process. 30 (2020), 1275–1290.
[14] Tobias Hinz, Stefan Heinrich, and Stefan Wermter. 2019. Semantic object accuracy for generative text-to-image syn-
thesis. Retrieved from https:// arXiv:1910.13321.
[15] Purva Raut, Moxa Doshi, Monil Diwan, and Karan Doshi. 2020. Face completion using generative adversarial network.
In Advanced Computing Technologies and Applications. Springer, 523–531.
[16] A. Bharath and Sriganesh Madhvanath. 2011. HMM-based lexicon-driven and lexicon-free word recognition for on-
line handwritten Indic scripts. IEEE Trans. Pattern Anal. Mach. Intell. 34, 4 (2011), 670–682.
[17] Kengatharaiyer Sarveswaran, Gihan Dias, and Miriam Butt. 2021. ThamizhiMorph: A morphological parser for the
Tamil language. Mach. Transl. 35, 1 (2021), 37–70.
[18] M. Suriyah, Aarthy Anandan, Anitha Narasimhan, and Madhan Karky. 2019. Piripori: Morphological analyser for
Tamil. In Proceedings of the International Conference on Articial Intelligence, Smart Grid And Smart City Applications.
Springer, Cham, 801–809.
[19] S. Ramraj, R. Arthi, Solai Murugan, and M. S. Julie. 2020. Topic categorization of Tamil News Articles using Pre-
Trained Word2Vec Embeddings with Convolutional Neural Network. In Proceedings of the International Conference
on Computational Intelligence for Smart Power System and Sustainable Energy (CISPSSE’20). IEEE, 1–4.
[20] Sajeetha Thavareesan and Sinnathamby Mahesan. 2019. Sentiment analysis in Tamil texts: A study on machine learn-
ing techniques and feature representation. In Proceedings of the 14th Conference on Industrial and Information Systems
(ICIIS’19). IEEE, 320–325.
ACM Trans. Asian Low-Resour. Lang. Inf. Process., Vol. 22, No. 5, Article 128. Publication date: May 2023.
128:18 Diviya M and Karmel A
[21] Casper Shikali Shivachi, Refuoe Mokhosi, Zhou Shijie, and Liu Qihe. 2021. Learning syllables using Conv-LSTM
model for Swahili word representation and part-of-speech Tagging. Trans. Asian Low-Res. Lang. Info. Process. 20,
4 (2021), 1–25.
[22] Touseef Iqbal and Shaima Qureshi. 2022. The survey: Text generation models in deep learning. Journal of King Saud
University-Computer and Information Sciences 34, 6 (2022), 2515–2528.
[23] Iz Beltagy, Kyle Lo, and Arman Cohan. 2019. Scibert: A pre-trained language model for scientic text. Retrieved from
https://arXiv:1903.10676.
[24] Christophe Van Gysel, Maarten De Rijke, and Evangelos Kanoulas. 2018. Neural vector spaces for unsupervised
information retrieval. ACM Trans. Info. Syst. 36, 4 (2018), 1–25.
[25] Bo Chang, Qiong Zhang, Shenyi Pan, and Lili Meng. 2018. Generating handwritten chinese characters using cyclegan.
In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV’18). IEEE, 199–207.
[26] Md Zakir Hossain, Ferdous Sohel, Mohd Fairuz Shiratuddin, Hamid Laga, and Mohammed Bennamoun. 2021. Text to
image synthesis for improved image captioning. IEEE Access 9 (2021), 64918–64928.
[27] Min Wang, Congyan Lang, Liqian Liang, Songhe Feng, Tao Wang, and Yutong Gao. 2020. End-to-end text-to-image
synthesis with spatial constraints. ACM Trans. Intell. Syst. Technol. 11, 4, Article 47 (Aug. 2020), 19 pages. https:
//doi.org/10.1145/3391709
[28] Min Zhang, Chunye Li, and Zhiping Zhou. 2021. Text to image synthesis using multi-generator text conditioned
generative adversarial networks. Multimedia Tools Appl.80, 5 (Feb 2021), 7789–7803. https://doi.org/10.1007/s11042-
020- 09965- 5
[29] S. Khanuja, D. Bansal, S. Mehtani, S. Khosla, A. Dey, B. Gopalan, D. K. Margam, P. Aggarwal, R. T. Nagipogu, S. Dave,
S. Gupta, S. C. Gali, V. Subramanian, and Talukdar. 2021. MuRIL: Multilingual representations for Indian languages.
Retrieved from https://arxiv.org/abs/2103.10730.
ACM Trans. Asian Low-Resour. Lang. Inf. Process., Vol. 22, No. 5, Article 128. Publication date: May 2023.