Text To Video Generation Using Deep Learning
Abstract: Technology developments have resulted in the creation of techniques that can produce desired visual multimedia. In particular, deep learning-based image generation has been the subject of in-depth research across many disciplines. On the other hand, it is still challenging for generative models to produce videos from text, a topic that has received less attention. This research tries to fill this gap by training a model to generate a clip that matches a given written sentence. The field of conditional video creation is largely underdeveloped. With the help of a conditional generative adversarial network, which develops the output frame by frame and ultimately creates a full-length video, our project's goal is to transform text to image to video. The initial step focuses on creating a single high-quality video frame while learning how to connect text and visuals. As the stages progress, our model is gradually trained on an increasing count of continuous frames. This approach of learning in stages stabilizes the training and makes it easier to understand. High-definition videos may be created from conditional text descriptions. To demonstrate the efficacy of the recommended strategy, results from qualitative and quantitative trials on various datasets are required.

Keywords— Variational auto encoders; GAN; Video generation; Conditional GAN; Video GAN.

I. INTRODUCTION

In this era, generative models have been researched on a large scale. Variational auto encoders (VAEs), which combine a recurrent neural network with a prior and an appropriate noise distribution, and generative adversarial networks (GANs), in which two artificial neural networks are combined with one another in a machine learning (ML) model to make more accurate predictions, are extensively utilized in the creation of images, videos, and voices. VAE and GAN are two recent innovations that stand out as current examples of rapid, prolific, and high-quality growth. A model known as the Generative Adversarial Network (GAN) was put forth by Goodfellow et al. A generator and a discriminator that have been trained with conflicting goals make up a GAN in its initial configuration. The discriminator is improved in order to distinguish between real samples drawn from the actual data distribution and fraudulent samples produced by the generator. The generator, in order to trick the discriminator, is taught to create samples that reflect the real data distribution. For recreating complex data distributions, such as those of texts, images, and videos, GANs have lately shown a lot of potential.
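To make the adversarial objective above concrete, the following is a minimal PyTorch sketch of one conditional GAN training step. The tiny generator, discriminator, and embedding sizes are illustrative placeholders, not the architecture used in this work.

```python
import torch
import torch.nn as nn

# Toy conditional generator/discriminator; all sizes are illustrative only.
text_dim, noise_dim, img_dim = 128, 64, 784

G = nn.Sequential(nn.Linear(noise_dim + text_dim, 256), nn.ReLU(), nn.Linear(256, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim + text_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_imgs, text_emb):
    b = real_imgs.size(0)
    z = torch.randn(b, noise_dim)
    fake_imgs = G(torch.cat([z, text_emb], dim=1))

    # Discriminator: real (image, text) pairs -> 1, generated pairs -> 0.
    d_real = D(torch.cat([real_imgs, text_emb], dim=1))
    d_fake = D(torch.cat([fake_imgs.detach(), text_emb], dim=1))
    d_loss = bce(d_real, torch.ones(b, 1)) + bce(d_fake, torch.zeros(b, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to fool the discriminator on generated pairs.
    g_loss = bce(D(torch.cat([fake_imgs, text_emb], dim=1)), torch.ones(b, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# Example call with random stand-ins for a batch of images and text embeddings.
print(train_step(torch.randn(8, img_dim), torch.randn(8, text_dim)))
```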
Despite their success, GAN models are known to be challenging to train, and the selection of hyper-parameters frequently has an adverse effect on the stability of the training process. The deep convolutional GAN proved effective in producing realistic outputs, and other studies using the GAN framework have reported remarkable outcomes with deep networks. The ability to generate photorealistic images that are difficult for humans to distinguish from genuine pictures exists today. However, because creating videos is a far harder task than creating images, there are significantly fewer studies on video generation than on image generation. Videos must account for the continuity between frames, whereas images are judged only on the quality of a single frame. If the progression between neighbouring frames is not assured, well-crafted videos cannot be produced, even if each individual image is of high quality. Further complicating video creation is the fact that almost all publicly available video collections are exceedingly diverse and poorly aligned. In contrast to the extensive study that has been done in the field of image production, conditional video generation has not received much attention. A network may produce a more realistic image corresponding to a given text, for instance, and a manual one-hot encoding scheme can be used to change the properties of the produced image. However, investigations on text-to-video generation are few and are often undertaken at a lower resolution than text-to-image generation. So, in order to widen the scope of video generation, we concentrated on conditional video creation, which has not been widely studied in this sector.
In this paper, a new way of using GANs for text-to-video generation tasks is presented. This paper suggests a novel network that creates videos in accordance with provided descriptions. The network's learning structure is based on the fundamental idea that linked frames in a video contain a lot of continuity. If we can create one high-quality video frame, it will be simple to generate a connected frame because the frames are related. The GAN is first trained with respect to a single image and is subsequently extended to longer frame sequences. The GAN may learn to produce lengthy scenes by incrementally improving its ability to generate a large number of adjacent frames. Our extensive experimental findings demonstrate that, in addition to producing an appropriate video for a given text, our approach also generates outcomes that are sharper and better in both qualitative and quantitative terms than those seen in previous comparable works.
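The staged training described above can be summarised as a schedule in which the number of consecutive frames the generator must produce grows from stage to stage. The sketch below illustrates only that scheduling idea; the stage boundaries, frame counts, and the `train_on_clips` routine are hypothetical placeholders rather than the exact procedure of this work.

```python
# Illustrative stage schedule: each stage asks the generator for more
# consecutive frames than the last, so training starts from single images
# and gradually moves toward full clips.
stages = [1, 2, 4, 8, 16]          # hypothetical frame counts per stage
epochs_per_stage = 10              # hypothetical

def train_on_clips(num_frames, epochs):
    # Placeholder for the actual GAN training loop at this clip length.
    print(f"training for {epochs} epochs on clips of {num_frames} frame(s)")

for num_frames in stages:
    train_on_clips(num_frames, epochs_per_stage)
```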
II. PROBLEM STATEMENT

Video creation from natural language phrases will have a significant influence, since humans are able to listen to and read sentences in human language and to imagine or visualise the things being described. Video is more effective than written words or plain text. Young people these days often do not have much time to go through an entire article to understand its content, yet they want to know all of its important elements. Hence there is a requirement for a video generation system that can create interesting, engaging, concise and high-quality videos from text stories with little or no human intervention.

III. OBJECTIVE

IV. LITERATURE SURVEY

In [1], the authors suggest a deep generative method that can create videos from an image of a single face and a designated facial emotion, such as an unintended smile. A frame sequence generator and an image generator make up the two main components of the architecture, where the image generator makes use of a deep neural model that combines GAN and VAE. The framework's sequence generator uses a single face image and a label as inputs and generates a collection of hidden representations with smooth transitions that correspond to video frames. The actual face images are then created by decoding the hidden representations using the image generator.

In [2], the CKD approach has been developed in order to transfer hierarchical knowledge from multiple image semantic comprehension tasks to the text-to-image synthesis (T2IS) problem. Using its experience with image semantic comprehension challenges, the T2IS model can learn the mapping from semantic information to picture content. In T2IS, text descriptions and synthetic images are used. The distillation process is divided into several steps using a multi-stage knowledge distillation paradigm. As a result, the visual quality issue for T2IS can be resolved by approximating the genuine picture distributions. The authors carry out extensive tests on widely used datasets to confirm the efficacy of the suggested CKD strategy.
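Cross-task knowledge distillation of the kind summarised above is usually implemented by making a student network match a teacher network's softened outputs in addition to the ground-truth labels. The snippet below is a generic distillation-loss sketch, not the specific CKD formulation of [2]; the temperature and weighting are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Generic knowledge-distillation objective (not the exact CKD loss of [2]).

    Combines a cross-entropy term on the hard labels with a KL term that
    pulls the student's softened distribution toward the teacher's.
    """
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * hard + (1.0 - alpha) * soft

# Example with random stand-ins for a batch of 8 samples and 10 classes.
s = torch.randn(8, 10, requires_grad=True)
t = torch.randn(8, 10)
y = torch.randint(0, 10, (8,))
print(distillation_loss(s, t, y).item())
```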
In [3], a new Attention-Transfer Mechanism (AATM) and Semantic Distillation Mechanism (SDM) have been developed, along with a Knowledge-Transfer Generative Adversarial Network (KT-GAN) for Text-to-Image (T2I) synthesis. To better encode text features and synthesize photographic images, the SDM leverages Image-to-Image synthesis as guidance. The AATM helps the generator gradually recognize crucial words and refine the details of the synthesized image. The heterogeneous gap is effectively closed by the SDM and AATM, enabling the generator to synthesize excellent images. Extensive experimental findings and analysis showed that KT-GAN is effective.

... the correspondence between multiple pictures is determined using the DIRECT optimization technique.
... original material, it is then brought back to high resolution. TENet, or Text Editor, is used to alter the text area. The backdrop area of the original document is kept untouched when the parts are combined into a new original document image. After IDNet creation, the fresh pictures are converted into a captured document using GAN-based distortion simulation. The success of the approach in [8] is then assessed using CNN-based recapture detection on the output captured/recaptured images.

In [9], the paper suggests a fully trained generative adversarial network for the synthesis of face images from text. The research describes a network that has been trained, using both a text encoder and an image decoder, to produce high-quality images conditioned on the input phrases. In-depth tests on a publicly accessible dataset demonstrate the superiority of the suggested methodology. Additionally, the authors contribute a text-to-face creation dataset for this innovative challenge: locally produced photos and various publicly accessible datasets have been integrated, after which each photograph was categorized manually. The resemblance between the generated faces and the input ground-truth description words is also examined in the proposed work. According to the research, the recommended generative adversarial network generates realistic pictures of excellent quality, with faces that resemble the labels and faces from the real data. Using FID and FSD scores, the suggested method was compared to cutting-edge techniques. The proposed Fully Trained Generative Adversarial Network obtained an FID score of 42.62 on a benchmark, which is lower (better) than other algorithms. Furthermore, human evaluations of the created photos are also credible.
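Since [9] reports FID scores, a brief sketch of how the Frechet Inception Distance is computed may be helpful: FID is the Frechet distance between Gaussians fitted to real and generated feature sets, ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 (C_r C_g)^(1/2)). The features would normally come from an Inception network; in the runnable sketch below, random arrays stand in for them.

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(feats_real, feats_gen):
    """FID between two feature sets of shape (n_samples, feat_dim).

    In practice the features are Inception-v3 activations; random arrays
    are used below purely to make the sketch runnable.
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_r @ cov_g, disp=False)
    if np.iscomplexobj(covmean):   # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_g
    return diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean)

rng = np.random.default_rng(0)
print(frechet_inception_distance(rng.normal(size=(256, 64)), rng.normal(size=(256, 64))))
```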
In [10], the research presents the Bridge-GAN strategy to solve the content-consistency problem of text-to-image synthesis. A transitional space is created as a bridge in the Bridge-GAN approach, which guarantees the key visual information from the text descriptions and allows an interpretable representation to be learned. For optimizing the transitional space, a lower bound on a ternary mutual information objective is also derived; this objective was designed to maximize the mutual information among the given written contents, the interpretable representation, and the observations of synthetic pictures. Extensive research on two popular datasets, with the finest quantitative outcomes, serves as proof of the Bridge-GAN's efficacy.

In [11], the issue of interactive crowd video generation is proactively posed and addressed. The proposed model, CrowdGAN, combines two task-specific networks with a self-selective fusion module to combine the merits of adversarial learning and flow-based warping. Given the guidance map for the subsequent frame, this model can provide users with realistic crowd images. Extensive testing demonstrates that the approach can produce visually convincing and temporally continuous crowd videos, and other applications that the suggested approach can improve are also shown. Due to the constraints of the already-existing crowd video datasets discussed above, the lack of sufficient diversity in the crowd scenarios restricts the generalization of the approach to a certain extent. The performance of the system will likely be enhanced in the future as a larger and more generation-focused crowd video collection is annotated; this will be especially helpful for addressing denser and more complex crowd circumstances. The collection will offer extra data with particular people following curved or twisting motion in an effort to boost user engagement. If this is the case, CrowdGAN will contribute more to solving the shortage of crowd data and, if needed, to making more helpful crowd videos.

The study [12] raises the problem of detecting cursive text and proposes a segmentation-free method using a deep convolutional recurrent neural network, with an emphasis on Urdu writing in real-world settings. Urdu text is much more challenging to interpret than non-cursive scripts because of its various writing styles, variety of letter forms, linked text, continuous overlaying, and elongated, vertical, and compressed text. Instead of pre-segmenting the word picture into individual characters, the recommended model first converts a full word image into a continuous sequence of frame features. The model consists of a deep CNN with short connections for feature extraction and encoding, a recurrent neural network (RNN) for decoding the convolutional features, and a connectionist temporal classification (CTC) layer for converting predicted sequences into target labels. In order to extract more beneficial Urdu text characteristics, more sophisticated CNN architectures, including VGG-16, VGG-19, ResNet-18, and ResNet-34, are studied and their recognition results compared in an effort to further increase text recognition precision.
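The CNN + RNN + CTC pipeline summarised for [12] can be sketched as follows; the layer sizes, alphabet size, and sequence lengths are placeholder assumptions rather than the configuration used in that paper.

```python
import torch
import torch.nn as nn

class TinyCRNN(nn.Module):
    """Minimal CRNN sketch: CNN feature extractor -> BiLSTM -> per-frame class scores."""

    def __init__(self, num_classes=32):  # class 0 is reserved for the CTC blank
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.rnn = nn.LSTM(input_size=64 * 8, hidden_size=128,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(256, num_classes)

    def forward(self, x):                        # x: (batch, 1, 32, width)
        f = self.cnn(x)                          # (batch, 64, 8, width/4)
        f = f.permute(0, 3, 1, 2).flatten(2)     # (batch, time, 64*8)
        out, _ = self.rnn(f)
        return self.fc(out).log_softmax(2)       # (batch, time, classes)

model = TinyCRNN()
ctc = nn.CTCLoss(blank=0)

images = torch.randn(4, 1, 32, 128)              # toy batch of word images
log_probs = model(images).permute(1, 0, 2)       # CTC expects (time, batch, classes)
targets = torch.randint(1, 32, (4, 10))          # toy label sequences (no blanks)
input_lengths = torch.full((4,), log_probs.size(0), dtype=torch.long)
target_lengths = torch.full((4,), 10, dtype=torch.long)
print(ctc(log_probs, targets, input_lengths, target_lengths).item())
```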
According to [13], a novel framework for producing face videos is suggested. The approach is a two-stage framework in which a source face image is first used to generate temporal 3D dynamics, which are then rendered using a proposed sparse texture mapping methodology that expresses face structural characteristics and image data. The produced sparse texture then serves as a trustworthy prior for the construction of faces. The method differs from most preceding approaches, which concentrate only on face video generation or only on face video prediction; here both face video generation and face video prediction are the main topics. To demonstrate the success of the approach, three difficult tasks (video restoration, video forecasting, and target-driven video prediction) have been completed.

In [14], to deal with the problem of limited information, an updated framework, RiFeGAN2, is presented to quickly enrich the input caption. In order to speed up retrieval and boost retrieval quality, RiFeGAN2 uses a domain-specific constrained model to filter prior knowledge. It then uses a domain scorer and a ranking scorer to improve the candidates that have been collected. Additionally, in order to emphasize the input caption and enhance semantic consistency, SAEM2s with a central attention layer are suggested. Compared to the competition, the models can create visuals that are more realistic, according to tests on frequently used datasets, and they also have better semantic consistency. The outcomes further show that the suggested models can use numerous captions to facilitate interactive operations and increasingly fulfil the given content or the given text.

[15] introduces Semantic Object Accuracy (SOA), a novel assessment metric that gauges how precisely a model can create individual objects in pictures. With the use of this new SOA assessment, text-to-image generative models can be reviewed more thoroughly, and failure and success traits for specific items and object classes may be identified. In contrast to other measures such as the Inception Score, the SOA score is similar to the ranking derived by human evaluation, according to a user survey involving 200 participants. None of the cutting-edge approaches that were examined can currently produce convincing objects for a sample of the 80 classes in the COCO data set.
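Conceptually, an SOA-style evaluation runs an object detector over images generated from captions that mention a given object class and reports how often the detector finds that class. The sketch below only illustrates this bookkeeping; the `generate_image` and `detect_labels` functions are hypothetical stand-ins for a text-to-image model and a pretrained detector, not part of the evaluation code of [15].

```python
import random

def generate_image(caption):
    # Hypothetical text-to-image model; returns an opaque image object.
    return {"caption": caption}

def detect_labels(image):
    # Hypothetical pretrained object detector; here it randomly "detects" caption words.
    return {w for w in image["caption"].split() if random.random() > 0.3}

def semantic_object_accuracy(captions_by_class, n_samples=8):
    """For each class, fraction of generated images in which the detector finds it."""
    scores = {}
    for cls, captions in captions_by_class.items():
        hits = 0
        for _ in range(n_samples):
            img = generate_image(random.choice(captions))
            hits += cls in detect_labels(img)
        scores[cls] = hits / n_samples
    return scores

print(semantic_object_accuracy({
    "dog": ["a dog playing in a park", "a small dog on a sofa"],
    "car": ["a red car on the street"],
}))
```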
This paper [16] illustrated how a text-to-image GAN model can convey all of the facial traits in an attached text description while still being aesthetically pleasing. The model combines visual-linguistic features from a well-disentangled latent space by using the hierarchical structure of the cutting-edge StyleGAN model. Based on this research, the authors conclude that adding CLIP elements to the framework encourages results with richer contextual meaning without compromising the overall uniqueness of the facial results. Furthermore, it demonstrates that adopting a linear-based attention mechanism makes it easier to produce reliable pictures.
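Text-guided manipulation of a StyleGAN-like latent space, as described for [16], is often done by optimising a latent code so that the generated image's embedding moves toward the text embedding of a joint vision-language model such as CLIP. The sketch below shows only that optimisation loop; the tiny generator and the stand-in image/text encoders are placeholders (a real setup would use pretrained StyleGAN and CLIP weights), and it is not the specific method of [16].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder modules: a real setup would load pretrained StyleGAN and CLIP weights.
generator = nn.Linear(512, 3 * 64 * 64)       # latent code -> flattened "image"
image_encoder = nn.Linear(3 * 64 * 64, 256)   # stand-in for a CLIP-style image encoder
text_encoder = nn.Linear(300, 256)            # stand-in for a CLIP-style text encoder

text_embedding = F.normalize(text_encoder(torch.randn(1, 300)), dim=-1).detach()

latent = torch.randn(1, 512, requires_grad=True)   # code being optimised
optimizer = torch.optim.Adam([latent], lr=0.05)

for step in range(50):
    image = generator(latent)
    image_embedding = F.normalize(image_encoder(image), dim=-1)
    # Maximise cosine similarity between generated image and target text.
    loss = 1.0 - (image_embedding * text_embedding).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print("final text-image similarity:", 1.0 - loss.item())
```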
According to [17], novel approaches are provided for developing a thorough framework for sign language (SL) recognition, translation, and generation operations under real-world conditions. To improve recognition accuracy, a hybrid Convolutional Neural Network (CNN) and Bi-directional Long Short-Term Memory (Bi-LSTM) model is used for posture feature extraction and text synthesis. On the other hand, sign gesture videos are created for given spoken utterances using a hybrid Neural Machine Translation (NMT), MediaPipe, and Dynamic Generative Adversarial Network (GAN) model. The suggested approach achieves above 95% classification accuracy while resolving various problems with the current techniques. The model's efficacy is also assessed at various phases of development, and the assessment metrics show actual improvements. Compared with previous multi-language reference sign datasets used in testing, the model performs better in terms of picture quality and achieves the highest accuracy.
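A hybrid CNN + Bi-LSTM classifier of the kind described for [17] typically applies a small CNN to each frame of a gesture clip and feeds the resulting per-frame features to a bidirectional LSTM whose final state is classified. The sketch below is a generic illustration with placeholder sizes, not the architecture of [17].

```python
import torch
import torch.nn as nn

class CnnBiLstmClassifier(nn.Module):
    """Per-frame CNN features -> BiLSTM over time -> gesture class scores."""

    def __init__(self, num_classes=20):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.rnn = nn.LSTM(input_size=32, hidden_size=64,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(128, num_classes)

    def forward(self, clips):                       # clips: (batch, time, 3, H, W)
        b, t = clips.shape[:2]
        frames = clips.flatten(0, 1)                # (batch*time, 3, H, W)
        feats = self.cnn(frames).flatten(1)         # (batch*time, 32)
        feats = feats.view(b, t, -1)                # (batch, time, 32)
        out, _ = self.rnn(feats)
        return self.fc(out[:, -1])                  # classify from the last time step

model = CnnBiLstmClassifier()
logits = model(torch.randn(2, 16, 3, 64, 64))        # 2 clips of 16 frames each
print(logits.shape)                                   # torch.Size([2, 20])
```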
In [18], the objective of the vision-and-language field known as text-to-image synthesis is to learn multimodal representations between the attributes of the picture and the text. It is thus necessary to comprehend how the various elements in the given text relate to one another and to create convincing graphics from that understanding. Translation from text to picture is referred to as neural network visual thinking. The algorithm extrapolates the complex relationships between the text's objects using its existing knowledge to produce the final image. A variety of innovative adversarial loss functions are defined, and it is shown which ones improve the reasoning capacity of text-to-image synthesis. Surprisingly, the majority of the models are capable of reasoning for themselves. The superiority of the strategy is shown by quantitative and qualitative comparisons with various methodologies.

In [19], they address the difficulty of learning from diverse data sources. The method is designed to learn a joint text-video embedding and can compensate for missing video modalities during training. To accomplish this, a Mixture-of-Embedding-Experts (MEE) framework is provided that compares text with various video modalities. The model takes into consideration the distinctive contribution of each modality and learns the expert weights in an end-to-end fashion. During training, datasets from both image captioning and video captioning are combined, with images regarded as a particular case of motionless, soundless videos. For instance, even if "banana" only appears in training photos and never in training videos, the technique can nevertheless learn an embedding for "eating banana." The MPII Movie Description and MSR-VTT datasets are used to assess the method on the video retrieval problem. On tasks that involve text-to-video and video-to-text retrieval, the recommended MEE model beats all previously published techniques.
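The core of a mixture-of-embedding-experts model like the one summarised for [19] is a text-video similarity that is a weighted sum of per-modality similarities, with the weights predicted from the text itself. The sketch below shows that computation with placeholder dimensions; it is an illustration of the idea, not the released MEE implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfEmbeddingExperts(nn.Module):
    """Text-conditioned weighting of per-modality (expert) similarities."""

    def __init__(self, text_dim=300, modality_dims=(2048, 1024, 128), embed_dim=256):
        super().__init__()
        self.text_heads = nn.ModuleList(nn.Linear(text_dim, embed_dim) for _ in modality_dims)
        self.video_heads = nn.ModuleList(nn.Linear(d, embed_dim) for d in modality_dims)
        self.gate = nn.Linear(text_dim, len(modality_dims))  # expert weights from text

    def forward(self, text_feat, video_feats):
        # text_feat: (batch, text_dim); video_feats: list of (batch, modality_dim)
        weights = F.softmax(self.gate(text_feat), dim=-1)     # (batch, n_experts)
        sims = []
        for i, v in enumerate(video_feats):
            t_emb = F.normalize(self.text_heads[i](text_feat), dim=-1)
            v_emb = F.normalize(self.video_heads[i](v), dim=-1)
            sims.append((t_emb * v_emb).sum(-1))              # cosine similarity per pair
        sims = torch.stack(sims, dim=-1)                      # (batch, n_experts)
        return (weights * sims).sum(-1)                       # weighted overall similarity

mee = MixtureOfEmbeddingExperts()
text = torch.randn(4, 300)
video = [torch.randn(4, 2048), torch.randn(4, 1024), torch.randn(4, 128)]
print(mee(text, video).shape)     # torch.Size([4])
```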
In [20], a text-to-image deep learning GAN model, which has become one of the most exciting research topics of our period, is used to get around this problem and can create images from the descriptions that go with them. Scene retrieval performance is considerably improved by Query-is-GAN, which is based on the text-to-image GAN, in the upgraded retrieval system. In this novel method, queries for the scene retrieval problem are created from pictures generated by the text-to-image GAN. Additionally, it is shown that, unlike earlier work on text-to-picture GANs, which largely concentrated on producing high-quality images, the produced images, although not aesthetically pleasing, contain visual features appropriate for the query. Scene retrieval from actual video datasets is used to empirically assess the effectiveness of the suggested approach.

V. CONCLUSIONS

Deep learning-based image generation has drawn the attention of researchers across a variety of fields, particularly for conditional data; video creation, in contrast, is still a difficult and under-explored field. Conditional GANs have been studied, and datasets of human actions such as Kinetics, MUG, MSR-VTT, and Celeb that are used for generating videos have been explored further. This model gives higher-resolution video compared to the existing models. The study extends into the field of different GANs and explores the Conditional Generative Adversarial Network (CGAN) in more depth. However, video production remains less researched within deep learning, and thus the field continues to look into video generation and processing.

REFERENCES

[1]. W. Wang, X. Alameda-Pineda, D. Xu, E. Ricci, and N. Sebe, "Learning How to Smile: Expression Video Generation With Conditional Adversarial Recurrent Nets", IEEE Trans. Multimedia, vol. 22, no. 11, pp. 2808-2819, Nov 2020.
[2]. M. Yuan and Y. Peng, "CKD: Cross-Task Knowledge Distillation for Text-to-Image Synthesis", IEEE Trans. Multimedia, vol. 22, no. 8, Aug 2020.
[3]. H. Tan, X. Liu, M. Liu, B. Yin, and X. Li, "KT-GAN: Knowledge-transfer generative adversarial network for text-to-image synthesis", IEEE Trans. Image Process., vol. 30, pp. 1275-1290, 2021.
[4]. Q. Chen, Q. Wu, J. Chen, Q. Wu, A. van den Hengel, and M. Tan, "Scripted Video Generation With a Bottom-Up Generative Adversarial Network", IEEE Trans. Image Process., vol. 29, pp. 7454-7467, 2020.
[5]. J. Dong, Y. Wang, X. Chen, X. Qu, X. Li, Y. He, and X. Wang, "Reading-Strategy Inspired Visual Representation Learning for Text-to-Video Retrieval", IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 8, pp. 5680-5694, Aug 2022.
[6]. D. Kim, D. Joo and J. Kim, "TiVGAN: Text to Image to Video Generation With Step-by-Step Evolutionary Generator", IEEE, vol. 8, pp. 153113-153122, 2020.
[7]. R. Deshpande, CH. Renu Madhavi and M. Ram Bhatt, "3D Image Generation From Single Image Using Color Filtered Aperture and 2.1D Sketch: A Computational 3D Imaging System and Qualitative Analysis", IEEE, vol. 9, pp. 93580-93592, Jul 2021.
[8]. G. Zhu, Y. Ding and L. Zhao, "A Document Image Generation Scheme Based on Face Swapping and Distortion Generation", IEEE, vol. 10, pp. 78827-78837, 2022.
[9]. M. Zeeshan Khani, S. Jabeen, M. Usman Ghani Khan, T. Saba, A. Rehmat, A. Rehman and Usman Tariq, "A Realistic Image Generation of Face From Text Description Using the Fully Trained Generative Adversarial Networks", IEEE, vol. 9, pp. 1250-1260, Aug 2020.
[10]. M. Yuan and Y. Peng, "Bridge-GAN: Interpretable representation learning for text-to-image synthesis", IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 11, pp. 4258-4268, Nov 2020.
[11]. L. Chai, Y. Liu, W. Liu, G. Han, and S. He, "CrowdGAN: Identity-Free Interactive Crowd Video Generation and Beyond", IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 6, Jun 2022.
[12]. A. Ali Chandio, MD. Asikuzzaman, R. Pickering and M. Leghari, "Cursive Text Recognition in Natural Scene Images Using Deep Convolutional Recurrent Neural Network", IEEE, vol. 29, 2020.
[13]. X. Tu, Y. Zou, J. Zhao, W. Ai, J. Dong, Y. Yao, Z. Wang, G. Guo, Z. Li, W. Liu and J. Feng, "Image-to-Video Generation via 3D Facial Dynamics", IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 4, Apr 2022.
[14]. J. Cheng, F. Wu, Y. Tian, L. Wang, and D. Tao, "RiFeGAN: Rich feature generation for text-to-image synthesis from prior knowledge", IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 8, pp. 5187-5200, Aug 2022.
[15]. T. Hinz, S. Heinrich, and S. Wermter, "Semantic object accuracy for generative text-to-image synthesis", IEEE Trans. Pattern Anal. Mach. Intell., early access, Sep. 2, 2020, doi: 10.1109/TPAMI.2020.3021209.
[16]. U. Osahor and M. Nasrabadi, "Text-Guided Sketch-to-Photo Image Synthesis", IEEE, vol. 10, pp. 98278-98289, Nov 2022.
[17]. B. Natarajan, E. Rajalakshmi, R. Elakkiya, Ketan Kotecha, Ajith Abraham, Lubna Abdelkareim Gabralla and V. Subramaniyaswamy, "Development of an End-to-End Deep Learning Framework for Sign Language Recognition, Translation, and Video Generation", IEEE, vol. 10, pp. 104358-104374, Sep 2022.
[18]. H. Lee, G. Kim, Y. Hur and H. Seok Lim, "Visual Thinking of Neural Networks: Interactive Text to Image Synthesis", IEEE, vol. 9, pp. 64510-64523, Apr 2021.
[19]. A. Miech, I. Laptev, and J. Sivic, "Learning a text-video embedding from incomplete and heterogeneous data", 2018, arXiv:1804.02516.
[20]. R. Yanagi, R. Togo, T. Ogawa and M. Haseyama, "Query is GAN: Scene Retrieval With Attentional Text-to-Image Generative Adversarial Network", IEEE, vol. 7, pp. 153183-153193, Oct 2019.