Neural Text-To-Speech Synthesis Literature Review

Ahmed Tammaa
Artificial Intelligence Major
The British University in Egypt
Cairo, Egypt
[email protected]

Ahmed Ramy
Artificial Intelligence Major
The British University in Egypt
Cairo, Egypt
[email protected]

Abstract—This paper presents a literature review of neural TTS synthesis by comparing different vocoders. Our results show that High Fidelity GAN (HiFi-GAN) currently outperforms its main competitors, which are MelGAN among the non-autoregressive vocoders, the flow-based WaveGlow, and the autoregressive WaveNet, as HiFi-GAN reaches a Mean Opinion Score (MOS) of 4.36 against a ground-truth score of 4.45. The closest-performing network to HiFi-GAN is WaveNet, with an MOS of 4.02.

Keywords—text-to-speech, accessibility, Tacotron2, Generative Adversarial Network, GAN, Word Embedding, CBOW, Skip-gram, TTS, Natural Language Processing, NLP, Gensim, Deep Learning

I. INTRODUCTION
There are 32.4 million blind people worldwide and another 191 million people who are visually impaired due to cataracts alone [1]. Elderly people also struggle when reading screens. Visual impairment and blindness are among the most challenging accessibility domains for software developers. Text-to-Speech (TTS) is one of the solutions that can help make computers accessible to everyone. However, synthesized TTS often sounds unnatural and inconvenient to users, and in some languages, such as Arabic, the pronunciation is often incorrect. Generative Adversarial Networks can naturalize and improve the output, notably through the High Fidelity GAN (HiFi-GAN) [2]. This paper provides a literature review of text-to-speech synthesis, a comparative study of the Skip-Gram and Continuous Bag of Words (CBOW) word embeddings, and a discussion of Nvidia's Tacotron2 and HiFi-GAN for TTS tasks.

II. LITERATURE REVIEW

A. Vocoders
A neural TTS model does not produce synthesized speech directly. Instead, it outputs acoustic features. For instance, Nvidia's Tacotron2 outputs a mel spectrogram that needs to be converted into a raw waveform before it can be heard; this conversion is the job of the vocoder. One of the popular vocoders is WaveNet, an autoregressive model [3]. Autoregressive models predict future samples based on past samples. The WaveNet architecture consists of dilated convolutions and gated activation units (GLU). A dilated convolution enlarges the receptive field of the kernel by inserting holes between neighboring elements. The gated activation works by splitting the layer output into two halves: the first half flows through normally with its weights and biases, while the second half goes through a sigmoid activation function. The final output is the element-wise multiplication of the two halves, as shown in formula one. Lastly, WaveNet is built from residual blocks, which allow the output of layer i to reach the deeper layer i + m; together with skip channels, this mitigates the vanishing-gradient problem of deep convolutional networks. The architecture is shown in Figure 1.

GLU(x_i·w_i + b_i) = (x_i·w_i + b_i) ⊗ σ(x_i·w_i + b_i)    (1)

Figure 1 WaveNet Architecture
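To make the dilated convolution and the gating of formula one concrete, the sketch below implements one WaveNet-style residual block in PyTorch. It is only an illustration of the mechanism described above, not the authors' or DeepMind's implementation; the channel count, kernel size, and the separate residual and skip projections are assumptions.

```python
import torch
import torch.nn as nn

class GatedResidualBlock(nn.Module):
    """Illustrative WaveNet-style block: dilated convolution, gated activation,
    and residual plus skip outputs."""

    def __init__(self, channels: int, dilation: int, kernel_size: int = 3):
        super().__init__()
        pad = (kernel_size - 1) * dilation // 2
        # A dilated convolution inserts (dilation - 1) holes between kernel taps,
        # enlarging the receptive field without adding parameters.
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size,
                              dilation=dilation, padding=pad)
        self.res = nn.Conv1d(channels, channels, 1)   # projection for the residual path
        self.skip = nn.Conv1d(channels, channels, 1)  # projection for the skip channels

    def forward(self, x):
        # Split the doubled output into two halves (formula 1): one half passes
        # through unchanged, the other is squashed by a sigmoid "gate", and the
        # two are multiplied element-wise. (WaveNet itself additionally applies
        # a tanh to the first half.)
        first, gate = torch.chunk(self.conv(x), 2, dim=1)
        gated = first * torch.sigmoid(gate)
        return x + self.res(gated), self.skip(gated)  # residual output, skip output

# Smoke test on a random 16-channel signal of 1,000 time steps.
block = GatedResidualBlock(channels=16, dilation=4)
out, skip = block(torch.randn(1, 16, 1000))
print(out.shape, skip.shape)  # both torch.Size([1, 16, 1000])
```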
On the contrary, there are also non-autoregressive models, such as HiFi-GAN, which is discussed in the next subsection.

B. HiFi-GAN
HiFi-GAN, or High Fidelity GAN, is a generative adversarial network that is able to generate high-quality speech with high computational efficiency. This model achieved a higher Mean Opinion Score (MOS) than other speech synthesis models such as WaveNet, WaveGlow, and MelGAN. The architecture of the model consists of one generator and two discriminators, which are trained in parallel with two losses to improve the model performance and the generated speech [2].

Figure 2 HiFi-GAN architecture

• Generator
The input of the generator is a mel spectrogram. As shown in Figure 2, the generator uses transposed convolutions (ConvTranspose) to upsample the mel spectrogram until its length matches the length of the raw waveform.
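The sketch below shows how a stack of ConvTranspose1d layers can perform that upsampling: each layer multiplies the time axis by its stride, so strides of 8, 8, 2, and 2 turn one spectrogram frame into 256 waveform samples (a common hop length). The channel sizes, kernel sizes, and activations are assumptions for illustration; the real HiFi-GAN generator additionally interleaves multi-receptive-field residual blocks between the upsampling layers.

```python
import torch
import torch.nn as nn

class UpsamplerSketch(nn.Module):
    """Mel spectrogram (batch, 80, frames) -> waveform (batch, 1, frames * 256)."""

    def __init__(self, mel_channels=80, base_channels=128, rates=(8, 8, 2, 2)):
        super().__init__()
        layers, ch = [nn.Conv1d(mel_channels, base_channels, 7, padding=3)], base_channels
        for r in rates:
            # Each transposed convolution stretches the time axis by a factor of r.
            layers += [nn.LeakyReLU(0.1),
                       nn.ConvTranspose1d(ch, ch // 2, kernel_size=2 * r,
                                          stride=r, padding=r // 2)]
            ch //= 2
        layers += [nn.LeakyReLU(0.1), nn.Conv1d(ch, 1, 7, padding=3), nn.Tanh()]
        self.net = nn.Sequential(*layers)

    def forward(self, mel):
        return self.net(mel)

# 100 mel frames are upsampled by 8 * 8 * 2 * 2 = 256 into 25,600 samples.
wav = UpsamplerSketch()(torch.randn(1, 80, 100))
print(wav.shape)  # torch.Size([1, 1, 25600])
```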



• Discriminator
Since audio clips consist of sinusoidal signals with various periods, these periods should be identified in order to generate realistic speech. To achieve this, the HiFi-GAN paper proposes the multi-period discriminator (MPD), which consists of small sub-discriminators, each of which looks at one periodic part of the raw waveform [2]. The MPD model was compared with the multi-scale discriminator (MSD) that was introduced in MelGAN.
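Each MPD sub-discriminator only sees samples that are a fixed period apart. A minimal sketch of that idea, assuming a single period p and reflection padding (in practice the sub-discriminators then run strided 2D convolutions over the reshaped signal):

```python
import torch
import torch.nn.functional as F

def to_period_grid(wav: torch.Tensor, period: int) -> torch.Tensor:
    """Reshape a waveform (batch, 1, T) into a grid (batch, 1, T // period, period)
    so that one axis of a 2D convolution steps through samples exactly `period`
    positions apart in the original signal."""
    b, c, t = wav.shape
    if t % period != 0:
        # Pad on the right so the length becomes a multiple of the period.
        pad = period - (t % period)
        wav = F.pad(wav, (0, pad), mode="reflect")
        t += pad
    return wav.view(b, c, t // period, period)

# A one-second clip at 22,050 Hz viewed with period 3.
grid = to_period_grid(torch.randn(1, 1, 22050), period=3)
print(grid.shape)  # torch.Size([1, 1, 7350, 3])
```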

• Training Loss
The generator and the discriminator each have their own loss. The generator uses its loss to improve the generated audio sample based on the feedback of the discriminator. The discriminator aims to maximize its ability to classify the output of the generator as fake, while the generator aims to minimize the adversarial loss in order to deceive the discriminator into classifying its output as real. The loss functions of the discriminator and the generator are shown in formulas two and three, where x denotes the audio clip and s denotes the input mel spectrogram of that audio clip.

ℒ_Adv(D; G) = 𝔼_(x,s)[(D(x) − 1)² + (D(G(s)))²]    (2)

ℒ_Adv(G; D) = 𝔼_s[(D(G(s)) − 1)²]    (3)
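In code, the least-squares adversarial objectives of formulas two and three reduce to a few lines; the sketch below approximates the expectations with batch means, and the function and variable names are hypothetical.

```python
import torch

def discriminator_adv_loss(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    # Formula (2): push D(x) toward 1 for real clips and D(G(s)) toward 0 for generated ones.
    return torch.mean((d_real - 1.0) ** 2) + torch.mean(d_fake ** 2)

def generator_adv_loss(d_fake: torch.Tensor) -> torch.Tensor:
    # Formula (3): the generator tries to make D(G(s)) look real, i.e. close to 1.
    return torch.mean((d_fake - 1.0) ** 2)

# Dummy discriminator outputs for a batch of four clips.
d_real, d_fake = torch.rand(4), torch.rand(4)
print(discriminator_adv_loss(d_real, d_fake).item(), generator_adv_loss(d_fake).item())
```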
• Mel Spectrogram Loss
The mel spectrogram loss helps the generator produce a realistic waveform that corresponds to the input. The loss is the distance between the mel spectrogram of the generated waveform and that of the ground-truth waveform. The mel spectrogram loss is shown below, where ϕ is a function that converts a waveform into its mel spectrogram.

ℒ_Mel(G) = 𝔼_(x,s)[‖ϕ(x) − ϕ(G(s))‖₁]

• Feature Matching Loss
The feature matching loss is used as an additional loss for the generator. It is the distance between the ground-truth audio and a generated sample in each feature space of the discriminator. The feature matching loss is shown in formula four, where T denotes the number of layers in the discriminator, D_i denotes the features of the i-th layer, and N_i denotes the number of features in the i-th layer. The final losses are given in formula five.

ℒ_FM(G; D) = 𝔼_(x,s)[ Σ_{i=1..T} (1/N_i) ‖D_i(x) − D_i(G(s))‖₁ ]    (4)

ℒ_G = ℒ_Adv(G; D) + λ_fm ℒ_FM(G; D) + λ_mel ℒ_Mel(G)
ℒ_D = ℒ_Adv(D; G)    (5)

Since the model has sub-discriminators in both the MPD and the MSD, formulas four and five are rewritten with respect to the sub-discriminators, where D_k denotes the k-th sub-discriminator.

ℒ_G = Σ_k [ ℒ_Adv(G; D_k) + λ_fm ℒ_FM(G; D_k) + λ_mel ℒ_Mel(G) ]    (6)

ℒ_D = Σ_k ℒ_Adv(D_k; G)    (7)
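A compact sketch of how formulas four through six might be assembled for one training step. The per-layer feature lists, the mel transform, and the weights λ_fm = 2 and λ_mel = 45 (the values reported in [2]) are supplied from outside; everything else here is an illustrative assumption rather than the paper's implementation.

```python
import torch

def feature_matching_loss(feats_real, feats_fake):
    # Formula (4): mean L1 distance between the discriminator features of the
    # real and the generated audio, accumulated over every layer.
    loss = torch.tensor(0.0)
    for fr, ff in zip(feats_real, feats_fake):
        loss = loss + torch.mean(torch.abs(fr - ff))
    return loss

def generator_total_loss(adv_losses, fm_losses, mel_real, mel_fake,
                         lambda_fm=2.0, lambda_mel=45.0):
    # Formula (6): adversarial and feature-matching terms summed over the
    # sub-discriminators, plus the weighted mel-spectrogram L1 term.
    mel_loss = torch.mean(torch.abs(mel_real - mel_fake))
    return sum(adv_losses) + lambda_fm * sum(fm_losses) + lambda_mel * mel_loss
```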
Results of the HiFi-GAN
The HiFi-GAN model was trained on two datasets, the LJSpeech dataset and the VCTK multi-speaker dataset. The LJSpeech dataset consists of 13,100 audio clips from a single speaker. The Mean Opinion Score (MOS) was used for evaluating the performance; fifty audio clips were selected randomly from the LJSpeech dataset. As shown in Table 1, three variants of HiFi-GAN were evaluated, each trained with different hyperparameters. The three variants of the HiFi-GAN model outperformed all other models.

Model           MOS            Speed on CPU (kHz)   Speed on GPU (kHz)    # Param (M)
Ground Truth    4.45 (±0.06)   -                    -                     -
WaveNet (MoL)   4.02 (±0.08)   -                    0.07 (×0.003)         24.73
WaveGlow        3.81 (±0.08)   4.72 (×0.21)         501 (×22.75)          87.73
MelGAN          3.79 (±0.09)   145.52               14,238 (×645.73)      4.26
HiFi-GAN V1     4.36 (±0.07)   31.74 (×1.43)        3,701 (×167.86)       13.92
HiFi-GAN V2     4.23 (±0.07)   214.97 (×9.74)       16,863 (×764.80)      0.92
HiFi-GAN V3     4.05 (±0.08)   296.38 (×13.44)      26,169 (×1,186.80)    1.46
Table 1. Comparison of MOS and synthesis speed between models

C. Tacotron2
This paper introduces a neural network architecture called
Tacotron2, developed by Nvidia. The model is used to
synthesize speech from text by generating a mel spectrogram from the input text with an encoder-decoder architecture; this is done by mapping character embeddings to mel-scale spectrograms. Then, as shown in Figure 3, a WaveNet model uses the mel spectrogram to synthesize time-domain waveforms; in other words, it transforms the mel spectrogram into speech audio. Tacotron2 achieved a Mean Opinion Score (MOS) of 4.53, compared to 4.58 for professionally recorded speech. The aim of Tacotron2 is to synthesize high-quality speech that cannot be easily distinguished from human speech [4].

Figure 3 TacoTron2 Architecture

As shown in Figure 3, a learned 512-dimensional character embedding is used to represent the input text; it is passed to 3 convolution layers, where each layer contains 512 filters. In addition, batch normalization is used, followed by ReLU activations. After the 3 convolution layers, the output is passed through a bidirectional LSTM (Long Short-Term Memory) layer containing 512 units (256 in each direction) to generate the encoded features. A bidirectional LSTM can be described as duplicating the recurrent layer in reverse, so the input can be read in both the forward and the backward directions, which increases the amount of information available to the network.

The decoder is an autoregressive recurrent neural network. First, the prediction from the previous time step is passed through 2 fully connected layers with 256 hidden ReLU units, defined as the pre-net. The output of the pre-net and the attention context are concatenated and passed to 2 LSTM layers. Finally, the concatenation of the output of the 2 LSTM layers and the attention context is passed through a linear transform to predict the mel spectrogram [4].
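The encoder path just described (character embedding, three convolutional layers with batch normalization and ReLU, and one bidirectional LSTM) can be condensed into the following PyTorch sketch. The vocabulary size, kernel size, and batch-first tensor layout are assumptions; this mirrors the description above rather than Nvidia's released implementation.

```python
import torch
import torch.nn as nn

class Tacotron2EncoderSketch(nn.Module):
    def __init__(self, vocab_size=100, emb_dim=512, num_convs=3, kernel_size=5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)   # learned 512-dim character embedding
        convs = []
        for _ in range(num_convs):                            # 3 conv layers, 512 filters each,
            convs += [nn.Conv1d(emb_dim, emb_dim, kernel_size, padding=kernel_size // 2),
                      nn.BatchNorm1d(emb_dim),                # batch norm followed by ReLU
                      nn.ReLU()]
        self.convs = nn.Sequential(*convs)
        # Bidirectional LSTM: 256 units per direction -> 512-dim encoded features.
        self.lstm = nn.LSTM(emb_dim, emb_dim // 2, batch_first=True, bidirectional=True)

    def forward(self, char_ids):                  # char_ids: (batch, text_len)
        x = self.embedding(char_ids)              # (batch, text_len, 512)
        x = self.convs(x.transpose(1, 2))         # Conv1d expects (batch, channels, time)
        encoded, _ = self.lstm(x.transpose(1, 2))
        return encoded                            # (batch, text_len, 512)

enc = Tacotron2EncoderSketch()
print(enc(torch.randint(0, 100, (2, 40))).shape)  # torch.Size([2, 40, 512])
```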
III. WORD EMBEDDING
Word embedding solves the problem of creating an efficient, learnable representation of the relationships between words. It converts plain text into n-dimensional vectors. This step makes the words numerical, so mathematical relationships such as similarity can be defined between them and measured by a distance. One commonly used distance is the Euclidean distance, shown in formula eight.

d(x, y) = √( Σ_i (x_i − y_i)² )    (8)

However, the Euclidean distance is very sensitive to magnitude. For example, two documents can be similar, but if their sizes differ greatly, that dimension will add a considerable cost to the distance. Hence, cosine similarity is a good solution, since it calculates the cosine of the angle between the two vectors; the angle gives a more accurate representation of the similarity. The cosine similarity is defined in formula nine.

Similarity(A, B) = (A · B) / (‖A‖ ‖B‖)    (9)
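A small NumPy sketch of formulas eight and nine; the two example vectors are arbitrary and chosen so that the contrast between the two measures is visible.

```python
import numpy as np

def euclidean_distance(x: np.ndarray, y: np.ndarray) -> float:
    # Formula (8): square root of the summed squared coordinate differences.
    return float(np.sqrt(np.sum((x - y) ** 2)))

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Formula (9): dot product normalized by the vector magnitudes.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a, b = np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0])
print(euclidean_distance(a, b))  # 3.74... : grows with the difference in magnitude
print(cosine_similarity(a, b))   # 1.0     : same direction, maximal similarity
```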
A. Continuous Bag of Words
CBOW is a model that predicts a word based on a given context. It predicts the probability of the center word of the input window from the surrounding words. The architecture is simple, consisting of only three layers (only one of them hidden). Thus, it learns fast, as there are fewer trainable parameters compared with skip-gram. CBOW is used whenever the context is available and a word is missing [5]. The architecture of CBOW is visualized in Figure 5.

Figure 5 CBOW Architecture

B. Skip-Gram
Like CBOW, skip-gram is a word-to-vector model. The difference is that it inverts the operation: it predicts a context given a word [6]. Intuitively, its architecture is the same as CBOW but mirrored, as shown in Figure 4. In summary, CBOW is faster and extracts a single word from a given context, while skip-gram is slower and extracts a context from a single given word.

Figure 4 Skip-gram and CBOW architecture side-by-side
IV. IMPLEMENTATION DETAILS
In phase 1, only the word embedding is implemented. The implementation is done in Python 3 supplied with utility libraries. The implementation is available online at this link: Ahmed181532 and Ahmed181532.ipynb - Colaboratory (google.com).

A. Data Cleaning and Preprocessing
The data cleaning consists of converting all letters to lower case and removing the stop words and punctuation from the string using Gensim, NLTK, and the string library. After that, the words are stemmed, i.e., reduced to their roots. Finally, the data is cleaned and tokenized and is ready to be fed into the model.
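A sketch of this preprocessing pipeline using NLTK and Python's string module. The stemmer choice (Porter) and the exact order of steps are assumptions; the authors' notebook may differ.

```python
import string

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# nltk.download("punkt") and nltk.download("stopwords") are needed once beforehand.
STOP_WORDS = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def preprocess(text: str) -> list:
    text = text.lower()                                               # lower-case everything
    text = text.translate(str.maketrans("", "", string.punctuation))  # strip punctuation
    tokens = word_tokenize(text)                                      # tokenize
    return [STEMMER.stem(t) for t in tokens if t not in STOP_WORDS]   # drop stop words, stem

print(preprocess("Printing, in the only sense with which we are at present concerned."))
# ['print', 'sens', 'present', 'concern']
```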
B. Gensim Library
The Gensim library offers very helpful tools that facilitate word embedding. We used this library to train both CBOW and Skip-Gram; the trained vectors are exposed through its "wv" (word vectors) interface. The function used for training is "Word2Vec", and the main parameter it takes is the data. The other hyperparameters are the window size, the min count (words with a lower frequency are ignored), the vector size (the size of the embedding vectors), the number of epochs, and the number of workers for parallel processing. Finally, the last parameter is "sg"; by default it is 0, i.e., a CBOW model is trained, otherwise a skip-gram model is trained. Besides, we used most_similar, a function that returns words similar to the input, and the cosine similarity function, which gives a similarity index between two given words [7].
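A minimal Gensim sketch of the training call and the query functions mentioned above, assuming Gensim 4.x (where the embedding size is named vector_size) and a toy corpus of already cleaned, tokenized sentences:

```python
from gensim.models import Word2Vec

# Toy corpus; in the real pipeline each entry is a preprocessed transcript line.
sentences = [
    ["print", "book", "letter", "type"],
    ["speech", "voic", "sound", "audio"],
    ["book", "letter", "print", "page"],
]

model = Word2Vec(
    sentences,
    vector_size=100,   # dimensionality of the embedding vectors
    window=5,          # context window size
    min_count=1,       # ignore words rarer than this
    workers=4,         # parallel worker threads
    epochs=50,
    sg=0,              # 0 trains CBOW (the default); 1 would train skip-gram
)

print(model.wv.most_similar("print", topn=3))  # words most similar to "print"
print(model.wv.similarity("book", "letter"))   # cosine similarity between two words
```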
V. DATASET
The dataset used for the word embedding is the LJSpeech 1.1 dataset, which contains audio files with their respective transcripts. It has a total of 13,100 short audio clips spoken by the same person, with a combined duration of about 24 hours. Each short passage was extracted from non-fiction books. Our word embedding results on this dataset were unsatisfactory: the corpus is small and the available words are largely unrelated to each other. Thus, neither model, Skip-gram nor CBOW, produced reasonable results in our analogy analysis. We conclude that we need a larger dataset with a bigger corpus to obtain reasonable analogies and word relationships.

VI. CONCLUSION
In conclusion, our literature review shows an advantage for the non-autoregressive vocoders over the autoregressive vocoders, based on comparing the MelGAN and HiFi-GAN architectures with WaveNet and WaveGlow. We conclude that HiFi-GAN outmatches its competitors in the MOS metric, which is based on people's opinions and is therefore a good indicator of naturalness. Tacotron2 is used as a pretrained model that outputs acoustic features, specifically a mel spectrogram, that can be supplied to the vocoder; it can therefore be considered a stepping stone for testing different vocoders. By default, Tacotron2 is connected to a WaveNet vocoder, which, as shown in the literature review section, is outperformed by HiFi-GAN. For further research, we advise changing the Tacotron2 pipeline so that a HiFi-GAN vocoder is used instead of the WaveNet vocoder.

REFERENCES
[1] M. Khairallah et al., "Number of people blind or visually impaired by cataract worldwide and in world regions, 1990 to 2010," Investigative Ophthalmology & Visual Science, vol. 56, no. 11, pp. 6762-6769, 2015.
[2] J. Kong, J. Kim, and J. Bae, "HiFi-GAN: Generative adversarial networks for efficient and high fidelity speech synthesis," Advances in Neural Information Processing Systems, vol. 33, pp. 17022-17033, 2020.
[3] A. van den Oord et al., "WaveNet: A generative model for raw audio," arXiv preprint arXiv:1609.03499, 2016.
[4] J. Shen et al., "Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018. doi: 10.1109/ICASSP.2018.8461368.
[5] B. Liu, "Text sentiment analysis based on CBOW model and deep learning in big data environment," Journal of Ambient Intelligence and Humanized Computing, vol. 11, no. 2, pp. 451-458, 2020.
[6] C. McCormick, "Word2Vec tutorial - The skip-gram model," Apr. 2016. [Online]. Available: http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model
[7] R. Řehůřek and P. Sojka, "Gensim - statistical semantics in Python," retrieved from gensim.org, 2011.
