Unit III

Generative Adversarial Network (GAN)

GAN (Generative Adversarial Network) represents a cutting-edge approach to generative modeling within deep learning, often leveraging architectures like convolutional neural networks. The goal of generative modeling is to autonomously identify patterns in input data, enabling the model to produce new examples that feasibly resemble the original dataset.

Generative Adversarial Networks (GANs) are a powerful class of neural networks used for unsupervised learning. A GAN is made up of two neural networks, a generator and a discriminator, which use adversarial training to produce artificial data that closely resembles actual data.

The Generator attempts to fool the Discriminator, which is tasked with accurately distinguishing between generated and genuine data, by transforming random noise samples into realistic outputs. This competitive interaction drives both networks to improve, and as a result the generator learns to produce realistic, high-quality samples. GANs are proving to be highly versatile artificial intelligence tools, as evidenced by their extensive use in image synthesis, style transfer, and text-to-image synthesis. They have also revolutionized generative modeling.

Generative Adversarial Networks (GANs) can be broken down into three parts:

Generative: To learn a generative model, which describes how data is generated in terms of a
probabilistic model.

Adversarial: The word adversarial refers to setting one thing up against another. In the context of GANs, this means that the generated output is compared with the real images in the dataset. A mechanism known as the discriminator applies a model that attempts to distinguish between real and fake images.

Networks: Deep neural networks are used as the artificial intelligence (AI) algorithms for training.

Architecture of GANs

A Generative Adversarial Network (GAN) is composed of two primary parts, which are the Generator
and the Discriminator.

Generator Model

The generator model is the key element responsible for creating fresh, realistic data in a Generative Adversarial Network (GAN). The generator takes random noise as input and converts it into complex data samples, such as text or images. It is commonly implemented as a deep neural network.

Through training, the layers of learnable parameters in its design capture the training data's underlying distribution. As it trains, the generator uses backpropagation to fine-tune its parameters, adjusting its output to produce samples that closely mimic real data.

The generator’s ability to generate high-quality, varied samples that can fool the discriminator is what
makes it successful.
Generator Loss

The objective of the generator in a GAN is to produce synthetic samples that are realistic enough to fool the discriminator. The generator achieves this by minimizing its loss function $J_G$. The loss is minimized when the log probability is maximized, i.e., when the discriminator is highly likely to classify the generated samples as real.

Discriminator Model

A discriminator model is an artificial neural network used in Generative Adversarial Networks (GANs) to differentiate between generated and actual input. By evaluating input samples and assigning each a probability of authenticity, the discriminator functions as a binary classifier.

Over time, the discriminator learns to differentiate between genuine data from the dataset and artificial
samples created by the generator. This allows it to progressively hone its parameters and increase its level
of proficiency.

When dealing with image data, its architecture usually uses convolutional layers (or structures appropriate to other modalities). The aim of the adversarial training procedure is to maximize the discriminator's capacity to identify generated samples as fake and real samples as authentic. Through its interaction with the generator, the discriminator grows increasingly discriminating, which helps the GAN produce extremely realistic-looking synthetic data overall. The discriminator aims to reduce its loss by accurately identifying artificial and real samples.

Discriminator Loss

The discriminator minimizes the negative log-likelihood of correctly classifying both generated and real samples. This loss incentivizes the discriminator to accurately categorize generated samples as fake and real samples as genuine.

The formulas for the generator and discriminator losses are given below:

$J_G = -\frac{1}{m}\sum_{i=1}^{m} \log D(G(z_i))$

$J_D = -\frac{1}{m}\sum_{i=1}^{m} \left[ \log D(x_i) + \log\big(1 - D(G(z_i))\big) \right]$

● $\log D(G(z_i))$ is the log-probability the discriminator assigns to a generated sample being real. The generator aims to minimize $J_G$, encouraging the production of samples that the discriminator classifies as real ($D(G(z_i))$ close to 1).
● $J_D$ assesses the discriminator's ability to discern between generated and actual samples.

MinMax Loss
In a Generative Adversarial Network (GAN), the minimax loss is given by:

$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$

Where,

● $G$ is the generator network and $D$ is the discriminator network.
● $x$ represents actual data samples drawn from the true data distribution $p_{data}(x)$.
● $z$ represents random noise sampled from a prior distribution $p_z(z)$ (usually a normal or uniform distribution).
● $D(x)$ is the discriminator's estimated probability that actual data is real.
● $D(G(z))$ is the discriminator's estimated probability that generated data is real.
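
To make the objective concrete, here is a minimal NumPy sketch of how the two loss terms could be estimated on a mini-batch. This is a sketch under assumptions: `D` and `G` are assumed to be callables returning probabilities and generated samples respectively, and the generator term uses the commonly used non-saturating form rather than the raw minimax term.

import numpy as np

def gan_losses(D, G, x_real, z, eps=1e-8):
    # Discriminator outputs: probability that a sample is real
    d_real = D(x_real)   # D(x)
    d_fake = D(G(z))     # D(G(z))
    # Discriminator loss: negative log-likelihood of correct classification
    j_d = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    # Generator loss (non-saturating form): push D(G(z)) toward 1
    j_g = -np.mean(np.log(d_fake + eps))
    return j_d, j_g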


Types of GANs

Vanilla GAN: This is the simplest type of GAN. Here, the Generator and the Discriminator are simple multi-layer perceptrons. The algorithm itself is straightforward: it tries to optimize the minimax equation using stochastic gradient descent.

Conditional GAN (CGAN): CGAN can be described as a deep learning method in which some
conditional parameters are put into place. In CGAN, an additional parameter ‘y’ is added to the Generator
for generating the corresponding data. Labels are also put into the input to the Discriminator in order for
the Discriminator to help distinguish the real data from the fake generated data.

Deep Convolutional GAN (DCGAN): DCGAN is one of the most popular and most successful implementations of GAN. It is composed of ConvNets in place of multi-layer perceptrons. The ConvNets are implemented without max pooling, which is replaced by strided convolution, and the layers are not fully connected.

Laplacian Pyramid GAN (LAPGAN): The Laplacian pyramid is a linear invertible image representation consisting of a set of band-pass images, spaced an octave apart, plus a low-frequency residual. This approach uses multiple Generator and Discriminator networks at different levels of the Laplacian pyramid. It is mainly used because it produces very high-quality images: the image is first down-sampled at each layer of the pyramid, then up-scaled again at each layer in a backward pass, where the image acquires some noise from the Conditional GAN at these levels until it reaches its original size.

Super Resolution GAN (SRGAN): SRGAN, as the name suggests, is a way of designing a GAN in which a deep neural network is used along with an adversarial network in order to produce higher-resolution images. This type of GAN is particularly useful in optimally up-scaling native low-resolution images to enhance their details while minimizing errors.


How does a GAN work?

The steps involved in how a GAN works are as follows:

Initialization: Two neural networks are created: a Generator (G) and a Discriminator (D).

G is tasked with creating new data, like images or text, that closely resembles real data.

D acts as a critic, trying to distinguish between real data (from a training dataset) and the data generated
by G.

Generator’s First Move: G takes a random noise vector as input. This noise vector contains random
values and acts as the starting point for G’s creation process. Using its internal layers and learned
patterns, G transforms the noise vector into a new data sample, like a generated image.

Discriminator's Turn: D receives two kinds of inputs: real data samples from the training dataset, and the data samples generated by G in the previous step. D's job is to analyze each input and determine whether it's real data or something G cooked up. It outputs a probability score between 0 and 1. A score of 1 indicates the data is likely real, and 0 suggests it's fake.

The Learning Process: Now the adversarial part comes in. If D correctly identifies real data as real (score close to 1) and generated data as fake (score close to 0), both networks are doing their jobs well. However, the key is continuous improvement: if D consistently identifies everything correctly, it won't learn much, so the goal is for G to eventually trick D.

Generator’s Improvement:

When D mistakenly labels G’s creation as real (score close to 1), it’s a sign that G is on the right track. In
this case, G receives a significant positive update, while D receives a penalty for being fooled. This
feedback helps G improve its generation process to create more realistic data.

Discriminator's Adaptation:
Conversely, if D correctly identifies G's fake data (score close to 0), G receives a penalty and D is further strengthened in its discrimination abilities. This ongoing duel between G and D refines both networks over time. As training progresses, G gets better at generating realistic data, making it harder for D to tell the difference. Ideally, G becomes so adept that D can't reliably distinguish real from fake data. At this point, G is considered well-trained and can be used to generate new, realistic data samples.
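
The alternating procedure above can be condensed into a short Python-style sketch. The helper names here (`sample_real`, `sample_noise`, `update_discriminator`, `update_generator`) are hypothetical stand-ins for a concrete implementation:

for step in range(num_steps):
    # 1) Train D: real samples labeled 1, generated samples labeled 0
    x_real = sample_real(batch_size)
    x_fake = G(sample_noise(batch_size))
    d_loss = update_discriminator(D, x_real, x_fake)
    # 2) Train G through a frozen D: generated samples labeled 1,
    #    so G is rewarded when D is fooled
    z = sample_noise(batch_size)
    g_loss = update_generator(G, D, z)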

Discriminator

A discriminator that tells how real an image is, is basically a deep Convolutional Neural Network (CNN), as shown in Figure 1. For the MNIST dataset, the input is an image (28 pixels x 28 pixels x 1 channel). The sigmoid output is a scalar probability of how real the image is (0.0 is certainly fake, 1.0 is certainly real, anything in between is a gray area). The difference from a typical CNN is the absence of max-pooling between layers; instead, strided convolution is used for downsampling. The activation function in each CNN layer is a leaky ReLU. Dropout between 0.4 and 0.7 between layers prevents overfitting and memorization. Listing 1 shows the implementation in Keras.

Figure 1. The discriminator of the DCGAN tells how real an input image of a digit is. The MNIST dataset is used as ground truth for real images. Strided convolution instead of max-pooling downsamples the image.

# Assumed imports for this snippet (Keras):
from keras.models import Sequential
from keras.layers import Conv2D, Dropout, Flatten, Dense, Activation, LeakyReLU

self.D = Sequential()
depth = 64
dropout = 0.4
# In:  28 x 28 x 1
# Out: 14 x 14 x 64
input_shape = (self.img_rows, self.img_cols, self.channel)
self.D.add(Conv2D(depth*1, 5, strides=2, input_shape=input_shape,
    padding='same', activation=LeakyReLU(alpha=0.2)))
self.D.add(Dropout(dropout))
self.D.add(Conv2D(depth*2, 5, strides=2, padding='same',
    activation=LeakyReLU(alpha=0.2)))
self.D.add(Dropout(dropout))
self.D.add(Conv2D(depth*4, 5, strides=2, padding='same',
    activation=LeakyReLU(alpha=0.2)))
self.D.add(Dropout(dropout))
self.D.add(Conv2D(depth*8, 5, strides=1, padding='same',
    activation=LeakyReLU(alpha=0.2)))
self.D.add(Dropout(dropout))
# Out: 1-dim probability that the input image is real
self.D.add(Flatten())
self.D.add(Dense(1))
self.D.add(Activation('sigmoid'))
self.D.summary()

Listing 1. Discriminator implemented in Keras.

Generator

The generator synthesizes fake images. In Figure 2, the fake image is generated from 100-dimensional noise (uniform distribution between -1.0 and 1.0) using the inverse of convolution, called transposed convolution. Instead of the fractionally-strided convolution suggested in DCGAN, upsampling between the first three layers is used, since it synthesizes more realistic handwriting images. Between layers, batch normalization stabilizes learning. The activation function after each layer is a ReLU. The sigmoid output at the last layer produces the fake image. Dropout of between 0.3 and 0.5 at the first layer prevents overfitting. Listing 2 shows the implementation in Keras.

Figure 2. The generator model synthesizes fake MNIST images from noise. Upsampling is used instead of fractionally-strided transposed convolution.

# Assumed imports for this snippet (Keras):
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, Activation, Reshape, Dropout
from keras.layers import UpSampling2D, Conv2DTranspose

self.G = Sequential()
dropout = 0.4
depth = 64+64+64+64   # 256
dim = 7
# In:  100-dimensional noise vector
# Out: dim x dim x depth
self.G.add(Dense(dim*dim*depth, input_dim=100))
self.G.add(BatchNormalization(momentum=0.9))
self.G.add(Activation('relu'))
self.G.add(Reshape((dim, dim, depth)))
self.G.add(Dropout(dropout))
# In:  dim x dim x depth
# Out: 2*dim x 2*dim x depth/2
self.G.add(UpSampling2D())
self.G.add(Conv2DTranspose(int(depth/2), 5, padding='same'))
self.G.add(BatchNormalization(momentum=0.9))
self.G.add(Activation('relu'))
self.G.add(UpSampling2D())
self.G.add(Conv2DTranspose(int(depth/4), 5, padding='same'))
self.G.add(BatchNormalization(momentum=0.9))
self.G.add(Activation('relu'))
self.G.add(Conv2DTranspose(int(depth/8), 5, padding='same'))
self.G.add(BatchNormalization(momentum=0.9))
self.G.add(Activation('relu'))
# Out: 28 x 28 x 1 grayscale image, [0.0, 1.0] per pixel
self.G.add(Conv2DTranspose(1, 5, padding='same'))
self.G.add(Activation('sigmoid'))
self.G.summary()
return self.G   # this snippet is the body of a generator() method

Listing 2. Generator implemented in Keras.

GAN Model

So far, we have only defined the networks; it is time to build the models for training. We need two models: 1) the Discriminator Model (the police) and 2) the Adversarial Model, or Generator-Discriminator (the counterfeiter learning from the police).

Discriminator Model

Listing 3 shows the Keras code for the Discriminator Model. It is the Discriminator described above with a loss function defined for training. Since the output of the Discriminator is a sigmoid, we use binary cross-entropy for the loss. RMSprop as the optimizer generates more realistic fake images than Adam in this case. The learning rate is 0.0008. Weight decay and clip value stabilize learning during the latter part of training. You have to adjust the decay if you adjust the learning rate.

# Assumed import: from keras.optimizers import RMSprop (older Keras API with lr/decay)
optimizer = RMSprop(lr=0.0008, clipvalue=1.0, decay=6e-8)
self.DM = Sequential()
self.DM.add(self.discriminator())
self.DM.compile(loss='binary_crossentropy', optimizer=optimizer,
    metrics=['accuracy'])

Listing 3. Discriminator Model implemented in Keras.

Adversarial Model

The adversarial model is just the generator and discriminator stacked together, as shown in Figure 3. The Generator part is trying to fool the Discriminator while learning from its feedback at the same time. Listing 4 shows the implementation in Keras. The training parameters are the same as in the Discriminator Model except for a reduced learning rate and a correspondingly reduced weight decay.
Figure 3. The Adversarial Model is simply the generator with its output connected to the input of the discriminator. Also shown is the training process, wherein the Generator labels its fake image output with 1.0, trying to fool the Discriminator.

optimizer = RMSprop(lr=0.0004, clipvalue=1.0, decay=3e-8)
self.AM = Sequential()
self.AM.add(self.generator())
self.AM.add(self.discriminator())
self.AM.compile(loss='binary_crossentropy', optimizer=optimizer,
    metrics=['accuracy'])

Listing 4. Adversarial Model as shown in Figure 3 implemented in Keras.

Training

Training is the hardest part. We first verify that the Discriminator Model is learning correctly by training it alone on real and fake images. Afterwards, the Discriminator and Adversarial Models are trained one after the other. Figure 4 shows the Discriminator Model and Figure 3 the Adversarial Model during training. Listing 5 shows the training code in Keras.
Figure 4. Discriminator model is trained to distinguish real from fake handwritten images.

# One training step (assumes: import numpy as np):
# first train D on a labeled real/fake batch, then train G through the stacked model.
images_train = self.x_train[np.random.randint(0,
    self.x_train.shape[0], size=batch_size), :, :, :]
noise = np.random.uniform(-1.0, 1.0, size=[batch_size, 100])
images_fake = self.generator.predict(noise)
x = np.concatenate((images_train, images_fake))
y = np.ones([2*batch_size, 1])
y[batch_size:, :] = 0                      # real samples labeled 1, fake labeled 0
d_loss = self.discriminator.train_on_batch(x, y)
y = np.ones([batch_size, 1])               # fake samples labeled 1 to fool D
noise = np.random.uniform(-1.0, 1.0, size=[batch_size, 100])
a_loss = self.adversarial.train_on_batch(noise, y)

Listing 5. Sequential training of the Discriminator Model and Adversarial Model. More than about 1,000 training steps generate respectable outputs.

Sample Output

What are Optimizers in Deep Learning?

In deep learning, optimizers are crucial algorithms that dynamically fine-tune a model's parameters throughout the training process, aiming to minimize a predefined loss function. These specialized algorithms facilitate the learning process of neural networks by iteratively refining the weights and biases based on the feedback received from the data. Well-known optimizers in deep learning include Stochastic Gradient Descent (SGD), Adam, and RMSprop, each equipped with distinct update rules, learning rates, and momentum strategies, all geared towards the overarching goal of converging upon optimal model parameters and thereby enhancing overall performance.

Choosing the Right Optimizer

Optimizer algorithms are optimization methods that help improve a deep learning model's performance. These optimization algorithms, or optimizers, widely affect the accuracy and training speed of the deep learning model. But first of all, the question arises: what is an optimizer?
While training a deep learning model, an optimizer modifies the weights at each epoch and minimizes the loss function. An optimizer is a function or algorithm that adjusts the attributes of the neural network, such as its weights and learning rate. Thus, it helps in reducing the overall loss and improving accuracy. Choosing the right weights for the model is a daunting task, as a deep learning model generally consists of millions of parameters. This raises the need to choose a suitable optimization algorithm for your application. Hence, understanding these algorithms is necessary for data scientists before diving deep into the field.

You can use different optimizers in a machine learning model to change the weights and learning rate. However, the best optimizer depends upon the application. When dealing with hundreds of gigabytes of data, even a single epoch can take considerable time, so randomly choosing an algorithm amounts to gambling with your time, as you will realize sooner or later.

There are various deep-learning optimizers, such as Gradient Descent, Stochastic Gradient Descent,
Stochastic Gradient descent with momentum, Mini-Batch Gradient Descent, Adagrad, RMSProp,
AdaDelta, and Adam.

Important Deep Learning Terms

Before proceeding, there are a few terms that you should be familiar with.

• Epoch – The number of times the algorithm runs over the whole training dataset.

• Sample – A single row of a dataset.

• Batch – The number of samples used for one update of the model parameters.

• Learning rate – A parameter that controls how much the model weights are updated at each step.

• Cost Function/Loss Function – A function used to calculate the cost, which is the difference between the predicted value and the actual value.

• Weights/Bias – The learnable parameters in a model that control the signal between two neurons.

Gradient Descent Deep Learning Optimizer

Gradient Descent can be considered the popular kid among the class of optimizers in deep learning. This optimization algorithm uses calculus to iteratively modify parameter values and approach a local minimum. Before moving ahead, you might question what a gradient is.

In simple terms, imagine you are holding a ball resting at the top of a bowl. When you release the ball, it rolls along the steepest direction and eventually settles at the bottom of the bowl. The gradient points the ball in the steepest direction toward the local minimum, which is the bottom of the bowl.
The gradient descent update is

$w_{t+1} = w_t - \alpha \, \nabla J(w_t)$

Here alpha is the step size that determines how far to move against the gradient at each iteration.

Gradient descent works as follows:

1. Initialize Coefficients: Start with initial coefficients.

2. Evaluate Cost: Calculate the cost associated with these coefficients.

3. Search for Lower Cost: Look for a cost value lower than the current one.

4. Update Coefficients: Move towards the lower cost by updating the coefficients’ values.

5. Repeat Process: Continue this process iteratively.

6. Reach Local Minimum: Stop when a local minimum is reached, where further cost reduction is
not possible.

Gradient descent works well for most purposes. However, it has some downsides too: computing the gradients is expensive when the dataset is huge, and while gradient descent works well for convex functions, it has no way of knowing how far to travel along the gradient for nonconvex functions.
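
As a concrete illustration, here is a minimal NumPy sketch of batch gradient descent. The `grad` argument is an assumed stand-in for the gradient of the cost with respect to the weights:

import numpy as np

def gradient_descent(grad, w0, alpha=0.1, n_iters=100):
    # Repeatedly apply w <- w - alpha * grad(w)
    w = np.asarray(w0, dtype=float)
    for _ in range(n_iters):
        w = w - alpha * grad(w)  # step against the gradient
    return w

# Example: minimize f(w) = ||w||^2, whose gradient is 2w
w_min = gradient_descent(lambda w: 2 * w, w0=[3.0, -4.0])
print(w_min)  # close to [0, 0]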

Stochastic Gradient Descent Deep Learning Optimizer

At the end of the previous section, you learned why there might be better options than using gradient
descent on massive data. To tackle the challenges large datasets pose, we have stochastic gradient
descent, a popular approach among optimizers in deep learning. The term stochastic denotes the element
of randomness upon which the algorithm relies. In stochastic gradient descent, instead of processing the
entire dataset during each iteration, we randomly select batches of data. This implies that only a few
samples from the dataset are considered at a time, allowing for more efficient and computationally
feasible optimization in deep learning models.

The procedure first selects the initial parameters $w$ and learning rate $\eta$. The update for a single sample is $w \leftarrow w - \eta \, \nabla Q_i(w)$, where $Q_i(w)$ is the loss computed on the $i$-th sample. The data is then randomly shuffled at each iteration so that the algorithm reaches an approximate minimum.

Since we are not using the whole dataset but only batches of it at each iteration, the path taken by the algorithm is noisy compared to the gradient descent algorithm. Thus, SGD needs a higher number of iterations to reach the local minimum, and this increases the overall number of update steps. But even with more iterations, the total computational cost is still lower than that of the full gradient descent optimizer. The conclusion: if the data is enormous and computation time is an essential factor, stochastic gradient descent should be preferred over the batch gradient descent algorithm.
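
A minimal NumPy sketch of the per-sample update; `grad_i` is an assumed stand-in that returns the gradient of the loss on a single (x, y) pair:

import numpy as np

def sgd(grad_i, X, y, w0, lr=0.01, epochs=10):
    w = np.asarray(w0, dtype=float)
    n = len(X)
    for _ in range(epochs):
        for i in np.random.permutation(n):      # shuffle each epoch
            w = w - lr * grad_i(w, X[i], y[i])  # update on one sample
    return w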

Stochastic Gradient Descent With Momentum Deep Learning Optimizer


Stochastic Gradient Descent requires a larger number of iterations to reach the optimal minimum, so training is slow. To overcome this problem, we use stochastic gradient descent with momentum. The update is

$v_t = \gamma \, v_{t-1} + \eta \, \nabla J(w_t), \qquad w_{t+1} = w_t - v_t$

Here $v_t$ is called velocity, and it accelerates gradients in the direction that leads to convergence.

Momentum helps the loss function converge faster. Plain stochastic gradient descent oscillates between directions of the gradient while updating the weights; adding a fraction of the previous update to the current update makes the process faster. One thing to remember while using this algorithm is that the learning rate should be decreased when using a high momentum term.
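
A sketch of the momentum update under the same assumptions (`grad` is a stand-in for the gradient function):

import numpy as np

def sgd_momentum(grad, w0, lr=0.01, gamma=0.9, n_iters=100):
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)                 # velocity
    for _ in range(n_iters):
        v = gamma * v + lr * grad(w)     # accumulate a fraction of past updates
        w = w - v
    return w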

Mini Batch Gradient Descent Deep Learning Optimizer

In this variant of gradient descent, instead of taking all the training data, only a subset of the dataset is
used for calculating the loss function. Since we are using a batch of data instead of taking the whole
dataset, fewer iterations are needed. That is why the mini-batch gradient descent algorithm is faster than
both stochastic gradient descent and batch gradient descent algorithms. This algorithm is more efficient
and robust than the earlier variants of gradient descent. As the algorithm uses batching, all the training
data need not be loaded in the memory, thus making the process more efficient to implement. Moreover,
the cost function in mini-batch gradient descent is noisier than the batch gradient descent algorithm but
smoother than that of the stochastic gradient descent algorithm. Because of this, mini-batch gradient
descent is ideal and provides a good balance between speed and accuracy.

Despite all that, the mini-batch gradient descent algorithm has some downsides too. It introduces a hyperparameter, the mini-batch size, which needs to be tuned to achieve the required accuracy (that said, a batch size of 32 is considered appropriate for almost every case). Also, in some cases it results in poor final accuracy, which motivates the search for further alternatives.
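
A sketch of the mini-batch loop, assuming `grad_batch` returns the gradient averaged over a batch of samples:

import numpy as np

def minibatch_gd(grad_batch, X, y, w0, lr=0.01, batch_size=32, epochs=10):
    w = np.asarray(w0, dtype=float)
    n = len(X)
    for _ in range(epochs):
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]       # one mini-batch of indices
            w = w - lr * grad_batch(w, X[b], y[b])  # gradient averaged over the batch
    return w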

Adagrad (Adaptive Gradient Descent) Deep Learning Optimizer

The adaptive gradient descent algorithm is slightly different from other gradient descent algorithms: it uses a different learning rate for each parameter at each iteration. The effective learning rate of a parameter depends on how much that parameter has been updated during training; the more a parameter changes, the smaller its learning rate becomes. This modification is highly beneficial because real-world datasets contain both sparse and dense features, so it is unfair to use the same learning rate for all features. The Adagrad update is

$w_{t+1} = w_t - \frac{\eta}{\sqrt{\sum_{\tau=1}^{t} g_\tau^2 + \epsilon}} \, g_t$

Here the effective learning rate $\alpha_t$ differs at each iteration, $\eta$ is a constant, and $\epsilon$ is a small positive value to avoid division by 0.

The benefit of using Adagrad is that it eliminates the need to tune the learning rate manually. It is more reliable than plain gradient descent and its variants, and it reaches convergence faster. One downside of the AdaGrad optimizer is that it decreases the learning rate aggressively and monotonically: the squared gradients in the denominator keep accumulating, so the denominator keeps increasing, and there may come a point where the learning rate becomes extremely small. With such small learning rates the model eventually becomes unable to acquire new knowledge, and the accuracy of the model is compromised.
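
A sketch of the Adagrad update rule above (same `grad` stand-in as before):

import numpy as np

def adagrad(grad, w0, eta=0.01, eps=1e-8, n_iters=100):
    w = np.asarray(w0, dtype=float)
    G = np.zeros_like(w)                      # running sum of squared gradients
    for _ in range(n_iters):
        g = grad(w)
        G += g ** 2                           # accumulates monotonically
        w = w - (eta / np.sqrt(G + eps)) * g  # per-parameter learning rate
    return w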

RMS Prop (Root Mean Square) Deep Learning Optimizer

RMSprop is one of the popular optimizers among deep learning enthusiasts, perhaps surprisingly so given that it was never formally published. RMSprop is essentially an extension of Rprop (resilient propagation) and resolves the problem of widely varying gradients: some gradients are small while others may be huge, so defining a single learning rate might not be the best idea. Rprop uses the sign of the gradient, adapting the step size individually for each weight. In that algorithm, the two most recent gradients are first compared by sign: if they have the same sign, we are going in the right direction and the step size is increased by a small fraction; if they have opposite signs, the step size is decreased. The step size is then bounded, and the weight update is applied.

The problem with Rprop is that it doesn't work well with large datasets or mini-batch updates. Achieving the robustness of Rprop together with the efficiency of mini-batches was the main motivation behind RMSprop. RMSprop is also an improvement over the AdaGrad optimizer, as it avoids the monotonically decreasing learning rate.

RMS Prop Formula

The algorithm mainly focuses on accelerating the optimization process by decreasing the number of function evaluations needed to reach the local minimum. It keeps a moving average of squared gradients for every weight and divides the gradient by the square root of this mean square:

$v_t = \beta \, v_{t-1} + (1-\beta) \, g_t^2$

where $\beta$ is the forgetting factor. Weights are updated by

$w_{t+1} = w_t - \frac{\eta}{\sqrt{v_t} + \epsilon} \, g_t$

In simpler terms, if there exists a parameter due to which the cost function oscillates a lot, we want to penalize its updates. Suppose you built a model to classify a variety of fishes, and the model relies mainly on the feature 'color' to differentiate between them, causing many errors. What RMSprop does is penalize the updates of the 'color' parameter so that the model can rely on other features too; this prevents the algorithm from adapting too quickly to changes in the 'color' parameter compared to other parameters. This algorithm has several benefits compared to earlier versions of gradient descent: it converges quickly and requires less tuning than plain gradient descent algorithms and their variants.

The problem with RMSprop is that the learning rate still has to be defined manually, and the suggested default value doesn't work for every application.
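
A sketch of the RMSprop update (again with an assumed `grad` function):

import numpy as np

def rmsprop(grad, w0, eta=0.001, beta=0.9, eps=1e-8, n_iters=100):
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)                    # moving average of squared gradients
    for _ in range(n_iters):
        g = grad(w)
        v = beta * v + (1 - beta) * g ** 2  # leaky average; old gradients decay
        w = w - eta * g / (np.sqrt(v) + eps)
    return w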
AdaDelta Deep Learning Optimizer

AdaDelta can be seen as a more robust version of the AdaGrad optimizer. It is based on adaptive learning and is designed to deal with significant drawbacks of AdaGrad and RMSprop: the initial learning rate of those optimizers must be defined manually, and the decaying learning rate becomes infinitesimally small at some point, after which the model can no longer learn anything new.

To deal with these problems, AdaDelta uses two state variables: a leaky average of the second moment of the gradient, and a leaky average of the second moment of the parameter updates:

$s_t = \rho \, s_{t-1} + (1-\rho) \, g_t^2$
$g'_t = \sqrt{\frac{\Delta x_{t-1} + \epsilon}{s_t + \epsilon}} \, g_t$
$\Delta x_t = \rho \, \Delta x_{t-1} + (1-\rho) \, g'^2_t$
$w_{t+1} = w_t - g'_t$

Here $s_t$ and $\Delta x_t$ denote the state variables, $g'_t$ denotes the rescaled gradient, $\Delta x_{t-1}$ denotes the leaky average of squared rescaled gradients, and $\epsilon$ is a small positive constant to handle division by 0.
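
A sketch of the AdaDelta updates above (same assumptions as the earlier sketches):

import numpy as np

def adadelta(grad, w0, rho=0.9, eps=1e-6, n_iters=100):
    w = np.asarray(w0, dtype=float)
    s = np.zeros_like(w)     # leaky average of squared gradients
    dx2 = np.zeros_like(w)   # leaky average of squared parameter updates
    for _ in range(n_iters):
        g = grad(w)
        s = rho * s + (1 - rho) * g ** 2
        g_rescaled = np.sqrt(dx2 + eps) / np.sqrt(s + eps) * g  # no global lr needed
        dx2 = rho * dx2 + (1 - rho) * g_rescaled ** 2
        w = w - g_rescaled
    return w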

Adam Optimizer in Deep Learning

Adam optimizer, short for Adaptive Moment Estimation optimizer, is an optimization algorithm
commonly used in deep learning. It is an extension of the stochastic gradient descent (SGD) algorithm
and is designed to update the weights of a neural network during training.

The name “Adam” is derived from “adaptive moment estimation,” highlighting its ability to adaptively
adjust the learning rate for each network weight individually. Unlike SGD, which maintains a single
learning rate throughout training, Adam optimizer dynamically computes individual learning rates based
on the past gradients and their second moments.

The creators of the Adam optimizer incorporated the beneficial features of other optimization algorithms such as AdaGrad and RMSProp. Similar to RMSProp, the Adam optimizer considers the second moment of the gradients, computed as an uncentered variance (i.e., without subtracting the mean).

By incorporating both the first moment (mean) and second moment (uncentered variance) of the
gradients, Adam optimizer achieves an adaptive learning rate that can efficiently navigate the
optimization landscape during training. This adaptivity helps in faster convergence and improved
performance of the neural network.

In summary, Adam optimizer is an optimization algorithm that extends SGD by dynamically adjusting
learning rates based on individual weights. It combines the features of AdaGrad and RMSProp to provide
efficient and adaptive updates to the network weights during deep learning training.

Adam Optimizer Formula

The Adam optimizer has several benefits, due to which it is widely used. It is adopted as a benchmark in deep learning papers and recommended as a default optimization algorithm. Moreover, the algorithm is straightforward to implement, has a fast running time and low memory requirements, and requires less tuning than other optimization algorithms. The update rules are

$m_t = \beta_1 m_{t-1} + (1-\beta_1) g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2$
$\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t}$
$w_{t+1} = w_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon} \, \hat{m}_t$

Here $\beta_1$ and $\beta_2$ represent the decay rates of the moving averages of the gradient and the squared gradient.
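
A sketch of the Adam updates, including the bias-correction step (same `grad` stand-in as before):

import numpy as np

def adam(grad, w0, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8, n_iters=100):
    w = np.asarray(w0, dtype=float)
    m = np.zeros_like(w)   # first moment (mean of gradients)
    v = np.zeros_like(w)   # second moment (uncentered variance)
    for t in range(1, n_iters + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)   # bias correction for zero initialization
        v_hat = v / (1 - beta2 ** t)
        w = w - eta * m_hat / (np.sqrt(v_hat) + eps)
    return w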

Batch Normalization:

What is Batch Normalization?

Batch normalization was introduced by Sergey Ioffe and Christian Szegedy in 2015 to mitigate the internal covariate shift problem in neural networks. The normalization process involves calculating the mean and variance of each feature in a mini-batch and then scaling and shifting the features using these statistics. This ensures that the input to each layer remains roughly in the same distribution, regardless of changes in the distribution of earlier layers' outputs. Consequently, Batch Normalization helps stabilize the training process, enabling higher learning rates and faster convergence.

Need for Batch Normalization

Batch Normalization extends the concept of normalization from the input layer to the activations of each hidden layer throughout the neural network. By normalizing the activations of each layer, Batch Normalization helps alleviate the internal covariate shift problem, which can hinder the convergence of the network during training. The inputs to each hidden layer are the activations from the previous layer; if these activations are normalized, the network is consistently presented with inputs that have a similar distribution, regardless of the training stage. This stability in the distribution of inputs allows for smoother and more efficient training.

By applying Batch Normalization to the hidden layers of the network, the gradients propagated during backpropagation are less likely to vanish or explode, leading to more stable training dynamics. This ultimately facilitates faster convergence and better performance of the neural network on the given task.

Fundamentals of Batch Normalization

In this section, we are going to discuss the steps taken to perform batch normalization.

Step 1: Compute the Mean and Variance of the Mini-Batch

For a mini-batch of activations $x_1, x_2, \ldots, x_m$, compute the mean $\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i$ and variance $\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2$.

Step 2: Normalization

$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$

Step 3: Scale and Shift the Normalized Activations

$y_i = \gamma \, \hat{x}_i + \beta$, where $\gamma$ and $\beta$ are learnable parameters.

import tensorflow as tf

# Define a simple model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_shape=(784,)),
    tf.keras.layers.BatchNormalization(),  # add a Batch Normalization layer
    tf.keras.layers.Activation('relu'),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Activation('softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model (x_train and y_train are assumed to be available,
# e.g. flattened 28x28 MNIST images and their integer labels)
model.fit(x_train, y_train, epochs=5, batch_size=32)

What is ReLU?

ReLU stands for Rectified Linear Unit. The function is defined as f(x) = max(0, x), which returns the
input value if it is positive and zero if it is negative. The output of the ReLU function is, therefore, always
non-negative.

The ReLU function has become a popular choice for activation functions in neural networks because it is
computationally efficient and does not suffer from the vanishing gradient problem that can occur with
other activation functions like the sigmoid or hyperbolic tangent functions.

Leaky ReLU
The Leaky ReLU activation function is f(x) = max(0.01*x, x). This function returns x if it receives any positive input, but for any negative value of x it returns a small value, 0.01 times x. Thus it gives a nonzero output for negative values as well.
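
Both activations are easy to verify numerically; a minimal NumPy sketch:

import numpy as np

def relu(x):
    return np.maximum(0.0, x)          # f(x) = max(0, x)

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)    # f(x) = max(0.01*x, x) for alpha=0.01

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # [0.    0.    0.    1.5 ]
print(leaky_relu(x))  # [-0.02  -0.005  0.    1.5 ]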

Texture Generation Using GANs

Generative Adversarial Networks (GANs) have shown great potential in generating high-quality
images, but they can also be used to generate realistic textures. One of the challenges in texture
generation is to produce samples with high visual quality while preserving the coherence and
consistency of the texture. Spatial GAN builds upon the traditional GAN architecture by
turning the generator and discriminator into fully convolutional networks. In this way, a
spatial input is mapped to an output texture image, which can be grown in size by enlarging
the input. Periodic Spatial GAN (PSGAN) is an extension of Spatial GAN proposed to generate
textures with periodic patterns; it incorporates a periodic input into the generator
network to ensure that the generated textures have a periodic structure. The main drawback of
these methods is memory scalability: increasing the size of the output image increases
the required GPU memory, which hinders generation at unbounded sizes. TileGAN was able
to generate textures of hundreds of megapixels; however, it focuses on a different task, where the
aim was to combine different patches generated by GANs by passing pre-stored latent tiles to
the trained generator. This requires searching for the closest latents using a neighbourhood
similarity match, which is not a scalable and deployable solution.


Generating Music with a Generative Adversarial Network


Introduction

Generative Adversarial Networks (GANs) have become extraordinarily popular in recent years due to
their success with image generation. Websites like thispersondoesnotexist.com showcase the capabilities
of GANs in generating extraordinarily realistic human faces. Given their positive results regarding image
generation, we sought to find out if GANs could be applied to musical compositions with a similar
outcome.

Background

For some context, let's briefly examine what a GAN actually is. GANs consist of two neural networks with conflicting goals: a discriminator and a generator. The discriminator has the task of determining whether the input it is given is "real" or "fake". The generator is challenged with creating authentic-looking content that fools the discriminator into believing it's real. The idea is that when one of these networks gets better at its job, the other has to learn how to better counteract its adversary. This feedback loop yields better and better generated content.

Time-Frequency Data Representation

With this brief introduction to GANs out of the way, we’ll take a look at how we applied this concept to
music. With images, data representation is relatively straightforward; images are just 2-dimensional
arrays with a number of color channels (e.g. 1 channel for greyscale or 3 channels for red-green-blue).
However, music is structured differently than images. A single song can have multiple instruments playing their own parts at any time. This introduces a significant amount of variability that must be captured, certainly too much for a single 2D array. To make this task more feasible, we only used single-track songs from the classical music genre, and we fixed the amount of data that we used from each song.

In order to effectively utilize convolution within a GAN, the data used must maintain translational
invariance. In order to facilitate this, every musical training input was extracted as a 16 beat segment from
a song, where each beat was divided into 24 time slices. Each slice contained a vector of size 128 to hold
the volume of each possible note that could be played. This resulted in our discriminator input matrix
(and generator output matrix) being of size 384 x 128. Songs could be sampled multiple times to produce
additional training samples, with some danger of overlap between samples. These transformation steps
combined with the data filtering discussed in the paragraph above reduced our original input dataset of
about 113,000 midi files to roughly 6,000.

GAN

Convolutional models can be notoriously tricky to train, so we studied examples such as those provided here for guidance. We needed to strike a balance between the additional complexity introduced by deeper layers and larger filters, and the model's potential to underfit the data and produce noise. We also needed to find a training heuristic that would help avoid problems like non-convergence and mode collapse. Keeping these ideas in mind, we came up with the following architectures for our generator and discriminator networks.

Generator architecture

The generator element in our GAN takes a vector of 100 random real numbers as input and feeds it through 5 hidden layers to produce the output song. Each input is drawn from a normal distribution with mean 0 and variance 1. The hidden layers are organized like so: the first is a fully-connected layer whose output is reshaped to (6, 8, 256). This is then fed to a convolutional transpose layer using a (5, 5) filter, followed by a third convolutional layer using a (4, 4) filter. Layers 4 and 5 both use (4, 2) filters to ultimately output a pianoroll matrix of shape (384, 128). The last layer in the generator is a ReLU activation layer, which limits each cell in the output to be between 0 and 2. Every other convolutional layer uses a modified form of ReLU activation, which acts as a pass-through for all positive outputs but reduces the value by a factor of 3 if it is negative. We also use batch normalization between layers to help control the magnitudes of the weights.

Discriminator architecture

The discriminator works in reverse. It takes the (384, 128) song as input and feeds it into a network of 3 convolutional layers to output a single scalar representing the probability that the input is real. The first layer uses a (4, 2) filter to create an output of shape (96, 64, 32). This is fed to the second convolutional layer with a (4, 4) filter, and then to the third layer with another (4, 4) filter. The final layer feeds a fully connected layer with a single output: the class estimate. The same modified ReLU activation seen in the generator is used for each convolutional layer, and a sigmoid activation is applied to the final fully-connected layer. We also utilize dropout layers between the convolutional layers, which randomly reduce 30% of the inputs to 0 to help the model generalize to new data.
We applied cross-entropy as our loss function: the generator loss is based on how well it tricks the discriminator into identifying a fake song as real, while the discriminator loss is the sum of how well it identifies both real and fake songs as their respective classes. Both models use the Adam algorithm as their gradient optimizer, but the discriminator uses a learning rate of 1e-6, while the generator uses a learning rate of 1e-4.

Training, Results, & Evaluation

We trained our model on a randomly selected batch of 200 of the 6,000 input samples for each of 10,000 epochs, saving a generated song every 250 epochs for later analysis and visualization. At the conclusion of training, our model demonstrated that it was able to pick up some structural details early on, but it failed to generalize and ultimately was unable to produce compelling results.

The generator produced random noise after the first training epoch, which aligned well with our expectations. After the first 250 epochs, some musical structure became clear as notes began to be played at specific times in vertical sequence with other notes. However, this pattern did not persist, and as the remaining training epochs progressed the output increasingly resembled random sounds played at random times, albeit at a lower density than when training started.
Given time constraints on completing this project, we were unable to reach more conclusive results; however, there are several key ideas we would use to improve the model if we were to revisit this project in the future. First, the filter sizes used by the discriminator are likely much too small to capture any large structural patterns in the music; these would need to be scaled up significantly to find patterns that persist through each sample. Second, we may not be using enough hidden layers in either model to capture and reproduce the complex structure of music; adding more convolutional layers could potentially improve performance. Third, our data could be selected with stricter criteria, such as enforcing a 4/4 time signature, removing samples that contain key changes, and starting each sample on the first beat of a measure.
