
Lecture 3 Neural Networks

INF 337 Deep Learning


Data Augmentation

Data augmentation is a technique for artificially increasing the size of a training set by creating modified copies of existing data. It includes making minor changes to the dataset or using deep learning to generate new data points.
Augmented vs. synthetic data

Augmented data: This involves creating modified versions of existing data to increase dataset diversity. For example, in image processing, applying transformations like rotations, flips, or color adjustments to existing images can help models generalize better.

Synthetic data: This refers to artificially generated data, which allows researchers and developers to test and improve algorithms without risking the privacy or security of real-world data.
When should you use data augmentation?

1. To prevent models from overfitting.
2. When the initial training set is too small.
3. To improve model accuracy.
4. To reduce the operational cost of labeling and cleaning the raw dataset.
Limitations of data augmentation

1. The biases in the original dataset persist in the augmented data.
2. Quality assurance for data augmentation is expensive.
3. Research and development are required to build systems for advanced applications. For example, generating high-resolution images using GANs can be challenging.
4. Finding an effective data augmentation approach can be challenging.
Data Augmentation Techniques

Audio Data Augmentation (a minimal sketch of the first two techniques follows the list):

1. Noise injection: add Gaussian or random noise to the audio signal to improve model robustness.
2. Shifting: shift the audio left (fast forward) or right by a random number of seconds.
3. Changing the speed: stretch the time series by a fixed rate.
4. Changing the pitch: randomly change the pitch of the audio.
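The first two techniques can be expressed directly on a raw waveform array. Below is a minimal sketch using NumPy; the waveform, sample rate, and noise level are illustrative assumptions (speed and pitch changes usually rely on a dedicated audio library such as librosa, so they are omitted here).

```python
import numpy as np

def add_noise(waveform, noise_level=0.005):
    """Noise injection: add Gaussian noise scaled by noise_level."""
    return waveform + np.random.randn(len(waveform)) * noise_level

def time_shift(waveform, sample_rate, max_shift_s=0.5):
    """Shifting: move the signal left or right by a random number of samples."""
    max_shift = int(max_shift_s * sample_rate)
    shift = np.random.randint(-max_shift, max_shift + 1)
    return np.roll(waveform, shift)

# Toy usage on a synthetic 1-second 440 Hz tone at 16 kHz (illustrative only).
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
tone = np.sin(2 * np.pi * 440 * t)
augmented = time_shift(add_noise(tone), sr)
```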
Data Augmentation Techniques

Text Data Augmentation (deletion and shuffling are sketched after the list):

1. Word or sentence shuffling: randomly change the position of a word or sentence.
2. Word replacement: replace words with synonyms.
3. Syntax-tree manipulation: paraphrase the sentence using the same words.
4. Random word insertion: insert words at random.
5. Random word deletion: delete words at random.
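Random word deletion and word shuffling can be sketched with nothing but the standard library; the function names and the deletion probability below are illustrative assumptions. Synonym replacement and syntax-tree manipulation would additionally need an NLP resource such as WordNet, so they are not shown.

```python
import random

def random_word_deletion(sentence, p=0.1):
    """Delete each word with probability p (keep at least one word)."""
    words = sentence.split()
    kept = [w for w in words if random.random() > p]
    return " ".join(kept) if kept else random.choice(words)

def word_shuffling(sentence):
    """Randomly change the positions of the words in a sentence."""
    words = sentence.split()
    random.shuffle(words)
    return " ".join(words)

print(random_word_deletion("the quick brown fox jumps over the lazy dog"))
print(word_shuffling("the quick brown fox jumps over the lazy dog"))
```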
Data Augmentation Techniques

Image Augmentation (a composed pipeline is sketched after the list):
1. Geometric transformations: randomly flip, crop, rotate, stretch, and zoom images. Be careful about applying many transformations to the same images, as this can reduce model performance.
2. Color space transformations: randomly change RGB color channels, contrast, and brightness.
3. Kernel filters: randomly change the sharpness or blurring of the image.
4. Random erasing: delete a random part of the image.
5. Mixing images: blend and mix multiple images.
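In practice these image transformations are composed into a single augmentation pipeline. The sketch below assumes torchvision is available; the particular transforms and parameter values are illustrative choices rather than a prescribed recipe.

```python
from torchvision import transforms

# Geometric and color-space transforms on a PIL image, then random erasing on the tensor.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),               # geometric: flip
    transforms.RandomRotation(degrees=15),                # geometric: rotate
    transforms.ColorJitter(brightness=0.2, contrast=0.2), # color space: brightness/contrast
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.5),                      # delete a random patch
])

# Usage, assuming `img` is a PIL image: augmented_tensor = augment(img)
```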
Recurrent Neural Networks (RNNs)

A Recurrent Neural Network is a generalization of a feed-forward neural network that has an internal memory. An RNN is recurrent in nature: it applies the same function to every element of the input sequence, while the output for the current input depends on the computation for the previous inputs.
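The recurrence can be written as h_t = tanh(W_xh · x_t + W_hh · h_{t−1} + b), with the same weights reused at every time step. A minimal NumPy sketch of this step, with illustrative sizes and random weights:

```python
import numpy as np

input_size, hidden_size = 8, 16                     # illustrative sizes
W_xh = np.random.randn(hidden_size, input_size) * 0.01
W_hh = np.random.randn(hidden_size, hidden_size) * 0.01
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One recurrent step: the new hidden state depends on the input and the previous state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Apply the same function over a sequence, carrying the hidden state (the internal memory).
h = np.zeros(hidden_size)
for x_t in np.random.randn(5, input_size):          # a toy sequence of 5 inputs
    h = rnn_step(x_t, h)
```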
Different types of RNNs
● The core reason that recurrent nets are more exciting is that they allow us to operate over sequences of vectors: sequences in the input, the output, or, in the most general case, both. A few examples make this more concrete:
One-to-one:
This is also called a plain/vanilla neural network. It maps a fixed-size input to a fixed-size output, independent of previous information or outputs.
One-to-Many:
It takes a fixed-size input and produces a sequence of data as output.
Many-to-One:
It takes a sequence of information as input and outputs a fixed-size result.
Many-to-Many:
It takes a sequence of information as input, processes it recurrently, and outputs a sequence of data.
Bidirectional Many-to-Many:

Synced sequence input and output.

Notice that in every case there are no pre-specified constraints on the sequence lengths, because the recurrent transformation (green) is fixed and can be applied as many times as we like. A sketch of the many-to-one and many-to-many cases follows.
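In a framework like PyTorch these patterns differ mainly in which outputs you keep. The sketch below uses made-up sizes: taking only the final hidden state gives a many-to-one model, while keeping the whole output sequence gives a many-to-many model.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 10, 8)          # batch of 4 sequences, 10 time steps, 8 features each

outputs, h_n = rnn(x)              # outputs: (4, 10, 16), h_n: (1, 4, 16)

many_to_one = outputs[:, -1, :]    # keep only the last time step, e.g. sequence classification
many_to_many = outputs             # keep every time step, e.g. per-step tagging
```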
[Figure: CNN vs RNN comparison]
Backpropagation Through Time:
We typically treat the full sequence (e.g., a word) as one training example, so the total error is just the sum of the errors at each time step (each character). The weights, as we can see, are the same at each time step. Let's summarize the steps for backpropagation through time (a short sketch follows the list):
1. The cross-entropy error is first computed using the predicted output and the actual output.
2. Remember that the network is unrolled for all the time steps.
3. For the unrolled network, the gradient is calculated at each time step with respect to the weight parameters.
4. Since the weights are the same for all time steps, the gradients can be combined across time steps.
5. The weights are then updated for both the recurrent neurons and the dense layers.
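An autograd framework performs these steps automatically once the per-time-step losses are combined. A rough PyTorch sketch of the idea, with an assumed character-level setup and illustrative sizes:

```python
import torch
import torch.nn as nn

vocab_size, hidden_size = 30, 64
rnn = nn.RNN(vocab_size, hidden_size, batch_first=True)
head = nn.Linear(hidden_size, vocab_size)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(1, 12, vocab_size)          # one "word" of 12 characters (toy encoding)
targets = torch.randint(0, vocab_size, (1, 12))  # next-character targets

outputs, _ = rnn(inputs)                         # the network is unrolled over all 12 steps
logits = head(outputs)                           # shape (1, 12, vocab_size)

# Total error combines the cross-entropy at every time step (averaged here).
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                                  # gradients are accumulated across time steps
```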
Backpropagation Through Time:

While backpropagating through time, two types of issues may arise:

1. Vanishing gradients
2. Exploding gradients
Vanishing Gradients

The contributions from the earlier time steps become insignificant in the gradient descent step, because the gradient shrinks toward zero as it is repeatedly multiplied by factors smaller than one while flowing back through time.
Exploding Gradients

Exploding gradients occur when the algorithm assigns an excessively high importance to the weights, without much reason. Fortunately, this problem can be addressed by truncating or clipping the gradients.
How can you overcome the challenges of vanishing and exploding gradients?

Vanishing gradients can be mitigated with:

1. The ReLU activation function.
2. LSTM or GRU architectures.

Exploding gradients can be mitigated with:
● Truncated BPTT (instead of backpropagating from the last time step all the way back, backpropagate over a shorter window of the most recent time steps).
● Clipping gradients to a threshold (see the sketch after this list).
● RMSProp to adapt the learning rate.
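Gradient clipping is a one-line addition to the training step. A minimal, self-contained sketch in PyTorch; the toy model, placeholder loss, and the threshold of 1.0 are illustrative assumptions.

```python
import torch
import torch.nn as nn

model = nn.RNN(8, 16, batch_first=True)                 # toy recurrent model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(4, 10, 8)
outputs, _ = model(x)
loss = outputs.pow(2).mean()                            # placeholder loss for illustration

loss.backward()                                         # gradients via backpropagation through time
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip gradient norm to threshold
optimizer.step()
optimizer.zero_grad()
```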
Advantages of Recurrent Neural Networks

The main advantage of an RNN over an ANN is that an RNN can model sequential data (e.g., time series), so that each sample can be assumed to depend on the previous ones.

Recurrent neural networks are also used together with convolutional layers to extend the effective pixel neighborhood.
Disadvantages of Recurrent Neural Networks

Gradient vanishing and exploding problems.

Training an RNN is a very difficult task.

It cannot process very long sequences when tanh or ReLU is used as the activation function.
Long Short-Term Memory (LSTM)
● Long Short-Term Memory (LSTM) is a recurrent neural network architecture
designed by Sepp Hochreiter and Jürgen Schmidhuber in 1997.
● LSTMs are explicitly designed to avoid the long-term dependency problem.
Remembering information for long periods of time is practically their default
behavior, not something they struggle to learn!
LSTM

The key to LSTMs is the cell state, the horizontal line running through the top of the
diagram. The cell state is kind of like a conveyor belt. It runs straight down the
entire chain, with only some minor linear interactions. It’s very easy for information
to just flow along it unchanged.
LSTM
The LSTM does have the ability to remove or add
information to the cell state, carefully regulated by
structures called gates.

Gates are a way to optionally let information through. They are composed of a sigmoid neural net layer and a pointwise multiplication operation.

The sigmoid layer outputs numbers between zero and one, describing how much of each component should be let through. A value of zero means “let nothing through,” while a value of one means “let everything through!”
LSTM
The first step in our LSTM is to decide what information we’re going to throw away from the cell state. This decision is made by a sigmoid layer called the “forget gate layer.” It looks at h_{t−1} and x_t, and outputs a number between 0 and 1 for each number in the cell state C_{t−1}. A 1 represents “completely keep this,” while a 0 represents “completely get rid of this.”
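In the standard LSTM formulation this forget gate is computed as

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)

where [h_{t−1}, x_t] denotes the concatenation of the previous hidden state and the current input.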
LSTM
The next step is to decide what new information we’re going to store in the cell state. This has two parts. First, a sigmoid layer called the “input gate layer” decides which values we’ll update. Next, a tanh layer creates a vector of new candidate values, C̃_t, that could be added to the state. In the next step, we’ll combine these two to create an update to the state.
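In the standard formulation these two parts are

i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)

and the combined update to the cell state is

C_t = f_t * C_{t−1} + i_t * C̃_t

that is, the old state is scaled by the forget gate, and the new candidate values are added in, scaled by the input gate.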
LSTM
Finally, we need to decide what we’re going to output. This output will be based on
our cell state, but will be a filtered version. First, we run a sigmoid layer which
decides what parts of the cell state we’re going to output. Then, we put the cell
state through tanh (to push the values to be between −1 and 1) and multiply it by
the output of the sigmoid gate, so that we only output the parts we decided to.
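In the standard formulation the output step is

o_t = σ(W_o · [h_{t−1}, x_t] + b_o)
h_t = o_t * tanh(C_t)

Putting the three gates together, one step of an LSTM cell can be sketched in NumPy as follows; the sizes, random weights, and toy input sequence are illustrative assumptions, not a trained network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

input_size, hidden_size = 8, 16                     # illustrative sizes
concat = input_size + hidden_size
W_f, W_i, W_C, W_o = (np.random.randn(hidden_size, concat) * 0.01 for _ in range(4))
b_f, b_i, b_C, b_o = (np.zeros(hidden_size) for _ in range(4))

def lstm_step(x_t, h_prev, C_prev):
    z = np.concatenate([h_prev, x_t])               # [h_{t-1}, x_t]
    f = sigmoid(W_f @ z + b_f)                      # forget gate: what to discard from C_{t-1}
    i = sigmoid(W_i @ z + b_i)                      # input gate: which values to update
    C_tilde = np.tanh(W_C @ z + b_C)                # candidate values
    C = f * C_prev + i * C_tilde                    # new cell state
    o = sigmoid(W_o @ z + b_o)                      # output gate: which parts to expose
    h = o * np.tanh(C)                              # new hidden state (filtered cell state)
    return h, C

h, C = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in np.random.randn(5, input_size):          # a toy sequence of 5 inputs
    h, C = lstm_step(x_t, h, C)
```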
