Deep Learning Full
PROJECT 7
2. Student ID:
3. Student ID:
4. Student ID:
Final assignment Introduction to computing
Deep learning drives many artificial intelligence (AI) applications and services
that improve automation, performing analytical and physical tasks without human
intervention. Deep learning technology lies behind everyday products and services
(such as digital assistants, voice-enabled TV remotes, and credit card fraud
detection) as well as emerging technologies (such as self-driving cars).
In 2008, Andrew Ng's group used GPUs for training deep neural networks.
The RNN is one of the foundational network architectures from which other
deep learning architectures are built. The primary difference between a typical
multilayer network and a recurrent network is that, rather than having only feed-
forward connections, a recurrent network may have connections that feed back
into prior layers (or into the same layer). This feedback allows RNNs to maintain a
memory of past inputs and to model problems that unfold over time.
RNNs encompass a rich set of architectures (we'll look at one popular topology,
the LSTM, next). The key differentiator is feedback within the network, which
can come from a hidden layer, the output layer, or some combination of the two.
The LSTM memory cell contains three gates that control how information
flows into or out of the cell. The input gate controls when new information can
flow into the memory. The forget gate controls when an existing piece of
information is forgotten, allowing the cell to remember new data. Finally, the
output gate controls when the information that is contained in the cell is used in
the output from the cell. The cell also contains weights, which control each gate.
The training algorithm, commonly backpropagation through time (BPTT), optimizes
these weights based on the resulting network output error.
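To make the gating concrete, the following is a minimal, illustrative NumPy sketch of a single LSTM forward step that follows the standard equations; the weight layout and names (W, U, b per gate) are our own choices for the example, not the API of any particular library.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell (illustrative names).

    x_t     : input vector at time t, shape (n_in,)
    h_prev  : previous hidden state, shape (n_hid,)
    c_prev  : previous cell state,   shape (n_hid,)
    W, U, b : dicts of weight matrices / biases for the gates 'i', 'f', 'o'
              and the candidate memory 'g'.
    """
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])   # input gate
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])   # forget gate
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])   # output gate
    g = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])   # candidate memory

    c_t = f * c_prev + i * g      # forget old memory, admit new information
    h_t = o * np.tanh(c_t)        # expose (part of) the memory as the output
    return h_t, c_t
```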
Recent applications of CNNs and LSTMs produced image and video captioning
systems in which an image or video is captioned in natural language. The CNN
implements the image or video processing, and the LSTM is trained to convert
the CNN output into natural language.
The GRU is simpler than the LSTM, can be trained more quickly, and can be
more efficient in its execution. However, the LSTM can be more expressive and
with more data can lead to better results.
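For comparison, here is an illustrative NumPy sketch of a single GRU step using the standard update/reset-gate equations (again, the names are ours). Because the GRU merges the cell state into the hidden state and needs only two gates, the step is noticeably lighter than the LSTM step above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W, U, b):
    """One forward step of a standard GRU cell (illustrative names)."""
    z = sigmoid(W['z'] @ x_t + U['z'] @ h_prev + b['z'])              # update gate
    r = sigmoid(W['r'] @ x_t + U['r'] @ h_prev + b['r'])              # reset gate
    h_cand = np.tanh(W['h'] @ x_t + U['h'] @ (r * h_prev) + b['h'])   # candidate state
    return (1 - z) * h_prev + z * h_cand   # blend previous and candidate state
```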
3.2.1. Self-organizing maps
The self-organizing map (SOM) is popularly known as the Kohonen map. A SOM is
an unsupervised neural network that creates clusters of the input data set by
reducing the dimensionality of the input. SOMs differ from traditional artificial
neural networks in quite a few ways.
The first significant variation is that weights serve as a characteristic of the node
itself rather than of a connection. After the inputs are normalized, each output node
is initialized with small random weights, one per feature of the input record, so each
node's weight vector is a candidate representation of the input. A random input
record is then chosen, and the Euclidean distance between that input and the weight
vector of every output node is calculated. The node with the smallest distance is
declared the most accurate representation of the input and is marked as the best
matching unit (BMU). The remaining units are assigned to the cluster of the BMU
they are closest to, and the weights of all nodes within a radius around the BMU are
updated based on their proximity to it; this radius shrinks as training proceeds.
Next, in an SOM, no activation function is applied, and because there are no target
labels to compare against, there is no concept of calculating an error or of back
propagation.
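To illustrate, below is a minimal NumPy sketch of one SOM training iteration, assuming a 2-D grid of output nodes and a Gaussian neighbourhood; the map size, data, and decay schedules are illustrative choices, not a prescribed setup.

```python
import numpy as np

def som_train_step(weights, grid_coords, x, lr, radius):
    """One SOM update: find the BMU for input x and pull nearby nodes toward x.

    weights     : (n_nodes, n_features) weight vector of each output node
    grid_coords : (n_nodes, 2) position of each node on the map grid
    x           : (n_features,) one normalized input record
    lr, radius  : current learning rate and neighbourhood radius
    """
    # 1. Best matching unit = node whose weights are closest to the input.
    dists = np.linalg.norm(weights - x, axis=1)
    bmu = np.argmin(dists)

    # 2. Update the BMU and its neighbours, scaled by grid distance to the BMU.
    grid_dist = np.linalg.norm(grid_coords - grid_coords[bmu], axis=1)
    influence = np.exp(-(grid_dist ** 2) / (2 * radius ** 2))
    weights += lr * influence[:, None] * (x - weights)
    return bmu

# Illustrative usage: a 10x10 map trained on random 3-D data.
rng = np.random.default_rng(0)
grid = np.array([(i, j) for i in range(10) for j in range(10)], dtype=float)
W = rng.random((100, 3)) * 0.01                # small random initial weights
data = rng.random((500, 3))
for t, x in enumerate(data):
    lr = 0.5 * np.exp(-t / 500)                # learning rate decays over time
    radius = 5.0 * np.exp(-t / 500)            # neighbourhood radius shrinks
    som_train_step(W, grid, x, lr, radius)
```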
3.2.2. Autoencoders
Though the history of when autoencoders were invented is hazy, the basic variant of
an autoencoder is composed of three layers: input, hidden, and output.
First, the input layer is encoded into the hidden layer using an appropriate
encoding function. The number of nodes in the hidden layer is much less than the
number of nodes in the input layer. This hidden layer contains the compressed
representation of the original input. The output layer aims to reconstruct the input
layer by using a decoder function.
During the training phase, the difference between the input and the output layer
is calculated using an error function, and the weights are adjusted to minimize the
error. Unlike traditional unsupervised learning techniques, where there is no data to
compare the outputs against, autoencoders learn continuously using backward propagation.
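Below is a minimal illustrative sketch of such a three-layer autoencoder written from scratch in NumPy with a mean-squared reconstruction error; the layer sizes, learning rate, and toy data are arbitrary, biases are omitted for brevity, and in practice one would normally use a framework such as TensorFlow or PyTorch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 8, 3                           # hidden layer much smaller than input
W_enc = rng.normal(0, 0.1, (n_hid, n_in))    # encoder weights
W_dec = rng.normal(0, 0.1, (n_in, n_hid))    # decoder weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = rng.random((200, n_in))                  # toy training data in [0, 1)
lr = 0.5
for epoch in range(1000):
    H = sigmoid(X @ W_enc.T)                 # encode: compressed representation
    X_hat = sigmoid(H @ W_dec.T)             # decode: reconstruction of the input
    err = X_hat - X                          # the "target" is the input itself

    # Backward propagation of the mean-squared reconstruction error.
    d_out = err * X_hat * (1 - X_hat)
    d_hid = (d_out @ W_dec) * H * (1 - H)
    W_dec -= lr * d_out.T @ H / len(X)
    W_enc -= lr * d_hid.T @ X / len(X)

print("final reconstruction MSE:", np.mean(err ** 2))
```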
An RBM is a two-layer neural network consisting of an input (visible) layer and a
hidden layer. As shown in the following figure, in an RBM every node in the hidden
layer is connected to every node in the visible layer. In a traditional Boltzmann
machine, nodes within the input and hidden layers are also connected to each other.
Because of the computational complexity this creates, nodes within a layer are not
connected in a restricted Boltzmann machine.
During the training phase, RBMs calculate the probability distribution of the
training set using a stochastic approach. When training begins, each neuron is
activated at random. The model also contains a hidden bias and a visible bias:
the hidden bias is used in the forward pass to build the activations, while the
visible bias helps in reconstructing the input.
Because the input that an RBM reconstructs is always somewhat different from the
original input, RBMs are also known as generative models.
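The following is a hedged NumPy sketch of one contrastive-divergence (CD-1) training step for a small binary RBM; the layer sizes, learning rate, and single Gibbs step are illustrative simplifications rather than the exact procedure of any particular implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, b_vis, b_hid, lr=0.1):
    """One CD-1 update from a batch of binary visible vectors v0 (updates arrays in place)."""
    # Forward pass: probability that each hidden unit switches on.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)   # stochastic activation

    # Reconstruction: sample the visible layer back from the hidden activations.
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    p_h1 = sigmoid(p_v1 @ W + b_hid)

    # Contrastive-divergence update: data statistics minus reconstruction statistics.
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)
    b_vis += lr * np.mean(v0 - p_v1, axis=0)
    b_hid += lr * np.mean(p_h0 - p_h1, axis=0)

# Illustrative usage on random binary data.
n_vis, n_hid = 6, 4
W = rng.normal(0, 0.1, (n_vis, n_hid))   # connection weights
b_vis = np.zeros(n_vis)                  # visible bias (helps reconstruct the input)
b_hid = np.zeros(n_hid)                  # hidden bias (used in the forward pass)
data = (rng.random((100, n_vis)) > 0.5).astype(float)
for _ in range(50):
    cd1_step(data, W, b_vis, b_hid)
```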
Starting from a video in some source domain, they synthesize a new video in a
target domain using a learned network. Semantic labels allow them to edit or
create content in a convenient input domain and generate a video in an output
domain that is harder to edit or create directly.
The network can synthesize multiple results given the same input, or it can be
manipulated to generate a desired output video. In the semantic label map, each
color corresponds to an object class, and we can change the meaning of a label.
Examples include transforming trees into buildings (or vice versa) and changing
the styles of buildings or roads.
They train a sketch-to-face video synthesis model using the real face videos in
the FaceForensics dataset. The network learns to transfer an edge-map video to a
video of a human face, and it can generate different faces from the same input edge
map. The model can also change the facial appearance in the original face videos.
The resulting video is temporally consistent from frame to frame.
The network also learns to synthesize human body shapes and motions from pose
inputs. The method can change the clothing of the same dancer or transfer poses
from one person to another with consistent shadows.
Figure 4.1.3: Body → Pose → Body Results
4.1.4. Frame Prediction:
To predict the future of a video given a few observed frames, the team decomposed
the task into two sub-tasks:
When training the BERT model, both techniques are trained together, thus
minimizing the combined loss function of the two strategies.
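In the original BERT setup the two techniques are masked language modeling (MLM) and next-sentence prediction (NSP), and the combined objective is simply the sum of their losses; the sketch below illustrates that combination with toy NumPy arrays (the batch sizes and the 30,000-word vocabulary are illustrative assumptions, not real model outputs).

```python
import numpy as np

def cross_entropy(probs, targets):
    """Mean negative log-likelihood of the target classes."""
    return -np.mean(np.log(probs[np.arange(len(targets)), targets] + 1e-12))

# Illustrative model outputs for one mini-batch.
mlm_probs = np.full((8, 30000), 1 / 30000)    # predicted distribution over the vocabulary
mlm_targets = np.random.randint(0, 30000, 8)  # true identities of the masked tokens
nsp_probs = np.full((4, 2), 0.5)              # is-next / not-next prediction per sentence pair
nsp_targets = np.random.randint(0, 2, 4)

# Both objectives are optimized together: the combined loss is simply their sum.
loss = cross_entropy(mlm_probs, mlm_targets) + cross_entropy(nsp_probs, nsp_targets)
print("combined BERT pre-training loss:", loss)
```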
References:
Websites:
[1] ibm.com
https://www.ibm.com/cloud/learn/deep-learning
https://developer.ibm.com/technologies/artificial-intelligence/articles/cc-machine-
learning-deep-learning-architectures/
[2] machinelearningknowledge.ai
https://machinelearningknowledge.ai/brief-history-of-deep-learning/
[3] nvlabs.github.io
https://nvlabs.github.io/few-shot-vid2vid/
[4] tryolabs.com
https://tryolabs.com/blog/2018/12/19/major-advancements-deep-learning-2018/
[5] towardsdatascience.com
https://towardsdatascience.com/understanding-bert-is-it-a-game-changer-in-nlp-
7cca943cf3ad#:~:text=1%2C%20BERT%20achieves%2093.2%25%20F1,Language
%20Understanding%20(NLU)%20tasks.
[6] web.stanford.edu
https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1194/reports/default/15812785.
pdf
Science Journals:
[8] Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Guilin Liu, Andrew Tao, Jan Kautz,
Bryan Catanzaro, "Video-to-Video Synthesis", NeurIPS 2018.
Books: