DL Ia2
- BIOLOGICAL INSPIRATION:
- The biological inspiration for CNNs is the visual cortex in animals.
- The visual cortex is the human brain's vision-processing center.
- The visual cortex is the primary cortical region of the brain that receives,
integrates, and processes visual information relayed from the retinas.
- The cells in the visual cortex are sensitive to small subregions of the input.
- These smaller subregions are tiled together to cover the entire visual field.
- The cells are well suited to exploit the strong spatially local correlation found in
the types of images our brains process and act as local filters over the input space.
- There are two classes of cells in this region of the brain.
- The simple cells activate when they detect edge-like patterns, while the complex
cells have larger receptive fields and are invariant to the position of the pattern.
- CNN ARCHITECTURE:
8. 3D VOLUMETRIC INPUT?
->
1. Mini-batch Size:
- Mini-batch size is the number of input records (collections of time-series points for a
single source entity) we want to model per batch.
2. Number of columns in our vector per time-step:
- The number of columns corresponds to the traditional feature column count found in a
normal input vector.
3. Number of time-steps:
• The number of time-steps is how we represent the change in the input vector over time.
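As a rough illustration (my own sketch, not from the notes), the three dimensions above can be laid out as a single 3D tensor; the sizes and the (mini-batch, time-steps, columns) ordering are assumptions chosen for the example, and some frameworks order the axes differently:

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
mini_batch_size = 32   # number of input records modeled per batch
time_steps = 10        # how the input vector changes over time
feature_columns = 8    # traditional feature column count per time-step

# One 3D volumetric input batch: (mini-batch, time-steps, columns).
batch = np.zeros((mini_batch_size, time_steps, feature_columns))
print(batch.shape)  # (32, 10, 8)
```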
9. GENERAL RECURRENT NEURAL NETWORK
ARCHITECTURE?
->
- Recurrent neural networks are a superset of feed-forward neural networks, but they
add the concept of recurrent connections.
- These connections span adjacent time-steps, giving the model the concept of time.
- In recurrent neural networks, the conventional (non-recurrent) connections do not contain cycles.
- The output is computed from the hidden state at the given time-step.
- The input vector at the previous time-step can influence the current output at
the current time-step through the recurrent connections.
- We can chain layers of these specialized recurrent neurons together to build better
models.
- We connect the output of the previous layer to the input of the next layer.
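To make the recurrence concrete, here is a minimal sketch (my own illustration, not from the notes) of a single recurrent layer unrolled over time, assuming a simple tanh cell; all names and dimensions are made up for the example:

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, W_y, b_h, b_y):
    """Unroll a simple tanh RNN over the time dimension.

    inputs: array of shape (time_steps, input_dim).
    Returns the outputs at every time-step.
    """
    hidden_dim = W_h.shape[0]
    h = np.zeros(hidden_dim)          # initial hidden state
    outputs = []
    for x_t in inputs:                # one iteration per time-step
        # Recurrent connection: the previous hidden state h influences this step.
        h = np.tanh(W_x @ x_t + W_h @ h + b_h)
        # The output is computed from the hidden state at this time-step.
        outputs.append(W_y @ h + b_y)
    return np.array(outputs)

# Tiny example with made-up dimensions.
rng = np.random.default_rng(0)
T, D, H, O = 5, 3, 4, 2
y = rnn_forward(rng.normal(size=(T, D)),
                rng.normal(size=(H, D)), rng.normal(size=(H, H)),
                rng.normal(size=(O, H)),
                np.zeros(H), np.zeros(O))
print(y.shape)  # (5, 2)
```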
- ISSUES IN RNN:
• Vanishing Gradient Problem:
• RNNs suffer from the problem of vanishing gradients. The gradients carry
information used in the RNN, and when the gradient becomes too small, the
parameter updates become insignificant.
• This makes the learning of long data sequences difficult.
• Exploding Gradient Problem:
• While training a neural network, if the slope tends to grow exponentially instead
of decaying, this is called an Exploding Gradient. This problem arises when large
error gradients accumulate, resulting in very large updates to the neural
network model weights during the training process.
• Long training time, poor performance, and bad accuracy are the major issues in
gradient problems.
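A back-of-the-envelope sketch (my own, not from the notes) of why both problems arise: backpropagation through time multiplies the gradient by roughly the same recurrent factor at every time-step, so the product either shrinks toward zero or blows up:

```python
def gradient_magnitude(recurrent_factor, time_steps):
    """Rough proxy: the gradient scales like factor**time_steps through BPTT."""
    return recurrent_factor ** time_steps

for factor in (0.9, 1.1):
    print(f"factor={factor}: after 100 steps -> {gradient_magnitude(factor, 100):.3e}")
# factor=0.9 -> ~2.656e-05  (vanishing: parameter updates become insignificant)
# factor=1.1 -> ~1.378e+04  (exploding: very large weight updates)
```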
- TRANSFORMER ARCHITECTURE:
• uses an encoder-decoder structure.
• The encoder maps an input sequence to a series of continuous representations.
• The decoder receives the encoder’s output and the decoder’s output at a previous
time step and generates an output sequence.
• The architecture first converts the input data into an n-dimensional embedding,
which is then fed to an encoder.
• The encoder and decoder consist of modules stacked on each other several times.
• The modules include mainly feed-forward and multi-head attention layers.
• Multi-head attention:
- The multi-head attention mechanism enables the model to attend to multiple
parts of the input sequence simultaneously.
• Self-attention:
- Allows the model to relate each word to every other word in the sequence.
• Positional encoding:
- Helps carry information about each token's position in the sentence.
- Helps represent a position pattern that can be learned by the model.
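As an illustrative sketch (my own, not from the notes) of the attention computation these modules are built around, here is scaled dot-product self-attention plus a sinusoidal positional encoding in NumPy; the names and toy dimensions are assumptions chosen for the example:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key positions
    return weights @ V                              # weighted sum of the values

def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding that injects each token's position into its embedding."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

# Toy example: 4 tokens with 8-dimensional embeddings (made-up sizes).
seq_len, d_model = 4, 8
X = np.random.default_rng(0).normal(size=(seq_len, d_model))
X = X + positional_encoding(seq_len, d_model)   # add position information
out = scaled_dot_product_attention(X, X, X)     # self-attention: Q = K = V = X
print(out.shape)  # (4, 8)
```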
- TRANSFORMER CHALLENGES:
• The vanilla Transformer model helps overcome the RNN model’s shortcomings but has
two key issues:
- Limited context dependency:
- The Transformer outperforms the LSTM for character-level language modeling,
but it cannot keep long-term dependency information beyond the configured
context length.
- It cannot correlate with words that appeared several segments ago.
- Context fragmentation:
- The Transformer is trained from scratch for each segment.
- No context information is available for the first few symbols of each segment,
leading to performance issues.