Curriculum: Tuesday, February 15, 2022 3:30 PM
1. Data Dependency
2. Hardware Dependency
3. Training Time
4. Feature Selection
5. Interpretability
There are three variants of gradient descent (batch, stochastic, and mini-batch), which differ in how much
data we use to compute the gradient of the objective function.
Depending on the amount of data, we make a trade-off between the
accuracy of the parameter update and the time it takes to perform an
update.
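A minimal Python sketch of this trade-off, assuming a generic grad_fn(params, examples) helper (hypothetical) that returns the gradient of the loss over the given examples:

```python
import numpy as np

def batch_gd(params, data, grad_fn, lr=0.01, epochs=100):
    # Batch GD: one update per epoch, using the full dataset
    for _ in range(epochs):
        params -= lr * grad_fn(params, data)
    return params

def stochastic_gd(params, data, grad_fn, lr=0.01, epochs=100):
    # SGD: one (noisy but cheap) update per training example
    for _ in range(epochs):
        np.random.shuffle(data)
        for example in data:
            params -= lr * grad_fn(params, [example])
    return params

def minibatch_gd(params, data, grad_fn, lr=0.01, epochs=100, batch_size=32):
    # Mini-batch GD: one update per small batch - the usual compromise
    for _ in range(epochs):
        np.random.shuffle(data)
        for i in range(0, len(data), batch_size):
            params -= lr * grad_fn(params, data[i:i + batch_size])
    return params
```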
In artificial neural networks, each neuron forms a weighted sum of its inputs and
passes the resulting scalar value through a function referred to as an activation
function or transfer function. If a neuron has n inputs x1, x2, ..., xn with weights
w1, w2, ..., wn and a bias b, then the output or activation of the neuron is
a = f(w1x1 + w2x2 + ... + wnxn + b), where f is the activation function.
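A minimal NumPy sketch of this computation; tanh is used here purely as an example activation:

```python
import numpy as np

def neuron(x, w, b, activation=np.tanh):
    # Weighted sum of inputs plus bias, passed through the activation function
    z = np.dot(w, x) + b
    return activation(z)

# Example: a neuron with n = 3 inputs
x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.4, 0.1, -0.7])   # weights
b = 0.2                          # bias
print(neuron(x, w, b))
```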
1. Number of Gates:
• LSTM: Has three gates — input (or update) gate, forget gate, and output gate.
• GRU: Has two gates — reset gate and update gate.
2. Memory Units:
• LSTM: Uses two separate states - the cell state (ct) and the hidden state (ht). The cell
state acts as an "internal memory" and is crucial for carrying long-term dependencies.
• GRU: Simplifies this by using a single hidden state (ht) to both capture and output the
memory.
3. Parameter Count:
• LSTM: Generally has more parameters than a GRU because of its additional gate and
separate cell state. For an input size of d and a hidden size of h, the LSTM has
4 × ((d×h) + (h×h) + h) parameters (see the sketch after this list).
• GRU: Has fewer parameters. For the same sizes, the GRU has 3 × ((d×h) + (h×h) + h) parameters.
4. Computational Complexity:
• LSTM: Due to the extra gate and cell state, LSTMs are typically more computationally
intensive than GRUs.
• GRU: Is simpler and can be faster to compute, especially on smaller datasets or when
computational resources are limited.
5. Empirical Performance:
• LSTM: In many tasks, especially more complex ones, LSTMs have been observed to
perform slightly better than GRUs.
• GRU: Can perform comparably to LSTMs on certain tasks, especially when data is
limited or tasks are simpler. They can also train faster due to fewer parameters.
6. Choice in Practice:
• The choice between LSTM and GRU often comes down to empirical testing. Depending
on the dataset and task, one might outperform the other. However, GRUs, due to their
simplicity, are often the first choice when starting out.
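A small Python sketch of the parameter-count formulas above; the sizes d = 100 and h = 128 are arbitrary examples:

```python
def lstm_param_count(d, h):
    # 4 blocks (input, forget, output gates plus the candidate cell state),
    # each with an input weight matrix (d*h), a recurrent matrix (h*h) and a bias (h)
    return 4 * ((d * h) + (h * h) + h)

def gru_param_count(d, h):
    # 3 blocks (reset gate, update gate, candidate hidden state)
    return 3 * ((d * h) + (h * h) + h)

d, h = 100, 128  # example input and hidden sizes
print("LSTM:", lstm_param_count(d, h))  # 4 * (12800 + 16384 + 128) = 117,248
print("GRU: ", gru_param_count(d, h))   # 3 * (12800 + 16384 + 128) =  87,936
```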
1. Hierarchical Representation
2. Customization for Advanced Tasks
Sequence to Sequence Learning with Neural Networks (Ilya Sutskever et al., 2014) - key details:
Special End-of-Sentence Symbol: Each sentence in the dataset was terminated with a unique
end-of-sentence symbol ("<EOS>"), enabling the model to recognize the end of a sequence.
Dataset: The model was trained on a subset of 12 million sentences, comprising 348 million
French words and 304 million English words, taken from a publicly available dataset.
Reversing Input Sequences: The input sentences (English) were reversed before feeding them
into the model, which was found to significantly improve the model's learning efficiency,
especially for longer sentences.
Word Embeddings: The model used a 1000-dimensional word embedding layer to represent
input words, providing dense, meaningful representations of each word.
Architecture Details: Both the input (encoder) and output (decoder) models had 4 layers, with
each layer containing 1,000 units, showcasing a deep LSTM-based architecture.
Output Layer and Training: The output layer employed a Softmax function to generate the
probability distribution over the target vocabulary. The model was trained end-to-end with
these settings.
Performance - BLEU Score: The model achieved a BLEU score of 34.81, surpassing the baseline
Statistical Machine Translation (SMT) system's score of 33.30 on the same dataset, marking a
significant advancement in neural machine translation.
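A hedged PyTorch sketch of an encoder-decoder with roughly these hyperparameters (4 layers, 1,000 units, 1,000-dimensional embeddings); the vocabulary sizes, class name, and variable names are illustrative assumptions, not taken from the paper:

```python
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=160_000, tgt_vocab=80_000,
                 emb_dim=1000, hidden=1000, layers=4):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)   # 1000-dim word embeddings
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, num_layers=layers, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)           # softmax over target vocabulary

    def forward(self, src_reversed, tgt):
        # src_reversed: source token ids, already reversed as in the paper
        _, state = self.encoder(self.src_emb(src_reversed))
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)  # logits; softmax / cross-entropy applied during training
```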
Example sentence: "The man saw the astronomer with a telescope." (ambiguous: the prepositional phrase can attach to "saw" or to "the astronomer")
Normalization in deep learning refers to the process of transforming data or model outputs to
have specific statistical properties, typically a mean of zero and a variance of one.
What do we normalize? Typically the inputs to the network and the activations of hidden layers
(as in batch normalization and layer normalization).
Why do we normalize?
• Stable Training:
○ Normalization helps to stabilize and accelerate the training process by reducing the
likelihood of extreme values that can cause gradients to explode or vanish.
• Faster Convergence:
○ By normalizing inputs or activations, models can converge more quickly because the
gradients have more consistent magnitudes. This allows for more stable updates
during backpropagation.
• Reduced Internal Covariate Shift:
○ Internal covariate shift refers to the change in the distribution of layer inputs during
training. Normalization techniques, like batch normalization, help to reduce this
shift, making the training process more robust.
• Regularization Effect:
○ Batch normalization uses per-mini-batch statistics, which adds a small amount of noise
to the activations and can act as a mild regularizer (see the sketch below).
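A minimal PyTorch sketch of the two most common cases, standardizing raw inputs and normalizing hidden activations with batch normalization (layer sizes are arbitrary examples):

```python
import torch
import torch.nn as nn

# Standardizing raw input features to zero mean and unit variance
x = torch.randn(32, 10) * 5 + 3                        # batch of 32 examples, 10 features
x_norm = (x - x.mean(dim=0)) / (x.std(dim=0) + 1e-8)

# Normalizing hidden activations with batch normalization inside a network
model = nn.Sequential(
    nn.Linear(10, 64),
    nn.BatchNorm1d(64),   # normalizes each of the 64 activations over the batch
    nn.ReLU(),
    nn.Linear(64, 1),
)
print(model(x_norm).shape)  # torch.Size([32, 1])
```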
Review               Sentiment
Hi Nitish            1
How are you today    0
I am good            0
You?                 1
Embedding dimension - 3
Batch Size - 2
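A hedged Keras sketch of this setup: tokenize the four reviews, pad them to a common length, embed each token into 3 dimensions, and iterate in batches of 2 (the tokenizer and padding choices are illustrative assumptions):

```python
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Embedding

reviews = ["Hi Nitish", "How are you today", "I am good", "You?"]
labels = [1, 0, 0, 1]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(reviews)
seqs = tokenizer.texts_to_sequences(reviews)      # each review becomes a list of word ids
padded = pad_sequences(seqs, padding="post")      # pad to the longest review (length 4)

embedding = Embedding(input_dim=len(tokenizer.word_index) + 1, output_dim=3)  # dim = 3
dataset = tf.data.Dataset.from_tensor_slices((padded, labels)).batch(2)       # batch size = 2

for batch_x, batch_y in dataset:
    print(embedding(batch_x).shape)   # (2, 4, 3): batch, timesteps, embedding dim
```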
Inference
1. Shifting
2. Tokenization
3. Embedding
4. Positional Encoding
Query Sentence: "We are friends" (see the sketch below)
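A minimal NumPy sketch of steps 2-4 for this query sentence (the toy vocabulary, d_model = 8, and random embedding table are illustrative assumptions; step 1, shifting, refers to right-shifting the decoder input and is omitted here):

```python
import numpy as np

# Step 2 - Tokenization: map words to ids using a toy vocabulary
vocab = {"we": 1, "are": 2, "friends": 3}
tokens = [vocab[w] for w in "We are friends".lower().split()]   # [1, 2, 3]

# Step 3 - Embedding: look up a d_model-dimensional vector per token
d_model = 8
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab) + 1, d_model))
embedded = embedding_table[tokens]                              # shape (3, 8)

# Step 4 - Positional Encoding: sinusoidal encoding added to the embeddings
def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

x = embedded + positional_encoding(len(tokens), d_model)        # model input
print(x.shape)   # (3, 8)
```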