Module 2: Training, Optimization and Regularization of Deep Neural Networks

Syllabus

2.1 Training Feedforward DNN: Multi-Layered Feed Forward Neural Network, Learning Factors, Activation functions: Tanh, Logistic, Linear, SoftMax, ReLU, Leaky ReLU, Output function and loss function, Loss functions: Squared Error loss, Cross Entropy.

2.2 Optimization: Learning with backpropagation, Learning Parameters: Gradient Descent (GD), Stochastic and Batch GD, Momentum Based GD, Nesterov Accelerated GD, AdaGrad, Adam, RMSProp.

2.3 Regularization: Overview of Overfitting, Types of biases, Bias-Variance Tradeoff, Regularization Methods: Dropout, Weight Decay, Batch normalization, Early stopping, Parameter sharing, Adding noise to input and output, Augmentation.

2.1.1 Multi-Layered Feed Forward Neural Network

- As shown in Fig. 2.1.1, a multilayer feed-forward network has multiple layers: an input layer, one or more hidden layers, and an output layer.
- The computational units of the hidden layers are known as hidden neurons or hidden units. Before anything reaches the output layer, the hidden layers perform useful intermediary computations.
- A multi-layered network of neurons is composed of many sigmoid neurons. MLNs are capable of handling non-linearly separable data. The layers present between the input and output layers are called hidden layers; they are used to handle the complex, non-linearly separable relations between the input and the output.

[Fig. 2.1.1: Deep feedforward neural network (input layer, hidden layers, output layer)]

2.1.2 Learning Factors

There are many factors that affect neural network learning. Choosing appropriate values for these factors or parameters is an important part of neural network training; appropriate selection of values helps in improving neural network performance. Some of these factors are:

1. Initial weights
2. Choice of activation function
3. Learning constant
4. Network architecture / number of hidden layers

Momentum
- In the momentum method, the weight adjustment at step t includes, in addition to the usual gradient-based correction, a second term α·Δw(t-1): a scaled copy of the most recent weight adjustment, called the momentum term. Here t and t-1 denote the current and the most recent training step, and α is a user-selected positive momentum constant.
- Typically α is chosen between 0.1 and 0.8.

Loss function
- Deep learning neural networks are trained using the stochastic gradient descent optimization algorithm. As part of the optimization algorithm, the error for the current state of the model must be estimated repeatedly. This requires the choice of an error function, conventionally called a loss function, that can be used to estimate the loss of the model so that the weights can be updated to reduce the loss on the next evaluation.
- Neural network models learn a mapping from inputs to outputs from examples, and the choice of loss function must match the framing of the specific predictive modelling problem, such as classification or regression. Further, the configuration of the output layer must also be appropriate for the chosen loss function.
- For example, for regression we use Mean Squared Error, RMSE or MAE.
- For a classification problem, cross entropy would be the appropriate loss function.

Epochs and batch size
- The batch size refers to the number of samples that must be processed before the model is updated. The number of epochs refers to the total number of times the training dataset has been traversed to train a model.
- Experiment with batch sizes and training epochs to see what works best for the model.
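To make these terms concrete, the following minimal NumPy sketch (added for illustration, not part of the original notes) trains a two-layer feed-forward network with mini-batch gradient descent. The layer sizes, the sigmoid hidden activation, the learning constant and the synthetic regression data are illustrative assumptions; the point is how epochs and batch size structure the training loop.

```python
# Minimal sketch (not from the original notes): a two-layer feedforward network
# trained with mini-batch gradient descent, illustrating epochs and batch size.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))                 # 256 samples, 4 input features (assumed)
y = X.sum(axis=1, keepdims=True)              # toy regression target (assumed)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Initial weights (learning factor 1): small random values
W1, b1 = rng.normal(scale=0.1, size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.1, size=(8, 1)), np.zeros(1)

eta, epochs, batch_size = 0.1, 50, 32         # learning constant, epochs, batch size

for epoch in range(epochs):                   # one epoch = one pass over the dataset
    perm = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]  # one mini-batch
        xb, yb = X[idx], y[idx]

        # Forward pass: input layer -> hidden layer (sigmoid) -> output layer (linear)
        h = sigmoid(xb @ W1 + b1)
        y_hat = h @ W2 + b2
        err = y_hat - yb

        # Backward pass: gradients of 0.5 * mean squared error
        dW2 = h.T @ err / len(xb)
        db2 = err.mean(axis=0)
        dh = err @ W2.T * h * (1 - h)
        dW1 = xb.T @ dh / len(xb)
        db1 = dh.mean(axis=0)

        # The model is updated once per batch (this is what "batch size" controls)
        W2 -= eta * dW2; b2 -= eta * db2
        W1 -= eta * dW1; b1 -= eta * db1
```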
2.1.3 Activation Functions

What is an Activation Function?
- An activation function in a neural network defines how the weighted sum of the inputs is transformed into an output of a node.
- Sometimes the activation function is called a "transfer function." If its output range is limited, it may also be called a "squashing function."
- Different activation functions may be used in different parts of the layer or network design, and the choice of activation function affects the capability and performance of the neural network.

2.1.3(C) Types of Activation Functions

Activation functions can be classified as linear or non-linear activation functions.

1. Linear Activation Functions

1. Identity function
- The linear activation function, also known as the identity function, simply passes the weighted net input through: o = f(net) = net. Its range is -∞ to +∞.
- Uses: the linear activation function is used at just one place, i.e., the output layer.
- Limitations: the function has two major problems:
  - Its derivative (i.e., gradient) is a constant and has no relation to the input x, so it is not possible to use gradient-based backpropagation to learn which weights should change to improve the predictions.
  - All layers of the neural network will collapse into one if a linear activation function is used. No matter how many layers the neural network has, the last layer will still be a linear function of the first layer. So a linear activation function turns the neural network into just one layer.

[Fig. 2.1.3: Linear (identity) activation function]

2. Unipolar Binary
- The unipolar binary (step) function produces an output of either 0 or 1.
- Limitations:
  - It cannot provide multi-value outputs; for example, it cannot be used for multi-class classification problems.
  - The gradient of the step function is zero, which causes a hindrance in the backpropagation process.

[Fig. 2.1.4: Unipolar binary activation function]

3. Bipolar Binary
- It is similar to the unipolar binary function; the only difference is that it produces output as either -1 or 1.
- Use: this function is also used for binary classification (Yes/No) problems.
- Limitations:
  - It cannot provide multi-value outputs; for example, it cannot be used for multi-class classification problems.
  - The gradient of the step function is zero, which causes a hindrance in the backpropagation process.
- Although linear transformations make the neural network simpler, such a network is less powerful and will not be able to learn the complex patterns in the data.

2. Non-Linear Activation Functions

Tanh
- Tanh takes a real-valued number and "squashes" it into the range between -1 and 1.
- Because its output is zero-centered, it is preferred over the sigmoid.
- Its disadvantage is that it suffers from the vanishing gradient problem.

ReLU

[Fig. 2.1.8: ReLU activation function]

- Most modern deep NNs use ReLU activations.
- At first glance, after plotting ReLU it seems to be a linear function. But in fact it is a non-linear function, and non-linearity is required so as to pick up and learn complex relationships from the training data.
- It acts as a linear function for positive values and as a non-linear activation function for negative values.
- When we use an optimizer such as SGD (Stochastic Gradient Descent) during backpropagation, ReLU acts like a linear function for positive values, and thus it becomes a lot easier to compute the gradient. This linearity preserves properties that make linear models easy to optimize with gradient-based methods, while the non-linearity preserves sensitivity to the weighted sum and keeps neurons from getting saturated (i.e., a state where there is little or no variation in the output).

General factors in choosing an activation function:
- Robust to the vanishing gradient problem: neural networks are trained using gradient descent, which includes the backward propagation step, basically a chain rule applied to get the change in weights needed to reduce the loss after every epoch. The activation function must withstand the vanishing gradient problem.
- Range: the range of the output generated by an activation function is an important factor when choosing an activation function for any application.
- Non-linearity: non-linear activation functions are preferred over linear activation functions for solving complex problems.
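To make the shapes of these functions concrete, here is a small NumPy sketch (added for illustration, not part of the original notes) implementing the activations discussed above; the sample input values are arbitrary.

```python
# Illustrative sketch (not from the original notes): the activation functions
# discussed above, implemented with NumPy.
import numpy as np

def identity(x):          # linear / identity: range (-inf, +inf)
    return x

def unipolar_binary(x):   # step function: outputs 0 or 1
    return np.where(x >= 0, 1.0, 0.0)

def bipolar_binary(x):    # step function: outputs -1 or 1
    return np.where(x >= 0, 1.0, -1.0)

def sigmoid(x):           # logistic: squashes input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):              # squashes input into (-1, 1), zero-centered
    return np.tanh(x)

def relu(x):              # identity for positive inputs, 0 for negative inputs
    return np.maximum(0.0, x)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])   # arbitrary sample inputs
for f in (identity, unipolar_binary, bipolar_binary, sigmoid, tanh, relu):
    print(f.__name__, f(z))
```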
2.2 Loss Functions

2.2.1 What is a Loss Function?

- A loss function is a method of evaluating how well your algorithm models your dataset. In terms of optimization techniques, the function which is used to evaluate a solution is referred to as the objective function; normally we want to maximize or minimize the objective function so as to get the highest or lowest score respectively.
- Typically, for a deep learning neural network, we want to minimize the error value, and hence the objective function here is known as a cost function or a loss function, and the value of this objective function is simply referred to as the "loss".

2.2.2 Types of Loss Functions

Depending upon the nature of the problem, we need to select the appropriate loss function.

Regression loss functions: for regression, squared-error losses (MSE, RMSE) and MAE are used to evaluate the model.

Classification loss functions: cross-entropy loss is used, defined as

    CE = - Σ_{c=1}^{M} y_{o,c} · log(p_{o,c})

where
- M = number of classes (dog, cat, fish, ...)
- y_{o,c} = binary indicator (0 or 1) of whether class label c is the correct classification for observation o
- p_{o,c} = predicted probability that observation o is of class c

2.2.2(C) Choosing Output Function and Loss

- Choosing an output function and a loss function depends on the nature of the problem to be solved. Depending upon whether it is classification or regression, there are different suitable output activation functions and loss functions to be used: classification is for discrete values, regression for continuous values.

Case 1: When the output is a numerical value that you are trying to predict (regression)
- When the problem to be solved is a regression problem, for example predicting the price of a house given different features of the house, we construct a neural network structure where the final output layer consists of only one neuron that gives the numerical value. For computing the accuracy score, the predicted values are compared to the true numeric values.
- In such cases, the activation function in the output layer will be linear.
- The loss function to be used in this case may be MAE, MSE or RMSE.

Case 2: When the output is a class label (classification)
- There are two types of classification problems: binary classification and multi-class classification.

2.2.2(E) SoftMax Output Function

- SoftMax converts a real vector to a vector of categorical probabilities. The elements of the output vector are in the range (0, 1) and sum to 1. Each vector is handled independently.
- SoftMax is often used as the activation for the last layer of a classification network because the result can be interpreted as a probability distribution. Therefore, SoftMax is mostly used for multi-class classification.
- Mathematically, it can be represented as

    SoftMax(z)_i = e^{z_i} / Σ_{j=1}^{K} e^{z_j}

  where z is the input vector, e^{z_i} is the standard exponential function applied to element i of the input vector, K is the number of classes in the multi-class classifier, and the denominator sums the exponentials over all elements of the vector so that the output is normalized.

Returning to the learning factors of Section 2.1.2, two of them deserve more detail.

2. Choice of activation function
- The choice of activation functions is a critical part of neural network design.
- The choice of activation function in the hidden layers controls how well the network model learns the training dataset.
- The choice of activation function in the output layer defines the type of predictions the model can make.
- For example, in the case of the perceptron, the choice of the sign activation function is motivated by the fact that a binary class label needs to be predicted.
- However, it is possible to have other types of situations where different target variables may be predicted. For example, if the target variable to be predicted is real, then it makes sense to use the identity activation function, and the resulting algorithm is the same as least-squares regression.
- If it is desirable to predict a probability of a binary class, it makes sense to use a sigmoid function for activating the output node.
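The pairing of output function and loss described above can be made concrete with a short NumPy sketch (added for illustration; the example logits, labels and target values are assumptions, not from the notes): a linear output with MSE for regression, and a SoftMax output with cross-entropy for multi-class classification.

```python
# Illustrative sketch (not from the original notes): output function / loss pairings.
import numpy as np

def softmax(z):
    # Subtracting the max is a standard numerical-stability trick; the result
    # still equals e^{z_i} / sum_j e^{z_j}.
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(y_true, p_pred):
    # CE = -sum_c y_{o,c} * log(p_{o,c}), averaged over observations.
    return -np.mean(np.sum(y_true * np.log(p_pred + 1e-12), axis=-1))

def mse(y_true, y_pred):
    # Squared-error loss for regression (single linear output neuron).
    return np.mean((y_true - y_pred) ** 2)

# Multi-class classification: SoftMax output + cross-entropy loss
logits = np.array([[2.0, 1.0, 0.1]])      # raw scores for 3 classes (assumed values)
y_onehot = np.array([[1.0, 0.0, 0.0]])    # correct class is class 0
p = softmax(logits)
print("probabilities:", p, "CE loss:", cross_entropy(y_onehot, p))

# Regression: linear output + MSE loss
print("MSE loss:", mse(np.array([3.2]), np.array([2.9])))
```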
3. Learning rate / learning constant
- The effectiveness and convergence of the learning algorithm depend significantly on the value of the learning constant η.
- However, the optimum value of η depends on the problem being solved, and there is no single learning constant value suitable for different training cases.
- When broad minima yield small gradient values, a large value of η gives more rapid convergence; for problems with steep and narrow minima, a small value of η must be chosen to avoid overshooting the minimum. A small η approximates true gradient descent more closely, but the price of this guarantee is an increased number of training steps. A wide range of values has been reported as successful for many computational backpropagation experiments, so η is usually tuned per problem.

Among the difficulties encountered when optimizing deep networks are the following.

4. Cliffs and Exploding Gradients
- Neural networks with many layers often have extremely steep regions resembling cliffs, as shown in Fig. 2.3.4.
- The objective function for highly non-linear deep neural networks often contains sharp non-linearities in parameter space resulting from the multiplication of several parameters. These non-linearities give rise to very high derivatives in some places. When the parameters get close to such a cliff region, a gradient descent update can catapult the parameters very far, possibly losing most of the optimization work that had been done.

[Fig. 2.3.4: Objective function with a cliff region]

5. Inexact Gradients
- Most optimization algorithms are designed with the assumption that we have access to the exact gradient or Hessian matrix. In practice, we usually only have a noisy or even biased estimate of these quantities. The inexact gradients introduce a second layer of uncertainty into the model.
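Before turning to the gradient descent variants in the next subsection, a minimal sketch (added for illustration, not from the notes) of the basic update rule w ← w - η·dL/dw on a simple quadratic shows how the learning constant η governs convergence: a small η converges slowly but surely, while a too-large η overshoots, a milder version of the "catapult" effect a cliff region produces.

```python
# Illustrative sketch (not from the original notes): effect of the learning
# constant eta on plain gradient descent for L(w) = w**2 (gradient 2*w).
def gradient_descent(eta, steps=20, w=5.0):
    for _ in range(steps):
        grad = 2.0 * w          # dL/dw for L(w) = w^2
        w = w - eta * grad      # first-order update: only the gradient is used
    return w

for eta in (0.01, 0.1, 0.9, 1.1):
    print(f"eta={eta:4}: w after 20 steps = {gradient_descent(eta):.4f}")
# Small eta: slow but steady progress toward the minimum at w = 0.
# eta = 1.1: each step overshoots and the iterate grows without bound,
# analogous to an update catapulting the parameters near a cliff.
```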
2.3.3 Gradient Descent Approaches: Stochastic, Batch, Mini-Batch

- Gradient descent is a first-order optimization algorithm. This means it only takes into account the first-order derivative of the cost function when performing the updates on the parameters.

Stochastic Gradient Descent (SGD)
- SGD updates the parameters from one data point at a time, so it has huge oscillation: the direction varies from point to point for each and every dataset, it is tough to reach the absolute minimum, and we can end up hovering among multiple minima points.
- We need to control the learning rate: if the learning rate is too high, another dataset may not show the same behaviour; still, the effect of the learning rate in SGD is somewhat smaller than in BGD and MGD.

Batch Gradient Descent (BGD)
- When we train the model to optimize the loss function using the mean of all the individual losses over our whole dataset, it is called Batch Gradient Descent.

Advantages of Batch Gradient Descent (BGD)
- It is more computationally efficient.
- Whenever we try to calculate a new weight, we consider all the data available to us through the summation of the losses, so we actually derive the new value of the weight/bias (a learnable parameter) from the full dataset.

Disadvantages of Batch Gradient Descent (BGD)
- Memory consumption is too high: we send all the data through the network one by one, so we need memory to store the loss obtained in each and every iteration; only once we are done passing all the data through the network do we calculate the total loss. In this case memory consumption is very high, the computation is heavy, and the calculation is slow and somewhat inefficient.

Mini-batch Gradient Descent (MGD)
- Instead of a single example (SGD) or the full dataset (BGD), mini-batch gradient descent uses the mean loss of a batch of 10-1000 examples to perform each update.

2.3.4 Momentum based GD

- While using MGD, since we take records in batches, it might happen that in some batches we get one error and in some other batches we get some other error. So we have to control the learning rate whenever we use MGD: if the learning rate is very low, the convergence is slow; if it is too high, we won't settle at an absolute global or local minimum.
- Note: if the batch size equals the total number of data points, then BGD = MGD.
- Gradient descent with momentum uses the momentum of the gradient for parameter optimization.
- Parameter update in GD with momentum at iteration t:

    θ_t = θ_{t-1} - v_t,   where   v_t = β·v_{t-1} + α·∇L(θ_{t-1})

  and θ_{t-1} are the parameters from the previous iteration t-1. The term v_t is called momentum.
- Compare with vanilla GD: θ_t = θ_{t-1} - α·∇L(θ_{t-1}).
- The momentum term accumulates the gradients from the past several steps; unrolling the recursion,

    v_t = β·v_{t-1} + α·∇L_{t-1}
        = β²·v_{t-2} + β·α·∇L_{t-2} + α·∇L_{t-1}
        = β³·v_{t-3} + β²·α·∇L_{t-3} + β·α·∇L_{t-2} + α·∇L_{t-1} = ...

- This term is analogous to the momentum of a heavy ball rolling down a hill.
- The parameter β is referred to as the coefficient of momentum. A typical value of β is 0.9.
- This method updates the parameters in the direction of the weighted average of the past gradients.
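A compact sketch of the update rule above (added for illustration; the quadratic objective and the constants α = 0.01, β = 0.9 are assumptions): each step blends the fresh gradient with the accumulated velocity v_t.

```python
# Illustrative sketch (not from the original notes): GD with momentum,
#   v_t = beta*v_{t-1} + alpha*grad(theta_{t-1});  theta_t = theta_{t-1} - v_t,
# applied to a simple quadratic objective L(theta) = 0.5 * theta^T A theta.
import numpy as np

A = np.diag([1.0, 50.0])          # ill-conditioned quadratic (assumed example)
grad = lambda theta: A @ theta    # gradient of L

theta = np.array([5.0, 5.0])
v = np.zeros_like(theta)
alpha, beta = 0.01, 0.9           # learning rate and coefficient of momentum

for t in range(200):
    v = beta * v + alpha * grad(theta)   # accumulate (decay-weighted) past gradients
    theta = theta - v                    # move along the weighted average direction
print("theta after 200 steps:", theta)   # approaches the minimum at the origin
```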
Adam
- Similar to GD with momentum, Adam computes a weighted average of past gradients (the first moment of the gradient), i.e., v_t = β₁·v_{t-1} + (1 - β₁)·∇L(θ_{t-1}).
- It additionally keeps a weighted average of past squared gradients (the second moment of the gradient), and adapts each parameter's learning rate accordingly.

AdaGrad
- AdaGrad adapts the learning rate to each parameter based on the iteration: it makes smaller updates (i.e., low learning rates) for parameters that have been updated frequently, and larger updates (i.e., high learning rates) for parameters updated infrequently.
- It works by accumulating, for each parameter, the sum of the squared partial derivatives seen so far, dividing the learning rate for that parameter by the square root of this sum, and then using the calculated step size to make the update.
- This makes it well suited for optimization problems that have a lot of sparse features.

2.3.8 RMSProp
- RMSProp extends AdaGrad to avoid the effect of a monotonically decreasing learning rate.
- Another adaptive learning-rate optimization algorithm, Root Mean Square Prop (RMSProp), works by keeping an exponentially weighted average of the squares of past gradients. RMSProp then divides the learning rate by this average to speed up convergence.
- RMSProp can be thought of as an extension of AdaGrad in that it uses a decaying (moving) average of the partial derivatives instead of their sum in the calculation of the learning rate for each parameter.
- Using a decaying moving average of the partial derivatives allows the search to forget early partial derivatives and focus on the most recently seen shape of the search space.
- Update rule:

    s_dW = β·s_dW + (1 - β)·(∂L/∂W)²
    W    = W - α·(∂L/∂W) / (√s_dW + ε)

  where
  - s_dW : the exponentially weighted average of past squares of gradients
  - ∂L/∂W : the partial derivative of the cost with respect to the current weights
  - W : the weight tensor
  - β : a hyperparameter to be tuned
  - α : the learning rate
  - ε : a very small value to avoid dividing by zero

2.4 Regularization

- Regularization is a set of strategies used in machine learning to reduce the generalization error (overfitting).
- Regularization is one of the most important concepts of machine learning. It is a technique to prevent the model from overfitting by adding extra information to it.
- Variance is a value which tells us the spread of the model's predictions around a point; a high-variance model fits too closely to the training data and does not generalize well on new data, while a high-bias model oversimplifies and underfits. A multi-dimensional system can suffer from a high-bias and a high-variance issue at the same time.

2.4.4 Ways to Overcome Underfitting (High Bias)

- Increase the model size (such as the number of neurons/layers): this allows the model to fit the training set better. If you find that this increases variance, then use regularization, which will usually eliminate the increase in variance.
- Modify input features based on insights from error analysis: create additional features that help the algorithm eliminate a particular category of errors. These new features could help with both bias and variance.
- Reduce or eliminate regularization (L2 regularization, L1 regularization, dropout): this reduces avoidable bias, but increases variance.
- Modify the model architecture (such as the neural network architecture) so that it is more suitable for your problem: this can affect both bias and variance.

2.4.5 Ways to Overcome Overfitting (High Variance)

- Add more training data: the simplest and most reliable way to address variance, so long as you have access to significantly more data and enough computational power to process the data.

2.5 Bias-Variance Trade-off

- To build a good model that generalizes, we need to find a good balance between bias and variance such that it minimizes the total error.

[Fig. 2.5.1: Bias-variance trade-off (total error vs. algorithm complexity)]

2.6 Regularization Methods

2.6.1 L2 Regularization
- Consider a curve that shows the loss for both the training set and the validation set: at some point the validation loss goes up even while the training loss keeps falling, which means the model is overfitting. To counter this, a penalty (regularization) term is added to the loss.
- Ridge regression adds the "squared magnitude" of the coefficients as a penalty term to the loss function; the second term below is the L2 regularization element.

    Cost = Σ_{i=1}^{n} ( y_i - Σ_{j=1}^{p} x_{ij}·β_j )² + λ·Σ_{j=1}^{p} β_j²

- However, if lambda is very large then it will add too much weight and lead to under-fitting, so it is important how lambda is chosen. This technique works very well to avoid the over-fitting issue.
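A small sketch (added for illustration; the data, the candidate coefficients and the λ values are arbitrary assumptions) of how the L2 penalty above is computed and added to a squared-error data loss.

```python
# Illustrative sketch (not from the original notes): adding an L2 (ridge)
# penalty, lambda * sum(beta_j^2), to a squared-error data loss.
import numpy as np

def ridge_cost(X, y, beta, lam):
    residual = y - X @ beta                 # y_i - sum_j x_ij * beta_j
    data_loss = np.sum(residual ** 2)       # squared-error term
    l2_penalty = lam * np.sum(beta ** 2)    # L2 regularization element
    return data_loss + l2_penalty

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
beta = np.array([0.8, -1.5, 0.9])           # candidate coefficients (assumed)

for lam in (0.0, 1.0, 10.0):
    print(f"lambda={lam:4}: cost = {ridge_cost(X, y, beta, lam):.3f}")
# Larger lambda penalizes large coefficients more strongly; if lambda is made
# very large the penalty dominates, pushing the weights toward zero (under-fitting).
```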
2.6.2 L1 Regularization
- A regression model that uses the L1 regularization technique is called Lasso Regression. Lasso regression adds the "absolute value of magnitude" of the coefficients as a penalty term to the loss function:

    Cost = Σ_{i=1}^{n} ( y_i - Σ_{j=1}^{p} x_{ij}·β_j )² + λ·Σ_{j=1}^{p} |β_j|

- The key difference between these techniques is that Lasso shrinks the less important features' coefficients to zero, thus removing some features altogether. So this works well for feature selection in case we have a huge number of features.

Example of L2 regularization
- For example, consider a linear model with the following weights: w1 = 0.2, w2 = 0.5, w3 = 5, w4 = 1, w5 = 0.25, w6 = 0.75. Its L2 regularization term is 0.2² + 0.5² + 5² + 1² + 0.25² + 0.75² = 26.915, of which the single large weight w3 alone contributes 25.

Weight decay
- To solve complex problems we need complex solutions, and having many parameters allows more interactions between the various parts of our neural network; but we do not want those interactions to get out of hand. Hence, what if we penalize complexity? That would keep our model from getting too complex. This is how the idea of weight decay arises: add all our parameters (weights) to our loss function. That won't work directly, because some weights are positive and some are negative, so what if we add the squares of all the parameters? But then the loss could get so huge that the penalty dominates, so the sum of squares is scaled by a small coefficient. In effect, weights that are large are penalized most strongly.

2.6.5 Batch Normalization
- Batch normalization is a process to make neural networks faster and more stable through adding extra layers in a deep neural network.
- The new layer performs the standardizing and normalizing operations on the input of a layer coming from a previous layer.
- It acts similarly to the data pre-processing steps mentioned earlier.
- BatchNorm layers calculate the mean μ and variance σ² of a batch of input data, and normalize the data x to zero mean and unit variance, i.e., x̂ = (x - μ) / σ.
- BatchNorm layers alleviate the problems of proper initialization of the parameters and hyper-parameters, result in faster convergence of training, allow larger learning rates, and reduce the internal covariate shift.
- BatchNorm layers are inserted immediately after convolutional layers or fully-connected layers, and before the non-linearities. They are very common with convolutional NNs.

Advantages of Batch Normalization
- Speeds up the training: by normalizing the hidden-layer activations, batch normalization speeds up the training process.
- Handles internal covariate shift: it solves the problem of internal covariate shift, ensuring that the distribution of inputs to each layer stays stable during training.

Early stopping
- A major challenge in training neural networks is deciding how long to train them. Too little training will mean that the model will underfit the train and the test sets. Too much training will mean that the model will overfit the training dataset and have poor performance on the test set.
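The usual remedy, sketched below (added for illustration; the patience value and the validation-loss sequence are assumptions), is to monitor the validation loss during training and stop once it has not improved for a number of epochs.

```python
# Illustrative sketch (not from the original notes): early stopping driven by
# a validation loss checked after every epoch.
def train_with_early_stopping(val_losses, patience=3):
    """Stop when the validation loss has not improved for `patience` epochs."""
    best_loss, best_epoch, waited = float("inf"), 0, 0
    for epoch, val_loss in enumerate(val_losses):  # one entry per epoch
        if val_loss < best_loss:
            best_loss, best_epoch, waited = val_loss, epoch, 0
            # In a real run the model weights would be checkpointed here.
        else:
            waited += 1
            if waited >= patience:                 # no improvement for `patience` epochs
                print(f"stopping at epoch {epoch}, best epoch was {best_epoch}")
                break
    return best_epoch, best_loss

# Assumed validation-loss curve: improves, then starts to rise (overfitting).
curve = [0.90, 0.70, 0.55, 0.50, 0.48, 0.49, 0.52, 0.55, 0.60]
print(train_with_early_stopping(curve, patience=3))
```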