ANN Unit-2

The document outlines the principles and architectures of shallow neural networks, emphasizing their role in binary and multiclass classification, as well as autoencoders. It discusses the relationship between classical machine learning models and shallow neural networks, highlighting how minor architectural changes can yield different models. Additionally, it covers the advantages of deep learning, the structure of autoencoders, and the importance of hyperparameters in training these models.

Artificial Neural Network



Syllabus
UNIT 2: Shallow Neural Networks (9 hours, P: 6 hours)
Neural Architectures for Binary Classification Models, Neural Architectures for Multiclass Models, Autoencoder: Basic Principles, Neural Embedding with Continuous Bag of Words, Simple Neural Architectures for Graph Embeddings



Neural Networks and Machine Learning

• Neural networks are optimization-based learning models.


• Many classical machine learning models use continuous optimization:
– SVMs, Linear Regression, and Logistic Regression
– Singular Value Decomposition
– (Incomplete) Matrix factorization for Recommender Systems
• All these models can be represented as special cases of shallow neural networks!



The Continuum Between Machine Learning and Deep Learning

• Classical machine learning models reach their learning capacity early because they are equivalent to simple (shallow) neural networks.
• When we have more data, we can add more computational units to improve performance.



The Deep Learning Advantage

• Exploring the neural models for traditional machine learning is useful because it exposes the cases in which deep learning has an advantage:
  – Add capacity with more nodes for more data.
  – Controlling the structure of the architecture provides a way to incorporate domain-specific insights (e.g., recurrent networks and convolutional networks).
• In some cases, making minor changes to the architecture leads to interesting models:
  – Adding a sigmoid/softmax layer in the output of a neural model for (linear) matrix factorization can result in logistic/multinomial matrix factorization (e.g., word2vec).



Neural Architectures for Binary Classification Models
Recap: Perceptron versus Linear Support Vector Machine

• The perceptron criterion is a minor variation of the hinge loss, with the identical update W ⇐ W + αyX in both cases.
• We update only for misclassified instances in the perceptron, but also update for "marginally correct" instances (those inside the margin) in the SVM.
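A minimal NumPy sketch of the two update rules, highlighting the single difference: the perceptron updates only when y(W · X) ≤ 0, while the hinge (SVM) criterion also updates when the point is inside the margin. The data point and learning rate below are illustrative assumptions.

import numpy as np

def perceptron_update(W, X, y, lr=0.1):
    # Update only on misclassified points: y * (W . X) <= 0
    if y * np.dot(W, X) <= 0:
        W = W + lr * y * X
    return W

def hinge_update(W, X, y, lr=0.1):
    # Update on misclassified AND marginally correct points: y * (W . X) < 1
    if y * np.dot(W, X) < 1:
        W = W + lr * y * X
    return W

# Illustrative point that is correctly classified but inside the margin
W = np.array([0.2, 0.1])
X = np.array([1.0, 2.0])
y = +1                                     # y * (W . X) = 0.4: perceptron skips, SVM updates
print(perceptron_update(W.copy(), X, y))   # unchanged
print(hinge_update(W.copy(), X, y))        # weights updated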
What About the Kernel SVM?

• RBF network for unsupervised feature engineering:
  – Unsupervised feature engineering is good for noisy data.
  – Supervised feature engineering (with deep learning) is good for learning rich structure.



Much of Machine Learning is a Shallow Neural Model

By making minor changes to the architecture of the perceptron, we can obtain:
  – Linear regression, the Fisher discriminant, and Widrow-Hoff learning ⇒ linear activation in the output node
  – Logistic regression ⇒ sigmoid activation in the output node
  – Multinomial logistic regression ⇒ softmax activation in the final layer
  – Singular value decomposition ⇒ linear autoencoder
  – Incomplete matrix factorization for recommender systems ⇒ autoencoder-like architecture with a single hidden layer (also used in word2vec)



Why Do We Care about These Connections?

• The connections tell us in which cases it makes sense to use conventional machine learning:
  – With limited, noisy data, conventional machine learning is preferable.
  – With a lot of data containing rich structure, neural networks are preferable.
  – Structure is often learned by using deep neural architectures.
• Architectures like convolutional neural networks can use domain-specific insights.



Widrow-Hoff Rule: The Neural Avatar of Linear Regression

• The perceptron (1958) was historically followed by Widrow-Hoff Learning (1960).


• Identical to linear regression when applied to numerical targets.
– Originally proposed by Widrow and Hoff for binary targets (not natural for regression).
• The Widrow-Hoff method, when applied to mean-centered features and mean-centered
binary class encoding, learns the Fisher discriminant.



Linear Regression: An Introduction

• In linear regression, we have training pairs (Xi, yi) for i ∈ {1 . . . n}, where Xi contains d-dimensional features and yi is a numerical target.
• We use a linear parameterized function ŷi = W · Xi to predict the target.
• The goal is to learn W so that the sum of squared differences between the observed yi and the predicted ŷi is minimized over the entire training data.
• A closed-form solution exists, but it requires the inversion of a potentially large matrix, so gradient-descent (neural) training is often preferred.
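A small NumPy sketch contrasting the closed-form least-squares solution with iterative gradient descent (the neural view of the same problem); the synthetic data, learning rate, and iteration count are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

# Closed form: requires solving a d x d linear system (matrix inversion)
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on the squared-error loss
w = np.zeros(d)
lr = 0.01
for _ in range(2000):
    grad = 2 * X.T @ (X @ w - y) / n   # gradient of the mean squared error
    w -= lr * grad

print(np.allclose(w_closed, w, atol=1e-2))   # both recover (approximately) the same W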



Linear Regression with Numerical Targets: Neural Model



Widrow-Hoff: Linear Regression with Binary Targets



Comparison of Widrow-Hoff with Perceptron and SVM

Convert the binary loss function and updates to a form more easily comparable to the perceptron by using the fact that yi² = 1 for yi ∈ {−1, +1}.



Connections with the Fisher Discriminant

• Consider a binary classification problem with training instances (Xi, yi), where yi ∈ {−1, +1}.
• Mean-center each feature vector by subtracting the mean of the features.
• Mean-center the binary class by subtracting the mean of the targets from each yi.
• Use the delta rule (Widrow-Hoff) for learning; the resulting W is the Fisher discriminant direction.


Neural Models for Logistic Regression

• Consider the training pair (Xi, yi), with d-dimensional feature variables in Xi and a class variable yi ∈ {−1, +1}.
• In logistic regression, the sigmoid function is applied to W · Xi, which predicts the probability ŷi that yi is +1.
• We want to maximize ŷi for positive-class instances and 1 − ŷi for negative-class instances.
  – Same as minimizing −log(ŷi) for positive instances and −log(1 − ŷi) for negative instances.
  – Same as minimizing the loss Li = −log(|yi/2 − 0.5 + ŷi|).
  – An equivalent form of the loss is Li = log(1 + exp[−yi(W · Xi)]).
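A quick NumPy check, on arbitrary illustrative values, that the three ways of writing the logistic loss above agree.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
W = rng.normal(size=3)

for _ in range(5):
    X = rng.normal(size=3)
    y = rng.choice([-1.0, 1.0])
    y_hat = sigmoid(np.dot(W, X))          # predicted probability that y = +1

    nll   = -np.log(y_hat) if y == 1 else -np.log(1 - y_hat)
    form2 = -np.log(abs(y / 2 - 0.5 + y_hat))
    form3 = np.log(1 + np.exp(-y * np.dot(W, X)))

    print(np.allclose([nll, form2], form3))   # True: all three forms coincide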



Interpreting the Logistic Update

• An important multiplicative factor in the update increment is 1/(1 + exp[yi(W · Xi)]).
• This factor equals 1 − ŷi for positive instances and ŷi for negative instances ⇒ the probability of a mistake!
• Interpretation: W ⇐ W + α [Probability of mistake on (Xi, yi)] (yi Xi)



Comparing Updates of Different Models

The unregularized updates of the perceptron, SVM, Widrow-Hoff, and logistic regression can all be written in the form:
  W ⇐ W + α yi δ(Xi, yi) Xi
• The quantity δ(Xi, yi) is a mistake function:
  – The raw mistake value (1 − yi(W · Xi)) for Widrow-Hoff
  – An indicator of whether −yi(W · Xi) > 0 for the perceptron
  – A margin/mistake indicator of whether (1 − yi(W · Xi)) > 0 for the SVM
  – The probability of a mistake on (Xi, yi) for logistic regression
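A NumPy sketch of this unified update, with δ swapped per model exactly as listed above; the starting weights, data point, and learning rate are illustrative assumptions.

import numpy as np

def delta(model, W, X, y):
    margin = y * np.dot(W, X)
    if model == "widrow-hoff":           # raw mistake value
        return 1.0 - margin
    if model == "perceptron":            # indicator of a misclassification
        return 1.0 if -margin > 0 else 0.0
    if model == "svm":                   # indicator of a margin violation
        return 1.0 if 1.0 - margin > 0 else 0.0
    if model == "logistic":              # probability of a mistake
        return 1.0 / (1.0 + np.exp(margin))
    raise ValueError(model)

def sgd_step(model, W, X, y, lr=0.1):
    return W + lr * y * delta(model, W, X, y) * X

W0 = np.array([0.3, -0.2])
X, y = np.array([1.0, 1.0]), +1.0
for m in ["widrow-hoff", "perceptron", "svm", "logistic"]:
    print(m, sgd_step(m, W0, X, y))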



Comparing Loss Functions of Different Models



Autoencoders

▪ Autoencoders are a specific type of feedforward neural network in which the output is trained to reproduce the input.
▪ They compress the input into a lower-dimensional code and then reconstruct the output from this representation.
▪ The code is a compact “summary” or “compression” of the input, also called the latent-space representation.
▪ An autoencoder consists of 3 components: encoder, code, and decoder.
▪ The encoder compresses the input and produces the code; the decoder then reconstructs the input using only this code.



Architecture
• Both the encoder and the decoder are fully connected feedforward neural networks.
• The code is a single layer of the network, with a dimensionality of our choice.
• The number of nodes in the code layer (the code size) is a hyperparameter that we set before training the autoencoder.





Encoder: The encoder is a feedforward, fully connected neural network that compresses the input into a latent-space representation, encoding the input (e.g., an image) as a compressed representation in a reduced dimension. The compressed representation is a distorted version of the original input.
Code: This part of the network contains the reduced representation of the input that is fed into the decoder.
Decoder: The decoder is also a feedforward network with a structure similar to the encoder. It is responsible for reconstructing the input back to its original dimensions from the code.
Latent space: An abstract multi-dimensional space that encodes a meaningful internal representation of externally observed events.



Hyperparameters of Autoencoders

There are 4 hyperparameters that we need to set before training an autoencoder:
1. Code size: the number of nodes in the middle layer. A smaller code size results in more compression.
2. Number of layers: the autoencoder can consist of as many layers as we want.
3. Number of nodes per layer: the number of nodes per layer decreases with each subsequent layer of the encoder and increases back in the decoder; the decoder is symmetric to the encoder in terms of layer structure.
4. Loss function: we use either mean squared error or binary cross-entropy. If the input values are in the range [0, 1], we typically use cross-entropy; otherwise, we use mean squared error.
Autoencoders are trained the same way as other ANNs, via backpropagation; a minimal sketch follows below.
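A minimal sketch of such an autoencoder using Keras (assuming TensorFlow is installed). The layer sizes, the code size of 32, the activation choices, the 784-dimensional input, and the random placeholder data are illustrative assumptions, not values from the slides.

import numpy as np
from tensorflow.keras import layers, models

input_dim = 784   # e.g., flattened 28x28 images (assumption)
code_size = 32    # hyperparameter 1: number of nodes in the middle layer

autoencoder = models.Sequential([
    layers.Input(shape=(input_dim,)),
    layers.Dense(128, activation="relu"),          # encoder hidden layer
    layers.Dense(code_size, activation="relu"),    # the code (latent representation)
    layers.Dense(128, activation="relu"),          # decoder hidden layer (symmetric)
    layers.Dense(input_dim, activation="sigmoid"), # reconstruction of the input
])

# Inputs scaled to [0, 1], so binary cross-entropy is a reasonable loss choice
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

x = np.random.rand(256, input_dim).astype("float32")    # placeholder data (assumption)
autoencoder.fit(x, x, epochs=2, batch_size=64, verbose=0)   # target == input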



Autoencoder – Example

▪ Two input neurons
▪ One hidden layer with two neurons (encoder)
▪ Latent layer with two neurons
▪ One hidden layer with two neurons (decoder)
▪ Two output neurons (same as the input)



Autoencoder – Forward Pass

▪ Encoder: Z = f_encoder(x; W1, b1) = σ(W1 x + b1)
  where W1 are the encoder weights, b1 the encoder biases, and σ the activation function (e.g., ReLU or sigmoid).
▪ Decoder: x̂ = f_decoder(Z; W2, b2) = σ(W2 Z + b2)
  where W2 are the decoder weights and b2 the decoder biases.
▪ Loss function: L = Σi (x̂i − xi)²

Autoencoder – Backpropagation

▪ Decoder gradients (applying the chain rule):
  ∂L/∂W2 = (∂L/∂x̂) · (∂x̂/∂(W2 Z)) · (∂(W2 Z)/∂W2) = [2(x̂ − x) ⊙ σ′(W2 Z + b2)] Z^T
  Similarly for the bias: ∂L/∂b2 = 2(x̂ − x) ⊙ σ′(W2 Z + b2)
▪ Encoder gradients:
  ∂L/∂W1 = (∂L/∂x̂) · (∂x̂/∂Z) · (∂Z/∂(W1 x)) · (∂(W1 x)/∂W1)
         = [(W2^T [2(x̂ − x) ⊙ σ′(W2 Z + b2)]) ⊙ σ′(W1 x + b1)] x^T
  Similarly for the bias (replace x^T by 1 in the equation above).
Autoencoder – Weight Updates

Decoder weights:
  W2 ← W2 − α ∂L/∂W2 ;  b2 ← b2 − α ∂L/∂b2
Encoder weights:
  W1 ← W1 − α ∂L/∂W1 ;  b1 ← b1 − α ∂L/∂b1
This process propagates errors from the output back through the decoder to the encoder, ensuring the autoencoder improves at reconstructing the input.

Autoencoder – with L2 Norm

New loss function: L = Σi (x̂i − xi)² + λ(‖W1‖² + ‖W2‖²)
  where λ is the regularization parameter (it controls the strength of the penalty).
Decoder weights:
  W2 ← W2 − α (∂L/∂W2 + λ W2)
Encoder weights:
  W1 ← W1 − α (∂L/∂W1 + λ W1)


Numerical Example

Input x = [1, 0.5]^T
Encoder weights W1 = [[0.5, 0.3], [0.2, 0.7]]; b1 = [0, 0]^T
Decoder weights W2 = [[0.6, 0.4], [0.1, 0.9]]; b2 = [0, 0]^T
Activation function: identity

Forward pass: encoder Z = [0.65, 0.55]^T; decoder x̂ = [0.65, 0.56]^T
Loss = 0.1261

Decoder gradients:
  ∂L/∂W2 = 2(x̂ − x) ⊙ Z^T = [[−0.455, −0.385], [0.078, 0.066]];  ∂L/∂b2 = [−0.7, 0.12]^T
Encoder gradients:
  ∂L/∂W1 = 2(x̂ − x) ⊙ W2^T ⊙ x^T = [[−0.402, −0.201], [0.0468, 0.0234]];  ∂L/∂b1 = [−0.402, 0.0468]^T

Updated weights (learning rate α = 0.1):
  Decoder: W2 = [[0.6455, 0.4385], [0.0922, 0.8934]];  b2 = [0.07, −0.012]^T
  Encoder: W1 = [[0.5402, 0.3201], [0.1953, 0.6977]];  b1 = [0.0402, 0.00468]^T

Gradients with the L2 norm (λ = 0.1):
  Decoder: ∂L/∂W2 + λW2 = [[−0.395, −0.345], [0.088, 0.156]]
  Encoder: ∂L/∂W1 + λW1 = [[−0.352, −0.171], [0.0668, 0.0934]]
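A NumPy sketch of this worked example, using the slide's weights, biases, and identity activation, and following the gradient formulas from the previous slides. Where the slide rounds intermediate values (e.g., the decoder output), the printed numbers can differ slightly from those listed above.

import numpy as np

x  = np.array([1.0, 0.5])
W1 = np.array([[0.5, 0.3], [0.2, 0.7]]); b1 = np.zeros(2)
W2 = np.array([[0.6, 0.4], [0.1, 0.9]]); b2 = np.zeros(2)
alpha, lam = 0.1, 0.1   # learning rate and L2 strength (implied by the slide's numbers)

# Forward pass (identity activation)
Z = W1 @ x + b1
x_hat = W2 @ Z + b2
loss = np.sum((x_hat - x) ** 2)

# Backward pass
e = 2 * (x_hat - x)                 # dL/dx_hat
dW2 = np.outer(e, Z);  db2 = e
delta = W2.T @ e                    # error propagated back to the hidden layer
dW1 = np.outer(delta, x);  db1 = delta

# Plain gradient-descent updates, and the L2-regularized gradient variants
W2_new = W2 - alpha * dW2
W1_new = W1 - alpha * dW1
dW2_l2 = dW2 + lam * W2
dW1_l2 = dW1 + lam * W1

print(Z, x_hat, round(loss, 4))
print(dW2, db2, dW1, db1, sep="\n")
print(W2_new, W1_new, sep="\n")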



Text Embedding with Word2vec

Consider a sentence containing the words w1 w2 ... wn in that sequence. The words w(i−t), ..., w(i−1), w(i+1), ..., w(i+t) are used to predict the target word wi. This model is referred to as the continuous bag-of-words (CBOW) model.

Example sentence: "The cat sits on the mat"
Vocabulary (d): ["the", "cat", "sits", "on", "mat"]
Let the context size (m) = 2.
Each word is represented as a one-hot vector:
  "the"  = [1, 0, 0, 0, 0]
  "cat"  = [0, 1, 0, 0, 0]
  "sits" = [0, 0, 1, 0, 0] ...

Context → Target
  ["the", "sits"] → "cat"
  ["cat", "on"]   → "sits"
  ["sits", "mat"] → "on"

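A short Python sketch that generates such (context, target) pairs and one-hot vectors for the example sentence; the helper function below is ad hoc, not from any library.

import numpy as np

def cbow_pairs(tokens, window=1):
    """Yield (context, target) pairs taking `window` words on each side of the target."""
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        yield context, target

sentence = "the cat sits on the mat".split()
vocab = sorted(set(sentence), key=sentence.index)          # ["the", "cat", "sits", "on", "mat"]
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

for context, target in cbow_pairs(sentence, window=1):     # context size m = 2 (one word per side)
    print(context, "->", target)
print(one_hot["cat"])                                       # [0. 1. 0. 0. 0.]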


Continuous Bag of Words (CBOW)

Number of hidden neurons p (say 3)
Encoder weights (shared weights) W: dimensions 5 × 3
Decoder weights V: dimensions 3 × 5
Output: softmax layer, dimensions 1 × 5
The highest probability predicts the target word.



Numerical Example (Lab 5 code)

Vocabulary: {"queen", "man", "woman", "child", "king", "prince", "princess", "throne", "palace", "royal"}
Indices: {0: "queen", 1: "man", 2: "woman", 3: "child", ..., 9: "royal"}
Parameters:
  Vocabulary size (V) = 10
  Embedding size (N) = 3
  Context size (C) = 3
  Learning rate (η) = 0.1
Training example:
  Context words: ["queen", "man", "woman"] ([0, 1, 2])
  Target word: "king" ([4])



Neural embedding with continuous bag of words



Random Encoder and Decoder Weights

Input-to-hidden weights (W): V × N
Hidden-to-output weights (W′): N × V

Step 1: Embed the Context Words
The one-hot encoded vectors for the context words (queen, man, woman) are multiplied by W:
  Embedding = W^T · one-hot(context)
  Embedding_queen = [0.1, 0.2, 0.3]
  Embedding_man   = [0.4, 0.5, 0.6]
  Embedding_woman = [0.7, 0.8, 0.9]



Forward Pass

Step 2: Compute the Hidden Layer (Average Embedding)
  h = (1/C) Σ_{i=1..C} Embedding_i
    = (1/3) ([0.1, 0.2, 0.3] + [0.4, 0.5, 0.6] + [0.7, 0.8, 0.9]) = [0.4, 0.5, 0.6]

Step 3: Compute the Scores for Each Word (Pre-Softmax)
  u = h · W′ = [0.4, 0.5, 0.6] · [[0.3, 0.5, ...], [0.7, 0.3, ...], [0.5, 0.9, ...]]
    = [0.74, 0.83, 0.74, 0.99, 0.98, 1.09, 0.58, 0.89, 1.09, 0.86]

Step 4: Apply Softmax to Compute Probabilities
  y = softmax(u),  y_i = e^{u_i} / Σ_{j=1..V} e^{u_j}
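A NumPy sketch of this forward pass. The slide shows only part of W′, so a placeholder random W′ is used here (an assumption); the embedding matrix W is filled so that its first three rows match the embeddings given above.

import numpy as np

V, N, C = 10, 3, 3                            # vocabulary size, embedding size, context size
rng = np.random.default_rng(3)

W = np.linspace(0.1, 3.0, 30).reshape(V, N)   # rows 0-2 are [0.1,0.2,0.3], [0.4,0.5,0.6], [0.7,0.8,0.9]
W_prime = rng.uniform(0.1, 1.0, size=(N, V))  # placeholder hidden-to-output weights (assumption)

context_idx = [0, 1, 2]                       # "queen", "man", "woman"

# Steps 1-2: embed the context words and average them
h = W[context_idx].mean(axis=0)               # -> [0.4, 0.5, 0.6] with this W

# Step 3: scores for every word in the vocabulary
u = h @ W_prime

# Step 4: softmax probabilities
y = np.exp(u) / np.exp(u).sum()
print(h, y.round(3), y.sum())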
Loss and Error Correction

In the CBOW model, the categorical cross-entropy loss is used:
  L = − Σ_{i=1..V} Target_i · log(y_i)
where:
• V: vocabulary size
• y_i: predicted probability for word i
• Target_i: one-hot encoded value for the target word
and y = softmax(u), with y_i = e^{u_i} / Σ_{j=1..V} e^{u_j}.



Loss and Error Correction

To compute the gradient ∂L/∂u_i, we use the chain rule:
  ∂L/∂u_i = (∂L/∂y_i) · (∂y_i/∂u_i)
The derivative of the loss with respect to y_i (for i = t, the target word) is:
  ∂L/∂y_i = − Target_i / y_i
The derivative of the softmax function, for i = t, is:
  ∂y_i/∂u_i = y_i (1 − y_i)
Combining the terms (and using the off-diagonal softmax derivatives ∂y_i/∂u_j = −y_i y_j for the outputs with i ≠ t), the final derivative for the i-th output is:
  ∂L/∂u_i = y_i − target_i = error
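A small numerical check of this result on arbitrary illustrative scores: the analytic gradient y − target of the softmax cross-entropy loss is compared against a finite-difference estimate.

import numpy as np

def softmax(u):
    e = np.exp(u - u.max())
    return e / e.sum()

def loss(u, target):
    return -np.sum(target * np.log(softmax(u)))

rng = np.random.default_rng(4)
u = rng.normal(size=10)               # illustrative pre-softmax scores
target = np.eye(10)[4]                # one-hot target, e.g. "king" at index 4

analytic = softmax(u) - target        # dL/du = y - target

# Finite-difference estimate of the same gradient
eps, numeric = 1e-6, np.zeros_like(u)
for i in range(len(u)):
    d = np.zeros_like(u); d[i] = eps
    numeric[i] = (loss(u + d, target) - loss(u - d, target)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))   # True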
Loss and Backward Propagation

Target word "king" (index 4): T = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
Output probabilities: y = [0.085, 0.093, 0.085, 0.121, 0.12, 0.133, 0.072, 0.103, 0.133, 0.106]

Backward Propagation
Step 1: Compute the Error at the Output Layer
  e = y − T = [0.085, 0.093, 0.085, 0.121, −0.88, 0.133, 0.072, 0.103, 0.133, 0.106]
Step 2: Compute the Gradients for W′
  ∇W′ = h^T · error = [0.4, 0.5, 0.6]^T · [0.085, 0.093, 0.085, 0.121, −0.88, 0.133, 0.072, 0.103, 0.133, 0.106]
Backward Pass

Step 2: Compute the Gradients for W′ (outer product)
  ∇W′ = h^T · error =
    [0.4 × 0.085, 0.4 × 0.093, ...]
    [0.5 × 0.085, 0.5 × 0.093, ...]
    [0.6 × 0.085, 0.6 × 0.093, ...]

Step 3: Propagate the Error Back to the Hidden Layer
  δh = W′ · error^T
Backward Pass

Step 3: Propagate the Error Back to the Hidden Layer
  δh = [0.0535, 0.0726, 0.0618]
Step 4: Compute the Gradients for W
  ∇W = (1/C) Σ_{i=1..C} δh



Backward Pass

Step 4: Compute the Gradients for W
Each context word contributes equally to W, so the gradient for each word's embedding is simply δh scaled by the context size C:
  ∇W_queen = ∇W_man = ∇W_woman = (1/3) [0.0535, 0.0726, 0.0618] = [0.01783, 0.0242, 0.0206]
Step 5: Update the Weights W
  W_new = W_old − η ∇W, with learning rate η = 0.1
  W_queen = W_queen − η ∇W_queen = [0.1, 0.2, 0.3] − 0.1 · [0.01783, 0.0242, 0.0206] = [0.09822, 0.19758, 0.29794]



Backward Pass

Step 5: Update the Weights W (continued)
  W_man = W_man − η ∇W_man = [0.4, 0.5, 0.6] − 0.1 · [0.01783, 0.0242, 0.0206] = [0.39822, 0.49758, 0.59794]
  W_woman = W_woman − η ∇W_woman = [0.7, 0.8, 0.9] − 0.1 · [0.01783, 0.0242, 0.0206] = [0.69822, 0.79758, 0.89794]
The embeddings of the other words remain unchanged.
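A NumPy sketch of Steps 1-5 above. The slide does not show the full W′ matrix, so δh is taken directly from the slide's stated value rather than recomputed, and the unspecified rows of W are filled with placeholder values; everything else follows the steps as written.

import numpy as np

eta, C = 0.1, 3
h = np.array([0.4, 0.5, 0.6])
y = np.array([0.085, 0.093, 0.085, 0.121, 0.12, 0.133, 0.072, 0.103, 0.133, 0.106])
T = np.eye(10)[4]                              # one-hot target "king"

# Step 1: output-layer error
error = y - T

# Step 2: gradient for the hidden-to-output weights (outer product, shape N x V)
grad_W_prime = np.outer(h, error)

# Step 3: error at the hidden layer (taken from the slide; the full W' would be needed to recompute it)
delta_h = np.array([0.0535, 0.0726, 0.0618])

# Step 4: each context word's embedding receives delta_h / C
grad_embed = delta_h / C                       # [0.01783, 0.0242, 0.0206]

# Step 5: update the context-word embeddings
W = np.linspace(0.1, 3.0, 30).reshape(10, 3)   # rows 0,1,2 are the queen/man/woman embeddings
for idx in [0, 1, 2]:
    W[idx] = W[idx] - eta * grad_embed

print(grad_embed.round(5))
print(W[[0, 1, 2]].round(5))                   # matches the updated embeddings above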



Skip-gram Model

The skip-gram model learns word embeddings by predicting the context words given a target word.
For w = 2, the context words are the 2 words before and after the target word.
Vocabulary: {"We", "love", "machine", "learning"}
Input: one-hot vector of the target word.
Output: probabilities of the context words.
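A short sketch generating the skip-gram (target, context) training pairs for this example with w = 2; the helper function is ad hoc, not from any library.

def skipgram_pairs(tokens, w=2):
    """Return (target, context_word) pairs for every word within w positions of the target."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - w), min(len(tokens), i + w + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

sentence = ["We", "love", "machine", "learning"]
for target, context in skipgram_pairs(sentence, w=2):
    print(target, "->", context)   # e.g. "love" -> "We", "love" -> "machine", ...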



Sentiment Classification using Word2Vec

▪ Gather a labeled dataset for sentiment analysis (e.g., positive and negative sentiments).
▪ Use a pre-trained Word2Vec model (e.g., Gensim's pre-trained Word2Vec, Google Word2Vec). Convert sentences into feature vectors by averaging the Word2Vec embeddings of the words in each sentence.
▪ Use a machine learning model such as logistic regression, random forest, or a neural network.
▪ Split the data into training and test sets, and evaluate the model using metrics such as accuracy. A sketch of this pipeline follows.
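A hedged sketch of this pipeline using Gensim and scikit-learn. The toy dataset, the choice to train a small Word2Vec model in place of a large pre-trained one, and all parameter values are illustrative assumptions.

import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Toy labeled data (assumption); 1 = positive, 0 = negative
texts = ["i love this movie", "great acting and story", "what a wonderful film",
         "i hate this movie", "terrible plot and acting", "what a boring film"]
labels = [1, 1, 1, 0, 0, 0]
tokenized = [t.split() for t in texts]

# Train a small Word2Vec model here (a large pre-trained model could be loaded instead)
w2v = Word2Vec(tokenized, vector_size=50, window=2, min_count=1, seed=0)

def sentence_vector(tokens, model):
    # Average the embeddings of the words present in the model's vocabulary
    vecs = [model.wv[w] for w in tokens if w in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

X = np.array([sentence_vector(t, w2v) for t in tokenized])
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.33, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))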
Applications of Word Embedding Vectors

Text Classification: classifying customer reviews as positive or negative using sentiment-based embeddings.
Machine Translation: using embeddings in neural machine translation systems like Google Translate to improve language understanding.
Named Entity Recognition (NER): extracting "John" as a person and "New York" as a location from sentences.
Semantic Search and Information Retrieval: retrieving relevant documents for the query "car" by matching it with related terms like "vehicle" or "automobile."
Text Generation: using embeddings in models like GPT to generate human-like responses in conversational AI systems.



Thank You
