
DESIGN OF OUTPUT LAYER:

Let's discuss the design of the last layer of the network in particular. Like the hidden layers, it performs dimensional transformation and feature extraction, but it also serves as the output layer.
Whether to use an activation function, and which type, must be decided according to the specific task. We will organize the discussion by the range of the output values.

[0, 1] Interval:
It is also common for output values to belong to the interval [0, 1], for example in image generation and binary classification problems. A binary classification network with a single output node is shown in the figure.

In this case, you only need to apply the Sigmoid function to the value of the output layer to translate the output into a probability value.
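As a minimal sketch of this step (the logit value below is an assumed example, not taken from the text):

import tensorflow as tf

z = tf.constant([1.2])  # Raw output (logit) of the single node; example value
tf.sigmoid(z)           # Squashes the value into (0, 1); here ≈ 0.7685, read as P(A|x)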
The figure below shows a binary classification network whose output layer has two nodes.

The output value of the first node represents the probability of event A occurring, P(A|x), and the output value of the second node represents the probability of the opposite event, P(¬A|x). The Sigmoid function can only compress a single value into the interval (0, 1) and does not consider the relationship between the two node values. We would like the outputs not only to satisfy oi ∈ [0, 1] but also to satisfy the constraint that the probabilities sum to 1:

[0, 1] Interval with Sum 1:

The case where each output value oi ∈ [0, 1] and all output values sum to 1 is the most common setting in multi-classification. As shown in the figure, each node of the output layer represents a category.
The network structure in the figure handles a three-class classification task. The output values of the three nodes represent the probabilities that the current sample belongs to categories A, B, and C: P(A|x), P(B|x), and P(C|x). Because a sample in a multi-classification problem can belong to only one category, the probabilities of all categories should sum to 1.
This can be achieved by adding a Softmax function to the output layer. The Softmax function is defined as:

softmax(zi) = e^(zi) / Σj e^(zj)

The Softmax function not only maps each output value to the interval [0, 1] but also guarantees that the sum of all output values is 1.

import tensorflow as tf

z = tf.constant([2.,1.,0.1])  # Simulated output values of the three nodes
tf.nn.softmax(z)              # Convert to probabilities that sum to 1
Out[12]:
<tf.Tensor: id=19, shape=(3,), dtype=float32, numpy=array([0.6590012, 0.242433 , 0.0985659],
dtype=float32)>

(-1, 1) Interval:
If you want the output values to be distributed in the interval (−1, 1), you can simply use the tanh activation function:

x = tf.linspace(-6.,6.,10)  # 10 evenly spaced values in [-6, 6]
tf.tanh(x)                  # Squash each value into (-1, 1)
Out[15]:
<tf.Tensor: id=264, shape=(10,), dtype=float32, numpy= array([-0.9999877 , -0.99982315, -0.997458
, -0.9640276 ,-0.58278286, 0.5827831 , 0.9640276 , 0.997458 , 0.99982315,0.9999877 ],
dtype=float32)>

The design of the output layer allows a certain flexibility: it can be tailored to the actual application scenario while making full use of the characteristics of the existing activation functions.

ERROR CALCULATION:
 After building the model structure, the next step is to select the appropriate error function to
calculate the error.
 Common error functions are mean square error, cross-entropy, KL divergence, and hinge
loss.
 Among them, the mean square error function and cross-entropy function are more
common in deep learning.
 The mean square error function is mainly used for regression problems, and the cross-entropy
function is mainly used for classification problems.
Mean Square Error Function
The mean square error (MSE) function maps the output vector and the true vector to two points in Cartesian coordinate space and measures the difference between the two vectors by the Euclidean distance between these two points (to be precise, the square of the Euclidean distance):

MSE(y, o) = (1/dout) ∑_{i=1}^{dout} (yi − oi)²

where dout is the dimension of the output vector.

The value of MSE is always greater than or equal to 0. When the MSE function reaches the minimum
value of 0, the output is equal to the true label, and the parameters of the neural network reach the
optimal state.

from tensorflow import keras

o = tf.random.normal([2,10])          # Network output for 2 samples and 10 classes
y = tf.constant([1,3])                # Real labels
y_onehot = tf.one_hot(y, depth=10)    # One-hot encode the labels
loss = keras.losses.MSE(y_onehot, o)  # Calculate the MSE of each sample
loss
Out[16]:
<tf.Tensor: id=27, shape=(2,), dtype=float32,
numpy=array([0.779179 , 1.6585705], dtype=float32)>

MSE returns one error value per sample, so you need to average again over the sample dimension to obtain the mean square error averaged over the batch. The implementation is as follows:

loss = tf.reduce_mean(loss)  # Average the per-sample errors over the batch
loss
Out[17]:
<tf.Tensor: id=30, shape=(), dtype=float32, numpy=1.2188747>
The loss can also be used in class (layer) form. The corresponding class is
keras.losses.MeanSquaredError().
Like other loss classes, its __call__ method is invoked to complete the forward calculation. The code
is as follows:

criteon = keras.losses.MeanSquaredError()  # Create the MSE loss instance
loss = criteon(y_onehot, o)                # __call__ computes the averaged loss
loss
Out[18]:
<tf.Tensor: id=54, shape=(), dtype=float32, numpy=1.2188747>

Cross-Entropy Error Function:

Calculating the cross-entropy error function in a neural network involves computing the loss between the predicted values (often probabilities) generated by the model and the true labels or target values. As mentioned earlier, there are two common variants of cross-entropy loss: binary cross-entropy and categorical cross-entropy.

Binary Cross-Entropy (Binary Log Loss):

For binary classification problems, where there are only two classes (0 and 1), the binary cross-entropy loss is used. Given a true label y (0 or 1) and a predicted probability ŷ (a value between 0 and 1), the binary cross-entropy loss is calculated as follows:

L(y, ŷ) = −(y · log(ŷ) + (1 − y) · log(1 − ŷ))

Where L(y, ŷ) is the binary cross-entropy loss,
y is the true label (0 or 1), and
ŷ is the predicted probability of belonging to class 1.
To calculate this loss for a batch of samples, you typically average the
individual losses.
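A minimal sketch of this calculation with TensorFlow (the labels and predicted probabilities below are assumed example values):

import tensorflow as tf
from tensorflow import keras

y_true = tf.constant([1., 0., 1., 1.])      # True binary labels
y_pred = tf.constant([0.9, 0.2, 0.7, 0.4])  # Predicted probabilities of class 1
# Computes -(y*log(y^) + (1-y)*log(1-y^)) per element and averages over the batch
loss = keras.losses.binary_crossentropy(y_true, y_pred)
loss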

Categorical Cross-Entropy (Multi-Class Log Loss):

For multi-class classification problems, where there are more than two classes, the categorical cross-entropy loss is used. Given a true label y (a one-hot encoded vector) and predicted class probabilities ŷ (a vector of predicted probabilities), the categorical cross-entropy loss is calculated as follows:

L(y, ŷ) = −∑_{i=1}^{N} yi · log(ŷi)

Where L(y, ŷ) is the categorical cross-entropy loss,
y is a one-hot encoded vector representing the true class,
ŷ is a vector of predicted class probabilities for each class, and
N is the number of classes.

To calculate this loss for a batch of samples, you typically average the
individual losses.
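A corresponding sketch with TensorFlow, using one assumed example sample with three classes:

import tensorflow as tf
from tensorflow import keras

y_true = tf.constant([[0., 1., 0.]])     # One-hot true label: the sample is class B
y_pred = tf.constant([[0.2, 0.7, 0.1]])  # Predicted class probabilities
# Computes -sum_i y_i * log(y^_i) for the sample: -log(0.7) ≈ 0.357
loss = keras.losses.categorical_crossentropy(y_true, y_pred)
loss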

Here's an example of how to set the loss function in Keras:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

input_dim = 20  # Number of input features (example value)

model = Sequential([
    Dense(units=64, activation='relu', input_dim=input_dim),
    Dense(units=1, activation='sigmoid')  # For binary classification
])

model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
For multi-class classification, you would typically use
'categorical_crossentropy' as the loss function and adapt your model architecture
accordingly.
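As a hedged sketch of that multi-class counterpart (num_classes and the layer widths are assumed example values, and labels are taken to be one-hot encoded):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

input_dim = 20    # Number of input features (example value)
num_classes = 3   # Number of categories (example value)

multi_model = Sequential([
    Dense(units=64, activation='relu', input_dim=input_dim),
    Dense(units=num_classes, activation='softmax')  # One probability per class
])

multi_model.compile(optimizer='adam', loss='categorical_crossentropy',
                    metrics=['accuracy'])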

During training, the optimization algorithm minimizes the chosen loss function,
which means it adjusts the model's parameters to make the predicted values
(probabilities) closer to the true labels.
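As a minimal illustration of this training loop, the sketch below fits the binary classification model defined above; X_train and y_train are hypothetical random placeholders standing in for a real dataset:

import numpy as np

# Hypothetical placeholder data; replace with a real dataset
X_train = np.random.rand(1000, input_dim).astype('float32')
y_train = np.random.randint(0, 2, size=(1000, 1))

# Adam iteratively adjusts the weights to reduce the binary cross-entropy loss
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)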
TYPES OF NEURAL NETWORKS:
There are various types of neural networks, each designed to address specific
types of machine learning tasks. Here are some of the most common types of
neural networks:

Feedforward Neural Network (FNN):

 Also known as Multi-Layer Perceptrons (MLP).


 The simplest form of neural network.
 Consists of an input layer, one or more hidden layers, and an output layer.
 Used for tasks like classification and regression.

Convolutional Neural Network (CNN):

 Specifically designed for processing grid-like data, such as images.


 Employs convolutional layers to automatically learn spatial hierarchies of
features.
 Widely used in image classification, object detection, and image
segmentation.
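As an illustrative sketch of the CNN described above (layer sizes and the 28×28 grayscale input are assumed example values, not a prescribed architecture):

from tensorflow.keras import Sequential, layers

# Minimal example CNN for 28x28 grayscale images and 10 classes
cnn = Sequential([
    layers.Conv2D(32, kernel_size=3, activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D(pool_size=2),        # Downsample the feature maps
    layers.Conv2D(64, kernel_size=3, activation='relu'),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation='softmax')   # 10-class probability output
])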

Recurrent Neural Network (RNN):

 Designed for sequential data and time-series analysis.


 Contains recurrent layers that maintain hidden states to capture temporal
dependencies.
 Suitable for tasks like natural language processing, speech recognition,
and time-series prediction.

Long Short-Term Memory (LSTM):

 A type of RNN with improved ability to capture long-term dependencies.


 Utilizes memory cells and gates to control the flow of information.
 Excellent for sequential tasks where context over long sequences is
essential.
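A hedged sketch of how such a recurrent model is typically assembled in Keras (the vocabulary size, embedding width, and output task are assumed example choices):

from tensorflow.keras import Sequential, layers

# Example LSTM classifier for token sequences from a 10,000-word vocabulary
rnn_model = Sequential([
    layers.Embedding(input_dim=10000, output_dim=64),
    layers.LSTM(64),                       # Memory cells and gates track long-range context
    layers.Dense(1, activation='sigmoid')  # e.g., binary sentiment output
])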

Gated Recurrent Unit (GRU):


 Similar to LSTM but with a simpler architecture.
 Uses gating mechanisms to control information flow.
 Offers a balance between performance and complexity compared to
LSTM.
Autoencoder (AE):

 Unsupervised learning neural network used for dimensionality reduction and feature learning.
 Comprises an encoder to reduce input data dimensions and a decoder to
reconstruct the original data.
 Used in image denoising, anomaly detection, and recommendation
systems.
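A minimal sketch of such an encoder–decoder pair in Keras (the 784-dimensional input and 32-dimensional bottleneck are assumed example sizes):

from tensorflow.keras import Sequential, layers

# Encoder compresses 784-dimensional inputs to 32 dimensions; decoder reconstructs them
autoencoder = Sequential([
    layers.Dense(32, activation='relu', input_shape=(784,)),   # Encoder
    layers.Dense(784, activation='sigmoid')                    # Decoder
])
autoencoder.compile(optimizer='adam', loss='mse')  # Train on reconstruction error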

Variational Autoencoder (VAE):

 An extension of autoencoders with probabilistic properties.


 Encourages the model to generate data points similar to those in the
training dataset.
 Commonly used in generating new data samples and data representation
learning.

Generative Adversarial Network (GAN):


 Comprises a generator network and a discriminator network.
 Trains by having the generator and discriminator compete against each
other.
 Used for generating synthetic data, image-to-image translation, and style
transfer.

Radial Basis Function Network (RBFN):


 Utilizes radial basis functions as activation functions.
 Suitable for interpolation and function approximation tasks.

Self-Organizing Maps (SOM):

 Used for clustering and dimensionality reduction.


 Organizes data points in a low-dimensional grid while preserving
topological relationships.

Residual Neural Network (ResNet):

 Addresses the vanishing gradient problem by using skip connections.


 Enables the training of extremely deep neural networks.
 Commonly used in image recognition tasks.
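The skip connection at the heart of a residual block can be sketched with the Keras functional API as follows (the filter count is an assumed example value):

from tensorflow.keras import layers

def residual_block(x, filters=64):
    # Two convolutions plus a skip connection that adds the input back;
    # assumes x already has `filters` channels so the shapes match
    shortcut = x
    h = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    h = layers.Conv2D(filters, 3, padding='same')(h)
    h = layers.Add()([h, shortcut])            # Skip connection: F(x) + x
    return layers.Activation('relu')(h)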
Siamese Network:

 Designed for tasks involving similarity or dissimilarity comparisons.


 Consists of two identical subnetworks with shared weights.
 Often used in face recognition and signature verification.

Transformers:

 Introduced in the field of natural language processing (NLP).


 Utilizes attention mechanisms to capture contextual information.
 The basis for models like BERT, GPT, and T5 for various NLP tasks.

Graph Convolutional Neural Network (Graph CNN or GCN):

 Designed for processing graph-structured data.


 Utilizes graph convolutional layers to propagate information between
connected nodes in a graph.
 Used in tasks such as node classification, link prediction, and graph
classification in areas like social network analysis and recommendation
systems.

Attention Mechanism:
 Not a standalone network architecture but a mechanism integrated into
various neural networks.
 Introduced in models like Transformers.
 Allows the model to focus on different parts of the input sequence when
making predictions.
 Essential for capturing long-range dependencies in sequential data.
 Used in natural language processing for tasks like machine translation,
text summarization, and question-answering.

These are some of the fundamental types of neural networks, and there are
many more specialized architectures and variations tailored to specific
applications and research areas. The choice of the neural network architecture
depends on the nature of the problem you want to solve.
