Types of Neural Networks
Types of Neural Networks
Let's discuss the design of the last layer of network in particular. In addition to all hidden layers, it
completes the functions of dimensional transformation and feature extraction, and it is also used as an
output layer.
It is necessary to decide whether to use the activation function and what type of activation function to
use according to the specific tasks. We will classify the discussions based on the range of output
values.
[0, 1] Interval:
It is also common for output values to belong to interval [0, 1], such as image generation, and binary
classification problems. The binary classification network with single output node looks like:
In this case, you only need to add the Sigmoid function after the value of the output layer to translate
the output into a probability value.
Below figure shows the output layer of the binary classification network is two nodes.
The output value of the first node represents the probability of the occurrence of event A P(x), and the
output value of the second node represents the probability of the occurrence of the opposite event
P(x). The function can only compress a single value to the interval (0, 1) and does not consider the
relationship between the two node values. We hope that in addition to satisfy oi ∈ [0, 1], they can
satisfy the constraint that the sum of probabilities is 1:
The Softmax function can not only map the output value to the interval [0, 1] but also satisfy the
characteristic that the sum of all output values is 1.
z = tf.constant([2.,1.,0.1])
tf.nn.softmax(z)
Out[12]:
<tf.Tensor: id=19, shape=(3,), dtype=float32, numpy=array([0.6590012, 0.242433 , 0.0985659],
dtype=float32)>
(-1, 1) Interval
If you want the range of output values to be distributed in intervals (−1, 1), you can simply use the
tanh activation function:
I
x = tf.linspace(-6.,6.,10)
tf.tanh(x)
Out[15]:
<tf.Tensor: id=264, shape=(10,), dtype=float32, numpy= array([-0.9999877 , -0.99982315, -0.997458
, -0.9640276 ,-0.58278286, 0.5827831 , 0.9640276 , 0.997458 , 0.99982315,0.9999877 ],
dtype=float32)>
The design of the output layer has a certain flexibility, which can be designed according to the actual
application scenario, and make full use of the characteristics of the existing activation function.
ERROR CALCULATION:
After building the model structure, the next step is to select the appropriate error function to
calculate the error.
Common error functions are mean square error, cross-entropy, KL divergence, and hinge
loss.
Among them, the mean square error function and cross-entropy function are more
common in deep learning.
The mean square error function is mainly used for regression problems, and the cross-entropy
function is mainly used for classification problem.
Mean Square Error Function
Mean square error (MSE) function maps the output vector and the true vector to two points in the
Cartesian coordinate system, by calculating the Euclidean distance between these two points (to be
precise, the square of Euclidean distance) to measure the difference between the two vectors:
The value of MSE is always greater than or equal to 0. When the MSE function reaches the minimum
value of 0, the output is equal to the true label, and the parameters of the neural network reach the
optimal state.
You need to average again in the sample dimension to obtain the mean square error of the average
sample. The implementation is as follows:
loss = tf.reduce_mean(loss)
loss
Out[17]:
<tf.Tensor: id=30, shape=(), dtype=float32, numpy=1.2188747>
It can also be implemented in layer mode. The corresponding class is
keras.losses.MeanSquaredError().
Like other classes, the __call__ function can be called to complete the forward calculation. The code
is as follows:
criteon = keras.losses.MeanSquaredError()
loss = criteon(y_onehot,o)
loss
Out[18]:
<tf.Tensor: id=54, shape=(), dtype=float32, numpy=1.2188747>
For binary classification problems, where there are only two classes (0 and 1),
the binary cross-entropy loss is used. Given a true label
y (0 or 1) and a predicted probability y^ (a value between 0 and 1), the binary
cross-entropy loss is calculated as follows:
L(y,y^)=−(y⋅log(y^)+(1−y)⋅log(1−y^))
Where, L(y, y^) is the binary cross-entropy loss.
y is the true label (0 or 1).
y^is the predicted probability of belonging to class 1.
To calculate this loss for a batch of samples, you typically average the
individual losses.
For multi-class classification problems, where there are more than two classes,
the categorical cross-entropy loss is used. Given a true label y (a one-hot
encoded vector) and predicted class probabilities y^ (a vector of predicted
probabilities), the categorical cross-entropy loss is calculated as follows:
L(y,y^)=−∑i=1Nyi⋅log(y^i)
Where, L(y, y^) is the categorical cross-entropy loss.
y is a one-hot encoded vector representing the true class.
y^is a vector of predicted class probabilities for each class.
N is the number of classes.
To calculate this loss for a batch of samples, you typically average the
individual losses.
model = Sequential([
Dense(units=64, activation='relu', input_dim=input_dim),
Dense(units=1, activation='sigmoid') # For binary classification
])
model.compile(optimizer='adam', loss='binary_crossentropy',
metrics=['accuracy'])
For multi-class classification, you would typically use
'categorical_crossentropy' as the loss function and adapt your model architecture
accordingly.
During training, the optimization algorithm minimizes the chosen loss function,
which means it adjusts the model's parameters to make the predicted values
(probabilities) closer to the true labels.
TYPES OF NEURAL NETWORKS:
There are various types of neural networks, each designed to address specific
types of machine learning tasks. Here are some of the most common types of
neural networks:
Transformers:
Attention Mechanism:
Not a standalone network architecture but a mechanism integrated into
various neural networks.
Introduced in models like Transformers.
Allows the model to focus on different parts of the input sequence when
making predictions.
Essential for capturing long-range dependencies in sequential data.
Used in natural language processing for tasks like machine translation,
text summarization, and question-answering.
These are some of the fundamental types of neural networks, and there are
many more specialized architectures and variations tailored to specific
applications and research areas. The choice of the neural network architecture
depends on the nature of the problem you want to solve.