Final Year Project
on
Intelligent Communication System Based on CNN
Bachelor of Technology
in
Electronics and Communication Engineering
By
Abhimesh Sarin (20185108)
Shivam K Raj (20185029)
Ravi Kumar (20185069)
Acknowledgement
We take this opportunity to express our deep sense of gratitude and heartfelt thanks to our
project supervisor, Dr. P. Karupannan (Asst. Professor), Department of Electronics &
Communication Engineering, Motilal Nehru National Institute of Technology
Allahabad for his constant guidance and insightful comments during the course of the
work. We shall always cherish our association with him for his constant encouragement
and freedom to think and action rendered to us throughout the work.
We are also thankful to our colleagues and friends for their constant support. Finally, we
deem it a great pleasure to thank one and all that helped us directly or indirectly in carrying
out this work.
ABSTRACT
We all use different types of communication systems, for example cellular systems, Wi-Fi, and so on. With the development of communication technology and the increase in the number of users, current technologies cannot meet future requirements; hence there is a need for a better communication system.
Table of Contents
Undertaking ii
Certificate iii
Acknowledgement iv
Abstract v
List of Tables ix
Abbreviations x
Chapter 1: Introduction
1.1 : Introduction 1
1.2 : Motivation 2
2.4 : Summary 15
3.1 : Objective 16
Chapter 5: Conclusion
5.1 : Conclusion 35
References 37
List of Figures
Figure 4.8: CNN-AE (DCNN-AE).
Figure 4.9: Validation and training losses of the CNN-AE system for the AWGN channel.
Figure 4.10: Validation and training losses of the CNN-AE system for the Rayleigh fading channel.
List of Tables
Abbreviations
Chapter 1
Introduction
1.1 Introduction
The face is the most expressive and communicative part of a human being [1]. It can transmit many emotions without a word being spoken. Facial expression recognition identifies emotion from a face image; expression is a manifestation of the activity and personality of a human. In the 20th century, the American psychologists Ekman and Friesen [2] defined six basic emotions (anger, fear, disgust, sadness, surprise and happiness), which are the same across cultures.
Facial expression recognition has attracted much attention in recent years due to its impact on clinical practice, sociable robotics and education. According to diverse research, emotion plays an important role in education. Currently, a teacher uses exams, questionnaires and observations as sources of feedback, but these classical methods often come with low efficiency. Using students' facial expressions, teachers can adjust their strategy and instructional materials to help foster student learning.
1.2 Motivation
The main motivation behind this project is the following: suppose there is a concert where hundreds of people are present. It is not possible for a human being to recognize the emotion of each and every person, so we try to develop a system that can detect the facial expression of every person present in the image.
Another motivation is that if a person cannot speak or hear, it is very difficult to recognize their emotion; this system also helps in recognizing the facial expression of such a person.
Chapter 2
Literature Survey
Hubel and Wiesel worked on this concept in 1962, demonstrating that some brain neurons fired only when an edge in a specific orientation was present. For example, some neurons fired when shown horizontal edges, while other neurons were sensitive to vertical or diagonal edges. The researchers discovered that these neurons are organised in a vertical, column-like layout, allowing them to form visual perception when combined.
CNNs, or Convolutional Neural Networks, are a type of neural network with a grid-like architecture, used to analyse data. Two examples are time-series data, which is akin to a 1-dimensional grid that captures samples at regular intervals of time, and image data, which is a 2-dimensional grid of pixels. The term "convolutional neural network" refers to the network's use of the convolution mathematical operation. Convolution is a linear operation that multiplies a set of weights with the input, similar to a standard neural network.
The technique shown in Fig 2.1 was designed for 2-D input: the multiplication is between an array of weights, called a filter, and a two-dimensional array of inputs. The filter is smaller than the input, and the size of the filter is limited by the available computational power. The multiplication applied between the input patch (which is the size of the filter) and the filter (or kernel), summed to produce a single value, is also known as the scalar product or dot product, since it produces a single value.
We purposefully employ a filter that is smaller than the input because this allows us to slide the filter across the input grid, from top to bottom and left to right, multiplying it with each input patch. Although each multiplication yields a single value, the filter is shifted and multiplied many times across the input array, so the result is a two-dimensional array of output values that represents a filtered version of the input. The two-dimensional array from this multiplication is called a "feature map".
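The sliding dot product described above can be sketched in a few lines of NumPy; the edge-detecting filter values here are illustrative, not taken from the report:

```python
import numpy as np

def feature_map(image, kernel):
    """Slide a kernel over a 2-D input, taking the dot product
    (element-wise multiply, then sum) at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]  # input patch the size of the filter
            out[i, j] = np.sum(patch * kernel)  # scalar (dot) product
    return out

# A small vertical-edge filter applied to a 4x4 image with an edge in the middle
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 0, 1, 1]], dtype=float)
edge_filter = np.array([[-1, 1],
                        [-1, 1]], dtype=float)
fm = feature_map(img, edge_filter)  # 3x3 feature map; responds where the edge is
```

The output is largest exactly where the vertical edge sits, which is the sense in which the feature map "represents the input in a filtered way".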
2.2 Proposed Approach
A Convolutional Neural Network (CNN) is a deep artificial neural network that can recognize visual patterns from the input image with minimal pre-processing compared to other image classification algorithms. This implies that the network learns the filters that were previously hand-engineered in traditional techniques. A neuron is the crucial component of a CNN layer. Neurons are linked together such that the output of one layer's neurons becomes the input of the following layer's neurons.
The backpropagation technique may be used to calculate the partial derivatives of the cost function. Convolution is the procedure of producing a feature map from an input image by applying a filter or kernel. The CNN model has three layers, as shown in Figure 2:
ReLU Activation: The Rectified Linear Unit is the most frequently used activation function in deep learning models. If the function receives any negative input, it returns 0; if it receives any positive value x, it returns that value. So it may be written as f(x) = max(0, x).
Pooling Layer: Each feature map's dimensionality is reduced while the most critical information is retained. Pooling can be of different types: max pooling, sum pooling and average pooling. The goal of pooling is to gradually shrink the spatial size of the input representation and to make the network invariant to small transformations, distortions, and translations in the input picture. In our work, we took the maximum of each block as the single output of the pooling layer, as shown in Figure 4.
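The two operations above can be sketched in NumPy; the sample feature-map values are illustrative:

```python
import numpy as np

def relu(x):
    """f(x) = max(0, x), applied element-wise; negatives become 0."""
    return np.maximum(0, x)

def max_pool_2x2(fm):
    """Take the maximum of each non-overlapping 2x2 block,
    halving both spatial dimensions of the feature map."""
    h, w = fm.shape
    trimmed = fm[:h - h % 2, :w - w % 2]  # drop odd edge rows/cols if any
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fm = np.array([[-1.0, 2.0, 0.5, -3.0],
               [ 4.0, 0.0, -1.0, 1.0],
               [ 0.0, -2.0, 3.0, 2.0],
               [ 1.0, 1.0, -4.0, 0.0]])
activated = relu(fm)              # 4x4, no negative values left
pooled = max_pool_2x2(activated)  # 2x2: one max per block
```

Each 2x2 block collapses to its largest activation, which is how max pooling keeps the most critical information while shrinking the spatial size.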
Fully connected layer: a traditional multilayer perceptron that uses an activation function in the output layer. The term "fully connected" implies that every neuron in the previous layer is connected to every neuron of the next layer. The fully connected layer's job is to classify the input picture into several classes, based on the training dataset, using the output of the convolutional and pooling layers. As a result, the convolution and pooling layers extract features from the input picture, while the fully connected layer serves as a classifier.
Figure 5 represents our CNN model. It contains 4 convolutional layers with 4 pooling layers to extract features, 2 fully connected layers, and then a softmax layer with 7 emotion classes. The input picture is a grayscale facial image of 48x48 pixels. 3×3 filters were used for every convolutional layer with stride 2. For the pooling layers, we used max pooling with 2×2 kernels and stride 2. We employed the Rectified Linear Unit (ReLU), defined in Equation 2, to inject nonlinearity into our model; it is currently the most widely used activation function.
R(z) = max(0, z)
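The architecture described above can be sketched in Keras. The filter counts (32/64/128/256), dense-layer widths, and 'same' padding are assumptions made here for illustration; the report does not specify them:

```python
from tensorflow.keras import layers, models

def build_model():
    """Sketch of the described architecture: 4 conv stages (3x3 filters,
    stride 2) each followed by 2x2 max pooling with stride 2, then
    2 fully connected layers and a 7-class softmax output.
    Filter counts and 'same' padding are illustrative assumptions."""
    model = models.Sequential([layers.Input(shape=(48, 48, 1))])
    for filters in (32, 64, 128, 256):  # assumed filter counts
        model.add(layers.Conv2D(filters, (3, 3), strides=2,
                                padding='same', activation='relu'))
        model.add(layers.MaxPooling2D((2, 2), strides=2, padding='same'))
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation='relu'))   # assumed widths
    model.add(layers.Dense(128, activation='relu'))
    model.add(layers.Dense(7, activation='softmax'))  # 7 emotion classes
    return model

model = build_model()
```

Note that with stride-2 convolutions and stride-2 pooling together, each stage shrinks the spatial size by a factor of four, so 'same' padding is needed for the 48x48 input to survive all four stages.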
Chapter 3
Method Description and Implementation
We used the FER2013 database to train our CNN architecture, as shown in Figure 7. It was generated using the Google image search API and was presented during the ICML 2013 Challenges. The database's faces have been automatically normalised to 48x48 pixels. The FER2013 database has 35887 images (28709 training images, 3589 validation images and 3589 test images) with 7 expression labels. The number of images for every emotion is represented in Table II.
3.2 Model Description
We used the system architecture shown in Figure 3.2. First, we initialized the webcam, which captures video of the person in front of the camera; the video is then segregated into frames using OpenCV. These frames are converted from RGB format to grayscale, and a Haar cascade model is applied to detect the face of the person in front of the camera. A rectangle is then drawn around the detected face, and the photo is cropped to extract the image of the face.
The cropped photo serves as the input to the convolutional neural network, which uses the ReLU activation function (to zero out negative values) and max pooling. The network is then trained on labelled images so that the model reaches optimal accuracy.
After Haar cascade detection, the image is processed by the neural network, which classifies it into one of 7 emotions (anger, sad, happy, fear, disgust, surprise, neutral), and the results are displayed on screen in green, with a green rectangle surrounding the face.
3.3 CNN Implementation
We used the OpenCV library [16] to capture live frames from the web camera and to detect students' faces based on the Haar cascades technique [14], as shown in Figure 8. Haar cascades uses the AdaBoost learning algorithm invented by Freund et al. [15], who won the 2003 Gödel Prize for their work. The AdaBoost learning algorithm chooses a small number of significant features from a large set in order to produce an effective classifier. We built a Convolutional Neural Network model using TensorFlow [18] and the Keras [17] high-level API.
Fig. 9. Image augmentation using Keras (original vs. transformed).
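Augmentation of the kind shown in Fig. 9 can be sketched with Keras. The particular transforms (rotation, shifts, flips) and their ranges are assumptions for illustration; the report does not list the exact settings used:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative augmentation settings: small rotations, shifts, and flips
datagen = ImageDataGenerator(rotation_range=10,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)

images = np.random.rand(8, 48, 48, 1)  # stand-in batch of 48x48 grayscale faces
batch = next(datagen.flow(images, batch_size=8, shuffle=False))
```

Each pass over the generator yields a randomly transformed copy of the batch, so the network never sees exactly the same image twice during training.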
Then we defined our CNN model with 4 convolutional layers, 4 pooling layers (2×2) and 2 fully connected layers. To provide non-linearity in our CNN model we applied the ReLU activation function; we also used batch normalization to normalize the activations of the preceding layer at each batch, and the L2 regularisation technique to apply penalties on the different parameters of the model. Finally, we chose softmax as our last activation function; it takes as input a vector z of K numbers and normalizes it into a probability distribution. The result of the softmax function is shown in Figure 10.
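The normalization that softmax performs can be written out directly in NumPy; the sample scores are illustrative:

```python
import numpy as np

def softmax(z):
    """Normalize a vector of K scores into a probability distribution.
    Subtracting max(z) first is a standard numerical-stability trick
    that does not change the result."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # raw scores for 3 hypothetical classes
probs = softmax(scores)             # sums to 1; order of scores is preserved
```

The largest raw score always maps to the largest probability, which is why the class with the highest softmax output is taken as the predicted emotion.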
To train our CNN model, we split the database into 80% training data and 20% test data, then compiled the model using the stochastic gradient descent (SGD) optimizer. At each epoch, Keras checks whether our model performed better than the models of the previous epochs. If so, the weights of the new best model are saved to a file. This allows us to load the weights directly, without retraining, if we want to use the model in another situation.
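The compile-and-checkpoint step can be sketched as follows. The tiny stand-in model, the loss choice, and the file name are assumptions for illustration; the real network is the CNN described in Chapter 2:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import ModelCheckpoint

# Minimal stand-in model so the snippet is self-contained
model = models.Sequential([layers.Input(shape=(48, 48, 1)),
                           layers.Flatten(),
                           layers.Dense(7, activation='softmax')])
model.compile(optimizer='sgd',                     # stochastic gradient descent
              loss='categorical_crossentropy',     # assumed loss for 7 classes
              metrics=['accuracy'])

# Save weights only when the monitored metric improves on the best epoch so far
checkpoint = ModelCheckpoint('best_model.weights.h5',
                             monitor='val_accuracy',
                             save_best_only=True,
                             save_weights_only=True)
# model.fit(x_train, y_train, validation_split=0.2, epochs=100,
#           callbacks=[checkpoint])
```

With `save_best_only=True`, the file on disk always holds the best-so-far weights, which is exactly the load-without-retraining behaviour described above.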
Chapter 4
Experimental Results
We trained our CNN on the 7 emotion classes (happiness, anger, sadness, disgust, neutral, fear and surprise). The detected face images are resized to 48×48 pixels and converted to grayscale before being used as inputs to the CNN model. Three Bachelor's students from our faculty participated in the experiment, some of them wearing glasses. Figure 11 shows the emotions detected for the 3 students.
We achieved an accuracy rate of 81% at 100 epochs. To evaluate the efficiency and quality of our proposed method we calculated the confusion matrix, precision, recall and F1-score, as shown in Figure 12 and Figure 13, respectively. Our model predicts happy and surprised faces very accurately; however, it predicts fearful faces rather poorly because it confuses them somewhat with sad faces.
Fig. 12. Confusion matrix of the proposed method on the FER database
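The metrics in Figures 12 and 13 can be computed with scikit-learn. The label lists here are hypothetical stand-ins for the model's predictions, chosen to show the fear-vs-sad confusion described above:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

EMOTIONS = ['anger', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral']

# Hypothetical true/predicted label indices standing in for real model output
y_true = np.array([3, 3, 5, 2, 4, 0, 6, 2])
y_pred = np.array([3, 3, 5, 4, 4, 0, 6, 4])  # two 'fear' faces predicted as 'sad'

# Rows: true class, columns: predicted class
cm = confusion_matrix(y_true, y_pred, labels=range(7))

# Per-class precision, recall, and F1-score, as in Figure 13
report = classification_report(y_true, y_pred, labels=range(7),
                               target_names=EMOTIONS, zero_division=0)
```

Off-diagonal entries of `cm` locate exactly which classes get confused; here the fear row has mass in the sad column.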
We trained our convolutional neural network for 100 epochs and achieved a validation accuracy of 81.4% and a training accuracy of 87.83%. This is the best accuracy we have achieved so far. If we train our model for more epochs, the validation accuracy decreases significantly due to overfitting on the training data set, at the cost of more computational power.
If we train our CNN model for fewer epochs, the validation accuracy also decreases, as the neural network cannot adjust the weights and biases properly due to insufficient training.
Chapter 5
Conclusions
5.1 Conclusion
A more accurate technique for a face emotion recognition system has been developed which can work in any environment. This system works very efficiently; it also increases the rate at which data is exchanged, allowing the new system to implement new services quickly as well. This intelligent communication system can also deal with the increase in the number of users, as the system is fully automated and organised. It is a kind of machine that has been trained on many different sets of data given by the user.
Deep learning is the technique used in order to design the intelligent communication system more easily. Systems based on DL are capable of optimizing all the components of transmitters and receivers by using the concept of the autoencoder (an unsupervised learning technique). Hence a CNN-based autoencoder communication system was proposed. The main motivation for applying the autoencoder is that it is able to generalize non-linear transformations with a non-linear activation function, to adapt to specific block lengths, specific channel utilization and diverse throughputs, and the system can work at a particular SNR point perfectly while operating across the SNR range as a whole.
When transmitting over a non-standard channel, the proposed network is able to perform better than existing schemes, since the learned representations of the signal adapt to the different environments through constellation optimization, which happens in a large-dimensional space.
Furthermore, a DCNN-AE scheme is presented in order to remove the need for channel estimation to a large extent. Finally, we showed that the CNN-AE structure can be trained with fewer epochs and that this system has rapid learning convergence.
We used an AWGN channel, for which the optimal solutions are known. Fig. 4.2 and Fig. 4.3 have already demonstrated that the system can match the conventional schemes, using various supervised and unsupervised machine learning algorithms that simulate the human brain. However, the system fails if the channel does not obey the mathematical analysis, in which case a different approach is needed. This can be taken up for future analysis: how our autoencoder system will handle situations where the system equations cannot be identified.
References
[2] https://www.rohm.com/electronics-basics/wireless/modulation-methods (Modulation techniques)