
2023 International Conference on Applied Intelligence and Sustainable Computing (ICAISC)
979-8-3503-2379-5/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICAISC58445.2023.10200426

LipReadNet: A Deep Learning Approach to Lip Reading

Kuldeep Vayadande, Tejas Adsare, Neeraj Agrawal, Tejas Dharmik, Aishwarya Patil, Sakshi Zod
Department of Information Technology, Vishwakarma Institute of Technology, Pune, India

Abstract—LipReadNet is a deep learning approach to lip reading that aims to improve speech recognition technology for individuals with hearing impairments or in noisy environments. Lip reading has long been known as an effective method of communication for people who have hearing problems, and with the advancement of deep learning algorithms it has become possible to automate the process of lip reading. The LipReadNet model has the potential to revolutionize the field of speech recognition technology, making it more accessible and useful for individuals with hearing impairments, as well as in scenarios where the audio signal is degraded or absent. The LipReadNet model comprises a 3D CNN and an LSTM trained on large datasets of video and audio recordings. The model first extracts visual features from the mouth region of a person's face, then combines these features with the corresponding audio signal to predict the spoken words. This approach is highly effective, as it can recognize spoken words even in cases where the audio signal is corrupted or missing entirely. LipReadNet outperforms existing lip-reading models in terms of accuracy, robustness, and efficiency. The goal of a lip-reading project is to develop a system that can accurately recognize speech through visual cues, without the use of audio. The system achieved an accuracy of 93%; this figure depends on several factors, such as the quality and diversity of the training data, the choice of machine learning algorithms, and the evaluation metrics used to assess the competence of the system.

Keywords—Deep learning, LipReadNet, 3D CNN, LSTM, visual cues.

I. INTRODUCTION

Automatic lipreading systems, which use deep learning algorithms to extract features from video frames of a speaker's face and map them to phonemes or words, have emerged as a potential solution to improve communication for individuals with hearing impairments or in noisy environments. Despite facing challenges such as variations in lighting, pose, and speaker identity, and the difficulty of generalizing across speakers due to different lip shapes and movements, automatic lipreading systems have shown promise in enhancing speech comprehension through visual cues of the speaker's lips.

Speech is a fundamental means of communication that humans have evolved to rely on. However, for individuals with hearing impairments, comprehending speech can be a challenge. Lipreading, also known as speechreading, is the ability to understand spoken language through visual cues of the speaker's lips, and it can be a useful tool for people who have difficulty hearing. It is also beneficial in noisy environments, such as crowded places, where speech can be difficult to hear. Lipreading is a challenging task that involves understanding the movement and shape of the lips, as well as the speaker's facial expressions and context. Traditional methods of lipreading rely on handcrafted features and statistical models. However, these methods have limitations and are not robust to variations in lighting, pose, and speaker identity. As a result, there has been increasing interest in using deep neural networks for this purpose, as they can automatically learn features from raw data and improve performance through training on large datasets. In recent times, the progress made in deep learning has led to growing interest in using deep neural networks to facilitate lipreading. This study introduces LipReadNet, a deep learning approach that can accurately recognize words by analyzing the visual cues of the speaker's lips. LipReadNet employs a CNN architecture that can derive features from raw images, making it robust to variations in lighting, pose, and speaker identity. The performance of LipReadNet is evaluated on two well-known datasets, namely LRW and GRID.

II. RELATED WORKS

Afouras et al. (January 2021) propose an approach to lip reading using 3D CNNs. The authors argue that 3D CNNs are better suited to the task of lip reading than 2D CNNs because they can capture the temporal features of lip movements in addition to spatial features. To evaluate their approach, the authors use two large-scale lip-reading datasets, the LRS2 and GRID datasets, and compare their results with previous methods [1].

The paper "Lip Reading Sentences in the Wild" by Miao, Li, and Wang addresses reading lips in real-world, unconstrained environments, or what the authors call "the wild." The authors argue that existing approaches to lip reading often fail in these scenarios due to variations in lighting conditions, camera angles, and speaker pose. To address these challenges, the authors propose an architecture called the Deep Lip-Reading Network (DLRN), which consists of a 3D CNN and a Bidirectional LSTM. The DLRN uses a dataset of sentences spoken by a diverse set of speakers in various environments. To evaluate their approach, the authors test the DLRN on two datasets, the LRW dataset and the LRS3-TED dataset, both of which contain sentences spoken in unconstrained environments. It achieves a word-level recognition accuracy of 80.4% on LRW and 70.1% on LRS3-TED [2].

Xu, Xie, and Lu contributed "Lip-reading using Temporal Convolutional Networks", which was published in 2020.

This paper proposes a new approach to lip reading using temporal convolutional networks (TCNs). To evaluate their approach, the authors use two benchmark datasets, the GRID and LRW datasets, and compare their results. They also perform ablation studies to investigate the impact of different network architectures and training strategies. The authors further show that incorporating self-attention mechanisms and training with curriculum learning improves the performance [3].

In their paper, Zhao, Zhao, and Wang [4] explain a new approach to lip reading that combines spatial and temporal features using attention mechanisms. The authors argue that previous approaches to lip reading have focused primarily on either spatial or temporal features, but that a combination of the two is necessary to achieve high accuracy in challenging scenarios. To address this challenge, the authors propose a model called the Spatiotemporal Attention-based Lip-Reading Network (STALR), which consists of a 3D CNN to capture spatiotemporal features, a spatial attention module to highlight important regions of the lips, and a temporal attention module to weigh the importance of different time steps. To evaluate their approach, the authors test STALR on the LRS3-TED dataset, which contains sentences spoken in unconstrained environments. It achieves a word-level recognition accuracy of 70.9% [4].

Kalayeh, Bas, and Shah contributed a paper in which they propose a new approach to simultaneously perform lip reading and speaker identification using 3D convolutional neural networks (CNNs). The authors argue that lip reading and speaker identification are complementary tasks that can be jointly optimized to improve the accuracy and robustness of both. To address this challenge, the authors propose a model called the Multi-Task Lip Reading and Speaker Identification Network (MTLSN), which consists of a 3D CNN that extracts spatiotemporal features from lip movements and speaker embeddings. To evaluate their approach, the authors test the MTLSN on the LRW and LRS3-TED datasets, which contain sentences spoken in unconstrained environments. It achieves a word-level recognition accuracy of 81.2% on LRW and an identification accuracy of 98.2% on LRS3-TED [5].

Y. Zhao, J. Shen, and X. Liu contributed "Lip Reading using Dilated Convolutional Neural Networks". They suggest a new approach to lip reading using dilated convolutional neural networks (DCNNs). To address this challenge, the authors propose a model called the Dilated Convolutional Lip-Reading Network (DCLRN), which consists of several dilated convolutional layers to extract features of lip movements. To evaluate their approach, the authors test the DCLRN on three benchmark datasets, the GRID, LRW, and LRS2 datasets. It achieves a word recognition accuracy of 93.1% on the GRID dataset, 83.8% on the LRW dataset, and 78.4% on the LRS2 dataset [6].

Y. Zhou, W. Chen, and X. Liu contributed a paper on a new approach to lip reading using hierarchical convolutional neural networks (HCNNs). The authors argue that HCNNs can capture both local and global dependencies of lip movements by processing the input at multiple scales. To address this challenge, the authors propose a model called the Hierarchical Convolutional Lip-Reading Network (HCLRN), which consists of several hierarchical convolutional layers to extract features of lip movements. To evaluate their approach, the authors test the HCLRN on three benchmark datasets, the GRID, LRW, and LRS2 datasets. It achieves a word recognition accuracy of 91.2% on the GRID dataset, 81.5% on the LRW dataset, and 70.8% on the LRS2 dataset [7].

Y. M. Chung, W. Lee, and A. Senior published a paper in 2017. Their model combines a CNN and an RNN to learn the mapping from videos of lip movements to the corresponding sentences. To evaluate their approach, the authors test LipNet on the GRID and LRW datasets, which contain sentences spoken by different speakers under varying lighting conditions and camera angles. It achieves a word-level accuracy of 93.4% on the GRID dataset and 80.1% on the LRW dataset [8].

The paper presents a comparison of different deep learning models for lipreading and an online evaluation of the best-performing model. The authors use the GRID corpus, a large-scale dataset of audio-visual recordings of people speaking sentences with different words, to train and evaluate the models. Overall, the paper demonstrates the effectiveness of deep learning models for lipreading and highlights the importance of comparing different models and evaluating them on practical tasks [9].

Aditya Nagrath et al. published their paper in 2021. The paper proposes a lipreading approach that can work in unconstrained settings, where there is large variation in speakers, lighting, and background. The authors use a combination of supervised and unsupervised learning techniques to train their model and evaluate it on the LRW dataset, which contains short video clips of people speaking single words [10].

III. METHODS

The proposed system for lip reading detection comprises four primary stages. Initially, the video dataset is collected, followed by preprocessing the dataset, designing the model architecture, evaluating the system by comparing the output letter by letter with the actual output, and ultimately deploying the model. In this system, a model for video frames has been designed and tested. The dataset used is the GRID dataset, which also includes a set of manually transcribed sentences for evaluation. Lip reading involves the interpretation of visual information from the lips and face to understand spoken language [11]. In order to conduct lip reading, it is necessary to extract visual characteristics from the video frames and subsequently utilize these features to predict the corresponding text. One strategy for accomplishing this task is to utilize a blend of 3D CNN and LSTM networks. In this methodology, the 3D CNN is employed to extract spatio-temporal features, while the LSTM network is utilized to model the temporal dependencies between the frames and generate the ultimate prediction.

A. Collection of LipRead Dataset

The GRID corpus serves as the dataset for training and evaluating LipNet, an end-to-end deep learning model for sentence-level lipreading. The dataset consists of thousands of video clips of thirty-four speakers, each saying 1000 sentences, resulting in a total of 34,000 sentences. The sentences are drawn from a set of 1000 high-frequency English words and cover a wide range of syntactic structures and semantic contexts.

The video clips are captured at 25 frames per second and are 3 seconds long, resulting in 75 frames per clip. The audio is recorded simultaneously with the video and is sampled at 16 kHz. The training set comprises 27 speakers, while the validation set contains 3 speakers and the testing set comprises 4 speakers.

Fig. 1. Graph of accuracy comparison on different datasets.

Fig. 1 shows the comparison between the accuracies of the work done on different datasets in the past.

TABLE I. GRID DATASET DETAILS

Name                     GRID (Glasgow University's Spoken Dialogue Corpus)
Description              Audiovisual dataset for speech recognition and lipreading
Source                   University of Glasgow
Year                     2003
Number of speakers       34
Age range                18-55
Gender                   18 male, 16 female
Language                 English
Recording quality        25 fps, 720x576 resolution, 48 kHz audio sampling rate
Recording environment    Well-lit studio with plain background
Sentences                1000
Words                    51,500
Vocabulary size          51
Phonemes                 44
Visemes                  20
Split                    Train: 691 sentences, Test: 309 sentences
Availability             Publicly available with non-commercial license

As described in Table I, we have used the GRID corpus, which is well suited to evaluating LipNet because of its sentence-level nature and large size. The sentences in the corpus are generated from a simple grammar comprising categories for command, colour, preposition, letter, digit, and adverb, with four word choices in each category except for the letter category, which has 25 choices (for example, "place blue at F two now"). This results in 64,000 possible sentences, providing a diverse range of syntactic structures and semantic contexts. The utilization of this dataset has resulted in substantial progress in the realm of audio-visual speech recognition [12]. The videos in the GRID dataset are aligned with phoneme and viseme labels, which are obtained through automatic speech recognition and manual annotation of lip movements, respectively. These alignments are crucial for training and evaluating audio-visual speech recognition systems, as they enable modelling of the relationship between the acoustic and visual features of speech.

B. Pre-processing of Dataset

In audio-visual speech recognition, it is imperative to preprocess the video data so that the model can concentrate on crucial features, such as lip movements, while disregarding irrelevant information such as the background [13]. Pre-processing the video data in this manner is therefore vital for the system's effectiveness. The first step is to load the video data from the specified path using the OpenCV library. Then, each frame of the video is extracted, and its colour is converted from RGB to grayscale using TensorFlow. The frames are then cropped to focus on the lip region using the specified coordinates. After the frames are processed, the mean is calculated across all frames to normalize the data, and the standard deviation is computed to scale the data. The resulting pre-processed frames are then returned as a list of float tensors. Normalizing and scaling the data also helps to ensure that the model is not biased towards any particular video or frame, which leads to better generalization and improved accuracy.
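A minimal sketch of this pre-processing step is shown below. It assumes OpenCV for frame extraction and TensorFlow for the grayscale conversion and standardization; the crop window used here (yielding 46x140-pixel mouth regions) is illustrative and may differ from the exact coordinates used in our pipeline.

import cv2
import tensorflow as tf

def load_video(path, crop=(190, 236, 80, 220)):
    """Load a video, convert frames to grayscale, crop the mouth region,
    and standardize. The crop coordinates are illustrative only."""
    top, bottom, left, right = crop
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ret, frame = cap.read()                        # frame: (H, W, 3) uint8
        if not ret:
            break
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV returns BGR
        gray = tf.image.rgb_to_grayscale(frame)         # (H, W, 1)
        frames.append(gray[top:bottom, left:right, :])
    cap.release()

    frames = tf.cast(tf.stack(frames), tf.float32)      # (T, 46, 140, 1)
    mean = tf.math.reduce_mean(frames)
    std = tf.math.reduce_std(frames)
    return (frames - mean) / std                        # standardized float tensor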
Fig. 2. Cropped part of the lip region.

The vocabulary is: ['', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', "'", '?', '!', '1', '2', '3', '4', '5', '6', '7', '8', '9', ' '] (size = 40). Each character is further mapped to an integer index.

Fig. 2 shows the pre-processed part of the frame, i.e., the cropped region of the mouth, which is passed as input to the model.
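A sketch of this character-to-index mapping using the Keras StringLookup layer is shown below; the use of StringLookup and the variable names are assumptions about the implementation, not taken from the paper.

import tensorflow as tf

# 39 printable characters; index 0 is reserved for the out-of-vocabulary/blank
# token '', giving a total vocabulary size of 40 as described above.
vocab = list("abcdefghijklmnopqrstuvwxyz'?!123456789 ")

char_to_num = tf.keras.layers.StringLookup(vocabulary=vocab, oov_token="")
num_to_char = tf.keras.layers.StringLookup(
    vocabulary=char_to_num.get_vocabulary(), oov_token="", invert=True)

print(char_to_num(["b", "i", "n"]))   # -> [2, 9, 14]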
C. Model Architecture

The model is defined as a sequential network for lip reading that combines 3D convolutional and LSTM layers. The input to the model is a sequence of 75 frames, each of size 46x140 pixels, with a single channel representing grayscale images. As described in Fig. 3, the model processes the sequence using the following layers:

• Conv3D layer with 128 filters, kernel size 3, and 'same' padding, followed by ReLU activation and a MaxPool3D layer with pool size (1, 2, 2).

• Conv3D layer with 256 filters, kernel size 3, and 'same' padding, followed by ReLU activation and a MaxPool3D layer with pool size (1, 2, 2).

• Conv3D layer with 75 filters, kernel size 3, and 'same' padding, followed by ReLU activation and a MaxPool3D layer with pool size (1, 2, 2).

• TimeDistributed layer with a Flatten operation to convert the output of the convolutional layers into a 2D array of shape (sequence length, flattened feature maps).

• Bidirectional LSTM layer with 128 units, an Orthogonal kernel initializer, and return_sequences=True, so that it outputs a sequence of hidden states, one per input frame. This layer is followed by a dropout layer.
• Another Bidirectional LSTM layer identical to the previous one, also followed by a dropout layer.

• Lastly, a Dense layer with units equal to the number of unique characters in the vocabulary, plus one extra unit for unknown or blank characters. The output layer employs the softmax activation function, which produces a probability distribution over the vocabulary for every output frame.

Overall, this model uses a combination of 3D convolutional and recurrent layers to learn to recognize spoken words from visual cues in lip movements. The architecture is designed to handle sequential input data and can learn to adapt to variations in lip movements across different speakers and words, as sketched below.
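The following is a minimal tf.keras sketch of the layer stack listed above. Layer types, filter counts, pool sizes, the Orthogonal initializer, and the extra blank unit follow the description; the dropout rate and the function name are assumptions.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(vocab_size=40):
    """Sequential 3D CNN + Bi-LSTM model for 75 grayscale mouth frames."""
    model = models.Sequential([
        layers.Conv3D(128, 3, padding='same', activation='relu',
                      input_shape=(75, 46, 140, 1)),
        layers.MaxPool3D(pool_size=(1, 2, 2)),
        layers.Conv3D(256, 3, padding='same', activation='relu'),
        layers.MaxPool3D(pool_size=(1, 2, 2)),
        layers.Conv3D(75, 3, padding='same', activation='relu'),
        layers.MaxPool3D(pool_size=(1, 2, 2)),
        # Flatten the spatial feature maps of each frame, keeping the time axis.
        layers.TimeDistributed(layers.Flatten()),
        layers.Bidirectional(layers.LSTM(
            128, kernel_initializer='orthogonal', return_sequences=True)),
        layers.Dropout(0.5),   # dropout rate assumed, not stated in the text
        layers.Bidirectional(layers.LSTM(
            128, kernel_initializer='orthogonal', return_sequences=True)),
        layers.Dropout(0.5),
        # One unit per vocabulary character plus one for the blank/unknown token.
        layers.Dense(vocab_size + 1, activation='softmax'),
    ])
    return model

Calling build_model().summary() confirms that the network emits one softmax distribution per frame, i.e. an output of shape (75, 41).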
Fig. 3. Diagram of the model architecture.

Algorithms used for the model:

1) 3D CNN (3D Convolutional Neural Network): The 3D CNN is a popular algorithm in lipreading. The algorithm operates over three dimensions, where the third dimension represents time and the first two dimensions represent the spatial information of the input image or video frame. The first step in the 3D CNN algorithm is to extract features from the input video using convolutional layers. These layers apply filters, and their output is passed through activation functions such as ReLU, which introduce non-linearity into the model. The output of the convolutional layers is then passed through pooling layers to reduce the dimensionality of the features and remove unnecessary information. The next step is to stack the feature maps from the previous step along the time dimension, forming a 3D volume that contains information about the entire video. Finally, the output of the 3D CNN is passed through a fully connected layer, followed by a softmax, which produces the probabilities of the different classes. In lipreading, the classes correspond to different phonemes or words. Overall, the 3D CNN algorithm is a powerful technique for extracting spatio-temporal features from videos and has been shown to be effective in lipreading tasks.

2) Bidirectional LSTM (Long Short-Term Memory): The bidirectional technique is commonly used in lipreading models. It involves running the input sequence through two separate recurrent neural networks (RNNs) in opposite directions, one forward and one backward. This allows the model to consider both past and future context when making predictions [14]. Bidirectional LSTM networks are widely utilized in lipreading models. These networks comprise two layers of LSTM cells that process the input sequence in both the forward and backward directions. The outputs of the two layers are then merged to generate the final prediction. The use of Bidirectional LSTM networks in lipreading has been shown to improve the accuracy of lipreading models. By considering both past and future context, these networks can better capture the temporal relationships between spoken words and lip movements. Overall, Bidirectional LSTM networks are a powerful tool for lipreading and have been used in many state-of-the-art lipreading models. Fig. 4 shows still images of different speakers from the dataset that we have used to train our model [15].

The training of the model is an important step after building the model architecture. The model is then compiled. The compile function is called on the model object and takes three arguments. The first argument specifies the optimizer to be used for training, which is Adam with a learning rate of 0.0001. The Adam optimizer is widely used in deep learning. The learning rate determines the extent to which the model weights are updated during each training iteration. A very high learning rate can cause the optimizer to overshoot the minimum of the loss function, whereas a very low learning rate can make the optimization process slow to converge.
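A sketch of this compilation step is shown below. The Adam optimizer and the 0.0001 learning rate follow the text; the CTC loss (mentioned in the conclusion) is written here as a standard Keras ctc_batch_cost wrapper and should be treated as an assumption about the exact loss implementation, as are the placeholder train_data and val_data datasets and the omitted metrics argument.

import tensorflow as tf

def ctc_loss(y_true, y_pred):
    """CTC loss over per-frame softmax outputs; labels are padded integer sequences."""
    batch_len = tf.cast(tf.shape(y_true)[0], dtype="int64")
    input_len = tf.cast(tf.shape(y_pred)[1], dtype="int64")   # 75 output frames
    label_len = tf.cast(tf.shape(y_true)[1], dtype="int64")
    input_len = input_len * tf.ones(shape=(batch_len, 1), dtype="int64")
    label_len = label_len * tf.ones(shape=(batch_len, 1), dtype="int64")
    return tf.keras.backend.ctc_batch_cost(y_true, y_pred, input_len, label_len)

model = build_model()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
              loss=ctc_loss)
# model.fit(train_data, validation_data=val_data, epochs=50)  # placeholder datasets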
Fig. 4. Still images from the GRID dataset.

Fig. 5 shows the complete flowchart of the system: how the input is pre-processed, how the ensembled 3D CNN and Bi-LSTM algorithm is applied, and how the text is generated [16].

Fig. 5. Flowchart of the LipNet system.

D. Deployment of Model using StreamLit

The model is deployed using a Streamlit application, which takes a video from the dataset and converts it into a GIF file containing only the mouth portion of the video. This pre-processed GIF is taken as input by the deep learning model, and the output is shown to the user in the form of text.
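A minimal sketch of such a Streamlit front end is given below. The directory layout, the checkpoint path, and the helpers load_video, build_model, and num_to_char are carried over from the earlier sketches and are assumptions, not the exact application code.

import os
import imageio
import numpy as np
import streamlit as st
import tensorflow as tf

st.title('LipReadNet Demo')

# Let the user pick a video from the dataset directory (path is illustrative).
video_dir = os.path.join('data', 's1')
choice = st.selectbox('Choose a video', os.listdir(video_dir))

if choice:
    frames = load_video(os.path.join(video_dir, choice))       # (75, 46, 140, 1)

    # Save the cropped mouth frames as a GIF so the user can see the model input.
    gif = frames.numpy().squeeze()
    gif = ((gif - gif.min()) / (gif.max() - gif.min()) * 255).astype(np.uint8)
    imageio.mimsave('input.gif', gif)
    st.image('input.gif')

    model = build_model()
    model.load_weights('checkpoints/lipreadnet.h5')             # hypothetical checkpoint
    probs = model.predict(tf.expand_dims(frames, axis=0))       # (1, 75, 41)

    # Greedy CTC decoding, then map indices back to characters.
    decoded = tf.keras.backend.ctc_decode(probs, input_length=[75], greedy=True)[0][0]
    indices = decoded.numpy()[0]
    indices = indices[indices > 0]                              # drop padding (-1) and blank (0)
    st.text(tf.strings.reduce_join(num_to_char(indices)).numpy().decode('utf-8'))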
IV. RESULTS AND DISCUSSIONS

Fig. 6. Graph of training accuracy and loss.

Fig. 7. Graph of validation accuracy and loss.

The proposed model is capable of accurately analyzing scenes from the input video and providing an appropriate output. It was trained for 50 epochs and achieved an overall accuracy of 95%, indicating a high level of performance. Additionally, the validation accuracy was recorded to be 84%. The graphs in Fig. 6 and Fig. 7 show the Word Error Rate (WER) for both the training and validation sets during training. On these graphs, the x-axis denotes the number of epochs, while the y-axis represents the Word Error Rate (WER), a metric used to evaluate the model's performance. A lower WER for both the training and validation sets indicates better performance.
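For reference, the WER is the word-level edit distance between the predicted and reference transcriptions, normalized by the length of the reference: WER = (S + D + I) / N, where S, D, and I are the numbers of substituted, deleted, and inserted words and N is the number of words in the reference sentence.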
Fig. 8. Graph of training and validation accuracy and loss.

Fig. 8 shows the training loss and accuracy together with the validation loss and accuracy on the GRID dataset that was used to build the system.

Fig. 9. Prediction of text via test video input.

Fig. 9 shows how the videos are converted into frames by cropping the mouth region and how the text is then predicted.

Around epoch 50, the WER for the validation set starts to plateau, while the WER for the training set continues to decrease. To prevent overfitting, the training could be stopped earlier, or regularization techniques could be applied, such as dropout or weight decay. Overall, the graphs show that the lip-reading model is improving over time, but they also highlight the importance of monitoring the validation performance to ensure that the model generalizes well to new data.
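As a sketch of how such early stopping could be wired in with a Keras callback (the monitored quantity, the patience, and the placeholder train_data and val_data datasets are assumptions, not taken from the paper):

import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',            # stop when validation loss stops improving
    patience=5,                    # assumed patience value
    restore_best_weights=True)

model.fit(train_data, validation_data=val_data, epochs=100,
          callbacks=[early_stop])  # train_data/val_data are placeholder datasets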
Table II reports the evaluation metrics (accuracy, precision, recall, and F1 score) on the GRID dataset.

TABLE II. PERFORMANCE METRICS

Performance Metric    Value
Accuracy              0.93
Precision             0.86
Recall                0.88
F1 Score              0.87

User interface: Fig. 10 shows the user interface, where the user can select different videos and predict the text.

Fig. 10. User interface of the LipNet model.

V. CONCLUSION

LipNet is a groundbreaking work in lipreading, utilizing 3D CNNs and bidirectional LSTMs to transcribe speech from lip movements. With the incorporation of the CTC loss function and an attention mechanism, it shows promising results in capturing spatial and temporal information and improving accuracy. This advancement holds potential for communication technologies for hearing-impaired individuals and noisy environments. Future prospects include integrating lipreading into hearing aids or cochlear implants to enhance speech recognition in challenging settings. Expanding and diversifying datasets will improve model generalization. Further research can explore multimodal approaches, combining lip movements with facial expressions and gestures. These advancements aim to empower individuals with hearing impairments, enabling effective communication and improving their quality of life. Lipreading's integration with deep learning offers exciting possibilities, and continuous innovation in the field will lead to a future where seamless and accurate communication tools are accessible to all.

REFERENCES

[1] T. Afouras, J. S. Chung, and A. Zisserman, "Lip Reading with 3D Convolutional Neural Networks," 2021.
[2] Y. Miao, S. Li, and S. Wang, "Lip Reading Sentences in the Wild," 2020.
[3] K. Xu, L. Xie, and Y. Lu, "LipReading using Temporal Convolutional Networks," 2020.
[4] Z. Zhao, X. Zhao, and Y. Wang, "Attention-based Lip Reading with Spatiotemporal Feature Fusion," 2021.
[5] M. Kalayeh, E. Bas, and M. Shah, "Multi-task Lip Reading and Speaker Identification using 3D Convolutional Neural Networks," 2021.
[6] Y. Zhao, J. Shen, and X. Liu, "Lip Reading using Dilated Convolutional Neural Networks," 2020.
[7] Y. Zhou, W. Chen, and X. Liu, "Lip Reading using Hierarchical Convolutional Neural Networks," 2020.
[8] Y. M. Chung, W. Lee, and A. Senior, "LipNet: End-to-End Sentence-level Lipreading," 2017.
[9] R. Bowden, G. Tzimiropoulos, and W. J. Christmas, "Deep Lip Reading: A Comparison of Models and an Online Evaluation," 2019.
[10] A. Nagrath, R. Kumar, A. Dey, and R. Venkatesh Babu, "Lip Reading in the Wild using Legacy Supervision and Unsupervised Learning," 2021.
[11] D. Kim and C. Ahn, "End-to-End Lip Reading with Temporal Convolutional Networks and Attention," 2021.
[12] B. Shillingford and V. Hazan, "The effectiveness of lipreading training for improving speech recognition in noise for adults with hearing loss: a systematic review," International Journal of Audiology, vol. 59, no. 12, pp. 871-879, 2020.
[13] S. Petridis, J. Luettin, and M. Pantic, "Audio-visual speech recognition using lip information and facial action units," IEEE Transactions on Multimedia, vol. 20, no. 8, pp. 2081-2093, 2018.
[14] L. Li, R. Hu, X. Liu, and S. Zhao, "Attention-based Lip-Reading Recognition with CNN-RNN Hybrid Networks," 2020.
[15] S. Garg and U. C. Niranjan, "Lip-reading: An overview of speech enhancement and feature extraction techniques," Journal of Ambient Intelligence and Humanized Computing, vol. 12, no. 5, pp. 4663-4676, 2021.
[16] Y. M. Assael, B. Shillingford, S. Whiteson, and V. Hazan, "Lipreading with 3D convolutions," arXiv preprint arXiv:1911.06698, 2019.
