0% found this document useful (0 votes)

34 views19 pages

Q R N N: Uaternion Ecurrent Eural Etworks

This paper introduces quaternion recurrent neural networks (QRNN) and quaternion long-short term memory networks (QLSTM) to better model multi-dimensional sequential data by leveraging quaternion algebra. The proposed models demonstrate improved performance in automatic speech recognition tasks compared to traditional RNNs and LSTMs, achieving better accuracy with significantly fewer parameters. The findings suggest that quaternion-based architectures are more efficient for capturing internal dependencies in complex data structures.

Uploaded by

JAFOR MOHAMMAD

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

34 views19 pages

Q R N N: Uaternion Ecurrent Eural Etworks

Uploaded by

JAFOR MOHAMMAD

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 19

Published as a conference paper at ICLR 2019

Q UATERNION R ECURRENT N EURAL N ETWORKS

Titouan Parcollet1,4 , Mirco Ravanelli2 , Mohamed Morchid1 , Georges Linarès1 ,
Chiheb Trabelsi2,5 , Renato De Mori1,3 , Yoshua Bengio2 ∗
1
LIA, Université d’Avignon, France
2
MILA, Université de Montréal, Québec, Canada
3
McGill University, Québec, Canada
4
Orkis, Aix-en-provence, France
5
Element AI, Montréal, Québec, Canada
[email protected],
arXiv:1806.04418v3 [stat.ML] 7 Jan 2019

[email protected],
[email protected],
[email protected], [email protected]

A BSTRACT
Recurrent neural networks (RNNs) are powerful architectures to model sequential
data, due to their capability to learn short and long-term dependencies between
the basic elements of a sequence. Nonetheless, popular tasks such as speech or
images recognition, involve multi-dimensional input features that are characterized
by strong internal dependencies between the dimensions of the input vector. We
propose a novel quaternion recurrent neural network (QRNN), alongside with
a quaternion long-short term memory neural network (QLSTM), that take into
account both the external relations and these internal structural dependencies
with the quaternion algebra. Similarly to capsules, quaternions allow the QRNN
to code internal dependencies by composing and processing multidimensional
features as single entities, while the recurrent operation reveals correlations between
the elements composing the sequence. We show that both QRNN and QLSTM
achieve better performances than RNN and LSTM in a realistic application of
automatic speech recognition. Finally, we show that QRNN and QLSTM reduce
by a maximum factor of 3.3x the number of free parameters needed, compared to
real-valued RNNs and LSTMs to reach better results, leading to a more compact
representation of the relevant information.

1 I NTRODUCTION
In the last few years, deep neural networks (DNN) have encountered a wide success in different
domains due to their capability to learn highly complex input to output mapping. Among the different
DNN-based models, the recurrent neural network (RNN) is well adapted to process sequential
data. Indeed, RNNs build a vector of activations at each timestep to code latent relations between
input vectors. Deep RNNs have been recently used to obtain hidden representations of speech unit
sequences (Ravanelli et al., 2018a) or text word sequences (Conneau et al., 2018), and to achieve
state-of-the-art performances in many speech recognition tasks (Graves et al., 2013a;b; Amodei et al.,
2016; Povey et al., 2016; Chiu et al., 2018). However, many recent tasks based on multi-dimensional
input features, such as pixels of an image, acoustic features, or orientations of 3D models, require
to represent both external dependencies between different entities, and internal relations between
the features that compose each entity. Moreover, RNN-based algorithms commonly require a huge
number of parameters to represent sequential data in the hidden space.

Quaternions are hypercomplex numbers that contain a real and three separate imaginary
components, perfectly fitting to 3 and 4 dimensional feature vectors, such as for image processing
and robot kinematics (Sangwine, 1996; Pei & Cheng, 1999; Aspragathos & Dimitros, 1998). The
∗
CIFAR Senior Fellow

1
Published as a conference paper at ICLR 2019

idea of bundling groups of numbers into separate entities is also exploited by the recent manifold
and capsule networks (Chakraborty et al., 2018; Sabour et al., 2017). Contrary to traditional
homogeneous representations, capsule and quaternion networks bundle sets of features together.
Thereby, quaternion numbers allow neural network based models to code latent inter-dependencies
between groups of input features during the learning process with fewer parameters than RNNs, by
taking advantage of the Hamilton product as the equivalent of the ordinary product, but between
quaternions. Early applications of quaternion-valued backpropagation algorithms (Arena et al.,
1994; 1997) have efficiently solved quaternion functions approximation tasks. More recently, neural
networks of complex and hypercomplex numbers have received an increasing attention (Hirose &
Yoshida, 2012; Tygert et al., 2016; Danihelka et al., 2016; Wisdom et al., 2016), and some efforts
have shown promising results in different applications. In particular, a deep quaternion network
(Parcollet et al., 2016; 2017a;b), a deep quaternion convolutional network (Gaudet & Maida, 2018;
Parcollet et al., 2018), or a deep complex convolutional network (Trabelsi et al., 2017) have been
employed for challenging tasks such as images and language processing. However, these applications
do not include recurrent neural networks with operations defined by the quaternion algebra.

This paper proposes to integrate local spectral features in a novel model called quaternion
recurrent neural network1 (QRNN), and its gated extension called quaternion long-short term
memory neural network (QLSTM). The model is proposed along with a well-adapted parameters
initialization and turned out to learn both inter- and intra-dependencies between multidimensional
input features and the basic elements of a sequence with drastically fewer parameters (Section 3),
making the approach more suitable for low-resource applications. The effectiveness of the proposed
QRNN and QLSTM is evaluated on the realistic TIMIT phoneme recognition task (Section 4.2)
that shows that both QRNN and QLSTM obtain better performances than RNNs and LSTMs with a
best observed phoneme error rate (PER) of 18.5% and 15.1% for QRNN and QLSTM, compared
to 19.0% and 15.3% for RNN and LSTM. Moreover, these results are obtained alongside with a
reduction of 3.3 times of the number of free parameters. Similar results are observed with the larger
Wall Street Journal (WSJ) dataset, whose detailed performances are reported in the Appendix 6.1.1.

2 M OTIVATIONS

A major challenge of current machine learning models is to well-represent in the latent space the
astonishing amount of data available for recent tasks. For this purpose, a good model has to efficiently
encode local relations within the input features, such as between the Red, Green, and Blue (R,G,B)
channels of a single image pixel, as well as structural relations, such as those describing edges or
shapes composed by groups of pixels. Moreover, in order to learn an adequate representation with the
available set of training data and to avoid overfitting, it is convenient to conceive a neural architecture
with the smallest number of parameters to be estimated. In the following, we detail the motivations
to employ a quaternion-valued RNN instead of a real-valued one to code inter and intra features
dependencies with fewer parameters.
As a first step, a better representation of multidimensional data has to be explored to naturally capture
internal relations within the input features. For example, an efficient way to represent the information
composing an image is to consider each pixel as being a whole entity of three strongly related
elements, instead of a group of uni-dimensional elements that could be related to each other, as in
traditional real-valued neural networks. Indeed, with a real-valued RNN, the latent relations between
the RGB components of a given pixel are hardly coded in the latent space since the weight has to
find out these relations among all the pixels composing the image. This problem is effectively solved
by replacing real numbers with quaternion numbers. Indeed, quaternions are fourth dimensional
and allow one to build and process entities made of up to four related features. The quaternion
algebra and more precisely the Hamilton product allows quaternion neural network to capture these
internal latent relations within the features encoded in a quaternion. It has been shown that QNNs
are able to restore the spatial relations within 3D coordinates (Matsui et al., 2004), and within color
pixels (Isokawa et al., 2003), while real-valued NN failed. This is easily explained by the fact that
the quaternion-weight components are shared through multiple quaternion-input parts during the
Hamilton product , creating relations within the elements. Indeed, Figure 1 shows that the multiple
weights required to code latent relations within a feature are considered at the same level as for
1
https://github.com/Orkis-Research/Pytorch-Quaternion-Neural-Networks

2
Published as a conference paper at ICLR 2019

Figure 1: Illustration of the input features (Qin ) latent relations learning ability of a quaternion-valued
layer (right) due to the quaternion weight sharing of the Hamilton product (Eq. 5), compared to a
standard real-valued layer (left).

learning global relations between different features, while the quaternion weight w codes these
internal relations within a unique quaternion Qout during the Hamilton product (right).
Then, while bigger neural networks allow better performances, quaternion neural networks make it
possible to deal with the same signal dimension but with four times less neural parameters. Indeed, a
4-number quaternion weight linking two 4-number quaternion units only has 4 degrees of freedom,
whereas a standard neural net parametrization has 4 × 4 = 16, i.e., a 4-fold saving in memory.
Therefore, the natural multidimensional representation of quaternions alongside with their ability
to drastically reduce the number of parameters indicate that hyper-complex numbers are a better fit
than real numbers to create more efficient models in multidimensional spaces. Based on the success
of previous deep quaternion convolutional neural networks and smaller quaternion feed-forward
architectures (Kusamichi et al., 2004; Isokawa et al., 2009; Parcollet et al., 2017a), this work proposes
to adapt the representation of hyper-complex numbers to the capability of recurrent neural networks
in a natural and efficient framework to multidimensional sequential tasks such as speech recognition.
Modern automatic speech recognition systems usually employ input sequences composed of mul-
tidimensional acoustic features, such as log Mel features, that are often enriched with their first,
second and third time derivatives (Davis & Mermelstein, 1990; Furui, 1986), to integrate contextual
information. In standard RNNs, static features are simply concatenated with their derivatives to form
a large input vector, without effectively considering that signal derivatives represent different views of
the same input. Nonetheless, it is crucial to consider that time derivatives of the spectral energy in a
given frequency band at a specific time frame represent a special state of a time-frame, and are linearly
correlated (Tokuda et al., 2003). Based on the above motivations and the results observed on previous
works about quaternion neural networks, we hypothesize that quaternion RNNs naturally provide
a more suitable representation of the input sequence, since these multiple views can be directly
embedded in the multiple dimensions space of the quaternion, leading to better generalization.

3 Q UATERNION RECURRENT NEURAL NETWORKS

This Section describes the quaternion algebra (Section 3.1), the internal quaternion representation
(Section 3.2), the backpropagation through time (BPTT) for quaternions (Section 3.3.2), and proposes
an adapted weight initialization to quaternion-valued neurons (Section 3.4).

3.1 Q UATERNION ALGEBRA

The quaternion algebra H defines operations between quaternion numbers. A quaternion Q is an

extension of a complex number defined in a four dimensional space as:
Q = r1 + xi + yj + zk, (1)

3
Published as a conference paper at ICLR 2019

where r, x, y, and z are real numbers, and 1, i, j, and k are the quaternion unit basis. In a quaternion,
r is the real part, while xi + yj + zk with i2 = j2 = k2 = ijk = −1 is the imaginary part, or the
vector part. Such a definition can be used to describe spatial rotations. The information embedded in
the quaterion Q can be summarized into the following matrix of real numbers:
r −x −y −z
 
x r −z y 
Qmat =  . (2)
y z r −x
z −y x r
The conjugate Q∗ of Q is defined as:
Q∗ = r1 − xi − yj − zk. (3)
/
Then, a normalized or unit quaternion Q is expressed as:
Q
Q/ = p . (4)
r + x + y2 + z2
2 2

Finally, the Hamilton product ⊗ between two quaternions Q1 and Q2 is computed as follows:
Q1 ⊗ Q2 =(r1 r2 − x1 x2 − y1 y2 − z1 z2 ) + (r1 x2 + x1 r2 + y1 z2 − z1 y2 )i+
(r1 y2 − x1 z2 + y1 r2 + z1 x2 )j + (r1 z2 + x1 y2 − y1 x2 + z1 r2 )k. (5)
The Hamilton product (a graphical view is depicted in Figure 1) is used in QRNNs to perform
transformations of vectors representing quaternions, as well as scaling and interpolation between two
rotations following a geodesic over a sphere in the R3 space as shown in (Minemoto et al., 2017).

3.2 Q UATERNION REPRESENTATION

The QRNN is an extension of the real-valued (Medsker & Jain, 2001) and complex-valued (Hu
& Wang, 2012; Song & Yam, 1998) recurrent neural networks to hypercomplex numbers. In a
quaternion dense layer, all parameters are quaternions, including inputs, outputs, weights, and biases.
The quaternion algebra is ensured by manipulating matrices of real numbers (Gaudet & Maida, 2018).
Consequently, for each input vector of size N , output vector of size M , dimensions are split into
four parts: the first one equals to r, the second is xi, the third one equals to yj, and the last one to
zk to compose a quaternion Q = r1 + xi + yj + zk. The inference process of a fully-connected
layer is defined in the real-valued space by the dot product between an input vector and a real-valued
M × N weight matrix. In a QRNN, this operation is replaced with the Hamilton product (Eq. 5) with
quaternion-valued matrices (i.e. each entry in the weight matrix is a quaternion). The computational
complexity of quaternion-valued models is discussed in Appendix 6.1.2

3.3 L EARNING ALGORITHM

The QRNN differs from the real-valued RNN in each learning sub-processes. Therefore, let xt be
the input vector at timestep t, ht the hidden state, Whx , Why and Whh the input, output and hidden
states weight matrices respectively. The vector bh is the bias of the hidden state and pt , yt are the
output and the expected target vectors. More details of the learning process and the parametrization
are available on Appendix 6.2.

3.3.1 F ORWARD PHASE

Based on the forward propagation of the real-valued RNN (Medsker & Jain, 2001), the QRNN
forward equations are extended as follows:
ht = α(Whh ⊗ ht−1 + Whx ⊗ xt + bh ), (6)
where α is a quaternion split activation function (Xu et al., 2017; Tripathi, 2016) defined as:
α(Q) = f (r) + f (x)i + f (y)j + f (z)k, (7)
with f corresponding to any standard activation function. The split approach is preferred in this work
due to better prior investigations, better stability (i.e. pure quaternion activation functions contain
singularities), and simpler computations. The output vector pt is computed as:
pt = β(Why ⊗ ht ), (8)
where β is any split activation function. Finally, the objective function is a classical loss applied
component-wise (e.g., mean squared error, negative log-likelihood).

4
Published as a conference paper at ICLR 2019

3.3.2 Q UATERNION BACKPROPAGATION T HROUGH T IME

The backpropagation through time (BPTT) for quaternion numbers (QBPTT) is an extension of the
standard quaternion backpropagation (Nitta, 1995), and its full derivation is available in Appendix
∂Et
6.3. The gradient with respect to the loss Et is expressed for each weight matrix as ∆thy = ∂W hy
,
∂Et ∂Et ∂Et ∂Et
∆thh = ∂Whh , ∆thx = ∂Whx , for the bias vector as ∆tb = ∂Bh , and is generalized to ∆t = ∂W with:
∂Et ∂Et ∂Et ∂Et ∂Et
= r
+i i
+j j
+k . (9)
∂W ∂W ∂W ∂W ∂W k
Each term of the above relation is then computed by applying the chain rule. Indeed, and conversaly
to real-valued backpropagation, QBPTT must defines the dynamic of the loss w.r.t to each component
of the quaternion neural parameters. As a use-case for the equations, the mean squared error at a
timestep t and named Et is used as the loss function. Moreover, let λ be a fixed learning rate. First,
the weight matrix Why is only seen in the equations of pt . It is therefore straightforward to update
each weight of Why at timestep t following:
∂Et
Why = Why − λ∆thy ⊗ h∗t , with ∆thy = = (pt − yt ), (10)
∂Why
where h∗t is the conjugate of ht . Then, the weight matrices Whh , Whx and biases bh are arguments
of ht with ht−1 involved, and the update equations are derived as:
Whh = Whh − λ∆thh , Whx = Whx − λ∆thx , bh = bh − λ∆tb , (11)
with,
t
X t
Y t
X t
Y t
X t
Y
∆thh = ( δn ) ⊗ h∗m−1 , ∆thx = ( δn ) ⊗ x∗m , ∆tb = ( δn ), (12)
m=0 n=m m=0 n=m m=0 n=m
and,
∗
⊗ δn+1 × α0 (hpreact

Whh n ) if n 6= t
δn = ∗ 0 (13)
Why ⊗ (pn − yn ) × β (ppreact
n ) otherwise,
with hpreact
n and ppreact
n the pre-activation values of hn and pn respectively.

3.4 PARAMETER INITIALIZATION

A well-designed parameter initialization scheme strongly impacts the efficiency of a DNN. An

appropriate initialization, in fact, improves DNN convergence, reduces the risk of exploding or
vanishing gradient, and often leads to a substantial performance improvement (Glorot & Bengio,
2010). It has been shown that the backpropagation through time algorithm of RNNs is degraded
by an inappropriated parameter initialization (Sutskever et al., 2013). Moreover, an hyper-complex
parameter cannot be simply initialized randomly and component-wise, due to the interactions between
components. Therefore, this Section proposes a procedure reported in Algorithm 1 to initialize a
matrix W of quaternion-valued weights. The proposed initialization equations are derived from the
polar form of a weight w of W :
/
w = |w|eqimag θ = |w|(cos(θ) + qimag
/
sin(θ)), (14)
and,
/ / /
wr = ϕ cos(θ), wi = ϕ qimagi sin(θ), wj = ϕ qimagj sin(θ), wk = ϕ qimagk sin(θ). (15)
/
The angle θ is randomly generated in the interval [−π, π]. The quaternion qimag is defined as purely
/
normalized imaginary, and is expressed as qimag = 0 + xi + yj + zk. The imaginary components xi,
yj, and zk are sampled from an uniform distribution in [0, 1] to obtain qimag , which is then normalized
/
(following Eq. 4) to obtain qimag . The parameter ϕ is a random number generated with respect to
well-known initialization criterions (such as Glorot or He algorithms) (Glorot & Bengio, 2010; He
et al., 2015). However, the equations derived in (Glorot & Bengio, 2010; He et al., 2015) are defined
for real-valued weight matrices. Therefore, the variance of W has to be investigated in the quaternion
space to obtain ϕ (the full demonstration is provided in Appendix 6.2). The variance of W is:
V ar(W ) = E(|W |2 ) − [E(|W |)]2 , with [E(|W |)]2 = 0. (16)

5
Published as a conference paper at ICLR 2019

Algorithm 1 Quaternion-valued weight initialization

1: procedure Q INIT(W, nin , nout )
1
2: σ←√ . w.r.t to Glorot criterion and Eq. 18
2(nin +nout )
3: for w in W do
4: θ ← rand(−π, π)
5: ϕ ← rand(−σ, σ)
6: x, y, z ← rand(0, 1)
7: qimag ← Quaternion(0, x, y, z)
q
8: /
qimag ← √ 2imag
2 2
x +y +z
9: wr ← ϕ × cos(θ) . See Eq. 15
/
10: wi ← ϕ × qimag i
× sin(θ)
/
11: wj ← ϕ × qimag j
× sin(θ)
/
12: wk ← ϕ × qimag k
× sin(θ)
13: w ← Quaternion(wr , wi , wj , wk )

Indeed, the weight distribution is normalized. The value of V ar(W ) = E(|W |2 ), instead, is not
trivial in the case of quaternion-valued matrices. Indeed, W follows a Chi-distribution with four
degrees of freedom (DOFs). Consequently, V ar(W ) is expressed and computed as follows:
Z ∞
V ar(W ) = E(|W |2 ) = x2 f (x) dx = 4σ 2 . (17)
0

The Glorot (Glorot & Bengio, 2010) and He (He et al., 2015) criterions are extended to quaternion as:
1 1
σ =p , and σ = √ , (18)
2(nin + nout ) 2nin
with nin and nout the number of neurons of the input and output layers respectively. Finally, ϕ can
be sampled from [−σ, σ] to complete the weight initialization of Eq. 15.

4 E XPERIMENTS
This Section details the acoustic features extraction (Section 4.1), the experimental setups and the
results obtained with QRNNs, QLSTMs, RNNs and LSTMs on the TIMIT speech recognition tasks
(Section 4.2). The results reported in bold on tables are obtained with the best configurations of the
neural networks observed with the validation set.

4.1 Q UATERNION ACOUSTIC FEATURES

The raw audio is first splitted every 10ms with a window of 25ms. Then 40-dimensional log Mel-filter-
bank coefficients with first, second, and third order derivatives are extracted using the pytorch-kaldi2
(Ravanelli et al., 2018b) toolkit and the Kaldi s5 recipes (Povey et al., 2011). An acoustic quaternion
Q(f, t) associated with a frequency f and a time-frame t is formed as follows:
∂e(f, t) ∂ 2 e(f, t) ∂ 3 e(f, t)
Q(f, t) = e(f, t) + i+ 2
j+ k. (19)
∂t ∂ t ∂3t
Q(f, t) represents multiple views of a frequency f at time frame t, consisting of the energy e(f, t)
in the filter band at frequency f , its first time derivative describing a slope view, its second time
derivative describing a concavity view, and the third derivative describing the rate of change of the
second derivative. Quaternions are used to learn the spatial relations that exist between the 3 described
different views that characterize a same frequency (Tokuda et al., 2003). Thus, the quaternion input
vector length is 160/4 = 40. Decoding is based on Kaldi (Povey et al., 2011) and weighted finite
state transducers (WFST) (Mohri et al., 2002) that integrate acoustic, lexicon and language model
probabilities into a single HMM-based search graph.
2
pytorch-kaldi is available at https://github.com/mravanelli/pytorch-kaldi

6
Published as a conference paper at ICLR 2019

4.2 T HE TIMIT CORPUS

The training process is based on the standard 3, 696 sentences uttered by 462 speakers, while testing
is conducted on 192 sentences uttered by 24 speakers of the TIMIT (Garofolo et al., 1993) dataset.
A validation set composed of 400 sentences uttered by 50 speakers is used for hyper-parameter
tuning. The models are compared on a fixed number of layers M = 4 and by varying the number
of neurons N from 256 to 2, 048, and 64 to 512 for the RNN and QRNN respectively. Indeed, it is
worth underlying that the number of hidden neurons in the quaternion and real spaces do not handle
the same amount of real-number values. Indeed, 256 quaternion neurons output are 256 × 4 = 1024
real values. Tanh activations are used across all the layers except for the output layer that is based on
a softmax function. Models are optimized with RMSPROP with vanilla hyper-parameters and an
initial learning rate of 8 · 10−4 . The learning rate is progressively annealed using a halving factor of
0.5 that is applied when no performance improvement on the validation set is observed. The models
are trained during 25 epochs. All the models converged to a minimum loss, due to the annealed
learning rate. A dropout rate of 0.2 is applied over all the hidden layers (Srivastava et al., 2014)
except the output one. The negative log-likelihood loss function is used as an objective function.
All the experiments are repeated 5 times (5-folds) with different seeds and are averaged to limit any
variation due to the random initialization.
Table 1: Phoneme error rate (PER%) of QRNN and RNN models on the development and test sets of
the TIMIT dataset. “Params" stands for the total number of trainable parameters.

Models Neurons Dev. Test Params

256 22.4 23.4 1M
512 19.6 20.4 2.8M
RNN
1,024 17.9 19.0 9.4M
2,048 20.0 20.7 33.4M
64 23.6 23.9 0.6M
128 19.2 20.1 1.4M
QRNN
256 17.4 18.5 3.8M
512 17.5 18.7 11.2M

The results on the TIMIT task are reported in Table 1. The best PER in realistic conditions (w.r.t to
the best validation PER) is 18.5% and 19.0% on the test set for QRNN and RNN models respectively,
highlighting an absolute improvement of 0.5% obtained with QRNN. These results compare favorably
with the best results obtained so far with architectures that do not integrate access control in multiple
memory layers (Ravanelli et al., 2018a). In the latter, a PER of 18.3% is reported on the TIMIT
test set with batch-normalized RNNs . Moreover, a remarkable advantage of QRNNs is a drastic
reduction (with a factor of 2.5×) of the parameters needed to achieve these results. Indeed, such
PERs are obtained with models that employ the same internal dimensionality corresponding to
1, 024 real-valued neurons and 256 quaternion-valued ones, resulting in a number of parameters of
3.8M for QRNN against the 9.4M used in the real-valued RNN. It is also worth noting that QRNNs
consistently need fewer parameters than equivalently sized RNNs, with an average reduction factor
of 2.26 times. This is easily explained by considering the content of the quaternion algebra. Indeed,
for a fully-connected layer with 2, 048 input values and 2, 048 hidden units, a real-valued RNN has
2, 0482 ≈ 4.2M parameters, while to maintain equal input and output dimensions the quaternion
equivalent has 512 quaternions inputs and 512 quaternion hidden units. Therefore, the number of
parameters for the quaternion-valued model is 5122 × 4 ≈ 1M. Such a complexity reduction turns out
to produce better results and has other advantages such as a smaller memory footprint while saving
models on budget memory systems. This characteristic makes our QRNN model particularly suitable
for speech recognition conducted on low computational power devices like smartphones (Chen et al.,
2014). QRNNs and RNNs accuracies vary accordingly to the architecture with better PER on bigger
and wider topologies. Therefore, while good PER are observed with a higher number of parameters,
smaller architectures performed at 23.9% and 23.4%, with 1M and 0.6M parameters for the RNN
and the QRNN respectively. Such PER are due to a too small number of parameters to solve the task.

4.3 Q UATERNION LONG - SHORT TERM MEMORY NEURAL NETWORKS

We propose to extend the QRNN to state-of-the-art models such as long-short term memory neural
networks (LSTM), to support and improve the results already observed with the QRNN compared to
the RNN in more realistic conditions. LSTM (Hochreiter & Schmidhuber, 1997) neural networks

7
Published as a conference paper at ICLR 2019

were introduced to solve the problems of long-term dependencies learning and vanishing or exploding
gradient observed with long sequences. Based on the equations of the forward propagation and
back propagation through time of QRNN described in Section 3.3.1, and Section 3.3.2, one can
easily derive the equations of a quaternion-valued LSTM. Gates are defined with quaternion numbers
following the proposal of Danihelka et al. (2016). Therefore, the gate action is characterized by an
independent modification of each component of the quaternion-valued signal following a component-
wise product with the quaternion-valued gate potential. Let ft ,it , ot , ct , and ht be the forget, input,
output gates, cell states and the hidden state of a LSTM cell at time-step t:

ft =α(Wf ⊗ xt + Rf ⊗ ht−1 + bf ), (20)

it =α(Wi ⊗ xt + Ri ⊗ ht−1 + bi ), (21)
ct =ft × ct−1 + it × tanh(Wc xt + Rc ht−1 + bc ), (22)
ot =α(Wo ⊗ xt + Ro ⊗ ht−1 + bo ), (23)
ht =ot × tanh(ct ), (24)

where W are rectangular input weight matrices, R are square recurrent weight matrices, and b are
bias vectors. α is the split activation function and × denotes a component-wise product between two
quaternions. Both QLSTM and LSTM are bidirectional and trained on the same conditions than for
the QRNN and RNN experiments.

Table 2: Phoneme error rate (PER%) of QLSTM and LSTM models on the development and test sets
of the TIMIT dataset. “Params" stands for the total number of trainable parameters.

Models Neurons Dev. Test Params

256 14.9 16.5 3.6M
512 14.2 16.1 12.6M
LSTM
1,024 14.4 15.3 46.2M
2,048 14.0 15.9 176.3M
64 15.5 17.0 1.6M
128 14.1 16.0 4.6M
QLSTM
256 14.0 15.1 14.4M
512 14.2 15.1 49.9M

The results on the TIMIT corpus reported on Table 2 support the initial intuitions and the previously
established trends. We first point out that the best PER observed is 15.1% and 15.3% on the test set
for QLSTMs and LSTM models respectively with an absolute improvement of 0.2% obtained with
QLSTM using 3.3 times fewer parameters compared to LSTM. These results are among the top of
the line results (Graves et al., 2013b; Ravanelli et al., 2018a) and prove that the proposed quaternion
approach can be used in state-of-the-art models. A deeper investigation of QLSTMs performances
with the larger Wall Street Journal (WSJ) dataset can be found in Appendix 6.1.1.

5 C ONCLUSION
Summary. This paper proposes to process sequences of multidimensional features (such as
acoustic data) with a novel quaternion recurrent neural network (QRNN) and quaternion long-short
term memory neural network (QLSTM). The experiments conducted on the TIMIT phoneme
recognition task show that QRNNs and QLSTMs are more effective to learn a compact representation
of multidimensional information by outperforming RNNs and LSTMs with 2 to 3 times less
free parameters. Therefore, our initial intuition that the quaternion algebra offers a better
and more compact representation for multidimensional features, alongside with a better learn-
ing capability of feature internal dependencies through the Hamilton product, have been demonstrated.

Future Work. Future investigations will develop other multi-view features that contribute to
decrease ambiguities in representing phonemes in the quaternion space. In this extent, a recent
approach based on a quaternion Fourier transform to create quaternion-valued signal has to be
investigated. Finally, other high-dimensional neural networks such as manifold and Clifford networks
remain mostly unexplored and can benefit from further research.

8
Published as a conference paper at ICLR 2019

R EFERENCES
Dario Amodei, Sundaram Ananthanarayanan, Rishita Anubhai, Jingliang Bai, Eric Battenberg, Carl
Case, Jared Casper, Bryan Catanzaro, Qiang Cheng, Guoliang Chen, et al. Deep speech 2: End-
to-end speech recognition in english and mandarin. In International Conference on Machine
Learning, pp. 173–182, 2016.
Paolo Arena, Luigi Fortuna, Luigi Occhipinti, and Maria Gabriella Xibilia. Neural networks for
quaternion-valued function approximation. In Circuits and Systems, ISCAS’94., IEEE International
Symposium on, volume 6, pp. 307–310. IEEE, 1994.
Paolo Arena, Luigi Fortuna, Giovanni Muscato, and Maria Gabriella Xibilia. Multilayer perceptrons
to approximate quaternion valued functions. Neural Networks, 10(2):335–342, 1997.
Nicholas A Aspragathos and John K Dimitros. A comparative study of three methods for robot
kinematics. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, 28(2):
135–145, 1998.
Rudrasis Chakraborty, Jose Bouza, Jonathan Manton, and Baba C. Vemuri. Manifoldnet: A deep
network framework for manifold-valued data. arXiv preprint arXiv:1809.06211, 2018.
William Chan and Ian Lane. Deep recurrent neural networks for acoustic modelling. arXiv preprint
arXiv:1504.01482, 2015.
G. Chen, C. Parada, and G. Heigold. Small-footprint keyword spotting using deep neural networks.
In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.
4087–4091, May 2014. doi: 10.1109/ICASSP.2014.6854370.
Chung-Cheng Chiu, Tara N Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng
Chen, Anjuli Kannan, Ron J Weiss, Kanishka Rao, Ekaterina Gonina, et al. State-of-the-art
speech recognition with sequence-to-sequence models. In 2018 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), pp. 4774–4778. IEEE, 2018.
Alexis Conneau, German Kruszewski, Guillaume Lample, Loïc Barrault, and Marco Baroni. What
you can cram into a single vector: Probing sentence embeddings for linguistic properties, 2018.
Ivo Danihelka, Greg Wayne, Benigno Uria, Nal Kalchbrenner, and Alex Graves. Associative long
short-term memory. arXiv preprint arXiv:1602.03032, 2016.
Steven B Davis and Paul Mermelstein. Comparison of parametric representations for monosyllabic
word recognition in continuously spoken sentences. In Readings in speech recognition, pp. 65–74.
Elsevier, 1990.
Sadaoki Furui. Speaker-independent isolated word recognition based on emphasized spectral dynam-
ics. In Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP’86.,
volume 11, pp. 1991–1994. IEEE, 1986.
John S Garofolo, Lori F Lamel, William M Fisher, Jonathan G Fiscus, and David S Pallett. Darpa
timit acoustic-phonetic continous speech corpus cd-rom. nist speech disc 1-1.1. NASA STI/Recon
technical report n, 93, 1993.
Chase J Gaudet and Anthony S Maida. Deep quaternion networks. In 2018 International Joint
Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, 2018.
Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural
networks. In International conference on artificial intelligence and statistics, pp. 249–256, 2010.
Alex Graves, Navdeep Jaitly, and Abdel-rahman Mohamed. Hybrid speech recognition with deep
bidirectional lstm. In Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE
Workshop on, pp. 273–278. IEEE, 2013a.
Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. Speech recognition with deep recurrent
neural networks. In Acoustics, speech and signal processing (icassp), 2013 ieee international
conference on, pp. 6645–6649. IEEE, 2013b.

9
Published as a conference paper at ICLR 2019

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing
human-level performance on imagenet classification. In Proceedings of the IEEE international
conference on computer vision, pp. 1026–1034, 2015.
Akira Hirose and Shotaro Yoshida. Generalization characteristics of complex-valued feedforward
neural networks in relation to signal coherence. IEEE Transactions on Neural Networks and
learning systems, 23(4):541–551, 2012.
Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):
1735–1780, 1997.
Jin Hu and Jun Wang. Global stability of complex-valued recurrent neural networks with time-delays.
IEEE Transactions on Neural Networks and Learning Systems, 23(6):853–865, 2012.
Teijiro Isokawa, Tomoaki Kusakabe, Nobuyuki Matsui, and Ferdinand Peper. Quaternion neural
network and its application. In International Conference on Knowledge-Based and Intelligent
Information and Engineering Systems, pp. 318–324. Springer, 2003.
Teijiro Isokawa, Nobuyuki Matsui, and Haruhiko Nishimura. Quaternionic neural networks: Funda-
mental properties and applications. Complex-Valued Neural Networks: Utilizing High-Dimensional
Parameters, pp. 411–439, 2009.
Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint
arXiv:1412.6980, 2014.
Hiromi Kusamichi, Teijiro Isokawa, Nobuyuki Matsui, Yuzo Ogawa, and Kazuaki Maeda. A new
scheme for color night vision by quaternion neural network. In Proceedings of the 2nd International
Conference on Autonomous Robots and Agents, volume 1315, 2004.
Nobuyuki Matsui, Teijiro Isokawa, Hiromi Kusamichi, Ferdinand Peper, and Haruhiko Nishimura.
Quaternion neural network with geometrical operators. Journal of Intelligent & Fuzzy Systems, 15
(3, 4):149–164, 2004.
Larry R. Medsker and Lakhmi J. Jain. Recurrent neural networks. Design and Applications, 5, 2001.
Toshifumi Minemoto, Teijiro Isokawa, Haruhiko Nishimura, and Nobuyuki Matsui. Feed forward
neural network with random quaternionic neurons. Signal Processing, 136:59–68, 2017.
Mehryar Mohri, Fernando Pereira, and Michael Riley. Weighted finite-state transducers in speech
recognition. Computer Speech and Language, 16(1):69 – 88, 2002. ISSN 0885-2308. doi: https:
//doi.org/10.1006/csla.2001.0184. URL http://www.sciencedirect.com/science/
article/pii/S0885230801901846.
Mohamed Morchid. Parsimonious memory unit for recurrent neural networks with application to
natural language processing. Neurocomputing, 314:48–64, 2018.
Tohru Nitta. A quaternary version of the back-propagation algorithm. In Neural Networks, 1995.
Proceedings., IEEE International Conference on, volume 5, pp. 2753–2756. IEEE, 1995.
Titouan Parcollet, Mohamed Morchid, Pierre-Michel Bousquet, Richard Dufour, Georges Linarès,
and Renato De Mori. Quaternion neural networks for spoken language understanding. In Spoken
Language Technology Workshop (SLT), 2016 IEEE, pp. 362–368. IEEE, 2016.
Titouan Parcollet, Morchid Mohamed, and Georges Linarès. Quaternion denoising encoder-decoder
for theme identification of telephone conversations. Proc. Interspeech 2017, pp. 3325–3328, 2017a.
Titouan Parcollet, Mohamed Morchid, and Georges Linares. Deep quaternion neural networks for
spoken language understanding. In Automatic Speech Recognition and Understanding Workshop
(ASRU), 2017 IEEE, pp. 504–511. IEEE, 2017b.
Titouan Parcollet, Ying Zhang, Mohamed Morchid, Chiheb Trabelsi, Georges Linarès, Renato
de Mori, and Yoshua Bengio. Quaternion convolutional neural networks for end-to-end automatic
speech recognition. In Interspeech 2018, 19th Annual Conference of the International Speech
Communication Association, Hyderabad, India, 2-6 September 2018., pp. 22–26, 2018. doi:
10.21437/Interspeech.2018-1898. URL https://doi.org/10.21437/Interspeech.
2018-1898.

10
Published as a conference paper at ICLR 2019

Soo-Chang Pei and Ching-Min Cheng. Color image processing by using binary quaternion-moment-
preserving thresholding technique. IEEE Transactions on Image Processing, 8(5):614–628, 1999.
Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Goel,
Mirko Hannemann, Petr Motlicek, Yanmin Qian, Petr Schwarz, Jan Silovsky, Georg Stemmer, and
Karel Vesely. The kaldi speech recognition toolkit. In IEEE 2011 Workshop on Automatic Speech
Recognition and Understanding. IEEE Signal Processing Society, December 2011. IEEE Catalog
No.: CFP11SRW-USB.
Daniel Povey, Vijayaditya Peddinti, Daniel Galvez, Pegah Ghahremani, Vimal Manohar, Xingyu Na,
Yiming Wang, and Sanjeev Khudanpur. Purely sequence-trained neural networks for asr based on
lattice-free mmi. In Interspeech, pp. 2751–2755, 2016.
Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, and Yoshua Bengio. Light gated recurrent
units for speech recognition. IEEE Transactions on Emerging Topics in Computational Intelligence,
2(2):92–102, 2018a.
Mirco Ravanelli, Titouan Parcollet, and Yoshua Bengio. The pytorch-kaldi speech recognition toolkit.
arXiv preprint arXiv:1811.07453, 2018b.
Sara Sabour, Nicholas Frosst, and Geoffrey E Hinton. Dynamic routing between capsules. arXiv
preprint arXiv:1710.09829v2, 2017.
Stephen John Sangwine. Fourier transforms of colour images using quaternion or hypercomplex,
numbers. Electronics letters, 32(21):1979–1980, 1996.
Jingyan Song and Yeung Yam. Complex recurrent neural network for computing the inverse and
pseudo-inverse of the complex matrix. Applied mathematics and computation, 93(2-3):195–205,
1998.
Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov.
Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine
Learning Research, 15(1):1929–1958, 2014.
Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton. On the importance of initialization
and momentum in deep learning. In International conference on machine learning, pp. 1139–1147,
2013.
Keiichi Tokuda, Heiga Zen, and Tadashi Kitamura. Trajectory modeling based on hmms with the
explicit relationship between static and dynamic features. In Eighth European Conference on
Speech Communication and Technology, 2003.
Chiheb Trabelsi, Olexa Bilaniuk, Dmitriy Serdyuk, Sandeep Subramanian, João Felipe Santos,
Soroush Mehri, Negar Rostamzadeh, Yoshua Bengio, and Christopher J Pal. Deep complex
networks. arXiv preprint arXiv:1705.09792, 2017.
Bipin Kumar Tripathi. High Dimensional Neurocomputing. Springer, 2016.
Mark Tygert, Joan Bruna, Soumith Chintala, Yann LeCun, Serkan Piantino, and Arthur Szlam. A
mathematical motivation for complex-valued convolutional networks. Neural computation, 28(5):
815–825, 2016.
Scott Wisdom, Thomas Powers, John Hershey, Jonathan Le Roux, and Les Atlas. Full-capacity
unitary recurrent neural networks. In Advances in Neural Information Processing Systems, pp.
4880–4888, 2016.
D Xu, L Zhang, and H Zhang. Learning alogrithms in quaternion neural networks using ghr calculus.
Neural Network World, 27(3):271, 2017.

11
Published as a conference paper at ICLR 2019

6 A PPENDIX

6.1 WALL S TREET J OURNAL EXPERIMENTS AND COMPUTATIONAL COMPLEXITY

This Section proposes to validate the scaling of the proposed QLSTMs to a bigger and more realistic
corpus, with a speech recognition task on the Wall Street Journal (WSJ) dataset. Finally, it discuses
the impact of the quaternion algebra in term of computational compexity.

6.1.1 S PEECH RECOGNITION WITH THE WALL S TREET J OURNAL CORPUS

We propose to evaluate both QLSTMs and LSTMs with a larger and more realistic corpus to validate
the scaling of the observed TIMIT results (Section 4.2). Acoustic input features are described in
Section 4.1, and extracted on both the 14 hour subset ‘train-si84’, and the full 81 hour dataset ’train-
si284’ of the Wall Street Journal (WSJ) corpus. The ‘test-dev93’ development set is employed for
validation, while ’test-eval92’ composes the testing set. Models architectures are fixed with respect to
the best results observed with the TIMIT corpus (Section 4.2). Therefore, both QLSTMs and LSTMs
contain four bidirectional layers of internal dimension of size 1, 024. Then, an additional layer of
internal size 1, 024 is added before the output layer. The only change on the training procedure
compared to the TIMIT experiments concerns the model optimizer, which is set to Adam (Kingma &
Ba, 2014) instead of RMSPROP. Results are from a 3-folds average.

Table 3: Word error rates (WER %) obtained with both training set (WSJ14h and WSJ81h) of the Wall
Street Journal corpus. ’test-dev93’ and ’test-eval92’ are used as validation and testing set respectively.
L expresses the number of recurrent layers.

Models WSJ14 Dev. WSJ14 Test WSJ81 Dev. WSJ81 Test Params
LSTM 11.2 7.2 7.4 4.5 53.7M
QLSTM 10.9 6.9 7.2 4.3 18.7M

It is important to notice that reported results on Table 3 compare favorably with equivalent architec-
tures (Graves et al., 2013a) (WER of 11.7% on ’test-dev93’), and are competitive with state-of-the-art
and much more complex models based on better engineered features (Chan & Lane, 2015)(WER
of 3.8% with the 81 hours of training data, and on ’test-eval92’). According to Table 3, QLSTMs
outperform LSTM in all the training conditions (14 hours and 81 hours) and with respect to both
the validation and testing sets. Moreover, QLSTMs still need 2.9 times less neural parameters than
LSTMs to achieve such performances. This experiment demonstrates that QLSTMs scale well to
larger and more realistic speech datasets and are still more efficient than real-valued LSTMs.

6.1.2 N OTES ON COMPUTATIONAL COMPLEXITY

A computational complexity of O(n2 ) with n the number of hidden states has been reported by
Morchid (2018) for real-valued LSTMs. QLSTMs just involve 4 times larger matrices during
computations. Therefore, the computational complexity remains unchanged and equals to O(n2 ).
Nonetheless, and due to the Hamilton product, a single forward propagation between two quaternion
neurons uses 28 operations, compared to a single one for two real-valued neurons, implying a longer
training time (up to 3 times slower). However, such worst speed performances could easily be
alleviated with a proper engineered cuDNN kernel for the Hamilton product, that would helps QNNs
to be more efficient than real-valued ones. A well-adapted CUDA kernel would allow QNNs to
perform more computations, with fewer parameters, and therefore less memory copy operations from
the CPU to the GPU.

6.2 PARAMETERS INITIALIZATION

Let us recall that a generated quaternion weight w from a weight matrix W has a polar form defined
as:
/
w = |w|eqimag θ = |w|(cos(θ) + qimag
/
sin(θ)), (25)

12
Published as a conference paper at ICLR 2019

/
with qimag = 0 + xi + yj + zk a purely imaginary and normalized quaternion. Therefore, w can be
computed following:
wr = ϕ cos(θ),
/
wi = ϕ qimagi sin(θ),
/ (26)
wj = ϕ qimagj sin(θ),
/
wk = ϕ qimagk sin(θ).
However, ϕ represents a randomly generated variable with respect to the variance of the quaternion
weight and the selected initialization criterion. The initialization process follows (Glorot & Bengio,
2010) and (He et al., 2015) to derive the variance of the quaternion-valued weight parameters. Indeed,
the variance of W has to be investigated:
V ar(W ) = E(|W |2 ) − [E(|W |)]2 . (27)
[E(|W |)]2 is equals to 0 since the weight distribution is symmetric around 0. Nonetheless, the value
of V ar(W ) = E(|W |2 ) is not trivial in the case of quaternion-valued matrices. Indeed, W follows a
Chi-distribution with four degrees of freedom (DOFs) and E(|W |2 ) is expressed and computed as
follows:
Z ∞
2
E(|W | ) = x2 f (x) dx, (28)
0

With f (x) is the probability density function with four DOFs. A four-dimensional vector X =
{A, B, C, D} is considered to evaluate the density function f (x). X has components that are
normally distributed, centered at zero, and independent. Then, A, B, C and D have density functions:
2 2
e−x /2σ
fA (x; σ) = fB (x; σ) = fC (x; σ) = fD (x; σ) = √ . (29)
2πσ 2
√
The four-dimensional vector X has a length L defined as L = A2 + B 2 + C 2 + D2 with a
cumulative distribution function FL (x; σ) in the 4-sphere (n-sphere with n = 4) Sx :
Z Z Z Z
FL (x; σ) = fA (x; σ)fB (x; σ)fC (x; σ)fD (x; σ) dSx (30)
Sx
√
where Sx = {(a, b, c, d) : a2
+ + c2 + d2 < x} and dSx = da db dc dd. The polar representa-
b2
tions of the coordinates of X in a 4-dimensional space are defined to compute dSx :
a = ρ cos θ,
b = ρ sin θ cos φ,
c = ρ sin θ sin φ cos ψ,
d = ρ sin θ sin φ sin ψ,
√
where ρ is the magnitude (ρ = a2 + b2 + c2 + d2 ) and θ, φ, and ψ are the phases with 0 ≤ θ ≤ π,
0 ≤ φ ≤ π and 0 ≤ ψ ≤ 2π. Then, dSx is evaluated with the Jacobian Jf of f defined as:

da da da da
dρ dθ dφ dψ
db db db db
∂(a, b, c, d) da db dc dd dρ dθ dφ dψ
Jf = = = dc dc dc dc
∂(ρ, θ, φ, ψ) dρ dθ dφ dψ dρ dθ dφ dψ
dd dd dd dd
dρ dθ dφ dψ

cos θ −ρ sin θ 0 0
sin θ cos φ ρ sin θ cos φ −ρ sin θ sin φ 0
= .
sin θ sin φ cos ψ ρ cos θ sin φ cos ψ ρ sin θ cos φ cos ψ −ρ sin θ sin φ sin ψ
sin θ sin φ sin ψ ρ cos θ sin φ sin ψ ρ sin θ cos φ sin ψ ρ sin θ sin φ cos ψ
And,
Jf = ρ3 sin2 θ sin φ. (31)

13
Published as a conference paper at ICLR 2019

Therefore, by the Jacobian Jf , we have the polar form:

da db dc dd = ρ3 sin2 θ sin φ dρ dθ dφ dψ. (32)

Then, writing Eq.(30) in polar coordinates, we obtain:

4 Z Z Z Z x
1 2
/2σ 2 −b2 /2σ 2 −c2 /2σ 2 −d2 /2σ 2
FL (x, σ) = √ e−a e e e dSx
2πσ 2 0
Z 2π Z π Z π Z x
1 2
/2σ 2 3
= e−ρ ρ sin2 θ sin φ dρ dθ dφ dψ
4π 2 σ 4 0 0 0 0
Z 2π Z π Z π Z x
1 2 2
= dψ sin φ dφ sin2
θ dθ ρ3 e−ρ /2σ dρ
4π 2 σ 4 0 0
π Z x0 0
1 θ sin 2θ 3 −ρ2 /2σ 2
= 2π2 − ρ e dρ
4π 2 σ 4 2 4 0 0
π x 3 −ρ2 /2σ2
Z
1
= 4π ρ e dρ,
4π 2 σ 4 2 0
Then,
Z x
1 2
/2σ 2
FL (x, σ) = ρ3 e−ρ dρ. (33)
2σ 4 0

The probability density function for X is the derivative of its cumulative distribution function, which
by the fundamental theorem of calculus is:

d
fL (x, σ) = FL (x, σ)
dx
1 3 −x2 /2σ2
= x e . (34)
2σ 4
The expectation of the squared magnitude becomes:

Z ∞
E(|W |2 ) = x2 f (x) dx
Z0 ∞
1 2 2
= x2 4 x3 e−x /2σ dx
0 2σ
Z ∞
1 2 2
= 4
x5 e−x /2σ dx.
2σ 0

With integration by parts we obtain:

∞
Z ∞
1 4 2 −x2 /2σ 2 3 −x2 /2σ 2
E(|W |2 ) = −x σ e + σ 2
4x e dx
2σ 4 0 0

∞
Z ∞
1 2
4 −x /2σ 2
3 −x2 /2σ 2
= −x e + 4x e dx . (35)
2σ 2 0 0

The expectation E(|W |2 ) is the sum of two terms. The first one:
2 ∞
/2σ 2 2
/2σ 2 2
/2σ 2
−x4 e−x = lim −x4 e−x − lim x4 e−x
0 x→+∞ x→+0
2
/2σ 2
= lim −x4 e−x ,
x→+∞

14
Published as a conference paper at ICLR 2019

Based on the L’Hôpital’s rule, the undetermined limit becomes:

2 2 x4
lim −x4 e−x /2σ = − lim x2 /2σ2
x→+∞ x→+∞ e
= ...
24
= − lim (36)
x→+∞ (1/σ 2 )(P (x)ex2 /2σ2 )
= 0.

With P (x) is polynomial and has a limit to +∞. The second term is calculated in a same way
(integration by parts) and E(|W |2 ) becomes from Eq.(35):

Z ∞
1 2 2
E(|W |2 ) = 2
4x3 e−x /2σ dx
2σ 0

∞
Z ∞
2 2 2 −x2 /2σ 2 2 −x2 /2σ 2
= 2 x σ e + σ 2xe dx .
σ 0 0
(37)
The limit of first term is equals to 0 with the same method than in Eq.(36). Therefore, the expectation
is:
Z ∞
2 2
E(|W |2 ) = 4 xe−x /2σ dx
0
= 4σ 2 . (38)
And finally the variance is:
V ar(|W |) = 4σ 2 . (39)

6.3 Q UATERNION BACKPROPAGATION THROUGH TIME

Let us recall the forward equations and parameters needed to derive the complete quaternion
backpropagation through time (QBPTT) algorithm.

6.3.1 R ECALL OF THE FORWARD PHASE

Let xt be the input vector at timestep t, ht the hidden state, Whh , Wxh and Why the hidden state,
input and output weight matrices respectively. Finally bh is the biases vector of the hidden states and
pt , yt are the output and the expected target vector.
ht = α(hpreact
t ), (40)
with,
hpreact
t = Whh ⊗ ht−1 + Wxh ⊗ xt + bh , (41)
and α is the quaternion split activation function (Xu et al., 2017) of a quaternion Q defined as:
α(Q) = f (r) + if (x) + jf (y) + kf (z), (42)
and f corresponding to any standard activation function. The output vector pt can be computed as:
pt = β(ppreact
t ), (43)
with
ppreact
t = Why ⊗ ht , (44)
and β any split activation function. Finally, the objective function is a real-valued loss function
applied component-wise. The gradient with respect to the MSE loss is expressed for each weight
∂Et ∂Et ∂Et ∂Et
matrix as ∂W hy
, ∂W hh
, ∂W hx
, and for the bias vector as ∂Bh
. In the real-valued space, the dynamic
of the loss is only investigated based on all previously connected neurons. In this extent, the QBPTT
differs from BPTT due to the fact that the loss must also be derived with respect to each component of
a quaternion neural parameter, making it bi-level. This could act as a regularizer during the training
process.

15
Published as a conference paper at ICLR 2019

6.3.2 O UTPUT WEIGHT MATRIX

The weight matrix Why is used only in the computation of pt . It is therefore straightforward to
∂Et
compute ∂W hy
:
∂Et ∂Et ∂Et ∂Et ∂Et
= r +i i
+j j
+k k
. (45)
∂Why ∂Why ∂Why ∂Why ∂W hy

Each quaternion component is then derived following the chain rule:

∂Et ∂Et ∂prt ∂Et ∂pit ∂Et ∂pjt ∂Et ∂pkt
r = + + +
∂Why ∂prt ∂Why
r r
∂pit ∂Why r
∂pjt ∂Why r
∂pkt ∂Why (46)
= (prt − ytr ) × hrt + (pit − yti ) × hit + (pjt − ytj ) × hjt + (pkt − ytk ) × hkt .

∂Et ∂Et ∂prt ∂Et ∂pit ∂Et ∂pjt ∂Et ∂pkt

= + + +
i
∂Why ∂prt ∂Why
i ∂pit ∂Why
i
∂pjt ∂Why
i i
∂pkt ∂Why (47)
= (prt − ytr ) × −hit + (pit − yti ) × hrt + (pjt − ytj ) × hkt + (pkt − ytk ) × −hjt .

∂Et ∂Et ∂prt ∂Et ∂pit ∂Et ∂pjt ∂Et ∂pkt

= + + +
j
∂Why ∂prt ∂Why
j ∂pit ∂Why
j
∂pjt ∂Why
j j
∂pkt ∂Why (48)
= (prt − ytr ) × −hjt + (pit − yti ) × −hkt + (pjt − ytj ) × hrt + (pkt − ytk ) × hit .

∂Et ∂Et ∂prt ∂Et ∂pit ∂Et ∂pjt ∂Et ∂pkt

k
= r k
+ i k
+ j k
+ k k
∂Why ∂pt ∂Why ∂pt ∂Why ∂pt ∂Why ∂pt ∂Why (49)
= (prt − ytr ) × −hkt + (pit − yti ) × hjt + (pjt − ytj ) × −hit + (pkt − ytk ) × hrt .
By regrouping in a matrix form the ht components from these equations, one can define:
hjt
 r
hit hkt

ht
 −hi
 jt hrt hkt −hjt  ∗
i  = ht . (50)

 −h −hk hr h
t t t t
−hkt hjt −hit hrt
Therefore,
∂Et
= (pt − yt ) ⊗ h∗t . (51)
∂Why

6.3.3 H IDDEN WEIGHT MATRIX

Conversely to Why the weight matrix Whh is an argument of ht with ht−1 involved. The recursive
backpropagation can thus be derived as:
N
∂E X ∂Et
= . (52)
∂Whh t=0
∂Whh
And,
t
∂Et X ∂Em ∂Em ∂Em ∂Em
= r +i r +j i
+k k
, (53)
∂Whh m=0
∂Whh ∂Whh ∂Whh ∂Whh
∂Ek
with N the number of timesteps that compose the sequence. As for Why we start with r :
∂Whh
t t
X ∂Em X ∂Et ∂hrt ∂hrm ∂Et ∂hit ∂him
r = r r + r
m=0
∂Whh m=0
r
∂ht ∂hm ∂Whh ∂hit ∂him ∂Whh
(54)
∂Et ∂hjt ∂hjm ∂Et ∂hit ∂hkm
+ j j ∂W r
+ r .
∂ht ∂hm hh ∂hkt ∂hkm ∂Whh

16
Published as a conference paper at ICLR 2019

Non-recursive elements are derived w.r.t r, i,j, k:

∂Et ∂Et ∂prt ∂Et ∂pit ∂Et ∂pjt ∂Et ∂pk

r = r r + i r + j r + k tr
∂ht ∂pt ∂ht ∂pt ∂ht ∂pt ∂ht ∂pt ∂ht
0 0 (55)
= (prt − ytr ) × f (prt ) × Why
r
+ (pit − yti ) × f (pit ) × Why
i

0 0
+ (pjt − ytj ) × f (pjt ) × Why
j
+ (pkt − ytk ) × f (pkt ) × Why
k
.

∂Et ∂Et ∂prt ∂Et ∂pit ∂Et ∂pjt ∂Et ∂pk

i
= r i
+ i i
+ j i
+ k ti
∂ht ∂pt ∂ht ∂pt ∂ht ∂pt ∂ht ∂pt ∂ht
0 0 (56)
= (prt − ytr ) × f (prt ) × −Why
i
+ (pit − yti ) × f (pit ) × Why
r

0 0
+ (pjt − ytj ) × f (pjt ) × Why
k j
+ (pkt − ytk ) × f (pkt ) × −Why .

∂Et ∂Et ∂prt ∂Et ∂pit ∂Et ∂pjt ∂Et ∂pkt

= + + +
∂hjt ∂prt ∂hjt ∂pit ∂hjt ∂pjt ∂hjt ∂pkt ∂hjt
0 j 0 (57)
= (prt − ytr ) × f (prt ) × −Why + (pit − yti ) × f (pit ) × −Why
k

0 0
+ (pjt − ytj ) × f (pjt ) × Why
r
+ (pkt − ytk ) × f (pkt ) × Why
i
.

∂Et ∂Et ∂prt ∂Et ∂pit ∂Et ∂pjt ∂Et ∂pkt

= r + i
+ j
+
∂hkt ∂pt ∂hkt ∂pt ∂hkt ∂pt ∂hkt ∂pkt ∂hkt
0 j 0 (58)
= (prt − ytr ) × f (prt ) × −Why
k
+ (pit − yti ) × f (pit ) × Why
0 0
+ (pjt − ytj ) × f (pjt ) × −Why
i
+ (pkt − ytk ) × f (pkt ) × Why
r
.

Then,

 ∂hr,m ∂hi,m ∂hj,m ∂hk,m


∂Whh r = hr,t−1 ∂Whh r = hi,t−1 ∂Whhr = hj,t−1 r
∂Whh = hk,t−1
 ∂h
 r,m ∂hi,m ∂hj,m ∂hk,m 
 ∂Whhi = −hi,t−1 ∂Whh r = hi,t−1 ∂Whhr = hj,t−1 r
∂Whh = hk,t−1 
 = h∗t . (59)

 ∂hr,m ∂hi,m ∂hj,m ∂hk,m
 j = −hj,t−1 j = −hk,t−1 j = hr,t−1 j = hi,t−1 
 ∂Whh ∂Whh ∂Whh ∂Whh 
∂hr,m ∂hi,m ∂hj,m ∂hk,m
k
∂Whh
= −hk,t−1 ∂Whh k = hj,t−1 k
∂Whh
= −hi,t−1 k
∂Whh
= hr,t−1

∂hrt ∂hit ∂hjt ∂hk

The remaining terms ∂hrm , ∂him , ∂hjm and t
∂hk
are recursive and are written as:
m

t preact
∂hr,t Y ∂hr,n ∂hpreact
r,n ∂hr,n ∂hi,n
= +
∂hr,m n=m+1
∂hpreact
r,n ∂hr,n−1 ∂hpreact
i,n
∂hr,n−1
preact preact
(60)
∂hr,n ∂hj,n ∂hr,n ∂hk,n
+ preact + ,
∂hj,n ∂hr,n−1 ∂hpreact
k,n
∂hr,n−1

simplified with,
t
∂hr,t Y ∂hr,n r ∂hr,n i
= preact × Whh + preact × Whh
∂hr,m n=m+1
∂hr,n ∂hi,n
(61)
∂hr,n j ∂hr,n k
+ preact × Whh + × Whh .
∂hj,n ∂hpreact
k,n

17
Published as a conference paper at ICLR 2019

Consequently,
t
∂hi,t Y ∂hi,n i ∂hi,n r
= preact × −Whh + preact × Whh
∂hi,m n=m+1
∂hr,n ∂hi,n
(62)
∂hj,n k ∂hi,n j
+ preact × Whh + × −Whh .
∂hj,n ∂hpreact
k,n

t
∂hj,t Y ∂hj,n j ∂hj,n k
= preact × −Whh + preact × −Whh
∂hj,m n=m+1
∂hr,n ∂hi,n
(63)
∂hj,n r ∂hj,n i
+ preact × Whh + × Whh .
∂hj,n ∂hpreact
k,n

t
∂hk,t Y ∂hk,n k ∂hk,n j
= preact × −Whh + preact × Whh
∂hk,m n=m+1
∂hr,n ∂hi,n
(64)
∂hk,n i ∂hk,n r
+ preact × −Whh + × Whh .
∂hj,n ∂hpreact
k,n

∂Et
The same operations are performed for i,j,k in Eq. 68 and ∂Whh can finally be expressed as:
t t
∂Et X Y
= ( δn ) ⊗ h∗t−1 , (65)
∂Whh m=0 n=m+1

with,
∗
⊗ δn+1 × α0 (hpreact

Whh n ) if n 6= t
δn = ∗ 0 (66)
Why ⊗ (pn − yn ) × β (ppreact
n ) else.

6.3.4 I NPUT WEIGHT MATRIX

∂Et ∂Et
∂Whx is computed in the exact same manner as ∂Whh .

N
∂E X ∂Et
= . (67)
∂Whx t=0
∂Whx

And,
t
∂Et X ∂Em ∂Em ∂Em ∂Em
= r +i r +j i
+k k
. (68)
∂Whx m=0
∂Whx ∂Whx ∂Whx ∂Whx
∂Et
Therefore ∂Whx is easily extent as:
t t
∂Et X Y
= ( δn ) ⊗ x∗t . (69)
∂Whx m=0 n=m+1

6.3.5 H IDDEN BIASES

∂Et
∂Bh can easily be extended to:

N
∂E X ∂Et
= . (70)
∂Bh t=0
∂Bh

18
Published as a conference paper at ICLR 2019

And,
t
∂Et X ∂Em ∂Em ∂Em ∂Em
= r + i ∂B r + j i
+k . (71)
∂Bh m=0
∂Bh h ∂Bh ∂Bhk

Nonetheless, since biases are not connected to any inputs or hidden states, the matrix of derivatives
∂Et
defined in Eq. 59 becomes a matrix of 1. Consequently ∂B h
can be summarized as:
t t
∂Et X Y
= ( δn ). (72)
∂Bh m=0 n=m+1

DL Co3 - PPT 1
No ratings yet
DL Co3 - PPT 1
22 pages
RNN, Gru, LSTM
No ratings yet
RNN, Gru, LSTM
129 pages
Global Exponential Stability For Quaternion
No ratings yet
Global Exponential Stability For Quaternion
127 pages
Module 4 RNN LSTM GRU
No ratings yet
Module 4 RNN LSTM GRU
59 pages
Hand Written Project
No ratings yet
Hand Written Project
40 pages
cs231n 2019 Lecture10
No ratings yet
cs231n 2019 Lecture10
106 pages
Recurrent Neural Networks
No ratings yet
Recurrent Neural Networks
111 pages
Outline
No ratings yet
Outline
50 pages
11.RNN and Transformers
No ratings yet
11.RNN and Transformers
100 pages
Variational TNN
No ratings yet
Variational TNN
17 pages
A Practical Survey On Faster and Lighter Transformers - 2023 - Fournier Et Al
No ratings yet
A Practical Survey On Faster and Lighter Transformers - 2023 - Fournier Et Al
40 pages
Machine Learning Tutorial
No ratings yet
Machine Learning Tutorial
149 pages
Introduction To Rnns
No ratings yet
Introduction To Rnns
48 pages
2024 - The Topos of Transformer Networks - Mattia Villani Et Al - 2403.18415
No ratings yet
2024 - The Topos of Transformer Networks - Mattia Villani Et Al - 2403.18415
14 pages
6159 Resurrecting Recurrent Ne
No ratings yet
6159 Resurrecting Recurrent Ne
29 pages
Neural Networks in The Educational Sector Challeng
No ratings yet
Neural Networks in The Educational Sector Challeng
6 pages
Deep Complex Networks
No ratings yet
Deep Complex Networks
19 pages
DL Tutorial NIPS2015 PDF
No ratings yet
DL Tutorial NIPS2015 PDF
133 pages
Optimization of Weight Function For (3+1) D Phonon Propagation in Weyl Fermion Sea Expressed by Clifford Algebra Using Elman RNN and Echo State Network
No ratings yet
Optimization of Weight Function For (3+1) D Phonon Propagation in Weyl Fermion Sea Expressed by Clifford Algebra Using Elman RNN and Echo State Network
8 pages
CNN RNN LSTM Attention
No ratings yet
CNN RNN LSTM Attention
86 pages
Time Series RNN LSTM 1746197734
No ratings yet
Time Series RNN LSTM 1746197734
25 pages
Lecture 4 - Language Modelling and RNNs Part 2
No ratings yet
Lecture 4 - Language Modelling and RNNs Part 2
44 pages
Issues in ML
No ratings yet
Issues in ML
2 pages
Rrnns Mckay 2018
No ratings yet
Rrnns Mckay 2018
31 pages
NIPS 2017 Principles of Riemannian Geometry in Neural Networks Paper
No ratings yet
NIPS 2017 Principles of Riemannian Geometry in Neural Networks Paper
10 pages
A Novel Continuous-Valued Quaternionic Hopfield Neural Network
No ratings yet
A Novel Continuous-Valued Quaternionic Hopfield Neural Network
6 pages
Sarcia - Judd Michael - AS4
No ratings yet
Sarcia - Judd Michael - AS4
6 pages
1308 0850 PDF
No ratings yet
1308 0850 PDF
43 pages
Signature Verification and Detection
No ratings yet
Signature Verification and Detection
61 pages
Hierarchically Gated Recurrent
No ratings yet
Hierarchically Gated Recurrent
20 pages
Isokawa2003 Chapter QuaternionNeuralNetworkAndItsA
No ratings yet
Isokawa2003 Chapter QuaternionNeuralNetworkAndItsA
7 pages
RNNBasics
No ratings yet
RNNBasics
23 pages
Unit V
No ratings yet
Unit V
32 pages
Szegedy - Intriguing Properties of Neural Networks
No ratings yet
Szegedy - Intriguing Properties of Neural Networks
10 pages
An Introduction To Deep Learning: January 2011
No ratings yet
An Introduction To Deep Learning: January 2011
14 pages
Unit 4 - Machine Learning - WWW - Rgpvnotes.in
0% (1)
Unit 4 - Machine Learning - WWW - Rgpvnotes.in
16 pages
DNN U2 Notes
No ratings yet
DNN U2 Notes
32 pages
A Simple Way To Initialize Recurrent Networks of Rectified Linear Units
No ratings yet
A Simple Way To Initialize Recurrent Networks of Rectified Linear Units
9 pages
1 s2.0 S0022509621003161 Main
No ratings yet
1 s2.0 S0022509621003161 Main
20 pages
Qu 2019
No ratings yet
Qu 2019
6 pages
Papers Papers PDF
No ratings yet
Papers Papers PDF
48 pages
Lec 4 Recurrent Neural Network Long Short-Term Memory
No ratings yet
Lec 4 Recurrent Neural Network Long Short-Term Memory
32 pages
Group Equivariant Convolutional Networks: Taco S. Cohen
No ratings yet
Group Equivariant Convolutional Networks: Taco S. Cohen
12 pages
Updated - Mini - Project - II - Predictive Maintenance Analysis For Industrial Machinary
No ratings yet
Updated - Mini - Project - II - Predictive Maintenance Analysis For Industrial Machinary
39 pages
Unit 3
No ratings yet
Unit 3
30 pages
Recurrent Neural Networks (RNNS) : Foundations and Applications in Sequential Learning
No ratings yet
Recurrent Neural Networks (RNNS) : Foundations and Applications in Sequential Learning
9 pages
28-Recurrent Neural Networks - Bidirectional RNNs-19!09!2024
No ratings yet
28-Recurrent Neural Networks - Bidirectional RNNs-19!09!2024
12 pages
Quaternion Convolutional Neural Networks
No ratings yet
Quaternion Convolutional Neural Networks
17 pages
Data Management For Training Large Language Models
No ratings yet
Data Management For Training Large Language Models
22 pages
UNIT1 Perceptron MLP
No ratings yet
UNIT1 Perceptron MLP
26 pages
Sentiment Analysis With Machine Learning and Deep Learning A Survey of Techniques and Applications
No ratings yet
Sentiment Analysis With Machine Learning and Deep Learning A Survey of Techniques and Applications
11 pages
2111CS010077 Deep Learning
No ratings yet
2111CS010077 Deep Learning
10 pages
B.Tech AI ML 2018 2019
No ratings yet
B.Tech AI ML 2018 2019
6 pages
Dis6 Sol
No ratings yet
Dis6 Sol
6 pages
QB Serial Test 1
No ratings yet
QB Serial Test 1
7 pages
3 8 BoykoN Bosik Ivanets
No ratings yet
3 8 BoykoN Bosik Ivanets
6 pages
Aguado Asis Garcia A13 AI LAB 1
No ratings yet
Aguado Asis Garcia A13 AI LAB 1
9 pages
Machine Learning and Pattern Recognition Week 8 Neural Net Architectures
No ratings yet
Machine Learning and Pattern Recognition Week 8 Neural Net Architectures
3 pages
Artificial Intelligence Technologies Applied To SM
No ratings yet
Artificial Intelligence Technologies Applied To SM
8 pages
Unit III (2) RNN, LSTM, Gru
No ratings yet
Unit III (2) RNN, LSTM, Gru
14 pages
User + Entity Behavior Analytics (Ueba) : The Heart of Next-Generation Threat Hunting
100% (1)
User + Entity Behavior Analytics (Ueba) : The Heart of Next-Generation Threat Hunting
19 pages
Recurrent Neural Networks (RNNS) : A Gentle Introduction and Overview
No ratings yet
Recurrent Neural Networks (RNNS) : A Gentle Introduction and Overview
16 pages
Aquino Dominic Bien FA2.2
No ratings yet
Aquino Dominic Bien FA2.2
3 pages
Sample Paper: General Instructions
No ratings yet
Sample Paper: General Instructions
4 pages
Survey of Prediction Using Recurrent Neural Network
No ratings yet
Survey of Prediction Using Recurrent Neural Network
3 pages
DL Unit 4 Part 2
No ratings yet
DL Unit 4 Part 2
8 pages
500 AI Prompts
No ratings yet
500 AI Prompts
9 pages
KERNEL Geometric Deep Learning: LeCun
No ratings yet
KERNEL Geometric Deep Learning: LeCun
22 pages
A Comprehensive Survey of Graph Neural Networks PDF
No ratings yet
A Comprehensive Survey of Graph Neural Networks PDF
22 pages
Zahra Amanli Resume
No ratings yet
Zahra Amanli Resume
2 pages
Breast Cancer
No ratings yet
Breast Cancer
20 pages
Unit 3
No ratings yet
Unit 3
8 pages
Bronstein2016geometric, Geometric Deep Learning - Going Beyond Euclidean Data
No ratings yet
Bronstein2016geometric, Geometric Deep Learning - Going Beyond Euclidean Data
20 pages
7072402-AI PPT-unit-1-INTRODUCTION TO ARTIFICIAL INTELLIGENCE (AI) - by Anu To Upload
100% (1)
7072402-AI PPT-unit-1-INTRODUCTION TO ARTIFICIAL INTELLIGENCE (AI) - by Anu To Upload
45 pages
Your Large Language Models Are Leaving Fingerprints
No ratings yet
Your Large Language Models Are Leaving Fingerprints
17 pages
Mergeddv
No ratings yet
Mergeddv
2 pages
A Survey On Requirements of Future Intel
No ratings yet
A Survey On Requirements of Future Intel
36 pages
Real-Time Passenger Train Delay Prediction Using Machine Learning A Case Study With Amtrak P
No ratings yet
Real-Time Passenger Train Delay Prediction Using Machine Learning A Case Study With Amtrak P
12 pages
Adaptive Predictive Power Management For Mobile LTE Devices
No ratings yet
Adaptive Predictive Power Management For Mobile LTE Devices
18 pages
Detection of Phishing Websites Using Machine Learning Techniques
No ratings yet
Detection of Phishing Websites Using Machine Learning Techniques
5 pages
Research 4.0 Interim Report
No ratings yet
Research 4.0 Interim Report
29 pages
ICML - 2022 - Active Testing Sample-Efficient Model Evaluation
No ratings yet
ICML - 2022 - Active Testing Sample-Efficient Model Evaluation
11 pages
Searching For Better Test Case Prioritization Schemes: A Case Study of AI-assisted Systematic Literature Review
No ratings yet
Searching For Better Test Case Prioritization Schemes: A Case Study of AI-assisted Systematic Literature Review
25 pages
Role: Business Analyst Company: Payu Location: Gurgaon /Bangalore/Mumbai About Company
No ratings yet
Role: Business Analyst Company: Payu Location: Gurgaon /Bangalore/Mumbai About Company
10 pages
Cortex Data Lake
No ratings yet
Cortex Data Lake
3 pages
Curriculum-PGP in Big Data Analytics and Optimization
No ratings yet
Curriculum-PGP in Big Data Analytics and Optimization
16 pages
Prediction of Cardiovascular Disease Using Machine Learning Techniques
No ratings yet
Prediction of Cardiovascular Disease Using Machine Learning Techniques
6 pages
Analysis On Credit Card Fraud Identification Techniques Based On KNN and Outlier Detection Abstract
No ratings yet
Analysis On Credit Card Fraud Identification Techniques Based On KNN and Outlier Detection Abstract
4 pages