Deep Convolutional Recurrent Autoencoders For Learning Low-Dimensional Feature Dynamics of Fluid Systems
RESEARCH ARTICLE
KEYWORDS:
nonlinear model reduction, deep learning, convolutional neural networks, LSTM, dynamical systems
1 INTRODUCTION
Dynamical systems are used to describe the rich and complex evolution of many real-world processes. Modeling the dynamics of
physical, engineering, and biological systems is thus of great importance in their analysis, design, and control. Many fields, such
as the physical sciences, are in the fortunate position of having first-principles models that describe the evolution of certain
systems with near-perfect accuracy (e.g., the Navier-Stokes equations in fluid mechanics, or the Schrödinger equation in quantum
mechanics). Although it is possible in principle to solve these equations numerically through direct numerical simulations
(DNS), this often yields systems of equations with millions or billions of degrees of freedom. Even with recent advances in
computational power and memory capacity, solving these high-fidelity models (HFMs) is still computationally intractable for
multi-query and time-critical applications such as design optimization, uncertainty quantification, and model predictive control.
2 Gonzalez and Balajewicz
Model reduction aims to alleviate this burden by constructing reduced order models (ROMs) that capture the large-scale system
behavior while retaining physical fidelity.
Some fields, however, such as finance and neuroscience, lack governing laws thereby restricting the applicability of principled
strategies for constructing low-order models. In recent years, the rise of machine learning and big data has driven a shift in
the way complex spatiotemporal systems are modeled 1,2,3,4,5 . The abundance of data has facilitated the construction of so-called
data-driven models of systems lacking high-fidelity governing laws. In areas where HFMs do exist, data-driven methods
have become an increasingly popular approach to tackle previously challenging problems wherein solutions are learned from
physical or numerical data 6,7,8 .
In model reduction, machine learning strategies have recently been applied to many remaining challenges, including learning
stabilizing closure terms in unstable POD-Galerkin models 8,9 , and data-driven model identification for truncated generalized
POD coordinates 10,11,12 . A more recent approach involved learning a set of observable functions spanning a Koopman invariant
subspace from which low-order linear dynamics of nonlinear systems are modeled 13 . These approaches constitute just a small
portion of the outstanding challenges in which machine learning can aid in modeling low-dimensional dynamics of complex
systems.
In this work we make progress to this end by proposing a method that uses a completely data-driven approach to identify and
evolve a low-dimensional representation of a spatiotemporal system. In particular, we employ a deep convolutional autoencoder
to learn an optimal low-dimensional representation of the full state of the system in the form of a feature vector, or coordinates
of some low-dimensional nonlinear manifold. The dynamics on this manifold are then learned using a recurrent neural network
trained jointly with the autoencoder in an end-to-end fashion using a set of finite-time trajectories of the system.
† For linear time-invariant systems, or systems with polynomial nonlinearities, all projection coefficients can be precomputed offline.
2 PROBLEM FORMULATION
where 𝑡 ∈ [𝑡0 , 𝑇 ] ⊂ ℝ+ denotes time, 𝐱 ∈ ℝ𝑁 is the spatially discretized state variable where 𝑁 is large, and 𝝁 is
the vector of parameters sampled from a feasible parameter set contained in ℝ𝑑 . Here, 𝐹 ∶ ℝ𝑁 × ℝ+ × ℝ𝑑 → ℝ𝑁 is a nonlinear function
representing the dynamics of the discretized system. Such large nonlinear systems are typical in the computational sciences such
as when numerically solving the Navier-Stokes equations describing a fluid flow. In the parameter-varying case 𝝁 may represent
initial and boundary conditions, material properties, or shape parameters of interest.
Often, in engineering design and analysis, interest lies in the evolution of certain outputs
𝐲 = 𝐺(𝐱(𝑡), 𝝁), (2)
where 𝐲 ∈ ℝ𝑝 may represent, e.g., lift, drag, or some other performance criterion. In this work, attention is focused only on
the evolution of the full state 𝐱.
(i) The dataset {𝐱(𝑡1 ; 𝝁), 𝐱(𝑡2 ; 𝝁), ...} is constructed using snapshots from a single, statistically stationary trajectory
of Equation 1. In this case, 𝝁 is the same for all snapshots. This is relevant to situations in which obtaining snapshot
data is exceedingly expensive, such as in large direct numerical simulations, and the interest is in obtaining “quick”
approximate solutions.
(ii) The dataset {𝑋 𝝁1 , 𝑋 𝝁2 , ...} is constructed using multiple, parameter-varying trajectories 𝑋 𝝁𝑖 =
{𝐱(𝑡1 ; 𝝁𝑖 ), 𝐱(𝑡2 ; 𝝁𝑖 ), ...}. This case is relevant to multi-query applications or applications in which the interest is in
capturing the parameter-dependent transient behavior of Equation 1.
In both cases the surrogate model is constructed in a non-intrusive fashion using the same procedure and only the dataset is
changed.
3 BACKGROUND
In this section, we introduce the basic notions of deep learning and two key architectures used in this work: 1) recurrent neural
networks, and 2) convolutional neural networks. We close by summarizing the connections between POD and fully-connected
autoencoders.
problem is typically addressed by using gated RNNs, including long short-term memory (LSTM) networks 41 and networks
based on the gated recurrent unit (GRU) 42 . These networks have additional paths through which gradients neither vanish nor
explode, allowing gradients of the loss function to backpropagate across multiple time-steps and thereby making the appropriate
parameter updates. This work will only consider RNNs equipped with LSTM units.
In convolutional neural networks, layers are organized into feature maps, where each unit in a feature map is connected to a
local domain of the previous layer through a filter bank. Consider a 2D input $\mathbf{X} \in \mathbb{R}^{N_x \times N_y}$. A convolutional layer consists of a
set of $F$ filters $\mathbf{K}^f \in \mathbb{R}^{a \times b}$, $f = 1, \dots, F$, each of which generates a feature map $\mathbf{Y}^f \in \mathbb{R}^{N_x' \times N_y'}$ by a 2D discrete convolution
$$\mathbf{Y}^f_{i,j} = \sum_{k=0}^{a-1} \sum_{l=0}^{b-1} \mathbf{K}^f_{a-k,\,b-l} \, \mathbf{X}_{1+s(i-1)-k,\; 1+s(j-1)-l}, \qquad (7)$$
where $N_x' = 1 + \frac{N_x + a - 2}{s}$, $N_y' = 1 + \frac{N_y + b - 2}{s}$, and $s \geq 1$ is an integer value called the stride. Figure 1 shows the effect of different
stride values of a filter acting on an input feature map. As before, the feature map can be passed through an element-wise
nonlinear function. Typically, the dimension of the feature map is reduced by using a pooling layer, in which a single value is
computed from a small 𝑎′ × 𝑏′ patch of the feature map either by taking the maximum value or averaging. A slightly more general
approach is to employ a convolutional layer with a stride of 𝑠 > 1, in which instead of taking the maximum or average value,
some weighted sum of the local patch of the input feature map is learned by adjusting the respective filter 𝐊𝑓 . In addition, dilated
convolutional filters (see Figure 2) are often employed to significantly increase the receptive field without loss of resolution,
effectively capturing larger features in highly dense data 43,44 .
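As a minimal sketch of Equation 7, the pure-Python function below transcribes the double sum literally for a single filter. Two choices are assumptions that the text does not state: entries of the input that fall outside its index range are treated as zero, and the output size uses floor division when the stride does not divide evenly. The name `strided_conv2d` is hypothetical.

```python
def strided_conv2d(X, K, s=1):
    """Literal transcription of Equation 7 for one filter K and stride s.

    X and K are lists of lists. Entries of X outside its index range are
    treated as zero (an assumption made for this sketch).
    """
    Nx, Ny = len(X), len(X[0])
    a, b = len(K), len(K[0])
    # Output sizes from the text; floor division is an assumption.
    Nx_out = 1 + (Nx + a - 2) // s
    Ny_out = 1 + (Ny + b - 2) // s

    def x_at(p, q):
        # 1-based access into X, returning zero outside the array.
        if 1 <= p <= Nx and 1 <= q <= Ny:
            return X[p - 1][q - 1]
        return 0.0

    Y = [[0.0] * Ny_out for _ in range(Nx_out)]
    for i in range(1, Nx_out + 1):
        for j in range(1, Ny_out + 1):
            acc = 0.0
            for k in range(a):
                for l in range(b):
                    # K_{a-k, b-l} in 1-based notation is K[a-k-1][b-l-1].
                    acc += K[a - k - 1][b - l - 1] * x_at(
                        1 + s * (i - 1) - k, 1 + s * (j - 1) - l
                    )
            Y[i - 1][j - 1] = acc
    return Y
```

With a 4 × 4 input of ones and a 2 × 2 filter of ones, a stride of 1 gives a 5 × 5 feature map and a stride of 2 gives a 3 × 3 feature map, consistent with the output-size expressions for 𝑁𝑥′ and 𝑁𝑦′.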
The net result is an optimal low-dimensional representation $\mathbf{h} = \boldsymbol{\Psi}_{N_h}^T \mathbf{x}$ of an input $\mathbf{x}$, where again $\mathbf{h}$ can be thought of as the
intrinsic coordinates on the linear subspace.
‡ This method is known under different names in various fields: POD, principal component analysis (PCA), Karhunen–Loève decomposition, empirical orthogonal
functions, and many others. In this work we adopt the name POD.
over the years, most involving a patchwork of 𝐿 local subspaces obtained through linearizations or higher-order
approximations of the state-space 23,45,24 .
A nonlinear generalization of POD is the under-complete autoencoder 26,34 . An under-complete autoencoder consists of a
single or multiple-layer encoder network
𝐡 = 𝑓𝐸 (𝐱; 𝜽𝐸 ), (13)
where 𝐱 ∈ ℝ𝑁 is the input state, 𝐡 ∈ ℝ𝑁ℎ is the feature or representation vector, and 𝑁ℎ < 𝑁. A decoder network is then used
to reconstruct 𝐱 by
𝐱̂ = 𝑓𝐷 (𝐡; 𝜽𝐷 ). (14)
Training this autoencoder then consists of finding the parameters that minimize the expected reconstruction error over all training
examples
$$\boldsymbol{\theta}_E^*, \boldsymbol{\theta}_D^* = \arg\min_{\boldsymbol{\theta}_E, \boldsymbol{\theta}_D} \mathbb{E}_{\mathbf{x} \sim \mathcal{P}_{data}} \left[ \mathcal{L}(\hat{\mathbf{x}}, \mathbf{x}) \right], \qquad (15)$$
where $\mathcal{L}(\hat{\mathbf{x}}, \mathbf{x})$ is some measure of discrepancy between $\mathbf{x}$ and its reconstruction $\hat{\mathbf{x}}$. Restricting 𝑁ℎ < 𝑁 serves as a form of
regularization, preventing the autoencoder from learning the identity function. Rather, it captures the salient features of the
data-generating distribution $\mathcal{P}_{data}$. Under-complete autoencoders are just one of a family of regularized autoencoders, which also
includes contractive autoencoders, denoising autoencoders, and sparse autoencoders 26,34 .
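Remark 1. The choice of $f_E$, $f_D$, and $\mathcal{L}(\hat{\mathbf{x}}, \mathbf{x})$ largely depends on the application. Indeed, if one chooses a linear encoder and a
linear decoder of the form
$$\mathbf{h} = \mathbf{W}_E \mathbf{x}, \qquad (16)$$
$$\hat{\mathbf{x}} = \mathbf{W}_D \mathbf{h}, \qquad (17)$$
where $\mathbf{W}_E \in \mathbb{R}^{N_h \times N}$ and $\mathbf{W}_D \in \mathbb{R}^{N \times N_h}$, then with a squared reconstruction error
$$\mathcal{L}(\hat{\mathbf{x}}, \mathbf{x}) = \|\mathbf{x} - \hat{\mathbf{x}}\|_2^2 = \|\mathbf{x} - \mathbf{W}\mathbf{W}^T \mathbf{x}\|_2^2, \qquad (18)$$
the autoencoder will learn the same subspace as the one spanned by the first $N_h$ POD modes if $\mathbf{W} = \mathbf{W}_D = \mathbf{W}_E^T$. However,
without additional constraints on $\mathbf{W}$, i.e., $\mathbf{W}^T \mathbf{W} = \mathbf{I}_{N_h \times N_h}$, the columns of $\mathbf{W}$ will not form an orthonormal basis or have any
hierarchical ordering 45,46 .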
A more completely data-driven approach, and one that is more closely related to our work, is to both learn a low-dimensional
representation of the state variable and to learn the evolution of this representation. This approach has been explored in 13 in
which an autoencoder is used to learn a low-dimensional representation of the high-dimensional state,
𝐡 = 𝑓𝐸 (𝐱), (20)
where 𝐱 ∈ ℝ𝑁 is the high-dimensional state of the system, 𝐡 ∈ ℝ𝑁ℎ , 𝑁ℎ < 𝑁, and a linear recurrent model is used to evolve the
low-dimensional features
𝐡𝑛+1 = 𝐊𝐡𝑛 , (21)
where 𝐊 ∈ ℝ𝑁ℎ ×𝑁ℎ . This approach was first introduced in the context of learning a dictionary of functions used in extended
dynamic mode decomposition to approximate the Koopman operator of a nonlinear system 49 .
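The linear recurrence of Equation 21 can be illustrated with a short sketch. The matrix below is a made-up 2 × 2 quarter-turn rotation chosen only so that the result is easy to check by hand; it is not a learned operator from 13, and the helper names are hypothetical.

```python
def matvec(K, h):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(k_ij * h_j for k_ij, h_j in zip(row, h)) for row in K]


def rollout_linear(K, h0, n_steps):
    """Return [h_0, h_1, ..., h_n_steps] where each state is K times the previous one."""
    states = [list(h0)]
    for _ in range(n_steps):
        states.append(matvec(K, states[-1]))
    return states


K = [[0.0, 1.0], [-1.0, 0.0]]  # illustrative quarter-turn rotation
trajectory = rollout_linear(K, [1.0, 0.0], 4)
```

Because four quarter turns make a full rotation, the last state of `trajectory` equals the initial feature vector. A linear model of this kind can only produce such linear behavior in the feature space, which is the limitation the nonlinear recurrent model of this work is meant to address.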
The central theme in these approaches and projection-based model reduction in general is the following two-step process:
1. The identification of a low-dimensional manifold embedded in ℝ𝑁 on which most of the data is supported. This
yields, in some sense, an optimal low-dimensional representation 𝐡 = 𝑓 (𝐱) of the data 𝐱 in terms of intrinsic
coordinates on this manifold, and
2. The identification of a dynamic model which efficiently evolves the low-dimensional representation 𝐡 on the manifold.
In this work, we build on the framework introduced in 10,12,13 for constructing or augmenting reduced order models, and extend
it in multiple directions. First, we introduce a deep convolutional autoencoder architecture which provides certain advantages
in identifying low-dimensional representations of the input data. Second, since the dynamics of the reduced state vector on the manifold may
not necessarily be linear, we employ a single-layer LSTM network to model the possibly nonlinear evolution of 𝐡 on it. Lastly,
we introduce an unsupervised training strategy which trains the convolutional autoencoder while using the current reduced state
vectors to dynamically train the LSTM network.
(i) The local approach of each convolutional layer helps to exploit local correlations in field values. Thus, much the same
way finite-difference stencils can capture local gradients, each filter 𝐊𝑓 in a filter bank computes local low-level features
from a small subset of the input.
(ii) The shared nature of each filter bank both allows similar features to be identified throughout the input domain and reduces
the overall number of trainable parameters compared to a fully-connected layer with the same input size.
Consider the following 12-layer convolutional autoencoder model depicted graphically in Figure 3. A 2D arrayed input
𝐗 ∈ ℝ𝑁𝑥 ×𝑁𝑦 , with 𝑁𝑥 = 𝑁𝑦 = 128, is first passed through a 4-layer convolutional encoder. Each convolutional encoder layer
applies Equation 7 with a filter bank 𝐊𝑓 ∈ ℝ5×5 , with the first layer having a dilation rate of 2 and the number of filters increasing from 4 in
the first layer to 32 in the fourth layer. At the opposite end of the convolutional autoencoder network we use
a 4-layer decoder network consisting of transpose convolutional layers. Often erroneously referred to as “deconvolutional”
layers, transpose convolutional layers multiply each element of the input with a filter 𝐊𝑓 and sum over the resulting feature
map, effectively swapping the forward and backward passes of a regular convolutional layer. The effect of using transpose
convolutional layers with a stride 𝑠 > 1 is to decode low-dimensional abstract features to a larger dimensional representation.
Table 1 outlines the architecture of the convolutional encoder and decoder subgraphs. In this work we consider the sigmoid
activation function 𝜎(𝑠) = 1∕(1 + exp(−𝑠)) for each layer of the autoencoder.§
FIGURE 3 Network architecture of the convolutional autoencoder. The network consists of a 4-layer convolutional
encoder (blue), a 4-layer fully-connected encoder and decoder (yellow), and a 4-layer transpose convolutional decoder (red).
The low-dimensional representation is depicted in green.
TABLE 1 Convolutional encoder (left) and decoder (right) filter sizes and strides
Layer filter size filters stride Layer filter size filters stride
In between the convolutional encoder and decoder is a regular fully-connected autoencoder consisting of a 2-layer encoder
which takes the vectorized form of the 32 feature maps from the last convolutional encoder layer, $\mathrm{vec}([\mathbf{Y}^1, \dots, \mathbf{Y}^{32}]) \in \mathbb{R}^{512}$ with
$[\mathbf{Y}^1, \dots, \mathbf{Y}^{32}] \in \mathbb{R}^{4 \times 4 \times 32}$, and returns the final low-dimensional representation of the input data
$$\mathbf{h} = \sigma\left(\mathbf{W}_E^2 \, \sigma\left(\mathbf{W}_E^1 \, \mathrm{vec}([\mathbf{Y}^1, \dots, \mathbf{Y}^{32}]) + \mathbf{b}_E^1\right) + \mathbf{b}_E^2\right) \in \mathbb{R}^{N_h}, \quad N_h \ll N_x \cdot N_y, \qquad (22)$$
where $\mathbf{W}_E^1, \mathbf{b}_E^1$ and $\mathbf{W}_E^2, \mathbf{b}_E^2$ are the parameters of the first and second fully-connected encoder layers (the 5th and 6th layers
of the whole model). To reconstruct the original input data from the low-dimensional representation, a similar 2-layer fully-connected
decoder parameterized by $\mathbf{W}_D^1, \mathbf{b}_D^1$ and $\mathbf{W}_D^2, \mathbf{b}_D^2$ is used, whose result is reshaped and passed to the transpose convolutional
decoder network.
Hierarchical convolutional feature learning through similar strategies has previously been proposed for visual tracking 51
and scene labeling or semantic segmentation of images 30,52,53 . However, this is the first time, to the authors' knowledge, that
convolutional autoencoders have been applied to model reduction of large numerical data of physical dynamical systems. The
key innovation in using convolutional autoencoders in model reduction is that it allows for nonlinear autoencoders and thus
nonlinear model reduction to be applied to large input data in a way that exploits structures inherent in many physical systems.
§ In recent years rectified linear units (ReLUs) 34 , given by ReLU(𝑠) = max(0, 𝑠), and their many variants such as ELUs 50 , have been favored over the sigmoid
activation function. However, in this work we have found that ReLUs produce results similar to linear model reduction theory, since ReLU(𝑠) is linear for inputs 𝑠 ∈ ℝ+ .
Remark 2. In this work we restrict our attention to 2D input data of size 𝑁𝑥 × 𝑁𝑦 = 128 × 128 with the first layer convolutional
filter having a dilation rate of 2. In practice, however, an equivalent memory-reducing approach was employed by using input
data of size 𝑁𝑥 × 𝑁𝑦 = 64 × 64. In addition, the low-dimensional representations considered in this work are of size 𝑁ℎ = 64 or
smaller. To this effect, the hidden state sizes of the middle fully-connected autoencoder were chosen to be 512 and 256 such that
𝐖1𝐸 , (𝐖2𝐷 )𝑇 ∈ ℝ512×256 and 𝐖2𝐸 , (𝐖1𝐷 )𝑇 ∈ ℝ256×𝑁ℎ with the bias terms shaped accordingly. The net result is an autoencoder
with a maximum of 330k parameters with 𝑁ℎ = 64. A similar 12-layer fully-connected autoencoder would require over 22M
parameters.
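The parameter count quoted in Remark 2 can be checked with simple arithmetic. The sketch below assumes that the number of filters doubles from layer to layer (4, 8, 16, 32), that the transpose convolutional decoder mirrors the encoder, and that every layer carries one bias per output channel or unit. The text states only the first and last filter counts, so the intermediate values are assumptions, as are the helper names.

```python
def conv_params(in_channels, out_channels, a=5, b=5):
    """Weights plus biases of one (transpose) convolutional layer with a x b filters."""
    return a * b * in_channels * out_channels + out_channels


def dense_params(n_in, n_out):
    """Weights plus biases of one fully-connected layer."""
    return n_in * n_out + n_out


N_h = 64
encoder_channels = [1, 4, 8, 16, 32]       # assumed channel progression
decoder_channels = encoder_channels[::-1]  # assumed mirror image of the encoder

conv_encoder = sum(
    conv_params(c_in, c_out)
    for c_in, c_out in zip(encoder_channels, encoder_channels[1:])
)
conv_decoder = sum(
    conv_params(c_in, c_out)
    for c_in, c_out in zip(decoder_channels, decoder_channels[1:])
)
dense_encoder = dense_params(512, 256) + dense_params(256, N_h)
dense_decoder = dense_params(N_h, 256) + dense_params(256, 512)

total = conv_encoder + dense_encoder + dense_decoder + conv_decoder
print(total)
```

Under these assumptions the total is 329,889, in line with the roughly 330k parameters quoted in Remark 2. Nearly 90% of them sit in the four fully-connected layers, which shows how little the shared convolutional filters contribute to the overall count.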
Remark 3. The size of the low-dimensional representation 𝑁ℎ must be chosen a priori for each model. Currently, no principled
approach exists for the choice of 𝑁ℎ . One possible heuristic for an upper bound is to choose 𝑁ℎ such that
$$\frac{\sum_{i=1}^{N_h} \sigma_i^2}{\sum_{i=1}^{m} \sigma_i^2} < \kappa, \qquad (23)$$
where 𝜎𝑖 ≥ 0 are the singular values of the data matrix 𝐗 ∈ ℝ𝑁×𝑚 and 𝜅 is usually taken to be 99.9%. This approach is often
employed when selecting the number of POD modes to keep in POD-Galerkin reduced order models 14 , where in the context
of fluid flows this corresponds to choosing enough modes such that 99.9% of the energy content in the flow is preserved.
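The energy heuristic of Remark 3 can be sketched as follows. The functions take a list of singular values that is assumed to be already computed and sorted in decreasing order. The sketch returns the smallest rank whose retained energy fraction reaches 𝜅, which is the common reading of this criterion and an assumption here, since Equation 23 is written with a strict inequality. The function names and the sample singular values are hypothetical.

```python
def energy_fractions(singular_values):
    """Cumulative fraction of the total squared singular values kept by each rank."""
    energies = [s * s for s in singular_values]
    total = sum(energies)
    fractions, running = [], 0.0
    for e in energies:
        running += e
        fractions.append(running / total)
    return fractions


def smallest_rank_reaching(singular_values, kappa=0.999):
    """Smallest number of modes whose retained energy fraction is at least kappa."""
    for rank, fraction in enumerate(energy_fractions(singular_values), start=1):
        if fraction >= kappa:
            return rank
    return len(singular_values)


# Made-up, rapidly decaying singular values for illustration only.
example_rank = smallest_rank_reaching([10.0, 1.0, 0.1, 0.01], kappa=0.999)
```

For these illustrative values two modes are enough to retain 99.9% of the energy. A dataset with slowly decaying singular values would require many more modes under the same threshold.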
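• Forget gate:
$\mathbf{f}^n = \sigma(\mathbf{W}_f \mathbf{h}^{n-1} + \mathbf{b}_f)$
• Output gate:
$\mathbf{o}^n = \sigma(\mathbf{W}_o \mathbf{h}^{n-1} + \mathbf{b}_o)$
• Cell state:
$\mathbf{c}^n = \mathbf{f}^n \odot \mathbf{c}^{n-1} + \mathbf{i}^n \odot \tanh(\mathbf{W}_c \mathbf{h}^{n-1} + \mathbf{b}_c)$
where all four gates are used to update the feature vector by
$\mathbf{h}^n = \mathbf{o}^n \odot \tanh(\mathbf{c}^n)$. (24)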
Here, ⊙ represents the Hadamard product. Intuitively, at each step 𝑛 the input and forget gates choose what information gets
passed and dropped from the cell state 𝐜𝑛 , while the output gate controls the flow of information from the cell state to the feature
vector. It is important to note that the evolution of 𝐡 does not require information from the full state 𝐱, thereby avoiding a
costly reconstruction at every step.
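A single update of this modified LSTM can be sketched in plain Python. Every gate depends only on the previous feature vector, so no full-state input appears anywhere. The sketch assumes that the input gate has the same form as the forget and output gates, and that the forget gate scales the previous cell state while the input gate scales the new candidate, as the description of the gates suggests. The parameter dictionary and function names are hypothetical.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def _affine(W, h, b):
    """Compute W h + b for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, h)) + b_k for row, b_k in zip(W, b)]


def lstm_step(h_prev, c_prev, params):
    """One step of the feature-only LSTM: returns the new (h, c)."""
    # Gates depend on the previous feature vector only (no full-state input).
    i = [_sigmoid(z) for z in _affine(params["W_i"], h_prev, params["b_i"])]  # assumed form
    f = [_sigmoid(z) for z in _affine(params["W_f"], h_prev, params["b_f"])]
    o = [_sigmoid(z) for z in _affine(params["W_o"], h_prev, params["b_o"])]
    candidate = [math.tanh(z) for z in _affine(params["W_c"], h_prev, params["b_c"])]
    # Forget gate drops part of the old cell state, input gate admits new content.
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, candidate)]
    # Output gate controls what flows from the cell state to the feature vector.
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c
```

With all weights and biases set to zero, every gate equals 0.5 and the candidate is zero, so the cell state is halved at each step. This gives a quick sanity check of an implementation before any training is attempted.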
Initializing with a known low-dimensional representation $\mathbf{h}^0$, one obtains a prediction for the following steps by iteratively
applying
$$\hat{\mathbf{h}}^{n+1} = f_{LSTM}(\hat{\mathbf{h}}^n), \quad n = 1, 2, 3, \dots \qquad (25)$$
where $\hat{\mathbf{h}}^1 = f_{LSTM}(\mathbf{h}^0)$, and $f_{LSTM}(\cdot)$ represents the action of Equation 24 and its subcomponents. A graphical representation
of this model is depicted in Figure 4.
Once the model is trained, online prediction is straightforward. Using the trained parameters 𝜽∗ , and given an initial condition
𝐱0 ∈ [0, 1]𝑁𝑥 ×𝑁𝑦 , a low-dimensional representation of the initial condition 𝐡0 ∈ ℝ𝑁ℎ is constructed using the encoder network.
Iterative applications of Equation 25 are then used to evolve this low-dimensional representation for 𝑁𝑡 steps. The modular
construction of the convolutional recurrent autoencoder model allows the user to reconstruct from ̂𝐡𝑛 the full-dimensional state
𝐱̂ 𝑛 at every time step or at any specific instance. The online prediction algorithm is outlined in Algorithm 4.2.
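The online stage described above (encode once, iterate in the feature space, decode only when needed) can be sketched generically. The sketch is not the algorithm listing referenced in the text. The three callables `encode`, `step`, and `decode` stand in for the trained encoder, the LSTM update of Equation 25, and the decoder, and are assumptions of this illustration, as is the name `predict_online`. Any cell state is assumed to be handled inside `step`.

```python
def predict_online(x0, encode, step, decode, n_steps, decode_at=None):
    """Evolve the low-dimensional representation and decode selected steps.

    decode_at is an optional set of step numbers (1-based) at which the full
    state is reconstructed; None means every step is decoded.
    """
    h = encode(x0)  # low-dimensional representation of the initial condition
    latents, decoded = [], {}
    for n in range(1, n_steps + 1):
        h = step(h)  # feature-space update, no full-state reconstruction
        latents.append(h)
        if decode_at is None or n in decode_at:
            decoded[n] = decode(h)
    return latents, decoded
```

Passing a small `decode_at` set keeps the cost of a long rollout dominated by the cheap feature-space updates, since the decoder is evaluated only at the requested instants.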
5 NUMERICAL EXPERIMENTS
We apply the methods described in the previous sections to three representative examples to illustrate the effectiveness of deep
autoencoder-based approaches to nonlinear model reduction. The first one considers only a 4-layer fully-connected recurrent
autoencoder model applied to a simple one-dimensional problem based on the viscous Burgers equation. This has the merit of
demonstrating the performance of autoencoders equipped with nonlinear activation functions on tasks where linear methods
tend to struggle. The second example considers a parametric model reduction problem based on two-dimensional fluid flow in
a periodic domain with significant parameter variations. In this case, our convolutional recurrent autoencoder model is tasked
with predicting new solutions given new parameters (i.e., parameters unseen during training). The third example focuses on
long-term prediction of an incompressible flow inside a lid-driven cavity. This case serves to highlight the long-term stability
and overall performance of the convolutional recurrent autoencoder model in contrast to the unstable behavior exhibited by
POD-Galerkin ROMs.
[Figure 5: panels (a), (b), (c); horizontal axis 𝑥, vertical axes 𝑡 and 𝑢(𝑥, 𝑡).]
FIGURE 5 (a) exact solution, (b) 𝑁ℎ = 20; 𝐿2 -optimal POD reconstruction (solid orange), POD-LSTM reconstruction (dashed
orange), exact solution (light blue), (c) 𝑁ℎ = 20; shallow autoencoder-LSTM reconstruction (dashed orange), exact solution
(light blue).
POD modes the 𝐿2 -optimal POD reconstruction exhibits spurious oscillations. The spurious oscillations, aside from signifying
a poor reconstruction, may lead to stability issues in POD-Galerkin ROMs. This is widely known to be a problem for model
reduction of fluid flows where POD-Galerkin ROMs, while capturing nearly all the energy of the system, truncate low-energy
modes which can have a large influence on the dynamics. Additionally, in agreement with similar work in 10,12 , the POD-LSTM
model was able to accurately capture the evolution of the optimal POD representation in a non-intrusive manner.
More importantly, the power of recurrent autoencoder-based approaches for nonlinear model reduction is exhibited in the
reconstruction using the shallow recurrent autoencoder model. The nonlinearities in the fully-connected autoencoder
help to identify a more expressive low-dimensional representation of the full state. Combining this with an LSTM network to
evolve these low-dimensional representations yields an effective nonlinear reduced order modeling approach that outperforms
best-case scenario POD-based ROMs while using the same size models.
convolutional autoencoder. The main idea is that, similar to detecting an instance of an object anywhere in an image, the
shared-weight property of each convolutional layer in the autoencoder should be able to capture the large-parameter variations
implicitly defined in the initial condition.
FIGURE 6 Square domain [0, 2𝜋] × [0, 2𝜋] with periodic boundary conditions. The positive and negative vortices of equal strength are randomly
initialized within the grey subdomain.
To create a training dataset, Equation 32 is discretized pseudospectrally using a uniform $128^2$ grid and integrated in time
using the Crank-Nicolson method to 𝑇 = 250 using a time step of $\Delta t = 1 \times 10^{-2}$. A parameter-varying dataset is created by
randomly sampling the initial Gaussian center locations from a square subdomain as was previously described. Similar to the first
example, after subtracting the temporal mean and feature scaling the resulting dataset has the form
$$\{\mathbf{X}_s'^{1}, \dots, \mathbf{X}_s'^{N_s}\} \in [0, 1]^{N_x \times N_y \times N_t \times N_s}, \qquad (34)$$
where each training sample $\mathbf{X}_s'^{i} = [\boldsymbol{\omega}_{s,i}'^{1}, \dots, \boldsymbol{\omega}_{s,i}'^{N_t}]$ is a matrix of two-dimensional discretized snapshots corresponding to a different
set of initial conditions. In this case, the dataset consists of a total of 𝑁𝑠 = 5120 training samples, each with 𝑁𝑡 = 30 evenly sampled
snapshots. Since we are interested in employing the convolutional recurrent autoencoder, each snapshot $\boldsymbol{\omega}_{s,i}'^{n}$ is kept as a two-dimensional
array.
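The preprocessing mentioned above (subtracting the temporal mean, then scaling to the unit interval) can be sketched as follows. The exact scaling formula is not given in this section, so the sketch assumes min-max scaling over the whole set of snapshots after the mean at each grid point has been removed. Snapshots are stored as flat lists for brevity, and the name `preprocess` is hypothetical.

```python
def preprocess(snapshots):
    """Subtract the temporal mean per grid point, then min-max scale to [0, 1].

    snapshots is a list of equally long flat lists, one per time instant.
    Returns the scaled data together with the quantities needed to invert it.
    """
    n_t = len(snapshots)
    n_points = len(snapshots[0])
    mean = [sum(s[p] for s in snapshots) / n_t for p in range(n_points)]
    fluctuations = [[s[p] - mean[p] for p in range(n_points)] for s in snapshots]
    lo = min(min(row) for row in fluctuations)
    hi = max(max(row) for row in fluctuations)
    if hi == lo:
        raise ValueError("constant data cannot be scaled to [0, 1]")
    scaled = [[(v - lo) / (hi - lo) for v in row] for row in fluctuations]
    return scaled, mean, lo, hi
```

Keeping `mean`, `lo`, and `hi` allows a decoded prediction to be mapped back to physical units by inverting the two steps in reverse order. Values in the unit interval also match the range of a sigmoid output layer.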
Three convolutional recurrent autoencoder models, with feature vector sizes 𝑁ℎ = 8, 16, and 64, were trained using the
dataset of Equation 34 with both two and three initial vortices. Each model was trained on a single Nvidia Tesla K20 GPU for
𝑁𝑡𝑟𝑎𝑖𝑛 = 1,000,000 iterations. Once trained, the three models were used to predict the evolution of the vorticity field for new
initial conditions. To highlight the benefits of convolutional recurrent autoencoders for location-invariant feature learning, we
compare our prediction with a set of best-case scenario rank-8 POD reconstructions. These rank-8 POD reconstructions use a
dataset containing snapshots from just 128 separate trajectories. In this case, a rank-8 POD reconstruction is not sufficient to
accurately capture the correct solution since the inclusion of randomly varying initial conditions has created a dataset that is no
longer low-rank. This is clearly shown in Figure 9 and Figure 10 for the two and three vortex cases. The need for more and
more POD modes to achieve a good reconstruction underscores a significant disadvantage of POD-based ROMs for systems
with large variations in parameters. The convolutional recurrent autoencoder overcomes these challenges and performs well in
predicting new solutions without the need to resort to larger-rank models. In contrast to POD-Galerkin ROMs, increasing the
number of separate trajectories in a dataset is beneficial to learning the correct dynamic behavior.
Similar to the first numerical example, the predictions are devoid of any spurious oscillations that are commonplace in POD-based
ROMs. Considering a single initial condition, Figure 11 shows the performance of each sized model in predicting the
location of each vortex as it evolves up to the training sequence length for the two vortex case. Further, Figure 12 shows the
FIGURE 7 A set of initial conditions with two randomly located Gaussian vortices of equal and opposite strength.
FIGURE 8 A set of initial conditions with three randomly located Gaussian vortices, one positive and two negative, all with
equal strength.
where Ψ(𝑥, 𝑦, 𝑡) is the streamfunction, and $\nabla^4 = \nabla^2 \nabla^2$ is the biharmonic operator. To generate the training dataset, Equation 36
is spatially discretized using a $128^2$ Chebyshev grid and solved numerically. The Chebyshev coefficients are derived using the
fast Fourier transform (FFT), where the convective nonlinearities are handled pseudospectrally. The equations are integrated in
time using a semi-implicit, second order Euler scheme. Since the statistically stationary solution is far from the initial condition,
we first initialize the simulation over 7,500,000 time steps with time-step size $\Delta t = 1 \times 10^{-4}$. The following 2,500,000 time
steps are then used to create a dataset in the form of Equation 28 with 𝑁𝑠 = 1110 where now each training sample is
$$\mathbf{X}_s'^{i} = [\boldsymbol{\Psi}_s'^{i}, \boldsymbol{\Psi}_s'^{i+m}, \boldsymbol{\Psi}_s'^{i+2m}, \dots, \boldsymbol{\Psi}_s'^{i+(N_t-1)m}] \in \mathbb{R}^{N_x \times N_y \times N_t}, \qquad (37)$$
where each $\boldsymbol{\Psi}_s'^{i}$ is a discretized two-dimensional snapshot of Equation 36, 𝑁𝑡 = 35, and 𝑚 is taken to be 100. In doing this, we
ensure that the initial training snapshot used to initialize the RNN portion of the model evenly samples the entire trajectory of
Equation 36. The result is the construction of a training dataset that gives a good representation of the dynamics for the RNN to
learn. In addition, an interpolation step onto a uniform $128^2$ grid is performed to ensure each filter 𝐊𝑓 acts on equally physically-sized
receptive fields.
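The strided sampling of Equation 37 amounts to simple index arithmetic, sketched below. Indices are 0-based here, which is an assumption of the sketch, and the text does not say how the 𝑁𝑠 starting indices are chosen, so the starting index is left as an argument. The function names are hypothetical.

```python
def window_indices(i, n_t=35, m=100):
    """Time-step indices i, i + m, ..., i + (n_t - 1) m of one training sample."""
    return [i + j * m for j in range(n_t)]


def build_sample(trajectory, i, n_t=35, m=100):
    """Collect the snapshots of one training sample from a stored trajectory."""
    return [trajectory[k] for k in window_indices(i, n_t, m)]
```

With 𝑁𝑡 = 35 and 𝑚 = 100, each sample spans 3400 time steps of the underlying simulation, so a valid starting index must leave at least that many steps before the end of the stored trajectory.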
Three convolutional recurrent autoencoder models were trained using this dataset, again with low-dimensional representations
of sizes 𝑁ℎ = 8, 16, and 64. In this case all three models were trained on a single Nvidia Tesla K20 GPU for 𝑁𝑡𝑟𝑎𝑖𝑛 = 600,000
iterations. The online performance of each model was evaluated by initializing each model with a slightly perturbed version
of the first snapshot of the entire dataset and evaluating for 2500 prediction steps, over 70 times the length of each training
sequence. We perform the same with three equivalently sized POD-Galerkin ROMs. ?? depict the final predicted velocity fields
𝑢(𝑥, 𝑦, 𝑡), 𝑣(𝑥, 𝑦, 𝑡), as well as the predicted vorticity field 𝜔(𝑥, 𝑦, 𝑡) using traditional POD-Galerkin ROMs and our convolutional
recurrent autoencoder model for 𝑁ℎ = 8, 16, and 64. In all three reconstructed fields the poor performance of POD-Galerkin
ROMs can be easily noticed by the spurious oscillations present in the field. This is in contrast to the predictions presented using
our approach, which nearly capture the exact solution even after long-term prediction.
FIGURE 9 Comparison at 𝑡 = 0, 40, 80, 120 of a sample trajectory using two initial vortices: (a) true solution, (b) rank-8 POD
reconstruction using dataset with 128 trajectories, and (c) prediction using a trained convolutional recurrent autoencoder of size
𝑁ℎ = 8.
In fact, we only present predictions up until 𝑡 = 60 for the 𝑁ℎ = 8 POD-Galerkin ROM since instabilities cause the solution
to diverge. This can be seen more clearly in Figure 17, which compares the instantaneous turbulent kinetic energy (TKE) of the
flow
$$E(t) = \frac{1}{2} \int_{\Omega} \left( u(t)'^2 + v(t)'^2 \right) d\Omega, \qquad (38)$$
where 𝑢(𝑡)′ and 𝑣(𝑡)′ are the instantaneous velocity fluctuations around the mean and Ω represents the fluid domain. The TKE
can be seen as a measure of the energy content within the flow. For statistically stationary flows, such as the one considered in
this example, the TKE should hover around a mean value. In Figure 17 we see that the POD-Galerkin models fail to capture
the correct TKE, and in the case of 𝑁ℎ = 8 instabilities lead to eventual divergence.
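As a minimal sketch of how Equation 38 might be evaluated from gridded velocity data, the functions below subtract the temporal mean and approximate the integral by a sum over equally sized cells. The uniform grid and the rectangle-rule quadrature are assumptions of this sketch, not the procedure used to produce Figure 17, and the function names are hypothetical.

```python
def temporal_mean(snapshots):
    """Mean over time of a list of 2D fields stored as lists of rows."""
    n_t = len(snapshots)
    rows, cols = len(snapshots[0]), len(snapshots[0][0])
    return [
        [sum(s[r][c] for s in snapshots) / n_t for c in range(cols)]
        for r in range(rows)
    ]


def turbulent_kinetic_energy(u, v, u_mean, v_mean, cell_area):
    """Rectangle-rule approximation of half the integral of u'^2 + v'^2."""
    total = 0.0
    for r in range(len(u)):
        for c in range(len(u[0])):
            du = u[r][c] - u_mean[r][c]  # velocity fluctuation around the mean
            dv = v[r][c] - v_mean[r][c]
            total += du * du + dv * dv
    return 0.5 * total * cell_area
```

Evaluating this quantity for every predicted snapshot gives a scalar time series that can be compared with that of the HFM, either directly or through its power spectral density.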
Against this backdrop, we can see that our approach vastly outperforms traditional POD-Galerkin ROMs. All velocity and
vorticity reconstructions are in good agreement with the HFM solution. As the size of the model increases to 𝑁ℎ = 64, we
see that the predicted TKE is in good agreement with that of the HFM. It should be noted that the lid-driven cavity flow at these
Reynolds numbers exhibits chaotic motion, thus a best-case scenario would be to capture the right TKE in a statistical sense.
This can be seen further in Figure 18 which compares the power spectral density of each predicted TKE with that of the HFM.
FIGURE 10 Comparison at 𝑡 = 0, 40, 80, 120 of a sample trajectory using three initial vortices: (a) true solution, (b) rank-8
POD reconstruction using dataset with 128 trajectories, and (c) prediction using a trained convolutional recurrent autoencoder
of size 𝑁ℎ = 8.
FIGURE 11 Evolution of the vortex centers as given by the HFM solution and the predicted solutions using 𝑁ℎ = 8, 16, 64.
[Figure 12: panels (a), (b), (c); horizontal axis 𝑡; vertical axis: normalized squared error of the predicted vorticity, scale ×10−5.]
FIGURE 12 Mean and standard deviation of error at every time step for online predictions using (a) 𝑁ℎ = 8, (b) 𝑁ℎ = 16, and
(c) 𝑁ℎ = 64.
FIGURE 13 Lid-driven cavity domain [−1, 1] × [−1, 1], with lid velocity $\mathbf{u}_{lid} = (1 - x^2)^2$.
While each model prediction captures the general behavior of the HFM, there is some high spatial frequency error evident
throughout the domain in each reconstruction. Interestingly, the stability of the RNN portion of each model remains unaffected
by this high-frequency noise, suggesting that it is due only to the transpose convolutional decoder. This is possibly a result
of performing a strided transpose convolution at each layer of the decoder. It is possible and perhaps beneficial to include a final
undilated convolutional layer with a single feature map to filter some of the high-frequency reconstruction noise.
FIGURE 14 𝑢(𝑥, 𝑦, 𝑡) contours of the lid-driven cavity flow at 𝑡 = 250 𝑠 using the optimal POD reconstruction: (a) 𝑁ℎ = 8
(note: 𝑡 = 60 shown, right before blowup), (b) 𝑁ℎ = 16, (c) 𝑁ℎ = 64, (d) true solution; and predicted contours using the
convolutional recurrent autoencoder model with hidden state sizes (e) 𝑁ℎ = 8, (f) 𝑁ℎ = 16, (g) 𝑁ℎ = 64, (h) true solution.
6 CONCLUSIONS
In this work we propose a completely data-driven nonlinear reduced order model based on a convolutional recurrent
autoencoder architecture for application to parameter-varying systems and systems requiring long-term stability. The construction
of the convolutional recurrent autoencoder consists of two major components, each of which performs a key task in
projection-based reduced order modeling. First, a convolutional autoencoder is designed to identify a low-dimensional representation
of two-dimensional input data in terms of intrinsic coordinates on some low-dimensional manifold embedded in the original,
high-dimensional space. This is done by considering a four-layer convolutional encoder which computes a hierarchy of
localized, location-invariant features that are passed to a two-layer fully connected encoder. The result is a mapping from
the high-dimensional input space to a low-dimensional data-supporting manifold. An equivalent decoder architecture is
considered for efficiently mapping from the low-dimensional representation back to the original space. This can be intuitively
understood as a nonlinear generalization of POD, where the structure of the manifold is more expressive than the linear subspaces
learned by POD-based methods. The second important component of the proposed convolutional recurrent autoencoder is a modified
version of an LSTM network which models the dynamics on the manifold learned by the autoencoder. The LSTM network is
modified to require only information from the low-dimensional representation, thereby avoiding costly reconstruction of the full
state at every evolution step.
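The structure of such an online prediction can be sketched as follows. This is a schematic under stated assumptions, not the implementation used in this work: `rollout` and the three toy maps are hypothetical names, with `toy_encode` and `toy_decode` standing in for the convolutional autoencoder and `toy_step` standing in for the modified LSTM.

```python
def rollout(encode, step, decode, x0, n_steps, decode_at):
    """Advance the low-dimensional state n_steps times, decoding only on request."""
    h = encode(x0)                    # single encoder call at the initial time
    snapshots = {}
    for n in range(1, n_steps + 1):
        h = step(h)                   # evolution uses the low-dimensional state only
        if n in decode_at:
            snapshots[n] = decode(h)  # full state is rebuilt only when requested
    return snapshots


# Toy stand-ins (hypothetical, for illustration only).
def toy_encode(x):
    """Average neighbouring pairs: 4 values -> 2 values."""
    return [(x[0] + x[1]) / 2.0, (x[2] + x[3]) / 2.0]


def toy_decode(h):
    """Repeat each low-dimensional value twice: 2 values -> 4 values."""
    return [h[0], h[0], h[1], h[1]]


def toy_step(h):
    """Rotate the low-dimensional state by 90 degrees (norm-preserving)."""
    return [-h[1], h[0]]


snapshots = rollout(toy_encode, toy_step, toy_decode,
                    x0=[1.0, 1.0, 0.0, 0.0], n_steps=4, decode_at={2, 4})
print(sorted(snapshots))  # [2, 4]
```

The point of the sketch is that the loop never touches the high-dimensional state. The decoder is called twice in four steps here, and the cost of each evolution step depends only on the size of the low-dimensional representation.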
An offline training and online prediction strategy for the convolutional recurrent autoencoder is proposed in this work. The
training algorithm exploits the modularity of the model by splitting each forward pass into two steps. The first step runs a
forward pass on the autoencoder while creating a temporary batch of target low-dimensional representations. These targets are
then used in the second step, which is the forward pass of the modified LSTM network. The backward pass, or parameter update,
is then performed jointly, weighting the autoencoder reconstruction error and the prediction error of the modified LSTM network
equally.
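The two-step forward pass and the equally weighted objective can be sketched as follows. The sketch assumes a plain mean-squared error for both terms and a sequence of at least two snapshots. The exact normalization of each term, and the names `joint_loss`, `toy_encode`, `toy_decode`, and `toy_step`, are illustrative assumptions rather than the definitions used in this work.

```python
def mse(a, b):
    """Mean-squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def joint_loss(x_seq, encode, decode, step):
    """Equally weighted reconstruction and prediction error for one sequence."""
    # Step 1: autoencoder forward pass; the encodings double as prediction targets.
    h_targets = [encode(x) for x in x_seq]
    recon_err = sum(mse(x, decode(h)) for x, h in zip(x_seq, h_targets)) / len(x_seq)

    # Step 2: evolve from the first encoding and compare against the targets.
    h = h_targets[0]
    pred_err = 0.0
    for target in h_targets[1:]:
        h = step(h)
        pred_err += mse(h, target)
    pred_err /= len(h_targets) - 1

    # Both error terms carry the same weight in the joint objective.
    return 0.5 * recon_err + 0.5 * pred_err


# Toy stand-ins (hypothetical, for illustration only).
def toy_encode(x):
    return [(x[0] + x[1]) / 2.0, (x[2] + x[3]) / 2.0]


def toy_decode(h):
    return [h[0], h[0], h[1], h[1]]


def toy_step(h):
    return [-h[1], h[0]]


exact = joint_loss([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]],
                   toy_encode, toy_decode, toy_step)
lossy = joint_loss([[2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]],
                   toy_encode, toy_decode, toy_step)
print(exact, lossy)  # 0.0 0.125
```

In a training loop, the gradient of this single scalar would be taken with respect to the parameters of both the autoencoder and the evolution model, so that one update adjusts them jointly.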
We demonstrated our approach on three illustrative nonlinear model reduction examples. The first emphasizes the expressive
power of fully-connected autoencoders equipped with nonlinear activation functions in model reduction tasks, in contrast
to POD-based methods. The second highlights the performance of the convolutional recurrent autoencoder,
and in particular its location-invariant properties, in parametric model reduction with initial conditions exhibiting large
parameter variations. The final example demonstrates the stability of convolutional recurrent autoencoders when performing
long-term predictions of chaotic incompressible flows. Collectively, these numerical examples show that our convolutional recurrent
autoencoder model outperforms traditional POD-Galerkin ROMs in terms of prediction quality, robustness to parameter variations,
and stability, while also offering other advantages such as location-invariant feature learning and non-intrusiveness. In fact,
although in this work we make use of canonical model reduction examples based on computational physics problems, our
approach is completely general and can be applied to arbitrary high-dimensional spatiotemporal data. When compared to
existing autoencoder-based reduced order modeling strategies, our model provides access to larger-sized problems while keeping the
number of trainable parameters low compared to fully-connected autoencoders.
FIGURE 15 𝑣(𝑥, 𝑦, 𝑡) contours of the lid-driven cavity flow at 𝑡 = 250 𝑠 using the optimal POD reconstruction: (a) 𝑁ℎ = 8
(note: 𝑡 = 60 shown, right before blowup), (b) 𝑁ℎ = 16, (c) 𝑁ℎ = 64, (d) true solution; and predicted contours using the
convolutional recurrent autoencoder model with hidden state sizes (e) 𝑁ℎ = 8, (f) 𝑁ℎ = 16, (g) 𝑁ℎ = 64, (h) true solution.
promising predictive results for both parameter-varying model reduction problems and problems requiring long-term stability,
these methods remain in their infancy and their full capabilities are yet unknown. There are multiple directions in which this
work can be extended. One such direction is improving the design of the convolutional transpose decoder. As it stands, the main
source of error in our results is high-frequency in nature and appears only during the decoding phase. Considering this, future
decoder designs could include more efficient filtering strategies. Another possible direction is in the dynamic modeling of the
low-dimensional representations. In this work, we considered samples with spatial parameter variations and thus the design of
the LSTM network could remain unchanged. However, there is potential for deep learning-based dynamic modeling approaches
that exploit multi-scale phenomena inherent in many physical systems. Finally, a much more challenging problem is the
reconciliation of deep learning-based performance gains with physical intuition. This issue permeates all fields where
deep learning has made an impact: what is it actually doing? Developing our understanding of deep learning-based modeling
strategies can potentially provide us with deeper insights into the dynamics inherent in a physical system.
ACKNOWLEDGMENTS
This material is based upon the work supported by the Air Force Office of Scientific Research under Grant No. FA9550-17-1-
0203. Simulations and model training were also made possible in part by an exploratory award from the Blue Waters sustained-
petascale computing project, which is supported by the National Science Foundation (awards OCI-0725070 and ACI-1238993)
and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center
for Supercomputing Applications.
FIGURE 16 Vorticity contours of the lid-driven cavity flow at 𝑡 = 250 𝑠 using the optimal POD reconstruction: (a) 𝑁ℎ = 8
(note: 𝑡 = 60 shown, right before blowup), (b) 𝑁ℎ = 16, (c) 𝑁ℎ = 64, (d) true solution; and predicted contours using the
convolutional recurrent autoencoder model with hidden state sizes (e) 𝑁ℎ = 8, (f) 𝑁ℎ = 16, (g) 𝑁ℎ = 64, (h) true solution.
FIGURE 17 The evolution of the instantaneous turbulent kinetic energy for the lid-driven cavity flow from the DNS (thick grey
lines), standard POD-based Galerkin ROMs (blue dashed lines), and our method (solid black lines).
FIGURE 18 PSD of the turbulent kinetic energy of the lid-driven cavity flow.