0% found this document useful (0 votes)

44 views14 pages

Deep-Learning-Based Lossless Image Coding

The document proposes a novel deep learning approach for lossless image compression. It uses a deep neural network as a predictor to estimate pixel values, substantially improving over traditional prediction methods. Prediction errors are encoded using a context-tree modeling method and novel context-tree bit-plane entropy codec. The approach outperforms state-of-the-art lossless compression methods on photographic images, lenslet images from plenoptic cameras, and video sequences.

Uploaded by

Muhammad Talib Hussain

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

44 views14 pages

Deep-Learning-Based Lossless Image Coding

Uploaded by

Muhammad Talib Hussain

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 14

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 30, NO.

7, JULY 2020 1829

Deep-Learning-Based Lossless Image Coding

Ionut Schiopu , Member, IEEE, and Adrian Munteanu , Member, IEEE
Abstract— This paper proposes a novel approach for lossless design constraints on computational complexity. These meth-
image compression. The proposed coding approach employs a ods follow a predictive coding paradigm whereby the value
deep-learning-based method to compute the prediction for each of the current pixel is predicted using a linear combination
pixel, and a context-tree-based bit-plane codec to encode the
prediction errors. First, a novel deep learning-based predictor of the values in a small causal neighborhood. State-of-the-art
is proposed to estimate the residuals produced by traditional compression methods include powerful tools for processing the
prediction methods. It is shown that the use of a deep-learning residual-error and for encoding the error using variable-length
paradigm substantially boosts the prediction accuracy compared or context-based adaptive entropy coding methods. Some of
with the traditional prediction methods. Second, the prediction the most popular lossless compression methods include JPEG-
error is modeled by a context modeling method and encoded
using a novel context-tree-based bit-plane codec. Codec profiles LS [6], which employs the LOCO-I predictor which operates
performing either one or two coding passes are proposed, trading on a three-pixel causal neighborhood to predict the current
off complexity for compression performance. The experimental pixel, and CALIC [7], which applies a complex context con-
evaluation is carried out on three different types of data: ditioning scheme based on the predicted value and a six-pixel
photographic images, lenslet images, and video sequences. The causal neighborhood.
experimental results show that the proposed lossless coding
approach systematically and substantially outperforms the state- When referring to video, the current video compression
of-the-art methods for each type of data. standard is High Efficiency Video Coding (HEVC) [8], [9],
Index Terms— Machine learning, image coding, context which is widely used in numerous lossy video coding appli-
modeling. cations. Its lossless coding extension as well as improvements
of its intra and inter prediction tools were recently proposed
for lossless video coding applications [10].
I. I NTRODUCTION In the Machine Learning (ML) domain, recent research
studies have proven that solutions based on modern ML tools
T HE latest technological advances in camera sensor tech-
nologies are now offering the possibility to the industry to
produce top-of-the-line cameras with increased image resolu-
are providing remarkable gains over traditional state-of-the-
art methods. Modern ML tools were successfully applied
tion which can be integrated in a wide variety of applications. in numerous domains, reaching state-of-the-art performance
The volume of data produced by professional cameras or even in e.g., super-resolution [11], inpainting [12], depth map
phone cameras is steadily growing which imposes stringent prediction [13] and estimation [14], view synthesis [15], etc.
constraints on the development of efficient solutions for storing In the recent years, several approaches were proposed to
or streaming the digital content by employing compression integrate modern ML tools into coding systems. In [16],
methods with an increased performance. the authors propose one of the first publications describing
There are many applications which require that the original an image compression method based on a new design which
image must be compressed without any information loss employs ML tools. In [17], the authors propose an end-to-end
considering that the raw data captured by the camera sensor trainable model for image compression based on variational
contains critical information which will be lost after applying auto-encoders, where the model incorporates a hyperprior to
a lossy compression technique. Lossless image compression effectively capture spatial dependencies in the latent represen-
algorithms are mostly used in the development of applica- tation. In [18], the authors proposed a neural network design
tions in medical imaging [1], [2], professional photographic based on a sequence of several dense layers which is employed
imaging [3], and satellite image processing [4], [5], to name to block-based intra prediction in lossy video coding.
a few. In our prior work, we investigated the capabilities
Traditional state-of-the-art lossless compression methods offered by Convolutional Neural Networks (CNN) to
were designed on low resolution imagery and respecting severe act as prediction tools in compression systems, perform-
ing pixel-wise [19], [20] or block-wise [21] prediction and
Manuscript received September 21, 2018; revised February 13, 2019 and competing against traditional prediction strategies. The pro-
April 2, 2019; accepted April 2, 2019. Date of publication April 9, 2019; date posed coding paradigms demonstrate the potential offered by
of current version July 2, 2020. This work was supported in part by Fonds
Wetenschappelijk Onderzoek (FWO) - Vlaanderen (Research Foundation - CNN-based prediction mechanisms to improve the coding
Flanders) and in part by the 3DLicorneA Project funded by the Brussels Insti- performance over the state-of-the-art methods for lossless
tute for Research and Innovation (Innoviris). This paper was recommended image compression.
by Associate Editor G. Valenzise. (Corresponding author: Ionut Schiopu.)
The authors are with the Department of Electronics and Informatics The goal of this paper is to further advance over our findings
(ETRO), Vrije Universiteit Brussel (VUB), 1050 Brussels, Belgium (e-mail: in [19]–[21] and to propose a novel deep learning-based
[email protected]). coding approach for lossless compression of image and video
Color versions of one or more of the figures in this article are available
online at http://ieeexplore.ieee.org. data. In summary, the novel contributions of this paper are as
Digital Object Identifier 10.1109/TCSVT.2019.2909821 follows:
1051-8215 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Zhejiang Normal University. Downloaded on September 18,2023 at 01:54:39 UTC from IEEE Xplore. Restrictions apply.
1830 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 30, NO. 7, JULY 2020

(1) a new coding approach based on deep-learning and plenoptic cameras are build based on microlens technolo-
context-tree modeling for lossless image coding; gies leading to unfocused [26], [27] (e.g., Lytro cameras) or
(2) a new neural network design for a deep-learning based focused plenoptic cameras [28], [29] (e.g., Raytrix cameras).
predictor for lossless image coding; Microlens technologies enable capturing the light field as a
(3) an efficient context-tree based bit-plane entropy codec; so-called lenslet image, which is a matrix of macro-pixels,
(4) adaptations of the CALIC context modeling procedure whereby each macro-pixel corresponds to a microlens, cov-
for high resolution images and lenslet images; ering N × N pixels in the camera sensor. The macro-pixels
(5) a new strategy for generating binary context trees for a are arranged in the lenslet image according to the position
bit-plane coding strategy; of its corresponding microlens in the microlense matrix. An
(6) an elaborated experimental validation carried out on alternative approach for representing the 4D light field data is
three different types of data, that is: to generate the corresponding set of N 2 subaperture images
(a) UHD photographic images; from the acquired lenslet image. Each subaperture image
(b) lenslet images; corresponds then to a specific camera view captured at a
(c) high-resolution video sequences. specific angle, which is obtained by selecting the pixels located
The remainder of this paper is organized as follows. at the same spatial position in all macro-pixels.
Section II outlines state-of-the-art methods in the fields In recent years, the research community has focused on
of Machine Learning and Lossless Image Compression. offering solutions for compressing plenoptic images. Tradi-
Section III describes the proposed coding approach. The exper- tional methods have proven to be inefficient when applied
imental validation and performance analysis of the proposed to light field data as they fail to account for the specific
coding approach are presented in section IV. Finally, section V macro-pixel structure of such images. In lossless compression,
draws the conclusions of this work. different methods were proposed by taking into account the
plenoptic structure. In [30], the authors propose a predictive
coding method for compressing the raw data captured by
II. S TATE - OF - THE -A RT a plenoptic camera. In [31], each subaperture image in the
Lossless image compression was highly influenced by the RGB representation is encoded relative to a neighboring image
introduction of the Lossless JPEG (JPEG-LS) [6] standard based on a context modeling algorithm. In [32], a sparse
developed by the Joint Photographic Experts Group as an addi- modeling predictor guided by a disparity-based image seg-
tion to the JPEG standard [22] for lossless and near-lossless mentation is employed to encode the set of subaperture images
compression of continuous-tone images. Although an old stan- after applying the RCT color transform from JPEG-LS [6],
dard, (JPEG-LS) [6] maintains its competitive performance which resulted in an increased representation on 9 bits of
thanks to LOCO-I which is a simple, yet efficient prediction the chroma components. In [33], different color transforms
method that uses a small causal neighborhood of three pixels were tested for encoding the set of subaperture images. In
to predict the current pixel. JPEG-LS is well known for its low the lossy compression domain, most of the proposed solu-
complexity which comes from simple residual-error modeling tions are obtained by modifying the HEVC standard to take
based on a Two-Sided Geometric Distribution (TSGD) and into account the plenoptic structure [34]–[37]. Furthermore,
from the use of the Golomb-like codes in the entropy coder. light field compression was the topic of several competitions
The Context-based, Adaptive, Lossless Image Codec or special sessions in the most important signal process-
(CALIC) [7] is a more complex codec, representing the ing conferences [38], [39] where many approaches were pro-
reference method in the literature for lossless encoding of posed. The current state of the art in lossy coding of lenslet
continuous-tone images. In CALIC, the prediction is com- images has recently been proposed in [40]; in this approach,
puted by the Gradient Adjusted Predictor (GAP) which used a macro-pixels were adopted as elementary coding blocks, and
causal neighborhood of six pixels. Moreover, an error context dedicated intra-coding methods based on dictionary learning,
modeling procedure is exploiting the higher-order structures directional prediction and optimized linear prediction ensure
and an entropy coder based on histogram tail truncation is high coding efficiency for this type of data.
efficiently compressing the residual-error. In recent years, the ML domain had gained a lot of
In a more recent work [23], a lossless compression algo- popularity due to its high performance and applicability in
rithm called Free Lossless Image Format (FLIF) was proposed numerous domains. In general, the ML-based solutions are
based on Meta-Adaptive Near-zero Integer Arithmetic Coding attractive since they addresses the modern high-dimensional
(MANIAC), where not just the probability model associated challenges of processing a big amount of data, and they offer
to the local context is adaptive, but also the context model the possibility to simply replace some specific components of
itself is adaptive. For any give image dataset, FLIF currently a working algorithmic solution.
achieves the best compression results [24] compared to the Furthermore, machine learning solutions have benefited
most recent algorithms developed for lossless image compres- of important recent breakthroughs that boosted their per-
sion applications. formance and enabled practical deployments in numerous
Another domain where high spatial resolutions are encoun- domains; these advances include (i) the introduction of the
tered is light field imaging. In this domain, light field images batch normalization concept [41]; (ii) the study of weight
acquired by plenoptic cameras [25] provide both spatial and initialization [42], [43]; (iii) activation functions [44], such as
angular information as a 4D light field data. Consumer-level Rectified linear unit (R E LU) [45], Leaky ReLU [46], etc., and

Authorized licensed use limited to: Zhejiang Normal University. Downloaded on September 18,2023 at 01:54:39 UTC from IEEE Xplore. Restrictions apply.
SCHIOPU AND MUNTEANU: DEEP-LEARNING-BASED LOSSLESS IMAGE CODING 1831

(iv) stochastic optimization methods [47]; (v) the introduction

of residual learning [48] to reduce the training time; (vi) the
introduction of the inception architecture [49] to reduce the
network complexity; (vii) the design of Generative Adversarial
Networks (GAN S) [50] which benefit of the competition
between two neural networks with adverse goals, many recent
works propose the design of new GAN models to effectively
learn a distribution over images, e.g., [51].
These advances in machine learning found also application
in coding, where we have used machine learning algorithms
as prediction tools in image compression systems. Specifi-
cally, in our prior work, we have proposed a new strategy
for predicting the current pixel by replacing the traditional
state-of-the-art prediction methods with a CNN-based pre-
diction method. In [19], we were the first to prove that
a pixel-wise CNN-based predictor can offer an improved Fig. 1. (a) Conventional coding system for lossless image coding. (b) The
performance compared to the traditional prediction methods, proposed coding approach for lossless image coding, which introduces a
like LOCO-I [6] or GAP [7], for compressing photographic new residual error block based on a deep learning technique and a novel
context-based bit-plane entropy coder. New methods are proposed for the
images. In [21], we proposed a neural network design for blocks marked with a red rectangle.
block-based (macro-pixel based) prediction for lossless com-
pression of lenslet images. In [20], we introduced a dual
prediction method based on the residual-error prediction tech- • a novel Context-based Bit-plane Codec based on a new
nique which offers stunning performance gains of over 30% strategy for generating the context trees for bit-plane
compared to traditional codecs in lossless image compression, coding.
including JPEG-LS [6] and CALIC [7]. Figure 1(b) depicts the new coding approach, where the
In this paper, we propose a new coding approach whereby proposed methods are marked with a red rectangle.
the use of ML tools in lossless image coding is thoroughly In this section, we describe the novelties introduced by
studied. The novelties relative to our prior works include: (i) a the proposed coding approach. Section III-A describes the
novel neural network design, different than [19], [20], [21], proposed CNN-based predictor for residual-error prediction.
is proposed for residual-error prediction in a dual prediction Section III-B presents how the CALIC error context modeling
schema; and (ii) a novel context tree-based bit-plane codec is procedure was adjusted to encode high resolution images
employed for encoding the modeled errors. Moreover, to prove whereby the prediction error is processed to obtain the coded
that the proposed coding approach obtains systematic and error. Section III-C introduces the proposed Context-based
substantial performance gains relative to the state-of-the-art Bit-plane Codec used for encoding the coded error.
for different types of input, including photographic images,
lenslet images, and video frames.
A. Deep Learning-Based Prediction
III. P ROPOSED C ODING A PPROACH In this paper, pixel-wise prediction is computed by employ-
In general, a conventional system for lossless image coding, ing the dual prediction method we have introduced in [20],
depicted in Figure 1(a), consists of three main steps: predic- which stems from the concept of deep residual learning.
tion, error modeling, and entropy coding, and it contains the The main concept of dual prediction [20] is to the update
following methods: the prediction computed by a state-of-the-art method with the
(A) a pixel-wise prediction method is applied to compute the prediction of its residual-error computed by a CNN-based
prediction for the current pixel based on a causal neigh- prediction method in order to obtain an improved prediction.
borhood, where the neighborhood is generally having of Section III-A.1 outlines the dual prediction method
a small size mainly due to the low complexity constraint; with the residual error prediction based on deep learning.
(B) a complex context modeling method of the residual-error Section III-A.2 describes the proposed neural network design.
is applied to capture and exploit the higher-order inter- 1) Dual Prediction Method: Let us introduce the following
pixel dependencies; notations: I, the input image of size r × c; (x, y), the current
(C) a traditional entropy coding technique is employed to position, with x = 1 : r, y = 1 : c; and I (x, y) the current
encode the modeled error. pixel value.
In this paper, we propose a novel coding approach designed In a state-of-the-art prediction method, the causal neighbor-
to achieve an improved coding performance by introducing the hood of I (x, y), denoted by N(x, y), selects a small number
following concepts in the conventional coding system: of neighboring pixels. Figure 2(a) depicts two cases of such
• a novel Residual Error Prediction method based on a a causal neighborhood: N L OC O (x, y) is the causal neighbor-
new neural network design for a deep-learning based hood of three pixels used in the LOCO-I predictor from
pixel-wise prediction based on a large causal neighbor- JPEG-LS [6]; NC AL I C (x, y) is the causal neighborhood of
hood; six pixels used in the GAP predictor from CALIC [7] where

Authorized licensed use limited to: Zhejiang Normal University. Downloaded on September 18,2023 at 01:54:39 UTC from IEEE Xplore. Restrictions apply.
1832 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 30, NO. 7, JULY 2020

Fig. 3. (a) Residual Learning building block [48] (b) Inception layer [49].

that makes use of it and predicts the residual. Therefore,

the LOCO-I predictor was also integrated in the dual pre-
diction method employed in the proposed coding approach.
The novel components of this work that further advance the
performance relative to our initial designs in [19], [20], [21],
Fig. 2. (a) The causal neighborhood N (x, y) of two state-of-the-art prediction are detailed in the following.
methods: N L OC O (x, y) for LOCO-I [6], and NC AL I C (x, y) for GAP [7]. 2) Proposed Network Design: In recent years, the ML
(b) The causal neighborhood Nb (x, y), of size (b+1)×(2b+1) which selects research community has studied new ways for improving
pixels up to the distance of b pixels from the current position. The last b + 1
values in the last row are unknown at the decoder side. the performance of a neural network and has proposed sev-
eral structures of layers with different properties. In this
paper, the proposed neural network design was built based
an edge descriptor is computed based on seven neighboring on the following concepts: the Residual Learning framework
pixels. (R ES L) [48], and the Inception architecture (I) [49].
In a CNN-based prediction method, the causal neighborhood The R ES L framework was first introduced in [48] with
of I (x, y), denoted by Nb (x, y), selects all the neighboring the goal of reducing the training time of deep neural net-
pixels found at a maximum distance of b pixels from the works. ResL offers a solution for the degradation problem
current pixel. More exactly, Nb (x, y) is of size (b+1)×(2b+1) stated in [48], which observes that, as the network depth
and it is used as the input image for the neural network model is increasing, accuracy gets saturated and then it degrades
of the CNN-based method. Figure 2(b) depicts Nb (x, y), and rapidly. Figure 3(a) depicts the R ES L building block proposed
the last b + 1 values in the last row are set to zero since they in [48], where the feed-forward neural network is equipped
contains unknown information at the decoder side. with a “shortcut connection” so that the network can adjust
The dual prediction method is depicted on the left-side the filter weights in the main branch much faster based on the
of Figure 1(b). The proposed method first employs a state- “residual” received from the previous processing block. Note
of-the-art prediction technique on N(x, y) to compute the that one activation layer is processing the output of the first
prediction value I¯(x, y), which yields the residual ε̄(i, j ) = convolution layer and another activation layer is processing
I (i, j )− I¯(i, j ). Secondly, it employs a CNN-based prediction the sum between the output of the second convolution layer
method using Nb (x, y), and computes the residual’s prediction and the output of the previous processing block (the residual).
ε̂(x, y). In the dual prediction method case, the prediction of The Inception architecture was first proposed in [49] for
the current pixel is computed as: improving the utilization of computing resources inside the
Iˆ(x, y) = I¯(x, y) + ε̂(x, y). (1) neural network, by finding out how an optimal local sparse
structure can be approximated and covered by different dense
As shown in Figure 1(b) the prediction error of the current components. Figure 3(b) depicts the Inception layer proposed
pixel is computed as: in [49], where the current feature map is processed in four
different ways using convolution layers of size 1 × 1, 3 × 3,
ε(x, y) = I (x, y) − Iˆ(x, y) = ε̄(x, y) − ε̂(x, y). (2)
5 × 5, and a max pooling layer of size 3 × 3. Note that the
One may notice that the goal of the neural network is convolution layers inside the Inception module use the ReLU
to minimize ||ε(x, y)||2. Hence, the network is able to (i) activation function.
find the cases where the state-of-the-art prediction method In this paper, we utilize the Batch Normalization (BN)
fails, not being able to provide an accurate prediction of the concept which consists in always introducing a batch normal-
current pixel, and (ii) it updates the initial prediction, I¯(x, y), ization layer between a ReLU layer and a dense or convolution
using the predicted residual, ε̂(x, y), to obtain an improved layer. Therefore, based on the (BN) concept, let us denote the
prediction, Iˆ(x, y). following blocks of layers:
In our previous work [20], we have proven that a simple • the Dense Block (DB), as the block of layers containing
prediction method, like the LOCO-I predictor, yields lower one dense layer, followed by one batch normalization
coding performance compared to a dual prediction method layer and a ReLU layer as depicted in Figure 4(a);

Fig. 5. The REP-NN network proposed in [20] (leftmost) and the proposed
network designs: R ES LNN, IR ES LNN and IR ES LNN V.

Fig. 4. (a) Dense Block (DB) structure. (b) Convolutional Block (CB) – in branch 2 the input feature map is processed by
structure. (c) Residual Learning based Block (ResLB) structure. (d) Inception a 3 × 3 convolution layer, while in branch 3 it is
and Residual Learning based Block (IResLB) structure. processed by a 5 × 5 convolution layer;
– in branch 2 and branch 3 a preprocessing step,
consisting in a 3 × 3 convolution layer, is introduced
• the Convolution Block (CB), as the block of layers to reduce the number of parameters in the following
containing one convolution layer, followed by one batch convolution layer, having a halved number of filters;
normalization layer and a ReLU layer as depicted in – all the branches in the IResLB structure are having
Figure 4(b). the same output size and are added to obtain the out-
Moreover, we propose two new blocks of layers based on the put as in the R ES L framework, while in the Inception
BN concept and the two ML paradigms, each used as a base layer a filter concatenation step is introduced.
building block for the network designs proposed in this paper.
The following types of building blocks are proposed: IR ES LB is used to build the Inception and Residual
Learning-based Neural Network (IR ES LNN) depicted
(a) The R ES L building block was modified to obtain the
Residual Learning based Block (R ES LB) with the struc- in Figure 5.
ture of layers depicted in Figure 4(c). One may note that Figure 5 depicts the structure of the proposed new network
branch 1 in R ES LB contains an extra 3 × 3 convolution designs as well as our REP-NN network from [20]. The main
layer compared to the R ES L block so that the neural idea in designing each proposed network was to first process
network can further process the residual. R ES LB is used the input image at the initial resolution with one CB block,
to build the Residual Learning-based Neural Network followed by k = 5 blocks of R ES LB or IR ES LB, and then
(R ES LNN) depicted in Figure 5. to reduce the image resolution twice using a sequence of two
(b) The R ES L and Inception concepts were combined to R ES LB blocks with stride 2. The rest of the model shares
obtain the Inception and Residual Learning based Block similarities with our layout in [20], where the final feature
(IR ES LB) with the structure of layers depicted in vector is processed with a sequence of 11 DB blocks. In the
Figure 4(d). CNN-based architectures depicted in Figure 5, one may note
The main ideas used in designing IR ES LB are that the role of the softmax activation function is to classify
summarized as follows: the input patch into one of the 256 classes set by the last dense
– the residual is processed as in R ES LB by employing layer, and that ε̂(x, y) is set as the index of the class with the
a 3 × 3 convolution layer in branch 1; highest probability.

Authorized licensed use limited to: Zhejiang Normal University. Downloaded on September 18,2023 at 01:54:39 UTC from IEEE Xplore. Restrictions apply.
1834 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 30, NO. 7, JULY 2020

In this paper, we set b = 15 to obtain the causal neigh- error processing method is employed to process ε(x, y) and to
borhood Nb (x, y) with a resolution of 16 × 31. For these obtain the coded error c (x, y) by exploiting the higher-order
settings, the input image patches with resolution of 16×31 are dependencies between neighboring pixels. The proposed con-
processed using N1 = 32 channels, the first reduced resolution text modeling method is inspired from CALIC’s modeling
of 8×16 is processed using N2 = 64 channels, while the final paradigm [7] and focuses on processing prediction errors of
resolution of 4 × 8 is processed using N3 = 128 channels. high resolution images and lenslet images.
The tests have shown that the sequence of 11 DB blocks The goal of the method is to generate a suitable number of
plays an important role in the reduction of network over-fitting. contexts, without diluting them, and to model the residual error
However, for the case of predicting video sequence frames, such that the entropy coder provides high coding efficiency
an improved performance is obtained by employing a network when encoding c (x, y).
design where the DB blocks are removed and a GlobalMax Section III-B.1 describes the context modeling method
layer is introduced instead, as depicted in Figure 5. This partic- employed for computing the context number assigned to each
ular design was denoted IR ES LNN for video (IR ES LNN V). ε(x, y). Section III-B.2 describes the error modeling method
The goal of the CNN-based predictor is to improve the applied to ε(x, y) to obtain c (x, y).
prediction of the residual-errors ε(x, y). The CNN’s input 1) Context Model: Given the current pixel, I (x, y), let us
ε̄(x, y) is a 9-bit input in the range [−255, 255]. We reduce denote the neighboring pixels as: N = I (x − 1, y), W =
the dynamic range of ε̄(x, y) to an 8-bit representation via a I (x, y − 1), N W = I (x − 1, y − 1), N E = I (x − 1, y + 1),
clipping procedure, as follows: W W = I (x, y − 2), N N = I (x − 2, y), N N E = I (x − 2, y +
(i) set to 127 all the errors larger than 127; 1). Moreover, let us denote the prediction value computed by
(ii) set to −128 all the errors smaller than −128; and GAP [7] as IC AL (x, y).
(iii) add 128 to shift the prediction range to [0, 255]. The method computes the current context based on two
Additionally, we set a number of 256 output classes for the types of information: local texture information and local
networks, that is, ε̂(x, y) will be represented on 8-bits. We note energy. The local texture information, denoted by B, is
that the codec remains lossless as the CNN’s output ε̂(x, y) is obtained under the form of local binary pattern information
further used to compute ε(x, y) based on equation (2), which obtained by comparing IC AL (x, y) with the following vector
is further encoded losslessly. of eight local pattern values C = {N, W, N W, N E, N N,
Note that the range of ε̄(x, y) was reduced because errors W W, 2N − N N, 2W − W W }. Therefore, eight binary values
with large absolute values were of a very low frequency, while are generated and B is computed as the 8-bit number formed
the use of a large number of output classes in the dense layers by concatenating these binary values in the order given by C.
will result in a large number of model parameters and memory The local energy information is obtained by first computing
consumption. the local energy and then by quantizing it by employing the
The proposed network configurations were selected after following procedure:
performing complex testing procedures and are based on the (1) evaluate the strength of the local horizontal edges,
following observations: denoted by dh , and vertical edges, denoted by dv , as:

• the input image patches must be processed as much
dh = |W − W W | + |N − N W | + |N − N E|
as possible at the initial resolution rather than at lower (3)
resolution with the drawback of increasing the memory dv = |W − N W | + |N − N N| + |N E − N N E|
consumption and a more complex network model; (2) compute the error energy estimator, , using the edge
• one CB block must be used in processing the input image information and the neighboring prediction errors as
patch before applying an R ES LB or an IR ES LB block, follows:
as recommended in [49];
• the tests have shown that by processing the feature map = dh + dv + ε(x − 1, y) + ε(x, y − 1); (4)
with a convolution layer with a window size larger than
(3) quantize using the set of quantizers Q =
5 × 5 does not improve the performance.
{5, 15, 25, 42, 85, 60, 140} to obtain a 3-bit value,
In all the convolution layers, the input is padded such that
denoted by Q().
the activation map of each filter has the same size as the input,
In [7], the current context number is set as the 10-bit value
except in the case of the CB block with stride 2.
obtained by setting B as its first 8 bits and Q()/2 as its
In Figure 5, one can notice that there is a large difference
last 2 bits. In this paper, the method from [7] was modified as
between the designs of the proposed neural networks and
follows: (i) the local texture information, B, is computed based
REP-NN, since REP-NN was developed as a sequence of CB
on Iˆ(x, y), instead of IC AL ; (ii) the local energy information is
blocks, while the proposed network designs are processing the
computed as Q() instead of Q()/2; (iii) for lenslet images,
input using a sequence of newly introduced building blocks of
a third component is introduced for computing the current
layers with 2 to 3 branches.
context and it contains the subaperture information.
For high resolution images or video frames, the current
B. Error Context Modeling context is set as the 11-bit value obtained by setting B as
The dual prediction method computes for each pixel posi- the first eight bits and Q() as the last three bits. For lenslet
tion (x, y) the prediction error ε(x, y). In this paper, a complex images, the current context is computed based on a third

component which sets an extra eight bits to obtain a context

number with an 19-bit representation. The extra eight bits
are the binary representation of the current pixel position,
(x, y), inside the current macro-pixel. Hence, in this paper,
the number of contexts generated for high resolution images
is increased by using all the information provided by the local
energy information and the use of spatial information in the
case of lenslet images. Fig. 6. The causal neighborhood used for predicting the binary length of the
2) Error Modeling: The proposed method is modeling the currently codded error.
prediction error by first updating ε(x, y) based on the context
number computed by employing the method presented above,
and by remapping its updated value from the [−255, 255] c (x, y) are sufficient to represent c (x, y), and that
range to the [0, 255] range using Iˆ(x, y). they are encoded next starting from the most significant
The prediction error is updated based on the concept that, predicted bit, i msb = k̂ (x, y), until the least significant
since the distribution of the prediction errors in the image bit, i msb = 0.
has a zero mean, then one can impose that the distribution of Hence, c (x, y) is encoded using ξ(x, y) and its binary
the prediction errors for each context should have also a zero i=k̂ (x,y)
representation {bi }i=0 , by employing an adaptive context
mean. In other words, the conditional distributions are also tree coding method [52] for encoding each bit-plane and
zero-mean for each context. Therefore, the updated prediction ξ(x, y). In this paper, we propose a new method for generating
error, (x, y), is computed by adding to ε(x, y) the rounded the context trees. Moreover, two profiles are proposed for CBP
mean of the current context. by using either a 1-pass strategy (FAST profile) or a 2-pass
The proposed method remaps (x, y) based on the com- strategy (SLOW profile) for adapting the nine context trees.
puted prediction value Iˆ(x, y) by comparing it with the middle Section III-C.1 describes the prediction method employed
point of the range of the pixel representation, i.e., 127 in for predicting the length of the binary representation of
our case. Note that the error modeling method from [7] was c (x, y). Section III-C.2 describes the proposed method used
adapted to use Iˆ(x, y) instead of IC AL (x, y). Let us set the for generating a context tree. Section III-C.3 summarizes the
threshold I p as Iˆ(x, y), if Iˆ(x, y) is small (i.e. Iˆ(x, y) ≤ 127), proposed algorithm for encoding c (x, y).
or 255− Iˆ(x, y), if the prediction is large. If |(x, y)| is smaller
1) Binary Length Prediction: Given the current codded
than I p , then the entropy coded error, denoted by c (x, y), is
error, c (x, y), let us denote the number of bits needed to
set as 2|(x, y)| or 2|(x, y)| − 1 depending on the error’s
represent the neighboring prediction errors as: n = k (x −
sign, else it is set as I p + |(x, y)|.
1, y), w = k (x, y − 1), nw = k (x − 1, y − 1), ne =
k (x − 1, y + 1), ww = k (x, y − 2), nn = k (x − 2, y),
C. Context-Based Bit-Plane Codec and nne = k (x − 2, y + 1). Figure 6 depicts the position of
In the final stage of the proposed coding approach c (x, y) the 10 neighbors relative to the current pixel position (x, y).
is encoded by employing the proposed Context-based Bit- Moreover, the method denotes the following sets of neighbors
plane Codec (CBP) detailed in the following. The codec is as follows:
based on the concept of encoding c (x, y) using its binary N10 , as the set of all 10 neighbors, i.e.,
i=k (x,y) •
representation c (x, y) = i=0 bi · 2i , where bi ∈ {0, 1} N10 = {n, w, nw, ne, nn, ww, nww, nnw, nne, nee};
and k (x, y) = 0, 1, . . . , 7 is the (minimum) number of bits • Nh , as the set of neighbors found on rows x − 1 and x,
needed to represent c (x, y). i.e., Nh = {n, w, nw, ne, ww, nww, nee};
In this paper, the length k (x, y) of the binary representation • Nv , as the set of neighbors found on columns y − 1, y,
of c (x, y) is predicted by employing a binary length predic- and y + 1, i.e., Nv = {n, w, nw, ne, nn, nnw, nne};
tion method to compute the prediction of k (x, y), denoted • N6 , as the set of 6 closest neighbors, i.e.,
k̂ (x, y). One may note that k̂ (x, y) should be close enough N6 = {n, w, nw, ne, nn, ww}.
to k (x, y) and should avoid to under-predict k̂ (x, y), i.e., the
constraint k̂ (x, y) ≥ k (x, y) must be satisfied to achieve For each set the maximum value is computed as: k10 =
lossless compression. Therefore, after computing k̂ (x, y), the max(N01 ), kh = max(Nh ), kv = max(Nv ), k6 = max(N6 ).
method checks if the constraint is satisfied and if k̂ (x, y) Furthermore, the strength of the horizontal and vertical
is accepted by encoding a symbol ξ(x, y) for each current edges is computed as follows: sh = |w − ww| + |n − nw| +
position (x, y), where ξ(x, y) is set as follows: |ne − n| + |nw − nww| + |ne − nee|; sv = |w − nw| + |n −
(i) if k̂ (x, y) < k (x, y), the k̂ (x, y) bits are not sufficient nn| + |ne − nne| + |nw − nnw| + |ww − nww|. The strength of
to reconstruct c (x, y) and a symbol ξ(x, y) = 1 is the edges is used to define the following cases for predicting
encoded to signal the decoder that c (x, y) is represented the binary length:
on more the k̂ (x, y) bits and that k̂ (x, y) = 7 bits are (1) flat region, if sh + sv = 0;
encoded next; (2) vertical edge, if sv is above a threshold αv ;
(ii) otherwise, a symbol ξ(x, y) = 0 is encoded to signal (3) horizontal edge, if sh is above a threshold αh ; and
the decoder that the k̂ (x, y) least significant bits of (4) vertical and horizontal edge.

Authorized licensed use limited to: Zhejiang Normal University. Downloaded on September 18,2023 at 01:54:39 UTC from IEEE Xplore. Restrictions apply.
1836 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 30, NO. 7, JULY 2020

context tree depth dT until which the tree can grow. Based on
this first strategy, the context is determined immediately for
each symbol by passing only once through the whole image.
A second strategy is to set a larger value for the maximum
context tree depth, go through the whole image once and
gather the counts for each possible node in the context tree,
and finally prune the tree to obtain the optimal context tree. In
this second strategy, although the context is determined only
Fig. 7. Context template showing the position of the causal neighbors used after a second pass through the image, it has the advantage of
for creating the tree and the corresponding tree depth of the node. finding the optimal context tree for encoding the corresponding
sequence of symbols. However, this always implies a trade-off
between algorithmic complexity and algorithmic performance.
In this paper, we used αh = 2 and αv = 4. Next an interme- In this paper, the pruning process is employing the
diary prediction value, k̄ (x, y), is computed as follows: Krichevsky-Trofimov estimator [53] based on a gamma func-
⎧
⎪
⎪ k6 + 1, if sh + sv = 0; tion implementation to compute the codelength estimation for
⎪
⎨ kv +k10 , if s > α ; encoding the sequence of symbols collected at each node.
v v
k̄ (x, y) = kh +k4 . (5) Based on this more complex method, the context is determined
⎪
⎪ 4 , if sh > αh ;
10
⎪
⎩ only the second time the current position is visited.
k10 , otherwise. Both strategies are investigated and two profiles are pro-
posed: (1) the FAST profile, where the 1-pass strategy
The final binary length prediction, k̂ (x, y), is computed 1p
based on the observation that there is a higher chance that is employed using a maximum tree depth dT ; (2) the
k̄ (x, y) is an under-prediction (i.e., k̄ (x, y) < k (x, y)) in SLOW profile, where the 2-pass strategy is employed using
2p
the case of the least significant bits of c (x, y) compared to a maximum tree depth dT .
the case of the most significant bits. Therefore, k̂ (x, y) is
computed as follows: Algorithm 1 Context-Based Bit-Plane Coding, the FAST
Profile
k̂ (x, y) = k̄ (x, y) + δk (k̄ (x, y)), (6) 1) Apply the dual prediction method from Section III-A.2
where δk is updating k̄ (x, y) and it is defined as follows: and compute Iˆ(x, y) using equation (1).
2) Compute the ε(x, y) using equation (2).
2, if k̄ (x, y) < 3 i=k (x,y)
δk (k̄ (x, y)) = . (7) 3) Compute the εc (x, y) = i=0 bi · 2i by employing
1, otherwise the Context Modeling method described in Section III-
2) Context Tree Modeling: In this paper, the proposed codec B.
is utilizing the following set of nine binary context trees: Tξ 4) Compute k̂ (x, y), using equation (6).
is encoding ξ(x, y), and Ti is encoding bi , the i -th bit in the 5) Set ξ(x, y) by comparing k̂ (x, y) with k (x, y).
i=k̂ (x,y) 6) Encode ξ(x, y) as follows:
binary representation of c (x, y) = bi · 2i . Note 1p
i=0 a) Visit the nodes in Tξ from the root and up to dT
that k̂ (x, y) is reducing the number of symbols encoded in
by using the neighbor corresponding to the index
the last bit-planes since at most k̂ (x, y) < 7 bit-planes are
shown in Figure 7, and compute the current context
sufficient to represent c (x, y).
number.
Figure 7 depicts the template context utilized to generate
b) Encode ξ(x, y) using the counts in the current
each of the nine binary context tree. An index, dT , is assigned
context number.
to each causal neighbor, and represents the tree depth at which
c) Update Tξ .
the current node of the context tree is extended based on the
neighbor with index dT . The nodes in Tξ are set based on 7) From i = k̂ (x, y) and down to i = 0, encode each bit
the values of ξ. The nodes in Ti are set as follows: the nod bi as follows:
1p
at the tree depth dT is set 1 if the neighbor with the index a) Visit the nodes in Ti from the root and up to dT
dT (see Figure 7) is represented using at least i bits, and 0 by using the neighbor corresponding to the index
otherwise. Each context tree is used by an adaptive context shown in Figure 7, and compute the current context
tree method [52] where the current symbol is encoded by the number.
binary arithmetic codec corresponding to the context number b) Encode bi using the counts in the current context
computed using the context tree. number.
In this paper, we adopt the concept of halving the node’s c) Update Ti .
symbol counts every time the sum of symbol counts is
exceeding a halving threshold h 1/2 . The proposed method 3) Algorithmic Details: The FAST profile of the proposed
uses an aggressive strategy of halving the counts after coding approach is summarized in A LGORITHM I. The tests
h 1/2 = 127 symbols. have shown that by setting a large three depth the contexts
There are two strategies that can be used when generating are diluted, while by setting a small three depth the num-
the context tree. One simple strategy is to limit the maximum ber of contexts is too small to obtain a good performance.

Algorithm 2 Context-Based Bit-Plane Coding, the SLOW

Profile
2p
1) Employ the FAST profile using dT and without encod-
ing the current symbols, i.e., without applying steps 6).b)
and 7).b).
2) Compute each node’s codelength and prune Tξ and Ti .
2p
3) Employ the FAST profile using dT .

1p
The context trees generated for dT = 12 obtains in general
good performance and it was selected for our test, however,
1p
other values up to dT = 30 can be used for different type of
images.
The algorithmic description of the SLOW profile is sum-
marized in A LGORITHM II. The tests have shown that by
2p
setting dT = 18 the proposed coding approach obtains a good
performance in a reasonable runtime.
IV. E XPERIMENTAL E VALUATION
A. Experimental Setup Fig. 8. The study of experimental setups based on different training parameter
In this paper, the experimental validation is carried out variations for the set of 68 4K UHD images: (a) slightly different IResLNN
architectures: 10, 11, and 12 DB blocks; (b) different patch sizes: 4 × 7
on three different types of data: photographic image, lenslet (b = 3), 8 × 15 (b = 7), 12 × 23 (b = 11), and 16 × 31 (b = 15);
image, and video frame. The following datasets are used: (c) different batch sizes: 8, 32, and 4, 000 patches; (d) different training set
(1) The dataset of 68 4K UHD grayscale images randomly sizes: 1M, 5M, and 10M patches.
selected from [54], with a resolution of 3840 × 2160.
(2) The EPFL Light Field dataset [55], available online [56],
which contains 118 unfocused lenslet images captured by
the Lytro camera in the RGB colormap representation.
The resolution of the microlens matrix is 625 × 434 and
the resolution of a macro-pixel is 15 × 15.
(3) The dataset of seven video sequences from the Ultra
Video Group from Tampere University of Technology,
denoted here UVG-TUT, and available online [57]. The
experimental testing is executed on the frame resolution
of 1920 × 1080 and the compression results are reported
Fig. 9. Comparison between the single and dual prediction methods for the
only for the Y channel. set of 68 UHD images: (single) the single-stage prediction method based on
One may note that one grayscale matrix is encoded for the the IResLNN predictor; (dual-Proposed) the proposed dual prediction method,
photographic image case, three color matrices: R, G, B are where the LOCO-I predictor is employed in the first stage and the IResLNN
predictor in the second stage. (a) Relative compression results. (b) Comparison
encoded for the lenslet image case, and one luminance matrix between the absolute error of (single) and (dual-Proposed) prediction methods.
is encoded for the video frame case. Hence, not only that three
different types of data are tested, but also three different types
of image colormap representations. One may note that a neural remind that, in our work, we are using a 90% − 10% ratio
network model must be trained for each type of data, for each for splitting the 10M patches into training − validation data,
color channel, and for each resolution. and the learning rate is decreased progressively as follows. If
The proposed deep-learning based image codec is designed we denote the learning rate at epoch i as ηi , then ηi+1 is set
i
for lossless compression applications and its performance as ηi+1 = ( f d ) ns · ηi , ∀i = 1, 2, . . . , 32, where f d = 0.2 is
is assessed on still pictures, lenslet image data and video the decay rate, n s = 5 is the decay step, and η1 = 5 · 10−4 is
frames. One notes that the compression performance for the the learning rate at the first epoch.
latter type of data can be improved by employing different The above training procedure was proposed after testing
inter-prediction techniques. Adapting the proposed codec to the proposed method in a complex set of experimental setups
employ lossless inter-prediction is beyond the scope of this where different training parameter variations were studied.
paper. Figure 8 show relative compression results (see eq. (8) below)
The proposed neural network models (R ES LNN, for the set of 68 4K UHD images for the IResLNN predictor
IR ES LNN, and IR ES LNN V) were trained during 32 epochs when considering the following training parameter variations:
and using a batch size of 4000 patches of size 16 × 31. (a) slightly different IResLNN architectures (between 10 and
A number of 10 million (10M) patches are randomly selected 12 DB blocks); (b) different patch sizes (between 4 × 7 and
for each type of data from the selected training images. We 16 × 31 patch size); (c) different batch sizes (between 8

Authorized licensed use limited to: Zhejiang Normal University. Downloaded on September 18,2023 at 01:54:39 UTC from IEEE Xplore. Restrictions apply.
1838 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 30, NO. 7, JULY 2020

TABLE I
L OSSLESS C OMPRESSION R ESULTS FOR THE T EST S ET (64 P HOTOGRAPHIC I MAGES )

T RAINING set. Since 2.5M patches are randomly selected

from each training image, it results that 30.14% of the data in
the T RAINING set was selected for training.
For this type of data the tests have shown that there is
no significant difference between the performance of the two
profiles for this type of data, therefore, the CBP codec was
tested only under the FAST profile. The performance of the
following methods is compared:
(M1) the JPEG-LS codec [6];
(M2) the CALIC codec [7];
(M4) the FLIF codec [23];
(M4) REP-CNN predictor [20] and Reference Codec [20];
(M5) REP-CNN predictor [20] and CBP codec;
(M6) ResLNN predictor and CBP codec;
(M7) IResLNN predictor and CBP codec.
Fig. 10. Relative compression (RC) results for photographic images. Moreover, the methods listed above have the following
signification: M1 and M2 are the traditional lossless image
compression codecs; M3 is a more recent lossless image
and 4, 000 patches); (d) different training set sizes (between
compression codec; M4 is the solution proposed in [20];
1M and 10M patches). The proposed deep-learning based
M5 shows the performance gain of the proposed CBP codec
predictor together with the proposed training procedure yield
compared to the entropy coder from [20] called Reference
the best performance in all the tested experimental setups.
Codec; and M6 and M7 are the two solutions obtained
Experimental results show that increasing the patch size leads
using the proposed coding approach, where M6 is employing
to an improved coding performance at the expense of a larger
ResLNN for predicting the residual-error prediction and M7 is
inference time. The proposed 16 × 31 patch size offers a good
employing IResLNN.
performance-complexity trade-off; larger patch sizes lead to
The Relative Compression (RC) metric is used to compare
prohibitively large training times.
the compression results of the six methods relative to M7,
As shown in Figure 9, when employing the IResLNN
where the RC result for a method MX is computed as follows:
model, the dual prediction method achieves 13.83% average
improvement compared to the single-stage deep-learning based B RMX
RCMX = . (8)
prediction method. These results corroborate with those of B RM6
our similar study in [20] where 8.1% average improvement
One can notice that the improved performance of M7 over the
is achieved when employing the REP-CNN model relative to
method MX can be computed as 1 − RCMX .
a single-prediction method.
Figure 10 shows the RC results for each image in the dataset
The training of all neural network models used for pre-
of 68 4K UHD images. One can notice that:
diction was done on a Nvidia Titan X GPU with a Pascal
architecture, a frame buffer of 12GB of video RAM (vRAM) • M5 has an improved performance compared to M4 and
of GDDR5X memory [58]. The experiments were carried out it obtains better results for all the images in the dataset;
on a machine with Intel Xeon Processor E5-2620 v3 • M6 and M7, the two coding solutions based on the pro-
@ 2.40GHz, with 64 GB of RAM memory and running a posed coding approach, outperform all the other methods
Windows 10 Enterprise operating system. for all the images in the dataset;
• the IR ES LNN model offers a small increase in perfor-
B. Experimental Results and Analysis mance compared to the R ES LNN model.
1) Photographic Images: The set of 68 4K UHD images Table I shows the results for the T EST set using the RC
was divided into the T RAINING set of 4 images and the T EST and bits per pixels (bpp) metrics. One can notice that:
set of 64 images. The R ES LNN and IR ES LNN networks • M7, has an improved performance of 10.5% compared to
were trained on the 10M patches randomly selected from the M3, the method proposed in [20];

TABLE II
L OSSLESS C OMPRESSION R ESULTS FOR THE L ENSLET I MAGES F ROM THE EPFL D ATASET [56]

• CBP has an improved performance of 4.5% over the

entropy coder from [20];
• M7 outperforms the JPEG-LS codec with 59.3%
• M7 outperforms the CALIC codec with 54.8%;
• M7 outperforms the FLIF codec with 45.1%;
• IR ES LNN offers a 1% improvement compare to
R ES LNN.
Based on these results, the IR ES LNN model was selected
for the proposed coding approach and the method which
integrates the IResLNN predictor and the CBP codec was
denoted CBPNN.
2) Results on Lenslet Images: The experimental evaluation
for lenslet image data type was carried out on the 118 plenoptic
images from the EPFL dataset. The images are in the RG B
colormap, with a 16-bit representation, and are stored in a
MATLAB file using a 5-dimensional light field structure,
denoted L F, of size 15 × 15 × 625 × 434 × 3. In the proposed Fig. 11. Relative compression (RC) results for the lenslet images from EPFL
dataset [56].
experiments, only the most significant 8 bit-planes of the
images are encoded. The light field structure L F is rearranged
to form the lenslet image, denoted L L, as follows: TABLE III
T RAINING S ET S EQUENCES F ROM [59]
L L((k −1)N +1 : k N, ( −1)N +1 : N, c) = L F(:, :, k, , c),
(9)

where N = 15 is the size of the macro-pixel; (k, ) is the

position in the microlens matrix, k = 1 : 625 and = 1 : 434;
and c = 1 : 3 is the color channel.
The EPFL dataset was divided into the same two sets of
images as in [21]. The T RAINING set contains 10 images
selected from each of the 10 categories found in the dataset
so that each category will have equal weights in the training.
While the T EST set contains the remaining 108 images from
the EPFL dataset.
One network model is trained for each color channel R, G,
and B. Therefore, three sets of 10M patches are gener-
ated by randomly selecting 1M patches from each training
image. Since each lenslet images has an image resolution of
(625 · 15) × (434 · 15) and a number of 61, 031, 250 pixels,
it results that only 1.64% of the training data was used for (iv) the MP-CNN predictor [21];
training. Each color channel is encoded separately and the (v) the proposed CBPNN method.
corresponding IR ES LNN network is employed for prediction. Figure 11 shows the relative compression results for each
The performance of the following methods is compared: image in EPFL dataset. One can notice that CBPNN outper-
(i) the JPEG-LS codec [6]; forms all the other methods for all the images in the dataset.
(ii) the CALIC codec [7]; Table II shows the results for the T EST set using the RC and
(iii) the FLIF codec [23]; bits per pixels (bpp) metrics, where one can notice that:

Authorized licensed use limited to: Zhejiang Normal University. Downloaded on September 18,2023 at 01:54:39 UTC from IEEE Xplore. Restrictions apply.
1840 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 30, NO. 7, JULY 2020

TABLE IV
B ITRATE R ESULTS AND I MPROVEMENT (%) C OMPARE TO L OSSLESS HEVCI NTRA FOR THE UVG-TUT D ATASET [57]

• CBPNN has an improved performance of 13.7% over the (2) the JPEG-LS codec [6];
MP-CNN predictor [21]; (3) the FLIF codec [23];
• CBPNN outperforms the JPEG-LS codec with 35.4%; (4) the CALIC codec [7];
• CBPNN outperforms the CALIC codec with 31.3%. (5) CBPNN V running under the FAST profile;
• CBPNN outperforms the FLIF codec with 10.6%. (6) CBPNN V running under the SLOW profile.
3) Results on Video Frames: The IR ES LNN V model was Table IV shows the compression results of the five methods
trained to predict the Y channel of the frames in the in bpp, and the improvement compare to Lossless HEVCIntra,
UVG-TUT dataset [57] with video resolution of 1920×1080. denoted C R and computed for the MX method as follows:
The set of 10M patches was selected from the set of B RMX
15 training video sequences presented in Table III, available C R = 1 − . (10)
B RLosslessHEVCIntra
online [59]. An equal number of patches was allocated for
each sequence and 4 frames were randomly selected from each Note that the best and second best performance is marked
sequence. Therefore, only 8.03% of the patches found in the with bold. One can notice that the proposed codec CBPNN V
training dataset were collected for training. has an improved average performance compare to the state-
The video sequences used for training are completely dif- of-the-art methods. Lossless HEVCI NTRA is outperformed by
ferent than the video sequences used for testing. They were CBPNN V with the FAST profile with 19.82% and with the
acquired with a different generation of camera sensors and are SLOW profile with 20.12%.
showing different type of content, compared to the UVG-TUT
dataset. The UVG-TUT dataset was captured using a camera C. Complexity Analysis
sensor developed based on latest technologies and it contains The goal of this paper was to propose a new coding
seven video sequences with a better video quality than the 15 approach which employs deep learning-based prediction. The
training video sequences from [59]. proposed neural network design was developed with the goal
The set of 10M patches was collected based on the idea that of obtaining improved compression results compared to the
it must contain patches from all available video sequences, state-of-the-art algorithms.
having the target resolution of the predicted frame. If available, In our experiments, to compute the pixel-wise predic-
we recommend the use an even larger training set. tion for one UHD grayscale image, with a 3840 × 2160
Note that to encode a video sequence having a different res- resolution, the neural network is inferred using a total of
olution than 1920 × 1080, one must train another IR ES LNN V 8, 294, 400 patches. The current inference time on a machine
model using a different set of 10M patches. The set must be equipped with an NVIDIA Titan X GPU is around 12 minutes,
collected from a different set of training video sequences than and depends on the available VRAM memory, the machine’s
the one presented Table III, where each video sequence was RAM memory, the programming language and deep learning
captured at the requested resolution. framework used, and on the software implementation. In this
For encoding video frames, the proposed method is called paper, for the set of 68 4K UHD images, the total inference
CBPNN V and it is based on the proposed coding approach runtime is around 14 hours. The runtime of the proposed CBP
where IR ES LNN V is employed for predicting the residual- entropy codec is negligible compared to the inference time.
error. CBPNN V was tested under both profiles: FAST and One may notice that a deep learning-based solution will
SLOW. The experimental evaluation compares the perfor- always have high runtime when compared to the state-of-the-
mance of the following methods: art algorithms which were specially developed to have a low
(1) Lossless HEVCIntra [9] with the x.265 complexity. However, the runtime for the network inference
implementation [60], configured to run in the lossless can be reduced by using a smaller causal neighborhood or
mode, veryslow preset, and using only intra prediction. by applying specific methods for reducing the complexity of
The following parameters are passed: network inference. In recent years, the research community has
--preset veryslow --keyint 1 offered different solutions such as running a threshold-based
--input-csp 0 --lossless --psnr algorithm by which the filter’s weights are set to zero if they

are below a threshold, by employing a method for network [10] M. Zhou, W. Gao, M. Jiang, and H. Yu, “HEVC lossless coding and
training which is constraining the filter’s weights to have a improvements,” IEEE Trans. Circuits Syst. Video Technol., vol. 22,
no. 12, pp. 1839–1843, Dec. 2012.
sparse representation, etc. [11] J. Y. Cheong and I. K. Park, “Deep CNN-based super-resolution using
In our future work, we are planning to study how to external and internal examples,” IEEE Signal Process. Lett., vol. 24,
reduce the complexity of network inference. We will study no. 8, pp. 1252–1256, Aug. 2017.
[12] J. Xie, L. Xu, and E. Chen, “Image denoising and inpainting with
the network performance after applying small changes to the deep neural networks,” in Proc. Int. Conf. Neural Inf. Process. Syst.,
proposed design and means to decrease the complexity without Lake Tahoe, NV, USA, vol. 1, 2012, pp. 341–349.
diminishing the coding performance. [13] D. Eigen, C. Puhrsch, and R. Fergus, “Depth map prediction from a
single image using a multi-scale deep network,” in Proc. Adv. Neural
V. C ONCLUSIONS Inf. Process. Syst., Montreal, QC, Canada, Dec. 2014, pp. 2366–2374.
[Online]. Available: https://papers.nips.cc/paper/5539-depth-map-
The paper proposes a new coding approach for lossless prediction-from-a-single-image-using-a-multi-scale-deep-network
image coding. The approach employs a deep learning-based [14] F. Liu, S. Chunhua, and L. Guosheng, “Deep convolutional neural fields
approach for computing the residual-error for a dual prediction for depth estimation from a single image,” in Proc. IEEE Conf. Comput.
Vis. Pattern Recognit. (CVPR), Boston, MA, USA, 2015, pp. 5162–5170.
method and an entropy coder performing context-based bit- [Online]. Available: https://ieeexplore.ieee.org/document/7299152
plane coding to encode the residuals. A new neural network [15] N. K. Kalantari, T.-C. Wang, and R. Ramamoorthi, “Learning-based
design built on the ML concepts of R ES L framework and view synthesis for light field cameras,” ACM Trans. Graph., vol. 35,
no. 6, 2016, Art. no. 193.
the Inception architecture was proposed together with a new [16] G. Toderici et al., “Full resolution image compression with recurrent
method for generating binary context trees. Moreover, a state- neural networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog-
of-the-art error modeling method was proposed to encode high nit. (CVPR), Honolulu, HI, USA, Jul. 2017, pp. 5435–5443.
resolution images. The experimental validation is carried out [17] J. Ballé, D. Minnen, S. Singh, S. J. Hwang, and N. Johnston, “Variational
image compression with a scale hyperprior,” in Proc. Int. Conf. Learn.
on three different types of data: photographic image, lenslet Represent. (ICMR), Vancouver, BC, Canada, 2018, pp. 1–23.
image, and video sequences. [18] J. Li, B. Li, J. Xu, R. Xiong, and W. Gao, “Fully connected network-
The experimental results show that the proposed approach based intra prediction for image coding,” IEEE Trans. Image Process.,
vol. 27, no. 7, pp. 3236–3247, Jul. 2018.
systematically and substantially outperforms state-of-the-art [19] I. Schiopu, Y. Liu, and A. Munteanu, “CNN-based prediction for lossless
methods for all the images and for all the types of data tested: coding of photographic images,” in Proc. Picture Coding Symp. (PCS),
• For the photographic images, the JPEG-LS codec is San Francisco, CA, USA, Jun. 2018, pp. 16–20.
[20] I. Schiopu and A. Munteanu, “Residual-error prediction based on deep
outperformed in average with 59.3%, the CALIC codec learning for lossless image compression,” Electron. Lett., vol. 54, no. 17,
is outperformed in average with 54.8%, and the FLIF pp. 1032–1034, Aug. 2018.
codec is outperformed in average with 45.1%. [21] I. Schiopu and A. Munteanu, “Macro-pixel prediction based on convo-
lutional neural networks for lossless compression of light field images,”
• For the lenslet images, the JPEG-LS codec is outper-
in Proc. 25th IEEE Int. Conf. Image Process. (ICIP), Athens, Greece,
formed in average with 35.4%, the CALIC codec is Oct. 2018, pp. 445–449.
outperformed in average with 31.3%, the FLIF codec is [22] Digital Compression and Coding of Continuous Tone Still
outperformed in average with 10.6%. Images—Requirements and Guidelines, Standard ITU Rec. T.81,
ISO/IEC 10918-1, Sep. 1993.
• For the video frames, the HEVC standard is outperform [23] J. Sneyers and P. Wuille, “FLIF: Free lossless image format based on
in average with 20.12% on the UVG-TUT dataset. MANIAC compression,” in Proc. IEEE Int. Conf. Image Process. (ICIP),
Phoenix, AZ, USA, Sep. 2016, pp. 66–70.
R EFERENCES [24] J. Sneyers and P. Wuille. FLIF Website. [Online]. Available: https://flif.
info
[1] S. Chandra and W. W. Hsu, “Lossless medical image compression in a [25] G. Lippmann, “Épreuves réversibles donnant la sensation du relief,”
block-based storage system,” in Proc. Data Compress. Conf., Snowbird, J. Phys., vol. 7, no. 4, pp. 821–825, 1908.
UT, USA, Mar. 2014, p. 400.
[26] E. H. Adelson and J. Y. A. Wang, “Single lens stereo with a plenoptic
[2] L. F. R. Lucas, N. M. M. Rodrigues, L. A. da Silva Cruz, and
camera,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 2,
S. M. M. de Faria, “Lossless compression of medical images using 3-D
pp. 99–106, Feb. 1992.
predictors,” IEEE Trans. Med. Imag., vol. 36, no. 11, pp. 2250–2260,
[27] R. Ng, M. Levoy, M. Brédif, G. Duval, M. Horowitz, and P. Hanrahan,
Nov. 2017.
[3] H. Wu, X. Sun, J. Yang, W. Zeng, and F. Wu, “Lossless compression “Light field photography with a hand-held plenoptic camera,” Dept.
Comput. Sci., Stanford Univ., Stanford, CA, USA, 2005, pp. 1–11.
of JPEG coded photo collections,” IEEE Trans. Image Process., vol. 25,
no. 6, pp. 2684–2696, Jun. 2016. [28] A. Lumsdaine and T. Georgiev, “The focused plenoptic camera,” in
[4] V. Trivedi and H. Cheng, “Lossless compression of satellite image sets Proc. IEEE Int. Conf. Comput. Photography, San Francisco, CA, USA,
using spatial area overlap compensation,” in Image Analysis and Recog- Apr. 2009, pp. 1–8.
nition. M. Kamel and A. Campilho, Eds. Berlin, Germany: Springer, [29] C. Perwaß and L. Wietzke, “Single lens 3D-camera with extended depth-
2011, pp. 243–252. of-field,” Proc. SPIE, vol. 8291, pp. 829108-1–829108-15, Feb. 2012.
[5] G. Yu, T. Vladimirova, and M. N. Sweeting, “Image compression [Online]. Available: https://www.spiedigitallibrary.org/conference-proc
systems on board satellites,” Acta Astronautica, vol. 64, pp. 988–1005, eedings-of-spie/8291/829108/Single-lens-3D-camera-with-extended-dep
May/Jun. 2009. th-of-field/10.1117/12.909882
[6] M. J. Weinberger, G. Seroussi, and G. Sapiro, “The LOCO-I lossless [30] C. Perra, “Lossless plenoptic image compression using adaptive block
image compression algorithm: Principles and standardization into differential prediction,” in Proc. IEEE Int. Conf. Acoust., Speech Signal
JPEG-LS,” IEEE Trans. Image Process., vol. 9, no. 8, pp. 1309–1324, Process., Brisbane, QLD, Australia, Apr. 2015, pp. 1231–1234.
Aug. 2000. [31] I. Schiopu, M. Gabbouj, A. Gotchev, and M. M. Hannuksela, “Lossless
[7] X. Wu and N. Memon, “Context-based, adaptive, lossless image coding,” compression of subaperture images using context modeling,” in
IEEE Trans. Commun., vol. 45, no. 4, pp. 437–444, Apr. 1997. Proc. 3DTV Conf., True Vis.-Capture, Transmiss. Display 3D Video,
[8] High efficiency video coding, International Organization for Standard- Copenhagen, Denmark, Jun. 2017, pp. 1–4.
ization, document ISO/IEC 23008-2, ITU-T Rec. H.265, Dec. 2013. [32] P. Helin, P. Astola, B. Rao, and I. Tabus, “Minimum description
[9] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the length sparse modeling and region merging for lossless plenoptic image
high efficiency video coding (HEVC) standard,” IEEE Trans. Circuits compression,” IEEE J. Sel. Topics Signal Process., vol. 11, no. 7,
Syst. Video Technol., vol. 22, no. 12, pp. 1649–1668, Dec. 2012. pp. 1146–1161, Oct. 2017.

Authorized licensed use limited to: Zhejiang Normal University. Downloaded on September 18,2023 at 01:54:39 UTC from IEEE Xplore. Restrictions apply.
1842 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 30, NO. 7, JULY 2020

[33] J. M. Santos, P. A. A. Assuncao, L. A. da Silva Cruz, L. Távora, [52] I. Schiopu, “Depth-map image compression based on region and
R. Fonseca-Pinto, and S. M. M. Faria, “Lossless light-field compres- contour modeling,” M.S. thesis, Tampere Univ. Technol., Tampere,
sion using reversible colour transformations,” in Proc. 7th Int. Conf. Finland, Jan. 2016, vol. 1360. [Online]. Available: https://tutcris.tut.fi/
Image Process. Theory, Tools Appl. (IPTA), Montreal, QC, Canada, portal/en/publications/depthmap-image-compression-based-on-region-an
Nov./Dec. 2017, pp. 1–6. d-contour-modeling(913b8afb-a1ad-44df-b399-61c7756d5ef9)/export.h
[34] C. Conti, J. Lino, P. Nunes, L. D. Soares, and P. L. Correia, “Improved tml
spatial prediction for 3D holoscopic image and video coding,” in Proc. [53] R. Krichevsky and V. Trofimov, “The performance of universal encod-
19th Eur. Signal Process. Conf., Barcelona, Spain, Aug./Sep. 2011, ing,” IEEE Trans. Inf. Theory, vol. 27, no. 2, pp. 199–207, Mar. 1981.
pp. 378–382. [54] 4K UHD Photographic Images. Accessed: Aug. 25, 2017. [Online].
[35] C. Perra and P. Assuncao, “High efficiency coding of light field images Available: http://www.ultrahdwallpapers.net/nature
based on tiling and pseudo-temporal data arrangement,” in Proc. IEEE [55] M. Rerabek and T. Ebrahimi, “New light field image dataset,” in Proc.
Int. Conf. Multimedia Expo Workshops, Seattle, WA, USA, Jul. 2016, 8th Int. Conf. Qual. Multimedia Exper., Lisbon, Portugal, 2016, pp. 1–2.
pp. 1–4. [56] JPEG Pleno Database: EPFL Light-Field Data Set. Accessed:
[36] D. Liu, L. Wang, L. Li, Z. Xiong, F. Wu, and W. Zeng, “Pseudo- Mar. 1, 2017. [Online]. Available: https://jpeg.org/plenodb/lf/epfl
sequence-based light field image compression,” in Proc. IEEE Int. Conf. [57] Ultra Video Group. Tampere University of Technology. Test Sequences.
Multimedia Expo Workshops, Seattle, WA, USA, Jul. 2016, pp. 1–4. Accessed: Jul. 1, 2018. [Online]. Available: http://ultravideo.cs.tut.
[37] L. Li, Z. Li, B. Li, D. Liu, and H. Li, “Pseudo sequence based 2-D fi/#testsequences
hierarchical coding structure for light-field image compression,” in Proc. [58] Nvidia. Titan X Specifications. Accessed: Sep. 1, 2018. [Online]. Avail-
Data Compress. Conf., Snowbird, UT, USA, Apr. 2017, pp. 131–140. able: https://www.nvidia.com/en-us/geforce/products/10series/titan-x-
[38] T. Ebrahimi, P. Schelkens, and F. Pereira. ICME 2016 Grand pascal.
Challenge: Light-Field Image Compression. Accessed: Mar. 1, 2017. [59] Xiph.Org Foundation. Video Test Media. Accessed: Jul. 1, 2018.
[Online]. Available: https://mmspg.epfl.ch/meetings/page-71686-en- [Online]. Available: https://media.xiph.org/video/derf
html/icme2016grandchallenge_1/ [60] MulticoreWare. x265 Source Code, Version 2.7. Accessed: May 4, 2018.
[39] T. Ebrahimi, F. Pereira, P. Schelkens, and S. Foessela, Grand Challenges: [Online]. Available: https://bitbucket.org/multicoreware/x265/downloads
Light Field Image Coding. Accessed: Mar. 1, 2017. [Online]. Available:
http://www.2017.ieeeicip.org/GrandChallenges.html
[40] R. Zhong, I. Schiopu, B. Cornelis, S.-P. Lu, J. Yuan, and A. Munteanu,
“Dictionary learning-based, directional, and optimized prediction for
lenslet image coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 29,
no. 4, pp. 1116–1129, Apr. 2019.
[41] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep Ionut Schiopu (M’13) received the B.Sc. degree
in automatic control and computer science and the
network training by reducing internal covariate shift,” in Proc. Int.
Conf. Mach. Learn., Lille, France, Feb. 2015, pp. 448–456. [Online]. M.Sc. degree in advanced techniques in systems and
Available: https://dl.acm.org/citation.cfm?id=3045118.3045167 signals from the Politehnica University of Bucharest,
Romania, in 2009 and 2011, respectively, and the
[42] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep
feedforward neural networks,” in Proc. 13th Int. Conf. Artif. Intell. Ph.D. degree from the Tampere University of Tech-
Statist., Sardinia, Italy, May 2010, pp. 249–256. nology (TUT), Finland, in 2016. From 2016 to 2017,
[43] K. He, X. Zhang, S. Ren, and J. Sun. (Feb. 2015). “Delving deep into he was a Post-Doctoral Researcher with TUT. Since
2017, he has been a Post-Doctoral Researcher with
rectifiers: Surpassing human-level performance on imagenet classifica-
tion.” [Online]. Available: https://arxiv.org/abs/1502.01852 Vrije Universiteit Brussel, Belgium. His research
[44] F. Agostinelli, M. Hoffman, P. Sadowski, and P. Baldi, “Learn- interests are the design and optimization of machine
learning tools for image and video coding applications, view synthesis, entropy
ing activation functions to improve deep neural networks,” CoRR,
vol. abs/1412.6830, pp. 1–9, Dec. 2014. [Online]. Available: coding based on context modeling, and image segmentation for coding.
https://arxiv.org/abs/1412.6830
[45] V. Nair and G. E. Hinton, “Rectified linear units improve restricted
Boltzmann machines,” in Proc. 27th Int. Conf. Int. Conf. Mach.
Learn. (ICML), Washington, DC, USA, 2010, pp. 807–814.
[46] A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities
improve neural network acoustic models,” in Proc. Int. Conf. Mach. Adrian Munteanu (M’07) received the M.Sc.
Learn. (ICML), Atlanta, GA, USA, 2013, pp. 1–3. degree in electronics and telecommunications from
[47] D. P. Kingma and J. Ba. (Dec. 2014). “Adam: A method for stochastic the Politehnica University of Bucharest, Romania,
optimization.” [Online]. Available: https://arxiv.org/abs/1412.6980 in 1994, the M.Sc. degree in biomedical engineering
[48] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for from the University of Patras, Greece, in 1996, and
image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. the Ph.D. degree (magna cum laude) in applied
(CVPR), Las Vegas, NV, USA, 2016, pp. 770–778. [Online]. Available: sciences from Vrije Universiteit Brussel, Belgium,
https://ieeexplore.ieee.org/document/7780459 in 2003. From 2004 to 2010, he was a Post-Doctoral
[49] C. Szegedy et al., “Going deeper with convolutions,” in Proc. Fellow with the Fund for Scientific Research Flan-
IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Boston, MA, ders (FWO), Belgium. Since 2007, he has been a
USA, 2015, pp. 1–9. [Online]. Available: https://ieeexplore.ieee.org/ Professor with the Department of Electronics and
document/7298594 Informatics (ETRO), Vrije Universiteit Brussel (VUB), Belgium. He has
[50] I. J. Goodfellow et al., “Generative adversarial networks,” in Proc. authored more than 300 journals and conference publications, book chapters,
Adv. Neural Inf. Process. Syst., Montreal, QC, Canada, Jun. 2014, and contributions to standards and holds seven patents in image and video
pp. 2672–2680. [Online]. Available: https://papers.nips.cc/book/advances- coding. His research interests include image, video, and 3D graphics coding,
in-neural-information-processing-systems-27-2014 distributed visual processing, 3D graphics, error-resilient coding, multimedia
[51] A. van den Oord, N. Kalchbrenner, O. Vinyals, L. Espeholt, A. Graves, transmission over networks, and statistical modeling. He was a recipient of the
and K. Kavukcuoglu. (Jun. 2016). “Conditional image generation 2004 BARCO-FWO Prize for his Ph.D. work and several prizes and scientific
with PixelCNN decoders.” [Online]. Available: https://arxiv.org/abs/ awards in international journals and conferences. He served as an Associate
1606.05328 Editor for the IEEE T RANSACTIONS ON M ULTIMEDIA.

Authorized licensed use limited to: Zhejiang Normal University. Downloaded on September 18,2023 at 01:54:39 UTC from IEEE Xplore. Restrictions apply.

Medical Images Compression and Decompression Using Neural Networks
No ratings yet
Medical Images Compression and Decompression Using Neural Networks
5 pages
Deep-Learning Based Lossless Image Coding
No ratings yet
Deep-Learning Based Lossless Image Coding
14 pages
Deep Lossy Plus Residual Coding For Lossless and Near-Lossless Image Compression
No ratings yet
Deep Lossy Plus Residual Coding For Lossless and Near-Lossless Image Compression
18 pages
Learned Lossless Image Compression With Combined Channel-Conditioning Models and Autoregressive Modules
No ratings yet
Learned Lossless Image Compression With Combined Channel-Conditioning Models and Autoregressive Modules
8 pages
Applied Sciences: An End-to-End Deep Learning Image Compression Framework Based On Semantic Analysis
No ratings yet
Applied Sciences: An End-to-End Deep Learning Image Compression Framework Based On Semantic Analysis
13 pages
Preprints202403 1272 v1
No ratings yet
Preprints202403 1272 v1
37 pages
Learning-Driven Lossy Image Compression A Comprehensive Survey
No ratings yet
Learning-Driven Lossy Image Compression A Comprehensive Survey
14 pages
Improving Inference For Neural Image Compression
No ratings yet
Improving Inference For Neural Image Compression
17 pages
Learning End-to-End Lossy Image Compression: A Benchmark: Yueyu Hu, Wenhan Yang, Zhan Ma, and Jiaying Liu
No ratings yet
Learning End-to-End Lossy Image Compression: A Benchmark: Yueyu Hu, Wenhan Yang, Zhan Ma, and Jiaying Liu
18 pages
Hierarchical Prediction and Context Adaptive Coding For Lossless Color Image Compression
No ratings yet
Hierarchical Prediction and Context Adaptive Coding For Lossless Color Image Compression
5 pages
FLLIC: Functionally Lossless Image Compression: Xi Zhang and Xiaolin Wu
No ratings yet
FLLIC: Functionally Lossless Image Compression: Xi Zhang and Xiaolin Wu
10 pages
Lossless Data Compression Using Neural Networks
No ratings yet
Lossless Data Compression Using Neural Networks
5 pages
A Universal Optimization Framework For Learning-Based Image Codec
No ratings yet
A Universal Optimization Framework For Learning-Based Image Codec
19 pages
Dual Autoencoder-Based Framework For Image Compression and Decompression
No ratings yet
Dual Autoencoder-Based Framework For Image Compression and Decompression
9 pages
Visual Sensor Network: Exploring the Power of Visual Sensor Networks in Computer Vision
From Everand
Visual Sensor Network: Exploring the Power of Visual Sensor Networks in Computer Vision
Fouad Sabry
No ratings yet
On Combining Denoising With Learning-Based Image Decoding
No ratings yet
On Combining Denoising With Learning-Based Image Decoding
14 pages
Video Compression With Rate-Distortion Autoencoders
No ratings yet
Video Compression With Rate-Distortion Autoencoders
14 pages
Variable-Rate Deep Image Compression With Vision Transformers
No ratings yet
Variable-Rate Deep Image Compression With Vision Transformers
12 pages
Lossless Image Compression Algorithm For Transmitting Over Low Bandwidth Line
No ratings yet
Lossless Image Compression Algorithm For Transmitting Over Low Bandwidth Line
6 pages
Lossless Recompression of JPEG Images Using Transform Domain Intra Prediction
No ratings yet
Lossless Recompression of JPEG Images Using Transform Domain Intra Prediction
12 pages
Lossless Embedded Compression Algorithm With Context-Based Error Compensation For Video Application
No ratings yet
Lossless Embedded Compression Algorithm With Context-Based Error Compensation For Video Application
4 pages
Onboard Deep Lossless and Near-Lossless Predictive Coding of Hyperspectral Images With Line-Based Attention
No ratings yet
Onboard Deep Lossless and Near-Lossless Predictive Coding of Hyperspectral Images With Line-Based Attention
10 pages
wg1n90021-REQ-JPEG AI Use Cases and Requirements
No ratings yet
wg1n90021-REQ-JPEG AI Use Cases and Requirements
7 pages
A Segmented Wavelet Inspired Neural Network Approach To Compress Images
No ratings yet
A Segmented Wavelet Inspired Neural Network Approach To Compress Images
11 pages
GT Img Compr
No ratings yet
GT Img Compr
87 pages
InstructPix2Pix in Practice: The Complete Guide for Developers and Engineers
From Everand
InstructPix2Pix in Practice: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
Image Compression: Efficient Techniques for Visual Data Optimization
From Everand
Image Compression: Efficient Techniques for Visual Data Optimization
Fouad Sabry
No ratings yet
Transformer-Based Image Compression
No ratings yet
Transformer-Based Image Compression
10 pages
Smart Camera: Revolutionizing Visual Perception with Computer Vision
From Everand
Smart Camera: Revolutionizing Visual Perception with Computer Vision
Fouad Sabry
No ratings yet
Exploring The Effectiveness of Deep Learning in Audio Compression and Restoration
No ratings yet
Exploring The Effectiveness of Deep Learning in Audio Compression and Restoration
5 pages
Video Compression by Neural Networks
No ratings yet
Video Compression by Neural Networks
33 pages
IKIN - Diffusion Based Compression v1.8
No ratings yet
IKIN - Diffusion Based Compression v1.8
14 pages
Spherical Coding Algorithm For Wavelet Image Compression
No ratings yet
Spherical Coding Algorithm For Wavelet Image Compression
10 pages
Deep Convolutional Autoencoder-Based Lossy Image Compression
No ratings yet
Deep Convolutional Autoencoder-Based Lossy Image Compression
6 pages
6IJCSEITRJUN20196
No ratings yet
6IJCSEITRJUN20196
10 pages
Deep Learning-Based Video Coding - A Review and A Case Study
No ratings yet
Deep Learning-Based Video Coding - A Review and A Case Study
35 pages
Temporal Context Mining For Learned Video Compression
No ratings yet
Temporal Context Mining For Learned Video Compression
12 pages
1 s2.0 S2667241321000148 Main
No ratings yet
1 s2.0 S2667241321000148 Main
14 pages
Object Detection: Advances, Applications, and Algorithms
From Everand
Object Detection: Advances, Applications, and Algorithms
Fouad Sabry
No ratings yet
A Novel Hybrid Linear Predictive Coding - Discrete Cosine Transform Based Compression
No ratings yet
A Novel Hybrid Linear Predictive Coding - Discrete Cosine Transform Based Compression
5 pages
Optimizing Image Compression Via Joint Learning With Denoising
No ratings yet
Optimizing Image Compression Via Joint Learning With Denoising
18 pages
Optimal Machine Learning Model Based Medical Image Compression Techniques For Smart Healthcare
No ratings yet
Optimal Machine Learning Model Based Medical Image Compression Techniques For Smart Healthcare
10 pages
Sensors 24 00791 v2
No ratings yet
Sensors 24 00791 v2
27 pages
A Low Complexity Embedded Compression Codec Design With Rate Control For High Definition Video
No ratings yet
A Low Complexity Embedded Compression Codec Design With Rate Control For High Definition Video
14 pages
Machine Learning - Advanced Concepts
From Everand
Machine Learning - Advanced Concepts
Derrick Mwiti
No ratings yet
Transformer CNN Mixture Architecture
No ratings yet
Transformer CNN Mixture Architecture
10 pages
Full Resolution Image Compression With Recurrent Neural Networks
No ratings yet
Full Resolution Image Compression With Recurrent Neural Networks
10 pages
Farhan 2018 Ijca 916406
No ratings yet
Farhan 2018 Ijca 916406
5 pages
A Different Approach For Spatial Prediction and Transform Using Video Image Coding
No ratings yet
A Different Approach For Spatial Prediction and Transform Using Video Image Coding
6 pages
An Efficient Technique of Lossless Data Compression: Megha S. Chaudhari, S.S.Shirgan
No ratings yet
An Efficient Technique of Lossless Data Compression: Megha S. Chaudhari, S.S.Shirgan
5 pages
Aivp Paper 5837 15848 1 PB
No ratings yet
Aivp Paper 5837 15848 1 PB
12 pages
Computer Vision: Fundamentals and Applications
From Everand
Computer Vision: Fundamentals and Applications
Fouad Sabry
No ratings yet
MP Final Document
No ratings yet
MP Final Document
67 pages
Comprehensive Complexity Assessment of Emerging Learned Image Compression On Cpu and Gpu
No ratings yet
Comprehensive Complexity Assessment of Emerging Learned Image Compression On Cpu and Gpu
5 pages
Id Preservation Loss
No ratings yet
Id Preservation Loss
10 pages
Semantically-Guided Image Compression For Enhanced Perceptual Quality at Extremely Low Bitrates
No ratings yet
Semantically-Guided Image Compression For Enhanced Perceptual Quality at Extremely Low Bitrates
16 pages
DCC01 en
No ratings yet
DCC01 en
10 pages
Calic
No ratings yet
Calic
4 pages
Rippel Learned Video Compression ICCV 2019 Paper
No ratings yet
Rippel Learned Video Compression ICCV 2019 Paper
10 pages
NeurIPS 2021 Lossy Compression For Lossless Prediction Paper
No ratings yet
NeurIPS 2021 Lossy Compression For Lossless Prediction Paper
15 pages
UAV 3D Mapping System: Specifications
No ratings yet
UAV 3D Mapping System: Specifications
1 page
MAXHUB UC W20-Small-Format Brochure
No ratings yet
MAXHUB UC W20-Small-Format Brochure
2 pages
Cámara Tipo PTZ
No ratings yet
Cámara Tipo PTZ
1 page
Science Project
No ratings yet
Science Project
10 pages
95-8686-1.3 (Q9033-Collar)
No ratings yet
95-8686-1.3 (Q9033-Collar)
4 pages
Photochromic Glasses Logic
No ratings yet
Photochromic Glasses Logic
27 pages
(Intro To Investigation) Tutorial 3
No ratings yet
(Intro To Investigation) Tutorial 3
2 pages
Unit 1 Lo1 Brief
No ratings yet
Unit 1 Lo1 Brief
13 pages
Evaluation: Charlotte Ysabel Connolly-Hayes
No ratings yet
Evaluation: Charlotte Ysabel Connolly-Hayes
6 pages
Source 4WRD PAR PARNel Datasheet RevH
No ratings yet
Source 4WRD PAR PARNel Datasheet RevH
21 pages
PaJVeSSzSF2O0suUMZ9Y The Couple S Posing Guide - PDF - Email
No ratings yet
PaJVeSSzSF2O0suUMZ9Y The Couple S Posing Guide - PDF - Email
22 pages
Lab Session 3: Photomicrography: Pearlite
No ratings yet
Lab Session 3: Photomicrography: Pearlite
4 pages
08 - Photo Imaging and Postprocessing
No ratings yet
08 - Photo Imaging and Postprocessing
56 pages
LED Display White Paper For Architects & Designers: Whitepaper
No ratings yet
LED Display White Paper For Architects & Designers: Whitepaper
32 pages
Xperia 1 II Manual Print
No ratings yet
Xperia 1 II Manual Print
148 pages
Photography Notes
No ratings yet
Photography Notes
25 pages
DynamicPhoto PDF
No ratings yet
DynamicPhoto PDF
92 pages
Basic of Operating Microscope
No ratings yet
Basic of Operating Microscope
48 pages
Realme 13 Pro Series 5G Teaser
No ratings yet
Realme 13 Pro Series 5G Teaser
3 pages
DS-2CE11D8T-PIRL: 2 MP Outdoor Ultra-Low Light PIR Bullet Camera
No ratings yet
DS-2CE11D8T-PIRL: 2 MP Outdoor Ultra-Low Light PIR Bullet Camera
2 pages
Manual Oxberry
No ratings yet
Manual Oxberry
38 pages
Plan Plan 2 Film Recover
No ratings yet
Plan Plan 2 Film Recover
11 pages
Photo Journalism: Mass Communication - Vii
No ratings yet
Photo Journalism: Mass Communication - Vii
4 pages
Lomo Instant Automat Manual
No ratings yet
Lomo Instant Automat Manual
178 pages
Crack Detection Using Image Processing - Review
No ratings yet
Crack Detection Using Image Processing - Review
3 pages
Introduction To The History Day Documentary Example PDF
No ratings yet
Introduction To The History Day Documentary Example PDF
4 pages
Operating Manual: SO-3700-W Software For Camera Series 37xx (Software Version 2.xx)
No ratings yet
Operating Manual: SO-3700-W Software For Camera Series 37xx (Software Version 2.xx)
20 pages
Nikon F3 Technical Manual
No ratings yet
Nikon F3 Technical Manual
37 pages
Insta Story
No ratings yet
Insta Story
14 pages
SZ61/SZ51: For Biological Use
No ratings yet
SZ61/SZ51: For Biological Use
12 pages

Deep-Learning-Based Lossless Image Coding

Uploaded by

Deep-Learning-Based Lossless Image Coding

Uploaded by

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 30, NO.

7, JULY 2020 1829

Deep-Learning-Based Lossless Image Coding

(iv) stochastic optimization methods [47]; (v) the introduction

that makes use of it and predicts the residual. Therefore,

component which sets an extra eight bits to obtain a context

Algorithm 2 Context-Based Bit-Plane Coding, the SLOW

T RAINING set. Since 2.5M patches are randomly selected

• CBP has an improved performance of 4.5% over the

where N = 15 is the size of the macro-pixel; (k, ) is the

You might also like