
Dual Autoencoder-based Framework for Image Compression and Decompression
Bhargav Patel1
1 Ahmedabad University, Ahmedabad, India

ABSTRACT
Recent developments in deep learning have shown great promise in many computer vision tasks. Image compression is a prominent field of computer vision, and deep learning is gradually being used for image compression & decompression tasks. The compression rate of lossy compression algorithms is higher than that of lossless compression, but the main disadvantage of lossy compression is the loss of data during compression: a higher compression rate causes higher data loss. Recent advances in deep learning techniques for computer vision, such as image noise reduction and image super-resolution, have shown great promise in the area of image enhancement. If these techniques can be utilized to mitigate the impact of a higher compression rate on the output of lossy compression, then nearly the same image quality can be achieved. In this paper, an image compression & decompression framework is proposed. The framework is based on two convolutional neural network (CNN)-based autoencoders. Images and videos are the main sources of unstructured data, and storing and transmitting such data on a cloud server can be costly and resource-consuming. If lossy compression can compress an image before it is stored on the cloud server, it can reduce the storage cost and space. We achieve a 4x compression ratio, which means a compressed image occupies only 25% of the space of the original image. The original image is retrieved using a joint operation of deconvolution and image enhancement. The proposed framework achieves state-of-the-art performance in terms of Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) at 4x compression.
Keywords: Autoencoder, Convolutional neural network (CNN), Image compression & decompression, Deep learning

1. INTRODUCTION
Image compression is an important domain of computer vision and has gained an enormous amount of interest in
recent years. The main objective of compressing an image is to eliminate redundancy and reduce irrelevant information
so that data can be transferred at a low bit rate and stored in less space. Two main types of compression algorithms exist: lossy compression and lossless compression. As the names suggest, lossy compression incurs data loss while lossless compression does not.
Deep learning has affected almost every field. After the emergence of CNNs [1], deep learning has impacted many areas of computer vision, and image compression is one of them. Earlier, algorithms such as Principal Component Analysis were used to represent images in lower dimensions [2]. The autoencoder [3] is a promising deep learning architecture for learning a latent representation of data in a lower dimension. Autoencoders have already been used in many areas of computer vision, such as image colorization [4], image noise reduction [5], and image super-resolution [6].
Recent advances in computer vision have been driven by CNNs, and image compression is no exception. [7] proposed a framework for variable-rate image compression. This LSTM-based approach outperforms JPEG, JPEG 2000, and WebP in terms of visual quality, and a 10% reduction in storage is observed with this framework. In [8], an image compression method is proposed that is based on a nonlinear analysis transformation, a nonlinear synthesis transformation, and a uniform quantizer; the transformations consist of three convolutional linear filters with nonlinear activation functions. Both of these frameworks are deep-learning-based lossy compression algorithms.
In [9], the authors present an architecture for lossy image compression that achieves high coding efficiency by exploiting the advantages of a convolutional autoencoder (CAE). Their state-of-the-art CAE architecture replaces conventional transformations and is trained with a rate-distortion loss function. Principal component analysis (PCA) is then used to rotate the feature maps produced by the CAE, after which quantization is applied and an entropy coder generates the codes. This step-by-step process produces a more energy-compact representation. In [10], the authors propose the Super-Resolution CNN (SRCNN), which learns an end-to-end mapping between low- and high-resolution images; despite its lightweight structure, it outperforms state-of-the-art techniques for image super-resolution. [11] proposes ARCNN, a compact and efficient network for the seamless attenuation of different compression artifacts. In [12], quantization distortion is modeled as Gaussian noise, and a fields-of-experts prior is used to restore the images. In [13], similarity priors of image blocks are utilized to reduce compression artifacts. [14] proposes a very deep fully convolutional autoencoder network for image restoration, with symmetrically linked encoder and decoder layers joined by skip connections. Experimental results show that their network outperforms state-of-the-art methods on image denoising and JPEG deblocking.
In this work, we propose a two-stage CNN-autoencoder-based deep learning framework to compress an image and then decompress it with minimal loss. The first autoencoder, named compCoder, compresses an image, reducing its size by 4x, and then decompresses it to the original resolution. The second CNN autoencoder, named supeCoder, enhances the decompressed image. compCoder exploits the phenomenal capability of CNNs to extract image features. The main objective is to compress the data into a lower-dimensional representation so that the image can be transmitted at a low bit rate and occupy less space on disk. We achieve a 4x reduction in the size of an image and then decompress it with minimal loss. To decompress the encoded image, the decoder of compCoder is used. The output of this decoder is not visually good and has very low PSNR and SSIM, so supeCoder is used to enhance it. The output of supeCoder is visually convincing, with benchmark performance in terms of PSNR and SSIM for lossy compression.
The rest of the paper is organized as follows. Section 2 describes the implementation of the framework, covering the system model, dataset, autoencoder architectures, and performance metrics. Results are provided in Section 3 and the conclusion in Section 4.

2. IMPLEMENTATION
2.1 System Model

Figure 1: System model for image compression and decompression using compCoder and supeCoder

Fig. 1 shows the overall process of the framework. First, compCoder and supeCoder are trained independently on the given dataset. The encoder and decoder parts of compCoder are then separated so that they can be used independently. The encoder of compCoder compresses an image from dimension 128×128×3 to 64×64×3, a 4 times reduction in size. The compressed image can be transmitted over the network or stored on a server. To decompress it, the compressed image is first passed through the decoder of compCoder. The decoder output is visually poor and weak in terms of PSNR and SSIM, so supeCoder is used to enhance it. The output of supeCoder is visually convincing and has benchmark performance in terms of PSNR and SSIM.
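The end-to-end flow above can be sketched as plain array plumbing. The functions below are hypothetical stand-ins for the trained networks (average pooling and nearest-neighbour upsampling), used only to make the tensor shapes and the 4x reduction concrete, not the learned behaviour:

```python
import numpy as np

# Hypothetical stand-ins for the trained networks; only the shapes
# (128x128x3 input, 64x64x3 latent) follow the paper.
def comp_encoder(img):           # compCoder encoder: 128x128x3 -> 64x64x3
    return img.reshape(64, 2, 64, 2, 3).mean(axis=(1, 3))

def comp_decoder(code):          # compCoder decoder: 64x64x3 -> 128x128x3
    return code.repeat(2, axis=0).repeat(2, axis=1)

def supe_coder(img):             # supeCoder: enhances the decoded image
    return np.clip(img, 0.0, 1.0)

original = np.random.rand(128, 128, 3).astype(np.float32)
compressed = comp_encoder(original)              # stored / transmitted form
restored = supe_coder(comp_decoder(compressed))  # final output
```

The compressed array holds one quarter as many values as the original, which is the 4x reduction referred to throughout the paper.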

2.2 Dataset
We use the CLIC [15] dataset in this paper. CLIC is the dataset from the lossy image compression track of the Challenge on Learned Image Compression 2020. The images are a mix of professional and mobile photographs, and the dataset is used to train and benchmark rate-distortion performance. It contains both RGB and grayscale images; grayscale images are discarded in this paper. Important attributes of the dataset are described in Table 1.

Table 1: Dataset Information


Attribute Value
Size 7.48 GiB
Training images 1633
Validation images 102
Test images 428
Dimension (None, None, 3)
Data type 8-bit unsigned integer

The images in the dataset are pre-processed and converted to 128×128×3 NumPy arrays. The dataset is then normalized so that all pixel values are scaled from the range 0-255 to the range 0-1.
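A minimal sketch of this normalization step, assuming the images have already been resized or cropped to 128×128 (the paper does not specify the resizing method):

```python
import numpy as np

def preprocess(images_uint8):
    """uint8 RGB batch (N, 128, 128, 3) -> float32 array scaled to [0, 1]."""
    return np.asarray(images_uint8, dtype=np.float32) / 255.0

# Random stand-in for a small batch of pre-resized CLIC images.
raw = np.random.randint(0, 256, size=(4, 128, 128, 3), dtype=np.uint8)
x = preprocess(raw)
```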

2.3 Autoencoder Architecture


In this work, two CNN-based autoencoders are used for image compression and decompression. First, compCoder is trained on the given dataset. The objective of compCoder is to effectively learn a latent representation of the given images in a lower dimension. To do this, compCoder is trained in a self-supervised manner. Fig. 2a describes the architecture of the encoder part of compCoder. The number of filters and their kernel sizes have been carefully selected to minimize information loss. As the image progresses through the network, the number of filters increases to capture as many image features as possible, compensating for the information loss caused by down-scaling. Training deep neural networks is complicated by the fact that the distribution of each layer's inputs changes during training with the parameters of the previous layers; this slows convergence by requiring lower learning rates. To address this, the authors of [16] introduced batch normalization (BN). In our experiments, however, the BN layers produced a purple tint on the output image and made convergence unstable, so they were discarded. To make the deep network dynamic and effective, skip connections are used between layers. The main objective of a skip connection is to learn the residual of the data; this approach adds robustness across datasets and makes the architecture flexible. Skip connections are kept within the encoder and decoder parts separately so that the two can be detached. A similar approach is taken for the decoder of compCoder, where deconvolution layers are used to upsample the encoded image.
A detailed architecture of supeCoder is given in fig. 2b. SupeCoder has longer skip connections. The objective of having longer skip connections than compCoder's is to pass spatial features from the encoder to the decoder in order to recover lost spatial information. Longer skip connections help recover fine-grained details while stabilizing the training process and convergence [17].
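As a hedged illustration, a compCoder-style encoder could be sketched in Keras as follows. The filter counts, kernel sizes, and sigmoid output here are our own illustrative choices, not the paper's exact configuration; only the overall pattern follows the text (increasing filters, a short skip connection kept entirely inside the encoder so it can be detached, and a stride-2 step down to a 64×64×3 latent):

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_comp_encoder():
    """Illustrative compCoder-style encoder, 128x128x3 -> 64x64x3."""
    inp = layers.Input(shape=(128, 128, 3))
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    skip = x
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = layers.Add()([x, skip])              # short skip: learn the residual
    out = layers.Conv2D(3, 3, strides=2, padding="same",
                        activation="sigmoid")(x)  # 64x64x3 "compressed image"
    return keras.Model(inp, out, name="comp_encoder")

encoder = build_comp_encoder()
```

A matching decoder would mirror this structure with `Conv2DTranspose` (deconvolution) layers for the upsampling, as described above.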

2.4 Network Training


A self-supervised learning approach is used to train compCoder and supeCoder. The training procedures are implemented in Python 3 using the Keras [18] API with TensorFlow [19] as the backend. Hyperparameter details are given in Tables 3 & 4.
Figure 2: Detailed architecture. (a) compCoder; (b) supeCoder

Figure 3: Loss curves. (a) Loss of compCoder in terms of MAE; (b) loss of supeCoder in terms of MSE

Table 3: compCoder hyperparameters

Hyperparameter Value
Initial learning rate 0.003
Batch size 32
Epochs 30
Optimization algorithm Adam
Activation function ReLU
Loss function MAE
Trainable parameters 0.4M

Table 4: supeCoder hyperparameters

Hyperparameter Value
Initial learning rate 0.004
Batch size 64
Epochs 40
Optimization algorithm Adam
Activation function ReLU
Loss function MSE
Trainable parameters 1.7M

First, compCoder is trained on the CLIC dataset. After training, the model achieves a 3% test loss in terms of Mean Absolute Error (MAE). A new dataset is then created that pairs the outputs of compCoder with the original images. SupeCoder is trained to map the output of compCoder back to the original image; after training, the supeCoder model achieves a 0.008% loss in terms of Mean Squared Error (MSE).
Fig. 3a & 3b show the training and validation losses of compCoder and supeCoder respectively. In both cases the training loss and validation loss track each other closely, so we can say that both models fit the data well.
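The two-stage procedure can be sketched as follows with toy stand-in models (the real compCoder and supeCoder are the deeper CNNs of fig. 2). The learning rates, batch sizes, and loss functions follow Tables 3 & 4; the epoch counts are cut to 1 to keep the sketch fast:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def tiny_autoencoder(name):
    """Toy stand-in for compCoder/supeCoder; only the two-stage
    training procedure is faithful to the paper."""
    inp = layers.Input(shape=(8, 8, 3))
    x = layers.Conv2D(8, 3, padding="same", activation="relu")(inp)
    out = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(x)
    return keras.Model(inp, out, name=name)

x_train = np.random.rand(16, 8, 8, 3).astype("float32")

# Stage 1: self-supervised compCoder training (input == target, MAE loss).
comp_coder = tiny_autoencoder("compCoder")
comp_coder.compile(optimizer=keras.optimizers.Adam(3e-3), loss="mae")
comp_coder.fit(x_train, x_train, batch_size=32, epochs=1, verbose=0)

# Stage 2: pair compCoder reconstructions with the originals and train
# supeCoder to map degraded -> original (MSE loss).
x_degraded = comp_coder.predict(x_train, verbose=0)
supe_coder = tiny_autoencoder("supeCoder")
supe_coder.compile(optimizer=keras.optimizers.Adam(4e-3), loss="mse")
supe_coder.fit(x_degraded, x_train, batch_size=64, epochs=1, verbose=0)
```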

2.5 Peak Signal-to-Noise Ratio


The peak signal-to-noise ratio (PSNR) is defined as the ratio between a signal's peak power and the power of the corrupting noise that affects the fidelity of its representation. Because signals can have a wide dynamic range, PSNR is expressed on the logarithmic decibel scale. PSNR is defined via the mean squared error (MSE). Given a noise-free m×n monochrome image Y and its noisy approximation Ŷ, MSE is defined as:

MSE = (1/mn) Σ_{i=0}^{m−1} Σ_{j=0}^{n−1} [Y(i,j) − Ŷ(i,j)]²    (1)

The PSNR (in dB) is then defined as:

PSNR = 20·log₁₀(MAX_Y) − 10·log₁₀(MSE)    (2)


Table 5: PSNR and SSIM of the proposed models
Model Name PSNR (avg) SSIM (avg)
CompCoder 22.3 dB 0.8319
SupeCoder (final output) 32.4 dB 0.9408

Table 6: PSNR and SSIM of competing algorithms on grayscale images
Model Name PSNR (avg) SSIM (avg)
ARCNN [11] 26.97 dB 0.8308
Sun's [12] 26.72 dB 0.8104
Zhang's [13] 27.63 dB 0.8308
Ren's [20] 27.18 dB 0.8171

Here, MAX_Y is the peak pixel value of the image. Within the scope of this work, pixels are represented using 8 bits per sample, so MAX_Y = 255.
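Equations (1) and (2) translate directly into code; a minimal NumPy sketch (the `psnr` helper name is ours):

```python
import numpy as np

def psnr(y, y_hat, max_val=255.0):
    """PSNR in dB per eqs. (1)-(2): 20*log10(MAX_Y) - 10*log10(MSE)."""
    mse = np.mean((np.asarray(y, dtype=np.float64)
                   - np.asarray(y_hat, dtype=np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images (the ground-truth case)
    return 20 * np.log10(max_val) - 10 * np.log10(mse)
```

Note that an image compared against itself has MSE = 0 and hence PSNR = Inf, which is why the ground-truth column in fig. 4 reports PSNR = Inf.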

2.6 Structure Similarity Index Measure


The structural similarity index measure (SSIM) is used to predict the perceived quality of digital images and videos. Here, SSIM measures the structural similarity between a compressed image and the original image. By definition, the SSIM index is a full-reference metric: the measurement or prediction of image quality is based on an original, distortion-free image. The method differs from techniques such as MSE or PSNR in that those approaches estimate absolute errors rather than perceptual, structural differences. Structural information builds on the idea that pixels have strong inter-dependencies, especially when they are spatially close.
The SSIM index is calculated on various segments of an image. The measure between two segments x and y of size M×M is

SSIM(x, y) = [(2µ_x µ_y + c1)(2σ_xy + c2)] / [(µ_x² + µ_y² + c1)(σ_x² + σ_y² + c2)]    (3)

Here, µ_x & µ_y are the means of x and y, σ_x² & σ_y² are the variances of x and y, and σ_xy is the covariance of x and y. c1 = (r1·L)² and c2 = (r2·L)² are two variables that stabilize the division when the denominator is weak. L is the dynamic range of the pixel values. By default, r1 = 0.01 and r2 = 0.03.
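Equation (3), applied globally to a single pair of segments, can be sketched in NumPy as follows (a practical SSIM averages this over local windows across the image; the `ssim` helper name is ours):

```python
import numpy as np

def ssim(x, y, L=255.0, r1=0.01, r2=0.03):
    """Global SSIM between two equally sized segments, following eq. (3)."""
    c1, c2 = (r1 * L) ** 2, (r2 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()   # covariance of x and y
    return (((2 * mx * my + c1) * (2 * cov + c2))
            / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```

Comparing an image with itself gives SSIM = 1, matching the ground-truth column of fig. 4.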

3. RESULTS
Figs. 4(a), 4(b) & 4(c) show the reconstruction of input images after compression. First, the image is compressed into a representation with 4 times fewer values. Compressed images, with their sizes on disk, are given in fig. 5; the size of each image is reduced by a factor of 4. These encoded images can be transmitted over the network with reduced bandwidth or stored on a cloud server with reduced size. In each result, the rightmost image is the original image, the leftmost image is the output of compCoder, and the middle image is the output of supeCoder. It can be observed that, due to the high compression rate in compCoder, the colors and sharpness of the image are distorted. The short skip connections and increasing number of filters in the autoencoder recover the major information well but fail to recover fine-grained information. SupeCoder is designed to restore fine-grained details such as color depth, sharpness, and structural similarity. The long skip connections introduced in supeCoder pass spatial information from the encoder part to the decoder part, which helps in learning the global features of the input images and results in fine-grained outputs.
Fig. 4(a): Result-1. Output of compCoder: PSNR=26.3673, SSIM=0.9174; output of supeCoder: PSNR=31.8675, SSIM=0.9334; ground truth: PSNR=Inf, SSIM=1

Fig. 4(b): Result-2. Output of compCoder: PSNR=22.0169, SSIM=0.8954; output of supeCoder: PSNR=32.9175, SSIM=0.9412; ground truth: PSNR=Inf, SSIM=1

Fig. 4(c): Result-3. Output of compCoder: PSNR=25.1840, SSIM=0.9012; output of supeCoder: PSNR=32.4210, SSIM=0.9458; ground truth: PSNR=Inf, SSIM=1

Figure 4: Outputs of compCoder and supeCoder
Figure 5: Images encoded in lower dimension and their size

The significance of supeCoder can be appreciated by comparing the outputs of compCoder and supeCoder: supeCoder not only enhances the output image visually but also improves it in terms of PSNR and SSIM. Table 5 shows the PSNR and SSIM of the reconstructed images. Despite our best efforts to eliminate limitations, structural and color deformities can be observed in fig. 4(c). After examining experimental results across images with different colors and structures, we conclude that the framework works well on all types of images except those in which light yellow and white are the dominant colors.
Table 6 lists the PSNR and SSIM of major competing algorithms [21]. Those metrics are reported for grayscale images, yet even on RGB images the proposed model achieves state-of-the-art PSNR and SSIM at a 4x compression rate. The increasing number of filters, the long and short skip connections, the ReLU activation function, and the symmetric CNN autoencoder structure together yield finely tuned networks. Both networks are trained on the CLIC dataset, which is specifically designed for the image compression task.

4. CONCLUSION
Compressing an image into a lower-dimensional representation makes it possible to store the image on disk with reduced size or transmit it over the network with low bandwidth. Our approach uses two CNN-based autoencoders, compCoder & supeCoder, to perform image compression, decompression, and enhancement. Results show that the reconstructed image retains nearly all information, owing to the inclusion of long and short skip connections, and achieves state-of-the-art performance. Performance is evaluated using PSNR and SSIM and compared with the present state of the art. Both autoencoders perform their tasks effectively and achieve good results in terms of both performance metrics and visual observation. Despite the low data loss, this approach cannot be used for medical or satellite image compression, since some information is lost, however insignificant. Future work involves reducing the computational complexity of both models so that the framework can operate on streaming data.
REFERENCES
[1] Y. LeCun, P. Haffner, L. Bottou, and Y. Bengio, “Object recognition with gradient-based learning,” in Shape,
Contour and Grouping in Computer Vision , 319, Springer-Verlag, Berlin, Heidelberg (1999).
[2] Z. H. Xiao, “An image compression algorithm based on pca and jl,” Journal of Image and Graphics 1, 114–116
(2013).
[3] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning , MIT Press (2016).
http://www.deeplearningbook.org.
[4] A. Deshpande, J. Lu, M.-C. Yeh, M. Jin Chong, and D. Forsyth, “Learning diverse image colorization,” in
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , (July 2017).
[5] L. Gondara, “Medical image denoising using convolutional denoising autoencoders,” in 2016 IEEE 16th Inter-
national Conference on Data Mining Workshops (ICDMW) , 241–246 (2016).
[6] K. Zeng, J. Yu, R. Wang, C. Li, and D. Tao, “Coupled deep autoencoder for single image super-resolution,” IEEE
Transactions on Cybernetics 47(1), 27–37 (2017).
[7] G. Toderici, S. M. O’Malley, S. J. Hwang, D. Vincent, D. Minnen, S. Baluja, M. Covell, and R. Sukthankar,
“Variable rate image compression with recurrent neural networks,” (2016).
[8] J. Ballé, V. Laparra, and E. P. Simoncelli, “End-to-end optimized image compression,” (2017).
[9] Z. Cheng, H. Sun, M. Takeuchi, and J. Katto, “Deep convolutional autoencoder-based lossy image compression,”
(2018).
[10] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE
Transactions on Pattern Analysis and Machine Intelligence 38(2), 295–307 (2016).
[11] C. Dong, Y. Deng, C. C. Loy, and X. Tang, “Compression artifacts reduction by a deep convolutional network,”
(2015).
[12] D. Sun and W.-K. Cham, “Postprocessing of low bit-rate block dct coded images based on a fields of experts
prior,” IEEE Transactions on Image Processing 16(11), 2743–2751 (2007).
[13] X. Zhang, R. Xiong, X. Fan, S. Ma, and W. Gao, “Compression artifact reduction by overlapped-block transform
coefficient estimation with block similarity,” IEEE Transactions on Image Processing 22(12), 4613–4626 (2013).
[14] X.-J. Mao, C. Shen, and Y.-B. Yang, “Image restoration using convolutional auto-encoders with symmetric skip
connections,” (2016).
[15] G. Toderici, W. Shi, R. Timofte, L. Theis, J. Ballé, E. Agustsson, N. Johnston, and F. Mentzer, "Workshop and challenge on learned image
compression (CLIC 2020)," (2020).
[16] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate
shift,” (2015).
[17] N. Adaloglou, “Intuitive explanation of skip connections in deep learning,” https://theaisummer.com/ (2020).
[18] F. Chollet and others, “Keras,” (2015).
[19] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, and others, “Tensorflow: A system for large-
scale machine learning,” in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI
16) , 265–283 (2016).
[20] J. Ren, J. Liu, M. Li, W. Bai, and Z. Guo, “Image blocking artifacts reduction via patch clustering and low-rank
minimization,” in 2013 Data Compression Conference , 516–516 (2013).
[21] F. Jiang, W. Tao, S. Liu, J. Ren, X. Guo, and D. Zhao, “An end-to-end compression framework based on convo-
lutional neural networks,” IEEE Transactions on Circuits and Systems for Video Technology 28(10), 3007–3018
(2018).
