Dual Autoencoder-Based Framework For Image Compression and Decompression
Bhargav Patel1
1 Ahmedabad University, Ahmedabad, India
ABSTRACT
Recent developments in deep learning have shown great promise in many computer vision tasks. Image com-
pression is a core problem of computer vision, and deep learning is gradually being adopted for image compression and
decompression tasks. Lossy compression algorithms achieve higher compression rates than lossless ones, but their main
disadvantage is the loss of data during compression, and a higher compression rate causes higher data loss. Recent deep
learning techniques for computer vision, such as image noise reduction and image super-resolution, have shown great
promise in the area of image enhancement. If these techniques can be utilized to mitigate the impact of a high compression
rate on the output of lossy compression, then nearly the same image quality can be achieved. In this paper, an image
compression and decompression framework is proposed, based on two convolutional neural network (CNN) autoencoders.
Images and videos are the main sources of unstructured data, and storing and transmitting such data through cloud servers
can be costly and resource-consuming. If a lossy algorithm can compress an image before it is stored on a cloud server,
it can reduce storage cost and space. We achieve a 4x compression ratio, which means a compressed image occupies only
25% of the space of the original image. The original image is retrieved using a joint operation of deconvolution and image
enhancement. The proposed framework achieves state-of-the-art performance in terms of Peak Signal-to-Noise Ratio
(PSNR) and Structural Similarity Index Measure (SSIM) at 4x compression.
Keywords: Auto-Encoder, Convolutional neural network (CNN), Image compression & decompression, Deep learn-
ing
1. INTRODUCTION
Image compression is an important domain of computer vision and has gained an enormous amount of interest in
recent years. The main objective of compressing an image is to eliminate redundancy and reduce irrelevant information
so that data can be transmitted at a low bit rate and stored while occupying less space. Two main types of compression
algorithms exist: lossy compression and lossless compression. As the names suggest, lossy compression incurs data loss
while lossless compression does not.
Deep learning has affected almost every field. After the emergence of CNNs [1], deep learning has impacted many
areas of computer vision, and image compression is one of them. Earlier, algorithms such as Principal Component
Analysis were used to represent images in lower dimensions [2]. The autoencoder [3] is a promising deep learning archi-
tecture for learning latent representations of data in a lower dimension. Autoencoders have already been used in many
areas of computer vision, such as image colorization [4], image noise reduction [5], and image super-resolution [6].
Much of the recent advancement in computer vision has been driven by CNNs, and image compression is no
exception. [7] proposed a framework for variable-rate image compression. This LSTM-based approach outperforms
JPEG, JPEG 2000, and WebP in terms of visual quality, and a 10% reduction in storage is observed with this framework.
In [8], an image compression method is proposed that is based on a nonlinear analysis transformation, a nonlinear
synthesis transformation, and a uniform quantizer. The transformations consist of three convolutional linear filters with
nonlinear activation functions. Both of the above frameworks are deep learning-based lossy compression algorithms.
In [9], the authors present an architecture for lossy image compression that achieves high coding efficiency by
exploiting a convolutional autoencoder (CAE). Their state-of-the-art CAE architecture replaces the conventional
transformations and is trained with a rate-distortion loss function. Principal component analysis (PCA) is then used
to rotate the feature maps produced by the CAE, after which quantization is applied and an entropy coder generates
the codes. This step-by-step process yields a more energy-compact representation. In [10], the authors propose the
Super-Resolution CNN (SRCNN), which learns an end-to-end mapping between low- and high-resolution images;
with a lightweight structure, it outperforms state-of-the-art techniques for image super-resolution. [11] proposes
ARCNN, a compact and efficient network for reducing different compression artifacts. In [12], quantization distortion
is modeled as Gaussian noise, and a fields-of-experts prior is used to restore the images. In [13], similarity priors of
image blocks are utilized to reduce compression artifacts. [14] proposes a very deep fully convolutional autoencoder
network for image restoration, with symmetrically linked encoder and decoder layers connected by skip connections.
Experimental results show that their network achieves better performance than state-of-the-art methods on image
denoising and JPEG deblocking.
In this work, we propose a two-fold CNN autoencoder-based deep learning framework to compress an image
and then decompress it with minimum loss. The first autoencoder, named compCoder, compresses an image, reducing
its size by 4x, and then decompresses it to the original resolution. The second CNN autoencoder, named supeCoder,
enhances the decompressed image. compCoder exploits the phenomenal capability of CNNs to extract image features.
The main objective is to represent the data in a lower dimension so that the image can be transmitted at a low bit rate
and occupies less space on disk. We achieve a 4x reduction in image size and then decompress with minimum loss.
To decompress the encoded image, it is first passed through the decoder of compCoder. The decoder output is not
visually good and has very low PSNR and SSIM, so supeCoder is applied to enhance it. The output of supeCoder is
visually convincing, with benchmark-level PSNR and SSIM for a lossy compression scheme.
The rest of the paper is organized as follows. Section II describes the implementation of the framework, covering
the system model, dataset, autoencoder architectures, and performance metrics. Results are provided in Section III
and the conclusion in Section IV.
2. IMPLEMENTATION
2.1 System Model
Figure 1: System model for image compression and decompression using compCoder and supeCoder
Fig. 1 shows the overall process of the framework. First, compCoder and supeCoder are trained independently
on the given dataset. Then the encoder and decoder parts of compCoder are separated so that each can be used
independently. The encoder of compCoder compresses an image from 128×128×3 to 64×64×3, a 4x reduction in size.
The compressed image can then be transmitted over the network or stored on a server. To decompress it, the
compressed image is first passed through the decoder of compCoder. The decoder output is neither visually good nor
strong in terms of PSNR and SSIM, so supeCoder is used to enhance it. The output of supeCoder is visually convincing
and reaches benchmark-level PSNR and SSIM.
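For concreteness, the sketch below shows one way the compCoder bottleneck described above could be realized in Keras (the framework used in this work [18, 19]). Only the 128×128×3 input, the 64×64×3 code, and the split into encoder and decoder sub-models come from the paper; the filter counts, depth, and activations are illustrative assumptions.

    # Minimal Keras sketch of a compCoder-style autoencoder. Only the
    # 128x128x3 -> 64x64x3 bottleneck follows the paper; filter counts,
    # depth, and activations are assumptions.
    import tensorflow as tf
    from tensorflow.keras import layers, Model

    def build_compcoder():
        inp = layers.Input(shape=(128, 128, 3))
        # Encoder: halve the spatial resolution, then squeeze to 3 channels,
        # giving the 64x64x3 code (a 4x reduction in values to store).
        x = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
        x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
        code = layers.Conv2D(3, 3, padding="same", activation="sigmoid", name="code")(x)
        # Decoder: upsample the code back to the 128x128x3 resolution.
        y = layers.Conv2D(64, 3, padding="same", activation="relu")(code)
        y = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(y)
        out = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(y)
        autoencoder = Model(inp, out, name="compCoder")
        encoder = Model(inp, code, name="compCoder_encoder")
        return autoencoder, encoder

    comp_coder, comp_encoder = build_compcoder()

Splitting compCoder into sub-models mirrors the deployment in Fig. 1: the encoder runs before storage or transmission, while the decoder (followed by supeCoder) runs at retrieval time.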
2.2 Dataset
We use the CLIC [15] dataset in this paper. CLIC is the dataset of the lossy image compression track of the
Challenge on Learned Image Compression 2020 and contains a mix of professional and mobile photographs. It is
used to train models and benchmark rate-distortion performance. The dataset consists of both RGB and grayscale
images; grayscale images are discarded for this paper. Important attributes of the dataset are described in Table 1.
The images in the dataset are pre-processed and converted to 128×128×3 NumPy arrays. After that, the dataset
is normalized: all pixel values are rescaled from the range [0, 255] to [0, 1].
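A sketch of this preprocessing step is given below. The 128×128×3 shape, the discarding of grayscale images, and the [0, 1] normalization follow the description above; the directory layout and file extension are assumptions.

    # Load the RGB images, resize them to 128x128x3, and rescale pixel
    # values from [0, 255] to [0, 1]. Paths and extensions are assumptions.
    import numpy as np
    from pathlib import Path
    from PIL import Image

    def load_images(root):
        images = []
        for path in sorted(Path(root).glob("*.png")):
            img = Image.open(path)
            if img.mode != "RGB":   # discard grayscale images, as in the paper
                continue
            img = img.resize((128, 128))
            images.append(np.asarray(img, dtype=np.float32) / 255.0)
        return np.stack(images)

    x_train = load_images("clic2020/train")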
First, we train compCoder on the CLIC dataset. After training, the model achieves a 3% test loss in terms of
Mean Absolute Error (MAE). A new dataset is then created containing the outputs of compCoder paired with the
original images, and supeCoder is trained to map the output of compCoder to the original image. After training,
the supeCoder model achieves a loss of 0.008% in terms of Mean Squared Error (MSE).
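This two-stage procedure can be sketched as follows. Only the MAE and MSE losses and the use of compCoder's reconstructions as supeCoder's training input come from the paper; the optimizer, epoch counts, batch size, and validation split are assumptions, and supe_coder refers to the enhancement network (a possible structure is sketched in Section 3).

    # Stage 1: fit compCoder on the CLIC images with an MAE loss.
    comp_coder.compile(optimizer="adam", loss="mae")
    comp_coder.fit(x_train, x_train, epochs=50, batch_size=32, validation_split=0.1)

    # Stage 2: build the (compCoder output, original) pairs and fit supeCoder
    # with an MSE loss to map the reconstructions back to the originals.
    x_recon = comp_coder.predict(x_train, batch_size=32)
    supe_coder.compile(optimizer="adam", loss="mse")
    supe_coder.fit(x_recon, x_train, epochs=50, batch_size=32, validation_split=0.1)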
Figs. 3a and 3b show the training and validation losses of compCoder and supeCoder, respectively. For both
models the training and validation losses go hand in hand, so in both cases we can say the model fits the data well.
The MSE between an m×n original image Y and its reconstruction Ŷ is defined as

MSE = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ Y(i,j) - \hat{Y}(i,j) \right]^2    (1)
Table 6: Average PSNR and SSIM of competing algorithms on grayscale images

Model Name    PSNR (avg)   SSIM (avg)
ARCNN [11]    26.97 dB     0.8308
Sun's [12]    26.72 dB     0.8104
Zhang's [13]  27.63 dB     0.8308
Ren's [20]    27.18 dB     0.8171
The PSNR is computed from the MSE as

PSNR = 10 \log_{10}\left(\frac{MAX_Y^2}{MSE}\right)    (2)

Here, MAX_Y is the peak pixel value of the image. Considering our scope of work, pixels are represented using 8 bits
per sample, which means MAX_Y = 255.
The SSIM between two image windows x and y is defined as

SSIM(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}    (3)

Here, \mu_x and \mu_y are the means of x and y, \sigma_x^2 and \sigma_y^2 their variances, and \sigma_{xy} their
covariance. c_1 = (r_1 L)^2 and c_2 = (r_2 L)^2 are two variables that stabilize the division when the denominator is
weak, L is the dynamic range of the pixel values, and by default r_1 = 0.01 and r_2 = 0.03.
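These three metrics can be computed, for example, with scikit-image's reference implementations (our choice for illustration, not necessarily the paper's tooling; channel_axis requires scikit-image 0.19 or later):

    # MSE (Eq. 1), PSNR (Eq. 2), and SSIM (Eq. 3) on 8-bit RGB images.
    import numpy as np
    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    def evaluate(original, reconstructed):
        original = original.astype(np.float64)
        reconstructed = reconstructed.astype(np.float64)
        mse = np.mean((original - reconstructed) ** 2)                  # Eq. (1)
        psnr = peak_signal_noise_ratio(original, reconstructed,
                                       data_range=255)                 # Eq. (2)
        ssim = structural_similarity(original, reconstructed,
                                     data_range=255, channel_axis=-1)  # Eq. (3)
        return mse, psnr, ssim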
3. RESULTS
Figs. 4(a), 4(b), and 4(c) show the reconstruction of input images after compression. First, we compress each
image to a representation with 4 times fewer values. Compressed images with their sizes on disk are given in Fig. 5,
where we can observe that the size of each image is reduced by 4 times. These encoded images can be transmitted over
the network at reduced bandwidth or stored on a cloud server at reduced size. In each result, the rightmost image
is the original, the leftmost image is the output of compCoder, and the middle image is the output of supeCoder.
It can be observed that, due to the high compression rate in compCoder, the colors and sharpness of the image are
distorted. Short skip connections and an increasing number of filters in the autoencoder perform well in recovering
the major information but fail to recover fine-grained information. supeCoder is designed to restore fine-grained
details such as color depth, sharpness, and structural similarity. Long skip connections are introduced in supeCoder
to pass spatial information from the encoder part to the decoder part; this helps in learning the global features of
the input images, which results in fine-grained outputs.
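As a minimal sketch of such a long-skip design (only the long skip connections, ReLU activations, and symmetric encoder-decoder structure follow the paper; filter counts and depth are assumptions), supeCoder could be structured as follows:

    # supeCoder-style enhancement network: a symmetric encoder-decoder with a
    # long skip connection carrying encoder feature maps to the decoder.
    import tensorflow as tf
    from tensorflow.keras import layers, Model

    def build_supecoder():
        inp = layers.Input(shape=(128, 128, 3))
        e1 = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
        e2 = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(e1)
        b = layers.Conv2D(128, 3, padding="same", activation="relu")(e2)
        d1 = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(b)
        d1 = layers.Concatenate()([d1, e1])  # long skip: spatial detail from the encoder
        d0 = layers.Conv2D(32, 3, padding="same", activation="relu")(d1)
        out = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(d0)
        return Model(inp, out, name="supeCoder")

    supe_coder = build_supecoder()

The concatenation is what lets the decoder reuse the encoder's spatial detail instead of having to reconstruct it from the bottleneck alone.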
Fig. 4(a): Result 1. compCoder output: PSNR = 26.3673, SSIM = 0.9174; supeCoder output: PSNR = 31.8675, SSIM = 0.9334; ground truth: PSNR = Inf, SSIM = 1.
Fig. 4(b): Result 2. compCoder output: PSNR = 22.0169, SSIM = 0.8954; supeCoder output: PSNR = 32.9175, SSIM = 0.9412; ground truth: PSNR = Inf, SSIM = 1.
Fig. 4(c): Result 3. compCoder output: PSNR = 25.1840, SSIM = 0.9012; supeCoder output: PSNR = 32.4210, SSIM = 0.9458; ground truth: PSNR = Inf, SSIM = 1.
Figure 4: Outputs of compCoder and supeCoder
Figure 5: Images encoded in lower dimension and their size
The significance of supeCoder can be appreciated by comparing the outputs of compCoder and supeCoder. It
can be observed that supeCoder not only enhances the output image visually but also improves its quality in terms
of PSNR and SSIM. Table 5 shows the range of PSNR and SSIM for the reconstructed images. Despite our best
efforts to eliminate the limitations, structural and color deformities can be observed in Fig. 4(c). After examining
multiple experimental results with different colors and structures, we conclude that the framework works well for all
types of images except those dominated by light yellow and white.
Table 6 lists the PSNR and SSIM of major compression algorithms [21]. These metrics are reported for
grayscale images, but it can be appreciated that, even though it operates on RGB images, the proposed model achieves
state-of-the-art PSNR and SSIM at a 4x compression rate. The use of an increasing number of filters, long and
short skip connections, the ReLU activation function, and a symmetric CNN autoencoder structure results in finely
tuned networks. Both networks are trained on the CLIC dataset, which is specifically designed for the image
compression task.
4. CONCLUSION
Compressing an image into a lower-dimensional representation makes it possible to store the image on disk at
reduced size or transmit it over the network at low bandwidth. Our approach exploits two CNN-based autoencoders,
compCoder and supeCoder, to perform image compression, decompression, and enhancement. Results show that,
thanks to the long and short skip connections, the reconstructed image retains nearly all information, and the
framework achieves state-of-the-art performance. Performance is evaluated with PSNR and SSIM and compared with
the present state of the art. Both autoencoders perform their tasks effectively, achieving good results in terms of both
performance metrics and visual observation, and provide minimum data loss. Despite the low data loss, this approach
cannot be used for medical or satellite image compression, as some information, however insignificant, is still lost.
Future work involves reducing the computational complexity of both models so that the framework can work with
streaming data.
REFERENCES
[1] Y. LeCun, P. Haffner, L. Bottou, and Y. Bengio, "Object recognition with gradient-based learning," in Shape, Contour and Grouping in Computer Vision, 319, Springer-Verlag, Berlin, Heidelberg (1999).
[2] Z. H. Xiao, "An image compression algorithm based on PCA and JL," Journal of Image and Graphics 1, 114–116 (2013).
[3] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press (2016). http://www.deeplearningbook.org.
[4] A. Deshpande, J. Lu, M.-C. Yeh, M. Jin Chong, and D. Forsyth, "Learning diverse image colorization," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (July 2017).
[5] L. Gondara, "Medical image denoising using convolutional denoising autoencoders," in 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), 241–246 (2016).
[6] K. Zeng, J. Yu, R. Wang, C. Li, and D. Tao, "Coupled deep autoencoder for single image super-resolution," IEEE Transactions on Cybernetics 47(1), 27–37 (2017).
[7] G. Toderici, S. M. O'Malley, S. J. Hwang, D. Vincent, D. Minnen, S. Baluja, M. Covell, and R. Sukthankar, "Variable rate image compression with recurrent neural networks," (2016).
[8] J. Ballé, V. Laparra, and E. P. Simoncelli, "End-to-end optimized image compression," (2017).
[9] Z. Cheng, H. Sun, M. Takeuchi, and J. Katto, "Deep convolutional autoencoder-based lossy image compression," (2018).
[10] C. Dong, C. C. Loy, K. He, and X. Tang, "Image super-resolution using deep convolutional networks," IEEE Transactions on Pattern Analysis and Machine Intelligence 38(2), 295–307 (2016).
[11] C. Dong, Y. Deng, C. C. Loy, and X. Tang, "Compression artifacts reduction by a deep convolutional network," (2015).
[12] D. Sun and W.-K. Cham, "Postprocessing of low bit-rate block DCT coded images based on a fields of experts prior," IEEE Transactions on Image Processing 16(11), 2743–2751 (2007).
[13] X. Zhang, R. Xiong, X. Fan, S. Ma, and W. Gao, "Compression artifact reduction by overlapped-block transform coefficient estimation with block similarity," IEEE Transactions on Image Processing 22(12), 4613–4626 (2013).
[14] X.-J. Mao, C. Shen, and Y.-B. Yang, "Image restoration using convolutional auto-encoders with symmetric skip connections," (2016).
[15] G. Toderici, W. Shi, R. Timofte, L. Theis, J. Ballé, E. Agustsson, N. Johnston, and F. Mentzer, "Workshop and challenge on learned image compression (CLIC 2020)," (2020).
[16] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," (2015).
[17] N. Adaloglou, "Intuitive explanation of skip connections in deep learning," https://theaisummer.com/ (2020).
[18] F. Chollet et al., "Keras," (2015).
[19] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, et al., "TensorFlow: A system for large-scale machine learning," in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 265–283 (2016).
[20] J. Ren, J. Liu, M. Li, W. Bai, and Z. Guo, "Image blocking artifacts reduction via patch clustering and low-rank minimization," in 2013 Data Compression Conference, 516–516 (2013).
[21] F. Jiang, W. Tao, S. Liu, J. Ren, X. Guo, and D. Zhao, "An end-to-end compression framework based on convolutional neural networks," IEEE Transactions on Circuits and Systems for Video Technology 28(10), 3007–3018 (2018).