
SINGLE IMAGE SUPER-RESOLUTION

Enrol. No. (s) - 9918103219, 9918103226


Name of Student (s) - Keshav Bansal, Purnima Shukla
Name of supervisor - Dr. Swati Gupta

Department of CSE/IT
Jaypee Institute of Information Technology University, Noida.

October 2021

Submitted in partial fulfillment of the Degree of Bachelor of Technology

in

Computer Science Engineering

DEPARTMENT OF COMPUTER SCIENCE ENGINEERING & INFORMATION TECHNOLOGY

JAYPEE INSTITUTE OF INFORMATION TECHNOLOGY, NOIDA


TABLE OF CONTENTS

Chapter-1 Introduction
1.1 General Introduction
1.2 Problem Statement
1.3 Significance/Novelty of the Problem
1.4 Empirical Study
1.5 Brief Description of the Solution Approach
1.6 Comparison of Existing Approaches to the Problem Framed

Chapter-2 Literature Survey
2.1 Summary of Papers Studied
2.2 Integrated Summary of the Literature Studied

Chapter-3 Requirement Analysis and Dataset Used
3.1 Hardware and Software Requirements
3.2 Datasets

Chapter-4 Modeling and Implementation Details
4.1 Data Collection
4.2 Data Augmentation

Chapter-5 Results
5.1 EDSR Results
5.2 WDSR Results
5.3 ESRGAN Results

Chapter-6 Metrics Comparison

References (IEEE Format)

DECLARATION

We hereby declare that this submission is our own work and that, to the best of our knowledge and
belief, it contains no material previously published or written by another person, nor material which
has been accepted for the award of any other degree or diploma of a university or other institute
of higher learning, except where due acknowledgment has been made in the text.

Place: Noida
Date: 5th December, 2021
Name: Keshav Bansal
Enrolment No.: 9918103219
Name: Purnima Shukla
Enrolment No.: 9918103226

CERTIFICATE

This is to certify that the work titled “SINGLE IMAGE SUPER-RESOLUTION”, submitted by
Keshav Bansal and Purnima Shukla in partial fulfillment for the award of the degree of Bachelor of
Technology of Jaypee Institute of Information Technology, Noida, has been carried out under my
supervision. This work has not been submitted partially or wholly to any other University or
Institute for the award of this or any other degree or diploma.

Signature of Supervisor
Name of Supervisor - Dr. Swati Gupta
Designation - Assistant Professor
Date - 5th December, 2021

ACKNOWLEDGEMENT

We would like to express our special thanks to our teacher Dr. Swati Gupta, who gave us the
golden opportunity to work on this wonderful project on the topic of Single Image Super-Resolution.
It also helped us do a great deal of research, through which we came to know about many new
things, and we are really thankful to her.

Secondly, we would like to thank Larry Page and Sergey Brin for making the Google search
engine, without which we would not have been able to complete our project.

We would also like to thank Jeff Atwood for making Stack Overflow, which helped us find and
debug errors with ease.

Finally, we want to thank all the open-source developers and contributors.

SUMMARY

Super-resolution is the process of creating high-resolution images from low-resolution images. This
report considers single image super-resolution (SISR), where the goal is to recover one
high-resolution image from one low-resolution image. SISR is challenging because high-frequency
image content typically cannot be recovered from the low-resolution image; without
high-frequency information, the quality of the high-resolution image is limited. Further, SISR is an
ill-posed problem because one low-resolution image can correspond to several possible
high-resolution images.

We have worked on three models:

1. Enhanced Deep Residual Networks for Single Image Super-Resolution (EDSR)


2. Wide Activation for Efficient and Accurate Image Super-Resolution (WDSR)
3. Enhanced Super-Resolution Generative Adversarial Networks (ESRGAN)

After testing and analysis, we found ESRGAN to be the best of the three when compared on the
basis of PSNR and SSIM.

LIST OF FIGURES
1. Sketch map of SRCNN
2. Upsampling layer positions
3. Global skip connection
4. Sub-pixel convolution
5. Comparison of residual blocks in the original ResNet, SRResNet, and ours
6. EDSR architecture
7. Resblock architecture
8. Basic architecture of SRResNet
9. Removal of BN layers from the residual block in SRGAN
10. RRDB block used in our deeper model; β is the residual scaling parameter
11. Train_processed_example
12. Validation_preprocessed
13. EDSR Results
14. WDSR Results
15. ESRGAN Results
16. ESRGAN Results

LIST OF TABLES
1. Metrics

LIST OF SYMBOLS & ACRONYMS

1. SISR - single image super resolution


2. LR - low resolution
3. HR - high resolution
4. SR - super resolution
5. CNN - convolutional neural networks
6. PSNR - Peak Signal to Noise Ratio
7. SSIM - Structural Similarity Index
8. API - Application Programming Interface
9. BN - Batch Normalisation
10. GAN - Generative Adversarial Networks
11. EDSR - Enhanced deep super-resolution network
12. WDSR - Wide activation super-resolution network
13. ESRGAN - Enhanced Super-Resolution Generative Adversarial Networks

1. INTRODUCTION

1.1 GENERAL INTRODUCTION

Image super-resolution refers to the task of enhancing the resolution of an image from low resolution (LR)
to high resolution (HR). It is popularly used in the following applications:
Surveillance: to detect, identify, and perform facial recognition on low-resolution images obtained from
security cameras.
Medical: capturing high-resolution MRI images can be tricky when it comes to scan time, spatial coverage,
and signal-to-noise ratio (SNR). Super resolution helps resolve this by generating high-resolution MRI from
otherwise low-resolution MRI images.
Media: super resolution can be used to reduce server costs, as media can be sent at a lower resolution and
upscaled on the fly.
Deep learning techniques have been fairly successful in solving the problem of image and video
super-resolution.
Low-resolution images can be modeled from high-resolution images using the formula below, where D is the
degradation function, Iy is the high-resolution image, Ix is the low-resolution image, and σ is the noise:

Ix = D(Iy; σ)

The degradation parameters D and σ are unknown; only the high-resolution image and the corresponding
low-resolution image are provided. The task of the neural network is to find the inverse of the degradation
function using just the HR and LR image data.
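As a concrete illustration, the snippet below models D as plain bicubic downsampling with σ = 0 (no added noise). This is a minimal sketch of our own; the helper name `degrade` is illustrative, not from any library.

```python
import tensorflow as tf

def degrade(hr, scale=4):
    # Toy degradation D: bicubic downsampling with sigma = 0 (no noise).
    # Real-world degradations may additionally involve blur and noise.
    h = tf.shape(hr)[0] // scale
    w = tf.shape(hr)[1] // scale
    return tf.image.resize(hr, (h, w), method='bicubic')
```

With such a known degradation, training pairs (Ix, Iy) can be produced from HR images alone, which is the supervised setting used later in this report.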

There are many methods used to solve this task.

- Pre-Upsampling Super Resolution


- Post-Upsampling Super Resolution
- Residual Networks
- Multi-Stage Residual Networks
- Recursive Networks
- Progressive Reconstruction Networks
- Multi-Branch Networks
- Attention-Based Networks
- Generative Models

1.2 PROBLEM STATEMENT

We have all faced the problem of distorted images at some point. Here we try to enhance the quality of
images, thereby removing the degradations caused by the imaging process, using deep learning techniques
based on training convolutional neural networks.

1.3 SIGNIFICANCE OF PROBLEM

Super-resolution is based on the idea that a combination of low-resolution (noisy) sequences of images of a
scene can be used to generate a high-resolution image or image sequence. Thus it attempts to reconstruct the
original scene image at high resolution given a set of observed images at lower resolution. The general
approach treats the low-resolution images as resulting from resampling of a high-resolution image. The
goal is then to recover the high-resolution image which, when resampled based on the input images and the
imaging model, will produce the low-resolution observed images. The accuracy of the imaging model is thus
vital for super-resolution, and incorrect modeling, say of motion, can actually degrade the image further.
The observed images could be taken from one or multiple cameras or could be frames of a video sequence.
These images need to be mapped to a common reference frame; this process is called registration. The
super-resolution procedure can then be applied to a region of interest in the aligned composite image. The
key to successful super-resolution consists of accurate alignment, i.e. registration, and the formulation of an
appropriate forward image model.

1.4 EMPIRICAL STUDY

Deep learning, especially convolutional neural networks (CNNs), has recently seen explosive popularity
due to its success in computer vision. CNNs combine local receptive fields, temporal or spatial subsampling,
and weight sharing to obtain invariance to displacement, scale, and deformation. Deep learning methods have
also been successfully applied to other fields, such as image classification and speech recognition. Dong et al.
first applied CNNs to image super-resolution in 2014. They directly learned the mapping function between LR
images and corresponding HR images by training a CNN on large amounts of data. Then, given an input
image, an HR image can be reconstructed by the learned model. This method can also be applied to color
images, and better reconstruction effects can be expected. Fig. 1 below
gives the sketch map of Dong's proposed model, the super-resolution convolutional neural network (SRCNN).

Fig 1. Sketch map of SRCNN

This SRCNN model performs three dominant operations through a three-layer structure:

1. Patch Extraction and Representation

F1(Y) = max(0, W1 * Y + B1)

where W1 and B1 represent the filters and biases respectively, and Y denotes the LR image. The Rectified
Linear Unit (ReLU) was chosen as the activation function.

2. Non-Linear Mapping

F2(Y) = max(0, W2 * F1(Y) + B2)

where W2 and B2 denote the filters and biases respectively.

3. Image Reconstruction

This operation aggregates the HR patch-wise representations to generate the final HR image through
the following equation:

F(Y) = W3 * F2(Y) + B3

where W3 and B3 refer to the filters and biases respectively in the third layer. F(Y) is the
reconstructed image. Our goal is to learn the mapping by minimizing the loss between the
reconstructed HR image and the true one. The Mean Squared Error (MSE) over the n training pairs
{(Yi, Xi)} can be used as the loss function:

L(Θ) = (1/n) Σ_{i=1}^{n} || F(Yi; Θ) − Xi ||²

where Θ = {W1, W2, W3, B1, B2, B3} and Xi is the ground-truth HR image corresponding to the LR input Yi.

The network is optimized with stochastic gradient descent with momentum. Here, the weight matrices of
the convolution kernels are updated as:

Δ_{i+1} = 0.9 · Δ_i + η · ∂L/∂W_i^l,    W_{i+1}^l = W_i^l + Δ_{i+1}

where l and i are the indices of layers and iterations, η is the learning rate, and ∂L/∂W_i^l is the derivative
of the loss with respect to the weights.
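To make the three-layer structure concrete, below is a minimal Keras sketch of SRCNN. This is our own illustration rather than Dong et al.'s original code, and the 9-5-5 kernel sizes are one of the settings explored in their paper; the input is assumed to be a bicubically pre-upsampled LR image.

```python
import tensorflow as tf
from tensorflow.keras import layers

def srcnn(channels=3):
    x_in = layers.Input(shape=(None, None, channels))
    f1 = layers.Conv2D(64, 9, padding='same', activation='relu')(x_in)  # F1(Y): patch extraction
    f2 = layers.Conv2D(32, 5, padding='same', activation='relu')(f1)    # F2(Y): non-linear mapping
    out = layers.Conv2D(channels, 5, padding='same')(f2)                # F(Y): reconstruction
    model = tf.keras.Model(x_in, out)
    model.compile(optimizer='sgd', loss='mse')  # MSE loss with SGD, as above
    return model
```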

1.5 BRIEF DESCRIPTION OF SOLUTION APPROACH

Super-resolution is the process of recovering a high-resolution (HR) image from a low-resolution (LR)
image. We will refer to a recovered HR image as a super-resolved image or SR image. Super-resolution is an
ill-posed problem since a large number of solutions exist for a single pixel in an LR image. Simple
approaches like bilinear or bicubic interpolation use only local information in an LR image to compute pixel
values in the corresponding SR image.
Supervised machine learning approaches, on the other hand, learn mapping functions from LR images to HR
images from a large number of examples. Super-resolution models are trained with LR images as input and
HR images as target. The mapping function learned by these models is the inverse of a downgrade function
that transforms HR images to LR images. Downgrade functions can be known or unknown.
Known downgrade functions, such as bicubic downsampling, are used in image processing pipelines. With a
known downgrade function, LR images can be automatically obtained from HR images. This allows the
creation of large training datasets from a vast amount of freely available HR images, which enables
self-supervised learning.
If the downgrade function is unknown, supervised model training requires existing LR and HR image pairs
to be available, which can be difficult to collect. Alternatively, unsupervised learning methods can be used
that learn to approximate the downgrade function from unpaired LR and HR images. In this project, though,
we use a known downgrade function (bicubic downsampling) and follow a supervised learning
approach.

High-level architecture

Many state-of-the-art super-resolution models learn most of the mapping function in LR space, followed by
one or more upsampling layers at the end of the network. This is called post-upsampling SR (see Fig. 2).
The upsampling layers are learnable and are trained together with the preceding convolution layers in an
end-to-end manner.

Fig 2. Upsampling layer positions (pre-upsampling SR vs. post-upsampling SR)

Earlier approaches first upsampled the LR image with a pre-defined upsampling operation and then learned
the mapping in HR space (pre-upsampling SR). A disadvantage of this approach is that more parameters per
layer are required, which leads to higher computational costs and limits the construction of deeper neural
networks.

Residual design

Super-resolution requires that most of the information contained in an LR image be preserved in the SR
image. Super-resolution models therefore mainly learn the residuals between LR and HR images. Residual
network designs are thus of high importance: identity information is conveyed via skip connections,
whereas reconstruction of high-frequency content is done on the main path of the network.

Fig 3. Global skip connection

Fig. 3. shows a global skip connection over several layers. These layers are often residual blocks as in
ResNet or specialized variants (see sections EDSR and WDSR). Local skip connections in residual blocks
make the network easier to optimize and therefore support the construction of deeper networks.

Upsampling layer

The upsampling layer used in this project is a sub-pixel convolution layer. Given an input of size H×W×C and
an upsampling factor s, the sub-pixel convolution layer first creates a representation of size H×W×s²C via a
convolution operation and then reshapes it to sH×sW×C, completing the upsampling operation. The result is
an output spatially scaled by the factor s.
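In TensorFlow the reshape step is available as tf.nn.depth_to_space, so a minimal sketch of the whole sub-pixel upsampling layer (function and argument names are our own) looks like this:

```python
import tensorflow as tf
from tensorflow.keras import layers

def subpixel_upsample(x, scale, channels=64):
    # The conv produces an H x W x (s^2 * C) tensor; depth_to_space then
    # rearranges the channels into an sH x sW x C output.
    x = layers.Conv2D(channels * scale ** 2, 3, padding='same')(x)
    return layers.Lambda(lambda t: tf.nn.depth_to_space(t, scale))(x)
```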

Fig 4. Sub-pixel convolution

An alternative is transposed convolution layers. Transposed convolutions can be learned too but have the
disadvantage that they have a smaller receptive field than sub-pixel convolutions and can therefore process
less contextual information which often results in less accurate predictions.

1.6 COMPARISON OF THE EXISTING SOLUTION TO THE PROBLEM FRAMED

Super-resolution models

EDSR

Fig 5. Comparison of residual blocks in the original ResNet, SRResNet, and ours

The EDSR architecture is based on the SRResNet architecture and consists of multiple residual blocks. The
residual block used in EDSR is shown above. The major difference from SRResNet is that the Batch
Normalization layers are removed. The authors state that BN normalizes the features and thereby restricts
the range flexibility of the network, so removing BN results in an improvement in accuracy. The BN layers
also consume memory, and removing them leads to up to a 40% reduction in memory usage during training,
making network training more efficient.
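A minimal sketch of this BN-free residual block is given below (our own code; the residual scaling factor, which the paper uses to stabilize its larger models, is included as an option):

```python
from tensorflow.keras import layers

def edsr_res_block(x_in, filters=64, scaling=0.1):
    # Conv -> ReLU -> Conv with no batch normalization, then an
    # (optionally scaled) identity skip connection.
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x_in)
    x = layers.Conv2D(filters, 3, padding='same')(x)
    if scaling:
        x = layers.Lambda(lambda t: t * scaling)(x)
    return layers.Add()([x_in, x])
```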

Fig 6. EDSR architecture

WDSR

Fig 7. Resblock architecture

There are two variations of the WDSR architecture:

1. WDSR-A
2. WDSR-B

The only difference between WDSR-A and WDSR-B is the Resblock architecture.
Summary of the WDSR research:
1. They demonstrate that, in residual networks for single image super-resolution, wider activation gives
better performance at the same parameter complexity (see the sketch after this list). Without additional
computational overhead, they propose the network WDSR-A, which has wider (2× to 4×) activation for
better performance.
2. To further improve efficiency, they also propose linear low-rank convolution as a basic building
block for the construction of their SR network WDSR-B. It enables even wider activation (6× to 9×)
without additional parameters or computation and boosts accuracy further.
3. They suggest that batch normalization is not suitable for training deep SR networks and
introduce weight normalization for faster convergence and better accuracy.
4. They train the proposed WDSR-A and WDSR-B, built on the principle of wide activation with weight
normalization, and achieve better results on the large-scale DIV2K image super-resolution benchmark.
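Below is a rough sketch of a WDSR-A wide-activation block, assuming the WeightNormalization wrapper from the tensorflow-addons package; the function name and the expansion factor are illustrative.

```python
import tensorflow_addons as tfa
from tensorflow.keras import layers

def wdsr_a_block(x_in, filters=32, expansion=4):
    # Expand the channel count before the ReLU (the "wide activation"),
    # then project back down; weight-normalized convs, no BN.
    wn = lambda layer: tfa.layers.WeightNormalization(layer, data_init=False)
    x = wn(layers.Conv2D(filters * expansion, 3, padding='same',
                         activation='relu'))(x_in)
    x = wn(layers.Conv2D(filters, 3, padding='same'))(x)
    return layers.Add()([x_in, x])
```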

ESRGAN

Our main aim is to improve the overall perceptual quality of SR outputs. In this section, we first describe the
proposed network architecture and then discuss the improvements from the discriminator and perceptual loss.
Finally, we describe the network interpolation strategy for balancing perceptual quality and PSNR.

Fig 8. Basic architecture of SRResNet

We could select or design “basic blocks” (e.g., residual block, dense block, RRDB) for better performance.

Network Architecture:

To further improve the recovered image quality of SRGAN, we make two main modifications to the
structure of the generator G: 1) remove all BN layers; 2) replace the original basic block with the proposed
Residual-in-Residual Dense Block (RRDB), which combines a multi-level residual network and dense
connections, as depicted in Fig. 10.

Fig 9. Removal of BN layers from the residual block in SRGAN

20
Fig 10. RRDB block used in our deeper model; β is the residual scaling parameter

Removing BN layers has proven to increase performance and reduce computational complexity in different
PSNR-oriented tasks, including SR and deblurring. BN layers normalize the features using the mean and
variance of a batch during training and use the estimated mean and variance of the whole training dataset
during testing. When the statistics of the training and testing datasets differ a lot, BN layers tend to introduce
unpleasant artifacts and limit the generalization ability. We empirically observe that BN layers are more
likely to bring artifacts when the network is deeper and trained under a GAN framework. These artifacts
occasionally appear among iterations and different settings, violating the need for stable performance during
training. We therefore remove BN layers for stable training and consistent performance. Furthermore,
removing BN layers helps to improve generalization ability and to reduce computational complexity and
memory usage.

We keep the high-level architecture design of SRGAN (see Fig. 8) and use a novel basic
block, namely RRDB, as depicted in Fig. 10. Based on the observation that more layers and connections can
boost performance, the proposed RRDB employs a deeper and more complex structure than the
original residual block in SRGAN. Specifically, as shown in Fig. 10, the proposed RRDB has a
residual-in-residual structure, where residual learning is used at different levels. A similar network structure
has been proposed that also applies a multi-level residual network. However, our RRDB differs in that we use
dense blocks in the main path, where the network capacity becomes higher thanks to the dense
connections. In addition to the improved architecture, we also exploit several techniques to facilitate training
a very deep network: 1) residual scaling, i.e., scaling down the residuals by multiplying by a constant between
0 and 1 before adding them to the main path to prevent instability; 2) smaller initialization, as we empirically
find the residual architecture is easier to train when the initial parameter variance is smaller.
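A simplified sketch of the RRDB structure described above is shown below. This is our own illustration with assumed filter counts; the paper uses five convolutions per dense block and β = 0.2.

```python
from tensorflow.keras import layers

def dense_block(x_in, filters=64, growth=32, beta=0.2):
    # Five 3x3 convs with dense connections and LeakyReLU; the output is
    # residual-scaled by beta before the skip addition.
    feats = [x_in]
    for _ in range(4):
        inp = feats[0] if len(feats) == 1 else layers.Concatenate()(feats)
        x = layers.LeakyReLU(0.2)(layers.Conv2D(growth, 3, padding='same')(inp))
        feats.append(x)
    x = layers.Conv2D(filters, 3, padding='same')(layers.Concatenate()(feats))
    x = layers.Lambda(lambda t: t * beta)(x)
    return layers.Add()([x_in, x])

def rrdb(x_in, beta=0.2):
    # Residual-in-Residual: three dense blocks plus an outer scaled skip.
    x = x_in
    for _ in range(3):
        x = dense_block(x)
    x = layers.Lambda(lambda t: t * beta)(x)
    return layers.Add()([x_in, x])
```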

2. LITERATURE SURVEY

2.1 SUMMARY OF PAPERS

Enhanced Deep Residual Network for Single Image Super Resolution


Link to paper - https://arxiv.org/pdf/1707.02921.pdf

In this paper, the authors propose an enhanced super-resolution algorithm. By removing unnecessary
modules from the conventional ResNet architecture, they achieve improved results while making the model
compact. They also employ residual scaling techniques to stably train large models. Their proposed
single-scale model surpasses previous models and achieves state-of-the-art performance. Furthermore, they
develop a multi-scale super-resolution network to reduce model size and training time. With scale-dependent
modules and a shared main network, their multi-scale model can effectively handle various scales of
super-resolution in a unified framework. While the multi-scale model remains compact compared with a set
of single-scale models, it shows performance comparable to the single-scale SR model. The proposed
single-scale and multi-scale models achieved the top ranks on both the standard benchmark datasets and
the DIV2K dataset.

Wide Activation for Efficient and Accurate Image Super-Resolution


Link to paper - https://arxiv.org/pdf/1808.08718.pdf

In this paper, the authors introduce two super-resolution networks, WDSR-A and WDSR-B, based on the
central idea of wide activation. They demonstrate in their experiments that, with the same parameter and
computation complexity, models with wider features before the ReLU activation have better accuracy for
single image super-resolution. They also find that training with weight normalization leads to better accuracy
for deep super-resolution networks compared to batch normalization or no normalization. The proposed
methods may help with other low-level image restoration tasks such as denoising and dehazing.

ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks
Link to paper - https://arxiv.org/pdf/1809.00219v2.pdf

In this paper, the authors present the ESRGAN model, which achieves consistently better perceptual quality
than previous SR methods. The method won first place in the PIRM-SR Challenge in terms of the perceptual
index. They formulate a novel architecture containing several RRDB blocks without BN layers. In
addition, useful techniques including residual scaling and smaller initialization are employed to facilitate the
training of the proposed deep model. They also introduce the use of a relativistic GAN as the
discriminator, which learns to judge whether one image is more realistic than another, guiding the generator
to recover more detailed textures. Moreover, they enhance the perceptual loss by using the features
before activation, which offer stronger supervision and thus restore more accurate brightness and realistic
textures.

Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network


Link to paper - https://arxiv.org/pdf/1609.04802.pdf

In this paper, the authors describe a deep residual network, SRResNet, that sets a new state of the art on public
benchmark datasets when evaluated with the widely used PSNR measure. They highlight some
limitations of this PSNR-focused image super-resolution and introduce SRGAN, which augments the
content loss function with an adversarial loss by training a GAN. Using extensive MOS testing, they
confirm that SRGAN reconstructions for large upscaling factors (4×) are, by a considerable margin, more
photo-realistic than reconstructions obtained with state-of-the-art reference methods.

2.2 INTEGRATED SUMMARY OF THE LITERATURE STUDIED

Single image super-resolution (SISR) is a notoriously challenging ill-posed problem that aims to obtain a
high resolution (HR) output from one of its low-resolution (LR) versions. Recently, powerful deep learning
algorithms have been applied to SISR and have achieved state-of-the-art performance. In this survey, we
review representative deep learning-based SISR methods and group them into two categories according to
their contributions to two essential aspects of SISR: the exploration of efficient neural network architectures
for SISR and the development of effective optimization objectives for deep SISR learning. For each category,
a baseline is first established, and several critical limitations of the baseline are summarized. Then,
representative works on overcoming these limitations are presented based on their original content, as well as
our critical exposition and analyses, and relevant comparisons are conducted from a variety of perspectives.

3. REQUIREMENT ANALYSIS AND DATASET USED

3.1 HARDWARE AND SOFTWARE REQUIREMENTS

Hardware used to train the models:


● Ryzen 7 5800H Processor
● 32 GB RAM
● 1 TB SSD storage
● Nvidia RTX 3070 GPU

Software used
● Windows 11 OS
● Anaconda with Python 3.6
● Cuda Toolkit 10.1
● Matplotlib 3.1.1
● Tensorflow 2.3.1
● Jupyter Notebook

3.2 DATASET

The DIV2K dataset. Available at: https://data.vision.ee.ethz.ch/cvl/DIV2K/

DIV2K is a large, newly collected dataset of RGB images with a large diversity of content.
The DIV2K dataset is divided into:

● train data: starting from 800 high-definition, high-resolution images, we obtain corresponding
low-resolution images and provide both high- and low-resolution images for the ×2, ×3, and ×4
downscaling factors
● validation data: 100 high-definition, high-resolution images are used for generating the corresponding
low-resolution images; the low-resolution images are provided from the beginning of the challenge and
are meant for the participants to get online feedback from the validation server; the high-resolution
images are released when the final phase of the challenge starts.
● test data: 100 diverse images are used to generate the corresponding low-resolution images; the
participants receive the low-resolution images when the final evaluation phase starts, and the results are
announced after the challenge is over and the winners are decided.

Set5 Dataset
The Set5 dataset consists of 5 images (“baby”, “bird”, “butterfly”, “head”, “woman”) and is
commonly used for testing the performance of image super-resolution models.
We used this dataset to test the performance of our models.

4. MODELING AND IMPLEMENTATION DETAILS

4.1 DATA COLLECTION

In this project, we have trained the models on the DIV2K dataset, which contains high-quality (2K resolution)
images together with corresponding downgraded images for image restoration tasks.

The DIV2K dataset offers several downgrading variants, listed below:
bicubic_x2, bicubic_x3, bicubic_x4, bicubic_x8, unknown_x2, unknown_x3, unknown_x4,
realistic_mild_x4, realistic_difficult_x4, realistic_wild_x4

We have used bicubic_x4 for model training, and we load the data using the TensorFlow Datasets (tfds) API.
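A minimal loading snippet looks as follows (standard tfds usage; the 'div2k/bicubic_x4' config name comes from the tfds catalog):

```python
import tensorflow_datasets as tfds

# Each DIV2K example is a dict with 'lr' and 'hr' image tensors.
train_ds = tfds.load('div2k/bicubic_x4', split='train', shuffle_files=True)
valid_ds = tfds.load('div2k/bicubic_x4', split='validation')

for example in train_ds.take(1):
    print(example['lr'].shape, example['hr'].shape)  # LR is 4x smaller per side
```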

4.2 DATA AUGMENTATION

4.2.1 What is Data Augmentation?

Data augmentation is a technique for obtaining different varieties of the same image, such as flipped,
rotated, or cropped versions.
It is a way of getting more varied data, especially when the training data is small: the DIV2K training split
has only 800 images, which is very little for training a deep model, so we apply data augmentation to the
training data to obtain a greater variety of images.

4.2.2 Why Augmentation ?

Augmentation is very important for getting more generalized results from the trained model: by training
on augmented data, the model can learn a greater variety of features, which leads to better generalization.

4.2.3 Applying Augmentation (Crop, Rotate, Flip)

Random Crop: We crop a random 96 × 96 patch from the high-resolution image and the corresponding
patch from the low-resolution image. If the scaling factor is 4 and we crop 96 × 96 patches from the HR
image, the corresponding patch size from the low-resolution image is 24 × 24 (96/4 = 24).

Random Flip: In this operation, we flip the LR and HR images left-right if the generated random value from
tf.random.normal is less than 0.5; otherwise we leave them unchanged.

Random Rotate: In this operation, we rotate the LR and HR images by 90 degrees a random number of
times. We generate one random number using tf.random.uniform with a max value of 4; for example, if the
generated value is 2, we rotate the image by 90 degrees twice.
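A sketch of these three operations follows (our own function names; the patch arithmetic matches the 96/24 example above):

```python
import tensorflow as tf

def random_crop(lr, hr, hr_crop_size=96, scale=4):
    # Crop a random HR patch and the spatially matching LR patch.
    lr_crop_size = hr_crop_size // scale  # 96 / 4 = 24
    lr_h = tf.random.uniform((), 0, tf.shape(lr)[0] - lr_crop_size + 1, dtype=tf.int32)
    lr_w = tf.random.uniform((), 0, tf.shape(lr)[1] - lr_crop_size + 1, dtype=tf.int32)
    hr_h, hr_w = lr_h * scale, lr_w * scale
    return (lr[lr_h:lr_h + lr_crop_size, lr_w:lr_w + lr_crop_size],
            hr[hr_h:hr_h + hr_crop_size, hr_w:hr_w + hr_crop_size])

def random_flip(lr, hr):
    # Flip as described above; note a tf.random.uniform draw would give an
    # exact 50% flip probability.
    rn = tf.random.normal(())
    return tf.cond(rn < 0.5,
                   lambda: (tf.image.flip_left_right(lr),
                            tf.image.flip_left_right(hr)),
                   lambda: (lr, hr))

def random_rotate(lr, hr):
    # Rotate both images by k * 90 degrees, with k drawn from {0, 1, 2, 3}.
    k = tf.random.uniform((), maxval=4, dtype=tf.int32)
    return tf.image.rot90(lr, k), tf.image.rot90(hr, k)
```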

4.3 VISUALISING PREPROCESSED IMAGES FROM TRAIN DATA

Fig 11. Train_processed_example

As we can see from the above images, a random part is cropped from each training image; the crop is
different every time the code block is executed.

4.4 VISUALISING PREPROCESSED IMAGES FROM VALIDATION DATA

Validation data is not augmented because we want to compute results on whole validation images.

Fig 12. Validation_preprocessed

As we can see, we have not applied any augmentation or preprocessing to the validation dataset; the only
thing we set is the batch size.

TRAINING

EDSR
The following is the complete architecture of the EDSR model:
1. Normalize the input by subtracting the DIV2K RGB mean
2. Conv2D layer with 64 filters and kernel size 3
3. ResBlocks (8 in our model)
4. Conv2D
5. Add (ResBlock output and the original input)
6. Upsampling (Conv2D → pixel shuffle)

To train the EDSR model as per the research paper, we train it for 300,000 steps and evaluate it every 1000
steps on the first 10 images of the validation set.
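As an illustration, here is a minimal training-step sketch (our own code, not the report's exact script), using the L1 loss and the halved-midway Adam learning-rate schedule described in the EDSR paper:

```python
import tensorflow as tf

loss_fn = tf.keras.losses.MeanAbsoluteError()  # L1 loss, as in the EDSR paper
lr_schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[200000], values=[1e-4, 5e-5])  # halve the LR mid-training
optimizer = tf.keras.optimizers.Adam(lr_schedule)

@tf.function
def train_step(model, lr_img, hr_img):
    # One optimization step: forward pass, L1 loss, gradient update.
    with tf.GradientTape() as tape:
        sr = model(lr_img, training=True)
        loss = loss_fn(hr_img, sr)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```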

WDSR
WDSR general architecture:
1. Normalize the input
2. Conv2D with weight normalization (main branch)
3. ResBlocks (different in WDSR-A and WDSR-B)
4. Conv2D with weight normalization (with the given scale)
5. Pixel shuffle (with the given scale)
6. Conv2D with weight normalization (skip branch)
7. Pixel shuffle (skip branch, with the given scale)
8. Add the main-branch and skip-branch outputs (add step 5 and step 7)
9. Denormalize using the DIV2K RGB mean
10. Output the model

We train the WDSR-B model for 300,000 steps and evaluate it every 1000 steps on the first 10 images of the
DIV2K validation set.

SRGAN

Generator pre-training:
1. Create a training context for the generator (SRResNet) alone.
2. Pre-train the generator for 1,000,000 steps (100,000 works fine too).
3. Save the weights of the pre-trained generator (needed for fine-tuning with the GAN).

Generator fine-tuning (GAN):
1. Create a new generator and initialize it with the pre-trained weights.
2. Create a training context for the GAN (generator + discriminator).
3. Train the GAN for 200,000 steps.
4. Save the weights of the generator and discriminator.
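The sketch below illustrates one GAN fine-tuning step. It assumes the discriminator outputs probabilities, and a plain pixel-wise MSE stands in for the VGG-based content loss that SRGAN actually uses; all names are our own.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()
mse = tf.keras.losses.MeanSquaredError()
gen_opt = tf.keras.optimizers.Adam(1e-4)
disc_opt = tf.keras.optimizers.Adam(1e-4)

@tf.function
def gan_train_step(generator, discriminator, lr_img, hr_img):
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        sr = generator(lr_img, training=True)
        hr_out = discriminator(hr_img, training=True)
        sr_out = discriminator(sr, training=True)
        # Generator: content loss + weighted adversarial loss.
        g_loss = mse(hr_img, sr) + 1e-3 * bce(tf.ones_like(sr_out), sr_out)
        # Discriminator: classify real HR as 1 and generated SR as 0.
        d_loss = (bce(tf.ones_like(hr_out), hr_out) +
                  bce(tf.zeros_like(sr_out), sr_out))
    gen_opt.apply_gradients(zip(
        g_tape.gradient(g_loss, generator.trainable_variables),
        generator.trainable_variables))
    disc_opt.apply_gradients(zip(
        d_tape.gradient(d_loss, discriminator.trainable_variables),
        discriminator.trainable_variables))
    return g_loss, d_loss
```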

5. RESULTS

EDSR RESULTS

Fig 13. EDSR Results

WDSR RESULTS

Fig 14. WDSR Results

ESRGAN RESULTS

Fig. 15 ESRGAN Results

Fig 16. ESRGAN Results

6. METRICS COMPARISON

As performance metrics, we have used PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural
Similarity Index). Both metrics are readily available in the TensorFlow API.
If an algorithm can enhance a degraded version of a known image so that it more closely resembles
the original, then PSNR gives us a quantitative way to judge how good the reconstruction is.
The idea is that the higher the PSNR, the better the degraded image has been reconstructed to
match the original image, and hence the better the reconstruction algorithm. This holds because we
wish to minimize the MSE between the images with respect to the maximum signal value of the image.
Formulation of the loss function and performance metric:

MSE = (1 / (m · n)) · Σ_{i=0}^{m−1} Σ_{j=0}^{n−1} [ f(i, j) − g(i, j) ]²

PSNR = 20 · log10( Max(f) / √MSE )

Here,
f — represents the matrix data of our original image
g — represents the matrix data of our degraded image in question
m — represents the number of rows of pixels of the images and i represents the index of that row
n — represents the number of columns of pixels of the image and j represents the index of that column
Max(f) — is the maximum signal value that exists in our original “known to be good” image
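Both metrics can be computed directly with TensorFlow's built-in functions, for example:

```python
import tensorflow as tf

def evaluate(sr, hr):
    # max_val corresponds to Max(f) above (255 for 8-bit images).
    psnr = tf.image.psnr(sr, hr, max_val=255)
    ssim = tf.image.ssim(sr, hr, max_val=255)
    return psnr, ssim
```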

Table 1. Metrics

References

[1] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, “Enhanced Deep Residual Networks for Single Image
Super-Resolution,” in CVPR 2017.

[2] J. Yu, Y. Fan, J. Yang, N. Xu, Z. Wang, X. Wang, and T. Huang, “Wide Activation for Efficient and
Accurate Image Super-Resolution,” in CVPR 2018.

[3] Y. Fan, J. Yu, and T. S. Huang, “Wide-activated Deep Residual Networks based Restoration for
BPG-compressed Images,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Workshops.

[4] Y. Fan et al., “Balanced two-stage residual networks for image super-resolution,” in Computer Vision
and Pattern Recognition Workshops (CVPRW), 2017 IEEE Conference on. IEEE, 2017, pp. 1157–1164.

[5] R. Timofte, E. Agustsson, L. Van Gool, M.-H. Yang, L. Zhang, et al., “NTIRE 2017 challenge on single
image super-resolution: Methods and results,” in CVPR 2017 Workshops.

[6] E. Agustsson and R. Timofte, “NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and
Study.”

[7] TensorFlow Datasets, DIV2K. https://www.tensorflow.org/datasets/catalog/div2k

[8] A. Horé and D. Ziou, “Image Quality Metrics: PSNR vs. SSIM,” in 2010 20th International Conference
on Pattern Recognition.

[9] https://cvnote.ddlee.cc/2019/09/22/image-super-resolution-datasets

