2023 IEEE 39th International Conference on Data Engineering (ICDE)

Video Super-Resolution Reconstruction Based on Deep Learning and Spatio-Temporal Feature Self-similarity (Extended Abstract)
2023 IEEE 39th International Conference on Data Engineering (ICDE) | 979-8-3503-2227-9/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICDE55515.2023.00365

Meiyu Liang, Junping Du, Linghui Li, Zhe Xue, Xiaoxiao Wang, Feifei Kou, and Xu Wang
Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia, School of Computer Science
Beijing University of Posts and Telecommunications
Beijing, China
[email protected]; [email protected]; [email protected]; [email protected]

I. INTRODUCTION AND MOTIVATION

Video super-resolution (SR) reconstruction technology aims at obtaining high quality reconstruction of high-resolution (HR) video sequences by inferring the lost detailed information from their low-resolution (LR) counterparts. However, this technology is an ill-posed problem because significant detailed information is lost in the process of video degrading. The existing learning-based SR reconstruction methods can be adapted to a larger super-resolution factor, but it cannot be guaranteed that any low-resolution image block can find its corresponding high-resolution block matching in a limited-scale training set. Some noise and over smooth phenomenon usually exist while dealing with some unique features that rarely appear in a given training data set. The self-similarity based SR methods do not rely on accurate sub-pixel motion estimation and thus can be adapted to complex motion patterns. However, under conditions of insufficient internal similar blocks, some visual flaws are usually produced due to the mismatched internal instances.

To address the above problems, we utilize the complementary and comprehensive advantages of both and construct a novel video SR mechanism based on deep learning and spatio-temporal feature similarity with joint internal and external constraints learning (DLSS-VSR). This paper not only considers the deep LR-HR correlations, but also considers the non-local similarity of video in the spatio-temporal domain. Thus it can avoid the occurrence of jitter between video frames and can maintain the spatio-temporal consistency well. For some smooth regions and irregular structure information, which rarely appear in the video sequence, the external constraints by deep learning can play a greater role, and for some unique and singular features that rarely appear in the external training set and repeat within the video sequence, internal similarity constraints can play a greater role. In summary, the novelty and contributions of this paper are as follows:
(1) By combining the external deep correlation mapping and internal spatio-temporal self-similarity prior constraints, we propose a video SR reconstruction mechanism with joint external and internal constraints.
(2) A deep learning model based on deep convolution neural network is constructed to learn the nonlinear correlation mapping between HR and LR video frame blocks.
(3) We propose a novel spatio-temporal feature similarity calculation strategy, which considers the internal spatio-temporal self-similarity of the video and the external nondegrading nonlocal similarity. For the internal spatio-temporal feature self-similarity matching, a robust similarity calculation strategy is proposed by combining the spatio-temporal moment feature similarity and the structural similarity. The external nonlocal similarity prior constraint is learned by the patch group-based Gaussian mixture model (PG-GMM).
(4) The time efficiency for spatio-temporal similarity matching is improved based on saliency detection and region correlation judgment strategy, which can achieve a better tradeoff between SR accuracy and speed.

This work was supported by the National Natural Science Foundation of China (No. 61877006, No. 62192784, No. 61532006, No. 61772083, No. 61802028), and CAAI-Huawei MindSpore Open Fund. (Corresponding author: Junping Du)

II. THE PROPOSED DLSS-VSR ALGORITHM

The framework of the proposed DLSS-VSR algorithm is shown in Fig. 1. It consists of four main processes: correlation mapping learning based on deep convolution neural network, internal spatio-temporal nonlocal self-similarity matching, external nonlocal similarity matching, and nonlocal similarity weighted fusion.

[Fig. 1. The proposed DLSS-VSR algorithm architecture. Blocks shown: correlation mapping learning based on deep convolution neural network (patch extraction and sparse representation layer, nonlinear feature mapping layer, reconstruction layer); external nonlocal self-similarity prior learning (PG-GMM model); internal spatio-temporal nonlocal self-similarity matching (self-adaptive region correlation judgment; feature and structural similarity matching in the 3D spatio-temporal domain over a sliding time window of frames Yref-1, Yref, Yref+1); external nonlocal similarity matching (Gaussian component selection, dictionary selection, weighted sparse coding); nonlocal similarity weighted fusion; objective high-resolution estimation.]

Correlation mapping learning based on deep convolution neural network: a deep learning model based on deep convolution neural network is constructed to learn the nonlinear correlation mapping between HR and LR video frame blocks. For efficiency, it mainly includes three core layers in the network structure: Patch extraction and sparse representation layer, nonlinear feature mapping layer and reconstruction layer.
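
The three core layers named above (patch extraction and sparse representation, nonlinear feature mapping, reconstruction) can be illustrated with a minimal NumPy forward pass. This is only a sketch under stated assumptions, not the authors' network: the kernel sizes (9, 1, 5), the channel counts (64, 32), the ReLU activations, the edge padding, the random weights and the helper names `conv2d_same`, `init_params` and `three_layer_forward` are all hypothetical, and the input is assumed to be an LR frame already interpolated to the target resolution.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_same(x, weights, bias):
    """Cross-correlate x (C, H, W) with weights (O, C, k, k); returns (O, H, W)."""
    k = weights.shape[-1]
    pad = k // 2
    # Edge padding keeps the spatial size unchanged for odd k (assumption).
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    windows = sliding_window_view(xp, (k, k), axis=(1, 2))  # (C, H, W, k, k)
    return np.einsum("chwij,ocij->ohw", windows, weights) + bias[:, None, None]


def relu(x):
    return np.maximum(x, 0.0)


def init_params(rng, sizes=((9, 1, 64), (1, 64, 32), (5, 32, 1))):
    """Random (untrained) weights; each entry is (kernel, in_channels, out_channels)."""
    params = []
    for k, c_in, c_out in sizes:
        w = rng.normal(0.0, 1e-2, size=(c_out, c_in, k, k))
        params.append((w, np.zeros(c_out)))
    return params


def three_layer_forward(frame, params):
    """Map one pre-upscaled LR frame (H, W) to an HR estimate (H, W)."""
    (w1, b1), (w2, b2), (w3, b3) = params
    x = frame[None, :, :]                    # add a channel axis
    f1 = relu(conv2d_same(x, w1, b1))        # patch extraction and representation
    f2 = relu(conv2d_same(f1, w2, b2))       # nonlinear feature mapping
    return conv2d_same(f2, w3, b3)[0]        # reconstruction


rng = np.random.default_rng(0)
frame = rng.random((16, 16))                 # stand-in for an upscaled LR frame
out = three_layer_forward(frame, init_params(rng))
print(out.shape)
```

With trained weights in place of the random ones, the same forward pass would produce the external (learned) HR estimate that the later fusion step combines with the similarity-based priors.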

TABLE I. THE AVERAGE PSNR (dB) OF SR RECONSTRUCTION EFFECTS OF DIFFERENT ALGORITHMS (UPSCALING FACTOR = 4)
Video Sequence   ScSR      ANRSR     DPSR      CNN-SR    NL-SR     ZM-SR     CSCN      ESPCN     VDSR      DEEP VSR  DBPN      RBPN      DLSS-VSR
Satellite-1      30.88     31.63     31.15     30.97     31.35     31.38     30.08     30.16     31.15     31.20     31.45     31.51     31.75
Satellite-2      25.43     24.89     25.66     25.31     25.35     25.68     24.82     25.12     26.31     26.38     26.58     27.02     27.16
Satellite-3      25.19     23.14     25.54     24.94     26.61     26.89     24.86     24.99     26.77     26.66     26.05     26.32     27.60
Forman           26.76     22.14     27.42     26.62     26.89     26.03     27.56     27.42     29.35     28.62     28.84     28.21     29.46
Calendar         20.19     20.43     20.35     20.13     20.30     20.85     19.71     19.80     21.30     21.32     21.32     21.41     21.61
Coastguard       24.52     22.33     24.72     24.50     24.59     25.04     23.87     23.91     24.74     24.60     25.28     25.31     25.92
Suzie            27.86     20.83     28.18     27.61     28.41     29.47     27.65     27.70     29.29     28.82     28.4      28.98     29.55
Mother Daughter  20.06     18.34     20.38     19.94     20.31     20.48     19.96     20.03     20.44     20.75     20.21     21.19     20.53
Miss America     26.02     24.41     26.39     25.87     26.97     27.02     25.97     25.96     27.34     27.05     27.16     27.5      27.56
Ice              19.49     16.79     19.85     19.35     19.93     20.27     19.43     19.44     19.99     20.36     20.29     20.41     20.57
Football         23.80     22.12     24.03     23.63     24.89     25.41     23.15     23.72     24.78     25.28     24.75     25.44     26.61
Carphone         24.20     18.78     24.94     23.88     25.49     25.83     24.23     23.97     26.15     26.19     26.07     26.16     26.30
Akiyo            27.15     23.51     27.49     26.81     27.15     27.43     26.87     26.90     27.51     27.65     27.19     27.74     27.99
Average          24.73     22.26     25.08     24.58     25.25     25.52     24.47     24.55     25.78     25.76     25.68     26.00     26.28
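
For readers who want to reproduce the kind of figures reported in Table I, the sketch below shows the standard definitions of RMSE and PSNR in NumPy. It rests on assumptions the source does not state: 8-bit frames with a peak value of 255, and a sequence-level score obtained by averaging per-frame PSNR. The function names are hypothetical.

```python
import numpy as np


def rmse(reference, estimate):
    """Root-mean-square error between two frames of equal shape."""
    diff = np.asarray(reference, dtype=np.float64) - np.asarray(estimate, dtype=np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB; peak=255 assumes 8-bit frames."""
    err = rmse(reference, estimate)
    if err == 0.0:
        return float("inf")  # identical frames
    return float(20.0 * np.log10(peak / err))


def average_psnr(ref_frames, est_frames, peak=255.0):
    """Mean of per-frame PSNR over a sequence (averaging scheme is an assumption)."""
    return float(np.mean([psnr(r, e, peak) for r, e in zip(ref_frames, est_frames)]))


# Toy check: a constant error of 5 grey levels gives RMSE = 5.
ref = np.zeros((8, 8))
est = np.full((8, 8), 5.0)
print(rmse(ref, est), psnr(ref, est))
```

Because PSNR is a logarithmic function of RMSE, a gain of a few tenths of a dB between two columns of the table corresponds to a reduction of only a few percent in pixel-level error.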

Internal spatio-temporal nonlocal self-similarity matching: a novel spatio-temporal similarity measure strategy based on the moment feature similarity and structural similarity is proposed for internal similarity matching. Furthermore, a self-adaptive region correlation judgment strategy is proposed to improve efficiency.
External nonlocal similarity matching: a patch group based Gaussian mixture model (PG-GMM) is constructed to learn the external nonlocal self-similarity. The best matching Gaussian component and dictionary are selected for sparse reconstruction.
Nonlocal similarity weighted fusion: the objective high-resolution estimation is obtained by the weighted fusion of both internal and external nonlocal similarities.

III. PROBLEM DEFINITION

Given an input LR video sequence $Y = \{y_t[i, j]\}_{t=1}^{T}$ and a set of LR and HR training pairs, the objective is to infer the corresponding HR video sequence $X = \{x_t[i, j]\}_{t=1}^{T}$, where $T$ denotes the number of video frames. The mathematical model of the proposed DLSS-VSR algorithm is formulated to minimize the following objective energy function:

$$\hat{X}^{*} = \arg\min_{X} \left\{ \lambda_1 E_{SR}^{DLCM}(X, Y) + \lambda_2 E_{SR}^{STNS}(X, Y) + \lambda_3 E_{SR}^{PGNS}(X, Y) \right\} \qquad (1)$$

where $\hat{X}^{*}$ denotes the HR estimate of the LR video sequence. $E_{SR}^{DLCM}(X, Y)$ denotes the external LR-HR correlation mapping prior constraint element. $E_{SR}^{STNS}(X, Y)$ denotes the internal spatio-temporal nonlocal similarity prior constraint element, which aims to improve SR performance by nonlocal similarity matching and fusion of the single-frame spatial domain (single-scale) and the spatio-temporal domain (multi-scale) between adjacent video frames. $E_{SR}^{PGNS}(X, Y)$ denotes the external patch group based nonlocal similarity prior constraint element, which is used to optimize SR performance by nonlocal similarity from external clear video frames. $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the balancing parameters of $E_{SR}^{DLCM}(X, Y)$, $E_{SR}^{STNS}(X, Y)$ and $E_{SR}^{PGNS}(X, Y)$, respectively; each is chosen from the interval [0, 1], and $\lambda_1 + \lambda_2 + \lambda_3 = 1$.

IV. EXPERIMENTAL RESULTS AND ANALYSIS

To demonstrate the effectiveness of the proposed DLSS-VSR algorithm, the benchmark and spatial video sequences datasets are used, such as 'Forman', 'Calendar', 'Coastguard', 'Suzie', 'Mother Daughter', 'Miss America', 'Ice', 'Football', 'Carphone', 'Akiyo', 'Satellite-1', 'Satellite-2' and 'Satellite-3'. We compare the proposed DLSS-VSR with thirteen state-of-the-art super-resolution methods and six DLSS-VSR variants, including the learning based video and image super-resolution methods DEEP VSR, RCAN, ESPCN, VDSR, CNN-SR, CSCN, ScSR, ANRSR, DBPN, RBPN, DPSR, and the spatio-temporal similarity based super-resolution methods NL-SR and ZM-SR. The SR effects are validated in terms of subjective visual evaluation and four objective quantitative indices: peak signal-to-noise ratio (PSNR), multi-scale structural similarity based on visual perception (MS-SSIM), root-mean-square error (RMSE), and information fidelity criterion (IFC).

Three experiments are conducted to evaluate the performance of the proposed DLSS-VSR algorithm. In Experiment 1, we compare it with thirteen state-of-the-art comparison algorithms in terms of subjective visual evaluation and objective quantitative indices. The results of overall visual effects and magnified local textures show that, compared with other algorithms, the proposed DLSS-VSR algorithm achieves better SR performance with more prominent edge contours and clearer details. The objective indices of SR for the different algorithms with upscaling factors of 3 and 4, shown in Table I, indicate that the proposed DLSS-VSR achieves better performance than other algorithms. This is owing to the combination of external LR-HR correlation mapping learning and spatio-temporal similarity of video, which makes full use of their complementary advantages. In terms of time efficiency, the proposed DLSS-VSR achieves a better compromise between accuracy and time performance. The temporal profiles of the different SR algorithms, obtained by extracting the same horizontal row of pixels from a number of frames in the video and stacking them vertically into a new image, show that the proposed DLSS-VSR performs best, producing the most consistent results and sharper details.

In Experiment 2, we verify the effects of different internal and external spatio-temporal nonlocal similarity constraints on the video SR reconstruction. The PSNR, MS-SSIM, IFC and RMSE index comparison curves of the SR effects under different constraints demonstrate that, compared with DLSR NLM, DLSR STNS, DLSR PGNS and DLSR PGNS NLM, the proposed DLSR STNS PGNS NLM is based on internal spatio-temporal nonlocal similarity and joint internal-external single frame nonlocal similarity constraints, allowing it to achieve better quantitative evaluation index values than the algorithms based on simple similarity constraints.

In Experiment 3, we verify the impact of the proposed strategies of region correlation judgment and visual saliency on the performance of video SR. The results show that DLSS-VSR achieves better overall performance with higher efficiency, and that the addition of the region correlation judgment strategy and the visual saliency detection strategy can improve the time efficiency of video SR significantly.
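
The temporal profile used in Experiment 1 is built by taking the same horizontal row of pixels from a number of frames and stacking those rows vertically into a new image. A minimal NumPy sketch of that construction follows; the array layout (frames, height, width) for a grayscale sequence and the function name `temporal_profile` are assumptions made for illustration.

```python
import numpy as np


def temporal_profile(frames, row):
    """Stack one pixel row from every frame into a (num_frames, width) image.

    frames: array-like of shape (T, H, W), a grayscale sequence (assumed layout).
    row:    index of the horizontal row to extract from each frame.
    """
    frames = np.asarray(frames)
    if frames.ndim != 3:
        raise ValueError("expected frames with shape (T, H, W)")
    # Row `row` of frame t becomes row t of the profile image.
    return frames[:, row, :].copy()


# Toy sequence: 2 frames of size 3 x 4 filled with consecutive integers.
frames = np.arange(2 * 3 * 4).reshape(2, 3, 4)
profile = temporal_profile(frames, row=1)
print(profile)
```

In such an image, temporally consistent reconstructions appear as smooth vertical structures, while frame-to-frame jitter shows up as jagged or broken edges.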