UMMFF: Unsupervised Multimodal Multilevel Feature Fusion Network for Hyperspectral Image Super-Resolution
Abstract
1. Introduction
- (1) UMMFF designs a gated cross-retention shared encoder to address the insufficient use of inter-modal information in multimodal image fusion. The encoder captures local features through a gated retention mechanism and establishes a cross-sharing relationship between the queries (Q) and keys (K) of the different modalities to achieve information complementarity.
- (2) UMMFF constructs a multilevel parallel fusion decoder based on spatial and channel attention to address the lack of modality-specific feature enhancement in the decoder. The decoder applies channel attention and spatial attention to the multispectral and hyperspectral images, respectively, to extract spatial–spectral features, and the resulting attention features strengthen the extraction and fusion of spatial–spectral information at three levels: low, mid, and high.
- (3) UMMFF proposes a prior-based implicit representation blind degradation estimation network to address the limited optimization freedom of prior-regularized networks. The network uses positional encoding to regularize a multilayer perceptron, increasing the degrees of freedom available to optimization, while degradation priors constrain the network and help it avoid local optima. As a result, the degradation parameters are estimated accurately, improving the reconstruction accuracy and generalizability of the unsupervised algorithm.
2. Related Work
2.1. Theoretical Foundation of the Model
2.2. Transformer
2.3. Blind Estimation Network
3. Methods
3.1. Gate Cross-Retention Shared Encoder
- 1. Gate cross-retention (GCR)
- 2. Feed-forward network (FFN)
3.2. Multilevel Spatial–Channel Attention Parallel Fusion Decoder
- 1. Low-level feature extraction
- 2. Mid-level feature extraction
- 3. High-level feature extraction
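To make the parallel spatial–channel attention fusion of Section 3.2 concrete, the following is a minimal PyTorch sketch of a single fusion level, with channel attention on the multispectral branch and spatial attention on the hyperspectral branch as stated in the contributions. The attention blocks follow the CBAM pattern [34]; all class names, channel counts, and the concatenate-then-convolve fusion rule are illustrative assumptions, not the authors' implementation.

```python
# Sketch of one level of a spatial/channel attention parallel fusion decoder.
# Assumed design: CBAM-style attention, concatenation + 3x3 conv fusion.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excite style channel attention (CBAM-like)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.mlp(x)  # reweight each channel


class SpatialAttention(nn.Module):
    """CBAM-like spatial attention from pooled channel statistics."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        pooled = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(pooled))  # reweight each spatial location


class ParallelFusionLevel(nn.Module):
    """One fusion level: channel attention on the MSI branch, spatial attention
    on the HSI branch, then concatenation and a 3x3 convolution to fuse."""
    def __init__(self, msi_ch, hsi_ch, out_ch):
        super().__init__()
        self.ca = ChannelAttention(msi_ch)
        self.sa = SpatialAttention()
        self.fuse = nn.Conv2d(msi_ch + hsi_ch, out_ch, 3, padding=1)

    def forward(self, msi_feat, hsi_feat):
        return self.fuse(torch.cat([self.ca(msi_feat), self.sa(hsi_feat)], dim=1))
```

In the full decoder, one level of this form would act on low-, mid-, and high-level features in turn, and the three outputs would be combined for the final reconstruction.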
3.3. Prior-Based Implicit Representation Blind Degradation Estimation Network and Loss Design
- 1. Positional encoding
- 2. Multilayer perceptron network
- 3. Loss function based on prior knowledge
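Section 3.3 estimates the degradation blindly through an implicit representation: positional encoding regularizes a multilayer perceptron, and priors on the degradation constrain the estimate. The sketch below illustrates this idea for the spatial blur kernel only; the Fourier positional encoding, the kernel size, the network width, and the softmax used to impose the non-negativity and sum-to-one priors are assumptions made for illustration rather than the paper's exact design.

```python
# Sketch of an implicit (coordinate-MLP) blind estimator of a spatial blur kernel.
# Assumed design: Fourier positional encoding + small MLP + softmax prior.
import torch
import torch.nn as nn


def positional_encoding(coords, num_freqs=6):
    """coords: (n, 2) in [-1, 1]; returns sin/cos Fourier features at octave frequencies."""
    freqs = 2.0 ** torch.arange(num_freqs, device=coords.device) * torch.pi
    angles = coords[:, None, :] * freqs[None, :, None]  # (n, num_freqs, 2)
    return torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(1)


class ImplicitKernelEstimator(nn.Module):
    def __init__(self, kernel_size=9, num_freqs=6, hidden=64):
        super().__init__()
        self.kernel_size = kernel_size
        self.num_freqs = num_freqs
        self.mlp = nn.Sequential(
            nn.Linear(4 * num_freqs, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
        # Fixed grid of kernel coordinates fed through the positional encoding.
        axis = torch.linspace(-1, 1, kernel_size)
        yy, xx = torch.meshgrid(axis, axis, indexing="ij")
        self.register_buffer("coords", torch.stack([yy, xx], dim=-1).reshape(-1, 2))

    def forward(self):
        logits = self.mlp(positional_encoding(self.coords, self.num_freqs))
        # Softmax imposes the degradation priors: non-negative weights summing to one.
        kernel = torch.softmax(logits.squeeze(-1), dim=0)
        return kernel.view(1, 1, self.kernel_size, self.kernel_size)
```

A spectral response estimate can be built the same way from band-index encodings, and the "loss function based on prior knowledge" item above indicates that the priors also act through the training loss coupling the estimated degradations with the observed low-resolution hyperspectral and high-resolution multispectral images.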
4. Experiments
4.1. Datasets
4.2. Model Parameter Settings and Hardware Environment
4.3. Evaluation Metrics
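The result tables in Section 5 report PSNR, RMSE, SAM, and ERGAS. For reference, the snippet below computes these indices in their common textbook form; the paper's exact normalization and data ranges may differ.

```python
# Common definitions of the reported quality indices (not necessarily the
# paper's exact implementation). Images are (H, W, bands) arrays.
import numpy as np


def psnr(ref, est, data_range=1.0):
    mse = np.mean((ref - est) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)


def rmse(ref, est):
    return float(np.sqrt(np.mean((ref - est) ** 2)))


def sam(ref, est, eps=1e-8):
    """Mean spectral angle in degrees between reference and estimated pixel spectra."""
    dot = np.sum(ref * est, axis=-1)
    denom = np.linalg.norm(ref, axis=-1) * np.linalg.norm(est, axis=-1) + eps
    angles = np.arccos(np.clip(dot / denom, -1.0, 1.0))
    return float(np.degrees(angles.mean()))


def ergas(ref, est, ratio=4, eps=1e-8):
    """Relative dimensionless global error; ratio is the spatial downsampling factor."""
    band_rmse = np.sqrt(np.mean((ref - est) ** 2, axis=(0, 1)))
    band_mean = np.mean(ref, axis=(0, 1)) + eps
    return float(100.0 / ratio * np.sqrt(np.mean((band_rmse / band_mean) ** 2)))
```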
5. Results and Discussion
5.1. Visualization and Analysis of Super-Resolution
5.2. Comparison with Advanced Methods
5.3. Ablation Experiment
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Pande, C.B.; Moharir, K.N. Application of hyperspectral remote sensing role in precision farming and sustainable agriculture under climate change: A review. In Climate Change Impacts on Natural Resources, Ecosystems and Agricultural Systems; Springer Climate; Springer: Cham, Switzerland, 2023; pp. 503–520. [Google Scholar]
- Zhang, M.; Chen, T.; Gu, X.; Chen, D.; Wang, C.; Wu, W.; Zhu, Q.; Zhao, C. Hyperspectral remote sensing for tobacco quality estimation, yield prediction, and stress detection: A review of applications and methods. Front. Plant Sci. 2023, 14, 1073346–1073360. [Google Scholar] [CrossRef] [PubMed]
- Pan, B.; Cai, S.; Zhao, M.; Cheng, H.; Yu, H.; Du, S.; Du, J.; Xie, F. Predicting the Surface Soil Texture of Cultivated Land via Hyperspectral Remote Sensing and Machine Learning: A Case Study in Jianghuai Hilly Area. Appl. Sci. 2023, 13, 9321. [Google Scholar] [CrossRef]
- Liu, L.; Miteva, T.; Delnevo, G.; Mirri, S.; Walter, P.; de Viguerie, L.; Pouyet, E. Neural networks for hyperspectral imaging of historical paintings: A practical review. Sensors 2023, 23, 2419. [Google Scholar] [CrossRef]
- Vlachou-Mogire, C.; Danskin, J.; Gilchrist, J.R.; Hallett, K. Mapping materials and dyes on historic tapestries using hyperspectral imaging. Heritage 2023, 6, 3159–3182. [Google Scholar] [CrossRef]
- Huang, S.-Y.; Mukundan, A.; Tsao, Y.-M.; Kim, Y.; Lin, F.-C.; Wang, H.-C. Recent advances in counterfeit art, document, photo, hologram, and currency detection using hyperspectral imaging. Sensors 2022, 22, 7308. [Google Scholar] [CrossRef]
- da Lomba Magalhães, M.J. Hyperspectral Image Fusion—A Comprehensive Review. Master’s Thesis, Itä-Suomen Yliopisto, Kuopio, Finland, 2022. [Google Scholar]
- Zhang, M.; Sun, X.; Zhu, Q.; Zheng, G. A survey of hyperspectral image super-resolution technology. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS 2021, Brussels, Belgium, 11–16 July 2021; pp. 4476–4479. [Google Scholar]
- Dian, R.; Li, S.; Sun, B.; Guo, A. Recent advances and new guidelines on hyperspectral and multispectral image fusion. Inf. Fusion 2021, 69, 40–51. [Google Scholar] [CrossRef]
- Chen, Z.; Pu, H.; Wang, B.; Jiang, G.-M. Fusion of hyperspectral and multispectral images: A novel framework based on generalization of pan-sharpening methods. IEEE Geosci. Remote Sens. Lett. 2014, 11, 1418–1422. [Google Scholar] [CrossRef]
- Jia, S.; Qian, Y. Spectral and spatial complexity-based hyperspectral unmixing. IEEE Trans. Geosci. Remote Sens. 2007, 45, 3867–3879. [Google Scholar]
- Akhtar, N.; Shafait, F.; Mian, A. Bayesian sparse representation for hyperspectral image super resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2015, Boston, MA, USA, 7–12 June 2015; pp. 3631–3640. [Google Scholar]
- Xie, W.; Jia, X.; Li, Y.; Lei, J. Hyperspectral image super-resolution using deep feature matrix factorization. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6055–6067. [Google Scholar] [CrossRef]
- Dian, R.; Fang, L.; Li, S. Hyperspectral image super-resolution via non-local sparse tensor factorization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 5344–5353. [Google Scholar]
- Liu, J.; Wu, Z.; Xiao, L.; Sun, J.; Yan, H. A truncated matrix decomposition for hyperspectral image super-resolution. IEEE Trans. Image Process. 2020, 29, 8028–8042. [Google Scholar] [CrossRef]
- Wan, W.; Guo, W.; Huang, H.; Liu, J. Nonnegative and nonlocal sparse tensor factorization-based hyperspectral image super-resolution. IEEE Trans. Geosci. Remote Sens. 2020, 58, 8384–8394. [Google Scholar] [CrossRef]
- Li, J.; Cui, R.; Li, B.; Song, R.; Li, Y.; Dai, Y.; Du, Q. Hyperspectral image super-resolution by band attention through adversarial learning. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4304–4318. [Google Scholar] [CrossRef]
- Hu, J.-F.; Huang, T.-Z.; Deng, L.-J.; Jiang, T.-X.; Vivone, G.; Chanussot, J. Hyperspectral image super-resolution via deep spatiospectral attention convolutional neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 7251–7265. [Google Scholar] [CrossRef] [PubMed]
- Hu, J.-F.; Huang, T.-Z.; Deng, L.-J.; Dou, H.-X.; Hong, D.; Vivone, G. Fusformer: A transformer-based fusion network for hyperspectral image super-resolution. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6012305. [Google Scholar] [CrossRef]
- Qu, Y.; Qi, H.; Kwan, C. Unsupervised sparse dirichlet-net for hyperspectral image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2511–2520. [Google Scholar]
- Yao, J.; Hong, D.; Chanussot, J.; Meng, D.; Zhu, X.; Xu, Z. Cross-attention in coupled unmixing nets for unsupervised hyperspectral super-resolution. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXIX 16. pp. 208–224. [Google Scholar]
- Li, J.; Zheng, K.; Yao, J.; Gao, L.; Hong, D. Deep Unsupervised Blind Hyperspectral and Multispectral Data Fusion. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6007305. [Google Scholar] [CrossRef]
- Qu, Y.; Qi, H.; Kwan, C.; Yokoya, N.; Chanussot, J. Unsupervised and unregistered hyperspectral image super-resolution with mutual Dirichlet-Net. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5507018. [Google Scholar] [CrossRef]
- Liu, J.; Wu, Z.; Xiao, L.; Wu, X.-J. Model inspired autoencoder for unsupervised hyperspectral image super-resolution. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5522412. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 1–11. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Lin, H.; Cheng, X.; Wu, X.; Shen, D. Cat: Cross attention in vision transformer. In Proceedings of the 2022 IEEE International Conference on Multimedia and Expo (ICME), Taipei, Taiwan, 18–22 July 2022; pp. 1–6. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
- Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. Swinir: Image restoration using swin transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 1833–1844. [Google Scholar]
- Conde, M.V.; Choi, U.-J.; Burchi, M.; Timofte, R. Swin2SR: Swinv2 transformer for compressed image super-resolution and restoration. In Computer Vision–ECCV 2022 Workshops, Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer Nature: Cham, Switzerland, 2022; pp. 669–687. [Google Scholar]
- Sun, Y.; Dong, L.; Huang, S.; Ma, S.; Xia, Y.; Xue, J.; Wang, J.; Wei, F. Retentive network: A successor to transformer for large language models. arXiv 2023, arXiv:2307.08621. [Google Scholar]
- Zheng, K.; Gao, L.; Liao, W.; Hong, D.; Zhang, B.; Cui, X.; Chanussot, J. Coupled convolutional neural network with adaptive response function learning for unsupervised hyperspectral super resolution. IEEE Trans. Geosci. Remote Sens. 2020, 59, 2487–2502. [Google Scholar] [CrossRef]
- Gao, L.; Li, J.; Zheng, K.; Jia, X. Enhanced Autoencoders with Attention-Embedded Degradation Learning for Unsupervised Hyperspectral Image Super-Resolution. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5509417. [Google Scholar] [CrossRef]
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Li, J.; Li, Y.; Wang, C.; Ye, X.; Heidrich, W. Busifusion: Blind unsupervised single image fusion of hyperspectral and rgb images. IEEE Trans. Comput. Imaging 2023, 9, 94–105. [Google Scholar] [CrossRef]
- Palsson, F.; Sveinsson, J.R.; Ulfarsson, M.O.; Benediktsson, J. Quantitative quality evaluation of pansharpened imagery: Consistency versus synthesis. IEEE Trans. Geosci. Remote Sens. 2015, 54, 1247–1259. [Google Scholar] [CrossRef]
- Kruse, F.A.; Lefkoff, A.; Boardman, J.; Heidebrecht, K.; Shapiro, A.; Barloon, P.; Goetz, A. The spectral image processing system (SIPS)—Interactive visualization and analysis of imaging spectrometer data. Remote Sens. Environ. 1993, 44, 145–163. [Google Scholar] [CrossRef]
- Wald, L. Quality of high resolution synthesised images: Is there a simple criterion? In Proceedings of the Third Conference "Fusion of Earth Data: Merging Point Measurements, Raster Maps and Remotely Sensed Images", SEE/URISCA, Sophia Antipolis, France, 26–28 January 2000; pp. 99–103. [Google Scholar]
- Han, X.-H.; Shi, B.; Zheng, Y. SSF-CNN: Spatial and spectral fusion with CNN for hyperspectral image super-resolution. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 2506–2510. [Google Scholar]
- Zhang, X.; Huang, W.; Wang, Q.; Li, X. SSR-NET: Spatial–spectral reconstruction network for hyperspectral and multispectral image fusion. IEEE Trans. Geosci. Remote Sens. 2020, 59, 5953–5965. [Google Scholar] [CrossRef]
- Wang, X.; Wang, X.; Song, R.; Zhao, X.; Zhao, K. MCT-Net: Multi-hierarchical cross transformer for hyperspectral and multispectral image fusion. Knowl.-Based Syst. 2023, 264, 110362–110375. [Google Scholar] [CrossRef]
- Zhang, L.; Nie, J.; Wei, W.; Zhang, Y.; Liao, S.; Shao, L. Unsupervised adaptation learning for hyperspectral imagery super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2020, Seattle, WA, USA, 13–19 June 2020; pp. 3073–3082. [Google Scholar]
- Chen, S.; Zhang, L.; Zhang, L. MSDformer: Multi-scale deformable transformer for hyperspectral image super-resolution. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5525614–5525628. [Google Scholar]
- Ma, Q.; Jiang, J.; Liu, X.; Ma, J. Reciprocal transformer for hyperspectral and multispectral image fusion. Inf. Fusion 2024, 104, 102148–102159. [Google Scholar] [CrossRef]
Def: Gate cross-retention shared encoder (Y, Z)
(the weight and feature symbols were lost in extraction and are shown below as generic placeholders)

W_Q, W_K, W_V, W_G, W_O = initialize_parameters()
D = create_diagonal_matrix(Y)                # retention decay matrix
E_Y = layer_normalization(Embedding(Y))      # embedding of modality Y
Q = D × (E_Y @ W_Q)
E_Z = layer_normalization(Embedding(Z))      # embedding of modality Z
K = D × (E_Z @ W_K)
V = E_Z @ W_V
Attention_scores = normalize(Q @ K^T @ D)
R = Attention_scores @ V
R = group_normalization(R)
R = reshape(R, shape(Y))
output = swish(E_Y × W_G)                    # gating branch
F = output @ R × W_O
F_ffn = feed_forward_network(F)
final_output = F + F_ffn
return final_output
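As a companion to the listing above, the following is a minimal PyTorch sketch of a gated cross-retention block in the same spirit: queries and a swish gate come from one modality (Y), keys and values come from the other (Z), and a retention decay matrix modulates the scores. The class name, the exponential decay construction, the normalization choices, and all dimensions are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of a gated cross-retention block (illustrative, not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class GateCrossRetention(nn.Module):
    def __init__(self, dim, gamma=0.9):
        super().__init__()
        self.w_q = nn.Linear(dim, dim, bias=False)   # queries from modality Y
        self.w_k = nn.Linear(dim, dim, bias=False)   # keys from modality Z
        self.w_v = nn.Linear(dim, dim, bias=False)   # values from modality Z
        self.w_g = nn.Linear(dim, dim, bias=False)   # swish gate from modality Y
        self.w_o = nn.Linear(dim, dim, bias=False)   # output projection
        self.norm_y = nn.LayerNorm(dim)
        self.norm_z = nn.LayerNorm(dim)
        self.group_norm = nn.GroupNorm(1, dim)
        self.gamma = gamma                           # retention decay rate (assumed)

    def decay_matrix(self, n, device):
        # Lower-triangular exponential decay, as used in retention mechanisms.
        idx = torch.arange(n, device=device)
        return torch.tril(self.gamma ** (idx[:, None] - idx[None, :]).float())

    def forward(self, y_tokens, z_tokens):
        # y_tokens, z_tokens: (batch, n_tokens, dim) token embeddings of the two modalities
        y, z = self.norm_y(y_tokens), self.norm_z(z_tokens)
        q, k, v = self.w_q(y), self.w_k(z), self.w_v(z)
        d = self.decay_matrix(y.shape[1], y.device)
        scores = (q @ k.transpose(-2, -1)) * d                   # cross-modal retention scores
        scores = scores / scores.abs().sum(-1, keepdim=True).clamp(min=1e-6)
        r = scores @ v
        r = self.group_norm(r.transpose(1, 2)).transpose(1, 2)   # per-channel group norm
        gated = F.silu(self.w_g(y)) * r                          # swish gating from modality Y
        return y_tokens + self.w_o(gated)                        # residual output
```

A cross-sharing arrangement of this kind can be applied in both directions (hyperspectral queries against multispectral keys and vice versa), with the feed-forward network (FFN) listed in Section 3.1 following each block.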
Index | SSFCNN [39] | SSRnet [40] | MCT [41] | MIAE [24] | Fusformer [19] | UAL [42] | MSDformer [43] | DCT [44] | Ours |
---|---|---|---|---|---|---|---|---|---|
PSNR ↑ | 42.41 | 41.37 | 42.73 | 44.74 | 40.84 | 46.41 | 43.59 | 41.52 | 50.38 |
RMSE ↓ | 0.0057 | 0.0065 | 0.0055 | 0.0097 | 0.0069 | 0.0036 | 0.0050 | 0.0064 | 0.0044 |
SAM ↓ | 1.61 | 1.97 | 1.88 | 2.20 | 1.74 | 1.08 | 1.45 | 1.87 | 1.01 |
ERGAS ↓ | 8.02 | 7.62 | 7.19 | 1.0 | 4.20 | 1.91 | 2.48 | 4.84 | 0.47 |
Index | SSFCNN [39] | SSRnet [40] | MCT [41] | MIAE [24] | Fusformer [19] | UAL [42] | MSDformer [43] | DCT [44] | Ours |
---|---|---|---|---|---|---|---|---|---|
PSNR ↑ | 34.10 | 37.32 | 37.41 | 40.82 | 37.04 | 40.29 | 37.67 | 34.36 | 43.15 |
RMSE ↓ | 0.0032 | 0.0022 | 0.0022 | 0.0014 | 0.0023 | 0.0015 | 0.0021 | 0.0031 | 0.0019 |
SAM ↓ | 3.23 | 2.14 | 2.11 | 1.38 | 5.81 | 1.46 | 1.91 | 2.57 | 1.77 |
ERGAS ↓ | 9.92 | 7.65 | 7.92 | 1.22 | 2.17 | 1.56 | 2.30 | 5.17 | 0.57 |
Index | SSFCNN [39] | SSRnet [40] | MCT [41] | MIAE [24] | Fusformer [19] | UAL [42] | MSDformer [43] | DCT [44] | Ours |
---|---|---|---|---|---|---|---|---|---|
PSNR ↑ | 39.33 | 40.17 | 40.48 | 41.91 | 39.55 | 42.24 | 42.26 | 41.94 | 42.37 |
RMSE ↓ | 0.0107 | 0.0098 | 0.0094 | 0.0080 | 0.0105 | 0.0077 | 0.0077 | 0.0079 | 0.0093 |
SAM ↓ | 2.91 | 2.73 | 2.65 | 2.27 | 2.92 | 2.26 | 2.16 | 2.20 | 2.37 |
ERGAS ↓ | 2.19 | 2.01 | 1.96 | 1.73 | 2.18 | 1.63 | 1.63 | 1.73 | 0.73 |
Methods | X4 PSNR | X4 RMSE | X4 SAM | X4 ERGAS | X8 PSNR | X8 RMSE | X8 SAM | X8 ERGAS | X16 PSNR | X16 RMSE | X16 SAM | X16 ERGAS |
---|---|---|---|---|---|---|---|---|---|---|---|---|
SSFCNN [39] | 39.56 | 0.0073 | 0.57 | 0.33 | 38.89 | 0.0079 | 0.64 | 0.36 | 37.90 | 0.0089 | 0.81 | 0.40 |
SSRnet [40] | 40.11 | 0.0069 | 0.58 | 0.31 | 39.32 | 0.0076 | 0.70 | 0.34 | 37.72 | 0.0091 | 0.88 | 0.42 |
MCT [41] | 42.65 | 0.0051 | 0.45 | 0.23 | 42.30 | 0.0053 | 0.48 | 0.24 | 41.44 | 0.0059 | 0.51 | 0.27 |
MIAE [24] | 49.59 | 0.0023 | 0.21 | 0.10 | 46.75 | 0.0112 | 0.72 | 0.26 | 49.68 | 0.0023 | 0.20 | 0.10 |
Fusformer [19] | 47.32 | 0.0030 | 0.24 | 0.13 | 40.34 | 0.0067 | 0.54 | 0.30 | 39.29 | 0.0076 | 0.69 | 0.34 |
UAL [42] | 47.56 | 0.0029 | 0.26 | 0.13 | 45.64 | 0.0036 | 0.31 | 0.16 | 47.42 | 0.0029 | 0.25 | 0.13 |
MSDformer [43] | 46.89 | 0.0031 | 0.27 | 0.14 | 45.84 | 0.0035 | 0.30 | 0.16 | 46.24 | 0.0034 | 0.30 | 0.15 |
DCT [44] | 38.11 | 0.0087 | 0.40 | 0.76 | 39.66 | 0.0073 | 0.64 | 0.34 | 42.22 | 0.0054 | 0.48 | 0.25 |
Ours | 55.34 | 0.0027 | 0.20 | 0.06 | 53.87 | 0.0034 | 0.24 | 0.08 | 53.75 | 0.0030 | 0.24 | 0.07 |
Shared Encoder | Fusion Decoder | Blind Estimation Network | PSNR | RMSE | SAM | ERGAS |
---|---|---|---|---|---|---|
× | × | × | 46.79 | 0.0046 | 0.32 | 0.11 |
√ | × | × | 49.62 | 0.0039 | 0.29 | 0.09 |
√ | √ | × | 52.85 | 0.0035 | 0.24 | 0.08 |
√ | √ | √ | 53.61 | 0.0029 | 0.20 | 0.06 |