Aybar et al. (2026) A radiometrically and spatially consistent super-resolution framework for Sentinel-2
Identification
- Journal: Remote Sensing of Environment
- Year: 2026
- Date: 2026-01-06
- Authors: Cesar Aybar, Julio Contreras, Simon Donike, Enrique Portalés-Julià, Gonzalo Mateo-García, Luis Gómez-Chova
- DOI: 10.1016/j.rse.2025.115222
Research Groups
- Image Processing Laboratory (IPL), University of Valencia, Valencia, Spain
Short Summary
This paper introduces SEN2SR, a deep learning framework designed to super-resolve Sentinel-2 images to a uniform 2.5-meter spatial resolution while preserving spectral and spatial consistency. By employing harmonized synthetic training data and a novel low-frequency hard constraint, SEN2SR achieves superior reconstruction quality, near-zero reflectance deviation, and spatial alignment compared to state-of-the-art methods, validated through extensive benchmarks and downstream Earth observation tasks.
Objective
- To develop a deep learning framework (SEN2SR) capable of super-resolving Sentinel-2 images (10-meter RGBN and 20-meter RSWIR bands) to a uniform 2.5-meter resolution.
- To ensure spectral and spatial alignment consistency in the super-resolved outputs by harmonizing synthetic training data and introducing a low-frequency hard constraint layer.
- To evaluate the framework's performance comprehensively using pixel-level, context-level, and Explainable AI (xAI) metrics, as well as its impact on downstream Earth observation tasks.
Study Configuration
- Spatial Scale:
- Input: Sentinel-2 multispectral images at 10 meters (RGBN bands) and 20 meters (Red Edge and SWIR bands).
- Output: Uniform 2.5-meter resolution images for all upsampled bands.
- Upsampling factors: ×4 for 10-meter bands, and a two-step process of ×2 then ×4 for 20-meter bands.
- Temporal Scale:
- Training data for harmonization: Cross-sensor image pairs acquired within a 2-day range.
- CloudSEN12 dataset: Multi-temporal, globally distributed, with regions comprising five Sentinel-2 L2A and L1C images.
- Downstream tasks: WorldFloods (flood events up to 2023), MARS-S2L (methane emissions from January 2018 to June 2024).
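The scale factors above can be sanity-checked with a few lines of arithmetic. This is an illustrative sketch (the function name `upsample_factor` is ours, not from the paper): going from 20 m to 2.5 m requires an overall ×8 factor, which the paper splits into ×2 followed by ×4.

```python
# Sanity check of the upsampling factors described above (illustrative only).
def upsample_factor(input_res_m: float, output_res_m: float) -> int:
    """Integer scale factor needed to go from input to output resolution."""
    factor = input_res_m / output_res_m
    assert factor.is_integer(), "resolutions must divide evenly"
    return int(factor)

rgbn_factor = upsample_factor(10, 2.5)   # x4 for the 10 m RGBN bands
rswir_factor = upsample_factor(20, 2.5)  # x8 overall for the 20 m bands
two_step = upsample_factor(20, 10) * upsample_factor(10, 2.5)  # x2 then x4

print(rgbn_factor, rswir_factor, two_step)  # 4 8 8
```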
Methodology and Data
- Models used:
- Deep learning architectures: Convolutional Neural Networks (CNNs), Swin Transformers (Swin2SR), and Mamba (MambaSR).
- CNNSR: Based on the Swift Parameter-free Attention Network (SPAN) block.
- Swin2SR: Uses Swin Transformer version 2 (Swinv2) as its core building block.
- MambaSR: Replaces attention mechanism with a structured state-space module (SSM).
- Low-frequency hard constraint layer: Applied as the final layer, using Fourier-based filters (rectangular brick-wall, Butterworth, Gaussian) to enforce spectral consistency.
- Loss functions: L1 loss (primary), with LPIPS and General CLIP evaluated as auxiliary perceptual losses.
- Optimizer: Adam.
- Upsampling mechanism: PixelShuffle.
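The low-frequency hard constraint can be sketched in the Fourier domain: the low frequencies of the super-resolved prediction are replaced by those of the (naively) upsampled low-resolution input, so the output is radiometrically consistent by construction when downsampled back. This is a minimal numpy sketch, not the paper's implementation (the paper applies the constraint as the network's final layer, and the `sigma` value and the exact compositing rule here are our assumptions).

```python
import numpy as np

def gaussian_lowpass_mask(h: int, w: int, sigma: float) -> np.ndarray:
    """Gaussian low-pass filter defined over 2-D Fourier frequencies."""
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    return np.exp(-(fx**2 + fy**2) / (2 * sigma**2))

def low_frequency_hard_constraint(sr: np.ndarray, lr_up: np.ndarray,
                                  sigma: float = 0.05) -> np.ndarray:
    """Keep high frequencies from the SR prediction, but force its
    low-frequency content to match the upsampled LR input."""
    mask = gaussian_lowpass_mask(*sr.shape, sigma)
    sr_f = np.fft.fft2(sr)
    lr_f = np.fft.fft2(lr_up)
    out_f = mask * lr_f + (1 - mask) * sr_f
    return np.real(np.fft.ifft2(out_f))
```

Because the mask equals 1 at the DC frequency, the output's mean reflectance is exactly that of the upsampled input, which is the "near-zero reflectance deviation" property reported in the results.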
- Data sources:
- Training datasets:
- SEN2NAIPv2-synthetic: 61,282 LR (10-meter) and HR (2.5-meter) RGBN image pairs, synthetically degraded from NAIP data.
- CloudSEN12 cloud-free: Subset of 9880 cloud-free Sentinel-2 L2A and L1C regions, used for reference RSWIR SR tasks.
- Validation/Test datasets:
- SEN2NAIPv2-crosssensor: 4409 LR (Sentinel-2) and HR (NAIP 2.5-meter) RGBN image pairs, same-day acquisition, radiometrically and spatially harmonized.
- OpenSR-Test: 31 LR (Sentinel-2) and HR (SPOT, Venµs, NAIP, SNOP aerial imagery at 2.5-meter) RGBN image pairs, harmonized and visually inspected.
- Downstream task datasets:
- WorldFloods: 18 Sentinel-2 L1C images with human-annotated flood labels.
- MARS-S2L: 497 Sentinel-2 L1C images with human-annotated methane plume masks.
Main Results
- Low-frequency hard constraint: Consistently improves PSNR, accelerates training convergence, and ensures radiometric and spatial consistency (near-zero reflectance deviation and spatial misalignment) in super-resolved outputs when downsampled back to original resolution. Gaussian filters generally perform best.
- Model Architecture and Size: Swin2SR and MambaSR architectures consistently outperform CNNSR. For non-reference RGBN SR ×4, performance gains plateau for models larger than approximately 15 million parameters, suggesting overfitting for very large models.
- Loss Function: L1 loss is selected for its robustness in preserving spectral fidelity and lower computational cost, leading to more stable training compared to standalone perceptual losses (CLIP, LPIPS).
- Non-reference RGBN SR ×4 (10-meter to 2.5-meter):
- SEN2SR (Mamba Medium) achieves the highest PSNR (37.01 dB) and the best trade-off between fidelity and perceptual quality (highest improvement, lowest hallucination rate) compared to state-of-the-art models.
- SEN2SR-Lite (CNN Light) processes a Sentinel-2 scene (110 km × 110 km) in approximately 10 minutes, while the full SEN2SR model takes about 6 hours on an NVIDIA RTX A5500.
- Reference RSWIR SR (20-meter to 2.5-meter), assessed via downstream tasks:
- Flood Detection (MNDWI): SEN2SR and SEN2SR-Lite slightly increase flood recall (0.7880 and 0.7832 respectively, compared to 0.7793 for raw Sentinel-2) while maintaining comparable precision, demonstrating radiometric preservation.
- Methane Plume Detection (MBMP): SEN2SR and SEN2SR-Lite achieve higher signal-to-noise ratios (1.47 and 1.23 respectively, compared to 1 for raw Sentinel-2), indicating maintained radiometric integrity for sensitive applications.
- Explainable AI (xAI): SEN2SR models (Swin and Mamba) exhibit significantly higher gradient complexity and robustness compared to SEN2SR-Lite (CNNs), with a positive correlation between PSNR and gradient complexity for Mamba models, suggesting better generalization and more structured high-frequency detail reconstruction.
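The flood-detection result above relies on the Modified Normalized Difference Water Index, which combines a 10 m green band with a 20 m SWIR band, so it directly tests whether the super-resolved RSWIR bands remain radiometrically usable. A minimal sketch of the index (the `eps` stabilizer and the sample reflectance values are illustrative, not from the paper):

```python
import numpy as np

def mndwi(green: np.ndarray, swir: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Modified Normalized Difference Water Index: (G - SWIR) / (G + SWIR)."""
    return (green - swir) / (green + swir + eps)

# Water reflects strongly in green and weakly in SWIR, so MNDWI > 0;
# vegetated or bare land typically yields MNDWI < 0.
water = mndwi(np.array([0.10]), np.array([0.02]))
land = mndwi(np.array([0.08]), np.array([0.20]))
```

If super-resolution distorted reflectances, the sign and magnitude of this index would shift, degrading flood recall; the near-identical recall/precision figures reported above indicate it does not.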
Contributions
- Proposed a novel low-frequency hard constraint layer for deep learning super-resolution networks, enforcing spectral and spatial consistency with original low-resolution inputs during training and inference.
- Conducted an extensive ablation study evaluating the influence of network architecture (CNNs, Swin Transformers, Mamba), model size, loss functions, and training data type (synthetic vs. cross-sensor).
- Extended the super-resolution process to include Sentinel-2's 20-meter Red Edge and SWIR bands, achieving a uniform 2.5-meter resolution across all target bands using a two-step Wald protocol.
- Introduced a comprehensive super-resolution evaluation framework that integrates pixel-level and context-level accuracy metrics, gradient-based Explainable AI (xAI) techniques, and performance assessments on real-world Earth observation downstream tasks.
Funding
- European Space Agency (ESA, Φ-lab): Explainable AI: Application to Trustworthy Super-Resolution (OpenSR)
- Spanish Ministry of Science and Innovation: project PID2023-148485OB-C21 (funded by MCIN/AEI/10.13039/501100011033)
- National Council of Science, Technology, and Technological Innovation (CONCYTEC, Peru): PE501083135-2023-PROCIENCIA
Citation
@article{Aybar2026radiometrically,
author = {Aybar, Cesar and Contreras, Julio and Donike, Simon and Portalés-Julià, Enrique and Mateo-García, Gonzalo and Gómez-Chova, Luis},
title = {A radiometrically and spatially consistent super-resolution framework for Sentinel-2},
journal = {Remote Sensing of Environment},
year = {2026},
doi = {10.1016/j.rse.2025.115222},
url = {https://doi.org/10.1016/j.rse.2025.115222}
}
Original Source: https://doi.org/10.1016/j.rse.2025.115222