Aybar et al. (2026) A radiometrically and spatially consistent super-resolution framework for Sentinel-2
Identification
- Journal: Remote Sensing of Environment
- Year: 2026
- Date: 2026-01-06
- Authors: Cesar Aybar, Julio Contreras, Simon Donike, Enrique Portalés-Julià, Gonzalo Mateo-García, Luis Gómez-Chova
- DOI: 10.1016/j.rse.2025.115222
Research Groups
- Image Processing Laboratory (IPL), University of Valencia, Valencia, Spain
Short Summary
This paper introduces SEN2SR, a deep learning framework designed to super-resolve Sentinel-2 images to a uniform 2.5-meter spatial resolution while preserving spectral and spatial consistency. By employing harmonized synthetic training data and a novel low-frequency hard constraint, SEN2SR achieves superior reconstruction quality, near-zero reflectance deviation, and spatial alignment compared to state-of-the-art methods, validated through extensive benchmarks and downstream Earth observation tasks.
Objective
- To develop a deep learning framework (SEN2SR) capable of super-resolving Sentinel-2 images (10-meter RGBN and 20-meter RSWIR bands) to a uniform 2.5-meter resolution.
- To ensure spectral and spatial alignment consistency in the super-resolved outputs by harmonizing synthetic training data and introducing a low-frequency hard constraint layer.
- To evaluate the framework's performance comprehensively using pixel-level, context-level, and Explainable AI (xAI) metrics, as well as its impact on downstream Earth observation tasks.
Study Configuration
- Spatial Scale:
- Input: Sentinel-2 multispectral images at 10 meters (RGBN bands) and 20 meters (Red Edge and SWIR bands).
- Output: Uniform 2.5-meter resolution images for all upsampled bands.
- Upsampling factors: ×4 for 10-meter bands, and a two-step process of ×2 then ×4 for 20-meter bands.
- Temporal Scale:
- Training data for harmonization: Cross-sensor image pairs acquired within a 2-day range.
- CloudSEN12 dataset: Multi-temporal, globally distributed, with regions comprising five Sentinel-2 L2A and L1C images.
- Downstream tasks: WorldFloods (flood events up to 2023), MARS-S2L (methane emissions from January 2018 to June 2024).
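The scale factors above can be sanity-checked with a few lines of arithmetic. This is an illustrative sketch (the function name `upsample_factor` is ours, not from the paper): going from 20 m to 2.5 m requires an overall ×8 factor, which the paper splits into ×2 followed by ×4.

```python
# Sanity check of the upsampling factors described above (illustrative only).
def upsample_factor(input_res_m: float, output_res_m: float) -> int:
    """Integer scale factor needed to go from input to output resolution."""
    factor = input_res_m / output_res_m
    assert factor.is_integer(), "resolutions must divide evenly"
    return int(factor)

rgbn_factor = upsample_factor(10, 2.5)   # x4 for the 10 m RGBN bands
rswir_factor = upsample_factor(20, 2.5)  # x8 overall for the 20 m bands
two_step = upsample_factor(20, 10) * upsample_factor(10, 2.5)  # x2 then x4

print(rgbn_factor, rswir_factor, two_step)  # 4 8 8
```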
Methodology and Data
- Models used:
- Deep learning architectures: Convolutional Neural Networks (CNNs), Swin Transformers (Swin2SR), and Mamba (MambaSR).
- CNNSR: Based on the Swift Parameter-free Attention Network (SPAN) block.
- Swin2SR: Uses Swin Transformer version 2 (Swinv2) as its core building block.
- MambaSR: Replaces attention mechanism with a structured state-space module (SSM).
- Low-frequency hard constraint layer: Applied as the final layer, using Fourier-based filters (rectangular brick-wall, Butterworth, Gaussian) to enforce spectral consistency.
- Loss functions: L1 loss (primary), with LPIPS and General CLIP evaluated as auxiliary perceptual losses.
- Optimizer: Adam.
- Upsampling mechanism: PixelShuffle.
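The low-frequency hard constraint can be sketched in the Fourier domain: the low frequencies of the super-resolved prediction are replaced by those of the (naively) upsampled low-resolution input, so the output is radiometrically consistent by construction when downsampled back. This is a minimal numpy sketch, not the paper's implementation (the paper applies the constraint as the network's final layer, and the `sigma` value and the exact compositing rule here are our assumptions).

```python
import numpy as np

def gaussian_lowpass_mask(h: int, w: int, sigma: float) -> np.ndarray:
    """Gaussian low-pass filter defined over 2-D Fourier frequencies."""
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    return np.exp(-(fx**2 + fy**2) / (2 * sigma**2))

def low_frequency_hard_constraint(sr: np.ndarray, lr_up: np.ndarray,
                                  sigma: float = 0.05) -> np.ndarray:
    """Keep high frequencies from the SR prediction, but force its
    low-frequency content to match the upsampled LR input."""
    mask = gaussian_lowpass_mask(*sr.shape, sigma)
    sr_f = np.fft.fft2(sr)
    lr_f = np.fft.fft2(lr_up)
    out_f = mask * lr_f + (1 - mask) * sr_f
    return np.real(np.fft.ifft2(out_f))
```

Because the mask equals 1 at the DC frequency, the output's mean reflectance is exactly that of the upsampled input, which is the "near-zero reflectance deviation" property reported in the results.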
- Data sources:
- Training datasets:
- SEN2NAIPv2-synthetic: 61,282 LR (10-meter) and HR (2.5-meter) RGBN image pairs, synthetically degraded from NAIP data.
- CloudSEN12 cloud-free: Subset of 9880 cloud-free Sentinel-2 L2A and L1C regions, used for reference RSWIR SR tasks.
- Validation/Test datasets:
- SEN2NAIPv2-crosssensor: 4409 LR (Sentinel-2) and HR (NAIP 2.5-meter) RGBN image pairs, same-day acquisition, radiometrically and spatially harmonized.
- OpenSR-Test: 31 LR (Sentinel-2) and HR (SPOT, Venµs, NAIP, SNOP aerial imagery at 2.5-meter) RGBN image pairs, harmonized and visually inspected.
- Downstream task datasets:
- WorldFloods: 18 Sentinel-2 L1C images with human-annotated flood labels.
- MARS-S2L: 497 Sentinel-2 L1C images with human-annotated methane plume masks.
Main Results
- Low-frequency hard constraint: Consistently improves PSNR, accelerates training convergence, and ensures radiometric and spatial consistency (near-zero reflectance deviation and spatial misalignment) in super-resolved outputs when downsampled back to original resolution. Gaussian filters generally perform best.
- Model Architecture and Size: Swin2SR and MambaSR architectures consistently outperform CNNSR. For non-reference RGBN SR ×4, performance gains plateau for models larger than approximately 15 million parameters, suggesting overfitting for very large models.
- Loss Function: L1 loss is selected for its robustness in preserving spectral fidelity and lower computational cost, leading to more stable training compared to standalone perceptual losses (CLIP, LPIPS).
- Non-reference RGBN SR ×4 (10-meter to 2.5-meter):
- SEN2SR (Mamba Medium) achieves the highest PSNR (37.01 dB) and the best trade-off between fidelity and perceptual quality (highest improvement, lowest hallucination rate) compared to state-of-the-art models.
- SEN2SR-Lite (CNN Light) processes a Sentinel-2 scene (110 km × 110 km) in approximately 10 minutes, while the full SEN2SR model takes about 6 hours on an NVIDIA RTX A5500.
- Reference RSWIR SR (20-meter to 2.5-meter), assessed via downstream tasks:
- Flood Detection (MNDWI): SEN2SR and SEN2SR-Lite slightly increase flood recall (0.7880 and 0.7832 respectively, compared to 0.7793 for raw Sentinel-2) while maintaining comparable precision, demonstrating radiometric preservation.
- Methane Plume Detection (MBMP): SEN2SR and SEN2SR-Lite achieve higher signal-to-noise ratios (1.47 and 1.23 respectively, compared to 1 for raw Sentinel-2), indicating maintained radiometric integrity for sensitive applications.
- Explainable AI (xAI): SEN2SR models (Swin and Mamba) exhibit significantly higher gradient complexity and robustness compared to SEN2SR-Lite (CNNs), with a positive correlation between PSNR and gradient complexity for Mamba models, suggesting better generalization and more structured high-frequency detail reconstruction.
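The flood-detection result above relies on the Modified Normalized Difference Water Index, which combines a 10 m green band with a 20 m SWIR band, so it directly tests whether the super-resolved RSWIR bands remain radiometrically usable. A minimal sketch of the index (the `eps` stabilizer and the sample reflectance values are illustrative, not from the paper):

```python
import numpy as np

def mndwi(green: np.ndarray, swir: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Modified Normalized Difference Water Index: (G - SWIR) / (G + SWIR)."""
    return (green - swir) / (green + swir + eps)

# Water reflects strongly in green and weakly in SWIR, so MNDWI > 0;
# vegetated or bare land typically yields MNDWI < 0.
water = mndwi(np.array([0.10]), np.array([0.02]))
land = mndwi(np.array([0.08]), np.array([0.20]))
```

If super-resolution distorted reflectances, the sign and magnitude of this index would shift, degrading flood recall; the near-identical recall/precision figures reported above indicate it does not.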
Contributions
- Proposed a novel low-frequency hard constraint layer for deep learning super-resolution networks, enforcing spectral and spatial consistency with original low-resolution inputs during training and inference.
- Conducted an extensive ablation study evaluating the influence of network architecture (CNNs, Swin Transformers, Mamba), model size, loss functions, and training data type (synthetic vs. cross-sensor).
- Extended the super-resolution process to include Sentinel-2's 20-meter Red Edge and SWIR bands, achieving a uniform 2.5-meter resolution across all target bands using a two-step Wald protocol.
- Introduced a comprehensive super-resolution evaluation framework that integrates pixel-level and context-level accuracy metrics, gradient-based Explainable AI (xAI) techniques, and performance assessments on real-world Earth observation downstream tasks.
Funding
- European Space Agency (ESA, Φ-lab): Explainable AI: Application to Trustworthy Super-Resolution (OpenSR)
- Spanish Ministry of Science and Innovation: project PID2023-148485OB-C21 (funded by MCIN/AEI/10.13039/501100011033)
- National Council of Science, Technology, and Technological Innovation (CONCYTEC, Peru): PE501083135-2023-PROCIENCIA
Citation
@article{Aybar2026radiometrically,
author = {Aybar, Cesar and Contreras, Julio and Donike, Simon and Portalés-Julià, Enrique and Mateo-García, Gonzalo and Gómez-Chova, Luis},
title = {A radiometrically and spatially consistent super-resolution framework for Sentinel-2},
journal = {Remote Sensing of Environment},
year = {2026},
doi = {10.1016/j.rse.2025.115222},
url = {https://doi.org/10.1016/j.rse.2025.115222}
}
Original Source: https://doi.org/10.1016/j.rse.2025.115222