Li et al. (2026) A runoff prediction method for arid regions integrating physics-guided signal extraction and temporally adaptive feature selection
Identification
- Journal: Journal of Hydrology: Regional Studies
- Year: 2026
- Date: 2026-01-06
- Authors: Ziheng Li, Xuefeng Sang, Hao Wang, Guoqiang Wang, Yang Zheng, Haokai Ding
- DOI: 10.1016/j.ejrh.2025.103034
Research Groups
- State Key Laboratory of Simulation and Regulation of Water Cycle in River Basin, Beijing, China
- China Institute of Water Resources and Hydropower Research, Beijing, China
- Center for Geo-Data and Analysis, Faculty of Geographical Science, Beijing Normal University, Beijing, China
Short Summary
This study proposes an interpretable runoff prediction framework for arid regions, integrating physics-guided signal extraction and temporally adaptive feature selection to address hydrological non-stationarity and data scarcity. The framework effectively captures hydrological–engineering coupling, achieving high prediction accuracy and revealing interpretable regulation behaviors like annual memory effects and threshold-based storage/discharge shifts.
Objective
- To establish an interpretable hydrological–engineering coupled prediction framework for arid regions.
- To assess the effectiveness of physically separated anthropogenic regulation signals in enhancing prediction accuracy.
- To determine whether dynamic feature selection can reveal the temporal evolution patterns of key driving factors.
- To evaluate the robustness and generalizability of the proposed method under conditions of uncertainty and input noise.
Study Configuration
- Spatial Scale: Daheihe River Basin, located in the arid region of northwestern China (northeastern corner of the Hetao region in Inner Mongolia). The study focused on four hydrological stations: QiXiaYing, MeiDai, WestIIRiver, and SanLiang.
- Temporal Scale: Data compiled for 2006–2021. Training period: 2006–2016. Testing period: 2017–2021. The hydrological model (WAS) was calibrated and validated using data from 1981–2000 (with 1980 as a warm-up period). Runoff data is monthly.
Methodology and Data
- Models used:
  - Hydrological model: WAS (Water resources comprehensive Allocation and Simulation model), a conceptual, semi-distributed model for natural runoff simulation.
  - Machine Learning models: Random Forest (MRF), Extreme Gradient Boosting (MXGBoost), Adaptive Boosting (MAdaBoost), Multivariate Linear Regression (MMLR), and a Long Short-Term Memory network with hyperparameter optimization (HPO-LSTM) for benchmark comparison.
  - Feature selection: Expanding-Window Recursive Feature Elimination with Cross-Validation (EW-RFECV).
  - Interpretability analysis: SHapley Additive exPlanations (SHAP).
- Data sources:
  - Hydrometeorological data: Rainfall and evaporation records from 11 meteorological stations, monthly runoff data from four hydrological stations (China Hydrological Yearbook). Naturalized runoff data for QiXiaYing station (Second National Water Resources Survey and Assessment).
  - Geospatial data: China’s 30-meter annual land cover raster dataset (1990–2022), HWSD 2.0 soil dataset.
  - Water infrastructure data: Basic attributes (reservoir locations, storage capacities) from the national high-resolution spatial reservoir dataset (CRD).
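The expanding-window part of EW-RFECV can be illustrated with a minimal sketch. It assumes monthly samples for 2006–2021, a first test year of 2017, and a one-year update step, all taken from this summary; the function name `expanding_window_splits` and its parameters are hypothetical and are not the authors' implementation. In the full method, a recursive feature elimination step with cross-validation would be re-run on each training window; that inner loop is omitted here.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(
    years: List[int], first_test_year: int, step: int = 1
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for an expanding training window.

    years[i] is the calendar year of sample i (time-ordered monthly samples).
    Each split trains on every sample before the test block and tests on the
    next `step` years, so the training window only ever grows.
    """
    last_year = max(years)
    start = first_test_year
    while start <= last_year:
        end = start + step  # exclusive upper bound of the test block
        train_idx = [i for i, y in enumerate(years) if y < start]
        test_idx = [i for i, y in enumerate(years) if start <= y < end]
        if train_idx and test_idx:
            yield train_idx, test_idx
        start = end


# Monthly samples for 2006-2021 (12 per year), matching the study period.
years = [y for y in range(2006, 2022) for _ in range(12)]
splits = list(expanding_window_splits(years, first_test_year=2017))

for train_idx, test_idx in splits:
    # Size of the growing training set and of the one-year test block.
    print(len(train_idx), len(test_idx))
```

With an annual step, feature selection and model fitting would be refreshed once per test year, each time using all data available up to that point.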
Main Results
- The EW-RFECV-ML method demonstrated superior performance in predicting human activity signals, with average Nash-Sutcliffe Efficiency (NSE) of 0.68 and Kling-Gupta Efficiency (KGE) of 0.84 at downstream stations, exceeding a tuned LSTM model by 0.56 (NSE) and 0.76 (KGE).
- The integration of dynamic temporal information, recursive feature elimination, and the expanding-window training strategy significantly improved predictive performance, with each component contributing positively.
- Model applicability showed spatial heterogeneity: Random Forest performed better in upstream areas (nonlinear processes), while Multivariate Linear Regression was superior in downstream areas with intensive, rule-based human activities.
- The method exhibited strong robustness: performance degradation remained below 8% (maximum NSE decrease 6.2%, maximum KGE decrease 7.7%) under ±10% perturbations of the physical model parameters, and was negligible (absolute ΔNSE ≤ 0.04) under ±5% Gaussian noise in the input features.
- SHAP analysis revealed key interpretable regulation mechanisms: "naturalflow" was consistently important, "Target_{t-12}" (management behavior memory) showed strong influence reflecting annual management cycles, and "measuredflow_{t-1}" indicated negative feedback from historical discharge.
- Threshold-based regulation was identified: at QiXiaYing, a natural inflow threshold of approximately 2.15 cubic meters per second (m³/s) marked a shift from water release to storage, consistent with reservoir observations.
- An annual update step for the expanding window achieved the optimal balance between model responsiveness and stability, aligning with regional water resource planning cycles.
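For reference, the two skill scores reported above can be computed as in the sketch below. It uses the standard NSE definition and the original three-component KGE form (correlation, variability ratio, bias ratio); this summary does not state which KGE variant the paper used, so that choice is an assumption. The sample values are synthetic and serve only to show the call.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe Efficiency: 1 minus squared error over observed variance."""
    o_bar = mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - o_bar) ** 2 for o in obs)
    return 1.0 - num / den


def kge(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Kling-Gupta Efficiency (three-component form, assumed variant)."""
    o_bar, s_bar = mean(obs), mean(sim)
    o_sd, s_sd = pstdev(obs), pstdev(sim)
    cov = mean((o - o_bar) * (s - s_bar) for o, s in zip(obs, sim))
    r = cov / (o_sd * s_sd)   # linear correlation
    alpha = s_sd / o_sd       # variability ratio
    beta = s_bar / o_bar      # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Synthetic example values, not data from the study.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.1, 1.9, 3.2, 3.8]
print(round(nse(obs, sim), 3), round(kge(obs, sim), 3))
```

Both scores equal 1 for a perfect simulation; an NSE of 0 means the model is no better than predicting the observed mean.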
Contributions
- Developed a novel, interpretable runoff prediction framework (EW-RFECV-ML) specifically tailored for arid regions characterized by hydrological non-stationarity, intensive human activities, and data scarcity.
- Introduced a physics-guided signal extraction approach to quantify anthropogenic regulation signals by separating them from natural runoff, overcoming the challenge of unavailable detailed operational records for water infrastructure.
- Implemented a dual-loop, temporally adaptive feature selection strategy (Expanding-Window Recursive Feature Elimination with Cross-Validation) to dynamically identify key driving factors and adapt to evolving basin dynamics over time.
- Provided mechanistic interpretations of human-hydrology coupling through SHAP analysis, revealing critical insights such as annual management memory effects and specific flow thresholds that govern reservoir storage/release decisions.
- Demonstrated high robustness of the framework against uncertainties originating from both the physical hydrological model parameters and machine learning input noise, enhancing the reliability and generalizability of predictions in complex hydro-engineering systems.
- Offered a practical and transferable approach for adaptive water management and intelligent decision support in arid and urbanized basins, by integrating multi-source data and revealing transparent operational rules.
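The signal extraction and lagged-feature ideas above can be sketched as follows. The sketch assumes the anthropogenic regulation signal is the residual of measured runoff minus simulated natural runoff; the summary only says the signal is separated from natural runoff, so the exact definition and sign convention are assumptions. The helper names and the synthetic series are hypothetical, and the lags mirror the 12-month target memory and 1-month measured-flow features named in the SHAP results.

```python
from typing import Dict, List, Sequence


def regulation_signal(
    measured: Sequence[float], natural: Sequence[float]
) -> List[float]:
    """Assumed definition: measured runoff minus simulated natural runoff."""
    if len(measured) != len(natural):
        raise ValueError("measured and natural series must have equal length")
    return [m - n for m, n in zip(measured, natural)]


def build_lagged_rows(
    measured: Sequence[float], natural: Sequence[float], memory_lag: int = 12
) -> List[Dict[str, float]]:
    """Build one feature row per month, starting once the memory lag is available."""
    target = regulation_signal(measured, natural)
    rows = []
    for t in range(memory_lag, len(target)):
        rows.append({
            "naturalflow": natural[t],                      # current natural inflow
            f"target_lag{memory_lag}": target[t - memory_lag],  # signal one cycle earlier
            "measuredflow_lag1": measured[t - 1],           # previous month's measured flow
            "target": target[t],                            # value to be predicted
        })
    return rows


# Synthetic two-year monthly series, for illustration only.
natural = [float(3 + (m % 12)) for m in range(24)]
measured = [0.8 * n for n in natural]
rows = build_lagged_rows(measured, natural)
print(len(rows), sorted(rows[0].keys()))
```

Rows like these would feed the machine learning models, with the hydrological model supplying the natural runoff series where no naturalized record exists.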
Funding
- National Key Research and Development Program of China (2022YFC3204404)
- National Natural Science Foundation of China (U2243233)
- Major Science and Technology Project of the Ministry of Water Resources of China (SKS-2022118)
Citation
@article{Li2026runoff,
author = {Li, Ziheng and Sang, Xuefeng and Wang, Hao and Wang, Guoqiang and Zheng, Yang and Ding, Haokai},
title = {A runoff prediction method for arid regions integrating physics-guided signal extraction and temporally adaptive feature selection},
journal = {Journal of Hydrology: Regional Studies},
year = {2026},
doi = {10.1016/j.ejrh.2025.103034},
url = {https://doi.org/10.1016/j.ejrh.2025.103034}
}
Original Source: https://doi.org/10.1016/j.ejrh.2025.103034