Zhou et al. (2026) Optimizing runoff simulation in three mid-high latitude catchments by integrating terrestrial ecosystem modelling, hybrid machine learning, and causal inference
Identification
- Journal: Journal of Hydrology: Regional Studies
- Year: 2026
- Date: 2026-01-06
- Authors: Hao Zhou, Jing Tang, Stefan Olin, Renkui Guo, Paul Miller
- DOI: 10.1016/j.ejrh.2025.103085
Research Groups
- Department of Physical Geography and Ecosystem Science, Lund University, Lund, Sweden
- Department of Earth and Environmental Sciences, The Chinese University of Hong Kong, Hong Kong Special Administrative Region of China
- Center for Volatile Interactions, Copenhagen, Denmark
Short Summary
This study develops a hybrid eco-hydrological framework by coupling the process-based terrestrial ecosystem model LPJ-GUESS with five machine learning (ML) algorithms to optimize monthly runoff simulation in three mid-high latitude catchments. The framework significantly improves runoff prediction, raising the Nash–Sutcliffe efficiency by 0.4–1.9 relative to standalone LPJ-GUESS, and reveals that LPJ-GUESS systematically underweights the effect of incoming radiation, indicating missing energy-balance processes as a key source of model error.
Objective
- To evaluate the effectiveness of integrating LPJ-GUESS with ML models to enhance monthly streamflow prediction, hypothesizing that hybrid approaches outperform both the standalone ecosystem model and individual ML methods.
- To investigate whether optimal model structure configurations vary with catchment characteristics and input data availability.
- To identify and quantify structural shortcomings in LPJ-GUESS using combined SHapley Additive exPlanations (SHAP) and causal forest analyses, providing clear guidance for targeted model improvements to enhance future streamflow predictions.
Study Configuration
- Spatial Scale: Three mid-to-high latitude catchments (32°–64°N):
  - Krycklan (Boreal Sweden): 67.9 square kilometers (km²), 64.23°N, 19.77°E (less than one 0.5° × 0.5° grid cell).
  - Danube (Central Europe, at Budapest station): 184,893.0 km², 47.50°N, 19.05°E (89 grid cells).
  - Mississippi (USA, at Vicksburg station): 2,964,255.0 km², 32.30°N, 90.91°W (1260 grid cells).
- LPJ-GUESS model resolution: 0.5° × 0.5°.
- Temporal Scale:
  - LPJ-GUESS spin-up simulation: 1000 years.
  - LPJ-GUESS transient simulation: 1901–2015.
  - Observed streamflow data periods:
    - Krycklan: 2008–2015 (8 years). Training: January 2008–December 2012; Testing: January 2013–December 2015.
    - Danube: 1996–2015 (20 years). Training: January 1996–December 2011; Testing: January 2012–December 2015.
    - Mississippi: 1966–2015 (50 years). Training: January 1966–December 2005; Testing: January 2006–December 2015.
- Output frequency: Monthly runoff.
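The chronological splits listed above can be checked with a short sketch. It only counts the calendar months in each training and testing window. The dictionary layout and the helper name `month_count` are illustrative assumptions, not part of the study's code.

```python
from datetime import date

# (start, end) months of each period, inclusive, as listed in this summary.
SPLITS = {
    "Krycklan": {"train": (date(2008, 1, 1), date(2012, 12, 1)),
                 "test": (date(2013, 1, 1), date(2015, 12, 1))},
    "Danube": {"train": (date(1996, 1, 1), date(2011, 12, 1)),
               "test": (date(2012, 1, 1), date(2015, 12, 1))},
    "Mississippi": {"train": (date(1966, 1, 1), date(2005, 12, 1)),
                    "test": (date(2006, 1, 1), date(2015, 12, 1))},
}


def month_count(start: date, end: date) -> int:
    """Number of calendar months from start to end, both inclusive."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


# Months per period and the share of the record held out for testing.
summary = {}
for name, periods in SPLITS.items():
    n_train = month_count(*periods["train"])
    n_test = month_count(*periods["test"])
    summary[name] = {
        "train": n_train,
        "test": n_test,
        "test_share": n_test / (n_train + n_test),
    }

for name, row in summary.items():
    print(f"{name}: {row['train']} training months, {row['test']} testing months")
```

Under these dates Krycklan has 60 training and 36 testing months (37.5% held out), while the Danube and the Mississippi each hold out 20% of a much longer record. That difference in sample size is worth keeping in mind when comparing scores across catchments.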
Methodology and Data
- Models used:
  - Process-based terrestrial ecosystem model: LPJ-GUESS (Lund-Potsdam-Jena General Ecosystem Simulator).
  - Machine Learning (ML) algorithms: Multi-Layer Perceptron (MLP), Long Short-Term Memory (LSTM), Random Forest (RF), eXtreme Gradient Boosting (XGB), Support Vector Machine (SVM).
  - Hybrid model structures:
    - Hybrid Model 1 (HM1): LPJ-GUESS daily modelled runoff used as an input feature for ML models to predict observed runoff.
    - Hybrid Model 2 (HM2): ML models predict the runoff residual (observed minus LPJ-GUESS modelled runoff) using LPJ-GUESS input data.
  - Interpretability methods: SHapley Additive exPlanations (SHAP) algorithm, Causal Forest (CF) algorithm.
- Data sources:
  - Climate forcing: CRUNCEP Version 7 database (gridded monthly mean, minimum, and maximum air temperature, relative humidity, precipitation, incoming solar radiation, and mean wind speed at 10 meters height).
  - Daily climate data downscaling: Global Weather GENerator (GWGEN) from monthly CRUNCEP.
  - Soil properties: ISRIC-WISE (World Inventory of Soil Emission Potentials) 30 arc-minute dataset, version 3.0 (WISE30min 3.0) (sand, silt, clay fractions).
  - Annual carbon dioxide (CO₂) concentrations: McGuire et al. (2001) and TRENDS project (ice-core measurements and atmospheric observations).
  - Land use data: Hurtt et al. (2020) (cropland, pasture, and natural land proportion).
  - Observed streamflow (discharge):
    - Krycklan: Krycklan Catchment Study (Swedish University of Agricultural Sciences).
    - Danube (Budapest station) and Mississippi (Vicksburg station): Global Runoff Data Centre (GRDC).
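The two hybrid structures can be contrasted in a minimal sketch. An ordinary least-squares fit stands in for the five ML algorithms, and the data are synthetic. The names `fit_linear`, `predict_linear`, `precip`, `rad`, `sim`, and `obs` are hypothetical. Nothing here reproduces the study's pipeline, and predictions are made on the training data purely for illustration.

```python
def fit_linear(X, y):
    """Least-squares fit with intercept via the normal equations.

    A simple stand-in for an ML regressor (an assumption of this sketch).
    Returns [intercept, coef_1, ..., coef_k].
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]
    n = len(rows[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(A[k][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for k in range(n):
            if k != c:
                factor = A[k][c]
                A[k] = [vk - factor * vc for vk, vc in zip(A[k], A[c])]
    return [A[i][n] for i in range(n)]


def predict_linear(coef, X):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], x)) for x in X]


# Synthetic monthly drivers (hypothetical values).
precip = [30, 45, 60, 80, 95, 70, 50, 40, 55, 65, 35, 25]
rad = [20, 50, 90, 140, 190, 220, 210, 170, 120, 70, 30, 15]
# Synthetic "process model" runoff that ignores radiation entirely.
sim = [0.3 * p + 0.01 * (p - 60) ** 2 for p in precip]
# Synthetic "observed" runoff: process model plus a radiation term and an offset.
obs = [s + 0.1 * r - 5.0 for s, r in zip(sim, rad)]

# HM1: the simulated runoff is one more input feature; the target is observed runoff.
X_hm1 = list(zip(precip, rad, sim))
coef_hm1 = fit_linear(X_hm1, obs)
hm1_pred = predict_linear(coef_hm1, X_hm1)

# HM2: the learner targets the residual (observed minus simulated) from the
# model inputs, and the correction is added back to the simulation.
residual = [o - s for o, s in zip(obs, sim)]
X_hm2 = list(zip(precip, rad))
coef_hm2 = fit_linear(X_hm2, residual)
hm2_pred = [s + c for s, c in zip(sim, predict_linear(coef_hm2, X_hm2))]
```

In this toy setting the HM2 learner only has to represent what the process model misses (a radiation term and an offset), while the HM1 learner must map all inputs to the full runoff signal. That is one intuition, not a result from the paper, for why residual learning can get by with simpler ML components.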
Main Results
- Hybrid models significantly improved monthly runoff prediction performance across all catchments, with Hybrid Model 2 (residual-based learning) generally outperforming Hybrid Model 1 and standalone models. Nash–Sutcliffe Efficiency (NSE) increased by 0.4–1.9 relative to original LPJ-GUESS.
  - For Krycklan, HM2–RF achieved the highest NSE (0.83) compared to LPJ-GUESS (-1.80).
  - For Danube, HM2–LSTM achieved the highest NSE (0.61) compared to LPJ-GUESS (0.05).
  - For Mississippi, HM2–LSTM achieved the best overall skill (NSE = 0.73, Pearson correlation coefficient R = 0.81, Normalized Root Mean Square Error NRMSE = 0.52, Percent Bias PBIAS = -1.12) compared to LPJ-GUESS (NSE = -1.12).
- Hybrid models reduced peak-timing phase bias (MMIAT closer to 0 days) and overall error and bias, correcting both magnitude- and phase-related discrepancies. For example, HM2–LSTM reduced Mississippi's MMIAT from -27 days (LPJ-GUESS) to -2 days.
- Optimal ML architectures in hybrid models were significantly simpler (e.g., 50% fewer neurons in MLP, 69% reduction in LSTM hidden units, 87% shallower trees in RF for Mississippi) than standalone ML models, suggesting that physical information from LPJ-GUESS reduces the computational burden on the ML component.
- SHAP analysis consistently identified incoming radiation, temperature, and precipitation as the leading factors influencing runoff.
- LPJ-GUESS consistently underweighted the importance of incoming radiation on runoff compared to observed runoff, with importance gaps of 11–12% in Krycklan and Mississippi, and 6.26% in Danube.
- Causal forest inference confirmed a strong radiation-driven bias in LPJ-GUESS, with high-radiation conditions reducing prediction bias by an average treatment effect (ATE) of –0.13 standard deviations (s.d.) for Krycklan, –0.27 s.d. for Danube, and –0.59 s.d. for Mississippi. This indicates LPJ-GUESS underestimates runoff under high-radiation conditions.
- Temperature was the dominant contributor to prediction bias in Krycklan (ATE = –0.29 s.d.) and Danube (–0.35 s.d.), but had a relatively weak effect in Mississippi (–0.018 s.d.).
- Precipitation had the strongest positive causal effect on runoff bias in Danube (ATE = +0.60 s.d.) and Mississippi (ATE = +0.82 s.d.), suggesting LPJ-GUESS likely overestimates runoff response to precipitation in these larger basins.
- The results demonstrate that missing energy-balance processes (e.g., explicit surface/canopy energy balance, snow sublimation, albedo changes) are an important source of model error in LPJ-GUESS, particularly in cold-region catchments.
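For reference, the skill scores quoted above can be written out as follows. NSE and Pearson's R use their standard definitions. The PBIAS sign convention (positive means overestimation) and the NRMSE normalization (RMSE divided by the standard deviation of the observations) are assumptions, since this summary does not state which variants the paper uses. Under that normalization NSE = 1 − NRMSE², which is at least consistent with the Mississippi values reported above (1 − 0.52² ≈ 0.73).

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    err = sum((o - s) ** 2 for o, s in zip(obs, sim))
    var = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - err / var


def pearson_r(obs, sim):
    """Pearson correlation coefficient between observed and simulated series."""
    mo, ms = sum(obs) / len(obs), sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim))
    return cov / (so * ss)


def pbias(obs, sim):
    """Percent bias; assumed convention: positive = simulation too high."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)


def nrmse(obs, sim):
    """RMSE divided by the population s.d. of the observations (assumed)."""
    n = len(obs)
    mean_obs = sum(obs) / n
    rmse = math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / n)
    sd = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / n)
    return rmse / sd


# Toy example: perfect timing but a constant positive offset.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
biased = [o + 3.0 for o in obs]
scores = {
    "NSE": nse(obs, biased),
    "R": pearson_r(obs, biased),
    "PBIAS": pbias(obs, biased),
    "NRMSE": nrmse(obs, biased),
}
```

In the toy example the correlation stays at 1 while NSE falls below zero, meaning the simulation does worse than simply predicting the observed mean. Negative NSE values such as those reported for standalone LPJ-GUESS in Krycklan and the Mississippi should be read in that sense.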
Contributions
- Developed a novel synergistic hybrid eco-hydrological framework that integrates a process-based terrestrial ecosystem model (LPJ-GUESS) with machine learning algorithms and interpretable ML methods (SHAP, Causal Forest).
- Demonstrated substantial improvements in monthly streamflow prediction accuracy and robustness across diverse mid-high latitude catchments compared to standalone process-based and ML models.
- Provided quantitative and causal insights into the structural shortcomings of LPJ-GUESS, specifically identifying the underrepresentation of radiation effects on runoff generation (especially in snow/ice melt processes) and potential over-conversion of rainfall to runoff in larger basins.
- Showed that hybrid models achieve superior performance with simpler and more computationally efficient ML architectures, adapting to catchment characteristics and data availability.
- Bridged correlational feature attribution (SHAP) with causal inference (Causal Forest) to robustly diagnose and quantify the causal impact of climatic drivers on model biases, offering actionable guidance for targeted model development in ecosystem hydrology.
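To make the "average treatment effect in standard deviations" reported above more concrete, the sketch below computes a naive difference-in-means estimate on synthetic data. This is not a causal forest: it ignores confounding and effect heterogeneity, which is what the causal forest analysis in the study is meant to handle. The bias definition (modelled minus observed runoff), the median split into high- and low-radiation months, and all names and values are assumptions made for illustration.

```python
from statistics import mean, median, pstdev


def naive_ate(driver, bias):
    """Difference in mean standardized bias between high- and low-driver months.

    'High' means above the median of the driver (an assumed treatment rule).
    The bias is z-scored so the result is expressed in standard deviations.
    """
    mu, sd = mean(bias), pstdev(bias)
    z = [(b - mu) / sd for b in bias]
    threshold = median(driver)
    high = [zi for d, zi in zip(driver, z) if d > threshold]
    low = [zi for d, zi in zip(driver, z) if d <= threshold]
    return mean(high) - mean(low)


# Synthetic monthly radiation (hypothetical values).
radiation = [15, 30, 70, 120, 170, 210, 220, 190, 140, 90, 50, 20]
# Synthetic bias (modelled minus observed): the model underestimates runoff
# more strongly as radiation increases.
bias = [0.5 - 0.01 * r for r in radiation]

ate = naive_ate(radiation, bias)
print(f"naive ATE of high radiation on standardized bias: {ate:.2f} s.d.")
```

A negative value means the standardized bias is lower in high-radiation months, which under the assumed bias definition corresponds to stronger underestimation of runoff, the same direction as the radiation effects reported for all three catchments.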
Funding
- China Scholarship Council (CSC)
- Stipend from the Department of Physical Geography and Ecosystem Science, Lund University
- Swedish FORMAS (Forskningsråd för hållbar utveckling) mobility grant (2016-01580)
- Villum Young Investigator (Grant No. VIL53048)
- Danish National Research Foundation (Center for Volatile Interactions, grant No. DNRF 168)
- Strategic research areas: MERGE (www.merge.lu.se) and BECC (www.becc.lu.se)
Citation
@article{Zhou2026Optimizing,
author = {Zhou, Hao and Tang, Jing and Olin, Stefan and Guo, Renkui and Miller, Paul},
title = {Optimizing runoff simulation in three mid-high latitude catchments by integrating terrestrial ecosystem modelling, hybrid machine learning, and causal inference},
journal = {Journal of Hydrology: Regional Studies},
year = {2026},
doi = {10.1016/j.ejrh.2025.103085},
url = {https://doi.org/10.1016/j.ejrh.2025.103085}
}
Original Source: https://doi.org/10.1016/j.ejrh.2025.103085