
Learning from mistakes - Assessing the performance and uncertainty in process-based models
  • Moritz Feigl,
  • Benjamin Roesky,
  • Mathew Herrnegger,
  • Karsten Schulz,
  • Masaki Hayashi
Moritz Feigl
University of Natural Resources and Life Sciences Vienna

Corresponding Author: moritz.feigl@boku.ac.at

Benjamin Roesky
BGC Engineering Inc
Mathew Herrnegger
University of Natural Resources and Life Sciences Vienna
Karsten Schulz
University of Natural Resources and Life Sciences Vienna
Masaki Hayashi
University of Calgary


Typical applications of process- or physically-based models aim to improve process understanding or to provide the basis for decision making. To adequately represent the physical system, models should include all essential processes. However, model errors can still occur. Apart from large systematic observation errors, potential sources of error are simplified, misrepresented, inadequately parametrized or missing processes. This study presents a set of methods and a proposed workflow for analyzing errors of process-based models as a basis for relating them to process representations. The evaluated approach consists of three steps: (i) training a machine learning (ML) error model using the input data of the process-based model and other available variables, (ii) estimating local explanations (i.e., the contribution of each variable to an individual prediction) for each predicted model error using SHapley Additive exPlanations (SHAP) in combination with principal component analysis, and (iii) clustering the SHAP values of all predicted errors to derive groups with similar error generation characteristics. By analyzing these groups of different error-variable associations, hypotheses on error generation and the corresponding processes can be formulated. This can ultimately lead to improvements in process understanding and prediction. The approach is applied to the process-based stream water temperature model HFLUX in a case study modelling an alpine stream in the Canadian Rocky Mountains. Using available meteorological and hydrological variables as inputs, the applied ML model is able to predict model residuals. Clustering of SHAP values results in three distinct error groups that are mainly related to shading and vegetation-emitted longwave radiation. Model errors are rarely random and often contain valuable information. Assessing model error associations is ultimately a way of enhancing trust in implemented processes and of providing information on potential areas of improvement to the model.
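The three-step workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the data are synthetic, a linear least-squares model stands in for the ML error model, the SHAP values use the closed form that holds for a linear model with features treated as independent, PCA is applied to the SHAP matrix (the abstract does not say where PCA enters the pipeline), and a plain k-means with k = 3 stands in for the clustering step. The helper `kmeans` and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical, not from the HFLUX case study):
# three driver variables and the residuals of a process-based model.
n = 300
X = rng.normal(size=(n, 3))
residuals = 1.5 * X[:, 0] - 0.5 * X[:, 2] + rng.normal(scale=0.1, size=n)

# Step (i): fit an error model that predicts residuals from the inputs.
# Assumption: a linear least-squares fit replaces the ML model of the study.
A = np.column_stack([X, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, residuals, rcond=None)
weights = coef[:-1]
predicted = A @ coef

# Step (ii): local explanations. For a linear model with features treated
# as independent, the SHAP value of feature j for sample i is
# w_j * (x_ij - mean_j); the base value is the mean prediction.
shap_values = weights * (X - X.mean(axis=0))
base_value = predicted.mean()

# Assumption: PCA is applied to the SHAP matrix (via SVD), keeping two
# components as the space in which errors are grouped.
centered = shap_values - shap_values.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ vt[:2].T


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    local_rng = np.random.default_rng(seed)
    centers = points[local_rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels


# Step (iii): cluster the explanations into groups of similar error generation.
labels = kmeans(scores, k=3)

# Mean SHAP value per variable in each group hints at which inputs drive
# the errors of that group.
for c in np.unique(labels):
    print(c, shap_values[labels == c].mean(axis=0).round(3))
```

In a real application the linear stand-in would be replaced by the chosen ML model and a SHAP estimator suited to it; the additivity property checked by `base_value + shap_values.sum(axis=1)` matching `predicted` is what makes the per-variable contributions comparable across samples before clustering.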
13 Aug 2021 Submitted to Hydrological Processes
14 Aug 2021 Submission Checks Completed
14 Aug 2021 Assigned to Editor
14 Aug 2021 Reviewer(s) Assigned
22 Oct 2021 Review(s) Completed, Editorial Evaluation Pending
13 Nov 2021 Editorial Decision: Revise Minor
10 Dec 2021 1st Revision Received
11 Dec 2021 Submission Checks Completed
11 Dec 2021 Assigned to Editor
11 Dec 2021 Reviewer(s) Assigned
11 Dec 2021 Review(s) Completed, Editorial Evaluation Pending
08 Feb 2022 Editorial Decision: Accept
Feb 2022 Published in Hydrological Processes volume 36 issue 2. 10.1002/hyp.14515