Estimating parameters for distributed hydrological models is a challenging and long-studied task. Parameter transfer functions, which define model parameters as functions of geophysical properties of a catchment, might improve the calibration procedure, increase process realism, and enable prediction in ungauged areas. We present Function Space Optimization (FSO), a symbolic regression method for estimating parameter transfer functions for distributed hydrological models. FSO is based on the idea of transferring the search for mathematical expressions into a continuous vector space that can be used for optimization. This is accomplished by using a text-generating neural network with a variational autoencoder architecture that can learn to compress the information of mathematical functions. To evaluate the performance of FSO, we conducted a case study using a parsimonious hydrological model and synthetic discharge data. The case study consisted of two FSO applications: single-criteria FSO, where only discharge was used for optimization, and multi-criteria FSO, where additional spatiotemporal observations of model states were used for transfer function estimation. The results show that FSO is able to estimate transfer functions correctly or approximate them sufficiently. We observed a reduced fit of the parameter density functions resulting from the inferred transfer functions for less sensitive model parameters. For those, it was sufficient to estimate functions resulting in parameter distributions with approximately the same mean parameter values as the real transfer functions. The results of the multi-criteria FSO showed that using multiple spatiotemporal observations for optimization increased the quality of estimation considerably.
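To make the notion of a parameter transfer function concrete, the following minimal sketch maps soil properties of individual grid cells to a distributed parameter field. The functional form, the coefficient values, and the names `transfer_function`, `distribute_parameter`, `sand`, and `clay` are illustrative assumptions, not the functions estimated in the study.

```python
import math

# Hypothetical transfer function: maps the soil properties of one grid cell
# to a model parameter (e.g. a conductivity-like quantity). The exponential
# form and the coefficients are assumptions made for illustration only.
def transfer_function(sand, clay, coeffs):
    a, b, c = coeffs
    return math.exp(a + b * sand + c * clay)

def distribute_parameter(cells, coeffs):
    """Apply one global transfer function to every grid cell."""
    return [transfer_function(cell["sand"], cell["clay"], coeffs) for cell in cells]

# Three hypothetical grid cells with sand and clay fractions.
cells = [
    {"sand": 0.60, "clay": 0.10},
    {"sand": 0.30, "clay": 0.40},
    {"sand": 0.10, "clay": 0.55},
]

# Three global coefficients describe the whole parameter field.
coeffs = (1.0, 2.0, -3.0)
param_field = distribute_parameter(cells, coeffs)
```

In this sketch, calibration would adjust the three global coefficients rather than one parameter value per cell; a method such as FSO additionally searches over the functional form itself.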
FSO is a symbolic regression method that allows for automatic estimation of the structure and parameterization of transfer functions from catchment data. The FSO method transforms the search for an optimal transfer function into a continuous optimization problem using a text-generating neural network (variational autoencoder). mHM is a widely applied distributed hydrological model that uses transfer functions for all its parameters. For this study, we estimate transfer functions for the parameters saturated hydraulic conductivity and field capacity. To avoid the influence of parameter equifinality, the remaining mHM parameter values are optimized simultaneously. The study domain consists of 229 basins distributed across Germany, including 7 major basins for training and 222 smaller basins for validation. Five years of data are used for training and 35 years for validation. By validating the estimated transfer functions in a set of validation basins in a different time period, we can examine the influence of the FSO-estimated transfer functions on model performance, scalability, and transferability. We find that transfer functions estimated by FSO lead to robust performance when applied in an ungauged setting. The median Kling-Gupta efficiency (KGE) of the validation basins in the validation time period is 0.73, while the median KGE of the 7 training basins in the training period is 0.8. These results look promising, especially since we are only using 5 years of training data, and show the general applicability of FSO for distributed hydrological models.
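The reported medians summarize one KGE value per basin. The abstract does not state which KGE variant was used; the sketch below assumes the original formulation, KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), with r the linear correlation, alpha the ratio of standard deviations, and beta the ratio of means of simulated and observed discharge. The discharge series and the function names `kge` and `median_kge` are hypothetical.

```python
import math
import statistics

def kge(obs, sim):
    """Kling-Gupta efficiency (original formulation, assumed variant)."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    std_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / n)
    std_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim) / n)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)) / n
    r = cov / (std_obs * std_sim)   # linear correlation
    alpha = std_sim / std_obs       # variability ratio
    beta = mean_sim / mean_obs      # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def median_kge(basins):
    """Median of per-basin KGE values; basins is a list of (obs, sim) pairs."""
    return statistics.median(kge(obs, sim) for obs, sim in basins)

# Hypothetical discharge series for illustration only.
obs = [1.0, 2.0, 4.0, 3.0, 2.0]
perfect = list(obs)
doubled = [2.0 * q for q in obs]
```

A perfect simulation yields KGE = 1, while a simulation that doubles every observed value keeps r = 1 but has alpha = beta = 2, giving 1 - sqrt(2).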
Parameter estimation is one of the most challenging tasks in large-scale distributed modeling because of the high dimensionality of the parameter space. Relating model parameters to catchment/landscape characteristics reduces the number of parameters, enhances physical realism, and allows the transfer of hydrological model parameters in time and space. This study presents the first large-scale application of automatic parameter transfer function (TF) estimation for a complex hydrological model. The Function Space Optimization (FSO) method can automatically estimate TF structures and coefficients for distributed models. We apply FSO to the mesoscale Hydrologic Model (mHM, mhm-ufz.org), which is the only available distributed model that includes a priori defined TFs for all its parameters. FSO is used to estimate new TFs for the parameters “saturated hydraulic conductivity” and “field capacity”, which both influence a range of hydrological processes. The setup of mHM from a previous study serves as a benchmark. The estimated TFs resulted in predictions in 222 validation basins with a median Nash-Sutcliffe efficiency (NSE) of 0.68, showing that even with 5 years of calibration data, high performance in ungauged basins can be achieved. The performance is similar to the benchmark results, showing that the automatically estimated TFs can achieve results comparable to TFs that were developed over years using expert knowledge. In summary, the findings present a step towards automatic TF estimation of model parameters for distributed models.
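For readers unfamiliar with the metric, the NSE compares the squared simulation error with the variance of the observations: NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2). The sketch below is a plain implementation of that standard definition; the discharge values are hypothetical and not taken from the study.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the skill
    of always predicting the observed mean."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((o - s) ** 2 for o, s in zip(obs, sim))
    obs_variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / obs_variance

# Hypothetical discharge series for illustration only.
obs = [1.0, 2.0, 4.0, 3.0, 2.0]
mean_only = [sum(obs) / len(obs)] * len(obs)
```

Under this definition, a simulation that only reproduces the observed mean scores 0, so a median of 0.68 across ungauged validation basins indicates skill well beyond that baseline.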
Typical applications of process- or physically-based models aim to gain a better process understanding or provide the basis for a decision-making process. To adequately represent the physical system, models should include all essential processes. However, model errors can still occur. Apart from large systematic observation errors, simplified, misrepresented, inadequately parametrized, or missing processes are potential sources of errors. This study presents a set of methods and a proposed workflow for analyzing errors of process-based models as a basis for relating them to process representations. The evaluated approach consists of three steps: (i) training a machine learning (ML) error model using the input data of the process-based model and other available variables, (ii) estimating local explanations (i.e., contributions of each variable to an individual prediction) for each predicted model error using SHapley Additive exPlanations (SHAP) in combination with principal component analysis, and (iii) clustering the SHAP values of all predicted errors to derive groups with similar error generation characteristics. By analyzing these groups of different error-variable associations, hypotheses on error generation and corresponding processes can be formulated. This can ultimately lead to improvements in process understanding and prediction. The approach is applied to the process-based stream water temperature model HFLUX in a case study modelling an alpine stream in the Canadian Rocky Mountains. By using available meteorological and hydrological variables as inputs, the applied ML model is able to predict model residuals. Clustering of SHAP values results in three distinct error groups that are mainly related to shading and vegetation-emitted longwave radiation. Model errors are rarely random and often contain valuable information.
Assessing model error associations is ultimately a way of enhancing trust in implemented processes and of providing information on potential areas of improvement to the model.
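The core of steps (ii) and (iii) can be illustrated with a toy example. The sketch below computes exact Shapley values by enumerating feature orderings (the quantity that SHAP estimates efficiently) for a hand-written stand-in error model, then groups predictions by their dominant contributing variable. Everything here is an assumption for illustration: the error model, its coefficients, the feature names, and the grouping rule, which is a simple stand-in for the clustering step and omits the principal component analysis. It does not use the SHAP library and is not the study's implementation.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution of each feature
    over all orderings; features not yet added keep their baseline value."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]
            value = model(current)
            phi[i] += value - previous
            previous = value
    return [p / len(orderings) for p in phi]

# Hypothetical stand-in for a trained ML error model: predicts the residual
# of the process-based model from three illustrative input variables.
def error_model(features):
    shading, air_temp, discharge = features
    return (0.8 * shading + 0.1 * air_temp - 0.05 * discharge
            + 0.2 * shading * air_temp)

def group_by_dominant_feature(explanations, names):
    """Stand-in for clustering: group samples by the variable with the
    largest absolute contribution to the predicted error."""
    groups = {}
    for index, phi in enumerate(explanations):
        dominant = max(range(len(phi)), key=lambda i: abs(phi[i]))
        groups.setdefault(names[dominant], []).append(index)
    return groups

names = ["shading", "air_temp", "discharge"]
baseline = [0.0, 0.0, 0.0]
samples = [[1.0, 0.5, 1.0], [0.1, 3.0, 0.5], [0.0, 0.2, 6.0]]
explanations = [shapley_values(error_model, x, baseline) for x in samples]
groups = group_by_dominant_feature(explanations, names)
```

For each sample, the contributions sum to the difference between the predicted error and the baseline prediction, which is what allows a predicted error to be attributed to individual variables. Exact enumeration scales factorially with the number of features, so it is only practical for small examples like this one.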