2 Methodology
2.1 Framework
To predict the hydrological response to climate change, four
regional climate models (RCMs), i.e., RSMGE, HadGEM3_RA, RegCM4, and
WRF, are used to drive hydrological models for future runoff
forecasting. Climate projections from CORDEX-East Asia, which were
bias-corrected by linear regression, are used as inputs, and runoff
is the output of the hydrological models. Hydrological predictions
are carried out using four learning methods, i.e., multiple linear
regression (MLR), support vector machine (SVM), artificial neural
network (ANN) and multilayer perceptron (MLP), which are trained
through hydrological simulation. The key forecasting results are daily
and monthly runoff for the period 2021-2050 under two greenhouse gas
emission scenarios, i.e., RCP4.5 and RCP8.5. The framework of this study
is shown in Fig. 1.
2.2 Regional climate modelling
General circulation models (GCMs) are a common method for
forecasting future climate factors. GCMs can predict future climate
over large-scale regions (around 1000 km) across the globe. However,
their forecasting scale is so large that the resolution is inadequate
to represent hydrological processes (Giorgi and Marinucci, 1996). The
Coordinated Regional Climate Downscaling Experiment (CORDEX) is a
framework of the World Climate Research Programme. It aims to assess
the performance of regional climate models (RCMs) through a series of
predictive experiments. Compared with GCMs, RCMs have higher
resolutions (about 25-50 km) and can capture climate characteristics
within regions. Therefore, RCMs can better meet the needs of
hydrological forecasting. The RCMs used in this paper come from the
high-resolution CORDEX-East Asia project, namely RSMGE, HadGEM3_RA,
RegCM4 and WRF. To study runoff changes under different greenhouse gas
emission scenarios, two RCP scenarios are selected, i.e., the
high-emission RCP8.5 scenario and the medium-emission RCP4.5
scenario.
The simulations of climate variables may carry considerable
uncertainty (Cheng et al., 2017; Wu et al., 2019). The biases in the
RCM outputs are corrected by a simple and easy-to-operate method,
i.e., linear regression. The specific steps of bias correction are:
(1) the period in which the climate model simulations and the observed
data overlap is taken as the overall sample for bias correction. The
first 2/3 of the sample is used to calibrate the bias-correction
model, and the remaining 1/3 is used to verify its accuracy; (2) the
RCM simulation data and the observation data in the calibration and
verification samples are sorted in ascending order of value, and a
linear regression model is established on the sorted sequences to fit
the relationship between simulated and observed climate data; (3) the
simulation data of the verification sample are substituted into the
established bias-correction equation to obtain the corrected climate
data; (4) the pre-correction and post-correction climate data are each
compared with the observed data to analyze the effect of the
bias-correction model.
The mean absolute error (MAE) is used as the evaluation index for the
bias-correction model. MAE is calculated as:
\(\text{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_{i}-y_{i}^{*}\right|\) (1)
where \(y_{i}\) is the observed climate data, \(y_{i}^{*}\) is the
simulated climate data, and n is the sample size.
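The four bias-correction steps and the MAE index of Eq. (1) can be illustrated with a minimal Python sketch. It is not the code used in this study: the function names (fit_line, mae, bias_correct) are hypothetical, the regression is assumed to be an ordinary least squares fit of sorted observations on sorted simulations, and the verification sample is sorted before comparison, which is one reading of step (2).

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a * x + b; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def mae(obs, sim):
    """Mean absolute error, Eq. (1)."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def bias_correct(sim, obs):
    """Run steps (1)-(4) on overlapping simulated and observed series.

    Returns the corrected verification data (in ascending order) and the
    MAE before and after correction.
    """
    # (1) first 2/3 for calibration, remaining 1/3 for verification
    split = len(sim) * 2 // 3
    # (2) sort each series independently (assumption: both samples sorted)
    sim_cal, obs_cal = sorted(sim[:split]), sorted(obs[:split])
    sim_ver, obs_ver = sorted(sim[split:]), sorted(obs[split:])
    a, b = fit_line(sim_cal, obs_cal)
    # (3) apply the fitted equation to the verification simulations
    corrected = [a * s + b for s in sim_ver]
    # (4) compare errors before and after correction
    return corrected, mae(obs_ver, sim_ver), mae(obs_ver, corrected)
```

For a simulated series with a purely linear bias, e.g. sim = 1.5 * obs + 2, the fitted line recovers the observations and the post-correction MAE drops to approximately zero.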
2.3 Deep learning
Among existing runoff forecasting methods, MLR is widely used
because of its simple principle and operation (Bauer and Curran, 2005).
In addition to MLR, the machine learning methods SVM and ANN have also
been applied successfully in several recent hydrological forecasting
studies (Asefa et al., 2006; Lin et al., 2010; Pan et al., 2007; Leahy
et al., 2008). SVM is a pattern recognition approach based on
statistical learning theory (Vapnik, 1995) that minimizes prediction
error and structural complexity simultaneously. ANN is a mathematical
model that simulates how the human brain's nervous system processes
complex information (Marcoulides, 2004). MLR, SVM and ANN are
therefore also used for hydrological forecasting in this study for
comparative analysis.
Deep learning is a powerful tool in the field of system simulation and
is widely used in image recognition (Smirnov et al., 2014) and
big-data analytics (Wang et al., 2018). Compared with shallow machine
learning, deep learning transforms the original data features layer by
layer and learns them in a more hierarchical way. MLP (Fig. 2) is a
typical deep learning model with a strong ability to learn and
represent nonlinear relationships among variables. In this study, MLP
is applied to runoff forecasting, and its potential advantages over
traditional machine learning in hydrological forecasting are explored.
The numbers of neurons in the input and output layers equal the
numbers of input and output variables, respectively, and the number of
neurons in each hidden layer is determined by parameter tuning. In
this study, the input layer of the MLP network contains 4 neurons, the
output layer contains 1 neuron, and there are two hidden layers, each
containing 64 nodes.
Commonly used transfer functions include sigmoid, tanh and ReLU.
Compared with the other transfer functions, ReLU effectively
alleviates the vanishing gradient problem, so the MLP in this study
uses ReLU as the transfer function.
Information is propagated through the MLP as:
\(x_{ij}=f_{i}\left(W_{i}X_{i-1}+b_{i-1}\right)\) (2)
where \(x_{ij}\) is the output of node j in layer i, \(f_{i}\) is
the transfer function of layer i, \(W_{i}\) is the weights between
layer i-1 and layer i, \(X_{i-1}\) is the output of layer i-1, and
\(b_{i-1}\) is the bias of layer i-1.
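As a rough illustration of Eq. (2) and the 4-64-64-1 architecture described above, the following Python sketch propagates one sample through the network. The weights are random placeholders rather than trained values, the linear output layer is an assumption (the text names ReLU but does not specify the output transfer function), and the sample values and function names are hypothetical.

```python
import random


def relu(values):
    """ReLU transfer function applied element-wise."""
    return [max(0.0, v) for v in values]


def identity(values):
    """Linear output (assumed; the output transfer function is not stated)."""
    return list(values)


def init_layer(n_in, n_out, rng):
    """Placeholder random weights and zero biases; training would replace them."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(inputs, layers, transfer_functions):
    """Propagate one sample through the network following Eq. (2)."""
    x = inputs
    for (weights, biases), f in zip(layers, transfer_functions):
        # Node j computes f(sum_k W[j][k] * x[k] + b[j]) from the previous output
        pre_activation = [
            sum(w * xk for w, xk in zip(row, x)) + b
            for row, b in zip(weights, biases)
        ]
        x = f(pre_activation)
    return x


rng = random.Random(0)
sizes = [4, 64, 64, 1]  # input layer, two hidden layers, output layer
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
transfer_functions = [relu, relu, identity]

# Hypothetical daily sample: P(t-2), P(t-1), P(t), T(t)
sample = [0.0, 12.5, 3.2, 18.4]
runoff_estimate = forward(sample, layers, transfer_functions)
```

The result is a one-element list holding the untrained runoff estimate. In practice the weights would be fitted by backpropagation in a deep learning library.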
2.4 Hydrological simulation
Based on historical data, the correlation coefficient between each
climatic factor and runoff is calculated (Table 1, Table 2). The
climatic factors with strong correlations are selected as the
inputs.
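A minimal sketch of how such a lagged correlation screening might be carried out is given below. It assumes the Pearson coefficient (the text does not state which coefficient underlies Table 1 and Table 2), and the function names and data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(factor, runoff, lag):
    """Correlate the factor observed `lag` steps earlier with runoff."""
    if lag == 0:
        return pearson(factor, runoff)
    # Pair factor[t - lag] with runoff[t]
    return pearson(factor[:-lag], runoff[lag:])


# Hypothetical series in which runoff echoes precipitation two steps later
precip = [0.0, 5.0, 1.0, 8.0, 2.0, 0.0, 3.0, 7.0, 1.0, 4.0]
runoff = [9.0, 9.0] + precip[:-2]

correlations = {lag: lagged_correlation(precip, runoff, lag) for lag in range(4)}
best_lag = max(correlations, key=correlations.get)
```

In this synthetic case the lag with the highest correlation is two steps, which is the kind of evidence used to decide how many earlier observations to include as inputs.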
The daily results show that precipitation has a stronger effect on
runoff than temperature does. The time lag between precipitation and
river discharge is about two days. Temperature has a weak effect on
runoff, and its time lag is insignificant. Therefore, daily
precipitation observed two days earlier, one day earlier and on the
same day as the daily runoff is chosen as input, together with daily
temperature observed on the same day as the daily runoff.
The monthly results show that the correlation of precipitation with
runoff is still slightly higher than that of temperature, and the
time lag of both is about one month. Therefore, the average monthly
precipitation and temperature observed one month earlier and in the
same month are chosen as inputs for monthly runoff forecasting.
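The daily and monthly input sets described above can be assembled as in the following Python sketch. The function names and the ordering of the four inputs within each sample are assumptions made for illustration.

```python
def daily_samples(precip, temp, runoff):
    """Build (inputs, target) pairs: P(t-2), P(t-1), P(t), T(t) -> Q(t)."""
    samples = []
    # The first two days lack a full precipitation history and are skipped
    for t in range(2, len(runoff)):
        inputs = [precip[t - 2], precip[t - 1], precip[t], temp[t]]
        samples.append((inputs, runoff[t]))
    return samples


def monthly_samples(precip, temp, runoff):
    """Build (inputs, target) pairs: P(m-1), P(m), T(m-1), T(m) -> Q(m)."""
    samples = []
    # The first month has no preceding month and is skipped
    for m in range(1, len(runoff)):
        inputs = [precip[m - 1], precip[m], temp[m - 1], temp[m]]
        samples.append((inputs, runoff[m]))
    return samples
```

Both constructions yield four inputs per sample, which matches the 4-neuron input layer of the MLP.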
In this study, the Pearson correlation coefficient (ρ), Spearman
correlation coefficient (ρs), root mean square error (RMSE), Nash
coefficient (Nash) and root relative squared error (RRSE) are used to
assess the accuracy of the model simulation. They are calculated as:
\(\rho=\frac{\sum_{i=1}^{n}\left(y_{i}-\overline{y}\right)\left(y_{i}^{*}-\overline{y^{*}}\right)}{\sqrt{\sum_{i=1}^{n}\left(y_{i}-\overline{y}\right)^{2}}\cdot\sqrt{\sum_{i=1}^{n}\left(y_{i}^{*}-\overline{y^{*}}\right)^{2}}}\) (3)
\(\rho_{s}=1-\frac{6\sum_{i=1}^{n}d_{i}^{2}}{n\left(n^{2}-1\right)}\) (4)
\(\text{RMSE}=\sqrt{\frac{\sum_{i=1}^{n}\left(y_{i}-y_{i}^{*}\right)^{2}}{n}}\) (5)
\(\text{Nash}=1-\frac{\sum_{i=1}^{n}\left(y_{i}-y_{i}^{*}\right)^{2}}{\sum_{i=1}^{n}\left(y_{i}-\overline{y}\right)^{2}}\) (6)
\(\text{RRSE}=\sqrt{\frac{\sum_{i=1}^{n}\left(y_{i}-y_{i}^{*}\right)^{2}}{\sum_{i=1}^{n}\left(y_{i}-\overline{y}\right)^{2}}}\) (7)
where \(y_{i}\) is the observed runoff, \(y_{i}^{*}\) is the modelled
runoff, \(\overline{y}\) is the average of observed runoff,
\(\overline{y^{*}}\) is the average of modelled runoff, \(d_{i}\) is
the rank difference between observed runoff and modelled runoff, and
n is the sample size.
ρ and ρs reflect the strength of the correlation between simulated
and observed values. The closer their values are to 1, the stronger
the correlation. RMSE reflects the magnitude of the deviation of the
simulated values. The closer its value is to 0, the more accurate the
simulation. Nash and RRSE reflect the prediction error relative to the
variability of the observations. The closer Nash is to 1 and RRSE is
to 0, the smaller the error.
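Eqs. (3)-(7) translate directly into code. The following Python sketch uses only the standard library. The function names are hypothetical, and the Spearman implementation assumes no tied values, as the closed form of Eq. (4) does.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson(obs, sim):
    """Pearson correlation coefficient, Eq. (3)."""
    mo, ms = mean(obs), mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / (math.sqrt(var_o) * math.sqrt(var_s))


def ranks(values):
    """Rank 1 is the smallest value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(obs, sim):
    """Spearman correlation coefficient, Eq. (4)."""
    n = len(obs)
    d2 = sum((ro - rs) ** 2 for ro, rs in zip(ranks(obs), ranks(sim)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def squared_error(obs, sim):
    return sum((o - s) ** 2 for o, s in zip(obs, sim))


def squared_deviation(obs):
    mo = mean(obs)
    return sum((o - mo) ** 2 for o in obs)


def rmse(obs, sim):
    """Root mean square error, Eq. (5)."""
    return math.sqrt(squared_error(obs, sim) / len(obs))


def nash(obs, sim):
    """Nash coefficient, Eq. (6)."""
    return 1 - squared_error(obs, sim) / squared_deviation(obs)


def rrse(obs, sim):
    """Root relative squared error, Eq. (7)."""
    return math.sqrt(squared_error(obs, sim) / squared_deviation(obs))
```

Because Eqs. (6) and (7) share the same ratio, RRSE^2 = 1 - Nash, so the two indices always rank competing models in the same order.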