2.4.4 Extreme learning machine (ELM)
ELM is a feedforward neural network with a single hidden layer (Huang
et al., 2004), which differs from the BP algorithm in that it does not
adjust the hidden-layer weights through backward iteration. The weights
from the input layer to the hidden layer are randomly assigned
(Huang et al., 2006) (see Figure 5 (d)). Whereas in the BP algorithm all
inter-layer weights must be solved iteratively by gradient descent, in
ELM the weights between the input layer and the hidden layer are fixed
once assigned, so for a given input feature vector the remaining output
weights can be determined without iteration.
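This one-shot training scheme can be illustrated with a minimal sketch (not the paper's implementation; the layer size, tanh activation, and test function are arbitrary choices): random input weights are drawn once, and only the output weights are solved, in closed form via a Moore–Penrose pseudoinverse.

```python
import numpy as np

def elm_fit(X, y, n_hidden=30, seed=0):
    """One-shot ELM training: random input weights, closed-form output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # input->hidden weights, never iterated
    b = rng.normal(size=n_hidden)                # hidden biases, also fixed at random
    H = np.tanh(X @ W + b)                       # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                 # output weights via Moore-Penrose pseudoinverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# usage: regress a noisy nonlinear target
X = np.linspace(-1, 1, 200).reshape(-1, 1)
y = np.sin(3 * X[:, 0]) + 0.05 * np.random.default_rng(1).normal(size=200)
W, b, beta = elm_fit(X, y)
pred = elm_predict(X, W, b, beta)
```

Because no gradient iterations are involved, training reduces to a single linear solve, which is the source of ELM's speed advantage over BP.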
2.4.5 Extreme gradient boosting (XGB)
The extreme gradient boosting (XGB) algorithm reduces the error of the
previous prediction step by continuously generating new regression
trees, gradually narrowing the gap between the true and predicted
values and thereby improving the prediction accuracy (Chen and
Guestrin, 2016) (see Figure 5 (e)). It improves the generalization
ability and computational efficiency of the original gradient boosting
decision tree (GBDT) algorithm by introducing regularization terms and
parallel computing techniques.
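The residual-fitting idea behind gradient boosting can be sketched with depth-one regression trees (stumps). The toy loop below illustrates plain boosting with shrinkage only; it deliberately omits XGBoost's regularization terms and parallelism, and all sizes and data are invented.

```python
import numpy as np

def fit_stump(X, r):
    """Best single split (feature, threshold) minimizing squared error on residuals r."""
    best_sse, best = np.inf, None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            vl, vr = r[left].mean(), r[~left].mean()
            sse = ((r - np.where(left, vl, vr)) ** 2).sum()
            if sse < best_sse:
                best_sse, best = sse, (j, t, vl, vr)
    return best

def boost(X, y, n_trees=50, lr=0.1):
    """Each new stump fits the current residuals, shrinking the remaining error."""
    f = np.full(len(y), y.mean())            # start from the mean prediction
    trees = []
    for _ in range(n_trees):
        r = y - f                            # residuals = negative gradient of squared loss
        j, t, vl, vr = fit_stump(X, r)
        f = f + lr * np.where(X[:, j] <= t, vl, vr)
        trees.append((j, t, vl, vr))
    return f, trees

# usage on a toy nonlinear target
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(100, 1))
y = np.sin(X[:, 0])
f, trees = boost(X, y)
```

Each round fits the gap left by all previous trees, which is exactly the "gradually narrowing the gap between true and predicted values" described above.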
2.4.6 Radial-Basis Function (RBF)
A Radial-Basis Function (RBF) network can approximate any nonlinear
function and can handle systems whose regularity is difficult to
analyze (Majnooni et al., 2023) (see Figure 5 (f)). Compared with BP,
an RBF network has exactly one hidden layer, whereas BP places no limit
on the number of hidden layers. BP performs a global approximation of
the nonlinear mapping, while RBF performs a local approximation and
therefore trains faster.
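A minimal sketch of this structure, with Gaussian basis functions centred on a fixed grid and only the linear output weights trained by least squares (the grid, width, and test function are arbitrary illustrations, not the paper's setup):

```python
import numpy as np

def rbf_design(X, centers, sigma):
    """Matrix of Gaussian radial-basis responses, one column per center."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def rbf_fit(X, y, centers, sigma=0.1):
    """Only the linear output weights are trained, via least squares."""
    Phi = rbf_design(X, centers, sigma)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

# usage: approximate a nonlinear function with local basis functions
X = np.linspace(0, 1, 100).reshape(-1, 1)
y = np.sin(2 * np.pi * X[:, 0])
centers = np.linspace(0, 1, 15).reshape(-1, 1)
w = rbf_fit(X, y, centers)
pred = rbf_design(X, centers, 0.1) @ w
```

Each Gaussian responds only near its center, which is the "local approximation" property contrasted with BP's global one.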
[Insert Figure 5]
Figure 5 Principles of the six machine learning algorithms
2.5 Swarm intelligence optimization
algorithm
Although BP neural networks can quickly adapt to various problems, they
are prone to falling into local optima and overfitting, which affects
the prediction results, whereas swarm intelligence optimization
algorithms have demonstrated their effectiveness in solving complex
optimization problems (Cai et al., 2018). Therefore, we apply swarm
intelligence optimization algorithms to seek both global and local
optima. These algorithms generate initial populations and use heuristic
rules to search the parameter space, reducing the risk of overfitting
by optimizing the weight parameters and thereby improving the accuracy
of FDC prediction.
2.5.1 Particle swarm optimization and grey wolf optimizer algorithms
The particle swarm optimization (PSO) algorithm originates from models
of the social interaction of animal groups searching for food (Poli et
al., 2007), in which birds are regarded as particles, and all particle
information within the group is shared to find the optimal strategy
(see Figure 6 (a)). The search is assumed to be limited to a fixed
d-dimensional space containing n particles. During the iteration
process, particles track the “individual extremum” and the “global
extremum” by changing their own velocity and position.
The formulas for particle velocity and position are as follows:

$$v_i^{t+1} = w v_i^t + c_1 r_1 \left( p_i^t - x_i^t \right) + c_2 r_2 \left( g^t - x_i^t \right)$$

$$x_i^{t+1} = x_i^t + v_i^{t+1}$$

where $v_i^t$ and $v_i^{t+1}$ represent the velocities of particle $i$
at iterations $t$ and $t+1$, having both magnitude and direction;
similarly, $x_i^t$ and $x_i^{t+1}$ are the positions of that particle;
$p_i^t$ is the individual extremum and $g^t$ is the global extremum,
i.e., the currently found optimal solution in the particle swarm;
$c_1$ and $c_2$ denote acceleration coefficients; $r_1$ and $r_2$ are
two uniformly distributed random numbers between 0 and 1; $w$ is the
inertia weight.
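These update rules can be sketched as a toy global-best PSO; the sphere test function, swarm size, and coefficient values below are arbitrary illustrations, not the configuration used in this study.

```python
import numpy as np

def pso(f, dim=2, n=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over [-5, 5]^dim with a basic global-best particle swarm."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n, dim))                 # particle positions
    v = np.zeros((n, dim))                           # particle velocities
    pbest = x.copy()                                 # individual extrema
    pval = np.apply_along_axis(f, 1, x)
    g = pbest[pval.argmin()]                         # global extremum
    for _ in range(iters):
        r1, r2 = rng.random((n, dim)), rng.random((n, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        val = np.apply_along_axis(f, 1, x)
        better = val < pval                          # update individual extrema
        pbest[better], pval[better] = x[better], val[better]
        g = pbest[pval.argmin()]                     # update global extremum
    return g, float(pval.min())

best, fbest = pso(lambda z: float((z ** 2).sum()))   # sphere test function
```

The inertia weight `w` balances exploration against exploitation: each particle is pulled both toward its own best position and toward the swarm's best.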
The grey wolf optimizer (GWO) boasts several advantages, including
simplicity in structure and parameters and a capability for adaptive
adjustment. These features render it an effective tool for striking a
balance between local and global optimization (Dehghani et al., 2019;
Seifi and Soroush, 2020). It includes three main stages: searching for,
encircling, and attacking prey (Mirjalili et al., 2014). Figure 6 (b)
shows the working structure. The α, β, δ and remaining ω wolves form
the social hierarchy, with the level of dominance descending from α to
ω; α is the most powerful wolf category, and the α, β and δ wolves
guide the remaining wolves to search towards their targets. GWO can
bypass local optimal stagnation and enhance the global optimal
convergence ability (Adnan et al., 2023). The hunting behavior of grey
wolves encircling their prey is quantified using the following position
update formulas (Maroufpoor et al., 2020; Zhou et al., 2021):
$$D = \left| C \cdot X_p(t) - X(t) \right|$$

$$X(t+1) = X_p(t) - A \cdot D$$

where $X_p(t)$ and $X(t)$ represent the positions of the prey and the
grey wolf respectively; $t$ is the current iteration number; $D$ is the
distance between the wolf and its prey; $X(t+1)$ is the wolf's position
at iteration $t+1$; $C$ and $A$ are coefficient vectors. The first
equation gives the distance between the grey wolf and its prey, and the
second gives the grey wolf's updated position.
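A compact sketch of this encircling-and-attacking update is given below; the control parameter `a` decreasing linearly from 2 to 0 follows the standard GWO formulation, while the test function, population size, and bounds are arbitrary illustrations.

```python
import numpy as np

def gwo(f, dim=2, n=20, iters=200, lo=-5.0, hi=5.0, seed=0):
    """Minimize f with a basic grey wolf optimizer guided by the top three wolves."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lo, hi, (n, dim))                # wolf positions
    for t in range(iters):
        fit = np.apply_along_axis(f, 1, X)
        order = fit.argsort()
        alpha, beta, delta = X[order[0]], X[order[1]], X[order[2]]  # hierarchy
        a = 2 - 2 * t / iters                        # decreases linearly from 2 to 0
        Xnew = np.zeros_like(X)
        for leader in (alpha, beta, delta):
            A = 2 * a * rng.random((n, dim)) - a     # coefficient vector A
            C = 2 * rng.random((n, dim))             # coefficient vector C
            D = np.abs(C * leader - X)               # distance to the leader
            Xnew += (leader - A * D) / 3.0           # average the three guided moves
        X = Xnew
    fit = np.apply_along_axis(f, 1, X)
    return X[fit.argmin()], float(fit.min())

best, fbest = gwo(lambda z: float((z ** 2).sum()))   # sphere test function
```

While `a` is large, |A| can exceed 1 and wolves explore away from the leaders; as `a` shrinks, the pack converges on the prey, which is how GWO balances global and local search.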
[Insert Figure 6]
Figure 6 The principles of the swarm intelligence optimization
algorithms: PSO and GWO
2.6 Evaluation of model
performance
The performance evaluation criteria R-squared ($R^2$), root mean
squared error (RMSE), and Nash–Sutcliffe efficiency (NSE) (Nash and
Sutcliffe, 1970) were used to evaluate the ML performance:

$$R^2 = \left( \frac{\sum_{i=1}^{n} (O_i - \bar{O})(P_i - \bar{P})}{\sqrt{\sum_{i=1}^{n} (O_i - \bar{O})^2} \sqrt{\sum_{i=1}^{n} (P_i - \bar{P})^2}} \right)^2$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (P_i - O_i)^2}$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n} (O_i - P_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2}$$

where $P_i$ and $O_i$ are the predicted and the observed values;
$\bar{O}$ and $\bar{P}$ are the average values of the observed and
predicted series; $n$ represents the sample number.
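These three criteria are straightforward to compute; a small sketch with invented observed and predicted values (here $R^2$ is taken as the squared Pearson correlation, one common convention):

```python
import numpy as np

def rmse(obs, pred):
    """Root mean squared error between observed and predicted series."""
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def r_squared(obs, pred):
    """R^2 as the squared Pearson correlation coefficient."""
    return float(np.corrcoef(obs, pred)[0, 1] ** 2)

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    return float(1 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2))

# illustrative values, not data from this study
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
pred = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
```

Note that NSE penalizes bias while the correlation-based $R^2$ does not, which is why both are usually reported together in hydrological studies.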
2.7 Explanation of ML’s
output
SHapley Additive exPlanations (SHAP) is based on the Shapley value from
game theory; it accounts for the mutual influence between all features
and therefore provides more accurate and comprehensive explanations.
For each prediction sample, the model generates a corresponding
prediction value, and the SHAP value represents the contribution of
each feature to that prediction (Lundberg and Lee, 2017). The Shapley
value is interpreted as an additive feature attribution method.
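To make the additive attribution concrete, the sketch below computes exact Shapley values for a tiny hypothetical two-feature model by enumerating all coalitions (practical SHAP implementations approximate this efficiently; the model, sample, and baseline here are invented purely for illustration):

```python
import numpy as np
from itertools import combinations
from math import factorial

def exact_shapley(model, x, baseline):
    """Exact Shapley values for one sample: features outside a coalition
    are replaced by their baseline values."""
    n = len(x)
    phi = np.zeros(n)
    def value(S):
        z = baseline.copy()
        for j in S:
            z[j] = x[j]
        return model(z)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += w * (value(S + (i,)) - value(S))   # weighted marginal contribution
    return phi

# hypothetical nonlinear model of two "basin features"
model = lambda z: 2.0 * z[0] + 3.0 * z[1] + z[0] * z[1]
x = np.array([1.0, 1.0])
base = np.array([0.0, 0.0])
phi = exact_shapley(model, x, base)
```

The additivity property holds by construction: the Shapley values sum exactly to the difference between the prediction for the sample and the baseline prediction, with the interaction term split evenly between the two features.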
3 DATABASE DESCRIPTION AND ANALYSIS
3.1 Selection of input
variables
Though multiple studies have addressed the prediction of flow rates in
regions without historical flow data via FDCs, the accuracy of
prediction and the applicability across regions remain unsettled.
Therefore, 224 × 3 sets of test data were used in this paper.
All basin characteristics selected as input variables in this paper
(Table 1) are divided into two categories: mutable variables and
immutable variables. Characteristics that may change over time (such as
the aridity index (AI), precipitation, etc.) are referred to as
“mutable” variables. Characteristics that are fixed over time, for
example elevation, drainage area, reservoir control area, average slope
(dimensionless), baseflow index (dimensionless), and topographic
wetness index (dimensionless), are regarded as “immutable” variables.
The impact of basin characteristics on the FDC appears to be empirical
(Elena Ridolf et al., 2020). By assessing the value of each parameter,
15 unique corresponding flow percentile values can be obtained.
Table 1 Basin
characteristic parameters selected for basins of hydrometric stations
[Insert Table 1]
3.2 Correlation and description of basin characteristics
The Spearman correlation coefficients (see Figure 7) show that BFI_mean
has a significant correlation with the flow data, which comprise 15
percentiles corresponding to exceedance probabilities. Smax, AP_max,
DP_std, ATP and SWS all exhibit negative correlations with the
streamflow percentiles. From Figure 7 (c), the relationship between DA
and the streamflow percentile shifts from negative to positive as the
percentile rises (i.e., as the flow rate and the return period
decrease), while the relationship between wind and the flow percentile
shows the opposite trend. Moreover, BFI_mean correlates more strongly
with low flows (the bottom part of the FDC) than with high flows.
[Insert Figure 7]
Figure 7 Correlation of input and output variables. (a) shows the
correlation between the selected characteristic variables and Q0.3.
(b) shows the correlation between all the variables. (c) shows the
correlation between the selected characteristic variables and the flow
rates comprising 15 streamflow percentiles corresponding to exceedance
probabilities.
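A correlation screen like the one summarized in Figure 7 can be reproduced with a rank correlation matrix; the sketch below uses synthetic stand-in data (the monotone relation between the mock BFI_mean and Q0.3 values is fabricated purely to illustrate the computation, not drawn from the study's 224-station dataset).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
bfi = rng.random(200)                          # stand-in for BFI_mean
q03 = bfi ** 3 + 0.01 * rng.normal(size=200)   # monotone relation plus small noise
df = pd.DataFrame({"BFI_mean": bfi, "Q0.3": q03})

# Spearman works on ranks, so it captures monotone but nonlinear relations
rho = df.corr(method="spearman").loc["BFI_mean", "Q0.3"]
```

Because Spearman operates on ranks, it detects the strong monotone link despite the cubic nonlinearity, which is why it suits the nonlinearly distributed basin characteristics described below.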
[Insert Figure 8]
Figure 8 Distribution of the seven variables with high correlation
with Q0.3
As shown in Figure 8, scatter matrices are further used to display the
data distribution features of these input variables. Given the large
size of the full matrix, only a small portion is shown here,
selectively analyzing the variables (BFI_mean, Smax, ATP, DP_std,
AP_max, SWS) that correlate strongly with Q0.3 in Figure 7. The
frequency distributions of these input variables are shown on the
diagonal. The upper-right section shows scatter plots of the input
basin characteristics; it can be observed that ATP and DP_std follow an
approximately linear relationship. Nevertheless, the variables
generally exhibit nonlinear distributions, so simple linear and
nonlinear regression models cannot precisely describe the distribution
features of the input variables and their impacts on the output
variables. The lower-left section displays the probability density of
these input variables, with darker colors indicating a higher
probability of data occurring. For instance, ATP is concentrated near
500 when BFI_mean takes the value 0.5. To fully evaluate the impacts of
these variables on the predicted streamflow percentiles, SHAP was
utilized in section 4.4 to increase the interpretability of the ML
models and disclose the underlying mechanisms.
4 RESULTS AND DISCUSSIONS
4.1 Parametric analysis
[Insert Figure 9]
Figure 9