Figure 13 Comparison of predicted and observed Q90
[Insert Figure 14]
Figure 14 Comparison of predicted and observed Q99.7
[Insert Figure 15]
Figure 15 Performance of 8 ML models on the training sets
[Insert Figure 16]
Figure 16 Performance of 8 ML models on the testing sets
In this paper, the predicted and observed values of 15 streamflow percentiles of the FDCs obtained from the eight models are compared, and the predicted values agree well with the observed values. Figures 10 to 14 mainly present the prediction results for five key streamflow percentiles.
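Each streamflow percentile compared above is a point on the empirical FDC: Q_p is the flow equalled or exceeded p % of the time, so Q5 lies in the high-flow (upper) tail and Q99.7 in the low-flow (lower) tail. As a minimal sketch, assuming a daily flow record and NumPy's default linear-interpolation percentile (the plotting position used in the paper is not stated in this section), the quantiles can be read off as follows; the flow series and the list of percentiles are hypothetical.

```python
import numpy as np

def flow_quantiles(daily_flow, exceedance_percents):
    """Return {p: Q_p}, where Q_p is the flow exceeded about p % of the time."""
    q = np.asarray(daily_flow, dtype=float)
    # A flow exceeded p % of the time is the (100 - p)-th percentile of the record
    return {p: float(np.percentile(q, 100.0 - p)) for p in exceedance_percents}

# Hypothetical flow record and percentiles, for illustration only
flows = np.arange(1.0, 101.0)
fdc = flow_quantiles(flows, [5, 50, 90, 99.7])
print(fdc)
```

Because the FDC is non-increasing in exceedance probability, the returned quantiles always satisfy Q5 ≥ Q50 ≥ Q90 ≥ Q99.7.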
Overall, the ratio of predicted to observed values remains stable around 1, and R² is close to 1. Neural networks generalize better than the other machine learning algorithms, as evidenced by their higher predictive accuracy on the testing set. Every model predicts the upper tail of the FDC more accurately than the lower tail, with Q5 having the highest predictive accuracy and Q99.7 the lowest. The discrepancy between observed and predicted values may be attributed to the random-like behavior of hydrological phenomena, a conclusion also reached in related literature (Montanari and Koutsoyiannis, 2012). Among the eight ML models on the testing set, ELM and RBF perform worse, probably because of the simplicity of their single-layer network structure. The XGB, PSO-BP and GWO-BP models predict considerably better: these three models perform well on both the training and testing sets, with R² greater than 0.8 on both sets across the streamflow percentiles. The predictive performance of a model should not be evaluated by a single metric alone. Whereas scatter plots can display only the relationship between individual indicators (Choubin et al., 2018), the Taylor diagram integrates three evaluation metrics, namely the correlation coefficient, the centered root-mean-square difference, and the standard deviation, through the cosine relationship among them, so that predictive performance can be evaluated from several perspectives (Figures 15 and 16). In the diagram, the distance between a model's point and the “observed” reference point on the x-axis equals the centered root-mean-square difference, so the closer a model's point lies to the reference point, the better its predictions agree with the observations in terms of both correlation and variability.
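The cosine relationship mentioned above is the law of cosines E'² = σ_m² + σ_o² − 2σ_mσ_oR, where σ_m and σ_o are the standard deviations of the predictions and observations, R is their correlation coefficient, and E' is the centered root-mean-square difference. A minimal sketch that computes these quantities and a model's Taylor-diagram coordinates, assuming population standard deviations and using hypothetical data:

```python
import numpy as np

def taylor_statistics(observed, predicted):
    """Standard deviations, correlation and centered RMS difference for a Taylor diagram."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    std_obs, std_pred = obs.std(), pred.std()  # population standard deviations (ddof=0)
    corr = float(np.corrcoef(obs, pred)[0, 1])
    # Centered RMS difference: RMS of the difference between the two anomaly series
    crmsd = float(np.sqrt(np.mean(((pred - pred.mean()) - (obs - obs.mean())) ** 2)))
    return float(std_obs), float(std_pred), corr, crmsd

def taylor_point(std_pred, corr):
    """Cartesian position of a model point: radius = std, angle = arccos(correlation)."""
    angle = np.arccos(np.clip(corr, -1.0, 1.0))
    return std_pred * np.cos(angle), std_pred * np.sin(angle)

# Hypothetical observed and predicted quantiles, for illustration only
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.1, 1.9, 3.2, 3.8, 5.3]
s_o, s_p, r, e = taylor_statistics(obs, pred)
x, y = taylor_point(s_p, r)
# By the law of cosines, the model point lies exactly at distance e
# from the "observed" reference point (s_o, 0) on the x-axis
print(round(e, 4), round(float(np.hypot(x - s_o, y)), 4))
```

This is why a single point on the diagram summarizes all three metrics at once: its radius gives the standard deviation, its angle the correlation, and its distance to the reference point the centered RMS difference.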
[Insert Figure 17]
Figure 17 Comparison of predicted values obtained by PSO-BP and GWO-BP with observed data (testing sets)
For the training sets, the RBF and ELM models perform poorly, while XGB performs best. For the high tail, the eight models rank as XGB > SVM > RF > PSO-BP > BPNN > GWO-BP > RBF > ELM, whereas for the low tail they rank as XGB > RF > BPNN > GWO-BP > SVM > PSO-BP > RBF > ELM.
For the testing sets, the BPNN does not perform as well as the other seven models, while XGB, PSO-BP, and GWO-BP all perform well. For the high tail, the models rank as PSO-BP > GWO-BP > XGB > RF > SVM > BPNN > ELM > RBF; for the low tail, they rank as GWO-BP > XGB > PSO-BP > RF > BPNN > ELM > SVM > RBF.
Notably, the ML models predict the lower tail of the FDC considerably less accurately than the upper tail, although GWO-BP and XGB still perform well for the lower tail. Based on the evaluation metrics, GWO-BP and XGB are the best models for predicting the FDC. Moreover, the comparison of BPNN, PSO-BP, and GWO-BP (Figure 17) indicates that optimizing the ML model parameters with swarm intelligence optimization algorithms can substantially enhance the model's predictive capability and generalization ability.
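Swarm intelligence optimizers such as GWO search a parameter space with a population of candidate solutions. As an illustrative sketch only, the following implements the standard Grey Wolf Optimizer update, in which the pack moves toward the three best solutions found so far (alpha, beta and delta) while a control parameter a decreases linearly from 2 to 0, and uses it to minimize a simple test function. In a GWO-BP model the objective would presumably be the BP network's training error as a function of its initial weights and biases; that coupling, the pack size, the iteration count and the bounds below are assumptions, not the paper's settings.

```python
import numpy as np

def grey_wolf_optimizer(objective, dim, lower, upper, n_wolves=20, n_iter=200, seed=0):
    """Minimize `objective` over the box [lower, upper]**dim with the Grey Wolf Optimizer."""
    rng = np.random.default_rng(seed)
    wolves = rng.uniform(lower, upper, size=(n_wolves, dim))
    fitness = np.array([objective(w) for w in wolves])
    # Leaders (alpha, beta, delta): the three best positions found so far
    order = np.argsort(fitness)[:3]
    leaders, leader_fit = wolves[order].copy(), fitness[order].copy()

    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # control parameter, decreases linearly from 2 to 0
        for i in range(n_wolves):
            estimates = []
            for leader in leaders:
                A = 2.0 * a * rng.random(dim) - a  # |A| > 1 explores, |A| < 1 exploits
                C = 2.0 * rng.random(dim)
                D = np.abs(C * leader - wolves[i])  # distance to this leader
                estimates.append(leader - A * D)
            # New position: average of the moves suggested by the three leaders
            wolves[i] = np.clip(np.mean(estimates, axis=0), lower, upper)
        fitness = np.array([objective(w) for w in wolves])
        # Keep the three best positions from the old leaders and the updated pack
        pool = np.vstack([leaders, wolves])
        pool_fit = np.concatenate([leader_fit, fitness])
        order = np.argsort(pool_fit)[:3]
        leaders, leader_fit = pool[order].copy(), pool_fit[order].copy()

    return leaders[0], float(leader_fit[0])

# Hypothetical test objective (sphere function), minimum 0 at the origin
def sphere(w):
    return float(np.sum(w ** 2))

best, best_f = grey_wolf_optimizer(sphere, dim=5, lower=-10.0, upper=10.0)
print(best_f)
```

Replacing `sphere` with a function that unpacks the vector into BP network weights and returns the training error would give a GWO-BP-style setup; a PSO-BP variant would instead use the particle swarm velocity and position update.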
4.3 Prediction results throughout the entire duration
[Insert Figure 18]
Figure 18 Overall evaluation (R²) of the estimated quality of ML models on the testing sets (15 streamflow percentiles)
Streamflow values at multiple percentiles together capture the shape of the FDC. R² and NSE are commonly used to assess model predictions; here, an R² greater than 0.85 is taken to indicate good predictive performance. In addition to R², std, cor, and RMSE, which were analyzed for the six most typical streamflow percentiles in the previous section, NSE is used: models with NSE values less than or equal to 0.50, from 0.50 to 0.65, from 0.65 to 0.75, and greater than 0.75 are considered to show bad, satisfactory, good, and excellent performance, respectively (Fatehi et al., 2015).
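A minimal sketch of the two criteria and of the NSE categories cited above. It assumes R² is the squared Pearson correlation between observed and predicted values (the paper could instead use the coefficient of determination) and assigns each boundary value to the lower category (e.g., NSE = 0.65 counts as satisfactory), since the ranges above do not state how boundaries are treated. The data are hypothetical.

```python
import numpy as np

def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations of observations."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    return float(1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2))

def r_squared(observed, predicted):
    """Assumed definition: squared Pearson correlation coefficient."""
    return float(np.corrcoef(observed, predicted)[0, 1] ** 2)

def nse_category(value):
    """Performance class from the NSE thresholds (boundaries go to the lower class)."""
    if value <= 0.50:
        return "bad"
    if value <= 0.65:
        return "satisfactory"
    if value <= 0.75:
        return "good"
    return "excellent"

obs = np.array([3.0, 5.0, 9.0, 2.0, 6.0])  # hypothetical observed quantiles
pred = np.array([2.5, 5.5, 8.0, 2.5, 6.5])  # hypothetical predictions
print(nse(obs, pred), r_squared(obs, pred), nse_category(nse(obs, pred)))
```

NSE equals 1 for a perfect prediction and 0 for a model no better than the observed mean, which is why values at or below 0.50 are treated as poor.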
[Insert Figure 19]
Figure 19 Overall evaluation (NSE) of the estimated quality of ML models on the testing sets (15 streamflow percentiles)
As shown in Figures 18 and 19, by the R² and NSE criteria, RF, PSO-BP, and XGB all achieve very good performance except at Q99.7, where the results are only satisfactory. The XGB model is less accurate for the highest and lowest streamflow percentiles than for the middle percentiles, where it performs particularly well. The GWO-BP model performs well across the entire duration range of the testing set (i.e., from high flow to low flow), with R² of 0.86 to 0.94 and NSE of 0.78 to 0.94. RF, PSO-BP, and XGB also perform well throughout the entire duration range, but below GWO-BP. Considering the full duration range of the FDC, GWO-BP is therefore the best of the models in this paper for predicting the FDC.
Vafakhah and Khosrobeigi Bozchaloei (2020) identified SVR as the optimal model for predicting FDCs, with a relative RMSE of 9.37 to 1.45 and NSE of 0.54 to 0.91. Compared with that study, the GWO-BP model selected in this paper substantially improves prediction accuracy.
4.4 Feature importance analysis of the processes
[Insert Figure 20]
Figure 20 Interpretation of the predicted FDC and feature importance analysis
Because of the “black box” problem, ML models have inherent limitations (Esterhuizen et al., 2022). Conventional “feature importance” only indicates which features matter more, not how they influence the predictions. In this paper, SHAP (SHapley Additive exPlanations) values were used to interpret the ML results. SHAP values not only quantify the contribution of each feature for individual samples but also indicate the sign of that contribution (i.e., whether it is positive or negative) (Dikshit and Pradhan, 2021).
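SHAP values are Shapley values from cooperative game theory: a feature's value is its average marginal contribution to the prediction over all subsets of the other features, and the values add up to the difference between a sample's prediction and the base (average) prediction. The sketch below computes them exactly by enumerating feature subsets for a hypothetical linear toy model, replacing “absent” features with values from a background data set (an interventional value function, which is an assumption here). It is not necessarily how the paper's SHAP values were obtained; the SHAP library provides much faster algorithms, such as TreeSHAP for tree ensembles, and exact enumeration is only feasible for a handful of features.

```python
import itertools
import math
import numpy as np

def exact_shapley(model, x, background):
    """Exact Shapley values of model(x); absent features are filled from `background`."""
    x = np.asarray(x, dtype=float)
    background = np.atleast_2d(np.asarray(background, dtype=float))
    n = x.size

    def value(subset):
        # v(S): mean prediction when features in S are fixed to x, others come from background
        data = background.copy()
        idx = list(subset)
        if idx:
            data[:, idx] = x[idx]
        return float(np.mean(model(data)))

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi, value(())

# Hypothetical linear toy model f(x) = 2*x0 - x1 + 0.5*x2 + 3
weights = np.array([2.0, -1.0, 0.5])

def toy_model(X):
    return X @ weights + 3.0

rng = np.random.default_rng(0)
background = rng.normal(size=(50, 3))
x = np.array([1.0, 2.0, -1.0])
phi, base = exact_shapley(toy_model, x, background)
# Additivity: base value + sum of SHAP values reproduces the prediction
print(phi, base + phi.sum(), toy_model(x[None, :])[0])
```

For a linear model each value reduces to w_i (x_i − mean of feature i), so its sign directly shows whether the feature pushes the prediction above or below the base value.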
Figure 20 shows the impact of the 22 basin characteristic variables on six critical streamflow percentiles. SHAP values are calculated for every sample and variable; together they show the global impact of each feature on the model and quantify the contribution of the 22 environmental variables to the different streamflow percentiles. In each panel, each row represents an environmental input variable, and the horizontal axis shows the distribution of its SHAP values. Each point represents a sample, colored by its feature value (red for high values, blue for low values).
Comparing Figure 20(a) with Section 3.2, the correlation coefficients of BFI_mean, Smax, ATP, DP_std, AP_max, , SWS and are relatively high; this differs slightly from the neural network prediction results but is generally consistent with them. Note that zero on the horizontal axis represents the average value of . Considering , the SHAP value rises as decreases (color changes from red to blue). When reaches its maximum, the is 1 lower than its average value, whereas when reaches its minimum, the is 2 higher than its average value. This is because a high probability of no-precipitation days indicates a lower precipitation frequency (Cheng et al., 2012), which significantly lowers the flow quantile.
The impact of the environmental variables also varies noticeably among streamflow quantiles. However, the two main influencing factors remain nearly unchanged, with and BFI_mean playing the key roles. High values of reduce the streamflow quantiles (negative SHAP values, i.e., a negative impact on the output), whereas high values of BFI_mean increase them (positive SHAP values, i.e., a positive impact). In addition, contributes more to the prediction of high flows such as ,, which means it is more informative for the upper tail of the FDC, while BFI_mean has a greater impact on the prediction of low flows such as , which means it is more informative for the lower tail of the FDC.
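The ranking of influencing factors and the sign of their effects discussed above can also be summarized numerically from a matrix of SHAP values. As a minimal sketch, assuming importance is measured as the mean absolute SHAP value and direction as the Pearson correlation between a feature's values and its SHAP values (the paper reads these from the summary plot; the data and feature names below are hypothetical):

```python
import numpy as np

def summarize_shap(shap_values, feature_values, feature_names):
    """Rank features by mean |SHAP| and report the direction of their effect.

    direction > 0: high feature values tend to raise the prediction (positive impact);
    direction < 0: high feature values tend to lower it (negative impact).
    """
    shap_values = np.asarray(shap_values, dtype=float)
    feature_values = np.asarray(feature_values, dtype=float)
    importance = np.abs(shap_values).mean(axis=0)
    summary = []
    for j in np.argsort(importance)[::-1]:  # most important feature first
        f, s = feature_values[:, j], shap_values[:, j]
        # A constant column has no defined correlation; report 0 (no direction)
        direction = 0.0 if f.std() == 0 or s.std() == 0 else float(np.corrcoef(f, s)[0, 1])
        summary.append((feature_names[j], float(importance[j]), direction))
    return summary

# Hypothetical feature values and SHAP values for two features
values = np.column_stack([np.linspace(0.0, 1.0, 11), np.linspace(0.0, 1.0, 11)])
shap_matrix = np.column_stack([-2.0 * (values[:, 0] - 0.5), 0.5 * (values[:, 1] - 0.5)])
for name, imp, sign in summarize_shap(shap_matrix, values, ["x_neg", "x_pos"]):
    print(f"{name}: mean|SHAP| = {imp:.3f}, direction = {sign:+.2f}")
```

A negative direction corresponds to the red-to-the-left pattern in a SHAP summary plot (high values, negative SHAP), and a positive direction to the opposite pattern.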
The main influencing factors identified through SHAP in this paper are consistent with the physical controls of the gamma distribution fitting parameters in the same region, which supports the reliability of the model in this paper (Yu Zhou, 2023). Annual average precipitation and maximum precipitation have no significant influence on the predicted flow quantiles, whereas the predictions are closely related to , indicating that high flows may be driven by short-term precipitation events, which are closely tied to the frequency of precipitation occurrence and cannot be captured by annual average precipitation. Precipitation frequency has a significant impact on FDC prediction mainly because it directly affects the runoff generation mechanism and water balance of the watershed (Butcher et al., 2021). High precipitation frequency means more precipitation within a shorter period, which leads to faster collection of surface runoff, higher soil saturation, and reduced soil infiltration capacity (Crow et al., 2018). With high soil moisture content, evaporation decreases accordingly, resulting in reduced water consumption. Thus, the negative impact of precipitation frequency on the streamflow percentiles may be mainly due to frequent precipitation accelerating the collection of surface runoff, resulting in slower increases or faster decreases in streamflow. The BFI is largely influenced by the water storage capacity of the aquifer and by human activities, which indicates the importance of aquifer storage capacity for predicting the low-flow part of the FDC (Mazvimavi et al., 2004).
Basins with low BFI cannot sustain flow well, which may shorten the duration of high flows, whereas basins with high BFI sustain flow better and thus maintain high-flow conditions for longer. This explains why BFI_mean has a positive impact on the output values and a greater impact on the prediction of low flows.
Like other statistics-based methods (Burgan and Aksoy, 2022c), this study is subject to subjectivity and uncertainty in variable selection (Veber Costa, 2020). In future research, besides exploring more watershed characteristics that influence FDCs and incorporating them into the model for more precise prediction, larger datasets and scales (e.g., the global scale) should be examined to improve the applicability of the model before it is applied to watersheds with more diverse climate and landscape conditions.
Most data-driven models, such as neural networks, exploit only the correlation between inputs and outputs, leaving the mechanisms of the influencing factors unknown (Atieh et al., 2017c; Bozchaloei and Vafakhah, 2015). It has been pointed out that machine learning (ML) can help hydrology progress in many ways, including (1) incorporating physics into ML models and (2) improving the interpretability of ML models (Shen, 2018). From these two perspectives, the findings of this paper can provide new methods and insights for more accurate data-driven FDC prediction and analysis, which will help provide a scientific basis for water resource management and hydrological forecasting and reveal the underlying physical processes.
5 CONCLUSION
This paper applied different ML methods to estimate FDCs. Based on a total of 645 samples described by 22 basin characteristic variables (including “mutable” and “immutable” variables), eight ML models were used to predict the FDC (flow quantiles corresponding to 15 exceedance probabilities). SHAP analysis was then used to identify the main input variables affecting the predictions of different streamflow quantiles and the degree of their influence, and the optimal model for prediction under these environmental conditions was identified. The main conclusions are as follows:
- With high prediction accuracy and good generalization ability, GWO-BP and XGB are the best models for predicting FDCs. Moreover, optimizing the parameters of the original BPNN with swarm intelligence optimization algorithms can significantly enhance its predictive capability and generalization ability.
- Over the full duration range of the FDC, GWO-BP is the best of the eight predictive models, with R² of 0.86 to 0.94 and NSE of 0.78 to 0.94 on the testing set. It significantly improves on the prediction accuracy of existing research, which identified SVR as the optimal model for predicting FDCs with RMSE of 9.37 to 1.45 and NSE of 0.54 to 0.91 (Vafakhah and Khosrobeigi Bozchaloei, 2020).
- The predictive influence of the variables differs among quantiles, with and BFI_mean contributing most significantly to FDC prediction. The has a negative effect on the prediction results and contributes more to predicting high flows, mainly because frequent precipitation can lead to faster collection of surface runoff, resulting in slower increases or faster decreases in streamflow. Basins with low BFI cannot sustain flow well, which may shorten the duration of high flows, whereas basins with high BFI sustain flow better and thus maintain high-flow conditions for longer. Therefore, BFI_mean has a positive impact on the output values and a greater impact on the prediction of low flows.