1.2 Artificial intelligence methods in the prediction of FDCs
With the development and maturation of data science and artificial intelligence, the research focus of hydrological prediction has gradually shifted from process-driven to data-driven models (Mohammadrezapour et al., 2019; Sharifi Garmdareh et al., 2018). Data-driven models rely on the statistical properties of the data rather than on the physical causes of runoff, and derive hydrological predictions directly from the statistical relationship between model inputs and outputs. Machine learning models typically have a relatively complex structure. Through parameter tuning and training, the model progressively approaches the optimal mapping between inputs and outputs, and its predictions are usually highly accurate. However, because such models behave as a “black box”, decision-makers cannot directly see how they arrive at their results (Cortez and Embrechts, 2013). The black-box nature of machine learning simplifies model input and training, but it also deprives the predictions of clear physical meaning: the model cannot explain how a prediction follows from the causes and mechanisms of runoff formation, which lowers its credibility in operational prediction. Nevertheless, machine learning methods are widely used in hydrology (Khan et al., 2016; Khan et al., 2019) because of their “unreasonable effectiveness” when applied to real-world problems (Shen, 2018). Hydrological systems are too complex to be represented by simple conceptual relationships between variables, and the relationship between watershed characteristics and hydrological characteristics is nonlinear. Traditional methods therefore have limited ability to predict FDCs, whereas artificial intelligence models show promise for this task (Nearing and Gupta, 2015).
In a study of 33 watersheds, a support vector machine (SVM), an artificial neural network (ANN), and nonlinear regression (NLR) were used to predict flows at different durations (the output variables) from six selected basin characteristics (the input variables). The results indicated that support vector regression (SVR), the regression form of SVM, was the most suitable model for estimating FDCs (Vafakhah and Khosrobeigi Bozchaloei, 2020). A multi-output neural network was developed to predict FDCs at 9,203 ungauged catchments in the southeastern United States over the 60-year period 1950–2009. Compared with single-output neural networks, the multi-output network learned the monotonic relationship between adjacent quantiles and yielded better predictions (Worland et al., 2019).
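The output variables in such studies are flows read off the FDC at fixed exceedance percentages (durations). The sketch below shows one common way to compute an empirical FDC from a daily flow record, assuming the Weibull plotting position P = m / (n + 1) and linear interpolation between ranked flows; other plotting positions are also used in practice, and the function name and example percentages are illustrative assumptions, not those of the cited studies.

```python
import bisect


def flow_duration_quantiles(daily_flows, exceedance_pcts):
    """Flow equalled or exceeded p% of the time, for each p in exceedance_pcts.

    Assumes the Weibull plotting position P = m / (n + 1) (m = rank, largest
    flow first) and linear interpolation between ranked flows. Percentages
    outside the plotted range are clamped to the largest or smallest flow.
    """
    flows = sorted(daily_flows, reverse=True)  # rank 1 = largest flow
    n = len(flows)
    # Exceedance probability (%) of each ranked flow; increases with rank
    probs = [100.0 * (m + 1) / (n + 1) for m in range(n)]
    result = {}
    for p in exceedance_pcts:
        if p <= probs[0]:
            q = flows[0]
        elif p >= probs[-1]:
            q = flows[-1]
        else:
            i = bisect.bisect_right(probs, p) - 1  # last rank with prob <= p
            frac = (p - probs[i]) / (probs[i + 1] - probs[i])
            q = flows[i] + frac * (flows[i + 1] - flows[i])
        result[p] = q
    return result


# Toy record of nine daily flows; the percentages are illustrative only
quantiles = flow_duration_quantiles(list(range(1, 10)), [10, 25, 50, 95])
print(quantiles)
```

Because the flows are ranked before interpolation, the returned quantiles never increase with exceedance percentage. Independently trained single-output models offer no such guarantee, which is why learning the monotonic relationship between adjacent quantiles matters.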
Machine learning (ML) has performed well in FDC prediction and is now widely used for this purpose (Ley et al., 2023; Vaheddoost et al., 2023). Existing research has concentrated mainly on improving FDC prediction accuracy with a single ML model while neglecting the factors that influence the FDC, and traditional prediction methods achieve relatively low accuracy. Moreover, few studies have comprehensively compared multiple ML algorithms or carried out regionalization of FDC prediction based on geographical and climatic characteristics. Explainable machine learning (e.g., SHAP) is a rapidly developing subfield that aims to understand how models use their inputs to make predictions and thereby address the black-box problem (Kim, 2017). Accordingly, the main issues addressed in this paper are as follows (see Figure 1):
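SHAP attributes a prediction to its inputs using Shapley values from cooperative game theory. The sketch below computes exact Shapley values by brute-force enumeration of feature coalitions, assuming that a feature absent from a coalition is replaced by a single baseline value; the `shap` package instead averages over a background dataset and uses model-specific or sampling-based algorithms, so this is an illustration of the definition rather than that library's implementation. The three-input model is hypothetical.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to one baseline point.

    A feature absent from a coalition takes its baseline value. The cost grows
    as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Model output with coalition features at x, all others at baseline
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))  # marginal gain
        phi.append(total)
    return phi


# Hypothetical model: two additive effects plus an interaction of inputs 0 and 2
def toy_model(z):
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]


phi = exact_shapley(toy_model, [1, 2, 3], [0, 0, 0])
print(phi)
```

The attributions sum to the difference between the prediction and the baseline prediction, and the interaction term is split equally between the two inputs involved; this additivity is what makes SHAP values readable as per-factor contributions.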
[Insert Figure 1]
Figure 1 Framework of the prediction and inference of FDC using ML
This paper uses a total of 645 samples, comprising 22 basin characteristic variables (both “mutable” and “immutable”) over 30 years from 244 hydrometric stations in the middle and lower reaches of the Yangtze River basin. Using typical basin characteristics, regional FDC models were established with machine learning methods, and their performance was compared to determine the most suitable model for predicting the FDC. First, 22 basin characteristics were selected and divided into mutable and immutable variables, and the 15 corresponding FDC quantiles were taken as outputs. Second, a database linking basin characteristic variables to flow quantiles was established, and eight typical ML models were used to study the nonlinear relationship between the input parameters (basin characteristics) and the 15 flow quantiles that determine the shape of the FDC. Each quantile was predicted, and Taylor diagrams were used to compare the ML models and select the best one for estimating the FDC. Finally, SHAP was used to quantify the contribution of each input to the 15 quantiles and to identify the key influencing factors, and the way in which these important hydrological factors affect the results is discussed.
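A Taylor diagram summarizes each model's agreement with observations using three statistics: the standard deviation of the predictions, their Pearson correlation R with the observations, and the centred root-mean-square difference E'. A minimal sketch of computing these for one model and one flow quantile, assuming population statistics (division by n) and paired lists of observed and predicted values; the function name is hypothetical and plotting is omitted.

```python
import math


def taylor_statistics(observed, predicted):
    """Statistics shown on a Taylor diagram for one model and one quantile.

    Uses population (divide-by-n) statistics, for which the identity
    E'^2 = sd_pred^2 + sd_obs^2 - 2 * sd_pred * sd_obs * R holds exactly.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    dev_o = [o - mean_o for o in observed]  # anomalies about the mean
    dev_p = [p - mean_p for p in predicted]
    sd_o = math.sqrt(sum(d * d for d in dev_o) / n)
    sd_p = math.sqrt(sum(d * d for d in dev_p) / n)
    r = sum(a * b for a, b in zip(dev_o, dev_p)) / (n * sd_o * sd_p)
    # Centred RMSD ignores any constant bias between the two series
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_o, dev_p)) / n)
    return {"sd_obs": sd_o, "sd_pred": sd_p, "r": r, "crmsd": crmsd}


# Hypothetical observed and predicted values of one quantile at five stations
stats = taylor_statistics([1.0, 2.0, 3.0, 4.0, 5.0], [1.2, 1.9, 3.1, 4.3, 4.8])
print(stats)
```

The identity in the docstring is the law of cosines, which is why a single polar plot can place every model at a point whose radius is its standard deviation, whose angle is arccos(R), and whose distance from the observation point is E'. Computing these statistics for every model and quantile gives the basis for choosing the best model.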