1. Introduction
Drought is an extreme climatic phenomenon that develops slowly but increases in intensity and frequency, causing significant consequences [
1]. The drought in Texas in 2011, the U.S. Central Great Plains drought in 2012, and the California drought have caused extreme losses to society, agriculture, and ecosystems, significantly impacting water supply and crop production [
2,
3]. As per the United States Department of Agriculture (USDA) Report in 2017, nearly 800 million people were found to be malnourished, with most of this population coming from developing countries, and the primary cause was drought [
4]. Drought’s known effects are decreased water supply, poor water quality, environmental hazards, forest fires, crop failure, decreased productivity, disturbed riparian habitats, reduced power generation, famine, civil unrest, and suspension of recreational activities. Drought frequency and severity will increase due to climate change (e.g., precipitation pattern change) and socioeconomic changes (e.g., increasing water demand) [
1]. As a result, accurate drought prediction based on real-time drought monitoring is necessary for early warning, preparation, and mitigation to lessen its negative impacts [
5].
Meanwhile, climate change and temperature rises, which affect drought frequency and severity, are expected to continue [
6], making future droughts more severe and prolonged. Thus, evaluating and predicting droughts is critical for early warning and preparing the most vulnerable areas to mitigate drought impacts. Drought prediction at various time scales is required to develop management strategies to minimize negative societal and economic impacts [
7]. Previous studies have classified drought into multiple types to characterize and address drought events: meteorological drought, hydrological drought, agricultural drought, and socioeconomic drought [
8]. Usually, when there is a shortage of precipitation, it leads to a decrease in water availability. This is known as a meteorological drought, which can result in hydrological drought due to streamflow reduction and invariably spreads over time through a hydrological cycle, causing reduced soil moisture and agricultural drought. Socioeconomic drought occurs when the water supply does not satisfy society’s productive and consumptive activities. In addition to the four types of drought, many studies have defined other types, such as groundwater drought, environmental drought, and ecological drought, depending on their target systems [
9,
10,
11].
Rainfall data are crucial for assessing drought. Observations from conventional precipitation gauges are generally regarded as the most accurate source of precipitation data. However, satellite-based precipitation data are often preferred for drought monitoring because precipitation gauges have high installation and maintenance costs, short data records, erratic observations, and limited spatial coverage [
12]. Satellite precipitation data can address these shortcomings. Previous drought forecasting studies relied on rain gauge data, which were missing in most areas, making drought forecasting difficult. Satellite precipitation data are derived from sensors that use infrared. Researchers have used satellite precipitation data from the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) product, available at spatial resolutions of 0.25 and 0.05 degrees, to monitor South Asian droughts [
13]. Tote et al. [
14] used CHIRPS precipitation data to monitor droughts and floods in Mozambique. Shukla et al. [
15] used CHIRPS precipitation data to forecast droughts in East Africa.
Several researchers have used different machine-learning tools for drought prediction recently; machine learning (ML) models such as artificial neural networks (ANNs), random forest (RF), long short-term memory (LSTM), and support vector machines (SVMs) have been used for drought prediction [
16,
17,
18,
19]. ML models can improve prediction accuracy without being explicitly designed for the purpose [
20]. With the large variety of drought prediction models available, it can be arduous for researchers to decide which is the best model for drought prediction research if they are unaware of the available prediction models. Most models have more limited performance with short-term forecasting than with long-term forecasting [
21]. This is because the hydrological processes related to short-term drought are complex, including factors such as moisture, temperature, and evapotranspiration after precipitation.
There has been less research on using the RF model in drought prediction [
22]. Moreover, RF has not been used for drought prediction near the Folsom Lake basin area. In the study by Chen et al. [
23], ARIMA and RF models were used in China’s Haihe River basin to predict drought. The study included precipitation, temperature, evaporation, and surface water content as input variables. The RF model can forecast reliable results without adjusting its parameters, is flexible in capturing time series fundamental relationships, and can generate an ensemble of drought forecasts rather than a mean prediction. According to the study, the model based on RF was found to be more reliable than ARIMA in forecasting droughts, both in the short and long term. Dikshit et al. [
24] used RF for drought forecasting in New South Wales, Australia. SPEI3 was found to have a better coefficient of determination (R²) than SPEI1 in the validation period. Rainfall, minimum, maximum, and mean temperatures, potential evapotranspiration (PET), vapor pressure, and cloud cover were the climatic variables used in the prediction model. Park et al. [
25] utilized three machine learning approaches: RF, boosted regression trees, and Cubist for drought prediction in two arid (48 counties of Arizona and New Mexico) and two humid regions (146 counties of North and South Carolina). RF was found to perform best for SPI prediction. Similarly, Park et al. [
26] predicted severe drought using satellite images and topography data based on RF in western Korea.
Similarly, among the various ML techniques, SVR can be regarded as one of the most used drought prediction models [
5]. The SVR model uses a kernel function to map the inputs into a higher-dimensional space, where the non-linear relationship between predictor and predictand can be treated as a linear one. SVR can learn from small datasets and can handle high-dimensional data. Borji et al. [
27] used SVR and ANNs for hydrological drought forecasting. The study showed better efficiency for the SVR model in long-term drought forecasting than the ANN. Achite et al. [
28] found that an SVM produced better results out of four machine learning models used—which were an ANN, an artificial neuro-fuzzy inference system (ANFIS), an SVM, and a decision tree (DT)—in hydrological drought forecasting in the Wadi Ouahrane basin in the northern part of Algeria. A hundred percent accuracy was found in the SVM model for hydrological drought forecasting in the Gidra River [
29].
The above studies did not compare hydrological drought indices (SSI) derived from HEC-HMS discharge output, RF model predictions, SVR model predictions, and station gauge data for short-term drought in a single basin. This research aims to fill this gap by comparing these SSI estimates for short time scales (1 and 3 months). Moreover, satellite precipitation data from the CHRS data portal are used in our prediction model for calculating the hydrological drought index (SSI), and the process-based hydrological modeling tool HEC-HMS is used to calculate the discharge needed for the SSI. The novelty of our research is the use of machine learning models for future drought prediction and a hydrological model for drought evaluation.
This study investigates the feasibility of using satellite precipitation data for evaluating and predicting drought severity. The SSI is used to quantify the severity of hydrological drought, and the HEC-HMS is used to model streamflow. Previous studies have used streamflow from only the gauge stations and measured drought indices. However, in this study, the station gauge data are used, a basin is created, and discharge is calculated at the outlet using the HEC-HMS. The station discharge and HEC-HMS discharge are used for SSI calculation. RF and SVR models are applied using time-lagged SPI, SSI, and climatic data for future hydrological drought prediction (i.e., SSI prediction), indicating the study area’s drought severity. The research shows good predictability of hydrological drought (SSI) in a basin using RF and SVR models for short-term drought.
2. Materials and Methods
2.1. Study Area
The evaluation of hydrological drought was applied to the Folsom Lake basin in Northern California, which recently experienced extreme drought from 2012 to 2016 [
30]. The drought monitoring study carried out by Mozgovoy [
31] with the use of satellite and ground data concluded that there has been desiccation of Folsom Lake during 2011–2015. This area requires special attention, and the basin contributing to the runoff towards Folsom Lake was chosen as a site for this research. The study area is 883 km², and the central part of the basin is located at 38.93° latitude and -121.02° longitude.
Figure 1 illustrates the location of the designated study area and the gauge station downstream. The study area generally has a Mediterranean climate, with hot, dry summers and cool, wet winters. According to the Western Regional Climate Center [
32], average temperatures in the Folsom Lake area range from a low of around 10 degrees Celsius in winter to a high of around 30 degrees Celsius in summer. Maximum temperatures can exceed 38 degrees Celsius during heat waves in the summer, while minimum temperatures can drop below freezing in the winter. Annual precipitation varies significantly, with some years experiencing above-average rainfall and others experiencing drought conditions. On average, the area receives around 500–800 mm of precipitation annually, most falling between October and April. Recently, at the beginning of 2023, intense storms caused flooding, power outages, and landslides in the area. In contrast, 2015 and 2016 saw extremely low precipitation. Climate change has altered atmospheric patterns, which can also be seen in this area. About one extremely dry year and one extremely wet year are experienced every two decades in the southern California region [
33]. The flooding and drought events in the area have raised the importance of drought and flood management and preparedness strategies.
Similarly, the area is a blend of urban, agricultural, and natural land. Forest and agricultural land, including orchards, vineyards, and row crops, dominate the land cover. Natural areas, such as grasslands, oak woodlands, and chaparral, are also home to various wildlife species. The area has diverse hydrogeological features that combine natural watersheds and managed groundwater systems. Major rivers like the American and Truckee play crucial roles in surface and groundwater systems. The study region extends from the Sierra Nevada mountains to the Sacramento Valley and comprises complex geology, including ancient metamorphic rocks, mineral resources, and seismic features [
34].
The North Fork American River contributes its runoff toward Folsom Lake. The basin is delineated upstream of Folsom Lake on the North Fork American River, one of the longest branches of the American River in Northern California. The flow of this river fluctuates depending on the precipitation amount and time of the year: it is high in the rainy season and may decrease during dry months. The runoff that the basin contributes to Folsom Lake is critical, as the lake’s water is used for recreation, drinking water, irrigation, and hydroelectric power generation [
35].
Figure 2 shows the historic drought conditions in California, as obtained from the US Drought Monitor. This shows the vulnerability of California to drought and the need for preparedness for future drought conditions. The figure indicates extreme drought across California from 2014 to 2017 and later in different months of 2021, 2022, and 2023. This is relevant to the drought index values that we evaluated with the precipitation data for that period, although our study area is confined to the part of California that forms the watershed contributing runoff to Folsom Lake.
Table 1 presents the gauge information of the watershed used in our study.
2.2. Data and Processing
This study used several data types to prepare a comprehensive hydrological model using the HEC-HMS and machine learning. These data sources included a digital elevation model (DEM), which provides information on elevation and slope characteristics across the study location. The curve number (CN) grid, a necessary component of the hydrological analysis, was generated by integrating land use/land cover (LULC) and soil group data. This grid allowed for a detailed assessment of potential runoff within the study region. The impervious data set was crucial in identifying non-permeable surfaces, encompassing urban infrastructure such as buildings and road networks. These datasets were integrated within the ArcGIS platform, and the result was used to build the hydrological model in the HEC-HMS framework.
Furthermore, the research also incorporated precipitation data obtained from the CHRS data portal, which served as a fundamental input in multiple modeling techniques, including the HEC-HMS, RF, and SVR models. This precipitation data was crucial in enhancing the precision and reliability of the hydrological assessments. Moreover, discharge data from the United States Geological Survey (USGS) gauging station were diligently incorporated into the research, further enriching the hydrological analysis with real-world, ground-truth data. The monthly average discharge and precipitation values for 21 years dated from 2001 to 2021 were used for this study. This holistic approach, amalgamating various datasets and integrating observed discharges, bolstered the scientific rigor and comprehensiveness of the study’s hydrological modeling and predictions. The data used and their sources are mentioned below in
Table 2.
2.3. Drought Indices
2.3.1. Standardized Precipitation Index (SPI)
The SPI measures the severity of meteorological drought using the probability distribution function of precipitation at multiple time scales, directly indicating drought due to a lack of precipitation. Thus, the SPI considers only precipitation data when developing the probability distribution. Precipitation accumulated over the chosen time scale is fitted to a specific (typically gamma) distribution, which is then transformed into a standard normal distribution using an equal probability transformation approach [
37]. SPI values below zero indicate dry conditions, while those above zero signify wet conditions [
38]. The SPI value indicates the deviation of the total precipitation deficit from the normalized mean value [
39]. It can be evaluated for various durations like 1, 3, 6, 12, 24, and 48 months. The SPI was chosen as a meteorological index for comparing drought severity by the participants in the inter-regional workshop on indices and early warning [
40]. This study uses the SPI as a predictor for hydrological drought prediction in a machine learning model. The formula for calculating the SPI for a given month ‘i’ is generally expressed by Equation (1),
$$\mathrm{SPI}_i = \frac{P_i - \bar{P}}{\sigma_P} \qquad (1)$$
where $P_i$ is the observed precipitation for the month i, $\bar{P}$ is the mean precipitation, and $\sigma_P$ is the standard deviation of the precipitation data. In this study, the SPI is computed for periods of 1 month and 3 months, which are denoted as SPI1 and SPI3, respectively. Similarly,
Table 3 presents the SPI classification for drought severity, as outlined by Fitchett in 2019. The SPI values are categorized into various conditions ranging from extremely wet to extremely dry. This classification provides a qualitative framework for interpreting SPI values, aiding in assessing moisture conditions based on historical precipitation data.
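As a rough illustration of Equation (1), the sketch below standardizes a monthly precipitation series and builds a 3-month accumulation for SPI3. The function names (`rolling_sum`, `standardize`) and the sample values are hypothetical, the population standard deviation is an assumption, and the simple z-score form is used here rather than the gamma-fit transformation described earlier.

```python
from statistics import mean, pstdev


def rolling_sum(values, window):
    """Sum each run of `window` consecutive values (output is shorter by window - 1)."""
    return [sum(values[i - window + 1 : i + 1]) for i in range(window - 1, len(values))]


def standardize(values):
    """Z-score each value: (x_i - mean) / standard deviation, as in Equation (1)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("a constant series cannot be standardized")
    return [(x - mu) / sigma for x in values]


# Hypothetical monthly precipitation totals in mm (not data from the study).
precip_mm = [120.0, 95.0, 80.0, 30.0, 10.0, 0.0, 0.0, 2.0, 15.0, 60.0, 110.0, 140.0]

spi1 = standardize(precip_mm)                  # 1-month index
spi3 = standardize(rolling_sum(precip_mm, 3))  # 3-month index from accumulated totals

# Negative values indicate drier-than-average months.
dry_months = [i for i, value in enumerate(spi1) if value < 0]
```

In this toy series, the below-average months cluster in the dry season, which matches the sign convention stated above (values below zero indicate dry conditions).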
2.3.2. Standardized Streamflow Index (SSI)
The SSI is another extensively used drought index, especially to evaluate the severity of hydrological drought [
42]. The computation process for the SSI closely resembles that of SPI; the SSI calculation uses observed or simulated streamflow data, while SPI evaluation requires precipitation data. The frequencies of monthly streamflow values are found through a specific probability distribution function. Then the quantiles of the standard normal distribution corresponding to the frequencies are calculated as the SSI values [
43]. The classification of the SSI is presented in
Table 4. The SSI can also be evaluated at different time scales. This study considered 1- and 3-month SSI evaluations to characterize and compare seasonal and short-term droughts with SPI evaluations. The formula for calculating the SSI for a given month ‘i’ is generally expressed by Equation (2),
$$\mathrm{SSI}_i = \frac{Q_i - \bar{Q}}{\sigma_Q} \qquad (2)$$
where $Q_i$ is the discharge for the month i, $\bar{Q}$ is the mean discharge, and $\sigma_Q$ is the standard deviation of the discharge value. This study computed SSIs for 1 month and 3 months, denoted as SSI1 and SSI3, respectively. Similarly, the SSI range classification, as in
Table 4, provides a comprehensive overview of hydrological conditions based on probability within specified intervals. Ranging from extremely wet to extreme drought, the SSI values are associated with distinct conditions and their corresponding probabilities. For instance, an SSI value equal to or greater than 2.0 signifies extremely wet conditions with a 2.3% probability, while an SSI value of −2 or lower indicates extreme drought conditions, also with a 2.3% probability. These classifications, incorporating condition and probability, contribute to a nuanced understanding of streamflow variability, offering valuable insights into the likelihood of different hydrological states.
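The frequency-to-quantile step described in this section can be sketched as follows. This is a minimal illustration rather than the study's procedure: it uses an empirical frequency (the Weibull plotting position, rank / (n + 1)) in place of a fitted probability distribution, and the function name and flow values are hypothetical. The last two lines check the tail probabilities quoted for the ±2 thresholds against the standard normal distribution.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def ssi_from_frequencies(flows):
    """Map each flow to the standard normal quantile of its empirical frequency."""
    n = len(flows)
    order = sorted(range(n), key=lambda i: flows[i])  # indices from lowest to highest flow
    ssi = [0.0] * n
    for rank, index in enumerate(order, start=1):
        frequency = rank / (n + 1)  # Weibull plotting position (an assumption)
        ssi[index] = STANDARD_NORMAL.inv_cdf(frequency)
    return ssi


# Hypothetical monthly mean discharges in m^3/s (not data from the study).
flows = [42.0, 18.5, 7.2, 3.1, 55.0, 12.4, 25.3, 9.8, 30.6]
ssi1 = ssi_from_frequencies(flows)

# Probability mass beyond the +/-2 thresholds of the classification.
p_extremely_wet = 1.0 - STANDARD_NORMAL.cdf(2.0)
p_extreme_drought = STANDARD_NORMAL.cdf(-2.0)
```

Both tail probabilities come out at about 0.023, consistent with the 2.3% figures given for the extremely wet and extreme drought classes.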
2.4. Hydrological Model
This study focuses on drought indices, which require a long-term series of discharge data in the study area. However, such data may not always be available in data-scarce regions. To overcome this issue, the study employs the HEC-HMS model to simulate the river discharge at the outlet of the study watershed, and the simulated data are then used to calculate the drought indices. This model is typically applied to estimate discharge from precipitation, taking into account the meteorological and topographic properties of the basin [
45]. The HEC-HMS simulates the process of converting rainfall into runoff during an event while considering critical factors that control the runoff [
46]. The HEC-HMS model consists of distinct loss, transformation, and watershed routing modules. The routing, loss, and transformation methods chosen are the Muskingum routing method, Soil Conservation Service (SCS) Curve Number, and SCS unit hydrograph methods, due to their widespread use, stability, and data availability. The HEC-HMS model integrates the basin model, meteorological model, control specification, and input data as its components.
A basin model explains the basin’s physical attributes to simulate runoffs and rainfall abstraction [
47]. The inputs used for the model are watershed characteristics such as land use, soil types, and other relevant parameters like impervious area. Various data sources, such as the USGS and USDA, were utilized. DEM, LULC, and soil group data of the area of interest were clipped using ArcMap. This process aimed to estimate the extent of impervious surfaces within the research area’s geographical boundaries. The basin model used the SCS curve number (CN) method to calculate runoff and abstraction [
48].
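For readers unfamiliar with the SCS curve number method, the sketch below evaluates the widely used textbook form of the runoff equation in metric units. It is an illustration only: the function name is hypothetical, the initial abstraction ratio of 0.2 is the conventional default rather than a value reported in this study, and no claim is made that this reproduces the internal HEC-HMS computation.

```python
def scs_cn_runoff_mm(precip_mm, curve_number, ia_ratio=0.2):
    """Direct runoff depth (mm) from a rainfall depth using the SCS-CN relation.

    S  = 25400 / CN - 254        potential maximum retention (mm)
    Ia = ia_ratio * S            initial abstraction (mm)
    Q  = (P - Ia)^2 / (P - Ia + S)   for P > Ia, otherwise 0
    """
    if not 0 < curve_number <= 100:
        raise ValueError("curve number must be in the interval (0, 100]")
    retention = 25400.0 / curve_number - 254.0
    initial_abstraction = ia_ratio * retention
    if precip_mm <= initial_abstraction:
        return 0.0  # all rainfall is abstracted before runoff begins
    excess = precip_mm - initial_abstraction
    return excess ** 2 / (excess + retention)


# Hypothetical example: 50 mm of rain on a sub-basin with CN = 80.
runoff_cn80 = scs_cn_runoff_mm(50.0, 80)
# A fully impervious surface (CN = 100) converts all rainfall to runoff.
runoff_cn100 = scs_cn_runoff_mm(50.0, 100)
```

A higher curve number leaves less retention capacity, so the same rainfall produces more runoff, which is why the CN grid built from land use and soil data matters for the simulated discharge.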
At the beginning of the project, Arc Hydro, an integrated part of the ArcGIS Pro software (version 3.2.1), was used to extract the river network and the terrain data. We used various Arc Hydro tools to obtain datasets that represent the drainage patterns of the catchment. These datasets included stream definition and segmentation, flow direction and accumulation, and watershed delineation generated using raster analysis. Next, the sub-basin parameters were derived using the raster data generated by the Arc Hydro tools. The curve number is a hydrological attribute that we generated from the soil and land use database in Arc Hydro. Moreover, HEC-GeoHMS allowed us to import hydrological attributes such as lag time and impervious percentage, which were crucial in our study.
The meteorological model considers various parameters, including rainfall and discharge. For this study, rainfall data for the research area were obtained from the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Dynamic Infrared Rain Rate near real-time (PDIR-Now) dataset. This source provides up-to-date, high-resolution (0.04° × 0.04° pixel) satellite rainfall data from around the world. Runoff data for the river reach being studied were taken from the USGS database. There is just one USGS gauging station (USGS 11427000) within the study area, located at the downstream endpoint. The discharge data cover observations from 2001 onward. Daily rainfall and discharge data were collected within a grid for 21 years, from 2001 to 2021.
2.5. Machine Learning Models
2.5.1. Random Forest
In this investigation, the RF machine learning method was employed to predict drought severity in the study watershed. RF stands out as a supervised machine learning method renowned for its predictive and classification capabilities. Its computational efficiency, resilience to instability, and capacity to model non-linear relationships render it particularly appealing [
49]. The RF model constructs multiple decision trees based on a bootstrapped training data sample. Notably, a random subset of predictors is considered for forming binary splits at each decision tree node, providing diversity to the individual trees. To derive the anticipated response, one navigates through the tree starting from its main node to the specific end node. The overall prediction is then calculated by averaging the predictions made by each individual tree. The selection of the best combination of decision trees contributes to the model’s robustness [
50].
To enhance the efficiency and predictive performance of the simulation, focused consideration was given to features with high predictive potential. A random forest regressor was used to mitigate overfitting, and each tree’s depth was restricted to 100. The nodes were continuously expanded until the number of samples in the leaves became lower than the specified amount. Additionally, parameters such as “max_features” were set to ‘auto’ and “max_leaf_nodes” were set to ‘None’, allowing the model to expand based on optimal fitting requirements. The model produces its output by calculating the average of the outputs generated by each individual tree. This ensemble approach, incorporating multiple decision trees, not only guards against overfitting but also synergistically leverages the diverse insights of each constituent tree to enhance the overall predictive accuracy of the RF model.
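The configuration described above can be sketched as follows. The parameter names match scikit-learn's `RandomForestRegressor`, so this example assumes that library; the synthetic data, the number of trees, and the random seed are hypothetical. Recent scikit-learn releases no longer accept `max_features='auto'`; for regressors that setting meant "use all features", which is written here as `max_features=1.0`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic predictors and target (hypothetical, not data from the study).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 0.6 * X[:, 0] - 0.3 * X[:, 1] + 0.1 * rng.normal(size=200)

# 75% of the rows for training and 25% for testing, kept in their original order.
split = int(0.75 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

rf = RandomForestRegressor(
    n_estimators=100,     # assumed; the number of trees is not stated in the text
    max_depth=100,        # depth limit given in the text
    max_features=1.0,     # all features, the former meaning of 'auto' for regressors
    max_leaf_nodes=None,  # no cap on the number of leaves
    random_state=0,       # fixed seed so the sketch is reproducible
)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

# The forest output is the average of the individual tree outputs.
tree_mean = np.mean([tree.predict(X_test) for tree in rf.estimators_], axis=0)
```

Comparing `y_pred` with `tree_mean` makes the averaging step explicit: the two arrays agree to floating-point precision.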
2.5.2. Support Vector Regression
The SVR model is a machine learning model that applies the SVM approach to regression. It is a supervised ML algorithm proposed by Vladimir Vapnik and is widely used for nonlinear problems [
51]. SVMs are widely used ML techniques for both classification and regression. When employing an SVM with a kernel function (such as radial, linear, sigmoid, or polynomial), the approach involves transforming a nonlinear problem into a higher-dimension space. In this elevated space, the initially nonlinear problem is converted into a linear one, which is then addressed through SVM techniques [
5].
In SVR, the function used to model the data points is formed by linearly combining “kernels” centered around each input point [
52]. In the SVR model used, a linear kernel relates input features and the target variable linearly. SVR develops the decision boundary or optimal separating hyperplane.
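The idea of a prediction built by linearly combining kernels centered on input points can be made concrete with a small sketch. All names and numbers below are hypothetical: the coefficients and bias are chosen by hand rather than obtained by training, and the example only shows that, with a linear kernel, the kernel expansion collapses to an ordinary linear function of the input features.

```python
def linear_kernel(a, b):
    """Linear kernel: the dot product of two feature vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def kernel_expansion_predict(x, points, coefficients, bias, kernel=linear_kernel):
    """f(x) = sum_i c_i * k(x_i, x) + b, a weighted sum of kernels centered on the points."""
    return sum(c * kernel(p, x) for p, c in zip(points, coefficients)) + bias


def collapse_linear_weights(points, coefficients):
    """For a linear kernel the expansion equals w . x + b with w = sum_i c_i * x_i."""
    n_features = len(points[0])
    return [sum(c * p[j] for p, c in zip(points, coefficients)) for j in range(n_features)]


# Hand-picked illustrative values (not fitted to any data).
points = [[1.0, 0.5], [-0.5, 2.0], [0.0, -1.0]]
coefficients = [0.4, -0.2, 0.1]
bias = 0.05
x_new = [0.8, -0.3]

pred_kernel = kernel_expansion_predict(x_new, points, coefficients, bias)
weights = collapse_linear_weights(points, coefficients)
pred_linear = linear_kernel(weights, x_new) + bias
```

Both routes give the same prediction, which is why a linear kernel yields a model that relates the input features and the target linearly.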
2.6. Selection of Input Variables
The input variable selection process for the machine learning models in this study involves a rigorous process to identify the most influential predictors. Encompassing climatic and hydrological factors, the chosen predictors include discharge data, satellite precipitation data, SPI, and SSI data spanning 1 to 5 months ahead, along with the cumulative sum of 5 months of precipitation, specific humidity, and dew point data. This selection process begins with a comprehensive correlation analysis among all variables, emphasizing the identification of pairs exhibiting significant correlations. Variables with higher correlation coefficients are prioritized, signifying stronger relationships. The goal is to retain a subset of predictors optimized for capturing variability without overfitting, ensuring that the chosen variables collectively maximize predictive power while minimizing redundancy, ultimately enhancing the efficiency and interpretability of the hydrological and climatic predictive models.
The correlation matrix is shown in
Figure 3. It was found that discharge, precipitation, SPI, and SSI data from 1 to 5 months ahead and the sum of 5 months of precipitation data correlate better with SSI1 and SSI3. Based on this good correlation, they are used as predictors for SSI1 and SSI3 in the machine learning models.
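A correlation screening of this kind can be sketched as below. The variable names and values are hypothetical, and ranking by the absolute value of the Pearson coefficient is an assumption, since the text only states that higher correlations are prioritized.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def rank_predictors(candidates, target):
    """Return (name, r) pairs ordered from strongest to weakest |r| with the target."""
    scores = {name: pearson_r(series, target) for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical target and candidate predictors (not data from the study).
target_ssi = [-1.2, -0.4, 0.3, 1.1, 0.6, -0.8, -1.5, 0.2]
candidates = {
    "discharge": [10.0 + 5.0 * v for v in target_ssi],  # constructed to track the target
    "precip": [5.0, 20.0, 45.0, 90.0, 60.0, 12.0, 2.0, 40.0],
    "dew_point": [9.0, 7.5, 8.0, 8.8, 7.2, 9.1, 8.3, 7.9],
}

ranking = rank_predictors(candidates, target_ssi)
```

Candidates near the top of `ranking` would be retained as predictors; a threshold on |r| could be added, but no such threshold is reported in the text.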
First, the meteorological drought index (SPI) and hydrological drought index (SSI) are calculated from the satellite precipitation data and gauge station, respectively, for a duration of 1 and 3 months. Later, the standardized streamflow index is calculated by evaluating the discharge from the HEC-HMS model.
Figure 4 indicates the different steps involved in hydrological modeling using the HEC-HMS.
Machine learning models take 75% of the selected predictor dataset as a training set and 25% of the data as a testing dataset. The predictors used here are mentioned in
Table 5. Using these predictors, the SSI is predicted for two time scales, 1 and 3 months. Later, the coefficient of determination, root mean square error, and mean absolute error are evaluated in Python for the training, testing, and overall datasets.
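The data preparation implied here can be sketched as follows, assuming that the predictors are values from the preceding 1 to 5 months and that the 75/25 split is made in time order; neither detail is confirmed by the text. The helper names and the synthetic series are hypothetical; only the record length (252 months, i.e., 21 years) follows the study period.

```python
def make_lagged_rows(series_by_name, target, max_lag):
    """Build one feature row per month from the previous `max_lag` months of every series."""
    rows, targets = [], []
    for t in range(max_lag, len(target)):
        row = []
        for name in sorted(series_by_name):      # fixed column order
            for lag in range(1, max_lag + 1):    # lag 1 = previous month
                row.append(series_by_name[name][t - lag])
        rows.append(row)
        targets.append(target[t])
    return rows, targets


def chronological_split(rows, targets, train_fraction=0.75):
    """Split in time order: earliest rows for training, latest rows for testing."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:], targets[:cut], targets[cut:]


# Synthetic monthly series covering 252 months (hypothetical values).
n_months = 252
series = {
    "precip": [float((7 * m) % 13) for m in range(n_months)],
    "ssi": [((m % 12) - 5.5) / 3.0 for m in range(n_months)],
}

rows, targets = make_lagged_rows(series, series["ssi"], max_lag=5)
X_train, X_test, y_train, y_test = chronological_split(rows, targets)
```

Splitting in time order keeps future months out of the training set; a random split would be an equally valid reading of the text but would mix past and future observations.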
2.7. Evaluation Parameters
The observed and predicted data were analyzed, and the most suitable model was chosen using the coefficient of determination (R²) value and root mean square error (RMSE) as criteria. RMSE measures the typical magnitude of the errors between the actual and predicted values, whereas R² measures the goodness of fit between the predicted and original values.
$$R^2 = 1 - \frac{\sum_{i=1}^{N} (O_i - P_i)^2}{\sum_{i=1}^{N} (O_i - \bar{O})^2}$$
where $O_i$ and $P_i$ are the observed and predicted values, $\bar{O}$ is the mean value, and N is the amount of data.
$$\mathrm{SSE} = \sum_{i=1}^{N} (O_i - P_i)^2, \qquad \mathrm{RMSE} = \sqrt{\frac{\mathrm{SSE}}{N}}, \qquad \mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| O_i - P_i \right|$$
where RMSE, SSE, and MAE indicate the root mean square error, sum of squared error, and mean absolute error, respectively. A higher R² value implies a better prediction capacity of the model, and if the value is 1, then there is a perfect correlation between the predicted and observed values. Usually, values above 0.5 are considered acceptable [
53]. This gives a foundation for determining whether the model is suitable for prediction. Similarly, RMSE quantifies the difference between the actual and predicted values. A lower RMSE value indicates that the predicted values are close to the actual values, and an RMSE of zero corresponds to a perfect fit. RMSE values lower than half the standard deviation of the measured data can be deemed low, which makes this threshold useful for assessing the model’s performance.
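The evaluation metrics can be computed with a few lines of code. The sketch below assumes the common definition of R² as one minus the ratio of the sum of squared errors to the total sum of squares, and uses the sample standard deviation for the "half the standard deviation" rule of thumb; the function names and the example values are hypothetical.

```python
from math import sqrt
from statistics import stdev


def sse(observed, predicted):
    """Sum of squared errors."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def rmse(observed, predicted):
    """Root mean square error."""
    return sqrt(sse(observed, predicted) / len(observed))


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination, taken here as 1 - SSE / SST (an assumption)."""
    mean_obs = sum(observed) / len(observed)
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse(observed, predicted) / sst


def rmse_is_low(observed, predicted):
    """Rule of thumb from the text: RMSE below half the standard deviation of the data."""
    return rmse(observed, predicted) < 0.5 * stdev(observed)


# Hypothetical observed and predicted index values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

scores = {
    "R2": r_squared(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
}
```

For these example values the errors are small relative to the spread of the observations, so R² is close to 1 and the RMSE falls well below half the standard deviation.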
4. Discussion
The results of the study explain and predict drought occurrences in a watershed near Folsom Lake. Both hydrological and machine learning models use satellite precipitation data from the PDIR-Now dataset. PDIR-Now was found to be the best precipitation product in the study of Huang et al. [
55] among different PERSIANN family products. The satellite-based rainfall estimates are advantageous regarding accuracy, timeliness, spatial coverage, and cost efficiency [
56]. It is efficient for real-time rainfall monitoring and showing the development of drought conditions. Satellite precipitation data have demonstrated the capability to accurately depict the spatiotemporal fluctuations in precipitation across most global regions with exceptional precision [
57]. Nonetheless, the precision of satellite-derived precipitation is influenced by numerous factors. A significant drawback of satellite precipitation data is their limited historical data availability. Currently, only PERSIANN-CDR and CHIRPS offer data records exceeding 30 years of coverage, leaving many research studies constrained by the existing dataset [
58].
This study used hydrological modeling to evaluate droughts. Trambauer et al. [
59] reviewed previous studies and found the hydrological model suitable for drought forecasting. Xing et al. [
60] conducted research on the adaptability of hydrological models for the purpose of simulating and forecasting droughts. The study concluded that the HEC-HMS model is suitable for forecasting hydrological droughts due to its adaptable structure. These studies concur with our results showing the better performance of the HEC-HMS model for drought evaluation.
This study showed the high accuracy of the SVR and RF models in predicting hydrological drought. In a study by Jehanzaib et al. [
61], it was concluded that the SVM model showed better performance than the RF model. The SVM model holds significance in the realm of hydrological variables because of its effectiveness in handling high-dimensional spaces. The predicted results for the 3-month duration are more precise than those for 1 month. This coincides with the drought prediction result of Belayneh and Adamowski [
54], where the SPI6 forecasts were found to be more accurate than SPI3. The only difference is that our study included SSI1 and SSI3 instead of SPI3 and SPI6. The result of SSI3 is more accurate than SSI1 in both machine learning models. This is because of the greater randomness in the weather over a shorter time, which makes accurate prediction challenging. A drawback of both models used here is that neither takes into account other types of hydrological and climatic data, such as groundwater level and temperature, which influence the chances of drought occurrence.
The study employed the HEC-HMS model as the hydrological model, which models several processes, including baseflows, infiltration, and rainfall runoff, using empirical techniques that might not accurately reflect the system’s behavior. Since the HEC-HMS is intended for natural watersheds, it might not yield reliable results in urban watersheds. Furthermore, future climatic changes, such as changes in temperature and precipitation patterns, are not taken into consideration by the HEC-HMS. Integration with other models, such as SWAT or MODFLOW, can improve the model. The model can be further enhanced using real-world field data, such as soil moisture. Climate models could also be used to generate future climate scenarios as improved HEC-HMS input. With noisy datasets, RF and SVR can perform poorly, which may result in overfitting. Hybrid modeling, hyperparameter adjustment, and feature selection can all help overcome this constraint. Effective drought forecasting and preparedness are essential, and this can be achieved by implementing measures to conserve water supplies, such as building dams. According to Kazakis et al., small dams help retain water for groundwater recharge in the summer [
62]. On the other hand, their quantity and location are determined by their future uses, such as fish farming, leisure activities, or hydropower research. Additional management techniques, such as tiered water pricing, rainwater collecting, and intelligent irrigation, can be used to lessen the severity of droughts.
5. Conclusions
The results indicated that the RF and SVR models predict the SSI more accurately for three months of drought than only one month of drought conditions. This is because there is higher fluctuation in the climatic parameters used in the prediction model in the shorter time duration compared to the longer time duration. The results also indicate the better performance of the SVR model than the RF model in drought prediction for this particular study basin area. Despite slight differences in the prediction results, RF can also be used for drought forecasting with significant accuracy. Unlike previous research, our research has closely examined the performance of the hydrological model using satellite-based precipitation data to evaluate the drought index. The result is satisfactory compared to the actual condition. This allows using a hydrologically modeled drought index when discharge data is unavailable in the basin due to the absence of gauge stations.
The study compared the results based on the values of the statistical parameters (R², MAE, and RMSE). The study used the present and previous months’ lagged values of SSI, SPI, precipitation, discharge, dew point, and specific humidity as the machine learning models’ input. Further research is suggested to investigate how different drought indices can be used and how they impact the precision of machine learning models in different climatic conditions. Different proportions of training and testing data can be explored in future studies. Moreover, the model’s performance can be enhanced by including more correlated parameters, varying the lead times, and using more refined data from a reliable source, if available. Because of its complex nature, drought is influenced by factors like climatic conditions, land use, and socio-economic factors, so perfectly accurate prediction is not possible [
30]. For the HEC-HMS model, numerous input variables are required, which can influence the model’s output and, ultimately, the assessment of hydrological drought. Future research can incorporate other AI methods as prediction models and select the best for reliability, robustness, and accuracy. This study can be extended to multiple drought-affected basins with different climatic and physical conditions to evaluate the performance variations. Moreover, this study only focuses on hydrological drought using the standardized streamflow index. Future research can investigate other types of droughts, along with their severity and frequency, using different indices. Further, hydrological drought results from hydrological models other than the HEC-HMS can be evaluated. Different machine learning models can be hybridized and tested for enhanced model performance in upcoming research.