Abstract

This paper investigates the performance of gridded rainfall datasets for precipitation detection and streamflow simulations in Indiaʼs Tungabhadra river basin. Sixteen precipitation datasets categorized under gauge-based, satellite-only, reanalysis, and gauge-adjusted datasets were compared statistically against the India Meteorological Department (IMD) gridded dataset employing two categorical and three continuous statistical metrics. Further, the precipitation datasets’ performance in simulating streamflow was assessed by using the Soil and Water Assessment Tool (SWAT) hydrological model. Based on the statistical metrics, the Asian Precipitation Highly Resolved Observational Data Integration Towards Evaluation (APHRODITE) dataset furnished very good results in terms of detecting rainfall, followed by the Climate Hazards Group InfraRed Precipitation (CHIRP), National Centers for Environmental Prediction Climate Forecast System Reanalysis (NCEP-CFSR), Tropical Rainfall Measuring Mission (TRMM) 3B42 v7, Global Satellite Mapping of Precipitation Gauge Reanalysis v6 (GSMaP_Gauge_RNL), and Multi-Source Weighted-Ensemble Precipitation (MSWEP) datasets, which had good-to-moderate performances at a monthly time step. From the hydrological simulations, TRMM 3B42 v7, CHIRP, CHIRPS 0.05°, and GSMaP_Gauge_RNL v6 produced very good results with a high degree of correlation to observed streamflow, while the Soil Moisture 2 Rain-Climate Change Initiative (SM2RAIN-CCI) dataset exhibited poor performance. From the extreme flow event analysis, it was observed that the CHIRP, TRMM 3B42 v7, Global Precipitation Climatology Centre v7 (GPCC), and APHRODITE datasets captured more peak flow events and hence can be further implemented for extreme event analysis. Overall, we found that the TRMM 3B42 v7, CHIRP, and CHIRPS 0.05° datasets performed better than the other datasets and can be used for hydrological modeling and climate change studies in similar topographic and climatic watersheds in India.

1. Introduction

Precipitation is an intrinsic component of the hydrological cycle. Whether measured directly at rain gauge stations or estimated from different satellite sensors, it plays a crucial role in water resources management, climatic research, and disaster management studies. Though in situ ground-based precipitation datasets provide highly accurate results, the unavailability of data and the sparse and uneven distribution of gauges over unpopulated areas make it challenging to use them for global applications. Recently, [1] highlighted that the total collecting area of all rain gauges measuring precipitation globally is surprisingly small, covering less than half a soccer field. This limited gauge station availability motivated researchers to develop and test satellite/secondary rainfall estimates available from different sources/sensors having high temporal and spatial resolutions and providing global coverage at subdaily, daily, and monthly time steps. These secondary datasets have high potential in monitoring precipitation, making it compelling to use them in water resource management studies where in situ observations are scarce, particularly in remote areas [2]. Advancements in blending infrared and microwave datasets and the availability of near-global coverage with multitemporal resolutions have increased the applicability of satellite rainfall datasets over a wide range of applications. However, a direct application of these precipitation products in hydrological modeling might lead to erroneous outputs, especially for extreme flow simulations [3]. Therefore, a comprehensive validation in replicating both the climate and hydrological components is essential to identify the best precipitation product over a specific region.

Several studies have compared different satellite precipitation products (SPPs) with gauge-based or radar-based datasets, in terms of statistical metrics or hydrological modeling, to assess the SPPs’ ability and efficiency in detecting rainfall accurately. Numerous studies related to the assessment and evaluation of SPPs against gauge-based data can be found worldwide [4–8]. Reviews related to SPP evaluation through various hydrological modeling frameworks can also be found for different climatic and geographical regions [9–12]. Most of these studies focused on either analyzing a single rainfall productʼs performance in hydrological modeling or evaluating the efficiency of a few rainfall products in runoff simulations, thus restricting their analysis to specific products [7, 13–20]. Most of the studies have not considered reanalysis products during their evaluation or have not recalibrated each rainfall dataset, thus failing to differentiate the efficiencies of in situ corrected and uncorrected datasets [15, 21–24]. Studies that executed both statistical and hydrological comparisons also revealed that precipitation datasets that prove effective in statistical comparison might not exhibit the same accuracy in hydrological simulations [9–11]. This mandates the assessment of SPPs by employing different hydrological models/SPPs and by conducting both hydrological and statistical analyses in a basin to truly interpret the behavior, characteristics, efficiency, and performance of an SPP for hydrological applications. Very few studies tested the efficiency of a larger number of precipitation datasets; however, these studies evaluated the SPPs’ performance using lumped hydrological models, thus not considering the intrinsic spatial behavior of river basin characteristics, which are averaged over the subbasins [17, 25, 26]. Moreover, the few studies that implemented semidistributed/distributed models have not accommodated and tested the efficiency of a larger number of SPPs [18, 27–30]. This motivated us to test the efficiency of numerous SPPs using a semidistributed hydrological model in the current study.

Further, the SPPs produced from satellite sensors are increasingly considered for hydrological modeling because of their long-term, consistent, and continuous data availability in mountainous and hilly regions [31–33]. However, a recent study [20] conducted in different climatic regions of India concluded that the precipitation datasets failed in detecting rainfall in the tropical region. This mandates testing the performance of SPPs by employing statistical and hydrological techniques in a mountainous tropical river basin. To overcome these shortcomings/research gaps, we evaluated 16 daily precipitation datasets against gauge-based gridded data and employed a semidistributed hydrological model in a tropical river basin of India. The 16 precipitation datasets considered in the current study are grouped under four categories, that is, gauge-based, satellite-only, reanalysis, and gauge-adjusted datasets, according to the classification mentioned by [34, 35]. To address these research gaps, the gridded rainfall products were analyzed for the period of 2000–2012, and different statistical coefficients were computed (i) to check the performance of these datasets in detecting rainfall when compared with gauge-derived gridded data, (ii) for comparing the simulated streamflow against observed streamflow using the Soil and Water Assessment Tool (SWAT), a semidistributed hydrological model, in a hilly tropical catchment of India, and (iii) for assessing the extreme flows simulated by the SWAT model employing these SPPs. Though these datasets’ performance varies across space and time, the study provides insight to researchers into selecting precipitation datasets when working on similar climatic and topographic regions for hydrological modeling and other precipitation-related studies.

The water input, that is, precipitation, was replaced with 16 different precipitation estimate products obtained from various sources to find their effects on runoff simulation using the SWAT model. Hydrological models are sensitive to input variable changes, where a small change in the input data can result in more significant deviations in output. Hence, it may be assumed that the weather data that can simulate streamflow against observed streamflow with the highest correlation and least variance and bias have the best claim to be accurate. Since studies using satellite precipitation datasets in hydrological models were scarce for mountainous tropical river basins, particularly in India, we were motivated to assess the SPPs’ efficiency for simulating runoff in an Indian river basin. The SWAT model was selected for modeling the Tungabhadra river basin, south India, and different standard diagnostics like R2 (coefficient of determination), N-S (Nash-Sutcliffe coefficient), and PBias (percentage bias) were computed to evaluate the performance of each SPP. Though many distributed/semidistributed models such as MIKE SHE [36], TOPNET [37], PIHM [38], CREST [39], and VIC [40] were developed to incorporate the heterogeneity of basins, the physically based semidistributed model SWAT has been implemented by many researchers around the world due to its applicability and efficiency in simulating the characteristics of a basin effectively. The SWAT model has been applied successfully to a wide variety of studies related to uncertainty analysis [7, 41, 42], climate change [43, 44], land-use change [45–47], hydrological modeling [48, 49], best management practices generation [50, 51], water quality analysis [52, 53], and hydropower assessment [54, 55]. A study by [56] concluded that, as of 2019, around 4370 articles had been published which implemented the SWAT model for a wide range of applications in diverse catchments having varied topographic and climatic conditions. As the current study area is agriculturally dominated, SWAT can fairly simulate results close to the observed values, since SWAT was mainly developed to model agricultural watersheds [57]. Hence, we adopted the SWAT model instead of opting for other distributed models.

2. Study Area

Tungabhadra, a major tributary of the river Krishna, is a transboundary river shared by the states of Andhra Pradesh and Karnataka in south-western India. The basin has an area of around 69,552 km2 with elevation ranging from 246 m to 1921 m above mean sea level [17, 45]. The Tungabhadra river originates from the confluence of the twin rivers Tunga and Bhadra in the Western Ghats of Karnataka at an altitude of about 610 m above mean sea level. The current study area is a part of the upper Tungabhadra river basin with a catchment area of 7778 km2, considered up to the Honnali gauge station (at an elevation of 557 m), which is the outlet of the catchment. The study area lies between 74°00′ and 76°30′ E and 13°00′ and 15°30′ N, as shown in Figure 1.

The upper part of the catchment has an undulating terrain of Western Ghats and receives high rainfall compared to the lower portion of the watershed. The basin receives an average rainfall of about 1024 mm per year with mean maximum temperatures and minimum temperatures ranging from 26.3°C to 35.5°C and from 13.8°C to 22.3°C, respectively, and the relative humidity varying from 17% to 92% [58].

3. Input Datasets for the SWAT Model

The Shuttle Radar Topographic Mission (SRTM) Digital Elevation Model (DEM) with a spatial resolution of 30 m, downloaded from USGS Earth Explorer, was used in this study. The slope map was generated from the DEM by selecting multiple classes based on the steepness of the surface and was categorized into five levels (0–10, 10–20, 20–30, 30–40, and >40%). The slope map was classified using the Jenks (natural breaks) classification scheme available in the SWAT interface. The Jenks classification clearly sorts out ranges that can qualitatively represent the study areaʼs slope distribution. The SWAT interface allows a maximum of five different classes to be defined for delineating discrete Hydrologic Response Units (HRUs). The slope ranges were designed to fit homogeneous catchment areas within each slope range, seeking the most similar slope influence distribution in delineating HRUs. Soil information was obtained from the Food and Agricultural Organization (FAO) Digital Soil Map of the World (DSMW) based on the Harmonized World Soil Database (HWSD), produced at a scale of 1 : 5,000,000 from existing regional and national soil information. Land use land cover (LULC) was generated from Landsat-8 satellite imagery by employing the maximum likelihood algorithm in ERDAS Imagine software. The LULC was classified into six classes, namely, agricultural land, cultivated land, barren land, forest area, water body, and built-up area, as shown in Supplementary Figure 1. Based on ground validation, the overall accuracy of the classified image was 85.94%, with a kappa coefficient of 0.789. SWAT requires five meteorological input variables. Of these, precipitation was taken from one of the 16 precipitation datasets considered and temperature from the India Meteorological Department (IMD) gridded data, whereas relative humidity, solar radiation, and wind speed were generated using the SWAT weather generator. The observed daily streamflow data were obtained from India-WRIS (Water Resources Information System) for the Honnali gauge station for 13 years spanning from 2000 to 2012. The difference between the acquisition year of the LULC (2018) and the period of the weather data (2000 to 2012) is acceptable, since previous research conducted by [47] in the Tungabhadra river basin concluded that the effect of change in LULC on streamflow simulations is negligible. Hence, for accurate classification and representation of land use classes, the 2018 LULC was used in the current study.
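The reported classification accuracy figures follow directly from a ground-validation confusion matrix. As an illustration only, the short sketch below computes overall accuracy and Cohen's kappa from a purely hypothetical six-class confusion matrix (not the matrix used in this study).

```python
import numpy as np

def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix
    (rows: reference classes, columns: classified classes)."""
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    observed = np.trace(confusion) / total                      # overall accuracy
    expected = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / total ** 2
    return observed, (observed - expected) / (1.0 - expected)   # (accuracy, kappa)

# Purely hypothetical 6-class confusion matrix (agricultural, cultivated,
# barren, forest, water, built-up) from ground-validation points.
cm = [[52, 3, 1, 2, 0, 0],
      [4, 47, 2, 1, 0, 0],
      [1, 2, 38, 0, 0, 1],
      [2, 1, 0, 55, 0, 0],
      [0, 0, 0, 0, 20, 0],
      [0, 1, 1, 0, 0, 18]]
oa, kappa = accuracy_and_kappa(cm)
print(f"Overall accuracy = {oa:.2%}, kappa = {kappa:.3f}")
```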

3.1. Overview of Precipitation Datasets

Different spatial resolutions of the same dataset are considered in this study to understand the effect of these varying resolutions on the modelʼs performance. A brief description of the precipitation datasets grouped under four categories, that is, gauge-based, satellite-only, reanalysis, and gauge-adjusted datasets, is given in Table 1. More information regarding the development of the SPPs can be found in the supplementary file under the section Supplementary Materials.

4. Methodology

Testing the capability of these datasets is generally performed in two approaches: (i) by keeping all topographic and climatic variables and sensitive parameters constant with varying precipitation datasets (traditional method) and (ii) by calibrating the model with constant topographic and climatic variables but with varying parameters for each precipitation dataset. The former approach uses a single set of parameters obtained from calibrating a standard precipitation dataset (generally station data) and implementing those sensitive parameters for calibrating and validating the model with other precipitation datasets. The latter approach deals with calibrating and validating the model separately for each input forcing (precipitation datasets).

The traditional method may induce some uncertainty, since all the datasets may not be sensitive to similar parameters. To overcome this limitation, the present study adopted the alternative methodology of calibrating the SWAT model separately with each precipitation dataset. Calibrating the model individually for each input dataset will help to drive the model to its maximum efficacy with the most suitable parameters for each input dataset. Further, the results can be validated using consistent statistical metrics. The alternate approach helps us understand the most capable precipitation dataset for simulating streamflow that accurately matches the observed streamflow.
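As a conceptual illustration of the difference between the two approaches, the sketch below contrasts reusing one reference parameter set with calibrating each forcing separately. The `calibrate` and `run_swat` functions are placeholders standing in for a SUFI-2 calibration batch and a SWAT simulation run; this is not the actual SWAT-CUP workflow.

```python
def calibrate(forcing, objective="NSE"):
    # Placeholder for a SUFI-2 calibration batch in SWAT CUP: returns the
    # best-fit parameter set found when the model is driven by `forcing`.
    return {"CN2": -0.1, "ALPHA_BF": 0.05, "calibrated_on": forcing, "objective": objective}

def run_swat(forcing, params):
    # Placeholder for a SWAT simulation; would return the simulated monthly flows.
    return f"flows for {forcing} with parameters calibrated on {params['calibrated_on']}"

precip_datasets = ["IMD", "APHRODITE", "CHIRP", "TRMM 3B42 v7"]  # 16 in the study

# (i) Traditional approach: one parameter set, calibrated on the reference
#     dataset, reused unchanged for every other precipitation forcing.
ref_params = calibrate("IMD")
traditional = {p: run_swat(p, ref_params) for p in precip_datasets}

# (ii) Approach adopted here: each forcing gets its own calibration, so every
#      dataset is pushed to its best achievable performance before comparison.
per_dataset = {p: run_swat(p, calibrate(p)) for p in precip_datasets}
```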

A similar methodology was implemented by [69, 70], where each precipitation dataset is calibrated separately to assess the efficiency of precipitation datasets. Their studies proved that using the MSWEP precipitation datasetʼs sensitive parameters gave better results than using a standard gauge datasetʼs sensitive parameters (while calibrating MSWEP product). Hence, in the present study, each precipitation dataset is calibrated separately to test the datasetʼs maximum efficiency in producing streamflow using the SWAT model that can match against observed streamflow.

4.1. Continuous and Categorical Statistical Indices

The categorical statistics include the probability of detection (POD) and false alarm ratio (FAR) metrics, whereas the continuous statistical indices encompass the correlation coefficient (CC), root mean square error (RMSE), and bias. The precipitation detection ability can be assessed using the categorical metrics, whereas the SPPs’ performance in estimating precipitation amounts is determined using the continuous statistical metrics. POD is the ratio of hits (rainfall events correctly detected in agreement with the reference data) to the actual number of rainfall events according to the base dataset (IMD). FAR is the ratio of false alarms (events for which the SPP records rainfall while the base dataset records none) to the total number of rainfall events detected by the SPP (hits plus false alarms). The categorical metrics assessment is essential in real-time flood monitoring studies because of the importance of accurately detecting extreme precipitation. These statistics help us comprehend the hydrological consequences of the sources of errors in an SPP [71–73]. For computing POD and FAR, a threshold of 1 mm/day was implemented in the study, as in [7, 20, 74, 75]. The categorical metrics were computed either for the entire time series or after segregating it into different rainfall regimes. Table 2 exhibits the criteria implemented to partition the precipitation time series into multiple components, that is, low, medium, and high rainfall.
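A minimal sketch of how such categorical metrics can be computed is given below, using synthetic daily series and one plausible way of conditioning the contingency counts on the rainfall classes of Table 2; the exact class-wise formulation used in the study may differ.

```python
import numpy as np

def pod_far(reference, estimate, lower, upper=np.inf):
    """POD and FAR for events whose daily intensity falls in [lower, upper) mm/day.
    Hits require both series in the class, misses only the reference, and false
    alarms only the estimate."""
    ref_in = (reference >= lower) & (reference < upper)
    est_in = (estimate >= lower) & (estimate < upper)
    hits = np.sum(ref_in & est_in)
    misses = np.sum(ref_in & ~est_in)
    false_alarms = np.sum(~ref_in & est_in)
    pod = hits / (hits + misses) if hits + misses else np.nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else np.nan
    return pod, far

# Hypothetical daily series (mm/day); in the study the reference is IMD and the
# estimate is one of the other 15 precipitation datasets.
rng = np.random.default_rng(0)
imd = rng.gamma(shape=0.4, scale=8.0, size=4748)       # ~13 years of daily values
spp = imd * rng.lognormal(0.0, 0.5, size=imd.size)     # noisy "satellite" estimate

classes = {"rain/no-rain (>=1 mm)": (1.0, np.inf), "low (1-5 mm)": (1.0, 5.0),
           "medium (5-25 mm)": (5.0, 25.0), "high (>25 mm)": (25.0, np.inf)}
for name, (lo, hi) in classes.items():
    pod, far = pod_far(imd, spp, lo, hi)
    print(f"{name:>22}: POD = {pod:.2f}, FAR = {far:.2f}")
```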

CC represents the degree of synchronicity between SPP and gauge or gridded data. RMSE indexes the data accuracy or the averaged error magnitude between the gauge and SPP. Bias signifies the degree of underestimation or overestimation of simulated data concerning observed data. Metrics with lower bias and RMSE and higher CC represent higher SPP accuracy with respect to base or reference datasets. More details regarding the metrics and formulas can be found in [74, 76]. The equations employed for calculating POD, FAR, CC, RMSE, and bias are represented as follows:

$$\mathrm{POD} = \frac{H}{H + M}, \qquad \mathrm{FAR} = \frac{F}{H + F},$$

$$\mathrm{CC} = \frac{\sum_{i=1}^{n} (G_i - \bar{G})(S_i - \bar{S})}{\sqrt{\sum_{i=1}^{n} (G_i - \bar{G})^2}\,\sqrt{\sum_{i=1}^{n} (S_i - \bar{S})^2}},$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (S_i - G_i)^2}, \qquad \mathrm{Bias} = \frac{1}{n}\sum_{i=1}^{n} (S_i - G_i).$$

Here, $H$, $M$, and $F$ represent the numbers of hits, misses, and false events of rainfall, $G_i$ and $S_i$ represent the observed (gauge) and satellite precipitation estimates, $\bar{G}$ and $\bar{S}$ are the averages of the observed and satellite precipitation estimates, and $n$ represents the number of data pairs.
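For illustration, the snippet below computes the three continuous metrics for a pair of synthetic series. Bias is computed as the mean error (mm/day), matching the reconstruction above; it is positive when the satellite estimate overestimates the reference.

```python
import numpy as np

def continuous_metrics(gauge, satellite):
    """CC, RMSE, and bias (mean error, mm/day) of a satellite estimate against
    a gauge/gridded reference series."""
    g, s = np.asarray(gauge, float), np.asarray(satellite, float)
    cc = np.corrcoef(g, s)[0, 1]
    rmse = np.sqrt(np.mean((s - g) ** 2))
    bias = np.mean(s - g)        # > 0 means overestimation, < 0 underestimation
    return cc, rmse, bias

# Hypothetical basin-averaged monthly series (mm/day) for illustration only.
rng = np.random.default_rng(1)
imd = rng.gamma(2.0, 1.5, size=156)                  # 13 years x 12 months
spp = imd + rng.normal(0.5, 1.0, size=imd.size)      # biased, noisy estimate
cc, rmse, bias = continuous_metrics(imd, spp)
print(f"CC = {cc:.2f}, RMSE = {rmse:.2f} mm/day, bias = {bias:.2f} mm/day")
```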

4.2. SWAT Model

In this study, the SWAT model was employed for modeling the Tungabhadra river basin, India, to assess how changes in the input rainfall datasets affect the simulated hydrologic process, that is, runoff. SWAT is a deterministic model developed by the United States Department of Agriculture-Agricultural Research Service (USDA-ARS) [77], which uses both lumped (rainfall per subbasin) and distributed (HRUs, i.e., combinations of unique soil, slope, and land use characteristics) variables for hydrological modeling. The watershed delineated using the SWAT model produced 73 subbasins and 503 HRUs in the current study. SWAT uses readily available inputs for predicting various components related to water, sediment, and agricultural chemical yields for all types of watersheds at daily, monthly, and annual time steps. The overall methodology is represented in Figure 2.

The SWAT model was calibrated at a monthly time step rather than at a daily scale because of the catchment size and the type of land use in the basin, which influence the hydrologic response times through changes in the characteristic velocity. Urban and agriculturally dominated catchments exhibit fast and intermediate hydrological responses, whereas forest-dominated basins exhibit slower hydrological responses. The watershed in the current study is forest-dominated followed by agriculture (part of the agricultural area has characteristics similar to forest due to the high occupancy of areca nut and coconut plantations in the study area), which implies a slower-to-intermediate hydrologic response time. Hence, calibrating a model with different inputs may require observations whose temporal resolution corresponds to the respective hydrologic response time, implying that, for meaningful results, the model should be calibrated at a monthly time step [78]. Many studies [42, 74, 79–81] conducted over different regions and catchments, and two studies [47, 58] over the Tungabhadra basin, concluded that the SWAT modelʼs performance in simulating flows is better at a monthly time step than at a daily time scale. Hence, the current study employed the SWAT model to simulate streamflows with the different precipitation datasets at a monthly time step.
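As a small illustration of the monthly time step, the snippet below aggregates hypothetical daily flow series to monthly means with pandas; the series names and values are placeholders, not the study's data.

```python
import numpy as np
import pandas as pd

# Hypothetical daily flows (m3/s); in the study these are the Honnali gauge
# observations and the SWAT-simulated series for 2000-2012.
days = pd.date_range("2000-01-01", "2012-12-31", freq="D")
rng = np.random.default_rng(2)
daily = pd.DataFrame({"observed": rng.gamma(1.5, 60.0, len(days)),
                      "simulated": rng.gamma(1.5, 55.0, len(days))}, index=days)

# Aggregate to the monthly time step used for calibration and validation
# (monthly mean discharge; summing to monthly volumes is an alternative choice).
monthly = daily.resample("MS").mean()
print(monthly.head())
```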

4.3. SWAT CUP

Calibration and validation of the SWAT output (runoff) were carried out using the SUFI-2 (Sequential Uncertainty Fitting version 2) algorithm in SWAT CUP. SWAT CUP is an autocalibration and uncertainty analysis tool that optimizes a range of input parameters iteratively through calibration batches. The autocalibration, which relies on a global optimization method, that is, the Shuffled Complex Evolution algorithm developed at the University of Arizona (SCE-UA), is implemented in the current study to optimize the sensitive parameters. SWAT CUP provides various sensitive parameters with varying minimum and maximum values, which should be adjusted to obtain the best fit when compared with observed data.

As a large number of parameters can potentially be adjusted within the SWAT model, candidate sensitive parameters were identified from previous literature for performing the sensitivity analysis. The SWAT model was run for 1000 simulations with this list of candidate sensitive parameters, and the sensitive parameters were selected according to the t-statistic and p value. The p value ranges from 0 to 1 and indicates the significance of the sensitivity, where a value closer to 1 identifies a nonsensitive parameter and vice versa. The sensitive parameters (listed in Supplementary Table 1) obtained after the sensitivity analysis were employed to calibrate and validate the model separately with each input forcing. The calibration of the SWAT model for the study area was carried out by comparing observed monthly streamflows over the training period with simulated values at the outlet of the basin, with the Nash-Sutcliffe (N-S) coefficient as the objective function. Fourteen parameters with varying ranges were employed over different sets of iterations for the different input dataset calibrations. The parameters adjusted in the present study are CN2 (SCS runoff curve number), ALPHA_BF (base flow alpha factor), GW_DELAY (groundwater delay time), GWQMN (threshold depth of water in the shallow aquifer required for return flow to occur), CH_N2 (Manning “n” coefficient for the main channel), CH_K2 (main channelʼs hydraulic conductivity), SOL_AWC (available water capacity of the soil layer), SOL_K (saturated hydraulic conductivity of the soil), ESCO (soil evaporation compensation factor), GW_REVAP (groundwater “revap” coefficient), REVAPMN (depth of water in the shallow aquifer required for “revap” to occur), SLSUBBSN (average slope length), SLSOIL (slope length for lateral subsurface flow), and ALPHA_BNK (base flow alpha factor for bank storage). The allowable ranges and the fitted values obtained are provided in Supplementary Table 1. The model was calibrated for the period of 2002–2008 and validated for the period of 2009–2012 with a warm-up period of 2 years. The warm-up period is required to initialize the model with the datasets and attain a hydrological balance of the model states from the initial conditions.
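The t-statistic/p-value ranking comes from a multiple linear regression of the objective function values on the sampled parameter values, which is the idea behind SWAT-CUP's global sensitivity analysis. A minimal sketch with hypothetical parameter samples and a synthetic objective is shown below; it is not the SWAT-CUP implementation itself.

```python
import numpy as np
from scipy import stats

def global_sensitivity(param_samples, objective):
    """t-statistics and p-values from a multiple linear regression of the
    objective values on the sampled parameter values; a small p-value marks a
    sensitive parameter, a p-value near 1 a nonsensitive one."""
    X = np.column_stack([np.ones(len(objective)), param_samples])
    beta, *_ = np.linalg.lstsq(X, objective, rcond=None)
    resid = objective - X @ beta
    dof = X.shape[0] - X.shape[1]
    se = np.sqrt(np.diag((resid @ resid / dof) * np.linalg.inv(X.T @ X)))
    t = beta / se
    p = 2.0 * stats.t.sf(np.abs(t), dof)
    return t[1:], p[1:]                      # drop the intercept term

# Hypothetical 1000-run sample over three parameters and a synthetic objective
# (N-S values) in which only CN2 (strongly) and SOL_AWC (weakly) matter.
rng = np.random.default_rng(3)
samples = rng.uniform(-1.0, 1.0, size=(1000, 3))          # CN2, ALPHA_BF, SOL_AWC
objective = 0.6 - 0.3 * samples[:, 0] + 0.05 * samples[:, 2] + rng.normal(0, 0.02, 1000)
for name, t, p in zip(["CN2", "ALPHA_BF", "SOL_AWC"], *global_sensitivity(samples, objective)):
    print(f"{name:>8}: t = {t:7.2f}, p = {p:.3f}")
```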

5. Results and Discussion

5.1. Spatial Patterns of Sixteen Precipitation Datasets

Figure 3 depicts the spatial variations of the 13-year mean annual precipitation of the sixteen precipitation datasets over the Tungabhadra river basin. All the precipitation datasets exhibited an increasing rainfall trend from northeast to southwest over the Tungabhadra river basin. IMD, which was considered as standard, has a mean annual rainfall varying across the basin from 795 mm to 3195 mm. Another gauge dataset, APHRODITE (Figure 3(a)), estimated an average areal rainfall almost similar to IMD ranging from 667 mm to 2917 mm.

The mean areal precipitations depicted by CHIRP (Figure 3(b)) and CHIRPS 0.25° (Figure 3(d)) under the reanalysis category were close to the ranges of IMD (Figure 3(i)). In comparison, CHIRPS 0.05° (Figure 3(c)) and NCEP-CFSR (Figure 3(k)) had higher precipitation values compared to IMD. Reanalysis products PGF 0.5°, PGF 0.25°, and MSWEP (Figures 3(m), 3(n) and 3(j)) projected very low rainfall magnitudes over the study area compared to IMD.

GPCC v2018 and GPCC v7 (Figures 3(e) and 3(f)) have higher values of mean areal precipitation when compared to IMD, while lower average annual values were seen for TRMM 3B42 v7 (Figure 3(p)). GPCP-CDR v1.3 (Figure 3(g)) and PERSIANN-CDR (Figure 3(l)) datasets under the gauge-adjusted category were unable to detect the rainfall variation over the study area with precipitation spatially varying from 867 to 1036 mm and from 698 to 917 mm, respectively. The substantial difference of rainfall amount between southwest and northeast parts of the basin signifies the orographic effect of precipitation due to the existence of the Western Ghats mountains in the upper part of the basin, that is, at the southwest side of the catchment. Overall, GPCP-CDR v1.3, MSWEP, PERSIANN-CDR, PGF 0.25°, and SM2RAIN-CCI datasets failed to represent patterns over the study area accurately. CHIRP, CHIRPS 0.05°, and NCEP-CFSR captured the patterns effectively when compared against the IMD precipitation dataset.
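The mean annual maps in Figure 3 are simply the 13-year averages of annual precipitation totals per grid cell; a short sketch with a hypothetical daily grid (placeholder values, not any of the actual products) illustrates the computation.

```python
import numpy as np

# Hypothetical daily precipitation stack (mm/day) on a small lat-lon grid over
# the basin for 2000-2012; real inputs would come from each gridded product.
rng = np.random.default_rng(4)
n_years = 13
daily = rng.gamma(0.4, 8.0, size=(n_years * 365, 10, 12))     # (time, lat, lon)

# 13-year mean annual precipitation per grid cell (mm/year): total precipitation
# over the record divided by the number of years, as mapped in Figure 3.
mean_annual = daily.sum(axis=0) / n_years
basin_average = mean_annual.mean()                            # areal mean (mm/year)
print(mean_annual.min().round(0), mean_annual.max().round(0), basin_average.round(0))
```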

5.2. Evaluation of Satellite Precipitation Datasets through Statistical Measures
5.2.1. Continuous Statistical Indices

(1) From Gauge-Based and Satellite-Only Datasets. The correlation coefficients of the 15 datasets compared with the IMD mean precipitation distribution are close to their perfect value (above 0.75), except for PERSIANN-CDR, which has a weak correlation (Figure 4). Under the gauge-based dataset category, APHRODITE has a low RMSE (2.89) and a moderate overestimation bias (0.99). The satellite-derived SM2RAIN-CCI dataset provided a good correlation against the IMD dataset at a monthly time step (Figure 4). In contrast, the other two statistical metrics exhibited significant overestimation and higher RMSE values for the SM2RAIN-CCI dataset.

(2) From Reanalysis Datasets. All datasets under the reanalysis category exhibited a high correlation against the IMD dataset at the monthly time step (Figure 4). The underestimation of precipitation amount by the CHPClim (CHIRP, CHIRPS 0.25°, and CHIRPS 0.05°) datasets has been reported earlier by [42, 73, 82]. The NCEP-CFSR dataset under the reanalysis category revealed better performance with high CC (0.86), low RMSE (3.33), and low bias (−0.77) (Table 3). The Princeton datasets (PGF 0.25° and PGF 0.5°) overestimated the precipitation magnitudes, with moderate bias (1.93 and 1.46) and moderate RMSE (4.39 and 3.69 mm).

(3) From Gauge-Adjusted Datasets. TRMM 3B42 v7 and GPCC v7 demonstrated effective performance in detecting precipitation similar to IMD as they exhibited near-perfect results in terms of CC, RMSE, and bias under gauge-adjusted category at a monthly time step (Table 3). The close agreement and low bias between the TRMM 3B42 v7 and IMD datasets can be attributed to the gauge-based datasetʼs pivotal role in the bias removal process [83]. Poor performance and overestimation were observed from the statistical values of PERSIANN-CDR with high error magnitudes (RMSE = 6.42) and weak correlation (0.34).

Overall, from the results of the continuous statistical metrics, APHRODITE furnished excellent results in detecting rainfall, with low bias and error magnitudes relative to the IMD dataset. The CHIRP, NCEP-CFSR, TRMM 3B42 v7, GSMaP_Gauge_RNL v6, and MSWEP datasets exhibited good-to-moderate performances at a monthly time step.

5.2.2. Categorical Statistical Metrics

(1) From Gauge-Based and Satellite-Only Datasets. The categorical statistics, that is, POD and FAR, were computed using the formulas given above and segregated, according to the criteria in Table 2, into low (0–5 mm), medium (5–25 mm), and high (>25 mm) rainfall classes. The trends of POD and FAR for all 15 datasets computed against IMD are shown in Figures 5 and 6. APHRODITE under the gauge-based category has a high POD for the low rainfall class, indicating accurate detection of rainfall events. Further, the POD values followed a declining trend with the increase in rainfall intensity, that is, for medium and high rainfall events. The FAR value was low (0.11), indicating that the number of false events detected by APHRODITE was small, and it increased to 0.21 and 0.2 for medium and high rainfall events. Figure 5(a) shows that APHRODITE performed well in detecting rainfall and had moderate detection capability (POD = 0.5) up to a rainfall intensity of 106 mm. The SM2RAIN-CCI product under the satellite-only dataset category exhibited an intermediate ability to detect rainfall for low rainfall events (POD = 0.6) and negligible (POD = 0.08) to no (POD = 0) detection performance for medium and high rainfall events, respectively. The satellite-based dataset (SM2RAIN-CCI) yielded higher FAR values for low, medium, and high rainfall events, ranging from 0.32 to 1, indicating that the dataset detected more “no-rain” or false events. Figure 5(b) shows that the SM2RAIN-CCI dataset has a clear declining trend with the increase in precipitation intensity and detected rainfall only up to 22 mm with low POD values. The FAR values plotted in Figure 6(b) exhibited an increasing trend, indicating that the dataset registered many false rainfall events.

(2) From Reanalysis Datasets. Under the reanalysis dataset category for low rainfall events (0–5 mm), NCEP-CFSR outperformed other rainfall datasets with POD and FAR values of 0.93 and 0.3, followed by the CHIRP dataset with similar performance in detecting low rainfall events (Figures 5(c) and 6(c)). Moderate performance was observed for medium rainfall events (6–25 mm) for both NCEP-CFSR and CHIRP with POD values of 0.55 and 0.46 and FAR values of 0.53 and 0.56, respectively. Poor performance in terms of POD and FAR was observed for NCEP-CFSR and CHIRP datasets in detecting high rainfall intensities (>25 mm). MSWEP and GSMaP_Gauge_RNL v6 datasets under the reanalysis category revealed higher POD values (0.81 and 0.74) and lower FAR values (0.29 and 0.18) for rain events between 0 and 5 mm, indicating better rainfall detection capabilities. Moderate performance was exhibited by CHIRPS 0.05° and CHIRPS 0.25° datasets in all three rainfall event classes (low, medium, and high). It can be observed from Figure 5(c) that only CHIRPS datasets had some ability to capture high rainfall events when compared to other datasets in the reanalysis category. All other datasets under the reanalysis category followed a similar trend with no rain detection in high or extreme rainfall events. PGF 0.25° dataset revealed moderate rainfall detection capability in low rainfall events with a POD of 0.5 and FAR of 0.17 (Figure 6(c)) but a poor performance in detecting medium and high rainfall events. PGF 0.5° dataset resulted in poor performance in all three rainfall events (low, medium, and high) under the reanalysis category.

(3) From Gauge-Adjusted Datasets. GPCC v2018 and GPCC v7 datasets manifested better performance in detecting low rainfall events. Under medium (5–25 mm) and high (>25 mm) rainfall events, GPCC v2018 yielded moderate performance, whereas GPCC v7 revealed poor performance. From Figure 5(d), it can be observed that GPCC v2018 captured high rainfall events up to 106 mm/d with moderate detection capability. TRMM 3B42 v7 exhibited reasonable performance in detecting rainfall values across low, medium, and high rainfall events. Both GPCP-CDR v1.3 and PERSIANN-CDR exhibited poor performance with worse capabilities in all rainfall categories.

The POD values with the 1 mm/day rainfall threshold showed that the rainfall detection skill of all datasets decreases as precipitation intensity increases, whereas the FAR metric increases. This reveals that none of the implemented datasets can capture the magnitude of extreme precipitation events accurately. The two categorical metrics (POD and FAR) showed that NCEP-CFSR has the best skill in detecting low rainfall events, followed by the CHIRP, APHRODITE, MSWEP, and GPCC v2018 datasets. High rainfall events (>25 mm) were best captured by the GPCC v2018, CHIRPS 0.05° and CHIRPS 0.25°, and TRMM 3B42 v7 datasets.

5.3. Performance Evaluation of Precipitation Datasets for Streamflow Simulations

The values obtained from statistical coefficients, that is, R2, N-S, and PBias, are represented in Table 4. The model was calibrated and validated at a monthly time step with a training period of 7 years ranging from 2002 to 2008 and a validation period of 4 years running from 2009 to 2012 with a warm-up period of 2 years.
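For reference, a minimal implementation of the three coefficients (using the SWAT sign convention for PBias, where positive values indicate underestimation) on hypothetical monthly flows is shown below.

```python
import numpy as np

def evaluate_streamflow(obs, sim):
    """R2, Nash-Sutcliffe coefficient, and PBias (%) for monthly streamflow.
    PBias follows the SWAT convention: positive values indicate underestimation."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r2 = np.corrcoef(obs, sim)[0, 1] ** 2
    ns = 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)
    pbias = 100.0 * np.sum(obs - sim) / np.sum(obs)
    return r2, ns, pbias

# Hypothetical monthly flows (m3/s) standing in for the 2002-2008 calibration window.
rng = np.random.default_rng(5)
observed = rng.gamma(2.0, 80.0, size=84)
simulated = 0.95 * observed + rng.normal(0.0, 20.0, size=84)
r2, ns, pbias = evaluate_streamflow(observed, simulated)
print(f"R2 = {r2:.2f}, N-S = {ns:.2f}, PBias = {pbias:+.1f}%")
```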

5.3.1. From the Results of N-S

According to the ranges specified by [84] for streamflow simulations, IMD and APHRODITE in the gauge-based category exhibited satisfactory performance in terms of the magnitude of variance between the observed and simulated series (N-S coefficient > 0.65). The SM2RAIN-CCI dataset was rated unsatisfactory in the satellite-only category. CHIRP v2.0 produced excellent results, followed by GSMaP_Gauge_RNL v6 and CHIRPS 0.05° with good performance in the reanalysis category. TRMM 3B42 v7 performed best in both the gauge-adjusted category and overall (out of 16 datasets) in terms of producing the least variance (N-S > 0.75) between the observed and simulated streamflow results.

5.3.2. From the Results of PBias

The IMD dataset overestimated the flows with a small bias, whereas APHRODITE underestimated them with a more significant bias (PBias ≥ 25%), indicating unsatisfactory performance in the gauge-based category during calibration. SM2RAIN-CCI produced the largest mismatch between the simulated and observed streamflows and was rated unsatisfactory. CHIRP v2.0, CHIRPS 0.05°, CHIRPS 0.25°, and GSMaP_Gauge_RNL v6 displayed excellent model simulations with small biases (PBias ≤ 10%) in the reanalysis category. TRMM 3B42 v7 underestimated the streamflow with a low bias of 7.1%, and GPCC v2018 and GPCC v7 overestimated it by −5.6% and −9%, indicating very good performance (PBias ≤ 10%) in the gauge-adjusted category. PERSIANN-CDR and GPCP-CDR underestimated the flows with higher biases and were rated unsatisfactory.

5.3.3. From the Results of R2

IMD and APHRODITE obtained a moderate correlation (R2 ≥ 0.65) (Figures 7(a) and 7(b)) under the gauge-based category. Consistent with the earlier results, the SM2RAIN-CCI dataset produced a poor correlation (Figure 7(c)) under the satellite-only category. CHIRP v2.0 yielded a very good correlation between the observed and simulated discharge values, as can be seen from Figure 7(d). In comparison, the scatterplots depicted in Figures 7(e) and 7(f) show that GSMaP_Gauge_RNL v6 and CHIRPS 0.05° produced good performance (R2 ≥ 0.65) with a moderate correlation. The remaining datasets in the reanalysis category (PGF 0.5° and 0.25°, MSWEP, CHIRPS 0.25°, and NCEP-CFSR) exhibited satisfactory agreement with R2 ≥ 0.5, as depicted in Figures 7(g)–7(k). Under the gauge-adjusted classification, TRMM 3B42 v7 performed best with an excellent correlation (R2 ≥ 0.75, Figure 8(l)), followed by GPCC v2018 and v7 with good agreement between the observed and simulated streamflow values, as shown in Figures 8(m) and 8(n).

It can be observed from Table 4 that IMD performed best in the gauge-based category, whereas SM2RAIN-CCI performed worst overall. CHIRP v2.0 in the reanalysis category and TRMM 3B42 v7 in the gauge-adjusted category furnished excellent results. Overall, TRMM 3B42 v7 outperformed the other 15 datasets in terms of R2 and N-S, followed by CHIRP v2.0, GSMaP_Gauge_RNL v6, CHIRPS 0.05°, GPCC v7, GPCC v2018, IMD gridded data, PGF 0.25°, APHRODITE, PGF 0.5°, MSWEP v1.2, CHIRPS 0.25°, PERSIANN-CDR, NCEP-CFSR, GPCP-CDR v1.3, and SM2RAIN-CCI. In terms of overall performance, the TRMM 3B42 v7 and CHIRP datasets during the calibration phase, along with APHRODITE and PGF 0.25° during the validation phase, furnished very good performance with high correlation, low variance, and small biases.

5.4. Assessment of Extreme Flows

Table 5 presents the peak and standard deviation values of the observed and simulated streamflows. The peak value represents the maximum streamflow value in the observed/simulated dataset, whereas the standard deviation indicates the spread of the flows about their mean, allowing the variability of the simulated dataset to be compared with that of the observed dataset. For the model to be considered effective, the simulated series should match both the peak and low flows when plotted against the observations. From Table 5, it can be seen that the observed streamflow from the station data and the simulated streamflow from the SWAT model show larger differences in peak values during the calibration phase, contributing to the lower R2 (weaker correlation) and the mismatch of peaks represented in Figures 9(a)–9(e).
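As an illustration of this comparison, the snippet below derives the record peak, standard deviation, and annual peak errors from hypothetical monthly observed and simulated flow series (placeholders, not the values of Table 5).

```python
import numpy as np
import pandas as pd

# Hypothetical monthly observed and simulated flows (m3/s) for 2002-2008,
# standing in for the values compared in Table 5.
idx = pd.date_range("2002-01-01", "2008-12-01", freq="MS")
rng = np.random.default_rng(6)
flows = pd.DataFrame({"observed": rng.gamma(2.0, 80.0, len(idx)),
                      "simulated": rng.gamma(2.0, 70.0, len(idx))}, index=idx)

# Record peak and standard deviation of each series, plus the year-by-year error
# in annual peak flow used to judge how well the forcing reproduces extremes.
summary = flows.agg(["max", "std"])
annual_peaks = flows.resample("YS").max()
annual_peak_error = annual_peaks["simulated"] - annual_peaks["observed"]
print(summary, annual_peak_error.round(1), sep="\n\n")
```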

The SWAT model failed to capture the peaks correctly, as there is a large deviation between the observed and simulated streamflow values (represented in Table 5) during the calibration period. From Table 5 and Figure 9(a), plotted for the gauge-based datasets, it can be observed that IMD gridded data and APHRODITE have a more significant deviation and could not match the low flow events either. The SM2RAIN-CCI dataset portrayed in Figure 9(b) shows larger differences between the observed and simulated peaks, resulting in unsatisfactory performance. From Figure 9(c), related to the reanalysis category, it can be seen that the CHIRP dataset matched the peak flows after 2004, whereas CHIRPS 0.05° overestimated the flows during 2005 and underestimated them in the other years. In contrast, GSMaP_Gauge_RNL v6 came closer to the peak events than PGF 0.25° and also captured the low flow events correctly. The MSWEP and PGF 0.5° datasets of the reanalysis category, depicted in Figure 9(d), had larger differences in peak and standard deviation values, leading to lower R2 and higher PBias coefficients. CHIRPS 0.25° likewise overestimated the flows during 2005 and underestimated them in the other years and, lastly, the NCEP-CFSR dataset failed to capture both peaks and low flows correctly. The line diagram depicted for the gauge-adjusted datasets in Figure 9(e) shows that TRMM 3B42 v7 performed very well and matched the peaks and low flows in all the years compared to the other datasets. GPCC v2018 shows smaller differences in peak and deviation values between the observed and simulated series when compared to PERSIANN-CDR and GPCP-CDR v1.3 under the gauge-adjusted category.

Coming to the validation phase, from Table 5 and Figure 10(a), plotted for the gauge-based datasets, it can be seen that IMD and APHRODITE underestimated the flows in the first two years and overestimated them during the last two years of the validation period, with small biases and deviations. The SM2RAIN-CCI flows plotted in Figure 10(b) failed to match the peak and low flow events except in 2010, resulting in poor performance and larger biases. The CHIRP dataset under the reanalysis category (Figure 10(c)) could not match the peaks in all the years but showed only minor deviations from the observed data, which resulted in better N-S values. GSMaP_Gauge_RNL v6 approached the peak flows but overestimated them, whereas PGF 0.25° underestimated the flows in all the years (Figure 10(c)). CHIRPS 0.05°, classified under the reanalysis category, overestimated the peaks in all the years and had larger deviations, resulting in a more substantial bias. MSWEP, NCEP-CFSR, and PGF 0.5° (Figure 10(d)) underestimated the peaks in all years, whereas CHIRPS 0.25° underestimated the peaks but with smaller biases. TRMM 3B42 v7 and GPCC v7 from the gauge-adjusted category overestimated all the peak events. From Figure 10(e), it can be seen that although GPCC v2018 overestimated the peaks in 2010 and underestimated them in 2012, it estimated the peaks better than the other datasets. GPCP-CDR and PERSIANN-CDR, categorized under the gauge-adjusted datasets (Figure 10(e)), underestimated the peaks in all years with more substantial biases. From the extreme flow analysis, it was observed that the CHIRP, TRMM 3B42 v7, GPCC v7, and APHRODITE datasets captured more peak flow events and hence can be further implemented for extreme event analysis.

5.5. Discussion

The discussion section was framed to compare the results and observations of the current study with other research around the world. To clearly segregate the comparisons, the discussion section is divided into 4 parts, that is, (i) from gauge-based dataset results, (ii) from satellite-only dataset results, (iii) from reanalysis dataset results, and (iv) from gauge-adjusted dataset results. Each section will tackle the results obtained from both statistical and hydrological analyses carried out in the current study.

5.5.1. From Gauge-Based Dataset Results

Gauge-based datasets (IMD and APHRODITE) are developed by implementing different algorithms to interpolate precipitation values between sparsely spread gauging stations, which may contribute to lower accuracies. Many studies treated a gauge-based dataset as the standard and compared all the satellite datasets by calibrating the hydrological models with station or gauge-based gridded data and implementing those sensitive parameters for testing the other satellite precipitation datasets [5, 85–87]. The performance of the gauge-based datasets for hydrological modeling in our study was less effective, since IMD and APHRODITE did not produce better results for simulating streamflows than the TRMM 3B42 v7 and CHIRP datasets. Similar results were published by [10, 42, 73], where satellite precipitation products produced better performance in simulating streamflows than precipitation measured at gauging stations or taken from gauge-based gridded datasets. The lower accuracies of IMD and APHRODITE may be attributed to the higher precision of the satellite datasets (which use infrared and microwave data or highly accurate interpolation techniques for merging satellite data with gauge data) along with their complete spatial coverage, or to the use of a different set of sensitive parameters for IMD, APHRODITE, and the other datasets. The statistical results showed that CHIRP has less bias and a better ability to detect low rainfall than APHRODITE (Table 5). This study illustrates to researchers that testing and calibrating each precipitation dataset is essential, since the standard dataset (assumed here to be IMD) may not provide accurate results in every catchment.

5.5.2. From Satellite-Only Dataset Results

We observed that SM2RAIN-CCI under satellite-only category furnished good correlation against IMD but failed to represent fewer RMSE and bias values from continuous statistical analyses, which are parallel to the results reported by [88]. They mentioned that the SM2RAIN-CCI dataset overestimates precipitation magnitudes at low- and high-altitude regions and underestimates them in the medium-altitude regions. The SM2RAIN-CCI dataset (satellite-only category) employed in the present study exhibited more significant overestimation in detecting rainfall with substantial bias and a larger magnitude of errors. Though the dataset demonstrated overestimation in detecting rainfall, the streamflow simulations were underestimated. This discrepancy may be attributed to the overcalibration of SWAT model sensitive parameters with the N-S coefficient as a primary objective function.

5.5.3. From Reanalysis Dataset Results

All three CHPClim datasets from the reanalysis category resulted in underestimation, with moderate bias and RMSE for the CHIRP and CHIRPS 0.05° datasets and high bias and RMSE for the CHIRPS 0.25° dataset. The underestimation of the CHPClim datasets has been reported earlier by [42, 73, 82], where it was mentioned that the underestimation might be attributed to the infrared algorithm (which estimates rainfall from signals retrieved in the infrared region) used in the development of the CHPClim datasets, which implements fixed brightness temperature thresholds to distinguish between raining and nonraining clouds. The defined thresholds are usually too cold, since the orographic precipitation occurring over the Western Ghats is warm and may not produce much ice aloft, resulting in an underestimation of rainfall [89–93]. Coming to the categorical results, NCEP-CFSR outperformed the other rainfall datasets, followed by the CHIRP dataset with relatively similar performance in detecting low rainfall events. Similar results were published by [42, 79], where it was concluded that the NCEP-CFSR dataset outperformed other datasets in terms of categorical statistics for low rainfall events; however, it had a moderate-to-poor performance in terms of continuous statistical metrics.

The MSWEP dataset (reanalysis category), which takes advantage of merging gauge, satellite, and reanalysis precipitation estimates, did not produce a better outcome in our study from either the statistical or the hydrological perspective. Though MSWEP has a good ability to detect low rainfall events, the magnitude of error and bias in the dataset was large, leading to the dataset’s poor performance in simulating streamflows. The MSWEP versions 1.2 and 2.0, developed by merging several data sources, were tested at a global scale by [66], which concluded that MSWEP v2.0 outperforms other datasets, including MSWEP v1.2, in terms of statistical and hydrological criteria. Another study by [11] tested four satellite precipitation datasets in different climatic zones of peninsular Spain and concluded that MSWEP v2.0 performed best in a semiarid region, while TRMM 3B42 v7 performed best in the other climatic regions (Oceanic climate, Galicia Variant, and Mediterranean climate). A study conducted by [94] concluded that TRMM 3B42 v7 and MSWEP v1.2 outperformed the PERSIANN and CMORPH datasets in terms of statistical coefficients while simulating discharges and showed that TRMM 3B42 v7 and MSWEP v1.2 had a similar tendency and correlation while simulating runoff. In the current study, MSWEP v1.2 was tested in a semihumid tropical region, and its performance in the reanalysis category was found to be less effective compared to the CHIRP and CHIRPS products. Testing of the MSWEP v1.2 product in different climatic and topographic zones is still required to understand the performance of MSWEP products, as little literature was found on this dataset. The performance of the Princeton datasets (PGF 0.5° and PGF 0.25°) was moderate in both the statistical and hydrological analyses because of high bias and limited precipitation detection capabilities. From the results of the PGF and CHIRPS SPPs, which are available at multiple spatial resolutions, the datasets with finer resolution (PGF 0.25° and CHIRPS 0.05°) proved more effective than the coarse resolution products (PGF 0.5° and CHIRPS 0.25°). One possible reason for the better performance of the fine scale products is the size of the catchment considered in the study. As the study area has a catchment area of 7778 km2, coarse resolution products (0.5° and 1° spatial resolution) might have 20–25 rainfall grid cells over the basin, whereas finer resolution (0.05° and 0.25° spatial resolution) products might accommodate 70–90 grids. A smaller number of grid cells cannot capture the heterogeneity of rainfall in a mountainous basin, thus leading to poorer performance when compared to the fine scale SPPs. However, the interpolation algorithm, quality, resolution, time period, and blending procedures of the input parameters also play a pivotal role in determining the effectiveness of a dataset.

5.5.4. From Gauge-Adjusted Dataset Results

Poor performance and overestimation were observed for PERSIANN-CDR under the gauge-adjusted category, and this may be attributed to the inadequate training of the ANN over parts of the world other than the United States, because the PERSIANN-CDR dataset is competently trained only over the United States [83, 95]. Though CHIRPS afforded good performance, TRMM 3B42 v7 surpassed those results, which may be due to the implementation of combined infrared and microwave data in the development of TRMM 3B42 v7, whereas only infrared signals are used for the development of the CHIRPS products. The biases were smaller for the TRMM 3B42 v7 product, which may be due to the inclusion of the inverse-error-variance weighting algorithm and calibration with the GPCC gauge analysis product during development. From the hydrologic evaluations in the present study, it can be seen that TRMM 3B42 v7 also provided better results than the CHIRP dataset in simulating streamflows, whereas CHIRP performed better than TRMM 3B42 v7 in detecting rainfall, as is evident from the statistical analysis results. Similar results were cited by [82, 86, 96], where the applicability of these products in a variety of catchments having different topographic and climatic patterns was tested using hydrological modeling. It was concluded that though TRMM 3B42 v7 and CHIRP have certain biases, they outperformed other satellite precipitation datasets.

From the overall analysis, it can be observed that though the satellite products exhibited higher correlations at the monthly time step in the statistical analysis, their performance varied significantly in simulating streamflows. The CHIRP dataset proved effective in both the hydrological and statistical analyses. In contrast, the NCEP-CFSR dataset, which detected rainfall effectively, did not perform as well in simulating streamflows, whereas TRMM 3B42 v7, which has lower precipitation detection capability, performed best in streamflow simulations. These discrepancies can be attributed to the bias and error magnitudes of a particular product. TRMM 3B42 v7 and CHIRP, which have less bias, performed well, whereas NCEP-CFSR, PGF, GPCP-CDR v1.3, and PERSIANN-CDR, which have more bias, exhibited moderate performance in streamflow simulations. Our results parallel the conclusions made by [79], stating that the bias of rainfall products determines the accuracy of runoff simulations by a model. The reason specified for this conclusion (streamflow simulations are affected by the bias of a product), as per [16, 42], is related to the nonlinearity of the hydrologic process, where a moderate rainfall bias can be transmuted into a large PBias in discharge simulations. In correspondence with the above conclusion, the RMSE and bias values of SM2RAIN-CCI and GPCP-CDR v1.3 might have adversely affected the hydrologic performance and contributed to the unsatisfactory discharge simulations.

6. Conclusions

This study compared sixteen rainfall datasets belonging to four categories (gauge-based, satellite-only, reanalysis, and gauge-adjusted products) in a mountainous tropical catchment of Karnataka, that is, Tungabhadra. Tungabhadra, with hilly terrain and varying topography, was selected to analyze (i) the capability of satellite precipitation datasets in detecting rainfall through statistical analysis and (ii) the hydrological performance of these precipitation datasets in simulating streamflow when compared against observed streamflow data. Since the watershed is situated at higher elevations with mountainous terrain, the density of rain gauges in and around the watershed is very low, mandating the testing of satellite precipitation products’ performance. Hence, the main aim of the present study was to assess (i) the capability of the precipitation datasets in detecting rainfall and (ii) the suitability of the sixteen rainfall datasets for hydrological modeling, tested in terms of statistical coefficients (R2, N-S, and PBias). From the results of the continuous statistical metrics, APHRODITE furnished very good results in detecting rainfall, with low bias and error magnitudes when compared against the IMD dataset. The CHIRP, NCEP-CFSR, TRMM 3B42 v7, GSMaP_Gauge_RNL v6, and MSWEP datasets exhibited good-to-moderate performances at a monthly time step. The categorical statistical metrics revealed that NCEP-CFSR has the best skill in detecting low rainfall events, followed by the CHIRP, APHRODITE, MSWEP, and GPCC v2018 datasets. High rainfall events (>25 mm) were captured, to some extent, by the GPCC v2018, CHIRPS 0.05° and 0.25°, and TRMM 3B42 v7 datasets. The objective of testing the suitability of these precipitation datasets was successfully achieved, as three datasets under the reanalysis category (CHIRP, CHIRPS 0.05°, and GSMaP_Gauge_RNL v6) and three datasets under the gauge-adjusted category (TRMM 3B42 v7, GPCC v7, and GPCC v2018) fall within the performance ranges specified by [84] in simulating streamflow using a hydrological model, that is, SWAT, for the Tungabhadra river basin, India. The newly released GPCC v2018 proved effective when compared to GPCP-CDR v1.3. The LULC classification using the maximum likelihood algorithm produced an accuracy of 85.94%, indicating that the classification achieved good accuracy. Sensitive parameters were obtained from previous literature and were used in different sets, based on the sensitivity to each precipitation dataset, in SWAT CUP for valid calibration and validation results. The present study showed that changes in the input data affect the output results. The quality, resolution, time period, and blending procedures of the input parameters also play a pivotal role in determining the effectiveness of a dataset and in achieving accurate results. The CHIRPS and PGF datasets showed that finer resolutions provide better results than coarser resolutions for the same type of dataset developed with the same algorithm and input parameters. From both the statistical and hydrological performance outputs, it can be concluded that rainfall product bias determines the accuracy of hydrological model runoff simulations. This is mainly because the bias of rainfall products is transmuted into a larger PBias during runoff simulations using a hydrological model.
Out of 16 datasets, one can use TRMM 3B42 v7, CHIRP, CHIRPS 0.05°, and GSMaP_Gauge_RNL v6 datasets for hydrological modeling, climate change studies, and other research in similar topographic and climatic watersheds as they achieved overall very good performance.

The highlights of this paper are listed as follows:
(i) Eight out of sixteen datasets yielded good results even without bias correction.
(ii) TRMM 3B42 v7 proved best for streamflow modeling, and APHRODITE was the best in detecting rainfall.
(iii) The reliability and applicability of the recently released GPCP-CDR v1.3 and GPCC v2018 datasets were explored.

Data Availability

All the datasets used in this study are freely available online. The IMD gridded data can be accessed from http://www.imdpune.gov.in, APHRODITE from http://aphrodite.st.hirosaki-u.ac.jp, SM2RAIN-CCI from https://zenodo.org/record/1305021, CHIRP from ftp://ftp.chg.ucsb.edu/pub/org/chg/products/, GSMaP from https://sharaku.eorc.jaxa.jp/GSMaP/, NCEP-CFSR from https://globalweather.tamu.edu/, PGF from http://hydrology.princeton.edu/data/pgf, MSWEP from http://www.gloh2o.org/, GPCP-CDR from http://eagle1.umd.edu/GPCP_CDR, PERSIANN CDR from http://chrsdata.eng.uci.edu/, and TRMM from https://disc.gsfc.nasa.gov/.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors thank Mr. Kolluru Srinivas (Research Scholar, IIT Bombay) for his support in data extraction and Dr. Archana Mohite (IIT Kharagpur) for her assistance in performing statistical analysis. The authors gratefully thank all institutes, organizations, and developers of precipitation datasets for providing their products freely.

Supplementary Materials

Supplementary Table 1: sensitivity parameters for SWAT CUP. Supplementary Figure 1: soil and land use map of the study area. Supplementary Information: overview of satellite precipitation datasets.