A comparison of three parameter estimation methods of the gamma distribution of annual maximum low flow duration and deficit in the Upper Vistula catchment (Poland)

Based on 30-year 24-hour flow sequences at 69 water gauging stations in the Upper Vistula catchment, it was determined that the probability distributions of the low flow duration and its maximum annual deficit can be described by the gamma distribution with the estimated parameters by the methods: MOM, the method of moments, LMOM, the method of linear moments, and MLE, the method of maximum likelihood. The stationarity of the time series was tested by the Mann-Kendall correlation using the Hamed and Rao variance correction. The low flows were defined by the SPA method, with the limit flow Q70%. The quality of the match was tested by the Anderson-Darling goodness of fit test. This test allowed accepting the gamma distribution in all analysed cases, regardless of the method used to estimate the distribution parameters, since the pv (p-values) values were greater than 5% (over 18% for Tmax and 7.5% for Vmax). The highest pv values for individual water gauging stations, as well as the highest 90% Tmax and Vmax quantiles were noted using LMOM to estimate the gamma distribution parameters. The highest 90% Tmax and Vmax quantiles were observed in the uppermost part of the studied area.


Introduction
The river low flow is usually referred to as the time period in which the water flow in a given section of the river is lower than the conventionally assumed flow threshold [1][2][3][4][5]. This method of determining the low flow (threshold level methods) was introduced in the United States by Yevjevich [6] and in Poland by Zielińska [7].
Low flow is usually defined by one of two methods: PUT (Pit Under Threshold) [8] or SPA (Sequent Peak Algorithm) [2]. The beginning of the low flow occurs in both methods at the time t p of the decrease of the flow below the limit Q g . Differences between the use of these methods are evident in the determination of the end time t k of the low flow duration and its deficit. In the PUT method, it is assumed that the end of the low flow occurs when the flow in the river rises back above the Q g level. In the SPA method, the differences Q g -Q i are added (starting with t p ) in subsequent days until the sum obtained ceases to be positive, i.e. until the resulting water deficit is balanced by flows higher than Q g . The day on which the sum reaches the maximum value is the t k day end of the SPA low flow and determines both the T-time of the SPA low flow duration and its deficit V [2,9,10].
The maximum annual low flow. An important issue in hydrology is the risk assessment of low flow hazards, and the maximum values of the duration and deficit of a low flow are of particular importance. The maximum values are usually determined by one of two methods: annual maximum series or partial duration series. The annual maximum series (AMS) contains only the highest values that occur in each year of the examined multiyear, while the partial duration series (PDS) covers all values in the considered period, above the set limit value.
Estimation of one-dimensional distributions of duration and deficits of the maximum low flow (using the PDS method) were studied by, inter alia, Zelenhasić and Salvai [11], while in Poland the issue was studied primarily by Jakubowski [12,13]. In order to determine the probability of not exceeding the volume of the low flow and its duration, Jakubowski and Radczuk, developed the computer program NIZOWKA2003 [14], in which the number of low flows (in a year or season) is estimated by the following discrete distributions: Poisson and Pascal, and the duration of low flows and their deficit by the following continuous distributions: Pearson type III, Weibull, lognormal, Johnson, double exponential, and generalized Pareto. This paper uses the annual maximum series of the duration and deficit of the low flow.
Finding the probability distribution of time variables T max duration and volume V max of the maximum low flows allows estimating the value of exceeding the preset T max and/or V max with a given return period, i.e. a numerical risk/hazard assessment manifested as a low flow with a defined characteristic. The use of the gamma distribution to describe the maximum duration and the maximum deficit of a low flow can be found in many works [4,10,[13][14][15][16][17].
The most commonly used methods of estimating the distribution parameters of the characteristics of the low flows are: the method of maximum likelihood MLE [12][13][14], the method of moments MOM [18] and the method of linear moments LMOM [18]. Each of these methods has its advantages and disadvantages. The method of moments cannot be used for non-simple tests, and the estimators of parameters are sensitive to outliers [19]. The most frequently used method in Poland is the method of maximum likelihood that can be used for nonsimple tests. The method of linear moments is less sensitive to significant outlier values in the sample [19]. Sharma and Panu [18] suggest that for small sample size of drought events (<30), it is advisable to use LMOM, and for a large sample -MOM.
The goal of this work is to investigate whether the gamma distribution can be used to describe the probability distribution of the duration and volume of the maximum low flow in the Upper Vistula catchment. In the work, 90% gamma distribution quantiles were determined with parameters estimated in three ways: the method of maximum likelihood, the method of moments and the method of linear moments. The influence of the method of estimation of gamma distribution parameters on the quality of distribution adjustment was also investigated.

Study area and data
The study is based on daily discharge series for the period from 1.11.1983 to 31.10.2013. This yields a total of 30 hydrologic years and 10,958 discharge values obtained at each of 69 water gauging stations located on the right-bank tributaries of the Upper Vistula River in southern Poland (Fig. 1). The gauging cross-sections close catchments of various area ranging from 24.7 km 2 to 16,824 km 2 , and the elevation of gauging stations varies from 139.1 m a.s.l. to 965.6 m a.s.l. About half of the catchment gauging station elevations (more than 35 out of 69) do not exceed 300 m a.s.l.; 53% of catchment areas are in between 100 and 1000 km 2 .
The area of the Upper Vistula is diverse in terms of climate, mainly due to significant morphological differences, chiefly owing to the varied land relief and large area denivelations [20,21]. In the right-bank part of the Upper Vistula catchment, three main climatic regions can be distinguished. Because the southern part of the area lies in the Carpathians and the Beskid Mountains, the region has a mountain climate, rich in precipitation. The boundary of this region coincides with the annual isotherm of 7 ºC and the annual isohyet of 900 mm. Located more to the north is the climate of the Carpathian Foothillsmoderately warm and humid, with an average annual temperature of 7-8 ºC and an annual rainfall of 700-900 mm. The climate of the region of Kotliny Podgórskie [Sub-mountain Basins], which is located in the northernmost part of the studied area, is warmer and moderately dry [21].

Selection of maximum low flow
Based on the 30-year daily flow sequences, at each of the 69 water gauging stations, using the SPA method and the limit flow Q 70% read from the flow duration curve, two characteristics of the low flow were determined: duration T [day] and deficit V [day]. The low flow deficit is regulated by a multi-year average flow Q and is therefore expressed in 24 hours. The determination of the volume expressed in days shows how many days, at a given water gauging station, the low flow deficit could be filled with the average flow.
The normalized limit flow Q 70% calculated, for the 69 water gauging stations, varies between 0.2 and 0.6, and in most cases (43 per 69) it is smaller than 0.4 (Fig. 2). The maximum annual low flows are understood in two ways: as a low flow with the longest annual duration in a year, T max,i , i = 1, 2, …, 30, and as a low flow with the highest annual deficit V max,i , i = 1, 2, …, 30. If a low flow circuit started in one hydrological year and ended in the next one, it was not divided, but assigned in its entirety to the year in which its mean was observed. This approach was proposed by Zelenhasić and Salvai [11]. 138 (69 water gauging stations, two definitions of the maximum low flows) time series of maximum low flows were obtained.
During the considered period (1983-2013), the maximum low flows lasted, on average, from 37 days to 98.5 days, with a maximum of 302 days (Fig. 3). Low flow deficit, depending on the water gauging station limit, ranged from 3.4 days to 10 days on average. There is no dependence of the median T max and the median V max on the height H of the gauge (the Pearson correlation coefficient between the median T max and H is 0.16, and the median V max and H -0.31) (Figs. 3 and 4).
There is no dependence of the median T max and the height H of the gauge (the Pearson correlation coefficient between the median T max and H is 0.16) (Fig.  3). The Pearson correlation coefficient between the median V max and H is 0.31 is significant (by the significance level, 0.05) (Fig. 4).  The maximum low flows usually start in September, the lowest probability of their occurrence is in March and April. In the multi-year surveys, 2010 was the "wettest" year (with the lowest number of low flow days) and 2012 the "driest" (with the highest number of low flow days).
At 54 (out of 69) water gauging stations, low flows were observed annually (Fig. 5), while at the remaining 22%, at least in one year all low flows were higher than the assumed limit flow. At one of the gauging stations (Mikuszowice, the Biała River) no low flows were observed for 6 years out of the 30 measured.

Methods
Because low flows need not be found in every year, zero values may appear in the ranks of duration and deficit of low flows, determined by the AMS method. In this situation, Stedinger et al. [22] recommend the use of a conditional probability model. An additional parameter describes the probability p 0 that the duration or deficit of the low flow is zero. The distribution function F(x), for any value x > 0, is as in the formula: (1) where G(x) is a continuous cumulative distribution function for non-zero values of X (a random variable X representing T max or V max ). All statistical calculations were performed using the GNU R software package [23]. For all the tests considered in the paper, we fixed the significance level at 0.05.

Mann-Kendall test
The stationarity of the time series duration T max and the deficit V max of the maximum annual low flows was tested by the Mann-Kendall test [24][25][26]. The test statistic is defined as: where x value (T max or V max ) with sample size n: When H 0 : S = 0 is true (i.e. stationarity), the mean and variance of S are given by: The above equality (5) occurs when there are no repeated elements of a time series (so-called ties). If there are more than a few repeating elements, then the formula (5) needs to be adjusted, which yields the following form [27]: where t m , m = 1, 2, …, k, indicates that there exists a number of groups with k repeating elements.
Kendall [25] proved that when n > 10, then the test statistic: approximately follows the standard normal distribution N(0;1). If the p v value of the hypothesis test is less or equal to the chosen significance level  v , the null hypothesis must be rejected. The basic assumption when using this test is a lack of autocorrelation in the data series. If this condition is not met and autocorrelation is positive, which is a frequent situation in hydrology, then the variance var(S) is underestimated. Bayley and Hammersley [26] introduced a variance adjustment: when using "an effective number of observations  S n ", which Hamed and Rao [28] suggest to determine as: where n is the number of observations and  S (i) is the autocorrelation function of ranks of observations. The variance adjustment is only calculated for data whose autocorrelations are significant at the 5% level (according to the Ljung-Box test [23]).
The variance var(S) in (6) is replaced by var*(S) from (8) and the rest of the testing process is the same as in (7) (i.e. comparing the values of U with the quantiles of N(0;1) corresponding to the type I error of the test).
The description of the Mann-Kendall test with the Hamed and Rao correction, together with the use of this test to determine the stationarity of the series of low flows can be found in the work of Baran-Gurgul [29].

Gamma distribution
To obtain the probability distribution of the maximum low flow, the two-parameter gamma distribution was used. The probability density function of the gamma distribution is [23]: where  is the shape, 1/the scale parameter, and (.) the gamma function.

Estimation of gamma distribution parameters
The paper uses three gamma distribution parameter estimation methods: MOM, LMOM and MLE.
Using individual methods, the estimators of gamma distribution parameters can be calculated according to formulas [20,22,23,30]: x and s are the sample mean and standard deviation, respectively; t = t 2 / t 1 , and t r -rth linear moment of data sample (r = 1, 2).

Anderson-Darling test
The quality of matching the gamma distribution to the data was tested with the Anderson-Darling test. For a random ordered sample: (x 1 , x 2 , ..., x n ), the AD statistic for this test is expressed by the formula [23]: where F is the cumulative distribution function of the specified distribution.
The critical values for the Anderson-Darling test are dependent on the specific distribution that is being tested. Tabulated values and formulas have been published in the various papers by Stephens (e.g. [31]) for a few specific distributions (normal, lognormal, exponential, Weibull, logistic, extreme value type I).

Results and discussion
Investigation of the hypothesis about the stationarity of the time series of T max duration and V max deficit of the maximum annual low flows at the 69 water gauging stations was carried out with the non-parametric Mann-Kendall test. In all analysed cases, the Mann-Kendall test p v values are greater than 5%, which allows acceptance of the hypothesis about the stationarity of the time series of the maximum low flows T max and V max . This acceptance allows creating probability distributions of these variables and to calculate the values of T max and V max variables exceeded with a given probability.
Parameters of the gamma distribution of T max and V max variables were estimated using three methods: MOM, LMOM and MLE, and the quality of matching the gamma distribution to the data was tested using the Anderson-Darling test.
In all analysed cases, regardless of how the gamma distribution parameters are estimated, the p v value of the Anderson-Darling test is above 5% (Fig. 6) and there is no reason to reject the hypothesis of theoretical and empirical agreement, which means that T max and V max can be determined, at the level of 5%, with the gamma distribution with parameters estimated by each of the methods used.
The minimum p v values at the tested 69 water gauging stations are respectively: for the T max variable: MOM -18%, LMOM -33%, MLE -26%, and the V max variable: MOM -7%, LMOM -26%, MLE -37%. Furthermore, the median of p v values at the 69 water gauging stations is the lowest for both low flow characteristics when estimating distribution parameters using the method of moments (0.78 for T max and 0.83 for V max ). In other cases, the p v medians are high, 0.88-0.91 (Fig. 6). For individual gauging stations, the highest p v of the Anderson-Darling test was usually found by estimating the distribution parameters using the LMOM method (68% of cases for T max and 48% for V max ), and the the least frequently -MOM (in 20% of cases for T max and 35% for V max ) (Fig. 7)   Fig. 7. Number of N water gauging stations with the highest p v of the Anderson-Darling test which was found by estimating the distribution parameters using the three methods. Figures 8 and 9 graphically illustrate the results of parameter estimation (, 1/) for the gamma probability distribution, respectively, for duration T max and deficit V max of the maximum low flow at all gauging stations.     (Fig. 13).

Final remarks
The probability distribution of the best choice is important for the description of low flow statistics for studies related to low flow analysis.
Stationarity study of 30-year time series duration T max and deficit V max of the maximum annual low flow using the Mann-Kendall test showed that at all 69 water gauging stations there were no grounds for rejecting the hypothesis of lack of trend.
In all analysed cases, irrespective of how the gamma distribution parameters are estimated, the variables of duration T max and deficit V max can be described by a onedimensional gamma distribution at the significance level of 5%. The quality of the fit (which was tested with the Anderson-Darling goodness of fit test) is high, the p v probability values are the highest when using the LMOM method to estimate the distribution parameters (T max : p v , min = 32%, p v , mean = 83%, V max : p v , min = 26%, p v , mean = 83%).
Confirmation of a good fit of the gamma distribution is a high correlation, above 90%, between the quantiles of empirical distributions and the quantiles of the gamma distribution, regardless of the method of estimating the parameters of this distribution.
The highest 90% quantiles of the gamma distribution were obtained, both for the duration T max and the deficit V max of the maximum low flow, using the LMOM method for estimation. Depending on the gauging station, T max,90% is on average around 142 days and V max,90% around 18 days.

Conclusions
Based on the analysis carried out on 30-year long daily flows at the 69 water gauging stations, we can draw the following conclusions: 1. The gamma distribution can be used to describe the duration and deficit of the maximum low flow in the Upper Vistula catchment. 2. In most catchments, the best fit of the gamma distribution according to the Anderson-Darling test was obtained by estimating the parameters using the method of linear moments. 3. The largest 90% quantiles were obtained by estimating the gamma distribution parameters using the method of linear moments. 4. In the Carpathian part of the Upper Vistula catchment, the area which faces the highest risk of low flows with the longer duration (over 195 days) and the largest deficit (over 22 days) is the area of the Central Western Carpathians. The Carpathian catchment of the Upper Vistula is a very diverse region both due to its shape, as well as to its geological structure and climate. The Central Western Carpathians, with their highest part, the Tatras, are built of granitic and metamorphic rocks, the Outer Carpathians and Sub-Carpathian region are made of flysch, mainly sandstone and shale [33].