An analysis of the hydrological regime as a factor influencing the distributions of maximum annual flows

Statistical models of freshet flows are the basis for the design of hydrotechnical structures and for all activities related to flood threat. With regard to the method of data preparation and to the estimation procedure itself, the methods applied in such situations fall into two groups: FFA (Flood Frequency Analysis) and POT (Peak Over Threshold). This study compares the two methods, using an original mixture of distributions (FFA) and an original procedure of distribution estimation (POT), for six selected water gauges on the river Odra.


Introduction
For years, the statistical analysis of freshet flows has been the basis for estimating high exceedance quantiles. These values are needed for the design of hydrotechnical structures, the identification of areas with elevated flood risk, etc. The subject has therefore remained an area of active research since the 1990s. It is not possible to review all the studies concerned with this problem; however, a few studies partially fulfil that task, e.g. [1,2]. More recently, a growing number of studies have been devoted to the nonstationarity of freshet flows. Those studies also include review elements covering the methods of estimation of maximum flows [3]. The methods used to estimate maximum flows can be divided into two groups [4], according to the way the data are prepared for estimation and to the estimation procedure itself.
In the first method, called FFA (Flood Frequency Analysis), only one observation, the maximum flow, is taken from each hydrological year. The probability distribution of maximum flows is then estimated from the multi-year series obtained in this manner. The list of distributions used is extensive [4], [5] and contains over a dozen distributions of various types, from the three-parameter gamma distribution (Pearson type III) to GEV-type distributions. The choice of distributions varies considerably between countries. For instance, in the USA the basic distribution is the log-Pearson distribution [6], while in Poland the set of admissible distributions includes Pearson type III, log-normal, Gumbel, and GEV [7]. Applying these unimodal distributions assumes that the observed maximum flows form a simple random sample. This assumption is highly debatable, as there are dry years with no significant high flows, in which the maximum flow can hardly be classified as a freshet flow. To address this genetic heterogeneity, the study [8] assumed that the sought distribution is a mixture of two probability distributions: the first describes non-freshet maxima, and the second freshet ones. This approach is now fairly often applied to similar problems, e.g. in economics [9] or ecology [10,11]. The MIX distribution proposed in [8] is the basis for the comparative analyses presented herein. It is defined as the mixture (1) of a two-parameter gamma distribution and a three-parameter generalised extreme value (GEV) distribution, weighted by a mixture parameter.
In the second method, called POT (Peak Over Threshold), proposed by Todorović and Zelenhasić [12] in 1970, all flood events that exceed an arbitrarily chosen cut-off threshold are identified in the analysed multi-year period.
Consequently, each event can be characterised by its volume, its duration, or its highest flow. For the comparison of the two methods, only one characteristic of a high-flow event was analysed: its maximum flow. To determine the maximum annual flow it is also necessary to analyse the number of high-flow events in a year. Hence, the estimation of the distribution of maximum annual flows is a two-stage procedure. In the first step the distribution of the maximum flows in all high-flow events of the analysed multi-year period is determined (Fig. 1). A high-flow event is defined as a sequence of days with flows above the threshold. In the study it was assumed that it corresponds to the flow . The conducted goodness-of-fit tests showed that the best fit is provided by the two-parameter generalised Pareto distribution (GPD).
In the second step the number of high-flow events in a year is determined. Their number is conventionally assumed to follow a Poisson distribution. The distribution of maximum annual flows is then a compound Poisson distribution (CP), given by formula (2), which involves the value of the maximum annual flow, the threshold, the flow above the threshold, and the mean number of high-flow events in a year. Formally it is a three-parameter distribution: two parameters come from the GPD and one from the Poisson distribution. However, the GPD is determined for flows exceeding a certain additional threshold, which constitutes a fourth, hard-to-estimate parameter of the distribution.
The paper presents the results of estimating the distribution of maximum flows at selected water gauges on the river Odra, using the MIX distribution for the FFA method and the compound Poisson distribution (CP) described above for the POT method.
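For reference, the two distributions can be written out as below. This is only a sketch in assumed notation: the symbols $\alpha$, $G$, $H$, $u$ and $\lambda$ are chosen here for illustration, and assigning the weight $\alpha$ to the gamma component is an assumption rather than the convention of [8]. The second expression follows from the standard argument for the maximum of a Poisson-distributed number of independent peaks, each with a GPD-distributed excess over the threshold.

```latex
% Eq. (1), assumed notation: G = gamma CDF, H = GEV CDF, alpha = mixture weight
F_{\mathrm{MIX}}(x) = \alpha\, G(x) + (1 - \alpha)\, H(x),
\qquad 0 \le \alpha \le 1

% Eq. (2), assumed notation: u = threshold, x - u = flow above the threshold,
% lambda = mean number of high-flow events per year
F_{\mathrm{CP}}(x)
  = \sum_{k=0}^{\infty} e^{-\lambda} \frac{\lambda^{k}}{k!}\,
    \bigl[ F_{\mathrm{GPD}}(x - u) \bigr]^{k}
  = \exp\bigl\{ -\lambda \bigl[ 1 - F_{\mathrm{GPD}}(x - u) \bigr] \bigr\},
\qquad x \ge u
```

The sum runs over the possible numbers of events in a year; the term $k = 0$ gives the probability $e^{-\lambda}$ that no event exceeds the threshold.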

River Odra and its catchment basin
The river Odra is one of the longest rivers in the catchment basin of the Baltic Sea. It is the second longest river in Poland, after the Vistula. Its length is 854 km, of which 742 km lie within the territory of Poland. The total area of the catchment basin of the river Odra is 118,861 km². Its major part, approx. 89%, is within the territory of Poland, 5% in Germany, and 6% in the Czech Republic.
Over the entire length of the Odra one can distinguish two parts: the mountain part, covering the initial 50 km of its course, and the lowland part, covering the remaining length of the river. The Odra is canalised over 186 km, with 24 weirs damming the water.
The landscape of the Odra Valley within the territory of Poland is highly diversified (Tab. 1). The Valley cuts across areas of diverse geological structure and relief, formed by various factors. The age of the particular sections also varies: the upper section, from the border of Poland to the Ścinawskie Depression, took its final form after the recession of the ice sheet of the Odra glaciation (300-280 thousand years ago), while the section from Siekierki to the mouth of the Odra in the Szczecin Lagoon is only 13-15 thousand years old.
The Valley of the river Odra is included in nine physiographic units with the rank of mesoregions, which form seven units with the rank of macroregions. On the basis of Kondracki's [13] physiographic division of Poland, the Odra Valley is divided into 9 sections.
The catchment basin of the river Odra is highly developed and exceptionally asymmetrical. The basins of the left-bank tributaries situated in the Sudetes and the Sudetes Foreland, and that of the river Olza flowing out of the Silesian Beskid, are classified as basins of mountain-lowland rivers. The hypsometric system of the entire mesoregion is diversified and characterised by a tiered system of geo-ecological units. Three types of landscape are distinguished here: mountain and submontane, highland, and lowland. The diversity of the environment of the area affects not only the amount of precipitation, but also the rate of runoff and the retention capacity of the catchment basins.
The main left-bank tributaries of the Odra are the rivers Osobłoga, Nysa Kłodzka, Oława, Ślęza, Bystrzyca, Kaczawa, Bóbr and Nysa Łużycka. The right-bank tributaries, such as the rivers Kłodnica, Mała Panew, Widawa and Barycz, are lowland rivers. The right-bank tributary Warta, whose length and catchment basin area are nearly equal to those of the Odra at the point where the rivers join near Kostrzyn, has a significant effect only on the lower section of the Odra.
The Odra catchment basin is an interesting case among European rivers because of the impact that successive cultures have had on its economic development in consecutive historical periods. The current hydrological regime of the river results from natural geographic and climatic conditions and from centuries of human activity, especially intensive in the 19th and 20th centuries. This is clearly visible in the regulatory and reservoir structures on the river network in the catchment basins of the upper and central Odra. The bed of the Odra was once strongly meandering; over the last 200 years it has been shortened by about 160 km through cut-offs dug across many meanders [14].
In the area of the Odra catchment basin two climates clash, the marine and the continental, which causes high weather variability. According to data from the Polish Institute of Meteorology and Water Management (IMGW PIB), the annual precipitation totals in the period 1961-1980 varied from 545 mm in the catchment basin of the Kaczawa to 1380 mm in that of the Bóbr. Among European rivers, the Odra is one of the least abundant in water. This is shown by the ratios of the mean flow (SSQ) of the Elbe, the Vistula, the Danube and the Rhine to that of the Odra, which are 1.06, 1.7, 2.9 and 5.0, respectively. Thus the Odra carries about 6% less water than the Elbe, ca. 60% less than the Vistula, nearly three times less than the Danube, and five times less than the Rhine.
With regard to the longitudinal slope, the Odra is divided into the upper, central and lower sections:
- the Upper Odra: from the springs to Koźle, 202 km long. Over the initial 54 km the Odra has the character of a mountain river, with a slope of 7.2‰, while in the territory of Poland the slope is notably lower, at about 0.33‰;
- the Central Odra: from Koźle to the mouth of the Warta, 522 km long (including a canalised section of 187 km and a free-flowing section of 335 km), with slopes from 0.28 to 0.19‰;
- the Lower Odra: from the mouth of the Warta to the Szczecin Lagoon, with slopes from 0.05 to 0.00‰.
In the valley of the river Odra floods occur almost every year, either in its upper or lower section, or over its entire length. Analyses of the frequency of occurrence of maximum daily precipitation indicate that the months with a high flood threat on the Odra are July and August.
Over the period 988-1774 the chronicles recorded 36 great floods in the catchment basin of the Odra. In the 19th century catastrophic floods occurred in 1813, 1854, 1855 and 1888 [15]. In the 20th century there were a number of serious floods, e.g. in 1903, 1915, 1924, 1938, 1940, 1947, 1958, 1960, 1963, 1964, 1965, 1970, 1972, 1977, 1980, 1985 and 1997. The greatest flood, of unprecedented scale and exceeding the most catastrophic estimates, took place in July 1997; the second greatest was the flood of July 1903, which until July 1997 had been considered the greatest flood of the 20th century.
The formation of a flood wave depends on the parameters of the catchment basin, such as the local topographic, hydrographic and geological conditions, and on human economic activity (retention reservoirs, relief canals, polders, embankments, engineering works on mountain streams, changes in afforestation and land use). The probability of a catastrophic flood wave encompassing the upper, central and lower sections of the Odra is very low. In the lower section of the Odra the most frequent are spring floods, often related to ice runs down the river, and floods during unfavourable storms on the Baltic, which impede the discharge of flood waters into the sea.

Methods
From among all water gauge stations along the river Odra (Tab. 2), six were selected for a comparative analysis of the MIX distribution (FFA method) and the compound Poisson distribution (POT method). The selected water gauges are marked in bold type in Table 2.
In each case the analysis covered flows from the period 1966-2010. The water gauge stations were selected so as to emphasise the diversity of the maximum annual flows. The main difference between the standard unimodal distributions of maximum flows and the analysed MIX and compound Poisson distributions is that the latter can take into account two fundamentally different states of flow in the catchment basin. In the FFA method, because of the way observations are gathered, it is difficult to assume that the resulting data set constitutes a simple random sample. The MIX distribution, owing to its structure, can describe both annual maxima of the order of multi-year mean flows and actual high-flow events. In the POT method flows below the estimated cut-off threshold are neglected, so that problem is solved automatically. The genetic heterogeneity is illustrated in Fig. 1, plotted for the water gauge stations in Krzyżanowice (Silesian Lowland) and Słubice (Warta-Odra Urstromtal).
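The two ways of preparing the data can be illustrated with a short sketch. It assumes that the daily flows are available as a plain array together with a hypothetical array of hydrological-year labels (`year_ids`); the function names are illustrative and are not taken from the study. A high-flow event is taken, as in the text, to be a run of consecutive days with flow above the threshold.

```python
import numpy as np


def annual_maxima(flows, year_ids):
    """FFA sample: one maximum flow per hydrological year."""
    flows = np.asarray(flows, dtype=float)
    year_ids = np.asarray(year_ids)
    return np.array([flows[year_ids == y].max() for y in np.unique(year_ids)])


def pot_peaks(flows, threshold):
    """POT sample: the peak of every run of consecutive days above the threshold."""
    flows = np.asarray(flows, dtype=float)
    above = flows > threshold
    # Pad with False so that runs touching either end of the series are detected.
    padded = np.concatenate(([False], above, [False]))
    change = np.diff(padded.astype(int))
    starts = np.flatnonzero(change == 1)    # first day of each run
    ends = np.flatnonzero(change == -1)     # one past the last day of each run
    return np.array([flows[s:e].max() for s, e in zip(starts, ends)])
```

In a dry year `annual_maxima` still returns a value, even if it is of the order of a mean flow, whereas `pot_peaks` returns nothing for that year; this is the heterogeneity that the MIX distribution is meant to absorb.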

Estimation of parameters of MIX distribution
The parameters of the MIX distribution were estimated with the maximum likelihood method [8]. The global maximum of the likelihood function was found using differential evolution, a genetic-type algorithm for finding the global extremum of a multivariate function, developed by Kenneth Price and Rainer Storn [16].
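A minimal sketch of such a fit is given below, assuming SciPy is available. It uses `scipy.optimize.differential_evolution`, SciPy's implementation of the Storn and Price algorithm, to maximise the likelihood of a gamma plus GEV mixture. The parameter bounds, the weighting `alpha * gamma + (1 - alpha) * GEV` and all function names are assumptions made for illustration, not the settings used in [8]. Note also that SciPy's `genextreme` uses the opposite sign of the shape parameter to the common GEV convention.

```python
import numpy as np
from scipy import stats
from scipy.optimize import differential_evolution


def mix_pdf(x, alpha, a, scale_g, c, loc, scale_e):
    """Density of the assumed mixture: alpha * gamma + (1 - alpha) * GEV."""
    g = stats.gamma.pdf(x, a, scale=scale_g)
    e = stats.genextreme.pdf(x, c, loc=loc, scale=scale_e)
    return alpha * g + (1.0 - alpha) * e


def neg_log_likelihood(params, x):
    pdf = mix_pdf(x, *params)
    # Guard against log(0) where a point lies outside both supports.
    return -np.sum(np.log(np.maximum(pdf, 1e-300)))


def fit_mix(x, seed=0, maxiter=200, popsize=10):
    """Global maximum-likelihood search; returns (parameters, log-likelihood)."""
    x = np.asarray(x, dtype=float)
    m, s = x.mean(), x.std()
    bounds = [                   # illustrative search box, not from the study
        (0.0, 1.0),              # alpha, mixture weight
        (0.1, 50.0),             # gamma shape
        (1e-3 * m, 10.0 * m),    # gamma scale
        (-0.9, 0.9),             # GEV shape (SciPy sign convention)
        (x.min(), x.max()),      # GEV location
        (1e-3 * s, 10.0 * s),    # GEV scale
    ]
    result = differential_evolution(
        neg_log_likelihood, bounds, args=(x,),
        seed=seed, maxiter=maxiter, popsize=popsize, tol=1e-6,
    )
    return result.x, -result.fun
```

A population-based search is used because the mixture likelihood is multimodal, so a purely local optimiser started from a poor initial guess can settle on a secondary maximum.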

Estimation of parameters of compound Poisson distribution
The POT method [12] was applied to identify consecutive high-flow events from the daily flow values at the water gauge profiles. In the climatic and hydrological conditions of Poland, a 45-year series of daily flows yields from 50 to 200 high-flow events. The first step of the estimation was to analyse the fit of the maximum flows of the high-flow events to the one-dimensional generalised Pareto distribution. The unknown parameters of the distribution were estimated as follows:
- the observed maximum flows were arranged in increasing order;
- the shift parameter was set, in turn, to each consecutive observation of the ordered sequence;
- for each shift, the sequence of differences between the observations and the shift was determined;
- the parameters of the generalised Pareto distribution (GPD) were estimated with the maximum likelihood method from the sequence of differences;
- the agreement of the estimated distribution with the observation data was checked with a goodness-of-fit test;
- the estimators that maximised the p-value of the test were adopted as the best set of estimators.
In the second step the mean number of high-flow events in a year was determined. Only those high-flow events were taken into consideration in which the maximum flow exceeded the threshold.
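The procedure above can be sketched as follows, assuming SciPy. Because the name of the goodness-of-fit test is not recoverable from this section, the Kolmogorov-Smirnov test is used here as an assumption; `min_exceedances` is likewise a hypothetical safeguard that keeps enough points for each fit, and the function names are illustrative. SciPy's `genpareto` shape parameter `c` is positive for heavy tails.

```python
import numpy as np
from scipy import stats


def fit_gpd_threshold_scan(peaks, min_exceedances=30):
    """Try each ordered peak as the GPD shift and keep the best-fitting one."""
    q = np.sort(np.asarray(peaks, dtype=float))
    best = None
    for i in range(len(q) - min_exceedances):
        u = q[i]                       # candidate shift (threshold)
        y = q[i + 1:] - u              # differences above the candidate shift
        # Maximum likelihood fit with the location fixed at zero.
        c, _, scale = stats.genpareto.fit(y, floc=0.0)
        p = stats.kstest(y, "genpareto", args=(c, 0.0, scale)).pvalue
        if best is None or p > best["p_value"]:
            best = {"threshold": u, "shape": c, "scale": scale, "p_value": p}
    return best


def mean_events_per_year(peaks, threshold, n_years):
    """Second step: mean yearly number of events whose peak exceeds the threshold."""
    return np.count_nonzero(np.asarray(peaks) > threshold) / n_years
```

Since the parameters are estimated from the same data that are then tested, the p-values serve only to rank the candidate thresholds, not as exact significance levels.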

Results
The correctness of the choice of the analysed distributions was verified with a goodness-of-fit test. At the adopted significance level, for none of the analysed distributions were there grounds to reject the hypothesis that the maximum annual flows conform to the MIX and GPD distributions. The relevant p-values of the test are presented in Table 3. The fit of both distributions to the empirical data is presented graphically in Fig. 2. The distributions were compared by plotting the distribution functions in each graph:
- empirical: blue stars;
- estimated MIX distribution: red line;
- estimated compound Poisson distribution: black line.
The horizontal axis in the figure is formatted in a probability scale to facilitate the interpretation of the results. The standard linear scale is in this case transformed by the inverse function of the gamma distribution [17].
Analysis of the distributions presented in the graphs leads to the following observations:
1. The MIX distribution is less sensitive to outlying high flow values. In all the graphs its distribution function bounds the variation of maximum flows from below. This means that the n-year water estimated with the MIX distribution bounds its actual variation from below.
2. The CP distribution is more sensitive to high flow values. In almost all cases the graph of its distribution function bounds the variation of flows from above. At all analysed water gauges the curve of the CP distribution lies within the confidence region [8] determined by the MIX distribution.
3. At two water gauge stations situated on the lower Odra (Słubice, Gozdowice) the graph of the CP distribution diverges strongly from the observed maximum annual flows. In those cases the conducted goodness-of-fit tests (Kolmogorov's test) suggested rejecting the hypothesis of conformance.
4. The poor fit of the CP distribution to the maximum annual flows at Słubice and Gozdowice results from the difficulty of estimating the distribution of the number of high-flow events in a year. In the study it was assumed to be a Poisson distribution [18], but because the number of high-flow events varies little, conformance cannot be tested correctly (in this case the estimated mean values of 50- and 100-year waters in Tab. 4 are marked in red).
Table 4 presents the mean 50- and 100-year waters estimated with the MIX and CP distributions. In addition, the last column gives the values of the 100-year water obtained with the standard methods (IMGW PIB). Except for the gauge station at Oława, the standard methods underestimate the flows relative to the MIX distribution. The values also confirm a considerable overestimation of all mean n-year waters determined with the POT method.
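For the compound Poisson distribution, an n-year water can be computed in closed form once the threshold, the GPD parameters and the mean number of events are known. The sketch below assumes that the n-year water is the quantile with annual non-exceedance probability 1 - 1/n, and that the GPD shape follows the convention in which positive values give heavy tails; the function name and argument names are illustrative.

```python
import math


def cp_return_level(T, threshold, lam, shape, scale):
    """T-year flow for Poisson event counts with GPD-distributed peak excesses."""
    p = 1.0 - 1.0 / T                   # annual non-exceedance probability
    g = 1.0 + math.log(p) / lam         # GPD probability level solving exp(-lam*(1-g)) = p
    if g <= 0.0:
        # Years without any event are frequent enough that the quantile
        # does not rise above the threshold.
        return threshold
    if abs(shape) < 1e-12:              # exponential limit of the GPD
        return threshold - scale * math.log(1.0 - g)
    return threshold + scale / shape * ((1.0 - g) ** (-shape) - 1.0)
```

The dependence on `lam` shows why an uncertain estimate of the yearly number of events feeds directly into the estimated 50- and 100-year waters.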

Conclusions
On the basis of the conducted statistical inference the following conclusions can be formulated:
1. Both distributions, the MIX distribution and the compound Poisson distribution (CP), can be used to determine maximum flow distributions. However, on the lower section of the Odra the POT method requires an additional analysis of the distribution of the number of high-flow events in a year.
2. The advantages of using both distributions are the following:
- the MIX and GPD distributions provide a good fit to strongly diversified data;
- the estimates of mean n-year waters obtained with them bound the actual values from above (CP distribution) and from below (MIX distribution).
3. The disadvantages of using both distributions are the following:
- long, at least 40-year, stationary series of observed flows are required;
- tests of conformance of both distributions with observation data require high precision and, due to the small amount of data, are difficult to automate.
4. For shape parameter values > 0.5 (GEV) and < -0.5 (GPD) both distributions have infinite variance; it is then impossible to determine confidence regions for high quantiles.
5. The form of the estimated distributions is strongly affected by the highest recorded flows, and any inaccuracy in their measurement will have a significant impact on the estimates.