Markov chain as a tool for forecasting daily precipitation in the vicinity of the city of Bydgoszcz, Poland

Crop yield depends on numerous weather factors, but mainly on the rainfall pattern and the course of air temperature during the vegetation period. When investigating the dependence of yields on rainfall, dry spells should be taken into account in addition to rainfall amounts. A two-state Markov chain was adopted as the model of precipitation occurrence, since it is generally recognized as simple and effective for this purpose. The chain was fitted to daily precipitation totals from the period 1971–2013, recorded at a measuring point of the University of Science and Technology in Bydgoszcz, Poland. One objective was to determine the order of the Markov chain describing the change of precipitation on subsequent days. Another was to investigate the dependence of precipitation amounts on the month of the year. Analysis of the data leads to the conclusion that the chain is of second order, which is confirmed by both criteria used: BIC (Bayesian Information Criterion) and AIC (Akaike Information Criterion).


Introduction
Estimating and predicting precipitation is a problem of fundamental importance for agriculture, hydrology and ecology. Information on the probability of precipitation, its amount and the number of days without rainfall is necessary for designing sanitary systems for draining rainwater and for planning irrigation systems that support rational management of water in the soil [1]. Determining the distribution of rainfall is also necessary for planning the use of water resources on a larger scale, e.g. in the national economy. The management of water resources in a given area is always based on historical, current and future meteorological data at various time scales: annual, monthly and daily. Forecasting daily rainfall is extremely difficult, since atmospheric precipitation, as a meteorological element, is a very complex phenomenon that changes with time and place [2]. It is one of the most unpredictable events, even for current climate models. The range of uncertainty varies depending on the model, the physical characteristics of the atmosphere and the complexity associated with its mathematical modeling.
Cazacioc and Cipu [3] point out three methods of precipitation forecasting: subjective forecasts based on the experience of forecasters, deterministic forecasts obtained from numerical weather forecast models as well as statistical forecasts. The two latter tools are the most objective. Statistical models contribute to a large extent to reducing the uncertainty resulting from the complexity of the phenomenon of precipitation.
Among the statistical techniques of atmospheric precipitation modeling are Markov chains, which are used to predict short-term rainfall. The use of Markov chains for rainfall modeling was introduced over 40 years ago [4][5][6]. In spite of its general simplicity, the first-order Markov chain model (based on rainfall from the previous day) remains a suitable technique for modeling rainfall data in many geographical areas [7]. Markov chain models have two advantages: forecasts are available immediately after an observation is completed, because they use only local weather information as predictors, and they require minimal computation once the climatological data have been processed.
Markov chain models classify each day as "wet" or "dry" and describe the relationship between the state of the current day and that of the preceding days. The order of the chain is the maximum number of preceding days on which the state of the current day depends. Most of the Markov chain models mentioned in the literature are first-order models [8][9][10]. Some researchers [11][12][13] point out a drawback of first-order models, namely that they underestimate the length, frequency and variability of precipitation, which is why higher-order models are recommended. The two-state Markov chain as a model of rainfall was studied by Gabriel and Neumann [5] and was generalized by Teodorovic and Woolhiser [14] and Katz [8]. Many authors use Markov chains to model the occurrence of daily rainfall [15][16][17][18][19][20]. A low order is most often preferred for two reasons: the number of parameters is kept to a minimum, which yields better estimates, and the later use of the fitted model to calculate other quantities, such as the probability of long dry (rain-free) periods, is much simpler.
The distribution of the number of days with precipitation usually has gamma distribution characteristics. In the analysis of pluviometric conditions, two characteristic features of rainfall are usually distinguished: its occurrence and its amount or intensity. In modeling, they can be considered together or separately. This paper discusses only the occurrence of precipitation, more precisely the modeling of daily rainfall occurrence.
The objective of the paper was a basic statistical analysis of daily rainfall and, in particular, the determination of the order of the Markov chain used to model daily rainfall occurrence in the region of the city of Bydgoszcz, Poland. In addition, the dependence of precipitation totals on the month of the year was examined.

Material and research methods
The paper presents a statistical analysis of data on daily rainfall totals recorded in the January-December period in the years 1971-2013 at the meteorological station of the Faculty of Agriculture and Biotechnology of the University of Science and Technology in Bydgoszcz at the Research Center located in the agricultural area of Mochle, about 17 km away from Bydgoszcz.
The basic methods of statistical analysis used in this work are those related to the estimation of Markov chain parameters. In particular, the matrix of transition probabilities of the Markov chain of precipitation was estimated for selected years, and the stationary probabilities were determined. The central statistical task was to determine the order of the Markov chain. To this end, two criteria for determining the chain order were applied: the BIC (Bayesian Information Criterion) [21] and the AIC (Akaike Information Criterion) [22]. Both are based on the log-likelihood functions of the transition probabilities of the Markov chain fitted to a given data series.

Markov chain
The simplest kind of discrete variable is a binary one, whose values (1 / 0) correspond to the two states in which it can exist. For daily precipitation, these are its occurrence and non-occurrence. A sequence of daily observations from a meteorological station constitutes a time series of that discrete variable.
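The conversion of daily totals into such a binary series can be sketched as follows. This is a minimal illustration only: the function name `to_occurrence_series`, the 0.1 mm wet-day threshold and the sample values are assumptions introduced here and are not taken from the study.

```python
from typing import List, Sequence

# Assumed wet-day threshold in millimetres; the paper does not state one.
WET_THRESHOLD_MM = 0.1


def to_occurrence_series(daily_totals_mm: Sequence[float],
                         threshold_mm: float = WET_THRESHOLD_MM) -> List[int]:
    """Map daily precipitation totals to 1 (wet day) or 0 (dry day)."""
    return [1 if total >= threshold_mm else 0 for total in daily_totals_mm]


# Hypothetical week of daily totals in millimetres (not data from the study).
example_totals = [0.0, 2.4, 0.0, 0.0, 5.1, 0.3, 0.0]
example_states = to_occurrence_series(example_totals)
print(example_states)
```

The choice of threshold affects the number of wet days and therefore all estimates derived from the series, so it should be reported together with the results.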
For a first-order Markov chain, the probability of transition to a future state depends only on the current state. On day $i$ the variable $X$ is either in state 0 (no precipitation occurs, $X(i) = 0$) or in state 1 (precipitation occurs, $X(i) = 1$). It is assumed that
$$P\{X(n+1) = x_{n+1} \mid X(n) = x_n, X(n-1) = x_{n-1}, \ldots, X(0) = x_0\} = P\{X(n+1) = x_{n+1} \mid X(n) = x_n\} \tag{1}$$
for any $x_0, x_1, \ldots, x_{n+1} \in \{0, 1\}$. A stochastic process $X(n)$, $n = 0, 1, 2, \ldots$, satisfying (1) is called a Markov chain. The one-step conditional transition probability at day $i + 1$ is defined as
$$p_{jk} = P\{X(i+1) = k \mid X(i) = j\}, \quad j, k \in \{0, 1\}.$$
It is easy to see that $p_{00} + p_{01} = 1$ and $p_{10} + p_{11} = 1$. The matrix $P$ of transition probabilities, defined as
$$P = \begin{pmatrix} p_{00} & p_{01} \\ p_{10} & p_{11} \end{pmatrix},$$
is called the one-step transition matrix.
The model is fully defined by two transition probabilities: $p_{01}$ (the probability that precipitation will occur tomorrow if it did not occur today) and $p_{11}$ (the probability that precipitation will occur tomorrow if it occurred today). These probabilities can easily be computed from an observed time series of precipitation occurrence. Their maximum likelihood estimates $\hat{p}_{01}$ and $\hat{p}_{11}$ are given by
$$\hat{p}_{01} = \frac{n_{01}}{n_{00} + n_{01}}, \tag{8}$$
$$\hat{p}_{11} = \frac{n_{11}}{n_{10} + n_{11}},$$
where $n_{01}$ is the historical count of wet days that followed dry days, $n_{00}$ is the historical count of dry days that followed dry days, and so on. For a Markov chain describing the daily occurrence or non-occurrence of precipitation, the stationary probability $\pi_1$ of precipitation, which corresponds to the unconditional probability of precipitation, is given by
$$\pi_1 = \frac{p_{01}}{1 + p_{01} - p_{11}}.$$
Analogously, the unconditional probability $\pi_0$ of a day without precipitation is
$$\pi_0 = \frac{1 - p_{11}}{1 + p_{01} - p_{11}}. \tag{11}$$
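The estimators and the stationary probabilities described above can be illustrated with a short sketch. The function names and the sample series are hypothetical; the code assumes that the series contains at least one dry day and one wet day that is followed by another observation, so that no denominator is zero.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def estimate_first_order(states: Sequence[int]) -> Dict[str, float]:
    """Maximum likelihood estimates of the one-step transition probabilities."""
    # Count pairs (state today, state tomorrow) over consecutive days.
    counts = Counter(zip(states[:-1], states[1:]))
    n00, n01 = counts[(0, 0)], counts[(0, 1)]
    n10, n11 = counts[(1, 0)], counts[(1, 1)]
    p01 = n01 / (n00 + n01)  # wet day after a dry day
    p11 = n11 / (n10 + n11)  # wet day after a wet day
    return {"p00": 1.0 - p01, "p01": p01, "p10": 1.0 - p11, "p11": p11}


def stationary_probabilities(p01: float, p11: float) -> Tuple[float, float]:
    """Return (pi_0, pi_1), the unconditional dry-day and wet-day probabilities."""
    denominator = 1.0 + p01 - p11
    return (1.0 - p11) / denominator, p01 / denominator


# Hypothetical occurrence series (1 = wet, 0 = dry), not data from the study.
series = [0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0]
p = estimate_first_order(series)
pi0, pi1 = stationary_probabilities(p["p01"], p["p11"])
print(p, pi0, pi1)
```

A quick consistency check is that the two stationary probabilities sum to one and that $\pi_1 = \pi_0 p_{01} + \pi_1 p_{11}$, i.e. the distribution is unchanged by one step of the chain.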

Estimation of matrix P for 1971-2013
Transition matrices $P_1$, $P_2$, $P_3$, $P_4$ and $P_5$ were estimated for five sub-periods of 1971-2013, the first of which covers the years 1971-1980. A comparison of these matrices shows differences between the corresponding transition probabilities. The estimated matrices reflect the succession of days without precipitation and days with precipitation. Table 1 contains the values of the stationary probabilities $\pi_1$ corresponding to the above matrices.
Table 1. Stationary probabilities $\pi_1$ determined for the five analyzed periods.
The lowest stationary (limiting) probability was observed for the decade 1991-2000. This raises the question of whether this finding is confirmed by other studies.

Higher Order Markov Chains
Let $X(i)$, $i = 1, 2, \ldots, n$, be the observed time series, let $n_0$ and $n_1$ be the numbers of 0's and 1's in this sequence, and let $p_0 = n_0 / n$ and $p_1 = n_1 / n$. Consider first a second-order Markov chain. Its transition probability involves the states on days $i - 1$, $i$ and $i + 1$ and is defined as
$$p_{krs} = P\{X(i+1) = s \mid X(i) = r, X(i-1) = k\}, \quad k, r, s \in \{0, 1\}.$$
The transition probabilities $p_{krs}$ are estimated by
$$\hat{p}_{krs} = \frac{n_{krs}}{n_{kr\bullet}}, \qquad n_{kr\bullet} = \sum_{s} n_{krs},$$
where $n_{krs}$ is the number of observed sequences of the states $k$, $r$, $s$ on three consecutive days.
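The second-order estimator can be sketched in the same way as the first-order one. The function name and the sample series are hypothetical; histories $(k, r)$ that never occur in the series are simply skipped, which is a design choice of this sketch rather than something stated in the paper.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def estimate_second_order(
    states: Sequence[int],
) -> Dict[Tuple[int, int, int], float]:
    """Estimate p_krs = n_krs / n_kr. for a two-state second-order chain."""
    # Count triples (day i-1, day i, day i+1) over consecutive days.
    triples = Counter(zip(states[:-2], states[1:-1], states[2:]))
    estimates: Dict[Tuple[int, int, int], float] = {}
    for k in (0, 1):
        for r in (0, 1):
            n_kr = triples[(k, r, 0)] + triples[(k, r, 1)]
            if n_kr == 0:
                continue  # history (k, r) was never observed
            for s in (0, 1):
                estimates[(k, r, s)] = triples[(k, r, s)] / n_kr
    return estimates


# Hypothetical occurrence series (1 = wet, 0 = dry), not data from the study.
series = [0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0]
p2 = estimate_second_order(series)
for history_and_next, value in sorted(p2.items()):
    print(history_and_next, round(value, 3))
```

For every observed history $(k, r)$ the two estimates $\hat{p}_{kr0}$ and $\hat{p}_{kr1}$ sum to one, so a two-state second-order chain has four free parameters, compared with two for the first-order chain.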
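Analogously, for a third-order chain, $n_{krst}$ denotes the number of realizations of the sequence of states $k$, $r$, $s$, $t$ on four consecutive days, and $n_{krs\bullet} = \sum_{t} n_{krst}$. The order $m$ is chosen as the one that minimizes the functions (29) and (30).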
Setting $s = 2$, the values of the statistics AIC(m) and BIC(m) were calculated and are given in Table 2. An analysis of Table 2 leads to the conclusion that both AIC and BIC indicate the second order for the examined Markov chain.
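The order selection can be illustrated with the following sketch. It assumes the commonly used forms AIC(m) = -2 L_m + 2 s^m (s - 1) and BIC(m) = -2 L_m + s^m (s - 1) ln n, where L_m is the maximized log-likelihood of the chain of order m and s is the number of states; whether these coincide exactly with the functions used in the paper is an assumption. All function names and the test sequence are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def log_likelihood(states: Sequence[int], order: int, max_order: int) -> float:
    """Maximized log-likelihood of a chain of the given order.

    The first `max_order` days serve only as history, so that every
    candidate order is fitted to the same set of days.
    """
    transitions: Counter = Counter()
    histories: Counter = Counter()
    for i in range(max_order, len(states)):
        history = tuple(states[i - order:i])  # empty tuple for order 0
        transitions[(history, states[i])] += 1
        histories[history] += 1
    return sum(n * math.log(n / histories[history])
               for (history, _state), n in transitions.items())


def order_criteria(states: Sequence[int], max_order: int = 3,
                   n_states: int = 2) -> Dict[int, Dict[str, float]]:
    """AIC and BIC for orders 0..max_order (assumed standard forms)."""
    n = len(states) - max_order  # number of days entering the likelihood
    result: Dict[int, Dict[str, float]] = {}
    for m in range(max_order + 1):
        ll = log_likelihood(states, m, max_order)
        n_params = n_states ** m * (n_states - 1)
        result[m] = {"AIC": -2.0 * ll + 2.0 * n_params,
                     "BIC": -2.0 * ll + n_params * math.log(n)}
    return result


def best_order(criteria: Dict[int, Dict[str, float]], name: str) -> int:
    """Order that minimizes the chosen criterion ('AIC' or 'BIC')."""
    return min(criteria, key=lambda m: criteria[m][name])


# Synthetic sequence in which each day is determined by the two preceding days.
synthetic = [0, 0, 1, 1] * 50
criteria = order_criteria(synthetic, max_order=3)
print(best_order(criteria, "AIC"), best_order(criteria, "BIC"))
```

Because BIC penalizes each additional parameter by ln n rather than 2, it tends to favour lower orders than AIC for long series; agreement of the two criteria, as reported for the Bydgoszcz data, therefore strengthens the conclusion.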

Probability distribution of daily precipitation
The results of the basic statistical analysis of daily rainfall recorded in the years 1971-2013 are presented below. During this period, out of the 15,706 analyzed days, there were n = 5,367 days with atmospheric precipitation, which is 5,367/15,706 = 34.2% of all analyzed cases. The first relationship examined was the dependence of precipitation on the month of the year. The length of the series for each month is n = 42. Fig. 1 presents the dependence of the mean value of precipitation on the month, together with the confidence interval for the mean at the confidence level 1 - α = 0.95. The source data for Fig. 1 and the results of the calculations of basic statistics are presented in Table 3, which includes the following statistics for each month of the year:
• mean value (M)
• left limit of the confidence interval (LSCI)
• right limit of the confidence interval (RSCI)
• standard deviation (SD)
• coefficient of variation (CV)
• maximum value (MAX)
• sample size (N)
The analysis of the graph (Fig. 1) and the results presented in Table 3 show that the highest rainfall occurs in May, June, July, August and September, and the lowest in February. The widest confidence intervals and the highest mean values are characteristic of the period from May to August. The confidence intervals are narrowest for December, January, February, March and April, whereas the highest variability of daily rainfall was found in June and July. Similarly, the most rainy days were recorded in these months. Although the confidence interval for the mean was determined from a large sample, the relatively large spread of daily precipitation totals (all coefficients of variation are greater than or equal to 1) results in a relatively wide interval.
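The statistics listed for Table 3 can be computed as in the sketch below. The paper does not state how the confidence interval was constructed; the normal approximation with the quantile 1.96 used here is an assumption suited to large samples, and the function name and the sample values are hypothetical.

```python
import math
import statistics
from typing import Dict, Sequence

# Normal quantile for the confidence level 1 - alpha = 0.95 (assumed method).
Z_95 = 1.96


def summarize_month(values: Sequence[float]) -> Dict[str, float]:
    """Basic statistics of the precipitation values for one month."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    half_width = Z_95 * sd / math.sqrt(n)
    return {
        "M": mean,
        "LSCI": mean - half_width,
        "RSCI": mean + half_width,
        "SD": sd,
        "CV": sd / mean,
        "MAX": max(values),
        "N": n,
    }


# Hypothetical daily totals in millimetres (not data from the study).
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
summary = summarize_month(sample)
print(summary)
```

The half-width of the interval is proportional to SD divided by the square root of N, which is why months with a coefficient of variation of 1 or more retain wide intervals even for large samples.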

Probability distribution of daily precipitation
For each month, the dependence of the daily rainfall amount on the year was examined. No statistically significant dependence was found for any of the 12 months. This means that over the years 1971-2013 the mean daily rainfall total showed no significant upward or downward trend.

Concluding remarks
A high variability of rainfall over time can lead to atmospheric drought, and when it occurs during the growing season it takes the form of agricultural drought, which results in low yields [23]. Many authors have analyzed the variability of precipitation totals using Markov chains, not only in a theoretical context but also in applications to various fields of the economy. The resulting rainfall models cover various time scales [24][25][26][27]. Daily models have gained widespread use because they are suitable for detailed water balance calculations as well as for agricultural and environmental models [26]. For modeling daily precipitation at a single point, Stern and Coe [28] used a second-order Markov chain to describe precipitation occurrence and a gamma distribution to forecast the amount of rainfall. The results of Markov chain analysis may be applied to forecasting rainfall in a given vegetation season [29,30] or to irrigation scheduling [31]. In view of climate reports and future climate change scenarios, one may expect a demand for monitoring and forecasting rainfall on the basis of simple statistical tools.

Summary
The analysis confirmed that daily rainfall in the area of Bydgoszcz is characterized by high variability in individual months and years. The occurrence of daily rainfall was found to follow a second-order Markov chain, which was confirmed by both criteria applied: BIC (Bayesian Information Criterion) and AIC (Akaike Information Criterion). Despite the extensive statistical material, no dependence of precipitation on the particular year was confirmed. On the other hand, a strong dependence of the amount of daily rainfall on the month was confirmed.
This research was carried out within the framework of the FACCE JPI - MACSUR project titled "Modelling European Agriculture with Climate Change for Food Security" (acronym FACCE MACSUR 2), realized between 01/06/2015 and 31/05/2017.