Selecting the probability distribution of annual maximum temperature in Malaysia

Global warming has become a widely discussed issue both locally and internationally. Rising temperatures are expected to increase the variability of climate and the occurrence of natural disasters. Increasing global temperature will affect the agricultural sector, increase the incidence of some infectious diseases that may lead to high mortality rates in humans, and raise the demand for electricity, water and food, which will eventually affect the economy of Malaysia. Hence, this work aims to identify the best fitted probability distribution that describes the annual maximum temperature recorded at seventeen meteorological stations in Malaysia. The Normal, Lognormal, Gamma, Weibull and Generalized Skew Logistic distributions are considered, with the parameters estimated using the maximum likelihood estimation method. Goodness of fit tests and model selection criteria, namely the Kolmogorov-Smirnov and Anderson-Darling tests, the Corrected Akaike Information Criterion and the Bayesian Information Criterion, are used to measure the accuracy of the predicted data using theoretical probability distributions. The results show that most of the stations favour the Generalized Skew Logistic distribution as the best fitted probability distribution. Some stations favour the Normal, Lognormal or Weibull distribution as the best fitted distribution to describe the annual maximum temperature.


Introduction
Recently, global warming has become an increasingly common concern due to high temperatures. Although the changes in mean temperature are small, they are accompanied by large changes in the frequency of extreme temperatures. According to the Intergovernmental Panel on Climate Change (IPCC), based on observations of increases in global average air temperatures, the warming of the climate system is now "unequivocal"; surface temperatures are predicted to rise over the 21st century, and heat waves are expected to occur more frequently and last longer [1,2]. This rise in temperature will eventually increase the variability of climate and the occurrence of natural disasters.
Malaysia consists of two regions, namely Peninsular Malaysia and East Malaysia, and is located within the equatorial region, which is characterized as hot and humid throughout the year. The average annual temperature is about 27°C; the daytime temperature rises above 30°C and the night temperature rarely drops below 20°C. The temperature in Malaysia is predicted to continue on an increasing trend [3], and the rise in temperature is more significant in Western Peninsular Malaysia than in the other regions of Malaysia [4].
*Corresponding author: nurfatini8532@gmail.com
Anthropogenic factors such as land conversion, industrialization and transportation release greenhouse gases, which amplify the air temperature [4]. Several studies have shown that extreme temperatures will increase heat waves [5]. Stress from excess heat may lead to blood pressure problems and heart disease. Increased temperatures and changes in precipitation patterns may cause a rise in malaria, cholera and dengue. This problem is already observed with malaria in Southeast Asia. As one of the countries located in Southeast Asia, Malaysia faces potential threats to population health and development due to changes in temperature.
Previous research has shown that the increase in global temperature has negative effects on the agricultural sector, health, food and water supply, as well as the environment. The rate of evaporation becomes faster and thus leads to drought. The agricultural sector is affected by rising temperatures because soil moisture tends to decrease. Low humidity increases the risk of wildfires and open burning, which eventually speeds up the warming of the air.
Mori [6] found that there is an increased demand for electricity during periods of extremely hot temperatures. Moreover, extremely high temperatures have an indirect impact on the economy of Malaysia. Floods are among the unavoidable disasters that frequently happen in Malaysia. From 2001 to 2005, a total of RM1.79 billion was spent on structural flood mitigation measures [7]. The increase in extreme weather such as drought and heavy floods is also associated with the influence of the El Niño phenomenon [4].
Researchers around the world have conducted statistical analyses of maximum temperature at different locations. Araújo et al. [8] showed that the Normal distribution is the best distribution to describe the daily series of maximum temperature in Iguatu City, Ceara. In 2011, de Araújo et al. [9] extended their research in Iguatu City, Ceara, fitting the probabilities of occurrence of maximum temperature on a fifteen-day scale for each month of the year using the Lognormal distribution. Torsen et al. [10] revealed that the Johnson SB distribution is the best fitted distribution for the maximum temperature in Adamawa State, Nigeria. The Generalized Skew Logistic distribution was selected as the best fit for the monthly maximum temperature at Dhaka station [11]. In 2018, Abdulla and Hossain [12] found that the Generalized Skew Logistic distribution was the favoured distribution for Cox's Bazar station, while the Weibull distribution gave the best description for Patuakhali station.
In Malaysia, several similar analyses have been done for rainfall [13,14,15] and wind speed data [16,17]. Daud et al. [13] discussed the comparative assessment of eight candidate distributions in providing accurate and reliable maximum rainfall estimates for Malaysia. Among the eight distributions, the Generalized Extreme Value (GEV) distribution was chosen as the best fitted probability distribution to describe the annual rainfall in Malaysia. Dan'azumi et al. [14] studied the statistical distribution of hourly rainfall depth for twelve representative stations spread across Peninsular Malaysia. They observed that the Generalized Pareto distribution (GPD) fits well compared to the Exponential and Gamma distributions. Meanwhile, the Weibull distribution is widely used and chosen as the best fit for describing wind speed data [16,17].
Aside from analyzing the rainfall and wind speed distributions, it is also important to identify the behaviour of maximum temperature, as this is necessary to reduce the impact of climate change in this country. Most studies on temperature emphasize the use of the Generalized Extreme Value [3,18,19] and Generalized Pareto [20] distributions only. Hence, this work aims to find the maximum temperature distribution by comparing several distributions and to determine the probability distribution that best describes the annual maximum temperature in Malaysia.

Data description
The daily maximum temperature data were recorded at seventeen meteorological stations in Malaysia over the period from January 1994 to December 2017, from which the annual maximum temperature is obtained.
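The step from daily records to annual maxima can be sketched as follows. This is a minimal illustration only: the function name `annual_maxima`, the record layout of (date, temperature) pairs and the use of `None` for missing readings are assumptions, and the sample values are hypothetical rather than data from the study.

```python
from collections import defaultdict
from datetime import date


def annual_maxima(daily_records):
    """Return {year: highest daily maximum temperature in that year}.

    daily_records is an iterable of (date, temperature_in_celsius) pairs.
    Missing readings are assumed to be encoded as None and are skipped.
    """
    maxima = defaultdict(lambda: float("-inf"))
    for day, temperature in daily_records:
        if temperature is None:
            continue
        maxima[day.year] = max(maxima[day.year], temperature)
    return dict(sorted(maxima.items()))


# Hypothetical readings for illustration only (not data from the study).
records = [
    (date(1994, 3, 1), 34.2),
    (date(1994, 4, 12), 35.6),
    (date(1995, 2, 20), None),
    (date(1995, 5, 3), 36.1),
    (date(1995, 8, 9), 33.8),
]
print(annual_maxima(records))
```

Each station then contributes one value per year, so a record of 19 to 24 years yields a sample of 19 to 24 annual maxima.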

Methodology
In order to describe the annual maximum temperature, it is important to identify the distribution that fits the data well. As mentioned, most studies in Malaysia focus on the analysis of maximum temperature data using the GEV and GPD distributions. Hence, this study covers distributions other than the GEV and GPD to analyze the maximum temperature. The Normal, Lognormal, Gamma, Weibull and Generalized Skew Logistic distributions are used, and the parameters are estimated using the maximum likelihood estimation method since it provides a consistent approach to parameter estimation problems. Table 1 shows the considered probability distributions.

Table 1. The probability density functions of the five distributions and their parameters (columns: Distribution, Probability density function, Parameter).
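As a rough illustration of maximum likelihood estimation, the sketch below fits the two candidate distributions whose estimators have a closed form, the Normal and the Lognormal, and evaluates the log-likelihood at the estimates. The function names and the sample values are hypothetical, and the parameterization (mean and standard deviation, on the log scale for the Lognormal) is an assumption that may differ from the one in Table 1. The Gamma, Weibull and Generalized Skew Logistic estimates have no closed form and would need a numerical optimizer, which is not shown here.

```python
import math


def fit_normal(sample):
    """Closed-form MLE for the Normal distribution: (mu, sigma)."""
    n = len(sample)
    mu = sum(sample) / n
    # The MLE of the variance divides by n, not n - 1.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in sample) / n)
    return mu, sigma


def fit_lognormal(sample):
    """Closed-form MLE for the Lognormal: the Normal MLE applied to ln(x)."""
    return fit_normal([math.log(x) for x in sample])


def normal_loglik(sample, mu, sigma):
    """Log-likelihood of the sample under Normal(mu, sigma)."""
    n = len(sample)
    ss = sum((x - mu) ** 2 for x in sample)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - ss / (2 * sigma ** 2)


def lognormal_loglik(sample, mu, sigma):
    """Log-likelihood under Lognormal(mu, sigma) on the original scale."""
    logs = [math.log(x) for x in sample]
    # The change of variables adds the Jacobian term -sum(ln x).
    return normal_loglik(logs, mu, sigma) - sum(logs)


# Hypothetical annual maxima in degrees Celsius (not data from the study).
temps = [35.1, 35.8, 36.4, 34.9, 36.0, 35.5, 37.2, 35.3]
mu, sigma = fit_normal(temps)
log_mu, log_sigma = fit_lognormal(temps)
print("Normal:", mu, sigma, normal_loglik(temps, mu, sigma))
print("Lognormal:", log_mu, log_sigma, lognormal_loglik(temps, log_mu, log_sigma))
```

The maximized log-likelihood values are the quantities that later enter the information criteria, which is why both log-likelihoods are computed on the original temperature scale.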

Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) tests
The Kolmogorov-Smirnov statistic computes the largest difference between the empirical distribution function of the sample and the cumulative distribution function of the selected distribution. The test statistic is defined as:
$$D = \sup_x \left| F_n(x) - F(x) \right|$$
where $F_n$ is the empirical distribution function of the sample and $F$ is the theoretical cumulative distribution function of the tested distribution, which must be continuous with fully specified parameters. The null hypothesis is rejected if $D$ is greater than the critical value obtained from the statistical table. Likewise, for the Anderson-Darling test, the null hypothesis is rejected if the test statistic $A$ is greater than the critical value. The null hypothesis of both tests is that the data follow the specified distribution.
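Both statistics can be computed directly from the ordered sample. The sketch below is an illustration under stated assumptions: it uses the usual computational form of the Anderson-Darling statistic (commonly written $A^2$), the function names are hypothetical, and the fitted Normal parameters and sample values are invented for the example. Critical values are not computed here.

```python
import math
from statistics import NormalDist


def ks_statistic(sample, cdf):
    """Largest gap between the empirical and theoretical CDFs."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ad_statistic(sample, cdf):
    """Anderson-Darling statistic A^2 in its usual computational form."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        total += (2 * i - 1) * (
            math.log(cdf(xs[i - 1])) + math.log(1.0 - cdf(xs[n - i]))
        )
    return -n - total / n


# Hypothetical annual maxima and hypothetical fitted Normal parameters.
temps = [35.1, 35.8, 36.4, 34.9, 36.0, 35.5, 37.2, 35.3]
fitted = NormalDist(mu=35.8, sigma=0.7)
print("D =", ks_statistic(temps, fitted.cdf))
print("A^2 =", ad_statistic(temps, fitted.cdf))
```

Because the Anderson-Darling sum weights the logarithms of $F$ and $1 - F$, it is more sensitive than the Kolmogorov-Smirnov statistic to discrepancies in the tails, which matters when the data are annual maxima.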

Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC)
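The AIC and BIC are used to check the accuracy of the predicted data using theoretical probability distributions. The AIC is defined as
$$\mathrm{AIC} = -2 \ln L + 2k$$
The BIC is computed as
$$\mathrm{BIC} = -2 \ln L + k \ln n$$
where $L$ is the maximized value of the likelihood function, $k$ is the number of estimated parameters and $n$ is the sample size. The distribution which provides the smallest value of AIC and BIC is preferable.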
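The criteria are simple functions of the maximized log-likelihood. The sketch below uses the standard textbook definitions; the small-sample correction in `aicc` is the usual one and is assumed, not confirmed, to be the Corrected Akaike Information Criterion reported in the results. The function names and the numeric inputs are hypothetical.

```python
import math


def aic(loglik, k):
    """Akaike Information Criterion for k estimated parameters."""
    return -2.0 * loglik + 2.0 * k


def aicc(loglik, k, n):
    """AIC with the usual small-sample correction (requires n > k + 1)."""
    return aic(loglik, k) + 2.0 * k * (k + 1) / (n - k - 1)


def bic(loglik, k, n):
    """Bayesian Information Criterion for sample size n."""
    return -2.0 * loglik + k * math.log(n)


# Hypothetical values: a two-parameter fit to 24 annual maxima.
loglik, k, n = -10.0, 2, 24
print(aic(loglik, k), aicc(loglik, k, n), bic(loglik, k, n))
```

With at most 24 annual maxima per station, the correction term is not negligible, and it penalizes three-parameter distributions more heavily than two-parameter ones.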

Quantile-Quantile (Q-Q) Plot
The Q-Q plot is obtained for the stations which show an unclear decision on selecting the best fitted probability distribution. It is a graphical tool that helps us to assess whether a set of data plausibly came from some theoretical distribution. A sample $x_1, x_2, \ldots, x_n$ is used to construct the plot by plotting the theoretical quantiles against the sample quantiles, $x_i$, where $x$ refers to the annual maximum temperature data. If the empirical distribution is consistent with the theoretical distribution, the points in the Q-Q plot should lie along the 45-degree reference line.
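The coordinates of a Q-Q plot can be sketched as below. The plotting positions $(i - 0.5)/n$ are an assumption, since the text does not state which positions were used, and the function name, the fitted Normal parameters and the sample values are hypothetical.

```python
from statistics import NormalDist


def qq_points(sample, inv_cdf):
    """Return (theoretical quantile, sample quantile) pairs for a Q-Q plot."""
    xs = sorted(sample)
    n = len(xs)
    # Plotting positions (i - 0.5)/n are one common choice (an assumption).
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    return [(inv_cdf(p), x) for p, x in zip(probs, xs)]


# Hypothetical annual maxima and hypothetical fitted Normal parameters.
temps = [35.1, 35.8, 36.4, 34.9, 36.0, 35.5, 37.2, 35.3]
fitted = NormalDist(mu=35.8, sigma=0.7)
for theoretical, observed in qq_points(temps, fitted.inv_cdf):
    print(round(theoretical, 2), observed)
```

Pairs that stay close to the line where the two coordinates are equal indicate agreement with the candidate distribution; systematic departures at the upper end point to a poor fit in the right tail.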

Results and discussion
Daily maximum temperature records covering 19 to 24 years are observed. The annual maxima, one value per year, are used to study the characteristics of the maximum temperature at the seventeen meteorological stations mentioned. Table 2 shows a descriptive analysis of the annual maximum temperature. The longitude and latitude are also listed. Chuping station records the highest mean annual maximum temperature, 37.28°C, followed by Alor Setar station with 36.95°C. These two stations are located in Western Peninsular Malaysia. Meanwhile, Kuala Terengganu station, which is located in the east, records the lowest mean annual maximum temperature. This is consistent with the fact that the temperature in Western Peninsular Malaysia experiences a more significant rise compared to other regions in Malaysia [4].
The larger standard deviations at Chuping and Alor Setar stations indicate that the annual maximum temperature deviates far from the average maximum temperature, while the smallest standard deviation, at Kuala Terengganu station, indicates that the maximum temperatures are mostly close to the mean. The test statistics for the KS and AD tests along with the AICC and the BIC for the annual maximum temperature were calculated for each of the considered distributions. The distributions favoured by each station are counted based on the smallest value produced by the goodness of fit tests and the model selection criteria. The most preferable distribution for each of the goodness of fit tests and the model selection criteria is shown in Table 3. Based on the KS test, the Generalized Skew Logistic distribution provides a good fit to the annual maximum temperature data at most of the stations, except for Subang, Kuala Terengganu and Kota Kinabalu stations, which favour the Normal, Lognormal and Weibull distributions, respectively. The AD test gives almost the same results as the KS test. The only differences are at Bayan Lepas and Subang stations, where the Lognormal distribution fits the annual maximum temperature data well.
Meanwhile, the AICC and BIC give quite similar results except for Alor Setar and Kuching stations, where the two criteria differ between the Generalized Skew Logistic and Lognormal distributions. In general, eight stations provide a clear decision while the nine other stations show an unclear decision on selecting the best fitted probability distribution.
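The selection rule described above, smallest value per test or criterion, can be sketched as follows. Treating a decision as "clear" when all four measures favour the same distribution is an assumption about how the term is used, and the function names and all numbers below are hypothetical rather than values from Table 3.

```python
def favoured(scores):
    """Return the distribution with the smallest value of one measure."""
    return min(scores, key=scores.get)


def select_best(criteria):
    """Map each measure to its favoured distribution and flag agreement.

    criteria maps a measure name (e.g. "KS") to {distribution: value}.
    The decision is called clear when every measure picks the same one.
    """
    winners = {name: favoured(scores) for name, scores in criteria.items()}
    clear = len(set(winners.values())) == 1
    return winners, clear


# Hypothetical values for a single station (not taken from the study).
station_scores = {
    "KS": {"Normal": 0.15, "Lognormal": 0.14, "GSL": 0.10},
    "AD": {"Normal": 0.62, "Lognormal": 0.41, "GSL": 0.45},
    "AICC": {"Normal": 61.3, "Lognormal": 60.9, "GSL": 59.8},
    "BIC": {"Normal": 63.2, "Lognormal": 62.8, "GSL": 62.9},
}
winners, clear = select_best(station_scores)
print(winners, "clear decision" if clear else "unclear decision")
```

In this invented example the measures disagree, which is the situation in which the Q-Q plot is consulted.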
The best fitted probability distribution for the annual maximum temperature at each of the seventeen meteorological stations is presented in Table 4, along with the estimated parameters. Based on the comparison of the goodness of fit tests and the model selection criteria, as well as the validation from the Q-Q plot, the Generalized Skew Logistic distribution provides the best fitted probability distribution for twelve stations, namely Chuping, Alor Setar, KLIA, Seremban, Melaka, Kuantan, Muadzam Shah, Senai, Mersing, Kota Bharu, Labuan and Kuching. This result is consistent with Hossain et al. [11], who conducted a study using maximum temperature for Dhaka station.
Meanwhile, three stations, namely Bayan Lepas, Sitiawan and Kuala Terengganu, favour the Lognormal distribution. The Weibull distribution provides the best fit for Kota Kinabalu station. This result is consistent with Abdulla and Hossain [12], who conducted a study using maximum temperature for Patuakhali station. Lastly, Subang station favours both the Normal and Lognormal distributions as the best distributions. This is consistent with the results obtained by Araújo et al. [8] and de Araújo et al. [9].

Conclusion
In this study, the annual maximum temperatures recorded at seventeen meteorological stations in Malaysia are analyzed using the Normal, Lognormal, Gamma, Weibull and Generalized Skew Logistic distributions. The parameters are estimated using the maximum likelihood estimation method. The best fitted probability distribution is selected by comparing the goodness of fit tests and the model selection criteria, namely the Kolmogorov-Smirnov and Anderson-Darling tests along with the Corrected Akaike Information Criterion and the Bayesian Information Criterion. Most of the stations favour the Generalized Skew Logistic distribution, and some of the stations favour the Lognormal, Normal or Weibull distribution as the best fitted probability distribution to describe the annual maximum temperature. Most stations also have a right-skewed distribution, as shown in Fig. 1, where the tail of the distribution is longer on the right-hand side than on the left-hand side. The frequency of maximum temperatures peaks at around 34°C to 36°C, and the distribution extends further towards higher maximum temperatures than towards lower ones. Hence, the stations that follow the Generalized Skew Logistic distribution have a higher annual maximum temperature compared to the stations that follow the Normal, Lognormal and Weibull distributions.
The results can be improved in future work by using a longer period of record, as it would give more accurate information about the behaviour of the annual maximum temperature. More three-parameter distributions can be included since they might provide a better fit to the data; the more parameters a probabilistic model has, the more flexible it becomes in adjusting to the data. The analysis of the maximum temperature allows scientists to study the behaviour of maximum temperature and its impacts, and later to make predictions.
The results from this study will benefit society by providing a better explanation of the maximum temperature and by helping to raise awareness of it among local people. We hope that this study will be useful in understanding extreme temperature events in Malaysia.
We would like to thank the Malaysian Meteorological Department for providing the data and the School of Mathematical Sciences, Universiti Sains Malaysia for the support.