Block Mining reward prediction with Polynomial Regression, Long short-term memory, and Prophet API for Ethereum blockchain miners.

The Ethereum blockchain is an open-source, decentralized blockchain whose functions are triggered by smart contracts, and it produces voluminous real-time data suitable for analysis with machine learning and deep learning algorithms. Ether is the cryptocurrency of the Ethereum blockchain, and the Ethereum virtual machine runs Turing-complete scripts. A data set describing blocks in the Ethereum blockchain, with the block number, timestamp, crypto address of the miner, and block reward for the miner, is explored with K-means clustering to group miners, identified by their unique crypto addresses, according to their rewards. Linear regression and polynomial regression are used to predict the next block reward for the miner. The Long Short-Term Memory (LSTM) algorithm is applied to the Ether market data set to predict the next ether price in the market. Every kind of price and volume, sampled every four hours, is used for prediction. A root mean square error of 34.9% is obtained for linear regression, and the silhouette score is 71% for K-means clustering of miners with the same rewards, with the optimal number of clusters obtained by the Gap statistic method.


Introduction
Blockchain is a decentralized ledger of transactions with the traits of decentralization, transparency, and anonymity. Ethereum 2.0 uses the proof of stake consensus algorithm, which offers energy efficiency and high economic incentives and opens up revenue-generating capability for a large number of users. Validator nodes add blocks to the Proof of Stake (PoS) Ethereum blockchain [1] by staking their cryptocurrency, ether, on the network, and any one validator can be randomly selected to propose a block. The other validators attest to the block proposed by this validator. A validator is rewarded both for attesting to blocks proposed by other validators and for proposing blocks itself. A validator's own ether is at stake if it behaves maliciously, which is in sharp contrast to the energy-consuming proof of work consensus protocol of the Bitcoin blockchain. A validator that does not propose a block when its turn comes, or fails to attest when other validators propose a block, is also penalized. If a validator performs a re-org attack on transactions, its stake is debited in full and it is removed from the proof of stake network. The proof of stake consensus algorithm does not require miners to invest heavily in hardware, initial setup costs, or electricity bills; instead, a stake of 32 ether (ETH) has to be deposited by the Ethereum miner for active participation as a validator in the Ethereum mining process. Miners who find 32 ETH too large an amount to stake can join staking pools, which aggregate the funds of miners and distribute the resulting rewards proportionally. PoS mining trends toward decentralization, since anyone can participate with minimal electricity costs, and this opens up revenue opportunities for many people. PoS achieves both scalability and security with sharding of the blockchain.

Related Work
The essential details of the proof of stake consensus algorithm for the Ethereum blockchain are given in [1]. Qin, R. et al. [2] discuss how individual miners decide whether to join mining pools. Because of increased hardware, electricity, and maintenance costs, the probability that an individual miner successfully mines a block is very low: the miner has to produce a hash value early, and gains the reward only when that hash value is less than or equal to the target hash set by the system. So, miners join mining pools and contribute their hash power to the overall pool. Choosing the right mining pool to join is not trivial, and the authors model this choice as a risk decision process. The pool manager has to distribute the reward obtained from successful mining among all miners who have joined the pool. The mechanisms for distributing the reward include the pay per share mechanism, the pay per last N shares mechanism, and the flat proportional mechanism. The paper presents computational results for the decision of selecting a mining pool.
Qin, R., Yuan et al. [3] discuss the three levels of participants in bitcoin mining, namely mining pools, individual miners, and blockchain systems. The authors list the issues based on artificial societies, computational experiments, and parallel systems (ACP) and propose a research framework. Easley et al. [4] discuss the strategy adopted by miners and the significant role of mining rewards, transaction fees, price, waiting time, and external constraints in influencing the participation of users in the blockchain. Kiayias, A. et al. [5] discuss two simplified forms of the mining game: in the first, miners mine a block and release it immediately; in the second, miners mine a block and announce it but defer its release. In both forms of the game, the best response and the Nash equilibrium are demonstrated, and the authors examine whether the best response matches the behaviour expected by the bitcoin designer. Wang, W. et al. [6] discuss the decentralized nature of the blockchain from the perspective of the individual nodes that form its backbone network and of the decentralized consensus that runs on these nodes, and survey incentive mechanisms and Byzantine fault tolerance mechanisms from a game theory perspective.
Li, J. et al. [7] discuss users bidding transaction fees for faster confirmation of transactions. The Generalized Second Price auction method is used for the bidding process. The metrics considered are the transaction size, the bidding scores submitted by the user, the quality scores, and the virtual fees of the transactions. These metrics play a vital role in reducing the user's fees. Saad, M. et al. [8] explore the features of the Bitcoin and Ethereum blockchain networks, how they change over time, and how they relate to the demand and supply economics of both cryptocurrencies. Jang, H. et al. [9] discuss the highly correlated network indicators that are useful in predicting the bitcoin price with high accuracy. A Bayesian neural network is used to predict the price of the cryptocurrency, taking into account a sudden decrease in the hash rate that leads to a delay in block publishing time and a reduction in network throughput. Poongodi, M. et al. [10] apply the ARIMA model to a bitcoin data set, publicly available at http://coinmarketcap.com, covering 4 consecutive years, to analyze the volatility of bitcoin and how it affects miners joining mining pools and abandoning solo mining. Catalini, C. et al. [11] discuss two metrics, the cost of verification of transactions and the cost of networking. The cost of verification relates to the cost of verifying the state of transactions along with their attributes, their previous history, and the current ownership of a digital asset. The cost of networking relates to the cost of operating a decentralized digital marketplace. Salimitari, M. et al. [12] discuss a group of miners joining a mining pool and contributing their share of electricity for mining, and propose three concepts from dynamic game theory: social optimum, Nash equilibrium, and myopic Nash equilibrium. Vimal, S. et al. [13] discuss the efficiency of transferring p2p files in a decentralized interplanetary file system and rewarding the miners who successfully share their resources.
Gupta, S.S. et al. [14] discuss clustering methods used for fraud detection in cryptocurrency financial transactions. Chawathe, S. S. et al. [15] explore clustering algorithms for blockchain data and their impact on miners. The chapter from the book Mastering Bitcoin [16] describes how transactions are validated by miners: a new block collects the transactions made since the last block was created, about 10 minutes earlier, and for each new block the miner receives newly minted coins together with the transaction fees from all transactions included in the block. The miner who mines a block, by fetching transactions from a mining pool and solving the target hash given by the bitcoin system or any other proof of work blockchain system, is identified by a crypto address. The miners have to find a hash that is less than or equal to the target hash, and the first miner to find it gets the reward in bitcoins. Here we consider the Ethereum blockchain. To our surprise, for the data posted at http://xblock.pro/xblock-eth.html [17], simple K-means clustering of the mining rewards by the crypto address of the users for each successive block number, a linear regression model for predicting the next block reward, and a polynomial regression model for predicting the next block reward for the miner have not yet been analyzed in previous works. We have also taken a data set of ether market prices from the same URL and predicted the ether price with the LSTM algorithm and the Prophet API.

Problem Definition
The clustering of miners' crypto addresses by the rewards of each unique miner address, and the prediction of miners' rewards in the Ethereum blockchain, have not been addressed in previous works; we implement them with the existing K-means clustering algorithm applied to miners' crypto addresses and rewards. Linear regression and polynomial regression are used for the prediction of miners' rewards. The LSTM algorithm and the Prophet API are used for the prediction of the price of the Ether cryptocurrency in the Ether market. The Ethereum blockchain, which has Proof of Stake consensus mining with the Ether currency, is first analyzed for mining rewards using the dataset from the website [17], which has the attributes miner crypto address, number of transactions, and rewards. K-means clustering of the miners' crypto addresses with rewards is implemented for the optimal number of clusters, n = 3, in Section 4. The centroids of the clusters are found, the elbow method for K-means clustering is applied, and the scatter plot of the clusters is shown together with the linear regression line fitted to the scattered values. Polynomial regression is implemented on the same data to predict the next block reward for the miner crypto address and values. The LSTM method and the Facebook Prophet API are used for the prediction of the price of Ether. These methods are plotted and analyzed with statistical measures.

Results and discussion
We implemented K-means clustering in Python on three attributes of the dataset [17]: timestamp, miner crypto address, and block rewards. We then performed polynomial regression and compared it with ordinary linear regression. The Ordinary Least Squares (OLS) regression model fit is shown in Fig. 3. The data is scattered, and the accuracy of the model is reported with its statistics. The timestamp, the crypto address of the miner, and the reward for successfully mining are shown in the head of the data frame (its first five rows). The crypto address is of object data type, the timestamp and VALUES attributes are integers, and the reward is a float. The data set is taken from the block dataset available at http://xblock.pro/xblock-eth.html [17].
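Because the miner address is a string (object) column, it needs a numeric code before it can enter a clustering or regression model. The following is a minimal sketch of such an encoding, together with a describe-style summary of the reward column. The sample rows, the helper names `encode_addresses` and `summarize`, and the use of plain NumPy instead of a data frame library are assumptions made for illustration; they are not taken from the dataset [17] or from the authors' code.

```python
import numpy as np

# Hypothetical sample rows: (timestamp, miner address, reward in ETH).
# These values are invented for illustration, not read from the dataset [17].
rows = [
    (1438269988, "0xaaa1", 5.00),
    (1438270017, "0xbbb2", 5.03),
    (1438270048, "0xaaa1", 5.00),
    (1438270077, "0xccc3", 5.16),
    (1438270083, "0xbbb2", 5.00),
]


def encode_addresses(addresses):
    """Map each distinct address to an integer, in order of first appearance."""
    mapping = {}
    codes = []
    for addr in addresses:
        if addr not in mapping:
            mapping[addr] = len(mapping)
        codes.append(mapping[addr])
    return np.array(codes), mapping


def summarize(values):
    """Count, mean, sample standard deviation, min, quartiles and max of a column."""
    v = np.asarray(values, dtype=float)
    # np.percentile interpolates linearly between neighbouring sorted values.
    q25, q50, q75 = np.percentile(v, [25, 50, 75])
    return {
        "count": v.size, "mean": v.mean(), "std": v.std(ddof=1),
        "min": v.min(), "25%": q25, "50%": q50, "75%": q75, "max": v.max(),
    }


codes, mapping = encode_addresses([r[1] for r in rows])
rewards = [r[2] for r in rows]
stats = summarize(rewards)
```

Integer codes assigned this way carry no natural order, so any distance or slope computed on them reflects the order in which addresses first appear rather than a property of the miners.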
The summary of the data frame is given in Fig 2, with the distribution of the data frame columns: count (number of non-null observations), mean, standard deviation, min, max, and the 25% (lower), 50% (median), and 75% (upper) percentiles. Percentiles are calculated per column by linear interpolation, i + (j - i) * fraction, where i and j are the neighbouring ordered values between which the percentile falls. We replaced each unique miner crypto address with a unique integer value, so that clustering on the integer code per miner address can be implemented seamlessly. The linear regression results are shown in Fig 3, with the dependent variable "reward", the R-squared value, the skew (the measure of the symmetry of the data about the mean), the F-statistic, t-statistics, p-values, the Durbin-Watson measure of autocorrelation, and the Jarque-Bera test of normality, which is based on skewness and kurtosis.
We then clustered the data with K-means into three clusters to find miners with similar rewards, which is used for analyzing miners with a similar hash rate of computation. The three clusters are drawn in red, green, and cyan and are represented in Fig 6. The values of the three clusters are shown in Fig 7. The number of clusters is user-defined. The centroid of each cluster is determined, the distance of each observation (values, rewards) from each centroid is calculated, and the observations are grouped into the appropriate clusters. The elbow method for K-means is shown in Fig 8. As the number of clusters increases, the sum of squared distances decreases in the elbow method.
The result of the polynomial regression is shown in Fig 9. We noticed skewness in the data and therefore applied a skew transformation, as shown in Fig 10, Fig 11, and Fig 12. The miner's address was skewed more towards the left and the rewards were skewed towards the right. After removing the skewness of the data by normalization, we implemented polynomial regression with a degree of 10.
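The clustering and elbow steps described in this section can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the authors' implementation: the data are synthetic (three well-separated groups standing in for miner code and reward), the deterministic farthest-first initialization is a choice made here for reproducibility, and the helper names `farthest_first`, `kmeans`, and `silhouette` are hypothetical.

```python
import numpy as np


def farthest_first(X, k):
    """Deterministic initialization (an assumption of this sketch): start at X[0],
    then repeatedly add the point farthest from all centroids chosen so far."""
    idx = [0]
    d2 = ((X - X[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(d2.argmax())
        idx.append(nxt)
        d2 = np.minimum(d2, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[idx]


def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm; returns (centroids, labels, sum of squared distances)."""
    centroids = farthest_first(X, k)
    for _ in range(n_iter):
        # Squared distance of every point to every centroid, shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = float(d2[np.arange(len(X)), labels].sum())
    return centroids, labels, inertia


def silhouette(X, labels):
    """Mean silhouette coefficient; requires at least two clusters."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = set(labels.tolist())
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # D[i, i] is 0, so self is excluded
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


# Synthetic stand-in for (miner code, reward): three well-separated groups.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])
true_groups = np.repeat([0, 1, 2], 30)

# Elbow method: sum of squared distances for k = 1..5.
inertias = {k: kmeans(X, k)[2] for k in range(1, 6)}
centroids, labels, _ = kmeans(X, 3)
score = silhouette(X, labels)
```

On data like this the sum of squared distances drops sharply up to k = 3 and flattens afterwards, which is the "elbow" that suggests three clusters; the silhouette coefficient gives a complementary check on how well separated the resulting clusters are.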
The root mean square error for polynomial regression is 1.3368300755199987e+152. The R2 score, which measures how close the data are to the fitted regression line, is 0.005977411651757691. The polynomial regression with a degree of 10 is shown in Fig 13. The polynomial regression fits a curvilinear line of best fit, giving the best approximation of the relationship between the independent variable and the dependent variable.
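To make the two reported measures concrete, the sketch below fits polynomials of several degrees and computes the root mean square error and the R2 score on held-out points. It is an illustration under stated assumptions: the data are synthetic (a noisy quadratic), the even/odd train-test split and the helper names `rmse` and `r2_score` are choices made here, and `numpy.polynomial.Polynomial.fit` is used because it rescales the input domain internally, which keeps high-degree fits numerically better conditioned than a fit on raw, unscaled inputs.

```python
import numpy as np
from numpy.polynomial import Polynomial


def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Synthetic data (assumption): a quadratic trend plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = 0.5 * x**2 - x + 2.0 + rng.normal(scale=1.0, size=x.size)

# Deterministic split: even-indexed points for training, odd-indexed for testing.
x_train, y_train = x[::2], y[::2]
x_test, y_test = x[1::2], y[1::2]

results = {}
for degree in (1, 2, 10):
    # domain=[0, 10] maps the inputs to [-1, 1] before the least-squares fit.
    model = Polynomial.fit(x_train, y_train, deg=degree, domain=[0.0, 10.0])
    pred = model(x_test)
    results[degree] = (rmse(y_test, pred), r2_score(y_test, pred))

for degree, (err, r2) in results.items():
    print(f"degree {degree:2d}: RMSE = {err:.3f}, R2 = {r2:.3f}")
```

Evaluating on held-out points matters here because the in-sample error of a least-squares polynomial can only shrink as the degree grows, so a low training error alone does not show that a degree-10 model predicts the next value well.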

Ether price prediction with LSTM
The data set acquired from http://xblock.pro/mt/ has the following attributes: close, the close price in the period; date, the timestamp at the beginning of the period; high, the highest price in the period; low, the lowest price in the period; open, the open price in the period; quote Volume, the quote volume in the period; base volume, the base volume in the period; and weighted Volume, the average price for the base volume and quote volume. The plot of the original values is shown in Fig 14, with time in days on the x-axis and the price of Ether on the y-axis, and the LSTM prediction of future values from the observed values is shown in Fig 15. The original data set, the training data set, and the testing data set are shown together in one graph. The LSTM model predicted lower values than the original values of the ether cryptocurrency, with a mean square error of 36.45%.
The future Ether market prices predicted by the Facebook Prophet API are shown in Fig 16. The column 'ds' holds the dates for which the prediction is made. The predicted value is in the column 'yhat', and 'yhat_lower' and 'yhat_upper' bound the uncertainty interval. The x-axis shows the 'ds' column with years as values, and the y-axis shows the ether price. The prediction of the ether prices and the components of the forecast (trend, weekly seasonality, and yearly seasonality) are shown in Fig. 17 and Fig. 18 respectively. A future data frame covering 365 days is used, and the prediction is made on a data frame holding the date and closing values. Thus, forecasting of the components and of future ether values with the Prophet API is implemented in Python. A mean absolute error of 1326.71 is obtained with the Prophet API. The prediction of the Ether price is thus treated as a time series problem and handled with both LSTM and the Prophet API.
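Before an LSTM can be trained on a price series such as the close prices described in this section, the series is usually cut into fixed-length windows, each paired with the value that follows it. The sketch below shows that preparation step only, not the network itself. The price values, the look-back of 3 steps, the 80/20 chronological split, and the helper name `make_windows` are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def make_windows(series, lookback):
    """Turn a 1-D series into inputs of shape (samples, lookback, 1)
    and next-step targets of shape (samples,)."""
    s = np.asarray(series, dtype=float)
    X = np.stack([s[i:i + lookback] for i in range(len(s) - lookback)])
    y = s[lookback:]
    return X[..., None], y


# Hypothetical close prices, one per period (invented values).
closes = np.array([10, 12, 11, 13, 15, 14, 16, 18, 17, 19], dtype=float)
lookback = 3

X, y = make_windows(closes, lookback)

# Chronological split: the earliest 80% of windows train, the rest test.
n_train = int(0.8 * len(X))

# Min-max scaling fitted only on values the training windows can see,
# so no information from the test period leaks into training.
train_values = closes[: n_train + lookback]
lo, hi = train_values.min(), train_values.max()


def scale(a):
    return (a - lo) / (hi - lo)


X_train, y_train = scale(X[:n_train]), scale(y[:n_train])
X_test, y_test = scale(X[n_train:]), scale(y[n_train:])

# Persistence baseline: predict that the next price equals the last one seen.
naive = X[:, -1, 0]
baseline_mae = float(np.mean(np.abs(y - naive)))
```

The (samples, timesteps, features) layout is the input shape that recurrent layers in common deep learning libraries expect. The persistence baseline gives a reference error: a trained model is only informative if it beats the rule "tomorrow equals today" on the test windows.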

Conclusion and Future work
Thus, we have implemented K-means clustering to group the unique addresses of the miners based on their rewards. Linear regression and polynomial regression are implemented to predict the mining reward for the next block, given the next sequential block number and time period. An LSTM model to predict the price of ether is also implemented. Future work includes the implementation of pattern mining algorithms and the study of centralization versus decentralization traits in Ethereum mining pools with deep learning models.