Study and Analysis of Stock Market Prediction Techniques

The stock market is a complex and demanding system in which people can make large gains or lose their entire savings. Accurate stock market prediction yields more profit for investors. Stock market data is generated in very large volumes and changes every second, so decision making in the stock market is a challenging and strenuous task. Developing efficient prediction models is difficult because of the complexity of financial stock market data, yet such models must be highly accurate. This study compares existing models for stock market prediction. Machine learning methods, namely Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN) and the hybrid Convolutional Neural Network - Long Short-Term Memory (CNN-LSTM), are used for the comparison. The models are evaluated using a conventional measure, the mean absolute error (MAE). The low MAE values obtained indicate that the models are effective in predicting stock prices.


Introduction
Stock market forecasts are crucial because they are used by many business people and by the general public. People can either make money or lose their entire life savings in the stock market. Building precise models is difficult because prices depend on many factors, including news, social media information, fundamentals, the company's output, government bonds and the country's economy. A prediction model that considers only one factor may not be accurate. Several theories about the stock market have been formulated over the years. These theories try to explain whether the market can be beaten, or try to explain the nature of the stock market. The market price of a stock integrates all the information about that stock in a particular time frame. The changing trend of stock prices has long been recognized as a very difficult problem in the financial field [1]. Stock prices are affected by diverse internal and external factors, including the domestic and foreign economic environment, the international situation, industry prospects, the financial data of listed companies, and stock market operation [2,3]. Conventional analysis is based on finance and economics and uses fundamental and technical analysis methods. First, fundamental analysis focuses on the intrinsic value of a stock and qualitatively analyses the external factors that affect it, such as interest rates, exchange rates, inflation, industrial policy, the finances of listed companies and international relations. Second, technical analysis focuses mainly on the direction of the stock price, trading volume and the psychological expectations of investors, using tools such as K-line charts to analyse the movement of individual stocks or of the stock index. These remain the most widely used methods for many companies and investors [4,5].
The accuracy of traditional fundamental analysis is hard to establish because the prediction results depend heavily on the professional skill of the analysts and the influential factors act over long-term cycles. Stock data have the traits of a random walk in a financial time series. The accuracy of using a time series model alone is questionable because of the unknown, high-noise characteristics of financial time series [6]. There are clear limitations in predicting stock price trends with either a linear time series forecasting model or a neural network model on its own. Combining the strengths of different methods into a hybrid approach has therefore become a trend in deep learning for time series [7]. To make better use of the time series, the characteristics of the data should be investigated thoroughly, which can improve the accuracy of stock price forecasting. This paper compares stock price forecasting approaches based on CNN, LSTM and CNN-LSTM.

Literature Survey
The financial market is noisy, non-parametric and dynamic, and there are two main types of forecasting technique: technical analysis and machine learning [8]. Conventional econometric techniques, or equations with parameters, are not suited to studying complicated, high-dimensional and noisy financial data. Aparna et al. [9] considered various parameters across several datasets and observed that the boosted decision tree performed better than SVM and logistic regression. Vijh et al. [10] used data for five companies from 2009 to 2019 with new variables intended to improve prediction: High-Low, Open-Close, the 7-day, 14-day and 21-day average stock price, and the standard deviation over the last 7 days. Their comparative analysis based on RMSE, MAPE and MBE shows that ANN gives better stock prediction than RF. Hyeong et al. [11] validated model performance on different time periods with several metrics, including MSE, MAE and RMSE, and observed that the ARIMA-LSTM hybrid performs far better than the other financial models. Pushpendu et al. [12] applied Random Forest and LSTM to predict the directional movement of stock prices and observed that LSTM outperforms Random Forest. Mehtabhorn et al. [13] compared the various machine learning techniques and algorithms used in finance and stock market prediction.

Wenjie Lu et al. [14] used a CNN-LSTM model to predict the next day's closing price of a stock. Their experimental results show that CNN-LSTM has the highest accuracy and best performance compared with CNN, RNN, LSTM, MLP and CNN-RNN. Nusrat Rouf et al. [15] compared ANN, SVM, NB and DNN. SVM was the most popular technique used for stock market prediction (SMP), while ANN and DNN were observed to be more accurate and to provide faster predictions. Jingyi Shen et al. [16] used a comprehensive deep learning system, with prediction carried out on Chinese stock market datasets using LSTM models. The LSTM model achieved high prediction accuracy and outperformed the other major models. D. Wei et al. [17] performed prediction with various LSTM models and observed that Vanilla LSTM, Stacked LSTM and Bidirectional LSTM (Bi-LSTM) are the commonly used variants. Bi-LSTM had higher accuracy and lower error than the other models.

Sheng Chen and Hongxiang He [18] used a CNN model for stock prediction, applying the conv1d function to process 1D data in the convolution layer. They observed that the model is efficient when the source data is sequential and can be used to make predictions. Wu et al. [19] performed prediction with leading indicators using a hybrid CNN-LSTM model and observed that it achieved greater accuracy than the CNN and LSTM models. Xuan Ji et al. [20] calculated MAE, RMSE and R-square values to evaluate the performance of various prediction models and observed that the CNN-LSTM model performs well when applied to various stock prices. Vanukuru, Kranthi et al. [21] used an SVM model to predict stock index movements and observed that the model generates higher profit than the selected benchmarks. A M Pranav et al. [22] performed sentiment analysis to forecast stock price variations and observed that machine learning models performed well on various datasets.

Convolutional Neural Network (CNN):
Sheng Chen [18] proposed a CNN model for stock prediction that uses the conv1d function to process 1D data in the convolution layer. CNN is a feedforward neural network that performs very well in image processing and natural language processing, and it can also be applied effectively to time series forecasting. The local perception and weight sharing of the CNN greatly reduce the number of parameters, thereby improving the efficiency of model learning. The CNN, as shown in Figure 1, consists mainly of convolution layers and pooling layers. Each convolution layer contains several convolution kernels; with * denoting the convolution operation, its computation is given in equation (1):
$$\mathit{ov} = \tanh(\mathit{vi} * \mathit{kw} + \mathit{vb}) \qquad (1)$$
where ov is the output value after convolution, tanh is the activation function, vi is the input vector, kw is the weight of the convolution kernel, and vb is the bias of the convolution kernel. The CNN model extracts feature maps with varying levels of detail from the stock data across its convolution layers. The stock data covers the market performance of each asset from its IPO (initial public offering, in which a private company offers its shares to the public in a new stock issuance) to the current date. This data is inherently temporally interdependent, as was found in the EDA phase (exploratory data analysis, the process of inspecting the dataset to find patterns and irregularities and to form hypotheses about its structure). The CNN extracts this temporal dependency as a 2D feature map, which is then passed through dense layers to generate a single continuous output: the target variable, which is the open price of the stock.

Long Short-Term Memory (LSTM):
As shown in Figure 2, the LSTM memory cell consists of three parts: a forget gate, an input gate and an output gate. LSTM is used in this analysis because it can store important information from the past and forget the rest. The input gate adds information to the cell state, the forget gate removes information that the model no longer needs, and the output gate selects the information that is emitted as output. The LSTM combines the output of the previous step with the current gates and, from the correlations it has learned, predicts the next state. LSTM is suited to extracting long-term dependencies in sequential data because it mitigates the vanishing and exploding gradient problems in the backpropagation phase. The temporal dependency of stock market data is modelled by unfolding the data in time and passing it through the LSTM, which predicts the output at the next time step. The last price, volume and date are the inputs to the model, and the open price, the target variable, is its output.
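The two building blocks described above can be sketched in a few lines of standard-library Python. This is only an illustration, not the implementation used in the study: it assumes equation (1) has the usual form ov = tanh(vi * kw + vb), which is what the symbol definitions suggest, and it uses the textbook LSTM gate equations with scalar states. The names `conv1d_tanh`, `lstm_step` and the parameter dictionary `p` are hypothetical, and all weights are made-up values.

```python
import math


def conv1d_tanh(vi, kw, vb):
    """Slide kernel kw over input vi and apply tanh, as in equation (1).

    Like most deep learning libraries, this computes a cross-correlation
    (the kernel is not flipped) over the valid positions only.
    """
    k = len(kw)
    return [
        math.tanh(sum(vi[i + j] * kw[j] for j in range(k)) + vb)
        for i in range(len(vi) - k + 1)
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell.

    p maps each gate name to a tuple (w_x, w_h, b) of assumed weights.
    Returns the new hidden state h and the new cell state c.
    """
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre("input"))        # how much new information to add
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate information for the cell
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


# Toy usage: convolve a short normalised price series, then feed the
# resulting features through the LSTM cell one step at a time.
prices = [0.1, 0.2, 0.4, 0.3, 0.5]
features = conv1d_tanh(prices, kw=[0.5, -0.5], vb=0.0)

params = {name: (0.5, 0.5, 0.0)
          for name in ("forget", "input", "output", "candidate")}
h, c = 0.0, 0.0
for x in features:
    h, c = lstm_step(x, h, c, params)
```

Plain Python is used here only to keep the sketch dependency-free; in practice a deep learning framework would supply vectorised, trainable versions of both operations.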

Convolutional Neural Network - Long Short-Term Memory Hybrid (CNN-LSTM):
Wenjie Lu et al. [14] use a CNN-LSTM model to predict the next day's closing price of a stock. The method takes the opening price, highest price, lowest price, closing price, volume, turnover, ups and downs, and change of the stock data as inputs, uses the CNN to extract features from the input data, and uses the LSTM to learn the extracted feature data and predict the closing price of the stock on the next day.
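One common way to frame next-day prediction as a supervised problem is to pair a window of consecutive daily feature rows with the closing price of the following day. The sketch below illustrates that framing under stated assumptions: the window length is not given in the text, the helper `make_windows` is hypothetical, and the toy rows use only five of the eight inputs listed above, with made-up values.

```python
def make_windows(rows, target_index, window):
    """Pair each run of `window` consecutive days with the next day's target.

    rows: per-day feature lists, oldest first.
    target_index: position of the closing price inside each row.
    """
    if window < 1 or window >= len(rows):
        raise ValueError("window must be at least 1 and shorter than the series")
    return [
        (rows[t - window:t], rows[t][target_index])
        for t in range(window, len(rows))
    ]


# Toy rows: [open, high, low, close, volume]; the values are made up.
rows = [
    [10.0, 11.0, 9.5, 10.5, 1000],
    [10.5, 11.5, 10.0, 11.0, 1200],
    [11.0, 12.0, 10.8, 11.8, 900],
    [11.8, 12.1, 11.2, 11.5, 1100],
    [11.5, 12.4, 11.4, 12.2, 1300],
]

# Each sample is (three days of features, closing price of the fourth day).
samples = make_windows(rows, target_index=3, window=3)
```

The target day is never part of its own input window, so the model only sees information available before the day it predicts.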
CNN has the characteristic of attending to the most salient features within its field of view, so it is widely used in feature engineering. LSTM has the characteristic of unrolling along the time sequence and is widely used for time series. The model structure is shown in Figure 3. The CNN captures local (spatial) dependencies in the stock data, of the kind it captures in images. The LSTM addresses the vanishing or exploding gradients associated with stock data that has long-term temporal dependencies. The predictive model tests the combination of the two: CNN layers are used as the initial layers, extracting features from the sequential stock data, and an LSTM is then cascaded after them so that long-term dependencies are preserved in the features extracted by the CNN layers. Finally, fully connected layers are added to produce a single continuous output.

The dataset is first analysed in order to remove null values in the columns or replace them with mean or median values, and to examine the relationships between the parameters so as to determine the important parameters that affect stock prices. Categorical and date-type data are handled by identifying the data type of each column. The date type is handled by splitting it into three components. This analysis produces a report on the dataset, which is passed to the models for further processing. The dataset is divided into a training set (80% of the original dataset) and a testing set (20% of the original dataset). The models are trained on the training set, and their predictions on the testing set are compared with the actual values of the testing set. Each model returns the performance metric calculated on the testing set, and the performance of all three models is then reported together.
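The preprocessing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the three date components are assumed to be year, month and day, the 80/20 split is assumed to be chronological (no shuffling), and the helper names `impute_mean`, `split_date` and `train_test_split_ordered` are hypothetical.

```python
from datetime import date
from statistics import mean


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def split_date(d):
    """Break a date into three numeric components (assumed: year, month, day)."""
    return d.year, d.month, d.day


def train_test_split_ordered(rows, train_fraction=0.8):
    """Split rows in their original order: first part trains, the rest tests."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Toy usage with made-up values.
volume = impute_mean([100.0, None, 300.0])        # [100.0, 200.0, 300.0]
components = split_date(date(2021, 3, 15))        # (2021, 3, 15)
train, test = train_test_split_ordered(list(range(10)))
```

Keeping the rows in time order means the test set contains only dates later than the training set, which avoids leaking future prices into training; whether the original study split the data this way is not stated.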

Results and Discussion
The proposed method uses several well-known datasets, including TATASTEEL, TATAMOTORS, VEDL, BHARTIAIRTEL and ITC. The companies are chosen from diverse sectors such as oil and telecommunications, among others.
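The models are evaluated using the mean absolute error (MAE):
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\mathit{pv}_i - \mathit{tv}_i\right|$$
in which tv is the true value, pv is the predicted value and n is the number of test samples. The forecast of the open price of TATASTEEL shows that the LSTM model gives better results than CNN but lags behind CNN-LSTM.
[Figures 10-15]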
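The MAE metric, with tv and pv as defined above, can be computed as in the sketch below. The function name `mean_absolute_error` and the sample numbers are illustrative only and are not results from the study.

```python
def mean_absolute_error(tv, pv):
    """Average absolute difference between true values tv and predictions pv."""
    if len(tv) != len(pv) or not tv:
        raise ValueError("tv and pv must be non-empty and of equal length")
    return sum(abs(p - t) for t, p in zip(tv, pv)) / len(tv)


# Toy usage: absolute errors are 1, 2 and 0, so the mean is 1.0.
true_open = [100.0, 102.0, 101.0]
pred_open = [101.0, 100.0, 101.0]
mae = mean_absolute_error(true_open, pred_open)
```

MAE is expressed in the same units as the price, so a lower value means the predictions are, on average, closer to the actual open price.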

Conclusion
This paper proposes CNN, LSTM and CNN-LSTM based models to forecast stock prices according to the sequential nature of stock price data. The models use dataset parameters such as close price, open price, high price, low price, previous close price, turnover and volume. These parameters are the inputs to the models for training and testing. The experimental results show that, in most cases, CNN-LSTM performs considerably better than the CNN and LSTM models.