Analysis of hyperparameters in Sentiment Analysis of Movie Reviews using Bi-LSTM

Movie reviews are an important factor in determining a film's success: rather than depending solely on the number of views as a measure of success, reviews provide additional insight into how the film was received. Existing systems use LSTM for sentiment analysis, but no study is available on how various hyperparameters affect the performance of the model. Bi-LSTM together with dropout layers provides good accuracy in sentiment analysis, and the suggested method outperforms CNN and the Natural Language Toolkit in terms of accuracy. The proposed model is tested using different hyperparameters, including the dropout rate, the number of Bi-LSTM layers, and the number of Bi-LSTM nodes. For optimal accuracy, 64 LSTM nodes, 2 Bi-directional layers, and a 0.2 dropout rate should be used. The effect of different text vectorization algorithms and activation functions was also studied; the combination of Tf-idf text vectorization and the ReLU activation function yields the best results.


Introduction
Sentiment analysis on movie reviews can help readers make more informed decisions by providing a more comprehensive description of the film [1]. It is used to gauge the performance of a movie based on user reviews. Any sequence of words that is usually associated with a specific opinion is referred to as a sentiment. Sentiment analysis focuses not only on polarity but also on feelings, emotions, and intentions, which makes it extremely beneficial in marketing, customer feedback, and customer service. In this scenario, machine learning is used to learn and predict whether a movie review is positive or negative; the analysis is the act of looking at data and generating judgments. The two main strategies for sentiment analysis are rule-based and machine learning-based. A machine learning model identifies sentiment based on words and their order, whereas a rule-based technique employs sentiment scores. News and social media datasets can also be used for sentiment analysis.
Long Short-Term Memory (LSTM) networks are a type of recurrent neural network that can learn order dependence in sequence prediction problems [8]. An LSTM preserves historical information by maintaining a memory unit, which substantially reduces the long-distance dependency problem of recurrent neural networks. The LSTM cell is made up of three sections, each of which serves a distinct purpose. The first determines whether the previous timestamp's information should be remembered or is irrelevant and can be ignored. In the second, the cell attempts to learn new information from the input. In the third, the cell passes updated information from the current timestamp to the next timestamp. These three components of an LSTM cell are called gates [8]: the first is the forget gate, the second is the input gate, and the last is the output gate. A plain LSTM ignores future information. BiLSTM [3] adds another LSTM layer that reverses the flow of information: in the additional layer, the input sequence flows backward. The outputs of the two LSTM layers are then combined in one of several ways, including averaging, summation, multiplication, and concatenation. By merging LSTM layers from both directions, a BiLSTM can provide a more relevant output, because every element of the input sequence then carries information from both the past and the future [3].
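The gate mechanism described above is commonly written as follows (the standard LSTM formulation, with sigma the sigmoid function, a circled dot denoting element-wise multiplication, and W and b the learned weights and biases):

```latex
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) && \text{(input gate)} \\
\tilde{c}_t &= \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) && \text{(candidate state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
```

The forget gate scales the previous cell state, the input gate scales the candidate information being written, and the output gate controls how much of the cell state is exposed as the hidden state.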
Section II of the paper gives a brief description of the literature survey done on existing systems and a comparative analysis of the survey. The problem definition and the need for such a system are covered in Section III. Section IV discusses the proposed system and its architecture. The results of the system are covered in Section V. Section VI draws the conclusion of the paper.

Related Work
To train machines to perform sentiment analysis, a variety of techniques and complicated algorithms are used. Each has its benefits and drawbacks; however, when used in tandem, they can provide extraordinary outcomes. In recent years, there has been a lot of research on sentiment analysis of text based on movie reviews, and several methodologies have been used. An ML-based model is proposed by Saeed Mian Qaisar [1], in which the imported dataset is first partitioned, then vectorized, and then submitted to an LSTM model. Three layers are used: a 50-node first layer, a 101-node second LSTM layer, and a final output layer. This paper uses a 50k movie review dataset from IMDb for its research. To classify IMDb movie reviews, Aswathi Sajeevan and Lakshmi K S [2] utilize two separate deep learning models: the first is a hybrid LSTM-CNN, while the second is a hybrid CNN-LSTM. This study uses the IMDb movie reviews dataset, which includes 1000 positive and 1000 negative reviews. Guixian Xu, Yueting Meng, Xiaoyu Qiu, Ziheng Yu, and Xu Wu [3] use the TensorFlow library, together with Dropout and TF-IDF, to develop a BiLSTM model. Comment text vectors are created from the textual data, and ReLU is the activation function employed. The paper is based on a crawl of 15000 hotel comment texts from the Ctrip website.
Anwar Ur Rehman, Ahmad Kamran Malik, Basit Raza, and Waqar Ali offer a model combining recurrent and convolutional neural networks in their paper [4]. It starts with a corpus and then preprocesses it; sentiment analysis is then performed using a word embedding layer in conjunction with CNN and LSTM models, and a classification layer is applied at the end. The IMDb movie reviews dataset and the Amazon movie reviews dataset were used. The Naïve Bayes algorithm, which is a supervised learning approach as well as a statistical method for classification, is used in the study by G. P. Saradhi Varma, A. Govardhan, and I. Hemalatha [5]. It also employs the maximum entropy algorithm, a machine learning technique for classification and prediction that can be used in a variety of situations.
The text classification approach is utilized in the study by Madhav Singh Solanki [6]; this is a supervised learning approach to recognise a specific type of writing, such as a blog, book, web page, news item, or tweet. Another rule-based model is used that employs a lexicon to recognise certain terms in the text. Rule-based techniques typically define a set of rules in a scripting language that denotes subjectivity, polarity, or a point of view. Machine learning techniques, POS tagging, dependency parsing, and other techniques are utilized in the article by Amlan Chakrabarti [7] to identify the aspects and the user opinions linked to each aspect. The input layer, which consists of word embedding features for each word in the phrase, and two convolution layers make up a seven-layer deep CNN architecture.

Proposed System
Gauging the performance of a movie based solely on its number of views can be misleading; understanding its performance based on user comments is more realistic. Therefore, a system is proposed that performs sentiment analysis on user comments and provides users with a rating of the movie based on those comments. The problem is best framed as single-label classification, which generates a numerical measure of positivity and negativity. The comment given by the user will be mapped to a scale of 1 to 5, 1 being the most negative and 5 the most positive. Real-life scenarios like emojis, slang language, and spelling mistakes will be taken into consideration while doing the classification. The use of a Bi-LSTM model to solve sentiment analysis is proposed.
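The handling of emojis, slang, and spelling mistakes can be sketched as a normalization pass before classification. The mappings below are illustrative assumptions for this sketch; the actual lexicon used by the system is not specified here:

```python
import re

# Illustrative emoji/slang mappings (assumed for this sketch only;
# not the lexicon used by the proposed system).
EMOJI_MAP = {":)": "good", ":(": "bad"}
SLANG_MAP = {"gr8": "great", "luv": "love"}

def sanitize(comment):
    """Normalize a raw user comment before vectorization."""
    # Replace known emojis with sentiment-bearing words.
    for emoji, word in EMOJI_MAP.items():
        comment = comment.replace(emoji, " " + word + " ")
    tokens = comment.lower().split()
    # Expand known slang terms.
    tokens = [SLANG_MAP.get(t, t) for t in tokens]
    # Collapse runs of 3+ repeated letters ("sooooo" -> "soo") as a
    # crude heuristic for exaggerated spellings.
    tokens = [re.sub(r"(.)\1{2,}", r"\1\1", t) for t in tokens]
    return " ".join(tokens)

clean = sanitize("Gr8 movie :)")  # normalized comment text
```

The normalized text is then fed to the vectorization stage, so that "Gr8" and "great" map to the same token.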

A. System Architecture
The proposed system architecture has five phases, beginning with text vectorization, then model building and evaluation, followed by input sanitization. Based on the sanitized input, the model calculates a sentiment score.

C. Bi-LSTM
The proposed system uses Bi-LSTM. LSTM networks are designed to overcome the long-term dependency problem faced by recurrent neural networks. LSTMs can process entire sequences of data without having to handle each point in the sequence separately; instead, they preserve useful information about previous data in the sequence to help with the processing of new data points. LSTMs employ a number of 'gates', which regulate how data in a sequence enters, is stored in, and exits the network. A typical LSTM has three gates: a forget gate, an input gate, and an output gate. These gates are each their own neural network and can be thought of as filters. A Bi-LSTM is a sequence processing model made up of two LSTMs, one of which takes the input in one direction and the other in the opposite direction. Bi-LSTMs effectively increase the amount of information available to the network, providing the algorithm with more context.
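As an illustration of the gating described above, one timestep of a scalar LSTM cell can be sketched in plain Python (toy weights chosen for illustration only; a real system would use the framework's LSTM layer):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, w):
    """One timestep of a scalar LSTM cell.
    w holds one (input weight, hidden weight, bias) triple per gate."""
    # Forget gate: how much of the previous cell state to keep.
    f = sigmoid(w['f'][0] * x + w['f'][1] * h_prev + w['f'][2])
    # Input gate: how much new information to write.
    i = sigmoid(w['i'][0] * x + w['i'][1] * h_prev + w['i'][2])
    # Candidate cell state: the new information itself.
    c_tilde = math.tanh(w['c'][0] * x + w['c'][1] * h_prev + w['c'][2])
    # Cell state update: gated mix of old state and candidate.
    c = f * c_prev + i * c_tilde
    # Output gate: how much of the cell state to expose.
    o = sigmoid(w['o'][0] * x + w['o'][1] * h_prev + w['o'][2])
    h = o * math.tanh(c)
    return h, c

# Toy weights; every gate uses the same triple purely for brevity.
weights = {g: (0.5, 0.5, 0.0) for g in ('f', 'i', 'c', 'o')}
h, c = 0.0, 0.0
for x in (1.0, -1.0, 0.5):  # a tiny 3-step input sequence
    h, c = lstm_cell_step(x, h, c, weights)
```

A Bi-LSTM simply runs a second such cell over the sequence in reverse order and merges the two hidden states, for example by concatenation.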

Results and Discussion
Dataset:
• Use the imdb reviews dataset from tfds (TensorFlow Datasets), which contains 50,000 reviews.
• Convert the data into batches of size 64 for batch training.
• Perform text vectorization on the data to convert the text into vectors of numbers.
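The vectorization and batching steps above can be sketched in plain Python (a hand-rolled vectorizer for illustration; the actual system loads the data via tfds and would use a framework vectorization layer). The sequence length and sample reviews here are arbitrary choices for the sketch:

```python
def build_vocab(texts, max_tokens=20000):
    """Map each word to an integer id; id 0 is reserved for padding,
    id 1 for out-of-vocabulary words."""
    counts = {}
    for text in texts:
        for word in text.lower().split():
            counts[word] = counts.get(word, 0) + 1
    # Keep the most frequent words, leaving room for the two reserved ids.
    ordered = sorted(counts, key=counts.get, reverse=True)[:max_tokens - 2]
    return {word: idx + 2 for idx, word in enumerate(ordered)}

def vectorize(text, vocab, seq_len=8):
    """Convert a review into a fixed-length vector of word ids."""
    ids = [vocab.get(w, 1) for w in text.lower().split()]
    return (ids + [0] * seq_len)[:seq_len]  # pad or truncate to seq_len

def batches(vectors, batch_size=64):
    """Group vectorized reviews into batches for batch training."""
    for i in range(0, len(vectors), batch_size):
        yield vectors[i:i + batch_size]

reviews = ["a great movie", "a terrible movie", "great acting"]
vocab = build_vocab(reviews)
vecs = [vectorize(r, vocab) for r in reviews]
```

Each review thus becomes a fixed-length integer vector, and the vectors are grouped into batches before being fed to the embedding layer.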
There are 8 layers in the model architecture. The first layer is the text vectorization layer, followed by an embedding layer. Then there are two pairs of Bidirectional and Dropout layers. Finally, there is a dense layer as the output layer.

In Figure 4.1, the X-axis represents the dropout rate ranging from 0.1 to 0.6, and the Y-axis represents the accuracy. The accuracy becomes optimal at 0.2: beyond this point, training accuracy continues to increase but testing accuracy decreases due to overfitting, so a 0.2 dropout rate is optimal. In Figure 4.2, the X-axis represents the number of Bidirectional LSTM layers ranging from 1 to 4, and the Y-axis represents the accuracy. In this case the accuracy becomes optimal at 2 layers; on further increasing the number of layers, both training and testing accuracy decrease. In Figure 4.3, the X-axis represents the number of LSTM nodes ranging from 16 to 512, and the Y-axis represents the accuracy. The accuracy becomes optimal at 64 nodes.

As is evident from Table 4.1, using the proposed model the training and testing accuracy on both datasets are within the same range. The two datasets are of different sizes, so the accuracy varies slightly; if two datasets of similar sizes were taken, near-identical accuracy could be expected.

The activation function specifies a neuron's or node's output in response to one or more inputs, and its selection is important for the performance of the model. As shown in Table 4.3, Softmax performs slightly worse, while all other activation functions perform similarly, with ReLU having the best training and testing accuracy. Upon testing multiple sets of hyperparameters, it was found that 64 LSTM nodes, 2 Bi-directional layers, and a 0.2 dropout rate provide optimal accuracy, and that Tf-idf text vectorization together with the ReLU activation function provides the best accuracy.
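The Tf-idf weighting referred to above can be sketched in plain Python (a minimal textbook formulation; a real system would typically use a library implementation, and library variants often add smoothing terms to the idf):

```python
import math

def tf_idf(docs):
    """Compute a tf-idf weight for every (document, term) pair.
    tf = term count / document length; idf = log(N / document frequency)."""
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents each term appears.
    df = {}
    for tokens in tokenized:
        for term in set(tokens):
            df[term] = df.get(term, 0) + 1
    weights = []
    for tokens in tokenized:
        scores = {}
        for term in set(tokens):
            tf = tokens.count(term) / len(tokens)
            idf = math.log(n_docs / df[term])
            scores[term] = tf * idf
        weights.append(scores)
    return weights

docs = ["great movie great plot", "boring movie", "great acting"]
w = tf_idf(docs)
# "movie" appears in 2 of the 3 documents, so it is down-weighted
# relative to rarer, more discriminative terms such as "plot".
```

Unlike plain integer vectorization, this weighting emphasises terms that are frequent in a review but rare across the corpus, which is one plausible reason for its stronger results here.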
Using these parameters, the proposed model has a testing accuracy of 86.56%, while existing systems using CNN provide 80% accuracy and existing systems using the Natural Language Toolkit achieve an accuracy of 81%.

Conclusion
The proposed binary classification model rates comments as positive or negative using Bi-LSTM. For optimal accuracy, 64 LSTM nodes, 2 Bi-directional layers, and a 0.2 dropout rate should be used. Tf-idf text vectorization along with the ReLU activation function provides the best results. The accuracy of the proposed system is better than that of CNN and the Natural Language Toolkit. The model was tested on two different datasets and achieved comparable results.