Text sentiment classification method based on DPCNN and BiLSTM

In recent years, deep learning models have been widely used for text sentiment classification and have achieved remarkable results. The traditional TextCNN network can only extract local spatial features of sentences, whereas the improved DPCNN model captures long-distance dependencies in text by increasing the network depth. The BiLSTM model, in turn, learns the temporal information of text. This paper therefore combines the two models, which both captures the local spatial information of the text and strengthens the ability to understand and learn its semantic associations. Experimental results show that the combined model classifies better than either single model.


Introduction
In recent years, with the rapid development of information technology worldwide, people have taken part in the Internet ever more enthusiastically and actively express their views and opinions online, generating a large amount of valuable text data. Accurate sentiment analysis of this text is crucial. For example, by analyzing customer comments, businesses can identify problems with their products and make targeted improvements, and government departments can analyze public opinion to understand the real state of people's livelihoods.
Deep learning methods are widely used in NLP and have achieved good results. Kim et al. [1] proposed a convolutional neural network (CNN) that learns local spatial features at different positions in a text by using convolution kernels of different sizes. Lai et al. [2] proposed the recurrent convolutional neural network (RCNN), which first extracts the temporal features of the text and then extracts local features from that temporal information, with good results. Wang et al. [3] proposed an LSTM model that improves on the traditional RNN and overcomes its exploding and vanishing gradient problems. Zhou et al. [4] combined the respective strengths of CNN and LSTM in the C-LSTM model, and their experiments show that it outperforms either model used alone. Conneau et al. [5] proposed the VDCNN model; although greater depth improves information extraction, the resulting growth in training parameters and training time is a drawback. Wang J et al. [6] proposed the ResLCNN network for text classification, which avoids vanishing gradients in deep neural networks by building residual structures into a multilayer network. The deep pyramid convolutional network (DPCNN) proposed in [7] is deeper, yet it has a simple structure, can extract long-range information, and classifies well when words are used as the semantic unit of representation.

LSTM
A convolutional neural network (CNN) mainly captures the local spatial features of sentences, but understanding text also requires connecting each word to its context. The recurrent neural network (RNN), which can remember previous information, was proposed for this reason. It suffers, however, from the "long-distance dependence" problem: as the sequence grows longer, later states can no longer accurately remember the initial information. Hochreiter et al. [8] therefore proposed the LSTM model, which uses several special memory units to keep remembered information from being forgotten as the sequence lengthens. See Figure 1. The LSTM cell comprises a forget gate, an input gate, an output gate, and a candidate memory unit.
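
For reference, the LSTM cell is commonly written in the following textbook form. This is a sketch of the standard formulation; the symbol names are conventional and are assumed here rather than taken from the authors' notation.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{(candidate memory unit)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

Here x_t is the input at step t, h_{t-1} is the previous hidden state, c_t is the cell state, \sigma is the logistic sigmoid, \odot is the element-wise product, and W and b are the learned weights and biases of each gate.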

DPCNN
The Deep Pyramid Convolutional Network (DPCNN) [7] is a truly deep convolutional network for text classification that followed TextCNN. See Figure 2. First, filters of several sizes convolve each text region to generate a region embedding. To let the CNN capture long-distance information, the authors of [7] adopt equal-length convolution, which keeps the positions of words unchanged between the input and output sequences, so that each word is represented together with its context and carries more accurate and comprehensive semantics. Although a deeper network yields richer semantic representations, two-layer convolution is generally adopted for efficiency. In addition, each 1/2 pooling layer halves the sequence length, which doubles the span of text that the following convolutions can perceive. Finally, to avoid the vanishing-gradient problems caused by greater network depth, the authors use the residual connections of ResNet, which largely alleviate them.
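
The interplay of equal-length convolution, 1/2 pooling, and the residual shortcut can be illustrated with a minimal pure-Python sketch. It is not the implementation from [7]: it assumes a single feature channel, a fixed (untrained) kernel of width 3, and max pooling with window 3 and stride 2, and all function names are hypothetical.

```python
def same_conv1d(x, kernel):
    """Equal-length convolution: zero-pad so the output is as long as x."""
    pad = len(kernel) // 2  # assumes an odd kernel width
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(x))
    ]


def relu(x):
    return [max(0.0, v) for v in x]


def half_pool(x):
    """Max pooling with window 3 and stride 2: output length is len(x) // 2."""
    return [max(x[i:i + 3]) for i in range(0, len(x) - 1, 2)]


def dpcnn_block(x, kernel):
    """1/2 pooling, two equal-length convolutions, then a residual shortcut."""
    pooled = half_pool(x)
    out = same_conv1d(relu(pooled), kernel)
    out = same_conv1d(relu(out), kernel)
    return [p + o for p, o in zip(pooled, out)]


def pyramid_lengths(x, kernel):
    """Apply blocks until at most 2 positions remain; record each length."""
    lengths = [len(x)]
    while len(x) > 2:
        x = dpcnn_block(x, kernel)
        lengths.append(len(x))
    return lengths


sequence = [float(i % 5) for i in range(16)]  # toy 16-token input
print(pyramid_lengths(sequence, kernel=[0.25, 0.5, 0.25]))  # [16, 8, 4, 2]
```

Because the convolutions preserve length and only the pooling shortens the sequence, the per-block cost shrinks geometrically, which gives the "pyramid" shape its name.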

BiLSTM-DPCNN
First, each sentence is represented by word embedding as a sequence of randomly initialized vectors, which is fed separately to two different deep learning models, BiLSTM and DPCNN. This arrangement captures the local spatial features of the text and also strengthens the ability to understand contextual semantic associations, since the bidirectional LSTM represents both the forward and the backward semantic information of a sentence. The two sets of features are then combined and sent to a fully connected layer for dimensionality reduction and classification. See Figure 3.
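
The fusion step can be sketched as follows. The text says only that the two features are "combined", so concatenation is an assumption here, as are the feature sizes, the random weights, and every function name; the BiLSTM and DPCNN encoders themselves are replaced by placeholder vectors.

```python
import math
import random


def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(bilstm_feat, dpcnn_feat, weights, bias):
    """Concatenate both feature vectors, then project to class probabilities."""
    fused = list(bilstm_feat) + list(dpcnn_feat)  # assumed fusion: concatenation
    return softmax(linear(fused, weights, bias))


rng = random.Random(0)
bilstm_feat = [rng.uniform(-1, 1) for _ in range(4)]  # hypothetical size 4
dpcnn_feat = [rng.uniform(-1, 1) for _ in range(3)]   # hypothetical size 3
weights = [[rng.uniform(-1, 1) for _ in range(7)] for _ in range(2)]
bias = [0.0, 0.0]

probs = classify(bilstm_feat, dpcnn_feat, weights, bias)
print(probs)  # two class probabilities that sum to 1
```

The fully connected layer maps the 7-dimensional fused vector to 2 outputs, one per sentiment class, which is the dimensionality reduction the text refers to.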

Experiment
This paper uses the hotel review dataset ChnSentiCorp_htl_all (CSC for short), collected by Tan Songbo in China. It contains 7766 samples, divided into positive and negative reviews. The data are randomly split into a training set and a test set at a ratio of 9:1, giving 6989 training samples and 777 test samples. The Jieba word segmentation tool is used to segment all of the text, and punctuation marks, duplicate items, and other items that carry no meaning for sentiment classification are removed.
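
The 9:1 split can be reproduced in outline as follows. This is a sketch under assumptions: the seed and the function name are hypothetical, integers stand in for the reviews, and the authors' actual shuffling procedure is not stated.

```python
import random


def split_9_to_1(samples, seed=0):
    """Shuffle a copy of the samples and cut it at 90% (rounded down)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 9 // 10
    return shuffled[:cut], shuffled[cut:]


# Integers 0..7765 stand in for the 7766 reviews.
train, test = split_9_to_1(range(7766))
print(len(train), len(test))  # 6989 777
```

Rounding the cut point down yields exactly the 6989 and 777 samples reported above.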
To verify the effectiveness of the model, comparative experiments were conducted on four different models (see Table 1). The results show that the model presented in this paper reaches an accuracy of up to 86.74% and that, of the 5 different numbers of convolution kernels tested, it performs best in 3. Table 2 shows the value of each evaluation index at the model's best accuracy.
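
As a worked check on the accuracy figure: if it is measured on the 777-sample test set (an assumption, since the text does not say so explicitly), 86.74% corresponds to 674 correct predictions. The labels below are synthetic placeholders, not experimental data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Synthetic labels: 674 of 777 predictions agree with the reference.
y_true = [1] * 777
y_pred = [1] * 674 + [0] * 103
acc = accuracy(y_true, y_pred)
print(f"{acc:.2%}")  # 86.74%
```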

Table 1. Comparison of accuracy of different models.