Rumor detection based on graph attention network

Abstract. At present, most existing rumor detection methods focus on learning and fusing various features, but because of the complexity of language they rarely consider the relationships between parts of speech. To address this problem, this paper uses a graph attention network to learn text features and syntactic relations jointly: over the syntactic dependency tree, node attention collects text features, edge attention collects dependency-relation features, and the two attention mechanisms reinforce each other. Finally, the proposed method is evaluated on Twitter and Weibo datasets. The experimental results show that the proposed method markedly improves both the accuracy and the early detection of rumors.


Introduction
Rumor detection aims to identify whether information on a social media platform is a rumor. Rumors spread quickly, are difficult to identify, and are highly harmful, so rumor detection has broad application prospects in social media and related fields. Because rumors usually spread quickly, they may reach a large number of users before experts or authoritative organizations reach a conclusion; there is therefore an urgent need to detect rumors automatically at an early stage in order to minimize their negative impact. The earliest research on rumor detection was based on traditional machine learning [1][2], which casts rumor detection as a binary classification problem: rumor features are first extracted, a machine learning model is then built, and finally the features are fed into the model for training to realize rumor detection. The authors of [3] proposed a model based on a decision tree classifier, which clusters the query and correction phrases extracted from messages and then constructs a decision tree over their statistical features to detect rumors.
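As a minimal illustration of the feature-based pipeline described above (not the actual model of [3]), the sketch below extracts a few toy statistical features from a message and applies a one-node "decision tree". The feature names and the threshold rule are assumptions chosen only for illustration.

```python
# Illustrative sketch: rumor detection cast as binary classification over
# hand-crafted statistical features, decided by a single decision stump.
# Features and thresholds are toy assumptions, not the method of [3].

def extract_features(message: str) -> dict:
    """Toy statistical features of the kind used by feature-based detectors."""
    return {
        "num_question_marks": message.count("?"),
        "num_words": len(message.split()),
        "has_url": int("http" in message),
    }

def decision_stump(features: dict) -> int:
    """A one-node 'decision tree': messages dense in question marks
    or containing links are flagged as rumors (label 1)."""
    if features["num_question_marks"] >= 2 or features["has_url"]:
        return 1  # rumor
    return 0      # non-rumor

label = decision_stump(extract_features("Is it really true?? see http://example.com"))
```

A real system would replace the stump with a learned decision tree trained on labeled events, but the extract-then-classify structure is the same.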
Most existing rumor detection methods focus on learning and integrating various features for detection [4], but they tend to discard temporal and other information. Some scholars use recurrent neural networks for rumor detection, modeling the forwarding and comment information and the temporal characteristics of a message during propagation, but these methods do not preserve the hierarchical information of the propagation structure. Other scholars [5][6] use a bidirectional graph convolutional network to capture the propagation structure, considering both top-down rumor propagation and bottom-up rumor diffusion; however, this approach does not consider the dynamic evolution of the propagation structure.
To improve the accuracy of rumor detection, this paper proposes a graph attention network model that uses a dependency parsing toolkit to obtain the syntactic dependency tree and then uses the node attention and edge attention of the graph attention model to collect text features and syntactic dependencies, respectively. Experiments on the public benchmark Twitter and Weibo datasets show that the model greatly improves both the accuracy and the early detection of rumors.

Graph neural network summary
With the vigorous development of deep learning, GNNs have achieved great success in representing and learning from graph-structured data [7]. Generally speaking, most existing GNN models follow a neighborhood aggregation strategy; a GNN layer can be defined as shown in Formula (1):

h_i^l = AGGR( h_i^{l-1}, { h_j^{l-1} : j ∈ N_i } )    (1)

where h_i^l is the representation of node i at layer l, N_i is the local neighborhood set of node i, and AGGR is the aggregation function of the GNN, which has many possible implementations. GNNs can capture long-distance interactions between entities and perform well in text classification. Most current methods build a corpus-level document-word graph and classify documents via semi-supervised node classification. Although they have achieved success, most existing methods have computational drawbacks, and the simplicity of modeling text as a graph is greatly limited. How to improve the expressive power of such models is therefore an important problem that must be solved.
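Formula (1) can be instantiated in many ways; the sketch below uses mean aggregation over a node and its neighbors, in plain Python. The graph and feature values are toy assumptions, and mean pooling is only one possible choice of AGGR.

```python
# Sketch of one GNN layer, h_i^l = AGGR(h_i^{l-1}, {h_j^{l-1} : j in N_i}),
# here instantiated with mean aggregation over the node and its neighborhood.

def gnn_layer(h, neighbors):
    """h: {node: feature vector}; neighbors: {node: list of adjacent nodes}."""
    new_h = {}
    for i, h_i in h.items():
        msgs = [h[j] for j in neighbors[i]] + [h_i]  # neighborhood N_i plus self
        # component-wise mean of the collected messages (the AGGR step)
        new_h[i] = [sum(vals) / len(msgs) for vals in zip(*msgs)]
    return new_h

# toy graph: node 0 connected to nodes 1 and 2
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
adj = {0: [1, 2], 1: [0], 2: [0]}
out = gnn_layer(h, adj)
```

Stacking such layers lets information propagate over longer distances in the graph, which is what gives GNNs their ability to capture long-range interactions.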

Graph attention network
The input of this model is the syntactic dependency tree generated by the dependency parsing toolkit. As shown in Figure 1, its nodes are the words of the news text and its edges are the dependency relations. The tree is encoded, passed through an LSTM layer, and then learned by the graph attention network model: node attention learns the text features, edge attention learns the dependency relations, the two mechanisms reinforce each other, and finally the news is classified. The overall framework is shown in Figure 2. A plain graph neural network aggregates the representations of neighborhood nodes along the dependency paths, but this process does not take the dependency relations themselves into account and may therefore lose important dependency information. Intuitively, neighborhood nodes connected by different dependencies should have different effects. This model extends the original graph attention network with additional relation heads, which act as relation-level gates to control the information flow from neighborhood nodes. To support representation learning on the constructed dependency tree, this paper uses a GAT model that applies node attention to aggregate neighborhood node representations and edge attention to aggregate neighborhood dependency relations, iteratively updating each node representation as in Formulas (2) and (4).
h_i^{l+1} = ǁ_{k=1}^{K} σ( Σ_{j∈N_i} α_ij^{lk} W_k^l h_j^l )    (2)

where ǁ_{k=1}^{K} denotes the concatenation of the vectors produced by the K attention heads and W_k^l is a trainable weight matrix. α_ij^{lk} is the attention coefficient of node v_j with respect to node v_i, calculated as shown in Formula (3):

α_ij^{lk} = softmax_j( LeakyReLU( a^T [ W_k^l h_i^l ǁ W_k^l h_j^l ] ) )    (3)

This model further uses edge attention to refine the next-layer representation of node v_i. This process can be expressed as:

h_i^{l+1} = ǁ_{k=1}^{K} σ( Σ_{j∈N_i} β_ij^{lk} W_k^l h_j^l )    (4)

where W_k^l is a trainable weight matrix and β_ij^{lk} is the attention coefficient of the edge e_ij incident on node v_i, which is computed by Formulas (5) and (6) analogously to Formula (3), by scoring the dependency-edge representations and normalizing them with a softmax. Here a^T is a weight vector for scoring dependencies, and ǁ denotes the concatenation operation. This model can not only capture high-order word interactions but also learn the relations of the dependency tree.

Datasets
The three datasets, Weibo [8], Twitter15 [9], and Twitter16 [9], all come from real social media. The Weibo dataset contains two labels: false rumor (F) and true rumor (T). The Twitter15 and Twitter16 datasets contain four labels: non-rumor (N), false rumor (F), true rumor (T), and unverified rumor (U). The label of each event in the Weibo dataset is assigned according to the Sina community management center, which records various kinds of misinformation; each event in Twitter15 and Twitter16 is labeled according to the veracity tag of rumor-debunking website articles. The statistics of the three datasets are shown in Table 1.

Experimental environment
The syntactic dependency tree is processed by the method in [10] to form the input graph. Stochastic gradient descent is used to update the model parameters, with the Adam algorithm as the optimizer. The balance hyper-parameter of the loss function is set to 0.1. The experiments are carried out with Python 3.6 and PyTorch 1.2.0 on GPU. The learning rate is set to 0.0005 and the batch size to 128. The dataset is randomly divided into five parts, and 5-fold cross-validation is carried out to obtain more general results. Training runs for up to 200 epochs and stops early when the validation loss has not decreased for at least 10 epochs.
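The early-stopping schedule described above can be sketched as follows. The constants mirror the stated settings; the validation-loss curve is a toy assumption used only to demonstrate when stopping fires.

```python
# Sketch of the training schedule: up to 200 epochs, stopping early when the
# validation loss fails to improve for 10 consecutive epochs.
# The loss values fed in below are toy assumptions.

LEARNING_RATE = 5e-4
BATCH_SIZE = 128
MAX_EPOCHS = 200
PATIENCE = 10

def train(val_losses):
    """Return the epoch at which early stopping fires (or the last epoch run)."""
    best, stale = float("inf"), 0
    for epoch, loss in enumerate(val_losses[:MAX_EPOCHS], start=1):
        if loss < best:
            best, stale = loss, 0   # improvement: reset the patience counter
        else:
            stale += 1
            if stale >= PATIENCE:
                return epoch        # loss stagnated for PATIENCE epochs
    return min(len(val_losses), MAX_EPOCHS)

# e.g. a loss curve that improves for 3 epochs and then plateaus
stopped_at = train([1.0, 0.8, 0.7] + [0.7] * 30)
```

In a full training loop the same counter logic would wrap the per-epoch optimizer steps; only the bookkeeping is shown here.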

Experimental result
We compare several traditional machine learning and deep learning models with the model proposed in this paper; the experimental results of the different methods on the three datasets are shown in Table 2. The performance of the deep learning methods is significantly better than that of the methods based on hand-crafted features, because deep learning can learn high-level representations of rumors and thereby capture effective features. This shows the importance and necessity of deep learning for rumor detection. The proposed method outperforms the GRU-1 method on all metrics, which demonstrates the effectiveness of combining syntactic information for rumor detection.

Early rumor detection
Early rumor detection aims to detect rumors at the early stage of propagation and is another important index for evaluating a method. We first construct an early detection task: a detection deadline is set, and only the posts published before the deadline are used to evaluate the accuracy of the proposed method and of the mainstream baselines. The early performance of all methods fluctuates to some extent, because as posts spread, more semantic and structural information becomes available but noise also increases. The proposed model achieves good results within about one hour of a post being published on the Twitter datasets, which shows that its early rumor detection performance is superior.
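The deadline-based evaluation protocol described above can be sketched as a simple filter over a post stream. The post contents and timestamps are toy assumptions; the one-hour deadline mirrors the setting discussed in the text.

```python
from datetime import datetime, timedelta

# Sketch of the early-detection protocol: keep only the posts published
# within a fixed deadline after the source post, then evaluate on those.
# Timestamps and texts below are toy assumptions.

def posts_before_deadline(posts, source_time, deadline):
    """posts: list of (timestamp, text); keep those within `deadline` of source."""
    cutoff = source_time + deadline
    return [text for ts, text in posts if ts <= cutoff]

t0 = datetime(2022, 1, 1, 12, 0)
stream = [
    (t0, "source claim"),
    (t0 + timedelta(minutes=30), "early reply"),
    (t0 + timedelta(hours=2), "late reply"),
]
kept = posts_before_deadline(stream, t0, timedelta(hours=1))
```

Sweeping the deadline from minutes to hours and re-evaluating accuracy on each filtered subset produces the early-detection curves compared in this section.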

Conclusion
To improve the accuracy and timeliness of rumor detection, the author proposes a rumor detection method based on a graph attention network model, which uses two attention mechanisms to aggregate text features and syntactic dependencies respectively and lets them reinforce each other. The experimental results show that this method clearly improves the accuracy of rumor detection and performs very well in early rumor detection. In future work, we will pay more attention to the external factors that affect rumor detection; at the same time, applying graph neural networks to rumor detection remains a promising direction.