Knowledge Reasoning Based on Neural Tensor Network

Abstract: Knowledge bases (KBs) are an important component of applications such as question answering systems, but they are always incomplete and lack many inter-entity relationships. Knowledge reasoning is an important part of knowledge base construction and aims to supplement these missing relationships. This paper explores the relationship between model complexity and reasoning accuracy for the neural tensor network, an important method of knowledge reasoning. Increasing the number of slices in the tensor network layer increases the number of parameters to be trained, and thereby the complexity of the model. The experimental results show that increasing the number of slices helps improve the reasoning accuracy of the model, while the time consumption does not grow noticeably. The accuracy of the model on WordNet and FreeBase increased by 2% and 3.2%, respectively.


Introduction
A knowledge base is intended to store a large amount of information in a structured way. The data is organized as a graph in which nodes represent entities and edges correspond to the relationships among them. Knowledge bases are a very useful resource for many natural language processing tasks, such as information retrieval and recommendation systems. However, knowledge bases, including well-known ones such as Yago [3], WordNet [4] and the Google Knowledge Graph, suffer from incompleteness and a lack of automatic reasoning capability. Hence, learning new facts from the knowledge base itself is an essential way to improve it. Much previous work (probabilistic graphical models, inductive logic programming, Markov logic networks, etc.) has focused on completing existing knowledge bases using patterns or classifiers applied to large external text corpora. However, not all knowledge that is obvious to people, which we call common sense, is expressed in text. For example, when we encounter a particular bird without any further information, we assume that it has wings and can fly, because this follows from common sense and requires no additional text. Therefore, learning new relations (triples) based on knowledge bases has become increasingly popular. Mukherjee et al. (2013) [5] used a matrix tri-factorization approach to reason about new facts in knowledge bases. Socher et al. (2013) [1] introduced a neural tensor network for extracting common sense, which is the basis of this paper. Guoliang Ji et al. (2014) [2] applied a neural tensor network to reasoning over relations in Chinese knowledge bases.
In this paper, I explore the relationship between the number of slices and the reasoning accuracy of the model through specific experiments. NTN represents the entities in the knowledge base as vectors, which can capture facts about each entity and their certainty; this completes the initialization of the entity vectors. To share statistical strength among entities that contain similar substrings, each entity is represented as the average of its word vectors. Each relation corresponds to its own group of neural tensor network parameters. The entities and relations interact through the tensor in the neural tensor layer of the network.
The main contribution of this paper is to explore the relationship between model complexity and reasoning accuracy through experiments, and to provide a reference for choosing the model complexity when using this model for reasoning. The paper is organized as follows. Section 2 reviews related work and Section 3 introduces the Neural Tensor Network model. Section 4 analyzes the model complexity. Section 5 reports the results of the experiments. In Section 6, I summarize my contribution and consider directions for further work.

Related Work
This section covers two areas that underlie neural tensor networks and natural language processing: semantic vector spaces and deep learning.

Semantic Vector Space
Mapping words to a low-dimensional semantic vector space is an important technique for natural language processing tasks. It is also an important basis for other tasks such as POS tagging and named entity recognition. Word vector technology is now quite mature. Neural language models (Bengio et al., 2003; Collobert and Weston, 2008) have been shown to be very powerful at language modeling, a task in which models are asked to accurately predict the next word given previously seen words [7]. From the 2003 language model through the 2008 model to word2vec and a series of later word vector training techniques, this line of work has made word vectors capture the implicit semantics of words quite well.

Deep Learning
Deep learning is already quite mature in vision and speech, but it is relatively new for text. Its major achievements in other areas have also drawn more attention to its application in natural language processing. Nickel [6] applied a tensor factorization method to multi-relational learning, in which a knowledge base is regarded as a three-dimensional tensor. Bowman [7] introduced a recursive neural tensor network model on a new corpus of constructed examples of logical reasoning in short sentences. Bordes et al. [13] deal with the issue of weak interaction between entity vectors through multiple matrix products followed by Hadamard products.

Neural Tensor Network
This section introduces the neural tensor network (NTN) model. NTN reasons over entries that already exist in the knowledge base by learning a vector representation for each entity. As shown in Fig. 1, each relation triple is described as (e1, R, e2), and the two entities are given as input to that relation's model. The model returns a high score if the two entities are in that relationship and a low one otherwise. This allows any fact, whether implicit or explicitly mentioned in the database, to be answered with a certainty score.
Next, I introduce the model from two aspects: the model structure and the training objective.

Model Structure
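This section describes the structure of the NTN model. First, we define the parameters of each relation's scoring function, which are indexed by R. Let $e_1, e_2 \in \mathbb{R}^{d}$ be the vector representations of any two entities in the database. The Neural Tensor Network (NTN) then computes the confidence of a given triple $(e_1, R, e_2)$ through a bilinear tensor layer that relates the two entity vectors across multiple dimensions. Following [1], the scoring function is

\[ g(e_1, R, e_2) = U_R^{T} f\left( e_1^{T} W_R^{[1:k]} e_2 + V_R \begin{bmatrix} e_1 \\ e_2 \end{bmatrix} + b_R \right) \quad (1) \]

where $f = \tanh$ is applied element-wise, $W_R^{[1:k]} \in \mathbb{R}^{d \times d \times k}$ is a tensor with $k$ slices, $V_R \in \mathbb{R}^{k \times 2d}$, $b_R \in \mathbb{R}^{k}$ and $U_R \in \mathbb{R}^{k}$.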
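The bilinear tensor layer can be illustrated with a small NumPy sketch. It assumes the scoring function of Socher et al. [1], that is, a tanh nonlinearity applied to the sum of the bilinear term, a linear term on the concatenated entity vectors and a bias, followed by a linear read-out. The function name `ntn_score`, the array layout (slices along the first axis) and the toy dimensions are assumptions made for illustration, not the implementation used in this paper.

```python
import numpy as np

def ntn_score(e1, e2, W, V, b, u):
    """Score one triple (e1, R, e2) with the parameters of relation R.

    Assumed shapes (d = entity dimension, k = number of slices):
      e1, e2: (d,)   W: (k, d, d)   V: (k, 2d)   b: (k,)   u: (k,)
    """
    # Bilinear term: one scalar e1^T W[i] e2 for each of the k slices.
    bilinear = np.einsum("i,kij,j->k", e1, W, e2)
    # Standard linear layer on the concatenated entity vectors.
    linear = V @ np.concatenate([e1, e2])
    # Element-wise tanh, then a linear read-out to a scalar confidence.
    return float(u @ np.tanh(bilinear + linear + b))

# Toy example with hypothetical sizes (the paper uses d = 200).
rng = np.random.default_rng(0)
d, k = 4, 3
e1, e2 = rng.normal(size=d), rng.normal(size=d)
W = rng.normal(size=(k, d, d))
V = rng.normal(size=(k, 2 * d))
b = np.zeros(k)
u = rng.normal(size=k)
score = ntn_score(e1, e2, W, V, b, u)
print(score)
```

Each additional slice adds one more d x d matrix to `W`, which is why the number of slices dominates the parameter count of the layer.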

Training Objective
In the data set, samples are represented as triples. Each triple $(e_1, R, e_2)$, where $e_i$ ($i = 1, 2$) and $R$ denote the entities and the relation respectively, corresponds to a tensor network whose inputs are $e_i$ ($i = 1, 2$). The main idea is that each triple in the training set, regarded as a positive sample and written $T^{R} = (e_1^{R}, R, e_2^{R})$, should receive a higher score than a triple in which one of the entities is replaced with a random entity. These corrupted triples are regarded as negative samples and written $T_c^{R} = (e_1^{R}, R, e_c^{R})$. The objective that we minimize consists of a hinge loss followed by a regularization term to prevent over-fitting. We denote the set of all relations' NTN parameters by $\Omega = \{U, V, W, b, E\}$, where $U, V, W, b$ are the model parameters and $E$ is the word vectors.
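A hinge-loss objective of this kind can be sketched as follows. The sketch assumes the form used by Socher et al. [1]: a margin of 1 between the score of each positive triple and the scores of its corrupted versions, plus an L2 penalty on the parameters. The margin, the regularization weight `lam`, and the helper names `hinge_objective` and `corrupt` are assumptions for illustration; the source does not state these values.

```python
import numpy as np

def hinge_objective(pos_scores, neg_scores, params, lam=1e-4, margin=1.0):
    """Hinge ranking loss plus L2 regularization.

    pos_scores: (N,) scores of the true triples.
    neg_scores: (N, C) scores of C corrupted versions of each true triple.
    params: iterable of parameter arrays included in the L2 penalty.
    lam, margin: hypothetical defaults, not taken from the paper.
    """
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)
    # A corrupted triple contributes whenever it scores within `margin`
    # of (or above) its positive triple.
    ranking = np.maximum(0.0, margin - pos + neg).sum()
    penalty = lam * sum(float(np.sum(np.square(p))) for p in params)
    return ranking + penalty

def corrupt(triple, num_entities, rng):
    """Build a negative sample by replacing the second entity with a random one.

    The random index may coincide with the original entity; a real
    pipeline might resample in that case.
    """
    e1, r, _ = triple
    return (e1, r, int(rng.integers(num_entities)))

# Toy usage with made-up scores.
pos = [2.0, 0.5]
neg = [[0.0, 1.0], [1.0, 0.0]]
loss = hinge_objective(pos, neg, params=[np.ones(3)], lam=0.1)
print(loss)
```

In this toy call only the second positive triple violates the margin, so the ranking term comes entirely from its two corrupted versions.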

Analysis of Neural Tensor Network complexity
Experience in machine learning and deep learning suggests that, as long as the model is not over-fitting, increasing the complexity of the model can improve its performance. Based on this argument, I explore the relationship between the model complexity and the reasoning ability of the neural tensor network through experimental comparison. Here we assume that the dimension of the word vectors and the entity vectors is $d = 200$, i.e. $e_i \in \mathbb{R}^{200}$. The number of parameters of the neural tensor network layer is then calculated as follows: the tensor $W_R^{[1:k]} \in \mathbb{R}^{d \times d \times k}$ has $40000k$ parameters, and $V_R \in \mathbb{R}^{k \times 2d}$ has $2dk = 400k$ parameters. It can be seen that the tensor is the most important factor affecting model complexity. The number of parameters that need to be trained represents the complexity of the model. Equations (3) and (4) show the relationship between the total number of model parameters and the number of slices in the tensor network.
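The parameter count can be checked with a few lines of Python. This sketch counts the parameters of one relation as a function of the entity dimension d and the number of slices k. It assumes the layer also has a bias vector and a read-out vector of length k each, as in Socher et al. [1], and it excludes the word vectors E; the function name `ntn_params_per_relation` is hypothetical, and the totals need not match the way Table 1 counts parameters.

```python
def ntn_params_per_relation(d, k):
    """Count trainable parameters of one relation's tensor layer."""
    tensor = d * d * k    # W_R: k slices, each d x d
    linear = 2 * d * k    # V_R: k x 2d
    bias = k              # b_R (assumed, following [1])
    readout = k           # U_R (assumed, following [1])
    return {
        "tensor": tensor,
        "linear": linear,
        "bias": bias,
        "readout": readout,
        "total": tensor + linear + bias + readout,
    }

# With d = 200 the tensor term is 40000 * k and dominates the total.
for k in (3, 6, 7):
    counts = ntn_params_per_relation(200, k)
    print(k, counts["tensor"], counts["total"])
```

Because every term is proportional to k, the total grows linearly with the number of slices, with the tensor contributing almost all of it.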

Experiments
Experiments are conducted on both WordNet [1] and FreeBase [2] to predict whether some relations hold, using other facts in the database. Before reasoning with the Neural Tensor Network, a very important step is to convert each entity into its corresponding entity vector. In this paper, I use a simple random initialization method for the entity vectors.

Entity to Semantic Vector
First, we initialize each word vector $x \in \mathbb{R}^{200}$ by sampling it from a zero-mean Gaussian distribution: $x \sim N(0, \sigma^2)$. Next, each entity vector is computed as the average of the vectors of its constituent words. On the right side of Figure 1, we can see that the entity vector of the input entity Bengal tiger is

\[ e_{\text{Bengal tiger}} = \frac{1}{2}\left( x_{\text{Bengal}} + x_{\text{tiger}} \right) \quad (5) \]

The advantage of doing so is that entities with the same substring share part of their semantics. For example, Bengal tiger and South China tiger share the substring tiger, so with formula (5) the two entity vectors have a common semantic component.
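The initialization and averaging steps can be sketched in NumPy as follows. The standard deviation `sigma`, the seed, the lower-casing and whitespace tokenization, and the helper names `init_word_vectors` and `entity_vector` are assumptions for illustration; the paper does not specify them.

```python
import numpy as np

def init_word_vectors(vocab, d=200, sigma=0.01, seed=0):
    """Sample one d-dimensional vector per word from N(0, sigma^2).

    sigma and seed are hypothetical values chosen for the example.
    """
    rng = np.random.default_rng(seed)
    return {word: rng.normal(0.0, sigma, size=d) for word in vocab}

def entity_vector(entity, word_vectors):
    """Average the vectors of the words that make up the entity name."""
    words = entity.lower().split()
    return np.mean([word_vectors[w] for w in words], axis=0)

word_vectors = init_word_vectors(["bengal", "south", "china", "tiger"])
bengal_tiger = entity_vector("Bengal tiger", word_vectors)
south_china_tiger = entity_vector("South China tiger", word_vectors)
print(bengal_tiger.shape, south_china_tiger.shape)
```

Both entity vectors contain a scaled copy of the vector for "tiger", which is how entities with a common substring share statistical strength.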

Datasets
Table II lists the data sets used in this paper.

Environment of Experiment
Table III shows the operating environment on which this experiment relies. As the complexity of the model and the size of the data increase, resource consumption becomes an important factor to consider, so it is necessary to describe the environment in which the whole program runs. This paper uses running time as an important indicator of the complexity of the model.

Experimental Results
In this section, I use tables and plots drawn with the open-source Python package matplotlib to visualize the results. In the first part, I illustrate the complexity of the model through the total running time of the program. Next, I observe how the reasoning accuracy for the relations in WordNet and FreeBase changes as the complexity of the model increases. In this paper, I take the most common setting (slice = 3) as the baseline.
In Table IV and Table V, I illustrate the complexity of the corresponding model by the number of slices, the total running time of the program, and the number of model parameters. The reasoning accuracy reported there is the average reasoning accuracy over all relations in the corresponding data set. The results in the tables show that the running time of the model increases linearly with the number of slices. The reasoning accuracy also changes. The reasoning accuracy on FreeBase and WordNet peaks at 6 and 7 slices, respectively. The optimal accuracies were 0.8242 and 0.8532, which are 0.02 and 0.032 higher than the baseline, respectively. As the number of slices increases further, the reasoning accuracy decreases, which indicates that the model has over-fitted. Fig. 3 and Fig. 4 visualize the results, showing how the reasoning accuracy of each relation in the two data sets changes as the model complexity changes. From Fig. 3 we can see that "similar to" and "domain topic" fluctuate the most as the complexity of the model changes. It can be seen from Fig. 4 that the reasoning for "children", "place of birth", "place of death" and so on is still the least accurate. The relations most affected by model complexity are "children" and "place of birth".

Conclusion
The complexity of the model can be controlled by choosing appropriate hyper-parameters, and an appropriate complexity helps improve the performance of the model. In the neural tensor network, the number of slices in the tensor has the greatest influence on the complexity of the model. Hence, I started from the number of slices to explore the relationship between model complexity and model performance. The experiments show that an appropriate increase in the number of slices is indeed conducive to improving the reasoning accuracy of the model, although it also inevitably increases the running time of the program. Because the theme of this paper is the effect of the complexity of the neural tensor network model on reasoning accuracy, all entity vectors were initialized with a random initialization method. Richard Socher showed experimentally in his paper [1] that initializing the entity vectors with pre-trained vectors is very helpful for increasing reasoning accuracy. Using word2vec [8], GloVe, or even more complex language models such as LSTM for word vector pre-training will be the focus of my next step.

Figure 1. Words in a knowledge base are mapped to low-dimensional vectors and averaged to construct entity vectors. Entity relation triples are input into a neural tensor network, which calculates the confidence that the two entities are in a relationship.

Figure 2. A visualization of the parameters of Richard Socher's Neural Tensor Network Model with k = 2 slices.

Table 1: The model parameters that need to be trained as the number of slices increases

Table 2: The data sets used in this paper. For WordNet, I use 112,581 relational triplets for training; in total, there are 38,696 unique entities in 11 different relations. In addition, 2,609 relational triplets are used for validation and 10,544 for the final test. For FreeBase, I use 316,232 relational triplets for training; in total, there are 75,043 unique entities in 13 different relations. In contrast to previous practice, the FreeBase relations that are considered difficult to reason about (such as place of birth, place of death, location, children, spouse, parents) have not been removed in this paper, because the aim here is to explore the relationship between model complexity and accuracy. The table gives the statistics for WordNet and Freebase, including the number of different relations #R.

Table 3: The environment of the experiments

Table 4: The relationship between the number of slices, running time and reasoning accuracy (WordNet)

Table 5: The relationship between the number of slices, running time and reasoning accuracy (FreeBase)

Figure 3. Changes of 11 relationships in WordNet