Named Entity Recognition: Resource Constrained Maximum Path

Abstract. Information Extraction (IE) is the automatic extraction of structured information from unstructured text sources. One open research field of IE is Named Entity Recognition (NER), which aims at identifying atomic elements in a given text and associating them with a predefined category such as names of persons, organizations, locations and so on. This problem can be formalized as the assignment of a finite sequence of semantic labels to a set of interdependent variables associated with text fragments, and can be modelled through a stochastic process involving both hidden variables (semantic labels) and observed variables (textual cues). In this work we investigate one of the most promising models for NER, based on Conditional Random Fields (CRFs). CRFs are enhanced in a two-stage approach to include in the decision process logic rules that can be either extracted from data or defined by domain experts. The problem is defined as a Resource Constrained Maximum Path Problem (RCMPP) by associating a resource with each logic rule. Proper Resource Extension Functions (REFs) and upper bounds on the resource consumption are defined in order to model the logic rules as knapsack-like constraints. A tailored dynamic programming procedure is defined to address the RCMPP.


Introduction
Information Extraction (IE) is a task of Natural Language Processing aimed at inferring a structured representation of contents from unstructured textual sources. In this field, Named Entity Recognition (NER) has gained the attention of researchers for identifying atomic elements in a given text and associating them with a predefined category (such as names of persons, organizations and locations). Considering a text as a sequence of tokens $x = x_1, \ldots, x_N$, the goal is to classify each token $x_i$ as one of the entity labels $y_j \in Y$, thus originating a tag sequence $y = y_1, \ldots, y_N$.
Nowadays, the state-of-the-art model for tackling the NER task is represented by linear chain Conditional Random Fields (CRFs) [3], a discriminative undirected graphical model able to encode known relationships among tokens (observations) and labels (hidden states). In order to efficiently enhance the descriptive power of CRFs, two main research directions have been investigated to enlarge the information set exploited during training and inference: (1) relaxing the Markov assumption [6] to include long distance dependencies and (2) introducing additional domain knowledge in terms of logical constraints [2,5]. Considering that the relaxation of the Markov assumption implies an increasing computational complexity, in this paper we focus on the second research direction by formulating the inference task as an Integer Linear Programming problem. In particular, the standard CRF inference process is enhanced by two main contributions: (1) introducing "extra knowledge" related to semantic constraints on the token labels, and (2) modelling the label assignment problem as a Resource Constrained Maximum Path Problem (RCMPP).

Background
A CRF [3] is an undirected graphical model that defines a single joint distribution $P(y|x)$ of the predicted labels (hidden states) $y = y_1, \ldots, y_N$ given the corresponding tokens (observations) $x = x_1, \ldots, x_N$. Linear chain CRFs, in which a first-order Markov assumption is made on the hidden variables, define the following conditional distribution:

$$P(y|x) = \frac{1}{Z(x)} \exp\left( \sum_{t=1}^{N} \sum_{k} \omega_k f_k(y_t, y_{t-1}, x, t) \right)$$

where $Z(x)$ is the normalization factor obtained by summing the exponential term over all possible tag sequences, $f_k(y_t, y_{t-1}, x, t)$ is an arbitrary real-valued feature function over its arguments and $\omega_k$ is a learned weight that tunes the importance of each feature function. In particular, when a given feature function $f_k$ is active for a token $x_t$, the corresponding weight $\omega_k$ indicates how to take $f_k$ into account: (1) if $\omega_k > 0$ it increases the probability of the tag sequence $y$; (2) if $\omega_k < 0$ it decreases the probability of the tag sequence $y$; (3) if $\omega_k = 0$ it has no effect whatsoever. Once the parameters $\omega_k$ have been estimated, usually by maximizing the likelihood of the training data [4], the inference phase can be addressed.
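To make the role of the weights $\omega_k$ concrete, the following minimal sketch evaluates a linear chain CRF on a toy sentence. The label set, the two feature functions and the weight values are hypothetical and chosen only for illustration, and the normalization factor is computed by brute-force enumeration, which is viable only for very short sequences.

```python
import itertools
import math

LABELS = ["PER", "O"]  # hypothetical label set

def f_cap_per(y_t, y_prev, x, t):
    """Active when a capitalised token is tagged PER (illustrative feature)."""
    return 1.0 if x[t][0].isupper() and y_t == "PER" else 0.0

def f_per_per(y_t, y_prev, x, t):
    """Active when two consecutive tokens are both tagged PER (illustrative feature)."""
    return 1.0 if y_prev == "PER" and y_t == "PER" else 0.0

FEATURES = [f_cap_per, f_per_per]
WEIGHTS = [2.0, 0.5]  # hypothetical values; in practice they are learned

def score(y, x):
    """Sum over positions t and features k of w_k * f_k(y_t, y_{t-1}, x, t)."""
    total = 0.0
    for t in range(len(x)):
        y_prev = y[t - 1] if t > 0 else None  # no predecessor at the first token
        for w, f in zip(WEIGHTS, FEATURES):
            total += w * f(y[t], y_prev, x, t)
    return total

def conditional_probability(y, x):
    """P(y|x) with Z(x) obtained by enumerating every tag sequence."""
    z = sum(math.exp(score(c, x))
            for c in itertools.product(LABELS, repeat=len(x)))
    return math.exp(score(y, x)) / z
```

With the tokens `["John", "Smith"]`, both positive weights are active for the sequence `("PER", "PER")`, so it receives a higher probability than `("O", "O")`, for which no feature is active.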

Inference: finding the most probable state sequence
The inference problem in CRFs corresponds to finding the most likely sequence of hidden states $y^*$, given the set of observations $x = x_1, \ldots, x_n$. This problem can be solved by determining $y^*$ such that:

$$y^* = \arg\max_{y} P(y|x)$$

Given a number $m$ of possible states and $n$ input tokens, a layered acyclic directed graph $D$ can be constructed for addressing the inference problem. The graph $D$ is composed of $n + 2$ layers: layer 0 is the entry layer, layer $n + 1$ is the ending layer, and the other $n$ layers represent the elements of the sequence. An arc exists from each state $y_i$, $i = 1, \ldots, m$, belonging to layer $t$ to each state $y_{\bar{\imath}}$, $\bar{\imath} = 1, \ldots, m$, belonging to layer $t + 1$, for $t = 0, \ldots, n$. We denote as $N$ the set of nodes, containing $2 + n \times m$ elements, that is, the $m$ states of each of the $n$ sequence layers plus the starting state $y_p$ and the ending state $y_f$, and as $A$ the set of arcs.

Given this layered acyclic directed graph, which denotes both the observed tokens and the labels to be assigned, the objective is to find the heaviest path in $D$ starting from $y_p$ and ending at $y_f$. Let the variable $e^t_{y_i, y_{\bar{\imath}}}$ be equal to 1 if arc $(t, y_i, y_{\bar{\imath}})$ is included in the optimal path, and 0 otherwise; the problem can then be formulated as an integer linear program over these variables.

The following constraints, suitably instantiated according to the domain knowledge, can be introduced in the model:
• Adjacency (7): if the token at time $t − 1$ is labelled as $A$, then the token at time $t$ must be labelled as $B$.
• Precedence (8): if the token at time $t + z$ is labelled as $B$, then a token at time $t$ must be labelled as $A$.
• Begin-end position (9): if the sequence of tokens starts with label $A$, then the sequence must end with label $B$.

To guarantee constraints (7) it is sufficient to modify the graph $D$ by removing all the edges $(t, y_i, B)$. Examples of feasible paths satisfying constraints (8) are depicted in Figure 2. Figure 3 shows an infeasible path for the same constraints.
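The heaviest path on the layered graph can be computed layer by layer, and an adjacency rule can be enforced simply by skipping the removed arcs. The sketch below is a Viterbi-style illustration under stated assumptions: the arc weight is split into a hypothetical per-layer score `emit[t][j]` and a transition score `trans[i][j]`, arcs into the ending state carry zero weight, and the `forbidden` set of state pairs is our reading of the adjacency rule, not the exact edge set used by the authors.

```python
def heaviest_path(emit, trans, forbidden=frozenset()):
    """Heaviest path through a layered graph with m states per layer.

    emit[t][j]  : hypothetical weight gained when state j is chosen at layer t + 1
    trans[i][j] : hypothetical weight of moving from state i to state j
    forbidden   : (i, j) pairs whose arcs are removed from every layer
    Returns (value, path) or (-inf, None) when no feasible path exists.
    """
    neg_inf = float("-inf")
    n, m = len(emit), len(emit[0])
    best = [list(emit[0])]          # arcs leaving the entry state y_p
    back = []                       # back pointers, one list per layer after the first
    for t in range(1, n):
        row, ptr = [], []
        for j in range(m):
            candidates = [(best[-1][i] + trans[i][j] + emit[t][j], i)
                          for i in range(m) if (i, j) not in forbidden]
            value, arg = max(candidates, default=(neg_inf, -1))
            row.append(value)
            ptr.append(arg)
        best.append(row)
        back.append(ptr)
    # Arcs entering the ending state y_f are assumed to have zero weight.
    j = max(range(m), key=lambda s: best[-1][s])
    value = best[-1][j]
    if value == neg_inf:
        return neg_inf, None
    path = [j]
    for ptr in reversed(back):      # follow back pointers towards the first layer
        j = ptr[j]
        path.append(j)
    path.reverse()
    return value, path
```

As a usage example, take three states indexed `A = 0`, `B = 1`, `C = 2`, with `emit = [[3, 0, 0], [0, 1, 2], [0, 0, 1]]` and all transition weights equal to zero. The unconstrained call returns the path `[0, 2, 2]` with value 6, while `forbidden={(0, 0), (0, 2)}` (every arc leaving `A` except the one entering `B`) yields `[0, 1, 2]` with value 5.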

Resource constrained model
The problem can be modelled as a Resource Constrained Maximum Path Problem (RCMPP). It is possible to define a proper Resource Extension Function (REF) in order to introduce knapsack-like constraints for each type of logic rule, that is, precedence and begin-end conditions. The reader is referred to [1] for more details on resource constrained path problems.

Precedence Constraint
Let $P_B$ be the set of predecessor states of state $B$. The path from $y_p$ to $B$ has to contain all states $y_{\bar{\imath}} \in P_B$. We assume a resource consumption $r^p_{i,\bar{\imath}}$ associated with each state $y_{\bar{\imath}}$, for each $y_i$ of the network. The REF associated with the precedence constraints is defined in Eq. (11).
The resource limit $W^P_B$ is set equal to 0. The set of knapsack-like constraints that define the precedence constraints is given by Eq. (12). Figure 4 shows the resource constrained instance when precedence constraints are considered.
Considering the feasible paths in Figure 2, the resource consumption is either 0 or −1 for each label of both paths. The path in Figure 3 has resource consumption equal to 1 at state $B$, thus it is infeasible for Eq. (12).
It is worth observing that the resources at states can be viewed as resources on arcs. Indeed, the resource consumption of an arc is the resource consumption of its head node.
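The behaviour described above (consumption 0 or −1 along feasible paths, consumption 1 at state $B$ on an infeasible one, limit equal to 0) can be illustrated with a small sketch. The extension rule coded below is a hypothetical REF chosen only to reproduce those values and should not be read as the authors' exact definition: visiting a predecessor sets the resource to −1, and reaching the target state while the resource is still 0 raises it to 1.

```python
def precedence_resources(path, target, predecessors):
    """Track one precedence resource per predecessor along a label path.

    Hypothetical extension rule (an assumption, not the paper's REF):
      * visiting predecessor p sets its resource to -1;
      * visiting `target` while the resource of p is still 0 sets it to 1;
      * any other state leaves the resource unchanged.
    The path is feasible when every recorded value is <= the limit 0.
    """
    limit = 0
    w = {p: 0 for p in predecessors}   # resources start at 0 in the entry state
    trace = []
    for state in path:
        for p in predecessors:
            if state == p:
                w[p] = -1
            elif state == target and w[p] == 0:
                w[p] = 1
        trace.append(dict(w))          # snapshot of the resources at this state
    feasible = all(v <= limit for step in trace for v in step.values())
    return trace, feasible
```

For the path `["A", "C", "B"]` with target `"B"` and predecessors `["A"]`, the resource stays at −1 and the path is feasible. For `["C", "B", "A"]`, the resource reaches 1 at `"B"`, so the path violates the limit.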

Begin-end Constraint
Here we define the resource constraint for the begin-end condition. We assume a resource consumption $r^{be}_{(t, y_i, y_j)}$ associated with each arc $(t, y_i, y_j) \in A$.
The REF associated with the begin-end constraint is defined in equation (14).

$$w^{be}_{y_p} = 0; \quad w^{be}_j = w^{be}_i + r^{be}_{(t, y_i, y_j)}, \quad \forall (t, y_i, y_j) \in A. \qquad (14)$$

Assuming a resource limit $W^{be} = 0$, the resource constraint modelling the begin-end condition is expressed by equation (15).
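$$\sum_{(t, y_i, y_{\bar{\imath}}) \in A} e^t_{y_i, y_{\bar{\imath}}} \, r^{be}_{(t, y_i, y_{\bar{\imath}})} \le W^{be}. \qquad (15)$$

Figure 5 shows the resource graph associated with the begin-end condition.

In the dynamic programming approach, each path reaching a state $y_i$ at layer $t$ is described by a label that collects its value $\alpha$ and its resource consumptions $w^p_{i,\bar{\imath}}$ and $w^{be}_i$. Since there may exist several paths to state $y_i$, the index $h$ is used to indicate the id of a given path. Thus, label $l^t_i(h)$ is associated with the $h$-th path $\pi^t_i(h)$. Starting from the initial label $l^0_{y_p}(0, 0, 0)$, the dynamic programming explores the state space in order to reach the final labels $l^n_{y_f}(h)(\cdot)$. Among all labels $l^n_{y_f}(h)(\cdot)$, the one with the maximum value of $\alpha$ is associated with the optimal path. The state space is reduced by considering only feasible and non-dominated labels.
A label dominates another label associated with the same state if it is no worse in the value $\alpha$ and in every resource consumption, and at least one inequality is strictly satisfied.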
Let $L$ be the list of labels with the potential to generate an optimal solution and $ND_i$ be the set of non-dominated labels associated with $y_i$. The labelling approach that solves the problem to optimality is depicted in Algorithm 1.
Step 1 (Label Selection). Select and delete from $L$ a label $l^t_i(h)$.

Conclusion
In this paper, the problem of Named Entity Recognition is addressed by investigating the inference task on Conditional Random Fields. In particular, a mathematical programming formulation based on a Resource Constrained Maximum Path is presented to include some background knowledge during the labelling phase of a text source. Three types of background knowledge constraints have been presented, together with a dynamic programming approach for determining the optimal sequence of labels. Concerning future work, additional long distance dependencies are planned to be automatically discovered from the data (as hidden patterns) and included in the mathematical problem formulation.
ITM Web of Conferences 14, 00004 (2017), DOI: 10.1051/itmconf/20171400004, APMOD 2016
Figure 1 shows the graph D with n = m = 3.

Figure 4. Resource constrained graph with the precedence constraint $P_B = \{A\}$.

Figure 5. Resource constrained graph with begin-end condition.
$w^{p}_{i,\bar{\imath}}(h) \le W^{P}_{B}$, $\forall y_{\bar{\imath}} \in P_B$, and $w^{be}_i(h) \le W^{be}$, with $i \equiv y_f$.