Computing Optimal Mixed Strategies for Terrorist Plot Detection Games with the Consideration of Information Leakage

: The terrorist's coordinated attack is becoming an increasing threat to western countries. By monitoring potential terrorists, security agencies are able to detect and destroy terrorist plots at their planning stage. Therefore, an optimal monitoring strategy for the domestic security agency becomes necessary. However, previous study about monitoring strategy generation fails to consider the information leakage, due to hackers and insider threat. Such leakage events may lead to failure of watching potential terrorists and destroying the plot, and cause a huge risk to public security. This paper makes two major contributions. Firstly, we develop a new Stackelberg game model for the security agency to generate optimal monitoring strategy with the consideration of information leakage. Secondly, we provide a double-oracle framework DO-TPDIL for calculation effectively. The experimental result shows that our approach can obtain robust strategies against information leakage with high feasibility and efficiency.


Introduction
In the past few years, an increasing threat is originated from the terrorist organization, which is continually training and directing their supporters to launch coordinated attacks in their home countries, especially in European countries.To detect and thwart such attacks, domestic security agencies need to allocate security resources to monitor potential terrorists.Among all of the important factors, confidentiality is a basic requirement for security stuff during executing the monitoring task.A spotted security stuff cannot collect information from alerted terrorists, he will be even cheated by intentional terrorists with wrong information.Hence, the security agency fails to monitor the plot, which results in a huge risk to public security.Unfortunately, the security agency probably leak their information unconsciously.For example, Snowden [1] whose disclosure revealed several classified program of NSA.Consequently, the defender strategies become fragile and easy to attack.To sum up, designing robust strategy against the leakage becomes crucial to security agencies.
Recent years, security game theory has been widely used to protect large scale critical infrastructure and targets such as airport [2], transportation network [3], coast ports [4], etc. Coordinated attack is a rising hotspot for academic research, there are already some literature about this problem.There has been some research [5,6] on security game mechanism against coordinated attack.For example, Wang et al. [5] proposed a Terrorist Plot Detect(TPD) game model for security agencies to monitor the potential terrorists.Guo et al. [6] introduced a coalitional security game (CSG) to address the issue that attackers can form coalitions that can dramatically increase their ability to achieve malicious goals.Most of the models of Stackelberg security games assume the attacker is not able to observe the defender's real-time pure strategy, so the attacker makes decision only based on his knowledge about the defender's mixed strategy.However this assumption cannot explain the leakage we mentioned above.The information leakage effect was ignored in most of the literature about coordinated attacks.Xu et al. [7] first introduced the concept of "information leakage"(IL) in security game domain, they also discussed the model and the computation of the leakage in the simplest defender-attacker security game.However, the effect of the information leakage in other security game setting are missing in the present study, and the efficient algorithms to handle it in the specify case still needs to be investigated.
This paper develops the optimal defender strategy in Terrorist Plot Detect game taking the information leakage into account.In the presented model, the information leaks as the attacker can discover specific terrorists' monitoring information.For the sake of simplicity, we assume that at the same time there is only one terrorist's monitoring information that can be leaked, but all terrorists in the network have the probability to discover its monitoring status.
The rest of the paper is organized as follows.In section 2 we give a brief introduction to the TPD models, then we propose our TPDIL (Terrorist Plot Detect game considering Information Leakage) model.In section 3, a DO-TPDIL (Double Oracle algorithm for TPD games considering Information Leakage) framework is proposed based on double-oracle framework to solve the problem efficiently.Our experiments in section 4 show that the model can decrease the loss of defender's utility, which verify the feasible of our approach.

Model
In this section, firstly, we will introduce the Terrorist Plot Detect(TPD) game model proposed by Wang et al. [5].Then we will give the Terrorist Plot Detect game with the consideration of Information Leakage (TPDIL) model and LP formulation to calculate the optimal strategy for defender given the probability of IL.

TPD Game Model
As introduced by Wang et al. [5], a Terrorist Plot Detect(TPD) game is a Stackelberg game between a leader (security agency, defender) and a follower (terrorist plot planner, attacker).The defender first allocates R resources to monitor the attacker's social network ( , ) G V E = , where V is the set of vertices(potential terrorists) and E is the set of edges(relations between terrorists).Then the attacker chooses a connected subgraph to launch a coordinated attack based on his knowledge of the defender schedule.The defender's pure strategy v S S = is a V - dimensional 0-1 vector of R monitoring resources, and defender will monitor terrorist i if 1 i S = . The defender's mixed strategy is a probability distribution over pure strategies, x is the probability that pure strategy S is played, and will attend the plot, 0 otherwise.The attacker's mixed strategy is denoted by a probability distribution , and A y is the probability of playing attacker pure strategy A .The utility of TPD game is defined as follows: TPD game is a zero-sum game, the attacker gains exactly the defender loses.Given a defender allocation S and an attacker subgraph A , if there is no intersection between S and A , i.e., S A ? ?, which means the defender fails to detect the plot, the attacker will win ( ) P A .
( ) P A is a subgraph utility function and the definition can be found in [5], the defender's cost is ( ) P A -. Otherwise the defender wins and the reward of destroying the plot is 0, attacker's utility is also 0.
Without information leakage, given the attacker's strategy space A and the defender's strategy space S , the defender's optimal mixed strategy can be calculated by the following linear programming(LP) (1) formulation.

Model of Information Leakage
Xu et al. [7]  where n V = .p r is usually given by domain experts and may be determined by the nature or capability of terrorists.ADIL is a special case that only one chosen vertex 's status is being leaked, let i be the vertex, then 0 and 0 1 i p p = -.This model captures the fact that few attacker own counter-reconnaissance or surveillance ability.With certain probability, they succeed in observing a specific high-value vertex's status, like the ringleader of the plot or someone who has been arrested before.
In a simplest TPD game, let ( ) iu iv t t Ø denote the status that terrorist i is being monitored (not being monitored).The defender's utility of PRIL model is defined as Eq.( 2) : where 0 min ( , ) is the defender's utility when there is no information leakage.The defender's utility when terrorist i leaks out its monitoring status as i t ("being watched") is defined as Eq.( 3): And the defender's expected utility when target i leaks as status i t Ø (i.e., not being monitored) is defined as Eq.( 4) :

min [ ( , ) ( | )]
Which means that defender choose the minimax strategy to minimize attacker's best response.
Given a defender mixed strategy x and a attacker's pure strategy A , let ( , ) According to the definition, ( , ) ( , ) Notice that comparing to the simplest security game discussed in [7], the leakage message(only one node's status) in TPDIL is the same.However, instead of considering the number of targets, the defender now have to take a much more complex attacker strategy space, which grows up in an exponential way, into consideration.A direct challenge is that we can't use the origin compact matrix representation in [7], because the marginal coverage probability is helpless to describe the relationship between both player's pure strategy in our model.The ADIL model is defined in a similar way.Let terrorist i denote the special terrorist whose monitoring status is being leaked, 0 0 0 (1 )( ) where i u and i v is defined as above.

Equilibrium
Under zero-sum condition, the Stackelberg equilibrium (SSE) is equivalent to maximin equilibrium and minimax equilibrium, thus the optimal defender mixed strategy x in TPDIL game can be calculated by solving the following LP( 6).

Approach
As we discussed above, both player's strategy space grows in the exponentially way, which make solving the problem in a large scale network become impractical.
The double oracle framework is a normal form for solving zero-sum games with large strategy space.To solve the TPD game in a real-world scale network, Wang et al. [5] presented an algorithm DO-TPD(an Double Oracle algorithm for TPDs) to obtain solution for both players.DO-TPD initials with solving the equilibrium with a small and limited game, then both player compute the improving strategies and add them to the strategy space iteratively, and finally the solution meets the global equilibrium when no improving strategies can be found.However, the solutions obtained from DO-TPD don't consider the information leakage.
We extend DO-TPD to DO-TPDIL(an Double Oracle algorithm for TPD games considering Information Leakage) in a novel way.Next, we start with an overview of DO-TPDIL, and then present its components.

DO-TPDIL Overview
Our DO-TPDIL algorithm is sketched in Algorithm 1. DO-TPDIL also initials the small strategy space by solving the linear program LWA-LP presented by Wang et al. [5], where the attacker is restricted to use "lone-wolf" attack represented as a single vertex.In the initialization procedure, the game degenerates to the simplest form of security game because the attacker ignores the network structure.The defender strategy can be represented by marginal coverage probability of each vertex, and the mixed strategy can be efficiently sampled with Comb Sampling algorithm [8].for leak case i, xi ⊂ x, yi y do 5.
A = A Ai + 12. until S + = ∅ and Ai + =∅ for all leak case i 13. return (x, y ) Here we regard iu t and iv t as two separated cases, thus there are totally 2 1 V + possible cases.Let i A denotes the attacker's best response in each case i, then we have , where 0 A is the attacker's response with no leakage, i.e., the origin attacker's solution in TPD.Then DO-TPDIL solves CoreLP with the above restricted strategy space.The attacker's strategy space ' A È has only | | V strategies first, so the restricted CoreLP can be solved efficiently.Obviously, the "restricted version" solution is not enough to form the origin TPDIL's SSE solution.To improve both player's utilities, DO-TPDIL calls the subsequent oracles to find improving strategies.The process repeats until no improving strategies can be found in all case, and the obtained solution is a minimax equilibrium solution [9], and is equivalent to SSE and NE strategies in zero-sum game [10].

CoreLP
The CoreLP in TPDIL is computed as follows.Firstly, the defender computes its mixed strategy x against ' A È by solving LP( 1 Specifically, let case j denote the case iu t ( iv t ), i.e., the terrorist i leaks as being monitored (not being monitored).The attacker first incurs the defender strategy space j S which only contains the defender pure strategy that can be used in case j. j S can be calculated by Eq.( 5).Then he can obtain his optimal mixed strategy i y in this case by solving LP(1) with j S S = and i A A = .y È is the collection of the attacker's mixed strategy probability distribution in each case.

Defender Oracle
The goal of the defender oracle is to find pure strategy that can improve the player's utility, however finding the improving strategy is different between TPD game and TPDIL game.One problem is that the solution generated by the defender oracle in DO-TPD may be pointless in a specific leakage case.For example, if the attacker observes the case iu t , then a defender pure strategy that doesn't contain the leak point i will be meaningless.We need to add particular restrains in defender oracle to discard these invalid solution.Given a specific leakage case j and the related attacker's strategy space j A , the best defender oracle in DO-TPDIL(denoted as bestOD) will solve the defender best oracle in DO-TPD [5] (denoted as TPD-bestOD) as the MILP (7).The variables in this MILP have the same meaning with the TPD-bestOD, and the idea is that defender chooses an allocation which can overlap the attacker's subgraphs as much as possible.I is an 0-1 indicator, 1 I = when case j is terrorist i leaks as being monitored(i.e., iu t case), and this ensures that the strategy can be used in case j to against the specific attacker j A , and 0 I = is on the contrary(i.e., iv t case).
If there is no leakage, the defender will call LP( 7) by removing the last restrain, and setting Unfortunately, the bestOD problem is still NP-hard, because the process still has to search the whole network to find the best response, including or excluding the leak point has a negligible effect on network scale.To speed up the oracle process, we propose a fast oracle called betterOD.The idea is finding some sub-optimal("better") solutions to meet the criterion ( , ) ( , ) U S y U x y > in each leakage case, while it can be solved within much less time than bestOD process.The bestOD process only needs to be called when betterOD fails to find an answer.

3.4
Better OD In our betterOD algorithm, we need to use the algorithm TPD-betterOD presented by Wang et al. [5].The TPD-betterOD algorithm provides us an approach to find defender's sub-optimal strategy given attacker's mixed strategy y.Notice that, the defender searches the whole network to find pure strategy depends on attacker's mixed strategy y in TPD-betterOD.We add C as the candidate nodes range instead of the whole network to prevent pointless strategies which break the leakage precondition.Our betterOD is shown in Algorithm 3. Firstly, the betterOD algorithm sets the candidate nodes, if the node leaks as case iv t , then betterOD will restrict the candidate nodes as C = V \{i} to prevent meaningless strategies in lines 2~3.Secondly, betterOD calls TPD-betterOD to search strategy by selecting vertices that bring the maximum marginal utility in a greedy way.Finally, in iu t case, if the strategies obtained from line 4 don't contain the leak point i , betterOD will discard these strategy and return the rest part.for all S Sbetter do 7.
if S doesn't contains leak point i then 8.

Attacker Oracle(AO)
The attacker oracle algorithm for TPDIL works in the same way with DO-TPD.In each leakage case i , the attacker only considers the defender's probability distribution i x after observing iu t or iv t .Then he will call the attacker better oracle to find improving strategies, if fail, he will turn to the attacker best oracle.The algorithm detail we refers to [5].

Experimental Evaluation
We evaluate the performance of our approach through extensive evaluations.All LP and MILP computations are solved by CPLEX 12.60.All computations are run on a machine with 2.2 GHZ Intel qual core CPU and 8 GB memory.The parameters keep the same with [5].All the experiments are run with the ADIL model.The leak point is random chosen from the network, and the 0.5 i p = unless otherwise stated.All the results are sampled over 20 times.
Our experiments use the following three types of network: (i) Random trees (RT), where each new node is connected to a random picked existing node.(ii) Erdös-Rényi random graphs(ER(V, M)) [11], where M links are randomly assigned between all possible vertices pairs, we use the networks with M V = labelled as ER (1).
(iii) Barabási-Albert scale-free network(BA(k)) [12], where each new node is connected to k existing nodes with the proportional to the edges the existing node already has, we use BA scale-free network with parameters 1 k = labeld as BA (1).By default, the capability of each vertex v t is randomly chosen in [1, 5], and the network externality measure 0.1 d = .

Scale-Up Analysis
In Figure .2, we test our algorithm in different network.
Results show that our algorithm can help to reduce the complexity of solving the problem.The naive FullLP's runtime increases in an exponentially way, and it runs out of memory when the network size is about 20.Our approach can help the player to solve the problem with the network size of 40.

Solution Quality Analysis
We use the following centrality measure based heuristic baselines proposed by [5] to evaluate the solution quality of our approach: (i) DCP: defender only monitors the vertices with top R degree values.(ii) DCM: a defender mixed strategy where the vertex marginal coverage probability is normalized degree centrality; (iii) WCP: defender only monitors vertices with top R node capability values.(iv) WCM: a defender mixed strategy where the vertex marginal coverage probability is normalized weight centrality.All the mixed strategies are sampled by Comb Sampling algorithm [8].The result by our approach is denoted by line EXACT.In Figure 2. , we compare the solution quality obtained from our approach and other 4 heuristic baselines.Results show that the solution obtained from the heuristic baselines performs badly in information leakage model, the performance becomes worse with the network size growing up.The defender utility from our approach can remain stable as the network grows, and the value indicates our approach's effectiveness.In all the three network, the mixed strategy (DCM, WCM) always performs better than the corresponding pure strategy.
This shows the necessary of using randomized mixed strategy in TPD game even with the consideration of information leakage.

Conclusion
In this study, we considered partial information leakage in Terrorist Plot Detect games.We focused on the modeling and the computation of the new game to help defender decreasing the loss caused by leakage.We proposed an efficient double oracle framework to speed up finding efficient strategy, instead of searching strategy from the whole strategy space.The experimental results show the feasibility and efficiency of our approach.
L is the number of information leakage cases, in our model, is the network size | | V .The | | X stands for the defender strategy space S , which grows exponentially with the number of resources R , and | | Y is the attacker strategy space, which grows exponentially with | | V .As a result, it is impractical to directly to solve this LP.
"monitoring status" is being leaked, in other words, attacker is aware of whether terrorist i is being watched or not.With probability 0 1 introduced the two related IL model: PRobabilistic Information Ieakage (PRIL) model and ADversarial Information Leakage (ADIL) model.Let ( 0) i i p p ³ denote the probability that vertex i's Î = -å , no vertices leak information.In PRIL model, every single vertex has probability to leak its information, thus we have a ( 1) n +