Analytical solution to an LQG homing problem in two dimensions

An analytical solution is found to the problem of maximising the time spent in the first quadrant by the two-dimensional diffusion process (X(t), Y(t)), where Y(t) is a controlled Brownian motion and X(t) is proportional to its integral. Moreover, we force the process to exit the first quadrant through the y-axis. This type of problem is known as LQG homing and is very difficult to solve explicitly, especially in two or more dimensions. Here the partial differential equation satisfied by a transformation of the value function is solved by making use of the method of separation of variables. The exact solution is expressed as an infinite sum of Airy functions.

2010 Mathematics Subject Classification: 93E20.

Our aim is to find the control u* that minimises the expected value of the cost function

J(x, y) := ∫_0^{T(x,y)} [ (1/2) q_0 u^2[X(t), Y(t)] + θ ] dt + K[X(T), Y(T)],   (3)

where q_0 is a positive constant and T(x, y) denotes the first time the process (X(t), Y(t)), starting from (X(0), Y(0)) = (x, y) ∈ C := {(x, y) : x > 0, y > 0}, leaves the first quadrant C. We choose the termination cost function K so that K ≡ 0 on the boundary x = 0, whereas an infinite penalty is incurred if the process exits C through the boundary y = 0. Moreover, we assume that the parameter θ in Eq. (3) is negative. Hence, the objective is to maximise the expected time that (X(t), Y(t)) will spend in C, and we force the process to exit C through the boundary x = 0. The optimizer must of course take the quadratic control costs into account.

The problem defined above is a particular LQG homing problem, in which the final time is a random variable; see Whittle [?]. Such problems are very difficult to solve explicitly, especially in two or more dimensions. The author has written many papers on LQG homing problems; see Lefebvre [?] and Lefebvre and Moutassim [?] for recent ones. By exploiting the symmetry present in some problems, it is sometimes possible to reduce multidimensional problems to one-dimensional ones. For example, in Lefebvre [?] the author was able to solve a three-dimensional LQG homing problem by using the method of similarity solutions to transform it into a one-dimensional problem; see also Makasu [?]. In this paper, we will find an exact analytical expression for the optimal control u* in the problem defined above, which cannot be reduced to a one-dimensional problem.
Next, we define the value function

F(x, y) := inf_u E[J(x, y)].

Making use of dynamic programming, we obtain the following proposition.
Proposition 1.1 The value function F(x, y) satisfies, for (x, y) ∈ C, the dynamic programming equation

0 = inf_u { (1/2) q_0 u^2 + θ + k y F_x + u F_y + (σ^2/2) F_yy },

where u := u(x, y) and F_x := ∂F/∂x, etc.
Remark 1.1 If the parameter θ is too large in absolute value, the function F(x, y) could become −∞. We assume that θ is such that |F(x, y)| is finite.
In the next section, the function F(x, y) and the optimal control u * will be obtained explicitly.

Analytical solution
We deduce from Proposition 1.1 that the optimal control can be expressed as

u*(x, y) = −(1/q_0) F_y(x, y),

so that F(x, y) satisfies the following non-linear second-order partial differential equation (p.d.e.):

θ + k y F_x − (1/(2 q_0)) F_y^2 + (σ^2/2) F_yy = 0.

This equation is subject to the boundary conditions

F(0, y) = 0 for y > 0 and F(x, 0) = ∞ for x > 0.

Next, we define the transformation Φ(x, y; α) := exp[−α F(x, y)], with α := 1/(q_0 σ^2). Then, we find that Φ(x, y; α) satisfies the linear second-order p.d.e.

k y Φ_x + (σ^2/2) Φ_yy = α θ Φ,   (5)

subject to the boundary conditions

Φ(0, y; α) = 1 for y > 0 and Φ(x, 0; α) = 0 for x > 0.   (6)

Remark 2.1 Equation (5) is actually the Kolmogorov backward equation satisfied by the mathematical expectation (assumed to exist)

Φ(x, y; α) = E[ exp{α θ T_0(x, y)} 1{X_0(T_0(x, y)) = 0} ],

where (X_0(t), Y_0(t)) is the uncontrolled two-dimensional degenerate diffusion process obtained by setting u[X(t), Y(t)] ≡ 0 in (1), (2), and T_0(x, y) is the same as T(x, y), but for the uncontrolled process. Furthermore, the conditions in Eq. (6) are the appropriate boundary conditions.
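To illustrate Remark 2.1 numerically, an expectation of this type can be estimated by simulating the uncontrolled process. The sketch below is only illustrative: the dynamics dX(t) = −Y(t) dt, dY(t) = σ dB(t), the functional exp(θ T_0) evaluated on the paths that leave C through the y-axis, and all parameter values are assumptions made for the sketch, not the paper's actual specification.

```python
import math
import random

def estimate_phi(x0, y0, theta=-0.5, sigma=1.0, dt=1e-3, n_paths=2000, seed=1):
    """Monte Carlo estimate of E[exp(theta*T0); exit through the y-axis] for the
    (hypothetical) uncontrolled dynamics dX = -Y dt, dY = sigma dB, via Euler-Maruyama."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        x, y, t = x0, y0, 0.0
        while x > 0.0 and y > 0.0:       # stay until the process leaves the first quadrant
            x -= y * dt                   # drift of X (assumed form)
            y += sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            t += dt
        if x <= 0.0:                      # exit through the y-axis: contributes exp(theta*T0)
            total += math.exp(theta * t)
        # paths that exit through y = 0 contribute 0
    return total / n_paths

est = estimate_phi(0.5, 1.0)
print(est)
```

Since θ < 0, the estimated quantity always lies in (0, 1); refining dt and increasing the number of paths improves the (biased, discretised) estimate.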
To obtain an explicit solution to the problem (5), (6), we will make use of the method of separation of variables: we assume that Φ can be written as

Φ(x, y; α) = G(x) H(y),

where actually G(x) = G(x; α) and H(y) = H(y; α). Then, we must solve the ordinary differential equations

k G'(x) = −λ G(x)

and

(σ^2/2) H''(y) − (λ y + α θ) H(y) = 0,   (7)

in which λ is the constant of separation. We find at once that

G(x) = exp(−λ x / k).

Moreover, if λ is positive, the general solution of Eq. (7) can be written as

H(y) = c_1 Ai[(2λ/σ^2)^{1/3} (y + α θ/λ)] + c_2 Bi[(2λ/σ^2)^{1/3} (y + α θ/λ)],   (8)

where c_1 and c_2 are arbitrary constants, and Ai(·) and Bi(·) are Airy functions.
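The reduction of the separated equation for H(y) to Airy's equation can be sketched as follows; here the constant c stands generically for the α-dependent term produced by the separation:

```latex
\frac{\sigma^2}{2}\,H''(y) = (\lambda y + c)\,H(y), \qquad \lambda > 0.
% Substitute z = (2\lambda/\sigma^2)^{1/3}\, y + z_0, so that
% d^2H/dy^2 = (2\lambda/\sigma^2)^{2/3}\, d^2H/dz^2 and
% (\sigma^2/2)(2\lambda/\sigma^2)^{2/3} = \lambda^{2/3}(\sigma^2/2)^{1/3}:
\lambda^{2/3}\Bigl(\tfrac{\sigma^2}{2}\Bigr)^{1/3} \frac{d^2H}{dz^2}
  = \Bigl[\lambda^{2/3}\Bigl(\tfrac{\sigma^2}{2}\Bigr)^{1/3}(z - z_0) + c\Bigr] H.
% Choosing z_0 = c\,(2/\sigma^2)^{1/3}\lambda^{-2/3} removes the constant term:
\frac{d^2H}{dz^2} = z\,H
\quad\Longrightarrow\quad
H = c_1\,\mathrm{Ai}(z) + c_2\,\mathrm{Bi}(z).
```

This is exactly the scaling that produces the argument (2λ/σ^2)^{1/3} y + const appearing in the solutions used below.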
Next, because Φ(x, y; α) is finite and the function Bi diverges as y tends to infinity, whereas Ai decreases to zero, we set c_2 = 0 in Eq. (8). Furthermore, we deduce from the boundary condition Φ(x, 0; α) = 0, for x > 0, that the function H(y) must be such that H(0) = 0. Thus, the constant λ = λ_n must be chosen so that

(2λ_n/σ^2)^{1/3} α θ / λ_n = a_n,

where a_n denotes a zero of the function Ai. This function has an infinite number of zeros, which can be evaluated numerically, and they are all located on the negative real axis; see Abramowitz and Stegun [?] (p. 450). Therefore, we write that

λ_n = (2/σ^2)^{1/2} (α θ / a_n)^{3/2}.   (9)

Let A denote the set of zeros of Ai. We have (see Vallée and Soares [?] (p. 88) or Katori and Tanemura [?]), for a_n and a_n' ∈ A,

∫_0^∞ Ai(z + a_n) Ai(z + a_n') dz = 0 if a_n ≠ a_n', and = [Ai'(a_n)]^2 if a_n = a_n'.
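The orthogonality relation can be verified numerically, for instance with SciPy's Airy routines (this is a check of the identity, not part of the derivation):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import ai_zeros, airy

# First two zeros a_1, a_2 of Ai, together with the derivative values Ai'(a_n).
a, _, _, aip = ai_zeros(2)

Ai = lambda z: airy(z)[0]   # airy(z) returns (Ai, Ai', Bi, Bi')

# Off-diagonal case: the integral should vanish.
off, _ = quad(lambda z: Ai(z + a[0]) * Ai(z + a[1]), 0.0, np.inf)

# Diagonal case: the integral should equal [Ai'(a_1)]^2.
diag, _ = quad(lambda z: Ai(z + a[0]) ** 2, 0.0, np.inf)

print(off, diag, aip[0] ** 2)
```

The off-diagonal integral comes out numerically zero, and the diagonal one matches [Ai'(a_1)]^2 to quadrature accuracy.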
From what precedes, we define

Φ(x, y; α) := Σ_{n=1}^∞ k_n exp(−λ_n x/k) Ai[(2λ_n/σ^2)^{1/3} y + a_n].

This function is a solution of Eq. (5) and it satisfies the boundary condition Φ(x, 0; α) = 0, for any x > 0. To conclude, we must determine the constants k_n such that Φ(0, y; α) = 1, for any y > 0. That is, we must have

Σ_{n=1}^∞ k_n Ai[(2λ_n/σ^2)^{1/3} y + a_n] = 1 for any y > 0.

Hence, letting Ai_n(y) := Ai[(2λ_n/σ^2)^{1/3} y + a_n], we multiply both sides of the above equation by Ai_m(y) and integrate over the interval (0, ∞), where we assumed that we can integrate the series term by term. Let

I_m := ∫_0^∞ Ai_m(y) dy and I_{n,m} := ∫_0^∞ Ai_n(y) Ai_m(y) dy,

so that the constants k_n must satisfy the system

Σ_{n=1}^∞ k_n I_{n,m} = I_m for m = 1, 2, …

We can give a mathematical expression for I_m in terms of hypergeometric functions, and the integrals I_{n,m} can be evaluated for any m and n with the help of mathematical software. In practice, we can get a good approximation to the function Φ(x, y; α) by computing enough terms in the infinite summation. If we use l terms, then to obtain (approximately) the constants k_1, …, k_l we must solve a system of l linear equations, which again can be done with mathematical software. Summing up, we can state the following proposition.
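The truncated linear system for k_1, …, k_l can be assembled and solved numerically. The Python sketch below uses collocation (imposing the boundary condition at l points on the y-axis) rather than the integral projection described above, and the values λ_n = n and σ = 1 are placeholders: the true λ_n depend on the model parameters through Eq. (9).

```python
import numpy as np
from scipy.special import ai_zeros, airy

l = 9
sigma = 1.0
lam = np.arange(1.0, l + 1.0)    # placeholder values; Eq. (9) gives the true lambda_n
a, _, _, _ = ai_zeros(l)         # first l zeros of Ai (all negative)

def Ai_n(n, y):
    """n-th basis function Ai_n(y) = Ai[(2*lambda_n/sigma^2)^(1/3) * y + a_n]."""
    return airy((2.0 * lam[n] / sigma**2) ** (1.0 / 3.0) * y + a[n])[0]

# Collocation points on the boundary x = 0, and the resulting l x l linear system.
y_pts = np.linspace(0.2, 4.0, l)
M = np.array([[Ai_n(n, ym) for n in range(l)] for ym in y_pts])
k_coef = np.linalg.solve(M, np.ones(l))

# By construction, the truncated series equals 1 at the collocation points.
phi0 = M @ k_coef
print(np.max(np.abs(phi0 - 1.0)))
```

Between the collocation points the truncated series only approximates the boundary value 1; increasing l, or using the projection equations Σ_n k_n I_{n,m} = I_m instead, improves the fit.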

Proposition 2.1
The optimal control u* in the problem considered in this paper can be expressed in terms of the function

Φ(x, y; α) = Σ_{n=1}^∞ k_n exp(−λ_n x/k) Ai[(2λ_n/σ^2)^{1/3} y + a_n]

for x > 0 and y > 0, where the a_n's are the zeros of the function Ai, λ_n is defined in Eq. (9), and the constants k_n can be computed approximately by solving the system (??) with a finite number of equations.

Remark 2.2
We have assumed in Proposition 2.1 that we can interchange the order of differentiation and summation. Since in practice we will use a finite number of terms in the summation, this is not a problem.
The first 9 zeros of the function Ai(z) are given in Table 1.

Table 1. First nine zeros of the Airy function Ai(z).

n     1         2         3         4         5         6         7          8          9
a_n   −2.33811  −4.08795  −5.52056  −6.78671  −7.94413  −9.02265  −10.04017  −11.00852  −11.93602
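The entries of Table 1 can be reproduced with SciPy's `scipy.special.ai_zeros`:

```python
from scipy.special import ai_zeros

# First 9 zeros of Ai; they are all negative and returned in decreasing order.
a, _, _, _ = ai_zeros(9)
for n, an in enumerate(a, start=1):
    print(f"a_{n} = {an:.5f}")   # a_1 = -2.33811, ..., a_9 = -11.93602
```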
We can now compute the approximation Φ^{(9)}(x, y; 1) to the function Φ(x, y; 1) when the first 9 terms are used in the summation. This approximation is presented in Figure ?? when x = 0. To check whether the summation converges rapidly or not, we also computed the approximations Φ^{(5)}(0, y; 1) and Φ^{(7)}(0, y; 1). The three approximations are shown in Figure ??. We see that while Φ^{(5)}(0, y; 1) is not a very good approximation, Φ^{(7)}(0, y; 1) and Φ^{(9)}(0, y; 1) are quite similar and much better approximations; see also Figure ??.
Finally, from Proposition 2.1 we can calculate the approximate optimal control in this example. Figure ?? presents the approximation u**(1, y) to u*(1, y) when 9 terms are used in the summation Σ_{n=1}^∞ k_n Φ_n(x, y; α), whereas the approximation u**(x, 1) to u*(x, 1) is shown in Figure ??. As expected, u**(1, y) tends to infinity as y decreases to zero, because the optimizer must avoid the infinite penalty incurred when the process hits the boundary y = 0 first. As y increases, the optimal control becomes approximately equal to zero, because the uncontrolled process will then most likely hit the boundary x = 0 first instead. The approximate optimal control u**(x, 1) tends to about 0.471 as x tends to infinity.

Conclusion
In this paper, we were able to find an analytical expression for the optimal control in a two-dimensional LQG homing problem. Contrary to the other multidimensional homing problems considered so far, the one solved here could not be reduced to a one-dimensional problem by making use of symmetry, for instance. The method of separation of variables was used to solve the partial differential equation satisfied by a transformation Φ(x, y; α) of the value function. The function Φ(x, y; α) was expressed as an infinite series involving the Airy function Ai and its zeros. This series is almost a Fourier–Airy series, but unfortunately the various Airy functions involved are not orthogonal.
Because the solution is an infinite series, in practice we cannot compute the optimal control exactly. However, we saw in the example presented in Subsection ?? that the series stabilises rapidly. Indeed, the approximations Φ^{(7)}(0, y; 1) and Φ^{(9)}(0, y; 1) to Φ(0, y; 1), obtained by considering respectively the first 7 and 9 terms of the infinite series, are very similar and practically coincide for y small enough.

Figure 5. Approximation u**(x, 1) to the optimal control u*(x, 1) in the interval [0, 30] when the first 9 terms are used in the summation.
We could have used a numerical method to obtain an approximate optimal control in Subsection ?? (and in general). Our objective was rather to find a mathematical expression for the exact optimal control. In order to implement this exact optimal control, we must evaluate improper integrals involving Airy functions, and then solve a system of linear equations. Both tasks can be carried out with the help of mathematical software such as Maple.
Finally, notice that because the zeros of Ai(z) are all negative real numbers, the parameter θ in the cost function (3) also had to be negative, so that the objective was to maximise the expected time spent by the process in the first quadrant. If we wanted to minimise this expected time instead, θ would then be positive and our method would fail.