APPLICATIONES MATHEMATICAE 24, 1 (1996), pp. 17–33
D. HERNÁNDEZ-HERNÁNDEZ (México), O. HERNÁNDEZ-LERMA (México) and M. TAKSAR (Stony Brook, N.Y.)

THE LINEAR PROGRAMMING APPROACH TO DETERMINISTIC OPTIMAL CONTROL PROBLEMS
Abstract. Given a deterministic optimal control problem (OCP) with value function, say J, we introduce a linear program (P) and its dual (P∗) whose values satisfy sup(P∗) ≤ inf(P) ≤ J(t, x). Then we give conditions under which (i) there is no duality gap, i.e. sup(P∗) = inf(P), and (ii) (P) is solvable and is equivalent to the OCP in the sense that min(P) = J(t, x).
1. Introduction. A time-honored approach to optimal control problems (OCPs) is via mathematical programming problems on suitable spaces. For instance, this approach can be used to obtain Pontryagin's maximum principle; see e.g. [3]. Another class of results has also been obtained for both deterministic and stochastic OCPs using convex programming methods [2, 5, 6].

This paper is concerned with the linear programming (LP) approach to deterministic, finite-horizon OCPs with value function J(t, x) when the initial data is (t, x) [see (2.3)]. In this case, we first introduce a linear program (P) and its dual (P∗) for which

(1.1) sup(P∗) ≤ inf(P) ≤ J(t, x),

where sup(P∗) and inf(P) denote the values of (P∗) and (P), respectively. Then we give conditions under which
1991 Mathematics Subject Classification: Primary 49J15, 49M35.
Key words and phrases: optimal control, linear programming (in infinite-dimensional spaces), duality theory.
This research was partially supported by Research Grant 1332-E9206 from the Consejo Nacional de Ciencia y Tecnología, Mexico. The work of the third author was supported in part by NSF Grant DMS 9301200 and NATO Grant CRG 900147.
(i) there is no duality gap, i.e.,

(1.2) sup(P∗) = inf(P);

(ii) the linear program (P) is solvable, which means that (P) has an optimal solution (and we write min(P) instead of inf(P)), and it is equivalent to the OCP in the sense that

(1.3) min(P) = J(t, x).
Related literature. In recent papers [8, 9] we have obtained results similar to (1.1)–(1.3) for some discrete-time stochastic control problems on general Borel spaces. Our work is also related to the convex programming approach in [2, 5, 6] in that we use (LP) duality theory to get (1.1)–(1.3); in fact, to set up our OCP we follow closely [5, 6]. Finally, we should mention that for several classes of OCPs (see e.g. [12, 13]) there is a well-known, direct way (i.e., without going through the dual program (P∗)) to get (1.3): namely, one simply writes down the associated linear program (P) and then uses continuity/compactness arguments to get a minimizing sequence that converges to the optimal value. But of course, using duality one gets more information on the OCP. For example, it turns out that the dual (P∗) is associated with the dynamic programming equation (DPE), in a sense to be made precise in the Corollary to Theorem 5.1.
Organization of the paper. In Section 2 we introduce the OCP we are interested in, and recall some facts about the dynamic programming equation. Section 3 presents the linear programs (P) and (P∗) associated with the OCP; we also prove the consistency of these programs. In Section 4 we present the proof of (1.1)–(1.2), whereas the equality (1.3) is proved in Section 5. Finally, in Section 6 we introduce a particular approximation to the value function.
2. The optimal control problem

Remark 2.1. Notation. (a) If X is a generic metric space, then we denote by C(X) the space of real-valued continuous bounded functions with finite uniform norm ∥·∥. If b : X → R is a continuous function with b(·) ≥ 1 (which we call a bounding function), then Cb(X) stands for the real vector space of all continuous functions v : X → R such that

∥v∥b := ∥v/b∥ = sup_{x∈X} |v(x)|/b(x) < ∞.

Let Db(X) be the dual of Cb(X), i.e. the vector space of all bounded linear functionals on Cb(X). If Λ ∈ Db(X) and v ∈ Cb(X), we denote by ⟨Λ, v⟩ the value of Λ at v.

(b) Let Mb(X) be the vector space of all finite signed measures µ on the Borel sets of X such that ∥µ∥b := ∫ b d|µ| is finite, where |·| stands for
the total variation. Then, identifying µ ∈ Mb(X) with the linear functional v ↦ ⟨µ, v⟩ := ∫ v dµ on Cb(X), we see that Mb(X) ⊂ Db(X), since

|⟨µ, v⟩| ≤ ∥v∥b ∥µ∥b.

(c) Let T, 0 < T < ∞, be the optimization horizon, and U ⊂ Rn the control set, which is assumed to be compact. Define Ω := [0, T] × Rn, S := Ω × U. If v is a function on Rn, we consider it to be a function on Ω, S or Rn × U by defining v(t, x) := v(x), v(t, x, u) := v(x) or v(x, u) := v(x), respectively.

For each t ∈ [0, T], the set U(t) of control processes is the set of Borel measurable functions u : [t, T] → U.
The optimal control problem (OCP). Let f : S → Rn be a given function, and consider the controlled system

(2.1) x′(s) = f(s, x(s), u(s)), t < s ≤ T, x(t) = x,

where x ∈ Rn and u ∈ U(t). The OCP is then to minimize

(2.2) J(t, x; u) := ∫_t^T l0(s, x(s), u(s)) ds + L0(x(T))

over the pairs (x(·), u(·)) that satisfy Definition 2.2 below. The OCP's value function J is defined as

(2.3) J(t, x) := inf_{U(t)} J(t, x; u).
Definition 2.2. A pair (x(·), u(·)) is said to be admissible for the initial data (t, x) if u(·) ∈ U(t) and x(·) satisfies (2.1). We shall denote by P(t, x) the family of all admissible pairs, given the initial data (t, x).

Throughout the following we assume (H1)–(H3) below:

(H1) f belongs to C(S) and it is Lipschitz in x ∈ Rn, uniformly in (t, u) ∈ [0, T] × U, i.e.

sup_S |f(t, x, u)| ≤ K and |f(t, x, u) − f(t, y, u)| ≤ c|x − y| ∀x, y ∈ Rn,

where c is some constant independent of (t, u).

(H2) l0 and L0 are nonnegative, bounded away from zero, continuous functions on S and Rn respectively, and there exists a real-valued continuous function b(x) on Rn such that

l0(t, x, u) ≤ b(x) ∀(t, x, u) ∈ S, L0(x) ≤ b(x) ∀x ∈ Rn,

b(x)/l0(t, x, u) ∈ C(S), and b(x)/L0(x) ∈ C(Rn).
(H3) There exist ε0 > 0 and c > 0 such that for all |s − t|, |x − y| < ε0:

|b(y) − b(x)| ≤ c|y − x| b(x),
|l0(t, x, u) − l0(s, y, u)| ≤ c(|y − x| + |t − s|) b(x),
|L0(y) − L0(x)| ≤ c|y − x| b(x);

without loss of generality we may take c to be the same as in (H1).
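As a concrete illustration (our example, not from the paper), hypotheses (H2)–(H3) are satisfied by quadratic costs:

```latex
l_0(t,x,u) := 1 + |x|^2 + |u|^2, \qquad
L_0(x) := 1 + |x|^2, \qquad
b(x) := \Bigl(1 + \max_{u\in U}|u|^2\Bigr)\,(1 + |x|^2).
```

Indeed, l0 ≤ b and L0 ≤ b, the ratios b/l0 and b/L0 are bounded and continuous, and since ||y|² − |x|²| ≤ |y − x|(2|x| + ε0) ≤ (2 + ε0)|y − x|(1 + |x|²) for |y − x| < ε0, the local Lipschitz estimates in (H3) hold as well.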
The dynamic programming equation (DPE). We write partial derivatives as D0 := ∂/∂t and Di := ∂/∂xi for i = 1, ..., n. Let b be as in (H2) and define Cb1(Ω) as the Banach space consisting of all functions ϕ ∈ Cb(Ω) with partial derivatives Diϕ in Cb(Ω) for all i = 0, 1, ..., n, with

(2.4) ∥ϕ∥b,1 := ∥ϕ∥b + ∑_{i=0}^n ∥Diϕ∥b < ∞.
For each ϕ ∈ Cb1(Ω), define Aϕ ∈ Cb(S) by

(2.5) Aϕ(t, x, u) := D0ϕ(t, x) + f(t, x, u) · ϕx(t, x),

where ϕx is the x-gradient of ϕ. Then A : Cb1(Ω) → Cb(S) is a linear operator and it is obviously bounded, since

(2.6) ∥Aϕ∥b ≤ (1 + ∥f∥) ∥ϕ∥b,1 ∀ϕ ∈ Cb1(Ω).
Definition 2.3. A function ϕ in Cb1(Ω) is said to be a smooth subsolution of the dynamic programming equation (DPE) if

Aϕ + l0 ≥ 0 on [0, T) × Rn × U, and ϕ(T, x) ≤ L0(x) ∀x ∈ Rn.
If ϕ is in Cb1(Ω) and (x(·), u(·)) ∈ P(t, x), then

(d/ds) ϕ(s, x(s)) = Aϕ(s, x(s), u(s)),

so that

(2.7) ∫_t^T Aϕ(s, x(s), u(s)) ds = ϕ(T, x(T)) − ϕ(t, x).
Therefore, if ϕ is a smooth subsolution of the DPE, then ϕ(t, x) ≤ J(t, x; u), and we see that ϕ and the value function are related by the inequality

(2.8) ϕ(t, x) ≤ J(t, x).
3. The linear programming formulation. We will use the linear programming terminology of [1], Chapter 3.

Dual pairs. Let b be the function in (H2)–(H3) and define the vector space C(S) := Cb(S) × Cb(Rn), which consists of all pairs l = (l, L) of functions l ∈ Cb(S) and L ∈ Cb(Rn). (Note that condition (H2) implies that
(l0, L0) ∈ C(S).) Moreover, let Db(S) and Db(Rn) be the dual spaces of Cb(S) and Cb(Rn), respectively, and define D(S) as the vector space consisting of pairs λ = (λ1, λ2) of functionals λ1 ∈ Db(S) and λ2 ∈ Db(Rn). Then (C(S), D(S)) is a dual pair with respect to the bilinear form

⟨λ, l⟩ := ⟨λ1, l⟩ + ⟨λ2, L⟩.

Let Mb(S) ⊂ Db(S) and Mb(Rn) ⊂ Db(Rn) be the spaces of measures introduced in Remark 2.1. Then each admissible pair (x(·), u(·)) ∈ P(t, x) defines a pair of measures (M^u, N^u) in Mb(S) × Mb(Rn) by setting, for (l, L) ∈ C(S),
(3.1) ⟨(M^u, N^u), (l, L)⟩ = ⟨M^u, l⟩ + ⟨N^u, L⟩ = ∫_t^T l(s, x(s), u(s)) ds + L(x(T)).

That is, N^u is the Dirac measure at x(T), and M^u satisfies

M^u(A × B × C) = ∫_{[t,T]∩A} I_B(x(s)) I_C(u(s)) ds,

where A, B and C are arbitrary Borel sets in [t, T], Rn and U, respectively. Note that condition (H1) implies that, for each control u ∈ U(t), the trajectory x(s), t ≤ s ≤ T, defined by (2.1) lies in a compact set. Thus ⟨(M^u, N^u), (l, L)⟩ is well defined and finite for each (l, L). Furthermore, if ϕ ∈ Cb1(Ω), we may write (2.7) as
(3.2) ⟨(M^u, N^u), (−Aϕ, ϕT)⟩ = ϕ(t, x),

where ϕT(x) := ϕ(T, x), for x ∈ Rn, denotes the restriction of ϕ to {T} × Rn. On the other hand, from (2.2)–(2.3),

(3.3) J(t, x) = inf_{U(t)} ⟨(M^u, N^u), (l0, L0)⟩.
We shall consider C(S) and D(S) to be endowed with the norms

∥l∥ = ∥(l, L)∥ = max{∥l∥b, ∥L∥b}

and

∥λ∥ = ∥(λ1, λ2)∥ = max{∥λ1∥b, ∥λ2∥b}.

In addition to (D(S), C(S)), we also consider the dual pair (Db1(Ω), Cb1(Ω)), where Db1(Ω) is the dual of Cb1(Ω). Let L2 : Cb1(Ω) → C(S) be the linear map defined by

(3.4) L2ϕ := (−Aϕ, ϕT), ϕ ∈ Cb1(Ω).

By (2.6), L2 is continuous. We now define L1 : D(S) → Db1(Ω) as follows. First, for every λ = (λ1, λ2) ∈ D(S), let Tλ be defined on Cb1(Ω) as Tλ(ϕ) =
⟨λ, L2ϕ⟩. Since L2 is a continuous linear map, so is Tλ. Therefore, there exists a unique γ ∈ Db1(Ω) such that

(3.5) Tλ(ϕ) = ⟨γ, ϕ⟩ (= ⟨λ, L2ϕ⟩).

As this holds for every λ ∈ D(S), we define L1 : D(S) → Db1(Ω) as

(3.6) L1λ := γ,

and note that L1 is the adjoint of L2, i.e., from (3.5),

(3.7) ⟨L1λ, ϕ⟩ = ⟨λ, L2ϕ⟩ ∀λ ∈ D(S), ϕ ∈ Cb1(Ω).

Moreover, from (3.7), (3.4) and (2.5), a direct calculation shows that

∥L1λ∥ = sup{|⟨L1λ, ϕ⟩| : ∥ϕ∥b,1 ≤ 1} ≤ (2 + ∥f∥) ∥λ∥.

Thus, L1 is a continuous linear map.
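The "direct calculation" behind this bound can be written out as follows; using (2.6), ∥ϕT∥b ≤ ∥ϕ∥b ≤ ∥ϕ∥b,1 and the definition of ∥λ∥,

```latex
|\langle L_1\lambda, \varphi\rangle|
= |\langle \lambda, L_2\varphi\rangle|
\le \|\lambda\|\bigl(\|A\varphi\|_b + \|\varphi_T\|_b\bigr)
\le \|\lambda\|\bigl((1+\|f\|)\,\|\varphi\|_{b,1} + \|\varphi\|_{b,1}\bigr)
= (2+\|f\|)\,\|\lambda\|\,\|\varphi\|_{b,1}.
```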
Remark 3.1. Notation. Given a real vector space X with a positive cone X+, we write x ≥ 0 whenever x ∈ X+. Let

C(S)+ := {l = (l, L) ∈ C(S) : l ≥ 0, L ≥ 0}

be the natural positive cone in C(S), and

D(S)+ := {λ = (λ1, λ2) ∈ D(S) : ⟨λ, l⟩ ≥ 0 ∀l ∈ C(S)+}

the corresponding dual cone.
Linear programs. Let l0 be the pair (l0, L0) ∈ C(S), and let δ0 := δ(t,x) ∈ Db1(Ω) be the Dirac measure concentrated at the initial condition (t, x) of (2.1), that is, ⟨δ0, ϕ⟩ = ϕ(t, x) for ϕ ∈ Cb1(Ω). Consider now the following linear program (P) and its dual (P∗).

(P) minimize ⟨λ, l0⟩, subject to:

(3.8) L1λ = δ0, λ ∈ D(S)+.

(P∗) maximize ⟨δ0, ϕ⟩ [= ϕ(t, x)], subject to:

(3.9) L2ϕ ≤ l0, ϕ ∈ Cb1(Ω),

where the latter inequality is understood componentwise, i.e.,

−Aϕ ≤ l0 and ϕT ≤ L0.

Recall that ϕT(·) := ϕ(T, ·) is the restriction of ϕ to {T} × Rn. Let F(P) (resp. F(P∗)) be the set of feasible solutions to (P) (resp. (P∗)); i.e. F(P) (resp. F(P∗)) is the set of pairs λ = (λ1, λ2) in D(S) that satisfy (3.8) (resp. the set of functions ϕ ∈ Cb1(Ω) that satisfy (3.9)).

Consistency. The linear program (P) is said to be consistent if F(P) is nonempty, and similarly for (P∗). The program (P∗) is consistent, since e.g. ϕ(·) ≡ 0 is in F(P∗). On the other hand, (P) is also consistent since F(P) contains the set of all pairs (M^u, N^u) ≥ 0 such that
(x(·), u(·)) ∈ P(t, x); see (3.1). Indeed, by (3.7), the equality L1(M^u, N^u) = δ0 in (3.8) holds if and only if

⟨(M^u, N^u), L2ϕ⟩ = ⟨(M^u, N^u), (−Aϕ, ϕT)⟩ = ϕ(t, x) ∀ϕ ∈ Cb1(Ω),

which is the same as (3.2) for (λ1, λ2) = (M^u, N^u). The latter also implies, from (3.3), that

J(t, x) = inf_{U(t)} ⟨(M^u, N^u), (l0, L0)⟩ ≥ inf_{F(P)} ⟨λ, l0⟩ =: inf(P),

i.e. the value function J and the value inf(P) of (P) are related by

J(t, x) ≥ inf(P).

Furthermore, denoting by sup(P∗) the value of (P∗), weak duality yields [1]

inf(P) ≥ sup(P∗);

hence,

(3.10) J(t, x) ≥ inf(P) ≥ sup(P∗).
4. Absence of duality gap. In this section we prove that there is no duality gap (see (4.1)) and that (P) is solvable. More precisely, we have the following theorem.

Theorem 4.1. If the hypotheses (H1)–(H3) hold, then there is no duality gap and (P) is solvable, i.e.

(4.1) sup(P∗) = inf(P),

and there exists an optimal solution λ∗ ∈ D(S)+ for (P), so that

sup(P∗) = min(P) = ⟨λ∗, l0⟩.
Proof. We use Theorems 3.10 and 3.22 of [1], which state that if (P) is consistent with a finite value, and the set

(4.2) D := {(L1λ, ⟨λ, l0⟩) : λ ∈ D(S)+}

is closed in Db1(Ω) × R, then there is no duality gap between (P) and (P∗), and (P) is solvable. Thus, since we have seen that (P) is consistent (and its value is finite, by (3.10)), it suffices to show that the set D in (4.2) is closed. Let Δ be a directed set, and let {λα = (λ1α, λ2α) : α ∈ Δ} be a net in D(S)+ such that (L1λα, ⟨λα, l0⟩) converges to (γ, r) in Db1(Ω) × R, i.e.

(4.3) r = lim ⟨λα, l0⟩

and

(4.4) γ = lim L1λα
in the weak topology σ(Db1(Ω), Cb1(Ω)). We wish to show that (γ, r) is in D, i.e. that there exists λ = (λ1, λ2) ∈ D(S)+ such that

(4.5) r = ⟨λ, l0⟩ and γ = L1λ.

By (4.3), given ε > 0, there exists α(ε) such that, for all α ≥ α(ε),

(4.6) r − ε ≤ ⟨λα, l0⟩ = ⟨λ1α, l0⟩ + ⟨λ2α, L0⟩ ≤ r + ε.

Therefore, for any α ≥ α(ε) and l ∈ Cb(S),

|⟨λ1α, l⟩| ≤ ⟨λ1α, |l|⟩ ≤ ∥l∥b ⟨λ1α, b⟩ ≤ ∥l∥b ⟨λ1α, l0⟩ ∥b/l0∥ [by (H2)] ≤ ∥l∥b ∥b/l0∥ (r + ε) [by (4.6)];

that is, {λ1α : α ≥ α(ε)} is a bounded family in Db(S). Similarly, {λ2α : α ≥ α(ε)} is a bounded family in Db(Rn), since for all α ≥ α(ε) and L ∈ Cb(Rn),

|⟨λ2α, L⟩| ≤ ⟨λ2α, |L|⟩ ≤ ∥L∥b ⟨λ2α, b⟩ ≤ ∥L∥b ⟨λ2α, L0⟩ ∥b/L0∥ [by (H2)] ≤ ∥L∥b ∥b/L0∥ (r + ε) [by (4.6)].

Thus {λα : α ≥ α(ε)} is bounded and, therefore, there exist a subnet {λβ} and a pair λ = (λ1, λ2) ∈ D(S)+ such that {λβ} converges to λ. This convergence, together with (4.3), yields ⟨λ, l0⟩ = r, whereas the continuity of L1 and (4.4) give

L1λ = L1(lim λβ) = lim L1λβ = γ.

That is, (4.5) holds.
5. Equivalence of (P) and the OCP. In this section we prove that the original OCP (2.1)–(2.3) and the linear program (P) are equivalent in the sense of the following theorem.

Theorem 5.1. Assume (H1)–(H3). Then min(P) = J(t, x).

Moreover, from (4.1) and Theorem 5.1 we obtain J(t, x) = sup(P∗). In other words:

Corollary. Under (H1)–(H3), the value function J is the supremum of the smooth subsolutions of the DPE.

In the proof of Theorem 5.1 we use the following key result, which is proved in the next section.
Theorem 5.2. For every ε > 0 there exist functions Jε, Lε and Φε, with Jε ∈ Cb1(Ω), such that

(5.1) ∥Jε − J∥b → 0 as ε → 0,
(5.2) Jε(T, x) = Lε(x), ∥L0 − Lε∥b → 0 as ε → 0,
(5.3) AJε + l0 ≥ −Φε,

where

(5.4) ∥Φε∥b → 0 as ε → 0.
Proof of Theorem 5.1. From (3.10) and the solvability of (P) (Theorem 4.1), we know that min(P) ≤ J(t, x). Suppose that min(P) < J(t, x). Then there exists λ ∈ F(P) such that

(5.5) ⟨λ, l0⟩ < J(t, x).

Thus, from (5.3),

⟨λ, l0⟩ ≥ ⟨λ1, −AJε − Φε⟩ + ⟨λ2, Lε⟩ + ⟨λ2, L0 − Lε⟩
≥ ⟨λ1, −AJε⟩ + ⟨λ2, Lε⟩ − ∥Φε∥b ∥λ1∥b − ∥λ2∥b ∥L0 − Lε∥b
= ⟨λ, L2Jε⟩ − ∥Φε∥b ∥λ1∥b − ∥λ2∥b ∥L0 − Lε∥b
= ⟨L1λ, Jε⟩ − ∥Φε∥b ∥λ1∥b − ∥λ2∥b ∥L0 − Lε∥b
= Jε(t, x) − ∥Φε∥b ∥λ1∥b − ∥λ2∥b ∥L0 − Lε∥b by (3.8).

From (5.1)–(5.2) and (5.4), letting ε → 0 it follows that ⟨λ, l0⟩ ≥ J(t, x), which contradicts (5.5).
6. Approximation of the value function. In this section we prove the approximation Theorem 5.2. We will do this via several lemmas, from which we obtain a particular approximation to the optimal cost function.

We first extend our control problem to a larger time interval. Put

f(t, x, u) := f(0, x, u) and l0(t, x, u) := l0(0, x, u) if t < 0;
f(t, x, u) := f(T, x, u) and l0(t, x, u) := l0(T, x, u) if t > T.

For each ε > 0, define Ωε := [−ε, T + ε] × Rn, Sε := Ωε × U, and Uε(t) as the set of Borel measurable functions u : [t, T + ε] → U, −ε ≤ t < T + ε. Note that, thus defined, the extensions of l0 and f to Ωε and Sε satisfy (H1) and (H2). Define

J^ε(t, x; u) := ∫_t^{T+ε} l0(r, x(r), u(r)) dr + L0(x(T + ε)),
where

(6.1) x′(r) = f(r, x(r), u(r)), t < r ≤ T + ε, x(t) = x.

The value function J^ε is defined as

J^ε(t, x) := inf_{Uε(t)} J^ε(t, x; u).

Note that ε = 0 yields the original OCP. We shall now establish properties of the value function J^ε. Below, C stands for a generic constant whose values may be different in different formulas.
Lemma 6.1. There exists C such that for all ε < 1,

J^ε(t, x) ≤ Cb(x) ∀(t, x) ∈ Ωε.

Proof. From (H3) it follows that

(6.2) b(y) ≤ b(x)(1 + c|y − x|) ≤ b(x)e^{c|x−y|}

for all |x − y| < ε0. By induction, one can show the validity of (6.2) for all x, y ∈ Rn. From (6.1) and (H1) we obtain, for each u ∈ Uε(t) and r ≥ t,

(6.3) |x(r) − x| ≤ K(r − t).

Then, by (H2) and (6.2)–(6.3),

J^ε(t, x; u) ≤ ∫_t^{T+ε} b(x(r)) dr + b(x(T + ε))
≤ ∫_t^{T+ε} b(x)e^{c|x(r)−x|} dr + b(x)e^{c|x(T+ε)−x|}
≤ b(x) [∫_t^{T+ε} e^{cK(r−t)} dr + e^{cK(T+ε−t)}] ≤ Cb(x).

Taking the infimum over Uε(t) yields the lemma.
Lemma 6.2. There exist δ1 > 0 and C > 0 such that for all ε < 1 and |x − y|, |s − t| < δ1,

|J^ε(t, x) − J^ε(s, y)| ≤ C[|x − y| + |s − t|] b(x).

Proof. Assume t < s and let u ∈ Uε(t) be an arbitrary control function. Put

x1′(r) = f(r, x1(r), u(r)), t < r ≤ T + ε, with x1(t) = x,
x2′(r) = f(r, x2(r), u(r)), s < r ≤ T + ε, with x2(s) = y.

Then
(6.4) |J^ε(t, x; u) − J^ε(s, y; u)|
≤ ∫_t^s l0(r, x1(r), u(r)) dr
+ ∫_s^{T+ε} |l0(r, x1(r), u(r)) − l0(r, x2(r), u(r))| dr
+ |L0(x1(T + ε)) − L0(x2(T + ε))|
=: I1 + I2 + I3.

Using (6.2), (6.3) and (H2), we have

(6.5) I1 ≤ ∫_t^s b(x1(r)) dr ≤ ∫_t^s b(x)e^{c|x1(r)−x|} dr
≤ b(x) ∫_t^s e^{cK(r−t)} dr ≤ b(x)e^{cK(T+2)}(s − t).

We now majorize I2. From (6.3), |x1(s) − x| ≤ K|s − t|; hence

(6.6) |x1(s) − y| ≤ |x − y| + K|s − t|.

Consequently, by (H1),

(6.7) |x1(r) − x2(r)| ≤ |x1(s) − y| + ∫_s^r |f(z, x1(z), u(z)) − f(z, x2(z), u(z))| dz
≤ |x − y| + K|t − s| + ∫_s^r c|x1(z) − x2(z)| dz.

Thus, Gronwall's inequality implies

(6.8) |x1(r) − x2(r)| ≤ [|x − y| + K|t − s|] e^{c(r−s)}.
Taking δ1 < 1 such that (K + 1)δ1 e^{c(T+2)} < ε0, we have (see condition (H3))

(6.9) I2 ≤ ∫_s^{T+ε} |l0(r, x1(r), u(r)) − l0(r, x2(r), u(r))| dr
≤ ∫_s^{T+ε} c b(x1(r)) [|x − y| + K|t − s|] e^{c(r−s)} dr
≤ c[|x − y| + K|t − s|] e^{c(T+2)} b(x) ∫_s^{T+ε} e^{cK(r−t)} dr
≤ C[|x − y| + |t − s|] b(x).

Similarly, using (6.8), (6.2), (6.3) and (H3), we may majorize I3 as follows:

(6.10) I3 = |L0(x1(T + ε)) − L0(x2(T + ε))|
≤ c b(x1(T + ε)) |x2(T + ε) − x1(T + ε)|
≤ c b(x)e^{cK(T+ε−t)} (|x − y| + K|t − s|) e^{c(T+2)}
≤ C b(x)(|x − y| + |t − s|).

Combining (6.4), (6.5), (6.9) and (6.10) and taking the supremum over all control functions u(·), we complete the proof of the lemma, since

|J^ε(t, x) − J^ε(s, y)| ≤ sup_{Uε(t)} |J^ε(t, x; u) − J^ε(s, y; u)|.
Remark 6.3. From Lemma 6.2 it follows that J^ε is differentiable for almost all (t, x) ∈ Ωε, and |DiJ^ε(t, x)| ≤ Cb(x), i = 0, 1, ..., n.
Lemma 6.4. There exists C > 0 such that for all (t, x) ∈ Ω and all sufficiently small ε > 0,

(6.11) |J^ε(t, x) − J(t, x)| ≤ Cεb(x).

Proof. Let 0 ≤ t ≤ T and let u(·) be any control function in Uε(t). Let

x′(r) = f(r, x(r), u(r)), t < r ≤ T + ε, x(t) = x.

Then (H3) and the inequalities (6.2) and (6.3) show that for ε < 1,

(6.12) |J^ε(t, x; u) − J(t, x; u)|
≤ ∫_T^{T+ε} l0(r, x(r), u(r)) dr + |L0(x(T + ε)) − L0(x(T))|
≤ ∫_T^{T+ε} b(x)e^{c|x(r)−x|} dr + c b(x(T)) |x(T + ε) − x(T)|
≤ εb(x)e^{cK(T+ε)} + c b(x)e^{cK(T−t)} Kε ≤ Cεb(x).

Finally, as in the proof of Lemma 6.2, taking the supremum over all u(·), we get (6.11).
Lemma 6.5. There exist δ2 > 0 and C > 0 such that for any ε < δ2, any initial condition (t, x) ∈ Ω, and any sufficiently small 0 < h < ε and u ∈ U,

(6.13) J^ε(t, x) ≤ l0(t, x, u)h + J^ε(t + h, x + f(t, x, u)h) + Cεhb(x).

Proof. Let u ∈ U be fixed and let

x′(r) = f(r, x(r), u), t < r ≤ T + ε, x(t) = x.

The dynamic programming principle [4, p. 9] implies

(6.14) J^ε(t, x) ≤ ∫_t^{t+h} l0(r, x(r), u) dr + J^ε(t + h, x(t + h)) =: I1 + I2.

Using (H3) and (6.3), we get

(6.15) |I1 − l0(t, x, u)h| ≤ ∫_t^{t+h} |l0(r, x(r), u) − l0(r, x, u)| dr
+ ∫_t^{t+h} |l0(r, x, u) − l0(t, x, u)| dr
≤ ∫_t^{t+h} c|x(r) − x| b(x) dr + ∫_t^{t+h} c b(x)(r − t) dr
≤ c b(x) ∫_t^{t+h} K(r − t) dr + c b(x)h²/2
≤ c(K + 1)h²b(x)/2.

By virtue of (H3), the inequality (6.15) is valid for h such that |x(r) − x| < ε0 for all t ≤ r ≤ t + h. This requirement is satisfied by choosing h ≤ ε0/K. On the other hand, using Lemma 6.2, (H1) and (H3), we get

(6.16) |I2 − J^ε(t + h, x + f(t, x, u)h)| ≤ Cb(x) |x(t + h) − x − f(t, x, u)h|
≤ Cb(x) ∫_t^{t+h} |f(r, x(r), u) − f(t, x, u)| dr
≤ Cb(x) [∫_t^{t+h} |f(r, x(r), u) − f(r, x, u)| dr + ∫_t^{t+h} |f(r, x, u) − f(t, x, u)| dr]
≤ Cb(x) [∫_t^{t+h} c|x(r) − x| dr + εh]
≤ Cb(x)(cKh²/2 + εh) ≤ Cb(x)εh(cK/2 + 1).

In (6.16), h is chosen so small that |f(r, x, u) − f(t, x, u)| < ε for r ∈ [t, t + h]. Since h < ε, the inequalities (6.14)–(6.16) yield (6.13).
Remark 6.6. From Remark 6.3 it follows that, subtracting J^ε(t, x) from both sides of (6.13), dividing by h and letting h → 0, we get

(6.17) 0 ≤ l0(t, x, u) + AJ^ε(t, x, u) + Cεb(x)

for almost all (t, x) ∈ Ω, and all u ∈ U.
We shall now use J^ε to construct a smooth approximation of J. Let ρε(t, x) be an infinitely differentiable nonnegative function such that ρε(t, x) = 0 if |t| + |x| > ε and

∫_{−ε}^{ε} ∫_{Rn} ρε(t, x) dx dt = 1.

For (t, x) ∈ Ω define the convolution

(6.18) Jε(t, x) := ρε ∗ J^ε(t, x) = ∫_{t−ε}^{t+ε} ∫_{Bε(x)} ρε(t − s, x − y) J^ε(s, y) dy ds
= ∫_{−ε}^{ε} ∫_{Bε(0)} ρε(s, y) J^ε(t − s, x − y) dy ds,

where Bε(x) is the ball in Rn with radius ε and center x.
Lemma 6.7. Jε belongs to Cb1(Ω).

Proof. Continuous differentiability of Jε is obvious from its definition. On the other hand, applying Lemmas 6.1 and 6.2 to J^ε, we see that

(6.19) Jε(t, x) ≤ J^ε(t, x) + sup_{|s−t|,|x−y|<ε} |J^ε(s, y) − J^ε(t, x)|
≤ Cb(x) + 2Cεb(x) = (1 + 2ε)Cb(x).

Let δ1 be as in Lemma 6.2. From (6.18) we see that for each ε < δ1 and each (t, x), (s, y) subject to |t − s|, |x − y| < δ1,

(6.20) |Jε(t, x) − Jε(s, y)|
= |∫_{−ε}^{ε} ∫_{Bε(0)} ρε(r, z)[J^ε(t − r, x − z) − J^ε(s − r, y − z)] dz dr|
≤ ∫_{−ε}^{ε} ∫_{Bε(0)} ρε(r, z) C(|s − t| + |x − y|) b(x) dz dr
= C(|s − t| + |x − y|) b(x).

Inequality (6.20) shows that

(6.21) |DiJε| ≤ Cb(x).

Combining (6.21) and (6.19), we get the statement of the lemma.
Lemma 6.8. ∥Jε − J∥b → 0 as ε → 0.

Proof. In view of Lemma 6.4, it is sufficient to show that

(6.22) ∥Jε − J^ε∥b → 0 as ε → 0.

From (6.18),

(6.23) |Jε(t, x) − J^ε(t, x)|
≤ ∫_{−ε}^{ε} ∫_{Bε(0)} ρε(r, z) |J^ε(t − r, x − z) − J^ε(t, x)| dz dr
≤ ∫_{−ε}^{ε} ∫_{Bε(0)} ρε(r, z) sup_{|t−s|,|x−y|<ε} |J^ε(s, y) − J^ε(t, x)| dz dr
≤ ∫_{−ε}^{ε} ∫_{Bε(0)} ρε(r, z) [sup_{|t−s|,|x−y|<ε} C(|t − s| + |x − y|) b(x)] dz dr
≤ 2Cεb(x),

where the last inequality in (6.23) follows from Lemma 6.2.
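A quick numerical sketch of the mechanism behind (6.18) and Lemma 6.8 (ours, purely illustrative, in one space dimension and with no time variable): mollifying a Lipschitz function with a bump kernel of width ε perturbs it by at most its modulus of continuity over an ε-ball, so the sup-norm error is O(ε).

```python
import numpy as np

# One-dimensional sketch of the mollification used in (6.18) (illustrative
# only): a smooth bump kernel of width eps, normalized to total mass 1, is
# applied to the Lipschitz function g(x) = |x|.  Because the smoothed value
# is a convex average of g over an eps-ball, the sup-norm error is at most
# Lip(g) * eps, mirroring the estimate in Lemma 6.8.
def mollify(g, eps, xs, nodes=201):
    y = np.linspace(-eps, eps, nodes)[1:-1]      # quadrature nodes in (-eps, eps)
    w = np.exp(-1.0 / (1.0 - (y / eps) ** 2))    # standard C^infinity bump profile
    w /= w.sum()                                 # discrete normalization
    return np.array([np.dot(w, g(x - y)) for x in xs])

g = np.abs                                       # Lipschitz with constant 1
xs = np.linspace(-1.0, 1.0, 401)
errs = {eps: float(np.max(np.abs(mollify(g, eps, xs) - g(xs))))
        for eps in (0.2, 0.05)}
print(errs)                                      # sup-norm error shrinks with eps
```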
To conclude this section, we shall use the previous lemmas to prove Theorem 5.2.

Proof of Theorem 5.2. Let Lε(x) := Jε(T, x). Then, from Lemma 6.8 and the equality J(T, x) = L0(x), we have

(6.24) ∥Jε − J∥b → 0 and ∥Lε − L0∥b → 0 as ε → 0,

which proves (5.1)–(5.2). Now, from (6.17) it follows that

(6.25) 0 ≤ (l0)ε + (AJ^ε)ε + Cε bε on S,

where gε := ρε ∗ g denotes convolution with ρε (in the variables (t, x), for each fixed u).
Thus, to complete the proof of the theorem it suffices to show that, as ε → 0: (i) ∥(AJ^ε)ε − AJε∥b → 0, (ii) ∥(l0)ε − l0∥b → 0, and (iii) supε ∥bε∥b < ∞.

Fix ε < δ2 (where δ2 is the same as in Lemma 6.5) and (t, x, u) ∈ S. Since D0 commutes with the convolution, the terms involving D0 cancel, and (i) follows from

(6.26) (1/b(x)) |(AJ^ε)ε(t, x, u) − AJε(t, x, u)|
= (1/b(x)) |∫_{−ε}^{ε} ∫_{Bε(0)} ∑_{i=1}^n fi(t − r, x − z, u) DiJ^ε(t − r, x − z) ρε(r, z) dz dr − ∑_{i=1}^n fi(t, x, u) DiJε(t, x)|
= (1/b(x)) |∫_{−ε}^{ε} ∫_{Bε(0)} ∑_{i=1}^n [fi(t − r, x − z, u) − fi(t, x, u)] DiJ^ε(t − r, x − z) ρε(r, z) dz dr|
≤ C ∑_{i=1}^n ωε(fi) ∥DiJ^ε∥b,

where ωε(fi) denotes the modulus of continuity of fi. We now prove (ii) using (H3):
(6.27) (1/b(x)) |(l0)ε(t, x, u) − l0(t, x, u)|
≤ (1/b(x)) ∫_{−ε}^{ε} ∫_{Bε(0)} |l0(t − r, x − z, u) − l0(t, x, u)| ρε(r, z) dz dr
≤ (1/b(x)) ∫_{−ε}^{ε} ∫_{Bε(0)} c b(x)[|r| + |z|] ρε(r, z) dz dr ≤ 2cε.
Finally, to prove (iii) we use (6.2):

(6.28) (1/b(x)) ∫_{−ε}^{ε} ∫_{Bε(0)} b(x − z) ρε(r, z) dz dr
≤ ∫_{−ε}^{ε} ∫_{Bε(0)} ρε(r, z) e^{c|z|} dz dr = const.

Combining (6.25)–(6.28), and setting Φε := [(l0)ε − l0] + [(AJ^ε)ε − AJε] + Cε bε, we obtain (5.3) and (5.4), and the proof of the theorem is complete.
References

[1] E. J. Anderson and P. Nash, Linear Programming in Infinite-Dimensional Spaces, Wiley, Chichester, 1989.
[2] W. H. Fleming, Generalized solutions and convex duality in optimal control, in: Partial Differential Equations and the Calculus of Variations, Vol. I, F. Colombini et al. (eds.), Birkhäuser, Boston, 1989, 461–471.
[3] W. H. Fleming and R. W. Rishel, Deterministic and Stochastic Optimal Control, Springer, New York, 1975.
[4] W. H. Fleming and H. M. Soner, Controlled Markov Processes and Viscosity Solutions, Springer, New York, 1992.
[5] W. H. Fleming and D. Vermes, Generalized solutions in the optimal control of diffusions, IMA Vol. Math. Appl. 10, W. H. Fleming and P. L. Lions (eds.), Springer, New York, 1988, 119–127.
[6] W. H. Fleming and D. Vermes, Convex duality approach to the optimal control of diffusions, SIAM J. Control Optim. 27 (1989), 1136–1155.
[7] O. Hernández-Lerma, Existence of average optimal policies in Markov control processes with strictly unbounded costs, Kybernetika (Prague) 29 (1993), 1–17.
[8] O. Hernández-Lerma and D. Hernández-Hernández, Discounted cost Markov decision processes on Borel spaces: the linear programming formulation, J. Math. Anal. Appl. 183 (1994), 335–351.
[9] O. Hernández-Lerma and J. B. Lasserre, Linear programming and average optimality of Markov control processes on Borel spaces: unbounded costs, SIAM J. Control Optim. 32 (1994), 480–500.
[10] J. L. Kelley, General Topology, Van Nostrand, New York, 1957.
[11] R. M. Lewis and R. B. Vinter, Relaxation of optimal control problems to equivalent convex programs, J. Math. Anal. Appl. 74 (1980), 475–493.
[12] J. E. Rubio, Control and Optimization, Manchester Univ. Press, Manchester, 1986.
[13] R. H. Stockbridge, Time-average control of martingale problems: a linear programming formulation, Ann. Probab. 18 (1990), 206–217.
Daniel Hernández-Hernández
Departamento de Matemáticas, UAM-I
Apartado postal 55-534
México D.F., Mexico

Onésimo Hernández-Lerma
Departamento de Matemáticas, CINVESTAV-IPN
Apartado postal 14-740
07000 México D.F., Mexico
E-mail: [email protected]

Michael Taksar
Department of Applied Mathematics
State University of New York at Stony Brook
Stony Brook, NY 11794, U.S.A.

Received on 9.2.1995; revised version on 29.8.1995