So far we have chatted about optimal control problems in the absence of any uncertainty. However, as we discussed earlier, real-world systems often have some uncertainty. So let's add uncertainty back into our optimal control discussion. We will primarily deal with non-deterministic uncertainty, where the disturbance is bounded, $d \in [\underline{d}, \bar{d}]$.
In the presence of uncertainty, the optimal control problem is often formulated as a robust optimal control problem, where the uncertainty tries to maximize the cost whereas the control tries to minimize it. We will discuss the problem in continuous time; the discrete-time formulation can be defined analogously.
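As a sketch, the continuous-time problem typically takes the following min-max form, with running cost $L$, terminal cost $\ell$, dynamics $f$, horizon $[t, T]$, and admissible sets $\mathcal{U}$ and $[\underline{d}, \bar{d}]$ for the control and disturbance (the exact notation may differ):

$$
\min_{u(\cdot)} \; \max_{d(\cdot)} \; \int_{t}^{T} L\big(x(s), u(s), d(s)\big)\, ds + \ell\big(x(T)\big)
\quad \text{s.t.} \quad \dot{x}(s) = f\big(x(s), u(s), d(s)\big), \quad u(s) \in \mathcal{U}, \; d(s) \in [\underline{d}, \bar{d}].
$$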
In the above problem, control is trying to minimize the cost function and disturbance is trying to maximize it. In other words, the control is optimizing the worst-case cost.
Note that we have modeled the uncertainty as if it were acting adversarially against us; the real world typically does not behave this way. We make this choice because we do not know what uncertainty we will actually face: if we plan against the worst-case uncertainty, the resulting performance is guaranteed no matter how the uncertainty is actually realized.
The above problem is a game between the control and the disturbance, since their decisions are intertwined (through the dynamics). Since the control and disturbance inputs are functions of time, it is a dynamic game. Finally, since the cost function of the disturbance is the negative of that of the control, it is a zero-sum game. Thus, the robust optimal control problem is an instance of a zero-sum dynamic game.
Zero-sum dynamic games have been studied extensively by Rufus Isaacs, who came up with a "dynamic programming" principle for zero-sum games. In fact, there has been some dispute about who originally founded dynamic programming.
Isaacs studied these games in the context of pursuit-evasion games, where the goal of the pursuer (disturbance) is to catch an evader vehicle (control) that is trying to escape.
When we have two players that are playing against each other, their optimal strategy will depend on what information each has access to. To understand this, consider the following example:
Suppose there are two boxes, each with two slots, and each slot contains some prize money. Player 1 wants to maximize the prize money it receives, whereas Player 2, who has to pay out this prize money, wants to minimize it.
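For concreteness, the four prize amounts implied by the outcomes described below are (rows are P1's boxes, columns are P2's slots):

            Slot 1    Slot 2
  Box 1      2000        10
  Box 2         1      6000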
First, suppose P1 has to pick a box first, and P2 then picks a slot knowing which box P1 chose:

If P1 picks Box 1, P2 picks Slot 2. Reward -> 10
If P1 picks Box 2, P2 picks Slot 1. Reward -> 1

The best P1 can then do is to pick Box 1 and get a reward of 10. Mathematically, this is written as:

$$\max_{u \in \{B_1, B_2\}} \left( \min_{d \in \{S_1, S_2\}} J(u, d) \right) = 10$$
Now suppose instead that P2 has to pick a slot first, and P1 then picks a box knowing which slot P2 chose:

If P2 picks Slot 1, P1 will pick Box 1. Reward -> 2000
If P2 picks Slot 2, P1 will pick Box 2. Reward -> 6000

The best strategy for P2 is then to pick Slot 1 and pay P1 a reward of 2000. Mathematically, this is written as:

$$\min_{d \in \{S_1, S_2\}} \left( \max_{u \in \{B_1, B_2\}} J(u, d) \right) = 2000$$
Clearly, the outcome of the game depends on when, and with what information, each player decides on its input. In general, whenever we have a quantity of the form $\max \min J(\cdot)$, the max is computed second but plays first, and the min is computed first but plays second.
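To make the role of the order of play concrete, here is a small Python sketch that computes both values for the box-and-slot game, using the prize amounts implied by the outcomes listed above:

```python
import numpy as np

# Payoff matrix J(u, d): rows are P1's boxes (B1, B2), columns are P2's slots (S1, S2).
# The entries are the prize amounts implied by the outcomes described above.
J = np.array([[2000,   10],
              [   1, 6000]])

# P1 commits to a box first; P2 then picks the slot that minimizes the payout.
# This is max_u min_d J(u, d).
maximin = J.min(axis=1).max()

# P2 commits to a slot first; P1 then picks the box that maximizes the prize.
# This is min_d max_u J(u, d).
minimax = J.max(axis=0).min()

print(maximin, minimax)  # 10 2000
```

As expected, the player that acts second (with knowledge of the opponent's move) does better, and $\max_u \min_d J \leq \min_d \max_u J$.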
The same order and information aspects are important for our zero-sum dynamic games. In addition to the order of play, it also matters what information each agent has access to when making its decision at a given time $s \in [t, T]$.
The above formulations are not suitable for describing most practical systems: typically, at time $t$, each player only has access to information up to time $t$, and cannot anticipate the other player's future inputs. Strategies that respect this information constraint are called non-anticipative strategies.
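One common way to formalize this (for example, for the disturbance player) is the following: a strategy $\gamma$ that maps control signals to disturbance signals is non-anticipative if, whenever two control signals agree up to time $s$, the corresponding disturbance signals also agree up to time $s$:

$$
u_1(\tau) = u_2(\tau) \;\; \forall \tau \in [t, s] \quad \Longrightarrow \quad \gamma[u_1](\tau) = \gamma[u_2](\tau) \;\; \forall \tau \in [t, s].
$$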
Under non-anticipative strategies, it can be shown that the value function exists. The robust value function can then be obtained by applying the principle of dynamic programming as before. Specifically, it satisfies a Hamilton-Jacobi-Isaacs (HJI) type equation.
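In its typical form, with running cost $L(x)$, dynamics $\dot{x} = f(x, u, d)$, terminal cost $\ell(x)$, and writing $p = \nabla V$, this equation reads (sign and min/max ordering conventions may vary):

$$
-\frac{\partial V}{\partial t}(x, t) = H\big(x, \nabla V(x, t)\big), \qquad
H(x, p) = L(x) + \min_{u \in \mathcal{U}} \max_{d \in \mathcal{D}} \; p \cdot f(x, u, d), \qquad V(x, T) = \ell(x).
$$

For a simple system where the control and disturbance enter additively in each coordinate, the Hamiltonian can be computed in closed form, as worked out below.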
$$
\begin{aligned}
H(x, V) &= L(x) + \min_{v_x, v_y} \max_{d_x, d_y} \left( p_1 v_x + p_1 d_x + p_2 v_y + p_2 d_y \right) \\
&= L(x) + \min_{v_x} p_1 v_x + \max_{d_x} p_1 d_x + \min_{v_y} p_2 v_y + \max_{d_y} p_2 d_y \quad \text{(this can be done since the $v_x$, $v_y$, $d_x$, $d_y$ terms are separable)} \\
&= L(x) - |p_1| v_{\max} + |p_1| d_{\max} - |p_2| v_{\max} + |p_2| d_{\max} \\
&= L(x) + (|p_1| + |p_2|)(d_{\max} - v_{\max})
\end{aligned}
$$
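As a quick sanity check, here is a small Python sketch that verifies the closed-form Hamiltonian above by brute force. It assumes the setup implied by the computation: dynamics $\dot{x} = v_x + d_x$, $\dot{y} = v_y + d_y$, with bounds $|v_x|, |v_y| \leq v_{\max}$ and $|d_x|, |d_y| \leq d_{\max}$ (the function names and example numbers are only for illustration):

```python
import itertools
import numpy as np

def hamiltonian_closed_form(p1, p2, v_max, d_max):
    # Closed-form expression derived above (dropping the common L(x) term).
    return (abs(p1) + abs(p2)) * (d_max - v_max)

def hamiltonian_brute_force(p1, p2, v_max, d_max, n=21):
    # min over (vx, vy) and max over (dx, dy) of p1*(vx + dx) + p2*(vy + dy),
    # evaluated on a grid over the assumed sets |v_i| <= v_max, |d_i| <= d_max.
    vs = np.linspace(-v_max, v_max, n)
    ds = np.linspace(-d_max, d_max, n)
    best = np.inf
    for vx, vy in itertools.product(vs, vs):
        worst = max(p1 * (vx + dx) + p2 * (vy + dy)
                    for dx, dy in itertools.product(ds, ds))
        best = min(best, worst)
    return best

# Example gradient p = (p1, p2) and bounds; the optimizers lie at the corners of the
# boxes, so the coarse grid search recovers the closed-form value exactly.
p1, p2, v_max, d_max = 0.7, -1.3, 1.0, 0.5
print(hamiltonian_closed_form(p1, p2, v_max, d_max))   # -1.0
print(hamiltonian_brute_force(p1, p2, v_max, d_max))   # -1.0 (up to floating point)
```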