OPTIMIZATION AND CONTROL
Richard Weber
Contents
DYNAMIC PROGRAMMING
1 Dynamic Programming: The Optimality Equation
1.1 Control as optimization over time
1.2 The principle of optimality
1.3 Example: the shortest path problem
1.4 The optimality equation
1.5 Markov decision processes
2 Some Examples of Dynamic Programming
2.1 Example: managing spending and savings
2.2 Example: exercising a stock option
2.3 Example: accepting the best offer
3 Dynamic Programming over the Infinite Horizon
3.1 Discounted costs
3.2 Example: job scheduling
3.3 The infinite-horizon case
3.4 The optimality equation in the infinite-horizon case
3.5 Example: selling an asset
4 Positive Programming
4.1 Example: possible lack of an optimal policy
4.2 Characterization of the optimal policy
4.3 Example: optimal gambling
4.4 Value iteration
4.5 Example: pharmaceutical trials
5 Negative Programming
5.1 Stationary policies
5.2 Characterization of the optimal policy
5.3 Optimal stopping over a finite horizon
5.4 Example: optimal parking
5.5 Optimal stopping over the infinite horizon
6 Average-cost Programming
6.1 Average-cost optimization
6.2 Example: admission control at a queue
6.3 Value iteration bounds
6.4 Policy improvement
LQG SYSTEMS
7 LQ Models
7.1 The LQ regulation model
7.2 The Riccati recursion
7.3 White noise disturbances
7.4 Continuous-time LQ regulation
7.5 Linearization of nonlinear models
8 Controllability
RI” Disturbances
82 Tracking
8.4 Controllability in continuous-time
8.5 Example: broom balancing
9 Infinite Horizon Limits
9.1 Example: satellite in a plane orbit
9.2 Stabilizability
9.3 Stabilizability in continuous-time
9.4 Example: pendulum
9.5 Infinite-horizon LQ regulation
9.6 The [A, B, C] system
10 Observability
10.1 Observability
10.2 Observability in continuous-time
10.3 Examples
11 Kalman Filtering and Certainty Equivalence
11.2 The Kalman filter
11.3 Certainty equivalence
11.4 Example: inertialess rocket with noisy position sensing
CONTINUOUS-TIME MODELS
12 Dynamic Programming in Continuous Time
12.1 The optimality equation
12.2 Example: LQ regulation
12.3 Example: estate planning
12.4 Example: harvesting
13 Pontryagin's Maximum Principle
13.1 Heuristic derivation
13.2 Example: bringing a particle to rest in minimal time
13.3 Connection with Lagrangian multipliers
13.4 Example: use of the transversality conditions
14 Applications of the Maximum Principle
14.1 Problems with terminal conditions
14.2 Example: monopolist
14.3 Example: insects as optimizers
14.4 Example: rocket thrust optimization
15 Controlled Markov Jump Processes
15.1 The dynamic programming equation
15.3 Uniformization in the infinite horizon case
15.4 Example: admission control at a queue
16 Controlled Diffusion Processes
16.1 Diffusion processes and controlled diffusion processes
16.2 Example: LQG in continuous time
6A Addenduot on PNP. cova subject wo comer Loot
Schedules
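The first 6 lectures are devoted to dynamic programming in discrete time and cover both finite and infinite-horizon problems; discounted-cost, positive, negative and average-cost programming; value iteration and policy improvement.
The next 5 lectures are devoted to the LQG model (linear systems, quadratic costs) and cover the important ideas of controllability and observability; the Riccati equation; imperfect observation, certainty equivalence and the Kalman filter.
The final 5 lectures are devoted to continuous-time models and include treatment of Pontryagin's maximum principle and the Hamiltonian, and a brief introduction to diffusion processes.
Each of the 16 lectures is designed to be a somewhat self-contained unit, e.g., there will be one lecture on 'Negative Programming', one on 'Controllability', etc. Examples and applications are important in this course, so there are one or more worked examples in each lecture.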
Examples sheets
There are three examples sheets, corresponding to the thirds of the course. There are two or three questions for each lecture, some theoretical and some of a problem nature. Each question is marked to indicate the lecture with which it is associated.
Lecture Notes and Handouts
There are printed lecture notes for the course and other occasional handouts. There are sheets summarising notation and what you are expected to know for the exams. The notes include a list of keywords and I will be drawing your attention to these as we go along. If you have a good grasp of the meaning of each of these keywords, then you will be well on your way to understanding the important concepts of the course.
WWW pages
http://www.statslab.cam.ac.uk/~rrw1/oc/index.html
Books
The following books are recommended.
D. P. Bertsekas, Dynamic Programming, Prentice Hall, 1987.
D. P. Bertsekas, Dynamic Programming and Optimal Control
L. M. Hocking, Optimal Control: An introduction to the theory and applications, Oxford
S. Ross, Introduction to Stochastic Dynamic Programming, Academic Press, 1983.
P. Whittle, Optimization Over Time, Volumes I and II, Wiley, 1982-83.
Ross's book is probably the easiest to read. However, it only covers Part I of the course. Whittle's book is good for Part II and Hocking's book is good for Part III. Many other books address the topics of the course and a collection can be found in Sections 3B and 3D of the DPMMS library.
Notation differs from book to book. My notation will be closest to that of Whittle's books and consistent throughout. For example, I will always denote a minimal cost function by $F(\cdot)$ (whereas in the recommended books you will find many other symbols used for this quantity).
1 Dynamic Programming: The Optimality Equation
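We introduce the idea of dynamic programming and the principle of optimality. We give notation for state-structured models, and introduce ideas of feedback, open-loop, and closed-loop controls, a Markov decision process, and the idea that it can be useful to model things in terms of time to go.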
1.1 Control as optimization over time
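Optimization is a key tool in modelling. Sometimes it is important to solve a problem optimally. Other times either a near-optimal solution is good enough, or the real problem does not have a single criterion by which a solution can be judged. However, even then optimization is useful as a way to test thinking. If the 'optimal' solution is ridiculous it may suggest ways in which both modelling and thinking can be refined.
Control theory is concerned with dynamic systems and their optimization over time. It accounts for the fact that a dynamic system may evolve stochastically and that key variables may be unknown or imperfectly observed (as we see, for instance, in the UK economy).
This contrasts with optimization models in the IB course (such as those for LP and network flow models); these were static and nothing was random or hidden. It is these three new features: dynamic and stochastic evolution, and imperfect state observation, that give rise to new types of optimization problem and which require new ways of thinking.
We could spend an entire lecture discussing the importance of control theory and tracing its development through the windmill, steam governor, and so on. Such 'classic control theory' is largely concerned with the question of stability, and there is much of this theory which we ignore, e.g., the Nyquist criterion and dynamic lags.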
1.2 The principle of optimality
A key idea is that optimization over time can often be regarded as 'optimization in stages'. We trade off our desire to obtain the lowest possible cost at the present stage against the implication this would have for costs at future stages. The best action minimizes the sum of the cost incurred at the current stage and the least total cost that can be incurred from all subsequent stages, consequent on this decision. This is known as the Principle of Optimality.
Definition 1.1 (Principle of Optimality) From any point on an optimal trajectory, the remaining trajectory is optimal for the corresponding problem initiated at that point.
1.3 Example: the shortest path problem
Consider the 'stagecoach problem' in which a traveler wishes to minimize the length of a journey from town A to town J by first traveling to one of B, C or D, then onwards to one of E, F or G, then onwards to one of H or I, and then finally to J.
[Figure: road system for stagecoach problem]
Solution. Let $F(X)$ be the minimal distance required to reach J from X. Then clearly, $F(J) = 0$, $F(H) = 3$ and $F(I) = 4$,
$$F(F) = \min[\,6 + F(H),\ 3 + F(I)\,] = 7,$$
and so on. Recursively, we obtain $F(A) = 11$ and simultaneously an optimal route, i.e., A→D→F→I→J (although it is not unique).
The study of dynamic programming dates from Richard Bellman, who wrote the first book on the subject (1957) and gave it its name. A very large number of problems can be treated this way.
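The backward recursion used in this solution can be sketched in a few lines of Python. The road diagram is not reproduced in these notes, so the arc lengths in `ARCS` are an assumption: they are chosen to agree with the values quoted above ($F(H) = 3$, $F(F) = 7$, $F(A) = 11$) and are for illustration only.

```python
from functools import lru_cache

# ASSUMED arc lengths (the figure is not available); ARCS[X][Y] = distance X -> Y.
ARCS = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 7, "F": 4, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},
    "D": {"E": 4, "F": 1, "G": 5},
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},
    "G": {"H": 3, "I": 3},
    "H": {"J": 3},
    "I": {"J": 4},
    "J": {},
}


@lru_cache(maxsize=None)
def F(town):
    """Minimal distance from `town` to J: F(X) = min_Y [d(X, Y) + F(Y)], F(J) = 0."""
    if town == "J":
        return 0
    return min(d + F(nxt) for nxt, d in ARCS[town].items())


def optimal_route(start):
    """Follow a minimizing successor at each stage (ties broken by dict order)."""
    route = [start]
    while route[-1] != "J":
        here = route[-1]
        route.append(min(ARCS[here], key=lambda nxt: ARCS[here][nxt] + F(nxt)))
    return route
```

Because ties are possible, `optimal_route` returns one shortest route among several; its total length equals `F("A")`.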
1.4 The optimality equation
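The optimality equation in the general case. In discrete time $t$ takes integer values, say $t = 0, 1, \ldots$. Suppose $u_t$ is a control variable whose value is to be chosen at time $t$. Let $U_{t-1} = (u_0, \ldots, u_{t-1})$ denote the partial sequence of controls (or decisions) taken over the first $t$ stages. Suppose the cost up to the time horizon $h$ is given by
$$C = G(U_{h-1}) = G(u_0, u_1, \ldots, u_{h-1}).$$
Then the principle of optimality is expressed in the following theorem.
Theorem 1.2 (The principle of optimality) Define the functions
$$G(U_{t-1}, t) = \inf_{u_t, u_{t+1}, \ldots, u_{h-1}} G(U_{h-1}).$$
Then these obey the recursion
$$G(U_{t-1}, t) = \inf_{u_t} G(U_t, t+1), \quad t < h,$$
with terminal evaluation $G(U_{h-1}, h) = G(U_{h-1})$.
The proof is immediate from the definition of $G(U_{t-1}, t)$, i.e.,
$$G(U_{t-1}, t) = \inf_{u_t} \inf_{u_{t+1}, \ldots, u_{h-1}} G(u_0, \ldots, u_{t-1}, u_t, u_{t+1}, \ldots, u_{h-1}).$$
The state structured case. The control variable $u_t$ is chosen on the basis of knowing $U_{t-1} = (u_0, \ldots, u_{t-1})$ (which determines everything else). But a more economical representation of the past history is often sufficient. For example, we may not need to know the entire path that has been followed up to time $t$, but only the place to which it has taken us. The idea of a state variable $x \in \mathbb{R}^d$ is that its value at $t$, denoted $x_t$, is calculable from known quantities and obeys a plant equation (or law of motion)
$$x_{t+1} = a(x_t, u_t, t).$$
Suppose we wish to minimize a cost function of the form
$$C = \sum_{t=0}^{h-1} c(x_t, u_t, t) + C_h(x_h), \qquad (1.1)$$
by choice of controls $\{u_0, \ldots, u_{h-1}\}$. Define the cost from time $t$ onwards as
$$C_t = \sum_{\tau=t}^{h-1} c(x_\tau, u_\tau, \tau) + C_h(x_h), \qquad (1.2)$$
and the minimal cost from time $t$ onwards as an optimization over $\{u_t, \ldots, u_{h-1}\}$ conditional on $x_t = x$,
$$F(x, t) = \inf_{u_t, \ldots, u_{h-1}} C_t.$$
Here $F(x, t)$ is the minimal future cost from time $t$ onward, given that the state is $x$ at time $t$. Then by an inductive proof, one can show as in Theorem 1.2 that
$$F(x, t) = \inf_u \left[ c(x, u, t) + F(a(x, u, t), t+1) \right], \quad t < h, \qquad (1.3)$$
with terminal condition $F(x, h) = C_h(x)$. Here $x$ is a generic value of $x_t$. The minimizing $u$ in (1.3) is the optimal control $u(x, t)$ and values of $x_0, \ldots, x_{t-1}$ are irrelevant. The optimality equation (1.3) is also called the dynamic programming equation (DP) or Bellman equation.
The DP equation defines an optimal control problem in what is called feedback or closed loop form, with $u_t = u(x_t, t)$. This is in contrast to the open loop formulation in which $\{u_0, \ldots, u_{h-1}\}$ are to be determined all at once at time 0. A policy (or strategy) is a rule for choosing the value of the control variable under all possible circumstances as a function of the perceived circumstances. To summarise:
(i) The optimal $u_t$ is a function only of $x_t$ and $t$, i.e., $u_t = u(x_t, t)$.
(ii) The DP equation expresses the optimal $u_t$ in closed loop form. It is optimal whatever the past control policy may have been.
(iii) The DP equation is a backward recursion in time (from which we get the optimum at $h-1$, then $h-2$ and so on). The later policy is decided first.
'Life must be lived forward and understood backwards.' (Kierkegaard)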
1.5 Markov decision processes
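Consider now stochastic evolution. Let $X_t = (x_0, \ldots, x_t)$ and $U_t = (u_0, \ldots, u_t)$ denote the $x$ and $u$ histories at time $t$. As above, state structure is characterised by the fact that the evolution of the process is described by a state variable $x$, having value $x_t$ at time $t$, with the following properties.
(a) Markov dynamics (i.e., the stochastic version of the plant equation):
$$P(x_{t+1} \mid X_t, U_t) = P(x_{t+1} \mid x_t, u_t).$$
(b) Decomposable cost (i.e., cost given by (1.1)).
These assumptions define state structure. For the moment we also require
(c) Perfect state observation: The current value of the state is observable. That is, $x_t$ is known at the time at which $u_t$ must be chosen. So, letting $W_t$ denote the observed history at time $t$, we assume $W_t = (X_t, U_{t-1})$. Note that $C$ is determined by $W_h$, so we might write $C = C(W_h)$.
These assumptions define what is known as a discrete-time Markov decision process (MDP). Many of our examples will be of this type. As above, the cost from time $t$ onwards is given by (1.2). Denote the minimal expected cost from time $t$ onwards by
$$F(W_t) = \inf_\pi E_\pi[C_t \mid W_t],$$
where $\pi$ denotes a policy, i.e., a rule for choosing the controls $u_0, \ldots, u_{h-1}$.
We can assert a better statement of the optimality principle:
Theorem 1.3 $F(W_t)$ is a function of $x_t$ and $t$ alone, say $F(x_t, t)$. It obeys the optimality equation
$$F(x_t, t) = \inf_{u_t} \left\{ c(x_t, u_t, t) + E[F(x_{t+1}, t+1) \mid x_t, u_t] \right\}, \quad t < h, \qquad (1.4)$$
with terminal condition
$$F(x_h, h) = C_h(x_h).$$
Moreover, a minimizing value of $u_t$ in (1.4) (which is also only a function of $x_t$ and $t$) is optimal.
Proof. The value of $F(W_h)$ is $C_h(x_h)$, so the asserted reduction of $F$ is valid at time $h$. Assume it is valid at time $t+1$. The DP equation is then
$$F(W_t) = \inf_{u_t} \left\{ c(x_t, u_t, t) + E[F(x_{t+1}, t+1) \mid X_t, U_t] \right\}. \qquad (1.5)$$
But, by assumption (a), the right-hand side of (1.5) reduces to the right-hand member of (1.4). All the assertions then follow.
2 Some Examples of Dynamic Programming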
We illustrate the method of dynamic programming and some useful 'tricks'.
2.1 Example: managing spending and savings
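An investor receives annual income from a building society of $x_t$ pounds in year $t$. He consumes $u_t$ and adds $x_t - u_t$ to his capital, $0 \le u_t \le x_t$. The capital is invested at interest rate $\theta \times 100\%$, and so his income in year $t+1$ increases to
$$x_{t+1} = a(x_t, u_t) = x_t + \theta(x_t - u_t).$$
He desires to maximize his total consumption over $h$ years, $C = \sum_{t=0}^{h-1} u_t$.
Solution. In the notation we have been using, $c(x_t, u_t, t) = -u_t$, $C_h(x_h) = 0$. It is easiest to work in terms of 'time to go', $s = h - t$. Let $F_s(x)$ denote the maximal reward obtainable, starting in state $x$ and when there is time $s$ to go. The dynamic programming equation is
$$F_s(x) = \max_{0 \le u \le x} \left[ u + F_{s-1}(x + \theta(x - u)) \right],$$
where $F_0(x) = 0$ (since no more can be obtained once time $h$ is reached). Here, $x$ and $u$ are generic values for $x_s$ and $u_s$. First,
$$F_1(x) = \max_{0 \le u \le x} \left[ u + F_0(x + \theta(x - u)) \right] = \max_{0 \le u \le x} [u + 0] = x.$$
Next,
$$F_2(x) = \max_{0 \le u \le x} \left[ u + F_1(x + \theta(x - u)) \right] = \max_{0 \le u \le x} \left[ u + x + \theta(x - u) \right].$$
Since $u + x + \theta(x - u)$ is linear in $u$, its maximum occurs at $u = 0$ or $u = x$, and so
$$F_2(x) = \max[(1 + \theta)x,\ 2x] = \max[1 + \theta,\ 2]\,x = \rho_2 x.$$
This motivates the guess $F_{s-1}(x) = \rho_{s-1} x$. Trying this, we find
$$F_s(x) = \max_{0 \le u \le x} \left[ u + \rho_{s-1}(x + \theta(x - u)) \right] = \max[(1 + \theta)\rho_{s-1},\ 1 + \rho_{s-1}]\,x = \rho_s x.$$
Thus our guess is verified and $F_s(x) = \rho_s x$, where $\rho_s$ obeys the recursion implicit in the above, i.e., $\rho_s = \rho_{s-1} + \max[\theta \rho_{s-1}, 1]$. This gives
$$\rho_s = \begin{cases} s, & s \le s^*, \\ (1 + \theta)^{s - s^*} s^*, & s \ge s^*, \end{cases}$$
where $s^*$ is the least integer such that $s^* \ge 1/\theta$, i.e., $s^* = \lceil 1/\theta \rceil$. The optimal strategy is to invest the whole of the income in years $0, \ldots, h - s^* - 1$ (to build up capital) and then consume the whole of the income in years $h - s^*, \ldots, h - 1$.
There are several things worth remembering from this example. (i) It is often useful to frame things in terms of time to go, $s$. (ii) Although the form of the dynamic programming equation can sometimes look messy, try working backwards from $F_0(x)$ (which is known). Often a pattern will emerge from which we can piece together a solution. (iii) When the dynamics are linear, the optimal control lies at an extreme point of the set of feasible controls. This form of policy, which either consumes nothing or consumes everything, is known as bang-bang control.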
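As a quick numerical check of the recursion $\rho_s = \rho_{s-1} + \max[\theta\rho_{s-1}, 1]$ against its closed form, the following sketch may help. The function names are hypothetical and introduced only for illustration.

```python
import math


def rho_recursive(s, theta):
    """rho_s from rho_s = rho_{s-1} + max(theta * rho_{s-1}, 1), with rho_0 = 0."""
    rho = 0.0
    for _ in range(s):
        rho += max(theta * rho, 1.0)
    return rho


def rho_closed_form(s, theta):
    """rho_s = s for s <= s*, and (1 + theta)^(s - s*) * s* otherwise."""
    s_star = math.ceil(1.0 / theta)
    if s <= s_star:
        return float(s)
    return (1.0 + theta) ** (s - s_star) * s_star


def consumption_years(h, theta):
    """Years in which the bang-bang policy consumes all income: h - s*, ..., h - 1."""
    s_star = math.ceil(1.0 / theta)
    return range(max(h - s_star, 0), h)
```

For instance, with $\theta = 0.25$ we have $s^* = 4$, so over a horizon of $h = 10$ years the investor saves everything in years 0 to 5 and consumes everything in years 6 to 9.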
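2.2 Example: exercising a stock option
The owner of a call option has the option to buy a share at fixed 'striking price' $p$. The option must be exercised by day $h$. If he exercises the option on day $t$ and then immediately sells the share at the current price $x_t$, he can make a profit of $x_t - p$. Suppose the price sequence obeys the equation $x_{t+1} = x_t + \epsilon_t$, where the $\epsilon_t$ are i.i.d. random variables for which $E|\epsilon| < \infty$. The aim is to exercise the option optimally.
Let $F_s(x)$ be the value function (maximal expected profit) when the share price is $x$ and there are $s$ days to go. Show that (i) $F_s(x)$ is non-decreasing in $s$, (ii) $F_s(x) - x$ is non-increasing in $x$ and (iii) $F_s(x)$ is continuous in $x$. Deduce that the optimal policy can be characterised as follows.
There exists a non-decreasing sequence $\{a_s\}$ such that an optimal policy is to exercise the option the first time that $x \ge a_s$, where $x$ is the current price and $s$ is the number of days to go before expiry of the option.
Solution. The state variable at time $t$ is, strictly speaking, $x_t$ plus a variable which indicates whether the option has been exercised or not. However, it is only the latter case which is of interest, so $x$ is the effective state variable. Since dynamic programming makes its calculations backwards, from the termination point, it is often advantageous to write things in terms of the time to go, $s = h - t$. So if we let $F_s(x)$ be the value function (maximal expected profit) with $s$ days to go then
$$F_0(x) = \max\{x - p,\ 0\},$$
and so the dynamic programming equation is
$$F_s(x) = \max\{x - p,\ E[F_{s-1}(x + \epsilon)]\}, \quad s = 1, 2, \ldots$$
Note that the expectation operator comes outside, not inside, $F_{s-1}(\cdot)$.
One can use induction to show (i), (ii) and (iii). For example, (i) is obvious, since increasing $s$ means we have more time over which to exercise the option. However, for a formal proof
$$F_1(x) = \max\{x - p,\ E[F_0(x + \epsilon)]\} \ge \max\{x - p,\ 0\} = F_0(x).$$
Now suppose, inductively, that $F_{s-1} \ge F_{s-2}$. Then
$$F_s(x) = \max\{x - p,\ E[F_{s-1}(x + \epsilon)]\} \ge \max\{x - p,\ E[F_{s-2}(x + \epsilon)]\} = F_{s-1}(x),$$
whence $F_s$ is non-decreasing in $s$. Similarly, an inductive proof of (ii) follows from
$$F_s(x) - x = \max\{-p,\ E[F_{s-1}(x + \epsilon) - (x + \epsilon)] + E(\epsilon)\},$$
since the left hand side inherits the non-increasing character of the term $F_{s-1}(x + \epsilon) - (x + \epsilon)$ on the right. Thus the optimal policy can be characterized as stated. For from (ii), (iii) and the fact that $F_s(x) \ge x - p$ it follows that there exists an $a_s$ such that $F_s(x)$ is greater than $x - p$ if $x < a_s$ and equals $x - p$ if $x \ge a_s$. It follows from (i) that $a_s$ is non-decreasing in $s$. The constant $a_s$ is the smallest $x$ for which $F_s(x) = x - p$.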
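2.3 Example: accepting the best offer
We are to interview $h$ candidates for a job. At the end of each interview we must either hire or reject the candidate we have just seen, and may not change this decision later. Candidates are seen in random order and can be ranked against those seen previously. The aim is to maximize the probability of choosing the candidate of greatest rank.
Solution. Let $W_t$ be the history of observations up to time $t$, i.e., after we have interviewed the $t$th candidate. All that matters are the value of $t$ and whether the $t$th candidate is better than all her predecessors: let $x_t = 1$ if this is true and $x_t = 0$ if it is not. In the case $x_t = 1$, the probability she is the best of all $h$ candidates is
$$P(\text{best of } h \mid \text{best of first } t) = \frac{P(\text{best of } h)}{P(\text{best of first } t)} = \frac{1/h}{1/t} = \frac{t}{h}.$$
Now the fact that the $t$th candidate is the best of the $t$ candidates seen so far places no restriction on the relative ranks of the first $t - 1$ candidates; thus $x_t = 1$ and $W_{t-1}$ are statistically independent and we have
$$P(x_t = 1 \mid W_{t-1}) = \frac{P(W_{t-1} \mid x_t = 1)}{P(W_{t-1})} P(x_t = 1) = P(x_t = 1) = \frac{1}{t}.$$
Let $F(t-1)$ be the probability that under an optimal policy we select the best candidate, given that we have seen $t - 1$ candidates so far and the last one was not the best of those. Dynamic programming gives
$$F(t-1) = \frac{t-1}{t} F(t) + \frac{1}{t} \max\left( \frac{t}{h},\ F(t) \right) = \max\left( \frac{t-1}{t} F(t) + \frac{1}{h},\ F(t) \right).$$
These imply $F(t-1) \ge F(t)$ for all $t \le h$. Therefore, since $t/h$ and $F(t)$ are respectively increasing and non-increasing in $t$, it must be that for small $t$ we have $F(t) > t/h$ and for large $t$ we have $F(t) \le t/h$. So an optimal policy rejects the early candidates and then accepts the first candidate who is better than all her predecessors.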
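The recursion for $F(t-1)$ is easy to run numerically, which gives a feel for where the switch from rejecting to accepting occurs. The sketch below is illustrative only; `best_offer_policy` is a hypothetical name, and the terminal condition $F(h) = 0$ is assumed (if all $h$ candidates have been seen and rejected, we cannot succeed).

```python
def best_offer_policy(h):
    """Solve F(t-1) = ((t-1)/t) F(t) + (1/t) max(t/h, F(t)) backwards from F(h) = 0.

    Returns (t0, F), where F[t] is the success probability when t candidates
    have been seen and none accepted, and t0 is the first candidate index at
    which accepting a 'best so far' candidate is optimal (t/h >= F(t)).
    """
    F = [0.0] * (h + 1)
    for t in range(h, 0, -1):
        F[t - 1] = (t - 1) / t * F[t] + max(t / h, F[t]) / t
    t0 = next(t for t in range(1, h + 1) if t / h >= F[t])
    return t0, F


if __name__ == "__main__":
    t0, F = best_offer_policy(100)
    print("first acceptable candidate:", t0)
    print("success probability:", round(F[0], 4))
```

For moderately large $h$ the threshold found this way is close to $h/e$ and the success probability is close to $1/e$, the classical answer for this problem.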
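3.2 Example: job scheduling
Jobs are to be processed one at a time; job $i$ has processing time $p_i$ and earns reward $r_i$ on completion, with rewards discounted by a factor $\beta$ per unit time. Comparing two schedules that differ only in the order of two adjacent jobs shows that a schedule can be optimal only if the jobs are taken in decreasing order of the indices $r_i \beta^{p_i} / (1 - \beta^{p_i})$. This type of reasoning is known as an interchange argument.
There are a couple of points to note. (i) An interchange argument can be useful for solving a decision problem about a system that evolves in stages. Although such problems can be solved by dynamic programming, an interchange argument (when it works) is usually easier. (ii) The decision points need not be equally spaced in time. Here they are the points at which a number of jobs have been completed.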
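The index rule can be checked against brute-force enumeration on a small instance. This is a sketch under the assumed model that job $i$ takes time $p_i$ and pays $r_i$ at its completion time $T_i$, contributing $\beta^{T_i} r_i$; the function names and the example data are hypothetical.

```python
from itertools import permutations


def discounted_reward(order, p, r, beta):
    """Total discounted reward when jobs are processed in the given order."""
    t, total = 0, 0.0
    for i in order:
        t += p[i]                      # completion time of job i
        total += beta ** t * r[i]
    return total


def index_order(p, r, beta):
    """Jobs sorted by decreasing index r_i * beta^p_i / (1 - beta^p_i)."""
    def index(i):
        return r[i] * beta ** p[i] / (1.0 - beta ** p[i])
    return sorted(range(len(p)), key=index, reverse=True)


def brute_force_best(p, r, beta):
    """Best reward over all orderings (only feasible for a handful of jobs)."""
    return max(discounted_reward(o, p, r, beta) for o in permutations(range(len(p))))
```

Enumeration costs $n!$ evaluations, while the index rule needs only a sort, which is the practical point of the interchange argument.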
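3.3 The infinite-horizon case
In the finite-horizon case the cost function is obtained simply from the optimality equation by backward recursion from the terminal point. However, when the horizon is infinite there is no terminal point and so the validity of the optimality equation is no longer obvious.
Let us consider the time-homogeneous Markov case, in which costs and dynamics do not depend on $t$, i.e., $c(x, u, t) = c(x, u)$. Suppose also that there is no terminal cost, i.e., $C_h(x) = 0$. Define the $s$-horizon cost under policy $\pi$ as
$$F_s(\pi, x) = E_\pi\left[ \sum_{t=0}^{s-1} \beta^t c(x_t, u_t) \,\Big|\, x_0 = x \right],$$
where $E_\pi$ denotes expectation over the path of the process under policy $\pi$. If we take the infimum with respect to $\pi$ we have the infimal $s$-horizon cost
$$F_s(x) = \inf_\pi F_s(\pi, x).$$
Clearly, this always exists and satisfies the optimality equation
$$F_s(x) = \inf_u \left\{ c(x, u) + \beta E[F_{s-1}(x_1) \mid x_0 = x, u_0 = u] \right\},$$
with terminal condition $F_0(x) = 0$.
The infinite-horizon cost under policy $\pi$ is also quite naturally defined as
$$F(\pi, x) = \lim_{s \to \infty} F_s(\pi, x).$$
This limit need not exist, but it will do so under any of the following scenarios.
D (discounted programming): $0 < \beta < 1$, and $|c(x, u)| < B$ for all $x, u$.
N (negative programming): $0 < \beta \le 1$ and $c(x, u) \ge 0$ for all $x, u$.
P (positive programming): $0 < \beta \le 1$ and $c(x, u) \le 0$ for all $x, u$.
3.5 Example: selling an asset
The owner of an asset receives one offer for it each day, which he may either accept or reject. The offers are independent and identically distributed with probability density function $g(x)$, $x \ge 0$. Each day there is a probability $1 - \beta$ that the asset becomes worthless. Find the policy that maximizes his expected return and express it as the unique root of an equation. Show that if $\beta > 1/2$ and $g(x) = 2/x^3$, $x \ge 1$, then he should sell the first time the offer is at least $\sqrt{\beta/(1 - \beta)}$.
Solution. There are only two states, depending on whether the asset has been sold or not. Let these be 0 and 1 respectively. The optimality equation is
$$F(1) = \int_{y=0}^\infty \max[y,\ \beta F(1)]\, g(y)\, dy = \beta F(1) + \int_{y=0}^\infty \max[y - \beta F(1),\ 0]\, g(y)\, dy = \beta F(1) + \int_{y = \beta F(1)}^\infty [y - \beta F(1)]\, g(y)\, dy.$$
Hence
$$(1 - \beta) F(1) = \int_{y = \beta F(1)}^\infty [y - \beta F(1)]\, g(y)\, dy.$$
That this equation has a unique root, $F(1) = F^*$, follows from the fact that the left and right hand sides are increasing and decreasing in $F(1)$ respectively. Thus he should sell when he can get at least $\beta F^*$. His maximal reward is $F^*$.
Consider the case $g(y) = 2/y^3$, $y \ge 1$. The left hand side of the equation above is less than the right hand side at $F(1) = 1$ provided $\beta > 1/2$. In this case the root is greater than 1 and we compute it from
$$(1 - \beta) F(1) = \frac{2}{\beta F(1)} - \frac{\beta F(1)}{[\beta F(1)]^2},$$
and thus $F^* = 1/\sqrt{\beta(1 - \beta)}$ and $\beta F^* = \sqrt{\beta/(1 - \beta)}$.
If $\beta \le 1/2$ he should sell at any price.
Notice that discounting arises in this problem because at each stage there is a probability $1 - \beta$ that a 'catastrophe' will occur that brings things to a sudden end. This characterization of a manner in which discounting can arise is often quite useful.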
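The root of $(1 - \beta)F = \int_{\beta F}^\infty (y - \beta F)\, g(y)\, dy$ can also be found numerically, which is a useful check on the closed form for $g(y) = 2/y^3$. In the sketch below the integral is evaluated in closed form for this particular density ($1/a$ for $a \ge 1$ and $2 - a$ for $0 \le a < 1$, with $a = \beta F$); the function names are hypothetical.

```python
def expected_gain(a):
    """E[max(y - a, 0)] for the density g(y) = 2 / y**3 on y >= 1, with a >= 0."""
    return 1.0 / a if a >= 1.0 else 2.0 - a


def solve_value(beta, tol=1e-12):
    """Bisection for the root F* of (1 - beta) * F = E[max(y - beta * F, 0)]."""
    lo, hi = 0.0, 1.0
    while (1.0 - beta) * hi < expected_gain(beta * hi):
        hi *= 2.0                      # widen the bracket until the sign changes
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (1.0 - beta) * mid < expected_gain(beta * mid):
            lo = mid                   # left side still below right side
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With $\beta = 0.8$ the bisection returns $F^* \approx 2.5 = 1/\sqrt{0.8 \times 0.2}$, giving the selling threshold $\beta F^* \approx 2$. With $\beta = 0.4$ the threshold $\beta F^*$ falls below 1, so every offer is accepted.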
4 Positive Programming
We address the special theory of maximizing positive rewards (noting that there may be no optimal policy, but that if a policy has a value function that satisfies the optimality equation then it is optimal), and the method of value iteration.
4.1 Example: possible lack of an optimal policy
Positive programming concerns minimizing non-positive costs, $c(x, u) \le 0$. The name originates from the equivalent problem of maximizing non-negative rewards, $r(x, u) \ge 0$, and for this section we present results in that setting. The following example shows that there may be no optimal policy.
Suppose the possible states are the non-negative integers and in state $x$ we have a choice of either moving to state $x + 1$ and receiving no reward, or moving to state 0, obtaining reward $1 - 1/x$, and then remaining in state 0 thereafter and obtaining no further reward. The optimality equation is
$$F(x) = \max\{1 - 1/x,\ F(x + 1)\}, \quad x > 0.$$
Clearly $F(x) = 1$, $x > 0$, but the policy that chooses the maximizing action in the optimality equation always moves on to state $x + 1$ and hence has zero reward. Clearly, there is no policy that actually achieves a reward of 1.
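4.2 Characterization of the optimal policy
The following theorem provides a necessary and sufficient condition for a policy to be optimal: namely, its value function must satisfy the optimality equation. This theorem also holds for the case of strict discounting and bounded costs.
Theorem 4.1 Suppose D or P holds and $\pi$ is a policy whose value function $F(\pi, x)$ satisfies the optimality equation
$$F(\pi, x) = \sup_u \left\{ r(x, u) + \beta E[F(\pi, x_1) \mid x_0 = x, u_0 = u] \right\}.$$
Then $\pi$ is optimal.
Proof. Let $\pi'$ be any policy and suppose it takes $u_t(x) = f_t(x)$. Since $F(\pi, x)$ satisfies the optimality equation,
$$F(\pi, x) \ge r(x, f_0(x)) + \beta E_{\pi'}[F(\pi, x_1) \mid x_0 = x, u_0 = f_0(x)].$$
By repeated substitution of this into itself, we find
$$F(\pi, x) \ge E_{\pi'}\left[ \sum_{t=0}^{s-1} \beta^t r(x_t, u_t) \,\Big|\, x_0 = x \right] + \beta^s E_{\pi'}[F(\pi, x_s) \mid x_0 = x]. \qquad (4.1)$$
In case P we can drop the final term on the right hand side of (4.1) (because it is non-negative) and then let $s \to \infty$; in case D we can let $s \to \infty$ directly, observing that this term tends to zero. Either way, we have $F(\pi, x) \ge F(\pi', x)$.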
4.3 Example: optimal gambling
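A gambler has $i$ pounds and wants to increase this to $N$. At each stage she can bet any fraction of her capital, say $j \le i$. Either she wins, with probability $p$, and now has $i + j$ pounds, or she loses, with probability $q = 1 - p$, and has $i - j$ pounds. Let the state space be $\{0, 1, \ldots, N\}$. The game stops upon reaching state 0 or $N$. The only non-zero reward is 1, upon reaching state $N$. Suppose $p \ge 1/2$. Prove that the timid strategy, of always betting only 1 pound, maximizes the probability of the gambler attaining $N$ pounds.
Solution. The optimality equation is
$$F(i) = \max_{j,\ j \le i} \{ p F(i + j) + q F(i - j) \}.$$
To show that the timid strategy is optimal we need to find its value function, say $G(i)$, and show that it is a solution to the optimality equation. We have $G(i) = p G(i + 1) + q G(i - 1)$, with $G(0) = 0$, $G(N) = 1$. This recurrence gives
$$G(i) = \begin{cases} \dfrac{1 - (q/p)^i}{1 - (q/p)^N}, & p > 1/2, \\[1ex] \dfrac{i}{N}, & p = 1/2. \end{cases}$$
If $p = 1/2$, then $G(i) = i/N$ clearly satisfies the optimality equation. If $p > 1/2$ we simply have to verify that
$$G(i) = \frac{1 - (q/p)^i}{1 - (q/p)^N} = \max_{j:\ j \le i} \left\{ p \left[ \frac{1 - (q/p)^{i+j}}{1 - (q/p)^N} \right] + q \left[ \frac{1 - (q/p)^{i-j}}{1 - (q/p)^N} \right] \right\}.$$
It is a simple exercise to show that $j = 1$ maximizes the right hand side.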
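The final verification step can be checked numerically. The sketch below evaluates the timid strategy's value function $G$ and measures how far it is from satisfying the optimality equation. It assumes whole-pound bets $j$ with $1 \le j \le \min(i, N - i)$, so that the state stays in $\{0, \ldots, N\}$; the function names are hypothetical.

```python
def timid_value(N, p):
    """G(i): probability of reaching N from i when always betting 1 pound."""
    q = 1.0 - p
    if p == 0.5:
        return [i / N for i in range(N + 1)]
    ratio = q / p
    return [(1.0 - ratio ** i) / (1.0 - ratio ** N) for i in range(N + 1)]


def optimality_gap(N, p):
    """Largest |max_j {p G(i+j) + q G(i-j)} - G(i)| over interior states i."""
    q = 1.0 - p
    G = timid_value(N, p)
    gap = 0.0
    for i in range(1, N):
        best = max(p * G[i + j] + q * G[i - j] for j in range(1, min(i, N - i) + 1))
        gap = max(gap, abs(best - G[i]))
    return gap


def best_bets(N, p, i):
    """All bet sizes attaining the maximum in state i (up to rounding)."""
    q = 1.0 - p
    G = timid_value(N, p)
    values = {j: p * G[i + j] + q * G[i - j] for j in range(1, min(i, N - i) + 1)}
    top = max(values.values())
    return [j for j, v in values.items() if abs(v - top) < 1e-12]
```

A gap of (numerically) zero confirms that $G$ solves the optimality equation, so by Theorem 4.1 the timid strategy is optimal for these parameter values.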
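4.4 Value iteration
An important and practical method of computing $F$ is successive approximation or value iteration. Starting with $F_0(x) = 0$, we can successively calculate, for $s = 1, 2, \ldots$,
$$F_s(x) = \inf_u \left\{ c(x, u) + \beta E[F_{s-1}(x_1) \mid x_0 = x, u_0 = u] \right\}.$$
So $F_s(x)$ is the infimal cost over $s$ steps. Now let
$$F_\infty(x) = \lim_{s \to \infty} F_s(x) = \lim_{s \to \infty} \inf_\pi F_s(\pi, x). \qquad (4.2)$$
This exists (by monotone convergence under N or P, or by the fact that under D the cost incurred after time $s$ is vanishingly small).
Notice that (4.2) reverses the order of $\lim_{s \to \infty}$ and $\inf_\pi$ in the definition $F(x) = \inf_\pi F(\pi, x)$. The following theorem states that we can interchange the order of these operations and that therefore $F_s(x) \to F(x)$. However, in case N we need an additional assumption:
F (finite actions): There are only finitely many possible values of $u$ in each state.
Theorem 4.2 Suppose that D or P holds, or N and F hold. Then $F_\infty(x) = F(x)$.
Proof. First we prove '$\le$'. Given any $\bar\pi$,
$$F_\infty(x) = \lim_{s \to \infty} F_s(x) = \lim_{s \to \infty} \inf_\pi F_s(\pi, x) \le \lim_{s \to \infty} F_s(\bar\pi, x) = F(\bar\pi, x).$$
Taking the infimum over $\bar\pi$ gives $F_\infty(x) \le F(x)$.
Now we prove '$\ge$'. In the positive case, $c(x, u) \le 0$, so $F_s(x) \ge F(x)$. Now let $s \to \infty$. In the discounted case, with $|c(x, u)| < B$, imagine subtracting $B > 0$ from every cost. This reduces the infinite-horizon cost under any policy by exactly $B/(1 - \beta)$, and $F(x)$ and $F_\infty(x)$ also decrease by this amount. All costs are now negative, so the result we have just proved applies. [Alternatively, note that
$$F_s(x) - \beta^s B/(1 - \beta) \le F(x) \le F_s(x) + \beta^s B/(1 - \beta)$$
(can you see why?) and hence $\lim_{s \to \infty} F_s(x) = F(x)$.]
In the negative case,
$$F_\infty(x) = \lim_{s \to \infty} \min_u \{ c(x, u) + E[F_{s-1}(x_1) \mid x_0 = x, u_0 = u] \}$$
$$= \min_u \{ c(x, u) + \lim_{s \to \infty} E[F_{s-1}(x_1) \mid x_0 = x, u_0 = u] \}$$
$$= \min_u \{ c(x, u) + E[F_\infty(x_1) \mid x_0 = x, u_0 = u] \}, \qquad (4.3)$$
where the first equality follows because the minimum is over a finite number of terms and the second equality follows by Lebesgue monotone convergence (since $F_s(x)$ increases in $s$). Let $\pi$ be the policy that chooses the minimizing action on the right hand side of (4.3). This implies, by substitution of (4.3) into itself, and using the fact that N implies $F_\infty \ge 0$,
$$F_\infty(x) = E_\pi\left[ \sum_{t=0}^{s-1} c(x_t, u_t) + F_\infty(x_s) \,\Big|\, x_0 = x \right] \ge E_\pi\left[ \sum_{t=0}^{s-1} c(x_t, u_t) \,\Big|\, x_0 = x \right].$$
Letting $s \to \infty$ gives $F_\infty(x) \ge F(\pi, x) \ge F(x)$.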
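Value iteration and the error bound $|F(x) - F_s(x)| \le \beta^s B/(1 - \beta)$ from the discounted case can be illustrated on a small finite problem. The two-state, two-action data below are purely hypothetical; `P[u][x][y]` is the assumed transition probability from `x` to `y` under action `u`, and `c[x][u]` is the one-step cost.

```python
def value_iteration(P, c, beta, steps):
    """Return [F_0, F_1, ..., F_steps] with F_0 = 0 and
    F_s(x) = min_u { c[x][u] + beta * sum_y P[u][x][y] * F_{s-1}(y) }."""
    n_states, n_actions = len(c), len(c[0])
    F = [0.0] * n_states
    history = [F]
    for _ in range(steps):
        F = [
            min(
                c[x][u] + beta * sum(P[u][x][y] * F[y] for y in range(n_states))
                for u in range(n_actions)
            )
            for x in range(n_states)
        ]
        history.append(F)
    return history


# Hypothetical example data with |c(x, u)| < B = 4.
P = [
    [[0.9, 0.1], [0.5, 0.5]],   # transitions under action 0
    [[0.2, 0.8], [0.1, 0.9]],   # transitions under action 1
]
c = [[1.0, 2.0], [3.0, 0.5]]    # c[x][u]
```

Running many iterations gives a good approximation to $F$, against which the earlier iterates $F_s$ can be compared; because these example costs are non-negative, the iterates also increase monotonically in $s$.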
4.5 Example: pharmaceutical trials
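A doctor has two drugs available to treat a disease. One is a well-established drug and is known to work for a given patient with probability $p$, independently of its success for other patients. The new drug is untested and has an unknown probability of success $\theta$, which the doctor believes to be uniformly distributed over $[0, 1]$. He treats one patient per day and must choose which drug to use. Suppose he has observed $s$ successes and $f$ failures with the new drug. Let $F(s, f)$ be the maximal expected-discounted number of future patients who are successfully treated if he chooses between the drugs optimally from this point onwards. For example, if he uses only the established drug, the expected-discounted number of patients successfully treated is $p + \beta p + \beta^2 p + \cdots = p/(1 - \beta)$. The posterior distribution of $\theta$ is
$$f(\theta \mid s, f) = \frac{(s + f + 1)!}{s!\, f!}\, \theta^s (1 - \theta)^f, \quad 0 \le \theta \le 1,$$
and the posterior mean is $\bar\theta(s, f) = (s + 1)/(s + f + 2)$. The optimality equation is
$$F(s, f) = \max\left[ \frac{p}{1 - \beta},\ \frac{s + 1}{s + f + 2} \bigl( 1 + \beta F(s + 1, f) \bigr) + \frac{f + 1}{s + f + 2}\, \beta F(s, f + 1) \right].$$
It is not possible to give a nice expression for $F$, but we can find an approximate numerical solution. If $s + f$ is very large, say 300, then $\bar\theta(s, f) = (s + 1)/(s + f + 2)$ is a good approximation to $\theta$. Thus we can take $F(s, f) \approx (1 - \beta)^{-1} \max[p,\ \bar\theta(s, f)]$, $s + f = 300$, and work backwards. For $\beta = 0.95$, one obtains a table of values indexed by $s$ and $f$.
[Table of threshold values of $p$, indexed by $s$ and $f$.]
These numbers are the greatest values of $p$ for which it is worth continuing with at least one more trial of the new drug. For example, with $s = 3$, $f = 3$ it is worth continuing with the new drug when $p = 0.6 < 0.6133$. At this point the probability that the new drug will successfully treat the next patient is 0.5 and so the doctor should actually prescribe the drug that is least likely to cure! This example shows the difference between a myopic policy, which aims to maximize immediate reward, and an optimal policy, which forgoes immediate reward in order to gain information and possibly greater rewards later on. Notice that it is worth using the new drug at least once if $p < 0.7614$, even though at its first use the new drug will only be successful with probability 0.5.
5 Negative Programming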
We address the special theory of minimizing positive costs (noting that the action that extremizes the right hand side of the optimality equation gives an optimal policy), and stopping problems and their solution.
5.1 Stationary policies
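A Markov policy is a policy that specifies the control at time $t$ to be simply a function of the state and time. In the proof of Theorem 4.1 we used $u_t = f_t(x_t)$ to specify the control at time $t$. This is a convenient notation for a Markov policy, and we write $\pi = (f_0, f_1, \ldots)$. If in addition the policy does not depend on time, it is said to be a stationary Markov policy, and we write $\pi = (f, f, \ldots) = f^\infty$.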
5.2 Characterization of the optimal policy
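Negative programming concerns minimizing non-negative costs, $c(x, u) \ge 0$. The name originates from the equivalent problem of maximizing non-positive rewards, $r(x, u) \le 0$.
The following theorem gives a necessary and sufficient condition for a stationary policy to be optimal: namely, it must choose the optimal $u$ on the right hand side of the optimality equation. Note that in the statement of this theorem we are requiring that the infimum over $u$ is attained as a minimum over $u$.
Theorem 5.1 Suppose D or N holds. Suppose $\pi = f^\infty$ is the stationary Markov policy such that
$$c(x, f(x)) + \beta E[F(x_1) \mid x_0 = x, u_0 = f(x)] = \min_u \left[ c(x, u) + \beta E[F(x_1) \mid x_0 = x, u_0 = u] \right].$$
Then $F(\pi, x) = F(x)$, and $\pi$ is optimal.
Proof. Suppose this policy is $\pi = f^\infty$. Then by substituting the optimality equation into itself and using the fact that $\pi$ specifies the minimizing control at each stage,
$$F(x) = E_\pi\left[ \sum_{t=0}^{s-1} \beta^t c(x_t, u_t) \,\Big|\, x_0 = x \right] + \beta^s E_\pi[F(x_s) \mid x_0 = x]. \qquad (5.1)$$
In case N we can drop the final term on the right hand side of (5.1) (because it is non-negative) and then let $s \to \infty$; in case D we can let $s \to \infty$ directly, observing that this term tends to zero. Either way, we have $F(x) \ge F(\pi, x)$.
A corollary is that if assumption F holds then an optimal policy exists. Neither Theorem 5.1 nor this corollary is true for positive programming (c.f., the example in Section 4.1).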
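5.3 Optimal stopping over a finite horizon
One way that the total expected cost can be finite is if it is possible to enter a state from which no further costs are incurred. Suppose $u$ has just two possible values: $u = 0$ (stop), and $u = 1$ (continue). Suppose there is a termination state, say 0, that is entered upon choosing the stopping action. Once this state is entered the system stays in that state and no further cost is incurred thereafter.
Suppose that stopping is mandatory, in that we must continue for no more than $s$ steps. The finite-horizon dynamic programming equation is therefore
$$F_s(x) = \min\{ k(x),\ c(x) + E[F_{s-1}(x_1) \mid x_0 = x, u_0 = 1] \}, \qquad (5.2)$$
with $F_0(x) = k(x)$, $c(0) = 0$.
Consider the set of states in which it is at least as good to stop now as to continue one more step and then stop:
$$S = \{ x : k(x) \le c(x) + E[k(x_1) \mid x_0 = x, u_0 = 1] \}.$$
Clearly, it cannot be optimal to stop if $x \notin S$, since in that case it would be strictly better to continue one more step and then stop. The following theorem characterises all finite-horizon optimal policies.
Theorem 5.2 Suppose $S$ is closed (so that once the state enters $S$ it remains in $S$). Then an optimal policy for all finite horizons is: stop if and only if $x \in S$.
Proof. The proof is by induction. If the horizon is $s = 1$, then obviously it is optimal to stop only if $x \in S$. Suppose the theorem is true for a horizon of $s - 1$. As above, if $x \notin S$ then it is better to continue for one more step and stop rather than stop in state $x$. If $x \in S$, then the fact that $S$ is closed implies $x_1 \in S$ and so $F_{s-1}(x_1) = k(x_1)$. But then (5.2) gives $F_s(x) = k(x)$. So we should stop if $x \in S$.
The optimal policy is known as a one-step look-ahead rule (OSLA).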
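5.4 Example: optimal parking
A driver is looking for a parking space on the way to his destination. Each parking space is free with probability $p$ independently of whether other parking spaces are free or not. The driver cannot observe whether a parking space is free until he reaches it. If he parks $s$ spaces from the destination, he incurs cost $s$, $s = 0, 1, \ldots$. If he passes the destination without having parked the cost is $D$. Show that an optimal policy is to park in the first free space that is no further than $s^*$ from the destination, where $s^*$ is the greatest integer $s$ such that $(Dp + 1) q^s \ge 1$.
Solution. When the driver is $s$ spaces from the destination it only matters whether the space is available ($x = 1$) or full ($x = 0$). The optimality equation gives
$$F_s(0) = q F_{s-1}(0) + p F_{s-1}(1),$$
$$F_s(1) = \min \begin{cases} s, & \text{(take available space)} \\ q F_{s-1}(0) + p F_{s-1}(1), & \text{(ignore available space)} \end{cases}$$
where $F_0(0) = D$, $F_0(1) = 0$.
Suppose the driver adopts a policy of taking the first free space that is $s$ or closer. Let the cost under this policy be $k(s)$, where
$$k(s) = ps + q k(s - 1),$$
with $k(0) = qD$. The general solution is of the form $k(s) = -q/p + s + c q^s$. So after substituting and using the boundary condition at $s = 0$, we have
$$k(s) = -\frac{q}{p} + s + \left( D + \frac{1}{p} \right) q^{s+1}, \quad s = 0, 1, \ldots$$
It is better to stop now (at a distance $s$ from the destination) than to go on and take the first available space if $s$ is in the stopping set
$$S = \{ s : s \le k(s - 1) \} = \{ s : (Dp + 1) q^s \ge 1 \}.$$
This set is closed (since $s$ decreases) and so by Theorem 5.2 this stopping set describes the optimal policy.
If the driver parks in the first available space past his destination and walks back, then $D = 1 + qD$, so $D = 1/p$ and $s^*$ is the greatest integer such that $2 q^{s^*} \ge 1$.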
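The threshold rule can be compared with a direct solution of the optimality equation. The sketch below is illustrative; the function names and the example values $p = 0.2$, $D = 10$ are hypothetical.

```python
def parking_threshold(p, D):
    """Greatest integer s with (D * p + 1) * q**s >= 1, where q = 1 - p."""
    q = 1.0 - p
    s = 0
    while (D * p + 1.0) * q ** (s + 1) >= 1.0:
        s += 1
    return s


def parking_dp(p, D, s_max):
    """Solve the optimality equation for s = 0, ..., s_max.

    Returns (F0, F1, park) where F0[s], F1[s] are the minimal expected costs
    when the space s from the destination is full / free, and park[s] says
    whether taking a free space at distance s is optimal.
    """
    q = 1.0 - p
    F0, F1, park = [float(D)], [0.0], [True]
    for s in range(1, s_max + 1):
        go_on = q * F0[s - 1] + p * F1[s - 1]   # cost of driving on
        F0.append(go_on)
        F1.append(min(float(s), go_on))
        park.append(s <= go_on)
    return F0, F1, park
```

For $p = 0.2$ and $D = 10$ we have $(Dp + 1) = 3$ and $q = 0.8$, so the condition $3 \times 0.8^s \ge 1$ holds up to $s = 4$, and the dynamic programme parks exactly at distances 0 to 4.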
5.5 Optimal stopping over the infinite horizon
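Let us now consider the stopping problem over the infinite horizon. As above, let $F_s(x)$ be the infimal cost given that we are required to stop by time $s$. Let $F(x)$ be the infimal cost when all that is required is that we stop eventually. Since less cost can be incurred if we are allowed more time in which to stop, we have
$$F_s(x) \ge F_{s+1}(x) \ge F(x).$$
Thus by monotone convergence $F_s(x)$ tends to a limit, say $F_\infty(x)$, and $F_\infty(x) \ge F(x)$.
Example: we can have $F_\infty > F$
Consider the problem of stopping a symmetric random walk on the integers, where $c(x) = 0$, $k(x) = \exp(-x)$. The policy of stopping immediately, $\pi$, has $F(\pi, x) = \exp(-x)$, and this satisfies the infinite-horizon optimality equation,
$$F(x) = \min\{ \exp(-x),\ (1/2) F(x + 1) + (1/2) F(x - 1) \}.$$
However, $\pi$ is not optimal. A symmetric random walk is recurrent, so we may wait until reaching as large an integer as we like before stopping; hence $F(x) = 0$. Inductively, one can see that $F_s(x) = \exp(-x)$. So $F_\infty(x) > F(x)$.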
Rohe uta ht at igh eo
Example: Theorem 4.1 is not true for negative programming
Consider the above example, but now suppose one is allowed never to stop. Since continuation costs are 0 the optimal policy for all finite horizons and the infinite horizon is never to stop. So $F(x) = 0$ and this satisfies the optimality equation above. However, $F(\pi, x) = \exp(-x)$ also satisfies the optimality equation and is the cost incurred by stopping immediately. Thus it is not true (as for positive programming) that a policy whose cost function satisfies the optimality equation is optimal.
The following lemma gives conditions under which the infimal finite-horizon cost does converge to the infimal infinite-horizon cost.
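Lemma 5.3 Suppose all costs are bounded as follows.
$$\text{(a)}\ K = \sup_x k(x) < \infty, \qquad \text{(b)}\ C = \inf_x c(x) > 0. \qquad (5.3)$$
Then $F_s(x) \to F(x)$ as $s \to \infty$.
Proof. (*starred*) Suppose $\pi$ is an optimal policy for the infinite horizon problem and stops at the random time $\tau$. Then its cost is at least $(s + 1) C P(\tau > s)$. However, since it would be possible to stop at time 0 the cost is also no more than $K$, so
$$(s + 1) C P(\tau > s) \le F(x) \le K.$$
In the $s$-horizon problem we could follow $\pi$, but stop at time $s$ if $\tau > s$. This implies
$$F(x) \le F_s(x) \le F(x) + K P(\tau > s) \le F(x) + \frac{K^2}{(s + 1) C}.$$
Letting $s \to \infty$ gives the result.
6 Average-cost Programming
6.1 Average-cost optimization
The average cost under policy $\pi$ is $\lambda(\pi, x) = \lim_{t \to \infty} \frac{1}{t} E_\pi\left[ \sum_{s=0}^{t-1} c(x_s, u_s) \mid x_0 = x \right]$, and the average-cost optimality equation is
$$\lambda + \phi(x) = \min_u \{ c(x, u) + E[\phi(x_1) \mid x_0 = x, u_0 = u] \}. \qquad (6.2)$$
Theorem 6.1 Suppose there exists a constant $\lambda$ and bounded function $\phi$ such that for all $x$ and $u$,
$$\lambda + \phi(x) \le c(x, u) + E[\phi(x_1) \mid x_0 = x, u_0 = u]. \qquad (6.3)$$
Then $\lambda \le \lambda(\pi, x)$ for all policies $\pi$. This statement holds with $\le$ replaced by $\ge$ if the hypothesis is weakened to: for each $x$ there exists a $u$ such that (6.3) holds when $\le$ is replaced by $\ge$.
Proof. Suppose $u$ is chosen by some policy $\pi$. By repeated substitution of (6.3) into itself we have
$$\phi(x) \le -t\lambda + E_\pi\left[ \sum_{s=0}^{t-1} c(x_s, u_s) \,\Big|\, x_0 = x \right] + E_\pi[\phi(x_t) \mid x_0 = x].$$
Divide this by $t$ and let $t \to \infty$ to obtain
$$0 \le -\lambda + \lim_{t \to \infty} \frac{1}{t} E_\pi\left[ \sum_{s=0}^{t-1} c(x_s, u_s) \,\Big|\, x_0 = x \right],$$
where the final term on the right hand side is simply the average cost under policy $\pi$. Minimizing the right hand side over $\pi$ gives the result. The claim for $\le$ replaced by $\ge$ is proved similarly.
Theorem 6.2 Suppose there exists a constant $\lambda$ and bounded function $\phi$ satisfying (6.2). Then $\lambda$ is the minimal average cost and the optimal stationary policy is the one that chooses the optimizing $u$ on the right hand side of (6.2).
Proof. Equation (6.2) implies that (6.3) holds with equality when one takes $\pi$ to be the stationary policy that chooses the optimizing $u$ on the right hand side of (6.2). Thus $\pi$ is optimal and $\lambda$ is the minimal average cost.
The average-cost optimal policy is found simply by looking for a bounded solution to (6.2). Notice that if $\phi$ is a solution of (6.2) then so is $\phi$ + (a constant), because the (a constant) will cancel from both sides of (6.2). Thus $\phi$ is undetermined up to an additive constant. In searching for a solution to (6.2) we can therefore pick any state, say $\bar x$, and arbitrarily take $\phi(\bar x) = 0$.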
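6.2 Example: admission control at a queue
Each day a consultant is presented with the opportunity to take on a new job. The jobs are independently distributed over $n$ possible types and on a given day the offered type is $i$ with probability $a_i$, $i = 1, \ldots, n$. Jobs of type $i$ pay $R_i$ upon completion. Once he has accepted a job he may accept no other job until that job is complete. The probability that a job of type $i$ takes $k$ days is $(1 - p_i)^{k-1} p_i$, $k = 1, 2, \ldots$. Which jobs should the consultant accept?
Solution. Let 0 and $i$ denote the states in which he is free to accept a job, and in which he is engaged upon a job of type $i$, respectively. Then (6.2) is
$$\lambda + \phi(0) = \sum_{i=1}^n a_i \max[\phi(0),\ \phi(i)],$$
$$\lambda + \phi(i) = (1 - p_i) \phi(i) + p_i [R_i + \phi(0)], \quad i = 1, \ldots, n.$$
Taking $\phi(0) = 0$, these have solution $\phi(i) = R_i - \lambda/p_i$, and hence
$$\lambda = \sum_{i=1}^n a_i \max[0,\ R_i - \lambda/p_i].$$
The left hand side is increasing in $\lambda$ and the right hand side is decreasing in $\lambda$. Hence there is a root, say $\lambda^*$, and this is the maximal average reward. The optimal policy takes the form: accept only jobs for which $p_i R_i \ge \lambda^*$.
6.3 Value iteration bounds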
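Value iteration in the average-cost case is based upon the idea that $F_s(x) - F_{s-1}(x)$ approximates the minimal average cost for large $s$.
Theorem 6.3 Define
$$m_s = \min_x \{ F_s(x) - F_{s-1}(x) \}, \qquad M_s = \max_x \{ F_s(x) - F_{s-1}(x) \}. \qquad (6.4)$$
Then $m_s \le \lambda \le M_s$, where $\lambda$ is the minimal average cost.
Proof. (*starred*) Suppose that the first step of an $s$-horizon optimal policy follows Markov plan $f$. Then
$$F_s(x) = F_{s-1}(x) + [F_s(x) - F_{s-1}(x)] = c(x, f(x)) + E[F_{s-1}(x_1) \mid x_0 = x, u_0 = f(x)].$$
Hence
$$F_{s-1}(x) + m_s \le c(x, u) + E[F_{s-1}(x_1) \mid x_0 = x, u_0 = u],$$
for all $x, u$. Applying Theorem 6.1 with $\phi = F_{s-1}$ and $\lambda = m_s$ implies $m_s \le \lambda$. The bound $\lambda \le M_s$ is established in a similar way.
This justifies the following value iteration algorithm. At termination the algorithm provides a stationary policy that is within $\epsilon \times 100\%$ of optimal.
(0) Set $F_0(x) = 0$, $s = 1$.
(1) Compute $F_s$ from
$$F_s(x) = \min_u \{ c(x, u) + E[F_{s-1}(x_1) \mid x_0 = x, u_0 = u] \}.$$
(2) Compute $m_s$ and $M_s$ from (6.4). Stop if $M_s - m_s \le \epsilon m_s$. Otherwise set $s := s + 1$ and go to step (1).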
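Steps (0) to (2) translate almost line for line into code. The following sketch assumes a finite state and action space, strictly positive costs (so that the stopping test $M_s - m_s \le \epsilon m_s$ is meaningful), and a transition structure for which the bounds close up; the two-state data are hypothetical and `P[u][x][y]`, `c[x][u]` are assumed layouts.

```python
def average_cost_value_iteration(P, c, eps=1e-8, max_steps=10_000):
    """Average-cost value iteration with the bounds m_s <= lambda <= M_s.

    P[u][x][y] is the transition probability x -> y under action u and
    c[x][u] > 0 the one-step cost. Returns (m_s, M_s, s) at termination.
    """
    n_states, n_actions = len(c), len(c[0])
    F = [0.0] * n_states                                  # step (0): F_0 = 0
    for s in range(1, max_steps + 1):
        F_new = [                                         # step (1)
            min(
                c[x][u] + sum(P[u][x][y] * F[y] for y in range(n_states))
                for u in range(n_actions)
            )
            for x in range(n_states)
        ]
        diffs = [a - b for a, b in zip(F_new, F)]         # step (2)
        m, M = min(diffs), max(diffs)
        F = F_new
        if M - m <= eps * m:
            return m, M, s
    raise RuntimeError("bounds did not close within max_steps")


# Hypothetical example data.
P = [
    [[0.9, 0.1], [0.5, 0.5]],   # transitions under action 0
    [[0.2, 0.8], [0.1, 0.9]],   # transitions under action 1
]
c = [[1.0, 2.0], [3.0, 0.5]]    # c[x][u]
```

For this example the best stationary policy (action 1 in both states) has average cost $2/3$, which can be confirmed from the stationary distribution of the resulting two-state chain, and the returned interval $[m_s, M_s]$ brackets that value.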
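6.4 Policy improvement
Policy improvement is an effective method of improving stationary policies.
Policy improvement in the average-cost case
In the average-cost case a policy improvement algorithm can be based on the following observations. Suppose that for a policy $\pi = f^\infty$, we have that $\lambda$, $\phi$ is a solution to
$$\lambda + \phi(x) = c(x, f(x)) + E[\phi(x_1) \mid x_0 = x, u_0 = f(x)],$$
and suppose for some policy $\pi_1 = f_1^\infty$,
$$\lambda + \phi(x) \ge c(x, f_1(x)) + E[\phi(x_1) \mid x_0 = x, u_0 = f_1(x)], \qquad (6.5)$$
with strict inequality for some $x$. Then, following the lines of proof in Theorem 6.1, the average cost under $\pi_1$ is no greater than $\lambda$. If there is no $\pi_1$ for which (6.5) holds then $\pi$ satisfies (6.2) and is optimal. This justifies the following policy improvement algorithm.
(0) Choose an arbitrary stationary policy $\pi_0$. Set $s = 1$.
(1) For a given stationary policy $\pi_{s-1} = f_{s-1}^\infty$ determine $\phi$, $\lambda$ to solve
$$\lambda + \phi(x) = c(x, f_{s-1}(x)) + E[\phi(x_1) \mid x_0 = x, u_0 = f_{s-1}(x)].$$
This gives a set of linear equations, and so is intrinsically easier to solve than (6.2).
(2) Now determine the policy $\pi_s = f_s^\infty$ from
$$c(x, f_s(x)) + E[\phi(x_1) \mid x_0 = x, u_0 = f_s(x)] = \min_u \{ c(x, u) + E[\phi(x_1) \mid x_0 = x, u_0 = u] \},$$
taking $f_s(x) = f_{s-1}(x)$ whenever this is possible. By applications of Theorem 6.1, this yields a strict improvement whenever possible. If $\pi_s = \pi_{s-1}$ then the algorithm terminates and $\pi_{s-1}$ is optimal. Otherwise, return to step (1) with $s := s + 1$.
If both the action and state spaces are finite then there are only a finite number of possible stationary policies and so the policy improvement algorithm will find an optimal stationary policy in finitely many iterations. By contrast, the value iteration algorithm can only obtain more and more accurate approximations of $\lambda^*$.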
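Policy improvement in the discounted-cost case
In the case of strict discounting, the following theorem plays the role of Theorem 6.1. The proof is similar, by repeated substitution of (6.6) into itself.
Theorem 6.4 Suppose there exists a bounded function $G$ such that for all $x$ and $u$,
$$G(x) \le c(x, u) + \beta E[G(x_1) \mid x_0 = x, u_0 = u]. \qquad (6.6)$$
Then $G \le F$, where $F$ is the minimal discounted-cost function.
8 Controllability
In what follows, (8.3) denotes the relation $\Delta = x_r - A^r x_0 = \sum_{t=0}^{r-1} A^{r-t-1} B u_t$ obtained by iterating the plant equation $x_{t+1} = A x_t + B u_t$, and $M_r = [B\ \ AB\ \ \cdots\ \ A^{r-1} B]$.
Theorem 8.2 (i) The system $[A, B, \cdot]$ is $r$-controllable if and only if the matrix $M_r$ has rank $n$, or (ii) equivalently, if and only if the $n \times n$ matrix $M_r M_r^\top$ is nonsingular. (iii) If the system is $r$-controllable then it is $s$-controllable for $s \ge \min(n, r)$, and (iv) a control transferring $x_0$ to $x_r$ with minimal cost $\sum_{t=0}^{r-1} u_t^\top u_t$ is
$$u_t = B^\top (A^\top)^{r-t-1} (M_r M_r^\top)^{-1} (x_r - A^r x_0), \quad t = 0, \ldots, r - 1.$$
Proof. (i) The system (8.3) has a solution for arbitrary $\Delta$ if and only if $M_r$ has rank $n$. (ii) $M_r M_r^\top$ is singular if and only if there exists $w \ne 0$ such that $M_r M_r^\top w = 0$, and
$$M_r M_r^\top w = 0 \iff w^\top M_r M_r^\top w = 0 \iff M_r^\top w = 0.$$
(iii) The rank of $M_r$ is non-decreasing in $r$, so if the system is $r$-controllable then it is $s$-controllable for $s \ge r$. But the rank is constant for $r \ge n$ by the Cayley-Hamilton theorem. (iv) Consider the Lagrangian
$$\sum_{t=0}^{r-1} u_t^\top u_t + \lambda^\top \left( \Delta - \sum_{t=0}^{r-1} A^{r-t-1} B u_t \right),$$
giving
$$u_t = \tfrac{1}{2} B^\top (A^\top)^{r-t-1} \lambda.$$
Now we can determine $\lambda$ from (8.3).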
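8.4 Controllability in continuous-time
Theorem 8.4 (i) The $n$ dimensional system $[A, B, \cdot]$ is controllable if and only if the matrix $M_n$ has rank $n$, or (ii) equivalently, if and only if
$$G(t) = \int_0^t e^{As} B B^\top e^{A^\top s}\, ds$$
is positive definite for all $t > 0$. (iii) If the system is controllable then a control that achieves the transfer from $x(0)$ to $x(t)$ with minimal control cost $\int_0^t u_s^\top u_s\, ds$ is
$$u(s) = B^\top e^{A^\top (t - s)} G(t)^{-1} \left( x(t) - e^{At} x(0) \right).$$
Note that there is now no notion of $r$-controllability. However, $G(t) \downarrow 0$ as $t \downarrow 0$, so the transfer becomes more difficult and costly as $t \downarrow 0$.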
8.5 Example: broom balancing
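Consider the problem of balancing a broom in an upright position on your hand. By Newton's laws, the system obeys $m(\ddot u \cos\theta + L \ddot\theta) = mg \sin\theta$. For small $\theta$ we have $\cos\theta \sim 1$ and $\theta \sim \sin\theta = (x - u)/L$, so with $\alpha = g/L$ the plant equation is
$$\ddot x = \alpha (x - u),$$
equivalently,
$$\frac{d}{dt} \begin{pmatrix} x \\ \dot x \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ \alpha & 0 \end{pmatrix} \begin{pmatrix} x \\ \dot x \end{pmatrix} + \begin{pmatrix} 0 \\ -\alpha \end{pmatrix} u.$$
Since
$$[B\ \ AB] = \begin{pmatrix} 0 & -\alpha \\ -\alpha & 0 \end{pmatrix},$$
the system is controllable if $\theta$ is initially small.
9 Infinite Horizon Limits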
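We present one further example of controllability in continuous time, the notion of stabilizability, and the infinite-horizon limit for the LQ regulation problem.
9.1 Example: satellite in a plane orbit
Consider a satellite of unit mass in a planar orbit and take polar coordinates $(r, \theta)$.
$$\ddot r = r \dot\theta^2 - \frac{c}{r^2} + u_r, \qquad \ddot\theta = -\frac{2 \dot r \dot\theta}{r} + \frac{1}{r} u_\theta,$$
where $u_r$ and $u_\theta$ are the radial and tangential components of thrust. If $u = 0$ then a possible orbit (such that $\ddot r = \ddot\theta = 0$) is with $r = \rho$ and $\dot\theta = \omega = \sqrt{c/\rho^3}$.
Recall that one reason for taking an interest in linear models is that they tell us about controllability around an equilibrium point. Imagine there is a perturbing force. Take coordinates of perturbation
$$x_1 = r - \rho, \quad x_2 = \dot r, \quad x_3 = \theta - \omega t, \quad x_4 = \dot\theta - \omega.$$
Then, with $n = 4$, $m = 2$,
$$\dot x \sim \begin{pmatrix} 0 & 1 & 0 & 0 \\ 3\omega^2 & 0 & 0 & 2\omega\rho \\ 0 & 0 & 0 & 1 \\ 0 & -2\omega/\rho & 0 & 0 \end{pmatrix} x + \begin{pmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 1/\rho \end{pmatrix} \begin{pmatrix} u_r \\ u_\theta \end{pmatrix} = Ax + Bu.$$
It is easy to check that $M_2 = [B\ \ AB]$ has rank 4 and that therefore the system is controllable.
But suppose $u_\theta = 0$ (tangential thrust fails). Then
$$B = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \qquad M_4 = [B\ \ AB\ \ A^2 B\ \ A^3 B] = \begin{pmatrix} 0 & 1 & 0 & -\omega^2 \\ 1 & 0 & -\omega^2 & 0 \\ 0 & 0 & -2\omega/\rho & 0 \\ 0 & -2\omega/\rho & 0 & 2\omega^3/\rho \end{pmatrix}.$$
Since $(2\omega\rho, 0, 0, \rho^2) M_4 = 0$, this is singular and has rank 3. The uncontrollable component is the angular momentum, $2\omega\rho\, \delta r + \rho^2\, \delta\dot\theta = \delta(r^2 \dot\theta)|_{r = \rho,\ \dot\theta = \omega}$.
On the other hand, if $u_r = 0$ then the system is controllable. We can change the radius by tangential braking or thrust.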
sing the satinary control,
inability
Kay, we be = heyy + Bian = (ABR
Deition 8. We sy thot Piso stbiey marie inthe dcr tine sense fall
agencies of Tbe cy ove the unt dem the comps pow, |=
"The [4,3 eso a to stale of tre ete K sachet A+ BK
ce sahity nar and eee that 24 ast
Noe that ue = Ka nar and Maro, I soking contra such that» it
lutea oly to coer coutel ofthis spe sine, ae we So Ll such Cottle
‘ie opel conor the ie rao Let probe
93 Stabi
To contin in the plant eauation
deel Pt Bk
in continuous-time
Ae + Ba, Site wi onto = Ker
Bote amen hme l= SImyin
We say € i stabi matric the continuous tine sensei al egewaes of
ave neat el prt, and race Bat x; >
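9.4 Example: pendulum
Consider a pendulum of length $L$ making angle $\theta$ with the vertical, and suppose we wish to stabilise $\theta$ to zero by application of a force $u$. Then
$$\ddot\theta = -(g/L) \sin\theta + u.$$
We change the state variable to $x = (\theta, \dot\theta)$ and write
$$\frac{d}{dt} \begin{pmatrix} \theta \\ \dot\theta \end{pmatrix} = \begin{pmatrix} \dot\theta \\ -(g/L) \sin\theta + u \end{pmatrix} \sim \begin{pmatrix} \dot\theta \\ -(g/L)\theta \end{pmatrix} + \begin{pmatrix} 0 \\ u \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -g/L & 0 \end{pmatrix} \begin{pmatrix} \theta \\ \dot\theta \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix} u.$$
Suppose we try to stabilise with a control $u = -K\theta = -K x_1$. Then
$$A + BK = \begin{pmatrix} 0 & 1 \\ -g/L - K & 0 \end{pmatrix}$$
and this has eigenvalues $\pm\sqrt{-g/L - K}$. So either $-g/L - K > 0$ and one eigenvalue has a positive real part, in which case there is in fact instability, or $-g/L - K < 0$ and the eigenvalues are purely imaginary, which means we will in general have oscillations. So successful stabilization must be a function of $\dot\theta$ as well (and this would come out of the solution to the LQ regulation problem).
9.5 Infinite-horizon LQ regulation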
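Consider the time-homogeneous case and write the finite-horizon cost in terms of time to go $s$. The terminal cost, when $s = 0$, is denoted $F_0(x) = x^\top\Pi_0 x$. In all that follows we take $S = 0$, without loss of generality.
Lemma 9.2 Suppose $\Pi_0 = 0$, $R \ge 0$, $Q \ge 0$ and $[A, B, \cdot\,]$ is controllable or stabilizable. Then $\{\Pi_s\}$ has a finite limit $\Pi$.
Proof. Costs are non-negative, so $F_s(x)$ is non-decreasing in $s$. Now $F_s(x) = x^\top\Pi_s x$. Thus $x^\top\Pi_s x$ is non-decreasing in $s$ for every $x$. To show that $x^\top\Pi_s x$ is bounded we use one of two arguments.
If the system is controllable then $x^\top\Pi_s x$ is bounded because there is a policy which, for any $x_0 = x$, will bring the state to zero in at most $n$ steps and at finite cost, and can then hold it at zero with zero cost thereafter.
If the system is stabilizable then there is a $K$ such that $\Gamma = A + BK$ is a stability matrix and, using $u_t = Kx_t$, we have
$$F_s(x) \le x^\top\Bigl[\sum_{t=0}^{\infty} (\Gamma^\top)^t (R + K^\top Q K)\,\Gamma^t\Bigr] x < \infty.$$
Hence in either case we have an upper bound and so $x^\top\Pi_s x$ tends to a limit for every $x$. By considering $x = e_j$, the vector with a unit in the $j$th place and zeros elsewhere, we conclude that the $j$th element on the diagonal of $\Pi_s$ converges. Then taking $x = e_j + e_k$ it follows that the off-diagonal elements of $\Pi_s$ also converge.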
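Both value iteration and policy improvement are effective ways to compute the solution to an infinite-horizon LQ regulation problem. Policy improvement goes along the lines developed in an earlier lecture.
The following theorem establishes the efficacy of value iteration. It is similar to Theorem 4.2, which established the same fact for bounded, D, N and P programming. The LQ regulation problem is a negative programming problem; however, we cannot apply Theorem 4.2, because in general the terminal cost $x^\top\Pi_0 x$ is not zero.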
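Theorem 9.3 Suppose that $R > 0$, $Q > 0$ and the system $[A, B, \cdot\,]$ is controllable. Then (i) the equilibrium Riccati equation
$$\Pi = f\Pi \tag{9.1}$$
has a unique non-negative definite solution $\Pi$. (ii) For any finite non-negative definite $\Pi_0$ the sequence $\{\Pi_s\}$ converges to $\Pi$. (iii) The gain matrix $\Gamma$ corresponding to $\Pi$ is a stability matrix.
Proof. (starred) Define $\Pi$ as the limit of the sequence $f^{(s)}0$. By the previous lemma we know that this limit exists and that it satisfies (9.1). Consider $u_t = Kx_t$ and $x_{t+1} = (A + BK)x_t = \Gamma x_t$ for arbitrary $x_0$, where $K = -(Q + B^\top\Pi B)^{-1}B^\top\Pi A$ and $\Gamma = A + BK$. We can write (9.1) as
$$\Pi = R + K^\top Q K + \Gamma^\top\Pi\Gamma, \tag{9.2}$$
and hence
$$x_t^\top\Pi x_t = x_t^\top(R + K^\top Q K)x_t + x_{t+1}^\top\Pi x_{t+1} \ge x_{t+1}^\top\Pi x_{t+1}.$$
Thus $x_t^\top\Pi x_t$ decreases and, being bounded below by zero, it tends to a limit. Thus $x_t^\top(R + K^\top Q K)x_t$ tends to 0. Since $R + K^\top Q K$ is positive definite this implies $x_t \to 0$, which implies (iii). Hence for arbitrary finite non-negative definite $\Pi_0$,
$$\Pi_s = f^{(s)}\Pi_0 \ge f^{(s)}0 \to \Pi. \tag{9.3}$$
However, if we choose the fixed policy $u_t = Kx_t$ then it follows that
$$\Pi_s \le f^{(s)}0 + (\Gamma^\top)^s\Pi_0\Gamma^s \to \Pi. \tag{9.4}$$
Thus (9.3) and (9.4) imply (ii).
Finally, if non-negative definite $\tilde\Pi$ also satisfies (9.1) then $\tilde\Pi = f^{(s)}\tilde\Pi \to \Pi$, whence (i).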
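To see value iteration at work, here is a small sketch for a scalar system. It is an illustration under stated assumptions rather than anything from the notes: the state and control are one-dimensional, the cost per step is $Rx^2 + Qu^2$, the Riccati map is taken to be $f\Pi = R + A^2\Pi - (AB\Pi)^2/(Q + B^2\Pi)$, and the numbers $A = 1.2$, $B = R = Q = 1$ are arbitrary.

```python
def riccati_step(pi, A, B, R, Q):
    """One application of the scalar Riccati map f."""
    return R + A * A * pi - (A * B * pi) ** 2 / (Q + B * B * pi)

def value_iteration(pi0, A, B, R, Q, steps=200):
    """Iterate Pi_s = f(Pi_{s-1}) starting from the terminal coefficient pi0."""
    pi = pi0
    for _ in range(steps):
        pi = riccati_step(pi, A, B, R, Q)
    return pi

A, B, R, Q = 1.2, 1.0, 1.0, 1.0            # open-loop unstable since |A| > 1
pi_from_zero = value_iteration(0.0, A, B, R, Q)
pi_from_ten = value_iteration(10.0, A, B, R, Q)

# Gain and closed-loop coefficient at the limit, K = -(Q + B Pi B)^{-1} B Pi A.
K = -A * B * pi_from_zero / (Q + B * B * pi_from_zero)
gamma = A + B * K
print(pi_from_zero, pi_from_ten, gamma)
```

Both starting points lead to the same limit, and the resulting closed-loop coefficient has modulus less than one, which is the scalar form of assertions (ii) and (iii).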
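9.6 The [A, B, C] system
The notion of controllability rested on the assumption that the initial value of the state was known. If, however, one must rely upon imperfect observations, then the question arises whether the value of the state (either in the past or in the present) can be determined from these observations. The discrete-time system $[A, B, C]$ is defined by the plant equation and observation relation
$$x_t = Ax_{t-1} + Bu_{t-1},$$
$$y_t = Cx_{t-1}.$$
Here $y \in \mathbb{R}^r$ is observed, but $x$ is not. We suppose $C$ is $r \times n$.
[...] $s \ge \min(n, r)$, and (iv) the determination of $x_0$ can be expressed
$$x_0 = (N_r^\top N_r)^{-1}\sum_{j=1}^{r} (A^\top)^{j-1} C^\top \tilde y_j. \tag{10.4}$$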
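Proof. If the system has a solution for $x_0$ (which is so by hypothesis) then this solution is unique if and only if the matrix $N_r$ has rank $n$, whence assertion (i). Assertion (iii) follows from (i). The equivalence of conditions (i) and (ii) can be verified directly, as in the case of controllability.
If we define the deviation $\eta_j = \tilde y_j - CA^{j-1}x_0$ then the equation amounts to $\eta_j = 0$, $1 \le j \le r$. If these equations were not consistent we could still define a 'least-squares' solution to them by minimizing any positive-definite quadratic form in these deviations with respect to $x_0$. In particular, we could minimize $\sum_{j=1}^{r}\eta_j^\top\eta_j$. This minimization gives (10.4). If equations (10.3) indeed have a solution (i.e., are mutually consistent, as we suppose) and this is unique, then expression (10.4) must equal this solution, the actual value of $x_0$. The criterion for uniqueness of the least-squares solution is that $N_r^\top N_r$ should be nonsingular, which is also condition (ii).
Note that we have again found it helpful to bring in an optimization criterion in proving (iv); this time, not so much to construct one definite solution out of many, but rather to construct a 'best-fit' solution where an exact solution might not have existed. This approach lies close to the statistical approach necessary when observations are corrupted by noise.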
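10.2 Observability in continuous-time
Theorem. (i) The $n$-dimensional continuous-time system $[A, \cdot\,, C]$ is observable if and only if the matrix $N_n$ has rank $n$, or (ii) equivalently, if and only if
$$H(t) = \int_0^t e^{A^\top s} C^\top C e^{As}\,ds$$
is positive definite for all $t > 0$. (iii) If the system is observable then the determination of $x(0)$ can be written
$$x(0) = H(t)^{-1}\int_0^t e^{A^\top s} C^\top \tilde y(s)\,ds,$$
where
$$\tilde y(t) = y(t) - \int_0^t C e^{A(t-s)} B u(s)\,ds.$$
10.3 Examples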
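Example. Observation of population
Consider two populations whose sizes are changing according to the equations
$$\dot x_1 = \lambda_1 x_1, \qquad \dot x_2 = \lambda_2 x_2.$$
Suppose we observe $x_1 + x_2$, so
$$A = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}, \qquad C = \begin{pmatrix} 1 & 1 \end{pmatrix}, \qquad N_2 = \begin{pmatrix} 1 & 1 \\ \lambda_1 & \lambda_2 \end{pmatrix},$$
and so the individual populations are observable if $\lambda_1 \ne \lambda_2$.
Example. Radioactive decay
Suppose $j = 1, 2, 3$ and decay is from 1 to 2 at rate $\alpha$, from 1 to 3 at rate $\beta$ and from 2 to 3 at rate $\gamma$. We observe only the accumulation in state 3. Thus $\dot x = Ax$, where
$$A = \begin{pmatrix} -\alpha - \beta & 0 & 0 \\ \alpha & -\gamma & 0 \\ \beta & \gamma & 0 \end{pmatrix}, \qquad C = \begin{pmatrix} 0 & 0 & 1 \end{pmatrix}, \qquad N_3 = \begin{pmatrix} C \\ CA \\ CA^2 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ \beta & \gamma & 0 \\ \alpha\gamma - \beta(\alpha + \beta) & -\gamma^2 & 0 \end{pmatrix}.$$
The determinant equals $\gamma(\alpha + \beta)(\beta - \gamma)$, so we require $\gamma > 0$, $\alpha + \beta > 0$ and $\beta \ne \gamma$ for a nonzero determinant and observability.
Example. Satellite
Recall the linearised equation $\dot x = Ax$ for perturbations of the orbit of a satellite (here taking $\rho = 1$), where
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} r - \rho \\ \dot r \\ \theta - \omega t \\ \dot\theta - \omega \end{pmatrix}, \qquad A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 3\omega^2 & 0 & 0 & 2\omega \\ 0 & 0 & 0 & 1 \\ 0 & -2\omega & 0 & 0 \end{pmatrix}.$$
By taking $C = [0 \;\, 0 \;\, 1 \;\, 0]$ we see that the system is observable on the basis of angle measurements alone, but it is not observable for $\tilde C = [1 \;\, 0 \;\, 0 \;\, 0]$, i.e., on the basis of radius measurements alone:
$$N_4 = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & -2\omega & 0 & 0 \\ -6\omega^3 & 0 & 0 & -4\omega^2 \end{pmatrix}, \qquad \tilde N_4 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 3\omega^2 & 0 & 0 & 2\omega \\ 0 & -\omega^2 & 0 & 0 \end{pmatrix}.$$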
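As a check on the radioactive decay example, the sketch below builds $N_3 = [C;\ CA;\ CA^2]$ and evaluates its determinant for integer rates. It assumes the decay matrix $A$ described in that example (rates $\alpha$, $\beta$, $\gamma$ for the transitions $1 \to 2$, $1 \to 3$, $2 \to 3$) and the observation row $C = (0, 0, 1)$; the function names are hypothetical.

```python
def vecmat(v, M):
    """Row vector times matrix."""
    return [sum(v[k] * M[k][j] for k in range(len(v))) for j in range(len(M[0]))]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def observability_det(alpha, beta, gamma):
    """det of N_3 = [C; CA; CA^2] when only state 3 is observed."""
    A = [[-(alpha + beta), 0, 0],
         [alpha, -gamma, 0],
         [beta, gamma, 0]]
    rows = [[0, 0, 1]]                 # C
    for _ in range(2):
        rows.append(vecmat(rows[-1], A))
    return det3(rows)

print(observability_det(1, 2, 3))   # generic rates
print(observability_det(1, 2, 2))   # beta == gamma
```

The determinant agrees with $\gamma(\alpha + \beta)(\beta - \gamma)$ and vanishes when $\beta = \gamma$, in which case the two routes into state 3 cannot be told apart from the accumulation alone.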
10.4 Imperfect state observation with noise
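The full LQG model, whose description has been deferred until now, assumes linear dynamics, quadratic costs and Gaussian noise. Imperfect observation is the most important point. The model is
$$x_t = Ax_{t-1} + Bu_{t-1} + \epsilon_t, \tag{10.5}$$
$$y_t = Cx_{t-1} + \eta_t, \tag{10.6}$$
where $\epsilon_t$ is process noise, $y_t$ is the observation at time $t$ and $\eta_t$ is the observation noise. The state observations are degraded in that we observe only $Cx_{t-1}$. Assume
$$\operatorname{cov}\begin{pmatrix} \epsilon \\ \eta \end{pmatrix} = E\begin{pmatrix} \epsilon \\ \eta \end{pmatrix}\begin{pmatrix} \epsilon \\ \eta \end{pmatrix}^{\!\top} = \begin{pmatrix} N & L \\ L^\top & M \end{pmatrix}$$
and that $x_0 \sim N(\hat x_0, V_0)$. Let $W_t = (Y_t, U_{t-1}) = (y_1, \ldots, y_t;\, u_0, \ldots, u_{t-1})$ denote the observed history up to time $t$. Of course we assume that $t$, $A$, $B$, $C$, $N$, $L$, $M$, $\hat x_0$ and $V_0$ are also known; $W_t$ denotes what might be different if the process were rerun. In the next lecture we turn to the question of estimating $x$ from $y$.
We consider the issues of state estimation and optimal control and shall show that
(i) $\hat x_t$ can be calculated recursively from the Kalman filter (a linear operator):
$$\hat x_t = A\hat x_{t-1} + Bu_{t-1} + H_t(y_t - C\hat x_{t-1}),$$
which is like the plant equation except that the noise is given by an innovation process, $\tilde y_t = y_t - C\hat x_{t-1}$, rather than the white noise.
(ii) If with full information (i.e., $y_t = x_t$) the optimal control is $u_t = K_t x_t$, then without full information the optimal control is $u_t = K_t\hat x_t$, where $\hat x_t$ is the best linear least squares estimate of $x_t$ based on the information $(Y_t, U_{t-1})$ at time $t$.
Many of the ideas we encounter in this analysis are unrelated to the special state structure and are therefore worth noting as general observations about control with imperfect information.
11 Kalman Filtering and Certainty Equivalence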
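We present the important concepts of the Kalman filter, certainty equivalence and the separation principle.
11.1 Preliminaries
Lemma 11.1 Suppose $x$ and $y$ are jointly normal with zero means and covariance matrix
$$\operatorname{cov}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} V_{xx} & V_{xy} \\ V_{yx} & V_{yy} \end{pmatrix}.$$
Then the distribution of $x$ conditional on $y$ is Gaussian, with
$$E(x \mid y) = V_{xy}V_{yy}^{-1}y, \tag{11.1}$$
and
$$\operatorname{cov}(x \mid y) = V_{xx} - V_{xy}V_{yy}^{-1}V_{yx}. \tag{11.2}$$
Proof. Both $y$ and $x - V_{xy}V_{yy}^{-1}y$ are linear functions of $x$ and $y$ and therefore they are Gaussian. From $E\bigl[(x - V_{xy}V_{yy}^{-1}y)y^\top\bigr] = 0$ it follows that they are uncorrelated and this implies they are independent. Hence the distribution of $x - V_{xy}V_{yy}^{-1}y$ conditional on $y$ is identical with its unconditional distribution, and this is Gaussian with zero mean and the covariance matrix given by (11.2).
The estimate of $x$ in terms of $y$ defined as $\hat x = Hy = V_{xy}V_{yy}^{-1}y$ is known as the linear least squares estimate of $x$ in terms of $y$. Even without the assumption that $x$ and $y$ are jointly normal, this linear function of $y$ has a smaller covariance matrix than any other unbiased estimate for $x$ that is a linear function of $y$. In the Gaussian case, it is also the maximum likelihood estimator.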
11.2 The Kalman filter
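Let us make the LQG and state-structure assumptions of Section 10.4:
$$x_t = Ax_{t-1} + Bu_{t-1} + \epsilon_t, \tag{11.3}$$
$$y_t = Cx_{t-1} + \eta_t. \tag{11.4}$$
Notice that both $x_t$ and $y_t$ can be written as linear functions of the unknown noise and the known values of $u_0, \ldots, u_{t-1}$. Thus the distribution of $x_t$ conditional on $W_t = (Y_t, U_{t-1})$ must be normal, with some mean $\hat x_t$ and covariance matrix $V_t$. The following theorem describes recursive updating relations for these two quantities.
Theorem 11.2 (The Kalman filter) Suppose that conditional on $W_0$, the initial state $x_0$ is distributed $N(\hat x_0, V_0)$ and the state and observations obey the recursions of the LQG model (11.3)-(11.4). Then conditional on $W_t$, the current state is distributed $N(\hat x_t, V_t)$. The conditional mean and variance obey the updating recursions
$$\hat x_t = A\hat x_{t-1} + Bu_{t-1} + H_t(y_t - C\hat x_{t-1}), \tag{11.5}$$
$$V_t = N + AV_{t-1}A^\top - (L + AV_{t-1}C^\top)(M + CV_{t-1}C^\top)^{-1}(L^\top + CV_{t-1}A^\top), \tag{11.6}$$
where
$$H_t = (L + AV_{t-1}C^\top)(M + CV_{t-1}C^\top)^{-1}. \tag{11.7}$$
Proof. The proof is by induction on $t$. Consider the moment when $u_{t-1}$ has been determined but $y_t$ has not yet been observed. The distribution of $(x_t, y_t)$ conditional on $(W_{t-1}, u_{t-1})$ is jointly normal with means
$$E(x_t \mid W_{t-1}, u_{t-1}) = A\hat x_{t-1} + Bu_{t-1}, \qquad E(y_t \mid W_{t-1}, u_{t-1}) = C\hat x_{t-1}.$$
Let $\Delta_{t-1} = \hat x_{t-1} - x_{t-1}$, which by an inductive hypothesis is $N(0, V_{t-1})$. Consider the innovations
$$\xi_t = x_t - E(x_t \mid W_{t-1}, u_{t-1}) = x_t - (A\hat x_{t-1} + Bu_{t-1}) = \epsilon_t - A\Delta_{t-1},$$
$$\zeta_t = y_t - E(y_t \mid W_{t-1}, u_{t-1}) = y_t - C\hat x_{t-1} = \eta_t - C\Delta_{t-1}.$$
Conditional on $(W_{t-1}, u_{t-1})$, these quantities are normally distributed with zero means and covariance matrix
$$\operatorname{cov}\begin{pmatrix} \epsilon_t - A\Delta_{t-1} \\ \eta_t - C\Delta_{t-1} \end{pmatrix} = \begin{pmatrix} N + AV_{t-1}A^\top & L + AV_{t-1}C^\top \\ L^\top + CV_{t-1}A^\top & M + CV_{t-1}C^\top \end{pmatrix} = \begin{pmatrix} V_{\xi\xi} & V_{\xi\zeta} \\ V_{\zeta\xi} & V_{\zeta\zeta} \end{pmatrix}.$$
Thus it follows from Lemma 11.1 that the distribution of $\xi_t$ conditional on knowing $(W_{t-1}, u_{t-1}, \zeta_t)$ (which is equivalent to knowing $W_t$) is normal with mean $V_{\xi\zeta}V_{\zeta\zeta}^{-1}\zeta_t$ and covariance matrix $V_{\xi\xi} - V_{\xi\zeta}V_{\zeta\zeta}^{-1}V_{\zeta\xi}$. These give (11.5)-(11.7).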
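The updating recursions are short enough to write out directly in the scalar case. The sketch below is an illustration only: every quantity is a plain number, the function name `kalman_step` is hypothetical, and the parameter values and observations in the demonstration are made up.

```python
import math

def kalman_step(xhat, V, u, y, A, B, C, N, L, M):
    """One scalar Kalman update for x_t = A x + B u + eps, y_t = C x_{t-1} + eta."""
    H = (L + A * V * C) / (M + C * V * C)              # gain
    xhat_new = A * xhat + B * u + H * (y - C * xhat)   # conditional mean
    V_new = N + A * V * A - H * (L + C * V * A)        # conditional variance
    return xhat_new, V_new, H

# Illustrative parameters (assumptions): a random walk observed with unit noise.
A = B = C = 1.0
N, L, M = 1.0, 0.0, 1.0
xhat, V = 0.0, 0.0
observations = [1.0, 0.5, 1.5, 2.0, 1.0] * 10   # made-up data, controls all zero
for y in observations:
    xhat, V, H = kalman_step(xhat, V, 0.0, y, A, B, C, N, L, M)

print(xhat, V, H)
```

Note that the variance recursion does not involve the observations, so $V_t$ and $H_t$ could be computed in advance. For these parameters the recursion reduces to $V_t = 1 + V_{t-1}/(1 + V_{t-1})$, whose fixed point is $(1 + \sqrt 5)/2$.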
11.3 Certainty equivalence
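We say that a quantity $a$ is policy-independent if $E_\pi(a \mid W_0)$ is independent of $\pi$.
Theorem 11.3 Suppose LQG model assumptions hold. Then (i)
$$F(W_t) = \hat x_t^\top\Pi_t\hat x_t + \cdots \tag{11.8}$$
where $\hat x_t$ is the linear least squares estimate of $x_t$ whose evolution is determined by the Kalman filter in Theorem 11.2 and '$+\cdots$' indicates terms that are policy independent; (ii) the optimal control is given by
$$u_t = K_t\hat x_t,$$
where $\Pi_t$ and $K_t$ are the same matrices as in the full information case of Theorem 7.2.
It is important to grasp the remarkable fact that (ii) asserts: the optimal control $u_t$ is exactly the same as it would be if all unknowns were known and took values equal to their linear least squares estimates (equivalently, their conditional means) based upon observations up to time $t$. This is the idea known as certainty equivalence. As we have seen in the previous section, the distribution of the estimation error $\hat x_t - x_t$ does not depend on $U_{t-1}$. The fact that the problems of optimal estimation and optimal control can be decoupled in this way is known as the separation principle.
Proof. The proof is by backward induction. Suppose (11.8) holds at $t$. Recall that
$$\hat x_t = A\hat x_{t-1} + Bu_{t-1} + H_t\zeta_t, \qquad \Delta_{t-1} = \hat x_{t-1} - x_{t-1}.$$
Then with a quadratic cost of the form $c(x, u) = x^\top Rx + 2u^\top Sx + u^\top Qu$, we have
$$F(W_{t-1}) = \min_{u_{t-1}} E\bigl[c(x_{t-1}, u_{t-1}) + \hat x_t^\top\Pi_t\hat x_t + \cdots \bigm| W_{t-1}, u_{t-1}\bigr]$$
$$= \min_{u_{t-1}} E\bigl[c(\hat x_{t-1} - \Delta_{t-1}, u_{t-1}) + (A\hat x_{t-1} + Bu_{t-1} + H_t\zeta_t)^\top\Pi_t(A\hat x_{t-1} + Bu_{t-1} + H_t\zeta_t) + \cdots \bigm| W_{t-1}, u_{t-1}\bigr]$$
$$= \min_{u_{t-1}} \bigl[c(\hat x_{t-1}, u_{t-1}) + (A\hat x_{t-1} + Bu_{t-1})^\top\Pi_t(A\hat x_{t-1} + Bu_{t-1})\bigr] + \cdots,$$
where we use the fact that conditional on $W_{t-1}, u_{t-1}$, both $\Delta_{t-1}$ and $\zeta_t$ have zero means and are policy independent. This ensures that when we expand the quadratics in powers of $\Delta_{t-1}$ and $H_t\zeta_t$ the expected values of the linear terms in these quantities are zero and the expected values of the quadratic terms (represented by $+\cdots$) are policy independent.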
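11.4 Example: inertialess rocket with noisy position sensing
Consider the scalar case of controlling the position of a rocket by inertialess control of its velocity but in the presence of imperfect position sensing:
$$x_t = x_{t-1} + u_{t-1}, \qquad y_t = x_t + \eta_t,$$
where $\eta_t$ is white noise with variance 1. Suppose it is desired to minimize
$$E\Bigl[\sum_{t=0}^{h-1} u_t^2 + Dx_h^2\Bigr].$$
Notice that the observational relation differs from the usual model of $y_t = Cx_{t-1} + \eta_t$. To derive Kalman filter formulae for this variation we argue inductively from scratch. Suppose $\hat x_{t-1} - x_{t-1} \sim N(0, V_{t-1})$. Consider a linear estimate of $x_t$,
$$\hat x_t = \hat x_{t-1} + u_{t-1} + H_t(y_t - \hat x_{t-1} - u_{t-1}).$$
(The relevant innovation process is now $\tilde y_t = y_t - \hat x_{t-1} - u_{t-1}$.) Subtracting the plant equation and substituting for $x_t$ and $y_t$ gives
$$\Delta_t = \Delta_{t-1} + H_t(\eta_t - \Delta_{t-1}).$$
The variance of $\Delta_t$ is therefore
$$\operatorname{var}\Delta_t = V_{t-1} - 2H_tV_{t-1} + H_t^2(1 + V_{t-1}).$$
Minimizing this with respect to $H_t$ gives $H_t = V_{t-1}(1 + V_{t-1})^{-1}$, so the variance in the least squares estimate of $x_t$ obeys the recursion
$$V_t = V_{t-1} - V_{t-1}^2(1 + V_{t-1})^{-1} = V_{t-1}/(1 + V_{t-1}).$$
Hence
$$V_t^{-1} = V_{t-1}^{-1} + 1 = \cdots = V_0^{-1} + t.$$
If there is complete lack of information at the start, then $V_0^{-1} = 0$, $V_t = 1/t$ and
$$\hat x_t = \hat x_{t-1} + u_{t-1} + V_t(y_t - \hat x_{t-1} - u_{t-1}) = \frac{(t - 1)(\hat x_{t-1} + u_{t-1}) + y_t}{t}.$$
As far as the optimal control is concerned, suppose an inductive hypothesis that $F(W_t) = \hat x_t^2\Pi_t + \cdots$, where '$\cdots$' denotes policy independent terms. Then
$$F(W_{t-1}) = \min_{u_{t-1}}\bigl\{u_{t-1}^2 + E[\hat x_t^2]\Pi_t + \cdots\bigr\}$$
$$= \min_{u_{t-1}}\bigl\{u_{t-1}^2 + E\bigl[\hat x_{t-1} + u_{t-1} + H_t(\eta_t - \Delta_{t-1})\bigr]^2\Pi_t + \cdots\bigr\}$$
$$= \min_{u_{t-1}}\bigl\{u_{t-1}^2 + (\hat x_{t-1} + u_{t-1})^2\Pi_t + \cdots\bigr\}.$$
Minimizing over $u_{t-1}$ we obtain the optimal control and Riccati equation
$$u_{t-1} = -\frac{\Pi_t}{1 + \Pi_t}\hat x_{t-1}, \qquad \Pi_{t-1} = \frac{\Pi_t}{1 + \Pi_t}.$$
Hence $\Pi_t = D/(1 + D(h - t))$ and the optimal control is the certainty equivalence control $u_t = -D\hat x_t/(1 + D(h - t))$. This is the same control as in the deterministic case, but with $x_t$ replaced by $\hat x_t$.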
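Both recursions in this example are simple enough to iterate exactly. The sketch below is illustrative: it assumes the scalar recursions $V_t = V_{t-1}/(1 + V_{t-1})$ and $\Pi_{t-1} = \Pi_t/(1 + \Pi_t)$ with $\Pi_h = D$ stated above, uses rational arithmetic, and the function names and numerical values are hypothetical.

```python
from fractions import Fraction as Fr

def filter_variances(V0, steps):
    """Forward recursion V_t = V_{t-1} / (1 + V_{t-1}) for the estimation error."""
    V = [Fr(V0)]
    for _ in range(steps):
        V.append(V[-1] / (1 + V[-1]))
    return V

def riccati_coefficients(D, h):
    """Backward recursion Pi_{t-1} = Pi_t / (1 + Pi_t) with Pi_h = D."""
    Pi = [None] * (h + 1)
    Pi[h] = Fr(D)
    for t in range(h, 0, -1):
        Pi[t - 1] = Pi[t] / (1 + Pi[t])
    return Pi

def certainty_equivalent_control(xhat_t, Pi, t):
    """u_t = -Pi_{t+1} / (1 + Pi_{t+1}) * xhat_t, which equals -Pi_t * xhat_t."""
    return -Pi[t + 1] / (1 + Pi[t + 1]) * xhat_t

V = filter_variances(5, 4)        # V_0 = 5, so 1/V_4 = 1/5 + 4
Pi = riccati_coefficients(3, 6)   # D = 3, horizon h = 6
print(V[4], Pi[2], certainty_equivalent_control(Fr(2), Pi, 2))
```

The printed values match the closed forms $V_t^{-1} = V_0^{-1} + t$ and $\Pi_t = D/(1 + D(h - t))$, and the control depends on the observations only through the estimate $\hat x_t$.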
12 Dynamic Programming in Continuous Time
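We consider deterministic dynamic programming in continuous time.
12.1 The optimality equation
In continuous time the plant equation is
$$\dot x = a(x, u, t).$$
Let us consider a discounted cost of
$$C = \int_0^T e^{-\alpha t}c(x, u, t)\,dt + e^{-\alpha T}C(x(T), T).$$
The discount factor over $\delta$ is $e^{-\alpha\delta} = 1 - \alpha\delta + o(\delta)$. So the optimality equation is
$$F(x, t) = \inf_u\bigl[c(x, u, t)\delta + e^{-\alpha\delta}F(x + a(x, u, t)\delta,\, t + \delta) + o(\delta)\bigr].$$
By considering the term that multiplies $\delta$ in the Taylor series expansion we obtain
$$\inf_u\Bigl[c(x, u, t) - \alpha F + \frac{\partial F}{\partial t} + \frac{\partial F}{\partial x}a(x, u, t)\Bigr] = 0, \quad t < T, \tag{12.1}$$
with $F(x, T) = C(x, T)$. In the undiscounted case, we simply put $\alpha = 0$.
The following theorem states that if we can find a policy whose value function satisfies the DP equation then that policy is optimal.
Theorem 12.1 Suppose a policy $\pi$, using a control $u$, has a value function $F$ which satisfies the DP equation (12.1) for all values of $x$ and $t$. Then $\pi$ is optimal.
Proof. Consider any other policy, using control $v$, say. Then along the trajectory defined by $\dot x = a(x, v, t)$ we have
$$-\frac{d}{dt}e^{-\alpha t}F(x, t) = e^{-\alpha t}\Bigl[c(x, v, t) - \Bigl(c(x, v, t) - \alpha F + \frac{\partial F}{\partial t} + \frac{\partial F}{\partial x}a(x, v, t)\Bigr)\Bigr] \le e^{-\alpha t}c(x, v, t).$$
Integrating this inequality along the $v$ path, from $x(0)$ to $x(T)$, gives
$$F(x(0), 0) - e^{-\alpha T}C(x(T), T) \le \int_0^T e^{-\alpha t}c(x, v, t)\,dt.$$
Thus the $v$ path incurs a cost of at least $F(x(0), 0)$, and hence $\pi$ is optimal.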
12.2 Example: LQ regulation
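The undiscounted continuous-time DP equation for the LQ regulation problem is
$$0 = \inf_u\bigl[x^\top Rx + u^\top Qu + F_t + F_x^\top(Ax + Bu)\bigr].$$
Suppose we try a solution of the form $F(x, t) = x^\top\Pi(t)x$, where $\Pi(t)$ is a symmetric matrix. Then $F_x = 2\Pi(t)x$ and the optimizing $u$ is $u = -\tfrac12 Q^{-1}B^\top F_x = -Q^{-1}B^\top\Pi(t)x$. Therefore the DP equation is satisfied with this $u$ if
$$0 = x^\top\Bigl[R + \Pi A + A^\top\Pi - \Pi BQ^{-1}B^\top\Pi + \frac{d\Pi}{dt}\Bigr]x,$$
where we use the fact that $2x^\top\Pi Ax = x^\top\Pi Ax + x^\top A^\top\Pi x$. Hence we have a solution to the DP equation if $\Pi(t)$ satisfies the Riccati differential equation of Section 7.4.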
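12.3 Example: estate planning
A man is considering his lifetime plan of investment and expenditure. He has an initial level of savings $x(0)$ and no income other than that which he obtains from investment at a fixed interest rate. His total capital is therefore governed by the equation
$$\dot x(t) = \beta x(t) - u(t),$$
where $\beta > 0$ and $u$ is his rate of expenditure. He wishes to maximize
$$\int_0^T e^{-\alpha t}\sqrt{u(t)}\,dt,$$
for a given $T$. Find his optimal policy.
Solution. The optimality equation is
$$0 = \sup_u\Bigl[\sqrt u - \alpha F + \frac{\partial F}{\partial t} + \frac{\partial F}{\partial x}(\beta x - u)\Bigr].$$
Suppose we try a solution of the form $F(x, t) = f(t)\sqrt x$. For this to work we need
$$0 = \sup_u\Bigl[\sqrt u - \alpha f\sqrt x + f'\sqrt x + \frac{f}{2\sqrt x}(\beta x - u)\Bigr].$$
By $d[\,\cdot\,]/du = 0$, the optimizing $u$ is $u = x/f^2$ and the optimized value is
$$\frac{\sqrt x}{f}\Bigl[\frac12 - \Bigl(\alpha - \frac{\beta}{2}\Bigr)f^2 + ff'\Bigr]. \tag{12.2}$$
We have a solution if we can choose $f$ to make the bracketed term in (12.2) equal to 0. We have the boundary condition $F(x, T) = 0$, which imposes $f(T) = 0$. Thus we find
$$f(t)^2 = \frac{1 - e^{-(2\alpha - \beta)(T - t)}}{2\alpha - \beta}.$$
By Theorem 12.1 we have established the form of the optimal control, which in closed loop form is
$$u = x/f(t)^2.$$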
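The closed form for $f(t)^2$ can be checked against the differential equation it is meant to solve. Writing $g = f^2$, the bracket in (12.2) vanishes exactly when $g' = (2\alpha - \beta)g - 1$ with $g(T) = 0$. The sketch below, with hypothetical function names and illustrative parameter values, evaluates the residual of that equation by a central difference and computes the closed-loop spending rate.

```python
import math

def f_squared(t, T, alpha, beta):
    """g(t) = f(t)^2; the case 2*alpha == beta is the limit g = T - t."""
    k = 2 * alpha - beta
    if k == 0:
        return T - t
    return (1 - math.exp(-k * (T - t))) / k

def optimal_spending(x, t, T, alpha, beta):
    """Closed-loop expenditure rate u = x / f(t)^2 (valid for t < T)."""
    return x / f_squared(t, T, alpha, beta)

def ode_residual(t, T, alpha, beta, h=1e-5):
    """g'(t) - ((2*alpha - beta) * g(t) - 1), using a central difference for g'."""
    g = f_squared(t, T, alpha, beta)
    dg = (f_squared(t + h, T, alpha, beta) - f_squared(t - h, T, alpha, beta)) / (2 * h)
    return dg - ((2 * alpha - beta) * g - 1)

alpha, beta, T = 0.1, 0.05, 10.0   # illustrative values (assumptions)
print(f_squared(T, T, alpha, beta), ode_residual(3.0, T, alpha, beta))
print(optimal_spending(100.0, 0.0, T, alpha, beta))
```

Since $f(t)^2 \to 0$ as $t \to T$, the spending rate per unit of capital grows without bound near the horizon, so that the capital is exhausted by time $T$.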
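12.4 Example: harvesting
A fish population of size $x$ obeys the plant equation
$$\dot x = a(x, u) = \begin{cases} a(x) - u, & x > 0, \\ a(x), & x = 0. \end{cases}$$
The function $a(x)$ reflects the facts that the population can grow when it is small, but is subject to environmental limitations when it is large. It is desired to maximize the discounted total harvest $\int_0^T ue^{-\alpha t}\,dt$.
Solution. The DP equation (with discounting) is
$$\sup_u\Bigl[u - \alpha F + \frac{\partial F}{\partial t} + \frac{\partial F}{\partial x}[a(x) - u]\Bigr] = 0, \quad t < T.$$
Hence $u$ occurs linearly, through the maximization of $[1 - \partial F/\partial x]u$, and so we have a bang-bang optimal control of the form
$$u = \begin{cases} 0, \\ \text{undetermined}, \\ u_{\max}, \end{cases} \quad\text{for}\quad F_x \begin{cases} > \\ = \\ < \end{cases} 1,$$
where $u_{\max}$ is the largest practicable fishing rate.
Suppose $F(x, t) \to F(x)$ as $T \to \infty$, and $\partial F/\partial t \to 0$. Then
$$\sup_u\Bigl[u - \alpha F + \frac{\partial F}{\partial x}[a(x) - u]\Bigr] = 0. \tag{12.3}$$
If $F(x)$ is concave, then
$$u = \begin{cases} 0, \\ \text{undetermined, but effectively } a(\bar x), \\ u_{\max}, \end{cases} \quad\text{for}\quad x \begin{cases} < \\ = \\ > \end{cases} \bar x.$$
Clearly, $\bar x$ is the operating point. We suppose
$$\dot x = \begin{cases} a(x) > 0, & x < \bar x, \\ a(x) - u_{\max} < 0, & x > \bar x. \end{cases}$$
We say that there is chattering about the point $\bar x$, in the sense that $u$ switches between its maximum and minimum values either side of $\bar x$, effectively taking the value $a(\bar x)$ at $\bar x$. To determine $\bar x$ we note that
$$F(\bar x) = \int_0^\infty e^{-\alpha t}a(\bar x)\,dt = a(\bar x)/\alpha. \tag{12.4}$$
So from (12.3) and (12.4) we have
$$F_x(x) = \frac{\alpha F(x) - u(x)}{a(x) - u(x)} \to 1 \quad\text{as } x \nearrow \bar x \text{ or } x \searrow \bar x. \tag{12.5}$$
Assuming $F$ is concave, $F_{xx}$ is negative if it exists. So
$$F_{xx} = \frac{\alpha F_x}{a(x) - u} - \frac{\alpha F - u}{a(x) - u}\cdot\frac{a'(x)}{a(x) - u} = \frac{\alpha F - u}{a(x) - u}\cdot\frac{\alpha - a'(x)}{a(x) - u} \simeq \frac{\alpha - a'(x)}{a(x) - u(x)},$$
where the last step follows because (12.5) holds in a neighbourhood of $\bar x$. It is required that $F_{xx}$ be negative. But the denominator changes sign at $\bar x$, so the numerator must do so also, and therefore we must have $a'(\bar x) = \alpha$.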
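To make the condition $a'(\bar x) = \alpha$ concrete, the sketch below works it out for logistic growth. The growth law $a(x) = rx(1 - x/k)$ is an assumption introduced purely for illustration (the notes leave $a(x)$ general), as are the function names and numbers.

```python
def logistic_growth(x, r, k):
    """Assumed growth law a(x) = r x (1 - x / k); not specified in the notes."""
    return r * x * (1 - x / k)

def operating_point(alpha, r, k):
    """Solve a'(x) = r (1 - 2 x / k) = alpha; there is no positive root if alpha >= r."""
    if alpha >= r:           # a'(0) = r, so the discount rate exceeds the maximum growth rate
        return 0.0
    return k * (1 - alpha / r) / 2

def sustained_harvest_rate(alpha, r, k):
    """Harvest rate u = a(x_bar) that holds the population at the operating point."""
    return logistic_growth(operating_point(alpha, r, k), r, k)

r, k = 1.0, 100.0            # illustrative values (assumptions)
for alpha in (0.0, 0.2, 0.8, 1.5):
    print(alpha, operating_point(alpha, r, k), sustained_harvest_rate(alpha, r, k))
```

With $\alpha = 0$ the operating point is the population of maximum sustainable yield, $k/2$. As $\alpha$ increases the operating point falls, and once $\alpha \ge a'(0) = r$ it reaches zero.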
[Figure: growth rate $a(x)$ subject to environmental pressures.]
There is a sacrifice of long term yield for immediate return. If the initial population is greater than $\bar x$ then the optimal policy is to fish at rate $u_{\max}$ until we reach $\bar x$ and then fish at rate $u = a(\bar x)$. As $\alpha \nearrow a'(0)$, $\bar x \searrow 0$. So for sufficiently large $\alpha$ it becomes optimal to wipe out the fish population.
13 Pontryagin's Maximum Principle
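We explain Pontryagin's maximum principle, derive it and give examples of its use.
13.1 Heuristic derivation
Pontryagin's maximum principle (PMP) states a necessary condition that must hold on an optimal trajectory. It is a calculation for a fixed initial value of the state, $x(0)$. In comparison, the DP approach is a calculation for a general initial value of the state. PMP can be used as both a computational and analytic technique (and in the second case can solve the problem for general initial value).
Consider first a time-invariant formulation, with plant equation $\dot x = a(x, u)$, instantaneous cost $c(x, u)$, stopping set $S$ and terminal cost $K(x)$. The value function $F(x)$ obeys the DP equation (without discounting)
$$\inf_{u \in U}\Bigl[c(x, u) + \frac{\partial F}{\partial x}a(x, u)\Bigr] = 0, \tag{13.1}$$
outside $S$, with terminal condition
$$F(x) = K(x), \quad x \in S. \tag{13.2}$$
Define the adjoint variable
$$\lambda = -F_x. \tag{13.3}$$
This is a column $n$-vector, and is to be regarded as a function of time on the path. The proof that $F_x$ exists in the required sense is actually a tricky technical matter. Also define the Hamiltonian
$$H(x, u, \lambda) = \lambda^\top a(x, u) - c(x, u), \tag{13.4}$$
a scalar, defined at each point of the path as a function of the current $x$, $u$ and $\lambda$.
Theorem 13.1 (PMP) Suppose $u(t)$ and $x(t)$ represent the optimal control and state trajectory. Then there exists an adjoint trajectory $\lambda(t)$ such that together $u(t)$, $x(t)$ and $\lambda(t)$ satisfy
$$\dot x = H_\lambda, \quad [\,= a(x, u)\,] \tag{13.5}$$
$$\dot\lambda = -H_x, \quad [\,= -\lambda^\top a_x + c_x\,] \tag{13.6}$$
and for all $t$, $0 \le t \le T$, and all feasible controls $v$,
$$H(x(t), v, \lambda(t)) \le H(x(t), u(t), \lambda(t)), \tag{13.7}$$
i.e., the optimal control $u(t)$ is the value of $v$ maximizing $H(x(t), v, \lambda(t))$.
'Proof.' Our heuristic proof is based upon the DP equation; this is the most direct and enlightening way to derive conclusions that may be expected to hold in general.
Assertion (13.5) is immediate, and (13.7) follows from the fact that the minimizing value of $u$ in (13.1) is optimal. We can write (13.1) in incremental form as
$$F(x) = \inf_u\bigl[c(x, u)\delta + F(x + a(x, u)\delta)\bigr] + o(\delta).$$
Using the chain rule to differentiate with respect to $x_i$ yields
$$\frac{\partial F}{\partial x_i}(x(t)) = \delta\frac{\partial c}{\partial x_i} + \sum_j \frac{\partial F}{\partial x_j}(x(t + \delta))\frac{\partial}{\partial x_i}\bigl(x_j + a_j(x, u)\delta\bigr)$$
$$\implies -\lambda_i(t) = \delta\frac{\partial c}{\partial x_i} - \lambda_i(t + \delta) - \delta\sum_j \lambda_j(t + \delta)\frac{\partial a_j}{\partial x_i} + o(\delta)$$
$$\implies \frac{d}{dt}\lambda_i(t) = \frac{\partial c}{\partial x_i} - \sum_j \lambda_j(t)\frac{\partial a_j}{\partial x_i},$$
which is (13.6).
Notice that (13.5) and (13.6) each give $n$ equations. Condition (13.7) gives a further $m$ equations (since it requires stationarity with respect to variation of the $m$ components of $u$). So in principle these equations, if nonsingular, are sufficient to determine the $2n + m$ functions $u(t)$, $x(t)$ and $\lambda(t)$.
One can make other assertions, including specification of end-conditions (the so-called transversality conditions).
Theorem 13.2 (i) $H = 0$ on the optimal path. (ii) The sole initial condition is specification of the initial $x$. The terminal condition
$$(\lambda + K_x)^\top\sigma = 0 \tag{13.8}$$
holds at the terminal $x$ for all $\sigma$ such that $x + \epsilon\sigma$ is within $o(\epsilon)$ of the termination point of a possible optimal trajectory for all sufficiently small positive $\epsilon$.
'Proof.' Assertion (i) follows from (13.1), and the first assertion of (ii) is evident. We have the terminal condition (13.2), from whence it follows that $(F_x - K_x)^\top\sigma = 0$ for all $x$, $\sigma$ such that $x$ and $x + \epsilon\sigma$ lie in $S$ for all small enough positive $\epsilon$. However, we are only interested in points where an optimal trajectory makes its first entry to $S$, and at these points (13.3) holds. Thus we must have (13.8).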
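13.2 Example: bringing a particle to rest in minimal time
A particle with given initial position and velocity $x_1(0)$, $x_2(0)$ is to be brought to rest at position 0 in minimal time. This is to be done using the control force $u$, such that $|u| \le 1$, with dynamics $\dot x_1 = x_2$ and $\dot x_2 = u$. That is,
$$\frac{d}{dt}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix}u, \tag{13.9}$$
and we wish to minimize
$$C = \int_0^T 1\,dt,$$
where $T$ is the first time at which $x = (0, 0)$. The Hamiltonian is
$$H = \lambda_1x_2 + \lambda_2u - 1,$$
which is maximized by $u = \operatorname{sign}(\lambda_2)$. The adjoint variables satisfy $\dot\lambda_i = -\partial H/\partial x_i$, so
$$\dot\lambda_1 = 0, \qquad \dot\lambda_2 = -\lambda_1. \tag{13.10}$$
The terminal $x$ must be 0, so in (13.8) we can only take $\sigma = 0$ and so (13.8) provides no additional information for this problem. However, if at termination $\lambda_1 = \alpha$, $\lambda_2 = \beta$, then in terms of time to go $s$ we can compute
$$\lambda_1 = \alpha, \qquad \lambda_2 = \beta + \alpha s.$$
These reveal the form of the solution: there is at most one change of sign of $\lambda_2$ on the optimal path; $u$ is maximal in one direction and then possibly maximal in the other.
Appealing to the fact that $H = 0$ at termination (when $x_2 = 0$), we conclude that $|\beta| = 1$. We now consider the case $\beta = 1$. The case $\beta = -1$ is similar.
If $\beta = 1$, $\alpha \ge 0$ then $\lambda_2 = 1 + \alpha s \ge 0$ for all $s \ge 0$ and
$$u = 1, \qquad x_2 = -s, \qquad x_1 = s^2/2.$$
In this case the optimal trajectory lies on the parabola $x_1 = x_2^2/2$, $x_1 \ge 0$, $x_2 \le 0$. This is half of the switching locus $x_1 = \pm x_2^2/2$.
If $\beta = 1$, $\alpha < 0$ then $u = -1$ or $u = 1$ as the time to go is greater or less than $s_0 = 1/|\alpha|$. In this case,
$$u = -1, \quad x_2 = s - 2s_0, \quad x_1 = 2s_0s - \tfrac12s^2 - s_0^2, \quad s \ge s_0,$$
$$u = 1, \quad x_2 = -s, \quad x_1 = \tfrac12s^2, \quad s \le s_0.$$
The control rule expressed as a function of $s$ is open-loop, but in terms of $(x_1, x_2)$ and the switching locus, it is closed-loop.
Notice that the path is sensitive to the initial conditions, in that the optimal path is very different for two points just either side of the switching locus.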
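The closed-loop form of this rule is easy to simulate. The sketch below is a rough illustration, not part of the notes: it assumes the switching locus can be written as $x_1 + \tfrac12 x_2|x_2| = 0$ (both parabolic branches in one formula), integrates the dynamics by forward Euler with a small step, and stops once the state is within a tolerance of the origin. The function names, step size and tolerance are arbitrary choices.

```python
def bang_bang_control(x1, x2):
    """Feedback rule: u = -1 to the right of the switching locus, u = +1 to the left."""
    s = x1 + 0.5 * x2 * abs(x2)       # zero exactly on the switching locus
    if s > 0:
        return -1.0
    if s < 0:
        return 1.0
    return -1.0 if x2 > 0 else 1.0    # on the locus: ride it into the origin

def time_to_rest(x1, x2, dt=1e-3, tol=1e-2, max_steps=1_000_000):
    """Forward-Euler simulation; returns the time at which |x| first drops below tol."""
    t = 0.0
    for _ in range(max_steps):
        if x1 * x1 + x2 * x2 < tol * tol:
            return t
        u = bang_bang_control(x1, x2)
        x1, x2 = x1 + x2 * dt, x2 + u * dt
        t += dt
    raise RuntimeError("did not reach the origin")

# From rest at x1 = 1: decelerate for one time unit, then accelerate for one more.
print(time_to_rest(1.0, 0.0))
```

Starting from rest at $x_1 = 1$, the exact solution applies $u = -1$ until the locus is reached at $(1/2, -1)$ and then $u = 1$, for a total time of 2; the simulated time is close to this, up to discretisation and the stopping tolerance.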
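13.3 Connection with Lagrangian multipliers
An alternative way to understand the maximum principle is to think of $\lambda$ as a Lagrangian multiplier associated with the constraint $\dot x = a(x, u)$. Consider the Lagrangian form
$$L = \int_0^T\bigl[-c - \lambda^\top(\dot x - a)\bigr]\,dt - K(x(T)),$$
to be maximized with respect to the $(x, u, \lambda)$ path. Here $x(t)$ first enters the stopping set $S$ at $t = T$. We integrate $\lambda^\top\dot x$ by parts to obtain
$$L = -\lambda(T)^\top x(T) + \lambda(0)^\top x(0) + \int_0^T\bigl[\dot\lambda^\top x + \lambda^\top a - c\bigr]\,dt - K(x(T)).$$
Figure 2: Optimal trajectories for the Bush problem
The integrand must be stationary with respect to $x(t)$ and hence $\dot\lambda = -H_x$, i.e., (13.6). The expression must also be stationary with respect to $\epsilon > 0$, $x(T) + \epsilon\sigma \in S$, and hence $(\lambda(T) + K_x(x(T)))^\top\sigma = 0$, i.e., (13.8). It is good to have this alternative view, but the treatment is less immediate and less easy to make rigorous.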
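13.4 Example: use of transversality conditions
If the terminal time is constrained then (as we see in the next lecture) we no longer have Theorem 13.2 (i), i.e., that $H$ is maximized to 0, but the other claims of Theorems 13.1 and 13.2 continue to hold.
Consider a problem with the dynamics (13.9), but with $u$ unconstrained, $x(0) = (0, 0)$ and cost function
$$C = \frac12\int_0^T u(t)^2\,dt - x_1(T),$$
where $T$ is fixed and given. Here $K(x) = -x_1(T)$ and the Hamiltonian is
$$H(x, u, \lambda) = \lambda_1x_2 + \lambda_2u - \tfrac12u^2,$$
which is maximized at $u(t) = \lambda_2(t)$. Now $\dot\lambda_i = -\partial H/\partial x_i$ gives
$$\dot\lambda_1 = 0, \qquad \dot\lambda_2 = -\lambda_1.$$
In the terminal condition, $(\lambda + K_x)^\top\sigma = 0$, $\sigma$ is arbitrary and so we also have
$$\lambda_1(T) - 1 = 0, \qquad \lambda_2(T) = 0.$$
Thus the solution must be $\lambda_1(t) = 1$ and $\lambda_2(t) = T - t$. Hence the optimal applied force is $u(t) = T - t$, which decreases linearly with time and reaches zero at $T$.
14 Applications of the Maximum Principle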
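We discuss the terminal conditions of the maximum principle and give further examples of its use. The arguments are typical of those used to synthesise a solution to an optimal control problem by use of the maximum principle.
14.1 Problems with terminal conditions
Suppose $a$, $c$, $S$ and $K$ are all $t$-dependent. The DP equation for $F(x, t)$ is now
$$\inf_u\,[c + F_t + F_xa] = 0, \tag{14.1}$$
outside a stopping set $S$, with $F(x, t) = K(x, t)$ for $(x, t)$ in $S$. However, we can reduce this to a formally time-invariant case by augmenting the state variable $x$ by the variable $t$. We then have the augmented variables
$$x \to \begin{pmatrix} x \\ t \end{pmatrix}, \qquad a \to \begin{pmatrix} a \\ 1 \end{pmatrix}, \qquad \lambda \to \begin{pmatrix} \lambda \\ \lambda_0 \end{pmatrix}.$$
We keep the same definition (13.4) as before, that $H = \lambda^\top a - c$, and take $\lambda_0 = -F_t$. It now follows from (14.1) that on the optimal trajectory
$$H(x, u, \lambda) \text{ is maximized to } -\lambda_0.$$
Theorem 13.1 still holds, as can be verified. However, to (13.6) we can now add
$$\dot\lambda_0 = -H_t = c_t - \lambda^\top a_t, \tag{14.2}$$
and the transversality condition
$$(\lambda + K_x)^\top\sigma + (\lambda_0 + K_t)\tau = 0, \tag{14.3}$$
which must hold at the termination point $(x, t)$ if $(x + \epsilon\sigma, t + \epsilon\tau)$ is within $o(\epsilon)$ of the termination point of an optimal trajectory for all small enough positive $\epsilon$. We can now understand what to do with various types of terminal condition.
If the stopping rule specifies only a fixed terminal time $T$ then $\tau$ must be zero and $\sigma$ is unconstrained, so that (14.3) becomes $\lambda(T) = -K_x$. The problem in Section 13.4 is like this.
If there is a free terminal time then $\tau$ is unconstrained and so (14.3) gives $-\lambda_0(T) = K_T$. An example of this case appears in Section 14.2 below.
If the system is time-homogeneous, in that $a$ and $c$ are independent of $t$, but the terminal cost $K(x, T)$ depends on $T$, then (14.2) implies that $\lambda_0$ is constant and so the maximized value of $H$ is constant on the optimal orbit. The problem in Section 13.2 can be treated this way. We take $K(x, T) = T$ and deduce from the transversality condition that, since $\tau$ is unconstrained, $\lambda_0 = -K_T = -1$. Thus $H = \lambda_1x_2 + \lambda_2u$ is maximized to 1 at all points of the optimal trajectory.
14.2 Example: monopolist
Miss Prout holds the entire remaining stock of Cambridge elderberry wine for the vintage year 1959. If she releases it at rate $u$ (in continuous time) she realises a unit price $p(u) = 1 - u/2$ for $0 \le u \le 2$, and $p(u) = 0$ for $u \ge 2$. She holds an amount $x$ at time 0 and wishes to release it in a way that maximizes her total discounted return, $\int_0^T e^{-\alpha t}up(u)\,dt$ (where $T$ is unconstrained).
Solution. The plant equation is $\dot x = -u$ and the Hamiltonian is
$$H(x, u, \lambda) = e^{-\alpha t}up(u) - \lambda u = e^{-\alpha t}u(1 - u/2) - \lambda u.$$
Note that $K = 0$. Maximizing with respect to $u$ and using $\dot\lambda = -H_x$ gives
$$u = 1 - \lambda e^{\alpha t}, \qquad \dot\lambda = 0, \quad t \ge 0,$$
so $\lambda$ is constant. The terminal time is unconstrained, so the transversality condition gives $\lambda_0(T) = -K_t|_{t=T} = 0$. Therefore, since $H$ is maximized to $-\lambda_0(T) = 0$ at $T$, we have $u(T) = 0$ and hence
$$\lambda = e^{-\alpha T}, \qquad u = 1 - e^{-\alpha(T - t)}, \quad t \le T,$$
where $T$ is then the time at which all the wine has been sold, and so
$$x = \int_0^T u\,dt = T - \bigl(1 - e^{-\alpha T}\bigr)/\alpha.$$
Thus $u$ is implicitly a function of $x$, through $T$. The optimal value function is
$$F(x) = \int_0^T (u - u^2/2)e^{-\alpha t}\,dt = \frac12\int_0^T\bigl(e^{-\alpha t} - e^{\alpha t - 2\alpha T}\bigr)\,dt = \frac{(1 - e^{-\alpha T})^2}{2\alpha}.$$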
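Because $u$ depends on the initial stock only implicitly through $T$, a numerical solve is the natural way to evaluate the policy. The sketch below is illustrative: it assumes the relation $x = T - (1 - e^{-\alpha T})/\alpha$ and the release rate $u(t) = 1 - e^{-\alpha(T - t)}$ from the solution, finds $T$ by bisection, and compares the closed-form value $(1 - e^{-\alpha T})^2/(2\alpha)$ with a direct numerical integral of the discounted revenue. Function names and parameter values are hypothetical.

```python
import math

def stock_sold(T, alpha):
    """Total amount released over [0, T] under the optimal release rate."""
    return T - (1 - math.exp(-alpha * T)) / alpha

def selling_horizon(x, alpha, iterations=200):
    """Solve stock_sold(T) = x for T by bisection; the root lies in [x, x + 1/alpha]."""
    lo, hi = x, x + 1 / alpha
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if stock_sold(mid, alpha) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def release_rate(t, T, alpha):
    return 1 - math.exp(-alpha * (T - t))

def optimal_value(T, alpha):
    return (1 - math.exp(-alpha * T)) ** 2 / (2 * alpha)

def discounted_revenue(T, alpha, n=20000):
    """Midpoint-rule integral of e^{-alpha t} u (1 - u/2) over [0, T]."""
    h = T / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        u = release_rate(t, T, alpha)
        total += math.exp(-alpha * t) * u * (1 - u / 2) * h
    return total

x0, alpha = 2.0, 0.5              # illustrative values (assumptions)
T = selling_horizon(x0, alpha)
print(T, release_rate(0.0, T, alpha), optimal_value(T, alpha))
```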
14.3 Example: insects as optimizers
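A colony of insects consists of workers and queens, of numbers $w(t)$ and $q(t)$ at time $t$. If a time-dependent proportion $u(t)$ of the colony's effort is put into producing workers ($0 \le u(t) \le 1$), then $w$, $q$ obey the equations
$$\dot w = auw - bw, \qquad \dot q = c(1 - u)w,$$
where $a$, $b$, $c$ are constants, with $a > b$. The function $u$ is to be chosen to maximize the number of queens at the end of the season. Show that the optimal policy is to produce only workers up to some moment, and produce only queens thereafter.
Solution. The Hamiltonian is
$$H = \lambda_1(auw - bw) + \lambda_2c(1 - u)w.$$
The adjoint equations and transversality conditions (with $K = -q$) give
$$-\dot\lambda_0 = H_t = 0,$$
$$-\dot\lambda_1 = H_w = \lambda_1(au - b) + \lambda_2c(1 - u), \qquad \lambda_1(T) = -K_w = 0,$$
$$-\dot\lambda_2 = H_q = 0, \qquad \lambda_2(T) = -K_q = 1,$$
and hence $\lambda_0(t)$ is constant and $\lambda_2(t) = 1$ for all $t$. Therefore $H$ is maximized by
$$u = 0 \text{ or } 1 \quad\text{as}\quad \lambda_1a - c \le 0 \text{ or } \ge 0.$$
At $t = T$, this implies $u(T) = 0$. If $t$ is a little less than $T$, $\lambda_1$ is small and $u = 0$, so the equation for $\lambda_1$ is
$$\dot\lambda_1 = \lambda_1b - c. \tag{14.4}$$
As long as $\lambda_1$ is small, $\dot\lambda_1 < 0$. Therefore as the remaining time $s$ increases, $\lambda_1(s)$ increases, until such point that $\lambda_1a - c \ge 0$. The optimal control becomes $u = 1$ and then $\dot\lambda_1 = -\lambda_1(a - b) < 0$, which implies that $\lambda_1(s)$ continues to increase as $s$ increases, right back to the start. So there is no further switch in $u$.
The point at which the single switch occurs is found by integrating (14.4) from $t$ to $T$, to give $\lambda_1(t) = (c/b)(1 - e^{-(T - t)b})$, and so the switch occurs where $\lambda_1a - c = 0$, i.e., $(a/b)(1 - e^{-(T - t)b}) = 1$, or
$$t_{\text{switch}} = T + (1/b)\log(1 - b/a).$$
Experimental evidence suggests that social insects do closely follow this policy and adopt a switch time that is nearly optimal for their natural environment.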
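The switch time can be compared with a brute-force search over single-switch policies. The sketch below assumes the dynamics $\dot w = auw - bw$, $\dot q = c(1 - u)w$ with $q(0) = 0$, for which a policy switching from $u = 1$ to $u = 0$ at time $\tau$ yields $q(T) = c\,w(0)\,e^{(a - b)\tau}(1 - e^{-b(T - \tau)})/b$. The function names and the values of $a$, $b$, $c$, $T$ are illustrative.

```python
import math

def switch_time(a, b, T):
    """t_switch = T + (1/b) log(1 - b/a), clamped at 0 for a very short season."""
    return max(0.0, T + math.log(1 - b / a) / b)

def queens_at_end(tau, a, b, c, T, w0=1.0):
    """q(T) when only workers are produced before tau and only queens after."""
    w_tau = w0 * math.exp((a - b) * tau)                    # workers grow at rate a - b
    return c * w_tau * (1 - math.exp(-b * (T - tau))) / b   # then decay at rate b

a, b, c, T = 2.0, 1.0, 1.0, 5.0          # illustrative values (assumptions)
t_star = switch_time(a, b, T)

# Grid search over candidate switch times for comparison.
grid = [k / 1000 for k in range(int(T * 1000) + 1)]
t_best = max(grid, key=lambda tau: queens_at_end(tau, a, b, c, T))
print(t_star, t_best)
```

The grid search and the formula agree to within the grid spacing. Note that the switch occurs a fixed time $-(1/b)\log(1 - b/a)$ before the end of the season, whatever the season length.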
14.4 Example: rocket thrust optimization
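Regard a rocket as a point mass with position $x$, velocity $v$ and mass $m$. Mass is changed only by expulsion of matter in the jet. Suppose the jet has vector velocity $-k$ relative to the rocket, the rate of expulsion of mass is $b$, and the rocket is subject to external force $f$. Then the condition of momentum conservation yields
$$(m - b\,\delta t)(v + \dot v\,\delta t) + (v - k)b\,\delta t - mv = f\,\delta t,$$
and this gives the so-called 'rocket equation',
$$m\dot v = kb + f.$$
Suppose the jet speed $|k| = 1/b$ is fixed, but the direction and the rate of expulsion of mass can be varied. Then the control is the thrust vector $u = kb$, subject to $|u| \le 1$, say. Find the control that maximizes the height that the rocket reaches.
Solution. The plant equation (in $\mathbb{R}^3$) is
$$\dot x = v,$$
$$m\dot v = u + f,$$
$$\dot m = -b|u|.$$
We take dual variables $p$, $q$, $r$ corresponding to $x$, $v$, $m$. Then
$$H = p^\top v + \frac{q^\top(u + f)}{m} - rb|u| - c$$
(where if the costs are purely terminal $c = 0$), and $u$ must maximize
$$\frac{q^\top u}{m} - br|u|.$$
The optimal $u$ is in the direction of $q$, so $u = |u|q/|q|$ and $|u|$ maximizes $|u|(|q|/m - br)$. Thus we have that the optimal thrust should be
$$\begin{cases} \text{maximal} \\ \text{intermediate} \\ \text{null} \end{cases} \quad\text{as}\quad \frac{|q|}{m} - rb \begin{cases} > \\ = \\ < \end{cases} 0.$$
The control is bang-bang and $p$, $q$, $r$ are determined from the dual equations.
If the rocket is launched vertically then $f = -mg$ and the dual equations give $\dot p = 0$, $\dot q = -p$ and $\dot r = qu/m^2 > 0$. Suppose we want to maximize the height that the rocket attains. Let $m_0$ be the mass of the rocket structure, so that the maximum height has been reached if $m = m_0$ and $v \le 0$. Since $K = -x$ at termination, the transversality conditions give $p(T) = 1$, $q(T) = 0$. Thus $p(s) = 1$, $q(s) = s$, and $|u|$ must maximize $|u|(s/m - br)$. One can check that $(d/ds)(s/m - rb) > 0$, and hence we should use full thrust from launch up to some time, and thereafter coast to maximum height on zero thrust.
15 Controlled Markov Jump Processes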
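We conclude with models for controlled optimization problems in a continuous-time stochastic setting. This lecture is about controlled Markov jump processes, which are relevant when the state space is discrete.
15.1 The dynamic programming equation
The DP equation in incremental form is
$$F(x, t) = \inf_u\bigl\{c(x, u)\,\delta t + E[F(x(t + \delta t), t + \delta t) \mid x(t) = x, u(t) = u]\bigr\}.$$
If appropriate limits exist then this can be written in the limit $\delta t \downarrow 0$ as
$$\inf_u\,[c(x, u) + F_t(x, t) + \Lambda(u)F(x, t)] = 0.$$
Here $\Lambda(u)$ is the operator defined by
$$\Lambda(u)\phi(x) = \lim_{\delta t \downarrow 0}\left[\frac{E[\phi(x(t + \delta t)) \mid x(t) = x, u(t) = u] - \phi(x)}{\delta t}\right] \tag{15.1}$$
or
$$\Lambda(u)\phi(x) = \lim_{\delta t \downarrow 0} E\left[\frac{\phi(x(t + \delta t)) - \phi(x)}{\delta t}\,\Bigm|\, x(t) = x, u(t) = u\right],$$
the conditional expectation of the 'rate of change' of $\phi(x)$ along the path. The operator $\Lambda$ converts a scalar function of state, $\phi(x)$, to another such function, $\Lambda\phi(x)$. However, its action depends upon the control $u$, so we write it as $\Lambda(u)$. It is called the infinitesimal generator of the controlled Markov process. Equation (15.1) is equivalent to
$$E[\phi(x(t + \delta t)) \mid x(t) = x, u(t) = u] = \phi(x) + \Lambda(u)\phi(x)\,\delta t + o(\delta t).$$
This equation takes radically different forms depending upon whether the state space is discrete or continuous. Both are important, and we examine their forms in turn, beginning with a discrete state space.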
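15.2 The case of a discrete state space
Suppose that $x$ can take only values in a discrete set, labelled by an integer $j$, say, and that the transition intensity
$$\lambda_{jk}(u) = \lim_{\delta t \downarrow 0}\frac{1}{\delta t}P(x(t + \delta t) = k \mid x(t) = j, u(t) = u)$$
is defined for all $u$ and $j \ne k$. Then
$$E[\phi(x(t + \delta t)) \mid x(t) = j, u(t) = u] = \sum_{k \ne j}\lambda_{jk}(u)\phi(k)\,\delta t + \Bigl(1 - \sum_{k \ne j}\lambda_{jk}(u)\,\delta t\Bigr)\phi(j) + o(\delta t),$$
whence it follows that
$$\Lambda(u)\phi(j) = \sum_k \lambda_{jk}(u)[\phi(k) - \phi(j)]$$
and the DP equation becomes
$$\inf_u\Bigl[c(j, u) + F_t(j, t) + \sum_k \lambda_{jk}(u)[F(k, t) - F(j, t)]\Bigr] = 0. \tag{15.2}$$
This is the optimality equation for a Markov jump process.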
15.3 Uniformization in the infinite horizon case
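In this section we explain how (in the infinite horizon case) the continuous-time DP equation (15.2) can be rewritten to look like a discrete-time DP equation. Once this is done then all the ideas of Lectures 1-6 can be applied. In the discounted cost case (15.2) undergoes the usual modification to
$$\inf_u\Bigl[c(j, u) - \alpha F(j, t) + F_t(j, t) + \sum_k \lambda_{jk}(u)[F(k, t) - F(j, t)]\Bigr] = 0.$$
In the infinite horizon case, everything becomes independent of time and we have
$$\inf_u\Bigl[c(j, u) - \alpha F(j) + \sum_k \lambda_{jk}(u)[F(k) - F(j)]\Bigr] = 0. \tag{15.3}$$
Suppose we can choose a $B$ large enough that it is possible to define
$$\lambda_{jj}(u) = B - \sum_{k \ne j}\lambda_{jk}(u) \ge 0,$$
for all $j$ and $u$. By adding $(B + \alpha)F(j)$ to both sides of (15.3), the DP equation can be written
$$(B + \alpha)F(j) = \inf_u\Bigl[c(j, u) + \sum_k \lambda_{jk}(u)F(k)\Bigr].$$
Finally, dividing by $B + \alpha$, this can be written as
$$F(j) = \inf_u\Bigl[\bar c(j, u) + \beta\sum_k p_{jk}(u)F(k)\Bigr], \tag{15.4}$$
where
$$\bar c(j, u) = \frac{c(j, u)}{B + \alpha}, \qquad \beta = \frac{B}{B + \alpha}, \qquad p_{jk}(u) = \frac{\lambda_{jk}(u)}{B} \quad\text{and}\quad \sum_k p_{jk}(u) = 1.$$
This makes the dynamic programming equation look like a case of discounted dynamic programming in discrete time, or of negative programming if $\alpha = 0$. All the results we have for those cases can now be used (e.g., value iteration, OSLA rules, etc.). The trick of using a large $B$ to make the reduction from a continuous to a discrete time formulation is called uniformization.
In the undiscounted case we could try a solution to (15.2) of the form $F(j, t) = \gamma t + \phi(j)$. Substituting this in (15.2), we see that this gives a solution provided
$$0 = \inf_u\Bigl[c(j, u) + \gamma + \sum_k \lambda_{jk}(u)[\phi(k) - \phi(j)]\Bigr].$$
By adding $B\phi(j)$ to both sides of the above, then dividing by $B$, setting $\bar\gamma = -\gamma/B$, and making the other substitutions above (but with $\alpha = 0$), this is equivalent to
$$\phi(j) + \bar\gamma = \inf_u\Bigl[\bar c(j, u) + \sum_k p_{jk}(u)\phi(k)\Bigr], \tag{15.5}$$
which has the same form as the discrete-time average-cost optimality equation of Lecture 6. The theorems and techniques of that lecture can now be applied.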
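The following sketch applies uniformization and then ordinary value iteration to a small discounted Markov jump problem, and checks that the result satisfies the continuous-time equation (15.3). The three-state example (a queue with unit arrival rate, a choice between slow and fast service, holding cost $j$ and a charge of 2 for fast service) is invented for illustration, as are the function names.

```python
def uniformized_value_iteration(cost, rates, alpha, sweeps=3000):
    """cost[j][u] is c(j,u); rates[j][u] maps k != j to the intensity lambda_jk(u)."""
    n = len(cost)
    B = max(sum(r.values()) for row in rates for r in row)   # uniformization constant
    beta = B / (B + alpha)                                   # discrete-time discount factor
    F = [0.0] * n
    for _ in range(sweeps):
        new_F = []
        for j in range(n):
            options = []
            for u, r in enumerate(rates[j]):
                stay = 1.0 - sum(r.values()) / B             # p_jj = lambda_jj / B
                expected = stay * F[j] + sum(lam / B * F[k] for k, lam in r.items())
                options.append(cost[j][u] / (B + alpha) + beta * expected)
            new_F.append(min(options))
        F = new_F
    return F, B

def generator_residual(F, cost, rates, alpha):
    """Left-hand side of (15.3) at each state; should vanish at the solution."""
    return [min(cost[j][u] - alpha * F[j]
                + sum(lam * (F[k] - F[j]) for k, lam in r.items())
                for u, r in enumerate(rates[j]))
            for j in range(len(F))]

# Invented example: states 0, 1, 2 customers; u = 0 slow service, u = 1 fast service.
n_states, alpha = 3, 0.5
cost = [[j + 2.0 * u for u in (0, 1)] for j in range(n_states)]
rates = []
for j in range(n_states):
    row = []
    for u in (0, 1):
        r = {}
        if j < n_states - 1:
            r[j + 1] = 1.0              # arrival
        if j > 0:
            r[j - 1] = 1.0 + 2.0 * u    # service completion
        row.append(r)
    rates.append(row)

F, B = uniformized_value_iteration(cost, rates, alpha)
print(B, F, generator_residual(F, cost, rates, alpha))
```

The choice of $B$ here is the largest total transition rate out of any state, which is the smallest value for which every $\lambda_{jj}(u)$ is non-negative; any larger $B$ gives the same fixed point but slower convergence, since $\beta = B/(B + \alpha)$ moves closer to 1.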
15.4 Example: admission control at a queue
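Consider a queue of varying size $0, 1, \ldots$, with constant service rate $\mu$ and arrival rate $u$, where $u$ is controllable between 0 and a maximum value $\lambda$. Let $c(x, u) = ax - Ru$. This corresponds to paying a cost $a$ per unit time for each customer in the queue and receiving a reward $R$ at the point that each new customer is admitted (and therefore incurring reward at rate $Ru$ when the arrival rate is $u$). Let us take $B = \lambda + \mu$, and without loss of generality assume $B = 1$. The average cost optimality equation from (15.5) is
$$\phi(0) + \gamma = \inf_u\,[-Ru + u\phi(1) + (\mu + \lambda - u)\phi(0)] = \inf_u\,[u\{-R + \phi(1) - \phi(0)\} + (\mu + \lambda)\phi(0)],$$
$$\phi(x) + \gamma = \inf_u\,[ax - Ru + u\phi(x + 1) + \mu\phi(x - 1) + (\lambda - u)\phi(x)]$$
$$= \inf_u\,[u\{-R + \phi(x + 1) - \phi(x)\} + \mu\phi(x - 1) + \lambda\phi(x)] + ax, \quad x > 0.$$
Thus $u$ should be chosen to be 0 or $\lambda$ as $-R + \phi(x + 1) - \phi(x)$ is positive or negative.
Let us consider what happens under the policy that takes $u = \lambda$ for all $x$. The relative costs for this policy, say $f$, are given by
$$f(x) + \gamma = ax - R\lambda + \lambda f(x + 1) + \mu f(x - 1), \quad x > 0.$$
The solution to the homogeneous part of this recursion is of the form $f(x) = d_11^x + d_2(\mu/\lambda)^x$. Assuming $\lambda < \mu$ and that we desire a solution for $f$ that does not grow exponentially, we take $d_2 = 0$ and so the solution is effectively the solution to the inhomogeneous part, i.e.,
$$f(x) = \frac{ax(x + 1)}{2(\mu - \lambda)}, \qquad \gamma = \frac{a\lambda}{\mu - \lambda} - \lambda R.$$
Applying the idea of policy improvement, we conclude that a better policy is to take $u = 0$ (i.e., not to admit a customer) if $-R + f(x + 1) - f(x) > 0$, i.e., if
$$\frac{(x + 1)a}{\mu - \lambda} - R > 0.$$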
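The relative-cost solution and the resulting admission rule can be checked with exact arithmetic. The sketch below assumes $f(x) = ax(x + 1)/(2(\mu - \lambda))$ and $\gamma = a\lambda/(\mu - \lambda) - \lambda R$ under the normalisation $\lambda + \mu = 1$, and compares the policy-improved rule with the rule of a customer who considers only his own expected holding cost $(x + 1)a/\mu$. The function names and numerical values are illustrative.

```python
from fractions import Fraction as Fr

def relative_cost(x, a, lam, mu):
    """f(x) for the policy that always admits (u = lambda)."""
    return a * x * (x + 1) / (2 * (mu - lam))

def average_cost(a, R, lam, mu):
    """gamma for the policy that always admits."""
    return a * lam / (mu - lam) - lam * R

def first_rejecting_queue_length(a, R, rate):
    """Smallest x with (x + 1) a / rate - R > 0, i.e. where admission stops."""
    x = 0
    while (x + 1) * a / rate - R <= 0:
        x += 1
    return x

# Illustrative values (assumptions), normalised so that lam + mu = 1.
a, R, lam, mu = Fr(1), Fr(19, 2), Fr(3, 10), Fr(7, 10)
gamma = average_cost(a, R, lam, mu)

social = first_rejecting_queue_length(a, R, mu - lam)   # policy-improved rule
individual = first_rejecting_queue_length(a, R, mu)     # self-interested rule
print(gamma, social, individual)
```

With these numbers the improved policy stops admitting at a shorter queue length than the individually optimal rule, which is the effect discussed next.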
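However, this policy already exhibits an interesting property: it rejects customers for smaller queue length $x$ than does a policy which rejects a customer if and only if
$$\frac{(x + 1)a}{\mu} - R > 0.$$
This second policy is optimal if one is purely concerned with whether or not an individual customer that joins when there are $x$ customers in front of him will show a profit on the basis of the difference between the reward $R$ and his expected holding cost $(x + 1)a/\mu$. This example exhibits the difference between individual optimality (which is myopic) and social optimality. The socially optimal policy is more reluctant to admit customers because it anticipates that more customers are on the way; thus it feels less badly about forgoing the profit on a customer that presents himself now, recognizing that admitting such a customer can cause customers who are admitted after him to suffer greater delay. As expected, the policies are nearly the same if the arrival rate $\lambda$ is small.
16 Controlled Diffusion Processes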
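We give a brief introduction to controlled continuous-time stochastic models with a continuous state space, i.e., controlled diffusion processes.
16.1 Diffusion processes and controlled diffusion processes
The Wiener process $\{B(t)\}$ is a scalar process for which $B(0) = 0$, the increments in $B$ over disjoint time intervals are statistically independent and $B(t)$ is normally distributed with zero mean and variance $t$. ('B' stands for Brownian motion.) This specification is internally consistent because, for example,
$$B(t) = B(t_1) + [B(t) - B(t_1)]$$
and for $0 \le t_1 \le t$ the two terms on the right-hand side are independent normal variables of zero mean and with variance $t_1$ and $t - t_1$ respectively.
If $\delta B$ is the increment of $B$ in a time interval of length $\delta t$ then
$$E(\delta B) = 0, \qquad E[(\delta B)^2] = \delta t, \qquad E[(\delta B)^j] = o(\delta t), \text{ for } j > 2,$$
where the expectation is one conditional on the past of the process. Note that since
$$E[(\delta B/\delta t)^2] = O[(\delta t)^{-1}] \to \infty,$$
the formal derivative $\epsilon = dB/dt$ (continuous-time 'white noise') does not exist in a mean-square sense, but expectations such as
$$E\Bigl[\Bigl\{\int\alpha(t)\epsilon(t)\,dt\Bigr\}^2\Bigr] = E\Bigl[\Bigl\{\int\alpha(t)\,dB(t)\Bigr\}^2\Bigr] = \int\alpha(t)^2\,dt$$
make sense if the integral is convergent.
Now consider a stochastic differential equation
$$\delta x = a(x, u)\,\delta t + g(x, u)\,\delta B,$$
which we shall write formally as
$$\dot x = a(x, u) + g(x, u)\epsilon.$$
This, as a Markov process, has an infinitesimal generator with action
$$\Lambda(u)\phi(x) = \lim_{\delta t \downarrow 0} E\left[\frac{\phi(x(t + \delta t)) - \phi(x)}{\delta t}\,\Bigm|\, x(t) = x, u(t) = u\right] = \phi_xa + \tfrac12\phi_{xx}g^2 = \phi_xa + \tfrac12N\phi_{xx},$$
where $N(x, u) = g(x, u)^2$. The DP equation is thus
$$\inf_u\bigl[c + F_t + F_xa + \tfrac12NF_{xx}\bigr] = 0.$$
In the vector case the DP equation becomes
$$\inf_u\bigl[c + F_t + F_x^\top a + \tfrac12\operatorname{tr}(NF_{xx})\bigr] = 0.$$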
16.2 Example: LQG in continuous time
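The DP equation is
$$\inf_u\bigl[x^\top Rx + u^\top Qu + F_t + F_x^\top(Ax + Bu) + \tfrac12\operatorname{tr}(NF_{xx})\bigr] = 0.$$
In analogy with the discrete and deterministic continuous cases that we have considered previously, we try a solution of the form
$$F(x, t) = x^\top\Pi(t)x + \gamma(t).$$
This leads to the same Riccati equation as in Section 12.2,
$$0 = x^\top\Bigl[R + \Pi A + A^\top\Pi - \Pi BQ^{-1}B^\top\Pi + \frac{d\Pi}{dt}\Bigr]x,$$
and also, as in Section 7.3,
$$\frac{d\gamma}{dt} + \operatorname{tr}(N\Pi(t)) = 0, \quad\text{giving}\quad \gamma(t) = \int_t^T\operatorname{tr}(N\Pi(\tau))\,d\tau.$$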
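16.3 Example: passage to a stopping set
Consider a problem of movement on the unit interval $0 \le x \le 1$ in continuous time, $\dot x = u + \epsilon$, where $\epsilon$ is white noise of power $v$. The process terminates at time $T$ when $x$ reaches one end or the other of the interval. The cost is made up of an integral term $\tfrac12\int_0^T(L + Qu^2)\,dt$, penalising both control and time spent, and a terminal cost which takes the value $C_0$ or $C_1$ according as termination takes place at 0 or 1.
Show that in the deterministic case $v = 0$ one should head straight for one of the termination points at a constant rate and that the value function $F(x)$ has a piecewise linear form, with possibly a discontinuity at one of the boundary points if that boundary point is the optimal target from no interior point of the interval.
Show, in the stochastic case, that the dynamic programming equation with the control value optimized out can be linearised by a transformation $F(x) = \alpha\log\phi(x)$ for a suitable constant $\alpha$, and hence solve the problem.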
FE) a0 o