MIT OpenCourseWare: Dynamic Programming, Lecture 7

Updated: 2023-04-21 17:26:01

6.231 DYNAMIC PROGRAMMING

LECTURE 7

LECTURE OUTLINE

Deterministic continuous-time optimal control

Examples

Connection with the calculus of variations

The Hamilton-Jacobi-Bellman equation as a continuous-time limit of the DP algorithm

The Hamilton-Jacobi-Bellman equation as a sufficient condition

Examples

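PROBLEM FORMULATION

We have a continuous-time dynamic system

$$\dot{x}(t) = f\big(x(t), u(t)\big), \quad 0 \le t \le T, \quad x(0): \text{given},$$

where

$x(t) \in \Re^n$ is the state vector at time $t$

$u(t) \in U \subset \Re^m$ is the control vector at time $t$, and $U$ is the control constraint set

We seek a control trajectory $\{u(t) \mid t \in [0,T]\}$, with $u(t) \in U$ for all $t \in [0,T]$, that minimizes a cost function of the form

$$h\big(x(T)\big) + \int_0^T g\big(x(t), u(t)\big)\,dt$$

$f$, $h$, $g$ are assumed continuously differentiable.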
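The cost of a given control trajectory can be approximated numerically. The following is a minimal sketch, assuming forward Euler integration and a left Riemann sum; the function name `trajectory_cost` and the sanity-check problem (scalar system, constant control) are hypothetical choices made for illustration, not part of the lecture.

```python
import numpy as np

def trajectory_cost(f, g, h, x0, u_of_t, T, N=1000):
    """Approximate h(x(T)) + integral_0^T g(x(t), u(t)) dt with forward Euler."""
    delta = T / N
    x = np.asarray(x0, dtype=float)
    running = 0.0
    for k in range(N):
        u = u_of_t(k * delta)
        running += g(x, u) * delta      # left Riemann sum of the running cost
        x = x + f(x, u) * delta         # Euler step of x' = f(x, u)
    return h(x) + running, x

# Hypothetical sanity check: x' = u, u(t) = 1, g = u^2, h = x^2, x(0) = 0, T = 2.
# Exact values: x(T) = 2, so the cost is 2^2 + 2 = 6.
cost, x_T = trajectory_cost(
    f=lambda x, u: u,
    g=lambda x, u: u ** 2,
    h=lambda x: float(x ** 2),
    x0=0.0,
    u_of_t=lambda t: 1.0,
    T=2.0,
)
```

Because the state derivative is constant in this check, the Euler scheme is exact up to rounding; for general $f$ the error shrinks as `N` grows.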

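EXAMPLE I

Motion control: A unit mass moves on a line under the influence of a force $u$.

$x(t) = \big(x_1(t), x_2(t)\big)$: position and velocity of the mass at time $t$

Problem: From a given $\big(x_1(0), x_2(0)\big)$, bring the mass "near" a given final position-velocity pair $(\bar{x}_1, \bar{x}_2)$ at time $T$ in the sense:

$$\text{minimize } \big|x_1(T) - \bar{x}_1\big|^2 + \big|x_2(T) - \bar{x}_2\big|^2$$

subject to the control constraint

$$|u(t)| \le 1, \quad \text{for all } t \in [0,T].$$

The problem fits the framework with

$$\dot{x}_1(t) = x_2(t), \qquad \dot{x}_2(t) = u(t),$$

$$h\big(x(T)\big) = \big|x_1(T) - \bar{x}_1\big|^2 + \big|x_2(T) - \bar{x}_2\big|^2,$$

$$g\big(x(t), u(t)\big) = 0, \quad \text{for all } t \in [0,T].$$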
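To make the motion control setup concrete, here is a small simulation sketch of the unit mass under two constant forces. The target pair, initial state, horizon, and helper names (`simulate_mass`, `terminal_cost`) are hypothetical and chosen only for illustration; the integration uses forward Euler.

```python
import numpy as np

def simulate_mass(x0, u_of_t, T, N=2000):
    """Euler simulation of x1' = x2, x2' = u, with u clipped so that |u| <= 1."""
    delta = T / N
    x = np.array(x0, dtype=float)
    for k in range(N):
        u = float(np.clip(u_of_t(k * delta), -1.0, 1.0))  # enforce the constraint
        x = x + delta * np.array([x[1], u])
    return x

def terminal_cost(x_T, target):
    """Squared distance between the final state and the target pair."""
    return float(np.sum((x_T - np.asarray(target)) ** 2))

T = 1.0
target = (0.5, 1.0)   # hypothetical target position-velocity pair
x_T = simulate_mass(x0=(0.0, 0.0), u_of_t=lambda t: 1.0, T=T)
cost_full_force = terminal_cost(x_T, target)
cost_no_force = terminal_cost(simulate_mass((0.0, 0.0), lambda t: 0.0, T), target)
```

Starting from rest, the constant force $u = 1$ gives the exact final state $(T^2/2, T) = (0.5, 1)$, so this particular target is reached up to discretization error, while applying no force leaves the mass at the origin.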

EXAMPLE II

A producer with production rate $x(t)$ at time $t$ may allocate a portion $u(t)$ of his/her production rate to reinvestment and $1 - u(t)$ to production of a storable good. Thus $x(t)$ evolves according to

$$\dot{x}(t) = \gamma u(t) x(t),$$

where $\gamma > 0$ is a given constant.

The producer wants to maximize the total amount of product stored

$$\int_0^T \big(1 - u(t)\big) x(t)\,dt$$

subject to

$$0 \le u(t) \le 1, \quad \text{for all } t \in [0,T].$$

The initial production rate $x(0)$ is a given positive number.
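The trade-off between reinvesting and storing can be explored numerically. The sketch below compares two admissible policies, assuming hypothetical parameter values and a simple "reinvest everything, then store everything" switching rule; it makes no claim about which switching time is optimal.

```python
import math

def stored_product(gamma, x0, T, switch_time, N=20000):
    """Euler simulation with u = 1 (reinvest all) before switch_time and u = 0 after."""
    delta = T / N
    switch_step = round(switch_time / delta)
    x, stored = x0, 0.0
    for k in range(N):
        u = 1.0 if k < switch_step else 0.0
        stored += (1.0 - u) * x * delta     # integrand (1 - u(t)) x(t)
        x += gamma * u * x * delta          # x' = gamma * u * x
    return stored

gamma, x0, T = 0.5, 1.0, 4.0    # hypothetical parameter values
never_reinvest = stored_product(gamma, x0, T, switch_time=0.0)
reinvest_then_store = stored_product(gamma, x0, T, switch_time=2.0)

# Closed form for the switching policy: x grows to x0 * exp(gamma * s) at the
# switch time s, then stays constant while everything is stored for T - s.
closed_form = x0 * math.exp(gamma * 2.0) * (T - 2.0)
```

With these numbers, storing from the start yields $x(0)\,T = 4$, while reinvesting for the first half of the horizon yields more, which illustrates why the choice of $u(t)$ matters.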

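EXAMPLE III (CALCULUS OF VARIATIONS)

Find a curve from a given point to a given line that has minimum length.

The problem is

$$\text{minimize } \int_0^T \sqrt{1 + \big(\dot{x}(t)\big)^2}\,dt$$

subject to $x(0) = \alpha$.

Reformulation as an optimal control problem:

$$\text{minimize } \int_0^T \sqrt{1 + \big(u(t)\big)^2}\,dt$$

subject to $\dot{x}(t) = u(t)$, $x(0) = \alpha$.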
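As a quick numerical illustration of the reformulated problem, the sketch below evaluates the length integral for a few slope functions $u(t)$. The slope functions, the horizon, and the helper name `curve_length` are hypothetical; the example assumes no constraint on $x(T)$, as in the formulation above. Since the integrand is at least 1, every curve has length at least $T$, and the zero slope attains it.

```python
import math

def curve_length(u_of_t, T, N=1000):
    """Riemann-sum approximation of integral_0^T sqrt(1 + u(t)^2) dt."""
    delta = T / N
    return sum(math.sqrt(1.0 + u_of_t(k * delta) ** 2) * delta for k in range(N))

T = 2.0
flat = curve_length(lambda t: 0.0, T)               # slope 0: horizontal segment
sloped = curve_length(lambda t: 0.5, T)             # constant slope 0.5
wavy = curve_length(lambda t: math.cos(3 * t), T)   # oscillating slope
```

Here `flat` equals $T$ and `sloped` equals $T\sqrt{1.25}$ up to rounding; the oscillating curve is longer than the flat one as well.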

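We discretize $[0,T]$ at times $0, \delta, 2\delta, \ldots, N\delta$, where $\delta = T/N$, and we let

$$x_k = x(k\delta), \qquad u_k = u(k\delta), \qquad k = 0, 1, \ldots, N.$$

We also discretize the system and cost:

$$x_{k+1} = x_k + f(x_k, u_k)\cdot\delta, \qquad h(x_N) + \sum_{k=0}^{N-1} g(x_k, u_k)\cdot\delta.$$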

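We write the DP algorithm for the discretized problem

$$\tilde{J}^*(N\delta, x) = h(x),$$

$$\tilde{J}^*(k\delta, x) = \min_{u \in U}\Big[ g(x,u)\cdot\delta + \tilde{J}^*\big((k+1)\cdot\delta,\; x + f(x,u)\cdot\delta\big) \Big].$$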
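The discretized DP recursion can be run on a state grid. The following is a minimal sketch under several assumptions that are not part of the lecture text at this point: a scalar state, a finite set of candidate controls, linear interpolation of the next-stage cost (via `numpy.interp`, which clamps outside the grid), and a test problem $\dot{x} = u$, $|u| \le 1$, $g = 0$, $h(x) = \tfrac{1}{2}x^2$. For that test problem the cost-to-go at time 0 is $\tfrac{1}{2}\big(\max\{0, |x| - T\}\big)^2$, which is used for comparison.

```python
import numpy as np

def discretized_dp(f, g, h, controls, x_grid, T, N):
    """Backward recursion J(N*delta, x) = h(x),
    J(k*delta, x) = min_u [g(x,u)*delta + J((k+1)*delta, x + f(x,u)*delta)]."""
    delta = T / N
    J = h(x_grid)                                   # terminal condition on the grid
    for _ in range(N):
        candidates = []
        for u in controls:
            x_next = x_grid + f(x_grid, u) * delta
            # Linear interpolation of the next-stage cost at the successor states.
            candidates.append(g(x_grid, u) * delta + np.interp(x_next, x_grid, J))
        J = np.min(candidates, axis=0)              # minimize over the control set
    return J

T, N = 1.0, 20
x_grid = np.linspace(-3.0, 3.0, 121)                # grid spacing 0.05 = T / N
J0 = discretized_dp(
    f=lambda x, u: u * np.ones_like(x),
    g=lambda x, u: np.zeros_like(x),
    h=lambda x: 0.5 * x ** 2,
    controls=[-1.0, 0.0, 1.0],
    x_grid=x_grid, T=T, N=N,
)
closed_form = 0.5 * np.maximum(0.0, np.abs(x_grid) - T) ** 2
max_error = float(np.max(np.abs(J0 - closed_form)))
```

The grid spacing is chosen equal to $\delta$ so that successor states land on grid points and interpolation introduces essentially no error; with a mismatched grid, `max_error` would reflect interpolation error as well.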

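Assume $\tilde{J}^*$ is differentiable and Taylor-expand:

$$\tilde{J}^*(k\delta, x) = \min_{u \in U}\Big[ g(x,u)\cdot\delta + \tilde{J}^*(k\delta, x) + \nabla_t \tilde{J}^*(k\delta, x)\cdot\delta + \nabla_x \tilde{J}^*(k\delta, x)' f(x,u)\cdot\delta + o(\delta) \Big].$$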

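Let $J^*(t,x)$ be the optimal cost-to-go of the continuous problem. Assuming the limit is valid

$$\lim_{k \to \infty,\ \delta \to 0,\ k\delta = t} \tilde{J}^*(k\delta, x) = J^*(t,x), \quad \text{for all } t, x,$$

we cancel $\tilde{J}^*(k\delta, x)$ from both sides of the Taylor expansion, divide by $\delta$, and let $\delta \to 0$ to obtain, for all $t, x$,

$$0 = \min_{u \in U}\Big[ g(x,u) + \nabla_t J^*(t,x) + \nabla_x J^*(t,x)' f(x,u) \Big]$$

with the boundary condition $J^*(T,x) = h(x)$.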

This is the Hamilton-Jacobi-Bellman (HJB) equation, a partial differential equation that is satisfied for all time-state pairs $(t,x)$ by the cost-to-go function $J^*(t,x)$ (assuming $J^*$ is differentiable and the preceding informal limiting procedure is valid).

It is hard to tell a priori if $J^*(t,x)$ is differentiable.

So we use the HJB Eq. as a verification tool; if we can solve it for a differentiable $J^*(t,x)$, then:

$J^*$ is the optimal cost-to-go function

The control $\mu^*(t,x)$ that minimizes in the RHS for each $(t,x)$ defines an optimal control

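VERIFICATION/SUFFICIENCY THEOREM

Suppose $V(t,x)$ is a solution to the HJB equation; that is, $V$ is continuously differentiable in $t$ and $x$, and is such that for all $t, x$,

$$0 = \min_{u \in U}\Big[ g(x,u) + \nabla_t V(t,x) + \nabla_x V(t,x)' f(x,u) \Big],$$

$$V(T,x) = h(x), \quad \text{for all } x.$$

Suppose also that $\mu^*(t,x)$ attains the minimum above for all $t$ and $x$.

Let $\{x^*(t) \mid t \in [0,T]\}$ and $u^*(t) = \mu^*\big(t, x^*(t)\big)$, $t \in [0,T]$, be the corresponding state and control trajectories.

Then

$$V(t,x) = J^*(t,x), \quad \text{for all } t, x,$$

and $\{u^*(t) \mid t \in [0,T]\}$ is optimal.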

PROOF

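Let $\{(\hat{u}(t), \hat{x}(t)) \mid t \in [0,T]\}$ be any admissible control-state trajectory. We have for all $t \in [0,T]$

$$0 \le g\big(\hat{x}(t), \hat{u}(t)\big) + \nabla_t V\big(t, \hat{x}(t)\big) + \nabla_x V\big(t, \hat{x}(t)\big)' f\big(\hat{x}(t), \hat{u}(t)\big).$$

Using the system equation $\dot{\hat{x}}(t) = f\big(\hat{x}(t), \hat{u}(t)\big)$, the RHS of the above is equal to

$$g\big(\hat{x}(t), \hat{u}(t)\big) + \frac{d}{dt}\Big(V\big(t, \hat{x}(t)\big)\Big).$$

Integrating this expression over $t \in [0,T]$,

$$0 \le \int_0^T g\big(\hat{x}(t), \hat{u}(t)\big)\,dt + V\big(T, \hat{x}(T)\big) - V\big(0, \hat{x}(0)\big).$$

Using $V(T,x) = h(x)$ and $\hat{x}(0) = x(0)$, we have

$$V\big(0, x(0)\big) \le h\big(\hat{x}(T)\big) + \int_0^T g\big(\hat{x}(t), \hat{u}(t)\big)\,dt.$$

If we use $u^*(t)$ and $x^*(t)$ in place of $\hat{u}(t)$ and $\hat{x}(t)$, the inequalities become equalities, and

$$V\big(0, x(0)\big) = h\big(x^*(T)\big) + \int_0^T g\big(x^*(t), u^*(t)\big)\,dt.$$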

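EXAMPLE OF THE HJB EQUATION

Consider the scalar system $\dot{x}(t) = u(t)$, with $|u(t)| \le 1$ and cost $(1/2)\big(x(T)\big)^2$. The HJB equation is

$$0 = \min_{|u| \le 1}\Big[ \nabla_t V(t,x) + \nabla_x V(t,x)\,u \Big], \quad \text{for all } t, x,$$

with the terminal condition $V(T,x) = (1/2)x^2$.

Evident candidate for optimality: $\mu^*(t,x) = -\mathrm{sgn}(x)$. Corresponding cost-to-go

$$J^*(t,x) = \frac{1}{2}\Big(\max\big\{0,\ |x| - (T - t)\big\}\Big)^2.$$

We verify that $J^*$ solves the HJB Eq., and that $u = -\mathrm{sgn}(x)$ attains the min in the RHS. Indeed,

$$\nabla_t J^*(t,x) = \max\big\{0,\ |x| - (T - t)\big\},$$

$$\nabla_x J^*(t,x) = \mathrm{sgn}(x)\cdot\max\big\{0,\ |x| - (T - t)\big\}.$$

Substituting, the HJB Eq. becomes

$$0 = \min_{|u| \le 1}\big[ 1 + \mathrm{sgn}(x)\cdot u \big]\,\max\big\{0,\ |x| - (T - t)\big\}.$$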
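The verification above can also be checked numerically. The sketch below evaluates the right-hand side of the HJB equation for the candidate $J^*$ using central finite differences and a grid of controls in $[-1, 1]$; the value $T = 1$, the sample points, the step size, and the helper names are assumptions made for illustration.

```python
import numpy as np

T = 1.0  # hypothetical terminal time

def J_star(t, x):
    """Candidate cost-to-go: 0.5 * max(0, |x| - (T - t))^2."""
    return 0.5 * max(0.0, abs(x) - (T - t)) ** 2

def hjb_residual(t, x, eps=1e-5):
    """Return (min over |u| <= 1 of dJ/dt + dJ/dx * u, minimizing u),
    with both partial derivatives estimated by central differences."""
    dJ_dt = (J_star(t + eps, x) - J_star(t - eps, x)) / (2 * eps)
    dJ_dx = (J_star(t, x + eps) - J_star(t, x - eps)) / (2 * eps)
    u_grid = np.linspace(-1.0, 1.0, 201)
    values = dJ_dt + dJ_dx * u_grid
    return float(values.min()), float(u_grid[values.argmin()])

# Sample points chosen away from the kinks at x = 0 and |x| = T - t.
samples = [(0.2, 2.0), (0.5, -1.5), (0.9, 0.7)]
results = [hjb_residual(t, x) for t, x in samples]
residuals = [r for r, _ in results]
minimizers = [u for _, u in results]
```

At each sample point the residual is zero up to finite-difference rounding, and the minimizing control on the grid equals $-\mathrm{sgn}(x)$, in agreement with the candidate policy.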

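LINEAR QUADRATIC PROBLEM

Consider the $n$-dimensional linear system

$$\dot{x}(t) = Ax(t) + Bu(t),$$

and the quadratic cost

$$x(T)' Q_T x(T) + \int_0^T \big( x(t)' Q x(t) + u(t)' R u(t) \big)\,dt$$

The HJB equation is

$$0 = \min_{u \in \Re^m}\Big[ x' Q x + u' R u + \nabla_t V(t,x) + \nabla_x V(t,x)'(Ax + Bu) \Big],$$

with the terminal condition $V(T,x) = x' Q_T x$. We try a solution of the form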