Iterative-Refinement for Action Timing Discretization

更新时间：2023-05-26 17:19:01 阅读量：实用文档文档下载

说明：文章内容仅供预览，部分内容可能不全。下载后的文档，内容与下面显示的完全一致。下载之前请确认下面内容是否您想要的，是否完整无缺。

iterative翻译推荐度：
相关推荐

Artificial Intelligence search algorithms search discrete systems. To apply such algorithms to continuous systems, such systems must first be discretized, i.e. approximated as discrete systems. Action-based discretization requires that both action paramete

Iterative-Re nementforActionTimingDiscretization

DepartmentofComputerScience

GettysburgCollegeGettysburg,PA17325,USAtneller@gettysburg.edu

ToddW.Neller

Abstract

Arti cialIntelligencesearchalgorithmssearchdiscretesys-tems.Toapplysuchalgorithmstocontinuoussystems,suchsystemsmust rstbediscretized,i.e.approximatedasdis-cretesystems.Action-baseddiscretizationrequiresthatbothactionparametersandactiontimingbediscretized.Wefocusontheproblemofactiontimingdiscretization.

Afterdescribingan-admissiblevariantofKorf’srecursivebest- rstsearch(-RBFS),weintroduceiterative-re nement-admissiblerecursivebest- rstsearch(IR-RBFS)whichofferssigni ckofknowledgeofagoodtimediscretizationiscompensatedforbyknowledgeofasuitablesolutioncostupperbound.

Introduction

Arti cialIntelligencesearchalgorithmssearchdiscretesys-tems,yetweliveandreasoninacontinuousworld.Con-tinuoussystemsmust rstbediscretized,i.e.approximatedasdiscretesystems,toapplysuchalgorithms.Therearetwocommonwaysthatcontinuoussearchproblemsaredis-cretized:state-baseddiscretizationandaction-baseddis-cretization.State-baseddiscretization(Latombe1991)be-comesinfeasiblewhenthestatespaceishighlydimensional.Action-baseddiscretizationbecomesinfeasiblewhentherearetoomanydegreesoffreedom.Interestingly,biologi-calhigh-degree-of-freedomsystemsareoftengovernedbyamuchsmallercollectionofmotorprimitives(Mataric2000).Wefocushereonaction-baseddiscretization.

Action-baseddiscretizationconsistsoftwoparts:(1)actionparameterdiscretizationand(2)actiontimingdis-cretization,i.e.howandwhentoact.SeeFigure1.Forex-ample,considerrobotsoccer.Searchcanonlysampleactionparametercontinuasuchaskickforceandangle.Similarly,searchcanonlysamplein nitepossibleactiontimingssuchaswhentokick.Themostpopularformofdiscretizationisuniformdiscretization.Itiscommontosamplepossibleactionsandactiontimingsat xedintervals.

Inthispaper,wefocusonactiontimingdiscretization.Experimentalevidenceofthispaperandpreviousstud-ies(Neller2000)suggeststhata xeduniformdiscretiza-tionoftimeisnotadvisableforsearchifonehasadesired

discretizationisstatic,i.e.cannotbevariedbythealgo-rithm.However,actiontimingdiscretizationisdynamic,i.e.thesearchalgorithmcanvarytheactiontimingdiscretiza-tion.Forthisreason,wewillcallsuchsearches“SADATsearches”astheyhaveStaticActionandDynamicActionTimingdiscretization.

WeformalizetheSADATsearchproblemasthequadru-ple:

where

isthestatespace,

istheinitialstate,

isa nitesetofactionfunctions,mappingastateandapositivetime

durationtoasuccessorstateandatransitioncost,and

isthesetofgoalstates.

Theimportantdifferencebetweenthisandclassicalsearchformulationsisthegeneralizationofactions(i.e.op-erators).Ratherthanmappingastatetoanewstateandtheassociatedcostoftheaction,weadditionallytakeatimedu-rationparameterspecifyinghowmuchtimepassesbetweenthestateanditssuccessor.

Agoalpathcanbespeci edasasequenceofaction-durationpairsthatevolvetheinitialstatetoagoalstate.Thecostofapathisthesumofalltransitioncosts.Giventhisgeneralization,thestatespaceisgenerallyin nite,andtheoptimalpathisgenerallyonlyapproximablethroughasam-plingofpossiblepathsthroughthestatespace.

SphereNavigationProblem

SinceSADATsearchalgorithmswillgenerallyonlybeabletoapproximateoptimalsolutions,itishelpfultotestthemonproblemswithknownoptimalsolutions.RichardKorfproposedtheproblemofnavigationbetweentwopointsonthesurfaceofasphereasasimplebenchmarkwithaknownoptimalsolution.1Ourversionoftheproblemisgivenhere.Theshortestpathbetweentwopointsonasphereisalongthegreat-circlepath.Considerthecircleformedbythein-tersectionofasphereandaplanethroughtwopointsonthesurfaceofthesphereandthecenterofthesphere.Thegreat-circlepathbetweenthetwopointsistheshorterpartofthiscirclebetweenthetwopoints.Thegreat-circledistanceisthelengthofthispath.

Thestatespaceisthesetofallpositionsandheadingsonthesurfaceofaunitspherealongwithallnonnegativetimedurationsfortravel.Essentially,weencodepathcost(i.e.time)inthestatetofacilitatede nitionof.Theinitialstateisarbitrarilychosentohaveposition(1,0,0)andve-locity(0,1,0)insphericalcoordinates,withnotimeelapsedinitially.

Theaction,takesastateandtimeduration,andreturnsanewstateandthesametimeduration(i.e.cost=time).Thenewstateistheresultofchangingtheheadingradiansandtravelingwithunitvelocityatthatheadingforthegiventimedurationonthesurfaceofthe

Muchworkhasbeendoneindiscretesearchtotradeoffsolutionoptimalityforspeed.Weightedevaluationfunc-tions(e.g.or)(Pohl1970;Korf1993)provideasimplemeansto ndsolutionsthataresuboptimalbynomorethanamultiplicativefactorof.ForagoodcomparisonofIDA-stylessearches,see(Wah&Shang1995).Forapproximationofsearchtreestoexploitphasetransitionswithaconstantrelativesolutionerror,see(Pemberton&Zhang1996).

-AdmissibleRecursiveBest-FirstSearch

is-admissiblean-admissiblerecursivevariantbest- rstofrecursivesearch,best- rstherecalledsearch-RBFS,thatfollowsthedescriptionof(Korf1993,7.3)withoutfurthersearchafterasolutionisfound.Aswithourimplementationof-IDA,localsearchboundsincreasebyatleast(whennotlimitedbyB)toreduceredundantsearch.

InKorf’sstyleofpseudocode,-RBFSisasfollows:

eRBFS(node:N,value:F(N),bound:B)IFf(N)>B,RETURNf(N)

IFNisagoal,EXITalgorithm

IFNhasnochildren,RETURNinfinityFOReachchildNiofN,

IFf(N)<F(N),F[i]:=MAX(F(N),f(Ni))ELSEF[i]:=f(Ni)

sortNiandF[i]inincreasingorderofF[i]IFonlyonechild,F[2]:=infinityWHILE(F[1]<=BandF[1]<infinity)F[1]:=eRBFS(N1,F[1],

MIN(B,F[2]+epsilon))

insertNiandF[1]insortedorderRETURNF[1]

ThedifferencebetweenRBFSand-RBFSisinthecom-putationoftheboundfortherecursivecall.InRBFS,thisiscomputedasMIN(B,F[2])whereasin-RBFS,thisiscomputedasMIN(B,F[2]+epsilon).F[1]andF[2]arethelowestandsecond-loweststoredcostsofthechildren,respectively.Acorrectnessproofof-RBFSisde-scribedintheAppendix.

Thegiven,algorithm’sa, niteandbound.Actually,initialcallifonebothparameterswishesRBFStorestrictandare-RBFStherootnodesearchcanforso-belutionswithacostofnogreaterthanandusesanadmissi-bleheuristicfunction.Ifnosolutionisfound,thealgorithmwillreturnthe-valueoftheminimumopensearchnodebeyondthesearchcontourof.

InthecontextofSADATsearchproblems,both-IDAand-RBFSassumea xedtimeintervalbetweenanodeanditschild.Thefollowingtwoalgorithmsdonot.

Iterative-Re nement-RBFS

Iterative-re nement(Neller2000)isperhapsbestdescribedincomparisontoiterative-deepening.Iterative-deepeningdepth- rstsearch(Figure2(a))providesboththelin-earmemorycomplexitybene tofdepth- rstsearchandtheminimum-lengthsolution-pathbene tofbreadth- rstsearchatthecostofnodere-expansion.Suchre-expansion

costsaregenerallydominatedbythecostofthe nalitera-tionbecauseoftheexponentialnatureofsearchtimecom-plexity.

Iterative-re nementdepth- rstsearch(Figure2(b))canbelikenedtoaniterative-deepeningsearchtoa xedtime-horizon.Inclassicalsearchproblems,timeisnotanissue.Actionsleadfromstatestootherstates.Whenwegeneral-izesuchproblemstoincludetime,wethenhavethechoiceofhowmuchtimepassesbetweensearchstates.Assuming

thattheverticaltimeintervalinFigure2(b)is,weper-formsuccessivesearcheswithdelays,,,untilagoalpathisfound.

Iterative-deepeningaddressesourlackofknowledgecon-cerningtheproperdepthofsearch.Similarly,iterative-re nementaddressesourlackofknowledgeconcerningthepropertimediscretizationofsearch.Iterative-deepeningperformssuccessivesearchesthatgrowexponentiallyintimecomplexity.Thecomplexityofpreviousunsuccessfuliterationsisgenerallydominatedbythatofthe nalsuccess-fuliteration.Thesameistrueforiterative-re nement.

However,theconceptofiterative-re nementisnotlim-itedtotheuseofdepth- rstsearch.Otheralgorithmssuchas-RBFSmaybeusedaswell.Ingeneral,foreachit-erationofaniterative-re nementsearch,alevelof(perhapsadaptive)time-discretizationgranularityischosenforsearchandanupperboundonthesolutioncostisgiven.Iftheit-eration ndsasolutionwithinthiscostbound,thealgorithmterminateswithsuccess.Otherwise,a nerleveloftime-discretizationgranularityischosen,andsearchisrepeated.Searchissuccessivelyre nedwithrespecttotimegranular-ityuntilasolutionisfound.

Iterative-Re nement-RBFSisoneinstanceofsuchsearch.Thealgorithmcanbesimplydescribedasfollows:

IReRBFS(node:N,bound:B,initDelay:DT)FORI=1toinfinity

FixthetimedelaybetweenstatesatDT/IeRBFS(N,f(N),B)

IFeRBFSexitedwithsuccess,EXITalgorithm

Iterative-Re nement-RBFSdoesnotsearchtoa xedtime-horizon.Rather,eachiterationsearcheswithinasearchcontourboundedby.Successiveiterationssearchtothesamebound,butwith nertemporaldetail.DT/Iisas-signedtoaglobalvariablegoverningthetimeintervalbe-tweensuccessivestatesinsearch.

Iterative-Re nementDFS

ThealgorithmforIterative-Re nementDFSisgivenasfol-lows:

IRDFS(node:N,bound:B,initDelay:DT)FORI=1toinfinity

FixthetimedelaybetweenstatesatDT/IDFS-NOUB(N,f(N),B)

IFDFS-NOUBexitedwithsuccess,EXITalgorithm

Ourdepth- rstsearchimplementationDFS-NOUBusesanodeordering(NO)heuristicandhasapathcostupper-bound(UB).Thenode-orderingheuristicisasusual:Nodes

(a)Iterative-deepening

DFS.Figure2:areexpandedinincreasingorderof-value.Nodesarenotexpandedthatexceedagivencostupperbound.Assumingadmissibilityoftheheuristicfunction,nosolutionswithinthecostupper-boundwillbeprunedfromsearch.

ExperimentalResults

Intheseexperiments,wevaryonlytheinitialtimedelaybetweensearchstatesandobservetheperformanceofthealgorithmswehavedescribed.For-IDAand-RBFS,theinitialistheonlyforsearch.Theiterative-re nementalgorithmssearchusingtheharmonicre nementsequence,,,,andarelimitedto1000re nementiterations.-admissiblesearcheswereperformedwith.

Experimentalresultsforsuccessratesofsearcharesum-marizedinFigure3.Eachpointrepresents500trialsovera xed,randomsetofspherenavigationproblemswith

andcomputedas10%oftheoptimaltime.

Thus,thetargetsizeforeachproblemisthesame,butthevaryingrequirementforsolutionqualitymeansthatdiffer-entdelayswillbeappropriatefordifferentsearchproblems.Searchwasterminatedafter10seconds,sothesuccessrateisthefractionoftimeasolutionwasfoundwithintheallot-tedtimeandre nementiterations.

Inthisempiricalstudy,meansand90%con denceinter-valsforthemeanswerecomputedwith10000bootstrapre-samples.

Letus rstcomparetheperformanceofiterative-re nement(IR)-RBFSand-RBFS.Totheleftofthegraph,wheretheinitialissmall,thetwoalgorithmshaveidenticalbehavior.Thisregionofthegraphindicatesconditionsunderwhichasolutionisfoundwithin10sec-ondsonthe rstiterationornotatall.Thereisnoiterative-re nementinthisregion;thetimecomplexityofthe rst

iterationleavesnotimeforanother.Atabout,weobservethatIR-RBFSbeginstohaveasigni cantlygreatersuccessratethan-RBFS.Atthispoint,thetimecomplexityofsearchallowsformultipleiterations,andthuswebegintoseethebene tsofiterative-re nement.

,IR-Continuingtotherightwithgreaterinitial

RBFSnearsa100%successrate.Atthispoint,thedistri-butionof’soverdifferentiterationsallowsIR-RBFStoreliably ndasolutionwithinthetimeconstraints.Wecanseethedistributionof’sthatmostlikelyyieldsolutionsfromthebehaviorof-RBFS.

WherethesuccessrateofIR-RBFSbeginstofall,the

’sbeginstofalloutsideofthedistributionof rst1000

regionwheresolutionscanbefound.Withourre ne-mentlimitof1000,thelastiterationusesaminimal

.Thehighesttrialsfailnotbecausetimeruns

out.Rather,theiterationlimitisreached.However,evenwithagreaterre nementlimit,wewouldeventuallyreach

wheretheiterativesearchcostincurredonthewaytoa

thegoodrangewouldexceed10seconds.

ComparingIR-RBFSwithIRDFS,we rstnotethatthereislittledifferencebetweenthetwoforlarge.For

,thetwoalgorithmsarealmostalways

abletoperformcompletesearchesofthesamesearchcon-toursthroughalliterationsuptothe rstiterationwithasolutionpath.Thelargeststatisticaldifferenceoccursat

whereIRDFS’ssuccessrateis3.8%higher.We

notethatourimplementationofIRDFShasafasternode-expansionrate,andthat-RBFS’s-admissibilitynecessi-tatessigni cantnodere-expansion.Forthese’s,theuseofIRDFStradesoff-optimalityforspeedandaslightlyhighersuccessrate.Formid-to-low-rangevalues,however,webegintoseetheef ciencyof-RBFSoverDFSwithnodeordering

IR eRBFS

0.90.80.70.60.50.40.30.20.1

eIDA*

Success Rate

IR DFS

eRBFS

Initial Time Delay

Figure3:Effectofvaryinginitial

asthe rstiterationwithasolutionpathpresentsamorecomputationallycostlysearch.Sincethetargetdestinationissosmall,theroutethatactuallyleadsthroughthetargetdestinationisnotnecessarilythemostdirectroute.With-outaperfectheuristicwherecomplexsearchisnecessary,-RBFSshowsitsstrengthrelativetoDFS.Rarelywillprob-lemsbesounconstrainedandoffersuchaneasyheuristicasthisbenchmarkproblem,soIR-RBFSwillbegenerallybebettersuitedforallbutthesimplestsearchproblems.

ComparingIR-RBFSwith-IDA,wenotethat-IDAperformsrelativelypoorlyoverall.Whatispartic-ularlyinterestingistheperformanceof-IDAovertherangewhereIR-RBFSbehavesas-RBFS,i.e.wherenoiterative-re nementtakesplace.Herewehaveempiricalcon rmationofthesigni cantef ciencyof-RBFSover-IDA.

Insummary,iterative-re nementalgorithmsarestatisti-callythesameasorsuperiortotheothersearchesovertherangeofvaluestested.IR-RBFSoffersthegreatestav-eragesuccessrateacrossall.Withrespectto-RBFS,IR-RBFSofferssigni cantlybetterperformanceforspanningmorethanfourordersofmagnitude.These nd-ingsareinagreementwithpreviousempiricalstudiescon-cerningasubmarinedetectionavoidanceproblem(Neller2000).

Thisissigni cantforsearchproblemswherereasonablevaluesforareunknown.Thisisalsosigni cantfor

searchproblemswherereasonablevaluesforareknown

andonewishesto ckofknowledgeofagoodtimediscretizationiscompensatedforbyknowledgeofasuitablesolutioncostupperbound.

Conclusions

Thisempiricalstudyconcerningspherenavigationprovidesinsightintotheimportanceofsearchingwithdynamictimediscretization.Iterative-re nementalgorithmsaregivenaninitialtimedelaybetweensearchstatesandasolutioncostupperbound.Suchalgorithmsiterativelysearchtothis

untilasolutionisfound.boundwithsuccessivelysmaller

Iterative-re nement-admissiblerecursivebest- rstsearch(IR-RBFS)wasshowntobesimilartoorsuperior

spanningover vetoallothersearchesstudiedfor

ordersofmagnitude.Withrespectto-RBFS(withoutiterative-re nement),anew-admissiblevariantofKorf’srecursivebest- rstsearch,IR-RBFSofferssigni cantly

spanningoverfourordersofbetterperformancefor

magnitude.

Iterative-re nementalgorithmsareimportantforsearchproblemswherereasonablevaluesforare(1)unknownor(2)knownandonewishesto ckofknowledgeofagoodtimediscretizationiscompensatedfor

byknowledgeofasuitablesolutioncostupperbound.Ifoneknowsasuitablesolutioncostupperboundforaproblemwherecontinuoustimeisrelevant,aniterative-re nementalgorithmsuchasIR-RBFSisrecommended.

FutureWork

Thereasonthatouriterative-re nementalgorithmsmadeuseofaharmonicre nementsequencedeepening.,It,would)wasbetointerestingfacilitate(i.e.,

tocomparisonseetheperformancetoiterative-ofdifferentre nementsequences.Forexample,ageomet-ricre nementsequence,,wouldyieldauniformdistributionof,’swithonthelogarith-micscale.

Evenmoreinterestingwouldbeamachinelearningap-proachtotheprobleminwhichamappingwaslearnedbetweenprobleminitialconditionsandre nementse-quencesexpectedtomaximizetheutilityofsearch.Theprocesscouldbeknown.Assumingutilities,thatviewedonebothwouldtimeasanwantandoptimizationtothechoosesuccessoftheofsearchesnextsearchoversohaveastomaximizeexpectedsuccessinminimaltimeacrossfutureiterations.

Acknowledgements

TheauthorisgratefultoRichardKorfforsuggestingthespherenavigationproblem,andtotheanonymousreview-ersforgoodinsightandsuggestions.ThisresearchwasdonebothattheStanfordKnowledgeSystemsLaboratorywithsupportbyNASAGrantNAG2-1337,andatGettys-burgCollege.

Appendix:-RBFSProofofCorrectness

Proofofthecorrectnessof-RBFSisverysimilartotheproofofthecorrectnessofRBFSin(Korf1993,pp.52–57).Forbrevity,wehereincludethechangesnecessarytomakethecorrectnessproofof(Korf1993)applicableto-RBFS.Itwillbenecessaryforthereadertohavetheproofavailabletofollowthesechanges.

Lemma4.1Allcalls-RBFSRBFS,where

to.

areoftheform

Substitute“RBFS”for“RBFS”throughallproofs.Forthesecondtolastsentenceofthislemmaproof,substitute:“Thus,.Thus,.”

Lemma4.2Ifis nite,andTdoesnotcontainan

interiorgoalnode,thenRBFSexploresT

andreturnsMF.

Intheinductionstep’s rstandfourthparagraphs,sub-stitute“”for“”.Inthelastsentenceofinductionstepparagraphtwo,theassumptionof“noin nitelyincreasingcostsequences”isnotnecessarybecausethetermforcesaminimumincrementofwhilelessthan.

Lemma4.3Forallcalls,

and.

RBFS

Notetheadditionof“”tothelemmaandmakeasimi-laradditioneverywhereaboundiscomparedtoanterm.Forthethirdsentenceofthesecondtostitute“Since

Becauseforallsiblingsandnodesare,thenlastparagraph,sub-sortedbyvalue,

.of.”

Lemma4.4Whenanodeisexpandedby-RBFS,its

valuedoesnotexceedthevaluesofallopennodesatthetimebymorethan.

Notethelemmachange.“”inthesecondparagraphisthe“value”parameter.Wherever“”occurs,substi-tute“

”.Inthelastsentence,substitute“...doesnotexceedthevaluesofallopennodesinthetreewhenisexpandedbymorethan.”

Theorem4.5RBFS

willperformacomplete nding-admissiblethe rstsearchgoalofnodethetreechosenrootedforexpansion.

atnode,exitingafter-RBFSFortheperforms rstsentence,substitute“Lemma4.4showsthatlastsentence,substitutean-admissible“Sincethesearch.”upperboundIntheonsecondeachtoofthesecallsisthenextlowestFvalueplus,theupperboundsmustalsoincreasecontinually,”.

References

Korf,R.E.1985.Depth- rstiterative-deepening:anoptimaladmissibletreesearch.Arti cialIntelligence27(1):97–109.

Korf,R.E.1993.Linear-spacebest- rstsearch.Arti cialIntelligence62:41–78.

Latombe,J.-C.1991.RobotMotionPlanning.Boston,MA,USA:KluwerAcademicPublishers.

Mataric,M.J.2000.Sensory-motorprimitivesasabasisforimitation:Linkingperceptiontoactionandbiologytorobotics.InNehaniv,C.,andDautenhahn,K.,eds.,Imita-tioninAnimalsandArtifacts.Cambridge,MA,USA:MITPress.SeealsoUSCtechnicalreportIRIS-99-377.

Neller,T.W.2000.Simulation-BasedSearchforHybridSystemControlandAnalysis.Ph.D.Dissertation,StanfordUniversity,PaloAlto,California,USA.AvailableasStan-fordKnowledgeSystemsLaboratorytechnicalreportKSL-00-15atwww.ksl.stanford.edu.

Pemberton,J.C.,andZhang,W.1996.Epsilon-transformation:Exploitingphasetransitionstosolvecom-binatorialoptimizationproblems.Arti cialIntelligence81(1–2):297–325.

Pohl,I.1970.Heuristicsearchviewedaspath ndinginagraph.Arti cialIntelligence1:193–204.

Russell,S.,andNorvig,P.1995.Arti cialIntelligence:amodernapproach.UpperSaddleRiver,NJ,USA:PrenticeHall.

Wah,B.W.,andShang,parisonandevalu-ationofaclassofIDAalgorithms.Int’lJournalofToolswithArti cialIntelligence3(4):493–523.

本文来源：https://www.bwwdw.com/article/8wo4.html

相关文章：

激发教师专业发展内驱力的管理策略10-10

2014年秋石油大学《国家税收》在线作业1-311-28

火炬广场附属结构主体结构施工方案06-21

维修工个人年终工作总结05-05

口译教学大纲05-06

上一篇：招聘入职离职管理制度下一篇：2016年北京协和医学院流行病与卫生统计学考研报录比