A Novel Dynamic Decision Model in 2-player Symmetric Repeated Games

更新时间:2023-06-03 02:42:02 阅读量: 实用文档 文档下载

说明:文章内容仅供预览,部分内容可能不全。下载后的文档,内容与下面显示的完全一致。下载之前请确认下面内容是否您想要的,是否完整无缺。

none

ANovelDynamicDecisionModelin2-player

SymmetricRepeatedGames

(1.InstituteofSystemsEngineering,WuhanUniversity,Wuhan430072,China;2.SchoolofManagement,ChinaUniversityOfGeosciences,Wuhan430074,China)

LiuWeibing,WangXianjia,WangGuangmin

Abstract:ConsideringthedynamiccharacterofrepeatedgamesandMarkovprocess,thispaperpresentedanoveldy-namicdecisionmodelforsymmetricrepeatedgames.Inthismodel,players’actionsweremappedtoaMarkovdecisionprocesswithpayoffs,andtheBoltzmanndistributionwasintroduced.Ourdynamicmodelisdifferentfromothers’,weusedthisdynamicmodeltostudytheiteratedprisoner’sdilemma,andtheresultsshowthatthisdecisionmodelcansuccessfullybeusedinsymmetricrepeatedgamesandhasanabilityofadaptivelearning.Keywords:gametheory;evolutionarygame;repeatedgame;Markovprocess;decisionmodel

1 Introduction

In1970sthefundamentalnotionofevolutionarygametheorywasintroducedbyMaynardSmith.Inhis

[1]

bookEvolutionandTheoryofGameshepresentedanevolutionaryapproachinclassicalgametheory,andhegavethedefinitionofevolutionarystablestrategy

[2]

(ESS)withPrice.Evolutionarygametheoryhasex-tensiveapplicationsinmanyfields,suchasmathemat-ics,biology,ecology,economicsandsociology.

Contrarytoclassicalgametheory,inevolutionarygametheorytheindividualsofapopulationarenotas-sumedtoactconsciouslyandrationally.Sothetheoret-icframeofclassicalgametheoryandthenotionofe-quilibriumcannotbeadaptedtoevolutionarygametheory.Thelastdecadehaswitnessedanumberofpublicationsonthebuildingofmodelsandanalyticmethodsforevolutionaryandrepeatedgames.YaoandDarwenpresentedamethodthatintroducedgenetical-gorithmintoevolutionarygames(iteratedprisoner’s

[3]

.YuceThlolandAdnanAcaninvestiga-dilemma)

tedanantcolonyoptimizationapproachforiterated

[4]

prisoner’sdilemma.Thismethodprovidedgamestrategiesofbetterqualitythangeneticalgorithms,butneededlongerrunningtimes.TakafumiKanazawaandToshimitsuUshiogaveamulti-populationreplicatordy-[5]

namicwitherroneousperceptions.

Duringlastdecadestherearemanyscholarswhousedstochasticprocessinevolutionaryandrepeatedgames.Amiretal.proposedadynamicmodelforsym-[6]

metricgamesusingbirthanddeathprocess.Tadj

Received29January2007

2 SomePreliminaries

andTouzeneextendedtheworkofAmiretal.andgave

aQBDapproachforevolutionarygames(non-symmet-[7]

ricandsymmetricgames).RecentlytheMoranprocesswassuccessfullyintroducedintoevolutionarygamesbyNowakandFudenbergandanewdynamic

[8]

.modelwaspresented

Inthispaper,webuiltanewdynamicmodelforevolutionaryandrepeatedgamesusingtheMarkovde-cisionprocess.Inthismodel,theBoltzmanndistribu-tionwasintroduced.Weusedthedynamicmodeltostudytheiteratedprisoner’sdilemma,andtheresultsshowthatthemodelcansuccessfullysimulatetheac-tionsofplayersandhastheabilityofadaptivelearn-ing.

2.1 MarkovProcess

Supposing.I={0,1,2…}isthespaceofstate,T={0,1,2,…}isasetoftime.Astochasticprocess{Xt,t∈T}iscalledMarkovifforarbitraryandstatei0,i1,…,in-1,in,j,wehaveP{Xt+1=j|Xt=i,Xt-in-1=1,…,X(1)=i1,X(0)=i0}

j|Xt=i}=P{Xt+1=

i.e.,theMarkovprocesswhosefutureprobabilitiesaredeterminedbyitsmostrecentvalues.

Theconditionalprobability

Pij(1)=P{Xt+1=j|Xt=i}

iscalledone-steptransitionprobability.Itmeansthe

probabilityofstatetransitionfromitoj,soPij(1)sat-isfiesthefollowing:

Vol.6No.1,March2008 43

none

P∑ij(1)≥0i,j∈I;

j

P

ij

(1)=1i∈I.

probabilityThereforematrix

,wehavetheone-stepMarkovtransition

P00P1=

(1)P01Pn0(1)P…

(1)……

………

n1(1)….

2.2 ……

strategicWeGamerestrictTheory

ofthreeform.Generallyourviewtoanormaltheclassformoffinitegamegamesconsistsiners,1)keywherePlayerscomponents:LetI=:

{1,2,…,n}beasetofplaydenote2)Strategiesnisapositivespace:integer-Foreach.

playeri∈egiesplayersetasetofallowableactions,calledthepureI,letstratS-i

(s1,s2,…,i∈.IThechoiceofaspecificactionsi∈Siofasisncalledapurestrategy.Thevectors=andplayerany3)playerPayoff)isifunctioncalled∈I,let:apureuiFor(s1,…,anystrategiessstrategiesprofilen)betheprofile.

payofftossicalEvolutionary.

gamegametheory,gameandevolutionarytheoryistheextensionofclas-keycallednotion.Theevolutionarystablestrategygame(isESSadynamic倡)is倡

the

straintsanESSinevolutionary,ifitgametheory.(s,…,sn)is1.:

adherestothefollowingcon-ui(s倡1,…s倡i-1,s倡i,s倡i+1,…s倡n倡

s倡i-1,s倡倡)≥ui(s1,…,s倡2.i,sIfi+1u,…,i(s倡sn)

1,…s倡i-1,s倡i,s倡i+1,…倡倡

s,ssn)=ui(s1i-1ij,3 Dynamic(s倡is+1,…,s倡n),thenui(s1倡

,…,1,…,Decisionsi-1,si,si+1Model

,…,sn,…).

si-1,si,si+1,…n)≥ui3.1 InOurTheresearchIdeaoffieldModel

focusesontherepeatedgames.payoffsrepeatedgames.Inthisgamespaper,players,weregardedupdatestrategiesiterationbytimestheir{spaceX(t),asasetoftimet=1,2,…,inMarkovprocessofgamesItof∈MarkovT},strategyprocessspace,andwasmappedtothestateThereforewereov,theregardedrepeatedasgamesrewardstheforpayoffsstateintransitionrepeated.prisonerprocessplayers’swithrewards.Forwereexamplemapped,thetoaiteratedMark-theiterated,eitherdilemmaprisonercooperationassumes’sdilemma(Ca)binarywasordefectionchoicelookedas(aDfor)theMark.So-44 EngineeringSciences

ovatprocesswithth

two-statetransition.Moregenerally,if(timet(thetrepeatedgame)theplayerisinstateiplayeri=C,D)).Thenattimet+1theprobabilityMarkovchoosespij=

P{Xtransitionstrategyt+1=j|XprobabilityCorDcanbeobtainedthatthebyt=i},i,j=matrixCorD(.Fig

.1).Where

Figiterated.1 TheprisonerMarkov’sprocessdilemma

for3.2 TheDecisionModelinSymmetric  theAsGames

Repeated

alsopayoffaMarkovtheberegardedwhentheprocesswithNstates,supposinguijis

asstatetransferredfromitoj,uijcanriestransitionvariablesofrewardsof.statesarewardSothat,thethesystemforstatetransition.Withrewardswilluijgeneratease-byMarkov,andtransitionitsprobabilityarestochasticprobabilitydistributionisdeterminedsingstrategynowFor2-theplayersplayersymmetricrepeatedmatrix.

games,suppo-nlytimes’isrepeatedi),wecanisinstatei(namelytheplayer’sgamesgettheusingtotalMarkovexpectedpayoffaftertimes,wefollowing’assumetransitionνprocess.First-i(equationfromt)isν:

statethetotal.Easilyexpected,wecanpayoffgetaftertheN

i(t)=∑j=1

pij[uij+νj(t-1)]

equation(1)(ican=1,2,…,beN;t=1,2,3,…)

(1)νN

simplifiedasN

follows:

i(t)=∑j=1

pijuij+j=1

j(t-1)

(2)

Ifdefine   qN

i=∑∑pijνj=1

pijuij (i=1,2,…,N)

qtimeican’sbetransitionexplainedν(t)fromas=qstatetheiexpected(3)

N

.Sothatrewardwecanafterobtain

oneiiUsingtheformofvectors+equation∑j=1pijνj((4)t-1)

(4)

asthefollowingWherecolumn,VV((tt))=:

willbedenotedisQthe+PVcolumn(t-1)vector(n=1,2,3,…)ofν(5)

i(t),Qistheprobabilityvectormatrixof.

qi,andPistheMarkovtransition

none

3.3 MarkovTransitionProbabilityMatrix

expectedFromtransitionpayoffequationV(t(5),ifwewanttoobtainthetotaltheprobability),matrixfirstlyPwe.InshouldthisgettheMarkovprovidesBoltzmanntransitiononemethoddistributionpaper,weadopt,.TheBoltzmanndistributionpijfrom=

expstate(uj(ti)towhere/λj)at/∑timethekexptprobabilityofstate(is

Wherestrategy,uj(t)isthe(k=payoff1,2,…,whenN)uk(t)/λ theplayer(6)

thethelearningj.Theparameterλhasanimportantchoosesroleincreasingrandomnessprocessofdecisions.Byincreasing.Onλ,wecanincreaseenablesλwillresultindecreasingtherandomnessotherhand,which,de-4accuratelythe Examples.

playertochoosetheoptimalstrategymoreandNumericalResults

pleInthissection,amplefor2-playerssymmetricwepresentedrepeatedangamesillustrative.Asexam-example,considertheiteratedprisoner’sdilemmatheasexan-nonplayer-zero,andhassumprisoner’sdilemmaisanon-cooperative,

twogamechoices,played;eitherbetweencooperationtwoplayers.Eachaccordingthepayoffstheygotfortheirchoicesareordefectioncalculated,Table1to TableThegeneral1.

formofprisoner’sdilemma

Rowplayer

CooperationDefection

(3,3)(5,0)

(0,5)(1,1)

FromdominantTabledefectionstrategy.1we,socananyeasilyrationalgetthatplayerdefectionwillchooseisa

iflemmatheycooperatenomatterrodofprisoners,theywhattheotherplayerchooses.But.Towouldovercomegetmore.Thisisthedi-ma[presented9-10]

ous,whichtherepeatsthoughttheofiteratedthisconventionalprisonerproblemgame’s,numerdilemAxel---bothtimesplayersplayerswithnotthehope.Repeatingthenumberofcooperationtheofgamesrepetitionsunknownto.Inthisthispaperwaycan,givegenttryforrepeated.Rathertoexplain,weintendwhethertocooperationdevelopatheoreticwillbeweemerdo-haveSupposinggamesmodelpresent.

timeanalyzecompleted30times’trepeated=30,namelygamestwo.Nowplayersgamesrepeatedgameswhent>30.Beforet=30thewetimesand9thatwasrespectivelyrowobservedplayer,inchoosesthereforelast30timesstrategywecan’repeatedCandassumeDthegamesis21.

FurthermoreandtionDtrix:

(6)issupposed,thetotalwegetthe49payoffobtainedbychoosingCMarkovand9respectivelytransitionprobability.Usingequama--P=

0.According0.770.0.33

tytransitionfunctionto:

,weVoncanNeumannobtain-Morgenstemtherewardmatrixexpectedofutilistate-U=

17/3Inaddition,fromequation17/

3Q=

(3)1.4wehave

Frominitializethebeginningofthe301.4

threpeatedgame,ifwegetthetotalvi(0)Tableexpected=0,byusingequation(5),wecanplayer2 Theiniteratedtotalpayoffexpected(Tableprisonerpayoffs2).

’sdilemma

ofrowt=3031

32

33

34

35

36

37

38

39

40…vcvp(t)clusionsForfurtherstudy,wecangetthefollowingcon-1):

FromtransitionP=0.probabilitymatrix

wetionknowis,thethesteadyMarkov0.770.0.33

-statedistributionprocessisergodicof.Bycomputa-isthesameasthetransitionprobabilitytheMarkovmatrixPprocess

.Thatbothtosaybilityplayers,the0.7andwillexample0.3choosehasanevolutionaryoutcomethatrespectivelystrategyCandDwithproba-bepaperchosen2)Theadaptivelyvalueofaccordingparameter.

λinequation(6)cannodenticalpreference,λ=0.05.tochoicesNotethat,namely,whentoenvironmentplayersλ→0,,inthiswillplayersassignhavei-λplayersishighprobabilitieswhenare,theinclinedrandomnesstothedifferentstrategies.WhentochooseofstrategydecisionsD.isInincreasingaddition,cooperationtherandomnessweemergenceiswillincreasingbe,theprobabilityfor,incanupdatethevalueofλaccordingdecreasingto.environmentTherefore,5modelevolutionary Conclusions

inthispaperandhasrepeatedtheabilitygamesof,adaptivesothelearningdynamic.

tionaryAsanextensionofclassicalgametheory,evolu-tioninrecentandrepeatedyears.gamesEspeciallyhave,attractedthestudymuchonlearning

atten-Vol.6No.1,March2008 45

none

modelofplayerswithboundedrationalityinevolution-aryordynamicgames.Inthispaper,weintroducedMarkovprocessintorepeatedgamesandpresentedadynamicmodelforsymmetricrepeatedgames.ThismodelusedMarkovprocesswithpayofftosimulatetheactionsofplayers.Experimentsshowthatthemodeliseffectiveandcanlearnadaptivelywithexotericenvi-ronment.

[3] YaoX,DarwenP.Geneticalgorithmsandevolutionarygames

Press,2000,313–333.

[A].InBarnettW,ChiarellaC,KeenS,etal,eds.Com-merce,ComplexityandEvolution[C].CambridgeUniversityIEEEInternationalConferenceonSystems,ManandCybernetics[C].2003,1348–1354.

erroneousperceptions[A].2004IEEEInternationalConferenceonSystems,ManandCybernetics[C].2004,1006–1011.

[4] ThlolY,AcanA.Antscanplayprisoner’sdilemma[A].2003

[5] KanazawaT,UshioT.Multi-populationreplicatordynamicswith

Acknowledgements

Wewouldliketothanktheeditorsandthereview-ersfortheirconsideration.WealsoacknowledgethesupportbytheNationalNaturalScienceFoundationofChina(GrantNo.60574071).References

[1] MaynardSJ.EvolutionandtheTheoryofGames[M].Cam-bridgeUniversityPress,NewYork,1982.

ture,1973,246:15–18.

[6] AmirM,BerninghausSK.Anotherapproachtomutationand

learning[J].GamesandEconomicBehavior.1996,14:19–

43.

[J].AppliedMathematicalModelling,2003,27:913–927.

[8] NowakMA,SasakiA,TaylorC,etal.Emergenceofcooperation

don),2004,428:646–650.

[7] TadjL,TouzeneA.AQBDapproachtoevolutionarygametheory

andevolutionarystabilityinfinitepopulations[J].Nature(Lon-

[9] AxelrodR.Theevolutionofstrategiesintheiteratedprisoner’s

dilemma[A].InDavisL,ed.GeneticAlgorithmsinSimulated[10] AxelrodR,HamiltonWD.Theevolutionofcooperation.science

[J].1981,211:1390–1396.

Annealing[C].Pitman,London,1987,32–41.

[2] MaynardSJ,PriceGR.Thelogicofanimalconflict[J].Na-

Author

LiuWeibing,male,bornin1978,graduatedfromWuhanUniversityandnowisadoctorstudentinSystemsEngineeringatWuhanUniversity,Wuhan,China.Mr.Liuhaspublishedover7papers.Hiscurrentresearchisgametheory,decisionandcontroltheory,evolutionaryalgorithmetc.HecanbereachedbyE-mail:liuweibing2002

@sohu.com

46 EngineeringSciences

本文来源:https://www.bwwdw.com/article/uls1.html

Top