A Novel Dynamic Decision Model in 2-player Symmetric Repeated Games
更新时间:2023-06-03 02:42:02 阅读量: 实用文档 文档下载
- a站推荐度:
- 相关推荐
none
ANovelDynamicDecisionModelin2-player
SymmetricRepeatedGames
(1.InstituteofSystemsEngineering,WuhanUniversity,Wuhan430072,China;2.SchoolofManagement,ChinaUniversityOfGeosciences,Wuhan430074,China)
LiuWeibing,WangXianjia,WangGuangmin
1
1
2
Abstract:ConsideringthedynamiccharacterofrepeatedgamesandMarkovprocess,thispaperpresentedanoveldy-namicdecisionmodelforsymmetricrepeatedgames.Inthismodel,players’actionsweremappedtoaMarkovdecisionprocesswithpayoffs,andtheBoltzmanndistributionwasintroduced.Ourdynamicmodelisdifferentfromothers’,weusedthisdynamicmodeltostudytheiteratedprisoner’sdilemma,andtheresultsshowthatthisdecisionmodelcansuccessfullybeusedinsymmetricrepeatedgamesandhasanabilityofadaptivelearning.Keywords:gametheory;evolutionarygame;repeatedgame;Markovprocess;decisionmodel
1 Introduction
In1970sthefundamentalnotionofevolutionarygametheorywasintroducedbyMaynardSmith.Inhis
[1]
bookEvolutionandTheoryofGameshepresentedanevolutionaryapproachinclassicalgametheory,andhegavethedefinitionofevolutionarystablestrategy
[2]
(ESS)withPrice.Evolutionarygametheoryhasex-tensiveapplicationsinmanyfields,suchasmathemat-ics,biology,ecology,economicsandsociology.
Contrarytoclassicalgametheory,inevolutionarygametheorytheindividualsofapopulationarenotas-sumedtoactconsciouslyandrationally.Sothetheoret-icframeofclassicalgametheoryandthenotionofe-quilibriumcannotbeadaptedtoevolutionarygametheory.Thelastdecadehaswitnessedanumberofpublicationsonthebuildingofmodelsandanalyticmethodsforevolutionaryandrepeatedgames.YaoandDarwenpresentedamethodthatintroducedgenetical-gorithmintoevolutionarygames(iteratedprisoner’s
[3]
.YuceThlolandAdnanAcaninvestiga-dilemma)
tedanantcolonyoptimizationapproachforiterated
[4]
prisoner’sdilemma.Thismethodprovidedgamestrategiesofbetterqualitythangeneticalgorithms,butneededlongerrunningtimes.TakafumiKanazawaandToshimitsuUshiogaveamulti-populationreplicatordy-[5]
namicwitherroneousperceptions.
Duringlastdecadestherearemanyscholarswhousedstochasticprocessinevolutionaryandrepeatedgames.Amiretal.proposedadynamicmodelforsym-[6]
metricgamesusingbirthanddeathprocess.Tadj
Received29January2007
2 SomePreliminaries
andTouzeneextendedtheworkofAmiretal.andgave
aQBDapproachforevolutionarygames(non-symmet-[7]
ricandsymmetricgames).RecentlytheMoranprocesswassuccessfullyintroducedintoevolutionarygamesbyNowakandFudenbergandanewdynamic
[8]
.modelwaspresented
Inthispaper,webuiltanewdynamicmodelforevolutionaryandrepeatedgamesusingtheMarkovde-cisionprocess.Inthismodel,theBoltzmanndistribu-tionwasintroduced.Weusedthedynamicmodeltostudytheiteratedprisoner’sdilemma,andtheresultsshowthatthemodelcansuccessfullysimulatetheac-tionsofplayersandhastheabilityofadaptivelearn-ing.
2.1 MarkovProcess
Supposing.I={0,1,2…}isthespaceofstate,T={0,1,2,…}isasetoftime.Astochasticprocess{Xt,t∈T}iscalledMarkovifforarbitraryandstatei0,i1,…,in-1,in,j,wehaveP{Xt+1=j|Xt=i,Xt-in-1=1,…,X(1)=i1,X(0)=i0}
j|Xt=i}=P{Xt+1=
i.e.,theMarkovprocesswhosefutureprobabilitiesaredeterminedbyitsmostrecentvalues.
Theconditionalprobability
Pij(1)=P{Xt+1=j|Xt=i}
iscalledone-steptransitionprobability.Itmeansthe
probabilityofstatetransitionfromitoj,soPij(1)sat-isfiesthefollowing:
Vol.6No.1,March2008 43
none
P∑ij(1)≥0i,j∈I;
j
P
ij
(1)=1i∈I.
probabilityThereforematrix
,wehavetheone-stepMarkovtransition
P00P1=
…
(1)P01Pn0(1)P…
(1)……
………
n1(1)….
2.2 ……
…
…
strategicWeGamerestrictTheory
ofthreeform.Generallyourviewtoanormaltheclassformoffinitegamegamesconsistsiners,1)keywherePlayerscomponents:LetI=:
{1,2,…,n}beasetofplaydenote2)Strategiesnisapositivespace:integer-Foreach.
playeri∈egiesplayersetasetofallowableactions,calledthepureI,letstratS-i
(s1,s2,…,i∈.IThechoiceofaspecificactionsi∈Siofasisncalledapurestrategy.Thevectors=andplayerany3)playerPayoff)isifunctioncalled∈I,let:apureuiFor(s1,…,anystrategiessstrategiesprofilen)betheprofile.
payofftossicalEvolutionary.
gamegametheory,gameandevolutionarytheoryistheextensionofclas-keycallednotion.Theevolutionarystablestrategygame(isESSadynamic倡)is倡
the
straintsanESSinevolutionary,ifitgametheory.(s,…,sn)is1.:
adherestothefollowingcon-ui(s倡1,…s倡i-1,s倡i,s倡i+1,…s倡n倡
s倡i-1,s倡倡)≥ui(s1,…,s倡2.i,sIfi+1u,…,i(s倡sn)
1,…s倡i-1,s倡i,s倡i+1,…倡倡
s,ssn)=ui(s1i-1ij,3 Dynamic(s倡is+1,…,s倡n),thenui(s1倡
,…,1,…,Decisionsi-1,si,si+1Model
,…,sn,…).
si-1,si,si+1,…n)≥ui3.1 InOurTheresearchIdeaoffieldModel
focusesontherepeatedgames.payoffsrepeatedgames.Inthisgamespaper,players,weregardedupdatestrategiesiterationbytimestheir{spaceX(t),asasetoftimet=1,2,…,inMarkovprocessofgamesItof∈MarkovT},strategyprocessspace,andwasmappedtothestateThereforewereov,theregardedrepeatedasgamesrewardstheforpayoffsstateintransitionrepeated.prisonerprocessplayers’swithrewards.Forwereexamplemapped,thetoaiteratedMark-theiterated,eitherdilemmaprisonercooperationassumes’sdilemma(Ca)binarywasordefectionchoicelookedas(aDfor)theMark.So-44 EngineeringSciences
ovatprocesswithth
two-statetransition.Moregenerally,if(timet(thetrepeatedgame)theplayerisinstateiplayeri=C,D)).Thenattimet+1theprobabilityMarkovchoosespij=
P{Xtransitionstrategyt+1=j|XprobabilityCorDcanbeobtainedthatthebyt=i},i,j=matrixCorD(.Fig
.1).Where
Figiterated.1 TheprisonerMarkov’sprocessdilemma
for3.2 TheDecisionModelinSymmetric theAsGames
Repeated
alsopayoffaMarkovtheberegardedwhentheprocesswithNstates,supposinguijis
asstatetransferredfromitoj,uijcanriestransitionvariablesofrewardsof.statesarewardSothat,thethesystemforstatetransition.Withrewardswilluijgeneratease-byMarkov,andtransitionitsprobabilityarestochasticprobabilitydistributionisdeterminedsingstrategynowFor2-theplayersplayersymmetricrepeatedmatrix.
games,suppo-nlytimes’isrepeatedi),wecanisinstatei(namelytheplayer’sgamesgettheusingtotalMarkovexpectedpayoffaftertimes,wefollowing’assumetransitionνprocess.First-i(equationfromt)isν:
statethetotal.Easilyexpected,wecanpayoffgetaftertheN
i(t)=∑j=1
pij[uij+νj(t-1)]
equation(1)(ican=1,2,…,beN;t=1,2,3,…)
(1)νN
simplifiedasN
follows:
i(t)=∑j=1
pijuij+j=1
j(t-1)
(2)
Ifdefine qN
i=∑∑pijνj=1
pijuij (i=1,2,…,N)
qtimeican’sbetransitionexplainedν(t)fromas=qstatetheiexpected(3)
N
.Sothatrewardwecanafterobtain
oneiiUsingtheformofvectors+equation∑j=1pijνj((4)t-1)
(4)
asthefollowingWherecolumn,VV((tt))=:
willbedenotedisQthe+PVcolumn(t-1)vector(n=1,2,3,…)ofν(5)
i(t),Qistheprobabilityvectormatrixof.
qi,andPistheMarkovtransition
none
3.3 MarkovTransitionProbabilityMatrix
expectedFromtransitionpayoffequationV(t(5),ifwewanttoobtainthetotaltheprobability),matrixfirstlyPwe.InshouldthisgettheMarkovprovidesBoltzmanntransitiononemethoddistributionpaper,weadopt,.TheBoltzmanndistributionpijfrom=
expstate(uj(ti)towhere/λj)at/∑timethekexptprobabilityofstate(is
Wherestrategy,uj(t)isthe(k=payoff1,2,…,whenN)uk(t)/λ theplayer(6)
thethelearningj.Theparameterλhasanimportantchoosesroleincreasingrandomnessprocessofdecisions.Byincreasing.Onλ,wecanincreaseenablesλwillresultindecreasingtherandomnessotherhand,which,de-4accuratelythe Examples.
playertochoosetheoptimalstrategymoreandNumericalResults
pleInthissection,amplefor2-playerssymmetricwepresentedrepeatedangamesillustrative.Asexam-example,considertheiteratedprisoner’sdilemmatheasexan-nonplayer-zero,andhassumprisoner’sdilemmaisanon-cooperative,
twogamechoices,played;eitherbetweencooperationtwoplayers.Eachaccordingthepayoffstheygotfortheirchoicesareordefectioncalculated,Table1to TableThegeneral1.
formofprisoner’sdilemma
Rowplayer
CooperationDefection
(3,3)(5,0)
(0,5)(1,1)
FromdominantTabledefectionstrategy.1we,socananyeasilyrationalgetthatplayerdefectionwillchooseisa
iflemmatheycooperatenomatterrodofprisoners,theywhattheotherplayerchooses.But.Towouldovercomegetmore.Thisisthedi-ma[presented9-10]
ous,whichtherepeatsthoughttheofiteratedthisconventionalprisonerproblemgame’s,numerdilemAxel---bothtimesplayersplayerswithnotthehope.Repeatingthenumberofcooperationtheofgamesrepetitionsunknownto.Inthisthispaperwaycan,givegenttryforrepeated.Rathertoexplain,weintendwhethertocooperationdevelopatheoreticwillbeweemerdo-haveSupposinggamesmodelpresent.
timeanalyzecompleted30times’trepeated=30,namelygamestwo.Nowplayersgamesrepeatedgameswhent>30.Beforet=30thewetimesand9thatwasrespectivelyrowobservedplayer,inchoosesthereforelast30timesstrategywecan’repeatedCandassumeDthegamesis21.
FurthermoreandtionDtrix:
(6)issupposed,thetotalwegetthe49payoffobtainedbychoosingCMarkovand9respectivelytransitionprobability.Usingequama--P=
0.According0.770.0.33
tytransitionfunctionto:
,weVoncanNeumannobtain-Morgenstemtherewardmatrixexpectedofutilistate-U=
17/3Inaddition,fromequation17/
3Q=
(3)1.4wehave
Frominitializethebeginningofthe301.4
threpeatedgame,ifwegetthetotalvi(0)Tableexpected=0,byusingequation(5),wecanplayer2 Theiniteratedtotalpayoffexpected(Tableprisonerpayoffs2).
’sdilemma
ofrowt=3031
32
33
34
35
36
37
38
39
40…vcvp(t)clusionsForfurtherstudy,wecangetthefollowingcon-1):
FromtransitionP=0.probabilitymatrix
wetionknowis,thethesteadyMarkov0.770.0.33
,
-statedistributionprocessisergodicof.Bycomputa-isthesameasthetransitionprobabilitytheMarkovmatrixPprocess
.Thatbothtosaybilityplayers,the0.7andwillexample0.3choosehasanevolutionaryoutcomethatrespectivelystrategyCandDwithproba-bepaperchosen2)Theadaptivelyvalueofaccordingparameter.
λinequation(6)cannodenticalpreference,λ=0.05.tochoicesNotethat,namely,whentoenvironmentplayersλ→0,,inthiswillplayersassignhavei-λplayersishighprobabilitieswhenare,theinclinedrandomnesstothedifferentstrategies.WhentochooseofstrategydecisionsD.isInincreasingaddition,cooperationtherandomnessweemergenceiswillincreasingbe,theprobabilityfor,incanupdatethevalueofλaccordingdecreasingto.environmentTherefore,5modelevolutionary Conclusions
inthispaperandhasrepeatedtheabilitygamesof,adaptivesothelearningdynamic.
tionaryAsanextensionofclassicalgametheory,evolu-tioninrecentandrepeatedyears.gamesEspeciallyhave,attractedthestudymuchonlearning
atten-Vol.6No.1,March2008 45
none
modelofplayerswithboundedrationalityinevolution-aryordynamicgames.Inthispaper,weintroducedMarkovprocessintorepeatedgamesandpresentedadynamicmodelforsymmetricrepeatedgames.ThismodelusedMarkovprocesswithpayofftosimulatetheactionsofplayers.Experimentsshowthatthemodeliseffectiveandcanlearnadaptivelywithexotericenvi-ronment.
[3] YaoX,DarwenP.Geneticalgorithmsandevolutionarygames
Press,2000,313–333.
[A].InBarnettW,ChiarellaC,KeenS,etal,eds.Com-merce,ComplexityandEvolution[C].CambridgeUniversityIEEEInternationalConferenceonSystems,ManandCybernetics[C].2003,1348–1354.
erroneousperceptions[A].2004IEEEInternationalConferenceonSystems,ManandCybernetics[C].2004,1006–1011.
[4] ThlolY,AcanA.Antscanplayprisoner’sdilemma[A].2003
[5] KanazawaT,UshioT.Multi-populationreplicatordynamicswith
Acknowledgements
Wewouldliketothanktheeditorsandthereview-ersfortheirconsideration.WealsoacknowledgethesupportbytheNationalNaturalScienceFoundationofChina(GrantNo.60574071).References
[1] MaynardSJ.EvolutionandtheTheoryofGames[M].Cam-bridgeUniversityPress,NewYork,1982.
ture,1973,246:15–18.
[6] AmirM,BerninghausSK.Anotherapproachtomutationand
learning[J].GamesandEconomicBehavior.1996,14:19–
43.
[J].AppliedMathematicalModelling,2003,27:913–927.
[8] NowakMA,SasakiA,TaylorC,etal.Emergenceofcooperation
don),2004,428:646–650.
[7] TadjL,TouzeneA.AQBDapproachtoevolutionarygametheory
andevolutionarystabilityinfinitepopulations[J].Nature(Lon-
[9] AxelrodR.Theevolutionofstrategiesintheiteratedprisoner’s
dilemma[A].InDavisL,ed.GeneticAlgorithmsinSimulated[10] AxelrodR,HamiltonWD.Theevolutionofcooperation.science
[J].1981,211:1390–1396.
Annealing[C].Pitman,London,1987,32–41.
[2] MaynardSJ,PriceGR.Thelogicofanimalconflict[J].Na-
Author
LiuWeibing,male,bornin1978,graduatedfromWuhanUniversityandnowisadoctorstudentinSystemsEngineeringatWuhanUniversity,Wuhan,China.Mr.Liuhaspublishedover7papers.Hiscurrentresearchisgametheory,decisionandcontroltheory,evolutionaryalgorithmetc.HecanbereachedbyE-mail:liuweibing2002
@sohu.com
46 EngineeringSciences
正在阅读:
A Novel Dynamic Decision Model in 2-player Symmetric Repeated Games06-03
27、青山处处埋忠骨导学案05-20
青岛版小学数学一年级下册《厘米的认识》听课评课记录12-19
你是我心中的一首歌06-05
彩虹精化:2010年第一次临时股东大会的法律意见书 2010-01-2006-07
2022-2022年清远市清城区建设工程质量检测站招聘真题及答案解析.04-10
喂小鸡作文300字06-12
《房地产评估》 模拟试题A卷06-29
浅谈商务谈判僵局化解策略的运用07-20
回忆中的痛作文500字06-26
- 1The decision of the New York Philharmonic to hire
- 2DYNAMIC PATH-PLANNING FOR SEARCH AND DESTROY
- 3Simulating the Electroweak Phase Transition in the SU(2) Higgs Model
- 4Model Test One
- 5model combination (tong)
- 6Dirac operator and Ising model on a compact 2D random lattice
- 7A novel diffusion cell for in vitro transdermal
- 8A Novel Approach for Automatic Palmprint Recognition
- 9CD-Player-DAC-Transport List
- 10drools Decision Table(决策表)
- 教学能力大赛决赛获奖-教学实施报告-(完整图文版)
- 互联网+数据中心行业分析报告
- 2017上海杨浦区高三一模数学试题及答案
- 招商部差旅接待管理制度(4-25)
- 学生游玩安全注意事项
- 学生信息管理系统(文档模板供参考)
- 叉车门架有限元分析及系统设计
- 2014帮助残疾人志愿者服务情况记录
- 叶绿体中色素的提取和分离实验
- 中国食物成分表2020年最新权威完整改进版
- 推动国土资源领域生态文明建设
- 给水管道冲洗和消毒记录
- 计算机软件专业自我评价
- 高中数学必修1-5知识点归纳
- 2018-2022年中国第五代移动通信技术(5G)产业深度分析及发展前景研究报告发展趋势(目录)
- 生产车间巡查制度
- 2018版中国光热发电行业深度研究报告目录
- (通用)2019年中考数学总复习 第一章 第四节 数的开方与二次根式课件
- 2017_2018学年高中语文第二单元第4课说数课件粤教版
- 上市新药Lumateperone(卢美哌隆)合成检索总结报告
- Symmetric
- Decision
- Repeated
- Dynamic
- player
- Novel
- Model
- Games
- 电工基础第三节 交流电路
- 车用空调机工作原理说明书
- DOTA食尸鬼个人心得
- 中国冷冻冷藏市场现状调研及发展商机研究报告(2014-2018)
- 赤土店派出所深入贯彻学习“坚持把群众呼声作为第一信号”文章精神
- 实用沟通技巧_(超级棒_理论与案例相结合)
- 2013年度防治水规划
- 东易日盛装饰公司多年来一直在行业
- 根据Proteus的步进电机的设计仿真
- 市政二级建造师历年真题考试试卷汇总
- 这些方法助你时间高效的人际沟通
- 中国抑郁障碍防治指南目录
- 【合肥工业大学】【半导体器件物理】试卷含答案
- 三角函数解答题 20051122083651319
- 河北省唐山一中2018-2019学年高一下学期期末考试地理试题 Word版含答案
- 深圳市2011年初中毕业生学业考试科学试卷((Word版)
- 高中历史必修二知识点总结
- 2013版思想道德修养与法律基础绪论(完整版)
- 2016中考历史复习提纲
- 19种常见蔬菜营养价值分析