Load and memory balanced mesh partitioning for a parallel envelope method
更新时间:2023-07-22 07:23:01 阅读量: 实用文档 文档下载
- load推荐度:
- 相关推荐
Abstract. We use a parallel direct solver based on the Schur complement method for solving large sparse linear systems arising from the finite element method. A domain decomposition of a problem is performed using a graph partitioning. It results in sparse
LoadandMemoryBalancedMeshPartitioning
foraParallelEnvelopeMethod
OndˇrejMedek,PavelTvrd´ k,andJaroslavKruis
CzechTechnicalUniversity,Prague,CzechRepublic,
xmedeko@fel.cvut.cz,tvrdik@fel.cvut.cz,kruis@fsv.cvut.cz
Abstract.WeuseaparalleldirectsolverbasedontheSchurcomple-mentmethodforsolvinglargesparselinearsystemsarisingfromthe niteelementmethod.Adomaindecompositionofaproblemisperformedus-ingagraphpartitioning.Itresultsinsparsesubmatriceswithbalancedsizes.Anenvelopemethodisusedtofactorizethesesubmatrices.How-ever,thememoryrequirementstostorethemandthecomputationalcosttofactorizethemdependsheavilyontheirstructure.Weproposeatech-niquethatmodi esthemultilevelgraphpartitioningschematobalancerealcomputationalloadormemoryrequirementsofthesolver.
1Introduction
Manyengineeringandscienti cproblemsaresolvedusingthe niteelementmethod(FEM).Weconsiderameshof niteelements.Oneelementconsistsofseveralnodes.Eachnodehasitsdegreesoffreedom(DOFs).Inotherwords,thenodescontainvariables.Globally,thesevariablesformasystemofequationsAx=b,whereAisann×nsparsematrixofcoe cients.AmeshisusuallyrepresentedbyadualgraphGDandanodalgraphGN.Theverticesinthedualgraphrepresentthe niteelementsand2verticesareadjacentifandonlyifthecorrespondingelementsshareacommonsurfacein3Doracommonedgein2D.Theverticesinthenodalgraphrepresentthemeshnodes.Allverticesthatrepresentmeshnodesbelongingtooneelementformaclique.AnexampleofaFEmeshanditsdualandnodalgraphisonFig.1.
WeusesolverSIFEL,developedattheCzechTechnicalUniversity,forsolvingproblemsbytheFEM.Amongothers,itcansolveproblemsinparallelbythepopularmethodoftheSchurcomplements,whicharecomputedbyanenvelopemethod[1].Asolutionofaproblemconsistsof6followingphases:1.
2.
3.
4.
5.Domaindecomposition.Orderingofnodes.Assemblingofsubmatrices.Factorizationofsubmatrices(computationoftheSchurcomplements).Solutionofthereducedproblem.
Abstract. We use a parallel direct solver based on the Schur complement method for solving large sparse linear systems arising from the finite element method. A domain decomposition of a problem is performed using a graph partitioning. It results in sparse
1
5
9ad
g
132610beh143711cfi154816(a)(b)(c)
Fig.1.Aquadrilateralmesh(a)with9elementsa–iand16nodes1–16.Thedual(b)andthenodal(c)graphderivedfromthemesh.
6.Backsubstitutiononsubdomains.
Thedomaindecomposition,andoptionallyorderingofnodes,aredoneasprepro-cessingsteps.Thesolverdoestherest.Phase3isthemostmemoryconsumingandPhase4isthemostcomputationallyintensivepartofthewholesolution.
Multileveltoolsarewidelyusedtosolvetheproblemofdomaindecompo-sition.Thedualgraphispartitionedintokparts,inducingsubdomainsandcorrespondingsubmatrices,sothatthesizesofsubmatricesareroughlyequal.De nition1.ConsideragraphG=(V,E)andanintegerk>=2.Anedgecut(nodecut)isasetofedges(vertices,respectively)whoseremovaldividesthegraphintoatleastkpartitions.Thek-waygraphpartitioningproblemisto.partitionVintokpairwisedisjointsubsetsV1,V2,...,Vksuchthat|Vi|=|V|/kandthesizeoftheedgecutisminimized.Apartitioningofagraphbyanodecutissimilar.
Eventhoughthesizesofsubmatricesareroughlyequal,theirmemoryre-quirementsortheirfactorizationtimeinPhase4arenotequal.Tode nethisformally,weusethetermqualitytodenotethememoryorcomputationalcom-plexity.
De nition2.Givenanunbalancingthresholdδ>=1,wesaythatapartitioningV1,V2,...,Vkwithasetofqualities{q1,q2,...,qk}isbalancedif
δ>qi)k/=(maxi=1
ApartitionViisoverbalancedifδ<qik/
ifatleastonepartitionisoverbalanced.kk i=1iqi.qi.(1) Thepartitioningisdisbalanced
Ingeneral,thequalitiesofsubmatricesarein uencedbytheorderingofvariables,aswasalreadymentionedin[2,3].
Inthispaper,wedescribeanovelapproachtothedomaindecompositionthatresultsintopartitioningwithbalancedmemoryrequirementsorbalanced
Abstract. We use a parallel direct solver based on the Schur complement method for solving large sparse linear systems arising from the finite element method. A domain decomposition of a problem is performed using a graph partitioning. It results in sparse
factorizationtimeestimations.Thisinfactleadstoshorterexecutiontimeoftheparallelsolver.Theideaistointegrateanorderingalgorithmintoamultilevelgraphpartitioningschema.
Section2describesthepreviouswork.InSection3,thenewre nementheuris-ticsthatallowstobalancebettermemoryorcomputingcomplexityisexplained.InSection4,theresultsofexperimentsarepresentedandSection5concludesthepaper.
2
2.1PreviousWorkDomainDecomposition(Phase1)
Commonmethodsfordomaindecompositionarebasedongraphpartitioning,typicallyofdualgraphs.Asubdomainismadeofelementsfromthesameparti-tion.Thenodesbelongingtomorethanonesubdomainarecalledboundaryandtheremainingnodesareinternal.Variablesoftheinternal(boundary)nodesarecalledcorrespondingly.AnexampleonFig.2(a)showsa2-waypartitioningofGDofthequadrilateralmeshfromFig.1.Thecorrespondingdomaindecompo-sitionisshownonFig.2(b).Theboundarynodesare3,7,10,11,and14.NotethatthiswayofdomaindecompositionproducespartitioningofthenodalgraphGNbyanodecut,asshownonFig.2(c).
1
9
135adg14142610be371110h153711cfi1648(a)(b)(c)
Fig.2.PartitioningofthemeshfromFig.1usingitsdualgraph(a)into2partitions(b)andcorrespondingpartitioningofthenodalgraph(c).Theedgecut(a)isindicatedbyadashedlineandnodesinthenodecut(c)as lledsquares.
MultilevelGraphPartitioning.Popularmultilevelgraphpartitioningsoft-wareisMETIS[4,5],CHACO[6],JOSTLE[7],andSCOTCH[8].Ourworkisbasedonthemultilevelk-waygraphpartitioningimplementedinMETIS[5].Thisschemaconsistsofthefollowing3phases,shownonFig.3.
Coarsening.AsequenceofsmallergraphsGl=(Vl,El)isconstructedfrom
theoriginalgraphG=G0=(V0,E0)sothat|Vl|>|Vl+1|.Thesequenceof
Abstract. We use a parallel direct solver based on the Schur complement method for solving large sparse linear systems arising from the finite element method. A domain decomposition of a problem is performed using a graph partitioning. It results in sparse
G0
Uncoarsening phase3G
GInitial partitioning phase
Fig.3.Theschemaofmultilevelk-waypartitioning.
coarsergraphscreateslevels.Amatchingisoftenusedtocollapse2verticesintoamultivertex.TheSHEMheuristicsformatching,introducedin[5],workswell.
Initialpartitioning.Whenthecoarsestgraphissu cientlysmall,itcanbe
partitionedbyanygraphpartitioningtechnique.
Uncoarsening.AcoarsergraphGlisuncoarsedtoGl 1andthepartitioning
ofGlisprojectedtoGl 1andthenre ned.TheFiduccia-Mattheyses(FM)heuristicsisasimple,fast,andsu cientlygoodoptionforthere nement.Itsearchescandidateverticesinthesetofboundaryvertices,i.e.,verticesadjacenttoavertexinanotherpartition.Thenittriestomoveaselectedvertexintootherpartitions.Themoveisacceptedifoneofthefollowingconditionsisful lled:
1.Thesizeoftheedgecutisdecreasedandthepartitioningremainsbal-anced.
2.Thesizeoftheedgecutisnotincreased,butthebalancingisimproved.Toimprovethebalancingofastronglydisbalancedpartitioning,abalancingstepmaybyadded.ItworksliketheFMheuristics,buttheconditionsforacceptingamovearedi erent:
1.Thebalancingisimproved.
2.Thesizeoftheedgecutisdecreased,butthebalancingisnotworsened.
2.2ReorderingandAssemblingofSubmatrices(Phases2+3)
Afterthedomaindecomposition,theinternalnodesinallpartitionsarereorderedtominimizethesizeoftheenvelope.
De nition3.Inthei-thstepofthefactorizationofasymmetricmatrixA,thewavefrontissetofpairs
<wi(A)={{r,i}: ars=0,r>=i,s=i}.(2)
Abstract. We use a parallel direct solver based on the Schur complement method for solving large sparse linear systems arising from the finite element method. A domain decomposition of a problem is performed using a graph partitioning. It results in sparse
TheenvelopeofAisthen(symbol\isthesetdi erence)
Env(A)=n
i=1wi(A)\{i,i}.(3)
TheenvelopemethodstoresthepartsofAinsideenvelopes,whichrepresentsthemainportionofthememoryconsumptionofthesolver.TheSIFELsolverusestheSloanreorderingalgorithm[9],butotheralgorithms(RCM,hybridalgorithm)canbeusedaswell.
Everynodeigeneratesdiequations,wherediisthenumberofDOFs.Letniandnbdenotethenumberofinternalandboundary,respectively,variables.Clearly,ni+nb=n.TheSIFELsolverreadsthesubdomainsandassemblessubmatriceswithnodesinagivenorder.Theboundarynodescomelast.Wees-timatememoryrequirementsW(A)oftheenvelopesolveranditscomputationalcomplexityOP(A)by
W(A)=
OP(A)=n i=1ni
i=1|wi(A)|=|Env(A)|+n|wi(A)|2.(4)(5)
3TheNewRe nementHeuristicswithQualityBalancingInthissection,weexplainthemainideasofloadandmemorybalancingfortheparallelenvelopemethod.Atthebeginning,itmustbedecidedwhichqualitywillbebalanced:thecomputationalloadorthememorycomplexity.Thenthepartitioningisstarted.Toestimatethequality,thepartitionermusthavetheinformationaboutthestructureofthematrixA.SincethedualgraphGDdoesnotsu ce,thepartitionermusthaveatdisposalalsotheweightednodalgraphGN,inwhicheachvertexislabelledwiththenumberofDOFsofthenode.WepartitionGDandprojectthepartitioningofGDtoGNtoestimatethequality.So,thepartitionerneedsalsotheinformationabouttherelationbetweenGDandGN.
Weuseamultilevelk-waygraphpartitioningasdescribedinSect.2.1.The rsttwophasesareperformedunchangedasinMETIS.TheSHEMheuristicsisusedforthecoarseningandthemultilevelgraphbisectionisusedfortheinitialpartitioning.Wehavemodi edtheconditionsofmoveacceptanceofthere nementheuristicsintheuncoarseningphaseandextendeditwithaqualityestimationprocess.ThisnewheuristicsiscalledQB.Assumethatthecurrentlevelisl.SimilartoFM,QBchoosesavertexasacandidateformovingfromaDsourcepartitionGDl,stoatargetpartitionGl,t.Todecidewhetherthemovewill
Dbeaccepted,QBstartsforbothpartitionsGDl,sandGl,ttheestimationprocess,
sketchedonFig.4,toobtaininformationaboutbalancingofthequalities.DFirst,partitionGDl,p,p∈{s,t},isprojectedtoG0,p,theoriginalpartition.(Thisisskippedinthe nallevel,wherel=0).Afterthat,allnodesbelonging
Abstract. We use a parallel direct solver based on the Schur complement method for solving large sparse linear systems arising from the finite element method. A domain decomposition of a problem is performed using a graph partitioning. It results in sparse
Fig.4.Data owoftheQBheuristics
NNtoelementsfromGD0,pcorrespondtoapartitionGpofG.Thentheinternal
NverticesofGNparereorderedbytheSloanalgorithm.Finally,thequalityofGpisestimatedandreturnedtothere nementheuristics.
InthecurrentimplementationoftheQBheuristics,nodeshaveeithercon-stantnumberofDOFsd>0orareconstrained,i.e.,thenumberofDOFsis0.NAllconstrainednodesareomittedinthestepofprojectionofGD0,ptoGp,i.e.,thereorderingisperformedonlywithnodeswiththenumberofDOFsd>0.Afterthat,nodeigeneratesequationsnumbereddi,di+1,...di+d 1andwavefrontswdi(A),wdi+1(A),...,wdi+d 1(A).
TheoriginalFMheuristicscomputessumsofweightsofverticesinthesourceandtargetpartitionsforeverycandidatemove.Infact,theweightofthecan-didatevertexissubtractedfromtheweightofthesourcepartitionandaddedtotheweightofthetargetpartition.However,intheQBheuristics,thiswouldimplythereorderingandestimationcomputingforeverycandidatemoveandthiswouldextremelyslowdownthere nement.Thus,wehadtomodifytheconditionsofmoveacceptanceasfollows:
1.Thesizeoftheedgecutisdecreasedandthetargetpartitionisnotoverbal-anced.
2.Thequalityqsofthesourcepartitionisgreaterthanthequalityqtofthetargetpartition,butthesizeoftheedgecutisnotincreased.
Theconditionsofmoveacceptanceofthebalancingsteparealsomodi ed:
1.qs>qt.
2.Thesizeoftheedgecutisdecreasedandqs>=qt.
Onlyifamoveisaccepted,thequalitiesqsandqtarerecomputed.Notethatthenewconditionsmayleadtooverbalancingofthetarget,oreventhesource,partitions.Therefore,ifthenewvalueqsisgreaterthanitspreviousvalue,thevertexmoveisnulli ed.
Abstract. We use a parallel direct solver based on the Schur complement method for solving large sparse linear systems arising from the finite element method. A domain decomposition of a problem is performed using a graph partitioning. It results in sparse
4ExperimentalResults
TheSIFELsolverwasmodi edtoperformjusttheassemblingandfactorizationofsubmatrices(Phases3and4inSect.1).AllexperimentswereperformedonaPCwithIntelPentiumIII,1GHz,underGNU/Linux2.4.Thebenchmarksaremodelsofrealproblemsofstructuralmechanics:“ oor”and“sieger”are2Dproblemsdiscretizedbytrianglesand“block”,“jete”,and“wheel”are3Dproblemsdiscretizedbytetrahedrons.TheirdescriptionisinTable1.
Table1.Descriptionoftestproblems.#standsfornumberofproblem.#
oor
sieger
block|V(GD)|25065421140516789problemname84123157529|V(GN)|
1
ofsubmatrixAi.ThentmaxW=W.Similarly,lettibethefactorizationtime k=maxkt,ii=1i=1ti,and t=tmax/k
Abstract. We use a parallel direct solver based on the Schur complement method for solving large sparse linear systems arising from the finite element method. A domain decomposition of a problem is performed using a graph partitioning. It results in sparse
parisonoftheFMandQBheuristics
METIS
#k
14
5243889227
16
1405623400
1603540777
8
533
32
34
42557336176
16
87441288023
91517489307
8
23982987406
32
54
174711810322
16
59171838763971229MemoryBalancingLoadBalancing|Ec|Wmax Wtmax ttp34373125811.0931.01.27491.3515.51.726298.91.096694411758111.073.21.33491.361.51.7914470.81.111251.087.41.141576.51.101028815234601.063.21.33161.352.31.916631.61.6110210723083191.090.51.35462753179726401.03233.71.072191.1265.11.20430351.91.03222604527474721.2215.31.392821.335.11.5189863.01.123501.24167.91.3993097.61.0376162856623141.0934.21.13591.4814.61.6724609.01.139234717498441.092.21.21104999202694551.07178.01.171171.31106.11.58180267.91.08156317335579041.0621.31.151571.2910.01.7064206.51.33577References
1.George,A.,Liu,J.:ComputerSolutionofLargeSparsePositiveDe niteSystems.PrenticeHall,EnglewoodCli s,NJ(1981)
2.Hendricson,B.:Graphpartitioningandparallelsolvers:Hasemperornoclothes?Irregular’98,LectureNotesinComputerScience1457(1998)218–225
3.Hendricson,B.:Loadbalancing ctions,falsehoodsandfallacies.AppliedMathe-maticalModelling25(2000)99–108
4.Karypis,G.,Kumar,V.:put.20(1998)359–392
5.Karypis,G.,Kumar,V.:put.48(1998)96–129
6.Hendrickson,B.,Leland,R.:Amultilevelalgorithmforpartitioninggraphs.In:Proceedingsofthe1995ACM/IEEEconferenceonSupercomputing(CDROM),ACMPress(1995)28
7.Walshaw,C.,Cross,M.:MeshPartitioning:aMultilevelBalancingandRe put.22(2000)63–80(originallypublishedasUniv.GreenwichTech.Rep.98/IM/35).
8.Pellegrini,F.:Staticmappingbydualrecursivebipartitioningofprocessandarchi-tecturegraphs.SHPCC’94(1994)486–493
9.Kumfert,G.,Pothen,A.:Twoimprovedalgorithmsforenvelopeandwavefrontreduction.BIT37(1997)559–590
- 1The_example_of_bootstrap_method
- 2how to get balanced during teenage prieds
- 3ansys荷载工况组合 Load Case
- 4A novel 3D mesh compression using mesh segmentation with multiple principal plane analysis
- 5Parallel Communication Mechanisms for Sparse, Irregular Applications
- 6Test-Method-Information
- 7childhood memory(童年的回忆)
- 8childhood memory(童年的回忆)
- 9Analysis and design of parallel mechanisms with flexure joints
- 10DC, AC and Pulse load of Multilayer Ceramic Capacitors
- 教学能力大赛决赛获奖-教学实施报告-(完整图文版)
- 互联网+数据中心行业分析报告
- 2017上海杨浦区高三一模数学试题及答案
- 招商部差旅接待管理制度(4-25)
- 学生游玩安全注意事项
- 学生信息管理系统(文档模板供参考)
- 叉车门架有限元分析及系统设计
- 2014帮助残疾人志愿者服务情况记录
- 叶绿体中色素的提取和分离实验
- 中国食物成分表2020年最新权威完整改进版
- 推动国土资源领域生态文明建设
- 给水管道冲洗和消毒记录
- 计算机软件专业自我评价
- 高中数学必修1-5知识点归纳
- 2018-2022年中国第五代移动通信技术(5G)产业深度分析及发展前景研究报告发展趋势(目录)
- 生产车间巡查制度
- 2018版中国光热发电行业深度研究报告目录
- (通用)2019年中考数学总复习 第一章 第四节 数的开方与二次根式课件
- 2017_2018学年高中语文第二单元第4课说数课件粤教版
- 上市新药Lumateperone(卢美哌隆)合成检索总结报告
- partitioning
- balanced
- parallel
- envelope
- memory
- method
- Load
- mesh
- 《中华人民共和国政府采购法实施条例》“奔图杯”全国知识竞赛(答案)
- 知名金融行业品牌策划活动策划品牌建设案例方案范文
- 03-高墩多跨连续刚构桥最优合龙顺序选择
- 北京市:开展第四次综合交通调查
- 八年级生物下册期末考试试卷及答案30
- 医学专业学生简短个人自我鉴定范文材料投稿
- 这张图浓缩了人生的真相
- 在Windows95下建立FORTRAN程序的开发环境
- 2014七年级英语单词竞赛试卷
- 北师大版语文三年级上册知识点
- 晋商以义制利精神的历史渊源、文化内涵及现代启示
- 机场日语常用对话
- 上海住宅户型50年发展演变分析
- 美容整形缝合技术在202例面部清创术中的应用
- listen_to_this2_英语中级听力2答案及原文
- IgA肾病的中西医治疗进展
- 照付不议协议的类型
- 双堠西瓜地标产品生产规程
- 招标代理机构设置运作机制及流程
- 2014-2019年中国干散货运输产业市场竞争力分析及投资价值预测报告