Load and Memory Balanced Mesh Partitioning for a Parallel Envelope Method





Ondřej Medek, Pavel Tvrdík, and Jaroslav Kruis

Czech Technical University, Prague, Czech Republic

xmedeko@fel.cvut.cz, tvrdik@fel.cvut.cz, kruis@fsv.cvut.cz

Abstract. We use a parallel direct solver based on the Schur complement method for solving large sparse linear systems arising from the finite element method. A domain decomposition of a problem is performed using a graph partitioning. It results in sparse submatrices with balanced sizes. An envelope method is used to factorize these submatrices. However, the memory requirements to store them and the computational cost to factorize them depend heavily on their structure. We propose a technique that modifies the multilevel graph partitioning schema to balance the real computational load or memory requirements of the solver.

1 Introduction

Many engineering and scientific problems are solved using the finite element method (FEM). We consider a mesh of finite elements. One element consists of several nodes. Each node has its degrees of freedom (DOFs). In other words, the nodes contain variables. Globally, these variables form a system of equations Ax = b, where A is an n × n sparse matrix of coefficients. A mesh is usually represented by a dual graph GD and a nodal graph GN. The vertices in the dual graph represent the finite elements, and 2 vertices are adjacent if and only if the corresponding elements share a common surface in 3D or a common edge in 2D. The vertices in the nodal graph represent the mesh nodes. All vertices that represent mesh nodes belonging to one element form a clique. An example of an FE mesh and its dual and nodal graphs is shown in Fig. 1.
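To make the construction of the two graphs concrete, the following Python sketch builds GD and GN from an element-to-node incidence list. It is only an illustration of the definitions above, not code from SIFEL or METIS; the function name and the adjacency criterion (the number of shared nodes) are our own assumptions.

```python
from itertools import combinations
from collections import defaultdict

def build_graphs(elements, shared_nodes_for_adjacency=2):
    """Build the dual graph G_D and the nodal graph G_N of a mesh.

    elements: list of tuples of node ids, one tuple per finite element.
    shared_nodes_for_adjacency: how many common nodes two elements must share
        to be adjacent in G_D (2 models a common edge in 2D; a common surface
        in 3D would require 3 or more).
    Returns (dual, nodal) as adjacency sets keyed by element index / node id.
    """
    # Dual graph: vertices = elements, edges between elements sharing enough nodes.
    node_to_elems = defaultdict(set)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].add(e)
    shared = defaultdict(int)
    for elems in node_to_elems.values():
        for a, b in combinations(sorted(elems), 2):
            shared[(a, b)] += 1
    dual = defaultdict(set)
    for (a, b), cnt in shared.items():
        if cnt >= shared_nodes_for_adjacency:
            dual[a].add(b)
            dual[b].add(a)

    # Nodal graph: vertices = mesh nodes, the nodes of each element form a clique.
    nodal = defaultdict(set)
    for nodes in elements:
        for a, b in combinations(nodes, 2):
            nodal[a].add(b)
            nodal[b].add(a)
    return dual, nodal

# Example: two quadrilateral elements sharing the edge between nodes 2 and 5.
dual, nodal = build_graphs([(1, 2, 5, 4), (2, 3, 6, 5)])
assert 1 in dual[0] and 0 in dual[1]
```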

We use the solver SIFEL, developed at the Czech Technical University, for solving problems by the FEM. Among others, it can solve problems in parallel by the popular method of the Schur complements, which are computed by an envelope method [1]. A solution of a problem consists of the following 6 phases:

1. Domain decomposition.
2. Ordering of nodes.
3. Assembling of submatrices.
4. Factorization of submatrices (computation of the Schur complements).
5. Solution of the reduced problem.
6. Back substitution on subdomains.

Fig. 1. A quadrilateral mesh (a) with 9 elements a–i and 16 nodes 1–16. The dual (b) and the nodal (c) graphs derived from the mesh.

The domain decomposition, and optionally the ordering of nodes, are done as preprocessing steps. The solver does the rest. Phase 3 is the most memory consuming and Phase 4 the most computationally intensive part of the whole solution.

Multilevel tools are widely used to solve the problem of domain decomposition. The dual graph is partitioned into k parts, inducing subdomains and corresponding submatrices, so that the sizes of submatrices are roughly equal.

Definition 1. Consider a graph G = (V, E) and an integer k ≥ 2. An edge cut (node cut) is a set of edges (vertices, respectively) whose removal divides the graph into at least k partitions. The k-way graph partitioning problem is to partition V into k pairwise disjoint subsets V1, V2, ..., Vk such that |Vi| ≈ |V|/k and the size of the edge cut is minimized. A partitioning of a graph by a node cut is similar.

Even though the sizes of submatrices are roughly equal, their memory requirements or their factorization time in Phase 4 are not equal. To define this formally, we use the term quality to denote the memory or computational complexity.

Definition 2. Given an unbalancing threshold δ ≥ 1, we say that a partitioning V1, V2, ..., Vk with a set of qualities {q1, q2, ..., qk} is balanced if

\max_{1 \le i \le k} q_i \;\le\; \delta \cdot \frac{1}{k} \sum_{i=1}^{k} q_i.  (1)

A partition Vi is overbalanced if q_i > \delta \cdot \frac{1}{k} \sum_{i=1}^{k} q_i. The partitioning is disbalanced if at least one partition is overbalanced.
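As a concrete reading of Definition 2 and Eq. (1), the following sketch classifies a partitioning by its qualities for a given threshold δ. It is our own helper for illustration, not part of the solver.

```python
def balance_report(qualities, delta):
    """Classify a partitioning by its qualities q_1, ..., q_k (Definition 2).

    Balanced: max_i q_i <= delta * average quality (Eq. (1)).
    A partition is overbalanced if its quality exceeds delta times the average
    quality; the partitioning is disbalanced if at least one partition is
    overbalanced.
    """
    k = len(qualities)
    avg = sum(qualities) / k
    overbalanced = [i for i, q in enumerate(qualities) if q > delta * avg]
    return {
        "balanced": max(qualities) <= delta * avg,
        "overbalanced_partitions": overbalanced,
        "disbalanced": bool(overbalanced),
    }

# Example: with delta = 1.1 the third partition (index 2) is overbalanced.
print(balance_report([100.0, 105.0, 130.0], delta=1.1))
```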

In general, the qualities of submatrices are influenced by the ordering of variables, as was already mentioned in [2, 3].

In this paper, we describe a novel approach to the domain decomposition that results in a partitioning with balanced memory requirements or balanced factorization time estimates. This in fact leads to a shorter execution time of the parallel solver. The idea is to integrate an ordering algorithm into a multilevel graph partitioning schema.

Section 2 describes the previous work. In Section 3, the new refinement heuristics that allows better balancing of the memory or computing complexity is explained. In Section 4, the results of experiments are presented, and Section 5 concludes the paper.

2 Previous Work

2.1 Domain Decomposition (Phase 1)

Common methods for domain decomposition are based on graph partitioning, typically of dual graphs. A subdomain is made of elements from the same partition. The nodes belonging to more than one subdomain are called boundary and the remaining nodes are internal. Variables of the internal (boundary) nodes are called correspondingly. The example in Fig. 2(a) shows a 2-way partitioning of GD of the quadrilateral mesh from Fig. 1. The corresponding domain decomposition is shown in Fig. 2(b). The boundary nodes are 3, 7, 10, 11, and 14. Note that this way of domain decomposition produces a partitioning of the nodal graph GN by a node cut, as shown in Fig. 2(c).

Fig. 2. Partitioning of the mesh from Fig. 1 using its dual graph (a) into 2 partitions (b) and the corresponding partitioning of the nodal graph (c). The edge cut (a) is indicated by a dashed line and the nodes in the node cut (c) as filled squares.

Multilevel Graph Partitioning. Popular multilevel graph partitioning software includes METIS [4, 5], CHACO [6], JOSTLE [7], and SCOTCH [8]. Our work is based on the multilevel k-way graph partitioning implemented in METIS [5]. This schema consists of the following 3 phases, shown in Fig. 3.

Fig. 3. The schema of multilevel k-way partitioning.

Coarsening. A sequence of smaller graphs Gl = (Vl, El) is constructed from the original graph G = G0 = (V0, E0) so that |Vl| > |Vl+1|. The sequence of coarser graphs creates levels. A matching is often used to collapse 2 vertices into a multivertex. The SHEM heuristics for matching, introduced in [5], works well.
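The matching-based coarsening can be illustrated by the following simplified sketch in the spirit of heavy-edge matching (it is not the METIS/SHEM implementation; the visiting order, tie-breaking, and the collapse routine are our own simplifications): vertices are visited in order of increasing degree, each unmatched vertex is matched with an unmatched neighbour over the heaviest edge, and matched pairs are collapsed into multivertices of the next level.

```python
def heavy_edge_matching(adj):
    """One coarsening pass: compute a matching preferring heavy edges.

    adj: dict vertex -> dict {neighbour: edge_weight}; vertex ids must be orderable.
    Returns dict vertex -> matched partner (a vertex is matched to itself if
    none of its neighbours is free).
    """
    match = {}
    # Visit vertices sorted by degree (the "sorted" part of a SHEM-like matching).
    for v in sorted(adj, key=lambda u: len(adj[u])):
        if v in match:
            continue
        free = [(w, u) for u, w in adj[v].items() if u not in match]
        if free:
            _, u = max(free)          # heaviest edge to a free neighbour
            match[v], match[u] = u, v
        else:
            match[v] = v              # stays a singleton multivertex
    return match

def collapse(adj, match):
    """Build the coarser graph: each matched pair becomes one multivertex."""
    rep = {v: min(v, match[v]) for v in adj}          # representative of each pair
    coarse = {}
    for v, nbrs in adj.items():
        cv = rep[v]
        coarse.setdefault(cv, {})
        for u, w in nbrs.items():
            cu = rep[u]
            if cu != cv:
                coarse[cv][cu] = coarse[cv].get(cu, 0) + w
    return coarse

# Example: a path 1-2-3-4 with unit weights collapses to a 2-vertex coarse graph.
adj = {1: {2: 1}, 2: {1: 1, 3: 1}, 3: {2: 1, 4: 1}, 4: {3: 1}}
print(collapse(adj, heavy_edge_matching(adj)))
```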

Initial partitioning. When the coarsest graph is sufficiently small, it can be partitioned by any graph partitioning technique.

Uncoarsening. A coarser graph Gl is uncoarsened to Gl-1, and the partitioning of Gl is projected to Gl-1 and then refined. The Fiduccia-Mattheyses (FM) heuristics is a simple, fast, and sufficiently good option for the refinement. It searches for candidate vertices in the set of boundary vertices, i.e., vertices adjacent to a vertex in another partition. Then it tries to move a selected vertex into other partitions. The move is accepted if one of the following conditions is fulfilled:

1. The size of the edge cut is decreased and the partitioning remains balanced.

2. The size of the edge cut is not increased, but the balancing is improved.

To improve the balancing of a strongly disbalanced partitioning, a balancing step may be added. It works like the FM heuristics, but the conditions for accepting a move are different (both sets of acceptance rules are sketched in code after the following list):

1. The balancing is improved.

2. The size of the edge cut is decreased, but the balancing is not worsened.
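The following minimal sketch writes out the two acceptance rules of the refinement step and of the balancing step. It only encodes the decision logic described above; the gain computation, bucket data structures, and rollback of a real FM implementation are omitted, and the argument names are our own.

```python
def fm_accept(cut_delta, balanced_after, balance_improved):
    """Move acceptance of the FM refinement step.

    cut_delta: change of the edge-cut size caused by the move
               (negative means the cut gets smaller).
    balanced_after: the partitioning stays balanced after the move.
    balance_improved: the move improves the balancing.
    """
    if cut_delta < 0 and balanced_after:        # condition 1
        return True
    if cut_delta <= 0 and balance_improved:     # condition 2
        return True
    return False


def fm_balancing_accept(cut_delta, balance_improved, balance_not_worsened):
    """Move acceptance of the additional balancing step."""
    if balance_improved:                        # condition 1
        return True
    if cut_delta < 0 and balance_not_worsened:  # condition 2
        return True
    return False
```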

2.2 Reordering and Assembling of Submatrices (Phases 2 + 3)

After the domain decomposition, the internal nodes in all partitions are reordered to minimize the size of the envelope.

Definition 3. In the i-th step of the factorization of a symmetric matrix A, the wavefront is the set of pairs

w_i(A) = \{\, \{r, i\} : a_{rs} \ne 0 \text{ for some } s \le i,\ r \ge i \,\}.  (2)


The envelope of A is then (the symbol \setminus is the set difference)

\mathrm{Env}(A) = \bigcup_{i=1}^{n} \bigl( w_i(A) \setminus \{\{i, i\}\} \bigr).  (3)

The envelope method stores the parts of A inside envelopes, which represents the main portion of the memory consumption of the solver. The SIFEL solver uses the Sloan reordering algorithm [9], but other algorithms (RCM, a hybrid algorithm) can be used as well.

Every node i generates d_i equations, where d_i is the number of its DOFs. Let n_i and n_b denote the numbers of internal and boundary variables, respectively. Clearly, n_i + n_b = n. The SIFEL solver reads the subdomains and assembles submatrices with nodes in a given order. The boundary nodes come last. We estimate the memory requirements W(A) of the envelope solver and its computational complexity OP(A) by

W(A) = \sum_{i=1}^{n} |w_i(A)| = |\mathrm{Env}(A)| + n,  (4)

OP(A) = \sum_{i=1}^{n_i} |w_i(A)|^2.  (5)
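A direct transcription of Definition 3 and the estimates (4)-(5): given the symmetric sparsity pattern of A and the number of internal variables n_i (which are ordered first), the sketch below computes the wavefront sizes, |Env(A)|, W(A), and OP(A). It is an illustration of the formulas only, not SIFEL code, and it assumes, as written above, that the summation in Eq. (5) runs over the n_i internal equations.

```python
def envelope_estimates(pattern, n, n_internal):
    """Estimate memory and work of the envelope method from the pattern of A.

    pattern: iterable of (r, s) pairs (1-based) with a_rs != 0; since A is
             symmetric, entries of either triangle may be given.
    n: order of the matrix A.
    n_internal: number of internal variables n_i, ordered before the boundary ones.
    Returns (wavefront_sizes, env_size, W, OP) following Eqs. (2)-(5).
    """
    # First nonzero column of every row (the diagonal is assumed structurally nonzero).
    first = {r: r for r in range(1, n + 1)}
    for r, s in pattern:
        r, s = max(r, s), min(r, s)              # work with the lower triangle
        first[r] = min(first[r], s)

    # |w_i(A)| = number of rows r >= i whose first nonzero column is <= i (Eq. (2)).
    wavefront = [sum(1 for r in range(i, n + 1) if first[r] <= i)
                 for i in range(1, n + 1)]

    W = sum(wavefront)                           # Eq. (4): W(A) = sum_i |w_i(A)|
    env_size = W - n                             # ... = |Env(A)| + n
    OP = sum(w * w for w in wavefront[:n_internal])   # Eq. (5)
    return wavefront, env_size, W, OP

# Tiny example: a 4x4 pattern with lower-triangle entries (2,1), (4,1), (4,3)
# and 3 internal variables.
print(envelope_estimates([(2, 1), (4, 1), (4, 3)], n=4, n_internal=3))
```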

3 The New Refinement Heuristics with Quality Balancing

In this section, we explain the main ideas of load and memory balancing for the parallel envelope method. At the beginning, it must be decided which quality will be balanced: the computational load or the memory complexity. Then the partitioning is started. To estimate the quality, the partitioner must have information about the structure of the matrix A. Since the dual graph GD does not suffice, the partitioner must also have at its disposal the weighted nodal graph GN, in which each vertex is labelled with the number of DOFs of the node. We partition GD and project the partitioning of GD to GN to estimate the quality. So the partitioner also needs the information about the relation between GD and GN.

We use a multilevel k-way graph partitioning as described in Sect. 2.1. The first two phases are performed unchanged as in METIS: the SHEM heuristics is used for the coarsening and the multilevel graph bisection is used for the initial partitioning. We have modified the conditions of move acceptance of the refinement heuristics in the uncoarsening phase and extended it with a quality estimation process. This new heuristics is called QB. Assume that the current level is l. Similarly to FM, QB chooses a vertex as a candidate for moving from a source partition G^D_{l,s} to a target partition G^D_{l,t}. To decide whether the move will be accepted, QB starts for both partitions G^D_{l,s} and G^D_{l,t} the estimation process, sketched in Fig. 4, to obtain information about the balancing of the qualities. First, the partition G^D_{l,p}, p ∈ {s, t}, is projected to G^D_{0,p}, the original partition (this is skipped on the finest level, where l = 0). After that, all nodes belonging to elements from G^D_{0,p} correspond to a partition G^N_p of G^N. Then the internal vertices of G^N_p are reordered by the Sloan algorithm. Finally, the quality of G^N_p is estimated and returned to the refinement heuristics.

Fig. 4. Data flow of the QB heuristics.

In the current implementation of the QB heuristics, nodes have either a constant number of DOFs d > 0 or are constrained, i.e., the number of DOFs is 0. All constrained nodes are omitted in the step of projection of G^D_{0,p} to G^N_p, i.e., the reordering is performed only with nodes whose number of DOFs is d > 0. After that, node i generates the equations numbered di, di+1, ..., di+d-1 and the wavefronts w_{di}(A), w_{di+1}(A), ..., w_{di+d-1}(A).
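For illustration, the mapping from a reordered node to its equations can be written as below (a trivial helper of our own; 0-based node and equation numbering is assumed, matching the expression di, di+1, ..., di+d-1 above).

```python
def equations_of_node(i, d):
    """Equation numbers generated by node i when every node has d DOFs.

    With 0-based numbering, node i owns the equations
    d*i, d*i + 1, ..., d*i + d - 1, and correspondingly the
    wavefronts w_{d*i}(A), ..., w_{d*i + d - 1}(A).
    """
    return list(range(d * i, d * i + d))

# A node with 3 DOFs (e.g. displacements in 3D): node 4 owns equations 12, 13, 14.
assert equations_of_node(4, 3) == [12, 13, 14]
```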

The original FM heuristics computes the sums of weights of vertices in the source and target partitions for every candidate move. In fact, the weight of the candidate vertex is subtracted from the weight of the source partition and added to the weight of the target partition. However, in the QB heuristics, this would imply the reordering and the estimation computation for every candidate move, and this would extremely slow down the refinement. Thus, we had to modify the conditions of move acceptance as follows:

1. The size of the edge cut is decreased and the target partition is not overbalanced.

2. The quality qs of the source partition is greater than the quality qt of the target partition, but the size of the edge cut is not increased.

The conditions of move acceptance of the balancing step are also modified:

1. qs > qt.

2. The size of the edge cut is decreased and qs ≥ qt.

Only if a move is accepted, the qualities qs and qt are recomputed. Note that the new conditions may lead to overbalancing of the target, or even the source, partitions. Therefore, if the new value of qs is greater than its previous value, the vertex move is nullified.
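For comparison with the FM rules in Sect. 2.1, the QB acceptance rules and the nullification safeguard can be written out as follows (again only a sketch of the decision logic, with argument names of our own; the qualities qs and qt are the values last produced by the estimation process of Fig. 4).

```python
def qb_accept(cut_delta, target_overbalanced, q_source, q_target):
    """Move acceptance of the QB refinement step.

    cut_delta: change of the edge-cut size caused by the move.
    target_overbalanced: the target partition is already overbalanced.
    q_source, q_target: current qualities of the source and target partitions
    (recomputed only after accepted moves, not for every candidate).
    """
    if cut_delta < 0 and not target_overbalanced:   # condition 1
        return True
    if q_source > q_target and cut_delta <= 0:      # condition 2
        return True
    return False


def qb_balancing_accept(cut_delta, q_source, q_target):
    """Move acceptance of the QB balancing step."""
    return q_source > q_target or (cut_delta < 0 and q_source >= q_target)


def keep_accepted_move(q_source_before, q_source_after):
    """Nullify an accepted move if it increased the quality of the source partition."""
    return q_source_after <= q_source_before
```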


4 Experimental Results

The SIFEL solver was modified to perform just the assembling and factorization of submatrices (Phases 3 and 4 in Sect. 1). All experiments were performed on a PC with an Intel Pentium III, 1 GHz, under GNU/Linux 2.4. The benchmarks are models of real problems of structural mechanics: "floor" and "sieger" are 2D problems discretized by triangles and "block", "jete", and "wheel" are 3D problems discretized by tetrahedrons. Their description is in Table 1.

Table 1. Description of the test problems (floor, sieger, block, jete, and wheel). # stands for the number of the problem; the columns give the problem name, |V(GD)|, and |V(GN)| for each problem.

Let Wi denote the memory requirement W of submatrix Ai. Then Wmax = \max_{i=1}^{k} W_i and \Delta W = W_{max} / \bigl(\frac{1}{k}\sum_{i=1}^{k} W_i\bigr). Similarly, let ti be the factorization time of submatrix Ai, t_{max} = \max_{i=1}^{k} t_i, and \Delta t = t_{max} / \bigl(\frac{1}{k}\sum_{i=1}^{k} t_i\bigr).


Table 2. Comparison of the FM and QB heuristics. For every problem # and number of partitions k, the table reports the edge-cut size |Ec|, Wmax, ΔW, tmax, Δt, and the partitioning time tp for METIS (the FM heuristics) and for the QB heuristics with memory balancing and with load balancing.

References

1. George, A., Liu, J.: Computer Solution of Large Sparse Positive Definite Systems. Prentice Hall, Englewood Cliffs, NJ (1981)

2. Hendrickson, B.: Graph partitioning and parallel solvers: Has the emperor no clothes? Irregular'98, Lecture Notes in Computer Science 1457 (1998) 218-225

3. Hendrickson, B.: Load balancing fictions, falsehoods and fallacies. Applied Mathematical Modelling 25 (2000) 99-108

4. Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J. Sci. Comput. 20 (1998) 359-392

5. Karypis, G., Kumar, V.: Multilevel k-way partitioning scheme for irregular graphs. J. Parallel Distrib. Comput. 48 (1998) 96-129

6. Hendrickson, B., Leland, R.: A multilevel algorithm for partitioning graphs. In: Proceedings of the 1995 ACM/IEEE Conference on Supercomputing (CDROM), ACM Press (1995) 28

7. Walshaw, C., Cross, M.: Mesh Partitioning: A Multilevel Balancing and Refinement Algorithm. SIAM J. Sci. Comput. 22 (2000) 63-80 (originally published as Univ. Greenwich Tech. Rep. 98/IM/35)

8. Pellegrini, F.: Static mapping by dual recursive bipartitioning of process and architecture graphs. SHPCC'94 (1994) 486-493

9. Kumfert, G., Pothen, A.: Two improved algorithms for envelope and wavefront reduction. BIT 37 (1997) 559-590

