Clustering using firefly algorithm: Performance study


Swarm and Evolutionary Computation 1 (2011) 164–171

Contents lists available at SciVerse ScienceDirect

Swarm and Evolutionary Computation

journal homepage: /locate/swevo

Regular paper

Clustering using firefly algorithm: Performance study

J. Senthilnath, S.N. Omkar*, V. Mani

Department of Aerospace Engineering, Indian Institute of Science, Bangalore, India

Abstract

A Firefly Algorithm (FA) is a recent nature inspired optimization algorithm that simulates the flash pattern and characteristics of fireflies. Clustering is a popular data analysis technique used to identify homogeneous groups of objects based on the values of their attributes. In this paper, the FA is used for clustering on benchmark problems, and the performance of the FA is compared with two other nature inspired techniques, Artificial Bee Colony (ABC) and Particle Swarm Optimization (PSO), as well as nine other methods used in the literature. Thirteen typical benchmark data sets from the UCI machine learning repository are used to demonstrate the results of the techniques. From the results obtained, we compare the performance of the FA and conclude that the FA can be efficiently used for clustering.

Crown Copyright © 2011 Published by Elsevier Ltd. All rights reserved.

Article history:
Received 10 February 2011
Received in revised form 5 May 2011
Accepted 2 June 2011
Available online 30 June 2011

Keywords: Clustering; Classification; Firefly algorithm

1. Introduction

Clustering is an important unsupervised classification technique, where a set of patterns, usually vectors in a multi-dimensional space, are grouped into clusters (or groups) based on some similarity metric [1–4]. Clustering is often used for a variety of applications in statistical data analysis, image analysis, data mining and other fields of science and engineering.

Clustering algorithms can be classified into two categories: hierarchical clustering and partitional clustering [5,6]. Hierarchical clustering constructs a hierarchy of clusters by splitting a large cluster into smaller ones and merging smaller clusters into their nearest centroid [7]. There are two main approaches: (i) the divisive approach, which splits a larger cluster into two or more smaller ones; and (ii) the agglomerative approach, which builds a larger cluster by merging two or more smaller clusters. On the other hand, partitional clustering [8,9] attempts to divide the data set into a set of disjoint clusters without a hierarchical structure. The most widely used partitional clustering algorithms are the prototype-based clustering algorithms, where each cluster is represented by its center. The objective function (a square error function) is the sum of the distances from the patterns to their centers [6]. In this paper we are concerned with partitional clustering for generating cluster centers, and then using these cluster centers to classify the data set.

A popular partitional clustering algorithm, k-means clustering, is essentially a function minimization technique in which the objective function is the squared error. However, the main drawback of the k-means algorithm is that it converges to a local minimum from the starting position of the search [10]. In order to overcome local optima problems, many nature inspired algorithms, such as the genetic algorithm [11], ant colony optimization [12], artificial immune systems [13], artificial bee colony [9], and particle swarm optimization [14], have been used. Recently, efficient hybrid evolutionary optimization algorithms, combining evolutionary methods with k-means to overcome local optima problems in clustering, have also been used [15–17].

* Corresponding author. E-mail address: omkar@aero.iisc.ernet.in (S.N. Omkar).

The Firefly Algorithm (FA) is a recent nature inspired technique [18] that has been used for solving nonlinear optimization problems. The algorithm is based on the behavior of social insects (fireflies). In social insect colonies, each individual seems to have its own agenda, and yet the group as a whole appears to be highly organized. Algorithms based on nature have been demonstrated to be effective and efficient in solving difficult optimization problems. A swarm is a group of multi-agent systems, such as fireflies, in which simple agents coordinate their activities to solve complex problems, such as the allocation of communication to multiple forage sites in dynamic environments.

In this study, the Firefly Algorithm (FA), which is described by Yang [18] for numerical optimization problems, is applied to clustering. To study the performance of the FA on clustering problems, we consider the standard benchmark problems (13 typical test databases) that are available in the literature [9,14]. The performance of the FA on clustering is compared with the results of two other nature inspired techniques, the Artificial Bee Colony (ABC) [19] and Particle Swarm Optimization (PSO) [20] algorithms, on the same test data sets [9,14]. The FA, ABC and PSO algorithms are in the same class of population-based, nature inspired optimization techniques; hence, we compare the performance of the FA with the ABC and PSO algorithms.


We also present the results of the nine other methods used in the literature [9,14]. For ease of understanding and comparison, we follow the same manner of analysis and discussion used in [9]; the only key difference is the use of the FA algorithm in this study.

Contribution of this paper: In this work, for a given data set, the FA is used to find the cluster centers. The cluster centers are obtained from a randomly selected 75% of the given data set, which we call the training set. The FA algorithm uses this training set to obtain the cluster centers. To study the performance of the FA algorithm, the remaining 25% of the data set (called the test data set) is used. The performance measure used with the FA is the classification error percentage (CEP), defined as the ratio of the number of misclassified samples in the test data set to the total number of samples in the test data set. This can be computed because, in the test data set, we know the actual class of each test datum. The distances between a given test datum and the cluster centers are computed, and the datum is assigned to the cluster center (class) with the minimum distance. Hence, we can compute the performance measure, the classification error percentage (CEP).

The paper is organized as follows: the implementation of the FA algorithm is described in Section 2; clustering using the FA and the performance evaluation are presented in Sections 3 and 4 respectively; the results are presented and discussed in Section 5. We conclude the paper in Section 6 by summarizing the observations.

2. Firefly algorithm

Fireflies are glowworms that glow through bioluminescence. For simplicity in describing our firefly algorithm, we use the following three idealized rules: (i) all fireflies are unisex, so that one firefly will be attracted to other fireflies regardless of their sex; (ii) an important and interesting behavior of fireflies is to glow brighter, mainly to attract prey and to share food with others; (iii) attractiveness is proportional to brightness, thus each agent first moves toward a neighbor that glows brighter [21]. The Firefly Algorithm (FA) [18] is a population-based algorithm for finding the global optima of objective functions based on swarm intelligence, inspired by the foraging behavior of fireflies. In the FA, physical entities (agents or fireflies) are randomly distributed in the search space. Agents are thought of as fireflies that carry a luminescence quality, called luciferin, and emit light proportional to its value. Each firefly is attracted by the brighter glow of other neighboring fireflies, and the attractiveness decreases as their distance increases. If no firefly is brighter than a particular firefly, it moves randomly. In the application of the FA to clustering, the decision variables are the cluster centers, and the objective function is related to the sum, over all training set instances, of the Euclidean distance in an N-dimensional space, as given in [9].

Based on this objective function, initially all the agents (fireflies) are randomly dispersed across the search space. The two phases of the firefly algorithm are as follows.

i. Variation of light intensity: Light intensity is related to objective values [18], so for a maximization/minimization problem, a firefly with high/low intensity will attract another firefly with high/low intensity. Assume that there exists a swarm of n agents (fireflies), where x_i represents a solution for firefly i and f(x_i) denotes its fitness value. The brightness I of a firefly is selected to reflect the fitness value f(x_i) at its current position x_i [18]:

$$I_i = f(x_i), \quad 1 \le i \le n. \tag{1}$$

ii. Movement toward an attractive firefly: A firefly's attractiveness is proportional to the light intensity seen by adjacent fireflies [18]. Each firefly has a distinctive attractiveness β, which indicates how strongly it attracts other members of the swarm. The attractiveness β is, however, relative: it varies with the distance r_{ij} between two fireflies i and j at locations x_i and x_j respectively, given as

$$r_{ij} = \lVert x_i - x_j \rVert. \tag{2}$$

The attractiveness function β(r) of the firefly is determined by

$$\beta(r) = \beta_0 e^{-\gamma r^2} \tag{3}$$

where β_0 is the attractiveness at r = 0 and γ is the light absorption coefficient.

The movement of a firefly i at location x_i attracted to another, more attractive (brighter) firefly j at location x_j is determined by

$$x_i(t+1) = x_i(t) + \beta_0 e^{-\gamma r_{ij}^2}\,(x_j - x_i). \tag{4}$$

A detailed description of this FA is given in [18]. A pseudo-code of the algorithm is given below.

Pseudo-code: A high-level description of the firefly algorithm

Input:
Create an initial population of n fireflies within the d-dimensional search space, x_ik, i = 1, 2, ..., n and k = 1, 2, ..., d
Evaluate the fitness of the population, f(x_i), which is directly proportional to the light intensity I_i
Algorithm parameters: β_0, γ

Output: obtained minimum location x_i^min

begin
  repeat
    for i = 1 to n
      for j = 1 to n
        if (I_j < I_i)
          Move firefly i toward j in d dimensions using Eq. (4)
        end if
        Attractiveness varies with distance r via exp(−γ r²)
        Evaluate new solutions and update light intensity using Eq. (1)
      end for j
    end for i
    Rank the fireflies and find the current best
  until stop condition true
end
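To make the pseudo-code concrete, the following is a minimal Python sketch of the algorithm as described above; the function name, the bounds handling and the random seeding are our illustrative choices, not part of the original paper.

```python
import numpy as np

def firefly_minimize(objective, dim, n_fireflies=20, n_generations=100,
                     beta0=1.0, gamma=1.0, bounds=(0.0, 1.0), seed=0):
    """Minimization with the firefly algorithm of Eqs. (1)-(4) (a sketch)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    # Randomly disperse the agents (fireflies) across the search space.
    x = rng.uniform(lo, hi, size=(n_fireflies, dim))
    intensity = np.array([objective(xi) for xi in x])  # Eq. (1): I_i = f(x_i)
    for _ in range(n_generations):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                # For minimization, a lower objective value means a brighter firefly.
                if intensity[j] < intensity[i]:
                    r2 = np.sum((x[i] - x[j]) ** 2)     # r_ij^2 from Eq. (2)
                    beta = beta0 * np.exp(-gamma * r2)  # Eq. (3)
                    x[i] = x[i] + beta * (x[j] - x[i])  # Eq. (4)
                    intensity[i] = objective(x[i])      # update light intensity
    best = np.argmin(intensity)  # rank the fireflies, keep the current best
    return x[best], intensity[best]
```

For example, `firefly_minimize(lambda v: float(np.sum(v ** 2)), dim=2, bounds=(-5.0, 5.0))` drives the swarm toward the origin.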

3. Clustering using FA

Clustering methods, which separate objects into groups or classes, are developed based on unsupervised learning. In the unsupervised technique, the training data set is first grouped based solely on the numerical information in the data (i.e. cluster centers), and the groups are then matched by the analyst to information classes. The data sets that we tackle contain the class information for each datum. Therefore, the main goal is to find the centers of the clusters by minimizing the objective function, the sum of the distances of the patterns to their centers.

For N given objects, the problem is to minimize the sum of squared Euclidean distances between each pattern and its cluster center, allocating each pattern to one of K cluster centers. The clustering objective function, the sum of squared errors, is given in Eq. (5), as described in [22]:

$$J(K) = \sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - c_k \rVert^2 \tag{5}$$


where K is the number of clusters, x_i (i = 1, ..., n) is the location of the ith of the n given patterns, and c_k (k = 1, ..., K) is the kth cluster center, found by Eq. (6):

$$c_k = \frac{1}{n_k} \sum_{i \in C_k} x_i \tag{6}$$

where n_k is the number of patterns in the kth cluster.

Cluster analysis assigns the data set into clusters so that similar data are grouped into the same cluster based on some similarity measure [23]. Distance measurement is most widely used for evaluating similarities between patterns. The cluster centers are the decision variables, which are obtained by minimizing the sum of the Euclidean distances, over all training set instances in the d-dimensional space, between a generic instance x_j and the center of its cluster. The cost (objective) function for agent i is given by Eq. (7), as in [9,14]:

$$f_i^{\mathrm{Train}} = \frac{1}{D^{\mathrm{Train}}} \sum_{j=1}^{D^{\mathrm{Train}}} d\left(x_j,\; p_i^{\,CL_{\mathrm{known}}(x_j)}\right) \tag{7}$$

where D^Train is the number of training instances, used to normalize the sum so that any distance falls within [0.0, 1.0], and p_i^{CL_known(x_j)} is the center, in agent i, of the class to which instance x_j belongs according to the database.

Note that in our FA algorithm, the decision variables are the cluster centers, and the objective function is given by Eq. (7). In our study, we consider the 13 standard benchmark problems given in [14]. For a given data set, let n be the number of data points, d the dimension, and c the number of classes; a given data point belongs to exactly one of these c classes. Of the given data set, 75% is randomly selected and used to obtain the cluster centers via Eq. (7). In this way we obtain the cluster centers for all c classes. The remaining 25% of the data set (the test data set) is used to obtain the classification error percentage (CEP). An illustrative example of the FA algorithm and its performance measure is given in the next section.
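A sketch of the objective of Eq. (7) in the same vein; the array shapes and names are our assumptions, with class labels taken as indices 0, ..., c−1.

```python
import numpy as np

def training_cost(centers, X_train, y_train):
    """Eq. (7): mean Euclidean distance from each training instance to the
    center, within this agent, of the class it is known to belong to.

    centers: (c, d) array of class centers encoded by one agent.
    X_train: (D_train, d) array of training instances.
    y_train: (D_train,) array of known class indices.
    """
    # Distance from each instance x_j to the center of its own class;
    # the mean performs the 1/D_train normalization of Eq. (7).
    return float(np.linalg.norm(X_train - centers[y_train], axis=1).mean())
```

An agent in the FA is then a flattened `(c*d,)` vector, reshaped to `(c, d)` before evaluation, so the earlier `firefly_minimize` sketch can be applied to it directly.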

4. Performance measures and an illustrative example

As discussed in the earlier section, the cluster centers are obtained using the FA on the training data. Using these cluster centers, the testing data set is classified and the classification performance is analyzed.

4.1. Performance evaluation

The performance of the knowledge extracted by the FA, in the form of cluster centers, is evaluated using the Classification Error Percentage (CEP) and the classification efficiency. The CEP depends only on the test data, while the classification efficiency depends on both the training and testing data.

4.1.1. Classification Error Percentage (CEP)

The CEP is obtained using only the test data [9]. For each problem, we report the CEP, the percentage of incorrectly classified patterns of the test data set, as given in [9], to allow a reliable comparison.

Each pattern is classified by assigning it to the class whose cluster center is closest. The classified output is then compared with the desired output, and if they are not exactly the same, the pattern is counted as misclassified [9]. This procedure is applied to all test data, and the total number of misclassified patterns is expressed as a percentage of the size of the test data set:

$$\mathrm{CEP} = \frac{\text{number of misclassified samples}}{\text{total size of test data set}} \times 100. \tag{8}$$
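The nearest-center classification rule and Eq. (8) translate directly into code; this sketch reuses the conventions of the previous snippets.

```python
import numpy as np

def classify(centers, X):
    """Assign each pattern to the class whose center is closest (Euclidean)."""
    # dists[n, k] = distance from pattern n to class center k.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def cep(centers, X_test, y_test):
    """Eq. (8): percentage of misclassified patterns in the test data set."""
    return 100.0 * float(np.mean(classify(centers, X_test) != y_test))
```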

Fig. 1. Data distribution (training and testing data of Class 1 and Class 2 in the x–y feature plane).

4.1.2. Classification efficiency

The classification efficiency is obtained using both the training and test data. The classification matrix is used to obtain the statistical measures for the class-level performance (individual efficiency) and the global performance (average and overall efficiency) of the classifier [24]. The individual efficiency is indicated by the percentage classification, which tells us how many samples belonging to a particular class have been correctly classified. The percentage classification (η_i) for class c_i is given by Eq. (9):

$$\eta_i = \frac{q_{ii}}{\sum_{j=1}^{n_c} q_{ji}} \tag{9}$$

where q_ii is the number of correctly classified samples of class c_i, and the denominator is the number of samples of class c_i in the data set. The global performance measures are the average (η_a) and overall (η_o) classification efficiencies, defined as

$$\eta_a = \frac{1}{n_c} \sum_{i=1}^{n_c} \eta_i \tag{10}$$

$$\eta_o = \frac{1}{N} \sum_{i=1}^{n_c} q_{ii} \tag{11}$$

where n_c is the total number of classes and N is the number of patterns.
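Eqs. (9)–(11) can be computed directly from the classification matrix; in this sketch we assume the convention that q[j, i] counts samples of true class i assigned to class j, matching the q_ji of Eq. (9).

```python
import numpy as np

def efficiencies(q):
    """Individual (Eq. 9), average (Eq. 10) and overall (Eq. 11) efficiency.

    q: (n_c, n_c) classification matrix with q[j, i] = number of samples of
       true class i assigned to class j, so q[i, i] are correct for class i.
    """
    q = np.asarray(q, dtype=float)
    eta_i = np.diag(q) / q.sum(axis=0)  # Eq. (9): q_ii / sum_j q_ji
    eta_a = eta_i.mean()                # Eq. (10): average efficiency
    eta_o = np.trace(q) / q.sum()       # Eq. (11): overall efficiency
    return eta_i, eta_a, eta_o
```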

4.2. Illustrative example

We illustrate how the Firefly Algorithm (FA) is used for clustering with the following synthetic data. Although the proposed algorithm can be used for any type of mixture model, we focus on a Gaussian mixture. Let us consider a mixture of two Gaussians over two input features, namely x and y. Here, the mean values µ1 = [8, 8]^T and µ2 = [16, 16]^T and the covariance matrix (x, y) = {(6, 3); (3, 2)} are assumed, and each class has an equal number of samples. In our experimentation, 100 samples are generated randomly for each class. Of these, 75 data points are used for training and the remaining 25 for testing in each class. The synthetic data generated are shown in Fig. 1.
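This synthetic data set is easy to reproduce; a short NumPy sketch follows (the seed and variable names are arbitrary choices of ours).

```python
import numpy as np

rng = np.random.default_rng(7)                # arbitrary seed
mu1, mu2 = np.array([8.0, 8.0]), np.array([16.0, 16.0])
cov = np.array([[6.0, 3.0],
                [3.0, 2.0]])                  # shared (x, y) covariance

class1 = rng.multivariate_normal(mu1, cov, size=100)
class2 = rng.multivariate_normal(mu2, cov, size=100)

# 75 training and 25 testing samples per class, as in the example.
X_train = np.vstack([class1[:75], class2[:75]])
X_test = np.vstack([class1[75:], class2[75:]])
y_train = np.array([0] * 75 + [1] * 75)
y_test = np.array([0] * 25 + [1] * 25)
```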

We use the firefly algorithm on the training data to obtain cluster centers. Let x_i be one of the solutions (a set of cluster centers) and J_i be the objective function value for this solution.

We consider a population of 5 fireflies at locations x1, x2, x3, x4 and x5 within the 2d-dimensional search space. We evaluate the fitness of the population, J1, J2, J3, J4 and J5, using Eq. (7); the fitness is directly proportional to the light intensities I1, I2, I3, I4 and I5. The intensity values of the fireflies are then compared: if (I2 < I1), firefly 1 is moved toward firefly 2 using Eq. (4). Similarly, all the agents are compared, and the movement phase of each agent is updated by evaluating new solutions and updating the light intensity using Eq. (1). This procedure is continued until the swarm converges to the optimal cluster centers, i.e. x_i^min, as shown in Fig. 2.

Fig. 2. Optimal cluster centers (agents, agent movement and the converged cluster centers in the x–y plane).

The cluster centers generated are (x, y) = {(7.2233, 7.6659); (16.0580, 16.3230)}.
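Putting the earlier sketches together reproduces this example end to end; here the 4-dimensional agent encodes both 2-D centers, the search bounds are our assumption, and the default N = 20 and T = 100 of Section 5.2.1 are used rather than the 5 fireflies of this illustration.

```python
# Each agent is a flattened pair of 2-D centers: [c1x, c1y, c2x, c2y].
obj = lambda v: training_cost(v.reshape(2, 2), X_train, y_train)
best, fitness = firefly_minimize(obj, dim=4, bounds=(0.0, 25.0))

centers = best.reshape(2, 2)
print(centers)                       # should land near [[8, 8], [16, 16]]
print(cep(centers, X_test, y_test))  # expected near 0 for these classes
```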

Using the cluster centers of each class found by the firefly algorithm, the testing data set is classified with a zero classification error percentage. For the entire data set, the individual, average and overall efficiencies are 100%.

5. Results and discussion

In this work, we present the results obtained using the Firefly Algorithm (FA) on 13 typical benchmark data sets which are well known in the literature (UCI database repository [25]). First, we describe the characteristics of the standard classification data sets. Next, we present the results obtained from the FA for the 13 benchmark data set problems. Finally, we present the comparison of the FA with two other nature inspired techniques, Artificial Bee Colony (ABC) and Particle Swarm Optimization (PSO), and the 9 other methods used in the literature [9,14], and analyze their performance.

5.1. Data set description

The 13 classification data sets are well-known benchmarks, widely used by the machine learning community. The number of patterns, the number of input features and the number of classes of each data set are presented in Table 1. These 13 benchmark problems are chosen to be exactly the same as in [9,14], to allow a reliable comparison. Each data set is segregated into two parts: 75% of the data is used for training and the remaining 25% is used as testing samples. The sizes of the training and test sets can be found in Table 1. After training, we obtain the cluster centers (extracted knowledge) that can be used for classifying the test data set. The problems considered in this work can be described briefly as follows.

Data set 1: The Balance data set is based on balance scale weight and distance. It contains 625 patterns, which are split into 429 for training and 156 for testing. There are 4 integer-valued attributes and 3 classes.

Data sets 2 and 3: The Cancer and Cancer-Int data sets are based on the ''breast cancer Wisconsin — Diagnostic'' and ''breast cancer Wisconsin — Original'' data sets respectively. Each contains 2 classes, with a tumor labeled as either benign or malignant. The Cancer data set contains 569 patterns with 30 attributes, and Cancer-Int contains 699 patterns with 9 attributes.

Table 1

Data set 4: The Credit data set is based on Australian credit card data, used to assess applications for credit cards. There are 690 patterns (number of applicants), 15 input features, and 2 output classes.

Data set 5: The Dermatology data set is based on the differential diagnosis of erythemato-squamous diseases. There are 6 classes, 366 samples and 34 attributes.

Data set 6: The Pima-Diabetes data set has 768 instances with 8 attributes and two classes, which indicate whether the detection of diabetes is positive (class A) or negative (class B).

Data set 7: The Escherichia coli data set is based on the cellular localization sites of proteins. The original data set has 336 patterns in 8 classes, but 3 of the classes are represented by only 2, 2 and 5 patterns. These 9 examples are therefore omitted, leaving 327 patterns, 5 classes and 7 attributes.

Data set 8: The Glass data set classifies glass type in terms of oxide content. The nine inputs are based on 9 chemical measurements, and there are 6 types of glass. The data set contains 214 patterns, which are split into 161 for training and 53 for testing.

Data set 9: The Heart data set is based on the diagnosis of heart disease. It contains 76 attributes for each pattern, 35 of which are used as input features. The data are based on the Cleveland Heart data from the repository, with 303 patterns and 2 classes.

Data set 10: The Horse data set is used to predict the fate of a horse with colic: whether the horse will die, survive, or be euthanized. The data set contains 364 patterns, each of which has 58 inputs derived from 27 attributes, and 3 classes.

Data set 11: The Iris data set consists of three varieties of flowers: setosa, virginica and versicolor. There are 150 instances with 4 attributes that make up the 3 classes.

Data set 12: The Thyroid data set is based on the diagnosis of thyroid hyper- or hypofunction. The data set contains 215 patterns, 5 attributes and 3 classes.

Data set 13: The Wine data are obtained from the chemical analysis of wines derived from three different cultivators. The data set contains 3 types of wines, with 178 patterns and 13 attributes.

5.2. Results obtained using the firefly algorithm

In this section, we discuss the results obtained using the Firefly Algorithm (FA) on the 13 benchmark data set problems and compare the FA with the other 11 methods used in the literature, based on the performance measures.

5.2.1. FA clustering and parameter setting

The fireflies are initialized randomly in the search space. The parameter values used in our algorithm are:

Number of fireflies (N) = 20
Attractiveness (β_0) = 1
Light absorption coefficient (γ) = 1
Number of generations (T) = 100.

Table 2
Average classification error percentages using nature inspired techniques on test data.

For most applications, the same parameter values are suggested by Yang [18]. After the fireflies are deployed randomly within the search space, the parameter β_0 = 1, which is equivalent to a scheme of cooperative local search, lets the brightest firefly strongly determine the positions of the other fireflies, especially in its neighborhood. The parameter value γ = 1 determines the variation of light intensity with increasing distance from the communicating firefly; in the limit of very large γ, the attraction vanishes and the search degenerates into a random search.

The number of function evaluations in the firefly algorithm can be obtained as follows. Let N be the size of the initial population and T the maximum number of generations. The number of function evaluations per iteration is N(N − 1)/2, so the total number of function evaluations is [N(N − 1)/2] × T. In our studies, we have used 100 as the maximum number of generations. The number of function evaluations for each of the 13 classification data sets (with N = 20 and T = 100) in one simulation run is therefore (20 × 19/2) × 100 = 19000.
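This count is a one-line check; the helper below is hypothetical and simply reproduces the paper's formula.

```python
def n_evaluations(n_fireflies, n_generations):
    # N(N-1)/2 evaluations per generation, times T generations.
    return n_fireflies * (n_fireflies - 1) // 2 * n_generations

assert n_evaluations(20, 100) == 19000
```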

5.2.2. Analysis of Classification Error Percentage using FA

In [9,14], the Classification Error Percentage (CEP) measure is used with all 13 benchmark data sets. Falco et al. [14] compared the performance of the PSO algorithm with 9 other methods, namely BayesNet [26], MultiLayer Perceptron Artificial Neural Network (MLP) [27], Radial Basis Function Artificial Neural Network (RBF) [28], KStar [29], Bagging [30], MultiBoostAB [31], Naive Bayes Tree (NBTree) [32], Ripple Down Rule (Ridor) [33] and Voting Feature Interval (VFI) [34]. Karaboga and Ozturk [9] implemented the ABC algorithm and analyzed the CEP against all the above-mentioned methods. In this study, in addition to these methods [9,14], we analyze the CEP measure of the FA to make a reliable comparison.

From the training data set, the knowledge in the form of cluster centers is obtained using the Firefly Algorithm (FA). The testing data sets are then applied to these cluster centers, and the CEP values are obtained. The results of the nature inspired techniques FA, ABC and PSO for these problems are given in Table 2, where the CEP values are presented. The FA outperforms the ABC and PSO algorithms in all 13 problems in terms of classification error, whereas the ABC algorithm's result is better than that of the PSO algorithm in 12 of the 13 problems, the exception being the Glass problem. Moreover, the average classification error percentage is also better for the FA (11.36%) compared to that of ABC (13.13%) and PSO (15.99%).

Table 3 presents the CEP measures of the FA and the 11 methods given in [9,14]; the ranking of the classifiers on each problem, based on the ascending order of classification error, is also given in parentheses. At a glance, one can easily see that the FA obtains the best solution in 8 of the 13 problems used. To enable a sound comparison of all the algorithms, Tables 4 and 5 are reported. Table 4 shows the average classification errors over all problems, with the ranking based on the ascending order of average classification error. Table 5 shows the sum of each algorithm's per-problem rankings, in ascending order.

From Table 4, we can observe that, based on average CEP values, the FA is the best in comparison with the MLP artificial neural network technique and ABC, while MLP performs better than ABC. However, even though the results in the table are comparable, we believe this view may cause some significant points to be disregarded, since the distributions of the error rates are not proportional. Therefore, the general ranking of the techniques in Table 5 is obtained by summing the per-problem ranks from Table 3. From this ranking, once again the FA is the best, with the ABC algorithm in second position and the BayesNet technique in third position. The classification error rates and rankings from the tables show that clustering with the FA offers superior generalization capability. Note that we use the results of all the other methods as given in the earlier studies [9,14]; only the FA algorithm is run in this study.

5.2.3. Analysis of classification efficiency using FA

In the previous section, we presented the results obtained using the CEP, which uses only the test data. Using the same cluster centers, the average and overall efficiencies for the entire data set are obtained.

i. Significance of individual efficiency: For a testing data set, the main significance of the individual efficiency is to analyze the class-level performance of a classifier algorithm. From Table 6, we can observe the individual classification efficiencies on the testing samples: Cancer-Int, Iris, Thyroid and Wine are classified without any misclassification and hence have individual efficiencies of 100%. In the case of the Balance data set, the individual efficiency of Class 2 is 66.7%. In the Credit and Diabetes data sets, Class 1 is misclassified as Class 2, with an individual efficiency of 70%, whereas in the Heart and Dermatology data sets Class 2 has lower individual efficiencies of 73.1% and 50% respectively.

From Table 3 we can observe that, for the Heart classification problem (a 2-class problem), the FA performed better than all the other classifiers, with a CEP value of 13.16. This does not mean that the individual efficiency of each class is good. To illustrate this in more detail, consider the Heart data set: from Table 6 we can observe that Class 1 has an impressive individual efficiency of 94%, whereas most of the samples belonging to Class 2 are misclassified as Class 1, giving an individual efficiency of 73.1%. Hence it is important to consider the individual efficiency when analyzing the class-level performance of a clustering algorithm.

ii. Performance of average and overall efficiency: For the entire data set, it is always necessary to know the global performance of an algorithm. This can be obtained using the average and overall efficiency. Table 6 reports the average and overall efficiency of the firefly algorithm on the 13 benchmark data sets over the entire data. The Balance data set has an average and overall efficiency of 74.9% and 80.8% respectively, whereas the Cancer data set has an average and overall efficiency of 91% and 92.5% respectively. The average and overall efficiencies of the Cancer-Int, Dermatology, Iris and Wine data sets are 97.9%, 81.9%, 94.7% and 90.6% respectively. The average efficiencies of Credit, Diabetes, E. coli, Heart and Thyroid are 75.5%, 73.4%, 88.5%, 77.1% and 92.6%, and their overall efficiencies are 78.7%, 75.9%, 89.2%, 78.7% and 94.2%. In the case of the Glass and Horse data sets, the FA has lower average efficiencies of 70.8% and 53.7% respectively, and overall efficiencies of 61.6% and 66.1% respectively.

Table 3

Table 4

Table 5

Table 6
FA best classification efficiency.

Comparison of classification efficiency of nature inspired techniques using the Iris data set.

From Table 6 we can observe that, for the standard benchmark problems Cancer-Int, Iris, Thyroid and Wine, there is no misclassification in any of their individual classes, i.e., all the test data are classified correctly and hence the CEP value is 0. This does not mean that the overall efficiency is 100%. To illustrate this in more detail, let us consider the Iris data set, which has fewer input features than Cancer-Int, Thyroid and Wine. The FA, ABC and PSO are in the same class of population-based, nature inspired optimization techniques. Here we compare these three nature inspired techniques in extracting knowledge in the form of cluster centers, and the performance is analyzed using the classification efficiency. The cluster centers generated using the FA, ABC and PSO for the Iris training data are shown in Table 7; the cluster centers obtained for ABC match the published literature [35]. The parameter values used for the FA are as given in Section 5.2.1; for ABC and PSO we consider the parameter values as in [9,14] respectively.

From Table 8, we can observe that, in comparison with the other nature inspired techniques, the optimal and mean fitness values obtained using the FA are better than those of ABC and PSO. This being a continuous optimization problem, the cluster centers initially picked by the population are not the optimal points in the search space. The search space is therefore scanned iteratively, selecting new cluster centers after each fitness evaluation, until all the particles converge to an optimal result, i.e., a set of cluster centers. In the FA algorithm, a firefly moves toward another firefly that has a better objective function value (fitness). The distance moved by the firefly in each instance is given by the distance between the two fireflies (r): when the value of r is large/small, the firefly will move a small/large distance. This affects the computation time of the algorithm. In PSO, each particle moves a distance based on its personal best and the global best. In ABC, each bee's position is compared twice with the best particle position. Further, the cluster centers generated by these algorithms are analyzed as follows: the distances between the given data and the cluster centers are computed, and each datum is assigned to the cluster center (class) with the minimum distance. The performance measure helps us to examine which method has generated the optimal cluster centers.

Table 7

Table 8
Results of nature inspired techniques after 20 runs for optimal cluster centers using the Iris training data.

Table 9
Comparison of classification efficiency for the Iris data set.

The classification matrix for the entire Iris data set is shown in Table 9. From this table, we can observe that, for all the nature inspired algorithms, samples belonging to Class 2 and Class 3 are misclassified as Class 3 and Class 2 respectively. For the optimal cluster centers generated by the FA and ABC, the individual efficiency of Class 2 is 96%, whereas PSO achieves 92%. The individual efficiency of Class 3 using the FA is 88%, which is better than ABC and PSO with 72% and 74% respectively. In all the algorithms, Class 1 is classified without any misclassification and hence has an individual efficiency of 100%, as it is linearly separable from the other 2 classes. Also, the average and overall efficiency is better in the case of the FA, with 94.7%, in comparison to 89.3% and 88.7% for ABC and PSO respectively. Hence it is important to consider the individual, average and overall efficiency of the generated cluster centers in a multi-class classification problem.

It is important to note that the performance of clustering mainly depends on the size and quality of the training data set. Some methods are available in the literature for the selection of the training data set [36]. An earlier study showed that proper selection of the training data set improves the performance measure [36]. In our study, we selected 75% of the data set randomly for training and tabulated the results based on the most favorable performance measure for the selected training data set.

Overall, for most of the data sets, the FA has good global performance. Looking at the accuracy and robustness of the FA, we can claim that it can be used for the clustering problems studied in this paper.

6. Discussions and conclusions

This paper investigates a new nature inspired algorithm: the FA is used for clustering and its performance is evaluated. The FA is compared with ABC and PSO, as all these methods are in the same class of population-based, nature inspired optimization techniques. As in other population-based algorithms, the performance of the FA depends on the population size, β and γ. In the FA algorithm, a firefly moves toward another firefly that has a better objective function value (fitness). The distance moved by the firefly in each instance is given by the distance between the two fireflies (r); the effect of the values of β and γ is discussed in [18]. When the value of r is large/small, the firefly will move a small/large distance, which affects the computation time of the algorithm. In PSO, each particle moves a distance based on its personal best and the global best. In ABC, each bee's position is compared twice with the best particle position. In the FA algorithm, only the distance is necessary for the movement. The performance measure (CEP) helps us to examine which method has generated the optimal cluster centers.

The clustering task on 13 benchmark data sets is accomplished successfully by partitional clustering using a recent nature inspired technique, the Firefly Algorithm (FA). Clustering is an important technique for identifying homogeneous clusters (or classes), such that the patterns belonging to a cluster center share a high degree of affinity while being very dissimilar to those of other clusters. The performance of the FA using the classification error percentage is compared with two other nature inspired techniques, Artificial Bee Colony (ABC) and Particle Swarm Optimization (PSO), and nine other methods which are widely used by researchers. The performance measured using the classification efficiency (individual, average and overall efficiency) of the FA is analyzed using the 13 benchmark problems. From the results obtained, we can conclude that the FA is an efficient, reliable and robust method, which can be applied successfully to generate optimal cluster centers.

Acknowledgments

The authors would like to thank the reviewers for their comments, which were useful during the revision of this study.

References

[1] M.R. Anderberg, Cluster Analysis for Application, Academic Press, New York, 1973.
[2] J.A. Hartigan, Clustering Algorithms, Wiley, New York, 1975.
[3] P.A. Devijver, J. Kittler, Pattern Recognition: A Statistical Approach, Prentice-Hall, London, 1982.
[4] A.K. Jain, R.C. Dubes, Algorithms for Clustering Data, Prentice-Hall, Englewood Cliffs, 1988.
[5] H. Frigui, R. Krishnapuram, A robust competitive clustering algorithm with applications in computer vision, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (1999) 450–465.
[6] Y. Leung, J. Zhang, Z. Xu, Clustering by scale-space filtering, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (2000) 1396–1410.
[7] D. Chris, Xiaofeng He, Cluster merging and splitting in hierarchical clustering algorithms, in: Proc. IEEE ICDM, 2002, pp. 1–8.
[8] B. Mirkin, Mathematical Classification and Clustering, Kluwer Academic Publishers, Dordrecht, 1996.
[9] D. Karaboga, C. Ozturk, A novel cluster approach: Artificial Bee Colony (ABC) algorithm, Applied Soft Computing 11 (1) (2010) 652–657.
[10] S.Z. Selim, M.A. Ismail, K-means type algorithms: a generalized convergence theorem and characterization of local optimality, IEEE Transactions on Pattern Analysis and Machine Intelligence 6 (1984) 81–87.
[11] E. Falkenauer, Genetic Algorithms and Grouping Problems, Wiley, Chichester, 1998.
[12] Y. Kao, K. Cheng, An ACO-based clustering algorithm, in: M. Dorigo, et al. (Eds.), ANTS, LNCS, vol. 4150, Springer, Berlin, 2006, pp. 340–347.
[13] R. Younsi, W. Wang, A new artificial immune system algorithm for clustering, in: Z.R. Yang (Ed.), LNCS, vol. 3177, Springer, Berlin, 2004, pp. 58–64.
[14] I. De Falco, A.D. Cioppa, E. Tarantino, Facing classification problems with particle swarm optimization, Applied Soft Computing 7 (3) (2007) 652–658.
[15] T. Niknam, B. Amiri, J. Olamaei, A. Arefi, An efficient hybrid evolutionary optimization algorithm based on PSO and SA for clustering, Journal of Zhejiang University: Science A 10 (4) (2009) 512–519.
[16] T. Niknam, E. Taherian Fard, N. Pourjafarian, A.R. Rousta, An efficient hybrid algorithm based on modified imperialist competitive algorithm and k-means for data clustering, Engineering Applications of Artificial Intelligence 24 (2) (2011) 306–317.
[17] T. Niknam, B. Amiri, An efficient hybrid approach based on PSO, ACO and k-means for cluster analysis, Applied Soft Computing 10 (1) (2010) 183–197.
[18] X.S. Yang, Nature-Inspired Metaheuristic Algorithms, Luniver Press, 2008.
[19] D. Karaboga, A. Basturk, On the performance of Artificial Bee Colony (ABC) algorithm, Applied Soft Computing 8 (1) (2008) 687–697.
[20] J. Kennedy, R.C. Eberhart, Particle swarm optimization, in: IEEE Intl. Conf. on Neural Networks, vol. 4, 1995, pp. 1942–1948.
[21] J. Tyler, Glow-worms. /galaxypix/Tylerbookpt1.html.
[22] Y. Marinakis, M. Marinaki, M. Doumpos, N. Matsatsinis, C. Zopounidis, A hybrid stochastic genetic-GRASP algorithm for clustering analysis, Operational Research: An International Journal 8 (1) (2008) 33–46.
[23] A.K. Jain, M.N. Murty, P.J. Flynn, Data clustering: a review, ACM Computing Surveys 31 (3) (1999) 264–323.
[24] S. Suresh, N. Sundararajan, P. Saratchandran, A sequential multi-category classifier using radial basis function networks, Neurocomputing 71 (7–9) (2008) 1345–1358.
[25] C.L. Blake, C.J. Merz, University of California at Irvine Repository of Machine Learning Databases, 1998. http://www.ics.uci.edu/mlearn/MLRepository.html.
[26] F. Jensen, An Introduction to Bayesian Networks, UCL Press, Springer-Verlag, 1996.
[27] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors, Nature 323 (1986) 533–536.
[28] M.H. Hassoun, Fundamentals of Artificial Neural Networks, The MIT Press, Cambridge, 1995.
[29] J.G. Cleary, L.E. Trigg, K*: an instance-based learner using an entropic distance measure, in: Proceedings of the 12th International Conference on Machine Learning, 1995, pp. 108–114.
[30] L. Breiman, Bagging predictors, Machine Learning 24 (2) (1996) 123–140.
[31] G.I. Webb, MultiBoosting: a technique for combining boosting and wagging, Machine Learning 40 (2) (2000) 159–196.
[32] R. Kohavi, Scaling up the accuracy of Naive-Bayes classifiers: a decision-tree hybrid, in: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, AAAI Press, 1996, pp. 202–207.
[33] P. Compton, R. Jansen, Knowledge in context: a strategy for expert system maintenance, in: Proceedings of Artificial Intelligence, LNAI, vol. 406, Springer-Verlag, Berlin, 1988, pp. 292–306.
[34] G. Demiroz, A. Guvenir, Classification by voting feature intervals, in: Proceedings of the Seventh European Conference on Machine Learning, 1997, pp. 85–92.
[35] C. Zhang, D. Ouyang, J. Ning, An artificial bee colony approach for clustering, Expert Systems with Applications 37 (7) (2010) 4761–4767.
[36] T. Yoshida, S. Omatu, Neural network approach to land cover mapping, IEEE Transactions on Geoscience and Remote Sensing 32 (5) (1994) 1103–1109.
