Clustering using firefly algorithm: Performance study
Swarm and Evolutionary Computation 1 (2011) 164–171

Contents lists available at SciVerse ScienceDirect

Swarm and Evolutionary Computation

journal homepage: /locate/swevo

Regular paper

Clustering using firefly algorithm: Performance study

J. Senthilnath, S.N. Omkar, V. Mani

Department of Aerospace Engineering, Indian Institute of Science, Bangalore, India

Abstract
A Firefly Algorithm (FA) is a recent nature-inspired optimization algorithm that simulates the flash pattern and characteristics of fireflies. Clustering is a popular data analysis technique to identify homogeneous groups of objects based on the values of their attributes. In this paper, the FA is used for clustering on benchmark problems and the performance of the FA is compared with two other nature-inspired techniques, Artificial Bee Colony (ABC) and Particle Swarm Optimization (PSO), as well as nine other methods used in the literature. Thirteen typical benchmark data sets from the UCI machine learning repository are used to demonstrate the results of the techniques. From the results obtained, we compare the performance of the FA algorithm and conclude that the FA can be efficiently used for clustering.
Crown Copyright © 2011 Published by Elsevier Ltd. All rights reserved.

Article history:
Received 10 February 2011
Received in revised form 5 May 2011
Accepted 2 June 2011
Available online 30 June 2011

Keywords: Clustering; Classification; Firefly algorithm
1. Introduction
Clustering is an important unsupervised classification technique, where a set of patterns, usually vectors in a multi-dimensional space, are grouped into clusters (or groups) based on some similarity metric [1–4]. Clustering is often used for a variety of applications in statistical data analysis, image analysis, data mining and other fields of science and engineering.
Clustering algorithms can be classified into two categories: hierarchical clustering and partitional clustering [5,6]. Hierarchical clustering constructs a hierarchy of clusters by splitting a large cluster into smaller ones and merging smaller clusters into their nearest centroid [7]. In this, there are two main approaches: (i) the divisive approach, which splits a larger cluster into two or more smaller ones; (ii) the agglomerative approach, which builds a larger cluster by merging two or more smaller clusters. On the other hand, partitional clustering [8,9] attempts to divide the data set into a set of disjoint clusters without the hierarchical structure. The most widely used partitional clustering algorithms are the prototype-based clustering algorithms, where each cluster is represented by its center. The objective function (a square error function) is the sum of the distances from the patterns to their centers [6]. In this paper we are concerned with partitional clustering for generating cluster centers and further using these cluster centers to classify the data set.
A popular partitional clustering algorithm, k-means clustering, is essentially a function minimization technique, where the objective function is the squared error. However, the main drawback of the k-means algorithm is that it converges to a local minimum from the starting position of the search [10]. In order to overcome local optima problems, many nature-inspired algorithms, such as the genetic algorithm [11], ant colony optimization [12], artificial immune system [13], artificial bee colony [9], and particle swarm optimization [14], have been used. Recently, efficient hybrid evolutionary optimization algorithms, combining evolutionary methods with k-means to overcome local optima problems in clustering, have also been used [15–17].

Corresponding author. E-mail address: omkar@aero.iisc.ernet.in (S.N. Omkar).
The Firefly Algorithm (FA) is a recent nature-inspired technique [18] that has been used for solving nonlinear optimization problems. This algorithm is based on the behavior of social insects (fireflies). In social insect colonies, each individual seems to have its own agenda and yet the group as a whole appears to be highly organized. Algorithms based on nature have been demonstrated to show effectiveness and efficiency in solving difficult optimization problems. A swarm is a group of multi-agent systems, such as fireflies, in which simple agents coordinate their activities to solve the complex problem of the allocation of communication to multiple forage sites in dynamic environments.
In this study, the Firefly Algorithm (FA), which is described by Yang [18] for numerical optimization problems, is applied to clustering. To study the performance of the FA on clustering problems, we consider the standard benchmark problems (13 typical test databases) that are available in the literature [9,14]. The performance of the FA algorithm on clustering is compared with the results of other nature-inspired techniques, the Artificial Bee Colony (ABC) [19] and Particle Swarm Optimization (PSO) [20] algorithms, on the same test data sets [9,14]. The FA, ABC and PSO algorithms are in the same class of population-based, nature-inspired optimization techniques. Hence, we compare the performance of the FA algorithm with the ABC and PSO algorithms.
2210-6502/$ – see front matter Crown Copyright © 2011 Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.swevo.2011.06.003
We also present the results of nine other methods used in the literature [9,14]. For ease of understanding and comparison, we follow the same manner of analysis and discussion used in [9]. The only key difference is the use of the FA algorithm in this study. Contribution of this paper: In this work, for a given data set, the FA is used to find the cluster centers. The cluster centers are obtained by randomly selecting 75% of the given data set. This 75% of the given data set we call the training set. The FA algorithm uses this training set and the cluster centers are obtained. In order to study the performance of the FA algorithm, the remaining 25% of the data set is used (called the test data set). The performance measure used with the FA is the classification error percentage (CEP). The CEP is defined as the ratio of the number of misclassified samples in the test data set to the total number of samples in the test data set. This can be done because, in the test data set, we know the actual class of the test data. The distances between the given test data and the cluster centers are computed. The data is assigned to the cluster center (class) that has the minimum distance. Hence, we can compute the performance measure, the classification error percentage (CEP).
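The random 75/25 training/test split described above can be sketched in Python as follows; this is an illustrative sketch, not the authors' code, and the function name and fixed seed are our own assumptions:

```python
import numpy as np

def split_75_25(X, y, seed=0):
    """Randomly select 75% of the data set as the training set and keep
    the remaining 25% as the test set (name and seed are illustrative)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # random order of sample indices
    cut = int(0.75 * len(X))               # 75% boundary
    return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]
```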
The paper is organized as follows: the implementation of the FA algorithm is described in Section 2; clustering using the FA and the performance evaluation are described in Sections 3 and 4, respectively; the results are presented and discussed in Section 5. We conclude the paper in Section 6 by summarizing the observations.

2. Firefly algorithm
Fireflies are glow worms that glow through bioluminescence. For simplicity in describing our firefly algorithm, we use the following three idealized rules: (i) all fireflies are unisex, so that one firefly will be attracted to other fireflies regardless of their sex; (ii) an important and interesting behavior of fireflies is to glow brighter, mainly to attract prey and to share food with others; (iii) attractiveness is proportional to their brightness, thus each agent first moves toward a neighbor that glows brighter [21]. The Firefly Algorithm (FA) [18] is a population-based algorithm to find the global optima of objective functions based on swarm intelligence, investigating the foraging behavior of fireflies. In the FA, physical entities (agents or fireflies) are randomly distributed in the search space. Agents are thought of as fireflies that carry a luminescence quality, called luciferin, and emit light proportional to this value. Each firefly is attracted by the brighter glow of other neighboring fireflies. The attractiveness decreases as their distance increases. If there is none brighter than a particular firefly, it will move randomly. In the application of the FA to clustering, the decision variables are the cluster centers. The objective function is related to the sum, over all training set instances, of the Euclidean distance in an N-dimensional space, as given in [9].
Based on this objective function, initially all the agents (fireflies) are randomly dispersed across the search space. The two phases of the firefly algorithm are as follows.
i. Variation of light intensity: Light intensity is related to objective values [18]. So for a maximization/minimization problem, a firefly with high/low intensity will attract another firefly with high/low intensity. Assume that there exists a swarm of n agents (fireflies), x_i represents a solution for firefly i, and f(x_i) denotes its fitness value. Here the brightness I of a firefly is selected to reflect its fitness value f(x) at its current position x [18]:

I_i = f(x_i),   1 ≤ i ≤ n.   (1)
ii. Movement toward attractive firefly: A firefly's attractiveness is proportional to the light intensity seen by adjacent fireflies [18]. Each firefly has its distinctive attractiveness β, which implies how strongly it attracts other members of the swarm. However, the attractiveness β is relative; it varies with the distance r_ij between two fireflies i and j at locations x_i and x_j respectively, given as

r_ij = ||x_i − x_j||.   (2)

The attractiveness function β(r) of the firefly is determined by

β(r) = β_0 e^(−γ r²)   (3)

where β_0 is the attractiveness at r = 0 and γ is the light absorption coefficient.

The movement of a firefly i at location x_i attracted to another more attractive (brighter) firefly j at location x_j is determined by

x_i(t + 1) = x_i(t) + β_0 e^(−γ r_ij²) (x_j − x_i).   (4)
A detailed description of this FA is given in [18]. A pseudo-code of this algorithm is given below.
Pseudo-code: A high-level description of the firefly algorithm

Input:
  Create an initial population of n fireflies within a d-dimensional search space, x_ik, i = 1, 2, ..., n and k = 1, 2, ..., d
  Evaluate the fitness of the population, f(x_ik), which is directly proportional to the light intensity I_ik
  Algorithm's parameters: β_0, γ
Output: obtained minimum location x_imin

begin
  repeat
    for i = 1 to n
      for j = 1 to n
        if (I_j < I_i)
          Move firefly i toward j in d dimensions using Eq. (4)
        end if
        Attractiveness varies with distance r via exp[−γ r²]
        Evaluate new solutions and update light intensity using Eq. (1)
      end for j
    end for i
    Rank the fireflies and find the current best
  until stop condition true
end
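The loop above can be sketched in Python. This is a minimal sketch, not the authors' implementation: the intensity sign convention (intensity = negative cost, so a lower cost means a brighter firefly), the `alpha`-scaled random step for a firefly with no brighter neighbor, and the box bounds are our own assumptions.

```python
import numpy as np

def firefly_clustering(objective, dim, n=20, t_max=100, beta0=1.0, gamma=1.0,
                       alpha=0.01, bounds=(0.0, 1.0), seed=0):
    """Minimal firefly-algorithm sketch following the pseudo-code above.
    `objective` maps a candidate solution (e.g. flattened cluster centers)
    to a cost; light intensity is taken as the negative cost."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n, dim))             # random initial swarm
    intensity = np.array([-objective(xi) for xi in x])
    for _ in range(t_max):
        for i in range(n):
            moved = False
            for j in range(n):
                if intensity[j] > intensity[i]:         # firefly j glows brighter
                    r = np.linalg.norm(x[i] - x[j])     # distance, Eq. (2)
                    beta = beta0 * np.exp(-gamma * r ** 2)  # attractiveness, Eq. (3)
                    x[i] = x[i] + beta * (x[j] - x[i])      # movement, Eq. (4)
                    moved = True
            if not moved:                               # brightest firefly: random walk
                x[i] = x[i] + alpha * rng.uniform(-1.0, 1.0, size=dim)
            x[i] = np.clip(x[i], lo, hi)
            intensity[i] = -objective(x[i])             # update light intensity, Eq. (1)
    return x[np.argmax(intensity)]                      # current best firefly
```

Used with the clustering cost of Eq. (7) as `objective`, the returned vector is the flattened set of cluster centers with the best fitness found.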
3. Clustering using FA
The clustering methods, separating the objects into groups or classes, are developed based on unsupervised learning. In the unsupervised technique, the training data are grouped first, based solely on the numerical information in the data (i.e. cluster centers), and are then matched by the analyst to information classes. The data sets that we tackled contain the class information for each data point. Therefore, the main goal is to find the centers of the clusters by minimizing the objective function, the sum of distances of the patterns to their centers.
For given N objects, the problem is to minimize the sum of squared Euclidean distances between each pattern and its cluster center, allocating each pattern to one of the K cluster centers. The clustering objective function, the sum of squared errors given in Eq. (5), is described as in [22]:

J(K) = Σ_{k=1}^{K} Σ_{i∈c_k} ||x_i − c_k||²   (5)
where K is the number of clusters, x_i (i = 1, ..., n) is the location of the ith of the n given patterns, and c_k (k = 1, ..., K) is the kth cluster center, to be found by Eq. (6):

c_k = (1 / n_k) Σ_{i∈C_k} x_i   (6)

where n_k is the number of patterns in the kth cluster.
The cluster analysis forms the assignment of the data set into clusters, so that data can be grouped into the same cluster based on some similarity measures [23]. Distance measurement is most widely used for evaluating similarities between patterns. The cluster centers are the decision variables, which are obtained by minimizing the sum of Euclidean distances in the d-dimensional space, over all training set instances, between a generic instance x_j and the center of its cluster. The cost (objective) function for firefly i is given by Eq. (7), as in [9,14]:

f_i^Train = (1 / D_Train) Σ_{j=1}^{D_Train} d(x_j, p_i^{CL_known(x_j)})   (7)

where D_Train is the number of training instances, which is used to normalize the sum so that any distance lies within [0.0, 1.0], and p_i^{CL_known(x_j)} is the center, in candidate solution i, of the class that instance x_j belongs to according to the database.

Note that in our FA algorithm, the decision variables are the cluster centers. The objective function in our FA algorithm is given by Eq. (7). In our study, we consider the standard 13 benchmark problems given in [14]. For a given data set, let n be the number of data points, d the dimension, and c the number of classes. A given data point belongs to only one of these c classes. Of the given data set, 75% of the data are randomly selected to obtain the cluster centers using Eq. (7). In this way we obtain the cluster centers for all the c classes. The remaining 25% of the data set is used (called the test data set) to obtain the classification error percentage (CEP). An illustrative example of this FA algorithm and its performance measure is given in the next section.
4. Performance measures and an illustrative example
As discussed in the earlier section, the FA is used on the training data to obtain the cluster centers. Using these cluster centers, the testing data are classified and the performance of the classification is analyzed.
4.1. Performance evaluation

The performance of the extracted knowledge, in the form of cluster centers obtained by the FA, is evaluated using the Classification Error Percentage (CEP) and the classification efficiency. The CEP depends only on the test data, while the classification efficiency depends on both training and testing data.
4.1.1. Classification Error Percentage (CEP)

The CEP is obtained using only the test data [9]. For each problem, we report the CEP, which is the percentage of incorrectly classified patterns of the test data sets, as given in [9], to make a reliable comparison.
The classification of each pattern is done by assigning it to the class whose cluster center is closest. Then the classified output is compared with the desired output and, if they are not exactly the same, the pattern is counted as misclassified [9]. This procedure is applied to all test data, and the total number of misclassified patterns is expressed as a percentage of the size of the test data set, which is given by

CEP = (number of misclassified samples / total size of test data set) × 100.   (8)
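Eq. (8), together with the nearest-center assignment rule, can be sketched in Python; the function name is ours and labels are assumed to be integers indexing the rows of `centers`:

```python
import numpy as np

def cep(test_X, test_y, centers):
    """Classification Error Percentage, Eq. (8): assign each test pattern to
    the class of its nearest cluster center and count the mismatches."""
    predicted = np.array([int(np.argmin(np.linalg.norm(centers - x, axis=1)))
                          for x in test_X])
    misclassified = int(np.sum(predicted != np.asarray(test_y)))
    return 100.0 * misclassified / len(test_y)
```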
Fig. 1. Data distribution (training and testing data for Class 1 and Class 2 in the x–y plane).
4.1.2. Classification efficiency

Classification efficiency is obtained using both the training and test data. The classification matrix is used to obtain the statistical measures for the class-level performance (individual efficiency) and the global performance (average and overall efficiency) of the classifier [24]. The individual efficiency is indicated by the percentage classification, which tells us how many samples belonging to a particular class have been correctly classified. The percentage classification (η_i) for class c_i is given by Eq. (9).
η_i = (q_ii / Σ_{j=1}^{n_c} q_ji) × 100   (9)

where q_ii is the number of correctly classified samples and the denominator Σ_j q_ji = n_i is the number of samples of class c_i in the data set. The global performance measures are the average (η_a) and overall (η_o) classification efficiency, which are defined as

η_a = (1 / n_c) Σ_{i=1}^{n_c} η_i   (10)

η_o = (1 / N) Σ_{i=1}^{n_c} q_ii × 100   (11)

where n_c is the total number of classes and N is the number of patterns.
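Eqs. (9)–(11) can be computed directly from the classification matrix; a sketch, assuming `q[j, i]` counts the samples of true class i assigned to class j:

```python
import numpy as np

def efficiencies(q):
    """Individual, average and overall classification efficiency
    (Eqs. (9)-(11)) from a classification matrix q[j, i]."""
    q = np.asarray(q, dtype=float)
    n_i = q.sum(axis=0)                    # samples per true class
    eta_i = 100.0 * np.diag(q) / n_i       # individual efficiency, Eq. (9)
    eta_a = eta_i.mean()                   # average efficiency, Eq. (10)
    eta_o = 100.0 * np.trace(q) / q.sum()  # overall efficiency, Eq. (11)
    return eta_i, eta_a, eta_o
```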
4.2. Illustrative example

We illustrate how the Firefly Algorithm (FA) is used for clustering with the following synthetic data. Although the proposed algorithm can be used for any type of mixture model, we focus on a Gaussian mixture. Let us consider two Gaussian classes with two input features, namely x and y. Here the mean values µ1 = [8, 8]^T and µ2 = [16, 16]^T and the covariance matrix {(6, 3); (3, 2)} are assumed, and each class has an equal number of samples. In our experimentation, 100 samples are generated randomly for each class. Of these, 75 data points are used for training and the remaining 25 are used for testing in each class. The synthetic data generated is shown in Fig. 1.
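The synthetic data just described can be generated as follows; the fixed seed is our own assumption, added for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(0)                  # fixed seed: an assumption
means = [np.array([8.0, 8.0]), np.array([16.0, 16.0])]
cov = np.array([[6.0, 3.0], [3.0, 2.0]])        # covariance matrix from the text

train, test = [], []
for label, mu in enumerate(means, start=1):
    samples = rng.multivariate_normal(mu, cov, size=100)  # 100 samples per class
    train.append((samples[:75], label))          # 75 points for training
    test.append((samples[75:], label))           # 25 points for testing
```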
We use the firefly algorithm on the training data to obtain cluster centers. Let x_i be one of the solutions (cluster centers) and J_i the objective function value for this cluster center.
We consider a population size of 5 fireflies at locations x1, x2, x3, x4 and x5 within the 2-dimensional search space. Now evaluate the fitness of the population, J1, J2, J3, J4 and J5, using Eq. (7), which is directly proportional to the light intensities I1, I2, I3, I4 and I5. Now compare the intensity values of the fireflies: if (I2 < I1), then move firefly 1 toward 2 using Eq. (4); similarly, compare all the agents
Fig. 2. Optimal cluster centers (agents, agent movements and the converged cluster centers in the x–y plane).
and update the movement phase of each agent by evaluating new solutions and updating the light intensity using Eq. (1). This procedure is continued till it converges to the optimal cluster centers, i.e. x_imin, as shown in Fig. 2. The cluster centers generated are (x, y) = {(7.2233, 7.6659); (16.0580, 16.3230)}.

The classification result on the testing data set of each class, using the centers found by the firefly algorithm, has zero classification error percentage. For the entire data set, the individual, average and overall efficiency is 100%.

5. Results and discussion
In this work, we present the results obtained using the Firefly Algorithm (FA) on 13 typical benchmark data sets which are well known in the literature (UCI database repository [25]). First, we describe the characteristics of the standard classification data sets. Next, we present the results obtained from the FA for the 13 benchmark data set problems. Finally, we present the comparison of the FA with two other nature-inspired techniques, Artificial Bee Colony (ABC) and Particle Swarm Optimization (PSO), and nine other methods used in the literature [9,14], and analyze their performance.

5.1. Data set description
The 13 classification data sets are well-known benchmarks, widely used by the machine learning community. The number of data samples, the number of input features and the number of classes for each data set are presented in Table 1. These 13 benchmark problems are chosen exactly as in [9,14], to make a reliable comparison. Each data set is segregated into two parts: 75% of the data is used for training and the remaining 25% is used as testing samples. The numbers of training and test samples can be found in Table 1. After training, we obtain the cluster centers (extracted knowledge) that can be used for classifying the test data set. The problems considered in this work can be described briefly as follows.
Data set 1: The Balance data set is based on balance scale weight and distance. It contains 625 patterns, which are split into 469 for training and 156 for testing. There are 4 integer-valued attributes and 3 classes.
Data sets 2 and 3: The Cancer and Cancer-Int data sets are based on the diagnosis data of "breast cancer Wisconsin – Diagnostic" and "breast cancer Wisconsin – Original", respectively. Each contains 2 classes, with a tumor classified as either benign or malignant. The Cancer data set contains 569 patterns with 30 attributes, and the Cancer-Int contains 699 patterns with 9 attributes.

Table 1.
Data set 4: The Credit data set is based on Australian credit card data to assess applications for credit cards. There are 690 patterns (number of applicants), 15 input features, and the output has 2 classes.

Data set 5: The Dermatology data set is based on the differential diagnosis of erythemato-squamous diseases. There are 6 classes, 366 samples, and 34 attributes.

Data set 6: The Pima–Diabetes data set has 768 instances of 8 attributes and two classes, which determine whether the detection of diabetes is positive (class A) or negative (class B).

Data set 7: The Escherichia coli data set is based on the cellular localization sites of proteins. Here the original data set has 336 patterns formed of 8 classes, but 3 classes are represented with only 2, 2 and 5 patterns. Therefore, these 9 examples are omitted, leaving 327 patterns, 5 classes and 7 attributes.

Data set 8: The Glass data set is defined in terms of oxide content as glass type. Nine inputs are based on 9 chemical measurements, with one of 6 types of glass. The data set contains 214 patterns, which are split into 161 for training and 53 for testing.

Data set 9: The Heart data set is based on the diagnosis of heart disease. It contains 76 attributes for each pattern, 35 of which are used as input features. The data is based on the Cleveland Heart data from the repository, with 303 patterns and 2 classes.
Data set 10: The Horse data set is used to predict the fate of a horse with colic and to classify whether the horse will die, will survive, or will be euthanized. The data set contains 364 patterns, each of which has 58 inputs from 27 attributes, and 3 classes.

Data set 11: The Iris data set consists of three varieties of flowers: setosa, virginica and versicolor. There are 150 instances and 4 attributes that make up the 3 classes.

Data set 12: The Thyroid data set is based on the diagnosis of thyroid hyper- or hypofunction. The data set contains 215 patterns, 5 attributes and 3 classes.

Data set 13: The Wine data set, obtained from the chemical analysis of wines derived from three different cultivators, contains 3 types of wines, with 178 patterns and 13 attributes.

5.2. Results obtained using the firefly algorithm
In this section, we discuss the results obtained using the Firefly Algorithm (FA) on the 13 benchmark data set problems and compare the FA with the other 11 methods used in the literature, based on the performance measures.

5.2.1. FA clustering and parameter setting
The fireflies are initialized randomly in the search space. The parameter values used in our algorithm are:

Number of fireflies (N) = 20
Attractiveness (β_0) = 1
Light absorption coefficient (γ) = 1
Number of generations (T) = 100.

Table 2. Average classification error percentages using nature-inspired techniques on the test data.
For most applications, the same parameter values are suggested by Yang [18]. After the fireflies are deployed randomly within the search space, the parameter β_0 = 1 is equivalent to a scheme of cooperative local search, with the brightest firefly strongly determining the positions of the other fireflies, especially in its neighborhood. The parameter value γ = 1 determines the variation of light intensity with increasing distance from the communicating firefly; as the attractiveness decays with distance, the movement approaches a complete random search.
The number of function evaluations in the firefly algorithm can be obtained as follows: let N be the size of the initial population and T the maximum number of generations. The number of function evaluations for each iteration is N(N − 1)/2, so the total number of function evaluations generated is N(N − 1)/2 × T. In our studies, we have used 100 as the maximum number of generations. The number of function evaluations for each of the 13 classification data sets (with N = 20 and T = 100) in one simulation run is 19000.

5.2.2. Analysis of Classification Error Percentage using FA
In [9,14], the Classification Error Percentage (CEP) measure is used with all 13 benchmark data sets. De Falco et al. [14] compared the performance of the PSO algorithm with 9 other methods, namely BayesNet [26], MultiLayer Perceptron Artificial Neural Network (MLP) [27], Radial Basis Function Artificial Neural Network (RBF) [28], KStar [29], Bagging [30], MultiBoostAB [31], Naive Bayes Tree (NBTree) [32], Ripple Down Rule (Ridor) [33] and Voting Feature Interval (VFI) [34]. Karaboga and Ozturk [9] implemented the ABC algorithm and analyzed the CEP against all the above-mentioned methods. In this study, in addition to these methods [9,14], we have analyzed the CEP measure of the FA to make a reliable comparison.
From the training data set, the knowledge in the form of cluster centers is obtained using the Firefly Algorithm (FA). For these cluster centers, the testing data sets are applied and the CEP values are obtained. The results of the nature-inspired techniques FA, ABC and PSO for the problems are given in Table 2, where the CEP values are presented. The FA outperforms the ABC and PSO algorithms in all 13 problems, whereas the ABC algorithm's result is better than that of the PSO algorithm in 12 problems, the exception being the glass problem, in terms of classification error. Moreover, the average classification error percentage over all problems is also better in the case of the FA (11.36%) compared to that of ABC (13.13%) and PSO (15.99%).
In Table 3, the CEP measures of the FA and the 11 methods given in [9,14] are presented; the ranking of the classifiers on each problem, based on the ascending order of classification error, is also given in parentheses. At a glance, one can easily see that the FA obtains the best solution in 8 of the 13 problems used. To be able to make a good comparison of all the algorithms, Tables 4 and 5 are reported. Table 4 shows the average classification errors over all problems, and the ranking is based on the ascending order of average classification error. Table 5 shows the sum of the algorithms' rankings over each problem, in ascending order.
From Table 4, we can observe that, based on average CEP values, the FA is the best in comparison with the MLP artificial neural network technique and the ABC, while the MLP performed better in comparison with the ABC. However, even if the results in the table are comparable, we believe that some significant points may be disregarded, since the distributions of the error rates are not proportional. Therefore, the general ranking of the techniques in Table 5 is realized by calculating the sum of the ranks of each problem from Table 4. From this ranking, once again the FA is the best, with the ABC algorithm in second position and the BayesNet technique in third position. The classification error rates and rankings from the tables show that clustering with the FA offers superior generalization capability. Note that here we are using the results of all the other methods as given in the earlier studies [9,14]; only the FA results are new.
5.2.3. Analysis of classification efficiency using FA

In the previous section, we presented the results obtained using the CEP, which uses only the test data. Using the same cluster centers, the average and overall efficiency for the entire data set are obtained.
i. Significance of individual efficiency: For a testing data set, the main significance of the individual efficiency is to analyze the class-level performance of a classifier algorithm. From Table 6, we can observe the individual classification efficiency on the testing samples: Cancer-Int, Iris, Thyroid and Wine are classified without any misclassification and hence have an individual efficiency of 100%. In the case of the Balance data set, the individual efficiency of Class 2 is 66.7%. In the Credit and Diabetes data sets, class 1 is misclassified as class 2, with an individual efficiency of 70%, whereas in the Heart and Dermatology data sets Class 2 has lower individual efficiencies of 73.1% and 50%, respectively.
From Table 3 we can observe that, for the Heart classification problem (a 2-class problem), the FA performed better than all the other classifiers, with a CEP value of 13.16. This does not mean that the individual efficiency of each class is good. To illustrate this in more detail, let us consider the Heart data set: from Table 6 we can observe that Class 1 has an impressive individual efficiency of 94%, whereas most of the samples belonging to Class 2 are misclassified as Class 1, giving an individual efficiency of 73.1%. Hence it is important to consider the individual efficiency to analyze the class-level performance of a clustering algorithm.
ii. Performance of average and overall efficiency: For the entire data set, it is always necessary to know the global performance of an algorithm. This can be assessed using the average and overall efficiency. Table 6 reports the average and overall efficiency obtained using the firefly algorithm for the 13 benchmark data sets over the entire data set. The Balance data set has an average and overall efficiency of 74.9% and 80.8% respectively, whereas the Cancer data set has an average and overall efficiency of 91% and 92.5% respectively. The average and overall efficiency of the Cancer-Int, Dermatology, Iris and Wine data sets are 97.9%, 81.9%, 94.7% and 90.6% respectively. The average efficiencies of Credit, Diabetes, E. Coli, Heart and Thyroid are 75.5%, 73.4%, 88.5%, 77.1% and 92.6%. The overall efficiencies of Credit, Diabetes, E. Coli, Heart and Thyroid are 78.7%, 75.9%, 89.2%, 78.7% and 94.2%. In the case of the Glass and Horse data sets, the FA has lower average efficiencies of 70.8% and 53.7% respectively, and overall efficiencies of 61.6% and 66.1% respectively.

Table 3.
Table 4.
Table 5.
Table 6. FA best classification efficiency.
Comparison of classification efficiency of nature-inspired techniques using the Iris data set.
From Table 6 we can observe that, for the standard benchmark problems Cancer-Int, Iris, Thyroid and Wine, there is no misclassification in any of the individual classes, i.e. all the test data are classified correctly and hence the CEP value is 0. This does not mean that the overall efficiency is 100%. To illustrate this in more detail, let us consider the Iris data set, which, compared to Cancer-Int, Thyroid and Wine, has fewer input features. The FA, ABC and PSO are in the same class of population-based, nature-inspired optimization techniques. Here we compare the three nature-inspired techniques in extracting knowledge in the form of cluster centers, and the performance is analyzed using the classification efficiency. The cluster centers generated using the FA, ABC and PSO for the Iris training data are shown in Table 7. Here the cluster centers obtained for ABC match the published literature [35]. The parameter values used for the FA are as given in Section 5.2.1. For ABC and PSO we consider the parameter values as in [9,14] respectively.
From Table 8, we can observe that, in comparison with the other nature-inspired techniques, the optimal and mean fitness values obtained using the FA are better than those of ABC and PSO. Being a continuous optimization problem, initially every possible set of cluster centers picked by the population is not the best optimal point in the search space. Therefore, the selection of new cluster centers after fitness evaluation is effectively scanned iteratively till all the particles converge to an optimal result, i.e. in the form of cluster centers. In the FA algorithm, a firefly particle moves toward another firefly particle which has a better objective function value (fitness). The distance moved by the firefly in each instance is given by the distance between the two firefly particles (r). When the value of r is large/small, the firefly will move a small/large distance. This will affect the computation time of this algorithm. In PSO, each particle will move a distance based on its personal best and the global best. In ABC, each bee particle position will be compared twice with the best particle position. Further, the cluster centers generated by these algorithms are analyzed: the distances between the given data and the cluster centers are computed, and the data is assigned to the cluster center (class) that has the minimum distance. The performance measure helps us examine which method has generated the optimal cluster centers.

Table 7.
Table 8. Results of nature-inspired techniques after 20 runs for optimal cluster centers using the Iris data set.
Table 9. Comparison of classification efficiency for the Iris data set.
The classification matrix for the entire Iris data set is shown in Table 9. From this table, we can observe that, for all the nature-inspired algorithms, samples belonging to Class 2 and Class 3 are misclassified as Class 3 and Class 2 respectively. For the optimal cluster centers generated by the FA and ABC, the individual efficiency of Class 2 is 96%, whereas PSO gives 92%. The individual efficiency of Class 3 using the FA is 88%, which is better than that of ABC and PSO, with 72% and 74% respectively. In all the algorithms, Class 1 is classified without any misclassification and hence has an individual efficiency of 100%, as it is linearly separable from the other 2 classes. Also, the average and overall efficiency is better in the case of the FA, with 94.7%, compared to 89.3% and 88.7% for ABC and PSO respectively. Hence it is important to consider the individual, average and overall efficiency of the generated cluster centers in a multi-class classification problem.
It is important to note that the performance of clustering mainly depends on the size and quality of the training data set. There are some methods available in the literature for the selection of the training data set [36]. An earlier study showed that proper selection of the training data set improves the performance measure [36]. In our study, we have selected 75% of the data set randomly as training data and tabulated the results based on the most favorable performance measure for the selected training data set.
Overall, for most of the data sets, the FA has good global performance. Looking at the accuracy and robustness of the FA, we can claim that it can be used for the clustering problems studied in this paper.
6. Discussions and conclusions
This paper investigates a new nature-inspired algorithm: the FA is used for clustering and its performance is evaluated. The FA is compared with ABC and PSO, as all these methods are in the same class of population-based, nature-inspired optimization techniques. As in other population-based algorithms, the performance of the FA depends on the population size, β and γ. In the FA algorithm, a firefly particle moves toward another firefly particle which has a better objective function value (fitness). The distance moved by the firefly in each instance is given by the distance between the two firefly particles (r). The effect of the values of β and γ is discussed in [18]. When the value of r is large/small, the firefly will move a small/large distance. This will affect the computation time of the algorithm. In PSO, each particle will move a distance based on its personal best and the global best. In ABC, each bee particle position will be compared twice with the best particle position. In the FA algorithm, only the distance is necessary for the movement. The performance measure (CEP) helps us examine which method has generated the optimal cluster centers. The clustering task on 13 benchmark data sets is accomplished successfully by the procedure of partitional clustering using a recent nature-inspired technique, the Firefly Algorithm (FA). Clustering is an important technique to identify homogeneous clusters (or classes) such that the patterns belonging to a cluster center share a high degree of affinity while being very dissimilar to other clusters. The performance of the FA, measured by the classification error percentage, is compared with two other nature-inspired techniques, Artificial Bee Colony (ABC) and Particle Swarm Optimization (PSO), and nine other methods which are widely used by researchers. The performance measured by classification efficiency (individual, average and overall efficiency) of the FA is analyzed using the 13 benchmark problems. From the results obtained, we can conclude that the FA is an efficient, reliable and robust method, which can be applied successfully to generate optimal cluster centers.

Acknowledgments
The authors would like to thank the reviewers for their comments, which were useful during the revision of this study.

References
[1]M.R.Anderberg,ClusterAnalysisforApplication,AcademicPress,NewYork,
1973.
[2]J.A.Hartigan,ClusteringAlgorithms,Wiley,NewYork,1975.
[3]P.A.Devijver,J.Kittler,PatternRecognition:AStatisticalApproach,Prentice-Hall,London,1982.
[4]A.K.Jain,R.C.Dubes,AlgorithmsforClusteringData,Prentice-Hall,Englewood
Cliffs,1988.
[5]H.Frigui,R.Krishnapuram,Arobustcompetitiveclusteringalgorithmwith
applicationsincomputervision,IEEETransactionsonPatternAnalysisandMachineIntelligence21(1999)450–465.
[6]Y.Leung,J.Zhang,Z.Xu,Clusteringbyscale-spacefiltering,IEEETransactions
onPatternAnalysisandMachineIntelligence22(2000)1396–1410.
[7]D.Chris,XiaofengHe,Clustermergingandsplittinginhierarchicalclustering
algorithms,in:Proc.IEEEICDM,2002,pp.1–8.
[8]B.Mirkin,MathematicalClassificationandClustering,KluwerAcademic
Publishers,Dordrecht,1996.
[9]D.Karaboga,C.Ozturk,Anovelclusterapproach:ArtificialBeeColony(ABC)
algorithm,AppliedSoftComputing11(1)(2010)652–657.
萤火虫算法
J.Senthilnathetal./SwarmandEvolutionaryComputation1(2011)164–171
171
[10] S.Z. Selim, M.A. Ismail, K-means type algorithms: a generalized convergence theorem and characterization of local optimality, IEEE Transactions on Pattern Analysis and Machine Intelligence 6 (1984) 81–87.
[11] E. Falkenauer, Genetic Algorithms and Grouping Problems, Wiley, Chichester, 1998.
[12] Y. Kao, K. Cheng, An ACO-based clustering algorithm, in: M. Dorigo, et al. (Eds.), ANTS, in: LNCS, vol. 4150, Springer, Berlin, 2006, pp. 340–347.
[13] R. Younsi, W. Wang, A new artificial immune system algorithm for clustering, in: Z.R. Yang (Ed.), LNCS, vol. 3177, Springer, Berlin, 2004, pp. 58–64.
[14] I. De Falco, A.D. Cioppa, E. Tarantino, Facing classification problems with particle swarm optimization, Applied Soft Computing 7(3) (2007) 652–658.
[15] T. Niknam, B. Amiri, J. Olamaei, A. Arefi, An efficient hybrid evolutionary optimization algorithm based on PSO and SA for clustering, Journal of Zhejiang University: Science A 10(4) (2009) 512–519.
[16] T. Niknam, E. Taherian Fard, N. Pourjafarian, A.R. Rousta, An efficient hybrid algorithm based on modified imperialist competitive algorithm and k-means for data clustering, Engineering Applications of Artificial Intelligence 24(2) (2011) 306–317.
[17] T. Niknam, B. Amiri, An efficient hybrid approach based on PSO, ACO and k-means for cluster analysis, Applied Soft Computing 10(1) (2010) 183–197.
[18] X.S. Yang, Nature-Inspired Metaheuristic Algorithms, Luniver Press, 2008.
[19] D. Karaboga, A. Basturk, On the performance of Artificial Bee Colony (ABC) algorithm, Applied Soft Computing 8(1) (2008) 687–697.
[20] J. Kennedy, R.C. Eberhart, Particle swarm optimization, in: IEEE Intl. Conf. on Neural Networks, vol. 4, 1995, pp. 1942–1948.
[21] J. Tyler, Glow-worms. /galaxypix/Tylerbookpt1.html.
[22] Y. Marinakis, M. Marinaki, M. Doumpos, N. Matsatsinis, C. Zopounidis, A hybrid stochastic genetic-GRASP algorithm for clustering analysis, Operational Research An International Journal 8(1) (2008) 33–46.
[23] A.K. Jain, M.N. Murty, P.J. Flynn, Data clustering: a review, ACM Computing Surveys 31(3) (1999) 264–323.
[24] S. Suresh, N. Sundararajan, P. Saratchandran, A sequential multi-category classifier using radial basis function networks, Neurocomputing 71(7–9) (2008) 1345–1358.
[25] C.L. Blake, C.J. Merz, University of California at Irvine Repository of Machine Learning Databases, 1998. http://www.ics.uci.edu/mlearn/MLRepository.html.
[26] F. Jensen, An Introduction to Bayesian Networks, UCL Press, Springer-Verlag, 1996.
[27] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors, Nature 323 (1986) 533–536.
[28] M.H. Hassoun, Fundamentals of Artificial Neural Networks, The MIT Press, Cambridge, 1995.
[29] J.G. Cleary, L.E. Trigg, K*: an instance-based learner using an entropic distance measure, in: Proceedings of the 12th International Conference on Machine Learning, 1995, pp. 108–114.
[30] L. Breiman, Bagging predictors, Machine Learning 24(2) (1996) 123–140.
[31] G.I. Webb, Multiboosting: a technique for combining boosting and wagging, Machine Learning 40(2) (2000) 159–196.
[32] R. Kohavi, Scaling up the accuracy of Naive-Bayes classifiers: a decision tree hybrid, in: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, AAAI Press, 1996, pp. 202–207.
[33] P. Compton, R. Jansen, Knowledge in context: a strategy for expert system maintenance, in: Proceedings of Artificial Intelligence, in: LNAI, vol. 406, Springer-Verlag, Berlin, 1988, pp. 292–306.
[34] G. Demiroz, A. Guvenir, Classification by voting feature intervals, in: Proceedings of the Seventh European Conference on Machine Learning, 1997, pp. 85–92.
[35] C. Zhang, D. Ouyang, J. Ning, An artificial bee colony approach for clustering, Expert Systems with Applications 37(7) (2010) 4761–4767.
[36] T. Yoshida, S. Omatu, Neural network approach to land cover mapping, IEEE Transactions on Geoscience and Remote Sensing 32(5) (1994) 1103–1109.