Empirical project monitor A tool for mining multiple project data
Project management for effective software process improvement must be achieved based on quantitative data. However, because data collection for measurement requires high costs and collaboration with developers, it is difficult to collect coherent, quantita
Empirical Project Monitor: A Tool for Mining Multiple Project Data
Masao Ohira, Reishi Yokomori, Makoto Sakai, Ken-ichi Matsumoto, Katsuro Inoue, Koji Torii
Nara Institute of Science and Technology
ohira@empirical.jp, {matumoto,torii}@is.aist-nara.ac.jp
Graduate School of Information Science and Technology, Osaka University
{yokomori,inoue}@ist.osaka-u.ac.jp
SRA Key Technology Laboratory, Inc.
sakai@sra.co.jp
Abstract
Project management for effective software process improvement must be achieved based on quantitative data. However, because data collection for measurement requires high costs and collaboration with developers, it is difficult to collect coherent, quantitative data continuously and to utilize the data for practicing software process improvement. In this paper, we describe Empirical Project Monitor (EPM) which automatically collects and measures data from three kinds of repositories in widely used software development support systems such as configuration management systems, mailing list managers and issue tracking systems. Providing integrated measurement results graphically, EPM helps developers/managers keep projects under control in real time.
1 Introduction
In software development in recent years, improvement of software process is increasingly gaining attention. Its practice in software organizations consists of repeatedly measuring the development activities, finding potential problems in the processes, assessing improvement plans, and providing feedback into the processes. Project management for effective software process improvement must be achieved based on quantitative data.
Many software measurement methods have been proposed to better understand, monitor, control, and predict software processes and products [4]. For instance, the Goal-Question-Metric (GQM) paradigm [2] provides a sophisticated measurement technique. GQM guides users to set up measurement goals, create questions based on the goals, and determine measurement models and procedures based on the questions. Measurement based on GQM is a logical and reasonable method.
However, in its practice, members who participate in measurement activities need to attend to the measurement processes down to every last detail. Data collection for measurement in general requires high costs and collaboration with developers. It is difficult to collect coherent, quantitative data continuously and, moreover, to utilize the collected data for practicing software process improvement. Few studies have proposed measurement tools for dealing with data from a number of projects, especially in large-scale software organizations.
As a measurement-based approach to the above issues, we have been studying empirical software engineering [1, 3] which evaluates various technologies and tools based on quantitative data obtained through actual use. Our goal is to develop an environment composed of a variety of tools for supporting measurement-based software process improvement, which we call the Empirical Software Engineering Environment (ESEE).
In this paper, we introduce Empirical Project Monitor (EPM) as a partial implementation of ESEE, which automatically collects and measures quantitative data from three kinds of repositories in widely used software development support systems such as configuration management systems, mailing list managers and issue tracking systems. Collecting such data in software development automatically and providing integrated measurement results graphically, EPM helps developers/managers keep their projects under control in real time.
2 Empirical Project Monitor (EPM)
We have developed Empirical Project Monitor (EPM) [9] which automatically collects and analyzes data from multiple software repositories. Figure 1 shows the architecture of EPM in the ESEE framework. The ESEE framework is designed for supporting measurement-based process improvement in software organizations by providing various pluggable tools. EPM consists of four components according to the ESEE framework: data collection, format translation, data store, and data analysis/visualization. This section describes an overview of EPM and the basic data flow through EPM.
Figure 1. The architecture of EPM in the ESEE framework
Automatic data collection: EPM automatically collects multiple project data from three kinds of repositories in widely used software development support systems. For instance, EPM collects versioning histories from configuration management systems (e.g. CVS¹), mail archives from mailing list managers (e.g. Mailman², Majordomo³, fml⁴), and issue tracking records from (bug) issue tracking systems (e.g. GNATS⁵, Bugzilla⁶). Because these data are accumulated through everyday development activities using common GUI tools (e.g. SourceShare™⁷, WinCVS⁸), developers/managers do not need additional work for data collection. Also, it does not cost much to introduce EPM into projects/organizations because the systems that serve as the sources of data collection are open source freeware.
¹ CVS, /
² Mailman, /
³ Majordomo, /majordomo/
⁴ fml, /index.html.en
⁵ GNATS, /software/gnats/
⁶ Bugzilla, /
⁷ SourceShare™, /
⁸ WinCVS, /
Format translation and data store: EPM converts the collected data into an XML format called the standardized empirical software engineering data, so that EPM can deal with not only the above three kinds of software repositories but also various kinds of repositories according to purposes for measurement. Data from other systems are available by small adjustments of parameters. The data converted into the XML format is stored in the PostgreSQL⁹ database.
Analysis and visualization: EPM analyzes the data stored in the PostgreSQL database. For instance, in order to analyze data related to CVS, EPM extracts the process data about events such as checkin/checkout, transitions of source code size, version histories of components, and so forth. Then, EPM visualizes various measurement results such as the growth of lines of code and the relationship between checkin and checkout. EPM also provides summaries of each repository such as information of CVS logs. All the measurement results are available through common web browsers (e.g. see Figure 2), so that users can easily share the results.
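As a rough illustration of the format translation step, the following Python sketch serializes one check-in record as XML. The element and attribute names (event, type, source, project, author, file, revision, timestamp) and the sample record are hypothetical; the actual schema of the standardized empirical software engineering data is not described in this paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical field names; the real EPM XML schema is not given in the paper.
FIELDS = ("project", "author", "file", "revision", "timestamp")


def checkin_to_xml(record):
    """Serialize one check-in record (a plain dict) as an XML element."""
    event = ET.Element("event", attrib={"type": "checkin", "source": record["source"]})
    for field in FIELDS:
        child = ET.SubElement(event, field)
        child.text = str(record[field])
    return event


# Made-up sample record standing in for one entry of a CVS log.
record = {
    "source": "cvs",
    "project": "EPM",
    "author": "dev1",
    "file": "src/Main.java",
    "revision": "1.4",
    "timestamp": "2004-01-15T10:30:00",
}

xml_text = ET.tostring(checkin_to_xml(record), encoding="unicode")
print(xml_text)
```

Under this assumption, records from a mailing list archive or an issue tracker could be mapped to the same element with a different type and source, which is one way a common format can cover several repository kinds.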
In this way, EPM helps users obtain quantitative data at low cost in real time and provides them with various measurement results for understanding the current development status. This would help users keep their projects under control.
3 Visualizations of measurement results
Data mining techniques for software repositories have been proposed to understand reasons of software changes [7], to identify how communication delay among developers in physically distributed environments affects software development [8], to detect potential software changes and incomplete changes [11], and so forth. In contrast to these tools, the features of EPM are to visualize combinations of measurement results from three kinds of software repositories and to be able to deal with data from multiple projects simultaneously.
⁹ PostgreSQL, /
Figure 2. Measurement results through web browsers
3.1 Combinations of measurement results
In addition to providing visualizations of measurement results from each software repository, EPM also visualizes combinations of measurement results from three kinds of repositories. The following shows two examples of them.
Bug issues and checkins: Figure 3 represents the relationship between the transition of the cumulative total of issues (the line graph) and the time of checkins (the grayed vertical lines on the X-axis) in our EASE project [6]. The numbers of issues and checkins are measured from the repositories in GNATS and CVS respectively. A checkin often occurs after bug issues are reported because developers try to modify or resolve the issues. The graph helps users (developers/managers) remember the situation in which issues were raised for each file version. On the contrary, the file itself which is checked into CVS may include some bugs if the graph indicates that there are issues after checkins.
Bug issues and e-mails among developers: Figure 4 illustrates the communication history among developers in the EASE project. The black line graph is the transition of the cumulative total of e-mails exchanged through Mailman. The vertical shorter/longer dashed lines represent when bug issues were raised/resolved. The light-gray vertical lines mark when the files checked in by developers were uploaded to CVS. From the graph, users can confirm the state of the communication among developers and identify the file versions which might have problems. Because discussions on issues usually become active when issues are reported to an issue tracking system, [...] Communication problems among developers bring a decrease in software productivity and reliability [8].
Figure 3. Relationship between issues and checkins
Figure 4. History of bug issues and e-mails among developers
The integrated measurement results based on data from configuration management systems, mailing list managers, and issue tracking systems help developers understand current and past events in development activities.
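The kind of combination described in this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the three-day window, and the sample dates are hypothetical and are not taken from EPM. The first function gives the cumulative issue total that a line graph would plot; the second counts issues reported shortly after each check-in, which is one possible way to spot check-ins that may have introduced bugs.

```python
from bisect import bisect_right
from datetime import date


def cumulative_issues(issue_dates, day):
    """Number of issues reported on or before `day` (issue_dates must be sorted)."""
    return bisect_right(issue_dates, day)


def issues_after_checkins(issue_dates, checkin_dates, window_days=3):
    """For each check-in, count issues reported within `window_days` days after it."""
    counts = {}
    for checkin in checkin_dates:
        counts[checkin] = sum(
            1 for issue in issue_dates
            if 0 < (issue - checkin).days <= window_days
        )
    return counts


# Made-up sample data standing in for issue tracker and CVS records.
issues = sorted([date(2004, 1, 5), date(2004, 1, 6), date(2004, 1, 12)])
checkins = [date(2004, 1, 4), date(2004, 1, 10)]

print(cumulative_issues(issues, date(2004, 1, 6)))
print(issues_after_checkins(issues, checkins))
```

The window length is a free parameter here; a project would need to choose it based on how quickly its users typically report problems.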
3.2 Visualizations of multiple project data
Comparing current projects with past ones would be helpful for managers to estimate the progress of projects and to detect the unusual status in projects.
Comparison of measurement results among multiple projects: EPM makes measurement results comparable across multiple projects. Figure 5 represents the relationship of the growth of lines of code between two projects (the upper line: SPARS [10], the lower line: EASE). Both projects have been proceeding under collaborative research between the authors’ universities and some software companies. Some researchers and developers have been participating in both projects. Although the two actually have different purposes and aspects, suppose here that they have been developing software systems under similar conditions. The project managers can confirm some common characteristics and roughly estimate the progress of the later project (EASE) from the graph. For instance, SPARS has two phases in which it evolved rapidly for releasing major versions. EASE has just released the first major version. The managers can easily guess the near future of the progress of EASE: the development of EPM will stop for a while to test EPM, to reconsider the design, and so forth.
Figure 5. Comparison of two projects
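One simple way to make two growth curves comparable is to re-index each by project age rather than by calendar date. The sketch below assumes each project is a chronologically sorted list of (date, lines of code) samples; the function names and all numbers are hypothetical and are not measurements from SPARS or EASE.

```python
from datetime import date


def align_by_project_age(series):
    """Re-index sorted (date, loc) samples by days elapsed since the first sample."""
    start = series[0][0]
    return [((day - start).days, loc) for day, loc in series]


def loc_at_age(aligned, age):
    """Most recent LoC value recorded at or before `age` days (0 if none)."""
    value = 0
    for days, loc in aligned:
        if days > age:
            break
        value = loc
    return value


# Made-up samples for an earlier and a later project.
older = [(date(2003, 4, 1), 1000), (date(2003, 5, 1), 5000), (date(2003, 7, 30), 5200)]
newer = [(date(2004, 1, 10), 800), (date(2004, 2, 9), 4500)]

aligned_older = align_by_project_age(older)
aligned_newer = align_by_project_age(newer)

# Compare both projects at the current age of the newer one.
age = aligned_newer[-1][0]
print(age, loc_at_age(aligned_older, age), loc_at_age(aligned_newer, age))
```

With both series on the same axis, a manager could read off how the earlier project behaved after reaching the stage the later project is in now.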
Distribution maps of multiple projects: Using measurement results from three kinds of repositories in multiple projects, EPM can generate distribution maps. Figure 6 is a distribution map using 100 Open Source Software Development (OSSD) projects¹⁰,¹¹, which represents the relationship between lines of code (the X-axis) and number of checkins (the Y-axis). Suppose here that these projects are managed by one software organization. The graph can be used for helping managers identify “unusual” projects which indicate extremely high or low values.
¹⁰ [...], /
¹¹ We selected the 100 projects in Figure 6 randomly from the list of most active projects in [...].
Figure 6. Distribution map of 100 OSSD projects
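The paper does not define what makes a project "unusual" on such a map, so the following Python sketch uses an assumed rule purely for illustration: a project is flagged when its number of check-ins per thousand lines of code is more than a chosen factor above or below the median across projects. The function name, the factor of 3, and the project data are all hypothetical.

```python
from statistics import median


def flag_unusual(projects, factor=3.0):
    """Return names of projects whose check-ins per KLoC is far from the median.

    `projects` maps a project name to (lines_of_code, checkins);
    lines_of_code must be greater than zero.
    """
    ratios = {
        name: checkins / (loc / 1000)
        for name, (loc, checkins) in projects.items()
    }
    mid = median(ratios.values())
    return sorted(
        name for name, ratio in ratios.items()
        if ratio > mid * factor or ratio < mid / factor
    )


# Made-up data: (lines of code, number of check-ins) per project.
projects = {
    "alpha": (20000, 400),
    "beta": (50000, 1100),
    "gamma": (10000, 180),
    "delta": (30000, 3000),
    "epsilon": (40000, 80),
}

print(flag_unusual(projects))
```

A median-based rule is used here because a few extreme projects would otherwise distort a mean-based threshold; other definitions of "unusual" are equally plausible.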
3.3 Customizations of measurement parameters
EPM currently provides users with the [...] Using the database schema for EPM, which is open to the public, users are able to input SQL sequences and to create bar graphs, line graphs, and distribution maps such as Figure 6. Because we would like to support various projects and organizations, each of which has its own problems, we decided to provide the minimum types of graphs and summary information rather than to provide a lot of them in advance. After feedback from software organizations using EPM, we will add other types of graphs in the near future. Currently EPM can be viewed as a tool for exploratory data analysis [5].
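To give a sense of what a user-supplied query might look like, here is a minimal Python sketch. It makes several assumptions: an in-memory SQLite database stands in for the PostgreSQL store, and the table name `checkins`, its columns, and the rows are hypothetical, since the EPM database schema is not reproduced in this paper. The query aggregates check-ins and added lines per project, which is the kind of result a bar graph or distribution map could be drawn from.

```python
import sqlite3

# In-memory SQLite used as a stand-in for EPM's PostgreSQL database.
conn = sqlite3.connect(":memory:")

# Hypothetical table; the real EPM schema is not shown in the paper.
conn.execute(
    "CREATE TABLE checkins ("
    "project TEXT, author TEXT, committed_on TEXT, lines_added INTEGER)"
)
conn.executemany(
    "INSERT INTO checkins VALUES (?, ?, ?, ?)",
    [
        ("EASE", "dev1", "2004-01-10", 120),
        ("EASE", "dev2", "2004-01-11", 80),
        ("SPARS", "dev3", "2003-06-01", 300),
    ],
)

# User-defined measurement: check-in count and total added lines per project.
query = """
    SELECT project, COUNT(*) AS checkins, SUM(lines_added) AS loc
    FROM checkins
    GROUP BY project
    ORDER BY project
"""
rows = conn.execute(query).fetchall()
for project, checkins, loc in rows:
    print(project, checkins, loc)
```

Each result row would supply one point (lines of code, number of check-ins) for a distribution map, or one bar for a bar graph.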
4 Discussion
In this section, we report a case study of applying EPM to our project itself, in order to observe the actual usage of the pre-defined 5 types of graphs mentioned above. We have interviewed four developers on the advantages and the disadvantages of using EPM. The development environment of this project is summarized in Table 1.
One of the advantages is that the graphs make it easy for developers to understand the status of the project by identifying distinctive parts indicated in the graphs. For instance, the flat part of the line in the LoC graph reminded them why the development seemed to be stopped. In fact, all developers were on a business trip at the time. This could help them increase their accountability to their managers. Another is that the graphs generated in real time motivated developers to fix bugs, since they could be aware that there were still unresolved issues.
Table 1. EPM development project
In contrast to these advantages, some problems related to the usage of EPM have been found. One is that visualizations are too complicated to understand the status of the project in some cases. For instance, developers could not distinguish which file versions corresponded to which vertical lines in Figure 3, since one developer checked his files into CVS for backup every day and therefore a number of checkins occurred. In this case, developers might need to use two CVS repositories (e.g. one for software release and another for backup).
The above results are still the initial evaluations for EPM. EPM will be introduced in some software companies in the near future. We intend to evaluate the usefulness of EPM with respect to (1) the effects on software development and process improvement by providing measurement results from multiple software repositories, and (2) the benefit of giving the capability to manage multiple projects.
5 Conclusion and Future Work
The goal of this research is to construct an environment for supporting measurement-based software development according to the ESEE framework. In this paper, we introduced Empirical Project Monitor (EPM) as a partial implementation of ESEE, which helps developers/managers keep projects under control by providing various visualizations of measurement results related to project activities. Nowadays, we can gather and analyze massive data on software development on a large scale using rapidly growing hardware capabilities. By analyzing such huge data collected from thousands of software development projects, we would like to provide useful knowledge and benefit not only to individual developers/managers but also to organizations.
Empirical study on software development is an active area in the field of Empirical Software Engineering (ESE). But the approaches of ESE have not been sufficiently applied to software development in the software industry although companies hold many problems. The data related to software development from the industrial world has seldom been provided to university research. We are collaborating with some software development companies as the EASE project. This collaboration would therefore be a strong trigger for overcoming obstacles to technical progress in software engineering.
Acknowledgment
This work is supported by the Comprehensive Development of e-Society Foundation Software program of the Ministry of Education, Culture, Sports, Science and Technology. We thank Satoru Iwamura, Eiji Ono and Taira Shinkai for supporting the development of Empirical Project Monitor.
References
[1] A. Aurum, R. Jeffery, C. Wohlin, and M. Handzic. Managing Software Engineering Knowledge. Springer, Germany, 2003.
[2] V. Basili. Goal Question Metric Paradigm, in Encyclopedia of Software Engineering (J. Marciniak ed.), pages 528–532. John Wiley and Sons, 1994.
[3] V. Basili. The experimental software engineering group: A perspective. ICSE’00 award presentation, June 2000. Limerick, Ireland.
[4] L. Briand, C. Differding, and D. Rombach. Practical guidelines for measurement-based process improvement. Technical Report ISERN-96-05, Department of Computer Science, University of Kaiserslautern, Germany, 1996.
[5] S. Card, J. Mackinlay, and B. Shneiderman. Readings in Information Visualization: Using Vision to Think. Morgan-Kaufmann Publishers, San Mateo, CA, 1999.
[6] EASE. The EASE (Empirical Approach to Software Engineering) project, http://www.empirical.jp/intex-e.html.
[7] D. German and A. Mockus. Automating the measurement of open source projects. In Proceedings of the 3rd Workshop on Open Source Software Engineering, pages 63–67, Portland, Oregon, 2003.
[8] J. D. Herbsleb, A. Mockus, T. A. Finholt, and R. E. Grinter. An empirical study of global software development: Distance and speed. In Proceedings of the 23rd International Conference on Software Engineering (ICSE’01), pages 81–90, Toronto, Canada, 2001.
[9] M. Ohira, R. Yokomori, M. Sakai, K. Matsumoto, K. Inoue, and K. Torii. Empirical project monitor: Automatic data collection and analysis toward software process improvement. In Proceedings of 1st Workshop on Dependable Software System, pages 141–150, Tokyo, Japan, 2004.
[10] SPARS. The SPARS (Software Product Archiving and Retrieving System) project, http://iip-lab.ics.es.osaka-u.ac.jp/SPARS/index.html.en.
[11] T. Zimmermann, P. Weissgerber, S. Diehl, and A. Zeller. Mining version histories to guide software changes. In Proceedings of the 26th International Conference on Software Engineering (ICSE’04), Edinburgh, Scotland, UK, 2004 (to appear).