An Uncertainty-Aware Approach for Exploratory Microblog Retrieval
更新时间:2023-06-09 12:52:01 阅读量: 实用文档 文档下载
- angelababy推荐度:
- 相关推荐
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TVCG.2015.2467554, IEEE Transactions on Visualization and Computer Graphics
An Uncertainty-Aware Approach for Exploratory Microblog RetrievalMengchen Liu, Shixia Liu, Xizhou Zhu, Qinying Liao, Furu Wei, and Shimei PanLegend:Upper Extreme h , > ∥ , Lower Extreme#pjnet#makedclisten
0
1.0#alaska
#congress
#lnyhbt
#t2sda
#science
#breaking#furlough#ff
#budget#democrats#debtceiling#dems#debt#debtlimit#us#usa#cr#tcot#wwiimemorial
#barrycades#veterans#spitehouse#healthcare#impeachobama
#benghazi#tlot#military#ebt
#nationalparks#dems#pjnet#tcot
#politics#cnn
#dc
(b)#spitehouse
#default#government
#irs#obamacare#obama
#texas#gopshutdown#gop#edshow#msnbc
#science
#nationalparks#news
A
#shutdown#govtshutdown#obamashutdown#senate
#tgdn#truth
#jobs#economy#getcovered
#military#tlot#ebt
#spitehouse(d)
#military
#republicans#maddow#republican#teaparty#p2
#libcrib#demandavote#retweet#potus
#obamacare#obama#sot#jobs#economy
#ebt(e)
#immigration#america#fail
#dearcongress
#cspanchat#house
#topprog#sequester#uniteblue#vote#stoprush#enoughalready#aca#koch#wic#1u#endthisnow#inners#tedcruz#cleancr#boehner#justvote
#obamacare#getcovered#doctorwho#maddow#teaparty#p2
#teaparty
(a)
(c)
(f)
Fig. 1. Exploratory retrieval of the government shutdown dataset: (a) the hashtag graph with uncertainty and its propagation; (b) uncertainty propagation; (c)-(f) interactive ranking re nement results. Abstract— Although there has been a great deal of interest in analyzing customer opinions and breaking news in microblogs, progress has been hampered by the lack of an effective mechanism to discover and retrieve data of interest from microblogs. To address this problem, we have developed an uncertainty-aware visual analytics approach to retrieve salient posts, users, and hashtags. We extend an existing ranking technique to compute a multifaceted retrieval result: the mutual reinforcement rank of a graph node, the uncertainty of each rank, and the propagation of uncertainty among different graph nodes. To illustrate the three facets, we have also designed a composite visualization with three visual components: a graph visualization, an uncertainty glyph, and a ow map. The graph visualization with glyphs, the ow map, and the uncertainty analysis together enable analysts to effectively nd the most uncertain results and interactively re ne them. We have applied our approach to several Twitter datasets. Qualitative evaluation and two real-world case studies demonstrate the promise of our approach for retrieving high-quality microblog data. Index Terms—microblog data, mutual reinforcement model, unc
ertainty modeling, uncertainty visualization, uncertainty propagation.
1 I NTRODUCTION Microblogs such as Twitter and Facebook are among the most popular platforms for people to share their daily observations and thoughts, M. Liu and S. Liu are with Tsinghua University. E-mail: simon900314@, shixia@. S. Liu is the corresponding author. X. Zhu is with USTC. E-mail: ezra0408@. Q. Liao and F. Wei are with Microsoft. E-mail:{qiliao,fuwei}@. S. Pan is with University of Maryland, Baltimore County. E-mail: shimei@umbc.edu Manuscript received 31 Mar. 2015; accepted 1 Aug. 2015; date of publication xx Aug. 2015; date of current version 25 Oct. 2015. For information on obtaining reprints of this article, please send e-mail to: tvcg@.
including personal status updates and opinions regarding products or government policies. Since the crowd in microblogs provides many individual comments/opinions that were not available before, businesses and organizations have begun to leverage microblogs to pro le customers, derive brand perception, gauge citizen sentiments, and predict the stock market[23, 34, 41, 53]. For example, retailers track and examine relevant microblog posts to understand customer opinion toward their products and services. In spite of the growing interest in quickly analyzing customer opinions or breaking news in microblogs, progress has been hampered by the lack of an effective mechanism to retrieve data of interest from microblogs. For this reason, researchers have developed a number of microblog retrieval methods[6, 17]. The main goal is to generate a list of k microblog posts that are relevant to the information needs represented by a query q. Although these methods have successfully retrieved
1077-2626 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See /publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TVCG.2015.2467554, IEEE Transactions on Visualization and Computer Graphics
relevantposts,theyhavetwomajordrawbacks.First,theuniquecharacteristicsofmicroblogdataarenotcomprehensivelyconsideredtoimproveretrievalperformance.Posts,users,andhashtagsarethreekeydimensionsofmicroblogdata.Thesedimensionsarenotindependent,astheyoftenin uenceoneanother.Forexample,apostpublishedbyanin uentialuserandlabeledwithapopularhashtagtendstobesalient.However,theexistingapproachestomicroblogretrievaldonottightlyintegratethethreedimensionsanddonottakeadvantageoftherelationshipsamongthem.Theseapproacheseitherconsideronlythepostortreatitastheprimarydimensionandtheothersassecondarydimensionsto lterposts.Forexample,ScatterBlogs2[6] ndspostsofinterestbycheckingwhetherthepostscontainacertainhashtag.Second,existingmethodsdonotaddressuncertaintyintheretrievalmodels.Improvingthemodelingandpresentationofuncertaintycanhelpdescriberetrieveddatamoreaccurately,whichcaninturnassistanalystsinmakingmoreinformeddecisions[14,39,49].
Toaddresstheaboveproblems,wehavedevelopedanuncertainty-awaremicroblogretrievaltoolkit,MutualRanker,toestimateuncertaintyintroducedbytheanalysisalgorithmaswellastoquicklyretrievesalientposts,users,andhashtags.Sincepostsaresharedandpropagatedonsocialnetworks,theauthorityofanauthorandthepopularityofahashtagplayanimportantroleindeterminingtheimportanceofapostandviceversa.Accordingly,weformulateuncertainty-awaremicroblogretrievalusinganuncertainty-basedmutualreinforcementgraphmodel(MRG)[16,45],wherethecontentqualityofposts,thesocialin uenceofusers,andthepopularityofhashtagsmutuallyreinforceoneanother.WeadoptaMonteCarlosamplingmethodtosolveMRGbecauseofitseffectivelocalupdatemechanism,fastconvergence,andprobability-baseduncertaintyformalization[2].Adispersion-basedmeasureisusedtoestimatetheuncertaintygeneratedbytheMonteCarlosamplingmethod.Inaddition,wemodeltheuncertaintypropagationasaMarkovchain.Tohelpanalystsunderstandtheretrieveddata,wehavedesignedacompositevisualization[20].Speci cally,adensity-basedgraphvisualizationhasbeendevelopedtovisuallyillustrateposts,users,hashtags,andtheirrelationships.Anuncertaintyglyphanda owmapareemployedtorepresentuncertaintyanditspropagationonagraph(Fig.1).Thethreevisualizationcomponents,togetherwiththeuncertaintyanalysis,enableanalyststoquicklydetectthemostuncertainresultsandinteractivelyresolvethem.TheMonteCarlosamplingmethodisthenusedtoincrementallymodifytherankingresultstomeetuserneeds.
Insummary,ourworkpresentsthreetechnicalcontributions:
Anuncertain-awaremicroblogretrievalmodelthatextractssalientposts,users,andhashtags.Thismodelalsocomputestheassociateduncertaintyanditspropagationamonggraphnodes. Acompositevisualizationthatenablesuserstounderstandthethree-level,mutualreinforcementrankingresults,theassociateduncertainty,anduncertaintypropagationpatterns.
Avisualanalyticssystemthathelpsusersquicklyretrievedataofinterest,aswellasanalyzeandunderstandtherankingresultsinaninteractiveanditerativeprocess.2RELATEDWORK
2.1MicroblogRetrieval
Inthe eldofdatamining,anumberofapproacheshavebeenpro-posedtoretrievedatafrommicroblogs.AcomprehensivesurveywaspresentedbyCherichiandFaiz[12].Mostrecentworkcanbecatego-rizedintotwogroups:vector-space-basedapproachesandlinkanalysisapproaches.
Thevector-space-basedapproachemploystwofeaturevectorstorepresentaqueryandapost.Asimilaritymeasure(e.g.,cosinesimi-larity)isthenadoptedtoestimatethesimilaritybetweenthepostandthequery.TherehavebeensomerecentresearcheffortsthatexploitadditionalstructuralfeaturessuchasURLsandhashtagstoenhanceretrievalperformance[1,29,31].
Recently,totakeadvantageofthelinkstructureofsocialnetworks,researchershaveintroducedthePageRankalgorithm[7]inmicroblogretrieval.Forexample,TwitterRank[46]adoptsthefollower-followee
linkstructureandthePageRankalgorithmtoidentifyin uentialusers.Duanetal.[16]modeledthetweet-rankingproblemasanMRG[45],wherethesocialin uenceofusersandthecontentqualityoftweetsmu-tuallyreinforceeachother.Speci cally,thepostgraph,theusergraph,andthehashtaggraph,aswellastherelationshipsbetweenthethreegraphs,wereusedtoretrievesalientposts,users,andhashtags.Weex-tendthisapproachbyexplicitlymodelingtheuncertaintyoftherankingresult,aswellasitspropagationonthetweet/user/hashtaggraph.Inthe eldofvisualanalytics,agreatdealofresearchhasbeenconductedonvisuallyanalyzingmicroblogdata.Themethodsappliedincludeeventdetection[30],topicextractionandanalysis[25,40,50],informationdiffusion[8,52],sentimentanalysis[47,48],andrevenue/stockprediction[28,34].However,fewstudieshavefocusedonmicroblogretrieval.
Boschetal.[6]developedScatterBlogs2toextractmicroblogpostsofinterest.Itallowsanalyststobuildcustomizedpost ltersandclas-si ersinteractively.These ltersandclassi ersarethenutilizedtosupportreal-timepostmonitoring.Inpost ltering,thepostdimensionisconsideredtheprimarydimensionandthehashtagthesecondarydimension.Incontrast,wetightlyintegratetheposts,users,andhash-tagsintheMRGmodelandusethemodeltoretrievehigh-qualitymicroblogdata.Moreover,wealsomodeluncertaintyintheretrievalprocess.Sinceanalystscaninteractivelyre nethemodel,wecanfurtherimproveretrievalqualitybyleveragingtheuncertaintyformal-izationandanalysts’knowledge.
2.2InteractiveUncertaintyAnalytics
Frequently,uncertaintyisintroducedintovisualanalyticswhendataisacquired,transformed,orvisualized[14,24,27].Anumberofuncer-taintyanalysismethodshavebeenproposed,whichcanbecategorizedintotwogroups:uncertaintyvisualizationanduncertaintymodeling.Manystudiesonuncertaintyvisualizationhavebeenconductedinthe eldofgeographicvisualizationandscienti cvisualization[32,37,42].Typicaluncertaintyrepresentationtechniquesincludetheadditionofglyphsandgeometry,themodi cationofgeometryandattributes,an-imation,soni cation,andpsycho-visualapproaches[32].Recently,researchersareincreasinglyinterestedinthedesignofuncertaintyrep-resentationsforinformationvisualizationandvisualanalytics.Forexample,Collinsetal.[13]designedtwoalternatives,thegradientborderandthebubbleborder,toillustrateuncertaintyinlatticegraphs.Wuetal.[48]developedacircularwheelrepresentationandsubjectivelogictoconveyuncertaintyincustomerreviewanalysis.Slingsbyetal.[38]utilizedbarchartstorevealtheuncertaintyassociatedwithgeodemographicclassi ers.Torepresentuncertaintyinaggregatedvertexsets,Vehlowetal.[43]consideredthelightnessandshapeofthenode.Chenetal.[10]paredwiththesemethods,MutualRankernotonlyvisualizesun-certainty,butalsoitspropagationonagraph.Wealsosupportuserstointeractivelymodifytheuncertainresult.
Anothertypeofuncertaintyvisualizationrepresentstheuncertaintyintheanalysisprocess.ZukandCarpendale[55]studiedissuesrelatedtouncertaintyinreasoninganddeterminedthetypeofvisualsupportrequired.Correaetal.[14]developedaframeworktorepresentandquantifytheuncertaintyinthevisualanalyticsprocess.Wuetal.[49]extendedthisframeworktoshowtheuncertainty owintheanalysisprocess.Bycontrast,ourworkaimstomodeluncertaintyinmicroblogretrieval.Wefocusonvisuallyillustratingtopologicaluncertaintypropagationonagraphandondesigninganiterativevisualanalyticsprocesstoactivelyengageanalystsinreducingoveralluncertainty.Probabilitytheory,fuzzysettheory,roughsettheory,andevidencetheoryarefourmajorapproachestomodeluncertainty[54].Amongtheseapproaches,probabilitytheoryisthemostcommonlyusedmethodinvisualanalytics.Forexample,Correa[14]andWuetal.[49]re-gardeduncertaintyasaparameterthatdescribesthedispersionofmea-suredvalues.Speci cally,theyrepresenteduncertaintyasanestimatedstandarddeviation,inwhichthemeasuredvalueisde nedonthesetofbothpositiveandnegativerealnumbers.Sincethemeasuredvalue(therankingscore)inourapproachisde nedonthesetofpositivereal
andoneresearcherinmediaandcommunications(C).Theexpertsareexperiencedinretrievingdatafrommicroblogs.Theyalsohadexperienceusingamethodsimilartotheonedescribedabove.Weconductedseveralinterviewswiththem,mainlyfocusingonprobingtheirneedsandmicroblogretrievalprocess.Wethenidenti edthefollowinghigh-levelrequirementsbasedontheirfeedback.
R1-Examininganinitialsetofsalientmicroblogdata.Bothex-pertsexpressedtheneedforarankinglistofkeywordsearchresults.Keyword-basedmicroblogretrievalresultsoftenincludemillionsofpostsandtensofthousandsofusersandhashtags.Thus,theseresultsaretoomassiveforanalyststoquicklydiscoverrelevantdata.Theexpertsusuallyhavetoexaminethedatacarefullyanddesignasetofrulesto lteroutirrelevantdata.Asaresult,theystatedtheneedforatoolkitthatcanrankextractedposts,users,andhashtagstofacilitatetheirdataretrievaltasks.Thisneedisconsistentwiththe ndingsofpreviousresearch[16,46].
R2-Revealingrelationshipswithinmicroblogdata.Previousre-search[11,16,25]hasalsoindicatedthattherelationshipswithindatahelpuserslocateinterestinginformationmoreeasily.Furthermore,therelationshipsamongthethreedimensionsofmicroblogdata(posts,users,andhashtags)canassisttheminextractingsalientdata.Forexample,postsfromopinionleadersareusuallymoreimportantthanthosefromaverageusers.Thedomainexpertsdesiredtheabilitytoexploredifferenttypesofrelationships.
R3-Exploringsalientmicroblogdatafromdifferentperspectives.Sincethethreedimensionsofmicroblogdatausuallyin uenceeachother,theexpertswantedtounderstandthisin uencesothattheycanlinkimportantdatainonedimensiontothatinanotherdimension.Forexample,expertSsaidthat,“Collectingrelevanttweetsisveryimportantforsomeofourprojects.After ndingoneimportanttweet,Iusuallycheckothertweetsfromthesameauthoraswellasthetweetsmarkedbythesamehashtag(s).Thishelpsme ndrelevanttweetsquickly.”
R4-Understandingtheerrorproducedbytherankingmecha-nism.Themicroblogdatarankingmechanismisnotperfectandoftenintroduceserrorsoruncertaintyintotheretrievalprocess.Thus,thedegreeofuncertaintymustbeanalyzedandunderstoodtofacilitateinformeddecision-making[14,37,49].Theexpertsrequestedtoknowwhichrankingscoresaremoreerror-prone.
R5-Analyzingthein uenceoftheerrorsofoneitemonotheritems.Theexpertsalsoexpressedtheneedtounderstanderrorpropagationamongdataitems.Theyclaimedthatthisinformationcanhelpthemconsiderablyin lteringoutirrelevantdata.Forexample,expertCcommented,“WhenI ndanitemwithanincorrectranking
erinterface:(a)MutualRankervisualization;(b)controlpanel;(c)informationpanel.
score,Ialsowanttoknowwhichitemsarein uencedbythissothatIcanadjusttherankingscorequickly.”3.2SystemOverview
Thecollectedrequirementshavemotivatedustodevelopavisualana-lyticstoolkit,MutualRanker.Itconsistsofthefollowingcomponents: AnMRGmodeltogeneratetheinitialrankinglistsofposts,users,andhashtags(R1);
Anuncertaintymodeltoestimateuncertaintyanditstopologicalpropagationonagraph(R4,R5);
Acompositevisualizationtopresentthegraph-basedrankingresults,uncertainty,anditspropagation(R2,R3).
TheprimarygoalofMutualRankeristoextractalistofkmicroblogposts/users/hashtagsthatarerelevanttoqueryq.Fig.2illustratesthemaincomponentsneededtoachievethisgoal.Givenamicroblogdatasetextractedbyaquery,thepreprocessingmodule rstextractsthepostgraph,theusergraph,andthehashtaggraph.ThethreegraphsarethenfedtotheMRGmodel,whichproducesthreerankinglistsofposts,users,andhashtags.Theuncertaintymoduleestimatestheuncertaintyintheretrievalmodelanditstopologicalpropagation.Thevisualizationmoduletakestherankingresultsandtheuncertaintyestimationasinputandillustratestheminacompositevisualizationthatincludesagraphvisualization,anuncertaintyglyph,anda erscaninteractwiththegeneratedvisualizationforfurtheranalysis.Forexample,ausercanmodifyarankingresult.Withthisinput,MutualRankerwillincrementallyupdatetherankingresults.
Fig.3depictstheuserinterfaceofMutualRanker.Itcontainsthreedifferentinteractionareas:MutualRankervisualization(Fig.3(a)),controlpanel(Fig.3(b)),andinformationpanel(Fig.3(c)).Thevisual-izationviewconsistsoftwoparts:1)thestackedtreevisualizationthatshowsthehierarchicalstructureofmicroblogdata;2)thecompositevisualizationthatsimultaneouslyrevealtheretrievedmicroblogdata,theuncertaintyoftherankingresults,anditstopologicalpropagation.Thecontrolpanelconsistsofasetofcontrolsthatenableuserstointeractivelyupdatetheranking.Theinformationpaneldisplaysthecorrespondingmicroblogdatasuchasposts,users,andhashtagsforaselectedaggregateitem.
10.1109/TVCG.2015.2467554, IEEE Transactions on Visualization and Computer Graphics#obamacare
#debtceiling
Hashtag Graph
GRAPH
mainofMRG[16,45]isthatitemploys
boththerelation-users,orhashtags,andtherelationshipsbetween
rankings.Thisfeaturesigni cantlyreducesthework-wheninteractingwithourvisualanalyticssystem.Foranalystmodi estherankingscoreofahashtag,MRGupdatestherankingscoresoftheneighboringPost Graphhashtags,butalsothoseofrelevantusersandposts.Thisprocessallowsoursystemtointegrateuserknowledgeintothevisualanalyticsprocesswithacceptableusereffort.ThisisalsothemainreasonwhyweadoptMRGinMutualRanker.
5.1MRGComputationwithMonteCarloSamplingMethodDuanetal.[16]proposedamatrix-basedmethodtosolveMRG,whichiterativelyupdatestherankingscoresusingEq.(1).Thematrix-basedmethodisaglobalone.Anupdatetoanyitemisachievedbyrunningthemethodontheentireitemset,whichisverytime-consuming.Toaddressthisproblem,weusetheMonteCarlosamplingmethod.Theadvantagesofthissamplingmethodoverthematrix-basedmethodareasfollows[2]:
Rankingscoresarelocallyupdatedwhentheinputchangeslo-cally;
Therankingscoresofimportantitemsareaccuratelyestimatedafterafewiterations;
TheuncertaintyoftherankingscoresaremodeledaccuratelybecausetheMonteCarlomethodcalculatesvariancestatistically.Toemploythismethod,we rstsolveRasformulatedbyEq.(2):
R=(1 d)(I dM) 1W=(1 d)(∑dkMk)W.
k=0∞
(3)
Wethenperformaseriesofrandomwalksforeachitem.Arandomwalkmaystopateachstepwithaprobabilityof1 d.Ifthewalkcontinues,thenitproceedstothenextstepaccordingtothematrixM.Eachelementmijde nesthetransitionprobabilityfromitoj.
∞
LetZ=∑dkMk.Therankingscoreofeachitemis:
k=0
Fig.4.Mutualreinforcementmodel.
rj=(1 d)∑wizij,
i=1
N
(4)
TheinputofMRGincludesthreegraphs,thepostgraph,theusergraph,andthehashtaggraph,aswellastherelationshipsamongthem.ThethreegraphsandtheirrelationshipsareshowninFig.4.Asin[16],thepostgraphisbuiltbasedoncosinesimilarity.ArecentstudyhasshownthatcosinesimilaritywithaTFIDFweightingschemeisthemostappropriatemeasuretocomputethesimilaritybetweenmicroblogposts[35,51].Asaresult,weemploycosinesimilarityinoursystem.Theusergraphisconstructedbasedonfollower-followeerelationships.Thehashtaggraphisgeneratedaccordingtotheco-occurrenceoftwohashtags.Thethreegraphsarealsoconnectedbytworelationships:authorshipandco-occurrence.Ifauserpublishesapost,thenweconnectthisuserwithhis/herpost.Wealsolinkthisuserwithallofthehashtagsinthispost.Eachpostisalsolinkedtoallofthehashtagsassociatedwithit.
Forsimplicity,weuniformlydenoteposts,users,andhashtagsasitemsinthefollowingdiscussion.
TheMRGemploysamethodsimilartoPageRank[7]tomodelthemutualin uenceamongdifferentitemsinheterogeneousgraphs:
RpαppMpp Ru =d αpuMpuRhαphMph
αupMup
αuuMuuαuhMuh
αhpMhpRpWpαhuMhu Ru +(1 d) Wu .αhhMhhRhWh
(1)
wheretheelementzijinZistheaveragenumberoftimesthatarandom
walkstartingfromitemivisitsitemj.Weestimatezijbycomputingtheempiricalmeanofanumberofrandomwalks.
Duanetal.[16]onlyconsiderthesimilarityofitemsincomputingmij.Thus,high-rankingscoresmaybeincorrectlyassignedtouserswhopublishmanypoststhatdonotreceiveattention.Toaddressthis,weconsiderthepriorsaliencyofitemsinthesamplingprocess.Speci -cally,thetransitionprobabilityfrommijisde nedassimilarity(i,j)·wj.5.2UncertaintyModeling
InMutualRanker,weuseanapproximationmethodtosolveMRG,whichmayintroduceuncertaintyintotheretrievalresults.Itisthereforeimportanttomodeluncertainty.SinceweemploytheMonteCarlosamplingmethod,thedistributionofeachrankingscoreisknown.Hence,wecanemploytheprobabilitytheorytomodeluncertainty.Uncertaintyisde nedasaparameterfordepictingthedispersionofvaluesthatcanbereasonablyattributedtothemeasuredvalue[5].Traditionalmethodsmodelthemeasuredvalueasanormallydistributedrandomvariable[14,49].Variance[14]andstandarddeviation[49]areamongthemostcommonlyusedmeasurestorepresentuncertaintywhereinthemeasuredvalueisde nedonthesetofbothpositiveandnegativerealnumbers.
Themeasuredvalue(rankingscore)inourapproachisde nedonthesetofpositiverealnumbers.Thus,theabovemodelingmethodcannotbeapplieddirectlytoourwork.
Accordingto[2],zijinEq.(4)hasaPoissondistribution.Therankingscoreistheweightedsumofaseriesofzij.Hence,therankingscoreismodeledasaPoissonmixture.ForaPoissonmixture,thevarianceisapproximatelyproportionaltothemean.Hence,ifweusevariancetomodeluncertainty,thelargertherankingscore,themoreuncertainitis,butthisisnotalwaystrue.
Standarddeviationisthesquarerootofvarianceandhasasimilarproblem.Consequently,varianceandstandarddeviationarenotgoodmeasuresfordepictinguncertaintyinourmodel.
Forsuchadistribution,acommonlyusedmeasureofdispersionisthevariance-mean-ratio(VMR)[15].ThehighertheVMR,themoredispersedthedistribution.Foritemj,itsVMR(uj)canbede nedas:
uj=vj/rj,
(5)
themutualreinforcementstrengthamongposts,users,andhashtags.disthedampingfactorinPageRank,andwesetitto0.85,asin[7].Wp,Wu,andWharevectorsforthepriorsaliencyoftheitems(e.g.,thecontentqualityofposts,thesocialin uenceofusers,orthepopularityofhashtags). LetR= Ru ,W= Wu ,andM= αpuMpu
Rh
Wh
Rp
Wp
αppMppαphMph
αupMup
αuuMuuαuhMuh
αhpMhpαhuMhu αhhMhh
Rp,Ru,andRharetherankingscorevectorsofposts(p),users(u),andhashtags(h).Mxydenotestheaf nitymatrixfromxtoy,wherex,ycanbeposts,users,orhashtags.αxyisaweightusedtobalance
.
Then,Eq.(1)canbesimpli edas:
R=dMR+(1 d)W.
(2)
5MRG-BASEDUNCERTAINTYANALYSIS
SinceexactinferenceofMRGisverytime-consumingonalargegraph,weapproximateitusingamoreef cientMonteCarlosamplingmethod.Wealsoexplicitlymodeltheuncertaintyassociatedwitheachitem(e.g.,apost,auser,orahashtag),aswellasitspropagationonthegraph.
wherevjisthedistributionvarianceoftherankingscoreofitemj.Accordingto[2],vjcanbecalculatedasfollows:
10.1109/TVCG.2015.2467554, IEEE Transactions on Visualization and Computer Graphics
vj=(1 d)2∑w2ivzij.
i=1N
(6)
wherevzijisthevarianceofzij.EachzijobeysaPoissondistribution
anditsvariancecanbecalculatedfromitsexpectation.
Themassivenumberofitemsinthemicroblogdatameanswecannotplaceallofthemonthescreen.Hence,weaggregatesimilaritemstoformacluster.Theoverallrankingscoreofaclusterrcisde nedasthesumoftherankingscoresofitsitems[4].Therankingscoresareindependentofeachotherandtheoverallvarianceofthecluster,vc,isthesumofthevarianceoftherankingscores.Thus,theuncertaintyofacluster,uc,canbecalculatednaturallybydividingvcandrc.
uc=vc/rc=∑(rj/rc)(vj/rj)=∑kjuj.
j∈c
j∈c
(c)
Fig.5.Topologicaluncertaintypropagationcalculation
MtoM .Thischangeonlyaffectsasmallpartoftherandomwalks
(7)
Eq.(7)showsthatuccanbeexpressedbyaweightedsumoftheuncertaintyofitsitemswhereeachweightkjistheratiooftherankingscoresofitemjandclusterc.Thus,theuncertaintyofaclusterismainlydeterminedbyitsimportantitems.
5.3TopologicalUncertaintyPropagation
Ifananalyst ndsanincorrectlyrankeditem,hecanmodifyitbasedonhisknowledge.Hecanfurthertrackhowtheuncertaintypropa-gatesfromoneclustertoanothertoidentifyotheraffecteditems.Tohelpananalysttrackuncertainty,weexplicitlymodelitstopologicalpropagationonthegraph.
InMRG,therankingscoreofanitemcanbeexpressedasalinearcombinationofrankingscoresofrelateditems.Hence,thevariancesofarankingscorecanalsobeexpressedasalinearcombinationofthevariancesofrelatedrankingscores.Theuncertaintyofeachitemcanbecalculatedfromitsrankingscoreanditsvariance,andhence,theuncertaintyofanitemcanalsobeexpressedlinearlybytheuncertaintyofotheritems.Speci cally,
uj=
i=1,i=j
∑
N
m ijui,
(8)
222whereeachm ij=(dmijri)/((1 dmjj)rj).Eq.(8)showsthattheuncer-taintyofeachitemisnotindependentanditpropagatesonthegraphin
alinearform.Thus,foreachpairofitemsiandj,m ijuicanbeviewedasthepropagateduncertaintyfromitemitoj.Wedenoteitbyui→j.RewritingEq.(8)inamatrixform,wecanformulatetheuncertaintypropagationasaMarkovchain:
usedintheMonteCarlosamplingmethod.
Fortheaffectedrandomwalks,existingincrementalgraphrankingalgorithms[3]performre-samplingandupdatetherankingscoresbyaggregatingthestatisticsofthesenewrandomwalksintotheoriginalresults.Onemainproblemwiththesealgorithmsisthatre-samplingrequiresaconsiderableamountoftime,whichmaymakereal-timeinteractionimpossible.Supposenistheaveragenumberofneighborsthatanitemhasandlistheaveragelengthofasampledrandomwalk.Ateachstepinarandomwalk,wehavetosamplefromamultinomialdistributionwithnpossibleoutcomesandthetimecostisO(n).Thus,samplinganewrandomwalkwilltakeO(nl)time.ThetimeneededtocomputeandaggregatethestatisticsofthesesamplesisO(l).ThetotaltimerequiredforanewsampleisO(nl)+O(l)=O(nl).
However,inourscenario,wedonotdeleteoraddedgesonthegraphs.Asaresult,wedonotneedtoperformre-sampling.Weonlyneedtomodifythestatisticsofarandomwalkbasedonthemodi edtransitionprobabilitymij,therebyavoidingthehighcostassociatedwithresampling.Thetimecostofupdatinganin uencedrandomwalkisreducedtoO(l).
Givenarandomwalk:path={i→n1→...→nk 1→j},wede neanewrandomvariablexikj.Inparticular,xikj=1indicatesthattherandomwalkstartsfromiandreachesjbymovingksteps.Theoriginalweightofeachstepintherandomwalkis1.Duringanupdate,were-calculatetheweightofthisstepusingP (xikj=1)/P(xikj=1).P(xikj=1)istheprobabilityofxikj=1accordingtoMandP (xikj=1)istheprobabilityofk=1accordingtoM .Hence,P(xk=1)iscalculatedby:xijij
P(xikj=1)=min1mn1n2...mnk 1j.
(12)
Similarly,P (xikj=1)canalsobecalculated.6
VISUALIZATION
UM =U,
(9)
whereU=[uj]1×NandM =[m ij]N×N.
Similartotheuncertaintypropagationfromitemtoitem,wecanmodeltheuncertaintypropagationfromclustertoclusterusingthefol-lowingprocedure.First,basedonEq.(8),wecalculatethepropagateduncertaintyfromeachitemiinthesourceclustercstoeachitemjinthetargetclusterct(Fig.5(a)).Second,foreachitemjinct,wecomputethepropagateduncertaintyucs→jfromcstoitemjbyaggregatingtheuncertaintypropagatedfromeachiteminthesourcecluster(Fig.5(b)).
ucs→j=
i∈cs
Tohelpanalystsextractmicroblogdataofinterestinteractively,wehavedesignedacompositevisualizationthatincludesagraphvisualization,anuncertaintyglyph,anda owmap(Fig1(a)).6.1RankingResultsasGraphVisualization
Sinceonepostcorrespondstoonlyoneuserandafewhashtags,thescopeofin uenceofapostissmallerthanthatofauserorahashtag.Updatingtherankingscoreofapostwillonlydirectlyaffecttherankingscoresofitsauthor,afewrelatedhashtags,andanumberofposts.Incontrast,updatingtherankingscoreofauserorhashtagwilldirectlyimpacttherankingscoresofhundredsoreventhousandsofpostsaswellasanumberofusersandhashtags.Ontheotherhand,thenumberofpostsisusuallyhuge,around10-100timesthatofusersorhashtags.Analystswouldrequiremoretimetoprovidetheirfeedbackonapostgraph.Asaresult,weregardtheuserandthehashtagastheprimaryvisualizationelementsandthepostasasecondaryelementmainlyusedtoillustratethecontentoftheprimaryelements.Accordingly,usersandhashtagsarevisuallyrepresentedbyanode-linkgraphwhereaspostsarerepresentedasalist.Forsimplicity,wetakeahashtaggraphasanexampletoillustratethebasicideaofgraphvisualization.
Toallowanalyststonavigatelargegraphsef ciently,ahierarchyisbuiltbasedonaBayesianRoseTree[26]witheachnon-leafnoderep-resentingahashtagcluster.AsshowninFig.3(a),astackedtreeisadoptedtorepresentthehashtaghierarchyandadensity-basedgraphvisualizationisemployedtoillustratetherelationshipswithintheuser/hashtaggraphsandbetweenthem(R2,R3).
∑ui→j.
(10)
Finally,theuncertaintyofaclusterisaweightedsumoftheuncertaintyoftheitemsinit(Eq.(7)).Thus,theoverallpropagateduncertaintyucs→ctfromcstoctcanbecalculatedastheweightedsumofthepropa-gateduncertaintyfromcstoeachjinct(Fig.5(c)).
ucs→ct=
j∈ct
∑kjucs→j.
(11)
5.4IncrementalRankingUpdate
Wealsoallowanalyststointeractivelymodifytheitemrankingresultbasedontheirknowledge.WecanupdatethemodellocallybecauseweusetheMonteCarlosamplingmethod.Aftertheanalystchangestherankingscore(s),ourapproachiterativelyupdatesthepriorsaliencescore(s)oftheitem(s).Accordingly,theaf nitymatrixischangedfrom
Fig.6.Basicideaofthelayoutalgorithm:(a)placetheclusternodesandderivethelayoutcenterofeachcluster;(b)computetheVoronoitessellationandtreatthecellasthelayoutareaofeachcluster;(c)layoutofrepresentativeandnon-representativenodes;and(d) nallayoutresultwithcontext.
Thedensity-basedgraphvisualizationcombinesanode-linkdiagramwithadensitymaptodisplaythenodesattheselectedlevelofthehashtagtree.Asin[25],weextractrepresentativenodesforeachoftheclusternodesattheselectedtreelevelandassignothernon-representativenodestotheirclosestrepresentativenodes.AsshowninFig.1(a),therepresentativenodesaredisplayedasanode-linkdiagramandtheothernodesasadensitymap.Inthisvisualization,therepresentativenodesofoneclusterareplacedneareachothertore ecttheircloseness.Thesizeofthenodeencodesthesumoftherankingscoreofeachitem.Thecorrespondingusersareoverlaidaroundtheselectedhashtagnodetoprovidemoreanalysiscontext(Fig.6(d)).Layout.Thelayoutofthestackedtreeisquitestraightforward.Thus,weintroducethelayoutofthedensity-basedgraph,whichcontainsthefollowingsteps.
Step1:Derivethelayoutcenterofeachclusterattheselectedtreelevel.Webuildaclustergraphbycheckingtheedgeconnectionsbetweenthetwoclusternodes.Anedgeisaddedifasuf cientnumberofconnectionsbetweenthetwoclusternodescanbefound.Theclustergraphisthenplacedbyaforce-directedlayout[21].AsshowninFig.6(a),thepositionofeachclusternodeistreatedasthecenterofeachhashtagcluster.
Step2:Computethelayoutareaofeachcluster.Inthisstep,wecomputethecorrespondingVoronoitessellationbasedontheclustercenter.Thecorrespondingtessellationcellsaretreatedaslayoutareasofthehashtagclusters(Fig.6(b)).
Step3:Layoutofrepresentativeandnon-representativenodes.Inthisstep,theforce-directedlayoutisadoptedtoplacetherepresentativenodes.Toensuretherepresentativenodeswithinoneclusterareplacedinthecorrespondingclusterlayoutarea,arepulsionforceisaddedfromtheareaboundarytoeachnodewithinthisarea.Thekerneldensityestimation[22]isutilizedtorepresentthedistributionofnon-representativenodes(Fig.6(c)).
Step4:Layoutofthecontextwordcloud.Showingthehashtaggraphandusergraphsimultaneouslywouldintroducevisualclutter.Tosolvethisissue,wetreatthehashtaggraphasaprimaryelementandtheuserinformationascontext.Inparticular,whenahashtagnodeisselected,awordcloudthatincludestheuserswhousethishashtagislaidouttoprovideusercontext.Inthiswordcloud,theselectedhashtagisplacedinthemiddle.Asweep-line-basedwordcloudlayoutalgorithm[36]isemployedtoproducesuchawordcloud.Fig.6(d)showsalayoutresultwithawordcloudcontext.
Interaction.Thefollowinginteractionsareprovidedtoassistanalystsininvestigatingtherankingresultsfrommultipleperspectives.
Examiningtherankedmicroblogdataandtheirrelationships(R2).Thedensity-basedgraphvisualizationprovidesaneasywaytoexploretherankingresultsfromthehashtagoruserperspective.Utilizingthehashtaghierarchyallowstheanalysttoexploretherankingresultsfromaglobaloverviewtolocaldetails.Several lters,suchastheedgeortheglyph lter,enableanalyststocustomizethisvieweasily.Relevantposts,hashtags,andusersarealsoprovidedtohelpanalystsbetterunderstandthecontentoftheselectedclusternode.
Smoothlyswitchingbetweendifferentdatadimensions(R3).Inspiredbythecontextpopupinteractionin[18],wealsooverlaycontextofaselecteditemtoprovidefurthernavigationcues.Forexample,ifthe
Upper Hinge (quartile)Lower Hinge (quartile)
Lower Extreme
(a)(b)(c)
Fig.7.Designoftheuncertaintyglyph:(a)boxplot;(b)transformingtheboxplottotheuncertaintyglyph;(c)analternativedesign.
analystselectsahashtag,thelabelsofuserswhousethathashtagcanbeoverlaidaroundtheselectedhashtagviaawordcloud(Fig.11(a)).Iftheanalyst ndssomethingofinterest,thehashtaggraphwillbesmoothlytransitionedtotheusergraph(Fig.11(b)).
6.2UncertaintyasGlyph
Aftertestingwiththe rstprototype,theexpertsidenti edseveralincorrectrankingresults.Theyexpressedtheneedtobeinformedofsuchresults.Thisrequirementisrelatedintimatelywiththeconclusionofpreviouswork,whichstatedthateffectivelyconveyinguncertaintyisveryimportanttothevisualanalyticsprocess[14,49].Sincetherankingresultsareaggregatedintoclustersintheoverview,theexpertswantedtoexaminetheuncertaintydistributionoftheaggregatenode,includingtheminimumvalue(0),maximumvalue(1.0),lowerextreme,upperextreme,lowerhinge(25%),andupperhinge(75%).
Inspiredbytheboxplotdesign(Fig.7(a)),wehavedesignedaglyphtomeettheaboverequirements(Fig.7(b)).AsshowninFig.7(a),sixvaluesfromasetofdataareconventionallyusedinaboxplot,includingtheminimumandmaximumvalues,theextremes,andtheupperandlowerhinges(quartiles).Atotalof50%percentofitemsfallinbetweentheupperandlowerhinges.Tocombineaboxplotwithagraphnode,we rsttransformtheboxplottoaline-basedone,andthenbenditaroundtheupperboundaryofthenode(Fig.7(b)).Wealsoattemptedseveralalternativesintheparticipatorydesignprocesswithexperts.Fig.7(c)isoneofthem.Afterinteractingwiththisalternative,theexpertsstatedthatitwasconfusing.Theythoughtthattheitemwithmoreofa lledareainsideshouldbetheoneonwhichtheyshouldfocus.However,inreality,thesenodeswereonlynodeswithalargerareabetweentheupperandlowerhinges.APhDstudentfromanartschoollatercon rmedthatalargeramountofdigitalinkwillattractmoreattentionfromusers.Afterseveralinteractionswiththeexpertsandtheartstudent,wechooseFig.7(b)asour naldesign.
Analystscanobtainanoverviewoftheuncertaintydistributioninaclusterbyexaminingitsuncertaintyglyph.Fig.8illustratesseveralexamplepatterns.Forexample,inFig.8(a),themajorityofitemsinthisclusterarecharacterizedbylowuncertainty.However,theclusteralsocontainssomeitemswithhigheruncertainty.Asaresult,exploringtheitemswithhighuncertaintyisaworthwhileendeavor.
Interaction.Inadditiontoallowinganalyststoexaminetheuncertaintyscore(R4),wealsoprovidetheinteractionshownbelowtointegrateanexpert’sknowledgeintotheretrievalprocess.
Interactiverankingre nement.Afteranexpert ndsanincorrectrank-ingresultbyexaminingtheuncertaintyglyph,theexpertcanmodifytherankingresult.Therankingscoresofthecorrespondinggraphnodeswillalsobeupdatedaccordingly.AsshowninFigs.1(c)-(f),the
isquiteshort.Thus,ifweconsiderthismeasure,manyoftheselinesegmentswillnotbebundledtogether.
Thetotaledgecompatibilityisde nedby:
Ce(ei,ej)=Cα(ei,ej)·Cs(ei,ej)·Cp(ei,ej).
(a)mostitemsinhigheruncertaintyarealsoincluded;(b)mostitemsinthisclusterhavehighuncertaintyandsomeitemswithhigheruncertaintyalsooccur;(c)uncertaintydistributionisuniform;(d)mostitemsinthisclusterhaveloweruncertainty.
Fig.9(b)showsthematchedresultsofthepropagationpaths.
Step3:Computetheforcetobundlethepropagationpath.Thecom-binedforceforapointpioneiisde nedas:
Fpi=Ki( pi 1 pi + pi pi+1 )+
ej∈E
∑
pi pj ·Ce(ei,ej)
whereKiisthespringconstantforeachsegmentandEisthesetofallthematchededgesofei.In[19],thelastitemistheelectrostaticforceFe=1/ ei ej .Inordertobundlethematchedpathsthatarelocatedawayfromeachother,wereplaceitwithanattractingspringforce.Fig.9(c)showsthelayoutresultsofthepropagationpaths.7
QUANTITATIVEEVALUATION
Inthissection,wequantitativelyevaluatetheeffectivenessofourMRGcomputationandincrementalrankingupdatealgorithm.
youtofmultipleuncertaintypropagationpaths:(a)initiallayoutbasedonthe owmaplayout;(b)thematchedresultofthepropagationpaths;(c)thelayoutresultofthepropagationpaths.
7.1MRGComputation
ToevaluatetheperformanceofourMRGcomputationbasedontheMonteCarlosamplingmethod,wecompareditwiththematrix-basedmethodproposedin[16].WeusedtwoTwitterdatasetsintheexperiments:governmentshutdownandEbolaoutbreak.Theshutdowndatasetcontainstweetsonthe2013USgovernmentshutdown(5,132,510tweetsfromOct.1toOct.16,2013),whichwerecollectedbyusingqueriessuchas“shutdown.”TheEboladatasetcontainstweetsontheEbolaoutbreak(1,425,017tweetsfromJan.1toDec.25,2014),whichwerecollectedbyusingqueriessuchas“ebola.”AllexperimentswereconductedonaPCwitha3.1GHzCPUand16GBRAM.
Thereweretoomanyposts,users,orhashtagsandwecouldnotlabelallofthem.Thus,wedidnotreporttherecallinourevaluation.Inthisevaluation,weusedtopn-precision(n-Prec)astheevaluationmeasure.Topn-precisionisthepercentageofthecorrectlyretrieveditemsamongthetop-nrankeditems.Thismeasureisoftenusedwhentherecallishardtocalculate[9].Tofullycomparethetwoalgorithms,wecalculatedthetop10,50,100,and200-precisionforposts,users,andhashtags,respectively.WeinvitedtwoPhDstudentswhomajoredindataminingandarefamiliarwiththedatasetstoevaluatethere-trievalresults.Theylabeledtheresultsindividuallyandresolvedthedifferencesviadiscussion.TheresultsareshowninTable1.Overall,ouralgorithmperformedbetterthanthebaselineonbothdatasets.Weinspectedthetop10retrieveditemswithbothmethods.Ingeneral,theretrieveditemswerequiteaccurate.However,thebaselinehadonemistakeinthetop10usersselectedfromtheshutdowndataset.Itoverestimatedtheimportanceofausercalled@governmentclosd,whopostedasigni cantnumberoftweetswithanumberofhashtags.However,thisuserdidnothavemanyfollowersandhis/hertweetswereseldomretweeted.Incontrast,ouralgorithmcanavoidthismistakebytakingauser’sauthorityintoconsideration.ThebaselinealgorithmalsohadsimilarmistakesintheEboladataset.
Dataset
n-Precrankingscores(e.g.,nodesizes)ofseveralnodeschanged.Aglyphisdesignedtoillustratethechange,withthedottedorangecircleencodingthepreviousrankingscoreandtheboundaryofthe lledcircle(graycolor)representingthechangedrankingscore(Figs.1(d)-(f)).6.3UncertaintyPropagationasFlowMap
The owmap[33,44]isdesignedtovisuallyanalyzethemovementofobjectsfromonelocationtomultiplelocations.Inspiredbythisdesign,wedeveloptheuncertaintypropagationpath(Fig.1),whichisusefulforquicklyderivingtheunknownuncertainnode(s)fromtheknownone(s)(R5).
Layout.Thelayoutofmultipleuncertaintypropagationpathsofdiffer-entnodesisbasedonthe owmaplayoutin[44]andtheedgebundlingin[19].Thelayoutcontainsthefollowingsteps.
Step1:Derivetheinitialuncertaintypropagationpathbasedonthe owmaplayout.We rstcomputetheuncertaintypropagationoftheselectednodebasedonthetopologybyusingthemethodinSec.5.3.The owmaplayoutviaspiraltreesisthenunitizedtogeneratetheinitialuncertaintypropagationpath(Fig.9(a)).
Step2:Employedgecompatibilitymeasurestomatchthecorrespond-ingpropagationpathsfromdifferentnodes.Inthisstep,weemploythethreecompatibilitymeasuresdescribedin[19]tomatchthepropagationpathsfromdifferentnodes.
The rstmeasureisanglecompatibility,whichaimstomatchtheedgeswithasmallerangle.Itisde nedby:
Cα(ei,ej)=|cos(α)|.
Thesecondmeasureisscalecompatibility,whichtendstomatchtheedgeswithsimilarlengths.Itismeasuredby:
Cs(ei,ej)=2/(lavg·min(|ei|,|ej|)+min(|ei|,|ej|)/lavg),
wherelavg=(|ei|+|ej|)/2.
Thethirdmeasureispositioncompatibility,whichaimstomatchthecloseedgestogether.Itisde nedby:
Cp(ei,ej)=lavg/(lavg+ Qm1 Qm2 ),
Shutdown
whereQm1andQm2arethemidpointsofedgeseiandej.
Thelastmeasure,visibilitycompatibility,describedin[19]isnotconsideredinourmethodbecausetherearetoomanylinesegmentsinthepropagationpathgeneratedbythe owmaplayout,eachofwhich
Ebola
parisonoftheMRGcomputationmethodwiththebaselineusingtop-nprecisionforposts,users,andhashtags.
10.Theoverviewofthegovernmentshutdowndataset.
ciology(S)andoneresearcherinmediaandcommunications(C),tounderstandtheirrespectiveinterestsinthedatasets.Wedesignedanumberofexplorationtasks.Inthesecondphase,wecollaboratedwiththeexpertsto nishthedesignedtasks.Duringthisphase,weaskedquestionstodiscusswiththeexpertstheusefulnessofourtoolforeachtask.Finally,theexpertswereinvitedtoanotherdiscussionsessiontoprovideoverallfeedbackonhowourtoolcouldhelpthemwithreal-worldtasks.
8.1CaseStudy:GovernmentShutdown
Inthisstudy,weworkedwithexpertSto:1)evaluatehowuncertaintyanalysiscanbeutilizedtoidentifykeyhashtagsanduserswithasatis-factorycon dencelevel;2)leverageoursystemtoiterativelyreducetheuncertaintylevels;3)extractrelevanthashtags/users/tweetsrelatedtothegovernmentshutdown.
.Theexpertquicklyfoundinterestingresultsafterexamininghashtagoverview(Fig.10(a))generatedbyoursystem.Sheiden-sevenprominenttopicsdescribedbyasetofhashtags:generalabouttheshutdownandObamacare(Fig.10A),politicalontwitter(Fig.10B),discussiononendingtheshutdown10C),thein uenceoftheshutdownonpeople’slives(Fig.10D)thegovernmentshutdownonnewsmedia(Fig.10E),debt-discussion(Fig.10F),andcriticsoftheshutdown(Fig.10G).analysis:The“#shutdown”cluster(Fig.10(b))attractedexpert’sattentionbecauseitcontainsitemswithhigheruncertainty.expertexaminedthedetailedhashtagsandtweetsinthecluster.foundthatinadditiontocommonhashtagssuchas#govtshutdown,and#shutdowngop,anumberofdiversehashtagsalsocreated.Suchhashtagsincludedthosethatcriticizedthee.g.,#shutdownharry;localnewsposts,e.g.,#hounews,andcampaigns,e.g.,#dontcutkids.Shewantedtoexaminethemostones,soshesortedthehashtagsbytheuncertaintylevel.#lewinskywasrankedasthemostuncertainhashtag10(c)).Theanalystsearchedtherelatedtweetsandfoundthattaggedwith#lewinskyconcernedtheshutdownoftheClintonin1995.Theexpertdecidedtolowertherankingscoreofhashtag.Duringtheprocess,shecommentedthattheuncertaintyanditem- lteringfeaturewereuseful,helpingher lteroutitemsbyloweringtheirrankingscores.
Uncertaintypropagation:Next,theexpertexaminedhowtheuncer-taintyofthe“#shutdown”clusterwouldin uenceneighboringclusters.Sheclickedthe“propagation”buttonandthecorrespondinguncertaintypropagationwasdisplayed(theorange owinFig.1).Shealsoselectedtheuncertaintypropagationofthe“#democrats”cluster(theblue owinFig.1)and“#republicans”cluster(thegreen owinFig.1),whichwerecloselyrelatedtothe“#shutdown”cluster.AsshowninFig.1(b),clus-ter“#nationalparks”sharedtheuncertaintypropagatedfromthethreeclusters.GiventhattheclosingofthenationalparkswasaresultofthegovernmentshutdownandstimulateddiscussiononTwitter,theexpertincreasedtherankingscoreof#nationalparks(from4to6).Inoursys-tem,therankingscoreisfrom1to10,with10beingthehighestscore.Aftertheadjustment,shenoticedthescoresofanothertwohashtagclusterswereautomaticallyincreased:“#spitehouse”and“#teaparty.”Inthe rstcluster,therankingscoresofhashtagssuchas“#spitehouse”
switchedbacktothehashtaggraphtocheckthe
onthisgraph.Shefoundanewhashtagcluster,Shethenzoomedintothiscluster.Asshowninprimarilyexpressescriticismofthegovernment,DemocratsorRepublicans(“@PeteSessions#De-#shutdown#MakeDCListen#senatemustactStandforusefulnessofoursystem,weconductedasemi-withthetwoexperts.TheyusedMutualRankerin2hours,sotheywerefamiliarwithitsbasicfunctions.waswellreceivedbythem.
MutualRankerasaresearchtooltohelpposts,users,andhashtagsquicklyandconve-believedthatMutualRankerisveryusefulforcodingAccordingtohim,codingisthemostinhis eld.Extensivetrainingandcarefulatten-beenrequiredtoproducereliabledata.Hecommented,isurgentlyneededinmydailyworktoandcosts.Especiallywhentherearenotthelinkagebetweenitems[inthissystem]willprovidetomakedecision.[...]Thissystemalsoprovidesanthedataretrievalprocess.”
wereimpressedbytheuncertaintyillustrationandfunction.Forexample,ExpertCsaid,“Uncertaintyawesomefeature,Icanuseitto ndsomeunexpectedthecoverageofcoding.”
thatsmoothlyswitchingbetweendifferentdatagraphshelpedthem ndrelevantdatamorequickly.ExpertScom-mented,“Thisswitchingfunctionenablesmetoeasilytransitionbe-tweenthehashtaggraphandtheusergraph.WhenImodifyonerankingscoreinonegraph,Icannotonlyverifytheresultinthisgraph,butalsoverifyitinanothergraph.”
Theexpertsalsosuggestedseveralimprovements.Thetargetaudi-enceofMutualRankerisexpertswithdomainknowledge.Theexpertsbelievedthataverageuserscanalsobene tfromit.Theysuggestedthatmoreintuitivevisualdesignbeused.ExpertCsaid,“Theuncertaintyglyphcanbesimpli edforageneraluser.Forexample,maybetheglyphdoesnotneedtoencodetheuncertaintydistribution,justsimplyshowthatthisrankingscoreisuncertainty.”Theyalsoexpressedtheneedtoretrievestreamingdata.9
DISCUSSION
AND
iscausedbytherankingscoredecreaseofhashtags“#ebt”and“#obam-zombies.”Theexpertthenexaminedtherelevanttweetstoprobethereason.TheEBTsystemwascrashedatthattimeandmanypeoplewonderedwhetherthecrashwascausedbythegovernmentshutdown:“Ahh...#ebtnotworkingcauseifa#governmentshutdown?Howsadyoucan’tspendmoneytakenfrommeagainstmywillthatIworkedfor...”Then,thecrashwasexplainedtobearesultofacomputerfailure(“AccordingtoNBC,#ebtisdownbecauseofatechnicalissue,NOT#governmentshutdown”).Thus,theexpertbelieved“#ebt”wasirrelevantandappreciatedthisautomaticchange.
Switchingbetweendifferentdataviews.Inadditiontohashtags,theexpertwantedtoexaminetheuserswhoparticipatedindifferentdiscus-siongroups.Forexample,shewantedtoidentifythemostactiveusersinthe“#shutdown”cluster,sosheoverlaidtheuserlabelsaroundthehashtaglabels(Fig.11(a)).Theexpertthenswitchedtotheuserviewtoexploreadditionaluserinformation(Fig.11(b)andFig.11(c)).Sheimmediatelyidenti edtheleadingusersinFig.11(b)andFig.11(c).Shedescribedthemwithtwocategories:1)keygovernmentof cialaccounts,including“@barackobama,”“@whitehouse”(Fig.11(b));and2)newsagencies/publicmediasuchas“@nytimes,”“@guardian,”and“@bloombergnews”(Fig.11(c)).Consideringthatpartisanleaderswereofmajorinteresttoher,she rstobservedtherankingscoresofselectpoliticians,e.g.,@speakerboehner(Rank8),@whiphoyer(Rank8),@nancypelosi(Rank7),etc.Shebelievedthattheimportanceoftheseuseraccountswasunderestimatedbecausethein uenceandac-tivenessofpoliticiansontwitterareusuallymuchlowerthanthatinreallife.Shechangedtherankingsofthepartisanleaders,“@speaker-boehner,”“@whiphoyer,”and“@nancypelosi,”to10,whichisthehighest.Fig.11(d)showsthedifferenceafterthisre nement.
Afterthechange,theuserclusterswereregeneratedandtheuncertainlylevelsofsomenodeswerelargelyreduced.Notably,“@whiphoyer”becameanimportantclusterwiththescoresofseveralusersintheclusterautomaticallyincreased(Fig.11(e)).Forexample,“@repmaloney,”from5to6and“@repteddeutch,”from5to6.“ThesearemembersofCongress.Thechangeoftheirrankingscoresisnaturalhere.”Theexpertcommented,“Thisiscool.[...]IfIwanttochangetherankingscoreofoneuser,othersjustautomaticallyfollow.Thiscouldhelpme ndtheimportantuserswhosenamesIamnotfamiliarwithorwhoarenotactiveonTwitter.”
FUTUREWORK
Thispaperpresentsavisualanalyticssystem,MutualRanker,tohelpanalystsinteractivelyretrievedataofinterestfrommicroblogs.WeextendtheMRGmodeltoextractamultifacetedretrievalresultthatincludesthemutualreinforcementrankingresults,theuncertaintyofeachrank,andtheuncertaintypropagationamongdifferentgraphnodes.Themodelistightlyintegratedwithacompositevisualizationtoassistanalystsinretrievingsalientposts,users,andhashtagseffectively,inanuncertainty-awareenvironment.
Inthefuture,weplantoimprovesystemperformancebyimplement-ingaparallelMonteCarlosamplingmethod.Anotherexcitingavenueforfutureworkistoretrievestreamingdatainmicroblogs,whichcanbeveryusefulinemergencymanagementandthreatanalysis.Webe-lievethesystemcanalsobene taverageusersinterestedincollectingmicroblogdata.Inthefuture,wewillalsoinvitemoreuserstotryoursystemandconductaformaluserstudy.Accordingly,wewillimproveMutualRankerbasedonthecollectedfeedback.ACKNOWLEDGMENTS
WewouldliketothankX.WangandJ.Yin,J.Gong,andDr.W.Cuiforhelpfuldiscussionsonthevisualizationdesign,Dr.J.ZhangandDr.Y.Songforconstructivesuggestionsonsimilaritymeasures,aswellasDr.W.PengandDr.J.Suforprovidingdomainexpertise.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TVCG.2015.2467554, IEEE Transactions on Visualization and Computer Graphics
REFERENCES
[1]A.C.Alhadi,T.Gottron,J.Kunegis,andN.Naveed.Livetweet:Microblog
retrievalbasedoninterestingnessandanadaptationofthevectorspacemodel.InProceedingsofTREC,2011.
[2]K.Avrachenkov,N.Litvak,D.Nemirovsky,andN.Osipova.Montecarlo
methodsinpagerankcomputation:Whenoneiterationissuf cient.SIAMJ.Numer.Anal.,45(2):890–904,2007.
[3]B.Bahmani,A.Chowdhury,andA.Goel.Fastincrementalandpersonal-izedpagerank.Proc.VLDBEndow.,4(3):173–184,2010.
[4]M.Bianchini,M.Gori,andF.Scarselli.Insidepagerank.ACMTrans.
InternetTechnol.,5(1):92–128,2005.
[5]I.BIPM,I.IFcc,andI.IuPAc.Oiml,guidetotheexpressionofuncertainty
inmeasurement.InternationalOrganizationforStandardization,Geneva.ISBN,pages92–67,1995.
¨ttmann,S.Koch,R.Kru¨ger,[6]H.Bosch,D.Thom,F.Heimerl,E.Pu
¨rner,andT.Ertl.Scatterblogs2:Real-timemonitoringofmicroblogM.Wo
messagesthroughuser-guided ltering.IEEETVCG,19(12):2022–2031,2013.
[7]S.BrinandL.Page.Theanatomyofalarge-scalehypertextualwebsearch
puternetworksandISDNsystems,30(1):107–117,1998.[8]N.Cao,Y.-R.Lin,X.Sun,zer,S.Liu,andH.Qu.Whisper:Tracing
thespatiotemporalprocessofinformationdiffusioninrealtime.IEEETVCG,18(12):2649–2658,2012.
[9]A.ChandramouliandS.Gauch.Aco-operativewebservicesparadigm
forsupportingcrawlers.InLargeScaleSemanticAccesstoContent(Text,Image,Video,andSound),pages475–489,2007.
[10]H.Chen,S.Zhang,W.Chen,H.Mei,J.Zhang,A.Mercer,R.Liang,and
H.Qu.Uncertainty-awaremultidimensionalensembledatavisualizationandexploration.IEEETVCG,2015(ToAppear).
[11]J.Chen,J.Zhu,Z.Wang,X.Zheng,andB.Zhang.Scalableinferencefor
logistic-normaltopicmodels.InProceedingsofNIPS,pages2445–2453.2013.
[12]S.CherichiandR.Faiz.Relevantinformationmanagementinmicroblogs.
InformationSystemsforKnowledgeManagement,pages159–182,2013.[13]C.Collins,S.Carpendale,andG.Penn.Visualizationofuncertaintyin
latticestosupportdecision-making.InProceedingsofEUROVIS,pages51–58,2007.
[14]C.Correa,Y.-H.Chan,andK.-L.Ma.Aframeworkforuncertainty-aware
visualanalytics.InProceedingsofIEEEVAST,pages51–58,Oct2009.[15]D.R.CoxandP.A.Lewis.Thestatisticalanalysisofseriesofevents.
Wiley,1966.
[16]Y.Duan,F.Wei,Z.Chen,M.Zhou,andH.Shum.Twittertopicsumma-rizationbyrankingtweetsusingsocialin uenceandcontentquality.InProceedingsofColing,pages763–780,2012.
[17]rmationsearchandretrievalinmicroblogs.Journalof
theAmericanSocietyforInformationScienceandTechnology,62(6):996–1008,2011.
[18]S.Ghani,B.Kwon,S.Lee,J.-S.Yi,andN.Elmqvist.Visualanalyticsfor
multimodalsocialnetworkanalysis:Adesignstudywithsocialscientists.IEEETVCG,19(12):2032–2041,2013.
[19]D.HoltenandJ.J.VanWijk.Force-directededgebundlingforgraph
puterGraphicsForum,28(3):983–990,2009.
[20]W.JavedandN.Elmqvist.Exploringthedesignspaceofcomposite
visualization.InProceedingsofPaci cVis,pages1–8,2012.
[21]T.KamadaandS.Kawai.Analgorithmfordrawinggeneralundirected
rmationprocessingletters,31(1):7–15,1989.
[22]mpeandH.Hauser.Interactivevisualizationofstreamingdata
withkerneldensityestimation.InProceedingsofPaci cVis,pages171–178,2011.
[23]B.LiuandL.Zhang.Asurveyofopinionminingandsentimentanalysis.
InMiningtextdata,pages415–463.2012.
[24]S.Liu,W.Cui,Y.Wu,andM.Liu.Asurveyoninformationvisualization:
recentadvancesandchallenges.TheVisualComputer,pages1–21,2014.[25]S.Liu,X.Wang,J.Chen,J.Zhu,andB.Guo.Topicpanorama:Afull
pictureofrelevanttopics.InProceedingsofIEEEVAST,pages183–192,2014.
[26]X.Liu,Y.Song,S.Liu,andH.Wang.Automatictaxonomyconstruction
fromkeywords.InProceedingsofKDD,pages1433–1441,2012.
[27]S.Lodha,A.Pang,R.Sheehan,andC.Wittenbrink.U ow:visualizing
uncertaintyin uid ow.InProceedingsofIEEEVisualization,pages249–254,Oct1996.
[28]Y.Lu,F.Wang,andR.Maciejewski.Businessintelligencefromsocial
[29][30][31][32][33][34][35][36][37][38][39][40][41][42][43][44][45][46][47][48][49][50][51][52][53][54][55]
media:Astudyfromthevastboxof cechallenge.IEEEComputerGraphicsandApplications,34(5):58–69,2014.
Z.Luo,M.Osborne,S.Petrovic,andT.Wang.Improvingtwitterretrievalbyexploitingstructuralinformation.InProceedingsofAAAI,2012.A.Marcus,M.S.Bernstein,O.Badar,D.R.Karger,S.Madden,andR.C.Miller.Twitinfo:Aggregatingandvisualizingmicroblogsforeventexploration.InProceedingsofCHI,pages227–236,2011.
R.McCreadieandC.Macdonald.Relevanceinmicroblogs:Enhancingtweetretrievalusinghyperlinkeddocuments.InProceedingsofOAIR,pages189–196,2013.
A.T.Pang,C.M.Wittenbrink,andS.K.Lodha.Approachestouncertaintyvisualization.TheVisualComputer,13(8):370–390,1997.
D.Phan,L.Xiao,R.Yeh,andP.Hanrahan.Flowmaplayout.InProceed-ingsofIEEEInfoVis,pages219–224,2005.
E.J.Ruiz,V.Hristidis,C.Castillo,A.Gionis,andA.Jaimes.Correlating nancialtimeserieswithmicro-bloggingactivity.InProceedingsofWSDM,pages513–522,2012.
S.SedhaiandA.Sun.Hashtagrecommendationforhyperlinkedtweets.InProceedingsofSIGIR,pages831–834,2014.
L.Shi,F.Wei,S.Liu,L.Tan,X.Lian,andM.Zhou.Understandingtextcorporawithmultiplefacets.InProceedingsofIEEEVAST,pages99–106,2010.
M.Skeels,B.Lee,G.Smith,rmationVisualization,9(1):70–81,2010.A.Slingsby,J.Dykes,andJ.Wood.Exploringuncertaintyingeode-mographicswithinteractivegraphics.IEEETVCG,17(12):2545–2554,2011.
G.Sun,Y.Wu,R.Liang,andS.Liu.Asurveyofvisualanalyticstech-niquesandapplications:State-of-the-artresearchandfuturechallenges.JournalofComputerScienceandTechnology,28(5):852–867,2013.G.Sun,Y.Wu,S.Liu,T.-Q.Peng,J.Zhu,andR.Liang.Evoriver:Visualanalysisoftopiccoopetitiononsocialmedia.IEEETVCG,20(12):1753–1762,2014.
J.Tang,Z.Liu,M.Sun,andJ.Liu.Portrayinguserlifestatusfrommicrobloggingposts.TsinghuaScienceandTechnology,18(2):182–195,2013.
J.Thomson,E.Hetzler,A.MacEachren,M.Gahegan,andM.Pavel.Atypologyforvisualizinguncertainty.SPIE,5669:146–157,2005.
C.Vehlow,T.Reinhardt,andD.Weiskopf.Visualizingfuzzyoverlappingcommunitiesinnetworks.IEEETVCG,19(12):2486–2495,Dec2013.K.Verbeek,K.Buchin,andB.Speckmann.Flowmaplayoutviaspiraltrees.IEEETVCG,17(12):2536–2544,2011.
F.Wei,W.Li,Q.Lu,andY.He.Query-sensitivemutualreinforcementchainanditsapplicationinquery-orientedmulti-documentsummarization.InProceedingsofSIGIR,pages283–290,2008.
J.Weng,E.-P.Lim,J.Jiang,andQ.He.Twitterrank:Findingtopic-sensitivein uentialtwitterers.InProceedingsofWSDM,pages261–270,2010.
Y.Wu,S.Liu,K.Yan,M.Liu,andF.Wu.Opinion ow:Visualanalysisofopiniondiffusiononsocialmedia.IEEETVCG,20(12):1763–1772,2014.Y.Wu,F.Wei,S.Liu,N.Au,W.Cui,H.Zhou,andH.Qu.Opinion-seer:Interactivevisualizationofhotelcustomerfeedback.IEEETVCG,16(6):1109–1118,2010.
Y.Wu,G.-X.Yuan,andK.-L.Ma.Visualizing owofuncertaintythroughanalyticalprocesses.IEEETVCG,18(12):2526–2535,Dec2012.
P.Xu,Y.Wu,E.Wei,T.-Q.Peng,S.Liu,J.J.H.Zhu,andH.Qu.Visualanalysisoftopiccompetitiononsocialmedia.IEEETVCG,19(12):2012–2021,2013.
E.Zangerle,W.Gassler,andG.Specht.Ontheimpactoftextsimilarityfunctionsonhashtagrecommendationsinmicrobloggingenvironments.SocialNetworkAnalysisandMining,3(4):889–898,2013.
J.Zhao,N.Cao,Z.Wen,Y.Song,Y.-R.Lin,andC.Collins.# ux ow:Visualanalysisofanomalousinformationspreadingonsocialmedia.IEEETVCG,20(12):1773–1782,2014.
X.W.Zhao,Y.Guo,Y.He,H.Jiang,Y.Wu,andX.Li.Weknowwhatyouwanttobuy:Ademographic-basedsystemforproductrecommendationonmicroblogs.InProceedingsofKDD,pages1935–1944,2014.
H.-J.Zimmermann.Fuzzysettheoryanditsapplications.SpringerScience&BusinessMedia,2001.
T.ZukandS.Carpendale.Visualizationofuncertaintyandreasoning.InProceedingsofSG,pages164–177,2007.
正在阅读:
An Uncertainty-Aware Approach for Exploratory Microblog Retrieval06-09
名师堂学校2012秋初一数学半期复习指导01-18
山东省胶南市隐珠中学七年级历史 第19课江南地区的开发导学案(03-09
PC级ATS与CB级的区别及特点01-04
利辛职教园区建设PPP项目07-09
32式太极剑剑谱08-10
苏教版五年级下册分类复习-句子训练06-26
关于剩余法评估不动产价值中土地增值税计算问题的分析01-21
人教版小学四年级数学上册竞赛试卷及答案09-04
学期数学教学个人工作计划例文202204-03
- 1Compressed Spatio-temporal Descriptors for Video Matching and Retrieval
- 2Disordes of Fluids and Electrolytes--Physiological Approach
- 3A Novel Approach for Automatic Palmprint Recognition
- 4A Unifying Conformal Field Theory Approach to the Quantum Hall Effect
- 5Accommodating Hybrid Retrieval in a Comprehensive Video Database Management System
- 6I. Ontology-based Information Retrieval
- 7An Evolutionary Multiobjective Approach for Community Discov
- 8Disordes of Fluids and Electrolytes--Physiological Approach
- 9Lifetime-aware multicast routing in wireless ad hoc networks
- 10A Novel Image Retrieval System Based on BP Neural Network
- 教学能力大赛决赛获奖-教学实施报告-(完整图文版)
- 互联网+数据中心行业分析报告
- 2017上海杨浦区高三一模数学试题及答案
- 招商部差旅接待管理制度(4-25)
- 学生游玩安全注意事项
- 学生信息管理系统(文档模板供参考)
- 叉车门架有限元分析及系统设计
- 2014帮助残疾人志愿者服务情况记录
- 叶绿体中色素的提取和分离实验
- 中国食物成分表2020年最新权威完整改进版
- 推动国土资源领域生态文明建设
- 给水管道冲洗和消毒记录
- 计算机软件专业自我评价
- 高中数学必修1-5知识点归纳
- 2018-2022年中国第五代移动通信技术(5G)产业深度分析及发展前景研究报告发展趋势(目录)
- 生产车间巡查制度
- 2018版中国光热发电行业深度研究报告目录
- (通用)2019年中考数学总复习 第一章 第四节 数的开方与二次根式课件
- 2017_2018学年高中语文第二单元第4课说数课件粤教版
- 上市新药Lumateperone(卢美哌隆)合成检索总结报告
- Uncertainty
- Exploratory
- Microblog
- Retrieval
- Approach
- Aware
- 语言学概论离线作业
- 安全教育第一课教案
- 变压器及高低压柜吊装施工方案
- 2016高考语文适应性试卷-推荐下载
- 成都市规划管理技术规定(2008)
- 初中文言文实词和虚词积累
- 西方经济学导学答案
- 北京市十二五保障性住房建设规划
- 一件令我感动的事情作文700字
- 城市燃气管网信息化
- 兴业证券第三季度季报
- 东胜集团招标界面划分(试行)
- EXB-28V330JX-Panasonic
- 一个土皇帝的覆灭心得体会
- 2018版高中地理第一单元人口与地理环境第一节人口增长与人口问题试题鲁教版必修2
- 六上科学试卷命题细目表与意图说明
- (完整word版)小学四年级第一学期班主任工作总结
- 民主决策:作出最佳选择(3)
- 客户联谊活动策划方案
- 宏、微观经济学练习题