An Uncertainty-Aware Approach for Exploratory Microblog Retrieval

更新时间:2023-06-09 12:52:01 阅读量: 实用文档 文档下载

说明:文章内容仅供预览,部分内容可能不全。下载后的文档,内容与下面显示的完全一致。下载之前请确认下面内容是否您想要的,是否完整无缺。

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TVCG.2015.2467554, IEEE Transactions on Visualization and Computer Graphics

An Uncertainty-Aware Approach for Exploratory Microblog RetrievalMengchen Liu, Shixia Liu, Xizhou Zhu, Qinying Liao, Furu Wei, and Shimei PanLegend:Upper Extreme h ,  > ∥ ,  Lower Extreme#pjnet#makedclisten

0

1.0#alaska

#congress

#lnyhbt

#t2sda

#science

#breaking#furlough#ff

#budget#democrats#debtceiling#dems#debt#debtlimit#us#usa#cr#tcot#wwiimemorial

#barrycades#veterans#spitehouse#healthcare#impeachobama

#benghazi#tlot#military#ebt

#nationalparks#dems#pjnet#tcot

#politics#cnn

#dc

(b)#spitehouse

#default#government

#irs#obamacare#obama

#texas#gopshutdown#gop#edshow#msnbc

#science

#nationalparks#news

A

#shutdown#govtshutdown#obamashutdown#senate

#tgdn#truth

#jobs#economy#getcovered

#military#tlot#ebt

#spitehouse(d)

#military

#republicans#maddow#republican#teaparty#p2

#libcrib#demandavote#retweet#potus

#obamacare#obama#sot#jobs#economy

#ebt(e)

#immigration#america#fail

#dearcongress

#cspanchat#house

#topprog#sequester#uniteblue#vote#stoprush#enoughalready#aca#koch#wic#1u#endthisnow#inners#tedcruz#cleancr#boehner#justvote

#obamacare#getcovered#doctorwho#maddow#teaparty#p2

#teaparty

(a)

(c)

(f)

Fig. 1. Exploratory retrieval of the government shutdown dataset: (a) the hashtag graph with uncertainty and its propagation; (b) uncertainty propagation; (c)-(f) interactive ranking re nement results. Abstract— Although there has been a great deal of interest in analyzing customer opinions and breaking news in microblogs, progress has been hampered by the lack of an effective mechanism to discover and retrieve data of interest from microblogs. To address this problem, we have developed an uncertainty-aware visual analytics approach to retrieve salient posts, users, and hashtags. We extend an existing ranking technique to compute a multifaceted retrieval result: the mutual reinforcement rank of a graph node, the uncertainty of each rank, and the propagation of uncertainty among different graph nodes. To illustrate the three facets, we have also designed a composite visualization with three visual components: a graph visualization, an uncertainty glyph, and a ow map. The graph visualization with glyphs, the ow map, and the uncertainty analysis together enable analysts to effectively nd the most uncertain results and interactively re ne them. We have applied our approach to several Twitter datasets. Qualitative evaluation and two real-world case studies demonstrate the promise of our approach for retrieving high-quality microblog data. Index Terms—microblog data, mutual reinforcement model, unc

ertainty modeling, uncertainty visualization, uncertainty propagation.

1 I NTRODUCTION Microblogs such as Twitter and Facebook are among the most popular platforms for people to share their daily observations and thoughts, M. Liu and S. Liu are with Tsinghua University. E-mail: simon900314@, shixia@. S. Liu is the corresponding author. X. Zhu is with USTC. E-mail: ezra0408@. Q. Liao and F. Wei are with Microsoft. E-mail:{qiliao,fuwei}@. S. Pan is with University of Maryland, Baltimore County. E-mail: shimei@umbc.edu Manuscript received 31 Mar. 2015; accepted 1 Aug. 2015; date of publication xx Aug. 2015; date of current version 25 Oct. 2015. For information on obtaining reprints of this article, please send e-mail to: tvcg@.

including personal status updates and opinions regarding products or government policies. Since the crowd in microblogs provides many individual comments/opinions that were not available before, businesses and organizations have begun to leverage microblogs to pro le customers, derive brand perception, gauge citizen sentiments, and predict the stock market[23, 34, 41, 53]. For example, retailers track and examine relevant microblog posts to understand customer opinion toward their products and services. In spite of the growing interest in quickly analyzing customer opinions or breaking news in microblogs, progress has been hampered by the lack of an effective mechanism to retrieve data of interest from microblogs. For this reason, researchers have developed a number of microblog retrieval methods[6, 17]. The main goal is to generate a list of k microblog posts that are relevant to the information needs represented by a query q. Although these methods have successfully retrieved

1077-2626 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See /publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI

10.1109/TVCG.2015.2467554, IEEE Transactions on Visualization and Computer Graphics

relevantposts,theyhavetwomajordrawbacks.First,theuniquecharacteristicsofmicroblogdataarenotcomprehensivelyconsideredtoimproveretrievalperformance.Posts,users,andhashtagsarethreekeydimensionsofmicroblogdata.Thesedimensionsarenotindependent,astheyoftenin uenceoneanother.Forexample,apostpublishedbyanin uentialuserandlabeledwithapopularhashtagtendstobesalient.However,theexistingapproachestomicroblogretrievaldonottightlyintegratethethreedimensionsanddonottakeadvantageoftherelationshipsamongthem.Theseapproacheseitherconsideronlythepostortreatitastheprimarydimensionandtheothersassecondarydimensionsto lterposts.Forexample,ScatterBlogs2[6] ndspostsofinterestbycheckingwhetherthepostscontainacertainhashtag.Second,existingmethodsdonotaddressuncertaintyintheretrievalmodels.Improvingthemodelingandpresentationofuncertaintycanhelpdescriberetrieveddatamoreaccurately,whichcaninturnassistanalystsinmakingmoreinformeddecisions[14,39,49].

Toaddresstheaboveproblems,wehavedevelopedanuncertainty-awaremicroblogretrievaltoolkit,MutualRanker,toestimateuncertaintyintroducedbytheanalysisalgorithmaswellastoquicklyretrievesalientposts,users,andhashtags.Sincepostsaresharedandpropagatedonsocialnetworks,theauthorityofanauthorandthepopularityofahashtagplayanimportantroleindeterminingtheimportanceofapostandviceversa.Accordingly,weformulateuncertainty-awaremicroblogretrievalusinganuncertainty-basedmutualreinforcementgraphmodel(MRG)[16,45],wherethecontentqualityofposts,thesocialin uenceofusers,andthepopularityofhashtagsmutuallyreinforceoneanother.WeadoptaMonteCarlosamplingmethodtosolveMRGbecauseofitseffectivelocalupdatemechanism,fastconvergence,andprobability-baseduncertaintyformalization[2].Adispersion-basedmeasureisusedtoestimatetheuncertaintygeneratedbytheMonteCarlosamplingmethod.Inaddition,wemodeltheuncertaintypropagationasaMarkovchain.Tohelpanalystsunderstandtheretrieveddata,wehavedesignedacompositevisualization[20].Speci cally,adensity-basedgraphvisualizationhasbeendevelopedtovisuallyillustrateposts,users,hashtags,andtheirrelationships.Anuncertaintyglyphanda owmapareemployedtorepresentuncertaintyanditspropagationonagraph(Fig.1).Thethreevisualizationcomponents,togetherwiththeuncertaintyanalysis,enableanalyststoquicklydetectthemostuncertainresultsandinteractivelyresolvethem.TheMonteCarlosamplingmethodisthenusedtoincrementallymodifytherankingresultstomeetuserneeds.

Insummary,ourworkpresentsthreetechnicalcontributions:

Anuncertain-awaremicroblogretrievalmodelthatextractssalientposts,users,andhashtags.Thismodelalsocomputestheassociateduncertaintyanditspropagationamonggraphnodes. Acompositevisualizationthatenablesuserstounderstandthethree-level,mutualreinforcementrankingresults,theassociateduncertainty,anduncertaintypropagationpatterns.

Avisualanalyticssystemthathelpsusersquicklyretrievedataofinterest,aswellasanalyzeandunderstandtherankingresultsinaninteractiveanditerativeprocess.2RELATEDWORK

2.1MicroblogRetrieval

Inthe eldofdatamining,anumberofapproacheshavebeenpro-posedtoretrievedatafrommicroblogs.AcomprehensivesurveywaspresentedbyCherichiandFaiz[12].Mostrecentworkcanbecatego-rizedintotwogroups:vector-space-basedapproachesandlinkanalysisapproaches.

Thevector-space-basedapproachemploystwofeaturevectorstorepresentaqueryandapost.Asimilaritymeasure(e.g.,cosinesimi-larity)isthenadoptedtoestimatethesimilaritybetweenthepostandthequery.TherehavebeensomerecentresearcheffortsthatexploitadditionalstructuralfeaturessuchasURLsandhashtagstoenhanceretrievalperformance[1,29,31].

Recently,totakeadvantageofthelinkstructureofsocialnetworks,researchershaveintroducedthePageRankalgorithm[7]inmicroblogretrieval.Forexample,TwitterRank[46]adoptsthefollower-followee

linkstructureandthePageRankalgorithmtoidentifyin uentialusers.Duanetal.[16]modeledthetweet-rankingproblemasanMRG[45],wherethesocialin uenceofusersandthecontentqualityoftweetsmu-tuallyreinforceeachother.Speci cally,thepostgraph,theusergraph,andthehashtaggraph,aswellastherelationshipsbetweenthethreegraphs,wereusedtoretrievesalientposts,users,andhashtags.Weex-tendthisapproachbyexplicitlymodelingtheuncertaintyoftherankingresult,aswellasitspropagationonthetweet/user/hashtaggraph.Inthe eldofvisualanalytics,agreatdealofresearchhasbeenconductedonvisuallyanalyzingmicroblogdata.Themethodsappliedincludeeventdetection[30],topicextractionandanalysis[25,40,50],informationdiffusion[8,52],sentimentanalysis[47,48],andrevenue/stockprediction[28,34].However,fewstudieshavefocusedonmicroblogretrieval.

Boschetal.[6]developedScatterBlogs2toextractmicroblogpostsofinterest.Itallowsanalyststobuildcustomizedpost ltersandclas-si ersinteractively.These ltersandclassi ersarethenutilizedtosupportreal-timepostmonitoring.Inpost ltering,thepostdimensionisconsideredtheprimarydimensionandthehashtagthesecondarydimension.Incontrast,wetightlyintegratetheposts,users,andhash-tagsintheMRGmodelandusethemodeltoretrievehigh-qualitymicroblogdata.Moreover,wealsomodeluncertaintyintheretrievalprocess.Sinceanalystscaninteractivelyre nethemodel,wecanfurtherimproveretrievalqualitybyleveragingtheuncertaintyformal-izationandanalysts’knowledge.

2.2InteractiveUncertaintyAnalytics

Frequently,uncertaintyisintroducedintovisualanalyticswhendataisacquired,transformed,orvisualized[14,24,27].Anumberofuncer-taintyanalysismethodshavebeenproposed,whichcanbecategorizedintotwogroups:uncertaintyvisualizationanduncertaintymodeling.Manystudiesonuncertaintyvisualizationhavebeenconductedinthe eldofgeographicvisualizationandscienti cvisualization[32,37,42].Typicaluncertaintyrepresentationtechniquesincludetheadditionofglyphsandgeometry,themodi cationofgeometryandattributes,an-imation,soni cation,andpsycho-visualapproaches[32].Recently,researchersareincreasinglyinterestedinthedesignofuncertaintyrep-resentationsforinformationvisualizationandvisualanalytics.Forexample,Collinsetal.[13]designedtwoalternatives,thegradientborderandthebubbleborder,toillustrateuncertaintyinlatticegraphs.Wuetal.[48]developedacircularwheelrepresentationandsubjectivelogictoconveyuncertaintyincustomerreviewanalysis.Slingsbyetal.[38]utilizedbarchartstorevealtheuncertaintyassociatedwithgeodemographicclassi ers.Torepresentuncertaintyinaggregatedvertexsets,Vehlowetal.[43]consideredthelightnessandshapeofthenode.Chenetal.[10]paredwiththesemethods,MutualRankernotonlyvisualizesun-certainty,butalsoitspropagationonagraph.Wealsosupportuserstointeractivelymodifytheuncertainresult.

Anothertypeofuncertaintyvisualizationrepresentstheuncertaintyintheanalysisprocess.ZukandCarpendale[55]studiedissuesrelatedtouncertaintyinreasoninganddeterminedthetypeofvisualsupportrequired.Correaetal.[14]developedaframeworktorepresentandquantifytheuncertaintyinthevisualanalyticsprocess.Wuetal.[49]extendedthisframeworktoshowtheuncertainty owintheanalysisprocess.Bycontrast,ourworkaimstomodeluncertaintyinmicroblogretrieval.Wefocusonvisuallyillustratingtopologicaluncertaintypropagationonagraphandondesigninganiterativevisualanalyticsprocesstoactivelyengageanalystsinreducingoveralluncertainty.Probabilitytheory,fuzzysettheory,roughsettheory,andevidencetheoryarefourmajorapproachestomodeluncertainty[54].Amongtheseapproaches,probabilitytheoryisthemostcommonlyusedmethodinvisualanalytics.Forexample,Correa[14]andWuetal.[49]re-gardeduncertaintyasaparameterthatdescribesthedispersionofmea-suredvalues.Speci cally,theyrepresenteduncertaintyasanestimatedstandarddeviation,inwhichthemeasuredvalueisde nedonthesetofbothpositiveandnegativerealnumbers.Sincethemeasuredvalue(therankingscore)inourapproachisde nedonthesetofpositivereal

andoneresearcherinmediaandcommunications(C).Theexpertsareexperiencedinretrievingdatafrommicroblogs.Theyalsohadexperienceusingamethodsimilartotheonedescribedabove.Weconductedseveralinterviewswiththem,mainlyfocusingonprobingtheirneedsandmicroblogretrievalprocess.Wethenidenti edthefollowinghigh-levelrequirementsbasedontheirfeedback.

R1-Examininganinitialsetofsalientmicroblogdata.Bothex-pertsexpressedtheneedforarankinglistofkeywordsearchresults.Keyword-basedmicroblogretrievalresultsoftenincludemillionsofpostsandtensofthousandsofusersandhashtags.Thus,theseresultsaretoomassiveforanalyststoquicklydiscoverrelevantdata.Theexpertsusuallyhavetoexaminethedatacarefullyanddesignasetofrulesto lteroutirrelevantdata.Asaresult,theystatedtheneedforatoolkitthatcanrankextractedposts,users,andhashtagstofacilitatetheirdataretrievaltasks.Thisneedisconsistentwiththe ndingsofpreviousresearch[16,46].

R2-Revealingrelationshipswithinmicroblogdata.Previousre-search[11,16,25]hasalsoindicatedthattherelationshipswithindatahelpuserslocateinterestinginformationmoreeasily.Furthermore,therelationshipsamongthethreedimensionsofmicroblogdata(posts,users,andhashtags)canassisttheminextractingsalientdata.Forexample,postsfromopinionleadersareusuallymoreimportantthanthosefromaverageusers.Thedomainexpertsdesiredtheabilitytoexploredifferenttypesofrelationships.

R3-Exploringsalientmicroblogdatafromdifferentperspectives.Sincethethreedimensionsofmicroblogdatausuallyin uenceeachother,theexpertswantedtounderstandthisin uencesothattheycanlinkimportantdatainonedimensiontothatinanotherdimension.Forexample,expertSsaidthat,“Collectingrelevanttweetsisveryimportantforsomeofourprojects.After ndingoneimportanttweet,Iusuallycheckothertweetsfromthesameauthoraswellasthetweetsmarkedbythesamehashtag(s).Thishelpsme ndrelevanttweetsquickly.”

R4-Understandingtheerrorproducedbytherankingmecha-nism.Themicroblogdatarankingmechanismisnotperfectandoftenintroduceserrorsoruncertaintyintotheretrievalprocess.Thus,thedegreeofuncertaintymustbeanalyzedandunderstoodtofacilitateinformeddecision-making[14,37,49].Theexpertsrequestedtoknowwhichrankingscoresaremoreerror-prone.

R5-Analyzingthein uenceoftheerrorsofoneitemonotheritems.Theexpertsalsoexpressedtheneedtounderstanderrorpropagationamongdataitems.Theyclaimedthatthisinformationcanhelpthemconsiderablyin lteringoutirrelevantdata.Forexample,expertCcommented,“WhenI ndanitemwithanincorrectranking

erinterface:(a)MutualRankervisualization;(b)controlpanel;(c)informationpanel.

score,Ialsowanttoknowwhichitemsarein uencedbythissothatIcanadjusttherankingscorequickly.”3.2SystemOverview

Thecollectedrequirementshavemotivatedustodevelopavisualana-lyticstoolkit,MutualRanker.Itconsistsofthefollowingcomponents: AnMRGmodeltogeneratetheinitialrankinglistsofposts,users,andhashtags(R1);

Anuncertaintymodeltoestimateuncertaintyanditstopologicalpropagationonagraph(R4,R5);

Acompositevisualizationtopresentthegraph-basedrankingresults,uncertainty,anditspropagation(R2,R3).

TheprimarygoalofMutualRankeristoextractalistofkmicroblogposts/users/hashtagsthatarerelevanttoqueryq.Fig.2illustratesthemaincomponentsneededtoachievethisgoal.Givenamicroblogdatasetextractedbyaquery,thepreprocessingmodule rstextractsthepostgraph,theusergraph,andthehashtaggraph.ThethreegraphsarethenfedtotheMRGmodel,whichproducesthreerankinglistsofposts,users,andhashtags.Theuncertaintymoduleestimatestheuncertaintyintheretrievalmodelanditstopologicalpropagation.Thevisualizationmoduletakestherankingresultsandtheuncertaintyestimationasinputandillustratestheminacompositevisualizationthatincludesagraphvisualization,anuncertaintyglyph,anda erscaninteractwiththegeneratedvisualizationforfurtheranalysis.Forexample,ausercanmodifyarankingresult.Withthisinput,MutualRankerwillincrementallyupdatetherankingresults.

Fig.3depictstheuserinterfaceofMutualRanker.Itcontainsthreedifferentinteractionareas:MutualRankervisualization(Fig.3(a)),controlpanel(Fig.3(b)),andinformationpanel(Fig.3(c)).Thevisual-izationviewconsistsoftwoparts:1)thestackedtreevisualizationthatshowsthehierarchicalstructureofmicroblogdata;2)thecompositevisualizationthatsimultaneouslyrevealtheretrievedmicroblogdata,theuncertaintyoftherankingresults,anditstopologicalpropagation.Thecontrolpanelconsistsofasetofcontrolsthatenableuserstointeractivelyupdatetheranking.Theinformationpaneldisplaysthecorrespondingmicroblogdatasuchasposts,users,andhashtagsforaselectedaggregateitem.

10.1109/TVCG.2015.2467554, IEEE Transactions on Visualization and Computer Graphics#obamacare

#debtceiling

Hashtag Graph

GRAPH

mainofMRG[16,45]isthatitemploys

boththerelation-users,orhashtags,andtherelationshipsbetween

rankings.Thisfeaturesigni cantlyreducesthework-wheninteractingwithourvisualanalyticssystem.Foranalystmodi estherankingscoreofahashtag,MRGupdatestherankingscoresoftheneighboringPost Graphhashtags,butalsothoseofrelevantusersandposts.Thisprocessallowsoursystemtointegrateuserknowledgeintothevisualanalyticsprocesswithacceptableusereffort.ThisisalsothemainreasonwhyweadoptMRGinMutualRanker.

5.1MRGComputationwithMonteCarloSamplingMethodDuanetal.[16]proposedamatrix-basedmethodtosolveMRG,whichiterativelyupdatestherankingscoresusingEq.(1).Thematrix-basedmethodisaglobalone.Anupdatetoanyitemisachievedbyrunningthemethodontheentireitemset,whichisverytime-consuming.Toaddressthisproblem,weusetheMonteCarlosamplingmethod.Theadvantagesofthissamplingmethodoverthematrix-basedmethodareasfollows[2]:

Rankingscoresarelocallyupdatedwhentheinputchangeslo-cally;

Therankingscoresofimportantitemsareaccuratelyestimatedafterafewiterations;

TheuncertaintyoftherankingscoresaremodeledaccuratelybecausetheMonteCarlomethodcalculatesvariancestatistically.Toemploythismethod,we rstsolveRasformulatedbyEq.(2):

R=(1 d)(I dM) 1W=(1 d)(∑dkMk)W.

k=0∞

(3)

Wethenperformaseriesofrandomwalksforeachitem.Arandomwalkmaystopateachstepwithaprobabilityof1 d.Ifthewalkcontinues,thenitproceedstothenextstepaccordingtothematrixM.Eachelementmijde nesthetransitionprobabilityfromitoj.

LetZ=∑dkMk.Therankingscoreofeachitemis:

k=0

Fig.4.Mutualreinforcementmodel.

rj=(1 d)∑wizij,

i=1

N

(4)

TheinputofMRGincludesthreegraphs,thepostgraph,theusergraph,andthehashtaggraph,aswellastherelationshipsamongthem.ThethreegraphsandtheirrelationshipsareshowninFig.4.Asin[16],thepostgraphisbuiltbasedoncosinesimilarity.ArecentstudyhasshownthatcosinesimilaritywithaTFIDFweightingschemeisthemostappropriatemeasuretocomputethesimilaritybetweenmicroblogposts[35,51].Asaresult,weemploycosinesimilarityinoursystem.Theusergraphisconstructedbasedonfollower-followeerelationships.Thehashtaggraphisgeneratedaccordingtotheco-occurrenceoftwohashtags.Thethreegraphsarealsoconnectedbytworelationships:authorshipandco-occurrence.Ifauserpublishesapost,thenweconnectthisuserwithhis/herpost.Wealsolinkthisuserwithallofthehashtagsinthispost.Eachpostisalsolinkedtoallofthehashtagsassociatedwithit.

Forsimplicity,weuniformlydenoteposts,users,andhashtagsasitemsinthefollowingdiscussion.

TheMRGemploysamethodsimilartoPageRank[7]tomodelthemutualin uenceamongdifferentitemsinheterogeneousgraphs:

RpαppMpp Ru =d αpuMpuRhαphMph

αupMup

αuuMuuαuhMuh

αhpMhpRpWpαhuMhu Ru +(1 d) Wu .αhhMhhRhWh

(1)

wheretheelementzijinZistheaveragenumberoftimesthatarandom

walkstartingfromitemivisitsitemj.Weestimatezijbycomputingtheempiricalmeanofanumberofrandomwalks.

Duanetal.[16]onlyconsiderthesimilarityofitemsincomputingmij.Thus,high-rankingscoresmaybeincorrectlyassignedtouserswhopublishmanypoststhatdonotreceiveattention.Toaddressthis,weconsiderthepriorsaliencyofitemsinthesamplingprocess.Speci -cally,thetransitionprobabilityfrommijisde nedassimilarity(i,j)·wj.5.2UncertaintyModeling

InMutualRanker,weuseanapproximationmethodtosolveMRG,whichmayintroduceuncertaintyintotheretrievalresults.Itisthereforeimportanttomodeluncertainty.SinceweemploytheMonteCarlosamplingmethod,thedistributionofeachrankingscoreisknown.Hence,wecanemploytheprobabilitytheorytomodeluncertainty.Uncertaintyisde nedasaparameterfordepictingthedispersionofvaluesthatcanbereasonablyattributedtothemeasuredvalue[5].Traditionalmethodsmodelthemeasuredvalueasanormallydistributedrandomvariable[14,49].Variance[14]andstandarddeviation[49]areamongthemostcommonlyusedmeasurestorepresentuncertaintywhereinthemeasuredvalueisde nedonthesetofbothpositiveandnegativerealnumbers.

Themeasuredvalue(rankingscore)inourapproachisde nedonthesetofpositiverealnumbers.Thus,theabovemodelingmethodcannotbeapplieddirectlytoourwork.

Accordingto[2],zijinEq.(4)hasaPoissondistribution.Therankingscoreistheweightedsumofaseriesofzij.Hence,therankingscoreismodeledasaPoissonmixture.ForaPoissonmixture,thevarianceisapproximatelyproportionaltothemean.Hence,ifweusevariancetomodeluncertainty,thelargertherankingscore,themoreuncertainitis,butthisisnotalwaystrue.

Standarddeviationisthesquarerootofvarianceandhasasimilarproblem.Consequently,varianceandstandarddeviationarenotgoodmeasuresfordepictinguncertaintyinourmodel.

Forsuchadistribution,acommonlyusedmeasureofdispersionisthevariance-mean-ratio(VMR)[15].ThehighertheVMR,themoredispersedthedistribution.Foritemj,itsVMR(uj)canbede nedas:

uj=vj/rj,

(5)

themutualreinforcementstrengthamongposts,users,andhashtags.disthedampingfactorinPageRank,andwesetitto0.85,asin[7].Wp,Wu,andWharevectorsforthepriorsaliencyoftheitems(e.g.,thecontentqualityofposts,thesocialin uenceofusers,orthepopularityofhashtags). LetR= Ru ,W= Wu ,andM= αpuMpu

Rh

Wh

Rp

Wp

αppMppαphMph

αupMup

αuuMuuαuhMuh

αhpMhpαhuMhu αhhMhh

Rp,Ru,andRharetherankingscorevectorsofposts(p),users(u),andhashtags(h).Mxydenotestheaf nitymatrixfromxtoy,wherex,ycanbeposts,users,orhashtags.αxyisaweightusedtobalance

.

Then,Eq.(1)canbesimpli edas:

R=dMR+(1 d)W.

(2)

5MRG-BASEDUNCERTAINTYANALYSIS

SinceexactinferenceofMRGisverytime-consumingonalargegraph,weapproximateitusingamoreef cientMonteCarlosamplingmethod.Wealsoexplicitlymodeltheuncertaintyassociatedwitheachitem(e.g.,apost,auser,orahashtag),aswellasitspropagationonthegraph.

wherevjisthedistributionvarianceoftherankingscoreofitemj.Accordingto[2],vjcanbecalculatedasfollows:

10.1109/TVCG.2015.2467554, IEEE Transactions on Visualization and Computer Graphics

vj=(1 d)2∑w2ivzij.

i=1N

(6)

wherevzijisthevarianceofzij.EachzijobeysaPoissondistribution

anditsvariancecanbecalculatedfromitsexpectation.

Themassivenumberofitemsinthemicroblogdatameanswecannotplaceallofthemonthescreen.Hence,weaggregatesimilaritemstoformacluster.Theoverallrankingscoreofaclusterrcisde nedasthesumoftherankingscoresofitsitems[4].Therankingscoresareindependentofeachotherandtheoverallvarianceofthecluster,vc,isthesumofthevarianceoftherankingscores.Thus,theuncertaintyofacluster,uc,canbecalculatednaturallybydividingvcandrc.

uc=vc/rc=∑(rj/rc)(vj/rj)=∑kjuj.

j∈c

j∈c

(c)

Fig.5.Topologicaluncertaintypropagationcalculation

MtoM .Thischangeonlyaffectsasmallpartoftherandomwalks

(7)

Eq.(7)showsthatuccanbeexpressedbyaweightedsumoftheuncertaintyofitsitemswhereeachweightkjistheratiooftherankingscoresofitemjandclusterc.Thus,theuncertaintyofaclusterismainlydeterminedbyitsimportantitems.

5.3TopologicalUncertaintyPropagation

Ifananalyst ndsanincorrectlyrankeditem,hecanmodifyitbasedonhisknowledge.Hecanfurthertrackhowtheuncertaintypropa-gatesfromoneclustertoanothertoidentifyotheraffecteditems.Tohelpananalysttrackuncertainty,weexplicitlymodelitstopologicalpropagationonthegraph.

InMRG,therankingscoreofanitemcanbeexpressedasalinearcombinationofrankingscoresofrelateditems.Hence,thevariancesofarankingscorecanalsobeexpressedasalinearcombinationofthevariancesofrelatedrankingscores.Theuncertaintyofeachitemcanbecalculatedfromitsrankingscoreanditsvariance,andhence,theuncertaintyofanitemcanalsobeexpressedlinearlybytheuncertaintyofotheritems.Speci cally,

uj=

i=1,i=j

N

m ijui,

(8)

222whereeachm ij=(dmijri)/((1 dmjj)rj).Eq.(8)showsthattheuncer-taintyofeachitemisnotindependentanditpropagatesonthegraphin

alinearform.Thus,foreachpairofitemsiandj,m ijuicanbeviewedasthepropagateduncertaintyfromitemitoj.Wedenoteitbyui→j.RewritingEq.(8)inamatrixform,wecanformulatetheuncertaintypropagationasaMarkovchain:

usedintheMonteCarlosamplingmethod.

Fortheaffectedrandomwalks,existingincrementalgraphrankingalgorithms[3]performre-samplingandupdatetherankingscoresbyaggregatingthestatisticsofthesenewrandomwalksintotheoriginalresults.Onemainproblemwiththesealgorithmsisthatre-samplingrequiresaconsiderableamountoftime,whichmaymakereal-timeinteractionimpossible.Supposenistheaveragenumberofneighborsthatanitemhasandlistheaveragelengthofasampledrandomwalk.Ateachstepinarandomwalk,wehavetosamplefromamultinomialdistributionwithnpossibleoutcomesandthetimecostisO(n).Thus,samplinganewrandomwalkwilltakeO(nl)time.ThetimeneededtocomputeandaggregatethestatisticsofthesesamplesisO(l).ThetotaltimerequiredforanewsampleisO(nl)+O(l)=O(nl).

However,inourscenario,wedonotdeleteoraddedgesonthegraphs.Asaresult,wedonotneedtoperformre-sampling.Weonlyneedtomodifythestatisticsofarandomwalkbasedonthemodi edtransitionprobabilitymij,therebyavoidingthehighcostassociatedwithresampling.Thetimecostofupdatinganin uencedrandomwalkisreducedtoO(l).

Givenarandomwalk:path={i→n1→...→nk 1→j},wede neanewrandomvariablexikj.Inparticular,xikj=1indicatesthattherandomwalkstartsfromiandreachesjbymovingksteps.Theoriginalweightofeachstepintherandomwalkis1.Duringanupdate,were-calculatetheweightofthisstepusingP (xikj=1)/P(xikj=1).P(xikj=1)istheprobabilityofxikj=1accordingtoMandP (xikj=1)istheprobabilityofk=1accordingtoM .Hence,P(xk=1)iscalculatedby:xijij

P(xikj=1)=min1mn1n2...mnk 1j.

(12)

Similarly,P (xikj=1)canalsobecalculated.6

VISUALIZATION

UM =U,

(9)

whereU=[uj]1×NandM =[m ij]N×N.

Similartotheuncertaintypropagationfromitemtoitem,wecanmodeltheuncertaintypropagationfromclustertoclusterusingthefol-lowingprocedure.First,basedonEq.(8),wecalculatethepropagateduncertaintyfromeachitemiinthesourceclustercstoeachitemjinthetargetclusterct(Fig.5(a)).Second,foreachitemjinct,wecomputethepropagateduncertaintyucs→jfromcstoitemjbyaggregatingtheuncertaintypropagatedfromeachiteminthesourcecluster(Fig.5(b)).

ucs→j=

i∈cs

Tohelpanalystsextractmicroblogdataofinterestinteractively,wehavedesignedacompositevisualizationthatincludesagraphvisualization,anuncertaintyglyph,anda owmap(Fig1(a)).6.1RankingResultsasGraphVisualization

Sinceonepostcorrespondstoonlyoneuserandafewhashtags,thescopeofin uenceofapostissmallerthanthatofauserorahashtag.Updatingtherankingscoreofapostwillonlydirectlyaffecttherankingscoresofitsauthor,afewrelatedhashtags,andanumberofposts.Incontrast,updatingtherankingscoreofauserorhashtagwilldirectlyimpacttherankingscoresofhundredsoreventhousandsofpostsaswellasanumberofusersandhashtags.Ontheotherhand,thenumberofpostsisusuallyhuge,around10-100timesthatofusersorhashtags.Analystswouldrequiremoretimetoprovidetheirfeedbackonapostgraph.Asaresult,weregardtheuserandthehashtagastheprimaryvisualizationelementsandthepostasasecondaryelementmainlyusedtoillustratethecontentoftheprimaryelements.Accordingly,usersandhashtagsarevisuallyrepresentedbyanode-linkgraphwhereaspostsarerepresentedasalist.Forsimplicity,wetakeahashtaggraphasanexampletoillustratethebasicideaofgraphvisualization.

Toallowanalyststonavigatelargegraphsef ciently,ahierarchyisbuiltbasedonaBayesianRoseTree[26]witheachnon-leafnoderep-resentingahashtagcluster.AsshowninFig.3(a),astackedtreeisadoptedtorepresentthehashtaghierarchyandadensity-basedgraphvisualizationisemployedtoillustratetherelationshipswithintheuser/hashtaggraphsandbetweenthem(R2,R3).

∑ui→j.

(10)

Finally,theuncertaintyofaclusterisaweightedsumoftheuncertaintyoftheitemsinit(Eq.(7)).Thus,theoverallpropagateduncertaintyucs→ctfromcstoctcanbecalculatedastheweightedsumofthepropa-gateduncertaintyfromcstoeachjinct(Fig.5(c)).

ucs→ct=

j∈ct

∑kjucs→j.

(11)

5.4IncrementalRankingUpdate

Wealsoallowanalyststointeractivelymodifytheitemrankingresultbasedontheirknowledge.WecanupdatethemodellocallybecauseweusetheMonteCarlosamplingmethod.Aftertheanalystchangestherankingscore(s),ourapproachiterativelyupdatesthepriorsaliencescore(s)oftheitem(s).Accordingly,theaf nitymatrixischangedfrom

Fig.6.Basicideaofthelayoutalgorithm:(a)placetheclusternodesandderivethelayoutcenterofeachcluster;(b)computetheVoronoitessellationandtreatthecellasthelayoutareaofeachcluster;(c)layoutofrepresentativeandnon-representativenodes;and(d) nallayoutresultwithcontext.

Thedensity-basedgraphvisualizationcombinesanode-linkdiagramwithadensitymaptodisplaythenodesattheselectedlevelofthehashtagtree.Asin[25],weextractrepresentativenodesforeachoftheclusternodesattheselectedtreelevelandassignothernon-representativenodestotheirclosestrepresentativenodes.AsshowninFig.1(a),therepresentativenodesaredisplayedasanode-linkdiagramandtheothernodesasadensitymap.Inthisvisualization,therepresentativenodesofoneclusterareplacedneareachothertore ecttheircloseness.Thesizeofthenodeencodesthesumoftherankingscoreofeachitem.Thecorrespondingusersareoverlaidaroundtheselectedhashtagnodetoprovidemoreanalysiscontext(Fig.6(d)).Layout.Thelayoutofthestackedtreeisquitestraightforward.Thus,weintroducethelayoutofthedensity-basedgraph,whichcontainsthefollowingsteps.

Step1:Derivethelayoutcenterofeachclusterattheselectedtreelevel.Webuildaclustergraphbycheckingtheedgeconnectionsbetweenthetwoclusternodes.Anedgeisaddedifasuf cientnumberofconnectionsbetweenthetwoclusternodescanbefound.Theclustergraphisthenplacedbyaforce-directedlayout[21].AsshowninFig.6(a),thepositionofeachclusternodeistreatedasthecenterofeachhashtagcluster.

Step2:Computethelayoutareaofeachcluster.Inthisstep,wecomputethecorrespondingVoronoitessellationbasedontheclustercenter.Thecorrespondingtessellationcellsaretreatedaslayoutareasofthehashtagclusters(Fig.6(b)).

Step3:Layoutofrepresentativeandnon-representativenodes.Inthisstep,theforce-directedlayoutisadoptedtoplacetherepresentativenodes.Toensuretherepresentativenodeswithinoneclusterareplacedinthecorrespondingclusterlayoutarea,arepulsionforceisaddedfromtheareaboundarytoeachnodewithinthisarea.Thekerneldensityestimation[22]isutilizedtorepresentthedistributionofnon-representativenodes(Fig.6(c)).

Step4:Layoutofthecontextwordcloud.Showingthehashtaggraphandusergraphsimultaneouslywouldintroducevisualclutter.Tosolvethisissue,wetreatthehashtaggraphasaprimaryelementandtheuserinformationascontext.Inparticular,whenahashtagnodeisselected,awordcloudthatincludestheuserswhousethishashtagislaidouttoprovideusercontext.Inthiswordcloud,theselectedhashtagisplacedinthemiddle.Asweep-line-basedwordcloudlayoutalgorithm[36]isemployedtoproducesuchawordcloud.Fig.6(d)showsalayoutresultwithawordcloudcontext.

Interaction.Thefollowinginteractionsareprovidedtoassistanalystsininvestigatingtherankingresultsfrommultipleperspectives.

Examiningtherankedmicroblogdataandtheirrelationships(R2).Thedensity-basedgraphvisualizationprovidesaneasywaytoexploretherankingresultsfromthehashtagoruserperspective.Utilizingthehashtaghierarchyallowstheanalysttoexploretherankingresultsfromaglobaloverviewtolocaldetails.Several lters,suchastheedgeortheglyph lter,enableanalyststocustomizethisvieweasily.Relevantposts,hashtags,andusersarealsoprovidedtohelpanalystsbetterunderstandthecontentoftheselectedclusternode.

Smoothlyswitchingbetweendifferentdatadimensions(R3).Inspiredbythecontextpopupinteractionin[18],wealsooverlaycontextofaselecteditemtoprovidefurthernavigationcues.Forexample,ifthe

Upper Hinge (quartile)Lower Hinge (quartile)

Lower Extreme

(a)(b)(c)

Fig.7.Designoftheuncertaintyglyph:(a)boxplot;(b)transformingtheboxplottotheuncertaintyglyph;(c)analternativedesign.

analystselectsahashtag,thelabelsofuserswhousethathashtagcanbeoverlaidaroundtheselectedhashtagviaawordcloud(Fig.11(a)).Iftheanalyst ndssomethingofinterest,thehashtaggraphwillbesmoothlytransitionedtotheusergraph(Fig.11(b)).

6.2UncertaintyasGlyph

Aftertestingwiththe rstprototype,theexpertsidenti edseveralincorrectrankingresults.Theyexpressedtheneedtobeinformedofsuchresults.Thisrequirementisrelatedintimatelywiththeconclusionofpreviouswork,whichstatedthateffectivelyconveyinguncertaintyisveryimportanttothevisualanalyticsprocess[14,49].Sincetherankingresultsareaggregatedintoclustersintheoverview,theexpertswantedtoexaminetheuncertaintydistributionoftheaggregatenode,includingtheminimumvalue(0),maximumvalue(1.0),lowerextreme,upperextreme,lowerhinge(25%),andupperhinge(75%).

Inspiredbytheboxplotdesign(Fig.7(a)),wehavedesignedaglyphtomeettheaboverequirements(Fig.7(b)).AsshowninFig.7(a),sixvaluesfromasetofdataareconventionallyusedinaboxplot,includingtheminimumandmaximumvalues,theextremes,andtheupperandlowerhinges(quartiles).Atotalof50%percentofitemsfallinbetweentheupperandlowerhinges.Tocombineaboxplotwithagraphnode,we rsttransformtheboxplottoaline-basedone,andthenbenditaroundtheupperboundaryofthenode(Fig.7(b)).Wealsoattemptedseveralalternativesintheparticipatorydesignprocesswithexperts.Fig.7(c)isoneofthem.Afterinteractingwiththisalternative,theexpertsstatedthatitwasconfusing.Theythoughtthattheitemwithmoreofa lledareainsideshouldbetheoneonwhichtheyshouldfocus.However,inreality,thesenodeswereonlynodeswithalargerareabetweentheupperandlowerhinges.APhDstudentfromanartschoollatercon rmedthatalargeramountofdigitalinkwillattractmoreattentionfromusers.Afterseveralinteractionswiththeexpertsandtheartstudent,wechooseFig.7(b)asour naldesign.

Analystscanobtainanoverviewoftheuncertaintydistributioninaclusterbyexaminingitsuncertaintyglyph.Fig.8illustratesseveralexamplepatterns.Forexample,inFig.8(a),themajorityofitemsinthisclusterarecharacterizedbylowuncertainty.However,theclusteralsocontainssomeitemswithhigheruncertainty.Asaresult,exploringtheitemswithhighuncertaintyisaworthwhileendeavor.

Interaction.Inadditiontoallowinganalyststoexaminetheuncertaintyscore(R4),wealsoprovidetheinteractionshownbelowtointegrateanexpert’sknowledgeintotheretrievalprocess.

Interactiverankingre nement.Afteranexpert ndsanincorrectrank-ingresultbyexaminingtheuncertaintyglyph,theexpertcanmodifytherankingresult.Therankingscoresofthecorrespondinggraphnodeswillalsobeupdatedaccordingly.AsshowninFigs.1(c)-(f),the

isquiteshort.Thus,ifweconsiderthismeasure,manyoftheselinesegmentswillnotbebundledtogether.

Thetotaledgecompatibilityisde nedby:

Ce(ei,ej)=Cα(ei,ej)·Cs(ei,ej)·Cp(ei,ej).

(a)mostitemsinhigheruncertaintyarealsoincluded;(b)mostitemsinthisclusterhavehighuncertaintyandsomeitemswithhigheruncertaintyalsooccur;(c)uncertaintydistributionisuniform;(d)mostitemsinthisclusterhaveloweruncertainty.

Fig.9(b)showsthematchedresultsofthepropagationpaths.

Step3:Computetheforcetobundlethepropagationpath.Thecom-binedforceforapointpioneiisde nedas:

Fpi=Ki( pi 1 pi + pi pi+1 )+

ej∈E

pi pj ·Ce(ei,ej)

whereKiisthespringconstantforeachsegmentandEisthesetofallthematchededgesofei.In[19],thelastitemistheelectrostaticforceFe=1/ ei ej .Inordertobundlethematchedpathsthatarelocatedawayfromeachother,wereplaceitwithanattractingspringforce.Fig.9(c)showsthelayoutresultsofthepropagationpaths.7

QUANTITATIVEEVALUATION

Inthissection,wequantitativelyevaluatetheeffectivenessofourMRGcomputationandincrementalrankingupdatealgorithm.

youtofmultipleuncertaintypropagationpaths:(a)initiallayoutbasedonthe owmaplayout;(b)thematchedresultofthepropagationpaths;(c)thelayoutresultofthepropagationpaths.

7.1MRGComputation

ToevaluatetheperformanceofourMRGcomputationbasedontheMonteCarlosamplingmethod,wecompareditwiththematrix-basedmethodproposedin[16].WeusedtwoTwitterdatasetsintheexperiments:governmentshutdownandEbolaoutbreak.Theshutdowndatasetcontainstweetsonthe2013USgovernmentshutdown(5,132,510tweetsfromOct.1toOct.16,2013),whichwerecollectedbyusingqueriessuchas“shutdown.”TheEboladatasetcontainstweetsontheEbolaoutbreak(1,425,017tweetsfromJan.1toDec.25,2014),whichwerecollectedbyusingqueriessuchas“ebola.”AllexperimentswereconductedonaPCwitha3.1GHzCPUand16GBRAM.

Thereweretoomanyposts,users,orhashtagsandwecouldnotlabelallofthem.Thus,wedidnotreporttherecallinourevaluation.Inthisevaluation,weusedtopn-precision(n-Prec)astheevaluationmeasure.Topn-precisionisthepercentageofthecorrectlyretrieveditemsamongthetop-nrankeditems.Thismeasureisoftenusedwhentherecallishardtocalculate[9].Tofullycomparethetwoalgorithms,wecalculatedthetop10,50,100,and200-precisionforposts,users,andhashtags,respectively.WeinvitedtwoPhDstudentswhomajoredindataminingandarefamiliarwiththedatasetstoevaluatethere-trievalresults.Theylabeledtheresultsindividuallyandresolvedthedifferencesviadiscussion.TheresultsareshowninTable1.Overall,ouralgorithmperformedbetterthanthebaselineonbothdatasets.Weinspectedthetop10retrieveditemswithbothmethods.Ingeneral,theretrieveditemswerequiteaccurate.However,thebaselinehadonemistakeinthetop10usersselectedfromtheshutdowndataset.Itoverestimatedtheimportanceofausercalled@governmentclosd,whopostedasigni cantnumberoftweetswithanumberofhashtags.However,thisuserdidnothavemanyfollowersandhis/hertweetswereseldomretweeted.Incontrast,ouralgorithmcanavoidthismistakebytakingauser’sauthorityintoconsideration.ThebaselinealgorithmalsohadsimilarmistakesintheEboladataset.

Dataset

n-Precrankingscores(e.g.,nodesizes)ofseveralnodeschanged.Aglyphisdesignedtoillustratethechange,withthedottedorangecircleencodingthepreviousrankingscoreandtheboundaryofthe lledcircle(graycolor)representingthechangedrankingscore(Figs.1(d)-(f)).6.3UncertaintyPropagationasFlowMap

The owmap[33,44]isdesignedtovisuallyanalyzethemovementofobjectsfromonelocationtomultiplelocations.Inspiredbythisdesign,wedeveloptheuncertaintypropagationpath(Fig.1),whichisusefulforquicklyderivingtheunknownuncertainnode(s)fromtheknownone(s)(R5).

Layout.Thelayoutofmultipleuncertaintypropagationpathsofdiffer-entnodesisbasedonthe owmaplayoutin[44]andtheedgebundlingin[19].Thelayoutcontainsthefollowingsteps.

Step1:Derivetheinitialuncertaintypropagationpathbasedonthe owmaplayout.We rstcomputetheuncertaintypropagationoftheselectednodebasedonthetopologybyusingthemethodinSec.5.3.The owmaplayoutviaspiraltreesisthenunitizedtogeneratetheinitialuncertaintypropagationpath(Fig.9(a)).

Step2:Employedgecompatibilitymeasurestomatchthecorrespond-ingpropagationpathsfromdifferentnodes.Inthisstep,weemploythethreecompatibilitymeasuresdescribedin[19]tomatchthepropagationpathsfromdifferentnodes.

The rstmeasureisanglecompatibility,whichaimstomatchtheedgeswithasmallerangle.Itisde nedby:

Cα(ei,ej)=|cos(α)|.

Thesecondmeasureisscalecompatibility,whichtendstomatchtheedgeswithsimilarlengths.Itismeasuredby:

Cs(ei,ej)=2/(lavg·min(|ei|,|ej|)+min(|ei|,|ej|)/lavg),

wherelavg=(|ei|+|ej|)/2.

Thethirdmeasureispositioncompatibility,whichaimstomatchthecloseedgestogether.Itisde nedby:

Cp(ei,ej)=lavg/(lavg+ Qm1 Qm2 ),

Shutdown

whereQm1andQm2arethemidpointsofedgeseiandej.

Thelastmeasure,visibilitycompatibility,describedin[19]isnotconsideredinourmethodbecausetherearetoomanylinesegmentsinthepropagationpathgeneratedbythe owmaplayout,eachofwhich

Ebola

parisonoftheMRGcomputationmethodwiththebaselineusingtop-nprecisionforposts,users,andhashtags.

10.Theoverviewofthegovernmentshutdowndataset.

ciology(S)andoneresearcherinmediaandcommunications(C),tounderstandtheirrespectiveinterestsinthedatasets.Wedesignedanumberofexplorationtasks.Inthesecondphase,wecollaboratedwiththeexpertsto nishthedesignedtasks.Duringthisphase,weaskedquestionstodiscusswiththeexpertstheusefulnessofourtoolforeachtask.Finally,theexpertswereinvitedtoanotherdiscussionsessiontoprovideoverallfeedbackonhowourtoolcouldhelpthemwithreal-worldtasks.

8.1CaseStudy:GovernmentShutdown

Inthisstudy,weworkedwithexpertSto:1)evaluatehowuncertaintyanalysiscanbeutilizedtoidentifykeyhashtagsanduserswithasatis-factorycon dencelevel;2)leverageoursystemtoiterativelyreducetheuncertaintylevels;3)extractrelevanthashtags/users/tweetsrelatedtothegovernmentshutdown.

.Theexpertquicklyfoundinterestingresultsafterexamininghashtagoverview(Fig.10(a))generatedbyoursystem.Sheiden-sevenprominenttopicsdescribedbyasetofhashtags:generalabouttheshutdownandObamacare(Fig.10A),politicalontwitter(Fig.10B),discussiononendingtheshutdown10C),thein uenceoftheshutdownonpeople’slives(Fig.10D)thegovernmentshutdownonnewsmedia(Fig.10E),debt-discussion(Fig.10F),andcriticsoftheshutdown(Fig.10G).analysis:The“#shutdown”cluster(Fig.10(b))attractedexpert’sattentionbecauseitcontainsitemswithhigheruncertainty.expertexaminedthedetailedhashtagsandtweetsinthecluster.foundthatinadditiontocommonhashtagssuchas#govtshutdown,and#shutdowngop,anumberofdiversehashtagsalsocreated.Suchhashtagsincludedthosethatcriticizedthee.g.,#shutdownharry;localnewsposts,e.g.,#hounews,andcampaigns,e.g.,#dontcutkids.Shewantedtoexaminethemostones,soshesortedthehashtagsbytheuncertaintylevel.#lewinskywasrankedasthemostuncertainhashtag10(c)).Theanalystsearchedtherelatedtweetsandfoundthattaggedwith#lewinskyconcernedtheshutdownoftheClintonin1995.Theexpertdecidedtolowertherankingscoreofhashtag.Duringtheprocess,shecommentedthattheuncertaintyanditem- lteringfeaturewereuseful,helpingher lteroutitemsbyloweringtheirrankingscores.

Uncertaintypropagation:Next,theexpertexaminedhowtheuncer-taintyofthe“#shutdown”clusterwouldin uenceneighboringclusters.Sheclickedthe“propagation”buttonandthecorrespondinguncertaintypropagationwasdisplayed(theorange owinFig.1).Shealsoselectedtheuncertaintypropagationofthe“#democrats”cluster(theblue owinFig.1)and“#republicans”cluster(thegreen owinFig.1),whichwerecloselyrelatedtothe“#shutdown”cluster.AsshowninFig.1(b),clus-ter“#nationalparks”sharedtheuncertaintypropagatedfromthethreeclusters.GiventhattheclosingofthenationalparkswasaresultofthegovernmentshutdownandstimulateddiscussiononTwitter,theexpertincreasedtherankingscoreof#nationalparks(from4to6).Inoursys-tem,therankingscoreisfrom1to10,with10beingthehighestscore.Aftertheadjustment,shenoticedthescoresofanothertwohashtagclusterswereautomaticallyincreased:“#spitehouse”and“#teaparty.”Inthe rstcluster,therankingscoresofhashtagssuchas“#spitehouse”

switchedbacktothehashtaggraphtocheckthe

onthisgraph.Shefoundanewhashtagcluster,Shethenzoomedintothiscluster.Asshowninprimarilyexpressescriticismofthegovernment,DemocratsorRepublicans(“@PeteSessions#De-#shutdown#MakeDCListen#senatemustactStandforusefulnessofoursystem,weconductedasemi-withthetwoexperts.TheyusedMutualRankerin2hours,sotheywerefamiliarwithitsbasicfunctions.waswellreceivedbythem.

MutualRankerasaresearchtooltohelpposts,users,andhashtagsquicklyandconve-believedthatMutualRankerisveryusefulforcodingAccordingtohim,codingisthemostinhis eld.Extensivetrainingandcarefulatten-beenrequiredtoproducereliabledata.Hecommented,isurgentlyneededinmydailyworktoandcosts.Especiallywhentherearenotthelinkagebetweenitems[inthissystem]willprovidetomakedecision.[...]Thissystemalsoprovidesanthedataretrievalprocess.”

wereimpressedbytheuncertaintyillustrationandfunction.Forexample,ExpertCsaid,“Uncertaintyawesomefeature,Icanuseitto ndsomeunexpectedthecoverageofcoding.”

thatsmoothlyswitchingbetweendifferentdatagraphshelpedthem ndrelevantdatamorequickly.ExpertScom-mented,“Thisswitchingfunctionenablesmetoeasilytransitionbe-tweenthehashtaggraphandtheusergraph.WhenImodifyonerankingscoreinonegraph,Icannotonlyverifytheresultinthisgraph,butalsoverifyitinanothergraph.”

Theexpertsalsosuggestedseveralimprovements.Thetargetaudi-enceofMutualRankerisexpertswithdomainknowledge.Theexpertsbelievedthataverageuserscanalsobene tfromit.Theysuggestedthatmoreintuitivevisualdesignbeused.ExpertCsaid,“Theuncertaintyglyphcanbesimpli edforageneraluser.Forexample,maybetheglyphdoesnotneedtoencodetheuncertaintydistribution,justsimplyshowthatthisrankingscoreisuncertainty.”Theyalsoexpressedtheneedtoretrievestreamingdata.9

DISCUSSION

AND

iscausedbytherankingscoredecreaseofhashtags“#ebt”and“#obam-zombies.”Theexpertthenexaminedtherelevanttweetstoprobethereason.TheEBTsystemwascrashedatthattimeandmanypeoplewonderedwhetherthecrashwascausedbythegovernmentshutdown:“Ahh...#ebtnotworkingcauseifa#governmentshutdown?Howsadyoucan’tspendmoneytakenfrommeagainstmywillthatIworkedfor...”Then,thecrashwasexplainedtobearesultofacomputerfailure(“AccordingtoNBC,#ebtisdownbecauseofatechnicalissue,NOT#governmentshutdown”).Thus,theexpertbelieved“#ebt”wasirrelevantandappreciatedthisautomaticchange.

Switchingbetweendifferentdataviews.Inadditiontohashtags,theexpertwantedtoexaminetheuserswhoparticipatedindifferentdiscus-siongroups.Forexample,shewantedtoidentifythemostactiveusersinthe“#shutdown”cluster,sosheoverlaidtheuserlabelsaroundthehashtaglabels(Fig.11(a)).Theexpertthenswitchedtotheuserviewtoexploreadditionaluserinformation(Fig.11(b)andFig.11(c)).Sheimmediatelyidenti edtheleadingusersinFig.11(b)andFig.11(c).Shedescribedthemwithtwocategories:1)keygovernmentof cialaccounts,including“@barackobama,”“@whitehouse”(Fig.11(b));and2)newsagencies/publicmediasuchas“@nytimes,”“@guardian,”and“@bloombergnews”(Fig.11(c)).Consideringthatpartisanleaderswereofmajorinteresttoher,she rstobservedtherankingscoresofselectpoliticians,e.g.,@speakerboehner(Rank8),@whiphoyer(Rank8),@nancypelosi(Rank7),etc.Shebelievedthattheimportanceoftheseuseraccountswasunderestimatedbecausethein uenceandac-tivenessofpoliticiansontwitterareusuallymuchlowerthanthatinreallife.Shechangedtherankingsofthepartisanleaders,“@speaker-boehner,”“@whiphoyer,”and“@nancypelosi,”to10,whichisthehighest.Fig.11(d)showsthedifferenceafterthisre nement.

Afterthechange,theuserclusterswereregeneratedandtheuncertainlylevelsofsomenodeswerelargelyreduced.Notably,“@whiphoyer”becameanimportantclusterwiththescoresofseveralusersintheclusterautomaticallyincreased(Fig.11(e)).Forexample,“@repmaloney,”from5to6and“@repteddeutch,”from5to6.“ThesearemembersofCongress.Thechangeoftheirrankingscoresisnaturalhere.”Theexpertcommented,“Thisiscool.[...]IfIwanttochangetherankingscoreofoneuser,othersjustautomaticallyfollow.Thiscouldhelpme ndtheimportantuserswhosenamesIamnotfamiliarwithorwhoarenotactiveonTwitter.”

FUTUREWORK

Thispaperpresentsavisualanalyticssystem,MutualRanker,tohelpanalystsinteractivelyretrievedataofinterestfrommicroblogs.WeextendtheMRGmodeltoextractamultifacetedretrievalresultthatincludesthemutualreinforcementrankingresults,theuncertaintyofeachrank,andtheuncertaintypropagationamongdifferentgraphnodes.Themodelistightlyintegratedwithacompositevisualizationtoassistanalystsinretrievingsalientposts,users,andhashtagseffectively,inanuncertainty-awareenvironment.

Inthefuture,weplantoimprovesystemperformancebyimplement-ingaparallelMonteCarlosamplingmethod.Anotherexcitingavenueforfutureworkistoretrievestreamingdatainmicroblogs,whichcanbeveryusefulinemergencymanagementandthreatanalysis.Webe-lievethesystemcanalsobene taverageusersinterestedincollectingmicroblogdata.Inthefuture,wewillalsoinvitemoreuserstotryoursystemandconductaformaluserstudy.Accordingly,wewillimproveMutualRankerbasedonthecollectedfeedback.ACKNOWLEDGMENTS

WewouldliketothankX.WangandJ.Yin,J.Gong,andDr.W.Cuiforhelpfuldiscussionsonthevisualizationdesign,Dr.J.ZhangandDr.Y.Songforconstructivesuggestionsonsimilaritymeasures,aswellasDr.W.PengandDr.J.Suforprovidingdomainexpertise.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI

10.1109/TVCG.2015.2467554, IEEE Transactions on Visualization and Computer Graphics

REFERENCES

[1]A.C.Alhadi,T.Gottron,J.Kunegis,andN.Naveed.Livetweet:Microblog

retrievalbasedoninterestingnessandanadaptationofthevectorspacemodel.InProceedingsofTREC,2011.

[2]K.Avrachenkov,N.Litvak,D.Nemirovsky,andN.Osipova.Montecarlo

methodsinpagerankcomputation:Whenoneiterationissuf cient.SIAMJ.Numer.Anal.,45(2):890–904,2007.

[3]B.Bahmani,A.Chowdhury,andA.Goel.Fastincrementalandpersonal-izedpagerank.Proc.VLDBEndow.,4(3):173–184,2010.

[4]M.Bianchini,M.Gori,andF.Scarselli.Insidepagerank.ACMTrans.

InternetTechnol.,5(1):92–128,2005.

[5]I.BIPM,I.IFcc,andI.IuPAc.Oiml,guidetotheexpressionofuncertainty

inmeasurement.InternationalOrganizationforStandardization,Geneva.ISBN,pages92–67,1995.

¨ttmann,S.Koch,R.Kru¨ger,[6]H.Bosch,D.Thom,F.Heimerl,E.Pu

¨rner,andT.Ertl.Scatterblogs2:Real-timemonitoringofmicroblogM.Wo

messagesthroughuser-guided ltering.IEEETVCG,19(12):2022–2031,2013.

[7]S.BrinandL.Page.Theanatomyofalarge-scalehypertextualwebsearch

puternetworksandISDNsystems,30(1):107–117,1998.[8]N.Cao,Y.-R.Lin,X.Sun,zer,S.Liu,andH.Qu.Whisper:Tracing

thespatiotemporalprocessofinformationdiffusioninrealtime.IEEETVCG,18(12):2649–2658,2012.

[9]A.ChandramouliandS.Gauch.Aco-operativewebservicesparadigm

forsupportingcrawlers.InLargeScaleSemanticAccesstoContent(Text,Image,Video,andSound),pages475–489,2007.

[10]H.Chen,S.Zhang,W.Chen,H.Mei,J.Zhang,A.Mercer,R.Liang,and

H.Qu.Uncertainty-awaremultidimensionalensembledatavisualizationandexploration.IEEETVCG,2015(ToAppear).

[11]J.Chen,J.Zhu,Z.Wang,X.Zheng,andB.Zhang.Scalableinferencefor

logistic-normaltopicmodels.InProceedingsofNIPS,pages2445–2453.2013.

[12]S.CherichiandR.Faiz.Relevantinformationmanagementinmicroblogs.

InformationSystemsforKnowledgeManagement,pages159–182,2013.[13]C.Collins,S.Carpendale,andG.Penn.Visualizationofuncertaintyin

latticestosupportdecision-making.InProceedingsofEUROVIS,pages51–58,2007.

[14]C.Correa,Y.-H.Chan,andK.-L.Ma.Aframeworkforuncertainty-aware

visualanalytics.InProceedingsofIEEEVAST,pages51–58,Oct2009.[15]D.R.CoxandP.A.Lewis.Thestatisticalanalysisofseriesofevents.

Wiley,1966.

[16]Y.Duan,F.Wei,Z.Chen,M.Zhou,andH.Shum.Twittertopicsumma-rizationbyrankingtweetsusingsocialin uenceandcontentquality.InProceedingsofColing,pages763–780,2012.

[17]rmationsearchandretrievalinmicroblogs.Journalof

theAmericanSocietyforInformationScienceandTechnology,62(6):996–1008,2011.

[18]S.Ghani,B.Kwon,S.Lee,J.-S.Yi,andN.Elmqvist.Visualanalyticsfor

multimodalsocialnetworkanalysis:Adesignstudywithsocialscientists.IEEETVCG,19(12):2032–2041,2013.

[19]D.HoltenandJ.J.VanWijk.Force-directededgebundlingforgraph

puterGraphicsForum,28(3):983–990,2009.

[20]W.JavedandN.Elmqvist.Exploringthedesignspaceofcomposite

visualization.InProceedingsofPaci cVis,pages1–8,2012.

[21]T.KamadaandS.Kawai.Analgorithmfordrawinggeneralundirected

rmationprocessingletters,31(1):7–15,1989.

[22]mpeandH.Hauser.Interactivevisualizationofstreamingdata

withkerneldensityestimation.InProceedingsofPaci cVis,pages171–178,2011.

[23]B.LiuandL.Zhang.Asurveyofopinionminingandsentimentanalysis.

InMiningtextdata,pages415–463.2012.

[24]S.Liu,W.Cui,Y.Wu,andM.Liu.Asurveyoninformationvisualization:

recentadvancesandchallenges.TheVisualComputer,pages1–21,2014.[25]S.Liu,X.Wang,J.Chen,J.Zhu,andB.Guo.Topicpanorama:Afull

pictureofrelevanttopics.InProceedingsofIEEEVAST,pages183–192,2014.

[26]X.Liu,Y.Song,S.Liu,andH.Wang.Automatictaxonomyconstruction

fromkeywords.InProceedingsofKDD,pages1433–1441,2012.

[27]S.Lodha,A.Pang,R.Sheehan,andC.Wittenbrink.U ow:visualizing

uncertaintyin uid ow.InProceedingsofIEEEVisualization,pages249–254,Oct1996.

[28]Y.Lu,F.Wang,andR.Maciejewski.Businessintelligencefromsocial

[29][30][31][32][33][34][35][36][37][38][39][40][41][42][43][44][45][46][47][48][49][50][51][52][53][54][55]

media:Astudyfromthevastboxof cechallenge.IEEEComputerGraphicsandApplications,34(5):58–69,2014.

Z.Luo,M.Osborne,S.Petrovic,andT.Wang.Improvingtwitterretrievalbyexploitingstructuralinformation.InProceedingsofAAAI,2012.A.Marcus,M.S.Bernstein,O.Badar,D.R.Karger,S.Madden,andR.C.Miller.Twitinfo:Aggregatingandvisualizingmicroblogsforeventexploration.InProceedingsofCHI,pages227–236,2011.

R.McCreadieandC.Macdonald.Relevanceinmicroblogs:Enhancingtweetretrievalusinghyperlinkeddocuments.InProceedingsofOAIR,pages189–196,2013.

A.T.Pang,C.M.Wittenbrink,andS.K.Lodha.Approachestouncertaintyvisualization.TheVisualComputer,13(8):370–390,1997.

D.Phan,L.Xiao,R.Yeh,andP.Hanrahan.Flowmaplayout.InProceed-ingsofIEEEInfoVis,pages219–224,2005.

E.J.Ruiz,V.Hristidis,C.Castillo,A.Gionis,andA.Jaimes.Correlating nancialtimeserieswithmicro-bloggingactivity.InProceedingsofWSDM,pages513–522,2012.

S.SedhaiandA.Sun.Hashtagrecommendationforhyperlinkedtweets.InProceedingsofSIGIR,pages831–834,2014.

L.Shi,F.Wei,S.Liu,L.Tan,X.Lian,andM.Zhou.Understandingtextcorporawithmultiplefacets.InProceedingsofIEEEVAST,pages99–106,2010.

M.Skeels,B.Lee,G.Smith,rmationVisualization,9(1):70–81,2010.A.Slingsby,J.Dykes,andJ.Wood.Exploringuncertaintyingeode-mographicswithinteractivegraphics.IEEETVCG,17(12):2545–2554,2011.

G.Sun,Y.Wu,R.Liang,andS.Liu.Asurveyofvisualanalyticstech-niquesandapplications:State-of-the-artresearchandfuturechallenges.JournalofComputerScienceandTechnology,28(5):852–867,2013.G.Sun,Y.Wu,S.Liu,T.-Q.Peng,J.Zhu,andR.Liang.Evoriver:Visualanalysisoftopiccoopetitiononsocialmedia.IEEETVCG,20(12):1753–1762,2014.

J.Tang,Z.Liu,M.Sun,andJ.Liu.Portrayinguserlifestatusfrommicrobloggingposts.TsinghuaScienceandTechnology,18(2):182–195,2013.

J.Thomson,E.Hetzler,A.MacEachren,M.Gahegan,andM.Pavel.Atypologyforvisualizinguncertainty.SPIE,5669:146–157,2005.

C.Vehlow,T.Reinhardt,andD.Weiskopf.Visualizingfuzzyoverlappingcommunitiesinnetworks.IEEETVCG,19(12):2486–2495,Dec2013.K.Verbeek,K.Buchin,andB.Speckmann.Flowmaplayoutviaspiraltrees.IEEETVCG,17(12):2536–2544,2011.

F.Wei,W.Li,Q.Lu,andY.He.Query-sensitivemutualreinforcementchainanditsapplicationinquery-orientedmulti-documentsummarization.InProceedingsofSIGIR,pages283–290,2008.

J.Weng,E.-P.Lim,J.Jiang,andQ.He.Twitterrank:Findingtopic-sensitivein uentialtwitterers.InProceedingsofWSDM,pages261–270,2010.

Y.Wu,S.Liu,K.Yan,M.Liu,andF.Wu.Opinion ow:Visualanalysisofopiniondiffusiononsocialmedia.IEEETVCG,20(12):1763–1772,2014.Y.Wu,F.Wei,S.Liu,N.Au,W.Cui,H.Zhou,andH.Qu.Opinion-seer:Interactivevisualizationofhotelcustomerfeedback.IEEETVCG,16(6):1109–1118,2010.

Y.Wu,G.-X.Yuan,andK.-L.Ma.Visualizing owofuncertaintythroughanalyticalprocesses.IEEETVCG,18(12):2526–2535,Dec2012.

P.Xu,Y.Wu,E.Wei,T.-Q.Peng,S.Liu,J.J.H.Zhu,andH.Qu.Visualanalysisoftopiccompetitiononsocialmedia.IEEETVCG,19(12):2012–2021,2013.

E.Zangerle,W.Gassler,andG.Specht.Ontheimpactoftextsimilarityfunctionsonhashtagrecommendationsinmicrobloggingenvironments.SocialNetworkAnalysisandMining,3(4):889–898,2013.

J.Zhao,N.Cao,Z.Wen,Y.Song,Y.-R.Lin,andC.Collins.# ux ow:Visualanalysisofanomalousinformationspreadingonsocialmedia.IEEETVCG,20(12):1773–1782,2014.

X.W.Zhao,Y.Guo,Y.He,H.Jiang,Y.Wu,andX.Li.Weknowwhatyouwanttobuy:Ademographic-basedsystemforproductrecommendationonmicroblogs.InProceedingsofKDD,pages1935–1944,2014.

H.-J.Zimmermann.Fuzzysettheoryanditsapplications.SpringerScience&BusinessMedia,2001.

T.ZukandS.Carpendale.Visualizationofuncertaintyandreasoning.InProceedingsofSG,pages164–177,2007.

本文来源:https://www.bwwdw.com/article/y4v1.html

Top