Merging uncertain information with semantic heterogeneity in XML. Knowledge and Information
Semi-structured information in XML can be merged in a logic-based framework [Hun02, Hun02b]. This framework has been extended to deal with uncertainty, in the form of probability values, degrees of beliefs, or necessity measures, associated with leaves (i.
Merging Uncertain Information with Semantic Heterogeneity in XML
Anthony Hunter and Weiru Liu
March 11, 2005
Abstract
Semi-structured information in XML can be merged in a logic-based framework [Hun02, Hun02b]. This framework has been extended to deal with uncertainty, in the form of probability values, degrees of beliefs, or necessity measures, associated with leaves (i.e., textentries) in the XML documents [HL04a]. In this paper we further extend this approach to modelling and merging uncertain information that is defined at different levels of granularity of XML textentries, and to modelling and reasoning with XML documents that contain semantically heterogeneous uncertain information on more complex elements in XML subtrees. We present the formal definitions for modelling, propagating and merging semantically heterogeneous uncertain information and explain how they can be handled using logic-based fusion techniques.
1 Introduction
With XML fast emerging as the dominant standard for representing and exchanging information over the web, the need for modelling uncertainty in the information has begun to be addressed. In [NJ02], a probabilistic approach is taken to model and reason with uncertain information at different levels of tags in a single XML document. The final probability of the value of a specific tag is calculated via multiple conditional probabilities on its ancestors' tags. In another approach [KKA05], probability values are also attached to tags, but it requires that the probabilities of a set of values associated with a single tag must sum to 1.0, a condition that was not required in [NJ02]. A simple merging method is also provided to integrate two probabilistic XML trees in [KKA05], whilst [NJ02] did not consider multiple XML documents. Since [KKA05] does not use much of the background knowledge to verify the probabilistic XML documents before merging, even two simple XML files as input can produce a huge number of possible XML documents as output (see Conclusion for details), which makes the method difficult to use in practice.
In contrast, our approach to modelling, reasoning, and merging XML documents with uncertain information ([HL04a]) concerns information within the logical fusion framework [HS04] where background knowledge can provide additional information to facilitate merging and reduce redundancy and inconsistency among information. In this paper, we focus on structured reports. The format of a structured report is an XML document where the tagnames provide the semantic structure and coherence to the document and the textentries (i.e. leaves) are restricted to (1) individual words or simple phrases from a scientific nomenclature/terminology and (2) individual numerical values with units. For instance, a structured report on deposits of a particular underground location can be represented using the tagname deposit with textentries such as water, oil, gas, and sand, etc.
Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK
School of Computer Science, Queen's University Belfast, Belfast, Co Antrim BT7 1NN, UK
Example 1 Consider the following two structured reports which are for the same area being explored. Both of them define a mass function on the textentry deposit.
<report>
  <source>Experiment1</source>
  <date>19/3/02</date>
  <location>North Sea</location>
  <layer>layer7: 100m-120m</layer>
  <deposit>
    <belfunction>
      <mass value="0.4">
        <massitem>water</massitem>
        <massitem>oil</massitem>
      </mass>
      <mass value="0.6">
        <massitem>gas</massitem>
      </mass>
    </belfunction>
  </deposit>
</report>
<report>
  <source>Experiment2</source>
  <date>19 March 2002</date>
  <location>North Sea</location>
  <layer>layer7: 100m-120m</layer>
  <deposit>
    <belfunction>
      <mass value="0.2">
        <massitem>water</massitem>
      </mass>
      <mass value="0.8">
        <massitem>gas</massitem>
      </mass>
    </belfunction>
  </deposit>
</report>
Let τ1, τ2 be two logical terms that represent the two XML documents above, and let X be a variable. A fusion predicate Dempster(τ1, τ2, X), defined later in Section 2, takes these two XML documents as inputs and generates a merged structured report that grounds X with the combined mass function segment as shown below.
<report>
  <source>Exp1 and Exp2</source>
  <date>19/3/02</date>
  <location>North Sea</location>
  <layer>layer7: 100m-120m</layer>
  <deposit>
    <belfunction>
      <mass value="0.143">
        <massitem>water</massitem>
      </mass>
      <mass value="0.857">
        <massitem>gas</massitem>
      </mass>
    </belfunction>
  </deposit>
</report>
In our approach, each structured report can isomorphically be represented as a logical term: each tagname is a function symbol, and each textentry is a constant symbol. Furthermore, subtrees of a structured report can be isomorphically represented as subterms in logic. In this way, the information in each structured report can be captured in a logical language. We have also defined a range of predicates, in a Prolog knowledgebase, that capture useful relationships between structured reports, and so a set of them can then be analysed or merged as Prolog queries to a Prolog knowledgebase. In this way, a query to merge some structured reports can be handled by recursive calls to Prolog to merge the subtrees in the structured reports. This gives a context-dependent logic-based approach to merging that is sensitive to the uncertain information in the structured reports and to the background knowledge in the Prolog knowledgebase.
In [HL04a], a method to model and merge uncertain information, represented by probabilities, mass functions in the Dempster-Shafer theory of evidence (DS theory) [Sha76] and necessity measures in possibility theory [DP88], was proposed. Example 1 illustrates how a mass function can be encoded into XML format and how two mass functions on the same set of values can be merged to produce a combined XML document. Details of the formal definition and merging procedure will be reviewed in Section 2.
In this and subsequent examples, we use some simplified data from the petroleum exploration domain. The main purpose of petroleum exploration is to analyse qualitatively and calculate quantitatively the well logging data in order to predict the possible deposits in particular locations. The well logging data are digital records which can reflect the underground physical features, for instance, electronic resistance, micro-electrode resistance, natural gamma ray, etc. They are collected by well logging equipment inside the well from the ground level to some depth underground. The whole depth from the ground level to the bottom of the well is divided into layers (such as 100 meters to 150 meters) based on the digital data collected, and the values of these physical features can give indications of layers with possible deposits. The first two XML documents in Example 1 show how an expert can predict a possible deposit of a particular layer, by examining the digital data of the layer. Since the equipment used is subject to noise and inaccuracy, multiple experiments are needed in order to make an accurate prediction. Furthermore, the general analysis of the broader area of the physical features of the location often provides some additional information for prediction. This knowledge can equally be represented as XML documents and be used to assist prediction when necessary.
The main focus of [HL04a] is the modelling and merging of uncertain information associated with textentries in XML documents. Multiple pieces of uncertain information concerning the same issue (such as deposit in the above example) are assumed to be specified on the same set of possible values. However, [HL04a] does not consider situations where one piece of information uses more specific values than another, nor the situation where one piece of information is described on one set of values and another is on a different set of values where these two sets of values are inter-connected.
We elaborate this issue further here. Assume that for a targeted layer of a specific well of a particular area, we only wish to conclude whether the layer contains either solid or liquid materials, regardless of the details of the substance. Then we use a set of values {solid, liquid} to bear any information we have about the layer. However, we could make this information more specific by giving different types of solid and liquid substances, such as stone, sand, water, gas, oil. Therefore some uncertain information can be described on this detailed set of values {stone, sand, water, gas, oil}. This latter set of values has a finer granularity than the former one. Furthermore, since possible deposits of a layer are often drawn through interpreting well logging data rather than being observed directly, well logging data will directly influence the prediction. For instance, it is commonly known that a set of data is first interpreted in terms of geographical features, and then the assumed features are used to predict possible deposits. In this situation, the information is represented on one set of values (geographical features, e.g., lithology) and the conclusion is on another set (e.g., deposit). The information from the given set of values should be propagated to the destination set of values as a new distribution of beliefs. To deal with these situations, in this paper, we extend our approach to merging multiple pieces of uncertain information where
• evidence is specified at different levels of granularity on the same concept as textentries. We refer to two pieces of this type of evidence as semantically homogeneous. In this case, a value in a coarser set can be replaced by a set of values in a finer set. The example above relating solid and liquid with stone, sand, water, gas, and oil belongs to this category.
• evidence is specified on inter-related concepts as textentries. We refer to two pieces of this type of evidence as semantically heterogeneous. Example 3 below, relating stone, sand, water, gas, and oil with lithologies L1, L2 etc., belongs to this category.
• evidence is assigned to heterogeneous subtrees involving multiple concepts. We also refer to two pieces of this type of evidence as semantically heterogeneous. For instance, if we have a set of values measuring the lithology of a layer and another set evaluating the type of deposit of the layer, and we would like to know both the lithology and the deposit of the layer, then the joint set from these two sets says what lithology and what type of deposit a location has.
The first two types of evidence are illustrated by Examples 2 and 3 respectively and the third type of uncertain information is demonstrated by Example 4.
Example 2 Consider the two structured reports about a specific underground layer. The first report gives more precise descriptions of the possible deposit under a particular layer with probabilities whilst the other gives a more general suggestion of the possible deposit. These two reports describe the same problem with different levels of abstraction (different granularities), so they have uncertain information that is semantically homogeneous.
First report:
<report>
  <deposit>
    <probability>
      <prob value="0.2">water</prob>
      <prob value="0.8">sand</prob>
    </probability>
  </deposit>
</report>
Second report:
<report>
  <deposit>
    <probability>
      <prob value="0.4">liquid</prob>
      <prob value="0.6">solid</prob>
    </probability>
  </deposit>
</report>
Evidence bearing on a finer granularity (e.g., deposit with values water, gas etc) would have impact on a coarser granularity (e.g., deposit with values liquid, solid etc) or vice versa. It is sensible to consider both pieces of evidence at the same level of granularity if one piece of evidence can be propagated to the level of the other. This is the first topic we will look into in this paper.
Example 3 The following two structured reports provide two different but inter-related pieces of evidence about the same layer of the same well. The evidence in the left-hand XML document reports directly on the potential physical nature of the deposit. This is commonly used for prediction and this information can come from the general knowledge about the area. The second XML document, by contrast, reports on the observations in terms of lithology made by the equipment. From the lithological features, we can determine the physical nature of the deposit (or vice versa). To make use of this second XML report in prediction, we need to have a proper mapping function which specifies how the interpretations of lithology imply deposits, and then both of these reports can be merged. Since these two reports provide uncertain information on two different but inter-related concepts, i.e., deposit and lithology, we refer to them as semantically heterogeneous. Propagating a piece of uncertain information from one set of values to a different set of values is the second topic we will investigate in this paper.
Left report:
<report>
  <deposit>
    <belfunction>
      <mass value="0.2">
        <massitem>water</massitem>
        <massitem>oil</massitem>
      </mass>
      <mass value="0.8">
        <massitem>gas</massitem>
      </mass>
    </belfunction>
  </deposit>
</report>
Right report:
<report>
  <lithology>
    <belfunction>
      <mass value="0.3">
        <massitem>L1</massitem>
        <massitem>L3</massitem>
      </mass>
      <mass value="0.7">
        <massitem>L2</massitem>
      </mass>
    </belfunction>
  </lithology>
</report>
Example 4 Consider the following two structured reports which again are for the same layer of the same well. In the left report, there are two probability distributions on two textentries respectively. When we use this information to make a prediction, we can either use the information about the deposit or lithology, since the former may have been derived from the latter or vice versa. In the right report, by contrast, the child of the <prob value="..."> tag is not a textentry; it is in fact a subtree involving two concepts, deposit and lithology. This information can be the summary of general knowledge about this area saying what deposit is associated with what lithologies. For the purpose of prediction, uncertainties assigned to the pairs of values (e.g., (water, L1)) have to be re-assigned to values of deposit such as water, oil etc. Following this uncertainty re-assignment, the newly derived uncertain information on deposit can be merged with the information in the left XML. These two pieces of uncertain information are also referred to as semantically heterogeneous; however, they require a different method to propagate before they can be merged. Subtree uncertain information is the third topic we will study in this paper.
Left report:
<report>
  <source>experiment3</source>
  <date>19/3/02</date>
  <location>North Sea</location>
  <layer>150m-155m</layer>
  <deposit>
    <probability>
      <prob value="0.2">water</prob>
      <prob value="0.8">gas</prob>
    </probability>
  </deposit>
  <lithology>
    <probability>
      <prob value="0.3">L1</prob>
      <prob value="0.7">L2</prob>
    </probability>
  </lithology>
</report>
Right report:
<report>
  <source>General knowledge</source>
  <date>19 March 2002</date>
  <location>North Sea</location>
  <layer>150m-160m</layer>
  <probability>
    <prob value="0.4">
      <deposit>water</deposit>
      <lithology>L1</lithology>
    </prob>
    <prob value="0.6">
      <deposit>gas</deposit>
      <lithology>L2</lithology>
    </prob>
  </probability>
</report>
So the purpose of this paper is to significantly extend our previous paper on handling uncertainty [HL04a] by presenting techniques for merging structured reports with uncertainty expressed: (1) at different levels of granularity; (2) on different but inter-related sets of values; and (3) on subtrees. We will proceed as follows. In Section 2, we present formal definitions of logical representations of XML documents, review the basics of DS theory, and provide formal definitions of modelling and merging uncertain information in structured reports in the form of mass functions on the same textentry of two XML documents. In Section 3, we consider propagating and merging uncertain information at different levels of granularity. In Section 4, we investigate methods of reasoning with semantically heterogeneous uncertain information on subtrees. In Section 5, we compare our work with related research. Finally, in Section 6 we provide conclusions.
2 Structured reports
We now briefly review definitions for structured reports and the Dempster-Shafer theory of evidence (DS theory), which we use for representing uncertain information in structured reports.
2.1 Basic definitions
Each structured report is an XML document, but not vice versa, as defined below. This restriction means that we can easily represent each structured report by a ground term in classical logic.
Definition 1 Structured report: If ψ is a tagname (i.e. an element name), and φ is a textentry, then <ψ>φ</ψ> is a structured report. If ψ is a tagname (i.e. an element name), φ is a textentry, θ is an attribute name, and κ is an attribute value, then <ψ θ=κ>φ</ψ> is a structured report. If ψ is a tagname and σ1,...,σn are structured reports, then <ψ>σ1...σn</ψ> is a structured report.
The definition for a structured report is very general. In practice, we would expect a DTD for a given domain. For instance, we would expect that for an implemented system that merges petroleum exploration reports, there would be a corresponding DTD. One of the roles of a DTD, say for petroleum exploration reports, would be to specify the minimum constellation of tags that would be expected of a petroleum exploration report. We may also expect integrity constraints represented in classical logic to further restrict appropriate structured reports for a domain [HS04]. In this paper, we will impose some further constraints on structured reports, in Section 2.3, to support the handling of uncertainty.
Clearly each structured report is isomorphic to a tree with the non-leaf nodes being the tagnames and the leaf nodes being the textentries. When we refer to a subtree (of a structured report), we mean a subtree formed from the tree representation of the structured report, where the root of the subtree is a tagname and the leaves are textentries. We formalize this as follows.
Definition 2 Subtree: Let σ be a structured report and let ρ be a tree that is isomorphic to σ. A tree ρ′ is a subtree of ρ iff (1) the set of nodes in ρ′ is a subset of the set of nodes in ρ, and (2) for each node n_i in ρ′, if n_i is the parent of n_j in ρ, then n_j is in ρ′ and n_i is the parent of n_j in ρ′. By extension, if σ′ is a structured report, and ρ′ is isomorphic to σ′, then we describe σ′ as a subtree of σ.
Each structured report is also isomorphic with a ground term (of classical logic) where each tagname is a function symbol and each textentry is a constant symbol.
Definition 3 Abstract term: Each structured report is isomorphic with a ground term (of classical logic) called an abstract term. This isomorphism is defined inductively as follows: (1) If <ψ>φ</ψ> is a structured report, where φ is a textentry, then ψ(φ) is an abstract term that is isomorphic with <ψ>φ</ψ>; (2) If <ψ θ=κ>φ</ψ> is a structured report, where φ is a textentry, then ψ(φ,κ) is an abstract term that is isomorphic with <ψ θ=κ>φ</ψ>; and (3) If <ψ>φ1..φn</ψ> is a structured report, and φ′1 is an abstract term that is isomorphic with φ1, ..., and φ′n is an abstract term that is isomorphic with φn, then ψ(φ′1,..,φ′n) is an abstract term that is isomorphic with <ψ>φ1..φn</ψ>.
Via this isomorphic relationship, we can refer to a branch of an abstract term by using the branch of the isomorphic structured report, and we can refer to a subtree of an abstract term by using the subtree of the isomorphic structured report. Note that Definition 1 describes how an XML document can be defined recursively, starting from the simplest one, which has only one tagname and one value associated with the tagname. Definition 3 specifies how a tree structure such as an XML document can equally be described as a logical term which also reflects the relationships between tagnames and their values. For instance, the XML information <date>03/03/99</date> is denoted as date(03/03/99) in logic, where 03/03/99 can be understood as the value of attribute date.
Example 5 Consider the following structured report.
<fieldreport>
  <log><deposit>liquid</deposit><lithology>L1</lithology></log>
  <layer>250m-300m</layer>
</fieldreport>
This can be represented by the following abstract term:
fieldreport(log(deposit(liquid),lithology(L1)),layer(250m-300m))
In this abstract term, fieldreport/log/deposit is a branch.
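The mapping of Definition 3 can be sketched in a few lines of Python. This is only an illustration, not the authors' Prolog implementation: the function name to_abstract_term is hypothetical, the report is assumed to be written with ordinary angle-bracket XML syntax, and attributes on non-leaf elements are ignored because Definition 3 does not cover them.

```python
import xml.etree.ElementTree as ET


def to_abstract_term(elem):
    """Render an XML element as an abstract term string (sketch of Definition 3)."""
    children = list(elem)
    if not children:
        # Leaf: the tagname is the function symbol, the textentry the constant.
        args = [(elem.text or "").strip()]
        # Clause (2): an attribute value follows the textentry as an extra argument.
        args.extend(elem.attrib.values())
        return f"{elem.tag}({','.join(args)})"
    # Clause (3): recurse on the sub-reports (attributes on inner nodes are ignored).
    return f"{elem.tag}({','.join(to_abstract_term(c) for c in children)})"


xml_text = """
<fieldreport>
  <log><deposit>liquid</deposit><lithology>L1</lithology></log>
  <layer>250m-300m</layer>
</fieldreport>
"""
term = to_abstract_term(ET.fromstring(xml_text.strip()))
print(term)
```

Applied to the report of Example 5, the sketch yields the abstract term given in the text.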
2.2 Basics of Dempster-Shafer Theory of Evidence
The Dempster-Shafer theory (DS theory) of evidence provides a mechanism for modelling and reasoning with uncertain information in a numerical way, especially when it is not possible to assign a proportion of the total belief to single elements of a set of values. DS theory ([Sha76, Sme88]) has a commonly accepted advantage over probability theory in terms of assigning a proportion of an agent's belief to a subset of a set of possible values rather than only on singletons, and assigning any unspecified proportion to the whole set. This is especially useful when the evidence supporting an agent's belief is inaccurate or incomplete. Furthermore, multiple pieces of evidence can be accumulated over time on the same subject and these pieces of evidence can be combined/merged in some way in order to draw a conclusion out of them. Dempster's combination rule in DS theory provides a simple mechanism to achieve this objective. Due to these two advantages provided by DS theory, we have chosen it to model, reason and merge uncertain information in structured reports.
Let $\Omega$ be a finite set containing mutually exclusive and exhaustive solutions to a question. $\Omega$ is called the frame of discernment. A mass function, also called a basic probability assignment, captures the impact of a piece of evidence on subsets of $\Omega$. A mass function $m: 2^{\Omega} \to [0,1]$ satisfies:
(1) $m(\emptyset) = 0$
(2) $\sum_{A \subseteq \Omega} m(A) = 1$
When $m(A) > 0$, $A$ is referred to as a focal element. To obtain the total belief in a subset $A$, i.e. the extent to which all available evidence supports $A$, we need to sum all the mass assigned to all subsets of $A$. A belief function, $Bel: 2^{\Omega} \to [0,1]$, is defined as
\[ Bel(A) = \sum_{B \subseteq A} m(B) \]
A plausibility function, denoted $Pl: 2^{\Omega} \to [0,1]$, is defined as
\[ Pl(A) = 1 - Bel(\bar{A}) = \sum_{B \cap A \neq \emptyset} m(B) \]
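The two definitions above translate directly into code. The following is a minimal sketch, assuming a mass function is stored as a Python dictionary from frozensets (focal elements) to mass values; the names bel and pl are illustrative and do not come from the paper. The mass function used is the one that appears later as Example 8.

```python
def bel(mass, a):
    """Belief: total mass of the focal elements contained in a."""
    return sum(v for b, v in mass.items() if b <= a)


def pl(mass, a):
    """Plausibility: total mass of the focal elements that intersect a."""
    return sum(v for b, v in mass.items() if b & a)


frame = {"water", "oil", "gas"}
m = {frozenset({"water", "oil"}): 0.2, frozenset({"gas"}): 0.8}

print(bel(m, {"water"}), pl(m, {"water"}))
# Duality check: Pl(A) should equal 1 - Bel(complement of A).
print(1 - bel(m, frame - {"water"}))
```

For the singleton {water} the belief is zero while the plausibility is not, which shows how a mass function keeps committed and merely possible support apart.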
Dempster's rule of combination below shows how two mass functions $m_1$ and $m_2$ on the same frame of discernment, from independent sources, can be combined to produce a merged mass function.
\[ m_1 \oplus m_2(C) = \frac{\sum_{A \cap B = C} m_1(A) \times m_2(B)}{1 - \sum_{A \cap B = \emptyset} m_1(A) \times m_2(B)} \]
A mass function reduces to a probability distribution when every focal element is in fact a singleton. It is in this sense that, in this paper, we view probability theory as a special case of DS theory.
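As an illustration of the combination rule, the sketch below combines two mass functions stored as dictionaries from frozensets to masses. The function name dempster and the returned (merged, conflict) pair are choices made here for illustration; the paper's own implementation is a Prolog predicate. The data are the two deposit mass functions of Example 1.

```python
from itertools import product


def dempster(m1, m2):
    """Combine two mass functions on the same frame with Dempster's rule.

    Returns the normalised combined mass function and the conflict mass.
    """
    combined = {}
    conflict = 0.0
    for (a, va), (b, vb) in product(m1.items(), m2.items()):
        c = a & b
        if c:
            combined[c] = combined.get(c, 0.0) + va * vb
        else:
            # Mass falling on the empty set is removed by normalisation.
            conflict += va * vb
    if conflict >= 1.0:
        raise ValueError("totally conflicting sources: the rule is undefined")
    merged = {c: v / (1.0 - conflict) for c, v in combined.items()}
    return merged, conflict


m1 = {frozenset({"water", "oil"}): 0.4, frozenset({"gas"}): 0.6}
m2 = {frozenset({"water"}): 0.2, frozenset({"gas"}): 0.8}
merged, conflict = dempster(m1, m2)
for subset, value in merged.items():
    print(sorted(subset), round(value, 3))
```

Rounded to three decimals, the result agrees with the merged report of Example 1.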
2.3 Representing uncertain information
In order to support the representation of uncertain information in structured reports, we need some further formalization. First, we assume a set of tagnames that are reserved for representing uncertain information. Second, we assume some constraints on the use of these tags so that we can ensure they are used in a meaningful way with respect to probability theory and Dempster-Shafer theory of evidence. The set of key uncertainty tagnames for this paper are probability and belfunction. The set of subsidiary uncertainty tagnames for this paper are prob, multiitem, mass, and massitem. The union of the key uncertainty tagnames and the subsidiary uncertainty tagnames is the set of reserved tagnames.
Definition 4 ([HL04a]) The structured report <probability>σ1,..,σn</probability> is called a probability-valid component (ProVC) iff each σi ∈ {σ1,..,σn} is of the form <prob value=κ>φ</prob> where κ ∈ [0,1] and φ is a textentry.
All textentries φi in <prob value=κi>φi</prob> are elements of a pre-defined set containing mutually exclusive and exhaustive values that the related tagname can take.
Example 6 The following is a ProVC which corresponds to a probability distribution p(water) = 0.2 and p(gas) = 0.8.
<probability>
  <prob value="0.2">water</prob>
  <prob value="0.8">gas</prob>
</probability>
Definition 5 The structured report <probability>σ1,..,σn</probability> is called a subtree probability-valid component (ProSC) iff for each σi ∈ {σ1,..,σn}, σi is of the form
<prob value=κi><multiitem>$\sigma^i_1,...,\sigma^i_m$</multiitem></prob>
and for each $\sigma^i_j \in \{\sigma^i_1,..,\sigma^i_m\}$, $\sigma^i_j$ is of the form <$\psi^i_j$>$\varphi^i_j$</$\psi^i_j$>, and κi ∈ [0,1], $\psi^i_j$ is a tagname, and $\varphi^i_j$ is a textentry.
Example 7 The following is a ProSC that models a probability distribution on a compound set of values with p({water,L1}) = 0.4 and p({gas,L2}) = 0.6.
<probability>
  <prob value="0.4">
    <multiitem>
      <deposit>water</deposit>
      <lithology>L1</lithology>
    </multiitem>
  </prob>
  <prob value="0.6">
    <multiitem>
      <deposit>gas</deposit>
      <lithology>L2</lithology>
    </multiitem>
  </prob>
</probability>
The reserved tagname multiitem within tagname prob indicates that there are multiple concepts in this uncertain information. In the above example, each probability value is attached to a compound element combining deposit and lithology.
Definition 6 ([HL04a]) The structured report <belfunction>σ1,..,σn</belfunction> is called a belfunction-valid component (BelVC) iff for each σi ∈ {σ1,..,σn}, σi is of the form <mass value=κi>ψi</mass> and ψi is in the form
<massitem>φi1</massitem>,...,<massitem>φix</massitem>
where κi ∈ [0,1] and each φ is a textentry. To make the subsequent notation simpler, we also let ψi = {φi1,...,φix}. In this way, a BelVC can be represented as a collection of (subset, mass value) pairs, (ψi,κi), i = 1,...,n.
Example 8 The following is a BelVC on a single tagname deposit with m({water,oil}) = 0.2 and m({gas}) = 0.8.
<belfunction>
  <mass value="0.2">
    <massitem>water</massitem>
    <massitem>oil</massitem>
  </mass>
  <mass value="0.8">
    <massitem>gas</massitem>
  </mass>
</belfunction>
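The (subset, mass value) pair view of a BelVC can be extracted mechanically. The following is a minimal sketch, assuming the component is written with ordinary angle-bracket XML syntax, e.g. `<mass value="0.2">`; the function name belvc_to_pairs is hypothetical and not part of the paper.

```python
import xml.etree.ElementTree as ET


def belvc_to_pairs(xml_text):
    """Return the (subset, mass value) pairs of a BelVC as (frozenset, float)."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "belfunction":
        raise ValueError("a BelVC must be rooted at <belfunction>")
    pairs = []
    for mass in root.findall("mass"):
        # Each <mass> element holds one focal element as a list of <massitem>s.
        subset = frozenset(item.text.strip() for item in mass.findall("massitem"))
        pairs.append((subset, float(mass.get("value"))))
    return pairs


belvc = """
<belfunction>
  <mass value="0.2">
    <massitem>water</massitem>
    <massitem>oil</massitem>
  </mass>
  <mass value="0.8">
    <massitem>gas</massitem>
  </mass>
</belfunction>
"""
pairs = belvc_to_pairs(belvc)
print(pairs)
```

For the BelVC of Example 8 this gives the two pairs ({water, oil}, 0.2) and ({gas}, 0.8).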
The textentries in a BelVC are elements of a pre-defined set containing mutually exclusive and exhaustive values for the related tagname, as in the case for ProVCs. We now provide the definition of mass functions on subtrees.
Definition 7 The structured report <belfunction>σ1,..,σn</belfunction> is called a subtree belfunction-valid component (BelSC) iff for each σi ∈ {σ1,..,σn}, σi is of the form <mass value=κi>ψi</mass> and ψi is in the form
<multiitem>$\pi_{i1}$</multiitem> ... <multiitem>$\pi_{ix}$</multiitem>
and each $\pi_{ij}$ in $\{\pi_{i1},...,\pi_{ix}\}$ is in the form
<$\rho^i_{j1}$>$\varphi^i_{j1}$</$\rho^i_{j1}$>,...,<$\rho^i_{jl}$>$\varphi^i_{jl}$</$\rho^i_{jl}$>
where κi ∈ [0,1], the $\rho^i_{jt}$ are tagnames, and the $\varphi^i_{jt}$ are textentries. Equally, $\psi_i = \{<\varphi^i_{11},...,\varphi^i_{1p}>,...,<\varphi^i_{x1},...,\varphi^i_{xm}>\}$ can be used to stand for a subset with mass value κi where the subset consists of elements with multiple atom values.
Example 9 The following is a BelSC providing a mass function on a subtree.
<belfunction>
  <mass value="0.4">
    <multiitem>
      <deposit>water</deposit>
      <lithology>L1</lithology>
    </multiitem>
    <multiitem>
      <deposit>oil</deposit>
      <lithology>L3</lithology>
    </multiitem>
  </mass>
  <mass value="0.6">
    <multiitem>
      <deposit>gas</deposit>
      <lithology>L2</lithology>
    </multiitem>
  </mass>
</belfunction>
If a belief function is defined on a subtree, then for each mass value, its elements should come from different frames. So the tagnames should be distinct. In addition, if the subtree involves n tagnames, then in each (<multiitem>, </multiitem>) pair, there should be n tagnames. These are the two constraints we impose on BelSCs. When a tagname among these n names is missing, this part of the XML can be extended to include the missing tagname. More specifically, if we are defining a mass function for a subtree involving frames Θ1 and Θ2, then for a mass assignment that involves elements from just one of the two frames, we can extend it to include all the elements in the other frame. For example, the mass function in Example 9 gives
m({<water,L1>,<oil,L3>}) = 0.4, m({<gas,L2>}) = 0.6.
If it was the case that m({<gas,L2>}) = 0.4 is mis-represented as m({gas}) = 0.4, then it can be extended into m({<gas,L1>,<gas,L2>,...,<gas,L10>}) = 0.4. This means gas is compatible with all the lithologies. Therefore, in the following, we always assume that a BelSC complies with these two constraints.
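The extension step just described amounts to a Cartesian product with the missing frame. The sketch below illustrates it under stated assumptions: the helper extend_to_joint is hypothetical, the deposit frame listed is only an illustrative choice, and the lithology frame is taken to be L1 to L10 as suggested by the extended assignment in the text.

```python
from itertools import product


def extend_to_joint(partial, frames):
    """Extend a partial assignment to a set of tuples over all frames.

    partial: dict mapping tagname -> the single value that was specified.
    frames:  dict mapping tagname -> list of all values of that frame,
             in the order the tuple components should appear.
    """
    choices = [
        [partial[tag]] if tag in partial else list(values)
        for tag, values in frames.items()
    ]
    return frozenset(product(*choices))


frames = {
    "deposit": ["water", "oil", "gas"],               # illustrative frame
    "lithology": [f"L{i}" for i in range(1, 11)],     # L1 ... L10
}
extended = extend_to_joint({"deposit": "gas"}, frames)
print(len(extended), sorted(extended)[:3])
```

The mass originally attached to {gas} is then attached to the extended subset unchanged.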
The ProVCs, ProSCs, BelVCs, and BelSCs are referred to as uncertainty components and are normally part of larger structured reports. Normally, we would expect that for an application, the DTD for the structured reports would exclude a key uncertainty tag as the root of a structured report. In other words, the key uncertainty tags are roots of subtrees nested within larger structured reports. We also assume various integrity constraints on the use of the uncertainty components.
Definition 8 Let <probability>σ1,..,σn</probability> be a ProVC or a ProSC, and let σi ∈ {σ1,..,σn} be either of the form <prob value=κi>φi</prob> or of the form <prob value=κi><multiitem>φi1,...,φil</multiitem></prob>. This component adheres to the full probability distribution constraint iff the following two conditions hold:
(1) Σi κi = 1
(2) for all i, j, if 1 ≤ i ≤ n and 1 ≤ j ≤ n and i ≠ j, then φi ≠ φj or {φi1,...,φil} ≠ {φj1,...,φjt}
Definition 9 Let <belfunction>σ1,..,σn</belfunction> be a BelVC or a BelSC, and let S = {(ψ1,κ1),...,(ψn,κn)} be the collection of (subset, mass) pairs in the component. This component adheres to the full belfunction distribution constraint iff the following two conditions hold:
(1) Σi κi = 1
(2) for all i, j, if 1 ≤ i ≤ n and 1 ≤ j ≤ n and i ≠ j, then ψi ≠ ψj
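The two conditions of the full belfunction distribution constraint are easy to check once a component is held as a list of (subset, mass) pairs. The following is a minimal sketch; the function name and the numerical tolerance are assumptions made here, since the definition itself asks for an exact sum of 1.

```python
import math


def satisfies_full_belfunction_constraint(pairs, tol=1e-9):
    """Check the two conditions on a list of (frozenset, mass) pairs."""
    masses = [kappa for _, kappa in pairs]
    subsets = [psi for psi, _ in pairs]
    in_range = all(0.0 <= kappa <= 1.0 for kappa in masses)
    # Condition (1): the mass values sum to 1 (up to floating-point tolerance).
    sums_to_one = math.isclose(sum(masses), 1.0, abs_tol=tol)
    # Condition (2): no subset is listed twice.
    distinct = len(set(subsets)) == len(subsets)
    return in_range and sums_to_one and distinct


ok = [(frozenset({"water", "oil"}), 0.2), (frozenset({"gas"}), 0.8)]
repeated = [(frozenset({"gas"}), 0.5), (frozenset({"gas"}), 0.5)]
print(satisfies_full_belfunction_constraint(ok))
print(satisfies_full_belfunction_constraint(repeated))
```

The same check covers the full probability distribution constraint when every subset is a singleton.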
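When there are two BelVCs referring to the same textentry, we need to merge them. The following procedure implements Dempster's combination rule.
Definition 10 ([HL04a]) Let the following be two BelVCs
<belfunction>$\sigma^1_1,..,\sigma^1_p$</belfunction>
<belfunction>$\sigma^2_1,..,\sigma^2_q$</belfunction>
where
1. $\sigma^1_i \in \{\sigma^1_1,..,\sigma^1_p\}$ is of the form <mass value=$\kappa^1_i$>$\psi^1_i$</mass>
2. the (subset, mass) pair collection is $S_1 = \{(\psi^1_1,\kappa^1_1),...,(\psi^1_p,\kappa^1_p)\}$,
3. $\sigma^2_j \in \{\sigma^2_1,..,\sigma^2_q\}$ is of the form <mass value=$\kappa^2_j$>$\psi^2_j$</mass>
4. the (subset, mass) pair collection is $S_2 = \{(\psi^2_1,\kappa^2_1),...,(\psi^2_q,\kappa^2_q)\}$,
Let the combined BelVC be <belfunction>σ1,..,σs</belfunction> where each σk ∈ {σ1,..,σs} is of the form <mass value=κk>ψk</mass> and
\[ \kappa_k = \frac{\sum \kappa^1_i \times \kappa^2_j}{1 - \sum \kappa^1_n \times \kappa^2_m} \]
such that $\psi_k = \psi^1_i \cap \psi^2_j$ for the $(\psi^1_i,\kappa^1_i)$ and $(\psi^2_j,\kappa^2_j)$ pairs, and $\psi^1_n \cap \psi^2_m = \emptyset$ for the $(\psi^1_n,\kappa^1_n)$ and $(\psi^2_m,\kappa^2_m)$ pairs, and ψk is of the form <massitem>φk1</massitem>,...,<massitem>φkz</massitem>.
The value $\kappa_\perp = \sum \kappa^1_n \times \kappa^2_m$ (that is, $\sum_{A \cap B = \emptyset} m_1(A) \times m_2(B)$) indicates how much of the total belief has been committed to the empty set while combining two pieces of uncertain information. A higher $\kappa_\perp$ value reflects either an inconsistency among the two sources or lower confidence in any of the possible outcomes from both sources.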
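As a worked check of Definition 10, the values of Example 1 can be recomputed by hand. Here $m_1$ and $m_2$ are shorthand, introduced only for this illustration, for the mass functions in the Experiment1 and Experiment2 reports.

```latex
\begin{align*}
\kappa_\perp &= m_1(\{water, oil\})\, m_2(\{gas\}) + m_1(\{gas\})\, m_2(\{water\})
             = 0.4 \times 0.8 + 0.6 \times 0.2 = 0.44,\\
m_1 \oplus m_2(\{water\}) &= \frac{0.4 \times 0.2}{1 - 0.44} = \frac{0.08}{0.56} \approx 0.143,\\
m_1 \oplus m_2(\{gas\}) &= \frac{0.6 \times 0.8}{1 - 0.44} = \frac{0.48}{0.56} \approx 0.857.
\end{align*}
```

Almost half of the product mass falls on the empty set, so the two experiments disagree noticeably even though both favour gas.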
Definition 11 Let the abstract terms τ1 and τ2 each denote a BelVC and let X be a logical variable. The predicate Dempster(τ1, τ2, X) is such that X is evaluated to τ3 where τ3 is the abstract term denoting the combined BelVC obtained by Definition 10.
The predicate Dempster(τ1, τ2, X) is defined in Prolog to carry out the actual merge. Looking back at Example 1 again, if we let τ1 and τ2 be the abstract terms for the first two XML documents in the example, then X represents the merged abstract term isomorphic to the third XML document in the example.
3 Merging uncertainty on textentries with compatible frames
In this section, we concentrate on merging structured reports with uncertain information (uncertainty valid components) on textentries where either the uncertainty is expressed at different levels of granularity (which we describe as semantically homogeneous) or on different but inter-related sets of values (which we describe as semantically heterogeneous). We consider both probabilistic and belief function information and take probability theory as a special case of belief function theory. We leave the topic of merging semantically heterogeneous uncertainty-valid components on subtrees from multiple structured reports to the next section.
When merging two structured reports, one with an uncertainty valid component and one without, we take the latter as a special case of the former and assign value 1.0 (no matter whether it stands for a probability value or a mass value) to the corresponding textentry (or textentries). Then, these two structured reports can be merged using one of the rules defined below.
Before proceeding to the details of this logic-based merging technique, we need to emphasize that in this paper any two uncertainty components to be merged are assumed to refer to the same or a related issue (or topic) that is being considered. For instance, both uncertainty components are either about the deposit of layer X of North Sea for Well No A, or about the deposit or lithology of North Sea for Well No A, layer Y. If it is the case that one uncertainty component is about the deposit of North Sea for Well No A and another is about the lithology of North Sea for Well No B, then these two uncertainty components cannot be merged. The method to verify semantically whether two given uncertainty components are eligible for merging is given in [HS04]. In the rest of this paper, whenever we intend to merge two such components, we assume their eligibility has been checked and we will not repeat this prerequisite any further.
3.1 Propagation operation in DS theory
When two mass functions are not given on the same frame, they cannot be combined directly; rather, one mass function has to be propagated to the frame of the other mass function. Let us now look at several situations when this propagation can take place.
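Definition 12 Let $\Omega_1$ and $\Omega_2$ be two frames of discernment and $\Gamma$ be a mapping function $\Gamma: \Omega_1 \to 2^{\Omega_2}$. When the following conditions hold, $\Omega_2$ is called a refinement of $\Omega_1$, and $\Omega_1$ is called a coarsening of $\Omega_2$. $\Gamma$ is called a refinement mapping.
(1) $\Gamma(\varphi) = T_\varphi \neq \emptyset$, for all $\varphi \in \Omega_1$, where $T_\varphi \subseteq \Omega_2$
(2) $\Gamma(\varphi_i) \cap \Gamma(\varphi_j) = \emptyset$, when $i \neq j$
(3) $\bigcup_{\varphi \in \Omega_1} \Gamma(\varphi) = \Omega_2$
Example 2 in Section 1 gives a mass function (we take a probability distribution as a special case of mass function) on frame $\Omega_1$ = {liquid, solid} and another on frame $\Omega_2$ = {water, oil, gas, sand, stone} respectively. $\Omega_2$ is in fact a refinement of $\Omega_1$, if we define the refinement mapping function $\Gamma$ as
$\Gamma$(liquid) = {water, oil, gas}, $\Gamma$(solid) = {sand, stone}.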
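The three conditions of a refinement mapping can be checked mechanically. The following is a minimal sketch, assuming the mapping is held as a Python dictionary from coarse values to sets of fine values; the function name is_refinement is illustrative only.

```python
def is_refinement(gamma, coarse, fine):
    """Check whether gamma (dict: coarse value -> set of fine values) is a refinement mapping."""
    if set(gamma) != set(coarse):
        return False
    images = [set(gamma[phi]) for phi in coarse]
    # Condition (1): every image is a non-empty subset of the finer frame.
    nonempty = all(img and img <= fine for img in images)
    union = set().union(*images)
    # Condition (2): images are pairwise disjoint (no element is counted twice).
    disjoint = sum(len(img) for img in images) == len(union)
    # Condition (3): the images together cover the finer frame.
    covers = union == fine
    return bool(nonempty and disjoint and covers)


coarse = {"liquid", "solid"}
fine = {"water", "oil", "gas", "sand", "stone"}
gamma = {"liquid": {"water", "oil", "gas"}, "solid": {"sand", "stone"}}
print(is_refinement(gamma, coarse, fine))
```

With the mapping given in the text, all three conditions hold.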
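A refinement mapping generates a set of disjoint subsets of the finer frame. Through a refinement mapping $\Gamma$, we can also define a coarsening mapping function $\Gamma': \Omega_2 \to \Omega_1$ as:
$\Gamma'(\psi) = \varphi$ where $\psi \in T_\varphi$ and $\Gamma(\varphi) = T_\varphi$
For instance, the coarsening mapping function of the above refinement mapping function gives
$\Gamma'$(water) = $\Gamma'$(oil) = $\Gamma'$(gas) = liquid, $\Gamma'$(sand) = $\Gamma'$(stone) = solid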
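Lemma 1 Let $\Omega_2$ be a refinement of frame $\Omega_1$ by mapping function $\Gamma$ and let $m_{\Omega_1}$ be a mass function on $\Omega_1$. Function $m_{\Omega_2}$ defined below is a mass function on $\Omega_2$.
$m_{\Omega_2}(T) = m_{\Omega_1}(S)$ where $T = \Gamma(\phi)$ for $\phi \in S$, and $S \subseteq \Omega_1$ is a focal element. (1)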
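Let $\Omega_1$ and $\Omega_2$ be two frames as defined in Example 2 and let
$m_{\Omega_1}(\{\text{liquid}\}) = 0.4$, $m_{\Omega_1}(\{\text{solid}\}) = 0.6$
be a mass function on $\Omega_1$. Applying Lemma 1,
$m_{\Omega_2}(\{\text{water}, \text{oil}, \text{gas}\}) = 0.4$, $m_{\Omega_2}(\{\text{sand}, \text{stone}\}) = 0.6$
is a mass function on $\Omega_2$.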
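Lemma 2 Let $\Omega_1$ be a coarsening of frame $\Omega_2$ by coarsening mapping function $\Gamma'$ and let $m_{\Omega_2}$ be a mass function on $\Omega_2$. Function $m_{\Omega_1}$ defined below is a mass function on $\Omega_1$.
$m_{\Omega_1}(S) = \sum_T m_{\Omega_2}(T)$ where $S = \Gamma'(\psi)$ for $\psi \in T$ and $T \subseteq \Omega_2$ is a focal element. (2)
Again, if we have $m_{\Omega_2}(\{\text{water}, \text{oil}\}) = 0.2$ and $m_{\Omega_2}(\{\text{gas}\}) = 0.8$, then, based on Lemma 2, this mass function generates a mass function on $\Omega_1$ as $m_{\Omega_1}(\{\text{liquid}\}) = 0.2 + 0.8 = 1$.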
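The two lemmas can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function name `propagate`, the dictionary representation of mass functions, and the reading of "T = Γ(φ) for φ ∈ S" as the union of the images of the elements of S are all assumptions made for illustration. The coarsening mapping is wrapped in singleton sets so that one function serves both directions.

```python
from collections import defaultdict


def propagate(mass, mapping):
    """Push a mass function through a set-valued mapping.

    mass:    dict mapping frozenset (focal element) -> mass value
    mapping: dict mapping each source element -> set of target elements
    Assumption: a focal element S is sent to the union of mapping[phi]
    for phi in S; masses of focal elements with equal images are added.
    """
    out = defaultdict(float)
    for focal, value in mass.items():
        image = frozenset().union(*(mapping[phi] for phi in focal))
        out[image] += value
    return dict(out)


# Refinement mapping from {liquid, solid} to the finer deposit frame.
REFINE = {"liquid": {"water", "oil", "gas"}, "solid": {"sand", "stone"}}
# Coarsening mapping derived from it, wrapped in singleton sets.
COARSEN = {psi: {phi} for phi, fine in REFINE.items() for psi in fine}

# Lemma 1: a mass function on the coarse frame moved to the fine frame.
m_coarse = {frozenset({"liquid"}): 0.4, frozenset({"solid"}): 0.6}
m_fine = propagate(m_coarse, REFINE)

# Lemma 2: a mass function on the fine frame moved to the coarse frame.
m_detail = {frozenset({"water", "oil"}): 0.2, frozenset({"gas"}): 0.8}
m_general = propagate(m_detail, COARSEN)

print(m_fine)
print(m_general)
```

Both focal elements of `m_detail` coarsen to {liquid}, so their masses are summed, which mirrors the 0.2 + 0.8 = 1 computation in the text.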
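Now we look at more complex mapping relations between frames.
Definition 13 Let $\Omega_1$ and $\Omega_2$ be two frames of discernment containing possible values to two related questions $Q_1$ and $Q_2$. Let $\Gamma$ be a mapping function $\Gamma: \Omega_1 \to 2^{\Omega_2}$ which defines that whenever $\phi^1_i$ is the true answer to question $Q_1$ then the true answer to question $Q_2$ must be one of the elements in $\Gamma(\phi^1_i) \neq \emptyset$, and for every $\phi^2_j \in \Omega_2$, there exists at least one $\phi^1_i$ such that $\phi^2_j \in \Gamma(\phi^1_i)$. Then frames $\Omega_1$ and $\Omega_2$ are said to be compatible.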
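Mapping $\Gamma$ is referred to as a compatibility mapping [LGS86, LH+93]. Equally, a compatibility mapping can be defined from $\Omega_2$ to $\Omega_1$. A refinement (or coarsening) mapping is a special case of compatibility mapping.
Lemma 3 Let $\Omega_1$ and $\Omega_2$ be two related frames with a compatibility mapping $\Gamma$. Let $m_{\Omega_1}$ be a mass function on $\Omega_1$. Then function $m_{\Omega_2}$ defined below is a mass function on $\Omega_2$.
$m_{\Omega_2}(T) = \sum_S m_{\Omega_1}(S)$ where $T = \Gamma(\phi)$ for $\phi \in S$ and $S \subseteq \Omega_1$ is a focal element. (3)
All three lemmas can be proved easily (e.g., [Sha76]).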
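For instance, the relationship between deposits (captured by $\Omega_2$) and lithologies (captured by $\Omega_3$) can be established through a mapping $\Gamma: \Omega_2 \to 2^{\Omega_3}$ as
$\Gamma(\text{water}) = \{L1, L2\}$, $\Gamma(\text{oil}) = \{L3, L4\}$, $\Gamma(\text{gas}) = \{L2, L5, L6\}$,
$\Gamma(\text{sand}) = \{L8, L9\}$, $\Gamma(\text{stone}) = \{L7, L8\}$.
Or a mapping function $\Gamma': \Omega_3 \to 2^{\Omega_2}$ as
$\Gamma'(L1) = \{\text{water}\}$, $\Gamma'(L2) = \{\text{water}, \text{gas}\}$, $\Gamma'(L3) = \{\text{oil}\}$,
$\Gamma'(L4) = \{\text{oil}\}$, $\Gamma'(L5) = \{\text{gas}\}$, $\Gamma'(L6) = \{\text{gas}\}$,
$\Gamma'(L7) = \{\text{stone}\}$, $\Gamma'(L8) = \{\text{sand}, \text{stone}\}$, $\Gamma'(L9) = \{\text{sand}\}$.
Using this mapping relationship, the uncertain information on $\Omega_3$ in the second XML document in Example 3 can be propagated to $\Omega_2$ to obtain a new mass function on deposit as
$m_{\Omega_3}(\{\text{water}, \text{oil}\}) = 0.3$, $m_{\Omega_3}(\{\text{water}, \text{gas}\}) = 0.7$.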
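As a hedged illustration of compatibility mappings, the Python sketch below inverts the mapping Γ between deposits and lithologies and propagates a mass function along it, following Lemma 3. The helper names `invert` and `propagate` are hypothetical, and the input mass function ({water, gas} with 0.2, {sand} with 0.8) is borrowed from a BelVC used in later examples of this paper purely for illustration; it is not the Example 3 data.

```python
from collections import defaultdict

# Compatibility mapping from deposits to lithologies, as given in the text.
GAMMA = {
    "water": {"L1", "L2"},
    "oil": {"L3", "L4"},
    "gas": {"L2", "L5", "L6"},
    "sand": {"L8", "L9"},
    "stone": {"L7", "L8"},
}


def invert(mapping):
    """Build the reverse mapping: target element -> set of compatible sources."""
    inverse = defaultdict(set)
    for source, targets in mapping.items():
        for target in targets:
            inverse[target].add(source)
    return dict(inverse)


def propagate(mass, mapping):
    """Lemma 3 sketch: send each focal element to the union of its images
    and add up the masses of focal elements that share an image."""
    out = defaultdict(float)
    for focal, value in mass.items():
        image = frozenset().union(*(mapping[phi] for phi in focal))
        out[image] += value
    return dict(out)


gamma_prime = invert(GAMMA)

# Illustrative mass function on the deposit frame (assumed input).
m_deposit = {frozenset({"water", "gas"}): 0.2, frozenset({"sand"}): 0.8}
m_lithology = propagate(m_deposit, GAMMA)

print(gamma_prime["L2"], gamma_prime["L8"])
print(m_lithology)
```

Inverting Γ in this way yields the same assignments as the mapping Γ' listed in the text, for example L2 maps to {water, gas} and L8 maps to {sand, stone}.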
3.2 Predicate for belief propagation on text entries
We now define a formal procedure to perform the propagations discussed in Section 3.1 and define a predicate to call the procedure.
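Definition 14 Let <belfunction> $\sigma^1_1, .., \sigma^1_p$ </belfunction> be a BelVC where
1. $\sigma^1_i \in \{\sigma^1_1, .., \sigma^1_p\}$ is of the form <mass value=$\kappa^1_i$> $\psi^1_i$ </mass>
2. $S = \{(\psi^1_1, \kappa^1_1), ..., (\psi^1_p, \kappa^1_p)\}$ is the collection of (subset, mass) pairs
3. $\Gamma: \Omega_1 \to 2^{\Omega_2}$ is a compatibility mapping and $\Gamma(\psi^1_i) = \Gamma(\phi^1_{i1}) \cup ... \cup \Gamma(\phi^1_{ix})$ where $\psi^1_i = \{\phi^1_{i1}, ..., \phi^1_{ix}\}$
Let the propagated BelVC on $\Omega_2$ be <belfunction> $\sigma^2_1, .., \sigma^2_q$ </belfunction> where each $\sigma^2_j \in \{\sigma^2_1, .., \sigma^2_q\}$ is of the form <mass value=$\kappa^2_j$> $\psi^2_j$ </mass> and
$\kappa^2_j = \sum_i \kappa^1_i$ s.t. $\psi^2_j = \Gamma(\psi^1_i)$ for each $(\psi^1_i, \kappa^1_i)$ pair
and $\psi^2_j$ is of the form <massitem> $\phi^2_{j1}$ </massitem> ··· <massitem> $\phi^2_{jy}$ </massitem>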
Definition 15 Let the abstract term $\tau$ be a BelVC on $\Omega_1$. Let $\Gamma$ be a compatibility mapping $\Gamma: \Omega_1 \to 2^{\Omega_2}$, and $X$ be a logical variable. The predicate Propagate($\tau$, $\Gamma$, $X$) is such that $X$ is evaluated to $\tau'$ where $\tau'$ is the abstract term denoting the propagated BelVC on $\Omega_2$ obtained by Definition 14.
Predicate Propagate($\tau$, $\Gamma$, $X$) can be used to generate a BelVC on a frame from an existing BelVC on another frame, no matter whether the relationship between the two frames is a refinement, a coarsening, or a compatibility relation.
Since we take a ProVC as a special case of BelVCs, the former can easily be converted to the format of the latter as given in [HL04a]. We repeat this definition here.
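Definition 16 Let abstract term $\tau$ be a ProVC <probability> $\sigma_1, .., \sigma_n$ </probability> where each $\sigma_i \in \{\sigma_1, .., \sigma_n\}$ is of the form <prob value=$\kappa$> $\phi$ </prob> where $\kappa \in [0, 1]$ and $\phi$ is a text entry. Then $\tau'$ is the abstract term denoting the BelVC <belfunction> $\sigma'_1, .., \sigma'_n$ </belfunction> where each $\sigma'_i \in \{\sigma'_1, .., \sigma'_n\}$ is of the form <mass value=$\kappa$> <massitem> $\phi$ </massitem> </mass> and $\kappa \in [0, 1]$, and $\phi$ is a text entry.
Definition 17 If the abstract term $\tau$ is a ProVC and $X$ is a logical variable, then BayesBelief($\tau$, $X$) is a predicate such that $X$ is evaluated to $\tau'$ where $\tau'$ is the abstract term denoting the BelVC obtained from $\tau$ by Definition 16.
In an analogous way to Definitions 16 and 17, it is possible to define how a ProSC can be converted into a BelSC.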
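The conversion in Definition 16 amounts to rewrapping each probability entry as a mass on a singleton. A small Python sketch, assuming the element and attribute names `probability`, `prob`, `belfunction`, `mass`, `massitem` and `value` (reconstructed from the definitions, not confirmed markup) and a hypothetical function name `bayes_belief`, could look as follows. The input values mirror the first document of Example 10.

```python
import xml.etree.ElementTree as ET


def bayes_belief(provc_xml):
    """Convert a ProVC (probability element) into a BelVC (belfunction element).

    Each <prob value="k"> phi </prob> becomes
    <mass value="k"><massitem>phi</massitem></mass>.
    """
    probability = ET.fromstring(provc_xml)
    belfunction = ET.Element("belfunction")
    for prob in probability.findall("prob"):
        mass = ET.SubElement(belfunction, "mass", value=prob.get("value"))
        item = ET.SubElement(mass, "massitem")
        item.text = (prob.text or "").strip()
    return belfunction


PROVC = (
    '<probability>'
    '<prob value="0.2"> water </prob>'
    '<prob value="0.8"> sand </prob>'
    '</probability>'
)

belvc = bayes_belief(PROVC)
print(ET.tostring(belvc, encoding="unicode"))
```

The mass values are copied unchanged; only the structure changes, which is why a ProVC can be treated as a special case of a BelVC.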
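Definition 18 Let abstract term $\tau$ be a ProSC <probability> $\sigma_1, .., \sigma_n$ </probability> where each $\sigma_i \in \{\sigma_1, .., \sigma_n\}$ is of the form <prob value=$\kappa_i$> $\psi_i$ </prob> where $\kappa_i \in [0, 1]$ and $\psi_i$ is in the form
<multiitem> <$\rho_{i1}$> $\phi_{i1}$ </$\rho_{i1}$> ... <$\rho_{ix}$> $\phi_{ix}$ </$\rho_{ix}$> </multiitem>
Then $\tau'$ is the abstract term denoting the BelSC <belfunction> $\sigma'_1, .., \sigma'_n$ </belfunction> where each $\sigma'_i \in \{\sigma'_1, .., \sigma'_n\}$ is of the form <mass value=$\kappa_i$> $\psi_i$ </mass> and $\kappa_i \in [0, 1]$, and $\psi_i$ is in the form
<multiitem> <$\rho_{i1}$> $\phi_{i1}$ </$\rho_{i1}$> ... <$\rho_{ix}$> $\phi_{ix}$ </$\rho_{ix}$> </multiitem>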
Definition 19 Let the abstract term $\tau$ be a ProSC and let $X$ be a logical variable. The predicate BayesBelief($\tau$, $X$) is such that $X$ is evaluated to $\tau'$ where $\tau'$ is the abstract term denoting the BelSC obtained from $\tau$ by Definition 18.
Example 10 Let us revisit Example 2. Let $\tau_1$ and $\tau_2$ be the abstract terms for the two XML documents in this example, left and right. Both of the ProVCs can be converted by calling predicates BayesBelief($\tau_1$, $X_1$) and BayesBelief($\tau_2$, $X_2$), where $X_1$ and $X_2$ are ground by abstract terms $\tau'_1$ and $\tau'_2$ respectively, where $\tau'_1$ and $\tau'_2$ are the converted BelVCs represented by the XML documents left and right below (respectively).
Left:
<report>
  <deposit>
    <belfunction>
      <mass value="0.2">
        <massitem> water </massitem>
      </mass>
      <mass value="0.8">
        <massitem> sand </massitem>
      </mass>
    </belfunction>
  </deposit>
</report>
Right:
<report>
  <deposit>
    <belfunction>
      <mass value="0.4">
        <massitem> liquid </massitem>
      </mass>
      <mass value="0.6">
        <massitem> solid </massitem>
      </mass>
    </belfunction>
  </deposit>
</report>
If an agent's query is posed on the concept deposit at the general level, e.g., either answer solid or liquid will be sufficient, then the uncertain information represented by $X_1$ should be propagated to this general frame using predicate Propagate($X_1$, $\Gamma'$, $X_3$) where $\Gamma'$ is a coarsening mapping and $X_3$ is ground to $\tau_3$ as follows.
<belfunction>
  <mass value="0.2">
    <massitem> liquid </massitem>
  </mass>
  <mass value="0.8">
    <massitem> solid </massitem>
  </mass>
</belfunction>
Finally, $\tau_3$ can be combined with $\tau'_2$ using Dempster($X_3$, $X_2$, $X_4$) to obtain the final result, where $X_3$ is ground by $\tau_3$ and $X_2$ is ground by $\tau'_2$. The whole sequence of calls to the Prolog predicates can be summarized as:
BayesBelief($\tau_1$, $X_1$) ∧ BayesBelief($\tau_2$, $X_2$) ∧ Propagate($X_1$, $\Gamma'$, $X_3$) ∧ Dempster($X_3$, $X_2$, $X_4$)
On the other hand, if a query is posed at a more detailed level, then the call to Propagate($X_1$, $\Gamma'$, $X_3$) is replaced by Propagate($X_2$, $\Gamma$, $X_3$), where the mass function on the general frame is propagated to the finer frame through a refinement mapping $\Gamma$. In this case, the sequence of predicate calls is revised as:
BayesBelief($\tau_1$, $X_1$) ∧ BayesBelief($\tau_2$, $X_2$) ∧ Propagate($X_2$, $\Gamma$, $X_3$) ∧ Dempster($X_1$, $X_3$, $X_4$)
Example 11 Consider the following three uncertainty valid components, where $\tau_1$, $\tau_2$ are the abstract terms of the left and right BelVCs, and $\tau_3$ is the corresponding abstract term for the ProVC.
Left BelVC:
<belfunction>
  <mass value="0.2">
    <massitem> water </massitem>
    <massitem> gas </massitem>
  </mass>
  <mass value="0.8">
    <massitem> sand </massitem>
  </mass>
</belfunction>
Right BelVC:
<belfunction>
  <mass value="0.4">
    <massitem> liquid </massitem>
  </mass>
  <mass value="0.6">
    <massitem> solid </massitem>
  </mass>
</belfunction>
ProVC:
<probability>
  <prob value="0.2"> water </prob>
  <prob value="0.8"> gas </prob>
</probability>
Let $Q$ be an agent's query about the possible deposit of a certain location at the most general level. $Q$ can be answered by the following sequence of predicate calls.
Propagate($\tau_1$, $\Gamma'$, $X_1$) ∧ Dempster($X_1$, $\tau_2$, $X_2$)
∧ BayesBelief($\tau_3$, $X_3$) ∧ Propagate($X_3$, $\Gamma'$, $X_4$) ∧ Dempster($X_2$, $X_4$, $X_5$)
Here, Propagate($\tau_1$, $\Gamma'$, $X_1$) ∧ Dempster($X_1$, $\tau_2$, $X_2$) takes the converted BelVC (moved from the detailed frame to the general frame through the coarsening mapping $\Gamma'$) as its first argument and combines it with the second BelVC (on the right-hand side) to produce a merged result denoted by $X_2$. This newly generated BelVC is then combined, using predicate Dempster, with another converted BelVC denoted by variable $X_4$ (obtained from the ProVC by first applying the condition literal BayesBelief($\tau_3$, $X_3$) and then propagating the result to the general frame) to obtain the final result, which is denoted by $X_5$.
<belfunction>
  <mass value="0.039">
    <massitem> liquid </massitem>
  </mass>
  <mass value="0.960">
    <massitem> solid </massitem>
  </mass>
</belfunction>
All three sources have a higher confidence in choice {solid} than in {liquid}, so the combined result gives a higher confidence in the choice preferred by all of them and a lower confidence in the less preferred one. This is due to the fact that these sources are in agreement with each other. Therefore, when multiple sources are not in conflict, merging them will produce a more complete and comprehensive solution than the individual sources. Methods for detecting inconsistencies among multiple sources are provided in [HL04a] and we will not discuss them further here.
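The chain of predicate calls in Example 11 can be traced numerically with the Python sketch below. It is an illustration under stated assumptions, not the paper's Prolog code: the function names are hypothetical, Dempster's rule is written in its standard normalized form, and the third source is assumed to assign 0.2 to water and 0.8 to sand. The ProVC as printed lists water and gas, which both coarsen to liquid and would not agree with the stated outcome, so the sand reading is an assumption made only so that all three sources favour {solid}. With exact arithmetic the sketch gives 0.04 and 0.96, close to the printed 0.039 and 0.960; the small gap is consistent with rounding of the intermediate result.

```python
from collections import defaultdict


def propagate(mass, mapping):
    """Send each focal element to the union of its images and add masses."""
    out = defaultdict(float)
    for focal, value in mass.items():
        image = frozenset().union(*(mapping[phi] for phi in focal))
        out[image] += value
    return dict(out)


def bayes_belief(probabilities):
    """Treat a probability distribution as masses on singletons."""
    return {frozenset({phi}): p for phi, p in probabilities.items()}


def dempster(m_a, m_b):
    """Dempster's rule: multiply masses, intersect focal elements,
    discard empty intersections and renormalize."""
    joint = defaultdict(float)
    conflict = 0.0
    for a, value_a in m_a.items():
        for b, value_b in m_b.items():
            common = a & b
            if common:
                joint[common] += value_a * value_b
            else:
                conflict += value_a * value_b
    if conflict >= 1.0:
        raise ValueError("totally conflicting sources cannot be combined")
    return {focal: value / (1.0 - conflict) for focal, value in joint.items()}


# Coarsening mapping, wrapped in singleton sets.
COARSEN = {
    "water": {"liquid"}, "oil": {"liquid"}, "gas": {"liquid"},
    "sand": {"solid"}, "stone": {"solid"},
}

tau1 = {frozenset({"water", "gas"}): 0.2, frozenset({"sand"}): 0.8}
tau2 = {frozenset({"liquid"}): 0.4, frozenset({"solid"}): 0.6}
tau3 = {"water": 0.2, "sand": 0.8}  # ASSUMPTION: "sand", not "gas" as printed

x1 = propagate(tau1, COARSEN)   # Propagate(tau1, Gamma', X1)
x2 = dempster(x1, tau2)         # Dempster(X1, tau2, X2)
x3 = bayes_belief(tau3)         # BayesBelief(tau3, X3)
x4 = propagate(x3, COARSEN)     # Propagate(X3, Gamma', X4)
x5 = dempster(x2, x4)           # Dempster(X2, X4, X5)

print(x5)
```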
4 Merging uncertain information on subtrees
To develop predicates for merging subtree uncertainty components, we need to look at approaches to propagating mass functions among compound frames, since a subtree uncertainty component contains two or more frames of discernment whilst a query may only be related to one of them. The first subsection below looks into the techniques of mass function propagation in this situation; it is followed by a subsection on predicates to merge subtree uncertainty components.
4.1 Extension and projection operations in DS theory
The concept of compatible frames (or compatibility relations) can be extended to situations where a frame is in fact a Cartesian product (or a subset of the product) of several frames.
Definition 20 Let $\Omega_i$, $i = 1, ..., n$ be $n$ frames of discernment each of which contains mutually exclusive and exhaustive solutions to a related question or a variable. Frame $\Omega = \prod_i \Omega_i$ is a joint frame containing solutions to the joint question or the joint variable.
For instance, let $\Omega_1$ be a frame containing values (answers) to question "What deposit is it?", and $\Omega_2$ be a frame containing values (answers) to question "What lithology is it?", then $\Omega_1 \times \Omega_2$ is the frame containing values for the joint question "What deposit and what lithology is it?" with values that are in the form $\langle \omega_{1i}, \omega_{2j} \rangle$ where $\omega_{1i} \in \Omega_1$ and $\omega_{2j} \in \Omega_2$. If some of the pairs $\langle \omega_{1i}, \omega_{2j} \rangle$ are false, that is, $\omega_{1i}$ and $\omega_{2j}$ are not compatible, $\Omega_1 \times \Omega_2$ is then a proper subset of the set product consisting of only those pairs with compatible elements from individual frames. Values of a frame are also referred to as configurations of the question or variable associated with the frame. For example, if we let $g$ and $h$ be two variables that can take values from $\Omega_1$ and $\Omega_1 \times \Omega_2$ respectively, then value water is a configuration of $g$ and value $\langle \text{water}, L1 \rangle$ is a configuration of $h$.
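Definition 21 [LHA03] Let $V = \{r_1, r_2, ..., r_n\}$ be $n$ variables each of which has a set of configurations represented by its associated frame of discernment $\Omega_i$. Let $V_p \subseteq V$ and $V_q \subseteq V$ be two subsets of variables where $V_p \subseteq V_q$, and let $\Omega_{V_p} = \prod_{r_i \in V_p} \Omega_i$ and $\Omega_{V_q} = \prod_{r_j \in V_q} \Omega_j$ be two joint frames for them. Let $Q \subseteq \Omega_{V_q}$ be a set of configurations of $V_q$. Then, the projection of $Q$ to $\Omega_{V_p}$, denoted by $Q^{\downarrow V_p}$, is a set of configurations for $V_p$. Similarly, let $H$ be a subset of $\Omega_{V_p}$, then the extension of $H$ to $\Omega_{V_q}$, denoted by $H^{\uparrow V_q}$, is $H \times \Omega_{V_q \setminus V_p}$, which is a set of configurations for variable set $V_q$.
$V_p$, a subset of variables, is also referred to as a joint variable. In the following, we talk about a subset of variables or a joint variable interchangeably without further explanation. In either case, $\Omega_{V_p}$ is the full set of configurations for it.
Example 12 Assume $r_1$, $r_2$, $r_3$, and $r_4$ are four variables taking values from frames of discernment $\Omega_i$, $i = 1, 2, 3, 4$ respectively, where $\Omega_1 = \{\omega_{11}, \omega_{12}\}$, $\Omega_2 = \{\omega_{21}, \omega_{22}, \omega_{23}\}$, $\Omega_3 = \{\omega_{31}, \omega_{32}, \omega_{33}\}$, and $\Omega_4 = \{\omega_{41}, \omega_{42}, \omega_{43}, \omega_{44}\}$. Let $V_p = \{r_1, r_2\}$ and $V_q = \{r_1, r_2, r_3\}$ be two subsets of variables and $Q = \{\langle \omega_{11}, \omega_{21}, \omega_{31} \rangle, \langle \omega_{12}, \omega_{23}, \omega_{31} \rangle\}$ be a set of configurations for $V_q$, then $Q^{\downarrow V_p} = \{\langle \omega_{11}, \omega_{21} \rangle, \langle \omega_{12}, \omega_{23} \rangle\}$ is a set of configurations for $V_p$.
However, given a set of configurations $H = \{\langle \omega_{11}, \omega_{21} \rangle, \langle \omega_{12}, \omega_{23} \rangle\}$ for $V_p$, its extension to variable set $V_q$ would be $Q' = H^{\uparrow V_q} = \{\langle \omega_{11}, \omega_{21}, \omega_{31} \rangle, \langle \omega_{12}, \omega_{23}, \omega_{31} \rangle, \langle \omega_{11}, \omega_{21}, \omega_{32} \rangle, \langle \omega_{12}, \omega_{23}, \omega_{32} \rangle, \langle \omega_{11}, \omega_{21}, \omega_{33} \rangle, \langle \omega_{12}, \omega_{23}, \omega_{33} \rangle\}$. This set of configurations is different from $Q$ although the projection of $Q$ is $H$ too.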
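Example 12 can be replayed with a short Python sketch. The names `project`, `extend` and `FRAMES`, the tuple encoding of configurations, and the ASCII labels such as `w11` standing for ω11 are assumptions introduced for illustration only.

```python
from itertools import product

# Frames of the three variables involved; "w11" stands for omega_11, etc.
FRAMES = {
    "r1": ["w11", "w12"],
    "r2": ["w21", "w22", "w23"],
    "r3": ["w31", "w32", "w33"],
}


def project(configs, variables, target):
    """Projection: keep only the coordinates of the variables in target."""
    keep = [variables.index(v) for v in target]
    return {tuple(config[i] for i in keep) for config in configs}


def extend(configs, variables, target, frames):
    """Extension: pair every configuration with every value combination
    of the variables that are in target but not in variables."""
    extra = [v for v in target if v not in variables]
    result = set()
    for config in configs:
        known = dict(zip(variables, config))
        for values in product(*(frames[v] for v in extra)):
            full = {**known, **dict(zip(extra, values))}
            result.add(tuple(full[v] for v in target))
    return result


Vp = ["r1", "r2"]
Vq = ["r1", "r2", "r3"]
Q = {("w11", "w21", "w31"), ("w12", "w23", "w31")}

H = project(Q, Vq, Vp)
Q_ext = extend(H, Vp, Vq, FRAMES)

print(sorted(H))
print(len(Q_ext), Q_ext == Q)
```

The extension contains six configurations while Q contains two, so projecting and then extending does not recover Q in general, which is the point the example makes.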
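Definition 22 Let $V_p \subseteq V$ and $V_q \subseteq V$ be two subsets of variables where $\emptyset \neq V_p \subseteq V_q$. Let $m$ be a mass function on $\Omega_{V_q}$ for the joint variable $V_q$, then the marginal of $m$ on $\Omega_{V_p}$ for the joint variable $V_p$, denoted by $m^{\downarrow V_p}$ and defined below, is a mass function on $\Omega_{V_p}$
$m^{\downarrow V_p}(H) = \sum \{ m(G) \mid G \subseteq \Omega_{V_q},\ G^{\downarrow V_p} = H,\ G \text{ is a focal element} \}$
Equally, if $m$ is a mass function on $\Omega_{V_p}$ for the joint variable $V_p$, then the marginal of $m$ on $\Omega_{V_q}$ for the joint variable $V_q$, denoted by $m^{\uparrow V_q}$ and defined below, is a mass function on $\Omega_{V_q}$
$m^{\uparrow V_q}(G) = \sum \{ m(H) \mid H \subseteq \Omega_{V_p},\ H^{\uparrow V_q} = G,\ H \text{ is a focal element} \}$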
Example 13 Consider the following two uncertainty components.
Left:
<belfunction>
  <mass value="0.2">
    <massitem> water </massitem>
    <massitem> gas </massitem>
  </mass>
  <mass value="0.8">
    <massitem> sand </massitem>
  </mass>
</belfunction>
Right:
<belfunction>
  <mass value="0.4">
    <multiitem>
      <deposit> water </deposit>
      <lithology> L1 </lithology>
    </multiitem>
    <multiitem>
      <deposit> oil </deposit>
      <lithology> L3 </lithology>
    </multiitem>
  </mass>
  <mass value="0.6">
    <multiitem>
      <deposit> gas </deposit>
      <lithology> L2 </lithology>
    </multiitem>
  </mass>
</belfunction>
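The left-hand XML document defines a mass function on frame $\Omega_2 = \{\text{water}, \text{oil}, \text{gas}, \text{sand}, \text{stone}\}$ as $m_{\Omega_2}(\{\text{water}, \text{gas}\}) = 0.2$ and $m_{\Omega_2}(\{\text{sand}\}) = 0.8$, and the right-hand XML document defines another mass function on frame $\Omega_2 \times \Omega_3$, where $\Omega_3 = \{L1, L2, L3, L4, L5\}$, as $m_{\Omega_2 \times \Omega_3}(\{\langle \text{water}, L1 \rangle, \langle \text{oil}, L3 \rangle\}) = 0.4$ and $m_{\Omega_2 \times \Omega_3}(\{\langle \text{gas}, L2 \rangle\}) = 0.6$. Assume an agent is interested in knowing the joint impact of these two pieces of evidence on the value set $\Omega_2$; then the mass function on $\Omega_2 \times \Omega_3$ has to be marginalized on $\Omega_2$. Based on Definition 22, $m_{\Omega_2 \times \Omega_3}$ gives a new mass function $m'_{\Omega_2}$ on $\Omega_2$ as $m'_{\Omega_2}(\{\text{water}, \text{oil}\}) = 0.4$ and $m'_{\Omega_2}(\{\text{gas}\}) = 0.6$, which can be merged with $m_{\Omega_2}$ using Dempster($\tau_1$, $\tau_2$, $X$) to obtain the final result $m(\{\text{water}\}) = 0.4$ and $m(\{\text{gas}\}) = 0.6$, if we assume that these two pieces of evidence are from independent sources.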
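The marginalization and merging steps of Example 13 can be checked with the following Python sketch. The function names, the tuple encoding of joint configurations and the unwrapping of one-variable configurations into plain values are assumptions for illustration; Dempster's rule is written in its standard normalized form.

```python
from collections import defaultdict


def marginalize(mass, variables, target):
    """Project every focal element onto the target variables and add up
    the masses of focal elements with the same projection."""
    keep = [variables.index(v) for v in target]
    out = defaultdict(float)
    for focal, value in mass.items():
        if len(keep) == 1:
            # One target variable: use plain values instead of 1-tuples.
            projected = frozenset(config[keep[0]] for config in focal)
        else:
            projected = frozenset(tuple(config[i] for i in keep) for config in focal)
        out[projected] += value
    return dict(out)


def dempster(m_a, m_b):
    """Dempster's rule with normalization over non-empty intersections."""
    joint = defaultdict(float)
    conflict = 0.0
    for a, value_a in m_a.items():
        for b, value_b in m_b.items():
            common = a & b
            if common:
                joint[common] += value_a * value_b
            else:
                conflict += value_a * value_b
    if conflict >= 1.0:
        raise ValueError("totally conflicting sources cannot be combined")
    return {focal: value / (1.0 - conflict) for focal, value in joint.items()}


# Right-hand component: mass function on deposit x lithology.
m_joint = {
    frozenset({("water", "L1"), ("oil", "L3")}): 0.4,
    frozenset({("gas", "L2")}): 0.6,
}
# Left-hand component: mass function on deposit only.
m_deposit = {frozenset({"water", "gas"}): 0.2, frozenset({"sand"}): 0.8}

m_marginal = marginalize(m_joint, ["deposit", "lithology"], ["deposit"])
merged = dempster(m_deposit, m_marginal)

print(m_marginal)
print(merged)
```

The mass 0.8 on {sand} conflicts with both focal elements of the marginal, so it is removed by normalization, leaving 0.08 and 0.12 rescaled to 0.4 and 0.6.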
4.2 Predicate for belief marginalization on subtrees
Now we provide a procedure that implements the marginalization of a mass function from a larger variable set to a smaller one, as defined in Definition 22.
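Definition 23 Let the following be a BelSC <belfunction> $\sigma^1_1, .., \sigma^1_q$ </belfunction> where
1. $\sigma^1_i \in \{\sigma^1_1, .., \sigma^1_q\}$ is of the form <mass value=$\kappa^1_i$> $\psi^1_i$ </mass>
2. $\psi^1_i$ is of the form <multiitem> ... </multiitem> ... <multiitem> ... </multiitem>, consisting of $n$ multiitem elements
3. the content of the $t$-th multiitem element of $\psi^1_i$ is of the form <$\rho^1_{it1}$> $\phi^1_{it1}$ </$\rho^1_{it1}$>, ..., <$\rho^1_{itl}$> $\phi^1_{itl}$ </$\rho^1_{itl}$>
4. and $\rho^1_{it1}, ..., \rho^1_{itl}$ are tag names, and $\phi^1_{it1}, ..., \phi^1_{itl}$ are text entries.
Let the variable set associated with it be $V_q$ with configurations in $\Omega_{V_q}$. Let $V_p \subseteq V_q$.
When $|V_p| > 1$, let the marginalized BelSC on $\Omega_{V_p}$ be
<belfunction> $\sigma^2_1, .., \sigma^2_p$ </belfunction>
where each $\sigma^2_j \in \{\sigma^2_1, .., \sigma^2_p\}$ is of the form <mass value=$\kappa^2_j$> $\psi^2_j$ </mass>
and each $\psi^2_j$ is of the form <multiitem> ... </multiitem> ... <multiitem> ... </multiitem>, consisting of $m$ multiitem elements,
and the content of the $k$-th multiitem element of $\psi^2_j$ is of the form
<$\rho^2_{jk1}$> $\phi^2_{jk1}$ </$\rho^2_{jk1}$>, ..., <$\rho^2_{jkf}$> $\phi^2_{jkf}$ </$\rho^2_{jkf}$>
and
$\kappa^2_j = \sum_i \kappa^1_i$, s.t. $\{\langle \phi^1_{it1}, ..., \phi^1_{itl} \rangle\}^{\downarrow V_p} = \{\langle \phi^2_{jk1}, ..., \phi^2_{jkf} \rangle\}$
When $|V_p| = 1$, let the marginalized BelVC on $\Omega_{V_p}$ be
<$\rho^2$> <belfunction> $\sigma^2_1, .., \sigma^2_p$ </belfunction> </$\rho^2$>
where each $\sigma^2_j \in \{\sigma^2_1, .., \sigma^2_p\}$ is of the form <mass value=$\kappa^2_j$> $\psi^2_j$ </mass>
and each $\psi^2_j$ is of the form
<massitem> $\phi^2_{j1}$ </massitem>, ..., <massitem> $\phi^2_{jm}$ </massitem>
and
$\kappa^2_j = \sum_i \kappa^1_i$ such that $\{\langle \phi^1_{it1}, ..., \phi^1_{itl} \rangle\}^{\downarrow V_p} = \{\phi^2_{jz}\}$ and $\phi^2_{jz} \in \{\phi^2_{j1}, ..., \phi^2_{jm}\}$
and $\rho^2$ is a tag name that is associated with the set of values in $\Omega_{V_p}$.
When $|V_p| = 1$, that is, there is only one variable in set $V_p$, a BelSC is reduced to a BelVC.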
Definition 24 Let the abstract term $\tau$ be a BelSC on a subtree with variable set $V_q$. Let $V_p$ be a subset of $V_q$ and $X$ be a logical variable. The predicate PropagateTree($\tau$, $V_p$, $X$) is such that $X$ is evaluated to $\tau'$ where $\tau'$ is the abstract term denoting the propagated BelSC (or BelVC) on $V_p$ obtained by Definition 23.
Example 14 Let $\tau$ denote the BelSC in Example 13. Applying predicate PropagateTree($\tau$, {deposit}, $X$), we obtain a new BelVC as
<deposit>
  <belfunction>
    <mass value="0.4">
      <massitem> water </massitem>
      <massitem> oil </massitem>
    </mass>
    <mass value="0.6">
      <massitem> gas </massitem>
    </mass>
  </belfunction>
</deposit>
Since there is only one variable to project on when using this predicate, the subtree structure is reduced to a BelVC on a text entry.
Similar to the procedure and predicate above, it is possible to define another procedure and predicate to marginalize a mass function from $\Omega_{V_p}$ to $\Omega_{V_q}$ through an extension operation. However, obtaining a mass function on a larger frame (with more variables) is not as useful as the projection operation, which derives a mass function on a smaller frame; therefore we do not include these detailed definitions in this paper.
5 Comparison with related approaches
In [NJ02], a probabilistic approach is taken to model and reason with uncertain information at different levels of tags in a single XML document. Using this model, we can construct an XML report as illustrated below. Two types of probability assignments are distinguished: mutually exclusive and not mutually exclusive. For the first type, probabilities are assigned to single atoms where only one of these atoms can be true, and the total sum of probability values is less than or equal to 1 (as for precipitation). For the second type, two single atoms can be compatible, so the total sum of probabilities can be greater than 1 (as for cities).
<report>
  <source> TV1 </source>
  <date> 19/3/02 </date>
  <cities>
    <city Prob="0.7">
      <cityName> London </cityName>
      <precipitation>
        <Dist type="mutually exclusive">
          <Val Prob="0.1"> sunny </Val>
          <Val Prob="0.7"> rain </Val>
        </Dist>
      </precipitation>
    </city>
    <city Prob="0.4">
      <cityName> Greater London </cityName>
      <precipitation>
        <Dist type="mutually exclusive">
          <Val Prob="0.2"> sunny </Val>
          <Val Prob="0.6"> rain </Val>
        </Dist>
      </precipitation>
    </city>
  </cities>
</report>
This model allows probabilities to be assigned at multiple granularities. When this occurs, the probability that an element is true is conditioned upon the existence of its parent (with its own probability), and so on up to the root of the tree. For example, if we would like to know the probability of sunny in London, we have
Prob(precipitation=sunny ∧ cityName=London)
= Prob(precipitation=sunny) × Prob(cityName=London) × Prob(precipitation=sunny ∧ cityName=London | city) × Prob(city | cities) × Prob(cities | report) × Prob(report)
= 0.1 × 1.0 × 0.7 × 1.0 × 1.0 × 1.0 = 0.07
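The path-product computation described for [NJ02] can be sketched in Python as follows. This is an illustration rather than the method of [NJ02] itself: the element and attribute names (`city`, `Val`, `Prob` and so on) are reconstructed from the report shown in the text, elements without a `Prob` attribute are assumed to carry probability 1.0, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

REPORT = """
<report>
  <source>TV1</source>
  <date>19/3/02</date>
  <cities>
    <city Prob="0.7">
      <cityName>London</cityName>
      <precipitation>
        <Dist type="mutually exclusive">
          <Val Prob="0.1">sunny</Val>
          <Val Prob="0.7">rain</Val>
        </Dist>
      </precipitation>
    </city>
    <city Prob="0.4">
      <cityName>Greater London</cityName>
      <precipitation>
        <Dist type="mutually exclusive">
          <Val Prob="0.2">sunny</Val>
          <Val Prob="0.6">rain</Val>
        </Dist>
      </precipitation>
    </city>
  </cities>
</report>
"""


def path_products(element, inherited=1.0, path=()):
    """Yield (path, probability) for every element, where probability is the
    product of the Prob attributes from the root down to that element
    (a missing Prob attribute is assumed to mean 1.0)."""
    own = inherited * float(element.get("Prob", "1.0"))
    here = path + (element,)
    yield here, own
    for child in element:
        yield from path_products(child, own, here)


def probability_of(root, city_name, weather):
    """Probability that the named city has the given precipitation value."""
    for path, probability in path_products(root):
        leaf = path[-1]
        if leaf.tag != "Val" or (leaf.text or "").strip() != weather:
            continue
        cities = [e for e in path if e.tag == "city"]
        if cities and (cities[-1].findtext("cityName") or "").strip() == city_name:
            return probability
    return 0.0


root = ET.fromstring(REPORT)
p_sunny = probability_of(root, "London", "sunny")
p_rain = probability_of(root, "London", "rain")
print(round(p_sunny, 2), round(p_rain, 2))
```

Each value is the product of the probabilities met on the branch from the root to the matching leaf, which is the conditioning on ancestors described above.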
Therefore, the probability associated with a text entry (at any level) is treated as the conditional probability under its parent. A query is answered by tracing the relevant branches with the text entries specified by the query, and calculating probabilities using the conditional probabilities along these branches. These derived probabilities are then either multiplied or added depending on whether the "and" or the "or" operation is used in the original query. For instance, the query "London is either sunny or rainy on 19/3/02" is evaluated as:
Prob(cityName=London ∧ ((precipitation=sunny) ∨ (precipitation=rain)))
= Prob(cityName=London) × Prob(precipitation=sunny) × Prob(cityName=London ∧ precipitation=sunny | city) × Prob(city | cities) × Prob(cities | report) × Prob(report)
+ Prob(cityName=London) × Prob(precipitation=rain) × Prob(cityName=London ∧ precipitation=rain | city) × Prob(city | cities) × Prob(cities | report) × Prob(report)
= (1.0 × 0.1 × 0.7 × 1.0 × 1.0 × 1.0) + (1.0 × 0.7 × 0.7 × 1.0 × 1.0 × 1.0) = 0.07 + 0.49 = 0.56.
The main advantage of this model is that it allows probabilities to be assigned to multiple levels of subtrees and provides a means to calculate the joint probability from them. However, it does not merge multiple probabilistic XML documents on the same issue. In contrast, our uncertainty XML model focuses on multiple XML datasets and provides a set of means to merge opinions with uncertainty from different sources on text entries and subtrees. Therefore, our modelling and reasoning method is more general than that in [NJ02].
Figure 1: A probabilistic XML tree.
Another method to model and reason with probabilistic XML information is reported in [KKA05]. In that paper, three types of tags are identified: (1) tags that stand for probabilities; (2) tags that stand for possible values associated with probabilities; and (3) ordinary tag names. A tree structure including these notations is illustrated in Figure 1 [KKA05].
Since the authors of the paper did not provide the actual XML structure for the example (or any other examples) to show explicitly how these types of tags are represented, we created an XML document for this example based on our own understanding, as demonstrated in Figure 2 left. As we can see, there is a lot of redundant information in this XML document; for example, the tags related to possible values are not strictly required, since a possible-value tag will always sit between a probability tag and a normal tag. This example can be equivalently represented in our ProSC format with a more compact structure, as shown in Figure 2 right.
Apart from the apparent structural differences between the approach in [KKA05] and ours, the real difference lies in the merging process itself. In [KKA05], each (tag, value) pair and each combination of these pairs is treated as a possible world. Merging two probabilistic XML documents generates all the combinations of possible worlds from the two documents. As a consequence, there can be a huge number of branches in the merged XML document and there can be many variants of the document. For instance, one example given in the paper consists of two simple XML documents about persons with certainty (no probabilities). One document has details for four persons, where each person has tags firstname, lastname, phone, room and associated values; the other document has details for two persons with the same set of tag names and corresponding values. Interestingly, merging these two simple documents in [KKA05] generates 3201 possible worlds, which results in a very large and complex tree. Most of