Merging uncertain information with semantic heterogeneity in XML. Knowledge and Information


Semi-structured information in XML can be merged in a logic-based framework [Hun02, Hun02b]. This framework has been extended to deal with uncertainty, in the form of probability values, degrees of beliefs, or necessity measures, associated with leaves (i.

Merging Uncertain Information with Semantic Heterogeneity in XML

Anthony Hunter and Weiru Liu

March 11, 2005

Abstract

Semi-structured information in XML can be merged in a logic-based framework [Hun02, Hun02b]. This framework has been extended to deal with uncertainty, in the form of probability values, degrees of beliefs, or necessity measures, associated with leaves (i.e., textentries) in the XML documents [HL04a]. In this paper we further extend this approach to modelling and merging uncertain information that is defined at different levels of granularity of XML textentries, and to modelling and reasoning with XML documents that contain semantically heterogeneous uncertain information on more complex elements in XML subtrees. We present the formal definitions for modelling, propagating and merging semantically heterogeneous uncertain information and explain how they can be handled using logic-based fusion techniques.

1 Introduction

With XML fast emerging as the dominant standard for representing and exchanging information over the web, the need for modelling uncertainty in the information has begun to be addressed. In [NJ02], a probabilistic approach is taken to model and reason with uncertain information at different levels of tags in a single XML document. The final probability of the value of a specific tag is calculated via multiple conditional probabilities on its ancestors' tags. In another approach [KKA05], probability values are also attached to tags, but it requires that the probabilities of a set of values associated with a single tag must sum to 1.0, a condition that was not required in [NJ02]. A simple merging method is also provided to integrate two probabilistic XML trees in [KKA05], whilst [NJ02] did not consider multiple XML documents. Since [KKA05] does not use much of the background knowledge to verify the probabilistic XML documents before merging, even two simple XML files as input can produce a huge number of possible XML documents as output (see Conclusion for details), which makes the method difficult to use in practice.

In contrast, our approach to modelling, reasoning, and merging XML documents with uncertain information ([HL04a]) concerns information within the logical fusion framework [HS04] where background knowledge can provide additional information to facilitate merging and reduce redundancy and inconsistency among information. In this paper, we focus on structured reports. The format of a structured report is an XML document where the tagnames provide the semantic structure and coherence to the document and the textentries (i.e. leaves) are restricted to (1) individual words or simple phrases from a scientific nomenclature/terminology and (2) individual numerical values with units. For instance, a structured report on deposits of a particular underground location can be represented using the tagname deposit with textentries such as water, oil, gas, and sand, etc.

Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK

School of Computer Science, Queen's University Belfast, Belfast, Co Antrim BT7 1NN, UK


Example 1 Consider the following two structured reports which are for the same area being explored. Both of them define a mass function on the textentry deposit.

<report>
  <source>Experiment1</source>
  <date>19/3/02</date>
  <location>North Sea</location>
  <layer>layer7: 100m-120m</layer>
  <deposit>
    <belfunction>
      <mass value="0.4">
        <massitem>water</massitem>
        <massitem>oil</massitem>
      </mass>
      <mass value="0.6">
        <massitem>gas</massitem>
      </mass>
    </belfunction>
  </deposit>
</report>

<report>
  <source>Experiment2</source>
  <date>19 March 2002</date>
  <location>North Sea</location>
  <layer>layer7: 100m-120m</layer>
  <deposit>
    <belfunction>
      <mass value="0.2">
        <massitem>water</massitem>
      </mass>
      <mass value="0.8">
        <massitem>gas</massitem>
      </mass>
    </belfunction>
  </deposit>
</report>

Let $\tau_1$, $\tau_2$ be two logical terms that represent the two XML documents above, and let $X$ be a variable. A fusion predicate Dempster($\tau_1$, $\tau_2$, $X$), defined later in Section 2, takes these two XML documents as inputs and generates a merged structured report that grounds $X$ with the combined mass function segment as shown below.

<report>
  <source>Exp1 and Exp2</source>
  <date>19/3/02</date>
  <location>North Sea</location>
  <layer>layer7: 100m-120m</layer>
  <deposit>
    <belfunction>
      <mass value="0.143">
        <massitem>water</massitem>
      </mass>
      <mass value="0.857">
        <massitem>gas</massitem>
      </mass>
    </belfunction>
  </deposit>
</report>

In our approach, each structured report can isomorphically be represented as a logical term: each tagname is a function symbol, and each textentry is a constant symbol. Furthermore, subtrees of a structured report can be isomorphically represented as subterms in logic. In this way, the information in each structured report can be captured in a logical language. We have also defined a range of predicates, in a Prolog knowledgebase, that capture useful relationships between structured reports, and so a set of them can then be analysed or merged as Prolog queries to a Prolog knowledgebase. In this way, a query to merge some structured reports can be handled by recursive calls to Prolog to merge the subtrees in the structured reports. This gives a context-dependent logic-based approach to merging that is sensitive to the uncertain information in the structured reports and to the background knowledge in the Prolog knowledgebase.


In [HL04a], a method to model and merge uncertain information, represented by probabilities, mass functions in the Dempster-Shafer theory of evidence (DS theory) [Sha76] and necessity measures in possibility theory [DP88], was proposed. Example 1 illustrates how a mass function can be encoded into XML format and how two mass functions on the same set of values can be merged to produce a combined XML document. Details of the formal definition and merging procedure will be reviewed in Section 2.

Here, in this and subsequent examples, we use some simplified data from the petroleum exploration domain. The main purpose of petroleum exploration is to analyse qualitatively and calculate quantitatively the well logging data in order to predict the possible deposits in particular locations. The well logging data are digital records which can reflect the underground physical features, for instance, electronic resistance, micro-electrode resistance, natural gamma ray, etc. They are collected by well logging equipment inside the well from the ground level to some depth underground. The whole depth from the ground level to the bottom of the well is divided into layers (such as 100 meters to 150 meters) based on the digital data collected, and the values of these physical features can give indications of layers with possible deposits. The first two XML documents in Example 1 show how an expert can predict a possible deposit of a particular layer by examining the digital data of the layer. Since the equipment used is subject to noise and inaccuracy, multiple experiments are needed in order to make an accurate prediction. Furthermore, the general analysis of the broader area of the physical features of the location often provides some additional information for prediction. This knowledge can equally be represented as XML documents and be used to assist prediction when necessary.

The main focus of [HL04a] is the modelling and merging of uncertain information associated with textentries in XML documents. Multiple pieces of uncertain information concerning the same issue (such as deposit in the above example) are assumed to be specified on the same set of possible values. However, [HL04a] does not consider situations where one piece of information uses more specific values than another, nor the situation where one piece of information is described on one set of values and another is on a different set of values where these two sets of values are inter-connected.

We elaborate this issue further here. Assume that for a targeted layer of a specific well of a particular area, we only wish to conclude whether the layer contains either solid or liquid materials, regardless of the details of the substance. Then we use a set of values {solid, liquid} to bear any information we have about the layer. However, we could make this information more specific by giving different types of solid and liquid substances, such as stone, sand, water, gas, oil. Therefore some uncertain information can be described on this detailed set of values {stone, sand, water, gas, oil}. This latter set of values has a finer granularity than the former one. Furthermore, since possible deposits of a layer are often drawn through interpreting well logging data rather than being observed directly, well logging data will directly influence the prediction. For instance, it is commonly known that a set of data is first interpreted in terms of geographical features, and then the assumed features are used to predict possible deposits. In this situation, the information is represented on one set of values (geographical features, e.g., lithology) and the conclusion is on another set (e.g., deposit). The information from the given set of values should be propagated to the destination set of values as a new distribution of beliefs. To deal with these situations, in this paper, we extend our approach to merging multiple pieces of uncertain information where

• evidence is specified at different levels of granularity on the same concept as textentries. We refer to two pieces of this type of evidence as semantically homogeneous. In this case, a value in a coarser set can be replaced by a set of values in a finer set. The example above relating solid and liquid with stone, sand, water, gas, and oil belongs to this category.

• evidence is specified on inter-related concepts as textentries. We refer to two pieces of this type of evidence as semantically heterogeneous. Example 3 below, relating stone, sand, water, gas, and oil with lithologies L1, L2 etc., belongs to this category.

• evidence is assigned to heterogeneous subtrees involving multiple concepts. We also refer to two pieces of this type of evidence as semantically heterogeneous. For instance, if we have a set of values measuring the lithology of a layer and another set evaluating the type of deposit of the layer, and we would like to know both the lithology and the deposit of the layer, then the joint set from these two sets says what lithology and what type of deposit a location has.

The first two types of evidence are illustrated by Examples 2 and 3 respectively and the third type of uncertain information is demonstrated by Example 4.

Example 2 Consider the two structured reports about a specific underground layer. The first report gives more precise descriptions of the possible deposit under a particular layer with probabilities whilst the other gives a more general suggestion of the possible deposit. These two reports describe the same problem with different levels of abstraction (different granularities), so they have uncertain information that is semantically homogeneous.

First report:

<report>
  <deposit>
    <probability>
      <prob value="0.2">water</prob>
      <prob value="0.8">sand</prob>
    </probability>
  </deposit>
</report>

Second report:

<report>
  <deposit>
    <probability>
      <prob value="0.4">liquid</prob>
      <prob value="0.6">solid</prob>
    </probability>
  </deposit>
</report>

Evidence bearing on a finer granularity (e.g., deposit with values water, gas etc.) would have impact on a coarser granularity (e.g., deposit with values liquid, solid etc.) or vice versa. It is sensible to consider both pieces of evidence at the same level of granularity if one piece of evidence can be propagated to the level of the other. This is the first topic we will look into in this paper.

Example 3 The following two structured reports provide two different but inter-related pieces of evidence about the same layer of the same well. The evidence in the left-hand XML document reports directly on the potential physical nature of the deposit. This is commonly used for prediction and this information can come from the general knowledge about the area. The second XML document, in contrast, reports on the observations in terms of lithology made by the equipment. From the lithological features, we can determine the physical nature of the deposit (or vice versa). To make use of this second XML report in prediction, we need to have a proper mapping function which specifies how the interpretations of lithology imply deposits, and then both of these reports can be merged. Since these two reports provide uncertain information on two different but inter-related concepts, i.e., deposit and lithology, we refer to them as semantically heterogeneous. Propagating a piece of uncertain information from one set of values to a different set of values is the second topic we will investigate in this paper.

Left report:

<report>
  <deposit>
    <belfunction>
      <mass value="0.2">
        <massitem>water</massitem>
        <massitem>oil</massitem>
      </mass>
      <mass value="0.8">
        <massitem>gas</massitem>
      </mass>
    </belfunction>
  </deposit>
</report>

Right report:

<report>
  <lithology>
    <belfunction>
      <mass value="0.3">
        <massitem>L1</massitem>
        <massitem>L3</massitem>
      </mass>
      <mass value="0.7">
        <massitem>L2</massitem>
      </mass>
    </belfunction>
  </lithology>
</report>


Example 4 Consider the following two structured reports which again are for the same layer of the same well. In the left report, there are two probability distributions on two textentries respectively. When we use this information to make a prediction, we can use either the information about the deposit or that about the lithology, since the former may have been derived from the latter or vice versa. In the right report, the child of the <prob value="..."> tag is not a textentry; it is in fact a subtree involving two concepts, deposit and lithology. This information can be the summary of general knowledge about this area saying what deposit is associated with what lithologies. For the purpose of prediction, uncertainties assigned to the pairs of values (e.g., (water, L1)) have to be re-assigned to values of deposit such as water, oil etc. Following this uncertainty re-assignment, the newly derived uncertain information on deposit can be merged with the information in the left XML. These two pieces of uncertain information are also referred to as semantically heterogeneous; however, they require a different method to propagate before they can be merged. Subtree uncertain information is the third topic we will study in this paper.

Left report:

<report>
  <source>experiment3</source>
  <date>19/3/02</date>
  <location>North Sea</location>
  <layer>150m-155m</layer>
  <deposit>
    <probability>
      <prob value="0.2">water</prob>
      <prob value="0.8">gas</prob>
    </probability>
  </deposit>
  <lithology>
    <probability>
      <prob value="0.3">L1</prob>
      <prob value="0.7">L2</prob>
    </probability>
  </lithology>
</report>

Right report:

<report>
  <source>General knowledge</source>
  <date>19 March 2002</date>
  <location>North Sea</location>
  <layer>150m-160m</layer>
  <probability>
    <prob value="0.4">
      <deposit>water</deposit>
      <lithology>L1</lithology>
    </prob>
    <prob value="0.6">
      <deposit>gas</deposit>
      <lithology>L2</lithology>
    </prob>
  </probability>
</report>

So the purpose of this paper is to significantly extend our previous paper on handling uncertainty [HL04a] by presenting techniques for merging structured reports with uncertainty expressed: (1) at different levels of granularity; (2) on different but inter-related sets of values; and (3) on subtrees. We will proceed as follows. In Section 2, we present formal definitions of logical representations of XML documents, review the basics of DS theory, and provide formal definitions of modelling and merging uncertain information in structured reports in the form of mass functions on the same textentry of two XML documents. In Section 3, we consider propagating and merging uncertain information at different levels of granularity. In Section 4, we investigate methods of reasoning with semantically heterogeneous uncertain information on subtrees. In Section 5, we compare our work with related research. Finally, in Section 6 we provide conclusions.

2 Structured reports

We now briefly review definitions for structured reports and the Dempster-Shafer theory of evidence (DS theory), for representing uncertain information in structured reports.


2.1 Basic definitions

Each structured report is an XML document, but not vice versa, as defined below. This restriction means that we can easily represent each structured report by a ground term in classical logic.

Definition 1 Structured report: If $\nu$ is a tagname (i.e. an element name), and $\phi$ is a textentry, then <$\nu$>$\phi$</$\nu$> is a structured report. If $\nu$ is a tagname (i.e. an element name), $\phi$ is a textentry, $\theta$ is an attribute name, and $\kappa$ is an attribute value, then <$\nu$ $\theta$=$\kappa$>$\phi$</$\nu$> is a structured report. If $\nu$ is a tagname and $\sigma_1, \ldots, \sigma_n$ are structured reports, then <$\nu$>$\sigma_1 \ldots \sigma_n$</$\nu$> is a structured report.

The definition for a structured report is very general. In practice, we would expect a DTD for a given domain. For instance, we would expect that for an implemented system that merges petroleum exploration reports, there would be a corresponding DTD. One of the roles of a DTD, say for petroleum exploration reports, would be to specify the minimum constellation of tags that would be expected of a petroleum exploration report. We may also expect integrity constraints represented in classical logic to further restrict appropriate structured reports for a domain [HS04]. In this paper, we will impose some further constraints on structured reports, in Section 2.3, to support the handling of uncertainty.

Clearly each structured report is isomorphic to a tree with the non-leaf nodes being the tagnames and the leaf nodes being the textentries. When we refer to a subtree (of a structured report), we mean a subtree formed from the tree representation of the structured report, where the root of the subtree is a tagname and the leaves are textentries. We formalize this as follows.

Definition 2 Subtree: Let $\sigma$ be a structured report and let $\rho$ be a tree that is isomorphic to $\sigma$. A tree $\rho'$ is a subtree of $\rho$ iff (1) the set of nodes in $\rho'$ is a subset of the set of nodes in $\rho$, and (2) for each node $n_i$ in $\rho'$, if $n_i$ is the parent of $n_j$ in $\rho$, then $n_j$ is in $\rho'$ and $n_i$ is the parent of $n_j$ in $\rho'$. By extension, if $\sigma'$ is a structured report, and $\rho'$ is isomorphic to $\sigma'$, then we describe $\sigma'$ as a subtree of $\sigma$.

Each structured report is also isomorphic with a ground term (of classical logic) where each tagname is a function symbol and each textentry is a constant symbol.

Definition 3 Abstract term: Each structured report is isomorphic with a ground term (of classical logic) called an abstract term. This isomorphism is defined inductively as follows: (1) If <$\nu$>$\phi$</$\nu$> is a structured report, where $\phi$ is a textentry, then $\nu(\phi)$ is an abstract term that is isomorphic with <$\nu$>$\phi$</$\nu$>; (2) If <$\nu$ $\theta$=$\kappa$>$\phi$</$\nu$> is a structured report, where $\phi$ is a textentry, then $\nu(\phi, \kappa)$ is an abstract term that is isomorphic with <$\nu$ $\theta$=$\kappa$>$\phi$</$\nu$>; and (3) If <$\nu$>$\phi_1 .. \phi_n$</$\nu$> is a structured report, and $\phi'_1$ is an abstract term that is isomorphic with $\phi_1$, ..., and $\phi'_n$ is an abstract term that is isomorphic with $\phi_n$, then $\nu(\phi'_1, .., \phi'_n)$ is an abstract term that is isomorphic with <$\nu$>$\phi_1 .. \phi_n$</$\nu$>.

Via this isomorphic relationship, we can refer to a branch of an abstract term by using the branch of the isomorphic structured report, and we can refer to a subtree of an abstract term by using the subtree of the isomorphic structured report. Note that Definition 1 describes how an XML document can be defined recursively, starting from the simplest one which has only one tagname and one value associated with the tagname. Definition 3 specifies how a tree-structured XML document can equally be described as a logical term which also reflects the relationships between tagnames and their values. For instance, the XML information <date>03/03/99</date> is denoted as date(03/03/99) in logic, where 03/03/99 can be understood as the value of attribute date.


Example 5 Consider the following structured report.

<fieldreport>
  <log>
    <deposit>liquid</deposit>
    <lithology>L1</lithology>
  </log>
  <layer>250m-300m</layer>
</fieldreport>

This can be represented by the following abstract term:

fieldreport(log(deposit(liquid), lithology(L1)), layer(250m-300m))

In this abstract term, fieldreport/log/deposit is a branch.
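The translation from a structured report to its abstract term can be sketched in a few lines of Python. This is only an illustration of Definition 3, not the authors' Prolog implementation; the function name `to_abstract_term` is hypothetical, and the hyphen in the layer textentry is an assumed reading of the range.

```python
import xml.etree.ElementTree as ET

def to_abstract_term(element):
    """Render an XML element as a ground term (hypothetical helper).

    Tagnames become function symbols and textentries become constants.
    """
    children = list(element)
    if not children:
        # Leaf: the textentry comes first, followed by any attribute values,
        # mirroring clauses (1) and (2) of Definition 3.
        args = [(element.text or "").strip()] + list(element.attrib.values())
    else:
        # Inner node: translate each sub-report recursively, clause (3).
        args = [to_abstract_term(child) for child in children]
    return f"{element.tag}({', '.join(args)})"

report = ET.fromstring(
    "<fieldreport>"
    "<log><deposit>liquid</deposit><lithology>L1</lithology></log>"
    "<layer>250m-300m</layer>"
    "</fieldreport>"
)
term = to_abstract_term(report)
print(term)
# fieldreport(log(deposit(liquid), lithology(L1)), layer(250m-300m))
```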

2.2 Basics of Dempster-Shafer Theory of Evidence

The Dempster-Shafer theory (DS theory) of evidence provides a mechanism for modelling and reasoning with uncertain information in a numerical way, especially when it is not possible to assign a proportion of the total belief to single elements of a set of values. DS theory ([Sha76, Sme88]) has a commonly accepted advantage over probability theory in terms of assigning a proportion of an agent's belief to a subset of a set of possible values rather than only on singletons, and assigning any unspecified proportion to the whole set. This is especially useful when the evidence supporting an agent's belief is inaccurate or incomplete. Furthermore, multiple pieces of evidence can be accumulated over time on the same subject and these pieces of evidence can be combined/merged in some way in order to draw a conclusion out of them. Dempster's combination rule in DS theory provides a simple mechanism to achieve this objective. Due to these two advantages provided by DS theory, we have chosen it to model, reason with, and merge uncertain information in structured reports.

Let $\Theta$ be a finite set containing mutually exclusive and exhaustive solutions to a question. $\Theta$ is called the frame of discernment. A mass function, also called a basic probability assignment, captures the impact of a piece of evidence on subsets of $\Theta$. A mass function $m: \wp(\Theta) \to [0,1]$ satisfies:

(1) $m(\emptyset) = 0$

(2) $\sum_{A \subseteq \Theta} m(A) = 1$

When $m(A) > 0$, $A$ is referred to as a focal element. To obtain the total belief in a subset $A$, i.e. the extent to which all available evidence supports $A$, we need to sum all the mass assigned to all subsets of $A$. A belief function, $Bel: \wp(\Theta) \to [0,1]$, is defined as

$$Bel(A) = \sum_{B \subseteq A} m(B)$$

A plausibility function, denoted $Pl: \wp(\Theta) \to [0,1]$, is defined as

$$Pl(A) = 1 - Bel(\bar{A}) = \sum_{B \cap A \neq \emptyset} m(B)$$
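As a small numerical illustration of the belief and plausibility functions, the following Python sketch evaluates both on the mass function m({water, oil}) = 0.2, m({gas}) = 0.8 used in the examples. The frame {water, oil, gas} is an assumption made only to keep the sketch small, and the helper names `bel` and `pl` are hypothetical.

```python
def bel(mass, subset):
    """Total mass of the focal elements contained in `subset`."""
    return sum(value for focal, value in mass.items() if focal <= subset)

def pl(mass, subset):
    """Total mass of the focal elements that intersect `subset`."""
    return sum(value for focal, value in mass.items() if focal & subset)

# Assumed frame of discernment, chosen for illustration only.
frame = frozenset({"water", "oil", "gas"})
mass = {frozenset({"water", "oil"}): 0.2, frozenset({"gas"}): 0.8}

water = frozenset({"water"})
print(bel(mass, water))  # no focal element lies inside {water}
print(pl(mass, water))   # {water, oil} intersects {water}
# Pl(A) = 1 - Bel(complement of A) holds up to floating-point rounding.
print(abs(pl(mass, water) - (1 - bel(mass, frame - water))) < 1e-9)
```

Here the gap between Bel({water}) and Pl({water}) reflects the mass that is committed to {water, oil} without being split between its two elements.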

Dempster's rule of combination below shows how two mass functions $m_1$ and $m_2$ on the same frame of discernment, from independent sources, can be combined to produce a merged mass function.

$$m_1 \oplus m_2(C) = \frac{\sum_{A \cap B = C} (m_1(A) \times m_2(B))}{1 - \sum_{A \cap B = \emptyset} (m_1(A) \times m_2(B))}$$

A mass function reduces to a probability distribution when every focal element is in fact a singleton. It is in this respect that, in this paper, we view probability theory as a special case of DS theory.
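The combination rule can be checked on the two mass functions of Example 1. The Python sketch below is a minimal illustration, not the authors' Prolog predicate; the function name `dempster` and the dictionary encoding of mass functions (frozenset of values mapped to mass) are assumptions.

```python
from itertools import product

def dempster(m1, m2):
    """Combine two mass functions given as {frozenset: mass} dictionaries.

    Returns the normalised combination and the mass that fell on the
    empty set before normalisation.
    """
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + x * y
        else:
            conflict += x * y
    if conflict >= 1.0:
        raise ValueError("the two sources are totally conflicting")
    return {c: v / (1.0 - conflict) for c, v in combined.items()}, conflict

# Mass functions from the two reports in Example 1.
m1 = {frozenset({"water", "oil"}): 0.4, frozenset({"gas"}): 0.6}
m2 = {frozenset({"water"}): 0.2, frozenset({"gas"}): 0.8}

merged, conflict = dempster(m1, m2)
print({tuple(sorted(c)): round(v, 3) for c, v in merged.items()})
# {('water',): 0.143, ('gas',): 0.857}
```

The products 0.4 × 0.8 and 0.6 × 0.2 fall on the empty set, so 0.44 of the mass is discarded and the remaining 0.08 and 0.48 are divided by 0.56, which gives the values 0.143 and 0.857 shown in the merged report of Example 1.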


2.3 Representing uncertain information

In order to support the representation of uncertain information in structured reports, we need some further formalization. First, we assume a set of tagnames that are reserved for representing uncertain information. Second, we assume some constraints on the use of these tags so that we can ensure they are used in a meaningful way with respect to probability theory and Dempster-Shafer theory of evidence. The set of key uncertainty tagnames for this paper are probability and belfunction. The set of subsidiary uncertainty tagnames for this paper are prob, multiitem, mass, and massitem. The union of the key uncertainty tagnames and the subsidiary uncertainty tagnames is the set of reserved tagnames.

Definition 4 ([HL04a]) The structured report <probability>$\sigma_1, .., \sigma_n$</probability> is called a probability-valid component (ProVC) iff each $\sigma_i \in \{\sigma_1, .., \sigma_n\}$ is of the form <prob value=$\kappa$>$\phi$</prob> where $\kappa \in [0,1]$ and $\phi$ is a textentry.

All textentries $\phi_i$ in <prob value=$\kappa_i$>$\phi_i$</prob> are elements of a pre-defined set containing mutually exclusive and exhaustive values that the related tagname can take.

Example 6 The following is a ProVC which corresponds to a probability distribution p(water) = 0.2 and p(gas) = 0.8.

<probability>
  <prob value="0.2">water</prob>
  <prob value="0.8">gas</prob>
</probability>

Definition 5 The structured report <probability>$\sigma_1, .., \sigma_n$</probability> is called a subtree probability-valid component (ProSC) iff for each $\sigma_i \in \{\sigma_1, .., \sigma_n\}$, $\sigma_i$ is of the form

<prob value=$\kappa_i$><multiitem>$\sigma^i_1, \ldots, \sigma^i_m$</multiitem></prob>

and for each $\sigma^i_j \in \{\sigma^i_1, .., \sigma^i_m\}$, $\sigma^i_j$ is of the form <$\psi^i_{j_l}$>$\phi^i_{j_l}$</$\psi^i_{j_l}$>, and $\kappa_i \in [0,1]$, $\psi^i_{j_l}$ is a tagname, and $\phi^i_{j_l}$ is a textentry.

Example 7 The following is a ProSC that models a probability distribution on a compound set of values with p({water, L1}) = 0.4 and p({gas, L2}) = 0.6.

<probability>
  <prob value="0.4">
    <multiitem>
      <deposit>water</deposit>
      <lithology>L1</lithology>
    </multiitem>
  </prob>
  <prob value="0.6">
    <multiitem>
      <deposit>gas</deposit>
      <lithology>L2</lithology>
    </multiitem>
  </prob>
</probability>


The reserved tagname multiitem within tagname prob indicates that there are multiple concepts in this uncertain information. In the above example, each probability value is attached to a compound element combining deposit and lithology.

Definition 6 ([HL04a]) The structured report <belfunction>$\sigma_1, .., \sigma_n$</belfunction> is called a belfunction-valid component (BelVC) iff for each $\sigma_i \in \{\sigma_1, .., \sigma_n\}$, $\sigma_i$ is of the form <mass value=$\kappa_i$>$\psi_i$</mass> and $\psi_i$ is in the form

<massitem>$\phi_{i1}$</massitem>, ..., <massitem>$\phi_{ix}$</massitem>

where $\kappa_i \in [0,1]$ and each $\phi_{ij}$ is a textentry. To make the subsequent notation simpler, we also let $\psi_i = \{\phi_{i1}, \ldots, \phi_{ix}\}$. In this way, a BelVC can be represented as a collection of (subset, mass value) pairs, $(\psi_i, \kappa_i)$, $i = 1, \ldots, n$.

Example 8 The following is a BelVC on a single tagname deposit with m({water, oil}) = 0.2 and m({gas}) = 0.8.

<belfunction>
  <mass value="0.2">
    <massitem>water</massitem>
    <massitem>oil</massitem>
  </mass>
  <mass value="0.8">
    <massitem>gas</massitem>
  </mass>
</belfunction>

The textentries in a BelVC are elements of a pre-defined set containing mutually exclusive and exhaustive values for the related tagname, as in the case for ProVCs. We now provide the definition of mass functions on subtrees.

Definition 7 The structured report <belfunction>$\sigma_1, .., \sigma_n$</belfunction> is called a subtree belfunction-valid component (BelSC) iff for each $\sigma_i \in \{\sigma_1, .., \sigma_n\}$, $\sigma_i$ is of the form <mass value=$\kappa_i$>$\psi_i$</mass> and $\psi_i$ is in the form

<multiitem>$\mu_{i1}$</multiitem> ... <multiitem>$\mu_{ix}$</multiitem>

and each $\mu_{ij}$ in $\{\mu_{i1}, \ldots, \mu_{ix}\}$ is in the form

<$\rho^i_{j1}$>$\phi^i_{j1}$</$\rho^i_{j1}$>, ..., <$\rho^i_{jl}$>$\phi^i_{jl}$</$\rho^i_{jl}$>

where $\kappa_i \in [0,1]$, the $\rho^i_{jt}$ are tagnames, and the $\phi^i_{jt}$ are textentries. Equally, $\psi_i = \{\langle\phi^i_{11}, \ldots, \phi^i_{1p}\rangle, \ldots, \langle\phi^i_{x1}, \ldots, \phi^i_{xm}\rangle\}$ can be used to stand for a subset with mass value $\kappa_i$ where the subset consists of elements with multiple atom values.


Example 9 The following is a BelSC providing a mass function on a subtree.

<belfunction>
  <mass value="0.4">
    <multiitem>
      <deposit>water</deposit>
      <lithology>L1</lithology>
    </multiitem>
    <multiitem>
      <deposit>oil</deposit>
      <lithology>L3</lithology>
    </multiitem>
  </mass>
  <mass value="0.6">
    <multiitem>
      <deposit>gas</deposit>
      <lithology>L2</lithology>
    </multiitem>
  </mass>
</belfunction>

If a belief function is defined on a subtree, then for each mass value, its elements should come from different frames. So the tagnames should be distinct. In addition, if the subtree involves n tagnames, then in each (<multiitem>, </multiitem>) pair, there should be n tagnames. These are the two constraints we impose on BelSCs. When a tagname among these n names is missing, this part of the XML can be extended to include the missing tagname. More specifically, if we are defining a mass function for a subtree involving frames $\Theta_1$ and $\Theta_2$, then for a mass assignment that involves elements from just one of the two frames, we can extend it to include all the elements in the other frame. For example, the mass function in Example 9 gives

$m(\{\langle water, L1\rangle, \langle oil, L3\rangle\}) = 0.4$, $m(\{\langle gas, L2\rangle\}) = 0.6$.

If it was the case that $m(\{\langle gas, L2\rangle\}) = 0.4$ is mis-represented as $m(\{gas\}) = 0.4$, then it can be extended into $m(\{\langle gas, L1\rangle, \langle gas, L2\rangle, \ldots, \langle gas, L10\rangle\}) = 0.4$. This means gas is compatible with all the lithologies. Therefore, in the following, we always assume that a BelSC complies with these two constraints.
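The extension of an incomplete mass assignment to the full subtree can be sketched as a Cartesian product. The Python below is only an illustration: the helper name `extend_over_frame` is hypothetical, and the lithology frame L1, ..., L10 is taken from the extended assignment quoted above.

```python
from itertools import product

def extend_over_frame(deposits, lithology_frame):
    """Pair every deposit value with every lithology (hypothetical helper)."""
    return frozenset(product(sorted(deposits), lithology_frame))

# Lithology frame L1..L10, as suggested by the extended assignment in the text.
lithology_frame = [f"L{i}" for i in range(1, 11)]

# The mis-represented assignment m({gas}) = 0.4 on the deposit frame alone.
incomplete = {frozenset({"gas"}): 0.4}

extended = {
    extend_over_frame(focal, lithology_frame): value
    for focal, value in incomplete.items()
}
for focal, value in extended.items():
    print(len(focal), value)  # 10 pairs carrying mass 0.4
```

The mass value is unchanged by the extension; only the focal element grows, which expresses that gas is compatible with every lithology.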

The ProVCs, ProSCs, BelVCs, and BelSCs are referred to as uncertainty components and are normally part of larger structured reports. Normally, we would expect that for an application, the DTD for the structured reports would exclude a key uncertainty tag as the root of a structured report. In other words, the key uncertainty tags are roots of subtrees nested within larger structured reports. We also assume various integrity constraints on the use of the uncertainty components.

Definition 8 Let <probability>$\sigma_1, .., \sigma_n$</probability> be a ProVC or a ProSC, and let $\sigma_i \in \{\sigma_1, .., \sigma_n\}$ be either of the form <prob value=$\kappa_i$>$\phi_i$</prob> or of the form <prob value=$\kappa_i$><multiitem>$\phi_{i1}, \ldots, \phi_{il}$</multiitem></prob>. This component adheres to the full probability distribution constraint iff the following two conditions hold:

(1) $\sum_i \kappa_i = 1$

(2) for all $i, j$, if $1 \le i \le n$ and $1 \le j \le n$ and $i \neq j$, then $\phi_i \neq \phi_j$ or $\{\phi_{i1}, \ldots, \phi_{il}\} \neq \{\phi_{j1}, \ldots, \phi_{jt}\}$

Definition 9 Let <belfunction>$\sigma_1, .., \sigma_n$</belfunction> be a BelVC or a BelSC, and let $S = \{(\psi_1, \kappa_1), \ldots, (\psi_n, \kappa_n)\}$ be the collection of (subset, mass) pairs in the component. This component adheres to the full belfunction distribution constraint iff the following two conditions hold:

(1) $\sum_i \kappa_i = 1$

(2) for all $i, j$, if $1 \le i \le n$ and $1 \le j \le n$ and $i \neq j$, then $\psi_i \neq \psi_j$
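A check of the full belfunction distribution constraint can be sketched as follows. This is an illustrative Python fragment, not part of the framework itself; the function name, the (subset, mass) list encoding and the numerical tolerance are assumptions.

```python
import math

def satisfies_full_belfunction_constraint(pairs, tol=1e-9):
    """Check Definition 9 on a list of (subset, mass) pairs.

    `tol` is an assumed tolerance for comparing the sum of masses with 1.
    """
    subsets = [frozenset(subset) for subset, _ in pairs]
    masses = [kappa for _, kappa in pairs]
    in_range = all(0.0 <= kappa <= 1.0 for kappa in masses)
    sums_to_one = math.isclose(sum(masses), 1.0, abs_tol=tol)  # condition (1)
    distinct = len(set(subsets)) == len(subsets)               # condition (2)
    return in_range and sums_to_one and distinct

# The BelVC of Example 8: m({water, oil}) = 0.2 and m({gas}) = 0.8.
example8 = [({"water", "oil"}, 0.2), ({"gas"}, 0.8)]
print(satisfies_full_belfunction_constraint(example8))  # True

# The same subset listed twice violates condition (2).
repeated = [({"gas"}, 0.5), ({"gas"}, 0.5)]
print(satisfies_full_belfunction_constraint(repeated))  # False
```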


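When there are two BelVCs referring to the same textentry, we need to merge them. The following procedure implements Dempster's combination rule.

Definition 10 ([HL04a]) Let the following be two BelVCs

<belfunction>$\sigma^1_1, .., \sigma^1_p$</belfunction>

<belfunction>$\sigma^2_1, .., \sigma^2_q$</belfunction>

where

1. $\sigma^1_i \in \{\sigma^1_1, .., \sigma^1_p\}$ is of the form <mass value=$\kappa^1_i$>$\psi^1_i$</mass>

2. the (subset, mass) pair collection is $S_1 = \{(\psi^1_1, \kappa^1_1), \ldots, (\psi^1_p, \kappa^1_p)\}$,

3. $\sigma^2_j \in \{\sigma^2_1, .., \sigma^2_q\}$ is of the form <mass value=$\kappa^2_j$>$\psi^2_j$</mass>

4. the (subset, mass) pair collection is $S_2 = \{(\psi^2_1, \kappa^2_1), \ldots, (\psi^2_q, \kappa^2_q)\}$,

Let the combined BelVC be <belfunction>$\sigma_1, .., \sigma_s$</belfunction> where each $\sigma_k \in \{\sigma_1, .., \sigma_s\}$ is of the form <mass value=$\kappa_k$>$\psi_k$</mass> and

$$\kappa_k = \frac{\sum \kappa^1_i \times \kappa^2_j}{1 - \sum \kappa^1_n \times \kappa^2_m}$$

such that $\psi_k = \psi^1_i \cap \psi^2_j$ for the $(\psi^1_i, \kappa^1_i)$ and $(\psi^2_j, \kappa^2_j)$ pairs, and $\psi^1_n \cap \psi^2_m = \emptyset$ for the $(\psi^1_n, \kappa^1_n)$ and $(\psi^2_m, \kappa^2_m)$ pairs, and $\psi_k$ is of the form <massitem>$\phi_{k1}$</massitem>, ..., <massitem>$\phi_{kz}$</massitem>.

The value $\kappa_\perp = \sum \kappa^1_n \times \kappa^2_m$ (that is, $\sum_{A \cap B = \emptyset} (m_1(A) \times m_2(B))$) indicates how much of the total belief has been committed to the empty set while combining two pieces of uncertain information. A higher $\kappa_\perp$ value reflects either an inconsistency among the two sources or lower confidence in any of the possible outcomes from both sources.

Definition 11 Let the abstract terms $\tau_1$ and $\tau_2$ each denote a BelVC and let $X$ be a logical variable. The predicate Dempster($\tau_1$, $\tau_2$, $X$) is such that $X$ is evaluated to $\tau_3$, where $\tau_3$ is the abstract term denoting the combined BelVC obtained by Definition 10.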
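To make the procedure of Definition 10 concrete at the level of XML, here is a minimal Python sketch that reads two BelVCs, combines their (subset, mass) pairs and writes the combined BelVC back out. It is not the Prolog predicate of Definition 11; the function names `read_belvc` and `merge_belvcs`, and the rounding of mass values to three decimals, are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import product

def read_belvc(xml_text):
    """Parse a <belfunction> element into {frozenset of massitems: mass}."""
    root = ET.fromstring(xml_text)
    return {
        frozenset(item.text.strip() for item in mass.findall("massitem")):
            float(mass.get("value"))
        for mass in root.findall("mass")
    }

def merge_belvcs(xml1, xml2):
    """Combine two BelVCs; returns (combined XML text, kappa_bottom)."""
    s1, s2 = read_belvc(xml1), read_belvc(xml2)
    merged, kappa_bottom = {}, 0.0
    for (psi1, k1), (psi2, k2) in product(s1.items(), s2.items()):
        psi = psi1 & psi2
        if psi:
            merged[psi] = merged.get(psi, 0.0) + k1 * k2
        else:
            kappa_bottom += k1 * k2  # mass committed to the empty set
    if kappa_bottom >= 1.0:
        raise ValueError("the two BelVCs are totally conflicting")
    out = ET.Element("belfunction")
    for psi, kappa in merged.items():
        value = f"{kappa / (1.0 - kappa_bottom):.3f}"  # assumed 3-decimal output
        mass = ET.SubElement(out, "mass", value=value)
        for entry in sorted(psi):
            ET.SubElement(mass, "massitem").text = entry
    return ET.tostring(out, encoding="unicode"), kappa_bottom

# The two BelVCs of Example 1.
belvc1 = ('<belfunction><mass value="0.4"><massitem>water</massitem>'
          '<massitem>oil</massitem></mass>'
          '<mass value="0.6"><massitem>gas</massitem></mass></belfunction>')
belvc2 = ('<belfunction><mass value="0.2"><massitem>water</massitem></mass>'
          '<mass value="0.8"><massitem>gas</massitem></mass></belfunction>')

merged_xml, kappa_bottom = merge_belvcs(belvc1, belvc2)
print(merged_xml)
```

On the Example 1 inputs, the value of `kappa_bottom` is 0.44, which quantifies the disagreement between the two experiments.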

The predicate Dempster($\tau_1$, $\tau_2$, $X$) is defined in Prolog to carry out the actual merge. Looking back at Example 1 again, if we let $\tau_1$ and $\tau_2$ be the abstract terms for the first two XML documents in the example, then $X$ represents the merged abstract term isomorphic to the third XML document in the example.

3 Merging uncertainty on textentries with compatible frames

In this section, we concentrate on merging structured reports with uncertain information (uncertainty valid components) on textentries where either the uncertainty is expressed at different levels of granularity (which we describe as semantically homogeneous) or on different but inter-related sets of values (which we describe as semantically heterogeneous). We consider both probabilistic and belief function information and take probability theory as a special case of belief function theory. We leave the topic of merging semantically heterogeneous uncertainty-valid components on subtrees from multiple structured reports to the next section.


When merging two structured reports, one with an uncertainty valid component and one without, we take the latter as a special case of the former and assign value 1.0 (no matter whether it stands for a probability value or a mass value) to the corresponding textentry (or textentries). Then, these two structured reports can be merged using one of the rules defined below.

Before proceeding to the details of this logic-based merging technique, we need to emphasize that in this paper any two uncertainty components to be merged are assumed to refer to the same or related issue (or topic) that is being considered. For instance, both uncertainty components are either about the deposit of layer X of North Sea for Well No A, or about the deposit or lithology of North Sea for Well No A, layer Y. If it is the case that one uncertainty component is about the deposit of North Sea for Well No A and another is about the lithology of North Sea for Well No B, then these two uncertainty components cannot be merged. The method to verify semantically whether two given uncertainty components are eligible for merging is given in [HS04]. In the rest of this paper, whenever we intend to merge two such components, we assume their eligibility has been checked and we will not repeat this prerequisite any further.

3.1PropagationoperationinDStheory

Whentwomassfunctionsarenotgivenonthesameframe,theycannotbecombineddirectly,ratheronemassfunctionhastobepropagatedtotheframeofanothermassfunction.Letusnowlookatseveralsituationswhenthispropagationcantakeplace.

De nition12Let 1and 2betwoframesofdiscernmentandΓbeamappingfunctionΓ: 1→2 2.Whenthefollowingconditionshold, 2iscalledare nementof 1,and 1iscalledacoarseningof 2.Γiscalledare nementmapping.

(1)Γ(φ)=Tφ= ,forallφ∈ 1,whereTφ 2

(2)Γ(φi)∩Γ(φj)= ,wheni=j

(3)∪φ∈ 1Γ(φ)= 2

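Example 2 in Section 1 gives a mass function (we take a probability distribution as a special case of mass function) on frame $\Omega_1 = \{\text{liquid}, \text{solid}\}$ and another on frame $\Omega_2 = \{\text{water}, \text{oil}, \text{gas}, \text{sand}, \text{stone}\}$ respectively. $\Omega_2$ is in fact a refinement of $\Omega_1$, if we define the refinement mapping function $\Gamma$ as

$\Gamma(\text{liquid}) = \{\text{water}, \text{oil}, \text{gas}\}$, $\Gamma(\text{solid}) = \{\text{sand}, \text{stone}\}$.

A refinement mapping generates a set of disjoint subsets of the finer frame. Through a refinement mapping $\Gamma$, we can also define a coarsening mapping function $\Gamma': \Omega_2 \to \Omega_1$ as:

$\Gamma'(\psi) = \phi$ where $\psi \in T_\phi$ and $\Gamma(\phi) = T_\phi$

For instance, the coarsening mapping function of the above refinement mapping function gives

$\Gamma'(\text{water}) = \Gamma'(\text{oil}) = \Gamma'(\text{gas}) = \text{liquid}$, $\Gamma'(\text{sand}) = \Gamma'(\text{stone}) = \text{solid}$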
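The three conditions of Definition 12 and the induced coarsening mapping can be checked mechanically. The following Python sketch is only an illustration: representing frames as sets and mappings as dictionaries, and the helper names `is_refinement` and `coarsening_of`, are assumptions made here and are not part of the paper's framework.

```python
from itertools import combinations

# Frames and refinement mapping from the running example.
OMEGA_1 = {"liquid", "solid"}
OMEGA_2 = {"water", "oil", "gas", "sand", "stone"}
GAMMA = {
    "liquid": {"water", "oil", "gas"},
    "solid": {"sand", "stone"},
}


def is_refinement(gamma, coarse, fine):
    """Check conditions (1)-(3) of Definition 12 for gamma: coarse -> 2^fine."""
    if set(gamma) != set(coarse):
        return False
    images = list(gamma.values())
    # (1) every image is a non-empty subset of the finer frame
    nonempty = all(img and img <= fine for img in images)
    # (2) images of distinct elements are pairwise disjoint
    disjoint = all(not (a & b) for a, b in combinations(images, 2))
    # (3) the images together cover the finer frame
    covering = set().union(*images) == set(fine)
    return nonempty and disjoint and covering


def coarsening_of(gamma):
    """Invert a refinement mapping: each fine element maps to its coarse element."""
    return {psi: phi for phi, image in gamma.items() for psi in image}


GAMMA_PRIME = coarsening_of(GAMMA)
print(is_refinement(GAMMA, OMEGA_1, OMEGA_2))
print(GAMMA_PRIME["oil"], GAMMA_PRIME["stone"])
```

Because condition (2) makes the images disjoint, every element of the finer frame belongs to exactly one image, which is why the inversion in `coarsening_of` is well defined.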

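Lemma 1 Let $\Omega_2$ be a refinement of frame $\Omega_1$ by mapping function $\Gamma$ and let $m_{\Omega_1}$ be a mass function on $\Omega_1$. Function $m_{\Omega_2}$ defined below is a mass function on $\Omega_2$.

$m_{\Omega_2}(T) = m_{\Omega_1}(S)$ where $T = \Gamma(\phi)$ for $\phi \in S$, and $S \subseteq \Omega_1$ is a focal element. (1)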


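Let $\Omega_1$ and $\Omega_2$ be two frames as defined in Example 2 and let

$m_{\Omega_1}(\{\text{liquid}\}) = 0.4$, $m_{\Omega_1}(\{\text{solid}\}) = 0.6$

be a mass function on $\Omega_1$. Applying Lemma 1,

$m_{\Omega_2}(\{\text{water}, \text{oil}, \text{gas}\}) = 0.4$, $m_{\Omega_2}(\{\text{sand}, \text{stone}\}) = 0.6$

is a mass function on $\Omega_2$.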

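Lemma 2 Let $\Omega_1$ be a coarsening of frame $\Omega_2$ by coarsening mapping function $\Gamma'$ and let $m_{\Omega_2}$ be a mass function on $\Omega_2$. Function $m_{\Omega_1}$ defined below is a mass function on $\Omega_1$.

$m_{\Omega_1}(S) = \sum_T m_{\Omega_2}(T)$ where $S = \Gamma'(\psi)$ for $\psi \in T$ and $T \subseteq \Omega_2$ is a focal element. (2)

Yet again, if we have $m_{\Omega_2}(\{\text{water}, \text{oil}\}) = 0.2$ and $m_{\Omega_2}(\{\text{gas}\}) = 0.8$, based on Lemma 2, this mass function generates a mass function on $\Omega_1$ as $m_{\Omega_1}(\{\text{liquid}\}) = 0.2 + 0.8 = 1$.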
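The two worked examples for Lemmas 1 and 2 can be reproduced with a few lines of Python. This is a sketch under stated assumptions: mass functions are represented as dictionaries from `frozenset` focal elements to exact `Fraction` values, and the function names `refine` and `coarsen` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

GAMMA = {"liquid": {"water", "oil", "gas"}, "solid": {"sand", "stone"}}
GAMMA_PRIME = {psi: phi for phi, image in GAMMA.items() for psi in image}


def refine(mass, gamma):
    """Lemma 1: move each focal element S to the union of gamma(phi), phi in S."""
    out = defaultdict(Fraction)
    for focal, value in mass.items():
        image = frozenset().union(*(gamma[phi] for phi in focal))
        out[image] += value
    return dict(out)


def coarsen(mass, gamma_prime):
    """Lemma 2: map each focal element T elementwise and sum coinciding images."""
    out = defaultdict(Fraction)
    for focal, value in mass.items():
        image = frozenset(gamma_prime[psi] for psi in focal)
        out[image] += value
    return dict(out)


m1 = {frozenset({"liquid"}): Fraction(2, 5), frozenset({"solid"}): Fraction(3, 5)}
m2 = {frozenset({"water", "oil"}): Fraction(1, 5), frozenset({"gas"}): Fraction(4, 5)}

refined = refine(m1, GAMMA)
coarsened = coarsen(m2, GAMMA_PRIME)
```

In the second case both focal elements {water, oil} and {gas} coarsen to {liquid}, so their masses are added, which matches the value 0.2 + 0.8 = 1 in the text.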

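Now we look at more complex mapping relations between frames.

Definition 13 Let $\Omega_1$ and $\Omega_2$ be two frames of discernment containing possible values to two related questions $Q_1$ and $Q_2$. Let $\Gamma$ be a mapping function $\Gamma: \Omega_1 \to 2^{\Omega_2}$ which defines that whenever $\phi^1_i$ is the true answer to question $Q_1$ then the true answer to question $Q_2$ must be one of the elements in $\Gamma(\phi^1_i) \neq \emptyset$, and for every $\phi^2_j \in \Omega_2$, there exists at least one $\phi^1_i$ such that $\phi^2_j \in \Gamma(\phi^1_i)$. Then frames $\Omega_1$ and $\Omega_2$ are said to be compatible.

Mapping $\Gamma$ is referred to as a compatibility mapping [LGS86, LH+93]. Equally, a compatibility mapping can be defined from $\Omega_2$ to $\Omega_1$. A refinement (or coarsening) mapping is a special case of compatibility mapping.

Lemma 3 Let $\Omega_1$ and $\Omega_2$ be two related frames with a compatibility mapping $\Gamma$. Let $m_{\Omega_1}$ be a mass function on $\Omega_1$. Then function $m_{\Omega_2}$ defined below is a mass function on $\Omega_2$.

$m_{\Omega_2}(T) = \sum_S m_{\Omega_1}(S)$ where $T = \Gamma(\phi)$ for $\phi \in S$ and $S \subseteq \Omega_1$ is a focal element. (3)

All three Lemmas can be proved easily (e.g., [Sha76]).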

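For instance, the relationship between deposits (captured by $\Omega_2$) and lithologies (captured by $\Omega_3$) can be established through a mapping $\Gamma: \Omega_2 \to 2^{\Omega_3}$ as

$\Gamma(\text{water}) = \{L1, L2\}$, $\Gamma(\text{oil}) = \{L3, L4\}$, $\Gamma(\text{gas}) = \{L2, L5, L6\}$,

$\Gamma(\text{sand}) = \{L8, L9\}$, $\Gamma(\text{stone}) = \{L7, L8\}$.

Or a mapping function $\Gamma': \Omega_3 \to 2^{\Omega_2}$ as

$\Gamma'(L1) = \{\text{water}\}$, $\Gamma'(L2) = \{\text{water}, \text{gas}\}$, $\Gamma'(L3) = \{\text{oil}\}$,

$\Gamma'(L4) = \{\text{oil}\}$, $\Gamma'(L5) = \{\text{gas}\}$, $\Gamma'(L6) = \{\text{gas}\}$,

$\Gamma'(L7) = \{\text{stone}\}$, $\Gamma'(L8) = \{\text{sand}, \text{stone}\}$, $\Gamma'(L9) = \{\text{sand}\}$.

Using this mapping relationship, the uncertain information on $\Omega_3$ in the second XML document in Example 3 can be propagated to $\Omega_2$ to obtain a new mass function on deposit as

$m_{\Omega_3}(\{\text{water}, \text{oil}\}) = 0.3$, $m_{\Omega_3}(\{\text{water}, \text{gas}\}) = 0.7$.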
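Propagation through a compatibility mapping (Lemma 3) follows the same pattern as refinement, except that images may overlap. The sketch below uses the mapping from lithologies to deposits given above. The input mass function on lithologies is hypothetical: it is one assignment that happens to yield the propagated values quoted in the text, and it is not taken from Example 3, which is not reproduced in this section. The function name `propagate` is likewise an assumption.

```python
from collections import defaultdict

# Compatibility mapping from lithologies to deposits, copied from the text.
GAMMA_PRIME = {
    "L1": {"water"}, "L2": {"water", "gas"}, "L3": {"oil"},
    "L4": {"oil"}, "L5": {"gas"}, "L6": {"gas"},
    "L7": {"stone"}, "L8": {"sand", "stone"}, "L9": {"sand"},
}


def propagate(mass, mapping):
    """Lemma 3: send each focal element to the union of its elements' images."""
    out = defaultdict(float)
    for focal, value in mass.items():
        image = frozenset().union(*(mapping[phi] for phi in focal))
        out[image] += value  # focal elements with the same image are summed
    return dict(out)


# Hypothetical mass function on the lithology frame (an assumption, see above).
m_lithology = {frozenset({"L1", "L3"}): 0.3, frozenset({"L2"}): 0.7}
m_deposit = propagate(m_lithology, GAMMA_PRIME)
print(m_deposit)
```

Two distinct focal elements can share an image (for example {L5} and {L6} both map to {gas}), in which case their masses are accumulated on the same focal element of the target frame.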


3.2 Predicate for belief propagation on text entries

We now define a formal procedure to perform the propagations discussed in Section 3.1 and define a predicate to call the procedure.

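Definition 14 Let <belfunction>$\sigma^1_1, \ldots, \sigma^1_p$</belfunction> be a BelVC where

1. $\sigma^1_i \in \{\sigma^1_1, \ldots, \sigma^1_p\}$ is of the form <mass value="$\kappa^1_i$">$\psi^1_i$</mass>

2. $S = \{(\psi^1_1, \kappa^1_1), \ldots, (\psi^1_p, \kappa^1_p)\}$ is the collection of (subset, mass) pairs

3. $\Gamma: \Omega_1 \to 2^{\Omega_2}$ is a compatibility mapping and $\Gamma(\psi^1_i) = \Gamma(\phi^1_{i1}) \cup \ldots \cup \Gamma(\phi^1_{ix})$ where $\psi^1_i = \{\phi^1_{i1}, \ldots, \phi^1_{ix}\}$

Let the propagated BelVC on $\Omega_2$ be <belfunction>$\sigma^2_1, \ldots, \sigma^2_q$</belfunction> where each $\sigma^2_j \in \{\sigma^2_1, \ldots, \sigma^2_q\}$ is of the form <mass value="$\kappa^2_j$">$\psi^2_j$</mass> and

$\kappa^2_j = \sum_i \kappa^1_i$ s.t. $\psi^2_j = \Gamma(\psi^1_i)$ for each $(\psi^1_i, \kappa^1_i)$ pair

and $\psi^2_j$ is of the form <massitem>$\phi^2_{j1}$</massitem> ··· <massitem>$\phi^2_{jy}$</massitem>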

Definition 15 Let the abstract term $\tau$ be a BelVC on $\Omega_1$. Let $\Gamma$ be a compatibility mapping $\Gamma: \Omega_1 \to 2^{\Omega_2}$, and $X$ be a logical variable. The predicate Propagate($\tau$, $\Gamma$, $X$) is such that $X$ is evaluated to $\tau'$ where $\tau'$ is the abstract term denoting the propagated BelVC on $\Omega_2$ obtained by Definition 14.

Predicate Propagate($\tau$, $\Gamma$, $X$) can be used to generate a BelVC on a frame from an existing BelVC on another frame, no matter whether the relationship between the two frames is a refinement, a coarsening, or a compatibility relation.

Since we take a ProVC as a special case of BelVCs, it is possible to easily convert the former to the format of the latter as given in [HL04a]. We repeat this definition here.

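Definition 16 Let abstract term $\tau$ be a ProVC <probability>$\sigma_1, \ldots, \sigma_n$</probability> and each $\sigma_i \in \{\sigma_1, \ldots, \sigma_n\}$ is of the form <prob value="$\kappa$">$\phi$</prob> where $\kappa \in [0, 1]$ and $\phi$ is a text entry. Then $\tau'$ is the abstract term denoting the BelVC <belfunction>$\sigma'_1, \ldots, \sigma'_n$</belfunction> where each $\sigma'_i \in \{\sigma'_1, \ldots, \sigma'_n\}$ is of the form <mass value="$\kappa$"><massitem>$\phi$</massitem></mass> and $\kappa \in [0, 1]$, and $\phi$ is a text entry.

Definition 17 If the abstract term $\tau$ is a ProVC and $X$ is a logical variable, then BayesBelief($\tau$, $X$) is a predicate such that $X$ is evaluated to $\tau'$ where $\tau'$ is the abstract term denoting the BelVC obtained from $\tau$ by Definition 16.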
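The conversion in Definition 16 is a purely structural rewrite of the XML. As a minimal sketch, assuming the tag and attribute spellings `probability`, `prob`, `belfunction`, `mass`, `massitem` and `value`, it can be written with Python's standard `xml.etree.ElementTree`. The function name `bayes_belief` and the sample input are illustrative assumptions; the paper itself realises BayesBelief as a Prolog predicate.

```python
import xml.etree.ElementTree as ET


def bayes_belief(provc):
    """Convert a <probability> element (ProVC) into a <belfunction> element (BelVC)."""
    belvc = ET.Element("belfunction")
    for prob in provc.findall("prob"):
        # Each probability value becomes the mass of a singleton focal element.
        mass = ET.SubElement(belvc, "mass", {"value": prob.get("value")})
        item = ET.SubElement(mass, "massitem")
        item.text = (prob.text or "").strip()
    return belvc


# Sample ProVC (an assumption modelled on the deposit values used in the examples).
provc = ET.fromstring(
    "<probability>"
    '<prob value="0.2">water</prob>'
    '<prob value="0.8">sand</prob>'
    "</probability>"
)
belvc = bayes_belief(provc)
print(ET.tostring(belvc, encoding="unicode"))
```

The mass values are copied unchanged, reflecting that a probability distribution is treated as a mass function whose focal elements are all singletons.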

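In an analogous way to Definitions 16 and 17, it is possible to define how a ProSC can be converted into a BelSC.

Definition 18 Let abstract term $\tau$ be a ProSC <probability>$\sigma_1, \ldots, \sigma_n$</probability> and each $\sigma_i \in \{\sigma_1, \ldots, \sigma_n\}$ is of the form <prob value="$\kappa_i$">$\psi_i$</prob> where $\kappa_i \in [0, 1]$ and $\psi_i$ is in the form

<multiitem> <$\rho_{i1}$>$\phi_{i1}$</$\rho_{i1}$> ... <$\rho_{ix}$>$\phi_{ix}$</$\rho_{ix}$> </multiitem>

Then $\tau'$ is the abstract term denoting the BelSC <belfunction>$\sigma'_1, \ldots, \sigma'_n$</belfunction> where each $\sigma'_i \in \{\sigma'_1, \ldots, \sigma'_n\}$ is of the form <mass value="$\kappa_i$">$\psi_i$</mass> and $\kappa_i \in [0, 1]$, and $\psi_i$ is in the form

<multiitem> <$\rho_{i1}$>$\phi_{i1}$</$\rho_{i1}$> ... <$\rho_{ix}$>$\phi_{ix}$</$\rho_{ix}$> </multiitem>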


Definition 19 Let the abstract term $\tau$ be a ProSC and let $X$ be a logical variable. The predicate BayesBelief($\tau$, $X$) is such that $X$ is evaluated to $\tau'$ where $\tau'$ is the abstract term denoting the BelSC obtained from $\tau$ by Definition 18.

Example 10 Let us revisit Example 2. Let $\tau_1$ and $\tau_2$ be the abstract terms for the two XML documents in this example, left and right. Both of the ProVCs can be converted by calling predicates BayesBelief($\tau_1$, $X_1$) and BayesBelief($\tau_2$, $X_2$), where $X_1$ and $X_2$ are ground by abstract terms $\tau'_1$ and $\tau'_2$ respectively, where $\tau'_1$ and $\tau'_2$ are the converted BelVCs represented by the XML documents left and right below (respectively).

Left document:

<report>
  <deposit>
    <belfunction>
      <mass value="0.2">
        <massitem>water</massitem>
      </mass>
      <mass value="0.8">
        <massitem>sand</massitem>
      </mass>
    </belfunction>
  </deposit>
</report>

Right document:

<report>
  <deposit>
    <belfunction>
      <mass value="0.4">
        <massitem>liquid</massitem>
      </mass>
      <mass value="0.6">
        <massitem>solid</massitem>
      </mass>
    </belfunction>
  </deposit>
</report>

If an agent's query is posed on the concept deposit at the general level, e.g., either answer solid or liquid will be sufficient, then the uncertain information represented by $X_1$ should be propagated to this general frame using predicate Propagate($X_1$, $\Gamma'$, $X_3$) where $\Gamma'$ is a coarsening mapping and $X_3$ is ground to $\tau_3$ as follows.

<belfunction>
  <mass value="0.2">
    <massitem>liquid</massitem>
  </mass>
  <mass value="0.8">
    <massitem>solid</massitem>
  </mass>
</belfunction>

Finally, $\tau_3$ can be combined with $\tau'_2$ using Dempster($X_3$, $X_2$, $X_4$) to obtain the final result, where $X_3$ is ground by $\tau_3$ and $X_2$ is ground by $\tau'_2$. The whole sequence of calls to the Prolog predicates can be summarized as:

BayesBelief($\tau_1$, $X_1$) ∧ BayesBelief($\tau_2$, $X_2$) ∧ Propagate($X_1$, $\Gamma'$, $X_3$) ∧ Dempster($X_3$, $X_2$, $X_4$)

On the other hand, if a query is posed at a more detailed level, then the call to Propagate($X_1$, $\Gamma'$, $X_3$) is replaced by Propagate($X_2$, $\Gamma$, $X_3$), where the mass function on the general frame is propagated to the finer frame through a refinement mapping $\Gamma$. In this case, the sequence of executions of predicates is revised as:

BayesBelief($\tau_1$, $X_1$) ∧ BayesBelief($\tau_2$, $X_2$) ∧ Propagate($X_2$, $\Gamma$, $X_3$) ∧ Dempster($X_1$, $X_3$, $X_4$)

Example 11 Consider the following three uncertainty valid components where $\tau_1$, $\tau_2$ are the abstract terms of the left and right BelVCs, and $\tau_3$ is the corresponding abstract term for the ProVC.

Left BelVC:

<belfunction>
  <mass value="0.2">
    <massitem>water</massitem>
    <massitem>gas</massitem>
  </mass>
  <mass value="0.8">
    <massitem>sand</massitem>
  </mass>
</belfunction>

Right BelVC:

<belfunction>
  <mass value="0.4">
    <massitem>liquid</massitem>
  </mass>
  <mass value="0.6">
    <massitem>solid</massitem>
  </mass>
</belfunction>

ProVC:

<probability>
  <prob value="0.2">water</prob>
  <prob value="0.8">gas</prob>
</probability>

Let $Q$ be an agent's query about information on the possible deposit of a certain location at the most general level. $Q$ can be answered by the following string of calls to several predicates.

Propagate($\tau_1$, $\Gamma'$, $X_1$) ∧ Dempster($X_1$, $\tau_2$, $X_2$)

∧ BayesBelief($\tau_3$, $X_3$) ∧ Propagate($X_3$, $\Gamma'$, $X_4$) ∧ Dempster($X_2$, $X_4$, $X_5$)

In this, Propagate($\tau_1$, $\Gamma'$, $X_1$) ∧ Dempster($X_1$, $\tau_2$, $X_2$) takes the converted BelVC (from a detailed frame to a general frame through a coarsening mapping $\Gamma'$) as its first argument and combines it with the second BelVC (on the right-hand side) to produce a merged result denoted by $X_2$. This newly generated BelVC is then combined with another converted BelVC denoted by variable $X_4$ (from probability, using condition literal BayesBelief($\tau_3$, $X_3$) first and then propagated to the right frame) using predicate Dempster to obtain the final result, which is denoted by $X_5$.

<belfunction>
  <mass value="0.039">
    <massitem>liquid</massitem>
  </mass>
  <mass value="0.960">
    <massitem>solid</massitem>
  </mass>
</belfunction>

All three sources have a higher confidence in choice {solid} than in {liquid}, so the combined result gives a higher confidence in the choice preferred by all of them and a lower confidence in the less preferred one. This is due to the fact that these sources are in agreement with each other. Therefore, when multiple sources are not in conflict, merging them will produce a more complete and comprehensive solution than the individual sources. Methods for detecting inconsistencies among multiple sources have been discussed and provided in [HL04a] and we will not discuss them further here.
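The reinforcing effect described above can be seen already in the first combination step of Example 11, Dempster($X_1$, $\tau_2$, $X_2$). The Python sketch below applies Dempster's rule of combination to the coarsened left BelVC (liquid 0.2, solid 0.8) and the right BelVC (liquid 0.4, solid 0.6). The dictionary representation, the use of exact fractions, and the function name `dempster` are assumptions made for illustration, not the paper's Prolog implementation.

```python
from collections import defaultdict
from fractions import Fraction


def dempster(m1, m2):
    """Dempster's rule: multiply masses, intersect focal elements, renormalise."""
    joint = defaultdict(Fraction)
    conflict = Fraction(0)
    for a, x in m1.items():
        for b, y in m2.items():
            common = a & b
            if common:
                joint[common] += x * y
            else:
                conflict += x * y  # mass falling on the empty set
    if conflict == 1:
        raise ValueError("sources are in total conflict")
    return {focal: value / (1 - conflict) for focal, value in joint.items()}


LIQUID, SOLID = frozenset({"liquid"}), frozenset({"solid"})
x1 = {LIQUID: Fraction(1, 5), SOLID: Fraction(4, 5)}    # coarsened left BelVC
tau2 = {LIQUID: Fraction(2, 5), SOLID: Fraction(3, 5)}  # right BelVC
x2 = dempster(x1, tau2)
```

Here the conflicting mass is 0.2 × 0.6 + 0.8 × 0.4 = 0.44, so the combined masses are 0.08 / 0.56 = 1/7 for {liquid} and 0.48 / 0.56 = 6/7 for {solid}: the belief in {solid} after merging exceeds that of either source.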

4 Merging uncertain information on subtrees

To develop predicates for merging subtree uncertainty components, we need to look at the approaches to propagating mass functions among compound frames, since a subtree uncertainty component contains two or more frames of discernment whilst a query may only be related to one of them. The first subsection below looks into the techniques of mass function propagation in this situation, which is followed by a subsection on predicates to merge subtree uncertainty components.


4.1 Extension and projection operations in DS theory

The concept of compatible frames (or compatibility relations) can be extended to situations where a frame is in fact a Cartesian product (or a subset of the product) of several frames.

Definition 20 Let $\Omega_i$, $i = 1, \ldots, n$ be $n$ frames of discernment each of which contains mutually exclusive and exhaustive solutions to a related question or a variable. Frame $\Omega = \times_i \Omega_i$ is a joint frame containing solutions to the joint question or the joint variable.

For instance, let $\Omega_1$ be a frame containing values (answers) to question "What deposit is it?", and $\Omega_2$ be a frame containing values (answers) to question "What lithology is it?", then $\Omega_1 \times \Omega_2$ is the frame containing values for the joint question "What deposit and what lithology is it?" with values that are in the form $\langle \omega_{1i}, \omega_{2j} \rangle$ where $\omega_{1i} \in \Omega_1$ and $\omega_{2j} \in \Omega_2$. If some of the pairs $\langle \omega_{1i}, \omega_{2j} \rangle$ are false, that is, $\omega_{1i}$ and $\omega_{2j}$ are not compatible, $\Omega_1 \times \Omega_2$ is then a proper subset of the set product consisting of only those pairs with compatible elements from individual frames. Values of a frame are also referred to as configurations of the question or variable associated with the frame. For example, if we let $g$ and $h$ be two variables that can take values from $\Omega_1$ and $\Omega_1 \times \Omega_2$ respectively, then value water is a configuration of $g$ and value $\langle \text{water}, L1 \rangle$ is a configuration of $h$.

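Definition 21 [LHA03] Let $V = \{r_1, r_2, \ldots, r_n\}$ be $n$ variables each of which has a set of configurations represented by its associated frame of discernment $\Omega_i$. Let $V_p \subseteq V$ and $V_q \subseteq V$ be two subsets of variables where $V_p \subseteq V_q$, and let $\Omega_{V_p} = \times_{r_i \in V_p} \Omega_i$ and $\Omega_{V_q} = \times_{r_j \in V_q} \Omega_j$ be two joint frames for them. Let $Q \subseteq \Omega_{V_q}$ be a set of configurations of $V_q$. Then, the projection of $Q$ to $\Omega_{V_p}$, denoted by $Q^{\downarrow V_p}$, is a set of configurations for $V_p$. Similarly, let $H$ be a subset of $\Omega_{V_p}$, then the extension of $H$ to $\Omega_{V_q}$, denoted by $H^{\uparrow V_q}$, is $H \times \Omega_{V_q \setminus V_p}$, which is a set of configurations for variable set $V_q$.

$V_p$, a subset of variables, is also referred to as a joint variable. In the following, we talk about a subset of variables or a joint variable interchangeably without further explanation. In either case, $\Omega_{V_p}$ is the full set of configurations for it.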

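Example 12 Assume $r_1$, $r_2$, $r_3$, and $r_4$ are four variables taking values from frames of discernment $\Omega_i$, $i = 1, 2, 3, 4$ respectively, where $\Omega_1 = \{\omega_{11}, \omega_{12}\}$, $\Omega_2 = \{\omega_{21}, \omega_{22}, \omega_{23}\}$, $\Omega_3 = \{\omega_{31}, \omega_{32}, \omega_{33}\}$, and $\Omega_4 = \{\omega_{41}, \omega_{42}, \omega_{43}, \omega_{44}\}$. Let $V_p = \{r_1, r_2\}$ and $V_q = \{r_1, r_2, r_3\}$ be two subsets of variables and $Q = \{\langle \omega_{11}, \omega_{21}, \omega_{31} \rangle, \langle \omega_{12}, \omega_{23}, \omega_{31} \rangle\}$ be a set of configurations for $V_q$, then $Q^{\downarrow V_p} = \{\langle \omega_{11}, \omega_{21} \rangle, \langle \omega_{12}, \omega_{23} \rangle\}$ is a set of configurations for $V_p$.

However, given a set of configurations $H = \{\langle \omega_{11}, \omega_{21} \rangle, \langle \omega_{12}, \omega_{23} \rangle\}$ for $V_p$, its extension to variable set $V_q$ would be $Q' = H^{\uparrow V_q} = \{\langle \omega_{11}, \omega_{21}, \omega_{31} \rangle, \langle \omega_{12}, \omega_{23}, \omega_{31} \rangle, \langle \omega_{11}, \omega_{21}, \omega_{32} \rangle, \langle \omega_{12}, \omega_{23}, \omega_{32} \rangle, \langle \omega_{11}, \omega_{21}, \omega_{33} \rangle, \langle \omega_{12}, \omega_{23}, \omega_{33} \rangle\}$. This set of configurations is different from $Q$ although the projection of $Q$ is $H$ too.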
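Example 12 can be replayed in code to make the asymmetry between projection and extension concrete. In this sketch configurations are Python tuples ordered by a list of variable names; the names `project`, `extend` and the ASCII spellings `w11`, `w21`, ... for the frame elements are assumptions made for illustration.

```python
from itertools import product

# Frames of the variables used in Example 12 (r4 is not needed here).
FRAMES = {
    "r1": ["w11", "w12"],
    "r2": ["w21", "w22", "w23"],
    "r3": ["w31", "w32", "w33"],
}


def project(configs, from_vars, to_vars):
    """Projection: keep only the components that belong to to_vars."""
    idx = [from_vars.index(v) for v in to_vars]
    return {tuple(c[i] for i in idx) for c in configs}


def extend(configs, from_vars, to_vars, frames):
    """Extension: pair each configuration with every value of the extra variables."""
    extra = [v for v in to_vars if v not in from_vars]
    result = set()
    for c in configs:
        known = dict(zip(from_vars, c))
        for values in product(*(frames[v] for v in extra)):
            full = {**known, **dict(zip(extra, values))}
            result.add(tuple(full[v] for v in to_vars))
    return result


VP = ["r1", "r2"]
VQ = ["r1", "r2", "r3"]
Q = {("w11", "w21", "w31"), ("w12", "w23", "w31")}

H = project(Q, VQ, VP)
Q_EXT = extend(H, VP, VQ, FRAMES)
print(len(H), len(Q_EXT))
```

Projecting `Q_EXT` back onto `VP` returns `H` again, while `Q_EXT` strictly contains `Q`: extension cannot recover which values of `r3` were originally present, which is the point made at the end of the example.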

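Definition 22 Let $V_p \subseteq V$ and $V_q \subseteq V$ be two subsets of variables where $\emptyset \neq V_p \subseteq V_q$. Let $m$ be a mass function on $\Omega_{V_q}$ for the joint variable $V_q$, then the marginal of $m$ on $\Omega_{V_p}$ for the joint variable $V_p$, denoted by $m^{\downarrow V_p}$ defined below, is a mass function on $\Omega_{V_p}$

$m^{\downarrow V_p}(H) = \sum \{ m(G) \mid G \subseteq \Omega_{V_q},\ G^{\downarrow V_p} = H,\ G \text{ is a focal element} \}$

Equally, if $m$ is a mass function on $\Omega_{V_p}$ for the joint variable $V_p$, then the marginal of $m$ on $\Omega_{V_q}$ for the joint variable $V_q$, denoted by $m^{\uparrow V_q}$ defined below, is a mass function on $\Omega_{V_q}$

$m^{\uparrow V_q}(G) = \sum \{ m(H) \mid H \subseteq \Omega_{V_p},\ H^{\uparrow V_q} = G,\ H \text{ is a focal element} \}$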


Example 13 Consider the following two uncertainty components.

Left-hand document:

<belfunction>
  <mass value="0.2">
    <massitem>water</massitem>
    <massitem>gas</massitem>
  </mass>
  <mass value="0.8">
    <massitem>sand</massitem>
  </mass>
</belfunction>

Right-hand document:

<belfunction>
  <mass value="0.4">
    <multiitem>
      <deposit>water</deposit>
      <lithology>L1</lithology>
    </multiitem>
    <multiitem>
      <deposit>oil</deposit>
      <lithology>L3</lithology>
    </multiitem>
  </mass>
  <mass value="0.6">
    <multiitem>
      <deposit>gas</deposit>
      <lithology>L2</lithology>
    </multiitem>
  </mass>
</belfunction>

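The left-hand XML document defines a mass function on frame $\Omega_2 = \{\text{water}, \text{oil}, \text{gas}, \text{sand}, \text{stone}\}$ as $m_{\Omega_2}(\{\text{water}, \text{gas}\}) = 0.2$ and $m_{\Omega_2}(\{\text{sand}\}) = 0.8$, and the right-hand XML document defines another mass function on frame $\Omega_2 \times \Omega_3$, where $\Omega_3 = \{L1, L2, L3, L4, L5\}$, as $m_{\Omega_2 \times \Omega_3}(\{\langle \text{water}, L1 \rangle, \langle \text{oil}, L3 \rangle\}) = 0.4$ and $m_{\Omega_2 \times \Omega_3}(\{\langle \text{gas}, L2 \rangle\}) = 0.6$. Assume an agent is interested in knowing the joint impact of these two pieces of evidence on the value set $\Omega_2$; then the mass function on $\Omega_2 \times \Omega_3$ has to be marginalized on $\Omega_2$. Based on Definition 22, $m_{\Omega_2 \times \Omega_3}$ gives a new mass function on $\Omega_2$ that assigns 0.4 to $\{\text{water}, \text{oil}\}$ and 0.6 to $\{\text{gas}\}$, which can be merged with $m_{\Omega_2}$ using Dempster($\tau_1$, $\tau_2$, $X$) to obtain the final result $m(\{\text{water}\}) = 0.4$ and $m(\{\text{gas}\}) = 0.6$, if we assume that these two pieces of evidence are from independent sources.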
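The numbers in Example 13 can be verified with a short script that first marginalizes the joint mass function (Definition 22) and then applies Dempster's rule. As before, this is a sketch under assumptions: configurations are tuples, mass functions are dictionaries keyed by `frozenset`, and the function names `marginalize` and `dempster` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def marginalize(mass, variables, keep):
    """Definition 22: project every focal element onto `keep` and sum the masses."""
    idx = [variables.index(v) for v in keep]
    out = defaultdict(Fraction)
    for focal, value in mass.items():
        projected = frozenset(tuple(c[i] for i in idx) for c in focal)
        out[projected] += value
    return dict(out)


def dempster(m1, m2):
    """Dempster's rule of combination with normalisation by 1 - conflict."""
    joint = defaultdict(Fraction)
    conflict = Fraction(0)
    for a, x in m1.items():
        for b, y in m2.items():
            if a & b:
                joint[a & b] += x * y
            else:
                conflict += x * y
    if conflict == 1:
        raise ValueError("sources are in total conflict")
    return {focal: value / (1 - conflict) for focal, value in joint.items()}


# Right-hand document: mass function on the joint frame (deposit, lithology).
m_joint = {
    frozenset({("water", "L1"), ("oil", "L3")}): Fraction(2, 5),
    frozenset({("gas", "L2")}): Fraction(3, 5),
}
# Left-hand document: mass function on deposit, written as 1-tuples.
m_deposit = {
    frozenset({("water",), ("gas",)}): Fraction(1, 5),
    frozenset({("sand",)}): Fraction(4, 5),
}

m_marginal = marginalize(m_joint, ["deposit", "lithology"], ["deposit"])
merged = dempster(m_deposit, m_marginal)
```

The mass 0.8 on {sand} is entirely in conflict with the marginalized evidence, so after normalisation by 1 − 0.8 the products 0.2 × 0.4 and 0.2 × 0.6 become 0.4 on {water} and 0.6 on {gas}, as stated in the example.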

4.2 Predicate for belief marginalization on subtrees

Now we provide a procedure that implements the marginalization of a mass function from a larger variable set to a smaller set, as defined in Definition 22.

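Definition 23 Let the following be a BelSC <belfunction>$\sigma^1_1, \ldots, \sigma^1_q$</belfunction> where

1. $\sigma^1_i \in \{\sigma^1_1, \ldots, \sigma^1_q\}$ is of the form <mass value="$\kappa^1_i$">$\psi^1_i$</mass>

2. $\psi^1_i$ is a sequence of $n$ multiitem elements, <multiitem> ... </multiitem> ... <multiitem> ... </multiitem>

3. the content of the $t$-th multiitem element of $\psi^1_i$ is of the form <$\rho^1_{it1}$>$\phi^1_{it1}$</$\rho^1_{it1}$>, ..., <$\rho^1_{itl}$>$\phi^1_{itl}$</$\rho^1_{itl}$>

4. and $\rho^1_{it1}, \ldots, \rho^1_{itl}$ are tag names, and $\phi^1_{it1}, \ldots, \phi^1_{itl}$ are text entries.

Let the variable set associated with it be $V_q$ with configurations in $\Omega_{V_q}$. Let $V_p \subseteq V_q$.

When $|V_p| > 1$, let the marginalized BelSC on $\Omega_{V_p}$ be

<belfunction>$\sigma^2_1, \ldots, \sigma^2_p$</belfunction>

where each $\sigma^2_j \in \{\sigma^2_1, \ldots, \sigma^2_p\}$ is of the form <mass value="$\kappa^2_j$">$\psi^2_j$</mass>

and each $\psi^2_j$ is a sequence of $m$ multiitem elements, <multiitem> ... </multiitem> ... <multiitem> ... </multiitem>

and the content of the $k$-th multiitem element of $\psi^2_j$ is of the form

<$\rho^2_{jk1}$>$\phi^2_{jk1}$</$\rho^2_{jk1}$>, ..., <$\rho^2_{jkf}$>$\phi^2_{jkf}$</$\rho^2_{jkf}$>

and $\kappa^2_j = \sum_i \kappa^1_i$, s.t. $\{\langle \phi^1_{it1}, \ldots, \phi^1_{itl} \rangle\}^{\downarrow V_p} = \{\langle \phi^2_{jk1}, \ldots, \phi^2_{jkf} \rangle\}$

When $|V_p| = 1$, let the marginalized BelVC on $\Omega_{V_p}$ be

<$\rho^2$><belfunction>$\sigma^2_1, \ldots, \sigma^2_p$</belfunction></$\rho^2$>

where each $\sigma^2_j \in \{\sigma^2_1, \ldots, \sigma^2_p\}$ is of the form <mass value="$\kappa^2_j$">$\psi^2_j$</mass>

and each $\psi^2_j$ is of the form

<massitem>$\phi^2_{j1}$</massitem>, ..., <massitem>$\phi^2_{jm}$</massitem>

and $\kappa^2_j = \sum_i \kappa^1_i$ such that $\{\langle \phi^1_{it1}, \ldots, \phi^1_{itl} \rangle\}^{\downarrow V_p} = \{\phi^2_{jz}\}$ and $\phi^2_{jz} \in \{\phi^2_{j1}, \ldots, \phi^2_{jm}\}$

and $\rho^2$ is a tag name that is associated with the set of values in $\Omega_{V_p}$.

When $|V_p| = 1$, that is, there is only one variable in set $V_p$, a BelSC is reduced to a BelVC.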

Definition 24 Let the abstract term $\tau$ be a BelSC on a subtree with variable set $V_q$. Let $V_p$ be a subset of $V_q$ and $X$ be a logical variable. The predicate PropagateTree($\tau$, $V_p$, $X$) is such that $X$ is evaluated to $\tau'$ where $\tau'$ is the abstract term denoting the propagated BelSC (or BelVC) on $V_p$ obtained by Definition 23.

Example 14 Let $\tau$ denote the BelSC in Example 13. Applying predicate PropagateTree($\tau$, {deposit}, $X$), we obtain a new BelVC as

<deposit>
  <belfunction>
    <mass value="0.4">
      <massitem>water</massitem>
      <massitem>oil</massitem>
    </mass>
    <mass value="0.6">
      <massitem>gas</massitem>
    </mass>
  </belfunction>
</deposit>

Since there is only one variable to project on when using this predicate, a subtree structure is reduced to a BelVC on a text entry.
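For the single-variable case, the effect of the predicate can be imitated directly on the XML. The following Python sketch parses the BelSC of Example 13, projects every multiitem onto one tag, and emits a BelVC wrapped in that tag. The function name `propagate_tree`, the alphabetical ordering of the resulting massitem elements, and the use of floats for the mass values are assumptions of this sketch rather than part of Definition 23.

```python
import xml.etree.ElementTree as ET

BELSC = """
<belfunction>
  <mass value="0.4">
    <multiitem><deposit>water</deposit><lithology>L1</lithology></multiitem>
    <multiitem><deposit>oil</deposit><lithology>L3</lithology></multiitem>
  </mass>
  <mass value="0.6">
    <multiitem><deposit>gas</deposit><lithology>L2</lithology></multiitem>
  </mass>
</belfunction>
"""


def propagate_tree(belsc, tag):
    """Project a BelSC onto the single variable `tag`; return a BelVC inside <tag>."""
    totals = {}  # projected focal element (sorted tuple of values) -> summed mass
    for mass in belsc.findall("mass"):
        values = {item.findtext(tag).strip() for item in mass.findall("multiitem")}
        key = tuple(sorted(values))
        totals[key] = totals.get(key, 0.0) + float(mass.get("value"))
    wrapper = ET.Element(tag)
    bel = ET.SubElement(wrapper, "belfunction")
    for values, total in totals.items():
        mass = ET.SubElement(bel, "mass", {"value": str(total)})
        for value in values:
            ET.SubElement(mass, "massitem").text = value
    return wrapper


result = propagate_tree(ET.fromstring(BELSC.strip()), "deposit")
print(ET.tostring(result, encoding="unicode"))
```

Focal elements that project to the same set of values are merged by summing their masses, which is the role of the summation over $i$ in Definition 23.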

Similar to the procedure and predicate above, it is possible to define another procedure and predicate to marginalize a mass function from $\Omega_{V_p}$ to $\Omega_{V_q}$ through an extension operation. However, obtaining a mass function on a larger frame (with more variables) is not as useful as the projection operation, which derives a mass function on a smaller frame; therefore we will not include these detailed definitions in this paper.


5 Comparison with related approaches

In [NJ02], ... Using this model, we can construct an XML report as illustrated below. Two types of probability assignments are distinguished, mutually exclusive or not mutually exclusive. For the first type, probabilities are assigned to single atoms where only one of these atoms can be true, and the total sum of probability values is less than or equal to 1 (as for precipitation). For the second type, two single atoms can be compatible, so the total sum of probabilities can be greater than 1 (as for cities).

<report>
  <source>TV1</source>
  <date>19/3/02</date>
  <cities>
    <city Prob="0.7">
      <cityName>London</cityName>
      <precipitation>
        <Dist type="mutually exclusive">
          <Val Prob="0.1">sunny</Val>
          <Val Prob="0.7">rain</Val>
        </Dist>
      </precipitation>
    </city>
    <city Prob="0.4">
      <cityName>GreaterLondon</cityName>
      <precipitation>
        <Dist type="mutually exclusive">
          <Val Prob="0.2">sunny</Val>
          <Val Prob="0.6">rain</Val>
        </Dist>
      </precipitation>
    </city>
  </cities>
</report>

This model allows probabilities to be assigned at multiple granularities. When this occurs, the probability that an element is true is conditioned upon the existence of its parent (with probability), and so on up to the root of the tree. For example, if we would like to know the probability of sunny in London, we have

Prob(precipitation = sunny ∧ cityName = London)

= Prob(precipitation = sunny) × Prob(cityName = London)

× Prob(precipitation = sunny ∧ cityName = London | city) × Prob(city | cities) × Prob(cities | report) × Prob(report)

= 0.1 × 1.0 × 0.7 × 1.0 × 1.0 × 1.0 = 0.07
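The derivation above is a product of the factors collected along one branch of the tree. A minimal Python sketch follows; the list of labelled factors simply transcribes the six terms of the derivation in order, and the names `SUNNY_IN_LONDON` and `branch_probability` are illustrative assumptions rather than anything defined in [NJ02].

```python
from math import isclose, prod

# The six factors of the derivation for "sunny in London", in order.
SUNNY_IN_LONDON = [
    ("precipitation = sunny", 0.1),
    ("cityName = London", 1.0),
    ("precipitation = sunny and cityName = London | city", 0.7),
    ("city | cities", 1.0),
    ("cities | report", 1.0),
    ("report", 1.0),
]


def branch_probability(factors):
    """Multiply the probabilities attached to the factors of one branch."""
    return prod(p for _, p in factors)


p_sunny = branch_probability(SUNNY_IN_LONDON)
print(round(p_sunny, 2))
```

Floating-point multiplication may not give exactly 0.07, so any comparison against the hand-computed value should use a tolerance such as `math.isclose`.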

Therefore, the probability associated with a text entry (at any level) is treated as the conditional probability under its parent. A query is answered by tracing the relevant branches with the text entries specified by the query, and calculating probabilities using the conditional probabilities along these branches. These derived probabilities are then either multiplied or added depending on whether the "and" or the "or" operation is used in the original query. For instance, the query "London is either sunny or rain on 19/3/02" is evaluated as:

Prob(cityName = London ∧ ((precipitation = sunny) ∨ (precipitation = rain)))

= Prob(cityName = London) × Prob(precipitation = sunny)

× Prob(cityName = London ∧ precipitation = sunny | city) × Prob(city | cities)

× Prob(cities | report) × Prob(report)

+ Prob(cityName = London) × Prob(precipitation = rain)

× Prob(cityName = London ∧ precipitation = rain | city) × Prob(city | cities)

× Prob(cities | report) × Prob(report)

= (1.0 × 0.1 × 0.7 × 1.0 × 1.0 × 1.0) + (1.0 × 0.7 × 0.7 × 1.0 × 1.0 × 1.0) = 0.07 + 0.49 = 0.56.

The main advantage of this model is that it allows probabilities to be assigned to multiple levels of subtrees and provides a means to calculate the joint probability from them. However, it does not merge multiple probabilistic XML documents on the same issue. In contrast, our uncertainty XML model focuses on multiple XML datasets and provides a set of means to merge opinions with uncertainty from different sources on text entries and subtrees. Therefore, our modelling and reasoning method is more general than that in [NJ02].

Figure 1: A probabilistic XML tree.

Another method to model and reason with probabilistic XML information is reported in [KKA05]. In that paper, three types of tags are identified: (1) tags that stand for probabilities; (2) tags that stand for possible values associated with probabilities; and (3) ordinary tag names. Each type is denoted by its own node symbol, and a tree structure including these notations is illustrated in Figure 1 [KKA05].

Since the authors of the paper did not provide the actual XML structure for the example (or any other examples) to explicitly show how these types of tags are represented, we created an XML document for this example based on our own understanding, as demonstrated in Figure 2 left. As we can see, there is a lot of redundant information in this XML document; for instance, the tags related to possible values are not strictly required, since a possible-value tag will always sit between a probability tag and a normal tag. This example can be equivalently represented in our ProSC format with a more compact structure, as shown in Figure 2 right.

Apart from the apparent structural differences between the approach in [KKA05] and ours, the real difference lies in the merging process itself. In [KKA05], each pair of (tag, value) and the combination of these pairs are treated as possible worlds. Merging two probabilistic XML documents means generating all the combinations of possible worlds from the two documents. As a consequence, there can be a huge number of branches in the merged XML document and there can be many variants of the document. For instance, one example given in the paper consists of two simple XML documents about persons with certainty (no probabilities). One document has details for four persons, each with tags firstname, lastname, phone, room and associated values; the other document has details for two persons with the same set of tag names and corresponding values. Interestingly, merging these two simple documents in [KKA05] generates 3201 possible worlds, which results in a very large and complex tree. Most of
