Semantic Role Labelling of Prepositional Phrases


Abstract. In this paper, we propose a method for labelling prepositional phrases according to two different semantic role classifications, as contained in the Penn treebank and the CoNLL 2004 Semantic Role Labelling data set. Our results illustrate the dif


Patrick Ye and Timothy Baldwin

Department of Computer Science and Software Engineering

University of Melbourne, VIC 3010, Australia

{jingy,tim}@cs.mu.oz.au

Abstract. In this paper, we propose a method for labelling prepositional phrases according to two different semantic role classifications, as contained in the Penn treebank and the CoNLL 2004 Semantic Role Labelling data set. Our results illustrate the difficulties in determining preposition semantics, but also demonstrate the potential for PP semantic role labelling to improve the performance of a holistic semantic role labelling system.

1 Introduction

Prepositional phrases (PPs) are both common and semantically varied in open English text. While PPs can occur as both complements and adjuncts to verbs [1] and also as complements to nouns [2], the semantics of a given PP can often be predicted with reasonable reliability independent of context. Consider, for example, the PP to the car: our expectation would be for it to occur as a directional adjunct, and only in instances such as refer to the car would we see a significant divergence from this semantics, although here, the immediate context of the PP would give us an immediate sense of the semantic shift. Based on this observation, we may consider the possibility of constructing a semantic tagger specifically for PPs, which uses the immediate context of the PP to arrive at a semantic analysis. It is this task of targeted PP semantic role labelling that we target in this paper.

A PP semantic role labeller would allow us to take a document and identify all adjunct PPs with their semantics. We would expect this to include a large portion of locative and temporal expressions, e.g., in the document, providing valuable data for tasks such as information extraction and question answering. Indeed our initial foray into PP semantic role labelling relates to an interest in geoparsing, and the realisation of the importance of PPs in identifying and classifying spatial references.

The contributions of this paper are to propose a method for PP semantic role labelling, and evaluate its performance over both the Penn treebank (including comparative evaluation with previous work) and also the data from the CoNLL Semantic Role Labelling shared task. As part of this process, we identify the level of complementarity of a dedicated PP semantic role labeller with a conventional holistic semantic role labeller, suggesting PP semantic role labelling as a potential avenue for boosting the performance of existing systems.


Fig. 1. An example of the preposition semantic roles in Penn Treebank

2 Preposition Semantic Role Disambiguation in Penn Treebank

Significant numbers of prepositional phrases (PPs) in the Penn treebank [3] are tagged with their semantic role relative to the governing verb. For example, Figure 1 shows a fragment of the parse tree for the sentence: [Japan’s reserves of gold, convertible foreign currencies, and special drawing rights] fell by a hefty $1.82 billion in October to $84.29 billion [the Finance Ministry said], in which the three PPs governed by the verb fell are tagged as, respectively: PP-EXT “extent”, meaning how much the reserves fell; PP-TMP “temporal”, meaning when the reserves fell; and PP-DIR “direction”, meaning the direction of the fall.

According to our analysis, there are 143 preposition semantic roles in the treebank. However, many of these semantic roles are very similar to one another; for example, the following semantic roles were found in the treebank: PP-LOC, PP-LOC-1, PP-LOC-2, PP-LOC-3, PP-LOC-4, PP-LOC-5, PP-LOC-CLR, PP-LOC-CLR-2, PP-LOC-CLR-TPC-1. Inspection of the data revealed no systematic semantic differences between these PP types. Indeed, for most PPs, it was impossible to distinguish the subtypes of a given superclass (e.g. PP-LOC in our example). We therefore decided to collapse the PP semantic roles based on their first semantic feature. For example, all semantic roles that start with PP-LOC are collapsed to the single class PP-LOC. Table 1 shows the distribution of the collapsed preposition semantic roles.
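The collapsing rule can be illustrated with a minimal sketch. The function name collapse_role is hypothetical, and the handling of labels whose second field is purely numeric (treated here as an index rather than a semantic feature) is an assumption, not something stated in the paper.

```python
def collapse_role(label: str) -> str:
    """Map a PP label such as 'PP-LOC-CLR-2' to its collapsed class 'PP-LOC'."""
    parts = label.split("-")
    # Leave non-PP labels untouched.
    if parts[0] != "PP":
        return label
    # Assumption: a purely numeric second field is an index marker,
    # not a semantic feature, so the label collapses to plain 'PP'.
    if len(parts) < 2 or parts[1].isdigit():
        return "PP"
    # Keep only the first semantic feature after the 'PP' prefix.
    return f"PP-{parts[1]}"


# The nine PP-LOC variants listed in the text all collapse to one class.
labels = [
    "PP-LOC", "PP-LOC-1", "PP-LOC-2", "PP-LOC-3", "PP-LOC-4",
    "PP-LOC-5", "PP-LOC-CLR", "PP-LOC-CLR-2", "PP-LOC-CLR-TPC-1",
]
collapsed = {collapse_role(label) for label in labels}
```

Splitting on the hyphen and keeping only the second field is enough for the labels quoted above; a real treebank reader may need extra care for labels with "=" gap indices.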

O’Hara and Wiebe [4] describe a system for disambiguating the semantic roles of prepositions in the Penn treebank according to 7 basic semantic classes. In their system, O’Hara and Wiebe used a decision tree classifier, and the following types of features:

– POS tags of surrounding tokens: The POS tags of the tokens before and after the target preposition within a predefined window size. In O’Hara and Wiebe’s work, this window size is 2.
– POS tag of the target preposition
– The target preposition
– Word collocation: All the words in the same sentence as the target preposition; each word is treated as a binary feature.
– Hypernym collocation: The WordNet hypernyms [5] of the open class words before and after the target preposition within a predefined window size (set to 5 words); each hypernym is treated as a binary feature.
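As a rough illustration of the first four feature types, the sketch below builds a feature dictionary for one preposition from a POS-tagged sentence. The function name, the feature-key format and the "<PAD>" value are all assumptions made for this example; hypernym collocations are left out because they would require a WordNet lookup.

```python
from typing import Dict, List, Tuple, Union

Feature = Union[str, bool]


def extract_features(
    tagged: List[Tuple[str, str]], prep_index: int, pos_window: int = 2
) -> Dict[str, Feature]:
    """Build a feature dictionary for the preposition at prep_index."""
    features: Dict[str, Feature] = {}
    word, pos = tagged[prep_index]

    # The target preposition and its POS tag.
    features["prep"] = word.lower()
    features["prep_pos"] = pos

    # POS tags of the tokens within the window on either side.
    for offset in range(-pos_window, pos_window + 1):
        if offset == 0:
            continue
        i = prep_index + offset
        in_range = 0 <= i < len(tagged)
        features[f"pos[{offset:+d}]"] = tagged[i][1] if in_range else "<PAD>"

    # Word collocation: every other word in the sentence is a binary feature.
    for i, (other, _) in enumerate(tagged):
        if i != prep_index:
            features[f"word={other.lower()}"] = True

    return features


sentence = [("reserves", "NNS"), ("fell", "VBD"), ("by", "IN"),
            ("a", "DT"), ("hefty", "JJ")]
feats = extract_features(sentence, prep_index=2)
```

A dictionary of this shape can be passed to most feature-vectorising utilities before training a decision tree.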


Table 1. Penn treebank semantic role distribution (top-20 roles)

O’Hara and Wiebe’s system also performs the following pre-classification filtering on the collocation features:

– Frequency constraint: f(coll) > 1, where coll is either a word from the word collocation or a hypernym from the hypernym collocation

– Conditional independence threshold: p(c|coll) p(c)

1 Please note that N is the number of feature frequency bins and not the number of features, and that it is possible for more than one feature to occur with a given frequency.


1. Let s be the list that contains the frequency of all the collocation features
2. Sort s in descending order
3. minFrequency = s[N]
4. Discard all features whose frequency is less than minFrequency
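The four steps above can be sketched as follows. This is an illustration under assumptions rather than the authors' implementation: s is taken to hold the distinct frequency values (one per frequency bin, following the footnote that N counts bins rather than features), s[N] is read as 1-indexed, and the function name is hypothetical.

```python
from typing import Dict


def select_top_frequency_bins(
    feature_counts: Dict[str, int], n_bins: int
) -> Dict[str, int]:
    """Keep the features that fall in the n_bins highest frequency bins."""
    # Steps 1-2: the distinct frequencies, sorted in descending order.
    s = sorted(set(feature_counts.values()), reverse=True)
    if not s or n_bins <= 0:
        return {}
    # Step 3: minFrequency = s[N], clamped when fewer than N bins exist.
    min_frequency = s[min(n_bins, len(s)) - 1]
    # Step 4: discard every feature rarer than minFrequency.
    return {f: c for f, c in feature_counts.items() if c >= min_frequency}


counts = {"bank": 5, "river": 5, "money": 3, "loan": 2, "tree": 1}
kept = select_top_frequency_bins(counts, n_bins=2)
```

Because several features can share a frequency, the number of surviving features can exceed N; here two bins (frequencies 5 and 3) keep three features.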

Ranking Accuracy (%)

O’Hara & Wiebe 85.8


3 We build our classifier using the J48 decision tree implementation in WEKA, for direct comparability with O’Hara and Wiebe. O’Hara’s system was also evaluated using stratified 10-fold cross validation.


method with a holistic SRL system to demonstrate the ability of PP semantic role labelling to enhance overall system performance.

Since the focus of the CoNLL data is on SRL relative to a set of pre-determined verbs for each sentence input, our primary objective is to investigate whether the performance of SRL systems in general can be improved in any way by an independent preposition SRL system. We achieve this by embedding our PP classification method within an existing holistic SRL system (that is, a system which attempts to tag all semantic role types in the CoNLL 2004 data) through the following three steps:

1. Perform SRL on each preposition in the CoNLL data set;
2. Merge the output of the preposition SRL with the output of a given verb SRL system over the same data set;
3. Perform standard CoNLL SRL evaluation over the merged output.

The details of preposition SRL and combination with the output of a holistic SRL system are discussed below.

3.1 Breakdown of the Preposition Semantic Role Labelling Problem

Preposition semantic role labelling over the CoNLL data set is considerably more complicated than the task of disambiguating preposition semantic roles in the Penn treebank. There are three separate subtasks which are required to perform preposition SRL:

1. Verb Attachment: determining which preposition is attached to which verb.
2. Preposition Semantic Role Disambiguation
3. Segmentation: determining the boundaries of the semantic roles.
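One way to picture how the three subtasks fit together is the skeleton below, where each subtask is passed in as a function. All names here (PrepArgument, label_prepositions, attach, disambiguate, segment) are hypothetical, and the toy stand-ins at the end are not the classifiers used in the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class PrepArgument:
    verb_index: int          # token index of the governing verb
    role: str                # semantic role label, e.g. "AM-TMP"
    span: Tuple[int, int]    # inclusive token indices of the argument


def label_prepositions(
    tokens: List[str],
    prep_indices: List[int],
    attach: Callable[[List[str], int], Optional[int]],
    disambiguate: Callable[[List[str], int, int], str],
    segment: Callable[[List[str], int], Tuple[int, int]],
) -> List[PrepArgument]:
    """Run verb attachment, role disambiguation and segmentation in turn."""
    arguments = []
    for p in prep_indices:
        verb = attach(tokens, p)            # subtask 1: verb attachment
        if verb is None:                    # not attached to any target verb
            continue
        role = disambiguate(tokens, p, verb)  # subtask 2: role disambiguation
        span = segment(tokens, p)             # subtask 3: segmentation
        arguments.append(PrepArgument(verb, role, span))
    return arguments


# Toy stand-ins for the three components, for illustration only.
tokens = ["reserves", "fell", "in", "October"]
result = label_prepositions(
    tokens,
    prep_indices=[2],
    attach=lambda toks, p: 1,
    disambiguate=lambda toks, p, v: "AM-TMP",
    segment=lambda toks, p: (p, len(toks) - 1),
)
```

Because an argument is only counted as correct when its verb, role and extent are all right, an error in any one of the three functions invalidates the whole PrepArgument.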

The three subtasks are not totally independent of each other, as we demonstrate in the results section, and improved performance over one of the subtasks does not necessarily correlate with an improvement in the final results.

3.2 Verb Attachment Classification

Verb attachment (VA) classification is the first step of preposition semantic role labelling and involves determining the verb attachment site for a given preposition, i.e. which of the pre-identified verbs in the sentence the preposition is governed by. Normally, this task would be performed by a parser. However, since the CoNLL data set contains no parsing information and we did not want to use any resources not explicitly provided in the CoNLL data, we had to construct a VA classifier to specifically perform this task.

This classifier uses the following features, all of which are derived from information provided in the CoNLL data:


Value   Count   %
None    3005    60.71
-1      1454    29.37
1       411     8.30
-2      40      0.81
2       29      0.59
3       8       0.16
-3      2       0.04
-6      1       0.02

6 http://homepages.inf.ed.ac.uk/s0450736/maxent


Role     Count   %
A1       424     21.79
A2       355     18.24
AM-TMP   299     15.36
AM-LOC   188     9.66
A0       183     9.40
AM-MNR   125     6.42
A3       106     5.45
AM-ADV   71      3.65
A4       44      2.26
AM-CAU   40      2.06
AM-PNC   32      1.64
AM-DIS   32      1.64
AM-DIR   19      0.97
AM-EXT   7       0.36
C-A1     4       0.21
R-A1     2       0.10
R-A0     2       0.10
C-V      2       0.10
C-A0     2       0.10
AM-PRD   2       0.10


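VA      SRD     SEG     P      R      F
AUTO    AUTO    NP      38.77  4.58   8.2
AUTO    AUTO    ORACLE  55.12  6.96   12.36
AUTO    ORACLE  NP      62.68  7.42   13.27
AUTO    ORACLE  ORACLE  91.41  11.53  20.48
ORACLE  AUTO    NP      42.2   6.96   11.95
ORACLE  AUTO    ORACLE  56.64  10.36  17.51
ORACLE  ORACLE  NP      71.64  11.81  20.28
ORACLE  ORACLE  ORACLE  99.37  18.15  30.69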


SRD AUTO   SEG NP   ORIG

VA AUTO S1 P R F 72.43 66.77 69.49 72.00 66.84 69.32 SEG ORACLE P R 72.43 66.77 72.08 66.91 F 69.49 69.40 SEG NP P R 72.43 66.77 72.13 66.95 SRD ORACLE SEG ORACLE F 69.49 69.44 P R F 72.43 66.77 69.49 72.31 67.11 69.61
S3 VA ORACLE 71.01 66.16 68.50 VA AUTO 70.10 65.00 67.46 69.78 65.21 72.25 66.83 67.42 69.43 73.68 68.67 71.08 73.12 67.84 70.38 74.35 69.55 71.87 77.16 71.39 74.16
S2 VA ORACLE 68.18 64.59 66.33 68.93 65.57 67.21 VA AUTO 68.21 63.52 65.79 68.31 63.64 65.89 69.75 66.09 67.87 70.53 65.68 68.02 71.65 68.18 69.87 71.87 66.94 69.32 VA ORACLE 66.79 63.22 64.96 69.58 66.05 67.76 71.98 68.14 70.01 77.87 73.93 75.85

Table 7. Preposition SRL combined with [10] (P = precision, R = recall, F = F-score; above-baseline results in boldface)

SRD AUTO   SEG NP   ORIG

VA AUTO S1 SEG ORACLE SEG NP SRD ORACLE SEG ORACLE P R F 71.81 61.11 66.03 72.34 63.83 67.82 P R F P R F 71.81 61.11 66.03 71.81 61.11 66.03 70.23 61.87 65.78 70.74 62.43 66.32 P R F 71.81 61.11 66.03 71.13 62.65 66.62
S3 VA ORACLE 69.14 62.19 65.48 68.84 62.35 65.43 VA AUTO 69.01 60.66 64.57 71.31 62.57 66.65 72.79 65.47 68.94 72.24 63.49 67.58 74.83 67.82 71.15 76.54 67.15 71.54


their respective limits are, we also used oracled outputs from each subtask in combining the final outputs of the preposition SRL system. The oracled outputs are what would be produced by perfect classifiers, and are emulated by inspection of the gold-standard annotations for the testing data.

Table 5 shows the results of the preposition SRL systems before they are merged with the verb SRL systems. These results show that the coverage of our preposition SRL system is low relative to the total number of arguments in the testing data, even when oracled outputs from all three subsystems are used (recall = 18.15%). However, this is not surprising because we expected the majority of semantic roles to be noun phrases.
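To make the link between precision, recall and F-score concrete, the sketch below computes the balanced F-score (the harmonic mean of precision and recall). That the reported F is this balanced variant is an assumption here; the function name is hypothetical.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Fully oracled preposition SRL: precision 99.37, recall 18.15.
oracle_f = round(f_score(99.37, 18.15), 2)
```

Because the harmonic mean is dominated by the smaller of its two inputs, near-perfect precision cannot compensate for a recall of 18.15%, and the resulting F-score stays close to 30.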

In Tables 6, 7 and 8, we show how our preposition SRL system performs when merged with the top 3 systems under the 3 merging strategies introduced in Section 3.6. In each table, ORIG refers to the base system without preposition SRL merging.

We can make a few observations from the results of the merged systems. First, out of verb attachment, SRD and segmentation, the SRD module is both: (a) the component with the greatest impact on overall performance, and (b) the component with the greatest differential between the oracle performance and classifier (AUTO) performance. This would thus appear to be the area in which future efforts should be concentrated in order to boost the performance of holistic SRLs through preposition SRL.

Second, the results show that in most cases, the recall of the merged system is higher than that of the original SRL system. This is not surprising given that we are generally relabelling or adding information to the argument structure of each verb, although with the more aggressive merging strategies (namely S2 and S3) it sometimes happens that recall drops, through the extent of an argument being adversely affected by relabelling. It does seem to point to a complementarity between verb-driven SRL and preposition-specific SRL, however.

Finally, it was somewhat disappointing to see that in no instance did a fully-automated method surpass the base system in precision or F-score. Having said this, we are encouraged by the size of the margin between the base systems and the fully oracle-based systems, as it supports our base hypothesis that preposition SRL has the potential to boost the performance of holistic SRL systems, up to a margin of 10% in F-score for S3.

4 Analysis and Discussion

In the previous 2 sections, we presented the methodologies and results of two systems that perform statistical analysis on the semantics of prepositions, each using a different data set. The performance of the 2 systems was very different. The SRD system trained on the treebank produced almost perfect results, whereas the SRL system trained on the CoNLL 2004 SRL data set produced somewhat negative results. In the remainder of this section, we will analyze these results and discuss their significance.

The almost perfect results on the treebank data suggest that the semantic tagging of prepositions in the treebank is highly artificial. This is evident in three ways. First, the proportion of prepositional phrases tagged with semantic roles is small: around 57,000 PPs out of the million-word Treebank corpus. This small proportion suggests that the preposition semantic roles were tagged only in certain prototypical situations. Second, we were able to achieve reasonably high results even when we used a collocation feature set with fewer than 200 features. This further suggests that the semantic roles were tagged for only a small number of verbs in relatively fixed situations. Third, the preposition SRD system for the CoNLL data set used a very similar feature set to the treebank system, but was not able to produce anywhere near comparable results. Since the CoNLL data set is aimed at holistic SRL across all argument types, it incorporates a much larger set of verbs and tagging scenarios; as a result, the semantic role labelling of PPs is far more heterogeneous and realistic than is the case in the treebank. Therefore, we conclude that the results of our treebank preposition SRD system are not very meaningful in terms of predicting the success of the method at identifying and semantically labelling PPs in open text.

A few interesting facts came out of the results over the CoNLL data set. The most important one is that by using an independent preposition SRL system, the results of a general verb SRL system can be significantly boosted. This is evident because when the oracled results of all three subtasks were used, the merged results were around 10% higher than those for the original systems, in all three cases. Unfortunately, it was also evident from the results that we were not successful in automating preposition SRL. Due to the strictness of the CoNLL evaluation, it was not always possible to achieve a better overall performance by improving just one of the three subsystems. For example, in some cases, worse results were achieved by using the oracled results for VA together with the results produced by the SRD classifier than by using the VA classifier and the SRD classifier in conjunction. The reason for the worse results is that in our experiments, the oracled VA always identifies more prepositions attached to verbs than the VA classifier, and therefore more prepositions will be given semantic roles by the SRD classifier. However, since the performance of the SRD classifier is not high, and the segmentation subsystem does not always produce the same semantic role boundaries as the CoNLL data set, most of these additional prepositions would either be given a wrong semantic role or wrong phrasal extent (or both), thereby causing the overall performance to fall.

Finally, it is evident that the merging strategy also plays an important role in determining the performance of the merged preposition SRL and verb SRL systems: when the performance of the preposition SRL system is high, a more preposition-oriented merging scheme would produce better overall results, and vice versa.

5 Conclusion and Future Work

In this paper, we have proposed a method for labelling preposition semantics and deployed the method over two different data sets involving preposition semantics. We have shown that preposition semantics is not a trivial problem in general, and also that it has the potential to complement other semantic analysis tasks, such as semantic role labelling.

Our analysis of the results of the preposition SRL system shows that significant improvement in all three stages of preposition semantic role labelling (namely verb attachment, preposition semantic role disambiguation and argument segmentation) must be achieved before preposition SRL can make a significant contribution to holistic SRL. The unsatisfactory results of our CoNLL preposition SRL system show that the relatively simplistic feature sets used in our research are far from sufficient. Therefore, we will direct our future work towards using additional NLP tools, information repositories and feature engineering to improve all three stages of preposition semantic role labelling.

References

[1] Merlo, P., Leybold, M.: Automatic distinction of arguments and modifiers: the case of prepositional phrases. In: Proc. of the ACL/EACL-2001 Workshop on Computational Natural Language Learning (CoNLL-2001), Toulouse, France (2001) 121–8

[2] Abney, S., Schapire, R.E., Singer, Y.: Boosting applied to tagging and PP attachment. In: Proc. of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora. (1999)

[3] Marcus, M.P., Marcinkiewicz, M.A., Santorini, B.: Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics 19 (1993) 313–330

[4] O’Hara, T., Wiebe, J.: Preposition semantic classification via Treebank and FrameNet. In: Proc. of the 7th Conference on Natural Language Learning (CoNLL-2003), Edmonton, Canada (2003)

[5] Miller, G.A.: WordNet: a lexical database for English. Communications of the ACM 38 (1995) 39–41

[6] Carreras, X., Màrquez, L.: Introduction to the CoNLL-2004 shared task: Semantic role labeling. In: Proc. of the 8th Conference on Natural Language Learning (CoNLL-2004), Boston, USA (2004) 89–97

[7] Berger, A.L., Pietra, V.J.D., Pietra, S.A.D.: A maximum entropy approach to natural language processing. Comput. Linguist. 22 (1996) 39–71

[8] Briscoe, T., Carroll, J.: Robust accurate statistical annotation of general text. In: Proc. of the 3rd International Conference on Language Resources and Evaluation (LREC 2002), Las Palmas, Canary Islands (2002) 1499–1504

[9] Hacioglu, K., Pradhan, S., Ward, W., Martin, J.H., Jurafsky, D.: Semantic role labeling by tagging syntactic chunks. In: Proc. of the 8th Conference on Natural Language Learning (CoNLL-2004), Boston, USA (2004)

[10] Punyakanok, V., Roth, D., Yih, W.T., Zimak, D., Tu, Y.: Semantic role labeling via generalized inference over classifiers. In: Proc. of the 8th Conference on Natural Language Learning (CoNLL-2004), Boston, USA (2004)

[11] Carreras, X., Màrquez, L., Chrupała, G.: Hierarchical recognition of propositional arguments with perceptrons. In: Proc. of the 8th Conference on Natural Language Learning (CoNLL-2004), Boston, USA (2004)
