

LFG Parsebanker

TREPIL Norwegian Treebank Pilot Project

Introduction

We present the LFG Parsebanker, a comprehensive toolkit for interactive incremental construction of a treebank as a parsed corpus. The tool which we have developed supports the process flow in semi-automatic treebank construction, as illustrated in the following scheme:

XLE/LFG parsing → Discriminant disambiguation → Database storage

The toolkit has the following components:

XLE-Web, an interface to the XLE parser on a web page; this interface includes a new display of packed structures and offers discriminants [1], designed and implemented for LFG grammars, to select an analysis;

a parsebanking page which offers views and disambiguation as in XLE-Web, but also additional parsebank management operations, such as subcorpus and grammar selection and a search window based on TigerSearch extended for f-structures;

an overview page providing navigation, information and sorting of utterances;

a discriminant statistics page displaying statistics on chosen discriminants.

Most of these components are implemented in Common Lisp and use XML, XSLT and Javascript to serve the interface web pages. C-structure trees (and graphs) are drawn using Scalable Vector Graphics (SVG), and MySQL is used to store the parsebank.

Disambiguation with discriminants

In building a treebank, the annotator’s choice between different possible grammatical structures is complicated by several factors. A major challenge is the sheer number of possible structures, which may run into the hundreds or thousands for longer sentences. Another challenge is the high level of detail recorded in the structures, which is desirable in the treebank but can be daunting for the annotator. Consider the f-structures in (2) for the sentence in example (1), where hver dag can be an object or an adjunct.

(1) Barn-a leker hver dag.
    child-DEF.PL play every day
    “The children play every day.”

(2)

The difference indicated with green shading in the structures in (2) is presented to the annotator as the choice in (3). These simple, local differences are called discriminants [1]. By choosing whether hver dag is an object or an adjunct, the annotator decides on the intended analysis but avoids examining the whole, complicated structures.

(3) NULL OBJ
    NULL ADJUNCT

Parsebanking interface with discriminant disambiguation

The interface for identifying the intended analysis is shown in the following screenshot. Here we see the list of discriminants on the left, the packed constituent structure in the middle, and the packed functional structure on the right. The analyses shown are for example (4), in which til fjells has two possible attachments. The annotator basically chooses discriminants by clicking to choose or reject them, but other advanced actions are also available [3].

Victoria Rosén, Paul Meurer and Koenraad de Smedt

University of Bergen and Unifob AKSIS

(4) Ta med barn-a til fjells.
    take along child-DEF.PL to mountain-LOC
    “Take the children along to the mountains” or “Take the children in the mountains along”
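The choose/reject mechanism can be illustrated with a minimal sketch. It assumes that each analysis is reduced to a set of local properties and that a discriminant is any property holding in some, but not all, of the remaining analyses. The property labels and function names below are hypothetical and are not taken from the LFG Parsebanker itself.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical toy representation: each analysis is reduced to the set of
# local properties that hold in it. The labels are illustrative only.
Analyses = Dict[str, FrozenSet[str]]

VERB_ATTACH = "til fjells modifies 'ta med'"
NOUN_ATTACH = "til fjells modifies 'barna'"
SHARED_OBJ = "'barna' is OBJ of 'ta med'"

analyses: Analyses = {
    "verb attachment": frozenset({SHARED_OBJ, VERB_ATTACH}),
    "noun attachment": frozenset({SHARED_OBJ, NOUN_ATTACH}),
}


def informative_discriminants(remaining: Analyses) -> Set[str]:
    """Return the properties that hold in some, but not all, analyses."""
    if not remaining:
        return set()
    all_props = set().union(*remaining.values())
    shared = set.intersection(*(set(props) for props in remaining.values()))
    return all_props - shared


def choose(remaining: Analyses, discriminant: str) -> Analyses:
    """Keep only the analyses in which the discriminant holds."""
    return {n: p for n, p in remaining.items() if discriminant in p}


def reject(remaining: Analyses, discriminant: str) -> Analyses:
    """Keep only the analyses in which the discriminant does not hold."""
    return {n: p for n, p in remaining.items() if discriminant not in p}


# Choosing the verb attachment leaves a single analysis.
remaining = choose(analyses, VERB_ATTACH)
print(sorted(remaining))  # ['verb attachment']
```

The property shared by both analyses is never offered as a discriminant, since choosing or rejecting it could not narrow the set of analyses.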

Discriminant types

1. Lexical discriminant (a word form and its part of speech)
2. Morphological discriminant (a base form with its tags from morphological preprocessing)
3. C-structure discriminant (a labeled or unlabeled bracketing of a substring)
4. F-structure discriminant (a minimal path through an f-structure)
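To make the notion of an f-structure discriminant more concrete, the sketch below compares two toy f-structures for example (1) by enumerating their attribute paths and keeping the paths that are not shared. This is a simplification under stated assumptions: the f-structures are plain nested dictionaries with hypothetical PRED values, ADJUNCT is treated as a single value rather than a set, and full paths are compared without computing the minimal paths that the toolkit uses.

```python
from typing import Any, Dict, Iterator, Set, Tuple

FStructure = Dict[str, Any]
Path = Tuple[str, ...]

# Hypothetical, heavily simplified f-structures for example (1).
object_reading: FStructure = {
    "PRED": "leke",
    "SUBJ": {"PRED": "barn"},
    "OBJ": {"PRED": "dag"},
}
adjunct_reading: FStructure = {
    "PRED": "leke",
    "SUBJ": {"PRED": "barn"},
    "ADJUNCT": {"PRED": "dag"},
}


def paths(fs: FStructure, prefix: Path = ()) -> Iterator[Path]:
    """Yield every attribute path from the root down to an atomic value."""
    for attr, value in fs.items():
        if isinstance(value, dict):
            yield from paths(value, prefix + (attr,))
        else:
            yield prefix + (attr, str(value))


def distinguishing_paths(analyses: Dict[str, FStructure]) -> Dict[str, Set[Path]]:
    """For each analysis, return the paths not shared by all analyses."""
    if not analyses:
        return {}
    path_sets = {name: set(paths(fs)) for name, fs in analyses.items()}
    shared = set.intersection(*path_sets.values())
    return {name: ps - shared for name, ps in path_sets.items()}


result = distinguishing_paths(
    {"object": object_reading, "adjunct": adjunct_reading}
)
print(result["object"])   # {('OBJ', 'PRED', 'dag')}
print(result["adjunct"])  # {('ADJUNCT', 'PRED', 'dag')}
```

In this toy case the only difference between the two readings is whether the path to dag runs through OBJ or ADJUNCT, which corresponds to the choice the annotator is asked to make.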

Treebank overview page

The overview page, shown in the following screenshot, lists all sentences in the corpus together with information about number of parse solutions, whether the analysis is fragmented, number of discriminants, number of chosen analyses, sentence length, and whether the chosen analysis is the intended one. Any comments added by the annotator during the disambiguation process are also shown.

Discriminant statistics page

The discriminant statistics page presents a frequency list of chosen discriminants for a subcorpus. Each discriminant is listed with its type, the number of times it is chosen (i.e. marked as good) and the number of times its complement is chosen (i.e. marked as bad). (Note: The statistics shown were compiled before lexical discriminants were added to the system.)
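A frequency list of this kind could be computed along the following lines. This is a minimal sketch, assuming that annotation decisions are available as simple records; the record layout, the discriminant labels and the sort order are hypothetical and do not reflect the actual database schema of the toolkit.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Decision(NamedTuple):
    """One annotator decision (hypothetical record layout)."""
    discriminant: str
    dtype: str   # e.g. "c-structure" or "f-structure"
    good: bool   # True: discriminant chosen; False: its complement chosen


Row = Tuple[str, str, int, int]  # (discriminant, type, good count, bad count)


def discriminant_statistics(decisions: Iterable[Decision]) -> List[Row]:
    """Count good and bad markings per discriminant, most chosen first."""
    good: Counter = Counter()
    bad: Counter = Counter()
    for d in decisions:
        key = (d.discriminant, d.dtype)
        if d.good:
            good[key] += 1
        else:
            bad[key] += 1
    rows = [
        (disc, dtype, good[(disc, dtype)], bad[(disc, dtype)])
        for disc, dtype in set(good) | set(bad)
    ]
    # Sort by descending good count, then descending bad count, then label.
    rows.sort(key=lambda r: (-r[2], -r[3], r[0], r[1]))
    return rows


decisions = [
    Decision("hver dag: ADJUNCT", "f-structure", True),
    Decision("hver dag: ADJUNCT", "f-structure", True),
    Decision("hver dag: OBJ", "f-structure", False),
    Decision("PP[til fjells]", "c-structure", True),
    Decision("hver dag: ADJUNCT", "f-structure", False),
]

for row in discriminant_statistics(decisions):
    print(row)
```

Restricting the input to the decisions of one subcorpus before calling the function would give per-subcorpus statistics.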

Results and prospects

Our work builds on previous parsebanking efforts such as the Treebanker [1], Alpino [4] and LinGO Redwoods [2]. Our toolkit, however, is specifically designed for LFG grammars. We have implemented TIGER-based search on f-structures as well as c-structures, and we can train parse ranking based on our LFG discriminants.

The tool which we have developed is functional and will be further developed in the remainder of the project. Although it was originally primarily intended for Norwegian, it has been implemented in a language-independent fashion. This means that it may be used for building a treebank for any language for which a suitable LFG grammar is available.

The TREPIL project runs from April 1, 2004 to December 31, 2008. Its website is: http://gandalf.aksis.uib.no/trepil/.

References

[1] David Carter. The TreeBanker: A tool for supervised training of parsed corpora. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, pages 598–603, Providence, Rhode Island, 1997.

[2] Stephan Oepen, Dan Flickinger, Kristina Toutanova, and Christopher D. Manning. LinGO Redwoods, a rich and dynamic treebank for HPSG. Research on Language & Computation, 2(4):575–596, December 2004.

[3] Victoria Rosén, Koenraad De Smedt, and Paul Meurer. Towards a toolkit linking treebanking to grammar development. In Proceedings of the Fifth Workshop on Treebanks and Linguistic Theories, pages 55–66, 2006.

[4] Leonoor Van der Beek, Gosse Bouma, Robert Malouf, and Gertjan Van Noord. The Alpino dependency treebank. In Computational Linguistics in the Netherlands (CLIN) 2001, Twente University, 2002.
