Collapsed Consonant and Vowel Models New Approaches for Engl
We propose a novel algorithm for English to Persian transliteration. Previous methods proposed for this language pair apply a word alignment tool for training. By contrast, we introduce an alignment algorithm particularly designed for transliteration. Our
Collapsed Consonant and Vowel Models: New Approaches for English-Persian Transliteration and Back-Transliteration
Sarvnaz Karimi, Falk Scholer, Andrew Turpin
School of Computer Science and Information Technology, RMIT University, GPO Box 2476V, Melbourne 3001, Australia
{sarvnaz,fscholer,aht}@cs.rmit.edu.au
Abstract
We propose a novel algorithm for English to Persian transliteration. Previous methods proposed for this language pair apply a word alignment tool for training. By contrast, we introduce an alignment algorithm particularly designed for transliteration. Our new model improves the English to Persian transliteration accuracy by 14% over an n-gram baseline. We also propose a novel back-transliteration method for this language pair, a previously unstudied problem. Experimental results demonstrate that our algorithm leads to an absolute improvement of 25% over standard transliteration approaches.
1 Introduction
Translation of a text from a source language to a target language requires dealing with technical terms and proper names. These occur in almost any text, but rarely appear in bilingual dictionaries. The solution is the transliteration of such out-of-dictionary terms: a word from the source language is transformed to a word in the target language, preserving its pronunciation. Recovering the original word from the transliterated target is called back-transliteration. Automatic transliteration is important for many different applications, including machine translation, cross-lingual information retrieval and cross-lingual question answering.
Transliteration methods can be categorized into grapheme-based (AbdulJaleel and Larkey, 2003; Li et al., 2004), phoneme-based (Knight and Graehl, 1998; Jung et al., 2000), and combined (Bilac and Tanaka, 2005) approaches. Grapheme-based methods perform a direct orthographical mapping between source and target words, while phoneme-based approaches use an intermediate phonetic representation. Both grapheme- and phoneme-based methods usually begin by breaking the source word into segments, and then use a source segment to target segment mapping to generate the target word. The rules of this mapping are obtained by aligning already available transliterated word pairs (training data); alternatively, such rules can be handcrafted. From this perspective, past work is roughly divided into those methods which apply a word alignment tool such as GIZA++ (Och and Ney, 2003), and approaches that combine the alignment step into their main transliteration process.
Transliteration is language dependent, and methods that are effective for one language pair may not work as well for another. In this paper, we investigate the English-Persian transliteration problem. Persian (Farsi) is an Indo-European language, written in Arabic script from right to left, but with an extended alphabet and different pronunciation from Arabic. Our previous approach to English-Persian transliteration introduced the grapheme-based collapsed-vowel method, employing GIZA++ for source to target alignment (Karimi et al., 2006). We propose a new transliteration approach that extends the collapsed-vowel method. To meet Persian language transliteration requirements, we also propose a novel alignment algorithm in our training stage, which makes use of statistical information of
the corpus, transliteration specifications, and simple language properties. This approach handles possible consequences of elision (omission of sounds to make the word easier to read) and epenthesis (adding extra sounds to a word to make it fluent) in written target words that happen due to the change of language. Our method shows an absolute accuracy improvement of 14.2% over an n-gram baseline.
In addition, we investigate the problem of back-transliteration from Persian to English. To our knowledge, this is the first report of such a study. There are two challenges in Persian to English transliteration that make it particularly difficult. First, written Persian omits short vowels, while only long vowels appear in texts. Second, monophthongization (changing diphthongs to monophthongs) is popular among Persian speakers when adapting foreign words into their language. To take these into account, we propose a novel method to form transformation rules by changing the normal segmentation algorithm. We find that this method significantly improves the Persian to English transliteration effectiveness, demonstrating an absolute performance gain of 25.1% over standard transliteration approaches.
2 Background
In general, transliteration consists of a training stage (running on a bilingual training corpus), and a generation (also called testing) stage.
The training step of a transliteration develops transformation rules mapping characters in the source to characters in the target language using knowledge of corresponding characters in transliterated pairs provided by an alignment. For example, for the source-target word pair (pat, ), an alignment may map “p” to “” and “a” to “”, and the training stage may develop the rule pa → , with “” as the transliteration of “a” in the context of “pa”. The generation stage applies these rules on a segmented source word, transforming it to a word in the target language.
Previous work on transliteration either employs a word alignment tool (usually GIZA++), or develops specific alignment strategies. Transliteration methods that use GIZA++ as their word pair aligner (AbdulJaleel and Larkey, 2003; Virga and Khudanpur, 2003; Karimi et al., 2006) have based their work on the assumption that the provided alignments are reliable. Gao et al. (2004) argue that precise alignment can improve transliteration effectiveness, experimenting on English-Chinese data and comparing IBM models (Brown et al., 1993) with phoneme-based alignments using direct probabilities.
Other transliteration systems focus on alignment for transliteration, for example the joint source-channel model suggested by Li et al. (2004). Their method outperforms the noisy channel model in direct orthographical mapping for English-Chinese transliteration. Li et al. also find that grapheme-based methods that use the joint source-channel model are more effective than phoneme-based methods due to removing the intermediate phonetic transformation step. Alignment has also been investigated for transliteration by adopting Covington’s algorithm on cognate identification (Covington, 1996); this is a character alignment algorithm based on matching or skipping of characters, with a manually assigned cost of association. Covington considers consonant to consonant and vowel to vowel correspondence more valid than consonant to vowel. Kang and Choi (2000) revise this method for transliteration where a skip is defined as inserting a null in the target string when two characters do not match based on their phonetic similarities or their consonant and vowel nature. Oh and Choi (2002) revise this method by introducing binding, in which many to many correspondences are allowed. However, all of these approaches rely on the manually assigned penalties that need to be defined for each possible matching.
In addition, some recent studies investigate discriminative transliteration methods (Klementiev and Roth, 2006; Zelenko and Aone, 2006) in which each segment of the source can be aligned to each segment of the target, where some restrictive conditions based on the distance of the segments and phonetic similarities are applied.
3 The Proposed Alignment Approach
We propose an alignment method based on segment occurrence frequencies, thereby avoiding predefined matching patterns and penalty assignments. We also apply the observed tendency of aligning consonants
to consonants, and vowels to vowels, as a substitute for phonetic similarities. Many to many, one to many, one to null and many to one alignments can be generated.
3.1 Formulation
Our alignment approach consists of two steps: the first is based on the consonant and vowel nature of the word’s letters, while the second uses a frequency-based sequential search.
Definition 1 A bilingual corpus B is the set {(S, T)}, where S = s_1..s_l, T = t_1..t_m, s_i is a letter in the source language alphabet, and t_j is a letter in the target language alphabet.
Definition 2 Given some word, w, the consonant-vowel sequence p = (C|V)+ for w is obtained by replacing each consonant with C and each vowel with V.
Definition 3 Given some consonant-vowel sequence, p, a reduced consonant-vowel sequence q replaces all runs of C’s with C, and all runs of V’s with V; hence q = q′ | q′′, q′ = V(CV)*(C|ε) and q′′ = C(VC)*(V|ε).
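Definitions 2 and 3 can be illustrated with a short Python sketch. The function names and the Latin-script vowel inventory are assumptions made for this example only; they are not taken from the paper, which treats semi-vowels according to the target language.

```python
import re

# Assumption: a plain Latin-script vowel inventory, chosen for illustration.
VOWELS = set("aeiou")

def cv_sequence(word, vowels=VOWELS):
    """Definition 2: replace each consonant with C and each vowel with V."""
    return "".join("V" if ch in vowels else "C" for ch in word.lower())

def reduced_cv_sequence(word, vowels=VOWELS):
    """Definition 3: collapse every run of C's to C and every run of V's to V."""
    return re.sub(r"(.)\1+", r"\1", cv_sequence(word, vowels))

print(cv_sequence("patricia"))          # CVCCVCVV
print(reduced_cv_sequence("patricia"))  # CVCVCV
```

Because runs are collapsed, the reduced sequence always alternates between C and V, which is what the two regular expressions q′ and q′′ express.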
For each natural language word, we can determine the consonant-vowel sequence (p) from which the reduced consonant-vowel sequence (q) can be derived, giving a common notation between two different languages, no matter which script either of them uses. To simplify, semi-vowels and approximants (sounds intermediate between consonants and vowels, such as “w” and “y” in English) are treated according to their target language counterparts.
In general, for all the word pairs (S, T) in a corpus B, an alignment can be achieved using the function
f: B → A; (S, T) ↦ (Ŝ, T̂, r).
The function f maps the word pair (S, T) ∈ B to the triple (Ŝ, T̂, r) ∈ A where Ŝ and T̂ are substrings of S and T respectively. The frequency of this correspondence is denoted by r. A represents a set of substring alignments, and we use a per word alignment notation of ae2p when aligning English to Persian and ap2e for Persian to English.
3.2 Algorithm Details
Our algorithm consists of two steps.
Step 1 (Consonant-Vowel based)
For any word pair (S, T) ∈ B, the corresponding reduced consonant-vowel sequences, qS and qT, are generated. If the sequences match, then the aligned consonant clusters and vowel sequences are added to the alignment set A. If qS does not match with qT, the word pair remains unaligned in Step 1.
The assumption in this step is that transliteration of each vowel sequence of the source is a vowel sequence in the target language, and similarly for consonants. However, consonants do not always map to consonants, or vowels to vowels (for example, the English letter “s” may be written as “” in Persian which consists of one vowel and one consonant). Alternatively, they might be omitted altogether, which can be specified as the null string, ε. We therefore require a second step.
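A minimal Python sketch of Step 1, under stated assumptions: the names `runs` and `step1_align`, the vowel predicates, and the upper-case Latin strings standing in for Persian target words are all hypothetical and serve only to illustrate matching on reduced consonant-vowel sequences.

```python
from collections import Counter
from itertools import groupby

def runs(word, is_vowel):
    """Split a word into maximal runs, each tagged "V" (vowels) or "C" (consonants)."""
    return [("V" if v else "C", "".join(g)) for v, g in groupby(word, key=is_vowel)]

def step1_align(source, target, src_is_vowel, tgt_is_vowel, alignments):
    """Add run-to-run alignments to `alignments` if the reduced sequences match."""
    src_runs = runs(source, src_is_vowel)
    tgt_runs = runs(target, tgt_is_vowel)
    # The tags of the runs are exactly the reduced consonant-vowel sequence q.
    if [tag for tag, _ in src_runs] != [tag for tag, _ in tgt_runs]:
        return False  # the pair stays unaligned and is left for Step 2
    for (_, s), (_, t) in zip(src_runs, tgt_runs):
        alignments[(s, t)] += 1
    return True

# Hypothetical vowel inventories; the target side is a romanised placeholder.
src_is_vowel = lambda ch: ch in "aeiou"
tgt_is_vowel = lambda ch: ch in "AEIOU"

A = Counter()
print(step1_align("ici", "ISI", src_is_vowel, tgt_is_vowel, A))         # True
print(step1_align("patricia", "PTRSA", src_is_vowel, tgt_is_vowel, A))  # False
print(dict(A))  # {('i', 'I'): 2, ('c', 'S'): 1}
```

The second call fails because the placeholder target reduces to CV while the source reduces to CVCVCV, so the pair would be passed on to the frequency-based step.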
Step 2 (Frequency based)
For most natural languages, the maximum length of corresponding phonemes of each grapheme is a digraph (two letters) or at most a trigraph. Hence, alignment can be defined as a search problem that seeks units with a maximum length of two or three in both strings that need to be aligned. In our approach, we search based on statistical occurrence data available from Step 1.
In Step 2, only those words that remain unaligned at the end of Step 1 need to be considered. For each pair of words (S, T), matching proceeds from left to right, examining one of the three possible options of transliteration: single letter to single letter, digraph to single letter and single letter to digraph. Trigraphs are unnecessary in alignment as they can be effectively captured during transliteration generation, as we explain below.
We define four different valid alignments for the source (S = s_1 s_2 ... s_i ... s_l) and target (T = t_1 t_2 ... t_j ... t_m) strings: (s_i, t_j, r), (s_i s_{i+1}, t_j, r), (s_i, t_j t_{j+1}, r) and (s_i, ε, r). These four options are considered as the only possible valid alignments, and the most frequently occurring alignment (highest r) is chosen. These frequencies are dynamically updated after successfully aligning a pair. For exceptional situations, where there is no character in the target string to match with the source character s_i, it is aligned with the empty string.
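The left-to-right choice among the four valid alignments can be sketched as follows. This is a simplified greedy version without any backtracking; the function names, the frequency table and the upper-case placeholder target letters are assumptions made for illustration, not the authors' implementation.

```python
EPS = ""  # stands for the null string ε

def candidates(s, t, i, j):
    """The four valid alignment shapes at source position i and target position j."""
    out = []
    if j < len(t):
        out.append((s[i], t[j]))            # single letter to single letter
        if i + 1 < len(s):
            out.append((s[i:i + 2], t[j]))  # digraph to single letter
        if j + 1 < len(t):
            out.append((s[i], t[j:j + 2]))  # single letter to digraph
    out.append((s[i], EPS))                 # letter to null
    return out

def step2_greedy(s, t, freq):
    """Left-to-right search; returns the segments, or None if target letters remain."""
    i = j = 0
    result = []
    while i < len(s):
        known = [c for c in candidates(s, t, i, j) if freq.get(c, 0) > 0]
        # Highest r wins; with no known candidate the letter is aligned to ε.
        best = max(known, key=lambda c: freq[c]) if known else (s[i], EPS)
        result.append(best)
        i += len(best[0])
        j += len(best[1])
    if j < len(t):
        return None
    for seg in result:  # frequencies are updated only after a successful pair
        freq[seg] = freq.get(seg, 0) + 1
    return result

# Hypothetical frequencies, as if collected in Step 1.
freq = {("sh", "X"): 8, ("s", "S"): 10, ("e", EPS): 20, ("l", "L"): 7}
print(step2_greedy("shel", "XL", freq))
# [('sh', 'X'), ('e', ''), ('l', 'L')]
```

Here ("s", "S") never competes because the first target letter is "X", so the digraph alignment ("sh", "X") is the only known candidate at the first position.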
It is possible that none of the four valid alignment
options have occurred previously (that is, r = 0 for each). This situation can arise in two ways: first, such a tuple may simply not have occurred in the training data; and, second, the previous alignment in the current string pair may have been incorrect. To account for this second possibility, a partial backtracking is considered. Most misalignments are derived from the simultaneous comparison of alignment possibilities, giving the highest priority to the most frequent. For example if S = bbc, T = and A = {(b, , 100), (bb, , 40), (c, , 60)}, starting from the initial positions s_1 and t_1, the first alignment choice is (b, , 101). However immediately after, we face the problem of aligning the second “b”. There are two solutions: inserting ε and adding the triple (b, ε, 1), or backtracking the previous alignment and substituting that with the less frequent but possible alignment of (bb, , 41). The second solution is a better choice as it adds fewer ambiguous alignments containing ε. At the end, the alignment set is updated as A = {(b, , 100), (bb, , 41), (c, , 61)}. In case of equal frequencies, we check possible subsequent alignments to decide on which alignment should be chosen. For example, if (b, , 100) and (bb, , 100) both exist as possible options, we consider if choosing the former leads to a subsequent ε insertion. If so, we opt for the latter.
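The preference for backtracking over inserting an unseen ε can be sketched in Python. One simple way to express it, used here as an assumption rather than as the paper's procedure, is a recursive search that tries known alignments most frequent first and only allows unseen ε insertions once a budget for them is raised. The letters "X" and "Y" are placeholders for the Persian target letters of the bbc example.

```python
EPS = ""  # stands for the null string ε

def options(s, t, i, j):
    """The four valid alignment shapes at source position i and target position j."""
    out = [(s[i], EPS)]
    if j < len(t):
        out.append((s[i], t[j]))
        if i + 1 < len(s):
            out.append((s[i:i + 2], t[j]))
        if j + 1 < len(t):
            out.append((s[i], t[j:j + 2]))
    return out

def search(s, t, freq, i, j, budget):
    """Depth-first search; `budget` caps the number of unseen ε insertions."""
    if i == len(s):
        return [] if j == len(t) else None
    known = sorted((o for o in options(s, t, i, j) if freq.get(o, 0) > 0),
                   key=lambda o: -freq[o])
    unseen_eps = (s[i], EPS)
    fallback = [unseen_eps] if budget > 0 and unseen_eps not in known else []
    for seg in known + fallback:
        cost = 0 if seg in known else 1
        rest = search(s, t, freq, i + len(seg[0]), j + len(seg[1]), budget - cost)
        if rest is not None:
            return [seg] + rest  # first success wins; a failure backtracks
    return None

def align(s, t, freq):
    """Prefer alignments with as few unseen ε insertions as possible."""
    for budget in range(len(s) + 1):
        found = search(s, t, freq, 0, 0, budget)
        if found is not None:
            return found
    return None

freq = {("b", "X"): 100, ("bb", "X"): 40, ("c", "Y"): 60}
result = align("bbc", "XY", freq)
print(result)  # [('bb', 'X'), ('c', 'Y')]
for seg in result:
    freq[seg] = freq.get(seg, 0) + 1
print(freq)    # {('b', 'X'): 100, ('bb', 'X'): 41, ('c', 'Y'): 61}
```

With a budget of zero the first choice ("b", "X") leads to a dead end at the second "b", so the search backs up and takes ("bb", "X") instead, reproducing the updated frequencies of the example.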
At the end of a string, if just one character in the target string remains unaligned while the last alignment is an ε insertion, the remaining target character is substituted for ε in that final alignment. This usually happens when the alignment of final characters is not yet registered in the alignment set, mainly because Persian speakers tend to transliterate the final vowels to consonants to preserve their existence in the word. For example, in the word “Jose” the final “e” might be transliterated to “” which is a consonant (“h”) and therefore is not captured in Step 1.
Back parsing
The process of aligning words explained above can handle words with already known components in the alignment set A (the frequency of occurrence is greater than zero). However, when this is not the case, the system may repeatedly insert ε while part or all of the target characters are left intact (unsuccessful alignment). In such cases, processing the source and target backwards helps to find the problematic substrings: backparsing.
The poorly aligned substrings of the source and target are taken as new pairs of strings, which are then reintroduced into the system as new entries. Note that they themselves are not subject to backparsing. Most strings of repeating nulls can be broken up this way, and in the worst case will remain as one tuple in the alignment set.
To clarify, consider the example given in Figure 1 for the word pair (patricia, ), where an association between “c” and “” is not yet registered. Forward parsing, as shown in the figure, does not resolve all target characters; after the incorrect alignment of “c” with “ε”, subsequent characters are also aligned with null, and the substring “” remains intact. Backward parsing, shown in the next line of the figure, is also not successful. It is able to correctly align the last two characters of the string, before generating repeated null alignments. Therefore, the central region (substrings of the source and target which remained unaligned plus one extra aligned segment to the left and right) is entered as a new pair to the system (ici, ), as shown in the line labelled Input 2 in the figure. This new input meets Step 1 requirements, and is aligned successfully. The resulting tuples are then merged with the alignment set A.
An advantage of our backparsing strategy is that it takes care of casual transliterations happening due to elision and epenthesis (removing or adding extra sounds). It is not only in translation that people may add extra words to make fluent target text; for transliteration also, it is possible that spurious characters are introduced for fluency. However, this often follows patterns, such as adding vowels to the target form. These irregularities are consistently covered in the backparsing strategy, where they remain connected to their previous character.
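The extraction of the central region in the patricia example can be sketched as below. The helper name `central_region` and its two count arguments are hypothetical, and the sketch assumes that each aligned segment is a single character, which holds for the source side of that example.

```python
def central_region(word, n_left, n_right):
    """Unaligned middle of `word` plus one aligned character on each side.

    n_left and n_right are the numbers of characters aligned successfully
    by the forward and the backward parse (hypothetical inputs).
    """
    start = max(n_left - 1, 0)
    end = min(len(word) - n_right + 1, len(word))
    return word[start:end]

# Forward parsing aligns "p", "a", "t", "r", "i"; backward parsing aligns
# the last two characters. Only "c" is unresolved.
print(central_region("patricia", 5, 2))  # ici
```

The same slicing would be applied to the target string with its own counts, giving the new pair that is fed back into Step 1.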
4 Transliteration Method
Transliteration algorithms use aligned data (the output from the alignment process, ae2p or ap2e alignment tuples) for training to derive transformation rules. These rules are then used to generate a target word T given a new input source word S.
Input: (patricia, )   qS = CVCVCV   qT = CVCV
Step 1: qS ≠ qT
Forward alignment: (p, , 43), (a, ε, 100), (t, , 52), (r, , 201), (i, , 61), (c, ε, 1)
Backward alignment: ..., (i, ε, 6), (r, ε, 1), (t, ε, 1), (a, ε, 100), (p, ε, 1)
Input 2: (ici, )   qS = VCV   qT = VCV
Step 1: (i, , 61), (c, , 1), (i, , 61)
Figure 1: A backparsing example. Note middle tuples in forward and backward parsings are not merged in A till the alignment is successfully completed.
[Figure 2: Segmentation of the string “shelley” under CV-MODEL1, CV-MODEL2 and CV-MODEL3.]
consonants, as demonstrated in the example below. A special symbol is used to indicate the start and/or end of each word if the beginning and end of the word is a consonant respectively. Therefore, for the words starting or ending with consonants, the symbol “#” is added, which is treated as a consonant and therefore grouped in the consonant segment.
An example of applying this technique is shown in Figure 2 for the string “shelley”. In this example, “sh” and “ll” are treated as two consonant segments, where the transliteration of individual characters inside a segment is dependent on the other members but not the surrounding segments. However, this is not the case for vowel sequences which incorporate a level of knowledge about any segment neighbours. Therefore, for the example “shelley”, the first segment is “sh” which belongs to C pattern. During transliteration, if “#sh” does not appear in any existing rules, a backoff splits the segment to smaller segments: “#” and “sh”, or “s” and “h”. The second segment contains the vowel “e”. Since this vowel is surrounded by consonants, the segment pattern is CVC. In this case, backoff only applies for vowels as consonants are supposed to be part of their own independent segments. That is, if search in the rules of pattern CVC was unsuccessful, it looks for “e” in V pattern. Similarly, segmentation for this word continues with “ll” in C pattern and “ey” in CV pattern (“y” is an approximant, and therefore considered as a vowel when transliterating English to Persian).
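The segmentation of “shelley” described above can be reproduced with a small Python sketch. The function name `segment` and the vowel set (with “y” always counted as a vowel) are assumptions of this example; backoff to smaller segments is not implemented here.

```python
from itertools import groupby

# Assumption: "y" is always counted as a vowel, as the text does for
# "shelley" when transliterating English to Persian.
VOWELS = set("aeiouy")

def segment(word, vowels=VOWELS):
    """Split a word into consonant and vowel segments with their patterns."""
    # "#" marks a word boundary next to a consonant and is itself
    # treated as a consonant.
    if word[0] not in vowels:
        word = "#" + word
    if word[-1] not in vowels:
        word = word + "#"
    runs = [("V" if is_v else "C", "".join(g))
            for is_v, g in groupby(word, key=lambda ch: ch in vowels)]
    segments = []
    for k, (kind, text) in enumerate(runs):
        if kind == "C":
            pattern = "C"
        else:
            # A vowel segment records whether consonants surround it.
            left = "C" if k > 0 else ""
            right = "C" if k < len(runs) - 1 else ""
            pattern = left + "V" + right
        segments.append((text, pattern))
    return segments

print(segment("shelley"))
# [('#sh', 'C'), ('e', 'CVC'), ('ll', 'C'), ('ey', 'CV')]
```

Since runs alternate between consonants and vowels, a vowel segment that has a neighbour on one side always has a consonant segment there, which is why checking the position in the list is enough to choose between CVC, CV, VC and V.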
4.3 Rules for Back-Transliteration
Written Persian ignores short vowels, and only long vowels appear in text. This causes most English vowels to disappear when transliterating from English to Persian; hence, these vowels must be restored during back-transliteration.
When the initial transliteration happens from English to Persian, the transliterator (whether human or machine) uses the rules of transliterating from English as the source language. Therefore, transliterating back to the original language should consider the original process, to avoid losing essential information. In terms of segmentation in collapsed-vowel models, different patterns define segment boundaries in which vowels are necessary clues. Although we do not have most of these vowels in the transliteration generation phase, it is possible to benefit from their existence in the training phase. For example, using CV-MODEL3, the pair ( , merkel) with qS = C and ap2e = (( , me), ( , r), ( , ke), ( , l)), produces just one transformation rule “ → merkel” based on a C pattern. That is, the Persian string contains no vowel characters. If, during the transliteration generation phase, a source word “” (S = ) is entered, there would be one and only one output of “merkel”, while an alternative such as “mercle” might be required instead. To avoid overfitting the system by long consonant clusters, we perform segmentation based on the English q sequence, but categorise the rules based on their Persian segment counterparts. That is, for the pair ( , merkel) with ae2p = ((m, ), (e, ε), (r, ), (k, ), (e, ε), (l, )), these rules are generated (with category patterns given in parenthesis): → m (C), → rk (C), → l (C), → merk (C), → rkel (C). We call the suggested training approach reverse segmentation.
Reverse segmentation avoids clustering all the consonants in one rule, since many English words might be transliterated to all-consonant Persian words.
4.4 Transliteration Generation and Ranking
In the transliteration generation stage, the source word is segmented following the same process of segmenting words in training stage, and a probability is computed for each generated target word:
$$ P(T \mid S) = \prod_{k=1}^{|K|} P(\hat{T}_k \mid \hat{S}_k), $$
where |K| is the number of distinct source segments. P(T̂_k | Ŝ_k) is the probability of the Ŝ_k → T̂_k transformation rule, as obtained from the training stage:
$$ P(\hat{T}_k \mid \hat{S}_k) = \text{frequency of } \hat{S}_k \rightarrow \hat{T}_k $$
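The ranking score of Section 4.4, a product of rule probabilities over the source segments, can be sketched in Python. The normalisation used here (dividing each rule frequency by the total frequency of its source segment) is an assumption of this sketch, as are the function names, the rule counts and the placeholder target segments.

```python
from collections import defaultdict
from math import prod

def rule_probabilities(rule_counts):
    """Turn rule frequencies into probabilities.

    Assumption: each frequency is normalised by the total frequency of
    all rules that share the same source segment.
    """
    totals = defaultdict(int)
    for (src, _tgt), n in rule_counts.items():
        totals[src] += n
    return {(src, tgt): n / totals[src] for (src, tgt), n in rule_counts.items()}

def word_probability(source_segments, target_segments, probs):
    """P(T|S) as the product of the segment rule probabilities."""
    pairs = zip(source_segments, target_segments)
    return prod(probs.get(pair, 0.0) for pair in pairs)

# Hypothetical rule counts with placeholder target segments.
counts = {("#sh", "X"): 3, ("#sh", "Y"): 1, ("e", ""): 4, ("ll", "L"): 2}
probs = rule_probabilities(counts)
print(word_probability(["#sh", "e", "ll"], ["X", "", "L"], probs))  # 0.75
```

Candidate target words for one source word would then be sorted by this score to produce the TOP-1, TOP-5 and TOP-10 lists.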
Small Corpus                           Large Corpus
TOP-1       TOP-5       TOP-10         TOP-1       TOP-5       TOP-10
58.0 (2.2)  85.6 (3.4)  89.4 (2.9)     47.2 (1.0)  77.6 (1.4)  83.3 (1.5)
61.7 (3.0)  80.9 (2.2)  82.0 (2.1)     50.6 (2.5)  79.8 (3.4)  84.9 (3.1)
60.0 (3.9)  86.0 (2.8)  91.2 (2.5)     47.4 (1.0)  79.2 (1.0)  87.0 (0.9)
67.4 (5.5)  90.9 (2.1)  93.8 (2.1)     55.3 (0.8)  84.5 (0.7)  89.5 (0.4)
72.2 (2.2)  92.9 (1.6)  93.5 (1.7)     59.8 (1.1)  85.4 (0.8)  92.6 (0.7)
GIZA++   New Alignment   Reverse
Table 2: Comparison of mean (standard deviation) word accuracy (%) for Persian to English transliteration.
6 Conclusions
We have presented a new algorithm for English to Persian transliteration, and a novel alignment algorithm applicable for transliteration. Our new transliteration method (CV-MODEL3) outperforms the previous approaches for English to Persian, increasing word accuracy by a relative 9.2% to 17.2% (TOP-1), when using GIZA++ for alignment in training. This method shows a further 7.1% to 8.1% increase in word accuracy (TOP-1) with our new alignment algorithm.
Persian to English back-transliteration is also investigated, with CV-MODEL3 significantly outperforming other methods. Enriching this model with a new reverse segmentation algorithm gives rise to further accuracy gains in comparison to directly applying English to Persian methods.
In future work we will investigate whether phonetic information can help refine our CV-MODEL3, and experiment with manually constructed rules as a baseline system.
Acknowledgments
This work was supported in part by the Australian government IPRS program (SK) and an ARC Discovery Project Grant (AT).
References
AbdulJaleel and Larkey. 2003. Statistical transliteration for English-Arabic cross language information retrieval. In Conference on Information and Knowledge Management, pages 139–146.
Slaven Bilac and Hozumi Tanaka. 2005. Direct combination of spelling and pronunciation information for robust back-transliteration. In Conferences on Computational Linguistics and Intelligent Text Processing, pages 413–424.
Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: ... Computational Linguistics, 19(2):263–311.
Covington. 1996. ... Computational Linguistics, 22(4):481–496.
Wei Gao, Kam-Fai Wong, and Wai Lam. 2004. Improving transliteration with precise alignment of phoneme chunks and using contextual features. In Asia Information Retrieval Symposium, pages 106–117.
Sung Young Jung, Sung Lim Hong, and Eunok Paek. 2000. An English to Korean transliteration model of extended Markov window. In Conference on Computational Linguistics, pages 383–389.
Byung-Ju Kang and Key-Sun Choi. 2000. Automatic transliteration and back-transliteration by decision tree learning. In Conference on Language Resources and Evaluation, pages 1135–1411.
Sarvnaz Karimi, Andrew Turpin, and Falk Scholer. 2006. English to Persian transliteration. In String Processing and Information Retrieval, pages 255–266.
Alexandre Klementiev and Dan Roth. 2006. Weakly supervised named entity transliteration and discovery from multilingual comparable corpora. In Association for Computational Linguistics, pages 817–...
Knight and Graehl. 1998. ... Computational Linguistics, 24(4):599–612.
Haizhou Li, Min Zhang, and Jian Su. 2004. A joint source-channel model for machine transliteration. In Association for Computational Linguistics, pages 159–...
Och and Ney. 2003. ... Computational Linguistics, 29(1):19–51.
Jong-Hoon Oh and Key-Sun Choi. 2002. An English-Korean transliteration model using pronunciation and contextual rules. In Conference on Computational Linguistics.
Paola Virga and Sanjeev Khudanpur. 2003. Transliteration of proper names in cross-language applications. In ACM SIGIR Conference on Research and Development on Information Retrieval, pages 365–366.
Dmitry Zelenko and Chinatsu Aone. 2006. Discriminative methods for transliteration. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 612–617.