Literature Review: English Literature Review Template
Updated: 2023-06-07 14:43:01
IEEE standard format
Text Recognition with Machine Learning based on Text Structure
Literature Review
Yifan Shi
Student ID: 27291944
Email: ys1n13@soton.ac.uk
MSc Artificial Intelligence
Faculty of Physical Sciences & Eng, University of Southampton
Abstract—The fast-developing Machine Learning algorithms now being applied to semantics have brought a wide range of techniques for text recognition, classification, and processing. However, there is always a tension between accuracy and speed, as higher accuracy generally requires a more complicated system as well as a larger training database. To balance speed against accuracy, many clever designs are used in text processing. In this literature review, these efforts are introduced in three layers: Natural-Language Processing, Text Classification, and the IBM Watson system.
Keywords—Machine Learning, Natural-Language Processing, Text Classification, IBM Watson
I. INTRODUCTION
The growing popularity of the Internet has brought an increasing number of users online, along with a vast amount of messages, blogs, articles, and other text to be dealt with. These texts, known as natural-language texts, may contain useful information but take a long time for humans to read, understand, and process. Although popular search-engine technology helps users find sources by keyword, many companies also need semantic techniques to make their working environments more user-friendly. In this literature review, I will introduce several important semantic techniques, starting from the most basic, Natural-Language Processing, which concentrates on the meaning of words and sentences, followed by Text Classification, which focuses on paragraphs and articles. Then, I will introduce a landmark system named IBM Watson, which has DeepQA as its working pipeline. Finally, a conclusion will give some comments on these techniques.
II. NATURAL LANGUAGE PROCESSING
In order to deal with human natural language, it is necessary to transform unstructured text into well-structured tables of explicit semantics (Ferrucci, 2012). According to Liddy (2001), Natural-Language Processing (NLP) is a series of computational techniques used to analyze and represent naturally organized text in order to achieve certain tasks and applications. Collobert and Weston (2008) categorized NLP tasks into six types: Part-Of-Speech Tagging, Chunking, Named Entity Recognition, Semantic Role Labeling, Language Models, and Semantically Related Words. In addition, they implemented Multitask Learning with Deep Neural Networks to build a successful unified architecture, trained by backpropagation, which avoids the large amount of empirical hand-designed features traditionally needed to train such a system (Collobert et al., 2011).
III. TEXT CLASSIFICATION
One simple way to represent an article for a learning algorithm is to use the number of times that distinct words appear in the document (Joachims, 2005). However, because of the large number of possible words used in articles, this creates a very high-dimensional feature space. Joachims (1999) suggests using Transductive
Support Vector Machines for classification because of their effective learning ability even in high-dimensional feature spaces. Rather than using a non-linear Support Vector Machine (SVM), Dumais et al. (1998) compared a linear SVM with four other learning algorithms (Find Similar, Decision Trees, Naive Bayes, and Bayes Nets); their results also support the use of SVMs in text classification because of their high accuracy, fast speed, and simple model. Sebastiani (2002) also recommends Neural Networks as a potential choice for text classification, in that their accuracy is only slightly lower than that of SVMs.
The cross-document comparison of small pieces of text, using linguistic features such as noun phrases and synonyms, is introduced by Hatzivassiloglou et al. (1999). Two paragraphs are defined as similar when they describe the same action conducted on the same object by the same actor. Therefore, drawing features from nouns and verbs generally reduces a paragraph to several primitive elements. In addition to the shared primitive elements, restrictions such as ordering, distance, and primitive (matching noun and verb pairs) are also applied to exclude weakly related features.
Feature selection methods can effectively reduce the dimensionality of the dataset (Ikonomakis, 2005) while maintaining classification performance. To decide which words to keep, an evaluation function was introduced by Soucy and Mineau (2003) to measure how much information is gained by classifying on a single word. Another improvement, by Han et al. (2004), is to use Principal Component Analysis (PCA) to reduce dimensionality by transforming the features. Nigam and McCallum (2000) combine Expectation-Maximization with a Naive Bayes classifier, training the classifier on a certain amount of labeled texts followed by a large amount of unlabeled documents, which enables automatic training without a huge amount of hand-labeled training data.
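As a rough illustration of the word-count representation and the single-word evaluation idea discussed in this section, the sketch below builds bag-of-words counts for a toy corpus and ranks words by information gain. Information gain is used here only as one common choice of evaluation function; it is an assumption and not necessarily the function proposed by Soucy and Mineau (2003). The toy documents, labels, and function names are all hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Map a document to {word: count}, lowercased and split on whitespace."""
    return Counter(text.lower().split())


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(word, docs, labels):
    """Entropy reduction obtained by splitting documents on presence of `word`."""
    with_word = [y for d, y in zip(docs, labels) if word in d]
    without = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    remainder = ((len(with_word) / n) * entropy(with_word)
                 + (len(without) / n) * entropy(without))
    return entropy(labels) - remainder


# Hypothetical toy corpus: two "sport" and two "finance" documents.
texts = [
    "the team scored a late goal",
    "the goal won the match",
    "the market fell a little",
    "shares rose and the market recovered",
]
labels = ["sport", "sport", "finance", "finance"]

docs = [bag_of_words(t) for t in texts]
vocabulary = sorted(set().union(*docs))
gains = {w: information_gain(w, docs, labels) for w in vocabulary}

# Keep the two most informative words (ties broken alphabetically).
top_words = sorted(vocabulary, key=lambda w: (-gains[w], w))[:2]
print(top_words)
```

In this toy corpus, words such as "the", which occur in every document, receive zero gain, while words that occur in only one class score highest. Keeping only the top-ranked words is one way to shrink the high-dimensional feature space before training a classifier.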
IV. IBM WATSON
The IBM Watson project has shown us that a computer system for open-domain question answering (QA) can beat human champions at Jeopardy. As Ferrucci (2012) mentioned, the structure of Watson is more complicated than any single agent, as it has hundreds of algorithms working together, in the way that Minsky (1988) described in Society of Mind. Generally, Watson consists of DeepQA, Natural Language Processing (NLP), Machine Learning (ML), and Semantic Web and Cloud Computing components (Gliozzo et al., 2013).
The DeepQA system analyzes the question with different algorithms, giving different interpretations of the question and forming queries for each question (Ferrucci, 2012). It provides all the possible answers to the question, with the evidence and the scores for each candidate, which yields a ranking of candidate answers by likelihood of correctness. Machine Learning algorithms are used to train the weights in its evaluating and analyzing algorithms (Gliozzo et al., 2013).
The clue that Watson uses in searching is called the lexical answer type (LAT), which tells Watson what the question is asking about and what kind of thing it needs to look for. Before searching, it generates prior knowledge of a type label, known as a ‘direction’, for each candidate answer and searches for evidence for and against this ‘type direction’ (Ferrucci, 2012). DeepQA also relies heavily on grammar-based and syntactic analysis techniques, for example relation extraction, which uses a rule-based approach to find possible relations between words. In addition, the ability to break a question down into sub-questions by logic also improved Watson's performance (Ferrucci, 2012), enabling Watson to find results for each smaller question and combine them. Correspondingly, it can also generate the score for the original question based on the evidence for the sub-questions.
To simulate human knowledge, Watson also uses a self-contained database. However, this requirement has led to its great hardware cost. Watson also needs to perform automatic text analysis and knowledge extraction to update its database, because of the enormous amount of work involved and the need to ensure the accuracy of input knowledge. However, the self-contained database is so costly that only a few institutions can afford the hardware expense, which makes applying Watson expensive. Another limitation is that the structured resources are relatively narrow compared with the vast amount of unstructured natural-language text. One possible improvement is to use online data and an ordinary online search engine to find possibly related articles and analyze them with PC clients. Despite the trade-off between accuracy and cost, caused by possibly unreal data and incorrect information online, this makes the technique more realizable in general.
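The candidate-ranking step described earlier in this section, where each candidate answer carries evidence scores that are combined with trained weights into a likelihood of correctness, might be sketched as follows. This is only an illustration under stated assumptions: the feature names, weights, bias, and candidates are hypothetical, and the logistic combination is an assumed stand-in rather than Watson's actual model.

```python
import math

# Hypothetical weights for each evidence feature (assumed values; in a real
# system these would be learned from training data).
WEIGHTS = {"type_match": 2.0, "passage_support": 1.5, "popularity": 0.5}
BIAS = -2.0


def confidence(evidence):
    """Combine evidence scores into a 0-1 confidence with a logistic function."""
    z = BIAS + sum(WEIGHTS[name] * evidence.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def rank_candidates(candidates):
    """Return (answer, confidence) pairs sorted from most to least confident."""
    scored = [(answer, confidence(ev)) for answer, ev in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidates with evidence scores in [0, 1].
candidates = {
    "answer_a": {"type_match": 0.1, "passage_support": 0.6, "popularity": 0.9},
    "answer_b": {"type_match": 0.9, "passage_support": 0.8, "popularity": 0.7},
}

ranking = rank_candidates(candidates)
for answer, score in ranking:
    print(f"{answer}: {score:.2f}")
```

Here "type_match" loosely plays the role of evidence for the type direction suggested by the lexical answer type. A candidate that is popular but has the wrong type ends up ranked below one whose type and supporting passages agree with the question.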
V. CONCLUSION
As can be seen from the content above, most techniques used in text analysis are based on ‘word feature’ extraction, word types, and relations, which are all semantic techniques, while Watson also uses search techniques to find the exact answer as it appears in text. However, machines lack the ability to summarize the main idea of a paragraph, which depends more on abstract logical thinking. Human reading, by contrast, attends not only to vocabulary and meaning but also to the structure of the paragraph and the position of sentences; for example, the first sentence in a paragraph usually guides the following content, which helps indicate the significance of the sentences and words. Therefore, using machine learning to analyze the structure of an article, combined with the meaning of every sentence, might make it possible to extract the main idea, which could be used in text scanning and classification.
REFERENCES
[1] S. Dumais, J. Platt, D. Heckerman, and M. Sahami, Inductive Learning Algorithms and Representations for Text Categorization, Proceedings of the Seventh International Conference on Information and Knowledge Management, pp. 148-155, 1998.
[2] T. Joachims, Text Categorization with Support Vector Machines: Learning with Many Relevant Features, ECML-98 Proceedings of the 10th European Conference on Machine Learning, pp. 137-142, 1998.
[3] T. Joachims, Transductive Inference for Text Classification using Support Vector Machines, International Conference on Machine Learning (ICML), pp. 200-209, 1999.
[4] V. Hatzivassiloglou, J. Klavans, and E. Eskin, Detecting Text Similarity Over Short Passages: Exploring Linguistic Feature Combinations Via Machine Learning, Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 2000.
[5] K. Nigam, Text Classification from Labeled and Unlabeled Documents using EM, Machine Learning, Volume 39, pp. 103-134, 2000.
[6] E. Liddy, Natural Language Processing, In Encyclopedia of Library and Information Science, 2nd Ed. NY. Marcel Decker, Inc, 2001.
[7] S. Tong and D. Koller, Support Vector Machine Active Learning with Applications to Text Classification, Journal of Machine Learning Research, pp. 45-66, 2001.
[8] F. Sebastiani, Machine Learning in Automated Text Categorization, ACM Computing Surveys (CSUR), Issue 1, Volume 34, pp. 1-47, 2002.
[9] P. Soucy and G. Mineau, Feature Selection Strategies for Text Categorization, AI 2003, LNAI 2671, pp. 505-509, 2003.
[10] X. Han, G. Zu, W. Ohyama, T. Wakabayashi, and F. Kimura, Accuracy Improvement of Automatic Text Classification Based on Feature Transformation and Multi-classifier Combination, LNCS, Volume 3309, pp. 463-468, Jan 2004.
[11] M. Ikonomakis, S. Kotsiantis, and V. Tampakas, Text Classification using Machine Learning Techniques, WSEAS Transactions on Computers, Issue 8, Volume 4, pp. 966-974, 2005.
[12] R. Collobert and J. Weston, A unified architecture for natural language processing: deep neural networks with multitask learning, ICML '08 Proceedings of the 25th International Conference on Machine Learning, ACM New York, USA, pp. 160-167, 2008.
[13] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, Natural Language Processing (Almost) from Scratch, Journal of Machine Learning Research, Volume 12, pp. 2493-2537, 2011.
[14] A. Gliozzo, O. Biran, S. Patwardhan, and K. McKeown, Semantic Technologies in IBM Watson, The 10th International Semantic Web Conference, Bonn, Germany, 2011.
[15] D. Ferrucci, Introduction to “This is Watson”, IBM Journal of Research and Development, Volume 56, Number 3/4, pp. 1:1-1:15, May/July 2012.
[16] G. Tesauro, D. Gondek, J. Lenchner, J. Fan, and J. Prager, Simulation, learning, and optimization techniques in Watson's game strategies, IBM Journal of Research and Development, Volume 56, Number 3/4, pp. 16:1-16:11, 2012.