The BellKor Solution to the Netflix Grand Prize

Yehuda Koren
August 2009

I. INTRODUCTION
This article describes part of our contribution to the "BellKor's Pragmatic Chaos" final solution, which won the Netflix Grand Prize. The other portion of the contribution was created while working at AT&T with Robert Bell and Chris Volinsky, as reported in our 2008 Progress Prize report [3]. The final solution includes all the predictors described there. In this article we describe only the newer predictors.

So what is new over last year's solution? First, we further improved the baseline predictors (Sec. III). This, in turn, improves our other models, which incorporate those predictors, like the matrix factorization model (Sec. IV). In addition, an extension of the neighborhood model that addresses temporal dynamics was introduced (Sec. V). On the Restricted Boltzmann Machines (RBM) front, we use a new RBM model with superior accuracy by conditioning the visible units (Sec. VI). The final addition is the introduction of a new blending algorithm, which is based on gradient boosted decision trees (GBDT) (Sec. VII).
II. PRELIMINARIES
The Netflix dataset contains more than 100 million date-stamped movie ratings performed by anonymous Netflix customers between Dec 31, 1999 and Dec 31, 2005 [4]. The dataset covers m = 480,189 users and n = 17,770 movies (aka, items).

The contest was designed in a training-test set format. A Hold-out set of about 4.2 million ratings was created, consisting of the last nine movies rated by each user (or fewer if a user had not rated at least 18 movies over the entire period). The remaining data made up the training set. The Hold-out set was randomly split three ways, into subsets called Probe, Quiz, and Test. The Probe set was attached to the training set, with labels (the rating that the user gave the movie) included. The Quiz and Test sets made up an evaluation set, known as the Qualifying set, for which competitors were required to predict ratings. Once a competitor submits predictions, the prizemaster returns the root mean squared error (RMSE) achieved on the Quiz set, which is posted on a public leaderboard (/leaderboard). RMSE values mentioned in this article correspond to the Quiz set. Ultimately, the winner of the prize is the one that scores best on the Test set; those scores were never disclosed by Netflix. This precludes clever systems which might "game" the competition by learning about the Quiz set through repeated submissions.

Compared with the training data, the Hold-out set contains many more ratings by users that do not rate much and are therefore harder to predict. In a way, this represents real requirements for a collaborative filtering (CF) system, which needs to predict new ratings from older ones, and to equally address all users, not just the heavy raters.

Y. Koren is with Yahoo! Research, Haifa, ISRAEL. Email: yehuda@

We reserve special indexing letters to distinguish users from movies: u, v for users, and i, j for movies. A rating r_ui indicates the preference by user u of movie i. Values range from 1 (star), indicating no interest, to 5 (stars), indicating a strong interest. We distinguish predicted ratings from known ones by using the notation \hat{r}_{ui} for the predicted value of r_ui.

The scalar t_ui denotes the time of rating r_ui. Here, time is measured in days, so t_ui counts the number of days elapsed since some early time point. About 99% of the possible ratings are missing, because a user typically rates only a small portion of the movies. The (u, i) pairs for which r_ui is known are stored in the training set K = {(u, i) | r_ui is known}. Notice that K also includes the Probe set. Each user u is associated with a set of items denoted by R(u), which contains all the items for which ratings by u are available. Likewise, R(i) denotes the set of users who rated item i. Sometimes, we also use a set denoted by N(u), which contains all items for which u provided a rating, even if the rating value is unknown. Thus, N(u) extends R(u) by also considering the ratings in the Qualifying set.
Models for the rating data are learned by fitting the previously observed ratings (training set). However, our goal is to generalize those in a way that allows us to predict future, unknown ratings (Qualifying set). Thus, caution should be exercised to avoid overfitting the observed data. We achieve this by regularizing the learned parameters, whose magnitudes are penalized. The extent of regularization is controlled by tunable constants. Unless otherwise stated, we use L2 regularization.

This is a good place to add some words on the constants controlling our algorithms (including step sizes, regularization, and number of iterations). Exact values of these constants are determined by validation on the Probe set. In all cases but one (to be mentioned below), such validation is done in a manual greedy manner. That is, when a newly introduced constant needs to get tuned, we execute multiple runs of the algorithms and pick the value that yields the best RMSE on the Netflix Probe set [4]. This scheme does not result in optimal settings for several reasons. First, once a constant is set we do not revisit its value, even though future introduction of other constants may require modifying earlier settings. Second, we use the same constants under multiple variants of the same algorithm (e.g., multiple dimensionalities of a factorization model), whereas a more delicate tuning would require a different setting for each variant. We chose this convenient, but less accurate, method because our experience showed that over-tuning the accuracy of a single predictor does not deliver a real contribution after being incorporated within the overall blend.
III. BASELINE PREDICTORS
Collaborative filtering models try to capture the interactions between users and items that produce the different rating values. However, many of the observed rating values are due to effects associated with either users or items, independently of their interaction. A prime example is that typical CF data exhibit large user and item biases, i.e., systematic tendencies for some users to give higher ratings than others, and for some items to receive higher ratings than others.

We will encapsulate those effects, which do not involve user-item interaction, within the baseline predictors. Because these predictors tend to capture much of the observed signal, it is vital to model them accurately. This enables isolating the part of the signal that truly represents user-item interaction, and subjecting it to more appropriate user preference models.

Denote by \mu the overall average rating. A baseline prediction for an unknown rating r_ui is denoted by b_ui and accounts for the user and item effects:

  b_{ui} = \mu + b_u + b_i    (1)

The parameters b_u and b_i indicate the observed deviations of user u and item i, respectively, from the average. For example, suppose that we want a baseline estimate for the rating of the movie Titanic by user Joe. Now, say that the average rating over all movies, \mu, is 3.7 stars. Furthermore, Titanic is better than an average movie, so it tends to be rated 0.5 stars above the average. On the other hand, Joe is a critical user, who tends to rate 0.3 stars lower than the average. Thus, the baseline estimate for Titanic's rating by Joe would be 3.9 stars, obtained by calculating 3.7 - 0.3 + 0.5.
A way to estimate the parameters is by decoupling the calculation of the b_i's from the calculation of the b_u's. First, for each item i we set

  b_i = \frac{\sum_{u \in R(i)} (r_{ui} - \mu)}{\lambda_1 + |R(i)|}.    (2)

Then, for each user u we set

  b_u = \frac{\sum_{i \in R(u)} (r_{ui} - \mu - b_i)}{\lambda_2 + |R(u)|}.    (3)
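For concreteness, here is a minimal Python sketch of this decoupled estimation. The data layout (a list of (user, item, rating) triples) and all names are ours, not part of the original implementation:

```python
from collections import defaultdict

def decoupled_biases(ratings, mu, lambda1=25.0, lambda2=10.0):
    """Estimate item biases b_i by (2), then user biases b_u by (3)."""
    item_sum, item_cnt = defaultdict(float), defaultdict(int)
    for u, i, r in ratings:
        item_sum[i] += r - mu
        item_cnt[i] += 1
    b_i = {i: item_sum[i] / (lambda1 + item_cnt[i]) for i in item_sum}

    user_sum, user_cnt = defaultdict(float), defaultdict(int)
    for u, i, r in ratings:
        user_sum[u] += r - mu - b_i[i]
        user_cnt[u] += 1
    b_u = {u: user_sum[u] / (lambda2 + user_cnt[u]) for u in user_sum}
    return b_u, b_i
```

The division by \lambda + |R(\cdot)| is what shrinks averages with few supporting ratings towards zero, as discussed next.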
Averages are shrunk towards zero by using the regularization parameters \lambda_1, \lambda_2, which are determined by validation on the Probe set. We were using \lambda_1 = 25, \lambda_2 = 10. Whenever this work refers to baseline predictors estimated in this decoupled fashion, they are denoted by \tilde{b}_{ui}.

A more accurate estimation of b_u and b_i will treat them symmetrically, by solving the least squares problem

  \min_b \sum_{(u,i) \in K} (r_{ui} - \mu - b_u - b_i)^2 + \lambda_3 \left( \sum_u b_u^2 + \sum_i b_i^2 \right).    (4)

Hereinafter, b denotes all user and item biases (the b_u's and b_i's). The first term, \sum_{(u,i) \in K} (r_{ui} - \mu - b_u - b_i)^2, strives to find b_u's and b_i's that fit the given ratings. The regularizing term, \lambda_3 (\sum_u b_u^2 + \sum_i b_i^2), avoids overfitting by penalizing the magnitudes of the parameters. This least squares problem can be solved fairly efficiently by the method of stochastic gradient descent.
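A minimal sketch of such an SGD solver for (4); the learning rate of 0.005 is only a placeholder (the paper tunes all such constants on the Probe set), and the function name is ours:

```python
def sgd_biases(ratings, mu, lambda3=0.01, gamma=0.005, n_iters=30):
    """Solve the regularized least squares problem (4) by SGD."""
    b_u, b_i = {}, {}
    for _ in range(n_iters):
        for u, i, r in ratings:
            bu = b_u.get(u, 0.0)
            bi = b_i.get(i, 0.0)
            err = r - (mu + bu + bi)                # prediction residual
            # Move each bias along the negative gradient of its term.
            b_u[u] = bu + gamma * (err - lambda3 * bu)
            b_i[i] = bi + gamma * (err - lambda3 * bi)
    return b_u, b_i
```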
In practice, we were using more comprehensive versions of (4), to which we turn now.

A. Time changing baseline predictors
Much of the temporal variability in the data is included within the baseline predictors, through two major temporal effects. The first addresses the fact that an item's popularity may change over time. For example, movies can go in and out of popularity as triggered by external events such as the appearance of an actor in a new movie. This is manifested in our models by treating the item bias b_i as a function of time. The second major temporal effect allows users to change their baseline ratings over time. For example, a user who tended to rate an average movie "4 stars" may now rate such a movie "3 stars". This may reflect several factors, including a natural drift in a user's rating scale, the fact that ratings are given in the context of other ratings that were given recently, and the fact that the identity of the rater within a household can change over time. Hence, in our models we take the parameter b_u as a function of time. This induces a template for a time-sensitive baseline predictor for u's rating of i at day t_ui:

  b_{ui} = \mu + b_u(t_{ui}) + b_i(t_{ui})    (5)
Here, b_u(·) and b_i(·) are real-valued functions that change over time. The exact way to build these functions should reflect a reasonable way to parameterize the involved temporal changes.

A major distinction is between temporal effects that span extended periods of time and more transient effects. We do not expect movie likeability to fluctuate on a daily basis, but rather to change over more extended periods. On the other hand, we observe that user effects can change on a daily basis, reflecting inconsistencies natural to customer behavior. This requires finer time resolution when modeling user biases, compared with a lower resolution that suffices for capturing item-related time effects.

We start with our choice of time-changing item biases b_i(t). We found it adequate to split the item biases into time-based bins, using a constant item bias for each time period. The decision of how to split the timeline into bins should balance the desire to achieve finer resolution (hence, smaller bins) with the need for enough ratings per bin (hence, larger bins). In fact, there is a wide variety of bin sizes that yield about the same accuracy. In our implementation, each bin corresponds to roughly ten consecutive weeks of data, leading to 30 bins spanning all days in the dataset. A day t is associated with an integer Bin(t) (a number between 1 and 30 in our data), such that the movie bias is split into a stationary part and a time-changing part:
  b_i(t) = b_i + b_{i,\text{Bin}(t)}    (6)

While binning the parameters works well on the items, it is more of a challenge on the users' side. On the one hand, we would like a finer resolution for users to detect very short-lived temporal effects. On the other hand, we do not expect enough ratings per user to produce reliable estimates for isolated bins. Different functional forms can be considered for parameterizing temporal user behavior, with varying complexity and accuracy.
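Before turning to the user side, a small sketch of the binned item bias of (6). The 70-day bin width is our reading of "roughly ten consecutive weeks" with 30 bins; the helper names are ours:

```python
NUM_BINS = 30
BIN_WIDTH = 70                     # days; roughly ten consecutive weeks

def time_bin(t):
    """Map day index t to an integer bin, Bin(t) in (6)."""
    return min(t // BIN_WIDTH, NUM_BINS - 1)

def item_bias(i, t, b_i, b_i_bin):
    """Time-changing item bias (6): stationary part plus per-bin part."""
    return b_i.get(i, 0.0) + b_i_bin.get((i, time_bin(t)), 0.0)
```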
One simple modeling choice uses a linear function to capture a possible gradual drift of user bias. For each user u, we denote the mean date of rating by t_u. Now, if u rated a movie on day t, then the associated time deviation of this rating is defined as

  \text{dev}_u(t) = \text{sign}(t - t_u) \cdot |t - t_u|^{\beta}.

Here |t - t_u| measures the number of days between dates t and t_u. We set the value of \beta by validation on the Probe set; in our implementation \beta = 0.4. We introduce a single new parameter for each user called \alpha_u, so that we get our first definition of a time-dependent user bias:
  b_u^{(1)}(t) = b_u + \alpha_u \cdot \text{dev}_u(t)    (7)

This simple linear model for approximating a drifting behavior requires learning two parameters per user: b_u and \alpha_u.
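As an illustration, the deviation function and the drifting bias (7) in the same hedged style as the earlier sketches (names are ours):

```python
def dev_u(t, t_mean, beta=0.4):
    """Time deviation of a rating: sign(t - t_u) * |t - t_u|^beta."""
    d = t - t_mean
    return (1 if d >= 0 else -1) * abs(d) ** beta

def user_bias_linear(u, t, b_u, alpha_u, t_mean_u):
    """Linearly drifting user bias, b_u^(1)(t) of (7)."""
    return b_u.get(u, 0.0) + alpha_u.get(u, 0.0) * dev_u(t, t_mean_u[u])
```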
The linear function for modeling the user bias meshes well with gradual drifts in user behavior. However, we also observe sudden drifts emerging as "spikes" associated with a single day or session. For example, we have found that multiple ratings a user gives in a single day tend to concentrate around a single value. Such an effect need not span more than a single day. It may reflect the mood of the user that day, the impact of ratings given in a single day on each other, or changes in the actual rater in multi-person accounts. To address such short-lived effects, we assign a single parameter per user and day, absorbing the day-specific variability. This parameter is denoted by b_{u,t}.

In the Netflix data, a user rates on 40 different days on average. Thus, working with b_{u,t} requires, on average, 40 parameters to describe each user bias. It is expected that b_{u,t} is inadequate as a standalone for capturing the user bias, since it misses all sorts of signals that span more than a single day. Thus, it serves as an additive component within the previously described schemes. The user bias model (7) becomes
  b_u^{(3)}(t) = b_u + \alpha_u \cdot \text{dev}_u(t) + b_{u,t}.    (8)

The discussion so far leads to the baseline predictor

  b_{ui} = \mu + b_u + \alpha_u \cdot \text{dev}_u(t_{ui}) + b_{u,t_{ui}} + b_i + b_{i,\text{Bin}(t_{ui})}.    (9)

If used as a standalone predictor, its resulting RMSE would be 0.9605.
Another effect within the scope of baseline predictors is related to the changing scale of user ratings. While b_i(t) is a user-independent measure for the merit of item i at time t, users tend to respond to such a measure differently. For example, different users employ different rating scales, and a single user can change his rating scale over time. Accordingly, the raw value of the movie bias is not completely user-independent. To address this, we add a time-dependent scaling feature to the baseline predictors, denoted by c_u(t). Thus, the baseline predictor (9) becomes

  b_{ui} = \mu + b_u + \alpha_u \cdot \text{dev}_u(t_{ui}) + b_{u,t_{ui}} + (b_i + b_{i,\text{Bin}(t_{ui})}) \cdot c_u(t_{ui}).    (10)
All discussed ways to implement b_u(t) would be valid for implementing c_u(t) as well. We chose to dedicate a separate parameter per day, resulting in c_u(t) = c_u + c_{u,t}. As usual, c_u is the stable part of c_u(t), whereas c_{u,t} represents day-specific variability.
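Putting the pieces together, a sketch of the full baseline predictor (10), reusing the dev_u and time_bin helpers above. The dict-of-dicts container layout is our own convention; missing day-specific entries fall back to the defaults discussed in Sec. III-C:

```python
def baseline(u, i, t, mu, p):
    """Baseline predictor (10). p maps parameter names to dicts:
    b_u, alpha_u, t_mean_u, b_ut, b_i, b_ibin, c_u, c_ut."""
    user_part = (p['b_u'].get(u, 0.0)
                 + p['alpha_u'].get(u, 0.0) * dev_u(t, p['t_mean_u'][u])
                 + p['b_ut'].get((u, t), 0.0))      # day-specific spike
    movie_part = p['b_i'].get(i, 0.0) + p['b_ibin'].get((i, time_bin(t)), 0.0)
    # Scale defaults: c_u -> 1, c_ut -> 0 on untrained dates (Sec. III-C).
    scale = p['c_u'].get(u, 1.0) + p['c_ut'].get((u, t), 0.0)
    return mu + user_part + movie_part * scale
```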
Adding the multiplicative factor c_u(t) to the baseline predictor (as per (10)) lowers RMSE to 0.9555. Interestingly, this basic model, which captures just main effects while disregarding user-item interactions, can explain almost as much of the data variability as the commercial Netflix Cinematch recommender system, whose published RMSE on the same Quiz set is 0.9514 [4].

B. Frequencies
It was brought to our attention by our colleagues at the Pragmatic Theory team (PT) that the number of ratings a user gave on a specific day explains a significant portion of the variability of the data during that day. Formally, denote by F_ui the overall number of ratings that user u gave on day t_ui. The value of F_ui will be henceforth dubbed a "frequency", following PT's notation. In practice we work with a rounded logarithm of F_ui, denoted by f_ui = \lfloor \log_a F_{ui} \rfloor.¹
Interestingly, even though f_ui is solely driven by user u, it will influence the item biases, rather than the user biases. Accordingly, for each item i we introduce a term b_{i,f}, capturing the bias specific to the item i at log-frequency f. Baseline predictor (10) is extended to be

  b_{ui} = \mu + b_u + \alpha_u \cdot \text{dev}_u(t_{ui}) + b_{u,t_{ui}} + (b_i + b_{i,\text{Bin}(t_{ui})}) \cdot c_u(t_{ui}) + b_{i,f_{ui}}.    (11)
We note that it would be sensible to multiply b_{i,f_ui} by c_u(t_ui), but we have not experimented with this.

The effect of adding the frequency term to the movie bias is quite dramatic. RMSE drops from 0.9555 to 0.9278. Notably, this yields a baseline predictor with a prediction accuracy significantly better than that of the original Netflix Cinematch algorithm.
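For clarity, a small sketch of how the frequency feature can be derived from date-stamped (user, item, day, rating) tuples. The helper is ours, and the log basis of 2.0 is an arbitrary placeholder (the actual basis is a tuned constant; see Sec. III-D):

```python
import math
from collections import Counter

def log_frequencies(ratings, a=2.0):
    """f_ui = floor(log_a F_ui), where F_ui counts u's ratings on day t_ui."""
    F = Counter((u, t) for u, i, t, r in ratings)   # ratings per user-day
    return {(u, t): int(math.log(F[u, t], a)) for (u, t) in F}
```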
Here, it is important to remind that a baseline predictor, no matter how accurate, cannot yield personalized recommendations on its own, as it misses all interactions between users and items. In a sense, it is capturing the portion of the data that is less relevant for establishing recommendations, and in doing so enables deriving accurate recommendations by subjecting other models to cleaner data. Nonetheless, we included two of the more accurate baseline predictors in our blend.

Why do frequencies work? In order to grasp the source of the contribution of frequencies, we make two empirical observations. First, we could see that frequencies are extremely powerful for a standalone baseline predictor, but as we will see, they contribute much less within a full method, where most of their benefit disappears when adding the user-movie interaction terms (matrix factorization or neighborhood). Second is the fact that frequencies seem to be much more helpful when used with movie biases, but not so when used with user-related parameters.
¹Notice that F_ui is strictly positive whenever it is used, so the logarithm is well defined.
Frequencies help in distinguishing days when users rate a lot in bulk. Typically, such ratings are given not closely to the actual watching day. Our theory is that when rating in bulk, users still reflect their normal preferences. However, certain movies exhibit an asymmetric attitude towards them. Some people like them, and will remember them for long as their all-time favorites. On the other hand, some people dislike them and just tend to forget them. Thus, when giving bulk ratings, only those with the positive approach will mark them as their favorites, while those disliking them will not mention them. Such a behavior is expected towards most popular movies, which can be either remembered as very good or just be forgotten. A similar phenomenon can also happen with a negative approach. Some movies are notoriously bad, and people who did not like them always give them as negative examples, indicating what they do not want to watch. However, for the other part of the population, who liked those movies, they are not going to be remembered long as salient positive examples. Thus, when rating in bulk, long after watching the movie, only those who disliked the movie will rate it.

This explains why such biases should be associated with movies, not with users. This also explains why most of the effect disappears when adding the interaction terms, which already "understand" that the user is of the type that likes/dislikes the movie. In other words, we hypothesize that high frequencies (or bulk ratings) do not represent much change in people's taste, but mostly a biased selection of movies to be rated: some movies are natural candidates as "bad examples", while others are natural "good examples". We believe that further validating our hypothesis bears practical implications. If, indeed, frequencies represent biased selection, they should be treated as capturing noise, which needs to get isolated out when making recommendations.

Finally, we should comment that a movie renter such as Netflix might have additional data sources that complement frequencies. For example, data on time passed since the actual watching date, or on whether ratings were entered in response to a given questionnaire or initiated by the user.
C. Predicting future days
Our models include day-specific parameters. We are often asked how these models can be used for predicting ratings in the future, on new dates for which we cannot train the day-specific parameters. The simple answer is that for those future (untrained) dates, the day-specific parameters should take their default value. In particular, for (11), c_u(t_ui) is set to c_u, and b_{u,t_ui} is set to zero. Yet, one wonders, if we cannot use the day-specific parameters for predicting the future, why are they good at all? After all, prediction is interesting only when it is about the future. To further sharpen the question, we should mention the fact that the Netflix Qualifying set includes many ratings on dates for which we have no other rating by the same user, and hence day-specific parameters cannot be exploited.

To answer this, notice that our temporal modeling makes no attempt to capture future changes. All it is trying to do is to capture transient temporal effects, which had a significant influence on past user feedback. When such effects are identified, they must be tuned down, so that we can model the more enduring signal. This allows our model to better capture the long-term characteristics of the data, while letting dedicated parameters absorb short-term fluctuations. For example, if a user gave many higher-than-usual ratings on a particular single day, our models discount those by accounting for a possible day-specific good mood, which does not reflect the longer-term behavior of this user. This way, the day-specific parameters accomplish a kind of data cleaning, which improves prediction of future dates.

D. What's in the blend?
The RMSE = 0.9555 result of model (10) is included in the blend. To learn the involved parameters, b_u, \alpha_u, b_{u,t}, b_i, b_{i,Bin(t)}, c_u, and c_{u,t}, one should minimize the regularized squared error on the training set. Learning is done by a stochastic gradient descent algorithm running for 30 iterations. We use a separate learning rate (step size) and regularization (weight decay) for each kind of learned parameter, by minimizing the cost function

  \min_{b,\alpha,c} \sum_{(u,i) \in K} \Big( r_{ui} - \mu - b_u - \alpha_u \cdot \text{dev}_u(t_{ui}) - b_{u,t_{ui}} - (b_i + b_{i,\text{Bin}(t_{ui})}) \cdot (c_u + c_{u,t_{ui}}) \Big)^2
  + \lambda_a b_u^2 + \lambda_b \alpha_u^2 + \lambda_c b_{u,t_{ui}}^2 + \lambda_d b_i^2 + \lambda_e b_{i,\text{Bin}(t_{ui})}^2 + \lambda_f (c_u - 1)^2 + \lambda_g c_{u,t_{ui}}^2.    (12)

Actual values of the learning rates and regularization constants (\lambda_a, \lambda_b, ..., \lambda_g) are as follows:

              b_u   b_{u,t}  \alpha_u  b_i   b_{i,Bin(t)}  c_u   c_{u,t}
  lrate x10^3  3     2.5      1e-2      2     5e-2          8     2
  reg   x10^2  3     5e-1     5000      3     10            1     5e-1

Notice that regularization shrinks parameters towards zero, with one exception. The multiplier c_u is shrunk towards 1, i.e., we penalize (c_u - 1)^2, rather than c_u^2. Similarly, all learned parameters are initialized to zero, except c_u, which is initialized to 1.
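For concreteness, one stochastic gradient step on cost (12), reusing the helpers and parameter-container layout of the earlier sketches (all our own); lr and reg would hold the per-parameter constants from the table above:

```python
def sgd_step(u, i, t, r, mu, p, lr, reg):
    """One SGD update on cost (12) for a single observed rating r_ui."""
    dev = dev_u(t, p['t_mean_u'][u])
    b = time_bin(t)
    bu = p['b_u'].get(u, 0.0)
    au = p['alpha_u'].get(u, 0.0)
    but = p['b_ut'].get((u, t), 0.0)
    bi = p['b_i'].get(i, 0.0)
    bib = p['b_ibin'].get((i, b), 0.0)
    cu = p['c_u'].get(u, 1.0)
    cut = p['c_ut'].get((u, t), 0.0)
    e = r - (mu + bu + au * dev + but + (bi + bib) * (cu + cut))
    # Gradient steps; c_u is regularized towards 1, everything else towards 0.
    p['b_u'][u] = bu + lr['b_u'] * (e - reg['b_u'] * bu)
    p['alpha_u'][u] = au + lr['alpha_u'] * (e * dev - reg['alpha_u'] * au)
    p['b_ut'][(u, t)] = but + lr['b_ut'] * (e - reg['b_ut'] * but)
    p['b_i'][i] = bi + lr['b_i'] * (e * (cu + cut) - reg['b_i'] * bi)
    p['b_ibin'][(i, b)] = bib + lr['b_ibin'] * (e * (cu + cut) - reg['b_ibin'] * bib)
    p['c_u'][u] = cu + lr['c_u'] * (e * (bi + bib) - reg['c_u'] * (cu - 1.0))
    p['c_ut'][(u, t)] = cut + lr['c_ut'] * (e * (bi + bib) - reg['c_ut'] * cut)
```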
The blend also includes the result of the more accurate baseline predictor (11). In fact, this is the only case where we resorted to an automatic parameter tuner (APT) to find the best constants (learning rates, regularization, and log basis). Specifically, we were using APT1, which is described in [13]. The reason we used APT here is twofold. First, this baseline predictor component is embedded in our more comprehensive models (described later); therefore, it is worthwhile to highly optimize it. Second, this is a small, quickly-trained model, so we could easily afford many hundreds of automatic executions seeking optimal settings. Still, it is worth mentioning that the benefit of APT was an RMSE reduction of (only) 0.0016 over our initial manual settings.
The parameters of the RMSE = 0.9278 result of model (11) were learned with a 40-iteration stochastic gradient descent process, with the following regularization constants governing the learning of each kind of parameter:

              b_u    b_{u,t}  \alpha_u  b_i    b_{i,Bin(t)}  c_u    c_{u,t}  b_{i,f}
  reg x10^2   2.55   0.231    395       2.55   9.29          4.76   1.90     1.10e-6

The log basis, a, was likewise determined by APT. Later, we refer to this model as [PQ1].
IV. MATRIX FACTORIZATION WITH TEMPORAL DYNAMICS

Matrix factorization with temporal dynamics was already described in last year's Progress Report [3], or with more detail in a KDD'09 paper [8]. The major enhancement for this year is the incorporation of the improved baseline predictors described in Sec. III.
The full model, which is known as timeSVD++ [8], is based on the prediction rule

  \hat{r}_{ui} = b_{ui} + q_i^T \left( p_u(t_{ui}) + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \right).    (13)

Here, the exact definition of the time-dependent baseline predictor, b_ui, follows (10).

As is typical for a SVD++ model [7], we employ two sets of static movie factors: q_i, y_i \in R^f. The first set (the q_i's) is common to all factor models. The second set (the y_i's) facilitates approximating a user factor through the set of movies rated by the same user, using the normalized sum |N(u)|^{-1/2} \sum_{j \in N(u)} y_j. Different normalizations of the form |N(u)|^{-\alpha} could be employed. Our choice of \alpha = 1/2 attempts to fix the variance of the sum (see also [7] for more intuition on this choice).
User factors, p_u(t) \in R^f, are time-dependent. We modeled each of the components of p_u(t)^T = (p_{u1}(t), ..., p_{uf}(t)) in the same way that we treated user biases. In particular, we have found modeling after (8) effective, leading to

  p_{uk}(t) = p_{uk} + \alpha_{uk} \cdot \text{dev}_u(t) + p_{ukt},   k = 1, ..., f.    (14)

Here p_{uk} captures the stationary portion of the factor, \alpha_{uk} \cdot \text{dev}_u(t) approximates a possible portion that changes linearly over time, and p_{ukt} absorbs the very local, day-specific variability.
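A sketch of prediction rule (13) with the time-dependent user factor (14), using numpy. The container layout is ours, and baseline stands in for the time-aware b_ui of (10) sketched in Sec. III:

```python
import numpy as np

def predict_timesvdpp(u, i, t, mu, p, f):
    """timeSVD++ rule (13): b_ui + q_i^T (p_u(t) + |N(u)|^{-1/2} sum y_j)."""
    b_ui = baseline(u, i, t, mu, p)                  # time-aware bias (10)
    # Time-dependent user factor (14); alpha_uk is a per-component vector.
    dev = dev_u(t, p['t_mean_u'][u])
    p_u_t = (p['p_u'][u] + p['alpha_uk'][u] * dev
             + p['p_ut'].get((u, t), np.zeros(f)))
    # Implicit feedback: normalized sum of y_j over all items u touched.
    N_u = p['N'][u]                                  # items rated by u
    y_sum = sum((p['y'][j] for j in N_u), np.zeros(f))
    return b_ui + p['q'][i] @ (p_u_t + y_sum / np.sqrt(max(len(N_u), 1)))
```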
We were occasionally also using a more memory-efficient version, without the day-specific portion:

  p_{uk}(t) = p_{uk} + \alpha_{uk} \cdot \text{dev}_u(t),   k = 1, ..., f.    (15)
The same model was also extended with the aforementioned frequencies. Since frequency affects the perception of movies, we tried to inject frequency awareness into the movie factors. To this end we created another copy of the movie factors, for each possible frequency value. This leads to the model

  \hat{r}_{ui} = b_{ui} + (q_i + q_{i,f_{ui}})^T \left( p_u(t_{ui}) + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} y_j \right).    (16)
Here the definition of b_ui is frequency-aware, following (11). Notice that while the transition to frequency-aware biases was measurably effective, the introduction of frequency-dependent movie factors was barely beneficial.

A. What's in the blend?
We included multiple variations of the matrix factorization models in the blend. All models are learned by stochastic gradient descent applied directly on the raw data; no pre- or post-processing is involved. In other words, all parameters (biases, user factors and movie factors) are simultaneously learned from the data. Constants (learning rates and regularization, to be specified shortly) are tuned to reach the lowest RMSE after 40 iterations. (Practically, one can give or take around ten iterations without a meaningful RMSE impact.) However, for blending we have found that over-training is helpful. That is, we often let the algorithm run far more than 40 iterations, thereby overfitting the train data, which happens to be beneficial when blending with other predictors.

The first model is the one using rule (13), together with the more memory-efficient user factors (15). The settings controlling the learning of bias-related parameters are as described in Sec. III-D. As for learning the factors themselves (q_i, p_u and y_j), we are using a learning rate of 0.008 and regularization of 0.0015, where the learning rate decays by a multiplicative factor of 0.9 after each iteration. Finally, for \alpha_{uk} the learning rate is 1e-5 and the regularization is 50. These settings remain the same throughout this section. The three variants within our blend are:
1) f = 20, #iterations = 40, RMSE = 0.8914
2) f = 200, #iterations = 40, RMSE = 0.8814
3) f = 500, #iterations = 50, RMSE = 0.8815
The next model still employs rule (13), but with the more accurate user-factor representation (14). This adds one type of parameter, p_{ukt}, which is learned with a learning rate of 0.004 and regularization of 0.01. The two variants within the blend were both heavily over-trained to overfit the training data:

1) f = 200, #iterations = 80, RMSE = 0.8825
2) f = 500, #iterations = 110, RMSE = 0.8841
Finally, we have our most accurate factor model, which follows (16). While the main novelty of this model (over the previous one) is in the bias term, we also added the frequency-specific movie factors q_{i,f_ui}. Their respective learning rate is 2e-5, with regularization of 0.02. The blend includes six variants:

1) f = 200, #iterations = 40, RMSE = 0.8777
2) f = 200, #iterations = 60, RMSE = 0.8787
3) f = 500, #iterations = 40, RMSE = 0.8769
4) f = 500, #iterations = 60, RMSE = 0.8784
5) f = 1000, #iterations = 80, RMSE = 0.8792
6) f = 2000, #iterations = 40, RMSE = 0.8762

Later, we refer to the model with f = 200 and #iterations = 40 as [PQ2].
V. NEIGHBORHOOD MODELS WITH TEMPORAL DYNAMICS

The most common approach to CF is based on neighborhood models. While typically less accurate than their factorization counterparts, neighborhood methods enjoy popularity thanks to some of their merits, such as explaining the reasoning behind computed recommendations, and seamlessly accounting for newly entered ratings. The method described in this section is based on Sec. 5 of our KDD'09 paper [8].

Recently, we suggested an item-item model based on global optimization [7], which will enable us here to capture time dynamics in a principled manner. The static model, without temporal dynamics, is centered on the following prediction rule:
  \hat{r}_{ui} = b_{ui} + |R(u)|^{-\frac{1}{2}} \sum_{j \in R(u)} (r_{uj} - \tilde{b}_{uj}) w_{ij} + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} c_{ij}    (17)

Here, the n^2 item-item weights w_ij and c_ij represent the adjustments we need to make to the predicted rating of item i, given a rating of item j. It was proven greatly beneficial to use two sets of item-item weights: one (the w_ij's) is related to the values of the ratings, and the other disregards the rating value, considering only which items were rated (the c_ij's). These weights are automatically learned from the data, together with the biases. The constants \tilde{b}_{uj} are precomputed according to (2)-(3).
When adapting rule (17) to address temporal dynamics, two components should be considered separately. First is the baseline predictor portion, b_ui = \mu + b_i + b_u, which explains most of the observed signal. Second is the part that captures the more informative signal, dealing with user-item interaction: |R(u)|^{-1/2} \sum_{j \in R(u)} (r_{uj} - \tilde{b}_{uj}) w_{ij} + |N(u)|^{-1/2} \sum_{j \in N(u)} c_{ij}. For the baseline part, nothing changes from the factor model, and we make it time-aware according to either (10) or (11). The latter one adds frequencies and is generally preferred. However, capturing temporal dynamics within the interaction part requires a different strategy.

Item-item weights (w_ij and c_ij) reflect inherent item characteristics and are not expected to drift over time. The learning process should make sure that they capture unbiased long-term values, without being too affected by drifting aspects. Indeed, the time-changing nature of the data can mask much of the longer-term item-item relationships if not treated adequately. For instance, a user rating both items i and j high in a short time period is a good indicator for relating them, thereby pushing higher the value of w_ij. On the other hand, if those two ratings are given five years apart, while the user's taste (if not her identity) could considerably change, this is less evidence of any relation between the items. On top of this, we would argue that those considerations are pretty much user-dependent: some users are more consistent than others and allow relating their longer-term actions.
Our goal here is to distill accurate values for the item-item weights, despite the interfering temporal effects. First we need to parameterize the decaying relations between two items rated by user u. We adopt an exponential decay formed by the function e^{-\beta_u \cdot \Delta t}, where \beta_u > 0 controls the user-specific decay rate and should be learned from the data. This leads to the prediction rule

  \hat{r}_{ui} = b_{ui} + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} e^{-\beta_u \cdot |t_{ui} - t_{uj}|} c_{ij} + |R(u)|^{-\frac{1}{2}} \sum_{j \in R(u)} e^{-\beta_u \cdot |t_{ui} - t_{uj}|} (r_{uj} - \tilde{b}_{uj}) w_{ij}.    (18)
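A sketch of rule (18) under the same hedged assumptions as before (the container layout is ours; b_ui is the time-aware baseline passed in, and \tilde{b}_{uj} are the decoupled baselines of (2)-(3)):

```python
import math

def predict_neighborhood(u, i, t, b_ui, p):
    """Time-decayed item-item rule (18).

    p['R'][u]: list of (j, r_uj, t_uj) with explicit ratings;
    p['N'][u]: list of (j, t_uj) for all items u touched;
    p['w'], p['c']: the two item-item weight dicts; p['beta_u']: decay rates.
    """
    beta = p['beta_u'].get(u, 0.0)
    decay = lambda tj: math.exp(-beta * abs(t - tj))
    R_u, N_u = p['R'][u], p['N'][u]
    expl = sum(decay(tj) * (rj - p['b_tilde'][(u, j)]) * p['w'][(i, j)]
               for j, rj, tj in R_u)
    impl = sum(decay(tj) * p['c'][(i, j)] for j, tj in N_u)
    # The paper notes (1 + beta_u * dt)^-1 decays give the same accuracy, faster.
    return (b_ui + expl / math.sqrt(max(len(R_u), 1))
                 + impl / math.sqrt(max(len(N_u), 1)))
```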
The involved parameters are learned by minimizing the associated regularized squared error. Minimization is performed by stochastic gradient descent for 20-30 iterations. The model is applied directly to the raw data, so all parameters (biases and movie-movie weights) are learned simultaneously. Learning of bias-related parameters is governed by the same constants discussed in Sec. III. As for the movie-movie weights (both w_ij and c_ij), their learning rate is 0.005, with a regularization constant of 0.002. Finally, the update of the exponent \beta_u uses a particularly small step size of 1e-7, with a regularization constant equaling 0.01.

We also experimented with other decay forms, like the more computationally-friendly (1 + \beta_u \cdot \Delta t)^{-1}, which resulted in the same accuracy, with an improved running time. (No need to change meta-parameters.)

As in the factor case, properly considering temporal dynamics improves the accuracy of the neighborhood model. The RMSE decreases from 0.9002 [7] to 0.8870 (see next subsection). To our best knowledge, this is significantly better than previously known results by neighborhood methods. To put this in some perspective, this result is even better than those reported [1, 2, 11, 15] by using hybrid approaches such as applying a neighborhood approach on residuals of other algorithms. A lesson is that addressing temporal dynamics in the data can have a more significant impact on accuracy than designing more complex learning algorithms.
A.What’sintheblend?
We ran the time-aware neighborhood model, with biases following (10), for 20, 25, and 30 iterations of stochastic gradient descent. The resulting RMSEs were 0.8887, 0.8885 and 0.8887, respectively. The results with 20 and 30 iterations are in the blend.
We also tried extending (18) with a non-normalized term. This involved adding a third set of movie-movie weights, d_ij, as follows:

  \hat{r}_{ui} = b_{ui} + |N(u)|^{-\frac{1}{2}} \sum_{j \in N(u)} e^{-\beta_u \cdot |t_{ui} - t_{uj}|} c_{ij} + |R(u)|^{-\frac{1}{2}} \sum_{j \in R(u)} e^{-\beta_u \cdot |t_{ui} - t_{uj}|} (r_{uj} - \tilde{b}_{uj}) w_{ij} + \sum_{j \in R(u)} e^{-\gamma_u \cdot |t_{ui} - t_{uj}|} (r_{uj} - \tilde{b}_{uj}) d_{ij}.

Here, we also tried to emphasize the very adjacent ratings made by the user. Therefore, the new decay-controlling constants, the \gamma_u's, were initialized with a relatively high value of 0.5 (compared to initializing \beta_u with zero). In addition, for d_ij we used a slower learning rate of 1e-5. Learning was done by 25 iterations of stochastic gradient descent. The result, with RMSE = 0.8881, is included in the blend. In retrospect, we believe that such a minuscule RMSE reduction does not justify adding a third set of movie-movie weights.
Finally, we ran the time-aware neighborhood model, with biases following (11), for 20 iterations of stochastic gradient descent. (The third set of movie-movie weights was not used.) The result, with RMSE = 0.8870, is included in the blend. Later, we refer to this model as [PQ3].
VI. EXTENSIONS TO RESTRICTED BOLTZMANN MACHINES

A. RBMs with conditional visible units
We extended the Restricted Boltzmann Machines (RBM) model suggested by [12]. The original work showed a significant performance boost by making the hidden units conditional on which movies the current user has rated. We have found that a similarly significant performance boost is achieved by conditioning the visible units.

Intuitively speaking, each of the RBM visible units corresponds to a specific movie. Thus their biases represent movie biases. However, we know that other kinds of biases are more significant in the data: namely, user bias, single-day user bias, and frequency-based movie bias. Therefore, we add those biases to the visible units through conditional connections, which depend on the currently shown user and date.
Let us borrow the original notation [12], which uses a conditional multinomial distribution for modeling each column of the observed visible binary rating matrix V:

  p(v_i^k = 1 | h) = \frac{\exp(b_i^k + \sum_{j=1}^{F} h_j W_{ij}^k)}{\sum_{l=1}^{5} \exp(b_i^l + \sum_{j=1}^{F} h_j W_{ij}^l)}    (19)
We extend this conditional distribution as follows. Let the current instance refer to user u and date t. We add a user/date bias term:

  p(v_i^k = 1 | h, u, t) = \frac{\exp(b_{ut}^k + b_u^k + b_i^k + \sum_{j=1}^{F} h_j W_{ij}^k)}{\sum_{l=1}^{5} \exp(b_{ut}^l + b_u^l + b_i^l + \sum_{j=1}^{F} h_j W_{ij}^l)}    (20)
where b_u^k is a user-specific parameter, and b_{ut}^k is a user-by-date-specific variable. The learning rule is

  \Delta b_u^k = \varepsilon_1 (\langle v_i^k \rangle_{\text{data}} - \langle v_i^k \rangle_T),   \Delta b_{ut}^k = \varepsilon_2 (\langle v_i^k \rangle_{\text{data}} - \langle v_i^k \rangle_T).

We have found it convenient here to deviate from the mini-batch learning scheme suggested in [12], and to learn b_u^k and b_{ut}^k in a fully online manner. That is, we update each immediately after observing a corresponding training instance. The used learning rates are \varepsilon_1 = 0.0025 and \varepsilon_2 = 0.008. Notice that unless otherwise stated, we use the same weight decay suggested in the original paper, which is 0.001.
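As an illustration, a numpy sketch of the conditional softmax (20) and the online bias update; the container names are our own convention:

```python
import numpy as np

def visible_probs(i, h, u, t, rbm):
    """Conditional multinomial over the 5 rating values of movie i, per (20).

    rbm['W'][i]: (5, F) visible-hidden weights; rbm['b_i'][i]: length-5
    movie bias; rbm['b_u'] and rbm['b_ut']: the conditional user and
    user-date bias tables added by this extension.
    """
    zeros = np.zeros(5)
    scores = (rbm['b_ut'].get((u, t), zeros)
              + rbm['b_u'].get(u, zeros)
              + rbm['b_i'][i]
              + rbm['W'][i] @ h)            # sum_j h_j W_ij^k, for each k
    e = np.exp(scores - scores.max())       # numerically stable softmax
    return e / e.sum()

def update_user_bias(rbm, u, v_data, v_model, eps1=0.0025):
    """Fully online update of b_u^k right after one training instance."""
    rbm['b_u'][u] = rbm['b_u'].get(u, np.zeros(5)) + eps1 * (v_data - v_model)
```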
Considering the significant effect of frequencies, we can further condition on them here. Let the current instance refer to user u and date t, with associated frequency f. The resulting conditional distribution is as follows:

  p(v_i^k = 1 | h, u, t, f) = \frac{\exp(b_{if}^k + b_{ut}^k + b_u^k + b_i^k + \sum_{j=1}^{F} h_j W_{ij}^k)}{\sum_{l=1}^{5} \exp(b_{if}^l + b_{ut}^l + b_u^l + b_i^l + \sum_{j=1}^{F} h_j W_{ij}^l)}    (21)
where b_{if}^k is a movie-by-frequency-specific variable. Its learning rule will be

  \Delta b_{if}^k = \varepsilon_3 (\langle v_i^k \rangle_{\text{data}} - \langle v_i^k \rangle_T),

where \varepsilon_3 = 0.0002. Online learning is used as well.
When using frequencies we also employed the following modification to the visible-hidden weights, which was brought to our attention by our colleagues at the Pragmatic Theory team. Instead of using the original weights W_{ij}^k, we will use frequency-dependent weights W_{ij}^{kf}, which are factored as

  W_{ij}^{kf} = W_{ij}^k \cdot (1 + C_{fj}).

We use online learning for the new parameters C_{fj}, with a learning rate of 1e-5.
As it turned out, this extension of the weights barely improves performance when frequency biases are already present, while being somewhat onerous in terms of running time. Thus, we are unlikely to recommend it. Still, it is part of our frequency-aware RBM implementation.

B. RBMs with day-specific hidden units²

Motivated by the day-specific user factor (14), we also tried to create day-specific RBM hidden units. On top of the F hidden units, we also add G day-specific units. For a user that rated on r different days, we create r parallel copies of the G day-specific units. All those parallel copies share the same hidden-visible weights, hidden biases, and conditional connections. Also, each parallel copy is connected only to the visible units corresponding to the ratings given in its respective day.
To put this formally, for a day-specific hidden unit indexed by j, with a corresponding rating date t, we use the indicator vector r^t \in \{0,1\}^n to denote which movies the current user rated on date t. Then, the Bernoulli distribution for modeling hidden user features becomes

  p(h_j = 1 | V, r^t) = \sigma\left( b_j + \sum_{i=1}^{n} \sum_{k=1}^{5} r_i^t v_i^k W_{ij}^k + \sum_{i=1}^{n} r_i^t D_{ij} \right).    (22)
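A numpy sketch of this activation, under our own assumed array shapes:

```python
import numpy as np

def day_hidden_prob(j, V, r_t, rbm):
    """Activation of day-specific hidden unit j, following (22).

    V: (n, 5) one-hot rating matrix of the current user; r_t: length-n
    0/1 array marking the movies rated on this unit's day;
    rbm['W']: (n, 5, G) weights shared by all parallel copies;
    rbm['D']: (n, G) conditional connections; rbm['b_h']: hidden biases.
    """
    act = rbm['b_h'][j]
    act += np.sum(r_t[:, None] * V * rbm['W'][:, :, j])  # sum_i sum_k r_i^t v_i^k W_ij^k
    act += r_t @ rbm['D'][:, j]                          # conditional term
    return 1.0 / (1.0 + np.exp(-act))                    # sigmoid
```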
In our implementation we used this model together with the frequency-biased RBM. All parameters associated with the day-specific units were learned in mini-batches, as their non-day-specific counterparts, but with a learning rate of 0.005 and a weight decay of 0.01. Results were not encouraging, and further refinement is still needed. Still, a single variant of this scheme contributes to the blend.

C. What's in the blend?

First, a note on over-training. Our parameter setting made the RBM typically converge at lowest Quiz RMSE with 60-90 iterations. However, for the overall blend it was beneficial to continue overfitting the training set, and let the RBM run for many additional iterations, as will be seen in the following. We include in the blend four variants of the RBM model following (20):

1) F = 200, #iterations = 52, RMSE = 0.8951
2) F = 400, #iterations = 62, RMSE = 0.8942
3) F = 400, #iterations = 82, RMSE = 0.8944
4) F = 400, #iterations = 100, RMSE = 0.8952
TherearealsotwovariantsoftheRBMwithfrequencies(21):
1)F=200,#iterations=90,RMSE=0.89282)F=200,#iterations=140,RMSE=0.8949
²An idea developed together with Martin Piotte.
Later, we refer to these two models as [PQ4] and [PQ5]. Interestingly, the RMSE = 0.8928 result is the best we know of by using a pure RBM. If our good experience with post-processing an RBM by kNN [2] is repeatable, one can achieve a further significant RMSE reduction by applying kNN to the residuals. However, we have not experimented with this.

Finally, there is a single predictor RBM with 50 hidden units and 50 day-specific hidden units, which ran 70 iterations to produce its result. Later, we refer to this model as [PQ6].
VII. GBDT BLENDING

A key to achieving highly competitive results on the Netflix data is usage of sophisticated blending schemes, which combine³ the multiple individual predictors into a single final solution. This significant component was managed by our colleagues at the BigChaos team [14]. Still, we were producing a few blended solutions, which were later incorporated as individual predictors in the final blend.

Our blending techniques were applied to three distinct sets of predictors. First is a set of 454 predictors, which represent all predictors of the BellKor's Pragmatic Chaos team for which we have matching Probe and Qualifying results [14]. Second is a set of 75 predictors, which the BigChaos team picked out of the 454 predictors by forward selection [14]. Finally, a set of 24 BellKor predictors for which we had matching Probe and Qualifying results. Details of this set are given at the end of this section.
A. Gradient Boosted Decision Trees

While major breakthroughs in the competition were achieved by uncovering new features underlying the data, those became rare and very hard to get. As we entered the final 30 days of the competition ("last call for grand prize period"), we realized that individual predictors, even if novel and accurate, were unlikely to make a difference to the blend. We speculated that the most impact during a short period of 30 days would be achieved by exploring new blending techniques or improving the existing ones. Blending offers a lower-risk path to improvement in a short time. First, unlike individual predictors, better blending is directly connected to the final result. Second, blending simultaneously touches many predictors, rather than improving one at a time. This led to the idea of employing Gradient Boosted Decision Trees, which was raised together with Michael Jahrer and Andreas Töscher. Eventually, it did indeed make a contribution to the blend, though we hoped for a more significant impact.

Gradient Boosted Decision Trees (GBDT) are an additive regression model consisting of an ensemble of trees, fitted to current residuals in a forward step-wise manner. In the traditional boosting framework, the weak learners are generally shallow decision trees consisting of a few leaf nodes. GBDT ensembles are found to work well when there are hundreds of such decision trees. Standard references are [5, 6], and a known implementation is Treenet [16].
³While we use here the generic term "blending", the more accurate term would be "stacked generalization".
GBDT combine a few advantages, including an ability to find non-linear transformations, an ability to handle skewed variables without requiring transformations, computational robustness (e.g., highly collinear variables are not an issue) and high scalability. They also naturally lend themselves to parallelization. This has made them a good choice for several large-scale practical problems, such as ranking results of a search engine [9, 17], or query-biased summarization of search results [10]. In practice, we had found them, indeed, very flexible and convenient. However, their accuracy lags behind that of the Neural Network regressors described in [13].

There are four parameters controlling GBDT, which are: (1) number of trees, (2) size of each tree, (3) shrinkage (or "learning rate"), and (4) sampling rate. Our experiments did not show much sensitivity to any of these parameters (exact choices are described later).
Since GBDT can handle very skewed variables, we added to the list of predictors four additional features: user support (number of rated movies), movie support (number of rating users), frequency, and date of rating (number of days passed since the earliest rating in the dataset).

We applied GBDT learning on the aforementioned sets of 454 and 75 predictors. The Probe set is used for training the GBDT, which is then applied on the Qualifying set. Parameter settings are: #trees = 200, tree size = 20, shrinkage = 0.18, and sampling rate = 0.9. The results, which are included in the blend, are of RMSE = 0.8603 (454 predictors) and RMSE = 0.8606 (75 predictors).

When working with the much smaller set of 24 BellKor predictors, we used the settings: #trees = 150, tree size = 20, shrinkage = 0.2, and sampling rate = 1.0. The result of RMSE = 0.8664 was included in the blend.
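Our runs used Treenet [16]; as a rough stand-in, here is a hedged sketch with scikit-learn's GradientBoostingRegressor, mapping the stated parameters onto its arguments (interpreting "tree size" as max_leaf_nodes is our assumption):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def gbdt_blend(probe_preds, probe_ratings, qual_preds, probe_meta, qual_meta):
    """Blend individual predictors with GBDT, trained on the Probe set.

    probe_preds / qual_preds: (examples x predictors) matrices;
    *_meta: the extra features (user support, movie support,
    frequency, date of rating) described above.
    """
    X_train = np.hstack([probe_preds, probe_meta])
    X_qual = np.hstack([qual_preds, qual_meta])
    gbdt = GradientBoostingRegressor(
        n_estimators=200,      # number of trees
        max_leaf_nodes=20,     # our stand-in for "tree size = 20"
        learning_rate=0.18,    # shrinkage
        subsample=0.9,         # sampling rate
    )
    gbdt.fit(X_train, probe_ratings)
    return gbdt.predict(X_qual)
```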
It is also beneficial to introduce a clustering of users or movies, which will allow GBDT to treat all users (or movies) of a certain kind similarly. In the past [2], we touted splitting users into bins based on their support, and applying an equal blending strategy for all users in the same bin. This is already addressed in the GBDT implementation described above, thanks to adding the user support variable to the blended features. However, we can introduce additional kinds of user relationships to the scheme. For example, a matrix factorization model computes a short vector characterizing each user (a user factor). Like-minded users are expected to get mapped to similar vectors. Hence, adding such vectors to the blended feature sets will effectively allow GBDT to slice and dice the user base into subsets of similar users on which the same blending rules should be applied. The same can be done with movies.

We included in the blend three forms of this idea, all applied on the set of 24 BellKor predictors. First, we added to the blended predictors features from the timeSVD++ model (16) of dimensionality f = 20. This way, all individual bias terms were added as features. In addition, for each movie-user pair u-i, we added the 20-D movie factor (q_i + q_{i,f_ui}), and the 20-D user factor p_u(t_ui) + |R(u)|^{-1/2} \sum_{j \in R(u)} y_j. This resulted in RMSE = 0.8661.
Second, we used the 20 hidden units of an RBM as a 20-D user representation (in lieu of the timeSVD++ user representation). The movie representation is still based on the timeSVD++ model. The resulting RMSE is also 0.8661.

Finally, we added k-NN features on top of the timeSVD++ features. That is, for each u-i pair, we found the top 20 movies most similar to i which were rated by u. We added the movie scores, each multiplied by their respective similarities, as additional features. Similarities here were shrunk Pearson correlations [1]. This slightly reduces the RMSE to 0.8660.

Another usage of GBDT is for solving a regression problem per movie. For each user we computed a 50-D characteristic vector formed by the values of the 50 hidden units of a respective RBM. Then, for each movie we used GBDT for solving the regression problem of linking the 50-D user vectors to the true user ratings of the movie. The result, with RMSE = 0.9248, will be denoted as [PQ7] in the following description.
B.ListofBellKor’sProbe-Qualifyingpairs
Welistthe24BellKorpredictorswhichparticipatedintheGBDTblending.Noticethatmanymoreofourpredictorsareinthe nalblendofQualifyingresults(asmentionedearlierinthisarticle).However,onlyforthoselistedbelowwepossesscorrespondingProberesults,whichrequireextracomputationalresourcestofullyre-trainthemodelwhileexcludingtheProbesetfromthetrainingset.
Post Progress Prize 2008 predictors

Those were mentioned earlier in this document:
1) PQ1
2) PQ2
3) PQ3
4) PQ4
5) PQ5
6) PQ6
7) PQ7

Progress Prize 2008 predictors

The following is based on our notation in [3]:
8) SVD++(1) (f = 200)
9) Integrated (f = 100, k = 300)
10) SVD++(3) (f = 500)
11) First neighborhood model of Sec. 2.2 of [3] (RMSE = 0.9002)
12) A neighborhood model mentioned towards the end of Sec. 2.2 of [3] (RMSE = 0.8914)

Progress Prize 2007 predictors

The following is based on our notation in [2]:
13) Predictor #40
14) Predictor #35
15) Predictor #67
16) Predictor #75
17) NNMF (60 factors) with adaptive user factors
18) Predictor #81
19) Predictor #73
20) 100 neighbors User-kNN on residuals of all global effects but the last 4
21) Predictor #85
22) Predictor #45
23) Predictor #83
24) Predictor #106

One last predictor with RMSE = 0.8713 is in the final blend. It is based on the blending technique described in page 12 of [3]. The technique was applied to the four predictors indexed above by 2, 9, 12, and 13.
VIII. CONCLUDING REMARKS

Granting the grand prize celebrates the conclusion of the Netflix Prize competition. Wide participation, extensive press coverage and many publications all reflect the immense success of the competition. Dealing with movies, a subject close to the hearts of many, was definitely a good start. Yet much could go wrong, but did not, thanks to several enabling factors.

The first success factor is on the organizational side: Netflix. They did a great service to the field by releasing a precious dataset, an act which is so rare, yet courageous and important to the progress of science. Beyond this, both design and conduct of the competition were flawless and non-trivial. For example, the size of the data was right on target: much larger and more representative than comparable datasets, yet small enough to make the competition accessible to anyone with a commodity PC. As another example, I would mention the split of the test set into three parts, Probe, Quiz, and Test, which was essential to ensure the fairness of the competition. Despite being planned well ahead, it proved to be a decisive factor at the very last minute of the competition, three years later.

The second success factor is the wide engagement of many competitors. This created positive buzz, leading to further enrollment of many more. Much was said and written on the collaborative spirit of the competitors, who openly published and discussed their innovations on the web forum and through scientific publications. The feeling was of a big community progressing together, making the experience more enjoyable and efficient for all participants. In fact, this facilitated the nature of the competition, which proceeded like a long marathon, rather than a series of short sprints.

Another helpful factor was some touch of luck. The most prominent one is the choice of the 10% improvement goal. Any small deviation from this number would have made the competition either too easy or impossibly difficult. In addition, the goddess of luck ensured most suspenseful finish lines in both the 2007 Progress Prize and the 2009 Grand Prize, matching the best sports events.

The science of recommender systems is a prime beneficiary of the contest. Many new people became involved in the field and made their contributions. There is a clear spike in related publications, and the Netflix dataset is the direct catalyst to developing some of the better algorithms known in the field. Out of the numerous new algorithmic contributions, I would like to highlight one: those humble baseline predictors (or biases), which capture main effects in the data. While the literature mostly concentrates on the more sophisticated algorithmic aspects, we have learned that an accurate treatment of main effects is probably at least as significant as coming up with modeling breakthroughs.

Finally, we were lucky to win this competition, but recognize the important contributions of the many other contestants, from whom we have learned so much. We would like to thank all those who published their results, those who participated in the web forum, and those who emailed us with questions and ideas. We got our best intuitions this way.
REFERENCES

[1] R. Bell and Y. Koren. Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights. IEEE International Conference on Data Mining (ICDM'07), pp. 43-52, 2007.
[2] R. Bell, Y. Koren and C. Volinsky. The BellKor Solution to the Netflix Prize. 2007.
[3] R. Bell, Y. Koren and C. Volinsky. The BellKor 2008 Solution to the Netflix Prize. 2008.
[4] J. Bennett and S. Lanning. The Netflix Prize. KDD Cup and Workshop, 2007.
[5] J. H. Friedman. Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, Vol. 29 (2001), pp. 1189-1232.
[6] J. H. Friedman. Stochastic Gradient Boosting. Computational Statistics & Data Analysis, Vol. 38 (2002), pp. 367-378.
[7] Y. Koren. Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model. Proc. 14th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'08), pp. 426-434, 2008.
[8] Y. Koren. Collaborative Filtering with Temporal Dynamics. Proc. 15th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'09), pp. 447-456, 2009.
[9] P. Li, C. J. Burges and Q. Wu. McRank: Learning to Rank Using Multiple Classification and Gradient Boosting. Advances in Neural Information Processing Systems, 2007.
[10] D. Metzler and T. Kanungo. Machine Learned Sentence Selection Strategies for Query-Biased Summarization. SIGIR Learning to Rank Workshop, 2008.
[11] A. Paterek. Improving Regularized Singular Value Decomposition for Collaborative Filtering. Proc. KDD Cup and Workshop, 2007.
[12] R. Salakhutdinov, A. Mnih and G. Hinton. Restricted Boltzmann Machines for Collaborative Filtering. Proc. 24th Annual International Conference on Machine Learning, pp. 791-798, 2007.
[13] A. Töscher and M. Jahrer. The BigChaos Solution to the Netflix Prize 2008. 2008.
[14] A. Töscher and M. Jahrer. The BigChaos Solution to the Netflix Grand Prize. 2009.
[15] A. Töscher, M. Jahrer and R. Legenstein. Improved Neighborhood-based Algorithms for Large-Scale Recommender Systems. KDD'08 Workshop on Large Scale Recommender Systems and the Netflix Prize, 2008.
[16] TreeNet. /TreeNet.php.
[17] Z. Zheng, H. Zha, T. Zhang, O. Chapelle, K. Chen and G. Sun. A General Boosting Method and its Application to Learning Ranking Functions for Web Search. Advances in Neural Information Processing Systems, 2007.