A new variance estimator for parameters of semiparametric generalized additive models
W. Dana FLANDERS, Mitch KLEIN, and Paige TOLBERT
Generalized additive models (GAMs) have become popular in the air pollution epidemiology literature. Two problems, recently surfaced, concern implementation of these semiparametric models. The first problem, easily corrected, was laxity of the default convergence criteria. The other, noted independently by Klein, Flanders, and Tolbert, and by Ramsay, Burnett, and Krewski, concerned variance estimates produced by commercially available software. In simulations, they were as much as 50% too small. We derive an expression for a variance estimator for the parametric component of generalized additive models that can include up to three smoothing splines, and show how the standard error (SE) can be computed. Using Monte Carlo experiments, we evaluated performance of the estimator in finite samples. The estimator performed well in Monte Carlo experiments, in the situations considered. However, using data from our study of air pollution and cardiovascular disease, the standard error estimated using the new method was about 10% to 20% larger than the biased, commercially available standard error estimate.
Key Words: Epidemiologic methods; Generalized additive models; Semiparametric models; Variance.
1. INTRODUCTION
Generalized additive models (GAMs), a relatively new approach to nonparametric or semiparametric smoothing and data analysis (Hastie and Tibshirani 1990), have become widely used, particularly in time series analyses of acute health effects of air pollution. In semiparametric models, the focus of this article, the mean of the dependent variable is modeled as a parametric, linear function of some predictors plus a sum of functions of other predictors, which in some applications may be confounders or nuisance factors. The form of the function used for these other predictors is quite general, hence the term semiparametric.
W. Dana Flanders is Professor, Rollins School of Public Health, Department of Epidemiology, Emory University, 1518 Clifton Road, Atlanta, GA 30327 (E-mail: flanders@sph.emory.edu). Mitch Klein is Assistant Professor, and Paige Tolbert is Associate Professor, Rollins School of Public Health, Department of Epidemiology and Department of Environmental and Occupational Health, Emory University, Rollins School of Public Health, 1518 Clifton Road, Atlanta, GA 30327.
©2005 American Statistical Association and the International Biometric Society
Journal of Agricultural, Biological, and Environmental Statistics, Volume 10, Number 2, Pages 246–257. DOI: 10.1198/108571105X47010
Schwartz proposed application of GAMs to time series studies assessing the association of air pollution with mortality or other outcome measures in 1994 (Schwartz 1994a), and initially presented GAM models as a sensitivity analysis augmenting a parametric approach (Schwartz 1994b). In the intervening years, GAMs have gained widespread popularity for use in these types of time series studies (e.g., Borja-Aburta et al. 1998; Michelozzi et al. 1998; Burnett et al. 1999; Conceicao et al. 2001; Moolgavkar 2000; Pope, Hill, and Villegas 1999; Samet et al. 2000).
GAMs can generally be fit using S-Plus or using PROC GAM in SAS (SAS 2001). As discussed in the following, the models can be fit using a backfitting algorithm. Hastie and Tibshirani (1990) discussed conditions that assure convergence of this approach. Two problems have recently surfaced, however, concerning implementation of these models. The first problem, easily corrected, was that the default convergence criteria were not adequately strict (Dominici, McDermott, Zeger, and Samet 2002; Katsouyanni et al. 2002). The other problem concerns the variance estimates produced by these programs for the parametric component of the semiparametric GAMs. The problem was noted independently by Klein, Flanders, and Tolbert (2002) and by Ramsay, Burnett, and Krewski (2003). They showed that the variance estimates could be as much as 50% lower than the simulated variance in some of the situations considered. This article addresses the second problem by deriving a relatively easily implementable variance estimator for these models.
One of the specific problems that motivated this work is the large number of published or ongoing studies of the associations between health outcomes, such as respiratory disease, and air pollution (e.g., Borja-Aburta et al. 1998; Michelozzi et al. 1998; Burnett et al. 1999; Conceicao et al. 2001; Moolgavkar 2000; Pope, Hill, and Villegas 1999; Samet et al. 2000; Tolbert et al. 2000). Some of the studies published by others have used generalized additive models to assess the association between air pollution and disease, but an appropriate variance estimator has been unavailable.
The purpose of this article is three-fold. First, we present an asymptotic variance estimator for the parametric component of GAM semiparametric models, providing an explicit formulation for up to three splines. Second, we empirically evaluate the performance of this estimator in finite samples using Monte Carlo simulations and base these simulations on actual data from an ongoing study of air pollution. Finally, we apply the estimator to an ongoing study of air pollution and emergency department visits. We illustrate that, in this study, the variance we estimate differs from the corresponding estimates produced by commercially available software.
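2. METHODS
In the semiparametric situations of interest here, the generalized additive model is given by:
$$E(Y_i \mid X_i, Z_{1i}, \ldots, Z_{Ji}) = g^{-1}(\eta_i) = g^{-1}\bigl(\alpha + \beta X_i + f_1(Z_{1i}) + \cdots + f_J(Z_{Ji})\bigr), \quad i = 1, 2, \ldots, n, \tag{2.1}$$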
where $Y_i$ is the number of events for the $i$th observation; $g$ is a strictly monotone link function; $\eta_i = \alpha + \beta X_i + f_1(Z_{1i}) + \cdots + f_J(Z_{Ji})$; $\beta$ is the $(p \times 1)$ parameter of interest to be estimated; $X_i$ is a $(1 \times p)$ vector of predictors; $Z_{ji}$ is the value of the $j$th covariate for the $i$th observation; and $f_j(Z_{ji})$ is an arbitrary (smoothing) function with continuous second derivatives, for $j = 1$ to $J$. (Here, we limit consideration to $J \le 3$, but results should extend in an analogous way for $J > 3$.) We assume here that the $Y_i$, given $X_i$ and $Z_{1i}, \ldots, Z_{Ji}$, are independent with a Poisson distribution whose mean is given by Equation (2.1). The Poisson distribution is typically used in applications in air pollution epidemiology. However, results hold with obvious modifications for other distributions in the exponential family (Hastie and Tibshirani 1990).
We derive an explicit expression for the variance estimator for a class of estimators of $\beta$ in the model given by Equation (2.1), estimated by penalized likelihood (Hastie and Tibshirani 1990). That is, one maximizes
$$j(\beta, f) = l(\eta; Y) - \frac{1}{2} \sum_{j=1}^{3} \lambda_j \int \{f_j''(z)\}^2 \, dz, \tag{2.2}$$
over $\eta$ and over $f_j$ in the class of functions with continuous second derivatives $f_j''$. Here, the $\lambda_j$ are smoothing parameters which must be specified, or estimated from the data (Hastie and Tibshirani 1990). The functions that maximize Equation (2.2) are cubic smoothing splines (Hastie and Tibshirani 1990). An equivalent problem (Hastie and Tibshirani 1990) is to maximize
$$l(\eta; Y) - \frac{1}{2} \sum_{j=1}^{3} \lambda_j f_j^t K_j f_j, \tag{2.3}$$
where $K_j$ are the $n \times n$ quadratic penalty matrices given, for example, by Buja, Hastie, and Tibshirani (1989) for $j = 1, 2, 3$; $f_j$ are the $n \times 1$ vectors $f_j(Z_{ji})$, $i = 1, 2, \ldots, n$, $j = 1, 2, 3$; and the superscript "t" denotes the transpose. Using $[A]^{-}$ to denote the generalized inverse, the model can be fit using a local scoring procedure that incorporates the weighted smoothing matrices $S_j = (A + K_j)^{-} A$ in a backfitting algorithm as described by Hastie and Tibshirani (1990). We assume sufficient regularity and choice of smoothing parameters so that the local scoring procedure and the backfitting algorithm that it includes converge in probability as the sample size increases. In particular, $\hat\beta_i \to \beta_0$, $\hat f_{1,i} \to f_{1,0}$, $\hat f_{2,i} \to f_{2,0}$, and $\hat f_{3,i} \to f_{3,0}$ in probability for each $i$, where $\hat\beta_i$, $\hat f_{1,i}$, $\hat f_{2,i}$, $\hat f_{3,i}$ are the parameter estimates at step $i$; and $\beta_0$, $f_{1,0}$, $f_{2,0}$, and $f_{3,0}$ are the corresponding true values.
Our approach to estimating the large sample variance of $\hat\beta$ is straightforward: we find a large sample, approximate, linear expression for $\hat\beta$ in terms of $Y$, $E(Y)$, and known (or consistently estimable) functions.
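By arguments presented by Hastie and Tibshirani (1990), the one-step updates for the Newton-Raphson step of the fitting algorithm at the $i$th step are given by:
$$X\hat\beta_i = X (X^t A X)^{-1} X^t A \bigl(Z^c - \hat f_{1,i-1} - \hat f_{2,i-1} - \hat f_{3,i-1}\bigr), \tag{2.4}$$
$$\hat f_{1,i} = S_1 \bigl(Z^c - X\hat\beta - \hat f_{2,i-1} - \hat f_{3,i-1}\bigr), \tag{2.5}$$
$$\hat f_{2,i} = S_2 \bigl(Z^c - X\hat\beta - \hat f_{1,i-1} - \hat f_{3,i-1}\bigr), \tag{2.6}$$
and
$$\hat f_{3,i} = S_3 \bigl(Z^c - X\hat\beta - \hat f_{1,i-1} - \hat f_{2,i-1}\bigr), \tag{2.7}$$
where $A$ is the $n \times n$ matrix $-\partial^2 l / \partial\eta\,\partial\eta^t$; $l$ is the Poisson log-likelihood; $Z^c$ is the $n \times 1$ vector of linearized dependent variables $Z^c = \eta^c + A^{-1} u$; and where $u$ is the $n \times 1$ vector $\partial l / \partial\eta^c$, all evaluated using the current estimates of $\beta$, $f_1$, $f_2$, $f_3$, and $\eta$.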
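Equations (2.4)–(2.7) are a system of four equations in four unknowns (vectors). A straightforward, though tedious, derivation yields the following closed form estimate for the Newton-Raphson update of $\hat\beta_i$:
$$\hat\beta_i = [X^t A (I - V_1 - V_2 - V_3) X]^{-} X^t A (I - V_1 - V_2 - V_3) Z^c, \tag{2.8}$$
where
$$V_1 = [I - S_1 S_3 - S_1 (I - S_3)(I - S_2 S_3)^{-} S_2 (I - S_3)]^{-} \, S_1 \, [I - S_3 - (I - S_3)(I - S_2 S_3)^{-} S_2 (I - S_3)],$$
$$V_2 = [I - S_2 S_1 - S_2 (I - S_1)(I - S_3 S_1)^{-} S_3 (I - S_1)]^{-} \, S_2 \, [I - S_1 - (I - S_1)(I - S_3 S_1)^{-} S_3 (I - S_1)],$$
$$V_3 = [I - S_3 S_2 - S_3 (I - S_2)(I - S_1 S_2)^{-} S_1 (I - S_2)]^{-} \, S_3 \, [I - S_2 - (I - S_2)(I - S_1 S_2)^{-} S_1 (I - S_2)],$$
and $Z^c$ is expressed in terms of $\hat\beta_{i-1}$, $\hat f_{1,i}$, $\hat f_{2,i}$, $\hat f_{3,i}$, and $\eta$.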
Equation (2.8) is an explicit form for multiple smoothing splines (in terms of the individual smoothers $S_1$, $S_2$, and $S_3$) of the result given by Hastie and Tibshirani (1990) of the form: $\hat\beta = [X^t A (I - S) X]^{-} (X^t A (I - S)) Z^c$. By taking $(I - S) = (I - V_1 - V_2 - V_3)$, one sees that the one-step update for $\beta$ in Equation (2.8) is consistent with Hastie and Tibshirani (1990), who gave explicit results for a single spline, but noted that the same form of the equation would hold for multiple splines. An explicit form for multiple smoothing splines (in terms of the individual smoothers) can be obtained by applying the recursive equation described in the Appendix.
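As a rough numerical illustration of the closed form for $V_1$, $V_2$, and $V_3$, the following Python sketch builds the three matrices from given smoothers and checks that the implied fits satisfy the stationary version of Equations (2.5)–(2.7). The function names, the use of NumPy's Moore-Penrose inverse as the generalized inverse, and the randomly generated stand-in smoothers are all assumptions made for illustration; they are not part of the article.

```python
import numpy as np


def combined_smoothers(S1, S2, S3):
    """Return (V1, V2, V3) following the closed form attached to Equation (2.8)."""
    n = S1.shape[0]
    I = np.eye(n)
    ginv = np.linalg.pinv  # assumption: Moore-Penrose inverse as the generalized inverse

    def V(Sa, Sb, Sc):
        # Maps the partial residual r = Z^c - X beta to the fit of smoother Sa,
        # after the fits of Sb and Sc have been eliminated from the system.
        B = ginv(I - Sb @ Sc) @ Sb @ (I - Sc)
        left = I - Sa @ Sc - Sa @ (I - Sc) @ B
        right = I - Sc - (I - Sc) @ B
        return ginv(left) @ Sa @ right

    # V2 and V3 follow from V1 by cycling the indices 1 -> 2 -> 3 -> 1.
    return V(S1, S2, S3), V(S2, S3, S1), V(S3, S1, S2)


def random_smoother(n, rng):
    """Hypothetical stand-in smoother: symmetric with eigenvalues in [0, 0.8]."""
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return Q @ np.diag(rng.uniform(0.0, 0.8, size=n)) @ Q.T


rng = np.random.default_rng(0)
n = 6
S1, S2, S3 = (random_smoother(n, rng) for _ in range(3))
V1, V2, V3 = combined_smoothers(S1, S2, S3)

r = rng.normal(size=n)  # plays the role of Z^c - X beta
f1, f2, f3 = V1 @ r, V2 @ r, V3 @ r

# At convergence each fit should reproduce its own backfitting update.
fixed_point_ok = (
    np.allclose(f1, S1 @ (r - f2 - f3))
    and np.allclose(f2, S2 @ (r - f1 - f3))
    and np.allclose(f3, S3 @ (r - f1 - f2))
)
print("backfitting fixed point satisfied:", fixed_point_ok)
```

Setting two of the smoothers to zero reduces $V_1$ to $S_1$, which matches the single-spline case discussed in the text.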
Because the estimates converge in probability by assumption, we can imagine starting the process at the true value $\eta^T$. Then, with a large sample size, the one-step estimator $\hat\beta$ is given by:
$$\hat\beta = [X^t A (I - V_1 - V_2 - V_3) X]^{-} X^t A (I - V_1 - V_2 - V_3) Z^T, \tag{2.9}$$
where $Z^T = \eta^T + A^{-1} u$, with all parameters and expressions now evaluated at their true values. Thus, following the arguments of McCullagh and Nelder (1989), the one-step estimator is a linear function of the observations, and subsequent updates should be negligible for a large sample size (asymptotically).
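Thus, the asymptotic variance is given by:
$$\operatorname{var}(\hat\beta) \approx W \operatorname{var}(Z^T) W^t, \tag{2.10}$$
where
$$W = [X^t A (I - V_1 - V_2 - V_3) X]^{-} X^t A (I - V_1 - V_2 - V_3).$$
Furthermore, we can estimate $\operatorname{var}(Z^T)$ by $A^{-1}$, and $W$ with all parameters evaluated at the estimated values.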
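The variance formula in Equation (2.10) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the toy design matrix, the choice $A = \mathrm{diag}(\mu)$ (which holds for a Poisson model with log link), and the crude polynomial "smoother" standing in for $V_1 + V_2 + V_3$ are all hypothetical and not taken from the article.

```python
import numpy as np


def gam_beta_variance(X, A, V_sum):
    """Sketch of Equation (2.10): W var(Z) W^t, with var(Z) estimated by A^{-1}.

    X     : (n, p) design matrix of the parametric part
    A     : (n, n) weight matrix; diag(mu) for a Poisson model with log link
    V_sum : (n, n) matrix standing for V1 + V2 + V3
    """
    n = X.shape[0]
    M = A @ (np.eye(n) - V_sum)                 # A (I - V1 - V2 - V3)
    W = np.linalg.pinv(X.T @ M @ X) @ X.T @ M   # pinv used as the generalized inverse
    return W @ np.linalg.inv(A) @ W.T


# Hypothetical toy inputs, only to exercise the function.
rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
mu = np.exp(1.0 + 0.1 * X[:, 1])                # stand-in fitted Poisson means
A = np.diag(mu)

# With no smooth terms the formula collapses to the usual (X^t A X)^{-1}.
var_glm = gam_beta_variance(X, A, np.zeros((n, n)))

# Crude stand-in for V1 + V2 + V3: a shrunken cubic-polynomial hat matrix in time.
t = np.linspace(-1.0, 1.0, n)
T = np.vander(t, 4)
V_sum = 0.5 * T @ np.linalg.pinv(T)
var_gam = gam_beta_variance(X, A, V_sum)

print("SE without smooth terms:", np.sqrt(np.diag(var_glm)))
print("SE with stand-in smoother:", np.sqrt(np.diag(var_gam)))
```

The standard errors are the square roots of the diagonal of the returned matrix; in practice $V_1 + V_2 + V_3$ would be built from the fitted smoothing matrices rather than from the stand-in used here.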
3. SIMULATIONS
To evaluate the performance of this variance estimator (Equation (2.10)) in finite sample sizes, we performed Monte Carlo simulations in two different sets of situations: one based on real data, the other using hypothetical data. In the first situation, we used data from our ongoing study of emergency department (ED) visits for cardiorespiratory diseases and air pollution in Atlanta (Tolbert et al. 2000). The outcome variable was either daily ED visits for all types of cardiovascular disease (CVD) or for asthma from August 1, 1998, to July 31, 1999. We analyzed data for daily nitrogen dioxide (NO2) and separately for daily particulate matter (PM10), resulting in a total of four experimental conditions based on real data (Table 1). We controlled for other ED visits, temperature, dew point, day of the week, and time using smoothing splines in a GAM model:
$$E[Y_i \mid X_i, Z_{1i}, \ldots, Z_{Ji}] = \exp\bigl(\alpha + \beta X_i + f_1(Z_{1i}) + f_2(Z_{2i}) + f_3(Z_{3i})\bigr), \quad i = 1, 2, \ldots, n, \tag{3.1}$$
where $X_i^t$ = (air pollutant on day $i$, all ED visits on day $i$, $I_{Tue}$, $I_{Wed}$, $I_{Thu}$, $I_{Fri}$, $I_{Sat}$, $I_{Sun}$); $I_{Tue}$ through $I_{Sun}$ are indicators for day $i$; $Z_{1i}$ = number of days since the start of the study; $Z_{2i}$ = the mean temperature for day $i$; and $Z_{3i}$ = the mean dew point on day $i$. To avoid ties and division by 0, we added a small random number (mean 0, variance .01) to each temperature and dew point. We then fit this GAM to the observed data, using 14 degrees of freedom for the time spline, corresponding approximately to monthly knots, and 5 each for the temperature and dew point splines. (We used 7 degrees of freedom for the CVD outcome, corresponding approximately to seasonal knots.) We then saved these model-predicted values, used them in the simulations as the expected values, and generated Poisson random variables independently for each day. We analyzed this randomly generated series by fitting the GAM model (Equation (3.1)), and saving the standard error generated by SAS and that calculated using Equation (2.10). We repeated this process 1,000 times for each outcome-air pollutant combination. We focus on $\beta_1$, the log rate ratio for the association of ED visits with the air pollutant.

Table 1. Monte Carlo Simulation Results, Expected Values Calculated from Actual ED Visits and Air Pollutants, in Atlanta

| Outcome (a) | Pollutant | True $\beta_1$ (b) | $SE(\hat\beta_1)$ (c) | Mean estimated SE, old (d) | Coverage of 95% CI, old (d) | Mean estimated SE, new (e) | Coverage of 95% CI, new (e) |
|---|---|---|---|---|---|---|---|
| CVD | PM10 | .011 | .0124 | .0111 | 90.7% | .0123 | 95.3% |
| CVD | NO2 | .006 | .0336 | .0313 | 93.4% | .0345 | 95.5% |
| Asthma | PM10 | .040 | .0219 | .0187 | 89.4% | .0226 | 95.7% |
| Asthma | NO2 | .014 | .0378 | .0314 | 90.2% | .0383 | 94.5% |

(a) CVD (Asthma): Emergency room visits for cardiovascular disease (asthma), used to determine expected daily counts.
(b) Value of $\beta_1$ used to determine the expected value of $Y$, for each set of 1,000 Monte Carlo experiments.
(c) Standard error of $\hat\beta_1$ for each set of 1,000 experiments.
(d) Mean estimated standard error, and coverage of 95% CI produced by SAS PROC GAM (3.1); 95% CI calculated as the point estimate +/- 1.96 times the estimated standard error.
(e) Mean estimated standard error, and coverage of 95% CI based on the new variance estimator; 95% CI calculated as the point estimate +/- 1.96 times the estimated standard error.
The second set of situations used was similar to that just described, except that hypothetical, simulated data replaced the observed data used above to generate the daily expected counts of ED visits. Specifically, for $i = 1, 2, \ldots, 200$ we generated (hypothetical) exposure and covariates: $\varepsilon_i$, $x_{2,i}$, $t_{2,i}$, $t_{3,i}$ as independent, standard Gaussian variables; $x_{1,i} = \varepsilon_i + x_{2,i}$ and $t_{1,i} = i$. We then defined the expected value of $Y_i$ as:
$$E[Y_i \mid x_{1i}, x_{2i}, t_{1i}, t_{2i}, t_{3i}] = \exp\Bigl(\beta_0 + \beta_1 x_1 - \beta_2 x_2 + \tfrac{1}{2}\cos(t_1/5) + \tfrac{1}{2}\sin(t_2/5) + \bigl(t_3^3 - t_3^2 + \tfrac{3}{4} t_3\bigr)/200\Bigr). \tag{3.2}$$
We chose $\beta_0$ = 3, 4, or 5, $\beta_1$ = .0 or .04, and $\beta_2$ = .1, resulting in a total of six additional experimental conditions (Table 2). We randomly generated 200 preliminary values of $Y_i$ as independent Poisson variables with mean given by Equation (3.2). So that the model would be correctly specified, we then fit a GAM with smoothing splines using six degrees of freedom each for $t_1$, $t_2$, and $t_3$ to these preliminary observations and used the model-predicted values as the expected values to generate the 200 observations $Y_i$ used for each Monte Carlo experiment. We adjusted the smoothing parameters ($\lambda_1$, $\lambda_2$, $\lambda_3$) to correspond to the six degrees of freedom that we specified for use in PROC GAM. (That is, we retained the same $x_1$, $x_2$, $t_1$, $t_2$, and $t_3$ for each experiment, but used the predicted values from the model fit to the preliminary data as the expected values for all 1,000 Monte Carlo experiments in each set. This procedure should ensure that the expected values were in the space of possible fits. Because of this process, the "true value" of $\beta_1$ for each simulation set was equal to the initial estimate, and differed slightly from 0 or .04 depending on the baseline estimates.) Using PROC GAM in SAS (2001), we estimated $\beta_1$ 1,000 times for each randomly generated series. We calculated the standard error of the estimated $\beta_1$'s, and compared this standard error with the estimated standard errors produced by the SAS program, and with the new standard error estimate in Equation (2.10).

Table 2. Monte Carlo Simulation Results, Expected Values Calculated from Hypothetical Data

| Experiment | Mean of expected $Y$ | True $\beta_1$ (a) | $SE(\hat\beta_1)$ (b) | Mean estimated SE, old (c) | Coverage of 95% CI, old (c) | Mean estimated SE, new (d) | Coverage of 95% CI, new (d) |
|---|---|---|---|---|---|---|---|
| 1 | 21.9 | .0010 | .0158 | .0123 | 87.8% | .0152 | 93.3% |
| 2 | 51.2 | -.0001 | .0096 | .0073 | 85.7% | .0096 | 95.0% |
| 3 | 157 | .0050 | .0057 | .0042 | 84.6% | .0056 | 94.7% |
| 4 | 22.3 | .049 | .0161 | .0125 | 87.6% | .0163 | 95.3% |
| 5 | 59.9 | .064 | .0098 | .0076 | 89.1% | .0099 | 95.0% |
| 6 | 165 | .065 | .0058 | .0046 | 88.6% | .0060 | 95.3% |

(a) Value of $\beta_1$ used for each set of 1,000 Monte Carlo experiments.
(b) Standard error of $\hat\beta_1$ for each set of 1,000 experiments.
(c) Mean estimated standard error, and coverage of 95% CI produced by SAS PROC GAM (3.1); 95% CI calculated as the point estimate +/- 1.96 times the estimated standard error.
(d) Mean estimated standard error, and coverage of 95% CI based on the new variance estimator; 95% CI calculated as the point estimate +/- 1.96 times the estimated standard error.
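The three summaries reported for each set of experiments (the standard error of the estimated $\beta_1$'s, the mean estimated standard error, and the coverage of the 95% confidence interval) can be computed along the following lines. This is a minimal sketch in Python; the function name and the three-run toy input are hypothetical and are not data from the study.

```python
import statistics


def summarize_monte_carlo(estimates, std_errors, true_beta, z=1.96):
    """Return (empirical SE, mean estimated SE, CI coverage) for one set of runs."""
    empirical_se = statistics.stdev(estimates)   # sample SD of the estimated betas
    mean_se = statistics.mean(std_errors)        # average of the reported SEs
    covered = sum(
        1
        for b, se in zip(estimates, std_errors)
        if b - z * se <= true_beta <= b + z * se  # interval: estimate +/- 1.96 SE
    )
    return empirical_se, mean_se, covered / len(estimates)


# Tiny hypothetical example: three runs with a common reported SE.
emp_se, mean_se, coverage = summarize_monte_carlo(
    estimates=[0.0, 0.1, 0.2],
    std_errors=[0.05, 0.05, 0.05],
    true_beta=0.1,
)
print(emp_se, mean_se, coverage)
```

An estimator whose mean estimated standard error falls well below the empirical standard error will tend to show coverage below the nominal 95%, which is the pattern the "old" columns of the tables are meant to reveal.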
In the third set of experiments (Table 3), we considered estimation with a smaller number of time points (either 100 or 50), and chose either 6, 4, or 2 degrees of freedom for the smoothing splines. Otherwise, this last set of experiments is like the second set, specifically experiment 6. In two experiments (13 and 14), we used two alternative types of error structures, the normal and the binomial distributions.
4. RESULTS
Results of the first set of Monte Carlo experiments, based on the actual observations of ED visits and air pollutants in Atlanta (situation 1), are shown in Table 1. These results illustrate several points. First, they illustrate a tendency for underestimation of the standard errors by the SAS procedure PROC GAM (3.1). For example, the "old" standard error estimate for $\beta_1$ (which relates emergency room visits to PM10) averaged about .0111, compared to .0123, the sample standard error of the estimated $\beta_1$'s. On the other hand, the new standard error estimator was about the same, on average, as the sample standard error. A similar pattern held for the other pollutant, and when asthma emergency room visits were used to determine the expected values. The second point, related to the first, is that coverage of the 95% confidence limits based on the "old" variance estimates was consistently less than the nominal value of 95%. The coverage in these experiments was as low as 89% in one instance. In contrast, the coverage of confidence intervals based on the new variance estimator was close to the nominal level.
Results of the second set of Monte Carlo experiments, based on hypothetical data (situation 2), are shown in Table 2. These results further support these same patterns. Specifically, the old standard error estimates are consistently lower than the sample standard error of the estimated $\beta$'s, and the coverage of the associated confidence intervals was consistently lower than the nominal 95% level.

Table 3. Monte Carlo Simulation Results, Expected Values Calculated from Hypothetical Data

| Experiment / Distribution | N, df | True $\beta_1$ (a) | $SE(\hat\beta_1)$ (b) | Mean estimated SE, old (c) | Coverage of 95% CI, old (c) | Mean estimated SE, new (d) | Coverage of 95% CI, new (d) |
|---|---|---|---|---|---|---|---|
| 8 / Poisson | 100, 6 | .031 | .0100 | .0076 | 89.4% | .0100 | 95.1% |
| 9 / Poisson | 100, 4 | .044 | .0083 | .0072 | 82.4% | .0085 | 93.4% |
| 10 / Poisson | 50, 6 | .031 | .0134 | .0072 | 71.6% | .0133 | 94.4% |
| 11 / Poisson | 50, 4 | .056 | .0121 | .0111 | 92.2% | .0119 | 94.8% |
| 12 / Poisson | 50, 2 | .076 | .0119 | .0112 | 93.3% | .0119 | 94.9% |
| 13 / Normal | 100, 2 | -.038 | .0920 | .0904 | 94.7% | .0913 | 94.9% |
| 14 / Binomial | 100, 4 | .037 | .0116 | .0088 | 79.9% | .0113 | 94.4% |

Each experiment is like Experiment 6 in Table 2, except for: the error distribution, the number of time points (N), and the degrees of freedom (df).
(a) Value of $\beta_1$ used for each set of 1,000 Monte Carlo experiments.
(b) Standard error of $\hat\beta_1$ for each set of 1,000 experiments.
(c) Mean estimated standard error, and coverage of 95% CI produced by SAS PROC GAM (3.1); 95% CI calculated as the point estimate +/- 1.96 times the estimated standard error.
(d) Mean estimated standard error, and coverage of 95% CI based on the new variance estimator; 95% CI calculated as the point estimate +/- 1.96 times the estimated standard error.

In the third set of Monte Carlo experiments, the new estimator of the standard error yielded results close to the simulated standard deviation of the estimated $\beta$ when we reduced the number of time points from 200, as in earlier experiments, to either 100 or 50 in these. In a few experiments with the lower number of time points, and if we used more than 2–4 degrees of freedom for each spline, the average $\hat\beta$ differed slightly, but significantly, from the true $\beta$ (data not shown).
5. EXAMPLE: APPLICATION OF NEW METHOD
We applied the estimator to data from an ongoing study of air pollution and ED visits for cardiorespiratory diseases in Atlanta (Tolbert et al. 2000); the data used here are from August 1, 1998, to July 31, 1999. Our rationale for using GAMs reflects their inherent appeal, due in part to the semiparametric nature of the time dependency and consequent relaxation of assumptions, and the frequent use of these models in the air pollution literature. We calculated the standard error for parameter estimates from a semiparametric generalized additive model, with the Poisson distribution and log link, executed using PROC GAM in SAS, and compared this estimate to the standard error estimated with our new method. We chose the degrees of freedom for the splines to be similar to those we used in parametric Poisson regression; use of generalized cross-validation, the default approach in SAS (SAS Institute 2001), suggested slightly fewer degrees of freedom, but led to the same results. We evaluated the association between nitrogen dioxide (NO2) and ER visits for all CVD, using the three-day moving average of NO2, in part due to a priori interest, and controlling for time, for mean temperature, and for dew point using cubic splines with 7, 7, and 5 degrees of freedom, respectively. We also controlled for day of the week using indicator variables and the number of emergency room visits for noncardiovascular disease. To simplify calculations by avoiding ties, we added a small random number to the temperature and dew point (which did not change the estimate of the parameter or its standard error). We found little evidence of autocorrelation of residuals (Durbin-Watson = 2.155, p = .40 by simulation). In the model for CVD visits, the parameter estimate for NO2 was .020 (rate ratio = 1.020) and the standard error estimated in PROC GAM was .018. The standard error estimated using the new estimator was .020, about 10% larger than that obtained using SAS. This difference is important for evaluating the stability of results and for interval estimation, and would affect any meta-analysis that used this result.
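To make the practical effect of the two standard errors concrete, the short Python sketch below converts the reported estimate (.020) and the two standard errors (.018 and .020) into 95% Wald intervals on the rate-ratio scale. The helper name is hypothetical, and the intervals are a simple illustration of the arithmetic rather than results reported in the article.

```python
import math

beta_hat = 0.020                 # log rate ratio for NO2 reported in the text
se_old, se_new = 0.018, 0.020    # PROC GAM standard error and new estimator


def rate_ratio_ci(beta, se, z=1.96):
    """95% Wald interval, exponentiated to the rate-ratio scale."""
    return math.exp(beta - z * se), math.exp(beta + z * se)


se_ratio = se_new / se_old
ci_old = rate_ratio_ci(beta_hat, se_old)
ci_new = rate_ratio_ci(beta_hat, se_new)

print(f"rate ratio: {math.exp(beta_hat):.3f}")
print(f"SE ratio (new/old): {se_ratio:.2f}")
print(f"95% CI with old SE: {ci_old[0]:.3f} to {ci_old[1]:.3f}")
print(f"95% CI with new SE: {ci_new[0]:.3f} to {ci_new[1]:.3f}")
```

The interval based on the new standard error is wider on both sides, which is the sense in which the difference matters for interval estimation and for any meta-analysis that weights studies by their standard errors.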
6. DISCUSSION
Our results provide evidence that the variance estimator in Equation (2.10) works well with finite samples, at least for the situations considered. More work, however, needs to be done to verify that its performance remains good under other conditions. Our results also further support and are consistent with the work of Klein et al. (2002) and of Ramsay et al. (2003), who showed that the variance estimation procedures used in commercially available programs could be inadequate. These tendencies of commercial software to underestimate variances have been attributed to concurvity in the data (Ramsay et al. 2003), and to inadequate linear approximations used for the smooth functions (Dominici, McDermott, and Hastie 2003). Recognition of these and other problems has motivated reanalyses of at least 20 studies of air pollution and health effects, with the overall conclusion that use of GAMs, implemented with the faulty variance estimator, was associated with smaller standard errors than use of generalized linear models (Health Effects Institute 2003).
We have presented and evaluated a variance estimator for up to three splines, extending the previous work of Hastie and Tibshirani (1990), who presented explicit results for a single spline, and of Flanders, Klein, and Tolbert (2003). Implementation of this approach for more splines is straightforward, for example by using the recursive equations given, but as the number of splines increases, of course, computations become more and more onerous. Our result also applies directly to the case of only one or two splines by simply taking $S_1$ and/or $S_2$ equal to 0.
Our arguments depend heavily on the assumption of consistency of $\beta$ and $\eta$, and convergence of the backfitting algorithm as argued by Hastie and Tibshirani (1990). We have not investigated performance of the variance estimator when that assumption might fail. Conditions other than those noted by Hastie and Tibshirani (1990) may also lead to consistency. In particular, we might also expect consistency if the number of (say, time) points remains fixed, but the expected mean increases for each point, other parameters remain constant, and the model is correctly specified with judicious choice of degrees of freedom for the splines. More work on consistency, not the focus of this article, remains. For example, the application to time series should probably be based on further specification of assumptions because, as the number of time points increases, the complexity and in general the number of parameters in the underlying model could potentially increase proportionately. Yet additional work could allow for potential serial autocorrelation, although our example did not suggest important residual autocorrelation. We also note that after development of the expression for variance (Flanders, Klein, and Tolbert 2002; Flanders et al. 2003, equation 10), we became aware that another estimator was in the process of being developed to correct the errors in the commercially available software (Dominici, McDermott, and Hastie 2002). Comparison now shows our formulation to be equivalent to theirs (Dominici, McDermott, and Hastie 2003): one simply substitutes $(I - S)$, with $S$ a combined smoothing matrix, in place of $(I - V_1 - V_2 - V_3)$ in the expression for $W$ in Equation (2.10). Our independent derivation, the results of our empirical evaluations, and our example show that the commercially available estimates can be too small, and that the alternative estimator has good finite sample properties, at least in the situations considered. In particular, our work provides empirical evidence, complementing the work of Dominici et al. (2003), that use of Equation (2.10) for analysis of real data can lead to confidence limits with appropriate coverage properties, again at least for the situations considered.
Some investigators in air pollution epidemiology have so far avoided use of GAMs as the primary method of analysis because of concerns about the variance estimator (Klein et al. 2002), choosing instead to use Poisson regression models with splines for time with many knots, chosen in part based on a priori considerations. The Monte Carlo experiments suggest, in agreement with simulations of Klein et al. (2002) and of Ramsay et al. (2003), that the standard error provided by commercially available software can have substantial error. The estimator evaluated here performed nicely with finite samples in simulations completed so far. Importantly, as illustrated in our simulations, the correction has a substantial effect on the estimated standard errors and confidence intervals when applied to data based on an ongoing study in Atlanta (Tolbert et al. 2000), and on hypothetical data. This underestimation is further illustrated in the example, using data from our ongoing study of air pollution in Atlanta. The simulations suggest that, at least for situations like those considered here, the assumptions may be adequately achieved and that the new estimator can perform well in such real situations. As noted by Lumley and Sheppard (2003), major challenges in air pollution epidemiology remain, particularly including model selection in the face of measurement error and confounding.
APPENDIX
Equations for $\hat\beta$ and $\operatorname{var}(\hat\beta)$ in terms of the individual smoothing matrices, like Equations (2.8) and (2.10) but applying with any number of splines, can be obtained recursively as follows. Start with expressions for $\hat\beta$ (such as Equation (2.8)) and $\operatorname{var}(\hat\beta)$ (i.e., Equation (2.10)) that apply with $J - 1$ splines, in terms of smoothing matrices $S_1, S_2, \ldots, S_{J-1}$, and matrix $A$ (the $n \times n$ matrix $-\partial^2 l / \partial\eta\,\partial\eta^t$, as in Equation (2.8)). Expressions for $\hat\beta$ and $\operatorname{var}(\hat\beta)$ that apply with one additional spline (say, $S_J$) are obtained by substituting $(I - S_i S_J)^{-} S_i (I - S_J)$ in place of $S_i$ for $i = 1, 2, \ldots, J - 1$, and $A(I - S_J)$ for $A$ throughout.
This recursive approach is justified by writing the system of $J + 1$ linear equations in unknowns $\hat\beta$ and $\hat f_1, \ldots, \hat f_J$, comparable to Equations (2.4)–(2.7); eliminating $\hat f_J$; and rearranging to obtain a reduced system of $J$ equations in $J$ unknowns. The new system of equations has the same form as the original, provided we identify $(I - S_i S_J)^{-} S_i (I - S_J)$ with $S_i$ for $i = 1, 2, \ldots, J - 1$, and $A(I - S_J)$ with $A$ throughout.
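The recursion can be sketched numerically as follows. The Python code builds the matrix that multiplies $X$ and $Z^c$ in the update for $\hat\beta$ by repeatedly eliminating the last smoother, and compares it with a direct solution of the stationary backfitting system. All function names, the use of NumPy's pseudo-inverse as the generalized inverse, and the random stand-in smoothers and weights are assumptions for illustration only.

```python
import numpy as np


def weighted_residual_operator(A, smoothers):
    """Return A (I - V_1 - ... - V_J), built by eliminating one smoother at a time."""
    n = A.shape[0]
    I = np.eye(n)
    smoothers = list(smoothers)
    if len(smoothers) == 1:
        return A @ (I - smoothers[0])            # single-spline case
    S_J = smoothers[-1]
    # Substitute (I - S_i S_J)^- S_i (I - S_J) for S_i, and A (I - S_J) for A.
    reduced = [
        np.linalg.pinv(I - S_i @ S_J) @ S_i @ (I - S_J) for S_i in smoothers[:-1]
    ]
    return weighted_residual_operator(A @ (I - S_J), reduced)


def total_smoother_direct(smoothers):
    """Solve f_j = S_j (r - sum of the other fits) directly; return V_1 + ... + V_J."""
    J, n = len(smoothers), smoothers[0].shape[0]
    P = np.eye(J * n)
    for j, S in enumerate(smoothers):
        for k in range(J):
            if k != j:
                P[j * n:(j + 1) * n, k * n:(k + 1) * n] = S
    F = np.linalg.solve(P, np.vstack(smoothers))  # stacked V_1, ..., V_J
    return sum(F[j * n:(j + 1) * n] for j in range(J))


def random_smoother(n, rng):
    """Hypothetical stand-in smoother: symmetric with eigenvalues in [0, 0.8]."""
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return Q @ np.diag(rng.uniform(0.0, 0.8, size=n)) @ Q.T


rng = np.random.default_rng(2)
n = 5
smoothers = [random_smoother(n, rng) for _ in range(3)]
A = np.diag(rng.uniform(0.5, 2.0, size=n))       # stand-in positive weights

M_recursive = weighted_residual_operator(A, smoothers)
M_direct = A @ (np.eye(n) - total_smoother_direct(smoothers))
print("recursion matches direct solve:", np.allclose(M_recursive, M_direct))
```

Given such a matrix $M$, the update for $\hat\beta$ takes the form $[X^t M X]^{-} X^t M Z^c$, so the same function could in principle serve for any number of splines, at a computational cost that grows with each added smoother.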
ACKNOWLEDGMENTS
This work was supported by grants from the U.S. Environmental Protection Agency (R82921301-0) and from the National Institute of Environmental Health Sciences (R01ES11294).
[Received April 2004. Revised November 2004.]
REFERENCES
Borja-Aburta, V. H., Castillejos, M., Gold, D. R., Bierzwinski, S., and Loomis, D. (1998), "Mortality and Ambient Fine Particles in Southwest Mexico City, 1993–1995," Environmental Health Perspectives, 106, 849–855.
Buja, A., Hastie, T., and Tibshirani, R. (1989), "Linear Smoothers and Additive Models," The Annals of Statistics, 17, 453–510.
Burnett, R. T., Smith-Doiron, M., Stieb, D., Cakmak, S., and Brook, J. (1999), "Effects of Particulate and Gaseous Air Pollution on Cardiorespiratory Hospitalizations," Archives of Environmental Health, 54, 130–139.
Conceicao, G. M. S., Miraglia, S. G. E. K., Kishi, H. S., Saldiva, P. N. H., and Singer, J. M. (2001), "Air Pollution and Child Mortality: A Time-Series Study in Sao Paolo, Brazil," Environmental Health Perspectives, 109, 347–350.
Dominici, F., McDermott, A., Zeger, S. L., and Samet, J. M. (2002), "On the Use of Generalized Additive Models in Time-Series Studies of Air Pollution and Health," American Journal of Epidemiology, 156, 193–203.
Dominici, F., McDermott, A., and Hastie, T. (2002), "Semiparametric Regression in Time Series Analyses of Air Pollution and Mortality: Generalized Additive and Generalized Linear Models," Presentation on Variance of GAM Estimators, Environmental Protection Agency Workshop on GAM-Related Statistical Issues in PM Epidemiology, November 4–6, 2002, Durham, NC.
Dominici, F., McDermott, A., and Hastie, T. (2003), "Improved Semi-Parametric Time Series Models of Air Pollution and Mortality" [on-line], http://www.biostat.jhsph.edu/~fdominic/jasa.R2.pdf.
Flanders, W. D., Klein, M., and Tolbert, P. (2002), "A New Variance Estimator for Parameters of Semi-parametric Generalized Additive Models. A Report to the U.S. Environmental Protection Agency," Based on a Presentation at the Environmental Protection Agency Workshop on GAM-Related Statistical Issues in PM Epidemiology, November 4–6, 2002, Durham, NC.
Flanders, W. D., Klein, M., and Tolbert, P. (2003), "A New Variance Estimator for Parameters of Semi-parametric Generalized Additive Models," Technical Report, Rollins School of Public Health, Emory University, Department of Biostatistics, Atlanta, GA.
Hastie, T. J., and Tibshirani, R. J. (1990), Generalized Additive Models, Monographs on Statistics and Applied Probability 43, New York: Chapman & Hall.
Health Effects Institute (2003), "Revised Analyses of Time Series Studies of Air Pollution and Health," Special Report, Boston, MA: Health Effects Institute.
Katsouyanni, K., Touloumi, G., Samoli, E., Gryparis, A., Monopolis, Y., LeTertre, A., Boumghar, A., Rossi, G., Zmirou, D., Ballester, F., Anderson, H. R., Wojtyniak, B., Paldy, A., Braunstein, R., Pekkanen, J., Schindler, C., and Schwartz, J. (2002), "Different Convergence Parameters Applied to the S-Plus GAM Function," Epidemiology, 13, 742–743.
Klein, M., Flanders, W. D., and Tolbert, P. E. (2002), "Variances may be Underestimated Using Available Software for Generalized Additive Models," American Journal of Epidemiology, 155, s106.
Lumley, T., and Sheppard, L. (2003), "Time Series Analyses of Air Pollution and Health: Straining at Gnats and Swallowing Camels," Epidemiology, 14, 13–14.
McCullagh, P., and Nelder, J. A. (1989), Generalized Linear Models, New York: Chapman and Hall, pp. 327–329.
Michelozzi, P., Forastiere, F., Fusco, D., Perucci, C. A., Ostro, B., Ancona, C., and Palotti, G. (1998), "Air Pollution and Daily Mortality in Rome, Italy," Occupational and Environmental Medicine, 44, 605–610.
Moolgavkar, S. (2000), "Air Pollution and Hospital Admissions for Diseases of the Circulatory System in Three U.S. Metropolitan Areas," Journal of Air Waste Management Association, 50, 1199–1206.
Pope, C. A., Hill, R. W., and Villegas, G. M. (1999), "Particulate Air Pollution and Daily Mortality on Utah's Wasatch Front," Environmental Health Perspectives, 107, 567–573.
Ramsay, T., Burnett, R., and Krewski, D. (2003), "The Effect of Concurvity in Generalized Additive Models Linking Mortality to Ambient Air Pollution," Epidemiology, 14, 18–23.
Samet, J. M., Dominici, F., Curriero, F., Coursac, I., and Zeger, S. L. (2000), "Fine Particulate Air Pollution and Mortality in 20 U.S. Cities: 1987–1994," New England Journal of Medicine, 343, 1742–1757.
SAS Institute (2001), The SAS System for Windows, Release 8.02, TS Level 02M0, Cary, NC: SAS Institute.
Schwartz, J. (1994a), "The Use of Generalized Additive Models in Epidemiology," XVIIth International Biometric Conference, Hamilton, Ontario, Canada, August 8–12, 1994. Proceedings, Volume 1: Invited Papers.
Schwartz, J. (1994b), "Air Pollution and Hospital Admissions for the Elderly in Birmingham, Alabama," American Journal of Epidemiology, 139, 589–598.
Tolbert, P. E., Klein, M., Metzger, K. B., Peel, J., Flanders, W. D., Todd, K., Mulholland, J. A., Ryan, P. B., and Frumkin, H. (2000), "Interim Results of the Study of Particulates and Health in Atlanta (SOPHIA)," Journal of Exposure Analysis and Environmental Epidemiology, 20, 446–460.