A new variance estimator for parameters of semiparametric generalized additive models



W. Dana FLANDERS, Mitch KLEIN, and Paige TOLBERT

Generalized additive models (GAMs) have become popular in the air pollution epidemiology literature. Two recently surfaced problems concern implementation of these semiparametric models. The first problem, easily corrected, was laxity of the default convergence criteria. The other, noted independently by Klein, Flanders, and Tolbert, and by Ramsay, Burnett, and Krewski, concerned variance estimates produced by commercially available software. In simulations, they were as much as 50% too small. We derive an expression for a variance estimator for the parametric component of generalized additive models that can include up to three smoothing splines, and show how the standard error (SE) [...]. Using Monte Carlo experiments, we evaluated performance of the estimator in finite samples. The estimator performed well in Monte Carlo experiments, in the situations considered. However, [...]. Using data from our study of air pollution and cardiovascular disease, the standard error estimated using the new method was about 10% to 20% larger than the biased, commercially available standard error estimate.

Key Words: Epidemiologic methods; Generalized additive models; Semiparametric models; Variance.

1. INTRODUCTION

Generalized additive models (GAMs), a relatively new approach to nonparametric or semiparametric smoothing and data analysis (Hastie and Tibshirani 1990), have become widely used, particularly in time series analyses of acute health effects of air pollution. In semiparametric models, the focus of this article, the mean of the dependent variable is modeled as a parametric, linear function of some predictors plus a sum of functions of other predictors, which in some applications may be confounders or nuisance factors. The form of the function used for these other predictors is quite general, hence the term semiparametric.

W. Dana Flanders is Professor, Rollins School of Public Health, Department of Epidemiology, Emory University, 1518 Clifton Road, Atlanta, GA 30327 (E-mail: flanders@sph.emory.edu). Mitch Klein is Assistant Professor, and Paige Tolbert is Associate Professor, Rollins School of Public Health, Department of Epidemiology and Department of Environmental and Occupational Health, Emory University, Rollins School of Public Health, 1518 Clifton Road, Atlanta, GA 30327.

© 2005 American Statistical Association and the International Biometric Society

Journal of Agricultural, Biological, and Environmental Statistics, Volume 10, Number 2, Pages 246–257. DOI: 10.1198/108571105X47010


Schwartz proposed application of GAMs to time series studies assessing the association of air pollution with mortality or other outcome measures in 1994 (Schwartz 1994a), and initially presented GAM models as a sensitivity analysis augmenting a parametric approach (Schwartz 1994b). In the intervening years, GAMs have gained widespread popularity for use in these types of time series studies (e.g., Borja-Aburta et al. 1998; Michelozzi et al. 1998; Burnett et al. 1999; Conceicao et al. 2001; Moolgavkar 2000; Pope, Hill, and Villegas 1999; Samet et al. 2000).

GAMs can generally be fit using S-Plus or using PROC GAM in SAS (SAS Institute 2001). As discussed in the following, the models can be fit using a backfitting algorithm. Hastie and Tibshirani (1990) discussed conditions that assure convergence of this approach. Two problems have recently surfaced, however, concerning implementation of these models. The first problem, easily corrected, was that the default convergence criteria were not adequately strict (Dominici, McDermott, Zeger, and Samet 2002; Katsouyanni et al. 2002). The other problem concerns the variance estimates produced by these programs for the parametric component of the semiparametric GAMs. The problem was noted independently by Klein, Flanders, and Tolbert (2002) and by Ramsay, Burnett, and Krewski (2003). They showed that the variance estimates could be as much as 50% lower than the simulated variance in some of the situations considered. This article addresses the second problem by deriving a relatively easily implementable variance estimator for these models.

One of the specific problems that motivated this work is the large number of published or ongoing studies of the associations between health outcomes, such as respiratory disease, and air pollution (e.g., Borja-Aburta et al. 1998; Michelozzi et al. 1998; Burnett et al. 1999; Conceicao et al. 2001; Moolgavkar 2000; Pope, Hill, and Villegas 1999; Samet et al. 2000; Tolbert et al. 2000). Some of the studies published by others have used generalized additive models to assess the association between air pollution and disease, but an appropriate variance estimator has been unavailable.

The purpose of this article is three-fold. First, we present an asymptotic variance estimator for the parametric component of GAM semiparametric models, providing an explicit formulation for up to three splines. Second, we empirically evaluate the performance of this estimator in finite samples using Monte Carlo simulations and base these simulations on actual data from an ongoing study of air pollution. Finally, we apply the estimator to an ongoing study of air pollution and emergency department visits. We illustrate that, in this study, the variance we estimate differs from the corresponding estimates produced by commercially available software.

2. METHODS

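In the semiparametric situations of interest here, the generalized additive model is given by:

$$E(Y_i \mid X_i, Z_{1i}, \ldots, Z_{Ji}) = g^{-1}(\eta_i) = g^{-1}\bigl(\alpha + \beta X_i + f_1(Z_{1i}) + \cdots + f_J(Z_{Ji})\bigr), \quad i = 1, 2, \ldots, n, \tag{2.1}$$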


where $Y_i$ is the number of events for the $i$th observation; $g$ is a strictly monotone link function; $\eta_i = \alpha + \beta X_i + f_1(Z_{1i}) + \cdots + f_J(Z_{Ji})$; $\beta$ is the $(p \times 1)$ parameter of interest to be estimated; $X_i$ is a $(1 \times p)$ vector of predictors; $Z_{ji}$ is the value of the $j$th covariate for the $i$th observation; and $f_j(Z_{ji})$ is an arbitrary (smoothing) function with continuous second derivatives, for $j = 1$ to $J$. (Here, we limit consideration to $J \le 3$, but results should extend in an analogous way for $J > 3$.) We assume here that the $Y_i$, given $X_i$ and $Z_{1i}, \ldots, Z_{Ji}$, are independent with a Poisson distribution whose mean is given by Equation (2.1). The Poisson distribution is typically used in applications in air pollution epidemiology. However, results hold with obvious modifications for other distributions in the exponential family (Hastie and Tibshirani 1990).

We derive an explicit expression for the variance estimator for a class of estimators of $\beta$ in the model given by Equation (2.1), estimated by penalized likelihood (Hastie and Tibshirani 1990). That is, one maximizes

$$j(\beta, f) = l(\eta; Y) - \frac{1}{2}\sum_{j=1}^{3} \lambda_j \int \{f_j''(z)\}^2 \, dz, \tag{2.2}$$

over $\eta$ and over $f_j$ in the class of functions with continuous second derivatives $f_j''$. Here, the $\lambda_j$ are smoothing parameters which must be specified, or estimated from the data (Hastie and Tibshirani 1990). The functions that maximize Equation (2.2) are cubic smoothing splines (Hastie and Tibshirani 1990). An equivalent problem (Hastie and Tibshirani 1990) is to maximize

$$l(\eta; Y) - \frac{1}{2}\sum_{j=1}^{3} \lambda_j f_j^t K_j f_j, \tag{2.3}$$

where $K_j$ are the $n \times n$ quadratic penalty matrices given, for example, by Buja, Hastie, and Tibshirani (1989) for $j = 1, 2, 3$; $f_j$ are the $n \times 1$ vectors $f_j(Z_{ji})$, $i = 1, 2, \ldots, n$, $j = 1, 2, 3$; and the superscript "t" denotes transpose. Using $[A]^{-}$ to denote the generalized inverse, the model can be fit using a local scoring procedure that incorporates the weighted smoothing matrices $S_j = (A + K_j)^{-}A$ in a backfitting algorithm as described by Hastie and Tibshirani (1990). We assume sufficient regularity and choice of smoothing parameters so that the local scoring procedure and the backfitting algorithm that it includes converge in probability as the sample size increases. In particular, $\hat\beta_i \to \beta_0$ for each $i$, and $\hat f_{1,i} \to f_{1,0}$, $\hat f_{2,i} \to f_{2,0}$, and $\hat f_{3,i} \to f_{3,0}$, in probability, where $\hat\beta_{i-1}$, $\hat f_{1,i}$, $\hat f_{2,i}$, $\hat f_{3,i}$ are the parameter estimates at step $i$; and $\beta_0$, $f_{1,0}$, $f_{2,0}$, and $f_{3,0}$ are the corresponding true values.

Our approach to estimating the large sample variance of $\hat\beta$ is straightforward: we find a large sample, approximate, linear expression for $\hat\beta$ in terms of $Y$, $E(Y)$, and known (or consistently estimable) functions.


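By arguments presented by Hastie and Tibshirani (1990), the one-step updates for the Newton-Raphson step of the fitting algorithm at the $i$th step are given by:

$$X\hat\beta_i = X\bigl[X^t A (I - S_2) X\bigr]^{-1} X^t A \bigl(Z^c - \hat f_{1,i-1} - \hat f_{2,i-1} - \hat f_{3,i-1}\bigr), \tag{2.4}$$

$$\hat f_{1,i} = S_1\bigl(Z^c - X\hat\beta - \hat f_{2,i-1} - \hat f_{3,i-1}\bigr), \tag{2.5}$$

$$\hat f_{2,i} = S_2\bigl(Z^c - X\hat\beta - \hat f_{1,i-1} - \hat f_{3,i-1}\bigr), \tag{2.6}$$

and

$$\hat f_{3,i} = S_3\bigl(Z^c - X\hat\beta - \hat f_{1,i-1} - \hat f_{2,i-1}\bigr), \tag{2.7}$$

where $A$ is the $n \times n$ matrix $-\partial^2 l / \partial\eta\,\partial\eta^t$; $l$ is the Poisson log-likelihood; $Z^c$ is the $n \times 1$ vector of linearized dependent variables $Z^c = \eta^c + A^{-1}u$; and where $u$ is the $n \times 1$ vector $\partial l / \partial\eta^c$, all evaluated using the current estimates of $\beta$, $f_1$, $f_2$, $f_3$, and $\eta$.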
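The quantities that feed the updates above can be made concrete with a small Python sketch for the Poisson case with log link. The closed forms used here ($u = y - \mu$ and $A = \operatorname{diag}(\mu)$ with $\mu = e^{\eta}$) follow from the Poisson log-likelihood and are stated as an assumption about the intended setup; the function name `working_response` and the toy numbers are hypothetical.

```python
import numpy as np

def working_response(y, eta):
    """Return (A, u, z_c) for a Poisson log-likelihood with log link.

    Up to a constant, l(eta; y) = sum(y * eta - exp(eta)), so
    u = dl/d(eta) = y - mu and A = -d2l/d(eta)d(eta)^t = diag(mu).
    """
    mu = np.exp(eta)
    u = y - mu
    A = np.diag(mu)
    z_c = eta + u / mu  # eta + A^{-1} u, using that A is diagonal
    return A, u, z_c

# Toy counts and current linear predictor (illustrative values only)
y = np.array([3.0, 0.0, 7.0, 2.0])
eta = np.log(np.array([2.5, 1.0, 6.0, 2.0]))
A, u, z_c = working_response(y, eta)
```

Because $A$ is diagonal in this case, the linearized response reduces to an elementwise expression, which is why no matrix inverse is formed.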

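Equations (2.4)–(2.7) are a system of four equations in four unknowns (vectors). A straightforward, though tedious, derivation yields the following closed form estimate for the Newton-Raphson update of $\hat\beta_i$:

$$\hat\beta_i = \bigl[X^t A (I - V_1 - V_2 - V_3) X\bigr]^{-} X^t A (I - V_1 - V_2 - V_3) Z^c,$$

where

$$\begin{aligned}
V_1 &= \bigl[I - S_1 S_3 - S_1 (I - S_3)(I - S_2 S_3)^{-} S_2 (I - S_3)\bigr]^{-} S_1 \bigl[I - S_3 - (I - S_3)(I - S_2 S_3)^{-} S_2 (I - S_3)\bigr], \\
V_2 &= \bigl[I - S_2 S_1 - S_2 (I - S_1)(I - S_3 S_1)^{-} S_3 (I - S_1)\bigr]^{-} S_2 \bigl[I - S_1 - (I - S_1)(I - S_3 S_1)^{-} S_3 (I - S_1)\bigr], \\
V_3 &= \bigl[I - S_3 S_2 - S_3 (I - S_2)(I - S_1 S_2)^{-} S_1 (I - S_2)\bigr]^{-} S_3 \bigl[I - S_2 - (I - S_2)(I - S_1 S_2)^{-} S_1 (I - S_2)\bigr],
\end{aligned} \tag{2.8}$$

and $Z^c$ is expressed in terms of $\hat\beta_{i-1}$, $\hat f_{1,i}$, $\hat f_{2,i}$, $\hat f_{3,i}$, and $\eta$.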

Equation (2.8) is an explicit form for multiple smoothing splines (in terms of the individual smoothers $S_1$, $S_2$, and $S_3$) of the result given by Hastie and Tibshirani (1990), which has the form $\hat\beta = [X^t A (I - S) X]^{-} (X^t A (I - S)) Z^c$. By taking $S = V_1 + V_2 + V_3$, so that $I - S = I - V_1 - V_2 - V_3$, one sees that the one-step update for $\beta$ in Equation (2.8) is consistent with Hastie and Tibshirani (1990), who gave explicit results for a single spline, but noted that the same form of the equation would hold for multiple splines. An explicit form for multiple smoothing splines (in terms of the individual smoothers) can be obtained by applying the recursive equation described in the Appendix.

Because the estimates converge in probability by assumption, we can imagine starting the process at the true value $\eta_t$. Then, with a large sample size, the one-step estimator $\hat\beta$ is given by:

$$\hat\beta = \bigl[X^t A (I - V_1 - V_2 - V_3) X\bigr]^{-} X^t A (I - V_1 - V_2 - V_3) Z_t, \tag{2.9}$$

where $Z_t = \eta_t + A^{-1}u$, with all parameters and expressions now evaluated at their true values. Thus, following the arguments of McCullagh and Nelder (1989), the one-step estimator is a linear function of the observations, and subsequent updates should be negligible for a large sample size (asymptotically).

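Thus, the asymptotic variance is given by:

$$\operatorname{var}(\hat\beta) \approx W \operatorname{var}(Z_t)\, W^t, \tag{2.10}$$

where

$$W = \bigl[X^t A (I - V_1 - V_2 - V_3) X\bigr]^{-} X^t A (I - V_1 - V_2 - V_3).$$

Furthermore, we can estimate $\operatorname{var}(Z_t)$ by $A^{-1}$, and $W$ with all parameters evaluated at the estimated values.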
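The following Python sketch assembles Equations (2.8) and (2.10) numerically. It is an illustration under stated assumptions rather than the authors' implementation: the penalty matrices are a toy second-difference construction with an added ridge term (chosen here so that every inverse exists, which is not how $K_j$ is defined in the paper), the fitted means `mu` are made up, and the helper names `penalty`, `smoother`, `v_matrix`, and `variance_beta` are hypothetical.

```python
import numpy as np

def penalty(z, lam, ridge=0.5):
    """Toy penalty for covariate z (an assumption, not the paper's K_j):
    lam * (D'D + ridge * I) in the sort order of z, D = second differences."""
    n = len(z)
    D = np.diff(np.eye(n), n=2, axis=0)
    K_sorted = lam * (D.T @ D + ridge * np.eye(n))
    order = np.argsort(z)
    K = np.zeros((n, n))
    K[np.ix_(order, order)] = K_sorted  # map back to observation order
    return K

def smoother(A, K):
    """Weighted smoothing matrix S = (A + K)^- A."""
    return np.linalg.pinv(A + K) @ A

def v_matrix(Sa, Sb, Sc):
    """One line of Equation (2.8): V for smoother Sa given the other two.
    V1 = v_matrix(S1, S2, S3), V2 = v_matrix(S2, S3, S1), V3 = v_matrix(S3, S1, S2)."""
    I = np.eye(Sa.shape[0])
    B = np.linalg.pinv(I - Sb @ Sc) @ Sb @ (I - Sc)
    left = I - Sa @ Sc - Sa @ (I - Sc) @ B
    right = I - Sc - (I - Sc) @ B
    return np.linalg.pinv(left) @ Sa @ right

def variance_beta(X, A, S1, S2, S3):
    """Equation (2.10) with var(Z) estimated by A^{-1}; returns (cov, W)."""
    I = np.eye(A.shape[0])
    V = v_matrix(S1, S2, S3) + v_matrix(S2, S3, S1) + v_matrix(S3, S1, S2)
    XtAM = X.T @ A @ (I - V)
    W = np.linalg.pinv(XtAM @ X) @ XtAM
    return W @ np.linalg.inv(A) @ W.T, W

# Hypothetical inputs: two parametric predictors, three smoothed covariates
rng = np.random.default_rng(0)
n = 40
X = rng.normal(size=(n, 2))
Z = rng.normal(size=(n, 3))
mu = np.exp(1.0 + 0.1 * X[:, 0])      # stand-in for fitted Poisson means
A = np.diag(mu)                        # Poisson, log link
S1, S2, S3 = (smoother(A, penalty(Z[:, j], lam=2.0)) for j in range(3))
cov_beta, W = variance_beta(X, A, S1, S2, S3)
se_beta = np.sqrt(np.diag(cov_beta))
```

In an actual analysis, `A` and the smoothers would come from the converged fit, and the cyclic pattern of arguments to `v_matrix` mirrors the way the three lines of Equation (2.8) permute the indices 1, 2, 3.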

3. SIMULATIONS

To evaluate the performance of this variance estimator (Equation (2.10)) in finite sample sizes, we performed Monte Carlo simulations in two different sets of situations: one based on real data, the other using hypothetical data. In the first situation, we used data from our ongoing study of emergency department (ED) visits for cardiorespiratory diseases and air pollution in Atlanta (Tolbert et al. 2000). The outcome variable was either daily ED visits for all types of cardiovascular disease (CVD) or for asthma from August 1, 1998, to July 31, 1999. We analyzed data for daily nitrogen dioxide (NO2) and separately for daily particulate matter (PM10), resulting in a total of four experimental conditions based on real data (Table 1). We controlled for other ED visits, temperature, dew point, day of the week, and time using smoothing splines in a GAM model:

$$E(Y_i \mid X_i, Z_{1i}, \ldots, Z_{Ji}) = \exp\bigl(\alpha + \beta X_i + f_1(Z_{1i}) + f_2(Z_{2i}) + f_3(Z_{3i})\bigr), \quad \text{for } i = 1, 2, \ldots, n, \tag{3.1}$$

where $X_i^t$ = (air pollutant on day $i$, all ED visits on day $i$, $I_{Tue}$, $I_{Wed}$, $I_{Thu}$, $I_{Fri}$, $I_{Sat}$, $I_{Sun}$), $I_{Tue}$ to $I_{Sun}$ are indicators for day $i$; $Z_{1i}$ = number of days since the start of the study, $Z_{2i}$ = the mean temperature for day $i$, and $Z_{3i}$ = the mean dew point on day $i$. To avoid ties and division by 0, we added a small random number (mean 0, variance .01) to each temperature and dew point. We then fit this GAM to the observed data, using 14 degrees of freedom for the time spline, corresponding approximately to monthly knots; and 5 each for the temperature and dew point splines. (We used 7 degrees of freedom for the CVD outcome, corresponding approximately to seasonal knots.) We then saved these model predicted values, used them in the simulations as the expected values, and generated, independently for each day, Poisson random variables. We analyzed this randomly generated series by fitting the GAM model (Equation (3.1)), and saving the standard error generated by SAS and that calculated using Equation (2.10). We repeated this process 1,000 times for each outcome-air pollutant combination. We focus on $\beta_1$, the log rate ratio for the association of ED visits with the air pollutant.

Table 1. Monte Carlo Simulation Results, Expected Values Calculated from Actual ED Visits and Air Pollutants, in Atlanta

| Outcome (a) | Pollutant | True β1 (b) | SE(β̂1) (c) | SE old (d) | Coverage of 95% CI, old (d) | SE new (e) | Coverage of 95% CI, new (e) |
|---|---|---|---|---|---|---|---|
| CVD | PM10 | .011 | .0124 | .0111 | 90.7% | .0123 | 95.3% |
| CVD | NO2 | .006 | .0336 | .0313 | 93.4% | .0345 | 95.5% |
| Asthma | PM10 | .040 | .0219 | .0187 | 89.4% | .0226 | 95.7% |
| Asthma | NO2 | .014 | .0378 | .0314 | 90.2% | .0383 | 94.5% |

(a) CVD (Asthma): Emergency room visits for cardiovascular disease (asthma), used to determine expected daily counts.
(b) Value of β1 used to determine expected value of Y, for each set of 1,000 Monte Carlo experiments.
(c) Standard error of β̂1 for each set of 1,000 experiments.
(d) Mean estimated standard error, and coverage of 95% CI produced by SAS PROC GAM (3.1); 95% CI calculated as the point estimate +/- 1.96 times the estimated standard error.
(e) Mean estimated standard error, and coverage of 95% CI based on new variance estimator; 95% CI calculated as the point estimate +/- 1.96 times the estimated standard error.
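The bookkeeping of such a Monte Carlo experiment (fixed expected values, repeated Poisson draws, refitting, then comparing the empirical standard error of the estimates with the mean estimated standard error and computing coverage) can be sketched in Python as follows. To keep the example short and self-contained, it deliberately swaps in an ordinary parametric Poisson regression fit by iteratively reweighted least squares instead of a GAM, so it illustrates only how the table columns are computed, not the GAM results; the function names, sample size, and number of replicates are hypothetical.

```python
import numpy as np

def fit_poisson(X, y, n_iter=25):
    """Poisson log-linear regression by iteratively reweighted least squares.
    Column 0 of X is assumed to be the intercept.
    Returns (beta_hat, model-based standard errors)."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean() + 0.5)        # intercept-only starting value
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu             # linearized response
        XtA = X.T * mu                      # X^t A with A = diag(mu)
        beta = np.linalg.solve(XtA @ X, XtA @ z)
    mu = np.exp(X @ beta)
    cov = np.linalg.inv((X.T * mu) @ X)
    return beta, np.sqrt(np.diag(cov))

def monte_carlo(X, beta_true, n_rep=300, seed=1):
    """Return (empirical SE of beta1-hat, mean estimated SE, 95% CI coverage)."""
    rng = np.random.default_rng(seed)
    mu = np.exp(X @ beta_true)              # expected values, fixed across replicates
    est, se = [], []
    for _ in range(n_rep):
        y = rng.poisson(mu)
        b, s = fit_poisson(X, y)
        est.append(b[1])
        se.append(s[1])
    est, se = np.array(est), np.array(se)
    covered = np.abs(est - beta_true[1]) <= 1.96 * se
    return est.std(ddof=1), se.mean(), covered.mean()

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([3.0, 0.04])
empirical_se, mean_est_se, coverage = monte_carlo(X, beta_true)
```

For a correctly specified parametric model the mean estimated standard error should track the empirical one and coverage should sit near 95%; the comparison in Table 1 asks the same question of the two GAM standard error estimators.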

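The second set of situations used was similar to that just described, except that hypothetical, simulated data replaced the observed data used above to generate the daily expected counts of ED visits. Specifically, for $i = 1, 2, \ldots, 200$ we generated (hypothetical) exposure and covariates: $\varepsilon_i$, $x_{2,i}$, $t_{2,i}$, $t_{3,i}$ as independent, standard Gaussian variables; $x_{1,i} = \varepsilon_i + x_{2,i}$ and $t_{1,i} = i$. We then defined the expected value of $Y_i$ as:

$$E(Y_i \mid x_{1i}, x_{2i}, t_{1i}, t_{2i}, t_{3i}) = \exp\Bigl\{\beta_0 + \beta_1 x_1 - \beta_2 x_2 + \tfrac{1}{2}\cos(t_1/5) + \tfrac{1}{2}\sin(t_2/5) + \bigl(t_3^3 - t_3^2 + \tfrac{3}{4} t_3\bigr)/200\Bigr\}. \tag{3.2}$$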

We chose $\beta_0$ = 3, 4, or 5 and $\beta_1$ = .0 or .04, and $\beta_2$ = .1, resulting in a total of six additional experimental conditions (Table 2). We randomly generated 200 preliminary values of $Y_i$ as independent Poisson variables with mean given by Equation (3.2). So that the model would be correctly specified, we then fit a GAM with smoothing splines using six degrees of freedom each for $t_1$, $t_2$, and $t_3$ to these preliminary observations and used the model-predicted values as the expected values to generate the 200 observations $Y_i$ used for each Monte Carlo experiment. We adjusted the smoothing parameters ($\lambda_1$, $\lambda_2$, $\lambda_3$) to correspond to the six degrees of freedom that we specified for use in PROC GAM. (That is, we retained the same $x_1$, $x_2$, $t_1$, $t_2$, and $t_3$ for each experiment, but used the predicted values from the model fit to the preliminary data as the expected values for all 1,000 Monte Carlo experiments in each set. This procedure should ensure that the expected values were in the space of possible fits. Because of this process, the "true value" of $\beta_1$ for each simulation set was equal to the initial estimate, and differed slightly from 0 or .04 depending on the baseline estimates.) Using PROC GAM in SAS (2001), we estimated $\beta_1$ 1,000 times, once for each randomly generated series. We calculated the standard error of the estimated $\beta_1$'s, and compared this standard error with the estimated standard errors produced by the SAS program, and with the new standard error estimate in Equation (2.10).

Table 2. Monte Carlo Simulation Results, Expected Values Calculated from Hypothetical Data

| Experiment | Mean of Expected Y | True β1 (a) | SE(β̂1) (b) | SE old (c) | Coverage of 95% CI, old (c) | SE new (d) | Coverage of 95% CI, new (d) |
|---|---|---|---|---|---|---|---|
| 1 | 21.9 | .0010 | .0158 | .0123 | 87.8% | .0152 | 93.3% |
| 2 | 51.2 | -.0001 | .0096 | .0073 | 85.7% | .0096 | 95.0% |
| 3 | 157 | .0050 | .0057 | .0042 | 84.6% | .0056 | 94.7% |
| 4 | 22.3 | .049 | .0161 | .0125 | 87.6% | .0163 | 95.3% |
| 5 | 59.9 | .064 | .0098 | .0076 | 89.1% | .0099 | 95.0% |
| 6 | 165 | .065 | .0058 | .0046 | 88.6% | .0060 | 95.3% |

(a) Value of β1 used for each set of 1,000 Monte Carlo experiments.
(b) Standard error of β̂1 for each set of 1,000 experiments.
(c) Mean estimated standard error, and coverage of 95% CI produced by SAS PROC GAM (3.1); 95% CI calculated as the point estimate +/- 1.96 times the estimated standard error.
(d) Mean estimated standard error, and coverage of 95% CI based on new variance estimator; 95% CI calculated as the point estimate +/- 1.96 times the estimated standard error.
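The first stage of the hypothetical-data design, generating the covariates and the preliminary Poisson counts, might look like the Python sketch below. It assumes Equation (3.2) reads $\exp\{\beta_0 + \beta_1 x_1 - \beta_2 x_2 + \tfrac12\cos(t_1/5) + \tfrac12\sin(t_2/5) + (t_3^3 - t_3^2 + \tfrac34 t_3)/200\}$; the signs and the factors of one half are a reading of a typographically damaged formula and should be checked against the published article. The function name and seeds are hypothetical, and the subsequent GAM fit to the preliminary counts is not shown.

```python
import numpy as np

def expected_counts(beta0, beta1, beta2, n=200, seed=0):
    """Generate covariates and the mean of Y under the assumed form of Eq. (3.2)."""
    rng = np.random.default_rng(seed)
    eps, x2, t2, t3 = rng.standard_normal((4, n))   # independent standard Gaussians
    x1 = eps + x2                                    # exposure correlated with x2
    t1 = np.arange(1, n + 1, dtype=float)            # time index, t1_i = i
    log_mu = (beta0 + beta1 * x1 - beta2 * x2
              + 0.5 * np.cos(t1 / 5) + 0.5 * np.sin(t2 / 5)
              + (t3**3 - t3**2 + 0.75 * t3) / 200)
    return np.exp(log_mu), (x1, x2, t1, t2, t3)

# One of the six conditions: beta0 = 3, beta1 = .04, beta2 = .1
mu, covariates = expected_counts(beta0=3.0, beta1=0.04, beta2=0.1)
y_prelim = np.random.default_rng(1).poisson(mu)      # preliminary observations
```

Because $x_1 = \varepsilon + x_2$, the exposure and the second predictor are correlated by construction, so the design includes confounding among the parametric terms as well as smooth nuisance functions.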

In the third set of experiments (Table 3), we considered estimation with a smaller number of time points (either 100 or 50), and chose either 6, 4, or 2 degrees of freedom for the smoothing splines. Otherwise, this last set of experiments is like the second set, specifically experiment 6. In two experiments (12 and 14), we used two alternative types of error structures, the normal and the binomial distributions.

4. RESULTS

Results of the first set of Monte Carlo experiments, based on the actual observations of ED visits and air pollutants in Atlanta (situation 1), are shown in Table 1. These results illustrate several points. First, they illustrate a tendency for underestimation of the standard errors by the SAS procedure PROC GAM (3.1). For example, the "old" standard error estimate for $\beta_1$ (which relates emergency room visits to PM10) averaged about .0111, compared to .0123, the sample standard error of estimated $\beta_1$'s. On the other hand, the new standard error estimator was about the same, on average, as the sample standard error. A similar pattern held for the other pollutant, and when asthma emergency room visits were used to determine the expected values. The second point, related to the first, is that coverage of the 95% confidence limits based on the "old" variance estimates was consistently less than the nominal value of 95%. The coverage in these experiments was as low as 89% in one instance. In contrast, the coverage of confidence intervals based on the new variance estimator was close to the nominal level.

Results of the second set of Monte Carlo experiments, based on hypothetical data (situation 2), are shown in Table 2. These results further support these same patterns.

Table 3. Monte Carlo Simulation Results, Expected Values Calculated from Hypothetical Data

| Experiment / Distribution | N, df | True β1 (a) | SE(β̂1) (b) | SE old (c) | Coverage of 95% CI, old (c) | SE new (d) | Coverage of 95% CI, new (d) |
|---|---|---|---|---|---|---|---|
| 8 / Poisson | 100, 6 | .031 | .0100 | .0076 | 89.4% | .0100 | 95.1% |
| 9 / Poisson | 100, 4 | .044 | .0083 | .0072 | 82.4% | .0085 | 93.4% |
| 10 / Poisson | 50, 6 | .031 | .0134 | .0072 | 71.6% | .0133 | 94.4% |
| 11 / Poisson | 50, 4 | .056 | .0121 | .0111 | 92.2% | .0119 | 94.8% |
| 12 / Poisson | 50, 2 | .076 | .0119 | .0112 | 93.3% | .0119 | 94.9% |
| 13 / Normal | 100, 2 | -.038 | .0920 | .0904 | 94.7% | .0913 | 94.9% |
| 14 / Binomial | 100, 4 | .037 | .0116 | .0088 | 79.9% | .0113 | 94.4% |

Each experiment is like Experiment 6 in Table 2, except for: the error distribution, the number of time points (N), and the degrees of freedom (df).
(a) Value of β1 used for each set of 1,000 Monte Carlo experiments.
(b) Standard error of β̂1 for each set of 1,000 experiments.
(c) Mean estimated standard error, and coverage of 95% CI produced by SAS PROC GAM (3.1); 95% CI calculated as the point estimate +/- 1.96 times the estimated standard error.
(d) Mean estimated standard error, and coverage of 95% CI based on new variance estimator; 95% CI calculated as the point estimate +/- 1.96 times the estimated standard error.

Specifically, the old standard error estimates are consistently lower than the sample standard error of the estimated $\beta$'s, and the coverage of the associated confidence intervals was consistently lower than the nominal 95% level.

In the third set of Monte Carlo experiments, the new estimator of the standard error yielded results close to the simulated standard deviation of the estimated $\beta$ when we reduced the number of time points from 200, as in earlier experiments, to either 100 or 50 in these. In a few experiments with the lower number of time points, and if we used more than 2–4 degrees of freedom for each spline, the average $\hat\beta$ differed slightly, but significantly, from the true $\beta$ (data not shown).

5. EXAMPLE: APPLICATION OF NEW METHOD

We applied the estimator to data from an ongoing study of air pollution and ED visits for cardiorespiratory diseases in Atlanta (Tolbert et al. 2000); the data used here are from August 1, 1998, to July 31, 1999. Our rationale for using GAMs reflects their inherent appeal, due in part to the semiparametric nature of the time dependency and consequent relaxation of assumptions, and the frequent use of these models in the air pollution literature. We calculated the standard error for parameter estimates from a semiparametric generalized additive model, with the Poisson distribution and log link, executed using PROC GAM in SAS, and compared this estimate to the standard error estimated with our new method. We chose the degrees of freedom for the splines to be similar to those we used in parametric Poisson regression. Use of generalized cross-validation, the default approach in SAS (SAS Institute 2001), suggested slightly fewer degrees of freedom, but led to the same results. We evaluated the association between nitrogen dioxide (NO2) and ER visits for all CVD, using the three-day moving average of NO2, in part due to a priori interest, and controlling for time, for mean temperature, and for dew point using cubic splines with 7, 7, and 5 degrees of freedom, respectively. We also controlled for day of the week using indicator variables and for the number of emergency room visits for noncardiovascular disease. To simplify calculations by avoiding ties, we added a small random number to the temperature and dew point (which did not change the estimate of the parameter or its standard error). We found little evidence of autocorrelation of residuals (Durbin-Watson = 2.155, p = .40 by simulation). In the model for CVD visits, the parameter estimate for NO2 was .020 (rate ratio = 1.020) and the standard error estimated in PROC GAM was .018. The standard error estimated using the new estimator was .020, about 10% larger than that obtained using SAS. This difference is important for evaluating the stability of results and for interval estimation, and would affect any meta-analysis that used this result.
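To see what this difference in standard errors means for interval estimation, the short Python sketch below converts the reported estimate (.020) and the two standard errors (.018 and .020) into rate ratios with 95% confidence limits, using the point estimate +/- 1.96 standard errors on the log scale as in the table footnotes. The function name is hypothetical, and the inputs are the rounded values quoted in the text.

```python
import math

def rate_ratio_ci(beta, se, z=1.96):
    """Rate ratio and 95% confidence limits from a log rate ratio and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

rr, lo_old, hi_old = rate_ratio_ci(0.020, 0.018)   # SE from PROC GAM
_, lo_new, hi_new = rate_ratio_ci(0.020, 0.020)    # SE from the new estimator

# Ratio of interval widths on the log scale equals the ratio of the SEs
width_ratio = (math.log(hi_new) - math.log(lo_new)) / (math.log(hi_old) - math.log(lo_old))
```

The point estimate is unchanged; only the interval widens, in proportion to the ratio of the two standard errors (.020 / .018, roughly 1.11 with these rounded inputs).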

6. DISCUSSION

Our results provide evidence that the variance estimator in Equation (2.10) works well with finite samples, at least for the situations considered. More work, however, needs to be done to verify that its performance remains good under other conditions. Our results also further support and are consistent with the work of Klein et al. (2002) and of Ramsay et al. (2003), who showed that the variance estimation procedures used in commercially available programs could be inadequate. These tendencies of commercial software to underestimate variances have been attributed to concurvity in the data (Ramsay et al. 2003), and to inadequate linear approximations used for the smooth functions (Dominici, McDermott, and Hastie 2003). Recognition of these and other problems has motivated reanalyses of at least 20 studies of air pollution and health effects, with the overall conclusion that use of GAMs, implemented with the faulty variance estimator, was associated with smaller standard errors than use of generalized linear models (Health Effects Institute 2003).

We have presented and evaluated a variance estimator for up to three splines, extending the previous work of Hastie and Tibshirani (1990), who presented explicit results for a single spline, and of Flanders, Klein, and Tolbert (2003). Implementation of this approach for more splines is straightforward, for example by using the recursive equations given in the Appendix, but as the number of splines increases the computations become more and more onerous. Our result also applies directly to the case of only one or two splines by simply taking $S_1$ and/or $S_2$ equal to 0.

Our arguments depend heavily on the assumption of consistency of $\beta$ and $\eta$, and convergence of the backfitting algorithm as argued by Hastie and Tibshirani (1990). We have not investigated performance of the variance estimator when that assumption might fail. Conditions other than those noted by Hastie and Tibshirani (1990) may also lead to consistency. In particular, we might also expect consistency if the number of (say, time) points remains fixed, but the expected mean increases for each point, other parameters remain constant, and the model is correctly specified with judicious choice of degrees of freedom for the splines. More work on consistency, not the focus of this article, remains. For example, the application to time series should probably be based on further specification of assumptions because, as the number of time points increases, the complexity and in general the number of parameters in the underlying model could potentially increase proportionately. Yet additional work could allow for potential serial autocorrelation, although our example did not suggest important residual autocorrelation. We also note that after development of the expression for variance (Flanders, Klein, and Tolbert 2002; Flanders et al. 2003, equation 10), we became aware that another estimator was in the process of being developed to correct the errors in the commercially available software (Dominici, McDermott, and Hastie 2002). Comparison now shows our formulation to be equivalent to theirs (Dominici, McDermott, and Hastie 2003): one simply substitutes a combined smoothing matrix, $S$, in place of $V_1 + V_2 + V_3$ in the expression for $W$ in Equation (2.10). Our independent derivation, results of our empiric evaluations, and our example show that the commercially available estimates can be too small, and that the alternative estimator has good finite sample properties, at least in the situations considered. In particular, our work provides empiric evidence, complementing work of Dominici et al. (2003), that use of Equation (2.10) for analysis of real data can lead to confidence limits with appropriate coverage properties, again at least for the situations considered.

Some investigators in air pollution epidemiology have so far avoided use of GAMs as the primary method of analysis because of concerns about the variance estimator (Klein et al. 2002), choosing instead to use Poisson regression models with splines for time with many knots and chosen, in part, based on a priori considerations. The Monte Carlo experiments suggest, in agreement with simulations of Klein et al. (2002) and of Ramsay et al. (2003), that the standard error provided by commercially available software can have substantial error. The estimator evaluated here performed nicely with finite samples in simulations completed so far. Importantly, as illustrated in our simulations, the correction has a substantial effect on the estimated standard errors and confidence interval when applied to data based on an ongoing study in Atlanta (Tolbert et al. 2000), and on hypothetical data. This underestimation is further illustrated in the example, using data from our ongoing study of air pollution in Atlanta. The simulations suggest that, at least for situations like those considered here, the assumptions may be adequately achieved and that the new estimator can perform well in such real situations. As noted by Lumley and Sheppard (2003), major challenges in air pollution epidemiology remain, particularly including model selection in the face of measurement error and confounding.

APPENDIX

Equations for $\hat\beta$ and $\operatorname{var}(\hat\beta)$ in terms of the individual smoothing matrices, like Equations (2.8) and (2.10), but that apply with any number of splines, can be obtained recursively as follows. Start with expressions for $\hat\beta$ (such as Equation (2.8)) and $\operatorname{var}(\hat\beta)$ (i.e., Equation (2.10)) that apply with $J - 1$ splines, in terms of smoothing matrices $S_1, S_2, \ldots, S_{J-1}$, and matrix $A$ (the $n \times n$ matrix $-\partial^2 l / \partial\eta\,\partial\eta^t$, as in Equation (2.8)). Expressions for $\hat\beta$ and $\operatorname{var}(\hat\beta)$ that apply with one additional spline (say, $S_J$) are obtained by substituting $(I - S_i S_J)^{-} S_i (I - S_J)$ in place of $S_i$ for $i = 1, 2, \ldots, J - 1$ and $A(I - S_J)$ for $A$ throughout.

This recursive approach is justified by writing the system of $J + 1$ linear equations in unknowns $\hat\beta$ and $\hat f_1, \ldots, \hat f_J$, comparable to Equations (2.4)–(2.7); eliminating $\hat f_J$; and rearranging to obtain a reduced system of $J$ equations in $J$ unknowns. The new system of equations has the same form as the original, provided we identify $(I - S_i S_J)^{-} S_i (I - S_J)$ with $S_i$ for $i = 1, 2, \ldots, J - 1$ and $A(I - S_J)$ with $A$ throughout.
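One way to read this recursion computationally is sketched below in Python. The sketch accumulates the factors $(I - S_J)$ that multiply $A$ at each elimination step, so that the returned matrix $M$ plays the role of $I - V_1 - V_2 - V_3$ in Equation (2.8) for any number of smoothers; this packaging of the substitution rule, the function names, and the toy smoothers (built from arbitrary positive definite penalties so that all inverses exist) are assumptions made for illustration.

```python
import numpy as np

def reduce_smoothers(S_list):
    """Eliminate the last smoother S_J: return the J-1 substituted smoothers
    (I - S_i S_J)^- S_i (I - S_J), i = 1, ..., J-1."""
    *rest, SJ = S_list
    I = np.eye(SJ.shape[0])
    return [np.linalg.pinv(I - Si @ SJ) @ Si @ (I - SJ) for Si in rest]

def residual_operator(S_list):
    """Matrix M such that R - sum_j f_j = M R at the backfitting fixed point,
    where f_j = S_j (R - sum_{k != j} f_k) and R = Z^c - X beta."""
    SJ = S_list[-1]
    I = np.eye(SJ.shape[0])
    if len(S_list) == 1:
        return I - SJ
    # A(I - S_J) replaces A, so the factor (I - S_J) is applied on the left
    return (I - SJ) @ residual_operator(reduce_smoothers(S_list))

# Toy example with J = 4 smoothers (hypothetical penalties and weights)
rng = np.random.default_rng(0)
n, J = 25, 4
A = np.diag(rng.uniform(1.0, 5.0, size=n))
S_list = []
for _ in range(J):
    G = rng.normal(size=(n, n))
    K = G @ G.T / n + np.eye(n)          # arbitrary positive definite penalty
    S_list.append(np.linalg.pinv(A + K) @ A)
M = residual_operator(S_list)
```

With `M` in hand, the weight matrix of Equation (2.10) would be formed as the generalized inverse of `X.T @ A @ M @ X` times `X.T @ A @ M`; each level of the recursion costs a few $n \times n$ inversions, consistent with the remark that computations become more onerous as the number of splines grows.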

ACKNOWLEDGMENTS

This work was supported by grants from the U.S. Environmental Protection Agency (R82921301-0) and from the National Institute of Environmental Health Sciences (R01ES11294).

[Received April 2004. Revised November 2004.]

REFERENCES

Borja-Aburta, V. H., Castillejos, M., Gold, D. R., Bierzwinski, S., and Loomis, D. (1998), "Mortality and Ambient Fine Particles in Southwest Mexico City, 1993–1995," Environmental Health Perspectives, 106, 849–855.

Buja, A., Hastie, T., and Tibshirani, R. (1989), "Linear Smoothers and Additive Models," The Annals of Statistics, 17, 453–510.

Burnett, R. T., Smith-Doiron, M., Stieb, D., Cakmak, S., and Brook, J. (1999), "Effects of Particulate and Gaseous Air Pollution on Cardiorespiratory Hospitalizations," Archives of Environmental Health, 54, 130–139.

Conceicao, G. M. S., Miraglia, S. G. E. K., Kishi, H. S., Saldiva, P. N. H., and Singer, J. M. (2001), "Air Pollution and Child Mortality: A Time-Series Study in Sao Paolo, Brazil," Environmental Health Perspectives, 109, 347–350.

Dominici, F., McDermott, A., Zeger, S. L., and Samet, J. M. (2002), "On the Use of Generalized Additive Models in Time-Series Studies of Air Pollution and Health," American Journal of Epidemiology, 156, 193–203.

Dominici, F., McDermott, A., and Hastie, T. (2002), "Semiparametric Regression in Time Series Analyses of Air Pollution and Mortality: Generalized Additive and Generalized Linear Models," Presentation on Variance of GAM Estimators, Environmental Protection Agency Workshop on GAM-Related Statistical Issues in PM Epidemiology, November 4–6, 2002, Durham, NC.

Dominici, F., McDermott, A., and Hastie, T. (2003), "Improved Semi-Parametric Time Series Models of Air Pollution and Mortality" [on-line], http://www.biostat.jhsph.edu/~fdominic/jasa.R2.pdf.

Flanders, W. D., Klein, M., and Tolbert, P. (2002), "A New Variance Estimator for Parameters of Semi-parametric Generalized Additive Models. A Report to the U.S. Environmental Protection Agency," Based on a Presentation at the Environmental Protection Agency Workshop on GAM-Related Statistical Issues in PM Epidemiology, November 4–6, 2002, Durham, NC.

Flanders, W. D., Klein, M., and Tolbert, P. (2003), "A New Variance Estimator for Parameters of Semi-parametric Generalized Additive Models," Technical Report, Rollins School of Public Health, Emory University, Department of Biostatistics, Atlanta, GA.

Hastie, T. J., and Tibshirani, R. J. (1990), Generalized Additive Models, Monographs on Statistics and Applied Probability 43, New York: Chapman & Hall.

Health Effects Institute (2003), "Revised Analyses of Time Series Studies of Air Pollution and Health," Special Report, Boston, MA: Health Effects Institute.

Katsouyanni, K., Touloumi, G., Samoli, E., Gryparis, A., Monopolis, Y., Le Tertre, A., Boumghar, A., Rossi, G., Zmirou, D., Ballester, F., Anderson, H. R., Wojtyniak, B., Paldy, A., Braunstein, R., Pekkanen, J., Schindler, C., and Schwartz, J. (2002), "Different Convergence Parameters Applied to the S-Plus GAM Function," Epidemiology, 13, 742–743.

Klein, M., Flanders, W. D., and Tolbert, P. E. (2002), "Variances may be Underestimated Using Available Software for Generalized Additive Models," American Journal of Epidemiology, 155, s106.

Lumley, T., and Sheppard, L. (2003), "Time Series Analyses of Air Pollution and Health: Straining at Gnats and Swallowing Camels," Epidemiology, 14, 13–14.

McCullagh, P., and Nelder, J. A. (1989), Generalized Linear Models, New York: Chapman and Hall, pp. 327–329.

Michelozzi, P., Forastiere, F., Fusco, D., Perucci, C. A., Ostro, B., Ancona, C., and Palotti, G. (1998), "Air Pollution and Daily Mortality in Rome, Italy," Occupational and Environmental Medicine, 44, 605–610.

Moolgavkar, S. (2000), "Air Pollution and Hospital Admissions for Diseases of the Circulatory System in Three U.S. Metropolitan Areas," Journal of Air Waste Management Association, 50, 1199–1206.

Pope, C. A., Hill, R. W., and Villegas, G. M. (1999), "Particulate Air Pollution and Daily Mortality on Utah's Wasatch Front," Environmental Health Perspectives, 107, 567–573.

Ramsay, T., Burnett, R., and Krewski, D. (2003), "The Effect of Concurvity in Generalized Additive Models Linking Mortality to Ambient Air Pollution," Epidemiology, 14, 18–23.

Samet, J. M., Dominici, F., Curriero, F., Coursac, I., and Zeger, S. L. (2000), "Fine Particulate Air Pollution and Mortality in 20 U.S. Cities: 1987–1994," New England Journal of Medicine, 343, 1742–1757.

SAS Institute (2001), The SAS System for Windows, Release 8.02, TS Level 02M0, Cary, NC: SAS Institute.

Schwartz, J. (1994a), "The Use of Generalized Additive Models in Epidemiology," XVIIth International Biometric Conference, Hamilton, Ontario, Canada, August 8–12, 1994. Proceedings, Volume 1: Invited Papers.

Schwartz, J. (1994b), "Air Pollution and Hospital Admissions for the Elderly in Birmingham, Alabama," American Journal of Epidemiology, 139, 589–598.

Tolbert, P. E., Klein, M., Metzger, K. B., Peel, J., Flanders, W. D., Todd, K., Mulholland, J. A., Ryan, P. B., and Frumkin, H. (2000), "Interim Results of the Study of Particulates and Health in Atlanta (SOPHIA)," Journal of Exposure Analysis and Environmental Epidemiology, 20, 446–460.
