
The np Package

January 11, 2008

Version 0.14-2

Date 2008-01-11

Depends boot

Suggests quantreg

Title Nonparametric kernel smoothing methods for mixed datatypes

Author Tristen Hayfield, Jeffrey S. Racine

Maintainer Jeffrey S. Racine

Description This package provides a variety of nonparametric (and semiparametric) kernel methods that seamlessly handle a mix of continuous, unordered, and ordered factor datatypes. We would like to gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada (NSERC: nserc.ca), the Social Sciences and Humanities Research Council of Canada (SSHRC: sshrc.ca), and the Shared Hierarchical Academic Research Computing Network (SHARCNET: sharcnet.ca).

License GPL

URL

R topics documented:

cps71

Italy

oecdpanel

wage1

gradients

np

npcmstest

npcdens

npcdensbw

npconmode

npudens

npudensbw

npksum

npplot

npplreg

npplregbw

npqcmstest

npqreg

npreg

npregbw

npsigtest

npindex

npindexbw

npscoef

npscoefbw

se

uocquantile

Index

cps71 Canadian High School Graduate Earnings

Description

Canadian cross-section wage data consisting of a random sample taken from the 1971 Canadian Census Public Use Tapes for male individuals having common education (grade 13). There are 205 observations in total.

Usage

data("cps71")

Format

A data frame with 2 columns, and 205 rows.

logwage the first column, of type numeric

age the second column, of type integer

Source

Aman Ullah

References

Pagan, A. and A. Ullah (1999), Nonparametric Econometrics, Cambridge University Press.

Examples

data("cps71")

attach(cps71)

plot(age,logwage,xlab="Age",ylab="log(wage)")

detach(cps71)

Italy Italian GDP Panel

Description

Italian GDP growth panel for 21 regions covering the period 1951-1998 (millions of Lire, 1990=base). There are 1008 observations in total.

Usage

data("Italy")

Format

A data frame with 2 columns, and 1008 rows.

year the first column, of type integer

gdp the second column, of type numeric: millions of Lire, 1990=base

Source

Giovanni Baiocchi

References

Baiocchi, G. (2006), “Economic Applications of Nonparametric Methods,” Ph.D. Thesis, University of York.

Examples

data("Italy")

attach(Italy)

plot(ordered(year),gdp,xlab="Year(ordered factor)",

ylab="GDP(millions of Lire,1990=base)")

detach(Italy)

oecdpanel Cross Country Growth Panel

Description

Cross country GDP growth panel covering the period 1960-1995 used by Liu and Stengos (1999) and Maasoumi, Racine, and Stengos (2007). There are 616 observations in total. data("oecdpanel") makes available the dataset "oecdpanel" plus an additional object "bw".

Usage

data("oecdpanel")

Format

A data frame with 7 columns, and 616 rows. This panel covers seven 5-year periods: 1960-1964, 1965-1969, 1970-1974, 1975-1979, 1980-1984, 1985-1989 and 1990-1994.

A separate local-linear rbandwidth object (‘bw’) has been computed for the user’s convenience which can be used to visualize this dataset using npplot(bws=bw).

growth the first column, of type numeric: growth rate of real GDP per capita for each 5-year period

oecd the second column, of type integer: equal to 1 for OECD members, 0 otherwise

year the third column, of type integer

initgdp the fourth column, of type numeric: per capita real GDP at the beginning of each 5-year period

popgro the fifth column, of type numeric: average annual population growth rate for each 5-year period

inv the sixth column, of type numeric: average investment/GDP ratio for each 5-year period

humancap the seventh column, of type numeric: average secondary school enrollment rate for each 5-year period

Source

Thanasis Stengos

References

Liu, Z. and T. Stengos (1999), “Non-linearities in cross country growth regressions: a semiparametric approach,” Journal of Applied Econometrics, 14, 527-538.

Maasoumi, E. and J.S. Racine and T. Stengos (2007), “Growth and convergence: a profile of distribution dynamics and mobility,” Journal of Econometrics, 136, 483-508.

Examples

data("oecdpanel")

attach(oecdpanel)

summary(oecdpanel)

detach(oecdpanel)

wage1 Cross-Sectional Data on Wages

Description

Cross-section wage data consisting of a random sample taken from the U.S. Current Population Survey for the year 1976. There are 526 observations in total. data("wage1") makes available the dataset "wage1" plus additional objects "bw.all" and "bw.subset".

Usage

data("wage1")

Format

A data frame with 24 columns, and 526 rows.

Two local-linear rbandwidth objects (‘bw.all’ and ‘bw.subset’) have been computed for the user’s convenience which can be used to visualize this dataset using npplot(bws=bw.all).

wage column 1, of type numeric, average hourly earnings

educ column 2, of type numeric, years of education

exper column 3, of type numeric, years potential experience

tenure column 4, of type numeric, years with current employer

nonwhite column 5, of type character, = “Nonwhite” if nonwhite, “White” otherwise

female column 6, of type character, = “Female” if female, “Male” otherwise

married column 7, of type character, = “Married” if Married, “Nonmarried” otherwise

numdep column 8, of type numeric, number of dependents

smsa column 9, of type numeric, = 1 if live in SMSA

northcen column 10, of type numeric, = 1 if live in north central U.S.

south column 11, of type numeric, = 1 if live in southern region

west column 12, of type numeric, = 1 if live in western region

construc column 13, of type numeric, = 1 if work in construc. indus.

ndurman column 14, of type numeric, = 1 if in nondur. manuf. indus.

trcommpu column 15, of type numeric, = 1 if in trans, commun, pub ut

trade column 16, of type numeric, = 1 if in wholesale or retail

services column 17, of type numeric, = 1 if in services indus.

profserv column 18, of type numeric, = 1 if in prof. serv. indus.

profocc column 19, of type numeric, = 1 if in profess. occupation

clerocc column 20, of type numeric, = 1 if in clerical occupation

servocc column 21, of type numeric, = 1 if in service occupation

lwage column 22, of type numeric, log(wage)

expersq column 23, of type numeric, exper^2

tenursq column 24, of type numeric, tenure^2

Source

Jeffrey M.Wooldridge

References

Wooldridge, J.M. (2000), Introductory Econometrics: A Modern Approach, South-Western College Publishing.

Examples

data("wage1")

attach(wage1)

summary(wage1)

detach(wage1)

gradients Extract Gradients

Description

gradients is a generic function which extracts gradients from objects.

Usage

gradients(x,...)

## S3 method for class 'condensity':

gradients(x,errors=FALSE,...)

## S3 method for class 'condistribution':

gradients(x,errors=FALSE,...)

## S3 method for class 'npregression':

gradients(x,errors=FALSE,...)

## S3 method for class 'qregression':

gradients(x,errors=FALSE,...)

## S3 method for class 'singleindex':

gradients(x,errors=FALSE,...)

Arguments

x an object for which the extraction of gradients is meaningful.

... other arguments.

errors a logical value specifying whether or not standard errors of gradients are desired. Defaults to FALSE.

Details

This function provides a generic interface for extraction of gradients from objects.
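The dispatch mechanism behind such a generic can be sketched in base R. The generic name grads and the class toyfit below are hypothetical, chosen so that nothing in np is masked; the assumption that errors=TRUE returns the standard errors in place of the gradients is ours and should be checked against the method you actually call.

```r
# Hypothetical S3 generic (not np's): dispatches on the class of x.
grads <- function(x, ...) UseMethod("grads")

# Method for a toy class "toyfit" (an illustrative class, not an np class).
# Assumption: errors = TRUE returns standard errors instead of gradients.
grads.toyfit <- function(x, errors = FALSE, ...) {
  if (errors) x$grad.se else x$grad
}

# A toy model object carrying gradients and their standard errors.
fit <- structure(list(grad = c(0.9, 1.1), grad.se = c(0.05, 0.07)),
                 class = "toyfit")

g    <- grads(fit)                 # extracts the gradients
se_g <- grads(fit, errors = TRUE)  # extracts their standard errors
```

Because dispatch depends only on the class attribute, supporting a new object type amounts to defining one more method of the same form.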

Value

Gradients extracted from the model object x.

Note

This method currently only supports objects from the np package.

Author(s)

Tristen Hayfield hayfield@phys.ethz.ch, Jeffrey S. Racine racinej@mcmaster.ca

References

See the references for the method being interrogated via gradients in the appropriate help file. For example, for the particulars of the gradients for nonparametric regression see the references in npreg.

See Also

fitted,residuals,coef,and se,for related methods;np for supported objects.

Examples

x<-runif(10)

y<-x+rnorm(10,sd=0.1)

gradients(npreg(npregbw(y~x),gradients=TRUE))

np Nonparametric Kernel Smoothing Methods for Mixed Datatypes

Description

This package provides a variety of nonparametric and semiparametric kernel methods that seamlessly handle a mix of continuous, unordered, and ordered factor datatypes (unordered and ordered factors are often referred to as ‘nominal’ and ‘ordinal’ categorical variables respectively).

Bandwidth selection is a key aspect of sound nonparametric and semiparametric kernel estimation.

np is designed from the ground up to make bandwidth selection the focus of attention. To this end, one typically begins by creating a ‘bandwidth object’ which embodies all aspects of the method, including specific kernel functions, data names, datatypes, and the like. One then passes these bandwidth objects to other functions, and those functions can grab the specifics from the bandwidth object, thereby removing potential inconsistencies and unnecessary repetition.

There are two ways in which you can interact with functions in np, either i) using dataframes, or ii) using a formula interface, where appropriate.

To some, it may be natural to use the dataframe interface. The R data.frame function preserves a variable’s type once it has been cast (unlike cbind, which we avoid for this reason). If you find this most natural for your project, you first create a dataframe casting data according to their type (i.e., one of continuous (default), factor, ordered). Then you would simply pass this dataframe to the appropriate np function, for example npudensbw(dat=data).
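The difference between data.frame and cbind can be seen in a few lines of base R. The variable names below are made up for illustration and no np function is called.

```r
# Illustrative data: one continuous, one unordered, one ordered variable.
x1 <- c(1.5, 2.3, 0.7)
x2 <- c("MALE", "FEMALE", "MALE")
x3 <- c(1, 3, 2)

# cbind() builds a matrix, so every column is coerced to a single type
# (here character) and the type information is lost.
m <- cbind(x1, x2, x3)

# data.frame() keeps one type per column once each variable has been cast.
dat <- data.frame(x1 = x1, x2 = factor(x2), x3 = ordered(x3))

# First class of each column: numeric, factor, ordered.
types <- sapply(dat, function(col) class(col)[1])
```

A data frame built this way carries the continuous, factor, and ordered casts along with the data, which is the information the text says the np routines detect.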

To others, however, it may be natural to use the formula interface that is used for the regression examples, among others. For nonparametric regression functions such as npreg, you would proceed as you would using lm (e.g., bw<-npregbw(y~factor(x1)+x2)) except that you would of course not need to specify, e.g., polynomials in variables, interaction terms, or create a number of dummy variables for a factor. Every function in np supports both interfaces, where appropriate.

Note that if your factor is in fact a character string such as, say, X being either "MALE" or "FEMALE", np will handle this directly, i.e., there is no need to map the string values into unique integers such as (0,1). Once the user casts a variable as a particular datatype (i.e., factor, ordered, or continuous (default)), all subsequent methods automatically detect the type and use the appropriate kernel function and method where appropriate.

All estimation methods are fully multivariate, i.e., there are no limitations on the number of variables one can model (or number of observations for that matter). Execution time for most routines, however, grows rapidly with the number of observations (a kernel sum over n observations evaluated at n points costs on the order of n^2 operations) and increases with the number of variables involved.

Nonparametric methods include unconditional density (distribution), conditional density (distribution), regression, mode, and quantile estimators along with gradients where appropriate, while semiparametric methods include single index, partially linear, and smooth (i.e., varying) coefficient models.

A number of tests are included such as consistent specification tests for parametric regression and quantile regression models along with tests of significance for nonparametric regression.

A variety of bootstrap methods for computing standard errors, nonparametric confidence bounds, and bias-corrected bounds are implemented.

A variety of bandwidth methods are implemented including fixed, nearest-neighbor, and adaptive nearest-neighbor.

A variety of data-driven methods of bandwidth selection are implemented, while the user can specify their own bandwidths should they so choose (either a raw bandwidth or scaling factor).

A flexible plotting utility, npplot, facilitates graphing of multivariate objects. An example for creating postscript graphs using the npplot utility and pulling this into a LaTeX document is provided.

The function npksum allows users to create or implement their own kernel estimators or tests should they so desire.

The underlying functions are written in C for computational efficiency. Despite this, due to their nature, data-driven bandwidth selection methods involving multivariate numerical search can be time-consuming, particularly for large datasets. A version of this package using the Rmpi wrapper is under development that allows one to deploy this software in a clustered computing environment to facilitate computation involving large datasets.

To cite the np package, type citation("np") from within R for details.

Details

The kernel methods in np employ the so-called ‘generalized product kernels’ found in Hall, Racine, and Li (2004), Li and Racine (2003), Li and Racine (2004), Li and Racine (2007), Ouyang, Li, and Racine (2006), and Racine and Li (2004), among others. For details on a particular method, kindly refer to the original references listed above.

We briefly describe the particulars of various univariate kernels used to generate the generalized product kernels that underlie the kernel estimators implemented in the np package. In a nutshell, the generalized kernel functions that underlie the kernel estimators in np are formed by taking the product of univariate kernels such as those listed below. When you cast your data as a particular type (continuous, factor, or ordered factor) in a data frame or formula, the routines will automatically recognize the type of variable being modelled and use the appropriate kernel type for each variable in the resulting estimator.

Second Order Gaussian (x is continuous) $k(z) = \exp(-z^2/2)/\sqrt{2\pi}$, where $z = (x_i - x)/h$, and $h > 0$.

Second Order Epanechnikov (x is continuous) $k(z) = 3\left(1 - z^2/5\right)/(4\sqrt{5})$ if $z^2 < 5$, 0 otherwise, where $z = (x_i - x)/h$, and $h > 0$.

Uniform (x is continuous) $k(z) = 1/2$ if $|z| < 1$, 0 otherwise, where $z = (x_i - x)/h$, and $h > 0$.

Aitchison and Aitken (x is a (discrete) factor) $l(x_i, x, \lambda) = 1 - \lambda$ if $x_i = x$, and $\lambda/(c - 1)$ if $x_i \neq x$, where $c$ is the number of (discrete) outcomes assumed by the factor x. Note that $\lambda$ must lie between 0 and $(c - 1)/c$.

Wang and van Ryzin (x is a (discrete) ordered factor) $l(x_i, x, \lambda) = 1 - \lambda$ if $|x_i - x| = 0$, and $\frac{1 - \lambda}{2}\lambda^{|x_i - x|}$ if $|x_i - x| \geq 1$. Note that $\lambda$ must lie between 0 and 1.

Li and Racine (x is a (discrete) factor) $l(x_i, x, \lambda) = 1$ if $x_i = x$, and $\lambda$ if $x_i \neq x$. Note that $\lambda$ must lie between 0 and 1.

Li and Racine (x is a (discrete) ordered factor) $l(x_i, x, \lambda) = 1$ if $|x_i - x| = 0$, and $\lambda^{|x_i - x|}$ if $|x_i - x| \geq 1$. Note that $\lambda$ must lie between 0 and 1.
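To make the definitions concrete, several of the univariate kernels above can be written directly in base R. This is a minimal sketch for illustration only: the function names are hypothetical, the Wang and van Ryzin kernel assumes integer-coded ordered categories, and np's own implementation is in C and is not reproduced here.

```r
# Continuous kernels, as functions of z = (x_i - x) / h.
k_gauss <- function(z) exp(-z^2 / 2) / sqrt(2 * pi)
k_epan  <- function(z) ifelse(z^2 < 5, 3 * (1 - z^2 / 5) / (4 * sqrt(5)), 0)
k_unif  <- function(z) ifelse(abs(z) < 1, 0.5, 0)

# Aitchison and Aitken kernel for an unordered factor with n_levels outcomes.
l_aa <- function(xi, x, lambda, n_levels) {
  ifelse(xi == x, 1 - lambda, lambda / (n_levels - 1))
}

# Wang and van Ryzin kernel for an ordered factor coded as integers.
l_wvr <- function(xi, x, lambda) {
  d <- abs(xi - x)
  ifelse(d == 0, 1 - lambda, (1 - lambda) / 2 * lambda^d)
}

# Each continuous kernel integrates to one over its support.
area_gauss <- integrate(k_gauss, -Inf, Inf)$value
area_epan  <- integrate(k_epan, -sqrt(5), sqrt(5))$value

# The Aitchison and Aitken weights sum to one over the outcomes.
total_aa <- sum(l_aa(1:4, 2, lambda = 0.2, n_levels = 4))
```

With lambda = 0 the discrete kernels reduce to an indicator function (no smoothing across categories); at the Aitchison and Aitken upper bound (c-1)/c every category receives the same weight 1/c.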

So, if you had two variables, $x_{i1}$ and $x_{i2}$, and $x_{i1}$ was continuous while $x_{i2}$ was, say, binary (0/1), and you created a data frame of the form X<-data.frame(x1,factor(x2)), then the kernel function used by np would be $K(\cdot) = k(\cdot) \times l(\cdot)$ where the particular kernel functions $k(\cdot)$ and $l(\cdot)$ would be, say, the second order Gaussian (ckertype="gaussian") and Aitchison and Aitken (ukertype="aitchisonaitken") kernels by default, respectively.
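A product kernel of this form can be sketched as a mixed-data density estimate in base R. The simulated data, the bandwidth h, and the smoothing parameter lambda below are arbitrary illustrative values (np would normally select them by a data-driven method), and f_hat is a hypothetical helper rather than an np function.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)            # continuous variable
x2 <- rbinom(n, 1, 0.4)   # binary (0/1) variable, c = 2 outcomes

h      <- 0.4   # bandwidth for x1 (hand-picked, not data-driven)
lambda <- 0.1   # smoothing parameter for x2, between 0 and (c-1)/c = 0.5

# Joint density estimate at (p1, p2): the average of K = k * l.
f_hat <- function(p1, p2) {
  k <- dnorm((x1 - p1) / h) / h                     # second order Gaussian
  l <- ifelse(x2 == p2, 1 - lambda, lambda / (2 - 1))  # Aitchison and Aitken
  mean(k * l)
}

f_hat(0, 1)   # estimated joint density at x1 = 0, x2 = 1
```

Because each univariate kernel integrates (or sums) to one, the estimate integrates to one over x1 and the two values of x2.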

Note that higher order continuous kernels (i.e., fourth, sixth, and eighth order) are derived from the second order kernels given above (see Li and Racine (2007) for details).

For particulars on any given method, kindly see the references listed for the method in question.

Author(s)

Tristen Hayfield, Jeffrey S. Racine

Maintainer: Jeffrey S. Racine

We are grateful to John Fox and Achim Zeileis for their valuable input and encouragement. We would like to gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada (NSERC: nserc.ca), the Social Sciences and Humanities Research Council of Canada (SSHRC: sshrc.ca), and the Shared Hierarchical Academic Research Computing Network (SHARCNET: sharcnet.ca).

References

Aitchison, J. and C.G.G. Aitken (1976), “Multivariate binary discrimination by the kernel method,” Biometrika, 63, 413-420.

Hall, P. and J.S. Racine and Q. Li (2004), “Cross-validation and the estimation of conditional probability densities,” Journal of the American Statistical Association, 99, 1015-1026.

Li, Q. and J.S. Racine (2003), “Nonparametric estimation of distributions with categorical and continuous data,” Journal of Multivariate Analysis, 86, 266-292.

Li, Q. and J.S. Racine (2004), “Cross-validated local linear nonparametric regression,” Statistica Sinica, 14, 485-512.

Ouyang, D. and Q. Li and J.S. Racine (2006), “Cross-validation and the estimation of probability distributions with categorical data,” Journal of Nonparametric Statistics, 18, 69-100.

Racine, J.S. and Q. Li (2004), “Nonparametric estimation of regression functions with both categorical and continuous data,” Journal of Econometrics, 119, 99-130.

Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.

Pagan, A. and A. Ullah (1999), Nonparametric Econometrics, Cambridge University Press.

Scott, D.W. (1992), Multivariate Density Estimation. Theory, Practice and Visualization, New York: Wiley.

Silverman, B.W. (1986), Density Estimation, London: Chapman and Hall.

Wang, M.C. and J. van Ryzin (1981), “A class of smooth estimators for discrete distributions,” Biometrika, 68, 301-309.

npcmstest Kernel Consistent Model Specification Test with Mixed Data

Description

npcmstest implements a consistent test for correct specification of parametric regression models (linear or nonlinear) as described in Hsiao, Li, and Racine (forthcoming).

Usage

npcmstest(formula,

data=NULL,

subset,

xdat,

ydat,

model=stop(paste(sQuote("model"),"has not been provided")),

distribution=c("bootstrap","asymptotic"),

boot.method=c("iid","wild","wild-rademacher"),

boot.num=399,

pivot=TRUE,

density.weighted=TRUE,

random.seed=42,

...)

Arguments

formula a symbolic description of variables on which the test is to be performed. The details of constructing a formula are described below.

data an optional data frame, list or environment (or object coercible to a data frame by as.data.frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which the function is called.

subset an optional vector specifying a subset of observations to be used.

model a model object obtained from a call to lm (or glm). Important: the call to either glm or lm must have the arguments x=TRUE and y=TRUE or npcmstest will not work.

xdat a p-variate data frame of explanatory data (training data) used to calculate the regression estimators.

ydat a one (1) dimensional numeric or integer vector of dependent data, each element i corresponding to each observation (row) i of xdat.

distribution a character string used to specify the method of estimating the distribution of the statistic to be calculated. bootstrap will conduct bootstrapping. asymptotic will use the normal distribution. Defaults to bootstrap.

boot.method a character string used to specify the bootstrap method. iid will generate independent identically distributed draws. wild will use a wild bootstrap. wild-rademacher will use a wild bootstrap with Rademacher variables. Defaults to iid.

boot.num an integer value specifying the number of bootstrap replications to use. Defaults to 399.

pivot a logical value specifying whether the statistic should be normalised such that it approaches N(0,1) in distribution. Defaults to TRUE.

density.weighted a logical value specifying whether the statistic should be weighted by the density of xdat. Defaults to TRUE.

random.seed an integer used to seed R’s random number generator. This is to ensure replicability. Defaults to 42.

... additional arguments supplied to control bandwidth selection on the residuals. One can specify the bandwidth type, kernel types, and so on. To do this, you may specify any of bwscaling, bwtype, ckertype, ckerorder, ukertype, okertype, as described in npregbw. This is necessary if you specify bws as a p-vector and not a bandwidth object, and you do not desire the default behaviours.
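The three boot.method options differ in how bootstrap responses are built from the fitted values and residuals of the parametric model. The sketch below illustrates the general idea in base R with made-up fitted values and residuals. Using Mammen's two-point distribution for the plain wild bootstrap is our assumption (a common textbook choice), not something the text above confirms about np.

```r
set.seed(42)
n   <- 8
fit <- seq(1, 2, length.out = n)   # hypothetical fitted values
res <- rnorm(n, sd = 0.2)
res <- res - mean(res)             # centred hypothetical residuals

# "iid": resample the residuals with replacement.
y_iid <- fit + sample(res, n, replace = TRUE)

# "wild-rademacher": flip the sign of each residual with probability 1/2.
v_rad <- sample(c(-1, 1), n, replace = TRUE)
y_rad <- fit + res * v_rad

# "wild" (assumed here to be Mammen's two-point distribution): multipliers
# with mean zero and unit variance.
a   <- (1 - sqrt(5)) / 2
b   <- (1 + sqrt(5)) / 2
p_a <- (sqrt(5) + 1) / (2 * sqrt(5))
v_mam  <- sample(c(a, b), n, replace = TRUE, prob = c(p_a, 1 - p_a))
y_wild <- fit + res * v_mam
```

Both wild schemes keep each residual attached to its own observation, which preserves heteroskedasticity; the iid scheme does not.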

Value

npcmstest returns an object of type cmstest with the following components. Components will contain information related to Jn or In depending on the value of pivot:

Jn the statistic Jn

In the statistic In

Omega.hat as described in Hsiao, C. and Q. Li and J.S. Racine.

q.* the various quantiles of the statistic Jn (or In if pivot=FALSE) are in components q.90, q.95, q.99 (one-sided 10%, 5%, 1% critical values)

P the P-value of the statistic

Jn.bootstrap if pivot=TRUE contains the bootstrap replications of Jn

In.bootstrap if pivot=FALSE contains the bootstrap replications of In

summary supports objects of type cmstest.
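As a rough illustration of how critical values and a P-value relate to the bootstrap replications, the following base R sketch uses simulated stand-ins. The observed statistic and the replications are invented, and the exact formulas np uses internally are not confirmed by the text above.

```r
set.seed(42)
Jn           <- 2.1          # hypothetical observed statistic
Jn.bootstrap <- rnorm(399)   # stand-in for 399 bootstrap replications

# One-sided 10%, 5% and 1% critical values (upper quantiles).
crit <- quantile(Jn.bootstrap, probs = c(0.90, 0.95, 0.99))

# One-sided bootstrap P-value: share of replications exceeding Jn.
P <- mean(Jn.bootstrap > Jn)
```

The null of correct specification would be rejected at the 5% level when Jn exceeds the second element of crit, or equivalently when P falls below 0.05.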

Usage Issues

If you are using data of mixed types, then it is advisable to use the data.frame function to construct your input data and not cbind, since cbind will typically not work as intended on mixed data types and will coerce the data to the same type.

Author(s)

Tristen Hayfield hayfield@phys.ethz.ch, Jeffrey S. Racine racinej@mcmaster.ca

References

Aitchison, J. and C.G.G. Aitken (1976), “Multivariate binary discrimination by the kernel method,” Biometrika, 63, 413-420.

Hsiao, C. and Q. Li and J.S. Racine (forthcoming), “A consistent model specification test with mixed categorical and continuous data,” Journal of Econometrics.

Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.

Maasoumi, E. and J.S. Racine and T. Stengos (2007), “Growth and convergence: a profile of distribution dynamics and mobility,” Journal of Econometrics, 136, 483-508.

Murphy, K.M. and F. Welch (1990), “Empirical age-earnings profiles,” Journal of Labor Economics, 8, 202-229.

Pagan, A. and A. Ullah (1999), Nonparametric Econometrics, Cambridge University Press.

Wang, M.C. and J. van Ryzin (1981), “A class of smooth estimators for discrete distributions,” Biometrika, 68, 301-309.

Examples

# EXAMPLE 1: For this example, we conduct a consistent model

# specification test for a parametric wage regression model that is

# quadratic in age. The work of Murphy and Welch (1990) would suggest

# that this parametric regression model is misspecified.

data("cps71")

attach(cps71)

model<-lm(logwage~age+I(age^2),x=TRUE,y=TRUE)

plot(age,logwage)

lines(age,fitted(model))

# Note - this may take a few minutes depending on the speed of your

# computer...

npcmstest(model=model,xdat=age,ydat=logwage)

## Not run:

# Sleep for 5 seconds so that we can examine the output...

Sys.sleep(5)

# Next try Murphy & Welch's (1990) suggested quintic specification.

model<-lm(logwage~age+I(age^2)+I(age^3)+I(age^4)+I(age^5),x=TRUE,y=TRUE)

plot(age,logwage)

lines(age,fitted(model))

X<-data.frame(age)

# Note - this may take a few minutes depending on the speed of your

# computer...

npcmstest(model=model,xdat=age,ydat=logwage)

# Sleep for 5 seconds so that we can examine the output...

Sys.sleep(5)

# Note - you can pass in multiple arguments to this function. For

# instance, to use local linear rather than local constant regression,

# you would use npcmstest(model,X,regtype="ll"), while you could also

# change the kernel type (default is second order Gaussian), numerical

# search tolerance, or feed in your own vector of bandwidths and so

# forth.

detach(cps71)

# EXAMPLE 2: For this example, we replicate the application in Maasoumi,

# Racine, and Stengos (2007) (see oecdpanel for details). We

# estimate a parametric model that is used in the literature, then

# subject it to the model specification test.

data("oecdpanel")

attach(oecdpanel)

model<-lm(growth~oecd+

factor(year)+

initgdp+

I(initgdp^2)+

I(initgdp^3)+

I(initgdp^4)+

popgro+

inv+

humancap+

I(humancap^2)+

I(humancap^3)-1,

x=TRUE,

y=TRUE)

X<-data.frame(factor(oecd),factor(year),initgdp,popgro,inv,humancap)

# Note - we override the default tolerances for the sake of this example

# (don't of course do this in general). This example may take a few

# minutes depending on the speed of your computer (data-driven bandwidth

# selection is, by its nature, time consuming, while the bootstrapping

# also takes some time).

npcmstest(model=model,xdat=X,ydat=growth,tol=.1,ftol=.1)

detach(oecdpanel)

##End(Not run)

npcdens Kernel Conditional Density and Distribution Estimates with Mixed Datatypes

Description

npcdens computes kernel conditional density estimates on p+q-variate evaluation data, given a set of training data (both explanatory and dependent) and a bandwidth specification (a conbandwidth object or a bandwidth vector, bandwidth type, and kernel type) using the method of Hall, Racine, and Li (2004). Similarly npcdist computes kernel conditional cumulative distribution estimates.

The data may be continuous, discrete (unordered and ordered factors), or some combination thereof.

Usage

npcdens(bws,...)

## S3 method for class 'formula':

npcdens(bws,data=NULL,newdata=NULL,...)

## S3 method for class 'call':

npcdens(bws,...)

## S3 method for class 'conbandwidth':

npcdens(bws,

txdat=stop("invoked without training data 'txdat'"),

tydat=stop("invoked without training data 'tydat'"),

exdat,

eydat,

gradients=FALSE,

...)

## Default S3 method:

npcdens(bws,

txdat=stop("training data 'txdat' missing"),

tydat=stop("training data 'tydat' missing"),

exdat,

eydat,

gradients,...)

Arguments

bws a bandwidth specification. This can be set as a conbandwidth object returned from a previous invocation of npcdensbw, or as a p+q-vector of bandwidths, with each element i up to i=p corresponding to the bandwidth for column i in txdat, and each element i from i=p+1 to i=p+q corresponding to the bandwidth for column i-p in tydat. If specified as a vector, then additional arguments will need to be supplied as necessary to specify the bandwidth type, kernel types, training data, and so on.

gradients a logical value specifying whether to return estimates of the gradients at the evaluation points. Defaults to FALSE.

... additional arguments supplied to specify the bandwidth type, kernel types, and so on. This is necessary if you specify bws as a p+q-vector and not a conbandwidth object, and you do not desire the default behaviours. To do this, you may specify any of bwmethod, bwscaling, bwtype, cxkertype, cxkerorder, cykertype, cykerorder, uxkertype, uykertype, oxkertype, oykertype, as described in npcdensbw.

data an optional data frame, list or environment (or object coercible to a data frame by as.data.frame) containing the variables in the model. If not found in data, the variables are taken from environment(bws), typically the environment from which npcdensbw was called.

newdata an optional data frame in which to look for evaluation data. If omitted, the training data are used.

txdat a p-variate data frame of sample realizations of explanatory data (training data). Defaults to the training data used to compute the bandwidth object.

tydat a q-variate data frame of sample realizations of dependent data (training data). Defaults to the training data used to compute the bandwidth object.

exdat a p-variate data frame of explanatory data on which conditional densities will be evaluated. By default, evaluation takes place on the data provided by txdat.

eydat a q-variate data frame of dependent data on which conditional densities will be evaluated. By default, evaluation takes place on the data provided by tydat.

Details

npcdens and npcdist implement a variety of methods for estimating multivariate conditional distributions (p+q-variate) defined over a set of possibly continuous and/or discrete (unordered, ordered) data. The approach is based on Li and Racine (2004) who employ ‘generalized product kernels’ that admit a mix of continuous and discrete datatypes.
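For the simplest case of one continuous explanatory variable and one continuous dependent variable, a kernel conditional density estimate is the ratio of a joint kernel sum to a marginal kernel sum. The base R sketch below illustrates that ratio with simulated data, hand-picked bandwidths hx and hy, and a hypothetical helper cond_dens. It is not np's implementation, and np would choose the bandwidths by cross-validation.

```r
set.seed(1)
n <- 300
x <- runif(n, -1, 1)              # explanatory variable
y <- 2 * x + rnorm(n, sd = 0.5)   # dependent variable

hx <- 0.2    # bandwidth for x (hand-picked for illustration)
hy <- 0.25   # bandwidth for y (hand-picked for illustration)

# Estimate of f(y0 | x0): joint kernel sum divided by marginal kernel sum.
cond_dens <- function(y0, x0) {
  kx <- dnorm((x - x0) / hx) / hx
  ky <- dnorm((y - y0) / hy) / hy
  sum(kx * ky) / sum(kx)
}

# The estimated density of y given x = 0.3 should peak near 2 * 0.3 = 0.6.
ygrid <- seq(-3, 3, by = 0.05)
dens  <- sapply(ygrid, cond_dens, x0 = 0.3)
ygrid[which.max(dens)]
```

For any fixed x0 the estimate integrates to one over y0, since each Gaussian kernel in y does.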

Three classes of kernel estimators for the continuous datatypes are available: fixed, adaptive nearest-neighbor, and generalized nearest-neighbor. Adaptive nearest-neighbor bandwidths change with each sample realization in the set, $x_i$, when estimating the density at the point x. Generalized nearest-neighbor bandwidths change with the point at which the density is estimated, x. Fixed bandwidths are constant over the support of x.
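The three bandwidth classes can be contrasted with a small univariate density sketch in base R. The helper knn_dist, the choice k = 5, and the decision to exclude a sample point from its own neighbour set in the adaptive case are illustrative assumptions, not a description of np's internals.

```r
set.seed(1)
x  <- sort(runif(50))   # sample realizations x_i
x0 <- 0.5               # evaluation point x
k  <- 5                 # number of neighbours (hand-picked)

# Distance from a point to its k-th nearest element of a sample.
knn_dist <- function(point, sample, k) sort(abs(sample - point))[k]

# Fixed: a single bandwidth everywhere.
h_fixed <- 0.1

# Generalized nearest-neighbor: one bandwidth per evaluation point x0.
h_gen <- knn_dist(x0, x, k)

# Adaptive nearest-neighbor: one bandwidth per sample realization x_i
# (assumption: a point is not counted as its own neighbour).
h_adapt <- sapply(seq_along(x), function(i) knn_dist(x[i], x[-i], k))

# Gaussian kernel density estimates at x0 under each scheme.
f_fixed <- mean(dnorm((x - x0) / h_fixed) / h_fixed)
f_gen   <- mean(dnorm((x - x0) / h_gen) / h_gen)
f_adapt <- mean(dnorm((x - x0) / h_adapt) / h_adapt)
```

Note that h_adapt is a vector with one entry per observation, while h_gen would have to be recomputed for every new evaluation point.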

Training and evaluation input data may be a mix of continuous (default), unordered discrete (to be specified in the data frames using factor), and ordered discrete (to be specified in the data frames using ordered). Data can be entered in an arbitrary order and data types will be detected automatically by the routine (see np for details).

A variety of kernels may be specified by the user. Kernels implemented for continuous datatypes include the second, fourth, sixth, and eighth order Gaussian and Epanechnikov kernels, and the uniform kernel. Unordered discrete datatypes use a variation on Aitchison and Aitken’s (1976) kernel, while ordered datatypes use a variation of the Wang and van Ryzin (1981) kernel.

Value

npcdens returns a condensity object; similarly, npcdist returns a condistribution object. The generic accessor functions fitted, se, and gradients extract estimated values, asymptotic standard errors on estimates, and gradients, respectively, from the returned object. Furthermore, the functions summary and plot support objects of both classes. The returned objects have the following components:

xbw bandwidth(s), scale factor(s) or nearest neighbours for the explanatory data, txdat

ybw bandwidth(s), scale factor(s) or nearest neighbours for the dependent data, tydat

xeval the evaluation points of the explanatory data

yeval the evaluation points of the dependent data

condens or condist estimates of the conditional density (cumulative distribution) at the evaluation points

conderr standard errors of the conditional density (cumulative distribution) estimates

congrad if invoked with gradients=TRUE, estimates of the gradients at the evaluation points

congerr if invoked with gradients=TRUE, standard errors of the gradients at the evaluation points

log_likelihood log likelihood of the conditional density estimate

Usage Issues

If you are using data of mixed types, then it is advisable to use the data.frame function to construct your input data and not cbind, since cbind will typically not work as intended on mixed data types and will coerce the data to the same type.

Author(s)

Tristen Hayfield hayfield@phys.ethz.ch, Jeffrey S. Racine racinej@mcmaster.ca

References

Aitchison, J. and C.G.G. Aitken (1976), “Multivariate binary discrimination by the kernel method,” Biometrika, 63, 413-420.

Hall, P. and J.S. Racine and Q. Li (2004), “Cross-validation and the estimation of conditional probability densities,” Journal of the American Statistical Association, 99, 1015-1026.

Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.

Pagan, A. and A. Ullah (1999), Nonparametric Econometrics, Cambridge University Press.

Scott, D.W. (1992), Multivariate Density Estimation. Theory, Practice and Visualization, New York: Wiley.

Silverman, B.W. (1986), Density Estimation, London: Chapman and Hall.

Wang, M.C. and J. van Ryzin (1981), “A class of smooth estimators for discrete distributions,” Biometrika, 68, 301-309.

See Also

npudens

Examples

# EXAMPLE 1 (INTERFACE=FORMULA): For this example, we load Giovanni

# Baiocchi's Italian GDP panel (see Italy for details), and compute the

# likelihood cross-validated bandwidths (default) using a second-order

# Gaussian kernel (default). Note - this may take a minute or two

# depending on the speed of your computer.

data("Italy")

attach(Italy)

# First, compute the bandwidths... note that this may take a minute or

# two depending on the speed of your computer. We override the default

# tolerances for the search method as the objective function is

# well-behaved (don't of course do this in general).

bw<-npcdensbw(formula=gdp~ordered(year),tol=.1,ftol=.1)

# Next, compute the condensity object...

fhat<-npcdens(bws=bw)

# The object fhat now contains results such as the estimated conditional

# density function (fhat$condens) and so on...

summary(fhat)

## Not run:

# Call the npplot() function to visualize the results (Ctrl-C will

# interrupt on *NIX systems, Esc will interrupt on MS Windows

# systems).

npplot(bws=bw)

# To plot the conditional distribution we use cdf=TRUE in npplot

# (Ctrl-C will interrupt on *NIX systems, Esc will interrupt on MS

# Windows systems)

npplot(bws=bw,cdf=TRUE)

detach(Italy)

# EXAMPLE 1 (INTERFACE=DATAFRAME): For this example, we load Giovanni

# Baiocchi's Italian GDP panel (see Italy for details), and compute the

# likelihood cross-validated bandwidths (default) using a second-order

# Gaussian kernel (default). Note - this may take a minute or two

# depending on the speed of your computer.

data("Italy")

attach(Italy)

# First, compute the bandwidths... note that this may take a minute or
# two depending on the speed of your computer. We override the default
# tolerances for the search method as the objective function is
# well-behaved (don't of course do this in general).

# Note - we cast `X' and `y' as data frames so that npplot() can
# automatically grab names (this looks like overkill, but in
# multivariate settings you would do this anyway, so may as well get in
# the habit).

X<-data.frame(year=ordered(year))

y<-data.frame(gdp)

bw<-npcdensbw(xdat=X,ydat=y,tol=.1,ftol=.1)

# Next, compute the condensity object...

fhat<-npcdens(bws=bw)

# The object fhat now contains results such as the estimated conditional
# density function (fhat$condens) and so on...

summary(fhat)

# Call the npplot() function to visualize the results (Ctrl-C will
# interrupt on *NIX systems, Esc will interrupt on MS Windows systems).

npplot(bws=bw)

# To plot the conditional distribution we use cdf=TRUE in npplot
# (Ctrl-C will interrupt on *NIX systems, Esc will interrupt on MS
# Windows systems)

npplot(bws=bw,cdf=TRUE)

detach(Italy)

# EXAMPLE 2 (INTERFACE=FORMULA): For this example, we load the old
# faithful geyser data from the R `datasets' library and compute the
# conditional density and conditional distribution functions.

library("datasets")

data("faithful")

attach(faithful)

# Note - this may take a few minutes depending on the speed of your
# computer...

bw<-npcdensbw(formula=eruptions~waiting)

summary(bw)

# Plot the density function (Ctrl-C will interrupt on *NIX systems,
# Esc will interrupt on MS Windows systems).

npplot(bws=bw)

# Plot the distribution function (cdf=TRUE) (Ctrl-C will interrupt on
# *NIX systems, Esc will interrupt on MS Windows systems)

npplot(bws=bw,cdf=TRUE)

detach(faithful)

# EXAMPLE 2 (INTERFACE=DATAFRAME): For this example, we load the old
# faithful geyser data from the R `datasets' library and compute the
# conditional density and conditional distribution functions.

library("datasets")

data("faithful")

attach(faithful)

# Note - this may take a few minutes depending on the speed of your
# computer...

# Note - we cast `X' and `y' as data frames so that npplot() can
# automatically grab names (this looks like overkill, but in
# multivariate settings you would do this anyway, so may as well get in
# the habit).

X<-data.frame(waiting)

y<-data.frame(eruptions)

bw<-npcdensbw(xdat=X,ydat=y)

summary(bw)

# Plot the density function (Ctrl-C will interrupt on *NIX systems,
# Esc will interrupt on MS Windows systems)

npplot(bws=bw)

# Plot the distribution function (cdf=TRUE) (Ctrl-C will interrupt on
# *NIX systems, Esc will interrupt on MS Windows systems)

npplot(bws=bw,cdf=TRUE)

detach(faithful)

# EXAMPLE 3 (INTERFACE=FORMULA): Replicate the DGP of Klein & Spady
# (1993) (see their description on page 405, pay careful attention to
# footnote 6 on page 405).

set.seed(123)

n<-1000

# x1 is chi-squared having 3 df truncated at 6 standardized by
# subtracting 2.348 and dividing by 1.511

x<-rchisq(n,df=3)

x1<-(ifelse(x<6,x,6)-2.348)/1.511

# x2 is normal (0,1) truncated at +-2 divided by 0.8796

x<-rnorm(n)

x2<-ifelse(abs(x)<2,x,2)/0.8796

# y is 1 if y* > 0, 0 otherwise.

y<-ifelse(x1+x2+rnorm(n)>0,1,0)
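
# Illustration only (not part of the original example): a small sketch of
# how ifelse(), as used above to impose the cutoffs, treats values that
# fail the test. Such values are replaced by the third argument rather
# than dropped, so the vector keeps its length. The names z.demo and
# z.cut are hypothetical, and no random numbers are drawn here.

z.demo<-c(-3,-1,0,1,3)

z.cut<-ifelse(abs(z.demo)<2,z.demo,2)

z.cut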

# Generate data-driven bandwidths (likelihood cross-validation). We
# override the default tolerances for the search method as the objective
# function is well-behaved (don't of course do this in general). Note -
# this may take a few minutes depending on the speed of your computer...

bw<-npcdensbw(formula=factor(y)~x1+x2,tol=.1,ftol=.1)

# Next, create the evaluation data in order to generate a perspective
# plot

x1.seq<-seq(min(x1),max(x1),length=50)

x2.seq<-seq(min(x2),max(x2),length=50)

X.eval<-expand.grid(x1=x1.seq,x2=x2.seq)

data.eval<-data.frame(y=factor(rep(1,nrow(X.eval))),x1=X.eval[,1],x2=X.eval[,2])

# Now evaluate the conditional probability for y=1 and for the
# evaluation Xs

fit<-fitted(npcdens(bws=bw,newdata=data.eval))

# Finally, coerce the data into a matrix for plotting with persp()

fit.mat<-matrix(fit,50,50)
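
# Illustration only (not part of the original example): a small sketch of
# why the 50 x 50 matrix lines up with the axes passed to persp().
# expand.grid() varies its first argument fastest and matrix() fills
# column by column, so row i of the matrix corresponds to the i-th x1
# value and column j to the j-th x2 value. The names g.demo and m.demo
# are hypothetical.

g.demo<-expand.grid(a=1:3,b=1:2)

m.demo<-matrix(10*g.demo$a+g.demo$b,3,2)

# Each entry m.demo[i,j] equals 10*i+j, i.e. a=i and b=j

m.demo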

# Generate a perspective plot similar to Figure 2b of Klein and Spady
# (1993)

persp(x1.seq,

x2.seq,

fit.mat,
