The np Package
January 11, 2008
Version 0.14-2
Date 2008-01-11
Depends boot
Suggests quantreg
Title Nonparametric kernel smoothing methods for mixed datatypes
Author Tristen Hayfield
Description This package provides a variety of nonparametric (and semiparametric) kernel methods that seamlessly handle a mix of continuous, unordered, and ordered factor datatypes. We would like to gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada (NSERC:nserc.ca), the Social Sciences and Humanities Research Council of Canada (SSHRC:sshrc.ca), and the Shared Hierarchical Academic Research Computing Network (SHARCNET:sharcnet.ca).
License GPL
URL
R topics documented:
cps71
Italy
oecdpanel
wage1
gradients
np
npcmstest
npcdens
npcdensbw
npconmode
npudens
npudensbw
npksum
npplot
npplreg
npplregbw
npqcmstest
npqreg
npreg
npregbw
npsigtest
npindex
npindexbw
npscoef
npscoefbw
se
uocquantile
Index

cps71 Canadian High School Graduate Earnings
Description
Canadian cross-section wage data consisting of a random sample taken from the 1971 Canadian Census Public Use Tapes for male individuals having common education (grade 13). There are 205 observations in total.
Usage
data("cps71")
Format
A data frame with 2 columns, and 205 rows.
logwage the first column, of type numeric
age the second column, of type integer
Source
Aman Ullah
References
Pagan, A. and A. Ullah (1999), Nonparametric Econometrics, Cambridge University Press.
Examples
data("cps71")
attach(cps71)
plot(age, logwage, xlab="Age", ylab="log(wage)")
detach(cps71)
Italy Italian GDP Panel
Description
Italian GDP growth panel for 21 regions covering the period 1951-1998 (millions of Lire, 1990=base). There are 1008 observations in total.
Usage
data("Italy")
Format
A data frame with 2 columns, and 1008 rows.
year the first column, of type integer
gdp the second column, of type numeric: millions of Lire, 1990=base
Source
Giovanni Baiocchi
References
Baiocchi, G. (2006), "Economic Applications of Nonparametric Methods," Ph.D. Thesis, University of York.
Examples
data("Italy")
attach(Italy)
plot(ordered(year), gdp, xlab="Year (ordered factor)",
     ylab="GDP (millions of Lire, 1990=base)")
detach(Italy)
oecdpanel Cross Country Growth Panel
Description
Cross country GDP growth panel covering the period 1960-1995 used by Liu and Stengos (1999) and Maasoumi, Racine, and Stengos (2007). There are 616 observations in total. data("oecdpanel") makes available the dataset "oecdpanel" plus an additional object "bw".
Usage
data("oecdpanel")
Format
A data frame with 7 columns, and 616 rows. This panel covers seven 5-year periods: 1960-1964, 1965-1969, 1970-1974, 1975-1979, 1980-1984, 1985-1989 and 1990-1994.
A separate local-linear rbandwidth object (`bw') has been computed for the user's convenience which can be used to visualize this dataset using npplot(bws=bw).
growth the first column, of type numeric: growth rate of real GDP per capita for each 5-year period
oecd the second column, of type integer: equal to 1 for OECD members, 0 otherwise
year the third column, of type integer
initgdp the fourth column, of type numeric: per capita real GDP at the beginning of each 5-year period
popgro the fifth column, of type numeric: average annual population growth rate for each 5-year period
inv the sixth column, of type numeric: average investment/GDP ratio for each 5-year period
humancap the seventh column, of type numeric: average secondary school enrollment rate for each 5-year period
Source
Thanasis Stengos
References
Liu, Z. and T. Stengos (1999), "Non-linearities in cross country growth regressions: a semiparametric approach," Journal of Applied Econometrics, 14, 527-538.
Maasoumi, E. and J.S. Racine and T. Stengos (2007), "Growth and convergence: a profile of distribution dynamics and mobility," Journal of Econometrics, 136, 483-508.
Examples
data("oecdpanel")
attach(oecdpanel)
summary(oecdpanel)
detach(oecdpanel)
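# The Format section above notes that data("oecdpanel") also loads a
# precomputed local-linear rbandwidth object `bw'. A minimal sketch of
# using it for visualization (assumes the np package is attached; this
# may spawn several plots, one per explanatory variable):
## Not run:
data("oecdpanel")
attach(oecdpanel)
npplot(bws=bw)
detach(oecdpanel)
## End(Not run)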
wage1 Cross-Sectional Data on Wages
Description
Cross-section wage data consisting of a random sample taken from the U.S. Current Population Survey for the year 1976. There are 526 observations in total. data("wage1") makes available the dataset "wage1" plus additional objects "bw.all" and "bw.subset".
Usage
data("wage1")
Format
A data frame with 24 columns, and 526 rows.
Two local-linear rbandwidth objects (`bw.all' and `bw.subset') have been computed for the user's convenience which can be used to visualize this dataset using npplot(bws=bw.all).
wage column 1, of type numeric, average hourly earnings
educ column 2, of type numeric, years of education
exper column 3, of type numeric, years potential experience
tenure column 4, of type numeric, years with current employer
nonwhite column 5, of type character, = "Nonwhite" if nonwhite, "White" otherwise
female column 6, of type character, = "Female" if female, "Male" otherwise
married column 7, of type character, = "Married" if married, "Nonmarried" otherwise
numdep column 8, of type numeric, number of dependents
smsa column 9, of type numeric, = 1 if live in SMSA
northcen column 10, of type numeric, = 1 if live in north central U.S.
south column 11, of type numeric, = 1 if live in southern region
west column 12, of type numeric, = 1 if live in western region
construc column 13, of type numeric, = 1 if work in construction industry
ndurman column 14, of type numeric, = 1 if in nondurable manufacturing industry
trcommpu column 15, of type numeric, = 1 if in transport, communications, or public utilities
trade column 16, of type numeric, = 1 if in wholesale or retail
services column 17, of type numeric, = 1 if in services industry
profserv column 18, of type numeric, = 1 if in professional services industry
profocc column 19, of type numeric, = 1 if in professional occupation
clerocc column 20, of type numeric, = 1 if in clerical occupation
servocc column 21, of type numeric, = 1 if in service occupation
lwage column 22, of type numeric, log(wage)
expersq column 23, of type numeric, exper^2
tenursq column 24, of type numeric, tenure^2
Source
Jeffrey M.Wooldridge
References
Wooldridge, J.M. (2000), Introductory Econometrics: A Modern Approach, South-Western College Publishing.
Examples
data("wage1")
attach(wage1)
summary(wage1)
detach(wage1)
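# The Format section above notes that data("wage1") also loads two
# precomputed local-linear rbandwidth objects, `bw.all' and `bw.subset'.
# A minimal sketch of using one of them for visualization (assumes the
# np package is attached; this may spawn several plots, one per
# explanatory variable):
## Not run:
data("wage1")
attach(wage1)
npplot(bws=bw.all)
detach(wage1)
## End(Not run)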
gradients Extract Gradients
Description
gradients is a generic function which extracts gradients from objects.
Usage
gradients(x, ...)
## S3 method for class 'condensity':
gradients(x, errors=FALSE, ...)
## S3 method for class 'condistribution':
gradients(x, errors=FALSE, ...)
## S3 method for class 'npregression':
gradients(x, errors=FALSE, ...)
## S3 method for class 'qregression':
gradients(x, errors=FALSE, ...)
## S3 method for class 'singleindex':
gradients(x, errors=FALSE, ...)
Arguments
x an object for which the extraction of gradients is meaningful.
... other arguments.
errors a logical value specifying whether or not standard errors of gradients are desired. Defaults to FALSE.
Details
This function provides a generic interface for extraction of gradients from objects.
Value
Gradients extracted from the model object x.
Note
This method currently only supports objects from the np library.
Author(s)
Tristen Hayfield <hayfield@phys.ethz.ch>, Jeffrey S. Racine <racinej@mcmaster.ca>
References
See the references for the method being interrogated via gradients in the appropriate help file. For example, for the particulars of the gradients for nonparametric regression, see the references in npreg.
See Also
fitted, residuals, coef, and se, for related methods; np for supported objects.
Examples
x <- runif(10)
y <- x + rnorm(10, sd=0.1)
gradients(npreg(npregbw(y~x), gradients=TRUE))
np Nonparametric Kernel Smoothing Methods for Mixed Datatypes
Description
This package provides a variety of nonparametric and semiparametric kernel methods that seamlessly handle a mix of continuous, unordered, and ordered factor datatypes (unordered and ordered factors are often referred to as 'nominal' and 'ordinal' categorical variables respectively).
Bandwidth selection is a key aspect of sound nonparametric and semiparametric kernel estimation.
np is designed from the ground up to make bandwidth selection the focus of attention. To this end, one typically begins by creating a 'bandwidth object' which embodies all aspects of the method, including specific kernel functions, data names, datatypes, and the like. One then passes these bandwidth objects to other functions, and those functions can grab the specifics from the bandwidth object, thereby removing potential inconsistencies and unnecessary repetition.
There are two ways in which you can interact with functions in np, either i) using data frames, or ii) using a formula interface, where appropriate.
To some, it may be natural to use the dataframe interface. The R data.frame function preserves a variable's type once it has been cast (unlike cbind, which we avoid for this reason). If you find this most natural for your project, you first create a dataframe casting data according to their type (i.e., one of continuous (default), factor, ordered). Then you would simply pass this dataframe to the appropriate np function, for example npudensbw(dat=data).
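For instance, a minimal sketch of the dataframe interface (the variable names below are hypothetical and used purely for illustration):

x1 <- rnorm(100)                                      # continuous (the default type)
x2 <- factor(sample(c("a","b"), 100, replace=TRUE))   # unordered factor
data <- data.frame(x1, x2)                            # data.frame() preserves each variable's type
bw <- npudensbw(dat=data)                             # the routine detects the types automatically
summary(bw)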
To others, however, it may be natural to use the formula interface that is used for the regression examples, among others. For nonparametric regression functions such as npreg, you would proceed as you would using lm (e.g., bw <- npregbw(y~factor(x1)+x2)) except that you would of course not need to specify, e.g., polynomials in variables, interaction terms, or create a number of dummy variables for a factor. Every function in np supports both interfaces, where appropriate.
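Likewise, a minimal sketch of the formula interface for regression, mirroring the call quoted above (hypothetical variables; no dummy variables or interaction terms need to be constructed for the factor):

x1 <- sample(c("low","high"), 100, replace=TRUE)
x2 <- rnorm(100)
y <- ifelse(x1=="high", 1, 0) + x2 + rnorm(100, sd=0.25)
bw <- npregbw(y~factor(x1)+x2)   # data-driven bandwidth selection
model <- npreg(bws=bw)           # pass the bandwidth object to the estimator
summary(model)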
Note that if your factor is in fact a character string such as, say, X being either "MALE" or "FEMALE", np will handle this directly, i.e., there is no need to map the string values into unique integers such as (0,1). Once the user casts a variable as a particular datatype (i.e., factor, ordered, or continuous (default)), all subsequent methods automatically detect the type and use the appropriate kernel function and method where appropriate.
All estimation methods are fully multivariate, i.e., there are no limitations on the number of variables one can model (or number of observations for that matter). Execution time for most routines is, however, exponentially increasing in the number of observations and increases with the number of variables involved.
Nonparametric methods include unconditional density (distribution), conditional density (distribution), regression, mode, and quantile estimators along with gradients where appropriate, while semiparametric methods include single index, partially linear, and smooth (i.e., varying) coefficient models.
A number of tests are included such as consistent specification tests for parametric regression and quantile regression models along with tests of significance for nonparametric regression.
A variety of bootstrap methods for computing standard errors, nonparametric confidence bounds, and bias-corrected bounds are implemented.
A variety of bandwidth methods are implemented including fixed, nearest-neighbor, and adaptive nearest-neighbor.
A variety of data-driven methods of bandwidth selection are implemented, while the user can specify their own bandwidths should they so choose (either a raw bandwidth or scaling factor).
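A hedged sketch of these options for npregbw follows (hypothetical data; the argument names bwtype, bwscaling, and bandwidth.compute are those documented for the bandwidth routines, and their availability should be checked against your installed version):

x <- runif(200)
y <- x^2 + rnorm(200, sd=0.1)
# data-driven selection with adaptive nearest-neighbor bandwidths
bw.ann <- npregbw(y~x, bwtype="adaptive_nn")
# a user-supplied raw bandwidth, skipping the numerical search
bw.raw <- npregbw(y~x, bws=0.1, bandwidth.compute=FALSE)
# a user-supplied scale factor rather than a raw bandwidth
bw.sf <- npregbw(y~x, bws=1.06, bwscaling=TRUE, bandwidth.compute=FALSE)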
A flexible plotting utility, npplot, facilitates graphing of multivariate objects. An example for creating postscript graphs using the npplot utility and pulling this into a LaTeX document is provided.
The function npksum allows users to create or implement their own kernel estimators or tests should they so desire.
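For example, a minimal sketch of a local constant (Nadaraya-Watson) regression estimator built from npksum kernel sums (hypothetical data and bandwidth; the ratio-of-sums construction follows the npksum documentation):

x <- runif(100)
y <- x + rnorm(100, sd=0.1)
h <- 0.1   # a fixed bandwidth chosen purely for illustration
# kernel-weighted sum of y divided by the kernel-weighted sum of ones
ghat <- npksum(txdat=x, tydat=y, bws=h)$ksum / npksum(txdat=x, bws=h)$ksum
plot(x, y)
points(x, ghat, col="blue")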
The underlying functions are written in C for computational efficiency. Despite this, due to their nature, data-driven bandwidth selection methods involving multivariate numerical search can be time-consuming, particularly for large datasets. A version of this package using the Rmpi wrapper is under development that allows one to deploy this software in a clustered computing environment to facilitate computation involving large datasets.
To cite the np package, type citation("np") from within R for details.
Details
The kernel methods in np employ the so-called 'generalized product kernels' found in Hall, Racine, and Li (2004), Li and Racine (2003), Li and Racine (2004), Li and Racine (2007), Ouyang, Li, and Racine (2006), and Racine and Li (2004), among others. For details on a particular method, kindly refer to the original references listed above.
We briefly describe the particulars of various univariate kernels used to generate the generalized product kernels that underlie the kernel estimators implemented in the np package. In a nutshell, the generalized kernel functions that underlie the kernel estimators in np are formed by taking the product of univariate kernels such as those listed below. When you cast your data as a particular type (continuous, factor, or ordered factor) in a data frame or formula, the routines will automatically recognize the type of variable being modelled and use the appropriate kernel type for each variable in the resulting estimator.
Second Order Gaussian (x is continuous) k(z) = exp(-z^2/2)/√(2π), where z = (x_i - x)/h, and h > 0.
Second Order Epanechnikov (x is continuous) k(z) = 3(1 - z^2/5)/(4√5) if z^2 < 5, 0 otherwise, where z = (x_i - x)/h, and h > 0.
Uniform (x is continuous) k(z) = 1/2 if |z| < 1, 0 otherwise, where z = (x_i - x)/h, and h > 0.
Aitchison and Aitken (x is a (discrete) factor) l(x_i, x, λ) = 1 - λ if x_i = x, and λ/(c-1) if x_i ≠ x, where c is the number of (discrete) outcomes assumed by the factor x. Note that λ must lie between 0 and (c-1)/c.
Wang and van Ryzin (x is a (discrete) ordered factor) l(x_i, x, λ) = 1 - λ if |x_i - x| = 0, and ((1-λ)/2) λ^|x_i - x| if |x_i - x| ≥ 1. Note that λ must lie between 0 and 1.
Li and Racine (x is a (discrete) factor) l(x_i, x, λ) = 1 if x_i = x, and λ if x_i ≠ x. Note that λ must lie between 0 and 1.
Li and Racine (x is a (discrete) ordered factor) l(x_i, x, λ) = 1 if |x_i - x| = 0, and λ^|x_i - x| if |x_i - x| ≥ 1. Note that λ must lie between 0 and 1.
So, if you had two variables, x_i1 and x_i2, and x_i1 was continuous while x_i2 was, say, binary (0/1), and you created a data frame of the form X <- data.frame(x1, factor(x2)), then the kernel function used by np would be K(·) = k(·) × l(·), where the particular kernel functions k(·) and l(·) would be, say, the second order Gaussian (ckertype="gaussian") and Aitchison and Aitken (ukertype="aitchisonaitken") kernels by default, respectively.
Note that higher order continuous kernels (i.e., fourth, sixth, and eighth order) are derived from the second order kernels given above (see Li and Racine (2007) for details).
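A minimal sketch of overriding these defaults when computing a bandwidth object (hypothetical data; the arguments ckertype, ckerorder, and ukertype are those described in npudensbw):

x1 <- rnorm(100)
x2 <- factor(rbinom(100, 1, 0.5))
X <- data.frame(x1, x2)
# fourth-order Epanechnikov kernel for the continuous variable and the
# Li and Racine kernel for the unordered factor
bw <- npudensbw(dat=X, ckertype="epanechnikov", ckerorder=4, ukertype="liracine")
summary(bw)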
For particulars on any given method, kindly see the references listed for the method in question.
Author(s)
Tristen Hayfield
Maintainer: Jeffrey S. Racine
We are grateful to John Fox and Achim Zeileis for their valuable input and encouragement. We would like to gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada (NSERC:nserc.ca), the Social Sciences and Humanities Research Council of Canada (SSHRC:sshrc.ca), and the Shared Hierarchical Academic Research Computing Network (SHARCNET:sharcnet.ca).
References
Aitchison, J. and C.G.G. Aitken (1976), "Multivariate binary discrimination by the kernel method," Biometrika, 63, 413-420.
Hall, P. and J.S. Racine and Q. Li (2004), "Cross-validation and the estimation of conditional probability densities," Journal of the American Statistical Association, 99, 1015-1026.
Li, Q. and J.S. Racine (2003), "Nonparametric estimation of distributions with categorical and continuous data," Journal of Multivariate Analysis, 86, 266-292.
Li, Q. and J.S. Racine (2004), "Cross-validated local linear nonparametric regression," Statistica Sinica, 14, 485-512.
Ouyang, D. and Q. Li and J.S. Racine (2006), "Cross-validation and the estimation of probability distributions with categorical data," Journal of Nonparametric Statistics, 18, 69-100.
Racine, J.S. and Q. Li (2004), "Nonparametric estimation of regression functions with both categorical and continuous data," Journal of Econometrics, 119, 99-130.
Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.
Pagan, A. and A. Ullah (1999), Nonparametric Econometrics, Cambridge University Press.
Scott, D.W. (1992), Multivariate Density Estimation: Theory, Practice and Visualization, New York: Wiley.
Silverman, B.W. (1986), Density Estimation, London: Chapman and Hall.
Wang, M.C. and J. van Ryzin (1981), "A class of smooth estimators for discrete distributions," Biometrika, 68, 301-309.
npcmstest Kernel Consistent Model Specification Test with Mixed Data
Description
npcmstest implements a consistent test for correct specification of parametric regression models (linear or nonlinear) as described in Hsiao, Li, and Racine (forthcoming).
Usage
npcmstest(formula,
data=NULL,
subset,
xdat,
ydat,
model=stop(paste(sQuote("model"),"has not been provided")),
distribution=c("bootstrap","asymptotic"),
boot.method=c("iid","wild","wild-rademacher"),
boot.num=399,
pivot=TRUE,
density.weighted=TRUE,
random.seed=42,
...)
Arguments
formula a symbolic description of variables on which the test is to be performed. The details of constructing a formula are described below.
data an optional data frame, list or environment (or object coercible to a data frame by as.data.frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which the function is called.
subset an optional vector specifying a subset of observations to be used.
model a model object obtained from a call to lm (or glm). Important: the call to either glm or lm must have the arguments x=TRUE and y=TRUE or npcmstest will not work.
xdat a p-variate data frame of explanatory data (training data) used to calculate the regression estimators.
ydat a one (1) dimensional numeric or integer vector of dependent data, each element i corresponding to each observation (row) i of xdat.
distribution a character string used to specify the method of estimating the distribution of the statistic to be calculated. bootstrap will conduct bootstrapping. asymptotic will use the normal distribution. Defaults to bootstrap.
boot.method a character string used to specify the bootstrap method. iid will generate independent identically distributed draws. wild will use a wild bootstrap. wild-rademacher will use a wild bootstrap with Rademacher variables. Defaults to iid.
boot.num an integer value specifying the number of bootstrap replications to use. Defaults to 399.
pivot a logical value specifying whether the statistic should be normalised such that it approaches N(0,1) in distribution. Defaults to TRUE.
density.weighted a logical value specifying whether the statistic should be weighted by the density of xdat. Defaults to TRUE.
random.seed an integer used to seed R's random number generator. This is to ensure replicability. Defaults to 42.
... additional arguments supplied to control bandwidth selection on the residuals. One can specify the bandwidth type, kernel types, and so on. To do this, you may specify any of bwscaling, bwtype, ckertype, ckerorder, ukertype, okertype, as described in npregbw. This is necessary if you specify bws as a p-vector and not a bandwidth object, and you do not desire the default behaviours.
Value
npcmstest returns an object of type cmstest with the following components; components will contain information related to Jn or In depending on the value of pivot:
Jn the statistic Jn
In the statistic In
Omega.hat as described in Hsiao, C. and Q. Li and J.S. Racine.
q.* the various quantiles of the statistic Jn (or In if pivot=FALSE) are in components q.90, q.95, q.99 (one-sided 1%, 5%, 10% critical values)
P the P-value of the statistic
Jn.bootstrap if pivot=TRUE contains the bootstrap replications of Jn
In.bootstrap if pivot=FALSE contains the bootstrap replications of In
summary supports object of type cmstest.
Usage Issues
If you are using data of mixed types, then it is advisable to use the data.frame function to construct your input data and not cbind, since cbind will typically not work as intended on mixed data types and will coerce the data to the same type.
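A minimal illustration of the difference (hypothetical toy data):

x1 <- c(1.2, 3.4)
x2 <- factor(c("a", "b"))
str(cbind(x1, x2))        # cbind() coerces both columns to a single numeric matrix
str(data.frame(x1, x2))   # data.frame() preserves the numeric and factor types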
Author(s)
Tristen Hayfield <hayfield@phys.ethz.ch>, Jeffrey S. Racine <racinej@mcmaster.ca>
References
Aitchison, J. and C.G.G. Aitken (1976), "Multivariate binary discrimination by the kernel method," Biometrika, 63, 413-420.
Hsiao, C. and Q. Li and J.S. Racine (forthcoming), "A consistent model specification test with mixed categorical and continuous data," Journal of Econometrics.
Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.
Maasoumi, E. and J.S. Racine and T. Stengos (2007), "Growth and convergence: a profile of distribution dynamics and mobility," Journal of Econometrics, 136, 483-508.
Murphy, K.M. and F. Welch (1990), "Empirical age-earnings profiles," Journal of Labor Economics, 8, 202-229.
Pagan, A. and A. Ullah (1999), Nonparametric Econometrics, Cambridge University Press.
Wang, M.C. and J. van Ryzin (1981), "A class of smooth estimators for discrete distributions," Biometrika, 68, 301-309.
Examples
# EXAMPLE 1: For this example, we conduct a consistent model
# specification test for a parametric wage regression model that is
# quadratic in age. The work of Murphy and Welch (1990) would suggest
# that this parametric regression model is misspecified.

data("cps71")
attach(cps71)

model <- lm(logwage~age+I(age^2), x=TRUE, y=TRUE)

plot(age, logwage)
lines(age, fitted(model))

# Note - this may take a few minutes depending on the speed of your
# computer...

npcmstest(model=model, xdat=age, ydat=logwage)
## Not run:

# Sleep for 5 seconds so that we can examine the output...

Sys.sleep(5)

# Next try Murphy & Welch's (1990) suggested quintic specification.

model <- lm(logwage~age+I(age^2)+I(age^3)+I(age^4)+I(age^5), x=TRUE, y=TRUE)

plot(age, logwage)
lines(age, fitted(model))

X <- data.frame(age)

# Note - this may take a few minutes depending on the speed of your
# computer...

npcmstest(model=model, xdat=age, ydat=logwage)

# Sleep for 5 seconds so that we can examine the output...

Sys.sleep(5)

# Note - you can pass in multiple arguments to this function. For
# instance, to use local linear rather than local constant regression,
# you would use npcmstest(model, X, regtype="ll"), while you could also
# change the kernel type (default is second order Gaussian), numerical
# search tolerance, or feed in your own vector of bandwidths and so
# forth.

detach(cps71)
# EXAMPLE 2: For this example, we replicate the application in Maasoumi,
# Racine, and Stengos (forthcoming) (see oecdpanel for details). We
# estimate a parametric model that is used in the literature, then
# subject it to the model specification test.

data("oecdpanel")
attach(oecdpanel)

model <- lm(growth~oecd+
            factor(year)+
            initgdp+
            I(initgdp^2)+
            I(initgdp^3)+
            I(initgdp^4)+
            popgro+
            inv+
            humancap+
            I(humancap^2)+
            I(humancap^3)-1,
            x=TRUE,
            y=TRUE)

X <- data.frame(factor(oecd), factor(year), initgdp, popgro, inv, humancap)

# Note - we override the default tolerances for the sake of this example
# (don't of course do this in general). This example may take a few
# minutes depending on the speed of your computer (data-driven bandwidth
# selection is, by its nature, time consuming, while the bootstrapping
# also takes some time).

npcmstest(model=model, xdat=X, ydat=growth, tol=.1, ftol=.1)

detach(oecdpanel)

## End(Not run)
npcdens Kernel Conditional Density and Distribution Estimates with Mixed Datatypes
Description
npcdens computes kernel conditional density estimates on p+q-variate evaluation data, given a set of training data (both explanatory and dependent) and a bandwidth specification (a conbandwidth object or a bandwidth vector, bandwidth type, and kernel type) using the method of Hall, Racine, and Li (2004). Similarly npcdist computes kernel conditional cumulative distribution estimates.
The data may be continuous, discrete (unordered and ordered factors), or some combination thereof.
Usage
npcdens(bws,...)
## S3 method for class 'formula':
npcdens(bws, data=NULL, newdata=NULL, ...)
## S3 method for class 'call':
npcdens(bws, ...)
## S3 method for class 'conbandwidth':
npcdens(bws,
        txdat = stop("invoked without training data 'txdat'"),
        tydat = stop("invoked without training data 'tydat'"),
        exdat,
        eydat,
        gradients = FALSE,
        ...)
## Default S3 method:
npcdens(bws,
        txdat = stop("training data 'txdat' missing"),
        tydat = stop("training data 'tydat' missing"),
        exdat,
        eydat,
        gradients, ...)
Arguments
bws a bandwidth specification. This can be set as a conbandwidth object returned from a previous invocation of npcdensbw, or as a p+q-vector of bandwidths, with each element i up to i=p corresponding to the bandwidth for column i in txdat, and each element i from i=p+1 to i=p+q corresponding to the bandwidth for column i-p in tydat. If specified as a vector, then additional arguments will need to be supplied as necessary to specify the bandwidth type, kernel types, training data, and so on.
gradients a logical value specifying whether to return estimates of the gradients at the evaluation points. Defaults to FALSE.
... additional arguments supplied to specify the bandwidth type, kernel types, and so on. This is necessary if you specify bws as a p+q-vector and not a conbandwidth object, and you do not desire the default behaviours. To do this, you may specify any of bwmethod, bwscaling, bwtype, cxkertype, cxkerorder, cykertype, cykerorder, uxkertype, uykertype, oxkertype, oykertype, as described in npcdensbw.
data an optional data frame, list or environment (or object coercible to a data frame by as.data.frame) containing the variables in the model. If not found in data, the variables are taken from environment(bws), typically the environment from which npcdensbw was called.
newdata An optional data frame in which to look for evaluation data. If omitted, the training data are used.
txdat a p-variate data frame of sample realizations of explanatory data (training data). Defaults to the training data used to compute the bandwidth object.
tydat a q-variate data frame of sample realizations of dependent data (training data). Defaults to the training data used to compute the bandwidth object.
exdat a p-variate data frame of explanatory data on which conditional densities will be evaluated. By default, evaluation takes place on the data provided by txdat.
eydat a q-variate data frame of dependent data on which conditional densities will be evaluated. By default, evaluation takes place on the data provided by tydat.
Details
npcdens and npcdist implement a variety of methods for estimating multivariate conditional distributions (p+q-variate) defined over a set of possibly continuous and/or discrete (unordered, ordered) data. The approach is based on Li and Racine (2004) who employ 'generalized product kernels' that admit a mix of continuous and discrete datatypes.
Three classes of kernel estimators for the continuous datatypes are available: fixed, adaptive nearest-neighbor, and generalized nearest-neighbor. Adaptive nearest-neighbor bandwidths change with each sample realization in the set, x_i, when estimating the density at the point x. Generalized nearest-neighbor bandwidths change with the point at which the density is estimated, x. Fixed bandwidths are constant over the support of x.
Training and evaluation input data may be a mix of continuous (default), unordered discrete (to be specified in the data frames using factor), and ordered discrete (to be specified in the data frames using ordered). Data can be entered in an arbitrary order and data types will be detected automatically by the routine (see np for details).
A variety of kernels may be specified by the user. Kernels implemented for continuous datatypes include the second, fourth, sixth, and eighth order Gaussian and Epanechnikov kernels, and the uniform kernel. Unordered discrete datatypes use a variation on Aitchison and Aitken's (1976) kernel, while ordered datatypes use a variation of the Wang and van Ryzin (1981) kernel.
Value
npcdens returns a condensity object, similarly npcdist returns a condistribution object. The generic accessor functions fitted, se, and gradients extract estimated values, asymptotic standard errors on estimates, and gradients, respectively, from the returned object. Furthermore, the functions summary and plot support objects of both classes. The returned objects have the following components:
xbw bandwidth(s), scale factor(s) or nearest neighbours for the explanatory data, txdat
ybw bandwidth(s), scale factor(s) or nearest neighbours for the dependent data, tydat
xeval the evaluation points of the explanatory data
yeval the evaluation points of the dependent data
condens or condist estimates of the conditional density (cumulative distribution) at the evaluation points
conderr standard errors of the conditional density (cumulative distribution) estimates
congrad if invoked with gradients=TRUE, estimates of the gradients at the evaluation points
congerr if invoked with gradients=TRUE, standard errors of the gradients at the evaluation points
log_likelihood log likelihood of the conditional density estimate
Usage Issues
If you are using data of mixed types, then it is advisable to use the data.frame function to construct your input data and not cbind, since cbind will typically not work as intended on mixed data types and will coerce the data to the same type.
Author(s)
Tristen Hayfield <hayfield@phys.ethz.ch>, Jeffrey S. Racine <racinej@mcmaster.ca>
References
Aitchison, J. and C.G.G. Aitken (1976), "Multivariate binary discrimination by the kernel method," Biometrika, 63, 413-420.
Hall, P. and J.S. Racine and Q. Li (2004), "Cross-validation and the estimation of conditional probability densities," Journal of the American Statistical Association, 99, 1015-1026.
Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.
Pagan, A. and A. Ullah (1999), Nonparametric Econometrics, Cambridge University Press.
Scott, D.W. (1992), Multivariate Density Estimation: Theory, Practice and Visualization, New York: Wiley.
Silverman, B.W. (1986), Density Estimation, London: Chapman and Hall.
Wang, M.C. and J. van Ryzin (1981), "A class of smooth estimators for discrete distributions," Biometrika, 68, 301-309.
See Also
npudens
Examples
# EXAMPLE 1 (INTERFACE=FORMULA): For this example, we load Giovanni
# Baiocchi's Italian GDP panel (see Italy for details), and compute the
# likelihood cross-validated bandwidths (default) using a second-order
# Gaussian kernel (default). Note - this may take a minute or two
# depending on the speed of your computer.

data("Italy")
attach(Italy)

# First, compute the bandwidths... note that this may take a minute or
# two depending on the speed of your computer. We override the default
# tolerances for the search method as the objective function is
# well-behaved (don't of course do this in general).

bw <- npcdensbw(formula=gdp~ordered(year), tol=.1, ftol=.1)

# Next, compute the condensity object...

fhat <- npcdens(bws=bw)

# The object fhat now contains results such as the estimated conditional
# density function (fhat$condens) and so on...

summary(fhat)

## Not run:

# Call the npplot() function to visualize the results (<ctrl>-C will
# interrupt on *NIX systems, <esc> will interrupt on MS Windows
# systems).

npplot(bws=bw)

# To plot the conditional distribution we use cdf=TRUE in npplot
# (<ctrl>-C will interrupt on *NIX systems, <esc> will interrupt on MS
# Windows systems).

npplot(bws=bw, cdf=TRUE)

detach(Italy)
# EXAMPLE 1 (INTERFACE=DATAFRAME): For this example, we load Giovanni
# Baiocchi's Italian GDP panel (see Italy for details), and compute the
# likelihood cross-validated bandwidths (default) using a second-order
# Gaussian kernel (default). Note - this may take a minute or two
# depending on the speed of your computer.

data("Italy")
attach(Italy)

# First, compute the bandwidths... note that this may take a minute or
# two depending on the speed of your computer. We override the default
# tolerances for the search method as the objective function is
# well-behaved (don't of course do this in general).

# Note - we cast `X' and `y' as data frames so that npplot() can
# automatically grab names (this looks like overkill, but in
# multivariate settings you would do this anyway, so may as well get in
# the habit).

X <- data.frame(year=ordered(year))
y <- data.frame(gdp)

bw <- npcdensbw(xdat=X, ydat=y, tol=.1, ftol=.1)

# Next, compute the condensity object...

fhat <- npcdens(bws=bw)

# The object fhat now contains results such as the estimated conditional
# density function (fhat$condens) and so on...

summary(fhat)

# Call the npplot() function to visualize the results (<ctrl>-C will
# interrupt on *NIX systems, <esc> will interrupt on MS Windows
# systems).

npplot(bws=bw)

# To plot the conditional distribution we use cdf=TRUE in npplot
# (<ctrl>-C will interrupt on *NIX systems, <esc> will interrupt on MS
# Windows systems).

npplot(bws=bw, cdf=TRUE)

detach(Italy)
# EXAMPLE 2 (INTERFACE=FORMULA): For this example, we load the old
# faithful geyser data from the R `datasets' library and compute the
# conditional density and conditional distribution functions.

library("datasets")
data("faithful")
attach(faithful)

# Note - this may take a few minutes depending on the speed of your
# computer...

bw <- npcdensbw(formula=eruptions~waiting)

summary(bw)

# Plot the density function (<ctrl>-C will interrupt on *NIX systems,
# <esc> will interrupt on MS Windows systems).

npplot(bws=bw)

# Plot the distribution function (cdf=TRUE) (<ctrl>-C will interrupt on
# *NIX systems, <esc> will interrupt on MS Windows systems).

npplot(bws=bw, cdf=TRUE)

detach(faithful)
# EXAMPLE 2 (INTERFACE=DATAFRAME): For this example, we load the old
# faithful geyser data from the R `datasets' library and compute the
# conditional density and conditional distribution functions.

library("datasets")
data("faithful")
attach(faithful)

# Note - this may take a few minutes depending on the speed of your
# computer...

# Note - we cast `X' and `y' as data frames so that npplot() can
# automatically grab names (this looks like overkill, but in
# multivariate settings you would do this anyway, so may as well get in
# the habit).

X <- data.frame(waiting)
y <- data.frame(eruptions)

bw <- npcdensbw(xdat=X, ydat=y)

summary(bw)

# Plot the density function (<ctrl>-C will interrupt on *NIX systems,
# <esc> will interrupt on MS Windows systems).

npplot(bws=bw)

# Plot the distribution function (cdf=TRUE) (<ctrl>-C will interrupt on
# *NIX systems, <esc> will interrupt on MS Windows systems).

npplot(bws=bw, cdf=TRUE)

detach(faithful)
# EXAMPLE 3 (INTERFACE=FORMULA): Replicate the DGP of Klein & Spady
# (1993) (see their description on page 405, pay careful attention to
# footnote 6 on page 405).

set.seed(123)

n <- 1000

# x1 is chi-squared having 3 df truncated at 6 standardized by
# subtracting 2.348 and dividing by 1.511

x <- rchisq(n, df=3)
x1 <- (ifelse(x < 6, x, 6) - 2.348)/1.511

# x2 is normal (0,1) truncated at +-2 divided by 0.8796

x <- rnorm(n)
x2 <- ifelse(abs(x) < 2, x, 2)/0.8796

# y is 1 if y* > 0, 0 otherwise.

y <- ifelse(x1 + x2 + rnorm(n) > 0, 1, 0)

# Generate data-driven bandwidths (likelihood cross-validation). We
# override the default tolerances for the search method as the objective
# function is well-behaved (don't of course do this in general). Note -
# this may take a few minutes depending on the speed of your computer...

bw <- npcdensbw(formula=factor(y)~x1+x2, tol=.1, ftol=.1)

# Next, create the evaluation data in order to generate a perspective
# plot

x1.seq <- seq(min(x1), max(x1), length=50)
x2.seq <- seq(min(x2), max(x2), length=50)
X.eval <- expand.grid(x1=x1.seq, x2=x2.seq)
data.eval <- data.frame(y=factor(rep(1, nrow(X.eval))),
                        x1=X.eval[,1],
                        x2=X.eval[,2])

# Now evaluate the conditional probability for y=1 and for the
# evaluation Xs

fit <- fitted(npcdens(bws=bw, newdata=data.eval))

# Finally, coerce the data into a matrix for plotting with persp()

fit.mat <- matrix(fit, 50, 50)

# Generate a perspective plot similar to Figure 2b of Klein and Spady
# (1993)
persp(x1.seq,
x2.seq,
fit.mat,