1. Econometrics

Econometrics is the social science in which the tools of economic theory, mathematics, and statistical inference are applied to the analysis of economic phenomena.

Put differently, econometrics is the result of a certain outlook on the role of economics: it consists of the application of mathematical statistics to economic data to lend empirical support to the models constructed by mathematical economics and to obtain numerical results.

2. Econometric analysis proceeds along the following lines

1) Creating a statement of theory or hypothesis.
2) Collecting data.
3) Specifying the mathematical model of theory.
4) Specifying the statistical, or econometric, model of theory.
5) Estimating the parameters of the chosen econometric model.
6) Checking for model adequacy: model specification testing.
7) Testing the hypothesis derived from the model.
8) Using the model for prediction or forecasting.

• Step 2: Collecting data

• Three types of data are available for analysis:

1) Time series data: collected over a period of time, usually at regular intervals.

2) Cross-sectional data: collected on one or more variables at a single point in time.

3) Pooled data: a combination of time series and cross-sectional data.

• Step 3: Specifying the mathematical model

1. Plot a scatter diagram, or scattergram.
2. Write the mathematical model.

• Step 4: Specifying the statistical, or econometric, model

• CLFPR (civilian labor force participation rate) is the dependent variable.
• CUNR (civilian unemployment rate) is the independent, or explanatory, variable.
• We add a catchall variable U to stand for all the neglected factors.
• In linear regression analysis our primary objective is to explain the behavior of the dependent variable in relation to the behavior of one or more other variables, allowing for the fact that the relationship between them is inexact.

• Step 5: Estimating the parameters of the econometric model

• In short, the estimated regression line gives the relationship between average CLFPR and CUNR, that is, how the dependent variable responds, on average, to a unit change in the independent variable.

• Step 6: Checking for model adequacy: model specification testing

The purpose of developing an econometric model is not to capture total reality, but just its salient features.

• Step 7: Testing the hypothesis derived from the model

Why do we perform hypothesis testing?

We want to find out whether the estimated model makes economic sense and whether the results obtained conform with the underlying economic theory.


1. The meaning of regression

Regression analysis is concerned with the study of the relationship between one variable, called the dependent or explained variable, and one or more other variables, called independent or explanatory variables.

2. Objectives of regression

1) Estimate the mean, or average, value of the dependent variable given the values of the independent variables.
2) Test hypotheses about the nature of the dependence, that is, hypotheses suggested by the underlying economic theory.
3) Predict or forecast the mean value of the dependent variable given the values of the independent variables.
4) One or more of the preceding objectives combined.

3. Population Regression Line (PRL)

In short, the PRL tells us how the mean, or average, value of Y is related to each value of X in the whole population.

4. The dependence of Y on X is technically called the regression of Y on X.

5. How do we explain it?

The S.A.T. score of a student, say the ith individual, corresponding to a specific family income can be expressed as the sum of two components:

1) The systematic, or deterministic, component: the mean score at that income level.
2) The nonsystematic, or random, component, U.

6. What is the nature of the U (stochastic error) term?

1) The error term may represent the influence of those variables that are not explicitly included in the model.

2) Some intrinsic randomness in the math score is bound to occur that cannot be explained even if we include all relevant variables.

3) U may also represent errors of measurement.

4) The principle of Ockham’s razor, that a description be kept as simple as possible until proved inadequate, suggests that we keep our regression model as simple as possible.

7. How do we estimate the PRF (population regression function)?

Unfortunately, in practice we rarely have the entire population at our disposal; often we have only a sample from this population. The regression function estimated from that sample is the sample regression function (SRF).

8. Granted that the SRF is only an approximation of the PRF, can we find a method or procedure that will make this approximation as close as possible?

9. Special meaning of “linear”

1) Linearity in the variables

The conditional mean value of the dependent variable is a linear function of the independent variables.

2) Linearity in the parameters

The conditional mean of the dependent variable is a linear function of the parameters, the B’s; it may or may not be linear in the variables.
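The distinction between the two kinds of linearity can be illustrated with a small sketch. The model below, Y = B1 + B2·X^2 with B1 = 2 and B2 = 3, is a hypothetical example that does not come from the notes: it is nonlinear in the variable X but linear in the parameters, so ordinary least squares still applies once X^2 is treated as the regressor. The helper name `ols_two_variable` is likewise only illustrative.

```python
# Sketch: a model that is linear in the parameters but not in the variables.
# Hypothetical model (an assumption, not from the notes): Y = 2 + 3 * X^2.

def ols_two_variable(x, y):
    """Return (b1, b2) from the OLS fit of y = b1 + b2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2

X = [1, 2, 3, 4, 5, 6]
Y = [2 + 3 * xi ** 2 for xi in X]   # nonlinear in X, linear in B1 and B2

Z = [xi ** 2 for xi in X]           # transformed regressor Z = X^2
b1, b2 = ols_two_variable(Z, Y)     # regress Y on Z, not on X
print(b1, b2)
```

Because the data were generated without an error term, the fit on the transformed regressor recovers the two assumed parameters up to rounding.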


1. Unless we are willing to assume how the stochastic U terms are generated, we will not be able to tell how good an SRF is as an estimate of the true PRF.

2. Classical Linear Regression Model (CLRM)

1) Assumption 1: The regression model is linear in the parameters. It may or may not be linear in the variables.

2) Assumption 2: The explanatory variable X is uncorrelated with the disturbance term U. The X’s are nonstochastic; U is stochastic.

3) Assumption 3: Given the value of Xi, the expected, or mean, value of the disturbance term U is zero. The disturbance U represents all those factors that are not specifically introduced in the model.

4) Assumption 4: The variance of each Ui is constant, or homoscedastic.

• Homoscedasticity:

a. This assumption simply means that the conditional distribution of each Y population corresponding to the given value of X has the same variance.

b. The individual Y values are spread around their mean values with the same variance.

5) Assumption 5: There is no correlation between two error terms. This is the assumption of no autocorrelation.

6) Assumption 6: The regression model is correctly specified; there is no specification bias or specification error in the model.

• This assumption can be explained informally as follows. An econometric investigation begins with the specification of the econometric model underlying the phenomenon of interest.

3. Variances and standard errors of OLS estimators

One immediate result of the assumptions introduced is that they enable us to estimate the variances and standard errors of the OLS estimators given in Eqs. (2.16) and (2.17).

4. We should know:

• Variances of the estimators
• Standard errors of the estimators

5. What is the value of σ^2?

• The homoscedastic σ^2 is estimated from the formula σ-hat^2 = Σei^2 / (n-2), where the ei are the residuals and (n-2) is the degrees of freedom.

6. Standard Error of the Regression (SER)

• It is simply the standard deviation of the Y values about the estimated regression line.

7. Summary of the math S.A.T. score function

1) Interpretation

• The standard deviation, or standard error, of b2 is 0.000245; it is a measure of the variability of b2 from sample to sample.

• If we can say that our computed b2 lies within a certain number of standard deviation units from the true B2, we can state with some confidence how good the computed SRF is as an estimator of the true PRF.

2) Sampling distribution

Once we determine the sampling distribution of our two estimators, the task of hypothesis testing becomes straightforward.

8. Why do we use OLS?

• The properties of OLS estimators

• The method of OLS is popular not only because it is easy to use but also because it has some strong theoretical properties.

9. Gauss-Markov theorem

Given the assumptions of the classical linear regression model (CLRM), the OLS estimators have minimum variance in the class of linear unbiased estimators; that is, the OLS estimators are BLUE (best linear unbiased estimators).

10. BLUE property

1) b1 and b2 are linear estimators.

2) They are unbiased, that is, E(b1) = B1 and E(b2) = B2.

3) The OLS estimator of the error variance is unbiased.

4) b1 and b2 are efficient estimators: Var(b1) is less than the variance of any other linear unbiased estimator of B1, and Var(b2) is less than the variance of any other linear unbiased estimator of B2.

11. Monte Carlo simulation

• Do the experiment in the lab.

• In Excel: =NORMINV(RAND(),0,2)

• In MATLAB: norminv(rand(),MU,SIGMA)

• In Stata: invnorm(uniform())
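The same experiment can be sketched in Python using only the standard library. Everything numeric below is an assumption made for illustration: the true parameters B1 = 1.5 and B2 = 2.0, the fixed X values 1 to 10, and the 2000 replications. The error standard deviation of 2 mirrors the Excel call NORMINV(RAND(),0,2). The point of the sketch is the unbiasedness property: averaged over many samples, b2 should be close to the true B2.

```python
import random

random.seed(42)                  # fixed seed so the run is reproducible

B1_TRUE, B2_TRUE = 1.5, 2.0      # hypothetical true parameters (assumed)
SIGMA = 2.0                      # error s.d., as in NORMINV(RAND(),0,2)
X = list(range(1, 11))           # X values held fixed in repeated sampling
REPLICATIONS = 2000

def ols_two_variable(x, y):
    """Return (b1, b2) from the OLS fit of y = b1 + b2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2

b2_estimates = []
for _ in range(REPLICATIONS):
    # Draw a fresh set of normal errors and build a new sample of Y values.
    y = [B1_TRUE + B2_TRUE * xi + random.gauss(0, SIGMA) for xi in X]
    _, b2 = ols_two_variable(X, y)
    b2_estimates.append(b2)

mean_b2 = sum(b2_estimates) / REPLICATIONS
print(f"average b2 over {REPLICATIONS} samples: {mean_b2:.4f} (true B2 = {B2_TRUE})")
```

Individual estimates vary from sample to sample, which is exactly the sampling variability that the standard error of b2 measures.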

12. Central Limit Theorem (CLT)

If there is a large number of independent and identically distributed (iid) random variables, then, with a few exceptions, the distribution of their sum tends to be a normal distribution as the number of such variables increases indefinitely.

13. Recall

U, the error term, represents the influence of all those forces that affect Y but are not specifically included in the regression model, because there are so many of them and the individual effect of any one such force on Y may be too minor.

If all these forces are random, and if we let U represent the sum of all these forces, then by invoking the CLT we can assume that the error term U follows the normal distribution.

14. Another property of the normal distribution

Any linear function of a normally distributed variable is itself normally distributed.

15. Hypothesis testing

Having determined the distribution of the OLS estimators b1 and b2, we can proceed to the topic of hypothesis testing.

16. Null hypothesis

The “zero” null hypothesis is deliberately chosen to find out whether Y is related to X at all; it is also called the straw man hypothesis.

17. We need some formal testing procedure to reject or accept the null hypothesis and to silence the skeptics.

18. If our null hypothesis is B2 = 0 and the computed b2 = 0.0013, we can find the probability of obtaining such a value from Z, the standard normal distribution. If the probability is very small, we can reject the null hypothesis. If the probability is larger, say, greater than 10 percent, we may not reject the null hypothesis.
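The calculation in item 18 can be sketched as follows. The values b2 = 0.0013 and se(b2) = 0.000245 are the ones quoted in these notes; the two-sided form of the test and the 5 percent significance level are assumptions chosen for illustration. The sketch uses the standard normal distribution, as item 18 does; when σ^2 has to be estimated, the t distribution would be used instead.

```python
from statistics import NormalDist

b2 = 0.0013          # computed slope, from the notes
se_b2 = 0.000245     # standard error of b2, from the notes
B2_NULL = 0.0        # value of B2 under the null hypothesis

# Distance of b2 from the hypothesized value, in standard-error units.
z = (b2 - B2_NULL) / se_b2

# Two-sided probability of a value at least this extreme under the null.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

ALPHA = 0.05         # assumed significance level
reject_null = p_value < ALPHA
print(f"z = {z:.3f}, p value = {p_value:.2e}, reject H0: {reject_null}")
```

The computed b2 lies more than five standard errors from zero, so the probability of such a value under the null hypothesis is very small and the null is rejected.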

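19. We do not know σ^2

To use the normal distribution we must know the true σ^2. We do not, but we can estimate it by σ-hat^2.

20. What happens if we replace σ by its estimator σ-hat?

The resulting test statistic follows the t distribution with (n-2) d.f. instead of the standard normal distribution.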





Yt = E(Yt) + ut

Any individual Y value can be expressed as the sum of two components:

• a systematic, or deterministic, component, (B1 + B2X2t + B3X3t), which is simply its mean value E(Yt);

• ut, the nonsystematic, or random, component, determined by factors other than X2 and X3.

3. The meaning of partial regression coefficient

The regression coefficients B2 and B3 are known as partial regression, or partial slope, coefficients.

① The meaning of a partial regression coefficient is as follows: B2 measures the change in the mean value of Y, E(Y), per unit change in X2, holding the value of X3 constant.

② Likewise, B3 measures the change in the mean value of Y per unit change in X3, holding the value of X2 constant.

③ Uniqueness

In the multiple regression model we want to find out what part of the change in the average value of Y can be directly attributed to X2 and what part to X3.

An example:

• The meaning of B2

B2 = -1.2 indicates that the mean value of Y decreases by 1.2 per unit increase in X2 when X3 is held constant; in this example it is held constant at the value of 10.

• The meaning of B3

Here the slope coefficient B3 = 0.8 means that the mean value of Y increases by 0.8 per unit increase in X3 when X2 is held constant; here it is held constant at the value of 5.
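The "holding constant" reading of B2 and B3 can be checked numerically. The slopes -1.2 and 0.8 and the held values X3 = 10 and X2 = 5 come from the example above; the intercept B1 = 15 is an assumed value used only to make the function computable, and it plays no role in the differences.

```python
# Population regression function E(Y) = B1 + B2*X2 + B3*X3.
# B2 and B3 are from the notes; B1 = 15 is an assumption for illustration.
B1, B2, B3 = 15.0, -1.2, 0.8

def mean_y(x2, x3):
    """Mean value of Y for given values of X2 and X3."""
    return B1 + B2 * x2 + B3 * x3

# Raise X2 by one unit while X3 stays at 10.
effect_x2 = mean_y(6, 10) - mean_y(5, 10)

# Raise X3 by one unit while X2 stays at 5.
effect_x3 = mean_y(5, 11) - mean_y(5, 10)

print(round(effect_x2, 4), round(effect_x3, 4))
```

Repeating the first difference with X3 held at any other value gives the same change in E(Y), which is what makes B2 a partial slope.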

4. In short, a partial regression coefficient reflects the (partial) effect of one explanatory variable on the mean value of the dependent variable when the values of the other explanatory variables included in the model are held constant.

5. Uniqueness

This unique feature of multiple regression enables us not only to include more than one explanatory variable in the model but also to “isolate” or “disentangle” the effect of each X variable on Y from the other X variables included in the model.


6. Assumptions of the multiple linear regression model

In order to estimate the regression coefficients of the multiple regression model, we will continue to operate within the framework of the classical linear regression model (CLRM) and use ordinary least squares (OLS) to estimate the coefficients.

A4.1 The regression model is linear in the parameters and is correctly specified.

A4.2 X2 and X3 are uncorrelated with the disturbance term U. If X2 and X3 are nonstochastic, this assumption is automatically fulfilled.

A4.3 The error term U has a zero mean value: E(ui) = 0.

A4.4 Homoscedasticity: the variance of U is constant, Var(ui) = σ^2.

A4.5 No autocorrelation exists between the error terms ui and uj.

A4.6 No exact collinearity exists between X2 and X3.

There is no exact linear relationship between the two explanatory variables. Cov(X2,X3)?0

A4.7 The error term U follows the normal distribution with mean zero and variance σ^2.

7. Why do we make assumptions?

We make these assumptions to facilitate the development of the subject and to ensure that OLS can be used to estimate the parameters of the model.

8. No multicollinearity

There is no exact linear relationship between the explanatory variables X2 and X3. This is the assumption of no collinearity, or no multicollinearity. No perfect collinearity means that a variable, say X2, cannot be expressed as an exact linear function of another variable, say X3.

9. Troublesome

• This is one equation with two unknowns; we need two (independent) equations to obtain unique estimates of B2 and B3. Here A denotes the single coefficient that combines B2 and B3 once one regressor is written as an exact linear function of the other, so we have only one A but two B’s to solve for.

• Even if we can obtain an estimate of A, there is no way to get individual estimates of B2 and B3 from the estimated A.

• We cannot assess the individual effects of X2 and X3 on Y. But this is hardly surprising, for we really do not have two independent variables in the model.
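A small numerical sketch shows why individual estimates break down under perfect collinearity. The data and the exact relation X3 = 2·X2 are assumptions made for illustration. In the three-variable model, the OLS slope formulas share the denominator (Σx2^2)(Σx3^2) - (Σx2x3)^2, where lowercase letters denote deviations from sample means; with an exact linear relation this denominator is zero, so b2 and b3 are undefined.

```python
# Hypothetical regressors with an exact linear relation (assumed): X3 = 2 * X2.
X2 = [1.0, 2.0, 3.0, 4.0, 5.0]
X3 = [2.0 * v for v in X2]

def deviations(values):
    """Return the values expressed as deviations from their mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

x2, x3 = deviations(X2), deviations(X3)

s22 = sum(a * a for a in x2)               # sum of squared deviations of X2
s33 = sum(b * b for b in x3)               # sum of squared deviations of X3
s23 = sum(a * b for a, b in zip(x2, x3))   # sum of cross products

# Common denominator of the OLS formulas for b2 and b3.
denominator = s22 * s33 - s23 ** 2
print(denominator)
```

A zero denominator means the normal equations have no unique solution: only the combined coefficient can be estimated, not B2 and B3 separately.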

10. OLS principle

The OLS principle chooses the values of the unknown parameters in such a way that the residual sum of squares (RSS), Σet^2, is as small as possible.

11. BLUE: Under the assumed conditions the OLS estimators are best linear unbiased estimators.

Each regression coefficient estimated by OLS is linear and unbiased; on average it coincides with the true value.

Among all such linear unbiased estimators, the OLS estimators have the least possible variance, so that the true parameter can be estimated more accurately than by competing linear unbiased estimators. In short, the OLS estimators are efficient.

12. In the two-variable case we saw that r^2 measures the goodness of fit of the fitted sample regression line (SRL).

13. In the three-variable case, we would like to know the proportion of the total variation in Y (Σyt^2) explained by X2 and X3 jointly. This proportion is measured by the multiple coefficient of determination, R^2.

14. In the multiple regression model, R can be interpreted as the degree of linear association between Y and all the X variables jointly.

15. Antique clock auction revisited (EViews)

Let Y = auction price, X2 = age of clock, X3 = number of bidders.

16. Interpretation of the results

The slope coefficient of about 12.74 (b2) means that, holding other variables constant, if the age of the clock goes up by a year, the average price of the clock will go up by about $12.74.

R^2 = 0.8906, F = 118.0585

17. The test of significance approach

① Develop a test statistic.
② Find out its sampling distribution.
③ Choose a level of significance α.
④ Determine the critical value(s) of the test statistic at the chosen level of significance.
⑤ Compare the value of the test statistic obtained from the sample at hand with the critical value(s).
⑥ Reject the null hypothesis if the computed value of the test statistic exceeds the critical value(s).

18. If the test statistic has a negative value, we consider its absolute value: if the absolute value of the test statistic exceeds the critical value, we reject the null hypothesis.

19. Alternatively, we can find the p value of the test statistic and reject the null hypothesis if the p value is smaller than the chosen α value.


20. Testing the joint hypothesis that B2 = B3 = 0, or R^2 = 0

Null hypothesis: H0: B2 = B3 = 0

① This null hypothesis is a joint hypothesis that B2 and B3 are jointly or simultaneously (rather than individually) equal to zero.

② This hypothesis states that the two explanatory variables together have no influence on Y.

③ This is the same as saying that H0: R^2 = 0.

• The temptation here is to state that since individually b2 and b3 are statistically different from zero in the present example, then jointly or collectively they also must be statistically different from zero, that is, that we reject the null hypothesis.

• In other words, since the age of the antique clock and the number of bidders at the auction each have a significant effect on the auction price, together they also must have a significant effect on the auction price.

• This reasoning can fail. When multicollinearity exists in a multiple regression, one or more variables may individually have no significant effect on the dependent variable while collectively they have a significant impact on it.

• This means that the t-testing procedure discussed previously, although valid for testing the statistical significance of an individual regression coefficient, is not valid for testing the joint hypothesis.

22. F test statistic (likely a short exam question worth 10 points)

• F is the ratio of the variance of Y explained by the regression to the variance not explained by it. It follows the F distribution with 2 and (n-3) d.f. in the numerator and denominator, respectively.

• In general, if the regression model has k explanatory variables including the intercept term, the F ratio has (k-1) d.f. in the numerator and (n-k) d.f. in the denominator.

• How can we use the F ratio to test the joint hypothesis that both X2 and X3 have no impact on Y? The answer is evident. If the numerator is larger than the denominator, that is, if the variance of Y explained by the regression (i.e., by X2 and X3) is larger than the variance not explained by the regression, the F ratio is greater than 1.

• Therefore, as the variance explained by the X variables becomes increasingly larger relative to the unexplained variance, the F ratio will be increasingly larger, too.

• Thus an increasingly large F ratio is evidence against the null hypothesis that the two (or more) explanatory variables have no effect on Y.

• We compare the computed F value with the critical F value for 2 and (n-3) d.f. at the chosen level of α, the probability of committing a type I error. If the computed F exceeds the critical F, we reject the null hypothesis.

23. F and R^2

• The relation F = [R^2 / (k-1)] / [(1 - R^2) / (n-k)] shows how F and R^2 are related. The two statistics vary directly: when R^2 = 0 (i.e., no relationship between Y and the X variables), F is zero ipso facto.

• The larger R^2 is, the greater the F value will be.

• In the limit, when R^2 = 1, the F value is infinite.

• Thus the F test discussed earlier, which is a measure of the overall significance of the estimated regression line, is also a test of significance of R^2, that is, of whether R^2 is different from zero.

• In other words, testing the null hypothesis that all slope coefficients are equal to zero is equivalent to testing the null hypothesis that the (population) R^2 is zero.
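The link between F and R^2 can be sketched with the standard relation F = [R^2 / (k-1)] / [(1 - R^2) / (n-k)], where k counts the intercept. R^2 = 0.8906 is the value reported for the clock auction regression in these notes; the sample size n = 32 is not stated here and is an assumption, chosen because it is consistent with the reported F of about 118. The function name `f_from_r2` is only illustrative.

```python
def f_from_r2(r2, n, k):
    """F statistic for H0: all slope coefficients are zero.

    r2 : unadjusted R^2 of the regression
    n  : number of observations
    k  : number of estimated coefficients, intercept included
    """
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

# Clock auction regression: R^2 from the notes, n = 32 assumed.
f_clock = f_from_r2(0.8906, n=32, k=3)

# With R^2 = 0 the F statistic is zero as well.
f_zero = f_from_r2(0.0, n=32, k=3)

print(f"F (clock auction) = {f_clock:.2f}, F when R^2 is zero = {f_zero}")
```

The small gap between this value and the reported 118.0585 comes from using R^2 rounded to four decimals.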

24. Specification error

In our multiple regression we found that both the age of the clock and the number of bidders were individually as well as collectively important influences on the auction price.

25. Comparing two R^2 values: the adjusted R^2

By examining the R^2 values of the two two-variable models and the three-variable model, we find that R^2 is 0.5325 for the age model and 0.1549 for the bidders model; both are smaller than that of the three-variable model (0.8906).


26. Features of the adjusted R^2 (R^2 bar)

1. If k > 1, R^2 bar ≤ R^2; that is, as the number of explanatory variables in a model increases, the adjusted R^2 becomes increasingly smaller than the unadjusted R^2. There seems to be a “penalty” involved in adding more explanatory variables to a regression model.

2. Although the unadjusted R^2 is never negative, the adjusted R^2 can on occasion turn out to be negative. This follows from the form of its formula.
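Both features can be checked with the usual formula R^2 bar = 1 - (1 - R^2)(n-1)/(n-k), where k counts the intercept. In the sketch below, R^2 = 0.8906 is the clock auction value from these notes, n = 32 is an assumed sample size, and the second call uses purely hypothetical numbers picked to produce a negative result.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k coefficients (intercept included)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

# Clock auction regression: R^2 from the notes, n = 32 assumed.
adj_clock = adjusted_r2(0.8906, n=32, k=3)

# Hypothetical weak fit on a small sample: the adjustment pushes it below zero.
adj_weak = adjusted_r2(0.02, n=10, k=3)

print(f"adjusted R^2 (clock auction) = {adj_clock:.4f}")
print(f"adjusted R^2 (weak fit)      = {adj_weak:.4f}")
```

The penalty factor (n-1)/(n-k) grows with k, which is why adding regressors that explain little can lower the adjusted value.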

27. When does the adjusted R^2 increase? (When to add a new explanatory variable)

R^2 bar will increase if the |t| (absolute t) value of the coefficient of the added variable is larger than 1, where the t value is computed under the null hypothesis that the population value of the said coefficient is zero.

28. Some interesting facts

• If we square the t value of 5.8457, we get (5.8457)^2 = 34.1722, which is about the same as the F value of 34.1723 shown before.

• This is not surprising, because the square of a t statistic with (n-k) d.f. is an F statistic with 1 and (n-k) d.f.


29. The null hypothesis here is that the restrictions imposed by the restricted model are valid.

30. If the F value computed from the sample exceeds the critical F value at the chosen level of significance, we reject the restricted regression. That is, in this situation, the restrictions imposed by the restricted model are not valid.
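One common way to compute this F statistic, when the restricted and unrestricted models have the same dependent variable, is F = [(R^2_ur - R^2_r) / m] / [(1 - R^2_ur) / (n-k)], where m is the number of restrictions and k the number of coefficients in the unrestricted model. The sketch below applies it to the clock auction figures quoted in these notes, treating the age-only model (R^2 = 0.5325) as the restricted model and the three-variable model (R^2 = 0.8906) as the unrestricted one. The sample size n = 32 and the approximate 5 percent critical value of 4.18 for 1 and 29 d.f. are assumptions supplied for illustration.

```python
def restricted_f(r2_ur, r2_r, m, n, k):
    """F statistic for testing m restrictions.

    r2_ur : R^2 of the unrestricted model
    r2_r  : R^2 of the restricted model (same dependent variable)
    m     : number of restrictions
    n     : number of observations
    k     : number of coefficients in the unrestricted model
    """
    return ((r2_ur - r2_r) / m) / ((1 - r2_ur) / (n - k))

# Restriction tested: the coefficient on the number of bidders is zero.
f_stat = restricted_f(r2_ur=0.8906, r2_r=0.5325, m=1, n=32, k=3)

F_CRITICAL = 4.18    # approximate 5% critical value for (1, 29) d.f., assumed
reject_restriction = f_stat > F_CRITICAL

print(f"F = {f_stat:.2f}, reject the restricted model: {reject_restriction}")
```

A computed F far above the critical value indicates that dropping the number of bidders is not a valid restriction for these figures.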



1. In this textbook, our concern is with models that are linear in parameters (LIP).

2. Within the confines of LIP models:

① log-linear, or constant elasticity, models
② semilog models
③ reciprocal models
④ polynomial regression models
⑤ regression-through-the-origin, or zero-intercept, models

3. How to measure elasticity: the log-linear model

To ease the algebra, we will introduce the error term ui later.


1. Qualitative variables and dummy variables

One method of “quantifying” qualitative attributes is to construct artificial variables that take on values of 0 or 1, with 1 indicating the presence (or possession) of the attribute and 0 its absence. Such variables are known as dummy variables.

• For example, 1 may indicate that a person is a female and 0 may designate a male; or 1 may indicate that a person is a college graduate and 0 that he or she is not; or 1 may indicate membership in the Democratic party and 0 membership in the Republican party.
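A minimal sketch of how such a 0/1 variable is built and used is given below. The wage figures and the graduate indicator are hypothetical data invented for illustration. Regressing Y on a single dummy D gives an intercept equal to the mean of the group with D = 0 and a slope equal to the difference between the two group means.

```python
# Hypothetical data (assumed): hourly wage and college-graduate status.
wages = [12.0, 15.0, 11.0, 20.0, 22.0, 19.0, 21.0]
is_graduate = [False, False, False, True, True, True, True]

# Dummy variable: 1 if the attribute is present, 0 otherwise.
D = [1 if g else 0 for g in is_graduate]

def ols_two_variable(x, y):
    """Return (b1, b2) from the OLS fit of y = b1 + b2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2

b1, b2 = ols_two_variable(D, wages)

# Group means for comparison with the regression coefficients.
group0 = [w for w, d in zip(wages, D) if d == 0]
group1 = [w for w, d in zip(wages, D) if d == 1]
mean_0 = sum(group0) / len(group0)
mean_1 = sum(group1) / len(group1)

print(f"intercept = {b1:.4f}, mean of D=0 group = {mean_0:.4f}")
print(f"slope = {b2:.4f}, difference in means = {mean_1 - mean_0:.4f}")
```

The category coded 0 serves as the benchmark, and the dummy coefficient is read relative to it.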


1. The attributes of a good model

• Parsimony

A model can never completely capture reality; some amount of abstraction or simplification is inevitable in any model building. Occam’s razor, or the principle of parsimony, suggests that a model be kept as simple as possible.

• Identifiability

This means that, for a given set of data, the estimated parameters must have unique values or, what amounts to the same thing, there is only one estimate per parameter.

• Goodness of fit

Since the basic thrust of regression analysis is to explain as much of the variation in the dependent variable as possible by the explanatory variables included in the model, a model is judged to be good if this explanation, as measured, say, by the adjusted R^2, is as high as possible.

• Theoretical consistency

No matter how high the goodness-of-fit measures, a model may not be judged to be good if one or more coefficients have the wrong signs. In short, in constructing a model we should have some theoretical underpinning for it; “measurement without theory” often leads to very disappointing results.

• Predictive power

The only relevant test of the validity of a hypothesis (model) is comparison of its predictions with experience.

2. In particular, we will discuss the following specification errors:

1) Omission of a relevant variable(s)
2) Inclusion of an unnecessary variable(s)
3) Adopting the wrong functional form
4) Errors of measurement


1. Introduction

In chapter 4 we noted that one of the assumptions of the classical linear regression model (CLRM) is that there is no perfect multicollinearity: no exact linear relationships among the explanatory variables, the X’s, included in a multiple regression.

2. Interpretation

In other words, the income variable X3 and the price variable X2 are perfectly linearly related; that is, we have perfect collinearity (or multicollinearity).

3. Troubles

To put it bluntly, in cases of perfect multicollinearity, estimation and hypothesis testing about individual regression coefficients in a multiple regression are not possible. It is a dead-end issue. Of course, as Eqs. (8.5) and (8.6) show, we can obtain estimates of a linear combination (i.e., the sum or difference) of the original coefficients, but not of each of them individually.

4. The case of near, or imperfect, multicollinearity

The case of perfect multicollinearity is a pathological extreme. In most applications involving economic data, two or more explanatory variables are not exactly linearly related but can be approximately so.

5. High multicollinearity

That is, collinearity can be high but not perfect. This is the case of near, or imperfect, or high multicollinearity. We will explain what we mean by “high” collinearity shortly.


1. Introduction

An important assumption of the CLRM is that the disturbances ui entering the population regression function (PRF) are homoscedastic; that is, they all have the same variance, σ^2.

2. Symbolically, we express heteroscedasticity as E(ui^2) = σi^2.

3. Notice

Notice the subscript on σ^2, which is a reminder that the variance of ui is no longer constant but varies from observation to observation.

4. Where does heteroscedasticity occur?

Moreover, these members may be of different sizes, such as small, medium, or large firms, or low, medium, or high income. In other words, there may be some scale effect.

5. Nature: heterogeneity

It is now generally assumed that in similar studies we can expect heteroscedasticity in the error term. As a matter of fact, in cross-sectional data involving heterogeneous units, heteroscedasticity may be the rule rather than the exception. Thus, in cross-sectional studies involving investment expenditure in relation to sales, the rate of interest, etc., heteroscedasticity is generally expected if small-, medium-, and large-sized firms are sampled together. Similarly, in a cross-sectional study of the average cost of production in relation to output, heteroscedasticity is likely to be found if small-, medium-, and large-sized firms are included in the sample.

6. Graphical examination of residuals

Sometimes it is helpful to create a plot of the squared residuals, especially in the context of heteroscedasticity. The squared residuals can be plotted on their own or against one or more explanatory variables.

7. White’s test proceeds as follows (step 3)

Obtain the R^2 value from the auxiliary regression (9.14). Under the null hypothesis that there is no heteroscedasticity (i.e., all the slope coefficients in Eq. (9.14) are zero), the product n·R^2 follows, in large samples, the chi-square distribution with d.f. equal to the number of regressors (excluding the intercept) in the auxiliary regression.
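The mechanics of this step can be sketched on simulated data. Everything below is an assumption made for illustration: the two-variable model Y = 2 + 0.5·X + u, errors whose standard deviation grows with X, the sample size of 500, and an auxiliary regression of the squared residuals on X and X^2 (the form White's auxiliary regression takes with a single regressor, which may differ from Eq. (9.14) in the text). The statistic n·R^2 is compared with 5.991, the 5 percent chi-square critical value for 2 d.f. The helper names are hypothetical.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, y):
    """OLS coefficients; each row already contains a leading 1 for the intercept."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def r_squared(rows, y, coef):
    """R^2 = 1 - RSS/TSS for the fitted regression."""
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    y_bar = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - rss / tss

random.seed(0)
n = 500
x = [random.uniform(1, 10) for _ in range(n)]
y = [2 + 0.5 * xi + random.gauss(0, xi) for xi in x]   # error s.d. grows with x

# Steps 1 and 2: fit the original regression and square its residuals.
b = ols([[1.0, xi] for xi in x], y)
resid_sq = [(yi - (b[0] + b[1] * xi)) ** 2 for xi, yi in zip(x, y)]

# Step 3: auxiliary regression of squared residuals on x and x^2, then n * R^2.
aux_rows = [[1.0, xi, xi ** 2] for xi in x]
r2_aux = r_squared(aux_rows, resid_sq, ols(aux_rows, resid_sq))
white_stat = n * r2_aux

CHI2_CRIT_5PCT_2DF = 5.991
reject_homoscedasticity = white_stat > CHI2_CRIT_5PCT_2DF
print(f"n*R^2 = {white_stat:.2f}, reject homoscedasticity: {reject_homoscedasticity}")
```

Because the simulated error variance rises with X by construction, the statistic should land well above the critical value; with homoscedastic errors it would usually fall below it.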


1. Introduction

In chapter 9 we examined the consequences of relaxing one of the assumptions of the classical linear regression model (CLRM): the assumption of homoscedasticity.

2. The nature of autocorrelation

The term autocorrelation can be defined as “correlation between members of observations ordered in time (as in time series data) or space (as in cross-sectional data).” Just as heteroscedasticity is generally associated with cross-sectional data, autocorrelation is usually associated with time series data (i.e., data ordered in temporal sequence).

3. The assumption of no autocorrelation, Eq. (10.1), states that E(ui uj) = 0 for i ≠ j. That is, the expected value of the product of two different error terms ui and uj is zero.

4. In plain English, this assumption means that the disturbance term relating to any observation is not related to or influenced by the disturbance term relating to any other observation.

5. Figure 10-1 (a) to (d) show a distinct pattern among the u’s, while Figure 10-1 (e) shows no systematic pattern, which is the geometric counterpart of the assumption of no autocorrelation given in Eq. (10.1).

6. Reasons for autocorrelation

There are several reasons for autocorrelation, some of which follow.

a. Inertia
b. Model specification errors
c. The cobweb phenomenon
d. Data manipulation

7. Assumptions underlying the d statistic

Before proceeding to show how the computed d value can be used to determine the presence, or otherwise, of autocorrelation, it is very important to note the assumptions underlying the d statistic.

a. The regression model includes an intercept term. Therefore, the d statistic cannot be used to determine autocorrelation in models of regression through the origin.

b. The X variables are nonstochastic; that is, their values are fixed in repeated sampling.

c. The disturbances ut are generated by the following mechanism:

ut = ρ·u(t-1) + vt,  -1 ≤ ρ ≤ 1    (10.6)

which states that the value of the disturbance, or error, term at time t depends on its value in time period (t-1) and a purely random term (vt). The extent of the dependence on the past value is measured by ρ, called the coefficient of autocorrelation, which lies between -1 and 1. The mechanism, Eq. (10.6), is known as the Markov first-order autoregressive scheme, or simply the first-order autoregressive scheme, usually denoted as the AR(1) scheme.

d. The regression does not contain the lagged value of the dependent variable as one of the explanatory variables.
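The AR(1) mechanism and the d statistic can be illustrated with a short simulation. The value ρ = 0.8, the sample size of 500, and the unit variance of the random term are assumptions chosen for illustration. The Durbin-Watson statistic is computed as d = Σ(et - e(t-1))^2 / Σet^2 and is approximately 2(1 - ρ-hat), so positive autocorrelation pulls d below 2. For simplicity the sketch applies the formula directly to the simulated disturbances; in practice it is applied to the OLS residuals.

```python
import random

random.seed(1)

RHO = 0.8      # assumed coefficient of autocorrelation
N = 500        # assumed number of observations

# Generate disturbances from the AR(1) scheme u_t = RHO * u_(t-1) + v_t.
u = []
previous = 0.0
for _ in range(N):
    previous = RHO * previous + random.gauss(0, 1)   # v_t is purely random
    u.append(previous)

def durbin_watson(e):
    """Durbin-Watson d: sum of squared successive differences over sum of squares."""
    numerator = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    denominator = sum(et ** 2 for et in e)
    return numerator / denominator

d = durbin_watson(u)
rho_hat = 1 - d / 2    # implied estimate of RHO from d ≈ 2(1 - rho-hat)

print(f"d = {d:.3f}, implied rho-hat = {rho_hat:.3f}")
```

A value of d near 2 would indicate no first-order autocorrelation, values toward 0 indicate positive autocorrelation, and values toward 4 indicate negative autocorrelation.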

