Chapter 2: Multiple Regression Analysis

Updated: 2023-08-14 14:37:01


Econometrics course slides, Zhao Xiliang

Chapter 2. Multiple Regression Analysis: Estimation
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$$


Multiple Regression Analysis
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$$
1. Estimation


Parallels with Simple Regression
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$$

- $\beta_0$ is still the intercept.
- $\beta_1$ to $\beta_k$ are all called slope parameters.
- $u$ is still the error term (or disturbance).
- We still need a zero conditional mean assumption, so we now assume that $E(u \mid x_1, x_2, \dots, x_k) = 0$.
- We are still minimizing the sum of squared residuals, so we have $k + 1$ first-order conditions.


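Obtaining OLS Estimates
In the general case with k independent variables, we seek estimates $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k$ in the equation
$$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \dots + \hat\beta_k x_k.$$
We therefore minimize the sum of squared residuals:
$$\sum_{i=1}^{n} \left( y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \dots - \hat\beta_k x_{ik} \right)^2.$$
From the first-order conditions we get $k + 1$ linear equations in the $k + 1$ unknowns $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k$:
$$\sum_{i=1}^{n} \left( y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \dots - \hat\beta_k x_{ik} \right) = 0$$
$$\sum_{i=1}^{n} x_{i1} \left( y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \dots - \hat\beta_k x_{ik} \right) = 0$$
$$\sum_{i=1}^{n} x_{i2} \left( y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \dots - \hat\beta_k x_{ik} \right) = 0$$
$$\vdots$$
$$\sum_{i=1}^{n} x_{ik} \left( y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \dots - \hat\beta_k x_{ik} \right) = 0.$$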
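The first-order conditions can be checked numerically. The sketch below is a minimal illustration in plain Python, not part of the original slides: the data are made up, and `solve` and `ols` are hypothetical helper names. It solves the k + 1 normal equations for k = 2 and then evaluates each first-order condition at the solution.

```python
# Minimal sketch: solve the k + 1 normal equations in pure Python.
# The data are made up for illustration only.

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(xs, y):
    """OLS with an intercept; xs is a list of regressor columns."""
    cols = [[1.0] * len(y)] + xs
    A = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(a * v for a, v in zip(ci, y)) for ci in cols]
    return solve(A, rhs)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9]

b0, b1, b2 = ols([x1, x2], y)
resid = [yi - b0 - b1 * a - b2 * c for yi, a, c in zip(y, x1, x2)]

# Each first-order condition should hold up to floating-point error.
foc = [sum(resid),
       sum(a * r for a, r in zip(x1, resid)),
       sum(c * r for c, r in zip(x2, resid))]
print("estimates:", b0, b1, b2)
print("first-order conditions:", foc)
```

The three printed first-order conditions should all be zero up to rounding: the residuals sum to zero and are orthogonal to each regressor in the sample.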


Obtaining OLS Estimates, cont.
$$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \dots + \hat\beta_k x_k$$
The equation above is called the OLS regression line or the sample regression function (SRF). $\hat\beta_0$ is the OLS intercept estimate and $\hat\beta_1, \dots, \hat\beta_k$ are the OLS slope estimates. The SRF is an estimated equation, not the true one. The true equation is the population regression line, which we do not know and can only estimate. Using a different sample, we would therefore get a different estimated line. The population regression line is
$$E(y \mid x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k.$$


Interpreting Multiple Regression

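$$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \dots + \hat\beta_k x_k, \quad \text{so} \quad \Delta\hat y = \hat\beta_1 \Delta x_1 + \hat\beta_2 \Delta x_2 + \dots + \hat\beta_k \Delta x_k,$$
so holding $x_2, \dots, x_k$ fixed implies that $\Delta\hat y = \hat\beta_1 \Delta x_1$; that is, each $\hat\beta_j$ has a ceteris paribus interpretation.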


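An Example (Wooldridge, p. 76)
The determination of wage (dollars per hour), wage:
- Years of education, educ
- Years of labor market experience, exper
- Years with the current employer, tenure

The relationship between wage and educ, exper, and tenure:
$$\text{wage} = \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{exper} + \beta_3\,\text{tenure} + u$$
$$\log(\text{wage}) = \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{exper} + \beta_3\,\text{tenure} + u$$

The estimated equations are as follows (the minus sign on the first intercept was lost in extraction, as elsewhere in these slides, and is restored here):
$$\widehat{\text{wage}} = -2.873 + 0.599\,\text{educ} + 0.022\,\text{exper} + 0.169\,\text{tenure}$$
$$\widehat{\log(\text{wage})} = 0.284 + 0.092\,\text{educ} + 0.0041\,\text{exper} + 0.022\,\text{tenure}$$

The Stata commands:
    use [path]wage1.dta
    (or: insheet using [path]wage1.raw, or wage1.txt)
    reg wage educ exper tenure
    reg lwage educ exper tenure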
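To make the ceteris paribus reading of the educ coefficients concrete, here is a small arithmetic sketch in Python. It uses only the two educ coefficients reported above; the variable names are illustrative. In the level equation the coefficient is a change in dollars per hour. In the log equation, 100 times the coefficient is the approximate percentage change, and 100·(exp(b) − 1) is the exact percentage change in predicted wage.

```python
import math

# Coefficients on educ copied from the two estimated equations above.
b_educ_level = 0.599   # wage equation, dollars per hour
b_educ_log = 0.092     # log(wage) equation

# Level model: one more year of education, exper and tenure held fixed.
delta_wage = b_educ_level * 1

# Log model: approximate and exact percentage change in predicted wage.
approx_pct = 100 * b_educ_log
exact_pct = 100 * (math.exp(b_educ_log) - 1)

print(f"level model: {delta_wage:.3f} dollars per hour")
print(f"log model: approx {approx_pct:.1f}%, exact {exact_pct:.2f}%")
```

The approximation 100·b is close to the exact figure because the coefficient is small; the gap widens for larger coefficients.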


A “Partialling Out” Interpretation

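Consider the case where k = 2, i.e. $\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$. Then
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} \hat r_{i1} y_i}{\sum_{i=1}^{n} \hat r_{i1}^2},$$
where $\hat r_{i1}$ are the residuals from the estimated regression $\hat x_1 = \hat\gamma_0 + \hat\gamma_2 x_2$.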


“Partialling Out” continued
The previous equation implies that regressing y on x1 and x2 gives the same effect of x1 as regressing y on the residuals from a regression of x1 on x2. This means that only the part of xi1 that is uncorrelated with xi2 is being related to yi, so we are estimating the effect of x1 on y after x2 has been “partialled out”.
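The partialling-out result can be verified on a small data set. The sketch below is an illustration in plain Python with made-up data; `solve` and `ols` are hypothetical helpers, not functions from any library. It regresses x1 on x2, keeps the residuals, and compares the slope of y on those residuals with the coefficient on x1 from the full regression.

```python
# Minimal sketch of "partialling out" for k = 2, with made-up data.

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(xs, y):
    """OLS with an intercept; xs is a list of regressor columns."""
    cols = [[1.0] * len(y)] + xs
    A = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(a * v for a, v in zip(ci, y)) for ci in cols]
    return solve(A, rhs)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]
y = [2.0, 4.1, 3.9, 7.2, 6.8, 9.1, 11.3, 10.9]

# Step 1: regress x1 on x2 and keep the residuals r1.
g0, g2 = ols([x2], x1)
r1 = [a - g0 - g2 * c for a, c in zip(x1, x2)]

# Step 2: slope from relating y to the residuals, sum(r1*y) / sum(r1^2).
b1_partial = sum(r * v for r, v in zip(r1, y)) / sum(r * r for r in r1)

# Full multiple regression of y on x1 and x2.
_, b1_multiple, _ = ols([x1, x2], y)

print(b1_partial, b1_multiple)
```

The two printed numbers should agree up to rounding, whatever data are used, as long as x1 and x2 are not perfectly collinear.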


The wage determination
The estimated equations are:
$$\widehat{\text{wage}} = -2.873 + 0.599\,\text{educ} + 0.022\,\text{exper} + 0.169\,\text{tenure}$$
$$\widehat{\log(\text{wage})} = 0.284 + 0.092\,\text{educ} + 0.0041\,\text{exper} + 0.022\,\text{tenure}$$

Now we first regress educ on exper and tenure to partial out the effects of exper and tenure. Then we regress wage, and log(wage), on the residuals from that regression. Do we get the same result?
$$\widehat{\text{educ}} = 13.575 - 0.0738\,\text{exper} + 0.048\,\text{tenure}$$
$$\widehat{\text{wage}} = 5.896 + 0.599\,\text{resid}$$
$$\widehat{\log(\text{wage})} = 1.623 + 0.092\,\text{resid}$$

The coefficient on resid is the same as the coefficient on educ in the first estimated equation, and the same holds for log(wage) in the second equation.


Simple vs Multiple Reg Estimate
Compare the simple regression $\tilde y = \tilde\beta_0 + \tilde\beta_1 x_1$ with the multiple regression $\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$.
Generally, $\tilde\beta_1 \neq \hat\beta_1$ unless:
- $\hat\beta_2 = 0$ (i.e. no partial effect of $x_2$), or
- $x_1$ and $x_2$ are uncorrelated in the sample.
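Both exceptions follow from a standard algebraic relation that the slide does not spell out: the simple-regression slope equals the multiple-regression slope plus β̂2 times δ̃1, where δ̃1 is the slope from regressing x2 on x1. The sketch below checks this identity numerically with made-up data, using the closed-form two-regressor formulas; `csum` is a hypothetical helper.

```python
# Minimal sketch: check b1_tilde = b1_hat + b2_hat * delta1 on made-up data.

def csum(a, b):
    """Sum of cross-products of deviations from the sample means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b))

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]
y = [2.0, 4.1, 3.9, 7.2, 6.8, 9.1, 11.3, 10.9]

s11, s22, s12 = csum(x1, x1), csum(x2, x2), csum(x1, x2)
s1y, s2y = csum(x1, y), csum(x2, y)

# Multiple regression slopes (closed form for two regressors).
det = s11 * s22 - s12 ** 2
b1_hat = (s22 * s1y - s12 * s2y) / det
b2_hat = (s11 * s2y - s12 * s1y) / det

# Simple regression of y on x1, and of x2 on x1.
b1_tilde = s1y / s11
delta1 = s12 / s11

print(b1_tilde, b1_hat + b2_hat * delta1)
```

If either `b2_hat` or `delta1` is zero, the second term vanishes and the two slopes coincide, which is exactly the pair of cases listed above.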


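The wage determination: example
The estimated equations are:
$$\widehat{\text{wage}} = -2.873 + 0.599\,\text{educ} + 0.022\,\text{exper} + 0.169\,\text{tenure}$$
$$\widehat{\log(\text{wage})} = 0.284 + 0.092\,\text{educ} + 0.0041\,\text{exper} + 0.022\,\text{tenure}$$

The estimated equations without tenure:
$$\widehat{\text{wage}} = -3.391 + 0.644\,\text{educ} + 0.070\,\text{exper}$$
$$\widehat{\log(\text{wage})} = 0.217 + 0.098\,\text{educ} + 0.0103\,\text{exper}$$

The estimated equations without exper and tenure:
$$\widehat{\text{wage}} = -0.905 + 0.541\,\text{educ}$$
$$\widehat{\log(\text{wage})} = 0.584 + 0.083\,\text{educ}$$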


Goodness-of-Fit
We can think of each observation as being made up of an explained part and an unexplained part, $y_i = \hat y_i + \hat u_i$. We then define the following:
- $\sum (y_i - \bar y)^2$ is the total sum of squares (SST)
- $\sum (\hat y_i - \bar y)^2$ is the explained sum of squares (SSE)
- $\sum \hat u_i^2$ is the residual sum of squares (SSR)

Then SST = SSE + SSR.


Goodness-of-Fit (continued)
How do we think about how well our sample regression line fits our sample data? We can compute the fraction of the total sum of squares (SST) that is explained by the model, and call this the R-squared of the regression:
$$R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}$$


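Goodness-of-Fit (continued)
We can also think of $R^2$ as being equal to the squared correlation coefficient between the actual $y_i$ and the fitted values $\hat y_i$:
$$R^2 = \frac{\left( \sum (y_i - \bar y)(\hat y_i - \bar{\hat y}) \right)^2}{\left( \sum (y_i - \bar y)^2 \right) \left( \sum (\hat y_i - \bar{\hat y})^2 \right)}$$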
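The goodness-of-fit identities can be checked directly. The sketch below is an illustration with made-up data and a single regressor to keep the code short (the same identities hold with k regressors when an intercept is included). It computes SST, SSE, and SSR, then R-squared both as SSE/SST and as the squared correlation between the actual and fitted values.

```python
# Minimal sketch: SST = SSE + SSR and two ways to compute R-squared.
# Made-up data; one regressor is used only to keep the example short.

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 2.9, 4.2, 4.8, 6.1, 6.9]
n = len(y)
xbar, ybar = sum(x) / n, sum(y) / n

# OLS slope and intercept for one regressor.
b1 = (sum((a - xbar) * (v - ybar) for a, v in zip(x, y))
      / sum((a - xbar) ** 2 for a in x))
b0 = ybar - b1 * xbar
fitted = [b0 + b1 * a for a in x]
resid = [v - f for v, f in zip(y, fitted)]

sst = sum((v - ybar) ** 2 for v in y)
sse = sum((f - ybar) ** 2 for f in fitted)  # mean of fitted values equals ybar
ssr = sum(r ** 2 for r in resid)
r2 = sse / sst

# Squared sample correlation between actual and fitted values.
fbar = sum(fitted) / n
num = sum((v - ybar) * (f - fbar) for v, f in zip(y, fitted)) ** 2
den = sst * sum((f - fbar) ** 2 for f in fitted)
corr2 = num / den

print(sst, sse + ssr)
print(r2, 1 - ssr / sst, corr2)
```

Each printed line should show values that agree up to rounding.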


More about R-squared
$R^2$ can never decrease when another independent variable is added to a regression, and it will usually increase. Because $R^2$ will usually increase with the number of independent variables, it is not a good way to compare models.
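The claim that R-squared cannot fall when a regressor is added can be illustrated numerically. In the sketch below, which uses made-up data and hypothetical helpers `solve`, `ols`, and `r_squared`, an arbitrary column of numbers with no designed relation to y is added to a simple regression, and the two R-squared values are compared.

```python
# Minimal sketch: R-squared does not decrease when a regressor is added.

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(xs, y):
    """OLS with an intercept; xs is a list of regressor columns."""
    cols = [[1.0] * len(y)] + xs
    A = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(a * v for a, v in zip(ci, y)) for ci in cols]
    return solve(A, rhs)

def r_squared(xs, y):
    """R-squared = 1 - SSR/SST for an OLS fit with an intercept."""
    b = ols(xs, y)
    ybar = sum(y) / len(y)
    fitted = [b[0] + sum(bj * col[i] for bj, col in zip(b[1:], xs))
              for i in range(len(y))]
    ssr = sum((v - f) ** 2 for v, f in zip(y, fitted))
    sst = sum((v - ybar) ** 2 for v in y)
    return 1 - ssr / sst

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.0, 4.1, 3.9, 7.2, 6.8, 9.1, 11.3, 10.9]
extra = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]  # arbitrary added regressor

r2_small = r_squared([x1], y)
r2_big = r_squared([x1, extra], y)
print(r2_small, r2_big)
```

The second value is at least as large as the first even though the added column was chosen arbitrarily, which is why a higher R-squared alone says little when comparing models with different numbers of regressors.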


An Example: a crime model (Wooldridge, p. 82)
What determines whether a person commits a crime? The dependent variable is the number of times the man was arrested during 1986, narr86. The explanatory variables are:
- pcnv, the proportion of arrests that led to conviction
- avgsen, the average sentence length served for prior convictions
- ptime86, months spent in prison in 1986
- qemp86, the number of quarters during which the man was employed in 1986

The regression model

$$\text{narr86} = \beta_0 + \beta_1\,\text{pcnv} + \beta_2\,\text{avgsen} + \beta_3\,\text{ptime86} + \beta_4\,\text{qemp86} + u$$

The estimated equations

$$\widehat{\text{narr86}} = 0.712 - 0.150\,\text{pcnv} - 0.034\,\text{ptime86} - 0.104\,\text{qemp86}, \quad n = 2{,}725, \quad R^2 = 0.0413$$
$$\widehat{\text{narr86}} = 0.707 - 0.151\,\text{pcnv} + 0.0074\,\text{avgsen} - 0.037\,\text{ptime86} - 0.103\,\text{qemp86}, \quad n = 2{,}725, \quad R^2 = 0.0422$$

The fact that the estimated model explains only about 4.2% of the variation in narr86 does not necessarily mean that the equation is useless. Generally, a low $R^2$ indicates that it is hard to predict individual outcomes on y with much accuracy.


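Assumptions for Unbiasedness
1. The population model is linear in parameters: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$.
2. We can use a random sample of size n, $\{(x_{i1}, x_{i2}, \dots, x_{ik}, y_i): i = 1, 2, \dots, n\}$, from the population model, so that the sample model is $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + u_i$.
3. $E(u \mid x_1, x_2, \dots, x_k) = 0$, implying that all of the explanatory variables are exogenous. Note that $E(u \mid x) = 0$ implies $Cov(u, x) = 0$ and $E(ux) = 0$.
4. None of the x's is constant, and there are no exact linear relationships among them. This does allow the independent variables to be correlated; they just cannot be perfectly linearly correlated.
   - Student performance: $\text{avgscore} = \beta_0 + \beta_1\,\text{expend} + \beta_2\,\text{avginc} + u$
   - Consumption function: $\text{consum} = \beta_0 + \beta_1\,\text{inc} + \beta_2\,\text{inc}^2 + u$
   - But the following is invalid: $\log(\text{consum}) = \beta_0 + \beta_1 \log(\text{inc}) + \beta_2 \log(\text{inc}^2) + u$, because $\log(\text{inc}^2) = 2\log(\text{inc})$ is an exact linear function of $\log(\text{inc})$.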
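The difference between the valid and invalid consumption functions can be seen with a few numbers. The sketch below assumes the invalid specification is the textbook one in which log(inc) and log(inc²) both appear as regressors; the income values are made up and `corr` is a hypothetical helper. A sample correlation of exactly 1 signals an exact linear relationship, while inc and inc² are highly but not perfectly correlated.

```python
import math

# Minimal sketch: perfect versus imperfect collinearity, made-up incomes.

def corr(a, b):
    """Sample correlation coefficient between two lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

inc = [10.0, 20.0, 35.0, 50.0, 80.0]
inc_sq = [v ** 2 for v in inc]
log_inc = [math.log(v) for v in inc]
log_inc_sq = [math.log(v ** 2) for v in inc]

# inc and inc^2: correlated, but not an exact linear relationship.
corr_levels = corr(inc, inc_sq)

# log(inc) and log(inc^2) = 2 * log(inc): an exact linear relationship.
corr_logs = corr(log_inc, log_inc_sq)

print(corr_levels, corr_logs)
```

With perfectly collinear regressors the normal equations have no unique solution, so OLS cannot separate the two coefficients.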


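Unbiasedness of OLS estimation
Under the four assumptions above, we can get
$$E(\hat\beta_j) = \beta_j, \quad j = 0, 1, \dots, k.$$
We prove the result only for $\hat\beta_1$; the proof for the other parameters is virtually identical. Let $\hat r_{i1}$ be the residuals from regressing $x_1$ on the other regressors, so that $x_{i1} = \hat x_{i1} + \hat r_{i1}$. We first write $\hat\beta_1$ as follows:
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} \hat r_{i1} y_i}{\sum_{i=1}^{n} \hat r_{i1}^2} = \frac{\sum_{i=1}^{n} \hat r_{i1} \left( \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + u_i \right)}{\sum_{i=1}^{n} \hat r_{i1}^2}$$
$$= \frac{\beta_0 \sum_{i=1}^{n} \hat r_{i1} + \beta_1 \sum_{i=1}^{n} \hat r_{i1} x_{i1} + \beta_2 \sum_{i=1}^{n} \hat r_{i1} x_{i2} + \dots + \beta_k \sum_{i=1}^{n} \hat r_{i1} x_{ik} + \sum_{i=1}^{n} \hat r_{i1} u_i}{\sum_{i=1}^{n} \hat r_{i1}^2}$$
$$= \frac{\beta_1 \sum_{i=1}^{n} \hat r_{i1} \left( \hat x_{i1} + \hat r_{i1} \right) + \sum_{i=1}^{n} \hat r_{i1} u_i}{\sum_{i=1}^{n} \hat r_{i1}^2}$$
$$= \frac{\beta_1 \sum_{i=1}^{n} \hat r_{i1}^2 + \beta_1 \sum_{i=1}^{n} \hat r_{i1} \hat x_{i1} + \sum_{i=1}^{n} \hat r_{i1} u_i}{\sum_{i=1}^{n} \hat r_{i1}^2}$$
$$= \beta_1 + \frac{\sum_{i=1}^{n} \hat r_{i1} u_i}{\sum_{i=1}^{n} \hat r_{i1}^2}.$$
The third line uses the fact that the residuals $\hat r_{i1}$ sum to zero and are orthogonal to $x_{i2}, \dots, x_{ik}$ in the sample; the last line uses the fact that they are also orthogonal to the fitted values $\hat x_{i1}$.
Therefore, treating the sample values of the regressors (and hence the $\hat r_{i1}$) as fixed,
$$E(\hat\beta_1) = E\left( \beta_1 + \frac{\sum_{i=1}^{n} \hat r_{i1} u_i}{\sum_{i=1}^{n} \hat r_{i1}^2} \right) = \beta_1 + \frac{\sum_{i=1}^{n} \hat r_{i1} E(u_i)}{\sum_{i=1}^{n} \hat r_{i1}^2} = \beta_1.$$
The same argument applies to each $\hat\beta_j$, so $E(\hat\beta_j) = \beta_j$.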
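Unbiasedness is a statement about the average of the OLS estimates over repeated samples, which a small simulation can illustrate. The sketch below is only an illustration: the true parameters, sample size, number of replications, and the helpers `solve` and `ols` are all assumptions chosen for the example. The regressors are drawn once and held fixed, the errors are redrawn with mean zero in every replication, and the estimates are averaged.

```python
import random

# Minimal Monte Carlo sketch of unbiasedness; all settings are hypothetical.

def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(xs, y):
    """OLS with an intercept; xs is a list of regressor columns."""
    cols = [[1.0] * len(y)] + xs
    A = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(a * v for a, v in zip(ci, y)) for ci in cols]
    return solve(A, rhs)

random.seed(0)
beta = [1.0, 2.0, -0.5]   # assumed true beta0, beta1, beta2
n, reps = 50, 500

# Regressors drawn once and held fixed; x2 is correlated with x1.
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]

sums = [0.0, 0.0, 0.0]
for _ in range(reps):
    u = [random.gauss(0, 1) for _ in range(n)]  # mean zero, independent of x
    y = [beta[0] + beta[1] * a + beta[2] * c + e
         for a, c, e in zip(x1, x2, u)]
    est = ols([x1, x2], y)
    sums = [s + b for s, b in zip(sums, est)]

means = [s / reps for s in sums]
print("true:", beta)
print("average estimate:", means)
```

Any single replication can miss the true values by a noticeable margin; it is the average across replications that settles close to them.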


Too Many or Too Few Variables
What happens if we include variables in our specification that don't belong?

Suppose we specify the model as $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$, and this model satisfies the four assumptions, but $x_3$ has no effect on y after controlling for $x_1$ and $x_2$. That is, the true model is
$$E(y \mid x_1, x_2, x_3) = E(y \mid x_1, x_2) = \beta_0 + \beta_1 x_1 + \beta_2 x_2.$$
The estimated model including $x_3$ is $\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \hat\beta_3 x_3$. Including $x_3$ does not cause bias: OLS remains unbiased, with $E(\hat\beta_3) = \beta_3 = 0$.

What if we exclude a variable from our specification that does belong? OLS will usually be biased.


Omitted Variable Bias

Suppose the true model is given as $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$, but we estimate $y = \tilde\beta_0 + \tilde\beta_1 x_1 + \tilde u$. Then
$$\tilde\beta_1 = \frac{\sum (x_{i1} - \bar x_1) y_i}{\sum (x_{i1} - \bar x_1)^2}.$$
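The slide text breaks off at this point. As a hedged sketch of the standard next step (not recovered from the source), write $\tilde\beta_1$ for the slope from the simple regression of y on $x_1$, substitute the true model for $y_i$, and let $\tilde\delta_1$ denote the slope from regressing $x_2$ on $x_1$ (a symbol introduced here for illustration):

```latex
\tilde\beta_1
  = \frac{\sum (x_{i1} - \bar x_1)\,(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i)}
         {\sum (x_{i1} - \bar x_1)^2}
  = \beta_1
    + \beta_2 \frac{\sum (x_{i1} - \bar x_1)\, x_{i2}}{\sum (x_{i1} - \bar x_1)^2}
    + \frac{\sum (x_{i1} - \bar x_1)\, u_i}{\sum (x_{i1} - \bar x_1)^2},
% using \sum (x_{i1} - \bar x_1) = 0 and
%       \sum (x_{i1} - \bar x_1) x_{i1} = \sum (x_{i1} - \bar x_1)^2.
% Taking expectations conditional on the regressors, with E(u_i) = 0:
E(\tilde\beta_1) = \beta_1 + \beta_2\, \tilde\delta_1,
\qquad
\tilde\delta_1 = \frac{\sum (x_{i1} - \bar x_1)\, x_{i2}}{\sum (x_{i1} - \bar x_1)^2}.
```

Under this sketch the bias is $\beta_2 \tilde\delta_1$, which is zero only if $x_2$ has no partial effect ($\beta_2 = 0$) or $x_1$ and $x_2$ are uncorrelated in the sample ($\tilde\delta_1 = 0$).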

Source: https://www.bwwdw.com/article/hmtj.html