Stata Classroom Command Walkthrough

Updated: 2024-06-09 05:08:01


Week 6: Multivariate Statistical Analysis

MANOVA test

H0: all populations have the same mean vector.
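A minimal sketch of how this hypothesis could be tested in Stata. It assumes the session uses Stata's bundled auto dataset (the variable names and the 74 observations in the output further down suggest this, but the notes never say so) and takes foreign as the grouping variable.

```stata
* Assumption: the bundled auto data stands in for the class dataset
sysuse auto, clear

* One-way MANOVA: do the mean vectors of the four variables differ by foreign?
manova price mpg weight length = foreign
```

Small p-values for the W, P, L, and R statistics count as evidence against H0.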

Cluster analysis (average linkage)

. cluster averagelinkage price mpg weight length
cluster name: _clus_1

. edit
- preserve

. cluster list _clus_1

_clus_1  (type: hierarchical,  method: average,  dissimilarity: L2)
      vars: _clus_1_id (id variable)
            _clus_1_ord (order variable)
            _clus_1_hgt (height variable)
     other: cmd: cluster averagelinkage price mpg weight length
            varlist: price mpg weight length
            range: 0 .

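. list _clus_1_id _clus_1_ord _clus_1_hgt

Inspect the result with a dendrogram (best with no more than 50 observations):

. cluster dendrogram _clus_1

Horizontal version:

. cluster dendrogram _clus_1, horizontal

Try a different linkage method. If it produces a different result, the cluster structure in the data is not robust.

. cluster wardslinkage price mpg weight length
cluster name: _clus_2

Name the result yourself with name(), here class1:

. cluster wardslinkage price mpg weight length, name(class1)

. cluster dendrogram class1

Generate the dissimilarity matrix ma1 so that the clustering can be run on the matrix itself:

. matrix dissimilarity ma1 = price mpg weight length

Build a single-linkage clustering from the matrix:

. clear

. clustermat singlelinkage ma1
obs was 0, now 74
cluster name: _clus_1

. cluster dendrogram _clus_1

Guided by the dendrogram, split the sample into k() groups, name the result with an option such as name(class2), and examine which group each observation belongs to.

. cluster kmeans price mpg weight length, k(4) name(class2)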

. list class2

     +--------+
     | class2 |
     |--------|
  1. |      1 |
  2. |      4 |
  3. |      1 |
  4. |      4 |
  5. |      3 |
     |--------|
  6. |      4 |
  7. |      1 |
  8. |      4 |
  9. |      2 |
 10. |      1 |
     |--------|
 11. |      2 |
 12. |      2 |
 13. |      2 |
 14. |      1 |
 15. |      4 |
     |--------|
 16. |      1 |
 17. |      4 |
 18. |      1 |
 19. |      1 |
 20. |      1 |
     |--------|
 21. |      1 |
 22. |      4 |
 23. |      4 |
 24. |      1 |
 25. |      1 |
     |--------|
 26. |      2 |
 27. |      2 |
 28. |      2 |
 29. |      1 |
 30. |      4 |
     |--------|
 31. |      4 |
 32. |      1 |
 33. |      4 |
 34. |      1 |
 35. |      3 |
     |--------|
 36. |      4 |
 37. |      4 |
 38. |      4 |
 39. |      1 |
 40. |      1 |
     |--------|
 41. |      2 |
 42. |      1 |
 43. |      1 |
 44. |      1 |
 45. |      4 |
     |--------|
 46. |      1 |
 47. |      4 |
 48. |      4 |
 49. |      4 |
 50. |      4 |
     |--------|
 51. |      1 |
 52. |      1 |
 53. |      3 |
 54. |      4 |
 55. |      3 |
     |--------|
 56. |      4 |
 57. |      1 |
 58. |      1 |
 59. |      3 |
 60. |      1 |
     |--------|
 61. |      4 |
 62. |      1 |
 63. |      1 |
 64. |      2 |
 65. |      1 |
     |--------|
 66. |      1 |
 67. |      4 |
 68. |      1 |
 69. |      4 |
 70. |      3 |
     |--------|
 71. |      4 |
 72. |      1 |
 73. |      3 |
 74. |      2 |
     +--------+

. list price lass2

variable lass2 not found
r(111);

. list price class2

     +-----------------+
     |  price   class2 |
     |-----------------|
  1. |  4,099        1 |
  2. |  4,749        4 |
  3. |  3,799        1 |
  4. |  4,816        4 |
  5. |  7,827        3 |
     |-----------------|
  6. |  5,788        4 |
  7. |  4,453        1 |
  8. |  5,189        4 |
  9. | 10,372        2 |
 10. |  4,082        1 |
     |-----------------|
 11. | 11,385        2 |
 12. | 14,500        2 |
 13. | 15,906        2 |
 14. |  3,299        1 |
 15. |  5,705        4 |
     |-----------------|
 16. |  4,504        1 |
 17. |  5,104        4 |
 18. |  3,667        1 |
 19. |  3,955        1 |
 20. |  3,984        1 |
     |-----------------|
 21. |  4,010        1 |
 22. |  5,886        4 |
 23. |  6,342        4 |
 24. |  4,389        1 |
 25. |  4,187        1 |
     |-----------------|
 26. | 11,497        2 |
 27. | 13,594        2 |
 28. | 13,466        2 |
 29. |  3,829        1 |
 30. |  5,379        4 |
     |-----------------|
 31. |  6,165        4 |
 32. |  4,516        1 |
 33. |  6,303        4 |
 34. |  3,291        1 |
 35. |  8,814        3 |
     |-----------------|
 36. |  5,172        4 |
 37. |  4,733        4 |
 38. |  4,890        4 |
 39. |  4,181        1 |
 40. |  4,195        1 |
     |-----------------|
 41. | 10,371        2 |
 42. |  4,647        1 |
 43. |  4,425        1 |
 44. |  4,482        1 |
 45. |  6,486        4 |
     |-----------------|
 46. |  4,060        1 |
 47. |  5,798        4 |
 48. |  4,934        4 |
 49. |  5,222        4 |
 50. |  4,723        4 |
     |-----------------|
 51. |  4,424        1 |
 52. |  4,172        1 |
 53. |  9,690        3 |
 54. |  6,295        4 |
 55. |  9,735        3 |
     |-----------------|
 56. |  6,229        4 |
 57. |  4,589        1 |

 30. |     1 |
     |-------|
 31. |     1 |
 32. |     1 |
 33. |     1 |
 34. |     1 |
 35. |     2 |
     |-------|
 36. |     1 |
 37. |     1 |
 38. |     1 |
 39. |     1 |
 40. |     1 |
     |-------|
 41. |     2 |
 42. |     1 |
 43. |     1 |
 44. |     1 |
 45. |     1 |
     |-------|
 46. |     1 |
 47. |     1 |
 48. |     1 |
 49. |     1 |
 50. |     1 |
     |-------|
 51. |     1 |
 52. |     1 |
 53. |     2 |
 54. |     1 |
 55. |     2 |
     |-------|
 56. |     1 |
 57. |     1 |
 58. |     1 |
 59. |     2 |
 60. |     1 |
     |-------|
 61. |     1 |
 62. |     1 |
 63. |     1 |
 64. |     3 |
 65. |     1 |
     |-------|
 66. |     1 |
 67. |     1 |
 68. |     1 |
 69. |     1 |
 70. |     1 |
     |-------|
 71. |     1 |
 72. |     1 |
 73. |     1 |
 74. |     2 |
     +-------+
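The clustering steps above can be collected into a short do-file. This is only a sketch under stated assumptions: it assumes the data are Stata's bundled auto dataset, and the names avg, grp4, and km4 are made up here for illustration. It cuts the average-linkage tree into four groups with cluster generate and compares them with a k-means solution. Because k-means starts from randomly chosen observations by default, the sketch fixes the seed (an arbitrary value) so that reruns give the same groups.

```stata
* Assumption: the bundled auto data stands in for the class dataset
sysuse auto, clear

* Hierarchical clustering, average linkage, stored under the name avg
cluster averagelinkage price mpg weight length, name(avg)

* Show only the top 10 branches so the tree stays readable with 74 cars
cluster dendrogram avg, cutnumber(10)

* Cut the tree into 4 groups and store the membership in grp4
cluster generate grp4 = groups(4), name(avg)

* k-means with 4 groups; krandom(12345) fixes the random starting centers
cluster kmeans price mpg weight length, k(4) name(km4) start(krandom(12345))

* Cross-tabulate the two groupings to see how far they agree
tabulate grp4 km4
```

If the two groupings disagree heavily, that is the same warning sign the notes mention when comparing linkage methods.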

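That concludes cluster analysis. Next comes discriminant analysis: linear, nonlinear, and other methods.

Start with linear discriminant analysis, discrim lda.

. discrim lda price mpg weight length, group(foreign)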

Linear discriminant analysis

Resubstitution classification summary

    +---------+
    | Key     |
    |---------|
    | Number  |
    | Percent |
    +---------+

             | Classified
True foreign | Domestic   Foreign |  Total
-------------+--------------------+---------
    Domestic |       43         9 |     52
             |    82.69     17.31 | 100.00
             |                    |
     Foreign |        0        22 |     22
             |     0.00    100.00 | 100.00
-------------+--------------------+---------
       Total |       43        31 |     74
             |    58.11     41.89 | 100.00
             |                    |
      Priors |   0.5000    0.5000 |

Note: 82.69% is the correct-classification rate for Domestic cars; 100.00% is the correct-classification rate for Foreign cars.

Follow up on the results with the estat postestimation commands:

. estat classtable

Resubstitution classification table

    +---------+
    | Key     |
    |---------|
    | Number  |
    | Percent |
    +---------+

             | Classified
True foreign | Domestic   Foreign |  Total
-------------+--------------------+---------
    Domestic |       43         9 |     52
             |    82.69     17.31 | 100.00
             |                    |
     Foreign |        0        22 |     22
             |     0.00    100.00 | 100.00
-------------+--------------------+---------
       Total |       43        31 |     74
             |    58.11     41.89 | 100.00
             |                    |
      Priors |   0.5000    0.5000 |

. eatat corr
unrecognized command: eatat
r(199);

. estat corr

Pooled within-group correlation matrix

             |    price       mpg    weight    length
-------------+----------------------------------------
       price |  1.00000
         mpg | -0.53117   1.00000
      weight |  0.70551  -0.77521   1.00000
      length |  0.56014  -0.75664   0.91898   1.00000

. estat covariance

Pooled within-group covariance matrix

             |     price        mpg     weight     length
-------------+--------------------------------------------
       price |   8799417
         mpg | -8438.941    28.6848
      weight |   1318950  -2616.617   397186.1
      length |  30603.94  -74.63974   10667.35   339.2432

Misclassification rate:

. estat errorrate

Error rate estimated by error count

             |       foreign        |
             |  Domestic    Foreign |    Total
-------------+----------------------+----------
  Error rate |  .1730769          0 | .0865385
-------------+----------------------+----------
      Priors |        .5         .5 |
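A worked check of the Total column, assuming (as the numbers in the table suggest) that the total error rate is the prior-weighted sum of the group error rates, with the group rates taken from the classification counts (9 of 52 Domestic cars and 0 of 22 Foreign cars misclassified):

```latex
\text{Total error rate}
  = \sum_{g} \pi_g \, e_g
  = 0.5 \times \frac{9}{52} + 0.5 \times \frac{0}{22}
  = 0.5 \times 0.1730769 + 0
  \approx 0.0865385
```

Weighting the same two group rates by the sample shares 52/74 and 22/74 instead would give 9/74, about 0.1216, so the choice of priors changes the reported total.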

Descriptive statistics by group for the four variables, to see how the groups differ:

. estat grsum

Estimation sample discrim lda
Summarized by foreign

             |       foreign        |
        Mean |  Domestic    Foreign |    Total
-------------+----------------------+----------
       price |  6072.423   6384.682 | 6165.257
         mpg |  19.82692   24.77273 |  21.2973
      weight |  3317.115   2315.909 | 3019.459
      length |  196.1346   168.5455 | 187.9324
-------------+----------------------+----------
           N |        52         22 |       74

Run an analysis of variance on each variable:

. estat anova

Univariate ANOVA summaries

             |                                       Adj.
    Variable |  Model MS   Resid MS   Total MS   R-sq   R-sq       F   Pr > F
-------------+-------------------------------------------------------------
       price | 1507382.7  6.336e+08  6.249e+08  .0024  -.0115  .1713  0.6802
         mpg | 378.15352  2065.3059  2042.1943  .1548    .143  13.18  0.0005
      weight |  15496779   28597399   28417939  .3514   .3424  39.02  0.0000
      length |  11767.15  24425.512   24252.11  .3251   .3158  34.69  0.0000
---------------------------------------------------------------------------
Number of obs = 74     Model df = 1     Residual df = 72

What does the fitted discriminant function look like? Use the canonical discriminant function test, canontest:

. estat canontest

Canonical linear discriminant analysis

      |                                  | Like-
      | Canon.   Eigen-     Variance     | lihood
  Fcn | Corr.    value    Prop.   Cumul. | Ratio        F   df1   df2  Prob>F
 -----+----------------------------------+------------------------------------
    1 | 0.7494  1.28083  1.0000  1.0000  | 0.4384  22.094     4    69  0.0000 e
 ---------------------------------------------------------------------------
 Ho: this and smaller canon. corr. are zero;  e = exact F

Display the form of the multivariate discriminant function:

. estat loadings

Standardized canonical discriminant function coefficients

             | function1
-------------+-----------
       price | -1.084153
         mpg |  .3115969
      weight |   2.04874
      length | -.4264069

Display the classification functions:

. estat classfunctions

Classification functions

             |       foreign
             |  Domestic    Foreign
-------------+----------------------
       price |  .0013868   .0022795
         mpg |  4.577349   4.435253
      weight | -.0341788  -.0421185
      length |  2.534884   2.591428
       _cons | -241.4898  -231.8288
-------------+----------------------
      Priors |        .5         .5
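The classification functions are used by scoring each car under every group and assigning it to the group with the largest score. A minimal sketch of how that could be checked with predict after discrim lda. It assumes the bundled auto dataset, and the variable names cls, pDom, pFor, sDom, sFor, and byscore are made up here for illustration.

```stata
* Assumption: the bundled auto data stands in for the class dataset
sysuse auto, clear
discrim lda price mpg weight length, group(foreign)

* Predicted group for each car (resubstitution classification)
predict cls, classification

* Posterior probabilities, one new variable per group
predict pDom pFor, pr

* Classification function scores, one new variable per group
predict sDom sFor, clscore

* Picking the larger score should agree with the predicted group
generate byte byscore = sFor > sDom
tabulate cls byscore

* Cross-tabulate true against predicted group
tabulate foreign cls
```

If the data are the same as in class, the last table should repeat the counts of the classification table shown earlier.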

Next: nonlinear discriminant analysis.

. discrim lda price mpg weight length, group(clus5)

Linear discriminant analysis

Resubstitution classification summary

    +---------+
    | Key     |
    |---------|
    | Number  |
    | Percent |
    +---------+

             | Classified
  True clus5 |      1       2       3 |  Total
-------------+------------------------+-------
           1 |     59       0       0 |     59
             | 100.00    0.00    0.00 | 100.00
             |                        |
           2 |      0      10       0 |     10
             |   0.00  100.00    0.00 | 100.00
             |                        |
           3 |      0       0       5 |      5
             |   0.00    0.00  100.00 | 100.00
-------------+------------------------+-------
       Total |     59      10       5 |     74
             |  79.73   13.51    6.76 | 100.00
             |                        |
      Priors | 0.3333  0.3333  0.3333 |

With three groups there are two discriminant functions. Everything covered above still applies here.

. estat loadings

Standardized canonical discriminant function coefficients

             | function1   function2
-------------+----------------------
       price |  .9994096    .0619517
         mpg |  .1147418    .3214297
      weight |  .7125425    -2.05314
      length | -.6690039    2.838476

Multivariate equation analysis:

. estat loadings

Standardized canonical discriminant function coefficients

             | function1   function2
-------------+----------------------
       price |  .9994096    .0619517
         mpg |  .1147418    .3214297
      weight |  .7125425    -2.05314
      length | -.6690039    2.838476

. estat grsum

Estimation sample discrim lda
Summarized by clus5

             |              clus5              |
        Mean |         1          2          3 |    Total
-------------+---------------------------------+----------
       price |  4846.746     9981.5    14091.2 | 6165.257
         mpg |  22.49153       17.4         15 |  21.2973
      weight |  2824.746       3662       4032 | 3019.459
      length |  183.4576      205.2      206.2 | 187.9324
-------------+---------------------------------+----------
           N |        59         10          5 |       74

. estat manova

             Number of obs =      74

             W = Wilks' lambda      L = Lawley-Hotelling trace
             P = Pillai's trace     R = Roy's largest root

     Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
 -----------+--------------------------------------------------
      clus5 | W   0.1043      2     8.0   136.0    35.63 0.0000 e
            | P   0.9304            8.0   138.0    15.00 0.0000 a
            | L   8.2507            8.0   134.0    69.10 0.0000 a
            | R   8.2102            4.0    69.0   141.63 0.0000 u
            |--------------------------------------------------
   Residual |                71
 -----------+--------------------------------------------------
      Total |                73
 --------------------------------------------------------------
             e = exact, a = approximate, u = upper bound on F

Quadratic discriminant analysis (qda) to raise the correct-classification rate

. discrim qda price mpg weight length, group(foreign)

Quadratic discriminant analysis

Resubstitution classification summary

    +---------+
    | Key     |
    |---------|
    | Number  |
    | Percent |
    +---------+

             | Classified
True foreign | Domestic   Foreign |  Total
-------------+--------------------+---------
    Domestic |       45         7 |     52
             |    86.54     13.46 | 100.00
             |                    |
     Foreign |        0        22 |     22
             |     0.00    100.00 | 100.00
-------------+--------------------+---------
       Total |       45        29 |     74
             |    60.81     39.19 | 100.00
             |                    |
      Priors |   0.5000    0.5000 |

Note: the correct-classification rate for Domestic cars rises to 86.54%, up from 82.69% under LDA.

List each observation's original class, its class after discrimination, and the posterior probabilities:

. estat list

     +------------------------------------------------+
     |      | Classification       | Probabilities    |
     |      |                      |                  |
     | Obs. |     True    Class.   | Domestic Foreign |
     |------+----------------------+------------------|
     |    1 | Domestic  Domestic   |   1.0000  0.0000 |
     |    2 | Domestic  Domestic   |   1.0000  0.0000 |
     |    3 | Domestic  Domestic   |   0.9935  0.0065 |
     |    4 | Domestic  Domestic   |   1.0000  0.0000 |
     |    5 | Domestic  Domestic   |   1.0000  0.0000 |
     |------+----------------------+------------------|
     |    6 | Domestic  Domestic   |   1.0000  0.0000 |
     |    7 | Domestic  Foreign  * |   0.2235  0.7765 |
     |    8 | Domestic  Domestic   |   1.0000  0.0000 |
     |    9 | Domestic  Domestic   |   1.0000  0.0000 |
     |   10 | Domestic  Domestic   |   1.0000  0.0000 |
     |------+----------------------+------------------|
     |   11 | Domestic  Domestic   |   1.0000  0.0000 |
     |   12 | Domestic  Domestic   |   0.8931  0.1069 |
     |   13 | Domestic  Domestic   |   1.0000  0.0000 |
     |   14 | Domestic  Foreign  * |   0.2789  0.7211 |
     |   15 | Domestic  Domestic   |   1.0000  0.0000 |

Overall misclassification rate:

. estat errorrate

Error rate estimated by error count

             |       foreign        |
             |  Domestic    Foreign |    Total
-------------+----------------------+----------
  Error rate |  .1346154          0 | .0673077
-------------+----------------------+----------
      Priors |        .5         .5 |

Note: the Domestic error rate has fallen to .1346154, down from .1730769 under LDA. The equal priors of .5 do not match the data, though; the actual group shares are different:

. sum foreign

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     foreign |         74    .2972973    .4601885          0          1

The mean of foreign is about 0.3, so change the prior probabilities and rerun the linear analysis:

. discrim lda price mpg weight length, group(foreign) priors(0.7,0.3)

Linear discriminant analysis

Resubstitution classification summary

    +---------+
    | Key     |
    |---------|
    | Number  |
    | Percent |
    +---------+

             | Classified
True foreign | Domestic   Foreign |  Total
-------------+--------------------+---------
    Domestic |       44         8 |     52
             |    84.62     15.38 | 100.00
             |                    |
     Foreign |        1        21 |     22
             |     4.55     95.45 | 100.00
-------------+--------------------+---------
       Total |       45        29 |     74
             |    60.81     39.19 | 100.00
             |                    |
      Priors |   0.7000    0.3000 |
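Rather than typing rounded priors by hand, the priors can be taken from the group shares in the sample. A minimal sketch, assuming the bundled auto dataset; priors(proportional) sets each prior to the group's share of the observations, here 52/74 and 22/74, so the result may differ slightly from the rounded 0.7 and 0.3 used above.

```stata
* Assumption: the bundled auto data stands in for the class dataset
sysuse auto, clear

* LDA with priors proportional to the group sizes in the sample
discrim lda price mpg weight length, group(foreign) priors(proportional)

* Error rates under the proportional priors
estat errorrate
```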

Principal component analysis

PCA reduces the dimension of the data. Entering a given number of variables yields the same number of principal components.

. pca price mpg weight length turn

Principal components/correlation          Number of obs    =         74
                                          Number of comp.  =          5
                                          Trace            =          5
    Rotation: (unrotated = principal)     Rho              =     1.0000

    --------------------------------------------------------------------------
       Component |   Eigenvalue   Difference         Proportion   Cumulative
    -------------+------------------------------------------------------------
           Comp1 |      3.77621      3.01112             0.7552       0.7552
           Comp2 |       .76509      .488796             0.1530       0.9083
           Comp3 |      .276294      .139261             0.0553       0.9635
           Comp4 |      .137033     .0916582             0.0274       0.9909
           Comp5 |     .0453749            .             0.0091       1.0000
    --------------------------------------------------------------------------

Note: the first two components together account for more than 90% of the total variance, so keep those two.

Principal components (eigenvectors)

------------------------------------------------------------------------------

Variable | Comp1 Comp2 Comp3 Comp4 Comp5 | Unexplained
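A minimal sketch of how the two retained components could be extracted for later use. It assumes the bundled auto dataset, and the score variable names pc1 and pc2 are made up here for illustration.

```stata
* Assumption: the bundled auto data stands in for the class dataset
sysuse auto, clear

* Keep only the first two components, as chosen from the eigenvalue table
pca price mpg weight length turn, components(2)

* Scree plot of the eigenvalues, a visual check on how many to keep
screeplot

* Each proportion is eigenvalue / trace; this is the cumulative share of the first two
display (3.77621 + .76509) / 5

* Component scores for each car, stored as new variables
predict pc1 pc2, score
summarize pc1 pc2
```

The scores pc1 and pc2 can then replace the five original variables in later analyses such as clustering or regression.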
