Comparison of Stata and SAS Commands

Updated: 2024-06-26 03:41:01


Posted 2010-5-8 10:56 | Category: Stata | Section: Research notes | Keywords: SAS, Stata

Each topic below shows the SAS code first and then the Stata equivalent.

Operators:

SAS:
    In SAS, operators can be symbols or mnemonic equivalents, such as:
        &   or   and
    In many situations in SAS the order of the symbols doesn't matter:
        <=  can be  =<
        >=  can be  =>

Stata:
    Most operators are the same in Stata as in SAS, but in Stata operators do not have mnemonic equivalents. For example, you have to use the ampersand ( & ) and not the word "and".
    This works:
        var_a >= 1 & var_b <= 10
    whereas this does not:
        var_a >= 1 and var_b <= 10
    These are the operators that are different in Stata:
        Symbol   Definition
        &        and
        |        or
        >=       greater than or equal to
        <=       less than or equal to
        ==       equality (for equality testing)
        !=       does not equal
        !        not
        ^        power
    Note: symbols have to be in the order shown: ">=" not "=>".

Comments:

SAS:
    /* this is a comment */
    * this is also a comment ;

Stata:
    /* this is a comment */
    * this is also a comment
    // this is a comment as well
    To continue a command on the next line (line continuation):
        /// you can comment here as well
    For example:
        list id state gender age income ///
             race income date

Range of values:

SAS:
    if 1 <= var_a <= 10
    or:
    if var_a in(1,2,3,4,5,6,7,8,9,10)
    or a list of character values:
    if state in("…")

Stata:
    if var_a >= 1 & var_a <= 10
    or:
    if inrange(var_a,1,10)
    or:
    if inlist(var_a,1,2,3,4,5,6,7,8,9,10)
    or a list of string values:
    if inlist(state,"…")
    Stata has a limit of 10 arguments to inlist() (which includes the string variable) when the arguments are strings. More than one variable can be specified.

Referencing multiple variables at a time:

SAS:
    Say the following variables are in a data file in the order shown:
        var1 var2 var3 age var4 var5
    Then you could code them as:
        var1--var5
    To SAS, this means "all the variables that are positionally between var1 and var5," which includes age.

        var1-var5
    is the same as:
        var1 var2 var3 var4 var5
    no matter where the variables are positioned in the observation.

    Using a colon selects variables that share the same prefix:
        var:
    could represent:
        var1 var2 var10 variable varying var_1

Stata:
        var1-var5
    To Stata, this means "all the variables that are positionally between var1 and var5." Notice that there is only one dash ( - ).

        var?
    The question mark ( ? ) is a wild card that represents one character in the variable name. It could be a number, a letter, or an underscore ( _ ).

        var*
    The asterisk/star ( * ) is a wild card that represents many characters in the variable name. They could be numbers, letters, or underscores. Thus:
        var*
    could represent:
        var1 var2 var10 variable varying var_1

Log files:

SAS:
    To save the contents of the Log window and/or Output window, go to that window and click on the menu bar's "…". In SAS batch mode these files are automatically generated for you.

Stata:
    To save the contents of the Results window, start logging to a log file BEFORE you submit the commands that you want logged. Open a log file by clicking on the icon in the tool bar that looks like a scroll and a traffic light. A "*.log" file is a simple ASCII text file; a "*.smcl" file is a log file that contains SMCL (Stata Markup and Control Language) tags.
    You can also use the log command:
        log using "…", replace
    Note: the replace option simply tells Stata to overwrite the log file if it already exists. This is helpful when you have to run a do-file over and over again.

Open a dataset:

SAS:
    libname in "…";
    data new;
      set in.mySASfile;
    run;
    or, starting in SAS 8:
    data new;
      set "…";
    run;

Stata:
    use "…"
    You can also click on the "… file" icon and select your dataset.

Save a dataset:

SAS:
    Save the dataset newer to "…":
    libname in "…";
    data in.newer;
      set new;
    run;

Stata:
    save "…"
    To overwrite the dataset newer if it already exists:
    save "…", replace
    You can also click on the "…".

Describe the contents of a dataset:

SAS:
    proc contents; run;
    On selected variables:
    proc contents data = in.newer (keep= id state gender age income);
    run;

Stata:
    describe
    On selected variables:
    describe id state gender age income

Summary statistics:

SAS:
    proc means; run;
    On selected variables:
    proc means; var age income; run;
    or:
    proc univariate; var age income; run;

Stata:
    summarize
    On selected variables:
    summarize age income
    If you want variable labels and proc univariate style output, try:
    summarize age income, detail
    or:
    codebook age income

Frequency tables:

SAS:
    proc freq; table var1; run;
    A series of 1-way tables:
    proc freq; tables var1 var2; run;
    A 2-way table:
    proc freq; tables var1*var2; run;

Stata:
    tabulate var1
    or, for just checking out your dataset, try the codebook command.
    A series of 1-way tables:
    tab1 var1 var2
    A 2-way table:
    tab2 var1 var2

List observations:

SAS:
    proc print; run;
    On selected variables in this order:
    proc print; var id age income; run;
    On selected variables and a limited range of observations:
    proc print data = new (firstobs = 1 obs = 20);
      var id age income;
    run;

Stata:
    list
    On selected variables in this order:
    list id age income
    On selected variables and a limited range of observations:
    list id age income in 1/20

Create variables:

SAS:
    Create a numeric variable with the default length of 8 bytes:
    var1 = 1234;
    Create a numeric variable with the minimum allowable length (3 bytes):
    length var1 3;
    var1 = 1234;
    Create a character variable with a length of 3 bytes:
    name = "…";

Stata:
    generate var1 = 1234
    Note: the default numeric data type is "float." The statement above relies on that default. It could have been written explicitly as:
    generate float var1 = 1234
    "float" is a 4-byte floating-point type that can hold decimal values. You could more wisely save storage space by specifying:
    gen int var1 = 1234
    "int" is a 2-byte integer type.
    Generate a string variable with a length of 3 bytes:
    gen str3 name = "…"

Change the values and lengths of variables:

SAS:
    Increase the variable length to allow for 5 characters:
    data new;
      length name $5;
      set new;
      * Change the values of numeric
      * and character variables: *;
      var1 = 123456;
      name = "…";
    run;

Stata:
    replace var1 = 123456
    Stata automatically increases the storage type if necessary. To change the storage type of a variable manually, use the recast command.
    replace name = "…"
    Stata automatically increases the length to 5.

If-then statement:

SAS:
    if var1 = 123456 then var2 = 1;

Stata:
    The condition follows the command:
    replace var2 = 1 if var1 == 123456
    Notice that Stata requires two equals signs when testing equality.

If-then do loop:

SAS:
    if age <= 10 then do;
      child = 1;
      parent = 0;
    end;

Stata:
    replace child = 1 if age <= 10
    replace parent = 0 if age <= 10
    Since each command is executed on all observations before the next command is executed, the if-then-do loop is not an option. Stata does have excellent looping tools: foreach, forvalues, and while.

If-then-else:

SAS:
    if 0 <= age <= 2 then agegp = 1;
    else if 2 < age <= 10 then agegp = 2;
    else if 10 < age <= 20 then agegp = 3;
    else if 20 < age <= 40 then agegp = 4;
    else agegp = . ;

Stata:
    For the same reason that if-then-do loops (above) are not possible in Stata, if-then-else is not possible either. But here is a way of doing the same thing. In this example the "missing(agegp)" condition simply highlights the fact that agegp has not yet been assigned a value, just like the else does in if-then-else:
    gen agegp = .
    replace agegp = 1 if missing(agegp) ///
        & age >= 0 & age <= 2
    replace agegp = 2 if missing(agegp) ///
        & age > 2 & age <= 10
    replace agegp = 3 if missing(agegp) ///
        & age > 10 & age <= 20
    replace agegp = 4 if missing(agegp) ///
        & age > 20 & age <= 40

    The cond() function can also be used:
    // nest cond() functions
    gen agegp = cond(missing(age),.,          /// else
                cond(age >= 0 & age <= 2 ,1,  /// else
                cond(age > 2 & age <= 10,2,   /// else
                cond(age > 10 & age <= 20,3,  /// else
                cond(age > 20 & age <= 40,4,.)))))

    This is better done with the recode command, which can also create value labels:
    recode age ( 0/2.9999  = 1 "0 to 2 year olds")   ///
               ( 3/10.9999 = 2 "3 to 10 year olds")  ///
               (11/20.9999 = 3 "11 to 20 year olds") ///
               (21/40.9999 = 4 "21 to 40 year olds") ///
               ( else = . ) , gen(agegp) test
    The test option checks whether the ranges overlap. Since recode's ranges are >= and <= , adding .9999 to the upper end of each range ensures that fractional values are handled correctly.

Drop and keep variables and observations:

SAS:
    Drop variables var1, var2, and var3:
    data new(drop= var1 var2 var3); set new; run;
    Keep variables var1, var2, and var3:
    data new(keep= var1 var2 var3); set new; run;
    Keep observations / subsetting if statement:
    data new; set new; if var1 = 1 then output new; run;
    Delete observations:
    data new; set new; if var1 = 1 then delete; run;

Stata:
    Drop variables var1, var2, and var3:
    drop var1 var2 var3
    Keep variables var1, var2, and var3:
    keep var1 var2 var3
    Keep observations:
    keep if var1 == 1
    Drop observations:
    drop if var1 == 1

Loop over a variable list (varlist):

SAS:
    data new(drop= i);
      set new;
      array raymond {4} var1 var2 var3 var4;
      do i = 1 to 4;
        if raymond{i} = 99 then raymond{i} = . ;
      end;
    run;

Stata:
    foreach i of varlist var1 var2 var3 var4 {
        replace `i' = . if `i' == 99
    }
    Note: the quote to the left of the local macro variable i is a left quote ( ` ). The left quote is located at the top of your keyboard next to the ( ! 1 ) key. In this example i is a local macro variable that exists only for the duration of the foreach command, so it does not need to be dropped like the variable i in the SAS code.

Create variable labels:

SAS:
    label age = "… in years" income = "… plus bonuses";

Stata:
    label var age "… in years"
    label var income "… plus bonuses"

Formats and value labels:

SAS:
    Define a format:
    proc format;
      value yesno 1 = "…";
    run;
    Assign the format to a variable:
    data newer; set newer; format smokes yesno.; run;
    Remove formats from a variable:
    data newer;
      set newer;
      ** just do not specify a format **;
      format smokes ;
    run;
    Assign formats defined by SAS to a variable:
    format interview_date mmddyy8.;

Stata:
    Define a format. These are called "value labels" in Stata:
    label define yesno 1 "…"
    Assign the value label to a variable:
    label value smokes yesno
    Remove the value label from a variable:
    label value smokes .
    Assign formats defined by Stata to a variable:
    format interview_date %tdNN/DD/YY
    /* pre Stata 10 the format did not start
     * with the letter "t" and did not
     * need two letters for each part of the date: */
    format interview_date %dN/D/Y
    Note: the letter N in %tdNN/DD/YY stands for the number of the month. Specifying Mon in %tdDDMonCCYY uses the three-letter abbreviation of the name of the month. So %tdNN/DD/YY displays as "11/06/45" and %tdDDMonCCYY displays as "06Nov1945".

Titles:

SAS:
    title "Number of Companies That Got Acquired";

Stata:
    Since the Results window/log file is a mix of both the log and the Output window, Stata doesn't need a title statement. Titling can be accomplished with a comment:
    /* Number of Companies That Got Acquired */

Sort:

SAS:
    proc sort data = new out = newer; by id; run;

Stata:
    sort id

Reshape long to wide:

SAS:
    proc sort data= sashelp.shoes
              (keep= region product subsidiary stores sales inventory)
              out= work.shoes;
      by region subsidiary product;
    run;
    /* fix flaw in dataset
     * where the Copenhagen subsidiary
     * has 2 obs for product = "… Shoe" **/
    proc summary nway data= work.shoes;
      /* the by statement fixes
       * the variable order in work.shoes **/
      by region subsidiary product;
      var stores sales inventory;
      output out= work.shoes (drop= _TYPE_ _FREQ_)
             sum=stores sales inventory;
    run;
    /* long to wide because:
     * there are repeats of by-variable values **/
    proc transpose data= work.shoes out= shoes_wide prefix=prodnum;
      by region subsidiary;
      var product;
    run;

Stata:
    keep region subsidiary product
    bysort region subsidiary (product) : gen prodnum = _n
    reshape wide product, ///
        i(region subsidiary) j(prodnum)
    The xpose command is similar but only works with numeric data. It will turn string variables into missing values.

Reshape wide to long:

SAS:
    /* wide to long because:
     * there are no repeats of by-variable values **/
    proc transpose data= work.shoes_wide out= shoes_long name=prodnum;
      by region subsidiary;
      var prodnum: ;
    run;

Stata:
    // j(prodnum) just names the _j variable prodnum
    reshape long product, i(region subsidiary) j(prodnum)

Using by-groups:

SAS:
    data newer;
      set newer;
      by id;
      if first.id = 1 then f_num = 1;
      if first.id = 1 and last.id = 1 then s_num = 1;
      if last.id = 1 then l_num = 1;
    run;

Stata:
    by id: gen f_num = 1 if _n == 1
    by id: gen s_num = 1 if _n == 1 & _N == 1
    by id: gen l_num = 1 if _n == _N
    Stata's _n is equivalent to SAS's _n_ in that it is equal to the observation number; but inside a by command _n is equal to 1 for the first observation of the by-group, 2 for the second observation of the by-group, etc.
    Stata's _N is equal to the number of observations in the dataset, except in a by command, where it is equal to the total number of observations in the by-group.

Count the total number of observations within each ID group, and add that total to each observation:

SAS:
    proc summary data= new nway;
      class id;
      var age;
      output out= temp(drop= _type_ _freq_) n= totboys;
    run;
    proc sort data= temp; by id; run;
    proc sort data= new; by id; run;
    data newer; merge new temp; by id; run;

Stata:
    bysort id: egen totboys = count(age)

Note: in both SAS and Stata, the count will be the number of observations where the variable being counted has a non-missing value. Here we used the variable age.

Create a cumulative/running sum of boys within each ID group:

SAS:
    data new;
      set newer;
      by id;
      retain count 0;
      if first.id then count = 0;
      if gender = 1 and age <= 18 then count = count + 1;
    run;

Stata:
    bysort id: gen count = sum(gender == 1 & age <= 18)

Merge two datasets:

SAS:
    data both;
      merge in.new(in = a) in.newer(in = b);
      by id;
      if a = 1 and b = 1;
    run;

Stata:
    use "…"
    sort id
    /* Starting in Stata 11 you have to specify
     * what type of merge you are doing, and you no longer
     * have to have your datasets sorted before the merge.
     * This is a one-to-one merge: */
    merge 1:1 id using "…"
    // in earlier versions of Stata:
    merge id using "…"
    keep if _merge == 3
    Stata automatically creates the variable _merge after a merge. Stata will not merge on another dataset if the variable _merge already exists in one of the datasets.
    The dataset in memory is the "master" dataset. The dataset that is being merged on is the "using" dataset.
    Unlike SAS, variables shared by the master dataset and the using dataset will not be updated (values overwritten) by the using dataset. Like SAS, the formats, labels, and informats of variables shared by the master dataset and the using dataset will be defined by the master dataset. Remember that the master always wins. Use the update option to overwrite missing data in the master file.

Concatenate two datasets / add observations to a dataset:

SAS:
    data both; set in.new in.newer; run;

Stata:
    use "…"
    append using "…"
    /* Starting in Stata 11 you can use append without
     * having a dataset already in memory: */
    append using "…"

Sort datasets in order to prepare them for a merge:

SAS:
    Sort permanently stored datasets and create new, sorted copies in the WORK library:
    proc sort data = in.company out = work.company; by id; run;
    proc sort data = in.firm out = work.firm; by id; run;
    data temp2;
      merge firm(in = a) company(in = b);
      by id;
    run;

Stata:
    Sorting datasets in order to prepare them for a merge is only required if you are using a version of Stata prior to Stata 11.
    Create a local macro variable to represent a filename for Stata to use in temporarily storing a data file on the computer's hard drive if requested to do so later:
    tempfile company
    use "…"
    Save the dataset that's currently in memory to a temporary filename in Stata's temp directory. This file will be deleted when Stata is exited, just like a dataset in SAS's WORK library:
    save `company'
    use "…"
    // pre Stata 11 code:
    sort id
    merge id using `company'
    /* Starting in Stata 11 the data does not need to
     * be sorted but the type of merge needs to be
     * specified like in this one-to-one merge: */
    merge 1:1 id using `company'

Survey data, means:

SAS:
    proc surveymeans;
      cluster sampunit;
      strata stratum;
      var age income;
      weight sampwt;
    run;
    Analyze a subpopulation by implementing the domain option:
    proc surveymeans;
      cluster sampunit;
      strata stratum;
      domain female;
      var age income;
      weight sampwt;
    run;

Stata:
    svyset sampunit [pweight = sampwt], strata(stratum)
    svy: mean age income
    Analyze a subpopulation by implementing the subpop option:
    svy: mean age income, subpop(females)
    Note: options come after a comma ( , ).

Survey data, frequency tables:

SAS:
    Starting in SAS 9:
    proc surveyfreq;
      cluster sampunit;
      strata stratum;
      tables females*var1*var2;
      weight sampwt;
    run;
    When using proc surveyfreq, the domain/subpop variable needs to be included in the tables statement.

Stata:
    svyset sampunit [pweight = sampwt], strata(stratum)
    svy: tab var1 var2, subpop(females)
    svy: tab var1 , subpop(females)

Survey data, linear regression:

SAS:
    proc surveyreg;
      cluster sampunit;
      strata stratum;
      model depvar = indvar1 indvar2 indvar3;
      weight sampwt;
    run;
    The surveyreg procedure does not have a way of dealing with subpopulations. Using by or where will not suffice, as they will compute incorrect standard errors.

Stata:
    svyset sampunit [pweight = sampwt], strata(stratum)
    svy: regress depvar indvar1 indvar2 indvar3, ///
        subpop(females)

Survey data, logistic regression:

SAS:
    Starting in SAS 9:
    proc surveylogistic;
      cluster sampunit;
      strata stratum;
      model depvar = indvar1 indvar2 indvar3;
      weight sampwt;
    run;
    The surveylogistic procedure does not have a way of dealing with subpopulations. Using by or where will not suffice, as they will compute incorrect standard errors.

Stata:
    svyset sampunit [pweight = sampwt], strata(stratum)
    svy: logit depvar indvar1 indvar2 indvar3, ///
        subpop(females)

Create a macro variable ver:

SAS:
    %let ver = 7;
    version = &ver.;
    Technically, SAS macro variables begin with an ampersand ( & ) and end with a period ( . ). It's good practice to end your macro variables with a period.

Stata:
    local ver = 7
    gen version = `ver'
    Notice that to evaluate the local macro variable ver, a left quote ( ` ) is used and then a right quote ( ' ). The left quote is located on your keyboard next to the ( ! 1 ) key.

Print a subset of observations when a condition is true, just to see examples (not all situations) where the condition exists in your data:

SAS:
    /** WHERE subsets the data
     *  before OBS subsets the data */
    proc print data= sashelp.shoes (where=(stores < 20) obs = 10);
    run;
    The above code lists the first 10 observations where (stores < 20).

Stata:
    list in 1/10 if stores < 20
    // the order of if and in does not matter:
    list if stores < 20 in 1/10
    Both will first subset the data to the first 10 observations and then subset those based on the condition "if stores < 20", so they do not match the SAS code. A hack way of doing the same in Stata is to use the sum() function. The sum() function creates a running sum, and it adds up the true conditions because a true condition evaluates to 1 (one) and a false one to 0 (zero). You have to repeat the condition outside the sum() so that only the observations meeting the condition are listed:
    list if sum((stores < 20)) <= 10 & stores < 20
    If the condition is long you could make a mistake typing it twice, so put it in a local macro variable:
    local cond stores < 20
    list if sum((`cond')) <= 10 & `cond'
    This is what the Stata command ifwins does.

Get a frequency count for each combination of a set of multiple categorical variables:

SAS:
    ** example of a 3-way table **;
    proc freq data= sashelp.shoes;
      tables region * product * stores / list;
    run;

Stata:
    There is no built-in Stata command to do this, but the contract command can be used like so:
    preserve
    contract region product stores , ///
        freq(frequency)              ///
        percent(percentage)          ///
        cfreq(cumulative_freq)       ///
        cpercent(cumulative_pct)
    list
    restore

Source: 李琦


Source URL: http://blog.renren.com/GetEntry.do?id=725558273&owner=240755041

This article is from: https://www.bwwdw.com/article/1tw3.html
