0% found this document useful (0 votes)

544 views207 pages

Good Stata Programming Lecture

The document discusses the benefits of becoming proficient in Stata programming through do-files and ado-files. It notes that do-files allow for reproducible research and documentation of work, while ado-files can automate repetitive tasks. However, the author cautions that existing Stata commands and user-written programs should be explored before writing new ado-files to avoid duplicating work. Overall, learning basic Stata programming helps improve efficiency by shifting repetitive tasks to the computer.

Uploaded by

Rumen Kostadinov

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

544 views207 pages

Good Stata Programming Lecture

Uploaded by

Rumen Kostadinov

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Adelaide, June 2010

6 / 207

Should you become a Stata programmer?

Using do-les

The beauty of this approach is exibility: if you nd an error in an earlier stage of the project, you need only modify the code and rerun that do-le and those following to bring the project up to date. For instance, an academic researcher may need to respond to a review of her papersubmitted months ago to an academic journalby revising the specication of variables in a set of estimated models and estimating new statistical results. If all of the steps producing the nal results are documented by a set of do-les, that task becomes straightforward. I argue that all serious users of Stata should gain some facility with do-les and the Stata commands that support repetitive use of commands. A few hours investment should save days of weeks of time over the course of a sizable research project.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

7 / 207

Should you become a Stata programmer?

Using do-les

That advice does not imply that Statas interactive capabilities should be shunned. Stata is a powerful and effective tool for exploratory data analysis and ad hoc queries about your data. But data management tasks and the statistical analyses leading to tabulated results should not be performed with point-and-click tools which leave you without an audit trail of the steps you have taken. Responsible research involves reproducibility, and point-and-click tools do not promote reproducibility. For that reason, I counsel researchers to move their data into Stata (from a spreadsheet environment, for example) as early as possible in the process, and perform all transformations, data cleaning, etc. with Statas do-le language. This can save a great deal of time when mistakes are detected in the raw data, or when they are revised.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

8 / 207

Should you become a Stata programmer?

Should you become a Stata programmer?

Writing Mata subroutines for ado-les

Context 3: Mata subroutines for ado-les

Your ado-les may perform some complicated tasks which involve many invocations of the same commands. Statas ado-le language is easy to read and write, but it is interpreted: Stata must evaluate each statement and translate it into machine code. Statas Mata programming language (help mata) creates compiled code which can run much faster than ado-le code. Your ado-le can call a Mata routine to carry out a computationally intensive task and return the results in the form of Stata variables, scalars or matrices. Although you may think of Mata solely as a matrix language, it is actually a general-purpose programming language, suitable for many non-matrix-oriented tasks such as text processing and list management.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

16 / 207

Should you become a Stata programmer?

Writing Mata subroutines for ado-les

The Mata programming environment is tightly integrated with Stata, allowing interchange of variables, local and global macros and Stata matrices to and from Mata without the necessity to make copies of those objects. A Mata program can easily generate an entire set of new variables (often in one matrix operation), and those variables will be available to Stata when the Mata routine terminates. Matas similarity to the C language makes it very easy to use for anyone with prior knowledge of C. Its handling of matrices is broadly similar to the syntax of other matrix programming languages such as MATLAB, Ox and GAUSS. Translation of existing code for those languages or from lower-level languages such as Fortran or C is usually quite straightforward. Unlike Statas C plugins, code in Mata is platform-independent, and developing code in Mata is easier than in compiled C.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

17 / 207

Tools for do-le authors

In this section of the talk, I will mention a number of tools and tricks useful for do-le authors. Like any language, the Stata do-le language can be used eloquently or incoherently. Users who bring other languages techniques and try to reproduce them in Stata often nd that their Stata programs resemble Googles automated translation of French to English: possibly comprehensible, but a long way from what a native speaker would say. We present suggestions on how the language may be used most effectively. Although I focus on authoring do-les, these tips are equally useful for ado-le authors: and perhaps even more important in that context, as an ado-le program may be run many times.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

26 / 207

Tools for do-le authors

The local macro

The local macro is an invaluable tool for do-le authors. A local macro is created with the local statement, which serves to name the macro and provide its content. When you next refer to the macro, you extract its value by dereferencing it, using the backtick () and apostrophe () on its left and right: local george 2 local paul = george + 2 In this case, I use an equals sign in the second local statement as I want to evaluate the right-hand side, as an arithmetic expression, and store it in the macro paul. If I did not use the equals sign in this context, the macro paul would contain the string 2 + 2.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

Stata do-le programming: Selected recipes

Introduction

Stata do-le programming: Selected recipes

I now present a number of recipes for solving common problems in Stata do-le programming. These are taken from my book, An Introduction to Stata Programming, Stata Press, 2009. The concept behind this collection of recipes is that you may have a problem similar to one of those illustrated here. Differing somewhat, perhaps, but with enough similarity to provide guidance for the problems successful solution. Just as you might modify a recipe in the kitchen to deal with ingredients at hand or preferences of the dinner guests, you can modify a do-le recipe to meet your needs.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

34 / 207

4.1: Tabulating a logical condition across a set of variables

Tabulating a logical condition across a set of variables

The problem: considering a number of related variables, you want to determine whether, for each observation, all variables satisfy a logical condition. Alternatively, you might want to know whether any satisfy that condition (for instance, taking on inappropriate values), or you might want to count how many of the variables satisfy the logical condition.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

35 / 207

4.1: Tabulating a logical condition across a set of variables

This would seem to be a natural application of egen as that command already contains a number of row-wise functions to perform computations across variables. For instance, the anycount() function counts the number of variables in its varlist whose values for each observation match those of an integer numlist, while the rowmiss() and rownonmiss() functions tabulate the number of missing and non-missing values for each observation, respectively. The three tasks above are all satised by egen functions from Nicholas Coxs egenmore package: rall(), rany() and rcount(), respectively. Why dont you just use those functions, then?

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

36 / 207

What if you want to juxtapose the summary statistics for each aggregate unit with the individual observations in order to compute one or more variables for each record? For instance, you might have repeated-measures data for a physicans patients measuring their height, weight and blood pressure at the time of each ofce visit. You might want to ag observations where their weight is above their median weight, or when their blood pressure is above the 75th percentile of their repeated measurements. Computations such as these may be done with a judicious use of by-groups. For instance,
. by patientid: egen medwt = median(weight) . by patientid: egen bp75 = pctile(bp), p(75)

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

42 / 207

One solution to this problem involves using a ready-made Stata command, tsspell, written by Nicholas J. Cox. This command can handle any aspect of our investigation. It does require that the underlying data be dened as a Stata time series (for instance, with tsset. This makes it less than ideal if your data are ordered but not evenly spaced, such as patient visits to their physician which may be irregularly timed. Another issue arises, though: that raised above with respect to egen. The tsspell program is fairly complicated interpreted code, which may impose a computational penalty when applied to a very large dataset. You may only need one simple feature of the program for your analysis. Thus, you may want to consider analyzing spells in do-le code, perhaps much simpler than the invocation of tsspell. You generally can avoid explicit looping over observations, and will want to do so whenever possible.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

49 / 207

4.4: Computing the length of spells

Assume that you have a variable denoting the ordering of the data (which might be a Stata date or date-and-time variable, but need not be) and that the data have been sorted on that variable. The variable of interest is employer, which takes on values A,B,C... or missing for periods of unemployment. You want to identify the beginning of each spell with an indicator variable. How do we know that a spell has begun? The condition
. generate byte beginspell = employer != employer[_n-1]

will sufce to dene the start of each new spell (using the byte datatype to dene this indicator variable). Of course, the data may be left censored in the sense that we do not start observing the employees job history on her date of hire. But the fact that employer[_n-1] is missing for period 1 does not matter, as it will be captured as the start of the rst spell. What about spells of unemployment? If they are coded as a missing value of employer, they will be considered spells as well.
Christopher F Baum (BC / DIW) Programming in Stata and Mata Adelaide, June 2010 50 / 207

4.4: Computing the length of spells

First consider some ctitious data on an employee. She is rst observed working for rm A in 1987, then is laid off in 1990. After a spell of unemployment, she is hired by rm B in 1992, and so on.
. list, sepby(employer) noobs year 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 B B B B B C A employer A A A wage 8.25 8.50 8.75 . . 7.82 7.98 8.12 8.40 8.52 9.00 9.25 . . begins~l 1 0 0 1 0 1 0 0 0 0 1 1 1 0
Programming in Stata and Mata 1 Adelaide, June 2010 51 / 207

Christopher F Baum (BC DDIW)10.18 / 2001

4.4: Computing the length of spells

Notice that beginspell properly ags each change in employment status, including entry into unemployment. If we wanted to ag only spells of unemployment, we could do so with
. generate byte beginunemp = missing(employer) & (employer != employer[_n-1])

which would properly identify years in which unemployment spells commenced as 1990, 1999 and 2006.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

52 / 207

4.4: Computing the length of spells

With an indicator variable agging the start of a spell, we can compute how many changes in employment status this employee has faced, as the count of that indicator variable provides that information. We can also use this notion to tag each spell as separate:
. list, sepby(employer) noobs year 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 B B B B B C A employer A A A wage 8.25 8.50 8.75 . . 7.82 7.98 8.12 8.40 8.52 9.00 9.25 . . begins~l 1 0 0 1 0 1 0 0 0 0 1 1 1 0 beginu~p 0 0 0 1 0 0 0 0 0 0 0 0 1 0 spellnr 1 1 1 2 2 3 3 3 3 3 4 5 6 6 7
Adelaide, June 2010 53 / 207

Christopher F Baum (BC DDIW)10.18 / 2001

Programming in Stata and Mata 1 0

4.4: Computing the length of spells

What if we now want to calculate the average wage paid by each employer?
. sort spellnr . by spellnr: egen meanwage = mean(wage)

Or the duration of employment with each employer (length of each employment spell)?
. by spellnr: gen length = _N if !missing(employer)

Here we are taking advantage of the fact that the time variable is an evenly spaced time series. If we had unequally spaced data, we would want to use Statas date functions to compute the duration of each spell.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

54 / 207

4.4: Computing the length of spells

This example may seem not all that useful as it refers to a single employees employment history. However, all of the techniques we have illustrated work equally well when applied in the context of panel or longitudinal data as long as they can be placed on a time-series calendar. If we add an id variable to these data and xtset id year, we may reproduce all of the results above by merely employing the by id: prex. In the last three examples, we must sort by both id and spell: for example,
. sort id spellnr . bysort id spellnr: egen meanwage = mean(wage)

is now required to compute the mean wage for each spell of each employee in a panel context.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

55 / 207

4.4: Computing the length of spells

A number of additional aspects of spells may be of interest. Returning to the single employees data, we may want to ag only employment spells at least three years long. Using the length variable, we may generate such as indicator as:
. sort spellnr . by spellnr: gen length = _N if !missing(employer) (5 missing values generated) . generate byte longspell = (length >= 3 & !missing(length)) . list year employer length longspell, sepby(employer) noobs year 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 B B B B B C employer A A A length 3 3 3 . . 5 5 5 5 5 1 longsp~l 1 1 1 0 0 1 1 1 1 1
Programming in Stata and Mata

Christopher F Baum (BC / DIW)

Adelaide, June 2010

56 / 207

6.1: Efciently dening group characteristics and subsets

Efciently dening group characteristics and subsets

The problem: say that your cross-sectional dataset contains a record for each patient who has been treated at one of several clinics. You want to associate each patients clinic with an location code (for urban clinics, the Standard Metropolitan Statistical Area (SMSA) in which the clinic is located). The SMSA identier is not on the patients record but it is available to you. How do you get this associated information on each patients record without manual editing? One quite cumbersome technique (perhaps familiar to users of other statistical packages) involves writing a long sequence of statements with if exp clauses. Let us presume that we have Stata dataset patient containing the individuals details as well as clinicid, the clinic ID. Assume that it can be dealt with as an integer. If it were a string code, that could easily be handled as well.
Christopher F Baum (BC / DIW) Programming in Stata and Mata Adelaide, June 2010 57 / 207

6.1: Efciently dening group characteristics and subsets

Create a text le, clinics.raw, containing two columns: the clinic ID (clinicid) and the SMSA FIPS code (smsa) For instance,
12367 12467 12892 13211 14012 ... 23435 29617 32156 1120 1120 1120 1200 4560 ... 5400 8000 9240

where SMSA codes 1120, 1200, 4560, 5400, 8000 and 9240 refer to the Boston, Brockton, Lowell, New Bedford, Springeld-Chicopee-Holyoke and Worcester, MA SMSAs, respectively.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

58 / 207

6.1: Efciently dening group characteristics and subsets

Read the le into Stata with infile clinicid smsa using clinics, and save the le as Stata dataset clinic_char. Now use the patient le and give the commands
. merge n:1 clinicid using clinic_char . tab _merge

We use the n:1 form of the merge command to ensure that the clinic_char dataset has a single record per clinic. After the merge is performed you should nd that all patients now have an smsa variable dened. If there are missing values in smsa, list the clinicids for which that variable is missing and verify that they correspond to non-urban locations. When you are satised that the merge has worked properly, type
. drop _merge

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

59 / 207

6.1: Efciently dening group characteristics and subsets

You have performed a one-to-many merge, attaching the same SMSA identier to all patients who have been treated at clinics in that SMSA. You may now use the smsa variable to attach SMSA-specic information to each patient record with merge. Unlike an approach depending on a long list of conditional statements such as
. replace smsa=1120 if inlist(clinicid,12367,12467,12892,...)

this approach leads you to create a Stata dataset containing your clinic ID numbers so that you may easily see whether you have a particular code in your list. This approach would be especially useful if you revise the list for a new set of clinics.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

60 / 207

6.1a: Selecting a subset of observations

As Nicholas Cox has pointed out in a Stata FAQ, the above approach may also be fruitfully applied if you need to work with a subset of observations that satisfy a complicated criterion. This might be best dened in terms of an indicator variable that species the criterion (or its complement). The same approach may be used. Construct a le containing the identiers that dene the criterion (in the example above, the clinic IDs to be included in the analysis). Merge that le with your dataset and examine the _merge variable. That variable will take on values 1, 2 or 3, with a value of 3 indicating that the observation falls in the subset. You may then dene the desired indicator:
. generate byte subset1 = _merge == 3 . drop _merge . regress ... if subset1

Using this approach, any number of subsets may be easily constructed and maintained, avoiding the need for complicated conditional statements.
Christopher F Baum (BC / DIW) Programming in Stata and Mata Adelaide, June 2010 61 / 207

8.1: Handling parallel lists

Handling parallel lists

The problem: For each of a set of variables, you want to perform some steps that involve another group of variables, perhaps creating a third set of variables. These are parallel lists, but the variable names of the other lists may not be deducible from those of the rst list.How can these steps be automated? First, lets consider that we have two arbitrary sets of variable names, and want to name the resulting variables based on the rst sets variable names. For instance, you might have some time series of population data for several counties and cities:

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

62 / 207

8.1: Handling parallel lists

. . . . . . . .

local county Suffolk Norfolk Middlesex Worcester Hampden local cseat Boston Dedham Cambridge Worcester Springfield local wc 0 foreach c of local county { local ++wc local sn : word `wc of `cseat generate seatshare`county = `sn / `c }

This foreach loop will operate on each pair of elements in the parallel lists, generating a set of new variables seatshareSuffolk, seatshareNorfolk. . .

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

63 / 207

8.1: Handling parallel lists

Another form of this logic would use a set of numbered variables in one of the loops. In that case, you could use a forvalues loop over the values (assuming they were consecutive or otherwise patterned) and the extended macro function word of. . . to access the elements of the other loop. The tokenize command could also be used. Alternatively, you could use a forvalues loop over both lists, employing the word count extended macro function:
. local n: word count `county . forvalues i = 1/`n { . local a: word ì of `county . local b: word ì of `cseat . generate seatshareà = `b/à . }

yielding the same results as the previous approach.

The problem: if you have unbalanced panel data, how do you ensure that each unit has at least n observations available? It is straightforward to calculate the number of available observations for each unit.
. xtset patient date . by patient: generate nobs = _N . generate want = (nobs >= n)

These commands will produce an indicator variable, want, which selects those units which satisfy the condition of having at least n available observations.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

67 / 207

8.4: Requiring at least n observations per panel unit

This works well if all you care about is the number of observations available, but you may have a more subtle concern: you want to count consecutive observations. You may want to compute statistics based on changes in various measurements using Statas L. or D. time series operators Applying these operators to series with gaps will create missing values. A solution to this problem is provided by Nicholas J. Cox and Vince Wiggins in a Stata FAQ, How do I identify runs of consecutive observations in panel data? The sequence of consecutive observations is often termed a run or a spell. They propose dening the runs in the timeseries for each panel unit:

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

68 / 207

8.4: Requiring at least n observations per panel unit

. . . .

generate run = . by patient: replace run = cond(L.run == ., 1, L.run + 1) by patient: egen maxrun = max(run) generate wantn = (maxrun >= n)

The second command replaces the missing values of run with either 1 (denoting the start of a run) or the prior value + 1. For observations on consecutive dates, that will produce an integer series 1,. . . ,.len where len is the last observation in the run. When a break in the series occurs, the prior (lagged) value of run will be missing, and run will be reset to 1. The variable maxrun then contains, for each patient, the highest value of run in that units sequence.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

69 / 207

A second, somewhat less intuitive but shorter solution:

. bysort pid (vid) : generate count = sum( vid != vid[_n-1] ) . by pid: replace count = count(_N)

This solution takes advantage of the fact that when (vid) is used on the bysort prex, the data are sorted in order of vid within each pid, even though the pid is the only variable dening the by-group. When the vid changes, another value of 1 is generated and summed. When subsequent transactions pertain to the same vendor, vid != vid[_n-1] evaluates to 0, and those zero values are added to the sum.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

73 / 207

8.5: Counting the number of distinct values per individual

This problem is common enough that an ofcial egen function has been developed to tag observations:
. egen tag = tag( pid vid ) . egen count = total(tag), by(pid)

The tag() function returns 1 for the rst observation of a particular combination of pid vid, and zero otherwise. Thus, its total() for each pid is the number of vids with whom she deals. As a last solution, Coxs egenmore package contains the nvals()egen nvals() function, which allows you to say
. egen count = nvals(vid), by(pid)

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

74 / 207

10.1: Computing rm-level correlations with multiple indices

Computing rm-level correlations with multiple indices

The problem: a user on Statalist posed a question involving a very sizable dataset of rm-level stock returns and a set of index fund returns. He wanted to calculate, for each rm, the average returns and the set of correlations with the index funds, and determine with which fund they were most highly correlated. We illustrate this problem with some actual daily stock returns data for 291 rms, 19922006, from CRSP (the Center for Research on Securities Prices): 311,737 rm-daily observations in total. We have constructed nine simulated index funds returns. The hypothetical funds, managed by a group of Greek investment specialists, are labeled the Kappa, Lambda, Nu, Xi, Tau, Upsilon, Phi, Chi and Psi funds. These data are stored in crspsubseta.
Christopher F Baum (BC / DIW) Programming in Stata and Mata Adelaide, June 2010 75 / 207

10.1: Computing rm-level correlations with multiple indices

To solve the problem, we dene a loop over rms. For each rm of the nf rms, we want to calculate the correlations between rm returns and the set of nind index returns, and nd the maximum value among those correlations. The variable hiord takes on values 19, while permno is an integer code assigned to each rm by CRSP. We set up a Stata matrix retcorr to hold the correlations, with nf rows and nind columns. The number of rms and number of indices are computed by the word count extended macro function applied to the local macro produced by levelsof.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

76 / 207

10.1: Computing rm-level correlations with multiple indices

. qui levelsof hiord, local(indices) . . . . local nind : word count `indices qui levelsof permno, local(firms) local nf : word count `firms matrix retcorr = J(`nf, `nind, .)

We calculate the average return for each rm with summarize, meanonly. In a loop over rms, we use correlate to compute the correlation matrix of each rms returns, ret, with the set of index returns. For rm n, we move the elements of the last row of the matrix corresponding to the correlations with the index returns into the nth row of the retcorr matrix. We also place the mean for the nth rm into that observation of variable meanret.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

77 / 207

10.1: Computing rm-level correlations with multiple indices

. . . . .

local n 0 qui gen meanret = . qui gen ndays = . local row = `nind + 1 foreach f of local firms { 2. qui correlate index1-index`nind ret if permno == `f 3. matrix sigma = r(C) 4. local ++n 5. forvalues i = 1/`nind { 6. matrix retcorr[`n, `i] = sigma[`row, `i] 7. } 8. summarize ret if permno == `f, meanonly 9. qui replace meanret = r(mean) in `n 10. qui replace ndays = r(N) in `n 11. }

We now may use the svmat command to convert the retcorr matrix into a set of variables, retcorr1-retcorr9. The egen function rowmax() computes the maximum value for each rm. We then must determine which of the nine elements is matched by that maximum value. This number is stored in highcorr.
Christopher F Baum (BC / DIW) Programming in Stata and Mata Adelaide, June 2010 78 / 207

10.1: Computing rm-level correlations with multiple indices

. . . .

svmat double retcorr qui egen double maxretcorr = rowmax(retcorr*) qui generate highcorr = . forvalues i = 1/`nind { 2. qui replace highcorr = `i if maxretcorr == retcorr`i /// > & !missing(maxretcorr) 3. }

We now can sort the rm-level data in descending order of meanret, using gsort and list rms and their associated index fund numbers. These values show, for each rm, which index fund their returns most closely resemble. For brevity, we list only the fty best-performing rms.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

79 / 207

10.1: Computing rm-level correlations with multiple indices

. gsort -meanret highcorr . label values highcorr ind . list permno meanret ndays highcorr in 1/50, noobs sep(0) permno meanret ndays highcorr

1 97 5 q1 1 9 8 0q1 E ndpoint

19 8 5 q1

1 9 9 0 q1

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

95 / 207

Ado-le programming: a primer

The syntax statement

. webuse grunfeld, clear . quietly replace invest = . in 28 . quietly replace mvalue = . in 55 . quietly replace kstock = . in 87 . quietly replace kstock = . in 94 . onespell invest mvalue kstock, saving(grun1) replace Observations removed: 28 file grun1.dta saved

A total of 28 observations are removed. The tabulation shows that rms 2, 3 and 5 now have longest spells of 12, 14 and 6 years, respectively.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

egen, nl and gmm programming

egen functions

egen functions
The egen (Extended Generate) command is open-ended, in that any Stata user may dene an additional egen function by writing a specialized ado-le program.The name of the program (and of the le in which it resides) must start with _g: that is, _gcrunch.ado will dene the crunch() function for egen. To illustrate egen functions, let us create a function to generate the 9010 percentile range of a variable. The syntax for egen is:
egen type newvar = fcn(arguments) if in , options

Adelaide, June 2010

112 / 207

egen, nl and gmm programming

egen functions

To illustrate, we use auto.dta:

. sysuse auto, clear (1978 Automobile Data) . bysort rep78 foreign: egen pctrange = pct9010(price)

Now, if we want to compute a summary statistic (such as the percentile range) for each observation classied in a particular subset of the sample, we may use the pct9010() function to do so.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

113 / 207

egen, nl and gmm programming

nl and nlsur programs

You may perform nonlinear least squares estimation for either a single equation (nl) or a set of equations (nlsur). Although these commands may be used interactively or in terms of programmed substitutable expressions," most serious use is likely to involve your writing a function evaluator program. That program will compute the dependent variable(s) as a function of the parameters and variables specied.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

114 / 207

egen, nl and gmm programming

egen, nl and gmm programming

nl and nlsur programs

. use production, clear . nl ces @ lnoutput capital labor, parameters(b0 rho delta) /// > initial(b0 0 rho 1 delta 0.5) (obs = 100) Iteration 0: residual SS = 29.38631 Iteration 1: residual SS = 29.36637 Iteration 2: residual SS = 29.36583 Iteration 3: residual SS = 29.36581 Iteration 4: residual SS = 29.36581 Iteration 5: residual SS = 29.36581 Iteration 6: residual SS = 29.36581 Iteration 7: residual SS = 29.36581 Source SS df MS Number of obs Model 91.1449924 2 45.5724962 R-squared Residual 29.3658055 97 .302740263 Adj R-squared Root MSE Total 120.510798 99 1.21728079 Res. dev. lnoutput /b0 /rho /delta Coef. 3.792158 1.386993 .4823616 Std. Err. .099682 .472584 .0519791 t 38.04 2.93 9.28 P>|t| 0.000 0.004 0.000

= = = = =

100 0.7563 0.7513 .5502184 161.2538

[95% Conf. Interval] 3.594316 .4490443 .3791975 3.989999 2.324941 .5855258

Parameter b0 taken as constant term in model & ANOVA table

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

119 / 207

egen, nl and gmm programming

z 0.00 -6.42 5.73

P>|z| 0.998 0.000 0.000

[95% Conf. Interval] -3.465116 -1.590348 45.80824 3.472113 -.8466975 93.42233

Instruments for equation 1: gear_ratio length headroom _cons

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

126 / 207

egen, nl and gmm programming

gmm programs

The gmm command may be used to estimate models that are not already programmed in Stata, including those containing nonlinear moment conditions, models with multiple equations and panel data models. This illustration merely lays the groundwork for more complex applications of GMM estimation procedures. For instance, we might want to apply Poisson regression in a panel data context. A standard Poisson regression may be written as y = exp(x ) + u If the x variables are strictly exogenous, this gives rise to the moment condition E[x{y exp(x )}] = 0 and we need only compute the residuals from this expression to implement GMM estimation.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

127 / 207

egen, nl and gmm programming

gmm programs

In a panel context, with an individual heterogeneity term (xed effect) i , we have E(yit |xit , i ) = exp(xit + i ) = it i where it = exp(xit and i = exp(i ). With an additive error term , we have the regression model yit = it i +
it

where i is allowed to be correlated with the regressors.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

128 / 207

egen, nl and gmm programming

gmm programs

With strictly exogenous regressors, the sample moment conditions are xit
i t

yi yit it i

where the bar values are the means of y and for panel i. As i depends on the parameters of the model, it must be recalculated within the residual equation.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

129 / 207

egen, nl and gmm programming

gmm programs

Our moment evaluator program for this problem is then:

. program gmm_ppois 1. version 11 2. syntax varlist if, at(name) myrhs(varlist) /// > mylhs(varlist) myidvar(varlist) 3. quietly { 4. tempvar mu mubar ybar 5. gen double `mu = 0 ìf 6. local j = 1 7. foreach var of varlist `myrhs { 8. replace `mu = `mu + `var*àt[1,`j] ìf 9. local ++j 10. } 11. replace `mu = exp(`mu) 12. egen double `mubar = mean(`mu) ìf, by(`myidvar) 13. egen double `ybar = mean(`mylhs) ìf, by(`myidvar) 14. replace `varlist = `mylhs - `mu*`ybar/`mubar ìf 15. } 16. end

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

130 / 207

egen, nl and gmm programming

gmm programs

Using the poisson1 dataset with three exogenous regressors, we estimate the model:
. webuse poisson1, clear . gmm gmm_ppois, mylhs(y) myrhs(x1 x2 x3) myidvar(id) /// > nequations(1) parameters(b1 b2 b3) /// > instruments(x1 x2 x3, noconstant) vce(cluster id) /// > onestep nolog Final GMM criterion Q(b) = 5.13e-27 GMM estimation Number of parameters = 3 Number of moments = 3 Initial weight matrix: Unadjusted Number of obs = 409 (Std. Err. adjusted for 45 clusters in id) Robust Std. Err. .1000265 .0923592 .1156561

Coef. /b1 /b2 /b3 1.94866 -2.966119 1.008634

z 19.48 -32.12 8.72

P>|z| 0.000 0.000 0.000

[95% Conf. Interval] 1.752612 -3.14714 .781952 2.144709 -2.785099 1.235315

Instruments for equation 1: x1 x2 x3

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Circumventing the limits of Statas matrix language

The Mata programming language can sidestep these memory issues by creating matrices with contents that refer directly to Stata variablesno matter how many variables and observations may be referenced. These virtual matrices, or views, have minimal overhead in terms of memory consumption, regardless of their size. Unlike some matrix programming languages, Mata matrices can contain either numeric elements or string elements (but not both). This implies that you can use Mata productively in a list processing environment as well as in a numeric context. For example, a prominent list-handling command, Bill Goulds adoupdate, is written almost entirely in Mata. viewsource adoupdate.ado reveals that only 22 lines of code (out of 1,193 lines) are in the ado-le language. The rest is Mata.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

135 / 207

Introduction to Mata

Speed advantages

Speed advantages
Last but by no means least, ado-le code written in the matrix language with explicit subscript references is slow. Even if such a routine avoids explicit subscripting, its performance may be unacceptable. For instance, David Roodmans xtabond2 can run in version 7 or 8 without Mata, or in version 9 or 10 with Mata. The non-Mata version is an order of magnitude slower when applied to reasonably sized estimation problems. In contrast, Mata code is automatically compiled into bytecode, like Java, and can be stored in object form or included in-line in a Stata do-le or ado-le. Mata code runs many times faster than the interpreted ado-le language, providing signicant speed enhancements to many computationally burdensome tasks.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

136 / 207

Introduction to Mata

An efcient division of labor

Mata interfaced with Stata provides for an efcient division of labor. In a pure matrix programming language, you must handle all of the housekeeping details involved with data organization, transformation and selection. In contrast, if you write an ado-le that calls one or more Mata functions, the ado-le will handle those housekeeping details with the convenience features of the syntax and marksample statements of the regular ado-le language. When the housekeeping chores are completed, the resulting variables can be passed on to Mata for processing. Mata can access Stata variables, local and global macros, scalars and matrices, and modify the contents of those objects as needed. If Matas view matrices are used, alterations to the matrix within Mata modies the Stata variables that comprise the view.
Christopher F Baum (BC / DIW) Programming in Stata and Mata Adelaide, June 2010 137 / 207

Outline of the talk

In the rest of this talk, I will discuss: Basic elements of Mata syntax Design of a Mata function Matas interface functions Some examples of StataMata routines

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

138 / 207

Mata language elements

Operators

Operators
To understand Mata syntax, you must be familiar with its operators. The comma is the column-join operator, so : r1 = ( 1, 2, 3 ) creates a three-element row vector. We could also construct this vector using the row range operator (..) as : r1 = (1..3) The backslash is the row-join operator, so c1 = ( 4 \ 5 \ 6 ) creates a three-element column vector. We could also construct this vector using the column range operator (::) as : c1 = (4::6)
Christopher F Baum (BC / DIW) Programming in Stata and Mata Adelaide, June 2010 139 / 207

Mata language elements

Operators

We may combine the column-join and row-join operators: m1 = ( 1, 2, 3 \ 4, 5, 6 \ 7, 8, 9 ) creates a 3 3 matrix. The matrix could also be constructed with the row range operator: m1 = ( 1..3 \ 4..6 \ 7..9 )

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

140 / 207

Mata language elements

Operators

The prime (or apostrophe) is the transpose operator, so r2 = ( 1 \ 2 \ 3 ) is a row vector. The comma and backslash operators can be used on vectors and matrices as well as scalars, so r3 = r1, c1 will produce a six-element row vector, and c2 = r1 \ c1 creates a six-element column vector. Matrix elements can be real or complex, so 2 - 3 i refers to a complex number 2 3 1.
Christopher F Baum (BC / DIW) Programming in Stata and Mata Adelaide, June 2010 141 / 207

Mata language elements

Operators

The standard algebraic operators plus (+), minus () and multiply () work on scalars or matrices: g = r1 + c1 h = r1 * c1 j = c1 * r1 In this example h will be the 1 1 dot product of vectors r1, c1 while j is their 3 3 outer product.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

142 / 207

Mata language elements

Element-wise calculations and the colon operator

One of Matas most powerful features is the colon operator. Matas algebraic operators, including the forward slash (/) for division, also can be used in element-by-element computations when preceded by a colon: k = r1 :* c1 will produce a three-element column vector, with elements as the product of the respective elements: ki = r 1i c1i , i = 1, . . . , 3.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

143 / 207

Mata language elements

Element-wise calculations and the colon operator

Matas colon operator is very powerful, in that it will work on nonconformable objects. For example: r4 m2 m3 m4 = = = = ( 1, 2, 3 ) ( 1, 2, 3 \ 4, 5, 6 \ 7, 8, 9 ) r4 :+ m2 m1 :/ r1

adds the row vector r4 to each row of the 3 3 matrix m2 to form m3, and divides the elements of each row of matrix m1 by the corresponding elements of row vector r1 to form m4. Matas scalar functions will also operate on elements of matrices: d = sqrt(c) will take the element-by-element square root, returning missing values where appropriate.
Christopher F Baum (BC / DIW) Programming in Stata and Mata Adelaide, June 2010 144 / 207

Mata language elements

Logical operators

Logical operators
As in Stata, the equality logical operators are a == b and a != b. They will work whether or not a and b are conformable or even of the same type: a could be a vector and b a matrix. They return 0 or 1. Unary not ! returns 1 if a scalar equals zero, 0 otherwise, and may be applied in a vector or matrix context, returning a vector or matrix of 0, 1. The remaining logical comparison operators (>, >=, <, <=) can only be used on objects that are conformable and of the same general type (numeric or string). They return 0 or 1. As in Stata, the logical and (&) and or (|) operators may only be applied to real scalars.
Christopher F Baum (BC / DIW) Programming in Stata and Mata Adelaide, June 2010 145 / 207

Mata language elements

Subscripting

Subscripting
Subscripts in Mata utilize square brackets, and may appear on either the left or right of an algebraic expression. There are two forms: list subscripts and range subscripts. With list subscripts, you can reference a single element of an array as x[i,j]. But i or j can also be a vector: x[i,jvec], where jvec= (4,6,8) references row i and those three columns of x. Missing values (dots) reference all rows or columns, so x[i,.] or x[i,] extracts row i, and x[.,.] or x[,] references the whole matrix. You may also use range operators to avoid listing each consecutive element: x[(1..4),.] and x[(1::4),.] both reference the rst four rows of x. The double-dot range creates a row vector, while the double-colon range creates a column vector. Either may be used in a subscript expression. Ranges may also decrement, so x[(3::1),.] returns those rows in reverse order.
Christopher F Baum (BC / DIW) Programming in Stata and Mata Adelaide, June 2010 146 / 207

Mata language elements

Subscripting

Range subscripts use the notation [| |]. They can reference single elements of matrices, but are not useful for that. More useful is the ability to say x[| i,j \ m,n |], which creates a submatrix starting at x[i,j] and ending at x[m,n]. The arguments may be specied as missing (dot), so x[| 1,2 \4,. |] species the submatrix ending in the last column and x[| 2,2 \ .,.|] discards the rst row and column of x. They also may be used on the left hand side of an expression, or to extract a submatrix: v = invsym(xx)[| 2,2 \ .,.|] discards the rst row and column of the inverse of xx. You need not use range subscripts, as even the specication of a submatrix can be handled with list subscripts and range operators, but they are more convenient for submatrix extraction and faster in terms of execution time.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

147 / 207

Mata language elements

Loop constructs

Loop constructs
Several constructs support loops in Mata. As in any matrix language, explicit loops should not be used where matrix operations can be used. The most common loop construct resembles that of the C language: for (starting_value; ending_value; incr ) { statements } where the three elements dene the starting value, ending value or bound and increment or decrement of the loop. For instance: for (i=1; i<=10; i++) { printf("i=%g \n", i) } prints the integers 1 to 10 on separate lines. If a single statement is to be executed, it may appear on the for statement.
Christopher F Baum (BC / DIW) Programming in Stata and Mata Adelaide, June 2010 148 / 207

A Mata function denition includes an argument list, which may be blank. The names of arguments are required and arguments are positional. The order of arguments in the calling sequence must match that in the Mata function. If the argument list includes a vertical bar ( | ), following arguments are optional. Within a function, variables may be explicitly declared (and must be declared if matastrict mode is used). It is good programming practice to do so, as then variables cannot be inadvertently misused. Variables within a Mata function have local scope, and are not accessible outside the function unless declared as external. A Mata function may only return one item (which could, however, be a multi-element structure. If the function is to return multiple objects, Matas st_... functions should be used, as we will demonstrate.
Christopher F Baum (BC / DIW) Programming in Stata and Mata Adelaide, June 2010 154 / 207

Matas interface functions

Data access

If youre using Mata functions in conjunction with Statas ado-le language, one of the most important set of tools are Matas interface functions: the st_ functions. The rst category of these functions provide access to data. Stata and Mata have separate workspaces, and these functions allow you to access and update Statas workspace from inside Mata. For instance, st_nobs(), st_nvar() provide the same information as describe in Stata, which returns r(N), r(k) in its return list. Mata functions st_data(), st_view() allow you to access any rectangular subset of Statas numeric variables, and st_sdata(), st_sview() do the same for string variables.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

155 / 207

Some examples of StataMata routines

A multi-variable function

The Stata code:

program centervars, rclass version 11 syntax varlist(numeric) [if] [in], /// GENerate(string) [DOUBLE] marksample touse quietly count if `touse if `r(N) == 0 error 2000 foreach v of local varlist confirm new var `generate`v foreach v of local varlist qui generate `double `generate`v = . local newvars "`newvars `generate`v" mata: centerv( "`varlist", "`newvars", "`touse" ) end

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

173 / 207

Some examples of StataMata routines

A multi-variable function

The le centervars.ado contains a Stata command, centervars, that takes a list of numeric variables and a mandatory generate() option. The contents of that option are used to create new variable names, which then are tested for validity with confirm new var, and if valid generated as missing. The list of those new variables is assembled in local macro newvars. The original varlist and the list of newvars is passed to the Mata function centerv() along with touse, the temporary variable that marks out the desired observations.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

174 / 207

Some examples of StataMata routines

A multi-variable function

Passing a function to Mata

The Stata code:

program centertrans, rclass version 11 syntax varlist(numeric) [if] [in], /// GENerate(string) [TRans(string)] [DOUBLE] marksample touse quietly count if touse if r(N) == 0 error 2000 foreach v of local varlist { confirm new var generatev } local trops abs exp log sqrt if "trans" == "" { local trfn "mf_iden" } else { local ntr : list posof "trans" in trops if !ntr { display as err "Error: trans must be chosen from trops" error 198 } local trfn : "mf_trans" } foreach v of local varlist { qui generate double generatetransv = . local newvars "newvars generatetransv" } mata: centertrans( "varlist", "newvars", &trfn(), "touse" ) end

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

179 / 207

Some examples of StataMata routines

Passing a function to Mata

In Mata, we must dene wrapper functions" for the transformations, as we cannot pass a pointer to a built-in function. We dene trivial functions such as function mf_log(x) return(log(x)) which denes the mf_log() scalar function as taking the log of its argument. The Mata function centertrans() receives the function argument as pointer(real scalar function) scalar f To apply the function, we use Z[ ., . ] = (*f)(X) which applies the function referenced by f to the elements of the matrix X.
Christopher F Baum (BC / DIW) Programming in Stata and Mata Adelaide, June 2010 180 / 207

Some examples of StataMata routines

Passing a function to Mata

The Mata code:

version 11 mata: function mf_abs(x) return(abs(x)) function mf_exp(x) return(exp(x)) function mf_log(x) return(log(x)) function mf_sqrt(x) return(sqrt(x)) function mf_iden(x) return(x) void centertrans( string scalar varlist, /// string scalar newvarlist, pointer(real scalar function) scalar f, string scalar touse) real matrix X, Z st_view(X=., ., tokens(varlist), touse) st_view(Z=., ., tokens(newvarlist), touse) Z[ , ] = (*f)(X) Z[ , ] = Z :- mean(Z) end

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

181 / 207

Some examples of StataMata routines

A Mata-based estimation routine

Mata may prove particularly useful when you have an algorithm readily expressed in matrix form. Many estimation problems fall into that category. In this last example, I illustrate how heteroskedastic OLS (HOLS) can be easily implemented in Mata, with Stata code handling all of the housekeeping details. This section draws on joint work with Mark Schaffer. HOLS is a form of Generalised Method of Moments (GMM) estimation in which you assert not only that the regressors X are uncorrelated with the error, but that you also have additional variables Z which are also uncorrelated with the error. Those additional orthogonality conditions serve to improve the efciency of estimation when non-i.i.d. errors are encountered. This estimator is described in help ivreg2 and in Baum, Schaffer & Stillman, Stata Journal, 2007.
Christopher F Baum (BC / DIW) Programming in Stata and Mata Adelaide, June 2010 182 / 207

Some examples of StataMata routines

A Mata-based estimation routine

A particularly important feature added to Mata in Stata version 10 is the suite of optimize() commands. These commands permit you to dene your own optimization routine in Mata and direct its use. The routine need not be a maximum-likelihood nor nonlinear least squares routine, but rather any well-dened objective function that you wish to minimize or maximise. Just as with ml, you may write a d0, d1 or d2 routine, requiring zero, rst or rst and second analytic derivatives in terms of the gradient vector and Hessian matrix. For ease of use in statistical applications, you may also construct a v0, v1 or v2 routine in terms of the score vector and Hessian matrix. For the rst time, Stata provides a non-classical optimization method, NelderMead simplex, in addition to the classical techniques available elsewhere in Stata.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

183 / 207

Some examples of StataMata routines

A Mata-based estimation routine

The hols command takes a dependent variable and a set of regressors. The exog() option may be used to provide the names of additional variables uncorrelated with the error. By default, hols calculates estimates under the assumption of i.i.d. errors. If the robust option is used, the estimates standard errors are robust to arbitrary heteroskedasticity. Following estimation, the estimates post and estimates display commands are used to provide standard Stata estimation output. If the exog option is used, a SarganHansen J test statistic is provided. A signicant value of the J statistic implies rejection of the null hypothesis of orthogonality.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

184 / 207

Some examples of StataMata routines

A Mata-based estimation routine

The Stata code:

program hols, eclass version 11 syntax varlist [if] [in] [, exog(varlist) robust ] local depvar: word 1 of varlist local regs: list varlist - depvar marksample touse markout touse exog tempname b V mata: m_hols("depvar", "regs", "exog", "touse", "robust") mat b = r(beta) mat V = r(V) local vnames regs _cons matname V vnames matname b vnames, c(.) local N = r(N) ereturn post b V, depname(depvar) obs(N) esample(touse) ereturn local depvar = "depvar" ereturn scalar N = r(N) ereturn scalar j = r(j) ereturn scalar L = r(L) ereturn scalar K = r(K) if "robust" != "" { ereturn local vcetype "Robust" } local res = cond("exog" != "", "Heteroskedastic", "") display _newline "res OLS results" _col(60) "Number of obs = " e(N) ereturn display display "Sargan-Hansen J statistic: " %7.3f e(j) if ( e(L)-e(K) > 0 ) { display "Chi-sq(" %3.0f e(L)-e(K) " ) P-val = " /// %5.4f chiprob(e(L)-e(K), e(j)) _newline } end Christopher F Baum (BC / DIW) Programming in Stata and Mata Adelaide, June 2010 185 / 207

Some examples of StataMata routines

A Mata-based estimation routine

Some examples of StataMata routines

A Mata-based estimation routine

Comparison of HOLS estimation results

. hols price mpg headroom, robust OLS results Robust Std. Err. 66.15838 314.9933 2163.351 Number of obs = 74

price mpg headroom _cons

Coef. -259.1057 -334.0215 12683.31

z -3.92 -1.06 5.86

P>|z| 0.000 0.289 0.000

[95% Conf. Interval] -388.7737 -951.3971 8443.224 -129.4376 283.3541 16923.4

Sargan-Hansen J statistic: 0.000 . hols price mpg headroom, exog(trunk displacement weight) robust Heteroskedastic OLS results Number of obs = 74 Robust Std. Err. 60.375 313.6646 2067.6

price mpg headroom _cons

Coef. -287.4003 -367.0973 13136.54

z -4.76 -1.17 6.35

P>|z| 0.000 0.242 0.000

[95% Conf. Interval] -405.7331 -981.8687 9084.117 -169.0675 247.6741 17188.96

Sargan-Hansen J statistic: 4.795 Chi-sq( 3 ) P-val = 0.1874

1. 71 2. 80 3. 116 4. 132 5. 158 6. 176 7. 179 8. 201 9. 244 10. 277 11. 299 12. 308 13. 335 14. 347 15. 361 16. 448 17. 453 Christopher F Baum 18. 463

2 31 1 36 4 46 3 13 6 56 7 28 6 94 7 17 10 11 11 8 12 26 11 31 14 10 13 14 14 9 17 18 16 23 (BC 17 / DIW) 11

Programming in Stata and Mata

Adelaide, June 2010

204 / 207

Example of Mata programming

Example: Finding nearest neighbors

We must note, however, that the response variables values are very weakly correlated with those of the matchvar. Matching cities on the basis of one attribute does not seem to imply that they will have similar values of air pollution. We thus exercise the routine on two broader sets of attributes: one adding temp and wind, and the second adding precip and days, where days measures the mean number of days with poor air quality.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

205 / 207

Example of Mata programming

Example: Finding nearest neighbors

. nneighbor pop temp wind, y(so2) matchobs(mo3) matchval(mv3) Nearest neighbors for 41 observations of so2 Based on L2-norm of standardized vars: pop temp wind Matched observation numbers: mo3 Matched values: mv3 Correlation[ so2, mv3 ] = 0.1769 . nneighbor pop temp wind precip days, y(so2) matchobs(mo5) matchval(mv5) Nearest neighbors for 41 observations of so2 Based on L2-norm of standardized vars: pop temp wind precip days Matched observation numbers: mo5 Matched values: mv5 Correlation[ so2, mv5 ] = 0.5286

We see that with the broader set of ve attributes on which matching is based, there is a much higher correlation between the so2 values for each city and those for its nearest neighbor.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

206 / 207

Workshop: delegates programming needs and solutions

In this last session, I encourage each of you to specify a statistical or data management problem for which you need programming assistance. We will develop possible solutions to these problems, emphasizing the use of programming principles discussed in the course.

Christopher F Baum (BC / DIW)

Programming in Stata and Mata

Adelaide, June 2010

207 / 207

Stata - Tips PDF
100% (1)
Stata - Tips PDF
114 pages
Basic Stata Programming
No ratings yet
Basic Stata Programming
111 pages
A Comprehensive Guide To Coding and Programming in Stata 1st Edition Verified Download
100% (13)
A Comprehensive Guide To Coding and Programming in Stata 1st Edition Verified Download
16 pages
Using Stat A
No ratings yet
Using Stat A
105 pages
Michael J Panik - Regression Modeling - Methods, Theory, and Computation With SAS-CRC Press (2009)
No ratings yet
Michael J Panik - Regression Modeling - Methods, Theory, and Computation With SAS-CRC Press (2009)
806 pages
Introduction To Stata
No ratings yet
Introduction To Stata
181 pages
An Introduction To Modern Econometrics Using Stata (Christopher Baum) PDF
100% (1)
An Introduction To Modern Econometrics Using Stata (Christopher Baum) PDF
349 pages
Stata Presentacion
No ratings yet
Stata Presentacion
109 pages
Panel Analysis - April 2019 PDF
100% (1)
Panel Analysis - April 2019 PDF
303 pages
Introduction To Stata 8
No ratings yet
Introduction To Stata 8
74 pages
C# Source Generators Explained: Boosting Compile-Time Productivity: Write Smarter Code by Automating Repetition and Enhancing Your C# Projects with Compile-Time Code Generation by BOSCO-IT CONSULTING 2025
No ratings yet
C# Source Generators Explained: Boosting Compile-Time Productivity: Write Smarter Code by Automating Repetition and Enhancing Your C# Projects with Compile-Time Code Generation by BOSCO-IT CONSULTING 2025
461 pages
Time Series Documentation - Mathematica
100% (2)
Time Series Documentation - Mathematica
214 pages
A Little Bit of STATA Programming
No ratings yet
A Little Bit of STATA Programming
32 pages
Lectures
No ratings yet
Lectures
766 pages
L5 Logistic Regression (2011)
100% (1)
L5 Logistic Regression (2011)
55 pages
Baum 2003 - Introduction To Stata
No ratings yet
Baum 2003 - Introduction To Stata
65 pages
Stata Manual 2009
100% (5)
Stata Manual 2009
222 pages
Epidemic Modelling
100% (3)
Epidemic Modelling
75 pages
An Introduction To Modern Econometrics Using Stata by Christopher F. Baum
No ratings yet
An Introduction To Modern Econometrics Using Stata by Christopher F. Baum
349 pages
Panel Data Analysis Using Stata: Sebastian T. Braun University of ST Andrews
No ratings yet
Panel Data Analysis Using Stata: Sebastian T. Braun University of ST Andrews
90 pages
The Nature of Statistics (Statistics - A Universal Guide To The Unknown Book 1)
No ratings yet
The Nature of Statistics (Statistics - A Universal Guide To The Unknown Book 1)
184 pages
STATA Commands For Unobserved Effects Pa
No ratings yet
STATA Commands For Unobserved Effects Pa
23 pages
Stataquest Tutorial Programming Guide: Henrik Schmiediche June 1997
No ratings yet
Stataquest Tutorial Programming Guide: Henrik Schmiediche June 1997
45 pages
Applied Econometrics
100% (1)
Applied Econometrics
74 pages
Difference Between Association and Causation
No ratings yet
Difference Between Association and Causation
92 pages
Applied Longitudinal Analysis Lecture Notes
No ratings yet
Applied Longitudinal Analysis Lecture Notes
475 pages
DDD Analysis
No ratings yet
DDD Analysis
21 pages
Pig Latin Reference Manual 2
No ratings yet
Pig Latin Reference Manual 2
149 pages
Applied Econometrics Using Stata
100% (1)
Applied Econometrics Using Stata
100 pages
Panel Count Models in Stata
No ratings yet
Panel Count Models in Stata
79 pages
Endogeneity and Instrumental Variables
No ratings yet
Endogeneity and Instrumental Variables
22 pages
Stata Guide To Accompany Introductory Econometrics For Finance
No ratings yet
Stata Guide To Accompany Introductory Econometrics For Finance
175 pages
TreeMap Program
No ratings yet
TreeMap Program
4 pages
Regression With Stata
No ratings yet
Regression With Stata
132 pages
R For Stata Users PDF
100% (1)
R For Stata Users PDF
32 pages
Stata Tutorial
No ratings yet
Stata Tutorial
87 pages
File Download PDF
No ratings yet
File Download PDF
136 pages
Stata Ts Introduction To Time-Series Commands
100% (1)
Stata Ts Introduction To Time-Series Commands
6 pages
Master PHP Over Night
No ratings yet
Master PHP Over Night
203 pages
Baum Intro To Stata Programming Contents
No ratings yet
Baum Intro To Stata Programming Contents
9 pages
IVregression ECO311 Erdinc 14.03
No ratings yet
IVregression ECO311 Erdinc 14.03
11 pages
Stata Guide To Accompany Introductory Econometrics For Finance PDF
No ratings yet
Stata Guide To Accompany Introductory Econometrics For Finance PDF
175 pages
Panel Data Methods For Microeconometrics Using Stata: A. Colin Cameron Univ. of California - Davis
100% (1)
Panel Data Methods For Microeconometrics Using Stata: A. Colin Cameron Univ. of California - Davis
55 pages
CadScriptingLanguages Skill
50% (2)
CadScriptingLanguages Skill
10 pages
Scala Tutorial
No ratings yet
Scala Tutorial
36 pages
Stata Graphs - Examples
No ratings yet
Stata Graphs - Examples
42 pages
STATA Training Session 2
No ratings yet
STATA Training Session 2
45 pages
Useful Stata Commands
No ratings yet
Useful Stata Commands
48 pages
4 - PHP
No ratings yet
4 - PHP
48 pages
UsefulStataCommands PDF
No ratings yet
UsefulStataCommands PDF
51 pages
Tutorial How To Run Panel Data Analysis by Using Stata
No ratings yet
Tutorial How To Run Panel Data Analysis by Using Stata
21 pages
Java Arrays
No ratings yet
Java Arrays
5 pages
Teaching With Stata: Peter A. Lachenbruch & Alan C. Acock Oregon State University
No ratings yet
Teaching With Stata: Peter A. Lachenbruch & Alan C. Acock Oregon State University
28 pages
Graphing Stata (MIT)
No ratings yet
Graphing Stata (MIT)
56 pages
02-Introduction To Stata
No ratings yet
02-Introduction To Stata
19 pages
Working With Arrays-1
No ratings yet
Working With Arrays-1
71 pages
Regression Explained SPSS
No ratings yet
Regression Explained SPSS
24 pages
Introduction To STATA: Introduction To STATA About STATA Basic Operations Regression Analysis Panel Data Analysis
No ratings yet
Introduction To STATA: Introduction To STATA About STATA Basic Operations Regression Analysis Panel Data Analysis
27 pages
Maximum Likelihood Programming in Stata: January 2003
No ratings yet
Maximum Likelihood Programming in Stata: January 2003
18 pages
STATA Training Session 3
No ratings yet
STATA Training Session 3
53 pages
PHP Question Bank Ch-1,2 - 090233
No ratings yet
PHP Question Bank Ch-1,2 - 090233
2 pages
Java CS Tut N Questions
No ratings yet
Java CS Tut N Questions
31 pages
Asynchronous Programming in C# and Visual Basic: Mads Torgersen, Microsoft October 2010
No ratings yet
Asynchronous Programming in C# and Visual Basic: Mads Torgersen, Microsoft October 2010
15 pages
ACT-671 Introduction Econometrics-2012
No ratings yet
ACT-671 Introduction Econometrics-2012
29 pages
Server-Side Scripting (PHP)
No ratings yet
Server-Side Scripting (PHP)
25 pages
Java 8 Interview Questions - Tutorialspoint
No ratings yet
Java 8 Interview Questions - Tutorialspoint
16 pages
7 January 2019 Osu Cse 1
No ratings yet
7 January 2019 Osu Cse 1
66 pages
Books Available From SAS Press
No ratings yet
Books Available From SAS Press
4 pages
The Advantages of Least Squares Monte Carlo
0% (1)
The Advantages of Least Squares Monte Carlo
9 pages
Module 3 IT 104 Arrays and Collections - 051541
No ratings yet
Module 3 IT 104 Arrays and Collections - 051541
21 pages
MacKinnon Critical Values For Cointegration Tests Qed WP 1227
No ratings yet
MacKinnon Critical Values For Cointegration Tests Qed WP 1227
19 pages
Dev0 07 PLPGSQL Arrays
No ratings yet
Dev0 07 PLPGSQL Arrays
17 pages
Logger System
No ratings yet
Logger System
15 pages
Stata Basic Commands
No ratings yet
Stata Basic Commands
62 pages
3 Control Structure and Funcrtions in PHP
No ratings yet
3 Control Structure and Funcrtions in PHP
16 pages
Import Data From Excel To Azure SQL Database Using Azure Data Factory
No ratings yet
Import Data From Excel To Azure SQL Database Using Azure Data Factory
24 pages
Arma 3 Code Optimisation
No ratings yet
Arma 3 Code Optimisation
12 pages
PowerShell - Special Characters and Tokens
No ratings yet
PowerShell - Special Characters and Tokens
4 pages
SIS Model For An Infectious Disease
No ratings yet
SIS Model For An Infectious Disease
3 pages
Java For C# Programmers
No ratings yet
Java For C# Programmers
25 pages
ITE 311 - Unit 3 Conditional Statements, Looping and Array
No ratings yet
ITE 311 - Unit 3 Conditional Statements, Looping and Array
19 pages
Collection Was Modified Enumeration
No ratings yet
Collection Was Modified Enumeration
14 pages
Scala-Quiz MD
No ratings yet
Scala-Quiz MD
10 pages
Namma Kalvi Computer Applications Chapter 7 Study Material em 214982
No ratings yet
Namma Kalvi Computer Applications Chapter 7 Study Material em 214982
11 pages
WEBAPPS - Practice Exercise 8 - PHP Repetition Statement and Array
No ratings yet
WEBAPPS - Practice Exercise 8 - PHP Repetition Statement and Array
8 pages
Java Lab Manual BCA 4 C and D
No ratings yet
Java Lab Manual BCA 4 C and D
6 pages
Author:: Nitin S Senior Engineer
No ratings yet
Author:: Nitin S Senior Engineer
5 pages
Stata Introduction To Stata
No ratings yet
Stata Introduction To Stata
12 pages
Uncertainty Bands: A Guide to Predicting and Regulating Economic Processes
From Everand
Uncertainty Bands: A Guide to Predicting and Regulating Economic Processes
Ashot Tavadyan
No ratings yet
The Data Detective's Toolkit: Cutting-Edge Techniques and SAS Macros to Clean, Prepare, and Manage Data
From Everand
The Data Detective's Toolkit: Cutting-Edge Techniques and SAS Macros to Clean, Prepare, and Manage Data
Kim Chantala
No ratings yet