
Stata For Dummies v1m

The document provides an overview of basic Stata functions including how to open and exit Stata, load data, name and label variables, and save outputs. Key points covered include how to open the data editor to enter data, use the command window to run commands, name and label variables, and save a log file to record session outputs and commands. The document is intended to accompany a 2 hour workshop on introductory Stata skills.
Copyright
© Attribution Non-Commercial (BY-NC)

Stata for Dummies:

A Practical Introduction to Stata Basics



Gwilym Pryce
4 March 2009

These notes are designed to accompany a 2-hour practical workshop on the widely used statistics programme, Stata. The session includes guidance on how to open and exit Stata, how to open a data file, how to use a syntax file, how to save output, how to create a simple table, how to create a graph, and how to run a simple regression.

1. How to Load-up and Exit Stata


To open Stata, click on the Start button and select: Statistical Apps, Stata10.

To exit Stata, press <Alt+F4> or click on File, Exit. Make sure, however, that you have saved your work before exiting. You will learn how to do this below.

2. What you see when you open Stata


Depending on how Stata has been set up on your computer, you will see four frames or windows within Stata: Results, Command, Review, and Variables. If any of these are not visible, just click on Window on the menu bar and select the appropriate item. The Window menu also allows you to access other windows (Data, Viewer, Do), which are not opened automatically when you load up Stata. A brief explanation of each of these windows is given below.

Results: The largest window is the Results window. When you first load up Stata this window usually lists details on the version of Stata, the serial number of your copy, and the web address of the Stata corporation.

Command: The Command window is where you type instructions that you want Stata to act on immediately (as opposed to a Do-file, which allows you to accumulate a list of commands before requesting that Stata execute them). The results of your commands will usually be displayed in the Results window (no surprises there then). For example, if you click in the Command window and type describe then hit the <Enter> key on your keyboard, it will give basic details of the dataset in memory. Since we have not opened a data file yet, the Results window should list the following details or something similar:
. describe

Contains data
  obs:             0
 vars:             0
 size:             0 (100.0% of memory free)
Sorted by:

Notice that the first line of the output lists the command you have entered. It then tells you that you have zero observations (obs), zero variables (vars), the data file has zero size, leaving you with 100% of memory, and that your data are not sorted by any particular variable.

In the Command window, notice also that you can scroll through previous commands by pressing <Page Up> and <Page Down> on your keyboard. Once the command is in view, you can edit it. If you want to re-enter a command (whether edited or unedited), simply press <Enter> when the command is in view.

Review: Having been truly enlightened by the outcome of your first command, you will notice that your describe instruction has also appeared in the Review window. This window keeps a record of your commands. It saves you from having to retype a command. If you want to repeat an instruction, simply click on the appropriate line of the Review window and it will appear in the Command window. You can then edit it, if you wish, in the Command window, before pressing the <Enter> key to run the command.

Variables: This window simply lists the variables in memory and the labels ascribed to them. Since we have not loaded a dataset, the Variables window should be blank. Once you do have some variables to play with, you can click on a variable name in the Variables window and the name of the variable will be pasted to the Command window, sparing you the trouble of having to type it. This is quite a useful facility when you want to perform an operation on lots of variables.

Data: The Data-editor remains out of view until you open it, either by typing edit in the Command window or by selecting Data-editor from the list that pops up when you select Window from the menu bar. The Data-editor looks like a spreadsheet but is in fact a lot less flexible. For example, variables are always presented as columns with the variable names at the top and observations as rows. You cannot enter formulas in the cells, only data, either numerical or string. Nevertheless, the Data-editor is probably the easiest way to enter your own data (an alternative is the input command, or you can import data from Excel and other formats; see note 1).

Note 1: Stat/Transfer is a useful companion program: it converts data from a wide variety of formats.

Viewer: This window is only opened if you select Help from the menu bar or if you want to view a Log-file (one that keeps track of your output and commands; see below). It is worth noting at this point that Stata has an excellent Help facility. It takes a little while to get used to the format, but it is very comprehensive and has a very consistent structure. The Help facility is so good, in fact, that you could probably get by without ever having to refer to the printed manuals. For example, click on Help, Search, then type edit, and press <Enter>. Scroll down the list of entries on offer in the Viewer window (which will have opened automatically) until you come to the edit hyperlink, and click on it. (Alternatively, you could have simply typed help edit in the Command window.) You should then see a detailed description of the edit command. Items in bold and in square brackets refer to the manual volume where you can find more detailed information. In this case, it should say [D], which refers to the Data Management manual. This is only worth knowing if you have a set of manuals (expensive).

Log-file: It is important to note that nothing you have done so far has been saved or recorded for posterity. Once you close Stata, all the commands you've entered and outputs you've created are lost forever. If you have created or edited a Data-file or Do-file (see below), Stata will ask you if you want to save the changes, but it will not offer such useful prompts for entries to the Command or Results window. To save a record of your Stata session you must open a Log-file. The easiest way to do this is to click File, Log, Begin, then decide on the folder and file name.

Do this now so that you have a record of the remainder of this session: click File, Log, Begin. Call the file Stata for Dummies (or whatever you like) and save it to your H: drive, or temporarily onto the C: drive (note that the latter will be deleted when the computer is turned off, assuming you are using a lab computer).
Alternatively you can enter the log using instruction in the Command window followed by the directory and filename. For example,
log using "H:\My Documents\whatever_you_like.smcl"

where smcl is the file extension for Stata log-files. You can view the contents of the log file at any time by going to File, Log, View.
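As a minimal sketch of the whole logging cycle in command form: the file name stata_session.smcl is a made-up example, and the replace option (which overwrites an existing log of the same name) is an addition to what the notes describe. In the workshop itself, leave the log close line until the end of the session.

```stata
* Open a log; "replace" overwrites any existing log with this name.
* The path and file name are illustrative only.
log using "H:\My Documents\stata_session.smcl", replace

* Anything run from here on is recorded in the log
describe

* Close the log when the session is finished
log close
```

If you would rather add to an existing log than overwrite it, the append option can be used in place of replace.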

NB Remember to close the log-file before you exit Stata, otherwise the file will be lost! To close the log-file, simply type log close in the Command window (or click File, Log, Close). Don't do this just yet, however. Wait until the end of the session.

3. Create a Do-File (Syntax file)


Although the Command window can be useful for small tasks, the best way to work with Stata for larger projects is to create a Do-file. This is basically a text file where you enter your commands on separate lines and then run a command, or sequence of commands, by highlighting it and clicking the Do current file icon (or pressing <Ctrl+R> on the keyboard) while in the Do-file window.

Click on the New Do-file icon on the Results window toolbar. If you are not sure which icon to click, you can pass your mouse pointer over each icon in turn to obtain a brief description. Once the Do-file editor has opened, click File, Save as, and choose a suitable directory and file name. You cannot run commands from within the Do-file until you've saved the Do-file.

It's a good idea to add titles and labels to your Do-file to make it easier to follow when you return to it at a future date. If you start a line with an asterisk, Stata will ignore everything that follows on that line. Type *=============================== on the first line of your Do-file. Then press <Enter> and type *Stata Training Session on the second line. Then copy the first line (highlight and press <Ctrl+C>), and paste it onto the third line (<Ctrl+V>). Your Do-file should now look something like this:
*===============================
*Stata Training Session
*===============================

4. Entering, Labelling and Saving Data


Go to the Command window (if it is not in view, hold down <Alt> then press <Tab> repeatedly until you've reached the Stata icon). Type edit in the Command window and press <Enter> to open the Data Editor (or click on the Data Editor icon on the Results toolbar).

In the first column, enter the numbers 1 to 5 on consecutive lines.
In the second column, enter the following numbers on consecutive lines: 15845, 74500, 31000, 22000, 20323.
In the third column, enter the following words on consecutive lines: female, male, male, female, male.

Then close the Data Editor by clicking on X in the top right corner of the Data Editor, and Accept Changes. You will see that in the Variables window, you now have three variables listed: var1, var2, and var3. We now want to give these variables more meaningful names. Type and run (highlight the lines then press <Ctrl+R>) the following commands in your Do-file:
rename var1 id
rename var2 income
rename var3 sex

You will see in the Variables window that the names of the variables have changed accordingly. The next step is to label the variables. Type and run the following three lines from your Do-file:
label variable id "Respondent Identification Code"
label variable income "Respondent basic income ()"
label variable sex "Sex of respondent"

You will see in the Variables window that the variables now have labels (you might need to widen the Variables window to see this: simply use your mouse to drag the edge of the Variables window until you can read the variable labels). Save the data in an appropriate folder by typing and running the save command in your Do-file. For example, if you wanted to save the file in the H:\My Documents folder (probably not a good idea if you are using a lab computer), you would type:
save "H:\My Documents\income_data.dta"

where dta is the extension used to identify the file as a Stata dataset.
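The input command mentioned earlier as an alternative to the Data Editor can build the same dataset directly from a Do-file. The following is a sketch rather than part of the workshop steps: it assumes you are starting from an empty dataset, and str6 declares sex as a string variable of up to six characters.

```stata
* Sketch: enter the same five observations without the Data Editor.
* "clear" empties memory first, so save any existing data beforehand.
clear
input id income str6 sex
1 15845 female
2 74500 male
3 31000 male
4 22000 female
5 20323 male
end

* Confirm the data were read as intended
list
```

Because the variables are named in the input line, no rename step is needed with this approach.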

5. Closing and Opening a Data File


First, let's clear everything in memory (this won't affect your Do-file but it will wipe any data you've entered, so make sure you have saved your data-file first). Type clear on a new line in your Do-file and then run it (highlight the line then press <Ctrl+R>).

You will see that the Variables window is now blank. Now open the Data-file you have just created: enter and run the use command in your Do-file. Depending on the folder in which you saved your data, the command will look something like:
use "H:\My Documents\income_data.dta", clear

On this occasion, you don't actually need the comma followed by the clear option since you had already entered clear as a separate command prior to running the use command. Normally, however, you wouldn't run clear as a separate command but as an option at the end of the use command because the latter only clears the data from memory whereas the former wipes everything (macros, scalars, matrices, mata routines, and lots of other stuff you don't need to know about just now). If you had not cleared the data (either separately or as an option), Stata would have come up with an error message warning you that it did not open the data file because the data in memory would have been lost.
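A typical open, edit, save cycle might look like the sketch below. The replace option on save is not covered in the notes above: it tells Stata to overwrite the existing file, and without it Stata refuses to save over a file that already exists. The path is the same illustrative one used earlier.

```stata
* Open the dataset, discarding whatever data are currently in memory
use "H:\My Documents\income_data.dta", clear

* ... make changes to the data here ...

* Save over the existing file; "replace" permits the overwrite
save "H:\My Documents\income_data.dta", replace
```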

6. Creating New Variables


We shall now learn how to add a new variable using the input command, then how to use the gen command to create a quantitative variable from two existing quantitative variables, then how to create a series of dummy variables from a categorical variable using the tab , gen() command. Create a new variable overtime by running the following syntax from your Do-file:
input overtime
850
0
2000
1000
5000
end

Now label the variable:


label variable overtime "Respondent income from overtime ()"

Now use the gen command to create a new variable called total_income which is the sum of overtime and basic income:
gen total_income = income + overtime
label variable total_income "Total Income ()"

Now run a simple frequency table for sex of respondent using the tab command:
tab sex

This should result in the following table appearing in the Results window:
     Sex of |
 respondent |      Freq.     Percent        Cum.
------------+-----------------------------------
     female |          2       40.00       40.00
       male |          3       60.00      100.00
------------+-----------------------------------
      Total |          5      100.00

This is a useful command because one of the options (typed after a comma) is to generate a series of dummy (binary) variables, one for each category of the variable in question. To do this for the sex variable, type:
tab sex, gen(sex_)

which should repeat the frequency table and create two new variables: sex_1, which equals 1 if the observation is female and zero otherwise, and sex_2, which equals 1 if the observation is male and zero otherwise. This is a most useful facility, particularly when one has a variable with many potential categories for which a separate dummy variable has to be created for each category (as is often the case when one needs to include the effect of a categorical variable in a regression equation).
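When only one dummy is needed, it can also be created by hand with gen and a logical expression. This is a sketch of an alternative, not a step in the workshop; the variable name female is made up for illustration. The expression in parentheses evaluates to 1 where it is true and 0 where it is false.

```stata
* Sketch: build a single dummy from the string variable sex.
* "female" is a hypothetical variable name chosen for this example.
gen female = (sex == "female")
label variable female "1 if respondent is female, 0 otherwise"

* Cross-tabulate to check the new dummy against the original variable
tab female sex
```

The result should match sex_1 from the tab sex, gen(sex_) command, but with a name that says what the variable measures.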

7. Creating Tables of Summary Statistics


The sum command (short for summarize) is a great way to get a quick summary of a quantitative variable. Type and run sum income overtime. The result will be a table listing the number of observations, mean, standard deviation, minimum and maximum of each variable:

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      income |         5     32733.6    23989.04      15845      74500
    overtime |         5        1770    1940.232          0       5000

By adding the detail option, a more comprehensive list of descriptive statistics is revealed. Running the following command from your Do-file,
sum income overtime, detail

yields:
                 Respondent basic income ()
-------------------------------------------------------------
      Percentiles      Smallest
 1%        15845          15845
 5%        15845          20323
10%        15845          22000       Obs                   5
25%        20323          31000       Sum of Wgt.           5

50%        22000                      Mean            32733.6
                        Largest       Std. Dev.      23989.04
75%        31000          20323
90%        74500          22000       Variance       5.75e+08
95%        74500          31000       Skewness        1.31378
99%        74500          74500       Kurtosis       2.983174

             Respondent income from overtime ()
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0            850
10%            0           1000       Obs                   5
25%          850           2000       Sum of Wgt.           5

50%         1000                      Mean               1770
                        Largest       Std. Dev.      1940.232
75%         2000            850
90%         5000           1000       Variance        3764500
95%         5000           2000       Skewness       1.030552
99%         5000           5000       Kurtosis       2.640236

To run descriptive statistics by category of another variable (such as income by gender), you can use the command tab categorical_variable, sum(continuous_variable). For example, try entering:
tab sex, sum(income)

You should obtain the following table:

            | Summary of Respondent basic income
     Sex of |                 ()
 respondent |        Mean   Std. Dev.       Freq.
------------+------------------------------------
     female |     18922.5   4352.2422           2
       male |       41941   28697.839           3
------------+------------------------------------
      Total |     32733.6   23989.037           5
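Two other ways of getting summary statistics by group are sketched below. Neither appears in the workshop steps above, although tabstat is one of the commands suggested for further reading in the help section. The statistics() option lists which statistics to report; mean, sd and n are used here as an example.

```stata
* Sketch: several variables summarised by group in one table
tabstat income overtime, by(sex) statistics(mean sd n)

* Sketch: the full summarize output repeated for each group
* (bysort sorts the data by sex before running the command)
bysort sex: sum income
```

tabstat is convenient when you want the same statistics for several variables side by side, whereas tab with the sum() option handles one continuous variable at a time.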

8. Creating a Graph
Type hist income to get a histogram of income:
[Figure: histogram of Respondent basic income (); vertical axis: Density]

Type scatter income overtime to get a scatter plot of basic income and overtime income:
[Figure: scatter plot of Respondent basic income () (vertical axis) against Respondent income from overtime () (horizontal axis)]

Enter and run graph bar (mean) total_income, over(sex) to get a bar chart of the mean income of respondents by sex:

[Figure: bar chart of mean of total_income for female and male respondents]

9. Running a Regression
The syntax for running a regression is very simple. Simply type regress followed by the dependent variable, followed by the independent variables (separated by spaces). Run a regression of overtime on basic income and sex using the following syntax:
regress overtime income sex_1

You should get a table of regression results that looks like the following:

      Source |       SS       df       MS              Number of obs =       5
-------------+------------------------------           F(  2,     2) =    4.77
       Model |  12447426.2     2  6223713.12           Prob > F      =  0.1734
    Residual |  2610573.77     2  1305286.88           R-squared     =  0.8266
-------------+------------------------------           Adj R-squared =  0.6533
       Total |    15058000     4     3764500           Root MSE      =  1142.5

------------------------------------------------------------------------------
    overtime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |  -.0777339   .0279902    -2.78   0.109    -.1981659    .0426982
       sex_1 |   -3197.65   1225.908    -2.61   0.121    -8472.309    2077.008
       _cons |    5593.57    1346.56     4.15   0.053    -200.2087    11387.35
------------------------------------------------------------------------------

Now try running the regression only on males. Because sex is a string variable, the value must be enclosed in double quotes:
regress overtime income if sex == "male"

which should yield the following output:

      Source |       SS       df       MS              Number of obs =       3
-------------+------------------------------           F(  1,     1) =    4.25
       Model |  10255839.8     1  10255839.8           Prob > F      =  0.2874
    Residual |  2410826.85     1  2410826.85           R-squared     =  0.8097
-------------+------------------------------           Adj R-squared =  0.6193
       Total |  12666666.7     2  6333333.33           Root MSE      =  1552.7

------------------------------------------------------------------------------
    overtime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |  -.0789081   .0382577    -2.06   0.287    -.5650182    .4072021
       _cons |   5642.817   1837.999     3.07   0.200    -17711.18    28996.81
------------------------------------------------------------------------------

Now try running the original regression using White's standard errors (which give more reliable t-values when you have heteroskedasticity) by including the robust option. Run the following regression:
regress overtime income sex_1, robust

Linear regression                                      Number of obs =       5
                                                       F(  2,     2) =    6.38
                                                       Prob > F      =  0.1354
                                                       R-squared     =  0.8266
                                                       Root MSE      =  1142.5

------------------------------------------------------------------------------
             |               Robust
    overtime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |  -.0777339   .0244834    -3.17   0.087    -.1830773    .0276096
       sex_1 |   -3197.65   1385.386    -2.31   0.147    -9158.487    2763.186
       _cons |    5593.57   1787.983     3.13   0.089    -2099.499    13286.64
------------------------------------------------------------------------------
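To compare the ordinary and robust results side by side, the two sets of estimates can be stored and tabulated. This is a sketch that goes beyond the workshop steps: the names ols and white are arbitrary labels chosen for this example, and quietly simply suppresses the regression output that has already been shown.

```stata
* Sketch: store both sets of results under hypothetical names
quietly regress overtime income sex_1
estimates store ols

quietly regress overtime income sex_1, robust
estimates store white

* Coefficients with standard errors beneath, one column per model
estimates table ols white, se
```

The coefficients are the same in both columns; only the standard errors differ, which is what the robust option changes.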

10. Save Do-file & close log-file before you go!

Remember to save changes to your Do-file (in the Do-file editor, click File, Save). Also, close the log-file before you exit Stata, otherwise the file will be lost. To close the log-file, simply type log close in the Command window and press <Enter>, or click File, Log, Close in the Results window.


11. Additional Exercises:


1. If you have completed the above exercises, try opening one of the standard teaching datasets provided with the Stata program:
use "Q:\Stata10\auto.dta"
If the auto.dta file does not appear to be located in this directory, go to the File menu, select Example Datasets, and click on Example datasets installed with Stata. Alternatively, you should be able to open the dataset with the following command:
sysuse auto.dta
or open the file from the Stata website:
use http://www.stata-press.com/data/r10/auto.dta

2. Create two new variables:
   a. First, create a variable equal to the natural log of price: gen price_ln = ln(price)
   b. Now create a variable equal to the ratio of trunk to length and call this t_to_l_ratio.
3. Now label these two new variables and create summary statistics and histograms for all continuous variables in the data.
4. Create frequency tables and bar charts for categorical variables.
5. Create dummy variables for foreign and make.
6. Run a scatter plot of price on weight.
7. Run a regression of price on weight and the dummies you have created.
8. Repeat for log price.
9. After running the regression, enter the following command: ereturn list. This command displays the results that Stata saves automatically following a regression (though note that the information is lost as soon as you run another regression or terminate your Stata session). You can access these saved scalars, matrices and macros in subsequent commands. This is very useful if, for example, you want to compute new variables or run tests that require this information.
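As a sketch of how the saved results in exercise 9 might be used, the commands below run a simple regression on the auto data and then pull out a few stored values. The variable names price_hat and price_res are made up for illustration; e(r2), e(N) and _b[weight] refer to the most recent regression only.

```stata
* Load the example dataset shipped with Stata and run a regression
sysuse auto, clear
regress price weight

* List everything Stata has stored from this regression
ereturn list

* Use individual stored results in later commands
display "R-squared       = " e(r2)
display "Observations    = " e(N)
display "Slope on weight = " _b[weight]

* Fitted values and residuals as new variables (hypothetical names)
predict price_hat
predict price_res, residuals
```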


12. Exploring the help system:


Find out more about various Stata commands by typing the following at the command prompt (or use the Help menu on the menu bar):
help regress
help logit
help tabstat
help table
help functions
help language

Now click on Help, Contents, Basics, and browse through.
