Stata Guide 1

The Stata Guide by Chandan Mukherjee provides a comprehensive introduction to using Stata 8 for data analysis, emphasizing its command-driven nature. It covers essential topics such as setting up the Stata interface, starting a session, managing log files, and basic data operations like loading, saving, and sorting data. The guide aims to help beginners navigate Stata effectively, offering practical commands and tips for efficient data management.


Stata Guide: Chandan Mukherjee

STATA GUIDE

UNIT 1
GETTING STARTED WITH STATA 8

Introduction

This is a self-study guide to the use of STATA. Always bring it along with you to the workshops so
that you can consult it whenever the need arises. It can best be used when you are sitting behind a
computer with STATA up and running.

This guide assumes that you are familiar with the Windows
operating system and with Excel.

As you become acquainted with STATA, you’ll discover that it is
extremely powerful, flexible and versatile as a tool for data analysis. It is a
software programme designed to enable you to converse with your data.

Most beginners experience some difficulty with STATA because, unlike
most other statistical software packages, STATA is essentially command
driven, not menu driven. Version 8 does have a menu system. However, this
guide will mainly use the command system. It is easier to use the menu system after one has grasped the
basics of the Stata command system.

Using the command system means that you cannot just click (with your mouse) on a particular
option (say, a graph or a calculation) from a pull-down menu; instead, you have to instruct STATA
exactly what to do by writing a command. You may find this cumbersome at first, but (after some
time and practice) you will see that STATA’s command structure allows you to approach the data from
whatever angle you choose to take.

Now make sure your computer is switched on and STATA is up and running!

1. The windows of STATA: Setting them up

The first thing you’ll note when you look at the screen is that STATA consists of a set of separate
windows with the following titles (on the title bars of the respective windows):

• Intercooled Stata 8.0
• Stata Results
• Stata Command
• Review
• Variables

Of the five windows listed above, the last two are optional, but it is useful to keep them on the screen. The
main window is titled Intercooled Stata 8.0. This window contains the Stata main menu, the toolbar and
two more windows, viz. Stata Results and Stata Command. We suggest that you arrange the windows as
shown in the following picture by dragging them appropriately and resizing them wherever necessary.

AUD, MA Economics 2012-2014

Apart from the five visible windows there are three other windows in Stata which are hidden and only
appear when requested by a command or by clicking an icon on the toolbar. They are:

• Graph
• Log
• Help/lookup

And here is a short description of what all these windows do:

• the command window is the only window in which you can type and converse with STATA;
• the results window lists your commands along with the results (calculations, etc.);
• the review window lists all previous commands (when starting up, this is empty);
• the variables window lists the variables in your file (when starting up, this is empty);
• the graph window will appear whenever you instruct STATA to graph something;
• the log window (invisible, unless explicitly requested) allows you to keep a running log (account)
of everything that appears in the results window (you can import the log later into Word if you
want);
• the help/lookup window gives you detailed information on all STATA commands, if requested.

As we go along we shall explain how to use these various windows (there are still other windows which
we shall mention when appropriate).


Important Note:

• In what follows, instructions which you should type in
the command line will be written in bold.
• STATA is case sensitive, which means, for example, that graph is
different from Graph (STATA understands the former, but not the
latter). Similarly, names of variables are case specific: if you define a
variable as wage, typing WAGE (or Wage) will not work!

2. Starting a session: Default Working Directory, Log file of numerical results and Log file of
commands

Before you start a session with STATA, you should first ensure that your working directory, where you
have your source data files for the session and where you would like to keep your output files
(numerical results, graphs, commands used), is set as you want it. I have set C:\SDE2012 as Stata’s
default working directory. During all the days of this workshop, we shall always use this as the
default directory. The current default directory is always indicated by STATA at the bottom-left corner
of its main window. Please check this now.

At any time during a working session, if you have any doubt about the default directory, you can check it with
the following command (try it now):

pwd [don’t forget to press the <Enter> key after typing the command]

STATA will respond by stating the directory in the Results window.

STATA allows you to keep track of your instructions and results in a separate
file, called a log file. After a session, you can import the file into Word to verify
the instructions you gave and STATA’s error messages, and to keep a record of
the results of your work. Graphs are not included in a log file. If you want to
keep them, you will have to save them separately (we shall come to this later).

To open a log file, you have two alternatives. First, you can instruct STATA to start recording your
commands and the results by typing the following command:

log using XXX, text

XXX is just the name you give to the file; for example, since this is the first workshop you are working on, you
can use the name ws01, indicating that this is your first session with Stata. The specification text after the
comma tells Stata to keep the log file in text (ASCII) format so that it can be read by a text editor or a
word processor such as Microsoft Word.

You are now ready to execute your next Stata command. Type the following and press the <Enter>
key:

log using ws01, text

STATA responds by repeating your instruction in the results window, as follows:

. log using ws01, text
--------------------------------------------------------------------------------------------------
       log:  C:\SDE2012\ws01.log
  log type:  text
 opened on:  4 Sep 2013 …. etc

The second method of opening a log file is to use the mouse: click on the icon for LOG (4th from the
left on the toolbar) and a dialog box will appear in which you can write the name of the log file (for
example, ws01). We shall try this method later.

Once you have executed the command opening a log file, it will keep a running account of your
activities. Opening a log also creates a log window which you can look at any time you want by clicking
the icon once more.

If you want to stop the log file temporarily, type log off. To resume the log file, type log on. To close
the file at the end of your work, type log close.

Notes:
• We strongly advise you always to keep a log file of your work. It allows you to
check what you did during a session with STATA, and you can always edit the
log file as a report of your results. This saves time. Both functions are
extremely convenient.
• Add your comments, whenever appropriate, after typing *.
• Keep in mind that graphs are never stored in the log file!

Finally, you can also keep a record of all the STATA commands you are executing during a session by
opening what is called a command log file. Note that the command log file keeps only the commands
and not the results (and the error messages) like the log file. The utility of a command log file is that
you can re-execute all the commands of a session without having to type them again, by converting this
file into what is called a do file. We shall show you this facility later.
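
The logging commands described in this section can be tried together in one short session. The lines below are only a sketch: the file names ws01_demo and ws01_cmds are made up for illustration, and the replace option is an addition (not part of the example above) so that the lines can be run more than once without an "already exists" error. They also assume that no log file is currently open.

```stata
* Open a results log and a command log (hypothetical file names)
log using ws01_demo, text replace
cmdlog using ws01_cmds, replace

* This command and its output are recorded in the results log
pwd

* Suspend the results log, type something, then resume it
log off
pwd
log on

* Close both files at the end of the session
log close
cmdlog close
```

If you open the two files afterwards in a text editor, the results log should show the first pwd with its answer but not the one typed between log off and log on, while the command log should list the commands only.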

3. Using * to make your own comments

STATA is a command-driven programme and, hence, will issue an error warning if you write something
that does not conform with its syntax. But often you want just to jot down some comments on what you
are doing. These may be conclusions, hints, clues, etc., or just general comments. To do so, put * (an
asterisk) in front of your entry in the command window and STATA will ignore what follows, but print
it in the results window anyhow.

Try this, type -

* I have opened a log file by name ws01

and look at the results window. STATA reproduces your text, but does not consider it as a command.
This facility will prove to be very convenient when you are carrying out data analysis and want to
record your ideas along with it!

Hence, * just tells STATA to ignore what follows, but print it anyhow into the results window (and,
hence, into your log file).

4. Getting, saving and clearing data in STATA

STATA without data is not much use. To get data (already stored as STATA files) into STATA is
simple. STATA data files always take the form XXX.dta where XXX is the file name. The use
command allows you to load a STATA data file. For example,


use worker.dta

Note: if you stored the india file in a directory other than the a: directory, adjust the command accordingly!

Note: Data set india.dta stores data on a sample of urban workers in a South-Indian town. The data
give information on the sex of the worker, age, educational level, weekly earnings, and
whether or not it concerns a permanent or temporary job.

You will now notice that the variables window is no longer empty, but instead features 5 variables –
labeled respectively: sex, age, edu, wi, and pt.

If you want to save this data set under a new name, say worker, you can do this as follows:

save worker

This will save the file in your a: directory. The file will be saved as worker.dta (that is, .dta is added
as the default extension, even if you do not type it in full). Hence, save worker is equivalent to typing
save worker.dta explicitly.

But if you try to do it again, STATA will issue an error warning and refuse to execute your command.
To see this, type,

save worker

STATA replies as follows:

. save a:\worker
file a:\worker.dta already exists
r(602);

Note: r(602) labels the error you made with a number and gives a cryptic explanation. Here STATA
notifies you that you are saving to a file that already exists! STATA does not allow you to do this,
unless you explicitly tell it to overwrite the earlier file with its new version.

To do so, you add the option replace to the save command as follows:

save worker, replace

and STATA replies,

. save a:\worker, replace
file a:\worker.dta saved

Hence, to save a file anew after it already exists, you need to combine the save command with the
replace option! Otherwise, STATA will refuse to save a file over an already existing one. You must
tell STATA that you want to overwrite the existing file!

The command clear will clear out all data from active memory, but does not affect existing data files.

Try this:

clear

and you’ll note that the variables window is now empty.

But if you next type,


use worker

you’ll note that the data file is loaded again. The command clear, therefore, does not operate on files,
but simply wipes out the data in the active memory.
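
The whole cycle of saving, being refused, overwriting and clearing can be rehearsed on a throwaway file. The sketch below makes several assumptions that are not part of this guide: the three observations are invented, the file name worker_demo is hypothetical, and the prefix capture (which lets STATA carry on after an error and keeps the error number in _rc) is used so that the refused save does not halt the sequence.

```stata
* Toy data standing in for the workshop file (values are made up)
clear
input sex age wi
1 19 18
2 24 15
1 35 120
end

* First save; replace is used here only so the sketch can be re-run
save worker_demo, replace

* A second plain save is refused because the file now exists;
* capture stores the error number (602, as in the message above) in _rc
capture save worker_demo
display _rc

* clear empties the memory, but the file on disk is untouched
clear
use worker_demo
count
```

The final count should report the three observations again, which confirms that clear only affected the data in memory.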

5. Getting to know your data: describe, list, sort, codebook and inspect

Make sure the data set worker.dta is loaded. To see what a data set contains, use the describe command
by typing,

describe

STATA replies as follows:


. describe

Contains data from worker.dta
  obs:           261                          Industrial Worker Data, S.India
 vars:             5                          7 Aug 2001 21:49
 size:         3,132 (99.6% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
sex             float  %9.0g       lsex       Sex
age             float  %9.0g                  Age
edu             float  %9.0g       ledu       Education
wi              float  %9.0g                  Weekly Wage Income
pt              float  %9.0g       lpt        Permanent/Temporary
-------------------------------------------------------------------------------
Sorted by:

Hence, the data set consists of 5 variables over 261 observations. sex, edu, and pt are categorical
variables and have labels attached to them. These labels are identified by separate names given in the
fourth column above. To see what the labels mean, you can use the following command:

label list

STATA will answer as follows:

. label list
lpt:
0 Otherwis
1 Permanen
2 Otherwis

lsex:
1 MALE
2 FEMALE
ledu:
1 NONE
2 UPPER PR
3 SECONDAR
4 HIGHER

Hence, for example, the variable sex with labels lsex takes the value 1 for male and 2 for female. The
variable edu with labels ledu ranges over the integers from 1 to 4, listing levels of education (none,
upper primary, secondary and higher). The variable pt with labels lpt has labels over the range of
integers from 1 to 5, indicating status of job [permanent, otherwise (temporary), substitute, helper, or
other].

If you want to look at the values entered for a variable, you can use the list command as follows:

list wi


Try it. STATA will show you a numbered list of values for wi. This list is truncated at the bottom of the
results window and ends with the statement – more – , indicating that there is more to come. Press the
space bar to see the next set of values, and continue doing so until you reach the end of the data
(observation 261). If you get fed up with the list going on forever (and with a big data set this can
happen), just press the break key to halt the procedure. (Note: depending on your keyboard, you may
have to press Control and Break, or Fn and Break, simultaneously.)

You can list more than one variable at a time; for example, try list wi age sex. If you do this, you’ll find
that the data will be displayed in matrix form, one row for each observation.

Looking back at the description STATA gave of the data, you’ll note that the last line mentioned
‘Sorted by: ’. In this case, the space after ‘sorted by’ is left blank, meaning that the data base has not
been sorted.

You can always sort your data base with the sort command. For example, type,

sort wi

To check that the data have been sorted, just type describe. STATA will now report that the data have
been sorted by the variable wi. To check this, type list wi, and you’ll see that the data on wi are now
sorted from lowest to highest values.

You can also sort by more than one variable. For example, try the following:

sort sex wi

Using the describe command you’ll note that the data are now sorted by sex and wi. What does this
mean? STATA will first sort by sex and subsequently by wi (leaving sorting by sex unchanged!). You
can verify this by looking at your data as follows:

list sex wi

and you’ll note that all men come first (remember that male = 1 and female = 2, hence, 1 comes first),
and women later. Within each category (male/female), the data are then sorted by weekly wages (wi).
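
If you would like to convince yourself of how a two-variable sort works before trying it on the full data set, a tiny invented example may help. In the sketch below the five observations are made up, and two things are used that this guide has not introduced: _n (the number of the current observation) and the assert command, which stays silent when a condition holds for every observation and stops with an error otherwise.

```stata
* Five invented observations, deliberately out of order
clear
input sex wi
2 40
1 90
2 15
1 18
1 50
end

sort sex wi
list sex wi

* Check 1: sex never decreases from one observation to the next
assert sex >= sex[_n-1] if _n > 1

* Check 2: within the same sex, wi never decreases
assert wi >= wi[_n-1] if _n > 1 & sex == sex[_n-1]
```

Both checks should pass without any message: the men (1) come first with wages 18, 50 and 90, followed by the women (2) with 15 and 40.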


Also try sort pt and then list pt. If you do this you’ll note that only two categories out of the 5 listed are
effectively used – namely, permanent and otherwise. Presumably, when the data set was constructed the
intention was to use more categories, but – in the end – only two were applied.

EXERCISE

Try out the following commands.
Think carefully about what they do;
then check whether you got it right (using the list command).

sort sex edu wi
sort sex age
sort age wi
sort wi sex
sort age pt

There are two further useful commands to get to know your variables better: codebook and inspect.
They require little explanation. Just try them out:

codebook wi

yields:

wi --------------------------------------------------------- Weekly Wage Income

                  type:  numeric (float)

                 range:  [15,950]                     units:  .01
         unique values:  160                  coded missing:  0 / 261

                  mean:   165.886
              std. dev:   133.658

           percentiles:        10%       25%       50%       75%       90%
                                40        72       120       225       370

and,

inspect wi

yields,

wi:  Weekly Wage Income                  Number of Observations
-----------------------                                      Non-
                                       Total   Integers  Integers
|  #                     Negative          -          -         -
|  #                     Zero              -          -         -
|  #                     Positive        261        204        57
|  #                                   -----      -----     -----
|  #                     Total           261        204        57
|  #   #   .   .   .     Missing           -
+----------------------                -----
15                  950                  261
  (More than 99 unique values)


6. Selecting data using the in qualifier

Sometimes you are only interested in part of your data and not in the whole range. For example, you
may want to look at the smallest 5 values or, alternatively at the highest 5 values. You can select a range
of values using the in qualifier after a command.

For example, try this.

sort wi
list wi in 1/5

and STATA replies with:

. list wi in 1/5

wi
1. 15
2. 15
3. 16
4. 18
5. 18

listing the smallest 5 values. It is more fun, however, if you try the following:

list wi sex age in 1/5

and STATA will answer:

. list wi sex age in 1/5

        wi      sex   age
  1.    15   FEMALE    24
  2.    15   FEMALE    13
  3.    16   FEMALE    50
  4.    18     MALE    19
  5.    18     MALE    12

Note that, at the bottom of the wage scale, the men are invariably very young, but not the women. Try it
again for the range 1/10 to see whether this is true for the lowest 10 as well.

Now try

list wi in 15/10

and STATA replies:

. list wi in 15/10
Obs. nos. out of range
r(198);

indicating that you made an error!

STATA always moves from the first to the last observation (as sorted in the data base). Using the
qualifier in 15/10, therefore, tells STATA something it cannot do – that is, to go from 15 to 10 (since
STATA will encounter observation 10 first).


What happens if you try list wi in 260/265? Try it! STATA will again tell you that you are out of range.
Why? The reason is that there are only 261 observations in your sample. The range 260 to 265,
therefore, is not possible.

What to do if you want to know the top 5 earnings? Try it first and then read on!

try * try * try * try * try * try * try * try *

If you subtracted 5 from 261, and then typed list wi in 256/261, you’ll have noticed that you end up
with 6 rather than 5 top values. If you subtracted 4 from 261, you got it right. But the procedure is
cumbersome. First, you have to check the size of your sample and then subtract the required number of
top values minus one from the sample size. Not difficult, but cumbersome.

There is an easier way. Try the following:

list wi in -5/-1

STATA replies:

. list wi in -5/-1
wi
257. 500
258. 530
259. 600
260. 622
261. 950

Using the minus sign tells STATA to count from the top down rather than from the bottom up. Hence,
-5 means the fifth highest value! But why not write list wi in -1/-5? Try it. STATA will tell you that
you’re out of range. Why is that?

As explained earlier, STATA always moves from the first to the last and, hence, will encounter -5 (the
fifth from the top) before it gets to -1. If you specify the range as -1 to -5, therefore, STATA will tell
you that you are out of range.
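
The different ways of writing a range can be compared side by side on a handful of values. The sketch below uses six invented wages rather than the workshop data. It also uses the letter l (lower-case L), which STATA accepts in a range as shorthand for the last observation; that shorthand is not used elsewhere in this guide.

```stata
* Six invented wages, entered out of order and then sorted
clear
input wi
120
15
950
72
370
40
end
sort wi

* Lowest three values: observations 1 to 3
list wi in 1/3

* Highest three values, counting back from the end
list wi in -3/-1

* The same range written with l for the last observation
list wi in -3/l

* The same range by explicit numbers: 6 - 3 + 1 = 4, so 4 to 6
list wi in 4/6
```

The last three commands should all list the same three values (120, 370 and 950), which shows why the negative form saves you the subtraction.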

Now try,

list wi sex age in -5/-1

You’ll note that the top 5 are all men, none aged below 20. Check whether this is true for the top ten as
well!

Here are a few more you could try:

list wi sex age edu in 1/10
list wi sex age edu in -10/-1
list edu in 1/20
list edu in -20/-1

The last two are interesting. They tell you that the bottom 20 earners all have no or only primary
education, but several of the top 20 earners also have no or only primary education.


An important lesson of these examples is that using the list command after prior sorting of the data can
be very revealing if you look separately at the top or bottom values. You get to know your data better,
even before you engage in more sophisticated statistical analysis.


7. Selecting data using the if qualifier


There is another way in which you can restrict the data to be taken under consideration. This is done
using the if qualifier jointly with logical operators. More specifically, the qualifier if allows you to pick
only those observations for which the logical statement is true. The qualifier if works with the following
logical operators:

STATA’s logical operators

operator    meaning

==          equal to (type = twice, without a space)
~=          not equal to
<           less than
<=          less than or equal to
>           greater than
>=          greater than or equal to
&           and
|           or

Take care: In STATA, == denotes equality testing, while the conventional
sign = merely denotes assignment. Hence, Y = X means assign the value of X to Y, while Y == X
means check whether the statement that Y equals X is true (if so, STATA will return 1; if not, it returns
0).
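
You can watch STATA return these 1s and 0s directly, without any data, by using the display command as a pocket calculator (display is not otherwise covered in this guide, so treat the lines below as an illustrative sketch). The last two lines illustrate a point worth knowing when you combine operators: & is evaluated before |, so brackets can change the answer.

```stata
* A true comparison evaluates to 1, a false one to 0
display 2 == 2
display 2 == 3
display 2 ~= 3

* & is evaluated before |, so the first line is read as 1 | (0 & 0)
display 1 | 0 & 0

* With brackets the | is evaluated first, and the result changes
display (1 | 0) & 0
```

The five commands display 1, 0, 1, 1 and 0 respectively. When in doubt about the order of evaluation, add brackets.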

In data analysis and statistics, logical operators are very frequently used. They allow you to trim your
sample to particular subsets relevant to your analysis – for example, men only, or women only; those
with higher education; those younger than 20 years of age, etc. Here are some simple examples (try
them as you go along!):

Command                             What it does

list sex age if wi <45              Lists sex and age of those with weekly income below 45
list wi if sex ==1                  Lists incomes of men only
list age wi sex if age<20           Lists age, income and sex for those below 20 years of age
list edu if sex ==2 & wi < 45       Lists educational level of women with incomes below 45
list wi sex if sex ==1|sex ==2      A silly command! Why? (See note 1.)
list sex age if edu==1 | edu ==2    Lists all those with educational level of primary education
                                    or below
list wi if sex==2 & edu ==4         Lists income of women with higher education (you’ll note
                                    that there are only 6 women in this category)

Note 1: The reason why it is silly is that it does the same as list wi sex, because sex only takes values 1 and 2.
Hence the logical statement if sex ==1 | sex ==2 picks both men and women.


8. count: a small but useful command

An extremely useful command is count. It hardly does anything, but combined with if qualifiers, it can
be very useful. What it does is just count the number of observations in question.

Try, for example,

count

Stata responds with:

261

telling you that there are 261 observations.

But it gets more interesting if you use count with the if qualifiers:

count if sex ==2

and Stata will respond that there are 55 women in the sample.

Now, try this:

count if sex == 1 & wi>450
count if sex == 2 & wi>450

What did you find out?
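
To practise combining count with several conditions, you can experiment on a small invented data set where the right answer is easy to verify by eye. The six observations below are made up and have nothing to do with the workshop file. The sketch also relies on the fact that count leaves its answer in r(N), a stored result that this guide does not discuss.

```stata
* Six invented workers: sex (1 male, 2 female), edu (1 to 4), wi
clear
input sex edu wi
1 1 18
2 1 15
2 4 300
1 3 500
2 2 40
1 4 950
end

* All observations
count

* Women only
count if sex == 2

* Women with education level 1 or 2; note the brackets around the | part
count if sex == 2 & (edu == 1 | edu == 2)

* count leaves its answer in r(N), which can be used straight away
display r(N)
```

The three counts should be 6, 3 and 2, and the final display repeats the last of them.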

9. Manipulating data: generating new variables

In data analysis we often generate new variables as we go along. To do this, we use the generate
command. For example, suppose you want to use the logarithms of weekly wages instead of wages
itself.

If you want to use logarithms in base 10, you do this as follows:

generate log10wi = log10(wi)

Note: log10wi is the name given to the new variable (you can use any name!)

If instead you want logarithms in base e, you do as follows:

generate lnwi = log(wi)


If you create both these variables as indicated, you’ll note that they are added in the variables window.

After creating a variable, it is useful to label it. You can do this using the label variable command as
follows:

label variable log10wi "log(wi) in base 10"
label variable lnwi "ln(wi) in base e"

If you now use the describe command you’ll note that the new variables have been added to the list.

You will often use the if qualifier to construct a new variable. Suppose, for example, that you want to
create a new variable of weekly wage income for men only. You can do this as follows:


generate wi_men = wi if sex==1
label variable wi_men "Weekly income: men only"

To check what you have done, try list wi wi_men sex and you’ll see that the newly created variable
only lists incomes for men, leaving incomes of women as missing values.

Important note: In STATA, missing values are recorded with a dot (.)
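
Missing values deserve some care once you start using the if qualifier, because STATA treats a missing value as larger than any number. The sketch below, on four invented observations, shows how a condition such as wi_men > 100 quietly picks up the missing values as well, and one common way of excluding them (the condition wi_men < . is true only for non-missing values).

```stata
* Four invented workers: two men (1) and two women (2)
clear
input sex wi
1 18
2 15
1 500
2 300
end

* wi_men is missing for the two women
generate wi_men = wi if sex == 1

* Intended as "men earning more than 100", but the two missing
* values are counted too, because missing ranks above every number
count if wi_men > 100

* Excluding missing values explicitly gives the intended answer
count if wi_men > 100 & wi_men < .
```

The first count reports 3 (one man plus the two missing values), the second reports 1.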

Now create another variable only listing weekly incomes for women.

At times we also want to create a categorical variable. Suppose, for example, that we want to create a
dummy variable that picks out women: hence, dummy = 1 for women and 0 for men. There are two
ways to do this.

The first way is more cumbersome, but instructive in what it teaches you about data manipulation in
STATA. It proceeds in two steps: first pick the women and assign them the value 1; next pick the men
and assign them 0. You can do this as follows:

generate dummy = 1 if sex==2

(this picks the women and leaves men as missing values)

but if you now do generate dummy = 0 if sex==1, STATA will protest and tell you that you made an
error since dummy is already defined as a variable. In other words, you cannot generate an existing
variable twice. To add the men, you need to use the command replace as follows:

replace dummy = 0 if sex==1
label variable dummy "gender indicator: female 1; male 0"

This works. Check it with list sex dummy and you’ll note that the dummy picks out the women.

The second way is more elegant, but quite intricate. It makes use of the fact that a logical operator
returns 1 if the statement is true and 0 if false. You do this as follows (calling the new dummy
dummy2):

generate dummy2 = sex==2
label variable dummy2 "gender indicator: female 1; male 0"

This is very handy! When sex equals 2 (= female) the logical statement is true, returns 1 and, hence,
assigns the value 1 to dummy2. When sex equals 1 (= male), the logical statement is false, returns 0, and,
hence, assigns 0 to dummy2. All in one go! Check what you have done using list sex dummy
dummy2. Both dummies are identical.

You may now wish to assign specific labels to each of the numerical values of the dummy variable –
say, female to 1 and male to 0. You do this by first defining the label and subsequently assigning the
label to the variable in question.

label define dumlbl 1 "female" 0 "male"

[This step defines a label called dumlbl (just a name) and assigns the specific labels female to 1 and male
to 0.]

label values dummy2 dumlbl

[This step assigns the labels defined above to the variable dummy2].


Check what you did with list sex dummy dummy2 and you’ll note that the variable dummy2 now
features the labels, while the variable dummy doesn’t (since we did not assign labels to dummy).
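
The two routes to a dummy variable, followed by the labelling steps, can be rehearsed from start to finish on a few invented observations. In the sketch below the data are made up, and the assert command (not introduced in this guide) is used to confirm that the two dummies agree: it stays silent if they do and stops with an error if they do not.

```stata
* Four invented observations: 1 = male, 2 = female
clear
input sex
1
2
2
1
end

* Route 1: two steps, generate then replace
generate dummy = 1 if sex == 2
replace dummy = 0 if sex == 1

* Route 2: one step, using the 0/1 result of the comparison
generate dummy2 = sex == 2

* Both routes should agree for every observation
assert dummy == dummy2

* Attach value labels to dummy2 only, then compare the listing
label define dumlbl 1 "female" 0 "male"
label values dummy2 dumlbl
list sex dummy dummy2
```

One caution about the second route: if sex were missing for some observation, the comparison sex == 2 would be false there and dummy2 would be 0, whereas the first route would leave dummy missing. In a data set with missing values the two dummies would therefore no longer be identical.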

10. Saving time: abbreviating commands, etc.

STATA is command driven and, hence, involves some typing into the command window. You can,
however, shorten the time you spend on this considerably using a few simple tricks.

First, STATA allows you to abbreviate its commands, usually to the first 1 or 2 letters of the command.
To know which abbreviation to use, just type help followed by the name of the command. Say, try help
list. This will prompt STATA to give you quite a bit of information in return. What matters here is just
the bit where STATA shows the command line with its various options. The relevant extract is as
follows:

List values of variables

list [varlist] [if exp] [in range] [, [no]display nolabel noobs doublespace ]

Note that STATA underlined the first letter of the list command. This means that the command can be
abbreviated by its first letter. Hence, l wi does the same as list wi.

If you have forgotten the abbreviation to use, but you think it is l, check it by using the command which
followed by the abbreviation:

which l

STATA replies:

. which l
built-in command: list

confirming that you got it right. Try which li and you’ll note that STATA also accepts that one.

Next try which g and STATA will tell you that this is the abbreviation for generate. But, if, for
example, you type which b, STATA replies with an error message as follows:

. which b
command b not found as either built-in or ado-file
r(111);

Hence, b is not an abbreviation for a command.

A second trick you can use to simplify your work is that you can edit earlier commands by either using
the Pg Up and Pg Dn keys on your keyboard (which allow you to walk through earlier commands) or
by clicking (with your mouse) the relevant command in the review window (which stores the commands
you have already executed).

Hence, for example, do list wi. Now you want to add the variables sex and age to the list. To do this,
press Pg Up once, the previous command now appears in the command window, and add sex and age.

The third trick is that you do not have to type the names of your variables. You can input a variable into
the command line by just clicking on its name in the variables window.


Here is an example using the first and last tricks. You want the command list sex age wi. Instead of
typing it out in full, proceed as follows: type l (the first letter: the abbreviation of the command list) and
then click respectively on sex age wi in the variables window. Hence, the whole command list sex age
wi can be obtained very simply by first typing one letter and then executing three mouse clicks in the
variables window!

11. Exiting STATA

The command exit allows you to stop a STATA session. If, however, you undertook data manipulations
during a session (say, by using the generate command), STATA will refuse to allow you to exit
because you may lose data in the process.

If you insist on doing so, however, type,

exit, clear

The exit command and the clear option, taken together, tell STATA that you want to exit and that you are not interested in
keeping whatever data manipulations you made during this session. If you want to keep the newly
created variables, either use save, replace or, better still, save the file under a new name.

For example,

save worker2
log close
exit

The file worker2.dta now contains the original data and the variables generated in your first work session. By
keeping track of different versions, you will still keep the original data file worker.dta without it being
swamped with the clutter of things you tried out.

12. Dropping and keeping variables

Sometimes you may be working with a data file with a large number of variables, although for the
problem at hand you only need a sub-set of them. To trim down your data set, you can either use the
drop or the keep commands.

To see this, start up STATA again, and load the file you just saved:

use worker2 (to return to your earlier data set)

Now, try this:

drop sex

and you’ll note that the sex variable no longer features in the variables window.

Next try,

keep wi age

and you’ll notice that the data base is now reduced to only two variables.

Important note: Be careful not to use save, replace after you dropped variables from a data set!
Some of your data will be lost!
Save the truncated file under a new name instead!


13. Changing the default directory

If you start up STATA and load a file (say, worker.dta), and then you issue the command save worker,
STATA will store the file in the default directory (usually, but not always, c:\data). If you want the
data to be stored elsewhere, you can change your default directory for the purposes of the project in
question. You do this by (1) creating a special sub-directory, if needed, and (2) making it the default
directory.

To see what directory you are presently working in, just type pwd and STATA will tell you what your
current directory is (to remember this command, I think of it as please which directory).

Suppose you now want to create a subdirectory – say, worker. You can do this as follows:

mkdir worker

and then you can change the default to this new directory as follows:

cd worker

STATA responds as follows:

. cd worker
c:\data\worker

which shows that your default directory is now c:\data\worker (if your previous default directory was
correctly set as c:\data).

Next, open a log file as follows:

log using ws01_2, text

STATA responds as follows:

. log using ws01_2, text


-------------------------------------------------------------------------------
log: c:\data\worker\ws01_2.log
log type: text
opened on: [current date], [current time]

Now your data file and log file can both be kept within the same sub-directory specifically created for
the problem at hand. Before we go to the next section, let us close the log file (which will be an empty
file because no command has been executed after opening it).

log close
clear
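The whole sequence of this section can be sketched as follows. The directory name worker_demo and the log name ws_demo are hypothetical, chosen so that the sketch does not collide with your own files. The capture prefix is used because mkdir reports an error if the directory already exists, and log close reports an error if no log is open.

```stata
* Sketch only: directory and log names are made up for illustration
* show the current default directory
pwd
* create the sub-directory (capture: ignore the error if it already exists)
capture mkdir worker_demo
* make it the default directory
cd worker_demo
* close any log left open from earlier work, then open a new text log
capture log close
log using ws_demo, text replace
* anything you type now is recorded in ws_demo.log
pwd
log close
* go back up to the directory you started from
cd ..
```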

14. A note on STATA syntax

Now that you have had a bit of practice, you should be able to discern, at least vaguely, the general
rules for composing a STATA command.

STATA commands can be a bit confusing at first. If, for example, you forget
to type a comma “ , “ or a bracket “ ) “ , or you type two brackets “ )) ”, or
you misspell a command, STATA will answer with an error message.

DON’T GET UPSET. AND, ABOVE ALL, DO NOT HIT YOUR COMPUTER.

In its most general format, a STATA command has four parts:

[command word] [variable name(s) or expression] [qualifier] , [options]

• [command word]: You have already used several command words: describe, list, generate and
sort. A STATA command must start with a command word. For example, if you want to create a
new variable named logwi which is the logarithm of the variable wi (weekly wage earnings), it is
not enough to type:

logwi = log(wi)

STATA interprets the first word of your instruction as a command word in its own vocabulary, and
logwi is not a command; it is simply the name you have decided to give to the variable which is the
logarithm of wi.

The correct instruction to STATA in this case is -

generate logwi=log(wi)

which, in the language of STATA, reads as: generate a new variable named logwi by taking the
logarithm of the variable wi.

So, always start with the relevant command word.

• [variable name(s) or expression]: Commands are mostly concerned with something you want
STATA to do with one or more variables in your dataset (such as drawing a histogram of a
variable or creating a new variable). The command word must therefore be followed by
the name(s) of existing variable(s), or by the name of a new variable to be used in the command. For
example, sort wi or list age. Some commands, however, do not feature a variable name or
expression; for example, the save and use commands are followed by filenames.

• [qualifiers]: There are two qualifying STATA words which follow the first two parts [command &
variable name(s)]. They are in and if. Unlike the first two parts, the qualifiers are optional; you
use them only when they are necessary. Without the qualifiers, STATA executes the command
(graph or generate, for example) for the whole dataset. Qualifiers are used to restrict the command
execution to a part of the data: in selects observations by their position, if selects them by a condition.

• [options]: The comma separates the main command from the various options that go with the command.
You have already come across some cases, like save, replace and exit, clear. There is only one comma,
after which the various options follow.
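To see the four parts at work, here is a minimal sketch on a made-up data set of ten observations (the values of age and wi are assumptions for illustration only). Each command adds one more part of the general format.

```stata
* Sketch only: ten made-up observations
clear
set obs 10
* ages 21, 24, ..., 48 and wages 50, 100, ..., 500
generate age = 18 + 3 * _n
generate wi = 50 * _n
* command word + variable names: applies to the whole data set
list age wi
* ... + in qualifier: only observations 1 to 5
list age wi in 1/5
* ... + if qualifier: only observations satisfying the condition
list age wi if age > 30
* ... + option after the single comma: suppress observation numbers
list age wi if age > 30, noobs
* qualifiers work with other commands too
summarize wi if age > 30, detail
count if age > 30
```

Notice that the qualifier always comes before the comma and the options always come after it.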

15. Exporting and importing data from and into STATA

You can export and import data from and into STATA by using copy & paste
across Windows applications.

Let us first export data from STATA. To do this, open the STATA editor: click
editor, and use your mouse to select the whole data set (top left corner of data to
bottom right corner). Click edit and select copy editor data.

Close the editor.

Open Excel and click edit, followed by paste, to paste the data into Excel.

You’ll note that variable names will also be copied along with the data!

You can import data into STATA in exactly the same way. In Excel, copy the whole range (including
the variable names on top). Go to STATA, type clear to empty the data set, and open the editor. Click on
the top left corner cell. Then click on edit, followed by paste. You have now brought your data back into
STATA.

Note: There are many more ways to import data into STATA. For the time being, we shall only make
use of those explained above.
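For the curious, one of those other ways can be sketched with the commands outsheet and insheet, which write and read plain text files that Excel can open. This is only an illustration and is not needed for the workshops; the data and the file name worker_demo.csv are made up. In more recent releases of STATA these two commands have been superseded by export delimited and import delimited.

```stata
* Sketch only: made-up data and a hypothetical file name
clear
set obs 3
generate age = 20 + _n
generate wi = 100 * _n
* export: write the variables to a comma-separated text file,
* with the variable names in the first row
outsheet age wi using worker_demo.csv, comma replace
* import: empty the data set, then read the file back in
clear
insheet using worker_demo.csv, comma
describe
```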
