The Basics of STATA - 2020


INTRODUCTION

This file introduces you to the basics of STATA, a powerful statistical package which
is also easy to use. You are presented with the rudiments of STATA which you will
find very useful when you attend the forthcoming computer classes. It is by no means
comprehensive in its coverage, but it will allow you to learn and understand some basic
data management commands that are necessary when analysing real life datasets. It is
therefore very important that you understand and practice through these computer
classes in your own time to ensure you have a good grasp of the commands and concepts
taught.

STATA

STATA has been developed to undertake most types of complex statistical analysis. It
offers most of the functions offered by spreadsheet packages such as MS-Excel, has
readily executable commands that are more wide-ranging than other
statistical/econometrics packages and has one of the best data management tools.
STATA also offers the possibility of programming for more advanced users. It is a
statistical/econometrics package that caters for users at every level, from the complete
beginner to the advanced user. Although it may appear a little difficult to use at
first, its long-term payoff is definitely worth it. New procedures not included in the
standard package can be downloaded via the web. At the time of writing, STATA was
available in 4 different flavours: Small, Intercooled, Special Edition and MP. Over the
years, many versions of STATA have been developed; STATA 13, released in 2013, was
the latest when this hand-out was written.

Help can easily be obtained with the STATA help command. To open the main help file,
simply type help in the command window and a range of options will
become available to you. Alternatively, if you need help with a specific command, type
help followed by the name of that command, e.g. help regress in the
command window. One of the recommended textbooks for this module, “Using
STATA for Principles of Econometrics”, is particularly helpful for understanding how to
run regressions in STATA using its various commands. Note that the book is written in
line with STATA 11, but all the commands will work in STATA 13. There are STATA
manuals in the library and additional help can be obtained via the STATA website at
http://www.STATA.com. Another very useful website is
http://www.ats.ucla.edu/stat/STATA/. There are plenty of other resources on the net
and many of these have been written to cover different versions of STATA, often the
earlier versions. Most of the basic commands from the earlier versions will run in
STATA 13. However, for some outdated commands STATA 13 will display a
message saying that the command has been superseded; the command will continue to
work, and STATA will also mention its most up-to-date equivalent.

Starting STATA

STATA can be started in several ways. First, there may be a shortcut on the desktop that you can
double-click. Alternatively, using the Windows menu, click Start > All Programs > ... > Stata 13.
This will open STATA and give you a screen similar to the following figure.
You can rearrange the windows differently if you wish. To do this, under the Edit menu,
click Preferences > Manage Preferences > Load Preference Set.

A second way is to simply locate a STATA data file, with the .dta extension, and double-click it.

The Results/Output Window

This is where the results of executing most of the commands in STATA are displayed.
For instance, if you run a regression, the output window will return and display the
regression output. When a command line cannot be executed, this is also where STATA
will normally display an error message.

The Command Window

This is where all the commands are typed. When you type a command and press enter,
STATA will execute it. Note that commands should be typed in one whole line in the
command window. In other words, keep typing your command and leave STATA to
sort out how it wants to split the characters in the space of the command window (in a
do file, you are free to split a long command on several lines by using the appropriate
line split syntax).
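
As an illustration of the line split syntax in a do file, the short sketch below continues one command over two lines by ending the first line with three slashes (///). It assumes you are happy to use auto, the example data set that is installed with STATA, rather than your own data. Note that /// is meant for do files and will not work if typed in the command window.

```stata
* Load the example data set shipped with STATA (an assumption made for this sketch)
sysuse auto, clear

* One regress command written over two lines: /// joins the lines together
regress price mpg weight ///
    length foreign
```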

Note: STATA standard commands are always typed in lowercase letters. Most of
the time, you will also be naming variables or creating new ones. As good practice,
it is best to name your variables using lowercase letters instead of uppercase letters (this
saves you from having to switch between lowercase and uppercase letters when
typing in the do file).

The Variables Window

Once you have loaded a data set into STATA, the variables window will display a list
of all the variables present in the data set, along with their labels, the storage type and
the format the data is stored in. Data entered into STATA can be entered as numeric
variables or string variables (see later section on numeric versus string variables).

Rules on naming variables in STATA

Naming variables in STATA is simply the same as writing the title of a column in Excel,
except that there are a few rules that have to be respected in STATA:

(i) Difference between lowercase and uppercase letters - STATA considers variable
name Y to be different from variable name y.

(ii) Variable names need to begin with letters or underscores, and cannot begin with
numbers – you cannot name a variable 123abc but you can name it abc123. Starting
with a letter or an underscore, you can use many variations to name a variable. The
name can be a combination of letters and numbers (e.g. var1, _var1, _1var). The
underscore is also the only other character you can use as part of a variable name (e.g.
var_23).

(iii) Other than letters, numbers and the underscore, STATA does not allow any other
character in a variable name, e.g. STATA will not accept var&23 as a variable name.

(iv) STATA does not allow spaces in the name of a variable – hence you cannot name
a variable real gdp but you can name it realgdp, real_gdp, real1gdp or _realgdp (or any
other name that does not flout the rules on naming variables)

Flouting any of these rules will result in STATA displaying the error message “invalid
name”.
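
The naming rules above can be tried out with the short sketch below, which is best run from a do file. It builds a small empty data set (the size of 3 observations and all variable names are made up for illustration), creates a few validly named variables and then attempts an invalid name. The capture noisily prefix lets the do file carry on after the error, and the return code _rc is non-zero when a command has failed.

```stata
clear
set obs 3                              // empty data set with 3 observations

generate y = 1
generate Y = 2                         // allowed: Y and y are different variables
generate real_gdp = 100                // letters, numbers and underscores are fine
generate _var1 = 5                     // a name may start with an underscore

* This name starts with a number, so STATA rejects it
capture noisily generate 123abc = 1
display _rc                            // non-zero because the command failed
```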

The Command Review Window

This window keeps a record of all the commands you have typed in the command
window from the moment you have opened STATA. It is quite convenient since you
can retrieve these commands by simply clicking on them. However, once you exit
STATA, these commands are lost.

The Pull Down Menus

The pull down menus, from File to Statistics, are where you ask STATA to execute a
particular command (which you search through the pull down menus) instead of typing
the command directly in the command window. If you click through any particular
command, it opens up a window offering you many tick-box options. These options are
similar to what you would get by typing help commandname in the command window
(when typing commands in the command window, the desired option is typed after a
comma). If you ask STATA to execute a particular command, you will notice that the
full command will appear in the command window (i.e. in the way you would have
typed it in the command window had you not used the pull down menu). It will also
appear in the command review window (see above) and, if you have created new
variables in the process, the new variables will be listed alongside existing ones in the
variables window. The pull-down menus are useful as a way to learn how to write
commands in STATA. Under the pull down menus, you will find a number of useful
icons (roll your mouse over each to know what they are for) most of which are shortcuts
for writing the commands.

Each of these icons serves the purpose its name indicates. For instance, you will
click on the “Data Browser” icon when you want to browse through your data, whereas
with the “Data Editor” you can both browse and edit your data set manually. The
“Variables Manager” icon is where you can manage variables in your dataset, such as
naming and labelling variables. The “New Viewer” icon launches an interactive help
and advice window, which links to PDF documentation of the STATA functions (from
STATA version 11, the instruction manuals are pre-installed within the software for
easy access). The log file and the do file editor are particularly important for most of
the operations you will carry out in STATA.

The Do File Editor

The do file editor is where you manually type STATA commands (or copy and paste
them from the command review window) and execute them. The resulting output is
displayed in the output window. A huge advantage of the do file editor is that, being a
text file, it allows you to keep a record of all the commands you have performed on a
particular data set. Once you have written your commands/comments into the do file,
you can save it anywhere and review these commands the next time you open STATA.
Compare this with typing the commands directly in the command window. As soon as
you exit STATA, you will have lost all of these commands.

The do file is particularly important when you want to write and execute several
commands at a time and want to see the output from all of them. In the command
window, you need to type one command at a time, press enter, and see the resulting
output. With the do file editor, you can write all the commands you want, run them in
one go, and scroll down the output window to see the resulting output from executing
all these commands.

The do file editor is also useful when writing your own code to manipulate and analyse
data.

If you click on the do file editor icon (see above) it will launch a new window like the
figure below. Note that in the figure below, several command lines have already been
typed. In your case, you will see a blank do file and you will need to type
commands/comments in it.

The first command inside the do file instructs STATA to open (through the
command use) a file called workshop1.dta. This file is located in a drive called F,
in folders and sub-folders (research_methods\2010-11\Workshops\). On execution,
STATA will therefore look for the data set called workshop1 under these folders in the
F drive. You will notice several icons below the pull down menus and you will
understand what most of these are for by hovering your mouse over each of them. The
last two are of particular importance. Both icons instruct STATA to execute
the commands in the do file. The main difference between the two is that clicking on
the first icon (the run command icon) will execute all the commands silently, i.e. the
output window will not display any of the commands written inside the do file or their
output, but rather will display a message like this:

run "F:\research_methods\2010-11\Workshops\demonst1.do"

This message is telling you that the program is executing the do file called demonst1.do
(which is located in the appropriate folder).

On the other hand, if you click on the second icon (the do command icon) this will
execute all the commands in the do file and echo each of these written lines (including
comments) in the output window. If you select some lines inside the do file, STATA
will run the do file for these selected lines only. If you do not select any line, then
STATA will run the whole do file line by line and the output will appear in the output
window.

All the commands typed in the do file editor can be saved for later reference by saving
the do file itself (when you save a do file, STATA automatically gives it a .do extension,
e.g. demonst1.do in the above). You can also write comments in the do file editor as
notes to yourself for later reference. If you want to write a note/comment in the do file,
you need to put an asterisk (*) at the start of the comment line, as in the above. If you
select 10 lines, 3 of which are comment lines, STATA will recognise the comments when
executing the selected lines, ignore them and move on to the next command. If you
forget the asterisk, STATA will return an error message saying it could not recognise the
command (e.g. in the first line of the do file, open is not a recognised STATA command).
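
A minimal sketch of comments in a do file is given below. Besides the asterisk described above, STATA do files also accept comments after two slashes (//) and block comments between /* and */. The auto example data set is used here only as a convenient stand-in for your own data.

```stata
* This whole line is a comment: STATA skips it when the do file is executed
sysuse auto, clear        // a comment can also follow a command after two slashes

/* A block comment
   can span several lines */
summarize price
```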

TIP: In the above do file, the second line instructs STATA to open the file called
workshop1.dta and the last line of the do file instructs STATA to save all the changes
we have made to the dataset into a new file called workshop1_revised.dta. Note that
in both cases, we instructed STATA to look into the directory
"F:\research_methods\2010-11\Workshops\" (of course your directory will be
different). If you are going to work from one directory only all the time, then instead of
typing the long name of the directory each time, type the command:

cd "F:\research_methods\2010-11\Workshops\"

at the very beginning of the do file. The command cd means change directory and
instructs STATA to work in the above directory, unless specified otherwise. You can
then follow this with:

use "workshop1.dta", clear

save "workshop1_revised.dta", replace

These commands will carry out the open and save operations on the data in that specific
directory.
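
The sketch below shows the same pattern in a form you can run on any computer. It makes two assumptions: STATA's temporary folder, c(tmpdir), stands in for your own project directory, and the auto example data set stands in for workshop1.dta. The file name workshop1_demo.dta is made up for illustration.

```stata
* Remember where we started so that we can return there at the end
local start "`c(pwd)'"

* Stand-in for your own folder, e.g. cd "F:\research_methods\2010-11\Workshops\"
cd "`c(tmpdir)'"

sysuse auto, clear                        // stand-in for your own data set
save "workshop1_demo.dta", replace        // saved in the current directory
use "workshop1_demo.dta", clear           // opened from the current directory

cd "`start'"                              // go back to the original directory
```

Because cd has set the working directory, save and use need only the file name and not the full path.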

The Log File

• It is always good practice to keep a record of what you have done, whether in
terms of the commands you used or the output produced from executing these
commands. The way to record session output and commands in STATA is by
using a log file. As the name implies, this is where you log/record all the
commands and outputs in STATA. To open a log file, type log using filename.
By default this opens a file called filename.smcl (saved in the directory you are
working from), written in STATA's own SMCL format, in which all subsequent
commands and output will be saved. To get a plain text file called filename.log
instead, type log using filename, text.
• You will find it helpful to use file names that remind you of what you
did during that session. To append information to an already existing log file,
type log using filename, append. If you want to replace the contents of an
already existing log file, type log using filename, replace.
• Once you have finished with all the commands, you need to close the log file,
i.e. at the very end of your do file, type log close.
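
Putting these points together, a do file that records its own session might look like the sketch below. The log file name session_demo is hypothetical and the auto example data set is only a stand-in for your own data. The text option asks for a plain text log, and capture log close at the top avoids an error if a log was left open by an earlier run.

```stata
capture log close                          // close any log left open earlier
log using "session_demo", replace text     // writes session_demo.log

sysuse auto, clear
summarize price mpg

log close                                  // always close the log at the end
```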

Inputting Data:

1. Straight into STATA

STATA works like any other spreadsheet: within STATA go to data editor (or type
edit in the command window), and input the data manually: each column represents a
variable and each cell will contain the value of the observation for the relevant variable.

2. From Excel into STATA

STATA expects a single matrix or table of data from a single sheet, with at most one
line of text at the start defining the contents of the columns. From your computer:

• Start Excel.
• Enter data in rows and columns (or open the Excel file of interest).
• Select the data of interest, then pull down Edit and choose Copy.
• Start STATA and open the data editor (type edit in the command window or
click on the data editor icon).
• Paste data into the data editor by right clicking on the first cell and choosing
Paste.
• Close the data editor.
• You can save this data set to a relevant drive.
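
As an alternative to copying and pasting, STATA 12 and later versions can read an Excel sheet directly with the import excel command. The sketch below is only an illustration under stated assumptions: it first writes a small Excel file from the auto example data set so that there is something to read, and the file name excel_demo.xlsx is made up. With your own file you would need only the import excel line.

```stata
* Create a small Excel file to practise on (illustration only)
sysuse auto, clear
keep make price mpg
export excel using "excel_demo.xlsx", firstrow(variables) replace

* Read it back: firstrow treats the first row of the sheet as variable names
import excel using "excel_demo.xlsx", firstrow clear
describe
```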

3. Reading STATA formatted data

• STATA formatted data need to have the extension .dta to be recognised by the
software.
• Within STATA, you can then open the file by first pulling down File and
choosing Open. Opening a new file replaces any data you were using up to that
point in memory. If you wish, you can
also use the STATA use command, which reads data that has been saved in
STATA format: use "directory_name\filename". Most people find it easier to
use the File > Open... menu for reading STATA formatted data. Note the use of
double quotes around the directory and file names when entering the command
manually in the command window. They need to be used if your directory
name or file name contains embedded spaces. To avoid confusion over when to
use and when not to use them, it is better to use them all the time (since it is fine
to use them when not needed, but STATA flags up an error message if you fail
to use them when they are needed).

Arithmetic, Relational and Logical Operators in STATA

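STATA uses arithmetic, relational and logical operators, as described in the following table (the
comments in brackets indicate what the operator means in STATA):

Arithmetic              Relational Operators             Logical Operators
+ (addition)            > (greater than)                 & (and)
- (subtraction)         < (less than)                    | (or)
* (multiplication)      >= (greater than or equal to)    ! (not)
/ (division)            <= (less than or equal to)       ~ (not)
^ (raise to a power)    == (equal to)
- (negation)            != (not equal to)
                        ~= (not equal to)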

You probably won't be using all of the operators but it is useful to know what they
stand for. The arithmetic operators are simple to understand: you can use them to do
any sort of algebraic manipulation. Relational operators impose a condition on a
variable, while logical operators combine two or more such conditions. Note the
difference between the single equal (=) and double equals (==) signs, which
are interpreted differently by STATA. The single equal = is used as a set equal
(assignment) operator. It is used in the generate, replace and recode commands and also
in some of the multivariate commands. Examples of the use of the single equal:

generate X = Y
replace Z = A/B
generate M=A-Z
recode X 5=1 4=2 2=4 1=5

The double equals == on the other hand is used to test for equality. It can be used to
check or execute a command if a certain condition holds. Some examples:

assert X==1 (this command checks that X takes value 1 in every observation of the data set)
regress Y X Z if A==2 (runs a regression using only the observations for which variable
A equals 2)
list X Y Z if A==1 (lists X Y Z only for observations where variable A takes value 1)
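
To see the two signs side by side on real data, you could run a sketch like the one below. It assumes the auto example data set installed with STATA; price, mpg, weight, foreign, rep78 and make are variables in that data set, while price_k is a new name made up for this illustration.

```stata
sysuse auto, clear

* Single equal: sets the value of a new variable
generate price_k = price/1000

* Double equals: tests a condition
assert foreign==0 | foreign==1              // foreign only takes values 0 and 1
regress price mpg weight if foreign==1      // uses foreign cars only
list make price if rep78==5                 // lists cars whose rep78 equals 5
```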

The order of evaluation (from first to last) of all operators is ! (or ~), ^, - (negation), /,
*, - (subtraction), +, != (or ~=), >, <, <=, >=, ==, & and |. This means that if you ever
need to type and run a command that contains some or all of these operators, STATA
will execute them in the order of evaluation specified above. For instance typing the
command

generate Z=2+4^3 if A!=5

will first ignore all observations for which A takes value 5, then evaluate 4^3, then add 2
to this and hence return a new variable Z that takes value 66 when variable A does not
equal 5 (Z is set to missing for the observations where A equals 5). Note that the “if”
is used here as a qualifier but “if” is also used in more advanced
STATA programming commands.
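
You can check this behaviour on a tiny made-up data set, as in the sketch below. The three values of A are chosen purely for illustration, with one of them equal to 5.

```stata
clear
input A
1
5
3
end

* 4^3 is evaluated before the addition; the if qualifier skips rows where A is 5
generate Z = 2 + 4^3 if A != 5
list A Z
```

In the listing, Z is shown as a dot (STATA's symbol for a missing value) in the row where A equals 5.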

Creating New Variables

To create new variables in STATA, the syntax is

generate newvariablename=any_transformation_you_want

where newvariablename is the name of the new variable created (respecting the naming
rules in STATA) and any_transformation_you_want refers to what you want your new
variable to be. For instance, you may want your new variable to take value 2, or to be
the sum of two other variables, etc. Suppose you already
have an existing variable called X in your dataset and you want to create a new variable
Y which is twice the value of X: you would type generate Y=2*X. Here are some other
examples:

generate A=2 (creates a new variable called A which takes value 2 for each
observation)
generate B=X/Y (creates a new variable called B which is the ratio of variable
X to Y)
generate Asqr=A^2 (creates a new variable called Asqr which is the square of
A)
generate Asqr=A*A (an alternative way of creating Asqr as the square of A; use one
or the other, since generate will not overwrite a variable that already exists)
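
The examples above can be run together as in the sketch below, which starts from a small made-up variable X. Because generate refuses to overwrite a variable that already exists, the second way of computing the square uses replace instead.

```stata
clear
input X
1
2
3
end

generate Y = 2*X          // twice the value of X
generate A = 2            // takes value 2 for each observation
generate B = X/Y          // ratio of X to Y
generate Asqr = A^2       // square of A

* Asqr already exists, so replace (not generate) is needed to recompute it
replace Asqr = A*A
list
```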

Note the similarity between these and the way you would create data in a new column
in a spreadsheet in Excel. In Excel, you would type the relevant formula in one cell and
then drag it to other cells to fill them with your formula. STATA is doing exactly the
same by creating a new column of data but you do not need to type the formula in a
cell, nor do you need to drag the formula across other cells to fill them. Think of
how tedious it would be if you had a very large data set with thousands of observations
(i.e. thousands of rows in Excel) and you had to copy and fill a formula in each of these
rows. STATA can perform many of the common functions used in Excel and even more.
STATA has a range of functions which you can use to transform or create new variables.
For an extensive list of these different functions, type help functions. A commonly
used mathematical function in applied data analysis involves applying a logarithmic
transformation to an existing variable. The command to perform this is:

generate newvariablename=ln(existing_variable)

or

generate newvariablename=log(existing_variable)

In STATA, ln and log are synonyms: both return the natural logarithm (base e). For the
base 10 logarithm, use log10(existing_variable). Note the
use of brackets around the name of the existing variable. Most functions in STATA
have to be specified as the function name followed by the variable name or expression in brackets.
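
If you are unsure which base a logarithm function uses, a quick check such as the sketch below can help. In STATA, ln() and log() both return the natural logarithm, while log10() returns the base 10 logarithm. The variable names x, ln_x and log10_x are made up for this illustration.

```stata
* display works like a calculator in the output window
display ln(100)
display log(100)          // same result as ln(100)
display log10(100)        // base 10 logarithm

* The same functions applied to a variable
clear
set obs 3
generate x = 10^_n        // 10, 100, 1000 (_n is the observation number)
generate ln_x = ln(x)
generate log10_x = log10(x)
list
```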

String and Numeric Variables in STATA

STATA records values of a variable in two formats – string and numeric. A variable
stored in numeric format simply means its values have been recorded as regular
numbers, e.g. 1, 1.02, 50000, etc. Hence, you can carry out any arithmetic
manipulations on these variables, such as addition, multiplication, or any other
formulae you want to apply on them. Generally, the storage type for numeric variables
will tend to be either long, float, double, byte or int.

On the other hand, if a variable is stored in string format, then its values are recorded
as characters. These characters could be letters, symbols (? & * / > -) and even
sometimes what appears as regular numbers or a mix of characters and regular numbers
(e.g. the value 1,000058 contains a comma, which is a non-numeric character). You
cannot perform arithmetic operations on string variables, even when their values
look like numbers. If the
values of a string variable appear to be regular numbers which you wish to perform
arithmetic operations on, then you first have to convert the variable (and hence
all its values) into a numeric variable, using the destring command in STATA. Type
help destring for more on how this command works.
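
A small sketch of destring is given below. The data are made up: income_str is a string variable whose values look like numbers but contain commas. The ignore(",") option tells destring to drop the commas, and generate(income) stores the result in a new numeric variable so that the original is left untouched.

```stata
clear
input str10 income_str
"1,000"
"2,500"
"40"
end

* Convert to numeric, removing the commas along the way
destring income_str, generate(income) ignore(",")

* Arithmetic now works on the numeric version
generate double_income = 2*income
list
```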

Some useful commands in STATA

Below is a list of commands and functions that you may find useful when working with
STATA, with descriptions in brackets telling you briefly what each command does.
Many of these have further detailed options which you can learn about by typing help
command_name

o use (opens a STATA formatted dataset)

o save (saves a STATA formatted dataset)

o sort variablename (sorts the data in ascending order of variablename)

o gsort +/- variablename (sorts the data in ascending order of the variable if you put a + or
descending order if you put a - in front of the variable name)

o egen (creates new variables when you want to carry out some operations across a
range of variables or across a range of rows, or across observations classified into
groups)

o count (counts the number of observations in the data set)

o summarize (calculates and displays a variety of univariate summary statistics)

o insobs # (adds # new observations to the data set, where # is the number of
additional observations)

o help functions (brings up a list of different types of functions available in STATA)

o help graph (brings up a list of graphs that can be used to visualize data)
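
Several of these commands are tried out in the sketch below, which assumes the auto example data set installed with STATA. The new variable name mean_price is made up for this illustration.

```stata
sysuse auto, clear

sort price                                    // cheapest car first
gsort -price                                  // most expensive car first

* egen works across observations: average price within each group of foreign
egen mean_price = mean(price), by(foreign)

count                                         // number of observations
summarize price mpg                           // mean, std. dev., min and max
```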

STATA Exercise

The following exercise gives you hands-on practice with the basics of STATA using a real life
data set. It begins with importing an MS-Excel data set into STATA and asks you to
perform some basic operations to familiarise yourself with some of the commands that
will be helpful for the later computer classes. At any time, if you are stuck or unsure
about how to proceed or which STATA command to use, carry out a search online with
the question you want answered, e.g. “how do I replace the values of a variable in
STATA”.

Download the Excel data set called CRIME from the module page on Moodle under
the computer classes section. This is a data set that contains crime levels and other
socio-economic information on 46 cities across the US for the year 1982. The data set
(slightly modified for this exercise) is available at Boston College via a website
maintained by Christopher Baum. The variables are defined as follows:

pop = actual population in number

crimes = total number of crimes

unem = unemployment rate (%)

officers = number of police officers

pcinc = per capita income, $

area = land area, square miles

lawexpc = law enforcement expenditure per capita, $

(i) Copy the data and paste the data into STATA and save it in any drive of your choice.
From here on, you need to open a do file and write in it all the necessary STATA
commands to answer the following questions (where necessary, you need to search
what the term means before computing the relevant variable):

(ii) Label the variables using the definitions given above.

(iii) Create a new variable which measures the population density for each city.

(iv) Create a new variable which measures the crime rate per 1000 of population.

(v) Calculate the actual law enforcement expenditure in each city.

(vi) Find out the minimum and maximum number of police officers in the data set.

(vii) Create a variable which is the natural logarithm of crimes.

(viii) How many crimes occurred in the richest city?

(ix) What is the average unemployment rate across the U.S.?

(x) Is the richest city also the one with the highest number of police officers?

(xi) Does the city with the highest unemployment rate also have the highest crime level?

(xii) Compute a variable that takes value 1 for cities that have a crime rate higher than
100, and zero otherwise.

(xiii) What is the total number of police officers in each of the two categories of cities
you have identified in part (xii)?

(xiv) Create a scatter graph of crime rate against law enforcement expenditure per capita.

(This file is a modified version of Dr. Dev Vencappa’s original hand-out)
