21 June 2009

Getting Started with Stata


Stat S100 Summer 2009
The purpose of this tutorial is to learn how to download, install, and use Stata for data manipulation,
visualization, and simple analysis.

1. Downloading and Installing onto your PC


Downloading
You can get a free site-licensed copy of Stata for your computer through Harvard's server. With this
version, you will need to be able to log into the Harvard network every time you would like to use it.
The very first thing you need to do is activate your Harvard PIN account (if you have not already done
so). That can be done here:
https://www.fas.harvard.edu/fasit/utilities/activate-pin/
Next, you will need to download the following items from the FAS software download website:
- Go to the software download center (you will need to log in using your Harvard PIN #):
http://downloads.fas.harvard.edu/download
(If you are using a Mac, make sure you click on the correct platform for proper installation.)
- Scroll down to the program KeyAccess and click on the download button. Click on I accept
and Continue, and your download should start automatically. Save this to a convenient location on
your computer, like your desktop. (If you have a pop-up blocker installed, you may have to click on
the banner that opens near the top of the screen.) For Macs, the analogous program is called
KeyServer.
- Return to the FAS software download website, and download Stata SE (SE = special edition) as
you did for KeyAccess. This is a large file (~90MB), so it may take a few minutes. For Macs, it is just
called Stata (choose version 10).

Note: if you are going to use Stata outside Harvard's network, then you will also need to download
and install VPN Client (from the same download page). Every time you want to use Stata
from outside the network, you will first need to start VPN Client to connect to the Harvard servers
(use your FAS username and password in the prompts).

Installing
Once you have downloaded the two programs as outlined above, you now need to install them. First
install the program KeyAccess. Click on the program FAS_k2Client.exe, click OK, click on next
through all the prompts in the window, and then click Install. This should only take a few seconds.
When asked, you do not need to re-start your computer at this time.
Next you will need to install Stata. Open up the program you just downloaded: Stata10.exe. Click
OK to confirm that you are a Harvard affiliate, and the install should begin automatically. After about
1 minute, another window will pop open. Click Next three times, and the installation will continue.
After about 30 seconds, a third prompt window will open up. Click Next through these prompts, and
set-up will continue. After about 3-4 minutes, set-up should be complete. Click OK once the set-up finishes.
You can delete the two programs you downloaded (presumably on your desktop); these were used for
installation only. Now, restart your computer before booting up Stata for the first time.

Note: the first time you open Stata, it will ask if you want to install updates. Please make sure
you go through this process the first time; it will remove some bugs from older versions of the
software. After you do this once, you will not need to update again during the semester.
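
If the update prompt is dismissed by accident, the same check can usually be started from the Command window. The lines below are only a sketch: they assume the computer has internet access and that your account is allowed to modify the Stata installation, which may not be the case for a site-licensed copy.

```stata
* Ask the update server whether newer official files exist (changes nothing)
update query
* Download and install whatever official updates are available
update all
```

Depending on what was updated, Stata may print further instructions (for example, asking you to type update swap); follow whatever it shows in the Results window.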

*Purchasing your own Copy


There is also an option to purchase your own copy of Stata. The advantage to this is you will be able
to run Stata directly from your own hard drive, and will not need to log into the Harvard server every
time you want to use the software. Of course, the downside is, it is quite expensive (from $48 to
$335). For this class, there should be no reason to purchase the software. If you do wish to purchase a
version to use while traveling or while outside the Harvard network, we recommend the least
expensive option, Small Stata, for $48. You need only order the product you want at the Stata website
below, and you will be sent an email about where to pick up the software on campus. You can find the
products here:
http://www.stata.com/order/new/edu/gradplan.html
Stata is also available in the FAS computing labs and runs as described below. You do not have to
download the software on a computer in a lab.

2. Start-Up and Data Manipulation


Start-Up
To open Stata on a computer lab PC (or on your own computer, if you followed the above
directions), click on Start -> Programs -> Stata 10 -> StataSE 10. A screen should pop up that looks
like this:

Notice that there are 4 main windows (along with the menus up top):
1. Results: Stata prints any analysis output or communication (like error messages) here.
2. Command: The user enters commands here for Stata to run.
3. Variables: This window lists the variables that have been entered into Stata.
4. Review: This window displays the commands that Stata has processed.

Data Entry
There are 3 main ways to bring data into Stata: by importing data created by another program or
editor, by manual entry, and by reading a data set that Stata has saved in the native format of the
program. We will mainly be using the importing option in this class whenever you use a data set for
the first time. It is important to know about the manual entry technique, as it may be useful when
project time rolls around at the end of the semester. The tutorial that is part of problem set 1 describes
how to save and re-use data in Stata's native format, which is useful when you want to save your
progress when doing your homework.
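
As a rough sketch of the third option (the problem set 1 tutorial remains the full reference), saving and re-loading a native-format file looks like this. The file name election2004.dta is just a name made up for this example, and the file is written to Stata's current working directory.

```stata
* Save the data currently in memory in Stata's native .dta format
save "election2004.dta", replace
* In a later session: load the saved data, discarding whatever is in memory
use "election2004.dta", clear
* Confirm which variables were loaded
describe
```

The replace option overwrites an existing file of the same name, so leave it off if you want Stata to warn you instead.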
Importing Data
Most datasets are stored as simple text files (with extensions .csv, .txt, .dat, or even .raw) which can
easily be imported into Stata. However, you will need to do a little bit of work to import an Excel file
that ends with .xls.
This tutorial uses the 2004_Election.csv data file found on the course website here (under the Stata
Information tab):
http://isites.harvard.edu/fs/docs/icb.topic553772.files/2004%20Election.csv
Save this file on your computer's desktop to start. To import, click on the menu File -> Import ->
ASCII data created by a spreadsheet. In the window that pops open, click on the Browse button, and
select the file you want. You may have to change Files of Type to Comma Separated Values (*.csv).
Click on OK, and the new variables should be entered into Stata's memory.
Note: If you get an error message like "you must start with an empty dataset", then the simplest fix is
to just type clear in the Command window and press Enter. Be careful though, as this will remove any
old data floating around in Stata's memory.
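
The menu steps above correspond to the insheet command in Stata 10. Here is a minimal sketch, assuming the file was saved to the desktop under the name 2004_Election.csv; the folder path is a placeholder that you must replace with your own.

```stata
* Move to the folder holding the file (placeholder path, edit for your computer)
cd "C:\Users\yourname\Desktop"
* Read the comma-separated file; clear drops any data already in memory
insheet using "2004_Election.csv", comma clear
* Check what was read in
describe
list in 1/5
```

Putting clear in the insheet options has the same effect as typing clear first, so the same caution applies. In much newer versions of Stata (13 and later), import delimited takes over this job.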
Creating a .csv file from a .xls Excel File
Sometimes data will come as an Excel file ending in .xls. The easiest way to deal with this is to open
the file in Excel, and then save it as a .csv type file. Once you have the file opened in Excel, go to the
menu: File -> Save As. In the window that pops open, change the option Save as type to the option
CSV (Comma delimited) (*.csv). Save the file in this format in an easy-to-get-to location on your
computer (like the desktop). In the windows that pop open, click on OK and Yes; we really do want to
change this to a file that can be read into other software.
Manually Entering Data (use at your own risk)
Near the top of the Stata window, you will see a button that looks like a table/spreadsheet. If you click
on this button, the Data Editor window should open. Once this is open, you can simply click on a cell
and enter data, or just copy and paste data directly from Excel. After getting the data set up the way
you want it, click on the Preserve button to save your changes for later. You can close the Data Editor
by clicking on its close button, like any window in MS-Windows (you must close this window to do any analysis).
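
For a handful of values, typing them in with the input command is an alternative to the Data Editor. This is a different technique from the one described above, and the variable names and numbers below are invented purely for illustration.

```stata
* Start from an empty dataset
clear
* Declare a short text variable and a numeric variable (made-up names)
input str5 id score
"a" 1.5
"b" 2.0
"c" 3.25
end
* Show what was entered
list
```

Each line between input and end becomes one observation, and end tells Stata that data entry is finished.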

3. Data Visualization
Once the dataset is read in, the main concern is how to manipulate the data. Problem set 1 discusses
how to use the Stata menus to produce simple graphs and summary statistics, so that material is not
reproduced here. This section of the tutorial discusses how to enter commands directly into the Stata
Command window. Here, we will learn to get summary statistics (think measures of center, spread,
etc.) and graph/plot our data. Except for complex commands, menu choices and direct commands
produce identical results.
Please note: some of the methods illustrated in this tutorial will not be used until the second week of
class or later.

Summary Statistics
To get some quick statistics on a quantitative variable, use the command summarize followed by the
list of variables you are interested in. (This can also be done through the menu: Statistics ->
Summaries, Tables, and Test -> Summaries and Descriptive Statistics
-> Display Additional Statistics; select bush_perc and gsp in the variables line,
and then click submit.) For example:
. summarize bush_perc gsp
When typing in any Stata commands, you can just copy and paste the commands from this tutorial, but
do not copy the dot at the beginning of each command (that's just from the Stata output screen).
Copying the dot will lead to an error. You will notice that when Stata prints results in the Results
window, it adds the dot at the beginning of the line to indicate a command it has just executed.
Unfortunately, the above does not give the median or quartiles. To get percentiles, you have to give
the option detail, like this (or select Display additional statistics in the Display
Additional Statistics menu window from above):
. summarize bush_perc gsp, detail
To get frequencies of a categorical variable, use the command tabulate (menu: Statistics ->
Summaries, Tables, and Test -> Summaries and Descriptive
Statistics -> Tables -> One-way tables):
. tabulate region
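
If the detail output is more than you need, tabstat lets you pick exactly which statistics to show. This is an optional sketch using the same variables as above; the particular statistics chosen are just one reasonable selection.

```stata
* One compact table with only the statistics you ask for
tabstat bush_perc gsp, statistics(n mean median sd min max)
* The same idea, broken out by a categorical variable
tabstat bush_perc, statistics(mean median sd) by(region)
* Frequencies listed from most to least common
tabulate region, sort
```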

Histograms
To get a quick view of the distribution of a variable, use the command hist (menu: Graphics
-> Histogram). Enter:
. hist bush_perc

A new window should pop up with a histogram of your chosen variable like this one:

[Figure: histogram of bush_perc; x-axis: bush_perc (about .3 to .7), y-axis: Density]
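
The histogram command (hist is its abbreviation) accepts options after a comma. The variations below are a small sketch; the bin count of 10 is an arbitrary choice for illustration, not a recommendation.

```stata
* Show counts instead of densities on the vertical axis
histogram bush_perc, frequency
* Choose the number of bins yourself (10 is arbitrary)
histogram bush_perc, bin(10)
* Overlay a normal density curve for comparison
histogram bush_perc, normal
```

Each command redraws the graph window, so run them one at a time and compare.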

Boxplots
To produce a boxplot of a variable, use the command graph box. (menu: Graphics
-> Box plots). Enter:
. graph box bush_perc
You can also split a boxplot into different categories. (menu: click on the Categories tab in the Box
plots menu window). For example, we can do:
. graph box bush_perc, over(region)

And you should get the following graph:

[Figure: boxplots of bush_perc by region; regions shown include MW and NE]

Scatterplots

To get a quick visual of how two variables are related, use the command twoway (note that the first
variable is the y-variable, and the second is the x-variable). Menu: Graphics -> Twoway graph
(scatter, lines, etc). In the window that opens, click Create, choose your
Y and X variables appropriately, and click Accept. Then in the next window, click submit.
. graph twoway scatter bush_perc gsp
To add the regression line to the scatterplot, add an lfit plot like the one shown below. (Menu: from
the Twoway window, create the scatterplot graphic above, and then click on Create to add a 2nd plot.
In this plot window, click on the Fit plots option and choose the same X and Y variables as above.
Click Accept, and then in the graphics window there should be two plot objects. Click submit, and it
should add the line to the scatterplot.)
. graph twoway (lfit bush_perc gsp) (scatter bush_perc gsp)

And the result should look like this:

[Figure: scatterplot of bush_perc (y-axis) against gsp (x-axis) with the line of fitted values]
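
Titles and legend labels can be added with options after the comma. The wording of the labels below is only a suggestion; in the legend option, the numbers 1 and 2 refer to the plots in the order they are listed in the command.

```stata
* Scatterplot plus fitted line, with a title, axis titles, and relabeled legend
graph twoway (scatter bush_perc gsp) (lfit bush_perc gsp), title("bush_perc versus gsp") ytitle("bush_perc") xtitle("gsp") legend(order(1 "Observed" 2 "Fitted line"))
```

The command is kept on a single line because line-continuation marks only work inside do-files, not in the Command window.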

Saving and Printing Graphs


The easiest way to print a histogram, scatterplot, etc. is to right-click on the graph window itself (in
the middle), and then copy and paste into a word processor. From there you can add comments, adjust
the size, etc. Graphs can also be saved by Stata with the extension .gph and re-opened during a
session or at a later session. Some Macs will not let you copy directly from Stata, so you will need to
save the graph as a .png file, open this file in picture editing software, and copy from there into your
word processor document.
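
Saving can also be done from the Command window. The following is a sketch with made-up file names; the files are written to Stata's current working directory.

```stata
* Draw the graph first so that it is the current graph
histogram bush_perc
* Save it in Stata's own format (can be re-opened in Stata later)
graph save "bush_hist.gph", replace
* Export it as a .png image for use in a word processor
graph export "bush_hist.png", replace
* Re-open the saved Stata graph in a later session
graph use "bush_hist.gph"
```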

4. Data Analysis
Next week, we will learn to measure and analyze the association between two variables (correlation
and regression). Later in the course, we will see many more ways to do analysis (confidence intervals,
hypothesis testing, ANOVA, etc.). Let's do some work on what we know for now:

Correlation
To find the correlation coefficient between two (or more) variables, use the command corr (menu:
Statistics -> Summaries, Tables, and Test -> Summaries and
Descriptive Statistics -> Correlations and covariances):
. corr bush_perc gsp
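
Two related commands may be handy later in the course. This is an optional sketch that goes slightly beyond what is needed now.

```stata
* Correlation with its p-value and the number of observations used
pwcorr bush_perc gsp, sig obs
* Covariances instead of correlations
correlate bush_perc gsp, covariance
```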

Regression
To get the printout of a regression (to find the estimates for the slope and intercept of a line), use the
command regress (menu: Statistics -> Linear models and related ->
Linear regression):
. regress bush_perc gsp
The results should look like this:
      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  1,    48) =    5.63
       Model |  .036676619     1  .036676619           Prob > F      =  0.0217
    Residual |  .312433678    48  .006509035           R-squared     =  0.1051
-------------+------------------------------           Adj R-squared =  0.0864
       Total |  .349110297    49    .0071247           Root MSE      =  .08068

------------------------------------------------------------------------------
   bush_perc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gsp |  -3.63e-06   1.53e-06    -2.37   0.022    -6.71e-06   -5.56e-07
       _cons |   .6795575   .0634325    10.71   0.000     .5520179     .807097
------------------------------------------------------------------------------

We often would like to look at the residual plot for a regression to see if the assumptions are
met (to see if there is a pattern in the residuals, like a U-shape). To get the residual vs. fitted plot (the
fitted values being your predicted values, y-hat_i), use the command rvfplot. Note that this command
should be entered directly following the regress command, since it refers back to it (menu: Statistics ->
Linear models and related -> Regression diagnostics -> Residual-versus-fitted plot):
. rvfplot

[Figure: residual-versus-fitted plot; x-axis: Fitted values, y-axis: Residuals]
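
To work with the fitted values and residuals directly, predict can store them as new variables after regress. In this sketch, the names yhat and ehat are chosen just for the example, and 40000 is an arbitrary gsp value inside the plotted range.

```stata
regress bush_perc gsp
* Store fitted values and residuals as new variables (names are arbitrary)
predict yhat, xb
predict ehat, residuals
* Residual-versus-fitted plot with a reference line at zero
rvfplot, yline(0)
* The same plot built by hand from the stored variables
graph twoway scatter ehat yhat, yline(0)
* Predicted bush_perc at one chosen gsp value (intercept + slope * gsp)
display _b[_cons] + _b[gsp]*40000
```

Here _b[_cons] and _b[gsp] are the intercept and slope from the most recent regress, so the display line simply evaluates the fitted line at the chosen value.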

Practice Problem
S&P 500 Stock Index. This dataset can be downloaded here:
ichart.finance.yahoo.com/table.csv?s=%5EGSPC&a=00&b=3&c=1950&d=01&e=01&f=2009&g=m&ignore=.csv

For this problem we will be using the monthly S&P 500 Index Prices Since 1950. We are going to see
if we can predict the S&P 500 price by the volume traded that day.
Download the above file onto your desktop (I called it SP500.csv). Open up Stata. Within Stata,
read in the file using the menu: File -> Import -> ASCII data created by a spreadsheet, like above.
Alternatively, you can copy and paste the file directly from Excel. Open the file in Excel. It should
look like this:

Copy the columns of interest, paste into Stata's Data Editor, and then hit Preserve; it should look like
this:

Once you have the file read into Stata correctly, we can start analyzing the data. Now type in the
commands into the command window (one at a time & without the dot):
. summarize close volume , detail
. summarize close volume
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       close |       709    366.6579    444.0311      17.05    1549.38
      volume |       709    4.39e+08    9.54e+08    1024300   7.23e+09

. hist close
. hist volume

[Figure: two histograms, one of Close (x-axis: Close, y-axis: Density) and one of Volume (x-axis: Volume, y-axis: Density)]

a) What do you notice about the two variables we are interested in?

. graph twoway (lfit close volume) (scatter close volume)

[Figure: scatterplot of Close (y-axis) against Volume (x-axis) with the line of fitted values]

. regress close volume


      Source |       SS       df       MS              Number of obs =     709
-------------+------------------------------           F(  1,   707) = 1004.21
       Model |  81918247.6     1  81918247.6           Prob > F      =  0.0000
    Residual |  57673581.2   707  81575.0794           R-squared     =  0.5868
-------------+------------------------------           Adj R-squared =  0.5863
       Total |   139591829   708    197163.6           Root MSE      =  285.61

------------------------------------------------------------------------------
       close |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      volume |   3.56e-07   1.12e-08    31.69   0.000     3.34e-07    3.79e-07
       _cons |   210.1585   11.80873    17.80   0.000     186.9742    233.3429
------------------------------------------------------------------------------

. rvfplot

[Figure: residual-versus-fitted plot; x-axis: Fitted values, y-axis: Residuals]

b) What is the equation for the least squares regression line? What does this mean?

c) What is the predicted closing price for a day that had a billion (10^9) shares traded? What about for
3.12 billion (3.12*10^9)?

d) What are our results? Does this seem to be a good fit? How do you know?
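
To check your hand calculations for parts b) through d), Stata can evaluate the fitted line for you. This is only a sketch of one way to do it; it assumes the variables are named close and volume as in the output above.

```stata
regress close volume
* Intercept and slope as stored by Stata (more digits than the printed table)
display "intercept = " _b[_cons] "   slope = " _b[volume]
* Predicted close at 1 billion and at 3.12 billion shares
display _b[_cons] + _b[volume]*1e9
display _b[_cons] + _b[volume]*3.12e9
* R-squared from the regression
display e(r2)
```

Because the stored coefficients carry more digits than the printed table, the results may differ slightly from answers computed with the rounded values.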
