0.1 Intro To Stata

The document provides an overview of STATA, a statistical software developed in 1984, detailing its history, advantages, and file naming conventions. It describes the basic windows and commands used in STATA for data analysis, including navigation, loading data, and preliminary data viewing techniques. Additionally, it includes exercises for practical application of the commands discussed.

Data Analysis with STATA

Introduction

Jairos Sambo, MA
March 4, 2019
Lusaka, Zambia

STATA: History of STATA
• Program was developed for the IBM PC and the DOS operating system in 1984 by Bill Gould and Sean Becketti
• First version focused mainly on regression but had data management functions
• Over the years it has developed strong user input through forums
– These include discussions about STATA
– Writing and sharing of STATA programs
– Led to the development of the Statistical Software Components (SSC) archive, a searchable database of user-written STATA programs
• STATA version 8 was the biggest rewrite of the program
– Had a new interface
– Included new graphics
• Current version of STATA is 18
• Advantages of STATA (over its main competitors SAS and SPSS)
– Cutting-edge statistical procedures
– Small, fast and available for various processors
– Reasonably priced with lifetime license

STATA file naming scheme
• *.dta (STATA data files)
• *.do (STATA instruction file, or do-file. Contains a set of commands for processing STATA data files and setting up the STATA environment)
• *.log (This is the output file. Contains everything that appears in the Results window while the log is open)
– Activated by issuing the command: log using filename.log
– Terminated by issuing the command: log close
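A minimal sketch of a logged session. It assumes the built-in auto example dataset that ships with Stata (loaded with sysuse) and a hypothetical log name, session1.log, written to the current folder; neither comes from the training files.

```stata
* Close any log left open by an earlier run (capture ignores the error if none is open)
capture log close

* Start a plain-text log; replace overwrites an existing file of the same name
log using session1.log, replace text

* Everything shown in the Results window from here on is also written to the log
sysuse auto, clear
describe

* Stop logging and close the file
log close
```

The log can then be opened in any text editor to review the session.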
The STATA Menu System
• STATA can be started from the desktop or program files
– Double-click the icon on the Desktop
– Start > All Programs > STATA

Basic STATA windows. There are 5 basic windows when STATA is started:

The Command Window:
• This is where commands are entered interactively
• Do-files are executed here

The Results Window:
• Shows commands executed and results of commands
• Shows commands executed from the MENU
• Shows logs (errors, warnings, etc.)

The Review Window:
• Shows the list of commands executed from the Command Window AND the Menu

The Variables Window:
• Shows variables and properties of variables of the active data set

The Properties Window:
• New for Stata 12 through 15!
• Shows details on the composition of the variable selected and the dataset as a whole.

STATA Organisation and Window System
• Basic STATA windows. There are 5 basic windows when STATA is started:
– The Command Window
• This is where commands are entered interactively
• Do-files are executed here
– The Results Window
• Shows commands executed and results of commands
• Shows commands executed from the MENU
• Shows logs (errors, warnings etc.)
– The Review Window
• Shows the list of commands executed from the Command Window AND the Menu
– The Variables Window
• Shows variables and properties of variables of the active data set
– The Properties Window
• Shows details on the composition of the variable selected and the dataset as a whole.
– There are other STATA windows that are activated for a number of procedures and activities. For example:
• Graphs
• Browse

Windows can be closed by clicking on the ‘x’ in the corner of the window (just like in any Windows-based program), and hidden by clicking the thumbtack.

Closed windows can be retrieved from the ‘Window’ menu. (Important to remember, as windows often get closed by accident!)

Basic STATA Commands - Navigation

• STATA statement syntax
– All STATA commands start with a command name (a STATA reserved word)
– The statement might also include a variable name or variable list, as well as options
• Loading data and using file commands:
– DIR, CD and path information
• These commands help in identifying the active folder and moving from one folder to another. Enter the following commands in the Command Window:
– dir (What do you see? Which folder is this?)
– cd "C:\Users\Jairos Sambo\Desktop\STATA TRAINING\data_rals\stata" (This makes the “stata” folder current; enclose the path in double quotes because it contains spaces)
– dir (Which folder are you in now? What files do you see?)

• Exercise: Look at the names of the files in the folder and try to match these names with sections of the questionnaire
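The navigation commands can be tried on any computer with the sketch below, which does not depend on the training folder. The local macro name start is hypothetical, and the lines are meant to be run together as one do-file so that the macro stays defined.

```stata
* Show the current working directory
pwd

* Keep the starting folder in a local macro (c(pwd) holds the current directory)
local start "`c(pwd)'"

* Move up one level and list what is there
cd ..
dir

* Return to the starting folder; the double quotes protect paths that contain spaces
cd "`start'"
dir
```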
Basic STATA Commands: Using a Data File

• Using data files, i.e. making data files active in computer memory
– The USE command
• This command loads data into computer memory for use
– Example: use household.dta
– Examine the variables in the Variables Window
• Depending on the location of the file, the full path may have to be given
– Example: use "C:\Users\Jairos Sambo\Desktop\STATA TRAINING\data_rals\stata\household.dta", clear
– The CLEAR option removes the active file from memory before loading the new one
– Note: If you execute the DIR command and do not see the file, it is in a different folder, so either include the path or make that folder current
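A short sketch of use with the clear option, assuming the built-in auto example dataset rather than the RALS files. The temporary file name copy is hypothetical, and the lines should be run together as one do-file.

```stata
* Load the built-in example dataset and save a copy to a temporary file
sysuse auto, clear
tempfile copy
save "`copy'"

* Change the data in memory so that it differs from the saved copy
drop in 1/10

* Reload the saved copy; without clear, Stata refuses to discard the changed data
use "`copy'", clear
count
```

The final count should report the full dataset again, because the dropped records were only removed from memory, not from the saved file.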
Preliminary View of Data - describe

• The describe command (abbreviation: d)
– The command can be written as describe or simply d (STATA commands are case-sensitive, so type them in lower case)
– Gives information on the file currently loaded in memory
– The default output has two parts
• Header: number of observations, number of variables and file size
• Body: variable names, storage types, display formats and labels
– For example, if we run describe on the file in computer memory
• . d
• We will get the output shown below

Preliminary View of Data - describe

1. Header of the describe output

Contains data from household.dta
  obs:         7,934
 vars:           148
 size:    15,122,204

2. Body of the describe output

                storage  display    value
variable name   type     format     label      variable label
cluster         double   %10.0g                Cluster
HH              double   %10.0g                Household number
popwgt          double   %10.0g                Panel Weight
panwgt          double   %10.0g                Panel weight
consent         double   %10.0g     consent    Respondent explained the consent form
resettle        double   %10.0g     resettle   Resettlement area
chief           double   %10.0g     chief      Chief in area
s_dd_new        double   %10.0g                South Coordinate in decimal degrees
e_dd_new        double   %10.0g                East Coordinate in decimal degrees
sameloc         double   %10.0g     sameloc    Is the household in the same location as it was in June/July 2012?
samesea         double   %10.0g     samesea    Is the household still located within the SEA boundary?
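A few variations on describe, sketched with the built-in auto example dataset (an assumption; the RALS file is not needed to try them):

```stata
sysuse auto, clear

* Full description: header plus one row per variable
describe

* Restrict the body to selected variables, using the abbreviation d
d make price mpg

* Header only
describe, short

* The number of observations and variables are also available as _N and c(k)
display "observations: " _N "  variables: " c(k)
```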

Preliminary View of Data - list

• The list command
– This command can be written as: list or l
– This command shows each record, i.e. the variables and their values
– Example:
• Enter the command l in the Command Window
• The output below shows the first five records and the first six variables; the full output lists every variable in each record with its value

Preliminary View of Data - list

     cluster   HH      popwgt      panwgt   consent   resettle
1.      1002    4   419.66283   311.81963       Yes         No
2.      1002    6    364.9242    271.1475       Yes         No
3.      1002   12    72.98484   67.786876       Yes        Yes
4.      1002   14    364.9242    271.1475       Yes         No
5.      1002   18   419.66283   311.81963       Yes         No

Preliminary View of Data - list

• Use of the list command with the IN qualifier
– The IN and IF qualifiers can be used with this and other STATA commands
– For example:
• l in 1 (this lists the first record)
• l in 5/8 (this lists records 5 to 8)

– Exercise. Try:
• list prov dist cluster HH in 1/50 if cluster < 1040
• Explain the output
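Most STATA commands follow the general pattern command [varlist] [if] [in] [, options]. The sketch below illustrates the IF and IN qualifiers with list, assuming the built-in auto example dataset and its variables make, price and mpg (not the RALS variables).

```stata
sysuse auto, clear

* First five records, three variables only
list make price mpg in 1/5

* Every record that satisfies a condition
list make price if price > 10000

* Both qualifiers together: among records 1 to 20, show only those with mpg above 20
list make mpg in 1/20 if mpg > 20

* Last three records (negative numbers count from the end; l stands for last)
list make price in -3/l
```

When both qualifiers are given, a record is listed only if it falls inside the IN range and also satisfies the IF condition.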
Preliminary View of Data – browse

• Use of the BROWSE command
– This command can be written as browse or br
– The browse command works just like list, but the data is presented in a table format in the BROWSE window.
– The IF and IN qualifiers work with browse in the same way they work with list

Preliminary View of Data – browse

• Use of the BROWSE command
– The variables of interest can be selected in the command line (for example: br prov dist cluster HH if cluster < 1040) or selected using the tick boxes in the browse window.
– Try:
• br if district>103
• br if dist==103

Preliminary View of Data – count

• Use of the COUNT command
– The count command gives the number of observations.
– Useful for determining the sample size! 7,934 for RALS
– Also works with the IF qualifier
– Try:
• count
• count if prov==10 (Number of hh in Western province)
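The same idea can be sketched with the built-in auto example dataset, an assumption made so that the commands run without the RALS files; foreign is a 0/1 variable in that dataset.

```stata
sysuse auto, clear

* Total number of observations
count

* Observations that satisfy a condition
count if foreign == 1

* The result of the last count is stored in r(N) for later use
display "foreign cars: " r(N)

* One count per group; bysort sorts by the grouping variable first
bysort foreign: count
```

The by-group form gives one count per category in a single command, instead of one count if command per category.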

Preliminary View of Data – sort

• Use of the SORT command
– The sort command puts the data in a requested order (always ascending), which can be viewed with the BROWSE command
– You can sort on as many different variables as you like!
– Try:
• sort HH
• sort cluster HH
• sort HH cluster
– What is the difference between the last two commands?
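The order of the variables in a sort matters. A small sketch, assuming the built-in auto example dataset and its variables foreign and price:

```stata
sysuse auto, clear

* Group by origin first, then order by price within each origin
sort foreign price
list make foreign price in 1/5

* Order by price first; foreign is used only to break ties in price
sort price foreign
list make foreign price in 1/5
```

The first variable named is the main sort key, and each later variable only orders records that are tied on the earlier ones.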

Preliminary View of Data – gsort

• Use of the GSORT command
– The gsort command puts the data in a requested order which can be viewed with the BROWSE command
– This command works just like SORT, but with a ‘-’ in front of a variable name you can reverse the order for that variable.
– Try:
• gsort -HH
• gsort +dist +cluster -HH
– Not used as often as SORT but very helpful sometimes!
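A sketch of mixed ascending and descending order, assuming the built-in auto example dataset. Note that the sign must be a plain hyphen typed on the keyboard, since a dash pasted from a word processor is not recognised.

```stata
sysuse auto, clear

* Ascending origin, then descending price within each origin
gsort +foreign -price
list make foreign price in 1/3

* Descending price over the whole dataset: most expensive car first
gsort -price
list make price in 1/3
```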

One more thing...

• STATA has what is described as a ‘user-driven development format.’ This means that anyone can write a Stata command that you can download and use. Sometimes these commands are really helpful, as you will see this week. But you should be careful: no one officially reviews these commands, and if your paper is returned because you used an inaccurate one, there is no one to blame!
• To try loading a ‘user-written’ command, type ‘findit tabmiss’ or ‘findit outreg2’ and follow the directions in the window (you must be connected to the internet).
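For packages hosted on the SSC archive, the ssc command is an alternative to clicking through the findit results. The sketch below assumes an internet connection and that outreg2 is available on SSC; it is one possible route, not necessarily the one used in this training.

```stata
* Look at the package description on the SSC archive before installing
ssc describe outreg2

* Download and install; replace updates a copy that is already installed
ssc install outreg2, replace

* Confirm where the command now lives, then read its documentation
which outreg2
help outreg2
```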
Command Review

• DIR : displays filenames
• CD : changes directory
• USE : loads Stata dataset
• DESCRIBE : describes the data in memory
• LIST : lists values of variables
• BROWSE : browses data in the data editor
• COUNT : counts observations
• SORT : sorts data in ascending order
• GSORT : sorts data in ascending or descending order
• FINDIT : searches the Stata help system and the internet, including user-written commands
• HELP : the most important command you will ever learn
EXERCISE 1

• Change the path (directory) to the STATA data folder: "C:\Users\Jairos Sambo\Desktop\STATA TRAINING\data_rals\stata"
• Open: livestock.dta
• Determine the total sample size using the ‘count’ command.
• Determine the sample size in each province using the ‘count’ command with the ‘if’ qualifier.
• Sort the dataset in order of farmer category.
• Browse the sorted data to confirm it is correct.
EXERCISE 2

• Examine the data files in the ‘stata’ folder and match them with sections in the RALS 2015 questionnaire

• Write the filename on the appropriate section page of the questionnaire

Thank You
