Training on Economics Software
Applications: Introduction to Stata©
Tolasa Alemayehu
Economics Department
Mattu University
November 2022
Mattu, Ethiopia
A PLAN (CONTENT) OF TRAINING
1. Introduction
The Stata Interface
Exploring and Examining Datasets
Storing Commands and Outputs
2. Data management
Creating, Modifying and Defining Variables
Appending and Merging Datasets
Collapsing Data Sets
3. Describing Data
Summary Statistics
Statistical Tests
Graphics
4. Analysis of Regression Models
Steps in Empirical Analysis
Structure of Economic Data
Regression Models: Cross Section, Time Series and Panel
A PLAN (CONTENT OF TRAINING)
Exercises will be given for every section
Working in groups is advisable
It is good to arrange seats so that at least one person to
your right or left has some acquaintance with Stata
1. Introduction
What is Stata? Why Use Stata?
Types of Stata
Stata (pronounced "stah-tah"): Version 1 was born in 1985.
Stata is not an abbreviation but rather a
corruption of the word Statistics.
Stata is a general-purpose, command-driven
package (i.e. not specialized like DAD, EViews,
GAMS, SPSS, Matlab, NLOGIT, etc.)
◦ It handles cross-section, panel, and time-series
data analysis (and is especially suited to the
former two)
Why should I use Stata?
Stata is often preferred to other packages as "a very interactive
package, which makes you feel like you are talking to it
and which does exactly what you tell it to do." In particular:
• It handles and manipulates large data sets (e.g.
millions of observations!)
• It has growing capabilities for handling panel and time-
series regression analysis.
• There are continuing improvements in computing speed,
capabilities and functionality.
• It is constantly being updated and extended by users with
specific needs.
• It is fast and easy to use.
Types (sizes) of Stata
There are four different types (sizes) available for
each version of Stata:
1. Stata/MP (multiprocessor), the most powerful,
2. Stata/SE (Special Edition),
3. Stata/IC (Intercooled), and
4. Small Stata.
The main difference between these versions is the
maximum number of variables, regressors and
observations that can be handled.
It is important to know these types if one is to make a
good choice of what to buy.
For each Stata type, the maximum number of variables, regressors and observations, with remarks:

Stata/MP: 32,767 variables; 10,998 regressors; 2,147,483,647 observations.
Runs on multiple CPUs or cores (from 2 to 64, depending on the licence) but can also run on a single core. It is the fastest version of Stata.

Stata/SE: 32,767 variables; 10,998 regressors; 2,147,483,647 observations.
Runs on a single core (it can run on multi-core computers but uses only one core).

Stata/IC: 2,047 variables; 798 regressors; 2,147,483,647 observations.
Runs on a single core (it can run on multi-core computers but uses only one core).

Small Stata: 99 variables; 99 regressors; 1,200 observations.
Runs on a single core (it can run on multi-core computers but uses only one core).
[Screenshot: the Stata interface, with the menu bar, the toolbar, and the five main windows labeled: Results, Command, Variables, Review, and Properties.]
The Stata Interface: Windows, Toolbar, Menus, and Dialogs
Windows
The Stata windows give you all the key information about
the data file you are using, recent commands, and the results
of those commands.
The five main windows are the Review, Results, Command,
Variables, and Properties windows.
There are other, more specialized windows such as the
Viewer, Data Editor, Variables Manager, Do-file Editor,
Graph, and Graph Editor Windows.
Some of them open automatically when you start Stata,
while others can be opened using the Window pull-down
menu or the buttons on the toolbar.
Stata windows are:
• Stata Results To see recent commands and output
• Stata Command To enter a command
• Stata Browser To view the data file
• Stata Editor To edit the data file
• Stata Viewer To get help on how to use Stata
• Variables To see a list of variables
• Review To see recent commands
• Stata Do-file Editor: To write or edit a program
Menus
Stata displays 8 drop-down menus across the top of the outer window, from left to right:
File
Open open a Stata data file (use)
Save/Save as save the Stata data in memory to disk
Do execute a do-file
Filename copy a filename to the command line
Print print log or graph
Exit quit Stata
Edit
Copy/Paste copy text among the Command, Results, and Log windows
Copy Table copy table from Results window to another file
Table copy options what to do with table lines in Copy Table
Prefs Various options for setting preferences. For example, you can save
a particular layout of the different Stata windows or change the
colors used in Stata windows.
Data
Graphics
Statistics build and run Stata commands from menus
User menus for user-supplied Stata commands (download from Internet)
Window bring a Stata window to the front
Help Stata command syntax and keyword searches
Button bar
The buttons on the button bar are, from left to right:
Open a Stata data file: use
Save the Stata data in memory to disk: save
Print a log or graph
Open a log, or suspend/close an open log: log
Open a new Do-file Editor: doedit
Edit the data in memory: edit
Browse the data in memory: browse
Important shortcuts
Keyboard shortcuts are often quicker to use
than the buttons. The most useful ones are:
Control-O Open file
Control-S Save file
Control-C Copy
Control-X Cut
Control-V Paste
Control-Z Undo
Control-F Find
Control-H Find and Replace
1.2. Exploring and Examining Datasets
1.2.1. Exploring Data Files
Common Stata Syntax
• Stata commands follow the same syntax:
[by varlist1:] command [varlist2] [if exp] [in range]
[weight], [options]
• Items inside the square brackets are optional, and some
are not available for every command.
• This syntax applies to all Stata commands; the sketch below shows how the pieces fit together.
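As a hedged illustration (cons, hhsize, and q1a are variables from the ERHS dataset used later in this training; the cut-off of 5 is an assumption chosen for illustration), the pieces of the common syntax combine like this:
* detailed summary statistics for cons among larger households
summarize cons if hhsize > 5, detail
* the same summary repeated for each region, using the by prefix
bysort q1a: summarize cons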
Logical operators used in Stata
~ Not
== Equal
~= not equal
!= not equal
> greater than
>= greater than or equal
< less than
<= less than or equal
& And
| Or
1.2.2. Examining datasets
Using the Command window, you can open data stored in different formats:
a. Stata file (.dta): use command
b. Excel file (.xlsx): import excel command
c. CSV file (.csv): insheet command
d. SPSS file (.sav): usespss command (user-written)
Log file: Stata can save the log in one of two different formats:
a. Stata Markup and Control Language (SMCL) format
(SMCL format is recommended because SMCL files
can be translated into a variety of formats readable by
applications other than Stata)
b. Plain-text log format
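A few hedged examples of opening data in these formats (the file names are hypothetical; the usespss line assumes the user-written command has been installed, e.g. from SSC, and its exact syntax should be checked with help usespss):
use mydata.dta, clear
import excel using "mydata.xlsx", firstrow clear
insheet using "mydata.csv", clear
* usespss using "mydata.sav", clear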
• Use
– This command opens an existing Stata data file.
– The syntax is:
• use filename [, clear ] : opens new file
• use [varlist] [if exp] [in range] using filename [, clear]
opens selected parts of file
– If there is no path, Stata assumes it is in the current folder.
– You can use a path name such as: use C:\...\ERHScons1999
– If the path name has spaces, you must use double quotes:
use "d:\my data\ERHScons1999"
– You can open selected variables of a file using a variable list.
– You can open selected records of a file using if or in.
Examining dataset
Here are some examples of the use command:
• use ERHScons1999 : opens this file for analysis.
• use ERHScons1999 if q1a == 1: opens data from region 1
• use ERHScons1999 in 5/25: opens records 5 through 25
of file
• We can also combine the if and in conditions
• use q1a hhid hhsize cons using ERHScons1999:
This opens four variables (q1a, hhid, hhsize, and cons) from ERHScons1999
• use ERHScons1999, clear: clears memory before
opening the new file
Examining dataset
Clear: The clear command deletes all data, variables, and
labels from memory to get ready for a new data file
◦ You can clear memory using the clear command or by
using clear as an option of the use command. This command does
not delete any data saved to the hard drive
Exit: Differs from the clear command
◦ It closes Stata and all of its windows
If the data were entered in another format such as Excel,
importing them into Stata is simple
Example: if our data set is in Excel, then use
import excel using "C:\Users\eea\Desktop\SD\original\
teff price.xlsx", sheet(addis) firstrow clear
Examining dataset
Save
– The save command will save the dataset as a .dta file
under the name you choose. Editing the dataset changes
the data in the computer's memory; it does not change the
data that is stored on the computer's disk.
save "C:\...\ERHScons1999.dta", replace
The replace option allows you to save a changed file to the
disk, replacing the original file.
– Stata is worried that you will accidentally overwrite
your data file.
– You need to use the replace option to tell Stata that you
know that the file exists and you want to replace it.
Examining dataset
• Edit
• This command opens a window called the Data Editor,
which allows us to view all observations in
memory.
• You can change the data using the Data Editor window, but
editing data this way is not recommended
• It is better to correct errors in the data using a do-file
program that can be saved.
• Browse
• This window is exactly like the Data Editor window
except that you cannot change the data
Examining dataset
• Describe
– This command provides a brief description of the
data file.
– You can use “des” or “d” and Stata will understand.
– The output includes:
• the number of variables
• the number of observations (records)
• the size of the file
• the list of variables and their characteristics
• Storage types: String vs numeric
Examining dataset
list
◦ This command lists values of variables in data set.
◦ The syntax is:
list [varlist] [if exp] [in range]
examples:
◦ list lists entire dataset
◦ list in 1/10 lists observations 1 through 10
◦ list hhsize q1a food lists selected variables
◦ list hhsize sex in 1/20 lists observations 1-20 for
selected variables
Examining dataset
• list with “if” condition
– This command is used to select certain records in carrying
out a command
• command if exp
Examples:
– list hhid q1a food if food > 1200 lists data where food is > 1200
– list if q1a < 6 lists cases where region is 1 through 5
– browse hhid q1a food if food >= 1200 browses data where
food consumption is at least 1200
• Note that “if” statements always use ==, not a single =.
Also note that | indicates “or” while & indicates “and”
Examining dataset
list with “in”
◦ We also use in to select records based on the case number.
◦ The syntax is: command in exp
For example:
◦ list in 10 list observation number 10
◦ summarize in 10/20 summarize obs 10-20
codebook
◦ The codebook command is a great tool for getting a quick
overview of the variables in the data file.
◦ It produces a kind of electronic codebook from the data
file, displaying information about variables' names, labels
and values
◦ Examples: codebook / codebook hhid q1a food
Examining dataset
Inspect
It is a command for getting a quick overview of a data file.
◦ inspect command displays information about the values of
variables and is useful for checking data accuracy
inspect
inspect hhid q1a food
• assert
– The assert command verifies that a condition is true for every
observation and stops with an error message if it is not.
• count
– The count command shows the number of observations
that satisfy an if condition.
– If no conditions are specified, count displays the number of
observations in the data.
• count: 1452
• count if q1a==3: 466
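A hedged sketch of count and assert with the ERHS variables used above (the household-size check is an assumption added for illustration):
count                  // number of observations in memory
count if q1a == 3      // observations in region 3
assert hhsize > 0      // stops with an error if any household size is not positive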
1.3. STORING: Outputs, Commands & Data
The following topics are covered:
◦ Using the Do-file Editor
◦ log using
◦ log off
◦ log on
◦ log close
◦ set logtype: choose the log format (text logs make it easier to move tables from Stata to Word and Excel)
Using the Do-file Editor
The Do-file Editor allows you to store a program (a set of
commands),
◦ It makes it easier to check and fix errors,
◦ It allows you to run the commands later,
◦ It lets you show others how you got your result, and
◦ It allows you to collaborate with others on the analysis.
STORING: Outputs, Commands and Data
In general, any time you are running more than 10 commands
to get a result, it is easier and safer to use a Do-file.
To open the Do-file Editor, you can click on Windows/Do-file
Editor or click on the envelope on the Tool Bar.
To run the commands in a Do-file,
you can click on the Do button.
If you want to run one or just a few commands rather than the
whole file, mark the commands and click on the Do button
Note: If you would like to add notes to a do-file that Stata
should not execute, use /* */ for comments spanning more than
one line and * for a single-line comment
To put comments after a command on the same line, use //,
separated from the command by a space, and write the comments after it (see the sketch below)
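A minimal do-file sketch showing the three comment styles (the variable names follow the ERHS examples; the file is assumed to sit in the working folder):
/* This block comment can
   span several lines. */
* This comments out an entire line.
use ERHScons1999, clear    // an in-line comment after a command
summarize hhsize food cons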
STORING: Outputs, Commands and
Data
Saving the Output
◦ Stata Results window does not keep all the output you
generate.
◦ It only stores about 300-600 lines, and when it is full, it
begins to delete the old results as you add new results.
◦ Thus, we need to use log to save the output
log using
◦ This command creates a file with a copy of all the
commands and output from Stata. The syntax is:
log using filename [, append replace [ text | smcl ] ]
Append: adds the output to an existing file
Replace: replaces an existing file with the output
STORING: Outputs, Commands and Data
Here are some examples:
log using "C:\Users\eea\Desktop\SD\results.smcl”
log using "C:\Users\eea\Desktop\SD\results.smcl , replace
log using "C:\Users\eea\Desktop\SD\results.smcl, append
log off: This command temporarily turns off the logging of output,
log on: This command is used to restart the logging,
log close: is used to turn off the logging and save the file.
Storing data
save
save, replace
Examples
save "C:\Users\eea\Desktop\SD\version1.dta"
save "C:\Users\eea\Desktop\SD\version2.dta", replace
Getting help in Stata
• Help: The help command gives you information about any Stata
command or topic
• help [command]
For example,
• help tabulate: gives a description of the tabulate command
• help summarize gives a description of the summarize
• search: performs a keyword search; useful when one does not know
the Stata command
Example: search ols
hsearch: searches help files; not restricted to keywords
E.g. hsearch weak instruments
net search: searches user-written resources on the Internet (requires a connection)
◦ E.g. net search outreg2
2. Data Management in Stata
Some Organizing Tips
Adding Notes to Datasets and Variables
Creating and Modifying Variables
Defining, labeling and renaming Variables
Appending and Merging Data Sets
Collapsing Data Sets
Additional Help on Stata
Exercises
First, be organized
Be organized in your data management
Always use do files for your research project
Know the Stata version you are working with
◦ What if I do not know the Stata version? (Type version or about in the Command window.)
Save your outputs
◦ capture log close
◦ log using commands
Create a shorter way of writing your directories
◦ The global command (see the sketch below)
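A minimal sketch of using global macros as directory shortcuts, matching the $original and $final globals used later in this training (the paths themselves are hypothetical):
global original "C:\Users\eea\Desktop\SD\original"
global final "C:\Users\eea\Desktop\SD\final"
use "$original\ERHScons1999_old.dta", clear
save "$final\ERHScons1999.dta", replace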
Adding notes on your data set
You can add notes on your data set
Example
◦ note: This data contains some variables generated
by Economics staff
To read notes, type:
◦ notes
Notes can also be written for individual variables
◦ note food: Is this per capita or per week? Please
check.
To delete notes
◦ notes drop q2_area in 1
◦ notes drop _dta in 2
CREATING NEW VARIABLES
When new variables are created, they are in memory &
they will appear in the Data Browser
However, they will not be saved to the hard disk
unless you use the save command.
generate
◦ This command is used to create a new variable.
◦ It is similar to “compute” in SPSS.
The syntax is;
generate newvar = exp [if exp]
where “exp“ is an expression like
“price*quant” or “1000*kg”
CREATING NEW VARIABLES
You can use “gen“ or “g” as an abbreviation for
“generate“
If the expression is an equality or inequality, the
variable will take the values 0 if the expression is false
and 1 if it is true
If you use “if“, the new variable will have missing
values when the “if“ statement is false
For example,
use "$original\ERHScons1999_old.dta", clear
CREATING NEW VARIABLES
• generate age2= ageh*ageh
create age squared variable
• gen conspercap=food/hhsize
Creates consumption per capita
• gen consperad=food/aeu
Creates consumption per adult
• gen highcons = (rconsae > 2000)
Creates an indicator for those with consumption greater than 2,000
To know the number of these households:
CREATING NEW VARIABLES
• tab highcons
save "$final\ERHScons1999.dta", replace
replace : This command is used to change the definition
of an existing variable.
The syntax is the same:
replace oldvar = exp [if exp] [in exp]
replace cons=. if cons<0: replaces negative consumption
with missing value
tabulate … generate : This command is useful for
creating a set of dummy variables (variables with a
value of 0 or 1) depending on the value of an existing
categorical variable.
CREATING NEW VARIABLES
The syntax is:
tabulate oldvariable, generate(newvariable)
tab q1a, gen(region)
This creates 6 new variables:
region1 = 1 if q1a==1 and 0 otherwise, ...,
region6 = 1 if q1a==8 and 0 otherwise
egen : This is an extended version of “generate” [extended
generate] to create a new variable by aggregating the
existing data.
The syntax is:
egen newvar=fcn(arguments) [if exp] [in range] , by(var)
CREATING NEW VARIABLES
Functions include:
mean()   mean
median() median
max()    maximum
min()    minimum
sd()     standard deviation
std()    standardize variables
sum()    sums
egen average = mean(cons): creates variable of average
consumption over entire sample
egen median= median(cons), by(sex): creates variable of median
consumption for each sex
egen regav = mean(cons), by(region): creates variable of mean
consumption for each region
egen avecon = mean(cons), by(q1c)
gen highavecon = (cons > avecon)
CREATING NEW VARIABLES
Some operators used in Stata
Arithmetic: + addition, - subtraction, * multiplication, / division, ^ power
Relational: > greater than, >= greater than or equal, < less than, <= less than or equal, == equal, ~= or != not equal
Logical: ~ not, & and, | or
The Variables Manager is a tool for managing properties
of variables both individually and in groups.
It can be used to create variable and value labels, rename
variables, change display formats, and manage notes.
It has the ability to filter and group variables as well as to
create variable lists.
The syntax for labeling a variable is: label variable var1 "description"
The various levels of a categorical variable can be
labeled using the following two Stata commands together:
label define and label values.
Example: gender has two categories, 1 for male and 2 for
female. gender can be labeled as: label define gender 1
"male" 2 "female"
label values gender gender
MODIFYING VARIABLES
We begin with an explanation of how to label data in Stata.
Then see how to format variables.
◦ rename variable
◦ label variable
◦ Keep/ drop and order/sort
◦ label define/values
rename: This command is used to give an existing variable a new
name. The command is
rename old_variable new_variable
Example: Generate dummies for the region variable and rename
the new dummy variables (see the sketch after this slide)
label variable: this helps us give a short description of the
variable. Command: label variable yield "output per hectare"
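A hedged sketch of that exercise, reusing the region dummies and region labels from earlier slides (the new names are assumptions chosen for illustration):
tab q1a, gen(region)                        // creates region1, region2, ... dummies
rename region1 tigray                       // give the first dummy a descriptive name
label variable tigray "household lives in Tigray"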
MODIFYING VARIABLES
We can subset data by keeping or dropping variables, or by
keeping or dropping observations
◦ keep and drop variables
The keep command keeps the variables in the list and drops
all the other variables
The drop command deletes the variables in the list and keeps
all the other variables
◦ keep and drop observations
The keep if command keeps observations if a condition is
met, and drop if does the opposite.
If there are many variables to drop and only a few to keep, then
apply keep
However, if there are many variables to keep and only a few to
drop, use drop
MODIFYING VARIABLES
Examples
◦ drop pwhole_mixed pretail_mixed
◦ keep pwhole_white pretail_white pwhole_red pretail_red
Note: The two commands are the same in this case
Sort: This command arranges the observations of the current
data into ascending order based on the values of the variables
listed
Variable ordering: This command helps us to organize
variables in a way that makes sense by changing the order of
the variables
order x y z: Puts x first y second z third
sort x : Puts data in ascending order of the variable x
Appending datasets
Often we don't have all the information that we need in one
dataset, and we have to append or merge two or more
datasets into one.
There are several types of "appending" and "merging" of
datasets.
As long as the variables in the files are the same and the only
thing you need to do is to add observations, this is a vertical
combination.
For this we use the append command.
Appending datasets
Appending data files
◦ concatenates two datasets, that is, stick them together
vertically, one after another
use "$final\tprice_addis.dta", clear
append using "$final\tprice_dire.dta“
save "$final\tprice_all.dta", replace
◦ The append command does not require that the two
datasets contain the same variables.
◦ But it is highly recommended to use an identical list of
variables for the append command, to avoid missing
values coming from one of the datasets
Defining Variables
label define: This command gives a name to a set of value
labels. For example, instead of numbering the regions, we can
assign a label to each region. The syntax is:
label define lblname # "label" # "label" # “label“ [, add modify]
Where: lblname is the name given to the set of value labels
◦ # are the value numbers
◦ “label”are the value labels
◦ add means add these value labels to the existing set
◦ modify means to change these values in the existing set
Defining Variables
Note that:
You can use the abbreviation “label def“
The double quotation marks are only necessary if there are
spaces in the labels
Stata will not let you define an existing label unless you say
“modify” or “add“
label values
◦ This command attaches named set of value labels to a
categorical variable.
The syntax is:
label values varname [lblname] [, nofix]
label define reg 1"Tigray" 3"Amhara" 4"Oromia" 7"SNNP",modify
label values q1a reg
Merging and appending datasets
If the identifying variable
which appears in the files is
unique in both files, then it's
a one-to-one match.
Unique means that for each
value of this variable, there
is only one observation that
contains it.
For example, suppose country is
the identifying variable.
In both datasets, each country
has only one observation.
Merging and appending datasets
One-to-one match merging
The merge command sticks two datasets horizontally, one
next to the other. Before any merge, both datasets must be
sorted by identical merge variable
. use p2sec9a.dta, clear
. sort hhid item1234
. save consumption.dta, replace
. use p_r5, clear
. sort hhid item1234
. save comprice.dta, replace
. use consumption.dta, clear
. merge hhid item1234 using comprice
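A hedged note: in Stata 11 and newer, the same one-to-one match can be written with the explicit merge 1:1 syntax, which does not require sorting beforehand; a minimal sketch using the same file names:
use consumption.dta, clear
merge 1:1 hhid item1234 using comprice.dta
tab _merge        // shows how many observations matched in both files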
Merging and appending datasets
One-to-many
matching
◦ If the identifying
variable is unique in
one file, but not
unique in the other,
then it's a one-to-
many matching.
Collapsing data sets
Collapse
◦ Sometimes we have data files that need to be
aggregated at a higher level to be useful for us.
◦ For example, we have household data but we are really
interested in regional data.
◦ The collapse command serves this purpose by
converting the dataset in memory into a dataset of
means, sums, medians and percentiles
For instance, we would like to see the mean cons in each
q1a and sex of hh head.
collapse (mean) cons, by(q1a sex)
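A hedged extension of that example, collapsing to one record per region with several statistics at once (the new variable names are assumptions chosen for illustration):
collapse (mean) meancons=cons (median) medcons=cons (sum) totalfood=food, by(q1a)
list in 1/5        // the dataset in memory now has one observation per region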
Additional Stata Resources
Don't forget to get help for command-specific searches:
◦ help
◦ search
◦ hsearch
◦ net search
http://stataproject.blogspot.com.
http://www.stata.com/
http://www.stata.com/statalist/
Additional Stata Resources
Statalist is
◦ an email listserver
◦ hosted at the Harvard School of Public Health,
◦ where Stata users, from experts writing Stata programs to
users like us,
◦ maintain a lively dialogue about all things statistics and
Stata.
3. Data Analysis Using Stata ©
Describing Data with Summary
Statistics
Applying Some Statistical Tests in
Stata
Describing Data with Graphs
Exercises
3.1. Basic Descriptive Statistics Using Stata
• summarize
– The summarize command produces statistics on continuous
variables like age, food, cons, and hhsize.
– The syntax looks like this:
summarize [varlist] [if exp] [in range] [, [detail]]
By default, it produces the following statistics:
• Number of observations
• Average (or mean)
• Standard deviation
• Minimum
• Maximum
Basic Descriptive Statistics
Using Stata
If you specify "detail", Stata gives you additional
statistics, such as
• skewness,
• kurtosis,
• the four smallest values
• the four largest values
• various percentiles.
mean = expected value (expectation) of Y = E(Y) = μY = long-
run average value of Y over many repeated occurrences of Y
variance = E[(Y – μY)^2] = σY^2 = measure of the squared spread of the
distribution around its mean
standard deviation = square root of the variance = σY
Basic Descriptive Statistics using Stata
skewness = E[(Y – μY)^3] / σY^3
= measure of asymmetry (lack of symmetry) of a distribution
skewness = 0: distribution is symmetric
skewness > (<) 0: distribution has a long right (left) tail
Skewness mathematically describes how much a distribution
deviates from symmetry
kurtosis = E[(Y – μY)^4] / σY^4
= measure of mass in the tails = measure of the probability of large values
kurtosis = 3: normal distribution
kurtosis > 3: heavy tails ("leptokurtotic")
Basic Descriptive Statistics Using Stata
Here are some examples:
summarize: gives statistics on all variables
summarize hhsize food: gives statistics on selected variables
summarize hhsize, detail
summarize hhsize cons if q1a==3: gives statistics on two
variables for one region
by: This prefix goes before a command and asks Stata to
repeat the command for each value of a variable.
The general syntax is: by varlist: command
Note: the bysort command is most commonly used because it combines
sorting and by in one step. An example of the by prefix is:
bysort sex: sum rconsae   for each sex of household head, gives statistics on real
per capita consumption.
Basic Descriptive Statistics
Using Stata
Tabulate, tab1, tab2
◦ These are three related commands that produce frequency
tables for discrete variables.
◦ They can produce one-way frequency tables (tables with
the frequency of one variable) or two-way frequency tables
(tables with a row variable and a column variable).
tabulate / tab: produces a frequency table for one or two
variables
tab1: produces a one-way frequency table for each variable
in the variable list
tab2: produces all possible two-variable tables from the list
of variables
Basic Descriptive Statistics Using Stata
You can use several options with these commands:
• cell: gives the overall percentage for two-way tables
• column: gives column percentages for two-way tables
• row: gives row percentages for two-way tables
There are many other options, including other statistical tests.
For more information, type “help tabulate”
Some examples of the tabulate commands are:
tabulate q1a: produces table of frequency by region
tabulate q1a sexh: produces a cross-tab of frequencies by region and sex of head
tab q1a sexh
tab1 q1a sexh: produces two tables, a one-way frequency table for each variable
tab2 q1a sexh
tab2 q1a poor
tab2 q1a sexh, cell
tab2 q1a sexh, row
tab2 q1a sexh, column
Statistical Tests
ttest command
We would like to see if the mean of hhsize equals to 6 by using
single sample t-test, testing whether the sample was drawn from a
population with a mean of 6. ttest command is used for this
purpose: ttest hhsize=6
We may also want to test whether the mean of cons equals the mean of food (a paired t-test).
ttest cons=food
ttest command for independent groups with pooled (equal)
variance: ttest cons, by(sexh)
ttest command for independent groups using unequal variance:
ttest cons, by(sexh) unequal
STATISTICAL TESTS
correlate command
◦ The correlate command displays a matrix of Pearson correlations
for the variable listed. E.g correlate cons hhsize
Correlation vs Causation
Two variables can be correlated without one being the cause of the other
corr(X, Z) = cov(X, Z) / sqrt(var(X) var(Z)) = σXZ / (σX σZ) = rXZ
• –1 ≤ corr(X,Z) ≤ 1
• corr(X,Z) = 1 means perfect positive linear association
• corr(X,Z) = –1 means perfect negative linear association
• corr(X,Z) = 0 means no linear association
• The correlation coefficient is unitless, so it avoids the units problem of the
covariance.
• corr(X,Z) when X and Z are measured in feet is the same as when they are
measured in meters or pounds
PRESENTING DATA WITH GRAPH
The Stata graph commands begin with the word graph (in some
cases this is optional).Examples:
◦ graph twoway scatterplots, line plots,
◦ graph bar bar charts
◦ graph pie pie charts
Examples
◦ graph twoway scatter cons food
We can show the regression line predicting cons from food using
the twoway lfit plot type.
◦ twoway lfit cons food
The two plots can be overlaid like this
◦ twoway (scatter cons hhsize) (lfit cons hhsize)
PRESENTING DATA WITH GRAPH
Labeling graphs
scatter var1 var2, title("title") subtitle ("subtitle") xtitle
("xtitle") ytitle ("ytitle") note("note")
Example
scatter ageh cons , title("title") subtitle("subtitle")
xtitle("xtitle") ytitle("ytitle") note("note")
Histograms and kernel density
◦ histogram cons
◦ histogram cons, normal
kernel density
◦ kdensity cons
◦ kdensity cons, normal
4. Regression Analysis Using Stata
Steps in Empirical Analysis
Structure of Economic Data
Regression Models
◦ Assumptions and their violations
Regression Analysis Using Stata
◦ Linear Models: Cross Section
◦ Linear Models: Panel Data
◦ Nonlinear Models: Cross Section
Reporting Regression Models
Steps in Empirical Analysis
Empirical Analysis
• An empirical analysis uses data to test a theory or to estimate a
relationship.
• First step in any empirical analysis is the careful formulation of
the question of interest.
• Literature review is an important step in any empirical analysis
• In some cases a formal economic model is constructed.
• An economic model consists of mathematical equations that
describe various relationships, e.g. y = f(x1, x2, ...)
• Formal economic modeling is the starting point for empirical
analysis, but it is more common to use economic theory less
formally, or even intuition
Steps in Empirical Analysis
• Then we need to turn the economic model into what we call
an econometric model: yi = β0 + β1 xi1 + β2 xi2 + ... + εi
• The form of the function must be specified before we can
undertake an econometric analysis.
• We need to deal with variables that cannot reasonably be
observed.
• We must somehow account for the many factors that we
cannot even completely list
• Unobserved factors and error in measurement can be
accounted for using error term or disturbance term
• Once an econometric model has been specified, various
hypotheses of interest can be stated in terms of the unknown
parameters
Structure of Economic Data
Data Management
• Structure of Economic data
– Economic data sets come in a variety of types
– Some econometric methods can be applied with little or no
modification to many different kinds of data sets
– The special features of some data sets must be accounted for or
should be exploited
– We next describe the most important data structures encountered
in applied work
1. Cross-section
• Consists of a sample of individuals, households, firms, cities,
states, countries, or a variety of other units, taken at a given point
in time
• In a pure cross section analysis we would ignore any minor timing
differences in collecting the data
Structure of Economic Data
• An important feature of cross-sectional data is that we can
often assume that they have been obtained by random
sampling from the underlying population, which simplifies
most of the analysis
• But there could be violations of the random sample
assumptions
– Refusal to respond by some group of the respondents
– Sampling from units that are large relative to the
population (the population is not large enough to
reasonably assume the observations are independent draws)
• cross-sectional data is closely aligned with the applied
microeconomics fields, such as labor economics, state and
local public finance, industrial organization, urban
economics, demography, and health economics
Structure of Economic Data
2. Time-series
• A time series data set consists of observations on a variable or
several variables over time. Examples of time series data include
stock prices, money supply, consumer price index, gross domestic
product, annual homicide rates
• Because past events can influence future events and lags in
behavior are prevalent in the social sciences, time is an important
dimension in a time series data set
• The chronological ordering of observations in a time series conveys
potentially important information
• What makes time series more difficult to analyze than cross-
sectional data is the fact that economic observations can rarely, if
ever, be assumed to be independent across time
• Another feature of time series data that can require special attention
is the frequency at which the data are collected
Structure of Economic Data
3. Pooled cross-sections
• Some data sets have both cross-sectional and time series features
• Pooled cross-section is a combination of several cross-section data
that are collected from the same population in different time periods
• Pooling cross sections from different years is often an effective way
of analyzing the effects of new policies
• The idea is to collect data from the years before and after a key
policy change
4. Panel or longitudinal data
• A panel data (or longitudinal data) set consists of a time series for
each cross-sectional member in the data set
• Panel data can be collected on household, firms or geographical units
• The key feature of panel data that distinguishes it from a pooled cross
section is the fact that the same cross-sectional units (individuals,
firms, or counties) are followed over a given time period
Simple Linear Regression
In this case we only have one regressor and
a constant: yi = β0 + β1 xi + εi
The Gauss–Markov
Assumptions
There are assumptions about the error term εi
and the explanatory variables xi
The so-called Gauss–Markov assumptions are
A1: E(εi) = 0, i = 1, 2, ..., n
A2: ε1, ..., εn and x1, ..., xn are independent
A3: V(εi) = σ^2, i = 1, 2, ..., n
A4: cov(εi, εj) = 0 for all i, j = 1, 2, ..., n, i ≠ j
Additional Assumptions
The relationship of interest is linear
◦ Linearity is in parameter, not in variables
Data are stationary (pertinent for time series
data)
◦ Distribution is the same over time
Weak/covariance vs strong stationarity
Data is random
Survey design
Nature of data
Properties of the OLS
Estimator
Under assumptions (A1)–(A4), the OLS
estimator b for β has the following properties:
◦ Unbiasedness: E(b) = β
◦ The OLS estimator b of β is the best estimator, i.e.
among the set of linear unbiased estimators, the OLS estimator is
the one with the least variance
◦ b is a linear function of the explanatory and the
dependent variables
◦ Hence, b is BLUE for β
Multiple Regression Analysis
Multiple regression analysis is more
amenable to ceteris paribus analysis
It allows us to explicitly control for many
other factors which simultaneously affect
the dependent variable
yi = β0 + β1 xi1 + β2 xi2 + ... + βk xik + εi
Multiple regression models can
accommodate many explanatory variables
Violations of GM
Assumptions
A1: E(εi) = 0, i = 1, 2, ..., n
A2: ε1, ..., εn and x1, ..., xn are independent
A3: V(εi) = σ^2, i = 1, 2, ..., n
A4: cov(εi, εj) = 0 for all i, j = 1, 2, ..., n, i ≠ j
Additional
The relationship of interest is linear
Data are stationary
Data is random
Violations of GM Assumptions
GM assumptions can be violated for a
variety of reasons
Example 1: The assumption that there is
zero covariance between the error term and
one or more explanatory variables can be
violated due to:
◦ Omitted variables bias
◦ Measurement error
◦ Simultaneity
Example 2: The linearity assumption may fail
◦ Most behavioral relationships are nonlinear
◦ The structure of the data may require us to use
nonlinear models
Violations of GM Assumptions
Example 3: cross-section (household) data are
usually heteroskedastic
Example 4: time series data are usually
nonstationary
Example 5: there might be a selection problem
How to amend these violations
Omitted variables
◦ Instrumental variable
◦ Proxy variable
◦ Simultaneous equations models:2SLS
◦ Panel data
Nonlinear models
◦ Use discrete choice models
◦ Corner solution outcomes
Heteroskedasticity
◦ Use weighted least squares
◦ Use heteroskedasticity-robust standard errors
E.g. robust/White standard errors
How to amend these violations
Nonstationary time series
◦ Engle–Granger error-correction (EG-EC) model
◦ Johansen approach
Non random sample
◦ Selection models
E.g. Heckman Selection model
Remember that before trying to amend these
violations, we have to test their existence in a
given data set.
Simple Linear Regression Models: Cross
Section
General Format
regress depvar indvar if/in weights, options
The regress command performs OLS
regression and reports an analysis-of-variance
table, goodness-of-fit statistics, coefficient
estimates, standard errors, t statistics, p-values,
and confidence intervals
See the examples below
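A minimal sketch of a cross-section regression using the ERHS consumption file from earlier sections (the choice of covariates is an assumption made for illustration):
use "$final\ERHScons1999.dta", clear
regress cons hhsize ageh                    // basic OLS
regress cons hhsize ageh, robust            // with heteroskedasticity-robust standard errors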
Basic Format: Linear-Cross Section
◦ The xi prefix is used to dummy code categorical
variables, and we tag these variables with an “i.”
in front of each target variable
xi: regress cons hhsize i.q1a,
robust
◦ By default, Stata selects the first category in the
categorical variable as the reference category. If
we would like to declare a certain category as
reference category
char q1a[omit] 7
xi:regress cons hhsize i.q1a,
robust
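A hedged note: in Stata 11 and newer, factor-variable notation can replace the xi prefix; ib7.q1a sets region 7 as the base, mirroring the char q1a[omit] 7 example above:
regress cons hhsize i.q1a, robust           // first category of q1a as the base
regress cons hhsize ib7.q1a, robust         // region 7 as the reference category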
Basic Format: Linear-Panel
The basic format for linear panel-data models is:
xtreg depvar indvars [if/in] [weights], options
Two things to be noted before running panel-data
regression models:
◦ The dataset should be in long form (not the wide form,
which is the default after merging two or more
datasets). Use reshape long to convert it:
reshape long stubnames, i(identifier) j(timevariable)
◦ The panel structure should be declared with xtset (e.g.
xtset panelvar timevar) so that xtreg knows the panel and
time identifiers (a sketch follows)
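A minimal panel sketch following those notes (the file, identifier, and time-variable names are hypothetical):
use panel_wide.dta, clear
reshape long cons hhsize, i(hhid) j(year)   // wide variables cons1994, cons1999, ... become one cons variable plus a year variable
xtset hhid year                             // declare the panel structure
xtreg cons hhsize, fe                       // fixed-effects regression
xtreg cons hhsize, re                       // random-effects regression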
Reporting Regression Outputs
One can present regression outputs in the
format that we see in journals, articles, etc.
To do that
◦ Regress the models and store them separately
◦ Use estimates store to do this
◦ Combine the tables using estimates table
command
◦ See Examples
◦ If one would like to report coefficients of only
selected explanatory variables, use the
keep(varlist) option
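A hedged sketch of this workflow (the model names m1 and m2 are arbitrary labels; the covariates are assumptions carried over from the earlier examples):
regress cons hhsize, robust
estimates store m1
regress cons hhsize ageh, robust
estimates store m2
estimates table m1 m2, star stats(N r2) keep(hhsize ageh)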
Many Thanks for Your
Attention and Effort