Lecture 1-2 Applied Econometrics
Fozan Fareed
Email ID: [email protected]
Key Objectives of the Course
• Familiarize students with the basic features of Stata and apply econometrics using real data sets
Data Management
OLS Regression
Tackling Endogeneity
1.1 : Back to Basics: Learning by Doing
• Please open the data ‘Lecture 1 HDI’
• The Do-file Editor creates do-files: Write your commands in the do-file and
execute it by clicking on the Execute icon
• Using do-files is essential for reproducing your work later! It is also much more
convenient if you want to modify some commands. Always keep track of your
work in a do-file.
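As a minimal sketch of what a do-file might contain, the lines below load a dataset and inspect it. The example uses auto.dta, a dataset shipped with Stata, purely as a stand-in (an assumption; in class you would open 'Lecture 1 HDI' instead), and the file name in the first comment is hypothetical.

```stata
* lecture1_sketch.do (hypothetical file name)
clear all
sysuse auto, clear     // built-in example data, used here as a stand-in
describe               // structure of the data: variables, types, labels
summarize price mpg    // summary statistics for two variables
```

Saving these lines in a do-file means the same steps can be re-run or modified later without retyping them.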
• Another way: you can also describe the data by opening the Data menu and clicking on Describe data
1.7: Variable Types in Data Editor
• When the Data Editor is open, you can see that:
• Columns represent variables, whereas the rows represent observations.
• Missing values: a period (.) for numeric variables and empty quotations "" for
string variables
1.8 : Changing Variable Type
• Numeric to non-numeric: tostring [var name], gen([new var name]) [force]
• Non-numeric to numeric:
• destring [var name], gen([new var name])
• If you have a text variable: encode [var name], gen([new var name])
• If you want to change the type and replace the existing variable
• destring [var name], replace
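The conversions above can be tried on a small toy dataset. This is only a sketch: every variable name and value below is hypothetical and not taken from the course data.

```stata
* Toy data: all names and values are hypothetical
clear
input str6 gdp_text str10 region_text year
"12.5" "Africa" 2015
"7.25" "Asia"   2016
"3"    "Europe" 2016
end

* String that holds numbers -> numeric variable
destring gdp_text, generate(gdp_num)

* Text categories -> labelled numeric codes (assigned in alphabetical order)
encode region_text, generate(region_code)

* Numeric -> string
tostring year, generate(year_text)

describe
list
```

Note that destring is meant for strings that contain numbers, while encode is meant for genuine text categories such as region names.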
1.9 : Labelling
• To ensure that the database is easily readable, it's important to add some extra
information to your dataset and variables
• Labelling Data: For a data label (that adds a label to your dataset) type label data
"Human Development Index Data" in the Command window & press Enter.
• Labelling Variable: For a variable label (that adds a label to a particular variable)
type label variable [the name of the variable] "[the label]"
• Stata Command: label var [variable name] "label": This adds information on a
specific variable
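A short sketch of both kinds of label on toy data follows. The variable name hdivalue appears elsewhere in these notes, but the values and the wording of the variable label are made up for illustration.

```stata
* Toy data: values are hypothetical
clear
input hdivalue
0.35
0.90
end

* Label for the whole dataset
label data "Human Development Index Data"

* Label for one variable (label text is an assumption)
label variable hdivalue "Human Development Index (0 to 1)"

describe
```

Both labels show up in the output of describe, which is why they make a dataset easier to read.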
1.10 : Renaming a Variable
• Let's say that you want to change the name of one or more variables
• How can you do that?
• Stata command: rename [old name] [new name]
• Example: the variable urban isn't precise enough, so give it a more descriptive name
Note: You can also reorder variables, placing one before or after another, with the order command
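Renaming and reordering might look like the sketch below. The new name urban_pop_share and all values are hypothetical; only the variable name urban comes from the example above.

```stata
* Toy data: values are hypothetical
clear
input urban GDP
55.2 100
80.1 250
end

* Give urban a more descriptive (hypothetical) name
rename urban urban_pop_share

* Move GDP so that it comes before the renamed variable
order GDP, before(urban_pop_share)

describe
```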
1.13 : Dropping Observations or Variables
• Drop variable(s): drop [variable(s) name(s)]
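A small sketch of dropping a variable and dropping observations, using made-up data. The condition used to drop observations is only one possible choice.

```stata
* Toy data: all names and values are hypothetical
clear
input id GDP trade
1 100 40
2 .   55
3 250 60
end

* Drop a variable
drop trade

* Drop observations that satisfy a condition (here: GDP is missing)
drop if missing(GDP)

list
```

Dropped data cannot be recovered unless the original file is reloaded, so it is safer to do this in a do-file than by hand.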
• Example: For this example we will create a GDPperCapita variable that provides
information about the economic performance of the country.
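One way such a variable might be created is sketched below. The formula (GDP divided by TotalPopulation) is an assumption, since the notes do not spell it out, and the values are made up.

```stata
* Toy data: values are hypothetical
clear
input GDP TotalPopulation
1000 50
300  10
end

* Assumed definition: GDP divided by total population
generate GDPperCapita = GDP / TotalPopulation
label variable GDPperCapita "GDP per capita (hypothetical units)"

list
```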
label define labelhdi 1 "Very Low Development Level" 2 "Low Development Level" 3 "High Development Level" 4 "Very High Development Level"
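Defining a value label does not attach it to anything by itself; a second command, label values, is needed. The sketch below assumes a numeric variable called development coded 1 to 4 (toy data).

```stata
* Toy data: one observation per category
clear
input development
1
2
3
4
end

* Define the label set, then attach it to the variable
label define labelhdi 1 "Very Low Development Level" ///
                      2 "Low Development Level"      ///
                      3 "High Development Level"     ///
                      4 "Very High Development Level"
label values development labelhdi

tabulate development
```

The /// at the end of a line continues a command onto the next line and works only inside a do-file.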
Example:
• collapse (sum) GDP trade TotalPopulation , by ( Region )
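To see what collapse does, the sketch below runs the same command on three made-up country rows. Note that collapse replaces the data in memory with the aggregated result.

```stata
* Toy data: values are hypothetical
clear
input str10 Region GDP trade TotalPopulation
"Africa" 10 4  100
"Africa" 20 6  200
"Asia"   50 30 500
end

* One row per Region, each variable summed within the Region
collapse (sum) GDP trade TotalPopulation, by(Region)

list
```

If the original data are still needed afterwards, wrapping the command in preserve and restore is a common way to get them back.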
1.20 : What does the Help Command do?
• Stata command: help [command]
• Tells you about using the right syntax for the command
• How to reach it via the Menu
• Description of what the command does
1.21 : Other Useful Commands
Commands Description Examples
• Step 2: Close the using dataset (after making sure that there is a variable common in both)
and open the master database data1- Celebrities.dta
• Now, we merge this master database with the using database mergeA.dta
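A self-contained sketch of a one-to-one merge follows. It uses two small made-up datasets saved as temporary files rather than the course files, and the key variable id is an assumption, since the notes do not name the common variable.

```stata
* Master data (hypothetical)
clear
input id str10 name
1 "Ana"
2 "Ben"
3 "Cleo"
end
tempfile master
save `master'

* Using data (hypothetical)
clear
input id age
1 34
2 51
4 28
end
tempfile using_data
save `using_data'

* Open the master data, then merge the using data on the common variable
use `master', clear
merge 1:1 id using `using_data'

tabulate _merge
list
```

The variable _merge created by Stata shows whether each observation came from the master data only, the using data only, or both. Because tempfile names are local macros, the block should be run in one go from a do-file.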
[Diagram: merge.dta (master database, currently open in Stata) and Data3- append.dta (using database)]
• Or in simpler terms:
Chapter 2
Descriptive Statistics
“Without data you are just another person with an opinion”
- W. Edwards Deming
2.1 : Summarizing the Data
• Describing the data tells us something about the structure of the data, but
it says little about the data themselves.
• The summarize command returns a table containing summary statistics for all the variables in
the dataset.
Note: You can also include conditions with the summarize command, e.g. summarize [varA] if [varB]==1 (Stata tests equality with a double equals sign)
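The sketch below shows summarize with and without a condition on toy data. The variable names lifeexp and region and all values are hypothetical.

```stata
* Toy data: values are hypothetical
clear
input lifeexp str10 region
50 "Africa"
60 "Africa"
80 "Europe"
end

* All observations
summarize lifeexp

* Only observations that satisfy the condition (note the double ==)
summarize lifeexp if region == "Africa"

* Results are stored and can be reused, e.g. the mean
display r(mean)
```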
2.2 : Variable Types
• Statistics and Econometrics deal with several kinds of data: variables can be
discrete, continuous or categorical.
• Discrete variable: a variable that takes values from a finite or countable set,
such as the number of students in a class or the number of cars in a parking lot.
• Categorical variable: a variable that can take one of a limited, fixed
number of possible values, such as sex or age group
2.3 : Frequency Tables
• If you want basic descriptive statistics on a categorical variable, use the command
tabulate, which gives one-way frequency tables
• You can also tabulate two categorical variables together to look at bivariate statistics
2.3 : Frequency Tables (Cont.)
• Making a two-way table using two categorical variables “Regions” & “Development Level”
                        Development Level
Region                    1     2     3     4   Total
Africa                    0     6    10    31      47
Asia                     18    14    16     4      52
Europe                   32     5     0     0      37
North & Central Ame..     4    14     4     1      23
Oceania                   2     5     1     2      10
South America             3     7     1     0      11
Total                    59    51    32    38     180
• But sometimes you are more interested in percentages. How do you get the above table in
percentages?
2.3 : Frequency Tables (Cont.)
• Making a two-way table using two categorical variables
• Stata Command : tab [variable 1] [variable 2], col
                        Development Level
Region                      1        2        3        4    Total
Africa                      0        6       10       31       47
                         0.00    11.76    31.25    81.58    26.11
Asia                       18       14       16        4       52
                        30.51    27.45    50.00    10.53    28.89
Europe                     32        5        0        0       37
                        54.24     9.80     0.00     0.00    20.56
North & Central Ame..       4       14        4        1       23
                         6.78    27.45    12.50     2.63    12.78
Oceania                     2        5        1        2       10
                         3.39     9.80     3.13     5.26     5.56
South America               3        7        1        0       11
                         5.08    13.73     3.13     0.00     6.11
Total                      59       51       32       38      180
                       100.00   100.00   100.00   100.00   100.00
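A compact sketch of the two-way table options on made-up data follows. The variable names region and development mirror the ones used in these notes, but the six observations are invented.

```stata
* Toy data: six hypothetical countries
clear
input str10 region development
"Africa" 4
"Africa" 4
"Africa" 3
"Europe" 1
"Europe" 1
"Asia"   3
end

* Counts only
tabulate region development

* Counts plus column percentages (each column sums to 100)
tabulate region development, column

* Row percentages only, without the counts
tabulate region development, row nofreq
```

Column percentages answer questions of the form "what share of category X is in region Y", while row percentages answer "what share of region Y falls in category X".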
1- What is the average “life expectancy” and average “expected years of schooling” for
Asian countries?
2- What percentage of countries with the lowest HDI level (category 4) are in South
America? [use the variable development]
3- Generate a new variable “Poorcountries” which takes the value “1” if countries have low
development level (category 3 & 4) and “0” otherwise
4- What percentage of countries in Africa fall under the “Poor countries” category?
2.3 : Frequency Tables (Cont.)
• If we are interested in the relationship between one categorical and one
quantitative variable, we can describe the quantitative variable for the
different subgroups of the categorical variable
• bysort [categorical var]: summarize [quantitative var], detail
• Ex: Is there a difference in GDP between poor and rich countries?
      Percentiles      Smallest
 1%     .2877007       .2029027
 5%     1.182875       .2877007
10%     2.528558       .5860755       Obs                 109
25%     33.54116       .7151036       Sum of Wgt.         109

50%     163.2309                      Mean           859.9123
                        Largest       Std. Dev.      2710.746
75%     485.1162       3740.232
90%     2029.069       4944.928       Variance        7348146
95%     2951.687       17662.27       Skewness       6.209043
99%     17662.27       21223.92       Kurtosis       43.64091

-> Poor = 2
• What can we say about the difference in GDP here?
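The group-wise summary described above can be sketched on toy data as follows. The coding of Poor (1 and 2) follows the output header shown, but which code stands for which group is not stated in the notes, and all GDP values are invented.

```stata
* Toy data: values are hypothetical
clear
input Poor GDP
1 900
1 1500
1 2100
2 10
2 30
2 50
end

* Detailed summary of GDP, one block of output per group
bysort Poor: summarize GDP, detail

* A more compact alternative: selected statistics side by side
tabstat GDP, by(Poor) statistics(mean median sd n)
```

With skewed variables such as GDP, comparing medians as well as means is usually more informative, which is why the detail option is useful here.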
Class Outline
1. Data Visualization
➢ Scatter Plots, Histogram, Pie charts etc.
➢ How not to put graphs in your research!
2. OLS Regression
➢ Main Assumptions
➢ How to tackle non-linearities
➢ Interpreting Regression Results
Data Visualization
3.1 : Graphics
• Go to the Graphics tab in order to prepare:
• Scatter Plots
• Pie Charts
• Histograms
• Bar Charts
• Others…
twoway (scatter hdivalue sdg3lifeexpectancyatbirthyears, mlabel(country)) ///
    if region=="South America", ///
    ytitle(Human Development Index) xtitle(Life expectancy (In Years)) ///
    title("HDI and Life Expectancy") subtitle("South American Countries") ///
    note("Source: United Nations Data 2016")
3.3 : Pie Chart
Stata Code: graph pie, over(Variable)
graph pie, over(region) pie(_all, explode) plabel(_all percent, format(%4.0g)) ///
    title("Regional Classification of Countries") note("Source: UN Data")
3.4 : Histogram
Stata code: histogram [variable name], [options]
histogram [variable], percent
• Correlation is NOT causation!!!
[Correlation matrix of GDP and hdivalue; only the entry for GDP with GDP, 1.0000, is recoverable]
1. Sometimes the relationship is non-linear, and this non-linearity is not captured by
the correlation coefficient.
2. The sample may include some extreme values (outliers) that strongly influence
the value of the correlation coefficient.
3. Spurious correlation
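The first limitation can be illustrated with a small constructed example. In the sketch below, y is determined entirely by x, yet the correlation coefficient comes out at (numerically) zero because the relationship is a symmetric parabola rather than a line. The data are artificial and the variable names are hypothetical.

```stata
* Artificial data: x runs from -10 to 10, y is an exact function of x
clear
set obs 21
generate x = _n - 11
generate y = x^2

* Linear correlation between x and y
correlate x y
display r(rho)
```

A scatter plot of y against x would reveal the relationship immediately, which is one reason to look at the data before relying on a single summary number.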