Lec11-Stata Regression

The document discusses multiple linear regression analysis using Stata. It uses a dataset on school performance to predict the variable api00 based on acs_k3, meals, and full. Running the regression in Stata, it finds the model to be statistically significant with an R-squared of 0.67, indicating the variables explain a large portion of the variation in api00.

Uploaded by

acegi3476

Multiple linear regression

Steps for running a regression:

1. Examine descriptive statistics
2. Look at relationships graphically and test correlation(s)
3. Run and interpret the regression
4. Test the regression assumptions

If you need help with any command, just type:

help [command name]

Relational operators:
– Less than: <
– Greater than: >
– Less than or equal to: <=
– Greater than or equal to: >=
– Equals: ==
– Does not equal: !=

Logical operators, used to combine conditions:
– And: &
– Or: |
– Not: !

Load the example data with either of these commands:

sysuse auto
webuse auto

describe

If you want to learn more about the data file, you could list all or some of the observations. For
example, below we list the first five observations, and then three variables for the first ten.

list in 1/5
list make price mpg in 1/10

The in qualifier tells Stata the range of observations over which we want to apply the command.
list [variable name] in 5 lists the 5th observation of the variable
list [variable name] in 5/10 lists from the 5th to the 10th observation
list [variable name] in -3 lists the third-from-the-last observation
list [variable name] in 5/l lists from the 5th to the last observation (l is the letter L, for "last")

If we want the names of the cars whose weight is between 1000 and 2000 pounds:

list make if weight > 1000 & weight < 2000

What if we also wanted weight listed with their names?

list make weight if weight > 1000 & weight < 2000

If we want a list of cars and their miles per gallon (mpg) whose mpg is less than 20 or over 30:

list make mpg if mpg < 20 | mpg > 30

The next three commands use variables from the elemapi school data set (loaded later in these notes), not the auto data:

summarize acs_k3, detail
tabulate acs_k3
list snum dnum acs_k3 if acs_k3 < 0

To label variables: label variable [variable name] "comment"

This command lets you document your data set by attaching a short description
(at most 80 characters long) to a variable.

Using keep/drop to eliminate variables

keep make rep78 foreign mpg price

keep make price mpg

drop displ gear_ratio

Using keep if/drop if to eliminate observations

drop if missing(rep78)

keep if (rep78 <= 3)

Eliminating variables and/or observations with use

use make mpg price rep78 using auto

use auto if (rep78 <= 3)

use make mpg price rep78 using auto if (rep78 <= 3)

Let’s make a table of rep78 by foreign to look at the repair histories of the foreign and domestic
cars.

tabulate rep78 foreign

tabulate rep78 foreign if rep78 >=4

Let’s make the above table using the column and nofreq options. The column option requests column
percentages, while the nofreq option suppresses cell frequencies. Note that column and nofreq come after the
comma: they are options of the tabulate command, and options must be placed after a comma.

tabulate rep78 foreign if rep78 >=4, column nofreq

The use of if is not limited to the tabulate command. Here, we use it with the list command.

list if rep78 >= 4

If we wanted to include just the valid (non-missing) observations that are greater than or equal to 4, we
can do the following to tell Stata we want only observations where rep78 >= 4 and rep78 is not missing.

list if rep78 >= 4 & !missing(rep78)

list if rep78 >= 4 & rep78 != .

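
The reason the extra condition matters is that Stata treats a numeric missing value (.) as larger than any nonmissing number, so rep78 >= 4 on its own is also true for cars with no repair record. The sketch below is only an illustration using the auto data; it compares the counts instead of listing the rows.

```stata
* Sketch only: shows why ">= 4" alone also picks up missing values.
sysuse auto, clear

* Numeric missing (.) sorts above every nonmissing number,
* so this count includes cars whose rep78 is missing.
count if rep78 >= 4

* Excluding missing explicitly counts valid values only.
count if rep78 >= 4 & !missing(rep78)

* inrange() bounds the comparison from above and returns 0 for
* missing values, so it also leaves them out.
count if inrange(rep78, 4, 5)
```

If the first count is larger than the other two, the difference is the number of cars with a missing rep78.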
Additionally, we can use inrange() to designate a range of values. Here is a summary of price for the
values 3 through 5 in rep78, followed by an equivalent command that excludes missing values explicitly.

summarize
summarize, detail
summarize price if inrange(rep78,3,5)
summarize price if rep78 >= 3 & !missing(rep78)

Correlate: correlation and covariance between variables

correlate
(with no variable list, this command computes the correlation coefficient between all possible pairs of
variables in memory)

correlate mpg price weight
correlate mpg price weight, means
(the means option also displays summary statistics for the listed variables)

correlate var1 var2
(computes the correlation coefficient between the two variables specified)

correlate var1 var2, covariance
(computes the covariance between the two variables instead of the correlation coefficient)

Scatter Plot:

Scatter plot of two variables

scatter var1 var2, where var1 is the y-axis variable and var2 is the x-axis variable. The forms
plot var1 var2 and graph var1 var2 that appear in older notes are pre-Stata 8 syntax; in current
versions of Stata, use scatter (or twoway scatter) instead.

scatter api00 enroll

twoway (scatter price mpg) (lfit price mpg)
twoway (scatter price mpg) (lfit price mpg)(qfit price mpg)
twoway (scatter api00 enroll, mlabel(snum)) (lfit api00 enroll)

predict e, residual

To draw a histogram:
histogram acs_k3
histogram [varname], bin(#)
# is the number of bins (intervals) we want to specify. If bin() is omitted, Stata chooses the number of
bins from the sample size.

If you add the normal option at the end of the command

histogram [varname], bin(#) normal
a normal density curve is overlaid, so that you can compare whether your distribution looks like a
normal or is very different.

Note that the forms graph [varname], histogram bin(#) and graph [varname], bin(#) found in older notes
are Stata 7 syntax (where the default was 5 bins); current versions of Stata use the histogram command.

graph box acs_k3

stem acs_k3
stem full
tabulate full
tabulate dnum if full <= 1
count if dnum==401
graph matrix api00 acs_k3 meals full, half

histogram enroll
histogram enroll, normal bin(20)
histogram enroll, normal bin(20) xlabel(0(100)1600)

kdensity enroll, normal

graph box enroll
symplot enroll
qnorm api00
pnorm enroll

ladder enroll
gladder enroll
generate lenroll = log(enroll)
hist lenroll, normal

scatter api00 enroll

twoway (scatter api00 enroll) (lfit api00 enroll)
twoway (scatter api00 enroll, mlabel(snum)) (lfit api00 enroll)

REGRESSION ANALYSIS

You can do the regression analysis by

regress dep_var x1 x2 x3

reg price mpg

Regress with if:

* Before mpg 21
regress price mpg if mpg < 21

* At mpg 21 and after

regress price mpg if mpg >= 21

Generate and regression:

generate lninc=log(inc)
generate d=1 if price>3000
replace d=0 if d==.
The same result could be obtained in one step, because a logical expression evaluates to 1 (true) or 0 (false):
generate d=(price>3000)
generate mpg2 = mpg*mpg
generate lprice = log(price)
The multiple linear regression model is estimated by OLS with the regress command. For
example,

sysuse auto, clear
regress mpg weight displacement

regresses the mileage (mpg) of a car on weight and displacement. A constant is
automatically added unless it is suppressed by the option noconstant (noconst for short):

regress mpg weight displacement, noconst

Estimation based on a subsample is performed as

regress mpg weight displacement if weight > 3000

where only cars heavier than 3000 lb are considered. The Eicker-Huber-White
(heteroskedasticity-robust) covariance is reported with the option vce(robust), which can also be written robust

regress mpg weight displacement, vce(robust)

Change the confidence level:

Confidence intervals are reported at the 95% level by default. If you want to change the confidence level, use the level() option:

regress mpg weight displacement, level(99)

As an alternative, you could use the set level command before regress:

. set level 99

. regress mpg weight displacement

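F-tests for one or more restrictions are calculated with the post-estimation command test.
For example,

test weight displacement

tests H0: β1 = 0 and β2 = 0 against HA: β1 ≠ 0 or β2 ≠ 0.

A two-sample t-test of mean mpg across the two groups of foreign is run with

ttest mpg, by(foreign)

New variables with residuals and fitted values are generated by

predict uhat if e(sample), resid
predict mpghat if e(sample)

(The fitted values are named mpghat here because mpg is the dependent variable of the regression above.)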

VIF & Tolerances. Use the vif command to get the variance inflation factors (VIFs) and the tolerances (1/VIF).

vif is one of many post-estimation commands. You run it AFTER running a regression. It uses information Stata has stored
internally.

. vif
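
For reference, the two quantities that vif prints are usually defined as below. The symbol R_j^2 is notation introduced here, not taken from the notes: it stands for the R-squared from an auxiliary regression of predictor j on all the other predictors in the model.

```latex
\[
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\text{tolerance}_j = \frac{1}{\mathrm{VIF}_j} = 1 - R_j^2 .
\]
```

A predictor that is nearly a linear combination of the others has R_j^2 close to 1, and therefore a large VIF and a tolerance close to 0. A common rule of thumb treats a VIF above about 10 as worth investigating, but that threshold is a convention, not a formal test.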

Let’s use a different data set this time:

sysuse census

What happened when you tried this?

– If the data in memory has unsaved changes, Stata refuses to load the new file. You need to clear the
data first if you are moving between different data sets.

» Either use clear and then sysuse census...
» Or type sysuse census, clear to do the same thing

generate – command that allows us to create new variables

– We want to know the share of the adult (18 years and older) population for each state:

generate adultpop = pop18p / pop

– What we did was create the adult share (a proportion between 0 and 1) by dividing the population of adults by the total population.

Quick review: what is the average adult population in our sample?

sum pop18p

Can you create a variable named child that shows the total population of children (0-17 years old)?

generate child = poplt5 + pop5_17

Let’s create a new variable named above that flags states that are above the average adult population
share (taken as 0.71 here; summarize adultpop shows the mean). How?

generate above = 1 if adultpop > 0.71

This leaves above missing (.) for every other state, and we want those states coded 0. How do you think we would do this?

– generate above = 0 if above == . (“if above equals missing”) does not work:
you can only use generate once on a variable

• replace – command that changes existing variables

• So in this example, you would need to do

replace above = 0 if above == .

• replace works very similarly to generate in terms of calculations

– We want to change adultpop from a decimal to a percent (so 0.71 would read 71.0 instead).
How would we do that?

replace adultpop = adultpop * 100
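
The generate-then-replace pattern above can also be collapsed into one line. The following is a minimal sketch, assuming a fresh copy of the census data and the same 0.71 cutoff used in the notes; it is an alternative, not the approach the lecture itself takes.

```stata
* Sketch: build the 0/1 indicator in a single generate.
sysuse census, clear
generate adultpop = pop18p / pop

* A logical expression evaluates to 1 (true) or 0 (false).
* The if qualifier keeps a missing share missing; without it a
* missing adultpop would be coded 1, since missing sorts high.
generate above = (adultpop > 0.71) if !missing(adultpop)

* Check the result, showing any missing values as well.
tabulate above, missing
```

The one-line form is shorter, while the two-step form makes the treatment of each group explicit, which can be easier to read for beginners.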

Regression with Stata

– Simple and Multiple Regression

use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi

save elemapi
(saves the data in memory as elemapi.dta in the working directory)

use elemapi
(opens elemapi.dta again later)

1.1 A First Regression Analysis

Let’s dive right in and perform a regression analysis using the variables api00, acs_k3, meals and full.
These measure the academic performance of the school (api00), the average class size in kindergarten
through 3rd grade (acs_k3), the percentage of students receiving free meals (meals) – which is an
indicator of poverty, and the percentage of teachers who have full teaching credentials (full). We expect
that better academic performance would be associated with lower class size, fewer students receiving free
meals, and a higher percentage of teachers having full teaching credentials. Below, we show the Stata
command for testing this regression model followed by the Stata output.

regress api00 acs_k3 meals full

      Source |       SS       df       MS              Number of obs =     313
-------------+------------------------------           F(  3,   309) =  213.41
       Model |  2634884.26     3  878294.754           Prob > F      =  0.0000
    Residual |  1271713.21   309  4115.57673           R-squared     =  0.6745
-------------+------------------------------           Adj R-squared =  0.6713
       Total |  3906597.47   312  12521.1457           Root MSE      =  64.153

------------------------------------------------------------------------------
       api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      acs_k3 |  -2.681508   1.393991    -1.92   0.055    -5.424424    .0614073
       meals |  -3.702419   .1540256   -24.04   0.000    -4.005491   -3.399348
        full |   .1086104    .090719     1.20   0.232    -.0698947    .2871154
       _cons |   906.7392   28.26505    32.08   0.000     851.1228    962.3555
------------------------------------------------------------------------------

Let’s focus on the three predictors, whether they are statistically significant and, if so, the
direction of the relationship. The average class size (acs_k3, b=-2.68), is not statistically
significant at the 0.05 level (p=0.055), but only just so. The coefficient is negative which would
indicate that larger class size is related to lower academic performance — which is what we
would expect. Next, the effect of meals (b=-3.70, p=.000) is significant and its coefficient is
negative, indicating that the greater the proportion of students receiving free meals, the lower the
academic performance. Please note that we are not saying that free meals are causing lower
academic performance. The meals variable is highly related to income level and functions more
as a proxy for poverty. Thus, higher levels of poverty are associated with lower academic
performance. This result also makes sense. Finally, the percentage of teachers with full
credentials (full, b=0.11, p=.232) seems to be unrelated to academic performance. This would
seem to indicate that the percentage of teachers with full credentials is not an important factor in
predicting academic performance — this result was somewhat unexpected.
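
The summary statistics in the upper part of the output can be reproduced by hand from the sums of squares and degrees of freedom printed there, which is a useful check on how to read the table. The arithmetic below uses only numbers from the output above; the formulas are the standard OLS definitions, with n = 313 observations and k = 3 predictors.

```latex
\[
R^2 = \frac{SS_{\text{Model}}}{SS_{\text{Total}}}
    = \frac{2634884.26}{3906597.47} \approx 0.6745
\]
\[
F(3, 309) = \frac{MS_{\text{Model}}}{MS_{\text{Residual}}}
          = \frac{878294.754}{4115.57673} \approx 213.41
\]
\[
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-k-1}
          = 1 - 0.3255 \times \frac{312}{309} \approx 0.6713
\]
\[
\text{Root MSE} = \sqrt{MS_{\text{Residual}}} = \sqrt{4115.57673} \approx 64.153
\]
\[
t_{\text{acs\_k3}} = \frac{b}{\mathrm{s.e.}(b)}
                   = \frac{-2.681508}{1.393991} \approx -1.92
\]
```

Each t statistic in the lower part of the table is the coefficient divided by its standard error, as in the last line for acs_k3.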

Should we take these results and write them up for publication? From these results, we would
conclude that lower class sizes are related to higher performance, that fewer students receiving
free meals is associated with higher performance, and that the percentage of teachers with full
credentials was not related to academic performance in the schools. Before we write this up for
publication, we should do a number of checks to make sure we can firmly stand behind these
results. We start by getting more familiar with the data file, doing preliminary data checking,
looking for errors in the data.

1.2 Examining data

First, let’s use the describe command to learn more about this data file. We can verify how many
observations it has and see the names of the variables it contains. To do this, we simply type

describe

We will not go into all of the details of this output. Note that there are 400 observations and 21
variables. We have variables about academic performance in 2000 and 1999 and the change in
performance, api00, api99 and growth respectively. We also have various characteristics of the
schools, e.g., class size, parents' education, percent of teachers with full and emergency
credentials, and number of students. Note that when we did our original regression analysis it
said that there were 313 observations, but the describe command indicates that we have 400
observations in the data file.
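
The gap between 400 and 313 arises because regress uses only observations with no missing value in the outcome or in any predictor. The sketch below shows one way to confirm this, assuming the elemapi data from above is in memory. No expected counts are given here, since the point is to check them yourself.

```stata
* Sketch, assuming the elemapi data is already in memory.
* missing() with several arguments is true if any of them is missing,
* so this counts the observations regress would drop.
count if missing(api00, acs_k3, meals, full)

* Missing-value counts for each variable that has any.
misstable summarize api00 acs_k3 meals full

* After regress, e(sample) marks the observations actually used.
regress api00 acs_k3 meals full
count if e(sample)
```

The last count should match the Number of obs reported in the regression header.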

If you want to learn more about the data file, you could list all or some of the observations. For
example, below we list the first five observations.

list in 1/5

Another useful tool for learning about your variables is the codebook command. Let’s do codebook for the
variables we included in the regression analysis, as well as the variable yr_rnd.

codebook api00 acs_k3 meals full yr_rnd

summarize api00 acs_k3 meals full

summarize acs_k3, detail
tabulate acs_k3
list snum dnum acs_k3 if acs_k3 < 0
list dnum snum api00 acs_k3 meals full if dnum == 140

regress api00 acs_k3 meals full

Another dataset:

use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2

regress api00 acs_k3 meals full

save elemapi2

* Simple linear regression
regress api00 enroll

predict e, residual

clear
use https://stats.idre.ucla.edu/stat/stata/webbooks/reg/elemapi2
regress api00 ell meals yr_rnd mobility acs_k3 acs_46 full emer enroll, beta

correlate api00 ell meals yr_rnd mobility acs_k3 acs_46 full emer enroll
pwcorr api00 ell meals yr_rnd mobility acs_k3 acs_46 full emer enroll, obs sig
