Computing New Variables Using Generate and Replace
● Get the mean and standard deviation of length, then use them to make z-scores of length.
- From ECOSTAT: z = (x - mean)/standard deviation
- summarize length
- generate zlength = (length - 187.93)/22.27
- summarize zlength
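The hard-coded mean and standard deviation above can be avoided by reading the values that summarize leaves behind in r(mean) and r(sd). A minimal sketch, assuming the auto data shipped with Stata; the variable name zlength2 is made up for this illustration:

```stata
sysuse auto, clear
* summarize stores its results in r(); quietly suppresses the table
quietly summarize length
* build the z-score from the stored mean and standard deviation
generate double zlength2 = (length - r(mean)) / r(sd)
* a z-score should have mean 0 and standard deviation 1 (up to rounding)
summarize zlength2
```

This avoids retyping rounded numbers, so the z-scores stay exact if the data change.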
● To break mpg down into three categories, first make a table of mpg:
- tabulate mpg
● Convert mpg into three categories to make it more readable.
- generate mpg3 = . → make a variable with missing values
- replace mpg3 = 1 if (mpg<=18)
- replace mpg3 = 2 if (mpg>=19) & (mpg<=23)
- replace mpg3 = 3 if (mpg>=24) & (mpg<.)
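The same three categories can also be produced in one step with recode and its generate() option, and the two versions cross-checked. This is only a sketch on the auto data; mpg3r is a name invented here. In recode, max refers to the largest nonmissing value, so a missing mpg stays missing, which is what the (mpg<.) condition guards against in the replace version.

```stata
sysuse auto, clear
* replace version, as in the notes
generate mpg3 = .
replace mpg3 = 1 if (mpg<=18)
replace mpg3 = 2 if (mpg>=19) & (mpg<=23)
replace mpg3 = 3 if (mpg>=24) & (mpg<.)
* one-step version: recode into a new variable, leaving mpg untouched
recode mpg (min/18=1) (19/23=2) (24/max=3), generate(mpg3r)
* the two codings should agree row by row
tabulate mpg3 mpg3r, missing
assert mpg3 == mpg3r
```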
Recodes with if
● Create a variable called mpgfd that assesses the mileage of the cars with respect to their origin (foreign or domestic).
- sort foreign
- by foreign: summarize mpg, detail
● The generate and recode commands below recode mpg into mpgfd, using a different cutoff for domestic and foreign cars.
- generate mpgfd = mpg
- recode mpgfd (min/18=0)(19/max=1) if foreign==0
- recode mpgfd (min/24=0)(25/max=1) if foreign==1
● Check using:
- by foreign: tabulate mpg mpgfd
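If the cutoffs in the recode commands are meant to split each group at its own median (a natural reading of the summarize, detail step, though the notes do not say so), the same kind of 0/1 variable can be built without typing the cutoffs. A hedged sketch on the auto data; medmpg and mpgfd2 are hypothetical names:

```stata
sysuse auto, clear
* median of mpg within each origin group
bysort foreign: egen medmpg = median(mpg)
* 1 if the car is at or above its group's median, 0 otherwise;
* the if qualifier keeps a missing mpg as missing
generate mpgfd2 = (mpg >= medmpg) if mpg < .
* compare the mileage category by origin
tabulate mpgfd2 foreign, column
```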
To save files:
Select all commands
Send to do-file editor
File name: last name_first name_LBYMET V24_DATE
LECTURE
EXERCISE
sysuse auto, clear
summarize weight
generate wei_kg=weight/2.2
summarize weight wei_kg
tabulate trunk
generate trunk4 = .
replace trunk4 = 1 if (trunk<=10)
replace trunk4 = 2 if (trunk>=11)&(trunk<=15)
replace trunk4 = 3 if (trunk>=16)&(trunk<=20)
replace trunk4 = 4 if (trunk>=21)&(trunk<.)
tabulate trunk trunk4
tabulate trunk4 foreign, column
generate trunk4a=trunk
recode trunk4a (min/10=1)(11/15=2)(16/20=3)(21/max=4)
tabulate trunk trunk4a
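Because trunk4 and trunk4a are meant to hold the same four categories, assert gives a quick check that the replace route and the recode route agree. A small self-contained sketch of that check (it repeats the exercise commands so it can run on its own):

```stata
sysuse auto, clear
generate trunk4 = .
replace trunk4 = 1 if (trunk<=10)
replace trunk4 = 2 if (trunk>=11) & (trunk<=15)
replace trunk4 = 3 if (trunk>=16) & (trunk<=20)
replace trunk4 = 4 if (trunk>=21) & (trunk<.)
generate trunk4a = trunk
recode trunk4a (min/10=1) (11/15=2) (16/20=3) (21/max=4)
* assert stops with an error if any observation disagrees
assert trunk4 == trunk4a
* count how many observations were left uncategorized
count if missing(trunk4)
```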
sort foreign
6/5/18
LABELING DATA
Use a file called autolab that does not have any labels
● Download data from https://stats.idre.ucla.edu/stat/stata/modules/autolab.dta
● Open in Stata
Use the describe command to verify that indeed this file does not have any labels
- describe
Use the label data command to add a label describing the data file. The label can be up to 80 characters long.
- label data "This file contains auto data for the year 1978"
The describe command shows that this label has been applied to the version that is currently in
memory.
- describe
Use the label variable command to assign labels to the variables rep78, price, mpg, and foreign.
- label variable rep78 "the repair record from 1978"
- label variable price "the price of the car in 1978"
- label variable mpg "the miles per gallon for the car"
- label variable foreign "the origin of the car, foreign or domestic"
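Labels can also be read back programmatically, which is handy for checking that a label command did what was intended. A minimal sketch, assuming the auto data shipped with Stata in place of the downloaded autolab file; note the straight double quotes, which Stata requires:

```stata
sysuse auto, clear
label data "This file contains auto data for the year 1978"
label variable mpg "the miles per gallon for the car"
* extended macro functions return the labels currently in memory
local dlab : data label
local vlab : variable label mpg
display "`dlab'"
display "`vlab'"
```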
Part 2
Use sysuse auto, clear to reload the auto data and discard the previous labels, then use describe to check.
You can use the keep and drop commands to subset variables
Suppose we want just make, mpg, and price. We can keep only those variables:
- keep make mpg price
If we issue the describe command again, we see that those are the only variables left.
- describe
To use the drop command, first clear out the data in memory and reload the auto data file.
- sysuse auto, clear
To get rid of variables displ and gear_ratio:
- drop displ gear_ratio
Use describe to check
- describe
Make the change permanent:
- save auto2
To replace an existing file with the same name:
- save auto2, replace
The drop command can also remove observations:
- drop if rep78==1
- tabulate rep78, missing
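keep and drop work on observations as well as variables, and describe leaves the variable and observation counts in r(k) and r(N). A short sketch on the auto data that avoids saving anything to disk:

```stata
sysuse auto, clear
* keep three variables plus rep78, which is needed for the next step
keep make mpg price rep78
describe
display "variables left: " r(k)
* drop observations (not variables) whose repair record is 1
drop if rep78 == 1
* the missing option shows that missing rep78 values were not dropped
tabulate rep78, missing
```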
PART 2
(Entire lecture in https://stats.idre.ucla.edu/stata/modules/working-across-variables-using-foreach/)
July 4, 2018
https://stats.idre.ucla.edu/stata/modules/graph8/intro/introduction-to-graphs-in-stata/
The slope of the fitted quadratic regression function price = b1 + b2*sqft^2 is d(price)/d(sqft) = 2*b2*sqft. Compute the slope at different values of x = sqft. We access the regression coefficient using _b[varname]. Calculating the slope at sqft = 2000, 4000, and 6000, we have:
- di "slope at 2000 = " 2*_b[sqft2]*2000
- di "slope at 4000 = " 2*_b[sqft2]*4000
- di "slope at 6000 = " 2*_b[sqft2]*6000
Using the same approach, we can see the predicted values from the estimated regression:
- di "predicted price at 2000 = " _b[_cons]+_b[sqft2]*2000^2
.
.
.
A more stylish and efficient approach is to use factor variables. We can estimate the quadratic function directly without creating a new variable.
- regress price c.sqft#c.sqft
Obtain fitted values:
- predict price2
With margins, not only are slopes computed correctly, but we are provided a standard error and interval estimate as well. For elasticities, use the eyex(*) option.
- margins, eyex(*) at(sqft=(2000 4000 6000))
The slopes and elasticities computed above are conditional because they are computed at specific values. To compute the average marginal effects or average elasticities, use the margins command without the at option.
- margins, eyex(*)
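To see that the hand calculation and the factor-variable approach agree, both can be run side by side. The lecture's housing data are not included in these notes, so this sketch substitutes the auto data, with weight standing in for sqft; weight2, slope_hand, and slope_fv are names invented for the illustration.

```stata
sysuse auto, clear
* hand-built squared term, as in the sqft2 approach
generate double weight2 = weight^2
quietly regress price weight2
scalar slope_hand = 2*_b[weight2]*3000
* factor-variable version: no new variable needed
quietly regress price c.weight#c.weight
scalar slope_fv = 2*_b[c.weight#c.weight]*3000
display "slope at 3000, by hand         = " slope_hand
display "slope at 3000, factor variable = " slope_fv
* margins reports the slope with a standard error and interval
margins, dydx(weight) at(weight=3000)
```

The by-hand version needs the derivative worked out in advance, whereas margins derives it from the factor-variable specification.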
A log-linear model
Using the same data, we will estimate a log-linear model. To obtain the fitted value of y, the most natural thing to do is compute the antilog.
Use the same data set. Get the detailed summary statistics and histogram of price.
- summarize price, detail
- histogram price, percent
Now generate the logarithm of price and plot its histogram.
- generate lprice = ln(price)
- histogram lprice, percent
Estimate the log-linear regression model:
- reg lprice sqft
The predicted values are obtained using
- predict lpricef, xb
- generate pricef = exp(lpricef)
The variable pricef is the predicted (or forecast) price. Plot the fitted curve.
- twoway (scatter price sqft) (line pricef sqft, sort lwidth(medthick))
In the log-linear model the slope is _b[sqft]*price and the elasticity is _b[sqft]*sqft, so we must calculate them at chosen values (here, prices of 100000 and 500000 and sqft of 2000 and 4000):
- di "slope at 100000 = " _b[sqft]*100000
- di "slope at 500000 = " _b[sqft]*500000
- di "elasticity at 2000 = " _b[sqft]*2000
- di "elasticity at 4000 = " _b[sqft]*4000
We can also compute the marginal effect at each fitted house price in the sample and average it:
- generate me = _b[sqft]*pricef
- summarize me
Similarly, the average elasticity is
- generate elas = _b[sqft]*sqft
- summarize elas
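The log-linear steps can be collected into one runnable sketch. As the housing data are not part of these notes, the auto data are used as a stand-in, with weight in the role of sqft (an assumption made only for illustration):

```stata
sysuse auto, clear
generate double lprice = ln(price)
quietly regress lprice weight
* fitted log price, then the antilog as the predicted price
predict double lpricef, xb
generate double pricef = exp(lpricef)
* marginal effect at each fitted price: coefficient times predicted price
generate double me = _b[weight]*pricef
* elasticity at each observation: coefficient times x
generate double elas = _b[weight]*weight
summarize me elas
```

The means reported by summarize are the average marginal effect and the average elasticity over the sample.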
LINK: https://www.stata.com/data/s4poe4/chap02.do
AUGUST 8 2018
https://stats.idre.ucla.edu/stata/webbooks/reg/chapter2/stata-webbooksregressionwith-statachapter-2-regression-diagnostics/