Stata Task
1. We would like you to save the log file and the do file for each section separately. Kindly send all your
log and do files in a zipped folder.
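A minimal sketch of how a per-section log could be opened and closed inside a do file. The file name is a hypothetical placeholder, not a required naming scheme.

```stata
* Hypothetical file name; use one log (and one do file) per section
capture log close
log using "data_structure.log", replace text

* ... commands for this section go here ...

log close
```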
Data Cleaning:
Data Structure:
1. Describe the level of the data and report the number of missing values in “q406”.
2. Create a new variable “age_district” showing the average age per district from “q406”. When
calculating the average age per district, ignore values greater than 64 in “q406”.
3. Make a bar chart showing the distribution of age brackets by gender, aggregated at the district level.
4. Create a new variable “age_brac_district” showing the percentage of individuals falling in each
age bracket, using the “age_brac” variable created in question 4, section 1.
5. Reshape the data so that each row represents one district and there are only
four columns: DistrictName and the % falling in the 14-29, 30-45 and 46-64 age brackets.
6. Using a Stata command, make a table in Excel/Word containing the district names and the average age per
district.
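Some of the steps above could be approached along the following lines. This is only a sketch under stated assumptions: the data are already in memory, `district` is a hypothetical name for the district identifier, “age_brac” is assumed to be a numeric variable with three categories, and the output file name is a placeholder.

```stata
* Item 1: number of missing values in q406
count if missing(q406)

* Item 2: district-level mean age, ignoring ages above 64.
* Missing q406 is also excluded, because Stata treats missing
* as larger than any number, so q406 <= 64 is false for it.
egen age_district = mean(cond(q406 <= 64, q406, .)), by(district)

* Item 6: one row per district, exported to Excel
preserve
collapse (mean) age_district, by(district)
export excel using "age_by_district.xlsx", firstrow(variables) replace
restore

* Item 5: percentage in each age bracket, one row per district
preserve
keep if !missing(age_brac)
contract district age_brac              // _freq = count per district-bracket cell
bysort district: egen total = total(_freq)
gen pct = 100 * _freq / total
drop _freq total
reshape wide pct, i(district) j(age_brac)
restore
```

The `preserve`/`restore` pairs keep the individual-level data intact while the district-level tables are built.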
Regression Analysis:
The Mincer earnings function is a single-equation model that explains wage income as a function of
schooling and experience, named after Jacob Mincer. The equation has been examined on many datasets
and Thomas Lemieux argues it is "one of the most widely used models in empirical economics".
Typically the logarithm of earnings is modelled as the sum of determinants of wage.
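For reference, the textbook form of the Mincer equation is usually written as below, where $w$ is earnings, $s$ is years of schooling, $x$ is years of potential labor market experience and $\varepsilon$ is the error term. The specification requested in this task uses a different set of regressors, so this is only the standard starting point.

```latex
\ln w = \ln w_0 + \rho s + \beta_1 x + \beta_2 x^2 + \varepsilon
```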
The dataset you prepared and merged is the Labor Force Survey for the years 2009-10 and 2010-11. It contains
the wage earned in the last year and other determinants such as the sex of the member; age of the
member; years of education; marital status; number of years living in the district; whether the person has
obtained any professional training; principal professional activities in the last year; and any subsidiary
occupation.
We want to see which of these variables determine the wage of individuals.
Regression Specification
1) Please write the regression specification for the above variables, keeping the log of wage earned in the
previous year as the dependent variable and the other determinants as the independent variables.
Data Setup
2b) Recode the variables for gender, training attended and additional subsidiary work so that female
gender, training not attended and no additional subsidiary work are coded as 0 instead of 2.
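A minimal sketch of the recode, assuming the three variables are named `gender`, `training` and `subsidiary` (hypothetical names) and that each is currently coded 1/2:

```stata
* Hypothetical variable names; 2 becomes 0, all other values are left unchanged
recode gender training subsidiary (2 = 0)

* Quick check that only 0 and 1 (and any missing values) remain
tabulate gender, missing
```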
Regression Analysis
3a) Run the regression in Stata and export the regression results in Word format.
3b) Interpret the regression results in 3a and explain which variables are important for determining wage.
Are there any counterintuitive results?
3c) The variables for marital status, number of years living in the district and principal activities in the last 12
months are categorical variables. We want to see the effect of each category of these
variables on wage. Rerun the regression with these categories, taking never married as the base for marital status,
more than 10 years as the base for number of years, and not in labor
force as the base for activities in the last 12 months.
Hint: use the xi command in Stata and check its help manual for how to use categorical variables in
regression.
3di) Rerun specification 3c with robust standard errors clustered at the district level.
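The xi hint and the clustering request could be sketched as follows. Every variable name and category code here is a hypothetical placeholder to be replaced with the names and codes in your merged dataset.

```stata
* Sketch only: all variable names and category codes are assumptions
gen ln_wage = ln(wage) if wage > 0

* Tell xi which category to omit (the base) for each categorical variable
char marital[omit] 1          // assumed code for "never married"
char years_district[omit] 4   // assumed code for "more than 10 years"
char activity[omit] 9         // assumed code for "not in labor force"

* 3c: xi expands each i.varname into one dummy per non-base category
xi: regress ln_wage gender age educ_years training subsidiary ///
    i.marital i.years_district i.activity

* 3di: same specification, standard errors clustered by district
xi: regress ln_wage gender age educ_years training subsidiary ///
    i.marital i.years_district i.activity, vce(cluster district)
```

Specifying `vce(cluster district)` already gives heteroskedasticity-robust standard errors, so no separate `robust` option is needed.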
Sampling
Draw a sample of 3,000 observations such that the number of observations from each district in the sample is
proportional to that district's share of observations in the original sample.
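One possible approach is to draw the same sampling fraction within every district, which preserves each district's share. The sketch below assumes a district identifier named `district` (hypothetical) and an arbitrary seed. Because of rounding within districts, the total may differ slightly from 3,000.

```stata
* Hypothetical variable name and seed
set seed 12345

* Percentage of the full data that 3,000 observations represent
local pct = 100 * 3000 / _N

* Draw that percentage within each district
sample `pct', by(district)

* Check the resulting sample size
count
```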