Stata Task

1. The document provides instructions to clean data from three datasets, perform analyses, and draw random samples. It involves merging datasets, cleaning variables, creating new variables, describing data, regression analysis, and sampling.
2. Regression analysis is conducted to determine wage determinants using the Mincer earnings function, with log wage as the dependent variable and education, experience and other individual characteristics as independent variables.
3. Further regressions are run to examine the effects of categorical variables such as marital status and occupation on wages. Standard errors are clustered at the district level to account for within-district correlation.

Uploaded by

Saad Raja

Instructions:

1. We would like you to save the log file and the do-file for each section separately. Kindly send all your
log files and do-files in a zipped folder.

Data Cleaning:

1. Make one dataset out of the three.
Hint: Merge the 200910 dataset first, then append the 201011 dataset.
2. Clean the “DistrictName” variable.
Hint: The same district should always have the same name.
3. Make a new variable “Dist_code” holding a unique code for each district.
4. Define and assign value labels for the districts cleaned in question 2.
5. Make a new variable “age_brac” from “q406” with three brackets: 1 (14-29), 2 (30-45) and 3
(46-64). This variable should be missing (.) if age is greater than 64.
6. Create a new variable “Pcode_new” holding the last four digits of “P_Code”.
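
Items 5 and 6 above could be sketched in Stata roughly as follows. This is only an illustration on invented values, not the survey data. It assumes “q406” is numeric and that “P_Code” is stored as a number; if it is stored as a string, a string function such as substr() would be needed instead. Leaving ages below 14 missing is also an assumption, since the task does not say how to treat them.

```stata
* Toy data standing in for the survey (all values are invented)
clear
input q406 long P_Code
14 12345678
29 12345678
30 22345679
45 32345680
46 42345681
64 52345682
70 62345683
end

* Item 5: three age brackets; ages above 64 stay missing (.)
* Assumption: ages below 14 are also left missing
generate byte age_brac = .
replace age_brac = 1 if inrange(q406, 14, 29)
replace age_brac = 2 if inrange(q406, 30, 45)
replace age_brac = 3 if inrange(q406, 46, 64)

* Item 6: last four digits of a numeric P_Code
* (leading zeros are not preserved in a numeric result)
generate int Pcode_new = mod(P_Code, 10000)

list
```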

Data Structure:

1. Describe the level of the data and report the number of missing values in “q406”.
2. Create a new variable “age_district” showing the average age per district from “q406”. When
calculating the average age per district, ignore values greater than 64 in “q406”.
3. Make a bar chart showing the distribution of age brackets by gender, aggregated at the district level.
4. Create a new variable “age_brac_district” showing the percentage of individuals falling in each
age bracket, using the “age_brac” variable created in question 5 of the Data Cleaning section.
5. Reshape the data so that each row represents one district and there are only four columns:
DistrictName and the % falling in the 14-29, 30-45 and 46-64 age brackets.
6. Using a Stata command, make a table in Excel or Word containing the district names and the
average age per district.
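
One possible way to approach items 2 and 5 above is sketched below on invented data. The district names “DistA” and “DistB”, the helper variables n, n_dist and pct, and the decision to drop observations without an age bracket are all assumptions made for illustration.

```stata
* Toy data (invented); DistA/DistB are placeholder district names
clear
input str8 DistrictName q406
"DistA" 20
"DistA" 40
"DistA" 70
"DistB" 25
"DistB" 50
"DistB" 60
end

* Brackets as in the Data Cleaning section; ages outside 14-64 stay missing
generate byte age_brac = 1 + (q406 >= 30) + (q406 >= 46) if inrange(q406, 14, 64)

* Item 2: district mean age, ignoring ages above 64
egen age_district = mean(cond(q406 <= 64, q406, .)), by(DistrictName)
list

* Item 5: one row per district, one percentage column per bracket
* (contract keeps only the listed variables plus the frequency count)
drop if missing(age_brac)
contract DistrictName age_brac, freq(n)
egen n_dist = total(n), by(DistrictName)
generate pct = 100 * n / n_dist
drop n n_dist
reshape wide pct, i(DistrictName) j(age_brac)
list
```

A missing value in pct1, pct2 or pct3 after the reshape means that no one in that district fell in the bracket; whether to show it as 0 is a presentation choice.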

Regression Analysis:

The Mincer earnings function is a single-equation model that explains wage income as a function of
schooling and experience, named after Jacob Mincer. The equation has been examined on many datasets
and Thomas Lemieux argues it is "one of the most widely used models in empirical economics".
Typically the logarithm of earnings is modelled as the sum of determinants of wage.
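
For reference, the textbook form of the Mincer earnings function can be written as below. The symbols are notation chosen here rather than taken from the task: w is earnings, S is years of schooling, X is years of labor-market experience, and the error term collects everything else.

```latex
\ln w_i = \beta_0 + \beta_1 S_i + \beta_2 X_i + \beta_3 X_i^2 + \varepsilon_i
```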

The dataset you prepared and merged is the Labor Force Survey for the years 200910 and 201011. It contains
wage earned in the last year and other determinants such as sex of the member; age of the
member; years of education; marital status; number of years living in the district; whether the person has
obtained any professional training; principal professional activities in the last year; and any subsidiary
occupation.
We want to see which of these variables determine the wage of individuals.

Regression Specification

1) Please write the regression specification for the variables above, keeping the log of wage earned in the
previous year as the dependent variable and the other determinants as the independent variables.

Data Setup

2a) Generate the log of wage earned in the last year.

2b) Recode the variables for gender, training attended and additional subsidiary work so that female
gender, training not attended and no additional subsidiary work are coded 0 instead of 2.
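
Steps 2a and 2b might look something like the sketch below. The variable names wage, sex, training and subsidiary are hypothetical placeholders, since the task does not give the survey's actual variable names, and the values are invented.

```stata
* Toy data; variable names and values are placeholders, not the survey's
clear
input wage sex training subsidiary
12000 1 2 2
 8000 2 1 2
    0 2 2 1
15000 1 1 1
end

* 2a: log of wage; zero or missing wages yield a missing log
generate lnwage = ln(wage) if wage > 0

* 2b: turn the 1/2 coding into 1/0 (2 becomes 0, 1 is left unchanged)
recode sex training subsidiary (2 = 0)

list
```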

Regression Analysis

3a) Run the regression in Stata and export the regression results to Word format.

3b) Interpret the regression results in 3a and explain which variables are important for determining wage.
Are there any counterintuitive results?

3c) The variables for marital status, number of years living in the district and principal activities in the last 12
months are categorical. We want to see the effect on wage of each category of these
variables. Rerun the regression with each category included, taking never married as the base for marital status,
more than 10 years as the base for number of years, and not in labor force as the base for activities in the
last 12 months.

Hint: Use the xi command in Stata and check its help file for how to use categorical variables in a
regression.

3di) Rerun specification 3c with robust standard errors clustered at the district level.

3dii) What is the purpose of clustering at the district level?
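
The mechanics of 3c and 3di can be illustrated on a stand-in dataset. The sketch below uses nlsw88, an example dataset shipped with Stata, not the Labor Force Survey; race plays the role of a categorical regressor and industry stands in for the district variable. The choice of category 2 as the base is arbitrary here.

```stata
* Stand-in data: nlsw88 ships with Stata; it is NOT the Labor Force Survey
sysuse nlsw88, clear
generate lnwage = ln(wage)

* Option A: xi, with the base (omitted) category set through the omit characteristic
char race[omit] 2
xi: regress lnwage grade ttl_exp i.race

* Option B: factor-variable notation with the same base category
regress lnwage grade ttl_exp ib2.race

* Cluster-robust standard errors; industry stands in for the district variable
regress lnwage grade ttl_exp ib2.race, vce(cluster industry)
```

Options A and B should give the same coefficients; clustering changes only the standard errors, not the point estimates.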

Sampling

Draw a random sample of 1000 observations.

Draw a random sample of 15% of the data.

Draw a sample of 3000 observations such that the number of observations from each district in the sample
reflects that district's proportion of observations in the original sample.
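
The three draws above could be sketched as follows, on simulated data rather than the survey. The dataset size of 30000, the ten districts and the seed are arbitrary assumptions. For the proportional draw, the sketch applies the same sampling percentage within every district, so the total may differ from 3000 by a few observations because of rounding within groups.

```stata
* Simulated stand-in data: 30000 observations spread over 10 districts
clear
set seed 12345
set obs 30000
generate Dist_code = ceil(10 * runiform())

* Random sample of exactly 1000 observations
preserve
sample 1000, count
count
restore

* Random sample of 15% of the data
preserve
sample 15
count
restore

* About 3000 observations, proportional to district size:
* the same percentage is drawn within each district
local pct = 100 * 3000 / _N
sample `pct', by(Dist_code)
count
tabulate Dist_code
```

Setting the seed makes the draws reproducible; preserve and restore return to the full data between draws.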
