Data Analysis Assignment_Prompt
Exercise 2:
Create two new variables called GVC participation and Trade Openness. GVC participation
is our variable for trade in GVCs, while Trade Openness is our variable for trade in terms of
gross exports and imports. Both these variables should be added as two new columns in the same
merged “Master dataset”. The formula for generating these variables is:
GVC participation = DVA in FX share exp (%) + FVA share exp (%)
Trade Openness = [Gross Exports (mils) + Gross Imports (mils)] / Gross Output (mils) * 100
Note: GVC participation measures trade in intermediate inputs while trade openness measures
trade in final goods.
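As a rough illustration of the two formulas above, the following Python sketch computes both variables for a single row. The key names (dva_fx_share_exp, fva_share_exp, gross_exports, gross_imports, gross_output) and all the numbers are hypothetical placeholders, not the actual column names or values in the Master dataset.

```python
def gvc_participation(dva_fx_share_exp, fva_share_exp):
    """GVC participation = DVA in FX share exp (%) + FVA share exp (%)."""
    return dva_fx_share_exp + fva_share_exp


def trade_openness(gross_exports, gross_imports, gross_output):
    """Trade Openness = (gross exports + gross imports) / gross output * 100."""
    if gross_output == 0:
        return None  # the ratio is undefined when gross output is zero
    return (gross_exports + gross_imports) / gross_output * 100


# Hypothetical row; the values are made up for illustration only.
row = {
    "dva_fx_share_exp": 12.5,
    "fva_share_exp": 20.0,
    "gross_exports": 300.0,
    "gross_imports": 200.0,
    "gross_output": 2000.0,
}

# Add the two new variables as extra "columns" of the same row.
row["gvc_participation"] = gvc_participation(
    row["dva_fx_share_exp"], row["fva_share_exp"]
)
row["trade_openness"] = trade_openness(
    row["gross_exports"], row["gross_imports"], row["gross_output"]
)
print(row["gvc_participation"], row["trade_openness"])
```

The same arithmetic can be done with a column formula in Excel or with generate in STATA; the sketch only shows the order of operations.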
Exercise 3:
Now report the summary statistics for all the variables in our Master dataset listed below.
The statistics should be reported in table form, as shown below.
Table 1.A: All countries
Variable          | Mean | S.D. | Min | Max
GVC participation |      |      |     |
Trade openness    |      |      |     |
Value added       |      |      |     |
GDP per capita    |      |      |     |
LFPR, total       |      |      |     |
LFPR, male        |      |      |     |
LFPR, female      |      |      |     |
Table 1.B: Developed and Developing countries
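                  | Developed                | Developing
Variable          | Mean | S.D. | Min | Max | Mean | S.D. | Min | Max
GVC participation |      |      |     |     |      |      |     |
Trade openness    |      |      |     |     |      |      |     |
Value added       |      |      |     |     |      |      |     |
GDP per capita    |      |      |     |     |      |      |     |
LFPR, total       |      |      |     |     |      |      |     |
LFPR, male        |      |      |     |     |      |      |     |
LFPR, female      |      |      |     |     |      |      |     |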
Note: To distinguish between developed and developing countries in the sample, use the
variable “countrytype” in the original Excel data. Countrytype is a binary variable that
takes the value 0 for developed countries and 1 for developing countries. Report the summary
statistics separately for developed and developing countries using this variable.
You can create the summary statistics either in STATA or directly in Excel. In STATA, you can
use the command: sum variablename if countrytype==0, and so on.
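For readers working outside STATA or Excel, here is a minimal Python sketch of the same summary statistics (mean, standard deviation, min, max) for all countries and split by countrytype. The rows and the key name gvc_participation are hypothetical, and the values are invented. The sketch uses the sample standard deviation (n - 1 denominator), which is also what STATA's sum command reports.

```python
import statistics


def summarize(values):
    """Return mean, sample standard deviation, min and max, skipping missing values."""
    clean = [v for v in values if v is not None]
    return {
        "mean": statistics.mean(clean),
        # stdev needs at least two observations
        "sd": statistics.stdev(clean) if len(clean) > 1 else None,
        "min": min(clean),
        "max": max(clean),
    }


# Hypothetical rows; countrytype is 0 for developed and 1 for developing.
rows = [
    {"countrytype": 0, "gvc_participation": 40.0},
    {"countrytype": 0, "gvc_participation": 50.0},
    {"countrytype": 1, "gvc_participation": 20.0},
    {"countrytype": 1, "gvc_participation": 30.0},
    {"countrytype": 1, "gvc_participation": 40.0},
]

# Table 1.A style: all countries together.
all_stats = summarize([r["gvc_participation"] for r in rows])

# Table 1.B style: one set of statistics per country type.
developed = summarize(
    [r["gvc_participation"] for r in rows if r["countrytype"] == 0]
)
developing = summarize(
    [r["gvc_participation"] for r in rows if r["countrytype"] == 1]
)

for label, stats in [("All", all_stats), ("Developed", developed), ("Developing", developing)]:
    print(label, stats)
```

Repeating the summarize call for each variable in the table fills in one row at a time.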
Question 1:
By how much do GVC participation and Trade openness differ across all countries (in Table
1.A), and between developed and developing countries (in Table 1.B)? Briefly describe one key
reason for this difference.
Exercise 4:
Let us now transition from industry-level to country-level analysis. In other words, we will
aggregate the data across industries, so that our new data is at the country and year level. To
do this, create a modified version of the “Master dataset” by keeping only the rows with industry
code indcode = “DTOTAL” (or industryid = 70) and dropping all the other 69 industry groups. The
“DTOTAL” row is the sum total of each variable across all industries. Keep only the “DTOTAL”
row for all 65 countries and 24 years. Your dataset will now cover 65 countries and 24 years
only, with no industry breakdown. Save this dataset as “Country level dataset”, as a new sheet
in the “Master dataset” Excel file.
Note: You can use software such as STATA (or R) for this step, or simply do it in Excel. In
STATA, use a command that keeps only the rows with indcode “DTOTAL”.
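The filtering step can also be sketched in plain Python, as below. This is an illustration under assumptions: the key names country, year and value_added, the country code "AAA", the industry code "D01" and all values are hypothetical. Only the code "DTOTAL" and the counts of 65 countries and 24 years come from the exercise.

```python
def keep_country_totals(rows):
    """Keep only the DTOTAL rows and check there is one row per country-year."""
    country_level = [r for r in rows if r["indcode"] == "DTOTAL"]
    keys = [(r["country"], r["year"]) for r in country_level]
    if len(keys) != len(set(keys)):
        raise ValueError("more than one DTOTAL row for some country-year")
    return country_level


# Hypothetical industry-level rows (two industries shown instead of 70).
rows = [
    {"country": "AAA", "year": 1995, "indcode": "D01", "value_added": 10.0},
    {"country": "AAA", "year": 1995, "indcode": "DTOTAL", "value_added": 100.0},
    {"country": "AAA", "year": 1996, "indcode": "D01", "value_added": 12.0},
    {"country": "AAA", "year": 1996, "indcode": "DTOTAL", "value_added": 110.0},
]

country_level = keep_country_totals(rows)

# With the full data, 65 countries x 24 years should leave this many rows.
expected_rows = 65 * 24
print(len(country_level), expected_rows)
```

Comparing the number of remaining rows with 65 x 24 = 1,560 is a quick check that no country-year was lost or duplicated.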
Exercise 5:
Now, let’s do some country-level analysis (using the above Country-level dataset). Let us plot
some graphs and analyze the data.
Question 2:
(a) Plot GVC participation (x-axis) and GDP per capita (y-axis) on a graph. The graph is for all
countries in the sample. Analyze and interpret the relationship between these two variables.
(Hint: you can use the “scatter” command in STATA to generate such graphs, or use Excel or
any other software you prefer.)
(b) Plot Trade openness (x-axis) and GDP per capita (y-axis) on a graph. The graph is for all
countries in the sample. Analyze and interpret the relationship between these two variables.
Discuss how the results in this graph differ from the results in part (a).
(c) Plot GVC participation (x-axis) against LFPR, LFPR male and LFPR female (y-axis) on three
separate graphs. The graphs are for all countries in the sample. Analyze and interpret the
relationship between GVC participation and overall employment, and between GVC participation
and employment by gender.
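When interpreting the scatter plots, a correlation coefficient can serve as an optional numerical complement to the visual impression. The assignment does not ask for it. The Python sketch below computes a Pearson correlation by hand; the two lists are invented values standing in for GVC participation and GDP per capita, not data from the Country-level dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values, one point per country-year observation.
gvc = [20.0, 30.0, 40.0, 50.0]
gdp_per_capita = [5000.0, 9000.0, 15000.0, 21000.0]

r = pearson(gvc, gdp_per_capita)
print(round(r, 3))
```

A coefficient near +1 or -1 indicates a strong linear association, while a value near 0 indicates a weak one. It says nothing about causation.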
Exercise 6:
Finally, let’s repeat the country-level analysis (using the above Country-level dataset) from
Exercise 5 for a select sample of developed and developing nations.
Now, only take the following sub-sample of countries: United States (USA), Germany, South
Korea, China, Brazil, and India.
For these selected countries, do the following analysis:
Question 3:
(a) Plot GVC participation (y-axis), LFPR (y-axis), and year (x-axis) on a graph. The graph
should have separate trend lines for each of the selected countries from 1995-2018.
Analyze and interpret the trends in GVC participation and employment for the selected sample of countries.
(b) Plot Trade openness (y-axis), LFPR (y-axis), and year (x-axis) on a graph. The graph should
have separate trend lines for each of the selected countries from 1995-2018.
Analyze and interpret the trends in trade openness and employment for the selected sample of
countries.
Note: Use a different colour for each country’s trend line, and label each line clearly.
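Before plotting, it can help to arrange the data as one time series per selected country. The Python sketch below shows one way to do that. The three-letter country codes, the key names country, year and lfpr, and all values are assumptions for illustration; the actual dataset may identify countries differently.

```python
from collections import defaultdict

# Hypothetical codes for United States, Germany, South Korea, China, Brazil, India.
SELECTED = {"USA", "DEU", "KOR", "CHN", "BRA", "IND"}


def build_series(rows, variable, selected, first_year=1995, last_year=2018):
    """Return {country: [(year, value), ...]} sorted by year for the selected countries."""
    series = defaultdict(list)
    for r in rows:
        if r["country"] in selected and first_year <= r["year"] <= last_year:
            series[r["country"]].append((r["year"], r[variable]))
    # Sort each country's points by year so lines are drawn left to right.
    return {country: sorted(points) for country, points in series.items()}


# Hypothetical country-level rows; the values are made up.
rows = [
    {"country": "USA", "year": 1996, "lfpr": 62.0},
    {"country": "USA", "year": 1995, "lfpr": 61.5},
    {"country": "IND", "year": 1995, "lfpr": 58.0},
    {"country": "FRA", "year": 1995, "lfpr": 55.0},  # not in the sub-sample
    {"country": "USA", "year": 2019, "lfpr": 63.0},  # outside 1995-2018
]

lfpr_series = build_series(rows, "lfpr", SELECTED)
for country, points in sorted(lfpr_series.items()):
    print(country, points)
```

Each entry of the resulting dictionary corresponds to one trend line, so it maps directly onto one colour and one legend label per country in whichever plotting tool is used.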