Assignment 2 Answer Key PDF
Instructions
In this assignment, you will use data from the dataset included in the Assignment 2 folder. This dataset
contains country-level indicators for wealth and corruption. Wealth is measured by average GDP per capita
(in US$) for the period 2000-2007. Corruption scores for each country are measured based on Transparency
International’s estimation, which surveys business people, risk analysts, and the general public in an effort
to capture perceptions of corruption around the world. Scores on the corruption measure range between 0
(highly corrupt) and 10 (highly clean).[1] This measure of corruption was taken in 2002.
Review Lab Guide 4.[2] In a separate document, answer the questions below. Submit your answers, along with
your R script, to the Assignment 2 folder on MyCourses before May 24 at 11:59pm. Note that only .doc
and .pdf files are accepted for the write-up. Be as concise as possible in your answers (no need to write long
paragraphs).
Assignment
1. How do you think wealth and corruption might be related? State a testable hypothesis
linking these two variables. Identify the independent and dependent variables.
In this example, either variable could be your outcome, depending on how you conceptualize the relationship.
What is important is that your independent and dependent variable align with your hypothesis. Remember
that a fully specified hypothesis identifies how you expect the outcome to change, depending on levels of the
independent variable. You must not only specify the two variables, but also note the direction of change.
In this instance, my hypothesis is that countries that are more corrupt (lower scores on Transparency International’s measure)
are less likely to be wealthy. Corruption scores are the independent variable. Wealth is the outcome variable.
2. Summarize both variables by reporting the minimum and maximum values, relevant measures
of central tendency, as well as the standard deviation. Produce a histogram for each
variable (2 plots total: histogram for IV & histogram for DV). Make your plots as nice-looking
and informative as possible. Save the graphs to your working directory using "Export" in RStudio’s
plot viewer. Include these graphs in your write-up. In one paragraph, draw on this data
to describe the distributions of each variable in your own words.
# Independent variable: Corruption
library(psych) # provides describe()
describe(tpi.df$ti_cpi)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 170 4.05 2.11 3.3 3.75 1.63 1.2 9.7 8.5 1.11 0.28 0.16
[1] Note: The name of the scale may sound a bit counter-intuitive. Higher scores reflect less corruption.
[2] Review also lab guides 1, 2, and 3 if need be, and come to office hours sooner rather than later for extra help.
hist(tpi.df$ti_cpi,
main = "Univariate distribution of Transparency International's \n Corruption Perception Index",
xlab = "Corruption Rating \n 0 (highly corrupt) to 10 (highly clean)")
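[Figure: histogram titled "Univariate distribution of Transparency International's Corruption Perception Index"; x-axis: Corruption Rating, 0 (highly corrupt) to 10 (highly clean); y-axis: frequency]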
table(tpi.df$ti_cpi)
##
## 1.2 1.6 1.7 1.8 1.9 2 2.0999999 2.2
## 1 2 5 1 4 3 5 8
## 2.3 2.4000001 2.5 2.5999999 2.7 2.8 2.9000001 3
## 5 5 6 8 9 3 3 7
## 3.0999999 3.2 3.3 3.4000001 3.5 3.5999999 3.7 3.8
## 4 3 4 4 4 2 5 1
## 3.9000001 4 4.1999998 4.3000002 4.4000001 4.5 4.5999999 4.8000002
## 1 6 1 1 1 8 1 4
## 4.9000001 5.0999999 5.1999998 5.3000002 5.5999999 5.6999998 6 6.0999999
## 4 1 2 1 2 1 2 3
## 6.3000002 6.4000001 6.8000002 6.9000001 7.0999999 7.3000002 7.5 7.6999998
## 3 1 2 1 3 3 1 1
## 7.8000002 8.5 8.6000004 8.6999998 9 9.3000002 9.3999996 9.5
## 1 2 1 1 3 2 1 2
## 9.6999998
## 1
Transparency International’s corruption scores range from a low of 1.2 to a high of 9.7. Because the variable
is measured on a continuous scale, we should report all three measures of central tendency.[3] The mean
corruption score is 4.05 and the median value is 3.3. The modal value, as indicated by the frequency table, is
2.7. The standard deviation is 2.11. Because the mean exceeds the median and the skew is positive (1.11), the
distribution is right-skewed: most countries sit toward the more corrupt end of the scale.
[3] Note: If we had a nominal variable, it would not be appropriate to report the mean or the median.
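As a cross-check on the summary statistics, the sketch below rebuilds the 170 corruption scores from the frequency table and recomputes the measures in base R. It assumes the scores are rounded to one decimal place (so 2.0999999 is treated as 2.1); the names `scores`, `counts`, `cpi`, and `cpi_mode` are illustrative and do not come from the assignment script.

```r
# Distinct scores and their frequencies, transcribed from the table above.
# Assumption: values are rounded to one decimal place.
scores <- c(1.2, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2,
            2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0,
            3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8,
            3.9, 4.0, 4.2, 4.3, 4.4, 4.5, 4.6, 4.8,
            4.9, 5.1, 5.2, 5.3, 5.6, 5.7, 6.0, 6.1,
            6.3, 6.4, 6.8, 6.9, 7.1, 7.3, 7.5, 7.7,
            7.8, 8.5, 8.6, 8.7, 9.0, 9.3, 9.4, 9.5,
            9.7)
counts <- c(1, 2, 5, 1, 4, 3, 5, 8,
            5, 5, 6, 8, 9, 3, 3, 7,
            4, 3, 4, 4, 4, 2, 5, 1,
            1, 6, 1, 1, 1, 8, 1, 4,
            4, 1, 2, 1, 2, 1, 2, 3,
            3, 1, 2, 1, 3, 3, 1, 1,
            1, 2, 1, 1, 3, 2, 1, 2,
            1)

# Expand the table back into one score per country
cpi <- rep(scores, times = counts)

length(cpi)   # number of countries
range(cpi)    # minimum and maximum
mean(cpi)     # mean
median(cpi)   # median
sd(cpi)       # standard deviation

# Mode: the value with the highest frequency
freq <- table(cpi)
cpi_mode <- as.numeric(names(freq)[which.max(freq)])
cpi_mode
```

Because the inputs are rounded, the results may differ from the `describe()` output in the last decimal place.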
# Dependent variable: Wealth
describe(tpi.df$undp_gdp)
[Figure: histogram of GDP per capita (undp_gdp)]
##
## 520 580 630 650 710 740 780 800 840 860 870 890 930
## 1 2 1 1 1 1 1 1 1 1 1 1 1
## 980 1020 1027 1050 1070 1100 1170 1270 1317 1370 1390 1470 1480
## 2 2 1 1 1 1 1 1 1 1 1 1 1
## 1520 1580 1590 1610 1620 1670 1690 1700 1710 1720 1820 1940 1969
## 1 1 1 1 1 1 2 1 1 1 1 1 1
## 1990 2000 2060 2100 2130 2220 2260 2270 2300 2400 2420 2460 2470
## 1 1 1 1 2 1 1 1 1 1 1 1 1
## 2600 2670 2890 3120 3210 3230 3570 3580 3620 3810 3980 4080 4170
## 1 1 1 1 1 1 1 1 1 2 1 1 1
## 4220 4260 4300 4360 4550 4580 4610 4798 4830 4870 4890 5000 5010
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 5259 5300 5380 5440 5460 5520 5600 5640 5760 5870 5970 6080 6170
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 6210 6370 6390 6470 6560 6590 6640 6690 6760 6850 7010 7130 7280
## 1 1 1 1 1 2 1 1 1 1 1 1 1
## 7570 7770 7830 8170 8230 8840 8970 9120 9210 9430 9820 10070 10240
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 10320 10560 10810 10880 12260 12650 12840 13340 13400 15290 15780 16240 16950
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 17170 17640 18232 18280 18360 18540 18720 19530 19844 21460 21740 22420 24040
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 26050 26150 26190 26430 26920 26940 27100 27570 28260 29100 29220 29480 29750
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 30010 30130 30940 35750 36360 36600 61190
## 1 1 1 1 1 1 1
The average GDP per capita in the sample is US$8,949.68. The median GDP per capita is US$5,279.50.
In this case, the mode is not particularly useful as there are many unique values in the dataset, and very
few observations that take on the same value. Nonetheless, we can still identify the modal values from the
frequency table; they are 580, 980, 1020, 1690, 2130, 3810, and 6590, each of which occurs twice. GDP per capita
ranges from a low of US$520 to a high of US$61,190. There is also a fairly large amount of variation in the sample,
as indicated by the standard deviation of US$9,986.85, which is higher than the average value.
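When several values tie for the highest frequency, as they do here, reading the modes off a long frequency table is error-prone. The following is a minimal sketch of a helper that returns every tied value; `modal_values()` and `gdp_demo` are hypothetical names, and the demo values are made up for illustration rather than taken from the dataset.

```r
# Return every value tied for the highest frequency (there may be several).
modal_values <- function(x) {
  freq <- table(x)
  as.numeric(names(freq)[freq == max(freq)])
}

# Hypothetical GDP values, for illustration only
gdp_demo <- c(520, 580, 580, 630, 980, 980, 1020)
modal_values(gdp_demo)  # 580 and 980 each appear twice
```

Applied to the real variable, the call would be `modal_values(tpi.df$undp_gdp)`.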
3. Recode corruption into a categorical variable with two levels: low and high. In a sentence
or two, justify your reasoning for what constitutes "low" and "high" levels of corruption by
making reference to your data.
Students may decide on various thresholds to split the sample into “high” and “low” levels of corruption.
What is critical is that you justify your decision with respect to the data. You cannot, for example, be so
selective that you are left with categories that are essentially empty.
In this case, you may wish to split your sample at either the mean or the median. Another alternative is to
use the mid-point of the scale. Below, I split the sample based on the median.
tpi.df$corr_r <- ifelse(tpi.df$ti_cpi >= 3.3, "Cleaner", "More corrupt")
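One way to make the median split easier to justify is to compute the cutoff from the data and then check the size of each category. The sketch below does this on a small, made-up data frame; `demo.df` and `cutoff` are illustrative names, and the scores are hypothetical.

```r
# Hypothetical scores on the 0-10 scale, for illustration only
demo.df <- data.frame(ti_cpi = c(1.2, 2.2, 2.7, 3.3, 3.3, 4.5, 6.1, 9.7))

# Compute the cutoff from the data instead of typing it in by hand
cutoff <- median(demo.df$ti_cpi, na.rm = TRUE)

# Scores at or above the median are "Cleaner" (higher scores mean less corruption)
demo.df$corr_r <- ifelse(demo.df$ti_cpi >= cutoff, "Cleaner", "More corrupt")

# Check that neither category is close to empty
table(demo.df$corr_r)
```

Countries that score exactly at the median fall into the "Cleaner" group here, so the two groups need not be the same size.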
4. Recode wealth into a categorical variable with three levels: low, moderate, and high. In
a sentence or two, justify your reasoning for what constitutes "low", "moderate," and "high"
levels of wealth by making reference to your data.
As above, there are several different thresholds you might select to recode your variable. Here, though, as we
are splitting the sample into three categories, we should pay close attention to the distribution along the
range of values, and not rely solely on the mean, median, or midpoint of the scale. You must also be careful
that you are not so selective in what constitutes, for example, a “high” level of wealth that you are left with
very few “wealthy countries” (which would be apparent from your cross-tab in the question below).
tpi.df$wealth_r[tpi.df$undp_gdp <= 5279.5] <- "Low wealth" # At or below the median (lower half of the sample)
tpi.df$wealth_r[tpi.df$undp_gdp > 5279.5 & tpi.df$undp_gdp < 9986.85] <- "Moderate wealth"
tpi.df$wealth_r[tpi.df$undp_gdp >= 9986.85] <- "High wealth" # Cutoff equals the SD (9986.85), just above the mean of 8949.68
table(tpi.df$wealth_r)
##
## High wealth Low wealth Moderate wealth
## 48 85 37
# Note that the order of these categories is alphabetical.
# For our crosstab, we will want to ensure they are in logical order:
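A common way to put the categories in logical order is to convert the variable to a factor with explicit levels. This is a sketch on a made-up vector (`wealth_demo` is a hypothetical name), not necessarily the call used to produce the cross-tab in Question 5.

```r
# Hypothetical recoded values, for illustration only
wealth_demo <- c("Low wealth", "High wealth", "Moderate wealth", "Low wealth")

# Set the level order explicitly; otherwise table() sorts alphabetically
wealth_demo <- factor(wealth_demo,
                      levels = c("Low wealth", "Moderate wealth", "High wealth"))

levels(wealth_demo)
table(wealth_demo)
```

The same pattern applied to the real variable would be `tpi.df$wealth_r <- factor(tpi.df$wealth_r, levels = c("Low wealth", "Moderate wealth", "High wealth"))`.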
5. Install and/or load the gmodels package. Using CrossTable(), produce a contingency table
to examine the relationship between your recoded corruption and wealth variables. Make
sure you place your variables in the correct position in your table. In a paragraph, use
this information to say something about the covariation (i.e., the association or relationship)
between corruption and wealth.
To set up your crosstab, it is essential that the variable you identified as your dependent variable in Question 1
above goes across the rows of your table, and the independent variable goes down your columns. All cross-tabs
must be set up this way because we rely on column percentages to assess how changes in the independent
variable lead to outcomes on the dependent variable. Note that depending on how you categorized your
variable, your cross-tab will look a bit different.
library(gmodels)
##
## Cell Contents
## |-------------------------|
## | Count |
## | Column Percent |
## |-------------------------|
##
## Total Observations in Table: 170
##
## | tpi.df$corr_r
## tpi.df$wealth_r | Cleaner | More corrupt | Row Total |
## ----------------|--------------|--------------|--------------|
## Low wealth | 17 | 68 | 85 |
## | 19.318% | 82.927% | |
## ----------------|--------------|--------------|--------------|
## Moderate wealth | 25 | 12 | 37 |
## | 28.409% | 14.634% | |
## ----------------|--------------|--------------|--------------|
## High wealth | 46 | 2 | 48 |
## | 52.273% | 2.439% | |
## ----------------|--------------|--------------|--------------|
## Column Total | 88 | 82 | 170 |
## | 51.765% | 48.235% | |
## ----------------|--------------|--------------|--------------|
##
##
The results from the cross-tab illustrate a clear relationship between levels of corruption and wealth.
Specifically, countries that are less corrupt (i.e., “cleaner”) are much more likely to have moderate or high
levels of wealth compared to more corrupt countries. In particular, fewer than 20% of cleaner countries
are classified as low wealth (19.3%), while the overwhelming majority of more corrupt countries are classified
as low wealth (82.9%). We can also see that there are very few countries that are both more corrupt and
highly wealthy (n = 2, 2.4%).
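The column percentages quoted above can be recomputed from the cell counts in base R. The sketch below enters the counts from the cross-tab by hand and divides each cell by its column total; the object names `xtab` and `col_pct` are illustrative.

```r
# Cell counts from the cross-tab: rows are wealth (DV), columns are corruption (IV)
xtab <- matrix(c(17, 25, 46,   # Cleaner column
                 68, 12,  2),  # More corrupt column
               nrow = 3,
               dimnames = list(wealth = c("Low wealth", "Moderate wealth", "High wealth"),
                               corruption = c("Cleaner", "More corrupt")))

# margin = 2 divides each cell by its column total
col_pct <- round(100 * prop.table(xtab, margin = 2), 3)
col_pct

# Column totals, for reference
colSums(xtab)
```

Each column of `col_pct` sums to 100 (up to rounding), which is what makes the two corruption groups directly comparable despite their different sizes.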