Assignment 2 Answer Key PDF

This document provides instructions and an answer key for a lab homework assignment on analyzing the relationship between wealth and corruption among countries. Students are asked to 1) state a hypothesis linking wealth (dependent variable) and corruption (independent variable), 2) summarize the distributions of each variable using descriptive statistics and histograms, and 3) recode the corruption variable into a categorical variable with two levels.

Uploaded by Elya Renom

Lab Homework # 2 Answer Key

Due: Sunday, May 24 at 11:59pm

Instructions
In this assignment, you will use data from the dataset included in the Assignment 2 folder. This dataset
contains country-level indicators for wealth and corruption. Wealth is measured by average GDP per capita
(in US$) for the period 2000-2007. Corruption scores for each country are based on Transparency
International’s estimates, which draw on surveys of business people, risk analysts, and the general public in an effort
to capture perceptions of corruption around the world. Scores on the corruption measure range between 0
(highly corrupt) and 10 (highly clean).¹ This measure of corruption was taken in 2002.
Review Lab Guide 4.² In a separate document, answer the questions below. Submit your answers, along with
your R script, to the Assignment 2 folder on MyCourses before May 24 at 11:59pm. Note that only .doc
and .pdf files are accepted for the write-up. Be as concise as possible in your answers (no need to write long
paragraphs).

Assignment
1. How do you think wealth and corruption might be related? State a testable hypothesis
linking these two variables. Identify the independent and dependent variables.
In this example, either variable could be your outcome, depending on how you conceptualize the relationship.
What is important is that your independent and dependent variables align with your hypothesis. Remember
that a fully specified hypothesis identifies how you expect the outcome to change, depending on levels of the
independent variable. You must not only specify the two variables, but also note the direction of change.
In this instance, my hypothesis is that countries that are more corrupt (lower scores on Transparency International’s measure)
are less likely to be wealthy. Corruption scores are the independent variable. Wealth is the outcome (dependent) variable.
2. Summarize both variables by reporting the minimum and maximum values, relevant measures
of central tendency, as well as the standard deviation. Produce a histogram for each
variable (2 plots total: histogram for IV & histogram for DV). Make your plots as nice-looking
and informative as possible. Save the graphs to your working directory using "Export" in RStudio’s
plot viewer. Include these graphs in your write-up. In one paragraph, draw on this data
to describe the distributions of each variable in your own words.
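# Independent variable: Corruption

library(psych) # Assumption: describe() with this output layout comes from the psych package; the original script does not show its library calls

describe(tpi.df$ti_cpi)

##    vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
## X1    1 170 4.05 2.11    3.3    3.75 1.63 1.2 9.7   8.5 1.11     0.28 0.16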
¹ Note: The name of the scale may sound a bit counter-intuitive. Higher scores reflect less corruption.
² Review also lab guides 1, 2, and 3 if need be, and come to office hours sooner rather than later for extra help.

hist(tpi.df$ti_cpi,
     main = "Univariate distribution of Transparency International's \n Corruption Perception Index",
     xlab = "Corruption Rating \n 0 (highly corrupt) to 10 (highly clean)")

[Figure: histogram, "Univariate distribution of Transparency International's Corruption Perception Index". x-axis: Corruption Rating, 0 (highly corrupt) to 10 (highly clean); y-axis: Frequency.]

table(tpi.df$ti_cpi)

##
## 1.2 1.6 1.7 1.8 1.9 2 2.0999999 2.2
## 1 2 5 1 4 3 5 8
## 2.3 2.4000001 2.5 2.5999999 2.7 2.8 2.9000001 3
## 5 5 6 8 9 3 3 7
## 3.0999999 3.2 3.3 3.4000001 3.5 3.5999999 3.7 3.8
## 4 3 4 4 4 2 5 1
## 3.9000001 4 4.1999998 4.3000002 4.4000001 4.5 4.5999999 4.8000002
## 1 6 1 1 1 8 1 4
## 4.9000001 5.0999999 5.1999998 5.3000002 5.5999999 5.6999998 6 6.0999999
## 4 1 2 1 2 1 2 3
## 6.3000002 6.4000001 6.8000002 6.9000001 7.0999999 7.3000002 7.5 7.6999998
## 3 1 2 1 3 3 1 1
## 7.8000002 8.5 8.6000004 8.6999998 9 9.3000002 9.3999996 9.5
## 1 2 1 1 3 2 1 2
## 9.6999998
## 1
Transparency International’s corruption scores range from a low of 1.2 to a high of 9.7. Because the variable
is measured on a continuous scale, we should report all three measures of central tendency.³ The mean
corruption score is 4.05 and the median value is 3.3. The modal value, as indicated by the frequency table, is
2.7. The standard deviation is 2.11.
³ Note: If we had a nominal variable, it would not be appropriate to report the mean or the median.
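
Reading the mode off a long frequency table by eye is error-prone. As a minimal sketch, the helper below returns every value that attains the highest frequency, so ties are reported too. The name find_modes and the toy scores are illustrative assumptions, not part of the original script.

```r
# Toy scores, invented for illustration (not the assignment data)
scores <- c(2.7, 2.7, 3.3, 4.5, 2.7, 4.5, 9.0)

# Hypothetical helper: return every value that attains the highest frequency
find_modes <- function(x) {
  freq <- table(x)                            # frequency of each unique value
  as.numeric(names(freq)[freq == max(freq)])  # keep the value(s) with the top count
}

mean(scores)        # arithmetic mean
median(scores)      # middle value
sd(scores)          # standard deviation
find_modes(scores)  # most frequent value(s)
```

On the real data the same call would be find_modes(tpi.df$ti_cpi), assuming the data frame is loaded under that name.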

# Dependent variable: Wealth

describe(tpi.df$undp_gdp)

##    vars   n    mean      sd median trimmed     mad min   max range skew
## X1    1 170 8949.68 9986.85 5279.5 7105.41 5455.23 520 61190 60670 1.83
##    kurtosis     se
## X1     3.98 765.96
hist(tpi.df$undp_gdp,
     main = "Distribution of GDP per capita (US$) in the world",
     xlab = "GDP per capita (US$)")

[Figure: histogram, "Distribution of GDP per capita (US$) in the world". x-axis: GDP per capita (US$), 0 to 60000; y-axis: Frequency.]

table(tpi.df$undp_gdp)

##
## 520 580 630 650 710 740 780 800 840 860 870 890 930
## 1 2 1 1 1 1 1 1 1 1 1 1 1
## 980 1020 1027 1050 1070 1100 1170 1270 1317 1370 1390 1470 1480
## 2 2 1 1 1 1 1 1 1 1 1 1 1
## 1520 1580 1590 1610 1620 1670 1690 1700 1710 1720 1820 1940 1969
## 1 1 1 1 1 1 2 1 1 1 1 1 1
## 1990 2000 2060 2100 2130 2220 2260 2270 2300 2400 2420 2460 2470
## 1 1 1 1 2 1 1 1 1 1 1 1 1
## 2600 2670 2890 3120 3210 3230 3570 3580 3620 3810 3980 4080 4170
## 1 1 1 1 1 1 1 1 1 2 1 1 1
## 4220 4260 4300 4360 4550 4580 4610 4798 4830 4870 4890 5000 5010
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 5259 5300 5380 5440 5460 5520 5600 5640 5760 5870 5970 6080 6170
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 6210 6370 6390 6470 6560 6590 6640 6690 6760 6850 7010 7130 7280
## 1 1 1 1 1 2 1 1 1 1 1 1 1
## 7570 7770 7830 8170 8230 8840 8970 9120 9210 9430 9820 10070 10240
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 10320 10560 10810 10880 12260 12650 12840 13340 13400 15290 15780 16240 16950
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 17170 17640 18232 18280 18360 18540 18720 19530 19844 21460 21740 22420 24040
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 26050 26150 26190 26430 26920 26940 27100 27570 28260 29100 29220 29480 29750
## 1 1 1 1 1 1 1 1 1 1 1 1 1
## 30010 30130 30940 35750 36360 36600 61190
## 1 1 1 1 1 1 1
The average GDP per capita in the sample is US$8,949.68. The median GDP per capita is US$5,279.50.
The mean sits well above the median, which is consistent with a right-skewed distribution (skew = 1.83).
In this case, the mode is not particularly useful as there are many unique values in the dataset, and very
few observations that take on the same value. Nonetheless, we can still identify the modal values from the
frequency table; they are 580, 980, 1020, 1690, 2130, 3810, and 6590 (each occurs twice). GDP per capita ranges from a low of
US$520 to a high of US$61,190. There is also a fairly large amount of variation in the sample, as indicated
by the standard deviation of US$9,986.85, which is larger than the mean.
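
Question 2 also asks for the histograms to be saved, and the assignment names the "Export" button in RStudio for that. A scripted alternative is sketched below using a base R graphics device. The toy values and the output path are assumptions made for illustration, and pdf() is used only because it is available in every R build; png() is used the same way, with width and height given in pixels.

```r
# Toy GDP values, invented for illustration (not the assignment data)
gdp <- c(520, 980, 1690, 2130, 5280, 9120, 18280, 61190)

# Hypothetical output path; tempdir() keeps the example self-contained
out_file <- file.path(tempdir(), "gdp_hist.pdf")

pdf(out_file, width = 7, height = 5)  # open a PDF device (size in inches)
hist(gdp,
     main = "Distribution of GDP per capita (US$)",
     xlab = "GDP per capita (US$)")
dev.off()                             # close the device so the file is written
```

Replacing tempdir() with getwd() would save the plot to the working directory, as the question requests.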
3. Recode corruption into a categorical variable with two levels: low and high. In a sentence
or two, justify your reasoning for what constitutes "low" and "high" levels of corruption by
making reference to your data.
Students may decide on various thresholds to split the sample into “high” and “low” levels of corruption.
What is critical is that you justify your decision with respect to the data. You cannot, for example, be so
selective that you are left with categories that are essentially empty.
In this case, you may wish to split your sample at either the mean or the median. Another alternative is to
use the mid-point of the scale. Below, I split the sample based on the median.
tpi.df$corr_r <- ifelse(tpi.df$ti_cpi >= 3.3, "Cleaner", "More corrupt")
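
The recode above types the median (3.3) in by hand. As a hedged sketch, the cut point can instead be computed from the data, and a quick table() confirms that neither category is close to empty. The toy scores below are invented for illustration.

```r
# Toy CPI scores, invented for illustration (not the assignment data)
cpi <- c(1.2, 2.7, 3.0, 3.3, 4.5, 6.1, 9.7)

# Compute the cut point from the data instead of typing it by hand
cut_point <- median(cpi, na.rm = TRUE)

# Scores at or above the median are "Cleaner"; the rest are "More corrupt"
corr_r <- ifelse(cpi >= cut_point, "Cleaner", "More corrupt")

# Check that neither category is close to empty
table(corr_r)
```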

4. Recode wealth into a categorical variable with three levels: low, moderate, and high. In
a sentence or two, justify your reasoning for what constitutes "low", "moderate," and "high"
levels of wealth by making reference to your data.
As above, there are several different thresholds you might select to recode your variable. Here, though, as we
are splitting the sample into three categories, we should pay close attention to the distribution along the
range of values, and not rely solely on the mean, median, or midpoint of the scale. You must also be careful
that you are not so selective in what constitutes, for example, a “high” level of wealth that you are left with
very few “wealthy countries” (which would be apparent from your cross-tab in the question below).
tpi.df$wealth_r[tpi.df$undp_gdp <= 5279.5] <- "Low wealth" # At or below the median: lowest half of the sample
tpi.df$wealth_r[tpi.df$undp_gdp > 5279.5 & tpi.df$undp_gdp < 9986.85] <- "Moderate wealth"
tpi.df$wealth_r[tpi.df$undp_gdp >= 9986.85] <- "High wealth" # 9986.85 is the value of the SD, used as a cut point just above the mean (8949.68); mean + 1 SD would be 18936.53

table(tpi.df$wealth_r)

##
##     High wealth      Low wealth Moderate wealth
##              48              85              37
# Note that the order of these categories is alphabetical.
# For our crosstab, we will want to ensure they are in logical order:

tpi.df$wealth_r <- factor(tpi.df$wealth_r,
                          levels = c("Low wealth", "Moderate wealth", "High wealth"))
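
As an alternative sketch, base R's cut() can do the three-way recode and the level ordering in one step. The break points are the same two values used in the recode above; the toy GDP values are invented for illustration. One difference to be aware of: with right = TRUE the intervals are closed on the right, so a value of exactly 9986.85 would be labelled "Moderate wealth" here, whereas the line-by-line recode labels it "High wealth".

```r
# Toy GDP per capita values, invented for illustration (not the assignment data)
gdp <- c(520, 1690, 5000, 6590, 9820, 10070, 26050, 61190)

# Intervals: (-Inf, 5279.5], (5279.5, 9986.85], (9986.85, Inf)
wealth_r <- cut(gdp,
                breaks = c(-Inf, 5279.5, 9986.85, Inf),
                labels = c("Low wealth", "Moderate wealth", "High wealth"),
                right = TRUE)

levels(wealth_r)  # levels follow the order given in labels, not alphabetical order
table(wealth_r)   # counts per category
```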

5. Install and/or load the gmodels package. Using CrossTable(), produce a contingency table
to examine the relationship between your recoded corruption and wealth variables. Make
sure you place your variables in the correct position in your table. In a paragraph, use
this information to say something about the covariation (i.e., the association or relationship)
between corruption and wealth.
To set up your crosstab, it is essential that the variable you identified as your dependent variable in Question 1
above defines the rows of your table, and the independent variable defines the columns. All cross-tabs
must be set up this way because we rely on column percentages to assess how the distribution of the dependent
variable changes across levels of the independent variable. Note that depending on how you categorized your
variables, your cross-tab will look a bit different.
library(gmodels)

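CrossTable(tpi.df$wealth_r, tpi.df$corr_r, # DV first (rows), then IV (columns).
           prop.c = TRUE,      # show column percentages
           prop.r = FALSE,     # hide row percentages
           prop.t = FALSE,     # hide table (total) percentages
           prop.chisq = FALSE, # hide chi-square contributions
           format = "SPSS") # Not essential to have this format but can be helpful for percentages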

##
##    Cell Contents
## |-------------------------|
## |                   Count |
## |          Column Percent |
## |-------------------------|
##
## Total Observations in Table: 170
##
##                 | tpi.df$corr_r
## tpi.df$wealth_r |      Cleaner | More corrupt |    Row Total |
## ----------------|--------------|--------------|--------------|
##      Low wealth |           17 |           68 |           85 |
##                 |      19.318% |      82.927% |              |
## ----------------|--------------|--------------|--------------|
## Moderate wealth |           25 |           12 |           37 |
##                 |      28.409% |      14.634% |              |
## ----------------|--------------|--------------|--------------|
##     High wealth |           46 |            2 |           48 |
##                 |      52.273% |       2.439% |              |
## ----------------|--------------|--------------|--------------|
##    Column Total |           88 |           82 |          170 |
##                 |      51.765% |      48.235% |              |
## ----------------|--------------|--------------|--------------|
##
##
The results from the cross-tab illustrate a clear relationship between levels of corruption and wealth.
Specifically, countries that are less corrupt (i.e., “cleaner”) are much more likely to have moderate or high
levels of wealth compared to more corrupt countries. In particular, fewer than 20% of cleaner countries
are classified as low wealth (19.3%), while the overwhelming majority of more corrupt countries are classified
as low wealth (82.9%). We can also see that very few countries are both more corrupt and wealthy:
only 2 of the 82 more corrupt countries (2.4%) fall in the high-wealth category.
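
The column percentages quoted above can be checked by hand: each cell count is divided by its column total. A minimal sketch with base R's prop.table() follows, using the cell counts copied from the cross-tab; the object names counts and col_pct are illustrative.

```r
# Cell counts copied from the cross-tab above
# (rows = wealth, the DV; columns = corruption, the IV)
counts <- matrix(c(17, 68,
                   25, 12,
                   46,  2),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(
                   wealth = c("Low wealth", "Moderate wealth", "High wealth"),
                   corruption = c("Cleaner", "More corrupt")))

colSums(counts)  # column totals: 88 cleaner, 82 more corrupt

# margin = 2 divides each cell by its column total
col_pct <- round(100 * prop.table(counts, margin = 2), 1)
col_pct
```

With real data, prop.table(table(tpi.df$wealth_r, tpi.df$corr_r), margin = 2) would give the same proportions without retyping the counts, assuming both recoded variables exist.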
