WEEK 3 Activity - Assignment 1
Instructions: Provide your best answer to all questions, and submit your answers to the Assignment
on OWL. You may either fill out the document (and save it with your OWL ID and the rest of your
group's IDs in front, e.g. bdavis56_WEEK 3 ACTIVITY.docx) or submit just the answers. May be
done in groups of up to 5. Answers will be checked for undue similarities with other groups where
applicable. Everyone should submit the assignment to OWL.
1) Last week you were asked to create a scatterplot using ggplot2 and the gapminder dataset to see the
relationship between gdpPercap (GDP per capita) and lifeExp (life expectancy).
a) Which is the predictor variable and which is the outcome? Why? (2 marks)
ANSWER: gdpPercap is the predictor (independent) variable and lifeExp is the outcome (dependent)
variable, because we are examining how life expectancy changes as a function of GDP per capita.
b) Which variable belongs on the x-axis and which variable belongs on the y-axis of your plot? (1
mark)
ANSWER: gdpPercap (the predictor) belongs on the x-axis, and lifeExp (the outcome) belongs on the y-axis.
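As a minimal sketch of the axis assignment described above (assuming the ggplot2 and gapminder packages are installed; the axis labels are illustrative choices), the predictor goes in the x aesthetic and the outcome in the y aesthetic:

```r
library(ggplot2)
library(gapminder)

# Predictor (gdpPercap) on the x-axis, outcome (lifeExp) on the y-axis
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() + # One point per country-year observation
  labs(x = "GDP per capita",
       y = "Life expectancy (years)")
print(p)
```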
2) Classify each of the following variables as numerical and continuous, numerical and discrete, ordinal, or
nominal by marking the appropriate box: (3 marks)
4) Create a graph using the gapminder data set, using only the data for Asia from the year 2000 to the
present, make a scatterplot of “Population (in millions)” vs. “Life expectancy”. Paste your script below,
using hashtags (#) to comment about the purpose of each line. After loading packages and gapminder
dataset, I should be able to paste your script into RStudio and get the same plot that you made. Include a
copy of your plot, too. (10 marks)
ANSWER:
View(gapminder) # Inspect the full gapminder dataset; data were collected every 5 years
# Subset the data to Asian countries from the year 2000 onward
asia_data <- subset(gapminder, continent == "Asia" & year >= 2000)
View(asia_data) # Check that the Asia subset looks correct
# Plot population (in millions) against life expectancy
ggplot(asia_data, aes(x = pop / 1e6, y = lifeExp)) +
  geom_point() + # Add the points to the scatterplot
  labs(title = "Scatterplot of Population vs. Life Expectancy in Asia (2000 - Present)",
       x = "Population (in millions)",
       y = "Life Expectancy") # Set the plot title and axis labels
5) Create a graph using the gapminder data set to help you determine whether life expectancy follows a
normal distribution. Paste your script below. Does life expectancy follow a normal distribution? Y/N (3
marks)
ANSWER:
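One possible approach, sketched here under the assumption that the ggplot2 and gapminder packages are installed, is to draw a histogram and a normal Q-Q plot of lifeExp. The bin width of 2 years and the plot titles are illustrative choices, not requirements of the question. A roughly symmetric, single-peaked bell shape and Q-Q points that follow the reference line would support normality; clear skew, multiple peaks, or points curving away from the line would not.

```r
library(ggplot2)
library(gapminder)

# Histogram of life expectancy across all countries and years
hist_plot <- ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram(binwidth = 2) + # binwidth is an assumed, adjustable choice
  labs(title = "Distribution of Life Expectancy",
       x = "Life expectancy (years)",
       y = "Count")
print(hist_plot)

# Normal Q-Q plot: points close to the line suggest a normal distribution
qq_plot <- ggplot(gapminder, aes(sample = lifeExp)) +
  stat_qq() +
  stat_qq_line() +
  labs(title = "Normal Q-Q Plot of Life Expectancy",
       x = "Theoretical quantiles",
       y = "Sample quantiles")
print(qq_plot)
```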
6) Create a graph using the gapminder data set, which shows the total population of each continent in the
year 2007. (6 marks)
ANSWER:
data2007 <- subset(gapminder, year == 2007) # This filters the gapminder dataset for the year 2007
# Bar chart of population by continent; the country bars stack, so each bar's height is the continent total
ggplot(data2007, aes(x = continent, y = pop, fill = continent)) +
  geom_col() + # geom_col() is equivalent to geom_bar(stat = "identity")
  labs(title = "Total Population of Each Continent in 2007",
       x = "Continent",
       y = "Total Population",
       fill = "Continent") # Set the plot title, axis labels, and legend title
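An alternative sketch, assuming the ggplot2 and gapminder packages are installed, sums the population per continent explicitly before plotting, which also makes the totals available as a table. The column name pop_millions is introduced here for illustration. Because pop is stored as an integer in gapminder and continent totals can exceed the integer range, it is converted to numeric before summing.

```r
library(ggplot2)
library(gapminder)

# Keep only the 2007 observations
data2007 <- subset(gapminder, year == 2007)

# Convert pop to numeric (avoids integer overflow when summing) and express it in millions
data2007$pop_millions <- as.numeric(data2007$pop) / 1e6

# Sum the population of all countries within each continent
continent_totals <- aggregate(pop_millions ~ continent, data = data2007, FUN = sum)
print(continent_totals)

# One bar per continent, with height equal to the summed population
p <- ggplot(continent_totals, aes(x = continent, y = pop_millions, fill = continent)) +
  geom_col() +
  labs(title = "Total Population of Each Continent in 2007",
       x = "Continent",
       y = "Total Population (in millions)",
       fill = "Continent")
print(p)
```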
7) Indicate whether each question is best answered by Chi-square, T-test, or ANOVA. If you choose T-test,
indicate whether it is a 1-tailed or 2-tailed test, and whether it is a paired sample test. For ANOVA, indicate
whether it is 1-way, 2-way, and/or repeated measures.
a) Is the average population of European countries in 1952 different from the average population of
European countries in 2007? (2 marks)
b) Do student grades differ between the departments of Chemistry, Biology, and Physics? (2 marks)
c) Do the grades of Chemistry students improve after using a tutoring service? (2 marks)
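ANSWER: T-test (1-tailed, paired sample test). The same students are measured before and after tutoring,
and the question asks about a change in one direction (improvement).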
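As a rough illustration of how two of the tests named in question 7 can be run in base R, the sketch below uses small made-up grade vectors. All numbers are hypothetical and exist only to make the calls runnable. A before/after comparison on the same students (as in 7c) maps to a paired, 1-tailed t-test, and a comparison of one numeric outcome across three independent groups (as in 7b) maps to a 1-way ANOVA.

```r
# Hypothetical grades for illustration only; not real data
before <- c(62, 70, 58, 75, 66, 71, 64, 69)
after  <- c(68, 74, 63, 77, 70, 72, 69, 75)

# Same students measured twice, directional question ("improve"):
# paired, 1-tailed t-test of whether after > before
paired_result <- t.test(after, before, paired = TRUE, alternative = "greater")
print(paired_result)

# One numeric outcome compared across three independent groups: 1-way ANOVA
grades <- data.frame(
  grade = c(72, 65, 80, 77, 68, 74, 81, 70, 85, 79, 66, 73),
  department = factor(rep(c("Chemistry", "Biology", "Physics"), each = 4))
)
anova_result <- aov(grade ~ department, data = grades)
print(summary(anova_result))
```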