PO687 End of Term Project
Dr Raluca Popp
December 7, 2020
A word on R code:
• It is not mandatory to add your R code to the assignment, but it is
recommended. It does not count towards the word limit (which is not
strict, anyway) and you will not be marked on it. However, it helps us
when marking the assignment.
• If you produce your document in Word, then you can add the code at the
end of the assignment.
• If you produce the assignment using RMarkdown, then you don’t need to
include the code at the end, as it is part of the document.
Formulate hypotheses
1. Pick a dataset among gss, nes and world. Inspect it, have a look at the
variables it contains and at the codebook. Select an outcome and a predictor
variable. These will be the central elements of your assignment. Remember
that the outcome variable needs to be interval, ratio or high-level ordinal - what
we call a continuous variable. Feel free to recode variables where you need to.
Formulate the working and the null hypotheses. (15 points)
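As a rough illustration of the inspect-and-recode step, the sketch below uses R's built-in mtcars data as a stand-in, because the course datasets are not reproduced here. The variable names (mpg, am) and the recode itself are assumptions chosen only for the example; substitute your own dataset and variables.

```r
# Stand-in data: mtcars ships with base R. Replace with your chosen dataset.
dat <- mtcars

# Inspect the structure: variable names, types and the first few values.
str(dat)

# Look at the distribution of a candidate outcome variable.
summary(dat$mpg)

# Example recode (hypothetical): turn a 0/1 numeric variable into a labelled factor.
dat$am_f <- factor(dat$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Check the recode, including any missing values it may have created.
table(dat$am_f, useNA = "ifany")
```

Checking the recoded variable with table() before moving on is a cheap way to catch mislabelled or dropped categories.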
Bivariate regression
5. Test the hypothesis you formulated in Step 1 using a regression model.
Present the regression results in a table and interpret them. Use the .05 cut-off
point for statistical significance. (15 points)
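A minimal sketch of a bivariate regression in base R is given below. It again uses mtcars as a stand-in, with mpg as the outcome and wt as the predictor; both choices are assumptions for illustration, not part of the assignment.

```r
# Stand-in data: replace with your chosen dataset and variables.
dat <- mtcars

# Bivariate OLS regression: outcome ~ predictor.
m1 <- lm(mpg ~ wt, data = dat)

# Coefficient table: estimate, standard error, t value and p-value.
coef_table <- summary(m1)$coefficients
round(coef_table, 3)

# Compare the predictor's p-value with the .05 cut-off point.
p_wt <- coef_table["wt", "Pr(>|t|)"]
p_wt < 0.05

# 95% confidence intervals for the coefficients.
confint(m1, level = 0.95)
```

The object coef_table is an ordinary matrix, so it can be passed to whatever table-formatting tool you normally use in Word or RMarkdown.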
Multiple regression
6. Expand on the relationship you tested above, by choosing another two
variables that could improve your model. Feel free to recode variables.
6a. Formulate hypotheses relating each new variable to your outcome variable.
(5 points)
6b. Present univariate analysis on the new variables (descriptive statistics
and visualisations). (5 points)
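One possible way to produce the univariate analysis is sketched below, with a histogram for a continuous variable and a bar chart for a categorical one. The variables hp and cyl from the stand-in mtcars data are assumptions for the example only.

```r
# Stand-in data: replace with your chosen dataset and variables.
dat <- mtcars
dat$cyl_f <- factor(dat$cyl)

# Continuous variable: descriptive statistics and a histogram.
desc_hp <- c(mean   = mean(dat$hp),
             median = median(dat$hp),
             sd     = sd(dat$hp),
             min    = min(dat$hp),
             max    = max(dat$hp))
round(desc_hp, 2)
hist(dat$hp, main = "Distribution of hp", xlab = "hp")

# Categorical variable: frequencies, proportions and a bar chart.
freq_cyl <- table(dat$cyl_f)
freq_cyl
round(prop.table(freq_cyl), 3)
barplot(freq_cyl, main = "Distribution of cyl", xlab = "cyl", ylab = "Frequency")
```

If your variables contain missing values, mean(), median() and sd() need na.rm = TRUE, otherwise they return NA.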
6c. Run a regression model that includes the new variables. Present the
regression results in a table and interpret them. Use the .05 cut-off point for
statistical significance. Run regression diagnostics for your model and discuss
whether your model respects OLS assumptions. If it violates any assumptions,
you need to indicate how you would fix the issue. You don’t need to re-run the
model. (10 points)
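The sketch below shows one way to fit a multiple regression and run some common diagnostics using base R only. The model (mpg on wt, hp and qsec in the stand-in mtcars data) is an assumption for illustration. The checks shown are a selection, not a complete list, and the variance inflation factors are computed by hand rather than with an add-on package.

```r
# Stand-in data: replace with your chosen dataset and variables.
dat <- mtcars

# Multiple OLS regression with the original predictor plus two new ones.
m2 <- lm(mpg ~ wt + hp + qsec, data = dat)
summary(m2)

# Standard diagnostic plots: residuals vs fitted (linearity),
# normal Q-Q (normality), scale-location (constant variance),
# residuals vs leverage (influential cases).
par(mfrow = c(2, 2))
plot(m2)
par(mfrow = c(1, 1))

# Formal check of residual normality (sensitive to sample size).
shapiro.test(residuals(m2))

# Multicollinearity: variance inflation factor for each predictor,
# computed as 1 / (1 - R^2) from regressing it on the other predictors.
predictors <- c("wt", "hp", "qsec")
vif <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  r2 <- summary(lm(reformulate(others, response = v), data = dat))$r.squared
  1 / (1 - r2)
})
round(vif, 2)
```

A VIF of 1 means a predictor is uncorrelated with the others. Larger values indicate increasing overlap, and the threshold treated as problematic varies between textbooks.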
6d. Compare the new regression model to the model from Step 5, using
the appropriate statistical test. Report the results and interpret them. Is the
second regression model more informative? (5 points)
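For nested OLS models, one common choice of test is the F-test produced by anova(). The sketch below assumes the two models from the earlier stand-in examples (mtcars, with mpg as the outcome). It also assumes both models are fitted to exactly the same observations, which may not hold in your data if the new variables have missing values.

```r
# Stand-in data: replace with your chosen dataset and variables.
dat <- mtcars

# Restricted model (bivariate) and full model (with the two added variables).
m1 <- lm(mpg ~ wt, data = dat)
m2 <- lm(mpg ~ wt + hp + qsec, data = dat)

# F-test for nested models: do the added variables jointly improve the fit?
cmp <- anova(m1, m2)
cmp

# p-value of the F-test, to compare with the .05 cut-off point.
p_f <- cmp[2, "Pr(>F)"]
p_f

# Adjusted R-squared of both models, as a complementary summary.
c(m1 = summary(m1)$adj.r.squared, m2 = summary(m2)$adj.r.squared)
```

If the two models end up with different numbers of observations, anova() stops with an error. In that case, refit the smaller model on the subset of cases used by the larger one.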