PO687 End of Term Project

This document outlines the requirements for an end of term statistics project. Students must: 1) Select a dataset, identify an outcome and predictor variable, and formulate working and null hypotheses. 2) Describe the two variables with appropriate univariate statistics and visualizations. 3) Create a graph to illustrate the bivariate relationship and test the hypothesis with a t-test or non-parametric equivalent. 4) Test the hypothesis with bivariate regression and interpret the results. 5) Expand the analysis by including two additional variables, generating hypotheses, and running/interpreting a multiple regression model with diagnostics. Compare the new model to the initial bivariate regression.


PO687 End of term project

Dr Raluca Popp
December 7, 2020

The rationale behind the project:


• it will test all the stats skills you acquired this term, from formulating
hypotheses to visualising relationships, running statistical analyses, and
presenting and interpreting the results, as well as data management, such
as recoding variables where needed;
• you have some freedom over the analysis you will run: you have to pick
one of the three datasets available to you, and you get to pick the variables
you will use in the analysis;
• rather than telling you exactly what methods to apply, you will need to
think about the variables you are using and which are the appropriate
statistical techniques to test the relationship(s) between the variables you
chose;
• think about it as a miniature research project, but one in which you don’t
need a theory and literature review part. Treat this as practice for your
dissertation next year (if you choose to write one).
Your seminar leaders will not show you how to run the analysis on the three
datasets for the final project. Statistical analysis always works the same way,
following the same principles. If you learn which functions to run when, you
will be able to apply them to any dataset.

A word on R code:
• It is not mandatory to add your R code to the assignment, but it is
recommended. It does not count towards the word limit (which is not
strict anyway) and you will not be marked on it. However, it helps us
when marking the assignment.
• If you produce your document in Word, then you can add the code at the
end of the assignment.
• If you produce the assignment using RMarkdown, then you don’t need to
include the code at the end, as it is part of the document.

Formulate hypotheses
1. Pick a dataset among gss, nes and world. Inspect it, have a look at the
variables it contains and at the codebook. Select an outcome and a predictor
variable. These will be the central elements of your assignment. Remember
that the outcome variable needs to be interval, ratio or high-level ordinal - what
we call a continuous variable. Feel free to recode variables where you need to.
Formulate the working and the null hypotheses. (15 points)

Univariate statistics and visualisations


2. Describe the two variables. Create appropriate visualisations for each
variable, accompanied by the appropriate descriptive statistics (hint: it all
depends on the level of measurement). (15 points)
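
As an illustration of how the level of measurement drives the choice of summary and plot, here is a minimal sketch in base R. It uses R's built-in mtcars data as a stand-in; mtcars is not one of the course datasets, and the variables mpg and am (and the recoded am_f) are chosen here purely as examples.

```r
# Stand-in data: R's built-in mtcars, NOT one of the course datasets (gss, nes, world).
data(mtcars)

# Continuous variable: numeric summary, standard deviation and a histogram.
summary(mtcars$mpg)
sd(mtcars$mpg)
hist(mtcars$mpg,
     main = "Distribution of fuel efficiency",
     xlab = "Miles per gallon")

# Categorical variable: recode to a labelled factor, then report
# frequencies, proportions and a bar plot.
mtcars$am_f <- factor(mtcars$am, levels = c(0, 1),
                      labels = c("Automatic", "Manual"))
am_counts <- table(mtcars$am_f)
am_counts
prop.table(am_counts)
barplot(am_counts,
        main = "Transmission type",
        ylab = "Number of cars")
```

A mean and a histogram suit the continuous variable, while counts, proportions and a bar plot suit the categorical one.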

Visualise a bivariate relationship


3. Thinking about the type of variable you selected, create a graph that will
illustrate the relationship between your dependent and independent variables.
Remember that visualisations have to be nice to look at, represent the data
truthfully, be clear and informative. In other words, do not forget to add titles,
labels and so on. (15 points)
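
A minimal sketch of two common bivariate graphs in base R, again using the built-in mtcars data as a stand-in (not a course dataset). Which graph is appropriate depends on the types of your own variables; the variable choices below are illustrative assumptions.

```r
# Stand-in data: R's built-in mtcars, NOT one of the course datasets.
data(mtcars)
mtcars$am_f <- factor(mtcars$am, levels = c(0, 1),
                      labels = c("Automatic", "Manual"))

# Continuous outcome, categorical predictor: side-by-side boxplots.
bp <- boxplot(mpg ~ am_f, data = mtcars,
              main = "Fuel efficiency by transmission type",
              xlab = "Transmission",
              ylab = "Miles per gallon")

# Continuous outcome, continuous predictor: scatterplot with a fitted line.
plot(mtcars$wt, mtcars$mpg,
     main = "Fuel efficiency by car weight",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon")
abline(lm(mpg ~ wt, data = mtcars))
```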

Hypothesis testing with a t-test or a non-parametric test


4. Test the hypothesis you formulated in Step 1 using a t-test or a
non-parametric test, depending on which one is appropriate (hint: remember it
depends on whether the variable is normally distributed or not). Report the
test statistic and its associated p-value. Use the .05 cut-off point for statistical
significance and interpret the results. (15 points)
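
The decision described in this step can be sketched as follows. This is one possible approach, not the required one: it assumes a Shapiro-Wilk test within each group as the normality check, and the Wilcoxon rank-sum test as the non-parametric alternative. The data are R's built-in mtcars, used only as a stand-in for the course datasets.

```r
# Stand-in data: R's built-in mtcars, NOT one of the course datasets.
data(mtcars)
mtcars$am_f <- factor(mtcars$am, levels = c(0, 1),
                      labels = c("Automatic", "Manual"))

# Assumed normality check: Shapiro-Wilk p-value for the outcome in each group.
normality_p <- tapply(mtcars$mpg, mtcars$am_f,
                      function(x) shapiro.test(x)$p.value)
normality_p

if (all(normality_p > 0.05)) {
  # No evidence against normality: t-test (Welch's version by default in R).
  result <- t.test(mpg ~ am_f, data = mtcars)
} else {
  # Otherwise: Wilcoxon rank-sum test (normal approximation, as mpg has ties).
  result <- wilcox.test(mpg ~ am_f, data = mtcars, exact = FALSE)
}

# Report the test statistic and p-value, then compare with the .05 cut-off.
result$statistic
result$p.value
result$p.value < 0.05
```

Histograms or Q-Q plots of the outcome are an equally reasonable way to judge normality; the formal test is used here only to keep the sketch short.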

Bivariate regression
5. Test the hypothesis you formulated in Step 1 using a regression model.
Present the regression results in a table and interpret them. Use the .05 cut-off
point for statistical significance. (15 points)
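
A minimal sketch of a bivariate regression and a simple coefficient table in base R. The data (built-in mtcars) and the variables are stand-ins, not taken from the course datasets, and the rounded coefficient matrix is just one plain way of tabulating the results.

```r
# Stand-in data: R's built-in mtcars, NOT one of the course datasets.
data(mtcars)
mtcars$am_f <- factor(mtcars$am, levels = c(0, 1),
                      labels = c("Automatic", "Manual"))

# Bivariate OLS regression: outcome ~ predictor.
model1 <- lm(mpg ~ am_f, data = mtcars)
summary(model1)

# Coefficient table: estimate, standard error, t value and p-value.
coef_table <- round(summary(model1)$coefficients, 3)
coef_table

# Model fit, to report alongside the table.
summary(model1)$r.squared
```

With a two-category predictor, the slope is the difference in the mean outcome between the two groups, so the result should tell the same story as the test in Step 4.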

Multiple regression
6. Expand on the relationship you tested above, by choosing another two
variables that could improve your model. Feel free to recode variables.
6a. Formulate a hypothesis about the relationship between each new variable
and your outcome variable. (5 points)
6b. Present univariate analysis on the new variables (descriptive statistics
and visualisations). (5 points)
6c. Run a regression model that includes the new variables. Present the
regression results in a table and interpret them. Use the .05 cut-off point for
statistical significance. Run regression diagnostics for your model and discuss
whether your model meets the OLS assumptions. If it violates any assumptions,
you need to indicate how you would fix the issue. You don’t need to re-run the
model. (10 points)

6d. Compare the new regression model to the model from Step 5, using
the appropriate statistical test. Report the results and interpret them. Is the
second regression model more informative? (5 points)
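
Steps 6c and 6d can be sketched together in base R. The brief does not name the comparison test; the sketch assumes an F-test for nested models via anova(), which is a common choice when one model adds predictors to another. The data (built-in mtcars) and the added variables wt and hp are stand-ins, not taken from the course datasets.

```r
# Stand-in data: R's built-in mtcars, NOT one of the course datasets.
data(mtcars)
mtcars$am_f <- factor(mtcars$am, levels = c(0, 1),
                      labels = c("Automatic", "Manual"))

# Bivariate model from Step 5 and the expanded model with two more predictors.
model1 <- lm(mpg ~ am_f, data = mtcars)
model2 <- lm(mpg ~ am_f + wt + hp, data = mtcars)
round(summary(model2)$coefficients, 3)

# Diagnostics: the four standard base R plots (residuals vs fitted,
# normal Q-Q, scale-location, residuals vs leverage).
par(mfrow = c(2, 2))
plot(model2)
par(mfrow = c(1, 1))

# Assumed formal check of residual normality.
shapiro.test(residuals(model2))

# Nested-model comparison: does adding wt and hp significantly
# reduce the residual sum of squares?
comparison <- anova(model1, model2)
comparison

# Adjusted R-squared of each model, for a descriptive comparison.
c(model1 = summary(model1)$adj.r.squared,
  model2 = summary(model2)$adj.r.squared)
```

The F-test is only valid here because both models are fitted to the same observations and the first is nested in the second; if missing values differ across your variables, the two models would need to be fitted on the same subset of cases.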
