ProjectInstructions GradeRubric
Grade rubric
The following is the grading rubric for your research project. If you fail to fulfill a
bullet point below, you will lose one or more points, depending on how far you are from
fulfilling it. Full credit for the project is 100 points.
• Introduce your control variables. You must have at least 3 control variables. [1 point]
• Provide strong reasons for including the control variables in your model, giving the
reasons separately for each control variable. [8 points]
• Write your regression model in this form: Y = beta_0 + beta_1*X_1 + beta_2*X_2 + …,
substituting your own variables for Y and the X’s. [1 point]
• For each of the dependent, primary independent, and control variables, provide
numerical and graphical EDA. State the variable’s type (continuous or categorical) and
give appropriate numerical summaries (e.g., a frequency table for a categorical variable;
the mean, median, and standard deviation for a continuous one) and graphical summaries
(e.g., a barplot or histogram). [10 points]
• Provide numerical and graphical EDA for the bivariate (two-variable) relationships
between the dependent variable and the primary independent variable, and between the
dependent variable and each of the control variables. [10 points]
• Discussion of EDA results: 1) What implications do you draw from your EDA for your
hypothesis? [5 points] 2) What implications do you draw from your EDA for the
regression diagnostics? (e.g., do you see any evidence of possible violations of the
regression assumptions, or none?) [5 points]
5. What to include in the Analysis 2 – Hypothesis Test Results section? [40 points]
• Regression Diagnostics [15 points]
o Draw a residuals-versus-fitted-values plot and discuss what it shows about the
assumptions of linearity, random disturbances, and constant variance.
o Draw a Normal QQ plot and discuss the normality assumption.
o Test for multicollinearity and discuss the results.
o Based on the regression diagnostics, propose a solution if necessary, and discuss
what improvements it makes. (If your solution leads to a new model, you will need
to run regression diagnostics for the new model as well.)
• Interpretations [20 points]
o What does the coefficient of each independent variable (primary and control)
mean?
o Is each coefficient statistically significant at the 0.05 or 0.01 level?
o Are the effect sizes substantively large?
o Interpret the result of the overall significance (F) test and the Adjusted R^2.
• Conclusion [5 points]
o What do the analysis results suggest for your hypothesis?
o What are your study’s weaknesses or limitations, and how do they shape your plans
for future research?
• Your regression diagnostics may look surprisingly bad compared to some of the “nice”
examples you’ve seen elsewhere in the course. In many cases, even after you improve the
diagnostics (for instance, with transformations), they may still show various problems
or artifacts due to the nature of the data. If so, say so. Also report what other models
you tried and what their diagnostics looked like, and use that comparison to justify
your final choice: why was it the best on balance?
• Transformations will likely improve the diagnostics.
• On the other hand, too many transformations make the interpretations more
complicated; you will need to make your own decisions about balancing and
articulating these tradeoffs and justifying whatever final decisions you make.
• If your primary independent variable is not significant in a model you try, that is
fine for the project; it simply means the data provide no evidence supporting your
hypothesis. You might still want to try another model, though. You don’t have to stick
to one research question or hypothesis; change your hypothesis and model if you want or
need to.
• If none of the variables, primary or control, is significant in a model you try, the
model is essentially no better than random guessing and should not be used in
practice. Although this is still acceptable for the project, I recommend trying another
model that might be more interesting.
• For most models you might fit to most of these datasets, the Adjusted R^2 will be
surprisingly low compared to some of the “nice” examples you’ve seen elsewhere in
the course. That may simply reflect the nature of the relationship; it is not a sign of
a mistake or of an unjustified model. The primary goal is a model that is as well
justified as possible, on balance, and that has significant terms.
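The regression steps described above are fitting Y = beta_0 + beta_1*X_1 + …, inspecting residuals against fitted values, computing the Adjusted R^2, and checking multicollinearity. The Python sketch below shows a minimal version of them. It uses only the standard library so that it runs anywhere. The data are made up, and the helper names (`solve`, `ols`, `r_squared`, `adjusted_r_squared`, `vif`) are hypothetical. It computes the variance inflation factor, one common multicollinearity check, as VIF_j = 1/(1 - R_j^2). In practice a package such as statsmodels would also report the standard errors, t statistics, and p-values needed for the significance questions.

```python
# Minimal OLS sketch using only the standard library (no numpy/statsmodels).
# The data are made-up illustrative values, and every helper name is hypothetical.


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix (copy)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, xs):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk; xs holds the predictor columns."""
    rows = [[1.0] + [x[i] for x in xs] for i in range(len(y))]  # design matrix with intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return beta, fitted, residuals


def r_squared(y, residuals):
    """R^2 = 1 - SSE/SST."""
    mean_y = sum(y) / len(y)
    sst = sum((v - mean_y) ** 2 for v in y)
    sse = sum(e ** 2 for e in residuals)
    return 1 - sse / sst


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1) for k predictors plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def vif(xs, j):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing predictor j on the others."""
    others = [x for i, x in enumerate(xs) if i != j]
    _, _, res = ols(xs[j], others)
    return 1 / (1 - r_squared(xs[j], res))


# Hypothetical data: y with two predictors; x2 deliberately tracks x1 closely.
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 15.2, 16.8, 19.1, 21.0]
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x2 = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
xs = [x1, x2]

beta, fitted, residuals = ols(y, xs)
r2 = r_squared(y, residuals)
print("coefficients (b0, b1, b2):", [round(b, 3) for b in beta])
print("Adjusted R^2:", round(adjusted_r_squared(r2, len(y), len(xs)), 4))

# Residuals-versus-fitted pairs: plot these and look for curvature (nonlinearity)
# or a funnel shape (non-constant variance).
for f, e in zip(fitted, residuals):
    print(f"fitted={f:7.3f}  residual={e:7.3f}")

print("VIF:", [round(vif(xs, j), 2) for j in range(len(xs))])
```

Here `x2` was built to move almost in step with `x1`, so both VIFs come out at about 8.5 (exactly 1089/128 for these values). A common rule of thumb would flag any VIF above 5 or 10 as notable multicollinearity.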
III. Useful advice for your writing