Stat Exam
Standard Deviation:
● Definition: Measures how spread out data is around the average.
● Ex: In a class, if most test scores are close to the average (80), there's low standard deviation.
Z-Score:
● Definition: Indicates how far a data point is from the average, helping compare scores.
● Ex: If your test score of 85 is one standard deviation above the average, your Z score = 1.
Error:
● Definition: The difference between predicted and actual values in a model.
● Ex: The difference between the predicted (85) and actual (80) test score is the error (5).
Variables:
● Definition: Discrete (separate values), Continuous (range of values), Categorical (categories).
● Ex: Discrete - Count of books (1, 2, 3), Continuous - Height (160cm, 165cm), Categorical - Colors (Red, Blue).
❖ Parameters: Numbers in the model that tell us about the relationship between the predictor(s) and the outcome in the real world (the mean is a simple example of a parameter).
- A model is described by parameters (bs), and all parameters are estimated. Estimated because we work with samples; we don't have access to entire populations. We use NHST to test the significance of parameters.
● Standard Error - Tells us how well a parameter estimate reflects the value in the population. We need the standard error to compute confidence intervals and significance tests.
> Mean - The sum of all scores divided by the number of scores. It is also the value from which the scores deviate least when the deviations are squared (i.e., it minimizes the squared error), though some error always remains.
Summary: We use the mean as a model, but there is always some error. Errors are important for assessing whether the model is a good fit.
- The mean is a model of what happens in the real world: the typical score. It is not a perfect
representation of the data.
- How do we assess how well the mean represents reality? A perfect fit would require every data point to be exactly the same (all scores identical), which never happens.
➢ Calculating Deviation/Error:
Deviation: Difference between the mean and an actual data point. What was observed - what
the model predicts.
➢ Variance (s²): The sum of squares measures the overall variability (fit), but it depends on the number of scores.
- Calculate the average variability by dividing the sum of squares by the degrees of freedom: s² = SS/(n − 1).
- Because the variance is measured in squared units, we take its square root to return to the original units. That square root is the Standard Deviation.
- Standard deviation is simply a measure of how well (or badly) the mean represents the data.
Recap: The sum of squares is the total error, the variance is the average error, and the SD is the average error expressed in the original units. Think of the sum of squares as how well the model represents/fits the observed data.
- Big error = too spread out, inaccurate | Small error = more likely to represent the population.
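A minimal NumPy sketch of this recap, using five made-up test scores (the numbers and the use of Python are illustrative; the course itself works in SPSS):

```python
# Mean as a model, deviations as error, SS -> variance -> SD (hypothetical scores).
import numpy as np

scores = np.array([72, 80, 81, 85, 92])

mean = scores.mean()                  # the model: the typical score
deviations = scores - mean            # observed minus what the model predicts
ss = np.sum(deviations ** 2)          # sum of squares = total error
variance = ss / (len(scores) - 1)     # s^2 = SS / (n - 1), the average error
sd = np.sqrt(variance)                # standard deviation, back in original units

print(mean, ss, variance, sd)         # sd matches np.std(scores, ddof=1)
```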
❖ Confidence Intervals:
- All parameters have an associated probability distribution, which we use to judge whether the value we obtained is likely or unlikely. We work out the probability of getting a value at least as large as the one we have if a null hypothesis of interest is true (e.g., that b = 0, or that b1 = b2).
Ex: Would having an invisibility cloak make you more mischievous? Group 1 has a cloak, group
2 does not (t-test).
- The mean difference is the same in both studies, but group 1's p-value is less than .05 (significant) while group 2's is greater than .05 (not significant).
Why? Significance depends on sample size: group 1 has n = 200 people, group 2 has n = 10.
- Conversely, even if the mean difference is close to 0, the p-value can be .02 (significant). Why? Because the sample is huge: 1 million people.
- Limitations: p-values can be misleading: with a large sample, small effects can seem important, and with a small sample, large effects can be deemed non-significant. Use effect sizes instead.
❖ [Effect Size:]
Cohen's d: The difference between the means of the experimental and control groups divided by the standard deviation of the control group.
- d = (x̅₁(exp) − x̅₂(con)) / s(con)
Recap: Effect sizes give a more accurate picture of the size of an effect; significance testing depends on the sample.
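A hedged sketch of the d formula above (both groups' scores are invented; dividing by the control SD follows the definition in these notes, though a pooled SD is a common alternative):

```python
# Cohen's d = (mean of experimental - mean of control) / SD of control.
import numpy as np

experimental = np.array([84, 88, 90, 79, 93])   # hypothetical cloak group
control = np.array([75, 80, 78, 82, 77])        # hypothetical no-cloak group

d = (experimental.mean() - control.mean()) / control.std(ddof=1)
print(f"Cohen's d = {d:.2f}")   # rough benchmarks: ~0.2 small, ~0.5 medium, ~0.8 large
```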
=-=-=-=
P-Value - Measures how likely it is that any observed difference between groups is due to
chance. Ex: Testing a new drug - if p-value is low, it suggests the drug might be effective.
● Strongly affected by sample size: a large sample can make a small effect significant, and a small sample can make a large effect non-significant. There are no absolutes in NHST.
● The p-value is the probability of getting a statistic at least as large as the one observed, relative to all possible values of the null, from an 'infinite number of identical replications' of the experiment. P-values tell us nothing about the importance of an effect.
P-Hacking - Selectively reporting significant p-values, e.g., trying multiple analyses and reporting only the one that yields significance. HARKing - Hypothesizing After the Results are Known: presenting a hypothesis made after data analysis as if it had been made in advance.
❖ [EMBER/SREMB:]
Effect Size (third): Measures like Cohen's d help understand the strength of effects. Ex: Even with a low p-value, consider whether the drug's effect is practically meaningful for patients.
Meta-analysis (fourth): Combines results from different studies on the same topic. Ex: Combining existing knowledge with new data to refine conclusions about a treatment.
=-=-=-=
Week 4: Data and Graphs
Colour - Many graphs distinguish groups using different colours | Population Pyramid - A histogram split by group.
=-=-=-=
Week 5: Beast of Bias
Assumptions of Parametric Tests: Tests assume linearity, normality, and equal variance.
● Ex: Testing a new teaching method - assuming linearity, normality, and equal variance.
Residuals & Outliers: Differences between observed and predicted values; outliers can skew results.
● Ex: Predicting exam scores - the differences between predicted and actual scores are residuals; someone who studied exceptionally hard might be an outlier.
● Ex: Skewness - how asymmetrical your data is (like tall people in a room). Kurtosis - how thick or thin the tails of your data are.
Central Limit Theorem: With a big enough sample, averages tend to be normally distributed.
● Ex: Asking a few friends about their favorite color might vary, but asking many gives a more consistent average.
● Ex: Estimating the average height of people in your town by asking a smaller group multiple times and averaging the results.
Testing for Normality and Homogeneity:
● Definition: Tests like K-S and Levene's check for normality and equal variance.
● Ex: Using tests like K-S and Levene's to check if data behaves well with statistical methods.
Linearity and Additivity - The relationship between predictor(s) and outcome is linear. Curved
data is not linear.
Errors: The differences between the values predicted by the population model and the observed values of the outcome variable. Unlike residuals, these values CANNOT be observed.
❖ [Outliers:]
Influential Cases - With the outlier present, b is positive; with it removed, b drops to 0. A quicker way is Cook's Distance: it measures the influence of a single case on the model as a whole. Absolute values > 1 may be a concern.
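A sketch of checking Cook's distance with statsmodels instead of SPSS (the x/y data and the planted outlier are invented for illustration):

```python
# Flag cases with Cook's distance > 1 as potentially influencing the model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
x = rng.normal(size=30)
y = 2 * x + rng.normal(size=30)
y[0] = 15                                   # plant one influential outlier

model = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d = model.get_influence().cooks_distance[0]
print(np.where(cooks_d > 1))                # indices of potentially influential cases
```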
1) The population needs to be normally distributed, 2) the sampling distribution of the parameter being estimated (the mean or b) needs to be normal, and 3) for parameter estimates to be optimal, the errors (residuals) need to be normal.
Assumptions underlying the use of parametric tests (based on the normal distribution):
Some features of the data should be normally distributed. The samples being tested should
have approximately equal variances. The data should be at least interval level.
❖ [Central Limit Theorem:] Because of CLT, if the sample is large enough you can
ignore the assumption of normality.
➢ What Normality Affects:
● Significance testing, via the sampling distribution, when the sample size is small.
● Parameter estimates, when the residuals/errors are not normal.
● Though we would need population distributions to be sure of normality, we use sample distributions because of limited access to population data. This is why we rely on the CLT (see the simulation sketch below).
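A quick simulation of the CLT point (all numbers arbitrary): the population here is heavily skewed, yet the means of repeated samples pile up into a roughly normal shape.

```python
# Sample means are approximately normal even when the population is not.
import numpy as np

rng = np.random.default_rng(1)
population = rng.exponential(scale=2.0, size=100_000)   # very non-normal population

sample_means = [rng.choice(population, size=50).mean() for _ in range(2_000)]
print(np.mean(sample_means), np.std(sample_means))
# The 2,000 means form an approximately normal distribution, which is why a
# large enough sample lets us relax the normality assumption.
```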
➢ Detecting Normality:
SPSS Test: Kolmogorov-Smirnov (K-S test). p < .05 means the distribution is significantly different from normal; p > .05 means it does not differ significantly from a normal distribution.
➢ Homogeneity of Variance:
Homogeneity: The assumption that the spread of outcome scores is roughly equal at different points on the predictor variable.
➢ Detecting Homogeneity:
Limitations: There are good reasons not to use Levene’s test or the variance ratio. In large
samples they can be significant when group variances are similar, and in small samples they
can be non-significant when group variances are very different.
Limitation: While heteroscedasticity does not cause bias in the coefficient estimates, it does
make them less precise.
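Outside SPSS, the same checks can be run with scipy.stats; a hedged sketch with two made-up groups (Shapiro-Wilk and K-S for normality, Levene's for equal variances):

```python
# Normality and homogeneity checks on hypothetical data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group1 = rng.normal(loc=80, scale=5, size=40)
group2 = rng.normal(loc=75, scale=9, size=40)

print(stats.shapiro(group1))                    # p < .05 -> significantly non-normal
print(stats.kstest(group1, "norm",
                   args=(group1.mean(), group1.std(ddof=1))))  # K-S vs a fitted normal
print(stats.levene(group1, group2))             # p < .05 -> unequal variances
```

The same sample-size caveat from the notes applies: in large samples these tests flag trivial deviations, and in small samples they miss real ones.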
=-=-=-=
❖ [Parametric Tests:]
T-Test: Hypothesis test that compares means (via the variance between and within groups); p > .05 is not significant, so the means are treated as the same.
Paired T-Test (one group): Compares the means of two measurements taken from the same individual, object, or related units.
Unpaired T-Test (two groups): Compares the averages/means of two independent or unrelated groups to determine if there is a significant difference between the two.
One-Sample T-Test: Determines whether an unknown population mean is different from a specific value. Ex: Find out if the screws your company produces really weigh 10 grams on average. To test this, weigh 50 screws and compare the actual weight with the weight they should have (10 grams).
Independent-Samples T-Test: Comparison of means between two independent groups (use a paired t-test for paired data). As the t-test is a parametric test, samples should meet certain preconditions, such as normality, equal variances, and independence.
One-Way ANOVA (more than two groups): Analysis of variance. Compares the means of two or more independent groups to determine if there is evidence that the population means are significantly different. Ex: Determine if the Mind Over Matter coping strategy was more effective at reducing anxiety than deep breathing exercises.
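A minimal scipy.stats sketch of these parametric tests (all data invented; the course runs them in SPSS):

```python
# Paired, unpaired, one-sample t-tests and a one-way ANOVA on hypothetical data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
before = rng.normal(60, 10, 30)
after = before + rng.normal(5, 5, 30)          # same people measured twice
group_a = rng.normal(50, 8, 25)
group_b = rng.normal(55, 8, 25)
group_c = rng.normal(58, 8, 25)
screws = rng.normal(10.1, 0.4, 50)             # 50 screws, nominal weight 10 g

print(stats.ttest_rel(before, after))          # paired t-test (one group, two measurements)
print(stats.ttest_ind(group_a, group_b))       # unpaired/independent t-test (two groups)
print(stats.ttest_1samp(screws, popmean=10))   # one-sample t-test against 10 g
print(stats.f_oneway(group_a, group_b, group_c))  # one-way ANOVA (three groups)
```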
❖ [Non-Parametric Tests:]
Mann-Whitney Test: Compares two samples from the same population; tests whether the two groups differ in central tendency. Two independent samples with at least ordinal-scaled characteristics need to be available; the variables do not have to follow any particular distribution.
● The values of the mean ranks tell you how the groups differ (the group with the highest scores has the highest mean rank).
Null Hypothesis: No difference (central tendency) between the two groups in the population.
Alternative Hypothesis: There is a difference (central tendency) between two groups in the
population.
Kruskal-Wallis H-Test: Compares several conditions when different participants take part in each condition and the resulting data have unusual cases or violate any assumption. If you predict that the medians will increase or decrease across your groups in a specific order, then test this with the Jonckheere–Terpstra test.
➢ Assumption:
● One independent variable with two or more levels (independent groups). Test is used when you
have three or more levels.
● All groups should have the same shape distribution, SPSS tests this condition.
➢ Hypothesis:
● The Kruskal-Wallis test will tell you if there is a significant difference between groups. It won't tell you WHICH groups are different; that requires a post hoc test.
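A hedged scipy.stats sketch of these non-parametric tests (the rating data below is invented):

```python
# Mann-Whitney U for two independent groups, Kruskal-Wallis H for three or more.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group1 = rng.integers(1, 10, 20)   # e.g., ratings on a 1-9 ordinal scale
group2 = rng.integers(3, 10, 20)
group3 = rng.integers(4, 10, 20)

print(stats.mannwhitneyu(group1, group2))     # two independent groups
print(stats.kruskal(group1, group2, group3))  # three or more independent groups
# A significant Kruskal-Wallis result still needs post hoc pairwise comparisons
# to say WHICH groups differ.
```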
❖ [Friedman’s ANOVA:]
➢ Hypothesis:
❖ [Spearman’s Coefficient:]
When do we use a non-parametric test? If the median more accurately represents the center of the distribution of your data, even if the sample size is large.
=-=-=-=
Week 7: Correlation and Chi-Squares
Kendall’s Correlation:
● Ex: Ranking friends based on their helpfulness and seeing how similar your rankings
are.
● Ex: Exploring how both study hours and sleep affect exam scores while considering
other factors.
Chi-Square: Used to show whether or not there is a relationship between two categorical
variables. It can also be used to test if a number of outcomes are occurring in equal frequencies
or not, or conform to a known distribution.
● Ex: When rolling a die, there are six possible outcomes. After rolling a die hundreds of
times, you could tabulate the number of times each outcome occurred and use the
chi-square statistic to test whether these outcomes were occurring in basically equal
frequencies or not (e.g., to test whether the die is weighted).
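The die example maps onto a chi-square goodness-of-fit test; a sketch with invented counts (scipy assumed, equal expected frequencies by default):

```python
# Are the six faces of the die occurring with roughly equal frequency?
from scipy import stats

observed = [95, 102, 110, 98, 104, 91]   # hypothetical counts of faces 1-6 over 600 rolls
print(stats.chisquare(observed))         # expected defaults to equal frequencies
# A small p-value would suggest the die is weighted; a large one is consistent
# with the six outcomes occurring equally often.
```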
When to Use Different Tests: Choose Pearson, Spearman, Kendall, or Chi-square based on
data characteristics.
● Ex: Using Pearson for linear relationships between continuous variables, Spearman for non-linear or ordinal relationships, Kendall for rankings (especially in small samples), and Chi-square for independence between categories.
❖ [Correlation:]
The correlation coefficient is a commonly used measure of the size of an effect: values of ±0.1
represent a small effect, ±0.3 is a medium effect and ±0.5 is a large effect. However, interpret
the size of correlation within the context of the research you’ve done rather than blindly following
these benchmarks.
❖ [Correlations:]
Spearman’s correlation coefficient, rs - A non-parametric statistic and requires only ordinal
data for both variables.
Kendall’s correlation coefficient, τ - Like Spearman’s rs but probably better for small samples.
Partial correlation - Quantifies the relationship between two variables while accounting for the
effects of a third variable on both variables in the original correlation.
Semi-partial correlation - Quantifies the relationship between two variables while accounting
for the effects of a third variable on only one of the variables in the original correlation.
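A sketch of these coefficients with scipy.stats, plus a residual-based partial correlation that mirrors the definition above (the variable names and data are invented):

```python
# Pearson, Spearman, Kendall, and a partial correlation controlling for a third variable.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
study_hours = rng.normal(10, 3, 100)
sleep_hours = rng.normal(7, 1, 100)
exam_score = 3 * study_hours + 2 * sleep_hours + rng.normal(0, 5, 100)

print(stats.pearsonr(study_hours, exam_score))    # linear relationship
print(stats.spearmanr(study_hours, exam_score))   # rank-based (ordinal data)
print(stats.kendalltau(study_hours, exam_score))  # rank-based, better for small samples

# Partial correlation of study_hours and exam_score controlling for sleep_hours:
# correlate the residuals left after regressing each variable on sleep_hours.
def residuals(y, x):
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

print(stats.pearsonr(residuals(study_hours, sleep_hours),
                     residuals(exam_score, sleep_hours)))
```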
=-=-=-=
Week 8: The Linear Model
● Ex: Predicting weight based on height using a straight line that best fits the data.
● Ex: Plotting heights and weights and drawing a line that represents the average
relationship.
Model Parameters, Fit, and Multicollinearity: Parameters are intercept and slope; fit is
measured by R-squared and F-test.
● Ex: Parameters are like the slope and intercept of your line; R-squared tells you how
well your line fits the data. Be cautious with multicollinearity, where factors are too
related, making it hard to figure out each one's effect.
Define: A way of predicting values of one variable from another based on a model that
describes a straight line. This line is the line that best summarizes the pattern of the data.
➢ To assess how well the model fits the data use:
R2 - Tells us how much variance is explained by the model compared to how much variance
there is to explain in the first place. It is the proportion of variance in the outcome variable that is
shared by the predictor variable.
F - Tells us how much variability the model can explain relative to how much it can't explain (i.e., it's the ratio of how good the model is compared to how bad it is).
The b-value - Tells us the gradient of the regression line and the strength of the relationship
between a predictor and the outcome variable. If it is significant (Sig. < 0.05 in the SPSS output)
then the predictor variable significantly predicts the outcome variable.
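The notes describe the SPSS output; as a rough statsmodels equivalent, a sketch with invented height/weight data showing where R², F, and the b-values live:

```python
# Fit a simple linear model and read off R-squared, F, and the b-values.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
height = rng.normal(170, 10, 100)
weight = 0.9 * height - 80 + rng.normal(0, 5, 100)

model = sm.OLS(weight, sm.add_constant(height)).fit()
print(model.rsquared)                # proportion of variance in the outcome explained
print(model.fvalue, model.f_pvalue)  # model fit: explained vs unexplained variability
print(model.params)                  # b-values: intercept and gradient of the line
print(model.pvalues)                 # Sig. < 0.05 -> predictor significantly predicts the outcome
```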
❖ [Descriptive Statistics:]
Use the descriptive statistics to check the correlation matrix for multicollinearity; that is,
predictors that correlate too highly with each other, r > 0.9.
The fit of the linear model can be assessed using the Model Summary and ANOVA tables from
SPSS.
If you have done a hierarchical regression, assess the improvement of the model at each stage
by looking at the change in R2 and whether it is significant (values less than 0.05 in the column
labeled Sig. F Change).
The F-test tells us whether the model is a significant fit to the data overall (look for values less
than 0.05 in the column labelled Sig.).
❖ [Coefficients:]
The individual contribution of variables to the regression model can be found in the Coefficients
table. If you have done a hierarchical regression then look at the values for the final model.
You can see whether each predictor variable has made a significant contribution to predicting
the outcome by looking at the column labelled Sig. (values less than 0.05 are significant).
The standardized beta values tell you the importance of each predictor (bigger absolute value =
more important).
The tolerance and VIF values will also come in handy later, so make a note of them.
❖ [Multicollinearity:]
To check for multicollinearity, use the VIF values from the table labelled Coefficients.
If these values are less than 10 then that indicates there probably isn’t cause for concern.
If you take the average of VIF values, and it is not substantially greater than 1, then there’s also
no cause for concern.
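Outside SPSS, the same VIF check can be done with statsmodels; a sketch with invented predictors, one of which is deliberately near-duplicated to trigger a high VIF:

```python
# VIF per predictor; values above 10 (or an average well above 1) are a worry.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
X = pd.DataFrame({
    "study_hours": rng.normal(10, 3, 200),
    "sleep_hours": rng.normal(7, 1, 200),
})
X["revision_hours"] = X["study_hours"] * 0.95 + rng.normal(0, 0.5, 200)  # highly correlated

X_const = sm.add_constant(X)
vifs = [variance_inflation_factor(X_const.values, i) for i in range(1, X_const.shape[1])]
print(dict(zip(X.columns, vifs)))
```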
❖ [Residuals:]
● Look at standardized residuals and check that no more than 5% of cases have absolute values
above 2, and that no more than about 1% have absolute values above 2.5. Any case with a
value above about 3 could be an outlier.
● Look in the data editor for the values of Cook’s distance: any value above 1 indicates a case
that might be influencing the model.
Calculate the average leverage and look for values greater than twice or three times this
average value.
> Mahalanobis distance- A crude check is to look for values above 25 in large samples (500)
and values above15 in smaller samples (100). However, Barnett and Lewis (1978) should be
consulted for more refined guidelines.
Calculate the upper and lower limit of acceptable values for the covariance ratio, CVR. Cases
that have a CVR that fall outside these limits may be problematic.
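A sketch of these casewise diagnostics from a statsmodels fit (model and data invented; SPSS reports the same quantities in its casewise output):

```python
# Standardized residuals, Cook's distance, and leverage checks on a hypothetical model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
x = rng.normal(size=100)
y = 1.5 * x + rng.normal(size=100)
model = sm.OLS(y, sm.add_constant(x)).fit()

influence = model.get_influence()
std_resid = influence.resid_studentized_internal   # standardized residuals
leverage = influence.hat_matrix_diag               # leverage values
cooks_d = influence.cooks_distance[0]

print(np.mean(np.abs(std_resid) > 2))     # should be no more than about 5% of cases
print(np.mean(np.abs(std_resid) > 2.5))   # should be no more than about 1% of cases
print(np.where(np.abs(std_resid) > 3))    # possible outliers
print(np.where(cooks_d > 1))              # cases that might be influencing the model
avg_leverage = leverage.mean()            # equals (k + 1)/n
print(np.where(leverage > 2 * avg_leverage))  # flag values above 2-3x the average
```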
❖ [Model Assumptions:]
Look at the graph of ZRESID* plotted against ZPRED*. If it looks like a random array of dots
then this is good. If the dots get more or less spread out over the graph (look like a funnel) then
the assumption of homogeneity of variance is probably unrealistic. If the dots have a pattern to
them (i.e., a curved shape) then the assumption of linearity is probably not true. If the dots seem
to have a pattern and are more spread out at some points on the plot than others then this could
reflect violations of both homogeneity of variance and linearity. Any of these scenarios puts the
validity of your model into question. Repeat the above for all partial plots too.
Look at the histogram and P-P plot. If the histogram looks like a normal distribution (and the P-P
plot looks like a diagonal line), then all is well. If the histogram looks non-normal and the P-P
plot looks like a wiggly snake curving around a diagonal line then things are less good. Be
warned, though: distributions can look very non-normal in small samples even when they are
normal.
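A rough matplotlib/statsmodels sketch of these assumption plots (model and data invented; a normal probability plot stands in for SPSS's P-P plot):

```python
# ZRESID vs ZPRED scatter, histogram of residuals, and a normal probability plot.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(9)
x = rng.normal(size=200)
y = 2 * x + rng.normal(size=200)
model = sm.OLS(y, sm.add_constant(x)).fit()

zresid = model.get_influence().resid_studentized_internal
zpred = (model.fittedvalues - model.fittedvalues.mean()) / model.fittedvalues.std()

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
axes[0].scatter(zpred, zresid)     # want a random cloud: no funnel, no curve
axes[0].axhline(0)
axes[1].hist(zresid, bins=20)      # want a roughly normal shape
stats.probplot(zresid, dist="norm", plot=axes[2])  # want points along the diagonal
plt.show()
```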