
Quantitative Research Method (KH4013SPS)

Tests/Due: stats test Nov 26; report due 22 December

Week 2: The SPINE of Statistics

Outcome = Model + Error:

● Definition: Results come from a model's prediction plus inherent mistakes or randomness.
● Ex: Predicting your exam score using a study-hours model might give 85, but the actual score is 80 due to unforeseen factors.

Central Tendency:

● Definition: Finding the middle or typical value in a set (mean, median, mode).
● Ex: The average (central tendency) test score of a class with scores 75, 80, 85 is a mean of 80.

Standard Deviation:

● Definition: Measures how spread out the data is around the average.
● Ex: In a class, if most test scores are close to the average (80), there's low standard deviation.

Z Scores:

● Definition: Indicates how far a data point is from the average, in standard-deviation units, helping compare scores.
● Ex: If your test score of 85 is one standard deviation above the average, your z score is 1.

Error:

● Definition: The difference between predicted and actual values in a model.
● Ex: The difference between the predicted (85) and actual (80) test score is the error (5).

Variables:

● Definition: Discrete (separate values), Continuous (range of values), Categorical (categories).
● Ex: Discrete - count of books (1, 2, 3); Continuous - height (160 cm, 165 cm); Categorical - colours (Red, Blue).

Variance:

● Definition: A measure showing how spread out values are.
● Ex: Test scores of 70, 75, 80 have higher variance than 75, 75, 75.

Parameters: (bs)

- The numbers that tell us the relationship between the predictor and the outcome.
- A model is described by parameters (bs), and all parameters are estimated. Estimated because we're using samples, since we don't have access to entire populations.
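A minimal Python sketch of the z-score idea from the table above; the standard deviation of 5 is an assumed value chosen so that the example score of 85 sits exactly one SD above the mean of 80:

```python
# z score: how many standard deviations a score sits from the mean.
def z_score(x, mean, sd):
    return (x - mean) / sd

# Exam-score example from the table: score 85, class mean 80.
# The SD of 5 is an assumed value, not given in the notes.
print(z_score(85, 80, 5))  # 1.0
```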
➢ SPINE of Stats:

Important Equation: Outcome = Model + Error

Definition: Results come from a model's prediction plus inherent mistakes or randomness.

Ex: Predicting your exam score using a study-hours model might give 85, but the actual score is 80 because of unknown errors.

❖ Parameters: The numbers tell us the relationship between the predictor (mean) and
outcome (real world).

- A model is described by parameters (bs), and all parameters are estimated. Estimated
because we're using samples since we don't have access to entire populations. We use NHST
to test significance for parameters.

- Intervals (of a confidence variety) - We construct confidence intervals around them.

● Standard Error - Tells us how well the parameter estimates the value in a population. To
compute confidence intervals/significance tests we need the standard error.

> Mean - The sum of all scores, divided by the number of scores. It is also the value from which the scores deviate least in the squared-error sense (the mean minimizes the sum of squared deviations).

Ex: The numbers of friends that five statistics lecturers have are 1, 3, 4, 3, 2.

Add them: 1 + 3 + 4 + 3 + 2 = 13 → divide by the number of scores, n: 13/5 = 2.6

Prediction: the model says the average statistics lecturer has about 3 friends (2.6 rounded), but a given lecturer actually has, say, 2, so the prediction is wrong by some amount.

Summary: We use the mean as a model but there is always some kind of error. Errors are
important to make an assessment on the model being a good fit.

❖ Measuring the "fit" of the model:

- The mean is a model of what happens in the real world: the typical score. It is not a perfect
representation of the data.

- How do we assess how well the mean represents reality? A perfect fit would mean every data point is exactly the same (all scores identical), which never happens.
➢ Calculating Deviation/Error:

Deviation: Difference between the mean and an actual data point. What was observed - what
the model predicts.

Ex: In the plot, the black line is the predicted mean, the black dots are the observed scores, and the red lines are the deviations/errors.

- We can't just add the deviations, because some are positive and some are negative, so we square them.

- Adding the squared deviations gives us the Sum of Squared Errors (SS).

SS Formula: SS = ∑(x - x̅)² = 5.20 (for the friends data above)

➢ Variance (s²): The sum of squares measures the overall variability (fit), but it depends on the number of scores.

- Calculate the average variability by dividing by the degrees of freedom (n - 1): s² = SS/(n - 1)

- Because the variance is measured in squared units, we take its square root; this gives the Standard Deviation.

- The standard deviation is a measure of whether the mean is or isn't representative of the data.

Recap: The sum of squares is the total error, the variance is the average error, and the SD is the average error expressed in the original units. Think of the sum of squares as how well the model represents/fits the observed data.
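A minimal Python sketch of this recap, using the lecturers' friends data from the mean example (1, 3, 4, 3, 2); it reproduces the SS = 5.20 quoted above:

```python
# Worked example with the "friends of statistics lecturers" scores.
scores = [1, 3, 4, 3, 2]

n = len(scores)
mean = sum(scores) / n                       # 13 / 5 = 2.6
deviations = [x - mean for x in scores]      # observed - model
ss = sum(d ** 2 for d in deviations)         # sum of squared errors = 5.2
variance = ss / (n - 1)                      # average error = 1.3
sd = variance ** 0.5                         # ~1.14, back in the original units

print(mean, ss, variance, round(sd, 2))
```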

➢ [Standard Error:] Tells us whether the sample estimate is representative of the population. It is the standard deviation of the sampling distribution of the estimate (how spread out the estimates from different samples would be).

- Big error = estimates too spread out, inaccurate | Small error = more likely to represent the population.
❖ Confidence Intervals:

- Take your estimate (mean/regression parameter) and add a certain number of standard errors to either side; this gives you an interval estimate.

- Gives you a range of values; 95% of intervals constructed this way contain the population value.

- Looking at the confidence intervals from different groups tells us whether they come from the same population or not.
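A minimal Python sketch of that recipe (estimate ± about two standard errors), on a small hypothetical sample of exam scores invented for illustration:

```python
import math

# Hypothetical exam scores, used only to illustrate the recipe:
# 95% CI ≈ estimate ± 1.96 standard errors.
scores = [72, 80, 85, 78, 90, 76, 84, 81]

n = len(scores)
mean = sum(scores) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
se = sd / math.sqrt(n)                      # standard error of the mean

# (for a small sample like this a t critical value is more accurate than 1.96)
lower, upper = mean - 1.96 * se, mean + 1.96 * se
print(f"mean = {mean:.2f}, SE = {se:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```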

❖ [Null Hypothesis Significance Testing (NHST):]

- All parameters have an associated probability distribution. We use them to determine if the
probability is likely/unlikely. We work out the prob of getting at least the value we have if a null
hypothesis of interest is true (ex: if b = 0, or b1 = b2).

- If p < .05, a value at least this large would be unlikely if the null hypothesis were true, so we call it a significant effect.

Ex: Would having an invisibility cloak make you more mischievous? Group 1 has a cloak, group 2 does not (t-test).

- The mean difference is the same in two studies, but study 1's p-value is less than .05, so it is significant, while study 2's p-value is bigger than .05, so it isn't significant. Why? Significance depends on the sample size: study 1 has n = 200 people, study 2 has n = 10.

- Conversely, a mean difference of essentially 0 can still give a p-value of .02 (significant). Why? Because the sample is huge (1 million people).

- Limitations: Can be misleading because if you have a larger sample size you can have small
effects that seem to be important and vice versa for small groups that are deemed non
significant. Use effect size instead.
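A minimal Python (scipy) sketch of this sample-size dependence, using simulated, hypothetical "mischief" scores rather than the lecture's actual data; the underlying mean difference is the same in every run, only n changes:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical "cloak" vs "no cloak" scores: the true mean difference
# is the same small amount each time; only the group size changes.
for n in (10, 200, 10_000):
    cloak = rng.normal(loc=5.2, scale=2.0, size=n)
    no_cloak = rng.normal(loc=5.0, scale=2.0, size=n)
    t, p = stats.ttest_ind(cloak, no_cloak)
    print(f"n per group = {n:>6}: t = {t:5.2f}, p = {p:.4f}")

# Typically only the larger samples reach p < .05, even though the
# underlying effect (a 0.2-point difference) is identical throughout.
```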

❖ [Effect Size:]

- Expresses an effect in a standardized way.

Cohen's D: Difference between the means of the experimental and control group divided by the
standard deviation of the control group.

- d = (x̅₁(exp) - x̅₂(con)) / s(con)

- 0.2 = small, 0.5 = medium, 0.8 = large

Ex: d = (8.05 - 5.84) / 2.65 = 0.83


So now with the previous examples: n=200, d=0.88 | n=10, d=0.96 | n=1m, d=0.00. Shows
that this method is less misleading than significance testing.
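A quick Python check of the worked example above:

```python
# Cohen's d for the worked example: experimental mean 8.05,
# control mean 5.84, control-group SD 2.65.
def cohens_d(mean_exp, mean_con, sd_con):
    return (mean_exp - mean_con) / sd_con

d = cohens_d(8.05, 5.84, 2.65)
print(round(d, 2))  # 0.83 -> a large effect by Cohen's benchmarks
```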

Recap: Effect sizes give an accurate representation, significance testing is dependent on the
sample.
=-=-=-=

Week 3: The Phoenix of Statistics

❖ [NHST and P Value:]

P-Value - Measures how likely it is that any observed difference between groups is due to
chance. Ex: Testing a new drug - if p-value is low, it suggests the drug might be effective.

● Determined by sample size. A large sample can make a small effect size
significant/small can be insignificant. There are no absolutes in NHST.

● P-value is the probability of getting a statistic at least as large as the one observed
relative to all possible values of null from an 'infinite number of identical replications' of
the experiment. They tell us nothing about the importance.

Limitation: Can be manipulated, which feeds publication bias (manipulation techniques include researcher degrees of freedom, p-hacking and HARKing). Not the best indicator of importance.

P-Hacking - Selective reporting of significant p-values: trying multiple analyses and reporting the one that yields significance. HARKing - Hypothesizing After the Results are Known: presenting a hypothesis made after data analysis as if it had been made beforehand.

❖ [EMBER/SREMB:]

Common Sense (first): Practical significance matters as much as statistical significance.

● Ex: Even with a low p-value, consider whether the drug's effect is practically meaningful for patients.

Effect Size (third): Measures like Cohen's d help understand the strength of effects.

● Ex: One teaching method improves scores by 5 points, another by 2. Effect size helps compare their impact.

Meta-analysis (fourth): Combines results from different studies on the same topic.

● Ex: Combining results from various studies on the effectiveness of a diet plan.

Bayesian Estimates (fifth): Combining prior knowledge with current data to refine hypotheses.

● Ex: Combining existing knowledge with new data to refine conclusions about a treatment.
❖ [Effect Sizes:]

Cohen's D: The difference between means divided by the standard deviation.

● d = 0.2 (small), 0.5 (medium), 0.8 (large)

Pearson's Correlation Coefficient r: A measure of the strength of a relationship between two continuous variables, or between one continuous variable and a categorical variable containing two categories.

● A perfect relationship (r = -1 or 1): the observed data fall exactly on the line. A weaker relationship (r = -0.5 or 0.5): the observed data are scattered around the line.

● r = -1 (perfect negative relationship), 0 (no relationship), 1 (perfect positive relationship)

● 0.1 (small), 0.3 (medium), 0.5 (large)

Odds Ratio: The odds of an event occurring are defined as the probability of the event occurring divided by the probability of that event not occurring. The odds ratio compares the odds in one group with the odds in another.

Ex: The ratio of the odds of lung cancer in smokers divided by the odds of lung cancer in non-smokers: (647/622)/(2/27) = 14.04
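The same calculation in Python, using the counts from the example (647 and 622 smokers with and without lung cancer, 2 and 27 non-smokers):

```python
# Odds ratio from the lung-cancer example.
odds_smokers = 647 / 622
odds_non_smokers = 2 / 27
odds_ratio = odds_smokers / odds_non_smokers
print(round(odds_ratio, 2))  # ~14.04
```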

Why are effect sizes better than p-values?

1) They encourage interpreting effects on a continuum; 2) their interpretation is not determined by sample size; and 3) researcher degrees of freedom still exist but are less tempting, because effect sizes are not sorted into a good/bad category of results.

=-=-=-=
Week 4: Data and Graphs

❖ Graphs that Summarize:

Line Chart: Shows trends over time with connected data points.

● Usually shows the mean score for different groups connected by a line.
● Ex: Tracking weight change over a year.

Bar Chart: Uses bars to compare different categories.

● Usually shows the mean score for different groups as bars. Can be stacked, simple, 3-D, or clustered.
● Ex: Comparing the number of books read by different people.

Boxplot (box-whisker plot): Highlights a dataset's distribution, including outliers.

● Shows the median, range, IQR, and upper and lower quartiles.
● Ex: Representing the distribution of test scores and identifying outliers.

Error Bars: Display uncertainty or variability in data points.

● Adding error bars to bar/line charts shows the confidence interval of each mean.
● Ex: Showing the range of possible values for each data point.
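A minimal matplotlib sketch (hypothetical scores for three groups) of two of these summaries: a bar chart of means with 95% CI error bars, and a boxplot:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical test scores for three groups, used only to illustrate the plots.
groups = {"A": rng.normal(70, 8, 30), "B": rng.normal(75, 10, 30), "C": rng.normal(80, 6, 30)}

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

# Bar chart of group means with error bars (here: 95% CI of each mean).
means = [g.mean() for g in groups.values()]
cis = [1.96 * g.std(ddof=1) / np.sqrt(len(g)) for g in groups.values()]
ax1.bar(list(groups.keys()), means, yerr=cis, capsize=5)
ax1.set_title("Means with 95% CI error bars")

# Boxplot: median, IQR, whiskers and outliers for each group.
ax2.boxplot(list(groups.values()))
ax2.set_xticklabels(list(groups.keys()))
ax2.set_title("Boxplot")

plt.tight_layout()
plt.show()
```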
❖ [Showing Different Groups:]

Colour - Many graphs distinguish groups using different colour | Population Pyramid - A
histogram split by group.

Population Pyramids: A graphical representation of age and gender distribution.

● Compares the frequency distributions of several groups simultaneously.
● Ex: Illustrating the distribution of age and gender in a city.

Histogram: Visualizes the distribution of a dataset.

● Plots each score against its frequency.
● Ex: Visualizing the distribution of heights in a group of people.

Scatterplot: Plots individual data points in two dimensions.

● Plots scores on one variable against scores on another; shows the relationship between two variables.
● Slope of the dots --> the direction of the relationship between the variables. Amount of scatter --> the strength of the correlation.
● Ex: Plotting study hours against exam scores for each student.
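A minimal matplotlib sketch of a histogram and a scatterplot, again using hypothetical study-hours and exam-score data:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data: study hours and exam scores for 100 students.
hours = rng.uniform(0, 10, 100)
scores = 50 + 4 * hours + rng.normal(0, 8, 100)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

# Histogram: each score plotted against its frequency.
ax1.hist(scores, bins=15)
ax1.set_title("Distribution of exam scores")

# Scatterplot: the slope of the cloud shows the direction of the
# relationship, the amount of scatter shows its strength.
ax2.scatter(hours, scores)
ax2.set_xlabel("Study hours")
ax2.set_ylabel("Exam score")
ax2.set_title("Study hours vs. exam score")

plt.tight_layout()
plt.show()
```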

=-=-=-=
Week 5: Beast of Bias

Assumptions of Parametric Tests: Tests assume linearity, normality, and equal variance.

● Ex: Testing a new teaching method - assuming linearity, normality, and equal variance.

Central Limit Theorem: With a big enough sample, averages tend to be normally distributed.

● Ex: Asking a few friends about their favorite color might vary, but asking many gives a more consistent average.

Residuals & Outliers: Differences between observed and predicted values; outliers can skew results.

● Ex: Predicting exam scores - the differences between predicted and actual scores are residuals; someone who studied exceptionally hard might be an outlier.

Testing for Normality and Homogeneity:

● Definition: Tests like K-S and Levene's check for normality and equal variance.
● Ex: Using tests like K-S and Levene's to check if the data behave well with statistical methods.

Skewness and Kurtosis: Measures of asymmetry and tail heaviness in a distribution.

● Ex: Skewness - how asymmetrical your data are (like a room with a few very tall people). Kurtosis - how thick or thin the tails of your data are.

Bootstrapping: A technique to estimate the distribution of a statistic.

● Ex: Estimating the average height of people in your town by resampling a smaller group multiple times and averaging the results.

❖ [The General Linear Model (GLM):] Outcome = Model + Error

Linearity and Additivity - The relationship between predictor(s) and outcome is linear. Curved
data is not linear.

bn - Estimate of the parameter for predictor n: the direction/strength of the relationship/effect, or a difference in means.

b0 - Estimate of the value of the outcome when the predictor(s) = 0 (the intercept).


-
❖ [Errors vs. Residuals:]

Errors: Differences between predicted values and observed values of the outcome variable in the population model. These values CANNOT be observed.

Residuals: Differences between predicted values and observed values of the outcome variable in the sample model. These CAN be observed and represent the population errors.

Spherical Errors: The population model should have homoscedastic and independent errors; to assess this we inspect the model residuals.

Normality of Something-or-Other: It is the population model errors and the sampling distribution that need to be normal.

-
❖ [Outliers:]

Influential Cases - An outlier can change the model: for example, with the outlier present b is positive, but when it is removed b is roughly 0.

➢ Detecting Outliers and Influential Cases:

Graphs: Scatterplots (less helpful with several predictors) and histograms.

Standardized Residuals (> 3): The difference between what the model predicts and the score observed, in standard-deviation units.

- In an average sample, 95% of standardized residuals lie between ±2.
- 99% of standardized residuals should lie between ±2.5.
- If the absolute value of a standardized residual is > 3, it is an outlier.

A quicker way is Cook's Distance: Measures the influence of a single case on the model as a whole. Absolute values > 1 may be a concern.

DF Beta (unstandardized/standardized): The change in b when a case is removed; be wary of standardized values with absolute values > 1.

Ex: a significant effect where, as the number of dragons increases, so does the number of sheep eaten. Outliers usually can't simply be removed.
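A minimal Python sketch of these checks using statsmodels on simulated dragons-and-sheep data (the numbers and the extreme case are invented for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
# Hypothetical dragons-vs-sheep data with one extreme case tacked on the end.
dragons = np.append(rng.uniform(1, 20, 30), 60.0)
sheep = np.append(2 * dragons[:30] + rng.normal(0, 5, 30), 400.0)

fit = sm.OLS(sheep, sm.add_constant(dragons)).fit()
infl = fit.get_influence()

std_resid = infl.resid_studentized_internal   # flag |value| > 3
cooks_d = infl.cooks_distance[0]              # flag values > 1
dfbetas = infl.dfbetas                        # flag standardized |values| > 1

print("|standardized residual| > 3:", np.where(np.abs(std_resid) > 3)[0])
print("Cook's distance > 1:        ", np.where(cooks_d > 1)[0])
print("any |DFBETA| > 1:           ", np.where(np.any(np.abs(dfbetas) > 1, axis=1))[0])
```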
-
❖ [Normal Distributions:]

1) The population needs to be normally distributed, 2) the sampling distribution of the parameter
being measured needs to be normal (mean or b), 3) for parameters to be optimal, the error
(residuals) needs to be normal.

Assumptions underlying the use of parametric tests (based on the normal distribution):
Some features of the data should be normally distributed. The samples being tested should
have approximately equal variances. The data should be at least interval level.

❖ [Central Limit Theorem:] Because of CLT, if the sample is large enough you can
ignore the assumption of normality.
➢ What Normality Affects:

● Significance tests, via the sampling distribution, when the sample size is small.
● The optimality of the parameter estimates, when the residuals/errors are not normal.
● Though we would need population distributions to be sure of normality, we use sample distributions because of limited access to population data. This is why we rely on the CLT.
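A small simulation sketch of the CLT in Python: a deliberately skewed population, whose sample means become more normal (less skewed) as n grows:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(4)
# A deliberately skewed (non-normal) population.
population = rng.exponential(scale=2.0, size=100_000)

# As the sample size grows, the distribution of sample means gets closer
# to normal (its skew shrinks towards 0): the CLT at work.
for n in (5, 30, 200):
    sample_means = [rng.choice(population, size=n).mean() for _ in range(2_000)]
    print(f"n = {n:>3}: skew of the sampling distribution = {skew(sample_means):.2f}")
```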

➢ Detecting Normality:

Graphs: Boxplots, histograms, Q-Q plots or P-P plots.

Numbers: Skew and Kurtosis, these values are 0 in a normal distribution.

SPSS Test: Kolmogorov-Smirnov (K-S) test. p < .05: significantly different from normal; p > .05: consistent with a normal distribution.

➢ Normality Tests:

K-S Test: Can be used (but shouldn't be) to see if a distribution of scores significantly differs from a normal distribution.

● In the SPSS table, values < .05 mean the scores are significantly different from a normal distribution. Otherwise, the scores are treated as normally distributed.

Limitation: In large samples these tests can be significant even when the scores are only slightly different from a normal distribution.

Skewness and Kurtosis: To check that the distribution of scores is normal, look at the values of skewness and kurtosis in the output.

● Positive values of kurtosis indicate a heavy-tailed (pointy) distribution; negative values indicate a light-tailed (flatter) distribution.

● The further the value is from zero, the more likely it is that the data are not normally distributed.

You can convert these scores to z-scores by dividing by their standard error. If the resulting score (when you ignore the minus sign) is greater than 1.96, it is significant (p < 0.05).

Limitation: Significance tests of skew and kurtosis should not be used in large samples (because they are likely to be significant even when skew and kurtosis are not too different from normal).
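A minimal Python sketch of these checks with scipy on hypothetical scores; note the standard-error formulas for skew and kurtosis used here are the usual large-sample ones, and this K-S test does not apply SPSS's Lilliefors correction, so the p-value is approximate:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(loc=100, scale=15, size=80)   # hypothetical scores

# K-S test against a normal distribution with the sample's own mean and SD.
ks_stat, ks_p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))

# Skew and kurtosis, converted to z by dividing by their standard errors.
n = len(x)
se_skew = np.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
se_kurt = 2 * se_skew * np.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
z_skew = stats.skew(x) / se_skew
z_kurt = stats.kurtosis(x) / se_kurt         # excess kurtosis (0 for a normal)

print(f"K-S: D = {ks_stat:.3f}, p = {ks_p:.3f}")
print(f"z(skew) = {z_skew:.2f}, z(kurtosis) = {z_kurt:.2f}  (|z| > 1.96 -> p < .05)")
```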
❖ [Homogeneity of Variance (homoscedasticity):]

Homogeneity: Assumption that the spread of outcome scores is roughly equal at different
points on the predictor variable.

➢ Detecting Homogeneity:

Graphs: Scatterplots, box plots, error bars (SDs), residuals vs. predicted values.

Numbers: Variance ratio (divide the biggest variance by the smallest); if it is < 2, it's fine.

SPSS Test: Levene's test (its validity depends on sample size). p < .05: the variances are significantly unequal; p > .05: the variances are equal. Bootstrapping is an alternative.

Limitations: There are good reasons not to use Levene’s test or the variance ratio. In large
samples they can be significant when group variances are similar, and in small samples they
can be non-significant when group variances are very different.
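A minimal Python sketch of Levene's test and the variance ratio on two hypothetical groups:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
# Hypothetical outcome scores for two groups.
group1 = rng.normal(50, 10, 40)
group2 = rng.normal(55, 12, 40)

# Levene's test: p < .05 suggests unequal variances (but see the limitation above).
w, p = stats.levene(group1, group2)

# Variance ratio: largest variance / smallest variance; < 2 is usually fine.
variances = [group1.var(ddof=1), group2.var(ddof=1)]
ratio = max(variances) / min(variances)

print(f"Levene: W = {w:.2f}, p = {p:.3f}; variance ratio = {ratio:.2f}")
```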

➢ Bootstrapping: A technique to estimate the distribution of a statistic. It resamples a single data set to create many simulated samples.

Ex: Estimating the average height of people in your town by resampling a smaller measured group many times and averaging the results.
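A minimal numpy sketch of a percentile bootstrap for the mean, on a hypothetical sample of 50 heights:

```python
import numpy as np

rng = np.random.default_rng(7)
heights = rng.normal(170, 9, 50)   # hypothetical sample of 50 heights (cm)

# Resample the single data set with replacement many times and record the
# statistic each time; the spread of those values estimates its sampling
# distribution.
boot_means = [rng.choice(heights, size=len(heights), replace=True).mean()
              for _ in range(5_000)]

lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(f"bootstrap 95% CI for the mean height: [{lower:.1f}, {upper:.1f}] cm")
```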

➢ Heteroscedasticity: Refers to data where the variability (scatter) of the outcome is unequal across the values of a second (predictor) variable. The residual plot tends to be cone-shaped.

Limitation: While heteroscedasticity does not cause bias in the coefficient estimates, it does make them less precise.

=-=-=-=

Week 6: Parametric & Non-Parametric Tests

❖ [Parametric Tests:]

T-Test: Hypothesis testing; compares means, taking the variance into account. If p > .05 the difference is not significant, so the means are treated as the same.

Paired T-Test (one group): Compares the means of two measurements taken from the same individual, object, or related units.

Unpaired T-Test (two groups): Compares the averages/means of two independent or unrelated groups to determine if there is a significant difference between the two.

One-Sample Test: Determines if an unknown population mean is different from a specific value. Ex: Find out if the screws your company produces really weigh 10 grams on average. To test this, weigh 50 screws and compare the actual weight with the weight they should have (10 grams).

Independent Sample: Comparison of means between two independent groups (use a paired t-test for paired data). As the t-test is a parametric test, samples should meet certain preconditions, such as normality, equal variances and independence.

Paired Sample: Comparing one group to itself, for example the ratio of the same male/female students over the years.

One-way ANOVA (more than two groups): Analysis of variance. Compares the means of two or more independent groups to determine if there is evidence that the population means are significantly different. Ex: Determine if the Mind Over Matter coping strategy was more effective at reducing anxiety than deep breathing exercises.

Pearson's Coefficient: Evaluates if there is evidence for a linear correlation among the same pairs of variables in the population.

❖ [Non-Parametric Tests:]

Wilcoxon, Mann-Whitney, Kruskal-Wallis, Spearman’s:


● Definition: Alternatives to parametric tests, useful when assumptions aren't met.
● Ex: Comparing test scores of two groups when data doesn’t quite follow a normal
distribution.

Mann-Whitney Test: Compares two independent samples from the same population; tests whether the two samples' central tendencies are equal or not. Requires 2 independent samples with at least ordinal-scaled characteristics; the variables do not have to follow any particular distribution.

● The values of the mean ranks tell you how the groups differ (the highest scores have the highest mean rank).

Report: the U-statistic, the z, the significance value, and the medians with their corresponding ranges (or a box plot). Also calculate an effect size.
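A minimal scipy sketch on two small hypothetical ordinal samples; the rank-biserial correlation shown here is one simple way to report an effect size (my choice for illustration, not the lecture's prescribed measure):

```python
import numpy as np
from scipy import stats

# Hypothetical ordinal scores from two independent groups.
group1 = [12, 15, 9, 20, 17, 14, 11]
group2 = [8, 10, 7, 13, 9, 6, 12]

u, p = stats.mannwhitneyu(group1, group2, alternative="two-sided")

# A simple effect size: the rank-biserial correlation.
n1, n2 = len(group1), len(group2)
rank_biserial = 1 - (2 * u) / (n1 * n2)

print(f"U = {u:.1f}, p = {p:.3f}, rank-biserial r = {rank_biserial:.2f}")
print("medians:", np.median(group1), np.median(group2))
```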
➢ Mann Whitney/Wilcoxon Hypothesis:

Null Hypothesis: No difference (central tendency) between the two groups in the population.

Alternative Hypothesis: There is a difference (central tendency) between two groups in the
population.

➢ Wilcoxon Ranked Test:

Wilcoxon Signed-Rank Test: Compares two conditions when the scores are related (same participants) and the resulting data have unusual cases or violate assumptions.

● If p < 0.05, the 2 conditions are significantly different.

● Looking at a histogram of the number of positive/negative differences tells you how the group differs (a greater number in a particular direction tells you the direction of the result).

Report: the T-statistic, the corresponding z, the exact significance value and an effect size. Report the medians and their ranges (boxplot).
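A minimal scipy sketch with hypothetical before/after scores for the same participants:

```python
from scipy import stats

# Hypothetical related scores: the same participants measured in two conditions.
condition1 = [20, 18, 24, 22, 19, 25, 21, 23]
condition2 = [17, 19, 20, 18, 18, 22, 19, 20]

t_stat, p = stats.wilcoxon(condition1, condition2)
print(f"T = {t_stat:.1f}, p = {p:.3f}")  # p < .05 -> the conditions differ significantly
```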

➢ Assumptions of Wilcoxon Rank Test:


❖ [Kruskal-Wallis H-Test:]

Kruskal-Wallis H-Test: Compares several conditions when different participants take part in each condition and the resulting data have unusual cases or violate any assumption.

● A value < 0.05 means the groups are significantly different.

● If you predict that the medians will increase or decrease across your groups in a specific order, test this with the Jonckheere–Terpstra test.

● Pairwise comparisons - Compare all possible pairs of groups with a p-value that is corrected so that the error rate across all tests remains at 5%.

Report: the H-statistic, the degrees of freedom and the significance value for the main analysis. For any follow-up tests, report an effect size, the corresponding z and the significance value. Also report the medians and their corresponding ranges (or draw a boxplot).
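A minimal scipy sketch with three small hypothetical independent groups:

```python
from scipy import stats

# Hypothetical scores from three independent groups.
group_a = [23, 41, 54, 66, 78]
group_b = [45, 55, 60, 70, 72]
group_c = [18, 30, 34, 40, 44]

h, p = stats.kruskal(group_a, group_b, group_c)
print(f"H = {h:.2f}, p = {p:.3f}")
# p < .05 says the groups differ somewhere; pairwise follow-up tests
# (with corrected p-values) are needed to say which groups differ.
```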

➢ Assumption:

● One independent variable with two or more levels (independent groups). Test is used when you
have three or more levels.

● Ordinal-scale, ratio-scale, or interval-scale dependent variables.

● Observations should be independent. There should be no relationship between the members in each group or between groups.

● All groups should have the same shape distribution, SPSS tests this condition.

➢ Hypothesis:

H0 (null): Population medians are equal

H1 (alternative): Population medians are not equal

● Kruskal Wallis test will tell you if there is a significant difference between groups. It won’t tell you
WHICH groups are different, that needs a Post Hoc test.
❖ [Friedman’s ANOVA:]

Friedman's ANOVA: Compares several conditions when the data are related (same participants in each condition) and the resulting data have unusual cases or violate assumptions.

● < 0.05 means the conditions are significantly different.

● The main analysis can be followed up with pairwise comparisons (with corrected p-values, as above).

Report: the χ² statistic, the degrees of freedom and the significance value for the main analysis. For follow-up tests, report an effect size, the corresponding z and the significance value. Report the medians and their ranges (boxplot).
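A minimal scipy sketch with hypothetical ratings from the same six participants in three conditions:

```python
from scipy import stats

# Hypothetical related data: the same 6 participants rated in three conditions.
cond1 = [7, 5, 8, 6, 7, 9]
cond2 = [5, 4, 6, 5, 6, 7]
cond3 = [8, 6, 9, 7, 8, 9]

chi2, p = stats.friedmanchisquare(cond1, cond2, cond3)
print(f"chi-square = {chi2:.2f}, df = 2, p = {p:.3f}")
```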

➢ Hypothesis:

Null hypothesis: There is no significant difference between the dependent groups.

Alternative: There is a significant difference between the dependent groups.

❖ [Spearman’s Coefficient:]

Spearman's Coefficient: Measures the strength and direction of association between two ranked variables.

● Assesses how well the relationship between two variables can be described using a monotonic function.

● 10 is the minimum sample size needed for Spearman's ranked test to be valid.
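A minimal scipy sketch on ten hypothetical ranked cases (ten being the suggested minimum):

```python
from scipy import stats

# Hypothetical ranked data for 10 cases.
study_hours_rank = [2, 4, 5, 3, 8, 7, 6, 9, 1, 10]
exam_rank        = [3, 4, 6, 2, 8, 7, 5, 9, 1, 10]

rho, p = stats.spearmanr(study_hours_rank, exam_rank)
print(f"rho = {rho:.2f}, p = {p:.3f}")
```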

When do we use a non-parametric test? When the median more accurately represents the centre of the distribution of your data, even if the sample size is large.

=-=-=-=
Week 7: Correlation and Chi-Squares
Kendall’s Correlation:

● Definition: Measures strength and direction of ordinal associations.

● Ex: Ranking friends based on their helpfulness and seeing how similar your rankings
are.

Partial and Semipartial Correlations:

● Definition: Considers relationships while controlling for other variables.

● Ex: Exploring how both study hours and sleep affect exam scores while considering
other factors.

Chi-Square: Used to show whether or not there is a relationship between two categorical
variables. It can also be used to test if a number of outcomes are occurring in equal frequencies
or not, or conform to a known distribution.

● Ex: When rolling a die, there are six possible outcomes. After rolling a die hundreds of
times, you could tabulate the number of times each outcome occurred and use the
chi-square statistic to test whether these outcomes were occurring in basically equal
frequencies or not (e.g., to test whether the die is weighted).
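A minimal scipy sketch of that die example with hypothetical counts from 600 rolls (a goodness-of-fit chi-square against equal expected frequencies):

```python
from scipy import stats

# Hypothetical counts from 600 rolls of a die; a fair die would give ~100 each.
observed = [90, 105, 110, 95, 120, 80]

chi2, p = stats.chisquare(observed)   # expected frequencies default to equal
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
# p < .05 would suggest the outcomes are not occurring equally often
# (e.g. the die may be weighted).
```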

When to Use Different Tests: Choose Pearson, Spearman, Kendall, or Chi-square based on
data characteristics.

● Ex: Using Pearson for regular relationships, Spearman for non-linear, Kendall for
rankings, and Chi-square for independence between categories.

❖ [Correlation:]

Define: A crude measure of the relationship between variables is the covariance.

● If we standardize this value we get Pearson’s correlation coefficient, r.

● The correlation coefficient has to lie between −1 and +1.

● A coefficient of +1 indicates a perfect positive relationship, a coefficient of −1 indicates a perfect negative relationship, and a coefficient of 0 indicates no linear relationship.

The correlation coefficient is a commonly used measure of the size of an effect: values of ±0.1
represent a small effect, ±0.3 is a medium effect and ±0.5 is a large effect. However, interpret
the size of correlation within the context of the research you’ve done rather than blindly following
these benchmarks.
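A minimal numpy sketch of the covariance-to-r standardization described above, on a small hypothetical data set:

```python
import numpy as np

# Hypothetical paired scores.
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([3.0, 5.0, 4.0, 8.0, 10.0])

# Covariance: a crude, unit-dependent measure of how x and y vary together.
cov = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

# Standardizing by the two standard deviations gives Pearson's r (-1 to +1).
r = cov / (x.std(ddof=1) * y.std(ddof=1))

print(f"covariance = {cov:.2f}, r = {r:.2f}")
print("check:", np.corrcoef(x, y)[0, 1])
```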

❖ [Correlations:]
Spearman’s correlation coefficient, rs - A non-parametric statistic and requires only ordinal
data for both variables.

Kendall’s correlation coefficient, τ - Like Spearman’s rs but probably better for small samples.

The point-biserial correlation coefficient, rpb - Quantifies the relationship between a continuous variable and a variable that is a discrete dichotomy (ex: there is no continuum underlying the two categories, such as dead or alive).

The biserial correlation coefficient, rb - Quantifies the relationship between a continuous variable and a variable that is a continuous dichotomy (ex: there is a continuum underlying the two categories, such as passing or failing an exam).
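A minimal scipy sketch computing several of the coefficients just listed on hypothetical data (the dichotomous variable is coded 0/1 for the point-biserial case):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(scale=0.8, size=50)   # continuous, related to x
group = rng.integers(0, 2, size=50)            # discrete dichotomy (e.g. dead/alive)

print("Pearson r:      ", stats.pearsonr(x, y)[0])
print("Spearman rho:   ", stats.spearmanr(x, y)[0])
print("Kendall tau:    ", stats.kendalltau(x, y)[0])
print("Point-biserial: ", stats.pointbiserialr(group, y)[0])
```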

❖ [Partial and Semipartial Correlations:]

Partial correlation - Quantifies the relationship between two variables while accounting for the
effects of a third variable on both variables in the original correlation.

Semi-partial correlation - Quantifies the relationship between two variables while accounting
for the effects of a third variable on only one of the variables in the original correlation.
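A minimal sketch of the standard formulas for partial and semipartial correlations from three pairwise correlations; the study-hours/sleep/exam-score values are hypothetical:

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z partialled out of both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def semipartial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z partialled out of y only."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_yz**2)

# Hypothetical correlations: exam score (x), study hours (y), sleep (z).
r_xy, r_xz, r_yz = 0.60, 0.40, 0.30
print(round(partial_r(r_xy, r_xz, r_yz), 2), round(semipartial_r(r_xy, r_xz, r_yz), 2))
```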

Chi-Square can be used with more than 2 groups; as a rule of thumb, n > 30 and the expected count should be > 5 in every cell.

=-=-=-=
Week 8: The Linear Model

General Definition: Describes the linear relationship between variables.

● Ex: Predicting weight based on height using a straight line that best fits the data.

Line of Best Fit: Represents the best-fitting line in a scatterplot.

● Ex: Plotting heights and weights and drawing a line that represents the average
relationship.

Model Parameters, Fit, and Multicollinearity: Parameters are intercept and slope; fit is
measured by R-squared and F-test.

● Ex: Parameters are like the slope and intercept of your line; R-squared tells you how
well your line fits the data. Be cautious with multicollinearity, where factors are too
related, making it hard to figure out each one's effect.

❖ [Linear Models (Regression):]

Define: A way of predicting values of one variable from another based on a model that
describes a straight line. This line is the line that best summarizes the pattern of the data.
➢ To assess how well the model fits the data use:

R2 - Tells us how much variance is explained by the model compared to how much variance
there is to explain in the first place. It is the proportion of variance in the outcome variable that is
shared by the predictor variable.

F - Tells us how much variability the model can explain relative to how much it can't explain (i.e., it's the ratio of how good the model is compared to how bad it is).

The b-value - Tells us the gradient of the regression line and the strength of the relationship
between a predictor and the outcome variable. If it is significant (Sig. < 0.05 in the SPSS output)
then the predictor variable significantly predicts the outcome variable.
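A minimal Python/statsmodels sketch that fits a simple linear model to hypothetical data and reads off R², F and the b values with their p-values:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
# Hypothetical data: predicting exam score from study hours.
hours = rng.uniform(0, 10, 60)
score = 50 + 4 * hours + rng.normal(0, 8, 60)

X = sm.add_constant(hours)            # adds the intercept, b0
model = sm.OLS(score, X).fit()

print(f"R-squared = {model.rsquared:.2f}")                   # variance explained
print(f"F = {model.fvalue:.1f}, p = {model.f_pvalue:.4f}")   # overall fit
print("b values (intercept, slope):", np.round(model.params, 2))
print("p values for each b:", np.round(model.pvalues, 4))
```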

❖ [Descriptive Statistics:]

Use the descriptive statistics to check the correlation matrix for multicollinearity; that is,
predictors that correlate too highly with each other, r > 0.9.

❖ [The Model Summary:]

The fit of the linear model can be assessed using the Model Summary and ANOVA tables from
SPSS.

R2 tells you the proportion of variance explained by the model.

If you have done a hierarchical regression, assess the improvement of the model at each stage
by looking at the change in R2 and whether it is significant (values less than 0.05 in the column
labeled Sig. F Change).

The F-test tells us whether the model is a significant fit to the data overall (look for values less
than 0.05 in the column labelled Sig.).

❖ [Coefficients:]

The individual contribution of variables to the regression model can be found in the Coefficients
table. If you have done a hierarchical regression then look at the values for the final model.

You can see whether each predictor variable has made a significant contribution to predicting
the outcome by looking at the column labelled Sig. (values less than 0.05 are significant).

The standardized beta values tell you the importance of each predictor (bigger absolute value =
more important).

The tolerance and VIF values will also come in handy later, so make a note of them.

❖ [Multicollinearity:]

To check for multicollinearity, use the VIF values from the table labelled Coefficients.
If these values are less than 10 then that indicates there probably isn’t cause for concern.

If you take the average of VIF values, and it is not substantially greater than 1, then there’s also
no cause for concern.
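A minimal sketch of the VIF check using statsmodels, on hypothetical predictors where x2 is deliberately made collinear with x1:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(9)
# Hypothetical predictors; x2 is constructed to correlate strongly with x1.
x1 = rng.normal(size=100)
x2 = x1 * 0.9 + rng.normal(scale=0.3, size=100)
x3 = rng.normal(size=100)

X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))
vifs = [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])]
print(dict(zip(["x1", "x2", "x3"], np.round(vifs, 2))))
# VIF < 10 for every predictor, and an average VIF not much above 1,
# suggest multicollinearity is not a concern.
```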

❖ [Residuals:]

● Look for cases that might be influencing the model.

● Look at standardized residuals and check that no more than 5% of cases have absolute values
above 2, and that no more than about 1% have absolute values above 2.5. Any case with a
value above about 3 could be an outlier.

● Look in the data editor for the values of Cook’s distance: any value above 1 indicates a case
that might be influencing the model.

Calculate the average leverage and look for values greater than twice or three times this
average value.

> Mahalanobis distance - A crude check is to look for values above 25 in large samples (500) and values above 15 in smaller samples (100). However, Barnett and Lewis (1978) should be consulted for more refined guidelines.

Look for absolute values of DFBeta greater than 1.

Calculate the upper and lower limit of acceptable values for the covariance ratio, CVR. Cases
that have a CVR that fall outside these limits may be problematic.

❖ [Model Assumptions:]

Look at the graph of ZRESID* plotted against ZPRED*. If it looks like a random array of dots
then this is good. If the dots get more or less spread out over the graph (look like a funnel) then
the assumption of homogeneity of variance is probably unrealistic. If the dots have a pattern to
them (i.e., a curved shape) then the assumption of linearity is probably not true. If the dots seem
to have a pattern and are more spread out at some points on the plot than others then this could
reflect violations of both homogeneity of variance and linearity. Any of these scenarios puts the
validity of your model into question. Repeat the above for all partial plots too.

Look at the histogram and P-P plot. If the histogram looks like a normal distribution (and the P-P
plot looks like a diagonal line), then all is well. If the histogram looks non-normal and the P-P
plot looks like a wiggly snake curving around a diagonal line then things are less good. Be
warned, though: distributions can look very non-normal in small samples even when they are
normal.
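A minimal Python sketch of the ZRESID vs. ZPRED check on hypothetical regression data; a random cloud of points is what you hope to see:

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(10)
# Hypothetical regression data.
x = rng.uniform(0, 10, 100)
y = 3 + 2 * x + rng.normal(0, 4, 100)

fit = sm.OLS(y, sm.add_constant(x)).fit()

# Standardized residuals (ZRESID) against standardized predicted values (ZPRED):
# a random cloud is good; a funnel suggests heteroscedasticity, a curve non-linearity.
zresid = fit.get_influence().resid_studentized_internal
zpred = (fit.fittedvalues - fit.fittedvalues.mean()) / fit.fittedvalues.std()

plt.scatter(zpred, zresid)
plt.axhline(0, linestyle="--")
plt.xlabel("Standardized predicted values (ZPRED)")
plt.ylabel("Standardized residuals (ZRESID)")
plt.show()
```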
