Class Notes
Data set being used for class: Dr. Mousseau’s dissertation data set. Identity
development and belief in God.
Class Four
Lecture:
o Normal distribution: when mean, median, and mode are all the same (bell curve). The
distribution of values is typical: "Most people are this way, but some people are that
way."
o Most people are within 3 standard deviations of the mean; however, many people like
to go by 2 standard deviations because that still covers about 95% of the population.
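A quick sketch of that coverage claim, assuming Python with SciPy is available (not part of the class materials):

from scipy.stats import norm

for k in (1, 2, 3):
    # proportion of a normal distribution within k standard deviations of the mean
    coverage = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} SD: {coverage:.4f}")
# prints roughly 0.6827, 0.9545, 0.9973 -- so 2 SDs cover about 95% of the
# population and 3 SDs cover about 99.7%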
Inferential Statistics
P value (probability value): An indicator used to determine whether your findings were observed
by chance or because your hypothesis is actually supported. It has to be below a certain
threshold for you to be able to reject the null hypothesis. "The probability that a researcher
would obtain results as extreme as they did if the null hypothesis were true," OR the
probability that you would get the results that you got if you were wrong. The threshold is often
0.05 (a 5% chance that I would get these results even if my hypothesis were wrong).
When we say that something is significantly different and then report p < 0.05, we are stating
that we are at least 95% sure (or more) that the sample mean and the mean we are testing it
against (either another sample or the general population) are not the same.
Degrees of Freedom (df): the number of degrees of freedom for a collection of sample data is
the number of sample values that can vary after certain restrictions have been imposed on all
data values. OR, in inferential stats, we are using data points or observations (N = the number of
these points) to estimate parameters (educated estimates about populations) for variables.
-We may use one or more pieces of information to estimate our parameter (mean, SD).
-Every time we use a piece of info to estimate a parameter, that piece of info can no longer
change or vary, because we are using it.
-We are left with df, the number of pieces of information we still have to estimate a
population value.
-You lose a degree of freedom (or several degrees of freedom) every time you use a
specific piece of information, because you are imposing a restriction on your model. Degrees of
freedom refer to what is LEFT after running all your analyses (see the sketch after this list).
-Very low degrees of freedom are a red flag in papers.
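A small illustration of "using up" a degree of freedom, sketched in Python with made-up numbers: once the sample mean has been used as an estimate, only N − 1 deviations are free to vary, which is why the sample variance divides by N − 1 rather than N.

import numpy as np

scores = np.array([4.0, 7.0, 6.0, 5.0, 8.0])
n = len(scores)
mean = scores.mean()                    # one piece of info is now "spent"

var_divide_by_n = ((scores - mean) ** 2).sum() / n         # ignores the lost df
var_divide_by_df = ((scores - mean) ** 2).sum() / (n - 1)  # uses df = N - 1

print(var_divide_by_df == scores.var(ddof=1))  # True: NumPy's ddof=1 divides by N - 1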
Hypothesis Testing…
As statisticians we are always estimating the probability that our findings would be observed if
we were wrong. We test whether a relationship or difference TRULY exists or if it’s a function of
“random noise” (or chance) from the sample.
One-Tailed/Directional Test: Make a prediction of the direction in which the individual will differ
from the mean. The rejection region is located in only one tail of the distribution.
Two-Tailed/Nondirectional Test: Reject extremes in both tails (because we did not make a
directional statement in the hypothesis about which way it would go).
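A hedged sketch of the directional vs. nondirectional distinction using SciPy's one-sample t-test; the numbers are invented just to show where the rejection region sits.

import numpy as np
from scipy.stats import ttest_1samp

sample = np.array([103, 98, 110, 105, 99, 107, 112, 101])

# Two-tailed: "the sample mean differs from 100 in either direction"
t_two, p_two = ttest_1samp(sample, popmean=100, alternative="two-sided")

# One-tailed: "the sample mean is GREATER than 100" (rejection region in one tail only)
t_one, p_one = ttest_1samp(sample, popmean=100, alternative="greater")

print(p_two, p_one)  # here the one-tailed p is half the two-tailed p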
Error…
-Type 1 Error: The null hypothesis was true but we rejected it and accepted our
hypothesis instead. We said we were right when we were actually wrong. (This is the alpha
level: if alpha is .15, we have a 15% chance of making a Type 1 error, i.e., of saying our
hypothesis was true when it was not.) We set our own alpha level; it's the same as the
p-value threshold we're using.
-Type 2 Error: We didn't reject the null even though our hypothesis was actually
supported. We were right but we didn't say we were right. You miss a significant finding
because you thought you were wrong. We refer to the probability of making a Type II error
as β (beta). The POWER of a test is 1 − β (one minus the probability of making a Type II
error): our ability to recognize significant changes.
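A rough simulation sketch (not from class) of what alpha means as the Type 1 error rate: when the null hypothesis is actually true, about alpha of the tests we run will still come out "significant" purely by chance.

import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
alpha = 0.05
false_rejections = 0
n_sims = 5000

for _ in range(n_sims):
    # the null is true here: the population mean really is 50
    sample = rng.normal(loc=50, scale=10, size=30)
    if ttest_1samp(sample, popmean=50).pvalue < alpha:
        false_rejections += 1

print(false_rejections / n_sims)  # close to 0.05, the Type 1 error rate we chose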
P-Value/Significance Review…
P-value= the probability that we would see our results even if we were wrong
Tests of Differences…
Effect Sizes: a measure of the strength of an effect (how well does the intervention work).
Hypothesis testing only tells us that there is a “significant” difference or relationship. It does not
tell us the strength, or magnitude, of this effect.
All tests of statistical significance involve a comparison between an observed value (obtained
via your study) and an expected value.
Every type of test rests on particular assumptions about your data (e.g., that it is normally
distributed, has equal variances, etc.). Anytime you do a test, check your assumptions when you select the test.
Chi-Square Tests:
-Chi-Square is a statistical test commonly used to compare observed data with data we would
expect to obtain according to a specific hypothesis.
-Chi-Square is the sum, over all possible categories, of the squared difference between observed (O)
and expected (E) data (the deviation, d) divided by the expected data (sketched below).
-Based on frequencies and on the variables being independent of one another (one does not affect the
other)
-One-Sample (Goodness of Fit): Explores the proportion of cases that fall into the
various categories of a single variable and compares these with hypothesized values.
We are always comparing observed (from sample) and expected (based on general population
stats).
Check your p-value in chi-square to determine if there is a significant difference, then look at the chi-
square value to speak to how.
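A minimal one-sample (goodness-of-fit) chi-square sketch in Python, with made-up counts rather than the class data set; the statistic is the sum over all categories of (observed − expected)² / expected.

from scipy.stats import chisquare

observed = [18, 22, 20, 40]   # counts observed in the sample
expected = [25, 25, 25, 25]   # counts expected under the hypothesis

chi2, p = chisquare(f_obs=observed, f_exp=expected)
print(chi2, p)   # check p first for significance, then the chi-square value itself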
Parametric tests are for continuous data; non-parametric tests are for categorical data.
Parametric:
Interval or ratio data
Large sample size
Assumptions:
All observations are independent of other observations
Data are normally distributed
The variances in the different treatment groups are the same
(Homoscedasticity/homogeneity of variances)
Nonparametric:
Nominal or ordinal data
Small sample size
Assumptions:
Independence of observations (one observation does not influence another
observation)
Do not have underlying assumptions about the shape or parameters of the
underlying population distribution.
Drawbacks:
Not as powerful as parametric tests (if the data are normally distributed; see the comparison sketch after this list)
Not as easily interpretable (less meaning in the interpretation)
Less precision
Mostly limited to univariate and bivariate analyses
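A hedged illustration of the "not as powerful" drawback above: the same two simulated groups run through a parametric test (independent-samples t-test) and a nonparametric counterpart (Mann-Whitney U). The data are simulated, not from class.

import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

rng = np.random.default_rng(4)
group_a = rng.normal(50, 10, size=25)
group_b = rng.normal(58, 10, size=25)

print("t-test p:      ", ttest_ind(group_a, group_b).pvalue)
print("Mann-Whitney p:", mannwhitneyu(group_a, group_b).pvalue)
# with normally distributed data the t-test is usually the more powerful of the two;
# the U test asks for fewer assumptions but gives up some precision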
The T-TEST
Compares differences between two groups. Has one IV (a discrete variable with two levels) and
one DV (an interval or ratio variable, i.e., continuous). Looks at the difference, the variability, and the sample
size (bigger sample size = less chance for error due to chance)
To do a T-Test in SPSS
>analyze
>compare means
>independent samples t-test
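The same test can be sketched outside SPSS. Below is a hedged Python version where the data layout mimics SPSS: one grouping column (the IV with two levels, coded 1 and 2) and one score column (the continuous DV). All values are hypothetical.

import numpy as np
from scipy.stats import ttest_ind

group = np.array([1, 1, 1, 1, 2, 2, 2, 2])                    # IV: two levels
score = np.array([12., 15., 14., 10., 18., 17., 19., 15.])    # DV: continuous

# equal_var=True assumes homogeneity of variances across the two groups
t_stat, p_value = ttest_ind(score[group == 1], score[group == 2], equal_var=True)
print(t_stat, p_value)   # significant if p is below the alpha we set (usually .05)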
Classes 10/11
Factorial notation: count the number of independent variables, then write the number of levels of
each IV as the ____ x ______
SO, if there are four independent variables we would have four blanks with x in between:
__x__x__x__
THEN we fill in the blanks with the levels of each IV. If one IV is grade and we're looking at
four grades, the number in that blank would be 4 because we are looking at four different grade
levels: 4x___x___x___. THEN fill in the rest of the blanks with the number of levels each IV
has (remaining numbers made up): 4x2x3x2. There are four numbers because there are four IVs,
and each number in the notation is the number of levels of that IV.
STATISTICAL ASSUMPTIONS:
Assumptions: allow us to do a certain statistical test BECAUSE the data is appropriate for it (ex:
data is normally distributed, equal variance, etc.)
T-Test ASSUMPTIONS:
• Assumptions related to in/dependence of your groups
• Assess independence to determine independent or dependent samples t-Test
ANOVA ASSUMPTIONS:
• Normality (raw scores and residuals)
• Homoscedasticity (i.e., equal variances in each group)
Before reporting results for your test (i.e., t-Test, ANOVA, correlation), briefly state that you
checked the assumptions for the (non)parametric test and list the assumption(s).
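As a sketch of what "checking the assumptions" can look like in practice (in Python/SciPy rather than SPSS, on simulated data): Shapiro-Wilk tests normality and Levene's test checks homogeneity of variances.

import numpy as np
from scipy.stats import shapiro, levene

rng = np.random.default_rng(1)
group_a = rng.normal(50, 10, size=40)
group_b = rng.normal(55, 10, size=40)

# normality: a significant result (p < .05) means the data depart from normal
print("Shapiro-Wilk A:", shapiro(group_a).pvalue)
print("Shapiro-Wilk B:", shapiro(group_b).pvalue)

# homoscedasticity: a significant result means the group variances differ
print("Levene:", levene(group_a, group_b).pvalue)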
IF YOUR DATA IS NOT NORMALLY DISTRIBUTED: you would know this by checking (through the
Shapiro-Wilk test), and you would need to do a transformation in SPSS that will allow you to
run your statistical analysis (typically we just raise the variable to the half power or take log(x)). You
can do this in SPSS by choosing "natural log transformation" when you're requesting the Q-Q plot.
THEN, you need to make a new variable label for the transformed data. Go to
"Transform > Compute Variable", choose your variable, put it in parentheses, then put LG10
(log base 10) before it.
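A sketch of the same transformation step outside SPSS, on simulated (right-skewed) data: apply a log10 transform, the equivalent of LG10 in Compute Variable, and re-check normality.

import numpy as np
from scipy.stats import shapiro

skewed = np.random.default_rng(2).lognormal(mean=1.0, sigma=0.8, size=60)

log_scores = np.log10(skewed)                 # the LG10(variable) step
print("before:", shapiro(skewed).pvalue)      # likely below .05 (not normal)
print("after: ", shapiro(log_scores).pvalue)  # likely above .05 after the transform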
To exclude a level of data, go to "Data > Select Cases", then make a new rule for "if condition is
satisfied" (second option). Then choose your variable and build the condition as NOT with the
variable in parentheses set equal to the value code (the labeled number) of the level you're removing.
If your Shapiro Wilkes’ test ‘Sig’ is significant (less than 0.05) that means your data is not
normally distributed. For the purpose of this class, if it’s close (somewhere around .05, you can
assume normality).
Tests of Association
THE CORRELATION
Looks at the magnitude and direction of LINEAR relationships.
– Correlation coefficient is a number that offers a standardized way of expressing
the magnitude and direction of relationships.
– Correlation is basically just a line of best fit when you simplify it.
– Based on the following:
■ Sample Size
■ Raw Scores
■ Mean
■ Standard Deviation
CORRELATION ASSUMPTIONS
• Normality: X and Y are normally distributed
• Homoscedasticity: The variance of the residuals is the same for any value of X or Y.
• Linearity: The relationship between X and Y is linear
• Outliers: there are no significant outliers in the data for X by Y (you can remove
outliers though to be able to do a Correlation)
■ PEARSON’S r
■ The correlation coefficient used for parametric data.
– A measure of the extent to which paired scores occupy the same or opposite
positions within their distributions.
– The degree to which 2 variables overlap.
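A minimal Pearson's r sketch with hypothetical paired scores (not the class data):

from scipy.stats import pearsonr

x = [2, 4, 5, 7, 9, 10, 12]   # e.g., identity development scores
y = [1, 3, 6, 6, 8, 11, 12]   # e.g., belief-in-God scores

r, p = pearsonr(x, y)
print(r, p)   # r gives the magnitude and direction; p tests whether r differs from 0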
IN SPSS
ANALYZE
>REGRESSION
>LINEAR
DEP: what you’re looking to predict
IND: what you think will predict it or affect it
>PLOTS
Y: the dependent variable
X: *ZPRED (the standardized predicted value)
INTERPRETATION OF TABLES
Adjusted R square = the percentage of variance in the dependent variable that the model predicts, adjusted for the number of predictors
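For comparison, a hedged sketch of a simple linear regression outside SPSS, assuming statsmodels is available; the variables are simulated, not taken from the class data set.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
predictor = rng.normal(size=50)                   # IND: what you think predicts the DV
outcome = 2.0 * predictor + rng.normal(size=50)   # DEP: what you're trying to predict

X = sm.add_constant(predictor)    # add the intercept term
model = sm.OLS(outcome, X).fit()

print(model.rsquared_adj)   # adjusted R square: proportion of variance explained,
                            # corrected for the number of predictors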