
Class Notes

ED881: Quantitative Analysis

Class One: 1/25/21


Administrative Time:
o Class will be online for the first few weeks until COVID is a little more stable.
o Course goals: This is the stats that will be on the EPPP. Take notes and make flashcards
because this is a huge chunk of information tested on the EPPP (and comps).
o We will learn to be critical consumers of research, developers of research, and
interpreters/explainers of research.
o IAT: take it sometime this week.
o Quizzes due each unit (not in first unit)
o Three major projects
o Quizzes: normally due Sat at midnight (11:59)
o Completed via canvas
o Multiple choice/short answer
o Can use notes, textbooks, other materials
o No consulting with classmates
o 60 minutes to complete quiz, two attempts with highest grade kept
o Drafts: normally due Sundays at midnight (11:59) (worth points)
o Methods and results section: (okay to consult with each other)
o Developing own research question
o Running stats in SPSS
o Interpret results
o Writing full methods/results section of a research paper
o Drafts will be due for each stage along the way and we will receive feedback to
make improvements
o Final submission due 4/21
o Independent Quantitative Research Study:
o Develop research question based on literature (good idea to relate it to your
dissertation topic). Feel free to use questions developed in ED864; just tell Dr.
Neves so Turnitin doesn't flag you.
o Describe how you would explore question
o No stats analysis involved
o Need to provide empirically based rationale for methods
o 10-minute presentation of study due last day of class
o Draft due for 4/11 (feedback only, not graded)
o Final submission due 5/5
o Individual meetings will occur throughout the course on class days to further develop
studies/receive feedback.
o Independent workday will occur in March (29th) to catch up/discuss questions as a
group.
o Canvas Review:
o At top are two major projects for the semester.
o Modules will be posted every week.
o Peer Support Discussion is not graded but can be used for support/resource
sharing (use it!!)
o SPSS
o Statistics package, popular in field, used for data analysis
o .sav means it is an SPSS file and you need SPSS to open it
o We can access SPSS through the remote desktop. We can also rent it (but it will
not be free)
o SPSS output:
 You will be required to provide statistical output (a log of everything
you've done in SPSS) with some assignments
 Please provide these in Word or PDF format (File > Save As PDF, or
copy and paste). You may need to Export instead of Save As.
o Under IBM SPSS Statistics on the remote desktop.
o It can take a while to load and if it’s a really big file, just wait- it’s not frozen.
o If you're on your RIv desktop, be sure you're saving to your student drive
(under your name). If it is not saved under your name, it will get deleted and you
will lose your work.
o Always use the wording Dr. Neves provides in class to report data, you can add follow
up sentences but do not deviate from what she tells us (no need to cite that, it is widely
accepted)
Lecture:
o Variables: a property of an object or event that can take on different values.
o Think: What are we capturing/measuring?
o Instruments: any tool used to measure and yield measurement data.
o Think: How will we define and measure our variable?
o Scale: A conceptualization of the measurement data (numbers) we collect.
o Think: How will we make meaning of our measurement data?
o Independent Variable (IV): The variable we control and examine for its influence on the
dependent variable (predictor variable)
o Dependent Variable: The variable we measure to see the effects of the independent
variable (criterion/outcome variable)
o Discrete Variables: variables that have a well-defined, finite set of possible values;
when the values have no inherent order, they are categorical variables
o Scales of Measurement: (NOIR)
o N-Nominal
o O-Ordinal
o I-Interval
o R-Ratio
o Nominal Scales: different groups can only be categorized based on names/labels.
Frequencies are the only numbers that can be used to describe variables measured
with this scale. You can only talk about values on this scale; you can't go further without
more information.
o Ordinal Scales: items are classified according to whether they have more or less of a
characteristic. ORDER MATTERS: the order matters, but not the difference between
values (like rank: 1st place, 2nd place, 3rd place). Relative quantity (greater, faster)
comparisons. Examples: how would you rate the service of our wait staff, how much
pain are you in on a 1-10 scale, how many stars would you give this movie.
o Interval Scales: equal units along the interval (think of a number line); the difference
between any two adjacent units is equal and represents the same magnitude of the trait
or characteristic being measured across the whole range of the scale. We can say
something is greater by a number of units, but not relatively how much greater, because
there is no absolute zero point. Because of that, it's not possible to make statements
comparing how many times more one thing is than another ("I'm twice as happy as you
are," etc.), but you can say how many points greater one thing is than another.
Example: use the scale below to rate these questions: how much do you love
statistics? Likert scales are examples of interval scales (although it is sometimes argued
that they should be treated as ordinal).
o Ratio Scales: measurement that enables both precise and relative units of
comparison due to the availability of an absolute zero point. An absolute 'zero' point is
necessary for relative difference comparisons (meaning there must be an absence of
that unit). Examples: age, income, years of participation.

Data set being used for class: Dr. Mousseau’s dissertation data set. Identity
development and belief in God.

For next week:


o Get on SPSS and look at variables you see in the data set provided.
o Play around with SPSS
Class Three: 2/8/21
o In this class we are classifying valid psychological scales as Interval scales!!!
o Parameter: measurable quality of a population, or a numerical summary characteristic
of a population.
o Statistic: measurable quality of a sample, or a numerical summary characteristic of a
sample.
o Reports in a population are a true representation of opinion while reports in a sample
have a margin of error and confidence interval.
Mode: most frequently occurring value
Median: exact midpoint of all the values

Outliers: more than 3 standard deviations from the mean (typically)

z-score: a value's distance from the mean, expressed in standard deviations


The direction of skew matches the side toward which the mean is pulled away from the mode

“The tale is in the tail”


Dispersion: The extent to which observed values are “spread out” around the center
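The descriptive measures above (mode, median, z-scores, outliers, dispersion) can be sketched in Python's standard library; the sample values here are made up for illustration, not course data:

```python
import statistics as stats

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

mode = stats.mode(data)      # most frequently occurring value -> 4
median = stats.median(data)  # exact midpoint of the sorted values -> 4.5
mean = stats.mean(data)      # -> 5
sd = stats.stdev(data)       # sample standard deviation (a measure of dispersion)

# z-score: each value's distance from the mean in standard-deviation units
z_scores = [(x - mean) / sd for x in data]

# flag outliers: more than 3 SDs from the mean (none in this tiny sample)
outliers = [x for x in data if abs(x - mean) / sd > 3]
```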

Class Four
Lecture:
o Normal distribution: when mean, median, and mode are all the same (bell curve).
The distribution of values is typical: "most people are this way, but some people are that
way."
o Most people are within 3 standard deviations of the mean; however, many people like
to go by 2 standard deviations because that still covers 95% of the population.

Inferential Statistics

P value (probability value): An indicator used to determine whether your findings are observed
by chance or because your hypothesis is actually supported. It has to be below a certain
threshold for you to be able to reject the null hypothesis. "The probability that a researcher
would obtain results as extreme as they did if the null hypothesis were true," OR the
probability that you would get the results that you got if you were wrong. Often below 0.05 (a
5% chance that I would get these results even if my hypothesis were wrong).

When we say that something is significantly different and then report p < 0.05, we are stating
that we are at least 95% sure (or more) that the sample mean and the mean we are testing it
against (either another sample or the general population) are not the same.
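As a sketch of the p-value logic above, here is a one-sample t-test in Python; the quiz scores and the population mean of 70 are hypothetical, not from the course data set:

```python
from scipy import stats

# hypothetical sample: quiz scores, tested against a population mean of 70
scores = [72, 75, 68, 80, 77, 74, 71, 79]
t_stat, p_value = stats.ttest_1samp(scores, popmean=70)

# reject the null hypothesis only when p falls below the alpha threshold
alpha = 0.05
reject_null = p_value < alpha
```

Here the sample mean (74.5) is far enough above 70, relative to the spread, that the p-value falls below .05 and we reject the null.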

Degrees of Freedom (df): the number of degrees of freedom for a collection of sample data is
the number of sample values that can vary after certain restrictions have been imposed on all
data values. OR in inferential stats, we are using data points or observations (N=number of
these points) to estimate parameters (Educated estimates about populations) for variables.
-We may use one or more pieces of information to estimate our parameter (mean, SD).
-Every time we use a piece of info to estimate a parameter, that piece of info cannot
change or vary because we are using it.
-We are left with df, or the number of pieces of information we have to estimate a
population value.
-You lose a degree of freedom or number of degrees of freedom every time you look at
a specific piece of information, by imposing restriction on your model. Degrees of
freedom refer to what is LEFT after running all your analyses.
-Very low degrees of freedom is a red flag in papers

Hypothesis Testing…
As statisticians we are always estimating the probability that our findings would be observed if
we were wrong. We test whether a relationship or difference TRULY exists or if it’s a function of
“random noise” (or chance) from the sample.
One-Tailed/Directional Test: Make a prediction of the direction in which the individual will differ
from the mean. The rejection region is located in only one tail of the distribution.
Two-Tailed/Nondirectional Test: Reject extremes in both tails (because we did not make a
statement on the hypothesis about which way it would go).

Error…

-Type 1 Error: The null hypothesis was true but we rejected it and accepted our
hypothesis instead. We said we were right when we were actually wrong. (Alpha is the
probability of this: if alpha is .15, we have a 15% chance of making a type 1 error, or saying our
hypothesis was true when it was not.) We set our own alpha level; it's the same as the
p-value threshold we're using.
-Type 2 Error: We didn't reject the null but our hypothesis was actually supported. We were
right but we didn't say we were right. You miss a significant finding because you thought
you were wrong. We refer to the probability of making a type II error as β (beta).
The POWER of a test is 1 − β (one minus the probability of making a type II
error): our ability to recognize significant changes.
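The meaning of alpha can be checked by simulation: when the null hypothesis is actually true, a test run at alpha = .05 should falsely reject about 5% of the time. A rough sketch (simulated data, not course data):

```python
import random
from scipy import stats

random.seed(42)
alpha = 0.05
trials = 2000
false_rejections = 0

# draw samples from a population where the null is TRUE (mean really is 0)
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(30)]
    _, p = stats.ttest_1samp(sample, popmean=0)
    if p < alpha:              # any rejection here is a Type I error
        false_rejections += 1

type1_rate = false_rejections / trials  # lands near alpha, i.e. ~0.05
```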

Testing Hypotheses: T-Tests and Anova Testing

Over the next week…


-Try to do t-test/anova testing for hypothesis for in-class research question
-Start working on lit review for theoretical research paper

P-Value/Significance Review…
P-value= the probability that we would see our results even if we were wrong

Tests of Differences…

Effect Sizes: a measure of the strength of an effect (how well does the intervention work).
Hypothesis testing only tells us that there is a “significant” difference or relationship. It does not
tell us the strength, or magnitude, of this effect.

Types of effect sizes:


-standardized measures (like Pearson’s r, Cohen’s d, and odds ratio)
-unstandardized measures of effect (e.g., the raw difference between group means and
unstandardized regression coefficients)
Associations (Pearson’s r and r2)
- r = 0.1-0.23, small effect
- r = 0.24-0.36, medium effect
- r ≥ 0.37, large effect
Differences (Cohen’s d)
- d = 0.2, small effect
- d = 0.5, medium effect
- d = 0.8, large effect
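A sketch of computing Cohen's d by hand for two hypothetical groups, using the pooled standard deviation (this simple pooling assumes equal group sizes):

```python
import statistics as stats

# hypothetical scores under two teaching methods (equal group sizes)
group1 = [85, 88, 90, 78, 84, 86, 91, 82]
group2 = [80, 76, 83, 75, 79, 81, 77, 74]

m1, m2 = stats.mean(group1), stats.mean(group2)
s1, s2 = stats.stdev(group1), stats.stdev(group2)

# pooled standard deviation for equal-sized groups
pooled_sd = ((s1 ** 2 + s2 ** 2) / 2) ** 0.5

# Cohen's d: mean difference in pooled-SD units
cohens_d = (m1 - m2) / pooled_sd
```

Against the benchmarks above, a d this far past 0.8 would count as a large effect.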

More Inferential Statistics


Inferential statistics allow you to determine the likelihood of your findings being the result of
random chance.
-a statistically significant finding at the .05 level indicates that there might be a real
effect.

All tests of statistical significance involve a comparison between an observed value (obtained
via your study) and an expected value.

Every type of test rests on particular assumptions about your data (that it's evenly distributed,
normal, etc.). Check your assumptions any time you select and run a test.

Chi-Square Tests:
-Chi-Square is a statistical test commonly used to compare observed data with data we would
expect to obtain according to a specific hypothesis.
-Chi-Square is the sum of the squared differences between observed (o) and expected (e)
data (the deviation, d) divided by the expected data, across all possible categories:
χ² = Σ (o − e)² / e.
-Based on frequencies and on the variables being independent of one another (one does not
affect the other)

-One-Sample (Goodness of Fit): Explores the proportion of cases that fall into the
various categories of a single variable and compares these with hypothesized values.

-Two-Sample (Test for Independence): Used to determine whether two categorical
variables are related. It compares the frequency of cases found in the various categories
of one variable across the different categories of the other variable.

We are always comparing observed (from sample) and expected (based on general population
stats).

Chi square calculator is in module 7/8 in Canvas!!!!


Degrees of freedom for chi-square: df = (number of rows − 1) × (number of columns − 1).

Check your p-value in chi-square to determine if there is a significant difference, then look at
the chi-square value to speak to how.
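A sketch of a two-sample (test for independence) chi-square in Python, on a made-up 2x3 contingency table; note that the degrees of freedom come out to (rows − 1) × (columns − 1):

```python
from scipy.stats import chi2_contingency

# hypothetical 2x3 contingency table: preference counts by group
observed = [[30, 20, 10],
            [20, 25, 15]]

# returns the chi-square statistic, p-value, df, and the expected counts
chi2, p, df, expected = chi2_contingency(observed)

# df = (2 rows - 1) * (3 columns - 1) = 2
```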

PARAMETRIC VERSUS NON PARAMETRIC TESTS

Parametric are for continuous data, non-parametric are for categorical data.

Parametric:
Interval or ratio data
Large sample size
Assumptions:
All observations are independent of other observations
Data are normally distributed
The variances in the different treatment groups are the same
(Homoscedasticity/homogeneity of variances)
Nonparametric:
Nominal or ordinal data
Small sample size
Assumptions:
Independence of observations (one observation does not influence another
observation)
Do not have underlying assumptions about the shape or parameters of the
underlying population distribution.
Drawbacks:
Not as powerful as parametric tests (if the data are normally distributed)
Not as easily interpretable (less meaning in the interpretation)
Less precision
Mostly limited to univariate and bivariate analyses

The T-TEST
Compares differences between two groups. Has one IV (discrete variable with two levels), and
one DV (interval or ratio variable, or continuous). Looks at difference, variability and the sample
size (bigger sample size = less chance for error due to chance)

Two primary types of t-tests:


-Independent-sample (between groups, student t-test)
Looks at differences between two SEPARATE/INDEPENDENT groups…
-Dependent Sample (paired t-test)
Looks at difference among the same participants over two time points.
Levene's test will be statistically significant if the group variances are unequal (this is what
tells you whether to read the top row, equal variances assumed, or the bottom row of the SPSS
output).
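The Levene-then-t-test workflow described above can be sketched in Python with hypothetical groups; scipy's `equal_var` flag plays the role of SPSS's top-versus-bottom-row choice:

```python
from scipy import stats

# hypothetical independent/separate groups
group_a = [5.1, 4.8, 6.2, 5.5, 5.9, 4.7, 5.3, 6.0]
group_b = [4.2, 3.9, 4.5, 4.1, 4.8, 3.7, 4.4, 4.0]

# Levene's test: a significant result (p < .05) means unequal variances
lev_stat, lev_p = stats.levene(group_a, group_b)
equal_var = lev_p >= 0.05

# independent-samples (between-groups) t-test
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=equal_var)
```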

The ANOVA (Analysis of Variance)


Compares differences between 2 or more groups: are the groups significantly different or could
the differences be due to chance?
-IV = Discrete variable(s) with 2 or more levels
-race
-child, adolescent, or adults
-Experimental condition

-DV = interval or ratio variable (continuous)


-attitude about technology use for children
-Recall on a memory task
-Anova Assumptions: normality and homoscedasticity (equal variances in each group)
F Statistic: Based on variance
- F < 1 = no difference
- F = 1 = no difference
- F > 1 = difference
One-way Anova
-one IV (with two or more levels)
-one DV
Two (or more)-way Anova
-2+ IVs (with two or more levels)
-2x2 anova
-2x3 anova
-3x4 anova
-one DV

TWO PRIMARY TYPES OF ANOVAS


-Between groups: looks at differences between two or more separate/independent groups
-Within groups: looks at differences among the same participants over two or more time points
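A between-groups one-way ANOVA can be sketched with scipy; the three age-group samples below are made up for illustration:

```python
from scipy.stats import f_oneway

# hypothetical scores for three separate/independent groups (one IV, three levels)
children    = [3, 4, 5, 4, 3, 4]
adolescents = [6, 7, 6, 5, 7, 6]
adults      = [8, 7, 9, 8, 7, 8]

f_stat, p_value = f_oneway(children, adolescents, adults)
# F substantially greater than 1 (with a small p) suggests a real group difference
```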

To do a T-Test in SPSS
>analyze
>compare means
>independent samples t-test

Classes 10/11

Factorial notation: count the number of independent variables and then put levels of each IV as
the ____ x ______
SO, if there are four independent variables we would have four blanks with x in between:
__x__x__x__
THEN we fill in the blanks with the levels of each IV. So if one IV is grade and we’re looking at
four grades, the number in the blank would be 4 because we are looking at four different grade
levels: 4x___x___x___. THEN just fill in the rest of the blanks with what each IV has for levels
and you would get (made up rest of numbers): 4x2x3x2 is what it would look like, four numbers
because there are four IVs and then the numbers in the actual notation are the number of
levels in each IV.

MAIN EFFECTS AND INTERACTIONS

Interaction effects: effect of one IV on DV when being moderated by another.

Interaction: the effect of treatment type (IV1) on the dependent variable depends on the level of the other IV


Main effect: time spent in treatment (IV2) has an effect on the dependent variable
Main effect & interaction: there is a significant difference in score based on both IVs but one
has more of an effect.

STATISTICAL ASSUMPTIONS:
Assumptions: allow us to do a certain statistical test BECAUSE the data is appropriate for it (ex:
data is normally distributed, equal variance, etc.)

T-Test ASSUMPTIONS:
• Assumptions related to in/dependence of your groups
• Assess independence to determine independent or dependent samples t-Test
ANOVA ASSUMPTIONS:
• Normality (raw scores and residuals)
• Homoscedasticity (i.e., equal variances in each group)

Before reporting results for your test (i.e., t-Test, ANOVA, correlation), briefly state that you
checked the assumptions for the (non)parametric test and list the assumption(s).

IF YOUR DATA IS NOT NORMALLY DISTRIBUTED: you would know this by checking (through the
Shapiro-Wilk test), and you would need to do a transformation in SPSS that will allow you to
run your statistical analysis (typically we just raise the variable to the half power, or take log x).
You can do this in SPSS by choosing "natural log transformation" when you're requesting the
Q-Q plot.
THEN, you need to make a new variable for the transformed data. Go to
"Transform > Compute Variable", choose your variable, put it in parentheses, then put LG10
(for log x) before it.
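The check-then-transform workflow above can be sketched in Python; the right-skewed values are hypothetical, and the base-10 log here mirrors what SPSS's LG10 function computes:

```python
import math
from scipy.stats import shapiro

# hypothetical right-skewed variable (e.g., reaction times in ms)
raw = [120, 135, 150, 160, 180, 210, 260, 340, 480, 900]

# Shapiro-Wilk: a 'Sig' (p) below .05 means NOT normally distributed
_, p_raw = shapiro(raw)

# base-10 log transformation (SPSS: LG10(variable) via Transform > Compute Variable)
transformed = [math.log10(x) for x in raw]
_, p_trans = shapiro(transformed)  # re-check normality on the transformed variable
```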

To exclude a level of data, go to "Data" > "Select Cases", then make a new rule for "if condition
is satisfied" (the second option). Then choose your variable, put NOT and the variable in
parentheses, then the value number (as labeled) that you're removing.

If your Shapiro-Wilk test 'Sig' is significant (less than 0.05), that means your data is not
normally distributed. For the purposes of this class, if it's close (somewhere around .05), you
can assume normality.

Tests of Association

THE CORRELATION
Looks at the magnitude and direction of LINEAR relationships.
– Correlation coefficient is a number that offers a standardized way of expressing
the magnitude and direction of relationships.
– Correlation is basically just a line of best fit when you simplify it.
– Based on the following:
■ Sample Size
■ Raw Scores
■ Mean
■ Standard Deviation
CORRELATION ASSUMPTIONS
• Normality: X and Y are normally distributed
• Homoscedasticity: The variance of residual is the same for any value of X or Y.
• Linearity: The relationship between X and Y is linear
• Outliers: there are no significant outliers in the data for X by Y (you can remove
outliers though to be able to do a Correlation)

■ PEARSON’S r
■ The correlation coefficient used for parametric data.
– A measure of the extent to which paired scores occupy the same or opposite
positions within their distributions.
– The degree to which 2 variables overlap.

Positive: as x increases, y increases, r will be +


Negative: as x increases, y decreases, r will be –
So you look at the direction (negative or positive), the size (none/negligible, weak, moderate, or
large), and significance.
RUNNING A CORRELATION IN SPSS:
>analyze
>correlate
>bivariate (for two variables)

To build a scatter plot for the correlation:


>GRAPHS
>LEGACY DIALOGS
>SCATTER
>SIMPLE SCATTER
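A sketch of running the same Pearson correlation outside SPSS, with hypothetical paired scores:

```python
from scipy.stats import pearsonr

# hypothetical paired scores: study hours (x) vs. exam grade (y)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [52, 55, 61, 60, 68, 70, 75, 79]

r, p_value = pearsonr(x, y)
# r near +1: as x increases, y increases (strong positive linear relationship)
```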

THE LINEAR REGRESSION


Allows you to determine whether 1+ variables can predict an outcome variable.
– Can determine the amount of variance in one variable that is explained by other
variables.
■ IV = predictor variable: The variable(s) that you hypothesize is(are)
predicting an outcome.
■ DV = criterion variable: The variable that is being influenced/predicted.
REGRESSION ASSUMPTIONS
• Normality: All variables in the regression model are normally distributed
• Linearity: The relationship between X and the mean of Y is linear.
• Homoscedasticity: The variance of residual is the same for any value of X.
• Independence: Observations are independent of each other.
**Typically you should run a correlation before you run a regression because you want to see if
there’s even a relationship before moving forward with the regression.

THE LINEAR REGRESSION


■ The linear regression line is the straight line that passes through the data points that
results in the smallest residuals.
– A straight line denoted by the following equation:
■ y = a + bx
– y = DV/criterion variable
– x = IV/predictor variable
– a = The y-intercept (where regression line touches y-axis)
– b = The slope/gradient of the regression line

THE LINEAR REGRESSION


■ An example using the regression equation.
– Predicting final paper grade based on revision hours.
■ IV/predictor = revision hours
■ DV/criterion = final grade
■ y-intercept = 33.4
■ Gradient = 1.8
y = bx + a
Criterion = y-intercept + (gradient * revision hours)
Final Grade = 33.4 + (1.8 * revision hours)
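The worked example above, as a tiny Python sketch of the regression equation:

```python
# regression equation from the example: y = a + b*x
a = 33.4   # y-intercept (where the regression line touches the y-axis)
b = 1.8    # slope/gradient of the regression line

def predict_final_grade(revision_hours):
    """Criterion = y-intercept + (gradient * predictor)."""
    return a + b * revision_hours

# e.g., 20 hours of revision predicts: 33.4 + (1.8 * 20) = 69.4
grade = predict_final_grade(20)
```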

IN SPSS
ANALYZE
>REGRESSION
>LINEAR
DEP: what you’re looking to predict
IND: what you think will predict it or affect it
>PLOTS
Y:dep
X: “zpred”

INTERPRETATION OF TABLES
Adjusted R square = the percentage of variance in the outcome that the model explains/predicts

We only look at standardized coefficients; unstandardized coefficients do not matter

Always check distributions and assumptions
