

Statistics Cheat Sheet: Formulas and Steps

Statistiek I (Vrije Universiteit Amsterdam)




Statistics
Descriptive statistics = techniques that enable us to summarize a set of numbers
Inferential statistics = techniques that enable us to make inferences based on the data

Scales of measurement
The scale of measurement matters for visualization and other descriptions of the data, and for
choosing the right analysis in inferential statistics.

1. Nominal: Only categories, no order. Examples: gender, marital status, ethnicity, political
preference, type of crime.
2. Ordinal: Logical order (can be ranked), no fixed distance between values. Examples: education,
crime seriousness, socio-economic status, military ranks, top 5 students in a class, Likert-type
questions (e.g., fear-of-crime survey items from totally disagree to totally agree).
3. Interval: Logical order and fixed distance between values; the value zero does not mean
absence. Examples: intelligence, temperature, test score on an exam, voltage, time of day.
4. Ratio: Logical order, fixed distance between values, and a meaningful zero point (0 means
absence). Examples: recidivism, weight, age, number of crimes, time spent on something,
amount, frequency.


Z scores
= the number of standard deviations that a score differs from its mean
Raw score to z score
1 Draw a picture
2 Calculate the z score: z = (X − μ) / σ
3 Table J.1: choose the areas you’re interested in (area beyond / area between)
4 Answer
Z score to raw score
1 Draw a picture
2 Determine the z score: determine the area you’re interested in (area beyond/between), look for
the given percentage in that column and read off the corresponding z score
3 Calculate the x score: X = μ + zσ
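A minimal Python sketch of both conversions (the numbers are made up; scipy's normal distribution stands in for the Table J.1 lookups):

```python
from scipy import stats

mu, sigma = 100, 15                    # hypothetical population mean and SD
x = 130

# Raw score to z score: z = (X - mu) / sigma
z = (x - mu) / sigma
area_beyond = 1 - stats.norm.cdf(z)    # the "area beyond z" column of the table
print(f"z = {z:.2f}, area beyond = {area_beyond:.4f}")

# Z score to raw score: find the z that cuts off the top 5%, then X = mu + z * sigma
z_crit = stats.norm.ppf(0.95)
x_crit = mu + z_crit * sigma
print(f"x = {x_crit:.1f}")
```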


Hypotheses
H0 (null hypothesis): the statement that there is no effect or no difference; this is the hypothesis
that is tested.
H1 (alternative hypothesis): the statement that there is an effect or a difference; accepted when H0
is rejected.


Goodness-of-fit chi-square
Nominal data: 1 variable, >= 2 categories
• One variable, multiple outcomes
• Compares observed and expected frequencies
• Overall test of significance
• Expected frequencies must be 5 or more in each cell
 Is non-parametric and distribution free
 Expected frequencies based on theoretical distribution
 Degrees of freedom = c - 1
Assumptions:
• One nominal variable
• Observations are independent
Examples:
“Does the handedness among the LiS students equal the expected distribution?”
“Was the deck adequately shuffled?”
“Is the coin fair?”
1 Formulate H0 : variable equals theoretical distribution
Hypotheses H1 : variable does not equal theoretical distribution

2 Calculate χ2
χ2 = Σ (fo − fe)2 / fe

fo = observed frequencies
fe = expected frequencies (if null hypothesis is true)

Make a table (calculate each category separately):

Category | fo | fe | fo − fe | (fo − fe)2 | (fo − fe)2 / fe
A | | | | |
B | | | | |
C | | | | |
D | | | | |

χ2 = sum of the last column

3 Determine the Depends on the degrees of freedom (df) = the number of observations
critical value χ2 out of the total that are free to vary
(α set at .05)  df = c–1
 c = number of categories

 Look at Table J.2

4 Decide on χ2 higher than the critical value? Significant: Reject H0, accept H1
hypotheses χ2 lower than the critical value? Not significant: Retain H0

5 Report the “The results do (not) provide sufficient evidence to reject/retain the null
results hypothesis” + what it means

(χ2(df, n = N) = value, p </> .05)
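A minimal sketch of steps 2–5 in Python, assuming a made-up handedness count (n = 60) and a 90/10 expected split; scipy.stats.chisquare does the χ2 sum:

```python
from scipy import stats

observed = [48, 12]   # fo: right-handed, left-handed (hypothetical counts, n = 60)
expected = [54, 6]    # fe under H0: a 90% / 10% theoretical split of n = 60

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)   # df = c - 1 = 1
print(f"chi2(1, n = 60) = {chi2:.2f}, p = {p:.3f}")
# Significant (reject H0) when chi2 exceeds the critical value, i.e. when p < .05
```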

Chi-square test of independence


Nominal data: 2 variables, >= 2 categories


• Two variables, multiple outcomes
• A test for difference in proportions
• Overall test of significance
• Expected frequencies must be 5 or more in each cell
• Expected frequencies based on calculations: fe of a cell = (row total)(column total) / N
• Degrees of freedom = (#rows-1)*(#columns-1)
Assumptions:
• Two nominal variables
• Observations are independent
Examples:
• Is there a difference in the distribution of stress (low/high) for people who work out 0-2 times
a week or more than two times?
1 Formulate hypotheses H0: variables are independent
H1: variables are related
2 Calculate χ2
χ2 = Σ (fo − fe)2 / fe

 Make a table with the totals of the rows and columns to
calculate fe per cell: fe = (total of row × total of column) / total n

Make a table (calculate each cell separately):

Cell | fo | fe | fo − fe | (fo − fe)2 | (fo − fe)2 / fe
A | | | | |
B | | | | |
C | | | | |
D | | | | |

χ2 = sum of the last column

3 Determine the critical Degrees of freedom = (#rows − 1) × (#columns − 1)
value χ2 (α set at .05)
 Look at Table J.2

4 Decide on hypotheses χ2 higher than the critical value? Significant: Reject H0, accept H1
χ2 lower than the critical value? Not significant: Retain H0

5 Report the results “The results do (not) provide sufficient evidence to reject/retain the
null hypothesis” + what it means

(χ2(df, n = N) = value, p </> .05)
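A sketch with a hypothetical 2×2 stress-by-workout table; scipy.stats.chi2_contingency computes the expected frequencies and χ2 in one call (correction=False matches the plain formula above):

```python
import numpy as np
from scipy import stats

# Rows: stress (low/high); columns: workouts per week (0-2 / more) - invented counts
table = np.array([[30, 20],
                  [15, 35]])

chi2, p, df, expected = stats.chi2_contingency(table, correction=False)
print(f"chi2({df}, n = {table.sum()}) = {chi2:.2f}, p = {p:.3f}")
print(expected)   # fe per cell = (row total)(column total) / n; check all are 5 or more
```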


One sample z test


Interval/ratio data: 1 IV, 1 DV, 1 sample
• Examines whether the obtained sample mean differs from a known or hypothetical population
mean
• A test for differences between means
Assumptions:
• Interval or ratio data
• A random sample
• Normally distributed population
• Population standard deviation must be known
Example:
H0: The mean number of children in this sample is similar to the population mean
1 Formulate hypothesis H0: sample mean is similar to the hypothesized or known
(one- or two-tailed test?) population mean
H1: sample mean differs from the hypothesized or known
population mean

Directional hypothesis (higher/lower) → one-tailed test
Non-directional hypothesis (differs) → two-tailed test

2 Calculate the standard error σM = σ / √n

σ = population standard deviation
n = sample size

3 Calculate the z test z = (M − μ) / σM

M = sample mean
μ = population mean

4 Critical value z Table J.1: z-table

 Two-tailed: look at area C at .025 (two critical values)
 One-tailed: look at area C at .05 (one critical value)

5 Decide on hypotheses One-tailed:


Z value lower than the critical value? Retain H0
Z value higher than the critical value? Reject H0 & accept H1

Two-tailed:
Z value between the two critical values? Retain H0
Z value beyond one of the critical values? Reject H0 & accept H1

6 Report the results “There was (not) sufficient evidence that the sample mean differs
from the hypothesized or known population mean”

(z = value, p </> .05)
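A sketch of the full procedure with invented numbers (the known population σ is what licenses a z test rather than a t test):

```python
import math
from scipy import stats

mu, sigma = 1.8, 0.9                  # known population mean and SD (hypothetical)
M, n = 2.1, 36                        # sample mean and sample size (hypothetical)

se = sigma / math.sqrt(n)             # standard error of the mean: sigma / sqrt(n)
z = (M - mu) / se
p = 2 * (1 - stats.norm.cdf(abs(z)))  # two-tailed p value
print(f"z = {z:.2f}, p = {p:.3f}")    # two-tailed, alpha = .05: reject H0 if |z| > 1.96
```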


One sample t test


Interval/ratio data: 1 IV, 1 DV, 1 sample
• A test for difference between means (if the sd of a population is unknown)
• SXP = the estimated standard deviation of the population, based on the standard deviation of a sample
Assumptions:
• Interval or ratio data
• Random sample from a normally distributed population (especially if n < 30)
Sample standard deviation:
SX = √( Σ(X − M)2 / n )

Population estimate of the standard deviation:
SXP = √( Σ(X − M)2 / (n − 1) )

1 Formulate hypothesis H0: sample mean is similar to the hypothesized or known


(one- or two-tailed test?) population mean
H1: sample mean differs from the hypothesized or known
population mean
2 Calculate the standard error SM = SXP / √n

3 Calculate the t test t = (M − μ) / SM

4 Critical value t df = n − 1
• Look at Table J.3 (choose the most conservative one)
5 Decide on hypotheses One-tailed:
t value lower than the critical value? Retain H0
t value higher than the critical value? Reject H0 & accept H1

Two-tailed:
t value between the two critical values? Retain H0
t value beyond one of the critical values? Reject H0 & accept H1

6 Report the results “There was (not) sufficient evidence that the sample mean differs
from the hypothesized or known population mean”

(t(df) = value, p </> .05)
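A sketch with made-up scores; scipy.stats.ttest_1samp uses the n − 1 estimate and df = n − 1 internally:

```python
from scipy import stats

scores = [12, 9, 11, 13, 8, 10, 14, 11]       # hypothetical sample
t, p = stats.ttest_1samp(scores, popmean=10)  # H0: mu = 10, two-tailed by default
print(f"t({len(scores) - 1}) = {t:.2f}, p = {p:.3f}")
```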


Independent samples t test


Interval/ratio data: 1 IV, 1 DV, 2 independent samples
• Test for difference between means
• 1 independent variable, 2 samples (experimental and control group)
• Does the difference between two sample means (M1 − M2) match the difference between
population means (μ1 − μ2)?
Assumptions:
• Interval or ratio data
• Two random samples
• Normally distributed populations
• Population variances are equal
1 Formulate hypotheses H0: the difference between two sample means matches the
(one- or two-tailed?) difference between two population means (M1 − M2) = (μ1 − μ2)
H1: the difference between two sample means differs from the
difference between two population means

S2XP unknown? Make a table to estimate the population variance:

Group | X − M | (X − M)2
Control group (1) | X1 − M1 | (X1 − M1)2
Experimental group (2) | X2 − M2 | (X2 − M2)2

S2XP = (SS1 + SS2) / (n1 + n2 − 2)

2 Calculate the standard S(M1−M2) = √( S2XP/n1 + S2XP/n2 )
error of the difference
between sample means n = sample size per group
S2XP = estimate of the population variance

3 Calculate the t test t = (M1 − M2) / S(M1−M2)

4 Critical value t df = n1 + n2 − 2

• Table J.3

5 Decide on hypotheses One-tailed:


t value lower than the critical value? Retain H0
t value higher than the critical value? Reject H0 & accept H1

Two-tailed:
t value between the two critical values? Retain H0
t value beyond one of the critical values? Reject H0 & accept H1
6 Report the results (t(df) = value, p </> .05)
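A sketch with invented groups; equal_var=True gives the pooled-variance test described above (df = n1 + n2 − 2):

```python
from scipy import stats

control = [14, 12, 15, 11, 13, 12]        # hypothetical control group
experimental = [17, 15, 18, 16, 14, 19]   # hypothetical experimental group

t, p = stats.ttest_ind(control, experimental, equal_var=True)
df = len(control) + len(experimental) - 2
print(f"t({df}) = {t:.2f}, p = {p:.3f}")
```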


One-way between-subjects ANOVA


Interval/ratio data: 1 IV, 1 DV, > 2 independent samples
• Tests for difference between more than 2 groups
Assumptions:
• Interval or ratio data
• Random samples
• Normally distributed populations (can be safely ignored when n > 30)
• Population variances are equal
1 Formulate hypotheses H0: the populations from which the samples were drawn have equal
means
H1: at least two of the population means differ
2 Uses the F distribution; complete the ANOVA table:

Source | SS | df | MS | F
Between (BET) | SSBET | k − 1 | MSBET = SSBET / dfBET | F = MSBET / MSW
Within (W) | SSW | N − k | MSW = SSW / dfW |
Total (T) | SST | N − 1 | |

MS = mean square; SS = sum of squares
k = number of groups; n = number of subjects in each group; N = total number of subjects in all
the groups

3 Determine the critical Look in Table J.4
value F • df numerator = dfBET
• df denominator = dfW
α = .05: in lightface type
α = .01: in boldface type
4 Decide on hypotheses F value lower than the critical value? Retain H0
F value higher than the critical value? Reject H0 & accept H1
5 If H0 is rejected: post- To find out which groups differ:
hoc comparisons  Tukey’s honestly significant difference test (Tukey HSD):

HSD = q · √( MSW / n )

n = number of subjects in each sample if all samples are the same
size; otherwise use the harmonic mean of the sample sizes:
n = k / (1/n1 + 1/n2 + … + 1/nk)

 q: look up in Table J.5
• Column = number of means being compared
• Row = dfW

 Subtract the means from each other separately (the order doesn’t
matter; ignore − signs); the differences that are larger than the HSD
are significant.
6 Report the results The sample means were (not) found to differ significantly
(F(dfBET, dfW) = value, p </> .05)
 If the result was significant: “Post hoc comparisons showed
that……..” (report which groups differed)
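A sketch with three invented groups; scipy gives the omnibus F, and statsmodels' pairwise_tukeyhsd (one common post-hoc choice, assumed installed) covers step 5:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

g1 = [4, 5, 6, 5, 4]                      # hypothetical scores per group
g2 = [7, 8, 6, 9, 7]
g3 = [5, 6, 5, 7, 6]

F, p = stats.f_oneway(g1, g2, g3)
k, N = 3, 15                              # number of groups, total subjects
print(f"F({k - 1}, {N - k}) = {F:.2f}, p = {p:.3f}")

# Post hoc comparisons, only if H0 was rejected:
scores = np.array(g1 + g2 + g3)
groups = np.repeat(["g1", "g2", "g3"], 5)
print(pairwise_tukeyhsd(scores, groups, alpha=0.05))
```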


Correlation
Association = as one variable changes, the other variable changes in a predictable manner
 the variables covary

Correlation = a measure of the degree of association among variables (not a matter of cause and
effect!!!)

ρ (rho) = correlation in a population
r = correlation in a sample

Phi (rϕ)
Nominal data: Correlation
• Measure of correlation between two nominal dichotomous variables (correlation varies from
-1 to 1)
 r = 0: no relationship
 r = −1: perfect relationship
 r = +1: perfect relationship
Assumptions of phi:
• Nominal variables
• Data are in the form of two dichotomies
1 Formulate hypotheses H0: the two dichotomous variables are not related
H1: the two dichotomous variables are related
2 Calculate phi Put the observed frequencies in a 2×2 table:

| Y: category 1 | Y: category 2
X: category 1 | (a) | (b)
X: category 2 | (c) | (d)

rϕ = (ad − bc) / √( (a + b)(c + d)(a + c)(b + d) )

3 Calculate the χ2 test Testing whether the found value of phi is significantly different from
statistic 0 is done using a χ2 distribution: χ2 = n(rϕ)2
4 Determine the critical Find the critical value of χ2 in the table, with
value χ2 df = (#columns − 1)(#rows − 1) = 1
5 Decide on hypotheses χ2 lower than the critical value? Retain H0
χ2 higher than the critical value? Reject H0, accept H1
6 Report the results The variables X and Y are (not) related
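A sketch with a made-up 2×2 table of frequencies:

```python
import math
from scipy import stats

a, b, c, d = 25, 15, 10, 30   # hypothetical cell frequencies
n = a + b + c + d

phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
chi2 = n * phi ** 2                    # test phi against 0, df = 1
p = 1 - stats.chi2.cdf(chi2, df=1)
print(f"phi = {phi:.2f}, chi2(1, n = {n}) = {chi2:.2f}, p = {p:.3f}")
```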


Pearson r
Interval/ratio data: Correlation
• Measure of correlation for 2 interval/ratio variables (correlation varies from -1 to 1)
 r = 0: no relationship
 r = −1: perfect negative relationship (one increases, the other decreases)
 r = +1: perfect positive relationship (one increases, the other one does too)
• The sign (+ or -) indicates the direction of the relationship
• Measure of linear relationship
• Be aware of restriction of the range
• Prediction is limited to the range of the original variables
• Coefficient of determination (r squared) = proportion of the variance in one variable that is
explained by another variable
Assumptions
• Interval or ratio data
• Data are paired
• Linear relationship
• Normal distribution for X and Y variables
1 Formulate hypotheses H0: the variables are not related (ρXY = 0)
(one- or two-tailed?) H1: the variables are related (ρXY ≠ 0)

2 Make a computation table:

# | X | Y | X − MX | Y − MY | (X − MX)(Y − MY) | (X − MX)2 | (Y − MY)2
1 | | | | | | |
2 | | | | | | |
3 | | | | | | |
ΣX = ΣY = Σ(X − MX)(Y − MY) = Σ(X − MX)2 = Σ(Y − MY)2 =
MX = MY =

Calculate Pearson r:

r = Σ(X − MX)(Y − MY) / √( Σ(X − MX)2 · Σ(Y − MY)2 )

3 Determine the critical Look at Table J.6
value of Pearson r  df = n − 2
 n = number of pairs of scores
4 Decide on hypotheses Pearson r (absolute value) lower than the critical value? Retain H0
Pearson r (absolute value) higher than the critical value? Reject H0, accept H1
5 Report the results No/high/low correlation
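A sketch with invented pairs; scipy.stats.pearsonr returns r and a two-tailed p (df = n − 2):

```python
from scipy import stats

x = [2, 4, 5, 7, 9, 10]   # hypothetical paired interval/ratio data
y = [3, 5, 4, 8, 9, 12]

r, p = stats.pearsonr(x, y)
print(f"r({len(x) - 2}) = {r:.2f}, p = {p:.3f}, r squared = {r ** 2:.2f}")
```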


Scatterplot: plot the paired (X, Y) scores to check visually that the relationship is linear.


Linear regression
Interval/ratio data: Regression
• If there is a significant correlation, knowing the value of one variable will assist in predicting
the value of the other variable
• Regression: predicting one variable from another variable
• Provides an equation that helps predict the value of Y
• Prediction is limited to the original range of the values
• Standard error of estimate (sŶ) = standard deviation of Y scores around the regression line
Assumptions:
• Interval or ratio data
• Data are paired
• Linear relationship
• Only used when Pearson r is statistically significant

Ŷ = bX + a
b (slope) = Σ(X − MX)(Y − MY) / Σ(X − MX)2
a (Y intercept) = MY − b·MX
 Get the needed sums from the Pearson r computation table
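A sketch computing the regression line by hand from the same sums (data invented; dividing by n − 2 for the standard error of estimate is one common convention):

```python
import numpy as np

x = np.array([2, 4, 5, 7, 9, 10])   # hypothetical paired data
y = np.array([3, 5, 4, 8, 9, 12])

b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()  # slope
a = y.mean() - b * x.mean()                                                # Y intercept
y_hat = b * x + a                                                          # predicted Y

# Standard error of estimate: SD of the Y scores around the regression line
s_est = np.sqrt(((y - y_hat) ** 2).sum() / (len(x) - 2))
print(f"Y-hat = {b:.2f}X + {a:.2f}, standard error of estimate = {s_est:.2f}")
```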


BOOK REID

Chapter 1 - INTRODUCTION
Absolute value: the magnitude of a number irrespective of whether it is positive or negative.
Data (plural of datum): factual information, often in the form of numbers.
Descriptive statistics: techniques that are used to summarize a set of numbers.
Inferential statistics: techniques that are used in making decisions based on data.
Mean: sum of the scores divided by the total number of scores.

Chapter 2 - DESCRIBING NOMINAL AND ORDINAL DATA


Bar graph: a graph in which the frequency of each category or class of observation is indicated by
the length of its associated bar.
Bimodal: a descriptive term for a distribution that has two modes.
Frequency distribution: a listing of the different values or categories of the observations along
with the frequency with which each occurred.
Interval scale of measurement: a measurement scale in which the magnitude of the difference
between numbers is meaningful, and thus, addition and subtraction are possible. However, there is
no true zero, and thus, multiplication and division are not meaningful.
Measure of central tendency: a single number that is chosen to best summarize an entire set of
numbers.
Median: a measure of central tendency. It is the midmost score in a distribution. In other words, the
median splits a distribution in half, with just as many scores above it as below it. It is at the 50th
percentile.
Mode: a measure of central tendency. It is the most common category or score.
Nominal scale of measurement: a measurement scale in which numbers serve as names of
categories. In this level of measurement, the magnitude of the number is arbitrary.
Ordinal scale of measurement: a measurement scale in which the magnitude of the numbers
indicates the order in which events occurred. In this level of measurement, the magnitude of the
number is meaningful.
Percentile rank: the percentage of the data at or below a category or score.
Pie chart: a presentation of categorical data in which the area of a slice of a circle is indicative of
the relative frequency with which the category occurs.
Range: a measure of variability for ordinal data.
Ratio scale of measurement: a measurement scale in which the magnitude of the difference
between numbers is meaningful, and there is a true zero. Thus, multiplication and division as well
as addition and subtraction are meaningful.
Relative frequency: the frequency of a category divided by the total frequency.
Unstable: a term used to describe a measure, such as of central tendency, that can vary significantly
with only a few changes to the original set of data. This is seen as an undesirable quality.
Variability: how much scores differ or deviate from each other.

Chapter 3 - DESCRIBING INTERVAL AND RATIO DATA—I


Bell-shaped curve: a symmetrical distribution in which the highest frequency scores are located
near the middle, and the frequency drops the farther a score is from the middle.
Deviation: the difference between a score and some measure, usually the mean. Thus, with
population data, the deviation equals X - μ.
Frequency polygon: a graphic presentation for use with interval or ratio data. It is similar to a
histogram except that the frequency is indicated by the height of a point rather than the height of a
bar. The points are connected by straight lines.
Histogram: a graph used with interval or ratio data. As with the bar graph, frequencies are
indicated by the length of the associated bars. However, as there are no distinct categories in a
histogram, the bars are positioned side by side.


Mean: a measure of central tendency for use with interval or ratio data. It is what is commonly
called an average, but in statistics, the term average can refer to a mean, median, or mode. The
mean is the sum of the scores divided by the number of scores.
Negatively skewed: a nonsymmetrical distribution in which the tail pointing to the left is larger
than the tail pointing to the right.
Normal distribution: a specific, bell-shaped distribution. Many statistical procedures assume that
the data are distributed normally.
Population: the entire group that is of interest.
Positively skewed: a nonsymmetrical distribution in which the tail pointing to the right is larger
than the tail pointing to the left.
Range: a measure of variability. With interval or ratio data, it equals the difference between the
upper real limit of the highest score or category and the lower real limit of the lowest score or
category.
Real limits: with interval or ratio data, the actual limits used in assigning a measurement. These are
halfway between adjacent scores. Each score thus has an upper and a lower real limit.
Sample: a subset of a population.
Standard deviation: a measure of variability—the average deviation of scores within a
distribution. It is defined as the square root of the variance. The symbol for the population standard
deviation is σ.
Sum of the squared deviations: for a population, it is equal to Σ(X – μ)2 or Σx2. It is often
abbreviated as “sum of squares,” which is shortened even further to SS.
Symmetrical distribution: a distribution in which the right half is the mirror image of the left half.
In such a distribution, there is a high score corresponding to each low score.
Unimodal distribution: a distribution with only one mode.
Variance: a measure of variability—the average of the sum of the squared deviations of scores
from their mean. The symbol for the population variance is σ2.
x: the symbol for a deviation. Thus, x = (X – μ) if we are dealing with a population.

Chapter 4 - DESCRIBING INTERVAL AND RATIO DATA—II


Inflection point: a point on a graph where the curvature changes from concave to convex or from
convex to concave.
Parameter: a measure of a characteristic of a population, such as its mean or its variance.
Raw score: your data as they are originally measured, before any transformation.
Statistic: a measure of a characteristic of a sample, such as its mean.
z score: conversion of raw data so that the deviation is measured in standard deviation units and the
sign, positive or negative, indicates the direction of the deviation.

Chapter 6 - THE LOGIC OF INFERENTIAL STATISTICS


Alpha: another term for Type I error. Its symbol is α.
Alpha level: criterion set for rejecting the null hypothesis. This is usually .05.
Alternative hypothesis (H1): when used with a difference design, the statement that the treatment
does have an effect.
Association designs: research undertaken to determine whether an observed association is likely to
generalize.
Control group: in a between-groups design, the group of subjects that does not receive the
treatment.
Correlational study: a study in which the researcher does not randomly assign the subjects and
does not manipulate the value of a variable. As a result, at the conclusion of the study, the
researcher has little confidence that there is a cause and effect relationship between the variables.
Dependent variable (DV): in an experiment, the variable whose value is not directly controlled by


the researcher. Its value may be changed by the IV.


Difference designs: research undertaken to determine whether an observed difference is likely to
generalize.
Experimental group: in a between-groups design, the group of subjects that does receive the
treatment.
Independent variable (IV): in an experiment, the variable the experimenter manipulates or directly
controls.
Intrinsic plausibility: decision-making process in which the alternative that seems most reasonable
is accepted as being true.
Null hypothesis (H0): when used with a difference design, the statement that the treatment does not
have an effect.
Power: the probability of correctly rejecting a false null hypothesis. This probability is 1 – β.
Quasi-experiment: an experiment in which the researcher manipulates the value of the independent
variable but does not randomly assign the subjects. As a result, at the conclusion of the study, the
researcher has less confidence in concluding that there is a cause and effect relationship between the
independent and dependent variables than with a true experiment.
Random sample: A sample in which every member or subset of the population has an equal chance
of being chosen.
Scientific method: an approach to understanding that emphasizes rigorous logic and that careful
observation is the ultimate authority for determining truth. It is a self-correcting approach that limits
bias.
True experiment: an experiment in which the researcher randomly assigns the subjects and also
manipulates the value of the independent variable. As a result, at the conclusion of the study, the
researcher is justified in reaching a cause and effect conclusion concerning the relationship between
the independent and dependent variables.
Type I error: the probability of rejecting the null hypothesis when it is in fact true. This probability
is equal to alpha, α, which is usually 5%.
Type II error: the probability of retaining the null hypothesis when it is in fact false. The
probability is equal to beta, β, which is usually not known.

Chapter 7 - FINDING DIFFERENCES WITH NOMINAL DATA—I


The Goodness-of-Fit Chi-Square

Area of rejection: area of the distribution equal to the alpha level. It is also called the critical
region.
Critical region: area of the distribution equal to the alpha level. It is also called the area of
rejection.
Degrees of freedom (df): the number of observations out of the total that are free to vary.
Expected frequencies: with nominal data, the outcome that would be expected if the null
hypothesis were true.
Independent: two events, samples, or variables are independent if knowing the outcome of one
does not enhance our prediction of the other.
Observed frequencies: with nominal data, the actual data that were collected.
Significant: in statistics, a measure of how unlikely it is that an event occurred by chance.

Chapter 8 - FINDING DIFFERENCES WITH NOMINAL DATA—II


The Chi-Square Test of Independence

Bonferroni method: a procedure to control the Type I error rate when making numerous
comparisons. In this procedure, the alpha level that the experimenter sets is divided by the number
of comparisons.


Dependent: two events, samples or variables are dependent if knowing the outcome of one
enhances our prediction of the other.
Effect size: a measure of how strong a statistically significant outcome is.
Gambler’s fallacy: the incorrect assumption that if an event has not occurred recently, then the
probability of it occurring in the future increases.
Interaction: a statistical term indicating that the effects of two or more variables are not
independent.
Post hoc comparisons: statistical procedures utilized following an initial, overall test of
significance to identify the specific samples that differ.

Chapter 9 - FINDING DIFFERENCES WITH INTERVAL AND RATIO DATA—I


The One-Sample z Test and the One-Sample t Test

Biased estimator: an estimator that does not accurately predict what it is intended to because of
systematic error.
Central limit theorem:
—with increasing sample sizes, the shape of the distribution of sample means (sampling distribution
of the mean) rapidly approximates the normal distribution irrespective of the shape of the
population from which it is drawn.
—the mean of the distribution of sample means is an unbiased estimator of the population mean.
—and the standard deviation of the distribution of sample means (σM) = σX / √n.
Confidence interval: the range of values that has a known probability of including the population
parameter, usually the mean.
Law of large numbers: the larger the sample size, the better the estimate of population parameters
such as μ.
One-tailed or directional test: an analysis in which the null hypothesis will be rejected if an
extreme outcome occurs in only one direction. In such a test, the single area of rejection is equal to
alpha.
Sampling distribution of the mean: a theoretical probability distribution of sample means. The
samples are all of the same size and are randomly selected from the same population.
Standard error: the standard deviation of the sampling distribution of a statistic. Thus the standard
error of the mean is the standard deviation of the sampling distribution of means.
Two-tailed or nondirectional test: an analysis in which the null hypothesis will be rejected if an
extreme outcome occurs in either direction. In such a test, the alpha level is divided into two equal
parts.

Chapter 10 - FINDING DIFFERENCES WITH INTERVAL AND RATIO DATA—II


The Independent Samples t Test and the Dependent Samples t Test

Carryover effect: a treatment or intervention at one point in time may affect or carry over to
another point in time.
Counterbalancing: a method used to control for carryover effects. In counterbalancing, the order
of the treatments or interventions is balanced so that an equal number of subjects will experience
each order of presentation.
Longitudinal study: a study in which subjects are measured repeatedly across time. A repeated
measures design is a type of longitudinal study.
Standard error of the difference between sample means (S(M1−M2)): the standard deviation of the
sampling distribution of the difference between sample means.
Standard error of the mean difference (SMD ): the standard deviation of the sampling distribution
of the mean difference between measures.
Chapter 11 –


Chapter 14 - IDENTIFYING ASSOCIATIONS WITH NOMINAL AND INTERVAL OR RATIO DATA


The Phi Correlation, the Pearson r Correlation, and the Point Biserial Correlation

Coefficient of determination: the square of the correlation. It indicates the proportion of variability
in one variable that is explained or accounted for by the variability in the other variable.
Coefficient of nondetermination: the proportion of the variability of one variable not explained or
accounted for by the variability of the other variable. For phi, it is equal to 1 − rϕ2.
Correlation: a measure of the degree of association among variables. A correlation indicates
whether a variable changes in a predictable manner as another variable changes.
Covariance: a statistical measure indicating the extent to which two variables vary together.
Covary: if knowledge of how one variable changes assists you in predicting the value of another
variable, the two variables are said to covary.
Multiple correlation (R): the association between one criterion variable and a combination of two
or more predictor variables.
Negative correlation: a relationship between two variables in which as one variable increases in
value, the other variable decreases in value. Also, as one variable decreases in value, the other
increases in value.
Partial correlation: a procedure in which the effect of a variable that is not of interest is removed.
Pearson r: correlation used with interval or ratio data.
phi (rϕ): correlation used with nominal data. It is a form of Pearson r.
Point biserial (rpb): correlation used when one variable is nominal (a true dichotomy) and the other
consists of interval or ratio data.
Positive correlation: a relationship between two variables in which as one variable increases in
value, so does the other variable. Also, as one variable decreases in value, so does the other.
Regression: procedure researchers use to develop an equation that permits the prediction of one
variable of a correlation if the value of the other variable is known.
Restriction of the range: reducing the range of values for a variable will reduce the size of the
correlation.
rho (ρ): symbol used for the population correlation.
True dichotomy: a natural division of scores into two distinct categories.

Chapter 15 – LINEAR REGRESSION


Error variance: the variance of Y scores around the regression line.
Linear regression: procedures used to determine the equation for the regression line.
Multiple regression: situation in which several variables (Xs) are used to predict one other variable
(Y).
Regression line: with linear regression, a straight line indicating the value of Y that is predicted to
occur for each value of X. The predicted value of Y is called Ŷ.
Regression weight: another term for the slope of the regression line.
Slope of the line: one of the two determinants of the equation for a straight line. It is the ratio of the
change in the Y variable divided by the change in the X variable. It has the symbol “b” in the
equation Y = bX + a.
Standard error of estimate: the standard deviation of Y scores around the regression line. Its
symbol is sŶ.
Y intercept: one of the two determinants of the equation for a straight line. It is the value of Y when
X is equal to 0. It is, therefore, the value of Y when the line crosses the Y axis. It has the symbol “a”
in the equation Y = bX + a.

