Statistics Cheat Sheet Formulas and Steps
Statistics
Descriptive statistics = techniques that enable us to summarize a set of numbers (stick to this today)
Inferential statistics = techniques that enable us to make inferences based on the data
Scales of measurement
The scale of measurement matters for visualization and other descriptions of the data, and for choosing the right analysis in inferential statistics
Z scores
= the number of standard deviations by which a score differs from its mean
Raw score to z score
1 Draw a picture
2 Calculate z score: z = (X – μ) / σ
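A minimal Python sketch of the conversion (the score, mean, and SD below are made-up values, not course data):

```python
# Raw score to z score: z = (X - mu) / sigma
def z_score(x, mu, sigma):
    """Number of standard deviations by which x differs from the mean mu."""
    return (x - mu) / sigma

# Example: X = 130 in a population with mu = 100 and sigma = 15
print(z_score(130, mu=100, sigma=15))  # 2.0 -> two SDs above the mean
```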
Goodness-of-fit chi-square
Nominal data: 1 variable, >= 2 categories
• One variable, multiple outcomes
• Compares observed and expected frequencies
• Overall test of significance
• Expected frequencies must be 5 or more in each cell
It is non-parametric and distribution-free
Expected frequencies based on theoretical distribution
Degrees of freedom = c - 1
Assumptions:
• One nominal variable
• Observations are independent
Examples:
“Does the handedness among the LiS students equal the expected distribution?”
“Was the deck adequately shuffled?”
“Is the coin fair?”
1 Formulate hypotheses H0: the variable equals the theoretical distribution
H1: the variable does not equal the theoretical distribution
2 Calculate χ² χ² = Σ (fo – fe)² / fe
fo = observed frequencies
fe = expected frequencies (if the null hypothesis is true)
3 Determine the critical value χ² (α set at .05) Depends on the degrees of freedom (df) = the number of observations out of the total that are free to vary
df = c – 1
c = number of categories
4 Decide on hypotheses χ² higher than the critical value? Significant: Reject H0, accept H1
χ² lower than the critical value? Not significant: Retain H0
5 Report the results “The results do (not) provide sufficient evidence to reject/retain the null hypothesis” + what it means
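A hedged Python sketch of the whole procedure, using an invented "fair coin" example (H0: both outcomes equally likely) and scipy for the table lookup:

```python
# Goodness-of-fit chi-square: chi2 = sum((fo - fe)**2 / fe), df = c - 1
from scipy import stats

observed = [62, 38]   # fo: observed frequencies (invented data)
expected = [50, 50]   # fe: expected frequencies if H0 (fair coin) is true

chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
df = len(observed) - 1                  # df = c - 1
critical = stats.chi2.ppf(0.95, df)     # critical value at alpha = .05

print(chi2, critical, chi2 > critical)  # significant: reject H0 if chi2 > critical

# scipy computes the same statistic plus a p value directly:
print(stats.chisquare(f_obs=observed, f_exp=expected))
```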
3 Calculate z test z = (M – μ) / σM, with σM = σ / √n
M = sample mean
μ = population mean
Two-tailed:
z value between the critical values? Retain H0
z value beyond one of the critical values? Reject H0 & accept H1
6 Report the results “There was (not) sufficient evidence that the sample mean differs from the hypothesized or known population mean”
(z = value, p </> .05)
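A short Python sketch of the z test; M, μ, σ, and n are invented, and σ is assumed known:

```python
# One-sample z test: z = (M - mu) / sigma_M, sigma_M = sigma / sqrt(n)
import math
from scipy import stats

M, mu, sigma, n = 103.0, 100.0, 15.0, 36   # invented values
sigma_M = sigma / math.sqrt(n)             # standard error of the mean
z = (M - mu) / sigma_M

critical = stats.norm.ppf(0.975)           # two-tailed critical value, alpha = .05
p = 2 * stats.norm.sf(abs(z))              # two-tailed p value
print(z, critical, p)                      # reject H0 if |z| > critical
```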
3 Calculate t test t = (M – μ) / SM, with SM = S / √n
4 Critical value t df = n – 1
• Look at table J.3 (choose the most conservative one)
5 Decide on hypotheses One-tailed:
t value lower than the critical value? Retain H0
t value higher than the critical value? Reject H0 & accept H1
Two-tailed:
t value between the critical values? Retain H0
t value beyond one of the critical values? Reject H0 & accept H1
6 Report the results “There was (not) sufficient evidence that the sample mean differs from the hypothesized or known population mean”
(t(df) = value, p </> .05)
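A sketch of the one-sample t test with scipy (the sample is invented; scipy.stats.ttest_1samp is two-tailed by default):

```python
# One-sample t test: t = (M - mu) / S_M, S_M = S / sqrt(n), df = n - 1
from scipy import stats

sample = [12, 15, 11, 14, 13, 16, 12, 14]   # invented scores
mu = 12                                     # hypothesized population mean

t, p = stats.ttest_1samp(sample, popmean=mu)
df = len(sample) - 1
critical = stats.t.ppf(0.975, df)           # two-tailed critical value, alpha = .05
print(t, df, p, abs(t) > critical)          # reject H0 if |t| > critical
```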
2 Calculate standard error of the difference between sample means S(M1–M2) = √(S²P/n1 + S²P/n2)
n = sample size per group
S²P = pooled estimate of the population variance
3 Calculate t test t = (M1 – M2) / S(M1–M2)
4 Critical value t df = n1 + n2 – 2
• Table J.3
5 Decide on hypotheses Two-tailed:
t value between the critical values? Retain H0
t value beyond one of the critical values? Reject H0 & accept H1
6 Report the results (t(df) = value, p </> .05)
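A sketch with two invented groups; scipy.stats.ttest_ind with equal_var=True uses the pooled-variance approach described above:

```python
# Independent-samples t test: t = (M1 - M2) / S_(M1-M2), df = n1 + n2 - 2
from scipy import stats

group1 = [23, 25, 28, 22, 26]   # invented data
group2 = [19, 21, 24, 20, 18]

t, p = stats.ttest_ind(group1, group2, equal_var=True)  # pooled variance
df = len(group1) + len(group2) - 2
critical = stats.t.ppf(0.975, df)   # two-tailed critical value, alpha = .05
print(t, df, p, abs(t) > critical)  # reject H0 if |t| > critical
```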
Post hoc comparisons (Tukey’s HSD) q: look up in table J.5
• Column = number of means being compared
• Row = dfW
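When software is at hand instead of table J.5, the same q critical value can be looked up with scipy's studentized range distribution (scipy >= 1.7; k and dfW below are arbitrary):

```python
# Critical value of the studentized range statistic q (the table J.5 lookup)
from scipy import stats

k, df_w = 3, 12   # number of means being compared, within-groups df
q_crit = stats.studentized_range.ppf(0.95, k, df_w)   # alpha = .05
print(q_crit)     # compare the computed q against this critical value
```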
Correlation
Association = as one variable changes, the other variable changes in a predictable manner:
the variables covary
Correlation = a measure of the degree of association among variables (not a matter of cause and
effect!!!)
Phi (rϕ)
Nominal data: Correlation
• Measure of correlation between two nominal dichotomous variables (correlation varies from
-1 to 1)
rϕ = 0 = no relationship
rϕ = -1 = perfect relationship
rϕ = 1 = perfect relationship
Assumptions of phi:
• Nominal variables
• Data are in the form of two dichotomies
1 Formulate hypotheses H0: the two dichotomous variables are not related
H1: the two dichotomous variables are related
2 Calculate phi For a 2×2 table with cell frequencies:
(a) (b)
(c) (d)
rϕ = (ad – bc) / √((a+b)(c+d)(a+c)(b+d))
3 Calculate χ² test statistic Testing whether the found value of phi is significantly different from 0 is done using a χ² distribution: χ² = N · rϕ²
4 Determine critical value χ² Find the critical value χ² in the table, with df = (#columns – 1) × (#rows – 1) = 1
5 Decide on hypotheses χ² lower than critical value? Retain H0
χ² higher than critical value? Reject H0, accept H1
6 Report the results The variables X and Y are (not) related
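A sketch computing phi and its χ² test from invented 2×2 cell counts, using the identity χ² = N·rϕ² from step 3:

```python
# Phi for a 2x2 table with cells a, b / c, d, and its chi-square test (df = 1)
import math
from scipy import stats

a, b, c, d = 30, 10, 15, 25    # invented cell frequencies
n = a + b + c + d

phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
chi2 = n * phi ** 2            # test statistic, df = 1
critical = stats.chi2.ppf(0.95, 1)   # 3.84 at alpha = .05
print(phi, chi2, chi2 > critical)    # reject H0 if chi2 > critical
```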
Pearson r
Interval/ratio data: Correlation
r = correlation in a sample (ρ = population correlation)
• Measure of correlation for 2 interval/ratio variables (correlation varies from -1 to 1)
r = 0 = no relationship
r = -1 = perfect negative relationship (one increases, the other decreases)
r = 1 = perfect positive relationship (one increases, the other one does too)
• The sign (+ or -) indicates the direction of the relationship
• Measure of linear relationship
• Be aware of restriction of the range
• Prediction is limited to the range of the original variables
• Coefficient of determination (r squared) = proportion of the variance in one variable that is
explained by another variable
Assumptions
• Interval or ratio data
• Data are paired
• Linear relationship
• Normal distribution for X and Y variables
1 Formulate hypotheses H0: the variables are not related (ρXY = 0)
(one- or two-tailed?) H1: the variables are related (ρXY ≠ 0)
Scatterplot: inspect a scatterplot of the paired X and Y scores (check the linearity assumption)
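A sketch with invented paired scores; scipy.stats.pearsonr returns r along with the two-tailed p value for H0: ρ = 0:

```python
# Pearson r and the coefficient of determination r**2
from scipy import stats

x = [2, 4, 5, 7, 8, 10]   # invented X scores
y = [3, 5, 4, 8, 9, 11]   # paired Y scores

r, p = stats.pearsonr(x, y)   # two-tailed test of H0: rho = 0
print(r, r ** 2, p)           # r**2 = proportion of variance explained
```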
Linear regression
Interval/ratio data: Regression
• If there is a significant correlation, knowing the value of one variable will assist in predicting
the value of the other variable
• Regression: predicting one variable from another variable
• Provides an equation for predicting the value of Y: Ŷ = a + bX, with b = r(SY/SX) and a = MY – bMX
• Prediction is limited to the original range of the values
• Standard error of estimate (SŶ) = standard deviation of Y scores around the regression line
Assumptions:
• Interval or ratio data
• Data are paired
• Linear relationship
• Only used when Pearson r is statistically significant
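A sketch with the same kind of invented paired data; scipy.stats.linregress returns the slope (b), intercept (a), and r of the least-squares line:

```python
# Simple linear regression: Y-hat = a + bX
from scipy import stats

x = [2, 4, 5, 7, 8, 10]   # invented predictor scores
y = [3, 5, 4, 8, 9, 11]   # paired criterion scores

res = stats.linregress(x, y)
print(res.intercept, res.slope)        # a and b in Y-hat = a + bX
print(res.rvalue ** 2)                 # coefficient of determination
print(res.intercept + res.slope * 6)   # predict Y for X = 6 (within the original range!)
```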
BOOK REID
Chapter 1 - INTRODUCTION
Absolute value: the magnitude of a number irrespective of whether it is positive or negative.
Data (plural of datum): factual information, often in the form of numbers.
Descriptive statistics: techniques that are used to summarize a set of numbers.
Inferential statistics: techniques that are used in making decisions based on data.
Mean: a measure of central tendency for use with interval or ratio data. It is what is commonly
called an average, but in statistics, the term average can refer to a mean, median, or mode. The
mean is the sum of the scores divided by the number of scores.
Negatively skewed: a nonsymmetrical distribution in which the tail pointing to the left is larger
than the tail pointing to the right.
Normal distribution: a specific, bell-shaped distribution. Many statistical procedures assume that
the data are distributed normally.
Population: the entire group that is of interest.
Positively skewed: a nonsymmetrical distribution in which the tail pointing to the right is larger
than the tail pointing to the left.
Range: a measure of variability. With interval or ratio data, it equals the difference between the
upper real limit of the highest score or category and the lower real limit of the lowest score or
category.
Real limits: with interval or ratio data, the actual limits used in assigning a measurement. These are
halfway between adjacent scores. Each score thus has an upper and a lower real limit.
Sample: a subset of a population.
Standard deviation: a measure of variability—the average deviation of scores within a
distribution. It is defined as the square root of the variance. The symbol for the population standard
deviation is σ.
Sum of the squared deviations: for a population, it is equal to Σ(X – μ)² or Σx². It is often
abbreviated as “sum of squares,” which is shortened even further to SS.
Symmetrical distribution: a distribution in which the right half is the mirror image of the left half.
In such a distribution, there is a high score corresponding to each low score.
Unimodal distribution: a distribution with only one mode.
Variance: a measure of variability—the average of the sum of the squared deviations of scores
from their mean. The symbol for the population variance is σ².
x: the symbol for a deviation. Thus, x = (X – μ) if we are dealing with a population.
Area of rejection: area of the distribution equal to the alpha level. It is also called the critical
region.
Critical region: area of the distribution equal to the alpha level. It is also called the area of
rejection.
Degrees of freedom (df): the number of observations out of the total that are free to vary.
Expected frequencies: with nominal data, the outcome that would be expected if the null
hypothesis were true.
Independent: two events, samples, or variables are independent if knowing the outcome of one
does not enhance our prediction of the other.
Observed frequencies: with nominal data, the actual data that were collected.
Significant: in statistics, a measure of how unlikely it is that an event occurred by chance.
Bonferroni method: a procedure to control the Type I error rate when making numerous
comparisons. In this procedure, the alpha level that the experimenter sets is divided by the number
of comparisons.
Dependent: two events, samples or variables are dependent if knowing the outcome of one
enhances our prediction of the other.
Effect size: a measure of how strong a statistically significant outcome is.
Gambler’s fallacy: the incorrect assumption that if an event has not occurred recently, then the
probability of it occurring in the future increases.
Interaction: a statistical term indicating that the effects of two or more variables are not
independent.
Post hoc comparisons: statistical procedures utilized following an initial, overall test of
significance to identify the specific samples that differ.
Biased estimator: an estimator that does not accurately predict what it is intended to because of
systematic error.
Central limit theorem:
—with increasing sample sizes, the shape of the distribution of sample means (sampling distribution
of the mean) rapidly approximates the normal distribution irrespective of the shape of the
population from which it is drawn.
—the mean of the distribution of sample means is an unbiased estimator of the population mean.
—and the standard deviation of the distribution of sample means (σM) = σX/√n.
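A quick numpy simulation of the last point, using an arbitrary, clearly non-normal population:

```python
# Empirical check that the SD of sample means approaches sigma / sqrt(n)
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)  # skewed population

n = 30
sample_means = [rng.choice(population, size=n).mean() for _ in range(2_000)]

print(np.std(sample_means))            # empirical sigma_M
print(population.std() / np.sqrt(n))   # theoretical sigma / sqrt(n)
```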
Confidence interval: the range of values that has a known probability of including the population
parameter, usually the mean.
Law of large numbers: the larger the sample size, the better the estimate of population parameters
such as μ.
One-tailed or directional test: an analysis in which the null hypothesis will be rejected if an
extreme outcome occurs in only one direction. In such a test, the single area of rejection is equal to
alpha.
Sampling distribution of the mean: a theoretical probability distribution of sample means. The
samples are all of the same size and are randomly selected from the same population.
Standard error: the standard deviation of the sampling distribution of a statistic. Thus the standard
error of the mean is the standard deviation of the sampling distribution of means.
Two-tailed or nondirectional test: an analysis in which the null hypothesis will be rejected if an
extreme outcome occurs in either direction. In such a test, the alpha level is divided into two equal
parts.
Carryover effect: a treatment or intervention at one point in time may affect or carry over to
another point in time.
Counterbalancing: a method used to control for carryover effects. In counterbalancing, the order
of the treatments or interventions is balanced so that an equal number of subjects will experience
each order of presentation.
Longitudinal study: a study in which subjects are measured repeatedly across time. A repeated
measures design is a type of longitudinal study.
Standard error of the difference between sample means (S(M1–M2)): the standard deviation of the
sampling distribution of the difference between sample means.
Standard error of the mean difference (SMD): the standard deviation of the sampling distribution
of the mean difference between measures.
Chapter 11 –
Coefficient of determination: the square of the correlation. It indicates the proportion of variability
in one variable that is explained or accounted for by the variability in the other variable.
Coefficient of nondetermination: the proportion of the variability of one variable not explained or
accounted for by the variability of the other variable. For phi, it is equal to 1 – rϕ².
Correlation: a measure of the degree of association among variables. A correlation indicates
whether a variable changes in a predictable manner as another variable changes.
Covariance: a statistical measure indicating the extent to which two variables vary together.
Covary: if knowledge of how one variable changes assists you in predicting the value of another
variable, the two variables are said to covary.
Multiple correlation (R): the association between one criterion variable and a combination of two
or more predictor variables.
Negative correlation: a relationship between two variables in which as one variable increases in
value, the other variable decreases in value. Also, as one variable decreases in value, the other
increases in value.
Partial correlation: a procedure in which the effect of a variable that is not of interest is removed.
Pearson r: correlation used with interval or ratio data.
phi (rϕ): correlation used with nominal data. It is a form of Pearson r.
Point biserial (rpb): correlation used when one variable is nominal (a true dichotomy) and the other
consists of interval or ratio data.
Positive correlation: a relationship between two variables in which as one variable increases in
value, so does the other variable. Also, as one variable decreases in value, so does the other.
Regression: procedure researchers use to develop an equation that permits the prediction of one
variable of a correlation if the value of the other variable is known.
Restriction of the range: reducing the range of values for a variable will reduce the size of the
correlation.
rho (ρ): symbol used for the population correlation.
True dichotomy: a natural division of scores into two distinct categories.