AP Statistics Chapter Notes (1-12)

This document provides an overview of analyzing categorical data and various graphs used to display categorical data, including frequency tables, two-way tables, dotplots, stem-and-leaf plots, histograms, and comparing distributions. It discusses measuring relative frequency and joint relative frequency for categorical variables. Different types of distributions are also outlined such as symmetric, right-skewed, left-skewed, uniform, unimodal, and bimodal.

Uploaded by

Laura Mesa

ADVANCED PLACEMENT STATISTICS

Notes by:
Jeremiah James dela Rosa
Temecula, CA
Thank you, Stats Medic, Luke & Lindsey!
CHAPTER 1: DATA ANALYSIS

Displaying Categorical Data

· The distribution of categorical data can be given as:
  - frequency (counts)
  - relative frequency (percent/proportion)
· Two-Way Table (Label) - counts for two categorical variables at once: the values of variable 1 (A, B) across the top, the values of variable 2 (C, ...) down the side, plus a Total row and a Total column.
· marginal relative frequency, P(C) - percent or proportion of individuals that have a specific value for one categorical variable.
· joint relative frequency, P(A and C) - percent or proportion of individuals that have a specific value for one categorical variable and a specific value for another categorical variable.
· conditional relative frequency, P(A | C) - percent or proportion of individuals that have a specific value for one categorical variable among individuals who share the same value of another categorical variable (condition).
· Association - there is an association between two variables if knowing the value of one variable helps predict the value of the other.

Graph of Categorical Variable
· pie chart
· bar graph
· side-by-side bar graph
· segmented bar graph
· mosaic plot

Displaying Quantitative Data with Graphs

· DOTPLOT
· STEMPLOT (stem-and-leaf plot) - each value is split into a stem and a leaf.
· HISTOGRAM

Shapes: roughly symmetric, right-skewed, left-skewed, uniform, single-peaked (unimodal), double-peaked (bimodal).

When the question says "Describe the distribution." use SOCV + context:
· Shape: "The distribution of (context) is (shape) with a peak at (highest point) and gaps between (gap)."
· Outlier: "There seems to be outliers at (values)."
· Center: "The (mean/median) of the distribution is (mean/median + units)."
  * Symmetric -> use mean
  * Skewed -> use median
· Variability: "The distribution of (context) has a (SD/IQR/range) of (value + units)."

When the question says "Compare the distribution."
* You will do the same thing with SOCV + context.
· Describe the SHAPE of both distributions.
· Identify any OUTLIERS for both distributions.
· Compare the CENTER (which is greater/lesser).
· Compare the VARIABILITY (which varies more).
* Always write in context of the problem.

Describing Quantitative Data with Numbers

Measures of Center:
· mean - average: $\bar{x} = \frac{\sum x_i}{n}$
· median - middle value (odd number of data values), or average of the two middle values (even number of data values).

Measures of Variability:
· range = maximum - minimum
· standard deviation (SD): $s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$, where n = number of data values (for means)
· interquartile range (IQR): $IQR = Q_3 - Q_1$, where Q1 = Quartile 1 (25%) and Q3 = Quartile 3 (75%) (for median)
· variance: $(SD)^2 = s_x^2$

Interpreting SD:
"The (context) typically varies by about (SD + unit) from the mean of ($\bar{x}$ + unit)."

Let's Talk about OUTLIERS:
· The mean & SD are greatly affected by outliers (non-resistant).
· The median & IQR are not affected by outliers (resistant).
· If there are outliers or the distribution is skewed, use median & IQR.
· If the distribution is roughly symmetrical, use mean & SD.

How to Find outliers?
· 1.5 x IQR Rule
  low outlier < Q1 - (1.5 x IQR)
  high outlier > Q3 + (1.5 x IQR)

BOXPLOTS
· A boxplot marks min, Q1, med, Q3 and max.
· What if the min or max is an outlier, what will be your min/max? Remove your outliers and label them on your boxplot. The new min is the lowest remaining data value (same for max).

· parameter: a number (or statement) that describes a population.
· statistic: a number (or statement) that describes a sample.
· sample estimates population: $\bar{x} \to \mu$, $\hat{p} \to p$, $s_x \to \sigma$

Five number summary:
· minimum
· Q1 (quartile 1; 25th percentile)
· median
· Q3 (quartile 3; 75th percentile)
· maximum

Calculator:
① STAT > Edit: type the data in L1 and/or L2.
② STAT > CALC > 1: 1-Var Stats (or 2: 2-Var Stats).
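The five number summary and the 1.5 x IQR rule can be checked with a short script. This is only a sketch: the data set is made up, the function names are hypothetical, and the quartiles are taken as the medians of the lower and upper halves (leaving out the overall median when n is odd), which is an assumption about the convention your class or calculator uses.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); quartiles are medians of each half."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                                # values below the median position
    upper = xs[half + 1:] if n % 2 else xs[half:]    # skip the median itself when n is odd
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def outliers_by_iqr_rule(data):
    """Flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Made-up data set, for illustration only.
data = [2, 4, 5, 5, 6, 7, 8, 9, 30]
summary = five_number_summary(data)
flagged = outliers_by_iqr_rule(data)
print(summary)   # (2, 4.5, 6, 8.5, 30)
print(flagged)   # [30]
```

Here IQR = 8.5 - 4.5 = 4, so the fences sit at -1.5 and 14.5, and 30 is the only high outlier. On a boxplot the new max would be 9.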
CHAPTER 2: MODELING DISTRIBUTIONS OF QUANTITATIVE DATA

Describing Location in a Distribution

Two ways to describe location:
* percentiles - the pth percentile is the value with p% of observations less than or equal to it.
* standardized scores (z-scores) - for an individual value in a distribution, tells us how many standard deviations from the mean the value falls, and in what direction.

  $z = \frac{x - \mu}{\sigma}$, where x = value, $\mu$ = mean, $\sigma$ = SD

z-score Interpretation:
"(context) is (z-score) standard deviations (above/below) the mean of ($\mu$ + unit)."

Cumulative relative frequency graph (OGIVE)

Example:

Age      Freq.   Rel. freq.   Cumulative freq.   Cumulative rel. freq.
40-45      2        4.4%              2                  4.4%
45-50      7       15.6%              9                 20.0%
50-55     13       28.9%             22                 48.9%
55-60     12       26.7%             34                 75.6%
60-65      7       15.6%             41                 91.1%
65-70      3        6.7%             44                 97.8%
70-75      1        2.2%             45                100.0%

[Ogive: age (40 to 75) on the horizontal axis, cumulative relative frequency (0% to 100%) on the vertical axis.]

Q1 = 25th percentile; Median = 50th percentile; Q3 = 75th percentile
* An ogive allows you to examine location in a distribution.
* The completed graph allows you to estimate the percentile for an individual value & vice-versa.
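The cumulative relative frequency column of an ogive table is just a running total divided by the overall count. A minimal sketch, using the age frequencies from the example table; the z-score helper and the numbers passed to it are made up for illustration.

```python
# Frequencies for the age classes 40-45, 45-50, ..., 70-75 from the example table.
freqs = [2, 7, 13, 12, 7, 3, 1]
total = sum(freqs)

cumulative = []
running = 0
for f in freqs:
    running += f              # running total of counts up to this class
    cumulative.append(running)

# Cumulative relative frequency as a percent, rounded to one decimal place.
cum_rel = [round(100 * c / total, 1) for c in cumulative]
print(cumulative)  # [2, 9, 22, 34, 41, 44, 45]
print(cum_rel)     # [4.4, 20.0, 48.9, 75.6, 91.1, 97.8, 100.0]

def z_score(x, mean, sd):
    """How many standard deviations x falls from the mean (sign gives direction)."""
    return (x - mean) / sd

# Hypothetical values: a score of 85 when the mean is 70 and the SD is 10.
print(z_score(85, 70, 10))  # 1.5
```

Reading the last list against the class boundaries gives the points of the ogive, for example about 48.9% of the values are at or below 55.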

Transformation of Data

                              SHAPE        CENTER / LOCATION     VARIABILITY
ADDITION/SUBTRACTION (a)      no change    + or - a              no change
MULTIPLICATION/DIVISION (b)   no change    x or ÷ b              x or ÷ b

* center, location: mean, five number summary, percentiles
* variability: range, IQR, standard deviation
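A quick numerical check of the transformation table, as a sketch with made-up data: adding a constant moves the center but leaves the standard deviation alone, while multiplying by a constant scales both.

```python
from math import isclose
from statistics import mean, stdev

# Made-up data set, for illustration only.
data = [2, 4, 6, 8]

shifted = [x + 10 for x in data]   # addition: a = 10
scaled = [x * 3 for x in data]     # multiplication: b = 3

print(mean(data), stdev(data))        # center 5, SD about 2.58
print(mean(shifted), stdev(shifted))  # center 15, SD unchanged
print(mean(scaled), stdev(scaled))    # center 15, SD three times as large
```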

Density Curves and Normal Distributions

· Density curve - models the distribution of a quantitative variable with a curve that:
  * is always on or above the horizontal axis.
  * has exactly 1 unit of area underneath it.
  The area under the curve and above any interval of values on the horizontal axis estimates the proportion of all observations that fall in that interval.

· Uniform density curve
  Why is the height 1/2? Since the area under the curve should be equal to 1, the height is equal to the reciprocal of the distance on the horizontal axis (2 units).
  $A = l \times w = 2 \times \frac{1}{2} = 1$

· Mean of a density curve - point at which the curve would balance if made of solid material.
· Median of a density curve - equal areas point, the point that divides the area under the curve in half.
  * left-skewed: mean < median; roughly symmetric: mean = median; right-skewed: mean > median.

· Normal distribution - described by a roughly symmetric, single-peaked, bell-shaped density curve called a Normal curve. Any Normal distribution is completely specified by two parameters: mean ($\mu$) & SD ($\sigma$).

Empirical Rule (68-95-99.7 Rule)
· ~68% of values fall within $\mu \pm 1\sigma$
· ~95% of values fall within $\mu \pm 2\sigma$
· ~99.7% of values fall within $\mu \pm 3\sigma$ (about 0.15% in each tail beyond $3\sigma$)

Finding the area under the curve (probability)
· 2: normalcdf(lower, upper, $\mu$: mean, $\sigma$: SD)
  - less than: lower: -1000, upper: value
  - greater than: lower: value, upper: 1000
  - in between: lower: lower value, upper: upper value
* Always find the z-score first. If using the z-score, use $\mu$: 0 & $\sigma$: 1.

Finding a value from an area
· 3: invNorm(area: area, $\mu$: mean, $\sigma$: SD, tail)
* To find the z-score, use $\mu$: 0 & $\sigma$: 1.

Assessing Normality
* Count the values within mean ± 1(SD), mean ± 2(SD) and mean ± 3(SD). If these percents are close to 68-95-99.7, then the distribution is approximately Normal.
· Normal probability plot - the scatterplot of ordered pairs (x, y) = (data value, expected z-score) for each individual in a quantitative data set.
  Look for an almost linear form of the scatterplot. If it's almost linear, then the distribution is approximately Normal.
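The calculator steps for Normal areas can be mirrored with Python's standard library. In this sketch `normalcdf` and `inv_norm` are hypothetical helper names chosen to look like the calculator commands; they are thin wrappers around `statistics.NormalDist`, not the calculator's own functions.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the Normal curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(area_to_left, mu=0.0, sigma=1.0):
    """Value with the given area to its left."""
    return NormalDist(mu, sigma).inv_cdf(area_to_left)

# Empirical rule check on the standard Normal curve (mu = 0, sigma = 1).
within_1 = normalcdf(-1, 1)
within_2 = normalcdf(-2, 2)
within_3 = normalcdf(-3, 3)
print(round(within_1, 4), round(within_2, 4), round(within_3, 4))

# z-score that cuts off the lowest 97.5% of the distribution.
z_975 = inv_norm(0.975)
print(round(z_975, 2))
```

The three areas come out close to 68%, 95% and 99.7%, which is where the rule's name comes from.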
CHAPTER 3: EXPLORING TWO-VARIABLE QUANTITATIVE DATA

· Explanatory variable (input) - helps predict or explain changes in a response variable.
· Response variable (predicted output) - measures the outcome of a study.
· Scatterplot - explanatory variable on the horizontal axis, response variable on the vertical axis.

How to describe this scatterplot?
· Direction: positive / negative / none
· Unusual Feature: outlier
· Form: linear / nonlinear
· Strength: weak / moderate / strong
* Describe Direction & Strength using correlation (r), if given.

Interpretation:
"There is a moderately strong, positive, linear relationship between (explanatory variable) and (response variable). There doesn't seem to be any unusual features in this relationship."

· correlation (r)
  - only applies to linear association.
  - preferably, a correlation has a graph shown.
  - only a number. NO UNITS.
  - does not imply causation.
  - runs from -1 to 1: near 0 is weak or no correlation, farther from 0 is moderate, then strong, and -1 or 1 is a perfect correlation.
  "The correlation of r = (value) confirms that the linear association between (explanatory) and (response) is (positive/negative) and (weak/moderate/strong)."

Least Squares Regression Line (LSRL)

· LSRL Equation: $\hat{y} = a + bx$, where $\hat{y}$ = predicted y, a = y-int, b = slope, x = explanatory variable
  STAT > CALC > 8: LinReg(a+bx)
· residuals = (Actual - Predicted) = $(y - \hat{y})$
  "The actual (y-context) was (residual) (above/below) the predicted value for x = (# in context)."
· slope (b): "For every 1 unit increase in (x-context), the predicted (y-context) (increases/decreases) by (slope + unit of y)."
· y-int (a): "When (x-context) is 0, the predicted (y-context) is (y-int)."
· standard deviation of the residuals (s): "The actual (y-context) is typically about (s + unit) away from the number predicted by the LSRL with x = (context)."
· coefficient of determination ($r^2$): "About ($r^2$)% of the variability in (y-context) is accounted for by the LSRL with (x-context)."

Residual in Calculator:
1) Type your data in L1 and L2.
2) Calculate the LSRL Equation.
3) Go back to the table and highlight L3.
4) Choose RESIDUAL.

RESIDUAL PLOT (residuals against the EXPLANATORY variable)
· It identifies if a LINEAR MODEL is APPROPRIATE.
· We look for NO LEFTOVER CURVED PATTERN.

If given r, $s_x$, $s_y$, $\bar{x}$ & $\bar{y}$, use these formulas to find the LSRL equation:

  $b = r \cdot \frac{s_y}{s_x}$

  $a = \bar{y} - b\bar{x}$

· Extrapolation - predicting with values of the explanatory variable that are outside of the range of data from which the LSRL was calculated.
· Influential points - can greatly affect correlation and regression calculations.
· Outliers - out of pattern (large residuals).
· High Leverage - very large x-values.

· Power Model:
  Option 1: raise the values of the explanatory variable by an integer, p.
  Option 2: take the pth root of the values of the response variable.
· Exponential & Logarithmic Models: take the logarithm (log (base 10) or ln (base e)) of one or both variables.

* Always check the LSRL scatterplot & residual plot before concluding if a LINEAR MODEL is APPROPRIATE.
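The summary-statistic formulas for the LSRL (slope from r and the two standard deviations, intercept from the two means) can be tried out in a few lines. This is a sketch on made-up data; `correlation` and `lsrl` are hypothetical helper names, and r is computed as the average product of z-scores with n - 1 in the denominator.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

def lsrl(xs, ys):
    """Return (a, b, r) for the line y-hat = a + b*x."""
    r = correlation(xs, ys)
    b = r * stdev(ys) / stdev(xs)      # slope: b = r * s_y / s_x
    a = mean(ys) - b * mean(xs)        # y-intercept: a = y-bar - b * x-bar
    return a, b, r

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, r = lsrl(xs, ys)

# residual = actual - predicted, one per data point
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(round(a, 2), round(b, 2), round(r ** 2, 2))
print([round(e, 2) for e in residuals])
```

For this data the line is y-hat = 2.2 + 0.6x with r² = 0.6, and the residuals add up to 0, as they always do for a least squares line.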
CHAPTER 4: COLLECTING DATA

SAMPLING

· simple random sample (SRS) - gives every possible sample of a given size the same chance to be chosen.
  * Make sure to do SAMPLING WITHOUT REPLACEMENT when doing SRS.

HOW TO CHOOSE AN SRS?

· Technology:
  Label: Label each individual from 1 to N.
  Randomize: Use an RNG to get n different integers (ignore repeats, if necessary).
  Select: Choose the individuals who correspond to the integers.

· Table D:
  Label: Label each individual with a distinct numerical label with the same number of digits (e.g., if using two digits use 01 to NN, or three digits use 001 to NNN).
  Randomize: Read consecutive groups of digits of the appropriate length from left to right across a line in Table D (ignore repeats, if necessary) until the number of sample size desired are selected.
  Select: Choose the individuals that correspond to the selected labels.

· Slips of paper:
  Label: Write corresponding letters or numbers on identical slips of paper.
  Randomize: Put in a bowl or hat, shuffle the papers, then let individuals take one paper (no replacement).
  Select: Group individuals based on the slip of paper they got.

TYPES OF SAMPLING WELL

· stratified random sampling - divide the population in strata (similar in some way) that might affect their response. Then choose a separate SRS in each stratum & combine these SRSs to form the sample.
  * Strata are similar within (HOMOGENEOUS), but different between (HETEROGENEOUS). Stratified samples tend to give more precise estimates of unknown population values than SRSs.
· cluster sampling - divide the population into non-overlapping groups of individuals that are located near each other. Randomly select clusters, and all the individuals in the chosen clusters are included in the sample.
  * Clusters are different within (HETEROGENEOUS) but similar between. Saves time & money.
· systematic random sampling - selects every kth individual, with k based on the population size and desired sample size. Randomly select a value from 1 to k to identify the first individual, and choose every kth individual.
  * If there's a pattern in the way the population is ordered, the sample may not be a representative of the population.

TYPES OF SAMPLING POORLY

· Convenience sampling - chooses individuals easiest to reach.
· Voluntary sampling - individuals choose to be a part of the study b/c of open invitation.
  * Both of these sampling methods can lead to BIAS, which leads to an over or underestimate of the study.

WE MIGHT AVOID BIAS, but what else can go WRONG???
· undercoverage: occurs when some members of the population are less likely to be chosen or cannot be chosen in a sample.
· nonresponse: occurs when an individual chosen for the sample can't be contacted.
· response bias: occurs when there is a systematic pattern of inaccurate answers to a survey question.

TYPES OF STUDIES

· Observational - observes individuals and measures variables of interest but does not attempt to influence the responses.
· Experimental - deliberately impose treatments (conditions) on individuals to measure their responses.

BASIC PRINCIPLES OF EXPERIMENTAL DESIGN:
· Comparison: Use a design that compares two or more treatments.
· Random Assignment: Use a chance process (slips of paper, RNG, Table D) to assign experimental units to treatments. This helps create roughly equivalent groups before treatments are imposed.
· Control: Keep other variables the same for all groups. Control helps avoid confounding and reduces variation in responses, making it easier to decide if a treatment is effective.
· Replication: Impose each treatment on enough experimental units so that the effects of the treatments can be distinguished from chance differences between groups.

COMPLETELY RANDOMIZED DESIGN - experimental units are assigned to the treatments completely by chance.

  Individuals -> random assignment -> Group 1 (n) -> Treatment 1 -> compare
                                   -> Group 2 (n) -> Treatment 2 -> compare

  Example:
  20 companies -> random number generator -> Group 1 (10) -> additional lighting -> compare worker productivity
                                          -> Group 2 (10) -> the same lighting   -> compare worker productivity

  Description:
  - Number the companies from 1 to 20 (20 individuals).
  - Use a random number generator to produce 10 different random integers from 1 to 20 (random assignment).
  - Select the first 10 different integers (Group 1) and assign them to additional lighting (Treatment 1).
  - Select the remaining companies (Group 2) and assign them to the same lighting (Treatment 2).
  - Compare the increase in worker productivity between the two groups.

RANDOMIZED BLOCK DESIGN

  Individuals -> Block 1 (n) -> random assignment -> Treatment 1 / Treatment 2 -> compare
              -> Block 2 (n) -> random assignment -> Treatment 1 / Treatment 2 -> compare
              -> combine & compare

  Example:
  Freshmen (100)   -> random number generator -> online (50) / in-class (50)   -> compare scores
  Sophomores (400) -> random number generator -> online (200) / in-class (200) -> compare scores
  -> combine results & compare

  Description:
  - Form blocks based on grade level (Individuals -> Blocks) because scores on the geometry final exam are likely to vary by grade level, since Freshmen who take geometry tend to be more advanced in their math coursework.
  - Assign each individual student a number from 1 to 100 for Freshmen. Use a random number generator to obtain 50 random integers (random assignment), select these students and assign them to online (Block 1 + Treatment 1). The remaining students are assigned to teacher taught (Block 1 + Treatment 2).
  - Assign each individual student a number from 1 to 400 for Sophomores. Use a random number generator again to obtain 200 random integers (random assignment), select these students and assign them to online (Block 2 + Treatment 1). The remaining students are assigned to teacher taught (Block 2 + Treatment 2).
  - At the end of the course, let them take the same geometry final exam and compare the scores (compare).
  - Once all students have taken the test, and the scores have been compared for each treatment for each block, then combine the results and compare (combine and compare).

MATCHED PAIRS DESIGN - a common experimental design for comparing two treatments that uses blocks of size 2. In some matched pairs designs, two very similar experimental units are paired and the two treatments are randomly assigned within each pair. In others, each experimental unit receives both treatments in a random order.

  Example:
  A track coach wants to know whether his long-distance runners are faster running the track clockwise or counterclockwise. Design an experiment that uses a matched-pairs design to investigate this question. Explain your method of pairing.

  Description:
  Have each long distance runner race 1 mile in each direction. Some runners are faster than others, so using each runner as his or her own "pair" accounts for variation in 1-mile race times among the runners. For each runner, randomly assign the order in which the treatments (clockwise and counterclockwise) are assigned by flipping a coin. Heads indicates the runner will race clockwise first and counterclockwise second; tails indicates the runner will race counterclockwise first and clockwise second. Allow adequate recovery time between the races. For each runner, record the 1-mile race times for each direction.

VOCABS!!!
· confounding: occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other.
· placebo: treatment that has no active ingredient, but is otherwise like other treatments.
· treatment: a specific condition applied to the individuals in an experiment.
· experimental unit: the object to which a treatment is randomly assigned.
· subjects: human beings are the experimental units.
· factors: an explanatory variable that is manipulated and may cause a change in the response variable.
· levels: different values of a factor.
  * Treatments are combinations of the levels of the factors.
· control group: used to provide a baseline for comparing the effects of other treatments.
· placebo effect: describes the fact that some subjects in an experiment will respond favorably to any treatment, even an inactive treatment.
· double-blind: neither the subjects nor those who interact with them and measure the response variable know which treatment a subject is receiving.
· single-blind: either the subjects or the people who interact with them and measure the response variable don't know which treatment a subject is receiving.
· random assignment: experimental units are assigned to treatments using a chance process.
· control: keeping other variables constant for all experimental units.
· replication: giving each treatment to enough experimental units so that any difference in the effects can be distinguished from chance differences.
· block: a group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response to the treatments.
· sampling variability: different random samples of the same size from the same pop. produce diff. estimates.
· statistically significant: the observed results of a study are too unusual to be explained by chance alone.
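The Label / Randomize / Select routine for the 20-company lighting example can be acted out with a random number generator. A minimal sketch: the function name `completely_randomized` and the seed value are hypothetical, and the seed is only there so the same split comes out each time the script runs.

```python
import random

def completely_randomized(units, group1_size, seed=None):
    """Split units into two treatment groups using a chance process."""
    rng = random.Random(seed)
    # Randomize: draw distinct labels without replacement (repeats cannot occur).
    group1 = sorted(rng.sample(units, group1_size))
    # Select: every unit not drawn goes to the second treatment.
    group2 = [u for u in units if u not in group1]
    return group1, group2

# Label: number the companies from 1 to 20.
companies = list(range(1, 21))
additional_lighting, same_lighting = completely_randomized(companies, 10, seed=1)

print("Treatment 1 (additional lighting):", additional_lighting)
print("Treatment 2 (same lighting):", same_lighting)
```

For a randomized block design, the same function could be called once per block (for example once for the Freshmen labels and once for the Sophomore labels), so that random assignment happens within each block.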
SAMPLING VARIABILITY & SAMPLE SIZE

Larger random samples tend to produce estimates that are closer to the true population value than smaller random samples. In other words, estimates from larger samples are more precise.

CRITERIA FOR ESTABLISHING CAUSATION WHEN WE CAN'T DO AN EXPERIMENT:
1) The association is strong. The association between the explanatory variable and the response variable is strong.
2) The association is consistent. Many studies of different kinds show a link between the explanatory and response variable. This reduces the chance that some other variable specific to one group or one study explains the association.
3) Larger values of the explanatory variable are associated with stronger responses. The individuals have consistency in the explanatory variable, and some individuals don't.
4) Alleged cause precedes effect in time. The continued application of what the cause might be shows the effect in the long run.
5) The alleged cause is plausible. Plausible means it is believable and possible.

STATISTICALLY SIGNIFICANT
* If % < 5%, yes, it is statistically significant and it is unlikely to have happened by chance alone.
* If % > 5%, no, it is not statistically significant and it may have happened by coincidence only.

PROCESS OF IDENTIFYING THE PERCENTAGE (P-VALUES)
1) Identify the difference in mean.
2) Make a simulation and dotplot.
3) Identify how many dots are greater than or equal to the difference in mean from step 1.
4) Calculate the percentage of how many dots are greater than or equal to the mean difference.
5) Compare to the 5% rule and state if the study is statistically significant or not in the context of the problem.

* Sampling variability: the margin of error creates an interval of plausible values (sample estimate ± margin of error).
* When conducting random assignment, use the Randomize and Select steps.

- All planned studies must be reviewed in advance by an institutional review board charged with protecting the safety and well-being of the subjects.
- All individuals who are subjects in a study must give their informed consent before data are collected.
- All individual data must be kept confidential. Only statistical summaries for groups of subjects may be made public.

THE SCOPE OF INFERENCE
· Random selection of individuals allows inference about the population from which the individuals were chosen.
· Random assignment of individuals to groups allows inference about cause and effect.

SUMMARY:

                                 WERE INDIVIDUALS RANDOMLY ASSIGNED TO GROUPS?
                                 YES                                  NO
WERE INDIVIDUALS      YES        Inference about population: YES      Inference about population: YES
RANDOMLY SELECTED?               Inference about cause & effect: YES  Inference about cause & effect: NO
                      NO         Inference about population: NO       Inference about population: NO
                                 Inference about cause & effect: YES  Inference about cause & effect: NO

* All these notes will be used so much in Chapters 7 through 12, since Chapter 4 is all about collecting data. There is a lot of information here, but it's very essential to know these things since you cannot do your future studies and projects without collecting your individuals & data first. RANDOM IS ALWAYS IMPORTANT!!!
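The five-step percentage (p-value) process can be imitated by re-shuffling the group labels many times, with each shuffle standing in for one dot on the dotplot. This is a sketch under stated assumptions: the two groups of scores are made up, `simulated_p_value` is a hypothetical name, and 2000 shuffles with a fixed seed is an arbitrary choice.

```python
import random
from statistics import mean

def simulated_p_value(group1, group2, trials=2000, seed=0):
    """Proportion of shuffled differences in means that are >= the observed one."""
    rng = random.Random(seed)
    observed = mean(group1) - mean(group2)       # step 1: observed difference
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    count = 0
    for _ in range(trials):                      # step 2: simulate (one dot per trial)
        rng.shuffle(pooled)                      # re-do the random assignment
        diff = mean(pooled[:n1]) - mean(pooled[n1:])
        if diff >= observed:                     # step 3: dots at or beyond observed
            count += 1
    return observed, count / trials              # step 4: the percentage

# Made-up scores for two treatment groups.
treatment = [10, 12, 11, 13, 12]
control = [5, 6, 7, 5, 6]
observed, p_value = simulated_p_value(treatment, control)

# Step 5: compare to the 5% rule.
print(round(observed, 1), p_value, "significant" if p_value < 0.05 else "not significant")
```

With groups this far apart, almost no shuffle matches the observed difference, so the percentage lands well under 5%.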
CHAPTER 5: PROBABILITY

Formulas:

· $P(A) = \frac{\text{number of outcomes in event } A}{\text{total number of outcomes in sample space}}$
· Complement rule: $P(A^C) = 1 - P(A)$
· Addition Rule for mutually exclusive events: $P(A \cup B) = P(A) + P(B)$
· General Addition Rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
· Conditional Probabilities ("given that"): $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$, where P(A) = P(given event occurs)
· Independent Events: $P(A) = P(A \mid B^C) = P(A \mid B)$ or $P(B) = P(B \mid A^C) = P(B \mid A)$
· General Multiplication Rule: $P(A \cap B) = P(A) \cdot P(B \mid A)$
· Multiplication Rule for Independent Events: $P(A \cap B) = P(A) \cdot P(B)$
· At least one probability Rule: $P(\text{at least one}) = 1 - P(\text{none})$

Definitions:

· random process - generates outcomes that are determined purely by chance.
· probability - a number between 0 and 1. The probabilities of all outcomes must add to 1.
· law of large numbers - if we observe more and more trials of any random process, the proportion approaches the true probability.
· mutually exclusive - no two events can happen at the same time.
· simulation - imitates a random process in such a way that simulated outcomes are consistent with real-world outcomes.
  simulation process:
  ① Describe how you will simulate one trial (one repetition).
  ② Perform many trials (repetitions).
  ③ Use the result to answer the question.
· sample space - list of all possible outcomes.
· conditional probability - probability that one event happens given that another event is known to have happened.
· Independent events - if knowing whether or not one event has occurred does not change the probability that the other event will happen.
· OR - union ($\cup$)
· AND - intersection ($\cap$)

[Venn diagram, two-way table and tree diagram of events A and B: the tree branches carry P(A), P(A^C) and the conditional probabilities, and each end point is a joint probability such as P(A ∩ B).]

HOW SMALL DOES THIS PROBABILITY HAVE TO BE FOR US TO SAY THAT IT IS UNUSUAL? LESS THAN 5%

How to know if there is convincing evidence from the question?
① Identify the percentage or proportion from the problem.
② Perform the simulation.
③ Count the number of dots out of the total number of simulation.
④ IF:
  * proportion of dots from #3 < 5%, it is statistically significant based on the question.
  * proportion of dots from #3 > 5%, it is not statistically significant based on the question.
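The probability rules can be checked on a small sample space by counting outcomes. A minimal sketch, assuming one roll of a fair six-sided die; the events A (even number) and B (greater than 3) are made up for illustration, and exact fractions are used so the rules can be compared without rounding.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (an illustrative choice).
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}      # event A: roll is even
B = {4, 5, 6}      # event B: roll is greater than 3

def prob(event):
    """P(event) = outcomes in the event / outcomes in the sample space."""
    return Fraction(len(event & sample_space), len(sample_space))

p_complement = 1 - prob(A)                        # complement rule
p_a_or_b = prob(A) + prob(B) - prob(A & B)        # general addition rule
p_b_given_a = prob(A & B) / prob(A)               # conditional probability
independent = prob(A & B) == prob(A) * prob(B)    # multiplication rule check

# At least one rule: P(at least one six in 3 rolls) = 1 - P(no sixes).
p_at_least_one_six = 1 - Fraction(5, 6) ** 3

print(p_complement, p_a_or_b, p_b_given_a, independent, p_at_least_one_six)
```

Since P(B | A) = 2/3 is not equal to P(B) = 1/2, knowing the roll is even changes the probability that it is greater than 3, so A and B are not independent.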
CHAPTER 6: Random Variables & Probability Distributions

FORMULAS:

· Discrete Random Variables
  * P(X = k), where it is missing, from the other probabilities: $P(X = k) = 1 - P(X \neq k)$
  * $P(X \geq k) = P(X = k) + P(X = k + 1) + \dots$
  * mean (expected value): $\mu_X = E(X) = x_1 p_1 + x_2 p_2 + \dots = \sum x_i p_i$
  * variance: $\sigma_X^2 = (x_1 - \mu_X)^2 p_1 + (x_2 - \mu_X)^2 p_2 + \dots = \sum (x_i - \mu_X)^2 p_i$
  * standard deviation: $\sigma_X = \sqrt{\sigma_X^2}$

· Continuous Random Variables
  * Uniform density curve - used when values are given, without the mean and standard deviation. Probability = area = width x height.
  * Normal density curve - used when the mean & standard deviation are given (* find z-scores): $z = \frac{x - \mu}{\sigma}$

· Transforming Random Variables (* linear transformation on a random variable, $Y = a + bX$)
  SHAPE: same
  CENTER: mean: $\mu_Y = a + b\mu_X$
  VARIABILITY: s.d.: $\sigma_Y = |b|\sigma_X$

· Combining Random Variables
  * Sum (S = X + Y)
    mean: $\mu_S = \mu_X + \mu_Y$
    variance: $\sigma_S^2 = \sigma_X^2 + \sigma_Y^2$
    s.d.: $\sigma_S = \sqrt{\sigma_X^2 + \sigma_Y^2}$
  * Difference (D = X - Y)
    mean: $\mu_D = \mu_X - \mu_Y$
    variance: $\sigma_D^2 = \sigma_X^2 + \sigma_Y^2$
    s.d.: $\sigma_D = \sqrt{\sigma_X^2 + \sigma_Y^2}$
  * Linear combination (aX + bY)
    mean: $a\mu_X + b\mu_Y$
    s.d.: $\sqrt{a^2\sigma_X^2 + b^2\sigma_Y^2}$
  * The variance and s.d. formulas apply if X and Y are independent.

· When combining Normal random variables:
  ① Find the $\mu_S$ or $\mu_D$.
  ② Find the $\sigma_S$ or $\sigma_D$.
  ③ Find the sum or difference of the values.
  ④ Find the z-score.
  ⑤ Use normalcdf to find the probability.

· Binomial Random Variables
  * $P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$
  * P(X = x) -> binompdf(trials: n, p: p, x-value: x)
  * P(X ≤ x) -> binomcdf(trials: n, p: p, x-value: x)
  * P(X < x) -> binomcdf(trials: n, p: p, x-value: x - 1)
  * P(X > x) -> 1 - binomcdf(trials: n, p: p, x-value: x)
  * P(X ≥ x) -> 1 - binomcdf(trials: n, p: p, x-value: x - 1)
  * mean (expected value): $\mu_X = E(X) = np$
  * standard deviation: $\sigma_X = \sqrt{np(1 - p)}$
  * 10% condition: n < 0.10N, where n = sample size; N = population size
  * Large Counts Condition (LCC): $np \geq 10$; $n(1 - p) \geq 10$
  * shape when the sample size is small (Large Counts condition not met):
    · shape is right-skewed, p < 0.5
    · shape is left-skewed, p > 0.5
    · shape is roughly symmetric (approximately Normal), p = 0.5

· Geometric Random Variables
  * $P(X = x) = (1 - p)^{x - 1} p$
  * P(X = x) -> geometpdf(p: p, x-value: x)
  * P(X ≤ x) -> geometcdf(p: p, x-value: x)
  * P(X < x) -> geometcdf(p: p, x-value: x - 1)
  * P(X > x) -> 1 - geometcdf(p: p, x-value: x)
  * P(X ≥ x) -> 1 - geometcdf(p: p, x-value: x - 1)
  * mean (expected value): $\mu_X = E(X) = \frac{1}{p}$
  * standard deviation: $\sigma_X = \frac{\sqrt{1 - p}}{p}$
  * shape of the distribution: always right-skewed

DEFINITIONS:

· random variable - numerical values of an outcome from a random process.
· probability distribution - a random variable's possible values and their probabilities.
· discrete random variable - fixed set of possible X-values with gaps between them (e.g. shoe size, number of children in your family, etc.)
· mean (expected value) of a discrete random variable - average value of many, many trials of the same random process.
· standard deviation of a discrete random variable - measures how much a value typically varies from the mean after many, many trials of the random process.
· continuous random variable - can take any value in an interval (e.g. length, salary, height, time, etc.)
· independent random variables - when knowing the value of one variable cannot help us predict the value of the other; it does not change the probability distribution of the other variable.
· variance ($\sigma^2$) - the square of standard deviation.
· binomial setting - when we do n independent trials of the same random process and count the number of times that a particular outcome ("success") occurs.
  Binary? "success" or "fail"
  Independent? Knowing the outcome of one trial does not tell us anything about the outcome of other trials.
  Number? Fixed number of trials.
  Same probability? Same probability p of success on each trial.
· binomial random variable - count of successes; possible values of X are 0 to n.
· binomial distribution - specified by n & p.
· mean (expected value) of a binomial random variable - average value of a binomial random variable after many trials.
· standard deviation of a binomial random variable - how a value typically varies from the mean after many trials.
· 10% condition - if a binomial setting is not independent (sampling without replacement), we can use the 10% condition to proceed with the calculations and treat each individual as independent.
· Large counts condition - helps us identify that the probability distribution of X is approximately Normal.
· Geometric setting - when we perform independent trials, we record the number of trials until we get a success.
  Binary? "success" or "fail"
  Independent? Knowing the outcome of one trial does not tell us anything about the outcome of other trials.
  Trial? Number of trials it took to get a success.
  Same probability? Same probability p of success on each trial.
· Geometric random variable - number of trials it takes to get a success.
· Geometric distribution - specified by the probability of success p on any trial, with x-values starting at 1.

INTERPRETATIONS IN CONTEXT:

· Probability: "There is a (probability as a percentage) chance/probability of (context)."
· Mean: "If many, many (unit) were randomly selected, the average (context) is about ($\mu$ + unit)."
· Standard Deviation: "If many, many (unit) were randomly selected, the (context) typically varies by about ($\sigma$ + unit) from the mean of ($\mu$ + unit)."
· Describing Random Variables (Discrete, Continuous, Binomial or Geometric): Describe the SHAPE, CENTER & VARIABILITY.
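The binomial and geometric formulas can be written out directly to see what the calculator commands compute. In this sketch the function names copy the calculator's command names for readability only; they are plain Python re-implementations of the formulas, and the values n = 5, p = 0.5 and p = 0.2 are made up.

```python
from math import comb, sqrt

def binompdf(n, p, x):
    """P(X = x) for a binomial random variable."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomcdf(n, p, x):
    """P(X <= x): add up the probabilities for 0, 1, ..., x successes."""
    return sum(binompdf(n, p, k) for k in range(x + 1))

def geometpdf(p, x):
    """P(X = x): x - 1 failures followed by the first success."""
    return (1 - p) ** (x - 1) * p

def geometcdf(p, x):
    """P(X <= x): 1 minus the chance that the first x trials all fail."""
    return 1 - (1 - p) ** x

# Binomial example: n = 5 trials, p = 0.5.
n, p = 5, 0.5
exactly_2 = binompdf(n, p, 2)            # P(X = 2)
at_most_2 = binomcdf(n, p, 2)            # P(X <= 2)
at_least_3 = 1 - binomcdf(n, p, 2)       # P(X >= 3) uses x - 1 = 2
binom_mean = n * p
binom_sd = sqrt(n * p * (1 - p))

# Geometric example: p = 0.2.
first_success_on_3 = geometpdf(0.2, 3)   # P(X = 3)
geom_mean = 1 / 0.2
geom_sd = sqrt(1 - 0.2) / 0.2

print(exactly_2, at_most_2, at_least_3, binom_mean, round(binom_sd, 3))
print(round(first_success_on_3, 3), geom_mean, round(geom_sd, 3))
```

Note how "at least 3" is computed from the complement of "at most 2", which is why the x-value drops by 1 in the ≥ and < versions of the calculator commands.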
CHAPTER 7: SAMPLING DISTRIBUTIONS

DEFINITIONS:
· Parameter (p, μ, σ): number that describes some characteristic of the POPULATION.
· Statistic (p̂, x̄, s_x): number that describes some characteristic of the SAMPLE.
· Population distribution: distribution of values of ALL individuals in the population.
· Sampling distribution: distribution of values of the statistic in ALL POSSIBLE samples of the same size from the same population.
* Increasing the sample size (n) decreases the variability of the sampling distribution.
· Unbiased estimator: the center (μ_p̂ or μ_x̄) of the sampling distribution is equal to the true value of the parameter (p or μ).
· Central limit theorem: if the population distribution is not Normal, but the sample size is large enough (n ≥ 30), the sampling distribution is approx. Normal by CLT.

CLAIMS:
① Assume that the claim is true.
② Make a simulation of the sampling distribution based on the question.
③ From the simulation, count the number of dots out of the total number of dots in the simulation (the proportion of dots) based on the question.
IF:
* proportion of dots < 5%: there is convincing evidence.
* proportion of dots > 5%: there is not convincing evidence.

INTERPRETATIONS:
· 10% condition: "Assuming that the sample size, n, is less than 10% of all (population in context), the 10% condition is met."
· Standard Deviation (one statistic): "In SRSs of size (sample size, n), the sample (proportion/mean) of (subject in context) typically varies by about (SD + unit) from the true (proportion/mean) of (p/μ + unit)."
· Standard Deviation (two statistics, difference): "The difference (p̂₁ - p̂₂ / x̄₁ - x̄₂ in context) in the sample (proportion or mean) of (subject in context) typically varies by about (SD + unit) from the true difference of (p₁ - p₂ / μ₁ - μ₂ + unit)."
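The three "claims" steps can be sketched as a small simulation. This is only an illustration under assumed numbers (claimed p = 0.5, n = 50, observed p̂ = 0.34), and the helper names `simulate_p_hats` and `proportion_of_dots` are hypothetical; each simulated p̂ plays the role of one dot on the dotplot.

```python
import random

def simulate_p_hats(p, n, trials, seed=1):
    """Steps 1-2: assume the claim (p) is true and simulate many sample proportions."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(trials)]

def proportion_of_dots(p_hats, observed, direction="less"):
    """Step 3: proportion of simulated dots at least as extreme as the observed statistic."""
    if direction == "less":
        hits = sum(ph <= observed for ph in p_hats)
    else:
        hits = sum(ph >= observed for ph in p_hats)
    return hits / len(p_hats)

p_hats = simulate_p_hats(p=0.5, n=50, trials=2000)
tail = proportion_of_dots(p_hats, observed=0.34, direction="less")
center = sum(p_hats) / len(p_hats)

print("center of simulated sampling distribution:", round(center, 3))
print("proportion of dots:", tail)
print("convincing evidence" if tail < 0.05 else "not convincing evidence")
```

The center of the simulated dots lands close to the claimed p, which is the unbiased-estimator idea from the definitions.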

FORMULAS:

Given one statistic:
· PROPORTION (keywords: proportions / fractions / percents)
  Shape: approx. Normal by the Large Counts condition (LCC): np ≥ 10 and n(1 - p) ≥ 10
  Center: μ_p̂ = p
  Variability (when sampling without replacement, check n < 0.10N): σ_p̂ = √( p(1 - p) / n )
  z = (p̂ - p) / σ_p̂
· MEAN (keywords: averages / means)
  Shape: Normal if the population dist. is Normal; if skewed, approx. Normal by the Central Limit Theorem (CLT) when n ≥ 30
  Center: μ_x̄ = μ
  Variability (when sampling without replacement, check n < 0.10N): σ_x̄ = σ / √n
  z = (x̄ - μ) / σ_x̄

Given two statistics:
· DIFFERENCE IN PROPORTIONS
  Shape: approx. Normal by the Large Counts condition (LCC): n₁p₁ ≥ 10, n₁(1 - p₁) ≥ 10, n₂p₂ ≥ 10, n₂(1 - p₂) ≥ 10
  Center: μ_(p̂₁ - p̂₂) = p₁ - p₂
  Variability: σ_(p̂₁ - p̂₂) = √( p₁(1 - p₁)/n₁ + p₂(1 - p₂)/n₂ )
  z = ( (p̂₁ - p̂₂) - (p₁ - p₂) ) / σ_(p̂₁ - p̂₂)
· DIFFERENCE IN MEANS
  Shape: Normal if BOTH population distributions are Normal; approx. Normal by the Central Limit Theorem (CLT) when n₁ ≥ 30 and n₂ ≥ 30
  Center: μ_(x̄₁ - x̄₂) = μ₁ - μ₂
  Variability: σ_(x̄₁ - x̄₂) = √( σ₁²/n₁ + σ₂²/n₂ )
  z = ( (x̄₁ - x̄₂) - (μ₁ - μ₂) ) / σ_(x̄₁ - x̄₂)

* Values of p̂ and x̄ are usually found in questions before the words "less than" or "greater than."
* For two statistics, the "less than" or "greater than" comparison is within the statistics at hand, so the value to use comes from their difference:
  x̄₁ > x̄₂ -> x̄₁ - x̄₂ > 0
  x̄₁ < x̄₂ -> x̄₁ - x̄₂ < 0
* Use normalcdf(lower, upper, μ: 0, σ: 1) to find the probability that the statistic is less than OR greater than a value, or in between two values (use two z-scores).
* What is sampling without replacement?
It's when the selections are not INDEPENDENT, meaning that when an event occurs, the probability of the next event changes. For example, in a standard deck of cards, when we take out one card and do not put it back in the deck (no replacement), the probability of some event changes because there is one card less.

* There's a "wording" of the question that will help us identify if we do not need to do the 10% condition.
- When you see the statement "(subjects) LIKE THESE," then the 10% condition doesn't need to be shown since we're only going to generalize to the given sample, not the population.

[Figure: Normal curves centered at p / μ on a z-scale, shaded between the lower and upper bounds.]
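As a worked example of the Chapter 7 formulas, here is a minimal Python sketch that standardizes a sample proportion and finds a probability the way normalcdf does. The values p = 0.6, n = 100, and the cutoff 0.55 are assumptions chosen for illustration; `normalcdf` is a hand-written helper named after the calculator command, built on the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def sd_p_hat(p, n):
    """Standard deviation of the sampling distribution of p-hat."""
    return sqrt(p * (1 - p) / n)

def sd_x_bar(sigma, n):
    """Standard deviation of the sampling distribution of x-bar."""
    return sigma / sqrt(n)

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the Normal curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

# Assumed scenario: p = 0.6, n = 100; find P(p-hat < 0.55).
p, n = 0.6, 100
large_counts_ok = n * p >= 10 and n * (1 - p) >= 10  # Large Counts condition

z = (0.55 - p) / sd_p_hat(p, n)
prob = normalcdf(-1000, z)  # "less than": a very small lower bound, as on the calculator

print(large_counts_ok, round(z, 3), round(prob, 4))
print(sd_x_bar(sigma=12, n=36))  # sigma / sqrt(n) for a sample mean
```

For a "greater than" question the bounds flip to `normalcdf(z, 1000)`, and for "in between" two z-scores are used as the bounds.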
CHAPTER 8: ESTIMATING PROPORTIONS WITH CONFIDENCE

The question says... "Construct & interpret a C% confidence interval..."
-> do this FOUR-STEP PROCESS (STATE, PLAN, DO, CONCLUDE).
Legend (color-coded in the original notes): replace based on the problem · show calculations · remains the same · name/title · reminder/notes · so what?

VOCABULARY:
· point estimator: a chosen statistic (p̂, x̄, s_x) that will provide a reasonable estimate about the parameter (p, μ, σ).
· confidence interval: gives an interval of plausible (believable) values of an unknown population parameter (p, μ, σ) based on sample data.
· confidence level (CL): success rate (capture rate) of the method that produces the interval.
· margin of error: only accounts for sampling variability, not other sources of error like nonresponse, undercoverage, or response bias.
· critical value (z*): invNorm(area: _, μ: 0, σ: 1)
* some common z*:
  80%: z* = 1.282
  90%: z* = 1.645
  95%: z* = 1.960
  99%: z* = 2.576

FOUR-STEP PROCESS: ONE-SAMPLE z INTERVAL FOR p

STATE: parameter & CL:
(C%) confidence interval for p = true proportion of [context].

PLAN: inference method & conditions:
Inference method: One-sample z interval for p
Random: random sample of (n + context) -> so we can generalize to the population.
10% condition: n < 0.10N (or in a sentence) -> so sampling without replacement is OK & we can solve for SD.
  * If the problem says "... for samples like these," do not show the 10% condition. That is sampling with replacement.
Large counts: np̂ ≥ 10 and n(1 - p̂) ≥ 10 -> so the sampling distribution is approx. Normal.

DO: calculations (two options; Option 1 is recommended)
* Only BLACK & RED marks need to be shown.
* Be careful with the formula.
① Formula:
  point estimate ± margin of error = (statistic) ± (z*)(SE_p̂)
  p̂ ± z* √( p̂(1 - p̂) / n )
  p̂ = _, n = _, z* = _
  = (A, B)
② Using 1-PropZInt:
  STAT -> TESTS -> A: 1-PropZInt
  On the calculator: x: np̂; n: n (sample size); C-Level: C
  = (A, B)

CONCLUDE: interpretation
"We are C% confident that the interval from A to B captures p = the true proportion of [parameter in context]."

SOME FORMULAS & REMINDERS:
* Given the interval (A, B), where A is the lower value and B is the upper value:
  point estimate = (A + B) / 2
  margin of error = (B - A) / 2
* sample size (when p̂ is unknown, use p̂ = 0.5; if n has decimals, round up):
  z* √( p̂(1 - p̂) / n ) ≤ ME
  where p̂ = sample prop., ME = margin of error, z* = critical value for the CL, n = sample size
* standard error of p̂: SE_p̂ = √( p̂(1 - p̂) / n )
* margin of error, when...
  · confidence level (↑), ME (↑) (wider interval)
  · sample size (↑), ME (↓) (narrower interval)
* Decision making: if the confidence interval contains the claimed value, we do not have convincing evidence against the claim (in context).

SOME INTERPRETATIONS:
· Confidence interval (A, B): "We are C% confident that the interval from A to B captures the [parameter]."
  parameter: p = true proportion of ...; μ = true mean of ...
· Confidence level (CL): "If we take many, many samples and calculate a confidence interval for each, about C% of them will capture the [parameter]."

FOUR-STEP PROCESS: TWO-SAMPLE z INTERVAL FOR p₁ - p₂

STATE: parameter & CL:
(C%) confidence interval for p₁ - p₂ = true difference (1-2 context) in the proportion of [context].

PLAN: inference method & conditions:
Inference method: Two-sample z interval for p₁ - p₂
Random: random sample of (n₁ + context); independent random sample of (n₂ + context) -> so we can generalize to the population.
10% condition: n₁ < 0.10N₁ and n₂ < 0.10N₂ (or in a sentence) -> so sampling without replacement is OK & we can solve for SD.
Large counts: n₁p̂₁ ≥ 10, n₁(1 - p̂₁) ≥ 10, n₂p̂₂ ≥ 10, n₂(1 - p̂₂) ≥ 10 -> so the sampling distribution of p̂₁ - p̂₂ is approx. Normal.

DO: calculations (two options; Option 1 is recommended)
* Only BLACK & RED marks need to be shown.
* Be careful with the formula.
① Formula:
  point estimate ± margin of error = (statistic) ± (z*)(SE)
  (p̂₁ - p̂₂) ± z* √( p̂₁(1 - p̂₁)/n₁ + p̂₂(1 - p̂₂)/n₂ )
  p̂₁ = _, p̂₂ = _, n₁ = _, n₂ = _, z* = _
  = (A, B)
② Using 2-PropZInt:
  STAT -> TESTS -> B: 2-PropZInt
  On the calculator: x1: n₁p̂₁; n1: n₁; x2: n₂p̂₂; n2: n₂; C-Level: C
  = (A, B)

CONCLUDE: interpretation
"We are C% confident that the interval from A to B captures p₁ - p₂ = the true difference (1-2 context) in the proportions of [parameter in context]."

Convincing evidence?
  (+, +) -> 1st proportion is greater
  (-, -) -> 2nd proportion is greater
  (-, +) -> no convincing evidence of a difference b/c the interval contains 0.
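The DO step of the one-sample z interval, plus the sample-size reminder, can be sketched in Python. The counts (120 successes out of 200) and the 0.03 margin of error are made-up inputs; `z_star` uses the standard library's `NormalDist.inv_cdf` in place of invNorm, and the function names are illustrative only.

```python
from math import sqrt, ceil
from statistics import NormalDist

def z_star(conf_level):
    """Critical value z* for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - conf_level) / 2)

def one_prop_z_interval(x, n, conf_level):
    """point estimate +/- z* * SE, using the sample proportion p-hat."""
    p_hat = x / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    me = z_star(conf_level) * se
    return p_hat - me, p_hat + me

def sample_size_for_me(me, conf_level, p_guess=0.5):
    """Smallest n with z* * sqrt(p(1-p)/n) <= ME (round up)."""
    return ceil((z_star(conf_level) / me) ** 2 * p_guess * (1 - p_guess))

low, high = one_prop_z_interval(x=120, n=200, conf_level=0.95)
point_estimate = (low + high) / 2   # (A + B) / 2
margin_of_error = (high - low) / 2  # (B - A) / 2
n_needed = sample_size_for_me(me=0.03, conf_level=0.95)

print(round(low, 4), round(high, 4))
print(round(point_estimate, 4), round(margin_of_error, 4), n_needed)
```

The last lines recover the point estimate and margin of error from the endpoints, matching the "given the interval (A, B)" reminder.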
CHAPTER 9: TESTING CLAIMS ABOUT PROPORTIONS

The question says: "Is there convincing evidence...?"
-> do this FOUR-STEP PROCESS (STATE, PLAN, DO, CONCLUDE).
Legend (color-coded in the original notes): replace based on the problem · show calculations · remains the same · name/title · reminder/notes · so what?

VOCABULARY:
· significance test: procedure for using observed data to decide between two competing claims (hypotheses).
· null hypothesis (H0): the claim we look for evidence against.
  H0: parameter = null value (H0: p = p0)
· alternative hypothesis (Ha): the claim we look for evidence for.
  (one-sided) Ha: parameter < or > null value (Ha: p < or > p0)
  (two-sided) Ha: parameter ≠ null value (Ha: p ≠ p0)
· P-value (probability): probability of getting evidence for Ha that is as strong as/stronger than the observed evidence when H0 is true.
· significance level (α): what we compare the P-value to. Possible values: 0.01, 0.05, 0.10. If no α is given, use 0.05.
  * P-value < α: REJECT H0. There is convincing evidence for Ha (in context).
  * P-value > α: FAIL TO REJECT H0. There is not convincing evidence for Ha (in context).
· Type I Error: we reject H0 when H0 is true; gives convincing evidence for Ha when H0 is true.
· Type II Error: we fail to reject H0 when Ha is true; gives not convincing evidence for Ha when Ha is true.
· Power of a test: probability that the test will find convincing evidence for Ha when a specific alternative value of the parameter is true.

FOUR-STEP PROCESS: ONE-SAMPLE z TEST FOR p

STATE: hypotheses, α, parameter
We want to test
  H0: p = p0
  Ha: p (<, >, ≠) p0
using α = _, where p = true proportion of (context).
* should always be in FUTURE TENSE
* do not forget the word "ALL"

PLAN: inference method & conditions:
Inference method: One-sample z test for p
Random: random sample of (n + context) -> so we can generalize to the population.
10% condition: n < 0.10N (or in a sentence) -> so sampling without replacement is OK & we can solve for SD.
  * If the problem says "... for samples like these," do not show the 10% condition. That is sampling with replacement.
Large counts: np0 ≥ 10 & n(1 - p0) ≥ 10 -> so the sampling distribution is approx. Normal.
  * do NOT use p̂ here!!!

DO: calculations (two options; Option 1 is recommended)
* Only BLACK & RED marks need to be shown.
* Be careful with the formula.
① Formula:
  standardized test statistic: z = (statistic - parameter) / SD
  z = (p̂ - p0) / √( p0(1 - p0) / n )
  p̂ = _, n = _, p0 = _
  P-value = normalcdf(lower: _, upper: _, μ: 0, σ: 1)
② Using 1-PropZTest:
  STAT -> TESTS -> 5: 1-PropZTest
  On the calculator: p0: p0; x: np̂; n: n; prop: ≠ p0 (two-sided), < p0 or > p0 (one-sided)
  z = _, P-value = _

CONCLUDE: interpretation
"Because the p-value of (P-value) (< / >) α = _, we..."
* If less than (<): "...reject H0. There is convincing evidence that p: the true proportion of (Ha in context)."
* If greater than (>): "...fail to reject H0. There is not convincing evidence that p: the true proportion of (Ha in context)."

SOME FORMULAS & REMINDERS:
                          TRUTH
  DECISION                H0 true          Ha true
  Reject H0               Type I Error     Correct
  Fail to reject H0       Correct          Type II Error

* standardized test statistic = (statistic - parameter) / SD
  z = (p̂ - p0) / σ_p̂, where μ_p̂ = p0 and σ_p̂ = √( p0(1 - p0) / n )
* P(Type I Error) = α
* P(Type II Error) = β = 1 - Power
* Power = 1 - P(Type II Error) = 1 - β
* Increasing the power: sample size (↑), alpha (↑), distance to Ha (↑)

SOME INTERPRETATIONS:
· Type I Error: "The null hypothesis (context) is true, but we find convincing evidence for Ha (context)."
· Type II Error: "The alternative hypothesis (context) is true, but we do not find convincing evidence for Ha (context)."
· Power: P(Reject H0 | Ha is true). "If Ha is true (at a specific value in context), there is a (power) probability of finding convincing evidence to reject the null (context)."
· P-value: "Assuming H0 is true (context), there is a (P-value) probability of getting a result as extreme as or more extreme than the one observed, by chance alone."

FOUR-STEP PROCESS: TWO-SAMPLE z TEST FOR p₁ - p₂

STATE: hypotheses, α, parameter
We want to test
  H0: p₁ - p₂ = 0
  Ha: p₁ - p₂ (<, >, ≠) 0
using α = _, where p₁ - p₂ = true difference (1-2 context) in the proportion of (context).
* should always be in FUTURE TENSE
* do not forget the word "ALL"

PLAN: inference method & conditions:
Inference method: Two-sample z test for p₁ - p₂
Random: random sample / random assignment of (n₁ + context); independent random sample / random assignment of (n₂ + context) -> so we can generalize to the population / show causation.
10% condition: n₁ < 0.10N₁ and n₂ < 0.10N₂ (or in a sentence) -> so sampling without replacement is OK & we can solve for SD.
Large counts: n₁p̂c ≥ 10, n₁(1 - p̂c) ≥ 10, n₂p̂c ≥ 10, n₂(1 - p̂c) ≥ 10 -> so the sampling distribution is approx. Normal.
  * do not use individual p̂₁ & p̂₂ here!!!

DO: calculations (two options; Option 1 is recommended)
* Only BLACK & RED marks need to be shown.
* Be careful with the formula.
① Formula:
  standardized test statistic: z = (statistic - parameter) / SD
  p̂₁ = _, p̂₂ = _, n₁ = _, n₂ = _, p̂c = (x₁ + x₂) / (n₁ + n₂)
  z = ( (p̂₁ - p̂₂) - 0 ) / √( p̂c(1 - p̂c)/n₁ + p̂c(1 - p̂c)/n₂ )
  P-value = normalcdf(lower: _, upper: _, μ: 0, σ: 1)
② Using 2-PropZTest:
  STAT -> TESTS -> 6: 2-PropZTest
  On the calculator: x1: n₁p̂₁; n1: n₁; x2: n₂p̂₂; n2: n₂; p1: ≠ p2 (two-sided), < p2 or > p2 (one-sided)
  z = _, P-value = _

CONCLUDE: interpretation
"Because the p-value of (P-value) (< / >) α = _, we..."
* If less than (<): "...reject H0. There is convincing evidence that p₁ - p₂ = the true difference of ... (Ha in context)."
* If greater than (>): "...fail to reject H0. There is not convincing evidence that p₁ - p₂ = the true difference of ... (Ha in context)."
* Use p₁ < p₂ (or p₁ > p₂, p₁ ≠ p₂) as Ha in context.
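The DO step of both z tests can be sketched as follows. The counts are invented for illustration, the function names are hypothetical, and `NormalDist().cdf` stands in for normalcdf. The one-sample test uses p0 in the standard deviation and the two-sample test uses the pooled proportion p̂c, as the reminders above require.

```python
from math import sqrt
from statistics import NormalDist

def p_value_from_z(z, alternative):
    """P-value from a z statistic; alternative is 'less', 'greater', or 'two-sided'."""
    std_normal = NormalDist()
    if alternative == "less":
        return std_normal.cdf(z)
    if alternative == "greater":
        return 1 - std_normal.cdf(z)
    return 2 * (1 - std_normal.cdf(abs(z)))

def one_prop_z_test(x, n, p0, alternative):
    """z = (p-hat - p0) / sqrt(p0(1 - p0)/n); uses p0, not p-hat, in the SD."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return z, p_value_from_z(z, alternative)

def two_prop_z_test(x1, n1, x2, n2, alternative):
    """Pooled two-sample z test of H0: p1 - p2 = 0."""
    p_c = (x1 + x2) / (n1 + n2)  # combined (pooled) proportion
    se = sqrt(p_c * (1 - p_c) / n1 + p_c * (1 - p_c) / n2)
    z = (x1 / n1 - x2 / n2) / se
    return z, p_value_from_z(z, alternative)

alpha = 0.05
z1, p1 = one_prop_z_test(x=58, n=100, p0=0.5, alternative="greater")
z2, p2 = two_prop_z_test(x1=45, n1=100, x2=30, n2=100, alternative="two-sided")

for z, p in ((z1, p1), (z2, p2)):
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(round(z, 3), round(p, 4), decision)
```

With these made-up counts the first test fails to reject H0 at α = 0.05 while the second rejects, so both branches of the CONCLUDE template are exercised.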
CHAPTER 10: ESTIMATING MEANS WITH CONFIDENCE

VOCABULARY:
· t* (critical value for means): for the margin of error (ME) to be used, we need the population SD (σ), but since we do not have that and are given the sample SD (s_x), the distribution will vary more.
· degrees of freedom: since we are using t* and our sampling distribution will vary more, we need degrees of freedom.
  Think about this... if I had five gummy bears and I have five students, I'll let the first student choose; they have 5 choices. The next student will have 4 choices, until it goes down to only one student, where that one student has no more "freedom" to choose from the variety of gummy bears. Thus, df = n - 1.
· Standard error (SE_x̄): this is the "standard deviation" of a sampling distribution for a mean, but since we do not know the population standard deviation (σ_x), we replace it with s_x in the formula: SE_x̄ = s_x / √n.

SOME FORMULAS AND REMINDERS:
* Margin of error: ME is proportional to 1/√n, so n × 4 -> ME × 1/2, assuming everything else remains the same.
* ME is larger for higher CL.
* ME doesn't account for bias, only sampling variability.
* sample size: t* · s_x / √n ≤ ME. If you do not know the t* from the problem, use z* based on the CL.
* [Figure: dotplot of differences (paired data), with 0 marking "no difference."]
* Two treatments (μ₁ - μ₂) is based on the difference of means (x̄₁ - x̄₂).
* Mean difference (μ_diff) is based on the paired data means (x̄_diff).
  ↳ paired data result from recording two values of the same quantitative variable for each individual OR for each pair of similar individuals.

The question says... "Construct & interpret a C% confidence interval..."
-> do this FOUR-STEP PROCESS (STATE, PLAN, DO, CONCLUDE).
Legend (color-coded in the original notes): replace based on the problem · show calculations · remains the same · name/title · reminder/notes · so what?

FOUR-STEP PROCESS: ONE-SAMPLE t INTERVAL FOR μ

STATE: parameter & CL:
(C%) confidence interval for μ = true mean of [context].

PLAN: inference method & conditions:
Inference method: One-sample t interval for μ
Random: random sample of (n + context) -> so we can generalize; OR random assignment (experiment) -> so we can infer cause & effect.
10% condition: n < 0.10N (or in a sentence) -> so sampling without replacement is OK & we can solve for SD.
  * If the problem says "... for samples like these," do not show the 10% condition.
Normal/Large sample (only one of these should be satisfied to be approx. Normal) -> so the sampling distribution is approx. Normal:
  · population dist. is approx. Normal (in context), OR
  · n ≥ 30 by CLT, OR
  · given sample data, the graph doesn't show strong skewness or outliers.

DO: calculations (two options; Option 1 is recommended)
* Only BLACK & RED marks need to be shown.
* Be careful with the formula.
① Formula:
  point estimate ± margin of error = (statistic) ± (t*)(SE_x̄)
  x̄ ± t* · s_x / √n
  x̄ = _, s_x = _, n = _, df = n - 1 = _
  t* = invT(area: _, df: _) (all from the calculator)
  * For t*, you can also use Table B. If the df is not on the list in Table B, always round down to a df from the list.
  = (A, B)
② Using TInterval (8: TInterval)
  = (A, B)

CONCLUDE: interpretation
"We are C% confident that the interval from A + unit to B + unit captures μ = the true mean of [parameter in context]."

FOUR-STEP PROCESS: TWO-SAMPLE t INTERVAL FOR μ₁ - μ₂

STATE: parameter & CL:
(C%) confidence interval for μ₁ - μ₂ = true difference (1-2 context) in the mean of [context].

PLAN: inference method & conditions:
Inference method: Two-sample t interval for μ₁ - μ₂
Random: independent random sample of (n₁ + context); independent random sample of (n₂ + context) (or a randomized experiment).
10% condition: n₁ < 0.10N₁ and n₂ < 0.10N₂ (or in a sentence).
Normal/Large sample (BOTH; only one of these should be satisfied to be approx. Normal):
  · population dist. is approx. Normal, OR
  · n₁ ≥ 30 and n₂ ≥ 30 by CLT, OR
  · given sample data, both graphs don't show strong skewness or outliers.

DO: calculations (two options; Option 1 is recommended)
* Only BLACK & RED marks need to be shown.
* Be careful with the formula.
① Formula:
  point estimate ± margin of error = (statistic) ± (t*)(SE)
  (x̄₁ - x̄₂) ± t* √( s₁²/n₁ + s₂²/n₂ )
  x̄₁ = _, x̄₂ = _, s₁ = _, s₂ = _, n₁ = _, n₂ = _
  t* = invT (calculator), with df = n - 1, where n is the smaller sample size between the two given.
  = (A, B)
② Using 2-SampTInt (0: 2-SampTInt)
  = (A, B)

CONCLUDE: interpretation
"We are (C%) confident that the interval from A + unit to B + unit captures μ₁ - μ₂ = the true difference (1-2 context) in the means of [parameter in context]."

Convincing evidence?
  (+, +) -> Yes, convincing
  (-, -) -> Yes, convincing
  (-, +) -> no convincing evidence of a difference b/c the interval contains 0.

FOR PAIRED DATA (like a one-sample t interval for μ_diff):
* All steps are the same process as a one-sample t interval, except for the following:
  * use x̄_diff instead of x̄ only
  * use μ_diff instead of μ only
  * use n_diff instead of n only
  * use s_diff instead of s_x only
* parameter: μ_diff = the true mean difference of [context].
* The inference method is called: paired t interval, or one-sample t interval for μ_diff.
* In the graphing, graph the differences, not the individual data of each set.
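For paired data, the interval is built from the differences. Below is a minimal sketch with invented before/after values; the critical value t* = 2.776 (df = 4, 95%) is typed in from a t table rather than computed, because the Python standard library has no inverse t function. `t_interval` is an illustrative name.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(data, t_star):
    """x-bar +/- t* * s_x / sqrt(n) for one list of quantitative data."""
    n = len(data)
    x_bar = mean(data)
    se = stdev(data) / sqrt(n)  # standard error uses the sample SD
    me = t_star * se
    return x_bar - me, x_bar + me

# Invented paired data: two measurements on the same five individuals.
before = [12, 15, 11, 14, 13]
after = [10, 14, 11, 11, 12]

# Subtract first, then work only with the differences.
diffs = [b - a for b, a in zip(before, after)]
df = len(diffs) - 1
t_star = 2.776  # from Table B for df = 4 and 95% confidence

low, high = t_interval(diffs, t_star)
print(diffs, df)
print(round(low, 4), round(high, 4))
print("contains 0" if low < 0 < high else "does not contain 0")
```

Here the interval runs from a negative to a positive value, the (-, +) case in the notes, so there is no convincing evidence of a difference.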
CHAPTER 11: TESTING CLAIMS ABOUT MEANS

The question says: "Is there convincing evidence...?"
-> do this FOUR-STEP PROCESS (STATE, PLAN, DO, CONCLUDE).
Legend (color-coded in the original notes): replace based on the problem · show calculations · remains the same · name/title · reminder/notes · so what?

A FEW THINGS TO REMEMBER ABOUT THE 10% CONDITION:
* note: always read the problem carefully to see if it will apply!
Apply the 10% condition when:
- sampling without replacement
- not independent events (mostly proportions or two prop.)
- random samples
Do not apply the 10% condition when:
- samples "like these" / volunteers
- independent
- random assignment (experiments)

REMINDERS:
· Connection between CI & ST:
  · If the interval contains H0 (null value) as a plausible value, we fail to reject H0. There is not convincing evidence that (Ha in context).
  · If the interval does NOT contain H0 (null value) as a plausible value, we reject H0. There is convincing evidence that (Ha in context).
  * A CI will make the same decision as a two-SIDED ST (Ha: μ ≠ μ0):
    - CL: 95% -> α = 0.05
    - CL: 90% -> α = 0.10
    - CL: 99% -> α = 0.01
    - CL -> α = 1 - CL
· Finding the P-value using Table B: t = (calc), df = n - 1
  · one-sided: you need df, then check the tail probability. Then give an interval for your p-value in this format: x₁ < P-value < x₂
  · two-sided: you need the same thing as a one-sided test, but the p-value interval you get should be multiplied by 2 (×2): 2x₁ < P-value < 2x₂

FOUR-STEP PROCESS: ONE-SAMPLE t TEST FOR μ / PAIRED t TEST FOR μ_diff

STATE: hypotheses, α, parameter
We want to test
  H0: μ / μ_diff = μ0 + unit
  Ha: μ / μ_diff (<, >, ≠) μ0 + unit
using α = _, where μ = true mean of (context) OR μ_diff = true mean difference of (context).
* do not forget the word "ALL"

PLAN: inference method & conditions:
Inference method: One-sample t test for μ / paired t test for μ_diff
Random: random sample of (n + context) -> so we can generalize to the pop.; random assignment (context) for experiments -> so we can infer causation.
10% condition: n < 0.10N (or in a sentence); not needed if assigned/experiment.
Normal/Large sample -> so the sampling dist. is approx. Normal:
  * population distribution is approx. Normal, OR
  * n ≥ 30 by CLT, OR
  * sample data shows no strong skewness or outliers.

DO: calculations (two options; Option 1 is recommended)
* Only BLACK & RED marks need to be shown.
* Be careful with the formula.
① Test statistic:
  t = (statistic - parameter) / standard error
  t = (x̄ - μ0) / (s_x / √n)   OR   t = (x̄_diff - μ_diff) / (s_diff / √n_diff)
  x̄ (or x̄_diff) = _, s_x (or s_diff) = _, n = _, df = n - 1
  P-value using tcdf(lower: _, upper: _, df: _)
  * For lower or upper, use lower: -1000 or upper: 1000.
  * If it is two-SIDED (≠), multiply the P-value by 2.
② Using T-Test (2: T-Test):
  t = _, P-value = _
  * If the p-value looks greater than α, check the end of the number to see if it has "E-#" (scientific notation), e.g. 1.24E-5 = 1.24 × 10^-5. Write the whole thing, including the "E-#".

CONCLUDE: interpretation
"Because the p-value of (P-value) (< / >) α = _, we..."
* If less than (<): "...reject H0. There is convincing evidence that μ: the true mean of (Ha in context + unit)."
* If greater than (>): "...fail to reject H0. There is not convincing evidence that μ: the true mean of (Ha in context + unit)."

[Figure: sketch of a t distribution compared with a Normal distribution N(μ, σ).]

FOUR-STEP PROCESS: TWO-SAMPLE t TEST FOR μ₁ - μ₂

* When testing a difference of means, write Ha in two ways:
  · in STATE use Ha: μ₁ - μ₂ > 0; in CONCLUDE use Ha: μ₁ > μ₂ in context.
  · in STATE use Ha: μ₁ - μ₂ < 0; in CONCLUDE use Ha: μ₁ < μ₂ in context.
  · in STATE use Ha: μ₁ - μ₂ ≠ 0; in CONCLUDE use Ha: μ₁ ≠ μ₂ in context.

STATE: hypotheses, α, parameter
We want to test
  H0: μ₁ - μ₂ = 0
  Ha: μ₁ - μ₂ (<, >, ≠) 0
using α = _, where μ₁ - μ₂ = true difference in the means of (1-2 context) (context).

PLAN: inference method & conditions:
Inference method: Two-sample t test for μ₁ - μ₂
Random: independent random samples (in context), or random assignment/experiment (in context).
10% condition: n₁ < 0.10N₁ and n₂ < 0.10N₂; not needed if samples are independent or assigned/experiment.
Normal/Large sample:
  * BOTH population distributions are approx. Normal, OR
  * n₁ ≥ 30 and n₂ ≥ 30 by CLT, OR
  * BOTH sample distributions show no strong skewness or outliers.

DO: calculations (two options; Option 1 is recommended)
* Only BLACK & RED marks need to be shown.
* Be careful with the formula.
① Test statistic:
  t = (statistic - parameter) / standard error
  t = ( (x̄₁ - x̄₂) - (μ₁ - μ₂) ) / √( s₁²/n₁ + s₂²/n₂ )
  x̄₁ = _, x̄₂ = _, s_x1 = _, s_x2 = _, n₁ = _, n₂ = _
  df = n - 1, where n here is the smaller sample size
  P-value using tcdf(lower: _, upper: _, df: _)
  * For lower or upper, use lower: -1000 or upper: 1000.
  * If it is two-SIDED (≠), multiply the P-value by 2.
② Using 2-SampTTest:
  t = (calc), df = (calc), P-value = (calc)

CONCLUDE: interpretation
"Because the p-value of (P-value) (< / >) α = _, we..."
* If less than (<): "...reject H0. There is convincing evidence that the true difference in the means (Ha in context)."
* If greater than (>): "...fail to reject H0. There is not convincing evidence that the true difference in the means (Ha in context)."
* Use μ₁ < μ₂ (or μ₁ > μ₂, μ₁ ≠ μ₂) as Ha in context.

WHICH TEST?
· one-sample: one sample is given, whether in DATA form or STATISTICS form (x̄, s_x).
· difference of means: two samples are given, whether in DATA form (find the mean & SD of each data set) or STATISTICS form (x̄₁, s_x1, x̄₂, s_x2).
· mean difference: one sample, data are paired; DATA form (two data sets are given and are paired: subtract first, then find x̄_diff & s_diff) or STATISTICS form (x̄_diff, s_diff).
· P-value (non-calculator): t = (statistic - parameter (null value)) / standard error

  One-sample t test for μ:          t = (x̄ - μ0) / (s_x / √n)
  Paired t test for μ_diff:         t = (x̄_diff - μ_diff) / (s_diff / √n_diff)
  Two-sample t test for μ₁ - μ₂:    t = ( (x̄₁ - x̄₂) - (μ₁ - μ₂) ) / √( s₁²/n₁ + s₂²/n₂ )
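The three test-statistic formulas can be compared side by side in a short sketch. The summary statistics are made-up numbers chosen so the arithmetic is easy to follow, and the function names are hypothetical. Only t and df are computed; the P-value would still come from tcdf or Table B.

```python
from math import sqrt

def one_sample_t(x_bar, s, n, mu0):
    """t = (x-bar - mu0) / (s / sqrt(n)), df = n - 1.
    Also works for paired data when given x-bar_diff, s_diff, n_diff."""
    t = (x_bar - mu0) / (s / sqrt(n))
    return t, n - 1

def two_sample_t(x1, s1, n1, x2, s2, n2):
    """t for H0: mu1 - mu2 = 0, with the conservative df (smaller n minus 1)."""
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    t = ((x1 - x2) - 0) / se
    return t, min(n1, n2) - 1

def alpha_from_cl(conf_level):
    """Two-sided significance level matching a confidence level."""
    return 1 - conf_level

t_one, df_one = one_sample_t(x_bar=52, s=8, n=16, mu0=50)
t_paired, df_paired = one_sample_t(x_bar=1.4, s=1.2, n=9, mu0=0)
t_two, df_two = two_sample_t(x1=10, s1=3, n1=9, x2=8, s2=4, n2=16)

print(round(t_one, 3), df_one)
print(round(t_paired, 3), df_paired)
print(round(t_two, 3), df_two)
print(round(alpha_from_cl(0.95), 2))
```

The paired case reuses the one-sample function because, once the differences are taken, the two procedures are the same.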
CHAPTER 12: INFERENCE FOR DISTRIBUTIONS & RELATIONSHIPS

* Do not always look at the question. Always identify the sample and variable to know which Chi-square test to use.

CHI-SQUARE GOODNESS OF FIT (GOF)

* The question might state... "Do the data give evidence about the distribution of..."
- perform the test for a distribution given one sample, one variable
  e.g. "M&M Bag", "color"
- expected counts: n · p_i, where n = sample size and p_i = individual probability
- the distribution is not approximately Normal, but it should be approximately Chi-square, which is right-skewed (non-negative values).
- Follow-up analysis: if the test is statistically significant, find the largest component (contribution) of χ² and explain.

STATE: hypotheses & significance level
H0: The claimed distribution of (context) is (true/correct).
Ha: The claimed distribution of (context) is not (true/correct).
* "claimed distribution" can be changed to the wording of the problem, like "equally likely," etc.
Using α = _

PLAN: inference method & conditions
Inference method: Chi-Square Test for Goodness of Fit
Conditions:
Random: randomly selected (n + context) -> so we can generalize to the population; or randomized experiment -> so we can infer causation.
10% cond.: n < 0.10N -> so sampling without replacement is OK (the same rule applies on this 10% cond. like the previous ones).
Large counts: all expected counts are ≥ 5 -> so the sampling dist. is approximately Chi-square.
* If it's possible to list your expected counts, then list them.

DO:
χ² = (O₁ - E₁)²/E₁ + ... + (O_i - E_i)²/E_i
χ² = _ (from calculator)
df = k - 1 = _, where k is your number of categories.
STAT -> TESTS -> D: χ²GOF-Test
  L1 -> observed
  L2 -> expected
P-value = _ (from calculator), or χ²cdf(lower: _, upper: _, df: _) = P-value

CONCLUDE:
* "Because the p-value of (P-value) < α = _, we reject H0. There is convincing evidence that [Ha in context]."
* "Because the p-value of (P-value) > α = _, we fail to reject H0. There is not convincing evidence that [Ha in context]."

CHI-SQUARE TEST FOR HOMOGENEITY

* The question might state... "Do we have convincing evidence of a difference..."
- perform the test for a distribution given two or more samples, one variable
  e.g. "2 Brands of Gummy Bears", "color"
- expected counts: (RT)(CT) / (TT)
  RT = row total
  CT = column total
  TT = table total

STATE: hypotheses & significance level
H0: There is no difference in the distribution of (categorical var) for population 1 and population 2.
Ha: There is a difference in the distribution of (categorical var) for population 1 and population 2.
Using α = _

PLAN: inference method & conditions
Inference method: Chi-Square Test for Homogeneity
Conditions:
Random: randomly selected (n + context) -> so we can generalize to the population; or randomized experiment -> so we can infer causation.
10% cond.: n < 0.10N -> so sampling without replacement is OK (the same rule applies on this 10% cond. like the previous ones).
Large counts: all expected counts are ≥ 5 -> so the sampling dist. is approximately Chi-square.
** Make a table of expected values using the MATRIX FUNCTION on your calculator.

DO:
MATRIX -> EDIT
  [A] - observed (do not include table totals)
  [B] - expected (automatic; do not edit)
STAT -> TESTS -> C: χ²-Test
χ² = (O₁ - E₁)²/E₁ + ... + (O_i - E_i)²/E_i
χ² = _ (from calculator)
df = (R - 1)(C - 1) = _, where R = number of rows and C = number of columns
P-value = _ (from calculator), or χ²cdf(lower: _, upper: _, df: _) = P-value

CONCLUDE:
* "Because the p-value of (P-value) < α = _, we reject H0. There is convincing evidence that [Ha in context]."
* "Because the p-value of (P-value) > α = _, we fail to reject H0. There is not convincing evidence that [Ha in context]."

CHI-SQUARE TEST FOR INDEPENDENCE

* The question might state... "Do we have convincing evidence of an association..."
- perform the test for a distribution given one sample, two variables
  e.g. "Seniors": "Taco Tongue", "Evil Eyebrow"
- expected counts: (RT)(CT) / (TT)

STATE: hypotheses & significance level
H0: There is not an association between (var 1) & (var 2) for (sample).
Ha: There is an association between (var 1) & (var 2) for (sample).
Using α = _

PLAN: inference method & conditions
Inference method: Chi-Square Test for Independence
Conditions:
Random: randomly selected (n + context) -> so we can generalize to the population; or randomized experiment -> so we can infer causation.
10% cond.: n < 0.10N -> so sampling without replacement is OK (the same rule applies on this 10% cond. like the previous ones).
Large counts: all expected counts are ≥ 5 -> so the sampling dist. is approximately Chi-square.
** Make a table of expected values using the MATRIX FUNCTION on your calculator.

DO:
MATRIX -> EDIT
  [A] - observed (do not include table totals)
  [B] - expected (automatic; do not edit)
STAT -> TESTS -> C: χ²-Test
χ² = (O₁ - E₁)²/E₁ + ... + (O_i - E_i)²/E_i
χ² = _ (from calculator)
df = (R - 1)(C - 1) = _, where R = number of rows and C = number of columns
P-value = _ (from calculator), or χ²cdf(lower: _, upper: _, df: _) = P-value

CONCLUDE:
* "Because the p-value of (P-value) < α = _, we reject H0. There is convincing evidence that [Ha in context]."
* "Because the p-value of (P-value) > α = _, we fail to reject H0. There is not convincing evidence that [Ha in context]."
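The expected-count and χ² formulas can be verified with a short sketch. The observed counts are invented and the function names are illustrative; the P-value step (χ²cdf) is left to the calculator because the Python standard library has no chi-square distribution.

```python
def gof_expected(n, probs):
    """Goodness of fit: expected count for each category is n * p_i."""
    return [n * p for p in probs]

def two_way_expected(observed):
    """Homogeneity / independence: expected count = (RT)(CT) / (TT)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[rt * ct / table_total for ct in col_totals] for rt in row_totals]

def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over flat lists of counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def flatten(table):
    return [cell for row in table for cell in row]

# Goodness of fit: 60 observations, three "equally likely" categories.
gof_obs = [10, 20, 30]
gof_exp = gof_expected(sum(gof_obs), [1 / 3, 1 / 3, 1 / 3])
gof_stat = chi_square(gof_obs, gof_exp)
gof_df = len(gof_obs) - 1  # k - 1

# Two-way table (do not include the totals in the observed table).
table_obs = [[20, 30], [30, 20]]
table_exp = two_way_expected(table_obs)
table_stat = chi_square(flatten(table_obs), flatten(table_exp))
table_df = (len(table_obs) - 1) * (len(table_obs[0]) - 1)  # (R - 1)(C - 1)
large_counts_ok = all(e >= 5 for e in flatten(table_exp))

print(gof_exp, round(gof_stat, 3), gof_df)
print(table_exp, round(table_stat, 3), table_df, large_counts_ok)
```

Printing each (O - E)²/E term separately would also show which category contributes most, which is the follow-up analysis mentioned under Goodness of Fit.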
SAMPLING DISTRIBUTION OF SLOPE (b)

SOME SYMBOLS:
                      Statistic    Parameter
  y-intercept         a            α
  slope               b            β
  SD of residuals     s            σ
  SD of slope         SE_b         σ_b

  Population: μ_y = α + βx
  Sample: ŷ = a + bx

SOME FORMULAS:
  SHAPE: approximately Normal
  CENTER: μ_b = β
  VARIABILITY: σ_b = σ / (σ_x √n)
  STANDARD ERROR: SE_b = s / (s_x √(n - 1))
  SD OF RESIDUALS: s = √( Σ(y_i - ŷ_i)² / (n - 2) )
  STANDARDIZED TEST: t = (b - β0) / SE_b
  CONFIDENCE: b ± t* · SE_b (the second term is the margin of error)
  DEGREES OF FREEDOM: df = n - 2
  * Interpretations of a, b, and s are in Chapter 3 of these notes.

Standard Error of b (SE_b): "The slope of the sample LSRL for (x-var context) and (y-var context) typically varies from the slope of the population LSRL by about (SE_b)."

COMPUTER OUTPUT:
  Predictor       Coef    SE Coef    T    P
  Constant        a       SE_a
  (x-variable)    b       SE_b       t    p (two-sided test)
  S = s (SD of residuals)    R-sq = r² (correlation r = ±√r²)    R-sq(adj)

L.I.N.E.R. CONDITIONS:
* LINEAR: the scatterplot needs to show a linear relationship. Also, the residual plot has no leftover curved pattern.
* INDEPENDENT: each observation is independent of each other. When sampling without replacement, check the 10% cond.
* NORMAL: a dotplot of residuals cannot show strong skew or outliers.
* EQUAL SD: the residual plot has roughly equal variability at each x-value.
* RANDOM: data came from a random sample or randomized experiment.

CONFIDENCE INTERVAL FOR SLOPE (β)

STATE: parameter & CL:
(C%) confidence interval for β = true slope of the population LSRL for [x-context] and [y-context].

PLAN: inference method & conditions:
Inference method: One-sample t interval for slope
Conditions (always in context): Linear, Independent, Normal, Equal SD, Random

DO: ① calculator or ② formulas
① Calculator: L1: x-values; L2: y-values
  STAT -> TESTS -> G: LinRegTInt
  b = _, t* = _, df = n - 2 = _
  = (A, B)
② Formulas (use this format if you were given a computer output):
  b ± t* · SE_b
  b = _, SE_b = _ (both from the computer output), df = n - 2 = _, t* = invT(area: _, df: _)
  * t* is not the T on the output.
  = (A, B)

CONCLUDE: interpretation
"We are C% confident that the interval from A to B captures β = the true slope of the LSRL for [x-context] and [y-context]."

SIGNIFICANCE TEST FOR SLOPE (β)

STATE: parameter & hypotheses:
We want to test
  H0: β = 0 (* no linear relationship)
  Ha: β < 0 (negative relationship), or β > 0 (positive relationship), or β ≠ 0 (some relationship)
  * this is also the wording for the conclude part.
where β = true slope of the population LSRL for [x-context] and [y-context], using α = _

PLAN: inference method & conditions:
Inference method: One-sample t test for slope
Conditions (always in context): Linear, Independent, Normal, Equal SD, Random

DO: ① calculator or ② formulas
① Calculator: L1: x-values; L2: y-values
  STAT -> TESTS -> F: LinRegTTest
  t = _, df = n - 2 = _, P-value = _
② Formulas (use this format if you were given a computer output):
  t = (b - β0) / SE_b, with b and SE_b from the output (this t is also the T on the output)
  df = n - 2
  P-value: tcdf(lower: _, upper: _, df: _)
  * For lower or upper, use lower: -1000 or upper: 1000.
  * If it is two-SIDED (≠), multiply the P-value by 2.
* If the p-value looks greater than α, check the end of the number to see if it has "E-#" (scientific notation). Write the whole thing, including the "E-#".

CONCLUDE: interpretation
"Because the p-value of (P-value) (< / >) α = _, we..."
* If less than (<): "...reject H0. There is convincing evidence of (Ha) between [x-context] and [y-context]."
* If greater than (>): "...fail to reject H0. There is not convincing evidence of (Ha) between [x-context] and [y-context]."
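The slope formulas can be traced on a tiny data set. The five (x, y) pairs below are invented, and `slope_inference` is an illustrative name; it returns b, s, SE_b, the t statistic for H0: β = β0, and df. It uses SE_b = s / √Σ(x - x̄)², which is algebraically the same as s / (s_x √(n - 1)).

```python
from math import sqrt
from statistics import mean, stdev

def slope_inference(xs, ys, beta0=0.0):
    """Least-squares slope b, SD of residuals s, SE_b, t statistic, and df = n - 2."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    se_b = s / sqrt(sxx)
    t = (b - beta0) / se_b
    return b, s, se_b, t, n - 2

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b, s, se_b, t, df = slope_inference(xs, ys)
# Same standard error written the way the notes do: s / (s_x * sqrt(n - 1))
se_b_notes = s / (stdev(xs) * sqrt(len(xs) - 1))

print(round(b, 3), round(s, 4), round(se_b, 4), round(t, 4), df)
print(round(se_b_notes, 4))
```

A confidence interval would then be b ± t* · SE_b, with t* looked up for df = n - 2.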

Congratulations!
You have finished the AP Statistics Course!
—Mr. Jeremiah James dela Rosa
Thank you to Stats Medic, Luke Wilcox, and Lindsey Gallas!
