AP Statistics Chapter Notes (1-12)
PLACEMENT
Statistics
Notes by:
Jeremiah James dela Rosa
Temecula, CA
Thank you, Stats Medic, Luke & Lindsey!
CHAPTER 1: DATA ANALYSIS
· frequency: a count. relative frequency: a percent or proportion.
· Two-Way Table (label the variables; include the Total row and column)
· marginal relative frequency: percent or proportion of individuals that have a specific value for one categorical variable.
· joint relative frequency: percent or proportion of individuals that have a specific value for one categorical variable and a specific value for another categorical variable.
· conditional relative frequency: percent or proportion of individuals that have a specific value for one categorical variable among those that share the same value of another categorical variable (the condition).

Graphs for a quantitative variable:
· DOTPLOT
· STEMPLOT (stem-and-leaf plot)
· HISTOGRAM (bars cover ranges of values)

Shape of a distribution:
· roughly symmetric, right-skewed, left-skewed, uniform
· single-peaked (unimodal), double-peaked (bimodal)
* If the distribution is skewed, use the median.
· Variability template: "The distribution of (context) has ..."

Graphs for a categorical variable:
· pie chart
· bar graph
· side-by-side bar graph
· segmented bar graph
· mosaic plot

When the question says "Compare the distributions."
* You will do the same thing with SOCV + context:
· Describe the SHAPE of both distributions.
· Identify any OUTLIERS for both distributions.
· Compare the CENTER (which is greater/lesser).
· Compare the VARIABILITY (which varies more).
* Always write in context of the problem.
Describing Quantitative Data with Numbers

Measures of Center:
· mean = average: $\bar{x} = \dfrac{\sum x_i}{n}$, where the $x_i$ are the x-values and $n$ is the number of data values
· median = middle value (odd number of data values) or the average of the two middle values (even number of data values)

Measures of Variability:
· range = maximum - minimum
· interquartile range (IQR): $IQR = Q_3 - Q_1$, where $Q_1$ (Quartile 1) is the 25% mark and $Q_3$ (Quartile 3) is the 75% mark (goes with the median)
· standard deviation (SD): $s_x = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$ (goes with the mean)

Let's Talk about OUTLIERS:
· The mean & SD are greatly affected by outliers (non-resistant).
· The median & IQR are not affected by outliers (resistant).
* If there are outliers or the distribution is skewed, use the median & IQR.

How to Find Outliers? The 1.5 x IQR Rule:
· low outlier < $Q_1 - (1.5 \times IQR)$
· high outlier > $Q_3 + (1.5 \times IQR)$

Interpreting SD:
"The (context) typically varies by about (SD + unit) from the mean of ($\bar{x}$ + unit)."

BOXPLOTS
· Five-number summary: min., Q1, med, Q3, max.
· What if the min. or max. is an outlier, what will be your min./max.? Remove your outliers and label them as separate points; the whisker then ends at the most extreme value that is not an outlier.

Calculator: enter the data in L1 and/or L2, then run 1: 1-Var Stats (or 2: 2-Var Stats). The output includes $\bar{x}$, Sx, minimum, Q1, median, Q3, and maximum.
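A minimal Python sketch of the five-number summary and the 1.5 x IQR rule described in this chapter. The function names and the sample data are illustrative assumptions, not part of the notes. The quartiles are computed as medians of the lower and upper halves, which is one common convention; other software may use a slightly different quartile rule.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                               # values below the median position
    upper = xs[half + 1:] if n % 2 else xs[half:]   # values above the median position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def outliers_1_5_iqr(data):
    """Return the values flagged by the 1.5 x IQR rule."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical data set with one unusually large value.
data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 30]
print(five_number_summary(data))
print(outliers_1_5_iqr(data))
```

Because the median and IQR are resistant, the flagged value changes the mean far more than it changes the median.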
CHAPTER 2: MODELING DISTRIBUTIONS OF QUANTITATIVE DATA
Describing Location in a Distribution

Two ways to describe location:
* percentiles: the pth percentile is the value with p% of observations less than or equal to it.
* standardized scores (z-scores): for an individual value in a distribution, the z-score tells us how many standard deviations from the mean the value falls, and in what direction.

$z = \dfrac{x - \mu}{\sigma}$, where $x$ = value, $\mu$ = mean, $\sigma$ = SD

Z-score Interpretation:
"(context) is (z-score) standard deviations (above or below) the mean of ($\mu$ + unit)."

Cumulative Relative Frequency Graph (OGIVE)
Example:

Age      Freq.   Rel. Freq.   Cum. Freq.   Cum. Rel. Freq.
40-45      2        4.4%          2             4.4%
45-50      7       15.6%          9            20.0%
50-55     13       28.9%         22            48.9%
55-60     12       26.7%         34            75.6%
60-65      7       15.6%         41            91.1%
65-70      3        6.7%         44            97.8%
70-75      1        2.2%         45           100.0%

[Ogive: cumulative relative frequency (%) on the vertical axis, age (40 to 75) on the horizontal axis]

· Q1 = 25th percentile; Median = 50th percentile; Q3 = 75th percentile
* An ogive allows you to examine location in a distribution: find the percentile for an individual value & vice-versa.
Transformation of Data (effects on SHAPE, CENTER, VARIABILITY, and LOCATION)

A density curve:
* is always on or above the horizontal axis.
* has exactly area 1 underneath it.
· The area under the curve in an interval = proportion of all observations that fall in that interval.
* For a uniform density curve, since the area under the curve should be equal to 1, the height is the reciprocal of the width: height = 1 / width.
· Mean of a density curve: the point at which the curve would balance if made of solid material.
· Median of a density curve: the equal-areas point.
· left-skewed: mean < med; roughly symmetric: mean = med; right-skewed: mean > med.

Normal Distributions
· Approximately Normal: roughly symmetric, single-peaked, bell-shaped; described by a density curve called a Normal curve.
· Any Normal distribution is completely specified by two parameters: mean ($\mu$) & SD ($\sigma$).
· 68-95-99.7 rule: ~68% of values are within 1 SD of the mean, ~95% within 2 SD, ~99.7% within 3 SD.

Finding the area under the curve (Probability)
* Always find the z-score first.
  normalcdf   lower: z   upper: 1000   $\mu$: 0   $\sigma$: 1
  (this gives the area greater than z; for the area less than z use lower: -1000, upper: z)

Finding a value from an area
  3: invNorm   area: area to the left   $\mu$: mean   $\sigma$: SD

Assessing Normality
· Count the data values within mean ± 1(SD), mean ± 2(SD), and mean ± 3(SD).
* If these percentages are close to 68-95-99.7, then the distribution is approximately Normal.
· Normal probability plot: if the scatterplot of ordered pairs (x, y) is roughly linear, the distribution is approximately Normal.
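The z-score and Normal-area calculations in this chapter can be sketched in Python with the standard library's `statistics.NormalDist`. This is only an illustration under assumed numbers (mean 100, SD 15); the helper names are hypothetical and play the role of the calculator's normalcdf and invNorm.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0, sigma=1)

def z_score(x, mu, sigma):
    """Standardized score: how many SDs the value x is from the mean."""
    return (x - mu) / sigma

def area_above(x, mu, sigma):
    """Area under the Normal curve to the right of x (like normalcdf with a large upper bound)."""
    return 1 - STD_NORMAL.cdf(z_score(x, mu, sigma))

def value_from_left_area(area, mu, sigma):
    """Value with the given area to its left (like invNorm)."""
    return NormalDist(mu, sigma).inv_cdf(area)

# Assumed example: a Normal distribution with mean 100 and SD 15.
print(z_score(130, 100, 15))
print(round(area_above(130, 100, 15), 4))
print(round(value_from_left_area(0.90, 100, 15), 2))

# Check of the 68-95-99.7 rule using the standard Normal curve.
for k in (1, 2, 3):
    print(k, round(STD_NORMAL.cdf(k) - STD_NORMAL.cdf(-k), 4))
```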
CHAPTER 3: EXPLORING TWO-VARIABLE QUANTITATIVE DATA
· Explanatory variable (input): helps predict or explain changes in a response variable.
· Response variable (predicted output): measures the outcome of a study.
· Scatterplot: graph of the explanatory variable (x) against the response variable (y).

Correlation (r)
- only a number. NO UNITS.
- does not imply causation.
- preferably, a correlation has a graph shown.
- scale: no correlation (r = 0), weak, moderate, strong, perfect correlation (r = ±1)

How to describe this scatterplot?
"The correlation of r = ___ confirms the (strength, direction) linear association between (explanatory variable) and (response variable) in context."

Least-Squares Regression Line (LSRL)
$\hat{y} = a + bx$
· $\hat{y}$ = predicted value of y for a given x; x = explanatory variable
· y-int (a): "When (x-context) is 0, the predicted (y-context) is (y-int)."
· $b = r \cdot \dfrac{s_y}{s_x}$
· $a = \bar{y} - b\bar{x}$

Residual in Calculator:
1.) Type your data in L1 and L2.
2.) Calculate the LSRL equation.
3.) Go back to the table. Highlight L3.
4.) Choose RESID.

· standard deviation of the residuals (s):
"The actual (y-context) is typically about (s + unit) away from the number predicted by the LSRL."
· coefficient of determination ($r^2$):
"About ($r^2$%) of the variability in (y-context) is accounted for by the LSRL using (x-context)."

RESIDUAL PLOT
· It identifies if a LINEAR MODEL is APPROPRIATE.
· We look for NO LEFTOVER CURVED PATTERN.
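The LSRL formulas in this chapter ($b = r \cdot s_y / s_x$ and $a = \bar{y} - b\bar{x}$) can be checked with a short Python sketch. The data set and the function name `lsrl` are assumptions made for illustration; the standard deviation of the residuals uses $n - 2$ in the denominator.

```python
from math import sqrt
from statistics import mean, stdev

def lsrl(xs, ys):
    """Return (a, b, r, residuals, s) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    # Correlation: average product of standardized scores (divide by n - 1).
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x              # slope
    a = y_bar - b * x_bar          # y-intercept
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]   # actual - predicted
    s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))      # SD of the residuals
    return a, b, r, residuals, s

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, r, residuals, s = lsrl(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x, r = {r:.3f}, r^2 = {r**2:.3f}, s = {s:.3f}")
```

Plotting `residuals` against `xs` gives the residual plot; a leftover curved pattern would suggest that a linear model is not appropriate.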
· ... the range of data from which the ...
· Influential points
· Power Model: Option 1: raise the values of the explanatory variable by an integer, p.
· ... response variable.
CHAPTER 4: COLLECTING DATA
SAMPLING
· Simple random sample (SRS): gives every possible sample of the desired size an equal chance to be chosen.
* Make sure to do SAMPLING WITHOUT REPLACEMENT.

How to choose an SRS:
· Slips of paper: write each label on identical slips of paper, mix well, and draw the desired sample size.
· Random number generator:
  Label: Label each individual with a different integer.
  Randomize: Generate the desired number of integers (ignore repeats, if necessary).
  Select: Choose the individuals who correspond to the integers.
· Table D:
  Label: Label each individual.
  Select: Choose the individuals that correspond to the selected labels.

TYPES OF SAMPLING
· convenience sampling: choosing the individuals who are easiest to reach.
· voluntary response sampling: individuals choose to be a part of the study.
· stratified random sampling: divide the population into strata. Strata are similar within (HOMOGENEOUS), but different between strata. Randomly select the desired sample size from each stratum, then combine these SRSs to form the sample.
· cluster sampling: divide the population into clusters of individuals located near each other. Randomly select some of these clusters; all the individuals in the chosen clusters are included in the sample.
· systematic random sampling: randomly select a value from 1 to k to identify the first individual, and choose every kth individual. * Be careful when there's a pattern in ...

SAMPLING WELL
· undercoverage: some members of the population are less likely to be chosen or cannot be chosen in a sample.
· nonresponse: occurs when an individual chosen for the sample can't be contacted or refuses to participate.

EXPERIMENTS
· Experiment: deliberately impose treatments (conditions) on individuals to measure their responses.
· Random Assignment: Use a chance process (slips of paper, RNG, Table D) to randomly assign experimental units to treatments. This helps create roughly equivalent groups before treatments are imposed.
· Control: keep other variables the same for all groups. Control helps avoid confounding.
· Replication: Impose each treatment on enough experimental units so that any difference in the effects can be distinguished.

Example (random assignment with a random number generator):
- Use a random number generator to produce 10 different random integers from 1 to 20 (random assignment).
- Select the first 10 different ...
- Select the remaining companies (Group 2) and assign them to the ...

RANDOMIZED BLOCK DESIGN
[Diagram: individuals (n) are split into Block 1 and Block 2; within each block, random assignment to Treatment 1 or Treatment 2; compare within each block; then combine and compare.]

Example (Freshmen and Sophomores; online vs. teacher-taught geometry):
[Diagram: Freshmen and Sophomores each go through a random number generator into online and in-class groups; compare scores within each block; combine results.]
- ... (random assignment), select these students and assign them to online (Block 1 + Treatment 1). The remaining students are assigned to teacher taught (Block 1 + Treatment 2).
- Assign each individual student a number from 1 to 400 for Sophomores. Use a random number generator again to obtain 200 random integers (random assignment), select these students and assign them to online (Block 2 + Treatment 1). The remaining students are assigned to teacher taught (Block 2 + Treatment 2).
- At the end of the course, let them take the same geometry final exam and compare the scores (compare).
- Once all students have taken the test, and the scores have been compared for each treatment for each block, then combine the results and compare (combine and compare).

MATCHED PAIRS DESIGN
A common experimental design for comparing two treatments that uses blocks of size 2. In some matched pairs designs, two very similar experimental units are paired and the two treatments are randomly assigned within each pair. In others, each experimental unit receives both treatments in a random order.

Example:
A track coach wants to know whether his long-distance runners are faster running the track clockwise or ... your method of pairing.
Description:
Have each long distance runner race 1 mile in each direction. Some runners are faster than others, so using each runner as his or her own "pair" accounts for variation ... the runner will race counterclockwise first and clockwise second. Allow adequate recovery time between the races. For each runner, record the 1-mile race times for each direction.

VOCABS
· Factors: an explanatory variable that is manipulated and may cause a change in the response variable.
· levels: different values of a factor.
* Factors x levels = combinations of treatments.
· treatment: a specific condition applied to the individuals in an experiment.
· confounding: occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other.
· control group: used to provide a baseline for comparing the effects of other treatments.
· control: keeping other variables constant for all experimental units.
· placebo: treatment that has no active ingredient, but is otherwise like other treatments.
· placebo effect: the fact that some subjects in an experiment will respond favorably to any treatment, even an inactive treatment.
· double-blind: neither the subjects nor those who interact with them and measure the response variable know which treatment a subject is receiving.
· block: a group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response.
· random assignment: experimental units are assigned to treatments using a chance process.
· Statistically significant: the observed results of a study are too unusual to be explained by chance alone.
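Random assignment within a block can be sketched in Python as follows. This is a hedged illustration, not the notes' own procedure: the function name, the treatment labels, and the seed are assumptions, and `random.sample` stands in for the calculator's random number generator. The block of 400 labels mirrors the Sophomores example.

```python
import random

def assign_within_block(labels, treatments=("online", "teacher-taught"), seed=None):
    """Randomly assign half of a block to the first treatment and the rest to the second."""
    rng = random.Random(seed)
    labels = list(labels)
    # Draw distinct labels (sampling without replacement), like generating
    # random integers and ignoring repeats.
    chosen = set(rng.sample(labels, len(labels) // 2))
    return {label: (treatments[0] if label in chosen else treatments[1])
            for label in labels}

# Block 2 from the example: Sophomores labeled 1 to 400, 200 assigned to each treatment.
sophomores = range(1, 401)
assignment = assign_within_block(sophomores, seed=1)
online = [label for label, t in assignment.items() if t == "online"]
print(len(online), len(assignment) - len(online))
```

Running the same function separately on each block, then comparing within blocks before combining, follows the block design described above.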
SAMPLING VARIABILITY & SAMPLE SIZE
· Larger random samples tend to produce estimates that are closer to the true population value than smaller random samples.
· margin of error: accounts for sampling variability; it creates an interval of plausible values.

CRITERIA FOR ESTABLISHING CAUSATION WHEN WE CAN'T DO AN EXPERIMENT:
1) The association is strong. The association between the explanatory variable and the response variable is strong.
2) Larger values of the explanatory variable are associated with stronger responses.
...

PROCESS OF IDENTIFYING THE PERCENTAGE (P-VALUES) IN THE AP EXAM
1.) Identify the difference in means.
...
4.) Calculate the percentage: how many dots are greater than or equal to the mean difference.
5.) Compare to the 5% rule and select YES or NO.
* If % < 5%, yes, it is statistically significant.
* If % > 5%, it is not statistically significant, and it may have happened by coincidence only.

DATA ETHICS
- All planned studies must be reviewed in advance.
- Individuals of the study must give their informed consent before data are collected.
- All individual data must be kept confidential.
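The "count the dots" process above can be imitated with a small simulation. The sketch below is an assumption-laden illustration: it re-randomizes the group labels many times (one common way such dotplots are produced) and reports the proportion of simulated differences in means that are greater than or equal to the observed difference. The data, function name, and number of trials are all hypothetical.

```python
import random
from statistics import mean

def simulated_p_value(group1, group2, trials=2000, seed=0):
    """Proportion of re-randomized differences in means >= the observed difference."""
    rng = random.Random(seed)
    observed = mean(group1) - mean(group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    count = 0
    for _ in range(trials):
        rng.shuffle(pooled)                          # one simulated random assignment
        diff = mean(pooled[:n1]) - mean(pooled[n1:])
        if diff >= observed:                         # one "dot" at or beyond the observed value
            count += 1
    return observed, count / trials

# Hypothetical responses for two treatment groups.
observed, p_value = simulated_p_value([10, 12, 14, 16], [1, 2, 3, 4])
print(observed, p_value)
print("statistically significant" if p_value < 0.05 else "not statistically significant")
```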
CHAPTER 5: PROBABILITY
Definitions:
· random process: generates outcomes that are determined purely by chance.
· probability: a number between 0 and 1; the probabilities of all possible outcomes must add to 1.
· law of large numbers: if we observe more and more trials of any random process, the proportion of times an outcome occurs approaches the true probability.
· mutually exclusive: the events cannot happen at the same time.
· simulation: imitates a random process so that simulated outcomes are consistent with real-world outcomes.
  simulation process:
  ① Describe how you will simulate one trial (one repetition).
  ② Perform many trials (repetitions).
· conditional probability: probability that one event happens given that another event is known to have happened.
· Independent Events: knowing whether one event has occurred does not change the probability that the other event will happen.
· OR = union ($\cup$); AND = intersection ($\cap$)

Formulas:
· $P(A) = \dfrac{\text{number of outcomes in event } A}{\text{total number of outcomes in sample space}}$
· Complement rule: $P(A^C) = 1 - P(A)$
· Addition Rule for mutually exclusive events: $P(A \cup B) = P(A) + P(B)$
· General Addition Rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
· Conditional Probabilities ("given that"): $P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}$
· Independent Events: $P(A) = P(A \mid B) = P(A \mid B^C)$
· General Multiplication Rule: $P(A \cap B) = P(A) \cdot P(B \mid A)$; for independent events, $P(A \cap B) = P(A) \cdot P(B)$
· At least one probability Rule: $P(\text{at least one}) = 1 - P(\text{none})$

[Two-way table and Venn diagram of events A and B, regions 1 to 4; tree diagram with branches such as $P(C \mid A)$ and $P(C \mid A^C)$]

HOW SMALL DOES THIS PROBABILITY HAVE TO BE SO THAT IT IS UNUSUAL? LESS THAN 5%

How to know if there is convincing evidence from the question?
① Identify the percentage or proportion from the problem.
...
④ IF:
* proportion of dots from #3 < 5%, it is statistically significant based on the question.
* proportion of dots from #3 > 5%, it is not statistically significant based on the question.
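A small Python check of the probability rules in this chapter, using exact fractions. The two-way table counts are made up for illustration, and the helper `prob` is a hypothetical name.

```python
from fractions import Fraction

# Hypothetical two-way table of counts for events A and B.
counts = {
    ("A", "B"): 10, ("A", "not B"): 30,
    ("not A", "B"): 20, ("not A", "not B"): 40,
}
total = sum(counts.values())

def prob(event):
    """P(event), where event is a predicate on a (row, column) cell."""
    return Fraction(sum(c for cell, c in counts.items() if event(cell)), total)

p_a = prob(lambda cell: cell[0] == "A")
p_b = prob(lambda cell: cell[1] == "B")
p_a_and_b = prob(lambda cell: cell == ("A", "B"))

p_not_a = 1 - p_a                      # complement rule
p_a_or_b = p_a + p_b - p_a_and_b       # general addition rule
p_b_given_a = p_a_and_b / p_a          # conditional probability
independent = (p_b_given_a == p_b)     # independence check

print(p_a, p_b, p_a_and_b, p_a_or_b, p_b_given_a, independent)
```

Here $P(B \mid A) \ne P(B)$, so these made-up events are not independent.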
CHAPTER 6: Random Variables & Probability Distributions
DEFINITIONS:
· random variable: numerical values of outcomes from a random process.
· Probability distribution: the possible values of a random variable and their probabilities.
· Discrete random variable: values with gaps between them.
· Continuous random variable: can take any value in an interval (e.g., height). * A Normal density curve is often used as the model.

FORMULAS:
Discrete Random Variables
* $P(X = k)$, where k is missing from the other probabilities: $P(X = k) = 1 - P(X \ne k)$
* mean (expected value) of a discrete random variable: $\mu_X = \sum x_i p_i$
  = average value over many, many trials of the same random process.
* standard deviation of a discrete random variable: $\sigma_X = \sqrt{\sum (x_i - \mu_X)^2 \, p_i}$
  = how a value typically varies from the mean after many trials of the random process.

Transforming: $Y = a + bX$
· mean: $\mu_Y = a + b\mu_X$
· s.d.: $\sigma_Y = |b| \, \sigma_X$

Combining (X and Y independent for the variance and s.d. formulas)
· Sum $S = X + Y$: mean $\mu_S = \mu_X + \mu_Y$
· Difference $D = X - Y$: mean $\mu_D = \mu_X - \mu_Y$
· variance: $\sigma_S^2 = \sigma_D^2 = \sigma_X^2 + \sigma_Y^2$
· s.d.: $\sigma_S = \sigma_D = \sqrt{\sigma_X^2 + \sigma_Y^2}$
· Linear combination $aX + bY$: mean $a\mu_X + b\mu_Y$; s.d. $\sqrt{a^2\sigma_X^2 + b^2\sigma_Y^2}$
* When combining Normal random variables:
  ① Find the mean and SD of the sum or difference.
  ② Use normalcdf.

Binomial random variable: the number of trials of the same random process in which a particular outcome ("success") occurs.
- Binary? "success" or "fail"
- Independent? Knowing the outcome of one trial does not tell us anything about the outcome of other trials.
- Number of trials fixed?
- Same probability of success on each trial?
· A binomial distribution is specified by n & p.
· $P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$
· $\mu_X = np$, $\sigma_X = \sqrt{np(1 - p)}$
· Calculator:
  P(X = x) -> binompdf(trials: n, p: p, x-value: x)
  P(X ≤ x) -> binomcdf(trials: n, p: p, x-value: x)
  P(X ≥ x) -> 1 - binomcdf(trials: n, p: p, x-value: x - 1)
· 10% condition: when sampling without replacement in a binomial setting, the trials are not independent; if n < 0.10N, we can use the 10% condition to proceed with the calculations.
· Large Counts: if np ≥ 10 and n(1 - p) ≥ 10, the distribution of X is approximately Normal.

Geometric random variable: the number of trials it takes to get a success (we record the number of trials until we get a success).
· $P(X = x) = (1 - p)^{x - 1} p$
· Calculator:
  P(X = x) -> geometpdf(p: p, x-value: x)
  P(X ≤ x) -> geometcdf(p: p, x-value: x)
· mean: $\mu_X = \dfrac{1}{p}$
* shape of the distribution: always right-skewed

Interpretations:
· Probability: "There is a (probability as a percent) chance/probability of (context)."
· Mean: "If many, many (units) were randomly selected, the average (context) is ($\mu$ + unit)."
· SD: "(context) typically varies by about ($\sigma$ + unit) from the mean of ($\mu$ + unit)."
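The binomial and geometric formulas above can be written directly in Python. The function names below are chosen to mirror the calculator commands named in these notes, but they are plain Python helpers written for this sketch, not a real calculator or library API; the numbers are assumed examples.

```python
from math import comb, sqrt

def binompdf(n, p, x):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomcdf(n, p, x):
    """P(X <= x) for a binomial random variable."""
    return sum(binompdf(n, p, k) for k in range(x + 1))

def geometpdf(p, x):
    """P(X = x): first success occurs on trial x."""
    return (1 - p) ** (x - 1) * p

def geometcdf(p, x):
    """P(X <= x): first success occurs on or before trial x."""
    return 1 - (1 - p) ** x

# Assumed example: n = 10 trials with p = 0.5.
n, p = 10, 0.5
print(binompdf(n, p, 5))                    # P(X = 5)
print(1 - binomcdf(n, p, 4))                # P(X >= 5) = 1 - P(X <= 4)
print(n * p, sqrt(n * p * (1 - p)))         # mean and SD of the binomial
print(geometpdf(0.2, 3), geometcdf(0.2, 3), 1 / 0.2)   # geometric with p = 0.2, and its mean
```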
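CHAPTER 7: SAMPLING DISTRIBUTIONS
DEFINITIONS:
· Parameter: number that describes a population ($p$, $\mu$, $\sigma$).
· Statistic: number that describes a sample ($\hat{p}$, $\bar{x}$, $s_x$).
· Sampling Distribution: distribution of the values of a statistic in ALL POSSIBLE samples of the same size.
· Central Limit Theorem (CLT): when n is large (n ≥ 30), the sampling distribution of $\bar{x}$ is approx. Normal.

CLAIMS:
① Assume that the claim is true.
...
IF: * proportion of dots < 5%, there is convincing evidence against the claim based on the question.

SD Interpretation:
"The sample (proportion or mean) of (subject in context) typically varies by about (SD + unit) from the true (proportion or mean) of ($p$ or $\mu$ + unit)."
For two statistics: "... typically varies by about (SD + unit) from the true difference of ($p_1 - p_2$ or $\mu_1 - \mu_2$ + unit)."

FORMULAS:
Given one statistic:
PROPORTION (proportions / fractions / percents)
· Shape: approx. Normal if the Large Counts condition (LCC) is met: $np \ge 10$ and $n(1 - p) \ge 10$
· Center: $\mu_{\hat{p}} = p$
· Variability: $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}}$ (10% condition: $n < 0.10N$ when sampling without replacement)
· $z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}}$

MEAN (averages / means)
· Shape: Normal if the population distribution is Normal; approx. Normal by the Central Limit Theorem (CLT) if $n \ge 30$
· Center: $\mu_{\bar{x}} = \mu$
· Variability: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$ (10% condition: $n < 0.10N$)
· $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

Given two statistics:
DIFFERENCE IN PROPORTIONS
· Shape: approx. Normal if Large Counts: $n_1 p_1 \ge 10$, $n_1(1 - p_1) \ge 10$, $n_2 p_2 \ge 10$, $n_2(1 - p_2) \ge 10$
· Center: $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$
· Variability: $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}$
· $z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sigma_{\hat{p}_1 - \hat{p}_2}}$

DIFFERENCE IN MEANS
· Shape: Normal if BOTH population distributions are Normal; approx. Normal by CLT if $n_1 \ge 30$ and $n_2 \ge 30$
· Center: $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$
· Variability: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
· $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}}$

* Values of $\hat{p}$ and $\bar{x}$ are usually found in the question before the words "less than" or "greater than."
* For two statistics, rewrite the comparison as a difference: $\bar{x}_1 > \bar{x}_2$ means $\bar{x}_1 - \bar{x}_2 > 0$; $\bar{x}_1 < \bar{x}_2$ means $\bar{x}_1 - \bar{x}_2 < 0$.
* Use normalcdf(lower, upper, $\mu$: 0, $\sigma$: 1) to find the probability that the statistic is less than OR greater than a value.

About the 10% condition:
* When sampling without replacement, the trials are not independent, meaning that when an event occurs, the probability of that event changes already. For example, in a standard deck of cards, when I ...
* There's a "wording" of the question that will help us identify if we do not need to do the 10% condition. When you see the statement "(subjects) LIKE THESE," then the 10% condition doesn't need to be shown since we're only going to generalize to the given sample, not the population.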
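A brief Python sketch of the sampling distribution of $\hat{p}$: it checks the conditions, computes the center and SD, and finds a probability with the Normal model. The inputs (p = 0.5, n = 100) and the function names are assumptions for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def phat_sampling_distribution(p, n, population_size=None):
    """Return (mean, sd, large_counts_ok, ten_percent_ok) for the sampling distribution of p-hat."""
    mean = p
    sd = sqrt(p * (1 - p) / n)
    large_counts_ok = n * p >= 10 and n * (1 - p) >= 10
    # If no population size is supplied, the 10% condition is not checked.
    ten_percent_ok = population_size is None or n < 0.10 * population_size
    return mean, sd, large_counts_ok, ten_percent_ok

def prob_phat_greater_than(value, p, n):
    """P(p-hat > value) using the Normal approximation."""
    mean, sd, _, _ = phat_sampling_distribution(p, n)
    z = (value - mean) / sd
    return 1 - NormalDist().cdf(z)

mean, sd, large_counts_ok, ten_percent_ok = phat_sampling_distribution(0.5, 100, population_size=5000)
print(mean, sd, large_counts_ok, ten_percent_ok)
print(round(prob_phat_greater_than(0.6, 0.5, 100), 4))
```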
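CHAPTER 8: ESTIMATING PROPORTIONS WITH CONFIDENCE
Legend (color-coded in the original notes): replace based on the problem; show calculations; name / title; reminder / notes; so what?

DEFINITIONS:
· Point estimate: a reasonable estimate of the parameter ($p$, $\mu$, $\sigma$).
· Confidence Interval: gives an interval of plausible (believable) values of an unknown population parameter ($p$, $\mu$, $\sigma$) based on sample data.
· Confidence level (C%): describes the method that produces the interval. If we take many, many samples like these, about C% of them will produce intervals that capture the [parameter].
· margin of error: only accounts for sampling variability, not other sources of error.

ONE-SAMPLE z INTERVAL FOR p (FOUR-STEP PROCESS)
STATE: parameter & CL:
(C%) confidence interval for $p$ = true proportion of [context].
PLAN: Inference method & conditions:
· Method: One-sample z interval for $p$
· Random: random sample of (n + context) * so we can generalize to the population.
· 10% condition: $n < 0.10N$ * so sampling without replacement is ok & we can solve for SD.
  * If the problem says "... for samples like these," do not show the 10% condition.
· Large Counts: $n\hat{p} \ge 10$ and $n(1 - \hat{p}) \ge 10$ * so the sampling distribution of $\hat{p}$ is approx. Normal.
DO: point estimate ± margin of error
· $\hat{p} \pm z^* \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
· 90%: $z^* = 1.645$
· Using 1-PropZInt: STAT -> TESTS -> A: 1-PropZInt; x: $n\hat{p}$; n: sample size; C-level: C. The calculator gives (A, B).
CONCLUDE: Interpretation
"We are C% confident that the interval from A to B captures the $p$ = true proportion of [parameter in context]."

Some formulas & reminders:
Given the interval (A, B), where A is the lower value and B is the upper value:
* point estimate $= \dfrac{A + B}{2}$
* margin of error $= \dfrac{B - A}{2}$
* sample size: solve $z^* \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}} \le ME$ for n (when $\hat{p}$ is unknown, use $\hat{p} = 0.5$; if it has decimals, round up).
* Standard error of $\hat{p}$: $SE_{\hat{p}} = \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
* higher confidence level: larger ME (wider interval); larger sample size: smaller ME (narrower interval).

TWO-SAMPLE z INTERVAL FOR $p_1 - p_2$
STATE: parameter & CL:
(C%) confidence interval for $p_1 - p_2$ = true difference (1 - 2 context) in the proportions of [context].
PLAN:
· Method: Two-sample z interval for $p_1 - p_2$
· Random: independent random sample of ($n_1$ + context) and independent random sample of ($n_2$ + context) * so we can generalize to the population.
· 10% condition: $n_1 < 0.10N_1$ and $n_2 < 0.10N_2$ * so sampling without replacement is ok & we can solve for SD.
· Large Counts: $n_1\hat{p}_1$, $n_1(1 - \hat{p}_1)$, $n_2\hat{p}_2$, $n_2(1 - \hat{p}_2)$ are all ≥ 10 * so the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is approx. Normal.
DO: point estimate ± margin of error
· $(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
· On the calculator (2-PropZInt): x1: $n_1\hat{p}_1$; n1; x2: $n_2\hat{p}_2$; n2; C-level: C. The calculator gives (A, B).
CONCLUDE:
"We are C% confident that the interval from A to B captures the $p_1 - p_2$ = true difference (1 - 2 context) in the proportions of [context]."

Do we have convincing evidence (in context)?
· (+, +) -> Yes, the 1st proportion is greater.
· (-, -) -> Yes, the 2nd proportion is greater.
· (-, +) -> No convincing evidence of a difference b/c the interval contains 0.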
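The one-sample z interval and the sample-size calculation can be sketched in Python. This is an illustration under assumed inputs (50 successes in 100 trials, 95% confidence); the function names are hypothetical, and `NormalDist().inv_cdf` is used to get $z^*$ in place of a table.

```python
from math import sqrt, ceil
from statistics import NormalDist

def z_star(conf_level):
    """Critical value z* that leaves (1 - C)/2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - conf_level) / 2)

def one_prop_z_interval(x, n, conf_level):
    """Return (lower, upper) for p-hat +/- z* * sqrt(p-hat(1 - p-hat)/n)."""
    p_hat = x / n
    margin = z_star(conf_level) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def sample_size_for_margin(margin, conf_level, p_guess=0.5):
    """Smallest n with margin of error <= margin (uses p-hat = 0.5 when unknown; rounds up)."""
    return ceil(z_star(conf_level) ** 2 * p_guess * (1 - p_guess) / margin ** 2)

lower, upper = one_prop_z_interval(x=50, n=100, conf_level=0.95)
print(round(lower, 3), round(upper, 3))
print("point estimate:", (lower + upper) / 2, "margin of error:", (upper - lower) / 2)
print(round(z_star(0.90), 3))
print(sample_size_for_margin(0.03, 0.95))
```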
CHAPTER 9: TESTING CLAIMS ABOUT PROPORTIONS
When the question asks for a test, do this FOUR-STEP PROCESS.
Legend (color-coded in the original notes): replace based on the problem; show calculations; name / title; reminder / notes; so what?

DEFINITIONS:
· significance test: procedure for using observed data to decide between two competing claims (hypotheses).
· null hypothesis ($H_0$): the claim we seek evidence against.
· alternative hypothesis ($H_a$): the claim we seek evidence for.
  (one-sided) $H_a$: parameter < or > null value; (two-sided) $H_a$: parameter ≠ null value
· P-value (probability): probability of getting evidence for $H_a$ that is as strong as or stronger than the observed evidence when $H_0$ is true.
· significance level ($\alpha$): the value we compare the P-value to.
  * P-value < $\alpha$: REJECT $H_0$. There is convincing evidence for $H_a$ (in context).
  * P-value > $\alpha$: FAIL TO REJECT $H_0$. There is not convincing evidence for $H_a$ (in context).
· Type I Error: we reject $H_0$ when $H_0$ is true; the data give convincing evidence for $H_a$ when $H_0$ is true.
· Type II Error: we fail to reject $H_0$ when $H_a$ is true; the data give not convincing evidence for $H_a$ when $H_a$ is true.
· Power of a Test: probability that the test will find convincing evidence for $H_a$ when a specific alternative value of the parameter is true.
· P(Type I Error) = $\alpha$

                       TRUTH
DECISION               H0 True          Ha True
Reject H0              Type I Error     Correct (Power)
Fail to reject H0      Correct          Type II Error

ONE-SAMPLE z TEST FOR p
STATE: hypotheses, $\alpha$, parameter
We want to test $H_0: p = p_0$ versus $H_a: p \; (<, >, \ne) \; p_0$ using $\alpha$ = ___, where $p$ = true proportion of ALL (context).
* Do not forget the word "ALL."
PLAN: Inference method & conditions:
· Method: One-sample z test for $p$
· Random: random sample of (n + context) * so we can generalize to the population.
· 10% condition: $n < 0.10N$ (or in a sentence) * so sampling without replacement is ok & we can solve for SD.
  * If the problem says "... for samples like these," do not show the 10% condition.
· Large Counts: $np_0 \ge 10$ and $n(1 - p_0) \ge 10$
DO: calculations (two options; Option 1 is recommended)
* Be careful with the formula: only the BLACK & RED marks need to be shown.
· Option 1: $z = \dfrac{\text{statistic} - \text{parameter}}{\text{SD}} = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$; P-value = normalcdf(lower, upper, $\mu$: 0, $\sigma$: 1)
· Option 2: Using 1-PropZTest: STAT -> TESTS -> 5: 1-PropZTest; $p_0$; x: $n\hat{p}$; n; prop $\ne p_0$ (two-sided) or $< p_0$ / $> p_0$ (one-sided)
CONCLUDE: Interpretation
Because the p-value of (P-value) (< / >) $\alpha$ = ___, we ...
* If less than (<): reject $H_0$. There is convincing evidence that the $p$: true proportion of ($H_a$ in context).
* If greater than (>): fail to reject $H_0$. There is not convincing evidence that the $p$: true proportion of ($H_a$ in context).

TWO-SAMPLE z TEST FOR $p_1 - p_2$
STATE: hypotheses, $\alpha$, parameter
We want to test $H_0: p_1 - p_2 = 0$ versus $H_a: p_1 - p_2 \; (<, >, \ne) \; 0$ using $\alpha$ = ___, where $p_1 - p_2$ = true difference (1 - 2 context) in the proportion of ALL (context).
* Do not forget the word "ALL."
PLAN: Inference method & conditions:
· Method: Two-sample z test for $p_1 - p_2$
· Random: independent random sample of ($n_1$ + context) and independent random sample of ($n_2$ + context) * so we can generalize to the population; OR random assignment of ($n_1$ + context) and ($n_2$ + context) * so we can show causation.
· 10% condition: $n_1 < 0.10N_1$ and $n_2 < 0.10N_2$ (not needed for random assignment)
· Large Counts: $n_1\hat{p}_C$, $n_1(1 - \hat{p}_C)$, $n_2\hat{p}_C$, $n_2(1 - \hat{p}_C)$ are all ≥ 10 * so the sampling distribution is approx. Normal.
  * Do not use the individual $\hat{p}_1$ & $\hat{p}_2$ here!!!
DO: calculations (two options; Option 1 is recommended)
· $\hat{p}_C = \dfrac{x_1 + x_2}{n_1 + n_2}$
· Option 1: $z = \dfrac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\dfrac{\hat{p}_C(1 - \hat{p}_C)}{n_1} + \dfrac{\hat{p}_C(1 - \hat{p}_C)}{n_2}}}$; P-value = normalcdf(lower, upper, $\mu$: 0, $\sigma$: 1)
· Option 2: STAT -> TESTS -> 6: 2-PropZTest; x1: $n_1\hat{p}_1$; n1; x2: $n_2\hat{p}_2$; n2; $p_1 \ne p_2$ (two-sided) or $< p_2$ / $> p_2$ (one-sided)
CONCLUDE: Interpretation
Because the p-value of (P-value) (< / >) $\alpha$ = ___, we ...
* If less than (<): reject $H_0$. There is convincing evidence that the $p_1 - p_2$ = true difference of ... ($H_a$ in context).
* If greater than (>): fail to reject $H_0$. There is not convincing evidence that the $p_1 - p_2$ = true difference of ... ($H_a$ in context).

Some interpretations:
· Type I Error: "The ($H_0$ context) is true, but we find convincing evidence for $H_a$ (context)."
· Type II Error: "The ($H_a$ context) is true, but we do not find convincing evidence for $H_a$ (context)."
· Power: P(Reject $H_0$ | $H_a$ is true): "If $H_a$ is true (at a specific value in context), there is a (power) probability of finding convincing evidence to reject the null (context)."
· P-value: "Assuming $H_0$ is true (context), there is a (P-value) probability of getting (the observed statistic or more extreme) by chance alone."
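The DO step of the one-sample z test can be sketched in Python as below. The function name, the way the alternative is encoded ("<", ">", "!="), and the example numbers are assumptions for illustration; the Normal area is taken from `statistics.NormalDist` rather than a calculator.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(x, n, p0, alternative):
    """Return (z, p_value) for H0: p = p0 against Ha: p <, >, or != p0."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # (statistic - parameter) / SD, using p0
    std_normal = NormalDist()
    if alternative == "<":
        p_value = std_normal.cdf(z)
    elif alternative == ">":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "!=":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))   # two-sided: double the tail area
    else:
        raise ValueError("alternative must be '<', '>' or '!='")
    return z, p_value

# Assumed example: 60 successes in 100 trials, H0: p = 0.5, Ha: p > 0.5, alpha = 0.05.
z, p_value = one_prop_z_test(60, 100, 0.5, ">")
alpha = 0.05
print(round(z, 2), round(p_value, 4))
print("reject H0" if p_value < alpha else "fail to reject H0")
```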
CHAPTER 10: ESTIMATING MEANS WITH CONFIDENCE
· t distribution: for $z^*$ to be used, we need the population SD ($\sigma$), but since we do not have that, and are given the sample SD ($s_x$), the distribution will vary more, so we use $t^*$.
· degrees of freedom: since we are using $t^*$ and our sampling distribution will vary more, we need degrees of freedom: $df = n - 1$.
  Think about this ... if I had five gummy bears, and I have five students, I'll let the first student choose; they have 5 choices. The next student will have 4 choices, until it goes down to only one.
· Standard error ($SE_{\bar{x}}$): this is the "standard deviation" of the sampling distribution for a mean, but since we do not know the population standard deviation ($\sigma_x$), we replace it with $s_x$: $SE_{\bar{x}} = \dfrac{s_x}{\sqrt{n}}$
· Paired data: result from recording two values of the same quantitative variable ...
· Mean difference ($\mu_{diff}$) is based on the paired data means ($\bar{x}_{diff}$); this is different from the difference of means ($\bar{x}_1 - \bar{x}_2$).
[Dotplot of differences (diff) for paired data, with 0 marked as "no difference"]

Margin of error:
* $ME = t^* \cdot \dfrac{s_x}{\sqrt{n}}$
* ME is proportional to $\dfrac{1}{\sqrt{n}}$, so $n \times 4$ gives $ME \times \dfrac{1}{2}$, assuming everything else remains the same.
* ME is larger for a higher CL.
* ME doesn't account for bias, only sampling variability.
* Sample size: if you do not know n first, you cannot find $t^*$, so use $z^*$ in place of the $t^*$ from the formula.

When the question asks for a C% confidence interval, do this FOUR-STEP PROCESS.

ONE-SAMPLE t INTERVAL FOR $\mu$
STATE: parameter & CL:
(C%) confidence interval for $\mu$ = true mean of [context].
PLAN: Inference method & conditions:
· Method: One-sample t interval for $\mu$
· Random: random sample of (n + context) * so we can generalize to the population; or random assignment / experiment * so we can infer causation.
· 10% condition: $n < 0.10N$ (or in a sentence) * so sampling without replacement is ok & we can solve for SD.
  * If the problem says "... for samples like these," do not show the 10% condition.
· Normal/Large Sample: the population distribution is approx. Normal, or $n \ge 30$ by CLT, or a graph of the sample data shows no strong skew or outliers.
DO: point estimate ± margin of error
· $\bar{x} \pm t^* \cdot \dfrac{s_x}{\sqrt{n}}$, that is, (statistic) ± ($t^*$)($SE_{\bar{x}}$), with $df = n - 1$
· $\bar{x}$, $s_x$, and n are all from the calculator or the problem.
· For $t^*$ you can use invT on the calculator or Table B. In Table B, if the df is not on the list, use the next smaller df.
· Interval using the calculator: 8: TInterval. The calculator gives (A, B).
CONCLUDE: Interpretation
"We are C% confident that the interval from A + unit to B + unit captures the $\mu$ = true mean of [parameter in context]."

TWO-SAMPLE t INTERVAL FOR $\mu_1 - \mu_2$
STATE: (C%) confidence interval for $\mu_1 - \mu_2$ = true difference (1 - 2 context) in the means of [context].
PLAN: Inference method & conditions:
· Method: Two-sample t interval for $\mu_1 - \mu_2$
· Random: independent random sample of ($n_1$ + context) and independent random sample of ($n_2$ + context).
· 10% condition: $n_1 < 0.10N_1$ and $n_2 < 0.10N_2$
· Normal/Large Sample: both population distributions are approx. Normal, or $n_1 \ge 30$ and $n_2 \ge 30$, or the graphs show no strong skew or outliers.
DO: point estimate ± margin of error
· $(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
· df: use $df = n - 1$ of the smaller sample size between the two given (Table B), or the df from 2-SampTInt (0: 2-SampTInt).
CONCLUDE: Interpretation
"We are (C%) confident that the interval from A + unit to B + unit captures the $\mu_1 - \mu_2$ = true difference (1 - 2 context) in the means of [parameter in context]."

PAIRED t INTERVAL FOR $\mu_{diff}$
* All are the same process as a one-sample t interval for $\mu$, except for the following:
* Use $\bar{x}_{diff}$ instead of $\bar{x}$ only.
* Use $\mu_{diff}$ instead of $\mu$ only.
* Use $s_{diff}$ instead of $s_x$ only.
* parameter: $\mu_{diff}$ = the true mean difference of [context].
* The inference method is called: paired t interval or one-sample t interval for $\mu_{diff}$.
* In the graphing, graph the differences, not the individual data of each set.

Convincing evidence?
· (+, +) -> Yes, convincing evidence.
· (-, -) -> Yes, convincing evidence.
· (-, +) -> No convincing evidence of a difference b/c the interval contains 0.
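A hedged Python sketch of the DO step for t intervals. Python's standard library has no t distribution, so this sketch assumes the critical value $t^*$ is looked up separately (Table B or invT) and passed in; the function names and the numbers are illustrative assumptions.

```python
from math import sqrt

def one_sample_t_interval(x_bar, s_x, n, t_star):
    """x-bar +/- t* * s_x / sqrt(n); t_star must come from Table B or invT with df = n - 1."""
    margin = t_star * s_x / sqrt(n)
    return x_bar - margin, x_bar + margin

def two_sample_t_interval(x1, s1, n1, x2, s2, n2, t_star):
    """(x1 - x2) +/- t* * sqrt(s1^2/n1 + s2^2/n2); conservative df = smaller n - 1."""
    margin = t_star * sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin

# Assumed example: n = 10, x-bar = 50, s_x = 5, 95% confidence, df = 9, t* = 2.262 (Table B).
lower, upper = one_sample_t_interval(50, 5, 10, t_star=2.262)
print(round(lower, 2), round(upper, 2))

# For paired data, apply one_sample_t_interval to the differences (x-bar_diff, s_diff, n pairs).
```

Quadrupling n in `one_sample_t_interval` (with the same `t_star`) halves the margin of error, matching the reminder above.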
CHAPTER 11: TESTING CLAIMS ABOUT MEANS
Legend (color-coded in the original notes): replace based on the problem; show calculations; name / title; reminder / notes.

A few things to remember about the 10% condition:
* note: always read the problem carefully to see if it will apply!
Apply the 10% condition when:
- sampling without replacement
- not independent events (mostly proportions or two prop.)
- random samples
Do not apply the 10% condition when:
- "samples like these"
- independent events
- random assignment / experiments

reminders:
· connection between CI & ST:
  · If the interval contains the null value as a plausible value, we fail to reject $H_0$; there is not convincing evidence for $H_a$.
  · If the interval does NOT contain the null value as a plausible value, we reject $H_0$; there is convincing evidence that ($H_a$ in context).
  · A C% CI will make the same decision as a two-SIDED ST with $\alpha = 1 - CL$:
    CL: 90% -> $\alpha$ = 0.10
    CL: 99% -> $\alpha$ = 0.01

ONE-SAMPLE t TEST FOR $\mu$ (OR PAIRED t TEST FOR $\mu_{diff}$)
STATE: hypotheses, $\alpha$, parameter
We want to test $H_0: \mu = \mu_0$ versus $H_a: \mu \; (<, >, \ne) \; \mu_0$ + unit using $\alpha$ = ___, where $\mu$ = true mean of ALL (context), OR $\mu_{diff}$ = true mean difference of ALL (context).
* Do not forget the word "ALL."
PLAN: Inference method & conditions:
· Random: random sample -> so we can generalize to the population; or experiment -> so we can infer causation.
· 10% condition: $n < 0.10N$; not needed for experiments.
· Normal/Large sample: the population distribution is approx. Normal, or $n \ge 30$ by CLT, so the sampling distribution is approx. Normal.
DO: calculations (two options; Option 1 is recommended)
* Be careful with the formula: only the BLACK & RED marks need to be shown.
· Test statistic: $t = \dfrac{\text{statistic} - \text{parameter}}{\text{standard error}} = \dfrac{\bar{x} - \mu_0}{s_x / \sqrt{n}}$ OR $t = \dfrac{\bar{x}_{diff} - \mu_{diff}}{s_{diff} / \sqrt{n_{diff}}}$, with $df = n - 1$
· Finding the P-value with the calculator: tcdf(lower, upper, df), with lower: t (or -1000) and upper: 1000 (or t).
· Finding the P-value using Table B: you should get an interval, $x_1$ < P-value < $x_2$.
· If it is two-SIDED (≠), multiply the P-value by 2. A two-sided test needs the same work as a one-sided test.
· If the p-value has "E-#" in it (e.g., 1.24E-5 = $1.24 \times 10^{-5}$ < 0.00001), write the whole thing; it's still below $\alpha$.
CONCLUDE: Interpretation
Because the p-value of (P-value) (< / >) $\alpha$ = ___, we ...
* If less than (<): reject $H_0$. There is convincing evidence that the $\mu$: true mean of ($H_a$ in context + unit).
* If greater than (>): fail to reject $H_0$. There is not convincing evidence that the $\mu$: true mean of ($H_a$ in context + unit).

TWO-SAMPLE t TEST FOR $\mu_1 - \mu_2$
STATE: hypotheses, $\alpha$, parameter
We want to test $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 \; (<, >, \ne) \; 0$ using $\alpha$ = ___, where $\mu_1 - \mu_2$ = true difference in the means of (1 - 2 context) (context).
* In STATE use $H_a: \mu_1 - \mu_2 > 0$; in CONCLUDE use $H_a: \mu_1 > \mu_2$ in context.
* In STATE use $H_a: \mu_1 - \mu_2 < 0$; in CONCLUDE use $H_a: \mu_1 < \mu_2$ in context.
* In STATE use $H_a: \mu_1 - \mu_2 \ne 0$; in CONCLUDE use $H_a: \mu_1 \ne \mu_2$ in context.
PLAN: Inference method & conditions:
· Method: Two-sample t test for $\mu_1 - \mu_2$
· Random: independent random samples of ($n_1$ in context) and ($n_2$ in context), or random assignment / experiment.
· 10% condition: not needed if the samples are independent or randomly assigned / experiment.
· Normal/Large sample: BOTH population distributions are approx. Normal, or $n_1 \ge 30$ and $n_2 \ge 30$ by CLT.
DO:
* Only the BLACK marks need to be shown.
· Test statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
· DATA form: find the mean & SD of each data set ($\bar{x}_1$, $s_{x_1}$, $\bar{x}_2$, $s_{x_2}$). STATS form: enter $\bar{x}_1$, $s_{x_1}$, $n_1$, $\bar{x}_2$, $s_{x_2}$, $n_2$ directly.
· df: from the calculator, or $df = n - 1$ where n here is the smaller sample size.
· P-value (calculator): tcdf(lower, upper, df). If it is two-SIDED (≠), multiply the P-value by 2.
CONCLUDE: Interpretation
Because the p-value of (P-value) (< / >) $\alpha$ = ___, we reject / fail to reject $H_0$ and state whether there is convincing evidence for $H_a$ in context (use $\mu_1 < \mu_2$, $\mu_1 > \mu_2$, or $\mu_1 \ne \mu_2$ in context).

Test statistics at a glance:
* One-sample t test for $\mu$: $t = \dfrac{\bar{x} - \mu_0}{s_x / \sqrt{n}}$
* Paired t test for $\mu_{diff}$ (mean difference; one sample, data are paired): $t = \dfrac{\bar{x}_{diff} - \mu_{diff}}{s_{diff} / \sqrt{n_{diff}}}$
* Two-sample t test for $\mu_1 - \mu_2$ (difference of means): $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
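The three t statistics summarized above can be computed with a short Python sketch. Only the test statistic and degrees of freedom are computed here; the P-value is left to tcdf or Table B, because Python's standard library does not include a t distribution. All names and data are assumptions for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """t = (x-bar - mu0) / (s_x / sqrt(n)), with df = n - 1."""
    n = len(data)
    t = (mean(data) - mu0) / (stdev(data) / sqrt(n))
    return t, n - 1

def paired_t(first, second, mu_diff0=0):
    """Paired data: run a one-sample t test on the differences (second - first)."""
    diffs = [b - a for a, b in zip(first, second)]
    return one_sample_t(diffs, mu_diff0)

def two_sample_t(x1, s1, n1, x2, s2, n2):
    """t = ((x1 - x2) - 0) / sqrt(s1^2/n1 + s2^2/n2), conservative df = smaller n - 1."""
    t = ((x1 - x2) - 0) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return t, min(n1, n2) - 1

print(one_sample_t([2, 4, 6, 8, 10], mu0=4))
print(paired_t([1, 2, 3], [2, 4, 6]))
print(two_sample_t(10, 2, 16, 8, 2, 16))
```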
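CHAPTER 12: INFERENCE FOR DISTRIBUTIONS & RELATIONSHIPS
* Do not always look only at the question. Always identify the number of samples and the number of variables.
* The chi-square distribution is not approximately Normal like the previous ones; it is right-skewed.

Chi-Square Goodness of Fit (GOF)
* The question might state ... "perform a test for a distribution given these ..."
· One sample, one variable
STATE: Hypotheses & significance level
$H_0$: The claimed distribution of (context) is (true/correct).
$H_a$: The claimed distribution of (context) is not (true/correct).
Using $\alpha$ = ___
* "claimed distribution" can be changed to the wording of the problem, like "equally likely," etc.
PLAN: inference method & conditions
· Random: randomly selected (n + context), so we can generalize to the population.
· 10% cond.: $n < 0.10N$ * the same rule applies on this 10% cond., so sampling without replacement is OK.
· Large counts: all expected counts are ≥ 5. * If it's possible to list your expected counts, then list them.
DO:
· $\chi^2 = \sum \dfrac{(\text{observed} - \text{expected})^2}{\text{expected}}$
· df = number of categories - 1
· Calculator: observed counts in L1, expected counts in L2.
· P-value from the calculator, or $\chi^2$cdf(lower: $\chi^2$; upper: 1000; df: df) = P-value
CONCLUDE:
* Because the p-value of ___ < $\alpha$ = ___, we reject $H_0$. There is convincing evidence that ($H_a$ in context).
* Because the p-value of ___ > $\alpha$ = ___, we fail to reject $H_0$. There is not convincing evidence that ($H_a$ in context).

Chi-Square Test for Homogeneity
* The question might state ... "Do we have convincing evidence of a difference ..." or "perform a test for a distribution given ..."
· Two (or more) samples, one variable
  e.g. "2 Brands of Gummy Bears" (samples) and "color" (variable)
STATE: Hypotheses & significance level
$H_0$: There is no difference in the distribution of (categorical var.) for population 1 and population 2.
$H_a$: There is a difference in the distribution of (categorical var.) for population 1 and population 2.
PLAN: inference method & conditions
· Random: randomly selected (n + context), so we can generalize to the population; or * randomized experiment (so we can infer causation).
· 10% cond.: $n < 0.10N$ * the same rule applies on this 10% cond., so sampling without replacement is OK.
· Large counts: all expected counts are ≥ 5.
· expected counts: $\dfrac{(RT)(CT)}{TT}$, where RT = row total, CT = column total, TT = table total
DO:
· $\chi^2 = \sum \dfrac{(O - E)^2}{E}$
· Calculator: MATRIX -> Edit -> [A]: enter the observed values (do not include the totals). [B] holds the expected values: do not edit. Then run the chi-square test.
· $df = (R - 1)(C - 1)$, where R = number of rows and C = number of columns
· P-value from the calculator, or $\chi^2$cdf(lower: $\chi^2$; upper: 1000; df: df) = P-value
CONCLUDE:
* Because the p-value of ___ < $\alpha$ = ___, we reject $H_0$. There is convincing evidence that ($H_a$ in context).
* Because the p-value of ___ > $\alpha$ = ___, we fail to reject $H_0$. There is not convincing evidence that ($H_a$ in context).

Chi-Square Test for Independence
* The question might state ... "perform a test for a distribution given ..."
· One sample, two variables
STATE:
$H_0$: There is no association between (variable 1) and (variable 2) in (population).
$H_a$: There is an association between (variable 1) and (variable 2) in (population).
PLAN: inference method & conditions
· Random: randomly selected sample, or * randomized experiment (so we can infer causation).
· 10% cond.: $n < 0.10N$ * on this 10% cond., sampling without replacement is OK.
· Large counts: all expected counts are ≥ 5, with expected counts $\dfrac{(RT)(CT)}{TT}$.
DO: same as the test for homogeneity: MATRIX -> Edit for the observed table (do not edit the expected matrix), $df = (R - 1)(C - 1)$, $\chi^2$cdf(lower; upper; df) = p-value.
CONCLUDE: same wording as above: reject $H_0$ or fail to reject $H_0$, in context.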
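The expected counts, the chi-square statistic, and the degrees of freedom for a two-way table can be sketched in Python. The observed table and the function names are assumptions for illustration; the P-value step is left to the calculator because the standard library has no chi-square distribution.

```python
def expected_counts(observed):
    """Expected count for each cell: (row total)(column total) / (table total)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[rt * ct / table_total for ct in col_totals] for rt in row_totals]

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

def degrees_of_freedom(observed):
    """df = (rows - 1)(columns - 1) for a two-way table (totals not included)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)

def large_counts_ok(expected):
    """Large Counts condition: all expected counts are at least 5."""
    return all(e >= 5 for row in expected for e in row)

# Hypothetical observed counts (do not include the totals).
observed = [[10, 20],
            [30, 40]]
expected = expected_counts(observed)
print(expected)
print(round(chi_square_statistic(observed, expected), 4), degrees_of_freedom(observed))
print(large_counts_ok(expected))
```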
INFERENCE FOR SLOPE

Some symbols:
                      Statistic (sample)        Parameter (population)
regression line       $\hat{y} = a + bx$        $\mu_y = \alpha + \beta x$
y-intercept           $a$                       $\alpha$
slope                 $b$                       $\beta$
SD of residuals       $s$                       $\sigma$
SD of slope           $SE_b$                    $\sigma_b$

Sampling distribution of the slope b:
· SHAPE: approximately Normal
· CENTER: $\mu_b = \beta$
· VARIABILITY: $\sigma_b = \dfrac{\sigma}{\sigma_x \sqrt{n}}$
· STANDARD ERROR: $SE_b = \dfrac{s}{s_x \sqrt{n - 1}}$
· SD OF RESIDUALS: $s = \sqrt{\dfrac{\sum \text{residuals}^2}{n - 2}} = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$
· STANDARDIZED TEST STATISTIC: $t = \dfrac{b - \beta_0}{SE_b}$
· DEGREES OF FREEDOM: $df = n - 2$
* The interpretations of a, b, s, and $r^2$ are in chapter 3 of these notes.

Computer output:
Predictor       Coef    SE Coef    T    P
Constant        a       SE_a
(x-variable)    b       SE_b       t    P (two-sided test)
S = SD of residuals    R-sq = $r^2$    R-sq(adj)
* correlation: $r = \pm\sqrt{\text{R-sq}}$ (same sign as the slope)

Conditions:
· Linear: the scatterplot shows a linear relationship and the residual plot has no leftover curved pattern.
· Independent: each observation is independent of each other. * When sampling without replacement, check the 10% condition.
· Normal: * a dotplot of residuals cannot show strong skew or outliers.
· Equal SD: * the residual plot has roughly equal variability.
· Random: the data come from a random sample or randomized experiment.
* Always in context.

CONFIDENCE INTERVAL FOR SLOPE
· $b \pm t^* \cdot SE_b$, with $df = n - 2$
· Calculator: STAT -> TESTS -> G: LinRegTInt. The calculator gives (A, B).
· $SE_b$ is given by the computer output.
CONCLUDE: interpretation
"We are C% confident that the interval from A to B captures the $\beta$ = true slope of the population LSRL (in context)."

SIGNIFICANCE TEST FOR SLOPE
STATE:
$H_0: \beta = 0$ (no linear relationship)
$H_a: \beta > 0$ (positive relationship), $\beta < 0$ (negative relationship), or $\beta \ne 0$ (some relationship)
where $\beta$ = true slope of the population LSRL, using $\alpha$ = ___
PLAN: Inference method & conditions:
· Inference Method: One-sample t test for slope
· Conditions: Linear, Independent, Normal, Equal SD, Random (always in context).
DO:
· $t = \dfrac{b - 0}{SE_b}$, with $df = n - 2$
· Calculator: STAT -> TESTS -> F: LinRegTTest, or read t and P from the computer output (the output P-value is for a two-sided test).
· P-value: tcdf(lower, upper, df), with lower: t (or -1000) and upper: 1000 (or t).
· If the p-value has "E-#" in it (e.g., E-6), write the whole thing; it's still below $\alpha$.
CONCLUDE: interpretation
Because the p-value of (P-value) (< / >) $\alpha$ = ___, we ...
* If less than (<): reject $H_0$. There is convincing evidence of ($H_a$) between [x-context] and [y-context].
* If greater than (>): fail to reject $H_0$. There is not convincing evidence of ($H_a$) between [x-context] and [y-context].
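A minimal Python sketch of the slope calculations, assuming b and $SE_b$ are read from computer output and $t^*$ comes from Table B or invT with df = n - 2. The function names and the numbers (b = 0.6, SE_b = 0.2, n = 12, t* = 2.228) are illustrative assumptions.

```python
def slope_t_statistic(b, se_b, n, beta0=0):
    """t = (b - beta0) / SE_b with df = n - 2."""
    return (b - beta0) / se_b, n - 2

def slope_t_interval(b, se_b, t_star):
    """b +/- t* * SE_b; t_star must come from Table B or invT with df = n - 2."""
    margin = t_star * se_b
    return b - margin, b + margin

t, df = slope_t_statistic(b=0.6, se_b=0.2, n=12)
lower, upper = slope_t_interval(b=0.6, se_b=0.2, t_star=2.228)
print(round(t, 3), df)
print(round(lower, 4), round(upper, 4))
# An interval that does not contain 0 agrees with rejecting H0: beta = 0 in a two-sided test.
print("0 in interval:", lower <= 0 <= upper)
```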
Congratulations!
You have finished the AP Statistics Course!
—Mr. Jeremiah James dela Rosa
Thank you to Stats Medic, Luke Wilcox, and Lindsey Gallas!