
UCR RMET101 Fall2023

J.Resovsky Lecture Notes


Week 7: Hypothesis Testing

V-clip Timestamps
(m13a/aa, slides 2 and 3) → 0:33 intro to h-test terminology; 1:13 “null” vs “alternative”; 3:50 “right” and
“wrong” side; 4:50 one-sided/tailed vs two-sided/tailed alt-hypos; 5:40 reject / fail to reject / support; 6:18 Type
1 or 2 error and Type 1 explained; 8:06 Type 2 error and “alpha” versus “beta”; 8:33 alpha “tolerance” (“alpha for
testing” / “significance level for the test”) and “P-value” and “sig”/“significance”; 10:45 power of a test; 11:45
first part of how to choose alpha; 12:35 intro to hypothesis testing procedure (emphasis on picking the correct test and
stating the correct conclusion). m13aa: general form and meaning of test statistics and interpretation of the P-value.

(m13b; slide 3… and some 4,5) → 0:00 step 1 = defining the population stat of interest; 3:04 state the null hypothesis and name the
test, and the importance of these steps; 4:33 stating the alt hypothesis and its importance; 6:05 setting alpha and more
about how/why to choose values lower/higher than 0.05; 8:03 assumptions-checking step and the
“randomness” assumption; 8:55 NORMALCY ASSUMPTION (often called “sample size”, but this isn’t really
what it is!); 9:45 other assumptions, equality of variance; 10:28 converting an SPSS or table
“significance”/“p”/“tail-area” test result to the REAL P-value of your test outcome.

(m13bb; slides 4,5) → 0:00 how to use my 2 slides with 11 tests summarized; 1:43 special explanation of
paired t-test notation and logic; 4:10 univariate chi-squared versus z-tests; 4:50 univariate ANOVA versus
independent or pooled t-tests; 5:10 bivariate tests
Asking For Information     | Gathering Information                                              | Assessing Information
Why to ask (philosophy)    | The correct measurement (levels, reliability, construct validity) | How does it look? (descriptives)
What to ask (vocabulary)   | The correct process (internal validity, design)                    | Looking deeper (bivariates, normalcy, reliability)
How to ask (rules)         | The correct subjects (external validity, sampling)                 | What answers does it give? (hyp. testing)

• Hypothesis Testing: I) terminology


– Null hypothesis (H0): what I will attempt to falsify, but reject only if the evidence (my sample) is extremely convincing. ALWAYS WITH AN EQUAL SIGN
– Alternative hypothesis (Ha or H1): what I will attempt to support through the rejection of H0, but only if the evidence points in this direction.
– Type I Error: rejecting H0 when it is true
• α-significance (significance level of the test): the chance of making a Type I error
• OR the greatest chance of making a Type I error that I will tolerate. If the chance is higher, I won’t reject H0.
• P-significance (P-value): the chance of getting sample results at least as extreme as mine if H0 were true. I reject H0 if and only if P < α. [IN SPSS, this is “sig”; in the textbook it is “tail area”]
– Type II Error: not rejecting H0 when it is false
• β = the chance of making a Type II error.
• power of a test = P (yes, P again, confusing!) = chance of rejecting H0 with your sample. Depends on variability in the population and the size of the sample. Useful only in deciding if you want to try a bigger sample. P = 1 − β because no hypothesis is ever perfectly true.
• Research procedures that make α bigger tend to make β smaller, and vice versa (a simulation sketch follows the notes below)
– Notes
• I don’t talk much about ordinal data in these chapters. You can treat it like nominal data, but with more advanced tests it can really act like scale data
• The null hypothesis is almost always the hypothesis of no relationship between variables, or of no difference from the simplest distribution characteristics, NO MATTER WHAT HYPOTHESIS IS ACTUALLY FAVORED BY YOUR THEORY
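A minimal Python sketch (not part of the original notes) to make α, β, and power concrete. It simulates a one-sided, one-sample z-test for a mean with known population SD; all numbers (μ0 = 100, σ = 15, n = 40) are invented. The observed rejection rate approximates α when H0 is true and the power (1 − β) when it is false.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, sigma, mu0 = 0.05, 40, 15.0, 100.0      # invented example values

def reject_rate(true_mu, reps=10_000):
    """Fraction of simulated samples whose one-sided z-test rejects H0: mu = mu0."""
    rejections = 0
    for _ in range(reps):
        sample = rng.normal(true_mu, sigma, n)
        z = (sample.mean() - mu0) / (sigma / np.sqrt(n))   # (obs - hypo) / std. error
        p = stats.norm.sf(z)                               # one-sided P for H1: mu > mu0
        rejections += p < alpha
    return rejections / reps

print("Type I error rate (H0 true, mu = 100):", reject_rate(100.0))   # ~ alpha
print("Power = 1 - beta (H0 false, mu = 105):", reject_rate(105.0))
```

Raising alpha (e.g. to 0.10) in this sketch raises both rates, which is the α-versus-β trade-off noted above.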
• Hypo Testing: II) procedure (page 424 / 437/ 477 D&P)
– Define (what words go with which symbol) the Population characteristic of interest
• Means, proportions, or observation counts in more than two categories?
• Number of samples or groups (one, two, more than two)
• Univariate or bivariate or multivariate
– Formulate H0 and H1
• H0 always “=” [even if the hypothesis you prefer is ≠]
• Always about populations (symbols must be μ, p, β, or ρ; never x̄, p̂, b, or r)
• Always compare to a reference value or (for differences) to 0.
• H1 > or < for a 1-sided (one-tailed) alternative: when the question is about one population stat being “more”, “less”, “-er” (small-er, fast-er, light-er) than the other or than the reference.
• H1 ≠ for a 2-sided (two-tailed) alternative: when the question is about whether a population stat is the “same” as / “different” from that of another population or the reference
– Use H0 to Name the Test (e.g. Pooled T-test) and Test statistic (i.e. t/z/F/χ²) and list formulas for the test stat and degrees of freedom
• formulas for the test stat all have the form: t/z/F/χ² = (obs − hypo) / stnd.error (a worked sketch follows this procedure list)
• degrees-of-freedom formulas all include “number of sample elements” − 1 and/or “number of categories” − 1
– Set Significance level BEFORE computing test result: (0.05 is ‘standard’ caution; 0.01 if extra
careful/skeptical, sometimes 0.10 for prelim research to form naïve hypothesis/justify further, more careful testing)
– Assumptions (Will the test be valid/reliable? will the test work?)
– Test score Computation (fill in blanks in formula; get t, z, chi-squared or F; don’t forget about df!)
– P-value for H1 from the Test score (from table/SPSS. If necessary, make wrong-side [1 − P] and two-sided corrections [P/2 or P×2] … more about this on slide 5)
– Conclusion: Don’t use “accept” or “prove” or “disprove”
• If P < alpha, reject H0 and support the specified H1.
• If P > alpha, fail to reject (not the same as accept!!) the null hypothesis; the alternative is NOT supported.
• “There is/isn’t sufficient evidence to reject the null hypothesis, implying there is/isn’t significant evidence of… ”
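A worked Python sketch of the procedure above for a one-sample t-test, with an invented sample of Dutch heights and the 178 cm reference value from the next slide; it follows the general (obs − hypo) / standard-error form with df = n − 1.

```python
import numpy as np
from scipy import stats

# Invented sample of Dutch heights in cm (purely illustrative; smaller than the n > 30 guideline).
heights = np.array([181, 176, 185, 179, 183, 177, 190, 174, 182, 186], dtype=float)
mu0, alpha = 178.0, 0.05                       # reference value and significance level, set in advance

n = heights.size
std_error = heights.std(ddof=1) / np.sqrt(n)   # standard error of the mean
t = (heights.mean() - mu0) / std_error         # (obs - hypo) / std. error
df = n - 1
p_one_sided = stats.t.sf(t, df)                # H1: mu > mu0, data in the hypothesized direction

print(f"t = {t:.3f}, df = {df}, one-sided P = {p_one_sided:.4f}")
print("reject H0" if p_one_sided < alpha else "fail to reject H0")
```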
ShortList of Tests: Univariate
• Large (one-)Sample T-test for means
– H0: μx = μ0 (= μhypothesized = μref)
– Example: based on a sample of Dutch nationals, is the average Dutch person taller than the supposed/known American average of 178cm?
• x = height in cm; μ0 = 178; H1: μx > μ0
• Paired T-test for mean difference
– H0: μdiff (= μx2−x1) = μ0
– Example: based on a sample of Dutch nationals, is the average difference of age-10 and age-20 heights (average teenage growth) for Dutch people greater than the supposed/known American average of 55cm?
• x1 = age-10 height in cm, x2 = age-20 height; xdiff = x2 − x1; μ0 = 55; H1: μdiff > μ0
• Large (one-)Sample Z-test for proportion
– H0: p = p0 (= phypothesized = pref)
– Example: based on a sample of Dutch nationals, is the percentage of Dutch persons taller than 200cm greater than the supposed/known American value of 10%?
• p = proportion over 2 meters; p0 = 0.10; H1: p > p0
• Univariate χ² (more than two category proportions/counts of interest)
– H0: pA = pA0 ; pB = pB0 ; pC = pC0 ; ….. [exact reference value for each category]
– Example: based on a sample of Dutch nationals, are the proportions of blonds, brunettes, redheads, and ‘others’ the same as the American 10%, 20%, 5%, and 65%?
• Stats of interest are the Dutch population proportions of each of the 4 hair-color types. Expected counts (needed for the test statistic formula) are n*0.1, n*0.2, n*0.05, n*0.65; H1: at least one population proportion differs from its hypothesized value.
• Univariate ANOVA (more than two population means of interest)
– H0: μ1 = μ2 = μ3 … ; H1: at least one inequality of population means.
– Example: based on a sample of Dutch nationals, are the average IQs of blonds/brunettes/redheads/others the same? (SciPy equivalents of these univariate tests are sketched below)
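For reference, SciPy has direct equivalents of these univariate tests. The calls below are a sketch with invented data mirroring the slide examples (Dutch heights, teenage growth, hair-color counts, IQ by hair color); the function names are standard SciPy, everything else is made up.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
heights = rng.normal(181, 7, 50)                        # invented Dutch heights (cm)

# Large one-sample t-test for a mean (H0: mu = 178; H1: mu > 178).
print(stats.ttest_1samp(heights, popmean=178, alternative='greater'))

# Paired t-test = one-sample t-test on the differences (H0: mu_diff = 55).
h10, h20 = rng.normal(140, 6, 50), rng.normal(198, 8, 50)   # invented age-10 and age-20 heights
print(stats.ttest_1samp(h20 - h10, popmean=55, alternative='greater'))

# Univariate chi-squared (H0: hair-color proportions are 10/20/5/65%).
observed = np.array([30, 45, 10, 115])                  # invented counts, n = 200
expected = 200 * np.array([0.10, 0.20, 0.05, 0.65])
print(stats.chisquare(observed, f_exp=expected))

# Univariate ANOVA (H0: equal mean IQ across the four hair-color groups).
groups = [rng.normal(100, 15, 40) for _ in range(4)]
print(stats.f_oneway(*groups))
```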
ShortList of Tests: Bivariate
• (two) Independent-Sample T-test for means (different pop variances)
– H0: μ1 = μ2 (or μ1 − μ2 = 0)
– Example: based on samples of Dutch and American nationals, is the average Dutch person taller than the average American?
• x = height in cm; μ is the population mean of x; pop 1 is Dutch, pop 2 American; H1: μ1 > μ2
• Two-Sample Z-test for proportions
– H0: p1 = p2 (or p1 − p2 = 0)
– Example: based on samples of Dutch and Americans, is the percentage of Dutch persons taller than 200cm greater than that for Americans?
• p = population proportion over 2 meters; pop A is American, pop D is Dutch; H1: pD > pA
• (Linear) correlation test
– H0: correlation ρ = 0 (or regression slope β = 0)
– Example: based on a sample of Dutch, do taller people have higher IQs?
• x is height, y is IQ; ρ is the linear correlation of IQ with height for the population, β is the slope of the population linear relationship for predicting IQ from height. H1: ρ > 0 (or β > 0)

• χ² tests for Independence or Homogeneity
– H0: the distributions of the two categorical variables are Independent/Homogeneous; H1: H0 is not true
– Examples: [these all have more than 2 categories, but they could also be two-category variables]
• Independence: Based on a random sample of Dutch, is the proportion of Dutch people of low/medium/high IQ independent of whether their hair color is blond/brunette/red/other?
• Homogeneity: Based on random samples of Dutch, Americans, and Germans, is the incidence of blond/brunette/red/other hair the same in the three countries?
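A matching sketch for the bivariate tests, again with invented data. ttest_ind with equal_var=False is the unequal-variance (Welch) version of the independent-sample t-test, and pearsonr tests H0: ρ = 0; both report two-sided P-values unless told otherwise, so the one-sided conversion rules from these notes apply.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
dutch = rng.normal(183, 7, 60)            # invented heights (cm)
american = rng.normal(177, 8, 60)

# Two-independent-sample t-test, different population variances (H1: mu_Dutch > mu_American).
print(stats.ttest_ind(dutch, american, equal_var=False, alternative='greater'))

# Linear correlation test (H0: rho = 0).  pearsonr's P is two-sided;
# halve it for H1: rho > 0 when the sample r is positive.
height = rng.normal(180, 7, 100)
iq = 100 + 0.2 * (height - 180) + rng.normal(0, 15, 100)   # invented weak relationship
r, p_two_sided = stats.pearsonr(height, iq)
print(f"r = {r:.3f}, two-sided P = {p_two_sided:.4f}, one-sided P = {p_two_sided / 2:.4f}")
```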

• Hypothesis Testing: IIIa) univariate


– Scale H0: μ = μ0 = μref [example: is the average Dutch person taller than the supposed European average of 178cm?]
• z-test (z table) only if I know the population standard deviation
• Usually a large-sample t-test, using the standard error of the mean and df = n − 1
• Assumptions: random; n > 30 or population known to be normal or sample
appears normal enough to assume the population distribution is normal.
• Paired sample t test is really just the same as this one (measurement is
difference of two scale variables for each sample element)!
– Binomial (two-level ordinal!!) H0: p = p0
– [ example: Are a majority of Europeans in favor of reintroducing strict border controls? → implies p0 = 0.50 = 50%; alternative p > 0.50]
• Assumptions: random; p̂·n > 10 and (1 − p̂)·n > 10
• One large-sample z-test for a population proportion (see the sketch below)
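A sketch of this large-sample z-test for a proportion, using the border-controls example (p0 = 0.50) with an invented survey count; the standard error under H0 uses the hypothesized p0, and the test statistic follows the (obs − hypo) / standard-error pattern.

```python
import numpy as np
from scipy import stats

n, in_favor = 400, 224                    # invented survey result: 224 of 400 in favor
p_hat, p0 = in_favor / n, 0.50

# Assumption check from the slide: both counts comfortably above 10.
assert p_hat * n > 10 and (1 - p_hat) * n > 10

std_error = np.sqrt(p0 * (1 - p0) / n)    # SE of the sample proportion under H0
z = (p_hat - p0) / std_error              # (obs - hypo) / std. error
p_one_sided = stats.norm.sf(z)            # H1: p > 0.50 ("a majority")

print(f"p_hat = {p_hat:.3f}, z = {z:.3f}, one-sided P = {p_one_sided:.4f}")
```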

• Hypothesis Testing: IIIb) univariate


– Paired Scale H0: μx−y (= μdiff) = μ0 (μ0 usually = 0)
[ examples: Is the average Dutch married couple of the same height? Did (individual) student scores improve on exam 2 relative to exam 1?]
• Test is the paired t-test. Often confused with independent-sample or pooled t-tests
• Key identifying feature: for the paired t-test, each sample element (e.g. a married couple) has two measurements recorded in two separate columns (e.g. heights of ‘husband’ and ‘wife’) [for the independent/pooled t, sample elements have one relevant measurement each, but the sample is divided into two independent subsets of elements]
• Follow rules for large-sample t-test!
– Note on 1-sided and 2-sided tests
• “Larger/smaller/less/more/majority/minority/improve/worse” in question/hypo imply
one-sided
• “Different/same” in question/hypo imply two-sided
• Check if table or SPSS gives one-sided or two-sided!!! Divide two-sided P by 2 to
get 1-sided P
• Use the one-sided P-value if HA is > or < AND the data show the SAME direction as hypothesized
• Use (1.0 − (one-sided P-value)) if HA is > or < AND the data imply a direction opposite of that hypothesized.
• Use the two-sided P-value if HA is ≠. This means 2.0 × (one-sided P-value)
• SPSS P-values are called “sig” (a small conversion helper is sketched below)
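Because SPSS and most tables report a two-sided “sig”, a small helper function (a sketch, not SPSS output) can encode the conversion rules above; the argument names are invented for illustration.

```python
def p_value_for_H1(two_sided_sig, effect_matches_H1, h1_is_two_sided=False):
    """Convert a reported two-sided significance into the P-value for your H1.

    effect_matches_H1: True if the sample effect points the way a one-sided H1 predicts.
    """
    if h1_is_two_sided:
        return two_sided_sig                    # H1 is "different": use the two-sided value as is
    one_sided = two_sided_sig / 2
    return one_sided if effect_matches_H1 else 1.0 - one_sided   # wrong-side correction

print(p_value_for_H1(0.08, effect_matches_H1=True))    # 0.04 -> data agree with H1's direction
print(p_value_for_H1(0.08, effect_matches_H1=False))   # 0.96 -> data point the "wrong" way
```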

• Hypothesis Testing: IV) bivariate (cause-effect)


– 2-group Categorical Cause – Scale Effect H0: μ1 = μ2
[ example: Are Dutch taller than Americans on average?]
• Usually a (two-)independent-sample t-test
• standard error of the difference in means is sqrt[ (s1²/n1) + (s2²/n2) ]
• df = (V1+V2)² / [ V1²/(n1−1) + V2²/(n2−1) ], where Vi = si²/ni (very difficult to calculate!! a worked sketch follows this slide)
• For a conservative estimate, use the lowest possible df = n1 − 1, where n1 is the smaller of the two samples
– (if H0 is rejected with the conservative estimate, there is no need to calculate the exact df)
• Assumptions: random; independent; n > 30 or sample appears normal for each sample
• When justified, more ‘powerful’ variant of this test is pooled t-test,
– If we can assume that the two populations have the same variance, df = n1 + n2 – 2 (this is the highest possible df)
– SPSS tests a null hypothesis of equal variances with “Levene’s” test:
» Levene’s sig < 0.05 → reject equal variances → cannot pool → use the non-pooled t-test output line

– 2-group Categorical Cause – two-category Effect H0: p1 = p2


• [ example: Do equal percentages of men and women favor Trump?]
• Two-sample z-test, using the pooled proportion pc = (n1·p̂1 + n2·p̂2) / (n1 + n2)
• standard error is sqrt{ [pc·(1 − pc)]·[(1/n1) + (1/n2)] }
• Assumptions: random; independent; pc·n > 10 and (1 − pc)·n > 10 for each of n1 and n2
• Note: standard error for confidence interval different than that for z-test (see textbook formula)!
• Chi-squared is just as good or better when H1 is TWO-sided
– Note: just two categories independent variable for now. 3 or more categories is Chi-
squared (proportions) or ANOVA F-tests (means) (Chapters 12,15)
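The sketch below fills in the two formulas from this slide with invented numbers: the unequal-variance df (with Vi = si²/ni) and the pooled two-sample z-test for proportions with its standard error.

```python
import numpy as np
from scipy import stats

# --- Unequal-variance (Welch) df, with V_i = s_i**2 / n_i ---
s1, s2, n1, n2 = 7.0, 8.5, 60, 55                 # invented sample SDs and sizes
V1, V2 = s1**2 / n1, s2**2 / n2
df_welch = (V1 + V2)**2 / (V1**2 / (n1 - 1) + V2**2 / (n2 - 1))
print(f"Welch df = {df_welch:.1f}  (conservative alternative: {min(n1, n2) - 1})")

# --- Two-sample z-test for proportions with the pooled p_c ---
x1, x2 = 42, 28                                   # invented "successes" in each sample
p1_hat, p2_hat = x1 / n1, x2 / n2
p_c = (x1 + x2) / (n1 + n2)                       # same as (n1*p1_hat + n2*p2_hat) / (n1 + n2)
std_error = np.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat - 0) / std_error             # (obs - hypo) / std. error; hypothesized difference is 0
print(f"z = {z:.3f}, two-sided P = {2 * stats.norm.sf(abs(z)):.4f}")
```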

• Hypothesis Testing: V) univariate-categorical


– The question: how different is your sample pie chart (or univariate
bar chart) from what you expect of the population?
• Difference tests likelihood that population hypothesis is correct
• Relative frequency is what matters. (thus pie not bar)
• Total pie being different is what matters
– not just one slice, especially if the slice is small → the “standard error” for determining the relative size of differences is the expected slice size
– What you expect is specified by a theory
• Example: 200 autos, 4 colors, theory is no preference → 25% each → 0.25×200 = 50 of each color [n1exp=50 | n2exp=50 | n3exp=50 | n4exp=50]
• Example: Americans favor death penalty 2/1, with ¼ undecided. Are Europeans
the same? [pEXP1=.5 favor, pEXP2=.25 against, pEXP3=.25 undecided]; nEXP=
nSAMPLE*pEXP, so for sample size 200, [n1exp=100 | n2exp=50 | n3exp= 50]
– Difference measured by χ² = Σall cats [ (expectedthiscat − observedthiscat)² / expectedthiscat ] (worked sketch below)
• Look up in table 8 for df=k-1, where k is number of categories
• Bigger score → lower significance (P) → H0 less likely true.
• Always positive, usually bigger numbers than z or t scores for same P.
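A worked sketch of the χ² goodness-of-fit computation for the death-penalty example above, with an invented European sample of n = 200 (expected counts 100/50/50), first by the formula and then with scipy.stats.chisquare.

```python
import numpy as np
from scipy import stats

observed = np.array([85, 60, 55])                 # invented counts: favor / against / undecided
expected = 200 * np.array([0.50, 0.25, 0.25])     # 100, 50, 50 as on the slide

chi_sq = np.sum((expected - observed)**2 / expected)
df = len(observed) - 1                            # k - 1 categories
p = stats.chi2.sf(chi_sq, df)
print(f"chi-squared = {chi_sq:.2f}, df = {df}, P = {p:.4f}")

print(stats.chisquare(observed, f_exp=expected))  # same result from SciPy
```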

• Hypothesis Testing: V) bivariate-categorical


– The question: how different are the sample pie (or bar) charts from different
groups? (Is distribution homogeneous? Are variables independent?)
• Identity of the Groups is the second variable
– Results after random assignment of one population sample to different experimental conditions
– Or… characteristics of samples drawn from several different populations
• Difference tests likelihood of H0 that groups are the same
• Relative frequency is what matters. (thus pie not bar)
• Total pie being different is what matters
– not just one slice, especially if slice is small
– What you expect is specified by a “sameness” formula for tables
– expectedthisbox = totalthisrow x totalthiscolumn / total in study
• Note that expected cell counts are easy for 2-by-2 studies!
– Difference measured by χ² again (see the SciPy sketch below)
• Look up in table 9 for df=(number_rows - 1)*(number_columns – 1)
– Assumption: need at least 5 EXPECTED per cell
• Sometimes makes 2x2 Chi-squared possible when two sample proportion test
doesn’t work
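A sketch of the bivariate χ² test with scipy.stats.chi2_contingency and an invented IQ-by-hair-color table; the expected counts it returns follow the row-total × column-total / grand-total formula, and df = (rows − 1) × (columns − 1).

```python
import numpy as np
from scipy import stats

# Invented counts: rows = low/medium/high IQ, columns = blond/brunette/red/other.
table = np.array([[20, 30,  8, 60],
                  [25, 35, 10, 70],
                  [15, 25,  7, 45]])

chi2, p, df, expected = stats.chi2_contingency(table)
print(f"chi-squared = {chi2:.2f}, df = {df}, P = {p:.4f}")
print("expected counts (row total * column total / grand total):")
print(expected.round(1))

# Assumption from the slide: every EXPECTED count should be at least 5.
print("all expected counts >= 5:", bool((expected >= 5).all()))
```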
More about Chi-squared tests: Univariate

• Univariate: just one set of categories


– Not proportion of “yes” for several groups…. That would be a
combination of yes-no categories and group identity
categories…
– With just two categories, univariate chi-square is the equivalent
of a two-sided z-test for a single population proportion
– The null hypothesis is either a specific proportion for each category (provided in the question) or the assumption that the proportions in all categories are the same (in that case you get the expected counts by dividing the number of subjects by the number of categories)
More about Chi-squared tests: Bivariate
• Homogeneity is about category counts (proportions) in
DIFFERENT populations, or in the results of two
DIFFERENT experimental treatments
– Independent variable USUALLY is population group identity
– Best to use words, not symbols for Hypotheses
– H0: The true proportions for [category descriptions for variable 1] are the
same for all [category descriptions for variable 2]
– H1: is always 2-sided, and can always be written: “H0 is not true”
• Independence is about category counts for two different
variables within one population
– Usually survey or observational designs
– Again, use words for the hypotheses
– Again, H1 always 2-sided: “H0 is not true”
– H0: [variable 1 description] and [variable 2 description] are independent

• Hypothesis Testing: VI) bivariate-scale


– The question: are two variables correlated? Is there a significant linear relationship?
• NOT the same as asking if the correlation is strong (R)
• NOT the same as asking how much variance the correlation explains (R²)
– Basically, this is a test of the significance of the slope or the correlation (you can choose either form of hypothesis):
• Because r is proportional to b, both tests have the same significance (see the sketch below)
• Sample r → population ρ → H0: ρ = 0
• b for the sample → β for the population → H0: β = 0
• Hypotheses about the intercept are also possible, but the intercept has no impact on whether or not things are correlated, and is not as important
– Assumes an (approximately) normal distribution about the line for all x-values
– Assumes (approximately) the same width of distribution around the line for all x-values
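A closing sketch with invented height/IQ data showing that the slope test and the correlation test give the same significance: scipy.stats.linregress reports the sample slope b and the P-value for H0: β = 0, which matches pearsonr's P-value for H0: ρ = 0.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
height = rng.normal(180, 7, 80)                           # invented data
iq = 100 + 0.2 * (height - 180) + rng.normal(0, 15, 80)

fit = stats.linregress(height, iq)                        # slope b, intercept, r, P for H0: beta = 0
r, p_corr = stats.pearsonr(height, iq)                    # P for H0: rho = 0

print(f"slope b = {fit.slope:.3f}, two-sided P = {fit.pvalue:.4f}")
print(f"r = {r:.3f},        two-sided P = {p_corr:.4f}   # same P as the slope test")
```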
