STT 215 Exam 1 Study Guide

Uploaded by

avaldiri768

STT 215 Exam 1 Study Guide: Chapters 1-4

Chapter 1: What is statistics?


Statistics is the art and science of learning from data.
Three parts of the statistical process:
1) Design
2) Describe
3) Infer
We attempt to estimate population parameters with sample statistics.
With random sampling, each subject in the population has the same chance of being chosen.
Just as individual people vary, individual samples vary. We expect that most sample statistics will fall within a
certain range, but we do not expect that the sample statistics have the same values. We expect that the population
parameter falls within that range.
The margin of error is a measure of the variability of sample statistics. For proportions (percentages), we can
approximate the margin of error as
$$\text{margin of error} = \frac{1}{\sqrt{n}}, \quad \text{where } n \text{ is the sample size}$$
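As a quick check on the approximation above, here is a minimal sketch in Python. The function name and the sample sizes are made up for illustration; only the 1/√n rule comes from the guide.

```python
import math

def approx_margin_of_error(n):
    """Approximate margin of error for a sample proportion: 1 / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 1 / math.sqrt(n)

# Hypothetical sample sizes, chosen only for illustration.
for n in (100, 400, 1600):
    print(n, round(approx_margin_of_error(n), 3))
```

Notice that quadrupling the sample size only cuts the approximate margin of error in half.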
Results of a study are considered statistically significant if they would rarely be observed with only ordinary random
variation.

Chapter 2: What is descriptive statistics?


Categorical vs. Quantitative variables
For both types of variables, we are interested in distributions.
For categorical variables, we are often interested in modal categories. We use frequency tables, pie charts, and bar
graphs to display the data.
For quantitative variables, we are often interested in shape (modes, symmetry or skew, outliers), center (mean or
median), and variability (standard deviation or interquartile range, IQR). We use dot plots, stem-and-leaf plots,
histograms, box plots, and time plots. Using technology, you should know how to calculate the mean & standard
deviation as well as the median & IQR (five-number summary). You should know how to identify outliers, using the
IQR rule. The empirical rule gives us the percentage of observations contained within some number of standard
deviations. You should also know how to calculate and interpret z-scores using the following formula:
$$z\text{-score} = \frac{\text{observed value} - \text{mean}}{\text{standard deviation}}$$
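The summaries in this chapter can be sketched with Python's standard `statistics` module. The data set is hypothetical, and the quartile method ("inclusive") is an assumption: textbooks and calculators use slightly different quartile rules, so Q1 and Q3 may differ a little from what your course technology reports.

```python
import statistics

# Hypothetical data set, made up for illustration.
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]

mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)

# With n=4, quantiles returns the three cut points [Q1, median, Q3].
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
five_number = (min(data), q1, median, q3, max(data))

# IQR rule: flag values more than 1.5 * IQR below Q1 or above Q3.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

def z_score(value):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

print("five-number summary:", five_number)
print("IQR:", iqr, "outliers by the IQR rule:", outliers)
print("z-score of the largest value:", round(z_score(max(data)), 2))
```

A positive z-score means the observation is above the mean, and a negative z-score means it is below the mean.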

Chapter 3: How are two variables associated?


We are curious about the association between the response variable and the explanatory variable.
For two categorical variables, we use contingency tables and find conditional proportions. The marginal
proportions are often less helpful.
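To see why conditional proportions are usually more helpful than marginal proportions, here is a minimal sketch. The counts and the category names are hypothetical, and the rows are assumed to hold the explanatory variable.

```python
# Hypothetical counts, made up for illustration: rows are categories of the
# explanatory variable, columns are categories of the response variable.
table = {
    "group A": {"yes": 30, "no": 70},
    "group B": {"yes": 45, "no": 55},
}

def conditional_proportions(table):
    """Proportion of each response category within each row."""
    result = {}
    for row, counts in table.items():
        row_total = sum(counts.values())
        result[row] = {k: v / row_total for k, v in counts.items()}
    return result

def marginal_proportions(table):
    """Proportion of each response category, ignoring the row variable."""
    totals = {}
    for counts in table.values():
        for k, v in counts.items():
            totals[k] = totals.get(k, 0) + v
    grand_total = sum(totals.values())
    return {k: v / grand_total for k, v in totals.items()}

print(conditional_proportions(table))
print(marginal_proportions(table))
```

The conditional proportions of "yes" differ between the two groups, which suggests an association. The marginal proportion alone cannot show that difference.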
For one categorical variable and one quantitative variable, we can use summary statistics or side-by-side boxplots
to look for associations.
For two quantitative variables, we use scatterplots and analyze the pattern. We look at
▪ Trend: linear, curved, clusters, no pattern
▪ Direction: positive, negative, no direction
▪ Strength: how closely the points fit the trend. The correlation, r, measures the strength of the linear
association and ranges from -1 to +1, with values closer to -1 or +1 indicating a stronger linear
association and values near 0 indicating a weak one.
We can use the regression line, $\hat{y} = a + bx$, to describe the trend and to make predictions. We can use $r^2$ (the
correlation value squared) to judge the accuracy of the predictions and to indicate the proportion of variability in the
response variable that the explanatory variable explains.
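The correlation and the regression line can be computed by hand for a small data set. This is a minimal sketch: the data are hypothetical, the helper names are our own, and the slope uses the standard least-squares relationship b = r(s_y / s_x) with intercept a = ȳ − b·x̄.

```python
import math

# Hypothetical paired data, made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def correlation(x, y):
    """Correlation r between two equal-length lists of numbers."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def regression_line(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b * x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    # b = r * (s_y / s_x); the n - 1 divisors cancel in the ratio.
    b = correlation(x, y) * math.sqrt(syy / sxx)
    a = y_bar - b * x_bar
    return a, b

r = correlation(x, y)
a, b = regression_line(x, y)
print("r =", round(r, 3), " r^2 =", round(r ** 2, 3))
print("y-hat =", round(a, 2), "+", round(b, 2), "* x")
# Predict only inside the observed range of x (here 1 to 5).
print("prediction at x = 3:", round(a + b * 3, 2))
```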
With scatterplots, we look at regression outliers carefully. We may deem them to be influential points if they fall at
the extreme values of x or if excluding them changes the regression line or correlation value dramatically. We are
cautious about extrapolating beyond the range of data values that we have.
We know that association does not imply causation. Lurking or confounding variables may influence the
association. An extreme example is Simpson’s paradox, where the direction of the association reverses when we
introduce a third variable.
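Simpson's paradox is easier to believe after seeing numbers. The counts below are hypothetical and were chosen specifically to produce the reversal. The labels "A", "B", and the two strata are placeholders for a treatment variable and a third variable.

```python
# Hypothetical (successes, total) counts, made up to show the reversal.
counts = {
    "stratum 1": {"A": (90, 100), "B": (800, 1000)},
    "stratum 2": {"A": (300, 1000), "B": (20, 100)},
}

# Success rate for each group within each level of the third variable.
within = {
    stratum: {group: s / t for group, (s, t) in groups.items()}
    for stratum, groups in counts.items()
}

# Success rate for each group after ignoring the third variable.
overall = {}
for group in ("A", "B"):
    successes = sum(counts[stratum][group][0] for stratum in counts)
    total = sum(counts[stratum][group][1] for stratum in counts)
    overall[group] = successes / total

print("within strata:", within)
print("overall:", overall)
```

Within each stratum, A has the higher success rate, yet B has the higher rate overall. The reversal happens because A and B are spread very unevenly across the strata.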

Chapter 4: How are data gathered? What does good design look like?
Observational studies vs. Experiments
Observational studies are prone to lurking variables because there is no randomization.
Sample surveys can use simple random sampling (SRS), cluster random sampling, and/or stratified random sampling. A
sampling frame provides the list of all possible subjects. Under SRS, each subject in the sampling frame has the
same chance of being selected. Under cluster random sampling, clusters (naturally existing groups) are randomly selected
and then all subjects in those clusters are surveyed. Finally, under stratified sampling, strata are determined using
subjects’ characteristics and then subjects are randomly selected from each stratum.
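The three sampling methods can be compared side by side in a short sketch. Everything here is hypothetical: the 12-subject sampling frame, the "dorm" clusters, the "year" strata, and the seed, which is arbitrary and only makes the output repeatable.

```python
import random

rng = random.Random(215)  # arbitrary seed, used only for reproducibility

# Hypothetical sampling frame: 12 subjects, 3 dorms, 2 class years.
frame = [
    {"id": i, "dorm": "ABC"[i % 3], "year": "first" if i < 6 else "second"}
    for i in range(12)
]

def simple_random_sample(frame, n, rng):
    """Every subject in the frame has the same chance of being selected."""
    return rng.sample(frame, n)

def cluster_sample(frame, key, n_clusters, rng):
    """Randomly select whole clusters, then survey everyone in them."""
    clusters = sorted({subject[key] for subject in frame})
    chosen = set(rng.sample(clusters, n_clusters))
    return [subject for subject in frame if subject[key] in chosen]

def stratified_sample(frame, key, n_per_stratum, rng):
    """Randomly select the same number of subjects from each stratum."""
    sample = []
    for stratum in sorted({subject[key] for subject in frame}):
        members = [s for s in frame if s[key] == stratum]
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

srs = simple_random_sample(frame, 4, rng)
cluster = cluster_sample(frame, "dorm", 1, rng)
stratified = stratified_sample(frame, "year", 2, rng)
print([s["id"] for s in srs])
print([s["id"] for s in cluster])
print([s["id"] for s in stratified])
```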
Observational studies can be retrospective, prospective, or cross-sectional (at a given point in time). One special
case of a retrospective study is the case-control study, where equal numbers of subjects with a condition and
without the condition are asked about their previous experiences with the explanatory variable (e.g., lung cancer
and smoking).
When sampling for surveys it is important to guard against biases, including: sampling bias (due to undercoverage
or volunteer samples), response bias, and nonresponse bias. In contrast to sample surveys, a census attempts to
survey every subject in the population.
Experiments, when well-designed, can help us establish causality between the response and explanatory variables.
Good design includes drawing a representative sample, using random assignment for the experimental treatment
and control treatment, and avoiding bias through double-blinding.
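Random assignment can be sketched as shuffling the subjects and splitting the shuffled list in half. The subject labels, the function name, and the seed are hypothetical. The even two-way split is one simple choice, not the only valid design.

```python
import random

def randomly_assign(subjects, rng):
    """Shuffle the subjects, then split them into treatment and control."""
    shuffled = list(subjects)  # copy so the original list is not changed
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical subjects and an arbitrary seed for reproducibility.
subjects = [f"subject {i}" for i in range(1, 11)]
groups = randomly_assign(subjects, random.Random(4))
print(groups["treatment"])
print(groups["control"])
```

Because chance alone decides who gets the treatment, lurking variables tend to balance out between the two groups.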
Multi-factor experiments have at least two explanatory variables and are interested in the effects of each
explanatory variable as well as their interaction. Blocking, matched-pairs, and crossover designs are other
experimental techniques.
