Mod 3 Statistical Methods
Statistics
Statistical Methods
Distributions
Statistical Methods
Distributions - Types
Probability Distribution - A probability distribution is a function that describes the likelihood of obtaining the
possible values that a random variable can assume. In other words, how the values of the variable vary is governed
by the underlying probability distribution.
A function (or mapping) of events to probabilities
Motivation:
Distributions provide a convenient way to estimate or predict the probabilities of events using historical data and
experience (or assumptions)
Methods:
• Using histograms
• Using probability density functions
• Using cumulative distribution functions
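As a minimal sketch of these three methods (assuming Python with NumPy and SciPy, which the slides do not prescribe), the snippet below estimates probabilities from a sample in three ways: a normalized histogram, a fitted probability density function, and a cumulative distribution function. The data and numbers are invented for illustration.

```python
import numpy as np
from scipy import stats

# Invented "historical data": 1,000 observations assumed to be roughly normal.
rng = np.random.default_rng(0)
data = rng.normal(loc=70.0, scale=10.0, size=1_000)

# 1) Histogram: bin the data and normalize counts into probabilities per bin.
counts, edges = np.histogram(data, bins=20)
hist_probs = counts / counts.sum()               # P(value falls in each bin)

# 2) Probability density function: fit a normal model and evaluate its density.
mu, sigma = stats.norm.fit(data)
density_at_75 = stats.norm.pdf(75.0, mu, sigma)

# 3) Cumulative distribution function: P(X <= 75) from the fitted model
#    and directly from the data (empirical CDF).
p_le_75_model = stats.norm.cdf(75.0, mu, sigma)
p_le_75_empirical = np.mean(data <= 75.0)

print(hist_probs.sum())                          # ~1.0 by construction
print(density_at_75, p_le_75_model, p_le_75_empirical)
```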
Statistical Methods
Distributions - Types
A discrete probability distribution (applicable to scenarios where the set of possible outcomes is
discrete, such as a coin toss or a roll of a die) can be encoded by a discrete list of the probabilities of the
outcomes, known as a probability mass function.
A continuous probability distribution (applicable to scenarios where the possible outcomes take values in a
continuous range, e.g. the real numbers, such as the temperature on a given day) is typically
described by a probability density function (with the probability of any individual outcome actually being 0).
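A hedged illustration of the difference (again assuming SciPy; the distributions and parameter values are illustrative, not from the slides): a binomial PMF assigns a genuine probability to an individual outcome, while a normal PDF gives a density whose values only become probabilities when integrated over an interval.

```python
from scipy import stats

# Discrete case: probability mass function of a fair coin tossed 10 times.
# P(exactly 4 heads) is a real probability of a single outcome.
p_four_heads = stats.binom.pmf(4, n=10, p=0.5)

# Continuous case: a normal model for daily temperature (illustrative
# mean/standard deviation). The density at 25.0 is NOT a probability;
# any single point has probability 0, so we integrate over an interval.
temp = stats.norm(loc=22.0, scale=4.0)
density_at_25 = temp.pdf(25.0)                          # curve height only
p_between_20_and_25 = temp.cdf(25.0) - temp.cdf(20.0)   # an actual probability

print(p_four_heads, density_at_25, p_between_20_and_25)
```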
Statistical Methods
Statistical Inference
Statistical inference is the process of using data analysis to deduce properties of an underlying probability
distribution.
Statistical inference makes propositions about a population, using data drawn from the population with
some form of sampling. Given a hypothesis about a population, for which we wish to draw inferences,
statistical inference consists of (first) selecting a statistical model of the process that generates the data and
(second) deducing propositions from the model.
Inferential statistical analysis infers properties of a population, for example by testing hypotheses and
deriving estimates. It is assumed that the observed data set is sampled from a larger population.
Descriptive statistics is solely concerned with properties of the observed data, and it does not rest on the
assumption that the data come from a larger population.
The conclusion of a statistical inference is a statistical proposition.
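The two steps above can be made concrete with a small sketch (assuming SciPy; the model, sample, and hypothesized mean are invented): first select a statistical model (i.i.d. normal observations with unknown mean), then deduce propositions from it in the form of an estimate, a confidence interval, and a hypothesis test.

```python
import numpy as np
from scipy import stats

# Step 1: choose a statistical model of the data-generating process:
# observations are assumed i.i.d. normal with unknown mean mu.
rng = np.random.default_rng(1)
sample = rng.normal(loc=5.2, scale=1.5, size=40)    # data drawn from the population

# Step 2: deduce propositions from the model.
mean_hat = sample.mean()                            # point estimate of mu
ci_low, ci_high = stats.t.interval(0.95, len(sample) - 1,
                                   loc=mean_hat, scale=stats.sem(sample))

# Hypothesis test: H0 states that the population mean equals 5.0.
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)

print(f"estimate={mean_hat:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f}), p={p_value:.3f}")
```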
Statistical Methods
Statistical Inference
Statistical Methods
Statistical Inference Procedure
Statistical Methods
Importance of Statistical Inference
Statistical Methods
Analysis of variance
Statistical Methods
Analysis of variance
ANOVA is a form of statistical hypothesis testing heavily used in the analysis of experimental data.
A test result (calculated from the null hypothesis and the sample) is called statistically significant if it is
deemed unlikely to have occurred by chance, assuming the truth of the null hypothesis.
A result is statistically significant when the p-value is less than a pre-specified threshold (the significance
level); this justifies rejecting the null hypothesis, but only if the a priori probability of the null hypothesis is
not high.
In the typical application of ANOVA, the null hypothesis is that all groups are random samples from the same
population.
For example, when studying the effect of different treatments on similar samples of patients, the null
hypothesis would be that all treatments have the same effect (perhaps none).
Rejecting the null hypothesis is taken to mean that the differences in observed effects between treatment
groups are unlikely to be due to random chance.
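As a hedged sketch of this treatment example (assuming SciPy; the group sizes and response values are invented), a one-way ANOVA compares the group means and reports a p-value that is checked against the pre-specified significance level:

```python
import numpy as np
from scipy import stats

# Invented responses for three patient groups, one treatment per group.
# H0: all groups are random samples from the same population
# (every treatment has the same effect).
rng = np.random.default_rng(2)
treatment_a = rng.normal(50.0, 8.0, size=30)
treatment_b = rng.normal(50.0, 8.0, size=30)
treatment_c = rng.normal(57.0, 8.0, size=30)     # this treatment actually differs

f_stat, p_value = stats.f_oneway(treatment_a, treatment_b, treatment_c)

alpha = 0.05                                     # pre-specified significance level
if p_value < alpha:
    print(f"F={f_stat:.2f}, p={p_value:.4f}: reject H0 (group means differ)")
else:
    print(f"F={f_stat:.2f}, p={p_value:.4f}: fail to reject H0")
```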
Statistical Methods
Analysis of variance
ANOVA is the synthesis of several ideas and it is used for multiple purposes; as a consequence, it is difficult to
define concisely or precisely.
ANOVA is a statistical tool used in several ways to develop and confirm an explanation for the observed data.
Statistical Methods
Analysis of variance
Statistical Methods
Classes of models
Fixed-effects models
The fixed-effects model (class I) of analysis of variance applies to situations in which the
experimenter applies one or more treatments to the subjects of the experiment to see whether the
response variable values change. This allows the experimenter to estimate the ranges of response
variable values that the treatment would generate in the population as a whole.
Random-effects models
Random-effects model (class II) is used when the treatments are not fixed. This occurs when the
various factor levels are sampled from a larger population. Because the levels themselves are
random variables, some assumptions and the method of contrasting the treatments (a multi-
variable generalization of simple differences) differ from the fixed-effects model.[19]
Statistical Methods
Classes of models
• Mixed-effects models
• A mixed-effects model (class III) contains experimental factors of both fixed and
random-effects types, with appropriately different interpretations and analysis for
the two types.
• Example: Teaching experiments could be performed by a college or university
department to find a good introductory textbook, with each text considered a
treatment. The fixed-effects model would compare a list of candidate texts. The
random-effects model would determine whether important differences exist among
a list of randomly selected texts. The mixed-effects model would compare the (fixed)
incumbent texts to randomly selected alternatives.
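A hedged sketch of how this textbook example might be analyzed (assuming pandas and statsmodels are available; the column names score, text, and section, and all data values, are invented for illustration): the candidate text is the fixed effect, and the randomly selected class section is the random-effects grouping.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Invented data: exam scores for students taught from one of three candidate
# texts (fixed effect) in several randomly selected class sections
# (random effect / grouping factor).
rng = np.random.default_rng(3)
rows = []
for section in range(8):                         # randomly sampled sections
    section_shift = rng.normal(0.0, 3.0)         # section-level random effect
    for text, text_shift in {"A": 0.0, "B": 2.0, "C": 5.0}.items():
        for _ in range(10):                      # students per section/text cell
            rows.append({"section": section, "text": text,
                         "score": 70.0 + text_shift + section_shift
                                  + rng.normal(0.0, 6.0)})
df = pd.DataFrame(rows)

# Mixed-effects model: fixed effect for the text, random intercept per section.
model = smf.mixedlm("score ~ C(text)", data=df, groups=df["section"])
result = model.fit()
print(result.summary())
```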
Statistical Methods
Analysis of variance
• One-way ANOVA is used to test for differences among two or more independent groups (means), e.g.
different levels of urea application in a crop, different levels of antibiotic action on several different
bacterial species, or different levels of effect of some medicine on groups of patients. However, if these
groups are not independent and there is an order among them (such as mild, moderate and severe disease),
or in the dose of a drug (such as 5 mg/mL, 10 mg/mL, 20 mg/mL) given to the same group of patients, then
linear trend estimation should be used. The one-way ANOVA is used to test for differences among at least
three groups, since the two-group case can be covered by a t-test. When there are only two means to
compare, the t-test and the ANOVA F-test are equivalent; the relation between ANOVA and t is given by F = t².
• Factorial ANOVA is used when the experimenter wants to study the interaction effects among the
treatments.
• Repeated measures ANOVA is used when the same subjects are used for each treatment (e.g., in a
longitudinal study).
• Multivariate analysis of variance (MANOVA) is used when there is more than one response variable.
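For the MANOVA case, here is a hedged sketch (assuming pandas and statsmodels are available; the data and column names are invented) in which two response variables are modeled jointly against a single grouping factor:

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Invented data: two response variables measured in three treatment groups.
rng = np.random.default_rng(5)
n = 30
df = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], n),
    "y1": np.concatenate([rng.normal(m, 1.0, n) for m in (0.0, 0.5, 1.0)]),
    "y2": np.concatenate([rng.normal(m, 1.0, n) for m in (0.0, 0.0, 0.8)]),
})

# MANOVA: both responses are modeled jointly against the group factor.
mv = MANOVA.from_formula("y1 + y2 ~ group", data=df)
print(mv.mv_test())   # Wilks' lambda, Pillai's trace, and related tests
```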
Statistical Methods
One-way ANOVA
One-way analysis of variance (abbreviated one-way ANOVA) is a technique that can be used to
compare means of two or more samples (using the F distribution).
This technique can be used only for numerical response data, the "Y", usually one variable, and
numerical or (usually) categorical input data, the "X", always one variable, hence "one-way".
The one-way ANOVA is used to test for differences among at least three groups, since the two-group
case can be covered by a t-test.
When there are only two means to compare, the t-test and the F-test are equivalent; the relation
between ANOVA and t is given by F = t².
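To make this two-group equivalence concrete, here is a hedged check (assuming SciPy; the data are invented) that the one-way ANOVA F statistic equals the square of the two-sample t statistic:

```python
import numpy as np
from scipy import stats

# Two invented groups with equal variance.
rng = np.random.default_rng(4)
group1 = rng.normal(10.0, 2.0, size=25)
group2 = rng.normal(11.0, 2.0, size=25)

t_stat, p_t = stats.ttest_ind(group1, group2)    # two-sample t-test (equal variances)
f_stat, p_f = stats.f_oneway(group1, group2)     # one-way ANOVA with two groups

print(f"t^2 = {t_stat**2:.4f}, F = {f_stat:.4f}")       # identical up to rounding
print(f"p-values: t-test {p_t:.4f}, ANOVA {p_f:.4f}")   # also identical
```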
An extension of one-way ANOVA is two-way analysis of variance that examines the influence of two
different categorical independent variables on one dependent variable.
Statistical Methods
Multivariate Analysis of Variance (MANOVA)
Statistical Methods
THANK YOU