PIFACE Manual (English)
This dialog provides for sample-size determination for estimating a proportion to within a
specified margin of error, for either a finite population of specified size, or an infinite population
(or sampling with replacement).
The confidence interval is of the form p +/- ME, where p is the sample proportion and ME is the
margin of error:

    ME = z * sqrt( p*(1-p)/n * (N - n)/(N - 1) )

where z is a critical value from the normal distribution, p is the sample proportion, n is the
sample size, and N is the population size. (For an infinite population, the factor
(N - n)/(N - 1) is omitted.)
The dialog is designed such that a sample size n is computed whenever you change any of the
other input values. If you change n, a new ME is computed (using pi in place of p in the
above formula).
Finite population
If "Finite population" is checked, calculations are based on the population size N entered in the
adjacent input field. If the box is unchecked, the "N" field is hidden, and calculations are based
on an infinite population.
Worst case
If "Worst case" is checked, computations are based on the assumption that the true population
proportion, pi, is .5. If it is not checked, then a pi value other than .5 may be entered in the field
to the right of the checkbox.
Confidence
Choose the desired confidence coefficient for the ME. This determines the value of z.
Margin of Error
Enter the target value of the margin of error here. Note that the actual value achieved after you
collect the data will be more or less than this amount, because it will be based on p (estimated
from the data) rather than pi.
Sample size
Enter the sample size here if you want to see what margin of error is achieved with that size
sample. When you change n, it is rounded to the nearest integer; otherwise, n is not rounded.
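As a rough sketch of the computation behind this dialog (the function name, defaults, and the solved-for-n expression are illustrative, not part of PIFACE), the sample size can be obtained by solving the ME formula for n:

```python
import math
from statistics import NormalDist

def n_for_proportion_me(me, confidence=0.95, pi=0.5, N=None):
    """Smallest n with ME = z*sqrt(pi*(1-pi)/n * (N-n)/(N-1)) <= me.

    pi=0.5 is the worst case; N=None means an infinite population.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z * z * pi * (1 - pi) / me**2       # infinite-population solution
    if N is None:
        return math.ceil(n0)
    # finite-population correction, solved for n
    return math.ceil(n0 * N / (N - 1 + n0))
```

For the classic worst case (pi = .5, 95% confidence, ME = .03) this gives n = 1068 for an infinite population; a finite population reduces the required n.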
Note on sliders
Sliders may be varied by clicking or dragging along the center of the scale. If you drag along at
the level of the tick labels, you can change the endpoints of the scale. The small button at the
upper right of each slider converts it to and from a keyboard-input mode. In that mode, you can
also change the scale endpoints and number of digits displayed.
Graphics
A simple graphics facility is available from the Options menu. Enter the variables for the x and y
axes, and the range of values for the x variable. More information is available through the
graphics dialog's help menu.
--------------------
Jacob Cohen has proposed effect-size measures for several statistical procedures, and has
defined "small," "medium," and "large" effect sizes accordingly. These effect-size measures are
standardized quantities, e.g. an absolute difference of means divided by the standard deviation.
They are quite popularly used in sample-size problems, because they are so easy to use; you
don't have to think very hard to get an answer.
And that's the rub. You don't have to think nearly enough! Planning a study always requires
careful thought: what is the goal, how do we operationalize the research question, what do we
measure and how, what study design is needed, what result would be of scientific importance,
etc. None of those issues are addressed in specifying small, medium, or large on a
standardized scale. If you really care about the scientific merits of your work, then you should
not take this easy route. And I certainly will not help you do it using my software.
Suppose that a study involves measuring the thickness of fibers. There are various instruments
that could be used to do that. It makes sense that if an inaccurate instrument is used, you
should have more observations in the experiment than if you use really accurate
measurements. However, using, say, a "medium" effect size in the planning, you get the SAME
sample size regardless of whether you use a micrometer, caliper, or 6-inch plastic ruler.
That's because Cohen's measures are standardized. Using a micrometer, a medium effect is
perhaps a thousandth of an inch in absolute terms; whereas, using a ruler, a medium effect is
perhaps an eighth of an inch. If a .01-inch difference in mean fiber thicknesses is considered to
be important, then the plastic-ruler study is useless, while the micrometer study is over-powered
and could be done adequately with fewer data.
To do a responsible job of planning the study, you need to decide (1) what effect size, in
ABSOLUTE units (e.g., inches in the above example), is of importance from a scientific point of
view; and (2) how variable are the measurements (e.g., accuracy of instrumentation). Typically,
these are both hard questions. Question (1) requires a lot of thought and discussion. Question
(2) requires some experience with similar measurements, and/or a pilot study.
It is certainly a lot easier to talk about the ratio of (1) and (2), as Cohen does, rather than the
two quantities separately. But it is not good science.
For more discussion, see my article in a refereed publication of the American Statistical
Association:
Lenth, R.V. (2001), "Some Practical Considerations for Effective Sample Size Determination,"
The American Statistician, 55, 187-193.
This dialog provides for power analysis of a one-sample test of proportions. You have your
choice of the normal approximation, the beta approximation (which is more accurate), or two
exact calculations based on the binomial distribution. In the latter cases, the size of the test
(i.e., the probability of a type I error) is often much smaller than the stated significance level.
MORE DETAILS
Common notations: Suppose that X is a binomial random variable with "success" probability p
and n independent trials, and let P = X/n.
Normal approximation:
We approximate the distribution of P by N(mu=p, sigma^2=p*(1-p)/n). This has the same mean
and variance as the true distribution of P. The critical region is obtained using this approximation
with p=p0, then the power is computed using this approximation with p=p1.
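A minimal sketch of this calculation (the function and its argument conventions are illustrative, not PIFACE's own code):

```python
import math
from statistics import NormalDist

def power_normal_approx(n, p0, p1, alpha=0.05, alternative="greater"):
    """Power of the one-sample proportion test via the normal approximation.

    The critical region uses P ~ N(p0, p0*(1-p0)/n); the power is then
    computed from P ~ N(p1, p1*(1-p1)/n).
    """
    nd = NormalDist()
    se0 = math.sqrt(p0 * (1 - p0) / n)
    se1 = math.sqrt(p1 * (1 - p1) / n)
    if alternative == "greater":            # H1: p > p0
        crit = p0 + nd.inv_cdf(1 - alpha) * se0
        return 1 - NormalDist(p1, se1).cdf(crit)
    if alternative == "less":               # H1: p < p0
        crit = p0 - nd.inv_cdf(1 - alpha) * se0
        return NormalDist(p1, se1).cdf(crit)
    raise ValueError("alternative must be 'greater' or 'less'")
```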
Beta approximation:
We approximate the distribution of P by Beta(a=(n-1)*p, b=(n-1)*(1-p)). This has the same mean
and variance as the true distribution of P. The critical region is obtained using this approximation
with p=p0, then the power is computed using this approximation with p=p1.
Exact:
In the exact test, the significance level alpha is taken as an upper bound on the size of the test
(its power under the null hypothesis). Since X has a discrete distribution, the size cannot be
controlled exactly and is often much lower than the specified alpha.
For the alternative p < p0, let xl denote the largest x for which Pr(X <= x | p = p0) <= alpha.
Then the power is equal to Pr(X <= xl | p = p1) and the size of the test is Pr(X <= xl | p = p0).
For the alternative p > p0, let xu denote the smallest x for which Pr(X >= x | p = p0) <= alpha.
Then the power is equal to Pr(X >= xu | p = p1), and the size of the test is Pr(X >= xu | p = p0).
For the alternative p != p0, compute both powers as above but replace alpha by alpha/2; then
sum the powers and sizes.
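The procedure for the alternative p &lt; p0 can be sketched as follows (helper names are illustrative):

```python
from math import comb

def binom_cdf(x, n, p):
    """Pr(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def exact_power_less(n, p0, p1, alpha=0.05):
    """Power and size of the exact test of H1: p < p0.

    xl is the largest x with Pr(X <= x | p0) <= alpha; the test rejects
    when X <= xl.
    """
    xl = -1
    for x in range(n + 1):
        if binom_cdf(x, n, p0) <= alpha:
            xl = x
        else:
            break
    if xl < 0:
        return 0.0, 0.0                     # empty critical region
    power = binom_cdf(xl, n, p1)
    size = binom_cdf(xl, n, p0)
    return power, size
```

Note how the returned size (here about .021 for n = 20, p0 = .5, alpha = .05) can be well below the nominal alpha, as described above.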
Adjusted Wald:
This is like the exact method, except the critical values are calculated based on an adjusted
Wald statistic. This does NOT guarantee that the size of the test is less than alpha.
Let

    pAdj = (x + 2) / (n + 4),
    z = (pAdj - p0) / sqrt(pAdj * (1 - pAdj) / (n + 4)),

and let zCrit be the corresponding standard normal critical value (e.g., for a two-sided
alternative, the 1 - alpha/2 quantile).
Then xl is the largest x such that z <= -zCrit, and xu = smallest x such that z >= zCrit. Compute
the power as in the exact method using these critical values.
Note: This is the exact method used in the JMP statistical software.
This dialog provides for power analysis of a z test comparing two proportions, p1 and p2. The
test statistic is

    Z = (ph1 - ph2) / sqrt( pbar * (1 - pbar) * (1/n1 + 1/n2) ),

where ph1 and ph2 are estimates of p1 and p2 based on n1 and n2 trials, and
pbar = (n1*ph1 + n2*ph2) / (n1 + n2) is the pooled estimate.
If the continuity correction is used, the numerator is decreased in absolute value by .5 * (1/n1 +
1/n2) (at most).
Under the null hypothesis, Z is taken to be N(0,1). Under the alternative hypothesis, the
variance of Z is not equal to 1 because then p1 and p2 are not averaged.
Note: For numerical stability, the ranges of p1 and p2 are limited so that min[n1p1, n1(1 - p1)]
>= 5 and min[n2p2, n2(1 - p2)] >= 5.
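A sketch of the implied two-sided power calculation, without the continuity correction (the function is illustrative, not PIFACE's own code):

```python
import math
from statistics import NormalDist

def power_two_proportions(p1, p2, n1, n2, alpha=0.05):
    """Approximate two-sided power of the pooled two-sample z test.

    The critical region uses the pooled (null) SE; the power is computed
    using the unpooled SE, since p1 and p2 are not averaged under H1.
    """
    nd = NormalDist()
    zcrit = nd.inv_cdf(1 - alpha / 2)
    pbar = (n1 * p1 + n2 * p2) / (n1 + n2)
    se0 = math.sqrt(pbar * (1 - pbar) * (1 / n1 + 1 / n2))    # null SE
    se1 = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # alt. SE
    d = p1 - p2
    return (nd.cdf((-zcrit * se0 - d) / se1)
            + 1 - nd.cdf((zcrit * se0 - d) / se1))
```

When p1 = p2 the two SEs coincide and the power reduces to alpha, as it should.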
This dialog provides for sample-size determination for estimating a mean to within a specified
margin of error, for either a finite population of specified size, or an infinite population (or
sampling with replacement).
The confidence interval is of the form y-bar +/- ME, where y-bar is the sample mean and ME is
the margin of error:

    ME = t * (s / sqrt(n)) * sqrt( (N - n) / N )

where t is a critical value from the t distribution, s is the sample SD, n is the sample size, and N
is the population size. (For an infinite population, the factor sqrt((N - n)/N) is omitted.)
The dialog is designed such that a sample size n is computed whenever you change any of the
other input values. If you change n, a new ME is computed (using sigma in place of s in the
above formulas).
Finite population
If "Finite population" is checked, calculations are based on the population size N entered in the
adjacent input field. If the box is unchecked, the "N" field is hidden, and calculations are based
on an infinite population.
Confidence
Choose the desired confidence coefficient for the ME. This determines the value of t.
Sigma
Enter your best guess at the population SD (based on historical or pilot data). For a finite
population, sigma^2 is defined as the sum of squared deviations from the population mean,
divided by N-1.
Margin of Error
Enter the target value of the margin of error here. Note that the actual value achieved after you
collect the data will be more or less than this amount, because it will be based on s (estimated
from the data) rather than sigma.
Sample size
Enter the sample size here if you want to see what margin of error is achieved with that size
sample. When you change n, it is rounded to the nearest integer; otherwise, n is not rounded.
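As an illustrative sketch (not PIFACE's own code), the computation can be written as follows; the normal critical value is used here in place of t for simplicity, so results for small n will differ slightly:

```python
import math
from statistics import NormalDist

def n_for_mean_me(me, sigma, confidence=0.95, N=None):
    """Approximate n with ME = crit * (sigma/sqrt(n)) * sqrt((N-n)/N) <= me.

    Uses the normal critical value in place of t (good for moderate n);
    N=None means an infinite population.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = (z * sigma / me) ** 2              # infinite-population solution
    if N is None:
        return math.ceil(n0)
    return math.ceil(n0 / (1 + n0 / N))     # finite-population correction
```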
Note on sliders
Sliders may be varied by clicking or dragging along the center of the scale. If you drag along at
the level of the tick labels, you can change the endpoints of the scale. The small button at the
upper right of each slider converts it to and from a keyboard-input mode. In that mode, you can
also change the scale endpoints and number of digits displayed.
Graphics
A simple graphics facility is available from the Options menu. Enter the variables for the x and y
axes, and the range of values for the x variable. More information is available through the
graphics dialog's help menu.
This dialog provides for power analysis of a one-sample t test or a paired t test. The effect size
is the difference between the null and the target mean (in the one-sample test), or the mean
difference of the pairs (in the paired t test).
You may choose whether to solve for effect size or sample size when you click on the "Power"
slider. The current values are scaled upward or downward to make the power come out right,
while preserving (at least approximately) the ratio of the two SDs or two ns.
This dialog provides for power analysis of a two-sample t test or a two-sample t test of
equivalence. If the "equal SDs" box is checked, then the pooled t test is used; otherwise, the
calculations are based on the Satterthwaite approximation.
You have three choices for sample-size allocation. "Independent" lets you specify n1 and n2
separately; "Equal" forces n1 = n2; and "Optimal" sets the ratio n1/n2 equal to sigma1/sigma2
(which minimizes the SE of the difference of the sample means).
You have a choice between using power or ROC area as the criterion for deciding sample size
(see the section below on ROC area for additional explanation).
You may choose whether to solve for effect size or sample size when you click on the "Power"
(or "ROC area") slider. The current values are scaled upward or downward to make the power
come out right, while preserving (at least approximately) the ratio of the two SDs or two ns.
To study a test of equivalence, check the "Equivalence" checkbox. A "Threshold" window will
appear for entering the negligible difference of means. An equivalence test is a test of the
hypotheses H0: |mu1 - mu2| >= threshold versus H1: |mu1 - mu2| < threshold. The test is done
by performing two one-sided tests of

    H01: mu1 - mu2 <= -threshold   versus   H11: mu1 - mu2 > -threshold,
    H02: mu1 - mu2 >= threshold    versus   H12: mu1 - mu2 < threshold.

Then H0 is rejected only if both H01 and H02 are rejected. If both tests are of size alpha, then
the size of the two tests together is at most alpha -- that's because H01 and H02 are disjoint.
Another way to look at this test is to construct a 100(1 - 2*alpha) percent CI for (mu1 - mu2), and
reject H0 if this interval lies entirely inside the interval [-threshold, +threshold].
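A sketch of the decision rule and its CI interpretation, with a normal reference distribution standing in for t (the function name and arguments are illustrative):

```python
from statistics import NormalDist

def tost_equivalent(diff, se_diff, threshold, alpha=0.05):
    """Reject H0: |mu1 - mu2| >= threshold when both one-sided tests reject.

    diff = xbar1 - xbar2; se_diff = its standard error.
    """
    z = NormalDist().inv_cdf(1 - alpha)
    t1 = (diff + threshold) / se_diff   # tests H01: mu1 - mu2 <= -threshold
    t2 = (diff - threshold) / se_diff   # tests H02: mu1 - mu2 >= threshold
    reject = t1 > z and t2 < -z
    # Equivalent view: 100(1 - 2*alpha)% CI lies inside [-threshold, +threshold]
    lo, hi = diff - z * se_diff, diff + z * se_diff
    assert reject == (lo > -threshold and hi < threshold)
    return reject
```

The internal assertion checks that the two formulations of the test agree, as the text claims.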
Options menu
"Use integrated power" is an experimental enhancement. Consider the plot of power versus
alpha; integrated power is the area under that curve. (This curve is also known as the ROC -
receiver operating characteristic - curve.) The integrated power, or area under the ROC curve,
is the average power over all possible values of alpha (therefore, it does not depend on alpha).
Consider two hypothetical studies of identical size and design. Suppose that in one of them,
there is no difference between the means, and in the other, the difference is the value specified
in the dialog. We compute the t statistic in each study. Then the integrated power is the
probability that the t statistic for the second (non-null) study is "more significant" than the one
from the null study. That is, it is the chance that we will correctly distinguish the null and non-
null studies. The lowest possible integrated power is .5 except in unreasonable situations (such
as using a right-tailed test when the difference is negative).
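For a one-sided z test (a simplified stand-in for the t test), this description can be checked numerically: integrating power over alpha reproduces the probability that the non-null study's statistic beats the null study's, which has the closed form Phi(ncp/sqrt(2)). The function below is illustrative:

```python
import math
from statistics import NormalDist

def integrated_power_z(ncp, steps=2000):
    """Area under the power-vs-alpha (ROC) curve for a one-sided z test
    with noncentrality ncp, by midpoint-rule numerical integration."""
    nd = NormalDist()
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        alpha = (i + 0.5) * h
        # power at this alpha: Pr(Z > z_{1-alpha}) when Z ~ N(ncp, 1)
        total += (1 - nd.cdf(nd.inv_cdf(1 - alpha) - ncp)) * h
    return total
```

With ncp = 0 (the null case) the curve is the diagonal and the area is .5, the lowest possible value mentioned above.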
An advantage of using integrated power instead of power is that it doesn't require you to specify
the value of alpha (so the "alpha" widget disappears when you select the integrated power
method). Also, it is somewhat removed from the trappings of hypothesis testing, in that in the
above description, the t statistic is only used as a measure of effect size, not as a decision rule.
This may make it more palatable to some people (please tell me what you think!). A suggested
target integrated power is 0.95 - roughly comparable to a power of .80 at alpha = .05.
The above description of integrated power is for a regular t test, as opposed to an equivalence
test. In an equivalence test, the analogous definition and interpretation applies, but in cases
where the threshold is too small relative to sigma1 and sigma2, the power function is severely
bounded and this can make the integrated power less than .5.
This is a simple interface for studying power and sample- size problems for simple or multiple
linear regression models. It is designed to study the power of testing one predictor, x[j], in the
presence of other predictors. The power of the t test of a regression coefficient depends on the
error SD, the SD of the predictor itself, and the multiple correlation between that predictor and
other predictors in the model. The latter is related to the variance inflation factor. It is assumed
that the intercept is included in the model.
No. of predictors: Enter the total number of predictors (independent variables) in the regression
model. SD of x[j]: Enter the standard deviation of the values of the predictor of interest.
VIF[j]: (This slider appears only when there is more than one predictor.) Enter the variance-
inflation factor for x[j]. In an experiment where you can actually control the x values, you
probably should use an orthogonal design where all of the predictors are mutually uncorrelated
-- in which case all the VIFs are 1. Otherwise, you need some kind of pilot data to understand
how the predictors are correlated, and you can estimate the VIFs from an analysis of those
data.
Two-tailed: Check or uncheck this box depending on whether you plan to use a two-tailed or a
one-tailed test. If it is one-tailed, it is assumed right-tailed.
If a left-tailed test is to be studied, reverse the signs and think in terms of a right-tailed test.
Error SD: The SD of the errors from the regression model. You likely need pilot data or some
experience using the same measurement instrument.
Detectable beta[j]: The clinically meaningful value of the regression coefficient that you want to
be able to detect.
Sample size: The total number of observations in the regression dataset. This is forced to be at
least 2 greater than the number of predictors.
Power: The power of the t test, at the current settings of the parameter values.
Solve for: Determines what parameter is solved for when you change the value of the Power
slider.
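The dependence of power on the error SD, the predictor SD, and the VIF can be sketched with a normal approximation (PIFACE's own calculation uses the t distribution; the SE formula below is the usual large-sample approximation, and the function is illustrative):

```python
import math
from statistics import NormalDist

def regression_power_approx(n, beta_j, sd_xj, sigma, vif=1.0,
                            alpha=0.05, two_tailed=True):
    """Approximate power of the t test of beta_j, using a normal
    reference in place of the noncentral t.

    SE(beta_j) ~ sigma * sqrt(vif) / (sd_xj * sqrt(n)).
    """
    nd = NormalDist()
    se = sigma * math.sqrt(vif) / (sd_xj * math.sqrt(n))
    ncp = beta_j / se                       # approximate noncentrality
    z = nd.inv_cdf(1 - alpha / 2) if two_tailed else nd.inv_cdf(1 - alpha)
    power = 1 - nd.cdf(z - ncp)
    if two_tailed:
        power += nd.cdf(-z - ncp)           # other tail (usually tiny)
    return power
```

As expected from the VIF discussion, increasing vif shrinks the power at fixed n.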
This dialog is used to specify an ANOVA model for study in a power analysis. Once you fill in
the fields, clicking on one of the buttons at the bottom generates a graphical interface (GUI)
based on the model you specify.
The "Differences/Contrasts" button generates a GUI designed for studying the powers of
comparisons or contrasts among the levels of fixed factors, or combinations thereof. This is
probably what you want for most sample-size planning.
The "F tests" button creates a GUI for studying the powers of the F tests in the ANOVA table.
This is most useful when you want to study the powers of tests of random effects.
There are several built-in models; you may find what you want among them. These also serve
as examples of how to specify models.
"Model" is the only field that is required; there are defaults for the rest.
Title
Model
The terms in this model define the dialog. Separate the terms using "+" signs. Use "*" for
interactions, e.g., "A*B". Use "()", e.g., "Subj(Treat)". A "|" generates all main effects and
interactions, e.g., "A|B|C" is the same as "A + B + A*B + C + A*C + B*C + A*B*C".
Levels
You can set the starting number of levels for any factors in the model. (Since the levels
can be manipulated in the GUI, it is not mandatory to specify them here. The default for any
factor is 2 levels.) Specify levels in the form "name levels name levels ...", e.g., "A 2 B 3".
(1) Locking levels: A specification like "A=B 3" sets A and B to always have the same
number of levels, starting at 3.
(2) Fractional designs: If the name of a factor is preceded by "/", then the total number of
observations in the experiment is divided by the number of levels of that factor. For example,
"row= col=/treat 5" specifies a 5x5 Latin square.
Random factors
Any factors listed here are taken to have random levels. Give their names, separated by
spaces. These settings can be altered later in the F-test GUI, but NOT in the
differences/contrasts GUI.
Replicated
If this box is checked, then a "within-cell error" term is added to the model, and an additional
window appears to specify the starting number of replications. If the box is NOT checked, then
the design is taken to be unreplicated, and a "residual" term will be added to the model. If the
model is saturated, nothing can be tested unless one or more of the factors is random.
Finally, if there are replications but the model is not saturated, the GUI assumes a residual term
that pools the within-cell error with the unspecified terms.
The sample sizes are entered as n1 and n2. The "equal ns" checkbox forces them to be equal
when checked.
The variances to be compared are entered via the sliders for Variance 1 and Variance 2. Use
the drop-down list to specify the alternative hypothesis of interest. (The null hypothesis in all
cases is that Var1 = Var2.) Use the alpha slider to set the desired significance level for the
test.
The power slider displays the power of the test for the current parameter settings. This slider is
not clickable. To determine, say, the sample size for a given power, vary n1 and/or n2 until the
desired power is achieved.
This dialog provides rudimentary power analysis for a test of a coefficient of multiple
determination (R-square). The underlying model is that we have a sample of N iid multivariate
random vectors of length p, and that the pth variable is regressed on the first p-1 variables.
The F statistic is

        (n - k - 1) R^2
    F = ---------------
          k (1 - R^2)

where n = N is the sample size and k = p - 1 is the number of predictors.
This is the usual ANOVA F. The distinction that makes this dialog different from the one for
regular ANOVA is that the predictors are random. The power computed here is unconditional,
rather than conditional.
True rho^2 value: The population value of R^2 at which we want to compute the power.
Sample size: The number N of multivariate observations in the data set.
References:
Gatsonis, C. and Sampson, A. (1989), "Multiple Correlation: Exact Power and Sample Size
Calculations," Psychological Bulletin, 106, 516-524.
Note (9-18-06): This may still have some rough edges; the values obtained by my algorithms
seem to differ slightly from those provided in the Gatsonis and Sampson paper.
This dialog provides rudimentary power analysis of a chi-square test. Using prototype data of
sample size n*, compute the chi-square statistic Chi2*. Enter n* and Chi2* in the windows
provided; these define the effect size to use in the power calculations.
The prototype data should be fake data constructed to reflect the effect size of clinical or
scientific importance. Use the analysis you plan to do on these fake data to obtain the chi-
square value Chi2* for the dialog. The prototype data should include the expected frequencies
(or whatever), but should not include random fluctuations.
This dialog provides rudimentary power analysis for an exact test of a Poisson parameter.
Specifically, our data are assumed to be x_1, x_2, ..., x_n iid Poisson(lambda), so that x =
sum{x_i}, which is Poisson with mean n*lambda, serves as the test statistic. The critical value(s) for x
are obtained using quantiles of the null Poisson distribution so as to cut off tail(s) of probability
less than or equal to alpha. The power of the test is then the probability of the critical region,
computed from a Poisson distribution with the specified value of lambda.
lambda0: The value of the Poisson parameter under the null hypothesis.
size (output value): The actual probability of the critical region under the null hypothesis.
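The calculation for the alternative lambda &gt; lambda0 can be sketched as follows (helper names are illustrative, not PIFACE's own code):

```python
import math

def pois_cdf(x, mu):
    """Pr(X <= x) for X ~ Poisson(mu); x < 0 gives 0."""
    return sum(math.exp(-mu) * mu**k / math.factorial(k)
               for k in range(x + 1))

def poisson_power_greater(n, lam0, lam1, alpha=0.05):
    """Exact power and size for H1: lambda > lambda0, based on
    x = sum of the x_i ~ Poisson(n * lambda)."""
    mu0, mu1 = n * lam0, n * lam1
    xu = 0
    while 1 - pois_cdf(xu - 1, mu0) > alpha:    # Pr(X >= xu | lambda0)
        xu += 1                                 # smallest xu with size <= alpha
    size = 1 - pois_cdf(xu - 1, mu0)
    power = 1 - pois_cdf(xu - 1, mu1)
    return power, size
```

Because x is discrete, the returned size is typically below the nominal alpha, as noted above.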