
In statistics, the standard deviation is a measure of the amount of variation or dispersion of a set of values.[1] A low standard deviation indicates that the values tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the values are spread out over a wider range.
Standard deviation may be abbreviated SD, and is most commonly represented in mathematical
texts and equations by the lower case Greek letter sigma σ, for the population standard deviation, or
the Latin letter s, for the sample standard deviation.[2]
The standard deviation of a random variable, sample, statistical population, data set, or probability
distribution is the square root of its variance. It is algebraically simpler, though in practice,
less robust than the average absolute deviation.[3][4] A useful property of the standard deviation is that
unlike the variance, it is expressed in the same unit as the data.
The standard deviation of a population or sample and the standard error of a statistic (e.g., of the
sample mean) are quite different, but related. The sample mean's standard error is the standard
deviation of the set of means that would be found by drawing an infinite number of repeated samples
from the population and computing a mean for each sample. The mean's standard error turns out to
equal the population standard deviation divided by the square root of the sample size, and is
estimated by using the sample standard deviation divided by the square root of the sample size. For
example, a poll's standard error (what is reported as the margin of error of the poll), is the expected
standard deviation of the estimated mean if the same poll were to be conducted multiple times.
Thus, the standard error estimates the standard deviation of an estimate, which itself measures how
much the estimate depends on the particular sample that was taken from the population.
In science, it is common to report both the standard deviation of the data (as a summary statistic)
and the standard error of the estimate (as a measure of potential error in the findings). By
convention, only effects more than two standard errors away from a null expectation are
considered "statistically significant", a safeguard against spurious conclusions that are really due to random sampling error.
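As a quick illustration (a minimal sketch using only Python's standard library; the poll data are made up), the standard error of a sample mean can be estimated from the sample itself:

```python
import math
import statistics

# Hypothetical yes/no poll responses (1 = yes, 0 = no); illustrative data only.
responses = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
n = len(responses)

mean = statistics.mean(responses)   # the estimate
s = statistics.stdev(responses)     # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)               # estimated standard error of the mean

print(f"mean = {mean:.3f}, sample SD = {s:.3f}, standard error = {se:.3f}")
```

A poll's reported margin of error is typically about two such standard errors, matching the two-standard-error convention described above.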
When only a sample of data from a population is available, the term standard deviation of the
sample or sample standard deviation can refer to either the above-mentioned quantity as applied to
those data, or to a modified quantity that is an unbiased estimate of the population standard
deviation (the standard deviation of the entire population).

Contents

1 Basic examples
  1.1 Population standard deviation of grades of eight students
  1.2 Standard deviation of average height for adult men
2 Definition of population values
  2.1 Discrete random variable
  2.2 Continuous random variable
3 Estimation
  3.1 Uncorrected sample standard deviation
  3.2 Corrected sample standard deviation
  3.3 Unbiased sample standard deviation
  3.4 Confidence interval of a sampled standard deviation
  3.5 Bounds on standard deviation
4 Identities and mathematical properties
5 Interpretation and application
  5.1 Application examples
    5.1.1 Experiment, industrial and hypothesis testing
    5.1.2 Weather
    5.1.3 Finance
  5.2 Geometric interpretation
  5.3 Chebyshev's inequality
  5.4 Rules for normally distributed data
6 Relationship between standard deviation and mean
  6.1 Standard deviation of the mean
7 Rapid calculation methods
  7.1 Weighted calculation
8 History
9 Higher dimensions
10 See also
11 References
12 External links

Basic examples
Population standard deviation of grades of eight students
Suppose that the entire population of interest is eight students in a particular class. For a finite set of numbers, the population standard deviation is found by taking the square root of the average of the squared deviations of the values from their average value. The marks of a class of eight students (that is, a statistical population) are the following eight values:

    2, 4, 4, 4, 5, 5, 7, 9.

These eight data points have the mean (average) of 5:

    μ = (2 + 4 + 4 + 4 + 5 + 5 + 7 + 9) / 8 = 40 / 8 = 5.

First, calculate the deviations of each data point from the mean, and square the result of each:

    (2 − 5)² = 9,  (4 − 5)² = 1,  (4 − 5)² = 1,  (4 − 5)² = 1,
    (5 − 5)² = 0,  (5 − 5)² = 0,  (7 − 5)² = 4,  (9 − 5)² = 16.

The variance is the mean of these values:

    σ² = (9 + 1 + 1 + 1 + 0 + 0 + 4 + 16) / 8 = 32 / 8 = 4,

and the population standard deviation is equal to the square root of the variance:

    σ = sqrt(4) = 2.

This formula is valid only if the eight values with which we began form the complete population. If the values instead were a random sample drawn from some large parent population (for example, they were 8 students randomly and independently chosen from a class of 2 million), then one divides by 7 (which is n − 1) instead of 8 (which is n) in the denominator of the last formula, and the result is s = sqrt(32/7) ≈ 2.1. In that case, the result of the original formula would be called the sample standard deviation and denoted by s instead of σ. Dividing by n − 1 rather than by n gives an unbiased estimate of the variance of the larger parent population. This is known as Bessel's correction.[5][6] Roughly, the reason for it is that the formula for the sample variance relies on computing differences of observations from the sample mean, and the sample mean itself was constructed to be as close as possible to the observations, so just dividing by n would underestimate the variability.
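A minimal sketch of this computation in Python, assuming eight marks of 2, 4, 4, 4, 5, 5, 7, 9 (consistent with the stated mean of 5):

```python
import math
import statistics

marks = [2, 4, 4, 4, 5, 5, 7, 9]              # assumed marks; their mean is 5
mean = sum(marks) / len(marks)

ss = sum((x - mean) ** 2 for x in marks)      # sum of squared deviations = 32

pop_sd = math.sqrt(ss / len(marks))           # divide by n = 8   -> 2.0
sample_sd = math.sqrt(ss / (len(marks) - 1))  # Bessel: n - 1 = 7 -> about 2.14

print(pop_sd, sample_sd)
```

The standard library agrees: `statistics.pstdev(marks)` gives the population value and `statistics.stdev(marks)` the corrected sample value.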

Standard deviation of average height for adult men
If the population of interest is approximately normally distributed, the standard
deviation provides information on the proportion of observations above or below
certain values. For example, the average height for adult men in the United
States is about 70 inches (177.8 cm), with a standard deviation of around
3 inches (7.62 cm). This means that most men (about 68%, assuming a normal
distribution) have a height within 3 inches (7.62 cm) of the mean (67–73 inches
(170.18–185.42 cm)) – one standard deviation – and almost all men (about
95%) have a height within 6 inches (15.24 cm) of the mean (64–76 inches
(162.56–193.04 cm)) – two standard deviations. If the standard deviation were
zero, then all men would be exactly 70 inches (177.8 cm) tall. If the standard
deviation were 20 inches (50.8 cm), then men would have much more variable
heights, with a typical range of about 50–90 inches (127–228.6 cm). Three
standard deviations account for 99.7% of the sample population being studied,
assuming the distribution is normal or bell-shaped (see the 68-95-99.7 rule, or
the empirical rule, for more information).
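The 68–95–99.7 proportions can be reproduced from the normal distribution's error function; a short sketch using only the standard library (the 70 in mean and 3 in SD are taken from the example above):

```python
import math

def share_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

mu, sd = 70, 3  # mean height and standard deviation (inches), from the example
for k in (1, 2, 3):
    print(f"within {k} SD ({mu - k*sd}-{mu + k*sd} in): {share_within(k):.1%}")
```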

Definition of population values
Let μ be the expected value (the average) of random variable X with density f(x):

    μ = E[X] = ∫ x f(x) dx.

The standard deviation σ of X is defined as

    σ = sqrt( E[(X − μ)²] ) = sqrt( ∫ (x − μ)² f(x) dx ),

which can be shown to equal

    σ = sqrt( E[X²] − (E[X])² ).

Using words, the standard deviation is the square root of the variance of X.
The standard deviation of a probability distribution is the same as that
of a random variable having that distribution.
Not all random variables have a standard deviation. If the distribution
has fat tails going out to infinity, the standard deviation might not exist,
because the integral might not converge. The normal distribution has
tails going out to infinity, but its mean and standard deviation do exist,
because the tails diminish quickly enough. The Pareto distribution with parameter α in (1, 2] has a mean, but not a standard deviation (loosely speaking, the standard deviation is infinite). The Cauchy distribution has neither a mean nor a standard deviation.

Discrete random variable
In the case where X takes random values from a finite data set x1, x2, ..., xN, with each value having the same probability, the standard deviation is

    σ = sqrt( [ (x1 − μ)² + (x2 − μ)² + ... + (xN − μ)² ] / N ),  where  μ = (x1 + x2 + ... + xN) / N,

or, using summation notation,

    σ = sqrt( (1/N) Σ (xi − μ)² ),  summing over i = 1, ..., N.

If, instead of having equal probabilities, the values have different probabilities, let x1 have probability p1, x2 have probability p2, ..., xN have probability pN. In this case, the standard deviation will be

    σ = sqrt( Σ pi (xi − μ)² ),  where  μ = Σ pi xi.
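A small numeric sketch of both cases, using equal probabilities first and then a fair six-sided die as the weighted case (names are illustrative):

```python
import math

def weighted_sd(values, probs):
    """Standard deviation of a discrete random variable with given outcome probabilities."""
    mu = sum(p * x for x, p in zip(values, probs))
    var = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))
    return math.sqrt(var)

# Equal probabilities reduce to the ordinary population standard deviation:
xs = [2, 4, 4, 4, 5, 5, 7, 9]
equal = weighted_sd(xs, [1 / len(xs)] * len(xs))   # 2.0

# Weighted case: a fair die has mean 3.5 and SD of about 1.708.
die = weighted_sd([1, 2, 3, 4, 5, 6], [1 / 6] * 6)
print(equal, die)
```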

Continuous random variable
The standard deviation of a continuous real-valued random variable X with probability density function p(x) is

    σ = sqrt( ∫ (x − μ)² p(x) dx ),  where  μ = ∫ x p(x) dx,

and where the integrals are definite integrals taken for x ranging over the set of possible values of the random variable X.
In the case of a parametric family of distributions, the standard deviation can be expressed in terms of the parameters. For example, in the case of the log-normal distribution with parameters μ and σ², the standard deviation is

    sqrt( (e^(σ²) − 1) e^(2μ + σ²) ).
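As a sanity check, the closed-form log-normal standard deviation sqrt((e^(σ²) − 1) e^(2μ + σ²)) can be compared against brute-force numeric integration of the density (a sketch; the parameter values are arbitrary):

```python
import math

mu, sigma = 0.0, 0.5  # parameters of the underlying normal (arbitrary choice)

def pdf(x):
    """Log-normal probability density function."""
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (x * sigma * math.sqrt(2 * math.pi))

closed_form = math.sqrt((math.exp(sigma ** 2) - 1) * math.exp(2 * mu + sigma ** 2))

# Midpoint-rule integration of E[X] and E[X^2] over (0, 30]; the tail beyond is negligible here.
dx = 1e-3
m1 = m2 = 0.0
for i in range(int(30 / dx)):
    x = (i + 0.5) * dx
    w = pdf(x) * dx
    m1 += x * w
    m2 += x * x * w
numeric = math.sqrt(m2 - m1 ** 2)

print(closed_form, numeric)
```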

Estimation
See also: Sample variance
Main article: Unbiased estimation of standard deviation

One can find the standard deviation of an entire population in cases (such as standardized testing) where every member of a population is sampled.
In cases where that cannot be done, the standard
deviation σ is estimated by examining a random
sample taken from the population and computing
a statistic of the sample, which is used as an
estimate of the population standard deviation.
Such a statistic is called an estimator, and the
estimator (or the value of the estimator, namely the
estimate) is called a sample standard deviation,
and is denoted by s (possibly with modifiers).
Unlike in the case of estimating the population
mean, for which the sample mean is a simple
estimator with many desirable properties
(unbiased, efficient, maximum likelihood), there is
no single estimator for the standard deviation with
all these properties, and unbiased estimation of
standard deviation is a very technically involved
problem. Most often, the standard deviation is
estimated using the corrected sample standard
deviation (using N − 1), defined below, and this is
often referred to as the "sample standard
deviation", without qualifiers. However, other
estimators are better in other respects: the
uncorrected estimator (using N) yields lower mean
squared error, while using N − 1.5 (for the normal
distribution) almost completely eliminates bias.
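A quick simulation sketch of this trade-off, drawing normal samples with true σ = 1 and averaging each estimator over many trials (the sample size and trial count are arbitrary):

```python
import math
import random

random.seed(0)  # deterministic illustration

def sd_est(xs, denom):
    """Root of the sum of squared deviations divided by a chosen denominator."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / denom)

n, trials = 10, 20000
avg = {"N": 0.0, "N-1": 0.0, "N-1.5": 0.0}
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(n)]
    avg["N"] += sd_est(xs, n) / trials
    avg["N-1"] += sd_est(xs, n - 1) / trials
    avg["N-1.5"] += sd_est(xs, n - 1.5) / trials

for name, a in avg.items():
    print(f"denominator {name}: average estimate {a:.4f} (true value 1)")
```

Typically the N − 1.5 average lands closest to 1, while the uncorrected N version is lowest, illustrating its downward bias.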

Uncorrected sample standard deviation
The formula for the population standard deviation
(of a finite population) can be applied to the
sample, using the size of the sample as the size of
the population (though the actual population size
from which the sample is drawn may be much
larger). This estimator, denoted by sN, is known as
the uncorrected sample standard deviation, or
sometimes the standard deviation of the
sample (considered as the entire population), and
is defined as follows:[7]

    sN = sqrt( (1/N) Σ (xi − x̄)² ),  summing over i = 1, ..., N,

where {x1, x2, ..., xN} are the observed values of the sample items, and x̄ is the mean value of these observations, while the denominator N stands for the size of the sample: this is the square root of the sample variance, which is the average of the squared deviations about the sample mean.
This is a consistent estimator (it converges in
probability to the population value as the
number of samples goes to infinity), and is
the maximum-likelihood estimate when the population is normally distributed.[citation needed] However, this is a biased estimator, as the estimates are generally too low. The bias decreases as sample size grows, dropping off as 1/N, and thus is most significant for small or moderate sample sizes; for N > 75 the bias is below 1%. Thus for very large sample sizes, the uncorrected sample standard deviation is
generally acceptable. This estimator also has
a uniformly smaller mean squared error than
the corrected sample standard deviation.

Corrected sample standard deviation
If the biased sample variance (the second central moment of the sample, which is a downward-biased estimate of the population variance) is used to compute an estimate of the population's standard deviation, the result is

    sN = sqrt( (1/N) Σ (xi − x̄)² ).

Here taking the square root introduces further downward bias, by Jensen's inequality, due to the square root's being a concave function. The bias in the variance is easily corrected, but the bias from the square root is more difficult to correct, and depends on the distribution in question.
An unbiased estimator for the variance is given by applying Bessel's correction, using N − 1 instead of N to yield the unbiased sample variance, denoted s²:

    s² = ( 1/(N − 1) ) Σ (xi − x̄)².
