What Is Variance?: Key Takeaways
Variance (σ²) in statistics is a measurement of the spread between numbers in a data set. That is,
it measures how far each number in the set is from the mean and therefore from every other
number in the set.
Understanding Variance
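Variance is calculated by taking the differences between each number in the data set and the
mean, then squaring the differences to make them positive, and finally dividing the sum of the
squares by the number of values in the data set. When the data set is a sample rather than the
whole population, the sum is divided by the number of values minus one instead.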
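The calculation described above can be sketched in a few lines of Python. The data values are made up for illustration and do not come from the text, and the function divides by the number of values, which is the population form of the calculation.

```python
def population_variance(values):
    """Average squared distance of each value from the mean."""
    n = len(values)
    mean = sum(values) / n
    # Squaring makes every difference positive before averaging.
    squared_diffs = [(v - mean) ** 2 for v in values]
    return sum(squared_diffs) / n


# Hypothetical data set, chosen only so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))  # 4.0
```

Dividing by `n - 1` instead of `n` would give the sample variance.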
What Is Covariance?
Covariance measures the directional relationship between the returns on two assets. A positive
covariance means that asset returns move together while a negative covariance means they move
inversely. Covariance is calculated by analyzing return surprises (deviations from the
expected return) or by multiplying the correlation between the two variables by the standard
deviation of each variable.
Understanding Covariance
Covariance evaluates how two variables move together relative to their mean values. If stock A's return
moves higher whenever stock B's return moves higher and the same relationship is found when
each stock's return decreases, then these stocks are said to have positive covariance. In finance,
covariances are calculated to help diversify security holdings.
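When an analyst has a set of data made up of paired x and y values, covariance can be calculated using
five variables from that data. They are:
• xi = a given x value in the data set
• xm = the mean, or average, of the x values
• yi = the y value in the data set that corresponds with xi
• ym = the mean, or average, of the y values
• n = the number of data points
Given this information, the formula for covariance is:
Cov(x,y) = SUM [(xi - xm) * (yi - ym)] / (n - 1)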
While the covariance does measure the directional relationship between two assets, it does not
show the strength of the relationship between the two assets; the coefficient of correlation is a
more appropriate indicator of this strength.
Covariance Applications
Covariances have significant applications in finance and modern portfolio theory. For example,
in the capital asset pricing model (CAPM), which is used to calculate the expected return of an
asset, the covariance between a security and the market is used in the formula for one of the
model's key variables, beta. In the CAPM, beta measures the volatility, or systematic risk, of a
security in comparison to the market as a whole; it's a practical measure that draws from the
covariance to gauge an investor's risk exposure specific to one security.
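As a rough illustration of how covariance feeds into beta, the sketch below uses the common definition of beta as the covariance between the security's returns and the market's returns, divided by the variance of the market's returns. The return series are hypothetical and the function names are illustrative, not taken from any library.

```python
def sample_covariance(xs, ys):
    """Sum of paired deviations from the means, divided by n - 1."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    return sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (n - 1)


def beta(security_returns, market_returns):
    """Covariance with the market divided by the market's variance."""
    market_variance = sample_covariance(market_returns, market_returns)
    return sample_covariance(security_returns, market_returns) / market_variance


# Hypothetical periodic returns in percent. The security is constructed to
# move exactly twice as much as the market, so its beta should be about 2.
market = [1.0, -2.0, 3.0, 0.5, -1.5]
security = [2.0, -4.0, 6.0, 1.0, -3.0]
print(beta(security, market))  # approximately 2.0
```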
Meanwhile, portfolio theory uses covariances to statistically reduce the overall risk of a portfolio
by protecting against volatility through covariance-informed diversification.
Possessing financial assets with returns that have similar covariances does not provide very
much diversification; therefore, a diversified portfolio would likely contain a mix of financial
assets that have varying covariances.
Example of Covariance Calculation
Assume an analyst in a company has a five-quarter data set that shows quarterly gross domestic
product (GDP) growth in percentages (x) and a company's new product line growth in
percentages (y). The data set may look like:
• Q1: x = 2, y = 10
• Q2: x = 3, y = 14
• Q3: x = 2.7, y = 12
• Q4: x = 3.2, y = 15
• Q5: x = 4.1, y = 20
The average x value equals 3, and the average y value equals 14.2. To calculate the covariance,
each xi value minus the average x value is multiplied by the corresponding yi value minus the
average y value, and the sum of those products is divided by (n - 1), as follows:
Cov(x,y) = ((2 - 3) * (10 - 14.2) + (3 - 3) * (14 - 14.2) + ... + (4.1 - 3) * (20 - 14.2)) / 4
= (4.2 + 0 + 0.66 + 0.16 + 6.38) / 4 = 11.4 / 4 = 2.85
Having calculated a positive covariance here, the analyst can say that the growth of the
company's new product line has a positive relationship with quarterly GDP growth.
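The worked example can be checked with a short Python sketch. The five data points are the ones listed above; the function name is only illustrative.

```python
def sample_covariance(xs, ys):
    """Cov(x,y) = SUM[(xi - xm) * (yi - ym)] / (n - 1)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    products = [(x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


gdp_growth = [2, 3, 2.7, 3.2, 4.1]      # x, Q1 through Q5
product_growth = [10, 14, 12, 15, 20]   # y, Q1 through Q5

cov = sample_covariance(gdp_growth, product_growth)
print(round(cov, 2))  # 2.85
```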
Standard deviation is calculated as follows:
1. The mean value is calculated by adding all the data points and dividing by the number of
data points.
2. The variance is calculated by first subtracting the mean from each data point. Each of
those resulting values is then squared and the results summed. The sum is then divided by
the number of data points less one.
3. The square root of the variance (the result from no. 2) is then taken to find the standard
deviation.
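The three steps can be followed one by one in code. This is a minimal sketch on a made-up data set; the result is compared against `statistics.stdev` from the Python standard library, which also divides by the number of data points less one.

```python
import math
import statistics


def sample_std_dev(values):
    n = len(values)
    # Step 1: the mean.
    mean = sum(values) / n
    # Step 2: squared differences from the mean, summed, divided by n - 1.
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    # Step 3: the square root of the variance.
    return math.sqrt(variance)


data = [4, 8, 6, 5, 3, 7]  # hypothetical values, not from the text
print(round(sample_std_dev(data), 2))    # 1.87
print(round(statistics.stdev(data), 2))  # 1.87
```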
On the other hand, one can expect aggressive growth funds to have a high standard deviation
from relative stock indices, as their portfolio managers make aggressive bets to generate higher-
than-average returns.
A lower standard deviation isn't necessarily preferable. It all depends on the investments one is
making, and one's willingness to assume the risk. When dealing with the amount of deviation in
their portfolios, investors should consider their personal tolerance for volatility and their overall
investment objectives. More aggressive investors may be comfortable with an investment
strategy that opts for vehicles with higher-than-average volatility, while more conservative
investors may not.
Standard deviation is one of the key fundamental risk measures that analysts, portfolio managers, and
advisors use. Investment firms report the standard deviation of their mutual funds and other
products. A large dispersion shows how much the return on the fund is deviating from the
expected normal returns. Because it is easy to understand, this statistic is regularly reported to
the end clients and investors.
The variance helps determine the data's spread size when compared to the mean value. As the
variance gets bigger, more variation in data values occurs, and there may be a larger gap between
one data value and another. If the data values are all close together, the variance will be smaller.
Variance is more difficult to grasp than standard deviation, however, because it represents
a squared result that may not be meaningfully expressed on the same graph as the original
dataset.
Standard deviations are usually easier to picture and apply. The standard deviation is expressed
in the same unit of measurement as the data, which isn't necessarily the case with the variance.
Using the standard deviation, statisticians may determine if the data has a normal curve or other
mathematical relationship. If the data follows a normal curve, then about 68% of the data points will
fall within one standard deviation of the average, or mean data point. A bigger variance means
that the data points are spread farther from the mean. A smaller variance means that more of the
data is close to the average.
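The 68% figure can be reproduced with `statistics.NormalDist` from the Python standard library (available from Python 3.8). The second distribution below uses an arbitrary mean and standard deviation, chosen only to show that the share within one standard deviation does not depend on them.

```python
from statistics import NormalDist

# Share of a standard normal distribution within one standard deviation.
standard = NormalDist(mu=0, sigma=1)
share_standard = standard.cdf(1) - standard.cdf(-1)

# Same calculation for an arbitrary normal distribution (mean 10, sd 3).
other = NormalDist(mu=10, sigma=3)
share_other = other.cdf(10 + 3) - other.cdf(10 - 3)

print(round(share_standard, 4))  # 0.6827
print(round(share_other, 4))     # 0.6827
```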
A Big Drawback
The biggest drawback of using standard deviation is that it can be impacted by outliers and
extreme values. Standard deviation assumes a normal distribution and calculates all uncertainty
as risk, even when it is in the investor's favor, such as above-average returns.
The variance is determined by subtracting the value of the mean from each data point, resulting
in -0.5, 1.5, -2.5 and 1.5. Each of those values is then squared, resulting in 0.25, 2.25, 6.25 and
2.25. The square values are then added together, resulting in a total of 11, which is then divided
by the value of N minus 1, which is 3, resulting in a variance of approximately 3.67.
The square root of the variance is then calculated, which results in a standard deviation measure
of approximately 1.915.
Or consider shares of Apple (AAPL) for the last five years. Returns for Apple’s stock were
37.7% for 2014, -4.6% for 2015, 10% for 2016, 46.1% for 2017 and -6.8% for 2018.
The average return over the five years is 16.5%.
The value of each year's return less the mean is 21.2%, -21.1%, -6.5%, 29.6%, and -23.3%. All
those values are then squared to yield 449.4, 445.2, 42.3, 876.2, and 542.9, respectively. The
squared values are added together and divided by 4 (N minus 1), giving a variance of
approximately 589.0.
The square root of the variance is taken to get the standard deviation of 24.3%.
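The Apple figures can be reproduced with the sketch below, which uses the five annual returns quoted above. It keeps the unrounded mean throughout, so intermediate values may differ slightly from hand-rounded ones, but the mean and standard deviation round to the same results.

```python
import math

# Annual returns in percent, 2014 through 2018, as quoted in the text.
returns = [37.7, -4.6, 10.0, 46.1, -6.8]

n = len(returns)
mean = sum(returns) / n
variance = sum((r - mean) ** 2 for r in returns) / (n - 1)  # N minus 1
std_dev = math.sqrt(variance)

print(round(mean, 1), round(std_dev, 1))  # 16.5 24.3
```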
The coefficient of variation is helpful when using the risk/reward ratio to select investments. For
example, an investor who is risk-averse may want to consider assets with a historically low
degree of volatility and a high degree of return, in relation to the overall market or its industry.
Conversely, risk-seeking investors may look to invest in assets with a historically high degree of
volatility.
While most often used to analyze dispersion around the mean, quartile, quintile, or decile CVs
can also be used to understand variation around the median or 10th percentile, for example.
The coefficient of variation formula or calculation can be used to measure how far the current
price performance of a stock, commodity, or bond deviates from its historical mean price.
KEY TAKEAWAYS
• The coefficient of variation (CV) is a statistical measure of the dispersion of data points
in a data series around the mean.
• In finance, the coefficient of variation allows investors to determine how much volatility,
or risk, is assumed in comparison to the amount of return expected from investments.
• The lower the ratio of the standard deviation to mean return, the better the risk-return
trade-off.
Coefficient of Variation Formula
Below is the formula for how to calculate the coefficient of variation:
CV = σ / μ
where σ is the standard deviation and μ is the mean.
For illustrative purposes, the following 15-year historical information is used for the investor's
decision:
• SPDR S&P 500 ETF has an average annual return of 5.47% and a standard deviation of
14.68%. SPDR S&P 500 ETF's coefficient of variation is 2.68.
• Invesco QQQ ETF has an average annual return of 6.88% and a standard deviation of
21.31%. QQQ's coefficient of variation is 3.10.
• iShares Russell 2000 ETF has an average annual return of 7.16% and a standard
deviation of 19.46%. IWM's coefficient of variation is 2.72.
Based on the approximate figures, the investor could invest in either the SPDR S&P 500 ETF or
the iShares Russell 2000 ETF, since the risk/reward ratios are approximately the same and
indicate a better risk-return trade-off than the Invesco QQQ ETF.
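The comparison can be reproduced by dividing each fund's standard deviation by its average annual return. The sketch below uses the figures listed above; small differences from quoted ratios can come from rounding.

```python
def coefficient_of_variation(mean_return, std_dev):
    """Ratio of the standard deviation to the mean return."""
    return std_dev / mean_return


# (average annual return %, standard deviation %) from the list above.
funds = {
    "SPDR S&P 500 ETF": (5.47, 14.68),
    "Invesco QQQ ETF": (6.88, 21.31),
    "iShares Russell 2000 ETF": (7.16, 19.46),
}

cvs = {name: coefficient_of_variation(m, s) for name, (m, s) in funds.items()}

# List the funds from the lowest ratio (better trade-off) to the highest.
for name, cv in sorted(cvs.items(), key=lambda item: item[1]):
    print(f"{name}: {cv:.2f}")
# SPDR S&P 500 ETF: 2.68
# iShares Russell 2000 ETF: 2.72
# Invesco QQQ ETF: 3.10
```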
What Is the Correlation Coefficient?
The correlation coefficient is a statistical measure that calculates the strength of
the relationship between the relative movements of two variables. The values
range between -1.0 and 1.0. A calculated number greater than 1.0 or less than
-1.0 means that there was an error in the correlation measurement. A correlation
of -1.0 shows a perfect negative correlation, while a correlation of 1.0 shows a
perfect positive correlation. A correlation of 0.0 shows no linear relationship between
the movement of the two variables.
A value of exactly 1.0 means there is a perfect positive relationship between the two variables.
For a positive increase in one variable, there is also a positive increase in the second variable. A
value of -1.0 means there is a perfect negative relationship between the two variables. This
shows that the variables move in opposite directions - for a positive increase in one variable,
there is a decrease in the second variable. If the correlation between two variables is 0, there is
no linear relationship between them.
The strength of the relationship varies in degree based on the value of the correlation coefficient.
For example, a value of 0.2 shows there is a positive correlation between two variables, but it is
weak and likely insignificant. Experts do not consider correlations significant until the value
surpasses at least 0.8. However, a correlation coefficient with an absolute value of 0.9 or greater
would represent a very strong relationship.
Investors can use changes in correlation statistics to identify new trends in the financial markets,
the economy, and stock prices.
KEY TAKEAWAYS
• Correlation coefficients are used to measure the strength of the relationship between two
variables.
• Pearson correlation is the one most commonly used in statistics. This measures the
strength and direction of a linear relationship between two variables.
• Values always range between -1 (strong negative relationship) and +1 (strong positive
relationship). Values at or close to zero imply weak or no relationship.
• Correlation coefficient values between -0.8 and +0.8 are not considered
significant.
In other words, investors can use negatively correlated assets or securities to hedge their
portfolio and reduce market risk due to volatility or wild price fluctuations. Many investors
hedge the price risk of a portfolio, which effectively reduces any capital gains or losses because
they want the dividend income or yield from the stock or security.
Correlation statistics also allow investors to determine when the correlation between two
variables changes. For example, bank stocks typically have a highly positive correlation to
interest rates since loan rates are often calculated based on market interest rates. If the stock price
of a bank is falling while interest rates are rising, investors can glean that something's askew. If
the stock prices of similar banks in the sector are rising at the same time, investors can conclude
that the decline in the bank's stock is not due to interest rates. Instead, the poorly performing
bank is likely dealing with an internal, fundamental issue.
Standard deviation is a measure of the dispersion of data from its average. Covariance is a
measure of how two variables change together, but its magnitude is unbounded, so it is difficult
to interpret. By dividing covariance by the product of the two standard deviations, one can
calculate the normalized version of the statistic. This is the correlation coefficient.
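The normalization described above can be sketched as follows. The data are the quarterly GDP growth and product line growth figures from the covariance example earlier in the text; the function names are illustrative.

```python
import math


def sample_covariance(xs, ys):
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    return sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    std_x = math.sqrt(sample_covariance(xs, xs))
    std_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (std_x * std_y)


x = [2, 3, 2.7, 3.2, 4.1]   # quarterly GDP growth, percent
y = [10, 14, 12, 15, 20]    # new product line growth, percent

r = correlation(x, y)
print(round(r, 2))  # 0.99
```

Unlike the covariance of 2.85, the result is bounded between -1.0 and 1.0, so its strength can be read directly.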
What Is Frequency Distribution?
Frequency distribution is a representation, either in a graphical or tabular format, that displays
the number of observations within a given interval. The interval size depends on the data being
analyzed and the goals of the analyst. The intervals must be mutually exclusive and exhaustive.
Frequency distributions are typically used within a statistical context. Generally, frequency
distribution can be associated with the charting of a normal distribution.
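A tabular frequency distribution can be built by counting how many observations fall into each interval. The sketch below is one simple way to do this with equal-width intervals; the heights are invented for illustration, and the interval start, width, and count are arbitrary choices.

```python
def frequency_distribution(values, start, width, bins):
    """Count observations in intervals [start + i*width, start + (i+1)*width)."""
    counts = [0] * bins
    for v in values:
        index = int((v - start) // width)
        # Intervals must be exhaustive: every value needs a home.
        if not 0 <= index < bins:
            raise ValueError(f"{v} falls outside the intervals")
        # Half-open intervals are mutually exclusive: one bin per value.
        counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]


# Hypothetical heights of ten children, in centimeters.
heights_cm = [112, 118, 121, 124, 125, 127, 131, 133, 138, 142]
table = frequency_distribution(heights_cm, start=110, width=10, bins=4)

for low, high, count in table:
    print(f"{low}-{high} cm: {count}")
# 110-120 cm: 2
# 120-130 cm: 4
# 130-140 cm: 3
# 140-150 cm: 1
```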
Visual Representation
Both histograms and bar charts provide a visual display using columns, with the y-axis
representing the frequency count, and the x-axis representing the variable to be measured. In the
height of children example, the y-axis is the number of children, and the x-axis is the height. The
columns represent the number of children observed with heights measured in each interval.
When the underlying data are normally distributed, a histogram will show the
majority of occurrences falling in the middle columns. Frequency distributions can be a key
aspect of charting normal distributions which show observation probabilities divided among
standard deviations.