3) Statistical Measures of Asset Returns
A property and potential drawback of the arithmetic mean is its sensitivity to extreme values (outliers).
Since all observations are equally weighted, the arithmetic mean can be pulled sharply upwards or downwards
by the presence of outliers.
The median: the value of the middle item in a dataset that has been sorted in ascending or descending
order.
In an odd-numbered sample of n observations, the median is the value of the observation that occupies the
(n+1)/2 position.
In an even-numbered sample, the median is the mean of the observations that occupy positions n/2 and (n+2)/2.
A distribution will have only one median.
An advantage over the mean is that outliers do not affect it.
A disadvantage is that it does not use all the information about the size of the observations; it focuses only on the
relative position of the ranked observations.
It is also less mathematically tractable than the mean.
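As an illustration only, the position rules for the median can be sketched in Python, alongside the outlier sensitivity of the mean. The function name `median_by_position` and the return figures are made up for the example and do not come from any dataset in these notes.

```python
def median_by_position(values):
    """Median via the (n+1)/2 rule (odd n) or the mean of positions n/2 and (n+2)/2 (even n)."""
    ordered = sorted(values)  # ranking step: ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    if n % 2 == 1:
        # Positions in the notes are 1-based, so subtract 1 for Python indexing
        return ordered[(n + 1) // 2 - 1]
    lower = ordered[n // 2 - 1]          # position n/2
    upper = ordered[(n + 2) // 2 - 1]    # position (n+2)/2
    return (lower + upper) / 2


# Hypothetical annual returns in percent; the last value is an outlier
returns = [2.0, 3.0, 4.0, 5.0, 50.0]
arithmetic_mean = sum(returns) / len(returns)

print(arithmetic_mean)               # pulled upwards by the single outlier
print(median_by_position(returns))   # stays at the middle observation
```

With these made-up figures the mean ends up above four of the five observations, while the median is unchanged by how large the outlier is.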
The mode: The mode is the most frequently occurring value in a dataset. A dataset can have more than one
mode or no mode at all.
Unimodal: when a dataset has a single value that is observed most frequently.
Bimodal: when a dataset has two most frequently occurring values.
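A minimal sketch of finding the mode(s) by counting frequencies. The helper name `modes` is hypothetical, and the convention that a dataset in which every value occurs exactly once has no mode is an assumption made for this example.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency; empty list if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # Assumed convention: every value occurs once, so there is no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([1, 2, 2, 3]))       # one mode: unimodal
print(modes([1, 1, 2, 3, 3]))    # two modes: bimodal
print(modes([1, 2, 3]))          # no mode
```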
Measures of location
They involve identifying values at or below which specified proportions of the data lie.
Quantile: a value at or below which a stated fraction of the data lies. Also referred to as a fractile.
a) Quartiles: divide the distribution into quarters
b) Quintiles: divide the distribution into fifths
c) Deciles: divide the distribution into tenths
d) Percentiles: divide the distribution into hundredths
Interquartile range (IQR): difference between the third quartile and the first quartile, Q3 - Q1
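The quantile and IQR definitions can be sketched as follows. The notes do not say how to locate a quantile within the sorted data, so the (n + 1) * fraction location rule with linear interpolation used here is an assumption; other conventions exist and give slightly different values. The function name `quantile` and the data are made up for illustration.

```python
def quantile(values, fraction):
    """Value at or below which `fraction` of the data lies, using the (n + 1) * fraction location."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("quantile of an empty dataset is undefined")
    location = (n + 1) * fraction          # 1-based position, may be fractional
    location = min(max(location, 1), n)    # keep the position inside the data
    lower = int(location)                  # whole part of the position
    weight = location - lower              # fractional part, used to interpolate
    if lower == n:
        return ordered[-1]
    return ordered[lower - 1] + weight * (ordered[lower] - ordered[lower - 1])


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]  # hypothetical observations
q1 = quantile(data, 0.25)   # first quartile
q3 = quantile(data, 0.75)   # third quartile
iqr = q3 - q1               # interquartile range, Q3 - Q1
print(q1, q3, iqr)
```

The same function covers quintiles, deciles and percentiles by passing fractions such as 0.2, 0.1 or 0.01.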
One way to visualise the dispersion of data across quartiles is a diagram called a box and whisker chart.
The whiskers are the lines that run from the box and are bounded by the “fences,” which represent the lowest
and highest values of the distribution.
Dividing data into quantiles based on a specific objectively quantifiable characteristic, such as sales, market
capitalization, or asset size allows analysts to evaluate the impact of that specific characteristic on a quantity
of interest, such as asset returns, sales, growth, or valuation metrics.
MEASURES OF DISPERSION
Dispersion: the variability around the central tendency.
The mean addresses reward (return), while dispersion addresses risk and uncertainty.
Absolute dispersion: amount of variability present without comparison to any reference point or benchmark.
Common measures of dispersion are:
The range: the difference between the maximum and minimum values in a dataset.
Range = Maximum value - Minimum value
Its advantage is ease of computation; its disadvantage is that it cannot tell how the data are distributed.
It is also sensitive to outliers that may not be representative of the distribution.
Mean absolute deviation (MAD): since the sum of the deviations around the mean is always 0, we instead examine the
absolute deviations around the mean and average them.
It uses all the observations in the sample, but is difficult to manipulate mathematically.
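A short sketch of the range and the mean absolute deviation. The function names and the return figures are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def value_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)


def mean_absolute_deviation(values):
    """Average of the absolute deviations around the arithmetic mean."""
    n = len(values)
    if n == 0:
        raise ValueError("MAD of an empty dataset is undefined")
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


returns = [-2.0, 1.0, 3.0, 6.0]  # hypothetical returns in percent
mean = sum(returns) / len(returns)

print(sum(x - mean for x in returns))     # signed deviations cancel out
print(value_range(returns))               # uses only the two extreme values
print(mean_absolute_deviation(returns))   # uses every observation
```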
Sample variance and sample standard deviation:
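a) Sample variance: the sum of the squared deviations around the mean, divided by n - 1 (one less than the sample size). The sample standard deviation is its positive square root.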
20/07/2024
b) Coefficient of variation: the standard deviation can be difficult to interpret as a measure of the
relative variability of different datasets, either because the datasets have markedly
different means or because they have different units of measurement. In such cases the coefficient of variation is used.
Relative dispersion: the amount of dispersion relative to a reference value or benchmark. It is measured using the
coefficient of variation (CV): the ratio of the standard deviation of a set of observations to their mean.
When the observations are returns, the CV measures the amount of risk (standard deviation) per unit of
reward (mean).
If the sample mean is negative, the statistic is no longer meaningful.
The CV is a scale-free measure and allows direct comparison of dispersion across different datasets.
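The sample variance, sample standard deviation and coefficient of variation can be sketched together. The function names and both datasets are hypothetical. Dividing by n - 1 is the usual sample convention, and refusing to compute a CV for a non-positive mean is a choice made for this example.

```python
import math


def sample_variance(values):
    """Sum of squared deviations around the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(values))


def coefficient_of_variation(values):
    """Standard deviation per unit of mean; not computed for a non-positive mean."""
    mean = sum(values) / len(values)
    if mean <= 0:
        raise ValueError("CV is not meaningful when the mean is zero or negative")
    return sample_std(values) / mean


# Two hypothetical datasets with the same shape but different scales
small_scale = [2.0, 4.0, 6.0, 8.0]
large_scale = [200.0, 400.0, 600.0, 800.0]

print(sample_std(small_scale), sample_std(large_scale))   # differ by the scale factor
print(coefficient_of_variation(small_scale))              # scale free
print(coefficient_of_variation(large_scale))              # same relative dispersion
```

The standard deviations differ by a factor of 100, but the two CVs agree, which is the sense in which the CV is scale free.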
Kurtosis 21/07/2024
A greater chance of extreme deviations from the mean is perceived as higher risk.
Kurtosis: the statistical measure that indicates the combined weight of the tails of a distribution relative to the rest of
the distribution.
A distribution with fatter tails than the normal distribution is called leptokurtic or fat-tailed.
A distribution with thinner tails than the normal distribution is called platykurtic or thin-tailed.
A distribution with tail weights similar to the normal distribution is called mesokurtic.
A fat-tailed (thin-tailed) distribution tends to generate more frequent (less frequent) extremely large deviations
from the mean.
A normal distribution has a kurtosis of 3.0; a fat-tailed distribution has a kurtosis above 3.0 and a thin-tailed distribution below 3.0.
Sample excess kurtosis: A sample measure of the degree of a distribution’s kurtosis in excess of the normal
distribution’s kurtosis.
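A rough sketch of excess kurtosis as kurtosis minus 3. The notes do not preserve the exact sample formula, so this example uses the simple moment-based estimator (fourth central moment over the squared second central moment, both divided by n); that choice is an assumption, and formulas with small-sample adjustments give somewhat different numbers. The function name and both datasets are made up.

```python
def excess_kurtosis(values):
    """Moment-based kurtosis minus 3 (assumed estimator, no small-sample adjustment)."""
    n = len(values)
    if n == 0:
        raise ValueError("kurtosis of an empty dataset is undefined")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are identical")
    return m4 / m2 ** 2 - 3.0


# Hypothetical samples: mass concentrated at two points versus rare extreme values
thin_tailed = [-1.0, 1.0, -1.0, 1.0]
fat_tailed = [0.0] * 8 + [-10.0, 10.0]

print(excess_kurtosis(thin_tailed))   # negative: thinner tails than the normal
print(excess_kurtosis(fat_tailed))    # positive: fatter tails than the normal
```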
Sample covariance: the average value of the product of the deviations of observations
on two random variables from their sample means, s_XY = sum of (X_i - mean of X)(Y_i - mean of Y) / (n - 1).
Covariance is a measure of the joint variability of two random variables.
If the random variables vary in the same direction then their covariance is positive. If they vary in the
opposite direction their covariance is negative.
The size of the covariance is difficult to interpret because it involves squared units of measure. Hence we use the
correlation coefficient.
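A minimal sketch of the sample covariance, dividing the sum of cross-products of deviations by n - 1 (the usual sample convention, assumed here). The function name and the paired series are hypothetical.

```python
def sample_covariance(xs, ys):
    """Sum of products of paired deviations from the sample means, divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("both series must have the same number of observations")
    n = len(xs)
    if n < 2:
        raise ValueError("sample covariance needs at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


x = [1.0, 2.0, 3.0, 4.0, 5.0]            # hypothetical series
same_direction = [2.0, 4.0, 6.0, 8.0, 10.0]
opposite_direction = [10.0, 8.0, 6.0, 4.0, 2.0]

print(sample_covariance(x, same_direction))       # positive
print(sample_covariance(x, opposite_direction))   # negative
# Rescaling one series rescales the covariance, which is why its size is hard to read
print(sample_covariance(x, [100 * v for v in same_direction]))
```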
Sample correlation coefficient: a standardized measure of how two variables in a sample move together.
It expresses the strength of the linear relationship between two random variables.
Properties of correlation
A) Correlation ranges from -1 to +1 for two random variables.
B) A correlation of zero indicates the absence of any linear relationship between the variables.
C) A correlation of +1 indicates a perfect positive linear relationship.
D) A correlation of -1 indicates a perfect inverse (negative) linear relationship.
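The properties above can be checked with a small sketch that computes the sample correlation as the sample covariance divided by the product of the two sample standard deviations. The function name and the data series are hypothetical, and the n - 1 divisor is assumed throughout.

```python
import math


def sample_correlation(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    if len(xs) != len(ys):
        raise ValueError("both series must have the same number of observations")
    n = len(xs)
    if n < 2:
        raise ValueError("correlation needs at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined when a series has no variability")
    return cov / (std_x * std_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sample_correlation(x, [2 * v + 1 for v in x]))         # perfect positive linear relationship
print(sample_correlation(x, [-3 * v for v in x]))            # perfect inverse linear relationship
print(sample_correlation(x, [(v - 3.0) ** 2 for v in x]))    # no linear relationship
```

The last series depends exactly on x through a quadratic, yet its correlation is zero, which shows that correlation captures only the linear part of a relationship.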