QBA Summary Notes
To calculate Geomean
- Identify rates given in question
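- Place them in a table, with a second column of (1 + rate) for each period (rates as decimals, e.g. 5% = 0.05)
- Use the Excel function GEOMEAN -> highlight the 2nd column
- Geometric mean rate of return = (GEOMEAN – 1) x 100 (expressed as a %)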
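The steps above can be sketched in Python as a cross-check on Excel's GEOMEAN. The function name and the three returns are hypothetical, chosen only for illustration.

```python
import math

def geometric_mean_return(rates):
    """Geometric mean rate of return (%) from period returns given as decimals."""
    growth_factors = [1 + r for r in rates]  # the (1 + rate) column
    geomean = math.prod(growth_factors) ** (1 / len(growth_factors))  # what GEOMEAN returns
    return (geomean - 1) * 100  # convert to a percentage

# Hypothetical returns: +10%, -5%, +20%
print(round(geometric_mean_return([0.10, -0.05, 0.20]), 4))
```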
Measures of spread
Standard Deviation
- Shows variation around the sample mean
- Is the square root of the variance
- The higher the standard deviation, the more the data is spread
- Larger standard deviation = wider, flatter distribution
- Smaller standard deviation = taller, narrower distribution
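A minimal Python sketch comparing a tightly clustered sample with a widely spread one. Both datasets are hypothetical; `statistics.variance` and `statistics.stdev` use the sample formulas (dividing by n – 1), like Excel's VAR.S and STDEV.S.

```python
import statistics

# Two hypothetical samples with the same mean (50) but different spread
narrow = [48, 49, 50, 51, 52]
wide = [30, 40, 50, 60, 70]

for name, data in [("narrow", narrow), ("wide", wide)]:
    var = statistics.variance(data)  # sample variance (divides by n - 1)
    sd = statistics.stdev(data)      # sample standard deviation = square root of variance
    print(name, statistics.mean(data), var, round(sd, 3))
```

The wide sample has the much larger standard deviation, so its distribution would look wider and flatter.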
Coefficient of Variation
- Commonly used in business, e.g. to compare the riskiness of stocks
- Measures relative variation
- Shows variation relative to the mean
- Always expressed as a percentage (%)
- Used to compare the variability of 2 or more datasets measured in different units or with very different means
- When comparing stock with the coefficient of variation:
o The lower the percentage, the safer the stock
o The higher the percentage, the riskier the stock -> its price varies more relative to its average price
- CV = (SD / mean) x 100%
- Example: Let’s consider two investment options, TKC vs KFC
o TKC:
Standard deviation of $100
Average price of $10,000 per share
CV = $100 / $10,000 x 100% = 1%
o KFC:
Standard deviation of $20
Average price of $100 per share
CV = $20 / $100 x 100% = 20% -> therefore, KFC is the far riskier stock, as its CV is a much higher percentage
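The TKC vs KFC comparison can be reproduced with a one-line Python sketch of CV = (SD / mean) x 100%; the function name is a hypothetical helper, and the figures are the ones from the example.

```python
def coefficient_of_variation(sd, mean):
    """Coefficient of variation as a percentage: (sd / mean) x 100%."""
    return sd / mean * 100

tkc = coefficient_of_variation(sd=100, mean=10_000)  # TKC figures
kfc = coefficient_of_variation(sd=20, mean=100)      # KFC figures
print(f"TKC CV = {tkc:.0f}%, KFC CV = {kfc:.0f}%")   # TKC CV = 1%, KFC CV = 20%
```

KFC's standard deviation is smaller in dollars, yet relative to its price it varies far more, which is exactly what the CV is designed to show.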
Kurtosis
- Measures the relative concentration of values in the centre
compared to the tails
- Stock returns have quite a lot of extreme values and therefore we
say they have high kurtosis -> they have a lot of values out in
their tails
- Large kurtosis value = extreme values are more likely
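- Positive excess kurtosis (kurtosis above 3) means more values in the tails than in a normal distribution
- The bell curve (normal distribution) has a kurtosis of 3, i.e. an excess kurtosis (kurtosis – 3) of 0; Excel’s KURT function reports excess kurtosis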
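A minimal Python sketch of kurtosis, assuming the simple population-moment formula m4 / m2^2 (a normal distribution gives 3). Excel's KURT uses a sample-adjusted formula and reports excess kurtosis, so its numbers will differ somewhat. Both datasets are hypothetical.

```python
def kurtosis(data):
    """Population-moment kurtosis m4 / m2**2 (a normal distribution gives 3)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment (variance)
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2

flat = [1, 2, 3, 4, 5]          # evenly spread, no extreme values
heavy = [0] * 8 + [-5, 5]       # concentrated centre plus two extreme values

for name, data in [("flat", flat), ("heavy", heavy)]:
    k = kurtosis(data)
    print(name, k, k - 3)  # kurtosis and excess kurtosis
```

The flat data has negative excess kurtosis, while the data with extreme values in its tails has positive excess kurtosis.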
Box plots and Skewness
- Left-skewed → With a longer tail below the mean, the shape of the distribution is left-skewed (mean typically less than the median).
- Symmetric → If the data are symmetric around the median, then the box and central line are centred between the endpoints (mean = median).
- Right-skewed → With a longer tail above the mean, the shape of the distribution is right-skewed (mean typically more than the median).
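A small Python sketch comparing the mean and median of skewed data. The datasets are hypothetical: one long tail of large values, and its mirror image.

```python
import statistics

def describe(data):
    """Return (mean, median) so the two can be compared."""
    return statistics.mean(data), statistics.median(data)

# Hypothetical data with one long tail of large values (right-skewed)
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
# Mirror image of the same data: a long tail of small values (left-skewed)
left_skewed = [-x for x in right_skewed]

print(describe(right_skewed))  # (4.75, 3.0): mean above median
print(describe(left_skewed))   # (-4.75, -3.0): mean below median
```

The single extreme value drags the mean towards the long tail, while the median barely moves.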
1. A Priori approach
- obtains probabilities through deductive reasoning based on prior knowledge of the process
- Example: a die -> 1/6 for a particular number, or 3/6 for an even number
2. Empirical
- based on observed data
- use observations from the past to estimate the likelihood of an event
3. Subjective
- based on expert judgement -> could be based on an individual’s past experience, personal opinion, or
analysis of a particular situation
- different from the a priori approach because we don’t have prior knowledge of the process
- different from the empirical approach because we don’t have past data to observe
- Example of where the subjective approach is needed: What is the probability that Italy will win the 2022 World Cup?
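The a priori and empirical approaches can be contrasted in a short Python sketch. Simulated die rolls stand in for observed past data (an assumption for illustration); the subjective approach has no formula to code.

```python
import random
from fractions import Fraction

# A priori: favourable outcomes of a fair die out of all equally likely outcomes
outcomes = range(1, 7)
a_priori_even = Fraction(sum(1 for o in outcomes if o % 2 == 0), len(outcomes))

# Empirical: estimate the same probability from observed (here simulated) rolls
random.seed(1)  # fixed seed so the simulation is repeatable
rolls = [random.randint(1, 6) for _ in range(10_000)]
empirical_even = sum(1 for r in rolls if r % 2 == 0) / len(rolls)

print(a_priori_even, empirical_even)  # 1/2 and a value close to 0.5
```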
Visualising events
- Two ways to visualise the outcomes of the same event are contingency tables and decision trees.
- Example 1: Contingency Table
- The row totals and the column totals (the margins of the table) each sum to the total sample space
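A minimal Python sketch of a contingency table and its margins. The events ("Saw ad", "Bought") and the counts for 200 customers are hypothetical.

```python
# Hypothetical counts for 200 customers, cross-classified by two events
table = {
    "Saw ad": {"Bought": 40, "Did not buy": 60},
    "No ad":  {"Bought": 20, "Did not buy": 80},
}

row_totals = {row: sum(cells.values()) for row, cells in table.items()}
columns = ["Bought", "Did not buy"]
col_totals = {col: sum(table[row][col] for row in table) for col in columns}

# Both sets of marginal totals add up to the same grand total (the whole sample space)
grand_total = sum(row_totals.values())
assert grand_total == sum(col_totals.values())
print(row_totals, col_totals, grand_total)
```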
Conditional Probability: the probability of one event, given that another event has occurred
- P(A | B) = P(A and B) / P(B)
- If events A and B are independent, then P(A | B) = P(A), so P(A and B) = P(A) x P(B)
Multiplication Rule
- The multiplication rule is used to find the joint probability of two events using the conditional probability of one event and the marginal probability of the event being conditioned on: P(A and B) = P(A | B) x P(B)
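Conditional probability, the multiplication rule and an independence check can be sketched in Python with exact fractions. The counts are hypothetical (A = "bought", B = "saw ad", out of 200 customers).

```python
from fractions import Fraction

# Hypothetical counts out of 200 customers
n_total = 200
n_b = 100        # saw ad (B)
n_a = 60         # bought (A)
n_a_and_b = 40   # saw ad and bought

p_a = Fraction(n_a, n_total)
p_b = Fraction(n_b, n_total)
p_a_and_b = Fraction(n_a_and_b, n_total)

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b

# Multiplication rule: P(A | B) x P(B) recovers the joint probability
assert p_a_given_b * p_b == p_a_and_b

# A and B are independent only if P(A | B) equals P(A)
print(p_a_given_b, p_a, p_a_given_b == p_a)  # 2/5 3/10 False
```

Here seeing the ad raises the probability of buying from 3/10 to 2/5, so the two events are not independent.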
1. Counting rule 1
- Counting rule 1 determines the number of possible outcomes when any one of k events can occur on each of n trials.
- Number of outcomes is equal to k^n → e.g. rolling a die three times gives 6^3 = 216 possible outcomes
2. Counting rule 2
- Generalised version of counting rule 1 -> allows the number of possible events to differ from trial to trial: number of outcomes = k1 x k2 x … x kn
3. Counting rule 3
- Used to work out the number of ways that n items can be arranged in order: n! = n x (n – 1) x … x 1
4. Counting rule 4
- Allows us to find the number of ways in which a subset of X items, selected from an entire group of n items, can be arranged in order: n! / (n – X)!
- Each possible arrangement is called a permutation.
5. Counting rule 5
- Number of ways of selecting X objects from n objects, irrespective of order: n! / (X! (n – X)!)
- Each possible selection is called a combination.
Easier to do in Excel (e.g. the FACT, PERMUT and COMBIN functions)
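The five counting rules map directly onto Python's `math` module (Python 3.8+ for `math.perm`, `math.comb` and `math.prod`). The clothing numbers for rule 2 and the n = 5, X = 2 values are hypothetical.

```python
import math

# Rule 1: k outcomes on each of n trials -> k**n (rolling a die three times)
rule1 = 6 ** 3
# Rule 2: k1 x k2 x ... x kn (hypothetical: 3 shirts, 4 pants, 2 pairs of shoes)
rule2 = math.prod([3, 4, 2])
# Rule 3: arrangements of n items -> n!
rule3 = math.factorial(5)
# Rule 4: permutations of X items chosen from n -> n! / (n - X)!
rule4 = math.perm(5, 2)
# Rule 5: combinations of X items chosen from n -> n! / (X! (n - X)!)
rule5 = math.comb(5, 2)

print(rule1, rule2, rule3, rule4, rule5)  # 216 24 120 20 10
```

`math.perm` and `math.comb` play the same role as Excel's PERMUT and COMBIN.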
OR Variance = Σ(x^2 * P(X = x)) – E(X)^2
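Both variance formulas for a discrete random variable can be checked with a small Python sketch. The fair-die distribution is an illustrative assumption, and the first variance formula used, Σ(x – E(X))^2 * P(X = x), is the standard definition that the shortcut formula is an alternative to.

```python
from fractions import Fraction

# Probability distribution of a fair die: each face 1..6 has probability 1/6
dist = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in dist.items())                           # E(X) = sum x P(X = x)
var_definition = sum((x - mean) ** 2 * p for x, p in dist.items())   # sum (x - E(X))^2 P(X = x)
var_shortcut = sum(x ** 2 * p for x, p in dist.items()) - mean ** 2  # sum x^2 P(X = x) - E(X)^2

print(mean, var_definition, var_shortcut)  # 7/2 35/12 35/12
```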
- Binomial distribution: used where a random variable X counts the number of “events of interest” occurring in a fixed number of observations or trials (n)
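- In Excel, input “=BINOM.DIST(number_s, trials, probability_s, cumulative)”
- Number_s = x
- Trials = n
- Probability_s = probability of the event of interest on each trial
- Cumulative = FALSE for P(X = x) & TRUE for P(X ≤ x) -> for a strict “less than”, use the number below x
- IE. if finding P(X < 10), put into Excel that x = 9 and set cumulative = TRUE
- AND P(X > 10) = 1 – P(X ≤ 10), i.e. x = 10 with cumulative = TRUE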
- Examples of a binomial distribution include:
o Tossing a coin two times and counting the number of heads
o Taking 10 lightbulbs from a warehouse and counting the number of defects
- The probability of the event of interest must stay the same on every trial. Sampling from an infinite population ensures this, because removing one person or item from an infinite population will not affect the proportion of items that exhibit the event of interest
- However, in a finite population, the proportion would be affected unless the item is replaced after it is chosen
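A minimal Python sketch of the binomial probabilities. The helper `binom_dist` is hypothetical; it only mimics the argument order of Excel's BINOM.DIST, and the 10-coin-toss scenario is illustrative.

```python
import math

def binom_dist(number_s, trials, probability_s, cumulative):
    """Mimics BINOM.DIST's arguments: cumulative=False -> P(X = x), True -> P(X <= x)."""
    def pmf(k):
        return math.comb(trials, k) * probability_s ** k * (1 - probability_s) ** (trials - k)
    if cumulative:
        return sum(pmf(k) for k in range(number_s + 1))
    return pmf(number_s)

# Hypothetical: 10 fair coin tosses, X = number of heads
p_exactly_5 = binom_dist(5, 10, 0.5, False)       # P(X = 5)
p_less_than_3 = binom_dist(2, 10, 0.5, True)      # P(X < 3) = P(X <= 2)
p_more_than_8 = 1 - binom_dist(8, 10, 0.5, True)  # P(X > 8) = 1 - P(X <= 8)
print(p_exactly_5, p_less_than_3, p_more_than_8)
```

Note how the strict inequalities are handled: "less than 3" uses x = 2, and "more than 8" subtracts P(X ≤ 8) from 1.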
Covariance
- the covariance of a probability distribution measures the strength of the linear relationship between two
random variables (x and y) and is an important measure when assessing portfolio risk in finance.
- A positive covariance indicates a positive relationship -> when x increases, y also increases on average
- A negative covariance indicates a negative relationship -> when x increases, y on average decreases
- If the two random variables are independent, then the covariance will be zero (although a covariance of zero does not by itself prove independence)
- Expected value formula: the expected value of the sum of two random variables, X and Y, is simply the sum of their separate expectations: E(X + Y) = E(X) + E(Y)
- Variance formula: the variance of the sum of X and Y is equal to the sum of the separate variances of X and Y plus 2 times the covariance between them: Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
- Standard deviation formula: the standard deviation of the sum of X and Y is the square root of that variance: SD(X + Y) = √Var(X + Y)
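A minimal Python sketch of covariance and the sum formulas for a two-asset portfolio. The three scenarios, their probabilities and the returns are hypothetical.

```python
import math

# Hypothetical joint distribution of returns X and Y: (probability, x, y) per scenario
scenarios = [(0.2, -100, 200), (0.5, 100, 50), (0.3, 250, -100)]

e_x = sum(p * x for p, x, _ in scenarios)
e_y = sum(p * y for p, _, y in scenarios)
var_x = sum(p * (x - e_x) ** 2 for p, x, _ in scenarios)
var_y = sum(p * (y - e_y) ** 2 for p, _, y in scenarios)
# Covariance: sum of (x - E(X))(y - E(Y)) weighted by P(x, y)
cov_xy = sum(p * (x - e_x) * (y - e_y) for p, x, y in scenarios)

e_sum = e_x + e_y                      # E(X + Y) = E(X) + E(Y)
var_sum = var_x + var_y + 2 * cov_xy   # Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
sd_sum = math.sqrt(var_sum)            # SD(X + Y) = sqrt(Var(X + Y))

print(round(e_sum, 2), round(cov_xy, 2), round(sd_sum, 2))
```

Because X and Y move in opposite directions (negative covariance), the combined portfolio is far less risky than either asset on its own.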
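- To compute normal distribution probabilities in Excel we use the NORM.DIST function: NORM.DIST(x, mean, standard_dev, cumulative)
- Always set cumulative to TRUE, which returns the probability P(X < x) (FALSE returns the height of the curve, not a probability)
- IE. P(X < 2) = NORM.DIST(2, mean, s.d., TRUE)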
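Python's `statistics.NormalDist` (Python 3.8+) gives the same cumulative probabilities as NORM.DIST with TRUE. The mean of 5 and standard deviation of 2 are hypothetical.

```python
from statistics import NormalDist

# Hypothetical distribution: mean 5, standard deviation 2
x_dist = NormalDist(mu=5, sigma=2)

p_below_2 = x_dist.cdf(2)                  # like NORM.DIST(2, 5, 2, TRUE): P(X < 2)
p_above_8 = 1 - x_dist.cdf(8)              # P(X > 8)
p_between = x_dist.cdf(8) - x_dist.cdf(2)  # P(2 < X < 8)
print(round(p_below_2, 4), round(p_above_8, 4), round(p_between, 4))
```

`x_dist.pdf(2)` would correspond to cumulative = FALSE, i.e. the height of the curve rather than a probability.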
- The standardised normal distribution is also referred to as the “Z” distribution
- Mean = 0
- Standard Deviation = 1
- Any normal distribution X (with any μ and σ combination) can be transformed into the standardised normal distribution Z
- X units are translated into Z units by subtracting the mean of X and dividing by the standard deviation of X: Z = (X – μ) / σ
- X values above the mean have positive Z values
- X values below the mean have negative Z values
- We convert to the standard normal so that probabilities for any normal distribution can be found from the one standardised normal table
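A short Python sketch of standardising X into Z. The distribution (mean 100, standard deviation 10) and the helper name `z_score` are hypothetical.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Standardise: Z = (X - mean) / sd."""
    return (x - mean) / sd

# Hypothetical: X ~ Normal(mean=100, sd=10)
z = z_score(120, 100, 10)                # 2.0 standard deviations above the mean
p_via_z = NormalDist().cdf(z)            # P(Z < 2) from the standardised normal (mean 0, sd 1)
p_direct = NormalDist(100, 10).cdf(120)  # P(X < 120) computed directly
print(z, round(p_via_z, 4), round(p_direct, 4))  # both probabilities agree
```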
What is normal?
- Many misunderstandings about the normal distribution have occurred in both the business and public sectors over the years.
- These misunderstandings have caused both business blunders and sparked public policy debates about these
errors.
- The collapse of large financial institutions in 2008 is one such example. According to one theory, the
investment banking industry's application of the normal distribution to assess risk may have contributed to
the global collapse.
- Whenever we collect a sample and calculate a statistic for that sample, that sample statistic is a random variable, and so it has a distribution
- For example, with the sample mean; every time you get a different sample, you will get a different sample
mean -> so the sample mean is a random variable which depends on who is in the sample
- Every random variable has a distribution
- Generally speaking, the larger your sample size, the closer your sample mean will tend to be to the population mean (the middle of its distribution)
- Example:
- Amazon calculates it needs an annual average purchase amount of $50 to be profitable. They sample 25 customers and calculate a sample mean of $49.50. What should they do? They should use the sampling distribution of the sample mean to judge whether a result this far below $50 could easily happen by chance
- Sampling distribution: the distribution of all possible values of a sample statistic for a given sample size
(n) selected from a population
- It helps us know how accurate the statistic is for estimating the corresponding population parameter
- Any statistics we calculate (like the sample mean, sample proportion, sample median and the sample
variance) all have sampling distributions that are dependent on both the population distribution and the
chosen sample size
- Using the example, there is a distribution of sample means from all possible Amazon samples of 25 customers’ annual purchase amounts
- Now to work out the sampling distribution of the sample mean for samples of n = 2 people, we need to
consider the sample means from all possible samples of size 2, when sampling with replacement from the
population. We can then find the sampling distribution based on these sample means and calculate the
summary measures for the sampling distribution
- The standard deviation of a sample statistic is called a ‘standard error’, because the statistic is usually used to estimate a population parameter, so any deviation from that parameter is considered an ‘error’
- The mean of the population and the mean of the distribution of sample means will always be the same, while the population standard deviation will be greater than the standard error, σ/√n (because for the sampling distribution we are averaging values, which pulls them closer towards the middle)
- No matter what the population distribution, we will always find that the population mean is equal to the mean of the sampling distribution of the sample mean, and the standard deviation of the sample mean’s distribution will be smaller than the population sd for sample sizes greater than 1
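The n = 2 enumeration described above can be sketched in Python. The population of four values is hypothetical (the notes' own values are not shown); all 4^2 = 16 samples are drawn with replacement.

```python
import itertools
import math
import statistics

# Hypothetical population of N = 4 values
population = [10, 20, 30, 40]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)  # population standard deviation (divides by N)

n = 2
# All possible samples of size n drawn WITH replacement (4**2 = 16 samples)
sample_means = [statistics.mean(s) for s in itertools.product(population, repeat=n)]

mean_of_means = statistics.mean(sample_means)
standard_error = statistics.pstdev(sample_means)  # sd of the sampling distribution

print(mu, mean_of_means)  # equal: the sample mean is unbiased
print(round(sigma, 3), round(standard_error, 3), round(sigma / math.sqrt(n), 3))
```

The mean of the sample means equals μ, and the standard error is smaller than σ, matching σ/√n.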
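- When the mean of an estimator’s sampling distribution (its expected value) is equal to the parameter it is trying to estimate, it is called an unbiased estimator
- A smaller standard error (from a larger sample) results in sample means that are typically closer to the population mean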
- This means that it is highly unlikely to get a sample mean of 510g or higher
- Recap: The power of the Central Limit Theorem is that it tells us that even if the population is not normal (it could be any strange shape: uniform, exponential, hypergeometric, Poisson, unknown etc.), the sample mean from large enough samples will be approximately normal.
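A small Python simulation of the Central Limit Theorem, assuming a hypothetical exponential population with mean 1 and standard deviation 1 (strongly right-skewed). The sample size of 50 and the 5,000 repetitions are illustrative choices.

```python
import random
import statistics

random.seed(42)  # repeatable simulation

def draw_sample(n):
    """One sample of size n from a right-skewed exponential population (mean 1, sd 1)."""
    return [random.expovariate(1.0) for _ in range(n)]

n = 50
sample_means = [statistics.fmean(draw_sample(n)) for _ in range(5_000)]

# The sample means centre on the population mean (1) with spread close to sigma / sqrt(n)
print(round(statistics.fmean(sample_means), 3),
      round(statistics.stdev(sample_means), 3),
      round(1 / n ** 0.5, 3))
```

A histogram of `sample_means` should look roughly bell-shaped even though the population itself is far from normal.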
- The CLT and the standard errors of the mean and of the proportion are based on samples selected from
infinite populations or from finite populations with replacement.
- However, in instances where we sample without replacement from populations that are of a finite size, we should use a finite population correction factor whenever the sample size is a significant portion of the overall population: fpc = √((N – n) / (N – 1)), which multiplies the standard error.
- In most survey research, we sample without replacement from populations that are of a finite size (N).
- In these cases we should use the finite population correction (fpc) factor to reduce the standard error of the
mean and the standard error of the proportion.
- Reducing the standard errors is important because it leads to an increase in the accuracy of these estimators.
- The reduction occurs because as the sample grows, there are fewer unobserved members of the population, and these are the source of the uncertainty in the sample statistic.
- In the limit, as the sample grows to cover the entire population, there will be no uncertainty and the sample
statistic will equal the population parameter exactly, and the standard error will be zero.
- We use the finite population correction factor when the sample size, n, is more than 5% of the population size, N (i.e. n/N > 0.05) AND sampling is without replacement.
- The finite population correction factor is always less than 1, so it will always reduce the standard error,
resulting in a more precise estimate of the population parameter.
- As the sample size increases to become a larger portion of the population the standard error is reduced by a
larger amount
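A minimal Python sketch of applying the finite population correction to the standard error of the mean, using the standard formula fpc = √((N – n) / (N – 1)). The helper names and the figures (σ = 10, n = 100, N = 1,000) are hypothetical.

```python
import math

def fpc(n, N):
    """Finite population correction factor: sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

def standard_error_of_mean(sigma, n, N=None, with_replacement=False):
    """sigma / sqrt(n), times the fpc when sampling without replacement and n/N > 0.05."""
    se = sigma / math.sqrt(n)
    if N is not None and not with_replacement and n / N > 0.05:
        se *= fpc(n, N)
    return se

# Hypothetical: sigma = 10, sample of 100 from a population of 1,000 (10% of N)
print(round(standard_error_of_mean(10, 100), 4))          # no correction: 1.0
print(round(standard_error_of_mean(10, 100, N=1000), 4))  # corrected, smaller
```

Because the fpc is below 1, the corrected standard error is always the smaller of the two.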