QBA Summary Notes

The document provides information about quantitative business analysis techniques including how to calculate the geometric mean, measures of spread such as standard deviation and coefficient of variation, and distributions of discrete random variables. Key terms defined include kurtosis, skewness, population versus sample, and probability concepts such as complement of an event, conditional probability, and independence.

Uploaded by

Jordan
QBA Summary Notes

To calculate the geometric mean (Geomean)
- Identify the rates given in the question
- Place them in a table with a second column containing 1 + rate
- Use the Excel function GEOMEAN -> highlight the 2nd column
- Geometric mean rate of return = (Geomean – 1) x 100
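
The same steps can be sketched in Python as a cross-check on the Excel result. The rates below are hypothetical, and `geometric_mean_return` is an illustrative helper name rather than anything from the notes.

```python
from statistics import geometric_mean

def geometric_mean_return(rates):
    """Return the geometric mean rate of return as a percentage."""
    # Convert each rate to a growth factor (1 + rate), as in the table step
    growth_factors = [1 + r for r in rates]
    # Geometric mean of the factors, then (Geomean - 1) x 100
    return (geometric_mean(growth_factors) - 1) * 100

# Hypothetical yearly rates of return: +10%, -5%, +20%
rates = [0.10, -0.05, 0.20]
print(f"Geometric mean rate of return: {geometric_mean_return(rates):.2f}%")
```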

Measures of spread

Standard Deviation
- Shows variation around the mean
- Is the square root of the variance
- The higher the standard deviation, the more the data is spread out
- Larger standard deviation = wider, flatter distribution
- Smaller standard deviation = taller, narrower distribution

Coefficient of Variation
- Most commonly used in business to compare stock prices
- Measures relative variation
- Shows variation relative to the mean
- Always expressed as a percentage (%)
- Used to compare the variability of 2 or more datasets measured in different units
- When comparing stocks with the coefficient of variation:
o The lower the percentage, the safer the stock
o The higher the percentage, the riskier the stock -> due to larger percentage changes in the stock price
- Formula: CV = (standard deviation / mean) x 100%
- Example: consider two investment options, TKC and KFC
o TKC:
Standard deviation of $100
Average price of $10,000 per share
CV = 100 / 10,000 x 100% = 1%

o KFC:
Standard deviation of $20
Average price of $100 per share
CV = 20 / 100 x 100% = 20% -> therefore, KFC is the far riskier stock as its CV is a much higher percentage
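
As a quick check of the TKC and KFC figures, here is a minimal Python sketch. The function name `coefficient_of_variation` is illustrative only.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = standard deviation / mean, expressed as a percentage."""
    return std_dev / mean * 100

# (standard deviation, average share price) taken from the example in the notes
stocks = {"TKC": (100, 10_000), "KFC": (20, 100)}

for name, (sd, avg_price) in stocks.items():
    print(f"{name}: CV = {coefficient_of_variation(sd, avg_price):.0f}%")
```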

Kurtosis
- Measures the relative concentration of values in the centre compared to the tails
- Stock returns have quite a lot of extreme values, so we say they have high kurtosis -> they have a lot of values out in their tails
- Large kurtosis value = extreme values are more likely
- The bell curve (normal distribution) has a kurtosis of 3
- Positive excess kurtosis (kurtosis above 3) means more values in the tails than a normal distribution

Box plots and Skewness

- Left skewed → With a longer tail below the mean, the shape of distribution is left-skewed.
- Symmetric → If the data are symmetric around the median, then the box and central line are centred
between the endpoints.
- Right-skewed → With a longer tail above the mean, the shape of distribution is right-skewed.

- To find the population variance in Excel, use VAR.P

- To find the population standard deviation in Excel, use STDEV.P

Formulae - sample versus population


o Population size = N
o Sample size = n
o For a population, N replaces n – 1 in the denominator of the variance and standard deviation
o Population parameters are generally denoted using Greek letters.
o Sample statistics are usually denoted using Latin letters.
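
To see the effect of dividing by N versus n – 1, here is a small Python sketch using the standard library `statistics` module. The data values are hypothetical.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [12, 15, 11, 18, 14]  # hypothetical observations

pop_var = pvariance(data)    # population variance: divides by N
sample_var = variance(data)  # sample variance: divides by n - 1

print("Population variance:", pop_var, "| population sd:", round(pstdev(data), 3))
print("Sample variance:", sample_var, "| sample sd:", round(stdev(data), 3))
```

The sample versions come out slightly larger because the same sum of squared deviations is divided by a smaller number.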

Formulas for outliers

Less than

more than

Using Excel to calculate outliers: https://www.youtube.com/watch?v=6VUqR5gXkuI

3 Common approaches for assessing the probability of an uncertain event

1. A Priori approach
- obtains probabilities through deductive reasoning based on prior knowledge of the process
- Example: Dice -> 1/6 for a particular number, or 3/6 for even number
2. Empirical
- based on observed data
- use observations from the past to estimate the likelihood of an event

3. Subjective
- based on expert judgement -> could be based on an individual’s past experience, personal opinion, or analysis of a particular situation
- differs from the a priori approach because we don’t have prior knowledge of the process
- differs from the empirical approach because we don’t have past data to observe
- Example of where the subjective approach is needed: What is the probability that Italy will win the 2022 World Cup?

The complement of an event A


- The complement of an event A is the subset of outcomes that are not part of event A.
- For example, if the event is 'a customer plans to purchase a product', then the complement of the event would be 'the customer does not plan to purchase the product'.
- It is denoted by the symbol A' (an A followed by an apostrophe). We can find the probability of A' as:
P(A') = 1 - P(A)

Visualising events
- Two ways to visualise the outcomes of the same event are contingency tables and decision trees.
- Example 1: Contingency Table

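- The row totals and the column totals each sum to the total sample space

- Example 2: Decision Tree

- Probability of joint event = P(A and B) = P(A) x P(B|A) -> multiply along the branches of the tree; this equals P(A) x P(B) only when A and B are independent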


The General Addition Rule:
- Used to find the probability that either event A or event B, or both A and B, occur
- The general rule says that the probability of A or B can be found by adding the marginal probabilities of events A and B and then subtracting their joint probability:
P(A or B) = P(A) + P(B) - P(A and B)

Mutually Exclusive Events:


- Refers to events that cannot occur at the same time -> rolling a six and rolling a five on a single roll

Collectively Exhaustive Events:


- Events that, together, cover the entire sample space
- Therefore, at least one of these events must occur
- Example: choosing a random day of the week, which is either a weekday or a weekend day -> the two events are both collectively exhaustive and mutually exclusive
- Note that if a set of events, say X, Y and Z, are mutually exclusive and collectively exhaustive, then their probabilities must sum to 1, that is P(X) + P(Y) + P(Z) = 1

Conditional Probability: the probability of one event, given that another event has occurred -> P(A|B) = P(A and B) / P(B)

[Figures: conditional probability examples using a contingency table and a decision tree]


Independence
- When the occurrence of one event does not affect the probability of occurrence of another event, the events
are said to be independent.
- For example, events A and B are independent when the probability of A is not affected by the fact that event B has occurred, and vice versa: P(A|B) = P(A)

- If events A and B are independent, then P(A and B) = P(A) x P(B)

Multiplication Rule
- The multiplication rule is used to find the joint probability of two events using the conditional probability of one event and the marginal probability of the event being conditioned on: P(A and B) = P(A|B) x P(B)
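
The rules in this section (complement, general addition, conditional probability, independence and multiplication) can be tried together on one contingency table. The counts below are hypothetical, and the labels A and B are placeholders for any two events. `Fraction` is used so the results are exact.

```python
from fractions import Fraction

# Hypothetical contingency table of counts for events A and B
counts = {
    ("A", "B"): 200,
    ("A", "not B"): 50,
    ("not A", "B"): 100,
    ("not A", "not B"): 650,
}
total = sum(counts.values())

# Joint and marginal probabilities
p_a_and_b = Fraction(counts[("A", "B")], total)
p_a = Fraction(counts[("A", "B")] + counts[("A", "not B")], total)
p_b = Fraction(counts[("A", "B")] + counts[("not A", "B")], total)

p_not_a = 1 - p_a                      # complement: P(A') = 1 - P(A)
p_a_or_b = p_a + p_b - p_a_and_b       # general addition rule
p_a_given_b = p_a_and_b / p_b          # conditional probability P(A|B)
independent = p_a_and_b == p_a * p_b   # independence check

print("P(A') =", p_not_a)
print("P(A or B) =", p_a_or_b)
print("P(A|B) =", p_a_given_b)
print("Independent?", independent)
# Multiplication rule recovers the joint probability
print("P(A|B) x P(B) =", p_a_given_b * p_b)
```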

3.2 Bayes’ Theorem


- Used to revise previously calculated probabilities, based on new information
- Allows us to reverse the conditioning between two events or variables
- Bayes' theorem (simple form): P(B|A) = P(A|B) x P(B) / P(A)

Bayes' theorem (general form):

P(Bi|A) = P(A|Bi) x P(Bi) / [P(A|B1) x P(B1) + ... + P(A|Bk) x P(Bk)]

- The only difference is that the marginal probability in the denominator of the simple form, P(A), has been written out as the sum of the intersections of A with a set of mutually exclusive and collectively exhaustive events, B1, …, Bk.
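
A minimal Python sketch of the general form follows. The prior and conditional probabilities are hypothetical, and `bayes_posteriors` is an illustrative helper name.

```python
def bayes_posteriors(priors, likelihoods):
    """Revise P(Bi) into P(Bi|A) given P(A|Bi) for each Bi.

    The Bi must be mutually exclusive and collectively exhaustive.
    """
    # Numerators: P(A|Bi) x P(Bi), i.e. the joint probabilities P(A and Bi)
    joint = [prior * like for prior, like in zip(priors, likelihoods)]
    # Denominator: P(A), the sum of the joint probabilities
    p_a = sum(joint)
    return [j / p_a for j in joint]

# Hypothetical numbers: B1 = item is defective, B2 = item is not defective,
# A = the inspection flags the item
priors = [0.03, 0.97]        # P(B1), P(B2)
likelihoods = [0.90, 0.02]   # P(A|B1), P(A|B2)

posteriors = bayes_posteriors(priors, likelihoods)
print(f"P(defective | flagged) = {posteriors[0]:.4f}")
```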
3.3 Counting Rules
- Rules for counting the number of possible discrete outcomes of an experiment

1. Counting rule 1
- Counting rule 1 determines the number of possible outcomes when any of k events can occur on each of n trials.
- Number of outcomes is equal to k^n → e.g. rolling a die three times, 6^3 = 216 possible outcomes

2. Counting rule 2
- Generalised version of counting rule 1 -> allows for the number of events to differ between trials
- Number of outcomes = k1 x k2 x ... x kn

3. Counting rule 3
- Used to work out the number of ways that n items can be arranged in order
- Number of arrangements = n! (n factorial)

4. Counting rule 4
- Allows us to find out the number of ways in which a subset of an entire group of items can be arranged in order.
- Each possible arrangement is called a permutation.
- Number of permutations of x items chosen from n = n! / (n – x)!

5. Counting rule 5
- Number of ways of selecting x objects from n objects, irrespective of order
- Each possible selection is called a combination.
- Number of combinations = n! / (x! (n – x)!)
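
The five counting rules map directly onto functions in Python's `math` module. In this short sketch, only the dice example comes from the notes; the other scenarios are hypothetical.

```python
from math import comb, factorial, perm

# Rule 1: k events on each of n trials -> k^n (rolling a die three times)
rule1 = 6 ** 3

# Rule 2: different numbers of events per trial -> k1 x k2 x ... x kn
# (hypothetical menu: 3 starters, 4 mains, 2 desserts)
rule2 = 3 * 4 * 2

# Rule 3: ways to arrange n items in order -> n! (4 books on a shelf)
rule3 = factorial(4)

# Rule 4: permutations of x items chosen from n -> n! / (n - x)!
rule4 = perm(5, 2)

# Rule 5: combinations of x items chosen from n -> n! / (x! (n - x)!)
rule5 = comb(5, 2)

print(rule1, rule2, rule3, rule4, rule5)
```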

Module 4. Introduction to discrete probability distributions

Discrete random variable

Expected value E(X) = Σ [X * P(X)]

In Excel, you can use SUMPRODUCT and highlight the first two columns (X and P(X))

The mean of the distribution (μ) is E(X)

Variance of a discrete random variable

Variance = Σ [(X – E(X))^2 * P(X)] (easier to do in Excel)
OR Variance = Σ (X^2 * P(X)) – E(X)^2

Standard Deviation of a discrete rv

Standard deviation = square root of the variance

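
Both variance formulas give the same answer, which the following Python sketch illustrates. The distribution is hypothetical.

```python
from math import sqrt

# Hypothetical discrete distribution: values of X and their probabilities
values = [0, 1, 2, 3]
probs = [0.1, 0.3, 0.4, 0.2]

# E(X) = sum of X * P(X)  (the SUMPRODUCT step in Excel)
expected = sum(x * p for x, p in zip(values, probs))

# Definition: sum of (X - E(X))^2 * P(X)
variance_def = sum((x - expected) ** 2 * p for x, p in zip(values, probs))

# Shortcut: sum of X^2 * P(X), minus E(X)^2
variance_short = sum(x ** 2 * p for x, p in zip(values, probs)) - expected ** 2

std_dev = sqrt(variance_def)
print(f"E(X) = {expected:.2f}, Var(X) = {variance_def:.2f}, SD(X) = {std_dev:.2f}")
```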
- Binomial distribution: used where a rv X counts the number of “events of interest” occurring in a fixed number of observations or trials (n)
- In Excel, input “=BINOM.DIST(“
- Number_s = x
- Trials = n
- Probability_s = π
- Cumulative = FALSE for P(X = x) and TRUE for P(X <= x) -> for P(X < x), use the number below x
- I.e. if finding P(X < 10), enter x = 9 and set cumulative = TRUE
- And P(X >= 10) = 1 – P(X < 10)
- Examples of a binomial distribution include:
o Tossing a coin two times and counting the number of heads
o Taking 10 lightbulbs from a warehouse and counting the number of defects

- There are 4 requirements for a situation to be modelled with a binomial distribution


1. Must be a fixed number of trials (n)
2. Two categories are mutually exclusive and collectively exhaustive (whether the event occurs or not)
3. Each observation has a constant probability of the event of interest occurring (π)
4. Observations are independent (outcome of one observation does not affect the outcome of the other)

- There are 2 random sampling methods that can deliver independence:


1. Sampling from an infinite population without replacement
2. Sampling from a finite population with replacement

- This is because removing one person or item from an infinite population will not affect the proportion of
items that exhibit the event of interest
- However, in a finite population, the proportion would be affected unless the item is replaced after it is
chosen

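- Formula for binomial distribution: P(X = x) = [n! / (x! (n – x)!)] * π^x * (1 – π)^(n – x)

Binomial Distribution Characteristics
- Mean = nπ
- Variance = nπ(1 – π)
- Standard deviation = square root of nπ(1 – π)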
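
The binomial probabilities that BINOM.DIST returns can be reproduced with a few lines of Python. This is a sketch using the standard binomial formula. The helper names are illustrative and the defect probability is hypothetical.

```python
from math import comb, sqrt

def binom_pmf(x, n, pi):
    """P(X = x), like BINOM.DIST with cumulative = FALSE."""
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)

def binom_cdf(x, n, pi):
    """P(X <= x), like BINOM.DIST with cumulative = TRUE."""
    return sum(binom_pmf(k, n, pi) for k in range(x + 1))

# Lightbulb example from the notes: n = 10 bulbs; pi = 0.1 is a hypothetical defect rate
n, pi = 10, 0.1

p_exactly_2 = binom_pmf(2, n, pi)
p_less_than_2 = binom_cdf(1, n, pi)      # P(X < 2) uses x = 1
p_at_least_2 = 1 - binom_cdf(1, n, pi)   # P(X >= 2) = 1 - P(X < 2)

mean = n * pi
std_dev = sqrt(n * pi * (1 - pi))
print(f"P(X=2)={p_exactly_2:.4f}, P(X<2)={p_less_than_2:.4f}, P(X>=2)={p_at_least_2:.4f}")
print(f"Mean={mean:.2f}, SD={std_dev:.4f}")
```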


1.2 Poisson Distribution
Poisson Distribution: is used when counting the number of occurrences of an event in a fixed interval of time,
area, volume or distance / “window of opportunity”
- In Excel, input “=POISSON.DIST(“
- x = x
- mean = λ
- cumulative = FALSE for P(X = x), TRUE for P(X <= x)
- If finding P(X < 4), use x = 3 with cumulative = TRUE
- If finding P(X >= 2), calculate 1 – P(X <= 1)
- Different to binomial distribution (binomial distribution counts the number of outcomes in a fixed number
of trials)
- For the Poisson model to be applied, there should be no upper limit on the potential counts
- The window of opportunity: is a continuous interval of time, area, volume etc. in which at least one
occurrence of an event can occur -> ie it could be the number of insurance claims in a month, the number of
flat tyres in 100,000 km of driving
- Properties of the poisson distribution:
o The data count is the number of times an event occurs in a given area/time/window of opportunity
o The probability that an event occurs in one window of opportunity is the same for all other windows of
opportunity
o The number of events in one window of opportunity occur independently of events in other windows of
opportunity
o The probability that two or more events occur in a window of opportunity approaches zero, as the window
of opportunity becomes smaller
o The average number of events per window of opportunity is denoted λ (lambda), where λ > 0
o Formula for the Poisson distribution: P(X = x) = e^(–λ) * λ^x / x!

Characteristics
- Mean = λ
- Variance = λ
- Standard deviation = square root of λ

Example:

Customers arrive at a fast food outlet at a rate of 3 per minute.

Find the distribution of customer arrivals per minute


Steps
1. 1 minute is given as the fixed window of opportunity
2. Assume customers arrive independently
3. Given λ = 3 per minute
4. Poisson seems a good choice for this problem
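
With λ = 3 customers per minute, the arrival probabilities can be sketched in Python using the standard Poisson formula P(X = x) = e^(–λ) λ^x / x!. The helper names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x), like POISSON.DIST with cumulative = FALSE."""
    return exp(-lam) * lam ** x / factorial(x)

def poisson_cdf(x, lam):
    """P(X <= x), like POISSON.DIST with cumulative = TRUE."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

lam = 3  # customers per minute, from the example

p_exactly_5 = poisson_pmf(5, lam)
p_less_than_4 = poisson_cdf(3, lam)      # P(X < 4) uses x = 3
p_at_least_2 = 1 - poisson_cdf(1, lam)   # P(X >= 2) = 1 - P(X <= 1)

# Distribution of arrivals per minute for the first few counts
for x in range(8):
    print(x, round(poisson_pmf(x, lam), 4))
```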

For example, if λ = 3, the probability of observing 5 events (successes) is calculated as:
P(X = 5) = e^(–3) * 3^5 / 5! ≈ 0.1008

- The Poisson distribution is positively skewed -> tail extends further to the right
- The distribution goes off to infinity -> possible counts are 0, 1, 2, ... with no upper limit
- Help with Poisson in Excel - https://www.youtube.com/watch?v=ovGKby95xKA

4.4 Hypergeometric Distribution
- Similar to the binomial distribution, the hypergeometric distribution models probabilities for the number of
"events of interest" in a sample or in a fixed number of trials.
- Used when selecting from a finite population without replacement -> diff to binomial which is used only
when selecting from an infinite population or finite population with replacement
- In Excel, input =HYPGEOM.DIST(
- Sample_s = x
- Number_sample = sample size (n)
- Population_s = number of items of interest in the population = A in the mean equation
- Number_pop = population size (N)
- I.e. in a sample of size 5 (workshop example), P(X > 3) = P(X = 4) + P(X = 5)
- The hypergeometric distribution allows the probabilities to change based on the results of earlier events, so the probability of an event of interest is no longer constant over all observations.
- I.e. taking a blue ball out of a jar of red and blue balls will change the probability of getting a blue ball again on the next trial (without replacement)
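
A short Python sketch of the hypergeometric probability, using the standard formula P(X = x) = C(A, x) C(N – A, n – x) / C(N, n). The jar contents below are hypothetical, and `hypergeom_pmf` is an illustrative name.

```python
from math import comb

def hypergeom_pmf(x, n, A, N):
    """P(X = x) items of interest in a sample of n, drawn without
    replacement from a population of N containing A items of interest."""
    return comb(A, x) * comb(N - A, n - x) / comb(N, n)

# Hypothetical jar: N = 30 balls, A = 10 blue, sample of n = 5 without replacement
N, A, n = 30, 10, 5

# P(X > 3) = P(X = 4) + P(X = 5)
p_more_than_3 = hypergeom_pmf(4, n, A, N) + hypergeom_pmf(5, n, A, N)
print(f"P(X > 3) = {p_more_than_3:.4f}")
```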

- Properties of the hypergeometric distribution:


o “n” trials in a sample taken from a finite population of size N
o The sample is taken without replacement
o Outcomes of trials are now dependent
o Concerned with finding the probability of “X = xi” items of interest in the sample where there are “A” items of interest in the population
- Help with hypergeometric in Excel - https://www.youtube.com/watch?v=QbTDpBDWbGs

4.5 Covariance and summing random variables

Covariance
- the covariance of a probability distribution measures the strength of the linear relationship between two
random variables (x and y) and is an important measure when assessing portfolio risk in finance.
- A positive covariance indicates a positive relationship -> when x increases, y also increases on average
- A negative covariance indicates a negative relationship -> when x increases, y on average decreases
- If the two random variables are independent, then the covariance will be zero

- Example:

The mean = Σ [X * P(X)] -> same as for a single random variable


Sums of random variables
- The covariance is useful when dealing with the sum of two random variables
- Sums of random variables are commonly used in finance when we construct portfolios of stocks
- As the returns of each asset in a portfolio can be considered to be random variables, knowing how to deal with their sums is very useful
o The return of a portfolio can be found as a weighted sum of the returns of the assets that make up that portfolio
o Each of the assets' returns can be considered to be a random variable

- Expected value formula: the expected value of the sum of two random variables, X and Y, is simply the sum of their separate expectations: E(X + Y) = E(X) + E(Y)

- Variance formula: the variance of the sum of X and Y is equal to the sum of the separate variances of X and Y plus 2 times the covariance between them: Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)

- Standard deviation formula: the standard deviation of the sum of X and Y is the square root of the variance of the sum

The weighted sum of 2 random variables:

- Expected value formula: E(aX + bY) = a E(X) + b E(Y), where lowercase a and b represent constant numbers and X and Y are random variables
- Variance formula: Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)
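
The weighted-sum formulas can be checked numerically. The sketch below assumes a hypothetical joint distribution over three economic scenarios and uses the usual definition Cov(X, Y) = Σ (X – E(X))(Y – E(Y)) P(X, Y). All numbers and names are illustrative.

```python
# Hypothetical scenarios: probability, return on asset X, return on asset Y
probs = [0.2, 0.5, 0.3]
x = [-0.10, 0.05, 0.15]
y = [0.08, 0.04, -0.02]

def expectation(values):
    return sum(p * v for p, v in zip(probs, values))

def covariance(u, v):
    mu_u, mu_v = expectation(u), expectation(v)
    return sum(p * (a - mu_u) * (b - mu_v) for p, a, b in zip(probs, u, v))

# Portfolio weights (the constants a and b in the notes)
a, b = 0.6, 0.4
portfolio = [a * xi + b * yi for xi, yi in zip(x, y)]

# Formulas from the notes
expected_formula = a * expectation(x) + b * expectation(y)
variance_formula = (a ** 2 * covariance(x, x) + b ** 2 * covariance(y, y)
                    + 2 * a * b * covariance(x, y))

# Direct calculation on the portfolio returns themselves
expected_direct = expectation(portfolio)
variance_direct = covariance(portfolio, portfolio)

print("Cov(X, Y) =", round(covariance(x, y), 6))
print("E(aX + bY):", round(expected_formula, 6), "vs direct", round(expected_direct, 6))
print("Var(aX + bY):", round(variance_formula, 8), "vs direct", round(variance_direct, 8))
```

The negative covariance in this made-up example lowers the portfolio variance, which is why covariance matters when assessing portfolio risk.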

Three must know facts relating to random variables:


1. Random variables are usually represented by capital letters X, Y, Z
2. Observed values of the random variables are usually denoted by lower case letters such as x, y, z
3. A probability distribution describes the likelihood of obtaining the possible values that a random variable can assume
- When you do any probability question, the first thing you might like to do is define a random variable and then identify its probability distribution

Module 5. Continuous Probability Distributions

5.1 Introducing key concepts


- Continuous Random Variable: can potentially take any value within a range, depending on how accurately
and precisely it can be measured
- There are infinitely many possible values that the random variable can take
- Examples of continuous random variables:
o Delivery times, Time between trades, Asset returns, Default rates, Company profit, GDP, Revenue, Cost, Height, Volume of cans, Weight of packets, Temperatures
- For discrete distributions, we have probabilities at fixed points, but in between two integers the probability is zero
- In contrast, continuous distributions need to have a value for every possible point that X can take -> for all
values on the horizontal axis within the range for which the variable is defined
- We only find the probability of a range of values, not the probability of any particular value
- The probability of any particular value is zero (the probability of getting a sales revenue of 1 million dollars
is 0)
- To find a probability for a range of values from a probability density function we find the area under the
curve across the range -> integration
- Continuous distributions can be a variety of shapes (note the shape of the shaded areas). Below are four
examples of continuous distributions

Probability Density Function Interpretation


- Each density curve f(x) represents the relative likelihood of each X value
- The height of the PDF is a measure of the relative likelihood of obtaining values in that neighbourhood
(higher point on curve means these values are more likely to occur)
- The area of each shaded region is the probability that X is in that region
- The area under the entire PDF is always exactly 1
Ie. P(a<X<b) = 1 if (a,b) covers all possible values of X
- The area under a single value for X equals 0
Ie. P(a<X<a) = 0
- Probabilities for continuous random variables are only considered for regions, i.e. P(a < X < b)

5.2 Normal Distribution


- The normal distribution is also known as the bell-shaped density curve or the Gaussian distribution
- The normal distribution is not only symmetrical but bell-shaped, meaning most of the values of the continuous variable cluster around the mean.
- Properties:
o Symmetric (skewness = 0)
o Mean = median = mode -> always occurs at the centre of the distribution and is the highest point
o Location is given by the mean parameter, μ
o Spread is determined by the standard deviation parameter, σ
o The random variable has an infinite theoretical range: –∞ to +∞
- Most real data are not normally distributed; however, there are some exceptions
- Exceptions:
o IQ test results
o Brownian motion → the erratic random movement of microscopic particles in a fluid, which is believed to be normally distributed
o Sums of many random variables
- Mostly, we use normal distributions as an approximation
- The larger the σ^2, the greater the spread

5.3 Standard Normal Distributions

- To compute a normal distribution probability in Excel, use the NORM.DIST function
- Always set cumulative to TRUE
- I.e. P(X < 2) = NORM.DIST(2, mean, sd, TRUE)
- The standard normal distribution is also referred to as the “Z” distribution
- Mean = 0
- Standard deviation = 1
- Any normal distribution X (with any μ and σ combination) can be transformed into the standardised normal distribution Z
- X units are translated into Z units by subtracting the mean of X and dividing by the standard deviation of X: Z = (X – μ) / σ
- X values above the mean have positive Z values
- X values below the mean have negative Z values
- We convert to the standard normal to be able to find probabilities
o To convert an X value to a Z value in Excel, use STANDARDIZE
o To find the X value for a given probability in Excel, use NORM.INV

I.e. P(X > ?) = 10%
- Same as P(X < ?) = 90%
- NORM.INV(90%, mean, sd)

I.e. P(a < X < b) = 95%
- 95% in the middle means 2.5% below a and 2.5% above b
- a = NORM.INV(2.5%, mean, sd)
- b = NORM.INV(97.5%, mean, sd)

For an Excel tutorial -> https://www.youtube.com/watch?time_continue=36&v=bWh7Av_PavY&feature=emb_logo
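
Python's standard library offers rough equivalents of these Excel functions through `statistics.NormalDist`: `cdf` plays the role of NORM.DIST with cumulative = TRUE, and `inv_cdf` plays the role of NORM.INV. The mean and standard deviation below are hypothetical.

```python
from statistics import NormalDist

mean, sd = 100, 15   # hypothetical parameters
dist = NormalDist(mean, sd)

# Standardising: Z = (X - mean) / sd
x = 130
z = (x - mean) / sd

# P(X < 130), like NORM.DIST(130, mean, sd, TRUE)
p_below = dist.cdf(x)

# P(X > ?) = 10% is the same as P(X < ?) = 90%, like NORM.INV(90%, mean, sd)
x_top_10 = dist.inv_cdf(0.90)

# P(a < X < b) = 95% -> 2.5% below a and 2.5% above b
a = dist.inv_cdf(0.025)
b = dist.inv_cdf(0.975)

print(f"z = {z:.2f}, P(X < {x}) = {p_below:.4f}")
print(f"Top 10% cut-off = {x_top_10:.2f}")
print(f"Middle 95% lies between {a:.2f} and {b:.2f}")
```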

5.4 Assessing Normality

What is normal?
- Many misunderstandings about the normal distribution have occurred in both the business and public sectors over the years.
- These misunderstandings have caused business blunders and sparked public policy debates.
- The collapse of large financial institutions in 2008 is one such example. According to one theory, the
investment banking industry's application of the normal distribution to assess risk may have contributed to
the global collapse.

The following are the seven indicators for assessing normality:


1. Is the sample mean ≈ sample median?
2. Is the empirical rule approximately satisfied?
3. Is the IQR ≈ 1.33 standard deviations?
4. Is the boxplot (smaller n) and histogram (larger n) close to symmetric?
5. Is the histogram roughly bell-shaped?
6. Is there an absence of clear extreme, outlying observations or ‘fat tails’?
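7. Are the sample skewness and excess kurtosis statistics ≈ 0? (Excel's KURT reports excess kurtosis, so a normal distribution gives about 0 rather than 3)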

5.5 Uniform Distribution


- A continuous uniform distribution is also called the rectangular distribution because it has equal density for all possible outcomes of the random variable
o Height = 1/(b – a), where a and b are the smallest and largest possible values of X

- Properties of the uniform distribution:


5.6 Exponential Distribution
- To compute exponential distribution probabilities in Excel, use the EXPON.DIST function
- X = x
- Lambda = λ = 1/mean
- Cumulative = TRUE
- x must be in the same time units as λ: e.g. if the rate is per week and the question asks about a number of days, enter x/7 in Excel
- The exponential distribution is a positive-valued, right-skewed distribution
- Range of 0 -> ∞
- Long tail out to the right
- As lambda (λ) decreases, the distribution shifts closer to the X-axis for the lower values of X as the tail goes to higher values off to infinity
- It is always: Mode < median < mean
- Mean = 1/λ
- Standard deviation = 1/λ
- Variance = 1/λ^2
- It has one parameter, λ (lambda)
- The exponential distribution is often used to model the length of time between two occurrences of an event (i.e. time between events such as time between trucks arriving at an unloading dock, time between people arriving to be seated in a restaurant, time between transactions at an ATM)
- Related to the Poisson distribution:
o If Y ~ Poisson(λ), counted in a fixed period of time, then the time between each event counted (say X) has an exponential distribution with mean 1/λ.
o Poisson counts how many events occur in a given period of time, whereas exponential can be used to model the time between successive occurrences
- For help on the exponential probability distribution in Excel -> https://www.youtube.com/watch?v=cPA6f-3WUEc&t=4s
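
The cumulative probability that EXPON.DIST returns with cumulative = TRUE is P(X <= x) = 1 – e^(–λx). Here is a minimal Python sketch, assuming a hypothetical mean of 5 minutes between truck arrivals.

```python
from math import exp

def expon_cdf(x, lam):
    """P(X <= x) for an exponential distribution with rate lam."""
    return 1 - exp(-lam * x)

mean_gap = 5.0        # hypothetical mean time between arrivals, in minutes
lam = 1 / mean_gap    # lambda = 1 / mean

# Probability the next truck arrives within 3 minutes (x in the same units as the mean)
p_within_3 = expon_cdf(3, lam)

# Probability of waiting more than 10 minutes
p_more_than_10 = 1 - expon_cdf(10, lam)

print(f"P(X <= 3) = {p_within_3:.4f}")
print(f"P(X > 10) = {p_more_than_10:.4f}")
```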

Module 6. Sampling Distributions

We look at the distribution of 2 types of statistics:


1. The distribution of the sample mean
2. The distribution of the sample proportion

- Whenever we collect a sample and calculate a stat for that sample, that sample stat is a random variable,
and so it has a distribution
- For example, with the sample mean; every time you get a different sample, you will get a different sample
mean -> so the sample mean is a random variable which depends on who is in the sample
- Every random variable has a distribution
- Generally speaking, the larger your sample size, the closer your sample mean will be to the population mean
- Example:
- Amazon calculates it needs an annual average purchase amount of $50 to be profitable. They sample 25 customers and calculate a sample mean of $49.50. What should they do? They should use a sampling distribution
- Sampling distribution: the distribution of all possible values of a sample statistic for a given sample size (n) selected from a population
- It helps us know how accurate the statistic is for estimating the corresponding population parameter
- Any statistics we calculate (like the sample mean, sample proportion, sample median and the sample variance) all have sampling distributions that are dependent on both the population distribution and the chosen sample size
- Using the example, there is a distribution for sample means from all possible Amazon samples of 25 customers' annual purchase amounts

Developing and Visualising a Sampling Distribution


- Assume there is a population of size N=4, with random variable, X, as the number of business meetings this
month. Values of X are 18,20,22,24

Population mean = (18 + 20 + 22 + 24) / 4 = 21

Population standard deviation = 2.236

- Now to work out the sampling distribution of the sample mean for samples of n = 2 people, we need to
consider the sample means from all possible samples of size 2, when sampling with replacement from the
population. We can then find the sampling distribution based on these sample means and calculate the
summary measures for the sampling distribution

Comparing the population to the sample mean distribution


- A sample mean of 21 is the most likely outcome, despite the population having a uniform distribution
- The distribution has become much more concentrated around the middle

Mean of the sample means = 21

Standard deviation of the sample means = 2.236 / √2 = 1.581

- The standard deviation of the sample statistic here is called a ‘standard error’, because the statistic is usually trying to estimate a population parameter, so any deviation from this is considered an ‘error’
- The means of the population and of the sample means distribution will always be the same, while the population standard deviation will be greater than the standard error (because for the sampling distribution, we are finding the means, and therefore bringing the values closer towards the middle)
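
The N = 4 example can be reproduced by listing every possible sample of size 2 drawn with replacement. This is a small Python sketch and the variable names are illustrative.

```python
from collections import Counter
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [18, 20, 22, 24]  # number of business meetings, from the notes

# All 16 ordered samples of size n = 2, sampling with replacement
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

# Sampling distribution: how often each sample mean occurs
distribution = Counter(sample_means)
for value in sorted(distribution):
    print(value, distribution[value] / len(samples))

print("Population mean:", mean(population), "| sd:", round(pstdev(population), 3))
print("Mean of sample means:", mean(sample_means),
      "| standard error:", round(pstdev(sample_means), 3))
print("sd / sqrt(n):", round(pstdev(population) / sqrt(2), 3))
```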

The Sampling Distribution of the mean

- No matter what the population distribution, the population mean is always equal to the mean of the sampling distribution of the sample mean, and the standard deviation of the sample mean’s distribution is smaller than the population standard deviation for sample sizes greater than 1

- When the expected value of an estimator is equal to the parameter it is trying to estimate, it is called an unbiased estimator

Sampling distribution of the mean:

o The mean of the sample mean is always equal to the population mean
o The standard error of the sample mean is equal to the population standard deviation divided by the square root of the sample size
o The standard error of the sample mean is smaller than the population standard deviation when n > 1 (and gets smaller as the sample size grows)
o If the population is normally distributed, the sample mean also follows a normal distribution

Standard error of the mean (standard deviation of the sample mean):
o Different samples of the same size, n, from the same population will yield different sample means
o A measure of the variability in the mean from sample to sample is given by the standard error of the mean, calculated by σ / √n

- Averaging results in estimates that are typically closer to the population mean

If the population is normally distributed:


- If a population is normally distributed with mean μ and standard deviation σ, the sampling distribution of X bar is also normally distributed with mean μ and standard error σ / √n

Sampling distribution properties: normal population:

- As n increases, standard error decreases
- Z-value for the sample mean: Z = (X bar – μ) / (σ / √n)
o Use NORM.S.DIST in Excel to find the probability once you already have the z-score (and set cumulative to TRUE)
o Use NORM.S.INV to find the z-score when given a probability
- Once we have found the values on the Z distribution, the final step is to transform these points back into equivalent points on the sampling distribution of ‘X bar’

Let's recap some key points
1. The mean of the sample mean is always equal to the population mean
2. The standard error of the sample mean is equal to the population standard deviation divided by the square root of the sample size
3. The standard error of the sample mean is smaller than the population standard deviation when n > 1 (and gets smaller as the sample size grows)
4. If the population is normally distributed, the sample mean also follows a normal distribution
Calculating the z-score in Excel: https://www.youtube.com/watch?v=0pheNN99HU4
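
To illustrate the z-score steps, the sketch below revisits the Amazon example (μ = 50, n = 25, sample mean 49.50). The notes do not give a population standard deviation, so σ = 5 is purely an assumption for illustration, as is treating purchase amounts as normally distributed.

```python
from math import sqrt
from statistics import NormalDist

mu = 50.0      # required average purchase amount, from the notes
n = 25         # sample size, from the notes
x_bar = 49.50  # observed sample mean, from the notes
sigma = 5.0    # ASSUMED population standard deviation (not given in the notes)

standard_error = sigma / sqrt(n)
z = (x_bar - mu) / standard_error

# Like NORM.S.DIST(z, TRUE): probability of a sample mean this low or lower
p_below = NormalDist().cdf(z)

# Like NORM.S.INV(2.5%): z-score for a given probability, then back to X bar units
z_lower = NormalDist().inv_cdf(0.025)
x_bar_lower = mu + z_lower * standard_error

print(f"Standard error = {standard_error:.2f}, z = {z:.2f}")
print(f"P(X bar <= {x_bar}) = {p_below:.4f}")
print(f"2.5% of sample means fall below {x_bar_lower:.2f}")
```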

6.3 The Central Limit Theorem (CLT)

- CLT: Even if the population is not normally distributed, sample means of random samples from any population will be approximately normally distributed, as long as the sample size is large enough
- As a rule of thumb, if the sample size is at least 30, the Central Limit Theorem will apply
o The sample mean’s sampling distribution will be approximately normal
o The approximation improves for larger sample sizes

- For most population distributions, n >= 30 will give a sampling distribution for the sample mean that is approximately normal
- For fairly symmetric distributions, n >= 15 is enough for approximate normality
- Visual example -> [figure not reproduced; the probability was read from the standard normal table]
- In that example, it is highly unlikely to get a sample mean of 510g or higher
- Recap: The power of the Central Limit Theorem is that it tells us that even if the population is not normal (it could be any strange shape, uniform, exponential, hypergeometric, Poisson, unknown etc.), the sample mean from large enough samples will be approximately normal.
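
A small simulation can make the CLT concrete. The sketch below assumes a strongly right-skewed exponential population with mean 1 and standard deviation 1. The choice of population, the seed and the number of samples are all illustrative.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(42)  # fixed seed so the sketch is repeatable

n = 30              # sample size (the rule-of-thumb threshold)
num_samples = 2000  # number of repeated samples

# Each sample mean comes from n draws of an exponential population (mean 1, sd 1)
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

theoretical_se = 1 / sqrt(n)  # sigma / sqrt(n)

# Share of sample means within 1.96 standard errors of the population mean;
# for an approximately normal sampling distribution this should be near 95%
within = sum(abs(m - 1.0) <= 1.96 * theoretical_se for m in sample_means) / num_samples

print(f"Mean of sample means: {mean(sample_means):.3f} (population mean = 1)")
print(f"SD of sample means: {pstdev(sample_means):.3f} (sigma/sqrt(n) = {theoretical_se:.3f})")
print(f"Share within 1.96 standard errors: {within:.3f}")
```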

6.4 Sampling Distribution of the Proportion

- π = the population proportion
- Sample proportion (p) estimates π

- The sampling distribution of p follows a binomial distribution
- By the Central Limit Theorem, p is approximately normally distributed when n is large enough
- μp is the average value of the sample proportion: μp = π (the population proportion)
- The average value of our sample proportion will be equal to the population proportion, so it is an unbiased estimator. This is similar to the sample mean, where the mean of the sample mean was the same as the mean of the population.
- Conditions: [condition formulas not reproduced]
o Assuming simple random sampling (SRS)
o If the conditions are met, then we can say that the sample proportion is approximately normal
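
Once the sample proportion can be treated as approximately normal, probabilities follow from standardising with Z = (p – π) / √[π(1 – π) / n]. This is a minimal Python sketch with hypothetical values for π, n and p.

```python
from math import sqrt
from statistics import NormalDist

pi = 0.40   # hypothetical population proportion
n = 200     # hypothetical sample size
p = 0.45    # hypothetical sample proportion of interest

# Standard error of the proportion
standard_error = sqrt(pi * (1 - pi) / n)

# Standardise p to a Z-value
z = (p - pi) / standard_error

# Probability of observing a sample proportion above p
p_above = 1 - NormalDist().cdf(z)

print(f"Standard error = {standard_error:.4f}, z = {z:.3f}")
print(f"P(sample proportion > {p}) = {p_above:.4f}")
```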

Z-value for proportions:

o If the above criteria are met, then we can say that the sample proportion is approximately normal due to the Central Limit Theorem.
o This means that if we standardise it, we should approximately get a standard normal distribution. We can then work with the standard normal tables to find probabilities for the sample proportion.
o Standardise p to a Z-value with the formula: Z = (p – π) / √[π(1 – π) / n]

Sampling from finite populations

- The CLT and the standard errors of the mean and of the proportion are based on samples selected from
infinite populations or from finite populations with replacement.
- However, in instances where we sample without replacement from populations that are of a finite size we
should use a finite population correction factor whenever the sample size is a significant portion of the
overall population. Let's take a look at how to calculate this.

Finite population correction factor

- In most survey research, we sample without replacement from populations that are of a finite size (N).
- In these cases we should use the finite population correction (fpc) factor to reduce the standard error of the
mean and the standard error of the proportion.
- Reducing the standard errors is important because it leads to an increase in the accuracy of these estimators.
- The reduction occurs because as the sample grows, there are fewer observations that we do not know, which are the source of the uncertainty in the sample statistic.
- In the limit, as the sample grows to cover the entire population, there will be no uncertainty and the sample statistic will equal the population parameter exactly, and the standard error will be zero.
- We use the finite population correction factor when the sample size, n, is more than 5% of the population size, N (i.e. n/N > 0.05) AND sampling is without replacement.
- The finite population correction factor is always less than 1, so it will always reduce the standard error,
resulting in a more precise estimate of the population parameter.
- As the sample size increases to become a larger portion of the population the standard error is reduced by a
larger amount
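
The effect of the correction can be sketched in Python using the standard form of the factor, fpc = √[(N – n) / (N – 1)]. The population size, sample size and standard deviation below are hypothetical.

```python
from math import sqrt

def fpc(N, n):
    """Finite population correction factor for a sample of n from N."""
    return sqrt((N - n) / (N - 1))

N = 1000      # hypothetical population size
n = 100       # hypothetical sample size (10% of N, so above the 5% threshold)
sigma = 20.0  # hypothetical population standard deviation

se_uncorrected = sigma / sqrt(n)
se_corrected = se_uncorrected * fpc(N, n)

print(f"fpc = {fpc(N, n):.4f}")
print(f"Standard error without fpc = {se_uncorrected:.4f}")
print(f"Standard error with fpc = {se_corrected:.4f}")

# The larger the sample relative to the population, the bigger the reduction
for size in (50, 100, 500, 1000):
    print(size, round(fpc(N, size), 4))
```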

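Review the formula:

- Finite population correction factor: fpc = √[(N – n) / (N – 1)]
o Note that the numerator is always smaller than the denominator since n is greater than 1 for all practical cases, so the fpc is less than 1

Calculating standard errors:

- Standard error of the mean with the fpc = (σ / √n) * √[(N – n) / (N – 1)]
- Standard error of the proportion with the fpc = √[π(1 – π) / n] * √[(N – n) / (N – 1)]
o The fpc is always less than 1
o So it always reduces the standard error
o Resulting in more precise estimates of population parameters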
