Statistics Top Wise Important Formulas
1. Set Theory
Set theory is a branch of mathematical logic that deals with collections of objects called sets. A set is a well-defined
collection of distinct objects.
Key Concepts:
a) Set Notation:
- Sets are usually denoted by capital letters: A, B, C, etc.
- Elements are listed within curly braces: {1, 2, 3}
- The symbol ∈ means "is an element of": 2 ∈ {1, 2, 3}
- The symbol ∉ means "is not an element of": 4 ∉ {1, 2, 3}
b) Types of Sets:
- Finite sets: Have a countable number of elements
- Infinite sets: Have infinitely many elements (countable, like the natural numbers, or uncountable, like the real numbers)
- Empty set: Contains no elements, denoted as {} or ∅
c) Set Operations:
- Union (A ∪ B): All elements that are in A or B (or both)
- Intersection (A ∩ B): All elements that are in both A and B
- Difference (A - B): All elements in A that are not in B
- Complement (A'): All elements in the universal set that are not in A
Examples:
1. In a class of 30 students:
Let M be the set of students who play music
Let S be the set of students who play sports
If 15 students play music, 20 play sports, and 10 play both:
- |M ∪ S| = |M| + |S| - |M ∩ S| = 15 + 20 - 10 = 25
So, 25 students play either music or sports (or both), and the remaining 30 - 25 = 5 students play neither.
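The inclusion-exclusion count can be checked with Python's built-in set type; the rosters below are hypothetical stand-ins consistent with the numbers above.

```python
# Hypothetical rosters: students are numbered 1-30.
music = set(range(1, 16))    # 15 students play music
sports = set(range(6, 26))   # 20 students play sports; 10 overlap with music
assert len(music & sports) == 10  # intersection: students who play both

# Inclusion-exclusion: |M ∪ S| = |M| + |S| - |M ∩ S|
assert len(music | sports) == len(music) + len(sports) - len(music & sports)
print(len(music | sports))  # 25 students play music or sports (or both)
```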
2. Permutations
Permutations are arrangements of objects where order matters. They are used when we want to know how many
ways we can arrange a set of objects.
Key Formulas:
- Permutations of n distinct objects: n!
- Permutations of n objects taken r at a time: P(n,r) = n! / (n - r)!
Examples:
1. In how many ways can the first three positions in a race with 8 runners be filled?
This is a permutation of 8 objects taken 3 at a time.
P(8,3) = 8! / (8-3)! = 8 × 7 × 6 = 336 ways
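As a quick check, Python's standard library computes the same value (math.perm requires Python 3.8+):

```python
import math

# P(8, 3) = 8! / (8 - 3)!: ordered ways to fill 3 podium places from 8 runners
print(math.perm(8, 3))                         # 336
print(math.factorial(8) // math.factorial(5))  # 336, straight from the definition
```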
3. Combinations
Combinations are selections of objects where order doesn't matter. They are used when we want to know how many
ways we can select objects from a larger set.
Key Formula:
- Combinations of n objects taken r at a time: C(n,r) = n! / (r! × (n - r)!)
Examples:
1. In a standard 52-card deck, how many 5-card poker hands are possible?
This is a combination of 52 cards taken 5 at a time.
C(52,5) = 52! / (5! × 47!) = 2,598,960 possible hands
2. A bag contains 5 red marbles and 3 blue marbles. How many ways can you select 4 marbles?
This is a combination problem with two types of objects. We can break it down:
- 4 red, 0 blue: C(5,4) × C(3,0) = 5 × 1 = 5 ways
- 3 red, 1 blue: C(5,3) × C(3,1) = 10 × 3 = 30 ways
- 2 red, 2 blue: C(5,2) × C(3,2) = 10 × 3 = 30 ways
- 1 red, 3 blue: C(5,1) × C(3,3) = 5 × 1 = 5 ways
Total: 5 + 30 + 30 + 5 = 70 ways
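Both combination examples can be verified with math.comb; note the marble total also equals C(8, 4), since the breakdown is just another way of choosing any 4 of the 8 marbles:

```python
import math

# 5-card poker hands from a 52-card deck: C(52, 5)
print(math.comb(52, 5))  # 2598960

# Marbles: sum C(5, r) * C(3, 4 - r) over feasible red counts r = 1..4
total = sum(math.comb(5, r) * math.comb(3, 4 - r) for r in range(1, 5))
print(total, math.comb(8, 4))  # 70 70
```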
4. Descriptive Statistics: Mean, Median, Mode, Frequency Tables, Bar Graphs, Pie Charts
Example (using a frequency table of 22 observations whose values sum to 63):
1. Mean:
Mean = (sum of all values) / (total frequency) = 63 / 22 ≈ 2.86
2. Median:
To find the median, we first find the middle position. Total frequency = 22, so the middle position = (22 + 1) / 2 = 11.5; that is, the median is the average of the 11th and 12th ordered values. Both of these values are 3, so the median is 3.
3. Mode:
The mode is the value with the highest frequency. In this case, it's 3 with a frequency of 8.
● Mean ≈ 2.86
● Median = 3
● Mode = 3
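The underlying frequency table is not reproduced above, so the sketch below uses one hypothetical table consistent with the stated summary (22 observations summing to 63, mode 3 with frequency 8):

```python
from collections import Counter

# Hypothetical frequency table consistent with the summary above
freq = Counter({1: 3, 2: 5, 3: 8, 4: 4, 5: 2})  # 22 observations, sum = 63

data = sorted(freq.elements())
n = len(data)                                    # 22
mean = sum(data) / n                             # 63 / 22 ≈ 2.86
# Even n: median is the average of the 11th and 12th ordered values
median = (data[n // 2 - 1] + data[n // 2]) / 2   # 3.0
mode = freq.most_common(1)[0][0]                 # 3 (frequency 8)

print(round(mean, 2), median, mode)              # 2.86 3.0 3
```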
Bayes' Theorem
For events A and B, where P(B) > 0:
P(A|B) = [P(B|A) * P(A)] / P(B)
Using the total probability theorem:
P(A|B) = [P(B|A) * P(A)] / [P(B|A)P(A) + P(B|A')P(A')]
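A small helper makes the total-probability form concrete; the disease-screening numbers below are illustrative assumptions, not taken from the text.

```python
def bayes(p_b_given_a: float, p_a: float, p_b_given_not_a: float) -> float:
    """P(A|B) via Bayes' theorem, expanding P(B) by total probability."""
    p_not_a = 1 - p_a
    p_b = p_b_given_a * p_a + p_b_given_not_a * p_not_a
    return p_b_given_a * p_a / p_b

# Illustrative screening test: 1% prevalence, 95% sensitivity, 5% false positives
print(round(bayes(0.95, 0.01, 0.05), 3))  # 0.161 -- P(disease | positive test)
```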
A discrete random variable is a variable that can take on a countable number of distinct values. Each possible value of
the discrete random variable has a probability associated with it.
## Probability Mass Function (PMF)
The probability mass function, denoted as P(X = x) or f(x), gives the probability that a discrete random variable X takes
on the value x.
Properties of PMF:
1. 0 ≤ P(X = x) ≤ 1 for all x
2. ∑P(X = x) = 1, summed over all possible values of x
## Cumulative Distribution Function (CDF)
The cumulative distribution function, denoted as F(x), gives the probability that X takes on a value less than or equal to x.
Properties of CDF:
1. 0 ≤ F(x) ≤ 1 for all x
2. F(x) is non-decreasing
3. lim(x→-∞) F(x) = 0 and lim(x→∞) F(x) = 1
## Variance
The variance of a discrete random variable X, denoted Var(X) or σ², measures the spread of the distribution around its mean:
Var(X) = E[(X - μ)²] = ∑(x - μ)² * P(X = x) = E[X²] - (E[X])²
## Common Discrete Distributions
1. Bernoulli Distribution
- Models a single trial with two possible outcomes (success/failure)
- P(X = 1) = p, P(X = 0) = 1 - p
- E(X) = p, Var(X) = p(1-p)
2. Binomial Distribution
- Models the number of successes in n independent Bernoulli trials
- P(X = k) = C(n,k) * p^k * (1-p)^(n-k)
- E(X) = np, Var(X) = np(1-p)
3. Poisson Distribution
- Models the number of events occurring in a fixed interval of time or space
- P(X = k) = (λ^k * e^(-λ)) / k!
- E(X) = λ, Var(X) = λ
4. Geometric Distribution
- Models the number of trials until the first success in repeated Bernoulli trials
- P(X = k) = p * (1-p)^(k-1)
- E(X) = 1/p, Var(X) = (1-p) / p²
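As a sketch, the PMFs above can be coded directly and their means checked numerically (the infinite Poisson and geometric sums are truncated where the remaining tail is negligible):

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return lam**k * math.exp(-lam) / math.factorial(k)

def geom_pmf(k, p):  # k = 1, 2, 3, ... trials until the first success
    return p * (1 - p)**(k - 1)

n, p, lam = 10, 0.3, 4.0
print(sum(k * binom_pmf(k, n, p) for k in range(n + 1)))  # 3.0  = np
print(sum(k * poisson_pmf(k, lam) for k in range(100)))   # 4.0  = λ
print(sum(k * geom_pmf(k, p) for k in range(1, 500)))     # 3.33 = 1/p
```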
## 1. Expectation
The expectation or expected value of a random variable X, denoted as E(X) or μ, is a measure of the central tendency
of the distribution.
### Example:
Rolling a fair six-sided die:
E(X) = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 3.5
## 2. Variance
Variance measures the spread or dispersion of a random variable around its mean:
Var(X) = E[(X - μ)²] = E[X²] - (E[X])²
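Both definitions can be evaluated directly from a PMF stored as a dictionary; here, the fair die from the example above:

```python
# PMF of a fair six-sided die
pmf = {x: 1 / 6 for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())               # E(X) = 3.5
var = sum((x - mean) ** 2 * p for x, p in pmf.items())  # Var(X) = 35/12

print(mean, round(var, 2))                              # 3.5 2.92
```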
## 3. Conditional Expectation
The conditional expectation of X given Y, denoted as E(X|Y), is the expected value of X when Y is known or fixed.
### Example:
Consider a two-stage experiment:
1. Roll a fair die (Y)
2. Flip a coin Y times and count the number of heads (X)
The law of total expectation, also known as the law of iterated expectations or the tower rule, states that the expected
value of a random variable X can be computed by first taking the conditional expectation with respect to another
random variable Y and then taking the expectation of that result.
E(X) = E(E(X|Y))
For discrete Y: E(X) = ∑(E(X|Y = y) * P(Y = y)), summed over all possible values of y
For continuous Y: E(X) = ∫(E(X|Y = y) * f(y) dy), integrated over the entire range of Y
### Properties:
1. Useful for computing expectations in multi-stage experiments
2. Helps in breaking down complex expectations into simpler parts
In the two-stage experiment above, E(X | Y = y) = y/2 (the expected number of heads in y fair flips), so E(X) = E(Y/2) = E(Y)/2 = 3.5/2 = 1.75. This means that in our two-stage experiment, we expect an average of 1.75 heads.
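A brute-force check of the tower rule for the die-then-coins experiment, computing each E(X | Y = y) from the binomial PMF rather than assuming it equals y/2:

```python
import math

# Stage 1: roll a fair die (Y); stage 2: flip a fair coin Y times, count heads (X)
def e_x_given_y(y: int) -> float:
    # E(X | Y = y) for a Binomial(y, 1/2) number of heads
    return sum(k * math.comb(y, k) * 0.5**y for k in range(y + 1))

# Tower rule: E(X) = ∑ E(X | Y = y) * P(Y = y)
e_x = sum(e_x_given_y(y) * (1 / 6) for y in range(1, 7))
print(e_x)  # ≈ 1.75 = E(Y)/2
```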
## 1. Bernoulli Distribution
### Formula:
P(X = x) = p^x * (1-p)^(1-x), where x ∈ {0, 1}
### Properties:
- Mean (μ) = p
- Variance (σ²) = p(1-p)
## 2. Binomial Distribution
### Formula:
P(X = k) = C(n,k) * p^k * (1-p)^(n-k)
where C(n,k) is the binomial coefficient ("n choose k")
### Properties:
- Mean (μ) = np
- Variance (σ²) = np(1-p)
## 3. Poisson Distribution
### Formula:
P(X = k) = (λ^k * e^(-λ)) / k!
where λ is the average rate of occurrence
### Properties:
- Mean (μ) = λ
- Variance (σ²) = λ
## 4. Geometric Distribution
### Formula:
P(X = k) = p * (1-p)^(k-1), where k = 1, 2, 3, ...
### Properties:
- Mean (μ) = 1/p
- Variance (σ²) = (1-p) / p²
## 5. Discrete Uniform Distribution
### Formula:
P(X = x) = 1/n, where x ∈ {a, a+1, ..., b} and n = b - a + 1
### Properties:
- Mean (μ) = (a + b) / 2
- Variance (σ²) = ((b - a + 1)² - 1) / 12
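A quick numeric check of the discrete uniform formulas, using a = 1, b = 6 as an example range:

```python
a, b = 1, 6
n = b - a + 1
values = range(a, b + 1)

mean = sum(values) / n
var = sum((x - mean) ** 2 for x in values) / n

print(mean, (a + b) / 2)    # 3.5 3.5
print(var, (n**2 - 1) / 12) # both 35/12 ≈ 2.9167, matching ((b-a+1)² - 1)/12
```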
## Practical Examples
1. Bernoulli: A website A/B test where each user either clicks (1) or doesn't click (0) on a button.
2. Binomial: In a political survey of 1000 people, modeling the number who support a particular candidate.
3. Poisson: Modeling the number of calls arriving at a call center during a one-hour interval.
4. Geometric: The number of times you need to spin a roulette wheel before hitting your lucky number.
5. Discrete Uniform: Modeling the day of the week a person is born, assuming equal probability for each day.
Understanding these distributions and when to apply them is crucial for modeling real-world phenomena and making
informed decisions based on probability theory.
1. Maximax (Optimistic Criterion)
Formula: Choose the alternative whose best possible payoff is the highest.
Max[Max(payoffs for each alternative)]
Example:
A company is considering three investment options with the following potential returns based on market conditions:
Option A: $50k, $80k, $30k
Option B: $40k, $70k, $60k
Option C: $55k, $65k, $45k
Maximax solution: Choose Option A, as it has the highest possible payoff of $80k.
2. Maximin (Pessimistic Criterion)
Formula: Choose the alternative whose worst possible payoff is the highest.
Max[Min(payoffs for each alternative)]
Example:
Using the same investment options:
Option A: $50k, $80k, $30k
Option B: $40k, $70k, $60k
Option C: $55k, $65k, $45k
Maximin solution: Choose Option C, as its worst-case scenario ($45k) is better than the others.
3. Equally Likely (Laplace Criterion)
Formula: Assume all outcomes are equally likely and choose the alternative with the highest average payoff.
Max[Average(payoffs for each alternative)]
Example:
For the same investment options:
Option A: ($50k + $80k + $30k) / 3 = $53.33k
Option B: ($40k + $70k + $60k) / 3 = $56.67k
Option C: ($55k + $65k + $45k) / 3 = $55k
Equally Likely solution: Choose Option B, as it has the highest average payoff ($56.67k).
4. Hurwicz (Criterion of Realism)
Formula: Choose based on a weighted average of the best and worst outcomes.
Max[α * Max(payoffs) + (1-α) * Min(payoffs)]
Where α is the optimism coefficient (0 ≤ α ≤ 1)
Example:
Let's use α = 0.6 for the same investment options:
Option A: 0.6 * $80k + 0.4 * $30k = $60k
Option B: 0.6 * $70k + 0.4 * $40k = $58k
Option C: 0.6 * $65k + 0.4 * $45k = $57k
Hurwicz solution: Choose Option A, as it has the highest weighted value ($60k).
5. Minimax Regret
Formula:
1. Calculate the regret matrix (opportunity loss)
2. Find the maximum regret for each alternative
3. Choose the alternative with the minimum of these maximum regrets
Example:
Let's create a regret matrix for our investment options (regret = best payoff under each market condition minus the option's payoff, in $k):
Option A: 5, 0, 30
Option B: 15, 10, 0
Option C: 0, 15, 15
Maximum regrets:
Option A: 30
Option B: 15
Option C: 15
Minimax Regret solution: Choose either Option B or C, as they share the lowest maximum regret ($15k).
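All five criteria reduce to a few lines over the payoff matrix; the sketch below reproduces the solutions above (in the minimax-regret tie, min simply returns the first of B and C):

```python
payoffs = {            # payoffs in $k, one entry per market condition
    "A": [50, 80, 30],
    "B": [40, 70, 60],
    "C": [55, 65, 45],
}
alpha = 0.6            # Hurwicz optimism coefficient

maximax = max(payoffs, key=lambda o: max(payoffs[o]))      # A (best payoff 80)
maximin = max(payoffs, key=lambda o: min(payoffs[o]))      # C (worst payoff 45)
laplace = max(payoffs, key=lambda o: sum(payoffs[o]) / 3)  # B (average 56.67)
hurwicz = max(payoffs, key=lambda o: alpha * max(payoffs[o])
              + (1 - alpha) * min(payoffs[o]))             # A (weighted value 60)

# Minimax regret: regret = column best minus the option's payoff
best = [max(col) for col in zip(*payoffs.values())]        # [55, 80, 60]
max_regret = {o: max(b - v for b, v in zip(best, row))
              for o, row in payoffs.items()}               # A:30, B:15, C:15
minimax_regret = min(max_regret, key=max_regret.get)

print(maximax, maximin, laplace, hurwicz, minimax_regret)  # A C B A B
```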
6. Expected Monetary Value (EMV)
Formula: Weight each alternative's payoffs by the probabilities of the states of nature and choose the alternative with the highest expected value.
EMV = ∑[P(state) * payoff(state)]
Example:
Suppose we have probabilities for our market conditions:
Good (30%), Normal (50%), Poor (20%)
EMV solution: Choose Option B, as it has the highest expected monetary value ($59k).
7. Expected Value of Perfect Information (EVPI)
Formula: EVPI = EMV with perfect information - EMV without perfect information
Example:
EMV with perfect information:
0.3 * $80k + 0.5 * $60k + 0.2 * $45k = $63.5k
EVPI = $63.5k - $59k = $4.5k
This means that perfect information about market conditions would be worth $4.5k to the decision-maker.
These models provide different approaches to decision-making under uncertainty, each with its own assumptions and
implications. The choice of model often depends on the decision-maker's attitude towards risk and the specific context
of the decision.
11. Simple decision trees (with not more than 2 decision nodes)
Decision trees are graphical representations of decision-making processes. They consist of:
1. Decision nodes (squares)
2. Chance nodes (circles)
3. End nodes (triangles)
4. Branches (lines connecting nodes)
For a simple decision tree with up to two decision nodes, we typically follow these steps:
1. Draw the initial decision node
2. Draw branches for each alternative
3. Add chance nodes for uncertain outcomes
4. Add end nodes with payoffs
5. Calculate expected values working backwards from right to left
Example Problem:
An entrepreneur is considering opening an ice cream shop. They have two decisions to make:
1. Shop size: Small or Large
2. Menu variety: Basic or Gourmet
The success of the shop depends on customer demand, which can be High or Low.
```mermaid
graph TD
A[Shop Size] -->|Small| B[Menu]
A -->|Large| C[Menu]
B -->|Basic| D((Demand))
B -->|Gourmet| E((Demand))
C -->|Basic| F((Demand))
C -->|Gourmet| G((Demand))
D -->|High 60%| H[$50k]
D -->|Low 40%| I[$20k]
E -->|High 70%| J[$70k]
E -->|Low 30%| K[$10k]
F -->|High 70%| L[$100k]
F -->|Low 30%| M[$-20k]
G -->|High 80%| N[$150k]
G -->|Low 20%| O[$-40k]
```
Working backwards from the payoffs:
- Small + Basic: 0.6 × $50k + 0.4 × $20k = $38k
- Small + Gourmet: 0.7 × $70k + 0.3 × $10k = $52k
- Large + Basic: 0.7 × $100k + 0.3 × (-$20k) = $64k
- Large + Gourmet: 0.8 × $150k + 0.2 × (-$40k) = $112k
At each menu decision node, Gourmet beats Basic ($52k vs $38k for Small; $112k vs $64k for Large); at the shop-size node, Large beats Small ($112k vs $52k).
Therefore, the optimal decision is to open a large shop with a gourmet menu, which has an expected value of $112k.
This example demonstrates how a simple decision tree with two decision nodes can be used to analyze a complex
decision-making process involving multiple choices and uncertain outcomes. The tree helps visualize the options,
calculate expected values, and determine the optimal course of action based on the given probabilities and payoffs.
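A sketch of the same backward-induction computation; the nested dictionary mirrors the branches of the diagram above.

```python
# payoffs in $k; each (probability, payoff) pair is a demand branch
tree = {
    "Small": {"Basic":   [(0.6, 50), (0.4, 20)],
              "Gourmet": [(0.7, 70), (0.3, 10)]},
    "Large": {"Basic":   [(0.7, 100), (0.3, -20)],
              "Gourmet": [(0.8, 150), (0.2, -40)]},
}

def expected(branches):
    return sum(p * payoff for p, payoff in branches)

# Work backwards: best menu for each size, then best size
best_menu = {size: max(menus, key=lambda m: expected(menus[m]))
             for size, menus in tree.items()}
best_size = max(tree, key=lambda s: expected(tree[s][best_menu[s]]))

print(best_size, best_menu[best_size],
      expected(tree[best_size][best_menu[best_size]]))  # Large Gourmet 112.0
```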
1. Discrete Joint Probability
Discrete joint probability deals with the probability of two or more discrete random variables occurring together.
Key concepts:
- Joint Probability Mass Function (PMF): P(X=x, Y=y)
- Marginal Probability: P(X=x) or P(Y=y)
- Conditional Probability: P(X=x | Y=y) or P(Y=y | X=x)
Example:
Suppose we have two dice, a red (X) and a blue (Y) die. Since the dice are independent, the joint probability of getting a 3 on the red die and a 4 on the blue die is:
P(X=3, Y=4) = P(X=3) * P(Y=4) = (1/6) * (1/6) = 1/36
Properties:
1. 0 ≤ P(X=x, Y=y) ≤ 1 for all x and y
2. ∑∑ P(X=x, Y=y) = 1 (sum over all possible values of X and Y)
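Enumerating all 36 outcomes makes the joint, marginal, and conditional definitions concrete, and verifies that the PMF sums to 1:

```python
from itertools import product
from fractions import Fraction

# Joint PMF for two independent fair dice: every pair has probability 1/36
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

p_3_and_4 = joint[(3, 4)]                               # joint: 1/36
p_x3 = sum(p for (x, _), p in joint.items() if x == 3)  # marginal: 1/6
p_y4_given_x3 = p_3_and_4 / p_x3                        # conditional: 1/6

print(p_3_and_4, p_x3, p_y4_given_x3)                   # 1/36 1/6 1/6
print(sum(joint.values()))                              # 1 -- PMF sums to one
```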
2. Properties of Expectation
Expectation (E) is the average value of a random variable over many trials.
Key properties:
a) Linearity: E[aX + b] = aE[X] + b
b) Additivity: E[X + Y] = E[X] + E[Y]
c) Multiplicativity for independent variables: E[XY] = E[X]E[Y]
Example:
Let X be the outcome of rolling a fair six-sided die.
E[X] = (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5
If we define Y = 2X + 1:
E[Y] = E[2X + 1] = 2E[X] + 1 = 2(3.5) + 1 = 8
3. Properties of Variance
Variance (Var) measures the spread of a random variable around its mean.
Key properties:
a) Var(X) = E[(X - μ)²] = E[X²] - (E[X])²
b) Var(aX + b) = a²Var(X)
c) For independent variables: Var(X + Y) = Var(X) + Var(Y)
Example:
For the fair six-sided die:
E[X²] = (1² + 2² + 3² + 4² + 5² + 6²) / 6 = 91/6
Var(X) = E[X²] - (E[X])² = 91/6 - (3.5)² = 35/12 ≈ 2.92
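These properties can be verified exactly by enumerating die outcomes:

```python
from itertools import product

def E(vals):
    """Expectation under equally likely outcomes."""
    return sum(vals) / len(vals)

def var(vals):
    return E([v * v for v in vals]) - E(vals) ** 2

die = list(range(1, 7))
print(E(die))                       # 3.5
print(E([2 * x + 1 for x in die]))  # 8.0 -- linearity: 2*E[X] + 1
print(var(die))                     # ≈ 2.9167 = 35/12

# Independent dice: Var(X + Y) = Var(X) + Var(Y)
sums = [x + y for x, y in product(die, repeat=2)]
print(var(sums), 2 * var(die))      # both ≈ 5.833
```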
Problem:
Two fair six-sided dice are rolled, a red (X) and a blue (Y) die.
Solution (marginal probability):
Each face of the red die is equally likely, so P(X=4) = 1/6.
Problems like this one exercise joint probability, conditional probability, expectation, and variance in a simple scenario involving dice rolls.
1. Markov's Inequality
Markov's inequality provides an upper bound for the probability that a non-negative random variable X is greater than
or equal to some positive value a.
Formula:
P(X ≥ a) ≤ E[X] / a
Where:
X is a non-negative random variable
E[X] is the expected value of X
a is any positive real number
Key points:
- Applies to non-negative random variables only
- Provides a loose bound, but is applicable to any distribution
- Most useful when we only know the expected value of the distribution
Example problem:
The average time students spend on social media daily is 2 hours. Using Markov's inequality, find an upper bound for
the probability that a randomly selected student spends at least 5 hours on social media daily.
Solution:
Let X be the time spent on social media (in hours). X is non-negative, with E[X] = 2 hours, and we take a = 5 hours.
P(X ≥ 5) ≤ E[X] / a = 2 / 5 = 0.4
Therefore, the probability that a randomly selected student spends at least 5 hours on social media is at most 40%.
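The bound itself is one line; comparing it against a simulated non-negative distribution with the same mean (an exponential here, purely as an illustrative assumption) shows how loose Markov's inequality can be:

```python
import random

def markov_bound(mean: float, a: float) -> float:
    return mean / a  # P(X >= a) <= E[X] / a

random.seed(0)
mean = 2.0  # average daily hours, as in the example
sample = [random.expovariate(1 / mean) for _ in range(100_000)]

a = 5.0
empirical = sum(x >= a for x in sample) / len(sample)
print(markov_bound(mean, a))  # 0.4 -- the Markov upper bound
print(round(empirical, 3))    # ≈ 0.082 = e^(-5/2) for this assumed distribution
```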
2. Chebyshev's Inequality
Chebyshev's inequality provides an upper bound for the probability that a random variable X deviates from its mean by
more than k standard deviations.
Formula:
P(|X - μ| ≥ kσ) ≤ 1 / k²
Where:
X is a random variable with mean μ and standard deviation σ
k is any positive real number
Key points:
- Applies to any distribution with a finite mean and variance
- Often provides a tighter bound than Markov's inequality, since it uses the variance as well as the mean
- Most useful when we know both the mean and variance of the distribution
Example problem:
The scores on a standardized test are normally distributed with a mean of 500 and a standard deviation of 100. Use
Chebyshev's inequality to find an upper bound for the probability that a randomly selected student's score deviates
from the mean by more than 200 points.
Solution:
μ = 500, σ = 100
A deviation of 200 points corresponds to k = 200 / 100 = 2 standard deviations.
P(|X - 500| ≥ 200) = P(|X - μ| ≥ 2σ) ≤ 1 / 2² = 0.25
Therefore, the probability that a randomly selected student's score deviates from the mean by more than 200 points is at most 25%.
Note: In this case, since we know the distribution is normal, we could calculate the exact probability, which would be
about 4.6%. Chebyshev's inequality provides an upper bound that is valid for any distribution with the same mean and
standard deviation, not just the normal distribution.
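That comparison can be reproduced with math.erf, which gives the exact normal tail alongside the distribution-free Chebyshev bound:

```python
import math

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

mu, sigma, dev = 500, 100, 200
k = dev / sigma                    # 2 standard deviations

chebyshev = 1 / k**2               # 0.25, valid for any distribution
exact = normal_cdf(mu - dev, mu, sigma) + (1 - normal_cdf(mu + dev, mu, sigma))
print(chebyshev, round(exact, 3))  # 0.25 vs ≈ 0.046 for the normal case
```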
A continuous random variable is a variable that can take on any value within a given range. Unlike discrete random
variables, which can only take on specific values, continuous random variables can assume an infinite number of
values.
Key concepts:
a) Probability Density Function (PDF): f(x) ≥ 0 for all x, with ∫[all x] f(x) dx = 1; probabilities come from areas, P(a ≤ X ≤ b) = ∫[a to b] f(x) dx
b) Cumulative Distribution Function (CDF): F(x) = P(X ≤ x) = ∫[-∞ to x] f(t) dt
c) Expected Value:
E[X] = ∫[all x] x * f(x) dx
d) Variance:
Var(X) = E[(X - μ)²] = ∫[all x] (x - μ)² * f(x) dx
Common continuous distributions:
a) Uniform Distribution:
- Constant probability density over an interval [a, b]
- PDF: f(x) = 1 / (b - a) for a ≤ x ≤ b
- Mean: μ = (a + b) / 2
- Variance: σ² = (b - a)² / 12
b) Normal Distribution:
- Symmetric, bell-shaped density determined by its mean μ and standard deviation σ
- PDF: f(x) = (1 / (σ√(2π))) * e^(-(x - μ)² / (2σ²))
- Mean: μ
- Variance: σ²
c) Exponential Distribution:
- Models time between events in a Poisson process
- PDF: f(x) = λe^(-λx) for x ≥ 0
- Mean: 1 / λ
- Variance: 1 / λ²
Example Problem:
Problem: The time it takes to complete a certain task is uniformly distributed between 10 and 20 minutes. Find: a) the PDF, b) P(X > 15), c) the expected value, and d) the variance.
Solution:
a) PDF:
f(x) = 1 / (b - a) = 1 / (20 - 10) = 1/10 for 10 ≤ x ≤ 20
b) P(X > 15) = (20 - 15) / (20 - 10) = 5/10 = 0.5 or 50%
c) Expected value:
E[X] = (a + b) / 2 = (10 + 20) / 2 = 15 minutes
d) Variance:
Var(X) = (b - a)² / 12 = (20 - 10)² / 12 = 100 / 12 ≈ 8.33 (minutes²)
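Each part of the solution reduces to a one-line computation:

```python
a, b = 10, 20                 # task time uniform on [10, 20] minutes

pdf_height = 1 / (b - a)      # f(x) = 1/10 on the interval
p_gt_15 = (b - 15) / (b - a)  # P(X > 15) = 0.5
mean = (a + b) / 2            # 15 minutes
variance = (b - a) ** 2 / 12  # ≈ 8.33 minutes²

print(pdf_height, p_gt_15, mean, round(variance, 2))  # 0.1 0.5 15.0 8.33
```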
Key properties of the continuous uniform distribution on [a, b]:
a) PDF: f(x) = 1 / (b - a) for a ≤ x ≤ b
b) CDF: F(x) = (x - a) / (b - a) for a ≤ x ≤ b
c) Mean: μ = (a + b) / 2
d) Variance: σ² = (b - a)² / 12
Example Problem:
The time to complete a certain task is uniformly distributed between 5 and 15 minutes.
1) What is the probability that the task takes less than 8 minutes?
2) What is the expected time to complete the task?
3) What is the standard deviation of the completion time?
Solution:
a = 5, b = 15
1) P(X < 8) = (8 - 5) / (15 - 5) = 3/10 = 0.3 or 30%
2) E[X] = (5 + 15) / 2 = 10 minutes
3) Var(X) = (15 - 5)² / 12 = 100/12 ≈ 8.33, so the standard deviation is σ = √8.33 ≈ 2.89 minutes
For large n and p not too close to 0 or 1 (a common rule of thumb is np ≥ 5 and n(1 - p) ≥ 5), the binomial distribution can be approximated by a normal distribution with μ = np and σ = √(np(1 - p)). This is based on the Central Limit Theorem.
When using this approximation, we often apply a continuity correction, which means we adjust the boundaries of our
interval by 0.5.
Example Problem:
A fair coin is tossed 100 times. Use the normal approximation to estimate the probability of getting 45 or fewer heads.
Solution:
1) Let X be the number of heads, so X ~ Binomial(n = 100, p = 0.5).
2) Calculate μ and σ:
μ = n * p = 100 * 0.5 = 50
σ = √(n * p * (1-p)) = √(100 * 0.5 * 0.5) = 5
3) Apply the continuity correction: P(X ≤ 45) ≈ P(Y ≤ 45.5), where Y is normal with mean 50 and standard deviation 5.
4) Standardize to a Z-score:
Z = (X - μ) / σ = (45.5 - 50) / 5 = -0.9
5) From the standard normal table, P(Z ≤ -0.9) ≈ 0.1841.
Therefore, the probability of getting 45 or fewer heads in 100 tosses of a fair coin is approximately 0.1841 or 18.41%.
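The approximation can be compared against the exact binomial sum; math.erf supplies the normal CDF without external libraries:

```python
import math

n, p, x = 100, 0.5, 45
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

# Exact binomial tail: P(X <= 45) as a direct sum of PMF terms
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

# Normal approximation with continuity correction: P(Y <= 45.5)
z = (x + 0.5 - mu) / sigma
approx = 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(exact, 4), round(approx, 4))  # both ≈ 0.184
```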