Statistics - Basic Concepts Part 2
Continuous distribution
- A statistical distribution used for continuous data
Normal distribution (the bell curve)
[Figure: number of customers arriving at the checkout counter of a grocery store in an hour (counts 0, 1, 2, 3, …)]
Transforming data into functions
[Figure: observed data matched to standard functions]
Continuous probability distributions
1. **Normal Distribution**:
- **Finance**: Stock returns are often assumed to follow a normal distribution, especially when analyzing their
historical data over short periods.
- **Quality Control**: Product measurements tend to be normally distributed when many small factors each contribute a minor variation.
2. **Uniform Distribution**:
- **Simulation**: When modeling random events in which every value within a fixed interval of time, length, or another continuous measure is equally likely, a continuous uniform distribution is used.
- **Manufacturing**: If a product's lifespan is known to fall between two fixed times, with any time in between equally likely, it is modeled with a uniform distribution.
3. **Exponential Distribution**:
- **Reliability Engineering**: Describing the time between failures of a process or system that has a
constant failure rate.
- **Queuing Theory**: Modeling the time between arrivals, such as customers at a bank or calls at a call center, when these times are memoryless (the time already waited does not change the distribution of the remaining wait).
These distributions are applied in finance, engineering, operations research, and other fields, where they inform business strategy, decision-making, and optimization.
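The three continuous distributions above can be explored with Python's standard library. This is a minimal sketch; the process mean and spread, the spec limits, the lifespan bounds, and the arrival rate are all hypothetical numbers chosen for illustration, and the helper names `uniform_prob` and `exp_survival` are our own.

```python
import math
from statistics import NormalDist

# Normal: hypothetical part length with mean 10.0 mm and standard deviation 0.2 mm
part_length = NormalDist(mu=10.0, sigma=0.2)
# Probability that a part falls within hypothetical spec limits of 9.6 to 10.4 mm
p_in_spec = part_length.cdf(10.4) - part_length.cdf(9.6)

def uniform_prob(lo, hi, a, b):
    """P(lo < X < hi) for X ~ Uniform(a, b): overlap length divided by (b - a)."""
    lo, hi = max(lo, a), min(hi, b)
    return max(0.0, hi - lo) / (b - a)

def exp_survival(t, lam):
    """P(T > t) for an exponential waiting time with rate lam."""
    return math.exp(-lam * t)

# Uniform: hypothetical lifespan between 2 and 6 years, chance it ends in year 3 to 4
p_lifespan = uniform_prob(3, 4, 2, 6)

# Exponential: hypothetical rate of 0.5 arrivals per minute
lam = 0.5
s, t = 3.0, 2.0
# Memoryless property: P(T > s + t | T > s) equals P(T > t)
conditional = exp_survival(s + t, lam) / exp_survival(s, lam)
unconditional = exp_survival(t, lam)

print(f"Normal, P(9.6 < X < 10.4): {p_in_spec:.4f}")
print(f"Uniform(2, 6), P(3 < X < 4): {p_lifespan:.2f}")
print(f"Exponential memoryless check: {conditional:.4f} vs {unconditional:.4f}")
```

The memoryless check is why the exponential distribution suits constant failure rates: having waited 3 minutes does not change the chance of waiting 2 more.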
Discrete probability distributions
1. **Binomial Distribution**:
- **Quality Control**: Determining the probability of getting a certain number of defective products in a sample from a
production batch.
- **Marketing**: Estimating the success rate of an email marketing campaign from the number of recipients who click on an email link.
2. **Poisson Distribution**:
- **Customer Service**: Predicting the number of customer service calls a call center can expect within a given time
frame.
- **Supply Chain Management**: Estimating the number of order arrivals at a warehouse in a particular time interval.
3. **Uniform Distribution**:
- **Inventory Management**: When every product in a catalog is equally likely to be purchased, the product chosen in each purchase can be modeled with a uniform distribution.
- **Resource Allocation**: If every task in a process is equally likely to be selected for processing, the task assignment can be modeled with a uniform distribution.
4. **Bernoulli Distribution**:
- **Finance**: Determining the success or failure of an investment, where success could be a profit and
failure could be a loss.
- **Human Resources**: Assessing the outcome of an employee training program, where success indicates
the employee passed and failure indicates they did not.
These business applications follow from the basic properties of each distribution.
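The probability mass functions behind the four discrete distributions can be written directly from their formulas. A minimal sketch: the function names and every input (a 5% defect rate, 4 expected calls per hour, a 50-item catalog, a 60% chance of profit) are hypothetical, picked only to mirror the examples listed.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam): k events in an interval with mean count lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def discrete_uniform_pmf(k, n):
    """P(X = k) when each of the n outcomes 1..n is equally likely."""
    return 1 / n if 1 <= k <= n else 0.0

def bernoulli_pmf(k, p):
    """P(X = k) for one trial: k = 1 is success (probability p), k = 0 is failure."""
    return {1: p, 0: 1 - p}.get(k, 0.0)

# Hypothetical inputs, for illustration only
p_two_defects = binomial_pmf(2, n=20, p=0.05)  # 2 defective items in a sample of 20
p_three_calls = poisson_pmf(3, lam=4)          # 3 calls in an hour when 4 are expected
p_product_7 = discrete_uniform_pmf(7, n=50)    # product #7 chosen from a 50-item catalog
p_profit = bernoulli_pmf(1, p=0.6)             # an investment that yields a profit

print(f"Binomial: {p_two_defects:.4f}")
print(f"Poisson: {p_three_calls:.4f}")
print(f"Uniform: {p_product_7:.4f}")
print(f"Bernoulli: {p_profit:.4f}")
```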
Z values and areas under the Normal curve
[Figure: probability density function of the heights of men and women, with height (m) from 1.20 to 2.10 on the horizontal axis; probabilities are higher near the center of the curve and lower in both tails]
Prob(1.50 < Height < 1.70) = ? (the area under the curve between 1.50 and 1.70)
Prob(Height = 1.55) = 0, because the area of a line is zero
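The two probabilities above can be computed as areas under a normal curve. This sketch assumes, purely for illustration, that heights follow a normal distribution with mean 1.70 m and standard deviation 0.10 m; these parameters are hypothetical and were not estimated from data.

```python
from statistics import NormalDist

# Hypothetical model: heights in meters ~ Normal(mean 1.70, standard deviation 0.10)
heights = NormalDist(mu=1.70, sigma=0.10)

# P(1.50 < Height < 1.70): area under the density curve between the two values
p_range = heights.cdf(1.70) - heights.cdf(1.50)

# P(Height = 1.55): area over a single point, a "line" of zero width
p_point = heights.cdf(1.55) - heights.cdf(1.55)

# The same range expressed with Z values, z = (x - mean) / standard deviation
z_lo = (1.50 - heights.mean) / heights.stdev
z_hi = (1.70 - heights.mean) / heights.stdev
p_range_z = NormalDist().cdf(z_hi) - NormalDist().cdf(z_lo)

print(f"P(1.50 < Height < 1.70) = {p_range:.4f}")
print(f"P(Height = 1.55) = {p_point}")
print(f"Same area via Z values: {p_range_z:.4f}")
```

Standardizing to Z values gives the same area, which is what lets a single standard normal table serve every normal distribution.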
Confidence intervals
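- **Confidence interval**: the range in which we expect to find the true value of the quantity we are estimating (for example, a population mean).
- **Confidence level**: the probability that intervals computed this way cover the true value; at a 95% level, about 95 out of every 100 such intervals contain it.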
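The meaning of the confidence level can be checked by simulation: draw many samples from a population whose mean we know, build an interval from each, and count how often the intervals cover the true mean. A minimal sketch, assuming a hypothetical normal population (mean 50, standard deviation 10) whose standard deviation is known, so the interval is mean ± z × sigma / sqrt(n).

```python
import random
from statistics import NormalDist, fmean

random.seed(42)  # fixed seed so the run is repeatable

TRUE_MEAN = 50.0  # hypothetical population mean (known only because we simulate)
SIGMA = 10.0      # hypothetical population standard deviation, assumed known
N = 25            # sample size
LEVEL = 0.95      # confidence level

# z value that leaves (1 - LEVEL) / 2 in each tail, about 1.96 for 95%
z = NormalDist().inv_cdf(1 - (1 - LEVEL) / 2)
half_width = z * SIGMA / N ** 0.5

trials = 10_000
covered = 0
for _ in range(trials):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    m = fmean(sample)
    # Count the interval if it contains the true mean
    if m - half_width <= TRUE_MEAN <= m + half_width:
        covered += 1

coverage = covered / trials
print(f"{coverage:.3f} of the intervals contain the true mean")
```

The printed fraction lands close to 0.95: the level describes the long-run behavior of the procedure, not the chance that one particular interval is right.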
2. **Standard Error**:
- It measures how accurate a sample mean is as an estimate of the population mean.
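- Imagine you take multiple samples (like monthly sales from different regions) and calculate their averages. The standard error tells you how much those averages typically differ from one another. From a single sample it is estimated as s / sqrt(n), where s is the sample standard deviation and n is the sample size.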
- The smaller the standard error, the more confident you can be that the sample mean
accurately represents the population mean.
In essence, while the standard deviation describes variability within a single dataset, the standard error describes how precisely the average is estimated across multiple datasets or samples.
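The link between the two ideas can be sketched in a few lines: estimate the standard error from one sample with s / sqrt(n), then compare it with the actual spread of many sample means. The population (monthly sales with mean 200 and standard deviation 40) and the sample size of 16 are hypothetical.

```python
import random
from statistics import fmean, stdev

random.seed(7)  # fixed seed so the run is repeatable

# Hypothetical population: monthly sales with mean 200 and standard deviation 40
MU, SIGMA, N = 200.0, 40.0, 16

def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return stdev(sample) / len(sample) ** 0.5

# One sample gives one estimate of the standard error
sample = [random.gauss(MU, SIGMA) for _ in range(N)]
se_estimate = standard_error(sample)

# Many samples: the standard deviation of their means is what the standard error describes
means = [fmean([random.gauss(MU, SIGMA) for _ in range(N)]) for _ in range(5_000)]
spread_of_means = stdev(means)

theoretical_se = SIGMA / N ** 0.5  # 40 / sqrt(16) = 10

print(f"SE estimated from one sample: {se_estimate:.2f}")
print(f"Spread of 5,000 sample means: {spread_of_means:.2f}")
print(f"Theoretical SE: {theoretical_se:.2f}")
```

Quadrupling n halves the standard error, which is why larger samples give more trustworthy averages.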
Z values and areas under the Normal curve
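| Z values | Area between them |
|---|---|
| -1 and 1 | 68.27 % |
| -2 and 2 | 95.45 % |
| -3 and 3 | 99.73 % |
| -6 and 6 | 99.99966 % [3.4 in 1,000,000] |

Six Sigma approach: the 99.99966 % figure (3.4 defects per million) assumes the process mean drifts 1.5 standard deviations toward one limit, so the nearer limit is effectively 4.5 standard deviations away. Without that shift, the area between -6 and 6 is about 99.9999998 %.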
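The areas in the Z table can be reproduced with the standard normal distribution, and so can the Six Sigma figure once the conventional 1.5 standard deviation drift is included. A minimal sketch; `area_between` is our own helper name.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_between(k):
    """Area under the standard normal curve between -k and k."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3, 6):
    print(f"-{k} and {k}: {area_between(k) * 100:.7f} %")

# Six Sigma convention: the mean drifts 1.5 standard deviations toward one limit,
# so the limits sit 6 - 1.5 = 4.5 and 6 + 1.5 = 7.5 standard deviations away
defects_per_million = (Z.cdf(-4.5) + Z.cdf(-7.5)) * 1_000_000
print(f"Six Sigma defects per million: {defects_per_million:.1f}")
```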
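Hypothesis tests (1 of 2)
[Figure: two panels testing hypothesis H0 by comparing the sample mean m with the hypothesized mean M; in one panel H0 is checked (not rejected), in the other it is not]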
Hypothesis tests (2 of 2)
Two populations have the same mean
Hypothesis H0: mean 1 = mean 2, that is, mean 1 - mean 2 = 0
Hypothesis H1: mean 1 is different from mean 2
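1. Compute the standard error of the difference between means:
standard error = sqrt(s1*s1/n1 + s2*s2/n2), where s1 and s2 are the sample standard deviations and n1 and n2 are the sample sizes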
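2. Choose a confidence level (for example, 95.5%, which corresponds to Z values of -2 and 2)
3. Compute the confidence interval centered at 0: from -Z × standard error to +Z × standard error
4. Compute the difference between the sample means
5. Does the difference between means fall inside the confidence interval?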
Yes = the data are consistent with equal means at the corresponding confidence level (H0 is not rejected)
No = the means are significantly different (H0 is rejected in favor of H1)
[Figure: 95.5% confidence interval centered at 0]
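The five steps of the two-sample test can be sketched as one function. This is an illustration under stated assumptions: `compare_means` is a hypothetical name, the two samples are made-up data, and, like the slides, it uses a Z value (about 2.005 for 95.5%) rather than the t distribution that is usually preferred for samples this small.

```python
from statistics import NormalDist, fmean, stdev

def compare_means(sample1, sample2, level=0.955):
    """Two-sample comparison: is the difference of means inside an interval around 0?"""
    n1, n2 = len(sample1), len(sample2)
    s1, s2 = stdev(sample1), stdev(sample2)
    # Step 1: standard error of the difference between means
    se = (s1 * s1 / n1 + s2 * s2 / n2) ** 0.5
    # Steps 2-3: Z value for the chosen confidence level, interval centered at 0
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * se
    # Step 4: observed difference between the sample means
    diff = fmean(sample1) - fmean(sample2)
    # Step 5: inside the interval -> H0 not rejected; outside -> means differ
    return diff, (-half_width, half_width), abs(diff) <= half_width

# Hypothetical measurements from two regions, for illustration only
region_a = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
region_b = [11.2, 11.5, 11.0, 11.4, 11.3, 11.6, 11.1, 11.2]

diff, interval, inside = compare_means(region_a, region_b)
print(f"difference = {diff:.3f}, interval = ({interval[0]:.3f}, {interval[1]:.3f})")
print("H0 not rejected" if inside else "Means differ (H0 rejected)")
```

Here the difference of about 0.76 falls well outside an interval of roughly ±0.23, so the means are judged different at the 95.5% level.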