Common Probability Built-in Functions
1. Bernoulli Distribution:
Applications of the Bernoulli Distribution
1. Binary Outcomes: The Bernoulli distribution is ideal for modeling experiments or trials
that have exactly two possible outcomes, such as:
o Coin flips (heads or tails)
o Yes/No survey responses
o Pass/Fail tests
o Success/Failure in business decisions or marketing campaigns
2. Quality Control: In quality control, the Bernoulli distribution can model whether a
manufactured item is defective or not. For instance, if the probability of an item being
defective is p, each product's quality can be modeled as a Bernoulli random variable.
3. Reliability Engineering: The distribution is used to model the reliability of individual
components in a system. For example, if a machine has a probability p of working
without failure, each trial (i.e., the machine's operation) follows a Bernoulli distribution.
4. Clinical Trials: In clinical trials, where a treatment can either work or not work
(success/failure), the Bernoulli distribution can model the outcome of individual treatment
attempts. If a drug has a success rate of p, the trials of the drug can be modeled as
Bernoulli trials.
5. Modeling Bernoulli Processes: A Bernoulli process is a sequence of independent
Bernoulli trials, where each trial has the same probability of success p (a simulation
sketch is given after this list). Applications include:
o Modeling customer purchases (purchase or no purchase)
o Modeling website clicks (click or no click)
o Modeling defective items in a production line
6. Machine Learning: In machine learning, Bernoulli distributions are used in classification
problems where the output is binary (e.g., predicting whether an email is spam or not). The
distribution also appears in Naive Bayes classifiers when dealing with binary features.
7. Randomized Algorithms: Some randomized algorithms, particularly in probabilistic
computing, use the Bernoulli distribution to decide whether to proceed with an operation
based on a random coin flip.
8. Economics and Finance: Bernoulli distributions are used to model events like stock price
movements (up or down) or default/no-default scenarios in credit risk modeling.
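As a simple illustration of a Bernoulli process (item 5 above), the following sketch simulates website clicks with a fixed click probability; the click probability 0.3, the number of visitors, and the seed are illustrative assumptions, not values from the text.
# Illustrative sketch: simulating a Bernoulli process of website clicks
set.seed(42)            # assumed seed, for reproducibility
p_click <- 0.3          # assumed probability that a visitor clicks
visitors <- 20          # assumed number of independent visitors (trials)
clicks <- rbinom(visitors, size = 1, prob = p_click)  # one Bernoulli trial per visitor
print(clicks)           # vector of 0s (no click) and 1s (click)
print(sum(clicks))      # total number of clicks observed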
Advantages of the Bernoulli Distribution
1. Simplicity and Intuition: The Bernoulli distribution is easy to understand and use, as it
deals with only two outcomes—success or failure. This simplicity makes it a natural
choice for modeling binary outcomes in real-world scenarios.
2. Wide Applicability: It is applicable to a wide range of problems where events have
exactly two possible outcomes. This makes it versatile for applications in fields like
quality control, marketing, and clinical trials.
3. Foundational for Other Distributions: The Bernoulli distribution is a building block for
other more complex distributions. For example:
o Binomial distribution is the sum of multiple independent Bernoulli trials.
o Geometric distribution models the number of Bernoulli trials before the first success.
4. Parameter Estimation: Estimating the parameter p (probability of success) from data
is straightforward. The sample proportion of successes provides an unbiased estimator for
p, making it easy to calculate probabilities and make inferences (see the sketch after this list).
5. Discrete Nature: Since it is a discrete distribution, it is well-suited for problems where
outcomes are countable (i.e., limited to success or failure).
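As a minimal sketch of the estimation idea in point 4, the code below simulates Bernoulli data with a known p and recovers it with the sample proportion; the true value 0.6, the sample size, and the seed are assumptions for illustration.
# Sketch: estimating p from simulated Bernoulli data
set.seed(1)                                       # assumed seed, for reproducibility
true_p <- 0.6                                     # assumed true probability of success
outcomes <- rbinom(100, size = 1, prob = true_p)  # 100 simulated Bernoulli trials
p_hat <- mean(outcomes)                           # sample proportion = estimate of p
print(p_hat)                                      # should be close to 0.6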
Disadvantages of the Bernoulli Distribution
1. Limited to Two Outcomes: The most significant limitation of the Bernoulli distribution
is that it can only model scenarios with two possible outcomes (success or failure). It cannot
model situations with more than two possible outcomes without modification or extension
(e.g., multinomial distribution).
2. Assumes Independence: The Bernoulli distribution assumes that trials are independent,
which may not always be true in real-world scenarios. In many applications, the outcome
of one trial can influence subsequent trials, making the independence assumption
unrealistic.
3. Fixed Probability: The Bernoulli distribution assumes that the probability of success p
remains constant for each trial. This assumption may not hold in dynamic systems where
the probability of success changes over time or under different conditions.
4. Not Suitable for Complex Outcomes: The Bernoulli distribution is not suitable for
modeling outcomes with multiple categories or more complex relationships, such as
ordinal or continuous data. For example, if a response variable has more than two possible
responses or a continuous range of outcomes, the Bernoulli distribution cannot be applied.
5. Limited to Binary Data: The Bernoulli distribution is limited to binary outcomes (0 or 1),
which may not be ideal for more complex data structures that require multiple categories
or continuous outcomes.
6. Over-Simplification: In some cases, the Bernoulli distribution may oversimplify a
problem. Real-world phenomena can often involve more than two categories, different
probabilities for each category, or depend on external factors that the Bernoulli distribution
does not account for.
Built in functions of the Bernoulli distribution:
1.pbinom():
The pbinom() function in R calculates the cumulative distribution function (CDF) for the
binomial distribution, not the Bernoulli distribution directly. However, since the Bernoulli
distribution is a special case of the binomial distribution with only one trial (n = 1), it works
for Bernoulli distributions as well.
The CDF gives the probability that the random variable X takes a value less than or equal to a
given value q.
Syntax of pbinom():
pbinom(q, size, prob)
Where:
q: The value of the random variable (0 or 1 for a Bernoulli variable).
size: The number of trials (1 for a Bernoulli distribution).
prob: The probability of success on each trial.
Example:
# Parameters for the Bernoulli distribution
q <- 0 # Value of X
n <- 1 # Number of trials (Bernoulli, so n = 1)
p <- 0.6 # Probability of success
# Calculate cumulative probabilities
p0 <- pbinom(q, size = n, prob = p) # P(X <= 0)
p1 <- pbinom(1, size = n, prob = p) # P(X <= 1)
# Display results
print(p0) # Probability of failure (tails): 0.4
print(p1) # Probability of success or failure: 1.0
2.qbinom():
The qbinom() function returns the quantile of the binomial distribution: the smallest value of X
for which the cumulative probability is at least p. With size = 1 it gives the Bernoulli quantile.
Syntax of qbinom():
qbinom(p, size, prob)
Example:
# Parameters for the Bernoulli distribution (cumulative probabilities chosen for illustration)
q1 <- qbinom(0.3, size = 1, prob = 0.6) # P(X <= 0) = 0.4 >= 0.3, so the quantile is 0
q2 <- qbinom(0.7, size = 1, prob = 0.6) # P(X <= 0) = 0.4 < 0.7, so the quantile is 1
# Display results
print(q1) # Output: 0 (failure)
print(q2) # Output: 1 (success)
3.rbinom():
The rbinom() function generates random outcomes from the binomial distribution; with size = 1,
each draw is a single Bernoulli trial (0 or 1).
Syntax of rbinom():
rbinom(n, size, prob)
Where:
n: The number of random values to generate.
size: The number of trials per value (1 for a Bernoulli distribution).
prob: The probability of success on each trial.
Example:
# Parameters for the Bernoulli distribution
n <- 10 # Number of simulations (e.g., 10 coin tosses)
size <- 1 # Number of trials (Bernoulli, so size = 1)
prob <- 0.6 # Probability of success (heads)
# Generate random outcomes
result_single <- rbinom(1, size = size, prob = prob) # one Bernoulli trial
result_multiple <- rbinom(n, size = size, prob = prob) # ten Bernoulli trials
# Display results
print(result_single)
# Output: Either 0 (tails) or 1 (heads)
print(result_multiple)
# Output: A vector of 10 random outcomes (0s and 1s)
2. Binomial Distribution:
Built in functions of the Binomial Distribution:
1.dbinom(): Probability mass function of the binomial distribution
The Probability Mass Function (PMF) of a binomial distribution gives the probability of obtaining exactly
k successes in n independent trials, where each trial has two possible outcomes (success or failure) with
a constant probability of success p.
Syntax:
dbinom(x, size, prob)
Example:
# Parameters
n <- 5 # Number of trials
k <- 3 # Number of successes
p <- 0.5 # Probability of success
# Calculate P(X = 3)
probability <- dbinom(k, size = n, prob = p)
print(probability)
Output:
[1] 0.3125
This confirms that the probability of getting exactly 3 heads in 5 flips of a fair coin is 0.3125 or 31.25%.
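To connect the dbinom() result with the binomial PMF formula P(X = k) = C(n, k) p^k (1 - p)^(n - k), the short sketch below recomputes the same probability by hand using only base-R functions and the parameters from the example above.
# Verify dbinom(3, 5, 0.5) against the binomial PMF formula
n <- 5
k <- 3
p <- 0.5
manual <- choose(n, k) * p^k * (1 - p)^(n - k) # C(5,3) * 0.5^3 * 0.5^2 = 10/32
print(manual)                                  # 0.3125, matching dbinom(k, n, p)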
2.pbinom(): Cumulative probability in the binomial distribution
The cumulative probability in a binomial distribution is the probability of getting at most a
certain number of successes in a series of independent trials. It is the sum of the
probabilities of all outcomes from 0 successes up to a given number k of successes.
Syntax:
pbinom(q, size, prob)
Example:
# Parameters
n <- 5 # Number of trials
p <- 0.5 # Probability of success
k_values <- 0:2 # Success values (0, 1, 2)
# Cumulative probability P(X <= 2): sum of the PMF over 0, 1, 2
cumulative_prob <- sum(dbinom(k_values, size = n, prob = p)) # equivalent to pbinom(2, n, p)
print(cumulative_prob)
Output:
[1] 0.5
3.qbinom(): Quantile function of the binomial distribution
The qbinom() function returns the smallest number of successes k such that the cumulative
probability P(X <= k) is at least a given probability p.
Syntax:
qbinom(p, size, prob)
Example:
# Parameters
p <- 0.95 # Cumulative probability
n <- 10 # Number of trials
prob <- 0.5 # Probability of success
# Smallest k with P(X <= k) >= 0.95
quantile_value <- qbinom(p, size = n, prob = prob)
print(quantile_value)
Output: [1] 8
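As a quick check (a sketch using the same parameters as above), the quantile returned by qbinom() can be verified with pbinom(): the cumulative probability at 8 reaches 0.95, while at 7 it does not.
# Verify the quantile: P(X <= 7) < 0.95 <= P(X <= 8) for Binomial(10, 0.5)
print(pbinom(7, size = 10, prob = 0.5)) # about 0.945, below 0.95
print(pbinom(8, size = 10, prob = 0.5)) # about 0.989, at or above 0.95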
4.rbinom(): To generate random samples from the binomial distribution
The rbinom() function in R is used to generate random samples from a binomial distribution.
This function allows you to simulate binomial trials and get the outcomes based on the number of
trials, the probability of success, and the number of random samples you want to generate.
Syntax of rbinom()
rbinom(n, size, prob)
Example (manual simulation using uniform random numbers):
# Set up parameters
n <- 5 # Number of trials
p <- 0.5 # Probability of success
# Generate one uniform random number per trial (5 random numbers between 0 and 1)
random_numbers <- runif(n, min = 0, max = 1)
# A trial counts as a success when its random number is below p
successes <- sum(random_numbers < p)
cat("Random numbers generated:", paste(round(random_numbers, 3), collapse = ", "), "\n")
cat("Successes:", successes, "\n")
Output:
Random numbers generated: 0.725, 0.322, 0.446, 0.849, 0.179
Successes: 3
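The same experiment can be run directly with rbinom() instead of comparing uniform numbers to p by hand; this short sketch is an alternative to the manual simulation above, and the actual number will differ from run to run because the draw is random.
# Direct simulation: number of successes in 5 trials with success probability 0.5
successes <- rbinom(1, size = 5, prob = 0.5) # one draw from Binomial(5, 0.5)
print(successes)                             # an integer between 0 and 5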
3. Poisson Distribution:
Built in functions:
1.dpois():
The dpois() function in R is used to calculate the probability mass function (PMF) for the Poisson
distribution. It returns the probability of observing a specific number of events (k) given a Poisson-
distributed random variable with a certain rate parameter (λ).
Syntax of dpois() function:
dpois(x, lambda)
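The notes do not include a worked dpois() example, so the following is a minimal sketch using the same customer-arrival setting as the ppois() example below (λ = 3, exactly 5 customers).
# Probability of exactly 5 customers when the average is 3 per interval
lambda <- 3
x <- 5
probability <- dpois(x, lambda) # P(X = 5) = e^(-3) * 3^5 / 5!
print(probability)              # approximately 0.1008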
2.ppois():
The ppois() function calculates the cumulative distribution function (CDF) of the Poisson
distribution, i.e., the probability of observing at most q events for a given rate parameter λ.
Syntax of ppois() function:
ppois(q, lambda)
Example:
# Define the parameters
lambda <- 3 # Average number of customers per 10 minutes
q <- 5 # Maximum number of customers we want the probability for
# Cumulative probability P(X <= 5)
cumulative_prob <- ppois(q, lambda)
print(cumulative_prob)
Output:
[1] 0.9160821
This confirms that the probability of receiving at most 5 customers in the next 10 minutes is
approximately 91.61%.
3.qpois():
The qpois() function returns the quantile of the Poisson distribution: the smallest number of
events k such that the cumulative probability P(X <= k) is at least p.
Syntax of qpois() function:
qpois(p, lambda)
Example:
# Define the parameters
lambda <- 3 # Average number of customers
p <- 0.9 # Cumulative probability
# Smallest k with P(X <= k) >= 0.9
quantile_value <- qpois(p, lambda)
print(quantile_value)
Output:
[1] 5
This is the quantile corresponding to a cumulative probability of 0.9 for a Poisson
distribution with rate parameter λ = 3: P(X <= 4) is about 0.815, while P(X <= 5) is about 0.916.
4.rpois():
The rpois() function in R is used to generate random numbers from a Poisson distribution. It allows
you to simulate random events or occurrences based on the Poisson distribution for a specified rate
parameter λ (mean number of events) and a given number of random observations.
Syntax of rpois() function:
rpois(n, lambda)
Example:
# Define the parameters
lambda <- 3 # Average number of customers per 10 minutes
n <- 5 # Number of random observations to generate
# Generate the random counts
random_counts <- rpois(n, lambda)
print(random_counts) # a vector of 5 random counts; values vary from run to run
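As a brief illustration of what the rate parameter means (a sketch; the sample size and seed are chosen arbitrarily), the average of a large number of rpois() draws should be close to λ.
# The sample mean of many Poisson draws approaches lambda
set.seed(123)                        # assumed seed, for reproducibility
samples <- rpois(10000, lambda = 3)  # 10,000 simulated counts
print(mean(samples))                 # should be close to 3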
4. Normal Distribution:
Built in functions:
1.dnorm():
The dnorm() function in R computes the probability density function (PDF) of the normal
distribution at a given value x, for a specified mean and standard deviation.
Syntax:
dnorm(x, mean = 0, sd = 1)
Example:
# Given values
x <- 5 # Value for which we compute the density
mean <- 3 # Mean of the distribution
sd <- 2 # Standard deviation
density <- (1 / (sd * sqrt(2 * pi))) * exp(-(x - mean)^2 / (2 * sd^2)) # normal PDF formula
verified <- dnorm(x, mean = mean, sd = sd)                             # built-in check
cat("The calculated density is:", density, "\n")
cat("The verified density using dnorm() is:", verified, "\n")
Output:
The calculated density is: 0.1209854
The verified density using dnorm() is: 0.1209854
2.pnorm():
The pnorm() function computes the cumulative distribution function (CDF) of the normal
distribution, i.e., P(X <= q) for a given mean and standard deviation.
Syntax:
pnorm(q, mean = 0, sd = 1)
Example:
# Given values
x <- 5 # Value for cumulative probability
mean <- 3 # Mean of the distribution
sd <- 2 # Standard deviation
# Cumulative probability P(X <= 5)
probability <- pnorm(x, mean = mean, sd = sd)
print(probability)
Output:
[1] 0.8413447
3.qnorm():
The qnorm() function is the inverse of pnorm(): it returns the quantile (value of X) that
corresponds to a given cumulative probability.
Syntax:
qnorm(p, mean = 0, sd = 1)
Example:
# Given values
p <- 0.8413 # Cumulative probability
mean <- 3 # Mean of the distribution
sd <- 2 # Standard deviation
# Quantile corresponding to the cumulative probability p
quantile_value <- qnorm(p, mean = mean, sd = sd)
cat("The quantile corresponding to P(X <= p) is:", round(quantile_value, 2), "\n")
Output:
The quantile corresponding to P(X <= p) is: 5
4.rnorm():
Syntax:
rnorm(n, mean = 0, sd = 1)
Example:
# Set mean and standard deviation
mean <- 10
sd <- 5
# Generate one random value from N(mean, sd)
random_value <- rnorm(1, mean = mean, sd = sd)
cat("Generated random value:", round(random_value, 2), "\n")
Output:
Generated random value: 7.72
(The value changes from run to run because rnorm() draws at random.)
5. Uniform Distribution:
Properties of the Uniform Distribution
Applications of uniform distribution:
Merits and demerits of Uniform Distribution:
Built in functions:
1.dunif():
The dunif() function gives the probability density function (PDF) of the uniform distribution,
which is constant and equal to 1/(max - min) inside the interval and 0 outside it.
Syntax:
dunif(x, min = 0, max = 1)
Example:
# Parameters for the uniform distribution (same interval [2, 5] as in the punif() example below)
x <- 3 # value at which to evaluate the density
a <- 2 # lower bound
b <- 5 # upper bound
density <- dunif(x, min = a, max = b) # 1 / (5 - 2)
print(density)
Output:
[1] 0.3333333
2.punif():
The punif() function gives the cumulative distribution function (CDF) of the uniform
distribution, i.e., the probability that the variable is less than or equal to x.
Syntax:
punif(x, min = a, max = b)
Example:
# Parameters for the uniform distribution
x <- 3 # upper limit for the cumulative probability
a <- 2 # lower bound
b <- 5 # upper bound
cumulative_prob <- punif(x, min = a, max = b) # (x - a) / (b - a)
print(cumulative_prob)
Output:
[1] 0.3333333
This result confirms that the cumulative probability up to x = 3 in the uniform distribution from 2 to 5 is
0.3333.
3.qunif():
The qunif() function is the quantile (inverse CDF) of the uniform distribution: it returns the
value below which a given proportion p of the distribution lies.
Syntax:
qunif(p, min = a, max = b)
Example:
# Parameters for the uniform distribution
p <- 0.25 # cumulative probability
a <- 2 # lower bound
b <- 5 # upper bound
quantile_value <- qunif(p, min = a, max = b) # a + p * (b - a)
print(quantile_value)
Output:
[1] 2.75
4.runif():
The runif() function generates random numbers from a uniform distribution between the given
lower and upper bounds.
Syntax:
runif(n, min = a, max = b)
Example:
# Generate a single random number from a uniform distribution [2, 5]
set.seed(123) # For reproducibility
random_number <- runif(1, min = 2, max = 5)
print(random_number)
Output:
[1] 2.862733
A seed is set with set.seed(123) so that runif(), which otherwise returns a different random
number on every call, produces a reproducible result.
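To make the reproducibility point concrete, the sketch below re-runs the same call twice with the same seed; both calls return an identical value (the seed 123 matches the example above).
# Setting the same seed before each call reproduces the same "random" number
set.seed(123)
first <- runif(1, min = 2, max = 5)
set.seed(123)
second <- runif(1, min = 2, max = 5)
print(first == second) # TRUE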
6. Student's t-Distribution:
Properties of Student's t-Distribution:
Limitations of Student's t-Distribution:
Built in functions:
1.dt():
The dt() function computes the probability density function (PDF) of the t-distribution at a
given value x for the specified degrees of freedom.
Syntax:
dt(x, df)
Example:
# Given values
x <- 2.0
df <- 10
# Density of the t-distribution at x
density <- dt(x, df)
print(density)
Output:
[1] 0.06115
2.pt():
The pt() function computes the cumulative distribution function (CDF) of the t-distribution,
i.e., P(T <= q) for the given degrees of freedom.
Syntax:
pt(q, df)
q: The t-value (upper bound of integration).
df: Degrees of freedom.
Example:
# Given values
q <- 2.0
df <- 10
# Cumulative probability P(T <= 2) with 10 degrees of freedom
cumulative_prob <- pt(q, df)
print(cumulative_prob)
Output:
[1] 0.9633060
This means that approximately 96.33% of a t-distributed random variable with 10 degrees of freedom
falls at or below 2.0.
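A common use of pt() is computing p-values for t-tests; the sketch below shows the two-sided p-value for an observed t-statistic of 2.0 with 10 degrees of freedom (the test setting itself is assumed for illustration).
# Two-sided p-value for an observed t-statistic of 2.0 with df = 10
t_stat <- 2.0
df <- 10
p_value <- 2 * (1 - pt(abs(t_stat), df)) # area in both tails
print(p_value)                           # about 0.073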
3.qt():
The qt() function is the quantile function (inverse CDF) of the t-distribution: it returns the
t-value below which a given cumulative probability lies for the specified degrees of freedom.
Syntax:
qt(p, df)
Example:
# Given values
p <- 0.95
df <- 10
# Calculate the quantile using the qt function
quantile_value <- qt(p, df)
print(quantile_value)
Output:
[1] 1.812461
This means that the t-value corresponding to a cumulative probability of 0.95 with 10 degrees of
freedom is approximately 1.81.
4.rt():
The rt() function simulates random variables from the t-distribution, which is a continuous
probability distribution that is bell-shaped but has heavier tails compared to the normal
distribution. The t-distribution is commonly used in statistics, particularly for hypothesis testing
(like t-tests), especially when the sample size is small or the population variance is unknown.
Syntax:
rt(n, df)
Parameters:
n: The number of random variables you want to generate from the t-distribution.
df: The degrees of freedom for the t-distribution, which determines the shape of the distribution.
Example:
# Generate a single random value from a t-distribution with 5 degrees of freedom
n <- 1
df <- 5
random_values <- rt(n, df)
print(random_values)
Output: 0.980
(The value differs on every run because rt() draws at random.)
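To see the "heavier tails" property mentioned above in practice, the sketch below compares the spread of many t-distributed draws (df = 5) with standard normal draws; the sample sizes and seed are assumptions for illustration.
# Heavier tails: the t(5) sample has a larger standard deviation than N(0, 1)
set.seed(7)
t_draws <- rt(10000, df = 5)
normal_draws <- rnorm(10000)
print(sd(t_draws))      # close to sqrt(5/3), about 1.29
print(sd(normal_draws)) # close to 1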