ML Unit-3
Measures of Variability
Range: The difference between the highest and lowest values in a dataset.
Variance: The average of the squared differences from the mean.
Standard Deviation: The square root of the variance, measuring how
spread out the values are from the mean.
Example:
Using the same dataset of scores:
Range = 95 - 78 = 17
Variance = 35.36
Standard Deviation = 5.95
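As a minimal sketch, the three measures can be computed with Python's standard library. The scores below are hypothetical and chosen only for illustration, so the variance and standard deviation will not match the figures above. The sketch also assumes the population form of the variance (dividing by N); `statistics.variance` would divide by N - 1 instead.

```python
import statistics

# Hypothetical scores, for illustration only (not the dataset from the notes).
scores = [78, 82, 85, 88, 90, 95]

data_range = max(scores) - min(scores)      # highest value minus lowest value
variance = statistics.pvariance(scores)     # mean of squared deviations (divide by N)
std_dev = statistics.pstdev(scores)         # square root of the population variance

print(f"Range = {data_range}")
print(f"Variance = {variance:.2f}")
print(f"Standard Deviation = {std_dev:.2f}")
```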
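Z-score = (x – µ) / σ, where x is an observed value, µ is the population
mean, and σ is the population standard deviation.
A Z-test is mainly used when the population mean and standard deviation
are known.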
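The Z-score formula translates directly into code. This is a small sketch; the function name `z_score` and the numbers used are hypothetical.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Hypothetical values: a score of 95 in a population with mean 85 and SD 5.
z = z_score(95, mu=85, sigma=5)
print(z)  # 2.0
```

A Z-score of 2.0 means the value lies two standard deviations above the mean.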
Confidence interval:
A confidence interval is a range of values, calculated from sample data,
that is likely to contain the true population parameter at a stated
confidence level (for example, 95%). It is used to estimate the population
parameter and is often used alongside hypothesis testing.
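One common case can be sketched as follows: a confidence interval for a population mean when the population standard deviation is known, using sample mean ± z × σ / √n. The function name `mean_confidence_interval` and the input values are hypothetical, and the sketch assumes the sample mean is approximately normally distributed.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """z-based confidence interval for a mean, assuming sigma is known."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample: mean 86.0 from n = 36 observations, known sigma = 6.0.
low, high = mean_confidence_interval(sample_mean=86.0, sigma=6.0, n=36)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

A larger sample size n or a lower confidence level gives a narrower interval.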
Population
It refers to the collection that includes all the data from a defined group
being studied. The size of the population may be either finite or infinite.
Sample
Studying the entire population is not always feasible. Instead, a portion
of the data is selected from the population and the statistical methods
are applied to it. This portion is called a Sample. The size of the sample
is always finite.
          Descriptive Statistics     Inferential Statistics
Purpose   Describe and summarize     Make inferences and draw conclusions
          the data                   about a population based on sample data
The sample space of an experiment is the set of all possible outcomes.
Example: In the experiment of shooting at a target, the sample space is
{hit, miss}.
For example, when we toss a coin we get either a Head or a Tail, so there
are only two possible outcomes, {H, T}. When two coins are tossed, there
are four possible outcomes: {(H, H), (H, T), (T, H), (T, T)}.
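The coin-toss sample spaces can be enumerated programmatically. This is a small sketch using `itertools.product`; the variable names are illustrative.

```python
from itertools import product

coin = ["H", "T"]

one_toss = list(coin)                       # sample space for a single toss
two_tosses = list(product(coin, repeat=2))  # all ordered pairs for two tosses

print(one_toss)    # ['H', 'T']
print(two_tosses)  # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]
```

With n tosses the sample space has 2^n outcomes, which `product(coin, repeat=n)` enumerates in the same way.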
Joint Probability
Joint probability is the probability of two or more events occurring
together. For two events A and B, it is denoted P(A∩B), the probability of
the intersection of A and B.
Formula (for independent events A and B): P(A∩B) = P(A) * P(B)
In general, P(A∩B) = P(A) * P(B|A).
Example: Find the probability that the number three will occur twice when
two dice are rolled at the same time.
Solution: Number of possible outcomes when a die is rolled = 6
i.e. {1, 2, 3, 4, 5, 6}
Let A be the event of getting a 3 on the first die and B be the event of
getting a 3 on the second die.
Each die has six equally likely outcomes, so the probability of a three
on each die is 1/6. The two dice are independent.
P(A) = 1/6
P(B) = 1/6
P(A∩B) = 1/6 x 1/6 = 1/36
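The result can be checked by enumerating all 36 equally likely outcomes of two dice. This is a sketch; the helper `prob` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate over outcomes."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

p_a = prob(lambda o: o[0] == 3)                 # 3 on the first die
p_b = prob(lambda o: o[1] == 3)                 # 3 on the second die
p_ab = prob(lambda o: o[0] == 3 and o[1] == 3)  # 3 on both dice

print(p_a, p_b, p_ab)  # 1/6 1/6 1/36
```

Here the joint probability equals P(A) * P(B) because the two dice are independent.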
Marginal Probability
The probability of a single event occurring, regardless of the outcome of
any other variable. It is found by summing the joint probabilities of the
event across all possible outcomes of the other variable(s).
These probabilities can be calculated using a two-way table.
Suppose we are given a joint pmf p_XY(x, y) and want the marginal
probability p_Y(y).
To calculate the marginal probability we use the formula
p_Y(y) = ∑_i p_XY(x_i, y).
Let's draw a table to calculate these probabilities.
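As a minimal sketch, the marginal formula can be applied to a small two-way table. The joint pmf below is hypothetical, chosen only for illustration, and `marginal_y` is an illustrative helper name.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf p_XY(x, y) as a two-way table; entries sum to 1.
#          y = 0   y = 1
#  x = 0    1/8     3/8
#  x = 1    2/8     2/8
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(2, 8), (1, 1): Fraction(2, 8),
}

def marginal_y(joint_pmf):
    """Sum the joint pmf over x to obtain p_Y(y)."""
    p_y = defaultdict(Fraction)
    for (x, y), p in joint_pmf.items():
        p_y[y] += p
    return dict(p_y)

p_y = marginal_y(joint)
for y, p in sorted(p_y.items()):
    print(f"p_Y({y}) = {p}")
```

Each marginal value is a column total of the table, and the marginal values themselves sum to 1.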
Conditional Probability
The probability of an event A given that another event B has occurred is
termed conditional probability. It is denoted P(A|B).
P(A | B) = P(A ∩ B) / P(B), provided P(B) > 0
P(B | A) = P(A ∩ B) / P(A), provided P(A) > 0
Here:
P(A | B) = The probability of A given that B has occurred
P(B | A) = The probability of B given that A has occurred
P(A ∩ B) = The probability of both A and B happening
P(A) = The probability of A
P(B) = The probability of B
Example: A bag contains 3 red and 7 black balls. Two balls are drawn at
random without replacement. If the second ball is red, what is the
probability that the first ball is also red?
Solution:
Let A: event of selecting a red ball in first draw
B: event of selecting a red ball in the second draw
P(A ∩ B) = P(selecting both red balls) = 3/10 × 2/9 = 1/15
P(B) = P(selecting a red ball in the second draw)
= P(red ball then red ball) + P(black ball then red ball)
= 3/10 × 2/9 + 7/10 × 3/9 = 3/10
∴ P(A|B) = P(A ∩ B)/P(B) = 1/15 ÷ 3/10 = 2/9.
Example: Two dice are rolled. If it is known that at least one of the dice
shows 4, find the probability that the numbers on the dice sum to 8.
Solution:
Let,
A: at least one of the outcomes is 4
B: sum of the outcomes is 8
Then, A = {(1, 4), (2, 4), (3, 4), (4, 4), (5, 4), (6, 4), (4, 1), (4, 2),
(4, 3), (4, 5), (4, 6)}
B = {(4, 4), (5, 3), (3, 5), (6, 2), (2, 6)}
n(A) = 11, n(B) = 5, n(A ∩ B) = 1
P(B|A) = n(A ∩ B)/n(A) = 1/11.
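Both conditional-probability examples can be verified by brute-force enumeration. The sketch below assumes every ordered draw or roll is equally likely, so each conditional probability reduces to a ratio of counts. Variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations, product

# Example 1: 3 red and 7 black balls, two draws without replacement.
balls = ["R"] * 3 + ["B"] * 7
draws = list(permutations(range(10), 2))  # 90 equally likely ordered draws

n_second_red = sum(1 for i, j in draws if balls[j] == "R")
n_both_red = sum(1 for i, j in draws if balls[i] == "R" and balls[j] == "R")
# P(A|B) = P(A ∩ B) / P(B), which is a ratio of counts here.
p_first_red_given_second_red = Fraction(n_both_red, n_second_red)
print(p_first_red_given_second_red)  # 2/9

# Example 2: two dice, given that at least one die shows 4.
rolls = list(product(range(1, 7), repeat=2))
event_a = [r for r in rolls if 4 in r]               # at least one 4
event_a_and_b = [r for r in event_a if sum(r) == 8]  # and the sum is 8
p_sum8_given_a_four = Fraction(len(event_a_and_b), len(event_a))
print(p_sum8_given_a_four)  # 1/11
```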
The basic difference between them is that joint probability is the
probability of two events occurring together, marginal probability is the
probability of an event irrespective of the outcome of another variable,
and conditional probability is the probability of one event given that a
second event has occurred.
Bayes’ Theorem
Bayes' theorem, also known as Bayes' rule, Bayes' law, or Bayesian
reasoning, determines the probability of an event under uncertain
knowledge.
In probability theory, it relates the conditional and marginal
probabilities of two random events:
P(A|B) = P(B|A) * P(A) / P(B)
P(A|B) is known as the posterior, which we need to calculate. It is read
as the probability of hypothesis A given that evidence B has been observed.
Example:
There are two urns containing colored balls. The first urn contains 50 red
balls and 50 blue balls. The second urn contains 30 red balls and 70 blue
balls. One of the two urns is chosen at random (each urn has a probability
of 50% of being chosen) and then a ball is drawn at random from the chosen
urn. If a red ball is drawn, what is the probability that it comes from
the first urn?
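Solution
In probabilistic terms, what we know about this problem can be formalized
as follows:
P(red | urn 1) = 50/100 = 1/2
P(red | urn 2) = 30/100 = 3/10
P(urn 1) = P(urn 2) = 1/2
The unconditional probability of drawing a red ball can be derived using the
law of total probability:
P(red) = P(red | urn 1) P(urn 1) + P(red | urn 2) P(urn 2)
= 1/2 × 1/2 + 3/10 × 1/2 = 2/5
By Bayes' theorem:
P(urn 1 | red) = P(red | urn 1) P(urn 1) / P(red) = (1/4) / (2/5) = 5/8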
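The urn problem can be worked through numerically. This sketch uses exact fractions; the dictionary names `prior` and `likelihood_red` are illustrative choices, while the numbers come from the problem statement.

```python
from fractions import Fraction

# Prior probability of choosing each urn.
prior = {"urn1": Fraction(1, 2), "urn2": Fraction(1, 2)}
# Probability of drawing a red ball from each urn.
likelihood_red = {"urn1": Fraction(50, 100), "urn2": Fraction(30, 100)}

# Law of total probability: P(red) = sum over urns of P(red | urn) * P(urn).
p_red = sum(prior[u] * likelihood_red[u] for u in prior)

# Bayes' theorem: P(urn1 | red) = P(red | urn1) * P(urn1) / P(red).
posterior_urn1 = prior["urn1"] * likelihood_red["urn1"] / p_red

print(p_red, posterior_urn1)  # 2/5 5/8
```

Observing a red ball raises the probability of the first urn from 1/2 to 5/8, because red balls are more common in that urn.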
Discrete Distribution
A discrete probability distribution is a type of probability distribution that
shows all possible values of a discrete random variable along with the
associated probabilities. In other words, a discrete probability distribution
gives the likelihood of occurrence of each possible value of a discrete
random variable.
Such a distribution represents data with a countable number of outcomes,
which may be finite or countably infinite.
In finance, discrete distributions are used in options pricing and forecasting
market shocks or recessions.
Represented by bars or points, such as in a histogram or probability mass
function plot.
Examples: binomial distribution, Poisson distribution, geometric
distribution
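As an illustration of a discrete distribution, the binomial pmf P(X = k) = C(n, k) p^k (1 - p)^(n - k) lists a probability for each possible count k. The parameters below (4 fair coin tosses) are hypothetical, and `binomial_pmf` is an illustrative helper name.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical setting: number of heads in 4 tosses of a fair coin.
n, p = 4, 0.5
pmf = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

for k, prob_k in pmf.items():
    print(f"P(X = {k}) = {prob_k}")
```

The probabilities over all possible values sum to 1, which is the defining property of a probability mass function.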
Continuous Distribution
A continuous distribution describes the probabilities of a continuous
random variable's possible values. A continuous random variable has an
infinite and uncountable set of possible values (known as the range).
It involves continuous random variables that can take any value within a
range. Examples include height, weight, temperature, and time.
Represented by smooth curves, such as the bell curve of the normal
distribution.
Examples: normal distribution, exponential distribution, beta distribution.
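For a continuous distribution, probabilities are assigned to intervals rather than to single values. As a sketch, the exponential distribution has cumulative distribution function F(x) = 1 - e^(-λx) for x ≥ 0, so P(a < X ≤ b) = F(b) - F(a). The rate and interval below are hypothetical.

```python
from math import exp

def exponential_cdf(x, rate):
    """P(X <= x) for an exponential distribution with the given rate."""
    return 1 - exp(-rate * x) if x >= 0 else 0.0

rate = 0.5  # hypothetical rate parameter (lambda)

# Probability that X falls between 2 and 4.
p_interval = exponential_cdf(4, rate) - exponential_cdf(2, rate)
# Probability of any single exact value is zero.
p_point = exponential_cdf(2, rate) - exponential_cdf(2, rate)

print(f"P(2 < X <= 4) = {p_interval:.4f}")
print(f"P(X = 2) = {p_point}")
```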
Normal Distribution
Normal distribution, also known as the Gaussian distribution, is
a probability distribution that is symmetric about the mean, showing that
data near the mean are more frequent in occurrence than data far from the
mean.
The normal distribution appears as a "bell curve" in graphical form.
Where,
N is the normal distribution
µ is the mean of the population
σ is the standard deviation of the population
n is the sample size
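The symmetry and concentration around the mean can be illustrated with `statistics.NormalDist` from Python's standard library. The sketch assumes a standard normal distribution (µ = 0, σ = 1); any other µ and σ could be substituted.

```python
from statistics import NormalDist

# Assumed parameters for illustration: standard normal.
dist = NormalDist(mu=0, sigma=1)

# Symmetry about the mean: equal density one unit either side of mu.
left, right = dist.pdf(-1), dist.pdf(1)

# Values near the mean are more likely than values far from it.
within_one_sd = dist.cdf(1) - dist.cdf(-1)
within_two_sd = dist.cdf(2) - dist.cdf(-2)

print(f"pdf(-1) = {left:.4f}, pdf(1) = {right:.4f}")
print(f"P(within 1 SD) = {within_one_sd:.4f}")
print(f"P(within 2 SD) = {within_two_sd:.4f}")
```

Roughly 68% of values fall within one standard deviation of the mean and roughly 95% within two.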