Natural Language Processing
Unit – 1
Elementary Probability Theory
Mathematical Foundations
• Elementary Probability Theory
• Probability spaces
• Conditional probability and independence
• Bayes' theorem
• Random variables
• Expectation and variance
• Notation
• Joint and conditional distributions
• Determining P
• Standard distributions
• Bayesian statistics
Elementary Probability Theory
Probability space
A probability space models random events and is made up of three parts:
1. Sample space: the set of all possible outcomes. For example, if you toss a coin twice, the sample space is {HH, HT, TH, TT}. The sample space is often denoted by the Greek letter omega (Ω).
2. Event space: the set of all events; an event may contain anywhere from zero outcomes to every outcome in the sample space.
3. Probability function: the assignment of a probability to each event. For example, the probability of tossing a fair coin and getting heads is 50%. Probabilities are nonnegative and the probabilities of all outcomes add up to 1 (see the sketch below).
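To make the three parts concrete, here is a minimal Python sketch (an illustrative addition, not part of the original notes) for the two-coin-toss experiment:

```python
# A minimal sketch of the three parts of a probability space,
# using the two-coin-toss experiment as the running example.
from itertools import product

# 1. Sample space: all possible outcomes of tossing a coin twice.
sample_space = {"".join(p) for p in product("HT", repeat=2)}   # {'HH','HT','TH','TT'}

# 2. Event space: here, every subset of the sample space is an event.
#    Example event: "at least one head".
at_least_one_head = {o for o in sample_space if "H" in o}

# 3. Probability function: assign each outcome a nonnegative mass summing to 1.
P_outcome = {o: 1 / len(sample_space) for o in sample_space}   # fair coin

def P(event):
    """Probability of an event = sum of the masses of its outcomes."""
    return sum(P_outcome[o] for o in event)

print(P(at_least_one_head))   # 0.75
```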
The sample space, S
The sample space, S, for a random phenomenon is the set of all possible outcomes.
Examples
1. Tossing a coin: S = {Head, Tail}
2. Rolling a die: S = {1, 2, 3, 4, 5, 6}
Event E
• An event in probability can be defined as a set of outcomes of a random experiment
• The sample space indicates all possible outcomes of an experiment
• Thus, an event can also be described as a subset of the sample space
Independent events: Examples
Example 1: A coin is flipped and a die is rolled. Find the probability of getting a Head on the coin and a 4 on the die.
Sample space: {H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6}
P(A) = P(Head) = 1/2
P(B) = P(4) = 1/6
P(A ∩ B) = P(A) · P(B) = (1/2) · (1/6) = 1/12

Example 2: A card is drawn from a deck and replaced; then a 2nd card is drawn. Find the probability of getting a queen and then an ace.
P(Q) = 4/52
P(A) = 4/52
P(Q ∩ A) = P(Q) · P(A) = (4/52) · (4/52) = 1/169
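Both examples can be checked by brute-force enumeration. The following Python sketch (an illustrative addition) enumerates the 12 equally likely coin-and-die outcomes and confirms that P(A ∩ B) = P(A) · P(B):

```python
# Check the coin-and-die example by enumerating the 12 equally likely outcomes.
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", range(1, 7)))          # (coin, die) pairs, 12 total
p = Fraction(1, len(outcomes))                       # each outcome has probability 1/12

A = {o for o in outcomes if o[0] == "H"}             # Head on the coin
B = {o for o in outcomes if o[1] == 4}               # 4 on the die

P_A = p * len(A)                                     # 1/2
P_B = p * len(B)                                     # 1/6
P_A_and_B = p * len(A & B)                           # 1/12

print(P_A, P_B, P_A_and_B, P_A_and_B == P_A * P_B)   # 1/2 1/6 1/12 True
```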
Independent events: Examples
A box contains 3 red balls, 2 blue balls, and 5 white balls. A ball is selected and its color noted. Then it is replaced. A 2nd ball is selected and its color noted. Find the probability of
1. Selecting 2 blue balls: P = (2/10) · (2/10) = 4/100 = 1/25
2. Selecting a blue ball and then a white ball: P = (2/10) · (5/10) = 10/100 = 1/10
3. Selecting a red ball and then a blue ball: P = (3/10) · (2/10) = 6/100 = 3/50
Dependent events
• Two events are dependent if the outcome of the first event affects the outcome
of the second event, so that the probability is changed.
Example:
Suppose we have 5 blue marbles and 5 red marbles in a bag. We pull out one
marble, which may be blue or red. Now there are 9 marbles left in the bag. What
is the probability that the second marble will be red? It depends.
• If the first marble was red, then the bag is left with 4 red marbles out of 9, so the probability of drawing a red marble on the second draw is 4/9.
• But if the first marble we pull out of the bag is blue, then there are still 5 red marbles among the 9 remaining, and the probability of pulling a red marble on the second draw is 5/9.
• The second draw is a dependent event.
• It depends upon what happened in the first draw.
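The dependence can be made explicit by conditioning on the colour of the first marble. A small Python sketch (an illustrative addition, using the same 5 blue / 5 red setup) is shown below:

```python
# Marble example: the second draw is dependent, so we condition
# on the colour of the first marble.
from fractions import Fraction

blue, red = 5, 5                                   # marbles initially in the bag

# P(second is red | first was red): one red is gone, 9 marbles remain.
p_red_given_first_red = Fraction(red - 1, blue + red - 1)     # 4/9

# P(second is red | first was blue): all 5 reds remain among 9 marbles.
p_red_given_first_blue = Fraction(red, blue + red - 1)        # 5/9

# By the law of total probability, the dependence averages out overall:
p_second_red = (Fraction(red, 10) * p_red_given_first_red
                + Fraction(blue, 10) * p_red_given_first_blue)
print(p_red_given_first_red, p_red_given_first_blue, p_second_red)   # 4/9 5/9 1/2
```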
Conditional probability: Example
Two dice are thrown simultaneously and the sum of the numbers obtained is found to be 7. What is the probability that the number 3 has appeared at least once?
Solution: The sample space S consists of all combinations of numbers on the two dice. Therefore S consists of 6 × 6 = 36 outcomes.
• Event A indicates the combination of the numbers obtained is found to be 7.
A = {(1, 6)(2, 5)(3, 4)(4, 3)(5, 2)(6, 1)}
P(A) = 6/36
• Event B indicates the combination in which 3 has appeared at least once.
B = {(3, 1)(3, 2)(3, 3)(3, 4)(3, 5)(3, 6)(1, 3)(2, 3)(4, 3)(5, 3)(6, 3)}
P(B) = 11/36
n(A ∩ B) = 2
P(A ∩ B) = 2/36
• Applying the conditional probability formula we get,
P(B|A) = P(A ∩ B) / P(A) = (2/36) / (6/36) = 2/6 = 1/3
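The same answer can be obtained by enumerating the 36 outcomes directly; the Python sketch below (an illustrative addition) does exactly that:

```python
# Verify the two-dice conditional probability by enumeration.
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))              # 36 outcomes
A = {o for o in S if sum(o) == 7}                     # sum is 7
B = {o for o in S if 3 in o}                          # at least one 3

P_A = Fraction(len(A), len(S))                        # 6/36
P_A_and_B = Fraction(len(A & B), len(S))              # 2/36
P_B_given_A = P_A_and_B / P_A                         # conditional probability

print(P_A, P_A_and_B, P_B_given_A)                    # 1/6 1/18 1/3
```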
Bayes' Theorem
Let E1, E2, …, En be a set of events associated with a sample space S, where all the events E1, E2, …, En have nonzero probability of occurrence and they form a partition of S. Let A be any event associated with S with P(A) > 0. Then, according to Bayes' theorem, for i = 1, 2, …, n:
P(Ei | A) = P(Ei) · P(A | Ei) / [ P(E1) · P(A | E1) + … + P(En) · P(A | En) ]
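As a concrete illustration (the numbers below are hypothetical, not from the notes), suppose E1, E2, E3 partition S with known priors P(Ei) and likelihoods P(A | Ei); Bayes' theorem then gives the posteriors P(Ei | A):

```python
# Numeric sketch of Bayes' theorem with a three-event partition.
prior = {"E1": 0.5, "E2": 0.3, "E3": 0.2}          # P(Ei), sums to 1
likelihood = {"E1": 0.02, "E2": 0.03, "E3": 0.05}  # P(A | Ei)

# Denominator: P(A) = sum_k P(Ek) * P(A | Ek)   (law of total probability)
p_A = sum(prior[e] * likelihood[e] for e in prior)

# Posterior: P(Ei | A) = P(Ei) * P(A | Ei) / P(A)
posterior = {e: prior[e] * likelihood[e] / p_A for e in prior}
print(posterior)   # e.g. P(E1 | A) ≈ 0.345
```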
Notation
If a random variable X is distributed according to the pmf p(x), then we will write X ~ p(x).
Expectation
For a discrete random variable X ~ p(x), the expectation (mean) is
E[X] = Σx x · p(x)
Example: for a fair six-sided die, E[X] = (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5.
Notice that the expected value is not one of the possible outcomes: you can’t roll a 3.5. However, if you average the outcomes of a large number of rolls, the result approaches 3.5.
Variance
The variance measures how far the values of X spread around the mean:
Var(X) = E[(X − E[X])²] = E[X²] − (E[X])²
Standard Deviation
The standard deviation is the square root of the variance: σ = √Var(X)
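For example, the following Python sketch (an illustrative addition) computes E[X], Var(X) and the standard deviation for a fair six-sided die:

```python
# Expectation, variance and standard deviation for a fair die, X ~ p(x) with p(x) = 1/6.
import math

pmf = {x: 1 / 6 for x in range(1, 7)}

E = sum(x * p for x, p in pmf.items())                    # E[X] = 3.5
E_sq = sum(x**2 * p for x, p in pmf.items())              # E[X^2]
var = E_sq - E**2                                         # Var(X) = E[X^2] - (E[X])^2
sd = math.sqrt(var)                                       # standard deviation

print(E, round(var, 4), round(sd, 4))                     # 3.5 2.9167 1.7078
```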
Joint and conditional distributions
• The joint probability mass function for two discrete random variables X, Y is
p(x, y) = P(X = x, Y = y)
Marginal Distribution
• Related to a joint pmf are marginal pmfs, which total up the probability masses for the values of each variable separately:
pX(x) = Σy p(x, y)    pY(y) = Σx p(x, y)
• The conditional pmf of Y given X = x (for pX(x) > 0) is then p(y | x) = p(x, y) / pX(x)
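The sketch below (with a made-up joint pmf, purely for illustration) shows how the marginals and a conditional pmf are obtained from a joint pmf:

```python
# Marginal and conditional pmfs from a small joint pmf p(x, y).
from collections import defaultdict

joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

p_X, p_Y = defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    p_X[x] += p          # p_X(x) = sum over y of p(x, y)
    p_Y[y] += p          # p_Y(y) = sum over x of p(x, y)

# Conditional pmf: p(y | x=1) = p(1, y) / p_X(1)
p_Y_given_X1 = {y: joint[(1, y)] / p_X[1] for y in (0, 1)}

print(dict(p_X), dict(p_Y), p_Y_given_X1)
```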
Binomial distribution
P(x) = [n! / ((n − x)! x!)] · p^x · q^(n−x),  for x = 0, 1, 2, . . ., n
where
n = number of trials
x = number of successes among n trials
p = probability of success in any one trial
q = probability of failure in any one trial (q = 1 − p)
Example
A die is thrown 6 times. If ‘getting an odd number’ is a success, what is the
probability of
(i) 5 successes? (ii) At least 5 successes? (iii) At most 5 successes?
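With p = q = 1/2 and n = 6, the three answers follow directly from the binomial pmf; the Python sketch below (an illustrative addition) computes them:

```python
# Die example with the binomial pmf: n = 6 trials, success = "odd number", p = q = 1/2.
from fractions import Fraction
from math import comb

n, p = 6, Fraction(1, 2)
q = 1 - p

def binom_pmf(x):
    """P(x successes in n trials) = C(n, x) * p^x * q^(n-x)."""
    return comb(n, x) * p**x * q**(n - x)

p5 = binom_pmf(5)                                  # (i)   exactly 5 successes
at_least_5 = binom_pmf(5) + binom_pmf(6)           # (ii)  at least 5 successes
at_most_5 = 1 - binom_pmf(6)                       # (iii) at most 5 successes

print(p5, at_least_5, at_most_5)                   # 3/32 7/64 63/64
```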
Multinomial Distribution
The generalization of a binomial trial to the case where each of the trials has
more than two basic outcomes is called a multinomial experiment, and is
modeled by the multinomial distribution.
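A minimal Python sketch of the multinomial pmf is given below (the helper function and the example numbers are illustrative, not from the notes):

```python
# Multinomial pmf: P(X1=x1,...,Xk=xk) = n!/(x1!...xk!) * p1^x1 * ... * pk^xk.
from fractions import Fraction
from math import factorial

def multinomial_pmf(counts, probs):
    """Probability of observing the given counts over k outcomes in n = sum(counts) trials."""
    n = sum(counts)
    coef = factorial(n)
    for x in counts:
        coef //= factorial(x)          # multinomial coefficient n!/(x1!...xk!)
    prob = Fraction(1)
    for x, p in zip(counts, probs):
        prob *= Fraction(p) ** x
    return coef * prob

# Example: probability that 12 rolls of a fair die give each face exactly twice.
print(multinomial_pmf([2] * 6, [Fraction(1, 6)] * 6))
```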
Normal distribution
Data can be "distributed" (spread out) in different ways.
In many cases the data tends to cluster around a central value, with no bias to the left or right, and it is closely approximated by a "Normal Distribution".
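For reference, a small Python sketch of the normal density f(x) = (1/(σ√(2π))) · exp(−(x − μ)²/(2σ²)) is shown below (an illustrative addition, evaluated here for the standard normal with μ = 0 and σ = 1):

```python
# Normal (Gaussian) density with mean mu and standard deviation sigma.
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """f(x) = 1/(sigma*sqrt(2*pi)) * exp(-(x - mu)^2 / (2*sigma^2))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# The density peaks at the central value mu and is symmetric around it.
print(normal_pdf(0.0), normal_pdf(-1.0), normal_pdf(1.0))   # 0.3989... 0.2420... 0.2420...
```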