2 Probability and Statistics

The document provides an overview of statistics and probability, highlighting the differences between probability (predicting future events) and statistics (analyzing past events). It covers key concepts such as descriptive and inferential statistics, probability models (discrete and continuous), joint and conditional probabilities, Bayes' theorem, and various probability distributions including Bernoulli, binomial, and normal distributions. The document also discusses the Central Limit Theorem and the empirical rule related to normal distributions.


Statistics & Probability

Probability vs. Statistics: both deal with uncertainty and randomness

Probability: It deals with predicting the likelihood of future events

 Logically self-contained
 Follows a few rules
 Has one correct answer

Statistics: It involves analyzing the frequency of past events

 Works on experimental data
 No single correct answer
 Helps to understand patterns, relationships, and trends within data
Statistics – Classification
 Descriptive Statistics: It involves methods for summarizing and presenting data in a
meaningful and concise manner. Provides a snapshot of the data’s characteristics.

e.g.: Mean, Median, Mode, Range, Variance, Standard Deviation

 Inferential Statistics: It involves using sample data to draw inferences, predictions, or
decisions about a larger population.

e.g.: Hypothesis Testing, Confidence Intervals
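As a quick illustration, the descriptive measures listed above can be computed with Python's standard statistics module (the data here is a small hypothetical sample, chosen only for clean numbers):

```python
import statistics

# Hypothetical sample, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)        # arithmetic average
median = statistics.median(data)    # middle value of the sorted data
mode = statistics.mode(data)        # most frequent value
pstdev = statistics.pstdev(data)    # population standard deviation
data_range = max(data) - min(data)  # spread between the extremes

print(mean, median, mode, pstdev, data_range)
```

Note that the module distinguishes `pstdev` (population) from `stdev` (sample); which one applies depends on whether the data is the whole population or a sample from it.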


Terminology
 A population is the entire group that you want to draw conclusions about.

 A sample is the specific group that you will collect data from using random
sampling. The size of the sample is always less than the total size of the population.

[Diagram: samples are drawn from the population (sampling), and conclusions about the population are drawn from the sample (inference). Source: Quora]
I.I.D
 Independent and identically distributed (or IID) random variables are mutually
independent of each other and are identically distributed in the sense that they are drawn
from the same probability distribution.

Examples: coin flips, rolls of a fair six-sided die; survey responses are often assumed to be IID
Who is best? X vs. Y

Student scores (out of 50) across 10 exams:

Exam:  1   2   3   4   5   6   7   8   9  10
X:    28  32  44  33  43  30  41  36  42  27
Y:    48  19  22  45  26  50  31  28  38  49

                     Student X   Student Y
SUM                     356         356
MEAN                    35.6        35.6
MEDIAN                  34.5        34.5
STANDARD DEVIATION       6.5        11.9

Both students have identical sums, means, and medians; only the standard deviation reveals that Y's scores are far more spread out.
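The summary figures can be reproduced with the standard library; `statistics.stdev` here is the sample standard deviation, the one quantity that distinguishes the two score sets:

```python
import statistics

x = [28, 32, 44, 33, 43, 30, 41, 36, 42, 27]  # Student X's scores
y = [48, 19, 22, 45, 26, 50, 31, 28, 38, 49]  # Student Y's scores

for name, scores in (("X", x), ("Y", y)):
    print(name, sum(scores), statistics.mean(scores),
          statistics.median(scores), round(statistics.stdev(scores), 1))
```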
Probability
 In a random experiment, probability of an event is a number indicating how likely that
event will occur.

 The sample space associated with a random experiment is the set of all
possible outcomes. An event is a subset of the sample space.
e.g., Ω = {1,2,3,4,5,6}

 A real number P(A) is assigned to every event A, called the probability of A.

 This number P(A) is always between 0 and 1, where 0 indicates impossibility and
1 indicates certainty.
Probability
To qualify as a probability, P must satisfy three axioms:

 Axiom 1: P(A) ≥ 0 for any event A

 Axiom 2: Probability of the sample space, P(Ω) = 1

 Axiom 3: If A1, A2, . . . are disjoint (mutually exclusive) events, then

P(A1 ∪ A2 ∪ A3 ∪ ...) = P(A1) + P(A2) + P(A3) + ...


Probability Models – Discrete Probability
 Discrete Probability Model: In discrete probability models, the probability of an event is
computed by summing the probabilities of the outcomes that make up the event

 Bernoulli Model: A simple model that represents a binary outcome with a single parameter p

 Binomial Model: Describes the number of successes in a fixed number of independent
Bernoulli trials

 Poisson Model: Models the number of events occurring in a fixed interval of time or space

[Figure: example distributions. Source: Wikipedia]
Probability Models – Continuous Probability
 Continuous Probability Models: In continuous probability models, a random variable X
can take on any value (is continuous).

 Uniform Distribution: Represents a situation where all values in an interval are equally
likely, e.g., generating random numbers between 0 and 1 (rolling a fair six-sided die is the
discrete analogue)

 Normal (Gaussian) Distribution: Describes continuous variables that are symmetrically
distributed around a mean, often referred to as the bell curve, e.g., the height of adult
individuals in a population
Probability Models – Joint Probability
Joint Probability: It is the likelihood of two or more events occurring at the same time
Conditions for the product rule:

 Events X and Y must happen at the same time. Example: throwing two dice
simultaneously.

 Events X and Y must be independent of each other, i.e., the outcome of event X does not
influence the outcome of event Y. Example: rolling two dice.

 If these conditions are met, then P(A ∩ B) = P(A) · P(B).

Probability Models – Joint Probability
Tossing a fair coin and Rolling a Die

Find the joint probability of getting heads on the coin toss and rolling a 5 on the die.

 the probability of getting heads (event A): P(A) = 0.5

 the probability of rolling a 5 (event B): P(B) = 1/6

 P(A ∩ B) = P(A) * P(B) = 0.5 * (1/6) = 1/12 ≈ 0.0833

What will happen to the joint probability of two dependent events?
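A brute-force check of the example above: enumerating the 12 equally likely (coin, die) outcomes gives the same answer as the product rule.

```python
from fractions import Fraction
from itertools import product

# All equally likely (coin, die) outcomes: 2 * 6 = 12 of them.
outcomes = list(product(["H", "T"], [1, 2, 3, 4, 5, 6]))

# Event: heads on the coin AND a 5 on the die.
favourable = [(c, d) for c, d in outcomes if c == "H" and d == 5]
p_joint = Fraction(len(favourable), len(outcomes))
print(p_joint)  # 1/12
```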


Probability Models – Conditional Probability
 Conditional probability: It is the probability of occurrence of an event B given the
knowledge that an event A has already occurred. It is denoted by P(B|A).

 The joint probability of two dependent events then becomes P(A and B) = P(A)P(B|A)

Note:

 Two events A and B are independent if P(AB) = P(A) P(B)

 Two events A and B are conditionally independent given C if they are independent after
conditioning on C

P(AB|C) = P(B|AC)P(A|C) = P(B|C)P(A|C)


Conditional Probability - Example
60% of students pass the Final and 45% of students pass both the Final and the
Midterm. What percent of students who passed the Final also passed the Midterm?

What percent of students passed the Midterm given they passed the Final?

P(F) = 0.6, P(M and F) = 0.45, P(M|F) = ?

P(M and F) = P(F) · P(M|F)

P(M|F) = P(M and F) / P(F) = 0.45 / 0.60 = 0.75


Bayes Theorem
Joint probability of two dependent events:
P(A and B) = P(A) · P(B|A) and P(B and A) = P(B) · P(A|B)

Since P(A and B) = P(B and A):
P(A) · P(B|A) = P(B) · P(A|B), so P(A|B) = P(A) · P(B|A) / P(B)

This is Bayes' theorem

It tells us: how often A happens given that B happens: P(A|B) – Posterior Probability
When we know:
 how often B happens given that A happens, P(B|A) – Likelihood
 how likely A is on its own, P(A) – Prior Probability
 how likely B is on its own, P(B) – Evidence
Bayes Theorem [Example]
A rare disease affects 0.1% of the population. A test for this disease is 99% accurate, both
for people who have the disease and for those who do not. What is the probability that a
person actually has the disease if they test positive?

 P(D) = 0.001, P(+|D) = 0.99, P(+|¬D) = 0.01

 P(+) = P(+|D)·P(D) + P(+|¬D)·P(¬D) = 0.99 × 0.001 + 0.01 × 0.999 = 0.01098

 P(D|+) = P(+|D)·P(D) / P(+) = 0.00099 / 0.01098 ≈ 0.09

 Only about a 9% chance that the person actually has the disease!
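The same calculation, spelled out in code. One assumption is made explicit here: "99% accurate" is taken to mean both sensitivity and specificity are 0.99, which is what yields the 9% figure on the slide.

```python
# Bayes' theorem applied to the rare-disease example.
# Assumption: the test is 99% accurate for both sick and healthy people,
# i.e. sensitivity = specificity = 0.99.
prior = 0.001           # P(D): prevalence of the disease
sensitivity = 0.99      # P(+ | D)
false_positive = 0.01   # P(+ | not D) = 1 - specificity

evidence = sensitivity * prior + false_positive * (1 - prior)  # P(+)
posterior = sensitivity * prior / evidence                     # P(D | +)
print(round(posterior, 3))
```

The posterior is small because the disease is rare: almost all positive tests come from the large healthy population.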


Probability Distribution
 A probability distribution is a mathematical function that describes the likelihood of
various outcomes in a random experiment or process.

 Probability Mass Function (PMF):

 The PMF gives the probability that a discrete random variable takes on a specific value

 Probability Density Function (PDF):

 The PDF, used for continuous distributions, gives the relative likelihood of the random
variable taking on a particular value.
 The area under the PDF curve over an interval represents the probability of the variable
falling within that interval
Probability Mass Function

 A PMF describes the probabilities of


discrete random variables taking on specific
values.

 It provides a complete distribution of


probabilities for all possible outcomes.

For a fair six-sided die:
E[X] = 1·(1/6) + 2·(1/6) + 3·(1/6) + 4·(1/6) + 5·(1/6) + 6·(1/6) = 3.5
PMF-Example
Consider a biased coin with P(H) = 0.7 and P(T) = 0.3

X: The number of times the coin lands heads up in two consecutive flips.

Possible Outcomes: X={0,1,2}

P(X=0) = P(TT) = 0.3*0.3 = 0.09


P(X=1) = P(HT) + P(TH) = 0.7·0.3 + 0.3·0.7 = 0.42
P(X=2) = P(HH) = 0.7*0.7 = 0.49

Expected Value (Mean): E[X] = 0·0.09 + 1·0.42 + 2·0.49 = 1.4

PMF-Example (contd.)
For the same biased coin (P(H) = 0.7, P(T) = 0.3) and X = number of heads in two flips, the
PMF values sum to 1:

P(X=0) + P(X=1) + P(X=2) = 0.09 + 0.42 + 0.49 = 1.0

Variance:

Var(X) = E[X²] − (E[X])² = (0²·0.09 + 1²·0.42 + 2²·0.49) − 1.4² = 2.38 − 1.96 = 0.42
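The PMF, mean, and variance of this example can be derived by enumerating the four possible two-flip sequences:

```python
from itertools import product

probs = {"H": 0.7, "T": 0.3}  # biased coin from the example

# PMF of X = number of heads in two independent flips, by enumeration.
pmf = {0: 0.0, 1: 0.0, 2: 0.0}
for first, second in product("HT", repeat=2):
    x = (first, second).count("H")
    pmf[x] += probs[first] * probs[second]

mean = sum(x * p for x, p in pmf.items())              # E[X]
var = sum(x**2 * p for x, p in pmf.items()) - mean**2  # Var(X) = E[X^2] - E[X]^2
print(pmf, mean, var)
```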
Types of PMFs – Bernoulli Distribution
 The Bernoulli distribution models a single binary trial with two possible outcomes:
success (S) with probability "p" and failure (F) with probability "q" (where q = 1 - p).

 Each trial in the Bernoulli distribution has only two possible outcomes, often labeled as 1
(success) and 0 (failure).
Types of PMFs – Binomial Distribution
 It is a discrete probability distribution that models the number of successes in a fixed
number of independent Bernoulli trials
 It summarizes the probability that a value will take one of two independent values
under a given set of parameters or assumptions
[Figure: binomial PMF for N = 20 trials with p = q = 1/2]
The binomial probability of obtaining exactly n successes out of N Bernoulli trials is given by:

P(X = n) = C(N, n) · p^n · q^(N−n), where C(N, n) = N! / (n! (N−n)!)
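This formula is straightforward to evaluate with `math.comb`; here it is applied to the figure's setting of N = 20 fair-coin trials:

```python
from math import comb

def binomial_pmf(n, N, p):
    """P(X = n): exactly n successes in N independent Bernoulli(p) trials."""
    return comb(N, n) * p**n * (1 - p)**(N - n)

# N = 20 trials with p = q = 1/2; n = 10 is the most likely outcome.
p10 = binomial_pmf(10, 20, 0.5)
print(round(p10, 4))
```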
Types of PMFs – Poisson Distribution
 It is a discrete probability distribution that models the number of events occurring in a
fixed interval of time or space, given the average rate of occurrence.
 The distribution is used to model rare events that occur randomly and independently
over a specified time or space interval

P(X = k) = λ^k · e^(−λ) / k!

λ: the average rate of occurrence
k: number of events
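The Poisson PMF is equally easy to evaluate directly. The call-centre scenario below is a hypothetical example, not one from the slides:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) events in one interval, given average rate lam per interval."""
    return lam**k * exp(-lam) / factorial(k)

# Hypothetical example: a call centre receives 3 calls per minute on average.
p2 = poisson_pmf(2, 3.0)  # probability of exactly 2 calls in a given minute
print(round(p2, 4))
```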
Probability Density Function
 It is used to describe the probability distribution of a continuous random variable, where
the set of possible outcomes is an uncountably infinite range, such as real numbers within
an interval.

 It defines the likelihood of the variable falling within a particular range of values.
Uniform Distribution
 It describes a situation where all values within a specific interval [a, b] are equally
likely to occur, i.e., have the same probability density:

f(x) = 1 / (b − a) for a ≤ x ≤ b, and 0 otherwise

[Figure: flat density over the interval from a to b]
PDF - Normal/Gaussian Distribution
 It is a continuous probability distribution that is
characterized by its bell-shaped curve. The curve
tails off towards the extremes.

 It is symmetric with the highest point at the mean, and the spread of the distribution is
determined by the standard deviation.

f(x) = (1 / (σ·√(2π))) · e^(−(x−µ)² / (2σ²))
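As a sanity check, the density formula above can be evaluated by hand and compared against the standard library's `statistics.NormalDist`:

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Gaussian density evaluated directly from the formula above."""
    return exp(-(x - mu) ** 2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# Cross-check against the standard library at an arbitrary point.
x, mu, sigma = 0.5, 0.0, 1.0
print(normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```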
Normal Distribution - Empirical (68-95-99.7) Rule
It is a statistical guideline that describes the approximate
distribution of data in a normal distribution

 About 68.26% of the data falls within 1σ of µ.

 About 95.44% of the data falls within 2σ of µ.

 About 99.72% of the data falls within 3σ of µ.

 The remaining 0.28% of the data lies outside 3σ of µ.

Source: http://www.cs.uni.edu/~campbell/stat/normfact.html
Example
Consider a dataset of exam scores that follows a normal distribution with a mean of 80 and
a standard deviation of 10.

 About 68% of the scores fall within the range 70-90
 About 95% of the scores fall within the range 60-100
 About 99.7% of the scores fall within the range 50-110
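These ranges follow from the normal CDF; `statistics.NormalDist` lets us verify the rule for the exam-score model (mean 80, standard deviation 10):

```python
from statistics import NormalDist

scores = NormalDist(mu=80, sigma=10)  # exam-score model from the example

for k in (1, 2, 3):
    lo, hi = 80 - k * 10, 80 + k * 10
    within = scores.cdf(hi) - scores.cdf(lo)  # P(lo <= score <= hi)
    print(f"within {k} sigma ({lo}-{hi}): {within:.4f}")
```

The printed proportions match 0.6827, 0.9545, and 0.9973 to four decimal places.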
Central Limit Theorem
 It describes the behavior of the sample means from a population, regardless of the
population's underlying distribution

 It states that as the sample size increases, the distribution of sample means
approaches a normal distribution, regardless of the original population's distribution.

 Provided we have a population with mean μ and standard deviation σ, and we take large
random samples (n ≥ 30) from the population with replacement, the distribution of the
sample means will be approximately normally distributed with mean μ_x̄ = μ and standard
deviation σ_x̄ = σ/√n
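A small simulation illustrates the theorem. The population here is deliberately non-normal (uniform on [0, 1), with μ = 0.5 and σ = √(1/12) ≈ 0.2887), yet the sample means behave as predicted:

```python
import random
from statistics import mean, stdev

random.seed(0)  # reproducible demonstration

n, num_samples = 30, 5000
# Draw 5000 samples of size n from Uniform[0, 1) and record each sample mean.
sample_means = [mean(random.random() for _ in range(n)) for _ in range(num_samples)]

print(round(mean(sample_means), 3))   # close to mu = 0.5
print(round(stdev(sample_means), 3))  # close to sigma / sqrt(n) ~ 0.0527
```

A histogram of `sample_means` would show the characteristic bell shape even though the underlying population is flat.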
