Probability in Data Science

The document discusses various statistical distributions including binomial, Poisson, and hypergeometric distributions. It defines key terms like probability distribution function, probability mass function, cumulative distribution function, mean, variance, and covariance. It explains that the binomial distribution models experiments with a fixed number of trials with two outcomes, the Poisson distribution models rare, independent events over an interval, and the hypergeometric distribution is used when sampling without replacement from a finite population. Examples are given of calculating probabilities and values using the binomial and Poisson distributions in Python.
Consultancies

BI reporting tools hr.scandinaviantech.se


Power BI
Data Studio
Google BigQuery
Empirical
Discrete
Continuous

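Distribution: describes the shape of the data

PDF (continuous): probability density function (sometimes called the probability distribution function)


PMF (discrete): the values taken by the random variable X and their associated probabilities

CDF: cumulative distribution function, denoted F(x); add up the PMF values (or integrate the PDF) up to x to get the CDF

E(X) = mean: the expected value, the sum of x * P(x) over all values x


V(X) = variance: the sum of (x - E(X))^2 * P(x) over all values x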
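As a small illustration of the definitions above, the sketch below computes E(X), V(X), and the CDF from a PMF. The PMF values are made up for the example, and V(X) is taken to be the usual probability-weighted squared deviation from the mean.

```python
# Hypothetical PMF of a discrete random variable X (values are made up).
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

# E(X) = sum of x * P(x)
mean = sum(x * p for x, p in pmf.items())

# V(X) = sum of (x - E(X))^2 * P(x)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

# CDF F(x) = P(X <= x): running sum of the PMF in increasing order of x
cdf = {}
running = 0.0
for x in sorted(pmf):
    running += pmf[x]
    cdf[x] = running

print("E(X) =", mean)
print("V(X) =", variance)
print("F(x) =", cdf)
```

The last CDF value should come out as 1 (up to rounding), since the probabilities of a PMF sum to 1.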

If two random variables move in the same direction, the covariance will be positive; if they move in opposite directions, the covariance will be negative.
The covariance gives the sign, but not the magnitude, of how strongly the variables are positively or negatively related.
The correlation coefficient provides such a measure of how strongly the variables are related to each other.

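For two random variables X, Y:

Covariance: Cov(X, Y) = E[(X - E(X)) * (Y - E(Y))]

Correlation coefficient: r = Cov(X, Y) / (sd(X) * sd(Y))

m - slope of a regression equation (Y on X): m = Cov(X, Y) / V(X)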
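The three quantities for two variables can be sketched in a few lines. The data points below are hypothetical, and the sample versions (dividing by n - 1) are an assumption; the notes do not say whether sample or population formulas are meant.

```python
import math

# Hypothetical paired observations of X and Y (made up for illustration).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance and variances (divide by n - 1)
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)

# Correlation coefficient: covariance scaled by both standard deviations
r = cov_xy / math.sqrt(var_x * var_y)

# Slope of the least-squares regression line of Y on X
m = cov_xy / var_x

print("covariance =", cov_xy)
print("correlation =", r)
print("slope m =", m)
```

The covariance here is positive because the two series tend to rise together; the correlation rescales it to lie between -1 and 1, which is what makes it a measure of strength.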

Binomial Distribution - Assumptions


•Experiment involves n identical trials
•Each trial has exactly two possible outcomes: success and failure
•Each trial is independent of the previous trials
•p is the probability of a success on any one trial
•q = (1-p) is the probability of a failure on any one trial
•p and q are constant throughout the experiment
•X is the number of successes in the n trials
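Under the assumptions listed above, the probability of exactly k successes is C(n, k) * p^k * q^(n-k). The sketch below evaluates that formula with the standard library only; the function name `binomial_pmf` and the values of n and p are chosen for illustration.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for n independent trials with constant success probability p."""
    q = 1.0 - p  # probability of failure on any one trial
    return comb(n, k) * p**k * q ** (n - k)


# Illustrative parameters: 20 trials, 6% chance of success per trial
n, p = 20, 0.06
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                                # should be 1 up to rounding
mean = sum(k * pk for k, pk in enumerate(pmf))  # should equal n * p

print("sum of PMF =", total)
print("mean =", mean, "n * p =", n * p)
```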
Poisson Distribution
•Describes discrete occurrences over a continuum or interval
•A discrete distribution
•Describes rare events
•Each occurrence is independent of any other occurrence.
•The number of occurrences in each interval can vary from zero to infinity.
•The expected number of occurrences must hold constant throughout the experiment.
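For a Poisson variable with expected count lambda per interval, the probability of exactly k occurrences is e^(-lambda) * lambda^k / k!. A minimal sketch of that formula follows; the helper name `poisson_pmf` and the rate of 2.0 are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) when the expected number of occurrences per interval is lam."""
    return exp(-lam) * lam**k / factorial(k)


lam = 2.0  # hypothetical expected number of occurrences per interval
p_three = poisson_pmf(3, lam)

# The count can range from zero upward; summing far enough gets close to 1.
total = sum(poisson_pmf(k, lam) for k in range(100))

print("P(X = 3) =", p_three)
print("sum over k = 0..99 =", total)
```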

The Hypergeometric Distribution


•The binomial distribution is applicable when selecting from a finite population with replacement or from an infinite population without replacement.
•The hypergeometric distribution is applicable when selecting from a finite population without replacement.
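The difference between the two sampling schemes can be seen numerically. The sketch below assumes a hypothetical population of N = 50 items containing K = 5 successes and a sample of n = 10, and compares the probability of exactly one success without replacement (hypergeometric) and with replacement (binomial with p = K / N). The helper names are illustrative.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) successes when drawing n items without replacement
    from a population of N items that contains K successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) successes in n independent draws with success probability p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


# Hypothetical population and sample sizes
N, K, n = 50, 5, 10

p_without = hypergeom_pmf(1, N, K, n)  # sampling without replacement
p_with = binomial_pmf(1, n, K / N)     # sampling with replacement

# The hypergeometric probabilities over all feasible k sum to 1
total = sum(hypergeom_pmf(k, N, K, n) for k in range(min(n, K) + 1))

print("without replacement:", p_without)
print("with replacement:   ", p_with)
```

The gap between the two results shrinks as the population grows relative to the sample, which is why the binomial is also used for an effectively infinite population.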
If the Z table is one-sided, i.e. it starts from 0 instead of -infinity, then we need to add 0.5 to the value in the table to get the cumulative probability (for positive z).
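The relationship between a one-sided table entry and the full cumulative probability can be checked with the standard library's `statistics.NormalDist`. The z value of 1.96 is chosen for illustration. In `scipy.stats` the corresponding calls are `norm.cdf` and `norm.ppf`.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

z = 1.96  # illustrative z value

# Full cumulative probability P(Z <= z), measured from -infinity
cumulative = std_normal.cdf(z)

# What a one-sided table (area from 0 to z) would list for this z
one_sided_entry = cumulative - 0.5

# Adding 0.5 back to the table entry recovers the cumulative probability
recovered = 0.5 + one_sided_entry

# Going the other way: give the area, get the z value
z_back = std_normal.inv_cdf(cumulative)

print("P(Z <= 1.96) =", cumulative)
print("one-sided table entry =", one_sided_entry)
print("z recovered from area =", z_back)
```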
import scipy
import numpy as np

from scipy.stats import binom


from scipy.stats import poisson

binom.cdf(2, 20, 0.06): 2 represents X <= 2


20 represents the number of samples (trials); 0.06 is the probability of success p

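Poisson
To get the z-value, give the area (cumulative probability)
sf: survival function, P(X > k) = 1 - cdf
binom(k, n, p) = (19, 25, 0.65)

k <= 2, n = 20, p = 0.06
P(X <= 2): cumulative binomial
binom.cdf(k, n, p)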
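The cumulative binomial calls noted above can be run as follows, assuming SciPy is installed. The first call uses the (2, 20, 0.06) values from the notes and is cross-checked against a direct sum of the PMF. Applying `cdf` and `sf` to the (19, 25, 0.65) values is an assumption, since the notes do not say which function was intended for them.

```python
from math import comb

from scipy.stats import binom

# P(X <= 2) with n = 20 trials and p = 0.06
p_at_most_2 = binom.cdf(2, 20, 0.06)

# Same probability by summing the binomial PMF for k = 0, 1, 2
manual = sum(comb(20, k) * 0.06**k * 0.94 ** (20 - k) for k in range(3))

# Second parameter set: k = 19, n = 25, p = 0.65
p_at_most_19 = binom.cdf(19, 25, 0.65)   # P(X <= 19)
p_more_than_19 = binom.sf(19, 25, 0.65)  # survival function, P(X > 19)

print("P(X <= 2)  =", p_at_most_2, "(manual sum:", manual, ")")
print("P(X <= 19) =", p_at_most_19)
print("P(X > 19)  =", p_more_than_19)
```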

lambda =3.2
X=5
survival function
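With lambda = 3.2 and X = 5 as above, the Poisson probabilities can be sketched with `scipy.stats.poisson`, assuming SciPy is installed. Reading the survival function as P(X > 5) is an interpretation of the notes, which do not state the question being answered.

```python
from scipy.stats import poisson

lam = 3.2  # expected number of occurrences per interval
x = 5

p_exactly = poisson.pmf(x, lam)  # P(X = 5)
p_at_most = poisson.cdf(x, lam)  # P(X <= 5)
p_more = poisson.sf(x, lam)      # survival function: P(X > 5) = 1 - cdf

print("P(X = 5)  =", p_exactly)
print("P(X <= 5) =", p_at_most)
print("P(X > 5)  =", p_more)
```

Note that `sf(x)` excludes x itself, so P(X >= 5) would be `poisson.sf(4, lam)`.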
