
Probability

Sample vs population:

Why n-1 in the denominator?
Dividing by n-1 instead of n (Bessel's correction) makes the sample variance an unbiased estimator of the population variance: the squared deviations are measured around the sample mean, which is itself fit to the data, so they come out systematically too small.
But when we use the true mean instead of the sample mean in the formula, we don't need the bias correction.
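The n-1 correction can be checked by simulation. A sketch (the standard normal population, sample size, and trial count are assumed for illustration): averaging the three variance estimators over many small samples shows that dividing by n biases the estimate down by a factor of (n-1)/n, while dividing by n-1, or using the true mean, does not.

```python
import random

random.seed(0)
TRUE_MEAN, TRUE_VAR = 0.0, 1.0  # standard normal population
n, trials = 5, 200_000

sum_biased = sum_unbiased = sum_known_mean = 0.0
for _ in range(trials):
    x = [random.gauss(TRUE_MEAN, TRUE_VAR ** 0.5) for _ in range(n)]
    xbar = sum(x) / n
    ss_sample = sum((xi - xbar) ** 2 for xi in x)     # around sample mean
    ss_true = sum((xi - TRUE_MEAN) ** 2 for xi in x)  # around true mean
    sum_biased += ss_sample / n          # divide by n: biased low
    sum_unbiased += ss_sample / (n - 1)  # Bessel's correction: unbiased
    sum_known_mean += ss_true / n        # true mean known: no correction

print(sum_biased / trials)      # ~ 0.8, i.e. (n-1)/n times the true variance
print(sum_unbiased / trials)    # ~ 1.0
print(sum_known_mean / trials)  # ~ 1.0
```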
● Geometric and exponential distributions are memoryless.
● A chi-square RV is the sum of squares of independent standard normal RVs.
● Markov's inequality gives an upper bound on the tail probability.

Markov property: A stochastic process has the Markov property if the future state of the process
depends only on the present state, and not on the sequence of events that preceded it.

It is memoryless.
● Choose the t-distribution when the sample size is small (n < 30) and the population variance is unknown. It generalizes the z-distribution: its tails are heavier, so choose it when more probability mass sits in the tails.

● The chi-square distribution is used to check goodness of fit and independence of categorical data, and/or when the sample size is small. It is the sum of squares of independent standard normal RVs.

● Choose Normal distribution when distribution is symmetric and data is distributed near the mean.

● When two RVs X, Y are orthogonal, E(XY) = 0.


● When two RVs are independent, they are also uncorrelated. But the converse is not true.

Skewness:
It is a measure of how much a pdf deviates from being symmetric.
https://en.wikipedia.org/wiki/Skewness

With skew, the mode lies at the peak, the mean is pulled furthest toward the tail, and the median lies between them. The median is the best measure of center under skewness (presence of outliers).
Positively skewed: income distribution, housing prices, sales data of a store.
Negatively skewed: age at retirement, exam scores.
Zero skew: height data, IQ scores.

Kurtosis: It tells us about the tailedness of a pdf, i.e. how heavy its tails are. It is the fourth standardized moment. The normal distribution has excess kurtosis 0 (raw kurtosis 3). If the tails are heavier than the normal's, excess kurtosis is positive; otherwise negative.

Law of large numbers:


It states that as the number of samples increases, the sample mean tends to the true mean.
1. Weak LLN: Convergence in Probability

2. Strong LLN: Almost sure convergence
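The LLN can be illustrated with a quick simulation (a sketch assuming a fair six-sided die, true mean 3.5): the sample mean gets closer to the true mean as the number of rolls grows.

```python
import random

random.seed(42)

# Fair six-sided die: the true mean is 3.5.
for n in (100, 10_000, 1_000_000):
    sample_mean = sum(random.randint(1, 6) for _ in range(n)) / n
    print(n, sample_mean)  # the sample mean drifts toward 3.5 as n grows
```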

Central Limit theorem


The sampling distribution of the mean will be approximately normal, as long as the sample size is large enough. Regardless of whether the population has a normal, Poisson, binomial, or any other distribution, the sampling distribution of the mean approaches normality when n is large (n > 30). Here n is the number of RVs chosen in sampling.

That is, Z = (x_bar - mu) / (sigma / sqrt(n)), with Z ~ N(0, 1).
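A sketch of the CLT in action (the exponential population, sample size, and repetition count are assumed): means of Exp(1) samples, standardized as above, end up looking standard normal even though the population is strongly skewed.

```python
import random
import statistics

random.seed(1)

mu, sigma = 1.0, 1.0  # Exp(1): mean 1, std 1, very non-normal
n, reps = 50, 20_000

z = []
for _ in range(reps):
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    z.append((xbar - mu) / (sigma / n ** 0.5))  # standardize per the CLT

print(statistics.mean(z))   # ~ 0
print(statistics.stdev(z))  # ~ 1
```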

Kernel Density estimation:

It is a non-parametric method to estimate the pdf of an RV using kernels as weights. The idea is to center a kernel function (a normal pdf, for example) on each data point and sum the contributions to estimate the actual pdf. A smoothing parameter known as the bandwidth (the variance of the normal kernel) controls the smoothness of the estimated pdf.
https://en.wikipedia.org/wiki/Kernel_density_estimation
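A minimal from-scratch sketch of a Gaussian KDE (the sample values and bandwidth below are illustrative assumptions): each data point contributes one normal bump, and the bumps are averaged.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function estimating the pdf via a Gaussian kernel."""
    n = len(data)
    def pdf(x):
        # Average one normal bump centered on each data point.
        return sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
            / (bandwidth * math.sqrt(2 * math.pi))
            for xi in data
        ) / n
    return pdf

# Bimodal sample: the estimate should show mass near both 0 and 5.
sample = [-0.2, 0.0, 0.1, 0.3, 4.8, 5.0, 5.1, 5.3]
f = gaussian_kde(sample, bandwidth=0.5)
print(f(0.0), f(2.5), f(5.0))  # high, low, high
```

A larger bandwidth smooths the two modes together; a smaller one produces a spike per point.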
Percentiles and quantiles:
A percentile is a measure that indicates the value below which a given percentage of the data falls. For
example, the 25th percentile is the value below which 25% of the data points are found.

A quantile is a general term for values that divide a dataset into equal-sized intervals. The data is divided
into q equal parts, and the k-th quantile is the value below which k/q of the data falls.
Special Types of Quantiles:

● Quartiles: Divide the data into 4 equal parts (25%, 50%, 75%, 100%).
● Deciles: Divide the data into 10 equal parts (10%, 20%, ..., 100%).
● Percentiles: Divide the data into 100 equal parts (1%, 2%, ..., 100%).
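The definitions above can be checked with Python's statistics module (requires Python 3.8+; the data 1..100 is an assumed toy example):

```python
import statistics

data = list(range(1, 101))  # 1..100

# Quartiles: the three cut points splitting the data into 4 equal parts.
q1, q2, q3 = statistics.quantiles(data, n=4)
# Deciles: 9 cut points for 10 equal parts.
deciles = statistics.quantiles(data, n=10)
print(q1, q2, q3)   # the middle cut is the median, 50.5
print(deciles[0])   # the 10th percentile: 10% of the data lies below it
```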

Q-Q (quantile-quantile) plot:
It is a graphical technique used to check whether given data points (or an RV X) follow a specific distribution (normal or any other), or whether two RVs X, Y follow the same distribution.
Chebyshev’s Inequality:
If we know the data follows a normal distribution, we can say what percentage lies within mu +- k*sigma using the 68-95-99.7 rule. But what if we don’t know the distribution? If we know the mean and std deviation, we can use this inequality to answer questions like: what percentage of individuals have a salary in the range [20k, 60k]?
It provides an upper bound on the probability that a random variable deviates from its mean by more than k standard deviations: P(|X - mu| >= k*sigma) <= 1/k^2.
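An empirical sketch of the salary question (the mean 40k, std 10k, and normal population are assumed; the bound itself holds for any distribution with that mean and std):

```python
import random

random.seed(7)

# Chebyshev: P(|X - mu| >= k*sigma) <= 1/k**2, whatever the distribution.
# Salary example: mean 40k, std 10k; what fraction lies outside [20k, 60k]?
mu, sigma, k = 40_000, 10_000, 2
samples = [random.gauss(mu, sigma) for _ in range(100_000)]  # one possible population
tail = sum(abs(x - mu) >= k * sigma for x in samples) / len(samples)
print(tail)      # empirical tail mass for this population
print(1 / k**2)  # Chebyshev's bound: 0.25
# So at least 1 - 1/k**2 = 75% of salaries lie in [20k, 60k], for any pdf.
```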
Log-Normal distribution:
If X ~ Lognormal(mu, sigma^2), then Y = log(X) ~ N(mu, sigma^2).

https://en.wikipedia.org/wiki/Log-normal_distribution — check the occurrences and applications section on Wikipedia.
● The length of comments posted in Internet discussion forums follows a log-normal distribution.
● The length of chess games tends to follow a log-normal distribution.
● In economics, there is evidence that the income of 97%–99% of the population is distributed log-normally. (The distribution of higher-income individuals follows a Pareto distribution.)
You can take log(X) and check whether log(X) follows a normal distribution (use a Q-Q plot). If so, we can apply all the normal-theory mathematics. Generally, the log-normal distribution is common in human behavior and internet-company data.

Power-law distribution:
Power-law functions have a very long tail, and they follow the 80-20 rule: roughly 80% of the mass falls within the first 20% of the range. The Pareto distribution is a power-law distribution.
A power law means the relative change in one quantity is proportional to a power of another quantity, independent of their initial values.

https://en.wikipedia.org/wiki/Power_law
https://en.wikipedia.org/wiki/Pareto_distribution
To check whether data follows a Pareto or other power-law distribution, use a log-log plot. If it gives a decreasing straight line, then it follows a power law.

x_m is the scale parameter: below x_m the Pareto distribution is not defined.
Kepler’s third law is a power-law relationship.
Wealth distribution: a small percentage of the population holds most of the wealth.
Book sales: a small percentage of books produce the majority of the income.

Power transform (Box-Cox transform):

It transforms an RV X toward a normal distribution.
https://builtin.com/data-science/box-cox-transformation-target-variable
Lambda is chosen to maximize the log-likelihood of the transformed data y_i.
For lambda = 0, the transform reduces to log(x_i), so it normalizes log-normal data.
For lambda = 1, x_i is already normally distributed; only a shift happens.
Limitations:
1. It is sensitive to outliers.
2. It is applicable only to positive data.
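A hand-rolled sketch of the transform itself for a fixed lambda (in practice a routine such as scipy.stats.boxcox picks lambda by maximum likelihood; the log-normal sample below is an assumed example):

```python
import math
import random

def box_cox(x, lam):
    """Box-Cox transform of a positive value x for parameter lambda."""
    if lam == 0:
        return math.log(x)       # lambda = 0 reduces to the log transform
    return (x ** lam - 1) / lam  # lambda = 1 is just a shift: x - 1

random.seed(3)
# Log-normal data: log(X) is normal, so lambda = 0 should normalize it.
data = [math.exp(random.gauss(0, 1)) for _ in range(5)]
print([round(box_cox(x, 0), 3) for x in data])  # back to normal draws
print([round(box_cox(x, 1), 3) for x in data])  # merely shifted by -1
```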

So, what’s the recipe?

You are given data; you check which distribution it came from using a Q-Q plot. Otherwise, you can transform it toward a Gaussian using the Box-Cox transform. Then make inferences, like what % of the population has age > 60.

How to measure how two RVs are related?


Covariance, Pearson correlation coefficient, Spearman rank correlation coefficient.

Covariance: It gives a qualitative measure of the tendency toward a linear relationship between two jointly distributed RVs; it does not quantify the strength.
If greater values of one variable mainly correspond with greater values of the other variable, and the same holds for lesser values (that is, the variables tend to show similar behavior), the covariance is positive.
A change in units of measurement changes the result. This is fixed by the correlation coefficient.

Pearson correlation coeff:


https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
Both covariance and Pearson’s coefficient assume the relationship is linear. But what if it is not? Also, neither captures the slope of the relationship.

Spearman rank correlation coeff:


Better to use when there is a non-linear but monotonic relationship. It is Pearson’s correlation coefficient applied to the ranks of the dataset: assign ranks to the data points and find the Pearson correlation of the ranks r_x and r_y.
It is also robust to outliers.

https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient
“Correlation does not imply causation”: the co-occurrence of two events does not require that one causes the other.
For example:

Nobel prize winner per capita correlates with chocolate consumption per capita. But this does not mean
that chocolate consumption causes winning nobel prize.
Confidence Intervals (C.I)
It gives a range of values, computed from the sample, constructed so that x% of such intervals contain the true mean (or parameter) of the given RV.
Let x_bar represent the sample mean and mu represent the population mean. Now, if we repeat the
sampling multiple times, each time, we get a different value of sample mean, x_bar. In 95% of the
sampling experiments, mu will be between the endpoints of the C.I calculated using x_bar, but in 5% of
the cases, it will not be. 95% C.I does NOT mean that mu lies in the interval with a probability of 95%.

A CI is a better measure than a point estimate because point estimates vary from sample to sample.

How to calculate CI:

CI for mean of a rv:


Use the CLT to get a CI from the sampling distribution of the mean.
For the mean: if we know the std deviation, we can use the CLT to get a CI for the true mean. If we don’t, we can use Student’s t-distribution.
If we want a CI for some other parameter, like sigma or the median, we can use a bootstrap CI.

Bootstrap method to get CI

1. Let’s say we want to find the CI for the median.
2. Let’s say we have the input x of size n.
3. Resample m (<= n) points with replacement from x and find the median of this resample.
4. Perform step 3 several times, say k = 1000 times.
5. Now you have 1000 medians of the resampled values. Sort them and take the percentile values: for a 95% CI, take the 2.5th percentile as the lower bound and the 97.5th percentile as the upper bound.
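The steps above can be sketched as follows (the sample data, seed, and m = n choice are assumptions):

```python
import random
import statistics

random.seed(0)
x = [random.gauss(50, 10) for _ in range(200)]  # observed sample, n = 200

k = 1000
medians = []
for _ in range(k):
    resample = random.choices(x, k=len(x))  # draw n points with replacement
    medians.append(statistics.median(resample))

medians.sort()
lo = medians[int(0.025 * k)]  # 2.5th percentile of the bootstrap medians
hi = medians[int(0.975 * k)]  # 97.5th percentile
print(f"95% bootstrap CI for the median: [{lo:.1f}, {hi:.1f}]")
```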
Hypothesis testing:
It is a statistical method used to determine whether an assumption about a population parameter is true, given the experimental data.
Steps:
The null hypothesis is the default assumption about the data, i.e. statements like “no difference” or “no effect”.
The null and alternate hypotheses are mutually exclusive; the test proceeds like a proof by contradiction: assume the null and check whether the data are consistent with it.

How to read it: for the above case, if the null hypothesis is true (there is no difference in the population means of the distributions), then there is a 90% chance of observing a difference x = mu1 - mu2 as large as 10 cm.

If the p-value is high, fail to reject the null hypothesis; else reject it in favor of the alternate hypothesis.


What is p-value:
It is the probability of observing data at least as extreme as the observation, given that my assumption (the null hypothesis) is true:
p-value = P(obs | H0).
So choose a null hypothesis under which you can derive the pdf of the data.

Significance level:
It is the threshold for rejecting null hypothesis.
If p <= alpha, reject H0

Example of coin toss:


Permutation test:
K-S test:
It is used to check whether given samples came from a given reference pdf (one-sample test), or whether two samples came from the same pdf (two-sample K-S test).

One sample test: "How likely is it that we would see a collection of samples like this if they were drawn
from that probability distribution?"
Two sample test: “How likely is it that we would see two sets of samples like this if they were drawn from
the same (but unknown) probability distribution?"
Null Hypothesis for two sample: Two samples came from a population with the same pdf.

D_{n,m} is the maximum difference between the two empirical cdfs.
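A from-scratch sketch of the two-sample statistic D_{n,m} (in practice scipy.stats.ks_2samp also supplies the p-value; the tiny samples below are illustrative):

```python
def ks_statistic(a, b):
    """Two-sample K-S statistic: max gap between the empirical cdfs."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    d = 0.0
    for x in points:
        cdf_a = sum(v <= x for v in a) / len(a)  # empirical cdf of sample a
        cdf_b = sum(v <= x for v in b) / len(b)  # empirical cdf of sample b
        d = max(d, abs(cdf_a - cdf_b))
    return d

same = ks_statistic([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])          # identical samples
shifted = ks_statistic([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])  # disjoint samples
print(same, shifted)  # 0.0 and 1.0
```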
