Probability: Youtube: Learn With Ca. Pranav, Instagram: @learnwithpranav, Telegram: @pranavpopat, Twitter: @pranav - 2512
Pranav Popat
Probability
Know about Probability
→ Probability was first used about 300 years ago in Europe by a group of mathematicians to improve their chances of winning at gambling.
→ It has since become a full-fledged subject and an integral part of statistics.
→ The theories of Testing of Hypothesis and Estimation are based on probability.
Types of Probability
Subjective Probability: Dependent on personal judgment; useful in decision making. It is out of the scope of our syllabus.
Objective Probability: Based on mathematical rules, not judgment. We will study this type in our chapter.
Experiment: A performance that produces certain results.
Random Experiment: An experiment is defined to be random if its results depend on chance only.
Examples: Tossing a coin, throwing a die, drawing cards from a pack.
Events: The results or outcomes of a random experiment are known as events.
Based on Combination of Events
Simple or Elementary Event: An event that cannot be decomposed into further events.
Composite or Compound Event: An event that can be decomposed into two or more simple events.
Classical Probability: Demerits or Limitations
→ Applicable only when the events are finite and equally likely.
→ Limited application, e.g. tossing coins, throwing dice, drawing cards.
More about Classical Probability (Other Notes)
→ $0 \le P(A) \le 1$. $P(A) = 1$ means a sure event and $P(A) = 0$ means an impossible event.
→ The probability of non-occurrence of an event A is denoted by $P(A')$ or $P(\bar{A})$. $A'$ is called the complementary event of A.
  $P(A') = 1 - P(A)$
Odds in Favor of an Event = $\dfrac{\text{no. of favorable events}}{\text{no. of unfavorable events}}$
Odds Against an Event = $\dfrac{\text{no. of unfavorable events}}{\text{no. of favorable events}}$
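The classical formulas above (probability, complement, odds) can be checked with exact fractions. This is a minimal sketch. The function names are hypothetical, and it assumes equally likely outcomes with 0 < favorable < total, because odds are undefined when there are no unfavorable cases.

```python
from fractions import Fraction


def classical_probability(favorable: int, total: int) -> Fraction:
    """P(A) = favorable cases / total equally likely cases."""
    return Fraction(favorable, total)


def odds_in_favor(favorable: int, total: int) -> tuple[int, int]:
    """Odds in favor = favorable : unfavorable, reduced to lowest terms."""
    ratio = Fraction(favorable, total - favorable)  # fails if favorable == total
    return ratio.numerator, ratio.denominator


def odds_against(favorable: int, total: int) -> tuple[int, int]:
    """Odds against = unfavorable : favorable (the reverse of odds in favor)."""
    a, b = odds_in_favor(favorable, total)
    return b, a


# Drawing a heart from a 52-card deck: 13 favorable cases out of 52
p = classical_probability(13, 52)
print(p)                       # 1/4
print(1 - p)                   # 3/4, probability of the complementary event
print(odds_in_favor(13, 52))   # (1, 3)
print(odds_against(13, 52))    # (3, 1)
```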
Special Formula
If an experiment results in p outcomes and it is repeated q times, then
Total no. of outcomes = $p^{q}$
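One way to confirm the special formula is to enumerate every outcome of the repeated experiment and count them. The following sketch uses `itertools.product` for the enumeration. The helper name `total_outcomes` is hypothetical.

```python
from itertools import product


def total_outcomes(p: int, q: int) -> int:
    """An experiment with p outcomes repeated q times gives p ** q outcomes."""
    return p ** q


# Tossing a coin (p = 2 outcomes: H, T) three times (q = 3)
outcomes = list(product("HT", repeat=3))
print(len(outcomes), total_outcomes(2, 3))  # 8 8
```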
Terms used in a 52-Card Deck
Suits (four): Spades ♠, Hearts ♥, Diamonds ♦, Clubs ♣
Ranks (13): A (Ace), K (King), Q (Queen), J (Jack), 10, 9, 8, 7, 6, 5, 4, 3, 2
Relative Frequency Definition of Probability
Relative Frequency = $\dfrac{\text{no. of times the event occurred during experimental trials}}{\text{total no. of trials}} = \dfrac{f_A}{n}$
Probability by this method is defined as $P(A) = \lim_{n \to \infty} \dfrac{f_A}{n}$
(The relative frequency over an infinite no. of trials equals the probability.)
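The limit in the relative frequency definition can be illustrated by simulation. In this sketch, Python's `random.Random` with an arbitrary seed stands in for a fair coin (an assumption), and `relative_frequency` is a hypothetical helper. As n grows, $f_A / n$ for "heads" should settle near 0.5, although any finite run only approximates the limit.

```python
import random


def relative_frequency(trials: int, seed: int = 42) -> float:
    """Simulate `trials` tosses of a fair coin and return f_A / n for heads."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    heads = sum(1 for _ in range(trials) if rng.random() < 0.5)
    return heads / trials


for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```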
Set Based Probability
Sample Space (denoted by S or Ω, omega): A non-empty set containing all the elementary events of a random experiment as sample points.
Event A: The event under consideration for probability calculations. It is defined as a non-empty subset of S (Sample Space).
Probability Formula: $P(A) = \dfrac{\text{no. of sample points in } A}{\text{no. of sample points in } S} = \dfrac{n(A)}{n(S)}$
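The set-based formula $P(A) = n(A)/n(S)$ can be applied directly by building S and A as Python sets. The example below uses two dice and the event "sum is 7". The function name `probability` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space for throwing two dice: 36 ordered pairs (sample points)
S = set(product(range(1, 7), repeat=2))
# Event A: the sum of the two faces is 7
A = {point for point in S if sum(point) == 7}


def probability(event: set, sample_space: set) -> Fraction:
    """P(A) = n(A) / n(S), valid when all sample points are equally likely."""
    if not event <= sample_space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))


print(len(S), len(A), probability(A, S))  # 36 6 1/6
```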
Axiomatic or Modern Definition of Probability
This definition is also based on set concepts. Here probability is not a simple ratio as above. It is a function P defined on S, known as the Probability Measure. P(A) is the probability of A under this function only if the following conditions are satisfied:
Condition 1: $P(A) \ge 0$ for every $A \subseteq S$
Condition 2: $P(S) = 1$
Condition 3: For any sequence of mutually exclusive events $A_1, A_2, A_3, \ldots$
  $P(A_1 \cup A_2 \cup A_3 \cup \cdots) = P(A_1) + P(A_2) + P(A_3) + \cdots$
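For a finite sample space, the three conditions can be checked mechanically. This sketch assumes P is given by a weight for each sample point, with P(A) equal to the sum of the weights of the points in A. For a finite S, Condition 3 reduces to additivity over pairs of disjoint events. Because P is built by adding weights, that condition holds automatically here. It is still checked so that the code mirrors the definition. All names are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations


def all_events(points):
    """Every subset of the sample space (the empty set included)."""
    pts = list(points)
    subsets = chain.from_iterable(combinations(pts, r) for r in range(len(pts) + 1))
    return [frozenset(s) for s in subsets]


def satisfies_axioms(weights: dict) -> bool:
    """Check Conditions 1-3 for P(A) = sum of weights of the points in A."""
    S = frozenset(weights)

    def P(event):
        return sum((weights[s] for s in event), Fraction(0))

    events = all_events(S)
    cond1 = all(P(A) >= 0 for A in events)                   # P(A) >= 0
    cond2 = P(S) == 1                                        # P(S) = 1
    cond3 = all(P(A | B) == P(A) + P(B)                      # additivity for
                for A in events for B in events if not A & B)  # disjoint events
    return cond1 and cond2 and cond3


die = {face: Fraction(1, 6) for face in range(1, 7)}
print(satisfies_axioms(die))                                   # True
print(satisfies_axioms({"H": Fraction(1, 2), "T": Fraction(2, 3)}))  # False, P(S) = 7/6
```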
Theoretical Distribution
Area under Normal Curve
From      To        Area/Probability
μ         μ + σ     34.135%
μ + σ     μ + 2σ    13.59%
μ + 2σ    μ + 3σ    2.14%
μ + 3σ    ∞         0.135%

From      To        Area/Probability
μ − σ     μ + σ     68.3%
μ − 2σ    μ + 2σ    95.5%
μ − 3σ    μ + 3σ    99.7%
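The areas in these tables come from the standard normal distribution. They can be reproduced with the error function from Python's standard library, using $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$. This is a sketch with hypothetical helper names. The computed values may differ from the table in the last digit because the table is rounded.

```python
import math


def area_between(k1: float, k2: float) -> float:
    """Area under the normal curve between mu + k1*sigma and mu + k2*sigma."""
    return 0.5 * (math.erf(k2 / math.sqrt(2)) - math.erf(k1 / math.sqrt(2)))


def area_within(k: float) -> float:
    """Area between mu - k*sigma and mu + k*sigma."""
    return area_between(-k, k)


for k1, k2 in [(0, 1), (1, 2), (2, 3), (3, math.inf)]:
    print(k1, k2, round(100 * area_between(k1, k2), 3))
for k in (1, 2, 3):
    print(-k, k, round(100 * area_within(k), 2))
```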
A. INTRODUCTION TO STATS
Definition
Singular Sense:
Statistics is the scientific method used for collecting, analyzing and presenting data.
It is used to draw statistical inferences. An inference is a conclusion reached on the basis of evidence and reasoning.
Example:
After applying statistical methods, we conclude that the crime rate has reduced over the last 5 years.
Plural Sense:
Data, qualitative or quantitative, collected to do statistical analysis.
Example: Based on the cricket match statistics of this stadium, the chasing team mostly wins.
History of Stats
Word Origin
Latin word – Status
Italian word – Statista
German word – Statistik
French word – Statistique
Publication:
Kautilya's book Arthashastra
Statistical records on agriculture are found in Ain-i-Akbari (author: Abul Fazl)
Census: The first ever census was conducted in Egypt (300 BC to 2000 BC)
Application of Stats
Statistics has various applications, but we will confine ourselves to the following:
1. Economics: Time series analysis, index numbers, demand analysis, econometrics, regression analysis
2. Business Management: Business decisions rely upon Quantitative Techniques (QT)
3. Commerce/Industry: Data on sales, purchases, raw materials (RM), salaries, wages etc. are analyzed for business decisions and policy making
Limitations of Stats:
1. Relevant for aggregate data, not individual data
2. Only quantitative data can be used. Qualitative data must first be converted into quantitative form
3. Projections are based on conditions/assumptions, and any change in these will change the projection
4. Conclusions are based on sampling. Improper sampling leads to improper results
B. COLLECTION OF DATA
1. Interview Method:
a. Personal Interview: Data are collected directly from respondents. Example: Natural calamity, door-to-door survey
b. Indirect Interview: When reaching the person is difficult, associated persons are contacted. Example: Rail accident
c. Telephone Interview: Conducted over the phone. Quick, but prone to non-response

Parameter \ Type of Interview   Personal   Indirect   Telephone
Accuracy                        High       Low        Low
Coverage                        Low        Low        High
Non Response                    Low        Low        High

3. Observation Method:
a. Data are collected by direct observation or by using an instrument
b. Example: Height check, weight check
c. More accurate, but time consuming, laborious and low in coverage
Scrutiny of Data
C. PRESENTATION OF DATA
D. FREQUENCY DISTRIBUTION
Important Terms
3. Class Limit: For a class interval, the class limits are the minimum and maximum values the class interval may contain. Minimum = Lower Class Limit (LCL) and Maximum = Upper Class Limit (UCL)
(LCB = Lower Class Boundary, UCB = Upper Class Boundary)
Example (Mutually Inclusive classes):
Class    LCL   UCL   LCB    UCB
10-19    10    19    9.5    19.5
20-29    20    29    19.5   29.5
30-39    30    39    29.5   39.5
Example (Mutually Exclusive classes):
Class    LCL   UCL   LCB    UCB
10-20    10    20    10     20
20-30    20    30    20     30
30-40    30    40    30     40
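The boundaries in both examples follow one rule. Let d be the gap between the UCL of one class and the LCL of the next. Then LCB = LCL − d/2 and UCB = UCL + d/2. For inclusive classes, d = 1. For exclusive classes, d = 0, so the boundaries equal the limits. This is a sketch that assumes the classes are in ascending order with a uniform gap. `class_boundaries` is a hypothetical name.

```python
def class_boundaries(classes):
    """classes: list of (LCL, UCL) in ascending order with a uniform gap.

    Returns a list of (LCB, UCB) using LCB = LCL - d/2 and UCB = UCL + d/2,
    where d = LCL of the next class - UCL of the current class.
    """
    if len(classes) < 2:
        raise ValueError("need at least two classes to measure the gap")
    gap = classes[1][0] - classes[0][1]
    half = gap / 2
    return [(lcl - half, ucl + half) for lcl, ucl in classes]


print(class_boundaries([(10, 19), (20, 29), (30, 39)]))  # mutually inclusive
print(class_boundaries([(10, 20), (20, 30), (30, 40)]))  # mutually exclusive
```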
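7. Cumulative Frequency
8. Frequency Density
Frequency Density = $\dfrac{\text{Frequency of class}}{\text{Class length of that class}}$
9. Relative Frequency or % Frequency
Relative Frequency = $\dfrac{\text{Frequency of class}}{\text{Total frequency of table}}$, and % Frequency = Relative Frequency × 100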
Class    Frequency   Class Length   Frequency Density   Relative Frequency   Percent Frequency
10-20    5           10             0.5                 5/18                 27.78%
20-30    2           10             0.2                 2/18                 11.11%
30-40    8           10             0.8                 8/18                 44.44%
40-50    3           10             0.3                 3/18                 16.67%
Total    18
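The columns of the table above follow directly from the formulas for frequency density, relative frequency and percent frequency. This sketch recomputes them from (lower, upper, frequency) triples. `frequency_table` is a hypothetical helper.

```python
def frequency_table(classes):
    """classes: list of (lower, upper, frequency). Returns (rows, total)."""
    total = sum(f for _, _, f in classes)
    rows = []
    for lower, upper, f in classes:
        length = upper - lower
        rows.append({
            "class": f"{lower}-{upper}",
            "frequency": f,
            "class_length": length,
            "density": f / length,                  # frequency / class length
            "relative": f"{f}/{total}",             # frequency / total frequency
            "percent": round(100 * f / total, 2),   # relative frequency x 100
        })
    return rows, total


rows, total = frequency_table([(10, 20, 5), (20, 30, 2), (30, 40, 8), (40, 50, 3)])
for row in rows:
    print(row["class"], row["frequency"], row["class_length"],
          row["density"], row["relative"], f'{row["percent"]}%')
print("Total", total)
```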
2. Frequency Polygon
a. Usually preferable for ungrouped frequency distributions
b. Can also be used for grouped distributions, but only if the class lengths are equal
c. Steps to create:
Plot $(x_i, f_i)$, where $x_i$ = class value (for ungrouped data) or mid value (for grouped data) and $f_i$ = frequency
Join all the plotted points with line segments, which eventually form a polygon (a shape made of multiple line segments)
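For grouped data, the first step of these instructions (finding the points $(x_i, f_i)$ with $x_i$ as the mid value) can be sketched as follows. The function name is hypothetical. The equal-class-length check mirrors point b. Joining the points is left to any plotting tool.

```python
def polygon_points(classes):
    """classes: list of (lower, upper, frequency) with equal class lengths.

    Returns the (mid value, frequency) points to be joined by line segments.
    """
    lengths = {upper - lower for lower, upper, _ in classes}
    if len(lengths) != 1:
        raise ValueError("frequency polygon for grouped data needs equal class lengths")
    return [((lower + upper) / 2, f) for lower, upper, f in classes]


print(polygon_points([(10, 20, 5), (20, 30, 2), (30, 40, 8), (40, 50, 3)]))
```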
4. Frequency Curve
a. It is a limiting form of the area diagram (histogram) or frequency polygon
b. It is obtained by drawing a smooth, free-hand curve through the mid points
c. It is of the following four types:
Bell Shaped
U-Shaped
J-Shaped
Mixed Curve (a combination of curves)
Central Tendency
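Partition Values
Percentiles: Number of equal parts = Hundred (100). Number of percentiles = Ninety Nine (99). Denoted by $P_1, P_2, P_3, \ldots, P_{99}$
How to calculate Partition Values (n = no. of observations, arranged in ascending order):
Quartile: $P = \{(n+1)p\}^{\text{th}}$ term, where $p = \frac{1}{4}, \frac{2}{4}, \frac{3}{4}$
Decile: $P = \{(n+1)p\}^{\text{th}}$ term, where $p = \frac{1}{10}, \frac{2}{10}, \frac{3}{10}, \ldots, \frac{9}{10}$
Percentile: $P = \{(n+1)p\}^{\text{th}}$ term, where $p = \frac{1}{100}, \frac{2}{100}, \frac{3}{100}, \ldots, \frac{99}{100}$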
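The $\{(n+1)p\}^{\text{th}}$ term rule can be coded as follows. This sketch assumes that when $(n+1)p$ is not a whole number, the value is interpolated linearly between the two neighbouring terms. It also assumes that positions outside 1..n are clamped to the first or last term. Both choices are assumptions, and `partition_value` is a hypothetical name. `Fraction` is used for p so that positions such as 2.5 are exact.

```python
import math
from fractions import Fraction


def partition_value(data, p: Fraction) -> float:
    """Value of the {(n + 1)p}th term of the data sorted in ascending order."""
    xs = sorted(data)
    n = len(xs)
    pos = (n + 1) * p                 # 1-based position of the required term
    pos = min(max(pos, 1), n)         # clamp to the available terms (assumption)
    whole = math.floor(pos)
    frac = pos - whole
    if frac == 0:
        return float(xs[whole - 1])
    # linear interpolation between the whole-th and (whole + 1)-th terms
    return float(xs[whole - 1] + frac * (xs[whole] - xs[whole - 1]))


data = [15, 2, 9, 7, 11, 4, 13, 6, 20]   # n = 9, sorted: 2 4 6 7 9 11 13 15 20
print(partition_value(data, Fraction(1, 4)))    # Q1: 2.5th term
print(partition_value(data, Fraction(3, 4)))    # Q3: 7.5th term
print(partition_value(data, Fraction(90, 100))) # P90: 9th term
```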
Mode
Definition: Mode is the value that occurs the maximum number of times.
Type of Mode: A distribution can be uni-modal, bi-modal or multi-modal.
Mode for Frequency Distribution:
$\text{Mode} = l_1 + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times \text{Class length}$
Here,
$l_1$ = lower class boundary of the modal class,
$f_1$ = frequency of the modal class,
$f_0$ = frequency of the pre-modal class,
$f_2$ = frequency of the post-modal class
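The grouped-data mode formula is applied below to the frequency table given earlier (10-20: 5, 20-30: 2, 30-40: 8, 40-50: 3), taking $l_1$ as the lower class boundary of the modal class. This sketch makes several assumptions. The modal class is the first class with the highest frequency, and a missing neighbouring class counts as frequency 0. It also fails if $2f_1 - f_0 - f_2 = 0$. `grouped_mode` is a hypothetical name.

```python
def grouped_mode(classes):
    """classes: list of (lower boundary, upper boundary, frequency) in order."""
    freqs = [f for _, _, f in classes]
    i = max(range(len(freqs)), key=freqs.__getitem__)  # modal class index
    l1, upper, f1 = classes[i]
    f0 = freqs[i - 1] if i > 0 else 0                  # pre-modal frequency
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0     # post-modal frequency
    c = upper - l1                                     # class length
    return l1 + (f1 - f0) / (2 * f1 - f0 - f2) * c


# Modal class 30-40: 30 + (8 - 2) / (16 - 2 - 3) * 10 = 30 + 60/11
print(grouped_mode([(10, 20, 5), (20, 30, 2), (30, 40, 8), (40, 50, 3)]))
```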
Relationship between AM, GM and HM (for positive observations)
General: AM ≥ GM ≥ HM
When all the observations are the same: AM = GM = HM
When the observations are not all the same (e.g. all distinct): AM > GM > HM
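The AM ≥ GM ≥ HM relationship can be checked numerically. This sketch computes the three means for positive observations. It finds GM through logarithms to avoid a large product. The helper name `means` is hypothetical.

```python
import math


def means(xs):
    """Return (AM, GM, HM) of a list of positive observations."""
    if any(x <= 0 for x in xs):
        raise ValueError("GM and HM here assume positive observations")
    n = len(xs)
    am = sum(xs) / n
    gm = math.exp(sum(math.log(x) for x in xs) / n)  # nth root of the product
    hm = n / sum(1 / x for x in xs)
    return am, gm, hm


print(means([2, 8]))     # about (5.0, 4.0, 3.2): AM > GM > HM
print(means([5, 5, 5]))  # all about 5.0: AM = GM = HM
```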
Dispersion
CORRELATION
Bi-Variate Data: When data are collected on two variables simultaneously, they are known as bi-variate data.
Bi-Variate Distribution: The distribution of bi-variate data is called a bi-variate distribution.
Bi-Variate Frequency Distribution
Meaning: Frequency distribution involving two discrete variables.
Marginal Distribution: A separate distribution made from the bi-variate frequency distribution by taking the aggregate of only one variable at a time. Total no. of marginal distributions = 2.
Conditional Distribution: A separate distribution made from the bi-variate frequency distribution by taking one variable for a given class interval of the other variable. Total no. of conditional distributions = m + n (m = no. of rows, n = no. of columns).
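The marginal and conditional distributions can be read off a bi-variate frequency table with row and column operations. This sketch uses a small hypothetical table: rows are classes of X (m = 2) and columns are classes of Y (n = 3). There are 2 marginal distributions, plus one conditional distribution per row and one per column, so m + n = 5 conditional distributions. All names and numbers are illustrative.

```python
# Hypothetical bi-variate frequency table: rows = classes of X, columns = classes of Y
table = [
    [3, 2, 0],
    [1, 4, 2],
]


def marginal_distributions(table):
    """Aggregate over one variable at a time: totals for X and totals for Y."""
    x_marginal = [sum(row) for row in table]        # row totals
    y_marginal = [sum(col) for col in zip(*table)]  # column totals
    return x_marginal, y_marginal


def conditional_distributions(table):
    """Y for each class of X (rows) and X for each class of Y (columns)."""
    y_given_x = [list(row) for row in table]
    x_given_y = [list(col) for col in zip(*table)]
    return y_given_x, x_given_y


x_marg, y_marg = marginal_distributions(table)
y_given_x, x_given_y = conditional_distributions(table)
print(x_marg, y_marg)                     # [5, 7] [4, 6, 2]
print(len(y_given_x) + len(x_given_y))    # 5 = m + n
```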
While studying two variables at the same time, if a change in one variable leads to a change in the other, either directly or inversely, the two variables are said to be associated or correlated.
Types of Correlation
Positive Correlation: If the two variables move in the same direction
Negative Correlation: If the two variables move in opposite directions
No Correlation: If a change in one variable causes no change in the other
Scatter Diagram
Properties of Correlation Coefficient (r)
→ It is a unit-free measurement
→ The value of r lies from −1 to +1, both inclusive
→ Change of origin or scale:
  Change of Origin: No impact
  Change of Scale: No impact on the value, but if the scale changes of the two variables have different signs, the sign of r will also change
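The properties of r can be verified with a direct implementation of Pearson's correlation coefficient, $r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$. The data below are hypothetical. The sketch shows three things: shifting the origin leaves r unchanged, scaling both variables by same-sign factors leaves r unchanged, and scaling them by opposite-sign factors flips the sign of r.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
r_origin = pearson_r([a + 100 for a in x], [b - 7 for b in y])  # change of origin
r_scale = pearson_r([2 * a for a in x], [3 * b for b in y])     # same-sign scale change
r_flip = pearson_r([2 * a for a in x], [-3 * b for b in y])     # opposite-sign scale change
print(r, r_origin, r_scale, r_flip)
```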
REGRESSION
Regression Analysis: Estimation of one variable for a given value of another variable on the basis of an average mathematical relationship between the two variables
Regression Line of Y on X (estimation of Y when it is dependent on X)
Regression Coefficient: The regression coefficient of Y on X is denoted by $b_{yx}$
Form: $Y - \bar{Y} = b_{yx}(X - \bar{X})$, where $\bar{X}$ and $\bar{Y}$ are the means of the X series and the Y series
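The regression line of Y on X can be used to estimate Y for a given X. This sketch uses the least-squares coefficient $b_{yx} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$ and the form $Y - \bar{Y} = b_{yx}(X - \bar{X})$. The data and function names are hypothetical.

```python
def regression_y_on_x(x, y):
    """Return (mean of X, mean of Y, b_yx) for the regression line of Y on X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return mx, my, sxy / sxx


def estimate_y(x_value, mx, my, byx):
    """Y = Y_bar + b_yx * (X - X_bar)."""
    return my + byx * (x_value - mx)


x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
mx, my, byx = regression_y_on_x(x, y)
print(mx, my, byx)                   # means 3.0 and 4.0, b_yx about 0.6
print(estimate_y(6, mx, my, byx))    # about 4 + 0.6 * (6 - 3) = 5.8
```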
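Properties of Regression Lines and Coefficients
Change of origin and scale: If $U = \dfrac{X - a}{p}$ and $V = \dfrac{Y - c}{q}$, then $b_{yx} = \dfrac{q}{p}\, b_{vu}$ and $b_{xy} = \dfrac{p}{q}\, b_{uv}$
Intersection of two regression lines: The two regression lines (if not identical) intersect at the point $(\bar{x}, \bar{y})$ [means]
Relation between correlation and regression coefficients: $r = \pm\sqrt{b_{yx} \times b_{xy}}$
$b_{yx}$, $b_{xy}$ and $r$ all have the same sign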
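These properties can be checked numerically on hypothetical data. This sketch makes several assumptions. The coefficients are least-squares values ($b_{yx} = S_{xy}/S_{xx}$, $b_{xy} = S_{xy}/S_{yy}$). The intersection is found by solving the two line equations, which fails when $r = \pm 1$ because the lines coincide. The constants a, p, c, q used for the change of origin and scale are arbitrary. All function names are illustrative.

```python
import math


def coefficients(x, y):
    """Return (x_bar, y_bar, b_yx, b_xy, r) for two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return mx, my, sxy / sxx, sxy / syy, sxy / math.sqrt(sxx * syy)


def intersection(mx, my, byx, bxy):
    """Solve y = a1 + byx*x (Y on X) and x = a2 + bxy*y (X on Y)."""
    a1 = my - byx * mx
    a2 = mx - bxy * my
    x0 = (a2 + bxy * a1) / (1 - byx * bxy)  # denominator is 0 when r = +1 or -1
    return x0, a1 + byx * x0


x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
mx, my, byx, bxy, r = coefficients(x, y)
print(r, math.copysign(math.sqrt(byx * bxy), byx))  # r = ±sqrt(b_yx * b_xy)
print(intersection(mx, my, byx, bxy))               # about (3.0, 4.0), the means

# Change of origin and scale: U = (X - a) / p, V = (Y - c) / q
a, p, c, q = 3, 2, 4, 0.5
u = [(t - a) / p for t in x]
v = [(t - c) / q for t in y]
_, _, bvu, buv, _ = coefficients(u, v)   # bvu: V on U, buv: U on V
print(byx, q / p * bvu)                  # b_yx = (q/p) * b_vu
print(bxy, p / q * buv)                  # b_xy = (p/q) * b_uv
```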