Probability and Statistics

Best probability

Uploaded by

maitysayan088
Copyright
© © All Rights Reserved

Types of experiment

Deterministic
An experiment whose result can be predicted with certainty.
Example: for an ideal gas, PV = nRT; once the other quantities are fixed, the remaining one is determined.

Probabilistic
An experiment whose result may not be unique for each trial.

Random Experiment
An experiment is random if, in each trial conducted under identical conditions, the outcome is not unique but may be any one of the possible outcomes.

Types of event

Exhaustive Events (Cases)
The total number of possible outcomes of a random experiment.
Example: when a die is thrown, the exhaustive events are {1,2,3,4,5,6}.

Favorable Event
The number of cases favorable to an event in a trial is the number of outcomes which entail the happening of the event.
Example: if we throw two dice, the total number of exhaustive events is 36; the number of events favorable to getting a sum of 9 is 4, i.e., {(3,6),(4,5),(5,4),(6,3)}.

Mutually Exclusive Events
Events are said to be mutually exclusive if the happening of any one of them precludes the happening of all the others.
Example: if we throw a die and get 1, we cannot get any other outcome in that particular throw.

Equally Likely Events
Outcomes of a trial are said to be equally likely if, taking into consideration all the relevant evidence, there is no reason to expect one in preference to the other.
Example: there is no reason to get better marks whether or not you are the class representative (not always true 😂😂).

Arithmetic Mean

Case-1: Let the observations be x1, x2, x3, ..., xn. Then
AM = X bar = (x1 + x2 + x3 + ... + xn)/n
where n is the total number of observations.

Case-2: Let x1 have frequency f1, x2 f2, x3 f3, ..., xn fn. Then
X bar = (x1*f1 + x2*f2 + x3*f3 + ... + xn*fn)/(f1 + f2 + f3 + ... + fn)

Case-3: Let the class interval l0-l1 have frequency f1, l1-l2 f2, l2-l3 f3, ..., l(n-1)-ln fn. Take the class mid values
x1 = (l0 + l1)/2, x2 = (l1 + l2)/2, x3 = (l2 + l3)/2, ...
Then
AM = X bar = (x1*f1 + x2*f2 + x3*f3 + ... + xn*fn)/(f1 + f2 + f3 + ... + fn)

Case-4: If the values of x and f in Case-3 are too big, take an arbitrary value A from the x values and define
di = (xi - A)/h
where A is the chosen point and h is the class width. Then
X bar = A + (h/N)*Σ(fi*di), where N = f1 + f2 + ... + fn

Combined Mean
Say group 1 has n1 observations and mean x bar 1, and group 2 has n2 observations and mean x bar 2.
Then the total of the two groups is n1*(x bar 1) + n2*(x bar 2)
and combined mean = total/(n1 + n2)
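The arithmetic mean cases above can be checked numerically. The following is only a minimal sketch: the function names and the sample class boundaries and frequencies are made up for illustration and are not part of the original notes.

```python
def grouped_mean(boundaries, freqs):
    """Case-3: mean of a grouped table, using class mid values."""
    mids = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
    return sum(x * f for x, f in zip(mids, freqs)) / sum(freqs)


def step_deviation_mean(boundaries, freqs, A, h):
    """Case-4: X bar = A + (h/N) * sum(fi * di), with di = (xi - A)/h."""
    mids = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
    N = sum(freqs)
    return A + h * sum(f * (x - A) / h for x, f in zip(mids, freqs)) / N


def combined_mean(n1, mean1, n2, mean2):
    """Combined mean of two groups: total / (n1 + n2)."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


# Hypothetical data: classes 0-10, 10-20, 20-30, 30-40
boundaries = [0, 10, 20, 30, 40]
freqs = [2, 3, 4, 1]

print(grouped_mean(boundaries, freqs))                      # 19.0
print(step_deviation_mean(boundaries, freqs, A=15, h=10))   # same mean, via Case-4
print(combined_mean(10, 50.0, 30, 60.0))                    # 57.5
```

Case-3 and Case-4 give the same answer for any choice of A; Case-4 only makes the hand arithmetic smaller.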

Geometric Mean

Case-1: Let the observations be x1, x2, x3, ..., xn (all positive). Then
GM = (x1*x2*x3*...*xn)^(1/n) = (Π xi)^(1/n), product over i = 1 to n

Case-2: Let x1 have frequency f1, x2 f2, x3 f3, ..., xn fn. Then
GM = (Π xi^fi)^(1/N), where N = Σ fi
log GM = (1/N)*Σ{fi*log(xi)} = AM of log(xi)

Some important formulas are:
1+2+3+....+n = n*(n+1)/2
1^2+2^2+3^2+..+n^2 = n*(n+1)*(2n+1)/6
1^3+2^3+3^3+....+n^3 = {n*(n+1)/2}^2

Combined Geometric Mean
Say group 1 has n1 observations and geometric mean g1, and group 2 has n2 observations and geometric mean g2.
Then the product of all observations in the two groups is g1^n1 * g2^n2
and combined GM = (g1^n1 * g2^n2)^(1/(n1+n2))

Sample space

Finite sample space
A sample space with a limited number of sample points.

Countable sample space
The sample points can be listed in a sequence, that is, matched one-to-one with the natural numbers (for example, the number of tosses needed to get the first head).

Uncountable sample space
The sample points cannot be listed in a sequence (for example, all the real numbers in an interval).

PROBABILITY

Probability of an event happening
If a random experiment or trial results in 'n' exhaustive, mutually exclusive and equally likely outcomes, out of which 'm' are favorable to the occurrence of an event E, then the probability of E happening is denoted by P(E) and defined by
P(E) = (number of favorable cases)/(number of exhaustive cases) = m/n

Sigma Field
A collection B of subsets of the sample space S is called a sigma field if it satisfies the following conditions:
i) S ∈ B
ii) if X ∈ B, then X complement ∈ B
iii) B is closed under countable union, i.e., X1, X2, X3, ... ∈ B => ∪Xi ∈ B
Here B is the collection of all possible events.
The triplet (S,B,P) is called a probability space: S is the sample space, B is the sigma field, and P is the probability function on B.

Random Variable
A random variable is a function X(ω) with domain S and range (-infinity, infinity) such that for every real number 'a' the event {ω : X(ω) ≤ a} belongs to B.

Discrete Random Variable
A variable which can assume only a countable number of real values, where the value taken depends on chance, is called a discrete random variable.

Probability Mass Function (PMF)
If X is a discrete random variable taking the values x1, x2, x3, x4, ..., then the function
P(x) = P(X = xi) if x = xi (i = 1, 2, ...), and 0 otherwise
is called the PMF of the random variable X.
PMF conditions
1. P(X = xi) ≥ 0 for all i
2. Σ P(X = xi) = 1
i) Mean: E(X) = Σ x*P(x)
ii) Variance: Var(X) = E(X^2) - (E(X))^2, where E(X^2) = Σ x^2*P(x)

Continuous Random Variable
X is called continuous if it can take every possible value between its limits.
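The classical definition P(E) = m/n and the PMF formulas for mean and variance can be tried on the dice examples used in these notes. This is a minimal sketch; the helper names are illustrative assumptions, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product


def classical_probability(outcomes, is_favorable):
    """P(E) = m/n for equally likely, mutually exclusive, exhaustive outcomes."""
    favorable = [o for o in outcomes if is_favorable(o)]
    return Fraction(len(favorable), len(outcomes))


def pmf_mean_variance(pmf):
    """Mean E(X) and variance E(X^2) - (E(X))^2 of a discrete PMF given as {x: P(x)}."""
    assert all(p >= 0 for p in pmf.values())   # PMF condition 1
    assert sum(pmf.values()) == 1              # PMF condition 2
    mean = sum(x * p for x, p in pmf.items())
    second = sum(x * x * p for x, p in pmf.items())
    return mean, second - mean ** 2


# Two dice: 36 exhaustive cases, 4 of them favorable to a sum of 9
two_dice = list(product(range(1, 7), repeat=2))
print(classical_probability(two_dice, lambda o: sum(o) == 9))  # 1/9

# One fair die: each face has probability 1/6
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}
print(pmf_mean_variance(fair_die))  # (Fraction(7, 2), Fraction(35, 12))
```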

Theorem For GM
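The geometric mean formulas in these notes (the product form, the identity log GM = AM of log xi, and the combined GM of two groups) can be sketched as follows. The function names and sample values are assumptions made for illustration, and all observations are assumed to be positive so that the logarithm exists.

```python
import math


def geometric_mean(xs, fs=None):
    """GM via logs: exp((1/N) * sum(fi * log(xi))); fs defaults to frequency 1 each."""
    if fs is None:
        fs = [1] * len(xs)
    N = sum(fs)
    return math.exp(sum(f * math.log(x) for x, f in zip(xs, fs)) / N)


def combined_gm(n1, g1, n2, g2):
    """Combined GM of two groups: (g1^n1 * g2^n2)^(1/(n1+n2)), computed via logs."""
    return math.exp((n1 * math.log(g1) + n2 * math.log(g2)) / (n1 + n2))


print(geometric_mean([2, 8]))            # close to 4, since sqrt(2*8) = 4
print(geometric_mean([2, 4], [2, 1]))    # cube root of 2*2*4
print(combined_gm(2, 4.0, 1, 32.0))      # close to 8, the GM of {2, 8, 32}
```

Working with logs avoids overflow when many observations are multiplied together.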

Harmonic Mean Relationship between AM and GM

Evaluation methods
Various types of measures
Probability Density Function (PDF)
The PDF of a continuous random variable X is a function f(x) ≥ 0 such that P(a ≤ X ≤ b) is the integral of f(x) from a to b.
If a non-negative function integrates to 1 over its whole range, it is a PDF.

Positive correlation
If the values of two variables move in the same direction, that is, if one variable increases the other also increases.
Examples:
Height and weight of a person
Demand and Supply

Median
The value lying at the middle of the arranged data set.

Evaluation for ungrouped data
1) Arrange the data in ascending order
2) Count the total number of data in the data set, say 'n'
3a) If n is odd, median = ((n+1)/2)th observation
3b) If n is even, median = {(n/2)th observation + (n/2+1)th observation}/2

To find the median of a grouped frequency distribution
1) Make sure the lower boundary of each succeeding class matches the upper boundary of the preceding class
2) Find the class mid value = (lower boundary + upper boundary)/2
3) Find the respective cumulative frequencies
4) Median = xl + {(N/2 - cf)/f}*h
where N is the total frequency
cf is the cumulative frequency of the class preceding the median class
f is the frequency of the median class (the class containing the (N/2)th value)
xl is the lower class boundary of the median class
h is the class width, i.e., upper boundary - lower boundary
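The grouped median steps above can be sketched in a few lines. The function name and the sample table are hypothetical; the classes are assumed to be continuous (each lower boundary equals the previous upper boundary).

```python
def grouped_median(boundaries, freqs):
    """Median = xl + ((N/2 - cf) / f) * h for a continuous grouped table."""
    N = sum(freqs)
    cf = 0  # cumulative frequency of the classes before the current one
    for i, f in enumerate(freqs):
        if cf + f >= N / 2:                        # this is the median class
            xl = boundaries[i]                     # lower class boundary
            h = boundaries[i + 1] - boundaries[i]  # class width
            return xl + (N / 2 - cf) / f * h
        cf += f
    raise ValueError("empty frequency table")


# Hypothetical classes 0-10, 10-20, 20-30, 30-40 with frequencies 5, 8, 12, 5
# N = 30, N/2 = 15, median class is 20-30 with cf = 13 and f = 12
print(grouped_median([0, 10, 20, 30, 40], [5, 8, 12, 5]))  # 20 + (2/12)*10
```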

Negative Correlation
If the values of two variables move in opposite directions, that is, if one variable increases the other decreases.
Examples:
Price and Demand
Pressure and Volume of an ideal gas

Mode
The mode is the observation corresponding to the highest frequency.

Method of evaluation
1) Find the class with the highest frequency; that class is the modal class
2) Mode = xl + [{fm - f(m-1)}/{2fm - f(m-1) - f(m+1)}]*h
where xl is the lower class boundary of the modal class
fm is the frequency of the modal class
f(m-1) is the frequency of the previous class
f(m+1) is the frequency of the next class
h is the class width
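The modal class formula above can be sketched as below. The function name and data are illustrative. Treating a missing neighbouring class as frequency 0 is an assumption of this sketch, not something stated in the notes, and the formula breaks down when the modal class and both neighbours have equal frequencies (zero denominator).

```python
def grouped_mode(boundaries, freqs):
    """Mode = xl + (fm - f_prev) / (2*fm - f_prev - f_next) * h."""
    m = max(range(len(freqs)), key=freqs.__getitem__)     # index of the modal class
    fm = freqs[m]
    f_prev = freqs[m - 1] if m > 0 else 0                 # assumption: 0 if no previous class
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0    # assumption: 0 if no next class
    xl = boundaries[m]
    h = boundaries[m + 1] - boundaries[m]
    return xl + (fm - f_prev) / (2 * fm - f_prev - f_next) * h


# Hypothetical table: modal class is 20-30 (fm = 12, previous 8, next 5)
print(grouped_mode([0, 10, 20, 30, 40], [5, 8, 12, 5]))  # 20 + (4/11)*10
```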

Central tendency
It is defined as the statistical measure that identifies a single value as representative of the entire population, located at the center of the population.

QUARTILES

1st Quartile
It divides the total distribution in the ratio 1:3. So it is a value of the variable such that 25% of the observations fall below it and the remaining 75% fall above it.
Q1 = xl + {(N/4 - cf)/f}*h

2nd Quartile
It divides the total distribution in the ratio 1:1. So it is a value of the variable such that 50% of the observations fall below it and the remaining 50% fall above it. It is also called the median.
Q2 = xl + {(N/2 - cf)/f}*h

3rd Quartile
It divides the total distribution in the ratio 3:1. So it is a value of the variable such that 75% of the observations fall below it and the remaining 25% fall above it.
Q3 = xl + {(3N/4 - cf)/f}*h

Here N is the total frequency
the class containing the N/4th (or N/2th, or 3N/4th) value is the required class (rc)
xl is the lower class boundary of rc
cf is the cumulative frequency of the previous class
f is the frequency of rc
h is the class width

Co-relation
Co-relation is an analysis of the relationship between two variables. There are mainly 3 types of correlation: positive, negative and no correlation.

No-Correlation
If the values of the two variables move in different directions irrespective of one another.
Example:
Height and weight of two separate people

Correlation Coefficient
The correlation coefficient between two random variables x & y is denoted by r(x,y), also known as Karl Pearson's coefficient of co-relation. It is defined by
r(x,y) = cov(x,y)/(σx*σy)
where cov(x,y) = (1/n)*Σ{(xi - x bar)*(yi - y bar)},
σx and σy are the respective SDs, e.g. σx = sqrt{(1/n)*Σ(xi - x bar)^2} = sqrt{(1/n)*Σ xi^2 - (x bar)^2}

Theory
If two variables are independent, then cov(x,y) = 0 and so r(x,y) = 0.
But if r(x,y) = 0, the variables may or may not be independent.
Example:
x = -3 -2 -1 0 1 2 3
y = 9 4 1 0 1 4 9
cov(x,y) = 0, but x & y are related as y = x^2

Spearman Rank Co-relation
ρ(x,y) = 1 - {6*Σ(di^2)}/{n*(n^2-1)}
where di is the difference between the ranks of xi and yi
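The correlation formulas above can be tried on the y = x^2 example from the notes. This is a minimal sketch with illustrative function names; the rank data in the second example are made up, and the rank correlation function assumes there are no tied values.

```python
import math


def pearson_r(xs, ys):
    """r = cov(x, y) / (sd_x * sd_y), with 1/n in both covariance and SD."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)


def spearman_rho(xs, ys):
    """rho = 1 - 6*sum(di^2) / (n*(n^2 - 1)); assumes no tied values."""
    def ranks(values):
        order = sorted(range(len(values)), key=values.__getitem__)
        r = [0] * len(values)
        for rank, idx in enumerate(order, start=1):
            r[idx] = rank
        return r

    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]                  # y = x^2: related, yet uncorrelated
print(pearson_r(x, y))                  # 0.0

# Hypothetical untied data: sum of squared rank differences is 4, so rho = 0.8
print(spearman_rho([10, 20, 30, 40, 50], [1, 3, 2, 5, 4]))
```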

Tied Case (rank co-relation)
ρ(x,y) = 1 - {6*(Σdi^2 + Tx + Ty)}/{n*(n^2-1)}
where Tx = Σ{mi*(mi^2-1)/12}, mi is the number of times a value is repeated in the x data, and Ty is defined in the same way for the y data

Σ fi (i = 1 to n) is N, the total number of observations recorded

IQR (Inter-Quartile Range)
IQR = Q3 - Q1

Quartile deviation
QD = (Q3-Q1)/2
It is half of the IQR
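The quartile formulas Qk = xl + {(kN/4 - cf)/f}*h, together with IQR and QD, can be sketched for a grouped table as follows. The function name and the sample table are hypothetical, and the classes are assumed to be continuous.

```python
def grouped_quartile(boundaries, freqs, k):
    """k-th quartile (k = 1, 2, 3) of a continuous grouped table."""
    N = sum(freqs)
    target = k * N / 4
    cf = 0  # cumulative frequency of the classes before the current one
    for i, f in enumerate(freqs):
        if cf + f >= target:                       # required class found
            xl = boundaries[i]
            h = boundaries[i + 1] - boundaries[i]
            return xl + (target - cf) / f * h
        cf += f
    raise ValueError("empty frequency table")


boundaries = [0, 10, 20, 30, 40]   # hypothetical classes
freqs = [5, 8, 12, 5]              # N = 30

q1 = grouped_quartile(boundaries, freqs, 1)   # 10 + (2.5/8)*10 = 13.125
q3 = grouped_quartile(boundaries, freqs, 3)   # 20 + (9.5/12)*10
iqr = q3 - q1
qd = iqr / 2
print(q1, q3, iqr, qd)
```

With k = 2 the same function returns the grouped median.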

RANGE
It is the simplest measure of dispersion
Range= Maximum - Minimum

MEAN DEVIATION ABOUT ANY POINT 'A'
MA = (1/N)*Σ(|xi - A|*fi), sum over i = 1 to n, where N = Σ fi

VARIANCE
var(x) = (1/N)*Σ{(xi - mean)^2 * fi}, sum over i = 1 to n

COMBINED VARIANCE
If two groups with n1 and n2 observations have x1 and x2 as means and s1 and s2 as standard deviations, then the combined variance is
var = {n1*s1^2 + n2*s2^2 + n1*(x1 - combined mean)^2 + n2*(x2 - combined mean)^2}/(n1 + n2)
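The variance and combined variance formulas above can be cross-checked against each other: pooling the two groups and computing the variance directly should give the same number as the combined formula. This is a sketch with made-up data and illustrative function names.

```python
import math


def variance(xs, fs=None):
    """var(x) = (1/N) * sum((xi - mean)^2 * fi); fs defaults to frequency 1 each."""
    if fs is None:
        fs = [1] * len(xs)
    N = sum(fs)
    mean = sum(x * f for x, f in zip(xs, fs)) / N
    return sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / N


def combined_variance(n1, mean1, s1, n2, mean2, s2):
    """Combined variance of two groups from their sizes, means and SDs."""
    cm = (n1 * mean1 + n2 * mean2) / (n1 + n2)   # combined mean
    return (n1 * s1 ** 2 + n2 * s2 ** 2
            + n1 * (mean1 - cm) ** 2 + n2 * (mean2 - cm) ** 2) / (n1 + n2)


g1 = [2, 4, 6]     # hypothetical group 1: mean 4
g2 = [10, 12]      # hypothetical group 2: mean 11

s1 = math.sqrt(variance(g1))
s2 = math.sqrt(variance(g2))

print(combined_variance(len(g1), 4.0, s1, len(g2), 11.0, s2))
print(variance(g1 + g2))   # direct variance of the pooled data, same value
```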

STANDARD DEVIATION
SD = sqrt{var(x)}

COEFFICIENT OF QUARTILE DEVIATION
Coefficient of QD = (QD/Median) * 100%

COEFFICIENT OF VARIATION (CV)
CV = (SD/Mean) * 100%

Regression analysis
It is a mathematical measure of the average relationship between 2 or more variables in terms of the original units of the data.

Regression line y on x
y - y bar = byx*(x - x bar)

Regression line x on y
x - x bar = bxy*(y - y bar)

byx = r*σy/σx
bxy = r*σx/σy
r = ±sqrt(byx*bxy), taking the common sign of byx and bxy
r will always be between -1 and 1
NOTE: The signs of byx and bxy are always the same, and so is the sign of r, i.e., if both are -ve, r is also -ve, and vice versa.

Angle between 2 regression lines
If the angle Θ is 90 degrees, the variables are uncorrelated (r = 0).
If the angle is 0 or pi (tanΘ is 0), then the variables are perfectly co-related (r = -1 or 1).

Five Number Summary
To get a quick summary of both the measure of central tendency and the measure of dispersion, we combine 5 values.

Minimum
It is the lowest or smallest value in the observations. It is greater than 0% of the data.

1st Quartile
It is at the first quarter position, dividing the total observations into 1/4 (25%) and 3/4 (75%) of the population. It is greater than 25% of the data.

Median
It is the central value that divides the total observations in half (50%).

3rd Quartile
It is at the third quarter position, dividing the total observations into 3/4 (75%) and 1/4 (25%) of the population. It is greater than 75% of the data.

Maximum
It is the highest value in the observations. It is greater than all other values (100%) in the data.

BOX PLOT
A box plot is a graphical representation of the five number summary with outliers plotted individually.
In a box plot:
1) The central box spans from Q1 to Q3
2) A line in the box is the median (Q2)
3) Points outside the outlier range are plotted individually by circling them and mentioning their values
4) Lines extend from the box out to the smallest and largest observations that are not outliers
Note: A box plot can be horizontal or vertical

Outliers
An outlier is an observation in a given data set that lies far from the other observations.
Lower point = Q1 - 1.5*IQR
Higher point = Q3 + 1.5*IQR

Deviation of the frequency distribution

Skewness
It is the lack of symmetry. Any deviation from symmetry is called asymmetry. A frequency distribution is called symmetrical if the variable values equidistant from the mean have equal frequency.

Moment measure of skewness
Denoted by g1
g1 = m3/(m2)^(3/2)
If g1 = 0 it is symmetric
If g1 > 0 it is positively skewed
If g1 < 0 it is negatively skewed

Pearson's skewness measure
1st => (mean - mode)/SD
2nd => 3*(mean - median)/SD
Both should be from -3 to 3

Bowley's skewness measure
{(Q3-Q2) - (Q2-Q1)}/(Q3-Q1)

Kurtosis
It refers to the peakedness of a frequency curve. It measures the convexity of the curve.
A frequency distribution having moderate peakedness is known as Mesokurtic (usually it follows the normal curve).
A frequency distribution having more peakedness than a moderate curve is known as Leptokurtic.
A frequency distribution having less peakedness than a moderate curve is known as Platykurtic.

Measure of Kurtosis
Denoted by g2
g2 = (m4/m2^2) - 3
If g2 = 0 -> Mesokurtic
If g2 > 0 -> Leptokurtic
If g2 < 0 -> Platykurtic

Moments
The rth order moment about any point A is
m'r(A) = (1/n)*Σ(xi - A)^r, sum over i = 1 to n, for simple data
m'r(A) = (1/N)*Σ{(xi - A)^r * fi}, sum over i = 1 to n, for frequency type data
If A = 0, the rth order moment is called the raw moment.
If A = x bar = mean, the rth order moment is called the rth order central moment.
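The moment definition and the moment measures g1 = m3/m2^(3/2) and g2 = m4/m2^2 - 3 from these notes can be sketched as below. Function names and the sample data are assumptions for illustration only.

```python
def moment(xs, r, A=0.0, fs=None):
    """r-th order moment about the point A; fs defaults to frequency 1 each."""
    if fs is None:
        fs = [1] * len(xs)
    N = sum(fs)
    return sum(f * (x - A) ** r for x, f in zip(xs, fs)) / N


def central_moment(xs, r, fs=None):
    """r-th order moment about the mean (A = x bar)."""
    mean = moment(xs, 1, 0.0, fs)   # first raw moment is the mean
    return moment(xs, r, mean, fs)


def skewness_g1(xs, fs=None):
    """Moment measure of skewness, g1 = m3 / m2^(3/2)."""
    return central_moment(xs, 3, fs) / central_moment(xs, 2, fs) ** 1.5


def kurtosis_g2(xs, fs=None):
    """Moment measure of kurtosis, g2 = m4 / m2^2 - 3."""
    return central_moment(xs, 4, fs) / central_moment(xs, 2, fs) ** 2 - 3


data = [1, 2, 3, 4, 5]              # symmetric sample data
print(central_moment(data, 1))      # 0.0: first central moment is always zero
print(central_moment(data, 2))      # 2.0: second central moment is the variance
print(skewness_g1(data))            # 0.0, symmetric
print(kurtosis_g2(data))            # negative, so flatter than the normal curve
```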

Notes on moments
The 1st order raw moment is the mean.
The 1st order central moment is zero.
The 2nd order central moment is the variance.

Error Correction (Sheppard's correction)
c = class width; m'r is the corrected moment and mr the uncorrected moment

RAW MOMENTS
m'1 = m1
m'2 = m2 - c^2/12
m'3 = m3 - (m1*c^2)/4
m'4 = m4 - (m2*c^2)/2 + 7*c^4/240

CENTRAL MOMENTS
m'2 = m2 - c^2/12
m'3 = m3
m'4 = m4 - (m2*c^2)/2 + 7*c^4/240

Statistics

What is statistics?
Statistics is the science of decision making under uncertainty on the basis of observations drawn from a population, i.e., it is a mathematical discipline concerned with the collection of sample data, summarising the data, analysing the data and interpreting the data towards a valid decision.

Types of DATA

Based on Source
PRIMARY DATA - Data which are collected for the first time by the investigator, or by someone on his behalf, for the purpose of the given enquiry.
Example: Collecting the ages of the students in your class.
SECONDARY DATA - Data which are taken from a record or are already available, that is, data already collected by some agency or organisation and accessed for one's own purpose.
Example: Health data collected from some hospitals.

Based on type
QUALITATIVE DATA - When the data isn't expressed using numbers
Example: Color of any object
QUANTITATIVE DATA - When the data is expressed through numbers
Example: Age, Height, Weight

Based on the number of objects measured in a particular time frame
Cross Sectional DATA
Data collected by observing many objects at the same or approximately the same point of time, or without regard to differences in time.
Example: Rainfall in a year in different states of India
Time Series Data / Historical Data
When the data are arranged according to time or order of time, the data are called time series data.

FREQUENCY DISTRIBUTION
When the data are written one by one and we can observe all the data.

LEVELS OF MEASUREMENT

NOMINAL
A scale used for labelling variables into distinct classes; it doesn't include numerical representation.

ORDINAL
It is used simply to describe the order of the variable, but the difference between two values is not known.
Eg: How satisfying a service is

INTERVAL
A numeric scale where the order of the variable is known and the difference between two values is also known, but a true zero doesn't exist.
Eg: Time

RATIO
An interval scale which also has a true zero.

Variable types

DISCRETE VARIABLE: When the data take isolated or discrete values within their range of variation
Example: No. of phone calls received in a day, No. of children in a home

CONTINUOUS VARIABLE: When the data are assumed to take every possible value within their range of variation
Example: Age, Weight, Height, etc.

Representation of data

Textual / Paragraph Format
Advantage-
Data can be presented in depth
Disadvantages-
Inefficient way to represent statistical data
Time consuming to locate a single data item

Tabular Format
Advantages-
Easier data view
Easily comparable with other data
Disadvantages-
Difficult to handle large amounts of data
Difficult to go in depth

Graphical Format
Advantages-
Easy data view
Easily understandable by all
Easily comparable with other data
Can handle large amounts of data
Disadvantages-
Can't go in depth
Only comparable on two factors

Graphical Format: Types

Simple Bar Diagram - A graph consisting of a number of rectangles (usually called bars), used for one dimensional comparison. It is used to show the absolute change in magnitude over time or space. The time or space variable is usually taken on the x-axis, and rectangles of equal width are drawn with length varying with the magnitude.

Multiple Bar Diagram - It is used for comparison of two or more variables. To compare the magnitudes of two or more variables, a group of rectangles is placed side by side. The bars are shaded or colored to show which bar represents which variable, and this is explained in the body.

Sub-Divided Bar Diagram - Bars are drawn to represent the total magnitude of the variable. One bar represents one time period or space area. Each bar is then divided into several segments, and each segment represents a component of the total. To distinguish between the different components we use different shading or coloring and explain it in the body.

HISTOGRAM - A histogram is a graphical representation of grouped frequency data with continuous classes. It is an area diagram and can be defined as a set of rectangles whose areas are proportional to the class frequencies.

PIE chart - Used when an aggregate is divided into different components and we are interested in their relative contribution rather than their absolute contribution.
The angle of each sector is (each value/total value)*360 degrees.

Bar and Histogram Differences

CUMULATIVE FREQUENCY
The number of observations which are less than or equal to a specified limit is called the cumulative frequency.

Types
LESS THAN-
The reference value is the upper boundary of each class: the cumulative frequency counts the observations below that upper boundary.
GREATER THAN-
The reference value is the lower boundary of each class: the cumulative frequency counts the observations at or above that lower boundary.

OGIVE CURVE
A graph of the less than or greater than cumulative frequencies

Relative Frequency = Frequency of the class/Total Frequency

Frequency Density = Frequency of the class/Class Width
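The cumulative frequency types, relative frequency and frequency density described here can all be computed from one frequency table. This is a minimal sketch; the function name, the returned dictionary keys and the sample table are illustrative assumptions.

```python
from itertools import accumulate


def frequency_table(boundaries, freqs):
    """Less-than and greater-than cumulative frequencies, relative frequency, density."""
    N = sum(freqs)
    less_than = list(accumulate(freqs))   # counts below each upper boundary
    # counts at or above each lower boundary
    greater_than = [N - c for c in [0] + less_than[:-1]]
    relative = [f / N for f in freqs]
    density = [f / (hi - lo) for f, lo, hi in zip(freqs, boundaries, boundaries[1:])]
    return {
        "less_than": less_than,
        "greater_than": greater_than,
        "relative": relative,
        "density": density,
    }


# Hypothetical classes 0-10, 10-20, 20-30, 30-40
table = frequency_table([0, 10, 20, 30, 40], [5, 8, 12, 5])
print(table["less_than"])      # [5, 13, 25, 30]
print(table["greater_than"])   # [30, 25, 17, 5]
print(table["density"])        # [0.5, 0.8, 1.2, 0.5]
```

Plotting the less-than values against the upper boundaries, or the greater-than values against the lower boundaries, gives the two ogive curves.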
