Probability and Statistics
Let the observations be x1, x2, ..., xn
Then the arithmetic mean is
Case-1: AM = x bar = (sum of all observations)/n, where n is the total number of observations
Theorem For GM
Evaluation methods
Various types of measures
Probability Density Function (PDF)
The PDF of a random variable is defined as shown in the picture given above.
If a non-negative function integrates to 1 over its whole range, it is a PDF.
Median
1) Arrange the data in ascending order
2) Count the total number of data points in the data set, say 'n'
3a) If n is odd, median = ((n+1)/2)th observation
3b) If n is even, median = {(n/2)th observation + (n/2+1)th observation}/2
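The median steps above can be sketched in Python as follows. This is a minimal illustration; the function name `median` and the sample inputs are chosen here and are not part of the notes.

```python
def median(values):
    """Median by the sort-then-pick-the-middle procedure."""
    ordered = sorted(values)   # step 1: arrange in ascending order
    n = len(ordered)           # step 2: count the observations
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # step 3a: n odd -> the ((n+1)/2)th observation (1-based),
        # which is index n // 2 in 0-based indexing
        return ordered[mid]
    # step 3b: n even -> average of the (n/2)th and (n/2+1)th observations
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median([5, 1, 3])       # middle of 1, 3, 5
even_case = median([4, 1, 3, 2])   # average of 2 and 3
```

Note that the 1-based positions in the notes shift by one when written as 0-based list indices.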
Central tendency
It is defined as the statistical measure that identifies a single value as a representation of the entire population, chosen to lie at the center of the entire population.
QUARTILES
Types
1st Quartile
It divides the total distribution in the ratio 1:3. So it is a value of the variable such that 25% of the observations fall below it and the remaining 75% fall above it.
Q1 = xl + {(N/4 - cf)/f}*h
2nd Quartile
It divides the total distribution in the ratio 1:1. So it is a value of the variable such that 50% of the observations fall below it and the remaining 50% fall above it. It is also called the median.
Q2 = xl + {(N/2 - cf)/f}*h
3rd Quartile
It divides the total distribution in the ratio 3:1. So it is a value of the variable such that 75% of the observations fall below it and the remaining 25% fall above it.
Q3 = xl + {(3N/4 - cf)/f}*h
Here N is the total frequency
the class containing the (N/4)th, (N/2)th or (3N/4)th observation is the required class (rc)
xl is the lower class boundary (lcb) of rc
cf is the cumulative frequency of the class preceding rc
f is the frequency of rc
h is the class width of rc
QUARTILE DEVIATION
Theory of co-relation
Co-relation
Co-relation is an analysis of the relationship between two variables. There are mainly 3 types of correlation.
No-Correlation
The values of the two variables move in different directions irrespective of one another.
Example: height and weight of two separate people
Two variables are uncorrelated if cov(x,y)=0, and then r(x,y)=0. Independent variables are always uncorrelated, but if r(x,y)=0 the variables may or may not be independent.
Example: x = -3 -2 -1 0 1 2 3
         y =  9  4  1 0 1 4 9
cov(x,y)=0, but x & y are related as y=x^2
Correlation Coefficient
The coefficient between two random variables x & y is denoted by r(x,y), also known as Karl Pearson's coefficient. It is defined by
r(x,y) = cov(x,y)/(σx*σy)
where cov(x,y) = (1/n)*Σ{[xi - x bar]*[yi - y bar]},
σx and σy are the respective SDs, e.g. σx = sqrt{(1/n)*Σ[xi - x bar]^2} = sqrt{(1/n)*Σ[xi]^2 - (x bar)^2}
Spearman Rank Co-relation
ρ(x,y) = 1 - {6*Σ(di^2)}/{n*(n^2-1)}
where di is the difference between the ranks of xi and yi
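The two correlation formulas can be checked numerically with a short Python sketch. The function names `pearson_r` and `spearman_rho` are illustrative, and the rank formula is applied under the assumption that there are no tied values (it does not handle ties).

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's r = cov(x, y) / (sd_x * sd_y), using 1/n throughout."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


def spearman_rho(xs, ys):
    """Rank correlation 1 - 6*sum(d^2) / (n*(n^2 - 1)); assumes no ties."""
    def ranks(values):
        # rank 1 for the smallest value, rank n for the largest
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0] * len(values)
        for rank, idx in enumerate(order, start=1):
            result[idx] = rank
        return result

    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# The example from the notes: y = x^2 is fully determined by x,
# yet the covariance (and therefore r) is zero.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]
r_xy = pearson_r(x, y)

# Hypothetical tie-free data for the rank formula.
rho_same_order = spearman_rho([10, 20, 30, 40], [1, 2, 3, 4])
rho_reverse_order = spearman_rho([10, 20, 30, 40], [4, 3, 2, 1])
```

The y = x^2 data is deliberately not passed to `spearman_rho`, because its repeated y values would violate the no-ties assumption.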
RANGE
It is the simplest measure of dispersion
Range= Maximum - Minimum
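VARIANCE
var(x) = (1/N) * {sum over i=1 to n of fi*(xi - mean)^2}, where fi is the frequency of xi and N is the total frequency (the sum of all fi)
COMBINED VARIANCE
If two groups with n1 and n2 observations have x1 and x2 as means and s1 and s2 as standard deviations, then the combined variance is
var = {n1*s1^2 + n2*s2^2 + n1*(x1 - combined mean)^2 + n2*(x2 - combined mean)^2}/(n1 + n2)
where combined mean = (n1*x1 + n2*x2)/(n1 + n2)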
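As a sanity check, the combined variance formula should give the same result as pooling both groups and computing the variance directly. The sketch below works on raw (ungrouped) data with hypothetical values; the function names are chosen here for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance: mean of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def combined_variance(n1, mean1, var1, n2, mean2, var2):
    """Variance of two groups taken together, from per-group summaries."""
    combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    return (
        n1 * var1
        + n2 * var2
        + n1 * (mean1 - combined_mean) ** 2
        + n2 * (mean2 - combined_mean) ** 2
    ) / (n1 + n2)


# Hypothetical groups, used only to compare the two routes.
group1 = [2, 4, 6]
group2 = [1, 3, 5, 7, 9]

from_formula = combined_variance(
    len(group1), mean(group1), variance(group1),
    len(group2), mean(group2), variance(group2),
)
from_pooled_data = variance(group1 + group2)
```

Both routes agree because the formula adds the within-group spread (the s1^2 and s2^2 terms) to the spread of the group means around the combined mean.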
Skewness
It is the lack of symmetry. Any deviation from symmetry is called asymmetry. A frequency distribution is called symmetrical if values of the variable equidistant from the mean have equal frequency.
Bowley's skewness measure
{(Q3-Q2)-(Q2-Q1)}/(Q3-Q1)
Deviation of the frequency distribution
A frequency distribution having moderate peakedness is known as mesokurtic (usually it follows the normal curve).
Maximum
It is the highest value among the observations. It is greater than or equal to all other values (100%) in the data.
BOX PLOT
A box plot is a graphical representation of the five-number summary, with outliers plotted individually.
Outliers
An outlier is an observation in a given data set that lies far from the other observations.
Lower point = Q1 - 1.5*IQR
Higher point = Q3 + 1.5*IQR
where IQR = Q3 - Q1
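The outlier limits and Bowley's measure can be computed together from the three quartiles. The sketch below uses `statistics.quantiles` from the Python standard library with `method="inclusive"`; that choice of quartile convention is an assumption made here, and other conventions give slightly different quartiles. The data values are hypothetical.

```python
import statistics


def box_plot_summary(data):
    """Quartiles, 1.5*IQR outlier limits, outliers and Bowley's skewness."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; Bowley's measure is undefined")
    lower = q1 - 1.5 * iqr   # lower point
    upper = q3 + 1.5 * iqr   # higher point
    return {
        "quartiles": (q1, q2, q3),
        "lower": lower,
        "upper": upper,
        "outliers": [v for v in data if v < lower or v > upper],
        "bowley": ((q3 - q2) - (q2 - q1)) / iqr,
    }


# Hypothetical data with one value far from the rest.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

Here the value 100 lies above the higher point and is reported as an outlier, while Bowley's measure stays at zero because it depends only on the quartiles and not on the extreme value.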
NOMINAL
The scale used for labelling variables into distinct classifications;
it doesn't involve any numerical representation.
ORDINAL
It is used simply to describe the order of the values of a variable, but
it doesn't convey the size of the difference between two values.
eg:- How satisfying a service is
LEVELS OF MEASUREMENT
INTERVAL
A numeric scale where the order of the values is known and the
difference between two values is also known, but there is no
true zero.
Eg:- Time
RATIO
An interval scale which also has the property of containing a true zero
Discrete Variable: When the data take only isolated or discrete values within
the range of variation
Example:- No. of phone calls received in a day, no. of children in a home
Variable types
CONTINUOUS VARIABLE: When the data are assumed to take every possible
value in the range of variation
Example:- Age, Weight, Height, etc.
Textual/Paragraph format
Advantage-
Data can be presented in depth
Disadvantage-
Inefficient way to represent statistical data
Time consuming to locate a single data point
Tabular Format
Advantage-
Easier data view
Easily comparable with other data
Disadvantage-
Difficult to handle large amounts of data
Difficult to go in depth
Graphical Format
Advantages-
Easy data view
Easily understandable by all
Easily comparable with other data
Can handle large amounts of data
Disadvantages-
Can't go in depth
Only comparable on two factors
Theory
Representation of data
Types
LESS THAN-
In this type the cumulative frequencies are taken with reference to
the upper boundary of each class.
GREATER THAN-
In this type the cumulative frequencies are taken with reference to
the lower boundary of each class.
CUMULATIVE FREQUENCY
The number of observations which
are less than or equal to a specified limit
is called the cumulative frequency
OGIVE CURVE
A graph of the less-than and greater-than cumulative frequencies
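The two cumulative frequency types can be tabulated with a short Python sketch. The class boundaries and frequencies below are hypothetical, and the function name `ogive_points` is chosen here for illustration; the returned pairs are the points one would plot to draw the two ogives.

```python
from itertools import accumulate


def ogive_points(boundaries, freqs):
    """Return (less-than points, greater-than points) for grouped data.

    boundaries: class boundaries in ascending order, one more than freqs
    freqs: frequency of each class
    """
    if len(boundaries) != len(freqs) + 1:
        raise ValueError("need exactly one more boundary than classes")
    # less-than type: running total, referenced to each upper boundary
    less_than = list(accumulate(freqs))
    # greater-than type: running total from the top, referenced to each
    # lower boundary
    greater_than = list(accumulate(reversed(freqs)))[::-1]
    less_points = list(zip(boundaries[1:], less_than))
    greater_points = list(zip(boundaries[:-1], greater_than))
    return less_points, greater_points


# Hypothetical classes 0-10, 10-20, 20-30, 30-40
less_points, greater_points = ogive_points([0, 10, 20, 30, 40], [3, 7, 6, 4])
```

The last less-than value and the first greater-than value both equal the total frequency, which is a quick check that the table was built correctly.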