EMQC_Notes
1. Linear Algebra
By using linear algebra, we can convert raw data into useful data, or into data which is easily
interpretable by a particular machine. Every type of data can be represented as matrices or as vectors.
Linear algebra is the fuel of machine learning.
Word Embedding : Representing a word with vector.
1.2) Basic Terminology :
Linearly dependent vectors : A finite set of vectors {u1, u2, …, un} of a vector space is said to be
linearly dependent (LD) if there exists a set of scalars k1, k2, k3, …, kn, not all zero, such that
k1u1 + k2u2 + … + knun = O (zero vector).
Remember that some of the scalars can be zero, but not all of them.
Linearly independent vectors : A set of vectors is linearly independent (LI) if none of the vectors can
be written as a linear combination of the others.
OR A set of vectors is linearly independent if the only linear combination of the vectors that equals 0
is the trivial one, in which every scalar is zero.
Engineering Mathematics
Questions :
1) If a set of vectors is LD, can every vector in the set be represented as a linear combination of
the other vectors of the set ?
Answer : Surprisingly the answer is NO. Consider these vector sets: {0, v} and {(1, 0, 0), (0, 1, 0), (0, 0, 0)}.
Both are LD because they contain the zero vector. In the first set we cannot represent v as a linear
combination of the vector 0 when v ≠ 0. In the second set we cannot represent (1, 0, 0) or (0, 1, 0) as a
linear combination of the other two vectors. In each case only the zero vector can be written in terms
of the others. So the correct statement is: a set is LD if and only if at least one of its vectors can be
written as a linear combination of the others; this need not hold for every vector in the set.
2) Can we have more than 2 independent vectors in R2 ? The answer is NO. If you have two
independent vectors, then we can construct any vector in R2 using these two independent
vectors. If we add another vector to this set, the set becomes dependent. Extending this idea,
we cannot have more than n independent vectors in Rn. In other words, “If a subset of Rn
contains more than n vectors, then the subset is linearly dependent.”
3) What if we have 3 or fewer vectors in R3: are they independent or dependent ? For 2 vectors
in R2 it is easy, we just compare the ratios of their components, but that does not work here,
so we use another method which we will learn later.
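One possible way to test a set of vectors for independence is to compare the rank of the matrix whose rows are the vectors with the number of vectors. The sketch below is only an illustration under that assumption; the helper names rank and is_linearly_independent are made up for this example, and exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) via Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0  # index of the next pivot row
    for c in range(ncols):
        # look for a row at or below r with a non-zero entry in column c
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def is_linearly_independent(vectors):
    """Vectors are independent exactly when the rank equals their count."""
    return rank(vectors) == len(vectors)

print(is_linearly_independent([(1, 0, 0), (0, 1, 0), (0, 0, 1)]))  # standard basis of R3
print(is_linearly_independent([(1, 0, 0), (0, 1, 0), (0, 0, 0)]))  # contains the zero vector
print(is_linearly_independent([(1, 2), (3, 4), (5, 6)]))           # three vectors in R2
```

The last call illustrates question 2: three vectors in R2 can never be independent.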
Note :
• A single element set {v} is linearly independent if and only if v ≠ 0.
• Another meaning of Ax = b. We know that A is a group of column vectors, so Ax = b says that b
is a linear combination of the columns of A. Not only that: whenever Ax = b has a solution, the
set made of the columns of A together with b is linearly dependent.
• b is in the column space of A means b can be represented as a linear combination of the columns of A.
1.3) System of linear equations :
x + y = 2 Linear
2x – 3y = 1.5 Linear
x^2 + y = 5 Non-linear
x – xy = 4 Non-linear
A group of such linear equations is known as a system of linear equations. We can represent this
system of linear equations in the form Ax = b.
• Here A is called the coefficient matrix, and if you include b in it, it is called the augmented matrix.
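x1 – 2x2 = -1
-x1 + 3x2 = 3
This system of linear equations can be represented in two formats :
\[ \begin{bmatrix} 1 & -2 \\ -1 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -1 \\ 3 \end{bmatrix}, \qquad x_1 \begin{bmatrix} 1 \\ -1 \end{bmatrix} + x_2 \begin{bmatrix} -2 \\ 3 \end{bmatrix} = \begin{bmatrix} -1 \\ 3 \end{bmatrix} \]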
A system of linear equations can have no solution, exactly one solution, or infinitely many solutions.
Questions :
1) If Ax = 0 has some solution, then what can you say about the linear dependency of the columns of A ?
Answer : If the only solution is the trivial one (x = 0), then the columns of A are independent. If a
non-trivial solution exists, then the columns are linearly dependent, because Ax = 0 reads
x1a1 + x2a2 + … + xnan = 0 with not all xi equal to zero, which is exactly the dependency condition on
the columns a1, …, an.
2) Suppose a matrix A3x4 contains 3 linearly independent columns. What can you say about the
solutions to Ax = b ? (here b ≠ 0)
Answer : Size 3 x 4 means we are talking about R3, and there are 4 vectors (columns) in that space.
Since 3 of the columns are linearly independent, they already fill R3, and the remaining column is
redundant: it can be created from the 3 independent columns. So we can ignore that column and a
solution always exists. (Because the redundant column gives a free variable, there are in fact
infinitely many solutions.)
3) Suppose a matrix A5x4 contains 3 linearly independent columns. What can you say about the
solutions to Ax = b ? (here b ≠ 0)
Answer : Size 5 x 4 means we are talking about R5, and there are 4 vectors (columns), of which 3 are
known to be linearly independent; the remaining column may or may not be independent of them.
Either way, 4 columns cannot fill R5, so the system may or may not have a solution. The answer
depends on b: if b is a linear combination of the columns of A then a solution exists, otherwise
there is no solution.
So, in conclusion we can say:
1. If the columns of A fill the space (or, more generally, if b is a linear combination of the columns
of A) → a solution always exists, because we can create b out of the columns of A.
2. If the columns cannot fill the space → nothing can be inferred without knowing b.
4) Consider a matrix A with dimension m x n. For a system Ax = b, what can you say about the
statements below ? (note that b may or may not be zero)
Statement : If m < n then Ax = b always has a solution. (False)
Answer : Nothing is given about the columns of A, i.e. whether m of them are linearly independent.
If they are not, the columns do not fill Rm and some b cannot be produced.
Statement : If m > n then no system Ax = b has a solution. (False)
Answer : b can still be a linear combination of the columns of A, in which case we have a solution.
Questions :
Answer : The condition \sum_{i=1}^{n} c_i a_i = 0 (with the c_i not all zero) means the columns of A are
linearly dependent, as we can represent some a_i as a combination of the other columns of A. It is
also given that \sum_{i=1}^{n} a_i = b, so x = (1, 1, …, 1) is a solution, i.e. b is a linear combination of the
columns of A. Because the columns are dependent, x + t·c is also a solution for every scalar t, where
c = (c_1, …, c_n): A(x + t·c) = b + t·0 = b. So, there are infinitely many solutions.
Gaussian elimination is nothing but converting original matrix to echelon form of matrix.
1.5.1) Echelon form of matrix (or Row echelon form) : (This method is used to calculate rank of matrix)
• All nonzero rows are above any rows of all zeros.
• All entries in a column below a leading entry are zero.
• The leading entry of any row occurs to the right of the leading entry of the row above it. (For
example, the leading entry of the second row must lie in a column to the right of the column
holding the leading entry of the first row.)
Questions :
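1) Which are basic variables (pivot) and which are free variables in given augmented matrix ?
\[ \left[\begin{array}{cccc|c} 1 & 4 & 3 & 0 & 0 \\ 0 & 0 & 1 & -3 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right], \quad \left[\begin{array}{cc|c} 1 & 0 & 0 \\ 0 & 1 & 0 \end{array}\right], \quad \left[\begin{array}{ccccc|c} 1 & -3 & 0 & -1 & -2 & 0 \\ 0 & 0 & 1 & 0 & 3 & 3 \\ 0 & 0 & 0 & 1 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right], \quad \left[\begin{array}{ccc|c} 1 & 2 & 0 & 2 \\ 0 & 0 & 1 & 5 \\ 0 & 1 & 0 & 3 \end{array}\right] \]
Answer :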
Here variables correspond to columns, and we ignore the last column because the matrix is augmented
(read the question carefully): the last column represents “b”. Now, we take each matrix one by one.
Matrix 1 : basic variables (columns) : 1, 3 ; free variables : 2, 4.
Matrix 2 : basic variables : 1, 2 ; free variables : none.
Matrix 3 : basic variables : 1, 3, 4 ; free variables : 2, 5.
Matrix 4 : basic variables : 1, 2 ; free variables : 3.
From this example we can conclude that a free column is always linearly dependent on the pivot
columns.
Elementary row operations :
1) Swap the positions of two rows.
2) Multiply a row by a non-zero scalar.
3) Add to one row a scalar multiple of another. (We can do this operation because if u = v and x
= y then u + x = v + y; intuitively, replacing one equation by its sum with a multiple of
another does not change the solution set.)
1.5.4) Row reduced echelon form :
Row reduced echelon form = all pivots are 1, and all elements below and above the pivots are zero. In
row echelon form only the elements below the pivots have to be zero; row reduced echelon form adds
the condition that the elements above the pivots are also zero.
NOTE :
1) Rank is zero only for zero matrix.
2) Number of variables = Number of columns of coefficient matrix = pivot columns + Free
variables
• Non-homogeneous (Ax = b)
Question :
1) Solve the following system of equations using Gaussian reduction.
𝑥1 + 𝑥2 − 𝑥3 = 0
𝑥1 − 𝑥2 + 𝑥3 = 2
2𝑥1 − 𝑥2 − 𝑥3 = −3
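Answer : Augmented matrix is
\[ \left[\begin{array}{ccc|c} 1 & 1 & -1 & 0 \\ 1 & -1 & 1 & 2 \\ 2 & -1 & -1 & -3 \end{array}\right] \]
Converting it to row echelon form,
\[ \left[\begin{array}{ccc|c} 1 & 1 & -1 & 0 \\ 0 & -1 & 1 & 1 \\ 0 & 0 & 1 & 3 \end{array}\right] \]
∴ x3 = 3, x2 = 2, x1 = 1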
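The elimination and back substitution for this system can be sketched in a few lines. This is only an illustrative sketch: the function name solve_unique is made up here, and it assumes a square system with a unique solution (a zero pivot column makes it stop with an error).

```python
from fractions import Fraction

def solve_unique(aug):
    """Solve a square system from its augmented matrix [A | b] (unique solution assumed)."""
    n = len(aug)
    m = [[Fraction(x) for x in row] for row in aug]
    # forward elimination: create zeros below each pivot
    for c in range(n):
        pivot = next((i for i in range(c, n) if m[i][c] != 0), None)
        if pivot is None:
            raise ValueError("no pivot in column %d: solution is not unique" % (c + 1))
        m[c], m[pivot] = m[pivot], m[c]
        for i in range(c + 1, n):
            factor = m[i][c] / m[c][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[c])]
    # back substitution: solve from the last row upwards
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

solution = solve_unique([[1, 1, -1, 0],
                         [1, -1, 1, 2],
                         [2, -1, -1, -3]])
print(solution)  # x1, x2, x3 as exact fractions
```

The intermediate rows differ from the hand computation by row scalings only, so the solution is the same.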
2) Solve the following system of equations.
5𝑥1 − 11𝑥2 = −2
5 −11 | − 2
9 | + 1]
−2 | − 1
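\[ \left[\begin{array}{cc|c} 1 & -2 & -1 \\ 0 & 1 & -3 \\ 0 & 0 & 0 \end{array}\right] \]
∴ x2 = −3, x1 = −7. This is a beautiful example: if you encounter a row 0 0 | 0, it does not always
mean that there are infinitely many solutions. If the number of pivots equals n (the number of
variables), the solution is unique regardless of rows of the form 0 0 … | 0.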
3) Suppose that converting an augmented matrix to row echelon form gives the following
\[ \left[\begin{array}{ccc|c} 3 & 5 & -4 & 0 \\ 0 & -3 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right] \]
This is in row echelon form, but the last row is entirely zero and the third column has no pivot. So,
what to do now ?
Answer : The final row says 0 = 0 (which is true), so it puts no restriction on the variables.
Very first step : Identify the free variables and assign a constant parameter to each. Clearly the third
column (variable) is free. If there is more than one free variable, assign a separate parameter to each
of them.
∴ x3 = k and x2 = 0
3x1 + 5x2 − 4x3 = 0
3x1 − 4k = 0, x1 = 4k/3
\[ x = \begin{bmatrix} 4k/3 \\ 0 \\ k \end{bmatrix} \]
So, how many solutions are possible = infinite
How many linearly independent solutions are possible = 1
Here the number of free variables is 1, which is why there is one independent solution, i.e. the
nullity is 1. It also means the determinant is zero (for a square matrix, the presence of a free
variable implies a zero determinant).
Here b = 0. Now consider b ≠ 0, i.e. a non-homogeneous (heterogeneous) system of equations. If it is
consistent, every solution is one particular solution of Ax = b plus a solution of Ax = 0.
Geometrically, the solution set of Ax = b is the solution set of Ax = 0 shifted by that particular
solution.
4) Suppose that converting an augmented matrix to row echelon form gives the following
\[ \left[\begin{array}{ccc|c} 1 & -3 & 1 & 4 \\ -1 & 2 & -5 & 3 \\ 5 & -13 & 13 & 8 \end{array}\right] \rightarrow \left[\begin{array}{ccc|c} 1 & -3 & 1 & 4 \\ 0 & -1 & -4 & 7 \\ 0 & 0 & 0 & 2 \end{array}\right] \]
Answer : If you look carefully, the last row reads 0 = 2, which is false. There is no point in solving
these equations simultaneously, because Gaussian elimination has exposed a contradiction. So, there
is no solution.
0 stretch along z axis. There is decrease in column space from R3 to R2 which means decrease in rank.
So in R3 space we have R2 space filled which represents plane with infinite solution.
In the 4th question, we have a zero row in the coefficient part of the last row. There b says that
after applying the linear transformation, the third component should land on 2 on the third axis.
That is impossible: the third direction has 0 stretch, so it cannot land on 2, and no solution exists.
Sometimes the required vector b lies in the plane that the transformation can reach; in that case
there are infinitely many solutions.
The catch is that a homogeneous system never has a nonzero b, so a homogeneous system of
equations always has a solution (trivial or non-trivial).
6) Find numbers a, b, c and d such that the linear system corresponding to the augmented
matrix
\[ \left[\begin{array}{ccc|c} 1 & 2 & 3 & a \\ 0 & 4 & 5 & b \\ 0 & 0 & d & c \end{array}\right] \]
has a) no solution, and b) infinitely many solutions.
Answer : a) For no solution, d = 0, c ≠ 0; a and b could be anything.
b) For infinitely many solutions, d = 0, c = 0; a and b could be anything.
7) If Rank(A) ≠ Rank(A|b), then the system has ? And the same question with Rank(A) = Rank(A|b) ?
• No solution
• Unique solution
• Infinite solution
Answer : Consider the first case, when b is a linear combination of the columns of A. In that case
Rank(A) = Rank(A|b), and because we can create b from A a solution always exists. Remember that
“a solution exists” covers both a unique solution and infinitely many solutions.
Now consider the second case, when b is not a linear combination of the columns of A. In that case
Rank(A) ≠ Rank(A|b), and because we cannot produce b from A no solution exists. Appending b adds a
new dimension to the column space of A, so the rank increases by one: Rank(A|b) = 1 + Rank(A)
if b is linearly independent of the columns of A.
Now, let’s look at it from another point of view. In the first case of question 6, c ≠ 0 (with d = 0),
and we cannot create a nonzero c from the columns of A, so b is linearly independent of them and
Rank(A) + 1 = Rank(A|b): no solution. In the other case, if Rank(A) = Rank(A|b) = n (the number of
variables), there is a unique solution, and if Rank(A) = Rank(A|b) < n then we have free variables,
so there are infinitely many solutions.
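The rank test can be turned into a small classifier. The sketch below is an illustration only; the names rank and classify_system are invented for this example, and the three sample systems are the ones from questions 1, 3 and 4 of the Gaussian elimination examples.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def classify_system(A, b):
    """Compare Rank(A) with Rank(A|b) and with n, the number of variables."""
    n = len(A[0])
    rank_a = rank(A)
    rank_ab = rank([list(row) + [bi] for row, bi in zip(A, b)])
    if rank_a != rank_ab:
        return "no solution"
    return "unique solution" if rank_a == n else "infinitely many solutions"

print(classify_system([[1, 1, -1], [1, -1, 1], [2, -1, -1]], [0, 2, -3]))
print(classify_system([[3, 5, -4], [0, -3, 0], [0, 0, 0]], [0, 0, 0]))
print(classify_system([[1, -3, 1], [-1, 2, -5], [5, -13, 13]], [4, 3, 8]))
```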
In this section we talk only about square matrix. Let’s start
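1.7.1) Determinant : It represents the (signed) area in R2 and the (signed) volume in R3 spanned by the
columns of the matrix.
Some properties of determinant :
1. Determinant of identity matrix is 1.
2. The determinant changes sign when two rows (or two columns) are exchanged.
3. Linearity for one row (or one column) at a time.
Example :
\[ \begin{vmatrix} ta & tb \\ c & d \end{vmatrix} = t \begin{vmatrix} a & b \\ c & d \end{vmatrix} \quad \text{and} \quad \begin{vmatrix} a + a' & b + b' \\ c & d \end{vmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} + \begin{vmatrix} a' & b' \\ c & d \end{vmatrix} \]
See this carefully,
\[ \begin{vmatrix} a + a' & b + b' \\ c & d \end{vmatrix} = \begin{vmatrix} a' & b \\ c & d \end{vmatrix} + \begin{vmatrix} a & b' \\ c & d \end{vmatrix} \]
This also holds, because the terms of the sum can be interchanged.
And one more observation: for a matrix we have
\[ 2 \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} 2a & 2b \\ 2c & 2d \end{bmatrix} \]
but in a determinant a factor comes out of only one row at a time, so
\[ \begin{vmatrix} 2a & 2b \\ 2c & 2d \end{vmatrix} = 2^2 \begin{vmatrix} a & b \\ c & d \end{vmatrix} \]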
Questions :
1) Now, we can calculate the determinant of any matrix using only the first 3 properties. But how ?
Answer : We can convert any matrix into row echelon form, which for a square matrix is an upper
triangular matrix, keeping track of row swaps (property 2) and row scalings (property 3).
Alternatively, we can use the 3rd property to split every row until each determinant has only one
entry per row, and then evaluate those.
2) In the second approach we end up with many determinants which add up to the value of the
original determinant. By observation, if we expand an n x n determinant this way we get n^n
determinants in total, of which only n! terms survive; the others die (evaluate to zero).
NOTE :
1) The cofactor formula above also tells us a useful property: when we multiply the elements of a
row by the cofactors of the same row and add them, we get the determinant of the matrix.
Ex. Det(A) = a11C11 + a12C12 + ⋯ + a1nC1n. This is known as the cofactor expansion along row 1.
2) Whereas, when we multiply the elements of one row by the cofactors of a different row and add
them, we get zero.
Ex. a11C21 + a12C22 + ⋯ + a1nC2n = 0
3) Matrix is singular if its determinant is zero.
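\[ A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \quad \text{and} \quad \mathrm{Adj}(A) = \text{Transpose of Cofactor}(A) = \begin{bmatrix} C_{11} & C_{21} & C_{31} \\ C_{12} & C_{22} & C_{32} \\ C_{13} & C_{23} & C_{33} \end{bmatrix} \]
\[ A \times \mathrm{Adj}(A) = \begin{bmatrix} |A| & 0 & 0 \\ 0 & |A| & 0 \\ 0 & 0 & |A| \end{bmatrix} \]
\[ \therefore A \times \mathrm{Adj}(A) = |A| \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]
\[ \therefore A \times \frac{\mathrm{Adj}(A)}{|A|} = I \]
Which means \frac{\mathrm{Adj}(A)}{|A|} is the inverse of A (provided |A| ≠ 0).
Inverse of 2x2 matrix :
\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \text{ has inverse } \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \]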
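As a quick check of the 2x2 inverse formula, here is a small sketch. The function name inverse_2x2 and the sample matrix [4 7; 2 6] are made up for illustration; exact fractions keep the arithmetic readable.

```python
from fractions import Fraction

def inverse_2x2(a, b, c, d):
    """Inverse of [[a, b], [c, d]] as Adj(A) / |A|; fails for a singular matrix."""
    det = Fraction(a) * d - Fraction(b) * c
    if det == 0:
        raise ValueError("matrix is singular (determinant is zero)")
    # adjugate of a 2x2 matrix: swap the diagonal, negate the off-diagonal
    return [[d / det, -b / det],
            [-c / det, a / det]]

inv = inverse_2x2(4, 7, 2, 6)          # |A| = 4*6 - 7*2 = 10
det_inv = inv[0][0] * inv[1][1] - inv[0][1] * inv[1][0]
print(inv)
print(det_inv)                          # equals 1 / |A|
```

The last value illustrates the property |A^{-1}| = 1/|A|.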
Questions :
1) Why does a rectangular matrix not have a determinant ?
Answer : The determinant gives us useful information about almost everything, from eigenvalues to
the inverse. It measures how a matrix scales area or volume when it maps a space to itself, so it is
only defined for square matrices. Now consider a 2 x 3 matrix: it maps R3 down to R2, so one
direction is always flattened and any volume scaling would always be zero. A 3 x 2 matrix maps R2
to a plane in ℝ3, which has no volume either. A quantity that is always zero tells us nothing, which
is why we do not define the determinant of a rectangular matrix.
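2) Determinant of inverse and adjoint of matrix.
Answer : Consider a 2 x 2 matrix and its inverse \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.
\[ |A^{-1}| = \left( \frac{1}{ad - bc} \right)^2 \begin{vmatrix} d & -b \\ -c & a \end{vmatrix} = \frac{1}{ad - bc} = \frac{1}{|A|} \]
We know that \frac{\mathrm{Adj}(A)}{|A|} = A^{-1}, so Adj(A) = A^{-1} × |A|. Taking determinants for an n x n
matrix gives |Adj(A)| = |A|^n × |A^{-1}| = |A|^{n-1}.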
Cramer’s rule :
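For a 3 x 3 system Ax = b, A_i is the matrix A with its i-th column replaced by b:
\[ |A_1| = \begin{vmatrix} b_1 & a_{12} & a_{13} \\ b_2 & a_{22} & a_{23} \\ b_3 & a_{32} & a_{33} \end{vmatrix}, \quad |A_2| = \begin{vmatrix} a_{11} & b_1 & a_{13} \\ a_{21} & b_2 & a_{23} \\ a_{31} & b_3 & a_{33} \end{vmatrix}, \quad |A_3| = \begin{vmatrix} a_{11} & a_{12} & b_1 \\ a_{21} & a_{22} & b_2 \\ a_{31} & a_{32} & b_3 \end{vmatrix} \]
\[ \therefore x = \frac{1}{|A|} \begin{bmatrix} |A_1| \\ |A_2| \\ |A_3| \end{bmatrix} \]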
But why are they always in the limelight? It turns out that several properties of matrices can be
analyzed based on their eigenvalues. In machine learning, eigenvalues and eigenvectors appear in
several techniques, for example in the analysis of regularization.
There are infinitely many eigenvectors for any matrix, but the question is how many of them are LI ?
Calculating eigenvalues and eigenvectors :
Av = λv
∴ Av − λv = 0
∴ (A − λI)v = 0
But v ≠ 0, so det(A − λI) = 0. A zero determinant means that (A − λI)v = 0 has free variables and
therefore infinitely many solutions v.
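For a 2 x 2 matrix [a b; c d], det(A − λI) = 0 expands to λ^2 − (a + d)λ + (ad − bc) = 0, which can be solved directly. The sketch below assumes real eigenvalues only; the function name eigenvalues_2x2 and the sample matrix [2 1; 1 2] are hypothetical choices for illustration.

```python
import math

def eigenvalues_2x2(a, b, c, d):
    """Real roots of lambda^2 - trace*lambda + det = 0 for the matrix [[a, b], [c, d]]."""
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    root = math.sqrt(disc)
    return ((trace - root) / 2, (trace + root) / 2)

low, high = eigenvalues_2x2(2, 1, 1, 2)
print(low, high)
# For lambda = high, (A - lambda*I)v = 0 becomes -v1 + v2 = 0, so v = k(1, 1):
# one free variable k, hence infinitely many eigenvectors along one direction.
```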
Question :
1) Eigenvectors from different eigenvalues are linearly independent.
Answer : Let λ1, λ2 be the eigenvalues corresponding to eigenvectors e1 and e2 respectively. We have
to prove that if λ1 ≠ λ2 then e1 and e2 are linearly independent. Suppose instead that e2 = c·e1.
Applying A gives λ2e2 = cλ1e1 = λ1e2, so (λ2 − λ1)e2 = 0, which is impossible because λ1 ≠ λ2 and
e2 ≠ 0.
3) If the number of free variables in (A − λ1I)x = 0 is 2, then what is the geometric multiplicity of λ1 ?
Answer : The number of free variables equals the number of LI eigenvectors, so GM = 2 and AM is at
least 2. Why is free variables = number of LI eigenvectors ? Because we assign a parameter such as k
to each free variable, and these parameters (k, s, t, …) generate the eigenvectors.
Now, we have seen that if the characteristic polynomial contains the factor (λ − λ1)^{m1}, then λ1 has
at least one and at most m1 LI eigenvectors. But are there any special matrices for which AM = GM ?
Let’s see.
In other words: are there matrices A (n x n) which will always have n independent eigenvectors (even
when one or more eigenvalues are repeated) ?
Yes! (Real) symmetric matrices.
Suppose an eigenvalue is repeated and Aij = Aji, which means A^T = A. How can a single eigenvalue have
more than one LI eigenvector ? It is possible: consider AM = 2. Then GM is 2 as well (as the matrix is
symmetric), so the eigenvectors have the form v = ku1 + lu2, where k, l are constants and u1 and u2
are LI eigenvectors. If the n × n matrix A is symmetric, then eigenvectors corresponding to different
eigenvalues must be orthogonal to each other. Furthermore, in this case there will exist n linearly
independent eigenvectors for A, so that A will be diagonalizable.
Question :
1) Can Ax = 0 have a unique non-trivial solution ?
Answer : x = 0 is the trivial solution, and it always exists for a homogeneous system. If a non-trivial
solution x exists, then A(cx) = c(Ax) = 0 for every scalar c, so every multiple of x is also a solution.
That gives infinitely many non-trivial solutions, whereas the question asks for a unique one, which
cannot exist.
One interesting fact is that Ax = 0 is nothing but the eigenvector equation (A − λI)x = 0 for λ = 0.
Read question no 3 carefully: the number of LI eigenvectors for λ = 0 is the number of free variables,
which is nothing but the nullity. So, another definition of NULLITY = number of LI eigenvectors
corresponding to λ = 0.
2) Let A be a 5 x 5 matrix and one of the eigenvalues of A is 0. It is also known that there are 4
linearly independent eigenvectors corresponding to the 0 eigenvalue. What is the rank of A ?
Answer : One of the eigenvalues of A is 0, so rank < 5. The 4 LI eigenvectors corresponding to 0 give
the nullity (number of free variables), so rank = 5 − 4 = 1.
3) Let A be a 15 x 15 matrix and one of its eigenvalues is zero. What is the rank of A ?
Answer : One of its eigenvalues is zero, which means rank < 15. But no other information is given, so
the rank cannot be determined.
4) Let A be a 15 x 15 matrix and one of its eigenvalues is zero. It is also known that there are 10
linearly independent eigenvectors of A. What is the rank of A ?
Answer : The 10 linearly independent eigenvectors of A may or may not correspond to the eigenvalue
zero, so the rank cannot be determined.
1.8.2) Cayley-Hamilton theorem : It says every square matrix satisfies its own characteristic equation.
We can also say that if λ is an eigenvalue of A then λ^n is an eigenvalue of A^n (apply Av = λv repeatedly).
1.8.3) AB and BA have the same non-zero eigenvalues.
Question :
1) Let A be 3 x 7 and B be 7 x 3, and let the eigenvalues of AB be 1, 2, 4. Then what will be the
eigenvalues of BA ?
Answer : From 1.8.3, AB and BA share the same non-zero eigenvalues, so the eigenvalues of BA will be
1, 2, 4 and four 0’s. Why four 0’s ? Because BA is 7 x 7, so after copying the non-zero eigenvalues
all remaining eigenvalues must be zero.
2) Let A be 4 x 3 and B be 3 x 4; then AB must have at least one zero eigenvalue. BA is only 3 x 3,
so it has at most 3 non-zero eigenvalues, and the 4 x 4 matrix AB shares exactly those. Another
line of reasoning: every column of AB is a linear combination of the 3 columns of A. AB has 4
such columns, so they must be linearly dependent, whether the columns of A are LI or LD.
That means the determinant of AB is zero, and since the determinant is the product of the
eigenvalues, one of the eigenvalues of AB must be zero.
3) Consider a 10 x 10 matrix A having rank 2, so the nullity is 8. Rank 2 < 10 means the determinant
is zero, because the 10 columns span only 2 dimensions. The determinant is the product of the
eigenvalues, so at least one eigenvalue is zero. And the nullity is nothing but the number of
linearly independent eigenvectors corresponding to the zero eigenvalue.
NOTE :
1) A single eigenvalue can have multiple LI eigenvectors (this was noted above as well).
2) A column that contains a pivot is linearly independent of the columns before it. So, if every
row contains a pivot, the columns can produce any vector in that space.
3) When solving an Ax = b question, also take help from pivots and free variables.
4) “If Ax = b has a unique solution then A has to be invertible.” This seems correct at first glance,
but consider a 3 x 2 matrix A (see question 2 of the Gaussian elimination examples, with three
equations in two unknowns). There a unique solution exists but A is not invertible, since it is
not a square matrix. So, this sentence is false.
5) A unit eigenvector is nothing but an eigenvector divided by its magnitude, same as a unit
vector. If x is a unit eigenvector then x^T x = 1, because x^T x is the squared magnitude of the
unit vector, which is 1.
6) A matrix is diagonalizable if and only if it has n LI eigenvectors.
1.8.4) LU decomposition :
In LU decomposition, we factor a matrix into a Lower triangular and an Upper triangular matrix. Of
course, we cannot do this for every matrix, so there is a condition: “A matrix must be able to be
reduced to row-echelon form U, without interchanging any rows.”
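1) Direct method : with 1’s on the diagonal of L, multiply out A = LU and compare entries:
\[ LU = \begin{bmatrix} U_{11} & U_{12} & U_{13} \\ L_{21} U_{11} & L_{21} U_{12} + U_{22} & L_{21} U_{13} + U_{23} \\ L_{31} U_{11} & L_{31} U_{12} + L_{32} U_{22} & L_{31} U_{13} + L_{32} U_{23} + U_{33} \end{bmatrix} \]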
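2) Interesting method :
\[ A = \begin{bmatrix} 1 & 4 & -3 \\ -2 & 8 & 5 \\ 3 & 4 & 7 \end{bmatrix} \]
First, we convert this matrix into row echelon form,
\[ A = \begin{bmatrix} 1 & 4 & -3 \\ -2 & 8 & 5 \\ 3 & 4 & 7 \end{bmatrix} \xrightarrow{R_2 + 2R_1} \begin{bmatrix} 1 & 4 & -3 \\ 0 & 16 & -1 \\ 3 & 4 & 7 \end{bmatrix} \xrightarrow{R_3 - 3R_1} \begin{bmatrix} 1 & 4 & -3 \\ 0 & 16 & -1 \\ 0 & -8 & 16 \end{bmatrix} \xrightarrow{R_3 + 0.5R_2} \begin{bmatrix} 1 & 4 & -3 \\ 0 & 16 & -1 \\ 0 & 0 & 15.5 \end{bmatrix} \]
In the first transformation, we multiplied row 1 by -2 and subtracted it from row 2, so we write -2 in
L (position 2,1). Similarly in the next step, we multiplied row 1 by 3 and subtracted it from row 3, so
we write 3 in L (position 3,1). In the last step we multiplied row 2 by -1/2 and subtracted it from
row 3, so we write -1/2 in L (position 3,2). That’s how we fill L, and the upper triangular matrix we
are left with at the end is U.
\[ L = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 3 & -1/2 & 1 \end{bmatrix}, \quad U = \begin{bmatrix} 1 & 4 & -3 \\ 0 & 16 & -1 \\ 0 & 0 & 15.5 \end{bmatrix} \]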
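The bookkeeping of this method (store each multiplier in L while eliminating in U) can be sketched as follows. The function name lu_no_pivot is made up for this illustration, and the sketch assumes that no row swap is ever needed, as in the condition stated for LU decomposition.

```python
from fractions import Fraction

def lu_no_pivot(A):
    """LU decomposition without row swaps: returns (L, U) with 1's on the diagonal of L."""
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for c in range(n):
        if U[c][c] == 0:
            raise ValueError("zero pivot: a row swap would be needed")
        for i in range(c + 1, n):
            factor = U[i][c] / U[c][c]   # multiplier of row c subtracted from row i
            L[i][c] = factor             # the multiplier is exactly the entry of L
            U[i] = [a - factor * b for a, b in zip(U[i], U[c])]
    return L, U

L, U = lu_no_pivot([[1, 4, -3], [-2, 8, 5], [3, 4, 7]])
det = U[0][0] * U[1][1] * U[2][2]   # product of the diagonal of U
print(L)
print(U)
print(det)
```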
One thing to note: if we multiply the diagonal entries of the upper triangular matrix U, we get the
determinant of the original matrix (here 1 × 16 × 15.5 = 248). This is one of the methods to find the
determinant, but it holds as stated only when the elimination uses row additions and no row
swapping or row scaling.
• (A^T)^{-1} = (A^{-1})^T.
A matrix can be both lower and upper triangular (a diagonal matrix).
orthonormal vectors. Q^{-1} = Q^T or Q^T Q = I. Two vectors A and B are orthogonal if A.B = A^T B = 0.
If they are also unit vectors then they are called orthonormal. So, orthonormal = orthogonal + unit vector.
7) Idempotent : An idempotent matrix is a matrix which, when multiplied by itself, yields itself:
A^2 = A. Also A^n = A for any integer n > 0.
8) Nilpotent : It is a square matrix N such that N^k = 0 for some positive integer k.
9) Involutory : A^{-1} = A.
10) Hermitian : The conjugate transpose of a complex matrix A, denoted by A*, is given by A* =
(Ā)^T, where the entries of Ā are the complex conjugates of the corresponding entries of A.
A is Hermitian when A* = A.
11) Skew-Hermitian : A* = -A
2. Calculus
2.1) Limits :
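\[ \lim_{n \to a} f(n); \qquad \lim_{n \to a} \frac{f(n)}{g(n)} = \frac{\lim_{n \to a} f(n)}{\lim_{n \to a} g(n)} \quad \text{(provided } \lim_{n \to a} g(n) \neq 0 \text{)} \]
While solving limits, we encounter many indeterminate forms. Some of them are given below :
\[ \frac{\infty}{\infty}, \quad \frac{0}{0}, \quad \infty - \infty, \quad 0^0, \quad 0 \cdot \infty, \quad \infty^0, \quad 1^\infty \]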
I. Factorization : For factorization, we follow some formulas like,
a^3 + b^3 = (a + b)(a^2 − ab + b^2)
a^3 − b^3 = (a − b)(a^2 + ab + b^2)
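II. Rationalization :
Example :
\[ \lim_{x \to 1} \frac{(x+1)(\sqrt{x}-1)}{2x^2+x-3} = \lim_{x \to 1} \frac{(x+1)(\sqrt{x}-1)}{2x^2+3x-2x-3} = \lim_{x \to 1} \frac{(x+1)(\sqrt{x}-1)}{(x-1)(2x+3)} = \lim_{x \to 1} \frac{x+1}{(\sqrt{x}+1)(2x+3)} = \frac{2}{2 \times 5} = 0.2 \]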
as 𝑥 form by taking 𝑥 𝑚 common.
Example :
\[ \lim_{x \to \infty} \frac{\sqrt{3x^2+2}}{x-2} = \lim_{x \to \infty} \frac{\sqrt{3 + \frac{2}{x^2}}}{1 - \frac{2}{x}} = \sqrt{3} \]
2.1.3) Solving ∞ − ∞ form : Try to combine first ∞ and second ∞ terms.
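\[ \therefore l = e^{\lim_{x \to a} f(x) \cdot g(x)} \]
\[ \lim_{x \to a} f(x)^{g(x)} = e^{\lim_{x \to a} g(x)\,(f(x) - 1)} \]
The second formula is only applicable when f(x) → 1 as x → a (the 1^∞ form). It has f(x) − 1 because
the equation for l is written for a base of the form 1 + f(x), whereas here the base is f(x) itself.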
2.1.5) L'Hôpital's rule : All the FUL can be derived from L'Hôpital's rule.
2.2) Continuity :
A function f(x) is continuous at x = a if \lim_{x \to a} f(x) = f(a), OR LHL = RHL = value of the function at a.
Example :
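\[ f(x) = \begin{cases} \dfrac{\sqrt{1+kx} - \sqrt{1-kx}}{x}, & -1 \le x < 0 \\ \dfrac{2x+1}{x-1}, & 0 \le x < 1 \end{cases} \]
Find k if f(x) is continuous at x = 0.
Answer : f(0) = -1, because x = 0 falls in the 2nd branch (0 ≤ x), which gives (2·0 + 1)/(0 − 1) = -1.
\[ \text{LHL} = \lim_{x \to 0^-} \frac{\sqrt{1+kx} - \sqrt{1-kx}}{x} = \lim_{x \to 0^-} \frac{2k}{\sqrt{1+kx} + \sqrt{1-kx}} = k \]
For continuity LHL = f(0), so k = −1.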
NOTE :
1) If F and G are continuous at x = a then F(x) ± G(x), F(x).G(x) and F(x)/G(x) are also continuous
at x = a; of course G(a) ≠ 0 is needed in the last form.
2) The composite function F(G(x)) is continuous at x = a when G(x) is continuous at x = a and F(x)
is continuous at x = G(a).
Is it possible that two functions are not continuous at some point but their addition is continuous at
that point ? – Let's suppose we have two functions, f(x) and g(x), defined as follows:
2.3) Differentiability :
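\[ \text{Slope (from the left)} = \frac{f(a-h) - f(a)}{a - h - a} = \frac{f(a-h) - f(a)}{-h} = A \]
\[ \text{Slope (from the right)} = \frac{f(a+h) - f(a)}{a + h - a} = \frac{f(a+h) - f(a)}{h} \]
Here u is chosen according to the ILATE criteria.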
Tic-tac-toe method :
Approximate by integrals :
When a summation has the form \sum_{k=m}^{n} f(k), where f(k) is a monotonically increasing function, we
can approximate it by integrals:
\[ \int_{m-1}^{n} f(x)\,dx \le \sum_{k=m}^{n} f(k) \le \int_{m}^{n+1} f(x)\,dx \]
When f(k) is a monotonically decreasing function, we can use a similar method to provide the
bounds
\[ \int_{m}^{n+1} f(x)\,dx \le \sum_{k=m}^{n} f(k) \le \int_{m-1}^{n} f(x)\,dx \]
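A quick numerical check of the bounds for an increasing function, as a sketch: the choice f(k) = k^2 with m = 1 and n = 10 is just an example, and its antiderivative x^3/3 is typed in by hand rather than computed.

```python
def integral_bounds_increasing(antiderivative, m, n):
    """Lower and upper integral bounds on sum_{k=m}^{n} f(k) for increasing f."""
    lower = antiderivative(n) - antiderivative(m - 1)      # integral from m-1 to n
    upper = antiderivative(n + 1) - antiderivative(m)      # integral from m to n+1
    return lower, upper

def antiderivative_of_square(x):
    """Antiderivative of f(x) = x^2."""
    return x ** 3 / 3

total = sum(k * k for k in range(1, 11))                   # 1^2 + 2^2 + ... + 10^2
lower, upper = integral_bounds_increasing(antiderivative_of_square, 1, 10)
print(lower, total, upper)
```

For a decreasing function the same two integrals are used with their roles swapped.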
For maxima and minima f’(x) = 0 (when f is differentiable there). That is, if there is a maximum or
minimum at a point then f’(x) = 0 at that point. But if f’(x) = 0, is there necessarily a maximum or
minimum ? No: take f(x) = x^3. Its f’(x) = 0 at x = 0, but that point is neither a minimum nor a maximum.
Point of inflection : It is a point at which the function changes from concave to convex or vice versa.
At that point f’’(x) = 0 (if the second derivative exists), and it is not a point of minimum or maximum.
Critical points are points where f’(x) = 0.
Example :
F(x) = x^3 – 3x + 3
F’(x) = 3x^2 – 3 = 0
x = +1, -1.
Which point is the local minimum and which is the local maximum ? One way is to check the sign of
F’(x) on either side of each critical point on the number line. Another way is to take the second
derivative, F’’(x) = 6x, and put the values in: the curve is sagging (local minimum) if the value is
positive and hogging (local maximum) if the value is negative. Here F’’(1) = 6 > 0, so x = 1 is a local
minimum, and F’’(-1) = -6 < 0, so x = -1 is a local maximum.
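The second derivative test for this example can be written out as a tiny sketch. The function names are made up for illustration, and the derivatives of F(x) = x^3 – 3x + 3 are entered by hand.

```python
def f(x):
    return x ** 3 - 3 * x + 3

def f_second(x):
    """Second derivative of f, computed by hand: f''(x) = 6x."""
    return 6 * x

def classify_critical_point(x):
    """Second derivative test at a point where f'(x) = 0."""
    value = f_second(x)
    if value > 0:
        return "local minimum"      # curve is sagging
    if value < 0:
        return "local maximum"      # curve is hogging
    return "inconclusive"           # the test says nothing when f''(x) = 0

for point in (1, -1):
    print(point, f(point), classify_critical_point(point))
```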
Theorem 1 (Intermediate value theorem). If f is a continuous function on the closed interval [a,b], and
if d is between f(a) and f(b), then there is a number c ∈ [a,b] with f(c) = d.
If d = 0, we have a useful consequence: if f is continuous and f(a) and f(b) have opposite signs, then
there exists a point c in [a,b] where the graph of f crosses the x axis, i.e. f(c) = 0.
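The d = 0 case is the idea behind the bisection method: keep halving an interval whose endpoints have opposite signs. The sketch below is an illustration, not part of the theorem; the name bisect_root and the sample function x^3 – 3x + 3 on [-3, -2] are choices made for this example.

```python
def bisect_root(f, a, b, tol=1e-9):
    """Find c in [a, b] with f(c) close to 0, given f continuous and a sign change on [a, b]."""
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        mid = (a + b) / 2
        fm = f(mid)
        if fm == 0:
            return mid
        if fa * fm < 0:
            b, fb = mid, fm      # the sign change is in the left half
        else:
            a, fa = mid, fm      # the sign change is in the right half
    return (a + b) / 2

def g(x):
    return x ** 3 - 3 * x + 3    # g(-3) = -15 and g(-2) = 1, so a root lies in between

root = bisect_root(g, -3.0, -2.0)
print(root, g(root))
```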
A function f(x) is continuous in the interval [0, 2]. It is known that f(0) = f(2) = -1 and f(1) = 1. Which
one of the following statements must be true ?
Answer : For option A, define g(y) = f(y) – f(y+1). From the first point of the note in 2.2, g(y) is also
continuous on [0, 1]. We have g(0) = f(0) – f(1) = -2 and g(1) = f(1) – f(2) = 1 + 1 = 2. So
g(0) * g(1) < 0, which means there is a point between 0 and 1 where g(y) is zero, i.e. f(y) = f(y+1) for
some y between 0 and 1. Hence option A is correct. Similarly, option d is also correct.
Theorem 2 (Rolle’s Theorem). Suppose f is continuous on [a,b] and differentiable on (a,b), and suppose
that f(a) = f(b). Then there is a number c ∈ (a,b) with f’(c) = 0.
Theorem 3 (The Mean Value Theorem). Suppose f is continuous on [a,b] and differentiable on (a,b).
Then there is a number c ∈ (a,b) with (f(b) – f(a))/(b-a) = f’(c).
3. Probability
3.1) Introduction :
Axioms of probability :
1. Nonnegativity : P(A) ≥ 0
2. Normalization : P(Ω) = 1
3. Additivity : If A ∩ B = φ (disjoint), then P(A U B) = P(A) + P(B)
Where, for equally likely outcomes,
\[ P(A) = \frac{\text{Number of elements of } A}{\text{Total number of sample points}} = \frac{n(A)}{n(S)} \]
𝑃(𝐸 ∪ 𝐹 ∪ 𝐺) = 𝑃(𝐸) + 𝑃(𝐹) + 𝑃(𝐺) − 𝑃(𝐸 ∩ 𝐹) − 𝑃(𝐸 ∩ 𝐺) − 𝑃(𝐹 ∩ 𝐺) + 𝑃(𝐸 ∩ 𝐹 ∩ 𝐺)
Conditional probability is nothing but “Change in belief”. Example of conditioning :
• We throw 2 dice
• We look for P(Sum of 2 faces is 9)
From the above equation we have P(AB) = P(B) × P(A|B). We call P(AB) the joint probability of A and B.
We can extend this idea to three events: P(ABC) = P(C) × P(B|C) × P(A|BC)
It also equals P(BCA) = P(A) × P(C|A) × P(B|CA) = P(B) × P(C|B) × P(A|BC); similarly you
can generate many formulas. We call this factorization of the joint distribution.
If you don’t understand the RHS in the above diagram: to find the final probability, you multiply all
the probabilities that occur along the way. The RHS is nothing but that, the product of the
probabilities of all the cases along the path.
1. Nonnegative : P(A|B) ≥ 0
2. Normalization : P(Ω|B) = 1
3. Additivity : P(A ∪ B|C) = P(A|C) + P(B|C), provided A and B are disjoint events. You can easily
prove this from (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C).
Mutually exclusive : Two events are mutually exclusive or disjoint if they cannot occur at the same
time, which means P(A ∩ B) = 0 and therefore P(A ∪ B) = P(A) + P(B).
Mutually exhaustive : When a sample space S is divided into many mutually exclusive events such
that their union forms the entire sample space, these events are said to be mutually exhaustive events.
Which means mutually exhaustive events, as used here, first need to be mutually exclusive, and then
their union should cover S.
Question :
Answer : Of course the first is true. You can prove it by expanding the terms or by intuition.
2) An unbalanced dice (with 6 faces numbered from 1 to 6) is thrown. The probability that the
face value is odd is 90% of the probability that the face value is even. The probability of getting
any even numbered face is the same. If the probability that the face is even given that it is
greater than 3 is 0.75, which one of the following options is closest to the probability that the
face value exceeds 3 ?
P(odd) = 0.9 P(even) and P(odd) + P(even) = 1 give P(even) = 1/1.9, so P(2) = P(4) = P(6) = 1/5.7.
P(even | >3) = P(4, 6) / P(>3), where P(4, 6) = P(4) + P(6); you add because P(even ∩ >3) = P(4 ∪ 6).
P(>3) = P(4, 6) / 0.75 = (2/5.7) / 0.75 ≈ 0.468
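The arithmetic of this question can be checked with exact fractions. This is only a sketch of the computation; the variable names are made up for illustration.

```python
from fractions import Fraction

# P(odd) = 0.9 * P(even) and P(odd) + P(even) = 1  =>  P(even) = 1 / 1.9
p_even = 1 / (1 + Fraction(9, 10))

# the three even faces 2, 4, 6 are equally likely
p_each_even = p_even / 3

# "even and greater than 3" means the face is 4 or 6
p_4_or_6 = 2 * p_each_even

# P(even | > 3) = P(4 or 6) / P(> 3) = 0.75  =>  P(> 3) = P(4 or 6) / 0.75
p_greater_than_3 = p_4_or_6 / Fraction(3, 4)

print(p_greater_than_3, float(p_greater_than_3))
```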
Answer : Here we are introducing a new event (set) G into the existing expression. We have seen
introducing a set into an existing set in marginalization: P(E) = P(E, G) + P(E, G^c).
But in the question E|F is given, so in the previous equation we put |F after every term. We get
\[ P(E|F) = P(E, G|F) + P(E, G^c|F) = \frac{P(E \cap F \cap G)}{P(F)} + \frac{P(E \cap F \cap G^c)}{P(F)} \]
Then we apply the formula from 3.2.1 to each numerator, P(E ∩ F ∩ G) = P(F) P(G|F) P(E|F, G), and
cancel P(F) to get
\[ P(E|F) = P(E|F, G)\,P(G|F) + P(E|F, G^c)\,P(G^c|F) \]
The step P(E|F) = P(E, G|F) + P(E, G^c|F) is the important one.
P(E) ≥ P(E ∩ F ∩ G), P(F) ≥ P(E ∩ F ∩ G), P(G) ≥ P(E ∩ F ∩ G), so P(E ∩ F ∩ G) is also less than or
equal to the minimum of these.
5) If the occurrence of event F makes event E more likely, then the occurrence of E necessarily
makes F also more likely.
Answer : First, how do we express “more likely” ? The usual probability is P(E); if F makes E more
likely, then P(E|F) ≥ P(E). So the question asks: if P(E|F) ≥ P(E), is P(F|E) ≥ P(F) ?
The first inequality means P(E ∩ F) ≥ P(E).P(F); dividing both sides by P(E) gives P(F|E) ≥ P(F). So
the statement is true.
Let A1, A2, A3, …, An be disjoint events that form a partition of the sample space, and assume that
P(Ai) > 0 for all i. Then, for any event B such that P(B) > 0, we have
$$P(A_i|B) = \frac{P(A_i)\,P(B|A_i)}{P(B)} = \frac{P(A_i)\,P(B|A_i)}{P(A_1)\,P(B|A_1) + \cdots + P(A_n)\,P(B|A_n)}$$
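A minimal sketch of this rule in Python. The priors and likelihoods are hypothetical numbers chosen only for illustration (three boxes, B = "a defective item is drawn"), and the function name `posterior` is my own.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return [P(A_i | B)] given priors P(A_i) and likelihoods P(B | A_i)."""
    # Denominator: total probability P(B) = sum_i P(A_i) P(B | A_i)
    p_b = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / p_b for p, l in zip(priors, likelihoods)]

# Hypothetical numbers: the A_i form a partition, so the priors sum to 1.
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(5, 100)]
post = posterior(priors, likelihoods)
print(post)
```

The posteriors always sum to 1, because the denominator is exactly the sum of the numerators.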
3.3.1) Independence of events :
𝑷(𝑩|𝑨) = 𝑷(𝑩)
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴). 𝑃(𝐵|𝐴) = 𝑃(𝐴). 𝑃(𝐵)
Which means events A and B are independent if 𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴). 𝑃(𝐵)
Look at the second diagram: it is labelled as independent events, but the picture alone cannot show that. There must be some information
given about A and B. For example, when 𝑃(𝐴) = 0.3, 𝑃(𝐵) = 0.4, 𝑃(𝐴 ∩ 𝐵) = 0.12, A and B are
independent events, because 0.3 × 0.4 = 0.12; otherwise they are not. You cannot decide the independence of two events just by looking at a Venn
diagram, some information must be given.
NOTE : 𝑷(𝑨 ∩ 𝑩) = 𝑷(𝑨). 𝑷(𝑩) is a consequence of 𝑷(𝑩|𝑨) = 𝑷(𝑩), and the two are equivalent
whenever P(A) > 0. The intuitive meaning of independence is 𝑷(𝑩|𝑨) = 𝑷(𝑩): knowing that A occurred does not change the probability of B.
Question :
1) Can two disjoint events be independent ?
Answer : Two disjoint events are called mutually exclusive events, which means 𝑃(𝐴 ∩ 𝐵) =
0. Independent means 𝑃(𝐴). 𝑃(𝐵) = 𝑃(𝐴 ∩ 𝐵). So two disjoint events are independent only if P(A) or P(B) is zero.
2) If A and B are independent then what can you say about A’, B’.
Which means if A and B are independent then A and B’ are independent, A’ and B are independent,
A’ and B’ are independent.
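Three events A, B, C are called independent events when all four of these conditions are satisfied : P(A ∩ B) = P(A).P(B), P(A ∩ C) = P(A).P(C), P(B ∩ C) = P(B).P(C) and P(A ∩ B ∩ C) = P(A).P(B).P(C).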
Answer : They are mutually exclusive. So, they are independent only if P(A) or P(A’) is zero. Of course
both cannot be zero at the same time.
If events A and B are independent then P(A) = P(A|B) or P(B) = P(B|A) or P(AB) = P(A).P(B)
Two events A and B are independent given C : 𝑃(𝐴 ∩ 𝐵|𝐶) = 𝑃(𝐴|𝐶). 𝑃(𝐵|𝐶)
The above statement does not mean that A and B are independent. It says that they are independent
once C is given. So, if the above equation is true, it does not imply that P(A) = P(A|B) or P(B) = P(B|A).
Those hold when A and B are independent with no condition on C or on any other event.
Just as 𝑃(𝐴𝐵) = 𝑃(𝐴). 𝑃(𝐵|𝐴), we have 𝑃(𝐴𝐵|𝐶) = 𝑃(𝐴|𝐶). 𝑃(𝐵|𝐴𝐶). This is not an if-then implication; I’m just
observing the pattern and deriving one equation from the other by adding C as a condition everywhere. You can see why it is not an implication in the question section.
𝑃(𝐵|𝐴𝐶) = 𝑃(𝐵|𝐶) 𝑜𝑟 𝑃(𝐴|𝐵𝐶) = 𝑃(𝐴|𝐶) this is true if A and B are independent given C.
Similar to independence, conditional independence also has some properties :
If A and B are independent given C then A and B’ are independent given C,
A’ and B are independent given C, and A’ and B’ are independent given C.
Question :
1) Assume A and B are independent i.e. 𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴). 𝑃(𝐵). Can we also say that A and B
are independent given C.
Answer : No, not in general.
Let’s say A and B are independent, i.e. 𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴). 𝑃(𝐵). Once C is given, the two events
can become disjoint, because there may be no common region between A and B inside C. That is why
independence does not imply independence given C.
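A concrete counterexample can be checked by counting. This sketch assumes two fair coin tosses with my own choice of events: A = first toss is a head, B = second toss is a head, C = exactly one head in total. The helper `prob` is hypothetical and just counts equally likely outcomes.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair coin tosses, all four outcomes equally likely.
omega = list(product("HT", repeat=2))

def prob(event, given=None):
    """P(event | given) by counting equally likely outcomes."""
    space = [w for w in omega if given is None or given(w)]
    return Fraction(sum(1 for w in space if event(w)), len(space))

A = lambda w: w[0] == "H"          # first toss is a head
B = lambda w: w[1] == "H"          # second toss is a head
C = lambda w: w.count("H") == 1    # exactly one head in total
A_and_B = lambda w: A(w) and B(w)

independent = prob(A_and_B) == prob(A) * prob(B)
cond_independent = prob(A_and_B, C) == prob(A, C) * prob(B, C)
print(independent, cond_independent)
```

Given C, the events A and B cannot both happen, so P(A ∩ B | C) = 0 while P(A|C).P(B|C) = 1/4.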
3.4.1) Random variable : It is a function that maps each outcome to a real number.
Consider rolling a die. The sample space is {1, 2, 3, 4, 5, 6} and you have the number line. A random
variable maps the outcomes of the sample space to real numbers,
for example X(2) = 20, X(3) = 30. The meaning of X can be anything; for example, here X may represent the number of
1s or 2s or 4s on the dice.
But then how are we writing X = 3, as you have seen last year ? – X = 3 is actually a notation which
represents { w | X(w) = 3}, meaning the set of all outcomes that X maps to 3.
Question :
1) True/false ? – Suppose we have two events X = a, X = b. We can say that both of the events
are mutually exclusive.
Answer : It is true (for a ≠ b), because a single outcome cannot map to two different values of X.
Answer : Yes, because X only maps to a, b, c, and each of these sets contains some outcomes. For the reason
explained in the 1st question, these 3 events should be mutually exhaustive.
3) Consider tossing a coin twice, and let X = number of heads after both flips, X1 = number of heads
on the first flip, X2 = number of heads on the second flip. Then P(X<2) = ? and P(X1+X2<2) = ?
Answer :
Answer :
Answer : Part a is done, as we have calculated the distributions of X and Y. The second part asks for the conditional
distribution P(Y|X) for X = 2, which means the values P(Y=1|X=2), P(Y=2|X=2), … , and these sum to 1. Remember
that the bar here denotes conditional probability; it is not an operator between Y and X.
NOTE :
1) A probability distribution is valid if sum of all probabilities is 1.
2) If min(x1, x2, x3, x4,…)>1 then every term is greater than 1, because every term is at least
the minimum and the minimum is greater than 1.
Types of Random variable :
• Discrete : A random variable is called discrete if it takes either a finite or a countably infinite number
of values. Example : the number of sixes in two rolls of a die.
• Continuous : A random variable that can take an uncountably infinite number of values. For
example, consider the experiment of choosing a point a from the interval [-1, 1].
So, what is a probability mass function ? – The list of the probabilities of each value of a discrete random
variable is called the PMF.
Question :
1) Suppose that a coin is tossed twice. Let X represent the number of heads that can come up.
Write down the PMF of X.
Answer : As we have seen in the definition, it is the list of the probabilities of each value of the discrete random
variable. So, P(X=2) = ¼, P(X=1) = ½, P(X=0) = ¼.
Answer : PMF is P(X=1) = 3/25, P(X=2) = 4/25, … If you add all these probabilities you
will get 1, which means the probability distribution is valid.
1. Collect all the possible outcomes that give rise to the event {X=x}.
2. Add their probabilities to obtain PX(x).
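The two steps above can be sketched in Python for the two-toss example. This assumes all outcomes are equally likely; the helper name `pmf` and the variable names are my own.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def pmf(outcomes, X):
    """PMF of X when all outcomes in `outcomes` are equally likely."""
    p = defaultdict(Fraction)
    for w in outcomes:
        # Step 1: outcome w belongs to the event {X = X(w)}.
        # Step 2: add its probability to p_X(X(w)).
        p[X(w)] += Fraction(1, len(outcomes))
    return dict(p)

two_tosses = list(product("HT", repeat=2))
heads = pmf(two_tosses, lambda w: w.count("H"))
print(heads)
```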
b) Expectation of Random variable : The PMF of a random variable X provides us with several
numbers, the probabilities of all the possible values of X. Expectation is a way to summarize them
in one number. We define the expected value (also called the expectation or the mean) of a
random variable X, with PMF pX(x), by
$$E[X] = \sum_x x\,p_X(x)$$
Consider a roll of a die. You get i rupees if the outcome is i. How much do you expect to get per trial ? How
much do you expect in N trials ?
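A quick sketch of this calculation, assuming the die is fair (the notes do not say so explicitly). N = 100 is a hypothetical number of trials.

```python
from fractions import Fraction

# Fair die: you receive i rupees when the outcome is i.
pmf = {i: Fraction(1, 6) for i in range(1, 7)}
per_trial = sum(x * p for x, p in pmf.items())   # E[X] = sum of x * p_X(x)

N = 100                       # hypothetical number of trials
total = N * per_trial         # expectation adds up over the N trials
print(per_trial, total)
```

So you expect 3.5 rupees per trial and 3.5 N rupees over N trials.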
Much of machine learning is about estimating expectations: based on the probabilities, which value
do we expect on average.
So, what is the difference between average and expectation ? Both measure the same thing: the average is computed
from observed data, and as the amount of data becomes very large, the average tends to the expectation.
Answer : In this type of question we assign a random variable to whatever is asked for. Here
they are asking about the number of tries to get a 6. Hence, X = number of rolls to get a 6, and we want E[X],
the expected value of X. Two cases are possible: we get a 6 on the first trial with probability 1/6,
or we do not get a 6 (probability 5/6), count the first roll and move on to the second roll. This process can go on forever.
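The two cases give a recursive equation for E[X], which this sketch solves. It assumes a fair die, and the truncated sum is only a numerical cross-check of the infinite process.

```python
from fractions import Fraction

p = Fraction(1, 6)   # probability of a 6 on any single roll of a fair die

# Condition on the first roll:
#   E[X] = p * 1 + (1 - p) * (1 + E[X])
# Rearranged: E[X] * (1 - (1 - p)) = 1, so E[X] = 1 / p.
expected_rolls = 1 / p

# Cross-check with a truncated version of the infinite sum k * P(X = k).
partial = sum(k * (1 - p) ** (k - 1) * p for k in range(1, 500))
print(expected_rolls, float(partial))
```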
2) What is the expected number of tries to get HH or TT when tossing a coin ?
Answer : In this type of question we use the tree method, but we cannot reuse the first tree. We create two more
variables, find their expectations first, and then calculate the main expectation asked in the question.
Here X1 and X2 represent the number of tries to get HH or TT given that the first toss is a head, and the number
of tries to get HH or TT given that the first toss is a tail, respectively.
We solve these equations and get E[X1] = E[X2] = 3, and E[X] = 3.
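One way to arrive at these values is sketched below. It assumes a fair coin and that X1 and X2 count the first toss as well; the helper quantity R is my own, not part of the notes.

```python
from fractions import Fraction

half = Fraction(1, 2)

# Let R be the expected number of additional tosses once the first toss is
# known. The next toss matches the previous one with probability 1/2 (stop),
# otherwise we are in the same situation with the new toss as "previous":
#   R = 1 + (1/2) * 0 + (1/2) * R   =>   R = 2
R = 1 / (1 - half)

# X1 and X2 both count the first toss as well, so each equals 1 + R.
E_X1 = 1 + R
E_X2 = 1 + R
E_X = half * E_X1 + half * E_X2
print(E_X1, E_X2, E_X)
```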
3) Let X1, X2 be independent Bernoulli random variables with parameter p (i.e., they are
independent and satisfy P(Xi=1)=p, P(Xi=0)=1−p). Find E[X1²X2] ?
Answer : If you find a combined variable like X1+X2 or X1²X2, do not be afraid. Just make a truth table, find
the respective values, and then make the new PMF of the new random variable. Let’s solve this question.
c) Cumulative distribution function (CDF) : As the name indicates, it is the cumulative
sum of probabilities. It is denoted as 𝐹(𝑥) = 𝑃(𝑋 ≤ 𝑥)
Example, find the cumulative distribution function of the total number of heads obtained in four tosses of a
balanced coin. – Here X is the total number of heads obtained in four tosses,
so we need cumulative sums of the probabilities of 0, 1, 2, 3, 4, because at most 4 heads can appear in
4 tosses.
𝐹(0) = 𝑃(𝑋 ≤ 0) = 𝑃(0), 𝐹(1) = 𝑃(𝑋 ≤ 1) = 𝑃(0) + 𝑃(1), and similarly for the rest of the function up to
4. Hence, the cumulative distribution function is given by,
$$F(x) = \begin{cases} 0 & x < 0 \\ 1/16 & 0 \le x < 1 \\ 5/16 & 1 \le x < 2 \\ 11/16 & 2 \le x < 3 \\ 15/16 & 3 \le x < 4 \\ 1 & x \ge 4 \end{cases}$$
Observe that this distribution function is defined not only for the values taken on by the given random
variable, but for all real numbers. For instance, we can write F(1.7) = 5/16 and F(100) = 1, although the
probabilities of getting at most 1.7 heads or at most 100 heads in four tosses of a balanced coin may
not be of any real significance.
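A short sketch of this CDF in Python, built from the PMF of the number of heads in four tosses of a balanced coin. The function name `cdf` is my own.

```python
from fractions import Fraction
from math import comb

n = 4  # four tosses of a balanced coin
pmf = {k: Fraction(comb(n, k), 2 ** n) for k in range(n + 1)}

def cdf(x):
    """F(x) = P(X <= x), defined for every real x."""
    return sum(p for k, p in pmf.items() if k <= x)

print([cdf(k) for k in range(n + 1)], cdf(1.7), cdf(100))
```

Because `cdf` accepts any real x, it reproduces values such as F(1.7) and F(100) from the text.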
Now let’s see the behavior of PMF and CDF :
Which means you can get PMF from CDF and vice versa.
d) Variance : The mean is one way to summarize data and the variance is another. Together they
serve as a good summary of the data. Variance is the average squared distance from the mean. It tells us
how spread out our data is. Variance is always nonnegative.
Consider two cases. In the first we have 3 different points and the average is 0. In the second case we have 3
other points and the average is 0 as well.
But we know that the two cases are different. So, expectation alone does not give you enough information about the data.
That’s why we introduce a new concept which tells us about the spread of the data around the expectation.
Similarly, there is the concept of covariance, which captures some other information. Here we don’t go in depth.
$$\mathrm{Var}(X) = \frac{\sum (x - \text{mean})^2}{n} = \text{mean}\big((x - \text{mean})^2\big) = E[(x - E[x])^2]$$
Example, consider the random variable X which has the PMF
$$p_X(x) = \begin{cases} \frac{1}{9} & \text{if } x \text{ is an integer in the range } [-4, 4] \\ 0 & \text{otherwise} \end{cases}$$
Find the mean and variance.
Answer : You can solve for the mean using the formula ∑x p(x), but here the integers from -4 to 4 have the same probability and everything else has probability 0,
which means this PMF is symmetric around 0, so the mean is 0. Hence Var(X) = E[X²].
$$E[x^2] = \sum x^2 P(x) = \frac{60}{9}$$
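The same numbers can be reproduced directly from the PMF. This is only a check of the example; the variable names are my own.

```python
from fractions import Fraction

# PMF from the example: probability 1/9 on each integer in [-4, 4].
pmf = {x: Fraction(1, 9) for x in range(-4, 5)}

mean = sum(x * p for x, p in pmf.items())
second_moment = sum(x ** 2 * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
print(mean, second_moment, variance)
```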
We can simplify the formula of variance.
$$\mathrm{Var}(x) = E[(x - E[x])^2] = E[x^2 - 2\,x\,E[x] + E[x]^2] = E[x^2] - E[x]^2$$
Properties of expectation and variance :
• Y = aX + b
E[Y] = a E[X] + b
Var[Y] = Var[aX] = a² Var[X] (adding a constant doesn’t affect the spread of the data)
• E[X1 + X2] = E[X1] + E[X2]
Var[X1 + X2] = Var[X1] + Var[X2] (if X1 and X2 are independent)
e) Standard deviation : The square of the standard deviation is the variance, σ² = Var(X).
3.4.2) Discrete random variable : Besides the generic PMFs we have used so far, there are some standard
PMFs with their own names. Examples of such random variables : Bernoulli, Binomial, Poisson, Uniform. These are all
examples of PMFs, but they have different features.
a) Bernoulli Random Variable : For all its simplicity, the Bernoulli random variable is very
important. It takes on two values, 1 and 0. It takes on a 1 if an experiment with probability p
resulted in success and a 0 otherwise.
Example, $X = \begin{cases} 1 & \text{if a head} \\ 0 & \text{if a tail} \end{cases}$
$$P(X = x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases}$$
Find the expectation and variance.
Answer : E[X] = p and E[X²] = p, so Var[X] = E[X²] – E[X]² = p – p² = p(1-p)
Example, P(H) = p and X is the number of heads in n tosses. Then we can split these n tosses into k
successes and n-k failures. More generally, let P(success) = p and let X be a binomial random variable
that represents the number of successes in n trials.
$$\therefore P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$$
Now we find expectation and variance of binomial random variable.
We cannot easily find the expectation by the usual method; we have to use another intuition, because
$$E[X] = \sum_x x\,P(x) = \sum_x x \binom{n}{x} p^x (1 - p)^{n-x}$$
This is very difficult to evaluate directly, so we split our binomial random variable into many Bernoulli random
variables.
The binomial distribution has two unknowns, also called parameters, namely n and p. k is not a parameter: it is a
case type variable, where we put different values of k depending upon the question or data.
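Splitting X into n independent Bernoulli variables X1 + … + Xn gives E[X] = np and Var[X] = np(1 − p), using the properties of expectation and variance above. The sketch below compares this with the direct sum for one hypothetical choice of n and p; the function name `binomial_pmf` is my own.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, Fraction(3, 10)   # hypothetical parameters for the check

# Direct (hard) route: E[X] = sum over k of k * P(X = k).
mean_direct = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
# Bernoulli route: X = X1 + ... + Xn with E[Xi] = p, so E[X] = n * p.
mean_split = n * p
# Variance computed directly from the PMF, to compare with n * p * (1 - p).
var_direct = sum((k - mean_direct) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
print(mean_direct, mean_split, var_direct)
```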
Question :
1) Your probability class has 300 students and each student has probability 1/3 of getting an A,
independently of any other student. What is the mean of X, the number of students that get
an A?
2) If you want exactly k heads (successes), then what should the probability p be ?
Answer : Let P(H) = p, so $P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$. The question asks for the value of p for
which getting exactly k heads is most likely, which means we have to maximize this function (P(X = k)) with respect
to p. Let P(X=k) = f(p). We take log on both sides,
$$\log(f(p)) = \log\binom{n}{k} + k \log(p) + (n - k) \log(1 - p)$$
We differentiate with respect to p and set the derivative to zero: k/p − (n − k)/(1 − p) = 0. Maximizing log x or x gives the same maximizer, but here we use the
log strategy because it simplifies our problem. You will get p = k/n as the answer.
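The result p = k/n can be checked numerically with a simple grid search over p. The values n = 20 and k = 7 are hypothetical, and the grid search stands in for the calculus; it is not the method used in the notes.

```python
from math import comb, log

def log_likelihood(p, n, k):
    """log P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)

n, k = 20, 7   # hypothetical: 7 heads observed in 20 tosses

# Search a fine grid of candidate p values strictly between 0 and 1.
grid = [i / 1000 for i in range(1, 1000)]
best_p = max(grid, key=lambda p: log_likelihood(p, n, k))
print(best_p, k / n)
```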
Now, one thing to note: if your probability of success is high, then the graph of probability vs number of
successes shifts more to the right, because there is a high chance that you get more successes in the n
trials.
Example of such a graph would be n = number of GATE aspirants and p = probability of getting a good rank (small).
c) Poisson distribution : Previously we saw cases where n is very large and p is very small. The
Poisson distribution is the limit of the binomial distribution where n tends to infinity and p tends to zero (with np = λ held fixed). This is a very
common case, and that is why it is reasonable to create another type of distribution for it.
An RV X has the Poisson distribution with parameter λ, where λ > 0, if the PMF of X is
$$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, \ldots$$
Note that here X can be seen as representing the number of events occurring in a fixed interval where
events occur randomly throughout the interval
Question :
1) On average there are 2 accidents per day. What is the probability of 4 accidents on a
given day ?
Answer : Here, X = number of events in a certain duration. An accident is just an event, so this is the
very general case written above. The average is nothing but the mean or expectation, which is given here as
2, which implies that λ = 2, because expectation = λ in the Poisson distribution.
$$\therefore P(X = 4) = \frac{e^{-2}\, 2^4}{4!} \approx 0.090$$
2) How do I identify a Poisson RV i.e. X ?
Answer : Some examples of random variables that generally obey the Poisson probability law :
$$\therefore P(X = 4) = \frac{e^{-4}\, 4^4}{4!} \approx 0.195$$
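Both Poisson values above (λ = 2 and λ = 4, each with k = 4) can be checked with a few lines of Python. The function name `poisson_pmf` is my own.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), lam > 0."""
    return exp(-lam) * lam ** k / factorial(k)

p_lambda_2 = poisson_pmf(4, 2)   # 2 accidents per day on average
p_lambda_4 = poisson_pmf(4, 4)   # same formula with lambda = 4
print(round(p_lambda_2, 4), round(p_lambda_4, 4))
```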
d) Uniform distribution : The uniform distribution exists for both discrete and continuous
random variables. In the discrete case we say discrete uniform random variable, and
in the continuous case, continuous uniform random variable.
A random variable X has a discrete uniform distribution if each of the n values in its range, say x1, x2,
x3, …, xn, has equal probability. Then $P(x_i) = \frac{1}{n}$, where P(x) represents the probability mass function
(PMF).
Example, if X is uniformly distributed on the set {1, 2, 3, …, N} then
$$P(x) = \frac{1}{N}, \quad E[X] = \frac{N + 1}{2}, \quad \mathrm{Var}[X] = \frac{N^2 - 1}{12}$$
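The mean and variance of a discrete uniform variable on {1, …, N} can be verified from the PMF. N = 6 (a fair die) is just one illustrative choice, and the helper name is my own.

```python
from fractions import Fraction

def uniform_mean_var(N):
    """Mean and variance of X uniform on {1, ..., N}, computed from the PMF."""
    p = Fraction(1, N)
    mean = sum(x * p for x in range(1, N + 1))
    var = sum((x - mean) ** 2 * p for x in range(1, N + 1))
    return mean, var

N = 6  # e.g. a fair die
mean, var = uniform_mean_var(N)
print(mean, var)
```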
3.4.3) Continuous random variable : For a discrete random variable we had a probability mass
function, but for a continuous random variable we have a probability density function, because of the
continuous nature of the random variable.
Probability density function : A random variable has a PDF f(x) if $P(a \le X \le b) = \int_a^b f(x)\,dx$ for all
a, b, and for a valid PDF : $P(X \le \infty) = \int_{-\infty}^{+\infty} f(x)\,dx = 1$
f(x) is the likelihood (density) at x, not a probability. Using f(x) we can compare the probabilities of two events: for example, it can tell
you that your random variable X is more likely to lie between 3 and 6 than between 6 and 7.
In other words, it tells us which range is comparatively more likely.
What is the intuition behind $P(a \le X \le a) = \int_a^a f(x)\,dx = 0$ ? – This is called a point probability, and
it is zero because there are infinitely many values in any range and the question asks about one specific
value, which means a 1/infinity chance, meaning zero.
Answer : First we integrate and equate to 1 to get c, then we apply the limits and get the final answer.
a) Uniform distribution :
$$f(x) = \begin{cases} c = \frac{1}{b - a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$
Probability ∝ length of interval.
b) Normal distribution : Also known as the Gaussian distribution. It arises naturally in physical
phenomena, for example in population, decay, etc. It is so natural and normal that it is called
the normal distribution. It is a symmetrical distribution (around the mean).
There is a type of normal distribution called the standard normal distribution, where the mean is 0 and the
standard deviation is 1.
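A continuous random variable X whose PDF is
$$f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$$
is said to be an exponential random variable. E[X] = 1/λ, Var[X] = 1/λ²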
3.4.4) Mean, Mode and Median : Mean is the average of the numbers. Mode is the most frequent
number, and Median is the middle number in the sorted sequence of numbers. Why are they required ? – When
probabilities are given we use a random variable, and when raw data is given we use these 3 Ms. The median
shows the center of all the data on the number line.
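For raw data, Python's standard `statistics` module computes all three. The data list below is hypothetical.

```python
from statistics import mean, median, mode

data = [2, 3, 3, 5, 7, 10]   # hypothetical raw data

print(mean(data), median(data), mode(data))
```

With an even number of values, `median` returns the average of the two middle values, here (3 + 5) / 2.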