EMQC_Notes

Uploaded by

Dhananjay Chaurasia


Engineering Mathematics

1. Linear Algebra

1.1) Why Study Linear Algebra :

Using linear algebra, we can convert raw data into a form that is useful and easily interpretable by a machine. Almost every type of data can be represented as vectors or matrices, which is why linear algebra is often called the fuel of machine learning.
Word Embedding : Representing a word as a vector.
1.2) Basic Terminology :

Scalar : Any number.

Vector : A quantity that has both direction and magnitude.
Linear combination of vectors : A sum of scalar multiples of vectors, k1u1 + k2u2 + … + knun.

Linearly dependent vectors : A finite set of vectors {u1, u2, …, un} of a vector space is said to be linearly dependent (LD) if there exist scalars k1, k2, k3, …, kn, not all zero, such that k1u1 + k2u2 + … + knun = O (zero vector).
Remember that some of the scalars may be zero, but at least one of them must be nonzero.
Linearly independent vectors : A set in which none of the vectors can be written as a linear combination of the others. Equivalently, a set of vectors is linearly independent if the only linear combination of the vectors that equals 0 is the trivial linear combination (i.e. all coefficients = 0).

Vector Space : A collection of vectors that is closed under addition and scalar multiplication (so it always contains the zero vector). Examples : R^2, R^3, R^4.
The row space of an m×n matrix A is the subspace of R^n spanned by the rows of A. The column space of A is the subspace of R^m spanned by the columns of A.
Filling the space (Span) : Any two linearly independent vectors of R^2 can fill R^2, because we can form any vector of R^2 as a linear combination of them. This is the same as solving two equations in two variables. In general, “Any n linearly independent vectors of R^n can fill the R^n space.”

Questions :
1) If a set of vectors is LD, can every vector of the set be written as a linear combination of the other vectors of the set ?
Answer : Surprisingly the answer is NO. Consider the sets {0, v} and {(1, 0, 0), (0, 1, 0), (0, 0, 0)}. In the first set we cannot write v as a multiple of the vector 0 when v ≠ 0. In the second set we cannot write (1, 0, 0) or (0, 1, 0) as a linear combination of the other two vectors. In both sets the only vector that depends on the others is the zero vector. So linear dependence only guarantees that at least one vector of the set is a linear combination of the others, not that every vector is.
2) Can we have more than 2 independent vectors in R^2 ? The answer is NO. If we have two independent vectors, then we can construct any vector in R^2 from them. So any vector we add to this set is a linear combination of those two, and the enlarged set is dependent. Extending this idea, we cannot have more than n independent vectors in R^n. In other words, “If a subset of R^n contains more than n vectors, then the subset is linearly dependent.”
3) What if we have 3 or fewer vectors in R^3 ? Are they independent or dependent ? For 2 vectors in R^2 it is easy : we just compare the ratios of their components. That shortcut does not work in this case, so we need another method, which we will learn later.

Note :
• A single element set {v} is linearly independent if and only if v ≠ 0.
• Another meaning of Ax = b : the columns of A are a group of vectors, and Ax is a linear combination of those columns with the entries of x as coefficients. So Ax = b has a solution exactly when b is a linear combination of the columns of A. In that case the set made of the columns of A together with b is linearly dependent, even if the columns of A themselves are linearly independent.
• b is in the column space of A means b can be represented as a linear combination of the columns of A.

1.3) System of linear equations :

x + y = 2 → Linear
2x - 3y = 1.5 → Linear
x^2 + y = 5 → Non-linear
x - xy = 4 → Non-linear
A group of such linear equations is known as a system of linear equations. We can represent such a system in the form Ax = b.
• Here A is called the coefficient matrix, and if you append b to it as an extra column, it is called the augmented matrix.
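x1 - 2x2 = -1
-x1 + 3x2 = 3
This system of linear equations can be represented in two formats :
$$\begin{bmatrix} 1 & -2 \\ -1 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -1 \\ 3 \end{bmatrix} \qquad\qquad x_1 \begin{bmatrix} 1 \\ -1 \end{bmatrix} + x_2 \begin{bmatrix} -2 \\ 3 \end{bmatrix} = \begin{bmatrix} -1 \\ 3 \end{bmatrix}$$
This system of linear equations can have
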
Case 1 : Unique solution

x1 - 2x2 = -1
-x1 + 3x2 = 3
x1 = 3, x2 = 2, i.e. there is a unique solution when the columns of A are linearly independent. Geometrically, the two lines intersect at exactly one point.
Case 2 : No solution
x1 - 2x2 = -1
-x1 + 2x2 = 3
i.e. there is no solution when the columns of A are linearly dependent and b is not a linear combination of them. Geometrically, the two lines are parallel and never intersect.
Case 3 : Infinite solutions
x1 - 2x2 = -1
-x1 + 2x2 = 1
i.e. there are infinitely many solutions when the columns of A are linearly dependent and b is a linear combination of them. Geometrically, the two lines are the same line.

Questions :
1) If Ax = 0 has some solution, what can you say about the linear dependency of the columns of A ?
Answer : If the only solution is the trivial one (x = 0), then the columns of A are independent. If there is a nontrivial solution, then the columns are linearly dependent, because a nontrivial solution gives x1a1 + x2a2 + … + xnan = 0 with not all xi zero (ai being the columns of A), which is exactly the dependency condition.
2) Suppose a matrix A of size 3 x 4 contains 3 linearly independent columns. What can you say about the solutions to Ax = b ? (here b ≠ 0)
Answer : The size is 3 x 4, so the columns are 4 vectors of R^3. Three of them are linearly independent, so they already fill R^3, and the remaining column is redundant because it can be built from the other three. So every b can be created from the columns of A and we always have a solution.
3) Suppose a matrix A of size 5 x 4 contains 3 linearly independent columns. What can you say about the solutions to Ax = b ? (here b ≠ 0)
Answer : The size is 5 x 4, so the columns are 4 vectors of R^5, of which 3 are linearly independent. Even if all 4 were independent, 4 vectors cannot fill R^5. So there may or may not be a solution, and the answer depends on b : if b is a linear combination of the columns of A then a solution exists, otherwise there is no solution.
So, in conclusion we can say
1. If the columns can fill the space (or, more generally, if b is a linear combination of the columns of A) → always a solution, because we can create b out of the columns of A.
2. If the columns cannot fill the space → nothing can be inferred without looking at b.
4) Consider a matrix A with dimension m x n. For a system Ax = b, what can you say about the statements below ? (note that b may or may not be zero)
Statement : If m < n then Ax = b always has a solution. (False)
Answer : Nothing is given about the columns of A, i.e. whether m of them are linearly independent. If they are not, some b cannot be reached.
Statement : If m > n then no system Ax = b has a solution. (False)
Answer : b can still be a linear combination of the columns of A, in which case we have a solution.

In general, we can say that,

1) Solutions of Ax = b : does A (m x n) have m linearly independent columns ?
   Yes → There is always a solution.
   No → There may or may not be a solution.

2) Solutions of Ax = b : is b a linear combination of the columns of A ?
   Yes → There is always a solution (unique or infinite).
         Are the columns of A LI ?
         Yes → Unique solution
         No → Infinite solutions
   No → There is never a solution (no solution).

1.4) Matrix multiply by matrix :

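Multiplying a matrix by a vector is nothing but applying a linear transformation to that vector. Similarly, multiplying one matrix by a second matrix is nothing but applying the linear transformation of the first matrix to every column of the second matrix. Remember that the columns of a matrix are just vectors.
$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a + 2c & b + 2d \\ 3a + 4c & 3b + 4d \end{bmatrix} = \left[\; a \begin{bmatrix} 1 \\ 3 \end{bmatrix} + c \begin{bmatrix} 2 \\ 4 \end{bmatrix} \quad\; b \begin{bmatrix} 1 \\ 3 \end{bmatrix} + d \begin{bmatrix} 2 \\ 4 \end{bmatrix} \;\right]$$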
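
The column-by-column view of the product can be checked with a small sketch. The helper names `matvec` and `matmul` and the matrix B are illustrative assumptions, not part of the notes; only the standard library is used.

```python
def matvec(A, v):
    """Apply the linear transformation A (a list of rows) to the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    """Multiply A by B by transforming every column of B with A."""
    columns_of_B = list(zip(*B))                     # columns of B as vectors
    transformed = [matvec(A, col) for col in columns_of_B]
    return [list(row) for row in zip(*transformed)]  # turn columns back into rows

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]   # hypothetical second matrix
product = matmul(A, B)
print(product)  # [[19, 22], [43, 50]]
```

Each column of the result is A applied to the matching column of B, which is the same statement as the formula above with a = 5, b = 6, c = 7, d = 8.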

Questions :
Answer : The condition $\sum_{i=1}^{n} c_i a_i = 0$ (assuming, as the dependency argument requires, that the c_i are not all zero) means that the columns a_i of A are linearly dependent, since at least one a_i can be written as a combination of the other columns. It is also given that $\sum_{i=1}^{n} a_i = b$, so b is a linear combination of the columns of A and x = (1, 1, …, 1) is a solution. Because the columns are dependent, we can add any multiple of (c_1, …, c_n) to this solution and still get b. So, there are infinitely many solutions.

1.5) Gaussian Elimination : An algorithm for solving systems of linear equations.

Gaussian elimination is nothing but converting the original matrix to its echelon form.
1.5.1) Echelon form of matrix (or Row echelon form) : (This form is also used to calculate the rank of a matrix)

• All nonzero rows are above any rows of all zeros.
• All entries in a column below a leading entry are zero.
• The leading entry of any row occurs to the right of the leading entry of the row above it. (For example, the leading entry of the second row must sit in a column to the right of the column that holds the leading entry of the first row.)

Meaning of third point : box represents some non-zero value and * represents any value (including zero)

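Example of echelon form :
$$\begin{pmatrix} 5 & 1 & -6 & 1 \\ 0 & 1 & -2 & 0 \\ 0 & 0 & 0 & -6 \end{pmatrix}, \quad \begin{pmatrix} 0 & -3 & 3 & 4 & 5 & -2 \\ 0 & 0 & 0 & 1 & 7 & 3 \\ 0 & 0 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$
Examples which are not in echelon form :
$$\begin{bmatrix} 1 & 0 & 5 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & -1 & 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & 3 & 2 \\ 0 & 0 & 1 & 5 \\ 0 & 1 & 0 & 3 \end{bmatrix}$$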

1.5.2) Pivot and free variables :

In row echelon form, a variable whose column contains a leading nonzero entry is called a pivot or basic variable. Otherwise, the variable is known as a free variable. The number of free variables is known as the NULLITY.

Questions :
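1) Which are basic variables (pivot) and which are free variables in the given augmented matrices ?
$$\begin{bmatrix} 1 & 4 & 3 & 0 & 0 \\ 0 & 0 & 1 & -3 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & -3 & 0 & -1 & -2 & 0 \\ 0 & 0 & 1 & 0 & 3 & 3 \\ 0 & 0 & 0 & 1 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & 2 & 0 & 2 \\ 0 & 0 & 1 & 5 \\ 0 & 1 & 0 & 3 \end{bmatrix}$$
Answer :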
First, variables here mean columns, and we ignore the last column because these are augmented matrices (read the question carefully) : the last column represents “b”. Now, we take the matrices one by one.
Matrix 1 : basic variables (columns) : 1, 3 ; free variables : 2, 4.
Matrix 2 : basic variables : 1, 2 ; free variables : none.
Matrix 3 : basic variables : 1, 3, 4 ; free variables : 2, 5.
Matrix 4 : basic variables : 1, 2 ; free variables : 3.
From this example we can conclude that a free column is always linearly dependent on the pivot columns.

1.5.3) Elementary row operations :

We perform these operations to convert any matrix to its row echelon form.
There are three types of elementary row operations which may be performed on the rows of a matrix :
1) Swap the positions of two rows.
2) Multiply a row by a non-zero scalar.
3) Add to one row a scalar multiple of another. (We can do this because if u = v and x = y then u + x = v + y. Intuitively, replacing one equation by its sum with a multiple of another equation does not change the set of solutions.)

1.5.4) Row reduced echelon form :
Row reduced echelon form = row echelon form in which all pivots are 1 and all elements both below and above each pivot are zero. In row echelon form we only require the elements below a pivot to be zero ; row reduced echelon form adds the condition that the elements above the pivots are also zero.

1.5.5) Rank of matrix :

The rank of a matrix can be described in several equivalent ways :
• Number of linearly independent rows
• Number of linearly independent columns
• Number of pivot elements in the echelon form of the matrix
• Number of non-zero rows in the echelon form of the matrix

Procedure for finding the rank of a matrix :

1) Convert the matrix to row echelon form using elementary row operations.
2) Then count the pivot elements of the echelon form.

NOTE :
1) Rank is zero only for the zero matrix.
2) Number of variables = Number of columns of coefficient matrix = Number of pivot columns + Number of free variables
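
The rank procedure above can be sketched in a few lines. This is a minimal illustration using exact fractions from the standard library; the function names `row_echelon` and `rank` and the example matrix M are assumptions made for the example, not part of the notes.

```python
from fractions import Fraction

def row_echelon(M):
    """Return (echelon form, pivot column indices) using elementary row operations."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(A), len(A[0])
    pivots, r = [], 0
    for c in range(cols):
        # look for a row at or below r with a nonzero entry in column c
        p = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if p is None:
            continue                      # no pivot here: column c is a free column
        A[r], A[p] = A[p], A[r]           # operation 1: swap two rows
        for i in range(r + 1, rows):      # operation 3: add a multiple of the pivot row
            factor = A[i][c] / A[r][c]
            A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return A, pivots

def rank(M):
    """Rank = number of pivots in the echelon form."""
    return len(row_echelon(M)[1])

M = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]    # hypothetical example: row 2 = 2 x row 1
E, pivot_cols = row_echelon(M)
print(rank(M), pivot_cols)               # 2 [0, 1]
print(len(M[0]) - rank(M))               # number of free variables: 1
```

The same check answers the earlier question about independence : a set of vectors placed as the rows (or columns) of a matrix is linearly independent exactly when the rank equals the number of vectors.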

1.6) Solving System of linear equations :

Types of system of linear equations :
• Homogeneous (Ax = 0)
• Non-homogeneous (Ax = b, b ≠ 0)

Question :
1) Solve the following system of equations using Gaussian reduction.
𝑥1 + 𝑥2 − 𝑥3 = 0
𝑥1 − 𝑥2 + 𝑥3 = 2
2𝑥1 − 𝑥2 − 𝑥3 = −3
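Answer : Augmented matrix is
$$\left[\begin{array}{ccc|c} 1 & 1 & -1 & 0 \\ 1 & -1 & 1 & 2 \\ 2 & -1 & -1 & -3 \end{array}\right]$$
Converting it to row echelon form,
$$\left[\begin{array}{ccc|c} 1 & 1 & -1 & 0 \\ 0 & -1 & 1 & 1 \\ 0 & 0 & 1 & 3 \end{array}\right]$$
∴ x3 = 3, x2 = 2, x1 = 1 (by back substitution)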
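
The elimination and back substitution used in this answer can be sketched as follows. The function `solve` is an illustrative name, and the sketch assumes a square coefficient matrix with a unique solution ; it is not a general-purpose solver.

```python
from fractions import Fraction

def solve(A, b):
    """Solve Ax = b by Gaussian elimination (assumes A is square with a unique solution)."""
    n = len(A)
    # build the augmented matrix [A | b] with exact fractions
    M = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(A, b)]
    for c in range(n):
        # pick a row with a nonzero entry in column c (fails if A is singular)
        p = next(i for i in range(c, n) if M[i][c] != 0)
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            M[i] = [a - f * t for a, t in zip(M[i], M[c])]
    # back substitution, starting from the last row
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x

A = [[1, 1, -1], [1, -1, 1], [2, -1, -1]]
b = [0, 2, -3]
x = solve(A, b)
print([int(v) for v in x])  # [1, 2, 3]
```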
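2) Solve the following system of equations.
5x1 - 11x2 = -2
-4x1 + 9x2 = 1
x1 - 2x2 = -1

Answer : Augmented matrix is
$$\left[\begin{array}{cc|c} 5 & -11 & -2 \\ -4 & 9 & 1 \\ 1 & -2 & -1 \end{array}\right]$$
Converting it to row echelon form (after moving the third row to the top),
$$\left[\begin{array}{cc|c} 1 & -2 & -1 \\ 0 & 1 & -3 \\ 0 & 0 & 0 \end{array}\right]$$
∴ x2 = -3, x1 = -7. This is a beautiful example : if you encounter a row 0 0 | 0, it does not always mean that there are infinitely many solutions. If the number of pivot elements = n (the number of variables), then the solution is unique regardless of any 0 0 … | 0 rows.
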

3) Suppose that converting an augmented matrix to row echelon form gives the following :
$$\left[\begin{array}{ccc|c} 3 & 5 & -4 & 0 \\ 0 & -3 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]$$
There are only two pivots for three variables, so the third column has no pivot. So, what to do now ?
Answer : The final row says 0 = 0 (which is always true), so it puts no restriction on the variables.
Very first step : Identify the free variables and assign a constant parameter to each of them. Clearly the third column (variable x3) is the free variable. If there is more than one free variable, assign a separate parameter to each.
∴ x3 = k and, from the second row, x2 = 0
3x1 + 5x2 - 4x3 = 0
3x1 - 4k = 0, x1 = 4k/3
$$x = \begin{bmatrix} 4k/3 \\ 0 \\ k \end{bmatrix}$$
So, how many solutions are possible = infinite
How many linearly independent solutions are possible = 1
Here the number of free variables is 1, which is why there is one independent solution, i.e. the nullity is 1. It also means the determinant of the coefficient matrix is zero (for a square matrix, the presence of a free variable implies a zero determinant).
Here b = 0. Now consider b ≠ 0, i.e. a non-homogeneous system of equations. If that system is consistent, each of its solutions is a solution of Ax = 0 plus one fixed particular solution of Ax = b. Geometrically, the solution set of Ax = b is the solution set of Ax = 0 shifted away from the origin.

If the system is consistent and there are free variables, then there are infinitely many solutions.

4) Suppose that converting an augmented matrix to row echelon form gives the following :
$$\left[\begin{array}{ccc|c} 1 & -3 & 1 & 4 \\ -1 & 2 & -5 & 3 \\ 5 & -13 & 13 & 8 \end{array}\right] \rightarrow \left[\begin{array}{ccc|c} 1 & -3 & 1 & 4 \\ 0 & -1 & -4 & 7 \\ 0 & 0 & 0 & 2 \end{array}\right]$$
Answer : Look carefully at the last row : it says 0 = 2, which is false. So there is no point in solving these equations simultaneously, because the system contradicts itself after Gaussian elimination. So, there is no solution.

5) Why does a solution sometimes exist and sometimes not ?

Answer : In the 3rd question the third row of the coefficient part is all zeros. What does that mean ? It means the transformation has no stretch along the third axis : the column space shrinks from R^3 to a plane (a copy of R^2), i.e. the rank decreases. The target b = 0 lies in that plane, so solutions exist, and the free variable makes them infinite.
In the 4th question we also have a zero row in the coefficient part, but b now demands that the third component lands on 2. A transformation with zero stretch along the third axis cannot land on 2, so no solution exists.
Whenever the required vector b lies inside the reduced column space, the solutions are infinite.
The catch is that a homogeneous system never has a nonzero b, so a homogeneous system of equations always has a solution (trivial or non-trivial).

6) Find numbers a, b, c and d such that the linear system corresponding to the augmented matrix
$$\left[\begin{array}{ccc|c} 1 & 2 & 3 & a \\ 0 & 4 & 5 & b \\ 0 & 0 & d & c \end{array}\right]$$
has a) no solution, and b) infinitely many solutions.
Answer : a) For no solution, d = 0 and c ≠ 0 ; a and b could be anything.
b) For infinitely many solutions, d = 0 and c = 0 ; a and b could be anything.

7) If Rank(A) ≠ Rank(A|b), then the system has ? And the same question with Rank(A) = Rank(A|b) : the system has ?
• No solution
• Unique solution
• Infinite solution
Answer : Consider the first case, when b is a linear combination of the columns of A. In that case Rank(A) = Rank(A|b), and because we can create b from A we always have a solution. Remember that “a solution exists” means there can be a unique solution or infinitely many.
Now consider the second case, when b is not a linear combination of the columns of A. In that case Rank(A) ≠ Rank(A|b), and because we cannot produce b from A a solution does not exist. Appending b adds a new dimension to the column space of A, so the rank increases by one : Rank(A|b) = 1 + Rank(A) if b is linearly independent of the columns of A.

Now, let’s look at this from another point of view. In the first case of question 6, c ≠ 0 and d = 0, so we cannot create the nonzero c from the columns of A. That means b is linearly independent of the columns of A, so Rank(A) + 1 = Rank(A|b) and there is no solution. In the second case, if Rank(A) = Rank(A|b) = n (the number of variables), then there is a unique solution. And if Rank(A) = Rank(A|b) < n, then we have free variables, so there are infinitely many solutions.
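
The rank test can be turned into a short classifier. This is a sketch under the assumption that ranks are computed by forward elimination with exact fractions ; `rank` and `classify` are illustrative names, and the three sample systems are the three cases from section 1.3.

```python
from fractions import Fraction

def rank(M):
    """Rank = number of pivots found by forward elimination."""
    A = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(A[0])):
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column
        A[r], A[p] = A[p], A[r]
        for i in range(r + 1, len(A)):
            f = A[i][c] / A[r][c]
            A[i] = [a - f * t for a, t in zip(A[i], A[r])]
        r += 1
    return r

def classify(A, b):
    """Compare Rank(A), Rank(A|b) and n (the number of unknowns)."""
    n = len(A[0])
    augmented = [row + [y] for row, y in zip(A, b)]
    rank_A, rank_Ab = rank(A), rank(augmented)
    if rank_A != rank_Ab:
        return "no solution"
    return "unique solution" if rank_A == n else "infinitely many solutions"

print(classify([[1, -2], [-1, 3]], [-1, 3]))   # unique solution
print(classify([[1, -2], [-1, 2]], [-1, 3]))   # no solution
print(classify([[1, -2], [-1, 2]], [-1, 1]))   # infinitely many solutions
```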

8) Steps to check whether one vector is a linear combination of a set of vectors.

Answer : We know that if a solution of Ax = b exists, then b is a linear combination of the columns of A. That is exactly our question, so we can use the structure Ax = b to find out whether one vector is a linear combination of other vectors. Write the information in the form [A|b], where the columns of A are the given set of vectors and b is the vector to be tested. Now convert [A|b] into row echelon form. If it contains a row of the form 0 … 0 | c with c ≠ 0, then b is not a linear combination of the set. Otherwise it is : if there are free variables, there are infinitely many combinations that give b, and if there are no free columns, there is exactly one combination of the vectors in A that gives b.

1.7) Determinant, inverses, eigenvalues and eigenvectors :

In this section we talk only about square matrices. Let’s start.
1.7.1) Determinant : Its absolute value represents area in R^2 and volume in R^3.
Some properties of determinant :
1. The determinant of the identity matrix is 1.
2. The determinant changes sign when two rows (or two columns) are exchanged.
3. Linearity for one row (or one column) at a time.
Example :
$$\begin{vmatrix} ta & tb \\ c & d \end{vmatrix} = t \begin{vmatrix} a & b \\ c & d \end{vmatrix} \quad \text{and} \quad \begin{vmatrix} a + a' & b + b' \\ c & d \end{vmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} + \begin{vmatrix} a' & b' \\ c & d \end{vmatrix}$$
See this carefully :
$$\begin{vmatrix} a + a' & b + b' \\ c & d \end{vmatrix} = \begin{vmatrix} a' & b \\ c & d \end{vmatrix} + \begin{vmatrix} a & b' \\ c & d \end{vmatrix}$$
So the terms can also be interchanged across the addition sign.
One more observation : for a matrix we have
$$2 \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} 2a & 2b \\ 2c & 2d \end{bmatrix}$$
but in a determinant a scalar factor comes out of only one row (or column) at a time.

• If any row (or column) is a linear combination of the other rows (or columns), then the determinant is zero. This property can be obtained from the first 3 properties.
• Det(AB) = Det(A) x Det(B), provided A and B are square matrices of the same order.

Questions :
1) We can calculate the determinant of any matrix using only the first 3 properties. But how ?
Answer : We can convert any square matrix into row echelon form, which is an upper triangular matrix. Adding a multiple of one row to another does not change the determinant (by linearity, together with the fact that a determinant with a repeated row is zero), and each row swap changes only its sign. For the triangular matrix we split each row by linearity (3rd property) until only one entry per row is left, and the only surviving term is the product of the diagonal entries.

2) If we instead split every row of an n x n determinant by linearity, without any elimination, we end up with many determinants whose values add up to the original determinant. In total there are n^n such determinants, of which only n! terms survive ; the others die (evaluate to zero).

The cofactor formula :

The determinant is the dot product of any row (or column) i of A with its cofactors :
Det(A) = a11C11 + a12C12 + ⋯ + a1nC1n
And yes, we have been calculating determinants using this formula since school.
Cofactor : Cij = (-1)^(i+j) Minor_ij

NOTE :
1) The cofactor formula also gives a useful property : when we multiply the elements of a row by the cofactors of the same row and add them, we get the determinant of the matrix.
Ex. Det(A) = a11C11 + a12C12 + ⋯ + a1nC1n. These notes refer to this as the big formula.
2) Whereas, when we multiply the elements of a row by the cofactors of a different row and add them, we get zero.
Ex. a11C21 + a12C22 + ⋯ + a1nC2n = 0
3) A matrix is singular if its determinant is zero.
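
Both observations in the NOTE can be checked numerically with a short recursive sketch. The helper names `minor`, `cofactor` and `det` are illustrative, indices are 0-based in the code, and the 3 x 3 matrix is the coefficient matrix from Question 1 of section 1.6.

```python
def minor(A, i, j):
    """The matrix A with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def cofactor(A, i, j):
    """C_ij = (-1)^(i+j) times the determinant of the minor."""
    return (-1) ** (i + j) * det(minor(A, i, j))

def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum(A[0][j] * cofactor(A, 0, j) for j in range(len(A)))

A = [[1, 1, -1], [1, -1, 1], [2, -1, -1]]
same_row = sum(A[0][j] * cofactor(A, 0, j) for j in range(3))   # row 1 with its own cofactors
other_row = sum(A[0][j] * cofactor(A, 1, j) for j in range(3))  # row 1 with cofactors of row 2
print(same_row, other_row)  # 4 0
```

Cofactor expansion costs on the order of n! operations, so for larger matrices elimination is the practical way to compute a determinant.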

1.7.2) Inverse of matrix :

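The inverse of a matrix A is the matrix which, when multiplied with A, gives the identity matrix. It is denoted as A^-1.
Proof of inverse formula :
Consider
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \quad \text{and} \quad \text{Transpose}(\text{Cofactor}(A)) = \begin{bmatrix} C_{11} & C_{21} & C_{31} \\ C_{12} & C_{22} & C_{32} \\ C_{13} & C_{23} & C_{33} \end{bmatrix}$$
Look closely at the second matrix. We call Transpose(Cofactor(A)) the adjoint of A, Adj(A). So, the question is : what is A x Adj(A) ?

Using the observations from the NOTE (a row times its own cofactors gives |A|, a row times the cofactors of a different row gives 0), we get
$$A \times Adj(A) = \begin{bmatrix} |A| & 0 & 0 \\ 0 & |A| & 0 \\ 0 & 0 & |A| \end{bmatrix}$$
$$\therefore A \times Adj(A) = |A| \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
$$\therefore A \times \frac{Adj(A)}{|A|} = I$$
Which means $\frac{Adj(A)}{|A|} = A^{-1}$ (this needs |A| ≠ 0).

Inverse of 2x2 matrix : $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ has inverse $\frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$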
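
The 2x2 formula can be verified with a minimal sketch. The function names are illustrative assumptions, and the sample matrix is the coefficient matrix used in section 1.3 ; exact fractions avoid rounding.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Inverse of [[a, b], [c, d]] computed as Adj(A) / |A| (integer entries assumed)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular, no inverse exists")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def matmul(A, B):
    """Plain row-by-column matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, -2], [-1, 3]]
A_inv = inverse_2x2(A)
identity = matmul(A, A_inv)
print([[int(v) for v in row] for row in A_inv])     # [[3, 2], [1, 1]]
print([[int(v) for v in row] for row in identity])  # [[1, 0], [0, 1]]
```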
Questions :
1) Why does a rectangular matrix not have a determinant ?
Answer : The determinant gives us useful information about almost everything, from eigenvalues to the inverse. It measures how the transformation scales area or volume within a single space, so the input and output spaces must have the same dimension. A 2 x 3 matrix maps R^3 into R^2 and contributes nothing along a third axis, so any volume is squashed flat and a determinant would always be zero or not applicable. A 3 x 2 matrix maps R^2 onto a plane inside R^3, which again has no volume. A quantity that is always zero is useless, which is why we do not define the determinant (or the inverse) of a rectangular matrix.
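2) Determinant of inverse and adjoint of matrix.
Answer : Consider a 2 x 2 matrix A and its inverse $\frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$.
$$|A^{-1}| = \left(\frac{1}{ad - bc}\right)^2 \begin{vmatrix} d & -b \\ -c & a \end{vmatrix} = \frac{1}{ad - bc} = \frac{1}{|A|}$$
We know that $\frac{Adj(A)}{|A|} = A^{-1}$, so $Adj(A) = |A| \times A^{-1}$ and, for an n x n matrix,
$$|Adj(A)| = \left|\, |A| \times A^{-1} \right| = |A|^n \times |A^{-1}| = |A|^{n-1}$$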
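Cramer’s rule :

Proof : Consider Ax = b and let A^-1 exist (unique solution).

$$\therefore x = A^{-1} b = \frac{1}{|A|} Adj(A)\, b = \frac{1}{|A|} \begin{bmatrix} C_{11} & C_{21} & C_{31} \\ C_{12} & C_{22} & C_{32} \\ C_{13} & C_{23} & C_{33} \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$
$$\therefore x = \frac{1}{|A|} \begin{bmatrix} C_{11} b_1 + C_{21} b_2 + C_{31} b_3 \\ C_{12} b_1 + C_{22} b_2 + C_{32} b_3 \\ C_{13} b_1 + C_{23} b_2 + C_{33} b_3 \end{bmatrix}$$
Now, look closely at the big formula (the cofactor expansion). Each row of x has the same format : it is a cofactor expansion along one column of A in which the entries of that column have been replaced by b. The first, second and third rows are the determinants of the following matrices :
$$A_1 = \begin{bmatrix} b_1 & a_{12} & a_{13} \\ b_2 & a_{22} & a_{23} \\ b_3 & a_{32} & a_{33} \end{bmatrix}, \quad A_2 = \begin{bmatrix} a_{11} & b_1 & a_{13} \\ a_{21} & b_2 & a_{23} \\ a_{31} & b_3 & a_{33} \end{bmatrix}, \quad A_3 = \begin{bmatrix} a_{11} & a_{12} & b_1 \\ a_{21} & a_{22} & b_2 \\ a_{31} & a_{32} & b_3 \end{bmatrix}$$
$$\therefore x = \frac{1}{|A|} \begin{bmatrix} |A_1| \\ |A_2| \\ |A_3| \end{bmatrix}$$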
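
As a check, Cramer's rule can be applied to the 3 x 3 system solved earlier by Gaussian elimination (Question 1 of section 1.6). This is a sketch for the 3 x 3 case only ; `det3` and `cramer3` are illustrative names.

```python
from fractions import Fraction

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def cramer3(A, b):
    """Solve a 3x3 system Ax = b with Cramer's rule (requires det(A) != 0)."""
    d = det3(A)
    if d == 0:
        raise ValueError("det(A) = 0, Cramer's rule does not apply")
    x = []
    for k in range(3):
        # A_k is A with column k replaced by b
        A_k = [row[:k] + [b_i] + row[k + 1:] for row, b_i in zip(A, b)]
        x.append(Fraction(det3(A_k), d))
    return x

A = [[1, 1, -1], [1, -1, 1], [2, -1, -1]]
b = [0, 2, -3]
x = cramer3(A, b)
print([int(v) for v in x])  # [1, 2, 3]
```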
1.7.3) Eigen values and eigen vectors :

When you apply a linear transformation A to some nonzero vector v and get back the same vector scaled by some factor, that vector is known as an eigenvector and the factor is called the eigenvalue.
Av = λv such that λ ∈ ℝ, v ≠ 0, A is a square matrix

But why are they always in the limelight ? It turns out that several properties of matrices can be analyzed based on their eigenvalues. Concepts in machine learning, such as regularization, also make use of eigenvalues and eigenvectors.

Every eigenvector can be scaled by any nonzero number and remain an eigenvector, so a matrix with an eigenvector has infinitely many of them. The real question is : how many of them are LI ?
Calculating eigenvalues and eigenvectors :
Av = λv
∴ Av - λv = 0
∴ (A - λI)v = 0
But v ≠ 0, so A - λI must be singular, i.e. det(A - λI) = 0. A zero determinant means there are free variables and infinitely many solutions v.

Question :
1) Eigenvectors from different eigenvalues are linearly independent.
Answer : Let λ1, λ2 be the eigenvalues corresponding to eigenvectors e1 and e2 respectively. We have to prove that if λ1 ≠ λ2 then e1 and e2 are linearly independent.
Proof : For contradiction, assume e1 and e2 are linearly dependent, i.e. e1 = ke2 for some k ≠ 0.

Ae1 = λ1e1 = λ1ke2 and Ae1 = A(ke2) = λ2ke2
∴ λ1ke2 = λ2ke2
∴ λ1 = λ2 (since ke2 ≠ 0)
Taking the contrapositive : if λ1 ≠ λ2 then e1 is not a scalar multiple of e2, which means they are linearly independent.
2) If λ1 ≠ λ2 then the corresponding eigenvectors are linearly independent. And if λ1 = λ2, i.e. the eigenvalue is repeated, then we can get either one or two linearly independent eigenvectors. In general, if (λ − λ1)^m1 (λ − λ2)^m2 … (λ − λk)^mk is the characteristic polynomial (with λ1, …, λk distinct), then the matrix has at least k linearly independent eigenvectors and at most Σ mi (summed over i = 1 … k) linearly independent eigenvectors. Here mi is called the Algebraic multiplicity (AM) of λi, and the number of LI eigenvectors corresponding to λ = λi is called its Geometric multiplicity (GM). One thing to note is that AM ≥ GM, because the algebraic multiplicity is the maximum possible number of linearly independent eigenvectors for that eigenvalue.

3) If the number of free variables in (A − λ1I)x = 0 is 2, then what is the geometric multiplicity of λ1 ?
Answer : The number of free variables equals the number of LI eigenvectors, so GM = 2 and AM is at least 2. Why does the number of free variables equal the number of LI eigenvectors ? Because we assign a parameter (k, s, t, …) to each free variable, and each parameter contributes one independent eigenvector to the solution.

Now, we have seen that a factor (λ − λ1)^m1 gives at least one and at most m1 LI eigenvectors. But are there special matrices for which AM = GM always ? Let’s see.
In other words : are there n x n matrices A which always have n independent eigenvectors (even when one or more eigenvalues are repeated) ?
Yes ! Symmetric matrices.

Real symmetric matrices :

Properties : A^T = A, i.e. Aij = Aji. A real symmetric n x n matrix has n real eigenvalues and n orthogonal eigenvectors, and the eigenvectors are LI even if eigenvalues are repeated. How can a single eigenvalue have more than one LI eigenvector ? It is possible when, say, AM = 2. Then GM must be 2 as well (as the matrix is symmetric), so every eigenvector for that eigenvalue has the form v = ku1 + lu2, where k, l are constants (not both zero) and u1 and u2 are LI eigenvectors. If the n × n matrix A is symmetric, then eigenvectors corresponding to different eigenvalues must be orthogonal to each other. Furthermore, in this case there exist n linearly independent eigenvectors for A, so A is diagonalizable.

Some properties of Eigenvalues :

• The determinant of a matrix is the product of its eigenvalues (counted with multiplicity).
• The trace (sum of the elements on the main diagonal) of a matrix is the sum of its eigenvalues.
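
For a 2 x 2 matrix these two properties can be seen directly, because det(A − λI) = 0 expands to λ^2 − trace(A)·λ + det(A) = 0. The sketch below assumes real eigenvalues ; the function name and the sample symmetric matrix are illustrative.

```python
import math

def eigenvalues_2x2(A):
    """Real eigenvalues of [[a, b], [c, d]] from λ² - trace·λ + det = 0."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        raise ValueError("eigenvalues are not real")
    root = math.sqrt(disc)
    return (trace - root) / 2, (trace + root) / 2

A = [[2, 1], [1, 2]]          # hypothetical real symmetric matrix, so eigenvalues are real
l1, l2 = eigenvalues_2x2(A)
print(l1, l2)                 # 1.0 3.0
print(l1 + l2, l1 * l2)       # 4.0 3.0  (trace of A is 4, determinant of A is 3)
```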

1.8) Some important properties of Eigen values and matrix :

1.8.1) Rank and eigen values :
Is there any relationship between the rank and the eigenvalues of an n x n matrix A ? If every eigenvalue of A is nonzero then rank = n, and if some eigenvalue of A is 0 then rank < n.
This is because if an eigenvalue is 0 then the determinant (the product of the eigenvalues) is zero. And the determinant only becomes zero when the columns are dependent, which means the row space (and the column space) has shrunk.

Question :
1) Can Ax = 0 have a unique non-trivial solution ?
Answer : x = 0 is the trivial solution, and it always exists for a homogeneous system. If a non-trivial solution x exists, then the columns of A are linearly dependent, and every multiple kx is also a solution because A(kx) = k(Ax) = 0. So non-trivial solutions always come in infinitely many, and a unique non-trivial solution cannot exist.
One interesting fact is that Ax = 0 is nothing but the eigenvector equation (A − λI)x = 0 for λ = 0.
Read question no. 3 of the previous section carefully : the number of LI eigenvectors for an eigenvalue equals the number of free variables, which for λ = 0 is nothing but the nullity. So, another definition of NULLITY = number of LI eigenvectors corresponding to λ = 0.

If you do not understand the previous part, revise from the start.

2) Let A be a 5 x 5 matrix and one of the eigenvalues of A is 0. It is also known that there are 4 linearly independent eigenvectors corresponding to the eigenvalue 0. What is the rank of A ?
Answer : One of the eigenvalues of A is 0, so rank < 5. The 4 LI eigenvectors corresponding to 0 give the nullity (number of free variables), which is 4. So rank = 5 − 4 = 1.

3) Let A be a 15 x 15 matrix and one of its eigenvalues is zero. What is the rank of A ?
Answer : One of its eigenvalues is zero, which means rank < 15. But no other information is given, so the rank cannot be determined.

4) Let A be a 15 x 15 matrix and one of its eigenvalues is zero. It is also known that there are 10 linearly independent eigenvectors of A. What is the rank of A ?
Answer : We are given 10 linearly independent eigenvectors of A, but they may or may not correspond to the eigenvalue zero. So the rank cannot be determined.

1.8.2) Cayley-Hamilton theorem : It says every square matrix satisfies its own characteristic equation.
A related fact, which follows directly from Av = λv, is that if λ is an eigenvalue of A then λ^n is an eigenvalue of A^n.
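
For a 2 x 2 matrix the characteristic equation is λ^2 − trace(A)·λ + det(A) = 0, so the theorem says A^2 − trace(A)·A + det(A)·I is the zero matrix. A minimal check, with illustrative function names and the coefficient matrix from section 1.3 as the sample :

```python
def matmul(A, B):
    """Plain row-by-column matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]

def cayley_hamilton_residual(A):
    """Evaluate p(A) = A² - trace(A)·A + det(A)·I for a 2x2 matrix A."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    A2 = matmul(A, A)
    I = [[1, 0], [0, 1]]
    return [[A2[i][j] - trace * A[i][j] + det * I[i][j] for j in range(2)]
            for i in range(2)]

A = [[1, -2], [-1, 3]]
residual = cayley_hamilton_residual(A)
print(residual)  # [[0, 0], [0, 0]]
```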
1.8.3) Eigenvalues of AB and BA :

Let x and λ be an eigenvector and the corresponding eigenvalue of AB. Then ABx = λx, and multiplying by B on the left gives BA(Bx) = λ(Bx). That means, whenever Bx ≠ 0 (which is guaranteed when λ ≠ 0), Bx is an eigenvector of BA with eigenvalue λ. So AB and BA share their nonzero eigenvalues.

Question :
1) Let A be 3 x 7 and B be 7 x 3, and let the eigenvalues of AB be 1, 2, 4. Then what are the eigenvalues of BA ?
Answer : From 1.8.3, AB and BA share the same nonzero eigenvalues, so the eigenvalues of BA are 1, 2, 4 and four 0’s. Why four 0’s ? Because BA is 7 x 7, so it has 7 eigenvalues, and after copying the nonzero eigenvalues all the remaining ones must be zero.

2) Let A be 4 x 3 and B be 3 x 4. Then AB must have at least one zero eigenvalue. If AB (4 x 4) had four nonzero eigenvalues, BA would have to share all four, which is impossible because BA is only 3 x 3. Another way to see it : every column of AB is a linear combination of the 3 columns of A. So the 4 columns of AB are generated from only 3 vectors and must be linearly dependent. That means the determinant of AB is zero, and since the determinant is the product of the eigenvalues, at least one eigenvalue of AB is zero.

3) Let A be a 10 x 10 matrix having rank 2, so the nullity is 8. Rank 2 means the determinant must be zero, because of the 10 dimensions only 2 survive the transformation. The determinant is nothing but the product of the eigenvalues, which means at least one of the eigenvalues is zero. And the nullity is nothing but the number of linearly independent eigenvectors corresponding to the zero eigenvalue.

NOTE :
1) A single eigenvalue can have multiple LI eigenvectors. This was noted above, but it is worth repeating.
2) A column that contains a pivot element is linearly independent of the columns before it. So, if every row contains a pivot element, the columns can produce any vector in that space.
3) When solving Ax = b questions, also take help from pivot and free variables.
4) “If Ax = b has a unique solution then A has to be invertible.” This seems correct at first glance, but consider a 3 x 2 matrix A (see Question 2 of section 1.6). There a unique solution exists, but A is not invertible since it is not a square matrix. So, this statement is false.
5) A unit eigenvector is nothing but an eigenvector divided by its magnitude, same as a unit vector. If x is a unit eigenvector (as a column vector), then x^T x = 1, because x^T x is nothing but the squared magnitude of the vector.
6) A matrix is diagonalizable if and only if it has n LI eigenvectors.

1.8.4) LU decomposition :

In LU decomposition, we convert matrix into Lower triangular and Upper triangular matrix. Off course,
we cannot convert every matrix so, there is condition “A matrix must be able to be reduced to row-
um
echelon form U, without interchanging any rows.”

There are two methods to find LU decomposition :

1) Formula method : (Remember how L and U look like this is same in both method)
ant

𝐴11 𝐴12 𝐴13 1 0 0 𝑈11 𝑈12 𝑈13

𝐴
[ 21 𝐴22 𝐴 𝐿
23 ] = [ 21 1 0 ] × [ 0 𝑈22 𝑈23 ]
𝐴31 𝐴32 𝐴33 𝐿31 𝐿32 1 0 0 𝑈33
𝑈11 𝑈12 𝑈13
= [𝐿21 × 𝑈11 𝐿21 × 𝑈12 + 𝑈22 𝐿21 × 𝑈13 + 𝑈23 ]
Qu

𝐿21 × 𝑈11 𝐿31 × 𝑈12 + 𝐿32 × 𝑈22 𝐿31 × 𝑈13 + 𝐿32 × 𝑈23 + 𝑈33
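2) Interesting method :

$$A = \begin{bmatrix} 1 & 4 & -3 \\ -2 & 8 & 5 \\ 3 & 4 & 7 \end{bmatrix}$$

First, we convert this matrix into row echelon form,

$$\begin{bmatrix} 1 & 4 & -3 \\ -2 & 8 & 5 \\ 3 & 4 & 7 \end{bmatrix} \xrightarrow{R_2 + 2R_1} \begin{bmatrix} 1 & 4 & -3 \\ 0 & 16 & -1 \\ 3 & 4 & 7 \end{bmatrix} \xrightarrow{R_3 - 3R_1} \begin{bmatrix} 1 & 4 & -3 \\ 0 & 16 & -1 \\ 0 & -8 & 16 \end{bmatrix} \xrightarrow{R_3 + 0.5R_2} \begin{bmatrix} 1 & 4 & -3 \\ 0 & 16 & -1 \\ 0 & 0 & 15.5 \end{bmatrix}$$

In the first transformation, we multiplied row 1 by -2 and subtracted it from row 2. So, we write that -2
in L (position L21). Similarly, in the next step, we multiplied row 1 by 3 and subtracted it from row 3.
So, we write that 3 in L (position L31). In the last step, we multiplied row 2 by -1/2 and subtracted it
from row 3, so -1/2 goes in position L32. That’s how we fill L, and the row echelon matrix we end with is U.

$$L = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 3 & -1/2 & 1 \end{bmatrix}, \quad U = \begin{bmatrix} 1 & 4 & -3 \\ 0 & 16 & -1 \\ 0 & 0 & 15.5 \end{bmatrix}$$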

One thing to note: if we multiply the diagonal entries of the upper triangular matrix U, we get the
determinant of the original matrix. Here 1 × 16 × 15.5 = 248 = det(A). This is one of the methods to
find a determinant, but it works as stated only when we use row additions and no row swapping.
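The elimination steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production routine: the function name `lu_no_pivot` is made up for this example, and it assumes a square matrix that needs no row swaps.

```python
def lu_no_pivot(a):
    """LU factorisation by row elimination without row swaps; returns (L, U)."""
    n = len(a)
    u = [row[:] for row in a]  # working copy that becomes U
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for col in range(n):
        if u[col][col] == 0:
            raise ValueError("zero pivot: a row swap would be needed")
        for row in range(col + 1, n):
            factor = u[row][col] / u[col][col]  # multiplier that is written into L
            l[row][col] = factor
            for k in range(col, n):
                u[row][k] -= factor * u[col][k]
    return l, u


A = [[1.0, 4.0, -3.0], [-2.0, 8.0, 5.0], [3.0, 4.0, 7.0]]
L, U = lu_no_pivot(A)

# Determinant as the product of the diagonal entries of U.
det = 1.0
for i in range(len(U)):
    det *= U[i][i]

print(L)
print(U)
print(det)
```

The multipliers stored in `L` are exactly the numbers -2, 3 and -1/2 read off from the row operations in the worked example.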

1.8.5) Types of matrices :

1) Identity : The identity matrix of size n is the n x n square matrix with ones on the main diagonal
and zeros elsewhere.
Inverse of a matrix : A and B (square matrices n x n) are inverse of each other iff AB = BA = I.
• (A^-1)^-1 = A
• (kA)^-1 = k^-1 A^-1 for nonzero scalar k;
• (AB)^-1 = B^-1 A^-1
Transpose of matrix :
• (A+B)^T = A^T + B^T
• (AB)^T = B^T A^T
• (cA)^T = cA^T
• The dot product of two column vectors a and b can be computed as a.b = a^T b
• (A^T)^-1 = (A^-1)^T.

2) Lower/Upper triangular matrix :

A matrix can be both lower and upper triangular at the same time (it is then a diagonal matrix).

3) Diagonal : Entries outside the main diagonal are all zero.

4) Symmetric : A^T = A
5) Skew-Symmetric : A^T = -A
6) Orthogonal : An orthogonal matrix is a real square matrix Q whose columns and rows are
orthonormal vectors: Q^-1 = Q^T, or Q^T Q = I. Two vectors a and b are orthogonal if a.b = a^T b = 0.
If they are also unit vectors, they are called orthonormal. So, orthonormal = orthogonal + unit vector.
7) Idempotent : An idempotent matrix is a matrix which, when multiplied by itself, yields itself:
A^2 = A. Also A^n = A for any integer n > 0.
8) Nilpotent : It is a square matrix N such that N^k = 0 for some positive integer k.
9) Involutory : A^-1 = A (equivalently, A^2 = I).
10) Hermitian : The conjugate transpose of a complex matrix A, denoted by A*, is given by A* =
(Ā)^T, where the entries of Ā are the complex conjugates of the corresponding entries of A.
A is Hermitian when A* = A.
11) Skew-Hermitian : A* = -A

2. Calculus
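2.1) Limits :

$$\lim_{n \to a} f(n); \qquad \lim_{n \to a} \frac{f(n)}{g(n)} = \frac{\lim_{n \to a} f(n)}{\lim_{n \to a} g(n)} \quad \text{(when both limits exist and } \lim_{n \to a} g(n) \neq 0\text{)}$$

LHL : $\lim_{n \to a^-} f(n) = \lim_{h \to 0} f(a - h)$, with h > 0

RHL : $\lim_{n \to a^+} f(n) = \lim_{h \to 0} f(a + h)$, with h > 0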

While solving limits, we encounter many unusual forms. Some of them are given below :

$$\frac{\infty}{\infty}, \quad \frac{0}{0}, \quad \infty - \infty, \quad 0^0, \quad 0 \cdot \infty, \quad \infty^0, \quad 1^\infty$$

We call these indeterminate forms in limits.

2.1.1) Methods for solving 0/0 :

I. Factorization : For factorization, we use identities like,

$$a^3 + b^3 = (a + b)(a^2 - ab + b^2)$$
$$a^3 - b^3 = (a - b)(a^2 + ab + b^2)$$

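Example : Solve the following limits using factorization.

1) $\lim_{x \to 1} \frac{x^3 - 1}{x - 1} = \lim_{x \to 1} \frac{(x - 1)(x^2 + x + 1)}{(x - 1)} = 3$

2) $\lim_{x \to 1} \frac{x^2 - \sqrt{x}}{\sqrt{x} - 1} = \lim_{x \to 1} \frac{\sqrt{x}\,((\sqrt{x})^3 - 1)}{(\sqrt{x} - 1)} = \lim_{x \to 1} \frac{\sqrt{x}\,(\sqrt{x} - 1)(x + \sqrt{x} + 1)}{(\sqrt{x} - 1)} = 3$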

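II. Rationalization :

Example : $\lim_{x \to 1} \frac{(x+1)(\sqrt{x} - 1)}{2x^2 + x - 3} = \lim_{x \to 1} \frac{(x+1)(\sqrt{x} - 1)}{2x^2 + 3x - 2x - 3} = \lim_{x \to 1} \frac{(x+1)(\sqrt{x} - 1)}{(x - 1)(2x + 3)} = \lim_{x \to 1} \frac{x + 1}{(\sqrt{x} + 1)(2x + 3)} = \frac{2}{2 \times 5} = 0.2$

The second-to-last step uses $x - 1 = (\sqrt{x} - 1)(\sqrt{x} + 1)$.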

Frequently used limits : (FUL)

$$\lim_{x \to 0} \frac{\sin x}{x} = 1 \qquad \lim_{x \to 0} \frac{e^x - 1}{x} = 1$$
$$\lim_{x \to 0} \frac{\log(1 + x)}{x} = 1 \qquad \lim_{x \to 0} \frac{a^x - 1}{x} = \ln(a)$$
Is there a number b such that $\lim_{x \to -2} \frac{bx^2 + 15x + 15 + b}{x^2 + x - 2}$ exists ? If so, find the value of b and the
value of the limit. – If we put x = -2 in the denominator we get 0, which suggests the limit does not
exist. But look carefully: we can write $x^2 + x - 2 = (x - 1)(x + 2)$, so the limit can exist if we can cancel
the (x+2) term. For that, the numerator must also have (x+2) as a factor. Dividing the numerator by
(x+2) leaves a remainder 5b - 15 (the value of the numerator at x = -2), which must be zero. This gives
b = 3. The numerator is then 3(x+2)(x+3), so the limit is 3(x+3)/(x-1) at x = -2, which is -1.
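As a quick sanity check of this reasoning, the following sketch evaluates the ratio near x = -2 with exact rational arithmetic. The helper names `numerator` and `ratio` are made up for this illustration.

```python
from fractions import Fraction


def numerator(b, x):
    return b * x**2 + 15 * x + 15 + b


def ratio(b, x):
    return numerator(b, x) / (x**2 + x - 2)


# The numerator must vanish at x = -2: 4b - 30 + 15 + b = 5b - 15 = 0.
b = Fraction(15, 5)

# Approach x = -2 from both sides; both columns should move towards the same value.
for h in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 100000)):
    print(float(ratio(b, -2 - h)), float(ratio(b, -2 + h)))
```

Both columns approach -1, which agrees with cancelling the common factor (x+2) by hand.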
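2.1.2) Solving ∞/∞ form :

The ∞/∞ case is possible in polynomial-type expressions. We take the highest power $x^m$ common from
the numerator and the denominator, so that the remaining terms take the form 1/x.

Example : $\lim_{x \to \infty} \frac{\sqrt{3x^2 + 2}}{x - 2} = \lim_{x \to \infty} \frac{\sqrt{3 + \frac{2}{x^2}}}{\left(1 - \frac{2}{x}\right)} = \sqrt{3}$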
2.1.3) Solving ∞ − ∞ form : Try to combine first ∞ and second ∞ terms.

2.1.4) Solving 𝟏∞ form :

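$$l = \lim_{x \to a} (1 + f(x))^{g(x)}$$

Assume that f(x) tends to zero (and g(x) tends to infinity) as $x \to a$.

$$\therefore \log(l) = \lim_{x \to a} \frac{g(x) \times \log(1 + f(x)) \times f(x)}{f(x)}$$

$$\therefore \log(l) = \lim_{x \to a} f(x) \cdot g(x) \times \lim_{x \to a} \frac{\log(1 + f(x))}{f(x)}$$

The second limit is 1 (it is one of the frequently used limits), so

$$\therefore l = e^{\lim_{x \to a} f(x) \cdot g(x)}$$

Equivalently,

$$\lim_{x \to a} f(x)^{g(x)} = e^{\lim_{x \to a} g(x)(f(x) - 1)}$$

This form is only applicable when f(x) → 1 as x → a, because the equation for l uses 1 + f(x) while this
one uses f(x) alone.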

2.1.5) L'Hôpital's rule : All the FUL can also be obtained using L'Hôpital's rule.

2.2) Continuity :

A function f(x) is continuous at x = a if $\lim_{x \to a} f(x) = f(a)$ OR LHL = RHL = value of function at a.

Example :
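$$f(x) = \begin{cases} \dfrac{\sqrt{1 + kx} - \sqrt{1 - kx}}{x}, & -1 \le x < 0 \\[2mm] \dfrac{2x + 1}{x - 1}, & 0 \le x < 1 \end{cases}$$

Find k if f(x) is continuous at x = 0.

Answer : f(0) = -1, because x = 0 falls in the second piece (0 ≤ x < 1), so f(0) = (2·0 + 1)/(0 - 1) = -1.

$$\text{LHL} = \lim_{x \to 0^-} \frac{\sqrt{1 + kx} - \sqrt{1 - kx}}{x} = \lim_{x \to 0^-} \frac{2k}{\sqrt{1 + kx} + \sqrt{1 - kx}} = k$$

Continuity at x = 0 requires LHL = f(0), so k = -1.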

NOTE :

1) If F and G are continuous at x = a then F(x)±G(x), F(x).G(x), F(x)/G(x) are also continuous at
x = a (of course G(a) ≠ 0 is needed in the last form).
2) The composite function F(G(x)) is continuous at x = a when G(x) is continuous at x = a and F(x) is
continuous at x = G(a).

Is it possible that two functions are not continuous at some point but their addition is continuous at
that point ? – Let's suppose we have two functions, f(x) and g(x), defined as follows:

f(x) = 1/x, for x ≠ 0

g(x) = -1/x, for x ≠ 0

Both f(x) and g(x) are discontinuous at x = 0 because their values blow up to infinity (with opposite
signs) as x approaches 0. However, if we consider their sum h(x) = f(x) + g(x), we get:

h(x) = (1/x) + (-1/x) = 0 for x ≠ 0. If we also set f(0) = g(0) = 0 so that both functions are defined at 0,
then h(x) = 0 everywhere, and a constant is always continuous. This example illustrates that the
discontinuity of individual functions does not necessarily imply the discontinuity of their sum.

2.3) Differentiability :

Slope from the left = $\frac{f(a-h) - f(a)}{a - h - a} = \frac{f(a-h) - f(a)}{-h} = A$

Also, slope from the right = $\frac{f(a+h) - f(a)}{a + h - a} = \frac{f(a+h) - f(a)}{h} = B$

For differentiability, the criterion is A = B in the limit h → 0. OR

If $f(x) = \begin{cases} g(x), & 0 < x \le a \\ h(x), & a < x \le b \end{cases}$ is differentiable at a,

then f’(a-) = f’(a+), where

f’(a-) = g’(a) and f’(a+) = h’(a)

If f is differentiable, then is $\frac{d}{dx} \sqrt{f(x)} = \frac{f'(x)}{2\sqrt{f(x)}}$ ? – This is false in general; do not apply the formula
blindly. The question is really asking whether the square root of f is differentiable wherever f is
differentiable. But we have a counterexample: when f(x) = x, f is differentiable at x = 0 but sqrt(x) is
not differentiable at x = 0.

2.4) Maxima and Minima :

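Integration by parts : ∫ u dv = uv - ∫ v du. Here u is chosen by the ILATE order (Inverse trigonometric,
Logarithmic, Algebraic, Trigonometric, Exponential).

Tic-tac-toe method : (tabular form of integration by parts)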
Approximate by integrals :

When a summation has the form $\sum_{k=m}^{n} f(k)$, where $f(k)$ is a monotonically increasing function, we
can approximate it by integrals:

$$\int_{m-1}^{n} f(x)\,dx \le \sum_{k=m}^{n} f(k) \le \int_{m}^{n+1} f(x)\,dx$$

When f(k) is a monotonically decreasing function, we can use a similar method to provide the
bounds

$$\int_{m}^{n+1} f(x)\,dx \le \sum_{k=m}^{n} f(k) \le \int_{m-1}^{n} f(x)\,dx$$
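A small numeric check of the increasing case may help. This sketch assumes f(k) = k^2 with m = 1 and n = 10 (values picked only for illustration), and uses the exact antiderivative x^3/3 for the integrals.

```python
def sum_f(f, m, n):
    """Sum of f(k) for k = m..n inclusive."""
    return sum(f(k) for k in range(m, n + 1))


def integral_of_square(lo, hi):
    """Exact integral of x^2 over [lo, hi]."""
    return (hi**3 - lo**3) / 3


m, n = 1, 10
total = sum_f(lambda k: k**2, m, n)    # the sum being bounded
lower = integral_of_square(m - 1, n)   # integral from m-1 to n
upper = integral_of_square(m, n + 1)   # integral from m to n+1
print(lower, total, upper)
```

The sum sits between the two integrals, as the inequality for increasing functions says it should.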

For maxima and minima f’(x) = 0. This means that if a differentiable function has a maximum or minimum
at a point, then f’(x) = 0 there. But if f’(x) = 0, is there necessarily a maximum or minimum ? – No.
Take f(x) = x^3: f’(x) = 0 at x = 0, but that point is neither a minimum nor a maximum.
Point of inflection : It is a point at which the function changes from concave to convex or vice versa.
At that point f’’(x) = 0 (when f’’ exists), and it need not be a point of minimum or maximum. Critical
points are points where f’(x) = 0.

Example :

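f(x) = x^3 – 3x + 3

f’(x) = 3x^2 – 3 = 0

x = +1, -1.

Which point is the local minimum and which is the local maximum? One way is to check the sign of f’(x)
on each side of the critical points and mark the signs on a number line where +1 and -1 are shown:
f’(x) > 0 for x < -1, f’(x) < 0 for -1 < x < 1, and f’(x) > 0 for x > 1. So x = -1 is a local maximum and
x = +1 is a local minimum.

You can also use the second derivative method.

Put the critical points into the second derivative f’’(x) = 6x. The curve is sagging (local minimum) if
the value is positive and hogging (local maximum) if the value is negative. Here f’’(1) = 6 > 0 and
f’’(-1) = -6 < 0.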
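The second derivative test for this example can be written out as a short sketch. The function names `f`, `f2` and `classify` are chosen only for this illustration.

```python
def f(x):
    return x**3 - 3 * x + 3


def f2(x):
    """Second derivative of f, computed by hand: f''(x) = 6x."""
    return 6 * x


def classify(c):
    """Classify a critical point c using the sign of the second derivative."""
    if f2(c) > 0:
        return "local minimum"
    if f2(c) < 0:
        return "local maximum"
    return "inconclusive"


for c in (-1, 1):
    print(c, f(c), classify(c))
```

When the second derivative is zero the test says nothing, which is the situation of f(x) = x^3 at x = 0.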

2.5) Theorems on continuity and differentiability :

Theorem 1 (Intermediate value theorem). If f is a continuous function on the closed interval [a,b], and
if d is between f(a) and f(b), then there is a number c Є [a,b] with f(c) = d.

If d = 0, we get a useful result: if f is continuous and f(a) and f(b) have opposite signs, then there
exists a point c in [a,b] where the graph of f crosses the x axis, i.e. f(c) = 0.

A function f(x) is continuous in the interval [0, 2]. It is known that f(0) = f(2) = -1 and f(1) = 1. Which
one of the following statements must be true? –

(A) There exists a y in the interval (0, 1) such that f(y) = f(y+1)

(B) For every y in the interval (0, 1), f(y) = f(2-y).

(C) The maximum value of the function in the interval (0, 1) is 1

(D) There exists a y in the interval (0, 1) such that f(y) = -f(2-y)

Answer : For option A, define g(y) = f(y) – f(y+1). From the first point of the NOTE in section 2.2, we
know that g(y) is also continuous on [0, 1]. We have g(0) = f(0) – f(1) = -2 and g(1) = f(1) – f(2) =
1 + 1 = 2. So, g(0) * g(1) < 0, which by the intermediate value theorem means there is a point between
0 and 1 where g(y) is zero, i.e. f(y) = f(y+1) for some y between 0 and 1. Hence option A is correct.
Similarly, option D is also correct (use h(y) = f(y) + f(2-y), with h(0) = -2 and h(1) = 2).

Theorem 2 (Rolle’s Theorem). Suppose f is continuous on [a,b] and differentiable on (a,b), and suppose
that f(a) = f(b). Then there is a number c Є (a,b) with f’(c) = 0.

Theorem 3 (The Mean Value Theorem). Suppose f is continuous on [a,b] and differentiable on (a,b).
Then there is a number c Є (a,b) with (f(b) – f(a))/(b-a) = f’(c).

3. Probability

3.1) Introduction :

Axioms of probability :

1. Nonnegativity : P(A) ≥ 0
2. Normalization : P(Ω) = 1
3. Additivity : If A ∩ B = φ (disjoint), then P(A U B) = P(A) + P(B)

Where, for equally likely sample points, $P(A) = \frac{\text{Number of elements of } A}{\text{Total number of sample points}} = \frac{n(A)}{n(S)}$

3.1.1) The Inclusion Exclusion Principle :

For any two events, E and F: 𝑃(𝐸 ∪ 𝐹) = 𝑃(𝐸) + 𝑃(𝐹) − 𝑃(𝐸 ∩ 𝐹)

For three events, E, F, and G that formula mathematically expands to :

𝑃(𝐸 ∪ 𝐹 ∪ 𝐺) = 𝑃(𝐸) + 𝑃(𝐹) + 𝑃(𝐺) − 𝑃(𝐸 ∩ 𝐹) − 𝑃(𝐸 ∩ 𝐺) − 𝑃(𝐹 ∩ 𝐺) + 𝑃(𝐸 ∩ 𝐹 ∩ 𝐺)

3.2) Conditional Probability :

Conditional probability is nothing but a “change in belief”. Example of conditioning :

Dice tossing : We consider the following situation

• We throw 2 dice
• We look for P(sum of 2 faces is 9)

Without prior information : P(sum of 2 faces is 9) = 4/36 = 1/9.

Now, consider B : It is given that the first face is 6.

So, now the probability is P(sum is 9 | B) = 1/6, because the second face must be 3.

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

We call this the “conditional probability of A given B” (defined when P(B) > 0).
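The two dice probabilities can be checked by brute-force counting. This is a sketch that assumes two fair dice, so that all 36 ordered pairs are equally likely; the helper `prob` is a name invented for this example.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered pairs (first face, second face).
outcomes = list(product(range(1, 7), repeat=2))


def prob(event, given=lambda o: True):
    """P(event | given), computed by counting equally likely outcomes."""
    conditioned = [o for o in outcomes if given(o)]
    favourable = sum(1 for o in conditioned if event(o))
    return Fraction(favourable, len(conditioned))


def sum_is_9(o):
    return o[0] + o[1] == 9


def first_is_6(o):
    return o[0] == 6


print(prob(sum_is_9))              # without prior information
print(prob(sum_is_9, first_is_6))  # given that the first face is 6
```

Conditioning simply shrinks the sample space to the outcomes where B holds before counting.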

3.2.1) Introduction to tree diagram :

From the above equation we have P(AB) = P(B) × P(A|B). This is the joint probability of A and B.

We can extend this idea to three events: P(ABC) = P(C) × P(B|C) × P(A|BC)

It also equals P(BCA) = P(A) × P(C|A) × P(B|CA) = P(B) × P(C|B) × P(A|BC); similarly, you
can generate many such formulas. We call this factorization of the joint distribution.

Tree diagram for conditional probability (useful in sequential models) :

Many experiments have an inherently sequential character, such as for example –

• Tossing a coin three times

• Observing the value of a stock on five successive days
• Receiving eight successive digits at a communication receiver.
In the tree diagram, if the RHS is unclear: to find the final probability of a leaf, you multiply all the
probabilities that occur along the path to it. The RHS is nothing but that product, one for each case.

3.2.2) Conditional probability satisfies axioms :

1. Nonnegative : P(A|B) ≥ 0
2. Normalization : P(Ω|B) = 1
3. Additivity : P(A ∪ B|C) = P(A|C) + P(B|C), provided A and B are disjoint events. You can easily
prove this using the distributive law (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C).

3.2.3) Marginalization (also called total probability) :

Mutually exclusive : Two events are mutually exclusive or disjoint if they cannot occur at the same
time, i.e. A ∩ B = φ. Which means P(A ∪ B) = P(A) + P(B).

Mutually exhaustive : When a sample space S is divided into many mutually exclusive events such
that their union forms the entire sample space, these events are said to be mutually exhaustive events.

Which means mutually exhaustive events must first be mutually exclusive, and then their union must
cover S.

From above you can see that 𝐵 = (𝐴1 ∩ 𝐵) ∪ (𝐴2 ∩ 𝐵) ∪ …

So, 𝑃(𝐵) = 𝑃(𝐴1 ∩ 𝐵) + 𝑃(𝐴2 ∩ 𝐵) + ⋯ This is called Marginalization.

Question :

1) 𝑃(𝐴|𝐵) + 𝑃(𝐴′ |𝐵) = 1 or 𝑃(𝐴|𝐵) + 𝑃(𝐴|𝐵′ ) = 1 which one Is true ?

Answer : Of course the first is true. You can prove it by expanding the terms or by intuition.

2) An unbalanced dice (with 6 faces numbered from 1 to 6) is thrown. The probability that the
face value is odd is 90% of the probability that the face value is even. The probability of getting
any even numbered face is the same. If the probability that the face is even given that it is
greater than 3 is 0.75, which one of the following options is closest to the probability that the
face value exceeds 3?

Answer : Given that P(odd) = 0.9 x P(even), P(2) = P(4) = P(6), P(even | >3) = 0.75, P(>3) = ?

P(odd) + P(even) = 1, so P(even) = 1/1.9 and P(2) = P(4) = P(6) = 1/5.7

P(even | >3) = P(4, 6) / P(>3), where P(4, 6) = P(4) + P(6) = 2/5.7. We add because the event
(even ∩ >3) is the union of the disjoint outcomes 4 and 6.

P(>3) = (2/5.7) / 0.75 ≈ 0.468
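The arithmetic above can be reproduced exactly with rational numbers. The variable names in this sketch are chosen only for illustration.

```python
from fractions import Fraction

ratio_odd_to_even = Fraction(9, 10)      # P(odd) = 0.9 * P(even)
p_even = 1 / (1 + ratio_odd_to_even)     # from P(odd) + P(even) = 1
p_each_even = p_even / 3                 # P(2) = P(4) = P(6)
p_even_and_gt3 = 2 * p_each_even         # faces 4 and 6
p_gt3 = p_even_and_gt3 / Fraction(3, 4)  # from P(even | >3) = 0.75

print(p_gt3, float(p_gt3))
```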

3) 𝑃(𝐸|𝐹) = 𝑃(𝐸|𝐹, 𝐺)𝑃(𝐺|𝐹) + 𝑃(𝐸|𝐹, 𝐺 𝑐 )𝑃(𝐺 𝑐 |𝐹) is this true ?

Answer : Here we are introducing a new event G into an existing expression. We have seen this before
in marginalization,

i.e. $P(E) = P(E \cap G) + P(E \cap G^c) = P(E|G) \cdot P(G) + P(E|G^c) \cdot P(G^c)$

But the question conditions on F, so we add "|F" to every term of the previous equation. We get,

$$P(E|F) = P(E, G|F) + P(E, G^c|F) = \frac{P(E \cap F \cap G)}{P(F)} + \frac{P(E \cap F \cap G^c)}{P(F)}$$

Then we apply the factorization from 3.2.1, $P(E \cap F \cap G) = P(F) \cdot P(G|F) \cdot P(E|F, G)$ (and likewise for
$G^c$), and cancel P(F) to get the formula in the question. So it is true.

4) True / False ? 𝑃(𝐸, 𝐹, 𝐺) ≤ min {𝑃(𝐸), 𝑃(𝐹), 𝑃(𝐺)}

Answer : We know that when A ⊆ B then P(A) ≤ P(B), because B covers at least as many elements
as A. Similarly, we know that (E ∩ F ∩ G) ⊆ E, (E ∩ F ∩ G) ⊆ F, (E ∩ F ∩ G) ⊆ G, so

P(E) ≥ P(E ∩ F ∩ G), P(F) ≥ P(E ∩ F ∩ G), P(G) ≥ P(E ∩ F ∩ G). Since P(E ∩ F ∩ G) is less than or
equal to each of the three, it is also less than or equal to their minimum. So the statement is true.

5) If the occurrence of event F makes event E more likely, then the occurrence of E necessarily
makes F also more likely.

Answer : How do we express “more likely”? The usual probability of E is P(E). Saying that F makes E
more likely means P(E|F) ≥ P(E). So the question asks: if P(E|F) ≥ P(E), then is P(F|E) ≥ P(F) ?

The first inequality means P(E ∩ F) ≥ P(E).P(F). Dividing both sides by P(E) gives P(F|E) ≥ P(F). So the
statement is true (assuming P(E) > 0 and P(F) > 0).

3.3) Bayes theorem :

Let A1, A2, A3, …, An be disjoint events that form a partition of the sample space, and assume that
P(Ai) > 0 for all i. Then, for any event B such that P(B) > 0, we have

$$P(A_i|B) = \frac{P(A_i) P(B|A_i)}{P(B)} = \frac{P(A_i) P(B|A_i)}{P(A_1) P(B|A_1) + \cdots + P(A_n) P(B|A_n)}$$
3.3.1) Independence of events :

Independence of two events

A : I had a sandwich for breakfast

B : It will rain today

Occurrence of A provides no information about B.

$$P(B|A) = P(B)$$
$$P(A \cap B) = P(A) \cdot P(B|A) = P(A) \cdot P(B)$$

Which means events A and B are independent if P(A ∩ B) = P(A).P(B)

Two overlapping regions in a Venn diagram do not by themselves mean that the events are
independent. Some information about the probabilities must be given. For example, when P(A) = 0.3,
P(B) = 0.4 and P(A ∩ B) = 0.12 = 0.3 × 0.4, A and B are independent; for any other value of P(A ∩ B)
they are not. You cannot decide the independence of two events just by looking at a Venn diagram.

NOTE : The meaning of independence is P(B|A) = P(B) (defined when P(A) > 0). The equation
P(A ∩ B) = P(A).P(B) is its consequence, and it is the form used as the working test.

Question :
1) Can two disjoint events be independent ?

Answer : Two disjoint events are mutually exclusive, which means P(A ∩ B) = 0. Independence means
P(A).P(B) = P(A ∩ B). So two disjoint events can be independent only if P(A) = 0 or P(B) = 0.

2) If A and B are independent then what can you say about A’, B’.

Answer : We know that 𝑃(𝐴) = 𝑃(𝐴 ∩ 𝐵) + 𝑃(𝐴 ∩ 𝐵′ ) = 𝑃(𝐴). 𝑃(𝐵) + 𝑃(𝐴 ∩ 𝐵′ )

𝑃(𝐴)(1 − 𝑃(𝐵)) = 𝑃(𝐴 ∩ 𝐵′ ) = 𝑃(𝐴). 𝑃(𝐵′)

Which means if A and B are independent then A and B’ are independent, A’ and B are independent,
A’ and B’ are independent.

Similarly, we can extend this idea to three events being independent.

Three events A, B, C are called independent events when all four of these conditions are satisfied :

P(A ∩ B) = P(A).P(B)
P(B ∩ C) = P(B).P(C)
P(A ∩ C) = P(A).P(C)
P(A ∩ B ∩ C) = P(A).P(B).P(C)

The three pairwise conditions and the triple product condition do not imply each other. Similar for 4
events.

3) Are A and A’ independent if 0 < P(A) < 1 ?

Answer : They are mutually exclusive, so they can only be independent if P(A) = 0 or P(A’) = 0 (of
course both cannot be zero at the same time). Since 0 < P(A) < 1, neither is zero, so A and A’ are not
independent.

3.3.2) Conditional independence :

If events A and B are independent then P(A) = P(A|B) or P(B) = P(B|A) or P(AB) = P(A).P(B)

Two events A and B are independent given C : P(A ∩ B|C) = P(A|C).P(B|C)

The above does not mean that A and B are independent. It says that they are independent once C is
given. So, if the above equation is true, it does not imply that P(A) = P(A|B) or P(B) = P(B|A); those
hold when A and B are independent with no condition on C or any other event.

Just as P(AB) = P(A).P(B|A), we have P(AB|C) = P(A|C).P(B|AC). This is not an “if … then” implication;
it is the same multiplication rule with every term conditioned on C.

P(B|AC) = P(B|C) or P(A|BC) = P(A|C) : this is true if A and B are independent given C.

Similar to independence, conditional independence also carries over to complements :

If A and B are independent given C, then A and B’ are independent given C, A’ and B are
independent given C, and A’ and B’ are independent given C.

Question :

1) Assume A and B are independent i.e. P(A ∩ B) = P(A).P(B). Can we also say that A and B
are independent given C ?

Answer : No. Let’s say A and B are independent, i.e. P(A ∩ B) = P(A).P(B). Once C is given, the two
events can become disjoint: if A and B have no common region inside C, then P(A ∩ B|C) = 0 even
though P(A|C) and P(B|C) are both positive. That is why neither statement implies the other.

3.4) Probability distribution :

3.4.1) Random variable : It is the function that maps an outcome to the real number.

Consider rolling a dice. The sample space is {1, 2, 3, 4, 5, 6} and you have the number line. The random
variable maps each outcome in the sample space to a real number.

For example, X(2) = 20, X(3) = 30. The meaning of X can be anything we choose; it is just a rule that
assigns a number to each outcome of the dice.

But then how are we writing X = 3, as you have seen before ? – X = 3 is actually a notation which
represents { w | X(w) = 3 }, meaning the set of all outcomes whose value is 3.

Consider rolling a dice,

Sr. No. (outcome)    probability    X(.) – random variable
1                    1/6            1
2                    1/6            1
3                    1/6            2
4                    1/6            2
5                    1/6            2
6                    1/6            3

Now, P(outcome is 3) = ? – Here the outcome itself is given, which is 3, so we do not have to find a set;
P(outcome is 3) = 1/6. Now, P(X=2) = P(set of all outcomes whose value is 2) =
P({w | X(w) = 2}) = P(3, 4, 5) = P(3) + P(4) + P(5) = ½.

Question :

1) True/false ? – Suppose we have two events X = a and X = b (with a ≠ b). We can say that both
of the events are mutually exclusive.

Answer : It is true, because a single outcome cannot map to two different values of X.

2) True/false ? – Suppose X maps any outcome to either a, b or c. Then X = a, X = b and X = c,
these 3 events are mutually exhaustive ?

Answer : Yes. X only maps to a, b or c, so every outcome belongs to one of these three events and
their union is the whole sample space. By the reasoning in the 1st question they are also mutually
exclusive, so these 3 events are mutually exhaustive.

3) Consider that we toss a coin twice, and let X = number of heads after both flips, X1 = number of
heads on the first flip, X2 = number of heads on the second flip. Then P(X<2) = ? and P(X1+X2<2) = ?

Answer : Here X = X1 + X2, so both questions describe the same event. Assuming a fair coin,
P(X < 2) = P(X1 + X2 < 2) = 1 - P(HH) = 3/4.

4) Let X and Z be two random variables. Y = X Z. Find P(Y=y), P(Y=1|X=0)

Answer :

5) Given joint distribution

a. Find distribution of X and Y.
b. For X = 2 find conditional distribution P(Y|X)

Answer : Part a is done once we have calculated the (marginal) distributions of X and Y. The second
part asks for the conditional distribution P(Y|X) for X = 2. This means the list of values P(Y=1|X=2),
P(Y=2|X=2), … , one for each value of Y. Remember that these terms together make up the
conditional distribution (and they sum to 1); the bar is not an operator between Y and X.

NOTE :

1) A probability distribution is valid if every probability is nonnegative and the sum of all
probabilities is 1.
2) If min(x1, x2, x3, x4, …) > 1 then every term is greater than 1, because any value that is at least
the minimum is also greater than 1.
Types of Random variable :

• Discrete : A random variable is called discrete if it takes either a finite or a countably infinite
number of values. Example : the number of sixes in two rolls of a dice.

• Continuous : A random variable that can take an uncountably infinite number of values. For
example, consider the experiment of choosing a point a from the interval [-1, 1].

a) For a discrete random variable, the probability distribution function is called the
Probability mass function (PMF).

So, what is a probability mass function ? – Listing down the probability of each value of a discrete
random variable is called the PMF.

Question :

1) Suppose that a coin is tossed twice. Let X represents the number of heads that can come up.
Write down PMF of X.

Answer : As we have seen that definition it is list of probability of each value of discrete random
variable. So, P(X=2) = ¼, P(X=1) = ½, P(X=0) = ¼.

2) Check whether the function given by , P(X=x) = (x+2)/25 for x = 1, 2, 3, 4, 5 is PMF of RV X.

Answer : The PMF is P(X=1) = 3/25, P(X=2) = 4/25, …, P(X=5) = 7/25. If you add all these probabilities
you get (3 + 4 + 5 + 6 + 7)/25 = 1, and each value is nonnegative. That means the probability
distribution is valid.
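The validity check from this question can be sketched directly. The dictionary `pmf` and the flag `is_valid` are names used only in this example.

```python
from fractions import Fraction

# P(X = x) = (x + 2) / 25 for x = 1, 2, 3, 4, 5
pmf = {x: Fraction(x + 2, 25) for x in range(1, 6)}

# A PMF is valid when every value is nonnegative and the values sum to 1.
is_valid = all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1
print(pmf, is_valid)
```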

In summary, calculation of the PMF of a random variable X :

For each possible value x of X :

1. Collect all the possible outcomes that give rise to the event {X=x}.
2. Add their probabilities to obtain PX(x).

b) Expectation of Random variable : The PMF of a random variable X provides us with several
numbers, the probabilities of all the possible values of X. Expectation is a way to summarize
them in one number. We define the expected value (also called the expectation or the mean)
of a random variable X with PMF pX(x) by

$$E[X] = \sum_x x \, p_X(x).$$

Consider a roll of dice. You get i rupees if the outcome is i. How much do you expect to get per trial ?
How much do you expect in N trials ? – For a fair dice, E[X] = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5 rupees per
trial, and 3.5N rupees in N trials.

Much of machine learning is about estimating expectations, i.e. which value to expect on average
based on the probabilities.

So, what is the difference between average and expectation ? Both are the same idea, but the average
is computed from observed data, and when the data becomes very large the average tends to the
expectation.

Law of Total expectation : Sometimes we can apply recursion to find expectation.

$$E[X] = E_1 P(A) + E_2 P(A^c)$$

where E1 = E[X|A] and E2 = E[X|A^c].

Questions :

1) What is the expected number of tries to get 6 when rolling dice ?

Answer : In this type of question we assign a random variable to whatever is asked for. Here they are
asking about the number of tries to get 6. Hence, X = number of rolls to get 6, and we want E[X], the
expected value of this X. Now, two cases are possible: we get 6 on the first trial with probability 1/6,
or we do not get 6, count the first roll and move to the second roll. This process can go on indefinitely.

As you can see, this is an iterative process: E[X] = (1/6)(1) + (5/6)(1 + E[X]), which gives E[X] = 6.
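The recursion can be cross-checked against the direct definition of expectation. This sketch assumes a fair dice; it solves the recursion exactly and compares it with a truncated version of the infinite sum over k of k × P(first 6 on roll k).

```python
from fractions import Fraction

p = Fraction(1, 6)  # probability of a 6 on any single roll (fair dice assumed)

# Solving E = p * 1 + (1 - p) * (1 + E) for E gives E = 1 / p.
expected_exact = 1 / p

# Direct definition: sum over k of k * P(first 6 on roll k), truncated at 499 terms.
q = float(1 - p)
partial = sum(k * q ** (k - 1) * float(p) for k in range(1, 500))

print(expected_exact, partial)
```

The truncated sum agrees with the recursive answer up to floating point error, since the neglected tail is tiny.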

2) What is the expected number of tries to get HH or TT when tossing a coin ?

Answer : In this type of question we use the tree method, but we cannot use the simple tree from the
first question. We create two more variables, find their expectations first, and then calculate the main
expectation asked in the question.

Here X1 and X2 represent the number of tries to get HH or TT given that the first toss is Head, and the
number of tries to get HH or TT given that the first toss is Tail, respectively (both counts include the
first toss). For a fair coin,

E[X] = ½ E[X1] + ½ E[X2]
E[X1] = ½ (2) + ½ (1 + E[X2])
E[X2] = ½ (2) + ½ (1 + E[X1])

We solve these equations and get E[X1] = E[X2] = 3, and E[X] = 3.

3) Let X1, X2 be independent. Bernoulli random variables with parameter p (i.e., they are
independent and satisfy P(Xi=1)=p, P(Xi=0)=1−p) Find E[X12X2]?

Answer : If you see a combined variable like X1+X2 or X12X2, do not be afraid. Just make a truth table of
all value combinations, find the respective values, and then build the PMF of the new random variable.
Let’s solve this question.

c) Cumulative distribution function (CDF) : It is the running sum of probabilities, as the name
“cumulative” indicates. It is denoted as F(x) = P(X ≤ x)

Example, find the cumulative distribution function of the total number of heads obtained in four
tosses of a balanced coin. – Here they are asking for the cumulative distribution of the number of
heads in four tosses, which means they want running sums of the probabilities of 0, 1, 2, 3, 4, because
at most 4 heads can appear in 4 tosses.

X = number of heads in 4 tosses.

P(X=0) = 1/16, P(X=1) = 4/16, P(X=2) = 6/16, P(X=3) = 4/16, P(X=4) = 1/16

These are just the probabilities; we have not yet found the CDF.

F(0) = P(X ≤ 0) = P(0), F(1) = P(X ≤ 1) = P(0) + P(1), and similarly for the rest of the function up to
4. Hence, the cumulative distribution function is given by,

$$F(x) = \begin{cases} 0 & x < 0 \\ 1/16 & 0 \le x < 1 \\ 5/16 & 1 \le x < 2 \\ 11/16 & 2 \le x < 3 \\ 15/16 & 3 \le x < 4 \\ 1 & x \ge 4 \end{cases}$$

Observe that this distribution function is defined not only for the values taken on by the given random
variable, but for all real numbers. For instance, we can write F(1.7) = 5/16 and F(100) = 1, although the
probabilities of getting at most 1.7 heads or at most 100 heads in four tosses of a balanced coin may
not be of any real significance.
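The point that the CDF is defined for every real number can be seen in a short sketch. It assumes a balanced coin, and the function name `cdf` is chosen only for this example.

```python
from fractions import Fraction
from math import comb

n = 4  # number of tosses of a balanced coin

# PMF of X = number of heads: C(n, k) / 2^n
pmf = {k: Fraction(comb(n, k), 2**n) for k in range(n + 1)}


def cdf(x):
    """F(x) = P(X <= x), defined for every real x, not only for 0..4."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))


print(cdf(1.7), cdf(100), cdf(-1))
```

Differences of consecutive CDF values give back the PMF, for example `cdf(2) - cdf(1)` equals P(X = 2).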

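Now let’s see the behavior of PMF and CDF : the CDF is a step function that jumps at each value of X,
and the size of the jump at x equals the PMF value P(X = x).

Which means you can get PMF from CDF and vice versa.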
d) Variance : The mean is one way to summarize data and the variance is another. Together they
serve as a good summary of data. Variance is the average squared distance from the mean.
It tells us how spread out our data is. Variance is always nonnegative.

Consider two cases: in the first we have 3 different points whose average is 0; in the second we have
3 other points whose average is also 0.

But we know that the two cases are different. So, the expectation alone does not give enough
information about the data. That’s why we introduce a new concept which tells us about the spread of
the data around the expectation.

Similarly, there is the concept of covariance, which captures how two variables vary together. Here we
don’t go in depth. Two data sets can have the same variance and expectation but different covariance.

Let’s get back into variance.

$$Var(X) = \frac{\sum (x - mean)^2}{n} = mean((x - mean)^2) = E[(X - E[X])^2]$$

Example, consider the random variable X which has the PMF

$$p_X(x) = \begin{cases} \frac{1}{9} & \text{if } x \text{ is an integer in the range } [-4, 4] \\ 0 & \text{otherwise} \end{cases}$$

Find out the mean and variance.

Answer : You can solve for the mean using the Σ x p(x) formula, but here all integers from -4 to 4 have
the same probability and everything else has probability 0, which means this PMF is symmetric about
0, so the mean is 0. Hence Var(X) = E[X^2].

$$E[X^2] = \sum x^2 P(x) = \frac{60}{9}$$
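The mean and variance of this example can be computed mechanically from the PMF. The variable names in this sketch are chosen only for illustration.

```python
from fractions import Fraction

# Uniform PMF on the integers -4..4, each with probability 1/9.
pmf = {x: Fraction(1, 9) for x in range(-4, 5)}

mean = sum(x * p for x, p in pmf.items())
second_moment = sum(x * x * p for x, p in pmf.items())
variance = second_moment - mean**2  # Var(X) = E[X^2] - E[X]^2

print(mean, second_moment, variance)
```

The result 20/3 is the same number as 60/9 in lowest terms.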
We can simplify the formula of variance.

$$Var(X) = E[(X - E[X])^2] = E[X^2 - 2 X E[X] + E[X]^2] = E[X^2] - E[X]^2$$

Properties of expectation and variance :
• Y = aX + b
E[Y] = a E[X] + b
Var[Y] = Var[aX] = a^2 Var[X] (adding a constant doesn’t affect the spread of data)
• E[X1 + X2] = E[X1] + E[X2]
Var[X1 + X2] = Var[X1] + Var[X2] (if X1 and X2 are independent)
e) Standard deviation : The square of the standard deviation is the variance. σ^2 = Var(X)

3.4.2) Discrete random variable : Besides a general PMF given as a list, there are some standard named
PMFs. Examples of such random variables : Bernoulli, Binomial, Poisson, Uniform. These all have PMFs
but with different features.

a) Bernoulli Random Variable : For all its simplicity, the Bernoulli random variable is very
important. It takes on two values, 1 and 0. It takes the value 1 if an experiment with success
probability p resulted in success, and 0 otherwise.

Example, $X = \begin{cases} 1 & \text{if a head} \\ 0 & \text{if a tail} \end{cases}$

$P(X = x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases}$ Find the expectation and variance.

Answer : E[X] = p and E[X^2] = p, so Var[X] = E[X^2] – E[X]^2 = p – p^2 = p(1-p)

b) Binomial Random Variable : Repeated independent trials of Bernoulli. A coin is tossed n
times. Probability of head = p. X = number of heads

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, \ldots, n.$$

In a Bernoulli random variable, we perform the experiment only once and assign the values 1 and 0
depending upon success and failure. Here in a binomial random variable, we repeat the experiment
regardless of previous experiments, i.e. independently.

Example, P(H) = p and X is the number of heads in n tosses. Then we can split these n tosses into k
successes and n-k failures. Consider another example, P(success) = p and X is a binomial random
variable that represents the number of successes in n trials.

$$\therefore P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$$
Now we find the expectation and variance of a binomial random variable.

We cannot easily find the expectation by the usual method, so we use another idea. Why? Because

$$E[X] = \sum_x x P(x) = \sum_x x \binom{n}{x} p^x (1 - p)^{n-x}$$

This is difficult to evaluate directly, so we split our binomial random variable into many Bernoulli
random variables.
Let, X = Number of heads in n trials.

$$X_1 = \begin{cases} 1 & \text{if a head in first trial} \\ 0 & \text{if a tail} \end{cases} \quad \ldots \quad X_n = \begin{cases} 1 & \text{if a head in nth trial} \\ 0 & \text{if a tail} \end{cases}$$

$$\therefore X = X_1 + X_2 + X_3 + \cdots + X_n$$

$$E[X] = E[X_1] + \cdots + E[X_n] = p + \cdots + p = np$$

Similarly, Var[X] = Var[X1] + ⋯ + Var[Xn] = np(1 − p). We can add the variances of X1, …, Xn because
they are independent. Note that we do not need the independence assumption for the expected value,
since expectation is a linear function of RVs, but we need it for the variance.

In the binomial distribution we have two parameters, namely n and p. k is not a parameter: it is the
value at which we evaluate the PMF, and it changes depending upon the question or data.

Question :

1) Your probability class has 300 students and each student has probability 1/3 of getting an A,
independently of any other student. What is the mean of X, the number of students that get
an A?

Answer : E[X] = np = 300 x 1/3 = 100

2) If you want exactly k heads (successes), what should the probability p of a head be ?

Answer : Let P(H) = p, so $P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$. The question asks for the ideal value of p, the one
for which exactly k heads is most probable. This means we have to maximize this function (P(X = k))
with respect to p. Let P(X=k) = f(p). We take log on both sides,

$$\log(f(p)) = \log\binom{n}{k} + k \log(p) + (n - k) \log(1 - p)$$

We differentiate with respect to p and set the derivative to zero: k/p − (n − k)/(1 − p) = 0. Maximizing
log(f) or f gives the same value of p, but we use the log because it simplifies the problem. You will get
p = k/n as the answer.
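A crude grid search is one way to sanity check the result p = k/n. This is only a sketch: the values n = 10 and k = 3 are arbitrary choices for illustration, and `binom_pmf` is a helper written for this example.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable with parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, k = 10, 3

# Try p = 0.001, 0.002, ..., 0.999 and keep the value that maximizes P(X = k).
grid = [i / 1000 for i in range(1, 1000)]
best_p = max(grid, key=lambda p: binom_pmf(k, n, p))

print(best_p, k / n)
```

The grid maximizer lands on k/n, matching the calculus argument.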

Now, one thing to note: if your probability of success is high, then the plot of probability vs number of
successes shifts more to the right, because a larger number of successes in n trials becomes more
likely.

An example of such a graph with a small p (peak towards the left) would be n = number of GATE
aspirants and p = probability to get a good rank.
c) Poisson distribution : Previously we saw a case where n is very large and p is very small. The
Poisson distribution is the limit of the binomial where n tends to infinity and p tends to zero
(with np = λ held fixed). This is a very common case, which is why it is reasonable to treat it as
a separate distribution.

An RV X has the Poisson distribution with parameter λ, where λ > 0, if the PMF of X is

$$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, \ldots$$

Note that here X can be seen as representing the number of events occurring in a fixed interval where
events occur randomly throughout the interval.
Mean of Poisson R.V. = λ

Variance of Poisson R.V. = λ

Question :
1) On an average there are 2 accidents per day. What is the probability of 4 accidents on a
given day ?

Answer : Here, X = number of events in a certain duration. An accident is just one kind of event, so we
have written a very general case. The average is the mean or expectation, which is given here as 2, so
λ = 2, because expectation = λ in the Poisson distribution.

$$\therefore P(X = 4) = \frac{e^{-2} \, 2^4}{4!} \approx 0.090$$
2) How do I identify a Poisson RV i.e. X ?

Answer : Some examples of random variables that generally obey the Poisson probability law :

• The number of misprints on a page (or a group of pages) of a book
• The number of people in a community who survive to age 100
• The number of wrong telephone numbers that are dialed in a day
• The number of packages of dog biscuits sold in a particular store each day
• The number of customers entering a post office on a given day
• The number of vacancies occurring during a year in the federal judicial system
• The number of alpha particles discharged in a fixed period of time from some radioactive
material.
3) On average there are 2 accidents per day. What is the probability of 4 accidents in 2 days?

Answer : Here λ = 2 per day, but the question asks about 4 accidents in 2 days. For a 2-day interval, λ = 2 × 2 = 4.

$\therefore P(X = 4) = \frac{e^{-4} \, 4^4}{4!} \approx 0.195$

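The two accident questions differ only in the length of the interval, which scales λ. The sketch below evaluates the Poisson PMF for both cases; the function name `poisson_pmf` and the variable names are assumptions made for this illustration.

```python
import math

def poisson_pmf(k, lam):
    # P(X = k) = e^(-lam) * lam^k / k!
    return math.exp(-lam) * lam**k / math.factorial(k)

rate_per_day = 2  # average accidents per day, from the questions above

# Question 1: 4 accidents in 1 day, so lam = 2 * 1
p_one_day = poisson_pmf(4, rate_per_day * 1)

# Question 3: 4 accidents in 2 days, so lam = 2 * 2
p_two_days = poisson_pmf(4, rate_per_day * 2)

print(round(p_one_day, 4))   # 0.0902
print(round(p_two_days, 4))  # 0.1954
```
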
d) Uniform distribution : The uniform distribution exists for both discrete and continuous random variables. In the discrete case we speak of a discrete uniform random variable, and in the continuous case of a continuous uniform random variable.

A random variable X has a discrete uniform distribution if each of the n values in its range, say x1, x2, x3, …, xn, has equal probability. Then $P(x_i) = \frac{1}{n}$, where P(x) represents the probability mass function (PMF).

Example : if X is uniformly distributed on the set {1, 2, 3, …, N}, then

$P(x) = \frac{1}{N}, \quad E[X] = \frac{N + 1}{2}, \quad \mathrm{Var}[X] = \frac{N^2 - 1}{12}$
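
The mean and variance formulas for the set {1, 2, …, N} can be checked directly from the definition of expectation. The sketch below does this with exact fractions; N = 6 (a fair die) and the function name are assumptions chosen for illustration.

```python
from fractions import Fraction

def discrete_uniform_moments(N):
    # X takes each value in {1, ..., N} with probability 1/N
    values = range(1, N + 1)
    p = Fraction(1, N)
    mean = sum(p * x for x in values)
    var = sum(p * (x - mean) ** 2 for x in values)
    return mean, var

N = 6  # hypothetical example: a fair six-sided die
mean, var = discrete_uniform_moments(N)

print(mean, Fraction(N + 1, 2))       # 7/2 7/2
print(var, Fraction(N * N - 1, 12))   # 35/12 35/12
```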

3.4.3) Continuous random variable : For a discrete random variable we had a probability mass function; for a continuous random variable we have a probability density function (PDF), because the variable takes values on a continuum.
Probability density function : A random variable X has a PDF f(x) if $P(a \le X \le b) = \int_a^b f(x)\,dx$ for all a, b. For a valid PDF, $P(-\infty < X < \infty) = \int_{-\infty}^{+\infty} f(x)\,dx = 1$.

What is f(x) intuitively ?

f(x) measures how likely X is to fall near x; it is a density, not a probability by itself. Comparing the area under f(x) over two ranges tells us which range is more likely. For example, it can tell us that the random variable X is more likely to lie between 3 and 6 than between 6 and 7.

What is the intuition behind $P(a \le X \le a) = \int_a^a f(x)\,dx = 0$? This is called a point probability. It is zero because any range contains infinitely many values, so a single specific value has a 1/∞ chance, meaning zero.

Example : let X be a continuous random variable with the following PDF, where c is a positive constant.

$f(x) = \begin{cases} c e^{-x}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$

Find c and find P(1 < X < 3).

Answer : First integrate f(x) over its whole range and set the result equal to 1 to get c. Then integrate f(x) from 1 to 3 to get the final answer.
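
Carrying out these two steps for the PDF above gives the following worked solution (a sketch of the calculation, using only the normalization condition and the definition of the PDF):

```latex
% Step 1: normalization fixes c
\int_{0}^{\infty} c\, e^{-x}\,dx = c \left[ -e^{-x} \right]_{0}^{\infty} = c = 1
\quad\Rightarrow\quad c = 1

% Step 2: integrate the PDF over the requested range
P(1 < X < 3) = \int_{1}^{3} e^{-x}\,dx = e^{-1} - e^{-3} \approx 0.318
```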

a) Uniform distribution :

$f(x) = \begin{cases} c = \frac{1}{b - a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$

Probability ∝ length of interval.

b) Normal distribution : Also known as the Gaussian distribution. It arises naturally in physical phenomena such as population, decay, etc. It is so natural and "normal" that it is called the normal distribution. It is symmetric around the mean.

There is a special case of the normal distribution called the standard normal distribution, where the mean is 0 and the standard deviation is 1.

c) Exponential distribution : A continuous random variable whose probability density function is given, for some λ > 0, by

$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$

is said to be an exponential random variable. E[X] = 1/λ, Var[X] = 1/λ².
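
The stated mean and variance can be checked numerically by integrating against the PDF. The sketch below uses a simple midpoint rule; the value λ = 2, the cutoff 50/λ for the upper limit, the step count, and the function names are all assumptions made for this illustration.

```python
import math

def exponential_pdf(x, lam):
    # f(x) = lam * e^(-lam * x) for x >= 0, and 0 otherwise
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def moment(lam, power, steps=200_000):
    # Midpoint-rule approximation of the integral of x^power * f(x) on [0, 50/lam].
    # The tail beyond 50/lam is negligible because f decays like e^(-50).
    upper = 50.0 / lam
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += (x ** power) * exponential_pdf(x, lam)
    return total * h

lam = 2.0                      # hypothetical rate
total_mass = moment(lam, 0)    # should be close to 1
mean = moment(lam, 1)          # should be close to 1/lam = 0.5
var = moment(lam, 2) - mean ** 2   # should be close to 1/lam^2 = 0.25
print(total_mass, mean, var)
```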

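In short, we can say

Type | PMF/PDF | E[X] | Var[X] | Values of X
-----|---------|------|--------|------------
Bernoulli (p) | $P(X = x) = p$ if $x = 1$; $1 - p$ if $x = 0$ | $p$ | $p(1 - p)$ | 0, 1
Binomial (n, p) | $P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$ | $np$ | $np(1 - p)$ | 0, 1, 2, …, n
Poisson (λ) | $P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}$ | $\lambda$ | $\lambda$ | 0, 1, 2, …
Discrete uniform ({1, …, N}) | $P(X = x) = \frac{1}{N}$ | $\frac{N + 1}{2}$ | $\frac{N^2 - 1}{12}$ | 1, 2, …, N
Continuous uniform ([a, b]) | $f(x) = \frac{1}{b - a}$ for $a \le x \le b$; 0 otherwise | $\frac{a + b}{2}$ | $\frac{(b - a)^2}{12}$ | [a, b]
Normal (μ, σ²) | $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$ | $\mu$ | $\sigma^2$ | (−∞, ∞)
Exponential (λ) | $f(x) = \lambda e^{-\lambda x}$ if $x \ge 0$; 0 if $x < 0$ | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^2}$ | x ≥ 0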

3.4.4) Mean, Mode and Median : The mean is the average of the numbers, the mode is the most frequently occurring number, and the median is the middle number in the sorted sequence. Why are they required? When probabilities are given we use a random variable; when raw data is given we use these three measures (3M). The median shows the center of the data on the number line.
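
For raw data, all three measures can be computed with Python's standard `statistics` module. The data list below is a made-up example, not taken from the notes.

```python
import statistics

# Hypothetical raw data (already sorted here only for readability)
data = [1, 3, 3, 4, 5, 7, 12]

mean = statistics.mean(data)      # (1 + 3 + 3 + 4 + 5 + 7 + 12) / 7 = 5
median = statistics.median(data)  # middle value of the sorted data = 4
mode = statistics.mode(data)      # most frequent value = 3

print(mean, median, mode)
```

Here the single large value 12 pulls the mean above the median, which illustrates why the median is read as the center of the data on the number line.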
