2.2 Linear Algebra & Geometry
o In any ML/DL problem, most data is represented as vectors, matrices, and tensors.
o A vector is a 1D array of numbers with both magnitude and direction. E.g., a point in 3D space is a vector of three coordinates (x, y, z).
o A matrix is a 2D array of numbers with a fixed number of rows and columns; the entry at the intersection of the 𝑛th row and 𝑑th column is a single number. A matrix is usually denoted by square brackets [].
o A tensor is a generalization of vectors and matrices: a 1D tensor is a vector, a 2D tensor is a matrix.
o A 3D tensor can represent, e.g., an image with RGB colour channels. This generalizes to n-dimensional tensors.
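These tensor ranks can be seen directly in NumPy (the array names here are illustrative), where `ndim` reports the number of dimensions:

```python
import numpy as np

vector = np.array([1.0, 2.0, 3.0])      # 1D tensor (a vector)
matrix = np.array([[1, 2], [3, 4]])     # 2D tensor (a matrix)
image = np.zeros((32, 32, 3))           # 3D tensor: height x width x RGB channels

print(vector.ndim, matrix.ndim, image.ndim)  # 1 2 3
```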
o The following sections cover the basic terms used to represent and work with vectors and matrices.
• Vectors
o In a machine learning context, a vector is a list of numbers. E.g., if we predict a student's final test marks from his previous test marks, age, and gender: 𝑆𝑡𝑢𝑑𝑒𝑛𝑡1 = [65, 15, 0].
o In real-world projects, this dimension can grow into the thousands.
o In linear algebra, each vector coordinate tells how far, and in which direction, to go from the origin along that axis.
o A vector is multiplied by a scalar by multiplying every element of the vector by the scalar.
o This stretching (or shrinking) of the vector is called scaling: a scalar is a number that scales a vector.
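A quick NumPy sketch of scalar multiplication: every element is multiplied, and the vector's length scales by the same factor:

```python
import numpy as np

v = np.array([2.0, -1.0])
scaled = 3 * v                      # multiply every element by the scalar
print(scaled)                       # [ 6. -3.]

# Scaling multiplies the magnitude (length) by the same factor
assert np.isclose(np.linalg.norm(scaled), 3 * np.linalg.norm(v))
```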
• Unit Vectors (basis of a coordinate system)
o In the 2D coordinate system there are two unit vectors, one in the 𝑥 direction and one in the 𝑦 direction: 𝑖̂ = [1, 0] and 𝑗̂ = [0, 1].
o A vector [3, −2] scales 𝑖̂ by a factor of 3 and 𝑗̂ by a factor of −2.
o These coordinates describe the sum of the two scaled unit vectors: 3𝑖̂ + (−2)𝑗̂.
o In this view, coordinates are scalars applied to the unit vectors.
o It is possible to have different basis or unit vectors. With these different basis vectors, we get a completely new
coordinate system.
o Two vectors are linearly independent if neither is a scalar multiple of the other, i.e. 𝑣⃗ ≠ 𝑎 ∙ 𝑤⃗⃗ for all values of 𝑎. Each independent vector contributes a new dimension to the span.
o A set of vectors is linearly dependent if 𝛽1 ∙ 𝑎1 + ⋯ + 𝛽𝑘 ∙ 𝑎𝑘 = 0 for coefficients 𝛽1, ⋯, 𝛽𝑘 that are not all zero, i.e. we can form the zero vector as a non-trivial linear combination of the vectors.
o When a collection of vectors is linearly dependent, at least one of the vectors can be expressed as a linear combination of the others.
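The dependence test can be sketched with NumPy (the vectors below are made up for illustration): a non-trivial combination gives the zero vector, and the matrix rank drops below the number of vectors:

```python
import numpy as np

a1 = np.array([1.0, 2.0])
a2 = np.array([2.0, 4.0])                 # a2 = 2 * a1, so the set is dependent

# Non-trivial combination giving the zero vector: beta1 = 2, beta2 = -1
assert np.allclose(2 * a1 - 1 * a2, 0)

# Rank below the number of vectors also signals linear dependence
rank = np.linalg.matrix_rank(np.column_stack([a1, a2]))
print(rank)  # 1
```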
o A linear transformation moves every vector in the plane in some particular way to produce an output vector. E.g., the input vector is rotated 90° clockwise to form the output vector.
o Several important geometric transformations can be expressed as a matrix-vector product 𝑤 = 𝐴𝑣, where 𝐴 is a 2×2 matrix. E.g., scaling, rotation, reflection.
o To see how an input vector (𝑥𝑖𝑛, 𝑦𝑖𝑛) is transformed into (𝑥𝑜𝑢𝑡, 𝑦𝑜𝑢𝑡), observe the basis vectors 𝑖̂ = [1, 0] and 𝑗̂ = [0, 1]. E.g., to obtain the vector 𝑣⃗ = [−2, 4] we multiply 𝑖̂ by −2 and 𝑗̂ by 4.
o A linear transformation gives us a transformed 𝑖̂ and a transformed 𝑗̂. E.g., 𝑖̂𝑡𝑟𝑎𝑛𝑠 = [1, −2] and 𝑗̂𝑡𝑟𝑎𝑛𝑠 = [3, 0], so
𝑣⃗𝑡𝑟𝑎𝑛𝑠 = −2 ∙ [1, −2] + 4 ∙ [3, 0] = [𝑎𝑥 + 𝑏𝑦, 𝑐𝑥 + 𝑑𝑦] = [1∙(−2) + 3∙4, (−2)∙(−2) + 0∙4] = [10, 4]
o To map any input vector [𝑥, 𝑦] with this linear transformation, we multiply its coordinates with the transformed basis vectors:
𝑣⃗𝑡𝑟𝑎𝑛𝑠 = [1∙𝑥 + 3∙𝑦, −2∙𝑥 + 0∙𝑦]
o We can map the whole 2-D plane if we know the transformed basis vectors.
o We can collect the transformed basis vectors as the columns of a 2×2 matrix, e.g. [1 3; −2 0], and use it to process any vector. Intuitively, matrix-vector multiplication is summing two scaled column vectors:
[𝑎 𝑏; 𝑐 𝑑] ∙ [𝑥, 𝑦] = 𝑥 ∙ [𝑎, 𝑐] + 𝑦 ∙ [𝑏, 𝑑] = [𝑎𝑥 + 𝑏𝑦, 𝑐𝑥 + 𝑑𝑦]
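The worked example above can be checked in NumPy: the matrix-vector product equals the sum of the scaled columns:

```python
import numpy as np

A = np.array([[1, 3],
              [-2, 0]])     # columns are the transformed basis vectors
v = np.array([-2, 4])

by_columns = v[0] * A[:, 0] + v[1] * A[:, 1]   # sum of scaled columns
by_matmul = A @ v                              # matrix-vector multiplication
print(by_matmul)                               # [10  4]
assert np.array_equal(by_columns, by_matmul)
```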
o If a matrix's columns are linearly dependent, the 2-D plane is mapped onto a single line.
o Under a transformation, every unit grid square is mapped to a parallelogram (e.g. under a shear), so any object that can be approximated by squares transforms accordingly.
o This area-scaling factor is called the determinant of the transformation:
det([𝑎 𝑏; 𝑐 𝑑]) = (𝑎 ∙ 𝑑) − (𝑏 ∙ 𝑐)
The 𝑎 ∙ 𝑑 term tells how much the square stretches in the 𝑥 and 𝑦 directions, while the 𝑏 ∙ 𝑐 term tells how much it is squished in the diagonal direction.
E.g., the following transformation doubles the unit square's area:
det([−1 1; −1 −1]) = (−1 ∙ −1) − (1 ∙ −1) = 1 + 1 = 2
o If a linear transformation maps the 2-D plane onto a single line, the determinant is 0:
det([4 2; 2 1]) = (4 ∙ 1) − (2 ∙ 2) = 4 − 4 = 0
o For a 3×3 matrix, the determinant expands along the first row:
det([𝑎 𝑏 𝑐; 𝑑 𝑒 𝑓; 𝑔 ℎ 𝑖]) = 𝑎 ∙ det([𝑒 𝑓; ℎ 𝑖]) − 𝑏 ∙ det([𝑑 𝑓; 𝑔 𝑖]) + 𝑐 ∙ det([𝑑 𝑒; 𝑔 ℎ])
= 𝑎(𝑒𝑖 − 𝑓ℎ) − 𝑏(𝑑𝑖 − 𝑓𝑔) + 𝑐(𝑑ℎ − 𝑒𝑔)
= 𝑎𝑒𝑖 − 𝑎𝑓ℎ − 𝑏𝑑𝑖 + 𝑏𝑓𝑔 + 𝑐𝑑ℎ − 𝑐𝑒𝑔
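The 2×2 examples and the cofactor expansion can be verified with `np.linalg.det` (the 3×3 matrix below is made up for illustration):

```python
import numpy as np

# 2x2 determinants from the examples above
print(round(np.linalg.det(np.array([[-1, 1], [-1, -1]]))))   # 2: area doubles
print(round(np.linalg.det(np.array([[4, 2], [2, 1]]))))      # 0: plane collapses to a line

# 3x3: cofactor expansion along the first row vs. NumPy
M = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 10]])
(a, b, c), (d, e, f), (g, h, i) = M
cofactor = a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)
assert round(np.linalg.det(M)) == cofactor   # both give -3
```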
• Inverse matrices
o In a linear equation, the variables 𝑥, 𝑦 are only scaled by coefficients and summed. To solve a system of such equations, we collect the coefficients into a matrix 𝐴, the variables into a vector 𝑥⃗, and the constants into a vector 𝑏⃗⃗:
2𝑥 + 2𝑦 = −4
1𝑥 + 3𝑦 = −1
[2 2; 1 3] ∙ [𝑥, 𝑦] = [−4, −1]
𝐴𝑥⃗ = 𝑏⃗⃗
o 𝑥⃗ is the input vector and it is transformed into 𝑏⃗⃗ by the matrix 𝐴; it is moved somewhere in the 2-D space to give the resulting vector 𝑏⃗⃗.
o So, we are looking for a vector 𝑥⃗ which lands on 𝑏⃗⃗.
o If the determinant is not 0, there is a unique solution 𝑥⃗ that 𝐴 transforms into 𝑏⃗⃗. Since 𝐴 carries 𝑥⃗ to 𝑏⃗⃗, we can also go backward from 𝑏⃗⃗ in search of 𝑥⃗; this backward search is the inverse transformation.
E.g., 𝐴 = [3 1; 0 2] and its inverse transformation is 𝐴⁻¹ = [3 1; 0 2]⁻¹ = (1/6) ∙ [2 −1; 0 3]
o If we apply 𝐴 and then 𝐴⁻¹ consecutively, we get back the basis vectors 𝑖̂ = [1, 0] and 𝑗̂ = [0, 1]. Applying transformations consecutively corresponds to matrix multiplication; 𝐴⁻¹𝐴 is called the identity transformation, as it does nothing and gives the identity matrix.
o To solve the equation 𝐴𝑥⃗ = 𝑏⃗⃗ we will multiply both sides by 𝐴−1
𝐴−1 𝐴𝑥⃗ = 𝐴−1 𝑏⃗⃗
o Intuitively, we apply the transformation in reverse, following 𝑏⃗⃗; wherever 𝑥⃗ lands is the solution: 𝑥⃗ = 𝐴⁻¹𝑏⃗⃗
o When det(𝐴) = 0, every vector is squashed onto a line. A vector mapped onto this line cannot determine its location of origin, i.e. we can't find 𝐴⁻¹.
o The column space is the span of the columns of a matrix, i.e. the set of all possible outputs 𝐴𝑥⃗ of the matrix 𝐴.
o The null space is the set of all vectors that are mapped onto the zero vector 𝑏⃗⃗ = [0, 0].
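The system above can be solved numerically; when det(𝐴) = 0 the inverse does not exist and NumPy raises an error (a sketch):

```python
import numpy as np

A = np.array([[2.0, 2.0],
              [1.0, 3.0]])
b = np.array([-4.0, -1.0])

x = np.linalg.solve(A, b)            # unique solution, since det(A) = 4 != 0
print(x)                             # [-2.5  0.5]
assert np.allclose(A @ x, b)

# det = 0: the plane collapses onto a line, so no inverse exists
singular = np.array([[4.0, 2.0], [2.0, 1.0]])
try:
    np.linalg.inv(singular)
except np.linalg.LinAlgError:
    print("singular matrix has no inverse")
```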
• Distance between 2 points
[Figure: points 𝑝(𝑎1, 𝑎2) and 𝑞(𝑏1, 𝑏2) in the plane, with the distance 𝑑𝑝𝑞 as the hypotenuse of a right triangle with legs 𝑎1 − 𝑏1 and 𝑎2 − 𝑏2]
o By Pythagoras theorem,
o In 2D Space, 𝑑𝑝𝑞 = √(𝑎1 − 𝑏1 )2 + (𝑎2 − 𝑏2 )2
o In 3D space 𝑝(𝑎1 , 𝑎2 , 𝑎3 ), 𝑞(𝑏1 , 𝑏2 , 𝑏3 ) 𝑑𝑝𝑞 = √(𝑎1 − 𝑏1 )2 + (𝑎2 − 𝑏2 )2 + (𝑎3 − 𝑏3 )2
o In ND space 𝑝(𝑎1, 𝑎2, ⋯, 𝑎𝑛), 𝑞(𝑏1, 𝑏2, ⋯, 𝑏𝑛):
𝑑𝑝𝑞 = √(∑ᵢ₌₁ⁿ (𝑎ᵢ − 𝑏ᵢ)²)
[Figure: point 𝑝(𝑎1, 𝑎2) and the origin 𝑜(0, 0) in the 𝑥1–𝑥2 plane, with the distance 𝑑𝑝𝑜 as the hypotenuse of a right triangle with legs 𝑎1 and 𝑎2]
o By Pythagoras theorem,
o In 2D Space, 𝑑𝑝𝑜 = √𝑎1 2 + 𝑎2 2
o In 3D space 𝑝(𝑎1 , 𝑎2 , 𝑎3 ) 𝑑𝑝𝑜 = √𝑎1 2 + 𝑎2 2 + 𝑎3 2
o In ND space 𝑝(𝑎1, 𝑎2, ⋯, 𝑎𝑛):
𝑑𝑝𝑜 = √(∑ᵢ₌₁ⁿ 𝑎ᵢ²)
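Both distance formulas map directly onto `np.linalg.norm` (the points below are chosen for illustration):

```python
import numpy as np

p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 6.0, 3.0])

d_pq = np.sqrt(np.sum((p - q) ** 2))   # distance between the two points
d_po = np.linalg.norm(p)               # distance of p from the origin

print(d_pq)                            # 5.0
assert np.isclose(d_pq, np.linalg.norm(p - q))
assert np.isclose(d_po, np.sqrt(1 + 4 + 9))
```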
• Dot Product
o E.g., for 𝑎 = [2, 3, 4] and 𝑏 = [5, 6, 7]:
𝑎 ∙ 𝑏 = 𝑎𝑥𝑏𝑥 + 𝑎𝑦𝑏𝑦 + 𝑎𝑧𝑏𝑧 = 2∙5 + 3∙6 + 4∙7 = 56
o Geometrically, 𝑣⃗ ∙ 𝑤⃗⃗ is the length of the projection of 𝑤⃗⃗ onto 𝑣⃗ multiplied by the length of 𝑣⃗.
o If the projection falls on the opposite side of 𝑣⃗, the dot product is negative, and if the vectors are perpendicular, the dot product is zero.
o If 𝑣⃗ and 𝑤⃗⃗ have the same length, the projected length of each vector onto the other is the same; thus the order of calculation does not matter:
𝑣⃗ ∙ 𝑤⃗⃗ = 𝑤⃗⃗ ∙ 𝑣⃗
o If 𝑣⃗ is scaled by 3, only its length changes, so the scalar factors out of the dot product:
(3𝑣⃗) ∙ 𝑤⃗⃗ = 3(𝑣⃗ ∙ 𝑤⃗⃗)
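Both dot-product properties (commutativity, and the scalar factoring out) can be checked numerically:

```python
import numpy as np

a = np.array([2, 3, 4])
b = np.array([5, 6, 7])

print(np.dot(a, b))                            # 56
assert np.dot(a, b) == np.dot(b, a)            # order does not matter
assert np.dot(3 * a, b) == 3 * np.dot(a, b)    # the scalar factors out
```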
o By using the transformed basis vectors, we can calculate where any vector lands on a transformed line.
E.g., 𝑖̂ = [1, 0] and 𝑗̂ = [0, 1] change to the 1×2 matrix [1 −2].
o A vector 𝑣⃗ = [4, 3] will be transformed to
[1 −2] ∙ [4, 3] = 4 ∙ 1 + 3 ∙ (−2) = 4 − 6 = −2
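The 1×2 transformation above is exactly a dot product with the transformed basis vectors:

```python
import numpy as np

T = np.array([1, -2])        # transformed basis vectors, laid out on the line
v = np.array([4, 3])

print(T @ v)                 # -2, i.e. 4*1 + 3*(-2)
```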
o For further clarification, this can be thought of as projecting 2-D points onto a single line whose direction is given by a unit vector 𝑢̂.
o By symmetry, the projection of 𝑖̂ onto 𝑢̂ equals the projection of 𝑢̂ onto 𝑖̂, which is just the 𝑥-coordinate 𝑢𝑥 of 𝑢̂; likewise the projection of 𝑗̂ onto 𝑢̂ is 𝑢𝑦.
• Cross Product
o The cross product 𝑎 × 𝑏 of two vectors is another vector that is at right angles to both.
o The magnitude (length) of the cross product equals the area of the parallelogram with 𝑎 and 𝑏 as sides: |𝑎 × 𝑏| = |𝑎| ∙ |𝑏| ∙ sin(𝜃), where
|𝑎| = √(𝑎1² + 𝑎2² + ⋯ + 𝑎𝑛²) and |𝑏| = √(𝑏1² + 𝑏2² + ⋯ + 𝑏𝑛²)
o Note that the cross product is a vector with a specified direction; the result is always perpendicular to both 𝑎 and 𝑏.
o If 𝑎 and 𝑏 are parallel vectors, the result is the zero vector, since sin(0) = 0.
o When 𝑎 and 𝑏 start at the origin (0, 0, 0), the cross product 𝑐 = 𝑎 × 𝑏 ends at:
𝑎 = [2, 3, 4], 𝑏 = [5, 6, 7]
𝑐𝑥 = 𝑎𝑦𝑏𝑧 − 𝑎𝑧𝑏𝑦 = 3∙7 − 4∙6 = −3
𝑐𝑦 = 𝑎𝑧𝑏𝑥 − 𝑎𝑥𝑏𝑧 = 4∙5 − 2∙7 = 6
𝑐𝑧 = 𝑎𝑥𝑏𝑦 − 𝑎𝑦𝑏𝑥 = 2∙6 − 3∙5 = −3
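The component formulas above agree with `np.cross`, and the result is perpendicular to both inputs:

```python
import numpy as np

a = np.array([2, 3, 4])
b = np.array([5, 6, 7])

c = np.cross(a, b)
print(c)                                         # [-3  6 -3]
assert np.dot(c, a) == 0 and np.dot(c, b) == 0   # perpendicular to both
assert np.allclose(np.cross(a, 2 * a), 0)        # parallel vectors -> zero vector
```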
o The cross product could point in the completely opposite direction and still be
perpendicular to the two other vectors, so we have the "Right Hand Rule"
o With your right-hand, point your index finger along vector a, and point your middle
finger along vector b: the cross product goes in the direction of your thumb.
o Now, consider 𝑊ᵀ𝑋 = 0. Geometrically, 𝑋 represents a vector or a point with components 𝑥1, 𝑥2, and 𝑊 is a vector which is perpendicular to 𝑋.
o i.e., 𝑊ᵀ𝑋 = 0 signifies that 𝑊 and 𝑋 are perpendicular to each other, since their product is 0.
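A minimal numeric check of this perpendicularity condition (the vectors are chosen for illustration):

```python
import numpy as np

W = np.array([2.0, -1.0])
X = np.array([1.0, 2.0])

print(W @ X)    # 0.0 -> W^T X = 0, so W and X are perpendicular
```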
• Ellipsoid (3D)
o An ellipsoid is the 3D variant of the ellipse; its equation is the ellipse equation extended to 3D: 𝑥²/𝑎² + 𝑦²/𝑏² + 𝑧²/𝑐² = 1. The point-checking properties of the ellipse carry over to the ellipsoid.
o This concept extends to the hyper-ellipsoid, which is basically an n-dimensional ellipse.
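A small sketch of the point-checking idea for an axis-aligned ellipsoid centred at the origin (`inside_ellipsoid` is a hypothetical helper, not from the notes); the same code works unchanged for an n-dimensional hyper-ellipsoid:

```python
import numpy as np

def inside_ellipsoid(point, semi_axes):
    """True when x^2/a^2 + y^2/b^2 + z^2/c^2 <= 1 for the given point.
    Works for any dimension, so it also covers the hyper-ellipsoid."""
    p = np.asarray(point, dtype=float)
    s = np.asarray(semi_axes, dtype=float)
    return bool(np.sum((p / s) ** 2) <= 1.0)

print(inside_ellipsoid([1, 0, 0], [2, 3, 4]))   # True: inside
print(inside_ellipsoid([3, 0, 0], [2, 3, 4]))   # False: outside along x
```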