2.1 - 2.18 Foundational Math - Co-Ordinate Geometry and Linear Algebra
1 Introduction
● In this chapter we will learn the basics of linear algebra (lines, planes, vectors, matrices, etc.).
● We will discuss the concepts that are relevant to ML/AI and work through them in the context of real-world problems.
2.2 Real world ML classification problem:
Sorting Fish in a Factory
Below is an example shown at timestamp 2.28 in the video
● The fish-sorting problem can be treated as a classification problem in which we have to decide whether a given fish is of type 1 or type 2. Done by hand, it is clearly a labour-intensive task.
● We try to automate the task by identifying features of the fish, such as length, width, and weight, and using these features to classify each fish as type 1 or type 2.
● Our task here is binary classification, since we have two types of fish, but it can be extended to multiclass classification as well.
● Similar classification tasks are performed in manufacturing plants to separate faulty products from good ones.
● The problem we are about to solve is a binary classification problem, and we use the features of the fish for classification.
● We have a dataset D in which each row describes the length, width, and type of a fish. We call [...] type 2.
● Assume we have built a model M. Consider a fish with features length = 12 and width = 6 for which we want to find out whether it is of type 1 or type 2. We call this a query data point.
At timestamp 6.13
● Now that we have a line as our model, we can take a query point and determine whether it is a type 1 or a type 2 fish. The closer the query point is to the line, the less certain the model is about its type; the farther the point is from the line, the more certain the model is.
At timestamp 10.32
● The model can also be a circle, in which case we classify a data point according to whether it lies inside or outside the circle. We decide this by calculating the distance of the point from the centre of the circle.
● The closer a point is to the centre, the more certain we are about the type of fish, and vice versa.
● We can also have a model that is represented using two circles.
● Every model has pros and cons, for example [...] the circles.
● Similarly, we can also use an ellipse for our binary classification problem.
● Just as we visualize a data point in 2D, we can visualize a data point in 3D space if it is represented by 3 features, plotted along the x, y, and z axes respectively.
● We cannot use a line to classify data points in 3D space; we need a plane to separate them. A plane is the simplest model in 3D.
● A plane separates 3D space into two regions. Given a query point, we find the distance of the point from the plane and which side of the plane it lies on.
● This is similar to what we discussed for a line in 2D.
At timestamp 9.43 in video
● Just as we extended the concept of a line in 2D to a plane in 3D, we can extend circles in 2D to spheres in 3D. Everything inside the sphere is one class of data points and everything outside is another class. How close a point is to the centre of the sphere determines how certain we are that it belongs to a particular class.
At timestamp 14.34 min
● We cannot visualize data in higher dimensions as we did in 2D and 3D, so we use linear algebra to understand it mathematically. In higher dimensions we call these shapes hyperplanes, hyperspheres, and hyper-ellipsoids.
● The slope-intercept form of a line is y = mx + c, where m is the slope and c is the y-intercept.
● The y-intercept is the point at which the line crosses the y-axis.
At timestamp 2.32
● The line shown above crosses the y-axis c units away from the origin, so its y-intercept is c. Algebraically, at x = 0 we have y = c.
At timestamp 4.5 min
● The slope m is the tangent of the angle θ that the line makes with the x-axis: m = tan θ = a/b, as shown above.
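As a small illustration of the slope as a tangent, the sketch below computes m from a rise a and a run b and recovers the angle θ. The function name `slope_and_angle` is made up for this example and does not come from the video.

```python
import math

def slope_and_angle(a, b):
    """Return (m, theta in degrees) for a line with rise a over run b (b != 0)."""
    m = a / b                            # slope m = tan(theta) = a / b
    theta = math.degrees(math.atan(m))   # angle made with the x-axis
    return m, theta

m, theta = slope_and_angle(1.0, 1.0)
print(m, theta)   # a line rising 1 unit for every 1 unit of run
```

A rise equal to the run gives a slope of 1, which corresponds to an angle of 45 degrees with the x-axis.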
At timestamp 7.02
At timestamp 8.29
● Two distinct lines can have the same intercept c, but their slopes will then be different.
● The general form of a line is given by the equation ax + by + c = 0, where x and y are the x and y coordinates. This equation is important because we use it to represent lines, planes, and hyperplanes in ML.
● Given the general form of a line, we can easily convert it into slope-intercept form, as shown above: for b ≠ 0, y = -(a/b)x - c/b, so the slope is m = -a/b and the y-intercept is -c/b.
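The conversion from general form to slope-intercept form can be sketched in a few lines. The function name `general_to_slope_intercept` is hypothetical, and the sketch assumes b ≠ 0 (a vertical line has no slope-intercept form).

```python
def general_to_slope_intercept(a, b, c):
    """Convert a*x + b*y + c = 0 into (m, intercept) with y = m*x + intercept."""
    if b == 0:
        raise ValueError("vertical line: slope is undefined when b == 0")
    return -a / b, -c / b

# Example: 2x - y + 3 = 0 is the same line as y = 2x + 3.
m, intercept = general_to_slope_intercept(2.0, -1.0, 3.0)
print(m, intercept)   # 2.0 3.0
```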
At timestamp 16.13
● From now on, instead of x and y we use x1 and x2 for the coordinates, so that the same concept extends to n dimensions (x1, x2, x3, x4, ...).
● Likewise, for the coefficients we use w1, w2, w0 instead of a, b, c.
● To understand the geometry of lines, planes, and hyperplanes in higher dimensions, we write the equation of a line as w1·x1 + w2·x2 + w0 = 0.
At timestamp 2.10
● The equation of a plane is ax + by + cz + d = 0.
● Notice that the general form of a line in 2D and that of a plane in 3D are similar: just as a line separates 2D space into two regions, a plane separates 3D space into two regions. In 2D we have only 2 axes, and in 3D we have 3 axes.
At timestamp 3.1
● From now on we represent a plane as shown above: w1·x1 + w2·x2 + w3·x3 + w0 = 0.
At timestamp 4.26
● Just as we have lines in 2D and planes in 3D, we can have hyperplanes in d dimensions.
● A hyperplane in d dimensions separates the d-dimensional space into two regions.
● The equation of a hyperplane in d dimensions is shown above: w1·x1 + w2·x2 + ... + wd·xd + w0 = 0.
2.8 Equations using Linear Algebra
At timestamp 3.10
● We are familiar with the concept of vectors from linear algebra. Using vectors, we can write the equation of a plane concisely, as shown above.
● W is a row vector and X is a column vector; both have d dimensions. Multiplying the two as matrices gives w1·x1 + w2·x2 + ... + wd·xd, and adding w0 and setting the result to 0 gives the equation of the plane.
At timestamp 5.48
● In physics a vector has a magnitude and a direction. The concept of vectors arose in both physics and mathematics. Instead of the physics approach to vectors, we use the mathematical one.
At timestamp 6.58
● A row vector and a column vector are represented as shown above. When you multiply a row vector by a column vector of the same dimension, the result is a scalar.
● By convention, when we say "vector" in mathematics we mean a column vector.
At timestamp 9.28
● W is a d-dimensional vector, written w ∈ R^d as shown above, where R is the set of real numbers.
At timestamp 10.15
● By default every vector is a column vector. We convert a column vector into a row vector by taking its transpose, as shown above. The transpose is the operation that converts column vectors into row vectors and vice versa.
At timestamp 11.15
● The transpose can be applied to matrices as well, as shown above: rows become columns and vice versa.
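A minimal sketch of the transpose, assuming a matrix is stored as a list of rows (a representation chosen only for this example). The helper name `transpose` is hypothetical.

```python
def transpose(matrix):
    """Swap rows and columns of a matrix stored as a list of equal-length rows."""
    return [list(column) for column in zip(*matrix)]

A = [[1, 2, 3],
     [4, 5, 6]]              # 2 rows, 3 columns
print(transpose(A))          # [[1, 4], [2, 5], [3, 6]]

column_vector = [[1], [2], [3]]   # 3 x 1 column vector
print(transpose(column_vector))   # [[1, 2, 3]], a 1 x 3 row vector
```

Transposing twice returns the original matrix, which is a quick sanity check for any implementation.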
● The above equation gives a line in 2 dimensions, a plane in 3 dimensions, and a hyperplane for d ≥ 4, because w and x are d-dimensional vectors.
● It is one of the most elegant and widely used equations in machine learning.
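To make the point about dimensions concrete, the sketch below evaluates the left-hand side w1·x1 + ... + wd·xd + w0 for vectors of any length d. The function name `hyperplane_value` and the sample numbers are illustrative assumptions; a result of 0 means the point lies on the line, plane, or hyperplane.

```python
def hyperplane_value(w, x, w0):
    """Return w1*x1 + ... + wd*xd + w0 for d-dimensional w and x."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same dimension")
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

# d = 2: the line x1 + x2 - 2 = 0 contains the point (1, 1).
print(hyperplane_value([1.0, 1.0], [1.0, 1.0], -2.0))            # 0.0

# d = 3: the plane x1 + 2*x2 + 3*x3 - 6 = 0 contains the point (1, 1, 1).
print(hyperplane_value([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], -6.0))  # 0.0
```

The same function works unchanged for d ≥ 4, which is exactly why the vector form is convenient.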
At timestamp 16.7
● We can use vectors to represent our dataset. Suppose we describe a fish with two features (length, width). We can then represent the fish by these two values as a vector, that is, as a point in 2D. In general, we can think of d-dimensional points as vectors. Similarly, we can represent a fish with 3 features.
● For example, the point (2, 3, 1) lies 2 units along the x1 axis, 3 units along the x2 axis, and 1 unit along the x3 axis. We can therefore represent it as a point in 3D space, and that point can itself be represented by the vector from the origin to it.
At timestamp 4.51
● In physics, a vector is described by its magnitude and direction. We get the direction from the angle the vector makes with the x-axis.
● The magnitude is simply the length of the vector and can be calculated as shown above: for x = (x1, x2, ..., xd), ||x|| = sqrt(x1² + x2² + ... + xd²).
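The magnitude and direction described above can be computed as follows. This is a sketch with hypothetical helper names (`magnitude`, `direction_2d`); the direction helper only covers the 2D case, where a single angle with the x-axis is enough.

```python
import math

def magnitude(x):
    """Length of a d-dimensional vector: sqrt(x1^2 + ... + xd^2)."""
    return math.sqrt(sum(xi * xi for xi in x))

def direction_2d(x):
    """Angle in degrees between a 2D vector and the positive x-axis."""
    return math.degrees(math.atan2(x[1], x[0]))

print(magnitude([3.0, 4.0]))        # 5.0
print(magnitude([2.0, 3.0, 1.0]))   # length of the 3D point (2, 3, 1)
print(direction_2d([1.0, 1.0]))     # roughly 45 degrees
```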
At timestamp 11.45
● Suppose our fish dataset uses d = 6 features to represent each fish. Each row of the dataset is then a d-dimensional row vector. The class to which each fish belongs is given as its type (y1, y2, and so on).
● We can represent our dataset mathematically as shown above.
At timestamp 0.15
● In the above snapshot, f1 and f2 are the axes, x1 and x2 are vectors, and θ is the angle between the vectors.
At timestamp 4.26
● The dot product is an operation between two vectors and is denoted x1·x2, as shown above.
● To compute the dot product geometrically, we need the lengths of the two vectors and cos θ, where θ is the angle between them: x1·x2 = ||x1|| ||x2|| cos θ. Alternatively, if we have the coordinates of x1 and x2 (two coordinates each here, since we are in 2D, but the same concept extends to n dimensions), we can compute the dot product by multiplying the corresponding components and adding the results, as shown.
● Given two vectors and their coordinates, we can calculate cos θ as shown below: cos θ = (x1·x2) / (||x1|| ||x2||).
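Both views of the dot product can be checked numerically. The sketch below computes the component-wise dot product and then recovers cos θ from it; the names `dot` and `cos_angle` are made up for this example.

```python
import math

def dot(x1, x2):
    """Component-wise dot product of two vectors of the same dimension."""
    if len(x1) != len(x2):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(x1, x2))

def cos_angle(x1, x2):
    """cos(theta) = (x1 . x2) / (||x1|| * ||x2||) for non-zero vectors."""
    norms = math.sqrt(dot(x1, x1)) * math.sqrt(dot(x2, x2))
    if norms == 0:
        raise ValueError("angle is undefined for a zero vector")
    return dot(x1, x2) / norms

x1, x2 = [1.0, 0.0], [1.0, 1.0]
print(dot(x1, x2))                                    # 1.0
print(math.degrees(math.acos(cos_angle(x1, x2))))     # roughly 45 degrees
```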
At timestamp 9.26
● Two vectors that are perpendicular to each other (the angle between them is 90 degrees) are called orthogonal vectors.
● The dot product of two perpendicular vectors is 0, because cos 90° = 0.
● Conversely, if the dot product of two non-zero vectors is 0, we can conclude that they are orthogonal.
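An orthogonality test follows directly from the dot product. The helper name `is_orthogonal` and the tolerance `tol` are assumptions for this sketch; a small tolerance is used because floating-point dot products are rarely exactly 0.

```python
def is_orthogonal(x1, x2, tol=1e-9):
    """Return True when the dot product of x1 and x2 is (numerically) zero."""
    if len(x1) != len(x2):
        raise ValueError("vectors must have the same dimension")
    return abs(sum(a * b for a, b in zip(x1, x2))) <= tol

print(is_orthogonal([1.0, 0.0], [0.0, 5.0]))             # True: the two axes
print(is_orthogonal([1.0, 1.0], [1.0, 0.0]))             # False: 45 degrees apart
print(is_orthogonal([1.0, 2.0, 3.0], [3.0, 0.0, -1.0]))  # True: 3 + 0 - 3 = 0
```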
● Let's consider a plane passing through the origin, with the equation shown above. This plane can live in a space of any dimension: if d = 2 it is a line, if d = 3 it is a plane, and if d ≥ 4 it is a hyperplane.
● Since the plane passes through the origin, we can substitute x = 0 into the equation, which gives w0 = 0. In other words, whenever a plane passes through the origin, w0 = 0.
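● w is perpendicular to the plane, as shown above: every point x on the plane satisfies w·x = 0, so w is orthogonal to every vector lying in the plane.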
● By convention, whenever we use a normal vector to a plane we typically take it to be a unit vector, although this is not necessary.
● When the length of w is 1, we refer to it as the unit normal vector to the plane.
● For a plane not passing through the origin, the equation of the plane is as shown above, and w is the normal vector to the plane. Let's consider a point x on the plane (ox is the vector from the origin to x, and w is our normal vector).
● Now we assume w to be a unit vector and take the dot product of the vectors x and w, as shown above, where cos θ is not equal to 0. Here w0 is not equal to 0, since the plane does not pass through the origin.
● We can calculate cos θ as shown above. With ||w|| = 1 we have w·x = ||x|| cos θ = a, and since x lies on the plane, w·x + w0 = 0, so a = -w0.
● a is the shortest distance from the origin to the plane. A distance cannot be negative (the sign only indicates direction), so the distance from the origin to the plane is abs(w0).
At timestamp 17.11 in video
● For a plane that does not pass through the origin and whose normal w is not a unit vector, a can be written as shown above: a = -w0 / ||w||.
● Given the equation of a plane, w is the vector normal to the plane, and abs(w0) / ||w|| is the distance from the origin to the plane (simply abs(w0) when w is a unit vector).
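The distance from the origin to a plane w·x + w0 = 0 can be sketched as abs(w0) / ||w||, which reduces to abs(w0) when w is a unit vector. The function name `origin_distance` and the sample planes are assumptions for illustration.

```python
import math

def origin_distance(w, w0):
    """Shortest distance from the origin to the plane w.x + w0 = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0:
        raise ValueError("w must be a non-zero normal vector")
    return abs(w0) / norm

# w = (3, 4) has length 5, so the line 3*x1 + 4*x2 - 10 = 0 is 2 units away.
print(origin_distance([3.0, 4.0], -10.0))   # 2.0

# Unit normal w = (0, 1): the line x2 - 3 = 0 is abs(w0) = 3 units away.
print(origin_distance([0.0, 1.0], -3.0))    # 3.0
```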
2.12 Half spaces & classification using a
plane
At timestamp 0.42 in video
● Now we use the concept of half-spaces to actually classify points in d-dimensional space.
● Consider a plane in d-dimensional space with normal vector w, as shown above. In 2D, the line divides the space into two halves: everything above the line and everything below it (the positive half and the negative half). Likewise, a plane separates 3D space into two regions.
● The space on the side of the plane that w points towards is the positive half-space, and the space on the opposite side is the negative half-space.
At timestamp 4.40 in video
● Let's consider three points: x1 below the plane, x2 above the plane, and x3 on the plane.
● As shown above, for x2 and all other points above the plane, w·x2 + w0 > 0; for points below the plane the value is less than 0; and for points lying on the plane it is 0.
● Let's understand each scenario one by one.
● As shown above, we have our plane, its normal vector w, and a point x1 below the plane in the negative half-space.
● We extend the vector x1 until it touches the plane and call the point where it does so x1'. Now we have two vectors, x1 and x1'.
● As shown above, the magnitude of x1' is greater than the magnitude of x1.
● We know that w·x1' + w0 = 0, since x1' lies on the plane. From the above equations it is clear that w·x1 + w0 < 0.
At timestamp 8.50 in video
● We now consider a point x2 in the positive half-space. Let's call the point at which the vector [...] that w·x2 + w0 > 0.
At timestamp 11.53 in video
● To summarize, we perform classification with the plane based on the above conditions.
● When we use a plane to classify a point in d-dimensional space, we determine whether the point lies above or below the plane.
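The half-space rule can be written as a small classifier: compute w·x + w0 and look at its sign. The function name `half_space` and the return values +1, -1, 0 are conventions chosen for this sketch, not something fixed by the video.

```python
def half_space(w, x, w0):
    """Return +1 for the positive half-space, -1 for the negative one, 0 on the plane."""
    value = sum(wi * xi for wi, xi in zip(w, x)) + w0
    if value > 0:
        return 1
    if value < 0:
        return -1
    return 0

# Hypothetical model: the line x2 - 1 = 0, with normal w = (0, 1) pointing upwards.
w, w0 = [0.0, 1.0], -1.0
print(half_space(w, [0.0, 3.0], w0))   # 1: above the line
print(half_space(w, [0.0, 0.0], w0))   # -1: below the line
print(half_space(w, [5.0, 1.0], w0))   # 0: on the line
```

In the fish example, the two half-spaces would correspond to the two fish types.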
2.13 Distance of a point from a plane
● Given any point, we find which side of the plane it belongs to; it is also important to know how far it is from the plane.
● Given a plane and a point x, we know the normal vector w and the distance from the origin to the plane, as shown above. Let's find the distance from the point x to the plane.
● The distance from the plane to the point x is the length of the line segment that passes through x and is perpendicular to the plane (d, as shown above).
At timestamp 14.12 in video
● From the above diagram we can say that ɸ = 90° - θ.
● By subtracting a from the length of vector x we can get b as shown.
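For reference, the distance from a point x to the plane w·x + w0 = 0 is commonly written as (w·x + w0) / ||w||, where the sign tells us which half-space x is in and the absolute value is the distance. The sketch below uses that standard formula; it may be organised differently from the geometric derivation in the video, and the name `signed_distance` is hypothetical.

```python
import math

def signed_distance(w, x, w0):
    """Signed distance from point x to the plane w.x + w0 = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0:
        raise ValueError("w must be a non-zero normal vector")
    return (sum(wi * xi for wi, xi in zip(w, x)) + w0) / norm

w, w0 = [3.0, 4.0], -10.0
print(signed_distance(w, [6.0, 8.0], w0))   # 8.0: positive side, 8 units away
print(signed_distance(w, [0.0, 0.0], w0))   # -2.0: negative side, 2 units away
```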
2.14 Projection & arithmetic operations on
vectors
At timestamp 0.40 in video
● Suppose we have a vector W whose length need not be 1. A vector of length 1 that points in the same direction as W is called a unit vector.
● Given a vector W, we can obtain the unit vector in the same direction as shown above, by dividing W by its length: W / ||W||.
● Suppose we have two data points x1 and x2. To project the vector x1 onto x2, we drop a perpendicular from x1 onto x2 and call the resulting vector x1', as shown above. The direction of x1' is the same as that of x2.
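A short sketch of both operations follows. It assumes the usual formulas, W / ||W|| for the unit vector and ((x1·x2) / (x2·x2)) x2 for the projection of x1 onto x2; the function names `unit_vector` and `project` are made up for this example.

```python
import math

def unit_vector(w):
    """Return the vector of length 1 that points in the same direction as w."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0:
        raise ValueError("the zero vector has no direction")
    return [wi / norm for wi in w]

def project(x1, x2):
    """Return x1', the projection of x1 onto x2 (x2 must be non-zero)."""
    denom = sum(b * b for b in x2)
    if denom == 0:
        raise ValueError("cannot project onto the zero vector")
    scale = sum(a * b for a, b in zip(x1, x2)) / denom
    return [scale * b for b in x2]

print(unit_vector([3.0, 4.0]))            # [0.6, 0.8]
print(project([2.0, 3.0], [4.0, 0.0]))    # [2.0, 0.0]
```

In the second example x2 lies along the first axis, so projecting x1 simply keeps its first coordinate.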
● Whenever we have a data point, we draw a line from the origin to the point and call it a vector. But vectors can start anywhere, like the yellow line shown above, and such a vector cannot be interpreted as a data point.
● Vector addition and subtraction can be done in d dimensions as well, as shown.
● Note that for vector addition, subtraction, and the dot product, the vectors must have the same dimension.
hypersphere
At timestamp 0.37 in video
● A circle in 2D with radius r and centre c at the origin can be represented as shown above (x1² + x2² = r²). To represent any circle we need its centre c and its radius r.
● [...] three points x1, x2, and x3 lying outside, on, and inside the sphere respectively. Typically we use inside/outside as the mechanism for determining which class a point belongs to.
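● To determine where a data point lies, we calculate the length of the vector from the centre c to the data point and check the conditions shown above: the point is inside if this length is less than r, on the sphere if it equals r, and outside if it is greater than r.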
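The inside/on/outside check works the same way for a circle, a sphere, or a hypersphere. In the sketch below, the function name `locate`, the string labels, and the tolerance `tol` used for the "on" case are all assumptions made for illustration.

```python
import math

def locate(x, c, r, tol=1e-9):
    """Say whether point x is inside, on, or outside the hypersphere (centre c, radius r)."""
    dist = math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))
    if abs(dist - r) <= tol:
        return "on"
    return "inside" if dist < r else "outside"

centre, radius = [0.0, 0.0], 5.0
print(locate([6.0, 0.0], centre, radius))   # outside
print(locate([3.0, 4.0], centre, radius))   # on
print(locate([1.0, 1.0], centre, radius))   # inside
```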
2.17 Ellipse, Ellipsoid, Hyper-Ellipsoid
At timestamp 1.18 in video
● We represent an ellipse centred at the origin as shown above. Here we call the x-axis the major axis and the y-axis the minor axis.
● We can extend the same concept to 3D and higher dimensions, as shown above.
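As a sketch, assume the standard form of an origin-centred ellipse, x²/a² + y²/b² = 1, extended to d dimensions with one semi-axis length per coordinate. The function name `ellipsoid_value` and the semi-axis values are illustrative; a result below 1 means the point is inside, exactly 1 means it is on the boundary, and above 1 means it is outside.

```python
def ellipsoid_value(x, semi_axes):
    """Return sum((x_i / a_i)^2) for an origin-centred (hyper-)ellipsoid."""
    if len(x) != len(semi_axes):
        raise ValueError("x and semi_axes must have the same dimension")
    return sum((xi / ai) ** 2 for xi, ai in zip(x, semi_axes))

axes_2d = [2.0, 1.0]   # semi-major axis 2 along x, semi-minor axis 1 along y
print(ellipsoid_value([2.0, 0.0], axes_2d))   # 1.0: on the ellipse
print(ellipsoid_value([1.0, 0.5], axes_2d))   # 0.5: inside
print(ellipsoid_value([2.0, 1.0], axes_2d))   # 2.0: outside
```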
2.18 Squares, Rectangles, Hypercubes and
Hyper-cuboids
At timestamp 2.19 in video
● A point lies inside the rectangle/square if and only if its x-coordinate lies between the x-coordinates of the given bottom-right and top-left corners of the rectangle and its y-coordinate lies between the y-coordinates of the given bottom-right and top-left corners.
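The coordinate-by-coordinate check can be sketched for any number of dimensions, which covers rectangles, cuboids, and hyper-cuboids alike. The function name `inside_box` is hypothetical, the two corners may be given in either order, and points on the boundary are counted as inside (an assumption of this sketch).

```python
def inside_box(x, corner_a, corner_b):
    """Return True when every coordinate of x lies between the two opposite corners."""
    return all(min(a, b) <= xi <= max(a, b)
               for xi, a, b in zip(x, corner_a, corner_b))

top_left, bottom_right = [0.0, 4.0], [6.0, 0.0]
print(inside_box([3.0, 2.0], top_left, bottom_right))   # True
print(inside_box([7.0, 2.0], top_left, bottom_right))   # False: x is out of range

# The same check for a unit cube in 3D.
print(inside_box([0.5, 0.5, 0.5], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]))   # True
```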