
2.1 - 2.18 Foundational Math - Co-Ordinate Geometry and Linear Algebra

The document discusses using linear algebra concepts like lines, planes, circles and ellipses for machine learning classification problems. It explains how geometric shapes can be used as models to classify data points into categories and determine if a new query point belongs to one class or another based on the model and its distance from the data points.

Uploaded by Mohit Saini
Copyright © All Rights Reserved
2.1 Introduction

● In this chapter we will learn the basics of linear algebra (lines, planes, vectors, matrices, etc.).

● We will discuss concepts that are relevant to ML/AI and apply them to real-world problems.

2.2 Real-world ML classification problem: Sorting Fish in a Factory

Below is an example shown at timestamp 2.28 in the video.

● Sorting fish can be treated as a classification problem in which we have to decide whether a given fish is of type 1 or type 2. Done by hand, it is clearly a labour-intensive task.
● We try to automate the task by identifying features of the fish, such as length, width and weight, and using these features to classify each fish as type 1 or type 2.
● Our task here is a binary classification task since we have two types of fish, but it can be extended to multiclass classification as well.
● Similar classification tasks are performed across manufacturing plants to identify faulty vs. good products.

2.3 Dataset & Plotting

At timestamp 0.30 in the video

Below is an example shown at timestamp 2.02 in the video.

● The problem we are about to solve is a binary classification problem, and we use the features of the fish for classification.
● We have a dataset (D) in which each row describes the length, width and type of a fish. We call the information about each fish a datapoint; it is collected manually.
● The yellow data points represent fish of type 1 and the green data points represent fish of type 2.

At timestamp 3.31 in the video

● Given the dataset D, we are going to build (train) a machine learning model (M).

At timestamp 6.03 in the video

● Assume we have built model M. Consider a fish with features length = 12 and width = 6; we want to find whether it is of type 1 or type 2. We call this a query datapoint, and we can visualize it as shown.
● Feature 1 (length) and feature 2 (width) are plotted along the X and Y axes respectively, so each pair (f1, f2) gives the data about one fish.
● Since we have 2 features, we can visualize the entire dataset in 2D using a scatter plot, and we can add a legend to the plot to distinguish the two types of data points.

2.4 Lines, Circles and Ellipses for ML Classification

At timestamp 3.18 in video

● You can clearly observe that the pink line more or less separates the two types of fish. Let's say that the line is our model. It separates the yellow points from the green points fairly well, except for the two misclassified points; the majority of the points are classified correctly.

At timestamp 6.13

● Now that we have a line as our model, we can take a query point and determine whether it is a type 1 or type 2 fish. The closer the query point is to the line, the less certain the model is about its type; the farther the point is from the line, the more confident the model is about the type of fish.
● Finding the classifier or model amounts to finding a good line that separates the points.
● We classify a query point based on two things:

1. Which side of the line the point lies on
2. The distance of the point from the line

At timestamp 6.13

At timestamp 10.32

● The model can also be a circle: we classify a data point based on whether it lies inside or outside the circle, which we decide by calculating the distance of the point from the centre of the circle.
● The closer a point is to the centre, the more certain we are about the type of fish, and vice versa.

● We can also have a model that is represented using two circles.
● Every model has pros and cons; for example:

1. What if a point lies exactly on the line or circle?
2. What if the point lies in the region where the circles overlap, or outside both circles?
● Similarly, we can also use an ellipse for our binary classification problem.

2.5 Planes, Spheres and Ellipsoids for ML Classification

At timestamp 1.53 in the video

● In the same way that we visualize a datapoint in 2D, we can visualize a datapoint in 3D space when we have 3 features, represented along the x, y and z axes respectively.
● We can also visualize the entire dataset in 3D space.

At timestamp 8.20 in video

● We cannot use a line to classify data points in 3D space; we need a plane to separate them. A plane is the simplest model in 3D.
● A plane separates 3D space into two regions. Given a query point, we find the distance of the point from the plane and which side of the plane the point lies on.
● This is similar to what we discussed for a line in 2D.

At timestamp 9.43 in video

● Just as we extended the concept of a line in 2D to a plane in 3D, we can extend circles in 2D to spheres in 3D. Everything inside the sphere is one class of data points and everything outside is another class. How close a point is to the centre of the sphere determines how certain we are that it belongs to a particular class.
● Similarly, we can extend an ellipse in 2D to an ellipsoid in 3D for classification.

At timestamp 13.2 min

At timestamp 14.34 min

● To summarize, we use planes, spheres and ellipsoids in 3D space, and when we have more than 3 features we extend the same concepts to higher dimensions. However, we cannot visualize data in higher dimensions as we did in 2D and 3D, so we use linear algebra to understand it mathematically. In higher dimensions these shapes are called hyperplanes, hyperspheres and hyperellipsoids.

2.6 Equation of a line


At timestamp 0.22

● The slope-intercept form of a line is y = m.x + c, where m is the slope and c is the y-intercept.
● The y-intercept is the value of y at which the line intersects the y-axis.

At timestamp 2.32

● The line shown above crosses the y-axis c units from the origin, hence for this line the y-intercept is c. Algebraically, at x = 0 we have y = c.

At timestamp 4.5 min

● The slope m is the tangent of the angle θ that the line makes with the x-axis; you can calculate it as m = tanθ = a/b, as shown above.

At timestamp 7.02

● A given line has exactly one intercept c.
● If two distinct lines are parallel to each other, their slopes will be the same but their intercepts will differ.
● Given a pair of values m and c, there is exactly one such line.

At timestamp 8.29

● Two different lines can have the same intercept c, but then their slopes must be different, because each line makes a different angle with the x-axis.

At timestamp 12.31

● The general form of a line is a.x + b.y + c = 0 (x and y are the x and y coordinates). This equation is important because we use it to represent lines, planes and hyperplanes in ML.
● Given the general form of a line, we can easily convert it into slope-intercept form as shown above: slope m = -a/b and y-intercept = -c/b.

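As a quick check of the conversion m = -a/b and y-intercept = -c/b, here is a minimal Python sketch. The function name `general_to_slope_intercept` is hypothetical (not from the lecture), and it assumes b ≠ 0, since b = 0 gives a vertical line, which has no slope-intercept form.

```python
def general_to_slope_intercept(a: float, b: float, c: float) -> tuple[float, float]:
    """Convert a*x + b*y + c = 0 into y = m*x + intercept."""
    if b == 0:
        # b = 0 gives the vertical line x = -c/a, which has no finite slope
        raise ValueError("b must be non-zero")
    m = -a / b          # slope
    intercept = -c / b  # y-intercept: the value of y at x = 0
    return m, intercept


# Example: 2x + 4y - 8 = 0  ->  y = -0.5x + 2
m, intercept = general_to_slope_intercept(2, 4, -8)
print(m, intercept)  # -0.5 2.0
```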
At timestamp 16.13

● From now on, instead of using x and y for the x and y coordinates, we use x1 and x2, so that we can extend the same concept to n dimensions (x1, x2, x3, x4, …).
● Also, for the coefficients we are going to use w1, w2, w0 instead of a, b, c.
● To understand the geometry of lines, planes and hyperplanes in higher dimensions, we are going to write the equation as w1x1 + w2x2 + w0 = 0.

2.7 Equation of a plane & hyperplanes

At timestamp 2.10

● The equation of a plane is ax + by + cz + d = 0.
● Notice that the general form of a line in 2D and that of a plane in 3D are similar: just as a line separates 2D space into two regions, a plane separates 3D space into two regions. In 2D we have only 2 axes, while in 3D we have 3 axes.

At timestamp 3.1

● From now on we use the representation shown above for a plane, w1x1 + w2x2 + w3x3 + w0 = 0, following the same notation as for the line.

At timestamp 4.26

● Just as we have lines in 2D and planes in 3D, we can have hyperplanes in d-dimensional space (dD).
● A hyperplane in d dimensions separates the d-dimensional space into two regions.
● The equation of a hyperplane in dD, shown above, is w1x1 + w2x2 + … + wdxd + w0 = 0.

2.8 Equations using Linear Algebra

At timestamp 3.10

● Using vectors from linear algebra, we can write the equation of a plane compactly, as shown above.
● W is written as a row vector and X as a column vector, both of dimension d. When we perform matrix multiplication on these two vectors, we obtain the equation of the plane.

At timestamp 5.48

● In physics a vector has magnitude and direction; the concept of vectors arose in both physics and mathematics. Instead of the physics approach, we use the mathematical interpretation of vectors.
● In mathematics, a vector and a scalar are represented as shown above.

At timestamp 6.58

● A row vector and a column vector are represented as shown above. When you multiply a row vector by a column vector of the same dimension, the result is a scalar.
● By convention, when we say "vector" in mathematics we mean a column vector.

At timestamp 9.28

● W is a d-dimensional vector, written w ∈ R^d as shown above (R is the set of real numbers).

At timestamp 10.15

● By default every vector is a column vector; we convert a column vector into a row vector by taking its transpose, as shown above. Transpose is the operation that converts column vectors into row vectors and vice versa.

At timestamp 11.15

● Transpose can be applied to matrices as well, as shown above: the rows become columns and vice versa.
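To make the transpose concrete, here is a minimal Python sketch that stores vectors and matrices as nested lists of rows (an assumption for illustration; in practice a library such as NumPy provides this as the `.T` attribute). The function name `transpose` is our own.

```python
def transpose(matrix: list[list[float]]) -> list[list[float]]:
    """Swap rows and columns: the entry at (i, j) moves to (j, i)."""
    # zip(*matrix) groups the i-th element of every row, i.e. the i-th column
    return [list(row) for row in zip(*matrix)]


# A column vector stored as a 3x1 matrix becomes a 1x3 row vector
x = [[1], [2], [3]]
print(transpose(x))  # [[1, 2, 3]]

# A 2x3 matrix becomes a 3x2 matrix
A = [[1, 2, 3],
     [4, 5, 6]]
print(transpose(A))  # [[1, 4], [2, 5], [3, 6]]
```

Transposing twice returns the original matrix, which matches the "vice versa" above.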

● So we can now write the equation of the plane compactly as w^T x + w0 = 0, as shown above.
● Note that w and x must have the same dimension.

At timestamp 15.49

● In the above equation, since w and x are d-dimensional vectors, we get a line when d = 2, a plane when d = 3 and a hyperplane when d >= 4.
● It is one of the most elegant and widely used equations in machine learning.
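Because the plane equation w^T x + w0 = 0 is just a dot product plus a constant, one function covers lines, planes and hyperplanes alike. This is a minimal sketch with vectors as plain Python lists; the name `plane_value` is hypothetical.

```python
def plane_value(w: list[float], x: list[float], w0: float) -> float:
    """Compute w^T x + w0 for d-dimensional vectors w and x."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same dimension")
    # w^T x: row vector times column vector = sum of element-wise products
    return sum(wi * xi for wi, xi in zip(w, x)) + w0


# d = 2: the line x1 + x2 - 3 = 0; the point (1, 2) lies on it
print(plane_value([1, 1], [1, 2], -3))  # 0
# d = 4: the same function evaluates a hyperplane
print(plane_value([1, 0, 2, -1], [3, 1, 1, 2], 1))  # 4
```

A value of 0 means x lies on the line, plane or hyperplane.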

At timestamp 16.7

2.9 Dataset using vectors

At timestamp 2.11

● We can use vectors to represent our dataset. Suppose we describe a fish by two features (length, width). Then we can represent the fish as a vector of these two values, i.e. the fish is represented as a point. In general we can think of d-dimensional points as vectors; similarly, we can represent a fish with 3 features.
● For example, the point shown is 2 units along the x1 axis, 3 units along the x2 axis and 1 unit along the x3 axis, so it is a point in 3D space, and that point itself can be represented by the vector from the origin to that point.

At timestamp 4.51

● In physics, a vector is described by its magnitude and direction; in 2D we get the direction from the angle the vector makes with the x-axis.
● The magnitude is simply the length of the vector and can be calculated as shown above: for x = (x1, x2, …, xd), it is sqrt(x1² + x2² + … + xd²).

● The magnitude of a vector x is written ||x||, as shown above.
● A vector of magnitude (length) 1 is a unit vector.
● We can represent a vector as a point in d-dimensional space.

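The magnitude and the unit-vector test can be sketched in a few lines of Python. The helper names `norm` and `is_unit` are hypothetical, and the tolerance in `is_unit` is an assumption needed because floating-point lengths are rarely exactly 1.

```python
import math


def norm(x: list[float]) -> float:
    """Euclidean length ||x|| = sqrt(x1^2 + ... + xd^2)."""
    return math.sqrt(sum(xi * xi for xi in x))


def is_unit(x: list[float], tol: float = 1e-9) -> bool:
    """A unit vector has length 1 (checked up to a floating-point tolerance)."""
    return abs(norm(x) - 1.0) < tol


print(norm([3, 4]))         # 5.0
print(norm([2, 3, 1]))      # sqrt(14), about 3.742
print(is_unit([0.6, 0.8]))  # True
```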
At timestamp 11.45

● Suppose our fish dataset has 6 features describing each fish. Then each row of our dataset is a d-dimensional row vector (here d = 6). The class the fish belongs to is given by its type, y1 or y2, so here we have two classes.
● Here x and y together are the data related to each fish.
● Each x can be represented as a 6-dimensional vector.
At timestamp 11.53

● We can represent our dataset mathematically as shown above, as a set of pairs (xi, yi), where each xi is a feature vector and yi is its class label.
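In code, the set of pairs (xi, yi) maps naturally onto a list of (feature vector, label) tuples. This is a minimal sketch; the feature values and the label strings below are made up for illustration and are not the lecture's data.

```python
# Hypothetical fish dataset: each x_i has 6 features, each y_i is a class label
D = [
    ([12.0, 6.0, 1.5, 0.8, 3.2, 7.1], "type 1"),
    ([9.5, 4.0, 1.1, 0.6, 2.9, 5.4], "type 2"),
    ([11.8, 5.7, 1.4, 0.9, 3.0, 6.8], "type 1"),
]

d = 6
for x_i, y_i in D:
    # every feature vector must live in the same d-dimensional space
    assert len(x_i) == d

n = len(D)
print(n, d)  # 3 6
```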

2.10 Dot product and angle between vectors

At timestamp 0.15

● In the above snapshot, f1 and f2 are the axes, x1 and x2 are vectors, and θ is the angle between the vectors.

At timestamp 4.26

● The dot product is an operation between two vectors and is denoted x1.x2, as shown above.
● To compute the dot product of two vectors geometrically, we need the lengths of the two vectors and cosθ (θ is the angle between the vectors): x1.x2 = ||x1|| ||x2|| cosθ. Another way to compute it is from the coordinates: given vectors x1 and x2 (here each has two coordinates since we are in 2D, but the same idea extends to nD), we multiply corresponding coordinates and add the products, as shown.
● Given two vectors and their coordinates, we can calculate cosθ as shown below.

At timestamp 5.15 in video

At timestamp 7.57

● By equating the two expressions and rearranging terms as shown above, we get cosθ = x1.x2 / (||x1|| ||x2||). (Note that x1 and x2 must have the same dimension, because both have to lie in the same d-dimensional space.)
● What we learnt in 2D extends easily to higher-dimensional spaces.

At timestamp 9.26

● When two vectors are perpendicular to each other, they are called orthogonal vectors (the angle between them is 90 degrees).
● The dot product of two perpendicular vectors is 0, since cos 90° = 0.
● Conversely, if the dot product of two non-zero vectors is 0, we can conclude that they are orthogonal.
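The coordinate form of the dot product, the formula cosθ = x1.x2 / (||x1|| ||x2||) and the orthogonality test can be sketched together in Python. Function names are hypothetical; `is_orthogonal` uses a small tolerance (an assumption) because floating-point dot products are rarely exactly 0.

```python
import math


def dot(x1: list[float], x2: list[float]) -> float:
    """Coordinate form of the dot product: sum of element-wise products."""
    if len(x1) != len(x2):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(x1, x2))


def cos_angle(x1: list[float], x2: list[float]) -> float:
    """cos(theta) = x1.x2 / (||x1|| ||x2||), for non-zero vectors."""
    return dot(x1, x2) / (math.sqrt(dot(x1, x1)) * math.sqrt(dot(x2, x2)))


def is_orthogonal(x1: list[float], x2: list[float], tol: float = 1e-9) -> bool:
    """Non-zero vectors are orthogonal when their dot product is 0."""
    return abs(dot(x1, x2)) < tol


print(cos_angle([1, 0], [1, 1]))  # about 0.7071 (theta = 45 degrees)
print(math.degrees(math.acos(cos_angle([1, 0], [0, 2]))))  # 90.0
print(is_orthogonal([1, 2, 3], [3, 0, -1]))  # True
```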

2.11 Geometry of the Equation of a plane

At timestamp 0.41 in video

● Just as we understood the geometry of a line, we can understand a plane geometrically.

At timestamp 3.32 in video

● Let's consider a plane passing through the origin, with the equation shown above. This plane can be in any dimensional space: if d = 2 it's a line, if d = 3 it's a plane, and if d >= 4 it's a hyperplane.
● Since our plane passes through the origin, we can substitute x = 0 into its equation, which gives w0 = 0. So when a plane passes through the origin, w0 = 0.

At timestamp 4.40 in video

● For a plane passing through the origin, w0 = 0, so every point x on the plane satisfies w^T x = 0. Since the dot product of w and x is 0, they are perpendicular vectors (the angle between x and w is 90 degrees).
● Hence w is perpendicular to (the vector of) every point x on the plane.

At timestamp 5.50 in video

● So w is perpendicular to the plane itself, as shown above; it is called the normal vector to the plane.
● You can visualize w in 3D as shown above.

At timestamp 8.33 in video

● By convention, whenever we use a normal vector to a plane we typically use a unit vector, but this is not necessary.
● When the length of w is 1, we refer to it as the unit normal vector to the plane.

At timestamp 13.8 in video

● For a plane not passing through the origin, the equation of the plane is as shown above, with w the normal vector to the plane. Let's consider a point x on the plane (ox is the vector from the origin to x, and w is our normal vector).
● Now we assume w is a unit vector and compute the dot product of the vectors x and w, as shown above. This time cosθ is not 0, and w0 is not 0 since the plane does not pass through the origin.

At timestamp 15.8 in video

● We can calculate cosθ as shown above. Since ||w|| = 1, w^T x = ||x|| cosθ = a, and since x lies on the plane, w^T x = -w0; so we can clearly see that a = -w0.

At timestamp 16.26 in video

● a is the shortest distance from the origin to the plane, and a distance cannot be negative (the negative sign only indicates direction). So the distance from the origin to the plane is |w0|.
At timestamp 17.11 in video

For a plane not passing through the origin where w is not a unit vector, a can be written as a = -w0/||w||, as shown above, so the distance from the origin to the plane is |w0|/||w||.
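The distance from the origin to the plane w^T x + w0 = 0 is |w0|/||w||, which reduces to |w0| when w is a unit vector. A minimal Python sketch, with the hypothetical function name `origin_to_plane_distance`:

```python
import math


def origin_to_plane_distance(w: list[float], w0: float) -> float:
    """Distance from the origin to the plane w^T x + w0 = 0, i.e. |w0| / ||w||."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    if norm_w == 0:
        raise ValueError("w must be a non-zero normal vector")
    return abs(w0) / norm_w


print(origin_to_plane_distance([0, 1], -5))   # unit w: |w0| = 5.0
print(origin_to_plane_distance([3, 4], -10))  # 10 / 5 = 2.0
```

Scaling w and w0 by the same non-zero factor describes the same plane, and the formula returns the same distance, which is why dividing by ||w|| is needed.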

At timestamp 21.53 in video

● In the equation of a plane, w is the vector normal to the plane (drawn from the origin), and w0 determines the distance from the origin to the plane: |w0|/||w||, which is simply |w0| when w is a unit vector.

2.12 Half spaces & classification using a plane

At timestamp 0.42 in video

● Now we use the concept of half-spaces to actually classify points in d-dimensional space.
● Consider a plane in d-dimensional space with normal vector w, as shown above. In a 2D space, the line divides the whole space into two halves: everything above the line and everything below it (the +ve half and the -ve half). Likewise, a plane separates 3D space into two regions.
● The region on the side of the plane that w points towards is the positive half-space, and the region on the opposite side is the negative half-space.

At timestamp 4.40 in video

● Let's consider three points: x1 below the plane, x2 above the plane and x3 on the plane.
● As shown above, for x2 and all other points above the plane, w.x2 + w0 > 0; for points below the plane the value is less than 0, and for points lying on the plane it is exactly 0.
● Let's understand each scenario one by one.

At timestamp 7.27 in video

● As shown above, we have our plane, the normal vector w to the plane, and a point x1 below the plane, in the negative half-space.
● We extend the vector x1 until it touches the plane and call that point x1'. Now we have two vectors, x1 and x1'.
● As shown above, the magnitude of x1' is greater than the magnitude of x1.
● We know that w.x1' + w0 = 0 since x1' lies on the plane. From the above equations, it follows that w.x1 + w0 < 0.

At timestamp 8.50 in video

● We now consider a point x2 in the positive half-space, and call the point at which the vector x2 intersects the plane x2'. Now we have two vectors, x2 and x2'.
● From the above diagram, it's clear that the length of x2 is greater than the length of x2'.
● We know that w.x2' + w0 = 0 since x2' lies on the plane. From the above equations, it follows that w.x2 + w0 > 0.

At timestamp 11.53 in video

● To summarize, we classify points using the plane based on the above conditions: a point x lies in the positive half-space if w.x + w0 > 0, in the negative half-space if w.x + w0 < 0, and on the plane if w.x + w0 = 0.
● When we use a plane to separate points in d-dimensional space, we determine whether each point lies above or below the plane.
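The three sign conditions above translate directly into a classifier. This is a minimal Python sketch; the function name and the returned strings are hypothetical, and which half-space corresponds to which fish type is a modelling choice the sketch does not fix.

```python
def classify_half_space(w: list[float], w0: float, x: list[float]) -> str:
    """Return which side of the plane w.x + w0 = 0 the point x lies on."""
    value = sum(wi * xi for wi, xi in zip(w, x)) + w0
    if value > 0:
        return "positive"  # the side that w points towards
    if value < 0:
        return "negative"  # the opposite side
    return "on plane"


w, w0 = [1, 1], -3  # the line x1 + x2 - 3 = 0
print(classify_half_space(w, w0, [3, 2]))  # positive
print(classify_half_space(w, w0, [0, 1]))  # negative
print(classify_half_space(w, w0, [1, 2]))  # on plane
```

With real-valued features a value of exactly 0 is rare, so in practice the "on plane" case mostly matters for hand-picked points like the third example.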

2.13 Distance of a point from a plane

At timestamp 0.13 in video

● Given any point, we find which side of the plane it lies on, but it is also important to know how far away it is from the plane.
● Given a plane and a point x, we know the normal vector w and also the distance from the origin to the plane, as shown above. Let's find the distance from the point x to the plane.
● The distance from the point x to the plane is the length of the line segment that is perpendicular to the plane and passes through x (d, as shown above).

At timestamp 14.12 in video

● From the above diagram, we can say that ɸ = 90° - θ.

At timestamp 4.49 in video



● From the triangle shown above we can calculate a.


At timestamp 5.51 in video

● By subtracting a from the length of vector x we can get b as shown.

At timestamp 7.38 in video



● From the above triangle, we can calculate d = b cosθ.


● Substituting the value of b into the above equation gives d, as shown below. In general the distance is d = |w.x + w0| / ||w||, which reduces to |w.x + w0| when w is a unit vector.

At timestamp 8.53

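The standard point-to-plane distance d = |w.x + w0| / ||w|| combines the half-space value with a normalisation by the length of w. A minimal Python sketch; the name `point_to_plane_distance` is hypothetical.

```python
import math


def point_to_plane_distance(w: list[float], w0: float, x: list[float]) -> float:
    """Perpendicular distance |w.x + w0| / ||w|| from point x to the plane."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    value = sum(wi * xi for wi, xi in zip(w, x)) + w0
    return abs(value) / norm_w


# Plane 3x1 + 4x2 - 10 = 0, so ||w|| = 5
print(point_to_plane_distance([3, 4], -10, [4, 3]))  # |12 + 12 - 10| / 5 = 2.8
print(point_to_plane_distance([3, 4], -10, [0, 0]))  # 2.0, the origin's distance
```

Dropping `abs` gives a signed distance whose sign tells the half-space, so a single number carries both the side and the confidence discussed in section 2.4.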
2.14 Projection & arithmetic operations on vectors

At timestamp 0.40 in video

● Let's say we have a vector W whose length need not be 1. A vector of length 1 pointing in the same direction as W is called the unit vector in the direction of W.
● Given a vector W, we obtain this unit vector by dividing W by its length, Ŵ = W / ||W||, as shown above.
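Dividing each component by the vector's length gives the unit vector W / ||W||. A minimal Python sketch with the hypothetical name `unit_vector`; the zero vector is rejected because it has no direction.

```python
import math


def unit_vector(w: list[float]) -> list[float]:
    """Return w / ||w||: same direction as w, length 1."""
    length = math.sqrt(sum(wi * wi for wi in w))
    if length == 0:
        raise ValueError("the zero vector has no direction")
    return [wi / length for wi in w]


w_hat = unit_vector([3, 4])
print(w_hat)  # [0.6, 0.8]
```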

At timestamp 4.6 in video

● Let's say we have two data points x1 and x2. To project the vector x1 onto x2, we drop a perpendicular from x1 onto x2 and call the resulting vector x1', as shown above. When the angle between x1 and x2 is less than 90 degrees, x1' points in the same direction as x2.
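The projection x1' can be computed from dot products alone: x1' = (x1.x2 / ||x2||²) x2, whose length is ||x1|| cosθ. This is a minimal Python sketch of that standard formula; the function name `project` is hypothetical.

```python
import math


def project(x1: list[float], x2: list[float]) -> list[float]:
    """Project x1 onto x2: (x1.x2 / ||x2||^2) * x2."""
    d = sum(a * b for a, b in zip(x1, x2))
    sq_len = sum(b * b for b in x2)
    if sq_len == 0:
        raise ValueError("cannot project onto the zero vector")
    return [d / sq_len * b for b in x2]


x1, x2 = [2, 3], [4, 0]
x1_proj = project(x1, x2)
print(x1_proj)  # [2.0, 0.0]
# the length of the projection equals ||x1|| cos(theta) = x1.x2 / ||x2||
print(math.sqrt(sum(v * v for v in x1_proj)))  # 2.0
```

If the angle exceeds 90 degrees, the scalar factor is negative and x1' points opposite to x2.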

Whenever we have a datapoint, we draw a line from the origin to the point and call it a vector. But vectors can be located anywhere, like the yellow line shown above, and such a vector cannot be interpreted as a datapoint.

At timestamp 8.47 in video

● We can perform vector addition as shown above.

At timestamp 10.56 in video

● Vector addition and subtraction can be done in d dimensions as well, component by component, as shown.
● Note that for vector addition, subtraction and the dot product, the vectors must have the same dimension.
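Component-wise addition and subtraction, including the same-dimension requirement, can be sketched as follows. The helper names `add` and `subtract` are our own.

```python
def add(x1: list[float], x2: list[float]) -> list[float]:
    """Component-wise vector addition."""
    if len(x1) != len(x2):
        raise ValueError("vectors must have the same dimension")
    return [a + b for a, b in zip(x1, x2)]


def subtract(x1: list[float], x2: list[float]) -> list[float]:
    """Component-wise vector subtraction x1 - x2."""
    if len(x1) != len(x2):
        raise ValueError("vectors must have the same dimension")
    return [a - b for a, b in zip(x1, x2)]


print(add([1, 2, 3, 4], [4, 3, 2, 1]))       # [5, 5, 5, 5]
print(subtract([1, 2, 3, 4], [4, 3, 2, 1]))  # [-3, -1, 1, 3]
```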

2.15 Equations of circle, sphere & hypersphere
At timestamp 0.37 in video

● A circle in 2D with radius r centred at the origin (c = 0) can be represented as shown above, x1² + x2² = r². To represent any circle we need its centre c and its radius r.

At timestamp 4.52 in video

● From the definition of a circle, the distance from the centre of the circle to any point x on it is always the same, namely r.
● Using vector algebra, we derived cx = ox - oc.

At timestamp 5.33

● We already know that the length of cx is the radius r. The coordinates of ox are (x1, x2) and those of oc are (c1, c2). By component-wise subtraction of the two vectors, we get the equation of the circle in terms of the radius r, as shown: (x1 - c1)² + (x2 - c2)² = r².
● Whatever we learnt in 2D can be extended to higher dimensions as well.


2.16 Classification using hypersphere

At timestamp 1.04 in video

● Let's understand how we classify our data using spheres and hyperspheres. Assume three points x1, x2, x3 lying outside, on and inside the sphere respectively. Typically, we use inside/outside as the mechanism to decide which class a point belongs to.
● To determine where a datapoint lies, we calculate the length of the vector from the centre c to the point and check the conditions shown above: the point is outside if ||x - c|| > r, on the sphere if ||x - c|| = r, and inside if ||x - c|| < r.
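The inside/on/outside test follows the hypersphere equation (x1 - c1)² + … + (xd - cd)² = r². This minimal Python sketch compares the squared distance with r², which avoids a square root and gives the same answer; the name `sphere_position` is hypothetical.

```python
def sphere_position(x: list[float], c: list[float], r: float) -> str:
    """Compare ||x - c|| with the radius r of the hypersphere centred at c."""
    # squared distance avoids a square root; comparing with r^2 is equivalent
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    if sq_dist > r * r:
        return "outside"
    if sq_dist < r * r:
        return "inside"
    return "on"


c, r = [1, 1, 1], 2.0
print(sphere_position([4, 1, 1], c, r))  # outside (distance 3)
print(sphere_position([1, 3, 1], c, r))  # on (distance 2)
print(sphere_position([1, 1, 2], c, r))  # inside (distance 1)
```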

2.17 Ellipse, Ellipsoid, Hyper-Ellipsoid
At timestamp 1.18 in video

● We represent an ellipse centred at the origin as shown above. Here the major axis lies along the x axis and the minor axis along the y axis.
● We can extend the same concept to 3D and higher dimensions as shown above.
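Assuming the standard axis-aligned form centred at the origin, x1²/a1² + x2²/a2² + … + xd²/ad² = 1 with semi-axis lengths ai, the same inside/on/outside idea used for hyperspheres applies: compare the left-hand side with 1. This is a minimal Python sketch; the function names and the semi-axis values are illustrative assumptions.

```python
def ellipsoid_value(x: list[float], semi_axes: list[float]) -> float:
    """Evaluate x1^2/a1^2 + ... + xd^2/ad^2 for an axis-aligned ellipsoid at the origin."""
    return sum((xi / ai) ** 2 for xi, ai in zip(x, semi_axes))


def ellipsoid_position(x: list[float], semi_axes: list[float]) -> str:
    value = ellipsoid_value(x, semi_axes)
    if value > 1:
        return "outside"
    if value < 1:
        return "inside"
    return "on"


a, b = 4.0, 2.0  # semi-major axis along x1, semi-minor axis along x2
print(ellipsoid_position([4, 0], [a, b]))  # on
print(ellipsoid_position([1, 1], [a, b]))  # inside (1/16 + 1/4 < 1)
print(ellipsoid_position([0, 3], [a, b]))  # outside (9/4 > 1)
```

With all semi-axes equal to r, this reduces to the hypersphere test centred at the origin.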

2.18 Squares, Rectangles, Hypercubes and Hyper-cuboids

At timestamp 1.42 in video

● We can represent a square using 4 lines in 2D such that every pair of lines that intersect are perpendicular, every pair that do not intersect are parallel, and all four sides have the same length.
● A rectangle is similar to a square, except that only opposite sides need to have the same length; all four sides need not be equal.

At timestamp 2.19 in video

● We can extend the same concept to higher dimensions as well.
● A point lies inside an axis-aligned rectangle/square if and only if its x-coordinate lies between the x-coordinates of the given top-left and bottom-right corners, and its y-coordinate lies between the y-coordinates of those corners.
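The corner-based rule for an axis-aligned rectangle generalizes to a hyper-cuboid by checking every coordinate against its lower and upper bound. This is a minimal Python sketch; describing the box by `lower` and `upper` corner lists is an assumption (for a rectangle given by its top-left and bottom-right corners, take the smaller and larger value in each coordinate), and points on the boundary are counted as inside.

```python
def in_box(x: list[float], lower: list[float], upper: list[float]) -> bool:
    """True if lower_i <= x_i <= upper_i in every dimension (boundary counts as inside)."""
    return all(lo <= xi <= hi for xi, lo, hi in zip(x, lower, upper))


# Rectangle with top-left corner (1, 5) and bottom-right corner (4, 2):
# x ranges over [1, 4] and y over [2, 5]
lower, upper = [1, 2], [4, 5]
print(in_box([2, 3], lower, upper))  # True
print(in_box([5, 3], lower, upper))  # False
# The same check works for a hyper-cuboid in d dimensions
print(in_box([0.5, 0.5, 0.5, 0.5], [0] * 4, [1] * 4))  # True
```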

