Vectors, Matrices & Linear Algebra Basics
Linear algebra forms the foundation of many areas in mathematics, science, and
engineering. This notebook introduces the core concepts of vectors and matrices,
covering their definitions, operations, geometric interpretations, and applications in
solving systems of linear equations. A strong grasp of these basics is essential for fields
like machine learning, physics, and computer graphics.
Definition of a Vector
A vector is a mathematical object that has both magnitude (size) and direction. It is
typically represented as a list of numbers, which are called the components or elements
of the vector.
$$\mathbf{v} = [v_1, v_2, v_3]$$
Types of Vectors
1. Row Vector: A row vector is a 1×n matrix (one row and multiple columns). It is
written horizontally.
Example:
$$\mathbf{v} = [v_1, v_2, v_3]$$
2. Column Vector: A column vector is an n×1 matrix (multiple rows and one column).
It is written vertically.
Example:
$$\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$$
Vector Operations
1. Vector Addition
Vector addition is the operation of adding two vectors by adding their corresponding
components. If you have two vectors ( \mathbf{v} = [v_1, v_2] ) and ( \mathbf{w} = [w_1,
w_2] ), their sum ( \mathbf{v} + \mathbf{w} ) is:
$$\mathbf{v} + \mathbf{w} = [v_1 + w_1, v_2 + w_2]$$
2. Scalar Multiplication
Scalar multiplication involves multiplying each component of a vector by a scalar (a
constant). If ( \mathbf{v} = [v_1, v_2] ) and ( c ) is a scalar, then the scalar multiplication ( c
\mathbf{v} ) is:
$$c\mathbf{v} = [c \cdot v_1, c \cdot v_2]$$
3. Dot Product
The dot product (or scalar product) of two vectors ( \mathbf{v} = [v_1, v_2] ) and (
\mathbf{w} = [w_1, w_2] ) is calculated as:
$$\mathbf{v} \cdot \mathbf{w} = v_1 w_1 + v_2 w_2$$
The dot product is a scalar value and gives a measure of how much one vector extends in
the direction of another.
Magnitude and Direction of Vectors
Every vector has a magnitude, which measures its length, and a direction, which is the
angle it makes with respect to a reference axis, typically the x-axis. Vectors can be
visualized as arrows pointing in the direction of the vector's components.
1. Magnitude of a Vector
The magnitude of a vector is the length of the arrow that represents it.
For example, in 2D space, the magnitude ( |\mathbf{v}| ) of the vector ( \mathbf{v} = [v_1,
v_2] ) is given by:
$$|\mathbf{v}| = \sqrt{v_1^2 + v_2^2}$$
In 3D space, for ( \mathbf{v} = [v_1, v_2, v_3] ), this extends to:
$$|\mathbf{v}| = \sqrt{v_1^2 + v_2^2 + v_3^2}$$
2. Direction of a Vector
The direction of a vector is the angle it makes with a reference axis (typically the x-axis in
2D or the x-y plane in 3D). The direction is independent of the vector's magnitude.
In 2D, the angle ( \theta ) between the vector ( \mathbf{v} = [v_1, v_2] ) and the x-axis can
be found using the formula:
$$\theta = \tan^{-1}\left(\frac{v_2}{v_1}\right)$$
3. Unit Vector
A unit vector is a vector with a magnitude of 1; it captures the direction of the original
vector with its length normalized to 1. You can convert any vector ( \mathbf{v} ) to a unit
vector ( \hat{v} ) by dividing it by its magnitude:
$$\hat{\mathbf{v}} = \frac{\mathbf{v}}{|\mathbf{v}|}$$
This gives you a vector with the same direction as ( \mathbf{v} ), but with a magnitude of
1.
4. Vector Addition
The sum of two vectors ( \mathbf{v} ) and ( \mathbf{w} ) can be visualized geometrically
using the tip-to-tail method. This means placing the tail of the second vector (
\mathbf{w} ) at the tip of the first vector ( \mathbf{v} ). The resulting vector is the vector
from the tail of ( \mathbf{v} ) to the tip of ( \mathbf{w} ).
$$\mathbf{v} = [v_1, v_2], \quad \mathbf{w} = [w_1, w_2]$$
$$\mathbf{v} + \mathbf{w} = [v_1 + w_1, v_2 + w_2]$$
5. Scalar Multiplication
When you multiply a vector by a scalar, you stretch or shrink the vector. If the scalar is
positive, the direction remains the same, but if the scalar is negative, the vector reverses
direction.
For example, multiplying a vector ( \mathbf{v} = [v_1, v_2] ) by a scalar ( c ) gives the
vector:
$$c\mathbf{v} = [c \cdot v_1, c \cdot v_2]$$
If ( c > 1 ), the vector becomes longer; if ( 0 < c < 1 ), it shrinks; and if ( c < 0 ), the
vector flips direction.
6. Dot Product
The dot product of two vectors measures how much one vector extends in the direction
of the other. It is defined as:
$$\mathbf{v} \cdot \mathbf{w} = |\mathbf{v}|\,|\mathbf{w}|\cos(\theta)$$
where ( \theta ) is the angle between the two vectors.
Visual Example:
Consider the 2D vector ( \mathbf{v} = [3, 4] ). This vector can be represented as an arrow
from the origin (0,0) to the point (3,4).
$$|\mathbf{v}| = \sqrt{3^2 + 4^2} = 5$$
Direction of the vector ( \mathbf{v} ) is the angle ( \theta ) it makes with the x-axis:
$$\theta = \tan^{-1}\left(\frac{4}{3}\right) \approx 53.13^\circ$$
Example in Code:
In [1]: import numpy as np

# Defining vectors
v = np.array([2, 3])
w = np.array([1, 4])

# Vector addition
addition = v + w

# Scalar multiplication
scalar_mult = 2 * v

# Dot product
dot_product = np.dot(v, w)

# Magnitude of a vector
magnitude_v = np.linalg.norm(v)

# Unit vector of v
unit_vector_v = v / magnitude_v

print("Vector Addition:", addition)
print("Scalar Multiplication:", scalar_mult)
print("Dot Product:", dot_product)
print("Magnitude of v:", magnitude_v)
print("Unit Vector of v:", unit_vector_v)
Vector Addition: [3 7]
Scalar Multiplication: [4 6]
Dot Product: 14
Magnitude of v: 3.605551275463989
Unit Vector of v: [0.5547002 0.83205029]
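The direction (angle with the x-axis) is not computed in the cell above; a minimal sketch
using np.arctan2, which handles all four quadrants:

In [ ]: import numpy as np

v = np.array([3, 4])

# Angle with the positive x-axis, converted from radians to degrees
theta = np.degrees(np.arctan2(v[1], v[0]))
print("Direction of v:", theta)  # ≈ 53.13 for v = [3, 4]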
Example in Code (visualizing vector-scalar operations):
In [2]: import numpy as np
import matplotlib.pyplot as plt

# Vector and scalar used in all plots
v = np.array([2, 3])
scalar = 2

# Operations
v_add = v + scalar
v_sub = v - scalar
v_mult = v * scalar
v_div = v / scalar

results = [("v", v), ("v + 2", v_add), ("v - 2", v_sub), ("v * 2", v_mult), ("v / 2", v_div)]

# Create a figure with only 5 subplots (2x3 grid without the last one)
fig, axs = plt.subplots(2, 3, figsize=(15, 10))
fig.delaxes(axs[1, 2])

# Plot the original vector (red) on each subplot, with the result (blue) on top
for ax, (title, res) in zip(axs.flat, results):
    ax.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1, color="r")
    ax.quiver(0, 0, res[0], res[1], angles='xy', scale_units='xy', scale=1, color="b")
    ax.set_xlim(-2, 8)
    ax.set_ylim(-2, 8)
    ax.grid(True)
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    ax.set_title(title)

plt.show()
Matrices
Definition of a Matrix
A matrix is a two-dimensional array of numbers arranged in rows and columns. Each
number in the matrix is called an element. Matrices are usually represented by
uppercase letters like A, B, etc.
Example of a 2 × 3 matrix:
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$$
Types of Matrices
| Type | Description | Example |
| --- | --- | --- |
| Column Matrix | A single column (n×1) | $\begin{bmatrix} 1 \\ 3 \end{bmatrix}$ |
| Diagonal Matrix | Non-diagonal elements are zero | $\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$ |
| Identity Matrix | Diagonal elements = 1, others = 0 | $I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ |
| Upper Triangular Matrix | All elements below main diagonal are 0 | $\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}$ |
| Lower Triangular Matrix | All elements above main diagonal are 0 | $\begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}$ |
| Symmetric Matrix | $A = A^\top$ | $\begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}$ |
| Skew-Symmetric Matrix | $A = -A^\top$ | $\begin{bmatrix} 0 & 2 \\ -2 & 0 \end{bmatrix}$ |
Matrix Operations
1. Matrix Addition
Matrices must have the same dimensions.
Add corresponding elements.
Example:
$$A + B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}$$
2. Matrix Multiplication
The number of columns of the first matrix must match the number of rows of the
second matrix.
Row-by-column multiplication.
Example:
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 5 \\ 6 \end{bmatrix}$$
$$A \times B = \begin{bmatrix} (1)(5) + (2)(6) \\ (3)(5) + (4)(6) \end{bmatrix} = \begin{bmatrix} 17 \\ 39 \end{bmatrix}$$
3. Transpose of a Matrix
Rows become columns, columns become rows.
Example:
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \Rightarrow A^\top = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}$$
4. Inverse of a Matrix
Exists only for square matrices where det(A) ≠ 0.
$$A^{-1} \times A = I$$
For a 2×2 matrix ( A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} ):
$$A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$
Here ( I ) is the identity matrix:
$$I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
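These matrix operations can be checked with NumPy. A minimal sketch reusing the example
matrices from above:

In [ ]: import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A + B)                 # matrix addition: [[ 6  8] [10 12]]
print(A @ np.array([5, 6]))  # matrix multiplication: [17 39]
print(A.T)                   # transpose: [[1 3] [2 4]]
A_inv = np.linalg.inv(A)     # inverse; requires det(A) != 0
print(A_inv @ A)             # identity, up to floating-point error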
Properties of Matrix Operations
Distributive Property:
$$A(B + C) = AB + AC$$
Conclusion
Matrices are fundamental structures in linear algebra that help in solving systems of
equations, performing transformations, machine learning algorithms, graphics, and much
more. A solid understanding of matrix types, operations, and properties builds the
foundation for more advanced topics like Linear Regression, PCA, Neural Networks, etc.
Rank of a Matrix
The rank of a matrix is the maximum number of linearly independent rows or
columns.
It represents the dimension of the vector space spanned by its rows/columns.
Rank gives insight into whether a system of equations has a unique solution.
Consider:
$$B = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 3 \\ 4 & 5 & 6 \end{bmatrix}$$
Let's check:
Row 1: (1, 0, 2)
Row 2: (0, 1, 3)
Row 3: (4, 5, 6)
If we try to write Row 3 as a combination of Rows 1 and 2:
$$a \cdot (1, 0, 2) + b \cdot (0, 1, 3) = (4, 5, 6)$$
Expanding and matching components gives the equations:
a = 4
b = 5
2a + 3b = 6
Substituting a = 4, b = 5 into the third equation:
$$2(4) + 3(5) = 8 + 15 = 23 \neq 6$$
✅ Therefore, all rows are independent, and the rank of the matrix is 3 (full rank).
Now consider:
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$$
Let's check if the third row is a linear combination of the first two.
Suppose Row 3 is a linear combination of Rows 1 and 2:
$$a \cdot (1, 2, 3) + b \cdot (4, 5, 6) = (7, 8, 9)$$
Matching components gives the equations:
1a + 4b = 7
2a + 5b = 8
3a + 6b = 9
From the first equation, a = 7 − 4b. Substituting into the second:
2(7 − 4b) + 5b = 8 ⟹ 14 − 3b = 8 ⟹ b = 2
Then:
a = 7 − 4(2) = −1
The third equation checks out: 3(−1) + 6(2) = 9. Thus:
Row 3 = −1 · Row 1 + 2 · Row 2
Conclusion:
The rows are not fully independent.
Hence, the rank of the matrix is 2, not 3 (the matrix is not full rank).
Let's take:
$$B = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 7 & 8 & 9 \end{bmatrix}$$
Here, Row 2 is a multiple of Row 1, so it adds no new information.
Thus:
Only two rows are linearly independent (Row 1 and Row 3).
Hence, rank = 2.
Quick check:
$$2 \times (1, 2, 3) = (2, 4, 6)$$
✅ Conclusion:
Rank counts only independent rows.
If a row is a multiple or linear combination of others, it does not add to the rank.
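The ranks worked out above can be verified with np.linalg.matrix_rank, which computes the
rank numerically via the SVD:

In [ ]: import numpy as np

B1 = np.array([[1, 0, 2], [0, 1, 3], [4, 5, 6]])  # all rows independent
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])   # Row 3 = -Row 1 + 2*Row 2
B2 = np.array([[1, 2, 3], [2, 4, 6], [7, 8, 9]])  # Row 2 = 2*Row 1

print(np.linalg.matrix_rank(B1))  # 3 (full rank)
print(np.linalg.matrix_rank(A))   # 2
print(np.linalg.matrix_rank(B2))  # 2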
Matrix-Vector Multiplication
Multiplying an m×n matrix ( A ) by an n-dimensional vector ( \mathbf{x} ) produces an
m-dimensional vector: each entry of ( A\mathbf{x} ) is the dot product of a row of ( A )
with ( \mathbf{x} ).
Example:
Let
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} 7 \\ 8 \end{bmatrix}$$
Then,
$$A\mathbf{x} = \begin{bmatrix} (1)(7) + (2)(8) \\ (3)(7) + (4)(8) \\ (5)(7) + (6)(8) \end{bmatrix} = \begin{bmatrix} 7 + 16 \\ 21 + 32 \\ 35 + 48 \end{bmatrix} = \begin{bmatrix} 23 \\ 53 \\ 83 \end{bmatrix}$$
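The same product in NumPy, as a quick check of the worked example:

In [ ]: import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
x = np.array([7, 8])
print(A @ x)  # [23 53 83]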
Matrix-vector form also gives a compact way to write a system of linear equations.
Example:
$$2x + 3y = 5$$
$$4x + y = 6$$
In matrix form:
$$\begin{bmatrix} 2 & 3 \\ 4 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 5 \\ 6 \end{bmatrix}$$
If ( A ) is invertible, the solution is:
$$\mathbf{x} = A^{-1} \mathbf{b}$$
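A minimal sketch of solving this system in NumPy. np.linalg.solve is generally preferred
over computing ( A^{-1} ) explicitly, since it is cheaper and numerically more stable:

In [ ]: import numpy as np

A = np.array([[2, 3], [4, 1]])
b = np.array([5, 6])

x = np.linalg.solve(A, b)  # solves Ax = b without forming A^-1
print(x)  # [1.3 0.8]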
Applications:
Representing and solving systems of linear equations
Linear transformations (e.g., scaling and rotation in graphics)
Computations in optimization and machine learning
Conclusion
Matrix-vector multiplication builds the bridge between linear algebra and real-world
problem solving by providing a compact way to represent and compute linear systems,
transformations, and optimizations.
[Figure: two panels titled "Matrix Rows and Vector" and "Resultant Vector", visualizing the rows of A and the vector x alongside the resulting vector Ax.]
Application: Linear Regression
A straight line in 2D can be written as:
$$y = mx + b$$
Where:
( m ) is the slope of the line,
( b ) is the y-intercept.
More generally, a system of two linear equations in two unknowns has the form:
$$a_1 x + b_1 y = c_1$$
$$a_2 x + b_2 y = c_2$$
Where:
( a_i ), ( b_i ) are the coefficients and ( c_i ) are the constants.
In matrix form, the system becomes:
$$A\mathbf{x} = \mathbf{b}$$
Where:
$$A = \begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$$
Simple linear regression models the relationship between ( x ) and ( y ) as:
$$y = \beta_0 + \beta_1 x + \epsilon$$
Where:
( \beta_0 ) is the intercept,
( \beta_1 ) is the slope,
( \epsilon ) is the error term.
The residual for the ( i )-th data point is:
$$r_i = y_i - (\beta_0 + \beta_1 x_i)$$
Where:
( y_i ) is the observed value and ( \beta_0 + \beta_1 x_i ) is the model's prediction.
Least squares chooses ( \beta_0 ) and ( \beta_1 ) to minimize the sum of squared residuals:
$$E(\beta_0, \beta_1) = \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} \left( y_i - (\beta_0 + \beta_1 x_i) \right)^2$$
In matrix form, the regression model is:
$$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$$
Where:
( \mathbf{y} ) is the vector of observed values,
( X ) is the design matrix
$$X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix},$$
( \boldsymbol{\beta} ) is the vector of coefficients ( [\beta_0, \beta_1]^T ),
( \mathbf{\epsilon} ) is the error term.
The least-squares solution is given by the normal equation:
$$\boldsymbol{\beta} = (X^T X)^{-1} X^T \mathbf{y}$$
This gives us the values of ( \beta_0 ) and ( \beta_1 ) that minimize the sum of squared
residuals.
Gradient Descent
An alternative to the normal equation is gradient descent, an iterative optimization
method. The core idea is to update the parameters of the model (in our case, ( \beta_0 )
and ( \beta_1 )) in the direction of the steepest descent of the cost function ( J ), which
tells us how far off our predictions are from the actual values.
The gradient of the cost function with respect to the parameters ( \beta_0 ) and ( \beta_1 )
is calculated to determine the direction of the steepest slope, and the parameters are
updated accordingly:
$$\beta_0 := \beta_0 - \alpha \frac{\partial J}{\partial \beta_0}$$
$$\beta_1 := \beta_1 - \alpha \frac{\partial J}{\partial \beta_1}$$
Where:
( \alpha ) is the learning rate (a small positive value that controls the step size of
each update),
( \frac{\partial J}{\partial \beta_0} ) and ( \frac{\partial J}{\partial \beta_1} ) are the
partial derivatives of the cost function with respect to ( \beta_0 ) and ( \beta_1 ),
respectively.
By iteratively applying these update rules, gradient descent moves toward the values of (
\beta_0 ) and ( \beta_1 ) that minimize the cost function, thereby finding the best fit line
for the data.
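A minimal gradient-descent sketch for simple linear regression. The data below is
synthetic and assumed purely for illustration, and the cost ( J ) is taken to be the mean
squared error:

In [ ]: import numpy as np

# Synthetic data (assumed for illustration): y ≈ 1 + 2x plus noise
rng = np.random.default_rng(0)
x = np.linspace(0, 5, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, size=x.shape)

b0, b1 = 0.0, 0.0  # initial parameters
alpha = 0.01       # learning rate

for _ in range(5000):
    err = (b0 + b1 * x) - y         # prediction errors
    grad_b0 = 2 * err.mean()        # dJ/d(beta0) for J = mean(err**2)
    grad_b1 = 2 * (err * x).mean()  # dJ/d(beta1)
    b0 -= alpha * grad_b0
    b1 -= alpha * grad_b1

print(b0, b1)  # should approach roughly 1 and 2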
The Normal Equation
Recall the regression model in matrix form:
$$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\epsilon}$$
The normal equation solves for the coefficients directly:
$$\boldsymbol{\beta} = (X^T X)^{-1} X^T \mathbf{y}$$
The normal equation provides an exact solution for the coefficients, minimizing the sum
of squared residuals between the predicted values and the actual values.
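A short NumPy sketch of the normal equation on small illustrative data (the x and y values
here are assumed):

In [ ]: import numpy as np

# Assumed example data
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 9.1])

# Design matrix: a column of ones (intercept) next to x
X = np.column_stack([np.ones_like(x), x])

# beta = (X^T X)^(-1) X^T y
beta = np.linalg.inv(X.T @ X) @ X.T @ y
print(beta)  # [beta0, beta1]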
The Mean Squared Error (MSE) measures the average of the squared
differences between the actual and predicted values. It gives an idea of how well
the model fits the data.
The formula for MSE is:
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
Where:
( y_i ) are the actual values,
( \hat{y}_i ) are the predicted values,
( n ) is the number of observations.
The ( R^2 ) score measures the proportion of the variance in the dependent
variable that is explained by the independent variables.
The formula for ( R^2 ) is:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
Where:
( \hat{y}_i ) are the predicted values and ( \bar{y} ) is the mean of the actual values.
A value closer to 1 indicates that the model explains a large portion of the
variance in the dependent variable.
A value closer to 0 indicates that the model explains very little of the variance.
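Both metrics in NumPy, on small assumed values for illustration:

In [ ]: import numpy as np

# Assumed actual and predicted values
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.1, 7.3, 8.9])

mse = np.mean((y_true - y_pred) ** 2)
r2 = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)
print("MSE:", mse)
print("R^2:", r2)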