Part I
Chapter 1
Introduction to Python Programming
Python is a high-level, interpreted programming language known for its simplicity, readability, and
flexibility[138]. It was created by Guido van Rossum and first released in 1991[273]. Python has since
become one of the most popular programming languages in the world, used in various domains such
as web development, scientific computing, artificial intelligence, and, most importantly for us, mathe-
matical computing. Python is open-source, meaning it’s free to use, and has a large community that
contributes to its development and creates powerful libraries for every imaginable use case.
Python’s simplicity makes it ideal for beginners, while its extensive libraries and scalability appeal
to experienced programmers. Python’s syntax is designed to be readable, reducing the complexity for
those new to programming.
Python’s name comes from “Monty Python’s Flying Circus,”[132] a British sketch comedy show that its
creator, Guido van Rossum, enjoyed. He wanted to create a language that was easy and fun to use,
while still being powerful enough to solve complex problems. Python has evolved through multiple
versions, with Python 2 and Python 3 being the most notable. Python 3 is the current standard, offering
many improvements over Python 2.
Guido van Rossum released Python 1.0 in 1994, and since then, Python has continued to grow[272].
Python 3, released in 2008, introduced many changes that were not backward-compatible with Python
2, leading to a gradual shift in adoption by the developer community.
• Readability and Simplicity: Python’s syntax is clear and easy to understand, making it ideal for
beginners.
• Powerful Libraries: Python boasts a rich ecosystem of libraries like NumPy, SciPy, SymPy, and
Matplotlib, which are essential for mathematical computations and data visualization.
• Interpreted Language: Python is an interpreted language, meaning you can write and execute
code line by line, making it easier to debug and experiment with.
• Versatility: Python is not only great for mathematics but also for data analysis, machine learning,
and artificial intelligence, making it a multipurpose tool in various fields.
3. Run the installer and ensure you check the box labeled “Add Python to PATH.”
4. Follow the installation instructions, and once completed, you can verify the installation by open-
ing a command prompt and typing:
python --version
For macOS or Linux: Most versions of macOS and Linux come with Python pre-installed. You can
check the version of Python installed by running:
python3 --version
Setting Up a Python IDE: You can write Python code using any text editor, but it is more efficient
to use an Integrated Development Environment (IDE).
1 x = 10 # An integer variable
2 y = 3.14 # A float variable
3 name = "Alice" # A string variable
4 is_happy = True # A boolean variable
• x is an integer.
• y is a floating-point number.
• name is a string.
• + (Addition)
• - (Subtraction)
• * (Multiplication)
• / (Division)
• % (Modulus)
Example:
1 a = 5
2 b = 3
3 sum = a + b # Adds a and b
4 difference = a - b # Subtracts b from a
5 product = a * b # Multiplies a and b
6 quotient = a / b # Divides a by b
7 remainder = a % b # Finds remainder when a is divided by b
Example:
1 name = input("Enter your name: ")
2 print("Hello, " + name)
This code takes the user’s input and greets them by name.
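The "Hello, World!" program described next is the classic first example; as a minimal sketch it is a single line:

print("Hello, World!")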
This program outputs the text “Hello, World!” to the console. It is an excellent starting point because
it introduces the print() function and shows how to execute a basic Python program.
In this program:
• The numbers are converted to float type to allow for decimal values.
• The two numbers are added and the result is displayed using the print() function.
This program works similarly to the addition example, but it multiplies the two numbers instead of
adding them.
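A sketch of the two programs just described follows; the prompts and the use of input() are assumptions, but the float conversion and arithmetic match the description above.

# Adding two numbers entered by the user
num1 = float(input("Enter the first number: "))
num2 = float(input("Enter the second number: "))
print("The sum is:", num1 + num2)

# Multiplying two numbers entered by the user
num1 = float(input("Enter the first number: "))
num2 = float(input("Enter the second number: "))
print("The product is:", num1 * num2)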
Chapter 2
Fundamental Python Data Structures
2.1 Lists
A list is one of the most commonly used data structures in Python. A list is a collection of items
that can hold different types of elements, such as integers, floats, strings, or even other lists. Lists in
Python are ordered and mutable, meaning that their elements can be modified after the list is created.
To define a list in Python, you use square brackets [] and separate the elements with commas.
Here’s an example:
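A possible illustration (the values are arbitrary):

# A list of integers
my_list = [10, 20, 30, 40, 50]
print(my_list)  # Output: [10, 20, 30, 40, 50]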
You can also create a list that contains different data types:
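One possible illustration:

# A list mixing an integer, a float, a string, and another list
mixed_list = [1, 3.14, "hello", [2, 4]]
print(mixed_list)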
In Python, you can access individual elements of a list using indexing. The index of the first element in
a list is 0, the second element is at index 1, and so on. Negative indexing starts from the last element
with index -1.
You can also access multiple elements at once using slicing. The syntax for slicing is list[start:stop],
where start is the index where slicing starts, and stop is the index where slicing stops (but it does
not include the element at the stop index).
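As a sketch of indexing (assuming the same list used in the slicing example that follows):

# The list from the earlier example
my_list = [10, 20, 30, 40, 50]

# Indexing
print(my_list[0])   # Output: 10
print(my_list[2])   # Output: 30
print(my_list[-1])  # Output: 50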
1 # Slicing a list
2 print(my_list[1:3]) # Output: [20, 30]
3 print(my_list[:3]) # Output: [10, 20, 30]
4 print(my_list[2:]) # Output: [30, 40, 50]
1 my_list = [5, 2, 9, 1]
2
3 # Append
4 my_list.append(7)
5 print(my_list) # Output: [5, 2, 9, 1, 7]
6
7 # Pop
8 last_element = my_list.pop()
9 print(last_element) # Output: 7
10 print(my_list) # Output: [5, 2, 9, 1]
11
12 # Sort
13 my_list.sort()
14 print(my_list) # Output: [1, 2, 5, 9]
2.2 Tuples
1 # Defining a tuple
2 my_tuple = (1, 2, 3)
3 print(my_tuple) # Output: (1, 2, 3)
You can also define a tuple without parentheses, using just commas:
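For instance:

# Tuple packing without parentheses
another_tuple = 4, 5, 6
print(another_tuple)  # Output: (4, 5, 6)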
Although tuples are immutable, you can perform other operations like indexing and slicing, similar
to lists.
1 my_tuple = (5, 10, 15, 20)
2
3 # Accessing elements
4 print(my_tuple[1]) # Output: 10
5
6 # Slicing
7 print(my_tuple[:3]) # Output: (5, 10, 15)
2.3 Dictionaries
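A dictionary stores data as key-value pairs inside curly braces. A sketch consistent with the update example below (the initial age value is an assumption):

# Defining a dictionary
my_dict = {"name": "Alice", "age": 25, "city": "New York"}
print(my_dict["name"])  # Output: Alice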
To add or update a key-value pair, you can simply assign a value to the key.
1 # Updating dictionary
2 my_dict["age"] = 26
3 my_dict["email"] = "[email protected]"
4 print(my_dict)
5 # Output: {'name': 'Alice', 'age': 26, 'city': 'New York', 'email': '[email protected]'}
2.4 Sets
1 # Defining a set
2 my_set = {1, 2, 3, 4, 4, 5}
3 print(my_set) # Output: {1, 2, 3, 4, 5}
1 set_a = {1, 2, 3}
2 set_b = {3, 4, 5}
3
4 # Union
5 print(set_a | set_b) # Output: {1, 2, 3, 4, 5}
6
7 # Intersection
8 print(set_a & set_b) # Output: {3}
9
10 # Difference
11 print(set_a - set_b) # Output: {1, 2}
# Summing the elements of a list with the built-in sum() function
numbers = [1, 2, 3, 4, 5]
total = sum(numbers)
print(total)  # Output: 15
Chapter 3
Control Flow and Functions in Python

1 if condition_1:
2 # Block of code executed if condition_1 is True
3 elif condition_2:
4 # Block of code executed if condition_1 is False and condition_2 is True
5 else:
6 # Block of code executed if both condition_1 and condition_2 are False
• Python checks condition_1. If it evaluates to True, the code block under the if statement runs,
and the rest of the conditions are ignored.
• If condition_1 is False, Python checks condition_2. If it’s True, the code block under the elif
statement is executed.
• If both conditions are False, the code under the else block runs.
number = 10  # Example value

if number > 0:
    print("The number is positive.")
Loops are used to iterate over sequences (like lists, tuples, or strings) or execute a block of code
repeatedly as long as a condition is true.
There are two primary types of loops in Python:
for Loop
The for loop is typically used when you know the number of iterations ahead of time or when iterating
through a collection.
Here’s an example of using a for loop to print each element in a list:
numbers = [1, 2, 3, 4, 5]

for number in numbers:
    print(number)
while Loop
The while loop is used when you want to repeat a block of code as long as a condition remains true.
Example of a while loop:
count = 0

while count < 5:
    print(count)
    count += 1
In this example, the loop will run until the value of count reaches 5.
Factorial Calculation
A factorial of a number is the product of all integers from 1 up to that number. Here’s how you can
calculate the factorial of a number using a while loop:
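A sketch of such a loop (assuming the input is a non-negative integer n):

n = 5
factorial = 1
i = 1

while i <= n:
    factorial *= i
    i += 1

print(factorial)  # Output: 120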
1 def function_name(parameters):
2 # Function body
3 return value # optional
For example, let’s define a function that takes two numbers and returns their sum:
1 def add_numbers(a, b):
2 return a + b
You can then call the function by passing the appropriate arguments:
1 result = add_numbers(3, 4)
2 print(result) # Output: 7
def multiply_numbers(a, b):
    return a * b

result = multiply_numbers(6, 7)
print(result)  # Output: 42
If you don’t include a return statement, the function will return None.
1 multiply = lambda x, y: x * y
2 print(multiply(5, 4)) # Output: 20
Lambda functions are useful when you need a quick function for a short task.
def square_numbers(n):
    for i in range(n):
        yield i ** 2

# Create the generator and consume its values lazily
squares_gen = square_numbers(10)

for square in squares_gen:
    print(square)
The key advantage of generators is that they don’t store all values in memory at once. Instead, they
yield one value at a time, making them ideal for large data sets.
Chapter 4
Advanced Data Structures in Python
• Homogeneity: NumPy arrays can only store elements of the same data type, which makes them
more memory-efficient and faster compared to Python lists.
• Performance: Operations on NumPy arrays are optimized and vectorized, meaning they run sig-
nificantly faster than corresponding operations on Python lists.
• Multidimensional Support: While Python lists are inherently one-dimensional (though they can
store lists of lists), NumPy arrays are inherently multi-dimensional, supporting complex struc-
tures like matrices and tensors.
• Built-in Mathematical Functions: NumPy provides a wide range of built-in mathematical oper-
ations that work on entire arrays, including element-wise addition, multiplication, dot products,
and more.
Once installed, you can create NumPy arrays from Python lists or use NumPy’s built-in functions
to initialize arrays.
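For example, a minimal sketch of creating arrays from Python lists (the values are arbitrary):

import numpy as np

# One-dimensional array from a list
a = np.array([1, 2, 3, 4])

# Two-dimensional array (matrix) from nested lists
b = np.array([[1, 2], [3, 4]])

print(a)
print(b)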
NumPy provides several useful functions to create arrays initialized with specific values, such as zeros,
ones, or random numbers.
1. Initializing an array with zeros:
The numpy.zeros() function creates an array filled with zeros. The shape of the array is passed as
an argument.
import numpy as np

# Create a 3x3 array filled with zeros
zeros_array = np.zeros((3, 3))
print(zeros_array)
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
[[1. 1. 1. 1.]
[1. 1. 1. 1.]]
[[ 6 8]
[10 12]]
2. Array Multiplication:
[[ 5 12]
[21 32]]
2. Slicing:
NumPy arrays can also be sliced using the same syntax as Python lists. You can specify ranges of
rows and columns to extract subarrays.
[1 2]
[2 4]
3. Reshaping:
The reshape() function allows you to change the shape of an array without changing its data.
[[1 2 3]
[4 5 6]]
[[ 6 8]
[10 12]]
2. Element-wise Multiplication:
Element-wise multiplication is performed by using the * operator (note that this is different from the matrix product, which uses np.dot() or the @ operator).
1 matrix_product = matrix1 * matrix2
2 print(matrix_product)
2. Inverse of a Matrix:
To compute the inverse of a matrix, use the numpy.linalg.inv() function. Note that the matrix
must be square and invertible.
1 inverse_matrix = np.linalg.inv(matrix1)
2 print(inverse_matrix)
The dot product multiplies rows of the first matrix by columns of the second matrix and sums the
result.
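A sketch of the dot product for the matrices used above (assuming matrix1 and matrix2 are the 2x2 matrices defined earlier):

# Matrix (dot) product
dot_product = np.dot(matrix1, matrix2)
print(dot_product)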
Chapter 5
Fundamental Mathematical Operations in Python
1 # Addition
2 result_add = 5 + 3
3 print("Addition: 5 + 3 =", result_add)
4
5 # Subtraction
6 result_sub = 10 - 4
7 print("Subtraction: 10 - 4 =", result_sub)
8
9 # Multiplication
10 result_mul = 7 * 6
11 print("Multiplication: 7 * 6 =", result_mul)
12
13 # Division
14 result_div = 20 / 4
15 print("Division: 20 / 4 =", result_div)
Addition: 5 + 3 = 8
Subtraction: 10 - 4 = 6
Multiplication: 7 * 6 = 42
Division: 20 / 4 = 5.0
As seen, basic arithmetic operations are intuitive. Note that the division operation always returns
a floating-point result, even if both operands are integers.
import math

# Exponentiation
result_exp = 2 ** 3
print("Exponentiation: 2^3 =", result_exp)

# Logarithm (base e)
result_log = math.log(10)
print("Logarithm (base e): log(10) =", result_log)

# Logarithm (base 10)
result_log10 = math.log10(10)
print("Logarithm (base 10): log10(10) =", result_log10)
In this example, the exponentiation operation calculates powers, and the math.log() function is
used for natural logarithms (base e), while math.log10() is used for logarithms with base 10.
5 # Maximum value
6 result_max = max(3, 5, 2, 8)
7 print("Maximum value:", result_max)
8
9 # Minimum value
10 result_min = min(3, 5, 2, 8)
11 print("Minimum value:", result_min)
import numpy as np

# Define two vectors
vector_a = np.array([1, 2, 3])
vector_b = np.array([4, 5, 6])

# Vector addition
result_add = vector_a + vector_b
print("Vector addition:", result_add)

# Vector subtraction
result_sub = vector_a - vector_b
print("Vector subtraction:", result_sub)

# Scalar multiplication
result_scalar_mul = 2 * vector_a
print("Scalar multiplication:", result_scalar_mul)
Vector addition: [5 7 9]
Vector subtraction: [-3 -3 -3]
Scalar multiplication: [2 4 6]
# Define two matrices (example values)
matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])

# Matrix addition
result_add = matrix_a + matrix_b
print("Matrix addition:\n", result_add)

# Matrix subtraction
result_sub = matrix_a - matrix_b
print("Matrix subtraction:\n", result_sub)

# Element-wise multiplication
result_elementwise = matrix_a * matrix_b
print("Element-wise multiplication:\n", result_elementwise)

# Dot product (matrix multiplication)
result_dot = np.dot(matrix_a, matrix_b)
print("Dot product:\n", result_dot)
Note the difference between element-wise multiplication and dot product multiplication.
# Define a matrix (example values)
matrix_c = np.array([[4, 7], [2, 6]])

# Inverse of a matrix
matrix_inv = np.linalg.inv(matrix_c)
print("Inverse of the matrix:\n", matrix_inv)

# Determinant of a matrix
matrix_det = np.linalg.det(matrix_c)
print("Determinant of the matrix:", matrix_det)
In this example, np.linalg.inv() is used to compute the inverse of a matrix, and np.linalg.det()
is used to calculate the determinant.
# Define two 3-dimensional vectors
vector_a = np.array([1, 2, 3])
vector_b = np.array([4, 5, 6])

# Dot product
dot_product = np.dot(vector_a, vector_b)
print("Dot product:", dot_product)

# Cross product
cross_product = np.cross(vector_a, vector_b)
print("Cross product:", cross_product)
import numpy as np
import scipy.linalg as la

# Define a matrix for LU decomposition (example values)
matrix_d = np.array([[4, 3], [6, 3]])

# LU decomposition: matrix_d = P @ L @ U
p, l, u = la.lu(matrix_d)
print("L matrix:\n", l)
print("U matrix:\n", u)
Chapter 6
Advanced Mathematical Operations in Python

In this chapter, we will explore advanced mathematical operations using Python. These operations
form the backbone of mathematical computing in various fields such as engineering, physics, and
data science. We will introduce two powerful Python libraries: Scipy for numerical mathematics and
Sympy for symbolic mathematics.
• Optimization algorithms
• Signal processing
• Linear algebra
Example: Basic Use of Scipy for Integration Let’s start by performing numerical integration using
Scipy.
from scipy import integrate

# Define the function to integrate
def f(x):
    return x**2

# Numerically integrate f between 0 and 2
result, error = integrate.quad(f, 0, 2)
print("Result:", result)
print("Estimated error:", error)
In this example, we define a simple function f (x) = x2 and integrate it between 0 and 2 using scipy.integrate.quad.
The quad() function is designed for one-dimensional integrals and returns both the integral result and
an estimate of the error.
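The symbolic example referred to next is reproduced here as a sketch (the variable names are assumptions):

from sympy import symbols, diff

# Define a symbolic variable and expression
x = symbols('x')
f = x**2 + 2*x + 1

# Compute the derivative symbolically
f_prime = diff(f, x)
print(f_prime)  # 2*x + 2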
In this example, we define a symbolic expression f (x) = x2 + 2x + 1 and compute its derivative
using sympy.diff. This demonstrates how Sympy allows symbolic differentiation.
Scipy and Sympy approach the problem of differentiation differently: Scipy uses numerical methods, while Sympy works symbolically.
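A numerical sketch follows; the function and evaluation point are assumptions based on the text below, and note that scipy.misc.derivative has been deprecated in recent SciPy releases.

from scipy.misc import derivative

# Function to differentiate numerically (assumed to match the symbolic example below)
def f(x):
    return x**3 + x**2

# Estimate the derivative at x = 1 using a small increment dx
result = derivative(f, 1.0, dx=1e-6)
print(result)  # Approximately 5.0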
Here, derivative() estimates the derivative numerically at a given point (in this case, at x = 1) by
using a small increment value (dx).
Symbolic Differentiation with Sympy: Symbolic differentiation is more precise and allows us to
find the derivative as a formula rather than at a specific point.
from sympy import symbols, diff

# Define a symbolic variable
x = symbols('x')

# Define a function
f = x**3 + x**2

# Compute the symbolic derivative
f_prime = diff(f, x)
print(f_prime)  # 3*x**2 + 2*x
The result here will be the exact symbolic derivative of the function f (x) = x3 +x2 , which is 3x2 +2x.
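The numerical integration described next might look like the following sketch:

from scipy import integrate

# Define the function to integrate
def f(x):
    return x**3

# Integrate f between 0 and 2
result, error = integrate.quad(f, 0, 2)
print(result, error)  # 4.0 and an error estimate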
This code integrates the function f (x) = x3 over the range 0 to 2 and returns the result along with
an error estimate.
Symbolic Integration with Sympy: With Sympy, you can compute integrals symbolically, meaning
the result will be an exact mathematical expression rather than a numerical approximation.
from sympy import symbols, integrate

x = symbols('x')

# Define a function
f = x**3

# Compute the definite integral from 0 to 2
result = integrate(f, (x, 0, 2))
print(result)  # 4
In this example, we compute the exact symbolic integral of f(x) = x³ between 0 and 2, yielding the exact result 16/4 = 4.
from sympy import symbols, Eq, solve

x = symbols('x')

# Define an equation
equation = Eq(x**2 - 4, 0)

# Solve the equation
solutions = solve(equation, x)
print(solutions)  # [-2, 2]
This example solves the quadratic equation x2 − 4 = 0, and solve() returns both solutions: x = 2
and x = −2.
Calculating Limits with Sympy: Sympy also allows us to compute limits symbolically.
from sympy import symbols, limit

x = symbols('x')

# Define a function
f = (x**2 - 1) / (x - 1)

# Compute the limit as x approaches 1
result = limit(f, x, 1)
print(result)  # 2
This computes the limit of (x² − 1)/(x − 1) as x approaches 1, which gives 2.
from sympy import symbols, diff

x = symbols('x')

# Define a function
f = x**4

# Compute the second derivative
second_derivative = diff(f, x, 2)
print(second_derivative)  # 12*x**2
This example computes the second derivative of the function f (x) = x4 , resulting in 12x2 .
Example: Definite Integral
from sympy import symbols, integrate

x = symbols('x')

# Define a function
f = x**2

# Compute the definite integral from 0 to 2
result = integrate(f, (x, 0, 2))
print(result)  # 8/3
In the Fourier Transform
\[
F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt
\]
• t represents time.
• ω represents the angular frequency, and F(ω) describes the frequency content of f(t).
In simple terms, the Fourier Transform converts a time-domain signal into a sum of sinusoids
of different frequencies. Each sinusoid has a corresponding amplitude and phase, which together
describe how much of that particular frequency is present in the original signal.
import numpy as np
import matplotlib.pyplot as plt

# Sampling parameters
sampling_rate = 1000                          # Samples per second
T = 1.0 / sampling_rate                       # Sampling period
L = 1000                                      # Length of signal
t = np.linspace(0, L*T, L, endpoint=False)    # Time vector

# Example signal: two sinusoids (the frequencies and amplitudes are illustrative)
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

# Compute the FFT and the corresponding frequency axis
fft_values = np.fft.fft(signal)
frequencies = np.fft.fftfreq(L, d=T)

# Plot the magnitude spectrum (positive frequencies only)
plt.plot(frequencies[:L // 2], np.abs(fft_values)[:L // 2])
plt.xlabel('Frequency [Hz]')
plt.ylabel('Magnitude')
plt.title('Magnitude Spectrum')
plt.show()
In this example:
• The result is a complex-valued array where the magnitude gives the amplitude of each frequency
component in the signal.
• The numpy.fft.fftfreq function returns the corresponding frequencies for each element in the
Fourier Transform result.
# Create a noisy version of the signal (the noise level is illustrative)
noisy_signal = signal + 0.8 * np.random.randn(L)

# Compute the FFT of the noisy signal
noisy_fft = np.fft.fft(noisy_signal)
frequencies = np.fft.fftfreq(L, d=T)

# Apply a low-pass filter: zero out frequencies above the cutoff (illustrative value)
cutoff = 80  # Hz
noisy_fft[np.abs(frequencies) > cutoff] = 0

# Compute the inverse FFT to get the filtered signal back in time domain
filtered_signal = np.fft.ifft(noisy_fft)

# Plot the noisy and filtered signals
plt.plot(t, noisy_signal, label='Noisy signal')
plt.plot(t, filtered_signal.real, label='Filtered signal')
plt.xlabel('Time [s]')
plt.ylabel('Amplitude')
plt.legend()
plt.show()
In this example:
• After computing the FFT, we apply a low-pass filter by setting the Fourier coefficients of frequen-
cies higher than the cutoff frequency to zero.
• We use the inverse FFT (np.fft.ifft) to convert the filtered signal back into the time domain.
The Laplace Transform is a mathematical operation that transforms a function of time f (t) into a
function of a complex variable s[241]. It is used extensively in the analysis of linear time-invariant (LTI)
systems, especially in control systems and circuit analysis.
The Laplace Transform is defined as:
\[
F(s) = \int_{0}^{\infty} f(t)\, e^{-st}\, dt
\]
where f(t) is the time-domain function, s = σ + jω is a complex variable, and F(s) is the Laplace Transform of f(t).
The Laplace Transform is particularly useful because it converts differential equations into alge-
braic equations, making them easier to solve. It also provides insights into the stability and transient
behavior of systems.
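The symbolic computation referred to below can be sketched with Sympy as follows (the variable names are assumptions):

from sympy import symbols, exp, laplace_transform

# Define symbolic variables
t, s = symbols('t s', positive=True)

# Laplace Transform of e^(-t)
F = laplace_transform(exp(-t), t, s, noconds=True)
print(F)  # 1/(s + 1)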
In this example, we compute the Laplace Transform of e−t , which is a common function used in
control systems and signal processing. The result is the symbolic Laplace Transform.
The Z-Transform is essential for understanding the behavior of digital filters and systems, espe-
cially in applications like digital audio processing and communications systems.
Chapter 7
Object-Oriented Programming and Modularization in Python
class Car:
    # Constructor method to initialize the object
    def __init__(self, make, model, year):
        self.make = make
        self.model = model
        self.year = year

    # Method that prints information about the car
    def display_info(self):
        print(f"{self.year} {self.make} {self.model}")
• Car: This is the class name. By convention, class names in Python are written in CamelCase.
• self: This refers to the instance of the class. It allows access to the object’s attributes and
methods within the class. Every method in a class must have self as the first parameter.
• __init__: This is the constructor method that gets called when an object is instantiated. It
initializes the object with the provided parameters.
• display_info: This is a method that prints out information about the car. Methods inside classes
always have self as their first parameter.
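The object creation described next might look like the following sketch (the specific car details are arbitrary):

# Create an object (instance) of the Car class
my_car = Car("Toyota", "Corolla", 2020)

# Call the method that prints the car's details
my_car.display_info()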
In this example, we created an object called my_car from the Car class and accessed its display_info
method, which prints the car’s details.
Inheritance allows one class (the child class) to inherit attributes and methods from another class (the
parent class). This promotes code reuse and is a core concept of OOP.
Here is an example where ElectricCar inherits from the Car class:
class ElectricCar(Car):
    # Constructor for ElectricCar that extends Car
    def __init__(self, make, model, year, battery_size):
        # Calling the constructor of the parent class (Car)
        super().__init__(make, model, year)
        self.battery_size = battery_size

    # Overridden method that also reports the battery size
    def display_info(self):
        super().display_info()
        print(f"Battery size: {self.battery_size} kWh")
• ElectricCar is the child class, and it inherits from the Car class.
• The display_info method is overridden in the child class to provide additional information about
the battery size.
Polymorphism allows different classes to have methods with the same name but potentially differ-
ent behavior. For example, both Car and ElectricCar have a display_info method, but the behavior
differs based on the class.
In Python, methods can be classified into instance methods, class methods, and static methods:
• Instance Methods: These methods act on an instance of the class and have access to the in-
stance’s attributes. These are the most common type of methods and must take self as their
first parameter.
• Class Methods: These methods are called on the class itself rather than on an instance. They
are defined using the @classmethod decorator, and they take cls as their first parameter instead
of self.
• Static Methods: These methods neither modify the state of an object nor the state of the class.
They are defined using the @staticmethod decorator.
1 class MathOperations:
2 # Class method
3 @classmethod
4 def square(cls, x):
5 return x * x
6
7 # Static method
8 @staticmethod
9 def add(x, y):
10 return x + y
As your programs become larger, it’s essential to organize your code. Python provides an excellent
way to do this by using modules. A module is simply a Python file that contains related functions,
classes, or variables. You can then import this module into other Python files and reuse its code.
Suppose you have a file math_utils.py containing useful functions:
1 # math_utils.py
2 def add(a, b):
3 return a + b
4
You can import this module into another Python script like so:
1 # main.py
2 import math_utils
3
4 result = math_utils.add(5, 3)
5 print(result) # Output: 8
By organizing your code into modules, you make your programs more modular and easier to main-
tain.
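A package is simply a directory containing modules plus an __init__.py file. A minimal layout (the names here are illustrative) might be:

mypackage/
    __init__.py
    math_utils.py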
Now you can import the package and its modules like this:
1 # Importing from the package
2 from mypackage import math_utils
3
4 result = math_utils.add(10, 5)
5 print(result)
Packages make it easier to organize large projects and share reusable components.
Python also has an extensive ecosystem of third-party modules, which you can install via pip,
Python’s package manager. For example, to install and use the popular requests module for making
HTTP requests:
pip install requests
import requests

response = requests.get('https://api.github.com')
print(response.status_code)
• SciPy: Built on top of NumPy, SciPy provides additional functionality for optimization, integration,
interpolation, eigenvalue problems, and more.
• Matplotlib: This is a plotting library used to create static, interactive, and animated visualizations
in Python. It is highly flexible and widely used for data visualization.
import numpy as np
import matplotlib.pyplot as plt

# Generate data for a sine wave (the plotted function is illustrative)
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# Plot the results, labeling axes and title before displaying the plot
plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('sin(x)')
plt.title('Sine Wave')
plt.show()
In this example:
• Matplotlib was used to plot the results, and we labeled the axes and title before displaying the
plot.
Similarly, SciPy provides more specialized functions. For instance, to solve an integral using SciPy:
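A minimal sketch (the integrand matches the description that follows):

from scipy import integrate

# Integrate x^2 from 0 to 2
result, error = integrate.quad(lambda x: x**2, 0, 2)
print(result)  # Approximately 2.667 (i.e., 8/3)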
This example shows how to use SciPy to compute the integral of x2 from 0 to 2.
Chapter 8
Project: Implementing Advanced Mathematical Operations
In this chapter, we will apply advanced mathematical concepts using Python. The focus will be on
real-world applications, such as signal processing using Fourier transforms and matrix operations in
the context of neural networks. These projects are aimed at giving beginners hands-on experience
with complex mathematical operations, making the transition from theory to practice smoother.
import numpy as np
import matplotlib.pyplot as plt

# Sampling parameters
Fs = 500                       # Sampling frequency: 500 samples per second
t = np.arange(0, 2, 1 / Fs)    # Two seconds of samples (duration is an illustrative choice)

# A 5 Hz sine wave (the frequency referred to in the explanation below)
signal = np.sin(2 * np.pi * 5 * t)

# Compute the FFT and the corresponding frequency axis
fft_values = np.fft.fft(signal)
frequencies = np.fft.fftfreq(len(t), d=1 / Fs)

# Plot the time-domain signal and its magnitude spectrum
plt.subplot(2, 1, 1)
plt.plot(t, signal)
plt.title('Time Domain')

plt.subplot(2, 1, 2)
plt.plot(frequencies[:len(t) // 2], np.abs(fft_values)[:len(t) // 2])
plt.title('Frequency Domain')

plt.tight_layout()
plt.show()
In this example:
• The resulting frequency-domain signal is plotted, showing a peak at 5 Hz, which corresponds to
the frequency of the original sine wave.
• Sampling Frequency (Fs): This is the number of samples taken per second. In the example, we
sample the signal at 500 Hz[277].
• FFT Output: The output of FFT is complex numbers. To get the magnitude of the signal in the
frequency domain, we use the absolute value of the FFT result.
Convolution is a fundamental operation in signal processing and image processing. It involves com-
bining two signals to form a third signal. Convolution in the time domain can be computationally
expensive, especially for large signals. However, the Fourier Transform can be used to compute con-
volution efficiently[219].
Convolution Using Fourier Transforms
Using the Convolution Theorem[257], we know that convolution in the time domain is equivalent to
multiplication in the frequency domain. Here’s how we can implement convolution using FFT:
# Two example signals (the values are illustrative)
signal1 = np.array([1, 2, 3, 4])
signal2 = np.array([0.5, 1, 0.5])

# Zero-pad both signals to the length of the full convolution
n = len(signal1) + len(signal2) - 1
fft1 = np.fft.fft(signal1, n)
fft2 = np.fft.fft(signal2, n)

# Multiply in the frequency domain (equivalent to convolution in time)
fft_convolved = fft1 * fft2

# Compute the inverse FFT to get the convolved signal in time domain
convolved_signal = np.fft.ifft(fft_convolved)
print(convolved_signal.real)
This method is much faster than direct convolution for long signals. We first take the FFT of both
signals, multiply them in the frequency domain, and then use the inverse FFT to get the convolved
signal back in the time domain.
Neural networks are essentially a series of matrix operations. To demonstrate this, we will build a
simple feedforward neural network with one hidden layer using only NumPy.
Step-by-Step Guide to Building the Neural Network
We will create a neural network with an input layer, one hidden layer, and an output layer; the setup steps are sketched below, followed by forward propagation.
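Before forward propagation, the weights and the activation function have to be set up. The following is a minimal sketch; the layer sizes, the random seed, and the helper name sigmoid_derivative are assumptions chosen to match the example dataset used later.

import numpy as np

# Sigmoid activation and its derivative (expressed in terms of the activation)
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(a):
    return a * (1 - a)

# Assumed network dimensions: 3 inputs, 4 hidden units, 1 output
input_size, hidden_size, output_size = 3, 4, 1

# Randomly initialize weights and biases
np.random.seed(42)
W1 = np.random.randn(input_size, hidden_size)
b1 = np.zeros((1, hidden_size))
W2 = np.random.randn(hidden_size, output_size)
b2 = np.zeros((1, output_size))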
3. Forward Propagation
During forward propagation, the input is passed through the network to produce the output. This
involves several matrix multiplications and the application of activation functions.
# Forward propagation
def forward_propagation(X):
    # Input to hidden layer
    z1 = np.dot(X, W1) + b1
    a1 = sigmoid(z1)

    # Hidden layer to output layer
    z2 = np.dot(a1, W2) + b2
    a2 = sigmoid(z2)

    return a1, a2
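The training function called below combines forward propagation, backpropagation, and a simple gradient-descent update. The following sketch makes explicit assumptions: a squared-error loss, a fixed learning rate, and a fixed iteration count.

# Training with plain gradient descent (loss, learning rate, and iteration
# count are illustrative assumptions)
def train(X, y, learning_rate=0.5, iterations=10000):
    global W1, b1, W2, b2
    for _ in range(iterations):
        # Forward propagation
        a1, a2 = forward_propagation(X)

        # Backpropagation of the squared error
        error = a2 - y
        d_a2 = error * sigmoid_derivative(a2)
        d_a1 = np.dot(d_a2, W2.T) * sigmoid_derivative(a1)

        # Gradient-descent updates
        W2 -= learning_rate * np.dot(a1.T, d_a2)
        b2 -= learning_rate * np.sum(d_a2, axis=0, keepdims=True)
        W1 -= learning_rate * np.dot(X.T, d_a1)
        b1 -= learning_rate * np.sum(d_a1, axis=0, keepdims=True)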
# Example dataset
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0], [1], [1], [0]])

train(X, y)
• Forward Propagation: We compute the output layer’s predictions by passing the input through
the network.
• Backpropagation: We compute the gradients of the error with respect to the weights and update
them to minimize the error.
• Training Loop: The network is trained over many iterations, each time updating the weights to
reduce the error.
Each layer of the network computes y = σ(Wx + b), where W is a matrix of weights, x is the input vector, b is a bias vector, and σ is an activation function.
In deep learning, the optimization of these matrices through training is what enables the network
to learn from data.
We will use the scipy.signal module to represent and simulate control systems. Here’s an exam-
ple of how to define a transfer function and simulate a step response.
import numpy as np
import scipy.signal as signal
import matplotlib.pyplot as plt

# Define the transfer function G(s) = 1 / (s^2 + 2s + 1)
numerator = [1]
denominator = [1, 2, 1]
system = signal.TransferFunction(numerator, denominator)

# Simulate the step response
t, response = signal.step(system)

# Plot the step response
plt.plot(t, response)
plt.xlabel('Time [s]')
plt.ylabel('Response')
plt.title('Step Response')
plt.grid(True)
plt.show()
In this example, we define a second-order system represented by the transfer function G(s) = 1/(s² + 2s + 1), which describes a damped system. The step response simulates how the system responds to a unit step input, which is a common way to analyze system behavior.
The graph produced by this code will show the system’s response over time.
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

# Load an image and convert it to a grayscale NumPy array
# (the loading method and file name are placeholders)
image = Image.open('example.jpg').convert('L')
image_array = np.array(image)

# Compute the 2-D FFT and shift the zero frequency to the center
fft_image = np.fft.fft2(image_array)
fft_shifted = np.fft.fftshift(fft_image)

# Magnitude spectrum on a log scale for visualization
magnitude_spectrum = 20 * np.log(np.abs(fft_shifted) + 1)

plt.subplot(1, 2, 1)
plt.imshow(image_array, cmap='gray')
plt.title('Original Image')
plt.axis('off')

plt.subplot(1, 2, 2)
plt.imshow(magnitude_spectrum, cmap='gray')
plt.title('Magnitude Spectrum (Frequency Domain)')
plt.axis('off')

plt.show()
• Shifts the frequency components so that the low frequencies are at the center.
• Computes and plots the magnitude spectrum, which visualizes the frequency content of the
image.
The Fourier Transform provides insights into the frequency characteristics of the image, such as
identifying dominant patterns or filtering out noise.
Now, let’s process an audio signal and analyze it using the Short-Time Fourier Transform (STFT).
import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt

# Load an example audio file (the file name is a placeholder)
y, sr = librosa.load('audio_example.wav')

# Compute the Short-Time Fourier Transform (STFT)
stft_result = librosa.stft(y)
spectrogram = librosa.amplitude_to_db(np.abs(stft_result), ref=np.max)

# Visualize the spectrogram
librosa.display.specshow(spectrogram, sr=sr, x_axis='time', y_axis='hz')
plt.colorbar(format='%+2.0f dB')
plt.title('Spectrogram')
plt.xlabel('Time [s]')
plt.ylabel('Frequency [Hz]')
plt.show()
• Applies the Short-Time Fourier Transform (STFT) to the signal, which provides a time-frequency
representation.
• Visualizes the spectrogram, which shows how the frequency content of the signal changes over
time.
In deep learning, such frequency-domain representations are used for tasks like sound event detec-
tion and speech recognition, as they provide more meaningful features for machine learning models
compared to raw time-domain signals.
8.5 Summary
In this chapter, we have explored the fundamental mathematical operations in Python, including arith-
metic operations, vector and matrix operations, and linear algebra using numpy. We also delved into
more advanced topics such as the Laplace Transform’s applications in control systems and the Fourier
Transform’s applications in deep learning. These topics serve as a foundation for applying mathemat-
ical methods to real-world problems in engineering, control systems, and deep learning.
By understanding how to manipulate data in both the time and frequency domains, you gain pow-
erful tools for analyzing and solving complex problems, whether in control system design or deep
learning applications.
Chapter 9
Summary and Practice
This chapter will serve as a comprehensive review of what we have covered so far, including Python’s
basic data structures, mathematical functions, and essential libraries such as Scipy and Sympy. We will
also discuss how you can continue your learning journey in Python and scientific computing. Finally,
we will end with a project-based practice, where you will build your own mathematical function library,
applying the concepts you have learned.
1 # Example of a list
2 numbers = [1, 2, 3, 4, 5]
3 numbers.append(6) # Adding an element to the list
4 print(numbers)
2. Tuples: Tuples are immutable (cannot be modified after creation) and are used for storing fixed
collections of items.
1 # Example of a tuple
2 coordinates = (10, 20)
3 print(coordinates)
3. Dictionaries: Dictionaries store data in key-value pairs and are very useful when you want to
map one value to another.
1 # Example of a dictionary
2 student = {"name": "John", "age": 21}
3 print(student["name"])
4. Sets: Sets store unordered collections of unique elements.
1 # Example of a set
2 fruits = {"apple", "banana", "cherry"}
3 fruits.add("orange")
4 print(fruits)
1 a = 10
2 b = 5
3 sum_result = a + b
4 difference = a - b
5 print("Sum:", sum_result)
6 print("Difference:", difference)
1 product = a * b
2 quotient = a / b
3 print("Product:", product)
4 print("Quotient:", quotient)
from sympy import symbols, diff

x = symbols('x')

# Define a function
f = x**2 + 2*x + 1

# Compute its derivative symbolically (operation chosen for illustration)
f_prime = diff(f, x)
print(f_prime)  # 2*x + 2
We will create a file called mymath.py, which will contain all the mathematical functions.
# mymath.py

def add(a, b):
    return a + b

def subtract(a, b):
    return a - b

def multiply(a, b):
    return a * b

def divide(a, b):
    if b == 0:
        raise ValueError("Cannot divide by zero")
    return a / b
This file contains functions for basic operations like addition, subtraction, multiplication, and divi-
sion.
Now, let’s extend our library to include more advanced functions, such as solving quadratic equations
and performing integration using Scipy.
# mymath.py (extended)

import math
from scipy import integrate

def solve_quadratic(a, b, c):
    # Solve ax^2 + bx + c = 0 (returns None when the roots are not real)
    discriminant = b**2 - 4*a*c
    if discriminant < 0:
        return None
    root1 = (-b + math.sqrt(discriminant)) / (2*a)
    root2 = (-b - math.sqrt(discriminant)) / (2*a)
    return root1, root2

def integrate_function(f, lower, upper):
    # Numerically integrate f between lower and upper using Scipy
    result, error = integrate.quad(f, lower, upper)
    return result
Chapter 10
Introduction
• Linear Algebra[109]: Essential for understanding tensors, matrix operations, and vector spaces.
• Calculus[62]: Necessary for understanding how optimization works (e.g., gradient descent).
• Probability and Statistics[234]: Important for understanding how models handle uncertainty,
interpret data, and measure performance.
• Optimization[74]: Helps in tuning models to minimize loss functions and improve accuracy.
In this chapter, we will explore these mathematical concepts and how they relate to deep learning
through practical implementations in PyTorch and TensorFlow, two of the most widely used deep
learning frameworks.
• Multiplying inputs with weights in neural networks can be seen as matrix multiplication.
In deep learning, tensors are the generalization of matrices to higher dimensions. Tensors are the
core data structure, and their manipulation is crucial for training deep learning models.
• PyTorch: Known for its dynamic computational graph, PyTorch allows users to define models
and compute gradients on the fly, making it easier to debug and experiment.
• TensorFlow: TensorFlow uses a static computation graph by default, which is more efficient
for deployment but can be less intuitive during model development. However, TensorFlow 2.0
introduced eager execution, making it more user-friendly like PyTorch.
Both frameworks provide tools for handling tensor operations efficiently, which is essential for
working with deep learning models.
Chapter 11
Tensors: The Core Data Structure
Here $T_{i_1, i_2, \ldots, i_n}$ represents the elements of the tensor, and $n$ indicates the number of dimensions, or rank, of the tensor.
import torch

# Create a tensor from a Python list (intermediate examples are reconstructed)
tensor_from_list = torch.tensor([[1, 2], [3, 4]])
print(tensor_from_list)

# Create a tensor filled with random values
tensor_random = torch.rand(4, 4)
print(tensor_random)
import tensorflow as tf

# Create a tensor from a Python list
tensor_from_list = tf.constant([[1, 2], [3, 4]])
print(tensor_from_list)

# Create a tensor filled with random values
tensor_random = tf.random.uniform([4, 4])
print(tensor_random)
1 # PyTorch example
2 tensor = torch.rand(3, 4, 5)
3 print(tensor.shape) # Output: torch.Size([3, 4, 5])
4
5 # TensorFlow example
6 tensor = tf.random.uniform([3, 4, 5])
7 print(tensor.shape) # Output: (3, 4, 5)
The rank of a tensor refers to the number of dimensions it has. A scalar has rank 0, a vector has
rank 1, a matrix has rank 2, and so on.
1 # Tensor of zeros
2 tensor_zeros = torch.zeros(3, 3)
3
4 # Tensor of ones
5 tensor_ones = torch.ones(2, 2)
6
7 # Random tensor
8 tensor_random = torch.rand(4, 4)
Examples in TensorFlow:
1 # Tensor of zeros
2 tensor_zeros = tf.zeros([3, 3])
3
4 # Tensor of ones
5 tensor_ones = tf.ones([2, 2])
6
7 # Random tensor
8 tensor_random = tf.random.uniform([4, 4])
# PyTorch example
tensor = torch.rand(4, 4)
reshaped_tensor = tensor.view(2, 8)
print(reshaped_tensor)

# TensorFlow example
tensor = tf.random.uniform([4, 4])
reshaped_tensor = tf.reshape(tensor, [2, 8])
print(reshaped_tensor)
# PyTorch slicing
tensor = torch.rand(4, 4)
print(tensor[:2, :2])  # Extracts the first 2 rows and columns

# TensorFlow slicing
tensor = tf.random.uniform([4, 4])
print(tensor[:2, :2])  # Extracts the first 2 rows and columns
Example in TensorFlow:
1 # Adding a scalar to a tensor
2 tensor = tf.random.uniform([3, 3])
3 result = tensor + 5 # Broadcasting automatically adds 5 to each element
4 print(result)
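A corresponding PyTorch sketch (assuming the same shapes) behaves identically:

import torch

# Adding a scalar to a tensor; the scalar is broadcast to every element
tensor = torch.rand(3, 3)
result = tensor + 5
print(result)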
Broadcasting rules can be tricky at first, but they greatly simplify tensor operations when applied
correctly.
Chapter 12
Basic Arithmetic Operations
In this chapter, we will cover the fundamental arithmetic operations in Python. These operations form
the basis of all mathematical calculations and are essential for both beginner and advanced users.
Python makes it easy to perform these operations with both single values and larger data structures
like arrays.
Element-wise operations are those that are applied to each element of a data structure individually[136].
In Python, we can easily perform element-wise operations on arrays and lists using libraries like NumPy.
The basic arithmetic operations include addition, subtraction, multiplication, and division. These op-
erations can be performed on scalars (individual numbers) or element-wise on arrays.
Scalar Operations
Here’s how you can perform these basic arithmetic operations with individual numbers:
1 a = 10
2 b = 5
3
4 # Addition
5 print(a + b) # Output: 15
6
7 # Subtraction
8 print(a - b) # Output: 5
9
10 # Multiplication
11 print(a * b) # Output: 50
12
13 # Division
14 print(a / b) # Output: 2.0
To perform element-wise operations on arrays, we need to use the NumPy library, which is designed
for numerical computations.
1 import numpy as np
2
7 # Element-wise addition
8 print(arr1 + arr2) # Output: [5 7 9]
9
10 # Element-wise subtraction
11 print(arr1 - arr2) # Output: [-3 -3 -3]
12
13 # Element-wise multiplication
14 print(arr1 * arr2) # Output: [4 10 18]
15
16 # Element-wise division
17 print(arr1 / arr2) # Output: [0.25 0.4 0.5]
In the above example, operations are applied to each element of the arrays independently. This
feature makes Python highly efficient for numerical computations, especially with large datasets.
1 a = 10
2 b = 5
3
4 # Sum
5 print(a + b) # Output: 15
6
7 # Max
8 print(max(a, b)) # Output: 10
9
10 # Min
11 print(min(a, b)) # Output: 5
import numpy as np

# Define an array (values chosen so the outputs below hold)
arr = np.array([1, 2, 3, 4, 5])

# Sum of all elements
print(np.sum(arr))  # Output: 15

# Maximum value
print(np.max(arr))  # Output: 5

# Minimum value
print(np.min(arr))  # Output: 1
Reduction operations are essential when working with large datasets, where you often need a sum-
mary statistic or an aggregate measure.
Chapter 13
Matrix Operations
Matrices are an essential part of many mathematical fields, especially in linear algebra. Python, with
the help of libraries like NumPy, provides powerful tools for performing matrix operations easily and
efficiently.
import numpy as np

# Define two 2x2 matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Matrix multiplication
C = np.dot(A, B)
print(C)
Output:
\[
\begin{bmatrix} 1\cdot 5 + 2\cdot 7 & 1\cdot 6 + 2\cdot 8 \\ 3\cdot 5 + 4\cdot 7 & 3\cdot 6 + 4\cdot 8 \end{bmatrix}
=
\begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}
\]
In this example, we defined two 2x2 matrices and performed matrix multiplication using np.dot.
Matrix multiplication follows the rule where the element at position (i, j) in the resulting matrix is com-
puted as the dot product of the i-th row of the first matrix and the j-th column of the second matrix.
The standard algorithm for matrix multiplication runs in O(n³) time, which becomes expensive for large matrices. Thus, understanding the intricacies of this operation and exploring optimization techniques is crucial for improving performance in practical applications.
\[
c_{ij} = \sum_{k=1}^{n} a_{ik}\, b_{kj}
\]
This means that each entry in the resulting matrix is the sum of the products of corresponding
entries from the row of A and the column of B.
Let’s illustrate this with a simple example. Consider the following matrices A and B:
\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}, \qquad
B = \begin{pmatrix} 7 & 8 \\ 9 & 10 \\ 11 & 12 \end{pmatrix}
\]
To find the product C = A × B, we compute each element of C:
To find the product C = A × B, we compute each element of C:
\[
C = \begin{pmatrix}
1\times 7 + 2\times 9 + 3\times 11 & 1\times 8 + 2\times 10 + 3\times 12 \\
4\times 7 + 5\times 9 + 6\times 11 & 4\times 8 + 5\times 10 + 6\times 12
\end{pmatrix}
= \begin{pmatrix} 58 & 64 \\ 139 & 154 \end{pmatrix}
\]
This example demonstrates the straightforward nature of matrix multiplication, where each entry
in the resulting matrix is derived from a combination of row and column elements.
\[
C = A \times B = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}
\]
This block decomposition shows how the elements from matrices A and B combine to produce the elements of matrix C.
Strassen’s algorithm works by recursively dividing each matrix into four submatrices. Given two n × n
matrices A and B:
\[
A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \qquad
B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}
\]
Strassen's algorithm requires seven multiplications of these submatrices instead of eight, as required by the conventional approach. The seven multiplications are defined as follows:
\[
\begin{aligned}
M_1 &= (A_{11} + A_{22})(B_{11} + B_{22}) \\
M_2 &= (A_{21} + A_{22})B_{11} \\
M_3 &= A_{11}(B_{12} - B_{22}) \\
M_4 &= A_{22}(B_{21} - B_{11}) \\
M_5 &= (A_{11} + A_{12})B_{22} \\
M_6 &= (A_{21} - A_{11})(B_{11} + B_{12}) \\
M_7 &= (A_{12} - A_{22})(B_{21} + B_{22})
\end{aligned}
\]
The resulting submatrices Cij of the product matrix C are computed using these multiplications:
\[
\begin{aligned}
C_{11} &= M_1 + M_4 - M_5 + M_7 \\
C_{12} &= M_3 + M_5 \\
C_{21} &= M_2 + M_4 \\
C_{22} &= M_1 - M_2 + M_3 + M_6
\end{aligned}
\]
Here is a Python implementation of Strassen’s algorithm, which demonstrates how to recursively mul-
tiply matrices using the principles outlined above.
import numpy as np

def strassen(A, B):
    # Assumes square matrices whose size is a power of two
    n = A.shape[0]

    # Base case: 1x1 matrices
    if n == 1:
        return A * B

    # Split the matrices into four submatrices
    mid = n // 2
    A11, A12 = A[:mid, :mid], A[:mid, mid:]
    A21, A22 = A[mid:, :mid], A[mid:, mid:]
    B11, B12 = B[:mid, :mid], B[:mid, mid:]
    B21, B22 = B[mid:, :mid], B[mid:, mid:]

    # The seven Strassen products
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)

    # Combine the products into the quadrants of the result
    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6

    # Assemble the full result matrix
    C = np.vstack((np.hstack((C11, C12)), np.hstack((C21, C22))))
    return C
This implementation effectively uses recursion to break down the matrix multiplication into smaller
components, applying Strassen’s optimization techniques to reduce the number of multiplicative op-
erations.
While Strassen’s algorithm represents a significant improvement over the standard method, further ad-
vancements have been made in the field of matrix multiplication. Below are some notable algorithms
that have emerged since Strassen’s work.
Coppersmith-Winograd Algorithm
The Coppersmith-Winograd algorithm further reduced the complexity of matrix multiplication to ap-
proximately O(n2.376 )[60]. This algorithm utilizes advanced mathematical techniques involving tensor
rank and is considered to be more theoretical due to its complexity and the overhead associated with
its practical implementation.
Recent Advances
Recent research has led to even faster algorithms, some of which leverage techniques from algebraic
geometry and combinatorial optimization. Notably, there have been advancements that utilize fast
Fourier transforms (FFT) for multiplying polynomials, which can be adapted to matrix multiplication
scenarios, yielding further reductions in complexity[267].
13.2.6 Conclusion
Matrix multiplication is a cornerstone of many computational applications. While the naive approach
is straightforward, the advent of algorithms like Strassen’s and subsequent improvements highlights
the importance of optimization in computational mathematics. By leveraging advanced techniques
and utilizing efficient libraries, one can achieve significant performance improvements in matrix com-
putations, which is essential for handling large-scale problems in science and engineering. Under-
standing these algorithms not only enhances computational efficiency but also deepens our grasp of
the mathematical principles underlying linear algebra.
import numpy as np

# Define a matrix
A = np.array([[1, 2, 3], [4, 5, 6]])

# Transpose the matrix
A_T = A.T
print(A_T)
Output:
\[
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}^{T}
=
\begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}
\]
In this example, we transposed a 2x3 matrix into a 3x2 matrix.
import numpy as np

# Define a matrix
A = np.array([[1, 2], [3, 4]])

# Compute the inverse
A_inv = np.linalg.inv(A)
print(A_inv)
Output:
\[
A^{-1} = \begin{bmatrix} -2 & 1 \\ 1.5 & -0.5 \end{bmatrix}
\]
In this example, we used the np.linalg.inv() function to calculate the inverse of a 2x2 matrix.
import numpy as np

# Define a matrix
A = np.array([[1, 2], [3, 4]])

# Compute the determinant
det_A = np.linalg.det(A)
print(det_A)
Output:
det(A) = −2.0000000000000004
In this example, the determinant of matrix A is calculated as -2. Determinants are particularly
useful in determining whether a matrix is invertible (a matrix is invertible if and only if its determinant
is non-zero).
13.6 Eigenvalues and Eigenvectors
An eigenvector v of a square matrix A is a nonzero vector that satisfies
\[
A v = \lambda v
\]
where λ is the corresponding eigenvalue.
In Python, we can compute the eigenvalues and eigenvectors of a matrix using np.linalg.eig().
Here’s an example:
import numpy as np

# Define a matrix
A = np.array([[1, 2], [2, 3]])

# Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:\n", eigenvectors)
The output shows the two eigenvalues of A (approximately 4.236 and −0.236) and the corresponding eigenvectors.
Chapter 14
Solving Systems of Linear Equations

Solving systems of linear equations is a fundamental problem in mathematics and science. In Python,
there are several methods available to solve these problems efficiently, particularly when the system of
equations can be represented as a matrix equation. In this chapter, we will explore different techniques
to solve linear equations using matrix operations.
Ax = b
where A is a matrix, x is the vector of unknowns, and b is the vector of constants, we can solve for
x by computing the inverse of matrix A:
x = A−1 b
This method works well when the matrix A is invertible. Let’s look at how we can implement this
using Python.
Example: Solving a system using matrix inverse
Consider the following system of equations:
x + 2y = 5
3x + 4y = 6
import numpy as np

# Define the coefficient matrix A and the right-hand side b
A = np.array([[1, 2], [3, 4]])
b = np.array([5, 6])

# Compute the inverse of A
A_inv = np.linalg.inv(A)

# Solve for x
x = np.dot(A_inv, b)
print(x)
[-4. 4.5]
14.2 LU Decomposition
LU decomposition is a method that decomposes a matrix A into two matrices: a lower triangular
matrix L and an upper triangular matrix U [108]. This decomposition can simplify the process of solving
systems of equations.
The matrix equation Ax = b can be written as:
LU x = b
import numpy as np
import scipy.linalg as la

# Define the coefficient matrix A and the right-hand side b
A = np.array([[1, 2], [3, 4]])
b = np.array([5, 6])

# Perform LU decomposition: A = P @ L @ U
P, L, U = la.lu(A)

# Solve L*y = P.T @ b (the permutation must be applied to b)
y = np.linalg.solve(L, P.T @ b)

# Solve U*x = y
x = np.linalg.solve(U, y)
print(x)
[-4. 4.5]
LU decomposition is a more efficient method than directly using the matrix inverse for large sys-
tems.
14.3 QR Decomposition
QR decomposition decomposes a matrix A into an orthogonal matrix Q and an upper triangular matrix
R. This method is particularly useful in solving linear systems and least squares problems[3].
Given:
Ax = b
QRx = b
import numpy as np

# Define the coefficient matrix A and the right-hand side b
A = np.array([[1, 2], [3, 4]])
b = np.array([5, 6])

# Perform QR decomposition
Q, R = np.linalg.qr(A)

# Compute y = Q.T @ b (Q is orthogonal, so Qy = b gives y = Q.T b)
y = np.dot(Q.T, b)

# Solve R * x = y
x = np.linalg.solve(R, y)
print(x)
QR decomposition is numerically stable and can be used for solving both linear systems and least
squares problems.
Chapter 15
Norms and Distance Metrics
In linear algebra and machine learning, norms and distance metrics are important for measuring the
size or length of vectors and the distance between points in a vector space. In this chapter, we will
explore different types of norms and distance metrics used in numerical computing.
15.1.1 L1 Norm
The L1 norm (also known as the Manhattan or Taxicab norm) is the sum of the absolute values of the
vector components. It is defined as:
\[
\lVert x \rVert_1 = \sum_{i=1}^{n} \lvert x_i \rvert
\]
import numpy as np

# Define a vector
x = np.array([1, -2, 3])

# Compute the L1 norm
l1_norm = np.linalg.norm(x, 1)
print(l1_norm)  # Output: 6.0
15.1.2 L2 Norm
The L2 norm (also known as the Euclidean norm) is the square root of the sum of the squares of the
vector components[155]. It is defined as:
\[
\lVert x \rVert_2 = \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2}
\]
# Compute the L2 norm of the same vector
l2_norm = np.linalg.norm(x)
print(l2_norm)

3.7416573867739413
The L2 norm is commonly used in machine learning for measuring the error or magnitude of vec-
tors.
The Frobenius norm measures the size of a matrix and is defined as:
\[
\lVert A \rVert_F = \left( \sum_{i,j} \lvert a_{ij} \rvert^2 \right)^{1/2}
\]
# Define a matrix
A = np.array([[1, 2], [3, 4]])

# Compute the Frobenius norm
fro_norm = np.linalg.norm(A, 'fro')
print(fro_norm)
5.477225575051661
Cosine similarity measures the cosine of the angle between two vectors:
\[
\text{cosine similarity} = \frac{a \cdot b}{\lVert a \rVert\, \lVert b \rVert}
\]
Example: Computing cosine similarity in Python
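A sketch of the computation (the vectors are illustrative):

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Cosine similarity = (a . b) / (||a|| * ||b||)
cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_sim)  # Approximately 0.9746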
15.4 Euclidean Distance
The Euclidean distance between two vectors a and b is the L2 norm of their difference, ‖a − b‖₂. For example, for a = [1, 2, 3] and b = [4, 5, 6] (values chosen to match the output below), np.linalg.norm(a - b) gives:
5.196152422706632
Euclidean distance is widely used in clustering algorithms and in measuring similarity between
data points.
Chapter 16
Automatic Differentiation and Gradients
Automatic differentiation (AD) is a powerful technique used in many machine learning frameworks,
including PyTorch and TensorFlow, to compute gradients efficiently and accurately[16]. Unlike sym-
bolic differentiation, which can produce complex expressions, or numerical differentiation, which can
suffer from precision issues, automatic differentiation computes derivatives systematically using the
chain rule. This chapter introduces the concept of automatic differentiation and demonstrates how
gradients can be computed in popular machine learning libraries like PyTorch and TensorFlow.
Automatic differentiation operates in two main modes:
• Forward Mode: Efficient for functions with few inputs and many outputs.
• Reverse Mode: Particularly efficient for functions with many inputs and one output (e.g., neural networks).
Reverse-mode AD is particularly useful in deep learning, where we often need to compute the gra-
dient of a loss function with respect to model parameters[89].
Example: Consider the function f (x) = x2 + 3x + 5. To compute its derivative using autodiff, the
function can be broken into smaller parts:
f (x) = (x · x) + (3 · x) + 5
Each elementary operation (multiplication, addition) is recorded, and the chain rule is applied auto-
matically to compute the derivative.
import torch

# Define a tensor with gradient tracking enabled
x = torch.tensor(2.0, requires_grad=True)

# Define the function f(x) = x^2 + 3x + 5
f = x**2 + 3*x + 5

# Compute the gradient df/dx = 2x + 3
f.backward()
print(x.grad)
tensor(7.)
Explanation:
• We define a tensor x with the argument requires_grad=True, which tells PyTorch to track all
operations on this tensor.
• The function f (x) = x2 + 3x + 5 is computed, and PyTorch automatically tracks all operations.
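The two-variable case described next presumably looked like this sketch:

import torch

x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)

# f(x, y) = 3x^2 + 4y^3
f = 3*x**2 + 4*y**3

# Compute partial derivatives
f.backward()
print(x.grad)  # df/dx = 6x   -> tensor(6.)
print(y.grad)  # df/dy = 12y^2 -> tensor(48.)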
In this example, PyTorch computes the partial derivatives of f (x, y) = 3x2 + 4y 3 with respect to
both x and y, evaluated at x = 1 and y = 2.
import tensorflow as tf

# Define a variable (the value mirrors the PyTorch example above)
x = tf.Variable(2.0)

# Record operations on a GradientTape
with tf.GradientTape() as tape:
    f = x**2 + 3*x + 5

# Compute the gradient df/dx
grad = tape.gradient(f, x)
print(grad)  # 7.0

Explanation: tf.GradientTape records the operations executed inside its context; calling tape.gradient(f, x) afterwards returns df/dx.
# Define two variables (values mirror the earlier PyTorch example)
x = tf.Variable(1.0)
y = tf.Variable(2.0)

with tf.GradientTape() as tape:
    f = 3*x**2 + 4*y**3

# Compute gradients
gradients = tape.gradient(f, [x, y])
print(gradients)  # [6.0, 48.0]
# Second-order derivatives via nested gradient tapes
# (the function used here is an assumption for illustration)
x = tf.Variable(1.0)

with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        f = x**3
    first_derivative = inner_tape.gradient(f, x)             # 3x^2 -> 3.0
second_derivative = outer_tape.gradient(first_derivative, x)  # 6x   -> 6.0
print(second_derivative)
16.5 Summary
In this chapter, we explored automatic differentiation and its use in computing gradients. Both PyTorch
and TensorFlow provide powerful tools to automatically compute derivatives, which are essential in
training machine learning models. We also covered how to compute gradients, Jacobians, and Hes-
sians, providing a foundation for more advanced optimization and machine learning techniques.
Part III
Optimization is a crucial aspect of deep learning. Without proper optimization, training neural net-
works efficiently would not be possible. In this part, we will discuss the fundamental concepts of
optimization, focusing on gradient-based methods. These methods are essential for minimizing the
loss function, allowing the model to learn from data and improve its performance.
Chapter 17
Optimization Basics
In this chapter, we will introduce the basic concepts behind optimization in deep learning, starting
with Gradient Descent, which is the foundation of many advanced optimization techniques. We will
then explore Stochastic Gradient Descent (SGD)[232], momentum-based optimization, and adaptive
optimization methods like Adagrad[76], RMSprop[266], Adam[148], and AdamW[176].
\[
\theta := \theta - \eta \nabla_{\theta} L(\theta)
\]
where:
• θ represents the model parameters,
• L(θ) is the loss function, and ∇θL(θ) is its gradient with respect to the parameters,
• η is the learning rate, a hyperparameter that controls the step size in each iteration.
import numpy as np

# Objective function f(x) = x^2 and its gradient
def f(x):
    return x**2

def gradient(x):
    return 2*x

# Gradient descent (starting point and hyperparameters are illustrative)
x = 10.0
learning_rate = 0.1
iterations = 50

for i in range(iterations):
    x = x - learning_rate * gradient(x)

print("Minimum found at x =", x)
In this example:
• In each iteration, we update x by subtracting the product of the learning rate and the gradient.
θ := θ − η∇θ L(θ(i) )
Where L(θ(i) ) represents the loss for the i-th data point or mini-batch.
import numpy as np

# Generate synthetic linear data (illustrative: y = 4 + 3x plus noise)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Add a bias column of ones to X
X_b = np.c_[np.ones((100, 1)), X]

# Initialize parameters
theta = np.random.randn(2, 1)
learning_rate = 0.1
iterations = 100
batch_size = 10

# Perform SGD
for i in range(iterations):
    indices = np.random.randint(100, size=batch_size)
    X_batch = X_b[indices]
    y_batch = y[indices]

    # Gradient of the mean squared error on the mini-batch
    gradients = 2 / batch_size * X_batch.T.dot(X_batch.dot(theta) - y_batch)
    theta = theta - learning_rate * gradients

print("Estimated parameters:", theta.ravel())
In this example:
• SGD is performed over 100 iterations, and in each iteration, we randomly sample a mini-batch of
10 data points.
• The model parameters are updated using the gradient computed from the mini-batch.
\[
v := \beta v + (1 - \beta)\nabla_{\theta} L(\theta), \qquad \theta := \theta - \eta v
\]
where v is the velocity (an exponentially weighted average of past gradients) and β is the momentum coefficient, typically set around 0.9.
# A simple sketch of momentum-based gradient descent
def momentum_gradient_descent(f, grad_f, initial_theta, learning_rate=0.1,
                              beta=0.9, iterations=50):
    theta = initial_theta
    v = 0
    for i in range(iterations):
        grad = grad_f(theta)
        v = beta * v + (1 - beta) * grad
        theta = theta - learning_rate * v
        print(f"Iteration {i+1}: theta = {theta}, f(theta) = {f(theta)}")
    return theta

# Example usage
f = lambda x: x**2
grad_f = lambda x: 2*x
initial_theta = 10

momentum_gradient_descent(f, grad_f, initial_theta)
Adaptive optimization methods automatically adjust the learning rate during training, which can sig-
nificantly improve convergence[173]. These methods include Adagrad, RMSprop, Adam, and AdamW,
each of which modifies the learning rate based on the gradients.
17.4.1 Adagrad
Adagrad (Adaptive Gradient Algorithm) adjusts the learning rate for each parameter based on the
history of gradients. Parameters with large gradients receive smaller learning rates, and parameters
with small gradients receive larger learning rates.
\[
\theta := \theta - \frac{\eta}{\sqrt{G + \epsilon}}\, \nabla_{\theta} L(\theta)
\]
where G is the accumulated sum of squared past gradients and ε is a small constant that prevents division by zero.
17.4.3 RMSprop
RMSprop (Root Mean Square Propagation) is a variant of Adagrad that scales the learning rate based
on a moving average of squared gradients, preventing the learning rate from decaying too quickly.
17.4.5 Adam
Adam (Adaptive Moment Estimation) combines the benefits of both momentum-based methods and
RMSprop by using both the first and second moments of the gradients.
\[
m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t, \qquad
v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2
\]
\[
\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^t}
\]
\[
\theta := \theta - \frac{\eta\, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\]
17.4.7 AdamW
AdamW is a variant of Adam that decouples the weight decay from the gradient updates, leading to
improved performance for regularization.
\[
\eta_t = \eta_0 \cdot \text{drop\_factor}^{\left\lfloor t/\text{drop\_epoch} \right\rfloor}
\]
where:
• η₀ is the initial learning rate,
• drop_factor is the factor by which the learning rate is multiplied at each drop,
• drop_epoch is the number of epochs after which the learning rate is reduced.
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

# Define optimizer
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Define a step-decay scheduler: multiply the learning rate by 0.5 every 10 epochs
scheduler = StepLR(optimizer, step_size=10, gamma=0.5)

# Training loop
for epoch in range(30):
    train()     # Custom training function
    validate()  # Custom validation function

    # Update the learning rate
    scheduler.step()
    print(f"Epoch {epoch+1}, Learning Rate: {scheduler.get_last_lr()}")
In this example:
• After every 10 epochs, the learning rate is multiplied by 0.5, effectively reducing it.
• This allows for rapid progress initially and then slower, more refined updates as training pro-
gresses.
import tensorflow as tf

# Define optimizer
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

# Step-decay schedule (a reconstruction; the drop factor and interval are assumptions)
def step_decay(epoch, initial_lr=0.1, drop_factor=0.5, drop_epoch=10):
    return initial_lr * (drop_factor ** (epoch // drop_epoch))

# Training loop
for epoch in range(30):
    train()     # Custom training function
    validate()  # Custom validation function

    # Update the optimizer's learning rate according to the schedule
    optimizer.learning_rate.assign(step_decay(epoch))
    print(f"Epoch {epoch+1}, Learning Rate: {float(optimizer.learning_rate.numpy())}")
\[
\eta_t = \eta_0 \cdot e^{-\lambda t}
\]
where η₀ is the initial learning rate, λ is the decay rate, and t is the epoch (or step) index.
This approach is often used when the learning rate should decrease continuously throughout the
training process.
Example of Exponential Decay in PyTorch:
import torch.optim as optim
from torch.optim.lr_scheduler import ExponentialLR

# Define optimizer
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Define an exponential-decay scheduler (gamma = 0.9)
scheduler = ExponentialLR(optimizer, gamma=0.9)

# Training loop
for epoch in range(30):
    train()     # Custom training function
    validate()  # Custom validation function

    # Update the learning rate
    scheduler.step()
    print(f'Epoch {epoch+1}, Learning Rate: {scheduler.get_last_lr()}')
In this example:
• After every epoch, the learning rate is multiplied by 0.9, causing it to decrease exponentially.
import tensorflow as tf

# Exponential-decay schedule: start at 0.1 and decay by 4% every 1000 steps
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1,
    decay_steps=1000,
    decay_rate=0.96,
    staircase=False)

optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule)

# Training loop
for epoch in range(30):
    train()     # Custom training function
    validate()  # Custom validation function
In this example:
• The learning rate starts at 0.1 and decays by 4% every 1000 steps.
• If staircase is set to True, the learning rate would decay in discrete steps instead of continuously.
Cosine annealing decreases the learning rate along a cosine curve:
\[
\eta_t = \eta_{\min} + \frac{1}{2}\left(\eta_{\max} - \eta_{\min}\right)\left(1 + \cos\frac{\pi t}{T}\right)
\]
where:
• ηmax is the maximum learning rate (usually the initial learning rate).
• ηmin is the minimum learning rate, t is the current epoch, and T is the length of the annealing cycle.
import torch.optim as optim
from torch.optim.lr_scheduler import CosineAnnealingLR

# Define optimizer
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Define a cosine annealing scheduler with a period of 10 epochs
scheduler = CosineAnnealingLR(optimizer, T_max=10)

# Training loop
for epoch in range(30):
    train()     # Custom training function
    validate()  # Custom validation function

    # Update the learning rate
    scheduler.step()
    print(f'Epoch {epoch+1}, Learning Rate: {scheduler.get_last_lr()}')
In this example:
• The learning rate follows a cosine annealing schedule, resetting every 10 epochs.
• T0 is the number of epochs before the first restart, and Tmult is a factor that increases the period
of restarts.
1 import numpy as np
2
25 # Training loop
26 for epoch in range(30):
27 train() # Custom training function
28 validate() # Custom validation function
29
In this example:
• We define a custom learning rate schedule class for warm restarts using cosine annealing.
• The learning rate periodically resets to a high value and then decreases again.
Warm restarts can be very effective in improving convergence, especially for deep neural networks
where escaping local minima is crucial for achieving better performance.
Chapter 18
Advanced Optimization Techniques
In this chapter, we will delve into some advanced optimization techniques that are essential for improv-
ing the performance of machine learning models. These methods are critical in the training process,
particularly when working with deep learning models. We will cover techniques like Batch Normaliza-
tion, Gradient Clipping, and Second-order Optimization Methods.
\[ \mu_B = \frac{1}{m} \sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2 \]
where:
• m is the number of examples in the mini-batch,
• µB is the mean of the mini-batch,
• σB² is the variance of the mini-batch,
• xi is the input to the layer for the i-th example in the mini-batch.
\[ \hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \]
where ǫ is a small constant added to avoid division by zero.
In addition to normalization, we introduce two learnable parameters, γ and β, to allow the network
to scale and shift the normalized values:
yi = γ x̂i + β
• It helps mitigate the problem of internal covariate shift, where the distribution of inputs to layers
changes during training.
• It allows for higher learning rates by ensuring that the activations stay within a controlled range,
leading to faster convergence.
• It acts as a regularizer, often reducing the need for other regularization techniques like Dropout.
1 import numpy as np
2
In this example, we normalized a mini-batch of data and applied scaling and shifting with the learn-
able parameters γ and β.
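For concreteness, a minimal NumPy sketch of this computation (with an illustrative mini-batch and γ = 1, β = 0 as initial values) could look like the following:

import numpy as np

# Mini-batch of activations: 4 examples, 3 features (illustrative values)
x = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [0.5, 1.0, 1.5],
              [3.0, 6.0, 9.0]])

gamma, beta, eps = 1.0, 0.0, 1e-5

# Per-feature mean and variance over the mini-batch
mu = x.mean(axis=0)
var = x.var(axis=0)

# Normalize, then scale and shift
x_hat = (x - mu) / np.sqrt(var + eps)
y = gamma * x_hat + beta
print(y)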
During training, very large gradients can produce updates to the network weights that are too drastic, leading to unstable training or even causing the model to diverge. Gradient clipping limits the size of the gradients by setting a maximum threshold.
\[ g_{\text{clipped}} = g \cdot \frac{t}{\lVert g \rVert} \]
where t is the clipping threshold and ∥g∥ is the norm of the gradient; the rescaling is applied only when ∥g∥ exceeds t.
import numpy as np

# Simulate a gradient
gradient = np.array([0.5, 0.7, 1.2])

# Define a threshold
threshold = 1.0

# Rescale the gradient if its norm exceeds the threshold
norm = np.linalg.norm(gradient)
if norm > threshold:
    gradient = gradient * (threshold / norm)
print(gradient)
In this example, we clipped the gradient if its norm exceeded the specified threshold.
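In practice, deep learning frameworks provide this operation directly. A minimal PyTorch sketch (assuming model, loss, and optimizer are already defined) is:

import torch.nn as nn

loss.backward()                                              # Compute gradients
nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)   # Clip the global gradient norm to 1.0
optimizer.step()                                             # Apply the (clipped) update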
\[ x_{t+1} = x_t - H^{-1} \nabla f(x_t) \]
where ∇f(xt) is the gradient and H is the Hessian matrix of second derivatives of the objective function.
The Hessian matrix contains information about the curvature of the objective function, allowing
Newton’s Method to take more informed steps toward the minimum.
Example of Newton’s Method in Python
Here’s a simple example of using Newton’s method to minimize a quadratic function:
f (x) = x2 + 4x + 4
∇f (x) = 2x + 4
H =2
def f_prime(x):
    # Gradient of the function f(x) = x^2 + 4x + 4
    return 2 * x + 4

def hessian():
    # The Hessian (second derivative) is constant in this case
    return 2

# Initial guess
x = 0.0

# One Newton step: x_new = x - H^{-1} * f'(x)
x = x - f_prime(x) / hessian()
print("Updated x:", x)
In this example, we performed one iteration of Newton’s Method to update the parameter x.
BFGS builds an approximation to the inverse of the Hessian matrix iteratively. At each step, the
approximation is updated using the gradient information from the current and previous steps. This
method balances the efficiency of first-order methods and the accuracy of second-order methods.
Example of Quasi-Newton Method (BFGS) in Python
In Python, we can use the scipy.optimize library to perform optimization using the BFGS algorithm:
import numpy as np
from scipy.optimize import minimize

# Objective function f(x) = x^2 + 4x + 4
def f(x):
    return x**2 + 4*x + 4

# Initial guess
x0 = 0.0

result = minimize(f, x0, method='BFGS')
print(result.x)
In this example, we used the BFGS algorithm to find the minimum of the quadratic function f(x) = x² + 4x + 4. The minimize function from scipy.optimize handles the details of the BFGS algorithm for us.
Chapter 19
Summary
In this chapter, we covered advanced optimization techniques that are crucial for training complex
machine learning models:
• Batch normalization, which stabilizes and accelerates training by normalizing the inputs to each
layer.
• Gradient clipping, which prevents exploding gradients by limiting the size of the gradient during
backpropagation.
• Second-order optimization methods like Newton’s Method and Quasi-Newton methods, which
use curvature information to make more efficient parameter updates.
Understanding and applying these techniques can significantly improve the performance and stability
of machine learning models, particularly in deep learning scenarios.
Part IV
In this part of the book, we will focus on the practical aspects of deep learning mathematics.
Through exercises and examples, you will solidify your understanding of key concepts such as ten-
sor operations, gradient computation, and optimization algorithms. These are essential topics for
anyone looking to understand how deep learning models function at a mathematical level.
Chapter 20
Practice Problems
This chapter contains a series of practice problems that will help you deepen your understanding of the
mathematical concepts behind deep learning. You will work through exercises on tensor and matrix
operations, gradient computations, and optimization algorithms. These problems are designed to
build your confidence in applying these mathematical techniques in practical deep learning scenarios.
import numpy as np

# Define two 2x2x3 tensors (values chosen so that A + B matches the expected output below)
A = np.arange(1, 13).reshape(2, 2, 3)
B = np.arange(13, 25).reshape(2, 2, 3)

# Element-wise addition
C = A + B
print(C)

# Reshape the result into a 2x6 matrix
D = C.reshape(2, 6)
print(D)
Expected output:
# Element-wise addition result
[[[14 16 18]
[20 22 24]]
[[26 28 30]
[32 34 36]]]
Expected output:
[[14 16 18 20 22 24]
[26 28 30 32 34 36]]
\[ f(x, y) = x^2 + 3xy + y^2 \]
Find the partial derivatives ∂f/∂x and ∂f/∂y at the point (x, y) = (1, 2).
# Define the function
def f(x, y):
    return x**2 + 3*x*y + y**2

# Numerical partial derivatives using forward differences
h = 1e-5
x, y = 1, 2
df_dx = (f(x + h, y) - f(x, y)) / h
df_dy = (f(x, y + h) - f(x, y)) / h

print("df/dx:", round(df_dx, 5))
print("df/dy:", round(df_dy, 5))
Expected output:
df/dx: 8.00001
df/dy: 7.00001
Expected output:
[[2 3]
[4 5]]
5 def df(x):
6 return 2*x + 4
7
y = Wx + b
Where W is the weight matrix, x is the input vector, and b is the bias. Given:
" # " # " #
1 2 5 1
W= , x= , b=
3 4 6 1
Compute y.
import numpy as np
W, x, b = np.array([[1, 2], [3, 4]]), np.array([5, 6]), np.array([1, 1])

# Compute y = W * x + b
y = np.dot(W, x) + b
print(y)
Expected output:
[18 40]
Chapter 21
Summary
This chapter provides a summary of the key mathematical concepts covered in this part of the book.
Understanding these concepts is crucial for anyone looking to work in deep learning.
• Tensor Operations: Deep learning models rely heavily on tensor operations such as addition,
multiplication, and reshaping.
• Optimization Algorithms: Algorithms like gradient descent enable models to minimize loss func-
tions and improve prediction accuracy.
• Linear Algebra in Deep Learning: Concepts such as matrix multiplication and eigenvalue decom-
position are widely used in neural network training and operations.
Mastering these mathematical tools will help you build and understand more complex models in
deep learning.
Part V
Numerical methods play a crucial role in deep learning, particularly in tasks such as optimization,
solving differential equations, and matrix computations. These methods help in efficiently solving
mathematical problems that arise in training deep learning models, especially when analytical solu-
tions are impractical or impossible. In this part, we will introduce various numerical methods, discuss
sources of computational errors, and explore strategies to ensure the stability and accuracy of numer-
ical algorithms in deep learning.
Chapter 22
Introduction and Error Analysis
• Eigenvalue and singular value decomposition (used in dimensionality reduction techniques like
PCA)
1 import numpy as np
2
Here, we have used a numerical method (Gaussian elimination) implemented by numpy to find the
solution to the system of linear equations.
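A minimal sketch of such a computation, using an illustrative 2×2 system, is:

import numpy as np

# Solve the linear system A x = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)   # LU-based (Gaussian elimination) solver
print(x)                    # [2. 3.]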
• Round-off Error[72]: This occurs due to the limited precision of floating-point arithmetic used
by computers. For example, irrational numbers like π and square roots of non-perfect squares
cannot be represented exactly.
• Truncation Error[256]: This occurs when an infinite process is approximated by a finite one. For
example, the Taylor series expansion of functions is often truncated after a few terms, introduc-
ing truncation errors.
• Approximation Error[13, 65]: This occurs when an exact mathematical solution is approximated
using a numerical method. For instance, when we approximate a continuous function by a finite
sum, an error is introduced.
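A one-line check in Python makes round-off error visible:

# Floating-point round-off: 0.1 + 0.2 is not exactly 0.3
print(0.1 + 0.2)          # 0.30000000000000004
print(0.1 + 0.2 == 0.3)   # False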
This example illustrates that due to round-off error, the sum of 0.1 and 0.2 is not exactly equal to
0.3, even though we expect it to be. Such small errors can propagate through computations and lead
to significant discrepancies.
In iterative methods, small numerical errors are introduced at each step, yet a stable method still converges to the correct value because the error is reduced from one iteration to the next.
• Absolute Error[274, 49, 270, 20]: The difference between the exact value and the approximate
value.
Absolute Error = |xexact − xapproximate |
• Relative Error[37, 187, 290, 43]: The absolute error divided by the exact value, providing a nor-
malized measure of the error.
\[ \text{Relative Error} = \frac{|x_{\text{exact}} - x_{\text{approximate}}|}{|x_{\text{exact}}|} \]
Relative error is often more meaningful than absolute error, as it gives a sense of how significant
the error is relative to the size of the true value.
Example: Computing Absolute and Relative Errors
Let’s compute the absolute and relative errors for an approximation of π using a numerical method.
import math

pi_exact = math.pi    # Exact value of pi
pi_approx = 22 / 7    # Approximation of pi

abs_err = abs(pi_exact - pi_approx)
rel_err = abs_err / abs(pi_exact)
print("Absolute error:", abs_err, "Relative error:", rel_err)
In this example, we computed the absolute and relative errors for the approximation 22/7 of π, showing that while the absolute error is small, the relative error gives a better sense of the significance of the approximation.
In this example, due to the near-singularity of matrix A, small numerical errors in the computation
of the inverse can lead to instability. The product A · A−1 may not exactly result in the identity matrix
due to these errors.
22.6 Summary
In this chapter, we introduced numerical methods and discussed the various sources of errors that
arise in computations. We explored error propagation, absolute and relative error, and the stability of
algorithms. Understanding these concepts is crucial for ensuring accurate and reliable results when
using numerical methods in deep learning. Future chapters will dive deeper into specific numerical
techniques and their applications in deep learning.
Chapter 23
Root Finding Methods
Root-finding methods such as the Bisection Method, Newton's Method, the Secant Method, and fixed-point iteration each have their own strengths and weaknesses, and they are applicable under different circumstances.
23.2.1 Algorithm
The Bisection Method repeatedly bisects the interval [a, b] and selects the subinterval where the root
lies. The steps are as follows:
1. Check that f(a)f(b) < 0 (i.e., the root lies between a and b).
2. Compute the midpoint c = (a + b)/2.
3. Evaluate f(c).
4. If f(a)f(c) < 0, the root lies in [a, c]; otherwise it lies in [c, b]. Repeat on the chosen subinterval until it is smaller than the desired tolerance.
def bisection(f, a, b, tol=1e-6, max_iter=100):
    iteration = 0
    while (b - a) / 2 > tol and iteration < max_iter:
        c = (a + b) / 2          # Midpoint
        if f(c) == 0:            # Root found exactly
            return c
        elif f(a) * f(c) < 0:
            b = c
        else:
            a = c
        iteration += 1
    return (a + b) / 2

# Example usage
f = lambda x: x**2 - 4   # Function whose root we want to find
root = bisection(f, 1, 3)
print(f"Root found: {root}")
In this implementation:
• The bisection() function takes a function f , an interval [a, b], and an optional tolerance and
maximum number of iterations.
• The method returns the approximate root of the function within the given tolerance.
23.3.1 Algorithm
The idea behind Newton’s Method is to use the tangent line to approximate the function near the
current estimate of the root. The update rule is:
\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \]
Where f ′ (xn ) is the derivative of f (x) evaluated at xn .
11 # Example usage
12 f = lambda x: x**2 - 4 # Function
13 df = lambda x: 2*x # Derivative of the function
14 root = newton(f, df, x0=3)
15 print(f"Root found: {root}")
In this implementation:
23.4.1 Algorithm
The update rule for the Secant Method is:
\[ x_{n+1} = x_n - f(x_n)\,\frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})} \]
14 # Example usage
15 f = lambda x: x**2 - 4
16 root = secant(f, 1, 3)
17 print(f"Root found: {root}")
In this example:
• The Secant Method finds the root without needing the derivative of the function.
23.5.1 Algorithm
The fixed-point iteration algorithm is simple: rewrite f(x) = 0 in the form x = g(x), choose an initial guess x0, and iterate x_{n+1} = g(x_n) until successive iterates are sufficiently close.
11 # Example usage
12 g = lambda x: 0.5 * (x + 4/x) # Rewrite of x^2 = 4
13 root = fixed_point(g, x0=3)
14 print(f"Fixed point found: {root}")
In this example:
It is important to understand the convergence properties of the root-finding methods. Not all meth-
ods converge at the same rate, and some may fail to converge under certain conditions. Let’s briefly
discuss the convergence characteristics of the methods we’ve introduced.
Convergence Rate: The Bisection Method has a linear convergence rate, meaning that the error de-
creases by a constant factor in each iteration. While this method is very reliable, it is not the fastest.
Convergence Rate: Newton’s Method converges quadratically, which means that the number of cor-
rect digits in the approximation roughly doubles at each step. However, if the initial guess is far from
the root, the method may fail to converge.
Convergence Rate: The Secant Method converges super-linearly, with a rate between linear and quadratic.
It is generally slower than Newton’s Method but does not require the derivative of the function.
Chapter 24
Interpolation and Function Approximation
Interpolation and function approximation are fundamental concepts in both mathematics and ma-
chine learning[188, 33, 231, 175]. In this chapter, we will explore various methods for interpolating data
points and approximating functions, which are widely used in numerical analysis, scientific computing,
and deep learning. We will begin with basic interpolation techniques such as polynomial interpolation
and then move to more advanced methods like spline interpolation and piecewise linear interpolation.
Finally, we will discuss how neural networks are used for function approximation in the context of deep
learning.
f (xi ) = yi for i = 0, 1, . . . , n
There are several different methods to achieve this, depending on the type of data and the required
smoothness of the resulting function.
P (x) = a0 + a1 x + a2 x2 + · · · + an xn
\[ P(x) = \sum_{i=0}^{n} y_i L_i(x) \]
where the Lagrange basis polynomials are
\[ L_i(x) = \prod_{\substack{0 \le j \le n \\ j \ne i}} \frac{x - x_j}{x_i - x_j} \]
The Lagrange polynomial is useful because it explicitly passes through all the given points, but it
can become computationally expensive for large n.
Example in Python:
Here is an implementation of Lagrange interpolation using Python:
1 import numpy as np
2
22 # Interpolating at x = 1.5
23 x = 1.5
24 y = lagrange_interpolation(x_values, y_values, x)
25 print(f'Interpolated value at x = {x}: {y}')
In this example:
• The function lagrange_basis computes the Lagrange basis polynomial for a given i.
• The function lagrange_interpolation calculates the interpolated value for any x using the La-
grange polynomial.
\[ f[x_i] = y_i \]
\[ f[x_i, x_{i+1}] = \frac{f[x_{i+1}] - f[x_i]}{x_{i+1} - x_i} \]
\[ f[x_i, x_{i+1}, \ldots, x_{i+k}] = \frac{f[x_{i+1}, \ldots, x_{i+k}] - f[x_i, \ldots, x_{i+k-1}]}{x_{i+k} - x_i} \]
Example in Python:
Here is an implementation of Newton’s divided difference interpolation using Python:
1 # Function to compute divided differences
2 def divided_differences(x_values, y_values):
3 n = len(x_values)
4 table = np.zeros((n, n))
5 table[:, 0] = y_values
6 for j in range(1, n):
7 for i in range(n - j):
8 table[i, j] = (table[i+1, j-1] - table[i, j-1]) / (x_values[i+j] - x_values[i])
9 return table[0]
10
25
26 # Interpolating at x = 1.5
27 x = 1.5
28 y = newton_interpolation(x_values, y_values, x)
29 print(f'Interpolated value at x = {x}: {y}')
In this example:
• The function newton_interpolation computes the interpolated value for any x using Newton’s
polynomial.
Spline interpolation uses piecewise polynomials to interpolate data[243, 255, 185, 41]. Unlike high-
degree polynomial interpolation, which can suffer from oscillations (known as Runge’s phenomenon),
spline interpolation ensures smoothness by using lower-degree polynomials over each subinterval
between data points.
The most common type of spline interpolation is cubic spline interpolation, where a cubic poly-
nomial is fit between each pair of points, ensuring continuity of the function and its first and second
derivatives at each point.
Example of Cubic Spline Interpolation in Python:
11 # Interpolating at x = 1.5
12 x = 1.5
13 y = cs(x)
14 print(f'Interpolated value at x = {x}: {y}')
In this example:
• We use the CubicSpline function from the scipy.interpolate module to create a cubic spline
interpolator.
• The cubic spline ensures a smooth curve through the data points, with continuous first and sec-
ond derivatives.
\[ P(x) = y_i + \frac{y_{i+1} - y_i}{x_{i+1} - x_i}\,(x - x_i), \qquad x \in [x_i, x_{i+1}] \]
Example of Piecewise Linear Interpolation in Python:
10 # Interpolating at x = 1.5
11 x = 1.5
12 y = linear_interp(x)
13 print(f'Interpolated value at x = {x}: {y}')
1 import torch
2 import torch.nn as nn
3 import torch.optim as optim
4
26 # Training loop
27 for epoch in range(1000):
28 model.train()
29 optimizer.zero_grad()
30 output = model(x_train)
31 loss = criterion(output, y_train)
32 loss.backward()
33 optimizer.step()
34
In this example:
• A simple feedforward neural network is trained to approximate the sine function y = sin(x).
• The model consists of two fully connected layers, with a ReLU activation function in between.
• The network is trained using the mean squared error (MSE) loss function and stochastic gradient
descent (SGD) optimizer.
Chapter 25
Numerical Differentiation and Integration
In mathematics and applied fields, differentiation and integration are fundamental operations used to
compute rates of change and areas under curves, respectively. While analytic solutions exist for many
problems, there are cases where exact solutions are not feasible, and we must rely on numerical tech-
niques. In this chapter, we will cover basic numerical methods for differentiation and integration, with
a focus on their implementation in Python. These methods are widely used in many fields, including
physics, engineering, and machine learning.
Forward Difference
The forward difference method approximates the derivative using the function values at x and a nearby point x + h:
\[ f'(x) \approx \frac{f(x+h) - f(x)}{h} \]
Backward Difference
The backward difference method approximates the derivative by looking at the difference between
the function values at x and a previous point x − h:
\[ f'(x) \approx \frac{f(x) - f(x-h)}{h} \]
Central Difference
The central difference method is generally more accurate than the forward or backward difference
methods because it uses points on both sides of x to compute the derivative:
\[ f'(x) \approx \frac{f(x+h) - f(x-h)}{2h} \]
Example of Finite Difference in Python
Let’s implement the central difference method to approximate the derivative of a function in Python:
1 import numpy as np
2
In this example, we used the central difference method to approximate the derivative of sin(x) at x = π/4. The exact derivative at this point is cos(π/4), which we compared with the numerical result.
The trapezoidal rule approximates the integral of f over [a, b] by summing the areas of trapezoids over n subintervals:
\[ \int_a^b f(x)\,dx \approx \frac{h}{2}\left[f(a) + 2\sum_{i=1}^{n-1} f(x_i) + f(b)\right] \]
where h = (b − a)/n is the step size, and xi are the points dividing the interval.
Example of the Trapezoidal Rule in Python
1 import numpy as np
2
In this example, we estimated the integral of sin(x) over [0, π] using the trapezoidal rule.
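A minimal NumPy sketch of that estimate (with n = 100 subintervals, an illustrative choice):

import numpy as np

# Integrate sin(x) over [0, pi] with the trapezoidal rule
a, b, n = 0.0, np.pi, 100
x = np.linspace(a, b, n + 1)
y = np.sin(x)
h = (b - a) / n
integral = (h / 2) * (y[0] + 2 * np.sum(y[1:-1]) + y[-1])
print(integral)   # Close to the exact value 2.0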
Simpson's rule fits a parabola through each pair of subintervals, giving the composite formula
\[ \int_a^b f(x)\,dx \approx \frac{h}{3}\Bigl[f(x_0) + 4\!\sum_{i\ \text{odd}} f(x_i) + 2\!\sum_{i\ \text{even},\, i \ne 0, n} f(x_i) + f(x_n)\Bigr] \]
where h = (b − a)/n, and n must be an even number.
Example of Simpson’s Rule in Python
1 import numpy as np
2
16
In this example, we used Simpson’s rule to estimate the same integral of sin(x) from 0 to π. Simp-
son’s rule generally provides a more accurate result than the trapezoidal rule for the same number of
subdivisions.
Gaussian quadrature is a powerful technique for numerical integration that provides exact results for
polynomials of degree 2n − 1 or less, where n is the number of sample points[96]. It selects both the
sample points and weights optimally to achieve high accuracy.
In Gaussian quadrature, the integral is approximated as:
\[ \int_a^b f(x)\,dx \approx \sum_{i=1}^{n} w_i f(x_i) \]
where wi are the weights, and xi are the sample points chosen optimally.
Example of Gaussian Quadrature in Python
The scipy library provides a function for Gaussian quadrature called scipy.integrate.quadrature.
Here is an example:
import numpy as np
from scipy.integrate import quadrature

# Integrate sin(x) over [0, pi] using Gaussian quadrature
result, error = quadrature(np.sin, 0, np.pi)
print(result)   # Close to the exact value 2.0
In this example, we used Gaussian quadrature to approximate the integral of sin(x) over [0, π].
Gaussian quadrature is particularly useful for high-precision integration.
import numpy as np

# Example ROC curve data (true positive rate and false positive rate)
fpr = np.array([0.0, 0.1, 0.4, 0.8, 1.0])
tpr = np.array([0.0, 0.4, 0.7, 0.9, 1.0])

# Area under the ROC curve via the trapezoidal rule
print("AUC:", np.trapz(tpr, fpr))
In this example, we approximated the area under the ROC curve using the trapezoidal rule. This
gives us an estimate of how well the classifier distinguishes between classes.
Chapter 26
Solving Systems of Linear Equations
Solving systems of linear equations is a fundamental problem in mathematics and forms the core of
many applications in numerical computing and deep learning. In deep learning, many optimization
problems, including backpropagation, can be reduced to solving linear systems. In this chapter, we
will cover both direct and iterative methods for solving systems of linear equations.
Gaussian elimination is a method for solving linear systems by converting the system’s matrix into
an upper triangular form[159, 260, 220, 107, 269]. Once the matrix is in this form, the solution can be
obtained through back-substitution.
Given a system:
Ax = b
we aim to reduce the matrix A to an upper triangular matrix U using row operations. Then, we solve
the system U x = b using back-substitution.
Example: Gaussian Elimination in Python
Consider the following system of equations:
x + 2y + 3z = 9
2x + 3y + 4z = 12
3x + 4y + 5z = 15
1 import numpy as np
2
Expected output:
[1. 1. 2.]
26.1.2 LU Decomposition
LU decomposition is a method that factors a matrix A into the product of two matrices: a lower tri-
angular matrix L and an upper triangular matrix U [288, 77, 107, 259]. This is useful for solving linear
systems because once A is decomposed, solving the system becomes a matter of solving two trian-
gular systems.
Given Ax = b, LU decomposition splits this into:
LU x = b
1 import scipy.linalg as la
2
3 # Perform LU decomposition
4 P, L, U = la.lu(A)
5
6 # Solve L * y = b
7 y = np.linalg.solve(L, b)
8
9 # Solve U * x = y
10 x = np.linalg.solve(U, y)
11 print(x)
Expected output:
[1. 1. 2.]
LU decomposition is more efficient than Gaussian elimination for solving multiple systems with
the same coefficient matrix.
For a symmetric positive-definite matrix A, the Cholesky decomposition factors it as
\[ A = L L^\top \]
where L is a lower triangular matrix. This decomposition is efficient and numerically stable in certain applications, such as when dealing with covariance matrices.
Example: Cholesky Decomposition in Python
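A minimal NumPy sketch, using the symmetric positive-definite matrix whose reconstruction appears in the expected output below:

import numpy as np

# Symmetric positive-definite matrix
A = np.array([[4.0, 12.0, -16.0],
              [12.0, 37.0, -43.0],
              [-16.0, -43.0, 98.0]])

# Compute the lower-triangular Cholesky factor L
L = np.linalg.cholesky(A)

# Verify the factorization: L @ L.T reconstructs A
print(L @ L.T)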
Expected output:
[[ 4. 12. -16.]
[ 12. 37. -43.]
[-16. -43. 98.]]
Cholesky decomposition is faster than LU decomposition but applies only to symmetric positive-definite matrices.
The Jacobi method is an iterative algorithm for solving linear systems. It updates each variable in
the system independently of the others using the previous iteration’s values[8, 238, 297, 275, 21]. The
system Ax = b is written as:
\[ x_i^{(k+1)} = \frac{1}{a_{ii}} \Bigl( b_i - \sum_{j \ne i} a_{ij} x_j^{(k)} \Bigr) \]
12 # Initial guess
13 x_init = np.zeros(len(b))
14
Expected output:
[1. 1. 2.]
Expected output:
[1. 1. 2.]
Expected output:
[1. 1. 2.]
z = Wx + b
where W is the weight matrix, x is the input, and b is the bias vector. During backpropagation, we
need to compute the gradients of the loss function with respect to W and x, which involves solving
linear systems.
For example, in a feedforward neural network, the gradient of the loss function with respect to the
weights of the output layer is given by:
\[ \frac{\partial L}{\partial W} = a^\top \delta \]
where a is the activation from the previous layer and δ is the error term. This equation involves
matrix multiplication, a key linear algebra operation.
In convolutional neural networks (CNNs), backpropagation involves solving more complex linear
systems, particularly in the convolution and pooling layers, making efficient linear system solvers crit-
ical for training large-scale networks.
Chapter 27
Numerical Linear Algebra
Numerical linear algebra is the backbone of many algorithms in deep learning, especially those involv-
ing large datasets and high-dimensional spaces. In this chapter, we will explore key matrix factoriza-
tion techniques, eigenvalue computations, and principal component analysis (PCA). These concepts
are vital for solving problems like dimensionality reduction, which is important in making deep learning
algorithms more efficient.
The singular value decomposition (SVD) factors an m × n matrix A as
\[ A = U \Sigma V^\top \]
where:
• U is an m × m orthogonal matrix whose columns are the left singular vectors,
• Σ is an m × n diagonal matrix with non-negative real numbers on the diagonal (the singular values),
• V is an n × n orthogonal matrix whose columns are the right singular vectors.
The SVD is useful in applications like image compression, noise reduction, and dimensionality
reduction, as it can help identify the most important components of a matrix.
Example: Computing the SVD in Python
Let’s see how to compute the SVD of a matrix using Python’s numpy library.
1 import numpy as np
2
3 # Define a matrix A
This will output the matrices U , Σ, and V T , which represent the decomposition of the matrix A.
Matrix U:
[[-0.70710678 -0.70710678]
[ 0.70710678 -0.70710678]]
Singular values (S):
[4. 2.]
Matrix V^T:
[[-0.70710678 0. 0.70710678]
[ 0. -1. 0. ]
[ 0.70710678 0. 0.70710678]]
Applications of SVD:
• Image Compression: SVD can be used to approximate an image matrix with a reduced number
of singular values, resulting in efficient compression while preserving essential features.
27.1.2 QR Decomposition
QR decomposition is another important matrix factorization technique. It decomposes a matrix A into
the product of two matrices:
A = QR
where:
QR decomposition is useful in solving linear systems, least squares problems, and for computing
eigenvalues.
Example: Computing the QR Decomposition in Python
Let’s compute the QR decomposition of a matrix using Python.
1 # Define a matrix A
2 A = np.array([[12, -51, 4], [6, 167, -68], [-4, 24, -41]])
3
5 Q, R = np.linalg.qr(A)
6
Applications of QR Decomposition:
• Solving Linear Systems: QR decomposition can be used to efficiently solve systems of linear
equations, especially in least squares problems.
\[ A v = \lambda v \]
where A is a square matrix, v is a non-zero eigenvector, and λ is the corresponding eigenvalue.
Eigenvalues and eigenvectors play a crucial role in many applications, such as principal component
analysis (PCA), stability analysis, and quantum mechanics.
Example: Computing Eigenvalues and Eigenvectors in Python
Let’s compute the eigenvalues and eigenvectors of a matrix using Python.
1 # Define a square matrix A
2 A = np.array([[4, -2], [1, 1]])
3
Eigenvalues:
[3. 2.]
Eigenvectors:
[[ 0.89442719 0.70710678]
[ 0.4472136 0.70710678]]
• Dimensionality Reduction: Eigenvalues and eigenvectors are used in PCA for reducing the di-
mensionality of datasets while preserving the most significant variance.
• Stability Analysis: In dynamical systems, eigenvalues are used to determine the stability of equi-
librium points.
11 X_reduced = pca.fit_transform(X)
12
Reduced dataset:
[[ 0.7495898 -0.11194563]
[-1.24862174 -0.05295381]
[ 0.49903194 0.16489943]]
PCA reduces the dimensionality of the dataset from 3 to 2, keeping the most significant compo-
nents that explain the variance in the data.
20 mlp.fit(X_train, y_train)
21
This code demonstrates how PCA can be used to reduce the number of input features before
training a neural network, leading to a more efficient training process.
27.5 Summary
In this chapter, we explored essential concepts of numerical linear algebra, including matrix factoriza-
tion techniques such as SVD and QR decomposition, eigenvalues and eigenvectors, and PCA. These
tools are critical for many deep learning applications, particularly in tasks like dimensionality reduction,
where they help improve the efficiency and performance of models by reducing the dimensionality of
large datasets.
Chapter 28
Fourier Transform and Spectral Methods
The Fourier Transform is a fundamental mathematical tool in signal processing, image analysis, and
many areas of scientific computing, including deep learning. It allows us to analyze the frequency con-
tent of signals and functions by transforming data from the time (or spatial) domain to the frequency
domain. This chapter will introduce the concept of the Fourier Transform, delve into the Discrete
Fourier Transform (DFT) and Fast Fourier Transform (FFT), and explore their applications in signal
processing and deep learning.
The Fourier Transform of a function f(t) is defined as
\[ F(\omega) = \int_{-\infty}^{\infty} f(t) e^{-i\omega t}\, dt \]
Where:
• F(ω) is the frequency-domain representation of the signal f(t),
• e−iωt is the complex exponential function, which decomposes the function into its frequency components.
The inverse Fourier Transform allows us to reconstruct the original function from its frequency
components[286, 73]:
\[ f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega) e^{i\omega t}\, d\omega \]
• They provide a way to analyze signals in the frequency domain, which can reveal properties not
easily observed in the time domain.
• They are used in signal processing for filtering, noise reduction, and signal reconstruction.
• In deep learning, Fourier transforms can be used to enhance image processing and in convolu-
tion operations.
\[ F_k = \sum_{n=0}^{N-1} f_n e^{-i \frac{2\pi}{N} k n}, \qquad k = 0, 1, \ldots, N-1 \]
Where:
• fn is the value of the signal at the n-th point in the time domain.
\[ f_n = \frac{1}{N} \sum_{k=0}^{N-1} F_k e^{i \frac{2\pi}{N} k n}, \qquad n = 0, 1, \ldots, N-1 \]
import numpy as np

def dft(signal):
    """Compute the Discrete Fourier Transform (DFT) of a signal."""
    N = len(signal)
    result = np.zeros(N, dtype=complex)
    for k in range(N):           # For each frequency component
        for n in range(N):       # Sum over the time-domain samples
            result[k] += signal[n] * np.exp(-2j * np.pi * k * n / N)
    return result

# Example usage
signal = [1, 2, 3, 4]  # A simple signal
dft_result = dft(signal)
print("DFT result:", dft_result)
In this implementation:
• We define a function dft() that computes the Discrete Fourier Transform for a given signal.
• The inner loop multiplies each point in the signal by a complex exponential and sums the result
to get the frequency component.
While this implementation is mathematically correct, it is computationally expensive for large sig-
nals. The Fast Fourier Transform (FFT) significantly optimizes this process.
The Fast Fourier Transform (FFT) is an efficient algorithm for computing the DFT, reducing the time
complexity from O(N 2 ) to O(N log N ). FFT is one of the most important algorithms in numerical
computing because it allows the analysis of large datasets quickly[59, 38].
import numpy as np

# Example signal
signal = [1, 2, 3, 4]

# Compute the FFT using numpy's optimized implementation
fft_result = np.fft.fft(signal)
print("FFT result:", fft_result)
Here:
• The function returns the frequency components of the signal in the same way as the DFT, but
with much greater computational efficiency.
1 import numpy as np
2 import matplotlib.pyplot as plt
3
25 plt.subplot(2, 1, 2)
26 plt.plot(t, np.real(filtered_signal))
27 plt.title("Filtered Signal")
28 plt.show()
In this example:
• The FFT is applied to the noisy signal, and we set the frequency components above a certain
threshold to zero, effectively filtering out the high-frequency noise.
• Finally, the original and filtered signals are plotted to visualize the noise reduction effect.
1 import numpy as np
2 import matplotlib.pyplot as plt
3 from scipy import fftpack
4 from skimage import data, color
5
31 plt.subplot(1, 2, 2)
32 plt.imshow(filtered_image, cmap='gray')
33 plt.title("Low-Pass Filtered Image")
34 plt.show()
In this example:
• The inverse FFT is used to reconstruct the image after applying the filter.
Nonlinear equations and systems of nonlinear equations arise frequently in various fields, including
physics, engineering, finance, and machine learning. These equations are called nonlinear because
they do not adhere to the principle of superposition, meaning the relationship between variables can-
not be expressed as a simple linear combination. Solving nonlinear equations is often more challeng-
ing than solving linear equations, but there are powerful numerical methods available to tackle these
problems.
In this chapter, we will introduce nonlinear systems, explain widely used methods such as New-
ton’s method [221] and Broyden’s method[31, 40] for solving nonlinear systems[207], and explore their
applications in optimization tasks in neural networks.
F (x1 , x2 , . . . , xn ) = 0
Where F represents a vector-valued function of several variables. Solving such a system means
finding the values of x1 , x2 , . . . , xn that satisfy all the equations simultaneously.
An example of a simple nonlinear system is:
f2 (x1 , x2 ) = x21 − x2 = 0
In general, there are no analytical solutions for nonlinear systems, so numerical methods are used
to find approximate solutions.
Newton's method for systems of nonlinear equations uses the Jacobian matrix, which collects the partial derivatives of each equation with respect to each variable. The Jacobian matrix is used to iteratively improve an initial guess until the solution converges[162, 250].
Newton's Method Algorithm
Given a system of nonlinear equations F(x) = 0, where F is a vector-valued function, the Newton iteration step is:
\[ x_{k+1} = x_k - J(x_k)^{-1} F(x_k) \]
where J(xk) is the Jacobian matrix of F evaluated at the current iterate xk. As an example, consider again the system that contains the equation
\[ f_2(x_1, x_2) = x_1^2 - x_2 = 0 \]
1 import numpy as np
2
21 Fx = F(x)
22 Jx = J(x)
23 delta_x = np.linalg.solve(Jx, -Fx)
24 x = x + delta_x
25
31 # Initial guess
32 x0 = np.array([0.5, 0.5])
33
In this example:
• The newtons_method function iteratively applies Newton’s method to find the solution.
6 for i in range(max_iter):
7 Fx = F(x)
8 s = np.linalg.solve(B, -Fx)
9 x_new = x + s
10 y = F(x_new) - Fx
11 B = B + np.outer((y - B @ s), s) / np.dot(s, s)
12 x = x_new
13
In this implementation:
• We use an initial approximation to the Jacobian B0 (the identity matrix in this case).
• The Jacobian is updated iteratively based on the correction vector y and the step s.
• The method converges to the solution without computing the exact Jacobian at every step, mak-
ing it more efficient than Newton’s method in certain scenarios.
29 # Initial guess
30 theta0 = np.array([0, 0])
31
In this example:
• The gradient and Hessian matrix of the loss function are explicitly defined.
• Newton’s method quickly converges to the optimal parameters, though in practice, gradient-
based methods like stochastic gradient descent (SGD) are more commonly used for training
large neural networks.
Newton’s and Broyden’s methods are powerful tools in solving nonlinear systems and optimization
problems. While Newton’s method requires calculating the Jacobian or Hessian matrix, Broyden’s
method reduces the computational burden by updating an approximation to the Jacobian iteratively.
Both methods play an essential role in various fields, including optimization in neural networks.
Chapter 30
Numerical Optimization
Numerical optimization refers to the process of finding the minimum or maximum of a function when
an analytical solution is difficult or impossible to obtain. Optimization is fundamental in many fields
such as machine learning, physics, economics, and engineering[100, 86, 24]. In this chapter, we will ex-
plore various numerical optimization techniques, starting from gradient-based methods and advanc-
ing to more sophisticated approaches like quasi-Newton methods and gradient-free methods. We will
also explore how these methods are applied in training deep neural networks.
\[ \min_{x \in \mathbb{R}^n} f(x) \]
where f (x) is the objective function, and we seek to find the value of x that minimizes f (x). In
machine learning, for example, f (x) could represent the loss function, and our goal is to minimize it
to improve the performance of the model.
• Unconstrained Optimization[39, 291, 58, 242, 184, 118]: In this case, there are no restrictions on
the values that x can take. We aim to find the global or local minimum of the objective function.
• Constrained Optimization[142, 218, 249, 163, 111, 183, 214]: Here, the variable x is subject to cer-
tain constraints, such as g(x) ≤ 0 or h(x) = 0. The optimization needs to account for these
constraints.
In this chapter, we will focus on unconstrained optimization methods that are widely used in ma-
chine learning and other applications.
7 def grad_f(x):
8 return 2*x + 4
9
In this example, we applied gradient descent to minimize a simple quadratic function f (x) = x2 +
4x + 4. The algorithm starts at an initial guess x = 10, and the learning rate η = 0.1 controls the step
size.
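A self-contained sketch of that loop (100 iterations, which is more than enough for this quadratic) might look like:

# Objective f(x) = x^2 + 4x + 4 and its gradient
def f(x):
    return x**2 + 4*x + 4

def grad_f(x):
    return 2*x + 4

x = 10.0    # Initial guess
eta = 0.1   # Learning rate
for _ in range(100):
    x = x - eta * grad_f(x)   # Gradient descent update

print("Minimum found at x =", x)   # Converges toward x = -2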
xt+1 = xt + αt pt
8 # Initial guess
9 x_init = np.array([10.0])
10
• st = xt+1 − xt ,
• yt = ∇f (xt+1 ) − ∇f (xt ).
1 import numpy as np
2 from scipy.optimize import minimize
3
8 # Initial guess
9 x_init = np.array([10.0])
10
In this example, we minimized the same quadratic function using the BFGS algorithm. The BFGS
method is efficient for many optimization problems and converges faster than gradient descent for
many smooth functions.
1 import numpy as np
2 from scipy.optimize import minimize
3
8 # Initial guess
9 x_init = np.array([10.0])
10
In this example, we used L-BFGS to minimize the quadratic function. L-BFGS is particularly useful
when the problem involves a large number of variables and memory is a constraint.
1 import numpy as np
2 from scipy.optimize import minimize
3
8 # Initial guess
9 x_init = np.array([10.0])
10
In this example, we used the Nelder-Mead method to minimize the function without using gradient
information. Nelder-Mead is particularly useful for problems where the gradient is not available or is
expensive to compute.
In stochastic gradient descent (SGD), the update has the same form, θ := θ − η∇θ L(θ), except that the gradient is computed using a small randomly sampled mini-batch of data at each iteration.
14 # Create the model, define the loss function and the optimizer
15 model = SimpleNet()
16 criterion = nn.MSELoss()
17 optimizer = optim.SGD(model.parameters(), lr=0.01)
18
23 # Training loop
24 for epoch in range(100):
25 optimizer.zero_grad() # Zero the gradient buffers
26 output = model(inputs) # Forward pass
27 loss = criterion(output, target) # Compute the loss
28 loss.backward() # Backward pass (compute gradients)
29 optimizer.step() # Update weights
30
31 print("Training completed.")
In this example, we defined a simple neural network using PyTorch and trained it using the SGD
optimizer. The network minimizes the mean squared error (MSE) loss to learn from the input data.
Chapter 31
Ordinary Differential Equations (ODEs)
Ordinary Differential Equations (ODEs) are equations that describe the relationship between a function
and its derivatives[6, 30]. They play a crucial role in many fields of science and engineering, including
neural networks and deep learning, where they are used to model dynamic systems[48, 235]. In this
chapter, we will cover the basic concepts of ODEs, methods for solving them, and their applications in
modeling neural dynamics.
\[ \frac{dy}{dt} = f(t, y) \]
where f (t, y) is a known function, and y(t) is the unknown function to be determined. The goal of
solving an ODE is to find the function y(t) that satisfies the given equation.
Example: A Simple ODE
Consider the following first-order ODE:
\[ \frac{dy}{dt} = -2y \]
This equation describes exponential decay, where the rate of change of y(t) is proportional to y(t)
itself. The analytical solution to this equation is:
y(t) = y0 e−2t
where y0 is the initial condition y(0).
However, many ODEs cannot be solved analytically, and we need numerical methods to approxi-
mate the solution. In the next sections, we will explore numerical methods such as Euler’s method
and Runge-Kutta methods.
\[ \frac{dy}{dt} = f(t, y) \]
and an initial condition y(0) = y0 , Euler’s method approximates y(t) by taking steps of size h as
follows:
yn+1 = yn + hf (tn , yn )
where tn+1 = tn + h, and yn is the approximation of y(tn ).
Example: Implementing Euler’s Method in Python
Let's solve the ODE dy/dt = −2y using Euler's method.
1 import numpy as np
2 import matplotlib.pyplot as plt
3
19 # Parameters
20 t0 = 0 # Initial time
21 y0 = 1 # Initial condition y(0) = 1
22 h = 0.1 # Step size
23 t_end = 5 # End time
24
In this example:
• We define the function f (t, y) = −2y and solve it using Euler’s method.
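A compact sketch of Euler's method for this ODE (without the plotting code) might look like:

import numpy as np

def euler(f, t0, y0, h, t_end):
    # Advance y_{n+1} = y_n + h * f(t_n, y_n) until t_end
    t, y = t0, y0
    ts, ys = [t0], [y0]
    while t < t_end:
        y = y + h * f(t, y)
        t = t + h
        ts.append(t)
        ys.append(y)
    return np.array(ts), np.array(ys)

# dy/dt = -2y with y(0) = 1
f = lambda t, y: -2 * y
t_vals, y_vals = euler(f, t0=0, y0=1, h=0.1, t_end=5)
print(y_vals[-1], np.exp(-2 * 5))   # Numerical vs. exact solution at t = 5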
A more accurate alternative is the fourth-order Runge–Kutta method (RK4), which advances the solution as:
\[ y_{n+1} = y_n + \frac{h}{6}\,(k_1 + 2k_2 + 2k_3 + k_4) \]
where:
\[ k_1 = f(t_n, y_n) \]
\[ k_2 = f\!\left(t_n + \tfrac{h}{2},\ y_n + \tfrac{h}{2} k_1\right) \]
\[ k_3 = f\!\left(t_n + \tfrac{h}{2},\ y_n + \tfrac{h}{2} k_2\right) \]
\[ k_4 = f(t_n + h,\ y_n + h k_3) \]
Example: Implementing RK4 in Python
Let's solve the same ODE dy/dt = −2y using the RK4 method.
11 k1 = f(t_n, y_n)
12 k2 = f(t_n + h/2, y_n + h/2 * k1)
13 k3 = f(t_n + h/2, y_n + h/2 * k2)
14 k4 = f(t_n + h, y_n + h * k3)
15
In this example:
• We use the RK4 method to solve the ODE and compare the results with Euler’s method.
• RK4 provides a more accurate solution, especially for larger step sizes.
A classic example of a stiff ODE is
\[ \frac{dy}{dt} = -1000y + 3000 - 2000e^{-t} \]
To solve stiff ODEs efficiently, implicit methods such as the backward Euler method or specialized
solvers like the scipy.integrate.solve_ivp() function with the ’Radau’ or ’BDF’ method are often
used.
Example: Solving a Stiff ODE in Python
We will use the solve_ivp() function from scipy to solve a stiff ODE.
from scipy.integrate import solve_ivp
import numpy as np

# Stiff ODE: dy/dt = -1000*y + 3000 - 2000*exp(-t), with y(0) = 0
def stiff_ode(t, y):
    return [-1000 * y[0] + 3000 - 2000 * np.exp(-t)]

# Solve the ODE using the 'Radau' method for stiff equations
sol = solve_ivp(stiff_ode, [0, 5], [0], method='Radau')
print(sol.y[0][-1])
In this example:
• We define a stiff ODE and solve it using the Radau method, which is well-suited for stiff problems.
In simpler models, neurons can be modeled using the leaky integrate-and-fire (LIF) model[158, 149,
223], where the membrane potential V (t) evolves according to the following ODE:
\[ \tau_m \frac{dV}{dt} = -(V(t) - V_{\text{rest}}) + R_m I(t) \]
where τm is the membrane time constant, Vrest is the resting potential, Rm is the membrane resistance, and I(t) is the input current.
In this example:
• We model the dynamics of a leaky integrate-and-fire neuron using an ODE and solve it numeri-
cally.
• The membrane potential is plotted as a function of time, showing how the neuron responds to
an input current.
Chapter 32
Partial Differential Equations (PDEs)
Partial Differential Equations (PDEs) are equations that involve rates of change with respect to more
than one variable[81]. These equations are fundamental in describing various physical phenomena,
such as heat conduction, fluid dynamics, and electromagnetic fields. In this chapter, we will introduce
the basic concepts of PDEs, explore numerical methods for solving PDEs, and discuss their applica-
tions in deep learning, particularly through Physics-Informed Neural Networks (PINNs)[226, 295, 14].
• Elliptic PDEs: These PDEs describe equilibrium states, such as the Laplace equation:
\[ \Delta u = 0 \quad \text{or} \quad \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 \]
• Parabolic PDEs: These PDEs describe processes that evolve over time, such as the heat equa-
tion:
\[ \frac{\partial u}{\partial t} = \alpha \frac{\partial^2 u}{\partial x^2} \]
• Hyperbolic PDEs: These PDEs describe wave propagation, such as the wave equation:
\[ \frac{\partial^2 u}{\partial t^2} = c^2 \frac{\partial^2 u}{\partial x^2} \]
Example: Heat Equation
The heat equation models how heat diffuses through a material:
\[ \frac{\partial u}{\partial t} = \alpha \frac{\partial^2 u}{\partial x^2} \]
where u(x, t) represents the temperature distribution, α is the thermal diffusivity, x is the spatial coor-
dinate, and t is time.
\[ \frac{\partial u}{\partial t} \approx \frac{u_i^{n+1} - u_i^n}{\Delta t}, \qquad \frac{\partial^2 u}{\partial x^2} \approx \frac{u_{i+1}^n - 2u_i^n + u_{i-1}^n}{\Delta x^2} \]
By substituting these approximations into the heat equation, we get the finite difference scheme:
\[ u_i^{n+1} = u_i^n + \frac{\alpha \Delta t}{\Delta x^2}\left(u_{i+1}^n - 2u_i^n + u_{i-1}^n\right) \]
Example: Solving the 1D Heat Equation Using Finite Difference Method
We will solve the 1D heat equation using the finite difference method in Python.
1 import numpy as np
2 import matplotlib.pyplot as plt
3
4 # Define parameters
5 alpha = 0.01 # thermal diffusivity
6 L = 10.0 # length of the rod
7 T = 1.0 # total time
8 Nx = 100 # number of spatial points
9 Nt = 500 # number of time points
10 dx = L / (Nx - 1)
11 dt = T / Nt
12
13 # Stability criterion
14 assert alpha * dt / dx**2 < 0.5, "The scheme is unstable!"
15
18 u = np.sin(np.pi * x / L)
19
In this example, we discretize both space and time, then iteratively update the solution for each
time step using the finite difference scheme.
• Dividing the domain into finite elements (e.g., triangles or quadrilaterals in 2D).
−∇2 u = f in Ω
• Step 2: Express the solution u as a sum of basis functions, u(x) = Σ_j u_j φ_j(x), where the φ_j(x) are piecewise polynomial basis functions.
• Step 3: Formulate the weak form of the PDE by multiplying by a test function and integrating by
parts.
• Step 4: Assemble the stiffness matrix and solve the system of linear equations for the unknown
coefficients uj .
FEM is widely used in engineering and scientific applications, such as structural analysis and fluid
dynamics.
• No need for mesh generation: Unlike FEM or FDM, PINNs do not require mesh generation or
discretization.
• Handling complex geometries: PINNs can easily handle complex geometries and boundary con-
ditions.
• Data integration: PINNs can integrate observed data into the learning process, making them
useful for data-driven modeling of physical systems.
As an example, consider using a PINN to solve the 1D heat equation
\[ \frac{\partial u}{\partial t} = \alpha \frac{\partial^2 u}{\partial x^2} \]
We will define a neural network that takes x and t as inputs and predicts u(x, t). The loss function will
be based on the residual of the PDE and the initial and boundary conditions.
1 import torch
2 import torch.nn as nn
3 import numpy as np
4
12 nn.Linear(50, 50),
13 nn.Tanh(),
14 nn.Linear(50, 1) # Output: u(x, t)
15 )
16
In this example, we define a neural network model that takes x and t as inputs and predicts the
solution u(x, t). The loss function is based on the residual of the heat equation, and the gradients are
computed using automatic differentiation (torch.autograd.grad).
32.5 Summary
In this chapter, we introduced Partial Differential Equations (PDEs) and explored numerical methods
for solving them, including the Finite Difference Method (FDM) and the Finite Element Method (FEM).
We also discussed how Physics-Informed Neural Networks (PINNs) can be used to solve PDEs using
deep learning. PINNs offer a powerful and flexible approach to solving PDEs in scientific and engi-
neering applications, particularly when combining data and physical laws.
Chapter 33
Selected Applications of Numerical Methods in Deep Learning
Numerical methods are widely used in deep learning to approximate, optimize, and solve complex
mathematical problems that are otherwise intractable. From training neural networks to reinforce-
ment learning, numerical techniques provide the foundation for iterative algorithms and optimizations
that make modern machine learning possible. In this chapter, we will explore various applications of
numerical methods in deep learning, focusing on their use in training neural networks, reinforcement
learning, and data science applications such as interpolation and optimization.
\[ \theta := \theta - \eta \nabla_\theta L(\theta) \]
Where θ denotes the model parameters, η is the learning rate, and ∇θ L(θ) is the gradient of the loss function with respect to the parameters.
1 import numpy as np
2
20 # Training loop
21 for epoch in range(10000):
22 # Forward pass
23 z = np.dot(X, weights) + bias
24 output = sigmoid(z)
25
In this example:
• The weights and bias are updated using gradient descent in each iteration.
• The training data is an XOR problem, and the model learns to classify the inputs over multiple
epochs.
1 import numpy as np
2
14 # Q-learning algorithm
15 for episode in range(1000):
16 state = np.random.randint(0, states) # Start at a random state
17 while state != 4: # Goal state
27 # Update Q-table
28 q_table[state, action] = q_table[state, action] + learning_rate * (
29 reward + discount_factor * np.max(q_table[next_state]) - q_table[state, action])
30
31 state = next_state
32
In this example:
• A simple Q-learning algorithm is used to find the optimal action-value function for a gridworld
environment.
• The Q-table is updated using numerical methods based on the rewards and the expected future
rewards.
In policy gradient methods, the policy (which defines the agent’s behavior) is directly optimized using
numerical methods. The goal is to find a policy that maximizes the cumulative rewards. Algorithms
like REINFORCE use gradient-based optimization to improve the policy[244].
Numerical methods are also widely used in data science for tasks such as function approximation,
data interpolation, and optimization. These techniques are essential in machine learning algorithms,
where numerical methods help in optimizing models and approximating complex functions.
Numerical approximation is used when an exact solution to a mathematical function or model is diffi-
cult or impossible to find. In machine learning, models like decision trees, random forests, and neural
networks are essentially function approximators that use numerical methods to fit data[189].
1 import numpy as np
2
10 # Linear interpolation
11 y_interp = np.interp(x_interp, x, y)
12
In this example:
• We use the np.interp() function to perform linear interpolation between known data points.
1 import numpy as np
2 from scipy.optimize import curve_fit
3 import matplotlib.pyplot as plt
4
In this example:
• We use curve_fit() from scipy.optimize to find the best-fitting polynomial for noisy data.
• The function optimizes the parameters of the polynomial to minimize the error between the data
points and the curve.
Chapter 34
Summary
In this chapter, we covered several numerical methods and their applications in deep learning, rein-
forcement learning, and data science. From gradient descent to Q-learning and interpolation tech-
niques, numerical methods are essential in training machine learning models, optimizing functions,
and approximating unknown data points.
• Gradient Descent: A numerical method for minimizing a loss function by updating model param-
eters in the direction of the negative gradient.
• Q-Learning: A reinforcement learning algorithm that numerically approximates the optimal action-
value function using a Q-table.
• Interpolation: A technique used to estimate unknown values between known data points, often
applied in handling missing data.
• Optimization: The process of minimizing or maximizing a function, widely used in machine learn-
ing to optimize models and fit data.
This concludes our discussion on the applications of numerical methods in deep learning. Numer-
ical methods form the backbone of modern machine learning algorithms and are vital for both theory
and practice in this field.
Part VI
Chapter 35
Introduction to Frequency Domain Methods
Frequency domain methods are essential tools in mathematics and signal processing for analyzing
how functions or signals behave in terms of their frequency content. Instead of looking at a signal
in the time domain (how it changes over time), frequency domain analysis focuses on the frequen-
cies that compose the signal. This chapter introduces the historical background of frequency do-
main methods, tracing their development from the early origins of Fourier analysis to modern tech-
niques such as the Fast Fourier Transform (FFT)[59, 296, 225], Laplace Transform[10, 247, 50], and
Z-Transform[298, 211, 145], all of which have become fundamental in digital signal processing and
control systems.
Fourier showed that a periodic function f(t) with period T can be expressed as a sum of sines and cosines:
\[ f(t) = a_0 + \sum_{n=1}^{\infty} \left( a_n \cos\frac{2\pi n t}{T} + b_n \sin\frac{2\pi n t}{T} \right) \]
Where:
• a0 , an , bn are the Fourier coefficients that determine the amplitude of the sine and cosine com-
ponents.
The importance of Fourier’s work was initially underestimated but later became fundamental in
many fields, from signal processing to quantum mechanics. His discovery laid the groundwork for
frequency domain analysis by showing that complex signals could be decomposed into simpler, har-
monic components.
The (continuous) Fourier Transform extends this idea to non-periodic functions and is defined as
\[ F(\omega) = \int_{-\infty}^{\infty} f(t) e^{-i\omega t}\, dt \]
where F(ω) captures the amplitude and phase of the frequency component ω contained in f(t).
The Fourier Transform found widespread use in signal processing, allowing engineers to analyze
the frequency content of electrical signals, sound waves, and other types of data. Its ability to de-
compose complex waveforms into simpler frequency components is key in filtering, modulation, and
spectrum analysis.
Example: Computing the Fourier Transform in Python
1 import numpy as np
2 import matplotlib.pyplot as plt
3
19 # Time-domain signal
20 plt.subplot(1, 2, 1)
21 plt.plot(t, signal)
22 plt.title('Time-Domain Signal (50 Hz Sine Wave)')
23 plt.xlabel('Time [s]')
24 plt.ylabel('Amplitude')
25
In this example, we generate a sine wave with a frequency of 50 Hz and compute its Fourier Trans-
form using Python’s numpy library. The Fourier Transform reveals the frequency content of the signal,
which shows a peak at 50 Hz.
The FFT is particularly important in real-time applications where large amounts of data must be
processed quickly, such as in speech recognition, communications, and radar systems.
The Laplace Transform converts a function of time into a frequency-domain representation. It is particularly useful in the study of linear systems and control theory[271, 205, 2].
The Laplace Transform is defined as:
\[ F(s) = \int_{0}^{\infty} f(t) e^{-st}\, dt \]
Where s is a complex frequency variable and f(t) is a function defined for t ≥ 0.
The Laplace Transform was developed by French mathematician Pierre-Simon Laplace in the late
18th century. It has been widely applied in electrical engineering, mechanical engineering, and control
systems, where it simplifies the analysis of systems described by differential equations by converting
them into algebraic equations.
The Laplace Transform is especially useful for analyzing systems with initial conditions and for
studying system stability in the frequency domain.
Example: Symbolic Laplace Transform in Python
1 import sympy as sp
2
This code computes the Laplace Transform of f (t) = e−t symbolically using the sympy library.
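A minimal sympy sketch of this computation (noconds=True returns only the transform, without the convergence conditions):

import sympy as sp

t, s = sp.symbols('t s', positive=True)
f = sp.exp(-t)

# Laplace Transform of f(t) = e^{-t}
F = sp.laplace_transform(f, t, s, noconds=True)
print(F)   # 1/(s + 1)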
The Z-Transform, the discrete-time counterpart of the Laplace Transform for sequences, is defined as:
\[ X(z) = \sum_{n=-\infty}^{\infty} x[n]\, z^{-n} \]
Where:
• z is a complex variable.
The Z-Transform plays a critical role in the design of digital filters and systems, enabling engineers
to work in the frequency domain when processing discrete signals. It is particularly useful in applica-
tions such as telecommunications, audio processing, and digital control systems.
Example: Z-Transform in Python (Symbolic)
1 # Define a discrete-time signal and the Z-transform variable
2 n, z = sp.symbols('n z')
3 x_n = 2**n # Example signal x[n] = 2^n
4
In this example, we compute the Z-Transform of the discrete signal x[n] = 2n using symbolic
computation. The Z-Transform is critical in designing systems that process digital signals, such as
FIR (Finite Impulse Response) and IIR (Infinite Impulse Response) filters.
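One way to obtain such a transform symbolically is to evaluate the defining sum directly; a minimal sketch for the one-sided case is:

import sympy as sp

n, z = sp.symbols('n z')
x_n = 2**n   # Example signal x[n] = 2^n for n >= 0

# One-sided Z-transform: X(z) = sum_{n>=0} x[n] z^{-n}
X = sp.summation(x_n * z**(-n), (n, 0, sp.oo))
print(sp.simplify(X))   # Equivalent to z/(z - 2) for |z| > 2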
Chapter 36
Conclusion
Frequency domain methods are essential in many fields, including signal processing, communica-
tions, control theory, and systems analysis. The Fourier Transform, Laplace Transform, Z-Transform,
and FFT all provide powerful techniques to analyze signals and systems in terms of their frequency
content. By shifting our perspective from the time domain to the frequency domain, we can gain
deeper insights into the behavior of systems, design more effective filters, and solve complex differ-
ential equations more efficiently.
Chapter 37
Fourier Transform: From Time to Frequency Domain
The Fourier Transform is a mathematical technique that transforms a signal from the time domain
to the frequency domain. It plays a fundamental role in fields such as signal processing, image pro-
cessing, and even in solving partial differential equations. By converting a signal into its frequency
components, we gain insights into its underlying structure, periodicity, and other characteristics that
may not be apparent in the time domain.
In this chapter, we will introduce the Fourier Transform, starting with its definition, and gradually
cover the Fourier series[193], Continuous Fourier Transform (CFT)[93, 192], Discrete Fourier Transform
(DFT)[301], and their applications.
The Fourier Transform of a time-domain signal f(t) is defined as
\[ F(\omega) = \int_{-\infty}^{\infty} f(t) e^{-i\omega t}\, dt \]
where f(t) is the signal in the time domain, F(ω) is its frequency-domain representation, and ω denotes angular frequency.
The inverse Fourier Transform allows us to recover the original time-domain signal from its frequency-
domain representation:
\[ f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega) e^{i\omega t}\, d\omega \]
• Solve differential equations by converting them into algebraic equations in the frequency do-
main.
In practice, Fourier Transforms are used in audio processing, image compression (e.g., JPEG), and
in the analysis of electronic signals.
The Fourier Series is closely related to the Fourier Transform and is the foundation for understanding
how signals can be decomposed into frequency components. The Fourier Series applies to periodic
functions, while the Fourier Transform applies to both periodic and non-periodic functions.
Fourier Series
For a periodic function f (t) with period T , the Fourier Series represents the function as a sum of
sines and cosines (or equivalently, complex exponentials). The general form of the Fourier Series is:
\[ f(t) = a_0 + \sum_{n=1}^{\infty} \left( a_n \cos\frac{2\pi n t}{T} + b_n \sin\frac{2\pi n t}{T} \right) \]
or, equivalently, in complex exponential form,
\[ f(t) = \sum_{n=-\infty}^{\infty} c_n\, e^{i \frac{2\pi n t}{T}} \]
where the coefficients an, bn (or cn in the complex form) are obtained by integrating f(t) against the corresponding basis functions over one period.
For a discrete sequence xn of length N, the Discrete Fourier Transform (DFT) is defined as:
\[ X_k = \sum_{n=0}^{N-1} x_n e^{-i \frac{2\pi k n}{N}}, \qquad k = 0, 1, 2, \ldots, N-1 \]
where X_k are the complex frequency-domain coefficients of the sequence x_n.
The inverse DFT allows us to recover the original sequence from its frequency components:
x_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k\, e^{i 2\pi k n / N}, \qquad n = 0, 1, 2, \ldots, N-1
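A minimal sketch of this example (two sine components at 5 Hz and 50 Hz, with an assumed sampling rate of 500 Hz; variable names are illustrative) is:

import numpy as np
import matplotlib.pyplot as plt

fs = 500                                   # assumed sampling rate in Hz
t = np.arange(0, 1, 1 / fs)                # one second of samples
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)

# DFT of the sampled signal and the corresponding frequency axis
spectrum = np.fft.fft(signal)
freqs = np.fft.fftfreq(len(t), d=1 / fs)

plt.subplot(2, 1, 1)
plt.plot(t, signal)
plt.title('Time Domain')

plt.subplot(2, 1, 2)
plt.plot(freqs[:len(freqs) // 2], np.abs(spectrum)[:len(spectrum) // 2])
plt.title('Frequency Domain (magnitude)')
plt.tight_layout()
plt.show()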
In this example:
• We created a signal composed of two sine waves with frequencies of 5 Hz and 50 Hz.
• We plotted the signal in both the time domain and the frequency domain. In the frequency do-
main plot, you can clearly see peaks corresponding to the frequencies 5 Hz and 50 Hz.
The Continuous Fourier Transform (CFT) and the Discrete Fourier Transform (DFT) differ in several respects:
• Signal Type: The CFT is applied to continuous signals, while the DFT is used for discrete signals.
• Spectrum: The CFT produces a continuous frequency spectrum, whereas the DFT results in a
discrete frequency spectrum.
• Application: The DFT is widely used in digital signal processing because real-world signals are
often sampled at discrete intervals.
Both the CFT and DFT are essential tools in signal analysis, with the DFT being particularly impor-
tant in digital systems due to its computational efficiency and the discrete nature of real-world data.
In this definition:
• e^{-iωt} represents complex exponentials, which correspond to sines and cosines through Euler's formula.
The result of the Fourier Transform is a complex-valued function that encodes both the amplitude
and phase of the frequency components of the original function.
The Dirac delta function, δ(t), is a function that is zero everywhere except at t = 0, where it is infinite,
but the integral over all time is 1:
\delta(t) = \begin{cases} \infty, & t = 0 \\ 0, & t \neq 0 \end{cases}, \qquad \int_{-\infty}^{\infty} \delta(t)\, dt = 1
The Fourier Transform of the delta function is:
F {δ(t)} = 1
This result shows that the delta function contains all frequencies equally.
Consider a sine wave f (t) = sin(ω0 t), where ω0 is a constant frequency. The Fourier Transform of this
function is:
\mathcal{F}\{\sin(\omega_0 t)\} = \frac{\pi}{i}\left[\delta(\omega - \omega_0) - \delta(\omega + \omega_0)\right] = i\pi\left[\delta(\omega + \omega_0) - \delta(\omega - \omega_0)\right]
This shows that the Fourier Transform of a sine wave is composed of two delta functions centered
at ω = ±ω0 .
import numpy as np
import matplotlib.pyplot as plt

f = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 256, endpoint=False))   # original signal
f_rec = np.real(np.fft.ifft(np.fft.fft(f)))                          # forward then inverse transform
plt.plot(f, label='original'); plt.plot(f_rec, '--', label='reconstructed'); plt.legend(); plt.show()
In this example:
• We reconstruct the original function using the Inverse Fourier Transform np.fft.ifft().
• The original and reconstructed functions are plotted, demonstrating that the Fourier and Inverse
Fourier Transforms recover the original signal.
Linearity
The Fourier Transform is a linear operation, meaning that the transform of a sum of functions is the sum of their individual transforms:

\mathcal{F}\{a f(t) + b g(t)\} = a F(\omega) + b G(\omega)

where a and b are constants, and F(\omega) and G(\omega) are the Fourier Transforms of f(t) and g(t), respectively.
Time Shifting
If a function f (t) is shifted in time by t0 , the Fourier Transform is affected by a phase shift:
F {f (t − t0 )} = F (ω)e−iωt0
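As a quick numerical check, the discrete analogue of this property can be verified with NumPy: circularly shifting a sampled signal multiplies its DFT by a linear phase. This is only a sketch; the signal and the shift amount are arbitrary.

import numpy as np

x = np.random.rand(8)
k = 3                                  # delay by k samples (circular shift)
m = np.arange(len(x))                  # frequency-bin indices

lhs = np.fft.fft(np.roll(x, k))                               # DFT of the shifted signal
rhs = np.fft.fft(x) * np.exp(-2j * np.pi * k * m / len(x))    # phase-shifted DFT
print(np.allclose(lhs, rhs))  # True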
Convolution Theorem
The Fourier Transform of the convolution of two functions f (t) and g(t) is the product of their Fourier
Transforms:

\mathcal{F}\{f * g\} = F(\omega)\, G(\omega)

This property is particularly useful in the context of convolutional neural networks (CNNs), where
convolutions play a critical role in feature extraction.
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 1, 500, endpoint=False)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)
F_signal = np.fft.fft(signal)
frequencies = np.fft.fftfreq(len(t), d=t[1] - t[0])

plt.subplot(2, 1, 1); plt.plot(t, signal)
plt.subplot(2, 1, 2)
plt.plot(frequencies[:len(frequencies)//2], np.abs(F_signal)[:len(F_signal)//2])
plt.tight_layout()
plt.show()
In this example:
• We compute the Fourier Transform of the signal to extract its frequency components.
• The original signal in the time domain and its frequency-domain representation are plotted.
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import convolve2d

image = np.random.rand(64, 64)
kernel = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=float)   # edge-style filter

spatial_result = convolve2d(image, kernel, mode='same', boundary='wrap')
F_conv_result = np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(kernel, s=image.shape))

plt.subplot(1, 2, 1); plt.imshow(spatial_result, cmap='gray'); plt.title('Convolution (Spatial Domain)')
plt.subplot(1, 2, 2); plt.imshow(np.real(F_conv_result), cmap='gray'); plt.title('Convolution (Fourier Domain)')
plt.tight_layout()
plt.show()
In this example:
• We perform the convolution in both the spatial domain (using convolve2d) and the frequency
domain (using the Fourier Transform).
This method of convolution using the Fourier Transform is especially useful for large images and
large kernels, as it reduces the computational complexity of the convolution operation.
Chapter 38
Fast Fourier Transform (FFT)
The Fast Fourier Transform (FFT) is a highly efficient algorithm used to compute the Discrete Fourier
Transform (DFT) of a sequence, and it has widespread applications in signal processing, image anal-
ysis, and deep learning. The FFT reduces the computational complexity of calculating the DFT from
O(N 2 ) to O(N log N ), making it a cornerstone in numerical methods. In this chapter, we will explore
the importance of FFT, its algorithmic structure, and its applications in deep learning.
For a sequence x[n] of length N, the DFT is defined as

X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \ldots, N-1

where N is the length of the sequence, X[k] are the frequency domain coefficients, and j is the imaginary unit.
The DFT is computationally expensive, requiring O(N 2 ) operations. The Fast Fourier Transform
(FFT) is an optimized algorithm that computes the same result as the DFT, but in only O(N log N )
operations, making it vastly more efficient.
• Signal Processing: FFT is used to analyze and filter signals, extract features, and remove noise.
• Image Processing: FFT is applied to enhance images, detect patterns, and perform compres-
sion.
• Audio Analysis: FFT enables the decomposition of audio signals into their frequency compo-
nents, facilitating tasks like speech recognition and music analysis.
• Deep Learning: In deep learning, FFT can be used to accelerate convolutions and perform spec-
tral analysis for feature extraction.
The naive computation of the DFT has a time complexity of O(N 2 ), because for each output frequency
k, a sum over all N input points is computed. The FFT reduces this complexity by breaking down the
DFT into smaller parts, recursively computing the DFT on smaller and smaller sequences.
The key idea behind FFT is to exploit the symmetry and periodicity of the exponential term e−j2πkn/N ,
which allows us to compute the DFT more efficiently. Specifically, the FFT algorithm divides the se-
quence into even-indexed and odd-indexed parts and recursively applies the DFT on each part.
Splitting the sum gives the butterfly relations

X[k] = X_{\text{even}}[k] + W_N^k\, X_{\text{odd}}[k], \qquad X[k + N/2] = X_{\text{even}}[k] - W_N^k\, X_{\text{odd}}[k]

where W_N^k = e^{-j 2\pi k / N} is called the twiddle factor, and X_{\text{even}}[k] and X_{\text{odd}}[k] are the DFTs of the even and odd indexed elements, respectively.
Example: Radix-2 FFT Implementation in Python
Here’s a simple Python implementation of the Radix-2 FFT algorithm:
import numpy as np

def fft(x):
    N = len(x)
    if N == 1:
        return x
    even, odd = fft(x[0::2]), fft(x[1::2])
    W = np.exp(-2j * np.pi * np.arange(N // 2) / N)    # twiddle factors
    return np.concatenate([even + W * odd, even - W * odd])

# Example usage
x = np.random.random(8)   # Input array of length 8 (must be a power of 2)
X = fft(x)
This implementation recursively computes the FFT of the input sequence x. It divides the input
into even and odd parts, computes their FFTs, and combines them using the twiddle factors.
Time Complexity: The Radix-2 FFT algorithm reduces the computational complexity from O(N 2 )
to O(N log N ), which is a significant improvement, especially for large input sizes[191].
By the convolution theorem, a convolution can be computed in the frequency domain as

f * g = \mathcal{F}^{-1}\{\mathcal{F}\{f\} \cdot \mathcal{F}\{g\}\}

where \mathcal{F} denotes the Fourier Transform, and \mathcal{F}^{-1} denotes the inverse Fourier Transform.
Example: Using FFT for Fast Convolution
In this example, we use the FFT to compute the convolution of two signals.
import numpy as np
from scipy.fft import fft, ifft

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, 0.5, 0.25])
N = len(x) + len(h) - 1                      # length of the full linear convolution
y = np.real(ifft(fft(x, N) * fft(h, N)))     # same result as np.convolve(x, h)
print(y)
This approach leverages the FFT to compute the convolution in the frequency domain, reducing
the computational cost compared to the direct method of convolving two signals in the time domain.
import numpy as np
import matplotlib.pyplot as plt

fs = 1000
t = np.arange(0, 1, 1 / fs)
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

plt.plot(np.fft.rfftfreq(len(t), 1 / fs), np.abs(np.fft.rfft(signal)))
plt.title('Frequency Spectrum')
plt.tight_layout()
plt.show()
In this example:
• We plot the frequency spectrum to visualize the frequencies present in the signal.
• Feature Extraction: In tasks such as audio and speech recognition, the FFT is used to extract
frequency-domain features from raw audio signals, which can then be fed into machine learning
models.
• Denoising: The FFT can help remove noise by filtering out unwanted frequencies in the data.
• Anomaly Detection: Spectral analysis using FFT can detect periodic or anomalous patterns in
time-series data, which is useful in predictive maintenance and anomaly detection tasks.
38.3 Summary
In this chapter, we explored the Fast Fourier Transform (FFT), a highly efficient algorithm for com-
puting the Discrete Fourier Transform (DFT). We discussed the significance of FFT in reducing the
computational complexity of the DFT and examined the Radix-2 FFT algorithm in detail. Additionally,
we demonstrated several applications of FFT in deep learning, including fast convolution and spectral
analysis for feature extraction. The FFT continues to be a powerful tool in numerical computing, signal
processing, and deep learning, enabling efficient computation and analysis of large datasets.
Chapter 39
Laplace Transform
The Laplace Transform is a powerful integral transform used in engineering, physics, and mathemat-
ics to analyze linear time-invariant systems. It converts differential equations into algebraic equations,
making it easier to solve complex problems. In this chapter, we will explore the definition, mathemati-
cal properties, and common applications of the Laplace Transform.
For a function f(t) defined for t ≥ 0, the Laplace Transform is

F(s) = \mathcal{L}\{f(t)\} = \int_0^{\infty} f(t)\, e^{-st}\, dt

where:
• s is a complex number, s = σ + iω, where σ is the real part and ω is the imaginary part.
The Laplace Transform provides insights into the behavior of dynamic systems and helps in solving
ordinary differential equations (ODEs) and partial differential equations (PDEs)[50].
Some commonly used transform pairs are:
• Sine Function:
f(t) = \sin(\omega t) \;\Longrightarrow\; F(s) = \frac{\omega}{s^2 + \omega^2}
• Cosine Function:
f(t) = \cos(\omega t) \;\Longrightarrow\; F(s) = \frac{s}{s^2 + \omega^2}
• Power Function:
f(t) = t^n \;\Longrightarrow\; F(s) = \frac{n!}{s^{n+1}} \quad (n \text{ a non-negative integer})
These transforms are essential in control systems and engineering applications, as they help solve
differential equations that describe system behavior.
The inverse Laplace Transform recovers the time-domain function from F(s):

f(t) = \mathcal{L}^{-1}\{F(s)\} = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} e^{st} F(s)\, ds
Where c is a real number that is greater than the real part of all singularities of F (s).
Example: Inverse Laplace Transform of a Rational Function
Let’s consider the function:
F(s) = \frac{1}{s^2 + 1}

Using known properties of Laplace Transforms, we can determine:

f(t) = \mathcal{L}^{-1}\left\{\frac{1}{s^2 + 1}\right\} = \sin(t)
This shows how we can recover the original function from its transform.
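This can also be checked symbolically; a minimal sympy sketch is shown below (the Heaviside factor in the output simply records that the result is defined for t ≥ 0):

import sympy as sp

t, s = sp.symbols('t s', positive=True)
print(sp.inverse_laplace_transform(1 / (s**2 + 1), s, t))   # sin(t)*Heaviside(t)

The Laplace Transform also satisfies a number of useful properties: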
• Linearity:
L{af (t) + bg(t)} = aF (s) + bG(s)
Where a and b are constants, and F (s) and G(s) are the Laplace Transforms of f (t) and g(t),
respectively.
• Time Shifting:
\mathcal{L}\{f(t - a)\, u(t - a)\} = e^{-as} F(s), \qquad a \ge 0
• Frequency Shifting:
L{eat f (t)} = F (s − a)
• Differentiation:
L{f ′ (t)} = sF (s) − f (0)
• Integration:
\mathcal{L}\left\{\int_0^t f(\tau)\, d\tau\right\} = \frac{1}{s} F(s)
These properties allow for the simplification of complex transforms, enabling the analysis and
design of systems in a straightforward manner.
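These properties can also be spot-checked symbolically; the sketch below verifies the differentiation property for the assumed example f(t) = sin(t):

import sympy as sp

t, s = sp.symbols('t s', positive=True)
f = sp.sin(t)

lhs = sp.laplace_transform(sp.diff(f, t), t, s, noconds=True)            # L{f'(t)}
rhs = s * sp.laplace_transform(f, t, s, noconds=True) - f.subs(t, 0)     # s*F(s) - f(0)
print(sp.simplify(lhs - rhs))   # 0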
39.3 Conclusion
The Laplace Transform is a vital mathematical tool in various fields, particularly in engineering and
physics. It provides a systematic approach to analyzing linear time-invariant systems and facilitates
the solution of differential equations. In this chapter, we discussed the definition, common functions,
inverse transform, and key properties of the Laplace Transform, which are crucial for anyone working
in fields that require the analysis of dynamic systems.
The Laplace Transform is a powerful mathematical tool with numerous applications in both control
systems and deep learning. It enables the analysis of dynamic systems, providing insights into their
behavior and stability. In this section, we will explore how the Laplace Transform is used in stability
analysis of neural networks and in solving differential equations.
Stability is a critical aspect of neural networks and control systems. A system is considered stable if
its output remains bounded for any bounded input. In the context of neural networks, stability analysis
helps us understand how changes in weights, biases, and inputs affect the network’s behavior over
time.
The Laplace Transform provides a method for analyzing stability by converting time-domain dif-
ferential equations that describe the system into algebraic equations in the frequency domain. This
makes it easier to analyze the poles of the system, which determine stability[247].
The poles of a system are the values of s in the Laplace domain that make the denominator of the
transfer function zero. For a continuous-time system, if all poles have negative real parts, the system
is stable. Conversely, if any pole has a positive real part, the system is unstable.
For example, consider a simple first-order linear system described by the differential equation:
\tau \frac{dy(t)}{dt} + y(t) = K u(t)
where τ is the time constant, K is the steady-state gain, and u(t) is the input. Taking the Laplace Transform with zero initial conditions gives the transfer function

H(s) = \frac{Y(s)}{U(s)} = \frac{K}{\tau s + 1}
The pole of this transfer function is at s = -1/τ. Since τ > 0, the pole is in the left-half plane,
indicating that the system is stable.
Example: Stability Analysis in Python
Here is an example of how to perform stability analysis of a first-order system using Python:
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import TransferFunction, step

# First-order system H(s) = K / (tau*s + 1)
K = 1.0     # steady-state gain
tau = 2.0   # time constant (pole at s = -1/tau, in the left-half plane)
system = TransferFunction([K], [tau, 1])

# Compute and plot the step response
t, y = step(system)

plt.plot(t, y)
plt.title('Step Response of First-Order System')
plt.xlabel('Time [s]')
plt.ylabel('Response')
plt.grid()
plt.axhline(1, color='r', linestyle='--', label='Steady State Value')
plt.legend()
plt.show()
In this example:
• We describe the system by its transfer function H(s) = K/(τs + 1) using scipy.signal.TransferFunction.
• The step response of the system is plotted, demonstrating how the system responds to a step input over time.
This analysis can be extended to more complex systems, including those with multiple poles and
zeros, where the stability can be assessed by examining the location of poles in the complex plane.
Consider a first-order ODE of the form

\frac{dy(t)}{dt} + a y(t) = b u(t)

where a and b are constants and u(t) is the input. Taking the Laplace Transform with zero initial conditions and solving for Y(s) gives

Y(s) = \frac{b\, U(s)}{s + a}
This expression can be inverted using the inverse Laplace Transform to find y(t).
Example: Solving the ODE in Python
Let’s consider the case where u(t) = 1 (a step input) and solve the ODE using Python.
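A minimal symbolic sketch of this approach, assuming a unit step input so that U(s) = 1/s, is:

import sympy as sp

t, s = sp.symbols('t s', positive=True)
a, b = sp.symbols('a b', positive=True)

# Step input u(t) = 1  =>  U(s) = 1/s, so Y(s) = b*U(s)/(s + a)
Y = b / (s * (s + a))

# Invert to obtain the time-domain response
y_t = sp.inverse_laplace_transform(Y, s, t)
print(sp.simplify(y_t))   # b*(1 - exp(-a*t))/a, up to a Heaviside factor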
In this example:
• We define the variables and the differential equation using symbolic computation.
• The Laplace Transform is taken, and the solution for Y (s) is derived.
• The inverse Laplace Transform is computed to obtain the solution in the time domain.
The solution y(t) provides the response of the system over time to a step input, demonstrating how
the Laplace Transform simplifies the process of solving differential equations.
Conclusion
The Laplace Transform is a versatile tool in both control systems and deep learning applications. It
allows for effective stability analysis of neural networks and provides a systematic method for solving
differential equations, which are critical for modeling dynamic systems. By transforming complex
differential equations into simpler algebraic forms, the Laplace Transform simplifies the analysis and
design of systems in various engineering disciplines. Understanding these applications is essential
for engineers and data scientists working with systems that evolve over time.
Chapter 40
Z-Transform
The Z-transform is a powerful mathematical tool used in the field of signal processing, control sys-
tems, and digital signal processing. It provides a method to analyze discrete-time signals and systems
in the frequency domain. In this chapter, we will introduce the concept of the Z-transform, its mathe-
matical definition, common sequences, inverse Z-transform, and its properties.
The Z-transform of a discrete-time signal x[n] is defined as

X(z) = \sum_{n=-\infty}^{\infty} x[n]\, z^{-n}

where:
• z is a complex variable, defined as z = r e^{j\omega}, where r is the magnitude and ω is the angle (frequency).
The Z-transform provides a way to analyze the behavior of discrete-time systems in terms of their
poles and zeros in the complex plane.
A common example is the exponential sequence x[n] = a^n u[n], whose Z-transform is X(z) = 1/(1 - a z^{-1}) for |z| > |a|. The Z-transform also satisfies a number of useful properties:
• Linearity:
Z{a1 x1 [n] + a2 x2 [n]} = a1 X1 (z) + a2 X2 (z)
• Time Shifting:
Z{x[n − k]} = z −k X(z)
• Scaling in the z-Domain:
Z\{a^n x[n]\} = X(z/a)
• Convolution:
Z{x[n] ∗ h[n]} = X(z)H(z)
• Differentiation:
Z\{n\, x[n]\} = -z \frac{dX(z)}{dz}
These properties simplify the analysis and design of digital filters and control systems.
Example: Using Properties of Z-Transform
Let’s consider the Z-transform of a simple signal using its properties. Suppose we want to find the
Z-transform of x[n] = u[n] + 2u[n − 1].
import sympy as sp

n, z = sp.symbols('n z')
U = sp.summation(z**(-n), (n, 0, sp.oo))   # Z{u[n]}
X = sp.simplify(U + 2 * U / z)             # linearity plus time shifting for 2u[n-1]
This code uses the properties of the Z-transform to calculate the Z-transform of the combined
signal, demonstrating how to leverage the properties in practical applications.
In summary, the Z-transform is a fundamental tool for analyzing discrete-time systems and sig-
nals, providing insights into their behavior in the frequency domain. Understanding the mathematical
definition, common sequences, inverse Z-transform, and properties of the Z-transform is crucial for
signal processing and control system design.
The Z-Transform of a discrete-time signal x[n] is given by

X(z) = \sum_{n=-\infty}^{\infty} x[n]\, z^{-n}
where z is a complex number defined as z = r e^{j\omega}, where r is the radius and ω is the angular
frequency. The Z-Transform transforms the discrete-time signal from the time domain to the complex
frequency domain, allowing for easier analysis of linear time-invariant systems.
The Z-Transform is particularly useful for analyzing the stability and frequency response of discrete-
time systems. The poles and zeros of the Z-Transform provide insight into the behavior of the system.
Example: Z-Transform of a Simple Discrete Signal
Let's consider a simple discrete-time signal x[n] = a^n u[n], where u[n] is the unit step function and
a is a constant. The Z-Transform of this signal can be calculated as follows:
X(z) = \sum_{n=0}^{\infty} a^n z^{-n} = \frac{1}{1 - a z^{-1}}, \qquad |z| > |a|
This result shows that the Z-Transform of a geometric sequence converges for |z| > |a|.
Calculating the Z-Transform in Python
Let’s implement this example in Python and visualize the Z-Transform of the discrete signal.
import numpy as np
import matplotlib.pyplot as plt

# Define parameters
a = 0.5               # Decay factor
n = np.arange(0, 20)  # Discrete time values
x = a**n              # Signal x[n] = a^n u[n]

# Evaluate X(z) = 1/(1 - a z^-1) on the unit circle z = e^{j*omega}
omega = np.linspace(-np.pi, np.pi, 512)
X = 1.0 / (1.0 - a * np.exp(-1j * omega))

plt.subplot(1, 2, 1); plt.stem(n, x); plt.title('x[n] = a^n u[n]')
plt.subplot(1, 2, 2); plt.plot(omega, np.abs(X)); plt.title('|X| on the unit circle')
plt.tight_layout()
plt.show()
In this example:
• The original discrete signal and its Z-Transform magnitude are plotted for analysis.
For a simple recurrent update of the form h[n] = f(W_x x[n] + W_h h[n-1]), a heuristic frequency-domain description of the hidden state is

H(z) = \frac{f(W_x X(z))}{1 - W_h z^{-1} f'(H(z))}
This expression indicates how the hidden state responds to the input in the frequency domain.
Using Z-Transform for Sequence Prediction in RNNs
In a practical implementation, RNNs can leverage the Z-Transform to improve their performance in
sequence prediction tasks. By analyzing the Z-Transform of the hidden states, we can determine the
appropriate architecture and activation functions that lead to stable and efficient learning.
To illustrate the application of RNNs in Python, we can use a simple RNN model implemented with
Keras:
import numpy as np
from tensorflow import keras

# Generate data: targets are the sum of the input features over all timesteps
timesteps = 100
features = 5
X = np.random.rand(256, timesteps, features)
y = X.sum(axis=(1, 2))

model = keras.Sequential([keras.layers.Input(shape=(timesteps, features)),
                          keras.layers.SimpleRNN(16),
                          keras.layers.Dense(1)])
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=3, verbose=0)
In this example:
• We generate synthetic sequential data, where the target is the sum of the input features.
• We create a simple RNN model using Keras and train it on the generated data.
• The trained model is evaluated, demonstrating its ability to learn from sequential data.
The Z-Transform assists in understanding the underlying mechanics of RNNs and how they handle
temporal dependencies, which is crucial for tasks involving sequences.
Chapter 41
Convolution in Time and Frequency Domains
Convolution is a mathematical operation that combines two functions to produce a third function. It
represents the way in which one function influences another. In the context of signals and systems,
convolution describes how an input signal is transformed by a system represented by an impulse
response.
Applications of Convolution:
• Filtering: Convolution is used to filter signals, such as removing noise or enhancing certain fea-
tures.
• Image Processing: Convolution is applied in various image processing tasks, such as blurring,
sharpening, and edge detection.
• Neural Networks: Convolutional Neural Networks (CNNs) utilize convolution to extract features
from images and other data.
Convolution is an operation that takes two input functions and produces a new function that expresses
how the shape of one function is modified by the other. For continuous functions, the convolution of f(t) and g(t) is defined as

(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau

For discrete sequences, the convolution is given by the sum

(f * g)[n] = \sum_{m} f[m]\, g[n - m]
Example Calculation: for sequences with f[0] = 1 and g[0] = 0, the first output sample is (f ∗ g)[0] = f[0] g[0] = 1 · 0 = 0.
Convolution also satisfies several useful properties:
• Commutative Property: f ∗ g = g ∗ f
• Associative Property: f ∗ (g ∗ h) = (f ∗ g) ∗ h
• Distributive Property: f ∗ (g + h) = f ∗ g + f ∗ h
• Identity Property: f ∗ δ(t) = f (t), where δ(t) is the Dirac delta function.
These properties make convolution a flexible and powerful tool for analyzing linear systems.
The Convolution Theorem links the two domains: the Fourier Transform of a convolution equals the product of the individual Fourier Transforms,

\mathcal{F}\{f * g\} = \mathcal{F}\{f\} \cdot \mathcal{F}\{g\}
This property is critical because it allows us to analyze systems in the frequency domain, which
can often simplify calculations.
Given two functions f (t) and g(t), the convolution (f ∗ g)(t) in the time domain corresponds to multi-
plication in the frequency domain:
F {f ∗ g} = F (ω)G(ω)
where F (ω) and G(ω) are the Fourier Transforms of f (t) and g(t), respectively.
Example: Verifying the Convolution Theorem
Let’s compute the Fourier Transform of the convolution from our previous example and verify the
convolution theorem using Python.
import numpy as np
from scipy.fft import fft, ifft

f, g = np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.0, 0.5])   # example sequences

# Direct (linear) convolution in the time domain
convolution_result = np.convolve(f, g)

# Multiply zero-padded FFTs and invert (convolution theorem)
N = len(f) + len(g) - 1
inverse_fft_result = np.real(ifft(fft(f, N) * fft(g, N)))

# Print results
print("Convolution Result:", convolution_result)
print("Inverse FFT Result:", inverse_fft_result)
This code calculates the convolution directly using NumPy’s convolve function and verifies the
result using the FFT and inverse FFT.
Expected Output:
The convolution result and the inverse FFT result should match, demonstrating the convolution
theorem’s validity.
Conversely, multiplying two functions in the time domain corresponds to convolution in the frequency
domain:
\mathcal{F}\{f(t)\, g(t)\} = \frac{1}{2\pi} \left( \mathcal{F}\{f\} * \mathcal{F}\{g\} \right)
This property allows us to analyze the effects of multiplicative interactions between signals in the
frequency domain.
Example: Verifying the Multiplication Theorem
Let’s implement the multiplication in the time domain and observe the convolution in the frequency
domain.
import numpy as np
from numpy.fft import fft, ifft

f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 1.0, 0.5, 0.0])
N = len(f)

product_time_domain = f * g                                # multiplication in the time domain
F, G = fft(f), fft(g)
convolution_frequency_domain = ifft(fft(F) * fft(G)) / N   # (1/N) * circular convolution of the spectra

# Print results
print("Product in Time Domain:", product_time_domain)
print("Convolution of Fourier Transforms:", convolution_frequency_domain)
print("Theorem holds:", np.allclose(fft(product_time_domain), convolution_frequency_domain))
This example shows the relationship between multiplication in the time domain and convolution
in the frequency domain.
41.4 Summary
In this chapter, we explored the concept of convolution in both the time and frequency domains. We
defined convolution mathematically, discussed its properties, and introduced the convolution theorem
that links the two domains. The understanding of convolution is crucial in signal processing, image
analysis, and machine learning applications, providing the basis for filtering and feature extraction.
By employing the FFT, convolutions can be computed efficiently, facilitating real-time processing and
analysis of signals and data.
1. Transform the input and the filter into the frequency domain using the Fast Fourier Transform (FFT).
2. Multiply the two frequency-domain representations element-wise.
3. Transform the result back to the time domain using the Inverse FFT.
This approach can significantly reduce the computational complexity from O(N 2 ) for direct convo-
lution to O(N log N ) for FFT-based convolution, making it particularly advantageous for large datasets.
Python Example: Frequency Domain Convolution
Here is a simple implementation of convolution in the frequency domain using NumPy:
import numpy as np

def frequency_domain_convolution(signal, kernel):
    N = len(signal) + len(kernel) - 1            # length of the full linear convolution
    signal_fft = np.fft.fft(signal, N)
    kernel_fft = np.fft.fft(kernel, N)
    convolved_signal = np.fft.ifft(signal_fft * kernel_fft)
    return np.real(convolved_signal)

# Example usage
signal = np.array([1, 2, 3, 4])
kernel = np.array([0.25, 0.5, 0.25])

print(frequency_domain_convolution(signal, kernel))   # matches np.convolve(signal, kernel)
In this example:
• We compute the FFT of both the signal and kernel, multiply their frequency representations, and
then apply the inverse FFT to obtain the convolved signal.
• The result demonstrates the convolution of the original signal with the specified kernel.
1. Transform Input and Filters: Convert the input feature maps and convolutional filters to the frequency domain using FFT.
2. Element-wise Multiplication: Multiply the transformed feature maps and filters in the frequency domain.
3. Inverse Transform: Apply the inverse FFT to obtain the convolved feature maps in the spatial domain.
import numpy as np
import tensorflow as tf

def fft_convolution(feature_map, kernel):
    # Zero-pad the kernel to the feature-map size, then convolve via 2D FFT
    ph = feature_map.shape[0] - kernel.shape[0]
    pw = feature_map.shape[1] - kernel.shape[1]
    kernel_padded = tf.pad(kernel, [[0, ph], [0, pw]])
    fm_fft = tf.signal.fft2d(tf.cast(feature_map, tf.complex64))
    k_fft = tf.signal.fft2d(tf.cast(kernel_padded, tf.complex64))
    convolved_output = tf.signal.ifft2d(fm_fft * k_fft)
    return tf.abs(convolved_output)

# Edge-detection-style kernel applied to a random feature map
feature_map = tf.constant(np.random.rand(8, 8), dtype=tf.float32)
kernel = tf.constant([[1., 0., -1.], [1., 0., -1.], [1., 0., -1.]])
result = fft_convolution(feature_map, kernel)
In this example:
• We define a function fft_convolution() to perform convolution using FFT, similar to what would
occur in a CNN layer.
• The kernel simulates an edge detection filter, which is commonly used in image processing.
A related technique is spectral pooling, which performs the pooling step in the frequency domain. Its advantages include:
• Reduced Dimensionality: By performing pooling in the frequency domain, it can effectively reduce the dimensionality of the feature maps while preserving important information.
• Robustness to Noise: Frequency domain operations can help in making models more robust to
noise and variations in the input data.
import numpy as np

def spectral_pooling(feature_map, pool_size):
    # Shift the 2D FFT so low frequencies sit in the centre, then keep a pool_size x pool_size window
    fft_map = np.fft.fftshift(np.fft.fft2(feature_map))
    h, w = fft_map.shape
    mask = np.zeros_like(fft_map)
    mask[h//2 - pool_size//2 : h//2 + pool_size//2,
         w//2 - pool_size//2 : w//2 + pool_size//2] = 1
    pooled_feature_map = np.fft.ifft2(np.fft.ifftshift(fft_map * mask))
    return np.real(pooled_feature_map)

# Example usage
feature_map = np.random.rand(8, 8)   # Simulate a feature map from a CNN
pooled_feature_map = spectral_pooling(feature_map, pool_size=4)
In this example:
• We compute the FFT of the feature map and set high-frequency components to zero based on
the specified pooling size.
• The inverse FFT reconstructs the pooled feature map from the modified frequency representa-
tion.
• The results show how spectral pooling can reduce the dimensionality of the feature map while
maintaining important low-frequency information.
In conclusion, the Convolution Theorem and its applications in deep learning, such as FFT-based
convolution, spectral pooling, and efficient frequency domain operations, provide powerful tools for
enhancing the performance of neural networks. These techniques enable faster computations and
better feature extraction, making them essential in modern machine learning frameworks.
Chapter 42
Practical Applications of Frequency Domain Methods
Frequency domain methods are widely used in various fields such as image processing, audio signal
processing, control systems, and deep learning. These methods allow us to analyze, manipulate, and
process signals and data effectively. In this chapter, we will explore the applications of Fourier Trans-
form, Fast Fourier Transform (FFT), Laplace Transform, and Z-Transform in practical scenarios[193].
2. Image Compression: The Fourier Transform is also used in compression algorithms, such as
JPEG. By transforming an image into the frequency domain, we can discard less important fre-
quency components, allowing for reduced file sizes.
3. Feature Extraction: In machine learning and neural networks, the Fourier Transform can be used
to extract features from images. By analyzing the frequency components, neural networks can
learn to recognize patterns and classify images more effectively.
import numpy as np
import matplotlib.pyplot as plt

# A synthetic grayscale test pattern stands in here for an image loaded from disk
xx, yy = np.meshgrid(np.linspace(0, 1, 128), np.linspace(0, 1, 128))
image = np.sin(2 * np.pi * 8 * xx) + np.sin(2 * np.pi * 16 * yy)

# 2D Fourier Transform and log-magnitude spectrum
F = np.fft.fftshift(np.fft.fft2(image))
magnitude_spectrum = np.log1p(np.abs(F))

# Original image
plt.subplot(1, 2, 1)
plt.imshow(image, cmap='gray')
plt.title('Original Image')
plt.axis('off')

# Magnitude spectrum
plt.subplot(1, 2, 2)
plt.imshow(magnitude_spectrum, cmap='gray')
plt.title('Magnitude Spectrum')
plt.axis('off')

plt.show()
In this example:
• We take a grayscale image (in the sketch above, a synthetic test pattern stands in for a file loaded from disk) and compute its 2D Fourier Transform using fft2.
• The magnitude spectrum is calculated and displayed, showing the frequency content of the im-
age.
2. Audio Filtering: FFT enables efficient filtering of audio signals by manipulating specific fre-
quency ranges. For instance, one can remove noise from a recording by suppressing unwanted
frequency components.
3. Speech Recognition: In speech processing, FFT helps convert time-domain audio signals into
frequency-domain representations. These representations can be used to extract features for
machine learning algorithms to recognize spoken words.
import numpy as np
import matplotlib.pyplot as plt
from scipy.fft import fft, fftfreq

# A synthetic signal (440 Hz tone plus noise) stands in here for recorded audio
sampling_rate = 8000
N = 2 * sampling_rate
t = np.linspace(0, N / sampling_rate, N, endpoint=False)
audio_data = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(N)

yf = fft(audio_data)
xf = fftfreq(N, 1 / sampling_rate)

# Time-domain signal
plt.subplot(1, 2, 1)
plt.plot(np.linspace(0, N / sampling_rate, N), audio_data)
plt.title('Time-Domain Audio Signal')
plt.xlabel('Time [s]')
plt.ylabel('Amplitude')

# Frequency-domain (FFT)
plt.subplot(1, 2, 2)
plt.plot(xf[:N // 2], 2.0 / N * np.abs(yf[:N // 2]))
plt.title('FFT of Audio Signal')
plt.xlabel('Frequency [Hz]')
plt.ylabel('Magnitude')
plt.grid()

plt.show()
In this example:
• We analyze a short audio signal (here a synthetic 440 Hz tone plus noise, standing in for a recording).
• The time-domain signal and its frequency-domain representation are plotted, illustrating how FFT reveals the frequency components of the audio signal.
2. Controller Design: Control system designers use the Laplace Transform to design controllers
(like PID controllers) that maintain system stability and performance.
3. Response Analysis: It enables the analysis of the system response to various inputs, including
step, impulse, and sinusoidal inputs.
import matplotlib.pyplot as plt
from scipy.signal import TransferFunction, step

# Example first-order plant H(s) = 1 / (2s + 1)
system = TransferFunction([1.0], [2.0, 1.0])
t, y = step(system)

plt.plot(t, y, label='Step Response')
plt.xlabel('Time [s]')
plt.ylabel('Output')
plt.legend()
plt.show()
In this example:
• The step response is plotted to analyze how the system responds to a step input over time.
2. Stability Analysis: The Z-Transform helps determine the stability of digital filters by analyzing
the poles of the transfer function in the Z-domain.
3. Signal Analysis: It allows for efficient analysis of discrete signals, enabling the extraction of
frequency components and system characteristics.
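A minimal sketch of such a filter, using a Butterworth design for the Z-domain coefficients (the cutoff frequency, filter order, and noise level are assumed values), is:

import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import butter, lfilter

fs = 500
t = np.arange(0, 1, 1 / fs)
noisy = np.sin(2 * np.pi * 5 * t) + 0.5 * np.random.randn(len(t))

# Digital low-pass filter: b(z)/a(z) coefficients from a 4th-order Butterworth design
b, a = butter(4, 20, btype='low', fs=fs)
filtered = lfilter(b, a, noisy)

plt.plot(t, noisy, alpha=0.5, label='Noisy Signal')
plt.plot(t, filtered, label='Filtered Signal')
plt.legend()
plt.show()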
In this example:
• We design a low-pass filter using the Z-Transform and apply it to a noisy sine wave.
• The original noisy signal and the filtered signal are plotted to demonstrate the effectiveness of
the filter.
Chapter 43
Conclusion
Frequency domain methods are integral to modern signal processing, control systems, and machine
learning applications. The Fourier Transform, FFT, Laplace Transform, and Z-Transform provide pow-
erful techniques for analyzing and manipulating signals and systems. By understanding and applying
these methods, practitioners can enhance their ability to design systems, process data, and solve
complex problems in various engineering disciplines.
Chapter 44
Practice Problems
This chapter contains a set of practice problems designed to reinforce the concepts learned through-
out this book, particularly focusing on Fourier and Laplace transforms, FFT and convolution theorem,
and applications of frequency domain methods in deep learning.
(Several of the problems work with short test sequences such as x[n] = {1, 2, 3, 4}.)
Chapter 45
Summary
In this chapter, we summarize the key concepts covered throughout the book, providing a concise
recap of the fundamental ideas.
Bibliography
[2] Tamer Abdelsalam and A. M. Ibrahim. Laplace transform and its applications in engineering.
Mathematical Methods in Engineering, 18(4):204–221, 2023.
[3] Monika Agarwal and Rajesh Mehra. Review of matrix decomposition techniques for signal pro-
cessing applications. International Journal of Engineering Research and Applications, 4(1):90–
93, 2014.
[4] Charu C. Aggarwal, Lagerstrom-Fife Aggarwal, and Lagerstrom-Fife. Linear algebra and opti-
mization for machine learning, volume 156. Springer International Publishing, Cham, 2020.
[5] John H. Ahlberg, Norman L. Fox, Leo S. Goodwin, Carl H. Hayden, John G. Krogh, and Marvin H.
Thompson. Methods in Computational Physics. Academic Press, 1968.
[6] Vladimir I. Arnold. Ordinary Differential Equations. Springer-Verlag, Berlin, Heidelberg, 3rd edition,
1992.
[7] Kendall E Atkinson. An introduction to numerical analysis. John Wiley & Sons, 1989.
[8] Owe Axelsson. Iterative Solution Methods, volume 5. Cambridge University Press, 1996.
[9] Benjamin Baka. Python Data Structures and Algorithms. Packt Publishing Ltd, 2017.
[11] Claire Baker, Volodymyr Mnih, and Tom Schaul. Generalized value functions for reinforcement
learning with learning objectives. Journal of Machine Learning Research, 25(1):1–35, 2024.
[12] Steve Baker and Wei Zhang. Jacobian-free newton-krylov methods in scientific computing: Chal-
lenges and solutions. SIAM Journal on Scientific Computing, 46(2):A211–A234, 2024.
[13] Peter L Bartlett and Shahar Mendelson. A bound on the error of cross validation using approxi-
mation algorithms. IEEE Transactions on Information Theory, 51(11):4005–4014, 2005.
[14] Giulia Battaglia, Ludmil Zikatanov, and Pasquale Barone. Efficient physics-informed neural net-
works for solving nonlinear pdes. Journal of Computational Physics, 473:111741, 2023.
[15] Roberto Battiti. First-and second-order methods for learning: between steepest descent and
newton’s method. Neural computation, 4(2):141–166, 1992.
[16] Atilim Gunes Baydin, Barak A. Pearlmutter, Alexey A. Radul, and Jeffrey M. Siskind. Automatic
differentiation in machine learning: a survey. Journal of Machine Learning Research, 18(153):1–
43, 2018.
[17] Yoshua Bengio. Practical recommendations for gradient-based training of deep architectures. In
Grégoire Montavon, Geneviève Orr, and Klaus-Robert Müller, editors, Neural Networks: Tricks of
the Trade: Second Edition, pages 437–478. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.
[18] Philip R. Bevington and D. Keith Robinson. Data Analysis and Error Estimation for the Physical
Sciences. McGraw-Hill Physical and Engineering Sciences Series. McGraw-Hill Higher Education,
2002.
[19] Dario Bini and Victor Y. Pan. Polynomial and matrix computations: fundamental algorithms.
Springer Science & Business Media, 2012.
[20] Christopher M. Bishop. Pattern recognition and machine learning. springer, 2006.
[21] Äke Björck. Numerical Methods for Least Squares Problems. Society for Industrial and Applied
Mathematics, 1996.
[22] Boualem Boashash. Time-frequency signal analysis and processing: a comprehensive reference.
Academic press, 2015.
[23] Léon Bottou. Large-scale machine learning with stochastic gradient descent. Proceedings of
the 19th International Conference on Computational Statistics, pages 177–186, 2010.
[24] Stephen Boyd and Lieven Vandenberghe. Convex optimization, volume 3. Cambridge university
press, 2004.
[25] Ronald Bracewell. Fourier Analysis and Imaging. Springer Science & Business Media, 2012.
[26] Ronald N. Bracewell. Fourier transforms and their applications. McGraw-Hill New York, 1986.
[28] Harold L Broberg. Laplace and z transform analysis and design using matlab. In 1996 Annual
Conference, pages 1–295, Jun 1996.
[29] John W. Brown. An Introduction to the Numerical Analysis of Functional Equations, volume 7.
Springer, 1967.
[30] Tom Brown and Sarah Wilson. Ordinary differential equations in scientific computing: The state
of the art. SIAM Review, 65(3):455–486, 2023.
[31] Charles G Broyden. A class of methods for solving nonlinear simultaneous equations. Mathe-
matics of computation, 19(92):577–593, 1965.
[32] Charles George Broyden. The convergence of a class of double-rank minimization algorithms:
1. general considerations. IMA Journal of Applied Mathematics, 6(1):76–90, 1970.
[34] R. L. Burden and J. D. Faires. Finite Difference Methods for Ordinary and Partial Differential Equa-
tions: Steady-State and Time-Dependent Problems. Brooks/Cole, 2016.
[35] Richard L. Burden and J. Douglas Faires. A First Course in Numerical Analysis. Prentice Hall.
Prentice Hall, 2001.
[36] Richard L. Burden and J. Douglas Faires. Numerical Analysis. Prentice-Hall series in automatic
computation. Prentice Hall, 2015.
[37] Richard L. Burden and J. Douglas Faires. Numerical Analysis. Brooks/Cole Engineering. Cengage
Learning, 2016.
[38] C. Sidney Burrus. Fft: An algorithm the whole family can use. IEEE ASSP Magazine, 2(4):4–15,
1985.
[39] Richard H. Byrd, Robert B. Schnabel, and Zhong Zhang. Recent advances in unconstrained op-
timization: theory and methods. Annual Review of Computational Mathematics, 5(1):205–248,
2023.
[40] Huiping Cao, Xiaomin An, and Jing Han. Solving nonlinear equations with a direct broyden
method and its acceleration. Journal of Applied Mathematics and Computing, 2023.
[41] James Carroll. Interpolation and Approximation by Polynomials. American Mathematical Soc.,
2006.
[42] E. C. Carson. The laplace transform and its applications. Journal of the Franklin Institute,
205(6):951–963, 1928.
[43] Steven Chapra and Raymond Canale. Numerical Methods for Engineers: Methods and Applica-
tions. McGraw-Hill Higher Education, 2010.
[44] Steven Chapra and Raymond Canale. Numerical Methods for Engineers. The Brooks/Cole Engi-
neering Series. McGraw-Hill Education, 2011.
[45] Steven Chapra and Raymond Canale. Numerical Methods for Engineers. McGraw-Hill Education,
2016.
[46] Jie Chen and Yi Lin. Dynamical systems for stiff odes: A survey and new approaches. Mathe-
matics, 11(5):895, 2023.
[47] Jie Chen and Wen Zhang. A survey of finite difference methods for time-dependent pdes. Math-
ematics, 11(7):1583, 2023.
[48] Tianqi Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. Neural ordinary differ-
ential equations: Advances and applications in machine learning. Journal of Machine Learning
Research, 24:1–40, 2023.
[49] Ting Chen, Xiaohui Tao, and Michael K Hu. A measure of diversity in classifier ensembles. In
Sixth International Conference on Intelligent Data Engineering and Automated Learning (IDEAL’05),
volume 3578, pages 16–25. Springer, 2004.
[50] Wei Chen and Feng Zhao. Laplace transform analysis of nonlinear dynamic systems. Journal
of Sound and Vibration, 541:117181, 2024.
[51] Yi Chen, Mingrui Sun, and Lin Zhang. Gradient-free optimization in machine learning: Algorithms,
applications, and challenges. Journal of Machine Learning Research, 25:1–34, 2024.
[52] Ward Cheney and Will Light. An Introduction to the Numerical Analysis of Functional Equations.
Corrected reprint of the 1966 original. Dover Publications, 2009.
[53] André-Louis Cholesky. Sur la résolution des équations linéaires par la méthode des moindres
carrés. Gazette des Ponts et Chaussées, 3(3):161–173, 1907.
[55] Andrzej Cichocki. Tensor networks for dimensionality reduction, big data and deep learning.
In Advances in Data Analysis with Computational Intelligence Methods: Dedicated to Professor
Jacek Żurada, pages 3–49. 2018.
[56] Codewars Team. Codewars: Achieve mastery through coding practice and developer mentor-
ship. https://www.codewars.com/, 2024. Accessed: 2024-10-09.
[57] A. R. Colquhoun and A. R. Gibson. Numerical Interpolation, Differentiation, and Integration. Oxford
University Press, 1997.
[58] Andrew R. Conn, Katya Scheinberg, and Luis N. Vicente. Advances in derivative-free optimization
for unconstrained problems. Optimization Methods and Software, 38(2):302–321, 2023.
[59] James W Cooley and John W Tukey. Algorithm 501: The fast fourier transform. Communications
of the ACM, 13(2):1–16, 1965.
[60] Don Coppersmith and Shmuel Winograd. Matrix multiplication via arithmetic progressions. In
Proceedings of the nineteenth annual ACM symposium on Theory of computing. ACM, 1987.
[62] Richard Courant, Fritz John, Albert A. Blank, and Alan Solomon. Introduction to calculus and
analysis, volume 1. Interscience Publishers, New York, 1965.
[63] Frank C Curriero. On the use of non-euclidean distance measures in geostatistics. Mathematical
Geology, 38(9):907–926, 2006.
[64] Ashok Cutkosky and Francesco Orabona. Momentum-based variance reduction in non-convex
sgd. In Advances in Neural Information Processing Systems, volume 32, 2019.
[65] George Cybenko. The approximation capability of multilayer feedforward networks. Mathemat-
ics of Control, Signals, and Systems (MCSS), 2(4):303–314, 1989.
[66] Ezey M. Dar-El. Human Learning: From Learning Curves to Learning Organizations, volume 29.
Springer Science & Business Media, New York, NY, 1st edition, 2013.
[67] Gaston Darboux. Sur les transformations de laplace. Annali di Matematica Pura e Applicata,
14:119–158, 1915.
[68] Philip J Davis. On the newton interpolation formula. The American Mathematical Monthly,
74(3):258–266, 1967.
[69] Carl De Boor. On calculating with splines. Journal of Approximation Theory, 6(1):50–62, 1972.
[70] James W. Demmel. On the stability of gaussian elimination. SIAM journal on numerical analysis,
26(4):882–899, 1989.
[71] James W. Demmel. The qr algorithm for real hessenberg matrices. SIAM Journal on Scientific
and Statistical Computing, 10(6):1042–1078, 1989.
[72] James W. Demmel. Round-off errors in matrix procedures. SIAM Journal on Numerical Analysis,
29(5):1119–1178, 1992.
[73] P. G. L. Dirichlet. On the convergence of fourier series. Journal für die reine und angewandte
Mathematik, 1829.
[74] Urmila M. Diwekar. Introduction to Applied Optimization, volume 22. Springer Nature, 2020.
[75] Matthew R. Dowling and Lindsay D. Grant. A review of recent advances in runge-kutta methods
for solving differential equations. Numerical Algorithms, 96(1):89–113, 2023.
[76] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning
and stochastic optimization. Journal of Machine Learning Research, 12:2121–2159, 2011.
[77] Iain S. Duff and Jack K. Reid. Ma48–a variable coefficient sparse indefinite solver. i. the algo-
rithm. ACM Transactions on Mathematical Software (TOMS), 9(3):309–326, 1983.
[78] Roger Dufresne. A general convolution theorem for fourier transforms. Mathematics of Compu-
tation, 72(243):1045–1059, 2003.
[79] George Eckart and Gale Young. The approximation of one matrix by another of lower rank.
Psychometrika, 1(3):211–218, 1936.
[80] Shervin Erfani and Nima Bayan. Characterisation of nonlinear and linear time-varying systems
by laplace transformation. International Journal of Systems Science, 44(8):1450–1467, 2013.
[81] Lawrence C. Evans. Partial Differential Equations, volume 19 of Graduate Studies in Mathematics.
American Mathematical Society, Providence, RI, 2nd edition, 2010.
[82] Khaled Fayed and Mohammed Ali. A new adaptive step-size method for stiff odes. Applied
Mathematics and Computation, 457:127054, 2023.
[83] Anthony V Fiacco and Garth P McCormick. Nonlinear programming: Sequential unconstrained
minimization techniques. 1968.
[84] Wendell H Fleming and Albert Tong. Interpolation and approximation, volume 8. Springer, 1977.
[85] Roger Fletcher. A new approach to variable metric algorithms. The Computer Journal, 13(3):317–
322, 1970.
[86] Roger Fletcher. Practical Methods of Optimization. John Wiley & Sons, 2nd edition, 2013.
[87] J. A. Ford and I. A. Moghrabi. Multi-step quasi-newton methods for optimization. Journal of
Computational and Applied Mathematics, 50(1-3):305–323, 1994.
[88] Jean-Baptiste Joseph Fourier. Analytical Theory of Heat. Cambridge University Press, 1822.
[89] Luca Franceschi, Michele Donini, Paolo Frasconi, and Massimiliano Pontil. Forward and reverse
gradient-based hyperparameter optimization. In International Conference on Machine Learning,
pages 1165–1173. PMLR, 2017.
[90] Jerome H Friedman. A proof that piecewise linear interpolation of data points is a spline. Tech-
nical report, Stanford University, 1984.
[91] Jerome H Friedman and John W Tukey. Projection pursuit. IEEE Transactions on Computers,
C-23(9):881–890, 1974.
[92] Claus Fuhrer, Jan Erik Solem, and Olivier Verdier. Scientific Computing with Python: High-
performance scientific computing with NumPy, SciPy, and pandas. Packt Publishing Ltd, 2021.
Includes additional publication date information.
[94] Maria A. Garcia and Peter R. Johnson. Jacobian and hessian matrices in nonlinear optimization
for machine learning algorithms. Journal of Optimization Theory and Applications, 187(1):101–
118, 2023.
[95] Miguel Garcia and Ayesha Patel. Efficient training of neural networks using l-bfgs: A comparative
study. Journal of Machine Learning Research, 25:1–22, 2024.
[96] Carl Friedrich Gauss. Tafeln der integrale der ersten art mit anwendungen auf die gaussische
theorie der quadrature. Journal of reine und angewandte Mathematik, 1814.
[97] C. William Gear. First-order differential equations and stiff systems. Communications of the
ACM, 14(10):722–733, 1971.
[98] Saptarshi Ghosh, Kyu-Jin Lee, and Weili Chen. A survey of deep reinforcement learning in
robotics: Trends and applications. Journal of Robotics and Automation, 12(1):1–25, 2024.
[100] Philip E. Gill, Walter Murray, and Margaret H. Wright. Practical Optimization. Academic Press,
1981.
[101] Donald Goldfarb. A family of variable-metric methods derived by variational means. Mathemat-
ics of computation, 24(109):23–26, 1970.
[102] Ronald N Goldman. Illicit expressions in vector algebra. ACM Transactions on Graphics (TOG),
4(3):223–243, 1985.
[103] G. H. Golub and W. Kahan. Computing the singular value decomposition. SIAM Journal on
Numerical Analysis, 2(2):205–224, 1965.
[104] G. H. Golub and J. M. Ortega. An Introduction to the Numerical Analysis of Functional Equations.
SIAM, 1993.
[105] Gene H Golub and Christian Reinsch. Singular value decomposition and least squares problems.
Numerische Mathematik, 14(5):403–420, 1970.
[106] Gene H. Golub and Charles F. Van Loan. Matrix computations. Johns Hopkins studies in the
mathematical sciences. JHU Press, 2012.
[107] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins University Press,
4th edition, 2013.
[108] Lars Grasedyck, Ronald Kriemann, and Sabine Le Borne. Domain decomposition based-lu pre-
conditioning. Numerische Mathematik, 112(4):565–600, 2009.
[109] Werner H Greub. Linear algebra, volume 23. Springer Science & Business Media, 2012.
[110] Ming Gu and Stanley Eisenstat. Qr decomposition and its applications. SIAM Journal on Scien-
tific Computing, 15(5):1257–1271, 1994.
[111] Rakesh Gupta and Aditi Singh. Constrained multi-objective optimization using evolutionary al-
gorithms. Applied Soft Computing, 122:109937, 2023.
[112] HackerRank Team. HackerRank: Code practice and challenges for developers.
https://www.hackerrank.com/, 2024. Accessed: 2024-10-09.
[113] Ernst Hairer and Gerhard Wanner. Solving Ordinary Differential Equations I: Nonstiff Problems.
Springer Series in Computational Mathematics. Springer, 1996.
[114] Ernst Hairer and Gerhard Wanner. Solving Ordinary Differential Equations II: Stiff and Differential-
Algebraic Problems. Springer Series in Computational Mathematics. Springer, 2009.
[115] Charles R. Harris, K. Jarrod Millman, Stéfan J. Van Der Walt, Ralf Gommers, Pauli Virtanen, David
Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J. Smith, and Robert Kern.
Array programming with numpy. Nature, 585(7825):357–362, 2020.
[116] Trevor Hastie, Robert Tibshirani, and Martin Wainwright. Statistical learning with sparsity: the
Lasso and generalizations. CRC Press, 2015.
[118] Pritam Hazra and V. Govindaraju. Review on stochastic methods for unconstrained optimization.
Computational Optimization and Applications, 75(1):145–168, 2024.
[119] Magnus R. Hestenes and Eduard Stiefel. Methods of conjugate gradients for solving linear sys-
tems. Proceedings of the National Academy of Sciences, 40(40):449–450, 1952.
[120] Nicholas J. Higham. A stable and efficient algorithm for n-dimensional gaussian elimination.
SIAM journal on scientific and statistical computing, 11(1):35–47, 1990.
[121] M. K. Hindmarsh. The Finite Difference Method for Heat Conduction. Chapman & Hall, Ltd., 1973.
[122] Roger A Horn and Charles R Johnson. Matrix analysis, volume 2. Cambridge university press,
2012.
[123] Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are
universal approximators. Neural Networks, 2(5):359–366, 1989.
[124] Harold Hotelling. Analysis of a complex of statistical variables into principal components. Jour-
nal of Educational Psychology, 24(6):417, 1933.
[126] Thomas J. R. Hughes. The finite element method: Linear static and dynamic finite element
analysis. Prentice Hall, 1987.
[127] Dylan Hutchison, Bill Howe, and Dan Suciu. Lara: A key-value algebra underlying arrays and
relations. arXiv preprint arXiv:1604.03607, 2016.
[128] E. L. Ince. Numerical Solution of Partial Differential Equations: Finite Difference Methods. Dover
Publications, 1956.
[129] Frank P. Incropera and David P. Dewitt. Numerical Heat Transfer and Fluid Flow. Wiley Series in
Heat and Mass Transfer. Wiley, 2002.
[130] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by
reducing internal covariate shift. In International Conference on Machine Learning (ICML), 2015.
[131] Mark A. Iwen and Craig V. Spencer. A note on compressed sensing and the complexity of matrix
multiplication. Information Processing Letters, 109(10):468–471, Apr 2009.
[132] Michał Jaworski and Tarek Ziadé. Expert Python programming: become a master in Python
by learning coding best practices and advanced programming concepts in Python 3.7. Packt
Publishing Ltd, 2019.
[134] Robert Johansson and Robert Johansson. Symbolic Computing, chapter 6, pages 97–134.
Apress, 2019.
[135] Peter Johnson and Karen Lee. Runge-kutta methods and their applications in modern numeri-
cal analysis. In John D. Roberts and Anne Smith, editors, Handbook of Numerical Methods for
Differential Equations, pages 95–132. Springer, 2023.
[136] S. Lennart Johnsson and Kapil K. Mathur. Data structures and algorithms for the finite element
method on a data parallel supercomputer. International Journal for Numerical Methods in Engi-
neering, 29(4):881–908, Mar 1990.
[137] Ian Trevor Jolliffe. Principal component analysis, volume 2. Springer, 2002.
[138] Eric Jones, Travis Oliphant, and et al. Dubois, Paul. Scipy 1.0: fundamental algorithms for scien-
tific computing in python. Nature Methods, 17:261–274, 2020.
[140] Carl Karpfinger. Calculus and Linear Algebra in Recipes. Springer, 2022.
[142] William Karush. Minima of functions of several variables with inequalities as side conditions. In
Master’s thesis, Dept. of Mathematics, Univ. of Chicago, 1939.
[143] R. P. Kellogg and J. H. Welsch. On the numerical solution of integral equations. SIAM Journal
on Numerical Analysis, 12(2):345–362, 1975.
[144] J. Kim and Y. Park. A new finite difference method for the stochastic heat equation. Numerical
Algorithms, 92:123–140, 2023.
[145] Jong-Hoon Kim and Min-Jae Lee. Adaptive z-transform techniques for real-time signal process-
ing. Signal Processing, 207:109876, 2024.
[146] Soo Jung Kim and Hyun Lee. High-order optimization methods: An overview and recent ad-
vances. Optimization and Machine Learning Review, 19:112–145, 2024.
[147] David Kincaid and Ward Cheney. Numerical Analysis: Mathematics of Scientific Computing.
American Mathematical Society, 3rd edition, 2009.
[148] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint
arXiv:1412.6980, 2014.
[149] C. Koch and I. Segev. The role of dendrites in neuronal computation. Nature Reviews Neuro-
science, 24(5):357–375, 2023.
[150] J. Zico Kolter, Yuxin Wang, and Yifan Diao. Deep reinforcement learning for energy management
in smart buildings: A review. IEEE Transactions on Smart Grid, 14(2):1307–1321, 2023.
[151] Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender
systems. Computer, 42(8):30–37, 2009.
[152] Ivan Kozachenko and Yuri Manin. Polynomial interpolation: Theory, methods, and applications,
volume 7. American Mathematical Soc., 1992.
[153] Harold W Kuhn and Albert W Tucker. Nonlinear programming. pages 481–492, 1951.
[154] Martin Wilhelm Kutta. Beitrag zur näherungweisen integration totaler differentialgleichungen.
Zeitschrift für Mathematik und Physik, 46:435–453, 1901.
[155] Nojun Kwak. Principal component analysis by Lp -norm maximization. IEEE Transactions on
Cybernetics, 44(5):594–609, 2013.
[156] Joseph Louis Lagrange. On a new general method of interpolation calculated in terms of la-
grange. Mémoires de Mathématique et de Physique, Académie des Sciences, 1859.
[157] E. L. Lancaster. On the newton-raphson method for complex functions. The American Mathe-
matical Monthly, 63(3):189–191, 1956.
[158] Louis Lapicque. Recherches quantitatives sur l’excitation des neurones. J. Physiol. (Paris),
9:620–635, 1907.
[159] David C. Lay. Linear algebra and its applications. Pearson, 2015.
[160] Jeffery J. Leader. Numerical Analysis and Scientific Computation. Chapman and Hall/CRC, 2022.
[161] Daniel Lee and Wei Chen. Applications of the nelder-mead method in hyperparameter optimiza-
tion for deep learning models. Journal of Artificial Intelligence Research, 76:341–365, 2023.
[162] Hyun Lee, Minho Choi, and Nirav Patel. Jacobian-based regularization for improved generaliza-
tion in deep neural networks. Neural Computation, 35(5):987–1010, 2023.
[163] Michael Lee and Yiwen Chen. Ai meets constrained optimization: Methods and applications.
Artificial Intelligence Review, 64(1):77–102, 2023.
[164] Thomas Lee and Hana Kim. A survey on optimization algorithms for machine learning: From sgd
to l-bfgs and beyond. Journal of Optimization Theory and Applications, 195(2):305–329, 2023.
[165] LeetCode Team. LeetCode: Improve your problem-solving skills with challenges.
https://leetcode.com/, 2024. Accessed: 2024-10-09.
[166] Aitor Lewkowycz. How to decay your learning rate. arXiv preprint arXiv:2103.12682, 2021.
[167] Ming Li and Jun Wang. Adaptive finite difference methods for nonlinear partial differential equa-
tions. Applied Mathematics and Computation, 433:129–145, 2024.
[168] Xia Li and Rui Wang. Radix-2 fast fourier transform and its applications in image processing.
Journal of Visual Communication and Image Representation, 99:103897, 2023.
[169] Yu Li, Lei Huang, and Jian Zhao. A review of finite element methods for fluid-structure interaction
problems. Journal of Fluids and Structures, 100:1–15, 2024.
[170] Dong C Liu and Jorge Nocedal. Limited memory bfgs method for large scale optimization. Math-
ematical programming, 45(1-3):503–528, 1989.
[171] Jian Liu and Lin Wang. Stability analysis of neural networks using the hessian matrix: Theoretical
insights and practical applications. IEEE Transactions on Neural Networks and Learning Systems,
pages 1–14, 2023.
[172] Jing Liu, Minghua Zhang, and Hailong Wang. A comprehensive review of optimization algo-
rithms: From newton’s method to machine learning. ACM Computing Surveys, 56(2):10–47,
2023.
[173] Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Jiawei
Han. On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265,
2019.
[174] Mei Liu, Liangming Chen, Aohao Du, Long **, and Mingsheng Shang. Activated gradients for deep
neural networks. IEEE Transactions on Neural Networks and Learning Systems, 34(4):2156–2168,
2021.
[175] G. G. Lorentz. Approximation Theory and Interpolation. National Science Foundation, Washing-
ton, D.C., USA, 1966.
[176] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint
arXiv:1711.05101, 2017.
[177] Ilya Loshchilov and Frank Hutter. Sgdr: Stochastic gradient descent with warm restarts. In
International Conference on Learning Representations (ICLR), 2017.
[178] Steven F. Lott. Mastering Object-Oriented Python: Build powerful applications with reusable code
using OOP design patterns and Python 3.7. Packt Publishing Ltd, 2019.
[179] William R. Mann. An introduction to the numerical analysis of functional equations, volume 5.
The University of Michigan, 1943.
[180] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information
Retrieval. Cambridge University Press, 2008.
[181] Carlos Martinez and Jian Liu. Efficient algorithms for discrete fourier transform: A comprehen-
sive review. Applied Mathematics and Computation, 429:127197, 2023.
[182] Clara Martinez and Robert F. Gomez. Recent advances in time-stepping methods for differ-
ential equations: From runge-kutta to modern techniques. Journal of Computational Physics,
478:110945, 2023.
[183] Jose Martinez and Ananya Patel. A new algorithmic framework for high-dimensional con-
strained optimization. Optimization Letters, 2024.
[184] José Mario Martínez and José Luis Morales. Trust-region methods in unconstrained optimiza-
tion: recent trends and applications. Optimization Letters, 17(3):523–547, 2023.
[185] Marie-Laurence Mazure. Spline functions and the reproducing kernel hilbert space. SIAM review,
43(3):435–472, 2001.
[186] Andrew McCall and Emily Brown. Efficient numerical methods for stiff odes with discontinuous
solutions. Journal of Computational Physics, 472:111320, 2024.
[187] William M. McLean. Introduction to Numerical Analysis. Cambridge University Press, 2010.
[189] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Belle-
mare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georgian Ostrovski, and et al. Playing
atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
[190] A. Mohamed, Y. Jiang, and T. Wang. Adaptive finite element methods for pdes: A survey. Applied
Numerical Mathematics, 186:109–123, 2023.
[191] Ahmed Mohamed and Mohamed Ali. Performance analysis of radix-2 fft on fpga for real-time
applications. Microprocessors and Microsystems, 103:104418, 2023.
[192] N. Mohammadi and F. Dabbagh. Continuous fourier transform: Theory and applications. Applied
Mathematics and Computation, 420:127612, 2023.
[193] N. Mohammadi, F. Dabbagh, and A. Mahdavian. Fourier series in signal processing: A review.
Journal of Signal Processing Systems, 95:301–314, 2023.
[194] B. Somanathan Nair. Digital signal processing: Theory, analysis and digital-filter design. PHI
Learning Pvt. Ltd., 2004.
[195] Maryam M. Najafabadi, Flavio Villanustre, Taghi M. Khoshgoftaar, Naeem Seliya, Randall Wald,
and Edin Muharemagic. Deep learning applications and challenges in big data analytics. Journal
of Big Data, 2(1):1–21, 2015.
[196] Meenal V Narkhede, Prashant P Bartakke, and Mukul S Sutaone. A review on weight initialization
strategies for neural networks. Artificial Intelligence Review, 55(1):291–322, 2022.
[197] John A. Nelder and Roger Mead. A simplex method for function minimization. The Computer
Journal, 7(4):308–313, 1965.
[198] Olavi Nevanlinna. Convergence of iterations for linear equations. Birkhäuser, 2012.
[199] David Newton, Raghu Pasupathy, and Farzad Yousefian. Recent trends in stochastic gradient
descent for machine learning and big data. In 2018 Winter Simulation Conference (WSC), pages
366–380. IEEE, 2018.
[200] Jorge Nocedal and Stephen J. Wright. Numerical optimization. Springer Science & Business
Media, 2006.
[201] Jorge Nocedal and Stephen J. Wright. Recent advances in quasi-Newton methods for large-scale
optimization. Optimization Letters, 17(4):567–593, 2023.
[202] Henri J. Nussbaumer. The Fast Fourier Transform. Springer Berlin Heidelberg, 1982.
[203] Henri J. Nussbaumer. The Fast Fourier Transform. Springer Berlin Heidelberg, 1982.
[204] Peter J. Olver and Chehrzad Shakiban. Applied Linear Algebra, volume 1.
Prentice Hall, Upper Saddle River, NJ, 2006.
[205] Alan V. Oppenheim and Ronald W. Schafer. Digital Signal Processing. Prentice-Hall, 1975.
[206] Alan V. Oppenheim and Ronald W. Schafer. Discrete-Time Signal Processing. Prentice Hall, 3rd
edition, 1997. This book provides a comprehensive foundation in the theory and application of
discrete-time signal processing, including the Z-Transform.
[207] James M. Ortega and Werner C. Rheinboldt. Iterative solution of nonlinear equations in several
variables. Society for Industrial and Applied Mathematics (SIAM), 2000.
[208] Juan-Pablo Ortega and Florian Rossmannek. Fading memory and the convolution theorem.
arXiv preprint arXiv:2408.07386, 2024.
[209] Athanasios Papoulis. The Fourier Integral and its Applications. McGraw-Hill, 1962.
[210] Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. On the difficulty of training recurrent
neural networks. In International Conference on Machine Learning, pages 198–206, 2013.
[211] Nehal Patel and Aditi Joshi. The Z-transform: Applications and generalizations. International
Journal of Applied Mathematics and Statistics, 73:115–125, 2023.
[212] Raj Patel, Mei Chen, and Xiaoyu Zhang. Recent trends in nonlinear solvers for large-scale sys-
tems: From Broyden’s method to AI-based approaches. Numerical Algorithms, 89(2):311–342,
2024.
[213] Karl Pearson. LIII. On lines and planes of closest fit to systems of points in space. The London,
Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11):559–572, 1901.
[214] Carlos A Perez. Constrained Optimization in the Computational Sciences: Methods and Applica-
tions. CRC Press, 2023.
[215] Godfrey M. Phillips. A new approach to Simpson’s rule. The American Mathematical Monthly,
69(7):639–645, 1962.
[217] Mervyn J D Powell. Piecewise linear interpolation and demarcation of contours. Journal of the
Royal Statistical Society. Series C (Applied Statistics), 30(2):148–155, 1981.
[218] M. J. D. Powell. Algorithms for nonlinear constraints that use Lagrange functions. Mathematical
Programming, 14(1):224–248, 1978.
[219] Harry Pratt, Bryan Williams, Frans Coenen, and Yalin Zheng. FCNN: Fourier convolutional neural
networks. In Machine Learning and Knowledge Discovery in Databases: European Conference,
ECML PKDD 2017, Skopje, Macedonia, September 18–22, 2017, Proceedings, Part I, volume 17,
pages 786–798. Springer International Publishing, 2017.
[220] Vaughan R. Pratt. A stable implementation of Gaussian elimination. SIAM Journal on Numerical
Analysis, 14(2):243–251, 1977.
[221] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical
Recipes: The Art of Scientific Computing. Cambridge University Press, 1992.
[222] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical
Recipes: The Art of Scientific Computing. Cambridge University Press, 2007.
[223] J. M. Pérez-Ortiz and V. Manucharyan. Modeling spiking neural networks with the LIF neuron: A
systematic review. Journal of Computational Neuroscience, 48(2):129–150, 2024.
[224] Lawrence R. Rabiner and Bernard Gold. Theory and Application of Digital Signal Processing,
volume 2. Prentice-Hall, 1975.
[225] Mohd Ali Rahman, Masud Usman, and Sanjeev Kumar. Efficient implementation of fast Fourier
transform on FPGA: A case study. IEEE Transactions on Very Large Scale Integration (VLSI) Sys-
tems, 32(1):1–10, 2024.
[226] Maziar Raissi, Paris Perdikaris, and George E. Karniadakis. Physics-informed neural networks:
A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations.
Journal of Computational Physics, 378:686–707, 2019.
[227] Benjamin Recht, Maryam Fazel, and Pablo A. Parrilo. Guaranteed minimum-rank solutions of
linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.
[228] W. H. Rennick. The fixed point iteration. The American Mathematical Monthly, 75(2):190–197,
1968.
[229] John R. Rice. The secant method for solving nonlinear equations. The American Mathematical
Monthly, 67(3):261–267, 1960.
[230] Juan Rios and John Smith. A review of derivative-free optimization methods with applications
to machine learning and engineering. Optimization Methods and Software, 38(5):845–872, 2023.
[232] Herbert Robbins and Sutton Monro. A stochastic approximation method. The Annals of Mathe-
matical Statistics, 22(3):400–407, 1951.
[233] Juan Rodriguez and Amy Smith. Real-time implementation of radix-2 FFT algorithm for embedded
systems. Embedded Systems Letters, 15(2):45–51, 2023.
[234] Vijay K Rohatgi and AK Md Ehsanes Saleh. An Introduction to Probability and Statistics. John
Wiley & Sons, 2015.
[235] Elena Rossi and Greg Smith. Adaptive methods for solving stiff ordinary differential equations.
Computational Mathematics and Applications, 98:50–68, 2024.
[237] Carl Runge. Über die numerische Auflösung von Differentialgleichungen. Mathematische An-
nalen, 46(2):167–178, 1895.
[238] Yousef Saad. Iterative Methods for Sparse Linear Systems. Society for Industrial and Applied
Mathematics, 2003.
[239] Warren Sande and Carter Sande. Hello world!: computer programming for kids and other begin-
ners. Simon and Schuster, 2019.
[240] Michel F Sanner. Python: a programming language for software integration and development.
Journal of Molecular Graphics and Modelling, 17(1):57–61, 1999.
[241] Joel L. Schiff. The Laplace Transform: Theory and Applications. Springer Science & Business
Media, 2nd edition, 2013.
[242] Mark W. Schmidt, Michael W. Mahoney, and Richard Woodward. Machine learning perspectives
in unconstrained optimization. Journal of Machine Learning Research, 25(1):115–144, 2024.
[243] I.J. Schoenberg. Contributions to the problem of approximation of equidistant data by analytic
functions. Quarterly of Applied Mathematics, 4(1):45–99, 1946.
[244] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy
optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
[245] Ervin Sejdić, Igor Djurović, and Ljubiša Stanković. Fractional Fourier transform as a signal pro-
cessing tool: An overview of recent developments. Signal Processing, 91(6):1351–1369, 2011.
[246] David F. Shanno. Conditioning of quasi-Newton methods for function minimization. Mathematics
of Computation, 24(111):647–656, 1970.
[247] Rahul Singh and Poonam Kumar. An overview of the Laplace transform: Theory and applications.
Applied Mathematics and Computation, 438:127166, 2023.
[248] John A. Smith and Angela Lee. Discrete Fourier transform for feature extraction in machine
learning. Journal of Machine Learning Research, 25(1):1–20, 2024.
[249] John A Smith and Wei Zhang. Recent advances in constrained optimization: A comprehensive
review. Journal of Optimization Theory and Applications, 189(2):455–492, 2023.
[250] John D. Smith, Li Zhang, and Anil Kumar. Jacobian matrices in deep learning: A comprehensive
review. IEEE Transactions on Neural Networks and Learning Systems, 34(2):456–472, 2023.
[251] Jonathan Smith and Emily Roberts. Advances in quasi-Newton methods for large-scale opti-
mization. Optimization Methods and Software, 38(5):921–945, 2023.
[252] Julius O. Smith. Mathematics of the Discrete Fourier Transform (DFT): With Audio Applications.
Julius Smith, 2007.
[253] Elias M Stein and Rami Shakarchi. Fourier analysis: an introduction. Princeton University Press,
2003.
[254] Eli Stevens, Luca Antiga, and Thomas Viehmann. Deep learning with PyTorch. Manning Publica-
tions, 2020.
[255] Josef Stoer and Roland Bulirsch. Introduction to Numerical Analysis. Springer, 2013.
[256] Josef Stoer and Roland Bulirsch. Numerical Analysis. Undergraduate Texts in Mathematics.
Springer, 2013.
[257] Harold S. Stone and L. Williams. On the uniqueness of the convolution theorem for the Fourier
transform. Technical report, NEC Labs. Amer, Princeton, NJ, 2008. Accessed on 19 March 2008.
[258] Gilbert Strang. Constructive solutions for differential equations with finite difference methods.
Mathematics of Computation, 22(103):61–68, 1968.
[259] Gilbert Strang. Linear Algebra and Its Applications, volume 3. Harcourt Brace Jovanovich College,
1988.
[260] Gilbert Strang. Introduction to Linear Algebra, volume 3. Wellesley-Cambridge Press, 2016.
[261] Volker Strassen. Gaussian elimination is not optimal. Numerische Mathematik, 13(4):354–356,
1969.
[262] Steven H. Strogatz. Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chem-
istry, and Engineering. Westview Press, 2nd edition, 2014.
[263] Lei Sun, Yue Zhang, and Wei Li. Hessian-based optimization in deep learning: A review of current
challenges and advancements. Journal of Machine Learning Research, 24:1–32, 2023.
[264] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[265] Terence Tao. Topics in random matrix theory. Hindustan Book Agency, 2012.
[266] Tijmen Tieleman and Geoffrey Hinton. Lecture 6.5-RMSProp: Divide the gradient by a running
average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 2012.
[267] Richard Tolimieri, Myoung An, and Chao Lu. Mathematics of multidimensional Fourier transform
algorithms. Springer Science & Business Media, 2012.
[268] Reinaldo Torres and Juan Benitez. A review of numerical methods for stiff ordinary differential
equations. Numerical Methods for Partial Differential Equations, 40(1):200–224, 2024.
[269] Lloyd N. Trefethen and David Bau. Numerical linear algebra, volume 50. Siam, 1997.
[270] Leslie Valiant. A theory of the learnable. In Proceedings of the Sixteenth Annual ACM Symposium
on Theory of Computing, pages 436–445, 1984.
[271] B. Van der Pol. The Laplace transform and the solution of differential equations. Proceedings of
the IEE, 74(9):1515–1520, 1996.
[272] Guido Van Rossum. An Introduction to Python. Network Theory Ltd, Bristol, 2003.
[273] Guido Van Rossum. Python programming language. In USENIX Annual Technical Conference,
volume 41, pages 1–36, Santa Clara, CA, USA, June 2007. USENIX Association.
[274] Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Springer Science & Business Media,
1995.
[277] M. Farooq Wahab, Purnendu K. Dasgupta, Akinde F. Kadjo, and Daniel W. Armstrong. Sampling
frequency, response times and embedded signal filtration in fast, high efficiency liquid chro-
matography: A tutorial. Analytica Chimica Acta, 907:31–44, 2016.
[278] Lei Wang and Xiaodong Liu. Finite difference methods for time-dependent PDEs: A review. Com-
putational Mathematics and Mathematical Physics, 63(1):1–18, 2023.
[279] Li Wang and Ming Zhao. Applications of the Z-transform in machine learning for signal pro-
cessing. Journal of Machine Learning in Signal Processing, 12(2):100–110, 2023. This paper
discusses how the Z-Transform is applied in machine learning contexts for analyzing signals.
[280] Rui Wang, Jian Li, and Wei Zhang. A novel hierarchical reinforcement learning framework
for resource allocation in wireless networks. IEEE Transactions on Wireless Communications,
22(1):105–119, 2023.
[281] Xiaoyu Wang, Zhiqing Huang, and Yifeng Zhou. On the role of Hessian matrix in policy gradient
methods for reinforcement learning. Neural Computation, 35(7):1650–1675, 2023.
[282] Xiaoyu Wang, Sindri Magnússon, and Mikael Johansson. On the convergence of step decay
step-size for stochastic optimization. In Advances in Neural Information Processing Systems,
2021.
[283] Xiu Wang and Jia Chen. A hybrid conjugate gradient approach for solving large-scale sparse
linear systems. Journal of Numerical Algorithms, 92(4):885–908, 2023.
[284] Christopher J.C.H. Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3):279–292, 1992.
[285] G. N. Watson. A Treatise on the Theory of Bessel Functions. Cambridge University
Press, 2nd edition, 1944.
[286] G. N. Watson. A Treatise on the Theory of Bessel Functions. Cambridge University Press, 1995.
[287] Peter Wegner. Concepts and paradigms of object-oriented programming. ACM SIGPLAN Notices,
1(1):7–87, 1990.
[288] James H. Wilkinson. Rounding errors in algebraic processes. Principles of Numerical Analysis,
pages 392–401, 1965.
[289] Alan Wong and Lucia Hernandez. Jacobian matrices in robotics: From kinematics to control
systems. Robotics and Autonomous Systems, 162:104088, 2023.
[290] John William Wrench Jr. On the relative error of floating-point arithmetic. Communications of
the ACM, 6(8), 1963.
[291] Stephen J. Wright and Stanley C. Eisenstat. Efficient methods for large-scale unconstrained
optimization. SIAM Review, 66(1):45–72, 2024.
[292] Kai Xu, Minghai Qin, Fei Sun, Yuhao Wang, Yen-Kuang Chen, and Fengbo Ren. Learning in the
frequency domain. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, pages 1740–1749, 2020.
[293] Feng Yang and Wei Zhang. An adaptive implicit method for stiff differential equations with
singularities. Numerical Algorithms, 92(3):1107–1127, 2023.
[294] Huan Yang and Qian Zhao. Application of discrete Fourier transform in image processing: Trends
and techniques. IEEE Transactions on Image Processing, 33(2):575–589, 2024.
[295] Liyang Yang, Yu Liu, and Zhiping Chen. Physics-informed neural networks for solving inverse
problems in PDEs: A survey. Inverse Problems, 39(6):065004, 2023.
[296] Ming Yang, Yu Wang, and Wei Zhang. Fast Fourier transform: A comprehensive review of algo-
rithms and applications. Journal of Computational and Applied Mathematics, 418:114775, 2023.
[297] David M. Young. Iterative Solution of Large Linear Systems. Academic Press, 1971.
[298] Lotfi A. Zadeh. Continuous system theory and the Z-transform. Proceedings of the IRE,
41(8):1220–1224, 1953.
[299] Ahmed I Zayed. Numerical interpolation, differentiation, and integration. Springer, 2011.
[300] Fei Zhang and Ming Zhou. Conjugate gradient methods for large-scale engineering problems:
Recent developments and applications. Engineering Computations, 41(1):50–70, 2024.
[301] H. Zhang, Y. Xu, and Z. Li. Recent advances in discrete Fourier transform algorithms: A review.
Signal Processing, 205:108709, 2023.
[302] Hong Zhang and Wei Chen. Multiscale finite difference methods for PDEs with oscillatory coef-
ficients. Journal of Computational Physics, 490:111–126, 2024.
[303] Lei Zhang, Xin Zhang, and Xiang Chen. A meshless finite element method for 3d elasticity prob-
lems. Applied Mathematical Modelling, 101:130–142, 2024.
[304] Qiang Zhang, Hui Li, and Xiaoyu Zhao. Efficient computation of Hessians for large-scale op-
timization problems: Challenges and state-of-the-art techniques. Computational Optimization
and Applications, 65(3):431–456, 2023.
[305] Wei Zhang, Rui Huang, and Lei Wang. Evolutionary algorithms for gradient-free optimization: A
comprehensive review. Applied Soft Computing, 133:109936, 2023.
[306] Wei Zhang, Min Liu, and Feng Wang. Recent advances in discrete Fourier transform applications
in signal processing. Signal Processing, 203:108876, 2023.
[307] Xiaoyu Zhang and Li Wang. A modified L-BFGS algorithm for high-dimensional optimization. Com-
putational Optimization and Applications, 86(3):545–563, 2023.
[308] Yifan Zhang, Yongheng Zhao, and Jiang Wang. Deep reinforcement learning: A review and future
directions. Artificial Intelligence Review, 56(4):2765–2804, 2023.
[309] Yi Zhou, Feng Fang, and Xiaorong Gao. Quasi-Newton methods in machine learning: Challenges
and opportunities. Journal of Machine Learning Research, 25:234–261, 2024.
[310] O. C. Zienkiewicz and R. L. Taylor. The finite element method. McGraw-Hill, 1977.
[311] A. Zohar and I. Shmulevich. The Z-transform: A comprehensive approach. IEEE Transactions on
Acoustics, Speech, and Signal Processing, 26(6):576–583, 1978. A classic paper discussing the
fundamental aspects of the Z-Transform in signal processing.