Unit4 Notes

Calculus is essential in machine learning for gradient computation, backpropagation, and optimization algorithms, enabling effective model training and analysis. The document illustrates the application of calculus in analyzing a plant's growth using derivatives to determine growth rates and concavity. It also discusses convex sets and functions, providing examples and their relevance in optimization problems.


1. What is the importance of calculus in machine learning?

Imagine you are analyzing the growth of a plant over time. The height of the plant (in
centimeters) at any time 𝑥 (in weeks) is modeled by the function:
𝑓(𝑥) = 𝑥² + 3𝑥 + 5
Here:
 𝑓(𝑥) represents the height of the plant at time 𝑥.
 The function is quadratic, indicating that the plant's growth accelerates over
time.
We will now analyze the growth rate of the plant by computing the derivative of
𝑓(𝑥), finding the slope of the tangent at a specific time (𝑥 = 2 weeks), and
determining whether the plant's height is increasing or decreasing at that time.

Answer: Calculus is a cornerstone of machine learning, providing the mathematical foundation
for many key concepts and algorithms. Its importance can be broken down into several areas:
1. Gradient Computation:

o Calculus is used to compute gradients (partial derivatives) of the loss function
with respect to model parameters.
o Gradients indicate the direction of the steepest ascent or descent, which is
essential for optimizing model parameters.
o For example, in linear regression, the gradient of the Mean Squared Error (MSE)
loss function is used to update the weights.
2. Backpropagation:

o In neural networks, the chain rule of calculus is used to compute gradients layer
by layer during backpropagation.
o This allows the network to learn by adjusting weights to minimize the loss
function.
3. Convexity Analysis:

o Calculus helps determine whether a function is convex, which is crucial for
ensuring the existence of a global minimum in optimization problems.
o Convex functions are easier to optimize because every local minimum is also a global minimum.
4. Rate of Change:

o Derivatives measure how a function changes with respect to its inputs, enabling
the analysis of model behavior and sensitivity.
o For example, in regression models, the derivative of the loss function with respect
to the parameters determines how the parameters should be updated.
5. Optimization Algorithms:

o Gradient-based optimization algorithms like gradient descent, stochastic gradient
descent (SGD), and Adam rely on calculus to update model parameters iteratively.
6. Regularization:

o Calculus is used to compute gradients for regularization terms (e.g., L1 or L2
regularization), which help prevent overfitting.
Numerical Example:

Step 1: Compute the Derivative of 𝑓(𝑥)


The derivative 𝑓′(𝑥) represents the rate of change of the plant's height with respect to time. For
the given function:
𝑓(𝑥) = 𝑥² + 3𝑥 + 5
Using the power rule for differentiation:
𝑓′(𝑥) = d/d𝑥(𝑥²) + d/d𝑥(3𝑥) + d/d𝑥(5)
Applying the power rule:
𝑓′(𝑥) = 2𝑥 + 3 + 0
So, the derivative is:
𝑓′(𝑥) = 2𝑥 + 3
Interpretation:
 The derivative 𝑓′(𝑥) = 2𝑥 + 3 tells us how fast the plant's height is changing at any time
𝑥.
 For example, at 𝑥 = 1 week, the growth rate is 𝑓′(1) = 2(1) + 3 = 5 cm/week.

Step 2: Find the Slope of the Tangent at 𝑥 = 2


The slope of the tangent to the curve at a specific point 𝑥 = 𝑎 is given by the value of the
derivative at that point, 𝑓′(𝑎).
For 𝑥 = 2 weeks:
𝑓′(2) = 2(2) + 3 = 4 + 3 = 7
So, the slope of the tangent at 𝑥 = 2 is 7 cm/week.
Interpretation:
 At 𝑥 = 2 weeks, the plant's height is increasing at a rate of 7 cm per week.
 This means that, at this specific time, the plant is growing rapidly.

Step 3: Determine Whether the Function is Increasing or Decreasing at 𝑥 = 2


The sign of the derivative 𝑓′(𝑥) at a point determines whether the function is increasing or
decreasing at that point:
 If 𝑓′(𝑥) > 0, the function is increasing.
 If 𝑓′(𝑥) < 0, the function is decreasing.
From Step 2, we know:
𝑓′(2) = 7 > 0
Since 𝑓′(2) > 0, the function 𝑓(𝑥) is increasing at 𝑥 = 2.
Interpretation:
 At 𝑥 = 2 weeks, the plant's height is increasing.
 This aligns with our observation that the slope of the tangent is positive, indicating
growth.

Summary of Results:
1. Derivative of 𝑓(𝑥):
𝑓′(𝑥) = 2𝑥 + 3
o This represents the growth rate of the plant at any time 𝑥.
2. Slope of the Tangent at 𝑥 = 2:
𝑓′(2) = 7 cm/week
o At 𝑥 = 2 weeks, the plant is growing at a rate of 7 cm per week.
3. Behavior of the Function at 𝑥 = 2:
o The function is increasing at 𝑥 = 2, meaning the plant's height is growing at this
time.

Case Study Conclusion:


By analyzing the derivative of the function 𝑓(𝑥) = 𝑥² + 3𝑥 + 5, we determined:
 The plant's growth rate at any time 𝑥 is given by 𝑓′(𝑥) = 2𝑥 + 3.
 At 𝑥 = 2 weeks, the plant is growing at a rate of 7 cm/week.
 The plant's height is increasing at 𝑥 = 2 weeks, indicating healthy growth.
This type of analysis is useful in real-world applications, such as predicting growth trends,
optimizing resources, and making informed decisions based on rates of change.
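
As a quick check, the whole analysis can be reproduced in Python. The following is a minimal sketch, assuming the sympy library is available:

import sympy as sp

x = sp.symbols('x')
f = x**2 + 3*x + 5             # plant height (cm) at week x

f_prime = sp.diff(f, x)        # growth rate
print(f_prime)                 # 2*x + 3
print(f_prime.subs(x, 2))      # 7 -> slope of the tangent at x = 2
print(f_prime.subs(x, 2) > 0)  # True -> height is increasing at x = 2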

Q2. Consider the function 𝒇(𝒙) = 𝒙³ − 𝟔𝒙² + 𝟗𝒙 + 𝟐.


1. Compute the first derivative 𝑓′(𝑥).
2. Compute the second derivative 𝑓″(𝑥).
3. Evaluate the second derivative at 𝑥 = 2 and interpret its meaning.
4. Determine whether the function is concave upward, concave downward, or has an
inflection point at 𝑥 = 2. Justify your answer.

Detailed Answer:
1. Compute the First Derivative 𝑓′(𝑥)
The first derivative of 𝑓(𝑥) is calculated using the power rule:
𝑓(𝑥) = 𝑥³ − 6𝑥² + 9𝑥 + 2
𝑓′(𝑥) = d/d𝑥(𝑥³) − d/d𝑥(6𝑥²) + d/d𝑥(9𝑥) + d/d𝑥(2)
𝑓′(𝑥) = 3𝑥² − 12𝑥 + 9 + 0
𝑓′(𝑥) = 3𝑥² − 12𝑥 + 9
Answer:
𝑓′(𝑥) = 3𝑥² − 12𝑥 + 9

2. Compute the Second Derivative 𝑓″(𝑥)


The second derivative is the derivative of 𝑓′(𝑥):
𝑓′(𝑥) = 3𝑥² − 12𝑥 + 9
𝑓″(𝑥) = d/d𝑥(3𝑥²) − d/d𝑥(12𝑥) + d/d𝑥(9)
𝑓″(𝑥) = 6𝑥 − 12 + 0
𝑓″(𝑥) = 6𝑥 − 12
Answer:
𝑓″(𝑥) = 6𝑥 − 12

3. Evaluate the Second Derivative at 𝑥 = 2 and Interpret Its Meaning:


Substitute 𝑥 = 2 into 𝑓″(𝑥):
𝑓″(2) = 6(2) − 12 = 12 − 12 = 0
Interpretation:
 The second derivative 𝑓″(𝑥) measures the concavity of the function.
 At 𝑥 = 2, 𝑓″(2) = 0, which indicates that the function may have an inflection point at
𝑥 = 2.
Answer:
𝑓″(2) = 0
This suggests that 𝑥 = 2 is a possible inflection point.

4. Determine Concavity and Justify


To determine whether the function is concave upward, concave downward, or has an inflection
point at 𝑥 = 2, analyze the sign of 𝑓″(𝑥) around 𝑥 = 2:
 For 𝑥 < 2 (e.g., 𝑥 = 1):

𝑓″(1) = 6(1) − 12 = −6 < 0


The function is concave downward for 𝑥 < 2.

 For 𝑥 > 2 (e.g., 𝑥 = 3):

𝑓″(3) = 6(3) − 12 = 6 > 0


The function is concave upward for 𝑥 > 2.

Since the concavity changes from downward to upward at 𝑥 = 2, this confirms that 𝑥 = 2 is an
inflection point.
Answer:
 For 𝑥 < 2, the function is concave downward.
 For 𝑥 > 2, the function is concave upward.
 At 𝑥 = 2, the function has an inflection point.
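
A short symbolic check (again a sketch assuming sympy) verifies the derivatives and the concavity sign change around 𝑥 = 2:

import sympy as sp

x = sp.symbols('x')
f = x**3 - 6*x**2 + 9*x + 2

f1 = sp.diff(f, x)     # first derivative: 3*x**2 - 12*x + 9
f2 = sp.diff(f, x, 2)  # second derivative: 6*x - 12

print(f2.subs(x, 2))   # 0  -> candidate inflection point
print(f2.subs(x, 1))   # -6 -> concave downward for x < 2
print(f2.subs(x, 3))   # 6  -> concave upward for x > 2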
3. Define convex sets and convex functions. Provide examples.
Answer:
Convex Sets
A set 𝑆 is convex if, for any two points 𝑥, 𝑦 ∈ 𝑆, the line segment connecting them lies entirely
within 𝑆. Mathematically:
𝜆𝑥 + (1 − 𝜆)𝑦 ∈ 𝑆 ∀𝜆 ∈ [0,1]
Examples:
 A circle is a convex set because any line segment connecting two points within the circle
lies entirely inside it.
 A crescent moon is not a convex set because there exist points where the line segment
connecting them goes outside the set.
Convex Functions
A function 𝑓(𝑥) is convex if, for any two points 𝑥, 𝑦 in its domain and 𝜆 ∈ [0,1]:
𝑓(𝜆𝑥 + (1 − 𝜆)𝑦) ≤ 𝜆𝑓(𝑥) + (1 − 𝜆)𝑓(𝑦)
Examples:
 𝑓(𝑥) = 𝑥² is convex because its second derivative 𝑓″(𝑥) = 2 is always positive.
 𝑓(𝑥) = sin(𝑥) is not convex because its second derivative 𝑓″(𝑥) = −sin(𝑥) changes
sign.
Numerical Example: Consider the function 𝑓(𝑥) = 𝑥². For 𝑥 = 1 and 𝑦 = 2, and 𝜆 = 0.5:
𝑓(0.5 ⋅ 1 + 0.5 ⋅ 2) = 𝑓(1.5) = 2.25
0.5 ⋅ 𝑓(1) + 0.5 ⋅ 𝑓(2) = 0.5 ⋅ 1 + 0.5 ⋅ 4 = 2.5
Since 2.25 ≤ 2.5, the function is convex.
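The inequality can be spot-checked numerically. The helper below is a minimal sketch (the function name is ours) that tests the convexity condition for one pair of points and one 𝜆; passing for a few samples does not prove convexity, but failing disproves it:

import math

def convexity_holds(f, x, y, lam):
    # Does f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y) hold?
    return f(lam * x + (1 - lam) * y) <= lam * f(x) + (1 - lam) * f(y)

print(convexity_holds(lambda t: t**2, 1.0, 2.0, 0.5))  # True: 2.25 <= 2.5
print(convexity_holds(math.sin, 2.0, 4.0, 0.5))        # False: sin violates it here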
Practical Case Study: In support vector machines (SVM), the objective function is convex,
ensuring a unique global minimum. This convexity property makes SVM optimization efficient
and reliable.

Q4. A company is designing a new product and needs to optimize its production process.
The production cost 𝑪(𝒙) depends on the quantity 𝒙 of raw materials used. The company
has identified that the cost function 𝑪(𝒙) and the feasible set of raw material quantities 𝑺
play a crucial role in determining the optimal production strategy. The cost function is
given by:
𝑪(𝒙) = 𝒙² + 𝟏𝟎𝒙 + 𝟏𝟎𝟎,
and the feasible set of raw material quantities is 𝑺 = [𝟓, 𝟏𝟓].
Based on this information, answer the following questions:

1. Convex Set Verification

Show that the feasible set 𝑆 = [5,15] is a convex set by verifying the convexity
condition for 𝑥₁ = 5 and 𝑥₂ = 15 with 𝜆 = 0.3.

2. Convex Function Verification

Verify that the cost function 𝐶(𝑥) = 𝑥² + 10𝑥 + 100 is convex by checking the
convexity condition for 𝑥₁ = 5 and 𝑥₂ = 10 with 𝜆 = 0.5.

Ans:- 1. Convex Set Verification


We are given the set 𝑆 = [5,15]. To show that 𝑆 is convex, we verify the convexity condition for
𝑥₁ = 5, 𝑥₂ = 15, and 𝜆 = 0.3.
Step 1: Compute 𝜆𝑥₁ + (1 − 𝜆)𝑥₂:
𝜆𝑥₁ + (1 − 𝜆)𝑥₂ = 0.3(5) + (1 − 0.3)(15) = 1.5 + 0.7(15) = 1.5 + 10.5 = 12.
Step 2: Check if 12 ∈ 𝑆: Since 12 lies within the interval [5,15], the set 𝑆 satisfies the
convexity condition for 𝜆 = 0.3.
Conclusion: The set 𝑆 = [5,15] is convex.
2. Convex Function Verification
We are given the cost function 𝐶(𝑥) = 𝑥² + 10𝑥 + 100. To verify that 𝐶(𝑥) is convex, we
check the convexity condition for 𝑥₁ = 5, 𝑥₂ = 10, and 𝜆 = 0.5.
Step 1: Compute 𝐶(𝜆𝑥₁ + (1 − 𝜆)𝑥₂):
𝜆𝑥₁ + (1 − 𝜆)𝑥₂ = 0.5(5) + (1 − 0.5)(10) = 2.5 + 0.5(10) = 2.5 + 5 = 7.5.
𝐶(7.5) = (7.5)² + 10(7.5) + 100 = 56.25 + 75 + 100 = 231.25.
Step 2: Compute 𝜆𝐶(𝑥₁) + (1 − 𝜆)𝐶(𝑥₂):
𝐶(𝑥₁) = 𝐶(5) = (5)² + 10(5) + 100 = 25 + 50 + 100 = 175,
𝐶(𝑥₂) = 𝐶(10) = (10)² + 10(10) + 100 = 100 + 100 + 100 = 300,
𝜆𝐶(𝑥₁) + (1 − 𝜆)𝐶(𝑥₂) = 0.5(175) + 0.5(300) = 87.5 + 150 = 237.5.
Step 3: Compare the two results:
𝐶(𝜆𝑥₁ + (1 − 𝜆)𝑥₂) = 231.25 ≤ 237.5 = 𝜆𝐶(𝑥₁) + (1 − 𝜆)𝐶(𝑥₂).
Conclusion: The cost function 𝐶(𝑥) = 𝑥² + 10𝑥 + 100 satisfies the convexity condition for
𝑥₁ = 5, 𝑥₂ = 10, and 𝜆 = 0.5.
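
Both verifications translate directly into a few lines of Python; this sketch reproduces the arithmetic above:

def C(x):
    return x**2 + 10 * x + 100  # production cost

# Convex set check for S = [5, 15] with lam = 0.3
lam, x1, x2 = 0.3, 5.0, 15.0
z = lam * x1 + (1 - lam) * x2
print(z, 5 <= z <= 15)  # 12.0 True -> the combination stays inside S

# Convex function check for x1 = 5, x2 = 10, lam = 0.5
lam, x1, x2 = 0.5, 5.0, 10.0
lhs = C(lam * x1 + (1 - lam) * x2)     # C(7.5) = 231.25
rhs = lam * C(x1) + (1 - lam) * C(x2)  # 237.5
print(lhs, rhs, lhs <= rhs)            # 231.25 237.5 True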

5. Explain the concept of optimization in machine learning with a numerical case study.
Answer: Optimization in machine learning refers to the process of finding the best set of model
parameters that minimize (or maximize) a given objective function, typically a loss function.
Key aspects include:
1. Objective Function:

o The objective function quantifies the performance of a model. For example:


 In regression, the Mean Squared Error (MSE) is used.
 In classification, the cross-entropy loss is used.
2. Constraints:

o Constraints are conditions that the solution must satisfy. For example:
 Regularization terms like L1 or L2 penalty are added to the loss function
to prevent overfitting.
3. Gradient-Based Methods:

o Optimization algorithms like gradient descent use gradients (partial derivatives) to
iteratively update parameters and minimize the loss function.
4. Global vs Local Optima:

o In non-convex problems, optimization algorithms may converge to local minima,
but convex problems guarantee a global minimum.
5. Applications:

o Optimization is used in training models like linear regression, logistic regression,
neural networks, and support vector machines (SVM).
(a) Numerical Example: Minimize 𝑓(𝑥) = 𝑥² using gradient descent. Start at 𝑥₀ = 3, learning
rate 𝜂 = 0.1:
𝑥₁ = 𝑥₀ − 𝜂𝑓′(𝑥₀) = 3 − 0.1 ⋅ 6 = 2.4
Practical Case Study: In logistic regression, the log-loss function is minimized using gradient
descent to classify data points. The optimization process adjusts the model parameters to
maximize the likelihood of the observed data.
(b) Numerical Example:
Let’s consider a simple linear regression problem where we want to fit a line 𝑦 = 𝑚𝑥 + 𝑏 to a
set of data points. The goal is to find the best values of 𝑚 (slope) and 𝑏 (intercept) that minimize
the Mean Squared Error (MSE) loss function.

Step 1: Define the Loss Function


The MSE loss function is given by:

𝐿(𝑚, 𝑏) = (1/𝑁) Σᵢ (𝑦ᵢ − (𝑚𝑥ᵢ + 𝑏))²,

where:
 𝑁 is the number of data points,
 (𝑥ᵢ, 𝑦ᵢ) are the data points,
 𝑚𝑥ᵢ + 𝑏 is the predicted value for input 𝑥ᵢ.

Step 2: Initialize Parameters


Let’s initialize the parameters 𝑚 and 𝑏 with some random values:
𝑚 = 1, 𝑏 = 0.

Step 3: Compute the Loss


Suppose we have the following data points:
(𝑥₁, 𝑦₁) = (1,3), (𝑥₂, 𝑦₂) = (2,5), (𝑥₃, 𝑦₃) = (3,7).
Compute the predicted values and the loss:
Predicted values: ŷ₁ = 1(1) + 0 = 1, ŷ₂ = 1(2) + 0 = 2, ŷ₃ = 1(3) + 0 = 3.
Loss: 𝐿(𝑚, 𝑏) = (1/3)[(3 − 1)² + (5 − 2)² + (7 − 3)²] = (1/3)(4 + 9 + 16) = 29/3 ≈ 9.67.

Step 4: Update Parameters Using Gradient Descent


Gradient Descent is an optimization algorithm that updates the parameters in the direction of the
negative gradient of the loss function. The update rules are:
𝑚new = 𝑚old − 𝛼 ∂𝐿/∂𝑚,
𝑏new = 𝑏old − 𝛼 ∂𝐿/∂𝑏,
where 𝛼 is the learning rate (e.g., 𝛼 = 0.1).
Compute the Gradients:

∂𝐿/∂𝑚 = −(2/𝑁) Σᵢ 𝑥ᵢ(𝑦ᵢ − (𝑚𝑥ᵢ + 𝑏)),

∂𝐿/∂𝑏 = −(2/𝑁) Σᵢ (𝑦ᵢ − (𝑚𝑥ᵢ + 𝑏)).

For our data:

∂𝐿/∂𝑚 = −(2/3)[1(3 − 1) + 2(5 − 2) + 3(7 − 3)] = −(2/3)(2 + 6 + 12) = −40/3 ≈ −13.33,
∂𝐿/∂𝑏 = −(2/3)[(3 − 1) + (5 − 2) + (7 − 3)] = −(2/3)(2 + 3 + 4) = −18/3 = −6.
Update the Parameters:
𝑚new = 1 − 0.1(−13.33) = 1 + 1.333 = 2.333,
𝑏new = 0 − 0.1(−6) = 0 + 0.6 = 0.6.
(Note the sign: since the current predictions underestimate 𝑦, the gradients are negative and both
parameters increase toward the best-fit line 𝑦 = 2𝑥 + 1.)

Step 5: Repeat Until Convergence


Repeat the process of computing the loss, gradients, and updating the parameters until the loss
function is minimized (or reaches a satisfactory value).
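
A compact NumPy sketch of Steps 1-5 (variable names are ours) runs the loop to convergence; since the data lie exactly on 𝑦 = 2𝑥 + 1, the parameters should approach 𝑚 = 2 and 𝑏 = 1:

import numpy as np

X = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])  # data lie on y = 2x + 1

m, b, alpha = 1.0, 0.0, 0.1
for _ in range(1000):
    y_hat = m * X + b
    # MSE gradients (note the leading minus sign)
    dm = -(2 / len(X)) * np.sum(X * (y - y_hat))
    db = -(2 / len(X)) * np.sum(y - y_hat)
    m -= alpha * dm
    b -= alpha * db

print(m, b)  # close to 2.0 and 1.0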

Summary of Optimization in Machine Learning:


1. Objective: Minimize the loss function 𝐿(𝑚, 𝑏).
2. Parameters: 𝑚 and 𝑏.
3. Optimization Algorithm: Gradient Descent.
4. Process:
o Compute the loss.
o Compute the gradients.
o Update the parameters.
o Repeat until convergence.
6. What is gradient descent? Explain its variants.
Answer:
Gradient Descent (GD)
Gradient descent is an iterative optimization algorithm used to minimize a function by moving in
the direction of the negative gradient. The update rule is:
𝑤ₜ₊₁ = 𝑤ₜ − 𝜂∇𝐿(𝑤ₜ)
where:
 𝑤ₜ: Current parameter value.
 𝜂: Learning rate (step size).
 ∇𝐿(𝑤ₜ): Gradient of the loss function with respect to 𝑤ₜ.
Problem Statement
 Function to minimize:

𝑓(𝑥) = 𝑥² + 4𝑥 + 4
o This is a quadratic function with a parabolic shape.
o The function has a global minimum at 𝑥 = −2.
 Initial guess: 𝑥₀ = 5

o We start the optimization process from 𝑥₀ = 5.


 Learning rate: 𝛼 = 0.1

o The learning rate controls the step size during each iteration.
o A smaller learning rate leads to slower but more stable convergence.
 Number of iterations: 5

o We will perform 5 updates to the value of 𝑥.


 Goal:

o Find the value of 𝑥 that minimizes 𝑓(𝑥).

Step 1 - Compute the Gradient


 The gradient (derivative) of 𝑓(𝑥) is:
𝑓′(𝑥) = d/d𝑥(𝑥² + 4𝑥 + 4) = 2𝑥 + 4
o The gradient measures the slope of the function at a given point.
o It indicates the direction of the steepest ascent.
 Why is the gradient important?

o To minimize 𝑓(𝑥), we move in the opposite direction of the gradient.


o The gradient tells us how to adjust 𝑥 to reduce the value of 𝑓(𝑥).

Step 2 - Gradient Descent Formula


 The update rule for gradient descent is:

𝑥new = 𝑥old − 𝛼 ⋅ 𝑓′(𝑥old)


o Explanation of terms:
 𝑥old: Current value of 𝑥.
 𝛼: Learning rate (step size).
 𝑓′(𝑥old): Gradient at 𝑥old.
 How it works:

o Compute the gradient at the current point.


o Multiply the gradient by the learning rate.
o Subtract this value from the current 𝑥 to get the updated 𝑥.

Step 3 - Iterations
 Perform 5 iterations of gradient descent starting from 𝑥₀ = 5.
 Iteration Table:
Iteration 𝑥old 𝑓′(𝑥old) 𝑥new = 𝑥old − 0.1 ⋅ 𝑓′(𝑥old)
0 5.0 14.0 5.0 − 0.1 ⋅ 14.0 = 3.6
1 3.6 11.2 3.6 − 0.1 ⋅ 11.2 = 2.48
2 2.48 8.96 2.48 − 0.1 ⋅ 8.96 = 1.584
3 1.584 7.168 1.584 − 0.1 ⋅ 7.168 = 0.8672
4 0.8672 5.7344 0.8672 − 0.1 ⋅ 5.7344 = 0.29376
5 0.29376 4.58752 0.29376 − 0.1 ⋅ 4.58752 = −0.164992
 Explanation of iterations:
o At each step, the gradient is computed using 𝑓′(𝑥) = 2𝑥 + 4.
o The new value of 𝑥 is updated using the gradient descent formula.
o Over time, 𝑥 moves closer to the minimum at 𝑥 = −2.
Final Result

 After 5 iterations, the value of 𝑥 is approximately:


𝑥 ≈ −0.165
 Explanation of the result:
o The function is moving toward the minimum at 𝑥 = −2.
o If more iterations are performed, 𝑥 will converge to 𝑥 = −2.
o The learning rate 𝛼 = 0.1 ensures stable but gradual progress.

Mathematical Insight
 The function 𝑓(𝑥) = 𝑥² + 4𝑥 + 4 can be rewritten as:
𝑓(𝑥) = (𝑥 + 2)²
 Explanation:
o This form shows that the function is a perfect square.
o The minimum value of 𝑓(𝑥) is 0, achieved at 𝑥 = −2.
o Gradient descent is converging toward this minimum.
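
The iteration table is easy to reproduce in plain Python; the sketch below runs the six updates labelled 0-5 above:

def grad(x):
    return 2 * x + 4  # f'(x) for f(x) = x**2 + 4x + 4

x, alpha = 5.0, 0.1
for i in range(6):  # rows 0-5 of the iteration table
    x -= alpha * grad(x)
    print(i, round(x, 6))
# 3.6, 2.48, 1.584, 0.8672, 0.29376, -0.164992 -> heading toward x = -2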

Key Takeaways
 Gradient descent is an iterative optimization algorithm:

o It updates the parameters step-by-step to minimize the loss function.


 The learning rate 𝛼 controls the step size:

o A smaller learning rate leads to slower but more stable convergence.


o A larger learning rate can cause overshooting or divergence.
 The gradient 𝑓′(𝑥) determines the direction of the update:

o The gradient points in the direction of the steepest ascent.


o Moving in the opposite direction reduces the value of 𝑓(𝑥).
 For quadratic functions, gradient descent converges to the global minimum:

o Quadratic functions have a single global minimum, making them ideal for
gradient descent.
 More iterations or a smaller learning rate can improve accuracy:

o Increasing the number of iterations or reducing the learning rate can lead to better
results.

Here’s a tabular comparison of Stochastic Gradient Descent (SGD), Momentum-based Gradient Descent, and Adam (Adaptive Moment Estimation) in terms of their mathematical computation and key characteristics:

| Aspect | Stochastic Gradient Descent (SGD) | Momentum-based Gradient Descent | Adam (Adaptive Moment Estimation) |
|---|---|---|---|
| Key Idea | Updates parameters using one data point (or a small batch) at a time. | Adds momentum to SGD by incorporating past gradients to accelerate convergence. | Combines momentum and adaptive learning rates for each parameter. |
| Gradient Computation | ∂𝐿/∂𝑚 = −2𝑥ᵢ(𝑦ᵢ − (𝑚𝑥ᵢ + 𝑏)), ∂𝐿/∂𝑏 = −2(𝑦ᵢ − (𝑚𝑥ᵢ + 𝑏)), computed on a single point 𝑖. | Same per-point gradients as SGD. | Same per-point gradients as SGD. |
| Parameter Update | 𝑚 = 𝑚 − 𝛼 ∂𝐿/∂𝑚, 𝑏 = 𝑏 − 𝛼 ∂𝐿/∂𝑏 | 𝑣ₜ = 𝛽𝑣ₜ₋₁ + (1 − 𝛽) ∂𝐿/∂𝑚; 𝑚 = 𝑚 − 𝛼𝑣ₜ | 𝑚ₜ = 𝛽₁𝑚ₜ₋₁ + (1 − 𝛽₁) ∂𝐿/∂𝑚; 𝑣ₜ = 𝛽₂𝑣ₜ₋₁ + (1 − 𝛽₂)(∂𝐿/∂𝑚)²; 𝑚corrected = 𝑚ₜ/(1 − 𝛽₁ᵗ), 𝑣corrected = 𝑣ₜ/(1 − 𝛽₂ᵗ); 𝑚 = 𝑚 − 𝛼 𝑚corrected/(√𝑣corrected + 𝜖) |
| Hyperparameters | Learning rate (𝛼). | Learning rate (𝛼), momentum (𝛽). | Learning rate (𝛼), 𝛽₁, 𝛽₂, 𝜖. |
| Advantages | Simple and fast for large datasets. | Faster convergence than SGD, reduces oscillations. | Combines momentum and adaptive learning rates; works well for sparse data and noisy gradients. |
| Disadvantages | Noisy updates, may get stuck in local minima. | Requires tuning of the momentum hyperparameter. | More complex, requires tuning of multiple hyperparameters. |
| Use Cases | Large datasets, simple models. | High-curvature or noisy loss landscapes. | Deep learning, large-scale optimization problems. |
Key Takeaways:
 SGD: Simple and fast but noisy updates.
 Momentum: Adds momentum to SGD for faster convergence and reduced oscillations.
 Adam: Combines momentum and adaptive learning rates, making it highly effective for
deep learning and large-scale problems.

Below is the detailed solution for each variant of gradient descent

Stochastic Gradient Descent (SGD)


Problem Setup
 Function to minimize:
𝑓(𝑥) = 𝑥² + 4𝑥 + 4
 Gradient:
𝑓′(𝑥) = 2𝑥 + 4
 Initial guess: 𝑥₀ = 5
 Learning rate: 𝛼 = 0.1
Update Rule
𝑥new = 𝑥old − 𝛼 ⋅ 𝑓′(𝑥old)
Iterations
Iteration 𝑥old 𝑓′(𝑥old) 𝑥new
0 5.0 2(5) + 4 = 14.0 5.0 − 0.1(14.0) = 3.6
1 3.6 2(3.6) + 4 = 11.2 3.6 − 0.1(11.2) = 2.48
2 2.48 2(2.48) + 4 = 8.96 2.48 − 0.1(8.96) = 1.584
3 1.584 2(1.584) + 4 = 7.168 1.584 − 0.1(7.168) = 0.8672
4 0.8672 2(0.8672) + 4 = 5.7344 0.8672 − 0.1(5.7344) = 0.29376
5 0.29376 2(0.29376) + 4 = 4.58752 0.29376 − 0.1(4.58752) = −0.164992
Explanation
 SGD updates the parameter 𝑥 using the gradient at each step.
 The updates are frequent but noisy, leading to fluctuations.

Momentum-based Gradient Descent


Problem Setup
 Function to minimize:
𝑓(𝑥) = 𝑥² + 4𝑥 + 4
 Gradient:
𝑓′(𝑥) = 2𝑥 + 4
 Initial guess: 𝑥₀ = 5
 Learning rate: 𝛼 = 0.1
 Momentum factor: 𝛽 = 0.9
Update Rule
𝑣ₜ = 𝛽𝑣ₜ₋₁ + (1 − 𝛽)𝑓′(𝑥old)
𝑥new = 𝑥old − 𝛼 ⋅ 𝑣ₜ
Iterations

| Iteration | 𝑥old | 𝑓′(𝑥old) = 2𝑥 + 4 | 𝑣ₜ = 𝛽𝑣ₜ₋₁ + (1 − 𝛽)𝑓′(𝑥old) | 𝑥new = 𝑥old − 𝛼𝑣ₜ |
|---|---|---|---|---|
| 0 | 5.0 | 14.0 | 0.9(0) + 0.1(14.0) = 1.4 | 5.0 − 0.1(1.4) = 4.86 |
| 1 | 4.86 | 13.72 | 0.9(1.4) + 0.1(13.72) = 2.632 | 4.86 − 0.1(2.632) = 4.5968 |
| 2 | 4.5968 | 13.1936 | 0.9(2.632) + 0.1(13.1936) = 3.68816 | 4.5968 − 0.1(3.68816) = 4.227984 |
| 3 | 4.227984 | 12.455968 | 0.9(3.68816) + 0.1(12.455968) = 4.5649408 | 4.227984 − 0.1(4.5649408) = 3.77148992 |
| 4 | 3.77148992 | 11.54297984 | 0.9(4.5649408) + 0.1(11.54297984) = 5.262744704 | 3.77148992 − 0.1(5.262744704) = 3.2452154496 |
| 5 | 3.2452154496 | 10.4904308992 | 0.9(5.262744704) + 0.1(10.4904308992) = 5.78551332352 | 3.2452154496 − 0.1(5.78551332352) = 2.666664117248 |

Explanation
 Momentum introduces a velocity term 𝑣ₜ that accumulates past gradients, reducing
oscillations.
 Momentum is particularly useful for functions with steep regions or noisy gradients.
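
The momentum iterations can be reproduced with the sketch below; note that it uses the (1 − 𝛽)-scaled velocity update shown above (some texts omit that factor):

def grad(x):
    return 2 * x + 4  # f'(x) for f(x) = x**2 + 4x + 4

x, alpha, beta = 5.0, 0.1, 0.9
v = 0.0
for i in range(6):
    v = beta * v + (1 - beta) * grad(x)  # velocity accumulates past gradients
    x -= alpha * v
    print(i, round(x, 6))
# 4.86, 4.5968, 4.227984, 3.77149, 3.245215, 2.666664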

Adam (Adaptive Moment Estimation)


Problem Setup
 Function to minimize:
𝑓(𝑥) = 𝑥² + 4𝑥 + 4
 Gradient:
𝑓′(𝑥) = 2𝑥 + 4
 Initial guess: 𝑥₀ = 5
 Learning rate: 𝛼 = 0.1
 Momentum factors: 𝛽₁ = 0.9, 𝛽₂ = 0.999
 Small constant: 𝜖 = 10⁻⁸
Update Rule
𝑚ₜ = 𝛽₁𝑚ₜ₋₁ + (1 − 𝛽₁)𝑓′(𝑥old)

𝑣ₜ = 𝛽₂𝑣ₜ₋₁ + (1 − 𝛽₂)[𝑓′(𝑥old)]²

𝑚corrected = 𝑚ₜ/(1 − 𝛽₁ᵗ), 𝑣corrected = 𝑣ₜ/(1 − 𝛽₂ᵗ)

𝑥new = 𝑥old − 𝛼 ⋅ 𝑚corrected/(√𝑣corrected + 𝜖)

Iterations

| t | 𝑥old | 𝑓′(𝑥old) | 𝑚ₜ | 𝑣ₜ | 𝑚corrected | 𝑣corrected | 𝑥new |
|---|---|---|---|---|---|---|---|
| 1 | 5.0 | 14.0 | 0.9(0) + 0.1(14.0) = 1.4 | 0.999(0) + 0.001(196) = 0.196 | 1.4/(1 − 0.9) = 14.0 | 0.196/(1 − 0.999) = 196 | 5.0 − 0.1(14.0/√196) = 4.9 |
| 2 | 4.9 | 13.8 | 0.9(1.4) + 0.1(13.8) = 2.64 | 0.999(0.196) + 0.001(190.44) ≈ 0.386 | 2.64/(1 − 0.9²) ≈ 13.8947 | 0.386/(1 − 0.999²) ≈ 193.2 | 4.9 − 0.1(13.8947/√193.2) ≈ 4.8 |
| 3 | 4.8 | 13.6 | 0.9(2.64) + 0.1(13.6) = 3.736 | 0.999(0.386) + 0.001(184.96) ≈ 0.57056 | 3.736/(1 − 0.9³) ≈ 13.78 | 0.57056/(1 − 0.999³) ≈ 190.4 | 4.8 − 0.1(13.78/√190.4) ≈ 4.7 |
Explanation
 Adam maintains two moving averages: 𝑚ₜ (first moment) and 𝑣ₜ (second moment).
 The corrected moments 𝑚corrected and 𝑣corrected account for bias in the initial steps.
 Adam is widely used in deep learning due to its adaptability and robustness.
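
A minimal Adam sketch reproducing the three iterations above (with 𝜖 = 10⁻⁸, as assumed in the setup):

import math

def grad(x):
    return 2 * x + 4  # f'(x) for f(x) = x**2 + 4x + 4

x, alpha = 5.0, 0.1
beta1, beta2, eps = 0.9, 0.999, 1e-8
m = v = 0.0
for t in range(1, 4):
    g = grad(x)
    m = beta1 * m + (1 - beta1) * g     # first moment
    v = beta2 * v + (1 - beta2) * g**2  # second moment
    m_hat = m / (1 - beta1**t)          # bias correction
    v_hat = v / (1 - beta2**t)
    x -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    print(t, round(x, 4))
# approximately 4.9, 4.8, 4.7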

Practical Case Study: In neural networks, stochastic gradient descent (SGD) is used to train the
model on large datasets. The noisy updates help the model escape local minima and converge to
a better solution.

7. Explain the role of the learning rate in gradient descent.


Answer: The learning rate (𝜂) controls the step size of parameter updates in gradient descent.
Its role includes:
1. Convergence Speed: A larger learning rate leads to faster convergence but may cause
overshooting.
2. Stability: A smaller learning rate ensures stable convergence but may be slow.
3. Trade-Off: Choosing an appropriate learning rate is crucial for balancing speed and
stability.
4. Adaptive Learning Rates: Variants like Adam dynamically adjust the learning rate for
better performance.
Case Study: Minimizing a Quadratic Function with Multiple Variables
Objective Function: Consider the following quadratic function:
𝑓(𝑥, 𝑦) = 𝑥² + 2𝑦²
This function has a global minimum at (𝑥, 𝑦) = (0,0).
Gradient: The gradient of the function is:
∇𝑓(𝑥, 𝑦) = [∂𝑓/∂𝑥, ∂𝑓/∂𝑦]ᵀ = [2𝑥, 4𝑦]ᵀ
Gradient Descent Update Rule: For each parameter 𝑥 and 𝑦, the update rule is:
𝑥new = 𝑥old − 𝛼 ⋅ ∂𝑓/∂𝑥
𝑦new = 𝑦old − 𝛼 ⋅ ∂𝑓/∂𝑦
Substituting the gradient:
𝑥new = 𝑥old − 𝛼 ⋅ 2𝑥old
𝑦new = 𝑦old − 𝛼 ⋅ 4𝑦old

Initialization
Let’s start with an initial guess:
(𝑥₀, 𝑦₀) = (2,3)

Case 1: Learning Rate 𝛼 = 0.1


Iteration 1:
𝑥₁ = 𝑥₀ − 𝛼 ⋅ 2𝑥₀ = 2 − 0.1 ⋅ 2 ⋅ 2 = 2 − 0.4 = 1.6
𝑦₁ = 𝑦₀ − 𝛼 ⋅ 4𝑦₀ = 3 − 0.1 ⋅ 4 ⋅ 3 = 3 − 1.2 = 1.8
Iteration 2:
𝑥₂ = 𝑥₁ − 𝛼 ⋅ 2𝑥₁ = 1.6 − 0.1 ⋅ 2 ⋅ 1.6 = 1.6 − 0.32 = 1.28
𝑦₂ = 𝑦₁ − 𝛼 ⋅ 4𝑦₁ = 1.8 − 0.1 ⋅ 4 ⋅ 1.8 = 1.8 − 0.72 = 1.08
Iteration 3:
𝑥₃ = 𝑥₂ − 𝛼 ⋅ 2𝑥₂ = 1.28 − 0.1 ⋅ 2 ⋅ 1.28 = 1.28 − 0.256 = 1.024
𝑦₃ = 𝑦₂ − 𝛼 ⋅ 4𝑦₂ = 1.08 − 0.1 ⋅ 4 ⋅ 1.08 = 1.08 − 0.432 = 0.648
Observation: The values of 𝑥 and 𝑦 are gradually decreasing towards the minimum at (0,0).
The learning rate 𝛼 = 0.1 is appropriate for this case.

Case 2: Learning Rate 𝛼 = 0.5


Iteration 1:
𝑥₁ = 𝑥₀ − 𝛼 ⋅ 2𝑥₀ = 2 − 0.5 ⋅ 2 ⋅ 2 = 2 − 2 = 0
𝑦₁ = 𝑦₀ − 𝛼 ⋅ 4𝑦₀ = 3 − 0.5 ⋅ 4 ⋅ 3 = 3 − 6 = −3
Iteration 2:
𝑥₂ = 𝑥₁ − 𝛼 ⋅ 2𝑥₁ = 0 − 0.5 ⋅ 2 ⋅ 0 = 0
𝑦₂ = 𝑦₁ − 𝛼 ⋅ 4𝑦₁ = −3 − 0.5 ⋅ 4 ⋅ (−3) = −3 + 6 = 3
Iteration 3:
𝑥₃ = 𝑥₂ − 𝛼 ⋅ 2𝑥₂ = 0 − 0.5 ⋅ 2 ⋅ 0 = 0
𝑦₃ = 𝑦₂ − 𝛼 ⋅ 4𝑦₂ = 3 − 0.5 ⋅ 4 ⋅ 3 = 3 − 6 = −3
Observation: The value of 𝑥 converges to the minimum (𝑥 = 0) in one step, but 𝑦 oscillates
between 3 and −3. This is because the learning rate is too high for the 𝑦-dimension, causing
overshooting.

Case 3: Learning Rate 𝛼 = 0.01


Iteration 1:
𝑥₁ = 𝑥₀ − 𝛼 ⋅ 2𝑥₀ = 2 − 0.01 ⋅ 2 ⋅ 2 = 2 − 0.04 = 1.96
𝑦₁ = 𝑦₀ − 𝛼 ⋅ 4𝑦₀ = 3 − 0.01 ⋅ 4 ⋅ 3 = 3 − 0.12 = 2.88
Iteration 2:
𝑥₂ = 𝑥₁ − 𝛼 ⋅ 2𝑥₁ = 1.96 − 0.01 ⋅ 2 ⋅ 1.96 = 1.96 − 0.0392 = 1.9208
𝑦₂ = 𝑦₁ − 𝛼 ⋅ 4𝑦₁ = 2.88 − 0.01 ⋅ 4 ⋅ 2.88 = 2.88 − 0.1152 = 2.7648
Iteration 3:
𝑥₃ = 𝑥₂ − 𝛼 ⋅ 2𝑥₂ = 1.9208 − 0.01 ⋅ 2 ⋅ 1.9208 = 1.9208 − 0.0384 = 1.8824
𝑦₃ = 𝑦₂ − 𝛼 ⋅ 4𝑦₂ = 2.7648 − 0.01 ⋅ 4 ⋅ 2.7648 = 2.7648 − 0.1106 = 2.6542
Observation: The values of 𝑥 and 𝑦 are decreasing very slowly. The learning rate 𝛼 = 0.01 is
too small, resulting in slow convergence.

Key Takeaways:
1. Learning Rate Matters: The choice of learning rate significantly impacts the
convergence of gradient descent.
2. Too High: Causes oscillations or divergence.
3. Too Low: Results in slow convergence.
4. Optimal Learning Rate: Depends on the problem and the function being optimized. In
practice, techniques like learning rate scheduling or adaptive learning rates (e.g.,
Adam, RMSprop) are used to dynamically adjust the learning rate.
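
The three cases can be compared side by side with a small sketch (the function and variable names are ours):

def run(alpha, steps=3):
    x, y = 2.0, 3.0         # initial guess (x0, y0)
    for _ in range(steps):
        x -= alpha * 2 * x  # df/dx = 2x
        y -= alpha * 4 * y  # df/dy = 4y
    return x, y

for alpha in (0.1, 0.5, 0.01):
    print(alpha, run(alpha))
# 0.1  -> (1.024, 0.648): steady progress toward (0, 0)
# 0.5  -> (0.0, -3.0): x converges in one step, y oscillates between 3 and -3
# 0.01 -> (~1.8824, ~2.6542): very slow convergence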

Practical Case Study: In deep learning, adaptive learning rate methods like Adam are used to
improve convergence.
8. What is the difference between batch gradient descent and stochastic
gradient descent?
Answer:
 Batch GD: Uses the entire dataset to compute the gradient.
 SGD: Uses a single data point (or mini-batch) to compute the gradient.
Let’s consider another numerical case study to compare Batch Gradient Descent (BGD) and
Stochastic Gradient Descent (SGD). This time, we’ll use a non-linear function to better
illustrate the differences in behavior between the two algorithms.

Case Study: Minimizing a Non-Linear Function


Objective Function: Consider the following non-linear function:
𝑓(𝑥) = 𝑥⁴ − 3𝑥³ + 2
This function has a global minimum at 𝑥 = 9/4 = 2.25 and a stationary point (a saddle, not a
minimum) at 𝑥 = 0.
Initial Guess:
𝑥₀ = 1.5
Learning Rate:

𝛼 = 0.01

Gradient: The gradient (derivative) of the function is:


𝑓′(𝑥) = 4𝑥³ − 9𝑥²
Dataset: Let’s assume we have the following dataset of points sampled from the function:

(𝑥₁, 𝑓(𝑥₁)) = (0,2), (𝑥₂, 𝑓(𝑥₂)) = (1,0), (𝑥₃, 𝑓(𝑥₃)) = (2, −6), (𝑥₄, 𝑓(𝑥₄)) = (3,2)



Batch Gradient Descent (BGD)


Compute the Gradient (using the entire dataset):
∂𝐿(𝑥)/∂𝑥 = (1/4) Σᵢ 𝑓′(𝑥ᵢ)

where 𝑓′(𝑥ᵢ) = 4𝑥ᵢ³ − 9𝑥ᵢ² is the pointwise gradient, averaged over the four sampled points.

Update Rule:

𝑥new = 𝑥old − 𝛼 ⋅ ∂𝐿(𝑥)/∂𝑥
Iteration 1:

o Average the pointwise gradients (the current iterate is 𝑥₀ = 1.5):


𝑓′(𝑥₁) = 4(0)³ − 9(0)² = 0
𝑓′(𝑥₂) = 4(1)³ − 9(1)² = 4 − 9 = −5
𝑓′(𝑥₃) = 4(2)³ − 9(2)² = 32 − 36 = −4
𝑓′(𝑥₄) = 4(3)³ − 9(3)² = 108 − 81 = 27
∂𝐿(𝑥)/∂𝑥 = (1/4)(0 − 5 − 4 + 27) = 18/4 = 4.5
o Update 𝑥:
𝑥₁ = 𝑥₀ − 𝛼 ⋅ 4.5 = 1.5 − 0.01 ⋅ 4.5 = 1.455
Iteration 2:

o The averaged gradient is unchanged, since it is evaluated at the fixed data points:


𝑓′(𝑥₁) = 0, 𝑓′(𝑥₂) = −5, 𝑓′(𝑥₃) = −4, 𝑓′(𝑥₄) = 27
∂𝐿(𝑥)/∂𝑥 = (1/4)(0 − 5 − 4 + 27) = 4.5
o Update 𝑥:
𝑥₂ = 𝑥₁ − 𝛼 ⋅ 4.5 = 1.455 − 0.01 ⋅ 4.5 = 1.41
Observation:

o The parameter 𝑥 is updated once per epoch using the entire dataset.
o The updates are smooth and deterministic.

Stochastic Gradient Descent (SGD)


Update Rule (using one data point at a time):

𝑥new = 𝑥old − 𝛼 ⋅ 𝑓′(𝑥ᵢ)

Iteration 1 (using (𝑥₁, 𝑓(𝑥₁)) = (0,2)):

o Compute the gradient:


𝑓′(𝑥₁) = 0
o Update 𝑥:
𝑥 = 1.5 − 0.01 ⋅ 0 = 1.5
Iteration 2 (using (𝑥₂, 𝑓(𝑥₂)) = (1,0)):

o Compute the gradient:


𝑓′(𝑥₂) = −5
o Update 𝑥:
𝑥 = 1.5 − 0.01 ⋅ (−5) = 1.5 + 0.05 = 1.55
Iteration 3 (using (𝑥₃, 𝑓(𝑥₃)) = (2, −6)):

o Compute the gradient:


𝑓′(𝑥₃) = −4
o Update 𝑥:
𝑥 = 1.55 − 0.01 ⋅ (−4) = 1.55 + 0.04 = 1.59
Iteration 4 (using (𝑥₄, 𝑓(𝑥₄)) = (3,2)):

o Compute the gradient:


𝑓′(𝑥₄) = 27
o Update 𝑥:
𝑥 = 1.59 − 0.01 ⋅ 27 = 1.59 − 0.27 = 1.32
Observation:

o The parameter 𝑥 is updated for each data point.


o The updates are noisy and less deterministic compared to BGD.

Key Differences:
Convergence Behavior:

o BGD converges smoothly but slowly.


o SGD converges faster but with oscillations.
Computational Cost:

o BGD is computationally expensive for large datasets.


o SGD is computationally efficient.
Noise in Updates:

o BGD updates are deterministic.


o SGD updates are noisy due to the use of individual data points.

When to Use:
 BGD: Suitable for small datasets or when stable convergence is required.
 SGD: Suitable for large datasets or when computational efficiency is critical. Mini-batch
SGD is often used as a compromise between BGD and SGD.
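
The one-epoch difference between the two schemes is visible in a short sketch that mirrors the computations above (averaging the pointwise gradients for BGD versus stepping per data point for SGD):

def fprime(x):
    return 4 * x**3 - 9 * x**2  # f'(x) for f(x) = x**4 - 3x**3 + 2

xs = [0.0, 1.0, 2.0, 3.0]  # x-values of the sampled dataset
alpha = 0.01

# Batch GD: one update per epoch using the average pointwise gradient
x_bgd = 1.5
g = sum(fprime(xi) for xi in xs) / len(xs)  # (0 - 5 - 4 + 27) / 4 = 4.5
x_bgd -= alpha * g
print("BGD after one epoch:", x_bgd)  # 1.455

# SGD: one update per data point
x_sgd = 1.5
for xi in xs:
    x_sgd -= alpha * fprime(xi)
print("SGD after one epoch:", x_sgd)  # ~1.32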
Practical Case Study: In training deep neural networks, SGD is preferred for large datasets due
to its scalability.

9. How is optimization implemented in Python? Provide an example.


Answer: Optimization in Python is implemented using libraries like scipy.optimize.
Numerical Example:
from scipy.optimize import minimize

def objective(x):
return x[0]**2 + x[1]**2

x0 = [1, 1]
result = minimize(objective, x0, method='BFGS')
print(result.x) # Optimal solution

Practical Case Study: In hyperparameter tuning, optimization algorithms are used to find the
best model parameters.

1. Gradient Descent for Minimizing a Function


Let’s minimize the function 𝑓(𝑥) = 𝑥² + 5𝑥 + 6 using gradient descent.
import numpy as np

# Define the function and its gradient


def f(x):
return x**2 + 5*x + 6

def grad_f(x):
return 2*x + 5

# Gradient Descent
def gradient_descent(starting_point, learning_rate, num_iterations):
x = starting_point
for i in range(num_iterations):
gradient = grad_f(x)
x = x - learning_rate * gradient
if i % 10 == 0:
print(f"Iteration {i}: x = {x}, f(x) = {f(x)}")
return x

# Parameters
starting_point = 0.0
learning_rate = 0.1
num_iterations = 50

# Run Gradient Descent


optimal_x = gradient_descent(starting_point, learning_rate, num_iterations)
print(f"Optimal x: {optimal_x}")

2. Case Study : Linear Regression with Gradient Descent


We’ll implement linear regression using gradient descent to fit a line to data.
import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic data


np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Add bias term (intercept)


X_b = np.c_[np.ones((100, 1)), X]

# Define the loss function (MSE) and its gradient


def compute_loss(X, y, theta):
m = len(y)
predictions = X.dot(theta)
loss = (1/(2*m)) * np.sum((predictions - y)**2)
return loss

def compute_gradient(X, y, theta):


m = len(y)
predictions = X.dot(theta)
gradient = (1/m) * X.T.dot(predictions - y)
return gradient

# Gradient Descent
def gradient_descent(X, y, theta, learning_rate, num_iterations):
loss_history = []
for i in range(num_iterations):
gradient = compute_gradient(X, y, theta)
theta = theta - learning_rate * gradient
loss = compute_loss(X, y, theta)
loss_history.append(loss)
return theta, loss_history

# Initialize parameters
theta = np.random.randn(2, 1)
learning_rate = 0.1
num_iterations = 1000

# Run Gradient Descent


theta_optimal, loss_history = gradient_descent(X_b, y, theta, learning_rate,
num_iterations)

# Plot the results


plt.plot(loss_history)
plt.xlabel("Iterations")
plt.ylabel("Loss")
plt.title("Loss vs. Iterations")
plt.show()

print(f"Optimal parameters (theta): {theta_optimal}")

3. Case Study : Neural Network Training with Adam Optimizer


We’ll use TensorFlow/Keras to train a simple neural network on the MNIST dataset.
import tensorflow as tf
from tensorflow.keras import layers, models

# Load MNIST dataset


(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = X_train.reshape(-1, 28*28).astype("float32") / 255.0
X_test = X_test.reshape(-1, 28*28).astype("float32") / 255.0

# Build a simple neural network


model = models.Sequential([
layers.Dense(128, activation="relu", input_shape=(28*28,)),
layers.Dense(10, activation="softmax")
])

# Compile the model with Adam optimizer


model.compile(optimizer="adam",
loss="sparse_categorical_crossentropy",
metrics=["accuracy"])

# Train the model


history = model.fit(X_train, y_train, epochs=5, batch_size=32,
validation_split=0.2)

# Evaluate the model


test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_accuracy}")
Key Takeaways:
1. Basic Optimization: Gradient descent can be implemented from scratch for simple
functions.
2. Linear Regression: Gradient descent is used to fit a model to data by minimizing the
mean squared error (MSE).
3. Neural Networks: Frameworks like TensorFlow/Keras provide built-in optimizers (e.g.,
Adam) for training complex models.
