Coding Assignment Report
Problem 1
Here is the link to the Colab notebook which contains the code that solves Problem 1.
The explanation of the difference between DTW distance and Euclidean distance is
already given in the notebook, so it is not repeated in full here.
Why was DTW distance used to construct the kernel and perform the diffusion map
embedding instead of Euclidean distance?
➔ The reason lies in the nature of the UCI HAR dataset itself. DTW was selected in
place of Euclidean distance because of three significant properties of human activity data:
• Temporal warping tolerance: Aligns time-shifted activity patterns (e.g., walking
upstairs vs. walking downstairs) by warping the time axis nonlinearly.
• Phase variation robustness: Aligns sequences performed at different speeds via
dynamic-programming alignment.
• Shape-based similarity: Preserves the morphological features of the sensor signals
irrespective of local time offsets.
Visual inspection showed that DTW produces large warping paths between activities
(see the image below), while Euclidean distance produces artificial mismatches due to
its rigid point-to-point alignment.
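Since the question above concerns the kernel construction, here is a minimal sketch (not the notebook's actual code) of how a precomputed DTW distance matrix could be turned into a diffusion map embedding; the Gaussian kernel, the median-based bandwidth heuristic, and all variable names are illustrative assumptions.

import numpy as np

def diffusion_map(D, n_components=2, t=1, eps=None):
    # D: (n, n) matrix of pairwise DTW distances between windows
    if eps is None:
        eps = np.median(D) ** 2               # assumed bandwidth heuristic
    K = np.exp(-D ** 2 / eps)                 # Gaussian kernel on DTW distances
    P = K / K.sum(axis=1, keepdims=True)      # row-normalize -> Markov matrix
    vals, vecs = np.linalg.eig(P)             # P is non-symmetric, so use eig
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]
    # Drop the trivial first eigenpair (constant eigenvector, eigenvalue 1)
    return (vals[1:n_components + 1] ** t) * vecs[:, 1:n_components + 1]

Each 128-reading HAR window contributes one row and column of D, so the returned array gives one embedding point per window.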
In conclusion:
• Euclidean Distance: Measures the straight-line distance between two sequences
of equal length. It is sensitive to shifts and distortions in the time axis.
• DTW Distance: Allows flexible alignment by warping the time axis, making it more
robust for comparing time series with varying speeds or misalignments.
Thus, DTW is more suitable for time-dependent sequences, while Euclidean distance
is a simpler, rigid comparison method. The HAR dataset contains exactly such
time-dependent sequences, since the data were sampled in fixed-width sliding windows
of 2.56 seconds with 50% overlap (128 readings per window).
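To make the contrast concrete, here is a minimal pure-NumPy sketch of the classic dynamic-programming DTW distance, compared against Euclidean distance on a phase-shifted sine wave; this is an illustrative toy, not the notebook's implementation (which could equally use a library such as tslearn or fastdtw).

import numpy as np

def dtw_distance(a, b):
    # Classic O(len(a) * len(b)) dynamic-programming DTW
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

t = np.linspace(0, 2 * np.pi, 128)          # mimics a 128-reading window
x, y = np.sin(t), np.sin(t + 0.5)           # same shape, phase-shifted
print("Euclidean:", np.linalg.norm(x - y))  # penalizes the shift heavily
print("DTW:      ", dtw_distance(x, y))     # warps the time axis, stays small

The phase-shifted pair mirrors the point above: Euclidean distance treats the shift as a large mismatch, while DTW aligns the two waves almost perfectly.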
Problem 2
1. Nelder-Mead Method
Strengths:
Low computational cost: Converged in fewer than 146 iterations when optimizing the
Rosenbrock function (compared to 4,118 for SA).
Simplicity: No gradient calculations are required.
Rapid local convergence: Best suited to smooth, convex problems.
Weaknesses:
Poor global search: Failed to minimize the Rastrigin function (final value = 7.96) and
the Ackley function (final value = 6.56) due to trapping in local minima.
Dimensionality constraints: Struggled with tuning SVM hyperparameters
(accuracy = 0.91 vs. CMA-ES's 0.90 with half the evaluations).
Best Use Case: Low-dimensional, unimodal problems with limited compute budgets.
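As a usage reference, the Nelder-Mead run could look like the following SciPy sketch; the starting point and tolerances are illustrative, not the exact configuration used in the experiments.

from scipy.optimize import minimize, rosen

# rosen is SciPy's built-in Rosenbrock function
res = minimize(rosen, x0=[-1.2, 1.0], method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-8})
print(res.x, res.fun, res.nit)  # solution, objective value, iteration count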
2. Simulated Annealing (SA)
Strengths:
Global search: Reached accurate solutions on the multimodal Rastrigin and Ackley
functions where Nelder-Mead was trapped in local minima.
Weaknesses:
High compute cost: Required over 4,000 iterations on the test functions.
Parameter sensitivity: Performance depends heavily on the tuning of the cooling
schedule.
Inefficient convergence: Took 6,009 evaluations for SVM tuning versus Nelder-Mead's
49 evaluations.
Best Use Case: Multimodal problems where the quality of the global optimum justifies
the computational cost.
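A comparable SA-style run can be sketched with SciPy's dual_annealing (a generalized simulated annealing, so not necessarily the exact SA variant benchmarked here); the bounds, seed, and dimensionality below are illustrative.

import numpy as np
from scipy.optimize import dual_annealing

def rastrigin(x):
    # Global minimum 0 at the origin, surrounded by many local minima
    return 10 * len(x) + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x))

res = dual_annealing(rastrigin, bounds=[(-5.12, 5.12)] * 2, seed=0)
print(res.x, res.fun, res.nfev)  # solution, objective value, evaluation count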
3. CMA-ES
Strengths:
Balanced search: Solved the Rosenbrock and Ackley functions to high precision
(function value ≈ 1e-12) with a moderate number of iterations.
Flexibility: The self-adapting covariance matrix accommodates ill-conditioned
landscapes.
Sample efficiency: Finished SVM tuning in 30 evaluations (about 1/200 of SA's cost).
Weaknesses:
Premature convergence: Produced a poor Rastrigin solution (function value = 0.99)
due to population-size limitations.
Memory overhead: Stores a full covariance matrix (O(n²) memory).
Discrete parameter problems: Struggled with categorical kernel selection for the SVM.
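For completeness, a CMA-ES run on the Rosenbrock function can be sketched with the third-party pycma package (pip install cma); the initial point and step size are illustrative assumptions.

import cma  # third-party pycma package

es = cma.CMAEvolutionStrategy(x0=[0.0, 0.0], sigma0=0.5)
es.optimize(cma.ff.rosen)                # pycma ships test functions in cma.ff
print(es.result.xbest, es.result.fbest)  # best solution and objective value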
Implications:
Rosenbrock-type problems: Use Nelder-Mead for a fast search on smooth landscapes.
Multimodal landscapes (Rastrigin/Ackley): Prefer SA despite its computational expense.
ML hyperparameter tuning: Use CMA-ES for continuous parameters and Nelder-Mead
for mixed spaces (a sketch of the CMA-ES approach follows below).
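To illustrate the hyperparameter-tuning comparison, here is a hedged sketch of tuning an RBF-kernel SVM's continuous C and gamma with CMA-ES in log10 space under a budget of 30 evaluations (matching the count reported above); the dataset, search ranges, and cross-validation setup are illustrative assumptions, not the exact experimental protocol.

import cma
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

def neg_cv_accuracy(log_params):
    # Search in log10 space so C and gamma remain positive
    C, gamma = 10.0 ** log_params[0], 10.0 ** log_params[1]
    return -cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

xbest, es = cma.fmin2(neg_cv_accuracy, x0=[0.0, 0.0], sigma0=1.0,
                      options={"maxfevals": 30, "verbose": -9})
print("best C, gamma:", 10.0 ** xbest[0], 10.0 ** xbest[1])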
In conclusion, the trade-offs are:
Accuracy vs. speed: SA's results are quite accurate but slow to obtain, while
Nelder-Mead was fast but inaccurate on the Rastrigin and Ackley functions.
Generality vs. specialization: Nelder-Mead's simplex degrades in high dimensions
but performs well on low-dimensional functions.
Automation vs. control: CMA-ES makes tuning easier but is harder to interpret.