
Mathematical Foundation for AI (CS303)

Name: Nishit Prajapati


ID: 24120036
Coding Assignment Report

Problem 1
Here is the link to the Colab notebook which contains the code that solves Problem 1.
The explanation of the difference between the DTW distance and the Euclidean distance
has already been discussed in that notebook, so I am not repeating it here.
Why was the DTW distance, rather than the Euclidean distance, used to construct the
kernel for the diffusion map embedding?
➔ The reason lies in the nature of the UCI HAR dataset itself. DTW was selected in
place of the Euclidean distance because of three significant properties of human
activity data:
• Temporal warping tolerance: aligns offset activity patterns (e.g., upstairs vs.
downstairs) by warping the time axis nonlinearly.
• Phase variation robustness: synchronizes sequences of different speeds via
dynamic-programming alignment.
• Shape-based similarity: preserves the morphological features of the sensor data
irrespective of local time offsets.

Visual inspection showed that DTW produces pronounced warping paths between
activities (see the figure below), while the Euclidean distance produces artificial
mismatches due to its rigid, point-to-point alignment.
In conclusion:
• Euclidean Distance: Measures the straight-line distance between two sequences
of equal length. It is sensitive to shifts and distortions in the time axis.
• DTW Distance: Allows flexible alignment by warping the time axis, making it more
robust for comparing time series with varying speeds or misalignments.

Thus, DTW is more suitable for time-dependent sequences, while the Euclidean distance
is a simpler, rigid comparison method. The HAR dataset consists of exactly such
time-dependent sequences: the data was sampled in fixed-width sliding windows of
2.56 seconds with 50% overlap (128 readings per window).
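
To make the comparison concrete, here is a minimal dynamic-programming DTW sketch (not the notebook's implementation; the function name and test sequences are illustrative):

```python
import numpy as np

def dtw_distance(x, y):
    """Classic O(len(x) * len(y)) DTW between two 1-D sequences."""
    n, m = len(x), len(y)
    # cost[i, j] = minimal cumulative cost of aligning x[:i] with y[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])              # local distance
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# Same shape, different speeds: DTW stays small, and unlike the
# Euclidean distance it does not require equal-length sequences.
slow = np.sin(np.linspace(0, 2 * np.pi, 128))
fast = np.sin(np.linspace(0, 2 * np.pi, 96))
print(dtw_distance(slow, fast))
```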

Why do Diffusion Maps outperform PCA/t-SNE?


➔ To answer this, we can look at the silhouette score of each embedding. The
silhouette score is a metric for evaluating clustering quality: it measures how far
each data point is from points in neighbouring clusters, so the greater the score,
the denser and better separated the clusters (a usage sketch follows the comparison
below). Looking at this score for the different embeddings, both Diffusion Maps and
t-SNE score well compared to the others:
• Raw Feature Space: exhibits the worst clustering quality, with heavy overlap
between the majority of activity classes.
• PCA: provides slightly better separation than the original features but still
shows considerable overlap between some classes.
• Diffusion Maps: produce a distinctive structure with clear separation between the
prominent activity groups, exposing the underlying manifold structure of the data.
• t-SNE: achieves the best cluster quality, with clear boundaries between the
activity categories and well-formed clusters.
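
For reference, the silhouette scores themselves can be computed with scikit-learn. A self-contained sketch on synthetic data (the HAR embeddings and activity labels would take the place of X and labels):

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Toy stand-in for an embedding: 300 points in 4 well-separated clusters
X, labels = make_blobs(n_samples=300, centers=4, random_state=0)
# Score lies in [-1, 1]; higher means denser, better-separated clusters
print(silhouette_score(X, labels))
```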
Now, comparing t-SNE and Diffusion Maps, I found:
• Diffusion maps retain both local neighbourhood similarity and global manifold
structure simultaneously, whereas t-SNE preserves neighbourhood relationships but
often distorts global patterns. This can be seen in the way the curved manifold
structure reveals the continuous relationship between activities.
• The diffusion map embedding shows how the system evolves over time by
approximating the eigenfunctions of the normalized graph Laplacian that
characterizes its behaviour. This makes it particularly well suited to finding
patterns in time-series data.
• Diffusion maps view time-series observations as samples from evolving
distributions and unveil the underlying statistical structure, which is evident from
the way the clusters trace a smooth curved trajectory. For noisy time-series
measurements, diffusion maps are better than PCA (which captures only linear
structure) and t-SNE (whose output can vary greatly with the chosen parameters).
Thus, while t-SNE showed better cluster separation (0.065 > 0.030), Diffusion Maps
uniquely preserved temporal progression patterns through curved embeddings that
correlate with activity state transitions. This makes them particularly suitable for
analysing sequential human motion data where temporal relationships between states
are critical.
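
For concreteness, here is a minimal diffusion-map sketch built from a precomputed DTW distance matrix (the bandwidth heuristic and function name are my own illustrative choices, not the notebook's):

```python
import numpy as np

def diffusion_map(D, n_components=2, eps=None):
    """Diffusion-map embedding from a precomputed distance matrix D."""
    if eps is None:
        eps = np.median(D) ** 2              # common bandwidth heuristic
    K = np.exp(-D ** 2 / eps)                # Gaussian kernel on distances
    P = K / K.sum(axis=1, keepdims=True)     # row-normalized Markov matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)           # sort eigenvalues descending
    vals, vecs = vals.real[order], vecs.real[:, order]
    # Drop the trivial constant eigenvector (eigenvalue 1) and scale the
    # remaining eigenvectors by their eigenvalues (diffusion time t = 1).
    return vecs[:, 1:n_components + 1] * vals[1:n_components + 1]
```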
Problem 2
Here is the link to the Colab notebook which contains the code that solves Problem 2.
The explanations, strengths, limitations, and procedures of the three optimization
methods (Nelder-Mead, Simulated Annealing (SA), CMA-ES) have already been discussed
in that notebook, so I am not repeating them here.
Now, what are the trade-offs of each method?
Note: to answer this question, I have used results obtained from the code available
in the linked notebook.
➔ Below are the results that help us observe the trade-offs of each method.

1. Nelder-Mead Method
Strengths:
• Low computational cost: obtained results in fewer than 146 iterations when
optimizing the Rosenbrock function (compared to 4,118 for SA).
• Simplicity: no gradient calculations are required.
• Rapid local convergence: best for smooth convex problems.

Weaknesses:
• Poor global search: failed to minimize the Rastrigin function (function
value = 7.96) and the Ackley function (function value = 6.56) due to trapping in
local minima.
• Dimensionality constraints: struggled with tuning SVM hyperparameters
(accuracy = 0.91 vs. CMA-ES's 0.90 with half the evaluations).

Best Use Case: low-dimensional, unimodal problems with limited compute budgets.
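
A minimal reproduction of the Rosenbrock experiment with SciPy (a sketch; the starting point and tolerances are illustrative, not the notebook's exact settings):

```python
import numpy as np
from scipy.optimize import minimize, rosen

# Nelder-Mead on the 2-D Rosenbrock function; global minimum at [1, 1]
res = minimize(rosen, x0=np.array([-1.2, 1.0]), method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-8})
print(res.x, res.nit)  # converges to ~[1, 1]; res.nit counts iterations
```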

2. Simulated Annealing (SA)

Strengths:
• Global optimization: all test functions were optimized to f ≈ 0 (even Rastrigin
and Ackley).
• Noise tolerance: handles rugged landscapes through probabilistic acceptance of
uphill moves.

Weaknesses:
• High compute cost: required over 4,000 iterations on the test functions.
• Parameter sensitivity: performance is highly dependent on the cooling-schedule
settings.
• Inefficient convergence: took 6,009 evaluations for SVM tuning versus
Nelder-Mead's 49.

Best Use Case: multimodal problems where global-optimum quality justifies the
computational cost.
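
A quick way to reproduce the Rastrigin result is SciPy's dual_annealing, a generalized simulated annealing (a sketch, assuming generalized SA is an acceptable stand-in for the notebook's SA implementation):

```python
import numpy as np
from scipy.optimize import dual_annealing

def rastrigin(x):
    # Highly multimodal test function; global minimum f = 0 at the origin
    return 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

res = dual_annealing(rastrigin, bounds=[(-5.12, 5.12)] * 2, seed=0)
print(res.x, res.fun)  # f ≈ 0, escaping the many local minima
```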

3. CMA-ES
Strengths:
• Balanced search: solved Rosenbrock and Ackley precisely (function value ≈ 1e-12)
with a moderate number of iterations.
• Flexibility: the self-tuning covariance matrix accommodates ill-conditioned
landscapes.
• Sample efficiency: finished SVM tuning in 30 evaluations (about 1/200 of SA's
cost).

Weaknesses:
• Premature convergence: a poor Rastrigin solution (function value = 0.99) due to
population-size limitations.
• Memory overhead: stores a full covariance matrix (O(n²) memory).
• Discrete parameters: struggled with categorical kernel selection in the SVM.

Best Use Case: medium-dimensional continuous optimization with correlated
parameters.
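
A minimal CMA-ES sketch using the third-party cma package (assuming it is installed via pip install cma; the objective and initial step size are illustrative):

```python
import cma

def sphere(x):
    # Simple convex objective; global minimum f = 0 at the origin
    return sum(xi ** 2 for xi in x)

# Start at [0.5]*5 with initial step size 0.3; CMA-ES adapts the full
# covariance matrix of its sampling distribution as the search proceeds.
es = cma.CMAEvolutionStrategy(5 * [0.5], 0.3, {"verbose": -9})
es.optimize(sphere)
print(es.result.xbest, es.result.fbest)  # near the origin, f ≈ 0
```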

Implications:
• Rosenbrock-type problems: apply Nelder-Mead for a fast search on smooth
landscapes.
• Multimodal landscapes (Rastrigin/Ackley): prefer SA despite its computational
expense.
• ML hyperparameter tuning: CMA-ES for continuous parameters, Nelder-Mead for mixed
spaces.
In conclusion, the trade-offs are:
• Accuracy vs. speed: SA's results are quite accurate but it is slow, while
Nelder-Mead was fast but inaccurate on the Rastrigin and Ackley functions.
• Generality vs. specialization: Nelder-Mead's simplex degrades in high dimensions
but performs well on low-dimensional functions.
• Automation vs. control: CMA-ES makes tuning easier but is harder to interpret.

-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-Report ends here-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-
