2,5 Stochastic Gradient Descent
| Aspect | Stochastic Gradient Descent (SGD) | Batch Gradient Descent |
|---|---|---|
| Dataset Usage | Uses a single random sample or a small batch of samples at each iteration. | Uses the entire dataset (batch) at each iteration. |
| Convergence | Faster convergence due to frequent updates. | Slower convergence due to less frequent updates. |
| Memory Requirement | Requires less memory as it processes fewer data points at a time. | Requires more memory to hold the entire dataset in memory. |
| Initialization Sensitivity | Less sensitive to initial parameter values due to frequent updates. | More sensitive to initial parameter values. |
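The difference in dataset usage and update frequency can be made concrete with a minimal sketch. It assumes a linear model trained with mean squared error on synthetic data; the function names (`gradient`, `batch_gd`, `sgd`), the learning rates, and the data are illustrative choices, not taken from the text.

```python
import numpy as np


def gradient(w, X, y):
    """Gradient of the mean squared error for a linear model y ~ X @ w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)


def batch_gd(X, y, lr=0.1, epochs=100):
    """Batch gradient descent: one update per pass, using the entire dataset."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w -= lr * gradient(w, X, y)
    return w


def sgd(X, y, lr=0.01, epochs=20, batch_size=1, seed=0):
    """SGD: many updates per pass, each from one sample or a small batch."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(epochs):
        order = rng.permutation(n)  # visit samples in a fresh random order
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            w -= lr * gradient(w, X[idx], y[idx])
    return w


# Synthetic data (assumed for illustration): y = 3*x1 - 2*x2 + small noise.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 2))
true_w = np.array([3.0, -2.0])
y = X @ true_w + 0.01 * data_rng.normal(size=200)

w_batch = batch_gd(X, y)          # 100 updates, each touching all 200 samples
w_sgd = sgd(X, y, batch_size=1)   # 20 * 200 = 4000 updates, one sample each

print("batch GD:", w_batch)
print("SGD:     ", w_sgd)
```

In this sketch each `batch_gd` update reads every row of `X`, while each `sgd` update reads only `batch_size` rows, which is where the memory and update-frequency differences in the table come from. Setting `batch_size` to a value such as 16 or 32 gives the small-batch variant mentioned in the first row.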