SWALR

class torch.optim.swa_utils.SWALR(optimizer, swa_lr, anneal_epochs=10, anneal_strategy='cos', last_epoch=-1)[source]

Anneals the learning rate in each parameter group to a fixed value.

This learning rate scheduler is meant to be used with the Stochastic Weight Averaging (SWA) method (see torch.optim.swa_utils.AveragedModel).

Parameters
  • optimizer (torch.optim.Optimizer) – wrapped optimizer

  • swa_lr (float or list) – the learning rate value for all param groups together, or a list with one value per param group (see the sketch after this list).

  • anneal_epochs (int) – number of epochs in the annealing phase (default: 10)

  • anneal_strategy (str) – “cos” or “linear”; specifies the annealing strategy: “cos” for cosine annealing, “linear” for linear annealing (default: “cos”)

  • last_epoch (int) – the index of the last epoch (default: -1)
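When swa_lr is a list, it must contain one value per param group. A hypothetical sketch with separate SWA learning rates for two param groups (backbone and head are placeholder modules):

>>> optimizer = torch.optim.SGD([
>>>     {"params": backbone.parameters()},
>>>     {"params": head.parameters()},
>>> ], lr=0.1)
>>> swa_scheduler = torch.optim.swa_utils.SWALR(optimizer, swa_lr=[0.01, 0.05])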

The SWALR scheduler can be used together with other schedulers to switch to a constant learning rate late in training, as in the example below.

Example

>>> loader, optimizer, model = ...
>>> lr_lambda = lambda epoch: 0.9
>>> # Decay the learning rate by 10% per epoch until the SWA phase begins
>>> scheduler = torch.optim.lr_scheduler.MultiplicativeLR(optimizer,
>>>     lr_lambda=lr_lambda)
>>> swa_scheduler = torch.optim.swa_utils.SWALR(optimizer,
>>>     anneal_strategy="linear", anneal_epochs=20, swa_lr=0.05)
>>> swa_start = 160
>>> for i in range(300):
>>>     for input, target in loader:
>>>         optimizer.zero_grad()
>>>         loss_fn(model(input), target).backward()
>>>         optimizer.step()
>>>     if i > swa_start:
>>>         # SWA phase: anneal the learning rate to swa_lr over 20 epochs
>>>         swa_scheduler.step()
>>>     else:
>>>         scheduler.step()
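SWALR only adjusts the learning rate; the weight averaging itself is done by torch.optim.swa_utils.AveragedModel. A minimal sketch combining the two, continuing the example above (update_bn recomputes batch-norm statistics for the averaged weights before evaluation):

>>> swa_model = torch.optim.swa_utils.AveragedModel(model)
>>> for i in range(300):
>>>     for input, target in loader:
>>>         optimizer.zero_grad()
>>>         loss_fn(model(input), target).backward()
>>>         optimizer.step()
>>>     if i > swa_start:
>>>         swa_model.update_parameters(model)  # fold current weights into the running average
>>>         swa_scheduler.step()
>>>     else:
>>>         scheduler.step()
>>> torch.optim.swa_utils.update_bn(loader, swa_model)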
get_last_lr()[source]

Get the most recent learning rates computed by this scheduler.

Returns

A list of learning rates with entries for each of the optimizer’s param_groups, with the same types as their group["lr"]s.

Return type

list[float | Tensor]

Note

The returned Tensors are copies, and never alias the optimizer’s group["lr"]s.
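For example, to inspect the annealed rate after each epoch (continuing the example above; the actual value depends on how far annealing has progressed):

>>> swa_scheduler.step()
>>> swa_scheduler.get_last_lr()  # one entry per param group, e.g. [0.05] once annealing is done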

get_lr()[source]

Compute the next learning rate for each of the optimizer’s param_groups.

Uses anneal_func to interpolate between each group’s group["lr"] and group["swa_lr"] over anneal_epochs epochs. Once anneal_epochs is reached, keeps the learning rate fixed at group["swa_lr"].

Returns

A list of learning rates for each of the optimizer’s param_groups with the same types as their current group["lr"]s.

Return type

list[float | Tensor]

Note

If you’re trying to inspect the most recent learning rate, use get_last_lr() instead.

Note

The returned Tensors are copies, and never alias the optimizer’s group["lr"]s.
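For intuition, here is a standalone sketch of the cosine interpolation described above; this is illustrative arithmetic, not the scheduler’s internal code:

>>> import math
>>> def cosine_anneal(initial_lr, swa_lr, epoch, anneal_epochs):
>>>     t = min(1.0, epoch / anneal_epochs)  # fraction of the annealing phase, clamped to [0, 1]
>>>     alpha = (1 - math.cos(math.pi * t)) / 2  # 0 at the start, 1 at the end
>>>     return swa_lr * alpha + initial_lr * (1 - alpha)
>>> [round(cosine_anneal(0.1, 0.05, e, 10), 4) for e in (0, 5, 10, 15)]
[0.1, 0.075, 0.05, 0.05]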

load_state_dict(state_dict)[source]

Load the scheduler’s state.

Parameters

state_dict (dict) – scheduler state. Should be an object returned from a call to state_dict().

state_dict()[source]

Return the state of the scheduler as a dict.

It contains an entry for every variable in self.__dict__ which is not the optimizer or anneal_func.

Return type

dict[str, Any]
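Together, state_dict() and load_state_dict() follow the usual PyTorch checkpointing pattern; a sketch (the file name is a placeholder):

>>> torch.save(swa_scheduler.state_dict(), "swa_scheduler.pt")
>>> # ... later, after re-creating the optimizer and scheduler ...
>>> swa_scheduler.load_state_dict(torch.load("swa_scheduler.pt"))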

step(epoch=None)[source]

Step the scheduler.

Parameters

epoch (int, optional) –

Deprecated since version 1.4: If provided, sets last_epoch to epoch and uses _get_closed_form_lr() if it is available. This is not universally supported. Use step() without arguments instead.

Note

Call this method after calling the optimizer’s step().
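That is, within each epoch the per-batch optimizer steps come first, and the scheduler steps once afterwards, as in the full example above:

>>> for input, target in loader:
>>>     optimizer.zero_grad()
>>>     loss_fn(model(input), target).backward()
>>>     optimizer.step()
>>> swa_scheduler.step()  # once per epoch, after all optimizer steps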