PolynomialDecayLR

final class fairseq2.optim.lr_scheduler.PolynomialDecayLR(optimizer, num_steps, num_warmup_steps, *, power=1.0, start_lr=0.0, final_lr=0.0, last_epoch=-1, verbose=False)

Bases: LRSchedulerBase

Represents the polynomial decay learning rate schedule.

During warmup:

\[\eta_t = \eta_{base} \frac{t}{T_{warmup}}\]

After warmup:

\[\eta_t = \eta_{final} + (\eta_{base} - \eta_{final}) \left(\frac{T - t}{T - T_{warmup}}\right)^p\]

This corresponds to increasing the learning rate linearly for the first \(T_{warmup}\) training steps to the base learning rate, and decreasing it thereafter for \(T - T_{warmup}\) steps to the final learning rate using a polynomial of degree \(p\).
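
Written out as code, the schedule has a simple closed form. The following is a minimal sketch of the two formulas above (start_lr is omitted for brevity; the helper polynomial_decay_lr is illustrative and not part of the fairseq2 API):

    def polynomial_decay_lr(
        step: int,
        num_steps: int,
        num_warmup_steps: int,
        base_lr: float,
        final_lr: float = 0.0,
        power: float = 1.0,
    ) -> float:
        # Illustrative helper; not fairseq2's internal implementation.
        if step < num_warmup_steps:
            # Linear warmup: eta_t = eta_base * t / T_warmup
            return base_lr * step / num_warmup_steps

        if step >= num_steps:
            # Past the decay horizon, hold the final learning rate.
            return final_lr

        # Polynomial decay:
        # eta_t = eta_final + (eta_base - eta_final) * ((T - t) / (T - T_warmup))^p
        decay = (num_steps - step) / (num_steps - num_warmup_steps)
        return final_lr + (base_lr - final_lr) * decay**power

Note that at \(t = T_{warmup}\) both branches evaluate to \(\eta_{base}\), so the schedule is continuous at the warmup boundary.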

Note

This scheduler is not chainable; it computes the learning rate in closed form from the step count rather than scaling the learning rate set by a previous scheduler.

Parameters:
  • optimizer (Optimizer) – The associated optimizer.

  • num_steps (int) – The total number of steps, including warmup, over which to decay the learning rate.

  • num_warmup_steps (int) – The number of warmup steps.

  • power (float) – The exponent of the polynomial used for decay.

  • start_lr (float | Sequence[float]) – The initial warmup learning rate; either a single value shared by all parameter groups, or one value per parameter group.

  • final_lr (float | Sequence[float]) – The final learning rate; either a single value shared by all parameter groups, or one value per parameter group.

  • last_epoch (int) – The index of the last epoch.

  • verbose (bool) – If True, prints a message to stdout for each update.
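
A typical usage sketch follows; the model, optimizer, and hyperparameter values are placeholders, and the optimizer's lr serves as the base learning rate:

    import torch

    from fairseq2.optim.lr_scheduler import PolynomialDecayLR

    model = torch.nn.Linear(16, 16)  # placeholder model
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)  # lr is the base learning rate

    scheduler = PolynomialDecayLR(
        optimizer,
        num_steps=10_000,        # total number of steps, warmup included
        num_warmup_steps=1_000,  # linear warmup over the first 1,000 steps
        power=2.0,               # quadratic decay after warmup
        final_lr=1e-5,           # learning rate reached at step 10,000
    )

    for step in range(10_000):
        # ... forward pass, loss.backward() ...
        optimizer.step()
        scheduler.step()  # advance the schedule after the optimizer update
        optimizer.zero_grad()

    print(scheduler.get_last_lr())  # one value per parameter group, e.g. [1e-05]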

get_last_lr()

Returns the last learning rate computed by the scheduler for each parameter group.

load_state_dict(state_dict)

Loads the scheduler's state.

Parameters:
  • state_dict (dict) – The scheduler state. Should be an object returned from a call to state_dict().

print_lr(is_verbose, group, lr, epoch=None)

Display the current learning rate.

state_dict()

Returns the state of the scheduler as a dict.

It contains an entry for every variable in self.__dict__ which is not the optimizer.
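
Together with load_state_dict(), this supports the usual PyTorch checkpointing round trip. A minimal sketch, continuing the usage example above (the file path is illustrative):

    import torch

    # Save the scheduler state alongside the rest of the training state.
    torch.save({"scheduler": scheduler.state_dict()}, "checkpoint.pt")

    # Later, after reconstructing the optimizer and scheduler, restore the state.
    state = torch.load("checkpoint.pt")
    scheduler.load_state_dict(state["scheduler"])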