NoamLR

final class fairseq2.optim.lr_scheduler.NoamLR(optimizer, num_warmup_steps, *, last_epoch=-1)

Bases: AbstractLRScheduler

Represents the learning rate schedule described in Section 5.3 of Vaswani et al. [VSP+17].

\[\eta_t = \eta_{base} \min(\frac{1}{\sqrt{t}}, \frac{t}{T_{warmup}} \frac{1}{\sqrt{T_{warmup}}})\]

This corresponds to increasing the learning rate linearly for the first \(T_{warmup}\) training steps, and decreasing it thereafter proportionally to the inverse square root of the step number. In the paper, the authors use the inverse square root of the model dimensionality, \(d_{model}^{-0.5}\), as \(\eta_{base}\).
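As a rough illustration, the formula above can be evaluated directly in plain Python. This is a minimal sketch and not fairseq2's implementation: the helper name noam_lr is hypothetical, and it assumes the step counter \(t\) starts at 1 (the formula is undefined at \(t = 0\)).

```python
import math


def noam_lr(base_lr: float, step: int, num_warmup_steps: int) -> float:
    """Evaluate the Noam schedule at a 1-based training step.

    Hypothetical helper for illustration only; not part of fairseq2.
    """
    if step < 1:
        raise ValueError("step must be >= 1 (assumed 1-based step counter)")
    if num_warmup_steps < 1:
        raise ValueError("num_warmup_steps must be >= 1")

    # Warmup branch: (t / T_warmup) * (1 / sqrt(T_warmup)), linear in t.
    warmup = (step / num_warmup_steps) / math.sqrt(num_warmup_steps)

    # Decay branch: 1 / sqrt(t).
    decay = 1.0 / math.sqrt(step)

    # The smaller branch wins: warmup before T_warmup, decay after it.
    return base_lr * min(decay, warmup)


# The two branches meet at t = T_warmup, where the schedule peaks.
peak = noam_lr(1.0, 4000, num_warmup_steps=4000)
halfway = noam_lr(1.0, 2000, num_warmup_steps=4000)
later = noam_lr(1.0, 16000, num_warmup_steps=4000)
```

Under these assumptions, the rate halfway through warmup is half the peak, and the rate at four times the warmup length has decayed back to that same value, since \(1/\sqrt{4T} = 0.5/\sqrt{T}\).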

This scheduler is commonly referred to as Noam, after the second author of the paper, Noam Shazeer.

Note

This scheduler is not chainable.

Parameters:

optimizer (Optimizer) – The associated optimizer.
num_warmup_steps (int) – The number of warmup steps.
last_epoch (int) – The index of the last epoch.

get_last_lr()

Return the last learning rate computed by the scheduler.

Return type: List[float]

get_lr()

Compute the learning rate using the chainable form of the scheduler.

Return type: List[float]

load_state_dict(state_dict)

Load the scheduler’s state.

Parameters:

state_dict (dict) – The scheduler state. Should be an object returned from a call to state_dict().

print_lr(is_verbose, group, lr, epoch=None)

Display the current learning rate.

Deprecated since version 2.4: print_lr() is deprecated. Please use get_last_lr() to access the learning rate.

state_dict()

Return the state of the scheduler as a dict.

It contains an entry for every variable in self.__dict__ which is not the optimizer.
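The state_dict() and load_state_dict() pair can be pictured with a toy stand-in. This is a sketch of the convention described above (every attribute except the optimizer is saved), not the actual fairseq2 or PyTorch code: the ToyScheduler class and its attributes are hypothetical.

```python
class ToyScheduler:
    """Hypothetical stand-in for a scheduler; not fairseq2 or PyTorch code."""

    def __init__(self, optimizer, num_warmup_steps):
        self.optimizer = optimizer
        self.num_warmup_steps = num_warmup_steps
        self.last_epoch = 0

    def step(self):
        # Advance the internal step counter.
        self.last_epoch += 1

    def state_dict(self):
        # One entry per attribute in self.__dict__, skipping the optimizer.
        return {k: v for k, v in self.__dict__.items() if k != "optimizer"}

    def load_state_dict(self, state_dict):
        # Restore the saved attributes; the optimizer reference is untouched.
        self.__dict__.update(state_dict)


optimizer = object()  # placeholder; a real optimizer would go here

scheduler = ToyScheduler(optimizer, num_warmup_steps=4000)
for _ in range(3):
    scheduler.step()
saved = scheduler.state_dict()

# A freshly built scheduler picks up where the saved one left off.
restored = ToyScheduler(optimizer, num_warmup_steps=1)
restored.load_state_dict(saved)
```

Because the optimizer is excluded, its state has to be checkpointed separately, and the restored scheduler keeps whatever optimizer it was constructed with.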

step(epoch=None)

Perform a step.