NoamLR
- final class fairseq2.optim.lr_scheduler.NoamLR(optimizer, num_warmup_steps, *, last_epoch=-1, verbose=False)
Bases: LRSchedulerBase
Represents the learning rate schedule described in Section 5.3 of Vaswani et al. [VSP+17].
\[\eta_t = \eta_{base} \min\left(\frac{1}{\sqrt{t}}, \frac{t}{T_{warmup}} \frac{1}{\sqrt{T_{warmup}}}\right)\]

This corresponds to increasing the learning rate linearly for the first \(T_{warmup}\) training steps, and decreasing it thereafter proportionally to the inverse square root of the step number. In the paper, the authors use the inverse square root of the dimensionality of the model as \(\eta_{base}\).
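Note that the two terms inside the \(\min\) are equal at \(t = T_{warmup}\), so the warmup and decay phases meet smoothly. As a minimal sketch, the schedule can be written as a plain Python function; noam_lr is a hypothetical helper for illustration, not part of the fairseq2 API:

```python
import math

def noam_lr(step: int, num_warmup_steps: int, base_lr: float) -> float:
    """Hypothetical helper: the Noam learning rate at a given step."""
    if step == 0:
        # The schedule starts at zero; this also avoids division by zero.
        return 0.0
    # Warmup branch: step / num_warmup_steps ** 1.5 (linear in step).
    # Decay branch: 1 / sqrt(step). min() selects warmup first.
    return base_lr * min(
        1.0 / math.sqrt(step),
        step / (num_warmup_steps * math.sqrt(num_warmup_steps)),
    )
```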
This scheduler is commonly referred to as Noam, after the second author of the paper, Noam Shazeer.
Note
This scheduler is not chainable.
- Parameters:
  - optimizer (Optimizer) – The associated optimizer.
  - num_warmup_steps (int) – The number of warmup steps.
  - last_epoch (int) – The index of the last epoch.
  - verbose (bool) – If True, prints a message to stdout for each update.
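A brief usage sketch; the model and optimizer settings below are illustrative, not from the source. As is standard for PyTorch schedulers, \(\eta_{base}\) is the learning rate set on the optimizer:

```python
from torch import nn
from torch.optim import Adam

from fairseq2.optim.lr_scheduler import NoamLR

model = nn.Linear(512, 512)

# Following the paper, use the inverse square root of the model
# dimensionality (here 512) as the base learning rate.
optimizer = Adam(model.parameters(), lr=512 ** -0.5)

scheduler = NoamLR(optimizer, num_warmup_steps=4000)

for _ in range(5):
    # loss.backward() would normally precede these calls.
    optimizer.step()
    scheduler.step()

print(scheduler.get_last_lr())  # one value per parameter group
```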
- get_last_lr()
Return the last learning rate computed by the current scheduler.
- load_state_dict(state_dict)
Load the scheduler's state.
- Parameters:
  - state_dict (dict) – The scheduler state. Should be an object returned from a call to state_dict().
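A minimal checkpoint round-trip sketch using only the state_dict()/load_state_dict() pair documented here; the file name and setup are illustrative:

```python
import torch
from torch import nn
from torch.optim import Adam

from fairseq2.optim.lr_scheduler import NoamLR

model = nn.Linear(512, 512)
optimizer = Adam(model.parameters(), lr=512 ** -0.5)
scheduler = NoamLR(optimizer, num_warmup_steps=4000)

# Save the scheduler state as part of a training checkpoint.
torch.save({"scheduler": scheduler.state_dict()}, "checkpoint.pt")

# On resume, restore the state into a freshly constructed scheduler.
state = torch.load("checkpoint.pt")
scheduler.load_state_dict(state["scheduler"])
```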
- print_lr(is_verbose, group, lr, epoch=None)
Display the current learning rate.