PolynomialDecayLR
- final class fairseq2.optim.lr_scheduler.PolynomialDecayLR(optimizer, num_steps, num_warmup_steps, *, power=1.0, start_lr=0.0, final_lr=0.0, last_epoch=-1, verbose=False)[source]
Bases: LRSchedulerBase
Represents the polynomial decay learning rate schedule.
During warmup:
\[\eta_t = \eta_{base} \frac{t}{T_{warmup}}\]
After warmup:
\[\eta_t = \eta_{final} + (\eta_{base} - \eta_{final}) \left(\frac{T - t}{T - T_{warmup}}\right)^p\]
This corresponds to increasing the learning rate linearly to the base learning rate for the first \(T_{warmup}\) training steps, and decreasing it thereafter to the final learning rate over the remaining \(T - T_{warmup}\) steps using a polynomial of degree \(p\).
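To make the two phases concrete, the following standalone sketch (not part of the fairseq2 API; the function name is illustrative) evaluates the formulas directly. It assumes warmup interpolates linearly from start_lr to the base rate, which reduces to the first formula above when start_lr is 0:

```python
def polynomial_decay_lr(
    step: int,
    base_lr: float,
    num_steps: int,
    num_warmup_steps: int,
    power: float = 1.0,
    start_lr: float = 0.0,
    final_lr: float = 0.0,
) -> float:
    """Illustrative, standalone evaluation of the schedule formulas above."""
    if step < num_warmup_steps:
        # Linear warmup; with ``start_lr=0`` this is the first formula.
        return start_lr + (base_lr - start_lr) * step / num_warmup_steps
    # Polynomial decay toward ``final_lr`` over the remaining steps.
    remaining = (num_steps - step) / (num_steps - num_warmup_steps)
    return final_lr + (base_lr - final_lr) * remaining**power

# With power=1.0 the decay is linear: halfway through the decay phase,
# the learning rate sits halfway between base_lr and final_lr.
assert polynomial_decay_lr(550, 1e-3, 1000, 100) == 5e-4
```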
Note
This scheduler is not chainable.
- Parameters:
optimizer (Optimizer) – The associated optimizer.
num_steps (int) – The total number of steps, including warmup, over which to decay the learning rate.
num_warmup_steps (int) – The number of warmup steps.
power (float) – The exponent of the polynomial used for decay.
start_lr (float | Sequence[float]) – The initial warmup learning rate, either a single value shared by all parameter groups or one value per parameter group.
final_lr (float | Sequence[float]) – The final learning rate, either a single value shared by all parameter groups or one value per parameter group.
last_epoch (int) – The index of the last epoch.
verbose (bool) – If True, prints a message to stdout for each update.
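A minimal usage sketch, assuming a toy model and the standard PyTorch convention of stepping the scheduler once after each optimizer step; the model and hyperparameter values are placeholders:

```python
import torch

from fairseq2.optim.lr_scheduler import PolynomialDecayLR

model = torch.nn.Linear(16, 4)

# The optimizer's ``lr`` is the base learning rate that warmup ramps up to.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

scheduler = PolynomialDecayLR(
    optimizer, num_steps=10_000, num_warmup_steps=500, power=2.0, final_lr=1e-6
)

for step in range(10_000):
    # ... compute the loss and call loss.backward() here ...
    optimizer.step()
    scheduler.step()  # advance the schedule once per optimizer step
    optimizer.zero_grad()
```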
- get_last_lr()
Return the last learning rate computed by the current scheduler.
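For example, continuing the sketch above, one value is returned per parameter group:

```python
scheduler.step()
print(scheduler.get_last_lr()[0])  # rate of the first parameter group
```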
- load_state_dict(state_dict)
Loads the scheduler's state.
- Parameters:
state_dict (dict) – The scheduler state. Should be an object returned from a call to state_dict().
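A sketch of checkpointing the scheduler together with its optimizer so that training can resume mid-schedule; the file name is illustrative:

```python
# Save both states so the resumed run continues from the same step.
torch.save(
    {"optimizer": optimizer.state_dict(), "scheduler": scheduler.state_dict()},
    "checkpoint.pt",
)

# Restore both before resuming training.
ckpt = torch.load("checkpoint.pt")
optimizer.load_state_dict(ckpt["optimizer"])
scheduler.load_state_dict(ckpt["scheduler"])
```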
- print_lr(is_verbose, group, lr, epoch=None)
Display the current learning rate.