PolynomialDecayLR
- final class fairseq2.optim.lr_scheduler.PolynomialDecayLR(optimizer, num_steps, num_warmup_steps, *, power=1.0, start_lr=0.0, final_lr=0.0, last_epoch=-1)
Bases: AbstractLRScheduler
Represents the polynomial decay learning rate schedule.
During warmup:
\[\eta_t = \eta_{base} \frac{t}{T_{warmup}}\]
After warmup:
\[\eta_t = \eta_{final} + (\eta_{base} - \eta_{final}) \left(\frac{T - t}{T - T_{warmup}}\right)^{p}\]
This corresponds to increasing the learning rate linearly for the first \(T_{warmup}\) training steps to the base learning rate, and decreasing it thereafter for \(T - T_{warmup}\) steps to the final learning rate using a polynomial of degree \(p\).
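The two formulas can be sketched as a standalone function, which may help when checking expected values by hand. This is a minimal illustration and not the fairseq2 implementation: the function name `polynomial_decay_lr` is hypothetical, the warmup branch follows the formula exactly as written (so it assumes the default `start_lr` of 0.0), and holding the rate at `final_lr` once `step` reaches `num_steps` is an assumption the formulas do not state.

```python
def polynomial_decay_lr(step, base_lr, num_steps, num_warmup_steps,
                        power=1.0, final_lr=0.0):
    """Evaluate the documented schedule at a single step (illustrative only)."""
    # Warmup: linear ramp from 0 to base_lr, per the first formula.
    if step < num_warmup_steps:
        return base_lr * step / num_warmup_steps

    # Assumption: past the end of the schedule, stay at final_lr.
    if step >= num_steps:
        return final_lr

    # Decay: fraction of the decay phase remaining, raised to `power`.
    remaining = (num_steps - step) / (num_steps - num_warmup_steps)
    return final_lr + (base_lr - final_lr) * remaining ** power


# Example: 100 steps in total, 10 of them warmup, quadratic decay.
for t in (0, 5, 10, 55, 100):
    print(t, polynomial_decay_lr(t, base_lr=1.0, num_steps=100,
                                 num_warmup_steps=10, power=2.0,
                                 final_lr=0.1))
```

With `power=1.0` the decay phase is a straight line from the base rate down to `final_lr`; larger exponents drop the rate faster early in the decay phase and flatten out near the end.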
Note
This scheduler is not chainable.
- Parameters:
optimizer (Optimizer) – The associated optimizer.
num_steps (int) – The total number of steps, including warmup, over which to decay the learning rate.
num_warmup_steps (int) – The number of warmup steps.
power (float) – The exponent of the polynomial used for decay.
start_lr (Union[float, Sequence[float]]) – The initial warmup learning rate of all parameter groups, or of each parameter group respectively.
final_lr (Union[float, Sequence[float]]) – The final learning rate of all parameter groups, or of each parameter group respectively.
last_epoch (int) – The index of the last epoch.
- load_state_dict(state_dict)
Load the scheduler’s state.
- Parameters:
state_dict (dict) – The scheduler state. Should be an object returned from a call to state_dict().
- print_lr(is_verbose, group, lr, epoch=None)
Display the current learning rate.
Deprecated since version 2.4: print_lr() is deprecated. Please use get_last_lr() to access the learning rate.
- state_dict()
Return the state of the scheduler as a dict. It contains an entry for every variable in self.__dict__ which is not the optimizer.
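The save and restore behavior described for state_dict() and load_state_dict() can be illustrated with a toy class. This is a sketch of the pattern only: `ToyScheduler` is a hypothetical stand-in, not the fairseq2 class, and the assumption that loading simply updates the instance attributes is made for illustration.

```python
class ToyScheduler:
    """Hypothetical stand-in showing the state_dict pattern (not fairseq2)."""

    def __init__(self, optimizer, num_steps):
        self.optimizer = optimizer
        self.num_steps = num_steps
        self.last_epoch = 0

    def state_dict(self):
        # Every instance attribute except the optimizer reference.
        return {k: v for k, v in self.__dict__.items() if k != "optimizer"}

    def load_state_dict(self, state_dict):
        # Assumption: restoring means overwriting the saved attributes.
        self.__dict__.update(state_dict)


# Save the state of one scheduler and restore it into a fresh one.
saved = ToyScheduler(optimizer=object(), num_steps=100)
saved.last_epoch = 42
state = saved.state_dict()

restored = ToyScheduler(optimizer=object(), num_steps=100)
restored.load_state_dict(state)
print(state, restored.last_epoch)
```

Because the optimizer is excluded from the dictionary, its state has to be checkpointed separately and the scheduler rebuilt around an optimizer instance before its own state is loaded.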
- step(epoch=None)
Perform a step.