MyleLR

final class fairseq2.optim.lr_scheduler.MyleLR(optimizer, num_warmup_steps, *, start_lr=0.0, last_epoch=-1, verbose=False)

Bases: LRSchedulerBase

Represents a scaled version of NoamLR that preserves the base learning rate of the associated optimizer.

\[\eta_t = \eta_{base} \min(\sqrt{\frac{T_{warmup}}{t}}, \frac{t}{T_{warmup}})\]

Essentially, this is the Noam learning rate schedule scaled by the square root of the number of warmup steps. It was originally proposed and implemented by Myle Ott in fairseq under the name InverseSquareRootLR.
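
As a short worked step, and assuming the usual Noam form \(\eta_{base} \min(t^{-1/2},\, t \, T_{warmup}^{-3/2})\), multiplying the Noam factor by \(\sqrt{T_{warmup}}\) recovers the factor in the formula above:

\[\sqrt{T_{warmup}} \, \min\left(\frac{1}{\sqrt{t}}, \frac{t}{T_{warmup}^{3/2}}\right) = \min\left(\sqrt{\frac{T_{warmup}}{t}}, \frac{t}{T_{warmup}}\right)\]

Both arguments of the minimum equal 1 at \(t = T_{warmup}\), which is why the schedule peaks at exactly the base learning rate.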

It corresponds to increasing the learning rate linearly for the first \(T_{warmup}\) training steps to the base learning rate, and decreasing it thereafter proportionally to the inverse square root of the step number.
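
The shape of the schedule can be checked with a minimal pure-Python sketch of the formula above. The function name `myle_lr` is hypothetical and not part of fairseq2. Returning 0.0 at step 0 is an assumption made here to avoid a division by zero, and the sketch ignores `start_lr`.

```python
import math


def myle_lr(step: int, base_lr: float, num_warmup_steps: int) -> float:
    """Evaluate eta_t = eta_base * min(sqrt(T / t), t / T) for a single step.

    Hypothetical helper for illustration only; not the fairseq2 implementation.
    """
    if num_warmup_steps < 1:
        raise ValueError("num_warmup_steps must be at least 1 in this sketch.")

    if step < 1:
        # Assumption: treat step 0 as the start of a warmup that begins at 0.
        return 0.0

    # Linear ramp while t < T, inverse square root decay once t > T.
    scale = min(math.sqrt(num_warmup_steps / step), step / num_warmup_steps)

    return base_lr * scale


if __name__ == "__main__":
    for t in (1, 2, 4, 16, 64):
        print(t, myle_lr(t, base_lr=1.0, num_warmup_steps=4))
```

With 4 warmup steps, the rate reaches the base value at step 4 and falls back to half of it at step 16, since quadrupling the step count halves the inverse square root factor.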

Note

This scheduler is not chainable.

Parameters:

optimizer (Optimizer) – The associated optimizer.
num_warmup_steps (int) – The number of warmup steps.
start_lr (float | Sequence[float]) – The initial warmup learning rate of all parameter groups, or of each parameter group respectively.
last_epoch (int) – The index of the last epoch.
verbose (bool) – If True, prints a message to stdout for each update.

get_last_lr(): Return last computed learning rate by current scheduler.

load_state_dict(state_dict)

Loads the scheduler's state.

Args:

state_dict (dict): The scheduler state. Should be an object returned from a call to state_dict().

print_lr(is_verbose, group, lr, epoch=None): Displays the current learning rate.

state_dict()

Returns the state of the scheduler as a dict.

It contains an entry for every variable in self.__dict__ which is not the optimizer.