CosineAnnealingLR
- final class fairseq2.optim.lr_scheduler.CosineAnnealingLR(optimizer, cycle_len, num_warmup_steps, *, cycle_mul=1.0, lr_mul=1.0, start_lr=0.0, final_lr=0.0, last_epoch=-1, verbose=False)
Bases: LRSchedulerBase
Represents the learning rate schedule described in Loshchilov and Hutter [LH17].
During warmup:
\[\eta_t = \eta_{base} \frac{t}{T_{warmup}}\]
After warmup:
\[\eta_t = \eta_{final}^i + \frac{1}{2} (\eta_{base}^i - \eta_{final}^i) (1 + \cos(\pi \frac{t_{i}}{T_{i}}))\]
where \(i\) is the number of the current annealing cycle, \(t_i\) is the number of steps taken since the last restart, and \(T_i\) is the total number of steps within the \(i\)-th cycle (i.e. length of the cycle).
Cosine annealing is a learning rate schedule that starts each cycle with a large learning rate, decreases it relatively rapidly to a minimum value, and then rapidly increases it again at the next restart.
Please refer to the paper to learn more about the details.
In addition to the original schedule, this implementation also supports a warmup phase where the learning rate is linearly increased for the first \(T_{warmup}\) training steps to the base learning rate.
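The two formulas above can be evaluated directly in plain Python. The following is a minimal sketch, not the fairseq2 implementation: the helper name `cosine_annealing_lr` is hypothetical, and it assumes that the cycle length grows geometrically by `cycle_mul`, that both the base and final learning rates are scaled by `lr_mul` once per completed cycle, that `cycle_mul >= 1`, and that warmup starts from zero as in the warmup formula.

```python
import math


def cosine_annealing_lr(
    step: int,
    base_lr: float,
    final_lr: float,
    cycle_len: int,
    num_warmup_steps: int,
    cycle_mul: float = 1.0,
    lr_mul: float = 1.0,
) -> float:
    """Evaluate the schedule at ``step`` (hypothetical helper, not fairseq2 code)."""
    # Warmup: linear ramp from 0 to base_lr, as in the first formula.
    if step < num_warmup_steps:
        return base_lr * step / num_warmup_steps

    # Locate the current cycle i and the offset t_i into that cycle.
    t_i = float(step - num_warmup_steps)
    T_i = float(cycle_len)
    i = 0
    while t_i >= T_i:
        t_i -= T_i
        T_i *= cycle_mul  # assumption: cycle length grows geometrically
        i += 1

    # Assumption: base and final rates are both scaled by lr_mul per cycle.
    base_i = base_lr * lr_mul**i
    final_i = final_lr * lr_mul**i

    # Cosine annealing within the current cycle, as in the second formula.
    return final_i + 0.5 * (base_i - final_i) * (1.0 + math.cos(math.pi * t_i / T_i))
```

Under these assumptions, the value at the first step after warmup equals the base learning rate, and the value at each restart equals the base learning rate scaled by the accumulated `lr_mul` factor.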
Note
This scheduler is not chainable.
- Parameters:
optimizer (Optimizer) – The associated optimizer.
cycle_len (int) – The number of steps within the first cycle.
num_warmup_steps (int) – The number of warmup steps.
cycle_mul (float) – The factor to grow the length of each cycle.
lr_mul (float) – The factor to scale the base and final learning rate at the end of each cycle.
start_lr (float | Sequence[float]) – The initial warmup learning rate of all parameter groups, or of each parameter group respectively.
final_lr (float | Sequence[float]) – The final learning rate of all parameter groups, or of each parameter group respectively, at the end of the first cycle.
last_epoch (int) – The index of the last epoch.
verbose (bool) – If True, prints a message to stdout for each update.
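To see how `cycle_len`, `cycle_mul`, and `lr_mul` interact, it can help to tabulate the cycles they produce. The sketch below is illustrative only: `cycle_table` is a hypothetical helper, and it assumes each cycle is `cycle_mul` times longer than the previous one and that the base and final learning rates are multiplied by `lr_mul` at the end of each cycle, which is one reading of the parameter descriptions above.

```python
def cycle_table(
    cycle_len: int,
    num_warmup_steps: int,
    cycle_mul: float,
    lr_mul: float,
    base_lr: float,
    final_lr: float,
    num_cycles: int,
) -> list[tuple[int, float, float, float, float]]:
    """Return (cycle index, start step, length, base lr, final lr) per cycle.

    Hypothetical helper for illustration; not part of fairseq2.
    """
    rows = []
    start = float(num_warmup_steps)  # the first cycle begins right after warmup
    length = float(cycle_len)
    for i in range(num_cycles):
        rows.append((i, start, length, base_lr * lr_mul**i, final_lr * lr_mul**i))
        start += length
        length *= cycle_mul  # assumption: geometric growth of the cycle length
    return rows


rows = cycle_table(
    cycle_len=100,
    num_warmup_steps=10,
    cycle_mul=2.0,
    lr_mul=0.5,
    base_lr=1e-3,
    final_lr=1e-5,
    num_cycles=3,
)
for row in rows:
    print(row)
```

With these example values, the cycles start at steps 10, 110, and 310 and last 100, 200, and 400 steps, while the base learning rate halves from one cycle to the next.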
- get_last_lr()
Returns the last learning rate computed by the current scheduler.
- load_state_dict(state_dict)
Loads the scheduler's state.
- Args:
- state_dict (dict): The scheduler state. Should be an object returned from a call to state_dict().
- print_lr(is_verbose, group, lr, epoch=None)
Display the current learning rate.