qhoptim: Quasi-hyperbolic optimization¶
The qhoptim library provides PyTorch and TensorFlow implementations of the quasi-hyperbolic momentum (QHM) and quasi-hyperbolic Adam (QHAdam) optimization algorithms from Facebook AI Research.
Quickstart¶
Use this one-liner for installation:
$ pip install qhoptim
Then, you can instantiate the optimizers in PyTorch:
>>> from qhoptim.pyt import QHM, QHAdam
# something like this for QHM
>>> optimizer = QHM(model.parameters(), lr=1.0, nu=0.7, momentum=0.999)
# or something like this for QHAdam
>>> optimizer = QHAdam(
... model.parameters(), lr=1e-3, nus=(0.7, 1.0), betas=(0.995, 0.999))
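For orientation, here is a minimal end-to-end sketch of how one of these optimizers drops into an ordinary PyTorch training loop. The tiny linear model, random data, and loss function are illustrative placeholders, not part of qhoptim; only the QHAdam call itself comes from the library:
>>> import torch
>>> from qhoptim.pyt import QHAdam
>>> # Toy setup for illustration only: a tiny linear model on random data.
>>> model = torch.nn.Linear(10, 1)
>>> inputs, targets = torch.randn(32, 10), torch.randn(32, 1)
>>> loss_fn = torch.nn.MSELoss()
>>> optimizer = QHAdam(
...     model.parameters(), lr=1e-3, nus=(0.7, 1.0), betas=(0.995, 0.999))
>>> for _ in range(10):
...     optimizer.zero_grad()                    # clear stale gradients
...     loss = loss_fn(model(inputs), targets)   # forward pass
...     loss.backward()                          # compute gradients
...     optimizer.step()                         # apply the QHAdam update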
Or in TensorFlow:
>>> from qhoptim.tf import QHMOptimizer, QHAdamOptimizer
# something like this for QHM
>>> optimizer = QHMOptimizer(
... learning_rate=1.0, nu=0.7, momentum=0.999)
# or something like this for QHAdam
>>> optimizer = QHAdamOptimizer(
... learning_rate=1e-3, nu1=0.7, nu2=1.0, beta1=0.995, beta2=0.999)
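Assuming the TensorFlow optimizers implement the standard TensorFlow 1.x tf.train.Optimizer interface, a graph-mode usage sketch might look like the following. The scalar toy objective and the small learning rate are illustrative placeholders, not recommendations:
>>> import tensorflow as tf
>>> from qhoptim.tf import QHMOptimizer
>>> # Toy scalar objective for illustration: minimize (w - 3)^2.
>>> w = tf.Variable(0.0)
>>> loss = tf.square(w - 3.0)
>>> optimizer = QHMOptimizer(learning_rate=0.01, nu=0.7, momentum=0.999)
>>> train_op = optimizer.minimize(loss)          # standard tf.train.Optimizer call
>>> with tf.Session() as sess:
...     sess.run(tf.global_variables_initializer())
...     for _ in range(100):                     # take some optimization steps
...         sess.run(train_op)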
Please refer to the links on the menubar for detailed installation instructions and API references.
Choosing QHM parameters¶
For those who use momentum or Nesterov’s accelerated gradient with momentum constant \(\beta = 0.9\), we recommend trying out QHM with \(\nu = 0.7\) and momentum constant \(\beta = 0.999\). You’ll need to normalize the learning rate by dividing by \(1 - \beta_{old}\).
Similarly, for those who use Adam with \(\beta_1 = 0.9\), we recommend trying out QHAdam with \(\nu_1 = 0.7\), \(\beta_1 = 0.995\), \(\nu_2 = 1\), and all other parameters unchanged.
Below is a handy widget to help convert from SGD with (Nesterov) momentum to QHM:
QHM Hyperparameter Advisor (interactive widget, available in the online documentation): given your current SGD/NAG learning rate and momentum, it outputs the corresponding QHM learning rate (alpha), immediate discount (nu), and momentum (beta).
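The same conversion can also be done by hand. The helper below is a hypothetical illustration of the rule above; it is not part of the qhoptim API:
>>> def qhm_params_from_momentum(old_lr, old_beta):
...     # Hypothetical helper, not part of qhoptim: apply the conversion rule above,
...     # dividing the old learning rate by (1 - old_beta) and using nu=0.7, beta=0.999.
...     return {"lr": old_lr / (1.0 - old_beta), "nu": 0.7, "momentum": 0.999}
>>> # SGD with lr=0.1 and momentum=0.9 maps to roughly lr=1.0, nu=0.7,
>>> # momentum=0.999, matching the QHM example in the quickstart above.
>>> params = qhm_params_from_momentum(old_lr=0.1, old_beta=0.9)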
Reference¶
QHM and QHAdam were proposed in the ICLR 2019 paper “Quasi-hyperbolic momentum and Adam for deep learning”. We recommend reading the paper for both theoretical insights into and empirical analyses of the algorithms.
If you find the algorithms useful in your research, we ask that you cite the paper as follows:
@inproceedings{ma2019qh,
  title={Quasi-hyperbolic momentum and Adam for deep learning},
  author={Jerry Ma and Denis Yarats},
  booktitle={International Conference on Learning Representations},
  year={2019}
}