How to perform optimization

By default, all optimizers assume a centered and reduced prior at the beginning of the optimization (i.e. zero mean and unit standard deviation). They are, however, able to find solutions far from this initial prior.

Basic example

Minimizing a function using an optimizer (here OnePlusOne) can be easily run with:

import nevergrad as ng

def square(x, y=12):
    return sum((x - .5)**2) + abs(y)

# optimization on x as an array of shape (2,)
optimizer = ng.optimizers.OnePlusOne(parametrization=2, budget=100)
recommendation = optimizer.minimize(square)  # best value
print(recommendation.value)
# >>> [0.49971112 0.5002944 ]

parametrization=n is a shortcut to state that the function has only one variable, continuous, of dimension n. See the Parametrization section for more complex parametrizations.

Defining the following parametrization instead will optimize on both x (continuous, dimension 2) and y (continuous, dimension 1):

instrum = ng.p.Instrumentation(ng.p.Array(shape=(2,)), y=ng.p.Scalar())
optimizer = ng.optimizers.OnePlusOne(parametrization=instrum, budget=100)
recommendation = optimizer.minimize(square)
print(recommendation.value)
# >>> ((array([0.52213095, 0.45030925]),), {'y': -0.0003603100877068604})

We can work in the discrete case as well, e.g. with the one-max function applied on {0,1,2,3,4,5,6}^10:

import nevergrad as ng

def onemax(*x):
    return len(x) - x.count(1)

# Discrete, ordered
variables = list(ng.p.TransitionChoice(list(range(7))) for _ in range(10))
instrum = ng.p.Instrumentation(*variables)
optimizer = ng.optimizers.DiscreteOnePlusOne(parametrization=instrum, budget=100, num_workers=1)

for _ in range(optimizer.budget):
    x = optimizer.ask()
    loss = onemax(*x.args, **x.kwargs)
    optimizer.tell(x, loss)

recommendation = optimizer.provide_recommendation()
print(recommendation.value)
# >>> ((1, 1, 0, 1, 1, 4, 1, 1, 1, 1), {})
print(recommendation.args)
# >>> (1, 1, 0, 1, 1, 4, 1, 1, 1, 1)

Using several workers

Running the function evaluation in parallel with several workers is as easy as providing an executor:

from concurrent import futures
optimizer = ng.optimizers.OnePlusOne(parametrization=instrum, budget=10, num_workers=2)

with futures.ThreadPoolExecutor(max_workers=optimizer.num_workers) as executor:
    recommendation = optimizer.minimize(square, executor=executor, batch_mode=False)

With batch_mode=True, minimize asks the optimizer for num_workers points to evaluate, runs the evaluations, then updates the optimizer with the num_workers function outputs, and repeats until the budget is spent. batch_mode=False (steady-state mode) asks for a new evaluation whenever a worker is ready. If no executor is provided, the evaluations are sequential: num_workers > 1 without an executor is therefore suboptimal but nonetheless useful for evaluation purposes (i.e. it simulates parallelism without any actual parallelism).

Ask and tell interface

An ask and tell interface is also available. The 3 key methods for this interface are respectively:

  • ask: suggests a candidate on which to evaluate the function to optimize.

  • tell: updates the optimizer with the value of the function for a candidate.

  • provide_recommendation: returns the candidate the algorithm considers the best.

For most optimization algorithms in the platform, these methods can be called in arbitrary order, so asynchronous optimization is OK. Some algorithms (with class attribute no_parallelization=True), however, do not support this.

The Parameter class holds an attribute value which contains the actual value to evaluate through the function.

Here is a simple example in the sequential case (this is what happens in the minimize method for num_workers=1):

import nevergrad as ng

def square(x, y=12):
    return sum((x - .5)**2) + abs(y)

instrum = ng.p.Instrumentation(ng.p.Array(shape=(2,)), y=ng.p.Scalar())  # We are working on R^2 x R.
optimizer = ng.optimizers.OnePlusOne(parametrization=instrum, budget=100, num_workers=1)

for _ in range(optimizer.budget):
    x = optimizer.ask()
    loss = square(*x.args, **x.kwargs)
    optimizer.tell(x, loss)

recommendation = optimizer.provide_recommendation()
print(recommendation.value)

Please make sure that your function returns a float, and that you indeed want to perform minimization and not maximization ;)

Choosing an optimizer

ng.optimizers.registry is a dict of all optimizers, so ng.optimizers.OnePlusOne is equivalent to ng.optimizers.registry["OnePlusOne"]. You can print the full list of optimizers with:

import nevergrad as ng
print(sorted(ng.optimizers.registry.keys()))

All algorithms have strengths and weaknesses. Questionable rules of thumb could be:

  • NGOpt is a “meta”-optimizer which adapts to the provided settings (budget, number of workers, parametrization) and should therefore be a good default.

  • TwoPointsDE is excellent in many cases, including very high num_workers.

  • PortfolioDiscreteOnePlusOne is excellent in discrete or mixed settings when high precision on parameters is not relevant; it’s possibly a good choice for hyperparameter tuning.

  • OnePlusOne is a simple robust method for continuous parameters with num_workers < 8.

  • CMA is excellent for control (e.g. neurocontrol) when the environment is not very noisy (num_workers ~50 ok) and when the budget is large (e.g. 1000 x the dimension).

  • TBPSA is excellent for problems corrupted by noise, in particular overparameterized (neural) ones (very high num_workers ok).

  • PSO is excellent in terms of robustness, high num_workers ok.

  • ScrHammersleySearchPlusMiddlePoint is excellent for super parallel cases (fully one-shot, i.e. num_workers = budget included) or for very multimodal cases (such as some of our MLDA problems); don’t use softmax with this optimizer.

  • RandomSearch is the classical random search baseline; don’t use softmax with this optimizer.

Telling non-asked points, or suggesting points

There are two ways to inoculate information you already have about some points:

  • optimizer.suggest(*args, **kwargs): after suggesting a point, the next ask will return a point with the provided value.

  • candidate = optimizer.parametrization.spawn_child(new_value=your_value) which you can then use to tell the optimizer with the corresponding loss.

Examples:

  • parametrized with an Array:

optim = ng.optimizers.OnePlusOne(parametrization=2, budget=100)
optim.suggest([12, 12])
candidate = optim.ask()
# equivalent to:
candidate = optim.parametrization.spawn_child(new_value=[12, 12])
# you can then use it to tell the loss
optim.tell(candidate, 2.0)

  • parametrized with an ng.p.Instrumentation:

param = ng.p.Instrumentation(ng.p.Choice(["a", "b", "c"]), lr=ng.p.Log(lower=0.001, upper=1.0))
optim = ng.optimizers.OnePlusOne(parametrization=param, budget=100)
optim.suggest("c", lr=0.02)
candidate = optim.ask()
# equivalent to:
candidate = optim.parametrization.spawn_child(new_value=(("c",), {"lr": 0.02}))
# you can then use it to tell the loss
optim.tell(candidate, 2.0)

Note: some optimizers do not support such inoculation. Those will raise a TellNotAskedNotSupportedError.

Adding callbacks

You can add callbacks to the ask and tell methods through the register_callback method. Callbacks registered on ask must have signature callback(optimizer), and callbacks registered on tell must have signature callback(optimizer, candidate, value).

The example below shows a callback which prints candidate and value on tell:

import nevergrad as ng

def my_function(x):
    return abs(sum(x - 1))

def print_candidate_and_value(optimizer, candidate, value):
    print(candidate, value)

optimizer = ng.optimizers.OnePlusOne(parametrization=2, budget=4)
optimizer.register_callback("tell", print_candidate_and_value)
optimizer.minimize(my_function)  # triggers a print at each tell within minimize

Two callbacks are available through ng.callbacks, see the callbacks module documentation.

Optimization with constraints

Nevergrad has a mechanism for cheap constraints. “Cheap” means that we do not try to reduce the number of calls to such constraints. We basically repeat mutations until we get a satisfiable point. Let us say that we want to minimize (x[0]-.5)**2 + (x[1]-.5)**2 under the constraint x[0] >= 1.

import nevergrad as ng

def square(x):
    return sum((x - .5)**2)

optimizer = ng.optimizers.OnePlusOne(parametrization=2, budget=100)
# define a constraint on first variable of x:
optimizer.parametrization.register_cheap_constraint(lambda x: x[0] >= 1)

recommendation = optimizer.minimize(square)
print(recommendation.value)
# >>> [1.00037625, 0.50683314]
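The "repeat mutations until the constraint is satisfied" idea can be sketched in plain Python as follows. This is a conceptual illustration only, not nevergrad's actual implementation: the function name sample_satisfying, the Gaussian mutation and the max_trials cap are all assumptions made for the example.

```python
import random

def sample_satisfying(parent, is_feasible, sigma=1.0, max_trials=1000, rng=None):
    """Mutate `parent` with Gaussian noise until the cheap constraint holds."""
    rng = rng or random.Random()
    for _ in range(max_trials):
        child = [value + sigma * rng.gauss(0.0, 1.0) for value in parent]
        if is_feasible(child):  # the constraint is called once per trial
            return child
    raise RuntimeError("no feasible point found within max_trials")

rng = random.Random(0)
point = sample_satisfying([0.0, 0.0], lambda x: x[0] >= 1, rng=rng)
print(point)
```

The constraint may be called many times per candidate, which is why this mechanism only suits constraints that are cheap to evaluate.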

Optimizing machine learning hyperparameters

When optimizing hyperparameters, e.g. in machine learning, if you don’t know which variables (see Parametrization) or which optimizer to use:

  • use Choice for discrete variables

  • use TwoPointsDE with num_workers equal to the number of workers available to you. See the machine learning examples for more.

Or if you want something more aimed at robustly outperforming random search in highly parallel settings (one-shot):

  • use TransitionChoice for discrete variables, taking care that the default value is in the middle.

  • use ScrHammersleySearchPlusMiddlePoint (PlusMiddlePoint only if you have continuous parameters or good default values for discrete parameters).

Example of chaining, or inoculation, or initialization of an evolutionary algorithm

Chaining consists in running several algorithms in turn, information being forwarded from the first to the second and so on. More precisely, the budget is distributed over several algorithms, and when an objective function value is computed, all algorithms are informed.

Here is how to create such optimizers:

import nevergrad as ng

# short aliases for the optimizers used below
Chaining, LHSSearch = ng.optimizers.Chaining, ng.optimizers.LHSSearch
DE, CMA = ng.optimizers.DE, ng.optimizers.CMA

# Running LHSSearch with budget num_workers and then DE:
DEwithLHS = Chaining([LHSSearch, DE], ["num_workers"])

# Running LHSSearch with budget the dimension and then DE:
DEwithLHSdim = Chaining([LHSSearch, DE], ["dimension"])

# Running LHSSearch with budget 30 and then DE:
DEwithLHS30 = Chaining([LHSSearch, DE], [30])

# Running LHS for 100 iterations, then DE for 60, then CMA:
LHSthenDEthenCMA = Chaining([LHSSearch, DE, CMA], [100, 60])

We can then minimize as usual:

import nevergrad as ng

def square(x):
    return sum((x - .5)**2)

optimizer = DEwithLHS30(parametrization=2, budget=300)
recommendation = optimizer.minimize(square)
print(recommendation.value)
# >>> [0.50843113, 0.5104554]
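The way the budget is distributed over the chained algorithms can be sketched in plain Python. This is an illustration under the assumption that each listed budget goes to the corresponding optimizer in order and that the last optimizer receives whatever remains; split_budget is a hypothetical helper, not a nevergrad function, and it only handles the "num_workers" and "dimension" keywords shown in this section.

```python
def split_budget(total_budget, budgets, num_workers=1, dimension=1):
    """Return the number of evaluations assigned to each chained optimizer."""
    keywords = {"num_workers": num_workers, "dimension": dimension}
    remaining = total_budget
    shares = []
    for budget in budgets:
        value = keywords[budget] if isinstance(budget, str) else int(budget)
        share = min(value, remaining)  # never hand out more than what is left
        shares.append(share)
        remaining -= share
    shares.append(remaining)  # the last optimizer gets the rest
    return shares

print(split_budget(300, [100, 60]))                   # → [100, 60, 140]
print(split_budget(300, ["dimension"], dimension=2))  # → [2, 298]
```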

Multiobjective minimization with Nevergrad

Multiobjective minimization is a work in progress in nevergrad. It is:

  • not stable: the API may be updated at any time, hopefully to make it simpler and more intuitive.

  • not robust: there are probably corner cases we have not investigated yet.

  • not scalable: it is not yet clear how the current version will work with a large number of losses or a large budget. For now the features have been implemented without time complexity considerations.

  • not optimal: this currently transforms multiobjective functions into monoobjective functions, hence losing some structure and making the function dynamic, which some optimizers are not designed to work on.

In other words, use it at your own risk ;) and provide feedback (both positive and negative) if you have any!

The initial API that was added into nevergrad to work with multiobjective functions uses a function wrapper to convert them into monoobjective functions. Let us minimize f1 and f2 (two objective functions) assuming that values above 2.5 are of no interest:

import nevergrad as ng
from nevergrad.functions import MultiobjectiveFunction
import numpy as np

f = MultiobjectiveFunction(multiobjective_function=lambda x: [np.sum(x**2), np.sum((x - 1)**2)], upper_bounds=[2.5, 2.5])
print(f(np.array([1.0, 2.0])))

optimizer = ng.optimizers.CMA(parametrization=3, budget=100)  # 3 is the dimension, 100 is the budget.
recommendation = optimizer.minimize(f)

# The function embeds its Pareto-front:
print("My Pareto front:", [x[0][0] for x in f.pareto_front()])

# It can also provide a subset:
print("My Pareto front:", [x[0][0] for x in f.pareto_front(2, subset="random")])
print("My Pareto front:", [x[0][0] for x in f.pareto_front(2, subset="loss-covering")])
print("My Pareto front:", [x[0][0] for x in f.pareto_front(2, subset="domain-covering")])

We are currently working on a new experimental API allowing users to directly tell the results as an array or list of floats. When this API is stabilized and proven to work, it will probably replace the older one. Here is an example of how to use it:

import nevergrad as ng
import numpy as np

def multiobjective(x):
    return [np.sum(x**2), np.sum((x - 1)**2)]

print("Example: ", multiobjective(np.array([1.0, 2.0, 0])))
# >>> Example: [5.0, 2.0]

optimizer = ng.optimizers.CMA(parametrization=3, budget=100)

# it's not strictly necessary but highly advised to provide an
# upper bound reference for the losses (if not provided, such upper
# bound is automatically inferred with the first few "tell")
optimizer.tell(ng.p.MultiobjectiveReference(), [5, 5])
# note that you can provide a Parameter to MultiobjectiveReference,
# which will be passed to the optimizer

optimizer.minimize(multiobjective, verbosity=2)

# The optimizer embeds its Pareto front:
print("Pareto front:")
for param in sorted(optimizer.pareto_front(), key=lambda p: p.losses[0]):
    print(f"{param} with losses {param.losses}")

# >>> Array{(3,)}:[0. 0. 0.] with loss [0. 3.]
#     Array{(3,)}:[0.39480968 0.98105712 0.55785803] with loss [1.42955333 0.56210368]
#     Array{(3,)}:[1.09901515 0.97673712 0.97153943] with loss [3.10573857 0.01115516]

# It can also provide subsets:
print("Random subset:", optimizer.pareto_front(2, subset="random"))
print("Loss-covering subset:", optimizer.pareto_front(2, subset="loss-covering"))
print("Domain-covering subset:", optimizer.pareto_front(2, subset="domain-covering"))
print("EPS subset:", optimizer.pareto_front(2, subset="EPS"))

Note that DE and its variants have been updated to make use of the multi-objective losses [#789](https://github.com/facebookresearch/nevergrad/pull/789). This is a preliminary fix since the initial DE implementation was ill-suited for this use case.
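As a reminder of what a Pareto front contains, here is a plain-Python sketch of the dominance test for minimization. It is a conceptual illustration with made-up loss vectors, not the code nevergrad uses internally.

```python
def dominates(a, b):
    """True if loss vector `a` is no worse than `b` everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(losses):
    """Keep only the loss vectors that no other vector dominates."""
    return [a for a in losses if not any(dominates(b, a) for b in losses)]

losses = [(0.0, 3.0), (1.4, 0.6), (3.1, 0.01), (2.0, 2.0), (4.0, 4.0)]
print(pareto_front(losses))
# → [(0.0, 3.0), (1.4, 0.6), (3.1, 0.01)]
```

Here (2.0, 2.0) and (4.0, 4.0) are dropped because (1.4, 0.6) is better on both objectives.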

Reproducibility

Each parametrization has its own random_state for generating random numbers. All optimizers pull from it when they require stochastic behaviors. For reproducibility, this random state can be seeded in two ways:

  • by setting numpy’s global random state seed (np.random.seed(32)) before the parametrization’s first use. Indeed, when first used, the parametrization’s random state is seeded with a seed drawn from the global random state.

  • by manually seeding the parametrization’s random state (e.g. parametrization.random_state.seed(12) or optimizer.parametrization.random_state = np.random.RandomState(12)).