How to perform optimization

By default, all optimizers assume a centered and reduced prior at the beginning of the optimization (i.e. zero mean and unit standard deviation). They are however able to find solutions far from this initial prior.

Basic example

Minimizing a function using an optimizer (here NGOpt, our adaptive optimization algorithm) takes only a few lines:

import nevergrad as ng

def square(x, y=12):
    return sum((x - 0.5) ** 2) + abs(y)

# optimization on x as an array of shape (2,)
optimizer = ng.optimizers.NGOpt(parametrization=2, budget=100)
recommendation = optimizer.minimize(square)  # best value
print(recommendation.value)
# >>> [0.49971112 0.5002944 ]

parametrization=n is a shortcut to state that the function has only one variable, continuous, of dimension n: ng.p.Array(shape=(n,)).

Important: Make sure to check the Parametrization section for more complex parametrization examples, and the Parametrization API section for the full list of options. Below are a few more advanced cases.

Defining the parametrization (instrum) as follows in the code sample will instead optimize on both x (continuous, dimension 2, bounded between -12 and 12) and y (continuous, dimension 1).

instrum = ng.p.Instrumentation(
    ng.p.Array(shape=(2,)).set_bounds(lower=-12, upper=12),
    y=ng.p.Scalar()
)
optimizer = ng.optimizers.NGOpt(parametrization=instrum, budget=100)
recommendation = optimizer.minimize(square)
print(recommendation.value)
# >>> ((array([0.52213095, 0.45030925]),), {'y': -0.0003603100877068604})

We can work in the discrete case as well, e.g. with the one-max function applied on {0,1,2,3,4,5,6}^10:

import nevergrad as ng

def onemax(x):
    return len(x) - x.count(1)

# Discrete, ordered
param = ng.p.TransitionChoice(range(7), repetitions=10)
optimizer = ng.optimizers.DiscreteOnePlusOne(parametrization=param, budget=100, num_workers=1)

for _ in range(optimizer.budget):
    x = optimizer.ask()
    # loss = onemax(*x.args, **x.kwargs)  # equivalent to x.value if not using Instrumentation
    loss = onemax(x.value)
    optimizer.tell(x, loss)

recommendation = optimizer.provide_recommendation()
print(recommendation.value)
# >>> (1, 1, 0, 1, 1, 4, 1, 1, 1, 1)

Using several workers

Running the function evaluation in parallel with several workers is as easy as providing an executor:

from concurrent import futures

optimizer = ng.optimizers.NGOpt(parametrization=instrum, budget=10, num_workers=2)

with futures.ThreadPoolExecutor(max_workers=optimizer.num_workers) as executor:
    recommendation = optimizer.minimize(square, executor=executor, batch_mode=False)

With batch_mode=True, minimize asks the optimizer for num_workers points to evaluate, runs the evaluations, then updates the optimizer with the num_workers function outputs, and repeats until the budget is spent. batch_mode=False (steady-state mode) instead asks for a new evaluation whenever a worker is ready. If no executor is provided, the evaluations are sequential: num_workers > 1 without an executor is therefore suboptimal, but it remains useful for evaluation purposes (parallelism is simulated, there is no actual parallelism).
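
The batch-mode loop described above can be sketched without any library. This is only an illustration under stated assumptions: ToyRandomSearch and minimize_in_batches are hypothetical names that mimic the ask/tell pattern, and this is not how Nevergrad implements minimize.

```python
import random
from concurrent import futures


class ToyRandomSearch:
    """Hypothetical stand-in for an optimizer exposing ask/tell (not Nevergrad code)."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.best = None  # (loss, point) of the best evaluation told so far

    def ask(self):
        # Sample from a centered, unit-variance Gaussian prior.
        return self._rng.gauss(0.0, 1.0)

    def tell(self, point, loss):
        if self.best is None or loss < self.best[0]:
            self.best = (loss, point)


def minimize_in_batches(optimizer, function, budget, num_workers, executor):
    """Ask num_workers points, evaluate them all, tell all results, repeat."""
    spent = 0
    while spent < budget:
        batch = [optimizer.ask() for _ in range(min(num_workers, budget - spent))]
        losses = list(executor.map(function, batch))  # waits for the whole batch
        for point, loss in zip(batch, losses):
            optimizer.tell(point, loss)
        spent += len(batch)
    return optimizer.best


def shifted_square(x):
    return (x - 0.5) ** 2


with futures.ThreadPoolExecutor(max_workers=2) as pool:
    best_loss, best_point = minimize_in_batches(
        ToyRandomSearch(seed=0), shifted_square, budget=10, num_workers=2, executor=pool
    )
print(best_loss, best_point)
```

A steady-state variant would instead submit a new point as soon as any single evaluation finishes, rather than waiting for the whole batch.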

Ask and tell interface

An ask and tell interface is also available. The 3 key methods for this interface are:

  • ask: suggests a candidate on which to evaluate the function to optimize.

  • tell: updates the optimizer with the value of the function for a candidate.

  • provide_recommendation: returns the candidate the algorithm considers the best.

For most optimization algorithms in the platform, these methods can be called in arbitrary order: asynchronous optimization is OK. Some algorithms (those with class attribute no_parallelization=True), however, do not support this.

The Parameter class holds an attribute value which contains the actual value to evaluate through the function.

Here is a simple example in the sequential case (this is what happens in the minimize method for num_workers=1):

import nevergrad as ng

def square(x, y=12):
    return sum((x - 0.5) ** 2) + abs(y)

instrum = ng.p.Instrumentation(ng.p.Array(shape=(2,)), y=ng.p.Scalar())  # We are working on R^2 x R.
optimizer = ng.optimizers.NGOpt(parametrization=instrum, budget=100, num_workers=1)

for _ in range(optimizer.budget):
    x = optimizer.ask()
    loss = square(*x.args, **x.kwargs)
    optimizer.tell(x, loss)

recommendation = optimizer.provide_recommendation()
print(recommendation.value)

Please make sure that your function returns a float, and that you indeed want to perform minimization and not maximization ;)
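
If the quantity of interest is a score to maximize, one common workaround is to minimize its negation. The sketch below assumes a hypothetical helper as_loss and a made-up accuracy function; neither is part of Nevergrad.

```python
def as_loss(score_function):
    """Wrap a score to maximize into a float loss to minimize (hypothetical helper)."""

    def loss(*args, **kwargs):
        # Negate the score and force a plain Python float.
        return -float(score_function(*args, **kwargs))

    return loss


def accuracy(threshold):
    # Made-up score, highest (1.0) at threshold == 0.3.
    return 1.0 - abs(threshold - 0.3)


loss = as_loss(accuracy)
print(loss(0.3))  # -1.0, the smallest loss, reached where the score is largest
```

The wrapped function can then be passed wherever a loss to minimize is expected.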

Choosing an optimizer

ng.optimizers.registry is a dict of all optimizers, so ng.optimizers.NGOpt is equivalent to ng.optimizers.registry["NGOpt"]. You can also print the full list of optimizers with:

import nevergrad as ng

print(sorted(ng.optimizers.registry.keys()))

All algorithms have strengths and weaknesses. Questionable rules of thumb could be:

  • NGOpt is a “meta”-optimizer which adapts to the provided settings (budget, number of workers, parametrization) and should therefore be a good default.

  • TwoPointsDE is excellent in many cases, including very high num_workers.

  • PortfolioDiscreteOnePlusOne is excellent in discrete or mixed settings when high precision on parameters is not relevant; it’s possibly a good choice for hyperparameter tuning.

  • OnePlusOne is a simple robust method for continuous parameters with num_workers < 8.

  • CMA is excellent for control (e.g. neurocontrol) when the environment is not very noisy (num_workers ~50 ok) and when the budget is large (e.g. 1000 x the dimension).

  • TBPSA is excellent for problems corrupted by noise, in particular overparameterized (neural) ones (very high num_workers ok).

  • PSO is excellent in terms of robustness, high num_workers ok.

  • ScrHammersleySearchPlusMiddlePoint is excellent for super parallel cases (fully one-shot, i.e. num_workers = budget included) or for very multimodal cases (such as some of our MLDA problems); don’t use softmax with this optimizer.

  • RandomSearch is the classical random search baseline; don’t use softmax with this optimizer.

Telling non-asked points, or suggesting points

There are two ways to inoculate information you already have about some points:

  • optimizer.suggest(*args, **kwargs): after suggesting a point, the next ask will be a point with the provided inputs. Make sure you call optimizer.suggest the same way (= with the same arguments) that you would call your function to optimize.

  • candidate = optimizer.parametrization.spawn_child(new_value=your_value) which you can then use to tell the optimizer with the corresponding loss.

Examples:

  • parametrized with an ng.p.Instrumentation:

param = ng.p.Instrumentation(ng.p.Choice(["a", "b", "c"]), lr=ng.p.Log(lower=0.001, upper=1.0))
optim = ng.optimizers.NGOpt(parametrization=param, budget=100)
optim.suggest("c", lr=0.02)
candidate = optim.ask()
# equivalent to:
candidate = optim.parametrization.spawn_child(new_value=(("c",), {"lr": 0.02}))
# you can then use it to tell the loss
optim.tell(candidate, 2.0)

  • parametrized with an Array:

optim = ng.optimizers.NGOpt(parametrization=2, budget=100)
optim.suggest([12, 12])
candidate = optim.ask()
# equivalent to:
candidate = optim.parametrization.spawn_child(new_value=[12, 12])
# you can then use it to tell the loss
optim.tell(candidate, 2.0)

Note: some optimizers do not support such inoculation. Those will raise a TellNotAskedNotSupportedError.
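
The "next ask returns the suggested point" behavior can be pictured with a small queue. The sketch below is a library-free illustration; ToySuggestingOptimizer is a hypothetical class and does not reflect Nevergrad's internals.

```python
import random
from collections import deque


class ToySuggestingOptimizer:
    """Hypothetical optimizer showing how suggestions could pre-empt sampling."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._suggestions = deque()
        self.archive = {}  # point -> loss, for every point told (asked or not)

    def suggest(self, point):
        # Queue a point that the next call to ask() must return.
        self._suggestions.append(point)

    def ask(self):
        if self._suggestions:
            return self._suggestions.popleft()
        return self._rng.gauss(0.0, 1.0)

    def tell(self, point, loss):
        # Accepts any point, including ones that were never asked.
        self.archive[point] = loss


optim = ToySuggestingOptimizer()
optim.suggest(12.0)
candidate = optim.ask()      # the suggested point comes back first
optim.tell(candidate, 2.0)
optim.tell(-3.0, 5.0)        # telling a point that was never asked
print(candidate, optim.archive)
```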

Adding callbacks

You can add callbacks to the ask and tell methods through the register_callback method. Callbacks registered on ask must have signature callback(optimizer), and callbacks registered on tell must have signature callback(optimizer, candidate, value).

The example below shows a callback which prints candidate and value on tell:

import nevergrad as ng

def my_function(x):
    return abs(sum(x - 1))

def print_candidate_and_value(optimizer, candidate, value):
    print(candidate, value)

optimizer = ng.optimizers.NGOpt(parametrization=2, budget=4)
optimizer.register_callback("tell", print_candidate_and_value)
optimizer.minimize(my_function)  # triggers a print at each tell within minimize

Two callbacks are available through ng.callbacks, see the callbacks module documentation.
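
A callback can also be a callable object that keeps state between calls. The sketch below assumes a hypothetical BestLossTracker class following the tell signature given above, and assumes value is a single float; it is invoked by hand here so that the example runs without Nevergrad.

```python
class BestLossTracker:
    """Hypothetical "tell" callback recording the best loss seen so far."""

    def __init__(self):
        self.best_loss = float("inf")
        self.best_candidate = None
        self.num_tells = 0

    def __call__(self, optimizer, candidate, value):
        # Same signature as a callback registered on "tell".
        self.num_tells += 1
        if value < self.best_loss:
            self.best_loss = value
            self.best_candidate = candidate


tracker = BestLossTracker()
# With an actual optimizer, this would be registered with:
#   optimizer.register_callback("tell", tracker)
# Here the callback is called directly with made-up candidates and values.
for candidate, value in [("a", 3.0), ("b", 1.5), ("c", 2.0)]:
    tracker(None, candidate, value)
print(tracker.best_candidate, tracker.best_loss)  # b 1.5
```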

Optimization with constraints

Nevergrad has a mechanism for cheap constraints. “Cheap” means that we do not try to reduce the number of calls to such constraints. We basically repeat mutations until we get a satisfiable point.

Let us say that we want to minimize (x[0]-.5)**2 + (x[1]-.5)**2 under the constraint x[0] >= 1.

import nevergrad as ng

def square(x):
    return sum((x - 0.5) ** 2)

optimizer = ng.optimizers.NGOpt(parametrization=2, budget=100)
# define a constraint on first variable of x:
optimizer.parametrization.register_cheap_constraint(lambda x: x[0] >= 1)

recommendation = optimizer.minimize(square, verbosity=2)
print(recommendation.value)
# >>> [1.00037625, 0.50683314]

Note that we can provide richer information by using float-valued constraints (>= 0 if ok):

import nevergrad as ng

def square(x):
    return sum((x - 0.5) ** 2)

optimizer = ng.optimizers.NGOpt(parametrization=2, budget=100)
# define a constraint on first variable of x:
optimizer.parametrization.register_cheap_constraint(lambda x: x[0] - 1)

recommendation = optimizer.minimize(square, verbosity=2)
print(recommendation.value)
# >>> [1.00037625, 0.50683314]
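
The "repeat mutations until we get a satisfiable point" idea can be sketched as a simple retry loop. This is an illustration only: mutate_until_satisfied, is_satisfied, the Gaussian mutation and the max_trials cap are assumptions made for the example, not Nevergrad's actual code.

```python
import random


def is_satisfied(constraint_value):
    """Boolean constraints must be True; float constraints must be >= 0."""
    if isinstance(constraint_value, bool):
        return constraint_value  # checked first, since False >= 0 is True in Python
    return constraint_value >= 0


def mutate_until_satisfied(parent, constraint, rng, sigma=1.0, max_trials=1000):
    """Resample Gaussian mutations of parent until the constraint holds."""
    for _ in range(max_trials):
        child = [value + rng.gauss(0.0, sigma) for value in parent]
        if is_satisfied(constraint(child)):
            return child
    raise RuntimeError("no point satisfying the constraint was found")


rng = random.Random(0)
# Float-valued constraint: satisfied when x[0] - 1 >= 0.
child = mutate_until_satisfied([0.0, 0.0], lambda x: x[0] - 1, rng)
# Boolean constraint expressing the same condition.
other = mutate_until_satisfied([0.0, 0.0], lambda x: x[0] >= 1, rng)
print(child, other)
```

Each retry calls the constraint again, which is why this only makes sense when the constraint is cheap to evaluate.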

Optimizing machine learning hyperparameters

When optimizing hyperparameters, e.g. in machine learning, if you don’t know which variables (see Parametrization) or optimizer to use:

  • use Choice for discrete variables

  • use TwoPointsDE with num_workers equal to the number of workers available to you. See the machine learning examples for more.

Or if you want something more aimed at robustly outperforming random search in highly parallel settings (one-shot):

  • use TransitionChoice for discrete variables, taking care that the default value is in the middle.

  • use ScrHammersleySearchPlusMiddlePoint (PlusMiddlePoint only if you have continuous parameters or good default values for discrete parameters).

Example of chaining, or inoculation, or initialization of an evolutionary algorithm

Chaining consists in running several algorithms in turn, information being forwarded from the first to the second and so on. More precisely, the budget is distributed over several algorithms, and when an objective function value is computed, all algorithms are informed.

Here is how to create such optimizers:

import nevergrad as ng

opts = ng.optimizers  # Chaining, LHSSearch, DE and CMA are all accessed from ng.optimizers

# Running LHSSearch with budget num_workers and then DE:
DEwithLHS = opts.Chaining([opts.LHSSearch, opts.DE], ["num_workers"])

# Running LHSSearch with budget the dimension and then DE:
DEwithLHSdim = opts.Chaining([opts.LHSSearch, opts.DE], ["dimension"])

# Running LHSSearch with budget 30 and then DE:
DEwithLHS30 = opts.Chaining([opts.LHSSearch, opts.DE], [30])

# Running LHS for 100 iterations, then DE for 60, then CMA:
LHSthenDEthenCMA = opts.Chaining([opts.LHSSearch, opts.DE, opts.CMA], [100, 60])

We can then minimize as usual:

import nevergrad as ng

def square(x):
    return sum((x - .5)**2)

optimizer = DEwithLHS30(parametrization=2, budget=300)
recommendation = optimizer.minimize(square)
print(recommendation.value)
# >>> [0.50843113, 0.5104554]
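
To make the chaining principle concrete (budget split across algorithms, every algorithm informed of every evaluation), here is a library-free sketch. ToySampler and ToyChaining are hypothetical classes written for this illustration; they are not how Nevergrad's Chaining is implemented.

```python
import random


class ToySampler:
    """Hypothetical optimizer sampling around the best point it has been told."""

    def __init__(self, sigma, seed):
        self.sigma = sigma
        self._rng = random.Random(seed)
        self.best = (float("inf"), 0.0)  # (loss, point), starting at the prior mean
        self.num_tell = 0

    def ask(self):
        return self.best[1] + self._rng.gauss(0.0, self.sigma)

    def tell(self, point, loss):
        self.num_tell += 1
        if loss < self.best[0]:
            self.best = (loss, point)


class ToyChaining:
    """Asks each optimizer in turn for its share of the budget, tells all of them."""

    def __init__(self, optimizers, budgets):
        if len(budgets) != len(optimizers) - 1:
            raise ValueError("the last optimizer takes the remaining budget")
        self.optimizers = optimizers
        self.budgets = budgets
        self.num_ask = 0

    def ask(self):
        chosen = self.optimizers[-1]
        threshold = 0
        for optimizer, budget in zip(self.optimizers, self.budgets):
            threshold += budget
            if self.num_ask < threshold:
                chosen = optimizer
                break
        self.num_ask += 1
        return chosen.ask()

    def tell(self, point, loss):
        for optimizer in self.optimizers:  # information is forwarded to everyone
            optimizer.tell(point, loss)


def shifted_square(x):
    return (x - 0.5) ** 2


explorer = ToySampler(sigma=1.0, seed=1)   # wide sampling for the first 30 asks
refiner = ToySampler(sigma=0.1, seed=2)    # narrow sampling for the rest
chain = ToyChaining([explorer, refiner], budgets=[30])
for _ in range(300):
    point = chain.ask()
    chain.tell(point, shifted_square(point))
print(refiner.best)
```

Because every tell is broadcast, the second optimizer starts from everything the first one discovered instead of from scratch.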

Multiobjective minimization with Nevergrad

Multiobjective minimization is a work in progress in nevergrad. It is:

  • not stable: the API may be updated at any time, hopefully to make it simpler and more intuitive.

  • not robust: there are probably corner cases we have not investigated yet.

  • not scalable: it is not yet clear how the current version will work with large number of losses, or large budget. For now the features have been implemented without time complexity considerations.

  • not optimal: this currently transforms multiobjective functions into monoobjective functions, hence losing some structure and making the function dynamic, which some optimizers are not designed to work on.

In other words, use it at your own risk ;) and provide feedback (both positive and negative) if you have any!

To perform multiobjective optimization, you can just provide tell with the results as an array or list of floats:

import nevergrad as ng
import numpy as np

def multiobjective(x):
    return [np.sum(x ** 2), np.sum((x - 1) ** 2)]

print("Example: ", multiobjective(np.array([1.0, 2.0, 0])))
# >>> Example: [5.0, 2.0]

optimizer = ng.optimizers.CMA(parametrization=3, budget=100)

# for all but DE optimizers, deriving a volume out of the losses,
# it's not strictly necessary but highly advised to provide an
# upper bound reference for the losses (if not provided, such upper
# bound is automatically inferred with the first few "tell")
optimizer.tell(ng.p.MultiobjectiveReference(), [5, 5])
# note that you can provide a Parameter to MultiobjectiveReference,
# which will be passed to the optimizer

optimizer.minimize(multiobjective, verbosity=2)

# The optimizer keeps track of its Pareto front:
print("Pareto front:")
for param in sorted(optimizer.pareto_front(), key=lambda p: p.losses[0]):
    print(f"{param} with losses {param.losses}")

# >>> Array{(3,)}:[0. 0. 0.] with losses [0. 3.]
#     Array{(3,)}:[0.39480968 0.98105712 0.55785803] with losses [1.42955333 0.56210368]
#     Array{(3,)}:[1.09901515 0.97673712 0.97153943] with losses [3.10573857 0.01115516]

# It can also provide subsets:
print("Random subset:", optimizer.pareto_front(2, subset="random"))
print("Loss-covering subset:", optimizer.pareto_front(2, subset="loss-covering"))
print("Domain-covering subset:", optimizer.pareto_front(2, subset="domain-covering"))
print("EPS subset:", optimizer.pareto_front(2, subset="EPS"))

Currently most optimizers only derive a volume float loss from the multiobjective loss and minimize it. DE and its variants have however been updated to make use of the full multi-objective losses [#789](https://github.com/facebookresearch/nevergrad/pull/789), which makes them good candidates for multi-objective minimization (NGOpt will delegate to DE in the case of multi-objective functions).
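
For readers unfamiliar with the terms, the sketch below shows what a Pareto front is and how a single "volume" number can be derived from two losses and an upper-bound reference. The functions dominates, pareto_front and hypervolume_2d are written for this illustration; the dominated hypervolume is one standard choice, and the exact volume-based loss used by the library is not specified here.

```python
def dominates(a, b):
    """True if losses a are no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(all_losses):
    """Keep only the loss vectors that no other vector dominates."""
    return [a for a in all_losses if not any(dominates(b, a) for b in all_losses)]


def hypervolume_2d(front, reference):
    """Area dominated by a 2-objective front, bounded by the reference point.

    Assumes the front is non-dominated and every point is below the reference.
    """
    volume = 0
    previous_second = reference[1]
    for first, second in sorted(front):  # first loss ascending, second descending
        volume += (reference[0] - first) * (previous_second - second)
        previous_second = second
    return volume


evaluated = [(1, 3), (2, 1), (3, 3), (2, 2)]  # made-up pairs of losses
front = pareto_front(evaluated)
print(front)                          # [(1, 3), (2, 1)]
print(hypervolume_2d(front, (4, 4)))  # 7
```

A larger dominated volume means a better front, so an optimizer that only handles a single float can work on (minus) this quantity.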

Reproducibility

Each parametrization has its own random_state for generating random numbers. All optimizers pull from it when they require stochastic behaviors. For reproducibility, this random state can be seeded in two ways:

  • by setting numpy’s global random state seed (np.random.seed(32)) before the parametrization’s first use. Indeed, when first used, the parametrization’s random state is seeded with a seed drawn from the global random state.

  • by manually seeding the parametrization’s random state (e.g. parametrization.random_state.seed(12) or optimizer.parametrization.random_state = np.random.RandomState(12)).
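
The reason the global seed must be set before the parametrization’s first use can be illustrated with a lazily created random state. The sketch below uses Python's standard random module as a stand-in for numpy, and ToyParametrization is a hypothetical class, so it only mirrors the mechanism described above.

```python
import random


class ToyParametrization:
    """Hypothetical object with a lazily created private random state."""

    def __init__(self):
        self._random_state = None

    @property
    def random_state(self):
        if self._random_state is None:
            # First use: the private generator is seeded from the global one.
            self._random_state = random.Random(random.getrandbits(32))
        return self._random_state


def first_sample(global_seed):
    random.seed(global_seed)  # seed the global state BEFORE first use
    return ToyParametrization().random_state.gauss(0.0, 1.0)


# Way 1: same global seed before first use, same private stream.
print(first_sample(32) == first_sample(32))  # True

# Way 2: seed the private random state manually.
param_a, param_b = ToyParametrization(), ToyParametrization()
param_a.random_state.seed(12)
param_b.random_state.seed(12)
print(param_a.random_state.random() == param_b.random_state.random())  # True
```

Seeding the global state after the private generator has been created would have no effect on it, since the seed has already been drawn.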