# How to perform optimization

By default, all optimizers assume a centered and reduced prior at the beginning of the optimization (i.e. 0 mean and unitary standard deviation). They are however able to find solutions far from this initial prior.

## Basic example

Minimizing a function using an optimizer (here `OnePlusOne`) can be easily run with:

```python
import nevergrad as ng

def square(x, y=12):
    return sum((x - .5)**2) + abs(y)

# optimization on x as an array of shape (2,)
optimizer = ng.optimizers.OnePlusOne(parametrization=2, budget=100)
recommendation = optimizer.minimize(square)  # best value
print(recommendation.value)
# >>> [0.49971112 0.5002944 ]
```

`parametrization=n` is a shortcut to state that the function has only one variable, continuous, of dimension `n`. See the Parametrization section for more complex parametrizations.

Defining the following parametrization instead will optimize on both `x` (continuous, dimension 2) and `y` (continuous, dimension 1).

```python
instrum = ng.p.Instrumentation(ng.p.Array(shape=(2,)), y=ng.p.Scalar())
optimizer = ng.optimizers.OnePlusOne(parametrization=instrum, budget=100)
recommendation = optimizer.minimize(square)
print(recommendation.value)
# >>> ((array([0.52213095, 0.45030925]),), {'y': -0.0003603100877068604})
```

We can work in the discrete case as well, e.g. with the one-max function applied on `{0,1,2,3,4,5,6}^10`:

```python
import nevergrad as ng

def onemax(*x):
    return len(x) - x.count(1)

# Discrete, ordered
variables = list(ng.p.TransitionChoice(list(range(7))) for _ in range(10))
instrum = ng.p.Instrumentation(*variables)
optimizer = ng.optimizers.DiscreteOnePlusOne(parametrization=instrum, budget=100, num_workers=1)

for _ in range(optimizer.budget):
    x = optimizer.ask()  # candidate to evaluate
    loss = onemax(*x.args, **x.kwargs)
    optimizer.tell(x, loss)

recommendation = optimizer.provide_recommendation()
print(recommendation.value)
# >>> ((1, 1, 0, 1, 1, 4, 1, 1, 1, 1), {})
print(recommendation.args)
# >>> (1, 1, 0, 1, 1, 4, 1, 1, 1, 1)
```

## Using several workers

Running the function evaluation in parallel with several workers is as easy as providing an executor:

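```python
from concurrent import futures
optimizer = ng.optimizers.OnePlusOne(parametrization=instrum, budget=10, num_workers=2)

# the executor runs the evaluations in parallel, one thread per worker
with futures.ThreadPoolExecutor(max_workers=optimizer.num_workers) as executor:
    recommendation = optimizer.minimize(square, executor=executor, batch_mode=False)
```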

With `batch_mode=True`, `minimize` asks the optimizer for `num_workers` points to evaluate, runs the evaluations, then updates the optimizer with the `num_workers` function outputs, and repeats until the budget is spent. If no executor is provided, the evaluations are sequential: `num_workers > 1` with no executor is therefore suboptimal but nonetheless useful for evaluation purposes (i.e. we simulate parallelism but have no actual parallelism). `batch_mode=False` (steady state mode) asks for a new evaluation whenever a worker is ready.
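The batch loop described above can be sketched with the standard library alone. This is a minimal illustration, not nevergrad's implementation: `ToyRandomSearch` and `minimize_in_batches` are hypothetical names, and the toy optimizer only mimics the `ask`/`tell` protocol.

```python
import random
from concurrent import futures


class ToyRandomSearch:
    """Hypothetical stand-in for an optimizer exposing ask/tell (not a nevergrad class)."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.best = None  # (loss, candidate) of the best point told so far
        self.num_tell = 0

    def ask(self):
        # sample a 2-dimensional point from a centered, reduced Gaussian
        return [self._rng.gauss(0.0, 1.0) for _ in range(2)]

    def tell(self, candidate, loss):
        self.num_tell += 1
        if self.best is None or loss < self.best[0]:
            self.best = (loss, candidate)


def square(x):
    return sum((xi - 0.5) ** 2 for xi in x)


def minimize_in_batches(optimizer, function, budget, num_workers, executor):
    """Ask num_workers points, evaluate them all, tell the results, repeat."""
    remaining = budget
    while remaining > 0:
        batch = [optimizer.ask() for _ in range(min(num_workers, remaining))]
        losses = list(executor.map(function, batch))  # waits for the whole batch
        for candidate, loss in zip(batch, losses):
            optimizer.tell(candidate, loss)
        remaining -= len(batch)
    return optimizer.best


optimizer = ToyRandomSearch(seed=0)
with futures.ThreadPoolExecutor(max_workers=2) as executor:
    best_loss, best_candidate = minimize_in_batches(
        optimizer, square, budget=10, num_workers=2, executor=executor
    )
print(optimizer.num_tell, best_loss)
```

In this sketch the optimizer is only updated once a full batch has returned; a steady state loop would instead submit a new candidate as soon as any single evaluation finishes.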

An ask and tell interface is also available. The three key methods of this interface are:

• `ask`: suggests a candidate on which to evaluate the function to optimize.

• `tell`: updates the optimizer with the value of the function for a candidate.

• `provide_recommendation`: returns the candidate the algorithm considers the best.

For most optimization algorithms in the platform, they can be called in arbitrary order - asynchronous optimization is OK. Some algorithms (with class attribute `no_parallelization=True`) however do not support this.

The `Parameter` class holds an attribute `value` which contains the actual value to evaluate through the function.

Here is a simple example in the sequential case (this is what happens in the `minimize` method for `num_workers=1`):

```python
import nevergrad as ng

def square(x, y=12):
    return sum((x - .5)**2) + abs(y)

instrum = ng.p.Instrumentation(ng.p.Array(shape=(2,)), y=ng.p.Scalar())  # We are working on R^2 x R.
optimizer = ng.optimizers.OnePlusOne(parametrization=instrum, budget=100, num_workers=1)

for _ in range(optimizer.budget):
    x = optimizer.ask()  # candidate to evaluate
    loss = square(*x.args, **x.kwargs)
    optimizer.tell(x, loss)

recommendation = optimizer.provide_recommendation()
print(recommendation.value)
```

Please make sure that your function returns a float, and that you indeed want to perform minimization and not maximization ;)
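If what you have is a score to maximize, one common workaround is to minimize its negation. The sketch below is a plain Python wrapper; `as_loss` and `accuracy` are hypothetical names introduced for illustration, not part of nevergrad.

```python
def as_loss(score_function):
    """Wrap a score to maximize into a float loss to minimize."""

    def loss(*args, **kwargs):
        # negate the score and force a plain float
        return -float(score_function(*args, **kwargs))

    return loss


def accuracy(threshold):
    # hypothetical score to maximize, peaking at threshold = 0.3
    return 1.0 - abs(threshold - 0.3)


loss = as_loss(accuracy)
print(loss(0.3))
# >>> -1.0
```

The wrapped `loss` can then be passed to `minimize` or used in an ask and tell loop in place of the original function.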

## Choosing an optimizer

`ng.optimizers.registry` is a `dict` of all optimizers, so `ng.optimizers.OnePlusOne` is equivalent to `ng.optimizers.registry["OnePlusOne"]`. You can print the full list of optimizers with:

```python
import nevergrad as ng
print(sorted(ng.optimizers.registry.keys()))
```

All algorithms have strengths and weaknesses. Questionable rules of thumb could be:

• `NGOpt` is a “meta”-optimizer which adapts to the provided settings (budget, number of workers, parametrization) and should therefore be a good default.

• `TwoPointsDE` is excellent in many cases, including very high `num_workers`.

• `PortfolioDiscreteOnePlusOne` is excellent in discrete or mixed settings when high precision on parameters is not relevant; it’s possibly a good choice for hyperparameter choice.

• `OnePlusOne` is a simple robust method for continuous parameters with `num_workers` < 8.

• `CMA` is excellent for control (e.g. neurocontrol) when the environment is not very noisy (`num_workers` ~50 ok) and when the budget is large (e.g. 1000 x the dimension).

• `TBPSA` is excellent for problems corrupted by noise, in particular overparameterized (neural) ones (very high `num_workers` ok).

• `PSO` is excellent in terms of robustness, high `num_workers` ok.

• `ScrHammersleySearchPlusMiddlePoint` is excellent for super parallel cases (fully one-shot, i.e. `num_workers` = budget included) or for very multimodal cases (such as some of our MLDA problems); don’t use softmax with this optimizer.

• `RandomSearch` is the classical random search baseline; don’t use softmax with this optimizer.
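As a rough illustration, some of the rules of thumb above can be encoded in a small helper returning an optimizer name. The function `rule_of_thumb`, its flags and the order of its checks are assumptions made for this sketch; they are not part of nevergrad and are as questionable as the rules themselves.

```python
def rule_of_thumb(one_shot=False, noisy=False, discrete=False, num_workers=None):
    """Map a coarse problem description to an optimizer name from the list above."""
    if one_shot:
        # fully parallel case: num_workers equal to the budget
        return "ScrHammersleySearchPlusMiddlePoint"
    if noisy:
        return "TBPSA"
    if discrete:
        return "PortfolioDiscreteOnePlusOne"
    if num_workers is None:
        # nothing specific is known: fall back to the adaptive default
        return "NGOpt"
    if num_workers < 8:
        return "OnePlusOne"
    return "TwoPointsDE"


name = rule_of_thumb(num_workers=30)
print(name)
# >>> TwoPointsDE
```

With nevergrad installed, the corresponding class could then be retrieved from the registry with `ng.optimizers.registry[name]`.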

## Telling non-asked points, or suggesting points

There are two ways to inoculate information you already have about some points:

• `optimizer.suggest(*args, **kwargs)`: after suggesting a point, the next `ask` will be a point with the provided `value`.

• `candidate = optimizer.parametrization.spawn_child(new_value=your_value)` which you can then use to `tell` the optimizer with the corresponding loss.

Examples:

• parametrized with an `Array`:

```python
import nevergrad as ng

optim = ng.optimizers.OnePlusOne(parametrization=2, budget=100)
optim.suggest([12, 12])
# equivalent to:
candidate = optim.parametrization.spawn_child(new_value=[12, 12])
# you can then use to tell the loss
optim.tell(candidate, 2.0)
```

• parametrized with an `ng.p.Instrumentation`:

```python
param = ng.p.Instrumentation(ng.p.Choice(["a", "b", "c"]), lr=ng.p.Log(lower=0.001, upper=1.0))
optim = ng.optimizers.OnePlusOne(parametrization=param, budget=100)
optim.suggest("c", lr=0.02)
# equivalent to:
candidate = optim.parametrization.spawn_child(new_value=(("c",), {"lr": 0.02}))
# you can then use to tell the loss
optim.tell(candidate, 2.0)
```

Note: some optimizers do not support such inoculation. Those will raise a `TellNotAskedNotSupportedError`.

You can add callbacks to the `ask` and `tell` methods through the `register_callback` method. Callbacks registered on `ask` must have signature `callback(optimizer)` and callbacks registered on `tell` must have signature `callback(optimizer, candidate, value)`.

The example below shows a callback which prints `candidate` and `value` on `tell`:

```python
import nevergrad as ng

def my_function(x):
    return abs(sum(x - 1))

def print_candidate_and_value(optimizer, candidate, value):
    print(candidate, value)

optimizer = ng.optimizers.OnePlusOne(parametrization=2, budget=4)
optimizer.register_callback("tell", print_candidate_and_value)
optimizer.minimize(my_function)  # triggers a print at each tell within minimize
```

Two callbacks are available through `ng.callbacks`, see the callbacks module documentation.

## Optimization with constraints

Nevergrad has a mechanism for cheap constraints. “Cheap” means that we do not try to reduce the number of calls to such constraints: we basically repeat mutations until we get a satisfiable point. Let us say that we want to minimize `(x[0]-.5)**2 + (x[1]-.5)**2` under the constraint `x[0] >= 1`.

```python
import nevergrad as ng

def square(x):
    return sum((x - .5)**2)

optimizer = ng.optimizers.OnePlusOne(parametrization=2, budget=100)
# define a constraint on first variable of x:
optimizer.parametrization.register_cheap_constraint(lambda x: x[0] >= 1)

recommendation = optimizer.minimize(square)
print(recommendation.value)
# >>> [1.00037625, 0.50683314]
```
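The "repeat mutations until we get a satisfiable point" idea can be sketched as a simple rejection loop. This is a toy illustration under stated assumptions, not nevergrad's internal code: `mutate_until_satisfied`, the Gaussian mutation and the `max_trials` cap are all hypothetical choices.

```python
import random


def mutate_until_satisfied(parent, constraint, rng, sigma=1.0, max_trials=1000):
    """Resample Gaussian mutations of `parent` until `constraint` accepts one."""
    for _ in range(max_trials):
        child = [p + rng.gauss(0.0, sigma) for p in parent]
        if constraint(child):  # each trial costs one constraint call
            return child
    raise RuntimeError("no mutation satisfied the constraint")


rng = random.Random(12)
child = mutate_until_satisfied([0.0, 0.0], lambda x: x[0] >= 1, rng)
print(child[0] >= 1)
# >>> True
```

Every rejected mutation costs one call to the constraint and none to the objective, which is why this approach only makes sense when the constraint is cheap to evaluate.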

## Optimizing machine learning hyperparameters

When optimizing hyperparameters, e.g. in machine learning, if you don’t know which variables (see Parametrization) and optimizer to use:

• use `Choice` for discrete variables

• use `TwoPointsDE` with `num_workers` equal to the number of workers available to you. See the machine learning examples for more.

Or if you want something more aimed at robustly outperforming random search in highly parallel settings (one-shot):

• use `TransitionChoice` for discrete variables, taking care that the default value is in the middle.

• Use `ScrHammersleySearchPlusMiddlePoint` (`PlusMiddlePoint` only if you have continuous parameters or good default values for discrete parameters).

## Example of chaining, or inoculation, or initialization of an evolutionary algorithm

Chaining consists in running several algorithms in turn, information being forwarded from the first to the second and so on. More precisely, the budget is distributed over several algorithms, and when an objective function value is computed, all algorithms are informed.

Here is how to create such optimizers:

```python
import nevergrad as ng

# shortcuts to the classes used below
Chaining = ng.optimizers.Chaining
LHSSearch, DE, CMA = ng.optimizers.LHSSearch, ng.optimizers.DE, ng.optimizers.CMA

# Running LHSSearch with budget num_workers and then DE:
DEwithLHS = Chaining([LHSSearch, DE], ["num_workers"])

# Running LHSSearch with budget the dimension and then DE:
DEwithLHSdim = Chaining([LHSSearch, DE], ["dimension"])

# Running LHSSearch with budget 30 and then DE:
DEwithLHS30 = Chaining([LHSSearch, DE], [30])

# Running LHS for 100 iterations, then DE for 60, then CMA:
LHSthenDEthenCMA = Chaining([LHSSearch, DE, CMA], [100, 60])
```

We can then minimize as usual:

```python
import nevergrad as ng

def square(x):
    return sum((x - .5)**2)

optimizer = DEwithLHS30(parametrization=2, budget=300)
recommendation = optimizer.minimize(square)
print(recommendation.value)
# >>> [0.50843113, 0.5104554]
```
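To make the chaining principle concrete, here is a standard-library sketch in which a sampler runs first, a local search takes over, and every evaluation is told to both. The classes `ToySampler` and `ToyHillClimber` and the function `chained_minimize` are hypothetical and much simpler than the real optimizers; they only illustrate how the budget is split and how information is forwarded.

```python
import random


class ToySampler:
    """Hypothetical one-shot style optimizer: uniform samples, remembers the best point told."""

    def __init__(self, rng):
        self.rng = rng
        self.best = None  # (loss, point)

    def ask(self):
        return [self.rng.uniform(-2.0, 2.0) for _ in range(2)]

    def tell(self, point, loss):
        if self.best is None or loss < self.best[0]:
            self.best = (loss, point)


class ToyHillClimber(ToySampler):
    """Hypothetical local search: mutates the best point it has been told about."""

    def ask(self):
        if self.best is None:
            return super().ask()
        return [p + self.rng.gauss(0.0, 0.1) for p in self.best[1]]


def chained_minimize(function, optimizers, budgets, total_budget):
    """Run optimizers in turn; every evaluation is told to all of them."""
    # the last optimizer receives whatever budget is left
    budgets = list(budgets) + [total_budget - sum(budgets)]
    for optimizer, budget in zip(optimizers, budgets):
        for _ in range(budget):
            point = optimizer.ask()
            loss = function(point)
            for listener in optimizers:  # information is forwarded to all algorithms
                listener.tell(point, loss)
    return optimizers[-1].best


def square(x):
    return sum((xi - 0.5) ** 2 for xi in x)


rng = random.Random(0)
best_loss, best_point = chained_minimize(
    square, [ToySampler(rng), ToyHillClimber(rng)], budgets=[30], total_budget=300
)
print(best_loss, best_point)
```

Because the hill climber is told about the 30 initial samples, it starts its own search from the best of them rather than from scratch.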

Multiobjective minimization is a work in progress in `nevergrad`. It is:

• not stable: the API may be updated at any time, hopefully to make it simpler and more intuitive.

• not robust: there are probably corner cases we have not investigated yet.

• not scalable: it is not yet clear how the current version will work with large number of losses, or large budget. For now the features have been implemented without time complexity considerations.

• not optimal: this currently transforms multiobjective functions into monoobjective functions, hence losing some structure and making the function dynamic, which some optimizers are not designed to work on.

In other words, use it at your own risk ;) and provide feedback (both positive and negative) if you have any!

The initial API that was added into `nevergrad` to work with multiobjective functions uses a function wrapper to convert them into monoobjective functions. Let us minimize `f1` and `f2` (two objective functions) assuming that values above 2.5 are of no interest:

```python
import nevergrad as ng
import numpy as np
from nevergrad.functions import MultiobjectiveFunction

f = MultiobjectiveFunction(multiobjective_function=lambda x: [np.sum(x**2), np.sum((x - 1)**2)], upper_bounds=[2.5, 2.5])
print(f(np.array([1.0, 2.0])))

optimizer = ng.optimizers.CMA(parametrization=3, budget=100)  # 3 is the dimension, 100 is the budget.
recommendation = optimizer.minimize(f)

# The function embeds its Pareto-front:
print("My Pareto front:", [x for x in f.pareto_front()])

# It can also provide a subset:
print("My Pareto front:", [x for x in f.pareto_front(2, subset="random")])
print("My Pareto front:", [x for x in f.pareto_front(2, subset="loss-covering")])
print("My Pareto front:", [x for x in f.pareto_front(2, subset="domain-covering")])
```

We are currently working on a new experimental API allowing users to directly `tell` the results as an array or list of floats. When this API is stabilized and proved to work, it will probably replace the older one. Here is an example of how to use it:

```python
import nevergrad as ng
import numpy as np

def multiobjective(x):
    return [np.sum(x**2), np.sum((x - 1)**2)]

print("Example: ", multiobjective(np.array([1.0, 2.0, 0])))
# >>> Example: [5.0, 2.0]

optimizer = ng.optimizers.CMA(parametrization=3, budget=100)

# it's not strictly necessary but highly advised to provide an
# upper bound reference for the losses (if not provided, such upper
# bound is automatically inferred with the first few "tell")
optimizer.tell(ng.p.MultiobjectiveReference(), [5, 5])
# note that you can provide a Parameter to MultiobjectiveReference,
# which will be passed to the optimizer

optimizer.minimize(multiobjective, verbosity=2)

# The optimizer embeds its Pareto-front:
print("Pareto front:")
for param in sorted(optimizer.pareto_front(), key=lambda p: p.losses):
    print(f"{param} with losses {param.losses}")

# >>> Array{(3,)}:[0. 0. 0.] with losses [0. 3.]
#     Array{(3,)}:[0.39480968 0.98105712 0.55785803] with losses [1.42955333 0.56210368]
#     Array{(3,)}:[1.09901515 0.97673712 0.97153943] with losses [3.10573857 0.01115516]

# It can also provide subsets:
print("Random subset:", optimizer.pareto_front(2, subset="random"))
print("Loss-covering subset:", optimizer.pareto_front(2, subset="loss-covering"))
print("Domain-covering subset:", optimizer.pareto_front(2, subset="domain-covering"))
print("EPS subset:", optimizer.pareto_front(2, subset="EPS"))
```
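For readers unfamiliar with the term, a Pareto front is the set of candidates whose losses are not dominated by those of any other candidate. The following standalone sketch shows that definition on hand-written loss pairs; `dominates` and `pareto_front` here are illustrative helpers, not the nevergrad methods of the same name, and they make no attempt at efficiency.

```python
def dominates(a, b):
    """True if losses `a` are no worse than `b` everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(losses):
    """Keep the loss vectors that no other vector dominates."""
    return [a for a in losses if not any(dominates(b, a) for b in losses)]


points = [(0.0, 3.0), (1.4, 0.6), (3.1, 0.01), (2.0, 2.0), (1.5, 0.7)]
print(pareto_front(points))
# >>> [(0.0, 3.0), (1.4, 0.6), (3.1, 0.01)]
```

Here `(2.0, 2.0)` and `(1.5, 0.7)` are dropped because `(1.4, 0.6)` is better on both losses, while the three remaining points each win on at least one loss.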

Note that DE and its variants have been updated to make use of the multi-objective losses [#789](https://github.com/facebookresearch/nevergrad/pull/789). This is a preliminary fix since the initial DE implementation was ill-suited for this use case.

## Reproducibility

Each parametrization has its own `random_state` for generating random numbers. All optimizers pull from it when they require stochastic behaviors. For reproducibility, this random state can be seeded in two ways:

• by setting `numpy`’s global random state seed (`np.random.seed(32)`) before the parametrization’s first use. Indeed, when first used, the parametrization’s random state is seeded with a seed drawn from the global random state.

• by manually seeding the parametrization’s random state (e.g. `parametrization.random_state.seed(12)` or `optimizer.parametrization.random_state = np.random.RandomState(12)`).
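The first seeding route relies on a lazily created random state. The sketch below reproduces that mechanism with Python's standard `random` module instead of `numpy`, purely as an analogy: `LazilySeeded` and `run` are hypothetical names and do not reflect how nevergrad's parametrization is implemented.

```python
import random


class LazilySeeded:
    """Hypothetical object owning a private random generator, created on first use."""

    def __init__(self):
        self._rng = None

    @property
    def random_state(self):
        if self._rng is None:
            # the private seed is drawn from the *global* generator at first use
            self._rng = random.Random(random.getrandbits(32))
        return self._rng


def run(global_seed):
    random.seed(global_seed)  # seed the global state before the first use
    param = LazilySeeded()
    return [param.random_state.random() for _ in range(3)]


print(run(32) == run(32))
# >>> True
```

Seeding the global generator after the private one has already been created would have no effect on it, which is why the global seed must be set before the first use.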