# Examples - Nevergrad for machine learning

Let us assume that you have defined an objective function as in:

```
def myfunction(lr, num_layers, arg3, arg4, other_anything):
    ...
    return -accuracy  # something to minimize
```

You should define how it must be instrumented, i.e. which arguments you want to optimize and on which space they are defined. If you have both continuous and discrete parameters and a good initial guess, a reasonable starting point is to use `TransitionChoice` for all discrete variables, `Array` for all your continuous variables, and `PortfolioDiscreteOnePlusOne` as optimizer.

```
import nevergrad as ng
# instrument learning rate and number of layers, keep arg3 to 3 and arg4 to 4
lr = ng.p.Log(a_min=0.0001, a_max=1)  # log distributed between 0.0001 and 1
num_layers = ng.p.TransitionChoice([4, 5, 6])
parametrization = ng.p.Instrumentation(lr, num_layers, 3., arg4=4)
```

Make sure `parametrization.value` holds your initial guess. It is automatically populated, but can be updated manually (just set `value` to what you want). For more details on parametrization, see the parametrization section.

The fact that you use (ordered) discrete variables through `TransitionChoice` is not a big deal because by nature `PortfolioDiscreteOnePlusOne` will ignore the order. This algorithm is quite stable.

If you have more budget, a cool possibility is to use `Choice` for all discrete variables and then apply `TwoPointsDE`. You might also compare this to `DE` (classical differential evolution). This might need a budget in the hundreds.

If you want to double-check that you are not worse than random search, you might use `RandomSearch`.

If you want something fully parallel (the number of workers can be equal to the budget), then you might use `ScrHammersleySearch`, which includes the discrete case. Then, you should use `TransitionChoice` rather than `Choice`. This does not have the traditional drawback of grid search and should still be more uniform than random. By nature `ScrHammersleySearch` will deal correctly with the `TransitionChoice` type for discrete variables.
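To give an intuition for why a quasirandom design covers the space more evenly than independent random draws, here is a minimal sketch of a plain (unscrambled) Hammersley point set in pure Python. It illustrates the general construction only and is not Nevergrad's `ScrHammersleySearch` implementation, which additionally scrambles the sequence; the helper names `van_der_corput` and `hammersley` are made up for this example.

```python
def van_der_corput(index, base=2):
    """Radical inverse of `index` in `base`: mirror its digits around the radix point."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value


def hammersley(num_points, primes=(2, 3, 5)):
    """Plain Hammersley set in [0, 1)^d with d = len(primes) + 1."""
    return [
        [i / num_points] + [van_der_corput(i, p) for p in primes]
        for i in range(num_points)
    ]


points = hammersley(8, primes=(2,))  # 8 points in 2 dimensions
# With 8 points in base 2, every eighth of the second axis receives exactly one point.
second_axis = sorted(p[1] for p in points)
print(second_axis)
```

Unlike a grid, the number of points does not have to be a product of per-axis resolutions, and unlike random search, no region of an axis is left empty by chance.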

If you are optimizing weights in reinforcement learning, you might use `TBPSA` (high noise) or `CMA` (low noise).

Below are 3 examples:

- the optimization of continuous hyperparameters. It is also presented in an asynchronous setting. All other examples are based on the ask and tell interface, which can be synchronous or not but relies on the user for setting up asynchronicity.
- the optimization of mixed (continuous and discrete) hyperparameters.
- the optimization of parameters in a noisy setting, typically as in reinforcement learning.

## Optimization of continuous hyperparameters with CMA, PSO, DE, Random and QuasiRandom

Let’s first define our test case:

```
import nevergrad as ng
import numpy as np
print("Optimization of continuous hyperparameters =========")
def train_and_return_test_error(x):
    return np.linalg.norm([int(50. * abs(x_ - 0.2)) for x_ in x])
parametrization = ng.p.Array(shape=(300,)) # optimize on R^300
budget = 1200 # How many trainings we will do before concluding.
names = ["RandomSearch", "TwoPointsDE", "CMA", "PSO", "ScrHammersleySearch"]
```

We will compare several algorithms (defined in `names`). `RandomSearch` is well known and `ScrHammersleySearch` is a quasirandom method; these two methods are fully parallel, i.e. we can perform the 1200 trainings in parallel. `CMA` and `PSO` are classical optimization algorithms, and `TwoPointsDE` is Differential Evolution equipped with a 2-points crossover. A complete list is available in `ng.optimizers.registry`.
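As a rough illustration of what a 2-points crossover means, the sketch below builds a child that copies one contiguous segment from a donor vector and everything else from the parent. This follows the textbook definition of two-point crossover; Nevergrad's `TwoPointsDE` may differ in details such as how the cut points are drawn, and the helper name `two_point_crossover` is hypothetical. In differential evolution, the donor would be the mutant vector built from other population members.

```python
import random


def two_point_crossover(parent, donor, rng):
    """Return a child taking donor[lo:hi] and the rest from parent."""
    assert len(parent) == len(donor)
    # Two distinct cut points, so at least one coordinate comes from the donor.
    lo, hi = sorted(rng.sample(range(len(parent) + 1), 2))
    return parent[:lo] + donor[lo:hi] + parent[hi:]


rng = random.Random(0)  # arbitrary seed for reproducibility
parent = [0.0] * 8
donor = [1.0] * 8
child = two_point_crossover(parent, donor, rng)
print(child)
```

Compared with coordinate-wise crossover, this keeps neighbouring coordinates together, which can help when adjacent variables are related.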

### Ask and tell version

```
for name in names:
    optim = ng.optimizers.registry[name](parametrization=parametrization, budget=budget, num_workers=3)
    for u in range(budget // 3):
        x1 = optim.ask()
        # Ask and tell can be asynchronous.
        # Just be careful that you "tell" something that was asked.
        # Here we ask 3 times and tell 3 times in order to fake asynchronicity
        x2 = optim.ask()
        x3 = optim.ask()
        # The three following lines could be parallelized.
        # We could also do things asynchronously, i.e. do one more ask
        # as soon as a training is over.
        y1 = train_and_return_test_error(*x1.args, **x1.kwargs)  # here we only defined an arg, so we could omit kwargs
        y2 = train_and_return_test_error(*x2.args, **x2.kwargs)  # (keeping it here for the sake of consistency)
        y3 = train_and_return_test_error(*x3.args, **x3.kwargs)
        optim.tell(x1, y1)
        optim.tell(x2, y2)
        optim.tell(x3, y3)
    recommendation = optim.recommend()
    print("* ", name, " provides a vector of parameters with test error ",
          train_and_return_test_error(*recommendation.args, **recommendation.kwargs))
```
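The comments in the loop above mention doing one more ask as soon as a training is over. Here is a sketch of that pattern using only the standard library. `ToyRandomSearch` and `sphere` are hypothetical stand-ins so that the snippet runs without Nevergrad; with a real optimizer, candidates are parameter objects and you would evaluate them through `*x.args, **x.kwargs` as above.

```python
import random
from concurrent import futures


class ToyRandomSearch:
    """Hypothetical stand-in for an optimizer with an ask/tell interface."""

    def __init__(self, dimension, seed=0):
        self._rng = random.Random(seed)
        self._dimension = dimension
        self.num_tell = 0
        self._best = (float("inf"), None)

    def ask(self):
        return [self._rng.gauss(0.0, 1.0) for _ in range(self._dimension)]

    def tell(self, candidate, loss):
        self.num_tell += 1
        if loss < self._best[0]:
            self._best = (loss, candidate)

    def recommend(self):
        return self._best[1]


def sphere(x):
    return sum((x_ - 0.2) ** 2 for x_ in x)


budget, num_workers = 60, 3
optim = ToyRandomSearch(dimension=5)
with futures.ThreadPoolExecutor(max_workers=num_workers) as executor:
    running = {}  # maps each running job to the candidate it evaluates
    for _ in range(num_workers):
        x = optim.ask()
        running[executor.submit(sphere, x)] = x
    asked = num_workers
    while running:
        done, _ = futures.wait(list(running), return_when=futures.FIRST_COMPLETED)
        for job in done:
            optim.tell(running.pop(job), job.result())  # tell only what was asked
            if asked < budget:  # refill the free worker right away
                x = optim.ask()
                running[executor.submit(sphere, x)] = x
                asked += 1
recommendation = optim.recommend()
print(optim.num_tell, sphere(recommendation))
```

Keeping the job-to-candidate mapping is what guarantees that each `tell` receives a candidate that was actually produced by `ask`, even when jobs finish out of order.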

### Asynchronous version with concurrent.futures

```
from concurrent import futures
for name in names:
    optim = ng.optimizers.registry[name](parametrization=parametrization, budget=budget)
    with futures.ThreadPoolExecutor(max_workers=optim.num_workers) as executor:  # the executor will evaluate the function in multiple threads
        recommendation = optim.minimize(train_and_return_test_error, executor=executor)
    # recommendation is a parameter object, so unpack it as in the ask and tell version
    print("* ", name, " provides a vector of parameters with test error ",
          train_and_return_test_error(*recommendation.args, **recommendation.kwargs))
```

## Optimization of mixed (continuous and discrete) hyperparameters

Let’s define our function:

```
import numpy as np
# Let us define a function.
def myfunction(arg1, arg2, arg3, value=3):
    return np.abs(value) + (1 if arg1 != "a" else 0) + (1 if arg2 != "e" else 0)
```

This function must then be instrumented in order to let the optimizer know which arguments it should optimize:

```
import nevergrad as ng
# argument transformation
# Optimization of mixed (continuous and discrete) hyperparameters.
arg1 = ng.p.TransitionChoice(["a", "b"]) # 1st arg. = positional discrete argument
# We apply a softmax for converting real numbers to discrete values.
arg2 = ng.p.Choice(["a", "c", "e"]) # 2nd arg. = positional discrete argument
value = ng.p.Scalar(init=1.0).set_mutation(sigma=2) # the 4th arg. is a keyword argument with Gaussian prior
# create the parametrization
# the 3rd arg. is a positional arg. which will be kept constant to "blublu"
instru = ng.p.Instrumentation(arg1, arg2, "blublu", value=value)
print(instru.dimension) # 5 dimensional space
```

The dimension is 5 because:

- the 1st discrete var. has 2 possible values, represented by a hard thresholding in a 1-dimensional space, i.e. we add 1 coordinate to the continuous problem
- the 2nd discrete var. has 3 possible values, represented by softmax, i.e. we add 3 coordinates to the continuous problem
- the 3rd var. has no uncertainty, so it does not introduce any coordinate in the continuous problem
- the 4th var. is a real number, represented by a single coordinate.
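The count above can be reproduced with a few lines of plain Python. This is a sketch of the bookkeeping described in the list only (one coordinate per `TransitionChoice`, one per option for `Choice`, none for constants, one per scalar); the `coordinates` helper and the `spec` tuples are a hypothetical encoding for this example, not a Nevergrad API.

```python
def coordinates(kind, num_options=0):
    """Number of continuous coordinates contributed by one argument."""
    if kind == "transition_choice":
        return 1  # a single thresholded coordinate
    if kind == "choice":
        return num_options  # one softmax weight per option
    if kind == "constant":
        return 0  # nothing to optimize
    if kind == "scalar":
        return 1
    raise ValueError(f"unknown kind: {kind}")


spec = [
    ("transition_choice", 2),  # arg1 in ["a", "b"]
    ("choice", 3),             # arg2 in ["a", "c", "e"]
    ("constant", 0),           # arg3 == "blublu"
    ("scalar", 0),             # value
]
dimension = sum(coordinates(kind, n) for kind, n in spec)
print(dimension)  # 1 + 3 + 0 + 1 = 5
```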

```
instru.set_standardized_data([1, -80, -80, 80, 3])
print(instru.args, instru.kwargs)
>>> (('b', 'e', 'blublu'), {'value': 7.0})
myfunction(*instru.args, **instru.kwargs)
>>> 8.0
```

In this case:
- `args[0] == "b"` because 1 > 0 (the threshold is 0 here since there are 2 values).
- `args[1] == "e"` is selected because proba(e) = exp(80) / (exp(80) + exp(-80) + exp(-80)) ≈ 1
- `args[2] == "blublu"` because it is kept constant
- `value == 7` because std * 3 + current_value = 2 * 3 + 1 = 7

The function therefore returns 7 + 1 = 8.
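The arithmetic above can be checked with the standard library alone. This sketch recomputes the threshold decision, the softmax probabilities for the three coordinates `[-80, -80, 80]`, and the scalar value from `sigma=2` and `init=1.0`. It mirrors the formulas in the text and does not call Nevergrad; in particular, it takes the most probable option where a softmax-based choice would sample.

```python
import math

data = [1, -80, -80, 80, 3]  # same vector as passed to set_standardized_data above

# arg1: one coordinate, thresholded at 0 between "a" and "b"
arg1 = "b" if data[0] > 0 else "a"

# arg2: softmax over three coordinates, one per option
weights = data[1:4]
shift = max(weights)  # subtracting the max avoids overflow in exp
exps = [math.exp(w - shift) for w in weights]
probas = [e / sum(exps) for e in exps]
# proba(e) is numerically 1, so sampling would pick "e" almost surely;
# here we simply take the most probable option.
arg2 = ["a", "c", "e"][probas.index(max(probas))]

# value: sigma * standardized coordinate + initial value
sigma, init = 2, 1.0
value = sigma * data[4] + init

# Same formula as myfunction above.
result = abs(value) + (1 if arg1 != "a" else 0) + (1 if arg2 != "e" else 0)
print(arg1, arg2, value, result)
```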

Then you can run the optimization as usual. `PortfolioDiscreteOnePlusOne` is quite a natural choice when you have a good initial guess and a mix of discrete and continuous variables; in this case, it might be better to use `TransitionChoice` rather than `Choice`. `TwoPointsDE` is often excellent in the large scale case (budget in the hundreds).

```
import nevergrad as ng
budget = 1200  # How many evaluations we will do before concluding.
for name in ["RandomSearch", "ScrHammersleySearch", "TwoPointsDE", "PortfolioDiscreteOnePlusOne", "CMA", "PSO"]:
    # use the instrumentation defined above, which matches the signature of myfunction
    optim = ng.optimizers.registry[name](parametrization=instru, budget=budget)
    for u in range(budget // 3):
        x1 = optim.ask()
        # Ask and tell can be asynchronous.
        # Just be careful that you "tell" something that was asked.
        # Here we ask 3 times and tell 3 times in order to fake asynchronicity
        x2 = optim.ask()
        x3 = optim.ask()
        # The three following lines could be parallelized.
        # We could also do things asynchronously, i.e. do one more ask
        # as soon as a training is over.
        y1 = myfunction(*x1.args, **x1.kwargs)  # args holds the 3 positional arguments, kwargs holds "value"
        y2 = myfunction(*x2.args, **x2.kwargs)
        y3 = myfunction(*x3.args, **x3.kwargs)
        optim.tell(x1, y1)
        optim.tell(x2, y2)
        optim.tell(x3, y3)
    recommendation = optim.recommend()
    print("* ", name, " provides a vector of parameters with test error ",
          myfunction(*recommendation.args, **recommendation.kwargs))
```

### Manual parametrization

You always have the possibility to define your own parametrization inside your function (not recommended):

```
import numpy as np

def softmax(x, possible_values=None):
    expx = [np.exp(x_ - max(x)) for x_ in x]
    probas = [e / sum(expx) for e in expx]
    # returns an array of size 1: a sampled index, or a sampled element of possible_values
    return np.random.choice(len(x) if possible_values is None
                            else possible_values, size=1, p=probas)

def train_and_return_test_error_mixed(x):
    cx = [x_ - 0.1 for x_ in x[3:]]
    activation = softmax(x[:3], ["tanh", "sigmoid", "relu"])[0]
    return np.linalg.norm(cx) + (1. if activation != "tanh" else 0.)

parametrization = 10  # you can just provide the size of your input in this case

# This version is bigger.
def train_and_return_test_error_mixed(x):
    cx = x[:(len(x) // 2)]  # continuous part.
    presoftmax_values = x[(len(x) // 2):]  # discrete part.
    values_for_this_softmax = []
    dx = []
    for g in presoftmax_values:
        values_for_this_softmax += [g]
        if len(values_for_this_softmax) > 4:  # one discrete variable per group of 5 values
            dx += list(softmax(values_for_this_softmax))
            values_for_this_softmax = []
    # penalty of 1 for each discrete variable that is not equal to 1
    return np.linalg.norm([int(50. * abs(x_ - 0.2)) for x_ in cx]) + sum(
        1 if d != 1 else 0 for d in dx)

parametrization = 300
```

## Optimization of parameters for reinforcement learning

We do not average evaluations over multiple episodes - the algorithm is in charge of averaging, if need be. `TBPSA`, based on population-control mechanisms, performs quite well in this case.
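To see why noise handling matters here, the following sketch re-implements the noisy objective used in this section with the standard library only (`random.gauss` in place of `np.random.normal`) and estimates the noise scale at the noiseless optimum `x = [0.2] * 300`. The seed and the sample counts are arbitrary choices for the illustration.

```python
import math
import random
import statistics

rng = random.Random(0)  # arbitrary seed for reproducibility


def simulate(x, noisy=True):
    """Deterministic error plus Gaussian noise scaled by the dimension."""
    error = math.sqrt(sum(int(50.0 * abs(x_ - 0.2)) ** 2 for x_ in x))
    return error + noisy * len(x) * rng.gauss(0.0, 1.0)


x = [0.2] * 300
noiseless = simulate(x, noisy=False)

# Spread of single evaluations: the noise standard deviation is len(x) = 300.
single = [simulate(x) for _ in range(1000)]
single_spread = statistics.stdev(single)

# Averaging n independent evaluations divides the noise standard deviation by sqrt(n).
averaged = [sum(simulate(x) for _ in range(50)) / 50 for _ in range(50)]
averaged_spread = statistics.stdev(averaged)

print(noiseless, single_spread, averaged_spread)
```

A single evaluation at the optimum is therefore dominated by noise, which is why an optimizer that decides by itself when to reevaluate or average is attractive in this setting.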

```
import nevergrad as ng
import numpy as np
# Similar, but with a noisy case: typically a case in which we train in reinforcement learning.
# This is about parameters rather than hyperparameters. TBPSA is a strong candidate in this case.
# We do *not* manually average over multiple evaluations; the algorithm will take care
# of averaging or reevaluate whatever it wants to reevaluate.
print("Optimization of parameters in reinforcement learning ===============")
def simulate_and_return_test_error_with_rl(x, noisy=True):
    return np.linalg.norm([int(50. * abs(x_ - 0.2)) for x_ in x]) + noisy * len(x) * np.random.normal()
budget = 1200  # How many trainings we will do before concluding.
for tool in ["TwoPointsDE", "RandomSearch", "TBPSA", "CMA", "NaiveTBPSA",
             "PortfolioNoisyDiscreteOnePlusOne"]:
    optim = ng.optimizers.registry[tool](parametrization=300, budget=budget)
    for u in range(budget // 3):
        # Ask and tell can be asynchronous.
        # Just be careful that you "tell" something that was asked.
        # Here we ask 3 times and tell 3 times in order to fake asynchronicity
        x1 = optim.ask()
        x2 = optim.ask()
        x3 = optim.ask()
        # The three following lines could be parallelized.
        # We could also do things asynchronously, i.e. do one more ask
        # as soon as a training is over.
        y1 = simulate_and_return_test_error_with_rl(*x1.args)
        y2 = simulate_and_return_test_error_with_rl(*x2.args)
        y3 = simulate_and_return_test_error_with_rl(*x3.args)
        optim.tell(x1, y1)
        optim.tell(x2, y2)
        optim.tell(x3, y3)
    recommendation = optim.recommend()
    print("* ", tool, " provides a vector of parameters with test error ",
          simulate_and_return_test_error_with_rl(*recommendation.args, noisy=False))
```

## Examples from our external users

Nevergrad is integrated in Ray/Tune, Berkeley AI Research's library for parameter tuning.

Nevergrad is a plugin in Hydra, Facebook’s parameter sweeping library.

Nevergrad is interfaced in IOH Profiler, a tool from Univ. Leiden, CNRS, Sorbonne univ and Tel Hai college for profiling optimization algorithms.

Nevergrad is interfaced in MixSimulator, a useful tool to get the optimal parameters for an electrical mix.