# Examples of benchmarks

The following figures are examples of algorithm benchmarks which can be generated very easily from the platform. In all examples, we use independent experiments for the different x-values, so that consistent rankings between methods over several x-values have a statistical meaning.

If you want to run the examples yourself, please make sure you have installed `nevergrad` with the `benchmark` flag (see the installation instructions).

## Noisy optimization

Created with command:

```bash
python -m nevergrad.benchmark noise --seed=12 --repetitions=10 --plot
```

Here the variance of the noise does not vanish near the optimum. TBPSA, which uses the noise management principles of pcCMSA-ES, reaches fast convergence rates. We compare it here to a sample of our algorithms, but it also performed very well compared to many other methods.
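This setting can be sketched in pure NumPy (illustrative names; this is not nevergrad's internal code): the loss carries additive Gaussian noise whose variance stays constant near the optimum, so a single evaluation is unreliable and algorithms must manage noise, e.g. by averaging.

```python
import numpy as np

rng = np.random.default_rng(12)

def noisy_sphere(x: np.ndarray) -> float:
    # Sphere loss plus additive Gaussian noise whose variance does NOT
    # vanish near the optimum x = 0 -- the regime studied in this benchmark.
    return float(np.sum(x ** 2) + rng.normal(0.0, 1.0))

x = np.zeros(3)  # the true optimum
single = noisy_sphere(x)  # a single evaluation is dominated by noise
# Averaging k re-evaluations shrinks the noise std by a factor sqrt(k):
averaged = float(np.mean([noisy_sphere(x) for _ in range(10000)]))
print(round(averaged, 3))  # close to the true loss 0 (std of the mean: 0.01)
```

Roughly speaking, methods like TBPSA manage this trade-off adaptively rather than with a fixed resampling count.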

## One-shot optimization

### In dimension 11 with one feature

Created with command:

```bash
python -m nevergrad.benchmark dim10_select_one_feature --seed=12 --repetitions=400 --plot
```

One-shot optimization is the case in which all evaluations should be done in parallel; the optimization algorithm can only decide, once and for all, which points are going to be evaluated. We consider here:

- an optimum which is translated by a standard centered Gaussian;
- 1 useful variable and 10 useless variables (this is a feature selection context, as in https://arxiv.org/abs/1706.03200);
- the sphere function (restricted to the useful variable).

We see that:

- quasirandom sampling without scrambling is suboptimal;
- Cauchy sampling helps a lot in this feature selection context (even though the optimum is drawn from a normal distribution!);
- LHS performs equivalently to low-discrepancy sampling (which can be related to the fact that only one feature matters).
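For reference, Latin Hypercube Sampling can be sketched in a few lines of NumPy (an illustrative implementation, not the one used by the platform): each axis is perfectly stratified on its own, but the axes are shuffled independently, so variables are not coupled with each other.

```python
import numpy as np

def latin_hypercube(n: int, dim: int, rng: np.random.Generator) -> np.ndarray:
    # One point per stratum along each axis, with strata shuffled
    # independently per axis: every 1-D projection is perfectly stratified,
    # but there is no coupling between variables.
    pts = np.empty((n, dim))
    for d in range(dim):
        pts[:, d] = (rng.permutation(n) + rng.random(n)) / n
    return pts

pts = latin_hypercube(8, 2, np.random.default_rng(12))
# Each axis hits every interval [k/8, (k+1)/8) exactly once:
for d in range(2):
    assert sorted(np.floor(pts[:, d] * 8).astype(int)) == list(range(8))
```

This perfect per-axis stratification is why LHS shines when few variables matter, and why the vanilla version weakens when all variables interact.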

### In dimension 12 with two features

We reproduce this experiment but with 2 useful variables:

```bash
python -m nevergrad.benchmark dim10_select_two_features --seed=12 --repetitions=400 --plot
```

LHS still performs very well, as do the scrambled methods; Cauchy sampling is not that useful anymore.

### In dimension 10 with small budget

With all variables useful, the situation becomes different; Cauchy is harmful. Scrambling is still very necessary. LHS (vanilla), which does not couple variables, is weak.

```bash
python -m nevergrad.benchmark dim10_smallbudget --seed=12 --repetitions=400 --plot
```

### In dimension 4

In moderate dimension, scrambling is less necessary (consistently with theory) and LHS becomes weaker as budget increases (consistently with discrepancy results in https://arxiv.org/abs/1707.08481). The following plot was created with command:

```bash
python -m nevergrad.benchmark doe_dim4 --seed=12 --repetitions=400 --plot
```

## Comparison-based methods for ill-conditioned problems

In this setting (rotated or not, ill-conditioned) we get excellent results with:

```bash
python -m nevergrad.benchmark compabasedillcond --seed=12 --repetitions=400 --plot
```

## Ill-conditioned function

SQP (which won the BBComp GECCO 2015 contest) performs great in the quadratic case, consistently with theory and intuition:

```bash
python -m nevergrad.benchmark illcond --seed=12 --repetitions=50 --plot
```

## Discrete

The platform can also deal with discrete objective functions! Discrete domains can be handled either through a softmax representation or through discretization of continuous variables.

```bash
python -m nevergrad.benchmark discrete --seed=12 --repetitions=10 --plot
```

We note that FastGA performs best. DoubleFastGA corresponds to a mutation rate ranging between 1/dim and (dim-1)/dim, instead of between 1/dim and 1/2; this is because the original range corresponds to a binary domain, whereas we consider arbitrary domains. The simple uniform mixing of mutation rates (https://arxiv.org/abs/1606.05551) performs well in several cases.
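As an illustrative pure-NumPy sketch of the softmax handling (not nevergrad's actual parametrization code): a continuous weight vector, which any continuous optimizer can tune, induces a probability distribution over the discrete options, and the objective is evaluated on a sample from that distribution.

```python
import numpy as np

def softmax_sample(weights: np.ndarray, options: list, rng: np.random.Generator):
    # Continuous weights induce a distribution over discrete options;
    # subtracting the max is a standard numerical-stability trick.
    z = np.exp(weights - weights.max())
    probs = z / z.sum()
    return options[rng.choice(len(options), p=probs)]

rng = np.random.default_rng(12)
options = ["low", "medium", "high"]
# A weight vector strongly favoring the last option:
draws = [softmax_sample(np.array([0.0, 0.0, 5.0]), options, rng) for _ in range(1000)]
print(draws.count("high") / len(draws))  # close to exp(5) / (2 + exp(5)) ≈ 0.987
```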

## List of benchmarks

You can find a list of the currently available benchmarks below. Most are not well documented; please open an issue when you need more information and we’ll update the documentation on demand ;)

Test settings

All optimizers on ill cond problems

All optimizers on ill cond problems

All optimizers on ill cond problems

Comparing one-shot optimizers as initializers for Bayesian Optimization.

Test settings

Pretrained ResNet50 under black-box attack. Square attacks: 100 queries ==> 0.1743119266055046; 200 queries ==> 0.09043250327653997; 300 queries ==> 0.05111402359108781; 400 queries ==> 0.04325032765399738; 1700 queries ==> 0.001310615989515072.

All DE methods on various functions. Dimension 5, 20, 100. Sphere, Cigar, Hm, Ellipsoid. Budget 10, 100, 1000, 10000, 100000.

FAO Crop simulator. Maximize yield.

Finding the best causal graph

Counterpart of simple_tsp with non-planar term.

Many optimizers on ill cond problems with constraints.

MuJoCo testbed. Learn linear policy for different control problems. Budget 500, 1000, 3000, 5000.

Very difficult objective functions: one is highly multimodal (infinitely many local optima), one has an infinite condition number, one has an infinitely long path towards the optimum. Looks somehow fractal.

One shot optimization of 3 classical objective functions (sphere, rastrigin, cigar), simplified. Base dimension 2000 or 20000. No rotation, no dummy variable. Budget 30, 100, 3000, 10000, 30000, 100000.

Optimization of policies for the 007 game. Sequential or 10-parallel or 100-parallel. Various numbers of averagings: 1, 10 or 100.

Lotka-Volterra equations

Five-shots optimization of 3 classical objective functions (sphere, rastrigin, cigar). Base dimension 3 or 25. 0 or 5 dummy variable per real variable. Budget 30, 100 or 3000.

Parallel optimization on 4 classical objective functions. More distinct settings than "parallel".

All Bayesian optimization methods on various functions. Budget 25, 31, 37, 43, 50, 60. Dimension 20. Sphere, Cigar, Hm, Ellipsoid.

Experiment on multimodal functions, namely hm, rastrigin, griewank, rosenbrock, ackley, lunacek, deceptivemultimodal. Similar to multimodal, but dimension 20 or 100 or 1000. Budget 1000 or 10000, sequential.

Testing optimizers on ill cond problems. Cigar, Ellipsoid. Both rotated and unrotated. Budget 100, 1000, 10000. Dimension 50.

Testing optimizers on ill-conditioned parallel optimization. 50 workers in parallel.

`nevergrad.benchmark.experiments.image_multi_similarity(seed: Optional[int] = None, cross_valid: bool = False, with_pgan: bool = False) → Iterator[Experiment]`

Optimizing images: artificial criterion for now.

Counterpart of image_multi_similarity with cross-validation.

Counterpart of image_similarity, using PGan as a representation.

Counterpart of image_multi_similarity with cross-validation.

`nevergrad.benchmark.experiments.image_quality(seed: Optional[int] = None, cross_val: bool = False, with_pgan: bool = False, num_images: int = 1) → Iterator[Experiment]`

Optimizing images for quality: we optimize K512, Blur and Brisque.

With num_images > 1, we are doing morphing.

Counterpart of image_quality with cross-validation.

Counterpart of image_quality with cross-validation.

Counterpart of image_quality with cross-validation.

`nevergrad.benchmark.experiments.image_quality_proxy(seed: Optional[int] = None, with_pgan: bool = False) → Iterator[Experiment]`

Optimizing images: artificial criterion for now.

`nevergrad.benchmark.experiments.image_similarity(seed: Optional[int] = None, with_pgan: bool = False, similarity: bool = True) → Iterator[Experiment]`

Optimizing images: artificial criterion for now.

`nevergrad.benchmark.experiments.image_similarity_and_quality(seed: Optional[int] = None, cross_val: bool = False, with_pgan: bool = False) → Iterator[Experiment]`

Optimizing images: artificial criterion for now.

Counterpart of image_similarity_and_quality with cross-validation.

Counterpart of image_similarity_and_quality with cross-validation.

Counterpart of image_similarity_and_quality with cross-validation.

Counterpart of image_similarity, using PGan as a representation.

Counterpart of image_similarity, but based on image quality assessment.

Counterpart of image_similarity_pgan, but based on image quality assessment.

Comparison of optimization algorithms equipped with distinct instrumentations. Onemax, Leadingones, Jump function.

`nevergrad.benchmark.experiments.keras_tuning(seed: Optional[int] = None, overfitter: bool = False, seq: bool = False, veryseq: bool = False) → Iterator[Experiment]`

Machine learning hyperparameter tuning experiment. Based on Keras models.

MixSimulator of power plants. Budget 20, 40, …, 1600. Sequential or 30 workers.

MLDA (machine learning and data analysis) testbed.

MLDA (machine learning and data analysis) testbed, restricted to the K-means part.

`nevergrad.benchmark.experiments.mltuning(seed: Optional[int] = None, overfitter: bool = False, seq: bool = False, veryseq: bool = False, nano: bool = False) → Iterator[Experiment]`

Machine learning hyperparameter tuning experiment. Based on scikit models.

Sequential counterpart of the rocket problem.

Testing optimizers on exponentiated problems. Cigar, Ellipsoid. Both rotated and unrotated. Budget 100, 1000, 10000. Dimension 50.

`nevergrad.benchmark.experiments.multimodal(seed: Optional[int] = None, para: bool = False) → Iterator[Experiment]`

Experiment on multimodal functions, namely hm, rastrigin, griewank, rosenbrock, ackley, lunacek, deceptivemultimodal. 0 or 5 dummy variable per real variable. Base dimension 3 or 25. Budget in 3000, 10000, 30000, 100000. Sequential.

`nevergrad.benchmark.experiments.multiobjective_example(seed: Optional[int] = None, hd: bool = False, many: bool = False) → Iterator[Experiment]`

Optimization of 2 and 3 objective functions in Sphere, Ellipsoid, Cigar, Hm. Dimension 6 and 7. Budget 100 to 3200.

Counterpart of moo with high dimension.

Counterpart of moo with more objective functions.

Counterpart of moo with high dimension and more objective functions.

Naive counterpart (no overfitting, see naivemltuning) of seq_keras_tuning.

Iterative counterpart of mltuning with overfitting of valid loss, i.e. train/valid/valid instead of train/valid/test.

Naive counterpart (no overfitting, see naivemltuning) of seq_keras_tuning.

Counterpart of mltuning with overfitting of valid loss, i.e. train/valid/valid instead of train/valid/test.

Iterative counterpart of mltuning with overfitting of valid loss, i.e. train/valid/valid instead of train/valid/test, and with lower budget.

Iterative counterpart of mltuning with overfitting of valid loss, i.e. train/valid/valid instead of train/valid/test, and with lower budget.

Iterative counterpart of seq_mltuning with smaller budget.

Iterative counterpart of seq_mltuning with smaller budget.

MuJoCo testbed. Learn neural policies.

One shot optimization of 3 classical objective functions (sphere, rastrigin, cigar), simplified. Tested on more dimensionalities than doe, namely 20, 200, 2000, 20000. No dummy variables. Budgets 30, 100, 3000, 10000, 30000, 100000, 300000.

Noisy optimization methods on a few noisy problems. Sphere, Rosenbrock, Cigar, Hm (= highly multimodal). Noise level 10. Noise dissymmetry or not. Dimension 2, 20, 200, 2000. Budget 25000, 50000, 100000.

Testing optimizers on exponentiated problems. Cigar, Ellipsoid. Both rotated and unrotated. Budget 100, 1000, 10000. Dimension 50.

Olympus emulators

Olympus surfaces

One shot optimization of 3 classical objective functions (sphere, rastrigin, cigar). 0 or 5 dummy variables per real variable. Base dimension 3 or 25. budget 30, 100 or 3000.

One-shot counterpart of Scikit tuning.

All DE methods on various functions. Parallel version. Dimension 5, 20, 100, 500, 2500. Sphere, Cigar, Hm, Ellipsoid. No rotation.

All Bayesian optimization methods on various functions. Parallel version. Dimension 20 and 2000. Budget 25, 31, 37, 43, 50, 60. Sphere, Cigar, Hm, Ellipsoid. No rotation.

Parallel optimization on 3 classical objective functions: sphere, rastrigin, cigar. The number of workers is 20 % of the budget. Testing both no useless variables and 5/6 of useless variables.

Parallel optimization with small budgets

Parallel counterpart of the multimodal experiment: 1000 workers.

Testing optimizers on exponentiated problems. Cigar, Ellipsoid. Both rotated and unrotated. Budget 100, 1000, 10000. Dimension 50.

`nevergrad.benchmark.experiments.pbo_suite(seed: Optional[int] = None, reduced: bool = False) → Iterator[Experiment]`

`nevergrad.benchmark.experiments.photonics(seed: Optional[int] = None, as_tuple: bool = False, small: bool = False, ultrasmall: bool = False, verysmall: bool = False) → Iterator[Experiment]`

Too small for being interesting: Bragg mirror + Chirped + Morpho butterfly.

Counterpart of yabbob with higher dimensions.

Unit commitment problem, i.e. management of dams for hydroelectric planning.

Noisy optimization methods on a few noisy problems. Cigar, Altcigar, Ellipsoid, Altellipsoid. Dimension 200, 2000, 20000. Budget 25000, 50000, 100000. No rotation. Noise level 10. With or without noise dissymmetry.

Realworld optimization. This experiment contains:

- a subset of MLDA (excluding the perceptron): 10 functions, rescaled or not;
- ARCoating (https://arxiv.org/abs/1904.02907): 1 function;
- the 007 game: 1 function, noisy;
- PowerSystem: a power system simulation problem;
- STSP: a simple TSP problem;
- MLDA, except the Perceptron.

Budget 25, 50, 100, 200, 400, 800, 1600, 3200, 6400, 12800. Sequential or 10-parallel or 100-parallel.

Counterpart of yabbob with HD and low budget.

`nevergrad.benchmark.experiments.rocket(seed: Optional[int] = None, seq: bool = False) → Iterator[Experiment]`

Rocket simulator. Maximize max altitude by choosing the thrust schedule, given a total thrust. Budget 25, 50, …, 1600. Sequential or 30 workers.

Iterative counterpart of keras tuning.

Iterative counterpart of mltuning.

Optimization of policies for games, i.e. direct policy search. Budget 12800, 25600, 51200, 102400. Games: War, Batawaf, Flip, GuessWho, BigGuessWho.

Sequential counterpart of instrum_discrete.

`nevergrad.benchmark.experiments.simple_tsp(seed: Optional[int] = None, complex_tsp: bool = False) → Iterator[Experiment]`

Simple TSP problems. Please note that the methods we use could be applied to more complex variants, whereas specialized methods cannot always be; therefore this comparison from a black-box point of view makes sense, even though white-box methods, which could solve these instances more efficiently, are not included. 10, 100, 1000, 10000 cities. Budgets doubling from 25, 50, 100, 200, … up to 25600.
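As a sketch of how such a combinatorial problem can be exposed to black-box methods (an illustrative random-key encoding, not necessarily the platform's implementation): the argsort of a continuous vector defines the visiting order, so any continuous optimizer can search over tours.

```python
import numpy as np

def tour_length(keys: np.ndarray, cities: np.ndarray) -> float:
    # Random-key encoding: the argsort of `keys` gives the visiting order.
    route = cities[np.argsort(keys)]
    # Sum of consecutive distances, including the return leg to the start.
    return float(np.sum(np.linalg.norm(route - np.roll(route, -1, axis=0), axis=1)))

rng = np.random.default_rng(12)
cities = rng.random((10, 2))  # 10 random cities in the unit square
# Even pure random search in key space yields valid tours:
best = min(tour_length(rng.standard_normal(10), cities) for _ in range(1000))
print(round(best, 3))
```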

Only use this if there is a good reason for not testing the xp, for instance because it is very slow (>1 min) with no way to make it faster. This is dangerous because it won’t test reproducibility, and the experiment may therefore be corrupted with no way to notice it automatically.

Counterpart of yabbob with higher dimensions.

Counterpart of yabbob with higher dimensions.

Some optimizers on a noisy optimization problem. This benchmark is based on the noisy benchmark. Budget 500, 1000, 2000, 4000, … doubling… 128000. Rotation or not. Sphere, Sphere4, Cigar.

Experiment to optimize the team pursuit track cycling problem.

Counterpart of yabbob with higher dimensions.

Counterpart of yabbob with higher dimensions.

Unit commitment problem.

Iterative counterpart of keras tuning.

Counterpart of yabbob with higher dimensions.

Counterpart of yabbob with higher dimensions.

`nevergrad.benchmark.experiments.yabbob(seed: Optional[int] = None, parallel: bool = False, big: bool = False, small: bool = False, noise: bool = False, hd: bool = False, constraint_case: int = 0, split: bool = False, tuning: bool = False, reduction_factor: int = 1, bounded: bool = False, box: bool = False, max_num_constraints: int = 4, mega_smooth_penalization: int = 0) → Iterator[Experiment]`

Yet Another Black-Box Optimization Benchmark. Related to, but without special effort for exactly sticking to, the BBOB/COCO dataset. Dimension 2, 10 and 50. Budget 50, 200, 800, 3200, 12800. Both rotated and unrotated.

Counterpart of yabbob with more budget.

Counterpart of yabbob with bounded domain and dim only 40, (-5,5)**n by default.

Counterpart of yabbob with bounded domain, (-5,5)**n by default.

Counterpart of yabbob with constraints. Constraints are cheap: we do not count calls to them.

Counterpart of yabbob with higher dimensions.

Counterpart of yabbob with HD and low budget.

Counterpart of yabbob with higher dimensions.

Counterpart of yabbob with more budget.

Counterpart of yasplitbbob with more dimension.

Counterpart of yabbob with penalized constraints.

Counterpart of yabbob with penalized constraints.

Counterpart of yabbob with penalized constraints.

Counterpart of yabbob with penalized constraints.

Counterpart of yabbob with penalized constraints.

Noisy optimization counterpart of yabbob. This is supposed to be consistent with normal practices in noisy optimization: we distinguish recommendations and exploration. This is different from the original BBOB/COCO from that point of view.

Counterpart of yabbob with more budget.

Counterpart of yabbob with penalized constraints.

Counterpart of yabbob with penalized constraints.

Counterpart of yaboundedbbob with penalized constraints.

Counterpart of yaboxbbob with penalized constraints.

Counterpart of yanoisybbob with penalized constraints.

Counterpart of yaparabbob with penalized constraints.

Counterpart of yasmallbbob with penalized constraints.

Parallel optimization counterpart of yabbob.

Counterpart of yabbob with penalized constraints.

Counterpart of yaboundedbbob with penalized constraints.

Counterpart of yaboxbbob with penalized constraints.

Counterpart of yanoisybbob with penalized constraints.

Counterpart of yaparabbob with penalized constraints.

Counterpart of yasmallbbob with penalized constraints.

Counterpart of yabbob with less budget.

Counterpart of yabbob with splitting info in the instrumentation.

Counterpart of yabbob with less budget and less xps.

Counterpart of yabbob with less budget and less dimension.