Optimizers API Reference

Optimizer API

All the optimizers share the following common API:

class nevergrad.optimizers.base.Optimizer(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

Algorithm framework with 3 main functions:

  • ask() which provides a candidate on which to evaluate the function to optimize.

  • tell(candidate, loss) which lets you provide the loss associated with a candidate.

  • provide_recommendation() which provides the best final candidate.

Typically, one would call ask() num_workers times, evaluate the function on these num_workers points in parallel, update the optimizer with tell(candidate, loss) as each evaluation finishes, and iterate until the budget is exhausted. At the very end, one would call provide_recommendation() for the estimated optimum.

This class is abstract; it provides internal equivalents for the 3 main functions, among which at least _internal_ask_candidate has to be overridden.

Each optimizer instance should be used only once, with the initially provided budget.

Parameters
  • parametrization (int or Parameter) – either the dimension of the optimization space, or its parametrization

  • budget (int/None) – number of allowed evaluations

  • num_workers (int) – number of evaluations which will be run in parallel at once

ask() → nevergrad.parametrization.core.Parameter

Provides a point to explore. This function can be called multiple times to explore several points in parallel

Returns

The candidate to try on the objective function. p.Parameter objects have fields args and kwargs which can be used directly on the function (objective_function(*candidate.args, **candidate.kwargs)).

Return type

p.Parameter

property dimension

Dimension of the optimization space.

Type

int

dump(filepath: Union[str, pathlib.Path]) → None

Pickles the optimizer into a file.

classmethod load(filepath: Union[str, pathlib.Path]) → X

Loads a pickle and checks that the class is correct.

minimize(objective_function: Callable[[], Union[float, Tuple[float, ], List[float], numpy.ndarray]], executor: Optional[nevergrad.common.typing.ExecutorLike] = None, batch_mode: bool = False, verbosity: int = 0) → nevergrad.parametrization.core.Parameter

Optimization (minimization) procedure

Parameters
  • objective_function (callable) – A callable to optimize (minimize)

  • executor (Executor) – An executor object, with a method submit(callable, *args, **kwargs) returning a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch the execution of the jobs locally, on a cluster, or with multithreading, depending on the implementation, e.g. concurrent.futures.ThreadPoolExecutor

  • batch_mode (bool) – when num_workers = n > 1, whether jobs are executed by batch (n function evaluations are launched, we wait for all results and relaunch n evals) or not (whenever an evaluation is finished, we launch another one)

  • verbosity (int) – print information about the optimization (0: None, 1: fitness values, 2: fitness values and recommendation)

Returns

The candidate with minimal value. p.Parameter objects have fields args and kwargs which can be used directly on the function (objective_function(*candidate.args, **candidate.kwargs)).

Return type

p.Parameter

Note

For evaluation purposes and with the current implementation, it is better to use batch_mode=True.

property num_ask

Number of times the ask method was called.

Type

int

property num_tell

Number of times the tell method was called.

Type

int

property num_tell_not_asked

Number of times the tell method was called on candidates that were not asked for by the optimizer (or that were suggested).

Type

int

pareto_front(size: Optional[int] = None, subset: str = 'random', subset_tentatives: int = 12) → List[nevergrad.parametrization.core.Parameter]

Pareto front, as a list of Parameter. The losses can be accessed through parameter.losses

Parameters
  • size (int (optional)) – if provided, selects a subset of the full pareto front with the given maximum size

  • subset (str) – method for selecting the subset (“random”, “loss-covering”, “domain-covering”, “hypervolume”)

  • subset_tentatives (int) – number of random attempts for finding a better subset

Returns

the list of Parameter of the pareto front

Return type

list

Note

During non-multiobjective optimization, this returns the current pessimistic best

provide_recommendation() → nevergrad.parametrization.core.Parameter

Provides the best point to use as a minimum, given the budget that was used

Returns

The candidate with minimal value. p.Parameter objects have fields args and kwargs which can be used directly on the function (objective_function(*candidate.args, **candidate.kwargs)).

Return type

p.Parameter

recommend() → nevergrad.parametrization.core.Parameter

Provides the best candidate to use as a minimum, given the budget that was used.

Returns

The candidate with minimal loss. p.Parameter objects have fields args and kwargs which can be used directly on the function (objective_function(*candidate.args, **candidate.kwargs)).

Return type

p.Parameter

register_callback(name: str, callback: Union[Callable[[Optimizer, p.Parameter, float], None], Callable[[Optimizer], None]]) → None

Adds a callback method called whenever tell or ask is called, with the same arguments (including the optimizer / self). This can be useful for custom logging.

Parameters
  • name (str) – name of the method to register the callback for (either ask or tell)

  • callback (callable) – a callable taking the same parameters as the method it is registered upon (including self)

remove_all_callbacks() → None

Removes all registered callables

suggest(*args: Any, **kwargs: Any) → None

Suggests a new point to ask. It will be asked at the next call (last in first out).

Parameters
  • *args (Any) – positional arguments matching the parametrization pattern.

  • **kwargs (Any) – keyword arguments matching the parametrization pattern.

Note

  • This relies on optimizers implementing a way to deal with unasked candidates. Some optimizers may not support it and will raise a TellNotAskedNotSupportedError at tell time.

  • LIFO is used so as to be able to suggest and ask straightaway, as an alternative to creating a new candidate with optimizer.parametrization.spawn_child(new_value)

tell(candidate: nevergrad.parametrization.core.Parameter, loss: Union[float, Tuple[float, ], List[float], numpy.ndarray]) → None

Provides the optimizer with the evaluation of a fitness value for a candidate.

Parameters
  • candidate (p.Parameter) – the candidate (or point) that was evaluated

  • loss (float/list/np.ndarray) – loss of the function (or of the multi-objective function)

Note

The candidate should generally be one provided by ask(), but it can also be a non-asked candidate. To create a p.Parameter instance from args and kwargs, you can use candidate = optimizer.parametrization.spawn_child(new_value=your_value):

  • for an Array(shape(2,)): optimizer.parametrization.spawn_child(new_value=[12, 12])

  • for an Instrumentation: optimizer.parametrization.spawn_child(new_value=(args, kwargs))

Alternatively, you can provide a suggestion with optimizer.suggest(*args, **kwargs), the next ask will use this suggestion.

Callbacks

Callbacks can be registered through optimizer.register_callback, to be called on either the ask or the tell method. Two of them are available through the ng.callbacks namespace.

class nevergrad.callbacks.OptimizerDump(filepath: Union[str, pathlib.Path])

Dumps the optimizer to a pickle file at every call.

Parameters

filepath (str or Path) – path to the pickle file

class nevergrad.callbacks.ParametersLogger(filepath: Union[str, pathlib.Path], append: bool = True, order: int = 1)

Logs parameter and run information into a file throughout the optimization.

Parameters
  • filepath (str or pathlib.Path) – the path to dump data to

  • append (bool) – whether to append the file (otherwise it replaces it)

  • order (int) – order of the internal/model parameters to extract

Example

logger = ParametersLogger(filepath)
optimizer.register_callback("tell",  logger)
optimizer.minimize()
list_of_dict_of_data = logger.load()

Note

Arrays are converted to lists

load() → List[Dict[str, Any]]

Loads data from the log file

load_flattened(max_list_elements: int = 24) → List[Dict[str, Any]]

Loads data from the log file, and splits lists (arrays) into multiple arguments

Parameters

max_list_elements (int) – Maximum number of elements displayed from the array, each element is given a unique id of type list_name#i0_i1_…

to_hiplot_experiment(max_list_elements: int = 24) → Any

Converts the logs into an hiplot experiment for display.

Parameters

max_list_elements (int) – maximum number of elements of list/arrays to export (only the first elements are extracted)

Example

exp = logs.to_hiplot_experiment()
exp.display(force_full_width=True)


class nevergrad.callbacks.ProgressBar

Progress bar to register as callback in an optimizer

Configurable optimizers

Configurable optimizers share the following API to create optimizers instances:

class nevergrad.optimizers.base.ConfiguredOptimizer(OptimizerClass: Type[nevergrad.optimization.base.Optimizer], config: Dict[str, Any], as_config: bool = False)

Creates optimizer-like instances with configuration.

Parameters
  • OptimizerClass (type) – class of the optimizer to configure

  • config (dict) – dictionary of all the configurations

  • as_config (bool) – whether to provide all config entries as kwargs to the optimizer instantiation (the default; see ConfiguredCMA for an example), or through a config kwarg referencing self (if True; see EvolutionStrategy for an example)

Note

This provides a default repr which can be bypassed through set_name

__call__(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1) → nevergrad.optimization.base.Optimizer

Creates an optimizer from the parametrization

Parameters
  • parametrization (int or Parameter) – either the dimension of the optimization space, or its parametrization

  • budget (int/None) – number of allowed evaluations

  • num_workers (int) – number of evaluations which will be run in parallel at once

load(filepath: Union[str, pathlib.Path]) → nevergrad.optimization.base.Optimizer

Loads a pickle and checks that it is an Optimizer.

set_name(name: str, register: bool = False) → nevergrad.optimization.base.ConfiguredOptimizer

Sets a new representation for the instance.

Here is a list of the available configurable optimizers:

Parametrizable families of optimizers.

Caution

This module and its available classes are experimental and may change quickly in the near future.

class nevergrad.families.Chaining(optimizers: Sequence[Union[nevergrad.optimization.base.ConfiguredOptimizer, Type[nevergrad.optimization.base.Optimizer]]], budgets: Sequence[Union[str, int]])

A chaining consists of running algorithm 1 during T1, then algorithm 2 during T2, then algorithm 3 during T3, etc. Each algorithm is fed with what happened before it.

Parameters
  • optimizers (list of Optimizer classes) – the sequence of optimizers to use

  • budgets (list of int) – the corresponding budgets for each optimizer but the last one

class nevergrad.families.DifferentialEvolution(*, initialization: str = 'gaussian', scale: Union[str, float] = 1.0, recommendation: str = 'optimistic', crossover: Union[str, float] = 0.5, F1: float = 0.8, F2: float = 0.8, popsize: Union[str, int] = 'standard', propagate_heritage: bool = False, multiobjective_adaptation: bool = True)

Differential evolution is typically used for continuous optimization. It uses differences between points in the population to perform mutations in fruitful directions; it is therefore a kind of covariance adaptation without any explicit covariance, making it very fast in high dimension. This class implements several variants of differential evolution, some of them adapted to genetic mutations as in Holland’s work (this combination is termed TwoPointsDE in nevergrad, corresponding to crossover="twopoints"), or to the noisy setting (coined NoisyDE, corresponding to recommendation="noisy"). In that last case, the optimizer returns the mean of the individuals with fitness better than the median, which can be suboptimal in some cases.

Default settings are CR = 0.5, F1 = 0.8, F2 = 0.8, curr-to-best, and a population size of 30. The initial population is purely random.

Parameters
  • initialization ("LHS", "QR" or "gaussian") – algorithm/distribution used for the initialization phase

  • scale (float or str) – scale of random component of the updates

  • recommendation ("pessimistic", "optimistic", "mean" or "noisy") – choice of the criterion for the best point to recommend

  • crossover (float or str) – crossover rate value, or strategy among:

    • “dimension”: crossover rate of 1 / dimension

    • “random”: different random (uniform) crossover rate at each iteration

    • “onepoint”: one-point crossover

    • “twopoints”: two-points crossover

    • “parametrization”: use the parametrization’s recombine method

  • F1 (float) – differential weight #1

  • F2 (float) – differential weight #2

  • popsize (int, "standard", "dimension", "large") – size of the population to use. “standard” is max(num_workers, 30), “dimension” max(num_workers, 30, dimension +1) and “large” max(num_workers, 30, 7 * dimension).

  • multiobjective_adaptation (bool) – Automatically adapts to handle multiobjective case. This is a very basic experimental version, activated by default because the non-multiobjective implementation is performing very badly.

class nevergrad.families.EMNA(*, isotropic: bool = True, naive: bool = True, population_size_adaptation: bool = False, initial_popsize: Optional[int] = None)

Estimation of Multivariate Normal Algorithm. This algorithm is quite efficient in a parallel context, i.e. when the population size is large.

Parameters
  • isotropic (bool) – isotropic version of EMNA if True, i.e. an identity matrix for the Gaussian; otherwise the separable version, meaning a diagonal matrix for the Gaussian (anisotropic)

  • naive (bool) – set to False for noisy problems, so that the best point will be an average of the final population.

  • population_size_adaptation (bool) – population size automatically adapts to the landscape

  • initial_popsize (Optional[int]) – initial (and minimal) population size (default: 4 x dimension)

class nevergrad.families.EvolutionStrategy(*, recombination_ratio: float = 0, popsize: int = 40, offsprings: Optional[int] = None, only_offsprings: bool = False, ranker: str = 'simple')

Experimental evolution-strategy-like algorithm. The API is likely to evolve.

class nevergrad.families.ParametrizedBO(*, initialization: Optional[str] = None, init_budget: Optional[int] = None, middle_point: bool = False, utility_kind: str = 'ucb', utility_kappa: float = 2.576, utility_xi: float = 0.0, gp_parameters: Optional[Dict[str, Any]] = None)

Bayesian optimization. Hyperparameter tuning method, based on statistical modeling of the objective function. This class is a wrapper over the bayes_opt package.

Parameters
  • initialization (str) – Initialization algorithms (None, “Hammersley”, “random” or “LHS”)

  • init_budget (int or None) – Number of initialization algorithm steps

  • middle_point (bool) – whether to sample the 0 point first

  • utility_kind (str) – Type of utility function to use among “ucb”, “ei” and “poi”

  • utility_kappa (float) – Kappa parameter for the utility function

  • utility_xi (float) – Xi parameter for the utility function

  • gp_parameters (dict) – dictionary of parameters for the Gaussian process

class nevergrad.families.ParametrizedCMA(*, scale: float = 1.0, popsize: Optional[int] = None, diagonal: bool = False, fcmaes: bool = False, random_init: bool = False)

CMA-ES optimizer. This evolution strategy uses Gaussian sampling, iteratively modified for searching in the best directions. This optimizer wraps an external implementation: https://github.com/CMA-ES/pycma

Parameters
  • scale (float) – scale of the search

  • popsize (Optional[int] = None) – population size, should be n * self.num_workers for int n >= 1. default is max(self.num_workers, 4 + int(3 * np.log(self.dimension)))

  • diagonal (bool) – use the diagonal version of CMA (advised in high dimension)

  • fcmaes (bool = False) – use the fast implementation; it does not support diagonal=True. It produces equivalent results and is preferable for high dimensions or when objective function evaluation is fast.

class nevergrad.families.ParametrizedOnePlusOne(*, noise_handling: Optional[Union[str, Tuple[str, float]]] = None, mutation: str = 'gaussian', crossover: bool = False)

Simple but sometimes powerful class of optimization algorithm. It uses asynchronous updates, so that the (1+1) scheme can actually be parallel and even performs quite well in such a context; this is naturally close to (1+lambda).

Parameters
  • noise_handling (str or Tuple[str, float]) –

    Method for handling the noise. The name can be:

    • ”random”: a random point is reevaluated regularly, this uses the one-fifth adaptation rule, going back to Schumer and Steiglitz (1968). It was independently rediscovered by Devroye (1972) and Rechenberg (1973).

    • ”optimistic”: the best optimistic point is reevaluated regularly (optimism in the face of uncertainty)

    • a coefficient can be used to tune the regularity of these reevaluations (default 0.05)

  • mutation (str) –

    One of the available mutations from:

    • ”gaussian”: standard mutation by adding a Gaussian random variable (with progressive widening) to the best pessimistic point

    • ”cauchy”: same as Gaussian but with a Cauchy distribution.

    • ”discrete”: when a variable is mutated (which happens with probability 1/d in dimension d), it is simply redrawn at random. This means that on average, only one variable is mutated.

    • ”discreteBSO”: as in brainstorm optimization, we slowly decrease the mutation rate from 1 to 1/d.

    • ”fastga”: FastGA mutations from the current best

    • ”doublefastga”: double-FastGA mutations from the current best (Doerr et al, Fast Genetic Algorithms, 2017)

    • ”portfolio”: random number of mutated bits (called uniform mixing in Dang & Lehre, “Self-adaptation of Mutation Rates in Non-elitist Populations”, 2016)

    • ”lengler”: specific mutation rate chosen as a function of the dimension and iteration index.

  • crossover (bool) – whether to add a genetic crossover step every other iteration.

Notes

After many papers advocated the mutation rate 1/d in the discrete (1+1) setting, it was proposed to use a randomly drawn mutation rate. Fast genetic algorithms are based on a similar idea. These two simple methods perform quite well on a wide range of problems.

class nevergrad.families.ParametrizedTBPSA(*, naive: bool = True, initial_popsize: Optional[int] = None)

Test-based population-size adaptation. This method, based on adapting the population size, performs best in many noisy optimization problems, even in large dimension.

Parameters
  • naive (bool) – set to False for noisy problems, so that the best point will be an average of the final population.

  • initial_popsize (Optional[int]) – initial (and minimal) population size (default: 4 x dimension)

Note

Derived from: Hellwig, Michael & Beyer, Hans-Georg. (2016). Evolution under Strong Noise: A Self-Adaptive Evolution Strategy Reaches the Lower Performance Bound – the pcCMSA-ES. https://homepages.fhv.at/hgb/New-Papers/PPSN16_HB16.pdf

class nevergrad.families.RandomSearchMaker(*, middle_point: bool = False, stupid: bool = False, opposition_mode: Optional[str] = None, cauchy: bool = False, scale: Union[float, str] = 1.0, recommendation_rule: str = 'pessimistic')

Provides random suggestions.

Parameters
  • stupid (bool) – Provides a random recommendation instead of the best point so far (for baseline)

  • middle_point (bool) – enforces that the first suggested point (ask) is zero.

  • opposition_mode (str or None) –

    symmetrizes exploration with respect to the center (e.g. https://ieeexplore.ieee.org/document/4424748):
    • full symmetry if “opposite”

    • random * symmetric if “quasi”

  • cauchy (bool) – use a Cauchy distribution instead of Gaussian distribution

  • scale (float or "random") –

    scalar for multiplying the suggested point values, or string:
    • ”random”: uses a randomized pattern for the scale.

    • ”auto”: scales as a function of dimension and budget (version 1: sigma = (1 + log(budget)) / (4 log(dimension)))

    • ”autotune”: scales as a function of dimension and budget (version 2: sigma = sqrt(log(budget) / dimension))

  • recommendation_rule (str) – “average_of_best” or “pessimistic” or “average_of_exp_best”; “pessimistic” is the default and implies selecting the pessimistic best.

class nevergrad.families.SamplingSearch(*, sampler: str = 'Halton', scrambled: bool = False, middle_point: bool = False, opposition_mode: Optional[str] = None, cauchy: bool = False, autorescale: Union[bool, str] = False, scale: float = 1.0, rescaled: bool = False, recommendation_rule: str = 'pessimistic')

This is a one-shot optimization method, hopefully better than random search by ensuring more uniformity.

Parameters
  • sampler (str) – Choice of the sampler among “Halton”, “Hammersley” and “LHS”.

  • scrambled (bool) – Adds scrambling to the search; much better in high dimension and rarely worse than the original search.

  • middle_point (bool) – enforces that the first suggested point (ask) is zero.

  • cauchy (bool) – use Cauchy inverse distribution instead of Gaussian when fitting points to real space (instead of box).

  • scale (float or "random") – scalar for multiplying the suggested point values.

  • rescaled (bool or str) – rescales the sampling pattern to reach the boundaries and/or applies automatic rescaling.

  • recommendation_rule (str) – “average_of_best” or “pessimistic”; “pessimistic” is the default and implies selecting the pessimistic best.

Notes

  • Halton is a low quality sampling method when the dimension is high; it is usually better to use Halton with scrambling.

  • When the budget is known in advance, it is also better to replace Halton by Hammersley. Basically, the key difference with Halton is adding one evenly spaced coordinate (the discrepancy is better): for a known budget, low-discrepancy sequences (e.g. scrambled Hammersley) have a better discrepancy.

  • Reference: Halton 1964: Algorithm 247: Radical-inverse quasi-random point sequence, ACM, p. 701. Scrambling adds randomization to the Halton search; it is much better in high dimension and rarely worse than the original Halton search.

  • About Latin Hypercube Sampling (LHS): Though partially incremental versions exist, this implementation needs the budget in advance. This can be great in terms of discrepancy when the budget is not very high.

class nevergrad.families.ScipyOptimizer(*, method: str = 'Nelder-Mead', random_restart: bool = False)

Wrapper over SciPy optimizer implementations, in standard ask-and-tell format. These methods are imported from scipy.optimize and include Sequential Quadratic Programming.

Parameters
  • method (str) –

    Name of the method to use among:

    • Nelder-Mead

    • COBYLA

    • SQP (or SLSQP): very powerful e.g. in continuous noisy optimization. It is based on approximating the objective function by quadratic models.

    • Powell

  • random_restart (bool) – whether to restart at a random point if the optimizer converged but the budget is not entirely spent yet (otherwise, restarts from best point)

Note

These optimizers do not support asking several candidates in a row

Optimizers

Here are all the other optimizers available in nevergrad:

Caution

Only non-family-based optimizers are listed in the documentation; you can get a full list of available optimizers with sorted(nevergrad.optimizers.registry.keys()).

class nevergrad.optimization.optimizerlib.ASCMA2PDEthird(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

Algorithm selection, with CMA and 2pt-DE. Active selection at 1/3.

class nevergrad.optimization.optimizerlib.ASCMADEQRthird(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

Algorithm selection, with CMA, ScrHalton and Lhs-DE. Active selection at 1/3.

class nevergrad.optimization.optimizerlib.ASCMADEthird(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

Algorithm selection, with CMA and Lhs-DE. Active selection at 1/3.

class nevergrad.optimization.optimizerlib.CM(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

Competence map, simplest.

class nevergrad.optimization.optimizerlib.CMandAS(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

Competence map, with algorithm selection in one of the cases (2 CMAs).

class nevergrad.optimization.optimizerlib.CMandAS2(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

Competence map, with algorithm selection in one of the cases (3 CMAs).

class nevergrad.optimization.optimizerlib.CMandAS3(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

Competence map, with algorithm selection in one of the cases (3 CMAs).

class nevergrad.optimization.optimizerlib.Chaining(optimizers: Sequence[Union[nevergrad.optimization.base.ConfiguredOptimizer, Type[nevergrad.optimization.base.Optimizer]]], budgets: Sequence[Union[str, int]])

A chaining consists of running algorithm 1 during T1, then algorithm 2 during T2, then algorithm 3 during T3, etc. Each algorithm is fed with what happened before it.

Parameters
  • optimizers (list of Optimizer classes) – the sequence of optimizers to use

  • budgets (list of int) – the corresponding budgets for each optimizer but the last one

class nevergrad.optimization.optimizerlib.ConfSplitOptimizer(*, num_optims: Optional[int] = None, num_vars: Optional[List[int]] = None, multivariate_optimizer: Union[ConfiguredOptimizer, Type[Optimizer]] = CMA, monovariate_optimizer: Union[ConfiguredOptimizer, Type[Optimizer]] = RandomSearch, progressive: bool = False, non_deterministic_descriptor: bool = True)

Combines optimizers, each of them working on its own variables.

Parameters
  • num_optims (int) – number of optimizers

  • num_vars (optional list of int) – number of variable per optimizer.

  • progressive (optional bool) – whether we progressively add optimizers.

  • non_deterministic_descriptor (bool) – the subparts’ parametrization descriptor is set to a noisy function; this can have an impact on optimizer selection for NGOpt optimizers.

class nevergrad.optimization.optimizerlib.ConfiguredPSO(transform: str = 'identity', wide: bool = False, popsize: Optional[int] = None)

Particle Swarm Optimization is based on a set of particles with their inertia. Wikipedia provides a beautiful illustration.

Parameters
  • transform (str) – name of the transform to use to map from PSO optimization space to R-space.

  • wide (bool) – if True: legacy initialization in [-1,1] box mapped to R

  • popsize (int) – population size of the particle swarm. Defaults to max(40, num_workers)

Note

  • Using non-default “transform” and “wide” parameters can lead to extreme values

  • Implementation partially following SPSO2011. However, no randomization of the population order.

  • Reference: M. Zambrano-Bigiarini, M. Clerc and R. Rojas, Standard Particle Swarm Optimisation 2011 at CEC-2013: A baseline for future PSO improvements, 2013 IEEE Congress on Evolutionary Computation, Cancun, 2013, pp. 2337-2344. https://ieeexplore.ieee.org/document/6557848

class nevergrad.optimization.optimizerlib.EDA(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

Test-based population-size adaptation.

Population size equal to lambda = 4 x dimension. Test by comparing the first fifth and the last fifth of the 5 x lambda evaluations.

Caution

This optimizer is probably wrong.

class nevergrad.optimization.optimizerlib.EMNA(*, isotropic: bool = True, naive: bool = True, population_size_adaptation: bool = False, initial_popsize: Optional[int] = None)

Estimation of Multivariate Normal Algorithm. This algorithm is quite efficient in a parallel context, i.e. when the population size is large.

Parameters
  • isotropic (bool) – isotropic version of EMNA if True, i.e. an identity matrix for the Gaussian; otherwise the separable version, meaning a diagonal matrix for the Gaussian (anisotropic)

  • naive (bool) – set to False for noisy problems, so that the best point will be an average of the final population.

  • population_size_adaptation (bool) – population size automatically adapts to the landscape

  • initial_popsize (Optional[int]) – initial (and minimal) population size (default: 4 x dimension)

exception nevergrad.optimization.optimizerlib.InfiniteMetaModelOptimum

Sometimes the optimum of the metamodel is at infinity.

class nevergrad.optimization.optimizerlib.MEDA(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)
class nevergrad.optimization.optimizerlib.MPCEDA(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)
class nevergrad.optimization.optimizerlib.ManyCMA(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

Combining 3 CMAs. Exactly identical. Active selection at 1/3 of the budget.

class nevergrad.optimization.optimizerlib.ManySmallCMA(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

Combining 3 CMAs. Exactly identical. Active selection at 1/3 of the budget.

class nevergrad.optimization.optimizerlib.MetaModel(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1, multivariate_optimizer: nevergrad.optimization.base.ConfiguredOptimizer = CMA)

Adding a metamodel into CMA.

class nevergrad.optimization.optimizerlib.MultiCMA(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

Combining 3 CMAs. Exactly identical. Active selection at 1/10 of the budget.

class nevergrad.optimization.optimizerlib.MultiDiscrete(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

Combining 3 Discrete(1+1). Exactly identical. Active selection at 1/10 of the budget.

class nevergrad.optimization.optimizerlib.MultiScaleCMA(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

Combining 3 CMAs with different init scale. Active selection at 1/3 of the budget.

class nevergrad.optimization.optimizerlib.NGO(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)
class nevergrad.optimization.optimizerlib.NGOpt(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)
class nevergrad.optimization.optimizerlib.NGOpt2(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

Nevergrad optimizer by competence map. You might modify this one to design your own competence map.

class nevergrad.optimization.optimizerlib.NGOpt4(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

Nevergrad optimizer by competence map. You may modify this one to design your own competence map.

class nevergrad.optimization.optimizerlib.NGOpt8(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

Nevergrad optimizer by competence map. You may modify this one to design your own competence map.

class nevergrad.optimization.optimizerlib.NGOptBase(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

Nevergrad optimizer by competence map.

property optim
recommend() → nevergrad.parametrization.core.Parameter

Provides the best candidate to use as a minimum, given the budget that was used.

Returns

The candidate with minimal loss. p.Parameters have field args and kwargs which can be directly used on the function (objective_function(*candidate.args, **candidate.kwargs)).

Return type

p.Parameter
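The ask/tell/recommend loop that all of these optimizers share can be illustrated with a toy random-search optimizer. This is a minimal sketch of the API pattern only, not nevergrad's implementation:

```python
import random


class TinyRandomSearch:
    # Toy optimizer illustrating the ask/tell/recommend pattern
    # (hypothetical class, not part of nevergrad).
    def __init__(self, dimension, budget):
        self.dimension = dimension
        self.budget = budget
        self.best_candidate = None
        self.best_loss = float("inf")

    def ask(self):
        # Propose a uniform random point in [-1, 1]^dimension.
        return [random.uniform(-1.0, 1.0) for _ in range(self.dimension)]

    def tell(self, candidate, loss):
        # Record the best evaluated point so far.
        if loss < self.best_loss:
            self.best_candidate, self.best_loss = candidate, loss

    def recommend(self):
        return self.best_candidate


def sphere(x):
    return sum(v * v for v in x)


opt = TinyRandomSearch(dimension=2, budget=100)
for _ in range(opt.budget):
    cand = opt.ask()
    opt.tell(cand, sphere(cand))
best = opt.recommend()
```

With a real nevergrad optimizer, the candidate returned by ask() is a p.Parameter whose args and kwargs are passed to the objective function, but the loop structure is the same.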

class nevergrad.optimization.optimizerlib.NoisyBandit(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

UCB. This is upper confidence bound (adapted to minimization), with very poor parametrization; in particular, the logarithmic term is set to zero. Infinite arms: we add one arm when 20 * #ask >= #arms ** 3.
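The arm-adding schedule quoted above can be replayed with a short helper. This is a sketch of the stated rule only, not nevergrad's internal code:

```python
def arms_after(num_asks):
    # Replay the schedule "add one arm when 20 * #ask >= #arms ** 3",
    # starting from a single arm.
    arms = 1
    for ask in range(1, num_asks + 1):
        if 20 * ask >= arms ** 3:
            arms += 1
    return arms
```

Under this rule the number of arms grows roughly like the cube root of 20 times the number of asks.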

class nevergrad.optimization.optimizerlib.PCEDA(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)
class nevergrad.optimization.optimizerlib.PSO(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1, transform: str = 'arctan', wide: bool = False, popsize: Optional[int] = None)
class nevergrad.optimization.optimizerlib.ParaPortfolio(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

Passive portfolio of CMA, 2-pt DE, PSO, SQP and Scr-Hammersley.

class nevergrad.optimization.optimizerlib.ParametrizedBO(*, initialization: Optional[str] = None, init_budget: Optional[int] = None, middle_point: bool = False, utility_kind: str = 'ucb', utility_kappa: float = 2.576, utility_xi: float = 0.0, gp_parameters: Optional[Dict[str, Any]] = None)

Bayesian optimization. Hyperparameter tuning method, based on statistical modeling of the objective function. This class is a wrapper over the bayes_opt package.

Parameters
  • initialization (str) – Initialization algorithms (None, “Hammersley”, “random” or “LHS”)

  • init_budget (int or None) – Number of initialization algorithm steps

  • middle_point (bool) – whether to sample the 0 point first

  • utility_kind (str) – Type of utility function to use among “ucb”, “ei” and “poi”

  • utility_kappa (float) – Kappa parameter for the utility function

  • utility_xi (float) – Xi parameter for the utility function

  • gp_parameters (dict) – dictionary of parameters for the Gaussian process

no_parallelization = True
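The three utility kinds can be sketched as follows, using the maximization convention of the wrapped bayes_opt package. The exact formulas and helper names here are illustrative assumptions, not nevergrad's code:

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def utility(kind, mean, std, best, kappa=2.576, xi=0.0):
    # Sketch of the three acquisition functions named above, given a
    # Gaussian-process posterior mean and std at a candidate point.
    if kind == "ucb":
        return mean + kappa * std
    z = (mean - best - xi) / std
    if kind == "ei":
        return (mean - best - xi) * normal_cdf(z) + std * normal_pdf(z)
    if kind == "poi":
        return normal_cdf(z)
    raise ValueError(kind)
```

Larger kappa (for "ucb") or xi (for "ei" and "poi") shifts the trade-off toward exploration.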
class nevergrad.optimization.optimizerlib.ParametrizedCMA(*, scale: float = 1.0, popsize: Optional[int] = None, diagonal: bool = False, fcmaes: bool = False, random_init: bool = False)

CMA-ES optimizer. This evolution strategy uses Gaussian sampling, iteratively adapted to search in the best directions. This optimizer wraps an external implementation: https://github.com/CMA-ES/pycma

Parameters
  • scale (float) – scale of the search

  • popsize (Optional[int] = None) – population size, should be n * self.num_workers for int n >= 1. default is max(self.num_workers, 4 + int(3 * np.log(self.dimension)))

  • diagonal (bool) – use the diagonal version of CMA (advised in big dimension)

  • fcmaes (bool = False) – use the fast implementation (does not support diagonal=True); produces equivalent results and is preferable for high dimensions or when objective function evaluations are fast.

class nevergrad.optimization.optimizerlib.ParametrizedOnePlusOne(*, noise_handling: Optional[Union[str, Tuple[str, float]]] = None, mutation: str = 'gaussian', crossover: bool = False)

A simple but sometimes powerful class of optimization algorithms. It uses asynchronous updates, so that the (1+1) strategy can actually run in parallel and even performs quite well in such a context; this is naturally close to (1+lambda).

Parameters
  • noise_handling (str or Tuple[str, float]) –

    Method for handling the noise. The name can be:

    • ”random”: a random point is reevaluated regularly, this uses the one-fifth adaptation rule, going back to Schumer and Steiglitz (1968). It was independently rediscovered by Devroye (1972) and Rechenberg (1973).

    • ”optimistic”: the best optimistic point is reevaluated regularly; optimism in the face of uncertainty

    • a coefficient can be passed to tune the regularity of these reevaluations (default 0.05)

  • mutation (str) –

    One of the available mutations from:

    • ”gaussian”: standard mutation by adding a Gaussian random variable (with progressive widening) to the best pessimistic point

    • ”cauchy”: same as Gaussian but with a Cauchy distribution.

    • ”discrete”: when a variable is mutated (which happens with probability 1/d in dimension d), it is simply redrawn at random. This means that, on average, only one variable is mutated per iteration.

    • ”discreteBSO”: as in brainstorm optimization, we slowly decrease the mutation rate from 1 to 1/d.

    • ”fastga”: FastGA mutations from the current best

    • ”doublefastga”: double-FastGA mutations from the current best (Doerr et al, Fast Genetic Algorithms, 2017)

    • ”portfolio”: Random number of mutated bits (called uniform mixing in Dang & Lehre “Self-adaptation of Mutation Rates in Non-elitist Population”, 2016)

    • ”lengler”: specific mutation rate chosen as a function of the dimension and iteration index.

  • crossover (bool) – whether to add a genetic crossover step every other iteration.

Notes

After many papers advocated the mutation rate 1/d for the discrete (1+1), it was proposed to use a randomly drawn mutation rate instead. Fast genetic algorithms are based on a similar idea. These two simple methods perform quite well on a wide range of problems.
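The basic (1+1) strategy with Gaussian mutation can be sketched in a few lines. The step-size constants below implement a one-fifth success rule but are illustrative, not nevergrad's actual values:

```python
import random


def one_plus_one(loss, x0, budget, seed=42):
    # Sketch of a (1+1) evolution strategy with Gaussian mutation and a
    # one-fifth success rule for step-size adaptation.
    rng = random.Random(seed)
    x, fx, sigma = list(x0), loss(x0), 1.0
    for _ in range(budget):
        y = [v + sigma * rng.gauss(0.0, 1.0) for v in x]
        fy = loss(y)
        if fy <= fx:
            x, fx = y, fy
            sigma *= 1.5             # success: widen the search
        else:
            sigma *= 1.5 ** -0.25    # failure: shrink (equilibrium at 1/5 success rate)
    return x, fx
```

At a one-in-five success rate the widening and shrinking factors cancel, which is what keeps the step size stationary near that rate.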

class nevergrad.optimization.optimizerlib.ParametrizedTBPSA(*, naive: bool = True, initial_popsize: Optional[int] = None)

Test-based population-size adaptation. This method, based on adapting the population size, performs best on many noisy optimization problems, even in large dimensions.

Parameters
  • naive (bool) – set to False for noisy problems, so that the recommendation is an average of the final population rather than the single best point.

  • initial_popsize (Optional[int]) – initial (and minimal) population size (default: 4 x dimension)

Note

Derived from: Hellwig, Michael & Beyer, Hans-Georg. (2016). Evolution under Strong Noise: A Self-Adaptive Evolution Strategy Reaches the Lower Performance Bound – the pcCMSA-ES. https://homepages.fhv.at/hgb/New-Papers/PPSN16_HB16.pdf

class nevergrad.optimization.optimizerlib.PolyCMA(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

Combining 20 CMAs. Exactly identical. Active selection at 1/3 of the budget.

class nevergrad.optimization.optimizerlib.Portfolio(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

Passive portfolio of CMA, 2-pt DE and Scr-Hammersley.

class nevergrad.optimization.optimizerlib.SPSA(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

The First order SPSA algorithm as shown in [1,2,3], with implementation details from [4,5].

  1. https://en.wikipedia.org/wiki/Simultaneous_perturbation_stochastic_approximation

  2. https://www.chessprogramming.org/SPSA

  3. Spall, James C. “Multivariate stochastic approximation using a simultaneous perturbation gradient approximation.” IEEE transactions on automatic control 37.3 (1992): 332-341.

  4. Section 7.5.2 in “Introduction to Stochastic Search and Optimization: Estimation, Simulation and Control” by James C. Spall.

  5. Pushpendre Rastogi, Jingyi Zhu, James C. Spall CISS (2016). Efficient implementation of Enhanced Adaptive Simultaneous Perturbation Algorithms.

no_parallelization = True
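The key idea of SPSA is that two function evaluations suffice to approximate the whole gradient, whatever the dimension. A sketch of the simultaneous-perturbation gradient estimate (constants illustrative, not nevergrad's implementation details):

```python
import random


def spsa_gradient(loss, x, c=0.1, seed=0):
    # Two-evaluation simultaneous-perturbation gradient estimate:
    # perturb all coordinates at once along a random Rademacher direction.
    rng = random.Random(seed)
    delta = [rng.choice((-1.0, 1.0)) for _ in x]
    f_plus = loss([v + c * d for v, d in zip(x, delta)])
    f_minus = loss([v - c * d for v, d in zip(x, delta)])
    return [(f_plus - f_minus) / (2.0 * c * d) for d in delta]
```

A finite-difference scheme would need 2d evaluations in dimension d; SPSA trades the extra evaluations for noise in the estimate, which averages out over iterations.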
class nevergrad.optimization.optimizerlib.SQPCMA(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

Passive portfolio of CMA and many SQP.

class nevergrad.optimization.optimizerlib.Shiwa(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

Nevergrad optimizer by competence map. You may modify this one to design your own competence map.

class nevergrad.optimization.optimizerlib.SplitOptimizer(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1, num_optims: Optional[int] = None, num_vars: Optional[List[int]] = None, multivariate_optimizer: Union[ConfiguredOptimizer, Type[Optimizer]] = CMA, monovariate_optimizer: Union[ConfiguredOptimizer, Type[Optimizer]] = RandomSearch, progressive: bool = False, non_deterministic_descriptor: bool = True)

Combines optimizers, each of them working on their own variables.

Parameters
  • num_optims (int or None) – number of optimizers

  • num_vars (list of int or None) – number of variables per optimizer.

  • progressive (bool) – True if we want to progressively add optimizers during the optimization run. If progressive = True, the optimizer is forced to OptimisticNoisyOnePlusOne.

Example

For 5 optimizers, each of them working on 2 variables, one can use:

opt = SplitOptimizer(parametrization=10, num_workers=3, num_optims=5, num_vars=[2, 2, 2, 2, 2])

or equivalently:

opt = SplitOptimizer(parametrization=10, num_workers=3, num_vars=[2, 2, 2, 2, 2])

Given that all optimizers have the same number of variables, one can also run:

opt = SplitOptimizer(parametrization=10, num_workers=3, num_optims=5)

Note

By default, it uses CMA for multivariate groups and RandomSearch for monovariate groups.

Caution

The variables refer to the deep representation used by optimizers. For example, a categorical variable with 5 possible values becomes 5 continuous variables.

class nevergrad.optimization.optimizerlib.TripleCMA(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1)

Combining 3 CMAs. Exactly identical. Active selection at 1/3 of the budget.

class nevergrad.optimization.optimizerlib.cGA(parametrization: Union[int, nevergrad.parametrization.core.Parameter], budget: Optional[int] = None, num_workers: int = 1, arity: Optional[int] = None)

Compact Genetic Algorithm. A discrete optimization algorithm, often used as a first baseline.
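The compact GA replaces an explicit population with a probability vector over bit values. A sketch on bitstrings, with illustrative parameters (not nevergrad's implementation):

```python
import random


def cga(loss, dimension, budget, popsize=50, seed=0):
    # Compact GA sketch: sample two bitstrings from the probability vector,
    # and let the winner of the pairwise duel shift the vector by 1/popsize
    # on every bit where the two strings disagree.
    rng = random.Random(seed)
    p = [0.5] * dimension
    for _ in range(budget):
        a = [1 if rng.random() < q else 0 for q in p]
        b = [1 if rng.random() < q else 0 for q in p]
        if loss(a) > loss(b):
            a, b = b, a                  # make a the winner (lower loss)
        for i in range(dimension):
            if a[i] != b[i]:
                shift = 1.0 / popsize if a[i] == 1 else -1.0 / popsize
                p[i] = min(1.0, max(0.0, p[i] + shift))
    return [1 if q >= 0.5 else 0 for q in p]
```

The memory footprint is one float per variable instead of popsize bitstrings, which is what makes the algorithm "compact".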

nevergrad.optimization.optimizerlib.learn_on_k_best(archive: nevergrad.optimization.utils.Archive[nevergrad.optimization.utils.MultiValue], k: int) → Union[Tuple[float, ], List[float], numpy.ndarray]

Approximate optimum learnt from the k best.

Parameters

archive (utils.Archive[utils.Value]) –