Optimizers API Reference
Optimizer API
All the optimizers share the following common API:
- class nevergrad.optimizers.base.Optimizer(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
Algorithm framework with 3 main functions:
- ask(): provides a candidate on which to evaluate the function to optimize.
- tell(candidate, loss): lets you provide the loss associated with evaluated points.
- provide_recommendation(): provides the best final candidate.
Typically, one would call ask() num_workers times, evaluate the function on these num_workers points in parallel, update with the fitness value when each evaluation is finished, and iterate until the budget is over. At the very end, one would call provide_recommendation() for the estimated optimum.
This class is abstract; it provides internal equivalents for the 3 main functions, among which at least _internal_ask_candidate has to be overridden.
Each optimizer instance should be used only once, with the initially provided budget.
- Parameters
parametrization (int or Parameter) – either the dimension of the optimization space, or its parametrization
budget (int/None) – number of allowed evaluations
num_workers (int) – number of evaluations which will be run in parallel at once
- ask() → Parameter
Provides a point to explore. This function can be called multiple times to explore several points in parallel.
- Returns
The candidate to try on the objective function. p.Parameter objects have fields args and kwargs which can be used directly on the function (objective_function(*candidate.args, **candidate.kwargs)).
- Return type
p.Parameter
- property dimension: int
Dimension of the optimization space.
- Type
int
- dump(filepath: Union[str, Path]) None
Pickles the optimizer into a file.
- enable_pickling() None
Some optimizers are only optionally picklable, because picklability requires saving the whole history which would be a waste of memory in general. To tell an optimizer to be picklable, call this function before any asks.
In this base class, the function is a no-op, but it is overridden in some optimizers.
- classmethod load(filepath: Union[str, Path]) X
Loads a pickle and checks that the class is correct.
- minimize(objective_function: Callable[[...], Union[float, Tuple[float, ...], List[float], ndarray]], executor: Optional[ExecutorLike] = None, batch_mode: bool = False, verbosity: int = 0, constraint_violation: Optional[Any] = None) Parameter
Optimization (minimization) procedure
- Parameters
objective_function (callable) – A callable to optimize (minimize)
executor (Executor) – An executor object, with method submit(callable, *args, **kwargs) and returning a Future-like object with methods done() -> bool and result() -> float. The executor's role is to dispatch the execution of the jobs locally/on a cluster/with multithreading, depending on the implementation, e.g. concurrent.futures.ProcessPoolExecutor.
batch_mode (bool) – when num_workers = n > 1, whether jobs are executed by batch (n function evaluations are launched, we wait for all results and relaunch n evals) or not (whenever an evaluation is finished, we launch another one)
verbosity (int) – print information about the optimization (0: none, 1: fitness values, 2: fitness values and recommendation)
constraint_violation (list of functions or None) – each function in the list returns > 0 for a violated constraint.
- Returns
The candidate with minimal value. ng.p.Parameter objects have fields args and kwargs which can be used directly on the function (objective_function(*candidate.args, **candidate.kwargs)).
- Return type
ng.p.Parameter
Note
For evaluation purposes and with the current implementation, it is better to use batch_mode=True
- property num_ask: int
Number of times the ask method was called.
- Type
int
- property num_objectives: int
Provides 0 if the number is not known yet, else the number of objectives to optimize upon.
- property num_tell: int
Number of times the tell method was called.
- Type
int
- property num_tell_not_asked: int
Number of times the tell method was called on candidates that were not asked for by the optimizer (or were suggested).
- Type
int
- pareto_front(size: Optional[int] = None, subset: str = 'random', subset_tentatives: int = 12) List[Parameter]
Pareto front, as a list of Parameter. The losses can be accessed through parameter.losses
- Parameters
size (int (optional)) – if provided, selects a subset of the full pareto front with the given maximum size
subset (str) – method for selecting the subset (“random”, “loss-covering”, “domain-covering”, “hypervolume”)
subset_tentatives (int) – number of random tentatives for finding a better subset
- Returns
the list of Parameter of the pareto front
- Return type
list
Note
During non-multiobjective optimization, this returns the current pessimistic best
- provide_recommendation() → Parameter
Provides the best point to use as a minimum, given the budget that was used.
- Returns
The candidate with minimal value. p.Parameter objects have fields args and kwargs which can be used directly on the function (objective_function(*candidate.args, **candidate.kwargs)).
- Return type
p.Parameter
- recommend() → Parameter
Provides the best candidate to use as a minimum, given the budget that was used.
- Returns
The candidate with minimal loss. p.Parameter objects have fields args and kwargs which can be used directly on the function (objective_function(*candidate.args, **candidate.kwargs)).
- Return type
p.Parameter
- register_callback(name: str, callback: Union[Callable[[Optimizer, Parameter, float], None], Callable[[Optimizer], None]]) None
Adds a callback method called whenever ask or tell is called, with the same arguments (including the optimizer / self). This can be useful for custom logging.
- Parameters
name (str) – name of the method to register the callback for (either ask or tell)
callback (callable) – a callable taking the same parameters as the method it is registered upon (including self)
- remove_all_callbacks() None
Removes all registered callables
- suggest(*args: Any, **kwargs: Any) None
Suggests a new point to ask. It will be asked at the next call (last in first out).
- Parameters
*args (Any) – positional arguments matching the parametrization pattern.
**kwargs (Any) – keyword arguments matching the parametrization pattern.
Note
This relies on optimizers implementing a way to deal with unasked candidates. Some optimizers may not support it and will raise a TellNotAskedNotSupportedError at tell time. LIFO is used so as to be able to suggest and ask straightaway, as an alternative to creating a new candidate with optimizer.parametrization.spawn_child(new_value)
- tell(candidate: Parameter, loss: Union[float, Tuple[float, ...], List[float], ndarray], constraint_violation: Optional[Union[Tuple[float, ...], List[float], ndarray, float]] = None, penalty_style: Optional[Union[Tuple[float, ...], List[float], ndarray]] = None) None
Provides the optimizer with the evaluation of a fitness value for a candidate.
- Parameters
candidate (p.Parameter) – the candidate (point) that was evaluated
loss (float/list/np.ndarray) – loss of the function (or losses of the multi-objective function)
constraint_violation (float/list/np.ndarray/None) – constraint violation (> 0 means that this is not correct)
penalty_style (ArrayLike/None) – to be read as [a,b,c,d,e,f] with cv the constraint violation vector (above): penalty = (a + sum(|loss|)) * (f+num_tell)**e * (b * sum(cv**c)) ** d default: [1e5, 1., .5, 1., .5, 1.]
Note
The candidate should generally be one provided by ask(), but can also be a non-asked candidate. To create a p.Parameter instance from args and kwargs, you can use candidate = optimizer.parametrization.spawn_child(new_value=your_value):
for an Array(shape=(2,)): optimizer.parametrization.spawn_child(new_value=[12, 12])
for an Instrumentation: optimizer.parametrization.spawn_child(new_value=(args, kwargs))
Alternatively, you can provide a suggestion with optimizer.suggest(*args, **kwargs); the next ask will use this suggestion.
Callbacks
Callbacks can be registered through optimizer.register_callback for call on either the ask or tell method. Two of them are available through the ng.callbacks namespace.
- class nevergrad.callbacks.EarlyStopping(stopping_criterion: Callable[[Optimizer], bool])
Callback for stopping the minimize method before the budget is fully used.
- Parameters
stopping_criterion (func(optimizer) -> bool) – function that takes the current optimizer as input and returns True if the minimization must be stopped
Note
This callback must be registered on the “ask” method only.
Example
In the following code, the minimize method will be stopped at the 4th “ask”:
>>> early_stopping = ng.callbacks.EarlyStopping(lambda opt: opt.num_ask > 3)
>>> optimizer.register_callback("ask", early_stopping)
>>> optimizer.minimize(_func, verbosity=2)
A couple of other options (equivalent in the case of non-noisy optimization) for stopping if the loss is below 12:
>>> early_stopping = ng.callbacks.EarlyStopping(lambda opt: opt.recommend().loss < 12)
>>> early_stopping = ng.callbacks.EarlyStopping(lambda opt: opt.current_bests["minimum"].mean < 12)
- classmethod timer(max_duration: float) EarlyStopping
Early stops when max_duration seconds have elapsed (counting from the first ask)
- class nevergrad.callbacks.OptimizerDump(filepath: Union[str, Path])
Dumps the optimizer to a pickle file at every call.
- Parameters
filepath (str or Path) – path to the pickle file
- class nevergrad.callbacks.ParametersLogger(filepath: Union[str, Path], append: bool = True, order: int = 1)
Logs parameter and run information into a file throughout the optimization.
- Parameters
filepath (str or pathlib.Path) – the path to dump data to
append (bool) – whether to append the file (otherwise it replaces it)
order (int) – order of the internal/model parameters to extract
Example
logger = ParametersLogger(filepath)
optimizer.register_callback("tell", logger)
optimizer.minimize()
list_of_dict_of_data = logger.load()
Note
Arrays are converted to lists
- load() List[Dict[str, Any]]
Loads data from the log file
- load_flattened(max_list_elements: int = 24) List[Dict[str, Any]]
Loads data from the log file, and splits lists (arrays) into multiple arguments
- Parameters
max_list_elements (int) – Maximum number of elements displayed from the array, each element is given a unique id of type list_name#i0_i1_…
- to_hiplot_experiment(max_list_elements: int = 24) Any
Converts the logs into an hiplot experiment for display.
- Parameters
max_list_elements (int) – maximum number of elements of list/arrays to export (only the first elements are extracted)
Example
exp = logs.to_hiplot_experiment()
exp.display(force_full_width=True)
Note
You can easily change the axes of the XY plot:
exp.display_data(hip.Displays.XY).update({'axis_x': '0#0', 'axis_y': '0#1'})
For more context about hiplot, check its documentation.
- class nevergrad.callbacks.ProgressBar
Progress bar to register as a callback in an optimizer
Configurable optimizers
Configurable optimizers share the following API to create optimizers instances:
- class nevergrad.optimizers.base.ConfiguredOptimizer(OptimizerClass: Union[ConfiguredOptimizer, Type[Optimizer]], config: Dict[str, Any], as_config: bool = False)
Creates optimizer-like instances with configuration.
- Parameters
OptimizerClass (type) – class of the optimizer to configure, or another ConfiguredOptimizer (config will then be ignored except for the optimizer name/representation)
config (dict) – dictionary of all the configurations
as_config (bool) – whether to provide all config entries as kwargs to the optimizer instantiation (default; see ConfiguredCMA for an example), or through a config kwarg referencing self (if True; see EvolutionStrategy for an example)
Note
This provides a default repr which can be bypassed through set_name
- __call__(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1) Optimizer
Creates an optimizer from the parametrization
- Parameters
parametrization (int or Parameter) – either the dimension of the optimization space, or its parametrization
budget (int/None) – number of allowed evaluations
num_workers (int) – number of evaluations which will be run in parallel at once
- set_name(name: str, register: bool = False) ConfiguredOptimizer
Set a new representation for the instance
Here is a list of the available configurable optimizers:
Parametrizable families of optimizers.
Caution
This module and its available classes are experimental and may change quickly in the near future.
- class nevergrad.families.BayesOptim(*, init_budget: Optional[int] = None, pca: Optional[bool] = False, n_components: Optional[float] = 0.95, prop_doe_factor: Optional[float] = None)
Algorithms from bayes-optim package.
We use:
- BO
- PCA-BO: Principal Component Analysis (PCA) Bayesian Optimization for dimensionality reduction in BO
References
- [RaponiWB+20]
Raponi, Elena, Hao Wang, Mariusz Bujny, Simonetta Boria, and Carola Doerr. “High dimensional bayesian optimization assisted by principal component analysis.” In International Conference on Parallel Problem Solving from Nature, pp. 169-183. Springer, Cham, 2020.
- Parameters
init_budget (int or None) – Number of initialization algorithm steps
pca (bool) – whether to use the PCA transformation defining PCA-BO rather than BO
n_components (float) – principal axes in feature space, representing the directions of maximum variance in the data; expressed as the percentage of explained variance (default: 0.95)
prop_doe_factor (float or None) – percentage of the initial budget used for the DoE, possibly overriding init_budget
- class nevergrad.families.Chaining(optimizers: Sequence[Union[ConfiguredOptimizer, Type[Optimizer]]], budgets: Sequence[Union[str, int]])
A chaining consists of running algorithm 1 for a budget T1, then algorithm 2 for T2, then algorithm 3 for T3, etc. Each algorithm is fed with what happened before it.
- Parameters
optimizers (list of Optimizer classes) – the sequence of optimizers to use
budgets (list of int) – the corresponding budgets for each optimizer but the last one
- class nevergrad.families.ConfPSO(transform: str = 'identity', popsize: Optional[int] = None, omega: float = 0.7213475204444817, phip: float = 1.1931471805599454, phig: float = 1.1931471805599454)
Particle Swarm Optimization is based on a set of particles with their inertia. Wikipedia provides a beautiful illustration ;) (see link)
- Parameters
transform (str) – name of the transform to use to map from PSO optimization space to R-space.
popsize (int) – population size of the particle swarm. Defaults to max(40, num_workers)
omega (float) – particle swarm optimization parameter
phip (float) – particle swarm optimization parameter
phig (float) – particle swarm optimization parameter
Note
Using non-default “transform” and “wide” parameters can lead to extreme values
Implementation partially following SPSO2011. However, no randomization of the population order.
Reference: M. Zambrano-Bigiarini, M. Clerc and R. Rojas, Standard Particle Swarm Optimisation 2011 at CEC-2013: A baseline for future PSO improvements, 2013 IEEE Congress on Evolutionary Computation, Cancun, 2013, pp. 2337-2344. https://ieeexplore.ieee.org/document/6557848
- class nevergrad.families.ConfPortfolio(*, optimizers: Sequence[Union[Optimizer, ConfiguredOptimizer, Type[Optimizer], str]] = (), warmup_ratio: Optional[float] = None)
Alternates
ask()
on several optimizers- Parameters
optimizers (list of Optimizer, optimizer name, Optimizer class or ConfiguredOptimizer) – the list of optimizers to use.
warmup_ratio (optional float) – ratio of the budget used before choosing to focus on one optimizer
Notes
if providing an initialized optimizer, the parametrization of the optimizer must be the exact same instance as the one of the Portfolio.
this API is temporary and will be renamed very soon
- class nevergrad.families.ConfSplitOptimizer(*, num_optims: Optional[float] = None, num_vars: Optional[List[int]] = None, max_num_vars: Optional[int] = None, multivariate_optimizer: Union[ConfiguredOptimizer, Type[Optimizer]] = MetaCMA, monovariate_optimizer: Union[ConfiguredOptimizer, Type[Optimizer]] = RandomSearch, progressive: bool = False, non_deterministic_descriptor: bool = True)
Combines optimizers, each of them working on their own variables.
- Parameters
num_optims (int or float("inf")) – number of optimizers to create (if not provided through num_vars or max_num_vars)
num_vars (list of int or None) – number of variables per optimizer (should not be used if max_num_vars or num_optims is set)
max_num_vars (int or None) – maximum number of variables per optimizer; should not be defined if num_vars or num_optims is defined, since they will be chosen automatically
progressive (optional bool) – whether we progressively add optimizers.
non_deterministic_descriptor (bool) – the subparts' parametrization descriptor is set to a noisy function. This can have an impact on optimizer selection for competence maps.
Example
for 5 optimizers, each of them working on 2 variables, one can use:
opt = ConfSplitOptimizer(num_vars=[2, 2, 2, 2, 2])(parametrization=10, num_workers=3)
or equivalently:
opt = SplitOptimizer(parametrization=10, num_workers=3, num_vars=[2, 2, 2, 2, 2])
Given that all optimizers have the same number of variables, one can also run:
opt = SplitOptimizer(parametrization=10, num_workers=3, num_optims=5)
Note
By default, it uses CMA for multivariate groups and RandomSearch for monovariate groups.
Caution
The variables refer to the deep representation used by optimizers. For example, a categorical variable with 5 possible values becomes 5 continuous variables.
- class nevergrad.families.DifferentialEvolution(*, initialization: str = 'parametrization', scale: Union[str, float] = 1.0, recommendation: str = 'optimistic', crossover: Union[str, float] = 0.5, F1: float = 0.8, F2: float = 0.8, popsize: Union[str, int] = 'standard', propagate_heritage: bool = False, multiobjective_adaptation: bool = True, high_speed: bool = False)
Differential evolution is typically used for continuous optimization. It uses differences between points in the population to perform mutations in fruitful directions; it is therefore a kind of covariance adaptation without any explicit covariance, making it super fast in high dimension. This class implements several variants of differential evolution, some of them adapted to genetic mutations as in Holland's work (this combination is termed TwoPointsDE in Nevergrad, corresponding to crossover="twopoints"), or to the noisy setting (coined NoisyDE, corresponding to recommendation="noisy"). In that last case, the optimizer returns the mean of the individuals with fitness better than the median, which might sometimes be a poor choice.
Default settings are CR = .5, F1 = .8, F2 = .8, curr-to-best, and a population size of 30. Initial population: pure random.
- Parameters
initialization ("parametrization", "LHS" or "QR") – algorithm/distribution used for the initialization phase. If “parametrization”, this uses the sample method of the parametrization.
scale (float or str) – scale of random component of the updates
recommendation ("pessimistic", "optimistic", "mean" or "noisy") – choice of the criterion for the best point to recommend
crossover (float or str) – crossover rate value, or strategy among:
“dimension”: crossover rate of 1 / dimension
“random”: different random (uniform) crossover rate at each iteration
“onepoint”: one-point crossover
“twopoints”: two-points crossover
“rotated_twopoints”: more genetic two-points crossover
“parametrization”: use the parametrization's recombine method
F1 (float) – differential weight #1
F2 (float) – differential weight #2
popsize (int, "standard", "dimension", "large") – size of the population to use. “standard” is max(num_workers, 30), “dimension” max(num_workers, 30, dimension +1) and “large” max(num_workers, 30, 7 * dimension).
multiobjective_adaptation (bool) – Automatically adapts to handle multiobjective case. This is a very basic experimental version, activated by default because the non-multiobjective implementation is performing very badly.
high_speed (bool) – Trying to make the optimization faster by a metamodel for the recommendation step.
- class nevergrad.families.EMNA(*, isotropic: bool = True, naive: bool = True, population_size_adaptation: bool = False, initial_popsize: Optional[int] = None)
Estimation of Multivariate Normal Algorithm This algorithm is quite efficient in a parallel context, i.e. when the population size is large.
- Parameters
isotropic (bool) – isotropic version of EMNA if True, i.e. we have an identity matrix for the Gaussian; else we consider the separable version, meaning we have a diagonal matrix for the Gaussian (anisotropic)
naive (bool) – set to False for noisy problems, so that the best points will be an average of the final population.
population_size_adaptation (bool) – population size automatically adapts to the landscape
initial_popsize (Optional[int]) – initial (and minimal) population size (default: 4 x dimension)
- class nevergrad.families.EvolutionStrategy(*, recombination_ratio: float = 0, popsize: int = 40, offsprings: Optional[int] = None, only_offsprings: bool = False, ranker: str = 'nsga2')
Experimental evolution-strategy-like algorithm. The API is going to evolve.
- Parameters
recombination_ratio (float) – probability of using a recombination (after the mutation) for generating new offsprings
popsize (int) – population size of the parents (lambda)
offsprings (int) – number of generated offsprings (mu)
only_offsprings (bool) – use only offsprings for the new generation if True (True: lambda,mu, False: lambda+mu)
ranker (str) – ranker for the multiobjective case (defaults to NSGA2)
- class nevergrad.families.NoisySplit(*, num_optims: Optional[float] = None, discrete: bool = False)
Non-progressive noisy split of variables based on 1+1
- Parameters
num_optims (optional int) – number of optimizers (one per variable if float(“inf”))
discrete (bool) – uses OptimisticDiscreteOnePlusOne if True, else NoisyOnePlusOne
- class nevergrad.families.NonObjectOptimizer(*, method: str = 'Nelder-Mead', random_restart: bool = False)
Wrapper over Scipy optimizer implementations, in standard ask and tell format. This is actually an import from scipy.optimize, including Sequential Quadratic Programming.
- Parameters
method (str) –
Name of the method to use among:
Nelder-Mead
COBYLA
SQP (or SLSQP): very powerful e.g. in continuous noisy optimization. It is based on approximating the objective function by quadratic models.
Powell
NLOPT* (https://nlopt.readthedocs.io/en/latest/; by default, uses Sbplx, based on Subplex); can be NLOPT, NLOPT_LN_SBPLX, NLOPT_LN_PRAXIS, NLOPT_GN_DIRECT, NLOPT_GN_DIRECT_L, NLOPT_GN_CRS2_LM, NLOPT_GN_AGS, NLOPT_GN_ISRES, NLOPT_GN_ESCH, NLOPT_LN_COBYLA, NLOPT_LN_BOBYQA, NLOPT_LN_NEWUOA_BOUND, NLOPT_LN_NELDERMEAD.
random_restart (bool) – whether to restart at a random point if the optimizer converged but the budget is not entirely spent yet (otherwise, restarts from best point)
Note
These optimizers do not support asking several candidates in a row
- class nevergrad.families.ParametrizedBO(*, initialization: Optional[str] = None, init_budget: Optional[int] = None, middle_point: bool = False, utility_kind: str = 'ucb', utility_kappa: float = 2.576, utility_xi: float = 0.0, gp_parameters: Optional[Dict[str, Any]] = None)
Bayesian optimization. Hyperparameter tuning method, based on statistical modeling of the objective function. This class is a wrapper over the bayes_opt package.
- Parameters
initialization (str) – Initialization algorithms (None, “Hammersley”, “random” or “LHS”)
init_budget (int or None) – Number of initialization algorithm steps
middle_point (bool) – whether to sample the 0 point first
utility_kind (str) – Type of utility function to use among “ucb”, “ei” and “poi”
utility_kappa (float) – Kappa parameter for the utility function
utility_xi (float) – Xi parameter for the utility function
gp_parameters (dict) – dictionary of parameters for the Gaussian process
- class nevergrad.families.ParametrizedCMA(*, scale: float = 1.0, elitist: bool = False, popsize: Optional[int] = None, popsize_factor: float = 3.0, diagonal: bool = False, high_speed: bool = False, fcmaes: bool = False, random_init: bool = False, inopts: Optional[Dict[str, Any]] = None)
CMA-ES optimizer. This evolution strategy uses Gaussian sampling, iteratively modified for searching in the best directions. This optimizer wraps an external implementation: https://github.com/CMA-ES/pycma
- Parameters
scale (float) – scale of the search
elitist (bool) – whether to switch to elitist mode, i.e. “+” mode instead of “,” mode, in which we always keep the best point in the population.
popsize (Optional[int] = None) – population size, should be n * self.num_workers for int n >= 1. default is max(self.num_workers, 4 + int(3 * np.log(self.dimension)))
popsize_factor (float = 3.) – factor in the formula for computing the population size
diagonal (bool) – use the diagonal version of CMA (advised in big dimension)
high_speed (bool) – use metamodel for recommendation
fcmaes (bool) – use the fast implementation; doesn't support diagonal=True. Produces equivalent results and is preferable for high dimensions or if objective function evaluation is fast.
random_init (bool) – use a randomized initialization
inopts (optional dict) – use this to override any inopts parameter of the wrapped CMA optimizer (see https://github.com/CMA-ES/pycma)
- class nevergrad.families.ParametrizedMetaModel(*, multivariate_optimizer: Optional[Union[ConfiguredOptimizer, Type[Optimizer]]] = None, frequency_ratio: float = 0.9)
Adds a metamodel to an optimizer. The optimizer is always OnePlusOne if dimension is 1.
- Parameters
multivariate_optimizer (base.OptCls or None) – Optimizer to which the metamodel is added
frequency_ratio (float) – used for deciding the frequency at which we use the metamodel
- class nevergrad.families.ParametrizedOnePlusOne(*, noise_handling: Optional[Union[str, Tuple[str, float]]] = None, tabu_length: int = 0, mutation: str = 'gaussian', crossover: bool = False, rotation: bool = False, annealing: str = 'none', use_pareto: bool = False, sparse: bool = False, smoother: bool = False)
Simple but sometimes powerful class of optimization algorithms. This uses asynchronous updates, so that (1+1) can actually be parallel and even performs quite well in such a context; this is naturally close to (1+lambda).
- Parameters
noise_handling (str or Tuple[str, float]) –
Method for handling the noise. The name can be:
”random”: a random point is reevaluated regularly; this uses the one-fifth adaptation rule, going back to Schumer and Steiglitz (1968). It was independently rediscovered by Devroye (1972) and Rechenberg (1973).
”optimistic”: the best optimistic point is reevaluated regularly; optimism in front of uncertainty.
A coefficient can be provided to tune the regularity of these reevaluations (default: .05).
mutation (str) –
One of the available mutations from:
”gaussian”: standard mutation by adding a Gaussian random variable (with progressive widening) to the best pessimistic point
”cauchy”: same as Gaussian but with a Cauchy distribution.
”discrete”: when a variable is mutated (which happens with probability 1/d in dimension d), it is simply randomly drawn. This means that, on average, only one variable is mutated.
”discreteBSO”: as in brainstorm optimization, we slowly decrease the mutation rate from 1 to 1/d.
”fastga”: FastGA mutations from the current best
”doublefastga”: double-FastGA mutations from the current best (Doerr et al, Fast Genetic Algorithms, 2017)
”rls”: Randomized Local Search (randomly mutate one and only one variable).
”portfolio”: random number of mutated bits (called uniform mixing in Dang & Lehre “Self-adaptation of Mutation Rates in Non-elitist Population”, 2016)
”lengler”: specific mutation rate chosen as a function of the dimension and iteration index.
crossover (bool) – whether to add a genetic crossover step every other iteration.
use_pareto (bool) – whether to restart from a random pareto element in multiobjective mode, instead of the last one added
sparse (bool) – whether we have random mutations setting variables to 0.
smoother (bool) – whether we suggest smooth mutations.
Notes
After many papers advocated the mutation rate 1/d in the discrete (1+1) case, it was proposed to use a randomly drawn mutation rate. Fast genetic algorithms are based on a similar idea. These two simple methods perform quite well on a wide range of problems.
- class nevergrad.families.ParametrizedTBPSA(*, naive: bool = True, initial_popsize: Optional[int] = None)
Test-based population-size adaptation. This method, based on adapting the population size, performs best in many noisy optimization problems, even in large dimension.
- Parameters
naive (bool) – set to False for noisy problems, so that the best points will be an average of the final population.
initial_popsize (Optional[int]) – initial (and minimal) population size (default: 4 x dimension)
Note
Derived from: Hellwig, Michael & Beyer, Hans-Georg. (2016). Evolution under Strong Noise: A Self-Adaptive Evolution Strategy Reaches the Lower Performance Bound – the pcCMSA-ES. https://homepages.fhv.at/hgb/New-Papers/PPSN16_HB16.pdf
- class nevergrad.families.Pymoo(*, algorithm: str)
Wrapper over Pymoo optimizer implementations, in standard ask and tell format. This is actually an import from Pymoo Optimize.
- Parameters
algorithm (str) – use “algorithm-name” with the following names to access the algorithm classes:
Single-objective: “de”, “ga”, “brkga”, “nelder-mead”, “pattern-search”, “cmaes”
Multi-objective: “nsga2”
Multi-objective requiring reference directions, points or lines: “rnsga2”, “nsga3”, “unsga3”, “rnsga3”, “moead”, “ctaea”
Note
These optimizers do not support asking several candidates in a row
- class nevergrad.families.RandomSearchMaker(*, middle_point: bool = False, stupid: bool = False, opposition_mode: Optional[str] = None, sampler: str = 'parametrization', scale: Union[float, str] = 1.0, recommendation_rule: str = 'pessimistic')
Provides random suggestions.
- Parameters
stupid (bool) – Provides a random recommendation instead of the best point so far (for baseline)
middle_point (bool) – enforces that the first suggested point (ask) is zero.
opposition_mode (str or None) – symmetrizes exploration with respect to the center (e.g. https://ieeexplore.ieee.org/document/4424748):
full symmetry if “opposite”
random × symmetric if “quasi”
sampler (str) –
“parametrization”: uses the default sample() method of the parametrization, which samples uniformly between bounds when they are set, and with a Gaussian otherwise
“gaussian”: uses a Gaussian distribution
“cauchy”: uses a Cauchy distribution instead of a Gaussian distribution
scale (float or "random") – scalar for multiplying the suggested point values, or a string:
“random”: uses a randomized pattern for the scale.
“auto”: scales as a function of dimension and budget (version 1: sigma = (1 + log(budget)) / (4 log(dimension)))
“autotune”: scales as a function of dimension and budget (version 2: sigma = sqrt(log(budget) / dimension))
recommendation_rule (str) – “average_of_best” or “pessimistic” or “average_of_exp_best”; “pessimistic” is the default and implies selecting the pessimistic best.
- class nevergrad.families.SamplingSearch(*, sampler: str = 'Halton', scrambled: bool = False, middle_point: bool = False, opposition_mode: Optional[str] = None, cauchy: bool = False, autorescale: Union[bool, str] = False, scale: float = 1.0, rescaled: bool = False, recommendation_rule: str = 'pessimistic')
This is a one-shot optimization method, hopefully better than random search by ensuring more uniformity.
- Parameters
sampler (str) – Choice of the sampler among “Halton”, “Hammersley” and “LHS”.
scrambled (bool) – Adds scrambling to the search; much better in high dimension and rarely worse than the original search.
middle_point (bool) – enforces that the first suggested point (ask) is zero.
cauchy (bool) – use Cauchy inverse distribution instead of Gaussian when fitting points to real space (instead of box).
scale (float or "random") – scalar for multiplying the suggested point values.
rescaled (bool or str) – rescales the sampling pattern to reach the boundaries and/or applies automatic rescaling.
recommendation_rule (str) – “average_of_best” or “pessimistic”; “pessimistic” is the default and implies selecting the pessimistic best.
Notes
Halton is a low quality sampling method when the dimension is high; it is usually better to use Halton with scrambling.
When the budget is known in advance, it is also better to replace Halton by Hammersley: the key difference with Halton is that one coordinate is evenly spaced, which improves the discrepancy. With a known budget, low-discrepancy sequences (e.g. scrambled Hammersley) have a better discrepancy.
Reference: Halton 1964: Algorithm 247: Radical-inverse quasi-random point sequence, ACM, p. 701. The scrambled variant adds scrambling to the Halton search; it is much better in high dimension and rarely worse than the original Halton search.
About Latin Hypercube Sampling (LHS): Though partially incremental versions exist, this implementation needs the budget in advance. This can be great in terms of discrepancy when the budget is not very high.
Optimizers
Here are all the other optimizers available in nevergrad:
Caution
Only non-family-based optimizers are listed in the documentation; you can get a full list of available optimizers with sorted(nevergrad.optimizers.registry.keys())
- class nevergrad.optimization.optimizerlib.BayesOptim(*, init_budget: Optional[int] = None, pca: Optional[bool] = False, n_components: Optional[float] = 0.95, prop_doe_factor: Optional[float] = None)
Algorithms from bayes-optim package.
We use:
- BO
- PCA-BO: Principal Component Analysis (PCA) Bayesian Optimization for dimensionality reduction in BO
References
- [RaponiWB+20]
Raponi, Elena, Hao Wang, Mariusz Bujny, Simonetta Boria, and Carola Doerr. “High dimensional bayesian optimization assisted by principal component analysis.” In International Conference on Parallel Problem Solving from Nature, pp. 169-183. Springer, Cham, 2020.
- Parameters
init_budget (int or None) – Number of initialization algorithm steps
pca (bool) – whether to use the PCA transformation defining PCA-BO rather than BO
n_components (float) – proportion of explained variance to keep; the principal axes in feature space represent the directions of maximum variance in the data (default 0.95)
prop_doe_factor (float or None) – percentage of the initial budget used for the design of experiments (DoE), possibly overriding init_budget
- no_parallelization = True
- recast = True
- class nevergrad.optimization.optimizerlib.CM(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
Competence map, simplest.
- class nevergrad.optimization.optimizerlib.CMandAS2(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
Competence map, with algorithm selection in one of the cases (3 CMAs).
- class nevergrad.optimization.optimizerlib.CMandAS3(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
Competence map, with algorithm selection in one of the cases (3 CMAs).
- class nevergrad.optimization.optimizerlib.Chaining(optimizers: Sequence[Union[ConfiguredOptimizer, Type[Optimizer]]], budgets: Sequence[Union[str, int]])
A chaining consists in running algorithm 1 during T1, then algorithm 2 during T2, then algorithm 3 during T3, etc. Each algorithm is fed with what happened before it.
- Parameters
optimizers (list of Optimizer classes) – the sequence of optimizers to use
budgets (list of int) – the corresponding budgets for each optimizer but the last one
- class nevergrad.optimization.optimizerlib.ChoiceBase(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
Nevergrad optimizer by competence map.
- enable_pickling() None
Some optimizers are only optionally picklable, because picklability requires saving the whole history which would be a waste of memory in general. To tell an optimizer to be picklable, call this function before any asks.
In this base class, the function is a no-op, but it is overridden in some optimizers.
- class nevergrad.optimization.optimizerlib.ConfPSO(transform: str = 'identity', popsize: Optional[int] = None, omega: float = 0.7213475204444817, phip: float = 1.1931471805599454, phig: float = 1.1931471805599454)
Particle Swarm Optimization is based on a set of particles with their inertia. Wikipedia provides a beautiful illustration: https://en.wikipedia.org/wiki/Particle_swarm_optimization
- Parameters
transform (str) – name of the transform to use to map from PSO optimization space to R-space.
popsize (int) – population size of the particle swarm. Defaults to max(40, num_workers)
omega (float) – particle swarm optimization parameter
phip (float) – particle swarm optimization parameter
phig (float) – particle swarm optimization parameter
Note
Using non-default “transform” and “wide” parameters can lead to extreme values
Implementation partially following SPSO2011. However, no randomization of the population order.
Reference: M. Zambrano-Bigiarini, M. Clerc and R. Rojas, Standard Particle Swarm Optimisation 2011 at CEC-2013: A baseline for future PSO improvements, 2013 IEEE Congress on Evolutionary Computation, Cancun, 2013, pp. 2337-2344. https://ieeexplore.ieee.org/document/6557848
- class nevergrad.optimization.optimizerlib.ConfPortfolio(*, optimizers: Sequence[Union[Optimizer, ConfiguredOptimizer, Type[Optimizer], str]] = (), warmup_ratio: Optional[float] = None)
Alternates ask() on several optimizers.
- Parameters
optimizers (list of Optimizer, optimizer name, Optimizer class or ConfiguredOptimizer) – the list of optimizers to use.
warmup_ratio (optional float) – ratio of the budget used before choosing to focus on one optimizer
Notes
if providing an initialized optimizer, the parametrization of the optimizer must be the exact same instance as the one of the Portfolio.
this API is temporary and will be renamed very soon
- class nevergrad.optimization.optimizerlib.ConfSplitOptimizer(*, num_optims: ~typing.Optional[float] = None, num_vars: ~typing.Optional[~typing.List[int]] = None, max_num_vars: ~typing.Optional[int] = None, multivariate_optimizer: ~typing.Union[ConfiguredOptimizer, ~typing.Type[Optimizer]] = <class 'nevergrad.optimization.optimizerlib.MetaCMA'>, monovariate_optimizer: ~typing.Union[ConfiguredOptimizer, ~typing.Type[Optimizer]] = RandomSearch, progressive: bool = False, non_deterministic_descriptor: bool = True)
Combines optimizers, each of them working on their own variables.
- Parameters
num_optims (int or float("inf")) – number of optimizers to create (if not provided through num_vars or max_num_vars)
num_vars (list of int or None) – number of variables per optimizer (should not be used if max_num_vars or num_optims is set)
max_num_vars (int or None) – maximum number of variables per optimizer; should not be defined if num_vars or num_optims is defined since they will be chosen automatically
progressive (optional bool) – whether we progressively add optimizers
non_deterministic_descriptor (bool) – subparts parametrization descriptor is set to noisy function; this can have an impact on optimizer selection for competence maps
Example
for 5 optimizers, each of them working on 2 variables, one can use:
opt = ConfSplitOptimizer(num_vars=[2, 2, 2, 2, 2])(parametrization=10, num_workers=3)
or equivalently:
opt = SplitOptimizer(parametrization=10, num_workers=3, num_vars=[2, 2, 2, 2, 2])
Given that all optimizers have the same number of variables, one can also run:
opt = SplitOptimizer(parametrization=10, num_workers=3, num_optims=5)
Note
By default, it uses CMA for multivariate groups and RandomSearch for monovariate groups.
Caution
The variables refer to the deep representation used by optimizers. For example, a categorical variable with 5 possible values becomes 5 continuous variables.
- class nevergrad.optimization.optimizerlib.EDA(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
Test-based population-size adaptation.
Population size equal to lambda = 4 x dimension. Test by comparing the first fifth and the last fifth of the 5 x lambda evaluations.
Caution
This optimizer is probably wrong.
- class nevergrad.optimization.optimizerlib.EMNA(*, isotropic: bool = True, naive: bool = True, population_size_adaptation: bool = False, initial_popsize: Optional[int] = None)
Estimation of Multivariate Normal Algorithm. This algorithm is quite efficient in a parallel context, i.e. when the population size is large.
- Parameters
isotropic (bool) – isotropic version of EMNA if True, i.e. we have an identity matrix for the Gaussian; otherwise we consider the separable version, meaning a diagonal matrix for the Gaussian (anisotropic)
naive (bool) – set to False for noisy problem, so that the best points will be an average of the final population.
population_size_adaptation (bool) – population size automatically adapts to the landscape
initial_popsize (Optional[int]) – initial (and minimal) population size (default: 4 x dimension)
- class nevergrad.optimization.optimizerlib.MEDA(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
- class nevergrad.optimization.optimizerlib.MPCEDA(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
- class nevergrad.optimization.optimizerlib.MetaCMA(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
Nevergrad optimizer by competence map. You might modify this one for designing your own competence map.
- class nevergrad.optimization.optimizerlib.MultiDiscrete(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
Combining 3 Discrete(1+1) optimizers. Active selection at 1/4th of the budget.
- class nevergrad.optimization.optimizerlib.MultipleSingleRuns(*, num_single_runs: int = 9, base_optimizer: ~typing.Union[ConfiguredOptimizer, ~typing.Type[Optimizer]] = <class 'nevergrad.optimization.optimizerlib.NGOpt'>)
Multiple single-objective runs, in particular for multi-objective optimization.
- Parameters
num_single_runs (int) – number of single runs
- class nevergrad.optimization.optimizerlib.NGO(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
- class nevergrad.optimization.optimizerlib.NGOpt(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
- class nevergrad.optimization.optimizerlib.NGOpt10(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
- class nevergrad.optimization.optimizerlib.NGOpt12(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
- class nevergrad.optimization.optimizerlib.NGOpt13(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
- class nevergrad.optimization.optimizerlib.NGOpt14(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
- class nevergrad.optimization.optimizerlib.NGOpt15(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
- class nevergrad.optimization.optimizerlib.NGOpt16(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
- class nevergrad.optimization.optimizerlib.NGOpt21(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
- class nevergrad.optimization.optimizerlib.NGOpt36(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
- class nevergrad.optimization.optimizerlib.NGOpt38(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
- class nevergrad.optimization.optimizerlib.NGOpt39(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
- class nevergrad.optimization.optimizerlib.NGOpt4(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
Nevergrad optimizer by competence map. You might modify this one for designing your own competence map.
- class nevergrad.optimization.optimizerlib.NGOpt8(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
Nevergrad optimizer by competence map. You might modify this one for designing your own competence map.
- class nevergrad.optimization.optimizerlib.NGOptBase(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
Nevergrad optimizer by competence map.
- enable_pickling() None
Some optimizers are only optionally picklable, because picklability requires saving the whole history which would be a waste of memory in general. To tell an optimizer to be picklable, call this function before any asks.
In this base class, the function is a no-op, but it is overridden in some optimizers.
- class nevergrad.optimization.optimizerlib.NGOptRW(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
- class nevergrad.optimization.optimizerlib.NoisyBandit(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
UCB. This is upper confidence bound (adapted to minimization), with very poor parametrization; in particular, the logarithmic term is set to zero. Infinite arms: we add one arm when 20 * #ask >= #arms ** 3.
- class nevergrad.optimization.optimizerlib.NoisySplit(*, num_optims: Optional[float] = None, discrete: bool = False)
Non-progressive noisy split of variables based on 1+1
- Parameters
num_optims (optional int) – number of optimizers (one per variable if float(“inf”))
discrete (bool) – uses OptimisticDiscreteOnePlusOne if True, else NoisyOnePlusOne
- class nevergrad.optimization.optimizerlib.PCEDA(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
- class nevergrad.optimization.optimizerlib.ParametrizedBO(*, initialization: Optional[str] = None, init_budget: Optional[int] = None, middle_point: bool = False, utility_kind: str = 'ucb', utility_kappa: float = 2.576, utility_xi: float = 0.0, gp_parameters: Optional[Dict[str, Any]] = None)
Bayesian optimization. Hyperparameter tuning method, based on statistical modeling of the objective function. This class is a wrapper over the bayes_opt package.
- Parameters
initialization (str) – Initialization algorithms (None, “Hammersley”, “random” or “LHS”)
init_budget (int or None) – Number of initialization algorithm steps
middle_point (bool) – whether to sample the 0 point first
utility_kind (str) – Type of utility function to use among “ucb”, “ei” and “poi”
utility_kappa (float) – Kappa parameter for the utility function
utility_xi (float) – Xi parameter for the utility function
gp_parameters (dict) – dictionary of parameters for the Gaussian process
- no_parallelization = True
- class nevergrad.optimization.optimizerlib.ParametrizedCMA(*, scale: float = 1.0, elitist: bool = False, popsize: Optional[int] = None, popsize_factor: float = 3.0, diagonal: bool = False, high_speed: bool = False, fcmaes: bool = False, random_init: bool = False, inopts: Optional[Dict[str, Any]] = None)
CMA-ES optimizer. This evolution strategy uses Gaussian sampling, iteratively modified to search in the best directions. This optimizer wraps an external implementation: https://github.com/CMA-ES/pycma
- Parameters
scale (float) – scale of the search
elitist (bool) – whether we switch to elitist mode, i.e. “+” mode instead of comma mode, in which we always keep the best point in the population.
popsize (Optional[int] = None) – population size, should be n * self.num_workers for int n >= 1. default is max(self.num_workers, 4 + int(3 * np.log(self.dimension)))
popsize_factor (float = 3.) – factor in the formula for computing the population size
diagonal (bool) – use the diagonal version of CMA (advised in big dimension)
high_speed (bool) – use metamodel for recommendation
fcmaes (bool) – use the fast implementation; it does not support diagonal=True but produces equivalent results, and is preferable for high dimensions or when the objective function evaluation is fast.
random_init (bool) – Use a randomized initialization
inopts (optional dict) – use this to override any inopts parameter of the wrapped CMA optimizer (see https://github.com/CMA-ES/pycma)
- class nevergrad.optimization.optimizerlib.ParametrizedMetaModel(*, multivariate_optimizer: Optional[Union[ConfiguredOptimizer, Type[Optimizer]]] = None, frequency_ratio: float = 0.9)
Adds a metamodel to an optimizer. The optimizer is always OnePlusOne if the dimension is 1.
- Parameters
multivariate_optimizer (base.OptCls or None) – Optimizer to which the metamodel is added
frequency_ratio (float) – used for deciding the frequency at which we use the metamodel
- class nevergrad.optimization.optimizerlib.ParametrizedOnePlusOne(*, noise_handling: Optional[Union[str, Tuple[str, float]]] = None, tabu_length: int = 0, mutation: str = 'gaussian', crossover: bool = False, rotation: bool = False, annealing: str = 'none', use_pareto: bool = False, sparse: bool = False, smoother: bool = False)
Simple but sometimes powerful class of optimization algorithms. It uses asynchronous updates, so that (1+1) can actually be parallel and even performs quite well in such a context; this is naturally close to (1+lambda).
- Parameters
noise_handling (str or Tuple[str, float]) –
Method for handling the noise. The name can be:
”random”: a random point is reevaluated regularly, this uses the one-fifth adaptation rule, going back to Schumer and Steiglitz (1968). It was independently rediscovered by Devroye (1972) and Rechenberg (1973).
”optimistic”: the best optimistic point is reevaluated regularly, optimism in front of uncertainty
an optional coefficient can be provided to tune the regularity of these reevaluations (default 0.05)
mutation (str) –
One of the available mutations from:
”gaussian”: standard mutation by adding a Gaussian random variable (with progressive widening) to the best pessimistic point
”cauchy”: same as Gaussian but with a Cauchy distribution.
”discrete”: when a variable is mutated (which happens with probability 1/d in dimension d), it is just randomly drawn. This means that on average, only one variable is mutated.
”discreteBSO”: as in brainstorm optimization, we slowly decrease the mutation rate from 1 to 1/d.
”fastga”: FastGA mutations from the current best
”doublefastga”: double-FastGA mutations from the current best (Doerr et al, Fast Genetic Algorithms, 2017)
”rls”: Randomized Local Search (randomly mutate one and only one variable).
”portfolio”: Random number of mutated bits (called uniform mixing in Dang & Lehre “Self-adaptation of Mutation Rates in Non-elitist Population”, 2016)
”lengler”: specific mutation rate chosen as a function of the dimension and iteration index.
crossover (bool) – whether to add a genetic crossover step every other iteration.
use_pareto (bool) – whether to restart from a random pareto element in multiobjective mode, instead of the last one added
sparse (bool) – whether we have random mutations setting variables to 0.
smoother (bool) – whether we suggest smooth mutations.
Notes
After many papers advocated the mutation rate 1/d in the discrete (1+1) for the discrete case, it was proposed to use a randomly drawn mutation rate. Fast genetic algorithms are based on a similar idea. These two simple methods perform quite well on a wide range of problems.
- class nevergrad.optimization.optimizerlib.ParametrizedTBPSA(*, naive: bool = True, initial_popsize: Optional[int] = None)
Test-based population-size adaptation. This method, based on adapting the population size, performs best in many noisy optimization problems, even in large dimension.
- Parameters
naive (bool) – set to False for noisy problem, so that the best points will be an average of the final population.
initial_popsize (Optional[int]) – initial (and minimal) population size (default: 4 x dimension)
Note
Derived from: Hellwig, Michael & Beyer, Hans-Georg. (2016). Evolution under Strong Noise: A Self-Adaptive Evolution Strategy Reaches the Lower Performance Bound – the pcCMSA-ES. https://homepages.fhv.at/hgb/New-Papers/PPSN16_HB16.pdf
- class nevergrad.optimization.optimizerlib.Portfolio(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1, config: Optional[ConfPortfolio] = None)
Passive portfolio of CMA, 2-pt DE and Scr-Hammersley.
- enable_pickling() None
Some optimizers are only optionally picklable, because picklability requires saving the whole history which would be a waste of memory in general. To tell an optimizer to be picklable, call this function before any asks.
In this base class, the function is a no-op, but it is overridden in some optimizers.
- class nevergrad.optimization.optimizerlib.Rescaled(*, base_optimizer: ~typing.Union[ConfiguredOptimizer, ~typing.Type[Optimizer]] = <class 'nevergrad.optimization.optimizerlib.MetaCMA'>, scale: ~typing.Optional[float] = None)
Configured optimizer for creating rescaled optimization algorithms.
- Parameters
base_optimizer (base.OptCls) – optimization algorithm to be rescaled.
scale (float) – how much we rescale, e.g. 0.001 if we want to focus on the center with std 0.001 (assuming the std of the domain is set to 1).
- class nevergrad.optimization.optimizerlib.SPSA(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
The First order SPSA algorithm as shown in [1,2,3], with implementation details from [4,5].
https://en.wikipedia.org/wiki/Simultaneous_perturbation_stochastic_approximation
Spall, James C. “Multivariate stochastic approximation using a simultaneous perturbation gradient approximation.” IEEE transactions on automatic control 37.3 (1992): 332-341.
Section 7.5.2 in “Introduction to Stochastic Search and Optimization: Estimation, Simulation and Control” by James C. Spall.
Pushpendre Rastogi, Jingyi Zhu, James C. Spall CISS (2016). Efficient implementation of Enhanced Adaptive Simultaneous Perturbation Algorithms.
- no_parallelization = True
- class nevergrad.optimization.optimizerlib.SQPCMA(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
Passive portfolio of MetaCMA and many SQP.
- class nevergrad.optimization.optimizerlib.Shiwa(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1)
Nevergrad optimizer by competence map. You might modify this one for designing your own competence map.
- class nevergrad.optimization.optimizerlib.SplitOptimizer(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1, config: Optional[ConfSplitOptimizer] = None)
Combines optimizers, each of them working on their own variables. (use ConfSplitOptimizer)
- class nevergrad.optimization.optimizerlib.cGA(parametrization: Union[int, Parameter], budget: Optional[int] = None, num_workers: int = 1, arity: Optional[int] = None)
Compact Genetic Algorithm. A discrete optimization algorithm, introduced in and often used as a first baseline.