Replay buffer module

class mbrl.util.replay_buffer.BootstrapIterator(transitions: mbrl.types.TransitionBatch, batch_size: int, ensemble_size: int, shuffle_each_epoch: bool = False, permute_indices: bool = True, rng: Optional[numpy.random._generator.Generator] = None)

Bases: mbrl.util.replay_buffer.TransitionIterator

A transition iterator that can be used to train an ensemble of bootstrapped models.

When iterating, this iterator samples from a different set of indices for each model in the ensemble, essentially assigning a different dataset to each model. Each batch is of shape (ensemble_size x batch_size x obs_size) – likewise for actions, rewards, dones.

Parameters
  • transitions (TransitionBatch) – the transition data used to build the iterator.

  • batch_size (int) – the batch size to use when iterating over the stored data.

  • ensemble_size (int) – the number of models in the ensemble.

  • shuffle_each_epoch (bool) – if True the iteration order is shuffled every time a loop over the data is completed. Defaults to False.

  • permute_indices (bool) – if True the bootstrap datasets are just permutations of the original data. If False they are sampled with replacement. Defaults to True.

  • rng (np.random.Generator, optional) – a random number generator to use when sampling batches. If None (default value), a new default generator will be used.

Note

If you want to make other custom types of iterators compatible with ensembles of bootstrapped models, the easiest way is to subclass BootstrapIterator and override the __getitem__() method. The sampling methods of this class will then batch the result of self[item] along a model dimension, where each batch is sampled independently.
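The difference between the two permute_indices modes, and the resulting (ensemble_size x batch_size x obs_size) batch shape, can be illustrated with a minimal NumPy sketch. This is not the library's implementation: the helper make_bootstrap_indices and the toy obs array are hypothetical names introduced only for illustration.

```python
import numpy as np


def make_bootstrap_indices(num_stored, ensemble_size, permute_indices, rng):
    """Hypothetical helper: one row of data indices per ensemble member."""
    if permute_indices:
        # Each model sees every transition exactly once, in its own order.
        return np.stack([rng.permutation(num_stored) for _ in range(ensemble_size)])
    # Each model draws its own dataset with replacement.
    return rng.choice(num_stored, size=(ensemble_size, num_stored), replace=True)


rng = np.random.default_rng(0)
obs = np.arange(20, dtype=np.float32).reshape(10, 2)  # 10 transitions, obs_size = 2

indices = make_bootstrap_indices(len(obs), ensemble_size=3, permute_indices=True, rng=rng)
resampled = make_bootstrap_indices(len(obs), ensemble_size=3, permute_indices=False, rng=rng)

batch_size = 4
# First batch for every model: fancy indexing adds the leading model dimension.
batch_obs = obs[indices[:, :batch_size]]
print(batch_obs.shape)  # (3, 4, 2)
```

With permute_indices=True every row of indices contains each transition once; with False, a row may repeat some transitions and omit others.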

toggle_bootstrap()

Toggles whether the iterator returns a batch per model or a single batch.

class mbrl.util.replay_buffer.ReplayBuffer(capacity: int, obs_shape: Sequence[int], action_shape: Sequence[int], obs_type: Type = <class 'numpy.float32'>, action_type: Type = <class 'numpy.float32'>, reward_type: Type = <class 'numpy.float32'>, rng: Optional[numpy.random._generator.Generator] = None, max_trajectory_length: Optional[int] = None)

Bases: object

A replay buffer with support for training/validation iterators and ensembles.

This buffer can be pushed to and sampled from as a typical replay buffer.

Parameters
  • capacity (int) – the maximum number of transitions that the buffer can store. When the capacity is reached, the contents are overwritten in FIFO fashion.

  • obs_shape (Sequence of ints) – the shape of the observations to store.

  • action_shape (Sequence of ints) – the shape of the actions to store.

  • obs_type (type) – the data type of the observations (defaults to np.float32).

  • action_type (type) – the data type of the actions (defaults to np.float32).

  • reward_type (type) – the data type of the rewards (defaults to np.float32).

  • rng (np.random.Generator, optional) – a random number generator to use when sampling batches. If None (default value), a new default generator will be used.

  • max_trajectory_length (int, optional) – if given, indicates that trajectory information should be stored and that trajectories will be at most this number of steps. Defaults to None in which case no trajectory information will be kept. The buffer will keep trajectory information automatically using the done value when calling add().

Warning

When using max_trajectory_length it is the user’s responsibility to ensure that trajectories are stored contiguously in the replay buffer.
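The FIFO overwrite behavior described for capacity can be pictured as a ring buffer with a write cursor that wraps around. The sketch below is a simplified illustration under that assumption, storing observations only; TinyRingBuffer and its attributes are hypothetical names and do not reflect the internals of ReplayBuffer.

```python
import numpy as np


class TinyRingBuffer:
    """Hypothetical illustration only; not mbrl's ReplayBuffer."""

    def __init__(self, capacity, obs_shape):
        self.capacity = capacity
        self.obs = np.empty((capacity, *obs_shape), dtype=np.float32)
        self.cur_idx = 0      # next slot to write
        self.num_stored = 0   # number of valid entries

    def add(self, obs):
        self.obs[self.cur_idx] = obs
        # Wrap around once capacity is reached, overwriting the oldest entry.
        self.cur_idx = (self.cur_idx + 1) % self.capacity
        self.num_stored = min(self.num_stored + 1, self.capacity)


buffer = TinyRingBuffer(capacity=3, obs_shape=(1,))
for value in range(5):
    buffer.add(np.array([value]))

stored = buffer.obs[: buffer.num_stored, 0].tolist()
print(stored)  # [3.0, 4.0, 2.0]
```

After five insertions into a buffer of capacity three, the two oldest values (0 and 1) have been overwritten, which is also why a long trajectory can be cut when the cursor wraps.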

add(obs: numpy.ndarray, action: numpy.ndarray, next_obs: numpy.ndarray, reward: float, done: bool)

Adds a transition (s, a, s’, r, done) to the replay buffer.

Parameters
  • obs (np.ndarray) – the observation at time t.

  • action (np.ndarray) – the action at time t.

  • next_obs (np.ndarray) – the observation at time t + 1.

  • reward (float) – the reward at time t + 1.

  • done (bool) – a boolean indicating whether the episode ended or not.

get_all(shuffle: bool = False) → mbrl.types.TransitionBatch

Returns all data stored in the replay buffer.

Parameters
  • shuffle (bool) – set to True if the data returned should be in random order. Defaults to False.

get_iterators(batch_size: int, val_ratio: float, train_ensemble: bool = False, ensemble_size: Optional[int] = None, shuffle_each_epoch: bool = True, bootstrap_permutes: bool = False) → Tuple[mbrl.util.replay_buffer.TransitionIterator, Optional[mbrl.util.replay_buffer.TransitionIterator]]

Returns training/validation iterators for the data in the replay buffer.

Deprecated since version v0.1.2: Use mbrl.util.common.get_basic_buffer_iterators().

Parameters
  • batch_size (int) – the batch size for the iterators.

  • val_ratio (float) – the proportion of data to use for validation. If 0., the validation buffer will be set to None.

  • train_ensemble (bool) – if True, the training iterator will be an instance of BootstrapIterator. Defaults to False.

  • ensemble_size (int) – the size of the ensemble being trained. Must be provided if train_ensemble == True.

  • shuffle_each_epoch (bool) – if True, the iterator will shuffle the order each time a loop starts. Otherwise the iteration order will be the same. Defaults to True.

  • bootstrap_permutes (bool) – if True, the bootstrap iterator will create the bootstrap data using permutations of the original data. Otherwise it will use sampling with replacement. Defaults to False.
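The role of val_ratio can be sketched as a random split of the stored indices. The following is a minimal sketch assuming the split is made over a single permutation of the data; split_indices is a hypothetical helper, not part of the library.

```python
import numpy as np


def split_indices(num_stored, val_ratio, rng):
    """Hypothetical helper: split stored indices into training and validation sets."""
    val_size = int(num_stored * val_ratio)
    permutation = rng.permutation(num_stored)
    train_idx = permutation[val_size:]
    # Mirror the documented behavior: no validation data when val_ratio is 0.
    val_idx = permutation[:val_size] if val_size > 0 else None
    return train_idx, val_idx


rng = np.random.default_rng(0)
train_idx, val_idx = split_indices(10, val_ratio=0.2, rng=rng)
print(len(train_idx), len(val_idx))  # 8 2

all_train_idx, no_val_idx = split_indices(10, val_ratio=0.0, rng=rng)
print(no_val_idx)  # None
```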

load(load_dir: Union[pathlib.Path, str])

Loads transition data from a given directory.

Parameters

load_dir (str) – the directory where the buffer is stored.

sample(batch_size: int) → mbrl.types.TransitionBatch

Samples a batch of transitions from the replay buffer.

Parameters

batch_size (int) – the number of samples required.

Returns

the sampled values of observations, actions, next observations, rewards and done indicators, as numpy arrays, respectively. The i-th transition corresponds to (obs[i], act[i], next_obs[i], rewards[i], dones[i]).

Return type

(tuple)

sample_trajectory() → Optional[mbrl.types.TransitionBatch]

Samples a full trajectory and returns it as a batch.

Returns

A tuple with observations, actions, next observations, rewards and done indicators, as numpy arrays, respectively; these will correspond to a full trajectory. The i-th transition corresponds to (obs[i], act[i], next_obs[i], rewards[i], dones[i]).

Return type

(tuple)
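Conceptually, sampling a full trajectory amounts to picking one stored [start, end) pair and slicing the data with it. The sketch below assumes trajectories are kept as a list of such index pairs; pick_trajectory is a hypothetical name, and only observations are sliced for brevity.

```python
import numpy as np


def pick_trajectory(obs, trajectory_indices, rng):
    """Hypothetical helper: return the observations of one random trajectory."""
    if len(trajectory_indices) == 0:
        return None  # nothing to sample when no trajectory information is kept
    start, end = trajectory_indices[rng.integers(len(trajectory_indices))]
    return obs[start:end]


rng = np.random.default_rng(0)
obs = np.arange(7, dtype=np.float32).reshape(7, 1)
trajectory_indices = [(0, 3), (3, 7)]  # two trajectories of lengths 3 and 4

trajectory_obs = pick_trajectory(obs, trajectory_indices, rng)
empty_result = pick_trajectory(obs, [], rng)
```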

save(save_dir: Union[pathlib.Path, str])

Saves the data in the replay buffer to a given directory.

Parameters

save_dir (str) – the directory to save the data to. File name will be replay_buffer.npz.
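A round trip through a replay_buffer.npz file can be sketched with NumPy's own archive functions. The array names used as archive keys here (obs, reward) are assumptions chosen for illustration; the keys written by save() are not specified in this section.

```python
import pathlib
import tempfile

import numpy as np

obs = np.zeros((4, 2), dtype=np.float32)
reward = np.ones(4, dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = pathlib.Path(tmp_dir) / "replay_buffer.npz"
    # Key names are illustrative assumptions, not the library's file layout.
    np.savez(path, obs=obs, reward=reward)
    with np.load(path) as data:
        loaded_obs = data["obs"]
        loaded_reward = data["reward"]

print(loaded_obs.shape, loaded_reward.shape)  # (4, 2) (4,)
```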

class mbrl.util.replay_buffer.SequenceTransitionIterator(transitions: mbrl.types.TransitionBatch, trajectory_indices: List[Tuple[int, int]], batch_size: int, sequence_length: int, ensemble_size: int, shuffle_each_epoch: bool = False, rng: Optional[numpy.random._generator.Generator] = None, max_batches_per_loop: Optional[int] = None)

Bases: mbrl.util.replay_buffer.BootstrapIterator

A transition iterator that provides sequences of transitions.

Returns batches of short sequences of transitions in the buffer, corresponding to fixed-length segments of the trajectories indicated by the given trajectory indices. The start states of all trajectories are sampled uniformly at random from the set of states from which a sequence of the desired length can be started.

When iterating over this object, batches might contain overlapping trajectories. By default, a full loop over this iterator will return as many samples as there are valid start states (but start states can be repeated, since they are sampled with replacement). Since this is unlikely to be necessary, you can use the input argument max_batches_per_loop to return a smaller number of batches.

Note that this is a bootstrap iterator, so it can return an extra model dimension, where each batch is sampled independently. By default, each observation batch is of shape (ensemble_size x batch_size x sequence_length x obs_size) – likewise for actions, rewards, dones. If not in bootstrap mode, then the ensemble_size dimension is removed.

Parameters
  • transitions (TransitionBatch) – the transition data used to build the iterator.

  • trajectory_indices (list(tuple(int, int))) – a list of [start, end) indices for trajectories.

  • batch_size (int) – the batch size to use when iterating over the stored data.

  • sequence_length (int) – the length of the sequences returned.

  • ensemble_size (int) – the number of models in the ensemble.

  • shuffle_each_epoch (bool) – if True the iteration order is shuffled every time a loop over the data is completed. Defaults to False.

  • rng (np.random.Generator, optional) – a random number generator to use when sampling batches. If None (default value), a new default generator will be used.

  • max_batches_per_loop (int, optional) – if given, specifies how many batches to return (at most) over a full loop of the iterator.
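The notion of a valid start state can be made concrete with a short NumPy sketch: a start index is valid if a sequence of sequence_length steps fits before the end of its trajectory. This is an illustration under that reading of the description, not the library's code; valid_starts and the toy data are hypothetical.

```python
import numpy as np


def valid_starts(trajectory_indices, sequence_length):
    """Hypothetical helper: indices from which a full sequence can be started."""
    starts = []
    for start, end in trajectory_indices:
        # The last valid start leaves exactly sequence_length steps before `end`.
        starts.extend(range(start, end - sequence_length + 1))
    return np.array(starts, dtype=int)


rng = np.random.default_rng(0)
obs = np.arange(12, dtype=np.float32).reshape(12, 1)
trajectory_indices = [(0, 5), (5, 7), (7, 12)]  # the middle trajectory is too short
sequence_length = 3

starts = valid_starts(trajectory_indices, sequence_length)
print(starts.tolist())  # [0, 1, 2, 7, 8, 9]

batch_size = 4
# Start states are drawn with replacement, so sequences may repeat or overlap.
chosen = rng.choice(starts, size=batch_size, replace=True)
seq_idx = chosen[:, None] + np.arange(sequence_length)
batch_obs = obs[seq_idx]
print(batch_obs.shape)  # (4, 3, 1)
```

In bootstrap mode the same sampling would be repeated independently per model, adding the leading ensemble_size dimension.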

class mbrl.util.replay_buffer.SequenceTransitionSampler(transitions: mbrl.types.TransitionBatch, trajectory_indices: List[Tuple[int, int]], batch_size: int, sequence_length: int, batches_per_loop: int, rng: Optional[numpy.random._generator.Generator] = None)

Bases: mbrl.util.replay_buffer.TransitionIterator

A transition iterator that provides sequences of transitions sampled at random.

Returns batches of short sequences of transitions in the buffer, corresponding to fixed-length segments of the trajectories indicated by the given trajectory indices. The start states of all trajectories are sampled uniformly at random from the set of states from which a sequence of the desired length can be started. When iterating over this object, batches might contain overlapping trajectories.

Parameters
  • transitions (TransitionBatch) – the transition data used to build the iterator.

  • trajectory_indices (list(tuple(int, int))) – a list of [start, end) indices for trajectories.

  • batch_size (int) – the batch size to use when iterating over the stored data.

  • sequence_length (int) – the length of the sequences returned.

  • batches_per_loop (int) – specifies how many batches to return (at most) over a full loop of the iterator.

  • rng (np.random.Generator, optional) – a random number generator to use when sampling batches. If None (default value), a new default generator will be used.

class mbrl.util.replay_buffer.TransitionIterator(transitions: mbrl.types.TransitionBatch, batch_size: int, shuffle_each_epoch: bool = False, rng: Optional[numpy.random._generator.Generator] = None)

Bases: object

An iterator for batches of transitions.

The iterator can be used as follows:

for batch in batch_iterator:
    do_something_with_batch(batch)

Rather than constructing objects of this class directly, users should preferably obtain them from ReplayBuffer.

Parameters
  • transitions (TransitionBatch) – the transition data used to build the iterator.

  • batch_size (int) – the batch size to use when iterating over the stored data.

  • shuffle_each_epoch (bool) – if True the iteration order is shuffled every time a loop over the data is completed. Defaults to False.

  • rng (np.random.Generator, optional) – a random number generator to use when sampling batches. If None (default value), a new default generator will be used.
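The basic iteration pattern, fixed-size batches over the stored data with an optionally shuffled order, can be sketched with a plain generator. This is a simplified stand-in rather than the class itself; iterate_batches is a hypothetical function operating on a single array instead of a TransitionBatch.

```python
import numpy as np


def iterate_batches(data, batch_size, shuffle, rng):
    """Hypothetical stand-in: yield consecutive batches in a fixed or shuffled order."""
    order = rng.permutation(len(data)) if shuffle else np.arange(len(data))
    for start in range(0, len(data), batch_size):
        # The final batch may be smaller when batch_size does not divide the data.
        yield data[order[start:start + batch_size]]


rng = np.random.default_rng(0)
data = np.arange(10)

batch_sizes = [len(batch) for batch in iterate_batches(data, 4, shuffle=False, rng=rng)]
print(batch_sizes)  # [4, 4, 2]

shuffled = np.concatenate(list(iterate_batches(data, 4, shuffle=True, rng=rng)))
```

Calling the generator again with shuffle=True draws a new permutation, which corresponds to reshuffling the order at the start of each loop.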