Replay buffer module
-
class
mbrl.util.replay_buffer.
BootstrapIterator
(transitions: mbrl.types.TransitionBatch, batch_size: int, ensemble_size: int, shuffle_each_epoch: bool = False, permute_indices: bool = True, rng: Optional[numpy.random._generator.Generator] = None)¶ Bases:
mbrl.util.replay_buffer.TransitionIterator
A transition iterator that can be used to train ensembles of bootstrapped models.
When iterating, this iterator samples from a different set of indices for each model in the ensemble, essentially assigning a different dataset to each model. Each batch is of shape (ensemble_size x batch_size x obs_size) – likewise for actions, rewards, dones.
- Parameters
transitions (TransitionBatch) – the transition data used to build the iterator.
batch_size (int) – the batch size to use when iterating over the stored data.
ensemble_size (int) – the number of models in the ensemble.
shuffle_each_epoch (bool) – if True, the iteration order is shuffled every time a loop over the data is completed. Defaults to False.
permute_indices (bool) – if True, the bootstrap datasets are just permutations of the original data. If False, they are sampled with replacement. Defaults to True.
rng (np.random.Generator, optional) – a random number generator to use when sampling batches. If None (default value), a new default generator will be used.
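The difference between the two values of permute_indices can be illustrated with plain NumPy. This is only a sketch of the idea, not mbrl's implementation: the helper bootstrap_indices and the toy data are hypothetical.

```python
import numpy as np


def bootstrap_indices(num_stored, ensemble_size, permute_indices, rng):
    """Return an (ensemble_size, num_stored) array of dataset indices.

    Hypothetical helper for illustration; not part of mbrl.
    """
    if permute_indices:
        # Each model sees every transition exactly once, in its own order.
        return np.stack([rng.permutation(num_stored) for _ in range(ensemble_size)])
    # Classic bootstrap: each model draws its indices with replacement.
    return rng.choice(num_stored, size=(ensemble_size, num_stored), replace=True)


rng = np.random.default_rng(0)
obs = np.arange(20, dtype=np.float32).reshape(10, 2)  # 10 transitions, obs_size = 2
idx = bootstrap_indices(len(obs), ensemble_size=3, permute_indices=True, rng=rng)

batch_size = 4
# One batch per model, stacked along a leading model dimension.
obs_batch = obs[idx[:, :batch_size]]
print(obs_batch.shape)  # (3, 4, 2)
```

The resulting array has the (ensemble_size x batch_size x obs_size) layout described above.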
Note
If you want to make other custom types of iterators compatible with ensembles of bootstrapped models, the easiest way is to subclass BootstrapIterator and override its __getitem__() method. The sampling methods of this class will then batch the result of self[item] along a model dimension, where each batch is sampled independently.
-
toggle_bootstrap
()¶ Toggles whether the iterator returns a batch per model or a single batch.
-
class
mbrl.util.replay_buffer.
ReplayBuffer
(capacity: int, obs_shape: Sequence[int], action_shape: Sequence[int], obs_type: Type = <class 'numpy.float32'>, action_type: Type = <class 'numpy.float32'>, reward_type: Type = <class 'numpy.float32'>, rng: Optional[numpy.random._generator.Generator] = None, max_trajectory_length: Optional[int] = None)¶ Bases:
object
A replay buffer with support for training/validation iterators and ensembles.
This buffer can be pushed to and sampled from as a typical replay buffer.
- Parameters
capacity (int) – the maximum number of transitions that the buffer can store. When the capacity is reached, the contents are overwritten in FIFO fashion.
obs_shape (Sequence of ints) – the shape of the observations to store.
action_shape (Sequence of ints) – the shape of the actions to store.
obs_type (type) – the data type of the observations (defaults to np.float32).
action_type (type) – the data type of the actions (defaults to np.float32).
reward_type (type) – the data type of the rewards (defaults to np.float32).
rng (np.random.Generator, optional) – a random number generator to use when sampling batches. If None (default value), a new default generator will be used.
max_trajectory_length (int, optional) – if given, indicates that trajectory information should be stored and that trajectories will be at most this number of steps. Defaults to None, in which case no trajectory information will be kept. The buffer will keep trajectory information automatically using the done value when calling add().
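One way to picture the trajectory bookkeeping is as a list of [start, end) index pairs that is extended whenever a transition with done set to True is added. The sketch below assumes no wraparound of the buffer; TrajectoryTracker and its length check are hypothetical and not taken from mbrl.

```python
from typing import List, Optional, Tuple


class TrajectoryTracker:
    """Hypothetical helper: records [start, end) index pairs from done flags."""

    def __init__(self, max_trajectory_length: int):
        self.max_trajectory_length = max_trajectory_length
        self.trajectory_indices: List[Tuple[int, int]] = []
        self._start: Optional[int] = None

    def on_add(self, index: int, done: bool) -> None:
        # `index` is the buffer slot the transition was written to.
        if self._start is None:
            self._start = index
        length = index - self._start + 1
        # Assumption of this sketch: overly long trajectories are rejected.
        if length > self.max_trajectory_length:
            raise ValueError("trajectory is longer than max_trajectory_length")
        if done:
            # Close the current trajectory; `end` is exclusive.
            self.trajectory_indices.append((self._start, index + 1))
            self._start = None


tracker = TrajectoryTracker(max_trajectory_length=5)
dones = [False, False, True, False, True]
for i, done in enumerate(dones):
    tracker.on_add(i, done)
print(tracker.trajectory_indices)  # [(0, 3), (3, 5)]
```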
Warning
When using max_trajectory_length, it is the user’s responsibility to ensure that trajectories are stored continuously in the replay buffer.
-
add
(obs: numpy.ndarray, action: numpy.ndarray, next_obs: numpy.ndarray, reward: float, done: bool)¶ Adds a transition (s, a, s’, r, done) to the replay buffer.
- Parameters
obs (np.ndarray) – the observation at time t.
action (np.ndarray) – the action at time t.
next_obs (np.ndarray) – the observation at time t + 1.
reward (float) – the reward at time t + 1.
done (bool) – a boolean indicating whether the episode ended or not.
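The FIFO overwrite behaviour mentioned for capacity can be sketched as a circular write index. The class below is a simplified, hypothetical stand-in that stores observations only; the attribute names cur_idx and num_stored are assumptions made for this example.

```python
import numpy as np


class RingBuffer:
    """Hypothetical minimal FIFO store for observations only."""

    def __init__(self, capacity: int, obs_shape):
        self.obs = np.empty((capacity, *obs_shape), dtype=np.float32)
        self.capacity = capacity
        self.cur_idx = 0      # next slot to write
        self.num_stored = 0   # number of valid entries

    def add(self, obs: np.ndarray) -> None:
        self.obs[self.cur_idx] = obs
        # Wrap around so that the oldest entry is overwritten first.
        self.cur_idx = (self.cur_idx + 1) % self.capacity
        self.num_stored = min(self.num_stored + 1, self.capacity)


buf = RingBuffer(capacity=3, obs_shape=(1,))
for value in range(5):
    buf.add(np.array([value]))
print(buf.obs.ravel())  # [3. 4. 2.]
```

After five additions to a buffer of capacity three, the two oldest entries (0 and 1) have been overwritten by 3 and 4.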
-
get_all
(shuffle: bool = False) → mbrl.types.TransitionBatch¶ Returns all data stored in the replay buffer.
- Parameters
shuffle (bool) – set to True if the data returned should be in random order. Defaults to False.
-
get_iterators
(batch_size: int, val_ratio: float, train_ensemble: bool = False, ensemble_size: Optional[int] = None, shuffle_each_epoch: bool = True, bootstrap_permutes: bool = False) → Tuple[mbrl.util.replay_buffer.TransitionIterator, Optional[mbrl.util.replay_buffer.TransitionIterator]]¶ Returns training/validation iterators for the data in the replay buffer.
Deprecated since version v0.1.2: Use mbrl.util.common.get_basic_buffer_iterators().
- Parameters
batch_size (int) – the batch size for the iterators.
val_ratio (float) – the proportion of data to use for validation. If 0., the validation iterator will be set to None.
train_ensemble (bool) – if True, the training iterator will be an instance of BootstrapIterator. Defaults to False.
ensemble_size (int) – the size of the ensemble being trained. Must be provided if train_ensemble == True.
shuffle_each_epoch (bool) – if True, the iterator will shuffle the order each time a loop starts. Otherwise the iteration order will be the same. Defaults to True.
bootstrap_permutes (bool) – if True, the bootstrap iterator will create the bootstrap data using permutations of the original data. Otherwise it will use sampling with replacement. Defaults to False.
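The role of val_ratio can be illustrated with a simple index split. This is a sketch under the assumption that the data is shuffled before the validation slice is held out; split_indices is a hypothetical helper, not an mbrl function.

```python
import numpy as np


def split_indices(num_stored: int, val_ratio: float, rng: np.random.Generator):
    """Hypothetical helper: shuffle indices, then hold out a validation slice."""
    val_size = int(num_stored * val_ratio)
    permutation = rng.permutation(num_stored)
    train_idx = permutation[val_size:]
    # With val_ratio == 0 there is no validation set at all.
    val_idx = permutation[:val_size] if val_size > 0 else None
    return train_idx, val_idx


rng = np.random.default_rng(0)
train_idx, val_idx = split_indices(num_stored=10, val_ratio=0.2, rng=rng)
print(len(train_idx), len(val_idx))  # 8 2
```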
-
load
(load_dir: Union[pathlib.Path, str])¶ Loads transition data from a given directory.
- Parameters
load_dir (str) – the directory where the buffer is stored.
-
sample
(batch_size: int) → mbrl.types.TransitionBatch¶ Samples a batch of transitions from the replay buffer.
- Parameters
batch_size (int) – the number of samples required.
- Returns
the sampled values of observations, actions, next observations, rewards and done indicators, as numpy arrays, respectively. The i-th transition corresponds to (obs[i], act[i], next_obs[i], rewards[i], dones[i]).
- Return type
(tuple)
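The alignment guarantee (the i-th entry of every returned array belongs to the same transition) follows naturally when all arrays are indexed with one shared index array. The sketch below uses toy arrays and assumes uniform sampling with replacement, which may differ from what mbrl does internally.

```python
import numpy as np

rng = np.random.default_rng(0)
num_stored = 100

# Toy storage: rewards are derived from obs so that alignment can be checked.
obs = rng.normal(size=(num_stored, 3)).astype(np.float32)
act = rng.normal(size=(num_stored, 1)).astype(np.float32)
rewards = obs.sum(axis=1)

batch_size = 8
# Assumption: indices are drawn uniformly with replacement.
indices = rng.choice(num_stored, size=batch_size)
obs_b, act_b, rewards_b = obs[indices], act[indices], rewards[indices]

# Row i of every array comes from the same stored transition.
assert np.allclose(rewards_b, obs_b.sum(axis=1))
print(obs_b.shape, act_b.shape, rewards_b.shape)  # (8, 3) (8, 1) (8,)
```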
-
sample_trajectory
() → Optional[mbrl.types.TransitionBatch]¶ Samples a full trajectory and returns it as a batch.
- Returns
A tuple with observations, actions, next observations, rewards and done indicators, as numpy arrays, respectively; these will correspond to a full trajectory. The i-th transition corresponds to (obs[i], act[i], next_obs[i], rewards[i], dones[i]).
- Return type
(tuple)
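Given a list of [start, end) trajectory indices, sampling a full trajectory amounts to picking one pair at random and slicing. In this sketch the function name mirrors the method for readability, but the body is an assumption; in particular, returning None when no trajectory is stored is only inferred from the Optional return type.

```python
import numpy as np


def sample_trajectory(obs, trajectory_indices, rng):
    """Hypothetical sketch: return the observations of one random trajectory."""
    if not trajectory_indices:
        # Assumption: nothing to sample if no trajectory has been recorded.
        return None
    start, end = trajectory_indices[rng.integers(len(trajectory_indices))]
    return obs[start:end]


rng = np.random.default_rng(0)
obs = np.arange(10, dtype=np.float32).reshape(5, 2)
trajectory = sample_trajectory(obs, [(0, 3), (3, 5)], rng)
print(trajectory.shape[0] in (2, 3))  # True
```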
-
save
(save_dir: Union[pathlib.Path, str])¶ Saves the data in the replay buffer to a given directory.
- Parameters
save_dir (str) – the directory to save the data to. File name will be replay_buffer.npz.
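For readers unfamiliar with the .npz format, the round trip below shows how several arrays can be written to and read back from a single replay_buffer.npz file using NumPy. The array names obs and reward are illustrative and are not claimed to match the keys mbrl actually writes.

```python
import pathlib
import tempfile

import numpy as np

obs = np.zeros((4, 3), dtype=np.float32)
reward = np.ones(4, dtype=np.float32)

with tempfile.TemporaryDirectory() as save_dir:
    path = pathlib.Path(save_dir) / "replay_buffer.npz"
    # Array names below are illustrative, not mbrl's actual keys.
    np.savez(path, obs=obs, reward=reward)
    with np.load(path) as data:
        loaded_obs = data["obs"]
        loaded_reward = data["reward"]

print(loaded_obs.shape, loaded_reward.shape)  # (4, 3) (4,)
```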
-
class
mbrl.util.replay_buffer.
SequenceTransitionIterator
(transitions: mbrl.types.TransitionBatch, trajectory_indices: List[Tuple[int, int]], batch_size: int, sequence_length: int, ensemble_size: int, shuffle_each_epoch: bool = False, rng: Optional[numpy.random._generator.Generator] = None, max_batches_per_loop: Optional[int] = None)¶ Bases:
mbrl.util.replay_buffer.BootstrapIterator
A transition iterator that provides sequences of transitions.
Returns batches of short sequences of transitions in the buffer, corresponding to fixed-length segments of the trajectories indicated by the given trajectory indices. The start states of all trajectories are sampled uniformly at random from the set of states from which a sequence of the desired length can be started.
When iterating over this object, batches might contain overlapping trajectories. By default, a full loop over this iterator returns as many samples as there are valid start states (start states are sampled with replacement, so some may be repeated). Since this is unlikely to be necessary, you can use the input argument max_batches_per_loop to return a smaller number of batches.
Note that this is a bootstrap iterator, so it can return an extra model dimension, where each batch is sampled independently. By default, each observation batch is of shape (ensemble_size x batch_size x sequence_length x obs_size) – likewise for actions, rewards, dones. If not in bootstrap mode, then the ensemble_size dimension is removed.
- Parameters
transitions (TransitionBatch) – the transition data used to build the iterator.
trajectory_indices (list(tuple(int, int))) – a list of [start, end) indices for trajectories.
batch_size (int) – the batch size to use when iterating over the stored data.
sequence_length (int) – the length of the sequences returned.
ensemble_size (int) – the number of models in the ensemble.
shuffle_each_epoch (bool) – if True, the iteration order is shuffled every time a loop over the data is completed. Defaults to False.
rng (np.random.Generator, optional) – a random number generator to use when sampling batches. If None (default value), a new default generator will be used.
max_batches_per_loop (int, optional) – if given, specifies how many batches to return (at most) over a full loop of the iterator.
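The notion of a valid start state, and the resulting batch shape, can be sketched as follows. The helper valid_starts and the toy data are hypothetical; the sketch only assumes that a sequence must fit entirely inside one [start, end) trajectory.

```python
import numpy as np


def valid_starts(trajectory_indices, sequence_length):
    """Hypothetical helper: indices from which a full sequence can be read."""
    starts = []
    for start, end in trajectory_indices:
        # The last valid start leaves room for sequence_length steps before `end`.
        starts.extend(range(start, end - sequence_length + 1))
    return np.array(starts, dtype=int)


trajectory_indices = [(0, 5), (5, 7), (7, 12)]
sequence_length = 3
starts = valid_starts(trajectory_indices, sequence_length)
print(starts.tolist())  # [0, 1, 2, 7, 8, 9]

rng = np.random.default_rng(0)
obs = np.arange(24).reshape(12, 2)  # 12 transitions, obs_size = 2
ensemble_size, batch_size = 2, 4
# Start states are sampled with replacement, independently for each model.
batch_starts = rng.choice(starts, size=(ensemble_size, batch_size))
# Expand each start into the indices of a full sequence.
seq_idx = batch_starts[..., None] + np.arange(sequence_length)
obs_batch = obs[seq_idx]
print(obs_batch.shape)  # (2, 4, 3, 2)
```

The trajectory (5, 7) contributes no start states because it is shorter than the requested sequence length.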
-
class
mbrl.util.replay_buffer.
SequenceTransitionSampler
(transitions: mbrl.types.TransitionBatch, trajectory_indices: List[Tuple[int, int]], batch_size: int, sequence_length: int, batches_per_loop: int, rng: Optional[numpy.random._generator.Generator] = None)¶ Bases:
mbrl.util.replay_buffer.TransitionIterator
A transition iterator that provides sequences of transitions sampled at random.
Returns batches of short sequences of transitions in the buffer, corresponding to fixed-length segments of the trajectories indicated by the given trajectory indices. The start states of all trajectories are sampled uniformly at random from the set of states from which a sequence of the desired length can be started. When iterating over this object, batches might contain overlapping trajectories.
- Parameters
transitions (TransitionBatch) – the transition data used to build the iterator.
trajectory_indices (list(tuple(int, int))) – a list of [start, end) indices for trajectories.
batch_size (int) – the batch size to use when iterating over the stored data.
sequence_length (int) – the length of the sequences returned.
batches_per_loop (int) – specifies how many batches to return (at most) over a full loop of the iterator.
rng (np.random.Generator, optional) – a random number generator to use when sampling batches. If None (default value), a new default generator will be used.
-
class
mbrl.util.replay_buffer.
TransitionIterator
(transitions: mbrl.types.TransitionBatch, batch_size: int, shuffle_each_epoch: bool = False, rng: Optional[numpy.random._generator.Generator] = None)¶ Bases:
object
An iterator for batches of transitions.
The iterator can be used as follows:
    for batch in batch_iterator:
        do_something_with_batch()
Rather than constructing objects of this class directly, the preferred way to use them is to obtain them from ReplayBuffer.
- Parameters
transitions (TransitionBatch) – the transition data used to build the iterator.
batch_size (int) – the batch size to use when iterating over the stored data.
shuffle_each_epoch (bool) – if True, the iteration order is shuffled every time a loop over the data is completed. Defaults to False.
rng (np.random.Generator, optional) – a random number generator to use when sampling batches. If None (default value), a new default generator will be used.
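To make the iteration semantics concrete, here is a minimal sketch of a batch iterator over a single array that reshuffles only after a full loop has completed. BatchIterator is a hypothetical class written for this example and is much simpler than TransitionIterator, which iterates over a whole TransitionBatch.

```python
import numpy as np


class BatchIterator:
    """Hypothetical minimal batch iterator over a single array."""

    def __init__(self, data, batch_size, shuffle_each_epoch=False, rng=None):
        self.data = data
        self.batch_size = batch_size
        self.shuffle_each_epoch = shuffle_each_epoch
        self.rng = rng if rng is not None else np.random.default_rng()
        self.order = np.arange(len(data))

    def __iter__(self):
        for start in range(0, len(self.data), self.batch_size):
            yield self.data[self.order[start:start + self.batch_size]]
        # Reshuffle only once a full loop over the data has completed.
        if self.shuffle_each_epoch:
            self.order = self.rng.permutation(len(self.data))


batch_iterator = BatchIterator(np.arange(10), batch_size=4)
batches = [batch.tolist() for batch in batch_iterator]
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Note that the last batch is smaller when the data size is not a multiple of the batch size; whether mbrl keeps or drops such a batch is not stated here.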