rlstructures API
DictTensor, TemporalDictTensor, Trajectories
- class rlstructures.core.DictTensor(v: Optional[Dict] = None)
A dictionary of torch.Tensor. The first dimension of each tensor is the batch dimension; all tensors share the same batch-dimension size.
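A minimal usage sketch based on the methods documented below (the key names are illustrative, and cat is assumed to be callable as a static method, as its signature suggests):

```python
import torch
from rlstructures.core import DictTensor

# Two tensors sharing batch dimension B = 3
d = DictTensor({"obs": torch.zeros(3, 4), "reward": torch.ones(3)})

print(d.n_elems())               # 3: the shared batch-dimension size
obs_only = d.get(["obs"])        # sub-DictTensor with only the "obs" tensor
first_two = d.slice(0, 2)        # batch elements 0 and 1
parts = d.unfold()               # 3 DictTensors, each with n_elems() == 1
rebuilt = DictTensor.cat(parts)  # aggregate back over the batch dimension
```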
- cat(tensors) → rlstructures.core.DictTensor
Aggregate multiple DictTensors over the batch dimension.
- Parameters
tensors (list) – a list of DictTensors
- clone() → rlstructures.core.DictTensor
Clone the DictTensor by cloning all of its tensors.
- copy_(source, source_indexes, destination_indexes)
Copy the values of a source DictTensor at the given source indexes into this DictTensor at the specified destination indexes.
- device() → torch.device
Return the device of the tensors stored in the DictTensor.
- get(keys: Iterable[str], clone=False) → rlstructures.core.DictTensor
Returns a DictTensor composed of the subset of tensors specified by their keys.
- Parameters
keys (Iterable[str]) – the keys to keep in the new DictTensor
clone (bool, optional) – if True, the new DictTensor is composed of clones of the original tensors; defaults to False
- Return type
DictTensor
- index(index: int) → rlstructures.core.DictTensor
The same as self.slice(index).
- n_elems() → int
Return the size of the batch dimension (i.e. the first dimension of the tensors).
- prepend_key(_str: str) → rlstructures.core.DictTensor
Return a new DictTensor where _str has been prepended to every key.
- set(key: str, value: torch.Tensor)
Add a tensor to the DictTensor.
- Parameters
key (str) – the name of the tensor
value (torch.Tensor) – the tensor to add; its first (batch) dimension must match n_elems()
- slice(index_from: int, index_to: Optional[int] = None) → rlstructures.core.DictTensor
Returns a DictTensor keeping only the batch indexes in [index_from, index_to).
- Parameters
index_from (int) – the first batch index to keep
index_to (int, optional) – one past the last batch index to keep; if None, only the element at index_from is kept
- Return type
DictTensor
- truncate_key(_str: str) → rlstructures.core.DictTensor
Return a new DictTensor where the prefix _str has been removed from every key that starts with _str.
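For instance (a sketch; the "env/" prefix is illustrative):

```python
import torch
from rlstructures.core import DictTensor

d = DictTensor({"obs": torch.zeros(2, 3)})
d2 = d.prepend_key("env/")    # key becomes "env/obs"
d3 = d2.truncate_key("env/")  # key is back to "obs"
```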
- unfold() → List[rlstructures.core.DictTensor]
Returns a list of DictTensors, each capturing one element of the batch dimension (i.e. such that n_elems() == 1).
- class rlstructures.core.TemporalDictTensor(from_dict: Dict[str, torch.Tensor], lengths: torch.Tensor = None)
Describes a batch of temporal tensors where:
- each tensor has a name
- each tensor is of size B x T x ..., where B is the batch index and T the time index
- the lengths tensor gives the number of timesteps for each element of the batch
It is an extension of DictTensor where a temporal dimension has been added. The structure also allows dealing with batches of sequences of different lengths.
Note that self.lengths returns a tensor containing the length of each element of the batch.
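A minimal sketch (the tensor name and sizes are illustrative):

```python
import torch
from rlstructures.core import TemporalDictTensor

# Batch of B = 2 sequences, padded to T = 5 timesteps,
# with true lengths 3 and 4.
tdt = TemporalDictTensor(
    {"obs": torch.zeros(2, 5, 4)},
    lengths=torch.tensor([3, 4]),
)

m = tdt.mask()                # 2 x 5 float mask; 0.0 marks padded timesteps
at_t0 = tdt.temporal_index(0) # DictTensor at time 0
short = tdt.shorten()         # truncated to 2 x lengths.max() = 2 x 4
```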
- cat(tensors) → rlstructures.core.TemporalDictTensor
Aggregate multiple TemporalDictTensors over the batch dimension.
- Parameters
tensors (list) – a list of TemporalDictTensors
- copy_(source, source_indexes, destination_indexes)
Copy the values of a source TemporalDictTensor at the given source indexes into this TemporalDictTensor at the specified destination indexes.
- get(keys: Iterable[str]) → rlstructures.core.TemporalDictTensor
Returns a subset of the TemporalDictTensor restricted to the specified keys.
- Parameters
keys (iterable) – the keys to keep in the new TemporalDictTensor
- index(index: int) → rlstructures.core.TemporalDictTensor
Returns the 1 x T TemporalDictTensor for the specified batch index.
- mask() → torch.Tensor
Returns a mask over sequences based on the length of each trajectory.
Considering that the TemporalDictTensor is of size B x T, the mask is a float tensor (0.0 or 1.0) of size B x T. A value of 0.0 means that the value at position (b, t) is not set in the TemporalDictTensor.
- masked_temporal_index(index_t: int) → [DictTensor, torch.Tensor]
Return a DictTensor at time index_t along with a mapping vector. Considering that the TemporalDictTensor is of size B x T, the method returns a DictTensor of size B' and a tensor of size B' where:
- only the B' relevant batch elements have been kept (those satisfying index_t < self.lengths)
- the mapping vector maps each of the B' elements to its batch index in the original TemporalDictTensor
- n_elems() → int
Returns the number of elements in the TemporalDictTensor (i.e. the size of the first dimension of each tensor).
- shorten() → rlstructures.core.TemporalDictTensor
Restrict the size of the tensors (in terms of timesteps) to produce the smallest possible tensors.
If the TemporalDictTensor is of size B x T, then, with Tmax = self.lengths.max(), it returns a TemporalDictTensor of size B x Tmax.
- slice(index_from: int, index_to: Optional[int] = None) → rlstructures.core.TemporalDictTensor
Returns a slice (over the batch dimension).
- temporal_index(index_t: int) → rlstructures.core.DictTensor
Return a DictTensor corresponding to the TemporalDictTensor at time index_t.
- temporal_multi_index(index_t: torch.Tensor) → rlstructures.core.DictTensor
Return a DictTensor corresponding to the TemporalDictTensor at the times given by the tensor index_t.
- temporal_slice(index_from: int, index_to: int) → rlstructures.core.TemporalDictTensor
Returns a slice (over the temporal dimension).
- to(device: torch.device)
Returns a copy of the TemporalDictTensor on the provided device (copying only if needed).
- unfold() → List[rlstructures.core.TemporalDictTensor]
Return a list of TemporalDictTensors, each of size 1 x T.
- rlstructures.core.masked_dicttensor(dicttensor0, dicttensor1, mask)
Same as masked_tensor, but operating on two DictTensors, key by key.
- rlstructures.core.masked_tensor(tensor0, tensor1, mask)
Compute a tensor by combining two tensors with a mask.
- Parameters
tensor0 (torch.Tensor) – a B x (N) tensor
tensor1 (torch.Tensor) – a B x (N) tensor
mask (torch.Tensor) – a B tensor
- Returns
(1 - mask) * tensor0 + mask * tensor1 (the combination is made batch element by batch element)
- Return type
tensor0.dtype
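A sketch of the combination semantics, assuming a 0/1 float mask as the formula above suggests (values are illustrative):

```python
import torch
from rlstructures.core import masked_tensor

t0 = torch.zeros(3, 2)
t1 = torch.ones(3, 2)
mask = torch.tensor([1.0, 0.0, 1.0])

# Batch elements where mask == 1 come from t1, those where mask == 0 from t0:
# [[1, 1], [0, 0], [1, 1]]
out = masked_tensor(t0, t1, mask)
```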
VecEnv
- class rlstructures.env.VecEnv
A VecEnv corresponds to multiple 'gym' environments (i.e. a batch) running simultaneously.
At each timestep, among the B environments, only a subset B' of them is still running (since some environments may have stopped).
Each observation returned by the VecEnv is therefore a DictTensor of size B'. To identify which environments are still running, the observation is returned together with a mapping vector of size B'. E.g. [0, 2, 5] means that observation 0 corresponds to env 0, observation 1 corresponds to env 2, and observation 2 corresponds to env 5.
Finally, when executing a step (at time t) over the B' running environments, the agent has to provide an action DictTensor of size B'. The VecEnv returns the next observation (at time t+1, of size B'). But some of the B' environments may have stopped at t+1, so that only B'' environments are still running. The step method thus also returns a B'' observation (and the corresponding mapping).
- The return of the step method is thus:
((DictTensor of size B', tensor of size B'), (DictTensor of size B'', mapping vector of size B''))
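A hedged sketch of a rollout loop over a VecEnv. The environment construction is omitted, the "action" key is illustrative, and reset is assumed to return an (observation, mapping) pair analogous to step's:

```python
import torch
from rlstructures.core import DictTensor

def rollout(env):  # env: a rlstructures.env.VecEnv instance
    obs, running = env.reset()  # assumed: DictTensor of size B', mapping of size B'
    while running.numel() > 0:
        # One (illustrative) action per still-running environment
        actions = DictTensor({"action": torch.zeros(running.numel(), dtype=torch.long)})
        # First pair: obs/mapping for the B' envs that acted;
        # second pair: obs/mapping restricted to the B'' envs still running.
        (all_obs, all_envs), (obs, running) = env.step(actions)
```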
- n_envs() → int
Returns the number of environment instances contained in this VecEnv.
- reset(env_info: rlstructures.core.DictTensor = None)
Reset the environment instances.
- Parameters
env_info (DictTensor, optional) – a DictTensor of size n_envs, such that each value will be transmitted to the corresponding environment instance
- step(policy_output: rlstructures.core.DictTensor) → [[DictTensor, torch.Tensor], [DictTensor, torch.Tensor]]
Execute one step over all the running environment instances.
- Parameters
policy_output (DictTensor) – the output given by the policy
- Returns
see the general description above
- Return type
[[DictTensor, torch.Tensor], [DictTensor, torch.Tensor]]
RL_Agent
- class rlstructures.rl_batchers.agent.RL_Agent
Defines an agent representing a policy.
- call_replay(trajectories: rlstructures.core.Trajectories, t: int, state)
A default function used when replaying an agent over trajectories.
- Parameters
trajectories (Trajectories) – the trajectories over which to replay the agent
t (int) – the current timestep in the trajectories
state – the current state of the replay process (or None if t == 0)
- Returns
all the actions and internal states of the agent along the trajectories
- initial_state(agent_info: rlstructures.core.DictTensor, B: int)
Returns the initial internal state of the agent.
- Parameters
agent_info (DictTensor) – the agent_info used to reset the agent
B (int) – the number of single environments the agent has to deal with
Note that agent_info.n_elems() == B or agent_info.empty().
- Returns
DictTensor
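A minimal sketch of implementing initial_state in a subclass. The CounterAgent name and its state layout are hypothetical, and a real agent must also implement its action-selection method, omitted here:

```python
import torch
from rlstructures.core import DictTensor
from rlstructures.rl_batchers.agent import RL_Agent

class CounterAgent(RL_Agent):
    """Hypothetical agent whose internal state is a per-environment step counter."""

    def initial_state(self, agent_info: DictTensor, B: int):
        # One counter per environment, returned as a DictTensor of batch size B
        return DictTensor({"step": torch.zeros(B, dtype=torch.long)})
```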
- class rlstructures.rl_batchers.agent.RL_Agent_CheckDevice(agent, device)
This class is used to check that an agent works correctly on a particular device. It does not modify the behaviour of the agent, but checks that its inputs and outputs are on the right device.
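Usage is a straightforward wrap (a sketch; my_agent stands for any RL_Agent instance):

```python
import torch
from rlstructures.rl_batchers.agent import RL_Agent_CheckDevice

# Behaves identically to the wrapped agent, while verifying that its
# inputs and outputs live on the given device.
checked_agent = RL_Agent_CheckDevice(my_agent, torch.device("cpu"))
```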
- call_replay(trajectories: rlstructures.core.Trajectories, t: int, state)
A default function used when replaying an agent over trajectories.
- Parameters
trajectories (Trajectories) – the trajectories over which to replay the agent
t (int) – the current timestep in the trajectories
state – the current state of the replay process (or None if t == 0)
- Returns
all the actions and internal states of the agent along the trajectories
- initial_state(agent_info: rlstructures.core.DictTensor, B: int)
Returns the initial internal state of the agent.
- Parameters
agent_info (DictTensor) – the agent_info used to reset the agent
B (int) – the number of single environments the agent has to deal with
Note that agent_info.n_elems() == B or agent_info.empty().
- Returns
DictTensor
- rlstructures.rl_batchers.agent.replay_agent(agent, trajectories: rlstructures.core.Trajectories, replay_method_name: str = 'call_replay')
Replay the transitions one by one in temporal order, passing a state between consecutive calls. Returns a TemporalDictTensor.
- rlstructures.rl_batchers.agent.replay_agent_stateless(agent, trajectories: rlstructures.core.Trajectories, replay_method_name: str)
Replay all the transitions in a single call. Returns a TemporalDictTensor.
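A sketch of replaying an agent over previously collected trajectories; agent and trajectories are assumed to come from an earlier batcher run:

```python
from rlstructures.rl_batchers.agent import replay_agent

# Re-run the (possibly updated) agent over stored trajectories, threading
# its internal state through time; yields a TemporalDictTensor.
replayed = replay_agent(agent, trajectories)
```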
RL_Batcher
- class rlstructures.rl_batchers.batcher.RL_Batcher(n_timesteps, create_agent, agent_args, create_env, env_args, n_processes, seeds, agent_info, env_info, agent_seeds=None, device=device(type='cpu'))
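A hedged construction sketch. create_agent and create_env are user-provided factory functions receiving the corresponding *_args dictionaries; all names and values below are illustrative:

```python
import torch
from rlstructures.core import DictTensor
from rlstructures.rl_batchers.batcher import RL_Batcher

batcher = RL_Batcher(
    n_timesteps=100,            # timesteps acquired per call
    create_agent=create_agent,  # user factory: agent_args -> RL_Agent
    agent_args={},              # passed to create_agent
    create_env=create_env,      # user factory: env_args -> VecEnv
    env_args={},                # passed to create_env
    n_processes=4,              # number of worker processes
    seeds=[1, 2, 3, 4],         # assumed: one environment seed per process
    agent_info=DictTensor({}),  # illustrative (empty) agent_info
    env_info=DictTensor({}),    # illustrative (empty) env_info
    device=torch.device("cpu"),
)
```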