rlstructures API
DictTensor, TemporalDictTensor, Trajectories
- class rlstructures.core.DictTensor(v: Optional[Dict] = None)
A dictionary of torch.Tensor. The first dimension of each tensor is the batch dimension; all tensors share the same batch-dimension size.
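A minimal usage sketch based on the methods documented below (the key names are illustrative, and cat is assumed to be callable as a static method, as its signature suggests):

```python
import torch
from rlstructures.core import DictTensor

# Two tensors sharing batch dimension B = 3
d = DictTensor({"obs": torch.zeros(3, 4), "reward": torch.ones(3)})

print(d.n_elems())               # 3: the shared batch-dimension size
obs_only = d.get(["obs"])        # sub-DictTensor with only the "obs" tensor
first_two = d.slice(0, 2)        # batch elements 0 and 1
parts = d.unfold()               # 3 DictTensors, each with n_elems() == 1
rebuilt = DictTensor.cat(parts)  # aggregate back over the batch dimension
```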
- cat(tensors) → rlstructures.core.DictTensor
Aggregate multiple DictTensors over the batch dimension.
- Parameters
tensors (list) – a list of DictTensors
- clone() → rlstructures.core.DictTensor
Clone the DictTensor by cloning all of its tensors.
- copy_(source, source_indexes, destination_indexes)
Copy the values of a source DictTensor at the given source indexes into this DictTensor at the specified destination indexes.
- device() → torch.device
Return the device of the tensors stored in the DictTensor.
- get(keys: Iterable[str], clone=False) → rlstructures.core.DictTensor
Returns a DictTensor composed of the subset of tensors specified by their keys.
- Parameters
keys (Iterable[str]) – the keys to keep in the new DictTensor
clone (bool, optional) – if True, the new DictTensor is composed of clones of the original tensors; defaults to False
- Return type
DictTensor
- index(index: int) → rlstructures.core.DictTensor
The same as self.slice(index).
- n_elems() → int
Return the size of the batch dimension (i.e. the first dimension of the tensors).
- prepend_key(_str: str) → rlstructures.core.DictTensor
Return a new DictTensor where _str has been prepended to every key.
- set(key: str, value: torch.Tensor)
Add a tensor to the DictTensor.
- Parameters
key (str) – the name of the tensor
value (torch.Tensor) – the tensor to add; its first (batch) dimension must match n_elems()
- slice(index_from: int, index_to: Optional[int] = None) → rlstructures.core.DictTensor
Returns a DictTensor keeping only the batch indexes in [index_from, index_to).
- Parameters
index_from (int) – the first batch index to keep
index_to (int, optional) – one past the last batch index to keep; if None, only the element at index_from is kept
- Return type
DictTensor
- truncate_key(_str: str) → rlstructures.core.DictTensor
Return a new DictTensor where the prefix _str has been removed from every key that starts with _str.
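For instance (a sketch; the "env/" prefix is illustrative):

```python
import torch
from rlstructures.core import DictTensor

d = DictTensor({"obs": torch.zeros(2, 3)})
d2 = d.prepend_key("env/")    # key becomes "env/obs"
d3 = d2.truncate_key("env/")  # key is back to "obs"
```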
- unfold() → List[rlstructures.core.DictTensor]
Returns a list of DictTensors, each capturing one element of the batch dimension (i.e. such that n_elems() == 1).
- class rlstructures.core.TemporalDictTensor(from_dict: Dict[str, torch.Tensor], lengths: torch.Tensor = None)
Describes a batch of temporal tensors where:
- each tensor has a name
- each tensor is of size B x T x ..., where B is the batch index and T the time index
- the lengths tensor gives the number of timesteps for each element of the batch
It is an extension of DictTensor where a temporal dimension has been added. The structure also allows dealing with batches of sequences of different lengths.
Note that self.lengths returns a tensor containing the length of each element of the batch.
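A minimal sketch (the tensor name and sizes are illustrative):

```python
import torch
from rlstructures.core import TemporalDictTensor

# Batch of B = 2 sequences, padded to T = 5 timesteps,
# with true lengths 3 and 4.
tdt = TemporalDictTensor(
    {"obs": torch.zeros(2, 5, 4)},
    lengths=torch.tensor([3, 4]),
)

m = tdt.mask()                # 2 x 5 float mask; 0.0 marks padded timesteps
at_t0 = tdt.temporal_index(0) # DictTensor at time 0
short = tdt.shorten()         # truncated to 2 x lengths.max() = 2 x 4
```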
- cat(tensors) → rlstructures.core.TemporalDictTensor
Aggregate multiple TemporalDictTensors over the batch dimension.
- Parameters
tensors (list) – a list of TemporalDictTensors
- copy_(source, source_indexes, destination_indexes)
Copy the values of a source TemporalDictTensor at the given source indexes into this TemporalDictTensor at the specified destination indexes.
- get(keys: Iterable[str]) → rlstructures.core.TemporalDictTensor
Returns a subset of the TemporalDictTensor restricted to the specified keys.
- Parameters
keys (iterable) – the keys to keep in the new TemporalDictTensor
- index(index: int) → rlstructures.core.TemporalDictTensor
Returns the 1 x T TemporalDictTensor for the specified batch index.
- mask() → torch.Tensor
Returns a mask over sequences based on the length of each trajectory.
Considering that the TemporalDictTensor is of size B x T, the mask is a float tensor (0.0 or 1.0) of size B x T. A value of 0.0 means that the value at position (b, t) is not set in the TemporalDictTensor.
- masked_temporal_index(index_t: int) → [DictTensor, torch.Tensor]
Return a DictTensor at time index_t along with a mapping vector. Considering that the TemporalDictTensor is of size B x T, the method returns a DictTensor of size B' and a tensor of size B' where:
- only the B' relevant batch elements have been kept (those satisfying index_t < self.lengths)
- the mapping vector maps each of the B' elements to its batch index in the original TemporalDictTensor
- n_elems() → int
Returns the number of elements in the TemporalDictTensor (i.e. the size of the first dimension of each tensor).
- shorten() → rlstructures.core.TemporalDictTensor
Restrict the size of the tensors (in terms of timesteps) to produce the smallest possible tensors.
If the TemporalDictTensor is of size B x T, then, with Tmax = self.lengths.max(), it returns a TemporalDictTensor of size B x Tmax.
- slice(index_from: int, index_to: Optional[int] = None) → rlstructures.core.TemporalDictTensor
Returns a slice (over the batch dimension).
- temporal_index(index_t: int) → rlstructures.core.DictTensor
Return a DictTensor corresponding to the TemporalDictTensor at time index_t.
- temporal_multi_index(index_t: torch.Tensor) → rlstructures.core.DictTensor
Return a DictTensor corresponding to the TemporalDictTensor at the times given by the tensor index_t.
- temporal_slice(index_from: int, index_to: int) → rlstructures.core.TemporalDictTensor
Returns a slice (over the temporal dimension).
- to(device: torch.device)
Returns a copy of the TemporalDictTensor on the provided device (copying only if needed).
- unfold() → List[rlstructures.core.TemporalDictTensor]
Return a list of TemporalDictTensors, each of size 1 x T.
- rlstructures.core.masked_dicttensor(dicttensor0, dicttensor1, mask)
Same as masked_tensor, but operating on two DictTensors, key by key.
- rlstructures.core.masked_tensor(tensor0, tensor1, mask)
Compute a tensor by combining two tensors with a mask.
- Parameters
tensor0 (torch.Tensor) – a B x (N) tensor
tensor1 (torch.Tensor) – a B x (N) tensor
mask (torch.Tensor) – a B tensor
- Returns
(1 - mask) * tensor0 + mask * tensor1 (the combination is made batch element by batch element)
- Return type
tensor0.dtype
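A sketch of the combination semantics, assuming a 0/1 float mask as the formula above suggests (values are illustrative):

```python
import torch
from rlstructures.core import masked_tensor

t0 = torch.zeros(3, 2)
t1 = torch.ones(3, 2)
mask = torch.tensor([1.0, 0.0, 1.0])

# Batch elements where mask == 1 come from t1, those where mask == 0 from t0:
# [[1, 1], [0, 0], [1, 1]]
out = masked_tensor(t0, t1, mask)
```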
VecEnv
- class rlstructures.env.VecEnv
A VecEnv corresponds to multiple 'gym' environments (i.e. a batch) running simultaneously.
At each timestep, among the B environments, only a subset B' of them is still running (since some environments may have stopped).
Each observation returned by the VecEnv is therefore a DictTensor of size B'. To identify which environments are still running, the observation is returned together with a mapping vector of size B'. E.g. [0, 2, 5] means that observation 0 corresponds to env 0, observation 1 corresponds to env 2, and observation 2 corresponds to env 5.
Finally, when executing a step (at time t) over the B' running environments, the agent has to provide an action DictTensor of size B'. The VecEnv returns the next observation (at time t+1, of size B'). But some of the B' environments may have stopped at t+1, so that only B'' environments are still running. The step method thus also returns a B'' observation (and the corresponding mapping).
- The return of the step method is thus:
((DictTensor of size B', tensor of size B'), (DictTensor of size B'', mapping vector of size B''))
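A hedged sketch of a rollout loop over a VecEnv. The environment construction is omitted, the "action" key is illustrative, and reset is assumed to return an (observation, mapping) pair analogous to step's:

```python
import torch
from rlstructures.core import DictTensor

def rollout(env):  # env: a rlstructures.env.VecEnv instance
    obs, running = env.reset()  # assumed: DictTensor of size B', mapping of size B'
    while running.numel() > 0:
        # One (illustrative) action per still-running environment
        actions = DictTensor({"action": torch.zeros(running.numel(), dtype=torch.long)})
        # First pair: obs/mapping for the B' envs that acted;
        # second pair: obs/mapping restricted to the B'' envs still running.
        (all_obs, all_envs), (obs, running) = env.step(actions)
```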
- n_envs() → int
Returns the number of environment instances contained in this VecEnv.
- reset(env_info: rlstructures.core.DictTensor = None)
Reset the environment instances.
- Parameters
env_info (DictTensor, optional) – a DictTensor of size n_envs, such that each value will be transmitted to the corresponding environment instance
- step(policy_output: rlstructures.core.DictTensor) → [[DictTensor, torch.Tensor], [DictTensor, torch.Tensor]]
Execute one step over all the running environment instances.
- Parameters
policy_output (DictTensor) – the output given by the policy
- Returns
see the general description above
- Return type
[[DictTensor, torch.Tensor], [DictTensor, torch.Tensor]]
RL_Agent
- class rlstructures.rl_batchers.agent.RL_Agent
Defines an agent representing a policy.
- call_replay(trajectories: rlstructures.core.Trajectories, t: int, state)
A default function used when replaying an agent over trajectories.
- Parameters
trajectories (Trajectories) – the trajectories over which to replay the agent
t (int) – the current timestep in the trajectories
state – the current state of the replay process (or None if t == 0)
- Returns
all the actions and internal states of the agent along the trajectories
- initial_state(agent_info: rlstructures.core.DictTensor, B: int)
Returns the initial internal state of the agent.
- Parameters
agent_info (DictTensor) – the agent_info used to reset the agent
B (int) – the number of single environments the agent has to deal with
Note that agent_info.n_elems() == B or agent_info.empty().
- Returns
DictTensor
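A minimal sketch of implementing initial_state in a subclass. The CounterAgent name and its state layout are hypothetical, and a real agent must also implement its action-selection method, omitted here:

```python
import torch
from rlstructures.core import DictTensor
from rlstructures.rl_batchers.agent import RL_Agent

class CounterAgent(RL_Agent):
    """Hypothetical agent whose internal state is a per-environment step counter."""

    def initial_state(self, agent_info: DictTensor, B: int):
        # One counter per environment, returned as a DictTensor of batch size B
        return DictTensor({"step": torch.zeros(B, dtype=torch.long)})
```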
- class rlstructures.rl_batchers.agent.RL_Agent_CheckDevice(agent, device)
This class is used to check that an agent works correctly on a particular device. It does not modify the behaviour of the agent, but checks that its inputs and outputs are on the right device.
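Usage is a straightforward wrap (a sketch; my_agent stands for any RL_Agent instance):

```python
import torch
from rlstructures.rl_batchers.agent import RL_Agent_CheckDevice

# Behaves identically to the wrapped agent, while verifying that its
# inputs and outputs live on the given device.
checked_agent = RL_Agent_CheckDevice(my_agent, torch.device("cpu"))
```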
- call_replay(trajectories: rlstructures.core.Trajectories, t: int, state)
A default function used when replaying an agent over trajectories.
- Parameters
trajectories (Trajectories) – the trajectories over which to replay the agent
t (int) – the current timestep in the trajectories
state – the current state of the replay process (or None if t == 0)
- Returns
all the actions and internal states of the agent along the trajectories
- initial_state(agent_info: rlstructures.core.DictTensor, B: int)
Returns the initial internal state of the agent.
- Parameters
agent_info (DictTensor) – the agent_info used to reset the agent
B (int) – the number of single environments the agent has to deal with
Note that agent_info.n_elems() == B or agent_info.empty().
- Returns
DictTensor
- rlstructures.rl_batchers.agent.replay_agent(agent, trajectories: rlstructures.core.Trajectories, replay_method_name: str = 'call_replay')
Replay the transitions one by one in temporal order, passing a state between consecutive calls. Returns a TemporalDictTensor.
- rlstructures.rl_batchers.agent.replay_agent_stateless(agent, trajectories: rlstructures.core.Trajectories, replay_method_name: str)
Replay all the transitions in a single call. Returns a TemporalDictTensor.
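A sketch of replaying an agent over previously collected trajectories; agent and trajectories are assumed to come from an earlier batcher run:

```python
from rlstructures.rl_batchers.agent import replay_agent

# Re-run the (possibly updated) agent over stored trajectories, threading
# its internal state through time; yields a TemporalDictTensor.
replayed = replay_agent(agent, trajectories)
```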
RL_Batcher
- class rlstructures.rl_batchers.batcher.RL_Batcher(n_timesteps, create_agent, agent_args, create_env, env_args, n_processes, seeds, agent_info, env_info, agent_seeds=None, device=device(type='cpu'))
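A hedged construction sketch. create_agent and create_env are user-provided factory functions receiving the corresponding *_args dictionaries; all names and values below are illustrative:

```python
import torch
from rlstructures.core import DictTensor
from rlstructures.rl_batchers.batcher import RL_Batcher

batcher = RL_Batcher(
    n_timesteps=100,            # timesteps acquired per call
    create_agent=create_agent,  # user factory: agent_args -> RL_Agent
    agent_args={},              # passed to create_agent
    create_env=create_env,      # user factory: env_args -> VecEnv
    env_args={},                # passed to create_env
    n_processes=4,              # number of worker processes
    seeds=[1, 2, 3, 4],         # assumed: one environment seed per process
    agent_info=DictTensor({}),  # illustrative (empty) agent_info
    env_info=DictTensor({}),    # illustrative (empty) env_info
    device=torch.device("cpu"),
)
```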