rlstructures API

DictTensor, TemporalDictTensor, Trajectories

class rlstructures.core.DictTensor(v: Optional[Dict] = None)[source]

A dictionary of torch.Tensor. The first dimension of each tensor is the batch dimension, and all tensors must have the same batch dimension size.
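For illustration, a DictTensor can be built from a plain dict of tensors sharing the same first dimension (the key names below are arbitrary):

    import torch
    from rlstructures.core import DictTensor

    # Two tensors with the same batch size (3)
    d = DictTensor({
        "observation": torch.randn(3, 5),
        "reward": torch.zeros(3),
    })
    assert d.n_elems() == 3   # size of the batch dimension
    print(list(d.keys()))     # ['observation', 'reward']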

cat() → rlstructures.core.DictTensor[source]

Aggregate multiple packed tensors over the batch dimension

Parameters

tensors (list) – a list of DictTensor to aggregate
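A usage sketch, assuming cat is invoked at the class level with a list of DictTensor having identical keys:

    import torch
    from rlstructures.core import DictTensor

    d1 = DictTensor({"x": torch.ones(2, 4)})
    d2 = DictTensor({"x": torch.zeros(3, 4)})

    # Concatenation along the batch dimension: 2 + 3 = 5 elements
    d = DictTensor.cat([d1, d2])
    assert d.n_elems() == 5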

clone() → rlstructures.core.DictTensor[source]

Clone the DictTensor by cloning all its tensors.

Return type

DictTensor

copy_(source, source_indexes, destination_indexes)[source]

Copy the values of a source DictTensor at the given source indexes into the current DictTensor at the specified destination indexes

device() → torch.device[source]

Return the device of the tensors stored in the DictTensor.

Return type

torch.device

empty() → bool[source]

Is the DictTensor empty (i.e. does it contain no tensors)?

Return type

bool

get(keys: Iterable[str], clone=False) → rlstructures.core.DictTensor[source]

Returns a DictTensor composed of a subset of the tensors, specified by their keys

Parameters
  • keys (Iterable[str]) – The keys to keep in the new DictTensor

  • clone (bool, optional) – if True, the new DictTensor is composed of clones of the original tensors; defaults to False

Return type

DictTensor

index(index: int) → rlstructures.core.DictTensor[source]

The same as self.slice(index)

keys() → Iterable[str][source]

Return the keys of the DictTensor (as an iterator)

n_elems() → int[source]

Return the size of the batch dimension (i.e. the first dimension of the tensors)

prepend_key(_str: str) → rlstructures.core.DictTensor[source]

Return a new DictTensor where _str has been prepended to all the keys

set(key: str, value: torch.Tensor)[source]

Add a tensor to the DictTensor

Parameters
  • key (str) – the name of the tensor

  • value (torch.Tensor) – the tensor to add, with a correct batch dimension size

slice(index_from: int, index_to: Optional[int] = None) → rlstructures.core.DictTensor[source]

Returns a DictTensor keeping only the batch indexes from index_from (inclusive) to index_to (exclusive)

Parameters
  • index_from (int) – The first batch index to keep

  • index_to (int, optional) – The first batch index to exclude (i.e. last+1). If None, only index_from is kept

Return type

DictTensor

specs()[source]

Return the specifications of the DictTensor as a dictionary

to(device: torch.device)[source]

Create a copy of the DictTensor on a new device (if needed)

truncate_key(_str: str) → rlstructures.core.DictTensor[source]

Return a new DictTensor where the prefix _str has been removed from all the keys that have _str as a prefix
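A small sketch of the two key-renaming helpers (the "agent/" prefix is illustrative):

    import torch
    from rlstructures.core import DictTensor

    d = DictTensor({"x": torch.ones(2)})

    prefixed = d.prepend_key("agent/")          # keys become ['agent/x']
    restored = prefixed.truncate_key("agent/")  # keys back to ['x']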

unfold() → List[rlstructures.core.DictTensor][source]

Returns a list of DictTensor, each DictTensor capturing one element of the batch dimension (i.e. such that n_elems()==1)
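For illustration, a sketch of the batch-dimension accessors, assuming slice treats index_to as an exclusive bound:

    import torch
    from rlstructures.core import DictTensor

    d = DictTensor({"x": torch.arange(4.0)})

    first_two = d.slice(0, 2)   # batch elements 0 and 1
    single = d.index(3)         # same as d.slice(3)
    parts = d.unfold()          # four DictTensor, each with n_elems() == 1
    assert all(p.n_elems() == 1 for p in parts)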

class rlstructures.core.TemporalDictTensor(from_dict: Dict[str, torch.Tensor], lengths: torch.Tensor = None)[source]

Describes a batch of temporal tensors where:

  • each tensor has a name

  • each tensor is of size B x T x …, where B is the batch index and T the time index

  • the lengths tensor gives the number of timesteps for each element of the batch

It is an extension of DictTensor where a temporal dimension has been added. The structure also allows dealing with batches of sequences of different sizes.

Note that self.lengths returns a tensor of the lengths of each element of the batch.
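A minimal construction sketch (key names illustrative), with tensors of shape B x T x … and a lengths tensor of size B:

    import torch
    from rlstructures.core import TemporalDictTensor

    # Batch of 2 sequences padded to 5 timesteps; true lengths are 3 and 5
    tdt = TemporalDictTensor(
        {"obs": torch.randn(2, 5, 4)},
        lengths=torch.tensor([3, 5]),
    )
    print(tdt.n_elems())   # 2
    print(tdt.mask())      # 2 x 5 float mask, 1.0 where t < lengths[b]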

cat() → rlstructures.core.TemporalDictTensor[source]

Aggregate multiple packed tensors over the batch dimension

Parameters

tensors (list) – a list of TemporalDictTensor to aggregate

clone()[source]
copy_(source, source_indexes, destination_indexes)[source]

Copy the values of a source TemporalDictTensor at the given source indexes into the current TemporalDictTensor at the specified destination indexes

device() → torch.device[source]

Returns the device of the TemporalDictTensor

expand(new_batch_size)[source]

Expand a TemporalDictTensor to reach a given batch_size

full()[source]

Returns True if self.lengths == self.lengths.max() for all elements, i.e. no element of the batch is padded

get(keys: Iterable[str]) → rlstructures.core.TemporalDictTensor[source]

Returns a subset of the TemporalDictTensor restricted to the specified keys

Parameters

keys (iterable) – the keys to keep in the new TemporalDictTensor

index(index: int) → rlstructures.core.TemporalDictTensor[source]

Returns the 1 x T TemporalDictTensor for the specified batch index

keys() → Iterable[str][source]

Returns the keys in the TemporalDictTensor

mask() → torch.Tensor[source]

Returns a mask over sequences based on the length of each trajectory

Considering that the TemporalDictTensor is of size B x T, the mask is a float tensor (0.0 or 1.0) of size B x T. A 0.0 value means that the value at position (b, t) is not set in the TemporalDictTensor.

masked_temporal_index(index_t: int) → [DictTensor, torch.Tensor][source]

Return a DictTensor at time index_t along with a mapping vector. Considering the TemporalDictTensor is of size B x T, the method returns a DictTensor of size B’ and a tensor of size B’ where:

  • only the B’ relevant elements have been kept (those satisfying the index_t < self.lengths criterion)

  • the mapping vector maps each of the B’ elements to the corresponding batch index in the original TemporalDictTensor

n_elems() → int[source]

Returns the number of elements in the TemporalDictTensor (i.e. the size of the first dimension of each tensor).

set(name, tensor)[source]
shorten() → rlstructures.core.TemporalDictTensor[source]

Restrict the size of the variables (in terms of timesteps) to provide the smallest possible tensors.

If the TemporalDictTensor is of size B x T, considering that Tmax = self.lengths.max(), then it returns a TemporalDictTensor of size B x Tmax

slice(index_from: int, index_to: Optional[int] = None) → rlstructures.core.TemporalDictTensor[source]

Returns a slice (in the batch dimension)

specs()[source]
temporal_index(index_t: int) → rlstructures.core.TemporalDictTensor[source]

Return a DictTensor corresponding to the TemporalDictTensor at time index_t.

temporal_multi_index(index_t: torch.Tensor) → rlstructures.core.TemporalDictTensor[source]

Return a DictTensor corresponding to the TemporalDictTensor at time index_t, where index_t provides one time index per element of the batch.

temporal_slice(index_from: int, index_to: int) → rlstructures.core.TemporalDictTensor[source]

Returns a slice (in the temporal dimension)

to(device: torch.device)[source]

Returns a copy of the TemporalDictTensor on the provided device (if needed).

unfold() → List[rlstructures.core.TemporalDictTensor][source]

Return a list of TemporalDictTensor, each of size 1 x T

class rlstructures.core.Trajectories(info, trajectories)[source]
cat()[source]
device()[source]
n_elems()[source]
sample(n)[source]
to(device)[source]
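Judging from the constructor arguments, a Trajectories object pairs a DictTensor of per-batch information (info) with a TemporalDictTensor of time-indexed data (trajectories); this pairing is an assumption drawn from the argument names. A minimal sketch:

    import torch
    from rlstructures.core import DictTensor, TemporalDictTensor, Trajectories

    info = DictTensor({"agent_id": torch.arange(2)})
    traj = TemporalDictTensor(
        {"obs": torch.randn(2, 5, 4)},
        lengths=torch.tensor([3, 5]),
    )
    t = Trajectories(info, traj)
    print(t.n_elems())              # number of trajectories in the batch
    t = t.to(torch.device("cpu"))   # move both parts to a device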
rlstructures.core.masked_dicttensor(dicttensor0, dicttensor1, mask)[source]

Same as masked_tensor, but for DictTensor

rlstructures.core.masked_tensor(tensor0, tensor1, mask)[source]

Compute a tensor by combining two tensors with a mask

Parameters
  • tensor0 (torch.Tensor) – a Bx(N) tensor

  • tensor1 (torch.Tensor) – a Bx(N) tensor

  • mask (torch.Tensor) – a B tensor

Returns

(1 - mask) * tensor0 + mask * tensor1 (the combination is computed line by line, i.e. per batch element)

Return type

tensor0.dtype
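As an illustration of the documented semantics (not the library's internal implementation), the per-line combination can be sketched in plain torch:

    import torch

    def masked_tensor_sketch(tensor0, tensor1, mask):
        # Broadcast the B-sized mask over the remaining dimensions,
        # then combine: (1 - m) * tensor0 + m * tensor1, line by line
        m = mask.float().reshape(-1, *([1] * (tensor0.dim() - 1)))
        return (1 - m) * tensor0 + m * tensor1

    t0 = torch.zeros(3, 2)
    t1 = torch.ones(3, 2)
    mask = torch.tensor([0.0, 1.0, 0.0])
    print(masked_tensor_sketch(t0, t1, mask))  # rows: zeros, ones, zeros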

VecEnv

class rlstructures.env.VecEnv[source]

A VecEnv corresponds to multiple ‘gym’ environments (i.e. a batch) that run simultaneously.

At each timestep, out of the B environments, a subset B’ of envs is still running (since some envs may have stopped).

So each observation returned by the VecEnv is a DictTensor of size B’. To mark which environments are still running, the observation is returned together with a mapping vector of size B’; e.g. [0,2,5] means that observation 0 corresponds to env 0, observation 1 corresponds to env 2, and observation 2 corresponds to env 5.

Finally, when calling the step method (at time t) over the B’ running envs, the agent has to provide an action DictTensor of size B’. The VecEnv will return the next observation (at time t+1, of size B’). But some of the B’ envs may have stopped at t+1, such that only B’’ envs are actually still running. The step method thus also returns a B’’-sized observation (and the corresponding mapping).

The return of the step function is thus:

((DictTensor of size B’, tensor of size B’), (DictTensor of size B’’, mapping vector of size B’’))
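A hedged interaction sketch following the description above; make_vec_env is a hypothetical factory, the "action" key is illustrative, and the assumption that reset also returns an (observation, mapping) pair is not confirmed by this section:

    import torch
    from rlstructures.core import DictTensor

    env = make_vec_env()     # hypothetical factory returning a VecEnv
    obs, who = env.reset()   # assumed: initial observations + mapping vector

    while who.numel() > 0:
        # One action per still-running env
        action = DictTensor({"action": torch.zeros(who.numel(), dtype=torch.long)})
        (obs, who), (next_obs, next_who) = env.step(action)
        obs, who = next_obs, next_who

    env.close()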

close()[source]

Terminate the environment

n_envs() → int[source]

Returns the number of environment instances contained in this env

Return type

int

reset(env_info: rlstructures.core.DictTensor = None)[source]

Reset the environment instances

Parameters

env_info (DictTensor, optional) – a DictTensor of size n_envs, such that each value will be transmitted to each environment instance

step(policy_output: rlstructures.core.DictTensor) → [[DictTensor, torch.Tensor], [DictTensor, torch.Tensor]][source]

Execute one step over all the running environment instances

Parameters

policy_output (DictTensor) – the output given by the policy

Returns

see general description

Return type

[[DictTensor,torch.Tensor],[DictTensor,torch.Tensor]]

RL_Agent

class rlstructures.rl_batchers.agent.RL_Agent[source]

Defines an agent representing a policy

call_replay(trajectories: rlstructures.core.Trajectories, t: int, state)[source]

A default function used when replaying an agent over trajectories

Parameters
  • trajectories (Trajectories) – The trajectories on which one wants to replay the agent

  • t (int) – The current timestep in the trajectories

  • state – The current state of the replay process (or None if t==0)

Returns

All the actions and internal states of the agent during the trajectories

Return type

[TemporalDictTensor]

close()[source]
initial_state(agent_info: rlstructures.core.DictTensor, B: int)[source]

Returns the initial internal state of the agent

Parameters
  • agent_info (DictTensor) – the agent_info used to reset the agent

  • B (int) – the number of single environments the agent has to deal with

Note that agent_info.n_elems()==B or agent_info.empty()

Returns: DictTensor

require_history()[source]

If this function returns True, then the __call__ method will receive a non-None history argument

seed(seed)[source]

Used to set the seed of the agent

update(info)[source]

Update the agent (e.g. the model)
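The __call__ contract is not documented in this section; the skeleton below assumes the signature __call__(state, observation, agent_info, history) used in the library's tutorials, returning an (action, new_state) pair of DictTensor. Treat it as a sketch, not the definitive API:

    import torch
    from rlstructures.core import DictTensor
    from rlstructures.rl_batchers.agent import RL_Agent

    class UniformAgent(RL_Agent):
        # Picks actions uniformly at random; key names are illustrative
        def __init__(self, n_actions):
            super().__init__()
            self.n_actions = n_actions

        def initial_state(self, agent_info, B):
            # One internal state entry per environment instance
            return DictTensor({"timestep": torch.zeros(B).long()})

        def __call__(self, state, observation, agent_info=None, history=None):
            B = observation.n_elems()
            actions = torch.randint(self.n_actions, (B,))
            new_state = DictTensor({"timestep": state["timestep"] + 1})
            return DictTensor({"action": actions}), new_state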

class rlstructures.rl_batchers.agent.RL_Agent_CheckDevice(agent, device)[source]

This class is used to check that an agent works correctly on a particular device. It does not modify the behaviour of the agent, but checks that inputs/outputs are on the right devices.

call_replay(trajectories: rlstructures.core.Trajectories, t: int, state)[source]

A default function used when replaying an agent over trajectories

Parameters
  • trajectories (Trajectories) – The trajectories on which one wants to replay the agent

  • t (int) – The current timestep in the trajectories

  • state – The current state of the replay process (or None if t==0)

Returns

All the actions and internal states of the agent during the trajectories

Return type

[TemporalDictTensor]

close()[source]
initial_state(agent_info: rlstructures.core.DictTensor, B: int)[source]

Returns the initial internal state of the agent

Parameters
  • agent_info (DictTensor) – the agent_info used to reset the agent

  • B (int) – the number of single environments the agent has to deal with

Note that agent_info.n_elems()==B or agent_info.empty()

Returns: DictTensor

require_history()[source]

If this function returns True, then the __call__ method will receive a non-None history argument

update(info)[source]

Update the agent (e.g. the model)
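A short usage sketch (UniformAgent refers to the illustrative agent defined above):

    import torch
    from rlstructures.rl_batchers.agent import RL_Agent_CheckDevice

    # Wrap an agent so that device mismatches raise errors instead of
    # silently producing tensors on the wrong device
    agent = RL_Agent_CheckDevice(UniformAgent(n_actions=4), torch.device("cpu"))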

rlstructures.rl_batchers.agent.replay_agent(agent, trajectories: rlstructures.core.Trajectories, replay_method_name: str = 'call_replay')[source]

Replay transitions one by one in temporal order, passing a state between each call. Returns a TemporalDictTensor.

rlstructures.rl_batchers.agent.replay_agent_stateless(agent, trajectories: rlstructures.core.Trajectories, replay_method_name: str)[source]

Replay all transitions in a single call. Returns a TemporalDictTensor.
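A hedged sketch, assuming agent and trajectories were obtained from a batcher as in the example further below:

    from rlstructures.rl_batchers.agent import replay_agent

    # Replays the agent step by step over the acquired trajectories and
    # returns a TemporalDictTensor of actions and internal states
    replayed = replay_agent(agent, trajectories)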

RL_Batcher

class rlstructures.rl_batchers.batcher.RL_Batcher(n_timesteps, create_agent, agent_args, create_env, env_args, n_processes, seeds, agent_info, env_info, agent_seeds=None, device=torch.device('cpu'))[source]
close()[source]
execute(agent_info=None)[source]
get(blocking=True)[source]
n_elems()[source]
reset(agent_info=<rlstructures.core.DictTensor object>, env_info=<rlstructures.core.DictTensor object>)[source]
update(info)[source]
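A hedged end-to-end sketch: create_env and create_agent are user-provided factory functions (returning a VecEnv and an RL_Agent respectively), and the assumption that get returns a (Trajectories, n_still_running) pair follows the library's tutorials rather than this section:

    import torch
    from rlstructures.core import DictTensor
    from rlstructures.rl_batchers.batcher import RL_Batcher

    batcher = RL_Batcher(
        n_timesteps=100,           # timesteps acquired at each call
        create_agent=create_agent,
        agent_args={"n_actions": 4},
        create_env=create_env,
        env_args={},
        n_processes=4,             # number of worker processes
        seeds=[1, 2, 3, 4],        # one environment seed per process
        agent_info=DictTensor({}),
        env_info=DictTensor({}),
    )
    batcher.reset()
    batcher.execute()
    trajectories, n_still_running = batcher.get(blocking=True)
    batcher.close()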