rlstructures – migrating from v0.1 to v0.2¶
Version 0.2 of rlstructures introduces some critical changes:
From Agent to RL_Agent¶
Policies are now implemented through the RL_Agent class. The two differences are:
The RL_Agent class has an initial_state method that initializes the state of the agent at reset time (i.e. when you call Batcher.reset). This frees you from handling the state initialization in the __call__ function.
The RL_Agent no longer returns its old state; it only outputs agent_do and new_state (see the sketch below).
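As an illustration, here is a minimal sketch of an RL_Agent implementing a uniform random policy. It assumes RL_Agent and DictTensor can be imported from the top-level rlstructures package; the method signatures and the DictTensor keys (timestep, action) are modelled on the v0.2 tutorials rather than prescribed by the text above:

    import torch
    from rlstructures import DictTensor, RL_Agent

    class UniformAgent(RL_Agent):
        def __init__(self, n_actions):
            super().__init__()
            self.n_actions = n_actions

        def initial_state(self, agent_info, B):
            # Built once at reset time for B simultaneous environments
            return DictTensor({"timestep": torch.zeros(B).long()})

        def __call__(self, state, observation, agent_info=None, history=None):
            B = observation.n_elems()
            actions = torch.randint(low=0, high=self.n_actions, size=(B,))
            agent_do = DictTensor({"action": actions})
            new_state = DictTensor({"timestep": state["timestep"] + 1})
            # v0.2: only (agent_do, new_state) is returned, not the old state
            return agent_do, new_state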
From EpisodeBatcher/Batcher to RL_Batcher¶
RL_Batcher is the batcher class that works with RL_Agent:
At construction time:
There is no need to specify the n_slots argument anymore
One has to provide examples (with n_elems()==1) of the agent_info and env_info that will later be sent to the batcher
You can specify the device of the batcher (default is CPU – see the CPU/GPU tutorial)
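For example, constructing an RL_Batcher could look like the sketch below. create_env and create_agent are user-defined factory functions as in v0.1, and all keyword names except agent_info, env_info and device are assumptions modelled on the v0.1 batchers:

    import torch
    from rlstructures import DictTensor
    from rlstructures.rl_batchers import RL_Batcher

    batcher = RL_Batcher(
        n_timesteps=100,                  # timesteps acquired per execute/get round
        create_agent=create_agent,        # user factory returning an RL_Agent
        agent_args={"n_actions": 2},
        create_env=create_env,            # user factory returning a (vectorized) environment
        env_args={"max_episode_steps": 100},
        n_processes=4,
        seeds=[1, 2, 3, 4],
        # examples of the information sent to the batcher, with n_elems()==1
        agent_info=DictTensor({"stochastic": torch.tensor([True])}),
        env_info=DictTensor({}),
        device=torch.device("cpu"),       # default is CPU; see the CPU/GPU tutorial
        # note: no n_slots argument anymore
    )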
At use time:
Only three functions are available: reset, execute and get
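A typical acquisition round then reduces to three calls, as in the sketch below; the keyword arguments of reset, the blocking flag, and the second return value of get are assumptions based on the v0.2 tutorials:

    # agent_info / env_info are DictTensor values describing the episodes to launch
    batcher.reset(agent_info=agent_info, env_info=env_info)
    batcher.execute()
    trajectories, n = batcher.get(blocking=True)  # n: assumed number of still-running acquisitions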
Outputs:
The RL_Batcher now outputs a Trajectories object composed of trajectories.info:DictTensor and trajectories.trajectories:TemporalDictTensor
trajectories.info contains the information that is fixed over the trajectory: agent_info, env_info, and the initial agent state
trajectories.trajectories contains the information generated by the environment (observations) and the actions produced by the agent
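Accessing the two parts of the output could then look like this; the exact keys depend on your agent and environment, and the lengths attribute of TemporalDictTensor is an assumption:

    info = trajectories.info          # DictTensor: fixed for the whole trajectory
    data = trajectories.trajectories  # TemporalDictTensor: time-indexed data

    print(info.keys())    # e.g. agent_info, env_info and initial agent state fields
    print(data.keys())    # e.g. environment observations and agent actions
    print(data.lengths)   # assumed per-trajectory lengths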
Replay functions¶
We now provide a replay_agent function that makes it easy to replay an agent over acquired trajectories (e.g. for loss computation).
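A minimal sketch of how replay_agent might be used inside a loss computation; the import path and the call signature are assumptions, so check the v0.2 examples for the exact API:

    # Assumed import location; it may differ in your version of rlstructures
    from rlstructures.rl_batchers import replay_agent

    # Re-run a (differentiable) agent over previously acquired trajectories,
    # e.g. to recompute action probabilities and values for a policy-gradient loss
    replayed = replay_agent(learning_agent, trajectories)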