rlstructures – migrating from v0.1 to v0.2

Version 0.2 of rlstructures introduces some critical changes:

From Agent to RL_Agent

Policies are now implemented through the RL_Agent class. The two main differences are:

  • The RL_Agent class has an initial_state method that initializes the state of the agent at reset time (i.e. when RL_Batcher.reset is called). This frees you from handling state initialization in the __call__ function.

  • The RL_Agent does not return its old state anymore; it just provides the agent_do and new_state as output (see the sketch below).
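
A minimal sketch of an RL_Agent subclass is shown below. The initial_state(agent_info, B) and __call__(state, observation, agent_info, history) signatures, and the UniformAgent class itself, are assumptions based on the v0.2 tutorials; check the tutorials for the exact interface.

```python
import torch
from rlstructures import DictTensor, RL_Agent


class UniformAgent(RL_Agent):
    """Hypothetical agent that samples uniformly random actions."""

    def __init__(self, n_actions):
        super().__init__()
        self.n_actions = n_actions

    def initial_state(self, agent_info, B):
        # Called when the batcher is reset: build the initial state for a
        # batch of B environments (here, a simple per-environment step counter).
        return DictTensor({"timestep": torch.zeros(B).long()})

    def __call__(self, state, observation, agent_info=None, history=None):
        B = observation.n_elems()
        # Sample one random action per environment.
        actions = torch.randint(low=0, high=self.n_actions, size=(B,))
        agent_do = DictTensor({"action": actions})
        new_state = DictTensor({"timestep": state["timestep"] + 1})
        # v0.2: return only what the agent does and its new state
        # (the old state is no longer part of the output).
        return agent_do, new_state
```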

From EpisodeBatcher/Batcher to RL_Batcher

RL_Batcher is the batcher class that works with RL_Agent (a usage sketch follows the list below):

  • At construction time:

    • There is no need to specify the n_slots argument anymore

    • One has to provide examples (with n_elems()==1) of the agent_info and env_info that will later be sent to the batcher

    • You can specify the device of the batcher (default is CPU – see the CPU/GPU tutorial)

  • At use time:

    • Only three functions are available: reset, execute and get

  • Outputs:

    • The RL_Batcher now outputs a Trajectories object composed of trajectories.info (a DictTensor) and trajectories.trajectories (a TemporalDictTensor)

    • trajectories.info contains information that is fixed over the trajectory: agent_info, env_info and the initial agent state

    • trajectories.trajectories contains information generated by the environment (observations) together with the actions produced by the agent
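
The sketch below illustrates the new construction and usage pattern. The keyword argument names (n_timesteps, create_agent, agent_args, create_env, env_args, n_processes, seeds, agent_info, env_info, device) and the return value of get are assumptions based on the v0.2 tutorials; create_env, create_agent and their argument dictionaries are placeholders for your own factories.

```python
import torch
from rlstructures import DictTensor, RL_Batcher


def create_env(max_episode_steps=100, seed=None):
    ...  # return your rlstructures-compatible environment here


def create_agent(n_actions=2):
    ...  # return your RL_Agent here (e.g. the UniformAgent sketched above)


batcher = RL_Batcher(
    n_timesteps=100,                     # number of timesteps acquired per call
    create_agent=create_agent,
    agent_args={"n_actions": 2},
    create_env=create_env,
    env_args={"max_episode_steps": 100},
    n_processes=4,
    seeds=[42 + i for i in range(4)],
    # n_elems()==1 examples of the agent_info / env_info sent at reset time
    agent_info=DictTensor({"stochastic": torch.tensor([True])}),
    env_info=DictTensor({"env_id": torch.arange(1)}),
    device=torch.device("cpu"),          # default is CPU; see the CPU/GPU tutorial
)

# Use time: only reset, execute and get are available.
n_envs = 4
batcher.reset(
    agent_info=DictTensor({"stochastic": torch.tensor([True] * n_envs)}),
    env_info=DictTensor({"env_id": torch.arange(n_envs)}),
)
batcher.execute()
trajectories, n_still_running = batcher.get(blocking=True)

# trajectories.info          : DictTensor (agent_info, env_info, initial agent state)
# trajectories.trajectories  : TemporalDictTensor (observations and agent actions)
```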

Replay functions

We now provide a replay_agent function that makes it easy to replay an agent over acquired trajectories (e.g. for loss computation).
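
A hedged usage sketch follows; the import path, the exact signature of replay_agent, and the name of its output are assumptions based on the tutorials, so refer to them for the authoritative API.

```python
from rlstructures import replay_agent  # import path is an assumption

# 'agent' is an RL_Agent and 'trajectories' is the Trajectories object returned
# by RL_Batcher.get. replay_agent re-executes the agent over the stored
# trajectories so that the tensors it produces (e.g. action scores) are
# recomputed with gradients attached, ready for loss computation.
replayed = replay_agent(agent, trajectories)
```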