Provided Algorithms
We provide multiple RL algorithms as examples.
1) A2C with Generalized Advantage Estimator
2) PPO with discrete actions
3) Double Duelling Q-Learning + Prioritized Experience Replay
3bis) A simpler DQN implementation (as an example)
4) SAC for continuous actions
5) REINFORCE
6) REINFORCE DIAYN (see
These implementations can serve as starting points for implementing your own algorithms.
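To give a feel for what one of the listed components computes, here is a minimal sketch of the advantage estimator that accompanies A2C in the list above. It is not taken from the rlstructures code: the function name `gae_advantages` and all of its parameters are assumptions made for illustration, and it uses plain Python lists rather than tensors.

```python
from typing import List, Sequence


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    last_value: float,
    dones: Sequence[bool],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Hypothetical helper: generalized advantage estimates for one trajectory.

    rewards[t] and dones[t] describe the transition taken at step t,
    values[t] is the critic's estimate for the state at step t, and
    last_value is the estimate for the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    next_advantage = 0.0
    # Walk backwards so each step can reuse the advantage of the next step.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of future TD errors.
        next_advantage = delta + gamma * lam * not_done * next_advantage
        advantages[t] = next_advantage
        next_value = values[t]
    return advantages
```

With `lam=0` the estimate reduces to the one-step TD error, and with `lam=1` it becomes the discounted return minus the value baseline.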
Typical execution is:
OMP_NUM_THREADS=1 PYTHONPATH=rlstructures python rlstructures/rlalgos/reinforce/
Note that all algorithms produce TensorBoard and CSV output (see config["logdir"] in the main file).
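Once a run has finished, the CSV output can be inspected with the standard library alone. The sketch below assumes only that the CSV files live somewhere under the directory given by config["logdir"]; the file names and column names are not specified here, so the hypothetical helper `load_csv_logs` simply collects every `.csv` file it finds.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_csv_logs(logdir: str) -> Dict[str, List[Dict[str, str]]]:
    """Hypothetical helper: read every CSV file found under logdir.

    Returns a mapping from file path to a list of rows, where each row
    is a dict keyed by the column names in that file's header line.
    """
    logs: Dict[str, List[Dict[str, str]]] = {}
    for path in sorted(Path(logdir).rglob("*.csv")):
        with path.open(newline="") as f:
            logs[str(path)] = list(csv.DictReader(f))
    return logs
```

For example, `load_csv_logs(config["logdir"])` would return one entry per CSV file, with all values left as strings so that any numeric conversion stays explicit.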