Provided Algorithms
We provide multiple RL algorithms as examples.
1) A2C with Generalized Advantage Estimation (GAE)
2) PPO with discrete actions
3) Double Duelling Q-Learning + Prioritized Experience Replay
3bis) A simpler DQN implementation (as an example)
4) SAC for continuous actions
5) REINFORCE
6) REINFORCE DIAYN (see https://arxiv.org/abs/1802.06070)
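As an illustration of the advantage estimator used by the A2C example, here is a minimal sketch of Generalized Advantage Estimation for a single trajectory. It is not the code shipped in rlalgos: the function name `compute_gae`, its arguments, and the default values of `gamma` and `lam` are assumptions made for this example only.

```python
from typing import List, Sequence


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    last_value: float,
    dones: Sequence[bool],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Return GAE advantages for one trajectory (hypothetical helper).

    values[t] is the critic estimate V(s_t); last_value is V(s_T), the
    estimate for the state reached after the final transition.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    # Walk backwards so each step can reuse the advantage of the next step.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error: r_t + gamma * V(s_{t+1}) - V(s_t).
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Discounted sum of TD errors, cut at episode boundaries.
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

With `lam=0` this reduces to the one-step TD error, and with `lam=1` it becomes the discounted return minus the value baseline; intermediate values trade bias against variance.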
These implementations can serve as starting points for your own algorithms.
A typical execution command is:

    OMP_NUM_THREADS=1 PYTHONPATH=rlstructures python rlstructures/rlalgos/reinforce/main_reinforce.py
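The same launch can be scripted from Python, which makes the role of the two environment variables explicit. The sketch below is only an illustration: `build_launch` and `run` are hypothetical helpers, not part of rlstructures. `OMP_NUM_THREADS=1` limits OpenMP to one thread per process (presumably to avoid oversubscribing the CPU when several worker processes run in parallel), and `PYTHONPATH=rlstructures` adds the `rlstructures` directory to the import path.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional, Tuple


def build_launch(
    script: str, base_env: Optional[Dict[str, str]] = None
) -> Tuple[List[str], Dict[str, str]]:
    """Build the command and environment for one run (hypothetical helper)."""
    # Copy the environment so the caller's own variables are left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = "1"       # one OpenMP thread per process
    env["PYTHONPATH"] = "rlstructures"  # directory added to the import path
    return [sys.executable, script], env


def run(script: str) -> None:
    """Launch the script from the repository root and fail on a non-zero exit."""
    cmd, env = build_launch(script)
    subprocess.run(cmd, env=env, check=True)
```

Calling `run("rlstructures/rlalgos/reinforce/main_reinforce.py")` from the repository root would then be equivalent to the shell command.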
Note that all algorithms produce TensorBoard and CSV output (see config["logdir"] in the main file).