Provided Algorithms
===================

We provide several RL algorithms as examples:

1) A2C with Generalized Advantage Estimation (GAE)
2) PPO with discrete actions
3) Double Duelling Q-Learning with Prioritized Experience Replay
3bis) A simpler DQN implementation (as an example)
4) SAC for continuous actions
5) REINFORCE
6) REINFORCE with DIAYN (see https://arxiv.org/abs/1802.06070)

These algorithms can serve as starting points for implementing your own algorithms.

A typical execution is:

`OMP_NUM_THREADS=1 PYTHONPATH=rlstructures python rlstructures/rlalgos/reinforce/main_reinforce.py`

Note that all algorithms produce TensorBoard logs and a CSV output, written to the directory given by `config["logdir"]` in the corresponding main file.
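To illustrate what the advantage estimator used by the A2C example computes, here is a minimal, framework-independent sketch of Generalized Advantage Estimation over one trajectory segment. It is not the rlstructures implementation: the function name `compute_gae`, its parameters, and the use of plain Python lists (instead of the tensors the library works with) are assumptions made for illustration. It applies the recursion A_t = delta_t + gamma * lam * A_{t+1}, with delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), and stops bootstrapping at episode boundaries.

```python
from typing import List


def compute_gae(
    rewards: List[float],
    values: List[float],
    next_values: List[float],
    dones: List[bool],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Hypothetical helper: GAE advantages for a single trajectory segment.

    values[t] is V(s_t), next_values[t] is V(s_{t+1}), and dones[t] is True
    when the episode terminates after step t.
    """
    T = len(rewards)
    advantages = [0.0] * T
    running = 0.0  # holds A_{t+1} while iterating backwards
    for t in reversed(range(T)):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; no bootstrap from V(s_{t+1}) if the episode ended at t.
        delta = rewards[t] + gamma * next_values[t] * not_done - values[t]
        # GAE recursion, reset at episode boundaries.
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
    return advantages


if __name__ == "__main__":
    # With lam=1 and a zero value function, GAE reduces to the discounted
    # Monte Carlo return, i.e. the quantity REINFORCE weights its gradient by.
    print(compute_gae([1.0, 1.0, 1.0], [0.0] * 3, [0.0] * 3,
                      [False, False, True], gamma=1.0, lam=1.0))
```

Setting `lam` close to 0 relies mostly on the critic (low variance, more bias), while `lam = 1` gives Monte Carlo-style estimates (unbiased, higher variance); intermediate values trade off between the two.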