Provided Algorithms
===================

We provide several reinforcement learning (RL) algorithms as examples:

1) A2C with Generalized Advantage Estimation (GAE)
2) PPO with discrete actions
3) Double Dueling Q-Learning + Prioritized Experience Replay
3bis) A simpler DQN implementation (as an example)
4) SAC for continuous actions
5) REINFORCE
6) REINFORCE DIAYN (see https://arxiv.org/abs/1802.06070)
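
To illustrate the advantage estimate named in item 1, here is a minimal sketch of generalized advantage estimation in plain Python. It is not the code used in this repository. The function name `gae_advantages`, its parameters, and the default values of `gamma` and `lam` are assumptions chosen for illustration, and the sketch assumes a single trajectory segment with no episode termination inside it.

```python
def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Illustrative GAE over one trajectory segment (hypothetical helper).

    rewards[t]  : reward received after acting at step t
    values[t]   : critic estimate V(s_t)
    last_value  : critic estimate for the state following the last step
    Assumes no episode boundary inside the segment.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    # Walk backwards so each step can reuse the discounted sum that follows it.
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of TD errors, with weight gamma * lam.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

With `lam=0` this reduces to the one-step TD error, and with `lam=1` to the discounted return minus the value baseline.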

They can serve as starting points for implementing your own algorithms.

A typical invocation, here for REINFORCE, is `OMP_NUM_THREADS=1 PYTHONPATH=rlstructures python rlstructures/rlalgos/reinforce/main_reinforce.py`
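
The same launch can be scripted from Python, for example to start several runs in a loop. The sketch below only mirrors the command line above. The helper name `build_launch` and its parameters are hypothetical, and it assumes the script is started from the directory that contains the `rlstructures` folder.

```python
import os
import subprocess
import sys


def build_launch(script="rlstructures/rlalgos/reinforce/main_reinforce.py",
                 base_env=None):
    """Return (command, environment) mirroring the shell invocation above.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Copy the environment so the parent process is left unchanged.
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = "1"       # same setting as on the command line
    env["PYTHONPATH"] = "rlstructures"  # same setting as on the command line
    return [sys.executable, script], env


def launch(script="rlstructures/rlalgos/reinforce/main_reinforce.py"):
    """Run the script and raise CalledProcessError if it exits with an error."""
    command, env = build_launch(script)
    return subprocess.run(command, env=env, check=True)
```

Calling `launch()` starts the REINFORCE example. Pass another main file path to start a different algorithm.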

Note that all algorithms produce TensorBoard and CSV output (see `config["logdir"]` in the main file).