SAC: Soft Actor-Critic
This baseline comes from the paper: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Description
This module provides an implementation of the SAC algorithm.
This is an old implementation that is likely not correct; it is kept only for backward compatibility with earlier versions (< 0.5.0) of this package.
An example showing how to train this model is available in the documentation of the train function (Example-sacold).
Warning
This baseline re-implements the entire RL training procedure. Use it if you want a deeper look at the SAC algorithm and at one possible (non-optimized, slow, etc.) implementation of it.
For a much better implementation, you can reuse the code of the "PPO_RLLIB" or "PPO_SB3" baselines.
Exported class
You can use this class with:
from l2rpn_baselines.SACOld import train, evaluate, SACOld
Other non-exported classes
These classes are not exported by default. If you want to use them, you need to import them explicitly, for example (non-exhaustive list):
from l2rpn_baselines.SACOld.sacOld_NN import SACOld_NN
from l2rpn_baselines.SACOld.sacOld_NNParam import SACOld_NNParam
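To make the description above more concrete: SAC is an off-policy actor-critic method that maximizes expected return plus an entropy bonus. The heart of its critic update is the entropy-regularized ("soft") Bellman target. The sketch below is NOT this module's API; it is a minimal, self-contained NumPy illustration of that target, using hypothetical toy values, with the twin-Q minimum trick from the SAC paper.

```python
import numpy as np

def soft_bellman_target(reward, done, q1_next, q2_next, log_prob_next,
                        gamma=0.99, alpha=0.2):
    """Entropy-regularized TD target used by the SAC critic (illustrative sketch).

    q1_next, q2_next: twin critic estimates Q(s', a') for a' ~ pi(.|s')
    log_prob_next:    log pi(a'|s') of the sampled next action
    alpha:            entropy temperature
    """
    # Twin-Q trick: take the element-wise minimum to reduce overestimation bias.
    q_next = np.minimum(q1_next, q2_next)
    # Soft state value: V(s') = Q(s', a') - alpha * log pi(a'|s')
    v_next = q_next - alpha * log_prob_next
    # Standard TD target, zeroed at terminal states (done == 1).
    return reward + gamma * (1.0 - done) * v_next

# Toy batch of two transitions (values are made up for illustration).
target = soft_bellman_target(
    reward=np.array([1.0, 0.5]),
    done=np.array([0.0, 1.0]),
    q1_next=np.array([2.0, 3.0]),
    q2_next=np.array([1.5, 3.5]),
    log_prob_next=np.array([-0.7, -0.2]),
)
# target[0] = 1.0 + 0.99 * (1.5 + 0.2 * 0.7) ≈ 2.6236; target[1] = 0.5 (terminal)
```

The actual SACOld networks (SACOld_NN, SACOld_NNParam) wrap this kind of update inside Keras models; this fragment only shows the underlying computation the critic loss regresses toward.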