SACOld: Soft Actor Critic (old implementation)

This baseline comes from the paper: "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor"

Description

This module provides an implementation of the SAC algorithm.

This is an old implementation that is probably not correct. It is kept for backward compatibility with earlier versions (< 0.5.0) of this package.

An example of how to train this model is available in the documentation of the train function (see Example-sacold).
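For illustration only, here is a minimal training sketch, not the official example: the environment name, run name, iteration count and save path below are assumptions, see Example-sacold for the exact arguments of train.

# Minimal sketch, assuming train follows the usual l2rpn_baselines
# signature (env, name, iterations, save_path); all values below are
# illustrative.
import grid2op
from l2rpn_baselines.SACOld import train

env = grid2op.make("l2rpn_case14_sandbox")  # any grid2op environment
train(env,
      name="SACOld_demo",          # hypothetical run name
      iterations=10000,            # number of training iterations (assumption)
      save_path="./saved_models")  # where the weights are stored (assumption)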

Warning

This baseline re-implements the entire RL training procedure. You can use it if you want a deeper look at the SAC algorithm and at a possible (non-optimized, slow, etc.) implementation of it.

For a much better implementation, you can reuse the code of the "PPO_RLLIB" or "PPO_SB3" baselines.
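For instance, assuming those baselines expose the same train and evaluate helpers (check their own documentation), you could start from:

from l2rpn_baselines.PPO_SB3 import train, evaluate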

Exported class

You can import the exported class and helper functions with:

from l2rpn_baselines.SACOld import train, evaluate, SACOld
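As an illustration, here is a hedged evaluation sketch; the argument names below (load_path, nb_episode) follow the usual l2rpn_baselines evaluate pattern but are assumptions, so check the function's documentation before use.

import grid2op
from l2rpn_baselines.SACOld import evaluate

env = grid2op.make("l2rpn_case14_sandbox")  # any grid2op environment
evaluate(env,
         name="SACOld_demo",          # must match the name used at training time
         load_path="./saved_models",  # directory with the saved weights (assumption)
         nb_episode=2)                # number of evaluation episodes (assumption)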

Other non-exported classes

These classes are not exported by default. If you want to use them, you can import them explicitly with (non-exhaustive list):

from l2rpn_baselines.SACOld.sacOld_NN import SACOld_NN
from l2rpn_baselines.SACOld.sacOld_NNParam import SACOld_NNParam