DuelQSimple: Double Duelling Deep Q Learning
References: Wang et al., "Dueling Network Architectures for Deep Reinforcement Learning" (2016) for the duelling architecture, and van Hasselt et al., "Deep Reinforcement Learning with Double Q-learning" (2016) for double Q learning.
Description
This file serves as a concrete example of how to implement a baseline, even more concretely than the "do nothing" baseline. Do not expect to obtain state-of-the-art results with this simple method, however.
An example of how to train this model is available in the Examples of the train function below.
Warning
This baseline recodes the entire RL training procedure. You can use it if you want to take a deeper look at the Deep Q Learning algorithm and a possible (non-optimized, slow, etc.) implementation of it.
For a much better implementation, you can reuse the code of the l2rpn_baselines.PPO_RLLIB or the l2rpn_baselines.PPO_SB3 baseline.
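Both of those modules expose the same kind of train / evaluate interface as this one; a minimal sketch of the recommended imports (this assumes PPO_SB3 exports its agent class and helpers the same way DuelQSimple does):

# hedged sketch: assuming PPO_SB3 exposes the same exports as DuelQSimple
from l2rpn_baselines.PPO_SB3 import train, evaluate, PPO_SB3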
Exported class
You can use this class with:
from l2rpn_baselines.DuelQSimple import train, evaluate, DuelQSimple
Classes:

DuelQSimple – inheriting from l2rpn_baselines.utils.DeepQAgent, the agent implementing this baseline.

DuelQ_NNParam – the parameters describing the neural network architecture used by this baseline.

Functions:

evaluate – how to evaluate the performance of the trained DuelQSimple agent.

train – implements the "training" part of the baseline "DuelQSimple".
- class l2rpn_baselines.DuelQSimple.DuelQSimple(action_space, nn_archi, name='DeepQAgent', store_action=True, istraining=False, filter_action_fun=None, verbose=False, observation_space=None, **kwargs_converters)[source]
Inheriting from l2rpn_baselines.utils.DeepQAgent, this class implements the particular agent used for the Double Duelling Deep Q network baseline. It does nothing in particular beyond its parent class.
Warning
This baseline recodes the entire RL training procedure. You can use it if you want to take a deeper look at the Deep Q Learning algorithm and a possible (non-optimized, slow, etc.) implementation of it.
For a much better implementation, you can reuse the code of “PPO_RLLIB” or the “PPO_SB3” baseline.
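For background on the "duelling" in the name: such a network splits the Q function into a state value V(s) and per-action advantages A(s, a), recombined as Q(s, a) = V(s) + A(s, a) - mean_a A(s, a). A minimal sketch of that aggregation in plain NumPy, for illustration only (the actual network layers live in DuelQ_NN, documented below):

import numpy as np

def dueling_q(value, advantages):
    # Q(s, a) = V(s) + (A(s, a) - mean over actions of A);
    # subtracting the mean advantage keeps the V/A split identifiable
    return value + (advantages - advantages.mean())

q_values = dueling_q(1.5, np.array([0.2, -0.1, 0.4]))  # one advantage per action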
- class l2rpn_baselines.DuelQSimple.DuelQ_NNParam(action_size, observation_size, sizes, activs, list_attr_obs)[source]
Warning
This baseline recodes the entire RL training procedure. You can use it if you want to take a deeper look at the Deep Q Learning algorithm and a possible (non-optimized, slow, etc.) implementation of it.
For a much better implementation, you can reuse the code of “PPO_RLLIB” or the “PPO_SB3” baseline.
Classes:

nn_class – alias of DuelQ_NN
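In practice you rarely instantiate this class yourself: the train function builds it from its kwargs_archi argument, whose keys mirror the constructor arguments above (action_size is deduced from the environment, as in the train Examples below). A sketch of that correspondence, with illustrative values only:

# keys of kwargs_archi mirror the DuelQ_NNParam constructor arguments
kwargs_archi = {"observation_size": 400,             # illustrative value
                "sizes": [300, 300],                 # hidden layer sizes
                "activs": ["relu", "relu"],          # one activation per hidden layer
                "list_attr_obs": ["rho", "line_status"]}  # observation attributes to keep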
- l2rpn_baselines.DuelQSimple.evaluate(env, name='DuelQSimple', load_path=None, logs_path='./logs-eval/do-nothing-baseline', nb_episode=1, nb_process=1, max_steps=-1, verbose=False, save_gif=False, filter_action_fun=None)[source]
How to evaluate the performance of the trained DuelQSimple agent.
Warning
This baseline recodes the entire RL training procedure. You can use it if you want to take a deeper look at the Deep Q Learning algorithm and a possible (non-optimized, slow, etc.) implementation of it.
For a much better implementation, you can reuse the code of “PPO_RLLIB” or the “PPO_SB3” baseline.
- Parameters:

env (grid2op.Environment) – The environment on which you evaluate your agent.

name (str) – The name of the trained baseline.

load_path (str) – Path where the agent has been stored.

logs_path (str) – Where to write the results of the assessment.

nb_episode (int) – How many episodes to run during the assessment of the performance.

nb_process (int) – On how many processes the assessment will be made (setting this > 1 can lead to some speed-ups but can be unstable on some platforms).

max_steps (int) – For how many steps, at maximum, your agent will be assessed.

verbose (bool) – Currently unused.

save_gif (bool) – Whether or not you want to save, as a gif, the performance of your agent. It might cause memory issues (it can take a lot of RAM) and drastically increase the computation time.

filter_action_fun (function) – A function to filter the action space. See the IdToAct.filter_action documentation.
- Returns:

agent (l2rpn_baselines.utils.DeepQAgent) – The loaded agent that has been evaluated thanks to the runner.

res (list) – The results of the Runner on which the agent was tested.
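Since two values are returned, a call typically unpacks them; a one-line sketch (the argument values here are illustrative):

agent, res = evaluate(env, name="MyAwesomeAgent", load_path="/WHERE/I/SAVED/THE/MODEL", nb_episode=10)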
Examples
You can evaluate a DuelQSimple baseline this way:

import grid2op
from grid2op.Reward import L2RPNSandBoxScore, L2RPNReward
from l2rpn_baselines.DuelQSimple import evaluate

# Create dataset env
env = grid2op.make("l2rpn_case14_sandbox",
                   reward_class=L2RPNSandBoxScore,
                   other_rewards={"reward": L2RPNReward})

# Call evaluation interface
evaluate(env,
         name="MyAwesomeAgent",
         load_path="/WHERE/I/SAVED/THE/MODEL",
         logs_path=None,
         nb_episode=10,
         nb_process=1,
         max_steps=-1,
         verbose=False,
         save_gif=False)
- l2rpn_baselines.DuelQSimple.train(env, name='DuelQSimple', iterations=1, save_path=None, load_path=None, logs_dir=None, training_param=None, filter_action_fun=None, verbose=True, kwargs_converters={}, kwargs_archi={})[source]
This function implements the "training" part of the baseline "DuelQSimple".
Warning
This baseline recodes the entire RL training procedure. You can use it if you want to take a deeper look at the Deep Q Learning algorithm and a possible (non-optimized, slow, etc.) implementation of it.
For a much better implementation, you can reuse the code of “PPO_RLLIB” or the “PPO_SB3” baseline.
- Parameters:

env (grid2op.Environment) – The environment on which you need to train your agent.

name (str) – The name of your agent.

iterations (int) – For how many iterations (steps) you want to train your agent. NB these are not episodes, these are steps.

save_path (str) – Where you want to save your baseline.

load_path (str) – If you want to reload your baseline, specify the path where it is located. NB if a baseline is reloaded, some of the arguments provided to this function will not be used.

logs_dir (str) – Where to store the tensorboard logs generated during the training. None if you don't want to log them.

verbose (bool) – Whether you want something to be printed on the terminal (a better logging strategy will be put in place at some point).

training_param (l2rpn_baselines.utils.TrainingParam) – The parameters describing the way you will train your model.

filter_action_fun (function) – A function to filter the action space. See the IdToAct.filter_action documentation.

kwargs_converters (dict) – A dictionary containing the keyword arguments passed at the initialization of the grid2op.Converter.IdToAct converter that serves as "base" for the agent.

kwargs_archi (dict) – Keyword arguments used for making the DuelQ_NNParam object that will be used to build the baseline.
- Returns:

baseline – The trained baseline.

- Return type:

DuelQSimple
Examples
Here is an example of how to train a DuelQSimple baseline.
First define a python script, for example:

import grid2op
from grid2op.Reward import L2RPNReward
from l2rpn_baselines.utils import TrainingParam, NNParam
from l2rpn_baselines.DuelQSimple import train

# define the environment
env = grid2op.make("l2rpn_case14_sandbox",
                   reward_class=L2RPNReward)

# use the default training parameters
tp = TrainingParam()

# this will be the list of what part of the observation I want to keep
# more information on https://grid2op.readthedocs.io/en/latest/observation.html#main-observation-attributes
li_attr_obs_X = ["day_of_week", "hour_of_day", "minute_of_hour",
                 "prod_p", "prod_v", "load_p", "load_q",
                 "actual_dispatch", "target_dispatch", "topo_vect",
                 "time_before_cooldown_line", "time_before_cooldown_sub",
                 "rho", "timestep_overflow", "line_status"]

# neural network architecture
observation_size = NNParam.get_obs_size(env, li_attr_obs_X)
sizes = [800, 800, 800, 494, 494, 494]  # sizes of each hidden layer
kwargs_archi = {'observation_size': observation_size,
                'sizes': sizes,
                'activs': ["relu" for _ in sizes],  # all relu activation functions
                "list_attr_obs": li_attr_obs_X}

# select some part of the action space
# more information at https://grid2op.readthedocs.io/en/latest/converter.html#grid2op.Converter.IdToAct.init_converter
kwargs_converters = {"all_actions": None,
                     "set_line_status": False,
                     "change_bus_vect": True,
                     "set_topo_vect": False}

# define the name of the model
nm_ = "AnneOnymous"
try:
    train(env,
          name=nm_,
          iterations=10000,
          save_path="/WHERE/I/SAVED/THE/MODEL",
          load_path=None,
          logs_dir="/WHERE/I/SAVED/THE/LOGS",
          training_param=tp,
          kwargs_converters=kwargs_converters,
          kwargs_archi=kwargs_archi)
finally:
    env.close()
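Once trained, the model saved under save_path can be reloaded and assessed with the evaluate function documented above; a short sketch reusing the names from the script (paths and episode count are illustrative):

import grid2op
from grid2op.Reward import L2RPNReward
from l2rpn_baselines.DuelQSimple import evaluate

env = grid2op.make("l2rpn_case14_sandbox", reward_class=L2RPNReward)
agent, res = evaluate(env,
                      name="AnneOnymous",  # the name used at training time
                      load_path="/WHERE/I/SAVED/THE/MODEL",  # the save_path used at training time
                      nb_episode=2)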
Other non-exported classes
These classes are not exported by default. If you want to use them, you need to import them explicitly, for example (non-exhaustive list):
from l2rpn_baselines.DuelQSimple.duelQ_NN import DuelQ_NN
from l2rpn_baselines.DuelQSimple.duelQ_NN import DuelQ_NNParam
- class l2rpn_baselines.DuelQSimple.duelQ_NN.DuelQ_NN(nn_params, training_param=None)[source]
Constructs the desired duelling deep Q learning network.
Warning
This baseline recodes the entire RL training procedure. You can use it if you want to take a deeper look at the Deep Q Learning algorithm and a possible (non-optimized, slow, etc.) implementation of it.
For a much better implementation, you can reuse the code of “PPO_RLLIB” or the “PPO_SB3” baseline.
Methods:

construct_q_network – uses the architecture defined in the nn_archi attribute to build the Q network.
- class l2rpn_baselines.DuelQSimple.duelQ_NNParam.DuelQ_NNParam(action_size, observation_size, sizes, activs, list_attr_obs)[source]
Warning
This baseline recodes the entire RL training procedure. You can use it if you want to take a deeper look at the Deep Q Learning algorithm and a possible (non-optimized, slow, etc.) implementation of it.
For a much better implementation, you can reuse the code of “PPO_RLLIB” or the “PPO_SB3” baseline.
Classes:

nn_class – alias of DuelQ_NN
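A sketch of how these two non-exported classes fit together, following the documented constructor signatures. These objects are normally created for you by train from kwargs_archi, and all architecture values below are illustrative:

from l2rpn_baselines.utils import TrainingParam
from l2rpn_baselines.DuelQSimple.duelQ_NN import DuelQ_NN
from l2rpn_baselines.DuelQSimple.duelQ_NNParam import DuelQ_NNParam

nn_params = DuelQ_NNParam(action_size=151,       # illustrative value
                          observation_size=400,  # illustrative value
                          sizes=[300, 300],      # hidden layer sizes
                          activs=["relu", "relu"],
                          list_attr_obs=["rho", "line_status"])
nn = DuelQ_NN(nn_params, training_param=TrainingParam())  # builds the Q network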