DuelQLeapNet: D3QN with LeapNet

TODO: reference the original papers: the ESANN paper and the Leap Net paper.

It has now been implemented as a GitHub repository: Leap Net GitHub.

Description

The Leap Net is a type of neural network that has shown very good performance at predicting power line flows from the injections and the topology.

In this baseline, we use this very same architecture to model the Q function, trained with the D3QN (double duelling deep Q network) RL method.
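
For intuition, the core of a leap net is a residual block in which the tau vector (here, typically topology-related attributes) modulates a latent encoding of the rest of the observation, roughly y = h + D(E(h) ⊙ τ). The snippet below is a minimal, hypothetical Keras sketch of that idea (the helper name, layer sizes and activations are made up for illustration); it is not the network actually built by DuelQLeapNet_NN, which is constructed from the LeapNet_NNParam architecture description.

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_toy_leapnet_q(x_dim, tau_dim, n_actions, hidden=64):
    """Hypothetical toy Q-network with a single leap block, for illustration only."""
    x = layers.Input(shape=(x_dim,), name="x")        # "regular" part of the observation
    tau = layers.Input(shape=(tau_dim,), name="tau")  # "topology-like" part of the observation
    h = layers.Dense(hidden, activation="relu")(x)
    # leap block: h + D(E(h) * tau)
    e = layers.Dense(tau_dim, use_bias=False)(h)      # E: project the latent state to dim(tau)
    m = layers.Multiply()([e, tau])                   # element-wise modulation by tau
    d = layers.Dense(hidden, use_bias=False)(m)       # D: project back to the latent dimension
    h = layers.Add()([h, d])                          # residual "leap"
    q = layers.Dense(n_actions)(h)                    # one Q-value per discrete action
    return Model(inputs=[x, tau], outputs=q)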

An example showing how to train this model is given in the Examples section of the train function below.

Warning

This baseline re-implements the entire RL training procedure. You can use it if you want to take a deeper look at the deep Q-learning algorithm and at a possible (non-optimized, slow, etc.) implementation of it.

For a much better implementation, you can reuse the code of l2rpn_baselines.PPO_RLLIB or the l2rpn_baselines.PPO_SB3 baseline.

Exported class

You can use this class with:

from l2rpn_baselines.DuelQLeapNet import train, evaluate, DuelQLeapNet

Classes:

DuelQLeapNet(action_space, nn_archi[, name, ...])

Inheriting from l2rpn_baselines.utils.deepQAgent.DeepQAgent, this class implements the particular agent used for the Double Duelling Deep Q network baseline, with the particularity that the Q network is encoded with a leap net.

DuelQLeapNet_NN(nn_params[, training_param])

Constructs the desired duelling deep Q-learning network, using a leap net to model the Q function.

Functions:

evaluate(env[, name, load_path, logs_path, ...])

How to evaluate the performance of the trained DuelQLeapNet agent.

train(env[, name, iterations, save_path, ...])

This function implements the "training" part of the DuelQLeapNet baseline.

class l2rpn_baselines.DuelQLeapNet.DuelQLeapNet(action_space, nn_archi, name='DeepQAgent', store_action=True, istraining=False, filter_action_fun=None, verbose=False, observation_space=None, **kwargs_converters)[source]

Inheriting from l2rpn_baselines.utils.deepQAgent.DeepQAgent, this class implements the particular agent used for the Double Duelling Deep Q network baseline, with the particularity that the Q network is encoded with a leap net.

It does nothing in particular.

Warning

This baseline re-implements the entire RL training procedure. You can use it if you want to take a deeper look at the deep Q-learning algorithm and at a possible (non-optimized, slow, etc.) implementation of it.

For a much better implementation, you can reuse the code of “PPO_RLLIB” or the “PPO_SB3” baseline.

class l2rpn_baselines.DuelQLeapNet.DuelQLeapNet_NN(nn_params, training_param=None)[source]

Constructs the desired duelling deep Q-learning network, using a leap net to model the Q function.

Warning

This baseline re-implements the entire RL training procedure. You can use it if you want to take a deeper look at the deep Q-learning algorithm and at a possible (non-optimized, slow, etc.) implementation of it.

For a much better implementation, you can reuse the code of “PPO_RLLIB” or the “PPO_SB3” baseline.

Methods:

construct_q_network()

Build the Q network appropriately.

predict_movement(data, epsilon[, ...])

Predict the next action of the game controller, choosing a random action with probability epsilon.

train(s_batch, a_batch, r_batch, d_batch, ...)

Trains the network to fit the given parameters:

train_on_batch(model, optimizer_model, x, y_true)

Train the model on one batch, clipping the loss.

construct_q_network()[source]

Build the Q network appropriately.

It first builds a standard Q network from the regular inputs x, then encodes the tau vector with the leap layers.

The data are then split into the “value” and the “advantage” streams, as is usual in D3QN (a sketch of this dueling recombination is given below).
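
For reference, the dueling recombination computes Q(s, a) = V(s) + A(s, a) - mean_a A(s, a). Below is a minimal, hypothetical Keras sketch of such a head (layer sizes are made up; this is not the exact code of construct_q_network):

import tensorflow as tf
from tensorflow.keras import layers

def dueling_head(h, n_actions):
    """Hypothetical dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    v = layers.Dense(1, name="value")(layers.Dense(64, activation="relu")(h))              # V(s)
    a = layers.Dense(n_actions, name="advantage")(layers.Dense(64, activation="relu")(h))  # A(s, a)
    return v + (a - tf.reduce_mean(a, axis=1, keepdims=True))  # V broadcast over actions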

predict_movement(data, epsilon, batch_size=None, training=False)[source]

Predict the next action of the game controller, choosing a random action with probability epsilon.
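
In other words, this is the usual epsilon-greedy rule: with probability epsilon a random action is chosen, otherwise the action with the highest predicted Q-value. A small, illustrative numpy sketch of that rule (not the actual implementation):

import numpy as np

def epsilon_greedy(q_values, epsilon, rng=None):
    """Illustrative epsilon-greedy selection over a batch of Q-value rows."""
    rng = rng if rng is not None else np.random.default_rng()
    batch_size, n_actions = q_values.shape
    greedy = q_values.argmax(axis=1)                      # best action for each sample
    random = rng.integers(0, n_actions, size=batch_size)  # random alternative
    explore = rng.random(batch_size) < epsilon            # which samples explore
    return np.where(explore, random, greedy)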

train(s_batch, a_batch, r_batch, d_batch, s2_batch, tf_writer=None, batch_size=None)[source]

Trains the network to fit the given parameters (a sketch of the usual double-DQN regression target is given after the parameter list):

Parameters:
  • s_batch – the state vector (before the action is taken)

  • a_batch – the action taken

  • s2_batch – the state vector (after the action is taken)

  • d_batch – says whether or not the episode was over

  • r_batch – the reward obtained this step
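
Methods in the D3QN family typically build the regression target from these batches as y = r + gamma * (1 - d) * Q_target(s2, argmax_a Q_online(s2, a)), where the online network picks the action and the target network evaluates it. The numpy sketch below illustrates that technique; it is not the baseline's exact code:

import numpy as np

def double_dqn_targets(r_batch, d_batch, q_online_s2, q_target_s2, gamma=0.99):
    """Illustrative double-DQN targets: the online network selects the action,
    the target network evaluates it. r_batch/d_batch have shape (batch,),
    q_online_s2/q_target_s2 have shape (batch, n_actions)."""
    best_a = q_online_s2.argmax(axis=1)                   # action selection (online net)
    next_q = q_target_s2[np.arange(len(best_a)), best_a]  # action evaluation (target net)
    return r_batch + gamma * (1.0 - d_batch.astype(float)) * next_q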

train_on_batch(model, optimizer_model, x, y_true)[source]

Train the model on one batch, clipping the loss.
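
One common way to obtain the effect of a clipped loss in this kind of update is a Huber loss, which is quadratic for small errors and linear beyond a threshold. The sketch below is a hypothetical TensorFlow illustration of such a training step (the helper name and loss choice are assumptions, not the actual body of train_on_batch):

import tensorflow as tf

def clipped_train_step(model, optimizer, x, y_true, delta=1.0):
    """Hypothetical training step where large errors are damped via a Huber loss."""
    huber = tf.keras.losses.Huber(delta=delta)  # quadratic near 0, linear beyond delta
    with tf.GradientTape() as tape:
        y_pred = model(x, training=True)
        loss = huber(y_true, y_pred)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss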

l2rpn_baselines.DuelQLeapNet.evaluate(env, name='DuelQLeapNet', load_path=None, logs_path='./logs-eval/do-nothing-baseline', nb_episode=1, nb_process=1, max_steps=-1, verbose=False, save_gif=False, filter_action_fun=None)[source]

How to evaluate the performance of the trained DuelQLeapNet agent.

Warning

This baseline re-implements the entire RL training procedure. You can use it if you want to take a deeper look at the deep Q-learning algorithm and at a possible (non-optimized, slow, etc.) implementation of it.

For a much better implementation, you can reuse the code of “PPO_RLLIB” or the “PPO_SB3” baseline.

Parameters:
  • env (grid2op.Environment) – The environment on which you evaluate your agent.

  • name (str) – The name of the trained baseline

  • load_path (str) – Path where the agent has been stored

  • logs_path (str) – Where to write the results of the assessment

  • nb_episode (int) – How many episodes to run during the assessment of the performances

  • nb_process (int) – On how many processes the assessment will be made (setting this > 1 can lead to some speed-ups but can be unstable on some platforms)

  • max_steps (int) – The maximum number of steps for which your agent will be assessed

  • verbose (bool) – Currently unused

  • save_gif (bool) – Whether or not you want to save, as a gif, the performance of your agent. It might cause memory issues (it might take a lot of RAM) and drastically increase computation time.

Returns:

  • agent (DuelQLeapNet) – The loaded agent that has been evaluated thanks to the runner.

  • res (list) – The results of the Runner on which the agent was tested.

Examples

You can evaluate a DuelQLeapNet this way:

from grid2op import make
from grid2op.Reward import L2RPNSandBoxScore, L2RPNReward
from l2rpn_baselines.DuelQLeapNet import evaluate

# Create dataset env
env = make("l2rpn_case14_sandbox",
           reward_class=L2RPNSandBoxScore,
           other_rewards={
               "reward": L2RPNReward
           })

# Call evaluation interface
evaluate(env,
         name="MyAwesomeAgent",
         load_path="/WHERE/I/SAVED/THE/MODEL",
         logs_path=None,
         nb_episode=10,
         nb_process=1,
         max_steps=-1,
         verbose=False,
         save_gif=False)
l2rpn_baselines.DuelQLeapNet.train(env, name='DuelQLeapNet', iterations=1, save_path=None, load_path=None, logs_dir=None, training_param=None, filter_action_fun=None, verbose=True, kwargs_converters={}, kwargs_archi={})[source]

This function implements the “training” part of the DuelQLeapNet baseline.

Warning

This baseline re-implements the entire RL training procedure. You can use it if you want to take a deeper look at the deep Q-learning algorithm and at a possible (non-optimized, slow, etc.) implementation of it.

For a much better implementation, you can reuse the code of “PPO_RLLIB” or the “PPO_SB3” baseline.

Parameters:
  • env (grid2op.Environment) – The environment on which you need to train your agent.

  • name (str) – The name of your agent.

  • iterations (int) – For how many iterations (steps) you want to train your agent. NB these are not episodes, these are steps.

  • save_path (str) – Where do you want to save your baseline.

  • load_path (str) – If you want to reload your baseline, specify the path where it is located. NB if a baseline is reloaded, some of the arguments provided to this function will not be used.

  • logs_dir (str) – Where to store the tensorboard generated logs during the training. None if you don’t want to log them.

  • training_param (l2rpn_baselines.utils.TrainingParam) – The parameters describing the way you will train your model.

  • filter_action_fun (function) – A function to filter the action space. See IdToAct.filter_action documentation.

  • verbose (bool) – If you want something to be printed on the terminal (a better logging strategy will be put in place at some point)

  • kwargs_converters (dict) – A dictionary containing the keyword arguments passed at the initialization of the grid2op.Converter.IdToAct converter that serves as the “base” for the agent.

  • kwargs_archi (dict) – Keyword arguments used to build the architecture parameters object (here LeapNet_NNParam) that will be used to build the baseline.

Returns:

baseline – The trained baseline.

Return type:

DuelQLeapNet

Examples

Here is an example of how to train a DuelQLeapNet baseline.

First define a Python script, for example:

import grid2op
from grid2op.Reward import L2RPNReward
from l2rpn_baselines.utils import TrainingParam
from l2rpn_baselines.DuelQLeapNet import train, LeapNet_NNParam

# define the environment
env = grid2op.make("l2rpn_case14_sandbox",
                   reward_class=L2RPNReward)

# use the default training parameters
tp = TrainingParam()

# this will be the list of what part of the observation I want to keep
# more information on https://grid2op.readthedocs.io/en/latest/observation.html#main-observation-attributes
li_attr_obs_X = ["day_of_week", "hour_of_day", "minute_of_hour", "prod_p", "prod_v", "load_p", "load_q",
                 "actual_dispatch", "target_dispatch", "topo_vect", "time_before_cooldown_line",
                 "time_before_cooldown_sub", "timestep_overflow", "line_status", "rho"]

# compared to the other baselines, we have different inputs at different places; this is how we split them
li_attr_obs_Tau = ["rho", "line_status"]
sizes = [800, 800, 800, 494, 494, 494]

# nn architecture
x_dim = LeapNet_NNParam.get_obs_size(env, li_attr_obs_X)
tau_dims = [LeapNet_NNParam.get_obs_size(env, [el]) for el in li_attr_obs_Tau]

kwargs_archi = {'sizes': sizes,
                'activs': ["relu" for _ in sizes],
                'x_dim': x_dim,
                'tau_dims': tau_dims,
                'tau_adds': [0.0 for _ in range(len(tau_dims))],  # add some value to each tau
                'tau_mults': [1.0 for _ in range(len(tau_dims))],  # multiply each tau by some value (after adding)
                "list_attr_obs": li_attr_obs_X,
                "list_attr_obs_tau": li_attr_obs_Tau
                }

# select some part of the action
# more information at https://grid2op.readthedocs.io/en/latest/converter.html#grid2op.Converter.IdToAct.init_converter
kwargs_converters = {"all_actions": None,
                     "set_line_status": False,
                     "change_bus_vect": True,
                     "set_topo_vect": False
                     }
# define the name of the model
nm_ = "AnneOnymous"
save_path = "/WHERE/I/SAVED/THE/MODEL"
logs_dir = "/WHERE/I/SAVED/THE/LOGS"
try:
    train(env,
          name=nm_,
          iterations=10000,
          save_path=save_path,
          load_path=None,
          logs_dir=logs_dir,
          training_param=tp,
          kwargs_converters=kwargs_converters,
          kwargs_archi=kwargs_archi)
finally:
    env.close()

Other non-exported classes

These classes are not exported by default. If you want to use them, you can import them with (non-exhaustive list):

from l2rpn_baselines.DuelQLeapNet.duelQLeapNet_NN import DuelQLeapNet_NN
from l2rpn_baselines.DuelQLeapNet.leapNet_NNParam import LeapNet_NNParam
class l2rpn_baselines.DuelQLeapNet.duelQLeapNet_NN.DuelQLeapNet_NN(nn_params, training_param=None)[source]

Constructs the desired duelling deep Q-learning network, using a leap net to model the Q function.

Warning

This baseline re-implements the entire RL training procedure. You can use it if you want to take a deeper look at the deep Q-learning algorithm and at a possible (non-optimized, slow, etc.) implementation of it.

For a much better implementation, you can reuse the code of “PPO_RLLIB” or the “PPO_SB3” baseline.

Methods:

construct_q_network()

Build the Q network appropriately.

predict_movement(data, epsilon[, ...])

Predict the next action of the game controller, choosing a random action with probability epsilon.

train(s_batch, a_batch, r_batch, d_batch, ...)

Trains the network to fit the given parameters:

train_on_batch(model, optimizer_model, x, y_true)

Train the model on one batch, clipping the loss.

construct_q_network()[source]

Build the Q network appropriately.

It first builds a standard Q network from the regular inputs x, then encodes the tau vector with the leap layers.

The data are then split into the “value” and the “advantage” streams, as is usual in D3QN.

predict_movement(data, epsilon, batch_size=None, training=False)[source]

Predict the next action of the game controller, choosing a random action with probability epsilon.

train(s_batch, a_batch, r_batch, d_batch, s2_batch, tf_writer=None, batch_size=None)[source]

Trains the network to fit the given parameters:

Parameters:
  • s_batch – the state vector (before the action is taken)

  • a_batch – the action taken

  • s2_batch – the state vector (after the action is taken)

  • d_batch – says whether or not the episode was over

  • r_batch – the reward obtained this step

train_on_batch(model, optimizer_model, x, y_true)[source]

Train the model on one batch, clipping the loss.

class l2rpn_baselines.DuelQLeapNet.leapNet_NNParam.LeapNet_NNParam(action_size, observation_size, sizes, activs, x_dim, list_attr_obs, tau_dims, tau_adds, tau_mults, list_attr_obs_tau)[source]

This class implements the type of parameters used by the DuelQLeapNet model.

More information on the leap net can be found at Leap Net on Github

Warning

This baseline re-implements the entire RL training procedure. You can use it if you want to take a deeper look at the deep Q-learning algorithm and at a possible (non-optimized, slow, etc.) implementation of it.

For a much better implementation, you can reuse the code of “PPO_RLLIB” or the “PPO_SB3” baseline.

x_dim

Dimension of the input x

Type:

int

list_attr_obs_tau

List of the names of the observation attributes that will be used as the tau vector to perform the leaps.

Type:

str

tau_dims

List of int. For each variable considered as a tau specify its dimension here.

Type:

list

tau_adds

List of floats, if you want to add something to the values of the observations you receive. For example, if you know the observations you will receive are either 1 or 2 but prefer these numbers to be 0 and 1, you can set the relevant tau_adds entry to -1.

Type:

list

tau_mults

List of floats. Same as above but for a multiplicative term: if you want to multiply the numbers you get by a specific factor (for example, if you have numbers in the range [0, 10] but would rather have them in [0, 1]), you can set the corresponding tau_mults entry to 0.1. The multiplication is applied after tau_adds, as illustrated in the sketch below.

Type:

list
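
Put together, tau_adds and tau_mults describe a simple affine preprocessing of each tau attribute: the offset is added first, then the multiplicative factor is applied. A tiny numpy illustration (the function name is made up):

import numpy as np

def preprocess_tau(tau_raw, tau_add, tau_mult):
    """Illustrative affine preprocessing of one tau attribute:
    add the offset first, then apply the multiplicative factor."""
    return (np.asarray(tau_raw, dtype=float) + tau_add) * tau_mult

# e.g. an attribute taking values 1 or 2, remapped to 0 and 1:
# preprocess_tau([1, 2, 1], tau_add=-1.0, tau_mult=1.0) -> array([0., 1., 0.])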

Methods:

get_obs_attr()

get the names of the observation attributes that will be extracted

Classes:

nn_class

alias of DuelQLeapNet_NN

get_obs_attr()[source]

get the names of the observation attributes that will be extracted

nn_class

alias of DuelQLeapNet_NN