LeapNetEncoded: D3QN on a state encoded by a leap net

TODO: reference the original paper (ESANN paper: Leap Net)

It has now been implemented as a GitHub repository: Leap Net Github

Description

The leap net is a type of neural network that has shown very good performance at predicting flows on powerlines from the injections and the topology.

In this baseline, we use this very same architecture to encode the powergrid state (at a given step).

This embedding of the powergrid is then used by a neural network (which can be a regular network or a leap net) that parametrizes the Q function.
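The core "leap" operation behind this encoder can be sketched as follows. This is a minimal numpy illustration of the idea with hypothetical weight matrices, not the actual implementation: the latent encoding of the injections is shifted by an amount gated by the topology vector tau, then decoded to the predicted grid quantities.

```python
import numpy as np

def leap_layer(x, tau, E, Lt, D):
    """Sketch of a leap net layer: the latent encoding of the
    injections x is "leaped" by an amount gated by the topology
    vector tau, then decoded to the predicted flows."""
    h = E @ x                  # encode the injections into a latent space
    leap = (Lt @ h) * tau      # topology-dependent shift of the latent state
    return D @ (h + leap)      # decode back to the output space

rng = np.random.default_rng(0)
x = rng.normal(size=4)             # injections (e.g. prod_p, load_p, ...)
tau = np.array([1.0, 0.0, 1.0])    # topology vector (e.g. line status)
E = rng.normal(size=(3, 4))        # hypothetical weight matrices
Lt = rng.normal(size=(3, 3))
D = rng.normal(size=(2, 3))
y = leap_layer(x, tau, E, Lt, D)   # predicted quantities, shape (2,)
```

When tau is all zeros (reference topology), the layer reduces to a plain encoder-decoder, which is the property that makes the leap net well suited to topology changes.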

An example of how to train this model is given in the Examples section of the train function.

Warning

This baseline re-implements the entire RL training procedure. You can use it if you want a deeper look at the Deep Q Learning algorithm and at a possible (non optimized, slow) implementation of it.

For a much better implementation, you can reuse the code of l2rpn_baselines.PPO_RLLIB or the l2rpn_baselines.PPO_SB3 baseline.

Exported class

You can use this class with:

from l2rpn_baselines.LeapNetEncoded import train, evaluate, LeapNetEncoded

Classes:

LeapNetEncoded(action_space, nn_archi[, ...])

Inheriting from l2rpn_baselines.utils.deepQAgent.DeepQAgent, this class implements the particular agent used for the Double Duelling Deep Q network baseline, with the particularity that the state fed to the Q network is encoded with a leap net.

LeapNetEncoded_NN(nn_params[, training_param])

Functions:

evaluate(env[, name, load_path, logs_path, ...])

How to evaluate the performances of the trained LeapNetEncoded agent.

train(env[, name, iterations, save_path, ...])

This function implements the "training" part of the baseline LeapNetEncoded.

class l2rpn_baselines.LeapNetEncoded.LeapNetEncoded(action_space, nn_archi, name='DeepQAgent', store_action=True, istraining=False, filter_action_fun=None, verbose=False, observation_space=None, **kwargs_converters)[source]

Inheriting from l2rpn_baselines.utils.deepQAgent.DeepQAgent, this class implements the particular agent used for the Double Duelling Deep Q network baseline, with the particularity that the state fed to the Q network is encoded with a leap net.

It does nothing in particular.

Warning

This baseline re-implements the entire RL training procedure. You can use it if you want a deeper look at the Deep Q Learning algorithm and at a possible (non optimized, slow) implementation of it.

For a much better implementation, you can reuse the code of “PPO_RLLIB” or the “PPO_SB3” baseline.

class l2rpn_baselines.LeapNetEncoded.LeapNetEncoded_NN(nn_params, training_param=None)[source]

Warning

This baseline re-implements the entire RL training procedure. You can use it if you want a deeper look at the Deep Q Learning algorithm and at a possible (non optimized, slow) implementation of it.

For a much better implementation, you can reuse the code of “PPO_RLLIB” or the “PPO_SB3” baseline.

Constructs the desired neural networks.

More information on the leap net can be found at Leap Net on Github

These are:

  • a “state encoder” that uses a leap net to “encode” the observation, or at least the part related to powergrid

  • a q network, that uses the output of the state encoder to predict which action is best.

The Q network can have other types of input, and can also be a leap net, see the class l2rpn_baselines.LeapNetEncoded.leapNetEncoded_NNParam.LeapNetEncoded_NNParam for more information

Methods:

construct_q_network()

Builds the Q network.

load_network(path[, name, ext])

We load all the models using the keras "load_model" function.

predict_movement(data, epsilon[, ...])

Predict the next action following an epsilon-greedy policy: with probability epsilon, a random action is taken.

save_network(path[, name, ext])

Saves all the models with unique names

save_tensorboard(current_step)

function used to save other information to tensorboard

train(s_batch, a_batch, r_batch, d_batch, ...)

Trains network to fit given parameters:

train_on_batch(model, optimizer_model, x, y_true)

Train the model on one batch of data, clipping the loss.

construct_q_network()[source]

Builds the Q network.

load_network(path, name=None, ext='h5')[source]

We load all the models using the keras “load_model” function.

predict_movement(data, epsilon, batch_size=None, training=False)[source]

Predict the next action following an epsilon-greedy policy: with probability epsilon, a random action is taken.
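The selection rule described above can be sketched as follows (a minimal illustration; predict_movement_sketch and q_values are hypothetical stand-ins for the method and the output of the actual network):

```python
import numpy as np

def predict_movement_sketch(q_values, epsilon, rng=None):
    """Epsilon-greedy action selection over a vector of Q values."""
    rng = rng if rng is not None else np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore: random action
    return int(np.argmax(q_values))              # exploit: best predicted action

q = np.array([0.1, 0.9, 0.3])
a = predict_movement_sketch(q, epsilon=0.0)  # -> 1 (greedy pick)
```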

save_network(path, name=None, ext='h5')[source]

Saves all the models with unique names

save_tensorboard(current_step)[source]

function used to save other information to tensorboard

train(s_batch, a_batch, r_batch, d_batch, s2_batch, tf_writer=None, batch_size=None)[source]

Trains network to fit given parameters:

Parameters:
  • s_batch – the state vector (before the action is taken)

  • a_batch – the action taken

  • s2_batch – the state vector (after the action is taken)

  • d_batch – says whether or not the episode was over

  • r_batch – the reward obtained this step
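Since this agent derives from the Double Duelling DQN baseline, the target fitted from these batches can be sketched as follows. This is an illustration of the technique, not the exact code: q_online_s2 and q_target_s2 stand in for the online and target network outputs on s2_batch, and gamma is a hypothetical discount factor.

```python
import numpy as np

def double_dqn_targets(q_online_s2, q_target_s2, r_batch, d_batch, gamma=0.99):
    """Double-DQN target: the online network picks the next action,
    the target network evaluates it; no bootstrap when the episode ended."""
    best_a = np.argmax(q_online_s2, axis=1)               # action chosen by online net
    q_eval = q_target_s2[np.arange(len(best_a)), best_a]  # evaluated by target net
    return r_batch + gamma * (1.0 - d_batch) * q_eval

r_batch = np.array([1.0, 0.5])
d_batch = np.array([0.0, 1.0])        # second transition ends the episode
q_online_s2 = np.array([[0.2, 0.8], [0.6, 0.1]])
q_target_s2 = np.array([[0.3, 0.7], [0.5, 0.2]])
targets = double_dqn_targets(q_online_s2, q_target_s2, r_batch, d_batch)
# second target equals the raw reward because the episode terminated
```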

train_on_batch(model, optimizer_model, x, y_true)[source]

Train the model on one batch of data, clipping the loss.

l2rpn_baselines.LeapNetEncoded.evaluate(env, name='LeapNetEncoded', load_path=None, logs_path='./logs-eval/do-nothing-baseline', nb_episode=1, nb_process=1, max_steps=-1, verbose=False, save_gif=False, filter_action_fun=None)[source]

How to evaluate the performances of the trained LeapNetEncoded agent.

Warning

This baseline re-implements the entire RL training procedure. You can use it if you want a deeper look at the Deep Q Learning algorithm and at a possible (non optimized, slow) implementation of it.

For a much better implementation, you can reuse the code of “PPO_RLLIB” or the “PPO_SB3” baseline.

Parameters:
  • env (grid2op.Environment) – The environment on which you evaluate your agent.

  • name (str) – The name of the trained baseline

  • load_path (str) – Path where the agent has been stored

  • logs_path (str) – Where to write the results of the assessment

  • nb_episode (int) – How many episodes to run during the assessment of the performances

  • nb_process (int) – On how many processes the assessment will be made (setting this > 1 can lead to some speed ups but can be unstable on some platforms)

  • max_steps (int) – How many steps at maximum your agent will be assessed

  • verbose (bool) – Currently unused

  • save_gif (bool) – Whether or not you want to save, as a gif, the performance of your agent. It might cause memory issues (it can take a lot of RAM) and drastically increase computation time.

Returns:

  • agent (l2rpn_baselines.utils.DeepQAgent) – The loaded agent that has been evaluated thanks to the runner.

  • res (list) – The results of the Runner on which the agent was tested.

Examples

You can evaluate a LeapNetEncoded agent this way:

from grid2op import make
from grid2op.Reward import L2RPNSandBoxScore, L2RPNReward
from l2rpn_baselines.LeapNetEncoded import evaluate

# Create dataset env
env = make("l2rpn_case14_sandbox",
           reward_class=L2RPNSandBoxScore,
           other_rewards={
               "reward": L2RPNReward
           })

# Call evaluation interface
evaluate(env,
         name="MyAwesomeAgent",
         load_path="/WHERE/I/SAVED/THE/MODEL",
         logs_path=None,
         nb_episode=10,
         nb_process=1,
         max_steps=-1,
         verbose=False,
         save_gif=False)
l2rpn_baselines.LeapNetEncoded.train(env, name='LeapNetEncoded', iterations=1, save_path=None, load_path=None, logs_dir=None, training_param=None, filter_action_fun=None, verbose=True, kwargs_converters={}, kwargs_archi={})[source]

This function implements the "training" part of the baseline LeapNetEncoded.

Warning

This baseline re-implements the entire RL training procedure. You can use it if you want a deeper look at the Deep Q Learning algorithm and at a possible (non optimized, slow) implementation of it.

For a much better implementation, you can reuse the code of “PPO_RLLIB” or the “PPO_SB3” baseline.

Parameters:
  • env (grid2op.Environment) – The environment on which you need to train your agent.

  • name (str) – The name of your agent.

  • iterations (int) – For how many iterations (steps) you want to train your agent. NB these are not episodes, these are steps.

  • save_path (str) – Where do you want to save your baseline.

  • load_path (str) – If you want to reload your baseline, specify the path where it is located. NB if a baseline is reloaded, some of the arguments provided to this function will not be used.

  • logs_dir (str) – Where to store the tensorboard generated logs during the training. None if you don’t want to log them.

  • training_param (l2rpn_baselines.utils.trainingParam.TrainingParam) – The parameters describing the way you will train your model.

  • filter_action_fun (function) – A function to filter the action space. See IdToAct.filter_action documentation.

  • verbose (bool) – If you want something to be printed on the terminal (a better logging strategy will be put at some point)

  • kwargs_converters (dict) – A dictionary containing the key-word arguments passed at the initialization of the grid2op.Converter.IdToAct converter that serves as "base" for the agent.

  • kwargs_archi (dict) – Key word arguments used for making the DeepQ_NNParam object that will be used to build the baseline.

Returns:

baseline – The trained baseline.

Return type:

LeapNetEncoded

Examples

Here is an example on how to train a LeapNetEncoded baseline.

First, define a python script, for example:

import grid2op
from grid2op.Reward import L2RPNReward
from l2rpn_baselines.utils import TrainingParam
from l2rpn_baselines.LeapNetEncoded import train

# define the environment
env = grid2op.make("l2rpn_case14_sandbox",
                   reward_class=L2RPNReward)

# use the default training parameters
tp = TrainingParam()

# nn architecture
li_attr_obs_X = ["prod_p", "prod_v", "load_p", "load_q"]
li_attr_obs_input_q = ["time_before_cooldown_line",
                       "time_before_cooldown_sub",
                       "actual_dispatch",
                       "target_dispatch",
                       "day_of_week",
                       "hour_of_day",
                       "minute_of_hour",
                       "rho"]
li_attr_obs_Tau = ["line_status", "timestep_overflow"]
list_attr_gm_out = ["a_or", "a_ex", "p_or", "p_ex", "q_or", "q_ex", "prod_q", "load_v"] + li_attr_obs_X

kwargs_archi = {'sizes': [],
                'activs': [],
                'x_dim': -1,

                "list_attr_obs": li_attr_obs_X,
                "list_attr_obs_tau": li_attr_obs_Tau,
                "list_attr_obs_x": li_attr_obs_X,
                "list_attr_obs_input_q": li_attr_obs_input_q,
                "list_attr_obs_gm_out": list_attr_gm_out,

                'dim_topo': env.dim_topo,

                "sizes_enc": (50, 50, 50, 50),
                "sizes_main": (300, 300, 300),
                "sizes_out_gm": (100, ),
                "sizes_Qnet": (200, 200, 200)
                }

kwargs_converters = {}  # use the default IdToAct converter settings
nm_ = "LeapNetEncoded_example"
try:
    train(env,
          name=nm_,
          iterations=10000,
          save_path="./saved_model",
          load_path=None,
          logs_dir="./logs",
          training_param=tp,
          kwargs_converters=kwargs_converters,
          kwargs_archi=kwargs_archi,
          verbose=True)
finally:
    env.close()

Other non exported class

These classes are not exported by default. If you want to use them, you need to import them explicitly, for example with (non exhaustive list):

from l2rpn_baselines.LeapNetEncoded.leapNetEncoded_NN import LeapNetEncoded_NN
from l2rpn_baselines.LeapNetEncoded.leapNetEncoded_NNParam import LeapNetEncoded_NNParam
class l2rpn_baselines.LeapNetEncoded.leapNetEncoded_NN.LeapNetEncoded_NN(nn_params, training_param=None)[source]

Warning

This baseline re-implements the entire RL training procedure. You can use it if you want a deeper look at the Deep Q Learning algorithm and at a possible (non optimized, slow) implementation of it.

For a much better implementation, you can reuse the code of “PPO_RLLIB” or the “PPO_SB3” baseline.

Constructs the desired neural networks.

More information on the leap net can be found at Leap Net on Github

These are:

  • a “state encoder” that uses a leap net to “encode” the observation, or at least the part related to powergrid

  • a q network, that uses the output of the state encoder to predict which action is best.

The Q network can have other types of input, and can also be a leap net, see the class l2rpn_baselines.LeapNetEncoded.leapNetEncoded_NNParam.LeapNetEncoded_NNParam for more information

Methods:

construct_q_network()

Builds the Q network.

load_network(path[, name, ext])

We load all the models using the keras "load_model" function.

predict_movement(data, epsilon[, ...])

Predict the next action following an epsilon-greedy policy: with probability epsilon, a random action is taken.

save_network(path[, name, ext])

Saves all the models with unique names

save_tensorboard(current_step)

function used to save other information to tensorboard

train(s_batch, a_batch, r_batch, d_batch, ...)

Trains network to fit given parameters:

train_on_batch(model, optimizer_model, x, y_true)

Train the model on one batch of data, clipping the loss.

construct_q_network()[source]

Builds the Q network.

load_network(path, name=None, ext='h5')[source]

We load all the models using the keras “load_model” function.

predict_movement(data, epsilon, batch_size=None, training=False)[source]

Predict the next action following an epsilon-greedy policy: with probability epsilon, a random action is taken.

save_network(path, name=None, ext='h5')[source]

Saves all the models with unique names

save_tensorboard(current_step)[source]

function used to save other information to tensorboard

train(s_batch, a_batch, r_batch, d_batch, s2_batch, tf_writer=None, batch_size=None)[source]

Trains network to fit given parameters:

Parameters:
  • s_batch – the state vector (before the action is taken)

  • a_batch – the action taken

  • s2_batch – the state vector (after the action is taken)

  • d_batch – says whether or not the episode was over

  • r_batch – the reward obtained this step

train_on_batch(model, optimizer_model, x, y_true)[source]

Train the model on one batch of data, clipping the loss.

class l2rpn_baselines.LeapNetEncoded.leapNetEncoded_NNParam.LeapNetEncoded_NNParam(action_size, observation_size, sizes, activs, x_dim, list_attr_obs, list_attr_obs_tau, list_attr_obs_x, list_attr_obs_input_q, list_attr_obs_gm_out, dim_topo, sizes_enc=(20, 20, 20), sizes_main=(150, 150, 150), sizes_out_gm=(100, 40), sizes_Qnet=(100, 100, 100), input_q_adds=None, input_q_mults=None, gm_out_adds=None, gm_out_mults=None, tau_adds=None, tau_mults=None, x_adds=None, x_mults=None, tau_dims=None, x_dims=None, gm_out_dims=None, input_q_dims=None)[source]

This class implements the type of parameters used by the LeapNetEncoded model.

Warning

This baseline re-implements the entire RL training procedure. You can use it if you want a deeper look at the Deep Q Learning algorithm and at a possible (non optimized, slow) implementation of it.

For a much better implementation, you can reuse the code of “PPO_RLLIB” or the “PPO_SB3” baseline.

More information on the leap net can be found at Leap Net on Github

list_attr_obs

currently not used

sizes

currently not used

activs

currently not used

x_dim

currently not used

list_attr_obs_x

list of the attributes of the observation that serve as input of the grid model (we recommend ["prod_p", "prod_v", "load_p", "load_q"])

list_attr_obs_gm_out

list of the attributes of the observation that serve as output for the grid model (we recommend ["a_or", "a_ex", "p_or", "p_ex", "q_or", "q_ex", "prod_q", "load_v"] + li_attr_obs_X), though "rho" can be equally good and improve computation time

list_attr_obs_input_q

list of the attributes of the observation that serve as input (other than the embedding of the grid state) for the Q network (we recommend to put here anything "time related", for example ["time_before_cooldown_line", "time_before_cooldown_sub", "actual_dispatch", "target_dispatch", "day_of_week", "hour_of_day", "minute_of_hour"], etc.)

list_attr_obs_tau

If you chose to encode your Q network as a leap net itself, then you can put here the attributes you would like the leap net to act on (["line_status", "timestep_overflow"] for example)

dim_topo

Dimension of the topology vector (init it with env.dim_topo)

Type:

int

Examples

All other attributes need to be created once by a call to l2rpn_baselines.LeapNetEncoded.leapNetEncoded_NNParam.LeapNetEncoded_NNParam.compute_dims():

nn_archi.compute_dims(env)
nn_archi.center_reduce(env)

These calls will set up all the attributes that are not set, and register this model to use input data approximately in the [-1, 1] interval.
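The idea behind this centering/scaling step can be sketched as follows (a rough illustration with made-up statistics; the actual values are computed from the environment by center_reduce):

```python
import numpy as np

def center_reduce_sketch(x, mean, std):
    """Center and scale an input vector so values land roughly in [-1, 1]."""
    std = np.where(std > 0, std, 1.0)  # guard against zero-variance attributes
    return (x - mean) / std

x = np.array([100.0, 50.0, 0.0])      # e.g. raw prod_p values
mean = np.array([90.0, 50.0, 0.0])
std = np.array([10.0, 5.0, 0.0])
x_norm = center_reduce_sketch(x, mean, std)  # -> [1.0, 0.0, 0.0]
```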

Methods:

center_reduce(env)

Compute some basic statistics for x and tau

compute_dims(env)

Compute the dimension of the observations (dimension of x and tau)

get_obs_attr()

Retrieve the list of the observation attributes that are used for this model.

Classes:

nn_class

alias of LeapNetEncoded_NN

center_reduce(env)[source]

Compute some basic statistics for x and tau

compute_dims(env)[source]

Compute the dimension of the observations (dimension of x and tau)

Parameters:

env (a grid2op environment) – A grid2op environment

get_obs_attr()[source]

Retrieve the list of the observation attributes that are used for this model.

nn_class

alias of LeapNetEncoded_NN