DoubleDuelingRDQN: An example implementation of a Recurrent Double Q Network

Description

This module serves as a concrete example of how to implement a recurrent D3QN baseline. The baseline is a Recurrent Double Dueling Deep Q Network: it combines a dueling Q architecture, the double Q update, and a recurrent neural network.

Its main purpose is to provide an example of this network type running with Grid2Op. However, do not expect to obtain state-of-the-art results.

Warning

This baseline re-implements the entire RL training procedure. You can use it if you want a deeper look at the Deep Q Learning algorithm and at one possible (non-optimized, slow, etc.) implementation.

For a much better implementation, you can reuse the code of the l2rpn_baselines.PPO_RLLIB or l2rpn_baselines.PPO_SB3 baselines.

Agent class

You can use this class with:

from l2rpn_baselines.DoubleDuelingRDQN import DoubleDuelingRDQN
from l2rpn_baselines.DoubleDuelingRDQN import train
from l2rpn_baselines.DoubleDuelingRDQN import evaluate
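
A minimal end-to-end sketch putting these imports together. It assumes a standard Grid2Op environment (the “l2rpn_case14_sandbox” test grid is used here for illustration) and that the weights written by train under save_path/name can be passed to evaluate via load_path; adjust names and paths to your setup:

import grid2op
from l2rpn_baselines.DoubleDuelingRDQN import train, evaluate

env = grid2op.make("l2rpn_case14_sandbox")

# train for a (small) number of iterations and save the weights
train(env,
      name="DoubleDuelingRDQN_demo",   # illustrative name
      iterations=1024,
      save_path="./models",
      logs_path="./logs-train")

# evaluate the trained agent on a couple of episodes
evaluate(env,
         load_path="./models/DoubleDuelingRDQN_demo",  # assumed save layout
         logs_path="./logs-eval",
         nb_episode=2,
         verbose=True)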

Classes:

DoubleDuelingRDQN(observation_space, ...[, ...])

DoubleDuelingRDQNConfig()

DoubleDuelingRDQN configurable hyperparameters as class attributes

Functions:

evaluate(env[, load_path, logs_path, ...])

train(env[, name, iterations, save_path, ...])

class l2rpn_baselines.DoubleDuelingRDQN.DoubleDuelingRDQN(observation_space, action_space, name='l2rpn_baselines.DoubleDuelingRDQN.doubleDuelingRDQN', is_training=False)[source]

Warning

This baseline re-implements the entire RL training procedure. You can use it if you want a deeper look at the Deep Q Learning algorithm and at one possible (non-optimized, slow, etc.) implementation.

For a much better implementation, you can reuse the code of the “PPO_RLLIB” or “PPO_SB3” baselines.
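
As a quick illustration, a minimal sketch of building the agent directly from an environment and running it in the standard Grid2Op interaction loop; the agent is untrained here, so this is only a smoke test (loading trained weights is not shown):

import grid2op
from l2rpn_baselines.DoubleDuelingRDQN import DoubleDuelingRDQN

env = grid2op.make("l2rpn_case14_sandbox")

# build the agent from the environment spaces; is_training=False for inference
agent = DoubleDuelingRDQN(env.observation_space,
                          env.action_space,
                          name="rdqn_demo",        # illustrative name
                          is_training=False)

obs = env.reset()
agent.reset(obs)
reward, done = env.reward_range[0], False
while not done:
    act = agent.act(obs, reward, done)
    obs, reward, done, info = env.step(act)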

Methods:

convert_act(action)

This function will convert an "encoded action", which can be of any type, into a valid action that can be ingested by the environment.

convert_obs(observation)

This function converts the observation, which is an object of class grid2op.Observation.BaseObservation, into a representation understandable by the BaseAgent.

my_act(state, reward[, done])

This method should be overridden if this class is used.

reset(observation)

This method is called at the beginning of a new episode.

convert_act(action)[source]

This function will convert an “encoded action”, which can be of any type, into a valid action that can be ingested by the environment.

Parameters:

encoded_act (object) – Anything that represents an action.

Returns:

act – A valid action, represented as an object, that corresponds to the encoded action given as input.

Return type:

grid2op.Action.BaseAction
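
A small sketch of how convert_act is typically used, continuing from an agent instance like the one built earlier; that actions are encoded as integers (an IdToAct-style converter, with index 0 being “do nothing”) is an assumption, not something stated on this page:

# assuming `agent` is a DoubleDuelingRDQN instance as created above
encoded_act = 0                               # assumed integer encoding of actions
grid2op_act = agent.convert_act(encoded_act)  # a grid2op action object
print(grid2op_act)                            # grid2op actions have a readable string form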

convert_obs(observation)[source]

This function converts the observation, which is an object of class grid2op.Observation.BaseObservation, into a representation understandable by the BaseAgent.

For example, an agent might only want to look at the relative flows grid2op.Observation.BaseObservation.rho to take its decision. This is possible by overloading this method.

This method can also be used to scale the observation so that, for example, each component has mean 0 and variance 1.

Parameters:

observation (grid2op.Observation.Observation) – Initial observation received by the agent in the BaseAgent.act() method.

Returns:

res – Anything that will be used by the BaseAgent to take decisions.

Return type:

object
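
A hypothetical sketch of overloading convert_obs to keep only the relative flows, as suggested above; note that in practice the network input size must match whatever this method returns, so this subclass is illustrative only:

import numpy as np
from l2rpn_baselines.DoubleDuelingRDQN import DoubleDuelingRDQN

class RhoOnlyAgent(DoubleDuelingRDQN):
    # hypothetical overload: only feed the relative line loadings to the network
    def convert_obs(self, observation):
        return np.asarray(observation.rho, dtype=np.float32)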

my_act(state, reward, done=False)[source]

This method should be overridden if this class is used; it is an “abstract” method.

Override it if you want to make an agent that handles different kinds of actions and observations.

Parameters:
  • transformed_observation (object) – Anything that will be used to create an action. This is the result of the call to AgentWithConverter.convert_obs(). This is likely a numpy array.

  • reward (float) – The current reward. This is the reward obtained by the previous action.

  • done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility.

Returns:

res – A representation of an action in any possible format. This action will then be ingested and formatted into a valid action with the AgentWithConverter.convert_act() method.

Return type:

object
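
A hypothetical sketch of what an override of my_act can look like: pick the encoded action with the highest predicted Q value. The q-value estimator used below is a placeholder, not an attribute of the real baseline:

import numpy as np
from l2rpn_baselines.DoubleDuelingRDQN import DoubleDuelingRDQN

class GreedyEncodedAgent(DoubleDuelingRDQN):
    # hypothetical override; `self.estimate_q` is a placeholder name
    def my_act(self, transformed_observation, reward, done=False):
        q_values = self.estimate_q(transformed_observation)
        return int(np.argmax(q_values))  # encoded action, decoded later by convert_act()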

reset(observation)[source]

This method is called at the beginning of a new episode. It is implemented by agents to reset their internal state if needed.

obs

The first observation corresponding to the initial state of the environment.

Type:

grid2op.Observation.BaseObservation
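
For a recurrent agent, reset is typically where the hidden state of the network is cleared between episodes. A hypothetical sketch, with `self.hidden_state` as a placeholder name rather than an attribute of the real baseline:

from l2rpn_baselines.DoubleDuelingRDQN import DoubleDuelingRDQN

class MyRecurrentAgent(DoubleDuelingRDQN):
    # hypothetical override: forget the recurrent (LSTM) state at episode start
    def reset(self, observation):
        self.hidden_state = None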

class l2rpn_baselines.DoubleDuelingRDQN.DoubleDuelingRDQNConfig[source]

DoubleDuelingRDQN configurable hyperparameters as class attributes

Warning

This baseline re-implements the entire RL training procedure. You can use it if you want a deeper look at the Deep Q Learning algorithm and at one possible (non-optimized, slow, etc.) implementation.

For a much better implementation, you can reuse the code of the “PPO_RLLIB” or “PPO_SB3” baselines.

l2rpn_baselines.DoubleDuelingRDQN.evaluate(env, load_path=None, logs_path='./logs-eval', nb_episode=1, nb_process=1, max_steps=-1, verbose=True, save_gif=False)[source]

Warning

This baseline re-implements the entire RL training procedure. You can use it if you want a deeper look at the Deep Q Learning algorithm and at one possible (non-optimized, slow, etc.) implementation.

For a much better implementation, you can reuse the code of the “PPO_RLLIB” or “PPO_SB3” baselines.
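
A hedged usage sketch of evaluate with its documented keyword arguments; load_path is assumed to point at weights previously produced by train, and the environment name is illustrative:

import grid2op
from l2rpn_baselines.DoubleDuelingRDQN import evaluate

env = grid2op.make("l2rpn_case14_sandbox")

evaluate(env,
         load_path="./models/DoubleDuelingRDQN",  # assumed location of trained weights
         logs_path="./logs-eval",
         nb_episode=10,
         nb_process=1,
         max_steps=-1,      # -1: run each episode to completion
         verbose=True,
         save_gif=False)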

l2rpn_baselines.DoubleDuelingRDQN.train(env, name='DoubleDuelingRDQN', iterations=1024, save_path='./models', load_path=None, logs_path='./logs-train', num_pre_training_steps=256, trace_length=12, batch_size=32, learning_rate=1e-05, verbose=True)[source]

Warning

This baseline re-implements the entire RL training procedure. You can use it if you want a deeper look at the Deep Q Learning algorithm and at one possible (non-optimized, slow, etc.) implementation.

For a much better implementation, you can reuse the code of the “PPO_RLLIB” or “PPO_SB3” baselines.
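
A hedged usage sketch of train with its documented keyword arguments; the per-parameter comments describe their usual meaning in DQN-style training rather than guarantees about this exact implementation:

import grid2op
from l2rpn_baselines.DoubleDuelingRDQN import train

env = grid2op.make("l2rpn_case14_sandbox")

train(env,
      name="DoubleDuelingRDQN",
      iterations=10000,             # illustrative budget; the default is 1024
      save_path="./models",
      load_path=None,               # set to resume from previously saved weights
      logs_path="./logs-train",
      num_pre_training_steps=256,   # steps collected before learning starts
      trace_length=12,              # length of the sequences fed to the recurrent net
      batch_size=32,
      learning_rate=1e-5,
      verbose=True)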

Configuration

Training a model requires tweaking many hyperparameters; these can be found as class attributes of a dedicated configuration class:

from l2rpn_baselines.DoubleDuelingRDQN import DoubleDuelingRDQNConfig

# Set hyperparameters before training
DoubleDuelingRDQNConfig.LR = 1e-5
DoubleDuelingRDQNConfig.TRACE_LENGTH = 12

DoubleDuelingRDQN configurable hyperparameters as class attributes

Warning

This baseline re-implements the entire RL training procedure. You can use it if you want a deeper look at the Deep Q Learning algorithm and at one possible (non-optimized, slow, etc.) implementation.

For a much better implementation, you can reuse the code of the “PPO_RLLIB” or “PPO_SB3” baselines.

Internal classes

The neural network model is defined in a separate class. You may want to import it manually:

from l2rpn_baselines.DoubleDuelingRDQN.doubleDuelingRDQN_NN import DoubleDuelingRDQN_NN

class l2rpn_baselines.DoubleDuelingRDQN.doubleDuelingRDQN_NN.DoubleDuelingRDQN_NN(action_size, observation_size, learning_rate=1e-05)[source]

Warning

This baseline re-implements the entire RL training procedure. You can use it if you want a deeper look at the Deep Q Learning algorithm and at one possible (non-optimized, slow, etc.) implementation.

For a much better implementation, you can reuse the code of the “PPO_RLLIB” or “PPO_SB3” baselines.
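
For illustration only, a sketch of instantiating the network class directly with its documented constructor arguments; the sizes below are made up, since in the baseline they are derived from the converted observation vector and the action converter of the wrapping agent:

from l2rpn_baselines.DoubleDuelingRDQN.doubleDuelingRDQN_NN import DoubleDuelingRDQN_NN

# illustrative sizes only; use the actual encoded action and observation sizes
nn = DoubleDuelingRDQN_NN(action_size=151,
                          observation_size=400,
                          learning_rate=1e-5)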