DoubleDuelingDQN: An example implementation of Double Duelling Deep Q Network
Description
This module serves as a concrete example of how to implement a D3QN baseline. This baseline is of type Double Duelling Deep Q Network, as in Duelling Q Network with Double Q update.
Its main purpose is to provide an example of this network type running with Grid2Op. However, do not expect to obtain state-of-the-art results.
Warning
This baseline recodes the entire RL training procedure. You can use it if you want to take a deeper look at the Deep Q-Learning algorithm and a possible (non-optimized, slow, etc.) implementation.
For a much better implementation, you can reuse the code of the l2rpn_baselines.PPO_RLLIB or the l2rpn_baselines.PPO_SB3 baseline.
Agent class
You can use this class with:
from l2rpn_baselines.DoubleDuelingDQN import DoubleDuelingDQN
from l2rpn_baselines.DoubleDuelingDQN import train
from l2rpn_baselines.DoubleDuelingDQN import evaluate
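A minimal end-to-end sketch of how these pieces fit together is shown below; the keyword arguments (name, iterations, save_path, load_path, nb_episode) are assumptions for illustration, not guaranteed by this page:

import grid2op
from l2rpn_baselines.DoubleDuelingDQN import train, evaluate

# Build any Grid2Op environment
env = grid2op.make("l2rpn_case14_sandbox")

# Train a new agent (argument names are illustrative assumptions)
train(env, name="DoubleDuelingDQN", iterations=10000, save_path="./models")

# Reload and evaluate it (same caveat on argument names)
evaluate(env, name="DoubleDuelingDQN", load_path="./models", nb_episode=10)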
Classes:
DoubleDuelingDQN
DoubleDuelingDQNConfig – DoubleDuelingDQN configurable hyperparameters exposed as class attributes
- class l2rpn_baselines.DoubleDuelingDQN.DoubleDuelingDQN(observation_space, action_space, name='l2rpn_baselines.DoubleDuelingDQN.doubleDuelingDQN', is_training=False)[source]
Methods:
convert_act(action) – This function will convert an "encoded action", which can be of any type, to a valid action that can be ingested by the environment.
convert_obs(observation) – This function converts the observation, an object of class grid2op.Observation.BaseObservation, into a representation understandable by the BaseAgent.
my_act(state, reward[, done]) – This method should be overridden if this class is used.
reset(observation) – This method is called at the beginning of a new episode.
- convert_act(action)[source]
This function will convert an "encoded action", which can be of any type, to a valid action that can be ingested by the environment.
- Parameters:
encoded_act (object) – Anything that represents an action.
- Returns:
act – A valid action, represented as a class, that corresponds to the encoded action given as input.
- Return type:
grid2op.BaseAction.BaseAction
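For example, with an IdToAct-style converter (an assumption for this sketch; agent and env are presumed to already exist), the encoded action is just an integer index:

encoded_act = 42                      # e.g. the index chosen by the Q-network
act = agent.convert_act(encoded_act)  # -> grid2op.BaseAction.BaseAction
obs, reward, done, info = env.step(act)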
- convert_obs(observation)[source]
This function converts the observation, an object of class grid2op.Observation.BaseObservation, into a representation understandable by the BaseAgent.
For example, an agent could want to look only at the relative flows grid2op.Observation.BaseObservation.rho to take its decision. This is possible by overloading this method.
This method can also be used to scale the observation so that each component has mean 0 and variance 1, for example.
- Parameters:
observation (grid2op.Observation.Observation) – Initial observation received by the agent in the BaseAgent.act() method.
- Returns:
res – Anything that will be used by the BaseAgent to take decisions.
- Return type:
object
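A minimal sketch of such an overload, assuming a hypothetical subclass (and ignoring the frame stacking this baseline performs internally):

import numpy as np
from l2rpn_baselines.DoubleDuelingDQN import DoubleDuelingDQN

class RhoOnlyAgent(DoubleDuelingDQN):
    def convert_obs(self, observation):
        # Keep only the relative line flows as a flat float32 vector
        return np.asarray(observation.rho, dtype=np.float32)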
- my_act(state, reward, done=False)[source]
This method should be overridden if this class is used. It is an "abstract" method.
Override it if you want to make an agent that handles different kinds of actions and observations.
- Parameters:
transformed_observation (object) – Anything that will be used to create an action. This is the result of the call to AgentWithConverter.convert_obs(). This is likely a numpy array.
reward (float) – The current reward. This is the reward obtained by the previous action.
done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility.
- Returns:
res – A representation of an action in any possible format. This action will then be ingested and formatted into a valid action with the AgentWithConverter.convert_act() method.
- Return type:
object
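A greedy override could be sketched as follows; the self.Qmain handle and its predict() method are assumptions for illustration, not part of the documented API:

import numpy as np
from l2rpn_baselines.DoubleDuelingDQN import DoubleDuelingDQN

class GreedyD3QN(DoubleDuelingDQN):
    def my_act(self, transformed_observation, reward, done=False):
        # Assumption: self.Qmain is the online Q-network exposing predict()
        q_values = self.Qmain.predict(transformed_observation)
        # Return the encoded action (an index); convert_act() decodes it later
        return int(np.argmax(q_values))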
- class l2rpn_baselines.DoubleDuelingDQN.DoubleDuelingDQNConfig[source]
DoubleDuelingDQN configurable hyperparameters exposed as class attributes
Warning
This baseline recodes the entire RL training procedure. You can use it if you want to take a deeper look at the Deep Q-Learning algorithm and a possible (non-optimized, slow, etc.) implementation.
For a much better implementation, you can reuse the code of “PPO_RLLIB” or the “PPO_SB3” baseline.
Configuration
Training a model requires tweaking many hyperparameters; these are exposed as attributes of a dedicated configuration class:
from l2rpn_baselines.DoubleDuelingDQN import DoubleDuelingDQNConfig
# Set hyperparameters before training
DoubleDuelingDQNConfig.LR = 1e-5
DoubleDuelingDQNConfig.INITAL_EPSILON = 1.0
DoubleDuelingDQNConfig.FINAL_EPSILON = 0.001
DoubleDuelingDQNConfig.DECAY_EPSILON = 10000
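Because these are class-level attributes, they are presumably read when the agent is built: set them before calling train (or before instantiating DoubleDuelingDQN), otherwise the default values are likely to be used.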
Internal classes
The neural network model is defined in a separate class. You may want to import it manually:
from l2rpn_baselines.DoubleDuelingDQN.doubleDuelingDQN_NN import DoubleDuelingDQN_NN
- class l2rpn_baselines.DoubleDuelingDQN.doubleDuelingDQN_NN.DoubleDuelingDQN_NN(action_size, observation_size, num_frames=4, learning_rate=1e-05, learning_rate_decay_steps=1000, learning_rate_decay_rate=0.95)[source]
Constructs the desired deep Q-learning network.
Warning
This baseline recodes the entire RL training procedure. You can use it if you want to take a deeper look at the Deep Q-Learning algorithm and a possible (non-optimized, slow, etc.) implementation.
For a much better implementation, you can reuse the code of the "PPO_RLLIB" or the "PPO_SB3" baseline.
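Given the constructor signature above, the network can also be built directly; the sizes below are placeholders, since in practice they come from the converted action and observation spaces of the environment:

from l2rpn_baselines.DoubleDuelingDQN.doubleDuelingDQN_NN import DoubleDuelingDQN_NN

# Placeholder sizes for illustration only
nn = DoubleDuelingDQN_NN(action_size=151,
                         observation_size=400,
                         num_frames=4,
                         learning_rate=1e-5,
                         learning_rate_decay_steps=1000,
                         learning_rate_decay_rate=0.95)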