Template: How to contribute to l2rpn baselines

Description

A Baseline is a grid2op.Agent.BaseAgent with a few more methods that make it easy to load, save and train it.

It can then be used like any grid2op Agent, for example in a runner or in the “while” loop of open gym.

Compared to a bare grid2op Agent, baselines have 3 more methods:

- Template.load(): to load the agent, if applicable
- Template.save(): to save the agent, if applicable
- Template.train(): to train the agent, if applicable

The method Template.reset() is already present in grid2op but is emphasized here. It is called by a runner at the beginning of each episode with the first observation.

The method Template.act() is also present in grid2op, of course. It is the main method of the baseline: it receives an observation (together with a reward and a flag that says whether the episode is over) and returns a valid action.

NB: the “real” instance of the environment on which the baseline will be evaluated will be built AFTER the creation of the baseline. The parameters of the real environment on which the baseline will be assessed will belong to the same class as the argument used by the baseline. This means that if a baseline is built with a grid2op environment “env”, this environment will not be modified in any manner: none of its internal variables will change, etc. This is done to prevent cheating.
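The methods described above can be sketched as a minimal skeleton. This is only an illustration under stated assumptions, not the actual Template source: MyBaseline and its nb_act_calls counter are hypothetical, the fallback BaseAgent class exists only so that the sketch runs when grid2op is not installed, and the “do nothing” action is obtained by calling the action space with an empty dictionary.

```python
try:
    from grid2op.Agent import BaseAgent
except ImportError:
    class BaseAgent:
        """Stand-in used only when grid2op is not installed."""

        def __init__(self, action_space):
            self.action_space = action_space


class MyBaseline(BaseAgent):
    """Hypothetical baseline that always plays the "do nothing" action."""

    def __init__(self, action_space, observation_space, name="MyBaseline", **kwargs):
        super().__init__(action_space)
        self.observation_space = observation_space
        self.name = name
        self.nb_act_calls = 0  # example of internal, per-episode state

    def act(self, observation, reward, done=False):
        # main method: map an observation to a valid action
        self.nb_act_calls += 1
        return self.action_space({})  # empty dict: "do nothing"

    def reset(self, observation):
        # called at the beginning of each episode with the first observation
        self.nb_act_calls = 0

    def load(self, path):
        if path is None:
            return  # "don't load anything"
        # a real baseline would restore its weights from `path` here

    def save(self, path):
        if path is None:
            return  # "don't save"
        # a real baseline would write its weights to `path` here

    def train(self, env, iterations, save_path, **kwargs):
        # nothing to learn for this toy agent; a real baseline would
        # update its parameters here and save them regularly
        self.save(save_path)
```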

Implementation Example

Classes:

Template(action_space, observation_space, ...)

Note that a Baseline should always somehow inherit from grid2op.Agent.BaseAgent.

Functions:

`evaluate`(env[, load_path, logs_path, ...])	In order to submit a valid baseline, it is mandatory to provide an "evaluate" function with the same signature as this one.
`train`(env[, name, iterations, save_path, ...])	This is an example function to train a baseline.

class l2rpn_baselines.Template.Template(action_space, observation_space, name, **kwargs)[source]

Note that a Baseline should always somehow inherit from grid2op.Agent.BaseAgent.

It serves as a template agent to explain how a baseline can be built.

As opposed to a bare grid2op Agent, baselines have 3 more methods:

- Template.load(): to load the agent, if applicable
- Template.save(): to save the agent, if applicable
- Template.train(): to train the agent, if applicable

The method Template.reset() is already present in grid2op but is emphasized here. It is called by a runner at the beginning of each episode with the first observation.

The method Template.act() is also present in grid2op, of course. It is the main method of the baseline: it receives an observation (together with a reward and a flag that says whether the episode is over) and returns a valid action.

NB: the “real” instance of the environment on which the baseline will be evaluated will be built AFTER the creation of the baseline. The parameters of the real environment on which the baseline will be assessed will belong to the same class as the argument used by the baseline. This means that if a baseline is built with a grid2op environment “env”, this environment will not be modified in any manner: none of its internal variables will change, etc. This is done to prevent cheating.

Methods:

`act`(observation, reward, done)	This is the main method of a Template.
`load`(path)	This function is used to build a baseline from a folder for example.
`reset`(observation)	This method is called at the beginning of a new episode.
`save`(path)	This method is used to store the internal state of the baseline.
`train`(env, iterations, save_path, **kwargs)	This function, if provided, is used to train the baseline.

act(observation, reward, done)[source]

This is the main method of a Template. Given the current observation and the current reward (i.e. the reward that the environment sends to the agent after the previous action has been implemented), it returns the action chosen by the agent.

Parameters:

observation (grid2op.Observation.Observation) – The current observation of the grid2op.Environment.Environment
reward (float) – The current reward. This is the reward obtained by the previous action
done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

Returns:

res – The action chosen by the bot / controller / agent.

Return type:

grid2op.Action.PlayableAction

load(path)[source]

This function is used to build a baseline from a folder, for example. It is recommended that this load function give different results depending on the Template.name of the baseline. For example, weights of a neural network can be saved under different names that … depend on the name of the instance.

If path is None, it should be understood as “don’t load anything”.

Parameters:

path (str) – The path from which to load the baseline.

reset(observation)[source]

This method is called at the beginning of a new episode. It is implemented by baselines to reset their internal state if needed.

Parameters:

observation (grid2op.Observation.BaseObservation) – The first observation corresponding to the initial state of the environment.

save(path)[source]

This method is used to store the internal state of the baseline.

Parameters:

path (str) –

The location where the data of the baseline are stored. If None, it should be understood as “don’t save”. In any other case, it is strongly recommended that, if “baseline” is a baseline, then:

path = "."  # or any other
baseline.save(path)  # store the internal state of "baseline"
loaded_baseline = Template(...)  # built with the same parameters as "baseline"
loaded_baseline.load(path)  # restore that state in a new instance

is a perfectly valid script (i.e. it runs without error) and that, after loading, any call to “loaded_baseline.act” will give the same results as the original “baseline.act”. In other words, “baseline” and “loaded_baseline” represent the same Baseline, even though they are different instances of Baseline.
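The save / load contract described above can be checked with a small round-trip sketch. Everything here is an assumption made for illustration: ThresholdBaseline is a hypothetical toy class (it does not inherit from grid2op and returns strings instead of real actions), its single threshold stands in for learned weights, and JSON is just one possible storage format. The file name is derived from the name of the instance, as recommended for load.

```python
import json
import os
import tempfile


class ThresholdBaseline:
    """Hypothetical baseline whose whole internal state is a single float."""

    def __init__(self, name="ThresholdBaseline", threshold=0.5):
        self.name = name
        self.threshold = threshold  # plays the role of learned weights

    def _state_file(self, path):
        # the file name depends on the name of the instance
        return os.path.join(path, f"{self.name}.json")

    def act(self, observation, reward, done=False):
        # toy decision rule standing in for a real grid2op action
        return "act" if observation > self.threshold else "do_nothing"

    def save(self, path):
        if path is None:
            return  # "don't save"
        os.makedirs(path, exist_ok=True)
        with open(self._state_file(path), "w", encoding="utf-8") as f:
            json.dump({"threshold": self.threshold}, f)

    def load(self, path):
        if path is None:
            return  # "don't load anything"
        with open(self._state_file(path), "r", encoding="utf-8") as f:
            self.threshold = json.load(f)["threshold"]


def check_round_trip(observations):
    """Return True if a reloaded baseline acts like the saved one."""
    with tempfile.TemporaryDirectory() as path:
        baseline = ThresholdBaseline(threshold=0.9)
        baseline.save(path)
        loaded_baseline = ThresholdBaseline()  # same name, untrained state
        loaded_baseline.load(path)
    return all(
        baseline.act(o, 0.0) == loaded_baseline.act(o, 0.0) for o in observations
    )
```

A check of this kind can be kept as a unit test next to the baseline, so that a change in the storage format that breaks reloading is caught early.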

train(env, iterations, save_path, **kwargs)[source]

This function, if provided, is used to train the baseline. Make sure to save it regularly, with “baseline.save” for example.

At the end of the training, it is recommended to save the baseline at “save_path”.

Parameters:

env (grid2op.Environment.Environment) – The environment used to train your baseline.
iterations (int) – Number of training iterations used to train the baseline.
save_path (str) – Path where the final version of the baseline (i.e. after all the training iterations have been performed) will be saved. It is strongly recommended to save the results regularly during training, and to save the baseline at this location at the end.
kwargs – Other key-words arguments used for training.
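The advice to save regularly during training and once more at the end can be sketched as follows. This is a toy illustration, not the Template implementation: CountingBaseline, its _training_step placeholder and the save_every keyword are hypothetical, and save only records where and when it was called instead of writing files.

```python
class CountingBaseline:
    """Hypothetical baseline whose "training" only counts iterations."""

    def __init__(self, name="CountingBaseline"):
        self.name = name
        self.nb_iterations_done = 0
        self.saved_at = []  # (path, iteration) pairs, instead of real files

    def save(self, path):
        if path is None:
            return  # "don't save"
        self.saved_at.append((path, self.nb_iterations_done))

    def _training_step(self, env):
        # placeholder for one real update (e.g. one gradient step)
        self.nb_iterations_done += 1

    def train(self, env, iterations, save_path, save_every=10, **kwargs):
        for it in range(1, iterations + 1):
            self._training_step(env)
            if it % save_every == 0:
                self.save(save_path)  # regular checkpoint during training
        self.save(save_path)  # final version of the baseline
```

Passing save_every through the keyword arguments keeps the documented signature train(env, iterations, save_path, **kwargs) unchanged for callers that do not need it.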

l2rpn_baselines.Template.evaluate(env, load_path='.', logs_path=None, nb_episode=1, nb_process=1, max_steps=-1, verbose=False, save_gif=False, **kwargs)[source]

In order to submit a valid baseline, it is mandatory to provide an “evaluate” function with the same signature as this one.

Parameters:

env (grid2op.Environment.Environment) – The environment on which the baseline will be evaluated.
load_path (str) – The path where the model is stored. This is used by the agent when calling “agent.load”.
logs_path (str) – The path where the agent’s results will be stored.
nb_episode (int) – Number of episodes to run for the assessment of the performance. By default it’s 1.
nb_process (int) – Number of processes to use for the assessment of the performance. Should be an integer greater than or equal to 1. By default it’s 1.
max_steps (int) – Maximum number of time steps each episode can last. It should be a positive integer or -1. -1 means that the entire episode is run (until the chronics are out of data or until a game over). By default it’s -1.
verbose (bool) – Verbosity of the output.
save_gif (bool) – Whether or not to save a gif into each episode folder corresponding to the representation of the said episode.
kwargs – Other keyword arguments that you are free to use, for example for building the agent or saving it.

Return type:

None
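A possible shape for such a function is sketched below, assuming grid2op is installed and that its Runner is used to play the episodes. To stay independent of any particular baseline, the sketch evaluates grid2op's DoNothingAgent; a real baseline would build its own agent at that point and call agent.load(load_path). The save_gif flag is accepted but ignored here, and the layout of each result tuple (episode name, cumulative reward, number of time steps, maximum number of time steps in positions 1 to 4) is an assumption about what runner.run returns.

```python
def evaluate(env, load_path=".", logs_path=None, nb_episode=1, nb_process=1,
             max_steps=-1, verbose=False, save_gif=False, **kwargs):
    # grid2op is imported lazily, so defining this function does not require it
    from grid2op.Agent import DoNothingAgent
    from grid2op.Runner import Runner

    # the runner rebuilds its own environment from these parameters,
    # so the "env" passed as argument is not modified
    runner_params = env.get_params_for_runner()
    runner_params["verbose"] = verbose

    # a real baseline would be built here and restored with agent.load(load_path)
    agent = DoNothingAgent(env.action_space)

    runner = Runner(**runner_params, agentClass=None, agentInstance=agent)
    results = runner.run(path_save=logs_path,
                         nb_episode=nb_episode,
                         nb_process=nb_process,
                         max_iter=max_steps,
                         pbar=False)

    if verbose:
        for episode in results:
            # assumed layout of each result tuple, see the text above
            chron_name, cum_reward, nb_time_step, max_ts = episode[1:5]
            print(f"{chron_name}: reward {cum_reward:.2f}, "
                  f"{nb_time_step}/{max_ts} time steps")
    # save_gif is ignored in this sketch
```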

l2rpn_baselines.Template.train(env, name='Template', iterations=1, save_path=None, load_path=None, **kwargs)[source]

This is an example function to train a baseline.

If you choose (which is recommended) to provide a training script to help others retrain your baseline in different environments, or for longer periods of time, etc., then, in order to be valid, this script should contain a “train” function with at least the following arguments.

Parameters:

env (grid2op.Environment.Environment) – The environment on which the baseline will be trained.
name (str) – Fancy name you give to this baseline.
iterations (int) – Number of training iterations to perform
save_path (str) – The path where the baseline will be saved at the end of the training procedure.
load_path (str) – Path where to look for reloading the model. Use None if no model should be loaded.
kwargs – Other key-word arguments that you might use for training.
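A module-level “train” function with these arguments could look like the following sketch. _ToyBaseline is a hypothetical stand-in that only records the calls it receives; a real training script would build its own baseline class instead. Returning the trained baseline is a convenience choice of this sketch and is not required by the description above.

```python
class _ToyBaseline:
    """Hypothetical stand-in for a real baseline; it only records calls."""

    def __init__(self, action_space, observation_space, name="Template"):
        self.action_space = action_space
        self.observation_space = observation_space
        self.name = name
        self.calls = []

    def load(self, path):
        self.calls.append(("load", path))

    def train(self, env, iterations, save_path, **kwargs):
        self.calls.append(("train", iterations, save_path))


def train(env, name="Template", iterations=1, save_path=None, load_path=None,
          **kwargs):
    # build the baseline from the spaces of the training environment
    baseline = _ToyBaseline(env.action_space, env.observation_space, name=name)

    # optionally resume from a previously saved model
    if load_path is not None:
        baseline.load(load_path)

    # the baseline's own train method is expected to save regularly and
    # to store its final version at save_path
    baseline.train(env, iterations, save_path, **kwargs)
    return baseline
```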