DoNothing: a concrete example of the “do nothing” Baseline

Description

This file serves as a concrete example of how to implement a Baseline.

Implementation Example

Classes:

DoNothing(action_space, observation_space, ...)

Do nothing agent of grid2op, as a lower-bound baseline for the l2rpn competition.

class l2rpn_baselines.DoNothing.DoNothing(action_space, observation_space, name, **kwargs)

Do nothing agent of grid2op, as a lower-bound baseline for the l2rpn competition.

Methods:

__init__(action_space, observation_space, ...)

act(observation, reward, done)

This is the main method of a BaseAgent.

reset(observation)

This method is called at the beginning of a new episode.

__init__(action_space, observation_space, name, **kwargs)
act(observation, reward, done)

This is the main method of a BaseAgent. Given the current observation and the current reward (i.e. the reward that the environment sent to the agent after the previous action was implemented), it returns the next action to take.

Notes

In order to be reproducible, and to make proper use of the BaseAgent.seed() capabilities, you must absolutely NOT use the random python module (which will not be seeded) nor the np.random module, and you should avoid any other sources of pseudo-random numbers.

You can adapt your code as follows: instead of using np.random, use self.space_prng.

For example, where you would write np.random.randint(1, 5), use self.space_prng.randint(1, 5); likewise, np.random.normal() becomes self.space_prng.normal().

You have an example of such usage in RandomAgent.my_act().
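The substitution above can be sketched without grid2op installed: self.space_prng is a dedicated NumPy RandomState, so drawing from it (rather than from the global np.random module) makes the agent reproducible under seeding. The SeededAgent class below is a hypothetical stand-in, not part of grid2op; in a real agent, BaseAgent provides self.space_prng for you.

```python
import numpy as np

class SeededAgent:
    """Toy agent (hypothetical) illustrating the seeding pattern:
    draw every pseudo-random number from a dedicated RandomState,
    never from the global np.random module."""

    def __init__(self, seed=None):
        # In grid2op, BaseAgent exposes this as self.space_prng;
        # here we build our own to keep the sketch self-contained.
        self.space_prng = np.random.RandomState(seed)

    def my_act(self):
        # instead of np.random.randint(1, 5):
        return self.space_prng.randint(1, 5)

# Two agents built with the same seed produce the same draws.
a = SeededAgent(seed=42)
b = SeededAgent(seed=42)
assert a.my_act() == b.my_act()
```

Had my_act used np.random.randint directly, seeding the agent would have no effect on its draws, which is exactly the reproducibility problem the note warns about.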

If you really need other sources of randomness (for example if you use tensorflow or torch), we strongly recommend that you overload BaseAgent.seed() accordingly.

Parameters:
  • observation (grid2op.Observation.BaseObservation) – The current observation of the grid2op.Environment.Environment

  • reward (float) – The current reward. This is the reward obtained by the previous action

  • done (bool) – Whether the episode has ended or not. Used to maintain gym compatibility

Returns:

res – The action chosen by the bot / controller / agent.

Return type:

grid2op.Action.PlayableAction

reset(observation)

This method is called at the beginning of a new episode. It is implemented by agents to reset their internal state if needed.

Parameters:
  • observation (grid2op.Observation.BaseObservation) – The first observation corresponding to the initial state of the environment.
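Putting the pieces above together, here is a minimal, standalone sketch of the DoNothing pattern. It deliberately does not import grid2op, so the class name DoNothingSketch and the lambda standing in for action_space are illustrative assumptions; in the real baseline, the class subclasses grid2op.Agent.BaseAgent and action_space is a grid2op ActionSpace, where an empty dict builds the "do nothing" action.

```python
class DoNothingSketch:
    """Hypothetical, standalone sketch of the DoNothing baseline.
    action_space is assumed to be a callable that builds an action
    from a dict of requested changes; an empty dict means
    "change nothing" (grid2op's convention)."""

    def __init__(self, action_space, observation_space=None,
                 name="DoNothing", **kwargs):
        self.action_space = action_space
        self.name = name

    def act(self, observation, reward, done=False):
        # Empty dict -> the "do nothing" action.
        return self.action_space({})

    def reset(self, observation):
        # Stateless agent: nothing to reset between episodes.
        pass

# Toy action_space standing in for grid2op's ActionSpace:
agent = DoNothingSketch(action_space=lambda d: d)
assert agent.act(observation=None, reward=0.0, done=False) == {}
```

Because the agent ignores observation, reward, and done entirely, it needs no internal state and its reset is a no-op, which is what makes it a useful lower-bound baseline: any learned agent should at least beat it.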