utils: Some utility functions and classes

Description

This module contains a few utility scripts used by some other baselines, or that can be reused in other contexts.

They have been put together in this “module” because they can be reused across different baselines (avoiding code duplication).

The main tools are listed below (a short usage sketch follows the list):

  • BaseDeepQ is the root class for some baselines. It holds only the code of the neural network itself; the architecture of this network can be customized through NNParam.

  • DeepQAgent creates an instance of BaseDeepQ and implements the agent interface (e.g. the train, load and save methods). The training procedure is unified (epsilon-greedy exploration, training for a given number of steps, etc.) but can be customized through TrainingParam. Training can be stopped at any time and restarted from the last checkpoint almost seamlessly: the agent frequently saves its neural network as well as its other parameters.

  • TrainingParam lets you customize how the agent is trained for some “common” procedures. More information is given in the Focus on the training parameters section. This class is fully serializable / deserializable in JSON format.

  • NNParam is used to specify the architecture of your neural network. Just like TrainingParam, this class fully supports serialization / deserialization in JSON format. More about it is given in the Focus on the architecture section.
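
To give a feel for how these pieces fit together, here is a minimal usage sketch. The attribute names come from the table in the next section; the JSON helper names (save_as_json / from_json) and the overall flow are assumptions for illustration and may differ from the actual module.

```python
# Minimal usage sketch -- not a verbatim transcript of the module's API.
from l2rpn_baselines.utils import TrainingParam

tp = TrainingParam()
tp.initial_epsilon = 0.4            # exploration: explore 40% of the time...
tp.final_epsilon = 0.01             # ...decaying down to 1%
tp.step_for_final_epsilon = 100_000
tp.minibatch_size = 32              # neural network learning
tp.discount_factor = 0.99           # RL meta parameter
tp.lr = 1e-4                        # optimizer

# TrainingParam is fully serializable / deserializable in JSON;
# the helper names below are assumed for illustration.
tp.save_as_json("training_params.json")
tp_restored = TrainingParam.from_json("training_params.json")

# A DeepQAgent would then be built from these parameters together with an
# NNParam describing the network architecture (see the sections below).
```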

Focus on the training parameters

The TrainingParam class groups a number of attributes with different roles. The table below lists all of these attributes, grouped by the purpose they serve.

Utility                     | Attribute names
--------------------------- | ---------------
exploration                 | initial_epsilon, step_for_final_epsilon, final_epsilon
neural network learning     | minibatch_size, update_freq, min_observation
RL meta parameters          | discount_factor, tau
limit duration of episode   | step_increase_nb_iter *, min_iter, max_iter, update_nb_iter, max_iter_fun
start an episode at random  | random_sample_datetime_start *
oversampling hard scenarios | oversampling_rate *
optimizer                   | lr, lr_decay_steps, lr_decay_rate, max_global_norm_grad, max_value_grad, max_loss
saving / logging            | update_tensorboard_freq, save_model_each

* when a “star” is present, it means that setting this parameter to None deactivates the whole utility. For example, setting step_increase_nb_iter to None will deactivate the “limit duration of episode” functionality, as in the sketches below.
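
To make the exploration attributes concrete, here is one plausible schedule. This is a sketch assuming a simple linear decay; the exact curve used by DeepQAgent may differ.

```python
def epsilon_at(step,
               initial_epsilon=0.4,
               final_epsilon=0.01,
               step_for_final_epsilon=100_000):
    """Linearly anneal epsilon from initial_epsilon to final_epsilon.

    Sketch only: the real decay curve may differ (it could, for instance,
    be exponential). The three argument names are the attributes from the
    "exploration" row of the table above.
    """
    if step >= step_for_final_epsilon:
        return final_epsilon
    frac = step / step_for_final_epsilon
    return initial_epsilon + frac * (final_epsilon - initial_epsilon)


print(epsilon_at(0))        # 0.4   (fully at initial_epsilon)
print(epsilon_at(50_000))   # 0.205 (halfway through the decay)
print(epsilon_at(200_000))  # 0.01  (clamped at final_epsilon)
```

And a sketch of switching off the starred utilities, using the None convention described above (attribute names are the ones from the table):

```python
from l2rpn_baselines.utils import TrainingParam

tp = TrainingParam()
tp.step_increase_nb_iter = None         # no "limit duration of episode"
tp.random_sample_datetime_start = None  # no random episode start
tp.oversampling_rate = None             # no oversampling of hard scenarios
```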

Focus on the architecture

TODO
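
Pending the write-up of this section, here is a minimal sketch of how NNParam could describe a feed-forward architecture. The keyword names (sizes, activs, list_attr_obs, ...) and all values are assumptions for illustration and may not match the real signature.

```python
from l2rpn_baselines.utils import NNParam

# Illustrative values only; the keyword names are assumptions, not the
# documented signature.
nn_params = NNParam(
    action_size=151,                   # size of the network output
    observation_size=420,              # size of the network input
    sizes=[300, 300, 150],             # widths of the hidden layers
    activs=["relu", "relu", "relu"],   # one activation per hidden layer
    list_attr_obs=["rho", "line_status"],  # observation attributes to feed in
)

# Like TrainingParam, NNParam is fully serializable / deserializable in
# JSON (the helper name below is assumed for illustration):
nn_params.save_as_json("nn_params.json")
```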

Implementation Details