utils: Some utility functions and classes

Description

This module contains a few utility scripts used by some other baselines, or that can be reused in other contexts.

They have been put together in this “module” because they can be reused across different baselines (avoiding code duplication).

The main tools are listed below (a short usage sketch follows the list):

  • BaseDeepQ is the root class for some baselines. It holds only the code of the neural network itself; the architecture of this network can be customized through NNParam.

  • DeepQAgent creates an instance of BaseDeepQ and implements the agent interface (e.g. the train, load and save methods). The training procedure is unified (epsilon-greedy exploration, training for a given number of steps, etc.) but can be customized through TrainingParam. Training can be stopped at any time and restarted from the last checkpoint almost seamlessly: the agent frequently saves its neural network as well as its other parameters.

  • TrainingParam lets you customize how the agent is trained for some “common” procedures. More information is given in the Focus on the training parameters section. This class is fully serializable / deserializable in JSON format.

  • NNParam is used to specify the architecture of your neural network. Just like TrainingParam, this class fully supports serialization / deserialization in JSON format. More about it is given in the Focus on the architecture section.
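
To give a feel for how these pieces fit together, here is a minimal usage sketch. The attribute names come from the table in the next section; the JSON helper names (save_as_json / from_json) and the overall flow are assumptions for illustration and may differ from the actual module.

```python
# Minimal usage sketch -- not a verbatim transcript of the module's API.
from l2rpn_baselines.utils import TrainingParam

tp = TrainingParam()
tp.initial_epsilon = 0.4            # exploration: explore 40% of the time...
tp.final_epsilon = 0.01             # ...decaying down to 1%
tp.step_for_final_epsilon = 100_000
tp.minibatch_size = 32              # neural network learning
tp.discount_factor = 0.99           # RL meta parameter
tp.lr = 1e-4                        # optimizer

# TrainingParam is fully serializable / deserializable in JSON;
# the helper names below are assumed for illustration.
tp.save_as_json("training_params.json")
tp_restored = TrainingParam.from_json("training_params.json")

# A DeepQAgent would then be built from these parameters together with an
# NNParam describing the network architecture (see the sections below).
```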

Focus on the training parameters

The TrainingParam class groups a number of attributes with different roles. The table below lists all of these attributes, grouped by the purpose they serve.

Utility                     | Attribute names
--------------------------- | ---------------
exploration                 | initial_epsilon, step_for_final_epsilon, final_epsilon
neural network learning     | minibatch_size, update_freq, min_observation
RL meta parameters          | discount_factor, tau
limit duration of episode   | step_increase_nb_iter *, min_iter, max_iter, update_nb_iter, max_iter_fun
start an episode at random  | random_sample_datetime_start *
oversampling hard scenarios | oversampling_rate *
optimizer                   | lr, lr_decay_steps, lr_decay_rate, max_global_norm_grad, max_value_grad, max_loss
saving / logging            | update_tensorboard_freq, save_model_each

* when a “star” is present, it means that setting this parameter to None deactivates the whole utility. For example, setting step_increase_nb_iter to None will deactivate the “limit duration of episode” functionality, as in the sketches below.
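
To make the exploration attributes concrete, here is one plausible schedule. This is a sketch assuming a simple linear decay; the exact curve used by DeepQAgent may differ.

```python
def epsilon_at(step,
               initial_epsilon=0.4,
               final_epsilon=0.01,
               step_for_final_epsilon=100_000):
    """Linearly anneal epsilon from initial_epsilon to final_epsilon.

    Sketch only: the real decay curve may differ (it could, for instance,
    be exponential). The three argument names are the attributes from the
    "exploration" row of the table above.
    """
    if step >= step_for_final_epsilon:
        return final_epsilon
    frac = step / step_for_final_epsilon
    return initial_epsilon + frac * (final_epsilon - initial_epsilon)


print(epsilon_at(0))        # 0.4   (fully at initial_epsilon)
print(epsilon_at(50_000))   # 0.205 (halfway through the decay)
print(epsilon_at(200_000))  # 0.01  (clamped at final_epsilon)
```

And a sketch of switching off the starred utilities, using the None convention described above (attribute names are the ones from the table):

```python
from l2rpn_baselines.utils import TrainingParam

tp = TrainingParam()
tp.step_increase_nb_iter = None         # no "limit duration of episode"
tp.random_sample_datetime_start = None  # no random episode start
tp.oversampling_rate = None             # no oversampling of hard scenarios
```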

Focus on the architecture

TODO
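
Pending the write-up of this section, here is a minimal sketch of how NNParam could describe a feed-forward architecture. The keyword names (sizes, activs, list_attr_obs, ...) and all values are assumptions for illustration and may not match the real signature.

```python
from l2rpn_baselines.utils import NNParam

# Illustrative values only; the keyword names are assumptions, not the
# documented signature.
nn_params = NNParam(
    action_size=151,                   # size of the network output
    observation_size=420,              # size of the network input
    sizes=[300, 300, 150],             # widths of the hidden layers
    activs=["relu", "relu", "relu"],   # one activation per hidden layer
    list_attr_obs=["rho", "line_status"],  # observation attributes to feed in
)

# Like TrainingParam, NNParam is fully serializable / deserializable in
# JSON (the helper name below is assumed for illustration):
nn_params.save_as_json("nn_params.json")
```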

Implementation Details