utils: Some utility functions and classes
Description
This module gathers a few utility scripts that are used by several baselines or that could be reused in other contexts.
They are grouped in this “module” so that different baselines can share them and avoid code duplication.
The main tools are:
BaseDeepQ
is the root class for some baselines. This class holds only the code of the neural network. The architecture of the neural network can be customized with NNParam.
DeepQAgent
creates an instance of BaseDeepQ and implements the agent interface (e.g. the train, load and save methods). The training procedure is unified (epsilon-greedy exploration, training for a given number of steps, etc.) but can be customized with TrainingParam. Training can be stopped at any time and restarted from the last saved point almost seamlessly, because the neural network and the other parameters are saved frequently.
TrainingParam
lets you customize some “common” aspects of how the agent is trained. More information is given in the section Focus on the training parameters. This class is fully serializable / deserializable in JSON format.
NNParam
is used to specify the architecture of your neural network. Just like TrainingParam, this class fully supports serialization / deserialization in JSON format. More about it is given in the section Focus on the architecture.
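The JSON round trip mentioned for TrainingParam and NNParam can be pictured with a small stand-in class. The sketch below is not the library's implementation: the class `ToyParam` and its `to_json` / `from_json` methods are hypothetical names chosen for illustration, and only the attribute names `lr` and `max_iter` are borrowed from this page.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ToyParam:
    """Hypothetical stand-in for a serializable parameter class."""
    lr: float = 1e-3
    max_iter: Optional[int] = None  # None is stored as JSON null

    def to_json(self) -> str:
        # Convert the attributes to a plain dict, then to a JSON string.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "ToyParam":
        # Rebuild an instance from a string produced by to_json.
        return cls(**json.loads(text))


params = ToyParam(lr=5e-4, max_iter=None)
restored = ToyParam.from_json(params.to_json())
assert restored == params  # the round trip preserves every attribute
```

Keeping every attribute a plain JSON-compatible value (numbers, strings, lists, None) is what makes such a round trip lossless; the actual classes may handle more complex attributes differently.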
Focus on the training parameters
The class TrainingParam groups a number of attributes with different roles. The table below attempts to list all the attributes, grouped by the purpose they serve.
Utility | Attribute names |
---|---|
exploration | |
neural network learning | |
RL meta parameters | |
limit duration of episode | step_increase_nb_iter *, min_iter, max_iter, update_nb_iter, max_iter_fun |
start an episode at random | |
oversampling hard scenarios | |
optimizer | lr, lr_decay_steps, lr_decay_rate, max_global_norm_grad, max_value_grad, max_loss |
saving / logging | |
* A “star” next to an attribute means that this parameter can deactivate the whole utility. For example, setting step_increase_nb_iter to None deactivates the functionality “limit duration of episode”.
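The deactivation convention can be sketched as follows. This is only an illustration under assumptions: the helper `episode_step_limit` is hypothetical, and the linear growth rule is a guess at how `min_iter`, `max_iter` and `step_increase_nb_iter` might interact, not the behaviour of the actual class. The only point taken from the text above is that None switches the whole feature off.

```python
from typing import Optional


def episode_step_limit(
    nb_training_steps: int,
    step_increase_nb_iter: Optional[int],
    min_iter: int,
    max_iter: int,
) -> Optional[int]:
    """Return the maximum episode length, or None if the limit is disabled."""
    if step_increase_nb_iter is None:
        # Starred attribute set to None: the whole utility is deactivated.
        return None
    # Assumed rule (hypothetical): allow one more step every
    # `step_increase_nb_iter` training steps, from `min_iter` up to `max_iter`.
    limit = min_iter + nb_training_steps // step_increase_nb_iter
    return min(limit, max_iter)


print(episode_step_limit(1000, None, min_iter=5, max_iter=50))    # None
print(episode_step_limit(1000, 100, min_iter=5, max_iter=50))     # 15
print(episode_step_limit(10_000, 100, min_iter=5, max_iter=50))   # 50
```

Using None as an explicit "off" value keeps the other attributes of the group (here `min_iter` and `max_iter`) harmless when the feature is not wanted, since they are never read.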
Focus on the architecture
TODO