MRL environment
Environment
The Environment class holds all Callback classes involved in the fit cycle and runs the fit loop. All callbacks are treated the same, but the following callback classes are distinguished for semantic convenience:
- agent - the Agent being trained
- template_cb - the TemplateCallback to use for the fit cycle
- samplers - any Sampler callbacks used
- rewards - any RewardCallback callbacks
- losses - any LossCallback callbacks
- cbs - any other Callback classes that don't fall into the above categories
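As a rough illustration of "all callbacks are treated the same", the sketch below stores each category under its own attribute but dispatches events over one flat list. The names `ToyCallback`, `LoggingCallback`, `ToyEnvironment` and `all_cbs` are hypothetical stand-ins, not MRL's API, and the dispatch order shown (agent, template, samplers, rewards, losses, other) is an assumption made for the example.

```python
class ToyCallback:
    """Hypothetical stand-in for a callback: it handles an event
    only if it defines a method with that event's name."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __call__(self, event_name):
        handler = getattr(self, event_name, None)
        if callable(handler):
            handler()


class LoggingCallback(ToyCallback):
    """Responds to a single event by recording that it was called."""

    def before_train(self):
        self.log.append((self.name, "before_train"))


class ToyEnvironment:
    """Keeps the semantic categories as attributes, but dispatches
    every event to all callbacks through one flat list."""

    def __init__(self, agent, template_cb=None, samplers=(),
                 rewards=(), losses=(), cbs=()):
        self.agent = agent
        self.template_cb = template_cb
        self.samplers = list(samplers)
        self.rewards = list(rewards)
        self.losses = list(losses)
        self.cbs = list(cbs)
        # Assumed ordering; the real library may order callbacks differently.
        self.all_cbs = [cb for cb in (agent, template_cb) if cb is not None]
        self.all_cbs += self.samplers + self.rewards + self.losses + self.cbs

    def __call__(self, event_name):
        for cb in self.all_cbs:
            cb(event_name)


log = []
env = ToyEnvironment(
    agent=LoggingCallback("agent", log),
    samplers=[LoggingCallback("sampler", log)],
    rewards=[LoggingCallback("reward", log)],
)
env("before_train")   # every callback sees the same event
env("after_train")    # no handler defined, so nothing is logged
print(log)
```

The categories only affect how callbacks are stored and looked up; event dispatch itself makes no distinction between them.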
The Fit Loop
The following describes the order of events in Environment.fit:
- Callbacks added during Environment.fit are registered
- before_train event is called
- Start iterating over the number of batches. For each batch:
  - Call Environment.build_buffer. If current buffer size is less than the current batch size:
    - call build_buffer event
    - call filter_buffer event
    - call after_build_buffer event
  - Call Environment.sample_batch
    - create new BatchState
    - call before_batch event
    - call sample_batch event
    - call before_filter_batch event
    - call filter_batch event
    - call after_sample event
  - Call Environment.compute_reward
    - call before_compute_reward event
    - call compute_reward event
    - call after_compute_reward event
    - call reward_modification event
    - call after_reward_modification event
  - Call Environment.get_model_outputs
    - call get_model_outputs event
    - call after_get_model_outputs event
  - Call Environment.compute_loss
    - call compute_loss event
    - call zero_grad event
    - call before_step event
    - call step event
  - Call Environment.after_batch
    - call after_batch event
- After the specified number of iterations have completed, call after_train event
- Remove callbacks registered at the start of the fit loop
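The ordering above can be traced with a small, self-contained sketch. It only records event names in the order listed and performs no sampling, reward computation or optimization. `ToyFitLoop`, its `fire` helper, the `buffer_size` argument and the dict used in place of BatchState are all hypothetical and are not part of MRL; in particular, the buffer size is held fixed here, whereas a real buffer would grow and shrink during training.

```python
class ToyFitLoop:
    """Records event names in the documented order; no training happens."""

    def __init__(self):
        self.events = []   # event names, in call order
        self.cbs = []      # placeholder for registered callbacks

    def fire(self, *names):
        # Stand-in for dispatching each named event to all callbacks.
        self.events.extend(names)

    def build_buffer(self, buffer_size, batch_size):
        # Buffer events fire only when the buffer cannot fill a batch.
        if buffer_size < batch_size:
            self.fire("build_buffer", "filter_buffer", "after_build_buffer")

    def sample_batch(self):
        self.batch_state = {}  # placeholder for a fresh BatchState
        self.fire("before_batch", "sample_batch", "before_filter_batch",
                  "filter_batch", "after_sample")

    def compute_reward(self):
        self.fire("before_compute_reward", "compute_reward",
                  "after_compute_reward", "reward_modification",
                  "after_reward_modification")

    def get_model_outputs(self):
        self.fire("get_model_outputs", "after_get_model_outputs")

    def compute_loss(self):
        self.fire("compute_loss", "zero_grad", "before_step", "step")

    def after_batch(self):
        self.fire("after_batch")

    def fit(self, n_batches, batch_size, buffer_size=0, cbs=()):
        added = list(cbs)
        self.cbs.extend(added)          # register callbacks passed to fit
        self.fire("before_train")
        for _ in range(n_batches):
            self.build_buffer(buffer_size, batch_size)
            self.sample_batch()
            self.compute_reward()
            self.get_model_outputs()
            self.compute_loss()
            self.after_batch()
        self.fire("after_train")
        for cb in added:                # remove what fit registered
            self.cbs.remove(cb)


loop = ToyFitLoop()
loop.fit(n_batches=2, batch_size=4, buffer_size=0)
print("\n".join(loop.events))
```

Passing a `buffer_size` at least as large as `batch_size` skips the three buffer events, which mirrors the condition attached to Environment.build_buffer in the list above.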