MRL environment
Environment
The Environment class holds all Callback classes involved in the fit cycle and runs the fit loop. All callbacks are treated the same, but the following callback classes are distinguished for semantic convenience:
- agent - the Agent being trained
- template_cb - the TemplateCallback to use for the fit cycle
- samplers - any Sampler callbacks used
- rewards - any RewardCallback callbacks
- losses - any LossCallback callbacks
- cbs - any other Callback classes that don't fall into the above categories
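The grouping above can be illustrated with a minimal sketch. This is not the MRL implementation: the stand-in classes, the SketchEnvironment name, the all_callbacks method, and the flattening order are all assumptions made for illustration. It only shows the idea that callbacks are stored by role for convenience but handled as one flat collection.

```python
class Callback:
    """Stand-in base class (hypothetical, not the MRL Callback)."""
    def __init__(self, name):
        self.name = name


# Hypothetical stand-ins for the distinguished callback types.
class Agent(Callback): pass
class TemplateCallback(Callback): pass
class Sampler(Callback): pass
class RewardCallback(Callback): pass
class LossCallback(Callback): pass


class SketchEnvironment:
    """Stores callbacks by role, but exposes them as one flat list."""

    def __init__(self, agent, template_cb=None, samplers=None,
                 rewards=None, losses=None, cbs=None):
        self.agent = agent
        self.template_cb = template_cb
        self.samplers = list(samplers or [])
        self.rewards = list(rewards or [])
        self.losses = list(losses or [])
        self.cbs = list(cbs or [])

    def all_callbacks(self):
        # The ordering here is an assumption, not documented behavior.
        out = [self.agent]
        if self.template_cb is not None:
            out.append(self.template_cb)
        return out + self.samplers + self.rewards + self.losses + self.cbs
```

In this sketch the role attributes exist only so that a specific group (for example, all rewards) is easy to reach; anything that dispatches events would iterate over all_callbacks() without caring which group a callback came from.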
The Fit Loop
The following describes the order of events in Environment.fit:
- Callbacks added during Environment.fit are registered
- before_train event is called
- Start iterating over the number of batches. For each batch:
  - Call Environment.build_buffer. If current buffer size is less than the current batch size:
    - call build_buffer event
    - call filter_buffer event
    - call after_build_buffer event
  - Call Environment.sample_batch
    - create new BatchState
    - call before_batch event
    - call sample_batch event
    - call before_filter_batch event
    - call filter_batch event
    - call after_sample event
  - Call Environment.compute_reward
    - call before_compute_reward event
    - call compute_reward event
    - call after_compute_reward event
    - call reward_modification event
    - call after_reward_modification event
  - Call Environment.get_model_outputs
    - call get_model_outputs event
    - call after_get_model_outputs event
  - Call Environment.compute_loss
    - call compute_loss event
    - call zero_grad event
    - call before_step event
    - call step event
  - Call Environment.after_batch
    - call after_batch event
- After the specified number of iterations have completed, call after_train event
- Remove callbacks registered at the start of the fit loop
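The event ordering can be sketched as a small dispatcher. This is an illustrative sketch under stated assumptions, not the MRL source: SketchFitLoop, Recorder, the refill parameter, and the buffer arithmetic are hypothetical, and the BatchState is replaced by a plain dict. Only the event names and their order are taken from the list above.

```python
BUFFER_EVENTS = ("build_buffer", "filter_buffer", "after_build_buffer")
SAMPLE_EVENTS = ("before_batch", "sample_batch", "before_filter_batch",
                 "filter_batch", "after_sample")
REWARD_EVENTS = ("before_compute_reward", "compute_reward",
                 "after_compute_reward", "reward_modification",
                 "after_reward_modification")
MODEL_EVENTS = ("get_model_outputs", "after_get_model_outputs")
LOSS_EVENTS = ("compute_loss", "zero_grad", "before_step", "step")


class Recorder:
    """Hypothetical callback that logs every event name it receives."""
    def __init__(self):
        self.events = []

    def __call__(self, event_name):
        self.events.append(event_name)


class SketchFitLoop:
    def __init__(self, callbacks, refill=4):
        self.callbacks = list(callbacks)
        self.buffer_size = 0
        self.refill = refill  # assumed number of items added per buffer build

    def dispatch(self, names):
        # Every callback sees every event, in registration order.
        for name in names:
            for cb in self.callbacks:
                cb(name)

    def fit(self, batch_size, iterations, extra_cbs=()):
        extra_cbs = list(extra_cbs)
        self.callbacks.extend(extra_cbs)      # register fit-time callbacks
        self.dispatch(["before_train"])
        for _ in range(iterations):
            if self.buffer_size < batch_size:  # build only when buffer runs low
                self.dispatch(BUFFER_EVENTS)
                self.buffer_size += self.refill
            batch_state = {}                   # placeholder for a new BatchState
            self.dispatch(SAMPLE_EVENTS)
            self.buffer_size = max(0, self.buffer_size - batch_size)
            self.dispatch(REWARD_EVENTS)
            self.dispatch(MODEL_EVENTS)
            self.dispatch(LOSS_EVENTS)
            self.dispatch(["after_batch"])
        self.dispatch(["after_train"])
        for cb in extra_cbs:                   # remove fit-time callbacks
            self.callbacks.remove(cb)
```

A possible usage: create a Recorder, run SketchFitLoop([]).fit(batch_size=2, iterations=2, extra_cbs=[recorder]), and inspect recorder.events to see the sequence. With the assumed refill of 4, the buffer events fire only on the first iteration, which mirrors the conditional in the build_buffer step.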