Template Callback
The TemplateCallback class is used by the Environment to interface with a Template during training.
Templates serve two roles during training: filtering samples and scoring them.
Filtering
Templates filter every sample added to the Buffer, as well as the samples drawn for each batch. If the argument prefilter=True is passed to the template callback, the template removes all samples that fail its hard filters. If prefilter=False is passed, Template.validate is used to remove invalid compounds only, and compounds that violate the hard filters are kept.
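The two prefilter modes can be illustrated with a minimal sketch. This is not the library's implementation: filter_samples, validate and hard_filter are hypothetical stand-ins, and the sketch assumes that invalid samples are dropped in both modes.

```python
from typing import Callable, List


def filter_samples(
    samples: List[str],
    validate: Callable[[str], bool],     # hypothetical stand-in for Template.validate
    hard_filter: Callable[[str], bool],  # hypothetical stand-in for the hard filters
    prefilter: bool,
) -> List[str]:
    """Drop invalid samples; additionally drop hard-filter failures if prefilter is True."""
    # Assumption: validity is checked in both modes.
    kept = [s for s in samples if validate(s)]
    if prefilter:
        kept = [s for s in kept if hard_filter(s)]
    return kept


# Toy checks: a sample is "valid" if non-empty, and passes the hard
# filter if it has at most five characters.
samples = ["CCO", "", "CCCCCCCCCC"]
is_valid = lambda s: len(s) > 0
is_small = lambda s: len(s) <= 5

strict = filter_samples(samples, is_valid, is_small, prefilter=True)
lenient = filter_samples(samples, is_valid, is_small, prefilter=False)
```

With prefilter=True only "CCO" survives; with prefilter=False the long string is kept because it is valid even though it violates the toy hard filter.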
Scoring
If the Template has any soft filters, those filters are used to score compounds in each batch. The aggregate soft filter score is multiplied by TemplateCallback.weight and added to the total reward.
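As a rough sketch of how the weighted score enters the reward, assuming the aggregate soft filter score is already available as one number per sample (add_template_reward and its arguments are hypothetical names, not part of the library):

```python
from typing import List


def add_template_reward(
    total_rewards: List[float],
    soft_scores: List[float],
    weight: float,
) -> List[float]:
    """Add weight * soft_score to each sample's running reward total."""
    if len(total_rewards) != len(soft_scores):
        raise ValueError("one soft score is required per sample")
    return [r + weight * s for r, s in zip(total_rewards, soft_scores)]


# Two samples with rewards from other sources, plus template soft scores.
updated = add_template_reward([1.0, 0.5], [0.5, 1.0], weight=2.0)
```

A larger weight makes the soft filters dominate the other reward terms, while a weight of 0 disables template scoring and leaves filtering in place.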
Contrastive Template
The ContrastiveTemplate class applies a template to tasks that compare input and output compounds (i.e., making relative improvements to a compound's properties).
During filtering, the contrastive template will keep samples where both input and output compounds pass the filters.
Contrastive templates also use a similarity function to impose a similarity constraint on sample pairs (i.e., the output compound must have a specified similarity to the input compound).
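The pair filtering described above might look like the following sketch. Everything here is an illustrative assumption: contrastive_filter is a hypothetical helper, the similarity constraint is modelled as a min/max range, and a character-set Jaccard score stands in for a real molecular similarity function.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]


def jaccard(a: str, b: str) -> float:
    """Toy similarity: Jaccard overlap of the character sets of two strings."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def contrastive_filter(
    pairs: List[Pair],
    passes_filters: Callable[[str], bool],
    similarity: Callable[[str, str], float],
    min_sim: float,
    max_sim: float,
) -> List[Pair]:
    """Keep pairs where both compounds pass and similarity lies in [min_sim, max_sim]."""
    kept = []
    for source, target in pairs:
        # Both the input and the output compound must pass the filters.
        if not (passes_filters(source) and passes_filters(target)):
            continue
        if min_sim <= similarity(source, target) <= max_sim:
            kept.append((source, target))
    return kept


pairs = [("CCO", "CCN"), ("CCO", "CCO"), ("CCO", "")]
kept_pairs = contrastive_filter(
    pairs, lambda s: len(s) > 0, jaccard, min_sim=0.3, max_sim=0.9
)
```

In this toy run the identical pair is rejected for being too similar and the pair with an empty output is rejected by the filter, so only ("CCO", "CCN") remains.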
One consideration when using contrastive scores is how to scale them properly. If a score has a maximum value of 1, a sample pair where the score goes from 0.8 to 1 should get the same reward as a pair where the score goes from 0.6 to 1. In both cases, the model raised the output score as far as possible. To achieve this, we can scale the reward difference by the maximum possible reward difference (i.e., reward = (output_reward - input_reward)/(max_reward - input_reward)). Passing a value to max_score causes the contrastive template to perform this scaling.
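The scaling formula can be checked with a short sketch. scaled_contrastive_reward is a hypothetical helper rather than the library's code, and returning 0 when the input is already at or above the maximum is an assumption made here to avoid dividing by zero.

```python
from typing import Optional


def scaled_contrastive_reward(
    input_score: float,
    output_score: float,
    max_score: Optional[float] = None,
) -> float:
    """Reward for an (input, output) pair, optionally scaled by the available headroom."""
    diff = output_score - input_score
    if max_score is None:
        # No scaling requested: reward is the raw score difference.
        return diff
    headroom = max_score - input_score
    if headroom <= 0:
        # Assumption: no improvement is possible, so give no reward.
        return 0.0
    return diff / headroom


from_08 = scaled_contrastive_reward(0.8, 1.0, max_score=1.0)
from_06 = scaled_contrastive_reward(0.6, 1.0, max_score=1.0)
raw_08 = scaled_contrastive_reward(0.8, 1.0)
raw_06 = scaled_contrastive_reward(0.6, 1.0)
```

With scaling, both pairs receive the same reward because each used all of its available headroom. Without scaling, the pair starting at 0.6 earns roughly twice the reward of the pair starting at 0.8.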