Performance Notes
The workflow in this notebook is more CPU-constrained than GPU-constrained because generated samples must be evaluated on the CPU. If you have a multi-core machine, it is recommended that you uncomment and run the set_global_pool
cells in the notebook. This enables multiprocessing, which typically gives a 2-4x speedup.
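For example, a pool could be created with something like the following (a minimal sketch; it assumes set_global_pool, exposed via mrl.core, accepts a worker count, and the worker count shown is an arbitrary choice):

import os
from mrl.core import *  # assumed to provide set_global_pool

# create a global multiprocessing pool for CPU-bound steps such as
# fingerprinting and scoring generated samples
set_global_pool(min(10, os.cpu_count()))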
This notebook may run slowly on Colab due to CPU limitations.
If running on Colab, remember to change the runtime type to GPU.
import sys
sys.path.append('..')
from mrl.imports import *
from mrl.core import *
from mrl.chem import *
from mrl.templates.all import *
from mrl.torch_imports import *
from mrl.torch_core import *
from mrl.layers import *
from mrl.dataloaders import *
from mrl.g_models.all import *
from mrl.vocab import *
from mrl.policy_gradient import *
from mrl.train.all import *
from mrl.model_zoo import *
Agent
Here we create the model we want to optimize. We will use LSTM_LM_Small_ZINC, an LSTM-based language model pretrained on a subset of the ZINC database.
agent = LSTM_LM_Small_ZINC(drop_scale=0.5,opt_kwargs={'lr':5e-5})
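As a quick sanity check, you can draw a few samples from the pretrained model before any optimization. The sketch below assumes the model exposes a sample_no_grad method returning token tensors and log-probabilities, and that the agent can reconstruct SMILES strings from those tensors; adapt it to the actual API if these names differ:

# sample 256 sequences of up to 90 tokens with gradients disabled
preds, _ = agent.model.sample_no_grad(256, 90)

# decode the sampled token indices back into SMILES strings
smiles = agent.reconstruct(preds)
smiles[:5]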
Optional: Fine Tuning
We can optionally fine-tune the pretrained model on a dataset of interest. The cell below shows how to add a dataset to a pretrained agent and fine-tune it.
df = pd.read_csv('../files/smiles.csv')

# if in Colab
# download_files()
# df = pd.read_csv('files/smiles.csv')

# load the SMILES into the agent's dataset and run a short supervised pass
agent.update_dataset_from_inputs(df.smiles.values)
agent.train_supervised(64, 1, 1e-5)
# require generated samples to be valid, single-compound structures
template = Template([ValidityFilter(),
                     SingleCompoundFilter()],
                    [])

# use the template as a prefilter so failing samples are screened out
# before rewards are computed
template_cb = TemplateCallback(template, prefilter=True)
class FP_Regression_Score():
    '''Score SMILES samples with a saved scikit-learn regression model
    trained on ECFP6 fingerprints'''
    def __init__(self, fname):
        self.model = torch.load(fname)
        self.fp_function = partial(failsafe_fp, fp_function=ECFP6)

    def __call__(self, samples):
        # convert SMILES strings to RDKit mols
        mols = to_mols(samples)
        # compute ECFP6 fingerprints (in parallel if a global pool is set)
        fps = maybe_parallel(self.fp_function, mols)
        fps = [fp_to_array(i) for i in fps]
        x_vals = np.stack(fps)
        # predict one score per sample
        preds = self.model.predict(x_vals)
        return preds
# if in the repo
reward_function = FP_Regression_Score('../files/erbB1_regression.sklearn')

# if in Colab
# download_files()
# reward_function = FP_Regression_Score('files/erbB1_regression.sklearn')
reward = Reward(reward_function, weight=1.)
aff_reward = RewardCallback(reward, 'aff')  # logged under the name 'aff'
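Before training, it can be worth scoring a couple of known SMILES strings to confirm the reward function runs end to end. This is just a sketch; the molecules below are arbitrary examples and the output scale depends on the saved regression model:

# score two example molecules; returns one predicted value per sample
example_smiles = ['CCO', 'c1ccc2c(c1)ncnc2N']
reward_function(example_smiles)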
Loss Function
We will use the PolicyGradient class, the simplest policy gradient algorithm.
pg = PolicyGradient(discount=True, gamma=0.97)
loss = PolicyLoss(pg, 'PG')

# Alternative (commented out): a PPO loss with a value head.
# The first argument passed to PPO below (0.99) is an assumed discount value.
# pg = PPO(0.99,
#          0.5,
#          lam=0.95,
#          v_coef=0.5,
#          cliprange=0.3,
#          v_cliprange=0.3,
#          ent_coef=0.01,
#          kl_target=0.03,
#          kl_horizon=3000,
#          scale_rewards=True)

# loss = PolicyLoss(pg, 'PPO',
#                   value_head=ValueHead(256),
#                   v_update_iter=2,
#                   vopt_kwargs={'lr':1e-3})
gen_bs = 1500

# sample compounds from the current model (source name 'live'), gen_bs at a time
sampler1 = ModelSampler(agent.vocab, agent.model, 'live', 400, 0., gen_bs)
samplers = [sampler1]

# optional extras (commented out): sampler3 resamples high-reward compounds
# from the log; sampler2 (a second ModelSampler) is not defined in this notebook
# sampler3 = LogSampler('samples', 'rewards', 10, 97, 400)
# samplers += [sampler2, sampler3]
# log the maximum and 90th percentile reward of 'live' samples each batch
live_max = MaxCallback('rewards', 'live')
live_p90 = PercentileCallback('rewards', 'live', 90)
cbs = [live_p90, live_max]
env = Environment(agent, template_cb, samplers=samplers, rewards=[aff_reward], losses=[loss],
                  cbs=cbs)

# train: batch size 128, max sequence length 90, 500 batches, report every 25
env.fit(128, 90, 500, 25)
env.log.plot_metrics()
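After training, you can inspect the best compounds found during the run. The sketch below assumes the environment log keeps its sample history in a pandas DataFrame at env.log.df with 'samples' and 'rewards' columns:

# show the highest-scoring compounds sampled during training
env.log.df.sort_values('rewards', ascending=False).head(10)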