from chem_templates.filter import RangeFunctionFilter, SmartsFilter, CatalogFilter, \
    BinaryFunctionFilter, DataFunctionFilter, Template
from chem_templates.chem import Molecule, Catalog
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors, Descriptors, QED  # QED is imported so that Chem.QED.qed resolves
from rdkit.Chem.FilterCatalog import FilterCatalogParams
Template Tutorial
A fundamental step in computational drug design is defining what molecules we want. If we don’t have a sense of what molecules are in-spec for a specific project, we risk wasting significant effort screening irrelevant or flawed compounds.
The chem_templates library lets us define expressive and detailed chemical spaces by building a Template from various Filter screens.
Filters
The Filter class lets us define pass/fail requirements for a molecule. A filter can be made from any function or evaluation that takes in a Molecule object and returns a True/False result.
The most common type of filter is RangeFunctionFilter. It uses a function that maps a Molecule to a numeric value and checks whether that value falls within a given range. The example below filters molecules based on the number of rings present:
def num_rings(molecule):
    return rdMolDescriptors.CalcNumRings(molecule.mol)

filter_name = 'rings' # filter name
min_val = 1 # minimum number of rings (inclusive)
max_val = 2 # maximum number of rings (inclusive)
ring_filter = RangeFunctionFilter(num_rings, filter_name, min_val, max_val)

no_rings = Molecule('CCCC')
one_ring = Molecule('c1ccccc1')
two_rings = Molecule('c1ccc(Cc2ccccc2)cc1')
three_rings = Molecule('c1ccc(Cc2ccccc2Cc2ccccc2)cc1')

results = [
    ring_filter(no_rings),
    ring_filter(one_ring),
    ring_filter(two_rings),
    ring_filter(three_rings)
]

[i.filter_result for i in results]
[False, True, True, False]
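The range check itself can be illustrated without the library. The sketch below is not the chem_templates implementation: in_range is a hypothetical helper, and treating None as an open bound is an assumption based on how None is passed as a bound elsewhere in this tutorial. The ring counts are those of the four example molecules above.

```python
def in_range(value, min_val=None, max_val=None):
    """Return True if value lies in [min_val, max_val] (inclusive).

    A bound of None is treated as unbounded (an assumption, not confirmed
    library behavior).
    """
    if min_val is not None and value < min_val:
        return False
    if max_val is not None and value > max_val:
        return False
    return True


# Ring counts for no_rings, one_ring, two_rings, three_rings
ring_counts = [0, 1, 2, 3]
range_results = [in_range(n, 1, 2) for n in ring_counts]
print(range_results)  # [False, True, True, False]
```

With both bounds inclusive, the one-ring and two-ring molecules pass, which matches the filter results shown above.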
Results are returned as a FilterResult, which holds the overall boolean result (True for pass, False for fail), the name of the filter, and any data added by the filter.
The RangeFunctionFilter automatically adds data on the value computed by the function, as well as the min/max values for the range:
res = results[0]
print(res.filter_result, res.filter_name, res.filter_data)
False rings {'computed_value': 0, 'min_val': 1, 'max_val': 2}
We can also filter on SMARTS substructure matches using SmartsFilter:
smarts_string = '[#6]1:[#6]:[#6]:[#7]:[#6]:[#6]:1' # filter for pyridine ring
name = 'pyridine'
exclude = True # exclude matches
min_val = 1 # min number of matches to trigger filter
max_val = None # max number of matches to trigger filter (None resolves to any value above min_val)
smarts_filter = SmartsFilter(smarts_string, name, exclude, min_val, max_val)

benzene = Molecule('c1ccccc1')
pyridine = Molecule('c1cnccc1')
two_nitrogen = Molecule('c1cnncc1')

results = [
    smarts_filter(benzene),
    smarts_filter(pyridine),
    smarts_filter(two_nitrogen)
]

[i.filter_result for i in results]
[True, False, True]
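How exclude, min_val and max_val interact can be easier to see in isolation. The following is a library-independent sketch, not the SmartsFilter source: smarts_decision is a hypothetical helper, the pass/fail rule is inferred from the results above and from the exclude=True / exclude=False usage described in this tutorial, and the match counts are assumed values consistent with those results.

```python
def smarts_decision(n_matches, exclude, min_val=1, max_val=None):
    """Hypothetical helper: turn a substructure match count into pass/fail."""
    above_min = min_val is None or n_matches >= min_val
    below_max = max_val is None or n_matches <= max_val
    triggered = above_min and below_max
    # exclude=True: a triggered filter fails the molecule.
    # exclude=False: the filter must be triggered for the molecule to pass.
    return (not triggered) if exclude else triggered


# Assumed pyridine-ring match counts for benzene, pyridine, two_nitrogen
match_counts = [0, 1, 0]
exclude_results = [smarts_decision(n, exclude=True) for n in match_counts]
require_results = [smarts_decision(n, exclude=False) for n in match_counts]
print(exclude_results)  # [True, False, True]
print(require_results)  # [False, True, False]
```

Under this reading, flipping exclude inverts the outcome for every molecule, which is why the same SMARTS pattern can serve either to ban a substructure or to require it.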
The CatalogFilter class lets us filter on RDKit catalogs. The example below filters on the PAINS catalog:
catalog = Catalog.from_params(FilterCatalogParams.FilterCatalogs.PAINS)
pains_filter = CatalogFilter(catalog, 'pains')

pains_passing = Molecule('c1ccccc1Nc1ccccc1')
pains_failing = Molecule('c1ccccc1N=Nc1ccccc1')

results = [
    pains_filter(pains_passing),
    pains_filter(pains_failing)
]

[i.filter_result for i in results]
[True, False]
Custom Filters
To allow for flexibility, the BinaryFunctionFilter and DataFunctionFilter classes let us make filters from arbitrary functions.
The BinaryFunctionFilter class works with any function that maps a Molecule to a boolean value:
def my_func(molecule):
    if rdMolDescriptors.CalcExactMolWt(molecule.mol) > 150 and Chem.QED.qed(molecule.mol) > 0.6:
        return True
    else:
        return False

my_filter = BinaryFunctionFilter(my_func, 'molwt_plus_qed')

print(my_filter(Molecule('c1ccc(Cc2ccccc2)cc1')).filter_result)
True
If we want more information about what the filter function has computed, we can use the DataFunctionFilter class. This works in the same way, but expects the filter function to also return a dictionary of values:
def my_func(molecule):
    molwt = rdMolDescriptors.CalcExactMolWt(molecule.mol)
    qed = Chem.QED.qed(molecule.mol)

    data_dict = {'molwt' : molwt, 'qed' : qed}

    if molwt > 150 and qed > 0.6:
        return True, data_dict
    else:
        return False, data_dict

my_filter = DataFunctionFilter(my_func, 'molwt_plus_qed')

result = my_filter(Molecule('c1ccc(Cc2ccccc2)cc1'))

print(result.filter_result, result.filter_data)
True {'molwt': 168.093900384, 'qed': 0.6452001853099995}
Templates
The Template class holds multiple filters and executes them together. Below is an example of implementing the Rule of Five with a template:
def hbd(molecule):
    return rdMolDescriptors.CalcNumHBD(molecule.mol)

def hba(molecule):
    return rdMolDescriptors.CalcNumHBA(molecule.mol)

def molwt(molecule):
    return rdMolDescriptors.CalcExactMolWt(molecule.mol)

def logp(molecule):
    return Descriptors.MolLogP(molecule.mol)

hbd_filter = RangeFunctionFilter(hbd, 'hydrogen_bond_donor', None, 5)
hba_filter = RangeFunctionFilter(hba, 'hydrogen_bond_acceptor', None, 10)
molwt_filter = RangeFunctionFilter(molwt, 'mol_weight', None, 500)
logp_filter = RangeFunctionFilter(logp, 'logp', None, 5)

filters = [
    hbd_filter,
    hba_filter,
    molwt_filter,
    logp_filter
]

ro5_template = Template(filters)

molecule = Molecule('CC1=CN=C(C(=C1OC)C)CS(=O)C2=NC3=C(N2)C=C(C=C3)OC')
result = ro5_template(molecule)
Template results are returned as a TemplateResult, which holds the overall True/False result, as well as results and data from individual filters:
print(result.result)
print(result.filter_results)
print(result.filter_data)
True
[True, True, True, True]
[hydrogen_bond_donor result: True, hydrogen_bond_acceptor result: True, mol_weight result: True, logp result: True]
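The way a template combines its filters can be sketched in a few lines of plain Python. This is an illustration only, not the Template class: run_template is a hypothetical function, the rule that every filter must pass is an assumption, and the stand-in molecule is a dictionary of made-up property values rather than a real compound.

```python
def run_template(filters, molecule):
    """Hypothetical sketch: apply (name, check) pairs and combine the results."""
    results = [(name, bool(check(molecule))) for name, check in filters]
    # Assumed aggregation rule: the molecule passes only if every filter passes.
    overall = all(passed for _, passed in results)
    return overall, results


# Stand-in molecule with illustrative, made-up property values
toy_molecule = {'hbd': 1, 'hba': 6, 'molwt': 345.0, 'logp': 2.5}

# Rule of Five style upper bounds, mirroring the filters defined above
toy_filters = [
    ('hydrogen_bond_donor', lambda m: m['hbd'] <= 5),
    ('hydrogen_bond_acceptor', lambda m: m['hba'] <= 10),
    ('mol_weight', lambda m: m['molwt'] <= 500),
    ('logp', lambda m: m['logp'] <= 5),
]

overall, per_filter = run_template(toy_filters, toy_molecule)
print(overall)
print(per_filter)
```

Keeping the per-filter results next to the overall result makes it possible to see which requirement rejected a molecule, which is the same information TemplateResult exposes.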
Suggested Template Usage
The following are suggestions for getting the most out of chemical templates:
Leverage Cheap Filters
A major advantage of using filters/templates at scale is that the cost per filter per molecule is generally low. Computing the molecular weight or number of rings in a compound is significantly cheaper than virtual screening methods such as predictive models or docking. Filters can be used to cheaply eliminate “out of spec” molecules before passing “in spec” molecules to more sophisticated screening methods.
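One way to picture this is a two-stage screen in which only molecules that survive the cheap checks reach the expensive step. The sketch below is library-independent and hypothetical throughout: screen, slow_score and the dictionary stand-ins for molecules are illustrative names, and the score is a placeholder rather than a real model or docking call.

```python
def screen(molecules, cheap_checks, expensive_score):
    """Run cheap pass/fail checks first; only survivors reach the expensive step."""
    scored = []
    for mol in molecules:
        # all() short-circuits, so remaining checks are skipped after a failure
        if all(check(mol) for check in cheap_checks):
            scored.append((mol, expensive_score(mol)))
    return scored


expensive_calls = []  # records which molecules reached the expensive step


def slow_score(mol):
    """Placeholder for a costly method such as a predictive model or docking."""
    expensive_calls.append(mol['id'])
    return 0.0  # placeholder score


# Stand-in molecules with precomputed ring counts (from the ring example above)
candidates = [
    {'id': 'no_rings', 'n_rings': 0},
    {'id': 'one_ring', 'n_rings': 1},
    {'id': 'two_rings', 'n_rings': 2},
    {'id': 'three_rings', 'n_rings': 3},
]
cheap_checks = [lambda m: 1 <= m['n_rings'] <= 2]

survivors = screen(candidates, cheap_checks, slow_score)
print(expensive_calls)  # ['one_ring', 'two_rings']
```

Here the expensive function runs on two of the four candidates, so half of the costly evaluations are avoided.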
Maintain Desired Chemotypes
If a drug project has a desired chemotype or chemotypes, we want to eliminate molecules that don’t match the desired chemotype(s). We can define the chemotype using SMARTS strings, and use SmartsFilter with exclude=False to eliminate molecules that don’t match the chemotype SMARTS.
Control IP Space
If you wish to avoid pre-existing IP, you can specify infringing chemotypes with SMARTS strings and use SmartsFilter with exclude=True to eliminate possibly infringing molecules.
Synthetic Accessibility
Synthetic accessibility is a major factor in designing novel compounds. Given that synthesis bandwidth is typically a bottleneck in discovery pipelines, we want to avoid difficult-to-synthesize compounds that drain lab resources from other compounds. This challenge is often approached in the literature using SA score.
Unfortunately, SA score is often a poor fit for real discovery projects. SA score basically computes properties related to size, stereocenters, spiro-carbons, bridge-head carbons, and macrocycles, and renders those values into an aggregate score. While the SA score evaluation is generally reasonable from a “synthesize from scratch” perspective, it doesn’t capture the reality in the lab. For example, plenty of compounds with terrible SA scores can be easily created by taking advantage of building blocks that contain difficult structures.
SA score fails to capture the question of “how hard is it for my specific lab team to make this compound”. A better approach is to work with the lab team to define what compounds/substructures are hard to synthesize and develop a set of custom “SA score” filters based on this information.