Template Tutorial

How to use templates to define chemical space

from chem_templates.filter import RangeFunctionFilter, SmartsFilter, CatalogFilter, \
BinaryFunctionFilter, DataFunctionFilter, Template
from chem_templates.chem import Molecule, Catalog

from rdkit import Chem
from rdkit.Chem import rdMolDescriptors, Descriptors, QED
from rdkit.Chem.FilterCatalog import FilterCatalogParams

A fundamental step in computational drug design is defining what molecules we want. If we don’t have a sense of what molecules are in-spec for a specific project, we risk wasting significant effort screening irrelevant or flawed compounds.

The chem_templates library lets us describe expressive, detailed chemical spaces by building a Template from a set of Filter screens.

Filters

The Filter class lets us define pass/fail requirements for a molecule. A filter can be made from any function or evaluation that takes in a Molecule object and returns a True/False result.

The most common type of filter is RangeFunctionFilter. It takes a function that maps a Molecule to a numeric value and checks whether that value falls within a given range. The example below filters molecules based on the number of rings present:

def num_rings(molecule):
    return rdMolDescriptors.CalcNumRings(molecule.mol)
    
filter_name = 'rings' # filter name 
min_val = 1 # minimum number of rings (inclusive)
max_val = 2 # maximum number of rings (inclusive)
ring_filter = RangeFunctionFilter(num_rings, filter_name, min_val, max_val)

no_rings = Molecule('CCCC')
one_ring = Molecule('c1ccccc1')
two_rings = Molecule('c1ccc(Cc2ccccc2)cc1')
three_rings = Molecule('c1ccc(Cc2ccccc2Cc2ccccc2)cc1')

results = [
    ring_filter(no_rings),
    ring_filter(one_ring),
    ring_filter(two_rings),
    ring_filter(three_rings)
]

[i.filter_result for i in results]

[False, True, True, False]

Results are returned as a FilterResult, which holds the aggregate boolean result (True for pass, False for fail), the name of the filter, and any data added by the filter.

The RangeFunctionFilter automatically records the value computed by the function, as well as the min/max values of the range:

res = results[0]
print(res.filter_result, res.filter_name, res.filter_data)

False rings {'computed_value': 0, 'min_val': 1, 'max_val': 2}
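
The range logic can be sketched in plain Python. This is a hypothetical stand-in, not the chem_templates implementation: the helper names in_range and range_check are made up for illustration, and treating None as an open-ended bound is an assumption based on how None is passed as a bound elsewhere in this tutorial.

```python
# Hypothetical stand-in, not chem_templates code: a plain-Python range check
# with inclusive bounds, where None leaves that side of the range open.
def in_range(value, min_val=None, max_val=None):
    if min_val is not None and value < min_val:
        return False
    if max_val is not None and value > max_val:
        return False
    return True


def range_check(value, min_val=None, max_val=None):
    """Return (passed, data), mirroring the result and data fields shown above."""
    data = {'computed_value': value, 'min_val': min_val, 'max_val': max_val}
    return in_range(value, min_val, max_val), data


# Ring counts for the four example molecules above
ring_counts = [0, 1, 2, 3]
print([range_check(n, 1, 2)[0] for n in ring_counts])
print(range_check(0, 1, 2))
```

Both bounds are inclusive here, matching the comments on min_val and max_val in the ring filter example.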

We can also filter on SMARTS substructure matches using SmartsFilter:

smarts_string = '[#6]1:[#6]:[#6]:[#7]:[#6]:[#6]:1' # filter for pyridine ring
name = 'pyridine'
exclude = True # exclude matches
min_val = 1 # min number of matches to trigger filter
max_val = None # max number of matches to trigger filter (None resolves to any value above min_val)

smarts_filter = SmartsFilter(smarts_string, name, exclude, min_val, max_val)

benzene = Molecule('c1ccccc1')
pyridine = Molecule('c1cnccc1')
two_nitrogen = Molecule('c1cnncc1')

results = [
    smarts_filter(benzene),
    smarts_filter(pyridine),
    smarts_filter(two_nitrogen)
]

[i.filter_result for i in results]

[True, False, True]
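
One way to read the exclude, min_val and max_val arguments is as a decision on the number of substructure matches. The sketch below is a hypothetical helper (smarts_decision is not part of chem_templates), and the match counts are assumed values for the three molecules above rather than output computed by RDKit.

```python
# Hypothetical helper, not part of chem_templates: decide pass/fail from a
# substructure match count, following the exclude/min_val/max_val comments above.
def smarts_decision(num_matches, exclude, min_val=1, max_val=None):
    # The filter "triggers" when the count falls inside [min_val, max_val];
    # max_val=None is treated as having no upper limit.
    triggered = num_matches >= min_val and (max_val is None or num_matches <= max_val)
    # exclude=True rejects triggering molecules; exclude=False requires a trigger.
    return not triggered if exclude else triggered


# Assumed match counts for the pyridine pattern: benzene, pyridine, two_nitrogen
match_counts = [0, 1, 0]

excluded = [smarts_decision(n, exclude=True) for n in match_counts]
required = [smarts_decision(n, exclude=False) for n in match_counts]
print(excluded)
print(required)
```

Under these assumptions, exclude=True reproduces the results shown above, while exclude=False inverts them and keeps only the molecule containing the pattern.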

The CatalogFilter class lets us filter against RDKit filter catalogs. The example below filters on the PAINS catalog:

catalog = Catalog.from_params(FilterCatalogParams.FilterCatalogs.PAINS)
pains_filter = CatalogFilter(catalog, 'pains')

pains_passing = Molecule('c1ccccc1Nc1ccccc1')
pains_failing = Molecule('c1ccccc1N=Nc1ccccc1')

results = [
    pains_filter(pains_passing),
    pains_filter(pains_failing)
]

[i.filter_result for i in results]

[True, False]

Custom Filters

For flexibility, BinaryFunctionFilter and DataFunctionFilter let us build filters from arbitrary functions.

The BinaryFunctionFilter class works with any function that maps a Molecule to a boolean value:

def my_func(molecule):
    # pass only if the molecule is both heavy enough and drug-like enough
    molwt = rdMolDescriptors.CalcExactMolWt(molecule.mol)
    qed = Chem.QED.qed(molecule.mol)
    return molwt > 150 and qed > 0.6
    
my_filter = BinaryFunctionFilter(my_func, 'molwt_plus_qed')

print(my_filter(Molecule('c1ccc(Cc2ccccc2)cc1')).filter_result)

True

If we want more information about what the filter function has computed, we can use the DataFunctionFilter class. It works the same way, but expects the filter function to return a dictionary of values along with the boolean result:

def my_func(molecule):
    molwt = rdMolDescriptors.CalcExactMolWt(molecule.mol)
    qed = Chem.QED.qed(molecule.mol)

    # values to attach to the filter result
    data_dict = {'molwt' : molwt, 'qed' : qed}

    # return the pass/fail result together with the computed values
    return molwt > 150 and qed > 0.6, data_dict
    
my_filter = DataFunctionFilter(my_func, 'molwt_plus_qed')

result = my_filter(Molecule('c1ccc(Cc2ccccc2)cc1'))

print(result.filter_result, result.filter_data)

True {'molwt': 168.093900384, 'qed': 0.6452001853099995}

Templates

The Template class holds multiple filters and executes them together. Below is an example of implementing the Rule of Five with a template:

def hbd(molecule):
    return rdMolDescriptors.CalcNumHBD(molecule.mol)

def hba(molecule):
    return rdMolDescriptors.CalcNumHBA(molecule.mol)

def molwt(molecule):
    return rdMolDescriptors.CalcExactMolWt(molecule.mol)

def logp(molecule):
    return Descriptors.MolLogP(molecule.mol)

hbd_filter = RangeFunctionFilter(hbd, 'hydrogen_bond_donor', None, 5)
hba_filter = RangeFunctionFilter(hba, 'hydrogen_bond_acceptor', None, 10)
molwt_filter = RangeFunctionFilter(molwt, 'mol_weight', None, 500)
logp_filter = RangeFunctionFilter(logp, 'logp', None, 5)

filters = [
    hbd_filter,
    hba_filter,
    molwt_filter,
    logp_filter
]

ro5_template = Template(filters)

molecule = Molecule('CC1=CN=C(C(=C1OC)C)CS(=O)C2=NC3=C(N2)C=C(C=C3)OC')
result = ro5_template(molecule)

Template results are returned as a TemplateResult, which holds the overall True/False result, as well as the results and data from the individual filters:

print(result.result)
print(result.filter_results)
print(result.filter_data)

True
[True, True, True, True]
[hydrogen_bond_donor result: True, hydrogen_bond_acceptor result: True, mol_weight result: True, logp result: True]
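
The aggregation step can be illustrated without the library. The sketch below is a hypothetical stand-in, not the Template implementation: run_template is a made-up helper, the property values are invented for illustration, and it assumes a template passes only when every filter passes.

```python
# Hypothetical stand-in for the aggregation step, not the Template implementation.
# Each "filter" is a (name, check) pair applied to a dict of precomputed properties.
def run_template(filters, properties):
    per_filter = [(name, check(properties)) for name, check in filters]
    # Assumption: the overall result is True only if every filter passes
    overall = all(passed for _, passed in per_filter)
    return overall, per_filter


ro5_checks = [
    ('hydrogen_bond_donor', lambda p: p['hbd'] <= 5),
    ('hydrogen_bond_acceptor', lambda p: p['hba'] <= 10),
    ('mol_weight', lambda p: p['molwt'] <= 500),
    ('logp', lambda p: p['logp'] <= 5),
]

# Made-up property values, for illustration only
in_spec = {'hbd': 1, 'hba': 6, 'molwt': 345.4, 'logp': 2.9}
too_heavy = {'hbd': 1, 'hba': 6, 'molwt': 612.7, 'logp': 2.9}

print(run_template(ro5_checks, in_spec)[0])

overall, per_filter = run_template(ro5_checks, too_heavy)
failed = [name for name, passed in per_filter if not passed]
print(overall, failed)
```

Keeping the per-filter results next to the overall result makes it easy to see which requirement rejected a molecule.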

Suggested Template Usage

The following are suggestions for getting the most out of chemical templates:

Leverage Cheap Filters

A major advantage of using filters/templates at scale is that the cost per filter per molecule is generally low. Computing the molecular weight or number of rings in a compound is significantly cheaper than virtual screening methods such as predictive models or docking. Filters can be used to cheaply eliminate “out of spec” molecules before passing “in spec” molecules to more sophisticated screening methods.
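
A minimal sketch of this two-stage pattern, in plain Python. Everything here is hypothetical: cheap_filters_pass and expensive_score are placeholder functions, the molecule records are invented, and the "expensive" step returns a dummy value where a real pipeline would call docking or a predictive model.

```python
# Hypothetical two-stage screen: cheap checks gate an expensive scoring step.
def cheap_filters_pass(props):
    # Inexpensive property checks, evaluated first
    return 1 <= props['rings'] <= 2 and props['molwt'] <= 500


expensive_calls = 0


def expensive_score(props):
    # Placeholder for a costly step such as docking or a predictive model
    global expensive_calls
    expensive_calls += 1
    return 1.0 / props['molwt']  # dummy value, for illustration only


# Made-up molecule records with precomputed cheap properties
library = [
    {'name': 'a', 'rings': 0, 'molwt': 58.1},
    {'name': 'b', 'rings': 2, 'molwt': 168.1},
    {'name': 'c', 'rings': 3, 'molwt': 258.1},
    {'name': 'd', 'rings': 1, 'molwt': 620.0},
]

# Only molecules that pass the cheap checks reach the expensive step
scored = {m['name']: expensive_score(m) for m in library if cheap_filters_pass(m)}
print(sorted(scored), expensive_calls)
```

In this toy library, three of the four molecules are rejected before the expensive function is ever called.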

Maintain Desired Chemotypes

If a drug project has a desired chemotype or chemotypes, we want to eliminate molecules that don’t match the desired chemotype(s). We can define the chemotype using SMARTS strings, and use SmartsFilter with exclude=False to eliminate molecules that don’t match the chemotype SMARTS.

Control IP Space

If you wish to avoid pre-existing IP, you can specify infringing chemotypes with SMARTS strings and use SmartsFilter with exclude=True to eliminate possibly infringing molecules.

Synthetic Accessibility

Synthetic accessibility is a major factor in designing novel compounds. Given that synthesis bandwidth is typically a bottleneck in discovery pipelines, we want to avoid difficult-to-synthesize compounds that drain lab resources from other compounds. This challenge is often approached in the literature using the SA score.

Unfortunately, the SA score is often a poor fit for real discovery projects. It computes properties related to size, stereocenters, spiro-carbons, bridgehead carbons, and macrocycles, and combines those values into an aggregate score. While the SA score evaluation is generally reasonable from a “synthesize from scratch” perspective, it doesn’t capture the reality in the lab. For example, plenty of compounds with terrible SA scores can be made easily by taking advantage of building blocks that already contain the difficult structures.

The SA score fails to capture the question of “how hard is it for my specific lab team to make this compound?” A better approach is to work with the lab team to define which compounds and substructures are hard to synthesize, and to develop a set of custom “SA score” filters based on this information.