-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New molecules returned by enumerator components do not handle molecule properties correctly #98
Comments
Thanks for having a go at this. I think your solution will work well as long as the enumerated molecules returned from the toolkit are in the same order, one concern would be what happens when the ordering is changed the torsion indexer could then be on an incorrect torsion, if the torsion was not connected the validators would pick this up when making the dataset but we might need to think more about how we would get around this. I think this might come up when enumerating tautomers but I should check this. |
I think we could solve this by using atom maps, these would tag the atoms in the dihedral we want to rotate and should be respected by the backend toolkits and allow use to extract the new index numbers of the dihedral atoms. Here is an example using rdkit and openeye, note that the rdkit from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize
mol = Molecule.from_smiles("CCC(C)=O")
# select the 4 carbon atoms
atom_map = {0: 1, 1: 2, 2: 3, 3: 4}
rdmol = mol.to_rdkit()
# add the atom map
for atom, mapnum in atom_map.items():
rdatom = rdmol.GetAtomWithIdx(atom)
rdatom.SetAtomMapNum(mapnum)
enumerator = rdMolStandardize.TautomerEnumerator()
enumerator.SetMaxTautomers(10)
rdmol = Chem.RemoveHs(rdmol)
tautomers = enumerator.Enumerate(rdmol)
# make a list of OpenFF molecules
molecules = []
for taut in tautomers:
taut_hs = Chem.AddHs(taut)
mol = Molecule.from_smiles(
Chem.MolToSmiles(taut_hs), allow_undefined_stereo=True
)
molecules.append(mol)
# get the index of the dihedral atoms
molecules[0].properties
openeye oemol = mol.to_openeye()
# add the atom map
for oeatom in oemol.GetAtoms():
oe_index = oeatom.GetIdx()
if oe_index in atom_map:
oeatom.SetMapIdx(atom_map[oe_index])
tautomer_options = oequacpac.OETautomerOptions()
tautomer_options.SetApplyWarts(False)
tautomer_options.SetMaxTautomersGenerated(10)
tautomer_options.SetSaveStereo(True)
# this aligns the outputs of rdkit and openeye for the example cases
tautomer_options.SetCarbonHybridization(False)
tautomers = []
for tautomer in oequacpac.OEEnumerateTautomers(oemol, tautomer_options):
tautomers.append(Molecule.from_openeye(tautomer, allow_undefined_stereo=True)
# look at the map ids
tautomers[0].properties
|
I tried to enumerate protomers and stereoisomers for torsion driving, which returns some new molecules from the toolkit. The issue is that the TK returns e.g.
mol.properties['dihedrals']
as a string in the new molecules, which is a serializedTorsionIndexer
object in the original molecule. I have a PR here that has a quick fix so I was able to get a dataset submitted in qca-dataset-submissions. A general method which repacks data stored in properties would be ideal, as I know there are other fields ("extras", "keywords"), and possibly more in the future. Feel free to use the PR as you want to devise a solution.The text was updated successfully, but these errors were encountered: