Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throw error for "explicit" hydrogens that aren't in molecule graph #1983

Merged
merged 23 commits into from
Jan 7, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions docs/releasehistory.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ Releases follow the `major.minor.micro` scheme recommended by [PEP440](https://w
### Bugfixes

- [PR #1971](https://github.com/openforcefield/openff-toolkit/pull/1971): Fixes bug where OpenEyeToolkitWrapper would write coordinate-less PDB atoms if insertion_code or chain_id was an empty string ([Issue #1967](https://github.com/openforcefield/openff-toolkit/issues/1967))
- [PR #1983](https://github.com/openforcefield/openff-toolkit/pull/1983): Fixes issue with implicit hydrogens slipping though checks to result in radicals later. [Issue #936](https://github.com/openforcefield/openff-toolkit/issues/936) and [Issue #1696](https://github.com/openforcefield/openff-toolkit/issues/1696)

### Performance improvements

Expand All @@ -30,6 +31,7 @@ Releases follow the `major.minor.micro` scheme recommended by [PEP440](https://w
## 0.16.7

### Bugfixes

- [PR #1971](https://github.com/openforcefield/openff-toolkit/pull/1971): Fixes bug where OpenEyeToolkitWrapper would write coordinate-less PDB atoms if insertion_code or chain_id was an empty string ([Issue #1967](https://github.com/openforcefield/openff-toolkit/issues/1967))

### Tests updated
Expand Down
21 changes: 21 additions & 0 deletions openff/toolkit/_tests/test_toolkits.py
Original file line number Diff line number Diff line change
Expand Up @@ -2250,6 +2250,27 @@ def test_rdkit_from_smiles_hydrogens_are_explicit(self):
)
assert offmol.n_atoms == 4

def test_rdkit_from_smiles_hydrogens_are_explicit_and_in_graph(self):
"""
Test to ensure that RDKitToolkitWrapper.from_smiles has the proper behavior with
respect to its hydrogens_are_explicit kwarg and that a CMILES entry has all hydrogens
in the graph.
jaclark5 marked this conversation as resolved.
Show resolved Hide resolved

`Issue #936 <https://github.com/openforcefield/openff-toolkit/issues/936>`,
`Issue #1696 <https://github.com/openforcefield/openff-toolkit/issues/1696>`,
"""
toolkit_wrapper = RDKitToolkitWrapper()
smiles_impl = "[H][C]([H])([H])[NH+]([H])[C]([H])([H])[H]"
with pytest.raises(
ValueError,
match="the following SMILES as having some nonexplicit hydrogens",
):
_ = Molecule.from_smiles(
smiles_impl,
toolkit_registry=toolkit_wrapper,
hydrogens_are_explicit=True,
)

@pytest.mark.parametrize("molecule", get_mini_drug_bank(RDKitToolkitWrapper))
def test_to_inchi(self, molecule):
"""Test, but do not validate, conversion to standard and non-standard InChI"""
Expand Down
18 changes: 18 additions & 0 deletions openff/toolkit/utils/rdkit_wrapper.py
Original file line number Diff line number Diff line change
Expand Up @@ -1588,6 +1588,24 @@ def from_smiles(
f"desired molecule as an RDMol with no implicit hydrogens, and then use "
f"Molecule.from_rdkit() to create the desired OFFMol."
)
# Issue #936 and #1696: GetNumExplicitHs() counts the number of hydrogens that are
# explicitly in the string but not present in the graph. This criteria would apply to
# hydrogens inside the brackets descibing a heavy atom, e.g. [NH+].
# https://www.rdkit.org/docs/Cookbook.html#explicit-valence-and-number-of-hydrogens
# This method is also used in GetTotalNumHs, which returns
# GetNumImplicitHs() + GetNumExplicitHs() + Optionally the number of
# neighboring hydrogens in the molecular graph.
# (see the cookbook link above for more details)
elif atom.GetNumExplicitHs() != 0:
raise ValueError(
"'hydrogens_are_explicit' was specified as True, but the RDKit wrapper interpreted "
f"the following SMILES as having some nonexplicit hydrogens (e.g., [NH+]): '{smiles}'\n"
"If this SMILES is intended to express all explicit hydrogens in the molecule, then either: \n"
"1) Replace instances of implicit Hs (ex. `[NH+]` with `[N+]([H])` to make "
"the hydrogen explicit, or\n"
"2) construct the desired molecule as an RDMol with no implicit hydrogens, "
"and then use Molecule.from_rdkit() to create the desired OFFMol."
)

molecule = self.from_rdkit(
rdmol,
Expand Down
Loading