Structure Conversion
An important functionality of HQS Molecules is the conversion of two-dimensional molecular structure representations into three-dimensional geometries, as shown here for the example of the glucose molecule.
Simple 2D to 3D Conversion
Input: SMILES strings and Molfiles are supported as input for the functions smiles_to_molecule and molfile_to_molecule, respectively.
Output: Both functions return a Molecule object containing three-dimensional atomic coordinates. In addition, the returned object contains the overall molecular charge, which is always included in a Molfile or a SMILES string. The multiplicity field is set to None, as it cannot be derived unambiguously from the input.
The structure conversion uses RDKit as the first choice and falls back to Open Babel if the conversion with RDKit fails. After a three-dimensional structure has been generated, a bonding graph is determined from it using distance criteria and used to verify the generated structure against the input. If the composition or the bonding graph does not match the input, the structure is rejected.
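The distance-based verification can be illustrated with a minimal sketch. It is not the implementation used by HQS Molecules: the covalent radii, the tolerance of 0.4 Å, and the helper name bonds_from_distances are assumptions chosen for illustration. Two atoms are treated as bonded when their distance is below the sum of their covalent radii plus the tolerance, and the resulting bond set is compared with the one expected from the input.

```python
from itertools import combinations
from math import dist

# Approximate covalent radii in angstrom (illustrative values, an assumption).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def bonds_from_distances(symbols, coords, tolerance=0.4):
    """Return the set of index pairs (i, j) whose distance suggests a bond.

    Hypothetical helper: a pair counts as bonded if its distance does not
    exceed the sum of the two covalent radii plus `tolerance`.
    """
    bonds = set()
    for i, j in combinations(range(len(symbols)), 2):
        cutoff = COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]] + tolerance
        if dist(coords[i], coords[j]) <= cutoff:
            bonds.add((i, j))
    return bonds


# Water: O at index 0, H at indices 1 and 2 (coordinates in angstrom).
symbols = ["O", "H", "H"]
coords = [(0.000, 0.000, 0.117), (0.000, 0.757, -0.469), (0.000, -0.757, -0.469)]

# Bonding graph implied by the two-dimensional input (SMILES "O").
expected_bonds = {(0, 1), (0, 2)}
detected_bonds = bonds_from_distances(symbols, coords)

# The structure is accepted only if both graphs agree.
structure_accepted = detected_bonds == expected_bonds
```

In this sketch the two O-H pairs fall within the cutoff while the H-H pair does not, so the detected graph agrees with the expected one.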
If the input structure is stored in a molfile named "my_molecule.mol", then the conversion is performed by calling:
>>> from hqs_molecules import molfile_to_molecule
>>> mol = molfile_to_molecule("my_molecule.mol")
Likewise, a molecular structure can be converted from a SMILES string, for example one obtained from PubChem:
>>> from hqs_molecules import PubChem, smiles_to_molecule
>>> pc = PubChem.from_name("propane")
>>> mol = smiles_to_molecule(pc.smiles)
An optional check can be carried out with either of the conversion functions by supplying a molecular formula as an argument. The conversion fails if the input does not match the provided formula. The formula must be supplied as a MolecularFormula object.
>>> from hqs_molecules import MolecularFormula, PubChem, smiles_to_molecule
>>> pc = PubChem.from_name("propane")
>>> # succeeds
>>> mol = smiles_to_molecule(pc.smiles, formula=pc.formula)
>>> # raises an exception
>>> mol = smiles_to_molecule(pc.smiles, formula=MolecularFormula.from_str("C3H7-"))
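To see why the second call is rejected, it helps to compare the two formulas element by element. The following sketch is independent of HQS Molecules: parse_formula is a hypothetical helper written for this illustration, and it does not reproduce the parsing rules of MolecularFormula.from_str. It only handles element symbols with optional counts, followed by an optional run of "+" or "-" signs.

```python
import re
from collections import Counter


def parse_formula(text):
    """Split a formula such as "C3H7-" into element counts and a net charge.

    Hypothetical helper for illustration only.
    """
    match = re.fullmatch(r"((?:[A-Z][a-z]?\d*)+)(\+*|-*)", text)
    if match is None:
        raise ValueError(f"cannot parse formula: {text!r}")
    body, signs = match.groups()
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", body):
        counts[symbol] += int(number) if number else 1
    charge = signs.count("+") - signs.count("-")
    return counts, charge


propane = parse_formula("C3H8")
propyl_anion = parse_formula("C3H7-")

# Propane has eight hydrogens and no charge; the anion has seven and charge -1.
formulas_differ = propane != propyl_anion
```

Both the composition and the charge differ here, so a check of propane against "C3H7-" cannot succeed.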
Utilities for RDKit
The HQS Molecules module includes convenience functions to create RDKit Mol objects from SMILES strings, Molfiles, or XYZ files. These objects represent molecular information within the RDKit package.
Both the smiles_to_rdkit and molfile_to_rdkit functions accept an argument addHs. By default, it is set to True, causing explicit hydrogen atoms to be added in the generated object. Setting addHs = False suppresses the addition of explicit hydrogens; only hydrogens that were already explicitly represented within a Molfile are retained.
>>> from hqs_molecules import smiles_to_rdkit
>>> # The object generated contains 11 atoms.
>>> rdkit_mol = smiles_to_rdkit("CCC")
>>> # The object generated contains 3 atoms.
>>> rdkit_mol = smiles_to_rdkit("CCC", addHs=False)
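The atom counts in the comments above follow from simple valence bookkeeping, which the sketch below reproduces without RDKit. It is a toy model under stated assumptions: the table of default valences and the helper implicit_hydrogens are hypothetical, and only neutral atoms with their standard valence are handled. RDKit applies more general rules.

```python
# Standard valences of a few neutral atoms (an assumption of this sketch).
DEFAULT_VALENCE = {"C": 4, "N": 3, "O": 2}


def implicit_hydrogens(symbols, bonds):
    """Return the number of hydrogens needed to saturate each heavy atom.

    `bonds` is a list of (i, j, order) tuples between heavy atoms.
    Hypothetical helper for illustration only.
    """
    used = [0] * len(symbols)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    return [DEFAULT_VALENCE[s] - u for s, u in zip(symbols, used)]


# Propane, SMILES "CCC": three carbons joined by two single bonds.
symbols = ["C", "C", "C"]
bonds = [(0, 1, 1), (1, 2, 1)]

hydrogens = implicit_hydrogens(symbols, bonds)  # per-carbon hydrogen counts
heavy_atoms = len(symbols)                      # atoms without explicit hydrogens
all_atoms = heavy_atoms + sum(hydrogens)        # atoms with explicit hydrogens
```

For propane, the terminal carbons each need three hydrogens and the central carbon two, giving 3 heavy atoms and 11 atoms in total.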
When creating an RDKit Mol object from an XYZ file, this option does not apply, because the hydrogen atoms always have to be represented explicitly. However, in addition to the XYZ file name or path, the xyzfile_to_rdkit function accepts another important argument, charge. It specifies the total net charge of the molecule, which is needed to find the correct atomic connectivity, and is set to 0 by default for convenience. Since an XYZ file does not contain any information about chemical bonds by itself, the connectivity is determined inside the xyzfile_to_rdkit function using RDKit.
For the following example, assume we have two valid XYZ files: benzene.xyz, with the three-dimensional structure of a benzene molecule, and nh4.xyz, with the structure of an ammonium cation.
>>> from hqs_molecules import xyzfile_to_rdkit
>>> # The object generated contains 12 atoms, 6 single bonds, and 6 aromatic bonds.
>>> rdkit_mol = xyzfile_to_rdkit("benzene.xyz")
>>> # This will raise a `ValueError`
>>> rdkit_mol = xyzfile_to_rdkit("nh4.xyz")
>>> # This works and the object generated contains 5 atoms and 4 single bonds.
>>> rdkit_mol = xyzfile_to_rdkit("nh4.xyz", charge=1)
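One way to see why the charge matters is to count valence electrons. The sketch below does not reproduce the algorithm RDKit uses to determine bonds; it only shows that an XYZ file carries element symbols and coordinates, and that the electron count of the ammonium example depends on the charge supplied by the caller. The coordinates, the valence electron table, and the helper names are assumptions made for this illustration.

```python
# Valence electrons of a few neutral atoms.
VALENCE_ELECTRONS = {"H": 1, "C": 4, "N": 5, "O": 6}

# Contents of a hypothetical nh4.xyz: atom count, comment line, then one
# line per atom with the element symbol and Cartesian coordinates.
NH4_XYZ = """5
ammonium cation (illustrative coordinates in angstrom)
N  0.000  0.000  0.000
H  0.590  0.590  0.590
H -0.590 -0.590  0.590
H -0.590  0.590 -0.590
H  0.590 -0.590 -0.590
"""


def read_xyz_symbols(text):
    """Return the element symbols listed in XYZ-formatted text."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    return [line.split()[0] for line in lines[2:2 + n_atoms]]


def valence_electron_count(symbols, charge=0):
    """Total number of valence electrons for the given net charge."""
    return sum(VALENCE_ELECTRONS[s] for s in symbols) - charge


symbols = read_xyz_symbols(NH4_XYZ)
neutral_count = valence_electron_count(symbols)           # charge left at 0
cation_count = valence_electron_count(symbols, charge=1)  # correct charge
```

With the default charge of 0, the five atoms carry nine valence electrons, which cannot be distributed over electron-pair bonds alone. With charge=1 there are eight, exactly the four electron pairs of the four N-H single bonds.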
Expert Usage
The functionalities described in the remainder of this section are only intended for expert usage.
An RDKit Mol object can be converted to a three-dimensional structure by passing it to the function rdkit_to_molecule. It is a low-level function that calls RDKit without resorting to Open Babel as a backup. Nonetheless, it performs a consistency check for the generated structure. If the RDKit Mol object was created without the addition of explicit hydrogens (addHs = False), this conversion may fail due to a composition mismatch.
The low-level functions to perform structure conversion using only Open Babel are available via smiles_to_molecule_obabel and molfile_to_molecule_obabel. These functions require a SMILES string or a Molfile as their input, respectively. A separate consistency check of the generated structure is also performed here.