Important Data Structures

This section briefly describes the data structures used for input and output by the functions in HQS Molecules. Many of these classes are implemented as Pydantic models. Pydantic validates and parses data based on Python type annotations, so incorrectly formatted input or output data is detected when an object is created instead of causing errors later on.

The objects described in this section are:

  • MolecularGeometry and Molecule, representing molecules in 3D,
  • MolecularFormula, representing molecular formulas,
  • PubChem, a dataclass for the data retrieved from PubChem,
  • Trajectory and MolecularFrequencies, representing output of quantum-mechanical calculations that cannot be stored in elementary data types,
  • ConformerEnsemble, storing the results of a conformer search, which combines an initial search performed by CREST with a subsequent refinement of the conformer ensemble using techniques developed at HQS.

Representing Molecules in 3D

Representing Atomic Positions

The MolecularGeometry class contains atomic positions (in Å) and chemical element symbols. Objects of this type are commonly generated in HQS Molecules by reading an XYZ file. However, the class does not store the charge or the spin multiplicity, which are typically needed for quantum-chemical calculations.

Important attributes of MolecularGeometry objects are

  • natoms (representing the number of atoms),
  • symbols (returning a list of chemical element symbols),
  • and positions (returning an N × 3 array of atomic positions).

Inspection of the class reveals further methods to update atomic positions and create copies of molecules, possibly with updated positions.

>>> from hqs_molecules import MolecularGeometry
>>> help(MolecularGeometry)

Internally, atoms are represented by a list of Atom objects. These are defined as named tuples containing the element symbol and the position, one tuple per atom. Note that this feature permits the atoms attribute to be used directly as input for PySCF calculations, as shown in the example below.

>>> from hqs_molecules import smiles_to_molecule
>>> from pyscf.gto import Mole
>>> hqs_mol = smiles_to_molecule("C=C")
>>> pyscf_mol = Mole(atom=hqs_mol.atoms)
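
The reason such a list can be passed on directly is that a named tuple is still an ordinary tuple, and PySCF accepts atoms as a list of (symbol, coordinates) pairs. The following minimal sketch illustrates this with a stand-in class. The name AtomSketch and the field names symbol and position are assumptions made for illustration; they need not match the actual Atom definition in HQS Molecules.

```python
from typing import NamedTuple, Tuple


class AtomSketch(NamedTuple):
    """Hypothetical stand-in for an atom record (not the hqs_molecules class)."""

    symbol: str                           # chemical element symbol
    position: Tuple[float, float, float]  # Cartesian coordinates in angstrom


# Approximate water geometry, used only as illustrative data.
atoms = [
    AtomSketch("O", (0.000, 0.000, 0.117)),
    AtomSketch("H", (0.000, 0.757, -0.469)),
    AtomSketch("H", (0.000, -0.757, -0.469)),
]

# Fields can be read by name ...
first_symbol = atoms[0].symbol

# ... or unpacked positionally, exactly like a plain (symbol, position) tuple.
symbol, position = atoms[0]

# Converting to plain tuples loses nothing, which is why the list has the
# same shape as a list of (symbol, coordinates) pairs.
plain_atoms = [tuple(atom) for atom in atoms]
```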

Molecules with Charge and Spin

Molecule is one of the most important classes in the HQS Molecules package. It is implemented as a subclass of MolecularGeometry, with the addition of charge and multiplicity fields. Objects of Molecule type are commonly returned by functions performing 2D to 3D structure conversion. An additional attribute is nelectrons, containing the number of electrons corresponding to the molecular composition and charge.
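
To illustrate how an electron count follows from the molecular composition and charge, here is a minimal sketch that does not use HQS Molecules. The function count_electrons and the small table of atomic numbers are hypothetical helpers introduced only for this example.

```python
# Atomic numbers for the few elements used below (illustrative subset).
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8}


def count_electrons(symbols, charge):
    """Sum of the nuclear charges minus the total molecular charge."""
    return sum(ATOMIC_NUMBERS[symbol] for symbol in symbols) - charge


# Ethanol, C2H6O, neutral: 2*6 + 6*1 + 8 = 26 electrons.
ethanol = ["C", "C", "O"] + ["H"] * 6
print(count_electrons(ethanol, charge=0))  # 26

# Hydroxide anion: 8 + 1 electrons from the atoms, plus one for the charge.
print(count_electrons(["O", "H"], charge=-1))  # 10
```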

Molecular formulas (such as H2O or OH-) and molecular structure representations (such as SMILES strings or Molfiles) always contain the total molecular charge, explicitly or implicitly. Therefore, it is vital to preserve the total charge together with three-dimensional representations of molecular structures.

In addition to the charge, quantum-chemical calculations usually also require a specification of the spin multiplicity. Unlike the charge, it cannot always be inferred from a molecular structure in a straightforward way. Therefore, None is permitted as a value for the multiplicity field. Indeed, functions such as smiles_to_molecule or molfile_to_molecule never set the field to an integer value themselves. Once the spin multiplicity is known, it can be set and validated for a Molecule object by using the set_multiplicity method.

>>> from hqs_molecules import smiles_to_molecule
>>> mol = smiles_to_molecule("CCO")
>>> print(mol.multiplicity)
None
>>> mol.set_multiplicity(1)
Molecule(atoms=[...], charge=0, multiplicity=1)
>>> print(mol.multiplicity)
1

Since the set_multiplicity method returns the object itself in addition to modifying it, calls such as mol = smiles_to_molecule("CCO").set_multiplicity(1) are possible.
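
The checks performed by set_multiplicity are not spelled out in this section. One consistency rule that any such validation would plausibly include is that the number of unpaired electrons (multiplicity minus one) must not exceed the electron count and must leave an even number of paired electrons. The sketch below implements only this rule; the function name is hypothetical and the code is not the library's implementation.

```python
def is_consistent_multiplicity(nelectrons: int, multiplicity: int) -> bool:
    """Check whether a spin multiplicity is compatible with an electron count.

    A multiplicity of 2S + 1 corresponds to 2S = multiplicity - 1 unpaired
    electrons. The remaining electrons must be paired, so their number
    has to be even and non-negative.
    """
    if multiplicity < 1:
        return False
    unpaired = multiplicity - 1
    if unpaired > nelectrons:
        return False
    return (nelectrons - unpaired) % 2 == 0


# Ethanol has 26 electrons: singlet and triplet are possible, doublet is not.
print(is_consistent_multiplicity(26, 1))  # True
print(is_consistent_multiplicity(26, 2))  # False
print(is_consistent_multiplicity(26, 3))  # True
```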

Objects of type MolecularGeometry can be converted to Molecule instances using the to_molecule method, with the charge being mandatory and the multiplicity optional.

Molecular Formulas

Within HQS Molecules, molecular formulas are represented by MolecularFormula objects containing the elemental composition and the total charge. For example, formulas from PubChem are converted into this format:

>>> from hqs_molecules import PubChem
>>> pc = PubChem.from_name("Bicarbonate")
>>> pc.formula
MolecularFormula(natoms={'C': 1, 'H': 1, 'O': 3}, charge=-1)

The class implements __str__ as a conversion of the formula to a string in Hill notation:

>>> f"{pc.formula}"
'CHO3-'

Users can easily create molecular formulas from a string input.

>>> from hqs_molecules import MolecularFormula
>>> formula = MolecularFormula.from_str("MnO4-")
>>> formula
MolecularFormula(natoms={'Mn': 1, 'O': 4}, charge=-1)

The from_str constructor can handle some degree of complexity (for example, "CH3COOH" is interpreted equivalently to "C2H4O2"), but it cannot process arbitrarily complicated semi-structural formulas. Note that isomers cannot be distinguished, as they have identical elemental compositions.
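
To make the idea of summing repeated element symbols and printing the result in Hill notation concrete, here is a minimal, self-contained sketch. It is not the parser used by MolecularFormula.from_str: the helper names parse_flat_formula and hill_string are hypothetical, parentheses are not supported, and the handling of charges (one trailing sign character per unit of charge) is an assumption.

```python
import re
from collections import Counter

# An element symbol (capital letter, optional lowercase letter) and a count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_flat_formula(text):
    """Parse a formula without parentheses into (element counts, charge)."""
    body = text.rstrip("+-")
    signs = text[len(body):]
    charge = signs.count("+") - signs.count("-")
    counts = Counter()
    # Note: characters that do not form a token are silently ignored here.
    for symbol, number in _TOKEN.findall(body):
        counts[symbol] += int(number) if number else 1
    return dict(counts), charge


def hill_string(counts, charge=0):
    """Hill notation: C first, then H, then the rest alphabetically.

    Without carbon, all elements (including H) are sorted alphabetically.
    """
    if "C" in counts:
        rest = sorted(s for s in counts if s not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    body = "".join(f"{s}{counts[s] if counts[s] > 1 else ''}" for s in order)
    sign = "+" * charge if charge > 0 else "-" * (-charge)
    return body + sign


print(parse_flat_formula("MnO4-"))                    # ({'Mn': 1, 'O': 4}, -1)
print(hill_string(*parse_flat_formula("CH3COOH")))    # C2H4O2
print(hill_string({"C": 1, "H": 1, "O": 3}, -1))      # CHO3-
```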

Data from PubChem

Results from PubChem queries are stored within instances of the PubChem class. Unlike most other classes described in this section, it is implemented as a dataclass and not as a Pydantic model.

In practice, instances of this class are normally created using methods such as from_name or from_smiles. The retrieved data is stored in the fields of the class. A description of these fields can be displayed by executing:

>>> from hqs_molecules import PubChem
>>> help(PubChem)

Output of Quantum-Mechanical Calculations

Molecular Trajectories

Instances of the Trajectory class, as returned by geometry optimizations with xTB, contain two fields:

  • structures, a list of Molecule objects,
  • and energies, a list containing the energy of each structure.

Convenience attributes provide access to the following properties:

  • the number of structures, via the length attribute,
  • the last structure and its energy, via the attributes last and last_energy, respectively,
  • the structure with the lowest energy, together with its energy and its position in the trajectory, via the attributes lowest, lowest_energy and lowest_step, respectively.
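
The relationship between the two stored fields and the derived attributes can be sketched with a small stand-in class. TrajectorySketch is hypothetical and is not the Trajectory class from HQS Molecules: structures are represented by plain strings instead of Molecule objects, and the zero-based counting of lowest_step is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrajectorySketch:
    """Hypothetical stand-in with the two fields described above."""

    structures: List[str]   # placeholder labels instead of Molecule objects
    energies: List[float]   # one energy per structure

    @property
    def length(self) -> int:
        return len(self.structures)

    @property
    def last(self) -> str:
        return self.structures[-1]

    @property
    def last_energy(self) -> float:
        return self.energies[-1]

    @property
    def lowest_step(self) -> int:
        # Index of the smallest energy (zero-based in this sketch).
        return min(range(len(self.energies)), key=self.energies.__getitem__)

    @property
    def lowest(self) -> str:
        return self.structures[self.lowest_step]

    @property
    def lowest_energy(self) -> float:
        return self.energies[self.lowest_step]


# Made-up energies: the final step is not the one with the lowest energy.
traj = TrajectorySketch(
    structures=["step0", "step1", "step2", "step3"],
    energies=[-5.10, -5.25, -5.31, -5.29],
)
print(traj.length, traj.last, traj.lowest, traj.lowest_step)  # 4 step3 step2 2
```

As the made-up data shows, the last structure of a trajectory need not be the one with the lowest energy, which is why both are exposed separately.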

Vibrational Frequencies

A generic representation of computed vibrational frequencies and basic thermochemical properties is contained within the class MolecularFrequencies. Instances contain

  • the total electronic energy (in the total_energy field)
  • and a list of vibrational frequencies (in the field frequencies).

Note that the latter are represented as (real) floating-point numbers; by convention, imaginary frequencies are represented as negative numbers.
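
With this sign convention, imaginary modes can be identified by a simple comparison with zero. The short sketch below uses a plain list of made-up numbers (assumed here to be wavenumbers) rather than a MolecularFrequencies object.

```python
# Made-up frequencies; the negative entry stands for an imaginary mode.
frequencies = [-152.3, 412.8, 987.1, 1630.4]

imaginary = [f for f in frequencies if f < 0]
n_imaginary = len(imaginary)

# No imaginary modes indicates a minimum on the potential energy surface;
# exactly one is characteristic of a first-order saddle point.
is_minimum = n_imaginary == 0
print(n_imaginary, is_minimum)  # 1 False
```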

Additionally, the MolecularFrequencies class defines

  • a field thermochem with a list of BasicThermochemistry objects.

In addition to vibrational frequencies, programs such as xTB can calculate thermodynamic contributions within the rigid-rotor harmonic-oscillator approximation. These contributions are temperature-dependent, whereas the harmonic frequencies are not. Therefore, thermochemical corrections are stored in a list with one item per temperature value. Each BasicThermochemistry object provides the attributes enthalpy, entropy, gibbs_energy, and temperature (the temperature at which the other properties were evaluated). Since these quantities are interdependent, only enthalpy, entropy and temperature are stored explicitly, while the Gibbs energy is recomputed whenever it is accessed.
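
The pattern of storing three quantities and deriving the fourth can be sketched as follows, using the thermodynamic relation G = H - T·S. ThermochemistrySketch is a hypothetical stand-in, not the BasicThermochemistry class itself, and the numbers are made up; the only requirement is that enthalpy, entropy and temperature use mutually consistent units.

```python
from dataclasses import dataclass


@dataclass
class ThermochemistrySketch:
    """Hypothetical stand-in: three stored fields, one derived property."""

    enthalpy: float      # H, in some energy unit
    entropy: float       # S, in the same energy unit per kelvin
    temperature: float   # T, in kelvin

    @property
    def gibbs_energy(self) -> float:
        # Recomputed on every access, so it can never get out of sync
        # with the stored values.
        return self.enthalpy - self.temperature * self.entropy


# One entry per temperature, mirroring the list described above.
thermochem = [
    ThermochemistrySketch(enthalpy=-10.0, entropy=0.0002, temperature=300.0),
    ThermochemistrySketch(enthalpy=-10.0, entropy=0.0002, temperature=400.0),
]
for item in thermochem:
    print(item.temperature, round(item.gibbs_energy, 6))
```

Deriving the Gibbs energy on access avoids storing redundant data that could become inconsistent if one of the other fields is changed.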

Note that the Hessian matrix itself is not represented in the MolecularFrequencies class.

Conformer Search Results

Structures and energies of conformers determined via CREST are stored by HQS Molecules in a class ConformerEnsemble, which contains a list of Conformer objects.

Note that the grouping of conformer and rotamer structures as determined in the CREST calculation is ignored, and all the structures are regrouped by our own procedure, as described in the section on conformer search.

Further information on the attributes of the respective classes can be accessed from within Python:

>>> from hqs_molecules import Conformer, ConformerEnsemble
>>> help(Conformer)
>>> help(ConformerEnsemble)