Important Data Structures
This section provides a short description of important data structures used for input and output
in functions provided by HQS Molecules. Many of these classes are implemented as Pydantic models. Pydantic provides data validation and parsing using Python type annotations. By leveraging Pydantic, the package ensures that input and output data are correctly formatted and validated, reducing the likelihood of errors and improving the robustness of the software.
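The validation behaviour described above can be illustrated with a small standalone Pydantic model. This is a minimal sketch only: the class FormulaSketch and its fields are hypothetical and are not taken from HQS Molecules.

```python
from pydantic import BaseModel, ValidationError


class FormulaSketch(BaseModel):
    """Hypothetical stand-in model; not part of HQS Molecules."""

    natoms: dict[str, int]  # element symbol -> number of atoms
    charge: int = 0


# Well-formed input is parsed into typed fields.
ok = FormulaSketch(natoms={"O": 1, "H": 1}, charge=-1)

# Ill-formed input is rejected when the object is constructed.
try:
    FormulaSketch(natoms={"O": "many"}, charge=-1)
    rejected = False
except ValidationError:
    rejected = True

print(ok.model_dump(), rejected)
```

Because the check happens at construction time, a malformed value is reported where it enters the program rather than later in a calculation.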
The objects described in this section are:
- MolecularGeometry and Molecule, representing molecules in 3D,
- Molecular formulas: MolecularFormula,
- the PubChem dataclass for the data from PubChem,
- Trajectory and MolecularFrequencies, representing output of quantum-mechanical calculations that cannot be stored in elementary data types,
- ConformerEnsemble, storing results of a conformer search: it is obtained by combining an initial search performed by CREST with a subsequent refinement of the conformer ensemble using techniques developed at HQS.
Representing Molecules in 3D
Representing Atomic Positions
The MolecularGeometry class contains atomic positions (in Å) and chemical element symbols. Objects of this type are commonly generated in HQS Molecules by reading an XYZ file. However, the class carries no information on charge or spin multiplicity, both of which are typically needed for quantum-chemical calculations.
Important attributes of MolecularGeometry objects are
- natoms (representing the number of atoms N),
- symbols (returning a list of chemical element symbols),
- and positions (returning an N × 3 array of atomic positions).
Inspection of the class reveals further methods to update atomic positions and create copies of molecules, possibly with updated positions.
from hqs_molecules import MolecularGeometry
help(MolecularGeometry)
Internally, atoms are represented by a list of Atom objects. These are defined as named tuples containing the element symbol and the position, one tuple per atom. Note that this feature permits the atoms attribute to be used directly as input for PySCF calculations, as shown in the example below.
from hqs_molecules import smiles_to_molecule
from pyscf.gto import Mole
hqs_mol = smiles_to_molecule("C=C")
pyscf_mol = Mole(atom=hqs_mol.atoms)
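The reason a list of named tuples can be handed to PySCF is that each entry is still an ordinary (symbol, position) tuple, which is a form PySCF accepts for its atom specification. The sketch below uses a hypothetical AtomSketch class; its field names symbol and position are assumptions and may differ from the actual Atom definition.

```python
from typing import NamedTuple


class AtomSketch(NamedTuple):
    """Hypothetical stand-in for the Atom named tuple."""

    symbol: str
    position: tuple[float, float, float]  # Cartesian coordinates in Å


# A water-like geometry with illustrative coordinates.
atoms = [
    AtomSketch("O", (0.000, 0.000, 0.000)),
    AtomSketch("H", (0.757, 0.586, 0.000)),
    AtomSketch("H", (-0.757, 0.586, 0.000)),
]

# Fields are accessible by name ...
first_symbol = atoms[0].symbol
# ... and each entry still unpacks like a plain (symbol, position) tuple.
symbol, position = atoms[0]
print(first_symbol, symbol, position)
```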
Molecules with Charge and Spin
Molecule is one of the most important classes in the HQS Molecules package. It is implemented as a subclass of MolecularGeometry, with the addition of charge and multiplicity fields. Objects of Molecule type are commonly returned by functions performing 2D to 3D structure conversion. An additional attribute is nelectrons, containing the number of electrons corresponding to the molecular composition and charge.
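The electron count follows from the composition and the charge: it is the sum of the atomic numbers minus the total charge. The helper below is a minimal illustration with a hypothetical function name and a deliberately small element table; it is not the library's implementation of nelectrons.

```python
# Atomic numbers for a few elements only (illustrative subset).
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8}


def count_electrons(symbols: list[str], charge: int) -> int:
    """Sum of atomic numbers minus the total molecular charge."""
    return sum(ATOMIC_NUMBERS[s] for s in symbols) - charge


# Hydroxide anion, OH-: 8 + 1 - (-1) = 10 electrons.
print(count_electrons(["O", "H"], charge=-1))
# Neutral ethylene, C2H4: 2 * 6 + 4 * 1 = 16 electrons.
print(count_electrons(["C", "C", "H", "H", "H", "H"], charge=0))
```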
Molecular formulas (such as H2O or OH−) and molecular structure representations (such as SMILES strings or Molfiles) always contain the total molecular charge, explicitly or implicitly. Therefore, it is vital to preserve the total charge together with three-dimensional representations of molecular structures.
In addition to the charge, quantum-chemical calculations usually also require a specification of the spin multiplicity. Unlike the charge, it is not necessarily straightforward to infer from a molecular structure. Therefore, None is permitted as a value for the field. Indeed, functions such as smiles_to_molecule or molfile_to_molecule never set the field to an integer value themselves.
Once the spin multiplicity is known, it can be set and validated for a Molecule object using the set_multiplicity method.
from hqs_molecules import smiles_to_molecule
mol = smiles_to_molecule("CCO")
print(mol.multiplicity)
# None
mol.set_multiplicity(1)
print(mol.multiplicity)
# 1
Since the set_multiplicity method returns the object itself in addition to modifying it, calls such as mol = smiles_to_molecule("CCO").set_multiplicity(1) are possible.
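The chaining pattern works because the setter returns the object it has just modified. The sketch below shows the idea with a hypothetical MoleculeSketch class. The validation rule used here (the number of unpaired electrons, multiplicity minus one, must not exceed the electron count and must leave an even number of paired electrons) is an assumption about what a reasonable check looks like, not a description of the checks performed by HQS Molecules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MoleculeSketch:
    """Hypothetical stand-in holding only what the example needs."""

    nelectrons: int
    multiplicity: Optional[int] = None

    def set_multiplicity(self, multiplicity: int) -> "MoleculeSketch":
        unpaired = multiplicity - 1
        if unpaired < 0 or unpaired > self.nelectrons:
            raise ValueError("multiplicity out of range for this molecule")
        if (self.nelectrons - unpaired) % 2 != 0:
            raise ValueError("multiplicity inconsistent with electron count")
        self.multiplicity = multiplicity
        return self  # returning self is what enables chained calls


# Ethanol (C2H6O) has 26 electrons, so a singlet is consistent.
ethanol = MoleculeSketch(nelectrons=26).set_multiplicity(1)
print(ethanol.multiplicity)
```

With an even electron count, an even multiplicity such as 2 would be rejected by this sketch.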
Objects of type MolecularGeometry can be converted to Molecule instances using the to_molecule method, with the charge being mandatory and the multiplicity optional.
Molecular Formulas
Within HQS Molecules, molecular formulas are represented by MolecularFormula objects containing the elemental composition and the total charge. For example, formulas from PubChem are converted into this format:
from hqs_molecules import PubChem
pc = PubChem.from_name("Bicarbonate")
print(pc.formula.model_dump())
# {'natoms': {'C': 1, 'H': 1, 'O': 3}, 'charge': -1}
The class implements __str__ as a conversion of the formula to a string in Hill notation:
print(pc.formula)
# CHO3-
#
# equivalent with:
print(str(pc.formula))
print(f"{pc.formula}")
Users can easily create molecular formulas from a string input.
from hqs_molecules import MolecularFormula
formula = MolecularFormula.from_str("MnO4-")
print(formula.model_dump())
# {'natoms': {'Mn': 1, 'O': 4}, 'charge': -1}
The from_str constructor can handle some degree of complexity (for example, "CH3COOH" is interpreted equivalently to "C2H4O2"), but it cannot process arbitrarily complicated semi-structural formulas. Note that isomers cannot be distinguished, as they have identical elemental compositions.
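To illustrate why "CH3COOH" and "C2H4O2" end up equivalent, the following sketch sums repeated element symbols and prints the result in Hill notation (carbon first, hydrogen second, remaining elements alphabetically; purely alphabetical if no carbon is present). The functions parse_formula and hill_string are hypothetical and far simpler than from_str: they handle neither parentheses nor charges other than a single trailing + or -.

```python
import re
from collections import Counter


def parse_formula(text: str) -> tuple[dict[str, int], int]:
    """Parse a flat formula such as 'CH3COOH' or 'MnO4-'."""
    match = re.fullmatch(r"((?:[A-Z][a-z]?\d*)+)([+-])?", text)
    if match is None:
        raise ValueError(f"Unsupported formula: {text!r}")
    body, sign = match.group(1), match.group(2)
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", body):
        counts[symbol] += int(number) if number else 1
    charge = {"+": 1, "-": -1}.get(sign, 0)
    return dict(counts), charge


def hill_string(counts: dict[str, int], charge: int = 0) -> str:
    """Format an elemental composition in Hill notation."""
    if "C" in counts:
        rest = sorted(s for s in counts if s not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    body = "".join(f"{s}{counts[s] if counts[s] > 1 else ''}" for s in order)
    # Only charges of -1, 0 and +1 are produced by parse_formula above.
    return body + {1: "+", -1: "-"}.get(charge, "")


print(hill_string(*parse_formula("CH3COOH")))
print(hill_string(*parse_formula("MnO4-")))
```

Since only the elemental composition is kept, any isomer of C2H4O2 produces the same result.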
For an existing Molecule object, created for instance with the smiles_to_molecule function described above, the molecular formula can be obtained using the MolecularFormula.from_mol class method.
Data from PubChem
Results from PubChem queries are stored within instances of the PubChem class. Unlike most other classes described in this section, it is implemented as a dataclass and not as a Pydantic model.
In practical use, instances of this class would normally be created using methods such as from_name or from_smiles. The retrieved data is stored in the fields of the class. A description can be found by executing:
from hqs_molecules import PubChem
help(PubChem)
Output of Quantum-Mechanical Calculations
Molecular Trajectories
Instances of the Trajectory class, as returned by geometry optimizations with xTB, contain two fields:
- a list of Molecule objects that is labeled structures,
- and the energies of each structure in a list labeled energies.
Convenience attributes are implemented for the following properties:
- Obtaining the number of structures through the length attribute.
- Obtaining the last structure and its energy via the attributes last and last_energy, respectively.
- Identifying the structure with the lowest energy and accessing the structure, its energy and its position in the trajectory with the attributes lowest, lowest_energy and lowest_step, respectively.
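The convenience attributes listed above can all be derived from the two stored lists. The sketch below uses a hypothetical TrajectorySketch class with string labels in place of Molecule objects and made-up energies. Whether lowest_step is counted from zero in HQS Molecules is not stated here; the zero-based index is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class TrajectorySketch:
    """Hypothetical stand-in; labels replace Molecule objects."""

    structures: list[str]
    energies: list[float]

    @property
    def length(self) -> int:
        return len(self.structures)

    @property
    def last(self) -> str:
        return self.structures[-1]

    @property
    def last_energy(self) -> float:
        return self.energies[-1]

    @property
    def lowest_step(self) -> int:
        # Index of the minimum energy (zero-based in this sketch).
        return min(range(len(self.energies)), key=self.energies.__getitem__)

    @property
    def lowest(self) -> str:
        return self.structures[self.lowest_step]

    @property
    def lowest_energy(self) -> float:
        return self.energies[self.lowest_step]


# Made-up energies in which the final step is not the lowest one.
traj = TrajectorySketch(["step0", "step1", "step2"], [-5.10, -5.32, -5.30])
print(traj.length, traj.last, traj.lowest, traj.lowest_step)
```

The example also shows why last and lowest are separate attributes: the final structure of an optimization is not necessarily the one with the lowest energy.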
Vibrational Frequencies
A generic representation of computed vibrational modes and basic thermochemical properties is contained within the class VibrationalAnalysis. Instances contain
- a list of vibrational modes (in the modes field)
- and the nuclear Hessian (in the hessian field). The latter may be an empty list if the Hessian matrix is missing.
Please note that the normal modes and frequencies stored in the object amount to 3N − 6 (or 3N − 5 for linear molecules, where N is the number of atoms), rather than 3N.
Each entry in the modes field is of type VibrationalMode, which contains fields for
- the vibrational frequency,
- the Cartesian normal-mode displacements,
- the reduced mass associated with the normal mode in reduced_mass,
- and the intensity of a normal mode excitation (in the ir_intensity field).
The values of the fields reduced_mass and ir_intensity may be None, which is appropriate if
the respective values are not available as part of the vibrational analysis.
Convenience properties of the VibrationalAnalysis class give access to
- a list of vibrational frequencies and
- a list of all Cartesian displacements.
- reduced_masses returns a list of reduced masses of the normal modes, or an empty list if the values are undefined, and ir_intensities returns a list of all infrared intensities or an empty list.
- The is_linear flag indicates whether a molecule is linear, thus having 3N − 5 vibrational degrees of freedom instead of 3N − 6.
- The number of atoms can be obtained via the natoms property.
Note that the frequencies are represented as (real) floating-point numbers; by convention, imaginary frequencies are represented as negative numbers. An empty list modes is assumed to imply a system with one atom, while no special provisions are made for an empty system with zero atoms.
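The relationship between the number of modes, the linearity flag and the number of atoms can be written out explicitly. The functions below are hypothetical helpers, not part of HQS Molecules, and the frequency values are made up purely to show the sign convention for imaginary modes.

```python
def n_vibrational_modes(natoms: int, is_linear: bool) -> int:
    """Number of vibrational degrees of freedom for natoms atoms."""
    if natoms < 2:
        return 0  # a single atom has no vibrations
    return 3 * natoms - (5 if is_linear else 6)


def natoms_from_modes(n_modes: int, is_linear: bool) -> int:
    """Invert the count above; zero modes are taken to mean one atom."""
    if n_modes == 0:
        return 1
    return (n_modes + (5 if is_linear else 6)) // 3


# Made-up frequencies; the negative entry stands for an imaginary mode.
frequencies = [-512.3, 810.0, 1650.2]
n_imaginary = sum(1 for f in frequencies if f < 0)

print(n_vibrational_modes(3, is_linear=False))  # bent triatomic: 3
print(n_vibrational_modes(3, is_linear=True))   # linear triatomic: 4
print(natoms_from_modes(4, is_linear=True))     # 3
print(n_imaginary)                              # 1
```

The special case for zero modes reflects the convention stated above that an empty list of modes implies a single atom.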
In addition to vibrational frequencies, programs such as xTB can calculate thermodynamic contributions via a rigid rotor and harmonic oscillator approximation. These contributions are temperature-dependent (while harmonic frequencies are not). Therefore, thermochemical corrections are stored in a Thermochemistry object,
which contains the fields enthalpy, entropy, gibbs_energy, and temperature (representing the temperature used to evaluate the aforementioned properties). Since these quantities are interdependent, only enthalpy, entropy and temperature are stored explicitly, while the Gibbs energy is recomputed upon being accessed.
Despite not being temperature-dependent, the electronic energy is also present in the field energy.
Using the update_energy method, the electronic energy can be updated and the thermodynamic properties are recomputed accordingly,
which can be used if one wants to combine a high-level single-point energy with a lower-level frequency calculation.
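The interdependence of these quantities can be sketched as follows, using the relation G = H − T·S. The class ThermochemistrySketch is hypothetical. It assumes that the stored enthalpy includes the electronic energy, so that replacing the energy shifts the enthalpy by the same amount, and that all values share consistent units; neither assumption is confirmed for HQS Molecules. The numbers are made up.

```python
import math
from dataclasses import dataclass


@dataclass
class ThermochemistrySketch:
    """Hypothetical stand-in; units are assumed to be consistent."""

    energy: float       # electronic energy
    enthalpy: float     # assumed to include the electronic energy
    entropy: float      # energy units per kelvin
    temperature: float  # kelvin

    @property
    def gibbs_energy(self) -> float:
        # Recomputed on every access rather than stored: G = H - T * S.
        return self.enthalpy - self.temperature * self.entropy

    def update_energy(self, new_energy: float) -> None:
        # Keep the thermal correction (H - E) and swap the electronic part.
        self.enthalpy += new_energy - self.energy
        self.energy = new_energy


thermo = ThermochemistrySketch(
    energy=-10.0, enthalpy=-9.9, entropy=1.0e-4, temperature=300.0
)
gibbs_before = thermo.gibbs_energy

# Replace the energy with a (made-up) higher-level single-point value.
thermo.update_energy(-10.5)
gibbs_after = thermo.gibbs_energy

print(math.isclose(gibbs_after - gibbs_before, -0.5))
```

Because the Gibbs energy is computed on access, it reflects the new electronic energy without any further bookkeeping.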
Conformer Search Results
Structures and energies of conformers determined via CREST are stored by HQS Molecules in a class ConformerEnsemble, which contains a list of Conformer objects.
Note that the grouping of conformer and rotamer structures as determined in the CREST calculation is ignored, and all the structures are regrouped by our own procedure, as described in the section on conformer search.
Further information on the attributes of the respective classes can be accessed from within Python:
from hqs_molecules import Conformer, ConformerEnsemble
help(Conformer)
help(ConformerEnsemble)