Important Data Structures
This section provides a short description of important data structures used for input and output in functions provided by HQS Molecules. Many of these classes are implemented as Pydantic models. Pydantic provides data validation and parsing based on Python type annotations. By leveraging Pydantic, the package ensures that input and output data are correctly formatted and validated, reducing the likelihood of errors and improving the robustness of the software.
The objects described in this section are:
- MolecularGeometry and Molecule, representing molecules in 3D,
- MolecularFormula, representing molecular formulas,
- PubChem, a dataclass for the data from PubChem,
- Trajectory and VibrationalAnalysis, representing output of quantum-mechanical calculations that cannot be stored in elementary data types,
- and ConformerEnsemble, storing results of a conformer search: it is obtained by combining an initial search performed by CREST with a subsequent refinement of the conformer ensemble using techniques developed at HQS.
Representing Molecules in 3D
Representing Atomic Positions
The MolecularGeometry class contains atomic positions (in Å) and chemical element symbols. Objects of this type are commonly generated in HQS Molecules by reading an XYZ file. However, the class lacks information on charge and spin multiplicity, which are typically needed for quantum-chemical calculations.
Important attributes of MolecularGeometry objects are
- natoms (the number of atoms),
- symbols (a list of chemical element symbols),
- and positions (an N × 3 array of atomic positions).
Inspecting the class reveals further methods for updating atomic positions and for creating copies of molecules, optionally with updated positions.
from hqs_molecules import MolecularGeometry
help(MolecularGeometry)
Internally, atoms are represented by a list of Atom objects. These are defined as named tuples containing the element symbol and the position, one tuple per atom. Note that this feature permits the atoms attribute to be used directly as input for PySCF calculations, as shown in the example below.
from hqs_molecules import smiles_to_molecule
from pyscf.gto import Mole
hqs_mol = smiles_to_molecule("C=C")
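# Atom positions are in Å, which is also PySCF's default unit.
pyscf_mol = Mole(atom=hqs_mol.atoms)
pyscf_mol.build()  # finalize the PySCF molecule (default basis set)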
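Because each Atom is a named tuple, it unpacks as (symbol, position), which is the list-of-tuples layout PySCF accepts for its atom argument. The stand-alone sketch below mimics this design with the standard library only; the class name AtomListGeometry, the field names symbol and position, and the water coordinates are illustrative assumptions rather than the HQS Molecules implementation.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    """Hypothetical stand-in: element symbol plus Cartesian position in Å."""

    symbol: str
    position: tuple[float, float, float]


class AtomListGeometry:
    """Hypothetical, minimal analogue of MolecularGeometry built on Atom tuples."""

    def __init__(self, atoms: list[Atom]) -> None:
        self.atoms = list(atoms)

    @property
    def natoms(self) -> int:
        return len(self.atoms)

    @property
    def symbols(self) -> list[str]:
        return [atom.symbol for atom in self.atoms]

    @property
    def positions(self) -> list[tuple[float, float, float]]:
        # Nested N x 3 list here; the real class returns an array.
        return [atom.position for atom in self.atoms]


water = AtomListGeometry([
    Atom("O", (0.0, 0.0, 0.117)),
    Atom("H", (0.0, 0.757, -0.467)),
    Atom("H", (0.0, -0.757, -0.467)),
])
print(water.natoms)   # 3
print(water.symbols)  # ['O', 'H', 'H']

# Each entry unpacks like a plain (symbol, position) tuple.
symbol, position = water.atoms[0]
print(symbol, position)  # O (0.0, 0.0, 0.117)
```

Deriving natoms, symbols and positions from a single list keeps the three views consistent by construction.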
Molecules with Charge and Spin
Molecule is one of the most important classes in the HQS Molecules package. It is implemented as a subclass of MolecularGeometry, with the addition of charge and multiplicity fields. Objects of Molecule type are commonly returned by functions performing 2D-to-3D structure conversion. An additional attribute is nelectrons, containing the number of electrons corresponding to the molecular composition and charge.
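The nelectrons value follows from simple arithmetic: the sum of the atomic numbers of all atoms minus the total charge. A minimal sketch of that relation, assuming a small hand-written table of atomic numbers (the helper count_electrons is hypothetical, not part of HQS Molecules):

```python
# Atomic numbers for a few elements (illustrative subset, not a full periodic table).
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8, "S": 16}


def count_electrons(symbols: list[str], charge: int) -> int:
    """Electron count = sum of nuclear charges minus the total molecular charge."""
    return sum(ATOMIC_NUMBERS[s] for s in symbols) - charge


# Hydroxide, OH-: 8 + 1 from the nuclei, plus one extra for the negative charge.
print(count_electrons(["O", "H"], charge=-1))  # 10
# Ethanol (CCO with implicit hydrogens), C2H6O.
print(count_electrons(["C", "C", "O"] + ["H"] * 6, charge=0))  # 26
```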
Molecular formulas (such as H2O or OH−) and molecular structure representations (such as SMILES strings or Molfiles) always contain the total molecular charge, explicitly or implicitly. Therefore, it is vital to preserve the total charge together with three-dimensional representations of molecular structures.
In addition to the charge, quantum-chemical calculations usually also require a specification of the spin multiplicity. Unlike the charge, the multiplicity cannot always be inferred unambiguously from a molecular structure. Therefore, None is permitted as a value for this field. Indeed, functions such as smiles_to_molecule or molfile_to_molecule never set the field to an integer value themselves.
If the spin multiplicity is known, it can be set and validated for a Molecule object using the set_multiplicity method.
from hqs_molecules import smiles_to_molecule
mol = smiles_to_molecule("CCO")
print(mol.multiplicity)
# None
mol.set_multiplicity(1)
print(mol.multiplicity)
# 1
Since the set_multiplicity method returns the object itself in addition to modifying it, calls such as mol = smiles_to_molecule("CCO").set_multiplicity(1) are possible.
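The text does not spell out which checks set_multiplicity performs. One natural consistency test, assumed here purely for illustration, is electron-count parity: a multiplicity of 2S + 1 implies 2S unpaired electrons, so the total electron count and multiplicity − 1 must be both even or both odd. The sketch also returns self, which is what makes the chained call above work. MoleculeSketch is a hypothetical stand-in, not the library class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MoleculeSketch:
    """Hypothetical stand-in holding only what the check below needs."""

    nelectrons: int
    multiplicity: Optional[int] = None

    def set_multiplicity(self, multiplicity: int) -> "MoleculeSketch":
        # Multiplicity 2S + 1 implies (multiplicity - 1) unpaired electrons.
        unpaired = multiplicity - 1
        if multiplicity < 1 or unpaired > self.nelectrons:
            raise ValueError(f"invalid multiplicity {multiplicity}")
        # The remaining electrons must form pairs, so the parities must agree.
        if (self.nelectrons - unpaired) % 2 != 0:
            raise ValueError(
                f"multiplicity {multiplicity} is incompatible with "
                f"{self.nelectrons} electrons"
            )
        self.multiplicity = multiplicity
        return self  # returning self enables chained calls


ethanol = MoleculeSketch(nelectrons=26).set_multiplicity(1)
print(ethanol.multiplicity)  # 1
try:
    MoleculeSketch(nelectrons=26).set_multiplicity(2)
except ValueError as err:
    print(err)  # multiplicity 2 is incompatible with 26 electrons
```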
Objects of type MolecularGeometry can be converted to Molecule instances using the to_molecule method, with the charge being mandatory and the multiplicity optional.
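The subclass relationship and the conversion signature can be mirrored with plain dataclasses. In this hypothetical sketch (SimpleGeometry and SimpleMolecule are illustrative names, and dataclasses stand in for the library's Pydantic models), charge has no default, so it must be supplied, while multiplicity defaults to None:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimpleGeometry:
    """Hypothetical analogue of MolecularGeometry: symbols and positions only."""

    symbols: list[str]
    positions: list[tuple[float, float, float]]

    def to_molecule(
        self, charge: int, multiplicity: Optional[int] = None
    ) -> "SimpleMolecule":
        # charge is a required argument; multiplicity may stay undefined (None).
        return SimpleMolecule(self.symbols, self.positions, charge, multiplicity)


@dataclass
class SimpleMolecule(SimpleGeometry):
    """Hypothetical analogue of Molecule: a geometry plus charge and multiplicity."""

    charge: int
    multiplicity: Optional[int] = None


geom = SimpleGeometry(["H", "H"], [(0.0, 0.0, 0.0), (0.0, 0.0, 0.74)])
h2_cation = geom.to_molecule(charge=1)
print(h2_cation.charge, h2_cation.multiplicity)  # 1 None
print(isinstance(h2_cation, SimpleGeometry))     # True
```

Because the molecule class derives from the geometry class, any code written for geometries also accepts molecules.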
Molecular Formulas
Within HQS Molecules, molecular formulas are represented by MolecularFormula objects containing the elemental composition and the total charge. For example, formulas from PubChem are converted into this format:
from hqs_molecules import PubChem
pc = PubChem.from_name("Bicarbonate")
print(pc.formula.model_dump())
# {'natoms': {'C': 1, 'H': 1, 'O': 3}, 'charge': -1}
The class implements __str__ as a conversion of the formula to a string in Hill notation:
print(pc.formula)
# CHO3-
#
# equivalent to:
print(str(pc.formula))
print(f"{pc.formula}")
Molecular formulas can also be created directly from string input:
from hqs_molecules import MolecularFormula
formula = MolecularFormula.from_str("MnO4-")
print(formula.model_dump())
# {'natoms': {'Mn': 1, 'O': 4}, 'charge': -1}
The from_str constructor can handle some degree of complexity (for example, "CH3COOH" is interpreted equivalently to "C2H4O2"), but it cannot process arbitrarily complicated semi-structural formulas. Note that isomers cannot be distinguished, as they have identical elemental compositions.
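Flat formulas like "CH3COOH" can be handled by scanning element tokens (a capital letter, an optional lowercase letter, an optional count) and summing repeated elements. The following parser is a hypothetical illustration of that idea, not the algorithm behind from_str; it assumes charges are written as trailing signs only ("-", "+", "--"), and it deliberately rejects parentheses to show where such a simple scheme stops working:

```python
import re
from collections import Counter

# One element symbol (capital letter plus optional lowercase letter) and a count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")
# Assumed charge syntax: a trailing run of "+" or "-" signs, e.g. "MnO4-".
CHARGE = re.compile(r"([+-]+)$")


def parse_formula(text: str) -> tuple[dict[str, int], int]:
    """Parse a flat formula such as 'CH3COOH' or 'MnO4-' (no parentheses)."""
    charge = 0
    match = CHARGE.search(text)
    if match:
        signs = match.group(1)
        charge = signs.count("+") - signs.count("-")
        text = text[: match.start()]
    counts = Counter()
    pos = 0
    for token in TOKEN.finditer(text):
        if token.start() != pos:  # an unparsable character was skipped
            raise ValueError(f"cannot parse {text!r} at position {pos}")
        counts[token.group(1)] += int(token.group(2) or 1)
        pos = token.end()
    if pos != len(text):
        raise ValueError(f"cannot parse {text!r} at position {pos}")
    return dict(counts), charge


print(parse_formula("CH3COOH"))  # ({'C': 2, 'H': 4, 'O': 2}, 0)
print(parse_formula("MnO4-"))    # ({'Mn': 1, 'O': 4}, -1)
try:
    parse_formula("Ca(OH)2")
except ValueError as err:
    print(err)  # cannot parse 'Ca(OH)2' at position 2
```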
Given a Molecule object, for instance one created with the smiles_to_molecule function described above, its molecular formula can be obtained using the MolecularFormula.from_mol class method.
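Conceptually, deriving a formula from a 3D structure only requires counting the element symbols and carrying over the charge; all connectivity is discarded, which is why isomers become indistinguishable. A minimal sketch of that idea on plain lists of symbols (the helper formula_from_symbols and its dictionary layout, modeled on the model_dump() output shown earlier, are illustrative assumptions):

```python
from collections import Counter


def formula_from_symbols(symbols: list[str], charge: int) -> dict:
    """Collapse per-atom element symbols into an elemental composition plus charge."""
    return {"natoms": dict(Counter(symbols)), "charge": charge}


# Ethanol and dimethyl ether are isomers (both C2H6O), listed in different atom orders.
ethanol = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
dimethyl_ether = ["C", "O", "C", "H", "H", "H", "H", "H", "H"]
print(formula_from_symbols(ethanol, 0) == formula_from_symbols(dimethyl_ether, 0))  # True
```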
Data from PubChem
Results from PubChem queries are stored within instances of the PubChem class. Unlike most other classes described in this section, it is implemented as a dataclass and not as a Pydantic model.
In practice, instances of this class are normally created using methods such as from_name or from_smiles. The retrieved data is stored in the fields of the class; a description of these fields can be displayed by executing:
from hqs_molecules import PubChem
help(PubChem)
Output of Quantum-Mechanical Calculations
Molecular Trajectories
Instances of the Trajectory class, as returned by geometry optimizations with xTB, contain two fields:
- a list of Molecule objects labeled structures,
- and the energies of each structure in a list labeled energies.
Convenience attributes are implemented for the following properties:
- Obtaining the number of structures through the length attribute.
- Obtaining the last structure and its energy via the attributes last and last_energy, respectively.
- Identifying the structure with the lowest energy and accessing the structure, its energy and its position in the trajectory with the attributes lowest, lowest_energy and lowest_step, respectively.
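These convenience attributes are thin wrappers around the two lists. In the hypothetical TrajectorySketch below, structures are plain string labels instead of Molecule objects and the energies are made-up values; the example is chosen so that the lowest-energy structure is not the last one, which is why both sets of attributes are useful:

```python
from dataclasses import dataclass


@dataclass
class TrajectorySketch:
    """Hypothetical analogue of Trajectory; structures are plain labels here."""

    structures: list[str]
    energies: list[float]

    @property
    def length(self) -> int:
        return len(self.structures)

    @property
    def last(self) -> str:
        return self.structures[-1]

    @property
    def last_energy(self) -> float:
        return self.energies[-1]

    @property
    def lowest_step(self) -> int:
        # Index of the minimum energy; in this sketch, ties resolve to the earliest step.
        return min(range(len(self.energies)), key=self.energies.__getitem__)

    @property
    def lowest(self) -> str:
        return self.structures[self.lowest_step]

    @property
    def lowest_energy(self) -> float:
        return self.energies[self.lowest_step]


traj = TrajectorySketch(
    structures=["step0", "step1", "step2", "step3"],
    energies=[-5.070, -5.081, -5.085, -5.084],  # illustrative values only
)
print(traj.length, traj.last, traj.last_energy)            # 4 step3 -5.084
print(traj.lowest_step, traj.lowest, traj.lowest_energy)  # 2 step2 -5.085
```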
Vibrational Frequencies
A generic representation of computed vibrational modes and basic thermochemical properties is contained within the class VibrationalAnalysis. Instances contain
- a list of vibrational modes (in the modes field)
- and the nuclear Hessian (in the hessian field). The latter may be an empty list if the Hessian matrix is missing.
Please note that the normal modes and frequencies stored in the object amount to 3N − 6 (or 3N − 5 for linear molecules, where N is the number of atoms), rather than 3N.
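Counting the modes is simple arithmetic: of the 3N Cartesian degrees of freedom, three are translations and three are rotations, or only two rotations for a linear molecule. A small helper (the name n_vibrational_modes is hypothetical) makes the rule and the single-atom edge case explicit:

```python
def n_vibrational_modes(natoms: int, linear: bool) -> int:
    """3N - 5 modes for linear molecules, 3N - 6 otherwise; a lone atom has none."""
    if natoms == 1:
        return 0  # 3 translations and no rotations or vibrations
    return 3 * natoms - (5 if linear else 6)


print(n_vibrational_modes(3, linear=False))  # water: 3
print(n_vibrational_modes(3, linear=True))   # carbon dioxide: 4
print(n_vibrational_modes(2, linear=True))   # any diatomic: 1
```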
Each entry in the modes field is of type VibrationalMode, which contains fields for
- the vibrational frequency,
- the Cartesian normal-mode displacements,
- the reduced mass associated with the normal mode in reduced_mass,
- and the intensity of a normal mode excitation (in the ir_intensity field).
The values of the fields reduced_mass and ir_intensity may be None, which is appropriate if the respective values are not available as part of the vibrational analysis.
Convenience properties of the VibrationalAnalysis class give access to
- a list of vibrational frequencies (frequencies),
- a list of all Cartesian displacements (displacements),
- a list of the reduced masses of the normal modes (reduced_masses), or an empty list if the values are undefined,
- a list of all infrared intensities (ir_intensities), or an empty list if the values are undefined,
- the is_linear flag, which indicates whether a molecule is linear and thus has 3N − 5 vibrational degrees of freedom instead of 3N − 6,
- and the number of atoms (natoms).
Note that the frequencies are represented as (real) floating-point numbers; by convention, imaginary frequencies are represented as negative numbers. An empty modes list is assumed to imply a system with one atom, while no special provisions are made for an empty system with zero atoms.
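The convenience properties can be pictured as values derived on demand from the modes list. ModeSketch and AnalysisSketch are hypothetical stand-ins; in particular, deriving natoms from the length of a mode's displacement vectors, returning an empty reduced_masses list as soon as any single value is missing, and the frequency unit cm⁻¹ are assumptions made for this illustration only:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModeSketch:
    """Hypothetical analogue of VibrationalMode."""

    frequency: float  # assumed cm^-1; a negative value encodes an imaginary frequency
    displacements: list[tuple[float, float, float]]  # one Cartesian vector per atom
    reduced_mass: Optional[float] = None
    ir_intensity: Optional[float] = None


@dataclass
class AnalysisSketch:
    """Hypothetical analogue of VibrationalAnalysis with derived properties."""

    modes: list[ModeSketch]
    hessian: list[list[float]] = field(default_factory=list)  # may stay empty

    @property
    def frequencies(self) -> list[float]:
        return [mode.frequency for mode in self.modes]

    @property
    def reduced_masses(self) -> list[float]:
        values = [mode.reduced_mass for mode in self.modes]
        # Assumption: a single missing value makes the whole list undefined.
        return [] if any(v is None for v in values) else values

    @property
    def natoms(self) -> int:
        # An empty mode list is treated as a single atom, as described above.
        return len(self.modes[0].displacements) if self.modes else 1

    @property
    def is_linear(self) -> bool:
        return len(self.modes) == 3 * self.natoms - 5


# Illustrative H2-like diatomic: one mode, no reduced mass available.
h2 = AnalysisSketch(modes=[ModeSketch(4401.0, [(0.0, 0.0, -0.71), (0.0, 0.0, 0.71)])])
print(h2.natoms, h2.is_linear, h2.frequencies, h2.reduced_masses)  # 2 True [4401.0] []
```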
In addition to vibrational frequencies, programs such as xTB can calculate thermodynamic contributions within the rigid-rotor harmonic-oscillator approximation. These contributions depend on temperature, whereas harmonic frequencies do not. Therefore, thermochemical corrections are stored in a separate Thermochemistry object, which contains the fields enthalpy, entropy, gibbs_energy, and temperature (the temperature at which the aforementioned properties were evaluated). Since these quantities are interdependent, only the enthalpy, entropy and temperature are stored explicitly, while the Gibbs energy is recomputed whenever it is accessed.
Although it does not depend on temperature, the electronic energy is also stored, in the field energy.
The update_energy method replaces the electronic energy and recomputes the thermodynamic properties accordingly. This is useful, for example, to combine a high-level single-point energy with thermochemical corrections from a lower-level frequency calculation.
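The relation between stored and derived quantities can be sketched with a plain dataclass, with G = H − T·S computed on access. ThermoSketch is hypothetical, and it assumes that the stored enthalpy includes the electronic energy, so replacing E shifts H and G by the same amount while the entropy and the thermal correction H − E stay unchanged. All numbers are made up, in Hartree and Hartree/K.

```python
from dataclasses import dataclass


@dataclass
class ThermoSketch:
    """Hypothetical analogue of Thermochemistry (Hartree, Hartree/K, K)."""

    energy: float       # electronic energy, temperature-independent
    enthalpy: float     # assumed to include the electronic energy
    entropy: float
    temperature: float

    @property
    def gibbs_energy(self) -> float:
        # Recomputed on every access from the stored quantities: G = H - T*S.
        return self.enthalpy - self.temperature * self.entropy

    def update_energy(self, new_energy: float) -> None:
        # Shift H (and therefore G) by the change in electronic energy,
        # keeping the thermal correction H - E and the entropy unchanged.
        self.enthalpy += new_energy - self.energy
        self.energy = new_energy


# Low-level frequency calculation, then a hypothetical high-level single point.
thermo = ThermoSketch(energy=-5.0, enthalpy=-4.9, entropy=0.0001, temperature=298.15)
thermo.update_energy(-5.2)
print(round(thermo.enthalpy, 6), round(thermo.gibbs_energy, 6))  # -5.1 -5.129815
```

Storing only H, S and T avoids the three quantities ever drifting out of sync when one of them is updated.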
Conformer Search Results
Structures and energies of conformers determined via CREST are stored by HQS Molecules in a class ConformerEnsemble, which contains a list of Conformer objects.
Note that the grouping of conformer and rotamer structures as determined in the CREST calculation is ignored, and all the structures are regrouped by our own procedure, as described in the section on conformer search.
Further information on the attributes of the respective classes can be accessed from within Python:
from hqs_molecules import Conformer, ConformerEnsemble
help(Conformer)
help(ConformerEnsemble)