Semi-empirical Calculations with xTB

This HQS Molecules module contains a Python interface to perform calculations with the xTB software package.

The xTB (extended tight binding) program package is developed by the research group of Prof. Stefan Grimme at the University of Bonn. It contains implementations of the semi-empirical quantum-mechanical methods GFNn-xTB (n = 0–2) as well as of the GFN-FF force field ("GFN" stands for Geometries, Frequencies, and Non-covalent interactions, for which it has been parametrized). HQS Molecules interfaces the following features, permitting them to be used conveniently from within Python workflows:

  • Single-point energy calculations,
  • Structure optimizations,
  • Gradient calculations,
  • Frequency calculations,
  • Stable optimizations including Hessian calculations, and the
  • Biased Hessian.

Single-point energy calculations

The energy of any meaningful set of atomic coordinates with a specific charge and multiplicity can be calculated using the single_point function. Calculations are performed with the GFN2-xTB method by default, or alternatively with the GFN-FF method.

Input

The first mandatory argument of the function is a Molecule object, as the charge and multiplicity need to be known to perform a calculation. Note: the functions for reading an XYZ file return MolecularGeometry objects, which contain neither charge nor spin information. A MolecularGeometry objects can be easily converted to a Molecule object by providing information on the charge and, optionally, on the multiplicity, as described in the XYZ File Handling chapter. On the other hand, the 2D-to-3D structure conversion functions return Molecule objects with charge but without spin information.

Keyword arguments for single-point energy calculations

Calculation settings can be adjusted with the following keyword arguments:

  • force_field: Use GFN-FF force-field (defaults to False, in which case the GFN2-xTB method is used).
  • solvent: By default, calculations are performed in vacuum. If the name of a solvent is provided, the implicit solvation model ALPB (analytical linearized Poisson-Boltzmann) will be used in the calculation.
  • create_tmpdir: Run the calculation in a temporary folder that is deleted with completion of the calculation (the default is set to True).

Output

The single_point function returns a float with the single-point energy in Hartree. If an implicit solvation model is used, the solvation contribution ("Gsolv" in the xTB output) is part of the single-point energy.

Example

An example is provided here:

from hqs_molecules.xtb import single_point

energy = single_point(molecule, force_field=False)

Solvents

HQS Molecules supports using the ALPB implicit solvation model implemented in the xTB code. By default, calculations are performed in vacuum. Parameters of specific solvents can be selected by providing the solvent name as a string to the solvent keyword argument of the various wrapper functions, such as solvent="Water".

The following loop prints all solvent names that have been defined in the interface:

from hqs_molecules import xtb
for name, keyword in xtb.SOLVENT_KEYWORDS.items():
    print(f"{name:20}:  {keyword}")

It is possible to specify keys or values of the xtb.SOLVENT_KEYWORDS dictionary, so that "Octanol (wet)" or "woctanol" are equally valid. In addition, the input is treated case-insensitively.

For more information on defined solvents, please check the section in the xTB manual.

Example on specifying a solvent in a single-point energy calculation:

from hqs_molecules.xtb import single_point

energy = single_point(molecule, solvent="Water", force_field=False)

Structure optimization

xTB can optimize atomic positions in a molecular system to find a nearby local minimum of the total energy representing an equilibrium structure. This feature is embedded into HQS Molecules through the optimization function.

Keyword arguments for geometry optimizations

In addition to the arguments for single-point energy calculations, the following options are available:

  • opt_level with the possible optimization levels as defined by xTB. In this wrapper, the levels are defined in the XTBOptLevel class. By default, the very tight criterion is employed (XTBOptLevel.VTight), whereas the xTB software itself usesXTBOptLevel.Normal. For the definition of the options, ordered here from low to high accuracy, please refer to the xTB documentation:

    • Crude
    • Sloppy
    • Loose
    • Lax
    • Normal
    • Tight
    • VTight
    • Extreme
  • max_cycles is the maximum number of optimization iterations. By default, it is set to None, which means that max_cycles is automatically determined by the xTB software depending on the size of the molecule (check above link).

Output

Calls to the optimization function return a Trajectory object containing the optimization trajectory (Trajectory.structures) as a list of Molecule objects and the energies (in Hartree) for each step (Trajectory.energies) as a list of floats.

Most of the time, the quantities of interest are the optimized geometry and its energy. To access these, a Trajectory object contains two types of attributes: Trajectory.last and Trajectory.last_energy contain the last geometry and its energy, respectively; Trajectory.lowest_energy and Trajectory.lowest contain the lowest energy and the corresponding geometry. Trajectory.lowest_step returns the index of the energetically lowest geometry. In many cases, the last step may also be the one with the lowest energy. Note that the energy might increase slightly at the end of an optimization (within the given tolerance criteria) or it can converge to saddle point, in which case the lowest-energy geometry can be the anywhere along the trajectory, even at its beginning.

The trajectory can be written to an XYZ file using the write or write_with_info (includes energies) functions of the xyz module.

Example

The following box illustrates some general ideas behind executing a geometry optimization, starting from a Molecule object.

from hqs_molecules import xyz
from hqs_molecules.xtb import optimization

# perform optimization
opt_data = optimization(molecule, solvent="water")

# Write the trajectory into an XYZ file
xyz.write_with_info(file="trajectory.xyz", mol=opt_data)

# Optimized structure: here it is assumed to be in the last step.
opt_molecule = opt_data.last  # Molecule object
opt_energy = opt_data.last_energy  # float

Gradient calculation

The nuclear gradient (first derivative of the energy with respect to the atomic coordinates) is calculated at each step of an optimization procedure. An optimization is converged if the energy no longer changes and the (norm of the) gradient is close to zero.

A single-point gradient can be calculated using the gradient function with the same arguments as a single_point procedure. It returns a nested list of floating point numbers, representing an N × 3 matrix (where N is the number of atoms in the molecule). Each value in the list corresponds to the derivative of an atomic position with respect to the x, y or z coordinates, represented in units of Hartree Bohr−1.

Vibrational frequency calculations

The Hessian matrix (a matrix of second derivatives of the energy with respect to the atomic coordinates) is used to determine the nature of a stationary point, vibrational frequencies, molecular vibrational spectra, and thermochemical properties. In xTB, the Hessian matrix is calculated semi-numerically, i.e., from the analytical gradient using finite differences. HQS Molecules provides the frequencies wrapper function to calculate vibrational frequencies and thermochemical corrections.

Keyword arguments for vibrational frequency calculations

In addition to the options for single-point energy calculations, this function also accepts a temperature argument. This allows the user to define the absolute temperature for the calculation of thermochemical corrections. By default, the calculation will be performed at 298.15 K, but any other non-negative temperature can also be specified.

Nature of the input geometry

When calculating vibrational frequencies for a stationary point (an optimized equilibrium structure or a saddle point), it is important to use the same method for the frequencies as for the preceding geometry optimization; the potential energy surfaces of different methods have stationary points at different geometries.

Frequency calculations are often used to ascertain if a stationary point is a minimum or a saddle point. A molecule of N atoms has 3N degrees of freedom: three translational degrees, three rotational degrees (two for linear molecules), and 3N − 6 vibrational degrees (or 3N − 5 for linear molecules). For a minimum, the frequencies associated to those vibrational degrees need to be positive; for a transition state (i.e., a first-order saddle point), exactly one of them needs to be imaginary.

Output

The return value is a tuple with a VibrationalAnalysis and a Thermochemistry object. VibrationalAnalysis objects contain vibrational frequencies and normal modes in their fields and properties:

  • modes: a list of VibrationalMode objects, which contain the vibrational frequency (in cm−1), the infrared (IR) intensity (ir_intensity in km mol−1) as well as the Cartesian displacements of the normal mode as attributes.
  • frequencies: a list with the vibrational frequencies (in cm−1) in strictly increasing order. This list contains 3N − 6 elements (or 3N − 5 for linear molecules). By convention, imaginary frequencies are represented as negative numbers.
  • hessian: nested lists representing a 3N × 3N matrix with the nuclear Hessian (in Hartree Bohr−2).
  • reduced_masses: a list with the reduced masses (in atomic mass units) associated with the normal modes.
  • ir_intensities: a list with the IR intensities (in km mol−1) containing 3N − 6 or 3N − 5 elements.
  • displacements: a list with the Cartesian displacements for each normal mode that are normalized to unity.
  • is_linear: a property flag indicating whether the molecule is linear or not.
  • natoms: the number of atoms in the system, inferred from the dimensions of the displacements.

Thermochemistry objects contain thermodynamic data calculated with the modified rigid rotor and harmonic oscillator (mRRHO) approximation. The following data is accessible:

  • temperature: the temperature (in K) used to calculate the thermodynamic properties.
  • energy: the total electronic single-point energy (in Hartree)
  • enthalpy: the enthalpy (in Hartree), including the total electronic energy.
  • entropy: the entropy (in Hartree K-1).
  • gibbs_energy: the Gibbs energy (in Hartree) as a property. It not stored explicitly, but calculated from the enthalpy, entropy and temperature in the object. Like the enthalpy, it includes the total electronic energy.

When using an implicit solvation model, the xTB code calculates a solvation contribution, labelled as "Gsolv" in the output of the xTB binary. Even though "Gsolv" is technically a free energy contribution, it is reported as part of the total electronic energy by the xTB code. Therefore, the solvation contribution is also part of energy in the Thermochemistry object returned after a Hessian calculation with the xtb module.

Example

from hqs_molecules.xtb import frequencies

vib_analysis, thermochem = frequencies(opt_molecule, temperature=100, solvent="water")

print("Single point energy:", thermochem.energy)
print("Frequencies:", vib_analysis.frequencies)
print(
    f"At {thermochem.temperature} K, "
    f"the enthalpy is {thermochem.enthalpy} Eh, "
    f"the entropy is {thermochem.entropy} Eh/K, and "
    f"the Gibbs energy is {thermochem.gibbs_energy} Eh"
)

Stable optimization: calculating equilibrium structures reliably

With a simple geometry optimization, there is no guarantee to obtain a minimum on the potential energy surface; a converged structure can also be a saddle point. Vibrational frequencies need to be calculated for a full characterization. Only if all of them are non-negative, a minimum has been found.

If imaginary frequencies are found, a displacement along the associated normal mode is a good starting point for a new optimization. In difficult cases, optimizations followed by frequency calculations may need to be repeated multiple times in order to find a genuine local minimum.

Such a composite calculation is provided by HQS Molecules with the opt_with_hess function. It repeats the Hessian and optimization calculations until no imaginary frequencies are found. If the maximum number of restarts (indicated by the max_restarts keyword, 4 by default) is reached without converging to a minimum, an exception is raised by default. The latter behavior can be altered by setting fail_with_saddle to False, in which case the non-converged trajectory is returned together with the Hessian result from the lowest-energy structure. The latter can be obtained from the lowest property and can be used for a successive calculation.

Note that the restart geometries used in opt_with_hess are generated by the xTB code upon encountering a saddle point in a Hessian calculation. These are suited to displace the geometry away from a saddle point. As there is no control over the direction along a normal mode with imaginary frequency, the feature is meant to converge to a nearby local minimum, but not for a systematic exploration of reaction paths.

Input

Since the function combines optimizations with frequencies, it requires the same arguments as the optimization function, with an option to provide a temperature argument (for the thermodynamic properties) and max_restarts.

Output

The return value is a tuple containing a Trajectory object (as for an optimization), and VibrationalAnalysis and Thermochemistry objects (as for a frequency calculation). The trajectory concatenates the geometries and energies of all optimizations performed within the procedure. On the other hand, the VibrationalAnalysis and Thermochemistry objects contains information for the converged geometry, determined by the overall lowest energy. If fail_with_saddle is set to False, the appropriate results for the lowest-energy structure are returned, even if it could not be converged to a stable minimum.

Example

A simple call to this procedure can be made as exemplified below:

from hqs_molecules import xtb, xyz
molecule = xyz.read("input_geometry.xyz").to_molecule(charge=0, multiplicity=1)
trajectory, vib_analysis, thermochem = xtb.opt_with_hess(molecule)

Biased Hessian

Hessian calculations for vibrational frequencies or thermochemical contributions are normally performed at stationary points of the potential energy surface. Therefore, the geometry optimization and the Hessian calculation need to be performed with the same method.

However, Spicher and Grimme proposed a method ("single-point Hessian") to calculate the Hessian for stationary points that were optimized with a different method. Therefore, the geometry can be optimized with a more expensive method, while xTB is used to obtain a quantity resembling the Hessian. In a nutshell, it employs the following steps:

  1. The input structure is re-optimized under application of a biasing potential, which constrains the atoms to remain close to their original positions.
  2. The Hessian is calculated at the new, slightly distorted geometry under the application of the biasing potential.
  3. Finally, the frequencies are re-scaled to reduce the impact of the biasing potential.

HQS Moleculesprovides the function biased_hessian to interface the described method in the xTB package. Its usage is analogous to the function frequencies described above.