Input of molecular NMR parameters

Text input via a YAML file

NMR parameters for molecules can be provided in a YAML file. A brief summary of relevant YAML features is provided before proceeding to more detailed explanations and examples.

  • Dictionaries in YAML are defined as key: value pairs. Most commonly, a dictionary contains one key/value pair per line:
    key 1: value 1
    key 2: value 2
    
  • value 1 is interpreted as a string, 1 as an integer, and 1.0 as a floating-point number. To avoid problems with special characters (e.g. square brackets), strings may be enclosed in single or double quotes. The two forms differ in meaning: double quotes process escape sequences, whereas single quotes take the string literally, so single quotes should be preferred.
  • Lists can be defined over multiple lines as:
    - item 1
    - item 2
    
    Lists can also be enclosed in square brackets: [item 1, item 2, item 3]. The nested list [[1, 2, 3], [4, 5, 6]] is equivalent to:
    - [1, 2, 3]
    - [4, 5, 6]
    
  • Indentation is part of the syntax: key/value pairs or list entries over multiple lines need to have the same number of leading spaces (no tabs).
  • Comments start with a hash, #.
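
For orientation, the following sketch lists the Python objects that a standard YAML loader would be expected to produce for the constructs summarized above. The literals are written out by hand rather than parsed, so no YAML library is needed to run it, and the variable names are purely illustrative.

```python
# Python equivalents of the YAML constructs summarized above
# (written by hand, not produced by a parser).

# "key 1: value 1" and "key 2: value 2" form a dictionary with string values.
mapping = {"key 1": "value 1", "key 2": "value 2"}

# "value 1" is a string, "1" an integer, "1.0" a floating-point number.
scalars = ["value 1", 1, 1.0]

# "- item 1" and "- item 2" form a list of strings.
block_list = ["item 1", "item 2"]

# Both spellings of the nested list describe this object.
nested_list = [[1, 2, 3], [4, 5, 6]]

print([type(value).__name__ for value in scalars])
```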

Definition of the molecular structure

Definition using SMILES

A molecular structure needs to be provided along with its NMR parameters. The simplest way to define a structure in the input file is through its SMILES representation. This is done using the key smiles, followed by a representation of the molecule:

# Acetic acid defined using SMILES.
smiles: CC(=O)O

SMILES strings often contain square brackets [...]. In such cases, the string should be enclosed within quotes, '...', to avoid problems with the YAML parser.
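
When input files are generated from a script, the quoting rule can be applied automatically. The following is a minimal sketch; the helper name smiles_yaml_line is hypothetical and not part of any package.

```python
def smiles_yaml_line(smiles: str) -> str:
    """Return a 'smiles: ...' YAML line, quoting the value if it contains brackets."""
    if "[" in smiles or "]" in smiles:
        # Inside YAML single quotes, a literal single quote is written twice.
        escaped = smiles.replace("'", "''")
        return f"smiles: '{escaped}'"
    return f"smiles: {smiles}"


print(smiles_yaml_line("CC(=O)O"))
print(smiles_yaml_line("[13CH3]Cl"))
```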

Definition using a Molfile

Manual definition of increasingly large molecules using SMILES can be cumbersome. For example, the string representation of penicillin V would be:

smiles: CC1(C)S[C@@H]2[C@H](NC(=O)COc3ccccc3)C(=O)N2[C@H]1C(=O)O

Instead, it is easier to draw a graphical representation such as the one below using one of many available proprietary or open source packages.

Such structural 2D representations are commonly stored in "Molfiles". A molecule can be read from a Molfile by specifying the file name after the key molfile:

molfile: penicillin_v.mol

Note that the YAML input must contain exactly one structure definition: either a Molfile or a SMILES string, but not both. Both the V2000 and V3000 variants of the Molfile specification are supported in the input.
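
The requirement of a single structure definition can be checked before the file is handed to the parser. The sketch below works on a plain dictionary and considers only the two keys discussed in this section; the function name structure_source is hypothetical, and the actual parser may report the problem differently.

```python
def structure_source(data: dict) -> str:
    """Return 'smiles' or 'molfile', whichever single key is present in data."""
    present = [key for key in ("smiles", "molfile") if key in data]
    if len(present) != 1:
        raise ValueError(
            f"expected exactly one of 'smiles' or 'molfile', found: {present}"
        )
    return present[0]


print(structure_source({"smiles": "CC(=O)O", "count from": 0}))
```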

Hydrogens in the molecular structure

Common representations of molecular structure, whether as skeletal formulas or as SMILES, tend to omit hydrogens. Instead, the number of hydrogen atoms is inferred from the atomic valencies, especially those of the carbons. Any of the following three structures can be provided as a Molfile:

Where hydrogens are suppressed (not drawn out as separate atoms with a bond), their NMR parameters are specified through assignment to the respective skeletal atom. In the leftmost of the three structures shown above, it would not be possible to assign different parameters to the two protons in the CH2 group. Instead, one of the two other structures shown above could be used to specify different parameters for those protons.

The only restriction with regard to hydrogens is that any skeletal atom can be connected either to suppressed or to standalone hydrogens, but not to a mixture of both. Thus, the following two structures would be rejected during input parsing:

The structure to the left mixes a non-suppressed hydrogen with a suppressed "implicit" hydrogen (CH) on the carbon atom; the structure to the right mixes a non-suppressed hydrogen with a suppressed "explicit" hydrogen (NH) on the nitrogen atom.
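
The rule can be stated compactly in code. The sketch below assumes a simplified, hypothetical data model in which each skeletal atom index maps to a pair (number of suppressed hydrogens, number of standalone hydrogens); it illustrates the rule only and is not the check performed by the parser.

```python
def atoms_with_mixed_hydrogens(
    hydrogen_counts: dict[int, tuple[int, int]],
) -> list[int]:
    """Return skeletal atoms carrying both suppressed and standalone hydrogens."""
    return sorted(
        index
        for index, (suppressed, standalone) in hydrogen_counts.items()
        if suppressed > 0 and standalone > 0
    )


# One CH2 carbon with both hydrogens suppressed, one with both drawn out: accepted.
print(atoms_with_mixed_hydrogens({0: (2, 0), 1: (0, 2)}))
# A carbon with one suppressed and one standalone hydrogen: would be rejected.
print(atoms_with_mixed_hydrogens({0: (1, 1)}))
```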

Numbering of atoms

Atoms in the structural representation of the molecule are labelled with integers for the assignment of parameters. Indices can be counted starting from zero or from one. To avoid errors or misunderstandings, it is mandatory to specify a count from key in the input file, followed by either 0 or 1. The choice between those two options is entirely arbitrary. An example for atom counting starting from zero:

# Atom indices are 0, 1, 2, 3 in their order of appearance in the SMILES string.
smiles: CC(=O)O
count from: 0

An example for atom counting starting from one:

# Atom indices are 1, 2, 3, 4 in their order of appearance in the SMILES string.
smiles: CC(=O)O
count from: 1
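
The two conventions differ only by a constant offset. The following sketch converts an index written in either convention to a zero-based index; the helper name to_zero_based is hypothetical. That zero-based indices are used internally is an assumption here, although it is consistent with the acrylonitrile example output shown later in this document.

```python
def to_zero_based(index: int, count_from: int) -> int:
    """Convert an atom index from the chosen counting convention to zero-based."""
    if count_from not in (0, 1):
        raise ValueError("count from must be either 0 or 1")
    if index < count_from:
        raise ValueError(f"index {index} is below the first valid index {count_from}")
    return index - count_from


# The four heavy atoms of CC(=O)O, written with "count from: 1".
print([to_zero_based(i, count_from=1) for i in (1, 2, 3, 4)])
```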

The atoms in a molecule are indexed by their order of appearance in the Molfile. For example, acetamide may be represented by a Molfile with the following content:

     RDKit          2D

  4  3  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2990    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2990    2.2500    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.5981   -0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  2  0
  2  4  1  0
M  END

Counting the atoms starting from 1 (count from: 1), the appropriate indices are represented in the image below.
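
The indices can also be listed programmatically. The sketch below is a deliberately simplified reader for the atom block of a V2000 Molfile, written for illustration only; it is not the parser used by the package, and the function name atom_indices is hypothetical.

```python
# The acetamide Molfile from above; the first header line (the title) is empty.
ACETAMIDE_MOLBLOCK = """\

     RDKit          2D

  4  3  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2990    0.7500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2990    2.2500    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.5981   -0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  2  0
  2  4  1  0
M  END
"""


def atom_indices(molblock: str, count_from: int) -> dict[int, str]:
    """Map atom indices to element symbols for a V2000 Molfile."""
    lines = molblock.splitlines()
    counts_line = lines[3]  # three header lines precede the counts line
    n_atoms = int(counts_line[0:3])  # the atom count occupies the first three columns
    atom_lines = lines[4 : 4 + n_atoms]
    # In each atom line, the element symbol follows the x, y and z coordinates.
    return {
        count_from + offset: line.split()[3]
        for offset, line in enumerate(atom_lines)
    }


print(atom_indices(ACETAMIDE_MOLBLOCK, count_from=1))
```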

Chemical shifts

Providing chemical shifts will be illustrated using the nitrobenzene molecule as an example. Its molecular structure is contained in the file PhNO2.mol and shown in the picture below, including a numbering of its atoms.

All chemical shifts are specified under the key shifts. Additionally, the values need to be nested under keys representing isotopes, which contain the atomic mass number and the element symbol (e.g. 1H or 13C). For each isotope, the chemical shifts are provided in pairs of an atomic index and the associated value in ppm (parts per million):

# Structure with suppressed protons.
molfile: PhNO2.mol
count from: 0
shifts:
  # chemical shifts for protons in ppm
  1H:
    1: 8.23
    2: 7.56
    3: 7.71
    4: 7.56
    5: 8.23
  # chemical shifts for carbon-13 nuclei in ppm
  13C:
    0: 148.5
    1: 123.7
    2: 129.4
    3: 134.3
    4: 129.4
    5: 123.7

To assign chemical shifts for suppressed protons (that are not provided explicitly in the skeletal structure), the indices of the respective non-hydrogen atoms are used instead. All suppressed protons connected to the same atom are assigned an identical shift value.
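
Once loaded, the shifts section is a nested dictionary keyed by isotope labels. As a small illustration, the sketch below splits such a label into mass number and element symbol; the helper name parse_isotope is hypothetical, and the regular expression assumes labels of the form shown in this section (digits followed by an element symbol).

```python
import re


def parse_isotope(label: str) -> tuple[int, str]:
    """Split an isotope key such as '13C' into (mass number, element symbol)."""
    match = re.fullmatch(r"(\d+)([A-Z][a-z]?)", label)
    if match is None:
        raise ValueError(f"not an isotope label: {label!r}")
    return int(match.group(1)), match.group(2)


# Part of the nitrobenzene shifts from above as a nested dictionary.
shifts = {
    "1H": {1: 8.23, 2: 7.56, 3: 7.71},
    "13C": {0: 148.5, 1: 123.7},
}

for label, values in shifts.items():
    mass_number, symbol = parse_isotope(label)
    print(mass_number, symbol, values)
```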

If a Molfile contains hydrogens as standalone atoms, the chemical shifts are assigned to those protons using their respective atom indices. This is illustrated using the file PhNO2_allH.mol. Its structure is shown below.

In this example, the 1H shifts need to be assigned to atoms 9-13. Assigning them to atoms 1-5, as in the previous example, would produce an error.

# Structure with standalone (explicit) protons.
molfile: PhNO2_allH.mol
count from: 0
shifts:
  # chemical shifts for protons in ppm
  1H:
    9: 8.23
    10: 7.56
    11: 7.71
    12: 7.56
    13: 8.23
  # chemical shifts for carbon-13 nuclei in ppm
  13C:
    0: 148.5
    1: 123.7
    2: 129.4
    3: 134.3
    4: 129.4
    5: 123.7

Indirect spin-spin coupling constants

Indirect spin-spin coupling constants are provided under the key J-couplings in the YAML file. Additionally, the coupling constant values need to be grouped together by isotopes. Keys for each combination of isotopes are combined as isotope1-isotope2: e.g., 1H-1H for coupling constants between two protons or 1H-13C for the associated heteronuclear coupling.

J-coupling constants in units of Hz for each combination of nuclei are provided as a list of lists with the following structure:

J-couplings:
  isotope1-isotope2:
    - [atom index 1, atom index 2, coupling constant in Hz]
    - [atom index 1, atom index 2, coupling constant in Hz]
    - [...]
  isotope1-isotope2:
    - [atom index 1, atom index 2, coupling constant in Hz]
    - [...]

The first atom index refers to the first isotope and the second atom index refers to the second isotope. As with shifts, values for suppressed hydrogens are assigned via the associated skeletal carbon or heteroatom. If multiple protons are connected to the same skeletal atom, they are assigned the same coupling constant. Inequivalent protons attached to the same skeletal atom need to be specified explicitly as standalone atoms in the molecule definition, so that they can be referred to via their respective atom indices.
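
For a homonuclear block such as 1H-1H, each atom pair should appear only once, regardless of the order in which the two indices are written. The following sketch turns a list of [index 1, index 2, value] entries into a dictionary keyed by ordered pairs and rejects repeated pairs. The function name unique_couplings is hypothetical, and raising an error on duplicates is an assumption of this sketch rather than documented parser behavior.

```python
def unique_couplings(entries: list[list[float]]) -> dict[tuple[int, int], float]:
    """Collect J-couplings of one homonuclear block, keyed by (lower, higher) index."""
    couplings: dict[tuple[int, int], float] = {}
    for index1, index2, value in entries:
        pair = (int(min(index1, index2)), int(max(index1, index2)))
        if pair in couplings:
            raise ValueError(f"coupling for atoms {pair} given more than once")
        couplings[pair] = float(value)
    return couplings


# The entry [6, 4, 11.8] is stored under the pair (4, 6).
print(unique_couplings([[4, 5, 0.9], [6, 4, 11.8], [5, 6, 17.9]]))
```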

Examples

1H parameters for propane with SMILES input

smiles: CCC
count from: 0
shifts:
  1H:
    0: 0.9
    1: 1.3
    2: 0.9
J-couplings:
  1H-1H:
    - [0, 1, 7.26]
    - [1, 2, 7.26]

The protons at the terminal CH3 groups are assigned chemical shifts of 0.9 ppm each, and the protons of the central CH2 group a value of 1.3 ppm. J-couplings between all protons of the neighboring CH3 and CH2 groups are assigned as 7.26 Hz. While an indirect spin-spin coupling interaction exists between equivalent protons within the CH3 and CH2 groups, it is not observed in the spectrum and the associated values are left out.

1H parameters for acrylonitrile

The structure of acrylonitrile is provided as a Molfile in acrylonitrile.mol. Its depiction is shown below.

Since protons 4 and 5 are inequivalent, they are specified as standalone atoms with different parameters. In addition, hydrogen 6 is represented as a standalone atom, though suppressing it would be an equally valid choice.

molfile: acrylonitrile.mol
count from: 1
shifts:
  1H:
    4: 5.79  # H(trans)
    5: 5.97  # H(cis)
    6: 5.48  # H(gem)
J-couplings:
  1H-1H:
    - [4, 5, 0.9]
    - [4, 6, 11.8]
    - [5, 6, 17.9]

Combined 1H and 13C parameters for chloromethane

To illustrate the definition of heteronuclear coupling constants, the following example shows parameters for 13C-enriched chloromethane. The parameters include the shifts of the three protons and the 13C nucleus, as well as the coupling constants between these nuclei.

# CH3Cl with 13C, the hydrogens inside square brackets are implicit.
smiles: '[13CH3]Cl'
# C will have index 1 and Cl will have index 2
count from: 1
shifts:
  # Shifts of the three protons.
  1H:
    1: 3.05
  # Shift of carbon-13 in the molecule.
  13C:
    1: 25.6
J-couplings:
  # Coupling between the three protons (would normally not be observed).
  1H-1H:
    - [1, 1, -10.8]
  # Coupling between the three protons and the 13C atom.
  1H-13C:
    - [1, 1, 150.0]

Representation in Python data structures

In Python, a YAML file containing NMR parameters is parsed using the following function:

from hqs_nmr_parameters import read_parameters_yaml
parameters = read_parameters_yaml('input_file.yaml')

The read_parameters_yaml function can read the following keywords from a YAML file:

  • name: An optional name of the molecule.
  • shifts: Chemical shifts in format {isotope: {index: value}}.
  • j_couplings/J-couplings: J-coupling values in format {isotope1-isotope2: [[index1, index2, value], ...]}.
  • count from/count_from: Specifies whether to count atomic indices starting from zero or from one.
  • smiles: SMILES string of the molecule.
  • molfile: Path to a Molfile with the molecular structure.
  • molblock: Compressed Molfile content (not in clear text).
  • temperature: Temperature in K.
  • solvent: Name of the solvent.
  • description: Optional further description.
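
Two of the keywords above are accepted in two spellings. A script that pre-processes input data may want to map them onto a single form first. The following is a minimal sketch on a plain dictionary; the names ALIASES and normalize_keys are hypothetical, and choosing the underscore spelling as the canonical one is an arbitrary decision of this example.

```python
# Alternative spellings listed above, mapped to one canonical form.
ALIASES = {"J-couplings": "j_couplings", "count from": "count_from"}


def normalize_keys(data: dict) -> dict:
    """Return a copy of data in which alias keywords use their canonical spelling."""
    normalized = {}
    for key, value in data.items():
        canonical = ALIASES.get(key, key)
        if canonical in normalized:
            raise ValueError(f"keyword given twice: {canonical}")
        normalized[canonical] = value
    return normalized


print(normalize_keys({"smiles": "CCC", "count from": 0, "J-couplings": {}}))
```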

parameters is an instance of the Pydantic MolecularData class. It contains the following attributes:

  • name: The name of the molecule.
  • isotopes: List containing pairs of an atom index and the associated isotope.
  • shifts: List containing pairs of an atom index and the associated chemical shift.
  • j_couplings: List containing pairs of atomic indices and the associated J-coupling values. Note that atom index pairs are unique: if a value is provided for an atom pair (k, l), then no value is provided for pair (l, k).
  • structures: Contains the chemical structure representations. These can be from a SMILES string, a Molfile, or an XYZ file.
  • formula: The molecular formula of the molecule.
  • temperature: An optional temperature definition.
  • solvent: Name of the solvent. An empty string represents an unknown or undefined solvent, or the absence of a solvent.
  • description: Optional further information.
  • method_json: Stores a JSON serialization of computational method settings. An empty string indicates that the field is not applicable. Creating and interpreting the content is the responsibility of the user of the model.

An NMR calculation requires only a reduced set of parameters, namely the attributes isotopes, shifts, and j_couplings. This reduced set can be obtained using the spin_system method of MolecularData:

nmr_parameters = parameters.spin_system()

nmr_parameters is an instance of the Pydantic NMRParameters class. It is used as input for several functions to calculate an NMR spectrum.

type(parameters)      # MolecularData
type(nmr_parameters)  # NMRParameters
spectrum = calculate_spectrum(molecule_parms=nmr_parameters, frequency=400.0)
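
To make the notion of a reduced parameter set concrete, the following sketch selects the three attributes named above from a plain dictionary. It is only a stand-in operating on ordinary Python data, not the implementation of spin_system; the helper name reduced_parameters and the sample values are hypothetical.

```python
def reduced_parameters(full: dict) -> dict:
    """Keep only the entries needed for an NMR calculation."""
    return {key: full[key] for key in ("isotopes", "shifts", "j_couplings")}


# Hand-written sample data for two coupled protons (hypothetical values).
full = {
    "name": "example",
    "isotopes": [(0, (1, "H")), (1, (1, "H"))],
    "shifts": [(0, 1.0), (1, 2.0)],
    "j_couplings": [((0, 1), 7.0)],
    "solvent": "",
    "temperature": None,
}

print(sorted(reduced_parameters(full)))
```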

Example inputs

The hqs_nmr_parameters package comes with a set of example molecule definitions. Their names and descriptions can be listed as follows:

from pprint import pprint
from hqs_nmr_parameters import examples
# a dictionary containing {molecule name: description} key-value pairs
pprint(examples.molecule_names)

This prints:

{'C10H7Br': '1H parameters for 2-bromonaphthalene.',
 'C10H8': '1H parameters for naphthalene.',
 'C2H3CN': '1H parameters for acrylonitrile.',
 'C2H5Cl': '1H parameters for chloroethane.',
 'C2H6': 'Fantasy parameters for ethane (to test behavior for 2 groups of 1H)',
 'C3H8': '1H parameters for propane.',
 'C6H5NO2': '1H parameters for nitrobenzene.',
 'C6H6': '1H parameters for benzene.',
 'CH3Cl': 'Methyl chloried: 1H parameters.',
 'CH3Cl_13C': 'Methyl chloride enriched with 13C: 1H and 13C parameters.',
 'CHCl3': 'Chloroform: 1H parameters.',
 'CHCl3_13C': 'Chloroform enriched with 13C: 1H and 13C parameters.'}

Full molecule definitions can be loaded via

from pprint import pprint
from hqs_nmr_parameters import examples
# obtain the parameters dictionary for acrylonitrile
parameters = examples.molecule_parameters['C2H3CN']
# print parameters
pprint(parameters.model_dump())

This prints:

{'description': '',
 'formula': 'C3H3N',
 'isotopes': [(3, (1, 'H')), (4, (1, 'H')), (5, (1, 'H'))],
 'j_couplings': [((3, 4), 0.9), ((3, 5), 11.8), ((4, 5), 17.9)],
 'method_json': '',
 'name': 'C2H3CN',
 'shifts': [(3, 5.79), (4, 5.97), (5, 5.48)],
 'solvent': '',
 'structures': {'Molfile': {'atom_map': [0, 1, 2, 3, 4, 5, 6],
                            'charge': 0,
                            'content': '\n'
                                       'JME 2022-02-26 Wed Sep 07 15:54:28 '
                                       'GMT+200 2022\n'
                                       '\n'
                                       '  0  0  0  0  0  0  0  0  0  0999 '
                                       'V3000\n'
                                       'M  V30 BEGIN CTAB\n'
                                       'M  V30 COUNTS 7 6 0 0 0\n'
                                       'M  V30 BEGIN ATOM\n'
                                       'M  V30 1 C 2.4249 2.1000 0.0000 0\n'
                                       'M  V30 2 C 3.6373 1.4000 0.0000 0\n'
                                       'M  V30 3 C 1.2124 1.4000 0.0000 0\n'
                                       'M  V30 4 H 0.0000 2.1000 0.0000 0\n'
                                       'M  V30 5 H 1.2124 0.0000 0.0000 0\n'
                                       'M  V30 6 H 2.4249 3.5000 0.0000 0\n'
                                       'M  V30 7 N 4.8497 0.7000 0.0000 0\n'
                                       'M  V30 END ATOM\n'
                                       'M  V30 BEGIN BOND\n'
                                       'M  V30 1 1 1 2\n'
                                       'M  V30 2 2 1 3\n'
                                       'M  V30 3 1 3 4\n'
                                       'M  V30 4 1 3 5\n'
                                       'M  V30 5 1 1 6\n'
                                       'M  V30 6 3 2 7\n'
                                       'M  V30 END BOND\n'
                                       'M  V30 END CTAB\n'
                                       'M  END\n',
                            'representation': 'Molfile',
                            'symbols': ['C', 'C', 'C', 'H', 'H', 'H', 'N']}},
 'temperature': None}
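
The lists of pairs in this output are easy to turn into lookup tables. The sketch below copies the relevant entries from the output above into plain Python data, so it runs without the package. The helper coupling and its choice to return 0.0 for pairs without an entry are assumptions of this example.

```python
# Entries copied from the printed acrylonitrile parameters.
isotopes = [(3, (1, "H")), (4, (1, "H")), (5, (1, "H"))]
shifts = [(3, 5.79), (4, 5.97), (5, 5.48)]
j_couplings = [((3, 4), 0.9), ((3, 5), 11.8), ((4, 5), 17.9)]

# Lists of (key, value) pairs convert directly into dictionaries.
isotope_of = dict(isotopes)
shift_of = dict(shifts)
# frozenset keys make the lookup independent of the order of the two indices.
coupling_of = {frozenset(pair): value for pair, value in j_couplings}


def coupling(index1: int, index2: int) -> float:
    """Return the J-coupling in Hz between two atoms, or 0.0 if none is listed."""
    return coupling_of.get(frozenset((index1, index2)), 0.0)


print(isotope_of[3], shift_of[3], coupling(5, 3))
```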

The .yaml and .mol input files are stored in the parameters directory next to the examples module. The following code locates the YAML file for acrylonitrile on the file system and prints its content:

from pathlib import Path
from hqs_nmr_parameters import examples
acrylonitrile_yaml = Path(examples.__file__).parent / "parameters" / "C2H3CN.yaml"
print(acrylonitrile_yaml.read_text())

This prints the .yaml input for acrylonitrile:

molfile: C2H3CN.mol
count from: 1
shifts:
  1H:
    4: 5.79  # H(trans)
    5: 5.97  # H(cis)
    6: 5.48  # H(gem)
J-couplings:
  1H-1H:
    - [4, 5, 0.9]
    - [4, 6, 11.8]
    - [5, 6, 17.9]