Molecular Name Lookup

HQS Molecules provides API access to the PubChem database, permitting name-to-structure and structure-to-name searches directly via Python scripts.

IMPORTANT: If you use the PubChem interface, your data will be sent over the internet to the PubChem servers.

Requests to PubChem are made via the PubChem data class, which stores the data retrieved after completion of each request. In the following example, a request is made using the molecule name:

>>> from hqs_molecules import PubChem
>>> pc = PubChem.from_name("2-methylprop-1-ene")
>>> print(pc.name)
Isobutylene
>>> print(pc.cid)
8255
>>> print(pc.formula)
C4H8
>>> print(pc.smiles)
CC(=C)C
>>> 

As shown above, the data retrieved is stored in the attributes

  • name for the compound name,
  • smiles for the SMILES string,
  • cid for the PubChem compound ID, and
  • formula for the molecular formula (as a MolecularFormula object).

Note: the name stored in the name attribute may differ from the argument supplied for the request: the most entries for a compound on PubChem list multiple synonymous names, but the name attribute contains the representative title name selected on PubChem.

Likewise, a reverse search can be performed using a SMILES string:

>>> pc = PubChem.from_smiles("COC1=C(C=CC(=C1)C=O)O")
>>> print(pc.name)
Vanillin
>>> 

Search by PubChem Compound Identifier

Finally, requests can be made using the PubChem compound identifier:

>>> pc = PubChem.from_cid(145742)
>>> print(pc.name)
Proline
>>> print(pc.smiles)
C1C[C@H](NC1)C(=O)O
>>> 

An important feature of the PubChem class is that it verifies the depositor of individual data items retrieved. Data is only made available if PubChem is listed as the data source.

As shown in the section on 2D to 3D structure conversion, a SMILES string can be used to generate a three-dimensional representation.