Molecular Name Lookup

HQS Molecules provides API access to the PubChem database, permitting name-to-structure and structure-to-name searches directly via Python scripts.

IMPORTANT: If you use the PubChem interface, data contained in the requests will be sent over the internet to the PubChem servers.
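To make concrete what kind of data leaves your machine, the following minimal sketch builds (without sending) a lookup URL for PubChem's public PUG REST interface. It is an illustration only: whether HQS Molecules uses this exact endpoint internally is an assumption, and the helper `name_lookup_url` is hypothetical and not part of the package.

```python
from urllib.parse import quote

# Base address of PubChem's public PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def name_lookup_url(name: str) -> str:
    """Build (but do not send) a PUG REST URL for a name lookup.

    Hypothetical helper for illustration; not part of HQS Molecules.
    """
    # Percent-encode the name so that spaces and slashes are safe in a URL path.
    encoded = quote(name, safe="")
    return f"{PUG_REST}/compound/name/{encoded}/property/Title,MolecularFormula/JSON"


# The compound name appears in the URL itself, so it is visible to the server.
print(name_lookup_url("acetylsalicylic acid"))
```

The point of the sketch is that the search term is part of the request, so confidential compound names or structures should not be looked up this way.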

Requests to PubChem are made via the PubChem data class, which stores the data retrieved after completion of each request. In the following example, a request is made using the molecule name:

from hqs_molecules import PubChem
pc = PubChem.from_name("2-methylprop-1-ene")
print(pc.name)
# Isobutylene
print(pc.cid)
# 8255
print(pc.formula)
# C4H8
print(pc.smiles)
# CC(=C)C

As shown above, the retrieved data is stored in the following attributes:

  • name for the compound name,
  • smiles for the SMILES string,
  • cid for the PubChem compound ID, and
  • formula for the molecular formula (as a MolecularFormula object).

Note: the name stored in the name attribute may differ from the argument supplied for the request. Most compound entries on PubChem list multiple synonymous names, whereas the name attribute contains the representative title name selected by PubChem. This is illustrated in the example below:

from hqs_molecules import PubChem
pc = PubChem.from_name("acetylsalicylic acid")
print(pc.name)
# Aspirin
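Because the returned title can differ from the search term, a script may want to detect this case, for example to log which title a synonym was resolved to. The sketch below relies only on the name attribute described above. The helper `resolved_to_other_title` is hypothetical, and a stand-in object replaces the actual lookup so that the example runs without network access.

```python
from types import SimpleNamespace


def resolved_to_other_title(query: str, record) -> bool:
    """Return True if the record's title differs from the name searched for.

    Hypothetical helper; it only assumes that `record` has a `name` attribute.
    The comparison ignores case and surrounding whitespace.
    """
    return query.strip().casefold() != record.name.strip().casefold()


# Stand-in for a lookup result so that the sketch runs offline; with
# HQS Molecules this would be PubChem.from_name("acetylsalicylic acid").
record = SimpleNamespace(name="Aspirin")

if resolved_to_other_title("acetylsalicylic acid", record):
    print(f"'acetylsalicylic acid' is listed under the title '{record.name}'")
```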

Likewise, a reverse search can be performed using a SMILES string:

from hqs_molecules import PubChem
pc = PubChem.from_smiles("COC1=C(C=CC(=C1)C=O)O")
print(pc.name)
# Vanillin

Search by PubChem Compound Identifier

Finally, requests can be made using the PubChem compound identifier:

from hqs_molecules import PubChem
pc = PubChem.from_cid(145742)
print(pc.name)
# Proline
print(pc.smiles)
# C1C[C@H](NC1)C(=O)O

An important feature of the PubChem class is that it verifies the depositor of each data item retrieved: a data item is only made available if PubChem itself is listed as its data source.
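The idea of such a source check can be sketched as follows. This is not the implementation used by HQS Molecules: the `DataItem` record, the `trusted_value` helper, and the sample entries are all hypothetical and only illustrate filtering by data source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataItem:
    """Hypothetical record: a retrieved value and the source that deposited it."""
    label: str
    value: str
    source: str


def trusted_value(items: list[DataItem], label: str,
                  trusted_source: str = "PubChem") -> Optional[str]:
    """Return the value for `label` only if it comes from the trusted source."""
    for item in items:
        if item.label == label and item.source == trusted_source:
            return item.value
    # No item with this label was deposited by the trusted source.
    return None


# Made-up sample entries, for illustration only.
items = [
    DataItem("SMILES", "CC(=C)C", source="Some other depositor"),
    DataItem("SMILES", "CC(=C)C", source="PubChem"),
    DataItem("Molecular Formula", "C4H8", source="Some other depositor"),
]

print(trusted_value(items, "SMILES"))
print(trusted_value(items, "Molecular Formula"))
```

In this sketch, a value deposited only by another source is treated as unavailable rather than being passed on.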

As shown in the section on 2D to 3D structure conversion, a SMILES string can be used to generate a three-dimensional representation.