Molecular Name Lookup
HQS Molecules provides API access to the PubChem database, permitting name-to-structure and structure-to-name searches directly from Python scripts.
IMPORTANT: If you use the PubChem interface, data contained in the requests will be sent over the internet to the PubChem servers.
Name-to-Structure Search
Requests to PubChem are made via the PubChem data class, which stores the data retrieved once each request completes. In the following example, a request is made using the molecule name:
from hqs_molecules import PubChem
pc = PubChem.from_name("2-methylprop-1-ene")
print(pc.name)
# Isobutylene
print(pc.cid)
# 8255
print(pc.formula)
# C4H8
print(pc.smiles)
# CC(=C)C
As shown above, the data retrieved is stored in the attributes name for the compound name, smiles for the SMILES string, cid for the PubChem compound ID, and formula for the molecular formula (as a MolecularFormula object).
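The four attributes described above lend themselves to tabulating several compounds at once. The following is a minimal sketch, not part of HQS Molecules: the helper summarize and its lookup parameter are hypothetical names introduced here, and the only assumption about the library is that the object returned by a lookup exposes name, cid, formula and smiles as shown in the example above.

```python
from typing import Any, Callable, Dict, Iterable, List


def summarize(names: Iterable[str], lookup: Callable[[str], Any]) -> List[Dict[str, str]]:
    """Collect name, CID, formula and SMILES for each requested name.

    `lookup` is any callable returning an object with the attributes
    name, cid, formula and smiles (hypothetical parameter; in practice
    one would pass PubChem.from_name).
    """
    rows = []
    for query in names:
        record = lookup(query)  # one request per name
        rows.append(
            {
                "query": query,
                "name": str(record.name),
                "cid": str(record.cid),
                # formula is a MolecularFormula object; str() gives the
                # same text that print() shows in the example above
                "formula": str(record.formula),
                "smiles": str(record.smiles),
            }
        )
    return rows


# Usage (requires hqs_molecules and network access):
#   from hqs_molecules import PubChem
#   for row in summarize(["2-methylprop-1-ene"], PubChem.from_name):
#       print(row)
```

Passing the lookup as an argument keeps the helper independent of the network, so it can be exercised with a stand-in object before any request is sent to PubChem.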
Note: the name stored in the name attribute may differ from the argument supplied for the request. Most compound entries on PubChem list multiple synonymous names, but the name attribute contains the representative title name selected on PubChem. This is illustrated in the example below:
from hqs_molecules import PubChem
pc = PubChem.from_name("acetylsalicylic acid")
print(pc.name)
# Aspirin
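Because the title name can differ from the query, it may be useful to record which queries were renamed. The sketch below is one way to do so under stated assumptions: title_names, renamed and the lookup parameter are hypothetical names introduced for illustration, and the only library behaviour relied on is that the returned object has a name attribute.

```python
from typing import Any, Callable, Dict, Iterable


def title_names(queries: Iterable[str], lookup: Callable[[str], Any]) -> Dict[str, str]:
    """Map each query string to the title name reported by the lookup."""
    return {query: str(lookup(query).name) for query in queries}


def renamed(mapping: Dict[str, str]) -> Dict[str, str]:
    """Keep only the queries whose title name differs, ignoring case."""
    return {
        query: title
        for query, title in mapping.items()
        if query.casefold() != title.casefold()
    }


# Usage (requires hqs_molecules and network access):
#   from hqs_molecules import PubChem
#   print(renamed(title_names(["acetylsalicylic acid"], PubChem.from_name)))
```

The comparison ignores case only; it makes no attempt to judge whether two differently spelled names are chemically synonymous.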
Structure-to-Name Search
Likewise, a reverse search can be performed using a SMILES string:
from hqs_molecules import PubChem
pc = PubChem.from_smiles("COC1=C(C=CC(=C1)C=O)O")
print(pc.name)
# Vanillin
Search by PubChem Compound Identifier
Finally, requests can be made using the PubChem compound identifier:
from hqs_molecules import PubChem
pc = PubChem.from_cid(145742)
print(pc.name)
# Proline
print(pc.smiles)
# C1C[C@H](NC1)C(=O)O
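Since every request is sent to the PubChem servers, a script that looks up the same identifier repeatedly may benefit from remembering earlier results. The following sketch uses functools.lru_cache from the Python standard library; the wrapper name cached is hypothetical, and whether HQS Molecules performs any caching of its own is not stated here.

```python
from functools import lru_cache
from typing import Any, Callable


def cached(lookup: Callable[[Any], Any], maxsize: int = 128) -> Callable[[Any], Any]:
    """Wrap a lookup so that each distinct argument is requested only once.

    The argument must be hashable, which holds for names, SMILES strings
    and integer compound identifiers.
    """
    return lru_cache(maxsize=maxsize)(lookup)


# Usage (requires hqs_molecules and network access):
#   from hqs_molecules import PubChem
#   from_cid = cached(PubChem.from_cid)
#   from_cid(145742)  # sends a request
#   from_cid(145742)  # answered from the local cache
```

The cache lives only for the duration of the Python process, so a fresh run of the script sends its requests again.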
An important feature of the PubChem class is that it verifies the depositor of each data item retrieved. Data is only made available if PubChem is listed as the data source.
As shown in the section on 2D to 3D structure conversion, a SMILES string can be used to generate a three-dimensional representation.