From: A systematic review of deep learning chemical language models in recent era
Database | Description | Number of molecules (millions) | Molecule representation | Number of articles | Ref. |
---|---|---|---|---|---|
PubChem | Structural information of mostly small molecules | 115.3 | SMILES and InChI | 4 | [93] |
ChEMBL | Bioactive molecules with drug-like properties and their bioactivity records | 2.4 | SMILES and InChI | 27 | [94] |
ZINC | Structural information of drug-like molecules | 750 | SMILES | 27 | [95] |
US patent database | Reactions extracted by text mining from United States patents published between 1976 and September 2016 | < 1.8 | SMILES | 1 | [96] |
DNA-Encoded Library<sup>a</sup> | Molecular structure, combinatorial screening, and DNA-encoded information | 1040 | SMILES | 1 | [97] |
COCONUT | Natural products structural and biological information | 0.695 | SMILES and InChI | 1 | [98] |
LINCS1000 | A comprehensive resource of gene expression in human cells perturbed by small molecules | > 1 | Not applicable | 1 | [99] |