Table 1 Description of the databases used in the articles retrieved for analysis.

From: A systematic review of deep learning chemical language models in recent era

| Database | Description | Number of molecules (millions) | Molecule representation | Number of articles | Ref |
|---|---|---|---|---|---|
| PubChem | Structural information on mostly small molecules | 115.3 | SMILES and InChI | 4 | [93] |
| ChEMBL | Bioactive molecules with drug-like properties, with their bioactivity data | 2.4 | SMILES and InChI | 27 | [94] |
| ZINC | Structural information on drug-like molecules | 750 | SMILES | 27 | [95] |
| US patent database | Reactions text-mined from United States patents published between 1976 and September 2016 | < 1.8 | SMILES | 1 | [96] |
| DNA-Encoded Libraryᵃ | Molecular structure, combinatorial screening, and DNA-encoding information | 1040 | SMILES | 1 | [97] |
| COCONUT | Structural and biological information on natural products | 0.695 | SMILES and InChI | 1 | [98] |
| LINCS L1000 | A comprehensive resource of gene expression profiles of human cells perturbed by small molecules | > 1 | Not applicable | 1 | [99] |

ᵃ Indicates a database created by the authors of a retrieved article and not publicly available; in this case, the reference cites that article. Numbers of molecules are as reported up to September 2024.