Table 1 Numbers of entries in three datasets obtained from ChEMBL, BindingDB and PubChem, respectively

From: An end-to-end method for predicting compound-protein interactions based on simplified homogeneous graph convolutional network and pre-trained language model

| Split | Dataset | Compounds | Targets | Positive samples | Negative samples | Total samples |
|---|---|---|---|---|---|---|
| Training set | ChEMBL | 273652 | 3451 | 256590 | 169642 | 426232 |
| Test sets | BindingDB | 33916 | 1131 | 14265 | 14191 | 28456 |
| Test sets | PubChem | 27307 | 224 | 449 | 36581 | 37030 |
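In each row, the positive and negative counts add up to the total. The class balance differs sharply between the datasets: BindingDB is roughly balanced, while PubChem is dominated by negatives. The following is a minimal sketch, not part of the original work, that transcribes the counts from Table 1, checks that positives + negatives = total, and computes the fraction of positive samples. The names `DATASETS` and `check_and_summarize` are hypothetical, and the positive fraction is a derived figure that the table itself does not report.

```python
# Counts transcribed from Table 1:
# (compounds, targets, positive samples, negative samples, total samples).
# The dictionary name and keys are illustrative, not from the source.
DATASETS = {
    "ChEMBL (training)": (273652, 3451, 256590, 169642, 426232),
    "BindingDB (test)": (33916, 1131, 14265, 14191, 28456),
    "PubChem (test)": (27307, 224, 449, 36581, 37030),
}


def check_and_summarize(datasets):
    """Verify pos + neg == total for each row and return the positive fraction."""
    summary = {}
    for name, (compounds, targets, pos, neg, total) in datasets.items():
        if pos + neg != total:
            raise ValueError(f"{name}: {pos} + {neg} != {total}")
        summary[name] = pos / total
    return summary


if __name__ == "__main__":
    for name, frac in check_and_summarize(DATASETS).items():
        print(f"{name}: {frac:.1%} positive")
```

With the counts above, about 60.2% of the ChEMBL samples are positive, compared with about 50.1% for BindingDB and about 1.2% for PubChem. Because of this imbalance, accuracy alone may be a misleading metric on the PubChem test set.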