Table 3 Prediction results of 22 ADMET data sets

From: A BERT-based pretraining model for extracting molecular structural information from a SMILES sequence

 

| Data set | Task type | Sample size | Metric | 2-encoder | BERT MLM | non-pretrain |
|---|---|---|---|---|---|---|
| ames | Classification | 7278 | AUROC | 0.829 | 0.818 | 0.754 |
| bbb_martins | Classification | 1975 | AUROC | 0.881 | 0.875 | 0.846 |
| bioavailability_ma | Classification | 640 | AUROC | 0.605 | 0.749 | 0.669 |
| caco2_wang | Regression | 910 | MAE | 0.348 | 0.372 | 0.423 |
| clearance_hepatocyte_az | Regression | 1213 | Spearman | 0.435 | 0.396 | 0.363 |
| clearance_microsome_az | Regression | 1102 | Spearman | 0.633 | 0.518 | 0.375 |
| cyp2c9_substrate_carbonmangels | Classification | 669 | AUPRC | 0.336 | 0.377 | 0.38 |
| cyp2c9_veith | Classification | 12092 | AUPRC | 0.758 | 0.739 | 0.68 |
| cyp2d6_substrate_carbonmangels | Classification | 667 | AUPRC | 0.722 | 0.608 | 0.581 |
| cyp2d6_veith | Classification | 13130 | AUPRC | 0.656 | 0.631 | 0.584 |
| cyp3a4_substrate_carbonmangels | Classification | 670 | AUROC | 0.655 | 0.645 | 0.575 |
| cyp3a4_veith | Classification | 12328 | AUPRC | 0.843 | 0.847 | 0.78 |
| dili | Classification | 475 | AUROC | 0.872 | 0.838 | 0.852 |
| half_life_obach | Regression | 667 | Spearman | 0.088 | 0.405 | 0.149 |
| herg | Classification | 655 | AUROC | 0.793 | 0.775 | 0.836 |
| hia_hou | Classification | 578 | AUROC | 0.98 | 0.984 | 0.98 |
| ld50_zhu | Regression | 7385 | MAE | 0.583 | 0.683 | 0.635 |
| lipophilicity_astrazeneca | Regression | 4200 | MAE | 0.586 | 0.613 | 0.802 |
| pgp_broccatelli | Classification | 1218 | AUROC | 0.929 | 0.885 | 0.903 |
| ppbr_az | Regression | 2790 | MAE | 8.578 | 8.697 | 9.081 |
| solubility_aqsoldb | Regression | 9982 | MAE | 0.899 | 0.838 | 0.907 |
| vdss_lombardo | Regression | 1130 | Spearman | 0.505 | 0.545 | 0.478 |

  1. Numbers in bold indicate the best results among the three models
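
Which score counts as "best" depends on the metric: AUROC, AUPRC and Spearman correlation are better when larger, while MAE is an error and is better when smaller. The sketch below is one way to apply that rule to rows of the table. The names `HIGHER_IS_BETTER` and `best_models` are hypothetical helpers introduced for illustration, not part of the paper, and only three rows are copied in as sample input.

```python
# Direction of each metric used in the table.
# Assumption: standard conventions (MAE is an error, so smaller is better).
HIGHER_IS_BETTER = {"AUROC": True, "AUPRC": True, "Spearman": True, "MAE": False}


def best_models(metric, scores):
    """Return the model name(s) with the best score for the given metric."""
    pick = max if HIGHER_IS_BETTER[metric] else min
    best = pick(scores.values())
    # Keep every model that reaches the best value, so ties are reported.
    return [name for name, value in scores.items() if value == best]


# Three rows copied from the table: (metric, scores per model).
rows = {
    "ames": ("AUROC", {"2-encoder": 0.829, "BERT MLM": 0.818, "non-pretrain": 0.754}),
    "caco2_wang": ("MAE", {"2-encoder": 0.348, "BERT MLM": 0.372, "non-pretrain": 0.423}),
    "half_life_obach": ("Spearman", {"2-encoder": 0.088, "BERT MLM": 0.405, "non-pretrain": 0.149}),
}

for data_set, (metric, scores) in rows.items():
    print(data_set, metric, best_models(metric, scores))
```

For `caco2_wang` the lowest MAE is selected, whereas for `ames` and `half_life_obach` the highest score is selected.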