Table 3 Comparison of BERT and ECFP Feature Representations

From: Molecular property prediction using pretrained-BERT and Bayesian active learning: a data-efficient approach to drug design

| Metric | BERT | ECFP | Description |
| --- | --- | --- | --- |
| Davies-Bouldin score | 6.046 | 9.529 | Measures ratio of within-cluster scatter to between-cluster separation; lower is better |
| Positive class purity | 0.154 | 0.091 | Average fraction of positive samples in neighborhoods around positive samples |
| Negative class purity | 0.961 | 0.955 | Average fraction of negative samples in neighborhoods around negative samples |
| Fisher’s ratio | 0.054 | 0.021 | Ratio of between-class to within-class variance; higher values indicate better class separation |
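
The class purity and Fisher’s ratio descriptions in Table 3 can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the table does not state how neighborhoods are defined or how variance is aggregated across feature dimensions, so the k-nearest-neighbor neighborhood, the Euclidean distance, the default `k`, and the scatter-sum form of the ratio are all assumptions, and the function names `class_purity` and `fisher_ratio` are hypothetical.

```python
def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_purity(X, y, target, k=2):
    """Average fraction of `target`-labelled samples among the k nearest
    neighbours of each `target`-labelled sample.

    Assumption: a "neighborhood" is the k nearest other points under
    Euclidean distance; the source table does not specify this.
    """
    fractions = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != target:
            continue
        # Rank every other sample by its distance to the current one.
        others = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: _sq_dist(xi, X[j]),
        )
        neighbours = others[:k]
        same = sum(1 for j in neighbours if y[j] == target)
        fractions.append(same / len(neighbours))
    return sum(fractions) / len(fractions)


def fisher_ratio(X, y):
    """Between-class scatter divided by within-class scatter.

    Assumption: scatter is summed over all feature dimensions
    (a trace-style ratio); other multivariate forms exist.
    """
    dim = len(X[0])
    overall = [sum(x[d] for x in X) / len(X) for d in range(dim)]
    between = 0.0
    within = 0.0
    for c in set(y):
        members = [x for x, label in zip(X, y) if label == c]
        mean_c = [sum(x[d] for x in members) / len(members) for d in range(dim)]
        # Class mean's distance from the overall mean, weighted by class size.
        between += len(members) * _sq_dist(mean_c, overall)
        # Spread of each class member around its own class mean.
        within += sum(_sq_dist(x, mean_c) for x in members)
    return between / within


if __name__ == "__main__":
    # Toy 2-D features: label 1 = positive, label 0 = negative.
    X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    y = [1, 1, 1, 0, 0, 0]
    print("positive purity:", class_purity(X, y, target=1))
    print("negative purity:", class_purity(X, y, target=0))
    print("Fisher ratio:", fisher_ratio(X, y))
```

Under these assumptions, a representation that places same-class samples closer together yields higher purity values and a larger Fisher’s ratio, which matches the direction of the BERT versus ECFP comparison in the table. The Davies-Bouldin score is not reimplemented here; scikit-learn provides it as `sklearn.metrics.davies_bouldin_score`.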