Fig. 4 | Journal of Cheminformatics

Fig. 4

From: Molecular property prediction using pretrained-BERT and Bayesian active learning: a data-efficient approach to drug design


Comparison of positive sample acquisition rates across different feature representations and acquisition functions on the ClinTox dataset. The mean and standard error are computed across 10 seeds. The plot shows cumulative toxic compound identification starting from a balanced initial set (50 positive, 50 negative). BERT-EPIG achieved an approximately 2-fold improvement over Random sampling: it identified 70% of toxic compounds (gray horizontal line) in only 266 iterations, compared with approximately 600 iterations for Random sampling. This indicates more efficient exploration of chemical space when starting with limited labeled data.
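The sketch below makes the plotted metric concrete. It computes a cumulative positive-identification curve, finds the first iteration at which that curve crosses a threshold such as the 70% line, and reports the mean and standard error across seeds. It is a minimal sketch under stated assumptions, not the authors' evaluation code. The function names are hypothetical. The input is assumed to be the 0/1 labels of queried compounds in acquisition order. The caption does not say whether the 50 positives in the initial set count toward the curve, so that choice is exposed as the hypothetical `initial_positives` parameter. The standard error uses the sample standard deviation (ddof=1) divided by the square root of the number of seeds.

```python
import numpy as np


def cumulative_fraction_found(labels_acquired, total_positives, initial_positives=0):
    """Fraction of all positives identified after each acquisition step.

    labels_acquired: 0/1 labels of queried compounds in acquisition order
    (1 = toxic/positive). initial_positives: positives already in the seed
    set, counted only if the curve is meant to include them (an assumption).
    """
    labels = np.asarray(labels_acquired, dtype=float)
    return (initial_positives + np.cumsum(labels)) / total_positives


def iterations_to_threshold(fraction_curve, threshold=0.7):
    """First 1-based iteration at which the curve reaches `threshold`, else None."""
    hits = np.flatnonzero(np.asarray(fraction_curve) >= threshold)
    return int(hits[0]) + 1 if hits.size else None


def mean_and_sem(curves):
    """Per-iteration mean and standard error across seeds (one row per seed)."""
    curves = np.asarray(curves, dtype=float)
    mean = curves.mean(axis=0)
    sem = curves.std(axis=0, ddof=1) / np.sqrt(curves.shape[0])
    return mean, sem


# Toy pool with made-up labels (not data from the paper): 15 positives, 25 negatives.
# Shuffling the pool mimics the acquisition order of Random sampling.
rng = np.random.default_rng(0)
pool = np.array([1] * 15 + [0] * 25)
seed_curves = [
    cumulative_fraction_found(rng.permutation(pool), total_positives=15)
    for _ in range(10)
]
mean_curve, sem_curve = mean_and_sem(seed_curves)
print(iterations_to_threshold(mean_curve, threshold=0.7))
```

With the iteration counts reported in the caption, the speedup works out to 600 / 266 ≈ 2.3, which matches the approximately 2-fold improvement.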
