Table 3 Metrics calculated for the final selections of high-scoring molecules according to the DRD2 bioactivity predictor (\(> 0.5\)), mono-objective optimization

From: Human-in-the-loop active learning for goal-oriented molecule generation

| Metric | No Feedback | Feedback (\(T = 10\)): EPIG (\(\sigma_{\epsilon} = 0.3\)) | Feedback (\(T = 10\)): Uncertainty (\(\sigma_{\epsilon} = 0.3\)) | Feedback (\(T = 10\)): Greedy (\(\sigma_{\epsilon} = 0.3\)) | Feedback (\(T = 10\)): Random (\(\sigma_{\epsilon} = 0.3\)) |
| --- | --- | --- | --- | --- | --- |
| Number of molecules | 121.00 ± 1.41 | 97.63 ± 6.04 | 88.00 ± 12.39 | 85.11 ± 22.53 | 98.43 ± 11.13 |
| MAE Oracle-Pred. \(\downarrow\) | 0.61 ± 0.02 | 0.23 ± 0.05 ** | 0.14 ± 0.05 ** | 0.31 ± 0.04 ** | 0.15 ± 0.04 ** |
| Internal Diversity \(\uparrow\) | 0.70 ± 0.01 | 0.60 ± 0.03 ** | 0.60 ± 0.03 ** | 0.65 ± 0.06 * | 0.57 ± 0.02 ** |
| SA \(\downarrow\) | 3.36 ± 0.09 | 3.58 ± 0.44 | 3.63 ± 0.31 * | 3.63 ± 0.34 * | 3.91 ± 0.57 * |
| QED \(\uparrow\) | 0.41 ± 0.03 | 0.60 ± 0.08 ** | 0.54 ± 0.10 ** | 0.51 ± 0.08 ** | 0.50 ± 0.06 ** |
| Novelty \(\uparrow\) | 1.0 ± 0.0 | 1.0 ± 0.0 | 1.0 ± 0.0 | 1.0 ± 0.0 | 1.0 ± 0.0 |
| Uniqueness \(\uparrow\) | 1.0 ± 0.0 | 1.0 ± 0.0 | 1.0 ± 0.0 | 1.0 ± 0.0 | 1.0 ± 0.0 |
| Frag Gen-Train \(\uparrow\) | 0.95 ± 0.01 | 0.90 ± 0.10 | 0.85 ± 0.20 | 0.64 ± 0.21 ** | 0.90 ± 0.18 |
| SNN Gen-Train \(\uparrow\) | 0.41 ± 0.01 | 0.49 ± 0.02 ** | 0.52 ± 0.02 ** | 0.46 ± 0.05 * | 0.52 ± 0.03 ** |
| FCD Gen-Train \(\downarrow\) | 39.23 ± 2.03 | 37.17 ± 3.34 | \(\textbf{36.19} \pm \textbf{3.85}\) | 40.96 ± 5.46 | 38.46 ± 3.97 |
| Frag Gen-Queries \(\uparrow\) | - | 0.97 ± 0.10 | 0.91 ± 0.12 | 0.67 ± 0.17 | 0.97 ± 0.05 |
| SNN Gen-Queries \(\uparrow\) | - | 0.54 ± 0.03 | 0.50 ± 0.01 | 0.49 ± 0.07 | 0.54 ± 0.02 |
| FCD Gen-Queries \(\downarrow\) | - | 11.99 ± 2.61 | 14.80 ± 4.90 | 23.77 ± 12.06 | 10.25 ± 1.59 |

  1. For all metrics, we report the mean and standard deviation across 10 different replicates of each experimental run. Up and down arrows indicate whether each performance metric is expected to increase or decrease, respectively. One-sided ANOVA tests were applied for statistical significance assessments, and performance significance with respect to the “No Feedback” baseline is marked with * (if p-value \(< 0.05\)) or ** (if p-value \(< 0.01\)).
  2. Metric values in bold correspond to the most performant methods.
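
The reporting convention described in the first note (mean ± standard deviation over replicates, with * or ** appended according to the p-value) can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `significance_marker` and `format_cell` are hypothetical, the replicate values are synthetic placeholders, the p-value is assumed to be supplied by whichever significance test is used, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the table notes do not say which estimator was applied.

```python
from statistics import mean, stdev


def significance_marker(p_value: float) -> str:
    """Map a p-value to the table's marker: ** if < 0.01, * if < 0.05."""
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


def format_cell(replicates, p_value=None):
    """Format replicate values as 'mean ± std', plus an optional marker.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    cell = f"{mean(replicates):.2f} ± {stdev(replicates):.2f}"
    marker = "" if p_value is None else significance_marker(p_value)
    return f"{cell} {marker}".strip()


# Synthetic placeholder values, not data from the paper.
example = [1.0, 2.0, 3.0, 4.0, 5.0]
print(format_cell(example))                 # baseline column, no marker
print(format_cell(example, p_value=0.003))  # marked with **
```

A baseline column such as "No Feedback" would be formatted without a p-value, while each feedback column would pass the p-value of its comparison against that baseline.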