Table 2 Metrics calculated for the final selections of high-scoring molecules according to the LogP predictor (Score > 0.5)

From: Human-in-the-loop active learning for goal-oriented molecule generation

| Metric | No Feedback | Feedback (\(T = 10\)), EPIG (\(\sigma _{\epsilon } = 5.0\)) | Feedback (\(T = 10\)), Uncertainty (\(\sigma _{\epsilon } = 5.0\)) | Feedback (\(T = 10\)), Greedy (\(\sigma _{\epsilon } = 5.0\)) | Feedback (\(T = 10\)), Random (\(\sigma _{\epsilon } = 5.0\)) |
| --- | --- | --- | --- | --- | --- |
| Number of molecules | 124.44 ± 1.34 | 125.38 ± 1.58 | 126.22 ± 1.40 | 125.89 ± 1.52 | 124.78 ± 1.62 |
| MAE Oracle-Pred. \(\downarrow\) | 2.15 ± 0.24 | 1.35 ± 0.16 ** | 1.29 ± 0.21 ** | 1.42 ± 0.19 ** | 1.91 ± 0.50 |
| Internal Diversity \(\uparrow\) | 0.85 ± 0.01 | 0.84 ± 0.01 | 0.84 ± 0.01 * | 0.84 ± 0.01 | 0.85 ± 0.01 |
| SA \(\downarrow\) | 3.12 ± 0.15 | 2.80 ± 0.12 ** | 2.78 ± 0.15 ** | 2.83 ± 0.06 ** | 3.04 ± 0.16 |
| QED \(\uparrow\) | 0.52 ± 0.03 | 0.45 ± 0.03 ** | 0.44 ± 0.03 ** | 0.46 ± 0.04 ** | 0.51 ± 0.05 |
| Novelty \(\uparrow\) | 1.0 ± 0.0 | 1.0 ± 0.0 | 1.0 ± 0.0 | 1.0 ± 0.0 | 1.0 ± 0.0 |
| Uniqueness \(\uparrow\) | 1.0 ± 0.0 | 1.0 ± 0.0 | 1.0 ± 0.0 | 1.0 ± 0.0 | 1.0 ± 0.0 |
| Frag Gen-Train \(\uparrow\) | 0.70 ± 0.05 | 0.85 ± 0.05 ** | 0.88 ± 0.03 ** | 0.88 ± 0.04 ** | 0.86 ± 0.04 ** |
| SNN Gen-Train \(\uparrow\) | 0.23 ± 0.01 | 0.26 ± 0.01 ** | 0.27 ± 0.01 ** | 0.26 ± 0.01 ** | 0.25 ± 0.01 ** |
| FCD Gen-Train \(\downarrow\) | 35.64 ± 1.31 | 30.07 ± 1.07 ** | 30.88 ± 1.69 ** | 31.81 ± 1.07 ** | 32.30 ± 1.62 ** |
| Frag Gen-Queries \(\uparrow\) | - | 0.96 ± 0.02 | 0.95 ± 0.02 | 0.94 ± 0.02 | 0.92 ± 0.03 |
| SNN Gen-Queries \(\uparrow\) | - | 0.27 ± 0.01 | 0.27 ± 0.02 | 0.26 ± 0.00 | 0.25 ± 0.01 |
| FCD Gen-Queries \(\downarrow\) | - | 27.12 ± 1.03 | 27.45 ± 0.87 | 27.23 ± 1.07 | 26.47 ± 1.37 |

  1. For all metrics, we report the mean and standard deviation across 10 replicates of each experimental run. Up and down arrows indicate the direction of improvement for each metric. One-sided ANOVA tests were used to assess statistical significance; a significant difference from the “No Feedback” baseline is marked with * (p-value \(< 0.05\)) or ** (p-value \(< 0.01\)).
  2. Metric values in bold correspond to the best-performing methods.
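
The reporting convention described in the table notes (mean ± standard deviation over replicates, plus a significance marker against the baseline) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, the replicate values are invented, the sample standard deviation is assumed, and the test is assumed to be a standard one-way ANOVA comparing one feedback condition with the baseline. Only the F statistic is computed here; turning it into a p-value needs the F-distribution survival function, which the Python standard library does not provide.

```python
from statistics import mean, stdev


def summarize(values):
    """Format replicates as 'mean ± std' (sample standard deviation is an assumption)."""
    return f"{mean(values):.2f} ± {stdev(values):.2f}"


def significance_marker(p_value):
    """Map a p-value to the table's markers: ** if p < 0.01, * if p < 0.05, else none."""
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


def anova_f_statistic(*groups):
    """One-way ANOVA F statistic: between-group mean square over within-group mean square."""
    means = [mean(g) for g in groups]
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical replicate values, invented for illustration only (not data from the paper).
baseline = [2.0, 2.2, 2.4, 2.1, 2.3]
feedback = [1.2, 1.4, 1.3, 1.5, 1.1]

print(summarize(baseline))                      # prints "2.20 ± 0.16"
print(anova_f_statistic(baseline, feedback))    # F statistic for the two groups
print(significance_marker(0.003))               # prints "**"
```

With the p-value in hand, a table cell is simply the summary string followed by the marker, for example `summarize(feedback) + " " + significance_marker(p)`.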