Fig. 6

Applicability domain of models within our FSL dataset. We examined how the prediction error for test-set molecules varies with their distance to the molecules in the training set. To this end, we ordered all test molecules by their Tanimoto distance to their closest neighbour in the training set and assigned each molecule to one of 20 buckets, each covering an interval of width 0.05 in the [0, 1] range of Tanimoto distances. We then computed the relative error of each molecule’s prediction. The relative error \(RE_i\) of molecule \(i\) was defined as \(RE_i = \left| \frac{y_i - \hat{y}_i}{y_i}\right| \cdot 100\), where \(y_i\) is the true value and \(\hat{y}_i\) the predicted one. For probabilistic models such as GPs and NPs, \(\hat{y}_i\) was taken to be the predicted mean \(\mu_i\). Finally, we plotted the distribution of relative errors of each model within each bucket. The MG-CNP displayed the best performance across all distances within our dataset, closely followed by the fine-tuned GNN. Tanimoto similarities were computed on Morgan FPs (section Molecular representations).
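The bucketing procedure described in this caption can be sketched as follows. This is a minimal illustration and not the code used to produce the figure. It assumes that fingerprints are already available as binary NumPy arrays (for example, Morgan bit vectors computed elsewhere), that Tanimoto distance is taken as one minus Tanimoto similarity, and that a distance of exactly 1.0 falls into the last bucket. All function names are hypothetical.

```python
import numpy as np


def tanimoto_distance_matrix(test_fps, train_fps):
    """Pairwise Tanimoto distances between rows of two binary fingerprint arrays."""
    test_fps = np.asarray(test_fps, dtype=np.int64)
    train_fps = np.asarray(train_fps, dtype=np.int64)
    # Number of bits set in both fingerprints, for every (test, train) pair.
    intersection = test_fps @ train_fps.T
    test_bits = test_fps.sum(axis=1)[:, None]
    train_bits = train_fps.sum(axis=1)[None, :]
    union = test_bits + train_bits - intersection
    # Assumed convention: two all-zero fingerprints have similarity 1.
    similarity = np.where(union > 0, intersection / np.maximum(union, 1), 1.0)
    return 1.0 - similarity


def nearest_train_distance(test_fps, train_fps):
    """Distance from each test molecule to its closest training molecule."""
    return tanimoto_distance_matrix(test_fps, train_fps).min(axis=1)


def assign_buckets(distances, n_buckets=20):
    """Map distances in [0, 1] to bucket indices 0..n_buckets-1 (width 1/n_buckets)."""
    idx = np.floor(np.asarray(distances, dtype=float) * n_buckets).astype(int)
    # Assumed edge convention: a distance of exactly 1.0 goes into the last bucket.
    return np.clip(idx, 0, n_buckets - 1)


def relative_error(y_true, y_pred):
    """RE_i = |(y_i - y_hat_i) / y_i| * 100, as defined in the caption."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.abs((y_true - y_pred) / y_true) * 100.0


def errors_by_bucket(distances, rel_errors, n_buckets=20):
    """Group relative errors by distance bucket; empty buckets are omitted."""
    buckets = assign_buckets(distances, n_buckets)
    rel_errors = np.asarray(rel_errors, dtype=float)
    return {
        b: rel_errors[buckets == b]
        for b in range(n_buckets)
        if np.any(buckets == b)
    }


# Toy data: 4-bit fingerprints and made-up targets, for illustration only.
train_fps = [[1, 1, 0, 0], [0, 0, 1, 1]]
test_fps = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1]]
y_true = [2.0, 4.0, 10.0]
y_pred = [1.0, 5.0, 10.0]  # for a GP or NP this would be the predicted mean

distances = nearest_train_distance(test_fps, train_fps)
rel_errors = relative_error(y_true, y_pred)
grouped = errors_by_bucket(distances, rel_errors)
```

Each entry of `grouped` holds the relative errors of one bucket and could be passed to a box or violin plot, one per model, to obtain a per-bucket error distribution. Note that the relative error is undefined when a true value \(y_i\) equals zero, so such molecules would need separate handling.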