Fig. 3

Predictive performance of the classification models. Balanced accuracy (BA), averaged across the different validation datasets, is reported for (1) the entire dataset (All), (2) the predictions inside the model’s applicability domain (in AD), (3) the chemicals outside the model’s training set (out TS) and (4) the chemicals that fall both inside the AD and outside the training set (in AD/out TS).