Fig. 2
From: SIMPD: an algorithm for generating simulated time splits for validating machine learning approaches

Composition of the NIBR medicinal chemistry project data sets. A: Target classes for the assays. The black numbers give how many of the 138 assays belong to each target class; 39 of the assays have no target class assigned. B: Histogram of data set sizes. C: Scatter plot of the fraction of active compounds in the test set versus the fraction of active compounds in the training set, based on the temporal splits.
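
The quantity plotted in panel C can be illustrated with a minimal sketch. It assumes each compound is recorded as a (timestamp, is_active) pair and that the temporal split assigns the earliest 80% of compounds to training and the rest to test. The function name, the record layout, and the 80/20 ratio are illustrative assumptions, not details taken from the caption.

```python
from typing import List, Tuple


def temporal_split_active_fractions(
    records: List[Tuple[int, bool]], train_fraction: float = 0.8
) -> Tuple[float, float]:
    """Return (fraction active in training, fraction active in test).

    records: hypothetical (timestamp, is_active) pairs, one per compound.
    train_fraction: assumed share of the earliest compounds used for training.
    """
    if len(records) < 2:
        raise ValueError("need at least two compounds to form a split")

    # Order compounds from earliest to latest measurement.
    ordered = sorted(records, key=lambda rec: rec[0])

    # Keep at least one compound on each side of the split.
    n_train = int(len(ordered) * train_fraction)
    n_train = min(max(n_train, 1), len(ordered) - 1)

    train, test = ordered[:n_train], ordered[n_train:]

    def active_fraction(part: List[Tuple[int, bool]]) -> float:
        return sum(1 for _, is_active in part if is_active) / len(part)

    return active_fraction(train), active_fraction(test)


# Ten compounds with timestamps 0..9; compounds 0, 1 and 8 are active.
example = [(t, t in {0, 1, 8}) for t in range(10)]
train_frac, test_frac = temporal_split_active_fractions(example)
```

Each assay would contribute one (training fraction, test fraction) point to a scatter plot of this kind; points far from the diagonal indicate that the share of actives shifted over time.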