Fig. 4

Precision-recall curves for the different holdout experiments. Each experiment holds out a random subset of the data as the test set and is repeated 100 times. Both single-generation and multi-generation evaluations are then performed. The legend gives the data sets used in the evaluation. Details on the composition of the data sets are given in our previous publication [15].