Fig. 6
From: Large-scale evaluation of k-fold cross-validation ensembles for uncertainty estimation

Rank sums of UQ performance for each combination of featurization and modeling technique across all datasets. Within each dataset, the 20 combinations were ranked from one (best performance) to 20 (worst performance), and each combination's ranks were then summed over all datasets, so smaller rank sums indicate better overall performance. Cells containing smaller values are colored brighter.
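The rank-sum procedure described in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The nested-dictionary input layout, the function name `rank_sums`, the `higher_is_better` flag and the toy scores are all hypothetical. Ties are broken by input order because the caption does not state how tied scores were ranked.

```python
from collections import defaultdict


def rank_sums(scores, higher_is_better=False):
    """Sum per-dataset ranks for each combination (hypothetical helper).

    scores: dict mapping dataset name -> dict mapping combination -> UQ metric.
    Rank 1 is the best combination within a dataset. Ties are broken by
    insertion order (an assumption; sorted() is stable).
    """
    totals = defaultdict(int)
    for per_combo in scores.values():
        # Order combinations from best to worst within this dataset.
        ordered = sorted(per_combo, key=per_combo.get, reverse=higher_is_better)
        for rank, combo in enumerate(ordered, start=1):
            totals[combo] += rank
    return dict(totals)


# Toy example with three combinations and two datasets (lower metric is better).
scores = {
    "ds1": {"A": 0.10, "B": 0.30, "C": 0.20},
    "ds2": {"A": 0.05, "B": 0.25, "C": 0.15},
}
print(rank_sums(scores))  # {'A': 2, 'C': 4, 'B': 6}
```

With 20 combinations and D datasets, every rank sum lies between D (ranked best on every dataset) and 20 × D (ranked worst on every dataset).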