Fig. 2 | Journal of Cheminformatics

Fig. 2

From: GESim: ultrafast graph-based molecular similarity calculation via von Neumann graph entropy

Performance of six molecular similarity measures on two structural similarity benchmarks: (a) single-assay and (b) multi-assay benchmarks, each with 1,000 different repetitions. Spearman’s rank correlation coefficient (\(\rho\)) was calculated to assess how well each measure reproduces the order of each benchmark series. The correlation coefficients were grouped into bins with a width of 0.2, and the distribution within each bin was visualized using a boxen plot to facilitate comparison of performance across the measures. (c) A boxen plot comparing GESim best and others in terms of molecular similarity within each series (absolute \(\Delta\)similarity). GESim best refers to the 38,624 series (out of 826,740 unique series in the single-assay benchmark) for which GESim achieved the highest Spearman’s rank correlation coefficient, while others comprises the remaining series. Absolute \(\Delta\)similarity represents the similarity within a series, calculated as the absolute difference between the similarity of the reference molecule to the first molecule and its similarity to the last molecule in the series. To facilitate comparison, 5% of the data are excluded as outliers in the boxen plot (c). An example series from GESim best is shown with the corresponding Spearman’s rank correlation coefficients (\(\rho\)) for each method displayed underneath (lower right panel). GESim, ECFP, FCFP, APFP, MACCS, and TTFP are shown in blue, orange, green, red, purple, and brown, respectively.
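The two per-series quantities named in the caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the example similarity values, and the convention of correlating similarities against negated series positions (so that a series whose similarity to the reference decreases from first to last molecule gives \(\rho\) = 1) are all assumptions made for illustration.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one input is constant
    return cov / (var_x * var_y) ** 0.5


def abs_delta_similarity(similarities: Sequence[float]) -> float:
    """|similarity(reference, first) - similarity(reference, last)|."""
    return abs(similarities[0] - similarities[-1])


# Hypothetical similarities of a reference molecule to the four molecules
# of one series, listed in benchmark order (values invented for illustration).
series_similarities = [0.9, 0.7, 0.75, 0.4]

# Assumed convention: earlier positions should be more similar, so the
# expected order is encoded as negated positions.
expected_order = [-p for p in range(len(series_similarities))]

rho = spearman_rho(series_similarities, expected_order)
delta = abs_delta_similarity(series_similarities)
```

Here the second and third molecules are swapped relative to the expected order, so `rho` falls below 1, while `delta` measures how far apart the first and last molecules are in similarity to the reference.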
