Comparative evaluation of methods for the prediction of protein–ligand binding sites

Table 4 Pocket-level evaluation summary

Method	% Recall_top-N	% Recall_top-N+2	% Recall_max	% Precision_1K	# TP_100FP	% RRO	% RVO
(d) VN-EGNN	27.5 (#11)	40.9 (#12)	49.3 (#10)	92.5⁺ (#1)	1301⁺ (#1)	32.8⁻ (#12)	27.6⁻ (#11)
(d) IF-SitePred	19.8⁻ (#12)	25.7⁻ (#13)	52.1 (#6)	91.0 (#2)	961 (#3)	46.5 (#11)	40.4 (#9)
(d) GrASP	48.0 (#2)	49.9 (#5)	50.0 (#8)	92.5⁺ (#1)	1017 (#2)	54.5 (#7)	59.8 (#6)
(d) PUResNet	40.6 (#6)	41.1 (#11)	41.1⁻ (#12)	81.6 (#6)	534 (#8)	61.0 (#4)	63.9 (#4)
(d) DeepPocket_SEG	35.4 (#10)	43.8 (#10)	56.5 (#5)	82.6 (#4)	670 (#5)	57.5 (#5)	60.3 (#5)
(d) DeepPocket_RESC	46.6 (#4)	58.1 (#2)	89.3 (#2)	81.7 (#5)	637 (#6)	53.1 (#9)	38.2 (#10)
(d) P2Rank_CONS	48.8⁺ (#1)	53.9 (#3)	57.0 (#4)	90.7 (#3)	932 (#4)	56.4 (#6)	43.8 (#8)
(d) P2Rank	46.7 (#3)	51.9 (#4)	57.0 (#3)	79.2 (#7)	586 (#7)	54.4 (#8)	58.2 (#7)
(d) fpocket_PRANK	48.8⁺ (#1)	60.4⁺ (#1)	91.3⁺ (#1)	81.7 (#5)	526 (#9)	52.6 (#10)	38.2 (#10)
(d) fpocket	38.8 (#8)	46.5 (#8)	91.3⁺ (#1)	47.3 (#9)	94 (#11)	52.6 (#10)	38.2 (#10)
(d) PocketFinder⁺	39.2 (#7)	47.8 (#7)	50.5 (#7)	42.0 (#10)	64 (#12)	72.3 (#2)	75.9 (#2)
(d) Ligsite⁺	41.3 (#5)	48.4 (#6)	49.7 (#9)	52.3 (#8)	115 (#10)	77.6⁺ (#1)	77.0⁺ (#1)
(d) Surfnet⁺	37.7 (#9)	45.8 (#9)	48.9 (#11)	39.5⁻ (#11)	61⁻ (#13)	71.7 (#3)	72.0 (#3)

These metrics correspond to the default modes of the thirteen methods covered in this work, indicated by (d) preceding the methods’ names. % Recall is reported for each method considering the top-N predictions, the top-N+2 predictions, and all predictions regardless of rank (max), i.e., maximum recall. % Precision_1K is the precision of the method for its 1000 top-scored predictions. # TP_100FP is the number of true positives (TP) reached for the first 100 false positives (FP). % RRO is the mean relative residue overlap for correctly predicted sites; % RVO is the relative volume overlap, reported only for correctly predicted sites that have a volume, i.e., pockets or cavities rather than exposed sites, which have no volume. These last two metrics represent the overlap in residues and volume relative to the observed site. See “Methods” section for definitions of RRO and RVO. Within each cell, the number following the hash (#) indicates the rank of the method according to the metric in that column; methods with equal values share a rank. Bold font indicates the best (“⁺”) and worst (“⁻”) performing methods for each metric.
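As an illustration of how the rank-based columns can be read, the following is a minimal sketch and not the authors' implementation. It assumes each method's predictions are already sorted by decreasing score and labelled as true (TP) or false (FP) positives. The function names `precision_at_k`, `tp_at_fp` and `dense_rank` are hypothetical, and stopping the TP count at the 100th FP is one possible reading of # TP_100FP.

```python
from typing import List, Sequence


def precision_at_k(labels: Sequence[bool], k: int = 1000) -> float:
    """Percentage of TPs among the k top-scored predictions.

    `labels` is assumed to be sorted by decreasing prediction score,
    with True marking a true positive and False a false positive.
    """
    top = labels[:k]
    if not top:
        return 0.0
    return 100.0 * sum(top) / len(top)


def tp_at_fp(labels: Sequence[bool], max_fp: int = 100) -> int:
    """Number of TPs seen up to the point where `max_fp` FPs have accumulated.

    Assumption: counting stops as soon as the max_fp-th FP is reached.
    """
    tp = 0
    fp = 0
    for is_tp in labels:
        if is_tp:
            tp += 1
        else:
            fp += 1
            if fp == max_fp:
                break
    return tp


def dense_rank(values: Sequence[float], higher_is_better: bool = True) -> List[int]:
    """Rank values so that ties share a rank and no rank number is skipped."""
    ordered = sorted(set(values), reverse=higher_is_better)
    position = {value: index + 1 for index, value in enumerate(ordered)}
    return [position[value] for value in values]


# % Precision_1K column of the table, in row order (VN-EGNN ... Surfnet+).
precision_1k = [92.5, 91.0, 92.5, 81.6, 82.6, 81.7, 90.7,
                79.2, 81.7, 47.3, 42.0, 52.3, 39.5]
precision_ranks = dense_rank(precision_1k)
```

Applied to the % Precision_1K column, `dense_rank` gives the same ranks as the table: the two methods at 92.5 share #1, the two at 81.7 share #5, and the lowest value receives #11 rather than #13.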

ISSN: 1758-2946