Fig. 2
From: Publishing neural networks in drug discovery might compromise training data privacy

True positive rates for identifying training data molecules at a false positive rate of 0. The distributions over 20 experimental repetitions are shown for each representation and dataset, for both the likelihood ratio attack (LiRA) and the robust membership inference attack (RMIA). Red stars mark distributions with significantly higher true positive rates than the baseline: one star indicates p < 0.05, two stars p < 0.01, and three stars p < 0.001. Training dataset sizes (total number of positives) are 859 molecules for the blood-brain barrier permeability dataset, 3,264 for the Ames mutagenicity prediction dataset, 48,837 for the DNA-encoded library enrichment dataset, and 137,853 for the hERG channel inhibition dataset.
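The metric plotted here, the true positive rate at a false positive rate of 0, can be illustrated with a short sketch. It assumes each attack (LiRA or RMIA) produces one scalar membership score per molecule, where a higher score means "more likely a training member". To produce zero false positives, the decision threshold must sit above the highest score given to any non-member, so only members scored strictly above that value count as hits. The helper names `tpr_at_zero_fpr` and `significance_stars` are hypothetical and do not come from the paper's code. The star mapping follows the thresholds stated in the caption.

```python
from typing import Sequence


def tpr_at_zero_fpr(member_scores: Sequence[float],
                    nonmember_scores: Sequence[float]) -> float:
    """Fraction of true members scored strictly above every non-member.

    Assumption: higher score = more likely a training-set member.
    Any threshold at or above the maximum non-member score gives zero
    false positives; members must exceed it to be counted as detected.
    """
    if not member_scores:
        raise ValueError("need at least one member score")
    # With no non-members, any threshold yields FPR 0, so all members count.
    threshold = max(nonmember_scores) if nonmember_scores else float("-inf")
    hits = sum(1 for s in member_scores if s > threshold)
    return hits / len(member_scores)


def significance_stars(p_value: float) -> str:
    """Map a p-value to the star annotation described in the caption."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""  # not significant: no star


if __name__ == "__main__":
    members = [0.9, 0.8, 0.4]   # illustrative scores, not data from the paper
    nonmembers = [0.5, 0.3]
    print(tpr_at_zero_fpr(members, nonmembers))  # 2 of 3 members exceed 0.5
    print(significance_stars(0.004))             # "**"
```

Because the threshold depends only on the single highest-scoring non-member, this metric is strict: one confidently misclassified non-member can push the threshold up and lower the true positive rate considerably.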