Fig. 1

Chemical space and scaffold analysis of the curated Caco-2 permeability data set. A Distribution of the experimental logPapp values, showing that most compounds belong to the highly permeable class (n = 2746), while the low-permeability class contains the fewest structures. B t-SNE plot of the curated Caco-2 permeability data set (green: highly permeable compounds; blue: medium permeable; orange: low permeable). C Heat map of Tanimoto similarity based on the ECFP4 fingerprints of the total data set. D Frequency of the Murcko scaffolds in the data set